feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)

Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model.  The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.

IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
  passwords and JWT signing keys in Cassandra.  Reached over the
  standard pub/sub request/response pattern; gateway is the only
  caller.
* Operations: bootstrap, resolve-api-key, login,
  get-signing-key-public, rotate-signing-key,
  create/list/get/update/disable/delete/enable-user, change-password,
  reset-password, create/list/get/update/disable-workspace,
  create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA).  Key rotation writes a new kid and
  retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed.  Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
  required startup argument with no permissive default.  Masked
  "auth failure" errors hide whether a refused bootstrap request was
  due to mode, state, or authorisation.

Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator.  Distinguishes JWTs
  (three-segment dotted) from API keys by shape; verifies JWTs
  locally using the cached IAM public key; resolves API keys via IAM
  with a short-TTL hash-keyed cache.  Every failure path surfaces the
  same 401 body ("auth failure") so callers cannot enumerate
  credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
  traffic does not begin flowing until auth has started.

Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
  OSS ships reader / writer / admin; the first two are
  workspace-assigned, admin is cross-workspace ("*").  No
  "cross-workspace" pseudo-capability: workspace permission is a
  property of the role.
* check(identity, capability, target_workspace=None) is the single
  authorisation test: some role must grant the capability *and* be
  active in the target workspace.
* enforce_workspace validates a request-body workspace against the
  caller's role scopes and injects the resolved value.
  Cross-workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly, with no
  permissive default.  Construction fails fast if omitted.
  Enterprise editions can replace the role table without changing the
  wire protocol.

WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
  runs on the first WebSocket frame ({"type":"auth","token":"..."})
  with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
  The socket stays open on failure so the client can re-authenticate;
  browsers treat a handshake-time 401 as terminal, breaking
  reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
  enforces the caller's workspace (envelope + inner payload) using
  the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
  handshake (URL-scoped short-lived transfers; no re-auth need).

Auth surface
------------
* POST /api/v1/auth/login: public, returns a JWT.
* POST /api/v1/auth/bootstrap: public; forwards to IAM's bootstrap op
  which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password: any authenticated user.
* POST /api/v1/iam: admin-only generic forwarder for the rest of the
  IAM API (per-op REST endpoints to follow in a later change).

Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
  Authenticator.permitted contract.  The gateway cannot run without
  IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None; there is no
  silent downgrade path.

CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces.
Passwords read via getpass; tokens / one-time secrets written to
stdout with operator context on stderr so shell composition works
cleanly.  AsyncSocketClient / SocketClient updated to the first-frame
auth protocol.

Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
  resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new): transport, dataclasses,
  operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new): capability vocabulary, OSS
  role bundles, agent-as-composition note, enforcement-boundary
  policy, enterprise extensibility.

Tests
-----
* test_auth.py (rewritten): IamAuth + JWT round-trip with real
  Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new): role table sanity, check across
  role x workspace combinations, enforce_workspace paths,
  unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
  explicitly (no permissive defaults relied upon).  New tests pin the
  fail-closed invariants: DispatcherManager / Mux refuse auth=None;
  i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
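
As a rough illustration of the credential handling listed under "IAM service" above, here is a minimal sketch. It is not the iam-svc code, which is not shown in this excerpt. The helper names (`hash_password`, `verify_password`, `new_api_key`), the 16-byte salt length and the hex encoding of the key are assumptions. Only the PBKDF2-HMAC-SHA-256 / 600k-iteration / per-user-salt parameters, the 128-bit key size, the SHA-256 hashing and the `tg_` prefix (from auth.py) come from this commit.

```python
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 600_000  # figure quoted in the commit message


def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA-256.

    A fresh random salt is generated when none is supplied; the
    16-byte length is an assumption made for this sketch."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS,
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


def new_api_key():
    """Return (plaintext, sha256_hex).

    The plaintext would be shown to the operator once; only the
    hash would be stored.  The exact key encoding is assumed."""
    plaintext = "tg_" + secrets.token_hex(16)  # 16 bytes = 128 bits
    key_hash = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, key_hash
```

Storing only the hash means a lookup by presented key is a single SHA-256 computation, which is consistent with the hash-keyed cache the gateway keeps.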
2026-07-11 06:12:11 +02:00 · 2026-04-24 17:29:10 +01:00 · 2026-04-24 17:29:10 +01:00 · 67b2fc448f
commit 67b2fc448f
parent ae9936c9cc
61 changed files with 6474 additions and 792 deletions
--- a/trustgraph-flow/pyproject.toml
+++ b/trustgraph-flow/pyproject.toml
@@ -63,6 +63,7 @@ chunker-token = "trustgraph.chunking.token:run"
 bootstrap = "trustgraph.bootstrap.bootstrapper:run"
 config-svc = "trustgraph.config.service:run"
 flow-svc = "trustgraph.flow.service:run"
+iam-svc = "trustgraph.iam.service:run"
 doc-embeddings-query-milvus = "trustgraph.query.doc_embeddings.milvus:run"
 doc-embeddings-query-pinecone = "trustgraph.query.doc_embeddings.pinecone:run"
 doc-embeddings-query-qdrant = "trustgraph.query.doc_embeddings.qdrant:run"
--- a/trustgraph-flow/trustgraph/gateway/auth.py
+++ b/trustgraph-flow/trustgraph/gateway/auth.py
@@ -1,22 +1,264 @@
+"""
+IAM-backed authentication for the API gateway.

-class Authenticator:
+Replaces the legacy GATEWAY_SECRET shared-token Authenticator.  The
+gateway is now stateless with respect to credentials: it either
+verifies a JWT locally using the active IAM signing public key, or
+resolves an API key by hash with a short local cache backed by the
+IAM service.

-    def __init__(self, token=None, allow_all=False):
+Identity returned by authenticate() is the (user_id, workspace,
+roles) triple the rest of the gateway — capability checks, workspace
+resolver, audit logging — needs.
+"""

-        if not allow_all and token is None:
-            raise RuntimeError("Need a token")
+import asyncio
+import base64
+import hashlib
+import json
+import logging
+import time
+import uuid
+from dataclasses import dataclass

-        if not allow_all and token == "":
-            raise RuntimeError("Need a token")
+from aiohttp import web

-        self.token = token
-        self.allow_all = allow_all
+from cryptography.hazmat.primitives import serialization
+from cryptography.hazmat.primitives.asymmetric import ed25519

-    def permitted(self, token, roles):
+from ..base.iam_client import IamClient
+from ..base.metrics import ProducerMetrics, SubscriberMetrics
+from ..schema import (
+    IamRequest, IamResponse,
+    iam_request_queue, iam_response_queue,
+)

-        if self.allow_all: return True
+logger = logging.getLogger("auth")

-        if self.token != token: return False
+API_KEY_CACHE_TTL = 60  # seconds

-        return True

+@dataclass
+class Identity:
+    user_id: str
+    workspace: str
+    roles: list
+    source: str   # "api-key" | "jwt"
+
+
+def _auth_failure():
+    return web.HTTPUnauthorized(
+        text='{"error":"auth failure"}',
+        content_type="application/json",
+    )
+
+
+def _access_denied():
+    return web.HTTPForbidden(
+        text='{"error":"access denied"}',
+        content_type="application/json",
+    )
+
+
+def _b64url_decode(s):
+    pad = "=" * (-len(s) % 4)
+    return base64.urlsafe_b64decode(s + pad)
+
+
+def _verify_jwt_eddsa(token, public_pem):
+    """Verify an Ed25519 JWT and return its claims.  Raises on any
+    validation failure.  Refuses non-EdDSA algorithms."""
+    parts = token.split(".")
+    if len(parts) != 3:
+        raise ValueError("malformed JWT")
+    h_b64, p_b64, s_b64 = parts
+    signing_input = f"{h_b64}.{p_b64}".encode("ascii")
+    header = json.loads(_b64url_decode(h_b64))
+    if header.get("alg") != "EdDSA":
+        raise ValueError(f"unsupported alg: {header.get('alg')!r}")
+
+    key = serialization.load_pem_public_key(public_pem.encode("ascii"))
+    if not isinstance(key, ed25519.Ed25519PublicKey):
+        raise ValueError("public key is not Ed25519")
+
+    signature = _b64url_decode(s_b64)
+    key.verify(signature, signing_input)  # raises InvalidSignature
+
+    claims = json.loads(_b64url_decode(p_b64))
+    exp = claims.get("exp")
+    if exp is None or exp < time.time():
+        raise ValueError("expired")
+    return claims
+
+
+class IamAuth:
+    """Resolves bearer credentials via the IAM service.
+
+    Used by every gateway endpoint that needs authentication.  Fetches
+    the IAM signing public key at startup (cached in memory).  API
+    keys are resolved via the IAM service with a local hash→identity
+    cache (short TTL so revoked keys stop working within the TTL
+    window without any push mechanism)."""
+
+    def __init__(self, backend, id="api-gateway"):
+        self.backend = backend
+        self.id = id
+
+        # Populated at start() via IAM.
+        self._signing_public_pem = None
+
+        # API-key cache: plaintext_sha256_hex -> (Identity, expires_ts)
+        self._key_cache = {}
+        self._key_cache_lock = asyncio.Lock()
+
+    # ------------------------------------------------------------------
+    # Short-lived client helper.  Mirrors the pattern used by the
+    # bootstrap framework and AsyncProcessor: a fresh uuid suffix per
+    # invocation so Pulsar exclusive subscriptions don't collide with
+    # ghosts from prior calls.
+    # ------------------------------------------------------------------
+
+    def _make_client(self):
+        rr_id = str(uuid.uuid4())
+        return IamClient(
+            backend=self.backend,
+            subscription=f"{self.id}--iam--{rr_id}",
+            consumer_name=self.id,
+            request_topic=iam_request_queue,
+            request_schema=IamRequest,
+            request_metrics=ProducerMetrics(
+                processor=self.id, flow=None, name="iam-request",
+            ),
+            response_topic=iam_response_queue,
+            response_schema=IamResponse,
+            response_metrics=SubscriberMetrics(
+                processor=self.id, flow=None, name="iam-response",
+            ),
+        )
+
+    async def _with_client(self, op):
+        """Open a short-lived IamClient, run ``op(client)``, close."""
+        client = self._make_client()
+        await client.start()
+        try:
+            return await op(client)
+        finally:
+            try:
+                await client.stop()
+            except Exception:
+                pass
+
+    # ------------------------------------------------------------------
+    # Lifecycle
+    # ------------------------------------------------------------------
+
+    async def start(self, max_retries=30, retry_delay=2.0):
+        """Fetch the signing public key from IAM.  Retries on
+        failure — the gateway may be starting before IAM is ready."""
+
+        async def _fetch(client):
+            return await client.get_signing_key_public()
+
+        for attempt in range(max_retries):
+            try:
+                pem = await self._with_client(_fetch)
+                if pem:
+                    self._signing_public_pem = pem
+                    logger.info(
+                        "IamAuth: fetched IAM signing public key "
+                        f"({len(pem)} bytes)"
+                    )
+                    return
+            except Exception as e:
+                logger.info(
+                    f"IamAuth: waiting for IAM signing key "
+                    f"({type(e).__name__}: {e}); "
+                    f"retry {attempt + 1}/{max_retries}"
+                )
+            await asyncio.sleep(retry_delay)
+
+        # Don't prevent startup forever.  Nothing here re-fetches the
+        # key later, so JWTs are rejected unless a later start() succeeds.
+        logger.warning(
+            "IamAuth: could not fetch IAM signing key at startup; "
+            "JWT validation will fail until it's available"
+        )
+
+    # ------------------------------------------------------------------
+    # Authentication
+    # ------------------------------------------------------------------
+
+    async def authenticate(self, request):
+        """Extract and validate the Bearer credential from an HTTP
+        request.  Returns an ``Identity``.  Raises HTTPUnauthorized
+        (401 / "auth failure") on any failure mode — the caller
+        cannot distinguish missing / malformed / invalid / expired /
+        revoked credentials."""
+
+        header = request.headers.get("Authorization", "")
+        if not header.startswith("Bearer "):
+            raise _auth_failure()
+        token = header[len("Bearer "):].strip()
+        if not token:
+            raise _auth_failure()
+
+        # API keys always start with "tg_".  JWTs have two dots and
+        # no "tg_" prefix.  Discriminate cheaply.
+        if token.startswith("tg_"):
+            return await self._resolve_api_key(token)
+        if token.count(".") == 2:
+            return self._verify_jwt(token)
+        raise _auth_failure()
+
+    def _verify_jwt(self, token):
+        if not self._signing_public_pem:
+            raise _auth_failure()
+        try:
+            claims = _verify_jwt_eddsa(token, self._signing_public_pem)
+        except Exception as e:
+            logger.debug(f"JWT validation failed: {type(e).__name__}: {e}")
+            raise _auth_failure()
+
+        sub = claims.get("sub", "")
+        ws = claims.get("workspace", "")
+        roles = list(claims.get("roles", []))
+        if not sub or not ws:
+            raise _auth_failure()
+
+        return Identity(
+            user_id=sub, workspace=ws, roles=roles, source="jwt",
+        )
+
+    async def _resolve_api_key(self, plaintext):
+        h = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
+
+        cached = self._key_cache.get(h)
+        now = time.time()
+        if cached and cached[1] > now:
+            return cached[0]
+
+        async with self._key_cache_lock:
+            cached = self._key_cache.get(h)
+            if cached and cached[1] > now:
+                return cached[0]
+
+            try:
+                async def _call(client):
+                    return await client.resolve_api_key(plaintext)
+                user_id, workspace, roles = await self._with_client(_call)
+            except Exception as e:
+                logger.debug(
+                    f"API key resolution failed: "
+                    f"{type(e).__name__}: {e}"
+                )
+                raise _auth_failure()
+
+            if not user_id or not workspace:
+                raise _auth_failure()
+
+            identity = Identity(
+                user_id=user_id, workspace=workspace,
+                roles=list(roles), source="api-key",
+            )
+            self._key_cache[h] = (identity, now + API_KEY_CACHE_TTL)
+            return identity
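
The API-key path in `_resolve_api_key` above follows a double-checked pattern: a lock-free lookup first, then a second lookup under the lock before calling IAM. The standalone sketch below shows that pattern in isolation, assuming a hypothetical `KeyCache` class, an injected `resolve` coroutine standing in for the IAM call, and an injectable clock so expiry can be exercised without sleeping. None of these names exist in the gateway.

```python
import asyncio
import hashlib
import time


class KeyCache:
    """Hash-keyed TTL cache in front of an async resolver (hypothetical)."""

    def __init__(self, resolve, ttl=60, clock=time.time):
        self._resolve = resolve   # async callable: plaintext -> identity
        self._ttl = ttl
        self._clock = clock
        self._entries = {}        # sha256 hex -> (identity, expires_ts)
        self._lock = asyncio.Lock()

    def _fresh(self, key_hash):
        entry = self._entries.get(key_hash)
        if entry and entry[1] > self._clock():
            return entry[0]
        return None

    async def get(self, plaintext):
        key_hash = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
        # Fast path: a fresh hit needs no lock.
        hit = self._fresh(key_hash)
        if hit is not None:
            return hit
        async with self._lock:
            # Re-check: another task may have filled the entry while
            # this one was waiting for the lock.
            hit = self._fresh(key_hash)
            if hit is not None:
                return hit
            identity = await self._resolve(plaintext)
            self._entries[key_hash] = (identity, self._clock() + self._ttl)
            return identity


async def demo():
    """Count how often the resolver is hit across three lookups."""
    calls = []
    now = [1000.0]

    async def fake_resolve(plaintext):
        calls.append(plaintext)
        return {"user_id": "u1", "workspace": "default"}

    cache = KeyCache(fake_resolve, ttl=60, clock=lambda: now[0])
    await cache.get("tg_example")
    await cache.get("tg_example")  # served from the cache
    now[0] += 61                   # move past the TTL
    await cache.get("tg_example")  # resolved again
    return len(calls)
```

Because entries are only re-validated when they expire, a revoked key can keep working for up to one TTL, which is the trade-off the `IamAuth` docstring describes.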
--- a/trustgraph-flow/trustgraph/gateway/capabilities.py
+++ b/trustgraph-flow/trustgraph/gateway/capabilities.py
@@ -0,0 +1,238 @@
+"""
+Capability vocabulary, role definitions, and authorisation helpers.
+
+See docs/tech-specs/capabilities.md for the authoritative description.
+The data here is the OSS bundle table in that spec.  Enterprise
+editions may replace this module with their own role table; the
+vocabulary (capability strings) is shared.
+
+Role model
+----------
+A role has two dimensions:
+
+  1. **capability set** — which operations the role grants.
+  2. **workspace scope** — which workspaces the role is active in.
+
+The authorisation question is: *given the caller's roles, a required
+capability, and a target workspace, does any role grant the
+capability AND apply to the target workspace?*
+
+Workspace scope values recognised here:
+
+  - ``"assigned"`` — the role applies only to the caller's own
+    assigned workspace (stored on their user record).
+  - ``"*"`` — the role applies to every workspace.
+
+Enterprise editions can add richer scopes (explicit permitted-set,
+patterns, etc.) without changing the wire protocol.
+
+Sentinels
+---------
+- ``PUBLIC`` — endpoint requires no authentication.
+- ``AUTHENTICATED`` — endpoint requires a valid identity, no
+  specific capability.
+"""
+
+from aiohttp import web
+
+
+PUBLIC = "__public__"
+AUTHENTICATED = "__authenticated__"
+
+
+# Capability vocabulary.  Mirrors the "Capability list" tables in
+# capabilities.md.  Kept as a set so the gateway can fail-closed on
+# an endpoint that declares an unknown capability.
+KNOWN_CAPABILITIES = {
+    # Data plane
+    "agent",
+    "graph:read", "graph:write",
+    "documents:read", "documents:write",
+    "rows:read", "rows:write",
+    "llm",
+    "embeddings",
+    "mcp",
+    # Control plane
+    "config:read", "config:write",
+    "flows:read", "flows:write",
+    "users:read", "users:write", "users:admin",
+    "keys:self", "keys:admin",
+    "workspaces:admin",
+    "iam:admin",
+    "metrics:read",
+    "collections:read", "collections:write",
+    "knowledge:read", "knowledge:write",
+}
+
+
+# Capability sets used below.
+_READER_CAPS = {
+    "agent",
+    "graph:read",
+    "documents:read",
+    "rows:read",
+    "llm",
+    "embeddings",
+    "mcp",
+    "config:read",
+    "flows:read",
+    "collections:read",
+    "knowledge:read",
+    "keys:self",
+}
+
+_WRITER_CAPS = _READER_CAPS | {
+    "graph:write",
+    "documents:write",
+    "rows:write",
+    "collections:write",
+    "knowledge:write",
+}
+
+_ADMIN_CAPS = _WRITER_CAPS | {
+    "config:write",
+    "flows:write",
+    "users:read", "users:write", "users:admin",
+    "keys:admin",
+    "workspaces:admin",
+    "iam:admin",
+    "metrics:read",
+}
+
+
+# Role definitions.  Each role has a capability set and a workspace
+# scope.  Enterprise overrides this mapping.
+ROLE_DEFINITIONS = {
+    "reader": {
+        "capabilities": _READER_CAPS,
+        "workspace_scope": "assigned",
+    },
+    "writer": {
+        "capabilities": _WRITER_CAPS,
+        "workspace_scope": "assigned",
+    },
+    "admin": {
+        "capabilities": _ADMIN_CAPS,
+        "workspace_scope": "*",
+    },
+}
+
+
+def _scope_permits(role_name, target_workspace, assigned_workspace):
+    """Does the given role apply to ``target_workspace``?"""
+    role = ROLE_DEFINITIONS.get(role_name)
+    if role is None:
+        return False
+    scope = role["workspace_scope"]
+    if scope == "*":
+        return True
+    if scope == "assigned":
+        return target_workspace == assigned_workspace
+    # Future scope types (lists, patterns) extend here.
+    return False
+
+
+def check(identity, capability, target_workspace=None):
+    """Is ``identity`` permitted to invoke ``capability`` on
+    ``target_workspace``?
+
+    Passes iff some role held by the caller both (a) grants
+    ``capability`` and (b) is active in ``target_workspace``.
+
+    ``target_workspace`` defaults to the caller's assigned workspace,
+    which makes this function usable for system-level operations and
+    for authenticated endpoints that don't take a workspace argument
+    (the call collapses to "do any of my roles grant this cap?")."""
+    if capability not in KNOWN_CAPABILITIES:
+        return False
+
+    target = target_workspace or identity.workspace
+
+    for role_name in identity.roles:
+        role = ROLE_DEFINITIONS.get(role_name)
+        if role is None:
+            continue
+        if capability not in role["capabilities"]:
+            continue
+        if _scope_permits(role_name, target, identity.workspace):
+            return True
+    return False
+
+
+def access_denied():
+    return web.HTTPForbidden(
+        text='{"error":"access denied"}',
+        content_type="application/json",
+    )
+
+
+def auth_failure():
+    return web.HTTPUnauthorized(
+        text='{"error":"auth failure"}',
+        content_type="application/json",
+    )
+
+
+async def enforce(request, auth, capability):
+    """Authenticate + capability-check for endpoints that carry no
+    workspace dimension on the request (metrics, i18n, etc.).
+
+    For endpoints that carry a workspace field on the body, call
+    :func:`enforce_workspace` *after* parsing the body to validate
+    the workspace and re-check the capability in that scope.  Most
+    endpoints do both.
+
+    - ``PUBLIC``: no authentication, returns ``None``.
+    - ``AUTHENTICATED``: any valid identity.
+    - capability string: identity must have it, checked against the
+      caller's assigned workspace (adequate for endpoints whose
+      capability is system-level, e.g. ``metrics:read``, or where
+      the real workspace-aware check happens in
+      :func:`enforce_workspace` after body parsing)."""
+    if capability == PUBLIC:
+        return None
+
+    identity = await auth.authenticate(request)
+
+    if capability == AUTHENTICATED:
+        return identity
+
+    if not check(identity, capability):
+        raise access_denied()
+
+    return identity
+
+
+def enforce_workspace(data, identity, capability=None):
+    """Resolve + validate the workspace on a request body.
+
+    - Target workspace = ``data["workspace"]`` if supplied, else the
+      caller's assigned workspace.
+    - At least one of the caller's roles must (a) be active in the
+      target workspace and, if ``capability`` is given, (b) grant
+      ``capability``.  Otherwise 403.
+    - On success, ``data["workspace"]`` is overwritten with the
+      resolved value — callers can rely on the outgoing message
+      having the gateway's chosen workspace rather than any
+      caller-supplied value.
+
+    For ``capability=None`` the workspace scope alone is checked —
+    useful when the body has a workspace but the endpoint already
+    passed its capability check (e.g. via :func:`enforce`)."""
+    if not isinstance(data, dict):
+        return data
+
+    requested = data.get("workspace", "")
+    target = requested or identity.workspace
+
+    for role_name in identity.roles:
+        role = ROLE_DEFINITIONS.get(role_name)
+        if role is None:
+            continue
+        if capability is not None and capability not in role["capabilities"]:
+            continue
+        if _scope_permits(role_name, target, identity.workspace):
+            data["workspace"] = target
+            return data
+
+    raise access_denied()
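
To make the two-dimensional role test concrete, the sketch below re-implements the logic of `check` against a cut-down role table. The `Caller` dataclass, the `permitted` function name and the shortened capability sets are assumptions for illustration only; the real table is `ROLE_DEFINITIONS` above, and the real `check` additionally rejects capabilities that are not in `KNOWN_CAPABILITIES`.

```python
from dataclasses import dataclass, field

# Cut-down role table for illustration; the real sets are larger.
ROLES = {
    "reader": {
        "capabilities": {"graph:read"},
        "workspace_scope": "assigned",
    },
    "writer": {
        "capabilities": {"graph:read", "graph:write"},
        "workspace_scope": "assigned",
    },
    "admin": {
        "capabilities": {"graph:read", "graph:write", "users:admin"},
        "workspace_scope": "*",
    },
}


@dataclass
class Caller:
    """Stand-in for the gateway's Identity (hypothetical)."""
    workspace: str
    roles: list = field(default_factory=list)


def permitted(caller, capability, target_workspace=None):
    """True iff some role grants the capability AND covers the target."""
    target = target_workspace or caller.workspace
    for name in caller.roles:
        role = ROLES.get(name)
        if role is None or capability not in role["capabilities"]:
            continue  # unknown role or no grant: keep looking
        scope = role["workspace_scope"]
        if scope == "*":
            return True
        if scope == "assigned" and target == caller.workspace:
            return True
    return False  # fail closed


alice = Caller(workspace="acme", roles=["writer"])
root = Caller(workspace="ops", roles=["admin"])
```

With these definitions, `permitted(alice, "graph:write", "acme")` passes while `permitted(alice, "graph:write", "other")` does not, and `permitted(root, "graph:write", "acme")` passes because the admin scope is `"*"`.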
--- a/trustgraph-flow/trustgraph/gateway/dispatch/iam.py
+++ b/trustgraph-flow/trustgraph/gateway/dispatch/iam.py
@@ -0,0 +1,40 @@
+
+from ... schema import IamRequest, IamResponse
+from ... schema import iam_request_queue, iam_response_queue
+from ... messaging import TranslatorRegistry
+
+from . requestor import ServiceRequestor
+
+
+class IamRequestor(ServiceRequestor):
+    def __init__(self, backend, consumer, subscriber, timeout=120,
+                 request_queue=None, response_queue=None):
+
+        if request_queue is None:
+            request_queue = iam_request_queue
+        if response_queue is None:
+            response_queue = iam_response_queue
+
+        super().__init__(
+            backend=backend,
+            consumer_name=consumer,
+            subscription=subscriber,
+            request_queue=request_queue,
+            response_queue=response_queue,
+            request_schema=IamRequest,
+            response_schema=IamResponse,
+            timeout=timeout,
+        )
+
+        self.request_translator = (
+            TranslatorRegistry.get_request_translator("iam")
+        )
+        self.response_translator = (
+            TranslatorRegistry.get_response_translator("iam")
+        )
+
+    def to_request(self, body):
+        return self.request_translator.decode(body)
+
+    def from_response(self, message):
+        return self.response_translator.encode_with_completion(message)
--- a/trustgraph-flow/trustgraph/gateway/dispatch/manager.py
+++ b/trustgraph-flow/trustgraph/gateway/dispatch/manager.py
@@ -9,6 +9,7 @@ logger = logging.getLogger(__name__)

 from . config import ConfigRequestor
 from . flow import FlowRequestor
+from . iam import IamRequestor
 from . librarian import LibrarianRequestor
 from . knowledge import KnowledgeRequestor
 from . collection_management import CollectionManagementRequestor
@@ -72,6 +73,7 @@ request_response_dispatchers = {
 global_dispatchers = {
    "config": ConfigRequestor,
    "flow": FlowRequestor,
+    "iam": IamRequestor,
    "librarian": LibrarianRequestor,
    "knowledge": KnowledgeRequestor,
    "collection-management": CollectionManagementRequestor,
@@ -105,13 +107,31 @@ class DispatcherWrapper:

 class DispatcherManager:

-    def __init__(self, backend, config_receiver, prefix="api-gateway",
-                 queue_overrides=None):
+    def __init__(self, backend, config_receiver, auth,
+                 prefix="api-gateway", queue_overrides=None):
+        """
+        ``auth`` is required.  It flows into the Mux for first-frame
+        WebSocket authentication and into downstream dispatcher
+        construction.  There is no permissive default — constructing
+        a DispatcherManager without an authenticator would be a
+        silent downgrade to no-auth on the socket path.
+        """
+        if auth is None:
+            raise ValueError(
+                "DispatcherManager requires an 'auth' argument — there "
+                "is no no-auth mode"
+            )
+
        self.backend = backend
        self.config_receiver = config_receiver
        self.config_receiver.add_handler(self)
        self.prefix = prefix

+        # Gateway IamAuth — used by the socket Mux for first-frame
+        # auth and by any dispatcher that needs to resolve caller
+        # identity out-of-band.
+        self.auth = auth
+
        # Store queue overrides for global services
        # Format: {"config": {"request": "...", "response": "..."}, ...}
        self.queue_overrides = queue_overrides or {}
@@ -163,6 +183,15 @@ class DispatcherManager:
    def dispatch_global_service(self):
        return DispatcherWrapper(self.process_global_service)

+    def dispatch_auth_iam(self):
+        """Pre-configured IAM dispatcher for the gateway's auth
+        endpoints (login, bootstrap, change-password).  Pins the
+        kind to ``iam`` so these handlers don't have to supply URL
+        params the global dispatcher would expect."""
+        async def _process(data, responder):
+            return await self.invoke_global_service(data, responder, "iam")
+        return DispatcherWrapper(_process)
+
    def dispatch_core_export(self):
        return DispatcherWrapper(self.process_core_export)

@@ -314,7 +343,10 @@ class DispatcherManager:

    async def process_socket(self, ws, running, params):

-        dispatcher = Mux(self, ws, running)
+        # The mux self-authenticates via the first-frame protocol;
+        # pass the gateway's IamAuth so it can validate tokens
+        # without reaching back into the endpoint layer.
+        dispatcher = Mux(self, ws, running, auth=self.auth)

        return dispatcher

--- a/trustgraph-flow/trustgraph/gateway/dispatch/mux.py
+++ b/trustgraph-flow/trustgraph/gateway/dispatch/mux.py
@@ -16,11 +16,28 @@ MAX_QUEUE_SIZE = 10

 class Mux:

-    def __init__(self, dispatcher_manager, ws, running):
+    def __init__(self, dispatcher_manager, ws, running, auth):
+        """
+        ``auth`` is required — the Mux implements the first-frame
+        auth protocol described in ``iam.md`` and will refuse any
+        non-auth frame until an ``auth-ok`` has been issued.  There
+        is no no-auth mode.
+        """
+        if auth is None:
+            raise ValueError(
+                "Mux requires an 'auth' argument — there is no "
+                "no-auth mode"
+            )

        self.dispatcher_manager = dispatcher_manager
        self.ws = ws
        self.running = running
+        self.auth = auth
+
+        # Authenticated identity, populated by the first-frame auth
+        # protocol.  ``None`` means the socket is not yet
+        # authenticated; any non-auth frame is refused.
+        self.identity = None

        self.q = asyncio.Queue(maxsize=MAX_QUEUE_SIZE)

@@ -31,6 +48,41 @@ class Mux:
        if self.ws:
            await self.ws.close()

+    async def _handle_auth_frame(self, data):
+        """Process a ``{"type": "auth", "token": "..."}`` frame.
+        On success, updates ``self.identity`` and returns an
+        ``auth-ok`` response frame.  On failure, returns the masked
+        auth-failure frame.  Never raises — auth failures keep the
+        socket open so the client can retry without reconnecting
+        (important for browsers, which treat a handshake-time 401
+        as terminal)."""
+        token = data.get("token", "")
+        if not token:
+            await self.ws.send_json({
+                "type": "auth-failed",
+                "error": "auth failure",
+            })
+            return
+
+        class _Shim:
+            def __init__(self, tok):
+                self.headers = {"Authorization": f"Bearer {tok}"}
+
+        try:
+            identity = await self.auth.authenticate(_Shim(token))
+        except Exception:
+            await self.ws.send_json({
+                "type": "auth-failed",
+                "error": "auth failure",
+            })
+            return
+
+        self.identity = identity
+        await self.ws.send_json({
+            "type": "auth-ok",
+            "workspace": identity.workspace,
+        })
+
    async def receive(self, msg):

        request_id = None
@@ -38,6 +90,16 @@ class Mux:
        try:

            data = msg.json()
+
+            # In-band auth protocol: the client sends
+            # ``{"type": "auth", "token": "..."}`` as its first frame
+            # (and any time it wants to re-auth: JWT refresh, token
+            # rotation, etc).  Auth is always required on a Mux —
+            # there is no no-auth mode.
+            if isinstance(data, dict) and data.get("type") == "auth":
+                await self._handle_auth_frame(data)
+                return
+
            request_id = data.get("id")

            if "request" not in data:
@@ -46,9 +108,49 @@ class Mux:
            if "id" not in data:
                raise RuntimeError("Bad message")

+            # Reject all non-auth frames until an ``auth-ok`` has
+            # been issued.
+            if self.identity is None:
+                await self.ws.send_json({
+                    "id": request_id,
+                    "error": {
+                        "message": "auth failure",
+                        "type": "auth-required",
+                    },
+                    "complete": True,
+                })
+                return
+
+            # Workspace resolution.  Role workspace scope determines
+            # which target workspaces are permitted.  The resolved
+            # value is written to both the envelope and the inner
+            # request payload so clients don't have to repeat it
+            # per-message (same convenience HTTP callers get via
+            # enforce_workspace).
+            from ..capabilities import enforce_workspace
+            from aiohttp import web as _web
+
+            try:
+                enforce_workspace(data, self.identity)
+                inner = data.get("request")
+                if isinstance(inner, dict):
+                    enforce_workspace(inner, self.identity)
+            except _web.HTTPForbidden:
+                await self.ws.send_json({
+                    "id": request_id,
+                    "error": {
+                        "message": "access denied",
+                        "type": "access-denied",
+                    },
+                    "complete": True,
+                })
+                return
+
+            workspace = data["workspace"]
+
            await self.q.put((
                    data["id"],
-                    data.get("workspace", "default"),
+                    workspace,
                    data.get("flow"),
                    data["service"],
                    data["request"]
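Reviewer sketch of the first-frame gate added to `Mux.receive()` above, reduced to a stdlib-only state machine. `FakeAuth`, `FrameGate`, the token values and the `accepted` reply are hypothetical stand-ins for illustration; only the frame shapes (`auth-ok`, `auth-failed`, `auth-required`) are taken from the diff.

```python
import asyncio


class FakeAuth:
    """Hypothetical stand-in for IamAuth: accepts exactly one token."""

    async def authenticate(self, token):
        if token != "good-token":
            raise PermissionError("auth failure")
        return {"workspace": "default"}


class FrameGate:
    """Illustrative reduction of the gating order in Mux.receive()."""

    def __init__(self, auth):
        self.auth = auth
        self.identity = None
        self.sent = []  # frames that would be written to the socket

    async def receive(self, frame):
        # Auth frames are handled first, at any point in the session.
        if frame.get("type") == "auth":
            try:
                identity = await self.auth.authenticate(
                    frame.get("token", "")
                )
            except Exception:
                # Masked failure; the previous identity (if any) is kept
                # and the socket stays open for a retry.
                self.sent.append(
                    {"type": "auth-failed", "error": "auth failure"}
                )
                return
            self.identity = identity
            self.sent.append(
                {"type": "auth-ok", "workspace": identity["workspace"]}
            )
            return

        # Every other frame is refused until an identity is established.
        if self.identity is None:
            self.sent.append({
                "id": frame.get("id"),
                "error": {
                    "message": "auth failure",
                    "type": "auth-required",
                },
                "complete": True,
            })
            return

        # Placeholder for the real dispatch step (assumption).
        self.sent.append({"id": frame.get("id"), "accepted": True})


async def demo():
    gate = FrameGate(FakeAuth())
    await gate.receive({"id": "1", "service": "config", "request": {}})
    await gate.receive({"type": "auth", "token": "bad-token"})
    await gate.receive({"type": "auth", "token": "good-token"})
    await gate.receive({"id": "2", "service": "config", "request": {}})
    return gate.sent


frames = asyncio.run(demo())
```

In this sketch a failed re-auth leaves any earlier identity in place, which matches the handler in the diff: `self.identity` is only assigned after `authenticate()` returns.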
--- a/trustgraph-flow/trustgraph/gateway/endpoint/auth_endpoints.py
+++ b/trustgraph-flow/trustgraph/gateway/endpoint/auth_endpoints.py
@ -0,0 +1,115 @@
+"""
+Gateway auth endpoints.
+
+Three dedicated paths:
+  POST /api/v1/auth/login            — unauthenticated; username/password → JWT
+  POST /api/v1/auth/bootstrap         — unauthenticated; IAM bootstrap op
+  POST /api/v1/auth/change-password   — authenticated; any role
+
+These are the only IAM operations reachable without the
+``users:admin`` capability.  Everything else routes through
+``/api/v1/iam``, which is gated by ``users:admin``.
+"""
+
+import logging
+
+from aiohttp import web
+
+from .. capabilities import enforce, PUBLIC, AUTHENTICATED
+
+logger = logging.getLogger("auth-endpoints")
+logger.setLevel(logging.INFO)
+
+
+class AuthEndpoints:
+    """Groups the three auth-surface handlers.  Each forwards to the
+    IAM service via the existing ``IamRequestor`` dispatcher."""
+
+    def __init__(self, iam_dispatcher, auth):
+        self.iam = iam_dispatcher
+        self.auth = auth
+
+    async def start(self):
+        pass
+
+    def add_routes(self, app):
+        app.add_routes([
+            web.post("/api/v1/auth/login", self.login),
+            web.post("/api/v1/auth/bootstrap", self.bootstrap),
+            web.post(
+                "/api/v1/auth/change-password",
+                self.change_password,
+            ),
+        ])
+
+    async def _forward(self, body):
+        async def responder(x, fin):
+            pass
+        return await self.iam.process(body, responder)
+
+    async def login(self, request):
+        """Public.  Accepts {username, password, workspace?}.  Returns
+        {jwt, jwt_expires} on success; IAM's masked auth failure on
+        anything else."""
+        await enforce(request, self.auth, PUBLIC)
+        try:
+            body = await request.json()
+        except Exception:
+            return web.json_response(
+                {"error": "invalid json"}, status=400,
+            )
+        req = {
+            "operation": "login",
+            "username": body.get("username", ""),
+            "password": body.get("password", ""),
+            "workspace": body.get("workspace", ""),
+        }
+        resp = await self._forward(req)
+        if "error" in resp:
+            return web.json_response(
+                {"error": "auth failure"}, status=401,
+            )
+        return web.json_response(resp)
+
+    async def bootstrap(self, request):
+        """Public.  Valid only when IAM is running in bootstrap mode
+        with empty tables.  In every other case the IAM service
+        returns a masked auth-failure."""
+        await enforce(request, self.auth, PUBLIC)
+        resp = await self._forward({"operation": "bootstrap"})
+        if "error" in resp:
+            return web.json_response(
+                {"error": "auth failure"}, status=401,
+            )
+        return web.json_response(resp)
+
+    async def change_password(self, request):
+        """Authenticated (any role).  Accepts {current_password,
+        new_password}; user_id is taken from the authenticated
+        identity — the caller cannot change someone else's password
+        this way (reset-password is the admin path)."""
+        identity = await enforce(request, self.auth, AUTHENTICATED)
+        try:
+            body = await request.json()
+        except Exception:
+            return web.json_response(
+                {"error": "invalid json"}, status=400,
+            )
+        req = {
+            "operation": "change-password",
+            "user_id": identity.user_id,
+            "password": body.get("current_password", ""),
+            "new_password": body.get("new_password", ""),
+        }
+        resp = await self._forward(req)
+        if "error" in resp:
+            err_type = resp.get("error", {}).get("type", "")
+            if err_type == "auth-failed":
+                return web.json_response(
+                    {"error": "auth failure"}, status=401,
+                )
+            return web.json_response(
+                {"error": resp.get("error", {}).get("message", "error")},
+                status=400,
+            )
+        return web.json_response(resp)
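Reviewer sketch of the response mapping used by the three handlers above, written as pure functions so the masking rules are easy to see. It assumes, as `change_password` does, that an IAM error is a dict carrying `type` and `message`; the function names and the `invalid-argument` error type are hypothetical.

```python
def masked_result(resp):
    """login / bootstrap: any IAM error collapses to the same 401."""
    if "error" in resp:
        return 401, {"error": "auth failure"}
    return 200, resp


def change_password_result(resp):
    """change-password: only auth failures are masked; other errors
    surface their message with a 400."""
    if "error" in resp:
        err = resp["error"]
        if err.get("type", "") == "auth-failed":
            return 401, {"error": "auth failure"}
        return 400, {"error": err.get("message", "error")}
    return 200, resp


# Two different causes are indistinguishable to a login caller.
wrong_password = {"error": {"type": "auth-failed", "message": "bad password"}}
unknown_user = {"error": {"type": "auth-failed", "message": "no such user"}}
same = masked_result(wrong_password) == masked_result(unknown_user)

# Hypothetical non-auth error on change-password.
weak = {"error": {"type": "invalid-argument", "message": "password too short"}}
status, body = change_password_result(weak)
```

The asymmetry appears deliberate: an authenticated caller changing their own password can be told why a new password was rejected, while unauthenticated callers learn nothing about credential state.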
--- a/trustgraph-flow/trustgraph/gateway/endpoint/constant_endpoint.py
+++ b/trustgraph-flow/trustgraph/gateway/endpoint/constant_endpoint.py
@ -1,28 +1,27 @@

-import asyncio
-from aiohttp import web
-import uuid
 import logging

+from aiohttp import web
+
+from .. capabilities import enforce, enforce_workspace
+
 logger = logging.getLogger("endpoint")
 logger.setLevel(logging.INFO)

+
 class ConstantEndpoint:

-    def __init__(self, endpoint_path, auth, dispatcher):
+    def __init__(self, endpoint_path, auth, dispatcher, capability):

        self.path = endpoint_path
-
        self.auth = auth
-        self.operation = "service"
-
+        self.capability = capability
        self.dispatcher = dispatcher

    async def start(self):
        pass

    def add_routes(self, app):
-
        app.add_routes([
            web.post(self.path, self.handle),
        ])
@ -31,22 +30,14 @@ class ConstantEndpoint:

        logger.debug(f"Processing request: {request.path}")

-        try:
-            ht = request.headers["Authorization"]
-            tokens = ht.split(" ", 2)
-            if tokens[0] != "Bearer":
-                return web.HTTPUnauthorized()
-            token = tokens[1]
-        except:
-            token = ""
-
-        if not self.auth.permitted(token, self.operation):
-            return web.HTTPUnauthorized()
+        identity = await enforce(request, self.auth, self.capability)

        try:
-
            data = await request.json()

+            if identity is not None:
+                enforce_workspace(data, identity)
+
            async def responder(x, fin):
                pass

@ -54,10 +45,8 @@ class ConstantEndpoint:

            return web.json_response(resp)

+        except web.HTTPException:
+            raise
        except Exception as e:
-            logging.error(f"Exception: {e}")
-
-            return web.json_response(
-                { "error": str(e) }
-            )
-
+            logger.error(f"Exception: {e}", exc_info=True)
+            return web.json_response({"error": str(e)})
--- a/trustgraph-flow/trustgraph/gateway/endpoint/i18n.py
+++ b/trustgraph-flow/trustgraph/gateway/endpoint/i18n.py
@ -4,16 +4,18 @@ from aiohttp import web

 from trustgraph.i18n import get_language_pack

+from .. capabilities import enforce
+
 logger = logging.getLogger("endpoint")
 logger.setLevel(logging.INFO)


 class I18nPackEndpoint:

-    def __init__(self, endpoint_path: str, auth):
+    def __init__(self, endpoint_path: str, auth, capability):
        self.path = endpoint_path
        self.auth = auth
-        self.operation = "service"
+        self.capability = capability

    async def start(self):
        pass
@ -26,26 +28,13 @@ class I18nPackEndpoint:
    async def handle(self, request):
        logger.debug(f"Processing i18n pack request: {request.path}")

-        token = ""
-        try:
-            ht = request.headers["Authorization"]
-            tokens = ht.split(" ", 2)
-            if tokens[0] != "Bearer":
-                return web.HTTPUnauthorized()
-            token = tokens[1]
-        except Exception:
-            token = ""
-
-        if not self.auth.permitted(token, self.operation):
-            return web.HTTPUnauthorized()
+        await enforce(request, self.auth, self.capability)

        lang = request.match_info.get("lang") or "en"

-        # This is a path traversal defense, and is a critical sec defense.
-        # Do not remove!
+        # Path-traversal defense — critical, do not remove.
        if "/" in lang or ".." in lang:
            return web.HTTPBadRequest(reason="Invalid language code")

        pack = get_language_pack(lang)
-
        return web.json_response(pack)
--- a/trustgraph-flow/trustgraph/gateway/endpoint/manager.py
+++ b/trustgraph-flow/trustgraph/gateway/endpoint/manager.py
@ -8,72 +8,278 @@ from . variable_endpoint import VariableEndpoint
 from . socket import SocketEndpoint
 from . metrics import MetricsEndpoint
 from . i18n import I18nPackEndpoint
+from . auth_endpoints import AuthEndpoints
+
+from .. capabilities import PUBLIC, AUTHENTICATED

 from .. dispatch.manager import DispatcherManager

+
+# Capability required for each kind on the /api/v1/{kind} generic
+# endpoint (global services).  Coarse gating — the IAM bundle split
+# of "read vs write" per admin subsystem is not applied here because
+# this endpoint forwards an opaque operation in the body.  Writes
+# are the upper bound on what the endpoint can do, so we gate on
+# the write/admin capability.
+GLOBAL_KIND_CAPABILITY = {
+    "config": "config:write",
+    "flow": "flows:write",
+    "librarian": "documents:write",
+    "knowledge": "knowledge:write",
+    "collection-management": "collections:write",
+    # IAM endpoints land on /api/v1/iam and require the admin bundle.
+    # Login / bootstrap / change-password are served by
+    # AuthEndpoints, which handle their own gating (PUBLIC /
+    # AUTHENTICATED).
+    "iam": "users:admin",
+}
+
+
+# Capability required for each kind on the
+# /api/v1/flow/{flow}/service/{kind} endpoint (per-flow data-plane).
+FLOW_KIND_CAPABILITY = {
+    "agent": "agent",
+    "text-completion": "llm",
+    "prompt": "llm",
+    "mcp-tool": "mcp",
+    "graph-rag": "graph:read",
+    "document-rag": "documents:read",
+    "embeddings": "embeddings",
+    "graph-embeddings": "graph:read",
+    "document-embeddings": "documents:read",
+    "triples": "graph:read",
+    "rows": "rows:read",
+    "nlp-query": "rows:read",
+    "structured-query": "rows:read",
+    "structured-diag": "rows:read",
+    "row-embeddings": "rows:read",
+    "sparql": "graph:read",
+}
+
+
+# Capability for the streaming flow import/export endpoints,
+# keyed by the "kind" URL segment.
+FLOW_IMPORT_CAPABILITY = {
+    "triples": "graph:write",
+    "graph-embeddings": "graph:write",
+    "document-embeddings": "documents:write",
+    "entity-contexts": "documents:write",
+    "rows": "rows:write",
+}
+
+FLOW_EXPORT_CAPABILITY = {
+    "triples": "graph:read",
+    "graph-embeddings": "graph:read",
+    "document-embeddings": "documents:read",
+    "entity-contexts": "documents:read",
+}
+
+
+from .. capabilities import enforce, enforce_workspace
+import logging as _mgr_logging
+_mgr_logger = _mgr_logging.getLogger("endpoint")
+
+
+class _RoutedVariableEndpoint:
+    """HTTP endpoint whose required capability is looked up per
+    request from the URL's ``kind`` parameter.  Used for the two
+    generic dispatch paths (``/api/v1/{kind}`` and
+    ``/api/v1/flow/{flow}/service/{kind}``).  Self-contained rather
+    than subclassing ``VariableEndpoint`` to avoid mutating shared
+    state across concurrent requests."""
+
+    def __init__(self, endpoint_path, auth, dispatcher, capability_map):
+        self.path = endpoint_path
+        self.auth = auth
+        self.dispatcher = dispatcher
+        self._capability_map = capability_map
+
+    async def start(self):
+        pass
+
+    def add_routes(self, app):
+        app.add_routes([web.post(self.path, self.handle)])
+
+    async def handle(self, request):
+        kind = request.match_info.get("kind", "")
+        cap = self._capability_map.get(kind)
+        if cap is None:
+            return web.json_response(
+                {"error": "unknown kind"}, status=404,
+            )
+
+        identity = await enforce(request, self.auth, cap)
+
+        try:
+            data = await request.json()
+            if identity is not None:
+                enforce_workspace(data, identity)
+
+            async def responder(x, fin):
+                pass
+
+            resp = await self.dispatcher.process(
+                data, responder, request.match_info,
+            )
+            return web.json_response(resp)
+
+        except web.HTTPException:
+            raise
+        except Exception as e:
+            _mgr_logger.error(f"Exception: {e}", exc_info=True)
+            return web.json_response({"error": str(e)})
+
+
+class _RoutedSocketEndpoint:
+    """WebSocket endpoint whose required capability is looked up per
+    request from the URL's ``kind`` parameter.  Used for the flow
+    import/export streaming endpoints."""
+
+    def __init__(self, endpoint_path, auth, dispatcher, capability_map):
+        self.path = endpoint_path
+        self.auth = auth
+        self.dispatcher = dispatcher
+        self._capability_map = capability_map
+
+    async def start(self):
+        pass
+
+    def add_routes(self, app):
+        app.add_routes([web.get(self.path, self.handle)])
+
+    async def handle(self, request):
+        from .. capabilities import check, auth_failure, access_denied
+
+        kind = request.match_info.get("kind", "")
+        cap = self._capability_map.get(kind)
+        if cap is None:
+            return web.json_response(
+                {"error": "unknown kind"}, status=404,
+            )
+
+        token = request.query.get("token", "")
+        if not token:
+            return auth_failure()
+
+        from . socket import _QueryTokenRequest
+        try:
+            identity = await self.auth.authenticate(
+                _QueryTokenRequest(token)
+            )
+        except web.HTTPException as e:
+            return e
+        if not check(identity, cap):
+            return access_denied()
+
+        # Delegate the websocket handling to a fresh SocketEndpoint
+        # bound to the resolved capability.  A per-request instance
+        # means no shared state is mutated across concurrent requests.
+        ws_ep = SocketEndpoint(
+            endpoint_path=self.path,
+            auth=self.auth,
+            dispatcher=self.dispatcher,
+            capability=cap,
+        )
+        return await ws_ep.handle(request)
+
+
 class EndpointManager:

    def __init__(
-            self, dispatcher_manager, auth, prometheus_url, timeout=600
+            self, dispatcher_manager, auth, prometheus_url, timeout=600,
    ):

        self.dispatcher_manager = dispatcher_manager
        self.timeout = timeout

-        self.services = {
-        }
-
        self.endpoints = [
+
+            # Auth surface (public / any authenticated user).  Must be
+            # registered before the generic /api/v1/{kind} route so it
+            # wins the match for /api/v1/auth/* paths: aiohttp resolves
+            # routes in registration order, so it goes first in the list.
+            AuthEndpoints(
+                iam_dispatcher=dispatcher_manager.dispatch_auth_iam(),
+                auth=auth,
+            ),
+
            I18nPackEndpoint(
-                endpoint_path = "/api/v1/i18n/packs/{lang}",
-                auth = auth,
+                endpoint_path="/api/v1/i18n/packs/{lang}",
+                auth=auth,
+                capability=PUBLIC,
            ),
            MetricsEndpoint(
-                endpoint_path = "/api/metrics",
-                prometheus_url = prometheus_url,
-                auth = auth,
+                endpoint_path="/api/metrics",
+                prometheus_url=prometheus_url,
+                auth=auth,
+                capability="metrics:read",
            ),
-            VariableEndpoint(
-                endpoint_path = "/api/v1/{kind}", auth = auth,
-                dispatcher = dispatcher_manager.dispatch_global_service(),
+
+            # Global services: capability chosen per-kind.
+            _RoutedVariableEndpoint(
+                endpoint_path="/api/v1/{kind}",
+                auth=auth,
+                dispatcher=dispatcher_manager.dispatch_global_service(),
+                capability_map=GLOBAL_KIND_CAPABILITY,
            ),
+
+            # /api/v1/socket: WebSocket handshake accepts
+            # unconditionally; the Mux dispatcher runs the
+            # first-frame auth protocol.  Handshake-time 401s break
+            # browser reconnection, so authentication is always
+            # in-band for this endpoint.
            SocketEndpoint(
-                endpoint_path = "/api/v1/socket",
-                auth = auth,
-                dispatcher = dispatcher_manager.dispatch_socket()
+                endpoint_path="/api/v1/socket",
+                auth=auth,
+                dispatcher=dispatcher_manager.dispatch_socket(),
+                capability=AUTHENTICATED,  # informational only; bypassed
+                in_band_auth=True,
            ),
-            VariableEndpoint(
-                endpoint_path = "/api/v1/flow/{flow}/service/{kind}",
-                auth = auth,
-                dispatcher = dispatcher_manager.dispatch_flow_service(),
+
+            # Per-flow request/response services — capability per kind.
+            _RoutedVariableEndpoint(
+                endpoint_path="/api/v1/flow/{flow}/service/{kind}",
+                auth=auth,
+                dispatcher=dispatcher_manager.dispatch_flow_service(),
+                capability_map=FLOW_KIND_CAPABILITY,
            ),
-            SocketEndpoint(
-                endpoint_path = "/api/v1/flow/{flow}/import/{kind}",
-                auth = auth,
-                dispatcher = dispatcher_manager.dispatch_flow_import()
+
+            # Per-flow streaming import/export — capability per kind.
+            _RoutedSocketEndpoint(
+                endpoint_path="/api/v1/flow/{flow}/import/{kind}",
+                auth=auth,
+                dispatcher=dispatcher_manager.dispatch_flow_import(),
+                capability_map=FLOW_IMPORT_CAPABILITY,
            ),
-            SocketEndpoint(
-                endpoint_path = "/api/v1/flow/{flow}/export/{kind}",
-                auth = auth,
-                dispatcher = dispatcher_manager.dispatch_flow_export()
+            _RoutedSocketEndpoint(
+                endpoint_path="/api/v1/flow/{flow}/export/{kind}",
+                auth=auth,
+                dispatcher=dispatcher_manager.dispatch_flow_export(),
+                capability_map=FLOW_EXPORT_CAPABILITY,
+            ),
+
+            StreamEndpoint(
+                endpoint_path="/api/v1/import-core",
+                auth=auth,
+                method="POST",
+                dispatcher=dispatcher_manager.dispatch_core_import(),
+                # Cross-subject import — require the admin bundle via a
+                # single representative capability.
+                capability="users:admin",
            ),
            StreamEndpoint(
-                endpoint_path = "/api/v1/import-core",
-                auth = auth,
-                method = "POST",
-                dispatcher = dispatcher_manager.dispatch_core_import(),
+                endpoint_path="/api/v1/export-core",
+                auth=auth,
+                method="GET",
+                dispatcher=dispatcher_manager.dispatch_core_export(),
+                capability="users:admin",
            ),
            StreamEndpoint(
-                endpoint_path = "/api/v1/export-core",
-                auth = auth,
-                method = "GET",
-                dispatcher = dispatcher_manager.dispatch_core_export(),
-            ),
-            StreamEndpoint(
-                endpoint_path = "/api/v1/document-stream",
-                auth = auth,
-                method = "GET",
-                dispatcher = dispatcher_manager.dispatch_document_stream(),
+                endpoint_path="/api/v1/document-stream",
+                auth=auth,
+                method="GET",
+                dispatcher=dispatcher_manager.dispatch_document_stream(),
+                capability="documents:read",
            ),
        ]

@ -84,4 +290,3 @@ class EndpointManager:
    async def start(self):
        for ep in self.endpoints:
            await ep.start()
-
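Reviewer sketch of the per-kind gating order in `_RoutedVariableEndpoint.handle()`: capability lookup first, then authentication, then the capability check. The map is a subset of `FLOW_KIND_CAPABILITY` from the diff. Modelling an identity as a plain set of capability strings is an assumption; the real `enforce()` / `check()` in `capabilities` are not shown in this section and may expand roles differently.

```python
# Subset of FLOW_KIND_CAPABILITY, copied from the diff.
FLOW_KIND_CAPABILITY = {
    "agent": "agent",
    "text-completion": "llm",
    "graph-rag": "graph:read",
    "triples": "graph:read",
    "rows": "rows:read",
}


def route_decision(capability_map, kind, held_capabilities):
    """Return the HTTP status the endpoint would produce.

    held_capabilities is None for an unauthenticated caller, otherwise
    a set of capability strings (hypothetical identity model).
    """
    cap = capability_map.get(kind)
    if cap is None:
        # Unknown kinds are rejected before any credential is examined.
        return 404
    if held_capabilities is None:
        return 401
    if cap not in held_capabilities:
        return 403
    return 200


reader = {"graph:read", "rows:read"}
results = {
    "unknown": route_decision(FLOW_KIND_CAPABILITY, "no-such-kind", reader),
    "anonymous": route_decision(FLOW_KIND_CAPABILITY, "triples", None),
    "denied": route_decision(FLOW_KIND_CAPABILITY, "agent", reader),
    "allowed": route_decision(FLOW_KIND_CAPABILITY, "graph-rag", reader),
}
```

Because the lookup runs before authentication, a kind missing from the map can never be reached, whatever the caller holds; adding a new service kind therefore requires adding a map entry as well.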
--- a/trustgraph-flow/trustgraph/gateway/endpoint/metrics.py
+++ b/trustgraph-flow/trustgraph/gateway/endpoint/metrics.py
@ -10,17 +10,19 @@ import asyncio
 import uuid
 import logging

+from .. capabilities import enforce
+
 logger = logging.getLogger("endpoint")
 logger.setLevel(logging.INFO)

 class MetricsEndpoint:

-    def __init__(self, prometheus_url, endpoint_path, auth):
+    def __init__(self, prometheus_url, endpoint_path, auth, capability):

        self.prometheus_url = prometheus_url
        self.path = endpoint_path
        self.auth = auth
-        self.operation = "service"
+        self.capability = capability

    async def start(self):
        pass
@ -35,17 +37,7 @@ class MetricsEndpoint:

        logger.debug(f"Processing metrics request: {request.path}")

-        try:
-            ht = request.headers["Authorization"]
-            tokens = ht.split(" ", 2)
-            if tokens[0] != "Bearer":
-                return web.HTTPUnauthorized()
-            token = tokens[1]
-        except:
-            token = ""
-
-        if not self.auth.permitted(token, self.operation):
-            return web.HTTPUnauthorized()
+        await enforce(request, self.auth, self.capability)

        path = request.match_info["path"]
        url = (
--- a/trustgraph-flow/trustgraph/gateway/endpoint/socket.py
+++ b/trustgraph-flow/trustgraph/gateway/endpoint/socket.py
@ -4,6 +4,9 @@ from aiohttp import web, WSMsgType
 import logging

 from .. running import Running
+from .. capabilities import (
+    PUBLIC, AUTHENTICATED, check, auth_failure, access_denied,
+)

 logger = logging.getLogger("socket")
 logger.setLevel(logging.INFO)
@ -11,12 +14,25 @@ logger.setLevel(logging.INFO)
 class SocketEndpoint:

    def __init__(
-            self, endpoint_path, auth, dispatcher,
+            self, endpoint_path, auth, dispatcher, capability,
+            in_band_auth=False,
    ):
+        """
+        ``in_band_auth=True`` skips the handshake-time auth check.
+        The WebSocket handshake always succeeds; the dispatcher is
+        expected to gate itself via the first-frame auth protocol
+        (see ``Mux``).
+
+        This avoids the browser problem where a 401 on the handshake
+        is treated as permanent and prevents reconnection, and lets
+        long-lived sockets refresh their credential mid-session by
+        sending a new auth frame.
+        """

        self.path = endpoint_path
        self.auth = auth
-        self.operation = "socket"
+        self.capability = capability
+        self.in_band_auth = in_band_auth

        self.dispatcher = dispatcher

@ -61,15 +77,29 @@ class SocketEndpoint:
            raise
        
    async def handle(self, request):
-        """Enhanced handler with better cleanup"""
-        try:
-            token = request.query['token']
-        except:
-            token = ""
+        """Enhanced handler with better cleanup.
+
+        Auth: unless ``in_band_auth`` is set, clients pass the bearer
+        token on the ``?token=...`` query string; we wrap it into a
+        synthetic Authorization header before delegating to the
+        standard auth path so the IAM-backed flow (JWT / API key)
+        applies uniformly.  With ``in_band_auth`` the handshake is
+        not checked and ``Mux`` runs the first-frame auth protocol."""
+
+        if not self.in_band_auth and self.capability != PUBLIC:
+            token = request.query.get("token", "")
+            if not token:
+                return auth_failure()
+            try:
+                identity = await self.auth.authenticate(
+                    _QueryTokenRequest(token)
+                )
+            except web.HTTPException as e:
+                return e
+            if self.capability != AUTHENTICATED:
+                if not check(identity, self.capability):
+                    return access_denied()

-        if not self.auth.permitted(token, self.operation):
-            return web.HTTPUnauthorized()
-        
        # 50MB max message size
        ws = web.WebSocketResponse(max_msg_size=52428800)

@ -150,3 +180,11 @@ class SocketEndpoint:
            web.get(self.path, self.handle),
        ])

+
+class _QueryTokenRequest:
+    """Minimal shim that exposes headers["Authorization"] to
+    IamAuth.authenticate(), derived from a query-string token."""
+
+    def __init__(self, token):
+        self.headers = {"Authorization": f"Bearer {token}"}
+
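Reviewer sketch of what the `_QueryTokenRequest` shim buys: a query-string token looks exactly like a bearer header to whatever parses it, so one code path serves HTTP and WebSocket callers. `bearer_token` and `credential_kind` are hypothetical helpers, not the `IamAuth` implementation; the shape rule (three dot-separated segments means JWT) follows the commit message.

```python
class QueryTokenRequest:
    """Same shape as the shim in the diff: headers only."""

    def __init__(self, token):
        self.headers = {"Authorization": f"Bearer {token}"}


def bearer_token(request):
    """Extract the credential from an Authorization header, or None."""
    header = request.headers.get("Authorization", "")
    scheme, _, token = header.partition(" ")
    if scheme != "Bearer" or not token:
        return None
    return token


def credential_kind(token):
    """Classify by shape only; no signature or lookup is performed."""
    parts = token.split(".")
    if len(parts) == 3 and all(parts):
        return "jwt"
    return "api-key"


from_query = bearer_token(QueryTokenRequest("aaa.bbb.ccc"))
kind_jwt = credential_kind(from_query)
kind_key = credential_kind("opaque-key-value")
empty = bearer_token(QueryTokenRequest(""))
```

Note that a token in the query string can end up in access logs and proxies, which is one more argument for the in-band first-frame protocol on long-lived sockets.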
--- a/trustgraph-flow/trustgraph/gateway/endpoint/stream_endpoint.py
+++ b/trustgraph-flow/trustgraph/gateway/endpoint/stream_endpoint.py
@ -1,82 +1,64 @@

-import asyncio
-from aiohttp import web
 import logging

+from aiohttp import web
+
+from .. capabilities import enforce
+
 logger = logging.getLogger("endpoint")
 logger.setLevel(logging.INFO)

+
 class StreamEndpoint:

-    def __init__(self, endpoint_path, auth, dispatcher, method="POST"):
-
+    def __init__(
+            self, endpoint_path, auth, dispatcher, capability, method="POST",
+    ):
        self.path = endpoint_path
-
        self.auth = auth
-        self.operation = "service"
+        self.capability = capability
        self.method = method
-
        self.dispatcher = dispatcher

    async def start(self):
        pass

    def add_routes(self, app):
-
        if self.method == "POST":
-            app.add_routes([
-                web.post(self.path, self.handle),
-            ])
+            app.add_routes([web.post(self.path, self.handle)])
        elif self.method == "GET":
-            app.add_routes([
-                web.get(self.path, self.handle),
-            ])
+            app.add_routes([web.get(self.path, self.handle)])
        else:
-            raise RuntimeError("Bad method" + self.method)
+            raise RuntimeError("Bad method " + self.method)

    async def handle(self, request):

        logger.debug(f"Processing request: {request.path}")

-        try:
-            ht = request.headers["Authorization"]
-            tokens = ht.split(" ", 2)
-            if tokens[0] != "Bearer":
-                return web.HTTPUnauthorized()
-            token = tokens[1]
-        except:
-            token = ""
-
-        if not self.auth.permitted(token, self.operation):
-            return web.HTTPUnauthorized()
+        await enforce(request, self.auth, self.capability)

        try:
-
            data = request.content

            async def error(err):
-                return web.HTTPInternalServerError(text = err)
+                return web.HTTPInternalServerError(text=err)

            async def ok(
-                    status=200, reason="OK", type="application/octet-stream"
+                    status=200, reason="OK",
+                    type="application/octet-stream",
            ):
                response = web.StreamResponse(
-                    status = status, reason = reason,
-                    headers = {"Content-Type": type}
+                    status=status, reason=reason,
+                    headers={"Content-Type": type},
                )
                await response.prepare(request)
                return response

-            resp = await self.dispatcher.process(
-                data, error, ok, request
-            )
-
+            resp = await self.dispatcher.process(data, error, ok, request)
            return resp

+        except web.HTTPException:
+            raise
        except Exception as e:
-            logging.error(f"Exception: {e}")
-
-            return web.json_response(
-                { "error": str(e) }
-            )
-
+            logger.error(f"Exception: {e}", exc_info=True)
+            return web.json_response({"error": str(e)})
--- a/trustgraph-flow/trustgraph/gateway/endpoint/variable_endpoint.py
+++ b/trustgraph-flow/trustgraph/gateway/endpoint/variable_endpoint.py
@ -1,27 +1,27 @@

-import asyncio
-from aiohttp import web
 import logging

+from aiohttp import web
+
+from .. capabilities import enforce, enforce_workspace
+
 logger = logging.getLogger("endpoint")
 logger.setLevel(logging.INFO)

+
 class VariableEndpoint:

-    def __init__(self, endpoint_path, auth, dispatcher):
+    def __init__(self, endpoint_path, auth, dispatcher, capability):

        self.path = endpoint_path
-
        self.auth = auth
-        self.operation = "service"
-
+        self.capability = capability
        self.dispatcher = dispatcher

    async def start(self):
        pass

    def add_routes(self, app):
-
        app.add_routes([
            web.post(self.path, self.handle),
        ])
@ -30,35 +30,25 @@ class VariableEndpoint:

        logger.debug(f"Processing request: {request.path}")

-        try:
-            ht = request.headers["Authorization"]
-            tokens = ht.split(" ", 2)
-            if tokens[0] != "Bearer":
-                return web.HTTPUnauthorized()
-            token = tokens[1]
-        except:
-            token = ""
-
-        if not self.auth.permitted(token, self.operation):
-            return web.HTTPUnauthorized()
+        identity = await enforce(request, self.auth, self.capability)

        try:
-
            data = await request.json()

+            if identity is not None:
+                enforce_workspace(data, identity)
+
            async def responder(x, fin):
                pass

            resp = await self.dispatcher.process(
-                data, responder, request.match_info
+                data, responder, request.match_info,
            )

            return web.json_response(resp)

+        except web.HTTPException:
+            raise
        except Exception as e:
-            logging.error(f"Exception: {e}")
-
-            return web.json_response(
-                { "error": str(e) }
-            )
-
+            logger.error(f"Exception: {e}", exc_info=True)
+            return web.json_response({"error": str(e)})
--- a/trustgraph-flow/trustgraph/gateway/service.py
+++ b/trustgraph-flow/trustgraph/gateway/service.py
@ -12,7 +12,7 @@ import os
 from trustgraph.base.logging import setup_logging, add_logging_args
 from trustgraph.base.pubsub import get_pubsub, add_pubsub_args

-from . auth import Authenticator
+from . auth import IamAuth
 from . config.receiver import ConfigReceiver
 from . dispatch.manager import DispatcherManager

@ -35,7 +35,6 @@ default_prometheus_url = os.getenv("PROMETHEUS_URL", "http://prometheus:9090")
 default_pulsar_api_key = os.getenv("PULSAR_API_KEY", None)
 default_timeout = 600
 default_port = 8088
-default_api_token = os.getenv("GATEWAY_SECRET", "")

 class Api:

@@ -60,13 +59,14 @@ class Api:
        if not self.prometheus_url.endswith("/"):
            self.prometheus_url += "/"

-        api_token = config.get("api_token", default_api_token)
-
-        # Token not set, or token equal empty string means no auth
-        if api_token:
-            self.auth = Authenticator(token=api_token)
-        else:
-            self.auth = Authenticator(allow_all=True)
+        # IAM-backed authentication.  The legacy GATEWAY_SECRET
+        # shared-token path has been removed — there is no
+        # "open for everyone" fallback.  The gateway cannot
+        # authenticate any request until IAM is reachable.
+        self.auth = IamAuth(
+            backend=self.pubsub_backend,
+            id=config.get("id", "api-gateway"),
+        )

        self.config_receiver = ConfigReceiver(self.pubsub_backend)

@@ -118,6 +118,7 @@ class Api:
            config_receiver = self.config_receiver,
            prefix = "gateway",
            queue_overrides = queue_overrides,
+            auth = self.auth,
        )

        self.endpoint_manager = EndpointManager(
@@ -132,12 +133,18 @@ class Api:
        ]

    async def app_factory(self):
-        
+
        self.app = web.Application(
            middlewares=[],
            client_max_size=256 * 1024 * 1024
        )

+        # Fetch IAM signing public key before accepting traffic.
+        # Blocks for a bounded retry window; the gateway starts even
+        # if IAM is still unreachable (JWT validation will 401 until
+        # the key is available).
+        await self.auth.start()
+
        await self.config_receiver.start()

        for ep in self.endpoints:
@@ -189,12 +196,6 @@ def run():
        help=f'API request timeout in seconds (default: {default_timeout})',
    )

-    parser.add_argument(
-        '--api-token',
-        default=default_api_token,
-        help=f'Secret API token (default: no auth)',
-    )
-
    add_logging_args(parser)

    parser.add_argument(
--- a/trustgraph-flow/trustgraph/iam/__init__.py
+++ b/trustgraph-flow/trustgraph/iam/__init__.py
--- a/trustgraph-flow/trustgraph/iam/service/__init__.py
+++ b/trustgraph-flow/trustgraph/iam/service/__init__.py
@@ -0,0 +1 @@
+from . service import *
--- a/trustgraph-flow/trustgraph/iam/service/__main__.py
+++ b/trustgraph-flow/trustgraph/iam/service/__main__.py
@@ -0,0 +1,4 @@
+
+from . service import run
+
+run()
--- a/trustgraph-flow/trustgraph/iam/service/iam.py
+++ b/trustgraph-flow/trustgraph/iam/service/iam.py
--- a/trustgraph-flow/trustgraph/iam/service/service.py
+++ b/trustgraph-flow/trustgraph/iam/service/service.py
@@ -0,0 +1,210 @@
+"""
+IAM service processor.  Terminates the IAM request queue and forwards
+each request to the IamService business logic, then returns the
+response on the IAM response queue.
+
+Shape mirrors trustgraph.config.service.
+"""
+
+import logging
+
+from trustgraph.schema import Error
+from trustgraph.schema import IamRequest, IamResponse
+from trustgraph.schema import iam_request_queue, iam_response_queue
+
+from trustgraph.base import AsyncProcessor, Consumer, Producer
+from trustgraph.base import ConsumerMetrics, ProducerMetrics
+from trustgraph.base.cassandra_config import (
+    add_cassandra_args, resolve_cassandra_config,
+)
+
+from . iam import IamService
+
+logger = logging.getLogger(__name__)
+
+default_ident = "iam-svc"
+
+default_iam_request_queue = iam_request_queue
+default_iam_response_queue = iam_response_queue
+
+
+class Processor(AsyncProcessor):
+
+    def __init__(self, **params):
+
+        iam_req_q = params.get(
+            "iam_request_queue", default_iam_request_queue,
+        )
+        iam_resp_q = params.get(
+            "iam_response_queue", default_iam_response_queue,
+        )
+
+        bootstrap_mode = params.get("bootstrap_mode")
+        bootstrap_token = params.get("bootstrap_token")
+
+        if bootstrap_mode not in ("token", "bootstrap"):
+            raise RuntimeError(
+                "iam-svc: --bootstrap-mode is required.  Set to 'token' "
+                "(with --bootstrap-token) for production, or 'bootstrap' "
+                "to enable the explicit bootstrap operation over the "
+                "pub/sub bus (dev / quick-start only, not safe under "
+                "public exposure).  Refusing to start."
+            )
+        if bootstrap_mode == "token" and not bootstrap_token:
+            raise RuntimeError(
+                "iam-svc: --bootstrap-mode=token requires "
+                "--bootstrap-token.  Refusing to start."
+            )
+        if bootstrap_mode == "bootstrap" and bootstrap_token:
+            raise RuntimeError(
+                "iam-svc: --bootstrap-token is not accepted when "
+                "--bootstrap-mode=bootstrap.  Ambiguous intent.  "
+                "Refusing to start."
+            )
+
+        self.bootstrap_mode = bootstrap_mode
+        self.bootstrap_token = bootstrap_token
+
+        cassandra_host = params.get("cassandra_host")
+        cassandra_username = params.get("cassandra_username")
+        cassandra_password = params.get("cassandra_password")
+
+        hosts, username, password, keyspace = resolve_cassandra_config(
+            host=cassandra_host,
+            username=cassandra_username,
+            password=cassandra_password,
+            default_keyspace="iam",
+        )
+
+        self.cassandra_host = hosts
+        self.cassandra_username = username
+        self.cassandra_password = password
+
+        super().__init__(
+            **params | {
+                "iam_request_schema": IamRequest.__name__,
+                "iam_response_schema": IamResponse.__name__,
+                "cassandra_host": self.cassandra_host,
+                "cassandra_username": self.cassandra_username,
+                "cassandra_password": self.cassandra_password,
+            }
+        )
+
+        iam_request_metrics = ConsumerMetrics(
+            processor=self.id, flow=None, name="iam-request",
+        )
+        iam_response_metrics = ProducerMetrics(
+            processor=self.id, flow=None, name="iam-response",
+        )
+
+        self.iam_request_topic = iam_req_q
+
+        self.iam_request_consumer = Consumer(
+            taskgroup=self.taskgroup,
+            backend=self.pubsub,
+            flow=None,
+            topic=iam_req_q,
+            subscriber=self.id,
+            schema=IamRequest,
+            handler=self.on_iam_request,
+            metrics=iam_request_metrics,
+        )
+
+        self.iam_response_producer = Producer(
+            backend=self.pubsub,
+            topic=iam_resp_q,
+            schema=IamResponse,
+            metrics=iam_response_metrics,
+        )
+
+        self.iam = IamService(
+            host=self.cassandra_host,
+            username=self.cassandra_username,
+            password=self.cassandra_password,
+            keyspace=keyspace,
+            bootstrap_mode=self.bootstrap_mode,
+            bootstrap_token=self.bootstrap_token,
+        )
+
+        logger.info(
+            f"IAM service initialised (bootstrap-mode={self.bootstrap_mode})"
+        )
+
+    async def start(self):
+        await self.pubsub.ensure_topic(self.iam_request_topic)
+        # Token-mode auto-bootstrap runs before we accept requests so
+        # the first inbound call always sees a populated table.
+        await self.iam.auto_bootstrap_if_token_mode()
+        await self.iam_request_consumer.start()
+
+    async def on_iam_request(self, msg, consumer, flow):
+
+        id = None
+        try:
+            v = msg.value()
+            id = msg.properties()["id"]
+            logger.debug(
+                f"Handling IAM request {id} op={v.operation!r}"
+            )
+            resp = await self.iam.handle(v)
+            await self.iam_response_producer.send(
+                resp, properties={"id": id},
+            )
+        except Exception as e:
+            logger.error(
+                f"IAM request failed: {type(e).__name__}: {e}",
+                exc_info=True,
+            )
+            resp = IamResponse(
+                error=Error(type="internal-error", message=str(e)),
+            )
+            if id is not None:
+                await self.iam_response_producer.send(
+                    resp, properties={"id": id},
+                )
+
+    @staticmethod
+    def add_args(parser):
+        AsyncProcessor.add_args(parser)
+
+        parser.add_argument(
+            "--iam-request-queue",
+            default=default_iam_request_queue,
+            help=f"IAM request queue (default: {default_iam_request_queue})",
+        )
+        parser.add_argument(
+            "--iam-response-queue",
+            default=default_iam_response_queue,
+            help=f"IAM response queue (default: {default_iam_response_queue})",
+        )
+        parser.add_argument(
+            "--bootstrap-mode",
+            default=None,
+            choices=["token", "bootstrap"],
+            help=(
+                "IAM bootstrap mode (required).  "
+                "'token' = operator supplies the initial admin API "
+                "key via --bootstrap-token; auto-seeds on first start, "
+                "bootstrap operation refused.  "
+                "'bootstrap' = bootstrap operation is live over the "
+                "bus until tables are populated; a token is generated "
+                "and returned by tg-bootstrap-iam.  Unsafe to run "
+                "'bootstrap' mode with public exposure."
+            ),
+        )
+        parser.add_argument(
+            "--bootstrap-token",
+            default=None,
+            help=(
+                "Initial admin API key plaintext, required when "
+                "--bootstrap-mode=token.  Treat as a one-time "
+                "credential: the operator should rotate to a new key "
+                "and revoke this one after first use."
+            ),
+        )
+
+        add_cassandra_args(parser)
+
+
+def run():
+    Processor.launch(default_ident, __doc__)
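Reviewer note: the three startup checks in `Processor.__init__` above reduce to a small truth table over `(bootstrap_mode, bootstrap_token)`. The following is a minimal standalone sketch of that table, not code from this PR. The helper names `validate_bootstrap_args` and `is_valid` are hypothetical and the error messages are abbreviated.

```python
def validate_bootstrap_args(bootstrap_mode, bootstrap_token):
    """Hypothetical helper mirroring the checks in Processor.__init__."""
    # No permissive default: a missing or unknown mode is refused.
    if bootstrap_mode not in ("token", "bootstrap"):
        raise RuntimeError("--bootstrap-mode is required")
    # Token mode needs the operator-supplied initial admin key.
    if bootstrap_mode == "token" and not bootstrap_token:
        raise RuntimeError("--bootstrap-mode=token requires --bootstrap-token")
    # Bootstrap mode generates its own token, so supplying one is ambiguous.
    if bootstrap_mode == "bootstrap" and bootstrap_token:
        raise RuntimeError("--bootstrap-token not accepted in bootstrap mode")
    return bootstrap_mode, bootstrap_token


def is_valid(bootstrap_mode, bootstrap_token):
    """Return True when the combination would let the service start."""
    try:
        validate_bootstrap_args(bootstrap_mode, bootstrap_token)
        return True
    except RuntimeError:
        return False
```

Only two combinations start the service: `token` with a non-empty token, and `bootstrap` with no token. Note that an empty-string token counts as absent in both branches, because the checks use truthiness.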
--- a/trustgraph-flow/trustgraph/tables/iam.py
+++ b/trustgraph-flow/trustgraph/tables/iam.py
@@ -0,0 +1,422 @@
+"""
+IAM Cassandra table store.
+
+Tables:
+  - iam_workspaces (id primary key)
+  - iam_users (id primary key) + iam_users_by_username lookup table
+    (workspace, username) -> user_id
+  - iam_api_keys (key_hash primary key), secondary indexes on user_id, id
+  - iam_signing_keys (kid primary key), Ed25519 keypairs for JWT signing
+
+See docs/tech-specs/iam-protocol.md for the wire-level context.
+"""
+
+import logging
+
+from cassandra.cluster import Cluster
+from cassandra.auth import PlainTextAuthProvider
+from ssl import SSLContext, PROTOCOL_TLSv1_2
+
+from . cassandra_async import async_execute
+
+logger = logging.getLogger(__name__)
+
+
+class IamTableStore:
+
+    def __init__(
+            self,
+            cassandra_host, cassandra_username, cassandra_password,
+            keyspace,
+    ):
+        self.keyspace = keyspace
+
+        logger.info("IAM: connecting to Cassandra...")
+
+        if isinstance(cassandra_host, str):
+            cassandra_host = [h.strip() for h in cassandra_host.split(",")]
+
+        if cassandra_username and cassandra_password:
+            ssl_context = SSLContext(PROTOCOL_TLSv1_2)
+            auth_provider = PlainTextAuthProvider(
+                username=cassandra_username, password=cassandra_password,
+            )
+            self.cluster = Cluster(
+                cassandra_host,
+                auth_provider=auth_provider,
+                ssl_context=ssl_context,
+            )
+        else:
+            self.cluster = Cluster(cassandra_host)
+
+        self.cassandra = self.cluster.connect()
+
+        logger.info("IAM: connected.")
+
+        self._ensure_schema()
+        self._prepare_statements()
+
+    def _ensure_schema(self):
+        # FIXME: Replication factor should be configurable.
+        self.cassandra.execute(f"""
+            create keyspace if not exists {self.keyspace}
+                with replication = {{
+                    'class' : 'SimpleStrategy',
+                    'replication_factor' : 1
+                }};
+        """)
+        self.cassandra.set_keyspace(self.keyspace)
+
+        self.cassandra.execute("""
+            CREATE TABLE IF NOT EXISTS iam_workspaces (
+                id text PRIMARY KEY,
+                name text,
+                enabled boolean,
+                created timestamp
+            );
+        """)
+
+        self.cassandra.execute("""
+            CREATE TABLE IF NOT EXISTS iam_users (
+                id text PRIMARY KEY,
+                workspace text,
+                username text,
+                name text,
+                email text,
+                password_hash text,
+                roles set<text>,
+                enabled boolean,
+                must_change_password boolean,
+                created timestamp
+            );
+        """)
+
+        self.cassandra.execute("""
+            CREATE TABLE IF NOT EXISTS iam_users_by_username (
+                workspace text,
+                username text,
+                user_id text,
+                PRIMARY KEY ((workspace), username)
+            );
+        """)
+
+        self.cassandra.execute("""
+            CREATE TABLE IF NOT EXISTS iam_api_keys (
+                key_hash text PRIMARY KEY,
+                id text,
+                user_id text,
+                name text,
+                prefix text,
+                expires timestamp,
+                created timestamp,
+                last_used timestamp
+            );
+        """)
+
+        self.cassandra.execute("""
+            CREATE INDEX IF NOT EXISTS iam_api_keys_user_id_idx
+            ON iam_api_keys (user_id);
+        """)
+
+        self.cassandra.execute("""
+            CREATE INDEX IF NOT EXISTS iam_api_keys_id_idx
+            ON iam_api_keys (id);
+        """)
+
+        self.cassandra.execute("""
+            CREATE TABLE IF NOT EXISTS iam_signing_keys (
+                kid text PRIMARY KEY,
+                private_pem text,
+                public_pem text,
+                created timestamp,
+                retired timestamp
+            );
+        """)
+
+        logger.info("IAM: Cassandra schema OK.")
+
+    def _prepare_statements(self):
+        c = self.cassandra
+
+        self.put_workspace_stmt = c.prepare("""
+            INSERT INTO iam_workspaces (id, name, enabled, created)
+            VALUES (?, ?, ?, ?)
+        """)
+        self.get_workspace_stmt = c.prepare("""
+            SELECT id, name, enabled, created FROM iam_workspaces
+            WHERE id = ?
+        """)
+        self.list_workspaces_stmt = c.prepare("""
+            SELECT id, name, enabled, created FROM iam_workspaces
+        """)
+
+        self.put_user_stmt = c.prepare("""
+            INSERT INTO iam_users (
+                id, workspace, username, name, email, password_hash,
+                roles, enabled, must_change_password, created
+            )
+            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+        """)
+        self.get_user_stmt = c.prepare("""
+            SELECT id, workspace, username, name, email, password_hash,
+                   roles, enabled, must_change_password, created
+            FROM iam_users WHERE id = ?
+        """)
+        self.list_users_by_workspace_stmt = c.prepare("""
+            SELECT id, workspace, username, name, email, password_hash,
+                   roles, enabled, must_change_password, created
+            FROM iam_users WHERE workspace = ? ALLOW FILTERING
+        """)
+
+        self.put_username_lookup_stmt = c.prepare("""
+            INSERT INTO iam_users_by_username (workspace, username, user_id)
+            VALUES (?, ?, ?)
+        """)
+        self.get_user_id_by_username_stmt = c.prepare("""
+            SELECT user_id FROM iam_users_by_username
+            WHERE workspace = ? AND username = ?
+        """)
+        self.delete_username_lookup_stmt = c.prepare("""
+            DELETE FROM iam_users_by_username
+            WHERE workspace = ? AND username = ?
+        """)
+        self.delete_user_stmt = c.prepare("""
+            DELETE FROM iam_users WHERE id = ?
+        """)
+
+        self.put_api_key_stmt = c.prepare("""
+            INSERT INTO iam_api_keys (
+                key_hash, id, user_id, name, prefix, expires,
+                created, last_used
+            )
+            VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+        """)
+        self.get_api_key_by_hash_stmt = c.prepare("""
+            SELECT key_hash, id, user_id, name, prefix, expires,
+                   created, last_used
+            FROM iam_api_keys WHERE key_hash = ?
+        """)
+        self.get_api_key_by_id_stmt = c.prepare("""
+            SELECT key_hash, id, user_id, name, prefix, expires,
+                   created, last_used
+            FROM iam_api_keys WHERE id = ?
+        """)
+        self.list_api_keys_by_user_stmt = c.prepare("""
+            SELECT key_hash, id, user_id, name, prefix, expires,
+                   created, last_used
+            FROM iam_api_keys WHERE user_id = ?
+        """)
+        self.delete_api_key_stmt = c.prepare("""
+            DELETE FROM iam_api_keys WHERE key_hash = ?
+        """)
+
+        self.put_signing_key_stmt = c.prepare("""
+            INSERT INTO iam_signing_keys (
+                kid, private_pem, public_pem, created, retired
+            )
+            VALUES (?, ?, ?, ?, ?)
+        """)
+        self.list_signing_keys_stmt = c.prepare("""
+            SELECT kid, private_pem, public_pem, created, retired
+            FROM iam_signing_keys
+        """)
+        self.retire_signing_key_stmt = c.prepare("""
+            UPDATE iam_signing_keys SET retired = ? WHERE kid = ?
+        """)
+
+        self.update_user_profile_stmt = c.prepare("""
+            UPDATE iam_users
+            SET name = ?, email = ?, roles = ?, enabled = ?,
+                must_change_password = ?
+            WHERE id = ?
+        """)
+        self.update_user_password_stmt = c.prepare("""
+            UPDATE iam_users
+            SET password_hash = ?, must_change_password = ?
+            WHERE id = ?
+        """)
+        self.update_user_enabled_stmt = c.prepare("""
+            UPDATE iam_users SET enabled = ? WHERE id = ?
+        """)
+
+        self.update_workspace_stmt = c.prepare("""
+            UPDATE iam_workspaces SET name = ?, enabled = ?
+            WHERE id = ?
+        """)
+
+    # ------------------------------------------------------------------
+    # Workspaces
+    # ------------------------------------------------------------------
+
+    async def put_workspace(self, id, name, enabled, created):
+        await async_execute(
+            self.cassandra, self.put_workspace_stmt,
+            (id, name, enabled, created),
+        )
+
+    async def get_workspace(self, id):
+        rows = await async_execute(
+            self.cassandra, self.get_workspace_stmt, (id,),
+        )
+        return rows[0] if rows else None
+
+    async def list_workspaces(self):
+        return await async_execute(
+            self.cassandra, self.list_workspaces_stmt,
+        )
+
+    # ------------------------------------------------------------------
+    # Users
+    # ------------------------------------------------------------------
+
+    async def put_user(
+            self, id, workspace, username, name, email, password_hash,
+            roles, enabled, must_change_password, created,
+    ):
+        await async_execute(
+            self.cassandra, self.put_user_stmt,
+            (
+                id, workspace, username, name, email, password_hash,
+                set(roles) if roles else set(),
+                enabled, must_change_password, created,
+            ),
+        )
+        await async_execute(
+            self.cassandra, self.put_username_lookup_stmt,
+            (workspace, username, id),
+        )
+
+    async def get_user(self, id):
+        rows = await async_execute(
+            self.cassandra, self.get_user_stmt, (id,),
+        )
+        return rows[0] if rows else None
+
+    async def get_user_id_by_username(self, workspace, username):
+        rows = await async_execute(
+            self.cassandra, self.get_user_id_by_username_stmt,
+            (workspace, username),
+        )
+        return rows[0][0] if rows else None
+
+    async def list_users_by_workspace(self, workspace):
+        return await async_execute(
+            self.cassandra, self.list_users_by_workspace_stmt, (workspace,),
+        )
+
+    async def delete_user(self, id):
+        await async_execute(
+            self.cassandra, self.delete_user_stmt, (id,),
+        )
+
+    async def delete_username_lookup(self, workspace, username):
+        await async_execute(
+            self.cassandra, self.delete_username_lookup_stmt,
+            (workspace, username),
+        )
+
+    # ------------------------------------------------------------------
+    # API keys
+    # ------------------------------------------------------------------
+
+    async def put_api_key(
+            self, key_hash, id, user_id, name, prefix, expires,
+            created, last_used,
+    ):
+        await async_execute(
+            self.cassandra, self.put_api_key_stmt,
+            (key_hash, id, user_id, name, prefix, expires,
+             created, last_used),
+        )
+
+    async def get_api_key_by_hash(self, key_hash):
+        rows = await async_execute(
+            self.cassandra, self.get_api_key_by_hash_stmt, (key_hash,),
+        )
+        return rows[0] if rows else None
+
+    async def get_api_key_by_id(self, id):
+        rows = await async_execute(
+            self.cassandra, self.get_api_key_by_id_stmt, (id,),
+        )
+        return rows[0] if rows else None
+
+    async def list_api_keys_by_user(self, user_id):
+        return await async_execute(
+            self.cassandra, self.list_api_keys_by_user_stmt, (user_id,),
+        )
+
+    async def delete_api_key(self, key_hash):
+        await async_execute(
+            self.cassandra, self.delete_api_key_stmt, (key_hash,),
+        )
+
+    # ------------------------------------------------------------------
+    # Signing keys
+    # ------------------------------------------------------------------
+
+    async def put_signing_key(self, kid, private_pem, public_pem,
+                              created, retired):
+        await async_execute(
+            self.cassandra, self.put_signing_key_stmt,
+            (kid, private_pem, public_pem, created, retired),
+        )
+
+    async def list_signing_keys(self):
+        return await async_execute(
+            self.cassandra, self.list_signing_keys_stmt,
+        )
+
+    async def retire_signing_key(self, kid, retired):
+        await async_execute(
+            self.cassandra, self.retire_signing_key_stmt,
+            (retired, kid),
+        )
+
+    # ------------------------------------------------------------------
+    # User partial updates
+    # ------------------------------------------------------------------
+
+    async def update_user_profile(
+            self, id, name, email, roles, enabled, must_change_password,
+    ):
+        await async_execute(
+            self.cassandra, self.update_user_profile_stmt,
+            (
+                name, email,
+                set(roles) if roles else set(),
+                enabled, must_change_password, id,
+            ),
+        )
+
+    async def update_user_password(
+            self, id, password_hash, must_change_password,
+    ):
+        await async_execute(
+            self.cassandra, self.update_user_password_stmt,
+            (password_hash, must_change_password, id),
+        )
+
+    async def update_user_enabled(self, id, enabled):
+        await async_execute(
+            self.cassandra, self.update_user_enabled_stmt,
+            (enabled, id),
+        )
+
+    # ------------------------------------------------------------------
+    # Workspace updates
+    # ------------------------------------------------------------------
+
+    async def update_workspace(self, id, name, enabled):
+        await async_execute(
+            self.cassandra, self.update_workspace_stmt,
+            (name, enabled, id),
+        )
+
+    # ------------------------------------------------------------------
+    # Bootstrap helpers
+    # ------------------------------------------------------------------
+
+    async def any_workspace_exists(self):
+        rows = await self.list_workspaces()
+        return bool(rows)
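Reviewer note: `iam_api_keys` is keyed by `key_hash`, so resolving a presented key is a single primary-key read and the plaintext is never stored. The sketch below illustrates that lookup shape with an in-memory dict standing in for the table. It is not the PR's implementation: the function names, the hex encoding of the 128-bit key, and the 8-character `prefix` length are all assumptions made for illustration.

```python
import hashlib
import secrets

# Stand-in for the iam_api_keys table, keyed by key_hash (assumption:
# the real store is Cassandra, accessed through IamTableStore).
api_keys = {}


def hash_key(plaintext):
    """SHA-256 of the plaintext key, hex-encoded (encoding is assumed)."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def create_api_key(user_id):
    """Generate a 128-bit key, store only its hash, return plaintext once."""
    plaintext = secrets.token_hex(16)  # 16 random bytes = 128 bits
    api_keys[hash_key(plaintext)] = {
        "user_id": user_id,
        "prefix": plaintext[:8],  # hypothetical display prefix length
    }
    return plaintext


def resolve_api_key(plaintext):
    """Return the owning user_id, or None if the key is unknown."""
    row = api_keys.get(hash_key(plaintext))
    return row["user_id"] if row else None
```

Because the hash is the partition key, revocation by hash (`delete_api_key`) is equally direct, while lookups by `id` or `user_id` go through the secondary indexes.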