refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)

The gateway no longer holds any policy state (capability sets, role
definitions, workspace-scope rules).  Per the IAM contract it asks
the regime "may this identity perform this capability on this
resource?" on every request.  That moves the OSS role-based regime
entirely into iam-svc, which can be replaced (SSO, ABAC, ReBAC)
without changing the gateway, the wire protocol, or backend
services.

Contract:
- authenticate(credential) -> Identity (handle, workspace,
  principal_id, source).  No roles, claims, or other policy state
  surfaces to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
  ttl).  Cached per-decision (regime TTL clamped above; fail-closed
  on regime errors).
- authorise_many is available as a fan-out variant.
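
The contract surface above can be sketched as a small Python
protocol.  This is a hypothetical rendering of the shapes named in
this message, not the project's actual module:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Identity:
    # Fixed identity shape; no roles or claims ever surface here.
    handle: str        # opaque, quoted back to authorise()
    workspace: str     # default-fill workspace, never a policy input
    principal_id: str  # stable identifier for audit logs
    source: str        # "api-key" | "jwt"


class IamRegime(Protocol):
    async def authenticate(self, credential: str) -> Identity: ...

    async def authorise(
        self, identity: Identity, capability: str,
        resource: dict, parameters: dict,
    ) -> tuple[bool, int]:
        """Returns (allow, suggested_cache_ttl_seconds)."""
        ...
```

Any regime satisfying this protocol can sit behind the gateway; the
gateway clamps the returned TTL and fails closed on errors.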

Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
  list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
  (config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
  flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
  authenticated users could hit any service kind.
85 operations registered across the surface.
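
The key shapes behind those routes can be sketched as small helpers.
The helper names are assumptions; only the key formats come from this
message, and the real entries live in the gateway's registry:

```python
def global_key(kind: str, operation: str) -> str:
    # /api/v1/{kind}: kind from the URL, operation from the body.
    return f"{kind}:{operation}"


def flow_service_key(kind: str) -> str:
    # /api/v1/flow/{flow}/service/{kind}, and per-frame WS Mux.
    return f"flow-service:{kind}"


def flow_stream_key(direction: str, kind: str) -> str:
    # /api/v1/flow/{flow}/{import,export}/{kind}.
    return f"flow-{direction}:{kind}"
```

For example, a POST to /api/v1/config with operation "get" resolves
to the registry key "config:get" before authorise() is asked.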

The JWT carries identity only: sub + workspace.  The roles claim is
gone; the gateway never reads policy state from a credential.
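
Concretely, claims handling reduces to something like this (claim
values hypothetical):

```python
# Hypothetical claims payload: identity only, no policy state.
claims = {"sub": "user-42", "workspace": "research"}

sub = claims.get("sub", "")
ws = claims.get("workspace", "")

# Both identity fields are required; a legacy "roles" claim, if a
# stale token still carries one, is ignored rather than rejected.
assert sub and ws
ignored = claims.get("roles")  # never consulted
```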

The three coarse *_KIND_CAPABILITY maps are removed.  The registry is
the only source of truth for the capability + resource shape of an
operation.  Tests migrated to the new Identity shape and to
authorise()-mocked auth doubles.
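
A test double of the kind implied here might look like the following
(a sketch under assumed names; the actual test helpers are not shown
in this diff):

```python
from dataclasses import dataclass


@dataclass
class Identity:  # minimal stand-in for the gateway's Identity shape
    handle: str
    workspace: str
    principal_id: str
    source: str


class AllowAllAuth:
    """Auth double: authenticates everyone, allows everything.
    Mirrors the contract's calling convention: authorise() returns
    None on allow and raises on deny (here: never)."""

    async def authenticate(self, request):
        return Identity("tester", "ws-test", "tester", "api-key")

    async def authorise(self, identity, capability, resource, parameters):
        return None


class DenyAllAuth(AllowAllAuth):
    async def authorise(self, identity, capability, resource, parameters):
        # Stands in for the gateway's HTTPForbidden on a deny.
        raise PermissionError("access denied")
```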

Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
This commit is contained in:
cybermaggedon 2026-04-28 16:19:41 +01:00 committed by GitHub
parent 9f2d9adcb1
commit 5e28d3cce0
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
24 changed files with 2359 additions and 587 deletions

View file

@@ -1,15 +1,16 @@
"""
IAM-backed authentication for the API gateway.
IAM-backed authentication and authorisation for the API gateway.
Replaces the legacy GATEWAY_SECRET shared-token Authenticator. The
gateway is now stateless with respect to credentials: it either
verifies a JWT locally using the active IAM signing public key, or
resolves an API key by hash with a short local cache backed by the
IAM service.
The gateway delegates both authentication ("who is this caller?")
and authorisation ("may they do this?") to the IAM regime via the
contract specified in docs/tech-specs/iam-contract.md. No regime-
specific policy (roles, scopes, claims) lives in the gateway.
Identity returned by authenticate() is the (user_id, workspace,
roles) triple that the rest of the gateway (capability checks,
workspace resolver, audit logging) needs.
- Authentication: API keys are resolved by IAM; JWTs are validated
locally against the cached signing public key.
- Authorisation: every per-request decision is asked of IAM via
``authorise(identity, capability, resource, parameters)``, with
results cached for the TTL the regime returns.
"""
import asyncio
@@ -19,7 +20,7 @@ import json
import logging
import time
import uuid
from dataclasses import dataclass
from dataclasses import dataclass, field
from aiohttp import web
@@ -37,12 +38,34 @@ logger = logging.getLogger("auth")
API_KEY_CACHE_TTL = 60 # seconds
# Upper bound on cache TTL the gateway honours for an authorisation
# decision, regardless of what the regime suggested. Caps the
# revocation latency window.
AUTHZ_CACHE_TTL_MAX = 60 # seconds
@dataclass
class Identity:
user_id: str
"""The gateway-side surface of an authenticated caller.
Per the IAM contract this is a small fixed shape; regime-internal
state (roles, claims, group memberships) is reachable only via
the regime's ``authorise`` operation. The gateway itself never
reads policy from this object.
"""
# Opaque handle, quoted back when calling ``authorise``. For
# the OSS regime this is the user record's id; the gateway
# treats it as a string with no semantic content.
handle: str
# The workspace this credential authenticates to. Used by the
# gateway as the default-fill-in for operations that omit a
# workspace. Never used as policy input.
workspace: str
roles: list
# Stable identifier for audit logs. In OSS this is the same
# value as ``handle``; not assumed equal in the contract.
principal_id: str
# How the credential was presented. Non-policy; useful for
# logs / metrics only.
source: str # "api-key" | "jwt"
@@ -111,6 +134,13 @@ class IamAuth:
self._key_cache = {}
self._key_cache_lock = asyncio.Lock()
# Authorisation decision cache: hash(handle, capability,
# resource, parameters) -> (allow_bool, expires_ts). Holds
# both allows and denies — denies cached briefly to avoid
# hammering iam-svc with repeated rejected attempts.
self._authz_cache: dict[str, tuple[bool, float]] = {}
self._authz_cache_lock = asyncio.Lock()
# ------------------------------------------------------------------
# Short-lived client helper. Mirrors the pattern used by the
# bootstrap framework and AsyncProcessor: a fresh uuid suffix per
@@ -221,12 +251,13 @@ class IamAuth:
sub = claims.get("sub", "")
ws = claims.get("workspace", "")
roles = list(claims.get("roles", []))
if not sub or not ws:
raise _auth_failure()
# JWT carries no policy state under the IAM contract;
# any roles / claims field is ignored here.
return Identity(
user_id=sub, workspace=ws, roles=roles, source="jwt",
handle=sub, workspace=ws, principal_id=sub, source="jwt",
)
async def _resolve_api_key(self, plaintext):
@@ -245,7 +276,10 @@ class IamAuth:
try:
async def _call(client):
return await client.resolve_api_key(plaintext)
user_id, workspace, roles = await self._with_client(_call)
# ``roles`` is returned by the OSS regime as a hint
# but is not consulted by the gateway; all policy
# decisions go through ``authorise``.
user_id, workspace, _roles = await self._with_client(_call)
except Exception as e:
logger.debug(
f"API key resolution failed: "
@@ -257,8 +291,81 @@
raise _auth_failure()
identity = Identity(
user_id=user_id, workspace=workspace,
roles=list(roles), source="api-key",
handle=user_id, workspace=workspace,
principal_id=user_id, source="api-key",
)
self._key_cache[h] = (identity, now + API_KEY_CACHE_TTL)
return identity
# ------------------------------------------------------------------
# Authorisation
# ------------------------------------------------------------------
@staticmethod
def _authz_cache_key(handle, capability, resource, parameters):
payload = json.dumps(
{
"h": handle,
"c": capability,
"r": resource or {},
"p": parameters or {},
},
sort_keys=True,
separators=(",", ":"),
)
return hashlib.sha256(payload.encode("utf-8")).hexdigest()
async def authorise(self, identity, capability, resource, parameters):
"""Ask the IAM regime whether ``identity`` may perform
``capability`` on ``resource`` given ``parameters``.
Caches the decision for the regime's suggested TTL, clamped
above by ``AUTHZ_CACHE_TTL_MAX``. Both allow and deny
decisions are cached (denies briefly, to avoid hammering
iam-svc with repeated rejected attempts).
Raises ``HTTPForbidden`` (403 / "access denied") on a deny
decision. Raises ``HTTPUnauthorized`` (401 / "auth failure")
if the IAM service errors out, failing closed."""
key = self._authz_cache_key(
identity.handle, capability, resource, parameters,
)
now = time.time()
cached = self._authz_cache.get(key)
if cached and cached[1] > now:
allow, _ = cached
if not allow:
raise _access_denied()
return
async with self._authz_cache_lock:
cached = self._authz_cache.get(key)
if cached and cached[1] > now:
allow, _ = cached
if not allow:
raise _access_denied()
return
try:
async def _call(client):
return await client.authorise(
identity.handle, capability,
resource or {}, parameters or {},
)
allow, ttl = await self._with_client(_call)
except Exception as e:
logger.warning(
f"authorise failed: {type(e).__name__}: {e}; "
f"failing closed for "
f"{identity.principal_id!r} cap={capability!r}"
)
raise _auth_failure()
ttl = max(0, min(int(ttl or 0), AUTHZ_CACHE_TTL_MAX))
self._authz_cache[key] = (bool(allow), now + ttl)
if not allow:
raise _access_denied()
return

View file

@@ -1,36 +1,23 @@
"""
Capability vocabulary, role definitions, and authorisation helpers.
Gateway-side authorisation entry points.
See docs/tech-specs/capabilities.md for the authoritative description.
The data here is the OSS bundle table in that spec. Enterprise
editions may replace this module with their own role table; the
vocabulary (capability strings) is shared.
Under the IAM contract (see docs/tech-specs/iam-contract.md) the
gateway holds *no* policy state. Roles, capability sets, and
workspace-scope rules all live in the IAM regime (iam-svc for OSS).
This module is the thin surface the gateway uses to ask the regime
for a decision:
Role model
----------
A role has two dimensions:
- ``PUBLIC`` / ``AUTHENTICATED`` sentinels for endpoints that don't
go through capability-based authorisation.
- :func:`enforce`: authenticate only, then ask the regime.
- :func:`enforce_workspace`: default-fill the workspace from the
caller's bound workspace and ask the regime, with the workspace
treated as the resource address.
1. **capability set**: which operations the role grants.
2. **workspace scope**: which workspaces the role is active in.
The authorisation question is: *given the caller's roles, a required
capability, and a target workspace, does any role grant the
capability AND apply to the target workspace?*
Workspace scope values recognised here:
- ``"assigned"``: the role applies only to the caller's own
assigned workspace (stored on their user record).
- ``"*"``: the role applies to every workspace.
Enterprise editions can add richer scopes (explicit permitted-set,
patterns, etc.) without changing the wire protocol.
Sentinels
---------
- ``PUBLIC``: endpoint requires no authentication.
- ``AUTHENTICATED``: endpoint requires a valid identity, no
specific capability.
The capability strings themselves are an open vocabulary; see
docs/tech-specs/capabilities.md.  The gateway does not validate them
beyond passing them through; an unknown capability simply produces a
deny verdict from the regime.
"""
from aiohttp import web
@@ -40,125 +27,6 @@ PUBLIC = "__public__"
AUTHENTICATED = "__authenticated__"
# Capability vocabulary. Mirrors the "Capability list" tables in
# capabilities.md. Kept as a set so the gateway can fail-closed on
# an endpoint that declares an unknown capability.
KNOWN_CAPABILITIES = {
# Data plane
"agent",
"graph:read", "graph:write",
"documents:read", "documents:write",
"rows:read", "rows:write",
"llm",
"embeddings",
"mcp",
# Control plane
"config:read", "config:write",
"flows:read", "flows:write",
"users:read", "users:write", "users:admin",
"keys:self", "keys:admin",
"workspaces:admin",
"iam:admin",
"metrics:read",
"collections:read", "collections:write",
"knowledge:read", "knowledge:write",
}
# Capability sets used below.
_READER_CAPS = {
"agent",
"graph:read",
"documents:read",
"rows:read",
"llm",
"embeddings",
"mcp",
"config:read",
"flows:read",
"collections:read",
"knowledge:read",
"keys:self",
}
_WRITER_CAPS = _READER_CAPS | {
"graph:write",
"documents:write",
"rows:write",
"collections:write",
"knowledge:write",
}
_ADMIN_CAPS = _WRITER_CAPS | {
"config:write",
"flows:write",
"users:read", "users:write", "users:admin",
"keys:admin",
"workspaces:admin",
"iam:admin",
"metrics:read",
}
# Role definitions. Each role has a capability set and a workspace
# scope. Enterprise overrides this mapping.
ROLE_DEFINITIONS = {
"reader": {
"capabilities": _READER_CAPS,
"workspace_scope": "assigned",
},
"writer": {
"capabilities": _WRITER_CAPS,
"workspace_scope": "assigned",
},
"admin": {
"capabilities": _ADMIN_CAPS,
"workspace_scope": "*",
},
}
def _scope_permits(role_name, target_workspace, assigned_workspace):
"""Does the given role apply to ``target_workspace``?"""
role = ROLE_DEFINITIONS.get(role_name)
if role is None:
return False
scope = role["workspace_scope"]
if scope == "*":
return True
if scope == "assigned":
return target_workspace == assigned_workspace
# Future scope types (lists, patterns) extend here.
return False
def check(identity, capability, target_workspace=None):
"""Is ``identity`` permitted to invoke ``capability`` on
``target_workspace``?
Passes iff some role held by the caller both (a) grants
``capability`` and (b) is active in ``target_workspace``.
``target_workspace`` defaults to the caller's assigned workspace,
which makes this function usable for system-level operations and
for authenticated endpoints that don't take a workspace argument
(the call collapses to "do any of my roles grant this cap?")."""
if capability not in KNOWN_CAPABILITIES:
return False
target = target_workspace or identity.workspace
for role_name in identity.roles:
role = ROLE_DEFINITIONS.get(role_name)
if role is None:
continue
if capability not in role["capabilities"]:
continue
if _scope_permits(role_name, target, identity.workspace):
return True
return False
def access_denied():
return web.HTTPForbidden(
text='{"error":"access denied"}',
@@ -174,21 +42,19 @@ def auth_failure():
async def enforce(request, auth, capability):
"""Authenticate + capability-check for endpoints that carry no
workspace dimension on the request (metrics, i18n, etc.).
"""Authenticate the caller and (for non-sentinel capabilities)
ask the IAM regime whether they may invoke ``capability``.
For endpoints that carry a workspace field on the body, call
:func:`enforce_workspace` *after* parsing the body to validate
the workspace and re-check the capability in that scope. Most
endpoints do both.
The resource is system-level (``{}``) and parameters are empty;
use :func:`enforce_workspace` for workspace-scoped endpoints, or
drive authorisation through the operation registry for richer
cases.
- ``PUBLIC``: no authentication, returns ``None``.
- ``AUTHENTICATED``: any valid identity.
- capability string: identity must have it, checked against the
caller's assigned workspace (adequate for endpoints whose
capability is system-level, e.g. ``metrics:read``, or where
the real workspace-aware check happens in
:func:`enforce_workspace` after body parsing)."""
- ``PUBLIC``: returns ``None``; no authentication.
- ``AUTHENTICATED``: returns the ``Identity``; no authorisation.
- capability string: returns the ``Identity`` if the regime
allows; raises ``HTTPForbidden`` otherwise.
"""
if capability == PUBLIC:
return None
@@ -197,42 +63,38 @@ async def enforce(request, auth, capability):
if capability == AUTHENTICATED:
return identity
if not check(identity, capability):
raise access_denied()
await auth.authorise(identity, capability, {}, {})
return identity
def enforce_workspace(data, identity, capability=None):
"""Resolve + validate the workspace on a request body.
async def enforce_workspace(data, identity, auth, capability=None):
"""Default-fill the workspace on a request body and (optionally)
authorise the caller for ``capability`` against that workspace.
- Target workspace = ``data["workspace"]`` if supplied, else the
caller's assigned workspace.
- At least one of the caller's roles must (a) be active in the
target workspace and, if ``capability`` is given, (b) grant
``capability``. Otherwise 403.
caller's bound workspace.
- On success, ``data["workspace"]`` is overwritten with the
resolved value; callers can rely on the outgoing message
having the gateway's chosen workspace rather than any
caller-supplied value.
resolved value so downstream code sees a single canonical
address.
- When ``capability`` is given, the regime is asked whether the
caller may invoke ``capability`` on ``{workspace: target}``.
Raises ``HTTPForbidden`` on a deny.
For ``capability=None`` the workspace scope alone is checked;
useful when the body has a workspace but the endpoint already
passed its capability check (e.g. via :func:`enforce`).
For ``capability=None`` no authorisation call is made; the
caller has presumably already authorised via :func:`enforce`
(handy for endpoints that authorise once then resolve workspace
on the body before forwarding).
"""
if not isinstance(data, dict):
return data
requested = data.get("workspace", "")
target = requested or identity.workspace
data["workspace"] = target
for role_name in identity.roles:
role = ROLE_DEFINITIONS.get(role_name)
if role is None:
continue
if capability is not None and capability not in role["capabilities"]:
continue
if _scope_permits(role_name, target, identity.workspace):
data["workspace"] = target
return data
if capability is not None:
await auth.authorise(
identity, capability, {"workspace": target}, {},
)
raise access_denied()
return data

View file

@@ -121,20 +121,45 @@ class Mux:
})
return
# Workspace resolution. Role workspace scope determines
# which target workspaces are permitted. The resolved
# value is written to both the envelope and the inner
# request payload so clients don't have to repeat it
# per-message (same convenience HTTP callers get via
# enforce_workspace).
# Per-service capability gating. Resolved through the
# operation registry so the WS path matches what HTTP
# callers see — same authority, same caps. Service
# kinds that aren't registered are refused.
from ..registry import lookup as _registry_lookup
from ..capabilities import enforce_workspace
from aiohttp import web as _web
service = data.get("service", "")
op = _registry_lookup(f"flow-service:{service}")
if op is None:
await self.ws.send_json({
"id": request_id,
"error": {
"message": "unknown service",
"type": "unknown-service",
},
"complete": True,
})
return
# Workspace + flow form the resource address for a
# flow-level service call. Resolve workspace first
# (default-fill from the caller's bound workspace),
# then ask the regime to authorise the service-level
# capability against that {workspace, flow} resource.
try:
enforce_workspace(data, self.identity)
await enforce_workspace(data, self.identity, self.auth)
inner = data.get("request")
if isinstance(inner, dict):
enforce_workspace(inner, self.identity)
await enforce_workspace(inner, self.identity, self.auth)
resource = {
"workspace": data.get("workspace", ""),
"flow": data.get("flow", ""),
}
await self.auth.authorise(
self.identity, op.capability, resource, {},
)
except _web.HTTPForbidden:
await self.ws.send_json({
"id": request_id,
@@ -145,6 +170,16 @@ class Mux:
"complete": True,
})
return
except _web.HTTPUnauthorized:
await self.ws.send_json({
"id": request_id,
"error": {
"message": "auth failure",
"type": "auth-required",
},
"complete": True,
})
return
workspace = data["workspace"]

View file

@@ -97,7 +97,7 @@ class AuthEndpoints:
)
req = {
"operation": "change-password",
"user_id": identity.user_id,
"user_id": identity.handle,
"password": body.get("current_password", ""),
"new_password": body.get("new_password", ""),
}

View file

@@ -36,7 +36,7 @@ class ConstantEndpoint:
data = await request.json()
if identity is not None:
enforce_workspace(data, identity)
await enforce_workspace(data, identity, self.auth)
async def responder(x, fin):
pass

View file

@@ -0,0 +1,106 @@
"""
Registry-driven /api/v1/iam endpoint.
The gateway no longer gates IAM management with a single coarse
``users:admin`` capability. Instead, each operation declares its
own capability + resource shape in the registry (``registry.py``);
this endpoint reads the body's ``operation`` field, looks up the
declaration, and asks the IAM regime to authorise the call.
Operations not in the registry produce a 400 ``unknown operation``.
This is the gateway's primary mechanism for fail-closed gating of
the IAM surface the registry is the source of truth.
"""
import logging
from aiohttp import web
from .. capabilities import (
PUBLIC, AUTHENTICATED, auth_failure,
)
from .. registry import lookup, RequestContext
logger = logging.getLogger("iam-endpoint")
logger.setLevel(logging.INFO)
class IamEndpoint:
"""POST /api/v1/iam — generic forwarder gated by the operation
registry. The IAM dispatcher (``iam_dispatcher``) forwards the
body verbatim to iam-svc once authorisation succeeds."""
def __init__(self, endpoint_path, auth, dispatcher):
self.path = endpoint_path
self.auth = auth
self.dispatcher = dispatcher
async def start(self):
pass
def add_routes(self, app):
app.add_routes([web.post(self.path, self.handle)])
async def handle(self, request):
try:
body = await request.json()
except Exception:
return web.json_response(
{"error": "invalid json"}, status=400,
)
if not isinstance(body, dict):
return web.json_response(
{"error": "body must be an object"}, status=400,
)
op_name = body.get("operation", "")
op = lookup(op_name)
if op is None:
return web.json_response(
{"error": "unknown operation"}, status=400,
)
# Authentication: required for everything except PUBLIC.
identity = None
if op.capability != PUBLIC:
try:
identity = await self.auth.authenticate(request)
except web.HTTPException:
raise
# Authorisation: capability sentinels short-circuit the
# regime call; capability strings go through authorise().
if op.capability not in (PUBLIC, AUTHENTICATED):
ctx = RequestContext(
body=body,
match_info=dict(request.match_info),
identity=identity,
)
try:
resource = op.extract_resource(ctx)
parameters = op.extract_parameters(ctx)
except Exception as e:
logger.warning(
f"extractor failed for {op_name!r}: "
f"{type(e).__name__}: {e}"
)
return web.json_response(
{"error": "bad request"}, status=400,
)
await self.auth.authorise(
identity, op.capability, resource, parameters,
)
async def responder(x, fin):
pass
try:
resp = await self.dispatcher.process(body, responder)
except web.HTTPException:
raise
except Exception as e:
logger.error(f"Exception: {e}", exc_info=True)
return web.json_response({"error": str(e)})
return web.json_response(resp)

View file

@@ -9,90 +9,44 @@ from . socket import SocketEndpoint
from . metrics import MetricsEndpoint
from . i18n import I18nPackEndpoint
from . auth_endpoints import AuthEndpoints
from . iam_endpoint import IamEndpoint
from . registry_endpoint import RegistryRoutedVariableEndpoint
from .. capabilities import PUBLIC, AUTHENTICATED
from .. capabilities import PUBLIC, AUTHENTICATED, auth_failure
from .. registry import lookup as _registry_lookup, RequestContext
from .. dispatch.manager import DispatcherManager
# Capability required for each kind on the /api/v1/{kind} generic
# endpoint (global services). Coarse gating — the IAM bundle split
# of "read vs write" per admin subsystem is not applied here because
# this endpoint forwards an opaque operation in the body. Writes
# are the upper bound on what the endpoint can do, so we gate on
# the write/admin capability.
GLOBAL_KIND_CAPABILITY = {
"config": "config:write",
"flow": "flows:write",
"librarian": "documents:write",
"knowledge": "knowledge:write",
"collection-management": "collections:write",
# IAM endpoints land on /api/v1/iam and require the admin bundle.
# Login / bootstrap / change-password are served by
# AuthEndpoints, which handle their own gating (PUBLIC /
# AUTHENTICATED).
"iam": "users:admin",
}
# /api/v1/{kind} (config / flow / librarian / knowledge /
# collection-management), /api/v1/iam, and /api/v1/flow/{flow}/...
# routes are all gated per-operation by the registry, not by a
# per-kind capability map. Login / bootstrap / change-password are
# served by AuthEndpoints with their own PUBLIC / AUTHENTICATED
# sentinels.
# Capability required for each kind on the
# /api/v1/flow/{flow}/service/{kind} endpoint (per-flow data-plane).
FLOW_KIND_CAPABILITY = {
"agent": "agent",
"text-completion": "llm",
"prompt": "llm",
"mcp-tool": "mcp",
"graph-rag": "graph:read",
"document-rag": "documents:read",
"embeddings": "embeddings",
"graph-embeddings": "graph:read",
"document-embeddings": "documents:read",
"triples": "graph:read",
"rows": "rows:read",
"nlp-query": "rows:read",
"structured-query": "rows:read",
"structured-diag": "rows:read",
"row-embeddings": "rows:read",
"sparql": "graph:read",
}
# Capability for the streaming flow import/export endpoints,
# keyed by the "kind" URL segment.
FLOW_IMPORT_CAPABILITY = {
"triples": "graph:write",
"graph-embeddings": "graph:write",
"document-embeddings": "documents:write",
"entity-contexts": "documents:write",
"rows": "rows:write",
}
FLOW_EXPORT_CAPABILITY = {
"triples": "graph:read",
"graph-embeddings": "graph:read",
"document-embeddings": "documents:read",
"entity-contexts": "documents:read",
}
from .. capabilities import enforce, enforce_workspace
import logging as _mgr_logging
_mgr_logger = _mgr_logging.getLogger("endpoint")
class _RoutedVariableEndpoint:
"""HTTP endpoint whose required capability is looked up per
request from the URL's ``kind`` parameter. Used for the two
generic dispatch paths (``/api/v1/{kind}`` and
``/api/v1/flow/{flow}/service/{kind}``). Self-contained rather
than subclassing ``VariableEndpoint`` to avoid mutating shared
state across concurrent requests."""
"""HTTP endpoint that gates per request via the operation
registry. The URL's ``kind`` parameter combined with a fixed
``registry_prefix`` yields the registry key; e.g. prefix
``flow-service`` and kind ``agent`` looks up
``flow-service:agent``.
def __init__(self, endpoint_path, auth, dispatcher, capability_map):
Used for ``/api/v1/flow/{flow}/service/{kind}`` (per-flow
data-plane services). ``/api/v1/{kind}`` (workspace-level
global services) goes through ``RegistryRoutedVariableEndpoint``
which discriminates on body operation as well as URL kind."""
def __init__(self, endpoint_path, auth, dispatcher, registry_prefix):
self.path = endpoint_path
self.auth = auth
self.dispatcher = dispatcher
self._capability_map = capability_map
self._registry_prefix = registry_prefix
async def start(self):
pass
@@ -102,18 +56,26 @@ class _RoutedVariableEndpoint:
async def handle(self, request):
kind = request.match_info.get("kind", "")
cap = self._capability_map.get(kind)
if cap is None:
op = _registry_lookup(f"{self._registry_prefix}:{kind}")
if op is None:
return web.json_response(
{"error": "unknown kind"}, status=404,
)
identity = await enforce(request, self.auth, cap)
identity = await self.auth.authenticate(request)
try:
data = await request.json()
if identity is not None:
enforce_workspace(data, identity)
ctx = RequestContext(
body=data if isinstance(data, dict) else {},
match_info=dict(request.match_info),
identity=identity,
)
resource = op.extract_resource(ctx)
parameters = op.extract_parameters(ctx)
await self.auth.authorise(
identity, op.capability, resource, parameters,
)
async def responder(x, fin):
pass
@@ -131,15 +93,15 @@
class _RoutedSocketEndpoint:
"""WebSocket endpoint whose required capability is looked up per
request from the URL's ``kind`` parameter. Used for the flow
import/export streaming endpoints."""
"""WebSocket endpoint gated per request via the operation
registry. Like ``_RoutedVariableEndpoint`` but for the
streaming flow import / export socket paths."""
def __init__(self, endpoint_path, auth, dispatcher, capability_map):
def __init__(self, endpoint_path, auth, dispatcher, registry_prefix):
self.path = endpoint_path
self.auth = auth
self.dispatcher = dispatcher
self._capability_map = capability_map
self._registry_prefix = registry_prefix
async def start(self):
pass
@@ -148,11 +110,9 @@
app.add_routes([web.get(self.path, self.handle)])
async def handle(self, request):
from .. capabilities import check, auth_failure, access_denied
kind = request.match_info.get("kind", "")
cap = self._capability_map.get(kind)
if cap is None:
op = _registry_lookup(f"{self._registry_prefix}:{kind}")
if op is None:
return web.json_response(
{"error": "unknown kind"}, status=404,
)
@@ -168,8 +128,20 @@
)
except web.HTTPException as e:
return e
if not check(identity, cap):
return access_denied()
ctx = RequestContext(
body={},
match_info=dict(request.match_info),
identity=identity,
)
try:
resource = op.extract_resource(ctx)
parameters = op.extract_parameters(ctx)
await self.auth.authorise(
identity, op.capability, resource, parameters,
)
except web.HTTPException as e:
return e
# Delegate the websocket handling to a standalone SocketEndpoint
# with the resolved capability, bypassing the per-request mutation
@@ -178,7 +150,7 @@ class _RoutedSocketEndpoint:
endpoint_path=self.path,
auth=self.auth,
dispatcher=self.dispatcher,
capability=cap,
capability=op.capability,
)
return await ws_ep.handle(request)
@@ -203,6 +175,18 @@ class EndpointManager:
auth=auth,
),
# /api/v1/iam — registry-driven IAM management. Per-operation
# gating happens inside IamEndpoint via the
# operation registry; the dispatcher forwards verbatim
# to iam-svc once authorisation has succeeded. Listed
# before the generic /api/v1/{kind} route so it wins
# the match for "iam".
IamEndpoint(
endpoint_path="/api/v1/iam",
auth=auth,
dispatcher=dispatcher_manager.dispatch_auth_iam(),
),
I18nPackEndpoint(
endpoint_path="/api/v1/i18n/packs/{lang}",
auth=auth,
@@ -215,12 +199,16 @@
capability="metrics:read",
),
# Global services: capability chosen per-kind.
_RoutedVariableEndpoint(
# Global services: registry-driven per-operation gating.
# Each kind+op combination has a registry entry that
# declares its capability and resource shape. Listed
# after the IAM and auth-surface routes; aiohttp's
# path matcher prefers the more-specific path so this
# variable route doesn't shadow them.
RegistryRoutedVariableEndpoint(
endpoint_path="/api/v1/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_global_service(),
capability_map=GLOBAL_KIND_CAPABILITY,
),
# /api/v1/socket: WebSocket handshake accepts
@@ -236,26 +224,29 @@
in_band_auth=True,
),
# Per-flow request/response services — capability per kind.
# Per-flow request/response services — gated per
# ``flow-service:<kind>`` registry entry.
_RoutedVariableEndpoint(
endpoint_path="/api/v1/flow/{flow}/service/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_flow_service(),
capability_map=FLOW_KIND_CAPABILITY,
registry_prefix="flow-service",
),
# Per-flow streaming import/export — capability per kind.
# Per-flow streaming import/export — gated per
# ``flow-import:<kind>`` / ``flow-export:<kind>`` registry
# entry.
_RoutedSocketEndpoint(
endpoint_path="/api/v1/flow/{flow}/import/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_flow_import(),
capability_map=FLOW_IMPORT_CAPABILITY,
registry_prefix="flow-import",
),
_RoutedSocketEndpoint(
endpoint_path="/api/v1/flow/{flow}/export/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_flow_export(),
capability_map=FLOW_EXPORT_CAPABILITY,
registry_prefix="flow-export",
),
StreamEndpoint(

View file

@@ -0,0 +1,123 @@
"""
Registry-driven dispatch for ``/api/v1/{kind}`` global services.
The body's ``operation`` field plus the URL's ``{kind}`` together
form the canonical operation name (``<kind>:<operation>``) that the
gateway looks up in ``registry.py``. The matched operation
declares its capability and resource shape; this endpoint asks the
IAM regime to authorise the call before forwarding the body
verbatim to the backend dispatcher.
The dispatcher is the same ``dispatch_global_service()`` factory the
old coarse path used; only the gating layer has changed.
Operations not present in the registry are rejected with 400
``unknown operation``; fail closed.
"""
import logging
from aiohttp import web
from .. capabilities import (
PUBLIC, AUTHENTICATED, auth_failure,
)
from .. registry import lookup, RequestContext
logger = logging.getLogger("registry-endpoint")
logger.setLevel(logging.INFO)
class RegistryRoutedVariableEndpoint:
"""POST /api/v1/{kind} — kind comes from the URL, operation comes
from the body, both are joined as the registry key."""
def __init__(self, endpoint_path, auth, dispatcher):
self.path = endpoint_path
self.auth = auth
self.dispatcher = dispatcher
async def start(self):
pass
def add_routes(self, app):
app.add_routes([web.post(self.path, self.handle)])
async def handle(self, request):
kind = request.match_info.get("kind", "")
if not kind:
return web.json_response(
{"error": "missing kind"}, status=404,
)
try:
body = await request.json()
except Exception:
return web.json_response(
{"error": "invalid json"}, status=400,
)
if not isinstance(body, dict):
return web.json_response(
{"error": "body must be an object"}, status=400,
)
op_name = body.get("operation", "")
if not op_name:
return web.json_response(
{"error": "missing operation"}, status=400,
)
registry_key = f"{kind}:{op_name}"
op = lookup(registry_key)
if op is None:
return web.json_response(
{"error": "unknown operation"}, status=400,
)
identity = None
if op.capability != PUBLIC:
identity = await self.auth.authenticate(request)
if op.capability not in (PUBLIC, AUTHENTICATED):
ctx = RequestContext(
body=body,
match_info=dict(request.match_info),
identity=identity,
)
try:
resource = op.extract_resource(ctx)
parameters = op.extract_parameters(ctx)
except Exception as e:
logger.warning(
f"extractor failed for {registry_key!r}: "
f"{type(e).__name__}: {e}"
)
return web.json_response(
{"error": "bad request"}, status=400,
)
await self.auth.authorise(
identity, op.capability, resource, parameters,
)
# Default-fill workspace into the body so downstream
# dispatchers see the canonical resolved value. The
# extractor has already pulled the workspace out;
# mirror it back to the body for the verbatim forward.
if "workspace" in resource:
body["workspace"] = resource["workspace"]
async def responder(x, fin):
pass
try:
resp = await self.dispatcher.process(
body, responder, request.match_info,
)
except web.HTTPException:
raise
except Exception as e:
logger.error(f"Exception: {e}", exc_info=True)
return web.json_response({"error": str(e)}, status=500)
return web.json_response(resp)
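The key join and fail-closed lookup above can be sketched with a toy registry. `TOY_REGISTRY` and `resolve` are illustrative names; the real table lives in `registry.py`:

```python
# Toy sketch of RegistryRoutedVariableEndpoint's key join and
# fail-closed lookup. TOY_REGISTRY is illustrative only.
TOY_REGISTRY = {
    "config:get": "config:read",    # <kind>:<operation> -> capability
    "config:put": "config:write",
}

def resolve(kind: str, body: dict):
    """Return (status, detail) the way the endpoint would."""
    op_name = body.get("operation", "")
    if not op_name:
        return 400, "missing operation"
    capability = TOY_REGISTRY.get(f"{kind}:{op_name}")
    if capability is None:
        return 400, "unknown operation"    # fail closed
    return 200, capability

assert resolve("config", {"operation": "get"}) == (200, "config:read")
assert resolve("config", {"operation": "zap"}) == (400, "unknown operation")
assert resolve("config", {}) == (400, "missing operation")
```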


@@ -5,7 +5,7 @@ import logging
from .. running import Running
from .. capabilities import (
PUBLIC, AUTHENTICATED, auth_failure,
)
logger = logging.getLogger("socket")
@@ -97,8 +97,12 @@ class SocketEndpoint:
except web.HTTPException as e:
return e
if self.capability != AUTHENTICATED:
try:
await self.auth.authorise(
identity, self.capability, {}, {},
)
except web.HTTPException as e:
return e
# 50MB max message size
ws = web.WebSocketResponse(max_msg_size=52428800)
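The gating pattern above (a single authorise() call that raises on denial, with the endpoint never inspecting roles itself) can be sketched with a hypothetical auth stub. `AuthStub`, `Forbidden`, and `gate` are stand-ins, not the gateway's real types:

```python
import asyncio

# Toy mirror of the endpoint-side gating pattern: the regime
# either allows or raises; the endpoint holds no policy state.
class Forbidden(Exception):
    pass

class AuthStub:
    def __init__(self, allowed):
        self.allowed = allowed          # set of (identity, capability)

    async def authorise(self, identity, capability, resource, parameters):
        # Fail closed: denial is an exception, not a return value.
        if (identity, capability) not in self.allowed:
            raise Forbidden()

async def gate(auth, identity, capability):
    try:
        await auth.authorise(identity, capability, {}, {})
        return "open"
    except Forbidden:
        return "denied"

auth = AuthStub({("alice", "graph:read")})
assert asyncio.run(gate(auth, "alice", "graph:read")) == "open"
assert asyncio.run(gate(auth, "alice", "graph:write")) == "denied"
```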


@@ -36,7 +36,7 @@ class VariableEndpoint:
data = await request.json()
if identity is not None:
await enforce_workspace(data, identity, self.auth)
async def responder(x, fin):
pass


@@ -0,0 +1,515 @@
"""
Gateway operation registry.
Single declarative table mapping each operation the gateway
recognises to:
- The capability the IAM regime is asked to authorise against.
- The resource level (system / workspace / flow), which
determines the shape of the resource identifier handed to
``authorise``.
- Extractors that build the resource and parameters from the
request context.
This is a gateway-internal concept. It is not part of the IAM
contract: the contract specifies what arguments ``authorise``
receives; the registry is how the gateway populates them.
See docs/tech-specs/iam-contract.md for the contract and
docs/tech-specs/iam.md for the request anatomy.
"""
from dataclasses import dataclass, field
from typing import Any, Callable
# Sentinels for operations that don't go through capability-based
# authorisation. Mirror the values used in capabilities.py so the
# gateway endpoint layer can recognise them uniformly.
PUBLIC = "__public__"
AUTHENTICATED = "__authenticated__"
class ResourceLevel:
"""Where the operation's resource lives.
``SYSTEM`` operation acts on a deployment-level resource
(the user registry, the workspace registry,
the signing key). resource = {}. Workspace,
if relevant, is a parameter, not an address.
``WORKSPACE`` operation acts on something within a workspace
(config, library, knowledge, collections, flow
lifecycle). resource = {workspace}.
``FLOW`` operation acts on something within a flow
within a workspace (graph, agent, llm, etc.).
resource = {workspace, flow}.
"""
SYSTEM = "system"
WORKSPACE = "workspace"
FLOW = "flow"
@dataclass
class RequestContext:
"""The bundle of inputs the registry's extractors operate on.
Assembled by the gateway from the incoming request after
authentication."""
# Parsed JSON body (HTTP) or inner request payload (WebSocket).
body: dict = field(default_factory=dict)
# URL path components (HTTP) or WebSocket envelope routing
# fields (id, service, workspace, flow).
match_info: dict = field(default_factory=dict)
# Authenticated identity for default-fill-in. Always present
# by the time extractors run, except for PUBLIC operations
# where it is None.
identity: Any = None
@dataclass
class Operation:
"""Declared operation the gateway can dispatch + authorise."""
# Canonical operation name (used for registry lookup, audit,
# debug logs). Mirrors the operation strings in the IAM
# service and other backends where applicable.
name: str
# Capability required to invoke this operation. Either a
# string from the capability vocabulary in capabilities.md, or
# the PUBLIC / AUTHENTICATED sentinel for operations that
# don't go through capability-based authorisation.
capability: str
# Where the operation's resource lives. Determines the
# shape of the resource argument passed to authorise.
resource_level: str
# Build the resource identifier from the request context.
# Returns a dict with the appropriate components for the
# resource level: {} for SYSTEM, {workspace} for WORKSPACE,
# {workspace, flow} for FLOW. Default-fill-in of workspace
# from identity.workspace happens here when applicable.
extract_resource: Callable[[RequestContext], dict]
# Build the parameters dict — decision-relevant fields the
# operation supplied that are not part of the resource
# address. E.g. workspace association on a system-level
# user-registry operation.
extract_parameters: Callable[[RequestContext], dict]
# ---------------------------------------------------------------------------
# Registry storage.
# ---------------------------------------------------------------------------
_REGISTRY: dict[str, Operation] = {}
def register(op: Operation) -> None:
if op.name in _REGISTRY:
raise RuntimeError(
f"operation {op.name!r} already registered"
)
_REGISTRY[op.name] = op
def lookup(name: str) -> Operation | None:
return _REGISTRY.get(name)
def all_operations() -> list[Operation]:
return list(_REGISTRY.values())
# ---------------------------------------------------------------------------
# Common extractor helpers.
# ---------------------------------------------------------------------------
def _empty_resource(_ctx: RequestContext) -> dict:
"""System-level resource: empty dict."""
return {}
def _workspace_from_body(ctx: RequestContext) -> dict:
"""Workspace-level resource sourced from the request body's
workspace field, defaulting to the caller's bound workspace."""
ws = (ctx.body.get("workspace") if isinstance(ctx.body, dict) else "")
if not ws and ctx.identity is not None:
ws = ctx.identity.workspace
return {"workspace": ws}
def _flow_from_match_info(ctx: RequestContext) -> dict:
"""Flow-level resource sourced from URL path components or WS
envelope fields. Both ``workspace`` and ``flow`` are required;
no default-fill-in (the address is the operation's identity)."""
return {
"workspace": ctx.match_info.get("workspace", ""),
"flow": ctx.match_info.get("flow", ""),
}
def _no_parameters(_ctx: RequestContext) -> dict:
return {}
def _body_as_parameters(ctx: RequestContext) -> dict:
"""All body fields are parameters — used when the operation's
body is small and uniformly decision-relevant (e.g.
user-registry ops where the body's user.workspace is what the
regime checks against the admin's scope)."""
return dict(ctx.body) if isinstance(ctx.body, dict) else {}
def _workspace_param_only(ctx: RequestContext) -> dict:
"""Parameters dict carrying only the workspace association.
Used by system-level operations (e.g. user-registry ops) where
the workspace isn't part of the resource address but is the
field the regime uses to scope the admin's authority.
Pulls the workspace from the inner ``user`` / ``workspace_record``
body field if present (create-user, create-workspace), then from
the top-level body, then from the caller's bound workspace."""
body = ctx.body if isinstance(ctx.body, dict) else {}
inner_user = body.get("user") if isinstance(body.get("user"), dict) else {}
inner_ws = (
body.get("workspace_record")
if isinstance(body.get("workspace_record"), dict) else {}
)
ws = (
inner_user.get("workspace")
or inner_ws.get("id")
or body.get("workspace")
)
if not ws and ctx.identity is not None:
ws = ctx.identity.workspace
return {"workspace": ws or ""}
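The fallback chain in _workspace_param_only can be restated as a self-contained sketch (same precedence, toy inputs; `workspace_param` is an illustrative name):

```python
# Self-contained copy of the precedence: inner user.workspace,
# then workspace_record.id, then top-level workspace, then the
# caller's bound workspace, then "".
def workspace_param(body: dict, identity_workspace) -> dict:
    inner_user = body.get("user") if isinstance(body.get("user"), dict) else {}
    inner_ws = (
        body.get("workspace_record")
        if isinstance(body.get("workspace_record"), dict) else {}
    )
    ws = (
        inner_user.get("workspace")
        or inner_ws.get("id")
        or body.get("workspace")
    )
    if not ws and identity_workspace is not None:
        ws = identity_workspace
    return {"workspace": ws or ""}

assert workspace_param({"user": {"workspace": "w1"}, "workspace": "w2"}, "w3") \
    == {"workspace": "w1"}
assert workspace_param({"workspace_record": {"id": "w2"}}, "w3") \
    == {"workspace": "w2"}
assert workspace_param({}, "w3") == {"workspace": "w3"}
assert workspace_param({}, None) == {"workspace": ""}
```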
# ---------------------------------------------------------------------------
# Operation registrations.
#
# The gateway looks operations up by their canonical name (the same
# string the request body / WS envelope carries in its ``operation``
# field where applicable). Auth-surface operations (login, bootstrap,
# change-password) are routed in auth_endpoints.py and use the
# PUBLIC / AUTHENTICATED sentinels; they are still registered below
# so the registry remains the single lookup surface. Pure
# gateway↔IAM internal operations (resolve-api-key, authorise,
# authorise-many, get-signing-key-public) are excluded entirely;
# they are never invoked over the public API.
# ---------------------------------------------------------------------------
# IAM management operations. All routed through /api/v1/iam, body
# carries ``operation`` plus operation-specific fields.
# User registry: SYSTEM-level resource (users are global, identified
# by handle). The admin's authority is scoped per workspace via the
# parameters {workspace} field — that's what the regime checks
# against the admin's role workspace_scope.
register(Operation(
name="create-user",
capability="users:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="list-users",
capability="users:read",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="get-user",
capability="users:read",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="update-user",
capability="users:write",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="disable-user",
capability="users:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="enable-user",
capability="users:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="delete-user",
capability="users:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="reset-password",
capability="users:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
# API keys: workspace-level resource — keys live within a workspace.
register(Operation(
name="create-api-key",
capability="keys:admin",
resource_level=ResourceLevel.WORKSPACE,
extract_resource=_workspace_from_body,
extract_parameters=_no_parameters,
))
register(Operation(
name="list-api-keys",
capability="keys:admin",
resource_level=ResourceLevel.WORKSPACE,
extract_resource=_workspace_from_body,
extract_parameters=_no_parameters,
))
register(Operation(
name="revoke-api-key",
capability="keys:admin",
resource_level=ResourceLevel.WORKSPACE,
extract_resource=_workspace_from_body,
extract_parameters=_no_parameters,
))
# Workspace registry: SYSTEM-level resource (workspaces are the
# top-level addressable unit). No parameters — the workspace being
# acted on is identified by the body, not used as a scope cue.
register(Operation(
name="create-workspace",
capability="workspaces:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="list-workspaces",
capability="workspaces:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="get-workspace",
capability="workspaces:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="update-workspace",
capability="workspaces:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="disable-workspace",
capability="workspaces:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
# Signing key: SYSTEM-level operational op.
register(Operation(
name="rotate-signing-key",
capability="iam:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
# ---------------------------------------------------------------------------
# Auth-surface entries.
#
# Listed here so the registry is the one place the gateway looks for
# operation→capability mappings — including the sentinels for paths
# that don't go through capability-based authorisation. The actual
# routing is in auth_endpoints.py; these entries let the registry-
# driven dispatcher recognise the operation if it sees it on a
# generic path.
# ---------------------------------------------------------------------------
register(Operation(
name="login",
capability=PUBLIC,
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="bootstrap",
capability=PUBLIC,
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="change-password",
capability=AUTHENTICATED,
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
# ---------------------------------------------------------------------------
# Generic kind/operation entries.
#
# Names are ``<kind>:<operation>`` so the registry key is unique
# across dispatchers. All entries below are workspace-level
# resources (workspace defaulted from the caller's bound workspace
# if absent). Read/write distinction maps to the existing
# ``<subject>:read`` / ``<subject>:write`` capability vocabulary
# defined in capabilities.md.
# ---------------------------------------------------------------------------
def _register_kind_op(kind: str, op: str, capability: str) -> None:
"""Helper: register a workspace-level kind:op with the standard
extractors (workspace from body, no extra parameters)."""
register(Operation(
name=f"{kind}:{op}",
capability=capability,
resource_level=ResourceLevel.WORKSPACE,
extract_resource=_workspace_from_body,
extract_parameters=_no_parameters,
))
# config: KV-style workspace config service.
for _op in ("get", "list", "getvalues", "getvalues-all-ws", "config"):
_register_kind_op("config", _op, "config:read")
for _op in ("put", "delete"):
_register_kind_op("config", _op, "config:write")
# flow: flow-blueprint and flow-lifecycle service.
for _op in ("list-blueprints", "get-blueprint", "list-flows", "get-flow"):
_register_kind_op("flow", _op, "flows:read")
for _op in ("put-blueprint", "delete-blueprint", "start-flow", "stop-flow"):
_register_kind_op("flow", _op, "flows:write")
# librarian: document storage and processing service.
for _op in (
"get-document-metadata", "get-document-content",
"stream-document", "list-documents", "list-processing",
"get-upload-status", "list-uploads",
):
_register_kind_op("librarian", _op, "documents:read")
for _op in (
"add-document", "remove-document", "update-document",
"add-processing", "remove-processing",
"begin-upload", "upload-chunk", "complete-upload", "abort-upload",
):
_register_kind_op("librarian", _op, "documents:write")
# knowledge: knowledge-graph core service.
for _op in ("get-kg-core", "list-kg-cores"):
_register_kind_op("knowledge", _op, "knowledge:read")
for _op in ("put-kg-core", "delete-kg-core",
"load-kg-core", "unload-kg-core"):
_register_kind_op("knowledge", _op, "knowledge:write")
# collection-management: workspace collection lifecycle.
_register_kind_op("collection-management", "list-collections", "collections:read")
for _op in ("update-collection", "delete-collection"):
_register_kind_op("collection-management", _op, "collections:write")
# ---------------------------------------------------------------------------
# Per-flow data-plane services.
#
# /api/v1/flow/{flow}/service/{kind} and the streaming
# /api/v1/flow/{flow}/{import,export}/{kind} paths. No body-level
# ``operation`` discriminator — the URL kind is the operation
# identity. Resource is FLOW level (workspace + flow).
#
# Names: ``flow-service:<kind>``, ``flow-import:<kind>``,
# ``flow-export:<kind>``.
# ---------------------------------------------------------------------------
def _register_flow_kind(prefix: str, kind: str, capability: str) -> None:
register(Operation(
name=f"{prefix}:{kind}",
capability=capability,
resource_level=ResourceLevel.FLOW,
extract_resource=_flow_from_match_info,
extract_parameters=_no_parameters,
))
# Request/response services on /api/v1/flow/{flow}/service/{kind}.
_FLOW_SERVICES = {
"agent": "agent",
"text-completion": "llm",
"prompt": "llm",
"mcp-tool": "mcp",
"graph-rag": "graph:read",
"document-rag": "documents:read",
"embeddings": "embeddings",
"graph-embeddings": "graph:read",
"document-embeddings": "documents:read",
"triples": "graph:read",
"rows": "rows:read",
"nlp-query": "rows:read",
"structured-query": "rows:read",
"structured-diag": "rows:read",
"row-embeddings": "rows:read",
"sparql": "graph:read",
}
for _kind, _cap in _FLOW_SERVICES.items():
_register_flow_kind("flow-service", _kind, _cap)
# Streaming import socket endpoints.
_FLOW_IMPORTS = {
"triples": "graph:write",
"graph-embeddings": "graph:write",
"document-embeddings": "documents:write",
"entity-contexts": "documents:write",
"rows": "rows:write",
}
for _kind, _cap in _FLOW_IMPORTS.items():
_register_flow_kind("flow-import", _kind, _cap)
# Streaming export socket endpoints.
_FLOW_EXPORTS = {
"triples": "graph:read",
"graph-embeddings": "graph:read",
"document-embeddings": "documents:read",
"entity-contexts": "documents:read",
}
for _kind, _cap in _FLOW_EXPORTS.items():
_register_flow_kind("flow-export", _kind, _cap)
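How a per-flow call maps to a `<prefix>:<kind>` registry key and a FLOW-level resource can be sketched with a toy capability table (an illustrative subset of the registrations above; `flow_check` is a hypothetical helper):

```python
# Toy mirror of the flow naming and FLOW-level resource shape:
# the URL (or WS envelope) supplies workspace and flow, and the
# registry key is "<prefix>:<kind>". Unknown kinds fail closed.
FLOW_CAPS = {
    "flow-service:triples": "graph:read",
    "flow-import:triples": "graph:write",
}

def flow_check(prefix: str, kind: str, match_info: dict):
    key = f"{prefix}:{kind}"
    capability = FLOW_CAPS.get(key)
    if capability is None:
        return None                      # unknown kind: fail closed
    resource = {                         # FLOW level: workspace + flow
        "workspace": match_info.get("workspace", ""),
        "flow": match_info.get("flow", ""),
    }
    return key, capability, resource

assert flow_check("flow-service", "triples",
                  {"workspace": "w1", "flow": "f1"}) == (
    "flow-service:triples", "graph:read",
    {"workspace": "w1", "flow": "f1"})
assert flow_check("flow-export", "triples", {}) is None
```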


@@ -40,6 +40,78 @@ API_KEY_RANDOM_BYTES = 24
JWT_ISSUER = "trustgraph-iam"
JWT_TTL_SECONDS = 3600
# Default authorisation cache TTL the regime tells the gateway to
# observe. 60s is the OSS-spec maximum revocation latency: a role
# change, workspace disable, or key revoke takes effect within at
# most this much time.
AUTHZ_CACHE_TTL_SECONDS = 60
# OSS regime role table. Lives here, not in the gateway — the
# gateway is regime-agnostic and must not encode policy.
#
# Each role has a capability set and a workspace scope. The
# evaluator (handle_authorise below) checks (a) that some role
# held by the caller grants the requested capability, and (b)
# that role's workspace scope permits the target workspace.
_READER_CAPS = {
"agent",
"graph:read",
"documents:read",
"rows:read",
"llm",
"embeddings",
"mcp",
"config:read",
"flows:read",
"collections:read",
"knowledge:read",
"keys:self",
}
_WRITER_CAPS = _READER_CAPS | {
"graph:write",
"documents:write",
"rows:write",
"collections:write",
"knowledge:write",
}
_ADMIN_CAPS = _WRITER_CAPS | {
"config:write",
"flows:write",
"users:read", "users:write", "users:admin",
"keys:admin",
"workspaces:admin",
"iam:admin",
"metrics:read",
}
ROLE_DEFINITIONS = {
"reader": {
"capabilities": _READER_CAPS,
"workspace_scope": "assigned",
},
"writer": {
"capabilities": _WRITER_CAPS,
"workspace_scope": "assigned",
},
"admin": {
"capabilities": _ADMIN_CAPS,
"workspace_scope": "*",
},
}
def _scope_permits(role_scope, target_workspace, assigned_workspace):
"""Does the given role apply to ``target_workspace``?"""
if role_scope == "*":
return True
if role_scope == "assigned":
return target_workspace == assigned_workspace
return False
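The scope rule is small enough to restate and exercise directly (a self-contained copy for illustration; `scope_permits` mirrors `_scope_permits` above):

```python
# Self-contained restatement of the workspace-scope rule:
# "*" reaches every workspace, "assigned" only the caller's
# bound workspace, anything else fails closed.
def scope_permits(role_scope, target_workspace, assigned_workspace):
    if role_scope == "*":
        return True
    if role_scope == "assigned":
        return target_workspace == assigned_workspace
    return False

assert scope_permits("*", "any-ws", "home-ws")          # admin scope
assert scope_permits("assigned", "home-ws", "home-ws")  # own workspace
assert not scope_permits("assigned", "other-ws", "home-ws")
assert not scope_permits("bogus", "home-ws", "home-ws") # fail closed
```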
def _now_iso():
return datetime.datetime.now(datetime.timezone.utc).isoformat()
@@ -250,6 +322,10 @@ class IamService:
return await self.handle_disable_workspace(v)
if op == "rotate-signing-key":
return await self.handle_rotate_signing_key(v)
if op == "authorise":
return await self.handle_authorise(v)
if op == "authorise-many":
return await self.handle_authorise_many(v)
return _err(
"invalid-argument",
@@ -478,7 +554,7 @@ class IamService:
(
id, ws, _username, _name, _email, password_hash,
_roles, enabled, _mcp, _created,
) = user_row
if not enabled:
@@ -496,11 +572,14 @@ class IamService:
now_ts = int(_now_dt().timestamp())
exp_ts = now_ts + JWT_TTL_SECONDS
# Per the IAM contract the gateway never reads policy state
# from the credential — roles stay server-side, reachable
# only via authorise(). JWT carries identity + workspace
# binding only.
claims = {
"iss": JWT_ISSUER,
"sub": id,
"workspace": ws,
"iat": now_ts,
"exp": exp_ts,
}
@@ -1130,3 +1209,134 @@ class IamService:
await self.table_store.delete_api_key(key_hash)
return IamResponse()
# ------------------------------------------------------------------
# authorise / authorise-many
#
# The IAM contract (see docs/tech-specs/iam-contract.md) calls
# for the regime — not the gateway — to decide whether an
# identity may perform a capability on a resource given the
# operation's parameters. These two operations are the OSS
# regime's implementation of that contract.
#
# Inputs (on IamRequest):
# user_id — the identity handle (the gateway's
# opaque reference). For OSS this is the
# user record's id.
# capability — the capability string from the
# capabilities.md vocabulary.
# resource_json — JSON dict, the resource address
# ({} for system, {workspace} for
# workspace, {workspace, flow} for flow).
# parameters_json — JSON dict, decision-relevant operation
# parameters (e.g. workspace association
# on user-registry operations).
# authorise_checks — for authorise-many, a JSON list of
# {capability, resource, parameters}.
#
# Outputs (on IamResponse):
# decision_allow — single allow / deny verdict.
# decision_ttl_seconds — gateway cache TTL for this
# decision.
# decisions_json — for authorise-many, list of
# {allow, ttl} in request order.
# ------------------------------------------------------------------
def _decide(self, user_row, capability, resource, parameters):
"""Single authorisation decision. Returns (allow, ttl)."""
if user_row is None:
return False, AUTHZ_CACHE_TTL_SECONDS
# user_row layout:
# 0:id 1:workspace 2:username 3:name 4:email 5:password_hash
# 6:roles 7:enabled 8:must_change_password 9:created
if not user_row[7]: # disabled
return False, AUTHZ_CACHE_TTL_SECONDS
# Work out the target workspace for scope checks: the resource
# address first, then the parameters (system-level operations
# carry a workspace association only as a parameter).
# Operations with no target workspace bypass workspace scoping
# in the role walk below.
target_workspace = (
(resource or {}).get("workspace")
or (parameters or {}).get("workspace")
)
roles = user_row[6] or set()
assigned_workspace = user_row[1]
for role_name in roles:
defn = ROLE_DEFINITIONS.get(role_name)
if defn is None:
continue
if capability not in defn["capabilities"]:
continue
if target_workspace is None or _scope_permits(
defn["workspace_scope"],
target_workspace,
assigned_workspace,
):
return True, AUTHZ_CACHE_TTL_SECONDS
return False, AUTHZ_CACHE_TTL_SECONDS
async def handle_authorise(self, v):
if not v.capability:
return _err("invalid-argument", "capability required")
if not v.user_id:
return _err("invalid-argument", "user_id (handle) required")
try:
resource = json.loads(v.resource_json or "{}")
parameters = json.loads(v.parameters_json or "{}")
except json.JSONDecodeError as e:
return _err("invalid-argument", f"bad json: {e}")
user_row = await self.table_store.get_user(v.user_id)
allow, ttl = self._decide(
user_row, v.capability, resource, parameters,
)
return IamResponse(
decision_allow=allow,
decision_ttl_seconds=ttl,
)
async def handle_authorise_many(self, v):
if not v.user_id:
return _err("invalid-argument", "user_id (handle) required")
if not v.authorise_checks:
return _err("invalid-argument", "authorise_checks required")
try:
checks = json.loads(v.authorise_checks)
except json.JSONDecodeError as e:
return _err("invalid-argument", f"bad json: {e}")
if not isinstance(checks, list):
return _err(
"invalid-argument",
"authorise_checks must be a JSON list",
)
# One user lookup for the whole batch.
user_row = await self.table_store.get_user(v.user_id)
decisions = []
for c in checks:
if not isinstance(c, dict):
decisions.append({
"allow": False,
"ttl": AUTHZ_CACHE_TTL_SECONDS,
})
continue
allow, ttl = self._decide(
user_row,
c.get("capability", ""),
c.get("resource") or {},
c.get("parameters") or {},
)
decisions.append({"allow": allow, "ttl": ttl})
return IamResponse(decisions_json=json.dumps(decisions))
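The role walk in _decide can be illustrated with a toy role table (hypothetical capability sets; the real ones are ROLE_DEFINITIONS above, and `decide` is an illustrative reduction):

```python
# Toy reimplementation of the _decide role walk: allow if any
# held role grants the capability AND its scope permits the
# target workspace. System-level operations pass target_ws=None
# and skip workspace scoping.
ROLES = {
    "reader": {"capabilities": {"config:read"},
               "workspace_scope": "assigned"},
    "admin": {"capabilities": {"config:read", "users:admin"},
              "workspace_scope": "*"},
}

def decide(roles, assigned_ws, capability, target_ws):
    for name in roles:
        defn = ROLES.get(name)
        if defn is None or capability not in defn["capabilities"]:
            continue
        scope = defn["workspace_scope"]
        if target_ws is None or scope == "*" or (
                scope == "assigned" and target_ws == assigned_ws):
            return True
    return False          # no role matched: deny

assert decide({"reader"}, "w1", "config:read", "w1")      # own workspace
assert not decide({"reader"}, "w1", "config:read", "w2")  # scope denies
assert not decide({"reader"}, "w1", "users:admin", "w1")  # capability denies
assert decide({"admin"}, "w1", "users:admin", "w2")       # wildcard scope
```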