refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)

The gateway no longer holds any policy state — capability sets, role definitions, workspace scope rules. Per the IAM contract it asks the regime "may this identity perform this capability on this resource?" per request. That moves the OSS role-based regime entirely into iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing the gateway, the wire protocol, or backend services. Contract: - authenticate(credential) -> Identity (handle, workspace, principal_id, source). No roles, claims, or policy state surface to the gateway. - authorise(identity, capability, resource, parameters) -> (allow, ttl). Cached per-decision (regime TTL clamped above; fail-closed on regime errors). - authorise_many available as a fan-out variant. Operation registry drives every authorisation decision: - /api/v1/iam -> IamEndpoint, looks up bare op name (create-user, list-workspaces, ...). - /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op> (config:get, flow:list-blueprints, librarian:add-document, ...). - /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>. - /api/v1/flow/{flow}/{import,export}/{kind} -> flow-{import,export}:<kind>. - WS Mux per-frame -> flow-service:<kind>; closes a gap where authenticated users could hit any service kind. 85 operations registered across the surface. JWT carries identity only — sub + workspace. The roles claim is gone; the gateway never reads policy state from a credential. The three coarse *_KIND_CAPABILITY maps are removed. The registry is the only source of truth for the capability + resource shape of an operation. Tests migrated to the new Identity shape and to authorise()-mocked auth doubles. Specs updated: docs/tech-specs/iam-contract.md (Identity surface, caching, registry-naming conventions), iam.md (JWT shape, gateway flow, role section reframed as OSS-regime detail), iam-protocol.md (positioned as one implementation of the contract).
2026-04-30 02:46:23 +02:00 · 2026-04-28 16:19:41 +01:00 · 2026-04-28 16:19:41 +01:00 · 5e28d3cce0
commit 5e28d3cce0
parent 9f2d9adcb1
24 changed files with 2359 additions and 587 deletions
--- a/docs/tech-specs/capabilities.md
+++ b/docs/tech-specs/capabilities.md
@ -8,22 +8,41 @@ parent: "Tech Specs"

 ## Overview

-Authorisation in TrustGraph is **capability-based**. Every gateway
-endpoint maps to exactly one *capability*; a user's roles each grant
-a set of capabilities; an authenticated request is permitted when
-the required capability is a member of the union of the caller's
-role capability sets.
+Every gateway endpoint maps to exactly one *capability* — a string
+from a closed vocabulary defined in this document.  When the
+gateway authorises a request, it hands the IAM regime four things:
+the authenticated identity, the required capability, the
+operation's resource (the structured identifier of what's being
+operated on), and the operation's parameters.  The IAM regime
+decides allow or deny; see the [IAM contract](iam-contract.md) for
+the full abstraction.

-This document defines the capability vocabulary — the closed list
-of capability strings that the gateway recognises — and the
-open-source edition's role bundles.
+A capability is a **permission**, not a structural classification.
+`graph:read` says "the caller may read graphs"; it does not say
+where graphs live or how they are addressed.  The shape of a
+request — whether workspace appears in the URL, the envelope, or
+the body, and whether it is a resource address component or an
+operation parameter — is determined by what the operation operates
+on, not by what permission it requires.  Permission and structure
+are orthogonal; the contract takes both.

-The capability mechanism is shared between open-source and potential
-3rd party enterprise capability. The open-source edition ships a
-fixed three-role bundle (`reader`, `writer`, `admin`). Enterprise
-capability may define additional roles by composing their own
-capability bundles from the same vocabulary; no protocol, gateway,
-or backend-service change is required.
+This document defines:
+
+- The **capability vocabulary** — the closed list of capability
+  strings the gateway uses as input to `authorise`.  All IAM
+  regimes share this vocabulary; that's the only schema the
+  gateway and the IAM regime have to agree on.
+- The **open-source role bundles** — the role-and-scope table the
+  OSS IAM regime uses to answer `authorise` calls.  Other regimes
+  answer the same call differently; the bundles below are an
+  OSS-specific implementation detail, not a contract assertion.
+
+A regime may evaluate `authorise` using role bundles (OSS), IdP
+group memberships, attribute-based policies, relationship tuples,
+or any other mechanism.  The gateway is unaware of which.  The
+capability strings — and the resource component vocabulary the
+gateway populates alongside them — are the only thing both sides
+have to agree on.

 ## Motivation

@ -113,19 +132,50 @@ granting `llm` expresses exactly that. An administrator granting
 `agent` should treat it as a grant of everything the agent
 composes at deployment time.

-### Authorisation evaluation
+### Authorisation evaluation (OSS regime)
+
+This section describes how the OSS IAM regime answers
+`authorise(identity, capability, resource, parameters)`.  Other
+regimes answer the same contract differently; only the inputs (the
+capability vocabulary, the resource components, the parameter
+shape) are shared.

 For a request bearing a resolved set of roles
-`R = {r1, r2, ...}` against an endpoint that requires capability
-`c`:
+`R = {r1, r2, ...}`, a required capability `c`, a resource, and
+parameters:

 ```
-allow if c IN union(bundle(r) for r in R)
+let target_workspace =
+        resource.workspace                  (workspace-/flow-level resources)
+        or parameters.workspace             (system-level resources whose
+                                             parameters reference a workspace)
+        or unset                            (system-level operations with no
+                                             workspace context)
+
+allow if some role r in R has c in its capability bundle
+        and (target_workspace is unset
+             or r's workspace_scope permits target_workspace)
 ```

-No hierarchy, no precedence, no role-order sensitivity. A user
+The OSS regime considers workspace from whichever role it plays in
+the operation:
+
+- For workspace-level and flow-level resources, the workspace lives
+  in `resource.workspace` and that is what the role's scope is
+  checked against.
+- For system-level resources whose operation parameters reference a
+  workspace (e.g. `create-user with workspace association W`),
+  workspace lives in `parameters.workspace` and that is what the
+  role's scope is checked against.  The resource is system-level
+  (`resource = {}`) but the workspace constraint still bites.
+- For system-level operations with no workspace context (e.g.
+  `bootstrap`, `rotate-signing-key`), the workspace-scope check
+  collapses — only capability-bundle membership matters.
+
+No hierarchy, no precedence, no role-order sensitivity.  A user
 with a single role is the common case; a user with multiple roles
-gets the union of their bundles.
+is allowed if any role independently grants both the capability
+and the relevant workspace scope.

 ### Enforcement boundary

@ -214,5 +264,10 @@ ships that feature.

 ## References

+- [IAM Contract Specification](iam-contract.md) — the abstract
+  gateway↔IAM regime contract; capability strings are inputs to
+  `authorise`.
 - [Identity and Access Management Specification](iam.md)
+- [IAM Service Protocol Specification](iam-protocol.md) — the OSS
+  regime's wire-level protocol.
 - [Architecture Principles](architecture-principles.md)