refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)
The gateway no longer holds any policy state — capability sets, role
definitions, workspace scope rules. Per the IAM contract it asks the
regime "may this identity perform this capability on this resource?"
per request. That moves the OSS role-based regime entirely into
iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing
the gateway, the wire protocol, or backend services.
Contract:
- authenticate(credential) -> Identity (handle, workspace,
principal_id, source). No roles, claims, or policy state surface
to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
ttl). Cached per-decision (regime TTL clamped above; fail-closed
on regime errors).
- authorise_many available as a fan-out variant.
Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
(config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
authenticated users could hit any service kind.
85 operations registered across the surface.
JWT carries identity only — sub + workspace. The roles claim is gone;
the gateway never reads policy state from a credential.
The three coarse *_KIND_CAPABILITY maps are removed. The registry is
the only source of truth for the capability + resource shape of an
operation. Tests migrated to the new Identity shape and to
authorise()-mocked auth doubles.
Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
2026-04-28 16:19:41 +01:00
---
layout: default
title: "IAM Contract Technical Specification"
parent: "Tech Specs"
---
# IAM Contract Technical Specification
## Overview
The IAM contract is the abstraction between the API gateway and any
identity / access management regime that fronts it. The gateway
treats IAM as a black box behind two operations — *authenticate* and
*authorise* — plus a small surface of management operations. No
regime-specific concept (roles, scopes, groups, claims, policy
languages) is visible to the gateway, and no gateway-specific
concept (capability vocabulary, request anatomy) is visible to
backend services.
The TrustGraph open-source distribution ships one IAM regime — a
role-based implementation defined in
[`iam-protocol.md` ](iam-protocol.md ) — that is one implementation of
this contract. Enterprise editions can replace it with a different
regime (OIDC / SSO, ABAC, ReBAC, external policy engine) without
changing the gateway, the wire protocol, or the backends.
## Motivation
Authorisation models vary by deployment. A small team might be
happy with three predefined roles; an enterprise might need group-
mapping from an upstream IdP, attribute-based policies, or
relationship-based access control. Hard-wiring any one of those
into the gateway forces every other regime to either compromise its
model or be re-implemented.
A narrow contract — "authenticate this credential" and "may this
identity perform this operation on this resource" — captures what
the gateway actually needs to know without committing to a policy
shape. The IAM regime owns the policy decision; the gateway is a
generic enforcement point.
## Operations
### `authenticate`
```
authenticate(credential: bytes) → Identity | AuthFailure
```
Validates a credential the client presented. The gateway treats
the credential as opaque bytes — for the OSS regime today that's
either an API key plaintext or a JWT, but the gateway does not
parse them; the IAM regime decides.
On success, returns an `Identity` . On any failure the IAM regime
returns the same opaque `AuthFailure` — never a description of which
condition failed. This is the spec's masked-error rule: an
attacker probing the endpoint cannot distinguish "no such key",
"expired", "wrong signature", "revoked", "user disabled", etc.
### `authorise`
```
authorise(identity: Identity,
capability: str,
resource: Resource,
parameters: dict)
→ Decision
```
Asks whether the identity is permitted to perform the named
capability on the named resource, given the operation's
parameters. Returns `allow` or `deny` . `identity` is whatever
`authenticate` returned for this caller; the gateway never
decomposes it.
The four arguments separate concerns:
- **`identity` ** — who is asking.
- **`capability` ** — what permission they are exercising (e.g.
`users:write` , `graph:read` ). Permission, not structure.
- **`resource` ** — what is being operated on, as a structured
identifier. See *The Resource model* below.
- **`parameters` ** — operation-specific data that the regime may
need to consider beyond the resource identifier. Used when a
iam: self-service ops, optional workspace filters, Mux service routing (#855)
Three threads, all reinforcing the contract's system-level vs.
workspace-association distinction.
WS Mux service routing
- tg-show-flows (and any workspace-level service over the WS) was
failing with "unknown service" because the post-refactor Mux
unconditionally looked up flow-service:<kind>. Now branches on
the envelope's flow field: with flow → flow-service:<kind>;
without flow → <kind>:<op> from the inner body; with bare op
lookup for service=iam. Resource and parameters come from the
matched op's own extractors — same path the HTTP endpoints take.
Optional workspace on system-level user/key ops
- list-users returns the deployment-wide list when no workspace is
supplied, filters when one is. get-user, update-user,
disable-user, enable-user, delete-user, reset-password,
create-api-key, list-api-keys, revoke-api-key all treat workspace
as an optional integrity check rather than a required argument.
- create-user keeps workspace required — there it's the new user's
home-workspace binding, a parameter rather than an address.
- API keys reclassified as SYSTEM-level resources. By the same
reasoning that makes users system-level, an API key is a
credential record on a deployment-wide registry; the workspace it
authenticates to is a property, not a containment.
Self-service surface
- whoami: returns the caller's own user record. AUTHENTICATED-only;
no users:read capability required. Foundation for UI affordances
that depend on the caller's permissions.
- bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC,
side-effect-free. Returns {bootstrap_available: bool} so a
first-run UI can decide whether to render setup without consuming
the bootstrap op.
- Gateway now injects actor=identity.handle on every authenticated
forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting
any caller-supplied value. Underpins whoami, audit logging, and
future regime-side decisions that need actor identity.
- tg-whoami and tg-update-user CLIs.
Spec polish
- iam-contract.md: actor-injection rule documented; whoami /
bootstrap-status added to operations list; permission-scope
framing tightened (workspace scope is a property of the grant,
not the user or role).
- iam.md: self-service section; gateway flow gains the actor-
injection step; role section reframed so iam-svc constraints
don't leak into contract-level prose.
- iam-protocol.md: ops table updated for whoami, bootstrap-status,
optional-workspace pattern; bootstrap_available added to the
IamResponse listing.
2026-04-28 22:13:12 +01:00
decision depends on attributes the request supplies — e.g.
creating a user *with workspace association W* : the resource is
the system-level user registry, and W is a parameter the regime
checks against the caller's permissions for `users:write` .
Different regimes use the four arguments differently — one regime
might evaluate role bundles whose grants carry workspace scope;
another might consult upstream IdP group memberships; an ABAC
regime evaluates a policy with all four as inputs. The contract
is unchanged.
refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)
The gateway no longer holds any policy state — capability sets, role
definitions, workspace scope rules. Per the IAM contract it asks the
regime "may this identity perform this capability on this resource?"
per request. That moves the OSS role-based regime entirely into
iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing
the gateway, the wire protocol, or backend services.
Contract:
- authenticate(credential) -> Identity (handle, workspace,
principal_id, source). No roles, claims, or policy state surface
to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
ttl). Cached per-decision (regime TTL clamped above; fail-closed
on regime errors).
- authorise_many available as a fan-out variant.
Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
(config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
authenticated users could hit any service kind.
85 operations registered across the surface.
JWT carries identity only — sub + workspace. The roles claim is gone;
the gateway never reads policy state from a credential.
The three coarse *_KIND_CAPABILITY maps are removed. The registry is
the only source of truth for the capability + resource shape of an
operation. Tests migrated to the new Identity shape and to
authorise()-mocked auth doubles.
Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
2026-04-28 16:19:41 +01:00
### `authorise_many`
```
authorise_many(identity: Identity,
checks: list[(str, Resource, dict)])
→ list[Decision]
```
Bulk variant of `authorise` . Same semantics, one round-trip for
many decisions. Used when an operation fans out to multiple
resources (e.g. an agent that touches several workspaces) and a
single permission check isn't sufficient.
`authorise_many` is not just a performance optimisation; it pins
the contract for fan-out operations early, before clients (or
internal callers) build patterns that assume one-permission-check-
per-request. Regimes implement it as a loop over `authorise`
unless they have a more efficient path.
### Management operations
Beyond the request-time `authenticate` / `authorise` , the contract
also covers identity-lifecycle and credential-lifecycle operations
that are invoked by administrative requests rather than by the
authentication path. These are regime-specific in detail (an SSO
regime that delegates user management to the IdP may not implement
most of them) but the operation set the gateway can forward is:
- User management: `create-user` , `list-users` , `get-user` ,
`update-user` , `disable-user` , `enable-user` , `delete-user`
- Credential management: `create-api-key` , `list-api-keys` ,
`revoke-api-key` , `change-password` , `reset-password`
- Workspace management: `create-workspace` , `list-workspaces` ,
`get-workspace` , `update-workspace` , `disable-workspace`
iam: self-service ops, optional workspace filters, Mux service routing (#855)
Three threads, all reinforcing the contract's system-level vs.
workspace-association distinction.
WS Mux service routing
- tg-show-flows (and any workspace-level service over the WS) was
failing with "unknown service" because the post-refactor Mux
unconditionally looked up flow-service:<kind>. Now branches on
the envelope's flow field: with flow → flow-service:<kind>;
without flow → <kind>:<op> from the inner body; with bare op
lookup for service=iam. Resource and parameters come from the
matched op's own extractors — same path the HTTP endpoints take.
Optional workspace on system-level user/key ops
- list-users returns the deployment-wide list when no workspace is
supplied, filters when one is. get-user, update-user,
disable-user, enable-user, delete-user, reset-password,
create-api-key, list-api-keys, revoke-api-key all treat workspace
as an optional integrity check rather than a required argument.
- create-user keeps workspace required — there it's the new user's
home-workspace binding, a parameter rather than an address.
- API keys reclassified as SYSTEM-level resources. By the same
reasoning that makes users system-level, an API key is a
credential record on a deployment-wide registry; the workspace it
authenticates to is a property, not a containment.
Self-service surface
- whoami: returns the caller's own user record. AUTHENTICATED-only;
no users:read capability required. Foundation for UI affordances
that depend on the caller's permissions.
- bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC,
side-effect-free. Returns {bootstrap_available: bool} so a
first-run UI can decide whether to render setup without consuming
the bootstrap op.
- Gateway now injects actor=identity.handle on every authenticated
forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting
any caller-supplied value. Underpins whoami, audit logging, and
future regime-side decisions that need actor identity.
- tg-whoami and tg-update-user CLIs.
Spec polish
- iam-contract.md: actor-injection rule documented; whoami /
bootstrap-status added to operations list; permission-scope
framing tightened (workspace scope is a property of the grant,
not the user or role).
- iam.md: self-service section; gateway flow gains the actor-
injection step; role section reframed so iam-svc constraints
don't leak into contract-level prose.
- iam-protocol.md: ops table updated for whoami, bootstrap-status,
optional-workspace pattern; bootstrap_available added to the
IamResponse listing.
2026-04-28 22:13:12 +01:00
- Session management: `login` , `whoami`
refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)
The gateway no longer holds any policy state — capability sets, role
definitions, workspace scope rules. Per the IAM contract it asks the
regime "may this identity perform this capability on this resource?"
per request. That moves the OSS role-based regime entirely into
iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing
the gateway, the wire protocol, or backend services.
Contract:
- authenticate(credential) -> Identity (handle, workspace,
principal_id, source). No roles, claims, or policy state surface
to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
ttl). Cached per-decision (regime TTL clamped above; fail-closed
on regime errors).
- authorise_many available as a fan-out variant.
Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
(config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
authenticated users could hit any service kind.
85 operations registered across the surface.
JWT carries identity only — sub + workspace. The roles claim is gone;
the gateway never reads policy state from a credential.
The three coarse *_KIND_CAPABILITY maps are removed. The registry is
the only source of truth for the capability + resource shape of an
operation. Tests migrated to the new Identity shape and to
authorise()-mocked auth doubles.
Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
2026-04-28 16:19:41 +01:00
- Key management: `get-signing-key-public` , `rotate-signing-key`
iam: self-service ops, optional workspace filters, Mux service routing (#855)
Three threads, all reinforcing the contract's system-level vs.
workspace-association distinction.
WS Mux service routing
- tg-show-flows (and any workspace-level service over the WS) was
failing with "unknown service" because the post-refactor Mux
unconditionally looked up flow-service:<kind>. Now branches on
the envelope's flow field: with flow → flow-service:<kind>;
without flow → <kind>:<op> from the inner body; with bare op
lookup for service=iam. Resource and parameters come from the
matched op's own extractors — same path the HTTP endpoints take.
Optional workspace on system-level user/key ops
- list-users returns the deployment-wide list when no workspace is
supplied, filters when one is. get-user, update-user,
disable-user, enable-user, delete-user, reset-password,
create-api-key, list-api-keys, revoke-api-key all treat workspace
as an optional integrity check rather than a required argument.
- create-user keeps workspace required — there it's the new user's
home-workspace binding, a parameter rather than an address.
- API keys reclassified as SYSTEM-level resources. By the same
reasoning that makes users system-level, an API key is a
credential record on a deployment-wide registry; the workspace it
authenticates to is a property, not a containment.
Self-service surface
- whoami: returns the caller's own user record. AUTHENTICATED-only;
no users:read capability required. Foundation for UI affordances
that depend on the caller's permissions.
- bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC,
side-effect-free. Returns {bootstrap_available: bool} so a
first-run UI can decide whether to render setup without consuming
the bootstrap op.
- Gateway now injects actor=identity.handle on every authenticated
forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting
any caller-supplied value. Underpins whoami, audit logging, and
future regime-side decisions that need actor identity.
- tg-whoami and tg-update-user CLIs.
Spec polish
- iam-contract.md: actor-injection rule documented; whoami /
bootstrap-status added to operations list; permission-scope
framing tightened (workspace scope is a property of the grant,
not the user or role).
- iam.md: self-service section; gateway flow gains the actor-
injection step; role section reframed so iam-svc constraints
don't leak into contract-level prose.
- iam-protocol.md: ops table updated for whoami, bootstrap-status,
optional-workspace pattern; bootstrap_available added to the
IamResponse listing.
2026-04-28 22:13:12 +01:00
- Bootstrap: `bootstrap` , `bootstrap-status`
`whoami` is the self-read counterpart to `get-user` : any
authenticated caller can read their own identity record without
holding a user-management capability. It is the gating-free probe
a UI uses to render affordances appropriate to the caller's role.
`bootstrap-status` is a side-effect-free probe of whether an
unconsumed `bootstrap` call would currently succeed. It exists so
a first-run UI can decide whether to render setup without invoking
the consuming `bootstrap` op. Public — no authentication.
refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)
The gateway no longer holds any policy state — capability sets, role
definitions, workspace scope rules. Per the IAM contract it asks the
regime "may this identity perform this capability on this resource?"
per request. That moves the OSS role-based regime entirely into
iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing
the gateway, the wire protocol, or backend services.
Contract:
- authenticate(credential) -> Identity (handle, workspace,
principal_id, source). No roles, claims, or policy state surface
to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
ttl). Cached per-decision (regime TTL clamped above; fail-closed
on regime errors).
- authorise_many available as a fan-out variant.
Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
(config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
authenticated users could hit any service kind.
85 operations registered across the surface.
JWT carries identity only — sub + workspace. The roles claim is gone;
the gateway never reads policy state from a credential.
The three coarse *_KIND_CAPABILITY maps are removed. The registry is
the only source of truth for the capability + resource shape of an
operation. Tests migrated to the new Identity shape and to
authorise()-mocked auth doubles.
Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
2026-04-28 16:19:41 +01:00
A regime that does not support one of these (e.g. an SSO regime
where users are managed in the IdP) returns a defined "not
supported" error; the gateway surfaces it as a 501.
iam: self-service ops, optional workspace filters, Mux service routing (#855)
Three threads, all reinforcing the contract's system-level vs.
workspace-association distinction.
WS Mux service routing
- tg-show-flows (and any workspace-level service over the WS) was
failing with "unknown service" because the post-refactor Mux
unconditionally looked up flow-service:<kind>. Now branches on
the envelope's flow field: with flow → flow-service:<kind>;
without flow → <kind>:<op> from the inner body; with bare op
lookup for service=iam. Resource and parameters come from the
matched op's own extractors — same path the HTTP endpoints take.
Optional workspace on system-level user/key ops
- list-users returns the deployment-wide list when no workspace is
supplied, filters when one is. get-user, update-user,
disable-user, enable-user, delete-user, reset-password,
create-api-key, list-api-keys, revoke-api-key all treat workspace
as an optional integrity check rather than a required argument.
- create-user keeps workspace required — there it's the new user's
home-workspace binding, a parameter rather than an address.
- API keys reclassified as SYSTEM-level resources. By the same
reasoning that makes users system-level, an API key is a
credential record on a deployment-wide registry; the workspace it
authenticates to is a property, not a containment.
Self-service surface
- whoami: returns the caller's own user record. AUTHENTICATED-only;
no users:read capability required. Foundation for UI affordances
that depend on the caller's permissions.
- bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC,
side-effect-free. Returns {bootstrap_available: bool} so a
first-run UI can decide whether to render setup without consuming
the bootstrap op.
- Gateway now injects actor=identity.handle on every authenticated
forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting
any caller-supplied value. Underpins whoami, audit logging, and
future regime-side decisions that need actor identity.
- tg-whoami and tg-update-user CLIs.
Spec polish
- iam-contract.md: actor-injection rule documented; whoami /
bootstrap-status added to operations list; permission-scope
framing tightened (workspace scope is a property of the grant,
not the user or role).
- iam.md: self-service section; gateway flow gains the actor-
injection step; role section reframed so iam-svc constraints
don't leak into contract-level prose.
- iam-protocol.md: ops table updated for whoami, bootstrap-status,
optional-workspace pattern; bootstrap_available added to the
IamResponse listing.
2026-04-28 22:13:12 +01:00
### Actor injection
For any management operation forwarded by the gateway after
authentication, the gateway injects the authenticated caller's
`handle` as an `actor` field on the request. Regimes use `actor`
to identify *who is making the request* — distinct from the
operation's target (which lives in `user_id` / `key_id` /
`workspace_record` / etc.) — for purposes such as:
- Self-service operations (`whoami` , `change-password` ) that
resolve "the caller" without taking a target argument.
- Audit logging, where the actor is recorded against the change.
- Decisions that depend on the resolved resource state. The
gateway authorises against the parameters on the request, but it
cannot know the resolved resource's actual properties (e.g. the
workspace association of a target user) before the regime loads
it. When that matters, the regime can re-decide using the
actor's permissions and the resolved record — closing a class
of cases the gateway-side check can't see.
Caller-supplied `actor` values on the request body are overwritten
by the gateway — the gateway is the only authority for actor
identity, and a regime that consults `actor` can rely on it being
authentic.
refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)
The gateway no longer holds any policy state — capability sets, role
definitions, workspace scope rules. Per the IAM contract it asks the
regime "may this identity perform this capability on this resource?"
per request. That moves the OSS role-based regime entirely into
iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing
the gateway, the wire protocol, or backend services.
Contract:
- authenticate(credential) -> Identity (handle, workspace,
principal_id, source). No roles, claims, or policy state surface
to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
ttl). Cached per-decision (regime TTL clamped above; fail-closed
on regime errors).
- authorise_many available as a fan-out variant.
Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
(config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
authenticated users could hit any service kind.
85 operations registered across the surface.
JWT carries identity only — sub + workspace. The roles claim is gone;
the gateway never reads policy state from a credential.
The three coarse *_KIND_CAPABILITY maps are removed. The registry is
the only source of truth for the capability + resource shape of an
operation. Tests migrated to the new Identity shape and to
authorise()-mocked auth doubles.
Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
2026-04-28 16:19:41 +01:00
## The `Identity` surface
`Identity` is *mostly* opaque. The gateway holds the value as a
token to quote back when calling `authorise` , never decomposing it.
But there are a few gateway-side concerns that need a small
surface:
| Field | Purpose |
|---|---|
| `handle` | Opaque reference passed back to `authorise` . Regime-defined; gateway treats as a string. |
| `workspace` | The workspace this credential authenticates to. Used by the gateway only as a default-fill-in for operations that omit a workspace. Never used as policy input — when authorisation needs to know which workspace the operation acts on, the operation places it in the resource address (or a parameter), and the regime decides. |
| `principal_id` | Stable identifier the gateway logs for audit (a user id, a sub claim, a service account id). Never used for authorisation — that's `authorise` 's job. |
| `source` | How the credential was presented (`api-key` , `jwt` , …). Non-policy; useful for logs and metrics only. |
Anything else — roles, claims, group memberships, policy attributes
— stays inside the regime and is reachable only via `authorise` .
## The `Resource` model
A `Resource` is a structured value identifying *what is being
operated on*. Resources live at one of three levels in TrustGraph,
based on where the resource exists in the deployment:
### Resource levels
| Level | What lives there | Resource shape |
|---|---|---|
| **System** | The user registry, the workspace registry, the signing key, the audit log — anything that exists once per deployment. | `{}` |
| **Workspace** | A workspace's config, flow definitions, library (documents), knowledge cores, collections — things that exist *within* a workspace. | `{workspace: "..."}` |
| **Flow** | A flow's knowledge graph, agent state, LLM context, embedding state, MCP context — things that exist *within* a flow within a workspace. | `{workspace: "...", flow: "..."}` |
Note carefully:
- **Users are a system-level resource.** A user record exists at
the deployment level; the fact that a user has a *workspace
association* (one in OSS, possibly many in other regimes) is a
property of the user record, not a containment. Operations on
the user registry have `resource = {}` ; the workspace
association appears as a *parameter* , not as a resource address
component.
- **Workspaces themselves are a system-level resource.** The
workspace registry exists at the deployment level. `create-
workspace` and ` list-workspaces` are system-level operations;
the workspace identifier in their bodies is a parameter, not an
address.
- **A workspace's contents are workspace-level resources.** A
workspace's config, flows, library, etc. live within a
workspace. Their resource address is `{workspace: ...}` .
- **A flow's contents are flow-level resources.** Knowledge
graphs, agents, etc. live within a flow. Their resource
address is `{workspace: ..., flow: ...}` .
### Component vocabulary
| Component | Type | Meaning | Used by |
|---|---|---|---|
| `workspace` | string | Identifier of the workspace whose contents are being operated on | workspace-level and flow-level resource addresses |
| `flow` | string | Identifier of a flow within a workspace; always paired with `workspace` | flow-level resource addresses |
| `collection` | string | Reserved for finer-grained scoping within a workspace | future / enterprise |
| `document` | string | Reserved for per-document scoping | future / enterprise |
A `Resource` is a partial mapping of these components to values.
The level of the resource (system / workspace / flow) determines
which components must be present. An empty `{}` is the
system-level resource.
### Workspace as parameter vs. address
Workspace plays two distinct roles in operations and shows up in
two distinct places:
- **As a resource address component** — workspace identifies the
thing being operated on. Lives in `resource.workspace` . Example:
`config:read` reads the config *of* workspace W.
- **As an operation parameter** — workspace is data the operation
acts on or filters by, while the resource itself is system-level.
Lives in `parameters.workspace` . Example: `users:write`
creates a user *with workspace association* W; the resource is
the user registry (system), and W is a parameter.
These are not interchangeable. The IAM regime considers each role
separately; the OSS role table, for instance, applies workspace-
scope to the address component when checking workspace-level
operations, and to a parameter when checking
"create-user-with-workspace-W". Both end up enforcing the admin's
scope, but through different code paths.
### Extension rules
The vocabulary is closed but extensible. Adding a new component:
1. The component is added to the vocabulary in this spec, with a
defined name, type, and meaning.
2. Existing IAM regimes ignore unknown components (forward
compatibility — adding a new component does not break older
regimes that don't understand it).
3. Older gateways that don't populate a new component leave it
unset; regimes that need it for a decision treat "unset" as
"absent" and decide accordingly (typically: cannot grant
permission scoped to a component the gateway didn't supply).
A regime that wants stricter behaviour (e.g. fail-closed on
unknown components rather than ignoring them) declares so as part
of its own configuration; the contract default is "ignore unknown".
## Operation registry (gateway-side)
Mapping a request onto `(capability, resource, parameters)` is
service-specific — it cannot be inferred from the capability
alone. The gateway maintains an **operation registry** that
declares, per operation:
- The required capability.
- The resource level (system / workspace / flow) — determines the
shape of the resource identifier.
- How to extract the resource address components (workspace,
flow) from the request — from URL path, WebSocket envelope, or
body.
- Which body fields are operation parameters (and which of those
the IAM regime should see in the `parameters` argument).
This registry is part of the gateway's endpoint declarations, not
part of the IAM contract. The contract specifies what arguments
`authorise` receives; how the gateway populates them is its own
concern.
In the OSS gateway, registry keys follow these conventions:
| Pattern | Used by | Resource level |
|---|---|---|
| bare op name (`create-user` , `list-users` , `login` , …) | `/api/v1/iam` and the auth surface | system / workspace, per op |
| `<kind>:<op>` (`config:get` , `flow:list-blueprints` , `librarian:add-document` , …) | `/api/v1/{kind}` (workspace-scoped global services) | workspace |
| `flow-service:<kind>` (`flow-service:agent` , `flow-service:graph-rag` , …) | `/api/v1/flow/{flow}/service/{kind}` and the WS Mux | flow |
| `flow-import:<kind>` / `flow-export:<kind>` | `/api/v1/flow/{flow}/{import,export}/{kind}` streaming sockets | flow |
Keys are an OSS-gateway implementation detail — the contract does
not constrain naming. The conventions above exist so the registry
key is uniquely derivable from the request path and (where
applicable) body without ambiguity.
## Caching
Both `authenticate` and `authorise` results are cached at the
gateway, on different policies:
- **`authenticate` ** — cached by a hash of the credential. The OSS
gateway uses a fixed short TTL (currently 60 s) so that revoked
API keys and disabled users stop working within the TTL window
without any push mechanism. Regimes that want a different
behaviour can return an `expires` hint with the identity; the
gateway honours the smaller of `expires` and its own ceiling.
- **`authorise` ** — cached by a hash of `(handle, capability,
resource, parameters)`. The regime returns a suggested TTL with
the decision; the gateway clamps it above by a deployment-set
ceiling (currently 60 s). Both allow and deny decisions are
cached; denies briefly, to avoid hammering the regime with
repeated rejected attempts.
The TTL ceiling caps the revocation latency window — a role
revoked at the regime takes effect at the gateway no later than
the ceiling. Operators that need stricter revocation can lower
the ceiling.
## Failure modes
| Condition | Behaviour |
|---|---|
| `authenticate` returns AuthFailure | Gateway responds 401 with the masked `auth failure` body. |
| `authorise` returns deny | Gateway responds 403 with the masked `access denied` body. |
| IAM regime unreachable | Gateway responds 401 / 503 (deployment-defined). No fail-open. |
| `authorise_many` partial deny | Gateway treats the request as denied; the operation is rejected. Partial-success semantics are not part of the contract. |
| Regime returns "not supported" for a management operation | Gateway responds 501. |
There is no fallback or "soft" decision path. An IAM regime that
is unavailable, slow, or returning errors causes requests to fail
closed.
## Implementations
### Open-source role-based regime
Defined in [`iam-protocol.md` ](iam-protocol.md ). Implements the
contract via:
- A pub/sub request/response service (`iam-svc` ) reached only by
the gateway over the message bus.
- Credentials are API keys (opaque) or JWTs (Ed25519, locally
validated by the gateway against the regime's published public
key).
iam: self-service ops, optional workspace filters, Mux service routing (#855)
Three threads, all reinforcing the contract's system-level vs.
workspace-association distinction.
WS Mux service routing
- tg-show-flows (and any workspace-level service over the WS) was
failing with "unknown service" because the post-refactor Mux
unconditionally looked up flow-service:<kind>. Now branches on
the envelope's flow field: with flow → flow-service:<kind>;
without flow → <kind>:<op> from the inner body; with bare op
lookup for service=iam. Resource and parameters come from the
matched op's own extractors — same path the HTTP endpoints take.
Optional workspace on system-level user/key ops
- list-users returns the deployment-wide list when no workspace is
supplied, filters when one is. get-user, update-user,
disable-user, enable-user, delete-user, reset-password,
create-api-key, list-api-keys, revoke-api-key all treat workspace
as an optional integrity check rather than a required argument.
- create-user keeps workspace required — there it's the new user's
home-workspace binding, a parameter rather than an address.
- API keys reclassified as SYSTEM-level resources. By the same
reasoning that makes users system-level, an API key is a
credential record on a deployment-wide registry; the workspace it
authenticates to is a property, not a containment.
Self-service surface
- whoami: returns the caller's own user record. AUTHENTICATED-only;
no users:read capability required. Foundation for UI affordances
that depend on the caller's permissions.
- bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC,
side-effect-free. Returns {bootstrap_available: bool} so a
first-run UI can decide whether to render setup without consuming
the bootstrap op.
- Gateway now injects actor=identity.handle on every authenticated
forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting
any caller-supplied value. Underpins whoami, audit logging, and
future regime-side decisions that need actor identity.
- tg-whoami and tg-update-user CLIs.
Spec polish
- iam-contract.md: actor-injection rule documented; whoami /
bootstrap-status added to operations list; permission-scope
framing tightened (workspace scope is a property of the grant,
not the user or role).
- iam.md: self-service section; gateway flow gains the actor-
injection step; role section reframed so iam-svc constraints
don't leak into contract-level prose.
- iam-protocol.md: ops table updated for whoami, bootstrap-status,
optional-workspace pattern; bootstrap_available added to the
IamResponse listing.
2026-04-28 22:13:12 +01:00
- `authorise` reduces to a lookup against the role bundles in
[`capabilities.md` ](capabilities.md ), with each grant's workspace
scope checked against the operation's workspace component.
refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)
The gateway no longer holds any policy state — capability sets, role
definitions, workspace scope rules. Per the IAM contract it asks the
regime "may this identity perform this capability on this resource?"
per request. That moves the OSS role-based regime entirely into
iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing
the gateway, the wire protocol, or backend services.
Contract:
- authenticate(credential) -> Identity (handle, workspace,
principal_id, source). No roles, claims, or policy state surface
to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
ttl). Cached per-decision (regime TTL clamped above; fail-closed
on regime errors).
- authorise_many available as a fan-out variant.
Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
(config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
authenticated users could hit any service kind.
85 operations registered across the surface.
JWT carries identity only — sub + workspace. The roles claim is gone;
the gateway never reads policy state from a credential.
The three coarse *_KIND_CAPABILITY maps are removed. The registry is
the only source of truth for the capability + resource shape of an
operation. Tests migrated to the new Identity shape and to
authorise()-mocked auth doubles.
Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
2026-04-28 16:19:41 +01:00
- Identity, user, and workspace records live in Cassandra.
iam: self-service ops, optional workspace filters, Mux service routing (#855)
Three threads, all reinforcing the contract's system-level vs.
workspace-association distinction.
WS Mux service routing
- tg-show-flows (and any workspace-level service over the WS) was
failing with "unknown service" because the post-refactor Mux
unconditionally looked up flow-service:<kind>. Now branches on
the envelope's flow field: with flow → flow-service:<kind>;
without flow → <kind>:<op> from the inner body; with bare op
lookup for service=iam. Resource and parameters come from the
matched op's own extractors — same path the HTTP endpoints take.
Optional workspace on system-level user/key ops
- list-users returns the deployment-wide list when no workspace is
supplied, filters when one is. get-user, update-user,
disable-user, enable-user, delete-user, reset-password,
create-api-key, list-api-keys, revoke-api-key all treat workspace
as an optional integrity check rather than a required argument.
- create-user keeps workspace required — there it's the new user's
home-workspace binding, a parameter rather than an address.
- API keys reclassified as SYSTEM-level resources. By the same
reasoning that makes users system-level, an API key is a
credential record on a deployment-wide registry; the workspace it
authenticates to is a property, not a containment.
Self-service surface
- whoami: returns the caller's own user record. AUTHENTICATED-only;
no users:read capability required. Foundation for UI affordances
that depend on the caller's permissions.
- bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC,
side-effect-free. Returns {bootstrap_available: bool} so a
first-run UI can decide whether to render setup without consuming
the bootstrap op.
- Gateway now injects actor=identity.handle on every authenticated
forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting
any caller-supplied value. Underpins whoami, audit logging, and
future regime-side decisions that need actor identity.
- tg-whoami and tg-update-user CLIs.
Spec polish
- iam-contract.md: actor-injection rule documented; whoami /
bootstrap-status added to operations list; permission-scope
framing tightened (workspace scope is a property of the grant,
not the user or role).
- iam.md: self-service section; gateway flow gains the actor-
injection step; role section reframed so iam-svc constraints
don't leak into contract-level prose.
- iam-protocol.md: ops table updated for whoami, bootstrap-status,
optional-workspace pattern; bootstrap_available added to the
IamResponse listing.
2026-04-28 22:13:12 +01:00
The OSS regime is deliberately simple — three roles, a single
workspace association per user (a regime data-model decision, not
a contract assertion), no policy language. Other regimes can
grant the same user different permissions in different workspaces
without changing anything outside the regime.
refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)
The gateway no longer holds any policy state — capability sets, role
definitions, workspace scope rules. Per the IAM contract it asks the
regime "may this identity perform this capability on this resource?"
per request. That moves the OSS role-based regime entirely into
iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing
the gateway, the wire protocol, or backend services.
Contract:
- authenticate(credential) -> Identity (handle, workspace,
principal_id, source). No roles, claims, or policy state surface
to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
ttl). Cached per-decision (regime TTL clamped above; fail-closed
on regime errors).
- authorise_many available as a fan-out variant.
Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
(config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
authenticated users could hit any service kind.
85 operations registered across the surface.
JWT carries identity only — sub + workspace. The roles claim is gone;
the gateway never reads policy state from a credential.
The three coarse *_KIND_CAPABILITY maps are removed. The registry is
the only source of truth for the capability + resource shape of an
operation. Tests migrated to the new Identity shape and to
authorise()-mocked auth doubles.
Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
2026-04-28 16:19:41 +01:00
### Future regimes
The contract is shaped to admit, without code change in the
gateway:
- **OIDC / SSO** — `authenticate` validates an OIDC ID token via
the IdP's JWKS; `Identity.handle` carries the verified subject
and group claims; `authorise` evaluates against group-to-
capability mappings configured at the regime.
- **ABAC / Policy engine** — `authorise` calls out to a policy
engine (Rego, Cedar, custom DSL) with the identity's attributes
and the resource as the policy input.
- **ReBAC (Zanzibar-style)** — `authorise` translates `(identity,
capability, resource)` into a relationship-tuple lookup against
a tuple store.
- **Hybrid** — multiple regimes composed: e.g. authenticate via
SSO, authorise via local policy.
None of these require gateway changes. The contract surface is
the same; the regime is what differs.
## References
- [Identity and Access Management Specification ](iam.md ) — overall
design and the gateway-side framing.
- [IAM Service Protocol Specification ](iam-protocol.md ) — the OSS
regime's wire-level protocol.
- [Capability Vocabulary Specification ](capabilities.md ) — the
capability strings the gateway uses as `authorise` input.