refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)

The gateway no longer holds any policy state — capability sets, role
definitions, workspace scope rules.  Per the IAM contract it asks the
regime "may this identity perform this capability on this resource?"
per request.  That moves the OSS role-based regime entirely into
iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing
the gateway, the wire protocol, or backend services.

Contract:
- authenticate(credential) -> Identity (handle, workspace,
  principal_id, source).  No roles, claims, or policy state surface
  to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
  ttl).  Cached per decision (the regime's suggested TTL is clamped
  to a gateway ceiling; fail-closed on regime errors).
- authorise_many available as a fan-out variant.

Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
  list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
  (config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
  flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
  authenticated users could hit any service kind.
85 operations registered across the surface.

JWT carries identity only — sub + workspace.  The roles claim is gone;
the gateway never reads policy state from a credential.

The three coarse *_KIND_CAPABILITY maps are removed.  The registry is
the only source of truth for the capability + resource shape of an
operation.  Tests migrated to the new Identity shape and to
authorise()-mocked auth doubles.

Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
cybermaggedon 2026-04-28 16:19:41 +01:00 committed by GitHub
parent 9f2d9adcb1
commit 5e28d3cce0
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
24 changed files with 2359 additions and 587 deletions

@ -8,22 +8,41 @@ parent: "Tech Specs"
## Overview
Authorisation in TrustGraph is **capability-based**. Every gateway
endpoint maps to exactly one *capability*; a user's roles each grant
a set of capabilities; an authenticated request is permitted when
the required capability is a member of the union of the caller's
role capability sets.
Every gateway endpoint maps to exactly one *capability* — a string
from a closed vocabulary defined in this document. When the
gateway authorises a request, it hands the IAM regime four things:
the authenticated identity, the required capability, the
operation's resource (the structured identifier of what's being
operated on), and the operation's parameters. The IAM regime
decides allow or deny; see the [IAM contract](iam-contract.md) for
the full abstraction.
This document defines the capability vocabulary — the closed list
of capability strings that the gateway recognises — and the
open-source edition's role bundles.
A capability is a **permission**, not a structural classification.
`graph:read` says "the caller may read graphs"; it does not say
where graphs live or how they are addressed. The shape of a
request — whether workspace appears in the URL, the envelope, or
the body, and whether it is a resource address component or an
operation parameter — is determined by what the operation operates
on, not by what permission it requires. Permission and structure
are orthogonal; the contract takes both.
The capability mechanism is shared between the open-source edition and
potential third-party enterprise editions. The open-source edition ships a
fixed three-role bundle (`reader`, `writer`, `admin`). Enterprise
editions may define additional roles by composing their own
capability bundles from the same vocabulary; no protocol, gateway,
or backend-service change is required.
This document defines:
- The **capability vocabulary** — the closed list of capability
strings the gateway uses as input to `authorise`. All IAM
regimes share this vocabulary; that's the only schema the
gateway and the IAM regime have to agree on.
- The **open-source role bundles** — the role-and-scope table the
OSS IAM regime uses to answer `authorise` calls. Other regimes
answer the same call differently; the bundles below are an
OSS-specific implementation detail, not a contract assertion.
A regime may evaluate `authorise` using role bundles (OSS), IdP
group memberships, attribute-based policies, relationship tuples,
or any other mechanism. The gateway is unaware of which. The
capability strings — and the resource component vocabulary the
gateway populates alongside them — are the only thing both sides
have to agree on.
## Motivation
@ -113,19 +132,50 @@ granting `llm` expresses exactly that. An administrator granting
`agent` should treat it as a grant of everything the agent
composes at deployment time.
### Authorisation evaluation
### Authorisation evaluation (OSS regime)
This section describes how the OSS IAM regime answers
`authorise(identity, capability, resource, parameters)`. Other
regimes answer the same contract differently; only the inputs (the
capability vocabulary, the resource components, the parameter
shape) are shared.
For a request bearing a resolved set of roles
`R = {r1, r2, ...}` against an endpoint that requires capability
`c`:
`R = {r1, r2, ...}`, a required capability `c`, a resource, and
parameters:
```
allow if c IN union(bundle(r) for r in R)
let target_workspace =
resource.workspace (workspace-/flow-level resources)
or parameters.workspace (system-level resources whose
parameters reference a workspace)
or unset (system-level operations with no
workspace context)
allow if some role r in R has c in its capability bundle
and (target_workspace is unset
or r's workspace_scope permits target_workspace)
```
No hierarchy, no precedence, no role-order sensitivity. A user
The OSS regime considers workspace from whichever role it plays in
the operation:
- For workspace-level and flow-level resources, the workspace lives
in `resource.workspace` and that is what the role's scope is
checked against.
- For system-level resources whose operation parameters reference a
workspace (e.g. `create-user with workspace association W`),
workspace lives in `parameters.workspace` and that is what the
role's scope is checked against. The resource is system-level
(`resource = {}`) but the workspace constraint still bites.
- For system-level operations with no workspace context (e.g.
`bootstrap`, `rotate-signing-key`), the workspace-scope check
collapses — only capability-bundle membership matters.
No hierarchy, no precedence, no role-order sensitivity. A user
with a single role is the common case; a user with multiple roles
gets the union of their bundles.
is allowed if any role independently grants both the capability
and the relevant workspace scope.
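The OSS rule can be sketched in Python to make the "any single role must grant both" semantics concrete. This is a minimal, hypothetical model, not the `iam-svc` code: the `Role` type, the helper names, and the convention that `workspace_scope=None` means "any workspace" (admin-style) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Role:
    """A hypothetical OSS role grant: a capability bundle plus a workspace scope.

    workspace_scope=None is assumed to mean "any workspace" (admin-style);
    otherwise the role covers only the single named workspace.
    """

    name: str
    bundle: frozenset
    workspace_scope: Optional[str] = None

    def permits_workspace(self, workspace: str) -> bool:
        return self.workspace_scope is None or self.workspace_scope == workspace


def target_workspace(resource: dict, parameters: dict) -> Optional[str]:
    """Pick the workspace the scope check applies to, or None if there is none."""
    if "workspace" in resource:          # workspace- and flow-level resources
        return resource["workspace"]
    return parameters.get("workspace")   # system-level resource; may be None


def oss_authorise(roles: Iterable[Role], capability: str,
                  resource: dict, parameters: dict) -> bool:
    target = target_workspace(resource, parameters)
    # Each role is judged on its own: one role must grant both the capability
    # and the workspace scope. Grants are never combined across roles.
    return any(
        capability in role.bundle
        and (target is None or role.permits_workspace(target))
        for role in roles
    )
```

Under this sketch, a user holding `graph:write` scoped to `w1` and `graph:read` scoped to `w2` is still denied `graph:write` on `w2`, because neither role alone grants both.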
### Enforcement boundary
@ -214,5 +264,10 @@ ships that feature.
## References
- [IAM Contract Specification](iam-contract.md) — the abstract
gateway↔IAM regime contract; capability strings are inputs to
`authorise`.
- [Identity and Access Management Specification](iam.md)
- [IAM Service Protocol Specification](iam-protocol.md) — the OSS
regime's wire-level protocol.
- [Architecture Principles](architecture-principles.md)

@ -22,8 +22,16 @@ are the boundaries around data, and who owns what?
A workspace is the primary isolation boundary. It represents an
organisation, team, or independent operating unit. All data belongs to
exactly one workspace. Cross-workspace access is never permitted through
the API.
exactly one workspace.
Cross-workspace access through the API is gated by the IAM regime
(see [`iam-contract.md`](iam-contract.md)). In the OSS distribution,
the role table defined in [`capabilities.md`](capabilities.md)
permits cross-workspace operation only to the `admin` role; the
`reader` and `writer` roles are constrained to a single assigned
workspace per credential. Other regimes can model the relationship
between identity and workspace differently — the gateway makes no
assumption.
A workspace owns:
- Source documents
@ -279,9 +287,18 @@ A typical workflow:
The current codebase uses a `user` field in message metadata and storage
partition keys to identify the workspace. The `collection` field
identifies the collection within that workspace. The IAM spec describes
how the gateway maps authenticated credentials to a workspace identity
and sets these fields.
identifies the collection within that workspace.
The gateway is the single point at which workspace gets stamped onto
outbound pub/sub messages. An incoming credential authenticates to a
workspace (the credential's binding, not a user-to-workspace lookup —
see [`iam-contract.md`](iam-contract.md) and the *Identity surface*
section of [`iam.md`](iam.md)); any caller-supplied workspace on the
request is reconciled against the authenticated identity by the IAM
regime; the resolved value is what the gateway writes into outgoing
messages and the storage layers' partition keys. Backend services
trust the workspace they receive — defense-in-depth happens at the
gateway, not at the bus.
For details on how each storage backend implements this scoping, see:
@ -302,7 +319,10 @@ For details on how each storage backend implements this scoping, see:
## References
- [Identity and Access Management](iam.md)
- [IAM Contract](iam-contract.md) — gateway↔IAM regime abstraction.
- [Identity and Access Management](iam.md) — gateway-side framing.
- [Capability Vocabulary](capabilities.md) — capability strings and
the OSS role bundles that decide cross-workspace eligibility.
- [Collection Management](collection-management.md)
- [Entity-Centric Graph](entity-centric-graph.md)
- [Neo4j User Collection Isolation](neo4j-user-collection-isolation.md)

@ -0,0 +1,366 @@
---
layout: default
title: "IAM Contract Technical Specification"
parent: "Tech Specs"
---
# IAM Contract Technical Specification
## Overview
The IAM contract is the abstraction between the API gateway and any
identity / access management regime that fronts it. The gateway
treats IAM as a black box behind two operations — *authenticate* and
*authorise* — plus a small surface of management operations. No
regime-specific concept (roles, scopes, groups, claims, policy
languages) is visible to the gateway, and no gateway-specific
concept (capability vocabulary, request anatomy) is visible to
backend services.
The TrustGraph open-source distribution ships one IAM regime — a
role-based implementation defined in
[`iam-protocol.md`](iam-protocol.md) — that is one implementation of
this contract. Enterprise editions can replace it with a different
regime (OIDC / SSO, ABAC, ReBAC, external policy engine) without
changing the gateway, the wire protocol, or the backends.
## Motivation
Authorisation models vary by deployment. A small team might be
happy with three predefined roles; an enterprise might need
group-mapping from an upstream IdP, attribute-based policies, or
relationship-based access control. Hard-wiring any one of those
into the gateway forces every other regime to either compromise its
model or be re-implemented.
A narrow contract — "authenticate this credential" and "may this
identity perform this operation on this resource" — captures what
the gateway actually needs to know without committing to a policy
shape. The IAM regime owns the policy decision; the gateway is a
generic enforcement point.
## Operations
### `authenticate`
```
authenticate(credential: bytes) → Identity | AuthFailure
```
Validates a credential the client presented. The gateway treats
the credential as opaque bytes — for the OSS regime today that's
either an API key plaintext or a JWT, but the gateway does not
parse them; the IAM regime decides.
On success, returns an `Identity`. On any failure the IAM regime
returns the same opaque `AuthFailure` — never a description of which
condition failed. This is the spec's masked-error rule: an
attacker probing the endpoint cannot distinguish "no such key",
"expired", "wrong signature", "revoked", "user disabled", etc.
### `authorise`
```
authorise(identity: Identity,
          capability: str,
          resource: Resource,
          parameters: dict)
    → Decision
```
Asks whether the identity is permitted to perform the named
capability on the named resource, given the operation's
parameters. Returns `allow` or `deny`. `identity` is whatever
`authenticate` returned for this caller; the gateway never
decomposes it.
The four arguments separate concerns:
- **`identity`** — who is asking.
- **`capability`** — what permission they are exercising (e.g.
`users:write`, `graph:read`). Permission, not structure.
- **`resource`** — what is being operated on, as a structured
identifier. See *The Resource model* below.
- **`parameters`** — operation-specific data that the regime may
need to consider beyond the resource identifier. Used when a
decision depends on attributes the request supplies — e.g. an
admin scoped to one workspace creating a user *with workspace
association W*: the resource is the system-level user registry,
and W is a parameter the regime checks against the admin's
scope.
Different regimes use the four arguments differently — the OSS
regime checks role bundles against the capability and the role's
workspace scope against parameters; an SSO regime might consult an
upstream IdP's group memberships; an ABAC regime evaluates a
policy with all four as inputs. The contract is unchanged.
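The four-argument shape can be written down as a structural type. This is a sketch under assumptions: the `Decision` fields (`allow`, `ttl`) follow the `(allow, ttl)` result described in the commit message, while the class names, the use of `typing.Protocol`, and the toy `ScopedAdminRegime` are illustrative rather than the gateway's actual definitions.

```python
from dataclasses import dataclass
from typing import Dict, Protocol

Resource = Dict[str, str]  # partial mapping of components; {} is the system level


@dataclass(frozen=True)
class Identity:
    handle: str
    workspace: str
    principal_id: str
    source: str


@dataclass(frozen=True)
class Decision:
    allow: bool
    ttl: float  # regime-suggested cache lifetime in seconds


class IamRegime(Protocol):
    def authorise(self, identity: Identity, capability: str,
                  resource: Resource, parameters: dict) -> Decision: ...


class ScopedAdminRegime:
    """Toy regime: an admin may write users only within its own scope.

    The scope is regime-internal state looked up by the opaque handle;
    nothing about it travels through the gateway.
    """

    def __init__(self, scopes: Dict[str, str]):
        self._scopes = scopes  # handle -> workspace that admin may manage

    def authorise(self, identity: Identity, capability: str,
                  resource: Resource, parameters: dict) -> Decision:
        scope = self._scopes.get(identity.handle)
        allowed = (
            capability == "users:write"
            and resource == {}                        # user registry: system level
            and scope is not None
            and parameters.get("workspace") == scope  # W arrives as a parameter
        )
        return Decision(allow=allowed, ttl=30.0)
```

The toy regime shows the parameter case from the list above: the resource is the system-level user registry, and the workspace association W is checked as a parameter against policy state the gateway never sees.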
### `authorise_many`
```
authorise_many(identity: Identity,
               checks: list[(str, Resource, dict)])
    → list[Decision]
```
Bulk variant of `authorise`. Same semantics, one round-trip for
many decisions. Used when an operation fans out to multiple
resources (e.g. an agent that touches several workspaces) and a
single permission check isn't sufficient.
`authorise_many` is not just a performance optimisation; it pins
the contract for fan-out operations early, before clients (or
internal callers) build patterns that assume one permission check
per request. Regimes implement it as a loop over `authorise`
unless they have a more efficient path.
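The default described above, `authorise_many` as a loop over `authorise`, is short enough to show directly. A sketch assuming `authorise` is any callable with the contract's signature whose result is truthy for allow; `request_allowed` is a hypothetical helper applying the all-or-nothing rule from *Failure modes*.

```python
from typing import Any, Callable, Iterable, List, Tuple

Check = Tuple[str, dict, dict]  # (capability, resource, parameters)


def authorise_many(authorise: Callable[..., Any], identity: Any,
                   checks: Iterable[Check]) -> List[Any]:
    """Default fan-out: one authorise() call per check, results in input order."""
    return [authorise(identity, cap, res, params) for cap, res, params in checks]


def request_allowed(decisions: List[Any]) -> bool:
    # A partial deny rejects the whole operation; there is no partial success.
    # An empty check list is treated as a deny (a fail-closed assumption).
    return bool(decisions) and all(bool(d) for d in decisions)
```

A regime with a cheaper bulk path (for example, one batched query to a policy engine) would replace the list comprehension while keeping the same input/output order.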
### Management operations
Beyond the request-time `authenticate` / `authorise`, the contract
also covers identity-lifecycle and credential-lifecycle operations
that are invoked by administrative requests rather than by the
authentication path. These are regime-specific in detail (an SSO
regime that delegates user management to the IdP may not implement
most of them) but the operation set the gateway can forward is:
- User management: `create-user`, `list-users`, `get-user`,
`update-user`, `disable-user`, `enable-user`, `delete-user`
- Credential management: `create-api-key`, `list-api-keys`,
`revoke-api-key`, `change-password`, `reset-password`
- Workspace management: `create-workspace`, `list-workspaces`,
`get-workspace`, `update-workspace`, `disable-workspace`
- Session management: `login`
- Key management: `get-signing-key-public`, `rotate-signing-key`
- Bootstrap: `bootstrap`
A regime that does not support one of these (e.g. an SSO regime
where users are managed in the IdP) returns a defined "not
supported" error; the gateway surfaces it as a 501.
## The `Identity` surface
`Identity` is *mostly* opaque. The gateway holds the value as a
token to quote back when calling `authorise`, never decomposing it.
But there are a few gateway-side concerns that need a small
surface:
| Field | Purpose |
|---|---|
| `handle` | Opaque reference passed back to `authorise`. Regime-defined; gateway treats as a string. |
| `workspace` | The workspace this credential authenticates to. Used by the gateway only as a default-fill-in for operations that omit a workspace. Never used as policy input — when authorisation needs to know which workspace the operation acts on, the operation places it in the resource address (or a parameter), and the regime decides. |
| `principal_id` | Stable identifier the gateway logs for audit (a user id, a sub claim, a service account id). Never used for authorisation — that's `authorise`'s job. |
| `source` | How the credential was presented (`api-key`, `jwt`, …). Non-policy; useful for logs and metrics only. |
Anything else — roles, claims, group memberships, policy attributes
— stays inside the regime and is reachable only via `authorise`.
## The `Resource` model
A `Resource` is a structured value identifying *what is being
operated on*. Resources live at one of three levels in TrustGraph,
based on where the resource exists in the deployment:
### Resource levels
| Level | What lives there | Resource shape |
|---|---|---|
| **System** | The user registry, the workspace registry, the signing key, the audit log — anything that exists once per deployment. | `{}` |
| **Workspace** | A workspace's config, flow definitions, library (documents), knowledge cores, collections — things that exist *within* a workspace. | `{workspace: "..."}` |
| **Flow** | A flow's knowledge graph, agent state, LLM context, embedding state, MCP context — things that exist *within* a flow within a workspace. | `{workspace: "...", flow: "..."}` |
Note carefully:
- **Users are a system-level resource.** A user record exists at
the deployment level; the fact that a user has a *workspace
association* (one in OSS, possibly many in other regimes) is a
property of the user record, not a containment. Operations on
the user registry have `resource = {}`; the workspace
association appears as a *parameter*, not as a resource address
component.
- **Workspaces themselves are a system-level resource.** The
workspace registry exists at the deployment level. `create-workspace`
and `list-workspaces` are system-level operations;
the workspace identifier in their bodies is a parameter, not an
address.
- **A workspace's contents are workspace-level resources.** A
workspace's config, flows, library, etc. live within a
workspace. Their resource address is `{workspace: ...}`.
- **A flow's contents are flow-level resources.** Knowledge
graphs, agents, etc. live within a flow. Their resource
address is `{workspace: ..., flow: ...}`.
### Component vocabulary
| Component | Type | Meaning | Used by |
|---|---|---|---|
| `workspace` | string | Identifier of the workspace whose contents are being operated on | workspace-level and flow-level resource addresses |
| `flow` | string | Identifier of a flow within a workspace; always paired with `workspace` | flow-level resource addresses |
| `collection` | string | Reserved for finer-grained scoping within a workspace | future / enterprise |
| `document` | string | Reserved for per-document scoping | future / enterprise |
A `Resource` is a partial mapping of these components to values.
The level of the resource (system / workspace / flow) determines
which components must be present. An empty `{}` is the
system-level resource.
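The level rules can be checked mechanically. A minimal sketch; `resource_level` is a hypothetical helper, and components other than `workspace` and `flow` do not affect the level it reports.

```python
from typing import Mapping


def resource_level(resource: Mapping[str, str]) -> str:
    """Classify a resource as 'system', 'workspace' or 'flow' by its components."""
    has_ws = "workspace" in resource
    has_flow = "flow" in resource
    if has_flow and not has_ws:
        # A flow only exists within a workspace, so it must be paired with one.
        raise ValueError("flow must always be paired with workspace")
    if has_flow:
        return "flow"
    if has_ws:
        return "workspace"
    return "system"  # the empty mapping {} is the system-level resource
```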
### Workspace as parameter vs. address
Workspace plays two distinct roles in operations and shows up in
two distinct places:
- **As a resource address component** — workspace identifies the
thing being operated on. Lives in `resource.workspace`. Example:
`config:read` reads the config *of* workspace W.
- **As an operation parameter** — workspace is data the operation
acts on or filters by, while the resource itself is system-level.
Lives in `parameters.workspace`. Example: `users:write`
creates a user *with workspace association* W; the resource is
the user registry (system), and W is a parameter.
These are not interchangeable. The IAM regime evaluates each
placement on its own terms; the OSS role table, for instance,
applies workspace scope to the address component when checking
workspace-level operations, and to a parameter when checking
"create-user-with-workspace-W". Both end up enforcing the admin's
scope, but through different code paths.
### Extension rules
The vocabulary is closed but extensible. Adding a new component:
1. The component is added to the vocabulary in this spec, with a
defined name, type, and meaning.
2. Existing IAM regimes ignore unknown components (forward
compatibility — adding a new component does not break older
regimes that don't understand it).
3. Older gateways that don't populate a new component leave it
unset; regimes that need it for a decision treat "unset" as
"absent" and decide accordingly (typically: cannot grant
permission scoped to a component the gateway didn't supply).
A regime that wants stricter behaviour (e.g. fail-closed on
unknown components rather than ignoring them) declares so as part
of its own configuration; the contract default is "ignore unknown".
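The forward-compatibility default, the opt-in strict mode, and rule 3 might look like this inside a regime. `KNOWN_COMPONENTS` mirrors the vocabulary table above; the `strict` flag and both helper names are hypothetical stand-ins for a regime's own configuration and code.

```python
KNOWN_COMPONENTS = frozenset({"workspace", "flow", "collection", "document"})


def normalise_resource(resource: dict, strict: bool = False):
    """Return the resource with unknown components removed, or None to deny.

    Default ("ignore unknown") drops components this regime doesn't understand.
    With strict=True, any unknown component makes the regime fail closed.
    """
    unknown = set(resource) - KNOWN_COMPONENTS
    if unknown and strict:
        return None
    return {k: v for k, v in resource.items() if k in KNOWN_COMPONENTS}


def scoped_grant_applies(resource: dict, component: str, granted_value: str) -> bool:
    # Rule 3: a grant scoped to a component cannot match when the gateway
    # left that component unset.
    return resource.get(component) == granted_value
```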
## Operation registry (gateway-side)
Mapping a request onto `(capability, resource, parameters)` is
service-specific — it cannot be inferred from the capability
alone. The gateway maintains an **operation registry** that
declares, per operation:
- The required capability.
- The resource level (system / workspace / flow) — determines the
shape of the resource identifier.
- How to extract the resource address components (workspace,
flow) from the request — from URL path, WebSocket envelope, or
body.
- Which body fields are operation parameters (and which of those
the IAM regime should see in the `parameters` argument).
This registry is part of the gateway's endpoint declarations, not
part of the IAM contract. The contract specifies what arguments
`authorise` receives; how the gateway populates them is its own
concern.
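A registry entry could be modelled as a small declarative record that turns a request into the three `authorise` inputs. This is an illustrative sketch: the field names, the assumption that extraction works on a merged view of path, envelope and body, and the sample entries (including the `config:get` to `config:read` mapping) are hypothetical, not the OSS gateway's actual declarations.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Operation:
    capability: str
    level: str                          # "system", "workspace" or "flow"
    param_fields: Tuple[str, ...] = ()  # request fields passed as parameters

    def to_authorise_args(self, request: dict) -> Tuple[str, dict, dict]:
        """Build (capability, resource, parameters) for authorise().

        `request` is assumed to be a merged view of the URL path variables,
        WebSocket envelope and body.
        """
        resource: Dict[str, str] = {}
        if self.level in ("workspace", "flow"):
            resource["workspace"] = request["workspace"]
        if self.level == "flow":
            resource["flow"] = request["flow"]
        parameters = {f: request[f] for f in self.param_fields if f in request}
        return self.capability, resource, parameters


# Hypothetical entries; capability strings are illustrative.
REGISTRY = {
    "config:get": Operation("config:read", "workspace"),
    "create-user": Operation("users:write", "system", ("workspace",)),
    "flow-service:graph-rag": Operation("graph:read", "flow"),
}
```

The `create-user` entry shows the parameter placement: its resource comes out as `{}` while the workspace association is surfaced in `parameters`.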
In the OSS gateway, registry keys follow these conventions:
| Pattern | Used by | Resource level |
|---|---|---|
| bare op name (`create-user`, `list-users`, `login`, …) | `/api/v1/iam` and the auth surface | system / workspace, per op |
| `<kind>:<op>` (`config:get`, `flow:list-blueprints`, `librarian:add-document`, …) | `/api/v1/{kind}` (workspace-scoped global services) | workspace |
| `flow-service:<kind>` (`flow-service:agent`, `flow-service:graph-rag`, …) | `/api/v1/flow/{flow}/service/{kind}` and the WS Mux | flow |
| `flow-import:<kind>` / `flow-export:<kind>` | `/api/v1/flow/{flow}/{import,export}/{kind}` streaming sockets | flow |
Keys are an OSS-gateway implementation detail — the contract does
not constrain naming. The conventions above exist so the registry
key is uniquely derivable from the request path and (where
applicable) body without ambiguity.
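Because of these conventions, the key is a pure function of the path plus, for `/api/v1/iam` and `/api/v1/{kind}`, the operation named in the body. A sketch assuming the body carries the operation in a field called `operation`, which is a hypothetical name; the path patterns come from the table above. The WS Mux route is not covered here.

```python
import re
from typing import Optional

_FLOW_SERVICE = re.compile(r"^/api/v1/flow/[^/]+/service/([^/]+)$")
_FLOW_STREAM = re.compile(r"^/api/v1/flow/[^/]+/(import|export)/([^/]+)$")
_KIND = re.compile(r"^/api/v1/([^/]+)$")


def registry_key(path: str, body: Optional[dict] = None) -> str:
    body = body or {}
    if m := _FLOW_SERVICE.match(path):
        return f"flow-service:{m.group(1)}"
    if m := _FLOW_STREAM.match(path):
        return f"flow-{m.group(1)}:{m.group(2)}"
    if path == "/api/v1/iam":            # checked before the generic {kind} route
        return body["operation"]         # bare op name, e.g. "create-user"
    if m := _KIND.match(path):
        return f"{m.group(1)}:{body['operation']}"  # e.g. "config:get"
    raise KeyError(f"no registry key for {path}")
```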
## Caching
Both `authenticate` and `authorise` results are cached at the
gateway, on different policies:
- **`authenticate`** — cached by a hash of the credential. The OSS
gateway uses a fixed short TTL (currently 60 s) so that revoked
API keys and disabled users stop working within the TTL window
without any push mechanism. Regimes that want a different
behaviour can return an `expires` hint with the identity; the
gateway honours the smaller of `expires` and its own ceiling.
- **`authorise`** — cached by a hash of `(handle, capability,
resource, parameters)`. The regime returns a suggested TTL with
the decision; the gateway clamps it from above to a deployment-set
ceiling (currently 60 s). Both allow and deny decisions are
cached, denies only briefly: long enough to avoid hammering the
regime with repeated rejected attempts.
The TTL ceiling caps the revocation latency window: a role
revoked at the regime takes effect at the gateway no later than
one ceiling period after revocation. Operators that need stricter
revocation can lower the ceiling.
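The `authorise` caching rule can be sketched as a small wrapper: the key is a hash of the four inputs and the regime's TTL is clamped to the ceiling. Assumptions: the regime call returns an `(allow, ttl)` pair as in the commit message, the key uses a canonical JSON encoding, and the shorter deny lifetime is a hypothetical `deny_ttl` knob. The 60 s default matches the current ceiling stated above.

```python
import hashlib
import json
import time


class AuthoriseCache:
    def __init__(self, regime_authorise, ceiling: float = 60.0,
                 deny_ttl: float = 5.0, clock=time.monotonic):
        self._authorise = regime_authorise  # (handle, cap, res, params) -> (allow, ttl)
        self._ceiling = ceiling
        self._deny_ttl = deny_ttl
        self._clock = clock
        self._entries = {}  # key -> (allow, expires_at)

    @staticmethod
    def _key(handle, capability, resource, parameters) -> str:
        # sort_keys gives a stable encoding for nested dicts.
        blob = json.dumps([handle, capability, resource, parameters], sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def check(self, handle, capability, resource, parameters) -> bool:
        key = self._key(handle, capability, resource, parameters)
        now = self._clock()
        hit = self._entries.get(key)
        if hit and hit[1] > now:
            return hit[0]
        try:
            allow, ttl = self._authorise(handle, capability, resource, parameters)
        except Exception:
            return False                     # fail closed; errors are not cached
        ttl = min(ttl, self._ceiling)        # clamp the regime's TTL from above
        if not allow:
            ttl = min(ttl, self._deny_ttl)   # denies are cached only briefly
        self._entries[key] = (allow, now + ttl)
        return allow
```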
## Failure modes
| Condition | Behaviour |
|---|---|
| `authenticate` returns AuthFailure | Gateway responds 401 with the masked `auth failure` body. |
| `authorise` returns deny | Gateway responds 403 with the masked `access denied` body. |
| IAM regime unreachable | Gateway responds 401 / 503 (deployment-defined). No fail-open. |
| `authorise_many` partial deny | Gateway treats the request as denied; the operation is rejected. Partial-success semantics are not part of the contract. |
| Regime returns "not supported" for a management operation | Gateway responds 501. |
There is no fallback or "soft" decision path. An IAM regime that
is unavailable, slow, or returning errors causes requests to fail
closed.
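The table reduces to a small mapping at the enforcement point. A sketch with hypothetical exception and function names; it picks 503 for an unreachable or erroring regime (the table leaves 401 vs. 503 to the deployment) and never turns an error into an allow.

```python
from typing import Any, Callable, Tuple


class AuthFailure(Exception):
    """Opaque authentication failure (masked; carries no reason)."""


class NotSupported(Exception):
    """The regime does not implement this management operation."""


def handle(authenticate: Callable, authorise: Callable, operation: Callable,
           credential: bytes, capability: str, resource: dict,
           parameters: dict) -> Tuple[int, Any]:
    """Return (status, body), failing closed on every error path."""
    try:
        identity = authenticate(credential)
    except AuthFailure:
        return 401, "auth failure"
    except Exception:
        return 503, "service unavailable"   # regime unreachable or erroring
    try:
        allowed = authorise(identity, capability, resource, parameters)
    except Exception:
        return 503, "service unavailable"   # never treat an error as allow
    if not allowed:
        return 403, "access denied"
    try:
        return 200, operation(identity, parameters)
    except NotSupported:
        return 501, "not supported"
```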
## Implementations
### Open-source role-based regime
Defined in [`iam-protocol.md`](iam-protocol.md). Implements the
contract via:
- A pub/sub request/response service (`iam-svc`) reached only by
the gateway over the message bus.
- Credentials are API keys (opaque) or JWTs (Ed25519, locally
validated by the gateway against the regime's published public
key).
- `authorise` reduces to a role-and-workspace-scope check against
the role table defined in [`capabilities.md`](capabilities.md).
- Identity, user, and workspace records live in Cassandra.
The OSS regime is deliberately simple — three roles, single
home-workspace per user (a regime data-model decision, not a
contract assertion), no policy language.
### Future regimes
The contract is shaped to admit, without code change in the
gateway:
- **OIDC / SSO**: `authenticate` validates an OIDC ID token via
the IdP's JWKS; `Identity.handle` carries the verified subject
and group claims; `authorise` evaluates against group-to-capability
mappings configured at the regime.
- **ABAC / Policy engine**: `authorise` calls out to a policy
engine (Rego, Cedar, custom DSL) with the identity's attributes
and the resource as the policy input.
- **ReBAC (Zanzibar-style)**: `authorise` translates `(identity,
capability, resource)` into a relationship-tuple lookup against
a tuple store.
- **Hybrid**: multiple regimes composed, e.g. authenticate via
SSO, authorise via local policy.
None of these require gateway changes. The contract surface is
the same; the regime is what differs.
## References
- [Identity and Access Management Specification](iam.md) — overall
design and the gateway-side framing.
- [IAM Service Protocol Specification](iam-protocol.md) — the OSS
regime's wire-level protocol.
- [Capability Vocabulary Specification](capabilities.md) — the
capability strings the gateway uses as `authorise` input.

@ -8,21 +8,41 @@ parent: "Tech Specs"
## Overview
The IAM service is a backend processor, reached over the standard
request/response pub/sub pattern. It is the authority for users,
workspaces, API keys, and login credentials. The API gateway
delegates to it for authentication resolution and for all user /
workspace / key management.
This document specifies the wire protocol of the **open-source IAM
regime** — one implementation of the abstract IAM contract defined
in [`iam-contract.md`](iam-contract.md). Other regimes (OIDC / SSO,
ABAC, ReBAC, external policy engines) implement the same contract
with different transports, data models, and policy semantics; the
gateway is unaware of which regime it's wired against.
This document defines the wire protocol: the `IamRequest` and
`IamResponse` dataclasses, the operation set, the per-operation
input and output fields, the error taxonomy, and the initial HTTP
forwarding endpoint used while IAM is being integrated into the
gateway.
The OSS regime is a backend processor (`iam-svc`) reached over the
standard request/response pub/sub pattern. It owns users,
workspaces, API keys, login credentials, and JWT signing keys, all
backed by Cassandra. The API gateway is its only caller.
Architectural context — roles, capabilities, workspace scoping,
enforcement boundary — lives in [`iam.md`](iam.md) and
[`capabilities.md`](capabilities.md).
This document defines:
- the `IamRequest` and `IamResponse` dataclasses on the bus,
- the operation set the OSS regime implements,
- per-operation input and output fields,
- the error taxonomy,
- the bootstrap modes,
- the initial HTTP forwarding endpoint used while the protocol is
being exercised.
The mapping from this regime onto the abstract contract is direct:
| Contract operation | OSS regime operation |
|---|---|
| `authenticate(credential)` | `resolve-api-key` (for API keys); local JWT validation against `get-signing-key-public` (for JWTs) |
| `authorise(identity, capability, resource, parameters)` | Role-table lookup against the OSS role bundles defined in [`capabilities.md`](capabilities.md), gated by workspace scope. Workspace can come from the resource address (workspace- and flow-level resources) or from a parameter (system-level resources whose parameters reference a workspace, e.g. `create-user with workspace association W`). |
| `authorise_many` | Loop over `authorise` |
| Identity / credential / workspace management | `create-user`, `create-api-key`, etc. as listed below. These are operations on system-level resources (the user / workspace / credential registries); workspace, where it appears in the body, is a parameter. |
Architectural context — roles, capabilities, workspace as resource
scope, enforcement boundary — lives in [`iam.md`](iam.md) and
[`capabilities.md`](capabilities.md). The contract abstraction
lives in [`iam-contract.md`](iam-contract.md).
## Transport
@ -345,5 +365,7 @@ lands in the subsequent middleware work.
## References
- [IAM Contract Specification](iam-contract.md) — the abstract
gateway↔IAM regime contract this protocol implements.
- [Identity and Access Management Specification](iam.md)
- [Capability Vocabulary Specification](capabilities.md)

@ -199,9 +199,9 @@ The server rejects all non-auth messages until authentication succeeds.
The socket remains open on auth failure, allowing the client to retry
with a different token without reconnecting. The client can also send
a new auth message at any time to re-authenticate — for example, to
refresh an expiring JWT or to switch workspace. The
resolved identity (user, workspace, roles) is updated on each
successful auth.
refresh an expiring JWT or to switch workspace. The resolved
identity (handle, workspace, principal_id, source) is updated on
each successful auth.
#### API keys
@ -219,7 +219,7 @@ For programmatic access: CLI tools, scripts, and integrations.
On each request, the gateway resolves an API key by:
1. Hashing the token.
2. Checking a local cache (hash → user/workspace/roles).
2. Checking a local cache (hash → identity).
3. On cache miss, calling the IAM service to resolve.
4. Caching the result with a short TTL (e.g. 60 seconds).
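The four resolution steps can be sketched as follows, also honouring the contract's rule that a regime-supplied `expires` hint wins when it is sooner than the gateway's own TTL. `resolve` stands in for the IAM-service round trip and is assumed to return `(identity, expires_at_or_None)` with wall-clock times; SHA-256 as the hash is likewise an assumption.

```python
import hashlib
import time


class ApiKeyResolver:
    def __init__(self, resolve, ttl: float = 60.0, clock=time.time):
        self._resolve = resolve   # hypothetical IAM call: key -> (identity, expires_at | None)
        self._ttl = ttl
        self._clock = clock
        self._cache = {}          # sha256(key) -> (identity, cache_expiry)

    def authenticate(self, token: str):
        digest = hashlib.sha256(token.encode()).hexdigest()   # 1. hash the token
        now = self._clock()
        hit = self._cache.get(digest)                          # 2. check local cache
        if hit and hit[1] > now:
            return hit[0]
        # 3. cache miss: ask the IAM service (failures raise and are not cached)
        identity, expires_at = self._resolve(token)
        expiry = now + self._ttl
        if expires_at is not None:
            expiry = min(expiry, expires_at)                   # honour the smaller bound
        self._cache[digest] = (identity, expiry)               # 4. cache with short TTL
        return identity
```

Only the hash is used as the cache key, so the plaintext key never needs to be held in the cache.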
@ -233,9 +233,15 @@ For interactive access via the UI or WebSocket connections.
- A user logs in with username and password. The gateway forwards the
request to the IAM service, which validates the credentials and
returns a signed JWT.
- The JWT carries the user ID, workspace, and roles as claims.
- The JWT carries identity-binding claims only — user id (`sub`)
and the workspace this credential authenticates to. No roles,
no policy state. Per the IAM contract, all policy decisions go
through `authorise`; the gateway never reads roles or other
regime-internal state from the credential.
- The gateway validates JWTs locally using the IAM service's public
signing key — no service call needed on subsequent requests.
signing key — no service call needed for the authentication step;
authorisation calls remain per-request (cached per the contract's
caching rules).
- Token expiry is enforced by standard JWT validation at the time the
request (or WebSocket connection) is made.
- For long-lived WebSocket connections, the JWT is validated at connect
@ -285,35 +291,82 @@ authentication uses API keys or JWTs. On first start, the bootstrap
process creates a default workspace and admin user with an initial API
key.
### User identity
### Identity, credentials, and workspace binding
A user belongs to exactly one workspace. The design supports extending
this to multi-workspace access in the future (see
[Extension points](#extension-points)).
The gateway never asks "which workspace does *this user* belong to?".
That question forces every IAM regime to expose a user-to-workspace
mapping, which prevents regimes where the relationship is many-to-many
or doesn't exist (e.g. SSO with IdP-driven workspace selection).
Instead, the gateway asks "which workspace does *this credential*
authenticate to?" — a question every regime can answer in its own
terms.
A credential (API key, JWT, OIDC token, etc.) is **bound to a
workspace at issue time**. The IAM regime decides what binding
means:
- **OSS regime** — each user has a home workspace; credentials
issued to that user are bound to that workspace. A 1:1
user-to-workspace constraint is an internal data-model decision,
not a contract assertion.
- **Multi-workspace regime** (future / enterprise) — a user with
access to several workspaces gets a different credential per
workspace. Each credential authenticates to exactly one
workspace; the relationship between user and workspace is a
regime-internal detail the gateway does not see.
When the gateway authenticates a credential, the IAM regime returns
an `Identity` whose `workspace` is the workspace this credential is
for. That value — not "the user's workspace" — is what the gateway
uses for default-fill-in and as input to the IAM `authorise` call.
#### Identity surface
What the gateway holds after `authenticate`:
| Field | Purpose |
|-------|---------|
| `handle` | Opaque token quoted back when calling `authorise`. Regime-defined. |
| `workspace` | The workspace this credential authenticates to. Used as the default if a request omits workspace. |
| `principal_id` | Stable identifier for audit logging (a user id, sub claim, service account id). Never used for authorisation. |
| `source` | How the credential was presented (`api-key`, `jwt`). Logged with audit events; not policy input. |
Anything else — roles, claims, group memberships, policy attributes
— stays inside the regime and is reachable only via `authorise`.
See [`iam-contract.md`](iam-contract.md) for the full contract.
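For illustration, the identity surface fits in a frozen dataclass that has no room for roles or claims. Below it is a typing `Protocol` for the two contract calls. This is a hypothetical Python rendering, not the actual gateway types. The `(allow, ttl)` return shape follows the contract description.

```python
from dataclasses import dataclass
from typing import Any, Dict, Protocol, Tuple


@dataclass(frozen=True)
class Identity:
    """Everything the gateway holds after authenticate(); nothing more."""

    handle: str        # opaque, regime-defined; passed back to authorise()
    workspace: str     # workspace this credential authenticates to
    principal_id: str  # audit logging only; never an authorisation input
    source: str        # e.g. "api-key" or "jwt"; logged, not policy input


class IamRegime(Protocol):
    """Hypothetical Python rendering of the two contract calls."""

    def authenticate(self, credential: str) -> Identity: ...

    def authorise(
        self,
        identity: Identity,
        capability: str,
        resource: Dict[str, str],
        parameters: Dict[str, Any],
    ) -> Tuple[bool, float]: ...
```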
#### OSS user record
The OSS regime stores the following per user. These fields are
**OSS-implementation specifics**, not part of the contract.
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique user identifier (UUID) |
| `name` | string | Display name |
| `email` | string | Email address (optional) |
| `workspace` | string | Home workspace; default binding for issued credentials |
| `roles` | list[string] | Assigned roles (e.g. `["reader"]`) |
| `enabled` | bool | Whether the user can authenticate |
| `created` | datetime | Account creation timestamp |
The `workspace` field on a user record is the **default binding**
used when issuing credentials, not a constraint visible to the
gateway. An enterprise regime may have no user records at all
(authentication delegated to an IdP).
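Binding at issue time can be pictured in the OSS regime as follows. This is a sketch only. `OssUser`, `ApiKeyRecord` and `issue_api_key` are hypothetical names. The key format (`secrets.token_urlsafe`) and the SHA-256 hash are assumptions. Refusing issuance for a disabled user is also an assumed policy. The home workspace is copied into the credential record once, so later lookups read the credential's binding rather than the user's.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OssUser:
    # Subset of the OSS user record fields above.
    id: str
    name: str
    workspace: str              # home workspace: the default binding
    roles: List[str] = field(default_factory=list)
    enabled: bool = True


@dataclass(frozen=True)
class ApiKeyRecord:
    key_hash: str   # only the hash is stored
    user_id: str
    workspace: str  # fixed at issue time; the credential's binding


def issue_api_key(user: OssUser) -> Tuple[str, ApiKeyRecord]:
    """Hypothetical OSS issuance: bind the new key to the home workspace."""
    if not user.enabled:
        raise PermissionError(f"user {user.id} is disabled")
    token = secrets.token_urlsafe(32)
    record = ApiKeyRecord(
        key_hash=hashlib.sha256(token.encode("utf-8")).hexdigest(),
        user_id=user.id,
        workspace=user.workspace,
    )
    return token, record
```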
### Workspaces
A workspace is an isolated data boundary — a tenancy scope in which
users, flows, configuration, documents, and knowledge graphs live.
Workspaces map to storage-layer isolation: the `user` field in
`Metadata`, the corresponding Cassandra keyspace, the Qdrant
collection prefix, the Neo4j property filter.
Workspace is the most prominent component of an operation's
**resource scope**: when a request says "do X to Y", workspace is
part of "Y". Listing users, creating flows, querying the graph —
all of these target a specific workspace.
| Field | Type | Description |
|-------|------|-------------|
@ -322,57 +375,164 @@ Qdrant collection prefix, and Neo4j property filters.
| `enabled` | bool | Whether the workspace is active |
| `created` | datetime | Creation timestamp |
#### Default-fill-in
If a request omits workspace, the gateway fills it in from the
authenticated identity's bound workspace (`identity.workspace`)
before any IAM check runs. IAM never receives an unresolved
workspace; every `authorise` call sees a concrete value.
The gateway sets the `user` field in `Metadata` to this resolved
workspace ID, replacing the caller-supplied `?user=` query parameter.
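Default-fill-in reduces to a small pure function. In the sketch below, `default_fill_workspace` is a hypothetical helper. Treating an empty string the same as an absent workspace is an assumption. An explicitly supplied workspace is kept as-is, because whether it is permitted is left to `authorise`.

```python
from typing import Any, Dict, Mapping


def default_fill_workspace(request: Mapping[str, Any],
                           identity_workspace: str) -> Dict[str, Any]:
    """Return a copy of the request that always carries a concrete workspace.

    Hypothetical helper. A supplied workspace is kept unchanged (permission
    is IAM's decision, not the gateway's). An absent or empty one is filled
    from identity.workspace.
    """
    filled = dict(request)  # never mutate the caller's request
    if not filled.get("workspace"):
        filled["workspace"] = identity_workspace
    return filled
```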
#### Authorisation
Whether the resolved workspace is permitted to be operated on by
this caller is an **IAM decision**, not a gateway one. The gateway
calls `authorise(identity, capability, {workspace: ..., ...})` and
relays the answer. In the OSS regime, the answer comes from the
caller's role × workspace-scope — see [`capabilities.md`](capabilities.md).
In other regimes it could come from group mappings, policies,
relationship tuples, or anything else the regime models.
### Request anatomy
The shape of a request — where workspace appears, where flow
appears, where parameters live — follows from **the level of the
resource being operated on**, not from any single property of the
request like its URL or its required capability.
Resources live at one of three levels (see also the resource model
in [`iam-contract.md`](iam-contract.md)):
| Resource level | Examples | Resource address |
|---|---|---|
| **System** | The user registry, the workspace registry, the IAM signing key, the audit log | empty `{}` |
| **Workspace** | A workspace's config, flow definitions, library, knowledge cores, collections | `{workspace: ...}` |
| **Flow** | A flow's knowledge graph, agent state, LLM context, embeddings, MCP context | `{workspace: ..., flow: ...}` |
For the gateway-to-bus mapping this dictates **where workspace
lives in the message**, but only when workspace is part of the
*resource address*. Workspace can also appear as an *operation
parameter* on system-level resources (see below).
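The three resource levels correspond directly to the shape of the resource address. In this sketch, `ResourceLevel` and `resource_address` are hypothetical names. A system-level address stays empty even when a workspace is passed, because on those operations workspace is a parameter and never part of the address.

```python
from enum import Enum
from typing import Dict, Optional


class ResourceLevel(Enum):
    SYSTEM = "system"
    WORKSPACE = "workspace"
    FLOW = "flow"


def resource_address(level: ResourceLevel, workspace: Optional[str] = None,
                     flow: Optional[str] = None) -> Dict[str, str]:
    """Build the resource address for one operation (hypothetical helper)."""
    if level is ResourceLevel.SYSTEM:
        return {}  # any workspace here is an operation parameter instead
    if not workspace:
        raise ValueError("workspace- and flow-level resources need a workspace")
    if level is ResourceLevel.WORKSPACE:
        return {"workspace": workspace}
    if not flow:
        raise ValueError("flow-level resources need a flow")
    return {"workspace": workspace, "flow": flow}
```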
#### Workspace as address vs. parameter
Two distinct roles, two distinct locations:
- **Workspace as address component.** Workspace identifies the
thing being operated on. Used for workspace-level and flow-level
resources. Lives in the addressing layer of the message — the
URL path for HTTP, or the WebSocket envelope alongside `flow` for
flow-scoped operations sent through the Mux.
- **Workspace as operation parameter.** Workspace is data the
operation acts on, while the resource itself is system-level.
Used for operations on the user registry (`create-user` with
workspace association W), the workspace registry
(`create-workspace W`), and other system-level operations that
happen to reference a workspace. Lives in the request body or inner
WS payload alongside the operation's other parameters.
The two roles never coexist on the same operation. An operation
either addresses something within a workspace (workspace is in the
address), operates on a system-level resource with workspace as a
parameter (workspace is in the body), or ignores workspace entirely
(system-level operations like `bootstrap`, `rotate-signing-key`, and
`login` itself).
#### Where workspace lives, by request type
| Request type | Resource level | Workspace lives in |
|---|---|---|
| Flow-scoped data plane (`agent`, `graph-rag`, `llm`, `embeddings`, `mcp`, etc.) | Flow | Envelope alongside `flow` (WS) or URL path (HTTP) — part of the address |
| Workspace-scoped control plane (`config`, `library`, `knowledge`, `collection-management`, flow lifecycle) | Workspace | Body / inner request — part of the address |
| User registry ops (`create-user`, `list-users`, `disable-user`, etc.) | System | Body — as a *parameter* (the user's workspace association or a list filter) |
| Workspace registry ops (`create-workspace`, `list-workspaces`, etc.) | System | Body — as a *parameter* (the workspace identifier in `workspace_record`) |
| Credential ops (`create-api-key`, `revoke-api-key`, `change-password`, `reset-password`) | System | Body — as a *parameter* on ops that have one; absent on `change-password` (target is the caller's identity) |
| System ops (`bootstrap`, `login`, `rotate-signing-key`, `get-signing-key-public`) | System | Not present at all |
The classification is deliberate. Users are a global concept that
*have* a workspace; they don't *live* in one. An OSS regime has
1:1 user-to-workspace; a multi-workspace regime maps a user to many
workspaces; an SSO regime might delegate workspace membership to an
IdP entirely. The gateway treats user-registry operations as
system-level so the contract is the same across regimes — the
workspace association is a parameter the regime interprets in its
own terms.
#### HTTP
HTTP routes by URL path, so the address lives in the URL.
Per-operation REST shape:
- Flow-level: `POST /api/v1/workspaces/{w}/flows/{f}/services/{kind}`;
  `workspace` and `flow` are URL components.
- Workspace-level: `POST /api/v1/workspaces/{w}/config`,
`/api/v1/workspaces/{w}/library`, etc. — `workspace` is a URL
component.
- System-level: `POST /api/v1/users`, `/api/v1/workspaces`, etc. —
no workspace in URL; if the operation references one, it's a
field in the body.
`/api/v1/iam` is itself registry-driven: the body's `operation`
field is looked up against the registry to obtain the capability,
resource shape, and parameter shape per operation, rather than
gating the whole endpoint with a single coarse capability.
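A registry-driven lookup might look like the sketch below. Assume registry keys use bare names on `/api/v1/iam` and `<kind>:<op>` on `/api/v1/{kind}`. `Operation`, `REGISTRY`, `lookup` and `UnknownOperation` are hypothetical names. The capability assigned to each entry is illustrative only. Refusing an operation with no registry entry is a fail-closed assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Extracted = Tuple[Dict[str, str], Dict[str, Any]]  # (resource, parameters)


@dataclass(frozen=True)
class Operation:
    capability: str
    level: str  # "system", "workspace" or "flow"
    extract: Callable[[Dict[str, Any]], Extracted]


# Hypothetical entries; the capability on each op is illustrative.
REGISTRY: Dict[str, Operation] = {
    # Bare IAM op names. create-user is system-level, so workspace is a
    # parameter, not part of the resource address.
    "create-user": Operation(
        "users:write", "system",
        lambda body: ({}, {"workspace": body.get("workspace")})),
    "rotate-signing-key": Operation(
        "iam:admin", "system", lambda body: ({}, {})),
    # "<kind>:<op>" names. Workspace-level, so workspace is in the address.
    "config:get": Operation(
        "config:read", "workspace",
        lambda body: ({"workspace": body["workspace"]}, {})),
}


class UnknownOperation(Exception):
    """No registry entry: refuse rather than guess a capability."""


def lookup(endpoint_kind: str,
           body: Dict[str, Any]) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    op_name = body["operation"]
    key = op_name if endpoint_kind == "iam" else f"{endpoint_kind}:{op_name}"
    op = REGISTRY.get(key)
    if op is None:
        raise UnknownOperation(key)
    resource, parameters = op.extract(body)
    return op.capability, resource, parameters
```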
#### WebSocket Mux
The Mux envelope is the addressing layer for flow-scoped
operations. For workspace-level and system-level operations the
envelope routes by `service` only, and the inner request payload
carries the address components or parameters as appropriate. See
[`iam-contract.md`](iam-contract.md) for the operation-registry
mechanism the Mux uses to know which fields to read.
### Roles and access control
Roles are an OSS-regime concept and live entirely in the IAM
service. The gateway does not enumerate or check them — it asks
`authorise(identity, capability, resource, parameters)` per
request and the regime maps the caller's roles to a decision.
The OSS regime ships three roles:
| Role | Capabilities granted |
|------|----------------------|
| `reader` | Read capabilities on data and config (`graph:read`, `documents:read`, `rows:read`, `config:read`, `flows:read`, `knowledge:read`, `collections:read`, `keys:self`, plus the per-service capabilities `agent`, `llm`, `embeddings`, `mcp`). |
| `writer` | All reader capabilities, plus `graph:write`, `documents:write`, `rows:write`, `knowledge:write`, `collections:write`. |
| `admin` | All writer capabilities, plus `config:write`, `flows:write`, `users:read`, `users:write`, `users:admin`, `keys:admin`, `workspaces:admin`, `iam:admin`, `metrics:read`. |
Workspace scope: `reader` and `writer` are active only in the
caller's bound workspace; `admin` is active across all workspaces.
Roles are hierarchical: `admin` implies `writer`, which implies
`reader`.
The gateway gates each endpoint by *capability*, not by role.
Capabilities are declared per operation in the gateway's operation
registry; see [`iam-contract.md`](iam-contract.md) for the
registry mechanism and [`capabilities.md`](capabilities.md) for
the capability vocabulary.
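One way to picture the OSS regime's side of `authorise` is the sketch below. The role grants are built from the table above, with hierarchy expressed as set union. The workspace-scope rule follows the paragraph above. `oss_authorise` is a hypothetical name and not the iam-svc code. The sketch also assumes that system-level resources, which have no workspace in the address, are checked on capability alone.

```python
from typing import Dict, FrozenSet, Iterable, Mapping

READER: FrozenSet[str] = frozenset({
    "graph:read", "documents:read", "rows:read", "config:read", "flows:read",
    "knowledge:read", "collections:read", "keys:self",
    "agent", "llm", "embeddings", "mcp",
})
# Hierarchy as set union: writer includes reader, admin includes writer.
WRITER = READER | {"graph:write", "documents:write", "rows:write",
                   "knowledge:write", "collections:write"}
ADMIN = WRITER | {"config:write", "flows:write", "users:read", "users:write",
                  "users:admin", "keys:admin", "workspaces:admin", "iam:admin",
                  "metrics:read"}
GRANTS: Dict[str, FrozenSet[str]] = {"reader": READER, "writer": WRITER, "admin": ADMIN}


def oss_authorise(roles: Iterable[str], bound_workspace: str,
                  capability: str, resource: Mapping[str, str]) -> bool:
    target = resource.get("workspace")
    for role in roles:
        if capability not in GRANTS.get(role, frozenset()):
            continue
        # admin is active in every workspace; reader/writer only in the
        # caller's bound workspace. No workspace => system-level resource.
        if role == "admin" or target is None or target == bound_workspace:
            return True
    return False
```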
### IAM service
The IAM service is a backend service that implements the
[IAM contract](iam-contract.md) — `authenticate`, `authorise`, and
the management operations the gateway forwards. It is the
authority for identity, credential validation, and access decisions.
The gateway treats it as a black box behind the contract; nothing
in the gateway is regime-specific.
The OSS distribution ships one IAM regime: a role-based service
backed by Cassandra, described in
[`iam-protocol.md`](iam-protocol.md). Enterprise / future regimes
can replace this implementation without changing the gateway, the
wire protocol between gateway and backends, or the capability
vocabulary — see the contract spec for the abstraction the gateway
is wired against and the implementation notes for what other
regimes look like.
#### OSS data model
The OSS regime stores users, workspaces, API keys, and signing
keys in Cassandra. This is an **OSS regime implementation
detail**; it is not part of the contract. Other regimes will have
different (or no) data models.
```
iam_workspaces (
@ -456,42 +616,53 @@ surface — e.g. `"missing required field 'workspace'"` or
### Gateway changes
The current `Authenticator` class is replaced with a thin
authentication+authorisation middleware that delegates to the IAM
service per the IAM contract. The gateway performs no role check
itself — authorisation is asked of the regime via `authorise`.
For HTTP requests:
1. Extract Bearer token from the `Authorization` header.
2. If the token has JWT format (dotted structure):
- Validate signature locally using the cached public key.
- Build an `Identity` from `sub` and `workspace` claims (no
other claims are consulted).
3. Otherwise, treat as an API key:
- Hash the token and check the local cache.
- On cache miss, call the IAM service to resolve to an
`Identity` (handle, workspace, principal_id, source).
- Cache the result with a short TTL.
4. If neither succeeds, return 401.
5. Look up the operation in the gateway's operation registry to get
`(capability, resource_level, extractors)`. Build the resource
address (system / workspace / flow level) and parameters from
the request.
6. Default-fill the workspace into the body when the operation is
workspace- or flow-level (so downstream code sees a single
canonical address); the resource address keeps its supplied
value.
7. Call `authorise(identity, capability, resource, parameters)`.
On allow, forward the request; on deny, return 403. On regime
error, fail closed (401 / 503 per deployment).
8. Cache the decision per the contract's caching rules (clamped
above by a deployment-set ceiling).
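Steps 7 and 8 can be sketched as a per-decision cache that clamps the regime's TTL and fails closed. `DecisionCache` and `RegimeError` are hypothetical names. Keying the cache on everything passed to `authorise` is an assumption about the contract's caching rules. Returning 503 on regime error is one of the two deployment choices the step lists.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

Authorise = Callable[[str, str, Dict[str, str], Dict[str, Any]], Tuple[bool, float]]


class RegimeError(Exception):
    """Hypothetical: raised by the IAM client when the regime cannot answer."""


class DecisionCache:
    """Ask the regime, cache each decision, and fail closed on errors."""

    def __init__(self, authorise: Authorise, ceiling: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._authorise = authorise
        self._ceiling = ceiling  # deployment-set upper bound on TTL
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, bool]] = {}

    def decide(self, handle: str, capability: str,
               resource: Dict[str, str], parameters: Dict[str, Any]) -> int:
        # Key on everything authorise() sees (assumed caching rule).
        key = (handle, capability,
               tuple(sorted(resource.items())),
               tuple(sorted((k, repr(v)) for k, v in parameters.items())))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return 200 if hit[1] else 403
        try:
            allow, ttl = self._authorise(handle, capability, resource, parameters)
        except RegimeError:
            return 503  # fail closed; a deployment may prefer 401
        ttl = max(0.0, min(ttl, self._ceiling))  # clamp the regime's TTL
        if ttl > 0:
            self._entries[key] = (now + ttl, allow)
        return 200 if allow else 403
```

Deny decisions are cached the same way as allows. A regime error is never cached, so the next request asks again.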
For WebSocket connections:
1. Accept the connection in an unauthenticated state.
2. Wait for an auth message (`{"type": "auth", "token": "..."}`).
3. Validate the token using the same logic as HTTP steps 2-3 above
   (JWT or API key resolution to an `Identity`).
4. On success, attach the resolved identity to the connection and
send `{"type": "auth-ok", ...}`.
5. On failure, send `{"type": "auth-failed", ...}` but keep the
socket open.
6. Reject all non-auth messages until authentication succeeds.
7. Accept new auth messages at any time to re-authenticate.
8. For each subsequent request frame, look up
`flow-service:<service>` in the registry and call `authorise`
   against the `{workspace, flow}` resource, so every frame is subject
   to the same authorisation an HTTP caller of that service receives.
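The connection lifecycle and per-frame check can be sketched as a small state holder. `MuxSession` is a hypothetical name. The frame fields `service`, `flow` and `workspace`, the `error` replies, and the internal `forward` action are all assumptions. The sketch uses a trimmed-down `Identity`. It also assumes a failed re-authentication drops the previous identity, which keeps the connection fail-closed.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Identity:
    handle: str
    workspace: str  # trimmed-down Identity for this sketch


class MuxSession:
    """Hypothetical per-connection state for the WebSocket Mux."""

    def __init__(self,
                 authenticate: Callable[[str], Optional[Identity]],
                 authorise: Callable[[Identity, str, Dict[str, str], Dict[str, Any]],
                                     Tuple[bool, float]]) -> None:
        self._authenticate = authenticate
        self._authorise = authorise
        self.identity: Optional[Identity] = None  # unauthenticated on accept

    def on_message(self, raw: str) -> Dict[str, Any]:
        """Return the frame to send back, or an internal 'forward' action."""
        msg = json.loads(raw)
        if msg.get("type") == "auth":
            # Re-authentication is accepted at any time. Assumption: a
            # failed re-auth drops the previous identity (fail closed).
            self.identity = self._authenticate(msg.get("token", ""))
            if self.identity is None:
                return {"type": "auth-failed"}  # socket stays open
            return {"type": "auth-ok"}
        if self.identity is None:
            return {"type": "error", "error": "not authenticated"}
        flow = msg.get("flow")
        if not flow:
            return {"type": "error", "error": "missing flow"}
        # Default-fill the workspace from the identity, then check per frame.
        resource = {"workspace": msg.get("workspace") or self.identity.workspace,
                    "flow": flow}
        capability = f"flow-service:{msg.get('service')}"
        allow, _ttl = self._authorise(self.identity, capability, resource, {})
        if not allow:
            return {"type": "error", "error": "forbidden"}
        return {"type": "forward", "capability": capability, "resource": resource}
```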
### CLI changes
@ -892,6 +1063,12 @@ service, not in the config service. Reasons:
## References
- [IAM Contract Specification](iam-contract.md) — the gateway↔IAM
regime abstraction this design is wired against.
- [IAM Service Protocol Specification](iam-protocol.md) — the OSS
regime's wire-level protocol.
- [Capability Vocabulary Specification](capabilities.md) — the
capability strings the gateway uses as `authorise` input.
- [Data Ownership and Information Separation](data-ownership-model.md)
- [MCP Tool Bearer Token Specification](mcp-tool-bearer-token.md)
- [Multi-Tenant Support Specification](multi-tenant-support.md)