mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-29 10:26:21 +02:00
367 lines
16 KiB
Markdown
367 lines
16 KiB
Markdown
|
|
---
|
||
|
|
layout: default
|
||
|
|
title: "IAM Contract Technical Specification"
|
||
|
|
parent: "Tech Specs"
|
||
|
|
---
|
||
|
|
|
||
|
|
# IAM Contract Technical Specification
|
||
|
|
|
||
|
|
## Overview
|
||
|
|
|
||
|
|
The IAM contract is the abstraction between the API gateway and any
|
||
|
|
identity / access management regime that fronts it. The gateway
|
||
|
|
treats IAM as a black box behind two operations — *authenticate* and
|
||
|
|
*authorise* — plus a small surface of management operations. No
|
||
|
|
regime-specific concept (roles, scopes, groups, claims, policy
|
||
|
|
languages) is visible to the gateway, and no gateway-specific
|
||
|
|
concept (capability vocabulary, request anatomy) is visible to
|
||
|
|
backend services.
|
||
|
|
|
||
|
|
The TrustGraph open-source distribution ships one IAM regime — a
|
||
|
|
role-based implementation defined in
|
||
|
|
[`iam-protocol.md`](iam-protocol.md) — that is one implementation of
|
||
|
|
this contract. Enterprise editions can replace it with a different
|
||
|
|
regime (OIDC / SSO, ABAC, ReBAC, external policy engine) without
|
||
|
|
changing the gateway, the wire protocol, or the backends.
|
||
|
|
|
||
|
|
## Motivation
|
||
|
|
|
||
|
|
Authorisation models vary by deployment. A small team might be
|
||
|
|
happy with three predefined roles; an enterprise might need group-
|
||
|
|
mapping from an upstream IdP, attribute-based policies, or
|
||
|
|
relationship-based access control. Hard-wiring any one of those
|
||
|
|
into the gateway forces every other regime to either compromise its
|
||
|
|
model or be re-implemented.
|
||
|
|
|
||
|
|
A narrow contract — "authenticate this credential" and "may this
|
||
|
|
identity perform this operation on this resource" — captures what
|
||
|
|
the gateway actually needs to know without committing to a policy
|
||
|
|
shape. The IAM regime owns the policy decision; the gateway is a
|
||
|
|
generic enforcement point.
|
||
|
|
|
||
|
|
## Operations
|
||
|
|
|
||
|
|
### `authenticate`
|
||
|
|
|
||
|
|
```
|
||
|
|
authenticate(credential: bytes) → Identity | AuthFailure
|
||
|
|
```
|
||
|
|
|
||
|
|
Validates a credential the client presented. The gateway treats
|
||
|
|
the credential as opaque bytes — for the OSS regime today that's
|
||
|
|
either an API key plaintext or a JWT, but the gateway does not
|
||
|
|
parse them; the IAM regime decides.
|
||
|
|
|
||
|
|
On success, returns an `Identity`. On any failure the IAM regime
|
||
|
|
returns the same opaque `AuthFailure` — never a description of which
|
||
|
|
condition failed. This is the spec's masked-error rule: an
|
||
|
|
attacker probing the endpoint cannot distinguish "no such key",
|
||
|
|
"expired", "wrong signature", "revoked", "user disabled", etc.
|
||
|
|
|
||
|
|
### `authorise`
|
||
|
|
|
||
|
|
```
|
||
|
|
authorise(identity: Identity,
|
||
|
|
capability: str,
|
||
|
|
resource: Resource,
|
||
|
|
parameters: dict)
|
||
|
|
→ Decision
|
||
|
|
```
|
||
|
|
|
||
|
|
Asks whether the identity is permitted to perform the named
|
||
|
|
capability on the named resource, given the operation's
|
||
|
|
parameters. Returns `allow` or `deny`. `identity` is whatever
|
||
|
|
`authenticate` returned for this caller; the gateway never
|
||
|
|
decomposes it.
|
||
|
|
|
||
|
|
The four arguments separate concerns:
|
||
|
|
|
||
|
|
- **`identity`** — who is asking.
|
||
|
|
- **`capability`** — what permission they are exercising (e.g.
|
||
|
|
`users:write`, `graph:read`). Permission, not structure.
|
||
|
|
- **`resource`** — what is being operated on, as a structured
|
||
|
|
identifier. See *The Resource model* below.
|
||
|
|
- **`parameters`** — operation-specific data that the regime may
|
||
|
|
need to consider beyond the resource identifier. Used when a
|
||
|
|
decision depends on attributes the request supplies — e.g. an
|
||
|
|
admin scoped to one workspace creating a user *with workspace
|
||
|
|
association W*: the resource is the system-level user registry,
|
||
|
|
and W is a parameter the regime checks against the admin's
|
||
|
|
scope.
|
||
|
|
|
||
|
|
Different regimes use the four arguments differently — the OSS
|
||
|
|
regime checks role bundles against the capability and the role's
|
||
|
|
workspace scope against parameters; an SSO regime might consult an
|
||
|
|
upstream IdP's group memberships; an ABAC regime evaluates a
|
||
|
|
policy with all four as inputs. The contract is unchanged.
|
||
|
|
|
||
|
|
### `authorise_many`
|
||
|
|
|
||
|
|
```
|
||
|
|
authorise_many(identity: Identity,
|
||
|
|
checks: list[(str, Resource, dict)])
|
||
|
|
→ list[Decision]
|
||
|
|
```
|
||
|
|
|
||
|
|
Bulk variant of `authorise`. Same semantics, one round-trip for
|
||
|
|
many decisions. Used when an operation fans out to multiple
|
||
|
|
resources (e.g. an agent that touches several workspaces) and a
|
||
|
|
single permission check isn't sufficient.
|
||
|
|
|
||
|
|
`authorise_many` is not just a performance optimisation; it pins
|
||
|
|
the contract for fan-out operations early, before clients (or
|
||
|
|
internal callers) build patterns that assume one-permission-check-
|
||
|
|
per-request. Regimes implement it as a loop over `authorise`
|
||
|
|
unless they have a more efficient path.
|
||
|
|
|
||
|
|
### Management operations
|
||
|
|
|
||
|
|
Beyond the request-time `authenticate` / `authorise`, the contract
|
||
|
|
also covers identity-lifecycle and credential-lifecycle operations
|
||
|
|
that are invoked by administrative requests rather than by the
|
||
|
|
authentication path. These are regime-specific in detail (an SSO
|
||
|
|
regime that delegates user management to the IdP may not implement
|
||
|
|
most of them) but the operation set the gateway can forward is:
|
||
|
|
|
||
|
|
- User management: `create-user`, `list-users`, `get-user`,
|
||
|
|
`update-user`, `disable-user`, `enable-user`, `delete-user`
|
||
|
|
- Credential management: `create-api-key`, `list-api-keys`,
|
||
|
|
`revoke-api-key`, `change-password`, `reset-password`
|
||
|
|
- Workspace management: `create-workspace`, `list-workspaces`,
|
||
|
|
`get-workspace`, `update-workspace`, `disable-workspace`
|
||
|
|
- Session management: `login`
|
||
|
|
- Key management: `get-signing-key-public`, `rotate-signing-key`
|
||
|
|
- Bootstrap: `bootstrap`
|
||
|
|
|
||
|
|
A regime that does not support one of these (e.g. an SSO regime
|
||
|
|
where users are managed in the IdP) returns a defined "not
|
||
|
|
supported" error; the gateway surfaces it as a 501.
|
||
|
|
|
||
|
|
## The `Identity` surface
|
||
|
|
|
||
|
|
`Identity` is *mostly* opaque. The gateway holds the value as a
|
||
|
|
token to quote back when calling `authorise`, never decomposing it.
|
||
|
|
But there are a few gateway-side concerns that need a small
|
||
|
|
surface:
|
||
|
|
|
||
|
|
| Field | Purpose |
|
||
|
|
|---|---|
|
||
|
|
| `handle` | Opaque reference passed back to `authorise`. Regime-defined; gateway treats as a string. |
|
||
|
|
| `workspace` | The workspace this credential authenticates to. Used by the gateway only as a default-fill-in for operations that omit a workspace. Never used as policy input — when authorisation needs to know which workspace the operation acts on, the operation places it in the resource address (or a parameter), and the regime decides. |
|
||
|
|
| `principal_id` | Stable identifier the gateway logs for audit (a user id, a sub claim, a service account id). Never used for authorisation — that's `authorise`'s job. |
|
||
|
|
| `source` | How the credential was presented (`api-key`, `jwt`, …). Non-policy; useful for logs and metrics only. |
|
||
|
|
|
||
|
|
Anything else — roles, claims, group memberships, policy attributes
|
||
|
|
— stays inside the regime and is reachable only via `authorise`.
|
||
|
|
|
||
|
|
## The `Resource` model
|
||
|
|
|
||
|
|
A `Resource` is a structured value identifying *what is being
|
||
|
|
operated on*. Resources live at one of three levels in TrustGraph,
|
||
|
|
based on where the resource exists in the deployment:
|
||
|
|
|
||
|
|
### Resource levels
|
||
|
|
|
||
|
|
| Level | What lives there | Resource shape |
|
||
|
|
|---|---|---|
|
||
|
|
| **System** | The user registry, the workspace registry, the signing key, the audit log — anything that exists once per deployment. | `{}` |
|
||
|
|
| **Workspace** | A workspace's config, flow definitions, library (documents), knowledge cores, collections — things that exist *within* a workspace. | `{workspace: "..."}` |
|
||
|
|
| **Flow** | A flow's knowledge graph, agent state, LLM context, embedding state, MCP context — things that exist *within* a flow within a workspace. | `{workspace: "...", flow: "..."}` |
|
||
|
|
|
||
|
|
Note carefully:
|
||
|
|
|
||
|
|
- **Users are a system-level resource.** A user record exists at
|
||
|
|
the deployment level; the fact that a user has a *workspace
|
||
|
|
association* (one in OSS, possibly many in other regimes) is a
|
||
|
|
property of the user record, not a containment. Operations on
|
||
|
|
the user registry have `resource = {}`; the workspace
|
||
|
|
association appears as a *parameter*, not as a resource address
|
||
|
|
component.
|
||
|
|
- **Workspaces themselves are a system-level resource.** The
|
||
|
|
workspace registry exists at the deployment level. `create-
|
||
|
|
workspace` and `list-workspaces` are system-level operations;
|
||
|
|
the workspace identifier in their bodies is a parameter, not an
|
||
|
|
address.
|
||
|
|
- **A workspace's contents are workspace-level resources.** A
|
||
|
|
workspace's config, flows, library, etc. live within a
|
||
|
|
workspace. Their resource address is `{workspace: ...}`.
|
||
|
|
- **A flow's contents are flow-level resources.** Knowledge
|
||
|
|
graphs, agents, etc. live within a flow. Their resource
|
||
|
|
address is `{workspace: ..., flow: ...}`.
|
||
|
|
|
||
|
|
### Component vocabulary
|
||
|
|
|
||
|
|
| Component | Type | Meaning | Used by |
|
||
|
|
|---|---|---|---|
|
||
|
|
| `workspace` | string | Identifier of the workspace whose contents are being operated on | workspace-level and flow-level resource addresses |
|
||
|
|
| `flow` | string | Identifier of a flow within a workspace; always paired with `workspace` | flow-level resource addresses |
|
||
|
|
| `collection` | string | Reserved for finer-grained scoping within a workspace | future / enterprise |
|
||
|
|
| `document` | string | Reserved for per-document scoping | future / enterprise |
|
||
|
|
|
||
|
|
A `Resource` is a partial mapping of these components to values.
|
||
|
|
The level of the resource (system / workspace / flow) determines
|
||
|
|
which components must be present. An empty `{}` is the
|
||
|
|
system-level resource.
|
||
|
|
|
||
|
|
### Workspace as parameter vs. address
|
||
|
|
|
||
|
|
Workspace plays two distinct roles in operations and shows up in
|
||
|
|
two distinct places:
|
||
|
|
|
||
|
|
- **As a resource address component** — workspace identifies the
|
||
|
|
thing being operated on. Lives in `resource.workspace`. Example:
|
||
|
|
`config:read` reads the config *of* workspace W.
|
||
|
|
- **As an operation parameter** — workspace is data the operation
|
||
|
|
acts on or filters by, while the resource itself is system-level.
|
||
|
|
Lives in `parameters.workspace`. Example: `users:write`
|
||
|
|
creates a user *with workspace association* W; the resource is
|
||
|
|
the user registry (system), and W is a parameter.
|
||
|
|
|
||
|
|
These are not interchangeable. The IAM regime considers each role
|
||
|
|
separately; the OSS role table, for instance, applies workspace-
|
||
|
|
scope to the address component when checking workspace-level
|
||
|
|
operations, and to a parameter when checking
|
||
|
|
"create-user-with-workspace-W". Both end up enforcing the admin's
|
||
|
|
scope, but through different code paths.
|
||
|
|
|
||
|
|
### Extension rules
|
||
|
|
|
||
|
|
The vocabulary is closed but extensible. Adding a new component:
|
||
|
|
|
||
|
|
1. The component is added to the vocabulary in this spec, with a
|
||
|
|
defined name, type, and meaning.
|
||
|
|
2. Existing IAM regimes ignore unknown components (forward
|
||
|
|
compatibility — adding a new component does not break older
|
||
|
|
regimes that don't understand it).
|
||
|
|
3. Older gateways that don't populate a new component leave it
|
||
|
|
unset; regimes that need it for a decision treat "unset" as
|
||
|
|
"absent" and decide accordingly (typically: cannot grant
|
||
|
|
permission scoped to a component the gateway didn't supply).
|
||
|
|
|
||
|
|
A regime that wants stricter behaviour (e.g. fail-closed on
|
||
|
|
unknown components rather than ignoring them) declares so as part
|
||
|
|
of its own configuration; the contract default is "ignore unknown".
|
||
|
|
|
||
|
|
## Operation registry (gateway-side)
|
||
|
|
|
||
|
|
Mapping a request onto `(capability, resource, parameters)` is
|
||
|
|
service-specific — it cannot be inferred from the capability
|
||
|
|
alone. The gateway maintains an **operation registry** that
|
||
|
|
declares, per operation:
|
||
|
|
|
||
|
|
- The required capability.
|
||
|
|
- The resource level (system / workspace / flow) — determines the
|
||
|
|
shape of the resource identifier.
|
||
|
|
- How to extract the resource address components (workspace,
|
||
|
|
flow) from the request — from URL path, WebSocket envelope, or
|
||
|
|
body.
|
||
|
|
- Which body fields are operation parameters (and which of those
|
||
|
|
the IAM regime should see in the `parameters` argument).
|
||
|
|
|
||
|
|
This registry is part of the gateway's endpoint declarations, not
|
||
|
|
part of the IAM contract. The contract specifies what arguments
|
||
|
|
`authorise` receives; how the gateway populates them is its own
|
||
|
|
concern.
|
||
|
|
|
||
|
|
In the OSS gateway, registry keys follow these conventions:
|
||
|
|
|
||
|
|
| Pattern | Used by | Resource level |
|
||
|
|
|---|---|---|
|
||
|
|
| bare op name (`create-user`, `list-users`, `login`, …) | `/api/v1/iam` and the auth surface | system / workspace, per op |
|
||
|
|
| `<kind>:<op>` (`config:get`, `flow:list-blueprints`, `librarian:add-document`, …) | `/api/v1/{kind}` (workspace-scoped global services) | workspace |
|
||
|
|
| `flow-service:<kind>` (`flow-service:agent`, `flow-service:graph-rag`, …) | `/api/v1/flow/{flow}/service/{kind}` and the WS Mux | flow |
|
||
|
|
| `flow-import:<kind>` / `flow-export:<kind>` | `/api/v1/flow/{flow}/{import,export}/{kind}` streaming sockets | flow |
|
||
|
|
|
||
|
|
Keys are an OSS-gateway implementation detail — the contract does
|
||
|
|
not constrain naming. The conventions above exist so the registry
|
||
|
|
key is uniquely derivable from the request path and (where
|
||
|
|
applicable) body without ambiguity.
|
||
|
|
|
||
|
|
## Caching
|
||
|
|
|
||
|
|
Both `authenticate` and `authorise` results are cached at the
|
||
|
|
gateway, on different policies:
|
||
|
|
|
||
|
|
- **`authenticate`** — cached by a hash of the credential. The OSS
|
||
|
|
gateway uses a fixed short TTL (currently 60 s) so that revoked
|
||
|
|
API keys and disabled users stop working within the TTL window
|
||
|
|
without any push mechanism. Regimes that want a different
|
||
|
|
behaviour can return an `expires` hint with the identity; the
|
||
|
|
gateway honours the smaller of `expires` and its own ceiling.
|
||
|
|
|
||
|
|
- **`authorise`** — cached by a hash of `(handle, capability,
|
||
|
|
resource, parameters)`. The regime returns a suggested TTL with
|
||
|
|
the decision; the gateway clamps it above by a deployment-set
|
||
|
|
ceiling (currently 60 s). Both allow and deny decisions are
|
||
|
|
cached; denies briefly, to avoid hammering the regime with
|
||
|
|
repeated rejected attempts.
|
||
|
|
|
||
|
|
The TTL ceiling caps the revocation latency window — a role
|
||
|
|
revoked at the regime takes effect at the gateway no later than
|
||
|
|
the ceiling. Operators that need stricter revocation can lower
|
||
|
|
the ceiling.
|
||
|
|
|
||
|
|
## Failure modes
|
||
|
|
|
||
|
|
| Condition | Behaviour |
|
||
|
|
|---|---|
|
||
|
|
| `authenticate` returns AuthFailure | Gateway responds 401 with the masked `auth failure` body. |
|
||
|
|
| `authorise` returns deny | Gateway responds 403 with the masked `access denied` body. |
|
||
|
|
| IAM regime unreachable | Gateway responds 401 / 503 (deployment-defined). No fail-open. |
|
||
|
|
| `authorise_many` partial deny | Gateway treats the request as denied; the operation is rejected. Partial-success semantics are not part of the contract. |
|
||
|
|
| Regime returns "not supported" for a management operation | Gateway responds 501. |
|
||
|
|
|
||
|
|
There is no fallback or "soft" decision path. An IAM regime that
|
||
|
|
is unavailable, slow, or returning errors causes requests to fail
|
||
|
|
closed.
|
||
|
|
|
||
|
|
## Implementations
|
||
|
|
|
||
|
|
### Open-source role-based regime
|
||
|
|
|
||
|
|
Defined in [`iam-protocol.md`](iam-protocol.md). Implements the
|
||
|
|
contract via:
|
||
|
|
|
||
|
|
- A pub/sub request/response service (`iam-svc`) reached only by
|
||
|
|
the gateway over the message bus.
|
||
|
|
- Credentials are API keys (opaque) or JWTs (Ed25519, locally
|
||
|
|
validated by the gateway against the regime's published public
|
||
|
|
key).
|
||
|
|
- `authorise` reduces to a role-and-workspace-scope check against
|
||
|
|
the role table defined in [`capabilities.md`](capabilities.md).
|
||
|
|
- Identity, user, and workspace records live in Cassandra.
|
||
|
|
|
||
|
|
The OSS regime is deliberately simple — three roles, single
|
||
|
|
home-workspace per user (a regime data-model decision, not a
|
||
|
|
contract assertion), no policy language.
|
||
|
|
|
||
|
|
### Future regimes
|
||
|
|
|
||
|
|
The contract is shaped to admit, without code change in the
|
||
|
|
gateway:
|
||
|
|
|
||
|
|
- **OIDC / SSO** — `authenticate` validates an OIDC ID token via
|
||
|
|
the IdP's JWKS; `Identity.handle` carries the verified subject
|
||
|
|
and group claims; `authorise` evaluates against group-to-
|
||
|
|
capability mappings configured at the regime.
|
||
|
|
- **ABAC / Policy engine** — `authorise` calls out to a policy
|
||
|
|
engine (Rego, Cedar, custom DSL) with the identity's attributes
|
||
|
|
and the resource as the policy input.
|
||
|
|
- **ReBAC (Zanzibar-style)** — `authorise` translates `(identity,
|
||
|
|
capability, resource)` into a relationship-tuple lookup against
|
||
|
|
a tuple store.
|
||
|
|
- **Hybrid** — multiple regimes composed: e.g. authenticate via
|
||
|
|
SSO, authorise via local policy.
|
||
|
|
|
||
|
|
None of these require gateway changes. The contract surface is
|
||
|
|
the same; the regime is what differs.
|
||
|
|
|
||
|
|
## References
|
||
|
|
|
||
|
|
- [Identity and Access Management Specification](iam.md) — overall
|
||
|
|
design and the gateway-side framing.
|
||
|
|
- [IAM Service Protocol Specification](iam-protocol.md) — the OSS
|
||
|
|
regime's wire-level protocol.
|
||
|
|
- [Capability Vocabulary Specification](capabilities.md) — the
|
||
|
|
capability strings the gateway uses as `authorise` input.
|