trustgraph/docs/tech-specs/capabilities.md

---
layout: default
title: "Capability Vocabulary Technical Specification"
parent: "Tech Specs"
---

# Capability Vocabulary Technical Specification

## Overview

Authorisation in TrustGraph is **capability-based**. Every gateway
endpoint maps to exactly one *capability*; a user's roles each grant
a set of capabilities; an authenticated request is permitted when
the required capability is a member of the union of the caller's
role capability sets.

This document defines the capability vocabulary — the closed list
of capability strings that the gateway recognises — and the
open-source edition's role bundles.

The capability mechanism is shared between the open-source edition
and potential third-party enterprise editions. The open-source
edition ships a fixed three-role bundle (`reader`, `writer`,
`admin`). Enterprise editions may define additional roles by
composing their own capability bundles from the same vocabulary; no
protocol, gateway, or backend-service change is required.

## Motivation

The original IAM spec used hierarchical "minimum role" checks
(`admin` implies `writer` implies `reader`). That shape is simple
but paints the role model into a corner: any enterprise need to
grant a subset of admin abilities (helpdesk that can reset
passwords but not edit flows; analyst who can query but not ingest)
requires a protocol-level change.

A capability vocabulary decouples "what a request needs" from
"what roles a user has" and makes the role table pure data. The
open-source bundles can stay coarse while the enterprise role
table expands without any code movement.

## Design

### Capability string format

Capability strings take the form `<subsystem>:<verb>`, or
`<subsystem>` alone for capabilities with no natural read/write
split. Strings are all lowercase, with kebab-case for multi-word
subsystems.
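
The format rule above can be checked mechanically. The following is a minimal sketch, assuming subsystems and verbs contain only lowercase ASCII letters (plus hyphens inside a subsystem); the regular expression and the `parse_capability` helper are illustrative and not part of the gateway.

```python
import re
from typing import Optional, Tuple

# Assumed grammar: kebab-case lowercase subsystem, optional ":<verb>" suffix.
_CAPABILITY_RE = re.compile(r"[a-z]+(?:-[a-z]+)*(?::[a-z]+)?")


def parse_capability(cap: str) -> Tuple[str, Optional[str]]:
    """Split a capability string into (subsystem, verb); verb is None if absent."""
    if not _CAPABILITY_RE.fullmatch(cap):
        raise ValueError(f"malformed capability string: {cap!r}")
    subsystem, _, verb = cap.partition(":")
    return subsystem, (verb or None)


print(parse_capability("graph:read"))  # subsystem with a verb
print(parse_capability("agent"))       # subsystem with no read/write split
```

A string such as `Graph:Read` is rejected with `ValueError` because it is not all lowercase.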

### Capability list

**Data plane**

| Capability | Covers |
|---|---|
| `agent` | agent (query-only; no write counterpart) |
| `graph:read` | graph-rag, graph-embeddings-query, triples-query, sparql, graph-embeddings-export, triples-export |
| `graph:write` | triples-import, graph-embeddings-import |
| `documents:read` | document-rag, document-embeddings-query, document-embeddings-export, entity-contexts-export, document-stream-export, library list / fetch |
| `documents:write` | document-embeddings-import, entity-contexts-import, text-load, document-load, library add / replace / delete |
| `rows:read` | rows-query, row-embeddings-query, nlp-query, structured-query, structured-diag |
| `rows:write` | rows-import |
| `llm` | text-completion, prompt (stateless invocation) |
| `embeddings` | Raw text-embedding service (stateless compute; typed-data embedding stores live under their data-subject capability) |
| `mcp` | mcp-tool |
| `collections:read` | List / describe collections |
| `collections:write` | Create / delete collections |
| `knowledge:read` | List / get knowledge cores |
| `knowledge:write` | Create / delete knowledge cores |

**Control plane**

| Capability | Covers |
|---|---|
| `config:read` | Read workspace config |
| `config:write` | Write workspace config |
| `flows:read` | List / describe flows, blueprints, flow classes |
| `flows:write` | Start / stop / update flows |
| `users:read` | List / get users within the workspace |
| `users:write` | Create / update / disable users within the workspace |
| `users:admin` | Assign / remove roles on users within the workspace |
| `keys:self` | Create / revoke / list **own** API keys |
| `keys:admin` | Create / revoke / list **any user's** API keys within the workspace |
| `workspaces:admin` | Create / delete / disable workspaces (system-level) |
| `iam:admin` | JWT signing-key rotation, IAM-level operations |
| `metrics:read` | Prometheus metrics proxy |

### Open-source role bundles

The open-source edition ships three roles:

| Role | Capabilities |
|---|---|
| `reader` | `agent`, `graph:read`, `documents:read`, `rows:read`, `llm`, `embeddings`, `mcp`, `collections:read`, `knowledge:read`, `flows:read`, `config:read`, `keys:self` |
| `writer` | everything in `reader` **+** `graph:write`, `documents:write`, `rows:write`, `collections:write`, `knowledge:write` |
| `admin` | everything in `writer` **+** `config:write`, `flows:write`, `users:read`, `users:write`, `users:admin`, `keys:admin`, `workspaces:admin`, `iam:admin`, `metrics:read` |

Open-source bundles are deliberately coarse. `workspaces:admin` and
`iam:admin` live inside `admin` without a separate role; a single
`admin` user holds the keys to the whole deployment.
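
Because the role table is pure data, the three bundles can be written down directly from the table above. This is a sketch only; the constant names (`READER`, `WRITER`, `ADMIN`, `ROLE_BUNDLES`) are illustrative and are not taken from the gateway source.

```python
# Bundles transcribed from the role table; each role extends the previous one.
READER = frozenset({
    "agent", "graph:read", "documents:read", "rows:read", "llm",
    "embeddings", "mcp", "collections:read", "knowledge:read",
    "flows:read", "config:read", "keys:self",
})

WRITER = READER | frozenset({
    "graph:write", "documents:write", "rows:write",
    "collections:write", "knowledge:write",
})

ADMIN = WRITER | frozenset({
    "config:write", "flows:write", "users:read", "users:write",
    "users:admin", "keys:admin", "workspaces:admin", "iam:admin",
    "metrics:read",
})

# Hypothetical lookup table keyed by role name.
ROLE_BUNDLES = {"reader": READER, "writer": WRITER, "admin": ADMIN}

for name, bundle in ROLE_BUNDLES.items():
    print(name, len(bundle))
```

Counting the entries shows that `admin` covers all 26 strings in the capability list, which is what "holds the keys to the whole deployment" means in practice.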

### The `agent` capability and composition

The `agent` capability is granted independently of the capabilities
it composes under the hood (`llm`, `graph`, `documents`, `rows`,
`mcp`, etc.). A user holding `agent` but not `llm` can still cause
LLM invocations because the agent implementation chooses which
services to invoke on the caller's behalf.

This is deliberate. A common policy is "allow controlled access
via the agent, deny raw model calls" — granting `agent` without
granting `llm` expresses exactly that. An administrator granting
`agent` should treat it as a grant of everything the agent
composes at deployment time.

### Authorisation evaluation

For a request bearing a resolved set of roles
`R = {r1, r2, ...}` against an endpoint that requires capability
`c`:

```
allow if c IN union(bundle(r) for r in R)
```

No hierarchy, no precedence, no role-order sensitivity. A user
with a single role is the common case; a user with multiple roles
gets the union of their bundles.
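
The rule can be sketched in a few lines of Python. The two roles in `BUNDLES` are hypothetical and exist only to illustrate the union and the "`agent` without `llm`" policy described earlier; `is_allowed` is an illustrative name, not the gateway's function.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical role table used only for this example.
BUNDLES: Dict[str, FrozenSet[str]] = {
    "agent-only": frozenset({"agent"}),
    "graph-reader": frozenset({"graph:read"}),
}


def is_allowed(required: str, roles: Iterable[str],
               bundles: Dict[str, FrozenSet[str]] = BUNDLES) -> bool:
    """Allow when the required capability is in the union of the role bundles."""
    granted = frozenset().union(*(bundles.get(r, frozenset()) for r in roles))
    return required in granted


print(is_allowed("agent", ["agent-only"]))                       # granted by the role
print(is_allowed("llm", ["agent-only"]))                         # raw model calls denied
print(is_allowed("graph:read", ["agent-only", "graph-reader"]))  # union of two bundles
```

Since the check is a set-membership test on a union, the order in which roles are listed cannot change the result.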

### Enforcement boundary

Capability checks — and authentication — are applied **only at the
API gateway**, on requests arriving from external callers.
Operations originating inside the platform (backend service to
backend service, agent to LLM, flow-svc to config-svc, bootstrap
initialisers, scheduled reconcilers, autonomous flow steps) are
**not capability-checked**. Backend services trust the workspace
set by the gateway on inbound pub/sub messages and trust
internally-originated messages without further authorisation.

This policy has four consequences that are part of the spec, not
accidents of implementation:

1. **The gateway is the single trust boundary for user
   authorisation.** Every backend service is a downstream consumer
   of an already-authorised workspace scope.
2. **Pub/sub carries workspace, not user identity.** Messages on
   the bus do not carry credentials or the identity that originated
   a request; they carry the resolved workspace only. This keeps
   the bus protocol free of secrets and aligns with the workspace
   resolver's role as the gateway-side narrowing step.
3. **Composition is transitive.** Granting a capability that the
   platform composes internally (for example, `agent`) transitively
   grants everything that capability composes under the hood,
   because the downstream calls are internal-origin and are not
   re-checked. The composite nature of `agent` described above is
   a consequence of this policy, not a special case.
4. **Internal-origin operations have no user.** Bootstrap,
   reconcilers, and other platform-initiated work act with
   system-level authority. The workspace field on such messages
   identifies which workspace's data is being touched, not who
   asked.

**Trust model.** Whoever has pub/sub access is implicitly trusted
to act as any workspace. Defense-in-depth within the backend is
not part of this design; the security perimeter is the gateway
and the bus itself (TLS / network isolation between the bus and
any untrusted network).

### Unknown capabilities and unknown roles

- An endpoint declaring an unknown capability is a server-side bug
  and fails closed (403, logged).
- A role name carried by a user but not defined in the role table
  is ignored for authorisation purposes and logged as a warning.
  Behaviour is deterministic: unknown roles contribute zero
  capabilities.
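
Both fail-closed rules can be sketched as follows. The vocabulary and role table here are deliberately abbreviated, and the function names and logger name are assumptions made for illustration; a `False` result corresponds to the gateway answering 403.

```python
import logging
from typing import Iterable, Set

log = logging.getLogger("capabilities-sketch")  # hypothetical logger name

# Abbreviated vocabulary and role table, for illustration only.
VOCABULARY = frozenset({"graph:read", "graph:write"})
ROLE_TABLE = {"reader": frozenset({"graph:read"})}


def effective_capabilities(roles: Iterable[str]) -> Set[str]:
    """Union of known bundles; unknown roles are logged and contribute nothing."""
    caps: Set[str] = set()
    for role in roles:
        bundle = ROLE_TABLE.get(role)
        if bundle is None:
            log.warning("ignoring unknown role %r", role)
            continue
        caps |= bundle
    return caps


def permitted(required: str, roles: Iterable[str]) -> bool:
    """Return False (i.e. 403) for unknown capabilities or missing grants."""
    if required not in VOCABULARY:
        # An endpoint declared a capability outside the vocabulary: server bug.
        log.error("endpoint requires unknown capability %r", required)
        return False
    return required in effective_capabilities(roles)
```

With this table, a caller holding `reader` plus an undefined role is still granted `graph:read`, while a caller holding only the undefined role is denied.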

### Capability scope

Every capability is **implicitly scoped to the caller's resolved
workspace**. A `users:write` capability does not permit a user
in workspace `acme` to create users in workspace `beta` — the
workspace-resolver has already narrowed the request to one
workspace before the capability check runs. See the IAM
specification for the workspace-resolver contract.

The three exceptions are the system-level capabilities
`workspaces:admin` and `iam:admin`, which operate across
workspaces by definition, and `metrics:read`, which returns
process-level series not scoped to any workspace.
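
The scoping rule and its three exceptions can be expressed as a small lookup. This is a sketch under the assumption that the resolved workspace is available as a plain string; `SYSTEM_LEVEL` and `effective_scope` are illustrative names.

```python
from typing import Optional

# The three capabilities named above as not workspace-scoped.
SYSTEM_LEVEL = frozenset({"workspaces:admin", "iam:admin", "metrics:read"})


def effective_scope(capability: str, resolved_workspace: str) -> Optional[str]:
    """Workspace a capability applies to, or None for system-level capabilities."""
    if capability in SYSTEM_LEVEL:
        return None
    return resolved_workspace


print(effective_scope("users:write", "acme"))   # scoped to the caller's workspace
print(effective_scope("metrics:read", "acme"))  # not scoped to any workspace
```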

## Enterprise extensibility

Enterprise editions extend the role table additively:

```
data-analyst:   {graph:read, documents:read, rows:read, collections:read, knowledge:read}
helpdesk:       {users:read, users:write, users:admin, keys:admin}
data-engineer:  writer + {flows:read, config:read}
workspace-owner: admin − {workspaces:admin, iam:admin}
```

None of this requires a protocol change: the wire-protocol `roles`
field on user records is already a set, the gateway already
authorises against capabilities rather than role names, and the
capability vocabulary is closed. Enterprises may introduce roles
whose bundles compose the same capabilities differently.

When an enterprise introduces a new capability (e.g. for a feature
that does not exist in open source), the capability string is
added to the vocabulary and recognised by the gateway build that
ships that feature.
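
Because the vocabulary is closed, a custom role table can be validated before it is loaded. The sketch below assumes a role table is a mapping from role name to a set of capability strings; it uses only the control-plane part of the vocabulary for brevity, and the `auditor` role and `audit:read` capability are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Control-plane capabilities from the list above (data plane omitted for brevity).
CONTROL_PLANE = frozenset({
    "config:read", "config:write", "flows:read", "flows:write",
    "users:read", "users:write", "users:admin", "keys:self",
    "keys:admin", "workspaces:admin", "iam:admin", "metrics:read",
})


def unknown_capabilities(role_table: Dict[str, FrozenSet[str]],
                         vocabulary: FrozenSet[str]) -> Dict[str, List[str]]:
    """Map each role to the capabilities it names that the vocabulary lacks."""
    return {
        role: sorted(bundle - vocabulary)
        for role, bundle in role_table.items()
        if bundle - vocabulary
    }


table = {
    "helpdesk": frozenset({"users:read", "users:write", "users:admin", "keys:admin"}),
    "auditor": frozenset({"users:read", "audit:read"}),  # hypothetical role
}
print(unknown_capabilities(table, CONTROL_PLANE))
```

Here `helpdesk` validates cleanly, while `auditor` is reported until `audit:read` is added to the vocabulary by the build that ships that feature.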

## References

- [Identity and Access Management Specification](iam.md)
- [Architecture Principles](architecture-principles.md)

feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)

Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model.  The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.

IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
  passwords and JWT signing keys in Cassandra.  Reached over the
  standard pub/sub request/response pattern; gateway is the only
  caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
  rotate-signing-key, create/list/get/update/disable/delete/enable-user,
  change-password, reset-password, create/list/get/update/disable-
  workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA).  Key rotation writes a new kid and
  retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed.  Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
  required startup argument with no permissive default.  Masked
  "auth failure" errors hide whether a refused bootstrap request was
  due to mode, state, or authorisation.
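
The password and API-key parameters above map onto Python's standard library as sketched below. Only the algorithm, iteration count, key size, and hash come from the notes; the 16-byte salt length, the hex encoding of keys, and all function names are assumptions made for illustration.

```python
import hashlib
import hmac
import os
import secrets
from typing import Optional, Tuple

PBKDF2_ITERATIONS = 600_000  # from the notes above


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a PBKDF2-HMAC-SHA-256 digest; salt length is an assumption."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def new_api_key() -> Tuple[str, str]:
    """Return (plaintext, stored_hash); only the hash would be persisted."""
    plaintext = secrets.token_hex(16)  # 16 random bytes = 128 bits
    stored = hashlib.sha256(plaintext.encode("ascii")).hexdigest()
    return plaintext, stored
```

The plaintext key is handed to the caller once; later lookups hash the presented key and compare against the stored value.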

Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator.  Distinguishes JWTs
  (three-segment dotted) from API keys by shape; verifies JWTs
  locally using the cached IAM public key; resolves API keys via
  IAM with a short-TTL hash-keyed cache.  Every failure path
  surfaces the same 401 body ("auth failure") so callers cannot
  enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
  traffic does not begin flowing until auth has started.
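
Distinguishing credentials by shape can be sketched as below. This assumes API keys never contain two dots with non-empty segments; `credential_kind` is an illustrative name, and the real IamAuth logic may differ.

```python
def credential_kind(token: str) -> str:
    """Classify a bearer credential as 'jwt' or 'api-key' by shape alone."""
    parts = token.split(".")
    # A compact JWT has exactly three non-empty dot-separated segments.
    if len(parts) == 3 and all(parts):
        return "jwt"
    return "api-key"


print(credential_kind("aaa.bbb.ccc"))
print(credential_kind("0123456789abcdef0123456789abcdef"))
```

Shape only selects the verification path; a token classified as a JWT still has to pass signature verification before it is trusted.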

Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
  OSS ships reader / writer / admin; the first two are workspace-
  assigned, admin is cross-workspace ("*").  No "cross-workspace"
  pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
  authorisation test: some role must grant the capability *and* be
  active in the target workspace.
* enforce_workspace validates a request-body workspace against the
  caller's role scopes and injects the resolved value.  Cross-
  workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
  permissive default.  Construction fails fast if omitted.  Enterprise
  editions can replace the role table without changing the wire
  protocol.
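
The two-dimensional check can be sketched as follows. The `Identity` and `RoleGrant` shapes, the abbreviated `BUNDLES` table, and the treatment of `target_workspace=None` as "no workspace constraint" are all assumptions of this sketch, not the contents of capabilities.py.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RoleGrant:
    role: str
    workspace: str  # "*" means the role is active in every workspace


@dataclass(frozen=True)
class Identity:
    user: str
    grants: Tuple[RoleGrant, ...]


# Abbreviated bundles, for illustration only.
BUNDLES = {
    "reader": frozenset({"graph:read"}),
    "writer": frozenset({"graph:read", "graph:write"}),
    "admin": frozenset({"graph:read", "graph:write", "users:write"}),
}


def check(identity: Identity, capability: str,
          target_workspace: Optional[str] = None) -> bool:
    """True if some role grants the capability and is active in the target."""
    for grant in identity.grants:
        if capability not in BUNDLES.get(grant.role, frozenset()):
            continue
        if target_workspace is None or grant.workspace in ("*", target_workspace):
            return True
    return False
```

A `writer` assigned to workspace `acme` passes `check(..., "graph:write", "acme")` but not the same check against `beta`, while an `admin` grant with scope `"*"` passes for any workspace.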

WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
  runs on the first WebSocket frame ({"type":"auth","token":"..."})
  with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
  The socket stays open on failure so the client can re-authenticate
  — browsers treat a handshake-time 401 as terminal, breaking
  reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
  enforces the caller's workspace (envelope + inner payload) using
  the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
  handshake (URL-scoped short-lived transfers; no re-auth need).
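
The first-frame protocol can be modelled as a small state transition over already-parsed frames. In this sketch the `verify` callback, the reply sent for non-auth frames received before authentication, and the choice to drop the session to unauthenticated after a failed re-auth are assumptions; only the `auth`, `auth-ok`, and `auth-failed` frame shapes come from the notes above.

```python
from typing import Callable, Optional, Tuple


def on_frame(frame: dict, workspace: Optional[str],
             verify: Callable[[str], Optional[str]]
             ) -> Tuple[Optional[dict], Optional[str]]:
    """Process one frame; return (reply, workspace_after).

    A reply of None means the frame may be passed on to dispatch.
    """
    if frame.get("type") == "auth":
        resolved = verify(str(frame.get("token", "")))
        if resolved is None:
            # Socket stays open so the client can try again.
            return {"type": "auth-failed"}, None
        return {"type": "auth-ok", "workspace": resolved}, resolved
    if workspace is None:
        # Non-auth frame before authentication: rejected (reply shape assumed).
        return {"type": "auth-failed"}, None
    return None, workspace
```

The caller keeps `workspace_after` as the session state, so a later `auth` frame on the same socket re-authenticates without reconnecting.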

Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
  op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
  the IAM API (per-op REST endpoints to follow in a later change).

Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
  Authenticator.permitted contract.  The gateway cannot run without
  IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
  downgrade path.

CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces.  Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.

Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
  resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
  operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
  role bundles, agent-as-composition note, enforcement-boundary
  policy, enterprise extensibility.

Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
  Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
  role x workspace combinations, enforce_workspace paths,
  unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
  explicitly (no permissive defaults relied upon).  New tests pin
  the fail-closed invariants: DispatcherManager / Mux refuse
  auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.

2026-04-24 17:29:10 +01:00