mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-05-01 11:26:22 +02:00
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
This commit is contained in:
parent
ae9936c9cc
commit
67b2fc448f
61 changed files with 6474 additions and 792 deletions
218
docs/tech-specs/capabilities.md
Normal file
@@ -0,0 +1,218 @@
---
layout: default
title: "Capability Vocabulary Technical Specification"
parent: "Tech Specs"
---

# Capability Vocabulary Technical Specification

## Overview

Authorisation in TrustGraph is **capability-based**. Every gateway
endpoint maps to exactly one *capability*; a user's roles each grant
a set of capabilities; an authenticated request is permitted when
the required capability is a member of the union of the caller's
role capability sets.

This document defines the capability vocabulary (the closed list
of capability strings that the gateway recognises) and the
open-source edition's role bundles.

The capability mechanism is shared between the open-source edition
and potential third-party enterprise editions. The open-source
edition ships a fixed three-role bundle (`reader`, `writer`,
`admin`). Enterprise editions may define additional roles by
composing their own capability bundles from the same vocabulary; no
protocol, gateway, or backend-service change is required.

## Motivation

The original IAM spec used hierarchical "minimum role" checks
(`admin` implies `writer` implies `reader`). That shape is simple
but paints the role model into a corner: any enterprise need to
grant a subset of admin abilities (helpdesk that can reset
passwords but not edit flows; analyst who can query but not ingest)
requires a protocol-level change.

A capability vocabulary decouples "what a request needs" from
"what roles a user has" and makes the role table pure data. The
open-source bundles can stay coarse while the enterprise role
table expands without any code movement.

## Design

### Capability string format

`<subsystem>:<verb>` or `<subsystem>` (for capabilities with no
natural read/write split). All lowercase, kebab-case for
multi-word subsystems.
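
As a minimal sketch of the format rule above, the following validator accepts `<subsystem>` or `<subsystem>:<verb>` in lowercase kebab-case. The regular expression and the name `is_valid_capability` are illustrative assumptions, not taken from the gateway code.

```python
import re

# Assumed pattern: lowercase words joined by hyphens, with an
# optional ":<verb>" suffix of the same shape.
_CAPABILITY_RE = re.compile(r"[a-z]+(?:-[a-z]+)*(?::[a-z]+(?:-[a-z]+)*)?")


def is_valid_capability(name: str) -> bool:
    """Return True if name has the <subsystem>[:<verb>] shape."""
    return _CAPABILITY_RE.fullmatch(name) is not None
```

A shape check like this only rejects malformed strings; whether a well-formed string is actually in the vocabulary is a separate lookup.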

### Capability list

**Data plane**

| Capability | Covers |
|---|---|
| `agent` | agent (query-only; no write counterpart) |
| `graph:read` | graph-rag, graph-embeddings-query, triples-query, sparql, graph-embeddings-export, triples-export |
| `graph:write` | triples-import, graph-embeddings-import |
| `documents:read` | document-rag, document-embeddings-query, document-embeddings-export, entity-contexts-export, document-stream-export, library list / fetch |
| `documents:write` | document-embeddings-import, entity-contexts-import, text-load, document-load, library add / replace / delete |
| `rows:read` | rows-query, row-embeddings-query, nlp-query, structured-query, structured-diag |
| `rows:write` | rows-import |
| `llm` | text-completion, prompt (stateless invocation) |
| `embeddings` | Raw text-embedding service (stateless compute; typed-data embedding stores live under their data-subject capability) |
| `mcp` | mcp-tool |
| `collections:read` | List / describe collections |
| `collections:write` | Create / delete collections |
| `knowledge:read` | List / get knowledge cores |
| `knowledge:write` | Create / delete knowledge cores |

**Control plane**

| Capability | Covers |
|---|---|
| `config:read` | Read workspace config |
| `config:write` | Write workspace config |
| `flows:read` | List / describe flows, blueprints, flow classes |
| `flows:write` | Start / stop / update flows |
| `users:read` | List / get users within the workspace |
| `users:write` | Create / update / disable users within the workspace |
| `users:admin` | Assign / remove roles on users within the workspace |
| `keys:self` | Create / revoke / list **own** API keys |
| `keys:admin` | Create / revoke / list **any user's** API keys within the workspace |
| `workspaces:admin` | Create / delete / disable workspaces (system-level) |
| `iam:admin` | JWT signing-key rotation, IAM-level operations |
| `metrics:read` | Prometheus metrics proxy |

### Open-source role bundles

The open-source edition ships three roles:

| Role | Capabilities |
|---|---|
| `reader` | `agent`, `graph:read`, `documents:read`, `rows:read`, `llm`, `embeddings`, `mcp`, `collections:read`, `knowledge:read`, `flows:read`, `config:read`, `keys:self` |
| `writer` | everything in `reader` **+** `graph:write`, `documents:write`, `rows:write`, `collections:write`, `knowledge:write` |
| `admin` | everything in `writer` **+** `config:write`, `flows:write`, `users:read`, `users:write`, `users:admin`, `keys:admin`, `workspaces:admin`, `iam:admin`, `metrics:read` |

Open-source bundles are deliberately coarse. `workspaces:admin` and
`iam:admin` live inside `admin` without a separate role; a single
`admin` user holds the keys to the whole deployment.

### The `agent` capability and composition

The `agent` capability is granted independently of the capabilities
it composes under the hood (`llm`, `graph`, `documents`, `rows`,
`mcp`, etc.). A user holding `agent` but not `llm` can still cause
LLM invocations because the agent implementation chooses which
services to invoke on the caller's behalf.

This is deliberate. A common policy is "allow controlled access
via the agent, deny raw model calls"; granting `agent` without
granting `llm` expresses exactly that. An administrator granting
`agent` should treat it as a grant of everything the agent
composes at deployment time.

### Authorisation evaluation

For a request bearing a resolved set of roles
`R = {r1, r2, ...}` against an endpoint that requires capability
`c`:

```
allow if c IN union(bundle(r) for r in R)
```

No hierarchy, no precedence, no role-order sensitivity. A user
with a single role is the common case; a user with multiple roles
gets the union of their bundles.
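
The union rule can be sketched in a few lines of Python. This is an illustration only: the role table is deliberately abbreviated, and `is_allowed`, `ROLE_TABLE`, and the logger name are hypothetical names rather than the gateway's actual API. Role names missing from the table are treated as granting nothing.

```python
import logging

logger = logging.getLogger("capabilities-sketch")

# Abbreviated role table for illustration; the real bundles are larger.
ROLE_TABLE = {
    "reader": frozenset({"agent", "graph:read", "keys:self"}),
    "writer": frozenset({"agent", "graph:read", "keys:self", "graph:write"}),
}


def is_allowed(roles, capability, role_table=ROLE_TABLE):
    """allow if capability IN union(bundle(r) for r in roles)."""
    granted = set()
    for role in roles:
        bundle = role_table.get(role)
        if bundle is None:
            # A role name missing from the table grants nothing.
            logger.warning("ignoring unknown role %r", role)
            continue
        granted |= bundle
    return capability in granted
```

Because the result depends only on set membership, the order in which roles are listed cannot change the outcome.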

### Enforcement boundary

Capability checks, and authentication, are applied **only at the
API gateway**, on requests arriving from external callers.
Operations originating inside the platform (backend service to
backend service, agent to LLM, flow-svc to config-svc, bootstrap
initialisers, scheduled reconcilers, autonomous flow steps) are
**not capability-checked**. Backend services trust the workspace
set by the gateway on inbound pub/sub messages and trust
internally-originated messages without further authorisation.

This policy has four consequences that are part of the spec, not
accidents of implementation:

1. **The gateway is the single trust boundary for user
   authorisation.** Every backend service is a downstream consumer
   of an already-authorised workspace scope.
2. **Pub/sub carries workspace, not user identity.** Messages on
   the bus do not carry credentials or the identity that originated
   a request; they carry the resolved workspace only. This keeps
   the bus protocol free of secrets and aligns with the workspace
   resolver's role as the gateway-side narrowing step.
3. **Composition is transitive.** Granting a capability that the
   platform composes internally (for example, `agent`) transitively
   grants everything that capability composes under the hood,
   because the downstream calls are internal-origin and are not
   re-checked. The composite nature of `agent` described above is
   a consequence of this policy, not a special case.
4. **Internal-origin operations have no user.** Bootstrap,
   reconcilers, and other platform-initiated work act with
   system-level authority. The workspace field on such messages
   identifies which workspace's data is being touched, not who
   asked.

**Trust model.** Whoever has pub/sub access is implicitly trusted
to act as any workspace. Defense-in-depth within the backend is
not part of this design; the security perimeter is the gateway
and the bus itself (TLS / network isolation between the bus and
any untrusted network).

### Unknown capabilities and unknown roles

- An endpoint declaring an unknown capability is a server-side bug
  and fails closed (403, logged).
- A role name carried by a user but not defined in the role table
  is ignored for authorisation purposes and logged as a warning.
  Behaviour is deterministic: unknown roles contribute zero
  capabilities.

### Capability scope

Every capability is **implicitly scoped to the caller's resolved
workspace**. A `users:write` capability does not permit a user
in workspace `acme` to create users in workspace `beta`; the
workspace-resolver has already narrowed the request to one
workspace before the capability check runs. See the IAM
specification for the workspace-resolver contract.

The three exceptions are the system-level capabilities
`workspaces:admin` and `iam:admin`, which operate across
workspaces by definition, and `metrics:read`, which returns
process-level series not scoped to any workspace.
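
A rough sketch of the scoping rule, under the simplifying assumption that a caller has exactly one resolved workspace. The names `UNSCOPED` and `within_scope` are hypothetical, and the sketch deliberately ignores the workspace-resolver contract defined in the IAM specification.

```python
# The three capabilities the text names as not workspace-scoped.
UNSCOPED = frozenset({"workspaces:admin", "iam:admin", "metrics:read"})


def within_scope(capability, resolved_workspace, target_workspace):
    """Return True if target_workspace is acceptable for this capability."""
    if capability in UNSCOPED:
        # System-level or process-level: no workspace narrowing applies.
        return True
    return target_workspace == resolved_workspace
```

A scope check of this kind complements the capability check; it does not replace it.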

## Enterprise extensibility

Enterprise editions extend the role table additively:

```
data-analyst: {query, library:read, collections:read, knowledge:read}
helpdesk: {users:read, users:write, users:admin, keys:admin}
data-engineer: writer + {flows:read, config:read}
workspace-owner: admin − {workspaces:admin, iam:admin}
```

None of this requires a protocol change: the wire-protocol `roles`
field on user records is already a set, the gateway's
capability-check is already capability-based, and the capability
vocabulary is closed. Enterprises may introduce roles whose bundles
compose the same capabilities differently.
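
Treating the role table as pure data, the additive composition above maps directly onto set operations. The sketch below transcribes the open-source bundles from this document and adds three of the example enterprise roles; the variable names are assumptions, and the `data-analyst` line is left out because its capability names do not appear in the capability list above.

```python
# Open-source bundles, transcribed from the role table in this document.
READER = frozenset({
    "agent", "graph:read", "documents:read", "rows:read", "llm",
    "embeddings", "mcp", "collections:read", "knowledge:read",
    "flows:read", "config:read", "keys:self",
})
WRITER = READER | {
    "graph:write", "documents:write", "rows:write",
    "collections:write", "knowledge:write",
}
ADMIN = WRITER | {
    "config:write", "flows:write", "users:read", "users:write",
    "users:admin", "keys:admin", "workspaces:admin", "iam:admin",
    "metrics:read",
}

# Hypothetical enterprise roles composed from the same vocabulary.
ROLE_TABLE = {
    "reader": READER,
    "writer": WRITER,
    "admin": ADMIN,
    "helpdesk": frozenset(
        {"users:read", "users:write", "users:admin", "keys:admin"}
    ),
    "data-engineer": WRITER | {"flows:read", "config:read"},
    "workspace-owner": ADMIN - {"workspaces:admin", "iam:admin"},
}
```

With the open-source bundles as listed, `writer` already contains `flows:read` and `config:read` (inherited from `reader`), so `data-engineer` as written evaluates to the same set as `writer`.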

When an enterprise introduces a new capability (e.g. for a feature
that does not exist in open source), the capability string is
added to the vocabulary and recognised by the gateway build that
ships that feature.

## References

- [Identity and Access Management Specification](iam.md)
- [Architecture Principles](architecture-principles.md)
329
docs/tech-specs/iam-protocol.md
Normal file
@@ -0,0 +1,329 @@
---
layout: default
title: "IAM Service Protocol Technical Specification"
parent: "Tech Specs"
---

# IAM Service Protocol Technical Specification

## Overview

The IAM service is a backend processor, reached over the standard
request/response pub/sub pattern. It is the authority for users,
workspaces, API keys, and login credentials. The API gateway
delegates to it for authentication resolution and for all user /
workspace / key management.

This document defines the wire protocol: the `IamRequest` and
`IamResponse` dataclasses, the operation set, the per-operation
input and output fields, the error taxonomy, and the initial HTTP
forwarding endpoint used while IAM is being integrated into the
gateway.

Architectural context (roles, capabilities, workspace scoping,
enforcement boundary) lives in [`iam.md`](iam.md) and
[`capabilities.md`](capabilities.md).

## Transport

- **Request topic:** `request:tg/request/iam-request`
- **Response topic:** `response:tg/response/iam-response`
- **Pattern:** request/response, correlated by the `id` message
  property, the same pattern used by `config-svc` and `flow-svc`.
- **Caller:** the API gateway only. Under the enforcement-boundary
  policy (see capabilities spec), the IAM service trusts the bus
  and performs no per-request authentication or capability check
  against the caller. The gateway has already evaluated capability
  membership and workspace scoping before sending the request.

## Dataclasses

### `IamRequest`

```python
from __future__ import annotations  # value types are defined in a later block

from dataclasses import dataclass


@dataclass
class IamRequest:
    # One of the operation strings below.
    operation: str = ""

    # Scope of this request. Required on every workspace-scoped
    # operation. Omitted (or empty) for system-level ops
    # (workspace CRUD, signing-key ops, bootstrap, resolve-api-key,
    # login).
    workspace: str = ""

    # Acting user id, for audit. Set by the gateway to the
    # authenticated caller's id on user-initiated operations.
    # Empty for internal-origin (bootstrap, reconcilers) and for
    # resolve-api-key / login (no actor yet).
    actor: str = ""

    # --- identity selectors ---
    user_id: str = ""
    username: str = ""          # login; unique within a workspace
    key_id: str = ""            # revoke-api-key, list-api-keys (own)
    api_key: str = ""           # resolve-api-key (plaintext)

    # --- credentials ---
    password: str = ""          # login, change-password (current)
    new_password: str = ""      # change-password

    # --- user fields ---
    user: UserInput | None = None   # create-user, update-user

    # --- workspace fields ---
    workspace_record: WorkspaceInput | None = None  # create-workspace, update-workspace

    # --- api key fields ---
    key: ApiKeyInput | None = None  # create-api-key
```

### `IamResponse`

```python
from __future__ import annotations  # record types are defined in a later block

from dataclasses import dataclass, field


@dataclass
class IamResponse:
    # Populated on success of operations that return them.
    user: UserRecord | None = None  # create-user, get-user, update-user
    users: list[UserRecord] = field(default_factory=list)  # list-users
    workspace: WorkspaceRecord | None = None  # create-workspace, get-workspace, update-workspace
    workspaces: list[WorkspaceRecord] = field(default_factory=list)  # list-workspaces

    # create-api-key returns the plaintext once. Never populated
    # on any other operation.
    api_key_plaintext: str = ""
    api_key: ApiKeyRecord | None = None  # create-api-key
    api_keys: list[ApiKeyRecord] = field(default_factory=list)  # list-api-keys

    # login, rotate-signing-key
    jwt: str = ""
    jwt_expires: str = ""       # ISO-8601 UTC

    # get-signing-key-public
    signing_key_public: str = ""  # PEM

    # resolve-api-key returns who this key authenticates as.
    resolved_user_id: str = ""
    resolved_workspace: str = ""
    resolved_roles: list[str] = field(default_factory=list)

    # reset-password
    temporary_password: str = ""  # returned once to the operator

    # bootstrap: on first run, the initial admin's one-time API key
    # is returned for the operator to capture.
    bootstrap_admin_user_id: str = ""
    bootstrap_admin_api_key: str = ""

    # Present on any failed operation.
    error: Error | None = None
```

### Value types

```python
from dataclasses import dataclass, field


@dataclass
class UserInput:
    username: str = ""
    name: str = ""
    email: str = ""
    password: str = ""          # only on create-user; never on update-user
    roles: list[str] = field(default_factory=list)
    enabled: bool = True
    must_change_password: bool = False


@dataclass
class UserRecord:
    id: str = ""
    workspace: str = ""
    username: str = ""
    name: str = ""
    email: str = ""
    roles: list[str] = field(default_factory=list)
    enabled: bool = True
    must_change_password: bool = False
    created: str = ""           # ISO-8601 UTC
    # Password hash is never included in any response.


@dataclass
class WorkspaceInput:
    id: str = ""
    name: str = ""
    enabled: bool = True


@dataclass
class WorkspaceRecord:
    id: str = ""
    name: str = ""
    enabled: bool = True
    created: str = ""           # ISO-8601 UTC


@dataclass
class ApiKeyInput:
    user_id: str = ""
    name: str = ""              # operator-facing label, e.g. "laptop"
    expires: str = ""           # optional ISO-8601 UTC; empty = no expiry


@dataclass
class ApiKeyRecord:
    id: str = ""
    user_id: str = ""
    name: str = ""
    prefix: str = ""            # first 4 chars of plaintext, for identification in lists
    expires: str = ""           # empty = no expiry
    created: str = ""
    last_used: str = ""         # empty if never used
    # key_hash is never included in any response.
```

## Operations

| Operation | Request fields | Response fields | Notes |
|---|---|---|---|
| `login` | `username`, `password`, `workspace` (optional) | `jwt`, `jwt_expires` | If `workspace` is omitted, IAM resolves to the user's assigned workspace. |
| `resolve-api-key` | `api_key` (plaintext) | `resolved_user_id`, `resolved_workspace`, `resolved_roles` | Gateway-internal. Service returns `auth-failed` for unknown / expired / revoked keys. |
| `change-password` | `user_id`, `password` (current), `new_password` | (none) | Self-service. IAM validates `password` against stored hash. |
| `reset-password` | `user_id` | `temporary_password` | Admin-initiated. IAM generates a random password, sets `must_change_password=true` on the user, returns the plaintext once. |
| `create-user` | `workspace`, `user` | `user` | Admin-only. `user.password` is hashed and stored; `user.roles` must be a subset of known roles. |
| `list-users` | `workspace` | `users` | |
| `get-user` | `workspace`, `user_id` | `user` | |
| `update-user` | `workspace`, `user_id`, `user` | `user` | `password` field on `user` is rejected; use `change-password` / `reset-password`. |
| `disable-user` | `workspace`, `user_id` | (none) | Soft-delete; sets `enabled=false`. Revokes all the user's API keys. |
| `create-workspace` | `workspace_record` | `workspace` | System-level. |
| `list-workspaces` | (none) | `workspaces` | System-level. |
| `get-workspace` | `workspace_record` (id only) | `workspace` | System-level. |
| `update-workspace` | `workspace_record` | `workspace` | System-level. |
| `disable-workspace` | `workspace_record` (id only) | (none) | System-level. Sets `enabled=false`; revokes all workspace API keys; disables all users in the workspace. |
| `create-api-key` | `workspace`, `key` | `api_key_plaintext`, `api_key` | Plaintext returned **once**; only hash stored. `key.name` required. |
| `list-api-keys` | `workspace`, `user_id` | `api_keys` | |
| `revoke-api-key` | `workspace`, `key_id` | (none) | Deletes the key record. |
| `get-signing-key-public` | (none) | `signing_key_public` | Gateway fetches this at startup. |
| `rotate-signing-key` | (none) | (none) | System-level. Introduces a new signing key; old key continues to validate JWTs for a grace period (implementation-defined, minimum 1h). |
| `bootstrap` | (none) | `bootstrap_admin_user_id`, `bootstrap_admin_api_key` | If IAM tables are empty, creates the initial `default` workspace, an `admin` user, an initial API key, and an initial signing key; returns them once. No-op on subsequent calls (returns empty fields). |

## Error taxonomy

All errors are carried in the `IamResponse.error` field. `error.type`
is one of the values below; `error.message` is a human-readable
string that is **not** surfaced verbatim to external callers (the
gateway maps to `auth failure` / `access denied` per the IAM error
policy).

| `type` | When |
|---|---|
| `invalid-argument` | Malformed request (missing required field, unknown operation, invalid format). |
| `not-found` | Named resource does not exist (`user_id`, `key_id`, workspace). |
| `duplicate` | Create operation collides with an existing resource (username, workspace id, key name). |
| `auth-failed` | `login` with wrong credentials; `resolve-api-key` with unknown / expired / revoked key; `change-password` with wrong current password. Single bucket to deny oracle attacks. |
| `weak-password` | Password does not meet policy (length, complexity; policy defined at service level). |
| `disabled` | Target user or workspace has `enabled=false`. |
| `operation-not-permitted` | Non-admin attempting system-level operation, or workspace-scoped operation attempting to affect another workspace. |
| `internal-error` | Unexpected IAM-side failure. Log and surface as 500 at the gateway. |

The gateway is responsible for translating `auth-failed` and
`operation-not-permitted` into the obfuscated external error
response (`"auth failure"` / `"access denied"`); `invalid-argument`
becomes a descriptive 400; `not-found` / `duplicate` /
`weak-password` / `disabled` become descriptive 4xx but never leak
IAM-internal detail.
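
One way to picture the gateway-side translation is a fixed lookup table. In this sketch the statuses for `auth-failed` (401), `operation-not-permitted` (403), `invalid-argument` (400), and `internal-error` (500) follow the text; the specific 4xx codes and body strings chosen for `not-found`, `duplicate`, `weak-password`, and `disabled` are assumptions, as are the names `_ERROR_MAP` and `to_http`.

```python
from http import HTTPStatus

# error.type -> (HTTP status, external body). IAM's error.message is
# never forwarded, so every body here is a fixed string.
_ERROR_MAP = {
    "auth-failed": (HTTPStatus.UNAUTHORIZED, "auth failure"),
    "operation-not-permitted": (HTTPStatus.FORBIDDEN, "access denied"),
    "invalid-argument": (HTTPStatus.BAD_REQUEST, "invalid argument"),
    # Assumed status codes for the remaining descriptive 4xx cases.
    "not-found": (HTTPStatus.NOT_FOUND, "not found"),
    "duplicate": (HTTPStatus.CONFLICT, "already exists"),
    "weak-password": (HTTPStatus.BAD_REQUEST, "password does not meet policy"),
    "disabled": (HTTPStatus.FORBIDDEN, "disabled"),
    "internal-error": (HTTPStatus.INTERNAL_SERVER_ERROR, "internal error"),
}


def to_http(error_type: str) -> tuple[int, str]:
    """Map an IAM error type to an external (status, body) pair."""
    # Unrecognised types are treated like internal errors.
    status, body = _ERROR_MAP.get(error_type, _ERROR_MAP["internal-error"])
    return int(status), body
```

Keeping the bodies as constants makes it hard to leak IAM-internal detail by accident, since nothing from the IAM response reaches the caller.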

## Credential storage

- **Passwords** are stored using a slow KDF (bcrypt / argon2id; the
  service picks, and the choice is documented as an implementation
  detail). The `password_hash` column stores the full KDF-encoded
  string (algorithm, cost, salt, hash). Not a plain SHA-256.
- **API keys** are stored as SHA-256 of the plaintext. API keys
  are 128-bit random values (`tg_` + base64url); the entropy
  makes a slow hash unnecessary. The hash serves as the primary
  key on the `iam_api_keys` table, enabling O(1) lookup on
  `resolve-api-key`.
- **JWT signing key** is stored as an RSA or Ed25519 private key
  (implementation choice) in a dedicated `iam_signing_keys` table
  with a `kid`, `created`, and optional `retired` timestamp. At
  most one active key; up to N retired keys are kept for a grace
  period to validate previously-issued JWTs.

Passwords, API-key plaintext, and signing-key private material are
never returned in any response other than the explicit one-time
responses above (`reset-password`, `create-api-key`, `bootstrap`).
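
The API-key scheme can be sketched with the Python standard library. Several details here are assumptions not stated in the text: base64url padding is stripped, the SHA-256 is taken over the full plaintext including the `tg_` prefix, and the hash is stored as a hex digest. The function names are hypothetical.

```python
import base64
import hashlib
import secrets


def hash_api_key(plaintext: str) -> str:
    """SHA-256 of the plaintext key, hex-encoded (encoding is an assumption)."""
    return hashlib.sha256(plaintext.encode("ascii")).hexdigest()


def new_api_key() -> tuple[str, str, str]:
    """Return (plaintext, key_hash, prefix) for a fresh 128-bit key."""
    raw = secrets.token_bytes(16)  # 128 bits from the OS CSPRNG
    body = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    plaintext = "tg_" + body
    # Only the hash and a short identifying prefix would be stored.
    return plaintext, hash_api_key(plaintext), plaintext[:4]
```

On `resolve-api-key`, the service would hash the presented key the same way and look the digest up directly, which is what makes the lookup a single primary-key read.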

## Bootstrap modes

`iam-svc` requires a bootstrap mode to be chosen at startup. There is
no default: an unset or invalid mode causes the service to refuse
to start. The purpose is to force the operator to make an explicit
security decision rather than rely on an implicit "safe" fallback.

| Mode | Startup behaviour | `bootstrap` operation | Suitability |
|---|---|---|---|
| `token` | On first start with empty tables, auto-seeds the `default` workspace, admin user, admin API key (using the operator-provided `--bootstrap-token`), and an initial signing key. No-op on subsequent starts. | Refused: returns `auth-failed` / `"auth failure"` regardless of caller. | Production, any public-exposure deployment. |
| `bootstrap` | No startup seeding. Tables remain empty until the `bootstrap` operation is invoked over the pub/sub bus (typically via `tg-bootstrap-iam`). | Live while tables are empty. Generates and returns the admin API key once. Refused (`auth-failed`) once tables are populated. | Dev / compose up / CI. **Not safe under public exposure**: any caller reaching the gateway's `/api/v1/iam` forwarder before the operator can cause a token to be issued to them. Operators choosing this mode accept that risk. |

### Error masking

In both modes, any refused invocation of the `bootstrap` operation
returns the same error (`auth-failed` / `"auth failure"`). A caller
cannot distinguish:

- "service is in token mode"
- "service is in bootstrap mode but already bootstrapped"
- "operation forbidden"

This matches the general IAM error-policy stance (see `iam.md`) and
prevents externally enumerating IAM's state.
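
The masking rule amounts to a single refusal value shared by every failure branch. The sketch below is illustrative only: `handle_bootstrap`, its parameters, and the dictionary shapes are hypothetical and do not reflect the service's real handler.

```python
def _refused() -> dict:
    """The one refusal response used by every failure branch."""
    return {"error": {"type": "auth-failed", "message": "auth failure"}}


def handle_bootstrap(mode: str, tables_empty: bool) -> dict:
    """Decide whether a bootstrap request is served or refused."""
    if mode == "bootstrap" and tables_empty:
        # Only here would the service seed tables and return the
        # one-time admin credentials (omitted in this sketch).
        return {"ok": True}
    # Token mode, already bootstrapped, or anything else: same answer.
    return _refused()
```

Since every refusal path produces an identical value, a caller comparing responses learns nothing about which condition failed.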

### Bootstrap-token lifecycle

The bootstrap token, whether operator-supplied (`token` mode) or
service-generated (`bootstrap` mode), is a one-time credential. It
is stored as admin's single API key, tagged `name="bootstrap"`. The
operator's first admin action after bootstrap should be:

1. Create a durable admin user and API key (or issue a durable API
   key to the bootstrap admin).
2. Revoke the bootstrap key via `revoke-api-key`.
3. Remove the bootstrap token from any deployment configuration.

The `name="bootstrap"` marker makes bootstrap keys easy to detect in
tooling (e.g. a `tg-list-api-keys` filter).

## HTTP forwarding (initial integration)

For the initial gateway integration (before the IAM service is
wired into the authentication middleware), the gateway exposes a
single forwarding endpoint:

```
POST /api/v1/iam
```

- Request body is a JSON encoding of `IamRequest`.
- Response body is a JSON encoding of `IamResponse`.
- The gateway's existing authentication (`GATEWAY_SECRET` bearer)
  gates access to this endpoint so the IAM protocol can be
  exercised end-to-end in tests without touching the live auth
  path.
- This endpoint is **not** the final shape. Once the middleware is
  in place, per-operation REST endpoints replace it (for example
  `POST /api/v1/auth/login`, `POST /api/v1/users`,
  `DELETE /api/v1/api-keys/{id}`), and this generic forwarder is removed.

The endpoint performs only message marshalling: it does not read
or rewrite fields in the request, and it applies no capability
check. All authorisation for user / workspace / key management
lands in the subsequent middleware work.
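
A client-side sketch of a call to the forwarder, using only the Python standard library, might look like the following. The base URL, the secret value, and the `operation` field name inside the request body are assumptions for illustration. The authoritative field layout is the `IamRequest` schema.

```python
import json
import urllib.request


def build_iam_request(base_url, gateway_secret, iam_request):
    """Build (but do not send) a POST to the generic IAM forwarder."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/iam",
        data=json.dumps(iam_request).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The legacy shared secret is presented as a bearer token.
            "Authorization": f"Bearer {gateway_secret}",
        },
        method="POST",
    )


req = build_iam_request(
    "http://localhost:8088",                  # placeholder gateway address
    "example-secret",                         # placeholder GATEWAY_SECRET
    {"operation": "get-signing-key-public"},  # field name is an assumption
)
# To send it: json.load(urllib.request.urlopen(req))
```

Building the request separately from sending it keeps the marshalling step easy to inspect in tests.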

## Non-goals for this spec

- REST endpoint shape for the final gateway surface: covered in
  Phase 2 of the IAM implementation plan, not here.
- OIDC / SAML external IdP protocol: out of scope for open source.
- Key-signing algorithm choice, password KDF choice, JWT claim
  layout: implementation details captured in code + ADRs, not
  locked in the protocol spec.

## References

- [Identity and Access Management Specification](iam.md)
- [Capability Vocabulary Specification](capabilities.md)

@ -423,6 +423,37 @@ resolve API keys and to handle login requests. User management
operations (create user, revoke key, etc.) also go through the IAM
service.

### Error policy

External error responses carry **no diagnostic detail** for
authentication or access-control failures. The goal is to give an
attacker probing the endpoint no signal about which condition they
tripped.

| Category | HTTP | Body | WebSocket frame |
|----------|------|------|-----------------|
| Authentication failure | `401 Unauthorized` | `{"error": "auth failure"}` | `{"type": "auth-failed", "error": "auth failure"}` |
| Access control failure | `403 Forbidden` | `{"error": "access denied"}` | `{"error": "access denied"}` (endpoint-specific frame type) |

"Authentication failure" covers missing credential, malformed
credential, invalid signature, expired token, revoked API key, and
unknown API key; all are indistinguishable to the caller.

"Access control failure" covers role insufficient, workspace
mismatch, user disabled, and workspace disabled; all are
indistinguishable to the caller.

**Server-side logging is richer.** The audit log records the specific
reason
(`"workspace-mismatch: user alice assigned 'acme', requested 'beta'"`,
`"role-insufficient: admin required, user has writer"`,
etc.) for operators and post-incident forensics. These messages never
appear in responses.

Other error classes (bad request, internal error) remain descriptive
because they do not reveal anything about the auth or access-control
surface. For example, `"missing required field 'workspace'"` or
`"invalid JSON"` is fine.
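
The policy can be sketched as a small mapping from internal failures to masked responses, with the detailed reason written only to the audit log. The exception classes, the logger name, and the `to_http_response` helper are hypothetical names for this example. Only the status codes and bodies come from the table above.

```python
import logging

audit_log = logging.getLogger("gateway.audit")


class AuthFailure(Exception):
    """Credential missing, malformed, expired, revoked or unknown."""


class AccessDenied(Exception):
    """Authenticated caller is not permitted to perform the operation."""


def to_http_response(exc):
    """Map an auth exception to a (status, body) pair with no detail."""
    if isinstance(exc, AuthFailure):
        status, body = 401, {"error": "auth failure"}
    elif isinstance(exc, AccessDenied):
        status, body = 403, {"error": "access denied"}
    else:
        # Other error classes keep their descriptive handling elsewhere.
        raise exc
    # The specific reason is recorded for operators only.
    audit_log.warning("%s: %s", type(exc).__name__, exc)
    return status, body
```

For instance, `to_http_response(AccessDenied("role-insufficient: admin required, user has writer"))` yields the generic 403 body, while the reason string reaches only the log.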

### Gateway changes

The current `Authenticator` class is replaced with a thin authentication

@ -713,6 +744,16 @@ These are not implemented but the architecture does not preclude them:

- **Multi-workspace access.** Users could be granted access to
  additional workspaces beyond their primary assignment. The workspace
  validation step would check a grant list instead of a single assignment.
- **Workspace resolver.** Workspace resolution on each authenticated
  request ("given this user and this requested workspace, which
  workspace (if any) may the request operate on?") is encapsulated
  in a single pluggable resolver. The open-source edition ships a
  resolver that permits only the user's single assigned workspace;
  enterprise editions that implement multi-workspace access swap in a
  resolver that consults a permitted set. The wire protocol (the
  optional `workspace` field on the authenticated request) is
  identical in both editions, so clients written against one edition
  work unchanged against the other.
- **Rules-based access control.** A separate access control service
  could evaluate fine-grained policies (per-collection permissions,
  operation-level restrictions, time-based access). The gateway
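
The workspace resolver item in the list above can be sketched as a small interface with two interchangeable implementations. All names here are hypothetical, as is the user record shape (`workspace`, `grants`). Treating an omitted `workspace` field as "use the assigned workspace" is also an assumption.

```python
from typing import Optional, Protocol


class WorkspaceResolver(Protocol):
    def resolve(self, user: dict, requested: Optional[str]) -> Optional[str]:
        """Return the workspace the request may operate on, or None."""
        ...


class SingleWorkspaceResolver:
    """Permits only the user's single assigned workspace."""

    def resolve(self, user, requested=None):
        assigned = user["workspace"]
        if requested is None or requested == assigned:
            return assigned
        return None  # caller maps this to "access denied"


class GrantListResolver:
    """Permits any workspace in the user's permitted set."""

    def resolve(self, user, requested=None):
        if requested is None:
            return user["workspace"]
        return requested if requested in user["grants"] else None
```

Because both classes accept the same arguments, swapping one for the other changes the decision without changing what clients send.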