mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-26 00:46:22 +02:00
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
329 lines
14 KiB
Markdown
---
layout: default
title: "IAM Service Protocol Technical Specification"
parent: "Tech Specs"
---

# IAM Service Protocol Technical Specification

## Overview

The IAM service is a backend processor, reached over the standard
request/response pub/sub pattern. It is the authority for users,
workspaces, API keys, and login credentials. The API gateway
delegates to it for authentication resolution and for all user /
workspace / key management.

This document defines the wire protocol: the `IamRequest` and
`IamResponse` dataclasses, the operation set, the per-operation
input and output fields, the error taxonomy, and the initial HTTP
forwarding endpoint used while IAM is being integrated into the
gateway.

Architectural context — roles, capabilities, workspace scoping,
enforcement boundary — lives in [`iam.md`](iam.md) and
[`capabilities.md`](capabilities.md).

## Transport

- **Request topic:** `request:tg/request/iam-request`
- **Response topic:** `response:tg/response/iam-response`
- **Pattern:** request/response, correlated by the `id` message
  property, the same pattern used by `config-svc` and `flow-svc`.
- **Caller:** the API gateway only. Under the enforcement-boundary
  policy (see capabilities spec), the IAM service trusts the bus
  and performs no per-request authentication or capability check
  against the caller. The gateway has already evaluated capability
  membership and workspace scoping before sending the request.
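
The `id`-based correlation can be sketched in a few lines of asyncio. This is a minimal in-process model, not TrustGraph's pub/sub client. Two `asyncio.Queue`s stand in for the request and response topics, and `InMemoryBus`, `iam_service` and `IamClient` are hypothetical names. The caller registers a future under a fresh `id` before publishing. A single response pump then resolves whichever future matches the `id` on each reply, so concurrent requests can complete in any order.

```python
import asyncio
import uuid


class InMemoryBus:
    """Hypothetical stand-in for the broker: one queue per topic."""

    def __init__(self):
        self.requests = asyncio.Queue()   # plays request:tg/request/iam-request
        self.responses = asyncio.Queue()  # plays response:tg/response/iam-response


async def iam_service(bus: InMemoryBus):
    # Toy service: echo the operation back, preserving the message id.
    while True:
        msg_id, body = await bus.requests.get()
        await bus.responses.put((msg_id, {"echo_operation": body["operation"]}))


class IamClient:
    """Caller side (the gateway's role): futures keyed by message id."""

    def __init__(self, bus: InMemoryBus):
        self.bus = bus
        self.pending: dict[str, asyncio.Future] = {}

    async def pump_responses(self):
        # Route each reply to the future registered under its id.
        while True:
            msg_id, body = await self.bus.responses.get()
            fut = self.pending.pop(msg_id, None)
            if fut is not None and not fut.done():
                fut.set_result(body)

    async def request(self, body: dict, timeout: float = 5.0) -> dict:
        msg_id = str(uuid.uuid4())
        fut = asyncio.get_running_loop().create_future()
        self.pending[msg_id] = fut  # register before publishing
        await self.bus.requests.put((msg_id, body))
        try:
            return await asyncio.wait_for(fut, timeout)
        finally:
            self.pending.pop(msg_id, None)


async def main():
    bus = InMemoryBus()
    client = IamClient(bus)
    tasks = [asyncio.create_task(iam_service(bus)),
             asyncio.create_task(client.pump_responses())]
    results = await asyncio.gather(
        client.request({"operation": "login"}),
        client.request({"operation": "list-users"}),
    )
    for task in tasks:
        task.cancel()
    return results


results = asyncio.run(main())
```

Registering the future before publishing avoids a race in which a fast reply arrives before anyone is waiting for it. The timeout keeps a lost reply from blocking the caller indefinitely.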

## Dataclasses

### `IamRequest`

```python
from __future__ import annotations  # UserInput etc. are defined under "Value types"

from dataclasses import dataclass


@dataclass
class IamRequest:
    # One of the operation strings below.
    operation: str = ""

    # Scope of this request. Required on every workspace-scoped
    # operation. Omitted (or empty) for system-level ops
    # (workspace CRUD, signing-key ops, bootstrap, resolve-api-key,
    # login).
    workspace: str = ""

    # Acting user id, for audit. Set by the gateway to the
    # authenticated caller's id on user-initiated operations.
    # Empty for internal-origin (bootstrap, reconcilers) and for
    # resolve-api-key / login (no actor yet).
    actor: str = ""

    # --- identity selectors ---
    user_id: str = ""
    username: str = ""   # login; unique within a workspace
    key_id: str = ""     # revoke-api-key, list-api-keys (own)
    api_key: str = ""    # resolve-api-key (plaintext)

    # --- credentials ---
    password: str = ""       # login, change-password (current)
    new_password: str = ""   # change-password

    # --- user fields ---
    user: UserInput | None = None   # create-user, update-user

    # --- workspace fields ---
    workspace_record: WorkspaceInput | None = None   # create/get/update/disable-workspace

    # --- api key fields ---
    key: ApiKeyInput | None = None   # create-api-key
```
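
The scoping comment on `workspace` implies a simple structural check that can run before any handler. The sketch below takes its set of workspace-scoped operations from the rows of the Operations table that list `workspace` as a request field. `WORKSPACE_SCOPED` and `missing_workspace` are hypothetical names. A non-`None` result would be returned to the caller as `invalid-argument`.

```python
from __future__ import annotations

from dataclasses import dataclass

# Read off the operations table: rows that list `workspace` as a
# request field. Hypothetical constant, not the service's own code.
WORKSPACE_SCOPED = {
    "create-user", "list-users", "get-user", "update-user", "disable-user",
    "create-api-key", "list-api-keys", "revoke-api-key",
}


@dataclass
class IamRequest:
    # Trimmed to the fields this check needs.
    operation: str = ""
    workspace: str = ""
    user_id: str = ""


def missing_workspace(req: IamRequest) -> str | None:
    """Return an invalid-argument message if a workspace-scoped
    operation arrives without a workspace, else None."""
    if req.operation in WORKSPACE_SCOPED and not req.workspace:
        return f"{req.operation} requires a workspace"
    return None
```

System-level operations such as `list-workspaces` pass through untouched, because the check only fires for names in the scoped set.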

### `IamResponse`

```python
from __future__ import annotations  # record types are defined under "Value types"

from dataclasses import dataclass, field


@dataclass
class IamResponse:
    # Populated on success of operations that return them.
    user: UserRecord | None = None                                   # create-user, get-user, update-user
    users: list[UserRecord] = field(default_factory=list)             # list-users
    workspace: WorkspaceRecord | None = None                          # create-workspace, get-workspace, update-workspace
    workspaces: list[WorkspaceRecord] = field(default_factory=list)   # list-workspaces

    # create-api-key returns the plaintext once. Never populated
    # on any other operation.
    api_key_plaintext: str = ""
    api_key: ApiKeyRecord | None = None                               # create-api-key
    api_keys: list[ApiKeyRecord] = field(default_factory=list)        # list-api-keys

    # login, rotate-signing-key
    jwt: str = ""
    jwt_expires: str = ""   # ISO-8601 UTC

    # get-signing-key-public
    signing_key_public: str = ""   # PEM

    # resolve-api-key returns who this key authenticates as.
    resolved_user_id: str = ""
    resolved_workspace: str = ""
    resolved_roles: list[str] = field(default_factory=list)

    # reset-password
    temporary_password: str = ""   # returned once to the operator

    # bootstrap: on first run, the initial admin's one-time API key
    # is returned for the operator to capture.
    bootstrap_admin_user_id: str = ""
    bootstrap_admin_api_key: str = ""

    # Present on any failed operation. `Error` is the shared schema
    # error type carrying `type` and `message` (see Error taxonomy).
    error: Error | None = None
```

### Value types

```python
from dataclasses import dataclass, field


@dataclass
class UserInput:
    username: str = ""
    name: str = ""
    email: str = ""
    password: str = ""   # only on create-user; never on update-user
    roles: list[str] = field(default_factory=list)
    enabled: bool = True
    must_change_password: bool = False


@dataclass
class UserRecord:
    id: str = ""
    workspace: str = ""
    username: str = ""
    name: str = ""
    email: str = ""
    roles: list[str] = field(default_factory=list)
    enabled: bool = True
    must_change_password: bool = False
    created: str = ""   # ISO-8601 UTC
    # Password hash is never included in any response.


@dataclass
class WorkspaceInput:
    id: str = ""
    name: str = ""
    enabled: bool = True


@dataclass
class WorkspaceRecord:
    id: str = ""
    name: str = ""
    enabled: bool = True
    created: str = ""   # ISO-8601 UTC


@dataclass
class ApiKeyInput:
    user_id: str = ""
    name: str = ""      # operator-facing label, e.g. "laptop"
    expires: str = ""   # optional ISO-8601 UTC; empty = no expiry


@dataclass
class ApiKeyRecord:
    id: str = ""
    user_id: str = ""
    name: str = ""
    prefix: str = ""      # first 4 chars of plaintext, for identification in lists
    expires: str = ""     # empty = no expiry
    created: str = ""
    last_used: str = ""   # empty if never used
    # key_hash is never included in any response.
```
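
One way to guarantee that the password hash never reaches a response is to build records from an allow-list of declared fields, rather than deleting known-secret columns. The sketch assumes storage rows arrive as dicts keyed by column name. `to_user_record` is a hypothetical helper, and `password_hash` is the column named under Credential storage.

```python
from dataclasses import dataclass, field, fields


@dataclass
class UserRecord:
    id: str = ""
    workspace: str = ""
    username: str = ""
    name: str = ""
    email: str = ""
    roles: list[str] = field(default_factory=list)
    enabled: bool = True
    must_change_password: bool = False
    created: str = ""


def to_user_record(row: dict) -> UserRecord:
    # Copy only the columns UserRecord declares. Anything else in the
    # stored row (password_hash, future secret columns) is dropped.
    allowed = {f.name for f in fields(UserRecord)}
    return UserRecord(**{k: v for k, v in row.items() if k in allowed})
```

With an allow-list, a secret column added to storage later stays out of responses by default. A deny-list would have to be updated every time such a column is added.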

## Operations

| Operation | Request fields | Response fields | Notes |
|---|---|---|---|
| `login` | `username`, `password`, `workspace` (optional) | `jwt`, `jwt_expires` | If `workspace` omitted, IAM resolves to the user's assigned workspace. |
| `resolve-api-key` | `api_key` (plaintext) | `resolved_user_id`, `resolved_workspace`, `resolved_roles` | Gateway-internal. Service returns `auth-failed` for unknown / expired / revoked keys. |
| `change-password` | `user_id`, `password` (current), `new_password` | — | Self-service. IAM validates `password` against stored hash. |
| `reset-password` | `user_id` | `temporary_password` | Admin-initiated. IAM generates a random password, sets `must_change_password=true` on the user, returns the plaintext once. |
| `create-user` | `workspace`, `user` | `user` | Admin-only. `user.password` is hashed and stored; `user.roles` must be a subset of the known roles. |
| `list-users` | `workspace` | `users` | |
| `get-user` | `workspace`, `user_id` | `user` | |
| `update-user` | `workspace`, `user_id`, `user` | `user` | `password` field on `user` is rejected; use `change-password` / `reset-password`. |
| `disable-user` | `workspace`, `user_id` | — | Soft-delete; sets `enabled=false`. Revokes all the user's API keys. |
| `create-workspace` | `workspace_record` | `workspace` | System-level. |
| `list-workspaces` | — | `workspaces` | System-level. |
| `get-workspace` | `workspace_record` (id only) | `workspace` | System-level. |
| `update-workspace` | `workspace_record` | `workspace` | System-level. |
| `disable-workspace` | `workspace_record` (id only) | — | System-level. Sets `enabled=false`; revokes all workspace API keys; disables all users in the workspace. |
| `create-api-key` | `workspace`, `key` | `api_key_plaintext`, `api_key` | Plaintext returned **once**; only hash stored. `key.name` required. |
| `list-api-keys` | `workspace`, `user_id` | `api_keys` | |
| `revoke-api-key` | `workspace`, `key_id` | — | Deletes the key record. |
| `get-signing-key-public` | — | `signing_key_public` | Gateway fetches this at startup. |
| `rotate-signing-key` | — | — | System-level. Introduces a new signing key; old key continues to validate JWTs for a grace period (implementation-defined, minimum 1h). |
| `bootstrap` | — | `bootstrap_admin_user_id`, `bootstrap_admin_api_key` | Honoured only in `bootstrap` mode while the IAM tables are empty: creates the initial `default` workspace, an `admin` user, an initial API key, and an initial signing key, and returns them once. Every other invocation is refused with `auth-failed` (see Bootstrap modes). |
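
Operation routing inside the service can be sketched as a lookup table keyed by the operation string. Unknown operations map to `invalid-argument` and unexpected exceptions map to `internal-error`, as the error taxonomy prescribes. `HANDLERS`, `handle` and the trimmed `Response` type are hypothetical. Only `list-workspaces` is wired up, and it returns a canned result.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Error:
    # Shape per the error taxonomy: a type bucket plus a message.
    type: str = ""
    message: str = ""


@dataclass
class Response:
    # Trimmed stand-in for IamResponse.
    workspaces: list[str] | None = None
    error: Error | None = None


def list_workspaces(request: dict) -> Response:
    return Response(workspaces=["default"])  # canned result for the sketch


# One entry per operation in the table; only one shown here.
HANDLERS: dict[str, Callable[[dict], Response]] = {
    "list-workspaces": list_workspaces,
}


def handle(request: dict) -> Response:
    handler = HANDLERS.get(request.get("operation", ""))
    if handler is None:
        return Response(error=Error("invalid-argument", "unknown operation"))
    try:
        return handler(request)
    except Exception:
        # Detail belongs in the service log, not in the response.
        return Response(error=Error("internal-error", "internal error"))
```

Because the table is the only route to a handler, an operation name missing from it can never fall through to a default code path.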

## Error taxonomy

All errors are carried in the `IamResponse.error` field. `error.type`
is one of the values below; `error.message` is a human-readable
string that is **not** surfaced verbatim to external callers (the
gateway maps to `auth failure` / `access denied` per the IAM error
policy).

| `type` | When |
|---|---|
| `invalid-argument` | Malformed request (missing required field, unknown operation, invalid format). |
| `not-found` | Named resource does not exist (`user_id`, `key_id`, workspace). |
| `duplicate` | Create operation collides with an existing resource (username, workspace id, key name). |
| `auth-failed` | `login` with wrong credentials; `resolve-api-key` with unknown / expired / revoked key; `change-password` with wrong current password; any refused `bootstrap`. A single bucket, so the error type cannot serve as an oracle for which check failed. |
| `weak-password` | Password does not meet policy (length, complexity; policy defined at service level). |
| `disabled` | Target user or workspace has `enabled=false`. |
| `operation-not-permitted` | Non-admin attempting system-level operation, or workspace-scoped operation attempting to affect another workspace. |
| `internal-error` | Unexpected IAM-side failure. Log and surface as 500 at the gateway. |

The gateway is responsible for translating `auth-failed` into a 401
carrying the obfuscated body `"auth failure"`, and
`operation-not-permitted` into a 403 carrying `"access denied"`.
`invalid-argument` becomes a descriptive 400; `not-found` /
`duplicate` / `weak-password` / `disabled` become descriptive 4xx
responses that never leak IAM-internal detail.
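
The gateway-side translation can be sketched as a fixed table from `error.type` to an HTTP status and an external message. The 401 and 403 pairings follow the gateway's auth-failure policy. The remaining status codes, the descriptive texts and the `{"error": ...}` body shape are assumptions. `EXTERNAL` and `to_http` are hypothetical names.

```python
# (status, external text) per IAM error type. Only the auth-failed and
# operation-not-permitted rows are fixed by policy; the rest are assumed.
EXTERNAL = {
    "auth-failed": (401, "auth failure"),
    "operation-not-permitted": (403, "access denied"),
    "invalid-argument": (400, "invalid request"),
    "not-found": (404, "not found"),
    "duplicate": (409, "already exists"),
    "weak-password": (400, "password does not meet policy"),
    "disabled": (403, "disabled"),
    "internal-error": (500, "internal error"),
}


def to_http(error_type: str) -> tuple[int, dict[str, str]]:
    """Map an IamResponse error type to an HTTP status and JSON body.

    IAM's own error.message is never copied into the body, and
    unrecognised types fail closed as a 500.
    """
    status, text = EXTERNAL.get(error_type, (500, "internal error"))
    return status, {"error": text}
```

Building the body only from this table means no IAM-side string can leak to the caller, even if a new error type is introduced before the gateway learns about it.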

## Credential storage

- **Passwords** are stored using a slow, salted KDF (for example
  PBKDF2-HMAC-SHA-256, bcrypt or argon2id; the service picks, and the
  choice is documented as an implementation detail). The
  `password_hash` column stores the full KDF-encoded string
  (algorithm, cost, salt, hash), never a plain SHA-256.
- **API keys** are stored as the SHA-256 of the plaintext. API keys
  are 128-bit random values (`tg_` + base64url); brute-forcing a
  128-bit space is infeasible even with a fast hash, so a slow KDF is
  unnecessary. The hash serves as the primary key on the
  `iam_api_keys` table, enabling O(1) lookup on `resolve-api-key`.
- **JWT signing key** is stored as an RSA or Ed25519 private key
  (implementation choice) in a dedicated `iam_signing_keys` table
  with a `kid`, `created`, and optional `retired` timestamp. At
  most one key is active; up to N retired keys are kept for a grace
  period to validate previously issued JWTs.
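
The password and API-key rules can be sketched with the Python standard library. The PBKDF2-HMAC-SHA-256 parameters (600k iterations, per-user salt) follow the change description accompanying this spec. The `pbkdf2_sha256$iterations$salt$hash` encoding, the 16-byte salt size and the helper names `new_api_key`, `hash_password` and `verify_password` are assumptions. Each API key is 16 random bytes rendered as `tg_` plus unpadded base64url, hashed with SHA-256, and its first four characters are kept as the listing prefix.

```python
import base64
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 600_000  # per the change description; service-defined cost


def new_api_key() -> tuple[str, str, str]:
    """Return (plaintext, sha256 hex, prefix) for a fresh API key."""
    raw = secrets.token_bytes(16)  # 128 bits of entropy
    plaintext = "tg_" + base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    key_hash = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, key_hash, plaintext[:4]


def hash_password(password: str, iterations: int = PBKDF2_ITERATIONS) -> str:
    salt = secrets.token_bytes(16)  # fresh salt per user
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Self-describing string: algorithm, cost, salt, hash.
    return "pbkdf2_sha256${}${}${}".format(
        iterations,
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(dk).decode("ascii"),
    )


def verify_password(password: str, encoded: str) -> bool:
    algorithm, iterations, salt_b64, hash_b64 = encoded.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    dk = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"),
        base64.b64decode(salt_b64), int(iterations),
    )
    # Constant-time comparison of the derived key.
    return hmac.compare_digest(dk, base64.b64decode(hash_b64))
```

Storing the iteration count inside the encoded string lets the service raise the cost later without invalidating existing hashes. `resolve-api-key` only needs to recompute the SHA-256 of the presented key and look it up.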

Passwords and API-key plaintext are never returned in any response
other than the explicit one-time responses above (`reset-password`,
`create-api-key`, `bootstrap`). Signing-key private material is never
returned at all; `get-signing-key-public` exposes only the public key.
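
The signing-key rule above leaves a verifier with one decision per JWT: is the `kid` in its header still acceptable? The sketch assumes the grace period is measured from the `retired` timestamp and uses the spec's one-hour minimum. `SigningKey` and `verification_key` are hypothetical names. Actual signature verification is out of scope here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

GRACE = timedelta(hours=1)  # the spec's minimum grace period


@dataclass
class SigningKey:
    kid: str
    created: datetime
    retired: datetime | None = None  # None marks the active key


def verification_key(keys: list[SigningKey], kid: str, now: datetime) -> SigningKey | None:
    """Return the key matching a JWT's kid, or None to reject the token."""
    for key in keys:
        if key.kid != kid:
            continue
        if key.retired is None or now - key.retired < GRACE:
            return key
        return None  # retired longer ago than the grace period
    return None  # unknown kid
```

Rejecting unknown `kid` values outright, rather than trying every key, keeps validation to one signature check per token.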

## Bootstrap modes

`iam-svc` requires a bootstrap mode to be chosen at startup. There is
no default — an unset or invalid mode causes the service to refuse
to start. The purpose is to force the operator to make an explicit
security decision rather than rely on an implicit "safe" fallback.

| Mode | Startup behaviour | `bootstrap` operation | Suitability |
|---|---|---|---|
| `token` | On first start with empty tables, auto-seeds the `default` workspace, admin user, admin API key (using the operator-provided `--bootstrap-token`), and an initial signing key. No-op on subsequent starts. | Refused — returns `auth-failed` / `"auth failure"` regardless of caller. | Production, any public-exposure deployment. |
| `bootstrap` | No startup seeding. Tables remain empty until the `bootstrap` operation is invoked over the pub/sub bus (typically via `tg-bootstrap-iam`). | Live while tables are empty. Generates and returns the admin API key once. Refused (`auth-failed`) once tables are populated. | Dev / compose up / CI. **Not safe under public exposure** — any caller reaching the gateway's `/api/v1/iam` forwarder before the operator can cause a token to be issued to them. Operators choosing this mode accept that risk. |
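
The no-default rule maps directly onto a required argument with a closed set of choices. Below is a minimal argparse sketch, assuming the service parses its flags this way. The flag names `--bootstrap-mode` and `--bootstrap-token` come from this document. The rule that `token` mode needs a non-empty token, and the `parse_args` helper, are assumptions.

```python
from __future__ import annotations

import argparse


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="iam-svc")
    # No default: argparse exits with status 2 if the flag is missing
    # or its value is not one of the listed choices.
    parser.add_argument("--bootstrap-mode", choices=["token", "bootstrap"], required=True)
    parser.add_argument("--bootstrap-token", default="")
    args = parser.parse_args(argv)
    if args.bootstrap_mode == "token" and not args.bootstrap_token:
        # Assumed rule: token mode cannot seed the admin key without a token.
        parser.error("--bootstrap-mode token requires --bootstrap-token")
    return args
```

Relying on `required=True` and `choices` means an unset or misspelled mode stops the process before any storage or bus connection is opened.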

### Error masking

In both modes, any refused invocation of the `bootstrap` operation
returns the same error (`auth-failed` / `"auth failure"`). A caller
cannot distinguish:

- "service is in token mode"
- "service is in bootstrap mode but already bootstrapped"
- "operation forbidden"

This matches the general IAM error-policy stance (see `iam.md`) and
prevents externally enumerating IAM's state.
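
The masking rule amounts to folding every refusal condition into one branch that returns one fixed body. Here is a minimal sketch. `handle_bootstrap` and its `seed` callback are hypothetical, with `seed` standing in for creating the `default` workspace, admin user, admin API key and signing key.

```python
from typing import Callable


def _refused() -> dict:
    # Identical body for every refusal reason.
    return {"error": {"type": "auth-failed", "message": "auth failure"}}


def handle_bootstrap(mode: str, tables_empty: bool, seed: Callable[[], dict]) -> dict:
    # Wrong mode and already-bootstrapped share one branch, so the
    # response cannot reveal which condition failed.
    if mode != "bootstrap" or not tables_empty:
        return _refused()
    # Only reached in bootstrap mode with empty tables.
    return seed()
```

Keeping `seed` off the refusal path also means a refused call has no side effects on IAM state.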

### Bootstrap-token lifecycle

The bootstrap token, whether operator-supplied (`token` mode) or
service-generated (`bootstrap` mode), is a one-time credential. It
is stored as the admin user's single API key, tagged
`name="bootstrap"`. The operator's first admin actions after
bootstrap should be:

1. Create a durable admin user and API key (or issue a durable API
   key to the bootstrap admin).
2. Revoke the bootstrap key via `revoke-api-key`.
3. Remove the bootstrap token from any deployment configuration.

The `name="bootstrap"` marker makes bootstrap keys easy to detect in
tooling (e.g. a `tg-list-api-keys` filter).
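
Detecting leftover bootstrap keys then reduces to a filter on `ApiKeyRecord.name`. The sketch shows what such a tooling check might look like. `bootstrap_keys` is a hypothetical helper, not an existing `tg-list-api-keys` option.

```python
from dataclasses import dataclass


@dataclass
class ApiKeyRecord:
    # Trimmed to the fields used here.
    id: str = ""
    user_id: str = ""
    name: str = ""
    prefix: str = ""


def bootstrap_keys(keys: list[ApiKeyRecord]) -> list[ApiKeyRecord]:
    """Keys still carrying the bootstrap marker; should be empty once
    the bootstrap key has been revoked."""
    return [k for k in keys if k.name == "bootstrap"]
```

A non-empty result after initial setup indicates that step 2 of the lifecycle above was skipped.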

## HTTP forwarding (initial integration)

For the initial gateway integration — before the IAM service is
wired into the authentication middleware — the gateway exposes a
single forwarding endpoint:

```
POST /api/v1/iam
```

- Request body is a JSON encoding of `IamRequest`.
- Response body is a JSON encoding of `IamResponse`.
- The gateway's existing authentication (`GATEWAY_SECRET` bearer)
  gates access to this endpoint so the IAM protocol can be
  exercised end-to-end in tests without touching the live auth
  path.
- This endpoint is **not** the final shape. Once the middleware is
  in place, per-operation REST endpoints replace it (for example
  `POST /api/v1/auth/login`, `POST /api/v1/users`,
  `DELETE /api/v1/api-keys/{id}`), and this generic forwarder is
  removed.

The endpoint performs only message marshalling: it does not read
or rewrite fields in the request, and it applies no capability
check. All authorisation for user / workspace / key management
lands in the subsequent middleware work.
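
Because the forwarder only marshals, it reduces to decoding the JSON body into `IamRequest` and encoding the `IamResponse` back. The sketch uses trimmed copies of both dataclasses. `decode_request` and `encode_response` are hypothetical names, and nested value types (`UserInput` and so on) would need their own decoding. Rejecting unknown fields is an assumed policy rather than something this spec requires.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field, fields


@dataclass
class IamRequest:
    # Trimmed copy of the request dataclass.
    operation: str = ""
    workspace: str = ""
    username: str = ""
    password: str = ""


@dataclass
class IamResponse:
    # Trimmed copy of the response dataclass.
    jwt: str = ""
    jwt_expires: str = ""
    resolved_roles: list[str] = field(default_factory=list)


def decode_request(body: bytes) -> IamRequest:
    data = json.loads(body)
    known = {f.name for f in fields(IamRequest)}
    unknown = set(data) - known
    if unknown:
        # Assumed policy: surface as a 400 instead of silently dropping.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return IamRequest(**data)


def encode_response(resp: IamResponse) -> bytes:
    # asdict recurses into nested dataclasses and lists.
    return json.dumps(asdict(resp)).encode("utf-8")
```

Fields absent from the JSON body take the dataclass defaults, matching the empty-string convention used throughout `IamRequest`.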

## Non-goals for this spec

- REST endpoint shape for the final gateway surface — covered in
  Phase 2 of the IAM implementation plan, not here.
- OIDC / SAML external IdP protocol — out of scope for open source.
- Key-signing algorithm choice, password KDF choice, JWT claim
  layout — implementation details captured in code + ADRs, not
  locked in the protocol spec.

## References

- [Identity and Access Management Specification](iam.md)
- [Capability Vocabulary Specification](capabilities.md)