feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
|
|
|
"""
|
|
|
|
|
IAM Cassandra table store.
|
|
|
|
|
|
|
|
|
|
Tables:
|
|
|
|
|
- iam_workspaces (id primary key)
|
|
|
|
|
- iam_users (id primary key) + iam_users_by_username lookup table
|
|
|
|
|
(workspace, username) -> id
|
|
|
|
|
- iam_api_keys (key_hash primary key) with secondary index on user_id
|
|
|
|
|
- iam_signing_keys (kid primary key) — RSA keypairs for JWT signing
|
|
|
|
|
|
|
|
|
|
See docs/tech-specs/iam-protocol.md for the wire-level context.
|
|
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
import logging
|
|
|
|
|
|
|
|
|
|
from cassandra.cluster import Cluster
|
|
|
|
|
from cassandra.auth import PlainTextAuthProvider
|
2026-06-04 11:49:29 +01:00
|
|
|
import ssl
|
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
|
|
|
|
|
|
|
|
from . cassandra_async import async_execute
|
|
|
|
|
|
|
|
|
|
logger = logging.getLogger(__name__)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class IamTableStore:
|
|
|
|
|
|
|
|
|
|
def __init__(
|
|
|
|
|
self,
|
|
|
|
|
cassandra_host, cassandra_username, cassandra_password,
|
|
|
|
|
keyspace,
|
2026-05-08 19:48:12 +01:00
|
|
|
replication_factor=1,
|
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
|
|
|
):
|
|
|
|
|
self.keyspace = keyspace
|
2026-05-08 19:48:12 +01:00
|
|
|
self.replication_factor = replication_factor
|
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
|
|
|
|
|
|
|
|
logger.info("IAM: connecting to Cassandra...")
|
|
|
|
|
|
|
|
|
|
if isinstance(cassandra_host, str):
|
|
|
|
|
cassandra_host = [h.strip() for h in cassandra_host.split(",")]
|
|
|
|
|
|
|
|
|
|
if cassandra_username and cassandra_password:
|
2026-06-04 11:49:29 +01:00
|
|
|
ssl_context = ssl.create_default_context()
|
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
|
|
|
auth_provider = PlainTextAuthProvider(
|
|
|
|
|
username=cassandra_username, password=cassandra_password,
|
|
|
|
|
)
|
|
|
|
|
self.cluster = Cluster(
|
|
|
|
|
cassandra_host,
|
|
|
|
|
auth_provider=auth_provider,
|
|
|
|
|
ssl_context=ssl_context,
|
|
|
|
|
)
|
|
|
|
|
else:
|
|
|
|
|
self.cluster = Cluster(cassandra_host)
|
|
|
|
|
|
|
|
|
|
self.cassandra = self.cluster.connect()
|
|
|
|
|
|
|
|
|
|
logger.info("IAM: connected.")
|
|
|
|
|
|
|
|
|
|
self._ensure_schema()
|
|
|
|
|
self._prepare_statements()
|
|
|
|
|
|
|
|
|
|
def _ensure_schema(self):
|
|
|
|
|
self.cassandra.execute(f"""
|
|
|
|
|
create keyspace if not exists {self.keyspace}
|
|
|
|
|
with replication = {{
|
|
|
|
|
'class' : 'SimpleStrategy',
|
2026-05-08 19:48:12 +01:00
|
|
|
'replication_factor' : {self.replication_factor}
|
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
|
|
|
}};
|
|
|
|
|
""")
|
|
|
|
|
self.cassandra.set_keyspace(self.keyspace)
|
|
|
|
|
|
|
|
|
|
self.cassandra.execute("""
|
|
|
|
|
CREATE TABLE IF NOT EXISTS iam_workspaces (
|
|
|
|
|
id text PRIMARY KEY,
|
|
|
|
|
name text,
|
|
|
|
|
enabled boolean,
|
|
|
|
|
created timestamp
|
|
|
|
|
);
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.cassandra.execute("""
|
|
|
|
|
CREATE TABLE IF NOT EXISTS iam_users (
|
|
|
|
|
id text PRIMARY KEY,
|
|
|
|
|
workspace text,
|
|
|
|
|
username text,
|
|
|
|
|
name text,
|
|
|
|
|
email text,
|
|
|
|
|
password_hash text,
|
|
|
|
|
roles set<text>,
|
|
|
|
|
enabled boolean,
|
|
|
|
|
must_change_password boolean,
|
|
|
|
|
created timestamp
|
|
|
|
|
);
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.cassandra.execute("""
|
|
|
|
|
CREATE TABLE IF NOT EXISTS iam_users_by_username (
|
|
|
|
|
workspace text,
|
|
|
|
|
username text,
|
|
|
|
|
user_id text,
|
|
|
|
|
PRIMARY KEY ((workspace), username)
|
|
|
|
|
);
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.cassandra.execute("""
|
|
|
|
|
CREATE TABLE IF NOT EXISTS iam_api_keys (
|
|
|
|
|
key_hash text PRIMARY KEY,
|
|
|
|
|
id text,
|
|
|
|
|
user_id text,
|
|
|
|
|
name text,
|
|
|
|
|
prefix text,
|
|
|
|
|
expires timestamp,
|
|
|
|
|
created timestamp,
|
|
|
|
|
last_used timestamp
|
|
|
|
|
);
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.cassandra.execute("""
|
|
|
|
|
CREATE INDEX IF NOT EXISTS iam_api_keys_user_id_idx
|
|
|
|
|
ON iam_api_keys (user_id);
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.cassandra.execute("""
|
|
|
|
|
CREATE INDEX IF NOT EXISTS iam_api_keys_id_idx
|
|
|
|
|
ON iam_api_keys (id);
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.cassandra.execute("""
|
|
|
|
|
CREATE TABLE IF NOT EXISTS iam_signing_keys (
|
|
|
|
|
kid text PRIMARY KEY,
|
|
|
|
|
private_pem text,
|
|
|
|
|
public_pem text,
|
|
|
|
|
created timestamp,
|
|
|
|
|
retired timestamp
|
|
|
|
|
);
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
logger.info("IAM: Cassandra schema OK.")
|
|
|
|
|
|
|
|
|
|
def _prepare_statements(self):
|
|
|
|
|
c = self.cassandra
|
|
|
|
|
|
|
|
|
|
self.put_workspace_stmt = c.prepare("""
|
|
|
|
|
INSERT INTO iam_workspaces (id, name, enabled, created)
|
|
|
|
|
VALUES (?, ?, ?, ?)
|
|
|
|
|
""")
|
|
|
|
|
self.get_workspace_stmt = c.prepare("""
|
|
|
|
|
SELECT id, name, enabled, created FROM iam_workspaces
|
|
|
|
|
WHERE id = ?
|
|
|
|
|
""")
|
|
|
|
|
self.list_workspaces_stmt = c.prepare("""
|
|
|
|
|
SELECT id, name, enabled, created FROM iam_workspaces
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.put_user_stmt = c.prepare("""
|
|
|
|
|
INSERT INTO iam_users (
|
|
|
|
|
id, workspace, username, name, email, password_hash,
|
|
|
|
|
roles, enabled, must_change_password, created
|
|
|
|
|
)
|
|
|
|
|
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
|
|
|
|
""")
|
|
|
|
|
self.get_user_stmt = c.prepare("""
|
|
|
|
|
SELECT id, workspace, username, name, email, password_hash,
|
|
|
|
|
roles, enabled, must_change_password, created
|
|
|
|
|
FROM iam_users WHERE id = ?
|
|
|
|
|
""")
|
|
|
|
|
self.list_users_by_workspace_stmt = c.prepare("""
|
|
|
|
|
SELECT id, workspace, username, name, email, password_hash,
|
|
|
|
|
roles, enabled, must_change_password, created
|
|
|
|
|
FROM iam_users WHERE workspace = ? ALLOW FILTERING
|
|
|
|
|
""")
|
iam: self-service ops, optional workspace filters, Mux service routing (#855)
Three threads, all reinforcing the contract's system-level vs.
workspace-association distinction.
WS Mux service routing
- tg-show-flows (and any workspace-level service over the WS) was
failing with "unknown service" because the post-refactor Mux
unconditionally looked up flow-service:<kind>. Now branches on
the envelope's flow field: with flow → flow-service:<kind>;
without flow → <kind>:<op> from the inner body; with bare op
lookup for service=iam. Resource and parameters come from the
matched op's own extractors — same path the HTTP endpoints take.
Optional workspace on system-level user/key ops
- list-users returns the deployment-wide list when no workspace is
supplied, filters when one is. get-user, update-user,
disable-user, enable-user, delete-user, reset-password,
create-api-key, list-api-keys, revoke-api-key all treat workspace
as an optional integrity check rather than a required argument.
- create-user keeps workspace required — there it's the new user's
home-workspace binding, a parameter rather than an address.
- API keys reclassified as SYSTEM-level resources. By the same
reasoning that makes users system-level, an API key is a
credential record on a deployment-wide registry; the workspace it
authenticates to is a property, not a containment.
Self-service surface
- whoami: returns the caller's own user record. AUTHENTICATED-only;
no users:read capability required. Foundation for UI affordances
that depend on the caller's permissions.
- bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC,
side-effect-free. Returns {bootstrap_available: bool} so a
first-run UI can decide whether to render setup without consuming
the bootstrap op.
- Gateway now injects actor=identity.handle on every authenticated
forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting
any caller-supplied value. Underpins whoami, audit logging, and
future regime-side decisions that need actor identity.
- tg-whoami and tg-update-user CLIs.
Spec polish
- iam-contract.md: actor-injection rule documented; whoami /
bootstrap-status added to operations list; permission-scope
framing tightened (workspace scope is a property of the grant,
not the user or role).
- iam.md: self-service section; gateway flow gains the actor-
injection step; role section reframed so iam-svc constraints
don't leak into contract-level prose.
- iam-protocol.md: ops table updated for whoami, bootstrap-status,
optional-workspace pattern; bootstrap_available added to the
IamResponse listing.
2026-04-28 22:13:12 +01:00
|
|
|
self.list_users_stmt = c.prepare("""
|
|
|
|
|
SELECT id, workspace, username, name, email, password_hash,
|
|
|
|
|
roles, enabled, must_change_password, created
|
|
|
|
|
FROM iam_users
|
|
|
|
|
""")
|
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
|
|
|
|
|
|
|
|
self.put_username_lookup_stmt = c.prepare("""
|
|
|
|
|
INSERT INTO iam_users_by_username (workspace, username, user_id)
|
|
|
|
|
VALUES (?, ?, ?)
|
|
|
|
|
""")
|
|
|
|
|
self.get_user_id_by_username_stmt = c.prepare("""
|
|
|
|
|
SELECT user_id FROM iam_users_by_username
|
|
|
|
|
WHERE workspace = ? AND username = ?
|
|
|
|
|
""")
|
|
|
|
|
self.delete_username_lookup_stmt = c.prepare("""
|
|
|
|
|
DELETE FROM iam_users_by_username
|
|
|
|
|
WHERE workspace = ? AND username = ?
|
|
|
|
|
""")
|
|
|
|
|
self.delete_user_stmt = c.prepare("""
|
|
|
|
|
DELETE FROM iam_users WHERE id = ?
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.put_api_key_stmt = c.prepare("""
|
|
|
|
|
INSERT INTO iam_api_keys (
|
|
|
|
|
key_hash, id, user_id, name, prefix, expires,
|
|
|
|
|
created, last_used
|
|
|
|
|
)
|
|
|
|
|
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
|
|
|
|
|
""")
|
|
|
|
|
self.get_api_key_by_hash_stmt = c.prepare("""
|
|
|
|
|
SELECT key_hash, id, user_id, name, prefix, expires,
|
|
|
|
|
created, last_used
|
|
|
|
|
FROM iam_api_keys WHERE key_hash = ?
|
|
|
|
|
""")
|
|
|
|
|
self.get_api_key_by_id_stmt = c.prepare("""
|
|
|
|
|
SELECT key_hash, id, user_id, name, prefix, expires,
|
|
|
|
|
created, last_used
|
|
|
|
|
FROM iam_api_keys WHERE id = ?
|
|
|
|
|
""")
|
|
|
|
|
self.list_api_keys_by_user_stmt = c.prepare("""
|
|
|
|
|
SELECT key_hash, id, user_id, name, prefix, expires,
|
|
|
|
|
created, last_used
|
|
|
|
|
FROM iam_api_keys WHERE user_id = ?
|
|
|
|
|
""")
|
|
|
|
|
self.delete_api_key_stmt = c.prepare("""
|
|
|
|
|
DELETE FROM iam_api_keys WHERE key_hash = ?
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.put_signing_key_stmt = c.prepare("""
|
|
|
|
|
INSERT INTO iam_signing_keys (
|
|
|
|
|
kid, private_pem, public_pem, created, retired
|
|
|
|
|
)
|
|
|
|
|
VALUES (?, ?, ?, ?, ?)
|
|
|
|
|
""")
|
|
|
|
|
self.list_signing_keys_stmt = c.prepare("""
|
|
|
|
|
SELECT kid, private_pem, public_pem, created, retired
|
|
|
|
|
FROM iam_signing_keys
|
|
|
|
|
""")
|
|
|
|
|
self.retire_signing_key_stmt = c.prepare("""
|
|
|
|
|
UPDATE iam_signing_keys SET retired = ? WHERE kid = ?
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.update_user_profile_stmt = c.prepare("""
|
|
|
|
|
UPDATE iam_users
|
|
|
|
|
SET name = ?, email = ?, roles = ?, enabled = ?,
|
|
|
|
|
must_change_password = ?
|
|
|
|
|
WHERE id = ?
|
|
|
|
|
""")
|
|
|
|
|
self.update_user_password_stmt = c.prepare("""
|
|
|
|
|
UPDATE iam_users
|
|
|
|
|
SET password_hash = ?, must_change_password = ?
|
|
|
|
|
WHERE id = ?
|
|
|
|
|
""")
|
|
|
|
|
self.update_user_enabled_stmt = c.prepare("""
|
|
|
|
|
UPDATE iam_users SET enabled = ? WHERE id = ?
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
self.update_workspace_stmt = c.prepare("""
|
|
|
|
|
UPDATE iam_workspaces SET name = ?, enabled = ?
|
|
|
|
|
WHERE id = ?
|
|
|
|
|
""")
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Workspaces
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def put_workspace(self, id, name, enabled, created):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.put_workspace_stmt,
|
|
|
|
|
(id, name, enabled, created),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
async def get_workspace(self, id):
|
|
|
|
|
rows = await async_execute(
|
|
|
|
|
self.cassandra, self.get_workspace_stmt, (id,),
|
|
|
|
|
)
|
|
|
|
|
return rows[0] if rows else None
|
|
|
|
|
|
|
|
|
|
async def list_workspaces(self):
|
|
|
|
|
return await async_execute(
|
|
|
|
|
self.cassandra, self.list_workspaces_stmt,
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Users
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def put_user(
|
|
|
|
|
self, id, workspace, username, name, email, password_hash,
|
|
|
|
|
roles, enabled, must_change_password, created,
|
|
|
|
|
):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.put_user_stmt,
|
|
|
|
|
(
|
|
|
|
|
id, workspace, username, name, email, password_hash,
|
|
|
|
|
set(roles) if roles else set(),
|
|
|
|
|
enabled, must_change_password, created,
|
|
|
|
|
),
|
|
|
|
|
)
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.put_username_lookup_stmt,
|
|
|
|
|
(workspace, username, id),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
async def get_user(self, id):
|
|
|
|
|
rows = await async_execute(
|
|
|
|
|
self.cassandra, self.get_user_stmt, (id,),
|
|
|
|
|
)
|
|
|
|
|
return rows[0] if rows else None
|
|
|
|
|
|
|
|
|
|
async def get_user_id_by_username(self, workspace, username):
|
|
|
|
|
rows = await async_execute(
|
|
|
|
|
self.cassandra, self.get_user_id_by_username_stmt,
|
|
|
|
|
(workspace, username),
|
|
|
|
|
)
|
|
|
|
|
return rows[0][0] if rows else None
|
|
|
|
|
|
|
|
|
|
async def list_users_by_workspace(self, workspace):
|
|
|
|
|
return await async_execute(
|
|
|
|
|
self.cassandra, self.list_users_by_workspace_stmt, (workspace,),
|
|
|
|
|
)
|
|
|
|
|
|
iam: self-service ops, optional workspace filters, Mux service routing (#855)
Three threads, all reinforcing the contract's system-level vs.
workspace-association distinction.
WS Mux service routing
- tg-show-flows (and any workspace-level service over the WS) was
failing with "unknown service" because the post-refactor Mux
unconditionally looked up flow-service:<kind>. Now branches on
the envelope's flow field: with flow → flow-service:<kind>;
without flow → <kind>:<op> from the inner body; with bare op
lookup for service=iam. Resource and parameters come from the
matched op's own extractors — same path the HTTP endpoints take.
Optional workspace on system-level user/key ops
- list-users returns the deployment-wide list when no workspace is
supplied, filters when one is. get-user, update-user,
disable-user, enable-user, delete-user, reset-password,
create-api-key, list-api-keys, revoke-api-key all treat workspace
as an optional integrity check rather than a required argument.
- create-user keeps workspace required — there it's the new user's
home-workspace binding, a parameter rather than an address.
- API keys reclassified as SYSTEM-level resources. By the same
reasoning that makes users system-level, an API key is a
credential record on a deployment-wide registry; the workspace it
authenticates to is a property, not a containment.
Self-service surface
- whoami: returns the caller's own user record. AUTHENTICATED-only;
no users:read capability required. Foundation for UI affordances
that depend on the caller's permissions.
- bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC,
side-effect-free. Returns {bootstrap_available: bool} so a
first-run UI can decide whether to render setup without consuming
the bootstrap op.
- Gateway now injects actor=identity.handle on every authenticated
forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting
any caller-supplied value. Underpins whoami, audit logging, and
future regime-side decisions that need actor identity.
- tg-whoami and tg-update-user CLIs.
Spec polish
- iam-contract.md: actor-injection rule documented; whoami /
bootstrap-status added to operations list; permission-scope
framing tightened (workspace scope is a property of the grant,
not the user or role).
- iam.md: self-service section; gateway flow gains the actor-
injection step; role section reframed so iam-svc constraints
don't leak into contract-level prose.
- iam-protocol.md: ops table updated for whoami, bootstrap-status,
optional-workspace pattern; bootstrap_available added to the
IamResponse listing.
2026-04-28 22:13:12 +01:00
|
|
|
async def list_users(self):
|
|
|
|
|
"""List every user across the deployment. Used by the
|
|
|
|
|
system-level list-users handler when no workspace filter is
|
|
|
|
|
supplied; the gateway has already authorised the call against
|
|
|
|
|
the caller's authority."""
|
|
|
|
|
return await async_execute(
|
|
|
|
|
self.cassandra, self.list_users_stmt, (),
|
|
|
|
|
)
|
|
|
|
|
|
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
|
|
|
async def delete_user(self, id):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.delete_user_stmt, (id,),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
async def delete_username_lookup(self, workspace, username):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.delete_username_lookup_stmt,
|
|
|
|
|
(workspace, username),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# API keys
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def put_api_key(
|
|
|
|
|
self, key_hash, id, user_id, name, prefix, expires,
|
|
|
|
|
created, last_used,
|
|
|
|
|
):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.put_api_key_stmt,
|
|
|
|
|
(key_hash, id, user_id, name, prefix, expires,
|
|
|
|
|
created, last_used),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
async def get_api_key_by_hash(self, key_hash):
|
|
|
|
|
rows = await async_execute(
|
|
|
|
|
self.cassandra, self.get_api_key_by_hash_stmt, (key_hash,),
|
|
|
|
|
)
|
|
|
|
|
return rows[0] if rows else None
|
|
|
|
|
|
|
|
|
|
async def get_api_key_by_id(self, id):
|
|
|
|
|
rows = await async_execute(
|
|
|
|
|
self.cassandra, self.get_api_key_by_id_stmt, (id,),
|
|
|
|
|
)
|
|
|
|
|
return rows[0] if rows else None
|
|
|
|
|
|
|
|
|
|
async def list_api_keys_by_user(self, user_id):
|
|
|
|
|
return await async_execute(
|
|
|
|
|
self.cassandra, self.list_api_keys_by_user_stmt, (user_id,),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
async def delete_api_key(self, key_hash):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.delete_api_key_stmt, (key_hash,),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Signing keys
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def put_signing_key(self, kid, private_pem, public_pem,
|
|
|
|
|
created, retired):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.put_signing_key_stmt,
|
|
|
|
|
(kid, private_pem, public_pem, created, retired),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
async def list_signing_keys(self):
|
|
|
|
|
return await async_execute(
|
|
|
|
|
self.cassandra, self.list_signing_keys_stmt,
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
async def retire_signing_key(self, kid, retired):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.retire_signing_key_stmt,
|
|
|
|
|
(retired, kid),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# User partial updates
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def update_user_profile(
|
|
|
|
|
self, id, name, email, roles, enabled, must_change_password,
|
|
|
|
|
):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.update_user_profile_stmt,
|
|
|
|
|
(
|
|
|
|
|
name, email,
|
|
|
|
|
set(roles) if roles else set(),
|
|
|
|
|
enabled, must_change_password, id,
|
|
|
|
|
),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
async def update_user_password(
|
|
|
|
|
self, id, password_hash, must_change_password,
|
|
|
|
|
):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.update_user_password_stmt,
|
|
|
|
|
(password_hash, must_change_password, id),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
async def update_user_enabled(self, id, enabled):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.update_user_enabled_stmt,
|
|
|
|
|
(enabled, id),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Workspace updates
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def update_workspace(self, id, name, enabled):
|
|
|
|
|
await async_execute(
|
|
|
|
|
self.cassandra, self.update_workspace_stmt,
|
|
|
|
|
(name, enabled, id),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Bootstrap helpers
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
async def any_workspace_exists(self):
|
|
|
|
|
rows = await self.list_workspaces()
|
|
|
|
|
return bool(rows)
|
2026-05-18 22:08:12 +01:00
|
|
|
|
|
|
|
|
async def any_signing_key_exists(self):
|
|
|
|
|
rows = await self.list_signing_keys()
|
|
|
|
|
return bool(rows)
|