trustgraph/tests/unit/test_gateway/test_endpoint_manager.py

91 lines
4.1 KiB
Python
Raw Normal View History

"""
Tests for Gateway Endpoint Manager
"""
import pytest
from unittest.mock import MagicMock
from trustgraph.gateway.endpoint.manager import EndpointManager
class TestEndpointManager:
"""Test cases for EndpointManager class"""
def test_endpoint_manager_initialization(self):
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849) Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed identity and authorisation model. The gateway no longer has an "allow-all" or "no auth" mode; every request is authenticated via the IAM service, authorised against a capability model that encodes both the operation and the workspace it targets, and rejected with a deliberately-uninformative 401 / 403 on any failure. IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam) ----------------------------------------------------------------------- * New backend service (iam-svc) owning users, workspaces, API keys, passwords and JWT signing keys in Cassandra. Reached over the standard pub/sub request/response pattern; gateway is the only caller. * Operations: bootstrap, resolve-api-key, login, get-signing-key-public, rotate-signing-key, create/list/get/update/disable/delete/enable-user, change-password, reset-password, create/list/get/update/disable- workspace, create/list/revoke-api-key. * Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and retires the previous one; validation is grace-period friendly. * Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt. * API keys: 128-bit random, SHA-256 hashed. Plaintext returned once. * Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a required startup argument with no permissive default. Masked "auth failure" errors hide whether a refused bootstrap request was due to mode, state, or authorisation. Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py) ------------------------------------------------------------------- * IamAuth replaces the legacy Authenticator. Distinguishes JWTs (three-segment dotted) from API keys by shape; verifies JWTs locally using the cached IAM public key; resolves API keys via IAM with a short-TTL hash-keyed cache. Every failure path surfaces the same 401 body ("auth failure") so callers cannot enumerate credential state. * Public key is fetched at gateway startup with a bounded retry loop; traffic does not begin flowing until auth has started. Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py) --------------------------------------------------------------------- * Roles have two dimensions: a capability set and a workspace scope. OSS ships reader / writer / admin; the first two are workspace- assigned, admin is cross-workspace ("*"). No "cross-workspace" pseudo-capability — workspace permission is a property of the role. * check(identity, capability, target_workspace=None) is the single authorisation test: some role must grant the capability *and* be active in the target workspace. * enforce_workspace validates a request-body workspace against the caller's role scopes and injects the resolved value. Cross- workspace admin is permitted by role scope, not by a bypass. * Gateway endpoints declare a required capability explicitly — no permissive default. Construction fails fast if omitted. Enterprise editions can replace the role table without changing the wire protocol. WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py) ---------------------------------------------------------------- * /api/v1/socket handshake unconditionally accepts; authentication runs on the first WebSocket frame ({"type":"auth","token":"..."}) with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}. The socket stays open on failure so the client can re-authenticate — browsers treat a handshake-time 401 as terminal, breaking reconnection. * Mux.receive rejects every non-auth frame before auth succeeds, enforces the caller's workspace (envelope + inner payload) using the role-scope resolver, and supports mid-session re-auth. * Flow import/export streaming endpoints keep the legacy ?token= handshake (URL-scoped short-lived transfers; no re-auth need). Auth surface ------------ * POST /api/v1/auth/login — public, returns a JWT. * POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap op which itself enforces mode + tables-empty. * POST /api/v1/auth/change-password — any authenticated user. * POST /api/v1/iam — admin-only generic forwarder for the rest of the IAM API (per-op REST endpoints to follow in a later change). Removed / breaking ------------------ * GATEWAY_SECRET / --api-token / default_api_token and the legacy Authenticator.permitted contract. The gateway cannot run without IAM. * ?token= on /api/v1/socket. * DispatcherManager and Mux both raise on auth=None — no silent downgrade path. CLI tools (trustgraph-cli) -------------------------- tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users, tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password, tg-reset-password, tg-create-api-key, tg-list-api-keys, tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords read via getpass; tokens / one-time secrets written to stdout with operator context on stderr so shell composition works cleanly. AsyncSocketClient / SocketClient updated to the first-frame auth protocol. Specifications -------------- * docs/tech-specs/iam.md updated with the error policy, workspace resolver extension point, and OSS role-scope model. * docs/tech-specs/iam-protocol.md (new) — transport, dataclasses, operation table, error taxonomy, bootstrap modes. * docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS role bundles, agent-as-composition note, enforcement-boundary policy, enterprise extensibility. Tests ----- * test_auth.py (rewritten) — IamAuth + JWT round-trip with real Ed25519 keypairs + API-key cache behaviour. * test_capabilities.py (new) — role table sanity, check across role x workspace combinations, enforce_workspace paths, unknown-cap / unknown-role fail-closed. * Every endpoint test construction now names its capability explicitly (no permissive defaults relied upon). New tests pin the fail-closed invariants: DispatcherManager / Mux refuse auth=None; i18n path-traversal defense is exercised. * test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
"""EndpointManager wires up the full endpoint set and
records dispatcher_manager / timeout on the instance."""
mock_dispatcher_manager = MagicMock()
mock_auth = MagicMock()
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849) Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed identity and authorisation model. The gateway no longer has an "allow-all" or "no auth" mode; every request is authenticated via the IAM service, authorised against a capability model that encodes both the operation and the workspace it targets, and rejected with a deliberately-uninformative 401 / 403 on any failure. IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam) ----------------------------------------------------------------------- * New backend service (iam-svc) owning users, workspaces, API keys, passwords and JWT signing keys in Cassandra. Reached over the standard pub/sub request/response pattern; gateway is the only caller. * Operations: bootstrap, resolve-api-key, login, get-signing-key-public, rotate-signing-key, create/list/get/update/disable/delete/enable-user, change-password, reset-password, create/list/get/update/disable- workspace, create/list/revoke-api-key. * Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and retires the previous one; validation is grace-period friendly. * Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt. * API keys: 128-bit random, SHA-256 hashed. Plaintext returned once. * Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a required startup argument with no permissive default. Masked "auth failure" errors hide whether a refused bootstrap request was due to mode, state, or authorisation. Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py) ------------------------------------------------------------------- * IamAuth replaces the legacy Authenticator. Distinguishes JWTs (three-segment dotted) from API keys by shape; verifies JWTs locally using the cached IAM public key; resolves API keys via IAM with a short-TTL hash-keyed cache. Every failure path surfaces the same 401 body ("auth failure") so callers cannot enumerate credential state. * Public key is fetched at gateway startup with a bounded retry loop; traffic does not begin flowing until auth has started. Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py) --------------------------------------------------------------------- * Roles have two dimensions: a capability set and a workspace scope. OSS ships reader / writer / admin; the first two are workspace- assigned, admin is cross-workspace ("*"). No "cross-workspace" pseudo-capability — workspace permission is a property of the role. * check(identity, capability, target_workspace=None) is the single authorisation test: some role must grant the capability *and* be active in the target workspace. * enforce_workspace validates a request-body workspace against the caller's role scopes and injects the resolved value. Cross- workspace admin is permitted by role scope, not by a bypass. * Gateway endpoints declare a required capability explicitly — no permissive default. Construction fails fast if omitted. Enterprise editions can replace the role table without changing the wire protocol. WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py) ---------------------------------------------------------------- * /api/v1/socket handshake unconditionally accepts; authentication runs on the first WebSocket frame ({"type":"auth","token":"..."}) with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}. The socket stays open on failure so the client can re-authenticate — browsers treat a handshake-time 401 as terminal, breaking reconnection. * Mux.receive rejects every non-auth frame before auth succeeds, enforces the caller's workspace (envelope + inner payload) using the role-scope resolver, and supports mid-session re-auth. * Flow import/export streaming endpoints keep the legacy ?token= handshake (URL-scoped short-lived transfers; no re-auth need). Auth surface ------------ * POST /api/v1/auth/login — public, returns a JWT. * POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap op which itself enforces mode + tables-empty. * POST /api/v1/auth/change-password — any authenticated user. * POST /api/v1/iam — admin-only generic forwarder for the rest of the IAM API (per-op REST endpoints to follow in a later change). Removed / breaking ------------------ * GATEWAY_SECRET / --api-token / default_api_token and the legacy Authenticator.permitted contract. The gateway cannot run without IAM. * ?token= on /api/v1/socket. * DispatcherManager and Mux both raise on auth=None — no silent downgrade path. CLI tools (trustgraph-cli) -------------------------- tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users, tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password, tg-reset-password, tg-create-api-key, tg-list-api-keys, tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords read via getpass; tokens / one-time secrets written to stdout with operator context on stderr so shell composition works cleanly. AsyncSocketClient / SocketClient updated to the first-frame auth protocol. Specifications -------------- * docs/tech-specs/iam.md updated with the error policy, workspace resolver extension point, and OSS role-scope model. * docs/tech-specs/iam-protocol.md (new) — transport, dataclasses, operation table, error taxonomy, bootstrap modes. * docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS role bundles, agent-as-composition note, enforcement-boundary policy, enterprise extensibility. Tests ----- * test_auth.py (rewritten) — IamAuth + JWT round-trip with real Ed25519 keypairs + API-key cache behaviour. * test_capabilities.py (new) — role table sanity, check across role x workspace combinations, enforce_workspace paths, unknown-cap / unknown-role fail-closed. * Every endpoint test construction now names its capability explicitly (no permissive defaults relied upon). New tests pin the fail-closed invariants: DispatcherManager / Mux refuse auth=None; i18n path-traversal defense is exercised. * test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
# The dispatcher_manager exposes a small set of factory
# methods — MagicMock auto-creates them, returning fresh
# MagicMocks on each call.
manager = EndpointManager(
dispatcher_manager=mock_dispatcher_manager,
auth=mock_auth,
prometheus_url="http://prometheus:9090",
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849) Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed identity and authorisation model. The gateway no longer has an "allow-all" or "no auth" mode; every request is authenticated via the IAM service, authorised against a capability model that encodes both the operation and the workspace it targets, and rejected with a deliberately-uninformative 401 / 403 on any failure. IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam) ----------------------------------------------------------------------- * New backend service (iam-svc) owning users, workspaces, API keys, passwords and JWT signing keys in Cassandra. Reached over the standard pub/sub request/response pattern; gateway is the only caller. * Operations: bootstrap, resolve-api-key, login, get-signing-key-public, rotate-signing-key, create/list/get/update/disable/delete/enable-user, change-password, reset-password, create/list/get/update/disable- workspace, create/list/revoke-api-key. * Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and retires the previous one; validation is grace-period friendly. * Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt. * API keys: 128-bit random, SHA-256 hashed. Plaintext returned once. * Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a required startup argument with no permissive default. Masked "auth failure" errors hide whether a refused bootstrap request was due to mode, state, or authorisation. Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py) ------------------------------------------------------------------- * IamAuth replaces the legacy Authenticator. Distinguishes JWTs (three-segment dotted) from API keys by shape; verifies JWTs locally using the cached IAM public key; resolves API keys via IAM with a short-TTL hash-keyed cache. Every failure path surfaces the same 401 body ("auth failure") so callers cannot enumerate credential state. * Public key is fetched at gateway startup with a bounded retry loop; traffic does not begin flowing until auth has started. Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py) --------------------------------------------------------------------- * Roles have two dimensions: a capability set and a workspace scope. OSS ships reader / writer / admin; the first two are workspace- assigned, admin is cross-workspace ("*"). No "cross-workspace" pseudo-capability — workspace permission is a property of the role. * check(identity, capability, target_workspace=None) is the single authorisation test: some role must grant the capability *and* be active in the target workspace. * enforce_workspace validates a request-body workspace against the caller's role scopes and injects the resolved value. Cross- workspace admin is permitted by role scope, not by a bypass. * Gateway endpoints declare a required capability explicitly — no permissive default. Construction fails fast if omitted. Enterprise editions can replace the role table without changing the wire protocol. WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py) ---------------------------------------------------------------- * /api/v1/socket handshake unconditionally accepts; authentication runs on the first WebSocket frame ({"type":"auth","token":"..."}) with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}. The socket stays open on failure so the client can re-authenticate — browsers treat a handshake-time 401 as terminal, breaking reconnection. * Mux.receive rejects every non-auth frame before auth succeeds, enforces the caller's workspace (envelope + inner payload) using the role-scope resolver, and supports mid-session re-auth. * Flow import/export streaming endpoints keep the legacy ?token= handshake (URL-scoped short-lived transfers; no re-auth need). Auth surface ------------ * POST /api/v1/auth/login — public, returns a JWT. * POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap op which itself enforces mode + tables-empty. * POST /api/v1/auth/change-password — any authenticated user. * POST /api/v1/iam — admin-only generic forwarder for the rest of the IAM API (per-op REST endpoints to follow in a later change). Removed / breaking ------------------ * GATEWAY_SECRET / --api-token / default_api_token and the legacy Authenticator.permitted contract. The gateway cannot run without IAM. * ?token= on /api/v1/socket. * DispatcherManager and Mux both raise on auth=None — no silent downgrade path. CLI tools (trustgraph-cli) -------------------------- tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users, tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password, tg-reset-password, tg-create-api-key, tg-list-api-keys, tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords read via getpass; tokens / one-time secrets written to stdout with operator context on stderr so shell composition works cleanly. AsyncSocketClient / SocketClient updated to the first-frame auth protocol. Specifications -------------- * docs/tech-specs/iam.md updated with the error policy, workspace resolver extension point, and OSS role-scope model. * docs/tech-specs/iam-protocol.md (new) — transport, dataclasses, operation table, error taxonomy, bootstrap modes. * docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS role bundles, agent-as-composition note, enforcement-boundary policy, enterprise extensibility. Tests ----- * test_auth.py (rewritten) — IamAuth + JWT round-trip with real Ed25519 keypairs + API-key cache behaviour. * test_capabilities.py (new) — role table sanity, check across role x workspace combinations, enforce_workspace paths, unknown-cap / unknown-role fail-closed. * Every endpoint test construction now names its capability explicitly (no permissive defaults relied upon). New tests pin the fail-closed invariants: DispatcherManager / Mux refuse auth=None; i18n path-traversal defense is exercised. * test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
timeout=300,
)
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849) Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed identity and authorisation model. The gateway no longer has an "allow-all" or "no auth" mode; every request is authenticated via the IAM service, authorised against a capability model that encodes both the operation and the workspace it targets, and rejected with a deliberately-uninformative 401 / 403 on any failure. IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam) ----------------------------------------------------------------------- * New backend service (iam-svc) owning users, workspaces, API keys, passwords and JWT signing keys in Cassandra. Reached over the standard pub/sub request/response pattern; gateway is the only caller. * Operations: bootstrap, resolve-api-key, login, get-signing-key-public, rotate-signing-key, create/list/get/update/disable/delete/enable-user, change-password, reset-password, create/list/get/update/disable- workspace, create/list/revoke-api-key. * Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and retires the previous one; validation is grace-period friendly. * Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt. * API keys: 128-bit random, SHA-256 hashed. Plaintext returned once. * Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a required startup argument with no permissive default. Masked "auth failure" errors hide whether a refused bootstrap request was due to mode, state, or authorisation. Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py) ------------------------------------------------------------------- * IamAuth replaces the legacy Authenticator. Distinguishes JWTs (three-segment dotted) from API keys by shape; verifies JWTs locally using the cached IAM public key; resolves API keys via IAM with a short-TTL hash-keyed cache. Every failure path surfaces the same 401 body ("auth failure") so callers cannot enumerate credential state. * Public key is fetched at gateway startup with a bounded retry loop; traffic does not begin flowing until auth has started. Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py) --------------------------------------------------------------------- * Roles have two dimensions: a capability set and a workspace scope. OSS ships reader / writer / admin; the first two are workspace- assigned, admin is cross-workspace ("*"). No "cross-workspace" pseudo-capability — workspace permission is a property of the role. * check(identity, capability, target_workspace=None) is the single authorisation test: some role must grant the capability *and* be active in the target workspace. * enforce_workspace validates a request-body workspace against the caller's role scopes and injects the resolved value. Cross- workspace admin is permitted by role scope, not by a bypass. * Gateway endpoints declare a required capability explicitly — no permissive default. Construction fails fast if omitted. Enterprise editions can replace the role table without changing the wire protocol. WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py) ---------------------------------------------------------------- * /api/v1/socket handshake unconditionally accepts; authentication runs on the first WebSocket frame ({"type":"auth","token":"..."}) with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}. The socket stays open on failure so the client can re-authenticate — browsers treat a handshake-time 401 as terminal, breaking reconnection. * Mux.receive rejects every non-auth frame before auth succeeds, enforces the caller's workspace (envelope + inner payload) using the role-scope resolver, and supports mid-session re-auth. * Flow import/export streaming endpoints keep the legacy ?token= handshake (URL-scoped short-lived transfers; no re-auth need). Auth surface ------------ * POST /api/v1/auth/login — public, returns a JWT. * POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap op which itself enforces mode + tables-empty. * POST /api/v1/auth/change-password — any authenticated user. * POST /api/v1/iam — admin-only generic forwarder for the rest of the IAM API (per-op REST endpoints to follow in a later change). Removed / breaking ------------------ * GATEWAY_SECRET / --api-token / default_api_token and the legacy Authenticator.permitted contract. The gateway cannot run without IAM. * ?token= on /api/v1/socket. * DispatcherManager and Mux both raise on auth=None — no silent downgrade path. CLI tools (trustgraph-cli) -------------------------- tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users, tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password, tg-reset-password, tg-create-api-key, tg-list-api-keys, tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords read via getpass; tokens / one-time secrets written to stdout with operator context on stderr so shell composition works cleanly. AsyncSocketClient / SocketClient updated to the first-frame auth protocol. Specifications -------------- * docs/tech-specs/iam.md updated with the error policy, workspace resolver extension point, and OSS role-scope model. * docs/tech-specs/iam-protocol.md (new) — transport, dataclasses, operation table, error taxonomy, bootstrap modes. * docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS role bundles, agent-as-composition note, enforcement-boundary policy, enterprise extensibility. Tests ----- * test_auth.py (rewritten) — IamAuth + JWT round-trip with real Ed25519 keypairs + API-key cache behaviour. * test_capabilities.py (new) — role table sanity, check across role x workspace combinations, enforce_workspace paths, unknown-cap / unknown-role fail-closed. * Every endpoint test construction now names its capability explicitly (no permissive defaults relied upon). New tests pin the fail-closed invariants: DispatcherManager / Mux refuse auth=None; i18n path-traversal defense is exercised. * test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
assert manager.dispatcher_manager == mock_dispatcher_manager
assert manager.timeout == 300
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849) Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed identity and authorisation model. The gateway no longer has an "allow-all" or "no auth" mode; every request is authenticated via the IAM service, authorised against a capability model that encodes both the operation and the workspace it targets, and rejected with a deliberately-uninformative 401 / 403 on any failure. IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam) ----------------------------------------------------------------------- * New backend service (iam-svc) owning users, workspaces, API keys, passwords and JWT signing keys in Cassandra. Reached over the standard pub/sub request/response pattern; gateway is the only caller. * Operations: bootstrap, resolve-api-key, login, get-signing-key-public, rotate-signing-key, create/list/get/update/disable/delete/enable-user, change-password, reset-password, create/list/get/update/disable- workspace, create/list/revoke-api-key. * Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and retires the previous one; validation is grace-period friendly. * Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt. * API keys: 128-bit random, SHA-256 hashed. Plaintext returned once. * Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a required startup argument with no permissive default. Masked "auth failure" errors hide whether a refused bootstrap request was due to mode, state, or authorisation. Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py) ------------------------------------------------------------------- * IamAuth replaces the legacy Authenticator. Distinguishes JWTs (three-segment dotted) from API keys by shape; verifies JWTs locally using the cached IAM public key; resolves API keys via IAM with a short-TTL hash-keyed cache. Every failure path surfaces the same 401 body ("auth failure") so callers cannot enumerate credential state. * Public key is fetched at gateway startup with a bounded retry loop; traffic does not begin flowing until auth has started. Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py) --------------------------------------------------------------------- * Roles have two dimensions: a capability set and a workspace scope. OSS ships reader / writer / admin; the first two are workspace- assigned, admin is cross-workspace ("*"). No "cross-workspace" pseudo-capability — workspace permission is a property of the role. * check(identity, capability, target_workspace=None) is the single authorisation test: some role must grant the capability *and* be active in the target workspace. * enforce_workspace validates a request-body workspace against the caller's role scopes and injects the resolved value. Cross- workspace admin is permitted by role scope, not by a bypass. * Gateway endpoints declare a required capability explicitly — no permissive default. Construction fails fast if omitted. Enterprise editions can replace the role table without changing the wire protocol. WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py) ---------------------------------------------------------------- * /api/v1/socket handshake unconditionally accepts; authentication runs on the first WebSocket frame ({"type":"auth","token":"..."}) with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}. The socket stays open on failure so the client can re-authenticate — browsers treat a handshake-time 401 as terminal, breaking reconnection. * Mux.receive rejects every non-auth frame before auth succeeds, enforces the caller's workspace (envelope + inner payload) using the role-scope resolver, and supports mid-session re-auth. * Flow import/export streaming endpoints keep the legacy ?token= handshake (URL-scoped short-lived transfers; no re-auth need). Auth surface ------------ * POST /api/v1/auth/login — public, returns a JWT. * POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap op which itself enforces mode + tables-empty. * POST /api/v1/auth/change-password — any authenticated user. * POST /api/v1/iam — admin-only generic forwarder for the rest of the IAM API (per-op REST endpoints to follow in a later change). Removed / breaking ------------------ * GATEWAY_SECRET / --api-token / default_api_token and the legacy Authenticator.permitted contract. The gateway cannot run without IAM. * ?token= on /api/v1/socket. * DispatcherManager and Mux both raise on auth=None — no silent downgrade path. CLI tools (trustgraph-cli) -------------------------- tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users, tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password, tg-reset-password, tg-create-api-key, tg-list-api-keys, tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords read via getpass; tokens / one-time secrets written to stdout with operator context on stderr so shell composition works cleanly. AsyncSocketClient / SocketClient updated to the first-frame auth protocol. Specifications -------------- * docs/tech-specs/iam.md updated with the error policy, workspace resolver extension point, and OSS role-scope model. * docs/tech-specs/iam-protocol.md (new) — transport, dataclasses, operation table, error taxonomy, bootstrap modes. * docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS role bundles, agent-as-composition note, enforcement-boundary policy, enterprise extensibility. Tests ----- * test_auth.py (rewritten) — IamAuth + JWT round-trip with real Ed25519 keypairs + API-key cache behaviour. * test_capabilities.py (new) — role table sanity, check across role x workspace combinations, enforce_workspace paths, unknown-cap / unknown-role fail-closed. * Every endpoint test construction now names its capability explicitly (no permissive defaults relied upon). New tests pin the fail-closed invariants: DispatcherManager / Mux refuse auth=None; i18n path-traversal defense is exercised. * test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
assert len(manager.endpoints) > 0
def test_endpoint_manager_with_default_timeout(self):
"""Test EndpointManager with default timeout value"""
mock_dispatcher_manager = MagicMock()
mock_auth = MagicMock()
# Mock dispatcher methods
mock_dispatcher_manager.dispatch_global_service.return_value = MagicMock()
mock_dispatcher_manager.dispatch_socket.return_value = MagicMock()
mock_dispatcher_manager.dispatch_flow_service.return_value = MagicMock()
mock_dispatcher_manager.dispatch_flow_import.return_value = MagicMock()
mock_dispatcher_manager.dispatch_flow_export.return_value = MagicMock()
mock_dispatcher_manager.dispatch_core_import.return_value = MagicMock()
mock_dispatcher_manager.dispatch_core_export.return_value = MagicMock()
manager = EndpointManager(
dispatcher_manager=mock_dispatcher_manager,
auth=mock_auth,
prometheus_url="http://prometheus:9090"
)
assert manager.timeout == 600 # Default value
def test_endpoint_manager_dispatcher_calls(self):
"""Test EndpointManager calls all required dispatcher methods"""
mock_dispatcher_manager = MagicMock()
mock_auth = MagicMock()
# Mock dispatcher methods that are actually called
mock_dispatcher_manager.dispatch_global_service.return_value = MagicMock()
mock_dispatcher_manager.dispatch_socket.return_value = MagicMock()
mock_dispatcher_manager.dispatch_flow_service.return_value = MagicMock()
mock_dispatcher_manager.dispatch_flow_import.return_value = MagicMock()
mock_dispatcher_manager.dispatch_flow_export.return_value = MagicMock()
mock_dispatcher_manager.dispatch_core_import.return_value = MagicMock()
mock_dispatcher_manager.dispatch_core_export.return_value = MagicMock()
EndpointManager(
dispatcher_manager=mock_dispatcher_manager,
auth=mock_auth,
prometheus_url="http://test:9090"
)
refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853) The gateway no longer holds any policy state — capability sets, role definitions, workspace scope rules. Per the IAM contract it asks the regime "may this identity perform this capability on this resource?" per request. That moves the OSS role-based regime entirely into iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing the gateway, the wire protocol, or backend services. Contract: - authenticate(credential) -> Identity (handle, workspace, principal_id, source). No roles, claims, or policy state surface to the gateway. - authorise(identity, capability, resource, parameters) -> (allow, ttl). Cached per-decision (regime TTL clamped above; fail-closed on regime errors). - authorise_many available as a fan-out variant. Operation registry drives every authorisation decision: - /api/v1/iam -> IamEndpoint, looks up bare op name (create-user, list-workspaces, ...). - /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op> (config:get, flow:list-blueprints, librarian:add-document, ...). - /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>. - /api/v1/flow/{flow}/{import,export}/{kind} -> flow-{import,export}:<kind>. - WS Mux per-frame -> flow-service:<kind>; closes a gap where authenticated users could hit any service kind. 85 operations registered across the surface. JWT carries identity only — sub + workspace. The roles claim is gone; the gateway never reads policy state from a credential. The three coarse *_KIND_CAPABILITY maps are removed. The registry is the only source of truth for the capability + resource shape of an operation. Tests migrated to the new Identity shape and to authorise()-mocked auth doubles. Specs updated: docs/tech-specs/iam-contract.md (Identity surface, caching, registry-naming conventions), iam.md (JWT shape, gateway flow, role section reframed as OSS-regime detail), iam-protocol.md (positioned as one implementation of the contract).
2026-04-28 16:19:41 +01:00
# Each dispatcher factory is invoked once per endpoint that
# needs a dedicated wire. dispatch_auth_iam is shared by
# two endpoints — AuthEndpoints (login / bootstrap /
# change-password) and IamEndpoint (registry-driven
# /api/v1/iam) — so it's expected to be called twice.
# Both forwarders pin the dispatcher to kind=iam and reuse
# the same factory; they're distinct from
# dispatch_global_service (the generic /api/v1/{kind} route).
mock_dispatcher_manager.dispatch_global_service.assert_called_once()
refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853) The gateway no longer holds any policy state — capability sets, role definitions, workspace scope rules. Per the IAM contract it asks the regime "may this identity perform this capability on this resource?" per request. That moves the OSS role-based regime entirely into iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing the gateway, the wire protocol, or backend services. Contract: - authenticate(credential) -> Identity (handle, workspace, principal_id, source). No roles, claims, or policy state surface to the gateway. - authorise(identity, capability, resource, parameters) -> (allow, ttl). Cached per-decision (regime TTL clamped above; fail-closed on regime errors). - authorise_many available as a fan-out variant. Operation registry drives every authorisation decision: - /api/v1/iam -> IamEndpoint, looks up bare op name (create-user, list-workspaces, ...). - /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op> (config:get, flow:list-blueprints, librarian:add-document, ...). - /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>. - /api/v1/flow/{flow}/{import,export}/{kind} -> flow-{import,export}:<kind>. - WS Mux per-frame -> flow-service:<kind>; closes a gap where authenticated users could hit any service kind. 85 operations registered across the surface. JWT carries identity only — sub + workspace. The roles claim is gone; the gateway never reads policy state from a credential. The three coarse *_KIND_CAPABILITY maps are removed. The registry is the only source of truth for the capability + resource shape of an operation. Tests migrated to the new Identity shape and to authorise()-mocked auth doubles. Specs updated: docs/tech-specs/iam-contract.md (Identity surface, caching, registry-naming conventions), iam.md (JWT shape, gateway flow, role section reframed as OSS-regime detail), iam-protocol.md (positioned as one implementation of the contract).
2026-04-28 16:19:41 +01:00
assert mock_dispatcher_manager.dispatch_auth_iam.call_count == 2
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849) Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed identity and authorisation model. The gateway no longer has an "allow-all" or "no auth" mode; every request is authenticated via the IAM service, authorised against a capability model that encodes both the operation and the workspace it targets, and rejected with a deliberately-uninformative 401 / 403 on any failure. IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam) ----------------------------------------------------------------------- * New backend service (iam-svc) owning users, workspaces, API keys, passwords and JWT signing keys in Cassandra. Reached over the standard pub/sub request/response pattern; gateway is the only caller. * Operations: bootstrap, resolve-api-key, login, get-signing-key-public, rotate-signing-key, create/list/get/update/disable/delete/enable-user, change-password, reset-password, create/list/get/update/disable- workspace, create/list/revoke-api-key. * Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and retires the previous one; validation is grace-period friendly. * Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt. * API keys: 128-bit random, SHA-256 hashed. Plaintext returned once. * Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a required startup argument with no permissive default. Masked "auth failure" errors hide whether a refused bootstrap request was due to mode, state, or authorisation. Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py) ------------------------------------------------------------------- * IamAuth replaces the legacy Authenticator. Distinguishes JWTs (three-segment dotted) from API keys by shape; verifies JWTs locally using the cached IAM public key; resolves API keys via IAM with a short-TTL hash-keyed cache. Every failure path surfaces the same 401 body ("auth failure") so callers cannot enumerate credential state. * Public key is fetched at gateway startup with a bounded retry loop; traffic does not begin flowing until auth has started. Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py) --------------------------------------------------------------------- * Roles have two dimensions: a capability set and a workspace scope. OSS ships reader / writer / admin; the first two are workspace- assigned, admin is cross-workspace ("*"). No "cross-workspace" pseudo-capability — workspace permission is a property of the role. * check(identity, capability, target_workspace=None) is the single authorisation test: some role must grant the capability *and* be active in the target workspace. * enforce_workspace validates a request-body workspace against the caller's role scopes and injects the resolved value. Cross- workspace admin is permitted by role scope, not by a bypass. * Gateway endpoints declare a required capability explicitly — no permissive default. Construction fails fast if omitted. Enterprise editions can replace the role table without changing the wire protocol. WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py) ---------------------------------------------------------------- * /api/v1/socket handshake unconditionally accepts; authentication runs on the first WebSocket frame ({"type":"auth","token":"..."}) with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}. The socket stays open on failure so the client can re-authenticate — browsers treat a handshake-time 401 as terminal, breaking reconnection. * Mux.receive rejects every non-auth frame before auth succeeds, enforces the caller's workspace (envelope + inner payload) using the role-scope resolver, and supports mid-session re-auth. * Flow import/export streaming endpoints keep the legacy ?token= handshake (URL-scoped short-lived transfers; no re-auth need). Auth surface ------------ * POST /api/v1/auth/login — public, returns a JWT. * POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap op which itself enforces mode + tables-empty. * POST /api/v1/auth/change-password — any authenticated user. * POST /api/v1/iam — admin-only generic forwarder for the rest of the IAM API (per-op REST endpoints to follow in a later change). Removed / breaking ------------------ * GATEWAY_SECRET / --api-token / default_api_token and the legacy Authenticator.permitted contract. The gateway cannot run without IAM. * ?token= on /api/v1/socket. * DispatcherManager and Mux both raise on auth=None — no silent downgrade path. CLI tools (trustgraph-cli) -------------------------- tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users, tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password, tg-reset-password, tg-create-api-key, tg-list-api-keys, tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords read via getpass; tokens / one-time secrets written to stdout with operator context on stderr so shell composition works cleanly. AsyncSocketClient / SocketClient updated to the first-frame auth protocol. Specifications -------------- * docs/tech-specs/iam.md updated with the error policy, workspace resolver extension point, and OSS role-scope model. * docs/tech-specs/iam-protocol.md (new) — transport, dataclasses, operation table, error taxonomy, bootstrap modes. * docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS role bundles, agent-as-composition note, enforcement-boundary policy, enterprise extensibility. Tests ----- * test_auth.py (rewritten) — IamAuth + JWT round-trip with real Ed25519 keypairs + API-key cache behaviour. * test_capabilities.py (new) — role table sanity, check across role x workspace combinations, enforce_workspace paths, unknown-cap / unknown-role fail-closed. * Every endpoint test construction now names its capability explicitly (no permissive defaults relied upon). New tests pin the fail-closed invariants: DispatcherManager / Mux refuse auth=None; i18n path-traversal defense is exercised. * test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00
mock_dispatcher_manager.dispatch_socket.assert_called_once()
mock_dispatcher_manager.dispatch_flow_service.assert_called_once()
mock_dispatcher_manager.dispatch_flow_import.assert_called_once()
mock_dispatcher_manager.dispatch_flow_export.assert_called_once()
mock_dispatcher_manager.dispatch_core_import.assert_called_once()
mock_dispatcher_manager.dispatch_core_export.assert_called_once()