feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)

Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model.  The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.

IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
  passwords and JWT signing keys in Cassandra.  Reached over the
  standard pub/sub request/response pattern; gateway is the only
  caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
  rotate-signing-key, create/list/get/update/disable/delete/enable-user,
  change-password, reset-password, create/list/get/update/disable-
  workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA).  Key rotation writes a new kid and
  retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed.  Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
  required startup argument with no permissive default.  Masked
  "auth failure" errors hide whether a refused bootstrap request was
  due to mode, state, or authorisation.

Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator.  Distinguishes JWTs
  (three-segment dotted) from API keys by shape; verifies JWTs
  locally using the cached IAM public key; resolves API keys via
  IAM with a short-TTL hash-keyed cache.  Every failure path
  surfaces the same 401 body ("auth failure") so callers cannot
  enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
  traffic does not begin flowing until auth has started.

Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
  OSS ships reader / writer / admin; the first two are workspace-
  assigned, admin is cross-workspace ("*").  No "cross-workspace"
  pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
  authorisation test: some role must grant the capability *and* be
  active in the target workspace.
* enforce_workspace validates a request-body workspace against the
  caller's role scopes and injects the resolved value.  Cross-
  workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
  permissive default.  Construction fails fast if omitted.  Enterprise
  editions can replace the role table without changing the wire
  protocol.

WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
  runs on the first WebSocket frame ({"type":"auth","token":"..."})
  with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
  The socket stays open on failure so the client can re-authenticate
  — browsers treat a handshake-time 401 as terminal, breaking
  reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
  enforces the caller's workspace (envelope + inner payload) using
  the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
  handshake (URL-scoped short-lived transfers; no re-auth need).

Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
  op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
  the IAM API (per-op REST endpoints to follow in a later change).

Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
  Authenticator.permitted contract.  The gateway cannot run without
  IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
  downgrade path.

CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces.  Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.

Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
  resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
  operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
  role bundles, agent-as-composition note, enforcement-boundary
  policy, enterprise extensibility.

Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
  Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
  role x workspace combinations, enforce_workspace paths,
  unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
  explicitly (no permissive defaults relied upon).  New tests pin
  the fail-closed invariants: DispatcherManager / Mux refuse
  auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
This commit is contained in:
cybermaggedon 2026-04-24 17:29:10 +01:00 committed by GitHub
parent ae9936c9cc
commit 67b2fc448f
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
61 changed files with 6474 additions and 792 deletions

View file

@ -15,6 +15,7 @@ from .translators.library import LibraryRequestTranslator, LibraryResponseTransl
from .translators.document_loading import DocumentTranslator, TextDocumentTranslator
from .translators.config import ConfigRequestTranslator, ConfigResponseTranslator
from .translators.flow import FlowRequestTranslator, FlowResponseTranslator
from .translators.iam import IamRequestTranslator, IamResponseTranslator
from .translators.prompt import PromptRequestTranslator, PromptResponseTranslator
from .translators.tool import ToolRequestTranslator, ToolResponseTranslator
from .translators.embeddings_query import (
@ -85,11 +86,17 @@ TranslatorRegistry.register_service(
)
TranslatorRegistry.register_service(
"flow",
FlowRequestTranslator(),
"flow",
FlowRequestTranslator(),
FlowResponseTranslator()
)
TranslatorRegistry.register_service(
"iam",
IamRequestTranslator(),
IamResponseTranslator()
)
TranslatorRegistry.register_service(
"prompt",
PromptRequestTranslator(),

View file

@ -0,0 +1,194 @@
from typing import Dict, Any, Tuple
from ...schema import IamRequest, IamResponse
from ...schema import (
UserInput, UserRecord,
WorkspaceInput, WorkspaceRecord,
ApiKeyInput, ApiKeyRecord,
)
from .base import MessageTranslator
def _user_input_from_dict(d):
if d is None:
return None
return UserInput(
username=d.get("username", ""),
name=d.get("name", ""),
email=d.get("email", ""),
password=d.get("password", ""),
roles=list(d.get("roles", [])),
enabled=d.get("enabled", True),
must_change_password=d.get("must_change_password", False),
)
def _workspace_input_from_dict(d):
if d is None:
return None
return WorkspaceInput(
id=d.get("id", ""),
name=d.get("name", ""),
enabled=d.get("enabled", True),
)
def _api_key_input_from_dict(d):
if d is None:
return None
return ApiKeyInput(
user_id=d.get("user_id", ""),
name=d.get("name", ""),
expires=d.get("expires", ""),
)
def _user_record_to_dict(r):
if r is None:
return None
return {
"id": r.id,
"workspace": r.workspace,
"username": r.username,
"name": r.name,
"email": r.email,
"roles": list(r.roles),
"enabled": r.enabled,
"must_change_password": r.must_change_password,
"created": r.created,
}
def _workspace_record_to_dict(r):
if r is None:
return None
return {
"id": r.id,
"name": r.name,
"enabled": r.enabled,
"created": r.created,
}
def _api_key_record_to_dict(r):
if r is None:
return None
return {
"id": r.id,
"user_id": r.user_id,
"name": r.name,
"prefix": r.prefix,
"expires": r.expires,
"created": r.created,
"last_used": r.last_used,
}
class IamRequestTranslator(MessageTranslator):
def decode(self, data: Dict[str, Any]) -> IamRequest:
return IamRequest(
operation=data.get("operation", ""),
workspace=data.get("workspace", ""),
actor=data.get("actor", ""),
user_id=data.get("user_id", ""),
username=data.get("username", ""),
key_id=data.get("key_id", ""),
api_key=data.get("api_key", ""),
password=data.get("password", ""),
new_password=data.get("new_password", ""),
user=_user_input_from_dict(data.get("user")),
workspace_record=_workspace_input_from_dict(
data.get("workspace_record")
),
key=_api_key_input_from_dict(data.get("key")),
)
def encode(self, obj: IamRequest) -> Dict[str, Any]:
result = {"operation": obj.operation}
for fname in (
"workspace", "actor", "user_id", "username", "key_id",
"api_key", "password", "new_password",
):
v = getattr(obj, fname, "")
if v:
result[fname] = v
if obj.user is not None:
result["user"] = {
"username": obj.user.username,
"name": obj.user.name,
"email": obj.user.email,
"password": obj.user.password,
"roles": list(obj.user.roles),
"enabled": obj.user.enabled,
"must_change_password": obj.user.must_change_password,
}
if obj.workspace_record is not None:
result["workspace_record"] = {
"id": obj.workspace_record.id,
"name": obj.workspace_record.name,
"enabled": obj.workspace_record.enabled,
}
if obj.key is not None:
result["key"] = {
"user_id": obj.key.user_id,
"name": obj.key.name,
"expires": obj.key.expires,
}
return result
class IamResponseTranslator(MessageTranslator):
def decode(self, data: Dict[str, Any]) -> IamResponse:
raise NotImplementedError(
"IamResponse is a server-produced message; no HTTP→schema "
"path is needed"
)
def encode(self, obj: IamResponse) -> Dict[str, Any]:
result: Dict[str, Any] = {}
if obj.user is not None:
result["user"] = _user_record_to_dict(obj.user)
if obj.users:
result["users"] = [_user_record_to_dict(u) for u in obj.users]
if obj.workspace is not None:
result["workspace"] = _workspace_record_to_dict(obj.workspace)
if obj.workspaces:
result["workspaces"] = [
_workspace_record_to_dict(w) for w in obj.workspaces
]
if obj.api_key_plaintext:
result["api_key_plaintext"] = obj.api_key_plaintext
if obj.api_key is not None:
result["api_key"] = _api_key_record_to_dict(obj.api_key)
if obj.api_keys:
result["api_keys"] = [
_api_key_record_to_dict(k) for k in obj.api_keys
]
if obj.jwt:
result["jwt"] = obj.jwt
if obj.jwt_expires:
result["jwt_expires"] = obj.jwt_expires
if obj.signing_key_public:
result["signing_key_public"] = obj.signing_key_public
if obj.resolved_user_id:
result["resolved_user_id"] = obj.resolved_user_id
if obj.resolved_workspace:
result["resolved_workspace"] = obj.resolved_workspace
if obj.resolved_roles:
result["resolved_roles"] = list(obj.resolved_roles)
if obj.temporary_password:
result["temporary_password"] = obj.temporary_password
if obj.bootstrap_admin_user_id:
result["bootstrap_admin_user_id"] = obj.bootstrap_admin_user_id
if obj.bootstrap_admin_api_key:
result["bootstrap_admin_api_key"] = obj.bootstrap_admin_api_key
return result
def encode_with_completion(
self, obj: IamResponse,
) -> Tuple[Dict[str, Any], bool]:
return self.encode(obj), True