mirror of https://github.com/trustgraph-ai/trustgraph.git
synced 2026-06-26 15:09:38 +02:00
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
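
The password and API-key handling described above can be sketched with the Python standard library alone. This is a minimal illustration, not the iam-svc code: the function names and the 16-byte salt length are assumptions; only the algorithm names, the 600k iteration count and the 128-bit key size come from the notes above.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

PBKDF2_ITERATIONS = 600_000  # iteration count stated in the commit message
SALT_BYTES = 16              # assumption: the salt length is not given in the source


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA-256."""
    if salt is None:
        salt = secrets.token_bytes(SALT_BYTES)  # fresh salt for each user
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected)


def new_api_key() -> Tuple[str, str]:
    """Return (plaintext, sha256_hex); only the hash would be persisted."""
    plaintext = secrets.token_hex(16)  # 16 random bytes = 128 bits
    digest = hashlib.sha256(plaintext.encode("ascii")).hexdigest()
    return plaintext, digest
```

Storing only the SHA-256 digest is what makes "plaintext returned once" hold: after creation the service can recognise a presented key but cannot reproduce it.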
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
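
A rough sketch of the two mechanisms named above: telling a JWT from an API key by shape, and a short-TTL cache keyed by a hash of the key rather than the key itself. The class name, the 30-second default TTL and the injectable clock are assumptions made for illustration; they are not taken from auth.py.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple

# (user_id, workspace, roles), mirroring the triple IAM resolves a key to
Identity = Tuple[str, str, Tuple[str, ...]]


def looks_like_jwt(token: str) -> bool:
    """A compact JWT is exactly three non-empty dot-separated segments."""
    parts = token.split(".")
    return len(parts) == 3 and all(parts)


class ApiKeyCache:
    """Short-TTL cache keyed by SHA-256 of the API key, never the plaintext."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds   # assumption: the real TTL is not stated
        self.clock = clock       # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Identity]] = {}

    @staticmethod
    def _digest(api_key: str) -> str:
        return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

    def get(self, api_key: str) -> Optional[Identity]:
        digest = self._digest(api_key)
        entry = self._entries.get(digest)
        if entry is None:
            return None
        expires, identity = entry
        if self.clock() >= expires:
            del self._entries[digest]  # expired: force a fresh IAM lookup
            return None
        return identity

    def put(self, api_key: str, identity: Identity) -> None:
        self._entries[self._digest(api_key)] = (self.clock() + self.ttl, identity)
```

A short TTL bounds how long a revoked key keeps working at the gateway, at the cost of one IAM round trip per key per TTL window.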
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
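
The role-scope rule above can be illustrated with a small self-contained sketch. Only the check(identity, capability, target_workspace=None) signature and the reader / writer / admin role names come from this commit; the capability strings, the Identity dataclass, resolve_workspace and the behaviour when target_workspace is None are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical capability names; the real vocabulary is in capabilities.md.
ROLE_CAPS: Dict[str, FrozenSet[str]] = {
    "reader": frozenset({"graph:read"}),
    "writer": frozenset({"graph:read", "graph:write"}),
    "admin": frozenset({"graph:read", "graph:write", "users:admin"}),
}
# admin is cross-workspace ("*"); the other roles apply where assigned.
CROSS_WORKSPACE_ROLES = frozenset({"admin"})


@dataclass(frozen=True)
class Identity:
    user_id: str
    workspace: str            # workspace the user's roles are assigned in
    roles: Tuple[str, ...]


def role_scope(identity: Identity, role: str) -> str:
    return "*" if role in CROSS_WORKSPACE_ROLES else identity.workspace


def check(identity: Identity, capability: str,
          target_workspace: Optional[str] = None) -> bool:
    """True if some role grants the capability and is active in the target."""
    for role in identity.roles:
        caps = ROLE_CAPS.get(role)
        if caps is None or capability not in caps:
            continue  # unknown role or missing capability: fail closed
        scope = role_scope(identity, role)
        # Assumption: with no target workspace, any granting role suffices.
        if target_workspace is None or scope in ("*", target_workspace):
            return True
    return False


def resolve_workspace(identity: Identity, capability: str,
                      requested: Optional[str]) -> str:
    """Hypothetical analogue of enforce_workspace: validate, then resolve."""
    target = requested or identity.workspace
    if not check(identity, capability, target):
        raise PermissionError("access denied")
    return target
```

Because cross-workspace access falls out of the admin role's "*" scope, there is no separate bypass branch to audit: every decision goes through the same loop.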
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth needed).
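
The first-frame protocol above amounts to a small state machine. The sketch below models it without any WebSocket library. The auth, auth-ok and auth-failed frame shapes are the ones listed above; the class name, the rejection frame for pre-auth traffic, and the choice to clear the session when a re-auth attempt fails are assumptions, not the behaviour of Mux.receive.

```python
import json
from typing import Callable, Optional

# Hypothetical rejection frame; the real wire format for this case is not
# given in the commit message.
REJECTED = {"type": "error", "error": "auth failure"}


class FirstFrameAuth:
    """Tracks whether a socket session has authenticated."""

    def __init__(self, authenticate: Callable[[str], Optional[str]]):
        # authenticate(token) returns a workspace name, or None on failure
        self.authenticate = authenticate
        self.workspace: Optional[str] = None

    def handle(self, raw: str) -> Optional[dict]:
        """Return a reply frame, or None if the frame may be dispatched."""
        try:
            frame = json.loads(raw)
        except ValueError:
            return dict(REJECTED)
        if not isinstance(frame, dict):
            return dict(REJECTED)

        if frame.get("type") == "auth":
            token = frame.get("token")
            # Assumption: a failed (re-)auth clears any earlier identity.
            self.workspace = (
                self.authenticate(token) if isinstance(token, str) else None
            )
            if self.workspace is None:
                return {"type": "auth-failed"}  # socket stays open for retry
            return {"type": "auth-ok", "workspace": self.workspace}

        if self.workspace is None:
            return dict(REJECTED)  # every non-auth frame is refused pre-auth
        return None  # authenticated: hand the frame to the dispatcher
```

Keeping the decision in a pure function of (state, frame) makes the pre-auth rejection and mid-session re-auth paths easy to unit-test without opening a socket.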
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
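
The stdout / stderr split mentioned above can be shown in a few lines. This is an illustrative helper, not code from trustgraph-cli; the function name and message text are made up.

```python
import sys
from typing import Optional, TextIO


def emit_secret(secret: str, context: str,
                out: Optional[TextIO] = None,
                err: Optional[TextIO] = None) -> None:
    """Write the secret alone to stdout and human context to stderr."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    print(context, file=err)   # visible to the operator, ignored by $(...)
    print(secret, file=out)    # the only thing a pipe or $(...) captures


if __name__ == "__main__":
    emit_secret("example-one-time-secret", "API key created; shown once:")
```

With this split, a shell capture such as KEY=$(some-tool) receives only the secret, while the explanatory line still reaches the terminal.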
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
parent ae9936c9cc
commit 67b2fc448f
61 changed files with 6474 additions and 792 deletions
279
trustgraph-base/trustgraph/base/iam_client.py
Normal file
@ -0,0 +1,279 @@
from . request_response_spec import RequestResponse, RequestResponseSpec
from .. schema import (
    IamRequest, IamResponse,
    UserInput, WorkspaceInput, ApiKeyInput,
)

IAM_TIMEOUT = 10


class IamClient(RequestResponse):
    """Client for the IAM service request/response pub/sub protocol.

    Mirrors ``ConfigClient``: a thin wrapper around ``RequestResponse``
    that knows the IAM request / response schemas. Only the subset of
    operations actually implemented by the server today has helper
    methods here; callers that need an unimplemented operation can
    build ``IamRequest`` and call ``request()`` directly.
    """

    async def _request(self, timeout=IAM_TIMEOUT, **kwargs):
        resp = await self.request(
            IamRequest(**kwargs),
            timeout=timeout,
        )
        if resp.error:
            raise RuntimeError(
                f"{resp.error.type}: {resp.error.message}"
            )
        return resp

    async def bootstrap(self, timeout=IAM_TIMEOUT):
        """Initial-run IAM self-seed. Returns a tuple of
        ``(admin_user_id, admin_api_key_plaintext)``. Both are empty
        strings on repeat calls — the operation is a no-op once the
        IAM tables are populated."""
        resp = await self._request(
            operation="bootstrap", timeout=timeout,
        )
        return resp.bootstrap_admin_user_id, resp.bootstrap_admin_api_key

    async def resolve_api_key(self, api_key, timeout=IAM_TIMEOUT):
        """Resolve a plaintext API key to its identity triple.

        Returns ``(user_id, workspace, roles)`` or raises
        ``RuntimeError`` with error type ``auth-failed`` if the key is
        unknown / expired / revoked."""
        resp = await self._request(
            operation="resolve-api-key",
            api_key=api_key,
            timeout=timeout,
        )
        return (
            resp.resolved_user_id,
            resp.resolved_workspace,
            list(resp.resolved_roles),
        )

    async def create_user(self, workspace, user, actor="",
                          timeout=IAM_TIMEOUT):
        """Create a user. ``user`` is a ``UserInput``."""
        resp = await self._request(
            operation="create-user",
            workspace=workspace,
            actor=actor,
            user=user,
            timeout=timeout,
        )
        return resp.user

    async def list_users(self, workspace, actor="", timeout=IAM_TIMEOUT):
        resp = await self._request(
            operation="list-users",
            workspace=workspace,
            actor=actor,
            timeout=timeout,
        )
        return list(resp.users)

    async def create_api_key(self, workspace, key, actor="",
                             timeout=IAM_TIMEOUT):
        """Create an API key. ``key`` is an ``ApiKeyInput``. Returns
        ``(plaintext, record)`` — plaintext is returned once and the
        caller is responsible for surfacing it to the operator."""
        resp = await self._request(
            operation="create-api-key",
            workspace=workspace,
            actor=actor,
            key=key,
            timeout=timeout,
        )
        return resp.api_key_plaintext, resp.api_key

    async def list_api_keys(self, workspace, user_id, actor="",
                            timeout=IAM_TIMEOUT):
        resp = await self._request(
            operation="list-api-keys",
            workspace=workspace,
            actor=actor,
            user_id=user_id,
            timeout=timeout,
        )
        return list(resp.api_keys)

    async def revoke_api_key(self, workspace, key_id, actor="",
                             timeout=IAM_TIMEOUT):
        await self._request(
            operation="revoke-api-key",
            workspace=workspace,
            actor=actor,
            key_id=key_id,
            timeout=timeout,
        )

    async def login(self, username, password, workspace="",
                    timeout=IAM_TIMEOUT):
        """Validate credentials and return ``(jwt, expires_iso)``.
        ``workspace`` is optional; defaults at the server to the
        OSS default workspace."""
        resp = await self._request(
            operation="login",
            workspace=workspace,
            username=username,
            password=password,
            timeout=timeout,
        )
        return resp.jwt, resp.jwt_expires

    async def get_signing_key_public(self, timeout=IAM_TIMEOUT):
        """Return the active JWT signing public key in PEM. The
        gateway calls this at startup and caches the result."""
        resp = await self._request(
            operation="get-signing-key-public",
            timeout=timeout,
        )
        return resp.signing_key_public

    async def change_password(self, user_id, current_password,
                              new_password, timeout=IAM_TIMEOUT):
        await self._request(
            operation="change-password",
            user_id=user_id,
            password=current_password,
            new_password=new_password,
            timeout=timeout,
        )

    async def reset_password(self, workspace, user_id, actor="",
                             timeout=IAM_TIMEOUT):
        """Admin-driven password reset. Returns the plaintext
        temporary password (returned once)."""
        resp = await self._request(
            operation="reset-password",
            workspace=workspace,
            actor=actor,
            user_id=user_id,
            timeout=timeout,
        )
        return resp.temporary_password

    async def get_user(self, workspace, user_id, actor="",
                       timeout=IAM_TIMEOUT):
        resp = await self._request(
            operation="get-user",
            workspace=workspace,
            actor=actor,
            user_id=user_id,
            timeout=timeout,
        )
        return resp.user

    async def update_user(self, workspace, user_id, user, actor="",
                          timeout=IAM_TIMEOUT):
        resp = await self._request(
            operation="update-user",
            workspace=workspace,
            actor=actor,
            user_id=user_id,
            user=user,
            timeout=timeout,
        )
        return resp.user

    async def disable_user(self, workspace, user_id, actor="",
                           timeout=IAM_TIMEOUT):
        await self._request(
            operation="disable-user",
            workspace=workspace,
            actor=actor,
            user_id=user_id,
            timeout=timeout,
        )

    async def enable_user(self, workspace, user_id, actor="",
                          timeout=IAM_TIMEOUT):
        await self._request(
            operation="enable-user",
            workspace=workspace,
            actor=actor,
            user_id=user_id,
            timeout=timeout,
        )

    async def delete_user(self, workspace, user_id, actor="",
                          timeout=IAM_TIMEOUT):
        await self._request(
            operation="delete-user",
            workspace=workspace,
            actor=actor,
            user_id=user_id,
            timeout=timeout,
        )

    async def create_workspace(self, workspace_record, actor="",
                               timeout=IAM_TIMEOUT):
        resp = await self._request(
            operation="create-workspace",
            actor=actor,
            workspace_record=workspace_record,
            timeout=timeout,
        )
        return resp.workspace

    async def list_workspaces(self, actor="", timeout=IAM_TIMEOUT):
        resp = await self._request(
            operation="list-workspaces",
            actor=actor,
            timeout=timeout,
        )
        return list(resp.workspaces)

    async def get_workspace(self, workspace_id, actor="",
                            timeout=IAM_TIMEOUT):
        from ..schema import WorkspaceInput
        resp = await self._request(
            operation="get-workspace",
            actor=actor,
            workspace_record=WorkspaceInput(id=workspace_id),
            timeout=timeout,
        )
        return resp.workspace

    async def update_workspace(self, workspace_record, actor="",
                               timeout=IAM_TIMEOUT):
        resp = await self._request(
            operation="update-workspace",
            actor=actor,
            workspace_record=workspace_record,
            timeout=timeout,
        )
        return resp.workspace

    async def disable_workspace(self, workspace_id, actor="",
                                timeout=IAM_TIMEOUT):
        from ..schema import WorkspaceInput
        await self._request(
            operation="disable-workspace",
            actor=actor,
            workspace_record=WorkspaceInput(id=workspace_id),
            timeout=timeout,
        )

    async def rotate_signing_key(self, actor="", timeout=IAM_TIMEOUT):
        await self._request(
            operation="rotate-signing-key",
            actor=actor,
            timeout=timeout,
        )


class IamClientSpec(RequestResponseSpec):
    def __init__(self, request_name, response_name):
        super().__init__(
            request_name=request_name,
            request_schema=IamRequest,
            response_name=response_name,
            response_schema=IamResponse,
            impl=IamClient,
        )
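
Because _request turns any error response into a RuntimeError whose message starts with the error type, callers have to inspect that prefix to tell an authentication failure from other faults. The sketch below shows one way to do so against a stand-in client, so it runs without the pub/sub transport. FakeIamClient, FakeResponse, FakeError and try_login are hypothetical, and it is an assumption that login reports the same auth-failed error type that the resolve_api_key docstring names.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeError:
    type: str
    message: str


@dataclass
class FakeResponse:
    error: Optional[FakeError] = None
    jwt: str = ""
    jwt_expires: str = ""


class FakeIamClient:
    """Stand-in that copies IamClient's _request error handling."""

    def __init__(self, canned: Dict[str, FakeResponse]):
        self.canned = canned  # operation name -> canned response

    async def request(self, req: dict, timeout: float) -> FakeResponse:
        return self.canned[req["operation"]]

    async def _request(self, timeout=10, **kwargs):
        resp = await self.request(kwargs, timeout=timeout)
        if resp.error:
            raise RuntimeError(f"{resp.error.type}: {resp.error.message}")
        return resp

    async def login(self, username, password, workspace="", timeout=10):
        resp = await self._request(
            operation="login", workspace=workspace,
            username=username, password=password, timeout=timeout,
        )
        return resp.jwt, resp.jwt_expires


async def try_login(client, username: str, password: str) -> Optional[str]:
    """Return a JWT, or None on auth failure; re-raise anything else."""
    try:
        jwt, _expires = await client.login(username, password)
        return jwt
    except RuntimeError as exc:
        if str(exc).startswith("auth-failed:"):
            return None
        raise
```

Matching on a string prefix is fragile; a dedicated exception type carrying the error type as an attribute would be a sturdier contract, but that would be a change to the client rather than a description of it.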