feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)

Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model.  The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.

IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
  passwords and JWT signing keys in Cassandra.  Reached over the
  standard pub/sub request/response pattern; gateway is the only
  caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
  rotate-signing-key, create/list/get/update/disable/delete/enable-user,
  change-password, reset-password, create/list/get/update/disable-
  workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA).  Key rotation writes a new kid and
  retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed.  Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
  required startup argument with no permissive default.  Masked
  "auth failure" errors hide whether a refused bootstrap request was
  due to mode, state, or authorisation.
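
A minimal sketch of the credential-storage parameters listed above,
using only the Python standard library.  It is not the iam-svc code:
the function names, the 16-byte salt length and the hex encoding of
the key body are assumptions for illustration; the "tg_" prefix is
taken from the gateway's token discrimination.

```python
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 600_000  # iteration count stated in the commit message


def hash_password(password, salt=None, iterations=PBKDF2_ITERATIONS):
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA-256."""
    if salt is None:
        salt = secrets.token_bytes(16)  # per-user salt; length is an assumption
    dk = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations,
    )
    return salt, dk


def verify_password(password, salt, expected, iterations=PBKDF2_ITERATIONS):
    """Recompute the derived key and compare in constant time."""
    _, dk = hash_password(password, salt, iterations)
    return hmac.compare_digest(dk, expected)


def new_api_key():
    """Return (plaintext, sha256_hex); only the hash would be stored."""
    # 16 random bytes = 128 bits; hex encoding is an assumption.
    plaintext = "tg_" + secrets.token_hex(16)
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, digest
```

The plaintext key is handed back to the caller once; later lookups
only ever need the SHA-256 digest.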

Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator.  Distinguishes API keys
  ("tg_" prefix) from JWTs (three dot-separated segments) by shape;
  verifies JWTs locally using the cached IAM public key; resolves
  API keys via IAM with a short-TTL hash-keyed cache.  Every failure
  path surfaces the same 401 body ("auth failure") so callers cannot
  enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
  traffic does not begin flowing until auth has started.
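
The shape test and the short-TTL hash-keyed cache described above can
be sketched roughly as follows.  The names classify and KeyCache are
hypothetical, and the injectable clock exists only so that expiry can
be exercised without sleeping; the real IamAuth keeps its cache in a
plain dict behind an asyncio lock.

```python
import hashlib
import time

API_KEY_CACHE_TTL = 60  # seconds, matching the constant in auth.py


def classify(token):
    """Return "api-key", "jwt", or None for an unrecognised shape."""
    if token.startswith("tg_"):
        return "api-key"
    if token.count(".") == 2:
        return "jwt"
    return None


class KeyCache:
    """Hash-keyed cache: sha256(plaintext) -> (identity, expires_ts)."""

    def __init__(self, ttl=API_KEY_CACHE_TTL, clock=time.time):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested
        self.entries = {}

    @staticmethod
    def _key(plaintext):
        # The plaintext key is never used as a dict key, only its digest.
        return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()

    def get(self, plaintext):
        entry = self.entries.get(self._key(plaintext))
        if entry and entry[1] > self.clock():
            return entry[0]
        return None  # missing or expired: caller must ask IAM again

    def put(self, plaintext, identity):
        expires = self.clock() + self.ttl
        self.entries[self._key(plaintext)] = (identity, expires)
```

Because entries simply age out, a revoked key stops resolving within
one TTL window without any push from the IAM service.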

Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
  OSS ships reader / writer / admin; the first two are workspace-
  assigned, admin is cross-workspace ("*").  No "cross-workspace"
  pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
  authorisation test: some role must grant the capability *and* be
  active in the target workspace.
* enforce_workspace validates a request-body workspace against the
  caller's role scopes and injects the resolved value.  Cross-
  workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
  permissive default.  Construction fails fast if omitted.  Enterprise
  editions can replace the role table without changing the wire
  protocol.
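
As an illustration of the two-dimensional role model, the sketch
below reimplements check() against a deliberately trimmed role table.
The table contents and the Identity dataclass here are simplified
assumptions; the authoritative table is the one in capabilities.py.

```python
from dataclasses import dataclass, field

# Trimmed-down role table for illustration only.
ROLES = {
    "reader": {
        "capabilities": {"graph:read"},
        "workspace_scope": "assigned",
    },
    "writer": {
        "capabilities": {"graph:read", "graph:write"},
        "workspace_scope": "assigned",
    },
    "admin": {
        "capabilities": {"graph:read", "graph:write", "users:admin"},
        "workspace_scope": "*",
    },
}
KNOWN = set().union(*(r["capabilities"] for r in ROLES.values()))


@dataclass
class Identity:
    user_id: str
    workspace: str
    roles: list = field(default_factory=list)


def check(identity, capability, target_workspace=None):
    """True iff some role grants the capability AND covers the target."""
    if capability not in KNOWN:
        return False  # fail closed on an unknown capability
    target = target_workspace or identity.workspace
    for name in identity.roles:
        role = ROLES.get(name)
        if role is None or capability not in role["capabilities"]:
            continue  # unknown role, or role lacks the capability
        scope = role["workspace_scope"]
        if scope == "*":
            return True
        if scope == "assigned" and target == identity.workspace:
            return True
    return False
```

A writer assigned to one workspace passes check() there and fails it
for any other workspace, while an admin passes everywhere purely
because of the role's "*" scope.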

WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
  runs on the first WebSocket frame ({"type":"auth","token":"..."})
  with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
  The socket stays open on failure so the client can re-authenticate
  — browsers treat a handshake-time 401 as terminal, breaking
  reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
  enforces the caller's workspace (envelope + inner payload) using
  the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
  handshake (URL-scoped short-lived transfers; no re-auth need).
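
The first-frame protocol amounts to a small per-connection state
machine.  The sketch below models it without any network code;
SocketSession and demo_authenticate are hypothetical names, and the
real Mux sends its replies over the WebSocket rather than returning
them.

```python
class SocketSession:
    """Per-connection state for the first-frame auth protocol."""

    def __init__(self, authenticate):
        # authenticate(token) -> workspace name, or raises on any failure.
        self.authenticate = authenticate
        self.workspace = None  # None means "not yet authenticated"

    def handle(self, frame):
        """Return a reply frame, or None if the frame may be dispatched."""
        if frame.get("type") == "auth":
            try:
                workspace = self.authenticate(frame.get("token", ""))
            except Exception:
                # Masked failure; the connection stays usable for a retry.
                return {"type": "auth-failed", "error": "auth failure"}
            self.workspace = workspace
            return {"type": "auth-ok", "workspace": workspace}
        if self.workspace is None:
            # Every non-auth frame is refused until auth has succeeded.
            return {
                "id": frame.get("id"),
                "error": {"message": "auth failure", "type": "auth-required"},
                "complete": True,
            }
        return None


def demo_authenticate(token):
    # Hypothetical stand-in for IamAuth: accepts exactly one token.
    if token != "good-token":
        raise ValueError("bad token")
    return "ws-a"
```

A failed auth frame leaves the session state untouched, so a client
can retry, or re-authenticate mid-session, on the same connection.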

Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
  op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
  the IAM API (per-op REST endpoints to follow in a later change).
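
From a client's point of view the login flow could look like the
sketch below, which only builds the request and does not send it.
The helper names and the base URL used in the usage note are
assumptions; the body fields follow the login handler's docstring.

```python
import json
import urllib.request


def build_login_request(base_url, username, password, workspace=""):
    """Build (but do not send) the POST for /api/v1/auth/login."""
    body = {"username": username, "password": password}
    if workspace:
        body["workspace"] = workspace  # optional in the handler
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/auth/login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_headers(jwt):
    """Headers for a follow-up authenticated call."""
    return {"Authorization": f"Bearer {jwt}"}
```

Passing the request to urllib.request.urlopen against a running
gateway (for example at a hypothetical http://localhost:8088) should
yield a JSON body whose jwt field can then be fed to bearer_headers.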

Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
  Authenticator.permitted contract.  The gateway cannot run without
  IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
  downgrade path.

CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces.  Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
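
The stdout / stderr split mentioned above can be sketched as follows.
emit_secret is a hypothetical helper, not a function from
trustgraph-cli; the stream arguments exist only so the behaviour can
be tested.

```python
import sys


def emit_secret(secret, context, out=None, err=None):
    """Write the secret alone to stdout and human context to stderr."""
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr
    print(context, file=err)  # operator-facing message, not captured by $(...)
    print(secret, file=out)   # machine-readable value, one line
```

With this split, shell command substitution captures only the secret
while the operator still sees the explanatory message on the terminal.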

Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
  resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
  operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
  role bundles, agent-as-composition note, enforcement-boundary
  policy, enterprise extensibility.

Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
  Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
  role x workspace combinations, enforce_workspace paths,
  unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
  explicitly (no permissive defaults relied upon).  New tests pin
  the fail-closed invariants: DispatcherManager / Mux refuse
  auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
cybermaggedon 2026-04-24 17:29:10 +01:00 committed by GitHub
parent ae9936c9cc
commit 67b2fc448f
GPG key ID: B5690EEEBB952194
61 changed files with 6474 additions and 792 deletions

@ -63,6 +63,7 @@ chunker-token = "trustgraph.chunking.token:run"
bootstrap = "trustgraph.bootstrap.bootstrapper:run"
config-svc = "trustgraph.config.service:run"
flow-svc = "trustgraph.flow.service:run"
iam-svc = "trustgraph.iam.service:run"
doc-embeddings-query-milvus = "trustgraph.query.doc_embeddings.milvus:run"
doc-embeddings-query-pinecone = "trustgraph.query.doc_embeddings.pinecone:run"
doc-embeddings-query-qdrant = "trustgraph.query.doc_embeddings.qdrant:run"

@ -1,22 +1,264 @@
"""
IAM-backed authentication for the API gateway.
Replaces the legacy GATEWAY_SECRET shared-token Authenticator. The
gateway is now stateless with respect to credentials: it either
verifies a JWT locally using the active IAM signing public key, or
resolves an API key by hash with a short local cache backed by the
IAM service.
Identity returned by authenticate() is the (user_id, workspace,
roles) triple that the rest of the gateway (capability checks,
workspace resolver, audit logging) needs.
"""
import asyncio
import base64
import hashlib
import json
import logging
import time
import uuid
from dataclasses import dataclass
from aiohttp import web
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ed25519
from ..base.iam_client import IamClient
from ..base.metrics import ProducerMetrics, SubscriberMetrics
from ..schema import (
IamRequest, IamResponse,
iam_request_queue, iam_response_queue,
)
logger = logging.getLogger("auth")
API_KEY_CACHE_TTL = 60 # seconds
-class Authenticator:
-def __init__(self, token=None, allow_all=False):
-if not allow_all and token is None:
-raise RuntimeError("Need a token")
-if not allow_all and token == "":
-raise RuntimeError("Need a token")
-self.token = token
-self.allow_all = allow_all
-def permitted(self, token, roles):
-if self.allow_all: return True
-if self.token != token: return False
-return True
@dataclass
class Identity:
user_id: str
workspace: str
roles: list
source: str # "api-key" | "jwt"
def _auth_failure():
return web.HTTPUnauthorized(
text='{"error":"auth failure"}',
content_type="application/json",
)
def _access_denied():
return web.HTTPForbidden(
text='{"error":"access denied"}',
content_type="application/json",
)
def _b64url_decode(s):
pad = "=" * (-len(s) % 4)
return base64.urlsafe_b64decode(s + pad)
def _verify_jwt_eddsa(token, public_pem):
"""Verify an Ed25519 JWT and return its claims. Raises on any
validation failure. Refuses non-EdDSA algorithms."""
parts = token.split(".")
if len(parts) != 3:
raise ValueError("malformed JWT")
h_b64, p_b64, s_b64 = parts
signing_input = f"{h_b64}.{p_b64}".encode("ascii")
header = json.loads(_b64url_decode(h_b64))
if header.get("alg") != "EdDSA":
raise ValueError(f"unsupported alg: {header.get('alg')!r}")
key = serialization.load_pem_public_key(public_pem.encode("ascii"))
if not isinstance(key, ed25519.Ed25519PublicKey):
raise ValueError("public key is not Ed25519")
signature = _b64url_decode(s_b64)
key.verify(signature, signing_input) # raises InvalidSignature
claims = json.loads(_b64url_decode(p_b64))
exp = claims.get("exp")
if exp is None or exp < time.time():
raise ValueError("expired")
return claims
class IamAuth:
"""Resolves bearer credentials via the IAM service.
Used by every gateway endpoint that needs authentication. Fetches
the IAM signing public key at startup (cached in memory). API
keys are resolved via the IAM service with a local hash -> identity
cache (short TTL so revoked keys stop working within the TTL
window without any push mechanism)."""
def __init__(self, backend, id="api-gateway"):
self.backend = backend
self.id = id
# Populated at start() via IAM.
self._signing_public_pem = None
# API-key cache: plaintext_sha256_hex -> (Identity, expires_ts)
self._key_cache = {}
self._key_cache_lock = asyncio.Lock()
# ------------------------------------------------------------------
# Short-lived client helper. Mirrors the pattern used by the
# bootstrap framework and AsyncProcessor: a fresh uuid suffix per
# invocation so Pulsar exclusive subscriptions don't collide with
# ghosts from prior calls.
# ------------------------------------------------------------------
def _make_client(self):
rr_id = str(uuid.uuid4())
return IamClient(
backend=self.backend,
subscription=f"{self.id}--iam--{rr_id}",
consumer_name=self.id,
request_topic=iam_request_queue,
request_schema=IamRequest,
request_metrics=ProducerMetrics(
processor=self.id, flow=None, name="iam-request",
),
response_topic=iam_response_queue,
response_schema=IamResponse,
response_metrics=SubscriberMetrics(
processor=self.id, flow=None, name="iam-response",
),
)
async def _with_client(self, op):
"""Open a short-lived IamClient, run ``op(client)``, close."""
client = self._make_client()
await client.start()
try:
return await op(client)
finally:
try:
await client.stop()
except Exception:
pass
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
async def start(self, max_retries=30, retry_delay=2.0):
"""Fetch the signing public key from IAM. Retries on
failure, since the gateway may be starting before IAM is ready."""
async def _fetch(client):
return await client.get_signing_key_public()
for attempt in range(max_retries):
try:
pem = await self._with_client(_fetch)
if pem:
self._signing_public_pem = pem
logger.info(
"IamAuth: fetched IAM signing public key "
f"({len(pem)} bytes)"
)
return
except Exception as e:
logger.info(
f"IamAuth: waiting for IAM signing key "
f"({type(e).__name__}: {e}); "
f"retry {attempt + 1}/{max_retries}"
)
await asyncio.sleep(retry_delay)
# Don't prevent startup forever. A later authenticate() call
# will try again via the JWT path.
logger.warning(
"IamAuth: could not fetch IAM signing key at startup; "
"JWT validation will fail until it's available"
)
# ------------------------------------------------------------------
# Authentication
# ------------------------------------------------------------------
async def authenticate(self, request):
"""Extract and validate the Bearer credential from an HTTP
request. Returns an ``Identity``. Raises HTTPUnauthorized
(401 / "auth failure") on any failure mode; the caller
cannot distinguish missing / malformed / invalid / expired /
revoked credentials."""
header = request.headers.get("Authorization", "")
if not header.startswith("Bearer "):
raise _auth_failure()
token = header[len("Bearer "):].strip()
if not token:
raise _auth_failure()
# API keys always start with "tg_". JWTs have two dots and
# no "tg_" prefix. Discriminate cheaply.
if token.startswith("tg_"):
return await self._resolve_api_key(token)
if token.count(".") == 2:
return self._verify_jwt(token)
raise _auth_failure()
def _verify_jwt(self, token):
if not self._signing_public_pem:
raise _auth_failure()
try:
claims = _verify_jwt_eddsa(token, self._signing_public_pem)
except Exception as e:
logger.debug(f"JWT validation failed: {type(e).__name__}: {e}")
raise _auth_failure()
sub = claims.get("sub", "")
ws = claims.get("workspace", "")
roles = list(claims.get("roles", []))
if not sub or not ws:
raise _auth_failure()
return Identity(
user_id=sub, workspace=ws, roles=roles, source="jwt",
)
async def _resolve_api_key(self, plaintext):
h = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
cached = self._key_cache.get(h)
now = time.time()
if cached and cached[1] > now:
return cached[0]
async with self._key_cache_lock:
cached = self._key_cache.get(h)
if cached and cached[1] > now:
return cached[0]
try:
async def _call(client):
return await client.resolve_api_key(plaintext)
user_id, workspace, roles = await self._with_client(_call)
except Exception as e:
logger.debug(
f"API key resolution failed: "
f"{type(e).__name__}: {e}"
)
raise _auth_failure()
if not user_id or not workspace:
raise _auth_failure()
identity = Identity(
user_id=user_id, workspace=workspace,
roles=list(roles), source="api-key",
)
self._key_cache[h] = (identity, now + API_KEY_CACHE_TTL)
return identity

@ -0,0 +1,238 @@
"""
Capability vocabulary, role definitions, and authorisation helpers.
See docs/tech-specs/capabilities.md for the authoritative description.
The data here is the OSS bundle table in that spec. Enterprise
editions may replace this module with their own role table; the
vocabulary (capability strings) is shared.
Role model
----------
A role has two dimensions:
1. **capability set**: which operations the role grants.
2. **workspace scope**: which workspaces the role is active in.
The authorisation question is: *given the caller's roles, a required
capability, and a target workspace, does any role grant the
capability AND apply to the target workspace?*
Workspace scope values recognised here:
- ``"assigned"``: the role applies only to the caller's own
assigned workspace (stored on their user record).
- ``"*"``: the role applies to every workspace.
Enterprise editions can add richer scopes (explicit permitted-set,
patterns, etc.) without changing the wire protocol.
Sentinels
---------
- ``PUBLIC``: endpoint requires no authentication.
- ``AUTHENTICATED``: endpoint requires a valid identity, no
specific capability.
"""
from aiohttp import web
PUBLIC = "__public__"
AUTHENTICATED = "__authenticated__"
# Capability vocabulary. Mirrors the "Capability list" tables in
# capabilities.md. Kept as a set so the gateway can fail-closed on
# an endpoint that declares an unknown capability.
KNOWN_CAPABILITIES = {
# Data plane
"agent",
"graph:read", "graph:write",
"documents:read", "documents:write",
"rows:read", "rows:write",
"llm",
"embeddings",
"mcp",
# Control plane
"config:read", "config:write",
"flows:read", "flows:write",
"users:read", "users:write", "users:admin",
"keys:self", "keys:admin",
"workspaces:admin",
"iam:admin",
"metrics:read",
"collections:read", "collections:write",
"knowledge:read", "knowledge:write",
}
# Capability sets used below.
_READER_CAPS = {
"agent",
"graph:read",
"documents:read",
"rows:read",
"llm",
"embeddings",
"mcp",
"config:read",
"flows:read",
"collections:read",
"knowledge:read",
"keys:self",
}
_WRITER_CAPS = _READER_CAPS | {
"graph:write",
"documents:write",
"rows:write",
"collections:write",
"knowledge:write",
}
_ADMIN_CAPS = _WRITER_CAPS | {
"config:write",
"flows:write",
"users:read", "users:write", "users:admin",
"keys:admin",
"workspaces:admin",
"iam:admin",
"metrics:read",
}
# Role definitions. Each role has a capability set and a workspace
# scope. Enterprise overrides this mapping.
ROLE_DEFINITIONS = {
"reader": {
"capabilities": _READER_CAPS,
"workspace_scope": "assigned",
},
"writer": {
"capabilities": _WRITER_CAPS,
"workspace_scope": "assigned",
},
"admin": {
"capabilities": _ADMIN_CAPS,
"workspace_scope": "*",
},
}
def _scope_permits(role_name, target_workspace, assigned_workspace):
"""Does the given role apply to ``target_workspace``?"""
role = ROLE_DEFINITIONS.get(role_name)
if role is None:
return False
scope = role["workspace_scope"]
if scope == "*":
return True
if scope == "assigned":
return target_workspace == assigned_workspace
# Future scope types (lists, patterns) extend here.
return False
def check(identity, capability, target_workspace=None):
"""Is ``identity`` permitted to invoke ``capability`` on
``target_workspace``?
Passes iff some role held by the caller both (a) grants
``capability`` and (b) is active in ``target_workspace``.
``target_workspace`` defaults to the caller's assigned workspace,
which makes this function usable for system-level operations and
for authenticated endpoints that don't take a workspace argument
(the call collapses to "do any of my roles grant this cap?")."""
if capability not in KNOWN_CAPABILITIES:
return False
target = target_workspace or identity.workspace
for role_name in identity.roles:
role = ROLE_DEFINITIONS.get(role_name)
if role is None:
continue
if capability not in role["capabilities"]:
continue
if _scope_permits(role_name, target, identity.workspace):
return True
return False
def access_denied():
return web.HTTPForbidden(
text='{"error":"access denied"}',
content_type="application/json",
)
def auth_failure():
return web.HTTPUnauthorized(
text='{"error":"auth failure"}',
content_type="application/json",
)
async def enforce(request, auth, capability):
"""Authenticate + capability-check for endpoints that carry no
workspace dimension on the request (metrics, i18n, etc.).
For endpoints that carry a workspace field on the body, call
:func:`enforce_workspace` *after* parsing the body to validate
the workspace and re-check the capability in that scope. Most
endpoints do both.
- ``PUBLIC``: no authentication, returns ``None``.
- ``AUTHENTICATED``: any valid identity.
- capability string: identity must have it, checked against the
caller's assigned workspace (adequate for endpoints whose
capability is system-level, e.g. ``metrics:read``, or where
the real workspace-aware check happens in
:func:`enforce_workspace` after body parsing)."""
if capability == PUBLIC:
return None
identity = await auth.authenticate(request)
if capability == AUTHENTICATED:
return identity
if not check(identity, capability):
raise access_denied()
return identity
def enforce_workspace(data, identity, capability=None):
"""Resolve + validate the workspace on a request body.
- Target workspace = ``data["workspace"]`` if supplied, else the
caller's assigned workspace.
- At least one of the caller's roles must (a) be active in the
target workspace and, if ``capability`` is given, (b) grant
``capability``. Otherwise 403.
- On success, ``data["workspace"]`` is overwritten with the
resolved value, so callers can rely on the outgoing message
having the gateway's chosen workspace rather than any
caller-supplied value.
For ``capability=None`` the workspace scope alone is checked,
which is useful when the body has a workspace but the endpoint already
passed its capability check (e.g. via :func:`enforce`)."""
if not isinstance(data, dict):
return data
requested = data.get("workspace", "")
target = requested or identity.workspace
for role_name in identity.roles:
role = ROLE_DEFINITIONS.get(role_name)
if role is None:
continue
if capability is not None and capability not in role["capabilities"]:
continue
if _scope_permits(role_name, target, identity.workspace):
data["workspace"] = target
return data
raise access_denied()

@ -0,0 +1,40 @@
from ... schema import IamRequest, IamResponse
from ... schema import iam_request_queue, iam_response_queue
from ... messaging import TranslatorRegistry
from . requestor import ServiceRequestor
class IamRequestor(ServiceRequestor):
def __init__(self, backend, consumer, subscriber, timeout=120,
request_queue=None, response_queue=None):
if request_queue is None:
request_queue = iam_request_queue
if response_queue is None:
response_queue = iam_response_queue
super().__init__(
backend=backend,
consumer_name=consumer,
subscription=subscriber,
request_queue=request_queue,
response_queue=response_queue,
request_schema=IamRequest,
response_schema=IamResponse,
timeout=timeout,
)
self.request_translator = (
TranslatorRegistry.get_request_translator("iam")
)
self.response_translator = (
TranslatorRegistry.get_response_translator("iam")
)
def to_request(self, body):
return self.request_translator.decode(body)
def from_response(self, message):
return self.response_translator.encode_with_completion(message)

@ -9,6 +9,7 @@ logger = logging.getLogger(__name__)
from . config import ConfigRequestor
from . flow import FlowRequestor
from . iam import IamRequestor
from . librarian import LibrarianRequestor
from . knowledge import KnowledgeRequestor
from . collection_management import CollectionManagementRequestor
@ -72,6 +73,7 @@ request_response_dispatchers = {
global_dispatchers = {
"config": ConfigRequestor,
"flow": FlowRequestor,
"iam": IamRequestor,
"librarian": LibrarianRequestor,
"knowledge": KnowledgeRequestor,
"collection-management": CollectionManagementRequestor,
@ -105,13 +107,31 @@ class DispatcherWrapper:
class DispatcherManager:
-def __init__(self, backend, config_receiver, prefix="api-gateway",
-queue_overrides=None):
def __init__(self, backend, config_receiver, auth,
prefix="api-gateway", queue_overrides=None):
"""
``auth`` is required. It flows into the Mux for first-frame
WebSocket authentication and into downstream dispatcher
construction. There is no permissive default: constructing
a DispatcherManager without an authenticator would be a
silent downgrade to no-auth on the socket path.
"""
if auth is None:
raise ValueError(
"DispatcherManager requires an 'auth' argument — there "
"is no no-auth mode"
)
self.backend = backend
self.config_receiver = config_receiver
self.config_receiver.add_handler(self)
self.prefix = prefix
# Gateway IamAuth — used by the socket Mux for first-frame
# auth and by any dispatcher that needs to resolve caller
# identity out-of-band.
self.auth = auth
# Store queue overrides for global services
# Format: {"config": {"request": "...", "response": "..."}, ...}
self.queue_overrides = queue_overrides or {}
@ -163,6 +183,15 @@ class DispatcherManager:
def dispatch_global_service(self):
return DispatcherWrapper(self.process_global_service)
def dispatch_auth_iam(self):
"""Pre-configured IAM dispatcher for the gateway's auth
endpoints (login, bootstrap, change-password). Pins the
kind to ``iam`` so these handlers don't have to supply URL
params the global dispatcher would expect."""
async def _process(data, responder):
return await self.invoke_global_service(data, responder, "iam")
return DispatcherWrapper(_process)
def dispatch_core_export(self):
return DispatcherWrapper(self.process_core_export)
@ -314,7 +343,10 @@ class DispatcherManager:
async def process_socket(self, ws, running, params):
-dispatcher = Mux(self, ws, running)
# The mux self-authenticates via the first-frame protocol;
# pass the gateway's IamAuth so it can validate tokens
# without reaching back into the endpoint layer.
dispatcher = Mux(self, ws, running, auth=self.auth)
return dispatcher

@ -16,11 +16,28 @@ MAX_QUEUE_SIZE = 10
class Mux:
-def __init__(self, dispatcher_manager, ws, running):
def __init__(self, dispatcher_manager, ws, running, auth):
"""
``auth`` is required: the Mux implements the first-frame
auth protocol described in ``iam.md`` and will refuse any
non-auth frame until an ``auth-ok`` has been issued. There
is no no-auth mode.
"""
if auth is None:
raise ValueError(
"Mux requires an 'auth' argument — there is no "
"no-auth mode"
)
self.dispatcher_manager = dispatcher_manager
self.ws = ws
self.running = running
self.auth = auth
# Authenticated identity, populated by the first-frame auth
# protocol. ``None`` means the socket is not yet
# authenticated; any non-auth frame is refused.
self.identity = None
self.q = asyncio.Queue(maxsize=MAX_QUEUE_SIZE)
@ -31,6 +48,41 @@ class Mux:
if self.ws:
await self.ws.close()
async def _handle_auth_frame(self, data):
"""Process a ``{"type": "auth", "token": "..."}`` frame.
On success, updates ``self.identity`` and returns an
``auth-ok`` response frame. On failure, returns the masked
auth-failure frame. Never raises: auth failures keep the
socket open so the client can retry without reconnecting
(important for browsers, which treat a handshake-time 401
as terminal)."""
token = data.get("token", "")
if not token:
await self.ws.send_json({
"type": "auth-failed",
"error": "auth failure",
})
return
class _Shim:
def __init__(self, tok):
self.headers = {"Authorization": f"Bearer {tok}"}
try:
identity = await self.auth.authenticate(_Shim(token))
except Exception:
await self.ws.send_json({
"type": "auth-failed",
"error": "auth failure",
})
return
self.identity = identity
await self.ws.send_json({
"type": "auth-ok",
"workspace": identity.workspace,
})
async def receive(self, msg):
request_id = None
@ -38,6 +90,16 @@ class Mux:
try:
data = msg.json()
# In-band auth protocol: the client sends
# ``{"type": "auth", "token": "..."}`` as its first frame
# (and any time it wants to re-auth: JWT refresh, token
# rotation, etc). Auth is always required on a Mux —
# there is no no-auth mode.
if isinstance(data, dict) and data.get("type") == "auth":
await self._handle_auth_frame(data)
return
request_id = data.get("id")
if "request" not in data:
@ -46,9 +108,49 @@ class Mux:
if "id" not in data:
raise RuntimeError("Bad message")
# Reject all non-auth frames until an ``auth-ok`` has
# been issued.
if self.identity is None:
await self.ws.send_json({
"id": request_id,
"error": {
"message": "auth failure",
"type": "auth-required",
},
"complete": True,
})
return
# Workspace resolution. Role workspace scope determines
# which target workspaces are permitted. The resolved
# value is written to both the envelope and the inner
# request payload so clients don't have to repeat it
# per-message (same convenience HTTP callers get via
# enforce_workspace).
from ..capabilities import enforce_workspace
from aiohttp import web as _web
try:
enforce_workspace(data, self.identity)
inner = data.get("request")
if isinstance(inner, dict):
enforce_workspace(inner, self.identity)
except _web.HTTPForbidden:
await self.ws.send_json({
"id": request_id,
"error": {
"message": "access denied",
"type": "access-denied",
},
"complete": True,
})
return
workspace = data["workspace"]
await self.q.put((
data["id"],
-data.get("workspace", "default"),
workspace,
data.get("flow"),
data["service"],
data["request"]

@ -0,0 +1,115 @@
"""
Gateway auth endpoints.
Three dedicated paths:
POST /api/v1/auth/login: unauthenticated; username/password -> JWT
POST /api/v1/auth/bootstrap: unauthenticated; IAM bootstrap op
POST /api/v1/auth/change-password: authenticated; any role
These are the only IAM-surface operations that can be reached from
outside. Everything else routes through ``/api/v1/iam`` gated by
``users:admin``.
"""
import logging
from aiohttp import web
from .. capabilities import enforce, PUBLIC, AUTHENTICATED
logger = logging.getLogger("auth-endpoints")
logger.setLevel(logging.INFO)
class AuthEndpoints:
"""Groups the three auth-surface handlers. Each forwards to the
IAM service via the existing ``IamRequestor`` dispatcher."""
def __init__(self, iam_dispatcher, auth):
self.iam = iam_dispatcher
self.auth = auth
async def start(self):
pass
def add_routes(self, app):
app.add_routes([
web.post("/api/v1/auth/login", self.login),
web.post("/api/v1/auth/bootstrap", self.bootstrap),
web.post(
"/api/v1/auth/change-password",
self.change_password,
),
])
async def _forward(self, body):
async def responder(x, fin):
pass
return await self.iam.process(body, responder)
async def login(self, request):
"""Public. Accepts {username, password, workspace?}. Returns
{jwt, jwt_expires} on success; IAM's masked auth failure on
anything else."""
await enforce(request, self.auth, PUBLIC)
try:
body = await request.json()
except Exception:
return web.json_response(
{"error": "invalid json"}, status=400,
)
req = {
"operation": "login",
"username": body.get("username", ""),
"password": body.get("password", ""),
"workspace": body.get("workspace", ""),
}
resp = await self._forward(req)
if "error" in resp:
return web.json_response(
{"error": "auth failure"}, status=401,
)
return web.json_response(resp)
async def bootstrap(self, request):
"""Public. Valid only when IAM is running in bootstrap mode
with empty tables. In every other case the IAM service
returns a masked auth-failure."""
await enforce(request, self.auth, PUBLIC)
resp = await self._forward({"operation": "bootstrap"})
if "error" in resp:
return web.json_response(
{"error": "auth failure"}, status=401,
)
return web.json_response(resp)
async def change_password(self, request):
"""Authenticated (any role). Accepts {current_password,
new_password}; user_id is taken from the authenticated
identity, so the caller cannot change someone else's password
this way (reset-password is the admin path)."""
identity = await enforce(request, self.auth, AUTHENTICATED)
try:
body = await request.json()
except Exception:
return web.json_response(
{"error": "invalid json"}, status=400,
)
req = {
"operation": "change-password",
"user_id": identity.user_id,
"password": body.get("current_password", ""),
"new_password": body.get("new_password", ""),
}
resp = await self._forward(req)
if "error" in resp:
err_type = resp.get("error", {}).get("type", "")
if err_type == "auth-failed":
return web.json_response(
{"error": "auth failure"}, status=401,
)
return web.json_response(
{"error": resp.get("error", {}).get("message", "error")},
status=400,
)
return web.json_response(resp)

@ -1,28 +1,27 @@
import asyncio
from aiohttp import web
import uuid
import logging
from aiohttp import web
from .. capabilities import enforce, enforce_workspace
logger = logging.getLogger("endpoint")
logger.setLevel(logging.INFO)
class ConstantEndpoint:
-def __init__(self, endpoint_path, auth, dispatcher):
def __init__(self, endpoint_path, auth, dispatcher, capability):
self.path = endpoint_path
self.auth = auth
-self.operation = "service"
self.capability = capability
self.dispatcher = dispatcher
async def start(self):
pass
def add_routes(self, app):
app.add_routes([
web.post(self.path, self.handle),
])
@ -31,22 +30,14 @@ class ConstantEndpoint:
logger.debug(f"Processing request: {request.path}")
-try:
-ht = request.headers["Authorization"]
-tokens = ht.split(" ", 2)
-if tokens[0] != "Bearer":
-return web.HTTPUnauthorized()
-token = tokens[1]
-except:
-token = ""
-if not self.auth.permitted(token, self.operation):
-return web.HTTPUnauthorized()
identity = await enforce(request, self.auth, self.capability)
try:
data = await request.json()
if identity is not None:
enforce_workspace(data, identity)
async def responder(x, fin):
pass
@ -54,10 +45,8 @@ class ConstantEndpoint:
return web.json_response(resp)
except web.HTTPException:
raise
except Exception as e:
logging.error(f"Exception: {e}")
return web.json_response(
{ "error": str(e) }
)
logger.error(f"Exception: {e}", exc_info=True)
return web.json_response({"error": str(e)})


@ -4,16 +4,18 @@ from aiohttp import web
from trustgraph.i18n import get_language_pack
from .. capabilities import enforce
logger = logging.getLogger("endpoint")
logger.setLevel(logging.INFO)
class I18nPackEndpoint:
def __init__(self, endpoint_path: str, auth):
def __init__(self, endpoint_path: str, auth, capability):
self.path = endpoint_path
self.auth = auth
self.operation = "service"
self.capability = capability
async def start(self):
pass
@ -26,26 +28,13 @@ class I18nPackEndpoint:
async def handle(self, request):
logger.debug(f"Processing i18n pack request: {request.path}")
token = ""
try:
ht = request.headers["Authorization"]
tokens = ht.split(" ", 2)
if tokens[0] != "Bearer":
return web.HTTPUnauthorized()
token = tokens[1]
except Exception:
token = ""
if not self.auth.permitted(token, self.operation):
return web.HTTPUnauthorized()
await enforce(request, self.auth, self.capability)
lang = request.match_info.get("lang") or "en"
# This is a path traversal defense, and is a critical sec defense.
# Do not remove!
# Path-traversal defense — critical, do not remove.
if "/" in lang or ".." in lang:
return web.HTTPBadRequest(reason="Invalid language code")
pack = get_language_pack(lang)
return web.json_response(pack)
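The path-traversal check in the i18n handler is small enough to test in isolation. The following is a sketch that restates the same condition as a predicate. `is_safe_lang` is a hypothetical name, and the sample inputs are illustrative only.

```python
def is_safe_lang(lang):
    """Return False for language codes that could escape the pack
    directory, using the same test as the handler: no path separator
    and no parent-directory sequence."""
    return "/" not in lang and ".." not in lang


# Illustrative inputs: two ordinary codes and two traversal attempts.
samples = ["en", "pt-BR", "../secrets", "en/../../etc/passwd"]
rejected = [s for s in samples if not is_safe_lang(s)]
```

A stricter allow-list pattern for language codes would be a possible alternative, but the diff does not do that and it is mentioned here only as a design option.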


@ -8,72 +8,278 @@ from . variable_endpoint import VariableEndpoint
from . socket import SocketEndpoint
from . metrics import MetricsEndpoint
from . i18n import I18nPackEndpoint
from . auth_endpoints import AuthEndpoints
from .. capabilities import PUBLIC, AUTHENTICATED
from .. dispatch.manager import DispatcherManager
# Capability required for each kind on the /api/v1/{kind} generic
# endpoint (global services). Coarse gating — the IAM bundle split
# of "read vs write" per admin subsystem is not applied here because
# this endpoint forwards an opaque operation in the body. Writes
# are the upper bound on what the endpoint can do, so we gate on
# the write/admin capability.
GLOBAL_KIND_CAPABILITY = {
"config": "config:write",
"flow": "flows:write",
"librarian": "documents:write",
"knowledge": "knowledge:write",
"collection-management": "collections:write",
# IAM endpoints land on /api/v1/iam and require the admin bundle.
# Login / bootstrap / change-password are served by
# AuthEndpoints, which handle their own gating (PUBLIC /
# AUTHENTICATED).
"iam": "users:admin",
}
# Capability required for each kind on the
# /api/v1/flow/{flow}/service/{kind} endpoint (per-flow data-plane).
FLOW_KIND_CAPABILITY = {
"agent": "agent",
"text-completion": "llm",
"prompt": "llm",
"mcp-tool": "mcp",
"graph-rag": "graph:read",
"document-rag": "documents:read",
"embeddings": "embeddings",
"graph-embeddings": "graph:read",
"document-embeddings": "documents:read",
"triples": "graph:read",
"rows": "rows:read",
"nlp-query": "rows:read",
"structured-query": "rows:read",
"structured-diag": "rows:read",
"row-embeddings": "rows:read",
"sparql": "graph:read",
}
# Capability for the streaming flow import/export endpoints,
# keyed by the "kind" URL segment.
FLOW_IMPORT_CAPABILITY = {
"triples": "graph:write",
"graph-embeddings": "graph:write",
"document-embeddings": "documents:write",
"entity-contexts": "documents:write",
"rows": "rows:write",
}
FLOW_EXPORT_CAPABILITY = {
"triples": "graph:read",
"graph-embeddings": "graph:read",
"document-embeddings": "documents:read",
"entity-contexts": "documents:read",
}
from .. capabilities import enforce, enforce_workspace
import logging as _mgr_logging
_mgr_logger = _mgr_logging.getLogger("endpoint")
class _RoutedVariableEndpoint:
"""HTTP endpoint whose required capability is looked up per
request from the URL's ``kind`` parameter. Used for the two
generic dispatch paths (``/api/v1/{kind}`` and
``/api/v1/flow/{flow}/service/{kind}``). Self-contained rather
than subclassing ``VariableEndpoint`` to avoid mutating shared
state across concurrent requests."""
def __init__(self, endpoint_path, auth, dispatcher, capability_map):
self.path = endpoint_path
self.auth = auth
self.dispatcher = dispatcher
self._capability_map = capability_map
async def start(self):
pass
def add_routes(self, app):
app.add_routes([web.post(self.path, self.handle)])
async def handle(self, request):
kind = request.match_info.get("kind", "")
cap = self._capability_map.get(kind)
if cap is None:
return web.json_response(
{"error": "unknown kind"}, status=404,
)
identity = await enforce(request, self.auth, cap)
try:
data = await request.json()
if identity is not None:
enforce_workspace(data, identity)
async def responder(x, fin):
pass
resp = await self.dispatcher.process(
data, responder, request.match_info,
)
return web.json_response(resp)
except web.HTTPException:
raise
except Exception as e:
_mgr_logger.error(f"Exception: {e}", exc_info=True)
return web.json_response({"error": str(e)})
class _RoutedSocketEndpoint:
"""WebSocket endpoint whose required capability is looked up per
request from the URL's ``kind`` parameter. Used for the flow
import/export streaming endpoints."""
def __init__(self, endpoint_path, auth, dispatcher, capability_map):
self.path = endpoint_path
self.auth = auth
self.dispatcher = dispatcher
self._capability_map = capability_map
async def start(self):
pass
def add_routes(self, app):
app.add_routes([web.get(self.path, self.handle)])
async def handle(self, request):
from .. capabilities import check, auth_failure, access_denied
kind = request.match_info.get("kind", "")
cap = self._capability_map.get(kind)
if cap is None:
return web.json_response(
{"error": "unknown kind"}, status=404,
)
token = request.query.get("token", "")
if not token:
return auth_failure()
from . socket import _QueryTokenRequest
try:
identity = await self.auth.authenticate(
_QueryTokenRequest(token)
)
except web.HTTPException as e:
return e
if not check(identity, cap):
return access_denied()
# Delegate the WebSocket handling to a fresh SocketEndpoint built
# with the resolved capability, so that no shared endpoint state is
# mutated per request.
ws_ep = SocketEndpoint(
endpoint_path=self.path,
auth=self.auth,
dispatcher=self.dispatcher,
capability=cap,
)
return await ws_ep.handle(request)
class EndpointManager:
def __init__(
self, dispatcher_manager, auth, prometheus_url, timeout=600
self, dispatcher_manager, auth, prometheus_url, timeout=600,
):
self.dispatcher_manager = dispatcher_manager
self.timeout = timeout
self.services = {
}
self.endpoints = [
# Auth surface — public / authenticated-any. Must come
# before the generic /api/v1/{kind} routes to win the
# match for /api/v1/auth/* paths. aiohttp routes in
# registration order, so we prepend here.
AuthEndpoints(
iam_dispatcher=dispatcher_manager.dispatch_auth_iam(),
auth=auth,
),
I18nPackEndpoint(
endpoint_path = "/api/v1/i18n/packs/{lang}",
auth = auth,
endpoint_path="/api/v1/i18n/packs/{lang}",
auth=auth,
capability=PUBLIC,
),
MetricsEndpoint(
endpoint_path = "/api/metrics",
prometheus_url = prometheus_url,
auth = auth,
endpoint_path="/api/metrics",
prometheus_url=prometheus_url,
auth=auth,
capability="metrics:read",
),
VariableEndpoint(
endpoint_path = "/api/v1/{kind}", auth = auth,
dispatcher = dispatcher_manager.dispatch_global_service(),
# Global services: capability chosen per-kind.
_RoutedVariableEndpoint(
endpoint_path="/api/v1/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_global_service(),
capability_map=GLOBAL_KIND_CAPABILITY,
),
# /api/v1/socket: WebSocket handshake accepts
# unconditionally; the Mux dispatcher runs the
# first-frame auth protocol. Handshake-time 401s break
# browser reconnection, so authentication is always
# in-band for this endpoint.
SocketEndpoint(
endpoint_path = "/api/v1/socket",
auth = auth,
dispatcher = dispatcher_manager.dispatch_socket()
endpoint_path="/api/v1/socket",
auth=auth,
dispatcher=dispatcher_manager.dispatch_socket(),
capability=AUTHENTICATED, # informational only; bypassed
in_band_auth=True,
),
VariableEndpoint(
endpoint_path = "/api/v1/flow/{flow}/service/{kind}",
auth = auth,
dispatcher = dispatcher_manager.dispatch_flow_service(),
# Per-flow request/response services — capability per kind.
_RoutedVariableEndpoint(
endpoint_path="/api/v1/flow/{flow}/service/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_flow_service(),
capability_map=FLOW_KIND_CAPABILITY,
),
SocketEndpoint(
endpoint_path = "/api/v1/flow/{flow}/import/{kind}",
auth = auth,
dispatcher = dispatcher_manager.dispatch_flow_import()
# Per-flow streaming import/export — capability per kind.
_RoutedSocketEndpoint(
endpoint_path="/api/v1/flow/{flow}/import/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_flow_import(),
capability_map=FLOW_IMPORT_CAPABILITY,
),
SocketEndpoint(
endpoint_path = "/api/v1/flow/{flow}/export/{kind}",
auth = auth,
dispatcher = dispatcher_manager.dispatch_flow_export()
_RoutedSocketEndpoint(
endpoint_path="/api/v1/flow/{flow}/export/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_flow_export(),
capability_map=FLOW_EXPORT_CAPABILITY,
),
StreamEndpoint(
endpoint_path="/api/v1/import-core",
auth=auth,
method="POST",
dispatcher=dispatcher_manager.dispatch_core_import(),
# Cross-subject import — require the admin bundle via a
# single representative capability.
capability="users:admin",
),
StreamEndpoint(
endpoint_path = "/api/v1/import-core",
auth = auth,
method = "POST",
dispatcher = dispatcher_manager.dispatch_core_import(),
endpoint_path="/api/v1/export-core",
auth=auth,
method="GET",
dispatcher=dispatcher_manager.dispatch_core_export(),
capability="users:admin",
),
StreamEndpoint(
endpoint_path = "/api/v1/export-core",
auth = auth,
method = "GET",
dispatcher = dispatcher_manager.dispatch_core_export(),
),
StreamEndpoint(
endpoint_path = "/api/v1/document-stream",
auth = auth,
method = "GET",
dispatcher = dispatcher_manager.dispatch_document_stream(),
endpoint_path="/api/v1/document-stream",
auth=auth,
method="GET",
dispatcher=dispatcher_manager.dispatch_document_stream(),
capability="documents:read",
),
]
@ -84,4 +290,3 @@ class EndpointManager:
async def start(self):
for ep in self.endpoints:
await ep.start()
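The routed endpoints above decide in a fixed order: an unknown `kind` yields 404 before any authentication, then authentication failure yields 401, then a missing capability yields 403. The sketch below models that order with a subset of the maps copied from the diff. Representing the caller's capabilities as a plain set (or `None` when unauthenticated) is an assumption made for illustration; the real `enforce` and `check` functions take a request or an identity and are not shown in this section.

```python
# Subset of the capability maps from the diff, copied for illustration.
FLOW_IMPORT_CAPABILITY = {
    "triples": "graph:write",
    "rows": "rows:write",
}
FLOW_EXPORT_CAPABILITY = {
    "triples": "graph:read",
}


def decide(capability_map, kind, held):
    """Return the HTTP status the routed endpoint would produce.

    `held` is a hypothetical stand-in for the identity: None means the
    caller failed authentication, otherwise it is a set of capability
    strings.
    """
    cap = capability_map.get(kind)
    if cap is None:
        return 404  # unknown kind, rejected before authentication
    if held is None:
        return 401  # authentication failure
    if cap not in held:
        return 403  # authenticated, but lacks the capability
    return 200


# "rows" can be imported but has no export entry, so export is a 404.
export_rows = decide(FLOW_EXPORT_CAPABILITY, "rows", {"rows:read"})
import_rows = decide(FLOW_IMPORT_CAPABILITY, "rows", {"rows:write"})
```

Because the maps are the only source of valid kinds, a kind missing from a map is unreachable through that endpoint regardless of what the caller holds.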


@ -10,17 +10,19 @@ import asyncio
import uuid
import logging
from .. capabilities import enforce
logger = logging.getLogger("endpoint")
logger.setLevel(logging.INFO)
class MetricsEndpoint:
def __init__(self, prometheus_url, endpoint_path, auth):
def __init__(self, prometheus_url, endpoint_path, auth, capability):
self.prometheus_url = prometheus_url
self.path = endpoint_path
self.auth = auth
self.operation = "service"
self.capability = capability
async def start(self):
pass
@ -35,17 +37,7 @@ class MetricsEndpoint:
logger.debug(f"Processing metrics request: {request.path}")
try:
ht = request.headers["Authorization"]
tokens = ht.split(" ", 2)
if tokens[0] != "Bearer":
return web.HTTPUnauthorized()
token = tokens[1]
except:
token = ""
if not self.auth.permitted(token, self.operation):
return web.HTTPUnauthorized()
await enforce(request, self.auth, self.capability)
path = request.match_info["path"]
url = (


@ -4,6 +4,9 @@ from aiohttp import web, WSMsgType
import logging
from .. running import Running
from .. capabilities import (
PUBLIC, AUTHENTICATED, check, auth_failure, access_denied,
)
logger = logging.getLogger("socket")
logger.setLevel(logging.INFO)
@ -11,12 +14,25 @@ logger.setLevel(logging.INFO)
class SocketEndpoint:
def __init__(
self, endpoint_path, auth, dispatcher,
self, endpoint_path, auth, dispatcher, capability,
in_band_auth=False,
):
"""
``in_band_auth=True`` skips the handshake-time auth check.
The WebSocket handshake always succeeds; the dispatcher is
expected to gate itself via the first-frame auth protocol
(see ``Mux``).
This avoids the browser problem where a 401 on the handshake
is treated as permanent and prevents reconnection, and lets
long-lived sockets refresh their credential mid-session by
sending a new auth frame.
"""
self.path = endpoint_path
self.auth = auth
self.operation = "socket"
self.capability = capability
self.in_band_auth = in_band_auth
self.dispatcher = dispatcher
@ -61,15 +77,29 @@ class SocketEndpoint:
raise
async def handle(self, request):
"""Enhanced handler with better cleanup"""
try:
token = request.query['token']
except:
token = ""
"""Enhanced handler with better cleanup.
Auth: WebSocket clients pass the bearer token on the
``?token=...`` query string; we wrap it into a synthetic
Authorization header before delegating to the standard auth
path so the IAM-backed flow (JWT / API key) applies uniformly.
The first-frame auth protocol described in the IAM spec is
a future upgrade."""
if not self.in_band_auth and self.capability != PUBLIC:
token = request.query.get("token", "")
if not token:
return auth_failure()
try:
identity = await self.auth.authenticate(
_QueryTokenRequest(token)
)
except web.HTTPException as e:
return e
if self.capability != AUTHENTICATED:
if not check(identity, self.capability):
return access_denied()
if not self.auth.permitted(token, self.operation):
return web.HTTPUnauthorized()
# 50MB max message size
ws = web.WebSocketResponse(max_msg_size=52428800)
@ -150,3 +180,11 @@ class SocketEndpoint:
web.get(self.path, self.handle),
])
class _QueryTokenRequest:
"""Minimal shim that exposes headers["Authorization"] to
IamAuth.authenticate(), derived from a query-string token."""
def __init__(self, token):
self.headers = {"Authorization": f"Bearer {token}"}
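The shim works because the authentication path only needs a `headers` mapping. The sketch below pairs a copy of the shim with a hypothetical `bearer_token` parser. The parser is an assumption about what a header-based authenticator might do and is not the actual `IamAuth.authenticate()` implementation, which this section does not show.

```python
class QueryTokenRequest:
    """Copy of the shim: exposes headers["Authorization"] built from
    a query-string token."""

    def __init__(self, token):
        self.headers = {"Authorization": f"Bearer {token}"}


def bearer_token(request):
    """Hypothetical parser: return the bearer credential from any
    object with a dict-like `headers`, or None if absent/malformed."""
    header = request.headers.get("Authorization", "")
    scheme, _, value = header.partition(" ")
    if scheme != "Bearer" or not value:
        return None
    return value


# A WebSocket URL like ...?token=abc123 becomes a header-style request.
token = bearer_token(QueryTokenRequest("abc123"))
```

With this shape, query-string credentials and header credentials go through one code path, which is the uniformity the handler docstring describes.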


@ -1,82 +1,64 @@
import asyncio
from aiohttp import web
import logging
from aiohttp import web
from .. capabilities import enforce
logger = logging.getLogger("endpoint")
logger.setLevel(logging.INFO)
class StreamEndpoint:
def __init__(self, endpoint_path, auth, dispatcher, method="POST"):
def __init__(
self, endpoint_path, auth, dispatcher, capability, method="POST",
):
self.path = endpoint_path
self.auth = auth
self.operation = "service"
self.capability = capability
self.method = method
self.dispatcher = dispatcher
async def start(self):
pass
def add_routes(self, app):
if self.method == "POST":
app.add_routes([
web.post(self.path, self.handle),
])
app.add_routes([web.post(self.path, self.handle)])
elif self.method == "GET":
app.add_routes([
web.get(self.path, self.handle),
])
app.add_routes([web.get(self.path, self.handle)])
else:
raise RuntimeError("Bad method" + self.method)
raise RuntimeError("Bad method " + self.method)
async def handle(self, request):
logger.debug(f"Processing request: {request.path}")
try:
ht = request.headers["Authorization"]
tokens = ht.split(" ", 2)
if tokens[0] != "Bearer":
return web.HTTPUnauthorized()
token = tokens[1]
except:
token = ""
if not self.auth.permitted(token, self.operation):
return web.HTTPUnauthorized()
await enforce(request, self.auth, self.capability)
try:
data = request.content
async def error(err):
return web.HTTPInternalServerError(text = err)
return web.HTTPInternalServerError(text=err)
async def ok(
status=200, reason="OK", type="application/octet-stream"
status=200, reason="OK",
type="application/octet-stream",
):
response = web.StreamResponse(
status = status, reason = reason,
headers = {"Content-Type": type}
status=status, reason=reason,
headers={"Content-Type": type},
)
await response.prepare(request)
return response
resp = await self.dispatcher.process(
data, error, ok, request
)
resp = await self.dispatcher.process(data, error, ok, request)
return resp
except web.HTTPException:
raise
except Exception as e:
logging.error(f"Exception: {e}")
return web.json_response(
{ "error": str(e) }
)
logger.error(f"Exception: {e}", exc_info=True)
return web.json_response({"error": str(e)})


@ -1,27 +1,27 @@
import asyncio
from aiohttp import web
import logging
from aiohttp import web
from .. capabilities import enforce, enforce_workspace
logger = logging.getLogger("endpoint")
logger.setLevel(logging.INFO)
class VariableEndpoint:
def __init__(self, endpoint_path, auth, dispatcher):
def __init__(self, endpoint_path, auth, dispatcher, capability):
self.path = endpoint_path
self.auth = auth
self.operation = "service"
self.capability = capability
self.dispatcher = dispatcher
async def start(self):
pass
def add_routes(self, app):
app.add_routes([
web.post(self.path, self.handle),
])
@ -30,35 +30,25 @@ class VariableEndpoint:
logger.debug(f"Processing request: {request.path}")
try:
ht = request.headers["Authorization"]
tokens = ht.split(" ", 2)
if tokens[0] != "Bearer":
return web.HTTPUnauthorized()
token = tokens[1]
except:
token = ""
if not self.auth.permitted(token, self.operation):
return web.HTTPUnauthorized()
identity = await enforce(request, self.auth, self.capability)
try:
data = await request.json()
if identity is not None:
enforce_workspace(data, identity)
async def responder(x, fin):
pass
resp = await self.dispatcher.process(
data, responder, request.match_info
data, responder, request.match_info,
)
return web.json_response(resp)
except web.HTTPException:
raise
except Exception as e:
logging.error(f"Exception: {e}")
return web.json_response(
{ "error": str(e) }
)
logger.error(f"Exception: {e}", exc_info=True)
return web.json_response({"error": str(e)})


@ -12,7 +12,7 @@ import os
from trustgraph.base.logging import setup_logging, add_logging_args
from trustgraph.base.pubsub import get_pubsub, add_pubsub_args
from . auth import Authenticator
from . auth import IamAuth
from . config.receiver import ConfigReceiver
from . dispatch.manager import DispatcherManager
@ -35,7 +35,6 @@ default_prometheus_url = os.getenv("PROMETHEUS_URL", "http://prometheus:9090")
default_pulsar_api_key = os.getenv("PULSAR_API_KEY", None)
default_timeout = 600
default_port = 8088
default_api_token = os.getenv("GATEWAY_SECRET", "")
class Api:
@ -60,13 +59,14 @@ class Api:
if not self.prometheus_url.endswith("/"):
self.prometheus_url += "/"
api_token = config.get("api_token", default_api_token)
# Token not set, or token equal empty string means no auth
if api_token:
self.auth = Authenticator(token=api_token)
else:
self.auth = Authenticator(allow_all=True)
# IAM-backed authentication. The legacy GATEWAY_SECRET
# shared-token path has been removed — there is no
# "open for everyone" fallback. The gateway cannot
# authenticate any request until IAM is reachable.
self.auth = IamAuth(
backend=self.pubsub_backend,
id=config.get("id", "api-gateway"),
)
self.config_receiver = ConfigReceiver(self.pubsub_backend)
@ -118,6 +118,7 @@ class Api:
config_receiver = self.config_receiver,
prefix = "gateway",
queue_overrides = queue_overrides,
auth = self.auth,
)
self.endpoint_manager = EndpointManager(
@ -132,12 +133,18 @@ class Api:
]
async def app_factory(self):
self.app = web.Application(
middlewares=[],
client_max_size=256 * 1024 * 1024
)
# Fetch IAM signing public key before accepting traffic.
# Blocks for a bounded retry window; the gateway starts even
# if IAM is still unreachable (JWT validation will 401 until
# the key is available).
await self.auth.start()
await self.config_receiver.start()
for ep in self.endpoints:
@ -189,12 +196,6 @@ def run():
help=f'API request timeout in seconds (default: {default_timeout})',
)
parser.add_argument(
'--api-token',
default=default_api_token,
help=f'Secret API token (default: no auth)',
)
add_logging_args(parser)
parser.add_argument(


@ -0,0 +1 @@
from . service import *


@ -0,0 +1,4 @@
from . service import run
run()

File diff suppressed because it is too large.


@ -0,0 +1,210 @@
"""
IAM service processor. Terminates the IAM request queue and forwards
each request to the IamService business logic, then returns the
response on the IAM response queue.
Shape mirrors trustgraph.config.service.
"""
import logging
from trustgraph.schema import Error
from trustgraph.schema import IamRequest, IamResponse
from trustgraph.schema import iam_request_queue, iam_response_queue
from trustgraph.base import AsyncProcessor, Consumer, Producer
from trustgraph.base import ConsumerMetrics, ProducerMetrics
from trustgraph.base.cassandra_config import (
add_cassandra_args, resolve_cassandra_config,
)
from . iam import IamService
logger = logging.getLogger(__name__)
default_ident = "iam-svc"
default_iam_request_queue = iam_request_queue
default_iam_response_queue = iam_response_queue
class Processor(AsyncProcessor):
def __init__(self, **params):
iam_req_q = params.get(
"iam_request_queue", default_iam_request_queue,
)
iam_resp_q = params.get(
"iam_response_queue", default_iam_response_queue,
)
bootstrap_mode = params.get("bootstrap_mode")
bootstrap_token = params.get("bootstrap_token")
if bootstrap_mode not in ("token", "bootstrap"):
raise RuntimeError(
"iam-svc: --bootstrap-mode is required. Set to 'token' "
"(with --bootstrap-token) for production, or 'bootstrap' "
"to enable the explicit bootstrap operation over the "
"pub/sub bus (dev / quick-start only, not safe under "
"public exposure). Refusing to start."
)
if bootstrap_mode == "token" and not bootstrap_token:
raise RuntimeError(
"iam-svc: --bootstrap-mode=token requires "
"--bootstrap-token. Refusing to start."
)
if bootstrap_mode == "bootstrap" and bootstrap_token:
raise RuntimeError(
"iam-svc: --bootstrap-token is not accepted when "
"--bootstrap-mode=bootstrap. Ambiguous intent. "
"Refusing to start."
)
self.bootstrap_mode = bootstrap_mode
self.bootstrap_token = bootstrap_token
cassandra_host = params.get("cassandra_host")
cassandra_username = params.get("cassandra_username")
cassandra_password = params.get("cassandra_password")
hosts, username, password, keyspace = resolve_cassandra_config(
host=cassandra_host,
username=cassandra_username,
password=cassandra_password,
default_keyspace="iam",
)
self.cassandra_host = hosts
self.cassandra_username = username
self.cassandra_password = password
super().__init__(
**params | {
"iam_request_schema": IamRequest.__name__,
"iam_response_schema": IamResponse.__name__,
"cassandra_host": self.cassandra_host,
"cassandra_username": self.cassandra_username,
"cassandra_password": self.cassandra_password,
}
)
iam_request_metrics = ConsumerMetrics(
processor=self.id, flow=None, name="iam-request",
)
iam_response_metrics = ProducerMetrics(
processor=self.id, flow=None, name="iam-response",
)
self.iam_request_topic = iam_req_q
self.iam_request_consumer = Consumer(
taskgroup=self.taskgroup,
backend=self.pubsub,
flow=None,
topic=iam_req_q,
subscriber=self.id,
schema=IamRequest,
handler=self.on_iam_request,
metrics=iam_request_metrics,
)
self.iam_response_producer = Producer(
backend=self.pubsub,
topic=iam_resp_q,
schema=IamResponse,
metrics=iam_response_metrics,
)
self.iam = IamService(
host=self.cassandra_host,
username=self.cassandra_username,
password=self.cassandra_password,
keyspace=keyspace,
bootstrap_mode=self.bootstrap_mode,
bootstrap_token=self.bootstrap_token,
)
logger.info(
f"IAM service initialised (bootstrap-mode={self.bootstrap_mode})"
)
async def start(self):
await self.pubsub.ensure_topic(self.iam_request_topic)
# Token-mode auto-bootstrap runs before we accept requests so
# the first inbound call always sees a populated table.
await self.iam.auto_bootstrap_if_token_mode()
await self.iam_request_consumer.start()
async def on_iam_request(self, msg, consumer, flow):
id = None
try:
v = msg.value()
id = msg.properties()["id"]
logger.debug(
f"Handling IAM request {id} op={v.operation!r}"
)
resp = await self.iam.handle(v)
await self.iam_response_producer.send(
resp, properties={"id": id},
)
except Exception as e:
logger.error(
f"IAM request failed: {type(e).__name__}: {e}",
exc_info=True,
)
resp = IamResponse(
error=Error(type="internal-error", message=str(e)),
)
if id is not None:
await self.iam_response_producer.send(
resp, properties={"id": id},
)
@staticmethod
def add_args(parser):
AsyncProcessor.add_args(parser)
parser.add_argument(
"--iam-request-queue",
default=default_iam_request_queue,
help=f"IAM request queue (default: {default_iam_request_queue})",
)
parser.add_argument(
"--iam-response-queue",
default=default_iam_response_queue,
help=f"IAM response queue (default: {default_iam_response_queue})",
)
parser.add_argument(
"--bootstrap-mode",
default=None,
choices=["token", "bootstrap"],
help=(
"IAM bootstrap mode (required). "
"'token' = operator supplies the initial admin API "
"key via --bootstrap-token; auto-seeds on first start, "
"bootstrap operation refused. "
"'bootstrap' = bootstrap operation is live over the "
"bus until tables are populated; a token is generated "
"and returned by tg-bootstrap-iam. Unsafe to run "
"'bootstrap' mode with public exposure."
),
)
parser.add_argument(
"--bootstrap-token",
default=None,
help=(
"Initial admin API key plaintext, required when "
"--bootstrap-mode=token. Treat as a one-time "
"credential: the operator should rotate to a new key "
"and revoke this one after first use."
),
)
add_cassandra_args(parser)
def run():
Processor.launch(default_ident, __doc__)


@ -0,0 +1,422 @@
"""
IAM Cassandra table store.
Tables:
- iam_workspaces (id primary key)
- iam_users (id primary key) + iam_users_by_username lookup table
(workspace, username) -> id
- iam_api_keys (key_hash primary key) with secondary indexes on user_id and id
- iam_signing_keys (kid primary key): Ed25519 keypairs for JWT signing
See docs/tech-specs/iam-protocol.md for the wire-level context.
"""
import logging
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from ssl import SSLContext, PROTOCOL_TLSv1_2
from . cassandra_async import async_execute
logger = logging.getLogger(__name__)
class IamTableStore:
def __init__(
self,
cassandra_host, cassandra_username, cassandra_password,
keyspace,
):
self.keyspace = keyspace
logger.info("IAM: connecting to Cassandra...")
if isinstance(cassandra_host, str):
cassandra_host = [h.strip() for h in cassandra_host.split(",")]
if cassandra_username and cassandra_password:
ssl_context = SSLContext(PROTOCOL_TLSv1_2)
auth_provider = PlainTextAuthProvider(
username=cassandra_username, password=cassandra_password,
)
self.cluster = Cluster(
cassandra_host,
auth_provider=auth_provider,
ssl_context=ssl_context,
)
else:
self.cluster = Cluster(cassandra_host)
self.cassandra = self.cluster.connect()
logger.info("IAM: connected.")
self._ensure_schema()
self._prepare_statements()
def _ensure_schema(self):
# FIXME: Replication factor should be configurable.
self.cassandra.execute(f"""
create keyspace if not exists {self.keyspace}
with replication = {{
'class' : 'SimpleStrategy',
'replication_factor' : 1
}};
""")
self.cassandra.set_keyspace(self.keyspace)
self.cassandra.execute("""
CREATE TABLE IF NOT EXISTS iam_workspaces (
id text PRIMARY KEY,
name text,
enabled boolean,
created timestamp
);
""")
self.cassandra.execute("""
CREATE TABLE IF NOT EXISTS iam_users (
id text PRIMARY KEY,
workspace text,
username text,
name text,
email text,
password_hash text,
roles set<text>,
enabled boolean,
must_change_password boolean,
created timestamp
);
""")
self.cassandra.execute("""
CREATE TABLE IF NOT EXISTS iam_users_by_username (
workspace text,
username text,
user_id text,
PRIMARY KEY ((workspace), username)
);
""")
self.cassandra.execute("""
CREATE TABLE IF NOT EXISTS iam_api_keys (
key_hash text PRIMARY KEY,
id text,
user_id text,
name text,
prefix text,
expires timestamp,
created timestamp,
last_used timestamp
);
""")
self.cassandra.execute("""
CREATE INDEX IF NOT EXISTS iam_api_keys_user_id_idx
ON iam_api_keys (user_id);
""")
self.cassandra.execute("""
CREATE INDEX IF NOT EXISTS iam_api_keys_id_idx
ON iam_api_keys (id);
""")
self.cassandra.execute("""
CREATE TABLE IF NOT EXISTS iam_signing_keys (
kid text PRIMARY KEY,
private_pem text,
public_pem text,
created timestamp,
retired timestamp
);
""")
logger.info("IAM: Cassandra schema OK.")
def _prepare_statements(self):
c = self.cassandra
self.put_workspace_stmt = c.prepare("""
INSERT INTO iam_workspaces (id, name, enabled, created)
VALUES (?, ?, ?, ?)
""")
self.get_workspace_stmt = c.prepare("""
SELECT id, name, enabled, created FROM iam_workspaces
WHERE id = ?
""")
self.list_workspaces_stmt = c.prepare("""
SELECT id, name, enabled, created FROM iam_workspaces
""")
self.put_user_stmt = c.prepare("""
INSERT INTO iam_users (
id, workspace, username, name, email, password_hash,
roles, enabled, must_change_password, created
)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""")
self.get_user_stmt = c.prepare("""
SELECT id, workspace, username, name, email, password_hash,
roles, enabled, must_change_password, created
FROM iam_users WHERE id = ?
""")
self.list_users_by_workspace_stmt = c.prepare("""
SELECT id, workspace, username, name, email, password_hash,
roles, enabled, must_change_password, created
FROM iam_users WHERE workspace = ? ALLOW FILTERING
""")
self.put_username_lookup_stmt = c.prepare("""
INSERT INTO iam_users_by_username (workspace, username, user_id)
VALUES (?, ?, ?)
""")
self.get_user_id_by_username_stmt = c.prepare("""
SELECT user_id FROM iam_users_by_username
WHERE workspace = ? AND username = ?
""")
self.delete_username_lookup_stmt = c.prepare("""
DELETE FROM iam_users_by_username
WHERE workspace = ? AND username = ?
""")
self.delete_user_stmt = c.prepare("""
DELETE FROM iam_users WHERE id = ?
""")
self.put_api_key_stmt = c.prepare("""
INSERT INTO iam_api_keys (
key_hash, id, user_id, name, prefix, expires,
created, last_used
)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""")
self.get_api_key_by_hash_stmt = c.prepare("""
SELECT key_hash, id, user_id, name, prefix, expires,
created, last_used
FROM iam_api_keys WHERE key_hash = ?
""")
self.get_api_key_by_id_stmt = c.prepare("""
SELECT key_hash, id, user_id, name, prefix, expires,
created, last_used
FROM iam_api_keys WHERE id = ?
""")
self.list_api_keys_by_user_stmt = c.prepare("""
SELECT key_hash, id, user_id, name, prefix, expires,
created, last_used
FROM iam_api_keys WHERE user_id = ?
""")
self.delete_api_key_stmt = c.prepare("""
DELETE FROM iam_api_keys WHERE key_hash = ?
""")
self.put_signing_key_stmt = c.prepare("""
INSERT INTO iam_signing_keys (
kid, private_pem, public_pem, created, retired
)
VALUES (?, ?, ?, ?, ?)
""")
self.list_signing_keys_stmt = c.prepare("""
SELECT kid, private_pem, public_pem, created, retired
FROM iam_signing_keys
""")
self.retire_signing_key_stmt = c.prepare("""
UPDATE iam_signing_keys SET retired = ? WHERE kid = ?
""")
self.update_user_profile_stmt = c.prepare("""
UPDATE iam_users
SET name = ?, email = ?, roles = ?, enabled = ?,
must_change_password = ?
WHERE id = ?
""")
self.update_user_password_stmt = c.prepare("""
UPDATE iam_users
SET password_hash = ?, must_change_password = ?
WHERE id = ?
""")
self.update_user_enabled_stmt = c.prepare("""
UPDATE iam_users SET enabled = ? WHERE id = ?
""")
self.update_workspace_stmt = c.prepare("""
UPDATE iam_workspaces SET name = ?, enabled = ?
WHERE id = ?
""")
# ------------------------------------------------------------------
# Workspaces
# ------------------------------------------------------------------
async def put_workspace(self, id, name, enabled, created):
await async_execute(
self.cassandra, self.put_workspace_stmt,
(id, name, enabled, created),
)
async def get_workspace(self, id):
rows = await async_execute(
self.cassandra, self.get_workspace_stmt, (id,),
)
return rows[0] if rows else None
async def list_workspaces(self):
return await async_execute(
self.cassandra, self.list_workspaces_stmt,
)

    # ------------------------------------------------------------------
    # Users
    # ------------------------------------------------------------------

    async def put_user(
            self, id, workspace, username, name, email, password_hash,
            roles, enabled, must_change_password, created,
    ):
        # Two separate writes: the user row, then the
        # (workspace, username) -> id lookup row.  They are not issued
        # as a batch, so a failure between them can leave a user row
        # without its lookup entry.
        await async_execute(
            self.cassandra, self.put_user_stmt,
            (
                id, workspace, username, name, email, password_hash,
                set(roles) if roles else set(),
                enabled, must_change_password, created,
            ),
        )
        await async_execute(
            self.cassandra, self.put_username_lookup_stmt,
            (workspace, username, id),
        )

    async def get_user(self, id):
        rows = await async_execute(
            self.cassandra, self.get_user_stmt, (id,),
        )
        return rows[0] if rows else None

    async def get_user_id_by_username(self, workspace, username):
        rows = await async_execute(
            self.cassandra, self.get_user_id_by_username_stmt,
            (workspace, username),
        )
        # First column of the first matching lookup row, or None.
        return rows[0][0] if rows else None

    async def list_users_by_workspace(self, workspace):
        return await async_execute(
            self.cassandra, self.list_users_by_workspace_stmt, (workspace,),
        )

    async def delete_user(self, id):
        # Removes the user row only; the username lookup row is removed
        # separately via delete_username_lookup.
        await async_execute(
            self.cassandra, self.delete_user_stmt, (id,),
        )

    async def delete_username_lookup(self, workspace, username):
        await async_execute(
            self.cassandra, self.delete_username_lookup_stmt,
            (workspace, username),
        )

    # ------------------------------------------------------------------
    # API keys
    # ------------------------------------------------------------------

    async def put_api_key(
            self, key_hash, id, user_id, name, prefix, expires,
            created, last_used,
    ):
        await async_execute(
            self.cassandra, self.put_api_key_stmt,
            (key_hash, id, user_id, name, prefix, expires,
             created, last_used),
        )

    async def get_api_key_by_hash(self, key_hash):
        rows = await async_execute(
            self.cassandra, self.get_api_key_by_hash_stmt, (key_hash,),
        )
        return rows[0] if rows else None

    async def get_api_key_by_id(self, id):
        rows = await async_execute(
            self.cassandra, self.get_api_key_by_id_stmt, (id,),
        )
        return rows[0] if rows else None

    async def list_api_keys_by_user(self, user_id):
        return await async_execute(
            self.cassandra, self.list_api_keys_by_user_stmt, (user_id,),
        )

    async def delete_api_key(self, key_hash):
        await async_execute(
            self.cassandra, self.delete_api_key_stmt, (key_hash,),
        )

    # ------------------------------------------------------------------
    # Signing keys
    # ------------------------------------------------------------------

    async def put_signing_key(self, kid, private_pem, public_pem,
                              created, retired):
        await async_execute(
            self.cassandra, self.put_signing_key_stmt,
            (kid, private_pem, public_pem, created, retired),
        )

    async def list_signing_keys(self):
        return await async_execute(
            self.cassandra, self.list_signing_keys_stmt,
        )

    async def retire_signing_key(self, kid, retired):
        # Bind order follows the statement:
        # SET retired = ? WHERE kid = ?
        await async_execute(
            self.cassandra, self.retire_signing_key_stmt,
            (retired, kid),
        )

    # ------------------------------------------------------------------
    # User partial updates
    # ------------------------------------------------------------------

    # In each update below the id binds last, matching the position of
    # the WHERE clause in the prepared statement.

    async def update_user_profile(
            self, id, name, email, roles, enabled, must_change_password,
    ):
        await async_execute(
            self.cassandra, self.update_user_profile_stmt,
            (
                name, email,
                set(roles) if roles else set(),
                enabled, must_change_password, id,
            ),
        )

    async def update_user_password(
            self, id, password_hash, must_change_password,
    ):
        await async_execute(
            self.cassandra, self.update_user_password_stmt,
            (password_hash, must_change_password, id),
        )

    async def update_user_enabled(self, id, enabled):
        await async_execute(
            self.cassandra, self.update_user_enabled_stmt,
            (enabled, id),
        )

    # ------------------------------------------------------------------
    # Workspace updates
    # ------------------------------------------------------------------

    async def update_workspace(self, id, name, enabled):
        await async_execute(
            self.cassandra, self.update_workspace_stmt,
            (name, enabled, id),
        )

    # ------------------------------------------------------------------
    # Bootstrap helpers
    # ------------------------------------------------------------------

    async def any_workspace_exists(self):
        # Reads the full workspace list and reports whether it is
        # non-empty; no LIMIT 1 query is used here.
        rows = await self.list_workspaces()
        return bool(rows)