refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)

The gateway no longer holds any policy state (capability sets, role
definitions, workspace-scope rules).  Per the IAM contract it asks
the regime "may this identity perform this capability on this
resource?" on every request.  That moves the OSS role-based regime
entirely into iam-svc, which can be replaced (SSO, ABAC, ReBAC)
without changing the gateway, the wire protocol, or backend
services.

Contract:
- authenticate(credential) -> Identity (handle, workspace,
  principal_id, source).  No roles, claims, or other policy state
  surfaces to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
  ttl).  Cached per-decision (regime TTL clamped above; fail-closed
  on regime errors).
- authorise_many is available as a fan-out variant.
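
The contract surface above can be sketched as a small Python
protocol.  This is a hypothetical rendering of the shapes named in
this message, not the project's actual module:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Identity:
    # Fixed identity shape; no roles or claims ever surface here.
    handle: str        # opaque, quoted back to authorise()
    workspace: str     # default-fill workspace, never a policy input
    principal_id: str  # stable identifier for audit logs
    source: str        # "api-key" | "jwt"


class IamRegime(Protocol):
    async def authenticate(self, credential: str) -> Identity: ...

    async def authorise(
        self, identity: Identity, capability: str,
        resource: dict, parameters: dict,
    ) -> tuple[bool, int]:
        """Returns (allow, suggested_cache_ttl_seconds)."""
        ...
```

Any regime satisfying this protocol can sit behind the gateway; the
gateway clamps the returned TTL and fails closed on errors.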

Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
  list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
  (config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
  flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
  authenticated users could hit any service kind.
85 operations registered across the surface.
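
The key shapes behind those routes can be sketched as small helpers.
The helper names are assumptions; only the key formats come from this
message, and the real entries live in the gateway's registry:

```python
def global_key(kind: str, operation: str) -> str:
    # /api/v1/{kind}: kind from the URL, operation from the body.
    return f"{kind}:{operation}"


def flow_service_key(kind: str) -> str:
    # /api/v1/flow/{flow}/service/{kind}, and per-frame WS Mux.
    return f"flow-service:{kind}"


def flow_stream_key(direction: str, kind: str) -> str:
    # /api/v1/flow/{flow}/{import,export}/{kind}.
    return f"flow-{direction}:{kind}"
```

For example, a POST to /api/v1/config with operation "get" resolves
to the registry key "config:get" before authorise() is asked.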

The JWT carries identity only: sub + workspace.  The roles claim is
gone; the gateway never reads policy state from a credential.
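
Concretely, claims handling reduces to something like this (claim
values hypothetical):

```python
# Hypothetical claims payload: identity only, no policy state.
claims = {"sub": "user-42", "workspace": "research"}

sub = claims.get("sub", "")
ws = claims.get("workspace", "")

# Both identity fields are required; a legacy "roles" claim, if a
# stale token still carries one, is ignored rather than rejected.
assert sub and ws
ignored = claims.get("roles")  # never consulted
```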

The three coarse *_KIND_CAPABILITY maps are removed.  The registry is
the only source of truth for the capability + resource shape of an
operation.  Tests migrated to the new Identity shape and to
authorise()-mocked auth doubles.
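
A test double of the kind implied here might look like the following
(a sketch under assumed names; the actual test helpers are not shown
in this diff):

```python
from dataclasses import dataclass


@dataclass
class Identity:  # minimal stand-in for the gateway's Identity shape
    handle: str
    workspace: str
    principal_id: str
    source: str


class AllowAllAuth:
    """Auth double: authenticates everyone, allows everything.
    Mirrors the contract's calling convention: authorise() returns
    None on allow and raises on deny (here: never)."""

    async def authenticate(self, request):
        return Identity("tester", "ws-test", "tester", "api-key")

    async def authorise(self, identity, capability, resource, parameters):
        return None


class DenyAllAuth(AllowAllAuth):
    async def authorise(self, identity, capability, resource, parameters):
        # Stands in for the gateway's HTTPForbidden on a deny.
        raise PermissionError("access denied")
```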

Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
This commit is contained in:
cybermaggedon 2026-04-28 16:19:41 +01:00 committed by GitHub
parent 9f2d9adcb1
commit 5e28d3cce0
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
24 changed files with 2359 additions and 587 deletions

View file

@@ -1,15 +1,16 @@
"""
IAM-backed authentication for the API gateway.
IAM-backed authentication and authorisation for the API gateway.
Replaces the legacy GATEWAY_SECRET shared-token Authenticator. The
gateway is now stateless with respect to credentials: it either
verifies a JWT locally using the active IAM signing public key, or
resolves an API key by hash with a short local cache backed by the
IAM service.
The gateway delegates both authentication ("who is this caller?")
and authorisation ("may they do this?") to the IAM regime via the
contract specified in docs/tech-specs/iam-contract.md. No regime-
specific policy (roles, scopes, claims) lives in the gateway.
Identity returned by authenticate() is the (user_id, workspace,
roles) triple that the rest of the gateway (capability checks,
workspace resolver, audit logging) needs.
- Authentication: API keys are resolved by IAM; JWTs are validated
locally against the cached signing public key.
- Authorisation: every per-request decision is asked of IAM via
``authorise(identity, capability, resource, parameters)``, with
results cached for the TTL the regime returns.
"""
import asyncio
@@ -19,7 +20,7 @@ import json
import logging
import time
import uuid
from dataclasses import dataclass
from dataclasses import dataclass, field
from aiohttp import web
@@ -37,12 +38,34 @@ logger = logging.getLogger("auth")
API_KEY_CACHE_TTL = 60 # seconds
# Upper bound on cache TTL the gateway honours for an authorisation
# decision, regardless of what the regime suggested. Caps the
# revocation latency window.
AUTHZ_CACHE_TTL_MAX = 60 # seconds
@dataclass
class Identity:
user_id: str
"""The gateway-side surface of an authenticated caller.
Per the IAM contract this is a small fixed shape; regime-internal
state (roles, claims, group memberships) is reachable only via
the regime's ``authorise`` operation. The gateway itself never
reads policy from this object.
"""
# Opaque handle, quoted back when calling ``authorise``. For
# the OSS regime this is the user record's id; the gateway
# treats it as a string with no semantic content.
handle: str
# The workspace this credential authenticates to. Used by the
# gateway as the default-fill-in for operations that omit a
# workspace. Never used as policy input.
workspace: str
roles: list
# Stable identifier for audit logs. In OSS this is the same
# value as ``handle``; not assumed equal in the contract.
principal_id: str
# How the credential was presented. Non-policy; useful for
# logs / metrics only.
source: str # "api-key" | "jwt"
@@ -111,6 +134,13 @@ class IamAuth:
self._key_cache = {}
self._key_cache_lock = asyncio.Lock()
# Authorisation decision cache: hash(handle, capability,
# resource, parameters) -> (allow_bool, expires_ts). Holds
# both allows and denies — denies cached briefly to avoid
# hammering iam-svc with repeated rejected attempts.
self._authz_cache: dict[str, tuple[bool, float]] = {}
self._authz_cache_lock = asyncio.Lock()
# ------------------------------------------------------------------
# Short-lived client helper. Mirrors the pattern used by the
# bootstrap framework and AsyncProcessor: a fresh uuid suffix per
@@ -221,12 +251,13 @@ class IamAuth:
sub = claims.get("sub", "")
ws = claims.get("workspace", "")
roles = list(claims.get("roles", []))
if not sub or not ws:
raise _auth_failure()
# JWT carries no policy state under the IAM contract;
# any roles / claims field is ignored here.
return Identity(
user_id=sub, workspace=ws, roles=roles, source="jwt",
handle=sub, workspace=ws, principal_id=sub, source="jwt",
)
async def _resolve_api_key(self, plaintext):
@@ -245,7 +276,10 @@ class IamAuth:
try:
async def _call(client):
return await client.resolve_api_key(plaintext)
user_id, workspace, roles = await self._with_client(_call)
# ``roles`` is returned by the OSS regime as a hint
# but is not consulted by the gateway; all policy
# decisions go through ``authorise``.
user_id, workspace, _roles = await self._with_client(_call)
except Exception as e:
logger.debug(
f"API key resolution failed: "
@@ -257,8 +291,81 @@
raise _auth_failure()
identity = Identity(
user_id=user_id, workspace=workspace,
roles=list(roles), source="api-key",
handle=user_id, workspace=workspace,
principal_id=user_id, source="api-key",
)
self._key_cache[h] = (identity, now + API_KEY_CACHE_TTL)
return identity
# ------------------------------------------------------------------
# Authorisation
# ------------------------------------------------------------------
@staticmethod
def _authz_cache_key(handle, capability, resource, parameters):
payload = json.dumps(
{
"h": handle,
"c": capability,
"r": resource or {},
"p": parameters or {},
},
sort_keys=True,
separators=(",", ":"),
)
return hashlib.sha256(payload.encode("utf-8")).hexdigest()
async def authorise(self, identity, capability, resource, parameters):
"""Ask the IAM regime whether ``identity`` may perform
``capability`` on ``resource`` given ``parameters``.
Caches the decision for the regime's suggested TTL, clamped
above by ``AUTHZ_CACHE_TTL_MAX``. Both allow and deny
decisions are cached (denies briefly, to avoid hammering
iam-svc with repeated rejected attempts).
Raises ``HTTPForbidden`` (403 / "access denied") on a deny
decision. Raises ``HTTPUnauthorized`` (401 / "auth failure")
if the IAM service errors out, failing closed."""
key = self._authz_cache_key(
identity.handle, capability, resource, parameters,
)
now = time.time()
cached = self._authz_cache.get(key)
if cached and cached[1] > now:
allow, _ = cached
if not allow:
raise _access_denied()
return
async with self._authz_cache_lock:
cached = self._authz_cache.get(key)
if cached and cached[1] > now:
allow, _ = cached
if not allow:
raise _access_denied()
return
try:
async def _call(client):
return await client.authorise(
identity.handle, capability,
resource or {}, parameters or {},
)
allow, ttl = await self._with_client(_call)
except Exception as e:
logger.warning(
f"authorise failed: {type(e).__name__}: {e}; "
f"failing closed for "
f"{identity.principal_id!r} cap={capability!r}"
)
raise _auth_failure()
ttl = max(0, min(int(ttl or 0), AUTHZ_CACHE_TTL_MAX))
self._authz_cache[key] = (bool(allow), now + ttl)
if not allow:
raise _access_denied()
return

View file

@@ -1,36 +1,23 @@
"""
Capability vocabulary, role definitions, and authorisation helpers.
Gateway-side authorisation entry points.
See docs/tech-specs/capabilities.md for the authoritative description.
The data here is the OSS bundle table in that spec. Enterprise
editions may replace this module with their own role table; the
vocabulary (capability strings) is shared.
Under the IAM contract (see docs/tech-specs/iam-contract.md) the
gateway holds *no* policy state. Roles, capability sets, and
workspace-scope rules all live in the IAM regime (iam-svc for OSS).
This module is the thin surface the gateway uses to ask the regime
for a decision:
Role model
----------
A role has two dimensions:
- ``PUBLIC`` / ``AUTHENTICATED`` sentinels for endpoints that don't
go through capability-based authorisation.
- :func:`enforce`: authenticate only, then ask the regime.
- :func:`enforce_workspace`: default-fill the workspace from the
caller's bound workspace and ask the regime, with the workspace
treated as the resource address.
1. **capability set**: which operations the role grants.
2. **workspace scope**: which workspaces the role is active in.
The authorisation question is: *given the caller's roles, a required
capability, and a target workspace, does any role grant the
capability AND apply to the target workspace?*
Workspace scope values recognised here:
- ``"assigned"``: the role applies only to the caller's own
assigned workspace (stored on their user record).
- ``"*"``: the role applies to every workspace.
Enterprise editions can add richer scopes (explicit permitted-set,
patterns, etc.) without changing the wire protocol.
Sentinels
---------
- ``PUBLIC``: endpoint requires no authentication.
- ``AUTHENTICATED``: endpoint requires a valid identity, no
specific capability.
The capability strings themselves are an open vocabulary; see
docs/tech-specs/capabilities.md.  The gateway does not validate them
beyond passing them through; an unknown capability simply produces a
deny verdict from the regime.
"""
from aiohttp import web
@@ -40,125 +27,6 @@ PUBLIC = "__public__"
AUTHENTICATED = "__authenticated__"
# Capability vocabulary. Mirrors the "Capability list" tables in
# capabilities.md. Kept as a set so the gateway can fail-closed on
# an endpoint that declares an unknown capability.
KNOWN_CAPABILITIES = {
# Data plane
"agent",
"graph:read", "graph:write",
"documents:read", "documents:write",
"rows:read", "rows:write",
"llm",
"embeddings",
"mcp",
# Control plane
"config:read", "config:write",
"flows:read", "flows:write",
"users:read", "users:write", "users:admin",
"keys:self", "keys:admin",
"workspaces:admin",
"iam:admin",
"metrics:read",
"collections:read", "collections:write",
"knowledge:read", "knowledge:write",
}
# Capability sets used below.
_READER_CAPS = {
"agent",
"graph:read",
"documents:read",
"rows:read",
"llm",
"embeddings",
"mcp",
"config:read",
"flows:read",
"collections:read",
"knowledge:read",
"keys:self",
}
_WRITER_CAPS = _READER_CAPS | {
"graph:write",
"documents:write",
"rows:write",
"collections:write",
"knowledge:write",
}
_ADMIN_CAPS = _WRITER_CAPS | {
"config:write",
"flows:write",
"users:read", "users:write", "users:admin",
"keys:admin",
"workspaces:admin",
"iam:admin",
"metrics:read",
}
# Role definitions. Each role has a capability set and a workspace
# scope. Enterprise overrides this mapping.
ROLE_DEFINITIONS = {
"reader": {
"capabilities": _READER_CAPS,
"workspace_scope": "assigned",
},
"writer": {
"capabilities": _WRITER_CAPS,
"workspace_scope": "assigned",
},
"admin": {
"capabilities": _ADMIN_CAPS,
"workspace_scope": "*",
},
}
def _scope_permits(role_name, target_workspace, assigned_workspace):
"""Does the given role apply to ``target_workspace``?"""
role = ROLE_DEFINITIONS.get(role_name)
if role is None:
return False
scope = role["workspace_scope"]
if scope == "*":
return True
if scope == "assigned":
return target_workspace == assigned_workspace
# Future scope types (lists, patterns) extend here.
return False
def check(identity, capability, target_workspace=None):
"""Is ``identity`` permitted to invoke ``capability`` on
``target_workspace``?
Passes iff some role held by the caller both (a) grants
``capability`` and (b) is active in ``target_workspace``.
``target_workspace`` defaults to the caller's assigned workspace,
which makes this function usable for system-level operations and
for authenticated endpoints that don't take a workspace argument
(the call collapses to "do any of my roles grant this cap?")."""
if capability not in KNOWN_CAPABILITIES:
return False
target = target_workspace or identity.workspace
for role_name in identity.roles:
role = ROLE_DEFINITIONS.get(role_name)
if role is None:
continue
if capability not in role["capabilities"]:
continue
if _scope_permits(role_name, target, identity.workspace):
return True
return False
def access_denied():
return web.HTTPForbidden(
text='{"error":"access denied"}',
@@ -174,21 +42,19 @@ def auth_failure():
async def enforce(request, auth, capability):
"""Authenticate + capability-check for endpoints that carry no
workspace dimension on the request (metrics, i18n, etc.).
"""Authenticate the caller and (for non-sentinel capabilities)
ask the IAM regime whether they may invoke ``capability``.
For endpoints that carry a workspace field on the body, call
:func:`enforce_workspace` *after* parsing the body to validate
the workspace and re-check the capability in that scope. Most
endpoints do both.
The resource is system-level (``{}``) and parameters are empty;
use :func:`enforce_workspace` for workspace-scoped endpoints, or
drive authorisation through the operation registry for richer
cases.
- ``PUBLIC``: no authentication, returns ``None``.
- ``AUTHENTICATED``: any valid identity.
- capability string: identity must have it, checked against the
caller's assigned workspace (adequate for endpoints whose
capability is system-level, e.g. ``metrics:read``, or where
the real workspace-aware check happens in
:func:`enforce_workspace` after body parsing)."""
- ``PUBLIC``: returns ``None``; no authentication.
- ``AUTHENTICATED``: returns the ``Identity``; no authorisation.
- capability string: returns the ``Identity`` if the regime
allows; raises ``HTTPForbidden`` otherwise.
"""
if capability == PUBLIC:
return None
@@ -197,42 +63,38 @@ async def enforce(request, auth, capability):
if capability == AUTHENTICATED:
return identity
if not check(identity, capability):
raise access_denied()
await auth.authorise(identity, capability, {}, {})
return identity
def enforce_workspace(data, identity, capability=None):
"""Resolve + validate the workspace on a request body.
async def enforce_workspace(data, identity, auth, capability=None):
"""Default-fill the workspace on a request body and (optionally)
authorise the caller for ``capability`` against that workspace.
- Target workspace = ``data["workspace"]`` if supplied, else the
caller's assigned workspace.
- At least one of the caller's roles must (a) be active in the
target workspace and, if ``capability`` is given, (b) grant
``capability``. Otherwise 403.
caller's bound workspace.
- On success, ``data["workspace"]`` is overwritten with the
resolved value; callers can rely on the outgoing message
having the gateway's chosen workspace rather than any
caller-supplied value.
resolved value so downstream code sees a single canonical
address.
- When ``capability`` is given, the regime is asked whether the
caller may invoke ``capability`` on ``{workspace: target}``.
Raises ``HTTPForbidden`` on a deny.
For ``capability=None`` the workspace scope alone is checked;
useful when the body has a workspace but the endpoint already
passed its capability check (e.g. via :func:`enforce`).
For ``capability=None`` no authorisation call is made; the
caller has presumably already authorised via :func:`enforce`
(handy for endpoints that authorise once then resolve workspace
on the body before forwarding).
"""
if not isinstance(data, dict):
return data
requested = data.get("workspace", "")
target = requested or identity.workspace
data["workspace"] = target
for role_name in identity.roles:
role = ROLE_DEFINITIONS.get(role_name)
if role is None:
continue
if capability is not None and capability not in role["capabilities"]:
continue
if _scope_permits(role_name, target, identity.workspace):
data["workspace"] = target
return data
if capability is not None:
await auth.authorise(
identity, capability, {"workspace": target}, {},
)
raise access_denied()
return data

View file

@@ -121,20 +121,45 @@ class Mux:
})
return
# Workspace resolution. Role workspace scope determines
# which target workspaces are permitted. The resolved
# value is written to both the envelope and the inner
# request payload so clients don't have to repeat it
# per-message (same convenience HTTP callers get via
# enforce_workspace).
# Per-service capability gating. Resolved through the
# operation registry so the WS path matches what HTTP
# callers see — same authority, same caps. Service
# kinds that aren't registered are refused.
from ..registry import lookup as _registry_lookup
from ..capabilities import enforce_workspace
from aiohttp import web as _web
service = data.get("service", "")
op = _registry_lookup(f"flow-service:{service}")
if op is None:
await self.ws.send_json({
"id": request_id,
"error": {
"message": "unknown service",
"type": "unknown-service",
},
"complete": True,
})
return
# Workspace + flow form the resource address for a
# flow-level service call. Resolve workspace first
# (default-fill from the caller's bound workspace),
# then ask the regime to authorise the service-level
# capability against that {workspace, flow} resource.
try:
enforce_workspace(data, self.identity)
await enforce_workspace(data, self.identity, self.auth)
inner = data.get("request")
if isinstance(inner, dict):
enforce_workspace(inner, self.identity)
await enforce_workspace(inner, self.identity, self.auth)
resource = {
"workspace": data.get("workspace", ""),
"flow": data.get("flow", ""),
}
await self.auth.authorise(
self.identity, op.capability, resource, {},
)
except _web.HTTPForbidden:
await self.ws.send_json({
"id": request_id,
@@ -145,6 +170,16 @@ class Mux:
"complete": True,
})
return
except _web.HTTPUnauthorized:
await self.ws.send_json({
"id": request_id,
"error": {
"message": "auth failure",
"type": "auth-required",
},
"complete": True,
})
return
workspace = data["workspace"]

View file

@@ -97,7 +97,7 @@ class AuthEndpoints:
)
req = {
"operation": "change-password",
"user_id": identity.user_id,
"user_id": identity.handle,
"password": body.get("current_password", ""),
"new_password": body.get("new_password", ""),
}

View file

@@ -36,7 +36,7 @@ class ConstantEndpoint:
data = await request.json()
if identity is not None:
enforce_workspace(data, identity)
await enforce_workspace(data, identity, self.auth)
async def responder(x, fin):
pass

View file

@@ -0,0 +1,106 @@
"""
Registry-driven /api/v1/iam endpoint.
The gateway no longer gates IAM management with a single coarse
``users:admin`` capability. Instead, each operation declares its
own capability + resource shape in the registry (``registry.py``);
this endpoint reads the body's ``operation`` field, looks up the
declaration, and asks the IAM regime to authorise the call.
Operations not in the registry produce a 400 ``unknown operation``.
This is the gateway's primary mechanism for fail-closed gating of
the IAM surface the registry is the source of truth.
"""
import logging
from aiohttp import web
from .. capabilities import (
PUBLIC, AUTHENTICATED, auth_failure,
)
from .. registry import lookup, RequestContext
logger = logging.getLogger("iam-endpoint")
logger.setLevel(logging.INFO)
class IamEndpoint:
"""POST /api/v1/iam — generic forwarder gated by the operation
registry. The IAM dispatcher (``iam_dispatcher``) forwards the
body verbatim to iam-svc once authorisation succeeds."""
def __init__(self, endpoint_path, auth, dispatcher):
self.path = endpoint_path
self.auth = auth
self.dispatcher = dispatcher
async def start(self):
pass
def add_routes(self, app):
app.add_routes([web.post(self.path, self.handle)])
async def handle(self, request):
try:
body = await request.json()
except Exception:
return web.json_response(
{"error": "invalid json"}, status=400,
)
if not isinstance(body, dict):
return web.json_response(
{"error": "body must be an object"}, status=400,
)
op_name = body.get("operation", "")
op = lookup(op_name)
if op is None:
return web.json_response(
{"error": "unknown operation"}, status=400,
)
# Authentication: required for everything except PUBLIC.
identity = None
if op.capability != PUBLIC:
try:
identity = await self.auth.authenticate(request)
except web.HTTPException:
raise
# Authorisation: capability sentinels short-circuit the
# regime call; capability strings go through authorise().
if op.capability not in (PUBLIC, AUTHENTICATED):
ctx = RequestContext(
body=body,
match_info=dict(request.match_info),
identity=identity,
)
try:
resource = op.extract_resource(ctx)
parameters = op.extract_parameters(ctx)
except Exception as e:
logger.warning(
f"extractor failed for {op_name!r}: "
f"{type(e).__name__}: {e}"
)
return web.json_response(
{"error": "bad request"}, status=400,
)
await self.auth.authorise(
identity, op.capability, resource, parameters,
)
async def responder(x, fin):
pass
try:
resp = await self.dispatcher.process(body, responder)
except web.HTTPException:
raise
except Exception as e:
logger.error(f"Exception: {e}", exc_info=True)
return web.json_response({"error": str(e)})
return web.json_response(resp)

View file

@@ -9,90 +9,44 @@ from . socket import SocketEndpoint
from . metrics import MetricsEndpoint
from . i18n import I18nPackEndpoint
from . auth_endpoints import AuthEndpoints
from . iam_endpoint import IamEndpoint
from . registry_endpoint import RegistryRoutedVariableEndpoint
from .. capabilities import PUBLIC, AUTHENTICATED
from .. capabilities import PUBLIC, AUTHENTICATED, auth_failure
from .. registry import lookup as _registry_lookup, RequestContext
from .. dispatch.manager import DispatcherManager
# Capability required for each kind on the /api/v1/{kind} generic
# endpoint (global services). Coarse gating — the IAM bundle split
# of "read vs write" per admin subsystem is not applied here because
# this endpoint forwards an opaque operation in the body. Writes
# are the upper bound on what the endpoint can do, so we gate on
# the write/admin capability.
GLOBAL_KIND_CAPABILITY = {
"config": "config:write",
"flow": "flows:write",
"librarian": "documents:write",
"knowledge": "knowledge:write",
"collection-management": "collections:write",
# IAM endpoints land on /api/v1/iam and require the admin bundle.
# Login / bootstrap / change-password are served by
# AuthEndpoints, which handle their own gating (PUBLIC /
# AUTHENTICATED).
"iam": "users:admin",
}
# /api/v1/{kind} (config / flow / librarian / knowledge /
# collection-management), /api/v1/iam, and /api/v1/flow/{flow}/...
# routes are all gated per-operation by the registry, not by a
# per-kind capability map. Login / bootstrap / change-password are
# served by AuthEndpoints with their own PUBLIC / AUTHENTICATED
# sentinels.
# Capability required for each kind on the
# /api/v1/flow/{flow}/service/{kind} endpoint (per-flow data-plane).
FLOW_KIND_CAPABILITY = {
"agent": "agent",
"text-completion": "llm",
"prompt": "llm",
"mcp-tool": "mcp",
"graph-rag": "graph:read",
"document-rag": "documents:read",
"embeddings": "embeddings",
"graph-embeddings": "graph:read",
"document-embeddings": "documents:read",
"triples": "graph:read",
"rows": "rows:read",
"nlp-query": "rows:read",
"structured-query": "rows:read",
"structured-diag": "rows:read",
"row-embeddings": "rows:read",
"sparql": "graph:read",
}
# Capability for the streaming flow import/export endpoints,
# keyed by the "kind" URL segment.
FLOW_IMPORT_CAPABILITY = {
"triples": "graph:write",
"graph-embeddings": "graph:write",
"document-embeddings": "documents:write",
"entity-contexts": "documents:write",
"rows": "rows:write",
}
FLOW_EXPORT_CAPABILITY = {
"triples": "graph:read",
"graph-embeddings": "graph:read",
"document-embeddings": "documents:read",
"entity-contexts": "documents:read",
}
from .. capabilities import enforce, enforce_workspace
import logging as _mgr_logging
_mgr_logger = _mgr_logging.getLogger("endpoint")
class _RoutedVariableEndpoint:
"""HTTP endpoint whose required capability is looked up per
request from the URL's ``kind`` parameter. Used for the two
generic dispatch paths (``/api/v1/{kind}`` and
``/api/v1/flow/{flow}/service/{kind}``). Self-contained rather
than subclassing ``VariableEndpoint`` to avoid mutating shared
state across concurrent requests."""
"""HTTP endpoint that gates per request via the operation
registry. The URL's ``kind`` parameter combined with a fixed
``registry_prefix`` yields the registry key; e.g. prefix
``flow-service`` and kind ``agent`` looks up
``flow-service:agent``.
def __init__(self, endpoint_path, auth, dispatcher, capability_map):
Used for ``/api/v1/flow/{flow}/service/{kind}`` (per-flow
data-plane services). ``/api/v1/{kind}`` (workspace-level
global services) goes through ``RegistryRoutedVariableEndpoint``
which discriminates on body operation as well as URL kind."""
def __init__(self, endpoint_path, auth, dispatcher, registry_prefix):
self.path = endpoint_path
self.auth = auth
self.dispatcher = dispatcher
self._capability_map = capability_map
self._registry_prefix = registry_prefix
async def start(self):
pass
@@ -102,18 +56,26 @@ class _RoutedVariableEndpoint:
async def handle(self, request):
kind = request.match_info.get("kind", "")
cap = self._capability_map.get(kind)
if cap is None:
op = _registry_lookup(f"{self._registry_prefix}:{kind}")
if op is None:
return web.json_response(
{"error": "unknown kind"}, status=404,
)
identity = await enforce(request, self.auth, cap)
identity = await self.auth.authenticate(request)
try:
data = await request.json()
if identity is not None:
enforce_workspace(data, identity)
ctx = RequestContext(
body=data if isinstance(data, dict) else {},
match_info=dict(request.match_info),
identity=identity,
)
resource = op.extract_resource(ctx)
parameters = op.extract_parameters(ctx)
await self.auth.authorise(
identity, op.capability, resource, parameters,
)
async def responder(x, fin):
pass
@@ -131,15 +93,15 @@
class _RoutedSocketEndpoint:
"""WebSocket endpoint whose required capability is looked up per
request from the URL's ``kind`` parameter. Used for the flow
import/export streaming endpoints."""
"""WebSocket endpoint gated per request via the operation
registry. Like ``_RoutedVariableEndpoint`` but for the
streaming flow import / export socket paths."""
def __init__(self, endpoint_path, auth, dispatcher, capability_map):
def __init__(self, endpoint_path, auth, dispatcher, registry_prefix):
self.path = endpoint_path
self.auth = auth
self.dispatcher = dispatcher
self._capability_map = capability_map
self._registry_prefix = registry_prefix
async def start(self):
pass
@@ -148,11 +110,9 @@
app.add_routes([web.get(self.path, self.handle)])
async def handle(self, request):
from .. capabilities import check, auth_failure, access_denied
kind = request.match_info.get("kind", "")
cap = self._capability_map.get(kind)
if cap is None:
op = _registry_lookup(f"{self._registry_prefix}:{kind}")
if op is None:
return web.json_response(
{"error": "unknown kind"}, status=404,
)
@@ -168,8 +128,20 @@
)
except web.HTTPException as e:
return e
if not check(identity, cap):
return access_denied()
ctx = RequestContext(
body={},
match_info=dict(request.match_info),
identity=identity,
)
try:
resource = op.extract_resource(ctx)
parameters = op.extract_parameters(ctx)
await self.auth.authorise(
identity, op.capability, resource, parameters,
)
except web.HTTPException as e:
return e
# Delegate the websocket handling to a standalone SocketEndpoint
# with the resolved capability, bypassing the per-request mutation
@@ -178,7 +150,7 @@ class _RoutedSocketEndpoint:
endpoint_path=self.path,
auth=self.auth,
dispatcher=self.dispatcher,
capability=cap,
capability=op.capability,
)
return await ws_ep.handle(request)
@@ -203,6 +175,18 @@ class EndpointManager:
auth=auth,
),
# /api/v1/iam — registry-driven IAM management. Per-operation
# gating happens inside IamEndpoint via the
# operation registry; the dispatcher forwards verbatim
# to iam-svc once authorisation has succeeded. Listed
# before the generic /api/v1/{kind} route so it wins
# the match for "iam".
IamEndpoint(
endpoint_path="/api/v1/iam",
auth=auth,
dispatcher=dispatcher_manager.dispatch_auth_iam(),
),
I18nPackEndpoint(
endpoint_path="/api/v1/i18n/packs/{lang}",
auth=auth,
@@ -215,12 +199,16 @@
capability="metrics:read",
),
# Global services: capability chosen per-kind.
_RoutedVariableEndpoint(
# Global services: registry-driven per-operation gating.
# Each kind+op combination has a registry entry that
# declares its capability and resource shape. Listed
# after the IAM and auth-surface routes; aiohttp's
# path matcher prefers the more-specific path so this
# variable route doesn't shadow them.
RegistryRoutedVariableEndpoint(
endpoint_path="/api/v1/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_global_service(),
capability_map=GLOBAL_KIND_CAPABILITY,
),
# /api/v1/socket: WebSocket handshake accepts
@@ -236,26 +224,29 @@
in_band_auth=True,
),
# Per-flow request/response services — capability per kind.
# Per-flow request/response services — gated per
# ``flow-service:<kind>`` registry entry.
_RoutedVariableEndpoint(
endpoint_path="/api/v1/flow/{flow}/service/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_flow_service(),
capability_map=FLOW_KIND_CAPABILITY,
registry_prefix="flow-service",
),
# Per-flow streaming import/export — capability per kind.
# Per-flow streaming import/export — gated per
# ``flow-import:<kind>`` / ``flow-export:<kind>`` registry
# entry.
_RoutedSocketEndpoint(
endpoint_path="/api/v1/flow/{flow}/import/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_flow_import(),
capability_map=FLOW_IMPORT_CAPABILITY,
registry_prefix="flow-import",
),
_RoutedSocketEndpoint(
endpoint_path="/api/v1/flow/{flow}/export/{kind}",
auth=auth,
dispatcher=dispatcher_manager.dispatch_flow_export(),
capability_map=FLOW_EXPORT_CAPABILITY,
registry_prefix="flow-export",
),
StreamEndpoint(

View file

@@ -0,0 +1,123 @@
"""
Registry-driven dispatch for ``/api/v1/{kind}`` global services.
The body's ``operation`` field plus the URL's ``{kind}`` together
form the canonical operation name (``<kind>:<operation>``) that the
gateway looks up in ``registry.py``. The matched operation
declares its capability and resource shape; this endpoint asks the
IAM regime to authorise the call before forwarding the body
verbatim to the backend dispatcher.
The dispatcher is the same ``dispatch_global_service()`` factory the
old coarse path used; only the gating layer has changed.
Operations not present in the registry are rejected with 400
``unknown operation``; fail closed.
"""
import logging
from aiohttp import web
from .. capabilities import (
PUBLIC, AUTHENTICATED, auth_failure,
)
from .. registry import lookup, RequestContext
logger = logging.getLogger("registry-endpoint")
logger.setLevel(logging.INFO)
class RegistryRoutedVariableEndpoint:
"""POST /api/v1/{kind} — kind comes from the URL, operation comes
from the body, both are joined as the registry key."""
def __init__(self, endpoint_path, auth, dispatcher):
self.path = endpoint_path
self.auth = auth
self.dispatcher = dispatcher
async def start(self):
pass
def add_routes(self, app):
app.add_routes([web.post(self.path, self.handle)])
async def handle(self, request):
kind = request.match_info.get("kind", "")
if not kind:
return web.json_response(
{"error": "missing kind"}, status=404,
)
try:
body = await request.json()
except Exception:
return web.json_response(
{"error": "invalid json"}, status=400,
)
if not isinstance(body, dict):
return web.json_response(
{"error": "body must be an object"}, status=400,
)
op_name = body.get("operation", "")
if not op_name:
return web.json_response(
{"error": "missing operation"}, status=400,
)
registry_key = f"{kind}:{op_name}"
op = lookup(registry_key)
if op is None:
return web.json_response(
{"error": "unknown operation"}, status=400,
)
identity = None
if op.capability != PUBLIC:
identity = await self.auth.authenticate(request)
if op.capability not in (PUBLIC, AUTHENTICATED):
ctx = RequestContext(
body=body,
match_info=dict(request.match_info),
identity=identity,
)
try:
resource = op.extract_resource(ctx)
parameters = op.extract_parameters(ctx)
except Exception as e:
logger.warning(
f"extractor failed for {registry_key!r}: "
f"{type(e).__name__}: {e}"
)
return web.json_response(
{"error": "bad request"}, status=400,
)
await self.auth.authorise(
identity, op.capability, resource, parameters,
)
# Default-fill workspace into the body so downstream
# dispatchers see the canonical resolved value. The
# extractor has already pulled the workspace out;
# mirror it back to the body for the verbatim forward.
if "workspace" in resource:
body["workspace"] = resource["workspace"]
async def responder(x, fin):
pass
try:
resp = await self.dispatcher.process(
body, responder, request.match_info,
)
except web.HTTPException:
raise
except Exception as e:
logger.error(f"Exception: {e}", exc_info=True)
return web.json_response({"error": str(e)}, status=500)
return web.json_response(resp)
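The key join and fail-closed lookup above can be sketched with a toy registry. `TOY_REGISTRY` and `resolve` are illustrative names; the real table lives in `registry.py`:

```python
# Toy sketch of RegistryRoutedVariableEndpoint's key join and
# fail-closed lookup. TOY_REGISTRY is illustrative only.
TOY_REGISTRY = {
    "config:get": "config:read",    # <kind>:<operation> -> capability
    "config:put": "config:write",
}

def resolve(kind: str, body: dict):
    """Return (status, detail) the way the endpoint would."""
    op_name = body.get("operation", "")
    if not op_name:
        return 400, "missing operation"
    capability = TOY_REGISTRY.get(f"{kind}:{op_name}")
    if capability is None:
        return 400, "unknown operation"    # fail closed
    return 200, capability

assert resolve("config", {"operation": "get"}) == (200, "config:read")
assert resolve("config", {"operation": "zap"}) == (400, "unknown operation")
assert resolve("config", {}) == (400, "missing operation")
```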


@@ -5,7 +5,7 @@ import logging
from .. running import Running
from .. capabilities import (
PUBLIC, AUTHENTICATED, auth_failure,
)
logger = logging.getLogger("socket")
@@ -97,8 +97,12 @@ class SocketEndpoint:
except web.HTTPException as e:
return e
if self.capability != AUTHENTICATED:
try:
await self.auth.authorise(
identity, self.capability, {}, {},
)
except web.HTTPException as e:
return e
# 50MB max message size
ws = web.WebSocketResponse(max_msg_size=52428800)
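The gating pattern above (a single authorise() call that raises on denial, with the endpoint never inspecting roles itself) can be sketched with a hypothetical auth stub. `AuthStub`, `Forbidden`, and `gate` are stand-ins, not the gateway's real types:

```python
import asyncio

# Toy mirror of the endpoint-side gating pattern: the regime
# either allows or raises; the endpoint holds no policy state.
class Forbidden(Exception):
    pass

class AuthStub:
    def __init__(self, allowed):
        self.allowed = allowed          # set of (identity, capability)

    async def authorise(self, identity, capability, resource, parameters):
        # Fail closed: denial is an exception, not a return value.
        if (identity, capability) not in self.allowed:
            raise Forbidden()

async def gate(auth, identity, capability):
    try:
        await auth.authorise(identity, capability, {}, {})
        return "open"
    except Forbidden:
        return "denied"

auth = AuthStub({("alice", "graph:read")})
assert asyncio.run(gate(auth, "alice", "graph:read")) == "open"
assert asyncio.run(gate(auth, "alice", "graph:write")) == "denied"
```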


@@ -36,7 +36,7 @@ class VariableEndpoint:
data = await request.json()
if identity is not None:
await enforce_workspace(data, identity, self.auth)
async def responder(x, fin):
pass


@@ -0,0 +1,515 @@
"""
Gateway operation registry.
Single declarative table mapping each operation the gateway
recognises to:
- The capability the IAM regime is asked to authorise against.
- The resource level (system / workspace / flow), which
determines the shape of the resource identifier handed to
``authorise``.
- Extractors that build the resource and parameters from the
request context.
This is a gateway-internal concept. It is not part of the IAM
contract: the contract specifies what arguments ``authorise``
receives; the registry is how the gateway populates them.
See docs/tech-specs/iam-contract.md for the contract and
docs/tech-specs/iam.md for the request anatomy.
"""
from dataclasses import dataclass, field
from typing import Any, Callable
# Sentinels for operations that don't go through capability-based
# authorisation. Mirror the values used in capabilities.py so the
# gateway endpoint layer can recognise them uniformly.
PUBLIC = "__public__"
AUTHENTICATED = "__authenticated__"
class ResourceLevel:
"""Where the operation's resource lives.
``SYSTEM`` operation acts on a deployment-level resource
(the user registry, the workspace registry,
the signing key). resource = {}. Workspace,
if relevant, is a parameter, not an address.
``WORKSPACE`` operation acts on something within a workspace
(config, library, knowledge, collections, flow
lifecycle). resource = {workspace}.
``FLOW`` operation acts on something within a flow
within a workspace (graph, agent, llm, etc.).
resource = {workspace, flow}.
"""
SYSTEM = "system"
WORKSPACE = "workspace"
FLOW = "flow"
@dataclass
class RequestContext:
"""The bundle of inputs the registry's extractors operate on.
Assembled by the gateway from the incoming request after
authentication."""
# Parsed JSON body (HTTP) or inner request payload (WebSocket).
body: dict = field(default_factory=dict)
# URL path components (HTTP) or WebSocket envelope routing
# fields (id, service, workspace, flow).
match_info: dict = field(default_factory=dict)
# Authenticated identity for default-fill-in. Always present
# by the time extractors run, except for PUBLIC operations
# where it is None.
identity: Any = None
@dataclass
class Operation:
"""Declared operation the gateway can dispatch + authorise."""
# Canonical operation name (used for registry lookup, audit,
# debug logs). Mirrors the operation strings in the IAM
# service and other backends where applicable.
name: str
# Capability required to invoke this operation. Either a
# string from the capability vocabulary in capabilities.md, or
# the PUBLIC / AUTHENTICATED sentinel for operations that
# don't go through capability-based authorisation.
capability: str
# Where the operation's resource lives. Determines the
# shape of the resource argument passed to authorise.
resource_level: str
# Build the resource identifier from the request context.
# Returns a dict with the appropriate components for the
# resource level: {} for SYSTEM, {workspace} for WORKSPACE,
# {workspace, flow} for FLOW. Default-fill-in of workspace
# from identity.workspace happens here when applicable.
extract_resource: Callable[[RequestContext], dict]
# Build the parameters dict — decision-relevant fields the
# operation supplied that are not part of the resource
# address. E.g. workspace association on a system-level
# user-registry operation.
extract_parameters: Callable[[RequestContext], dict]
# ---------------------------------------------------------------------------
# Registry storage.
# ---------------------------------------------------------------------------
_REGISTRY: dict[str, Operation] = {}
def register(op: Operation) -> None:
if op.name in _REGISTRY:
raise RuntimeError(
f"operation {op.name!r} already registered"
)
_REGISTRY[op.name] = op
def lookup(name: str) -> Operation | None:
return _REGISTRY.get(name)
def all_operations() -> list[Operation]:
return list(_REGISTRY.values())
# ---------------------------------------------------------------------------
# Common extractor helpers.
# ---------------------------------------------------------------------------
def _empty_resource(_ctx: RequestContext) -> dict:
"""System-level resource: empty dict."""
return {}
def _workspace_from_body(ctx: RequestContext) -> dict:
"""Workspace-level resource sourced from the request body's
workspace field, defaulting to the caller's bound workspace."""
ws = (ctx.body.get("workspace") if isinstance(ctx.body, dict) else "")
if not ws and ctx.identity is not None:
ws = ctx.identity.workspace
return {"workspace": ws}
def _flow_from_match_info(ctx: RequestContext) -> dict:
"""Flow-level resource sourced from URL path components or WS
envelope fields. Both ``workspace`` and ``flow`` are required;
no default-fill-in (the address is the operation's identity)."""
return {
"workspace": ctx.match_info.get("workspace", ""),
"flow": ctx.match_info.get("flow", ""),
}
def _no_parameters(_ctx: RequestContext) -> dict:
return {}
def _body_as_parameters(ctx: RequestContext) -> dict:
"""All body fields are parameters — used when the operation's
body is small and uniformly decision-relevant (e.g.
user-registry ops where the body's user.workspace is what the
regime checks against the admin's scope)."""
return dict(ctx.body) if isinstance(ctx.body, dict) else {}
def _workspace_param_only(ctx: RequestContext) -> dict:
"""Parameters dict carrying only the workspace association.
Used by system-level operations (e.g. user-registry ops) where
the workspace isn't part of the resource address but is the
field the regime uses to scope the admin's authority.
Pulls the workspace from the inner ``user`` / ``workspace_record``
body field if present (create-user, create-workspace), then from
the top-level body, then from the caller's bound workspace."""
body = ctx.body if isinstance(ctx.body, dict) else {}
inner_user = body.get("user") if isinstance(body.get("user"), dict) else {}
inner_ws = (
body.get("workspace_record")
if isinstance(body.get("workspace_record"), dict) else {}
)
ws = (
inner_user.get("workspace")
or inner_ws.get("id")
or body.get("workspace")
)
if not ws and ctx.identity is not None:
ws = ctx.identity.workspace
return {"workspace": ws or ""}
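The fallback chain in _workspace_param_only can be restated as a self-contained sketch (same precedence, toy inputs; `workspace_param` is an illustrative name):

```python
# Self-contained copy of the precedence: inner user.workspace,
# then workspace_record.id, then top-level workspace, then the
# caller's bound workspace, then "".
def workspace_param(body: dict, identity_workspace) -> dict:
    inner_user = body.get("user") if isinstance(body.get("user"), dict) else {}
    inner_ws = (
        body.get("workspace_record")
        if isinstance(body.get("workspace_record"), dict) else {}
    )
    ws = (
        inner_user.get("workspace")
        or inner_ws.get("id")
        or body.get("workspace")
    )
    if not ws and identity_workspace is not None:
        ws = identity_workspace
    return {"workspace": ws or ""}

assert workspace_param({"user": {"workspace": "w1"}, "workspace": "w2"}, "w3") \
    == {"workspace": "w1"}
assert workspace_param({"workspace_record": {"id": "w2"}}, "w3") \
    == {"workspace": "w2"}
assert workspace_param({}, "w3") == {"workspace": "w3"}
assert workspace_param({}, None) == {"workspace": ""}
```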
# ---------------------------------------------------------------------------
# Operation registrations.
#
# The gateway looks operations up by their canonical name (the same
# string the request body / WS envelope carries in its ``operation``
# field where applicable). Auth-surface operations (login, bootstrap,
# change-password) are routed in auth_endpoints.py and use the
# PUBLIC / AUTHENTICATED sentinels; they are still registered below
# so the registry remains the single lookup surface. Pure
# gateway↔IAM internal operations (resolve-api-key, authorise,
# authorise-many, get-signing-key-public) are excluded entirely;
# they are never invoked over the public API.
# ---------------------------------------------------------------------------
# IAM management operations. All routed through /api/v1/iam, body
# carries ``operation`` plus operation-specific fields.
# User registry: SYSTEM-level resource (users are global, identified
# by handle). The admin's authority is scoped per workspace via the
# parameters {workspace} field — that's what the regime checks
# against the admin's role workspace_scope.
register(Operation(
name="create-user",
capability="users:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="list-users",
capability="users:read",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="get-user",
capability="users:read",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="update-user",
capability="users:write",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="disable-user",
capability="users:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="enable-user",
capability="users:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="delete-user",
capability="users:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
register(Operation(
name="reset-password",
capability="users:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_workspace_param_only,
))
# API keys: workspace-level resource — keys live within a workspace.
register(Operation(
name="create-api-key",
capability="keys:admin",
resource_level=ResourceLevel.WORKSPACE,
extract_resource=_workspace_from_body,
extract_parameters=_no_parameters,
))
register(Operation(
name="list-api-keys",
capability="keys:admin",
resource_level=ResourceLevel.WORKSPACE,
extract_resource=_workspace_from_body,
extract_parameters=_no_parameters,
))
register(Operation(
name="revoke-api-key",
capability="keys:admin",
resource_level=ResourceLevel.WORKSPACE,
extract_resource=_workspace_from_body,
extract_parameters=_no_parameters,
))
# Workspace registry: SYSTEM-level resource (workspaces are the
# top-level addressable unit). No parameters — the workspace being
# acted on is identified by the body, not used as a scope cue.
register(Operation(
name="create-workspace",
capability="workspaces:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="list-workspaces",
capability="workspaces:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="get-workspace",
capability="workspaces:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="update-workspace",
capability="workspaces:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="disable-workspace",
capability="workspaces:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
# Signing key: SYSTEM-level operational op.
register(Operation(
name="rotate-signing-key",
capability="iam:admin",
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
# ---------------------------------------------------------------------------
# Auth-surface entries.
#
# Listed here so the registry is the one place the gateway looks for
# operation→capability mappings — including the sentinels for paths
# that don't go through capability-based authorisation. The actual
# routing is in auth_endpoints.py; these entries let the registry-
# driven dispatcher recognise the operation if it sees it on a
# generic path.
# ---------------------------------------------------------------------------
register(Operation(
name="login",
capability=PUBLIC,
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="bootstrap",
capability=PUBLIC,
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
register(Operation(
name="change-password",
capability=AUTHENTICATED,
resource_level=ResourceLevel.SYSTEM,
extract_resource=_empty_resource,
extract_parameters=_no_parameters,
))
# ---------------------------------------------------------------------------
# Generic kind/operation entries.
#
# Names are ``<kind>:<operation>`` so the registry key is unique
# across dispatchers. All entries below are workspace-level
# resources (workspace defaulted from the caller's bound workspace
# if absent). Read/write distinction maps to the existing
# ``<subject>:read`` / ``<subject>:write`` capability vocabulary
# defined in capabilities.md.
# ---------------------------------------------------------------------------
def _register_kind_op(kind: str, op: str, capability: str) -> None:
"""Helper: register a workspace-level kind:op with the standard
extractors (workspace from body, no extra parameters)."""
register(Operation(
name=f"{kind}:{op}",
capability=capability,
resource_level=ResourceLevel.WORKSPACE,
extract_resource=_workspace_from_body,
extract_parameters=_no_parameters,
))
# config: KV-style workspace config service.
for _op in ("get", "list", "getvalues", "getvalues-all-ws", "config"):
_register_kind_op("config", _op, "config:read")
for _op in ("put", "delete"):
_register_kind_op("config", _op, "config:write")
# flow: flow-blueprint and flow-lifecycle service.
for _op in ("list-blueprints", "get-blueprint", "list-flows", "get-flow"):
_register_kind_op("flow", _op, "flows:read")
for _op in ("put-blueprint", "delete-blueprint", "start-flow", "stop-flow"):
_register_kind_op("flow", _op, "flows:write")
# librarian: document storage and processing service.
for _op in (
"get-document-metadata", "get-document-content",
"stream-document", "list-documents", "list-processing",
"get-upload-status", "list-uploads",
):
_register_kind_op("librarian", _op, "documents:read")
for _op in (
"add-document", "remove-document", "update-document",
"add-processing", "remove-processing",
"begin-upload", "upload-chunk", "complete-upload", "abort-upload",
):
_register_kind_op("librarian", _op, "documents:write")
# knowledge: knowledge-graph core service.
for _op in ("get-kg-core", "list-kg-cores"):
_register_kind_op("knowledge", _op, "knowledge:read")
for _op in ("put-kg-core", "delete-kg-core",
"load-kg-core", "unload-kg-core"):
_register_kind_op("knowledge", _op, "knowledge:write")
# collection-management: workspace collection lifecycle.
_register_kind_op("collection-management", "list-collections", "collections:read")
for _op in ("update-collection", "delete-collection"):
_register_kind_op("collection-management", _op, "collections:write")
# ---------------------------------------------------------------------------
# Per-flow data-plane services.
#
# /api/v1/flow/{flow}/service/{kind} and the streaming
# /api/v1/flow/{flow}/{import,export}/{kind} paths. No body-level
# ``operation`` discriminator — the URL kind is the operation
# identity. Resource is FLOW level (workspace + flow).
#
# Names: ``flow-service:<kind>``, ``flow-import:<kind>``,
# ``flow-export:<kind>``.
# ---------------------------------------------------------------------------
def _register_flow_kind(prefix: str, kind: str, capability: str) -> None:
register(Operation(
name=f"{prefix}:{kind}",
capability=capability,
resource_level=ResourceLevel.FLOW,
extract_resource=_flow_from_match_info,
extract_parameters=_no_parameters,
))
# Request/response services on /api/v1/flow/{flow}/service/{kind}.
_FLOW_SERVICES = {
"agent": "agent",
"text-completion": "llm",
"prompt": "llm",
"mcp-tool": "mcp",
"graph-rag": "graph:read",
"document-rag": "documents:read",
"embeddings": "embeddings",
"graph-embeddings": "graph:read",
"document-embeddings": "documents:read",
"triples": "graph:read",
"rows": "rows:read",
"nlp-query": "rows:read",
"structured-query": "rows:read",
"structured-diag": "rows:read",
"row-embeddings": "rows:read",
"sparql": "graph:read",
}
for _kind, _cap in _FLOW_SERVICES.items():
_register_flow_kind("flow-service", _kind, _cap)
# Streaming import socket endpoints.
_FLOW_IMPORTS = {
"triples": "graph:write",
"graph-embeddings": "graph:write",
"document-embeddings": "documents:write",
"entity-contexts": "documents:write",
"rows": "rows:write",
}
for _kind, _cap in _FLOW_IMPORTS.items():
_register_flow_kind("flow-import", _kind, _cap)
# Streaming export socket endpoints.
_FLOW_EXPORTS = {
"triples": "graph:read",
"graph-embeddings": "graph:read",
"document-embeddings": "documents:read",
"entity-contexts": "documents:read",
}
for _kind, _cap in _FLOW_EXPORTS.items():
_register_flow_kind("flow-export", _kind, _cap)
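How a per-flow call maps to a `<prefix>:<kind>` registry key and a FLOW-level resource can be sketched with a toy capability table (an illustrative subset of the registrations above; `flow_check` is a hypothetical helper):

```python
# Toy mirror of the flow naming and FLOW-level resource shape:
# the URL (or WS envelope) supplies workspace and flow, and the
# registry key is "<prefix>:<kind>". Unknown kinds fail closed.
FLOW_CAPS = {
    "flow-service:triples": "graph:read",
    "flow-import:triples": "graph:write",
}

def flow_check(prefix: str, kind: str, match_info: dict):
    key = f"{prefix}:{kind}"
    capability = FLOW_CAPS.get(key)
    if capability is None:
        return None                      # unknown kind: fail closed
    resource = {                         # FLOW level: workspace + flow
        "workspace": match_info.get("workspace", ""),
        "flow": match_info.get("flow", ""),
    }
    return key, capability, resource

assert flow_check("flow-service", "triples",
                  {"workspace": "w1", "flow": "f1"}) == (
    "flow-service:triples", "graph:read",
    {"workspace": "w1", "flow": "f1"})
assert flow_check("flow-export", "triples", {}) is None
```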


@@ -40,6 +40,78 @@ API_KEY_RANDOM_BYTES = 24
JWT_ISSUER = "trustgraph-iam"
JWT_TTL_SECONDS = 3600
# Default authorisation cache TTL the regime tells the gateway to
# observe. 60s is the OSS-spec maximum revocation latency: a role
# change, workspace disable, or key revoke takes effect within at
# most this much time.
AUTHZ_CACHE_TTL_SECONDS = 60
# OSS regime role table. Lives here, not in the gateway — the
# gateway is regime-agnostic and must not encode policy.
#
# Each role has a capability set and a workspace scope. The
# evaluator (handle_authorise below) checks (a) that some role
# held by the caller grants the requested capability, and (b)
# that role's workspace scope permits the target workspace.
_READER_CAPS = {
"agent",
"graph:read",
"documents:read",
"rows:read",
"llm",
"embeddings",
"mcp",
"config:read",
"flows:read",
"collections:read",
"knowledge:read",
"keys:self",
}
_WRITER_CAPS = _READER_CAPS | {
"graph:write",
"documents:write",
"rows:write",
"collections:write",
"knowledge:write",
}
_ADMIN_CAPS = _WRITER_CAPS | {
"config:write",
"flows:write",
"users:read", "users:write", "users:admin",
"keys:admin",
"workspaces:admin",
"iam:admin",
"metrics:read",
}
ROLE_DEFINITIONS = {
"reader": {
"capabilities": _READER_CAPS,
"workspace_scope": "assigned",
},
"writer": {
"capabilities": _WRITER_CAPS,
"workspace_scope": "assigned",
},
"admin": {
"capabilities": _ADMIN_CAPS,
"workspace_scope": "*",
},
}
def _scope_permits(role_scope, target_workspace, assigned_workspace):
"""Does the given role apply to ``target_workspace``?"""
if role_scope == "*":
return True
if role_scope == "assigned":
return target_workspace == assigned_workspace
return False
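The scope rule is small enough to restate and exercise directly (a self-contained copy for illustration; `scope_permits` mirrors `_scope_permits` above):

```python
# Self-contained restatement of the workspace-scope rule:
# "*" reaches every workspace, "assigned" only the caller's
# bound workspace, anything else fails closed.
def scope_permits(role_scope, target_workspace, assigned_workspace):
    if role_scope == "*":
        return True
    if role_scope == "assigned":
        return target_workspace == assigned_workspace
    return False

assert scope_permits("*", "any-ws", "home-ws")          # admin scope
assert scope_permits("assigned", "home-ws", "home-ws")  # own workspace
assert not scope_permits("assigned", "other-ws", "home-ws")
assert not scope_permits("bogus", "home-ws", "home-ws") # fail closed
```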
def _now_iso():
return datetime.datetime.now(datetime.timezone.utc).isoformat()
@@ -250,6 +322,10 @@ class IamService:
return await self.handle_disable_workspace(v)
if op == "rotate-signing-key":
return await self.handle_rotate_signing_key(v)
if op == "authorise":
return await self.handle_authorise(v)
if op == "authorise-many":
return await self.handle_authorise_many(v)
return _err(
"invalid-argument",
@@ -478,7 +554,7 @@ class IamService:
(
id, ws, _username, _name, _email, password_hash,
_roles, enabled, _mcp, _created,
) = user_row
if not enabled:
@@ -496,11 +572,14 @@ class IamService:
now_ts = int(_now_dt().timestamp())
exp_ts = now_ts + JWT_TTL_SECONDS
# Per the IAM contract the gateway never reads policy state
# from the credential — roles stay server-side, reachable
# only via authorise(). JWT carries identity + workspace
# binding only.
claims = {
"iss": JWT_ISSUER,
"sub": id,
"workspace": ws,
"iat": now_ts,
"exp": exp_ts,
}
@@ -1130,3 +1209,134 @@ class IamService:
await self.table_store.delete_api_key(key_hash)
return IamResponse()
# ------------------------------------------------------------------
# authorise / authorise-many
#
# The IAM contract (see docs/tech-specs/iam-contract.md) calls
# for the regime — not the gateway — to decide whether an
# identity may perform a capability on a resource given the
# operation's parameters. These two operations are the OSS
# regime's implementation of that contract.
#
# Inputs (on IamRequest):
# user_id — the identity handle (the gateway's
# opaque reference). For OSS this is the
# user record's id.
# capability — the capability string from the
# capabilities.md vocabulary.
# resource_json — JSON dict, the resource address
# ({} for system, {workspace} for
# workspace, {workspace, flow} for flow).
# parameters_json — JSON dict, decision-relevant operation
# parameters (e.g. workspace association
# on user-registry operations).
# authorise_checks — for authorise-many, a JSON list of
# {capability, resource, parameters}.
#
# Outputs (on IamResponse):
# decision_allow — single allow / deny verdict.
# decision_ttl_seconds — gateway cache TTL for this
# decision.
# decisions_json — for authorise-many, list of
# {allow, ttl} in request order.
# ------------------------------------------------------------------
def _decide(self, user_row, capability, resource, parameters):
"""Single authorisation decision. Returns (allow, ttl)."""
if user_row is None:
return False, AUTHZ_CACHE_TTL_SECONDS
# user_row layout:
# 0:id 1:workspace 2:username 3:name 4:email 5:password_hash
# 6:roles 7:enabled 8:must_change_password 9:created
if not user_row[7]: # disabled
return False, AUTHZ_CACHE_TTL_SECONDS
# Work out the target workspace for scope checks: the resource
# address first, then the parameters (system-level operations
# carry a workspace association only as a parameter).
# Operations with no target workspace bypass workspace scoping
# in the role walk below.
target_workspace = (
(resource or {}).get("workspace")
or (parameters or {}).get("workspace")
)
roles = user_row[6] or set()
assigned_workspace = user_row[1]
for role_name in roles:
defn = ROLE_DEFINITIONS.get(role_name)
if defn is None:
continue
if capability not in defn["capabilities"]:
continue
if target_workspace is None or _scope_permits(
defn["workspace_scope"],
target_workspace,
assigned_workspace,
):
return True, AUTHZ_CACHE_TTL_SECONDS
return False, AUTHZ_CACHE_TTL_SECONDS
async def handle_authorise(self, v):
if not v.capability:
return _err("invalid-argument", "capability required")
if not v.user_id:
return _err("invalid-argument", "user_id (handle) required")
try:
resource = json.loads(v.resource_json or "{}")
parameters = json.loads(v.parameters_json or "{}")
except json.JSONDecodeError as e:
return _err("invalid-argument", f"bad json: {e}")
user_row = await self.table_store.get_user(v.user_id)
allow, ttl = self._decide(
user_row, v.capability, resource, parameters,
)
return IamResponse(
decision_allow=allow,
decision_ttl_seconds=ttl,
)
async def handle_authorise_many(self, v):
if not v.user_id:
return _err("invalid-argument", "user_id (handle) required")
if not v.authorise_checks:
return _err("invalid-argument", "authorise_checks required")
try:
checks = json.loads(v.authorise_checks)
except json.JSONDecodeError as e:
return _err("invalid-argument", f"bad json: {e}")
if not isinstance(checks, list):
return _err(
"invalid-argument",
"authorise_checks must be a JSON list",
)
# One user lookup for the whole batch.
user_row = await self.table_store.get_user(v.user_id)
decisions = []
for c in checks:
if not isinstance(c, dict):
decisions.append({
"allow": False,
"ttl": AUTHZ_CACHE_TTL_SECONDS,
})
continue
allow, ttl = self._decide(
user_row,
c.get("capability", ""),
c.get("resource") or {},
c.get("parameters") or {},
)
decisions.append({"allow": allow, "ttl": ttl})
return IamResponse(decisions_json=json.dumps(decisions))
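The role walk in _decide can be illustrated with a toy role table (hypothetical capability sets; the real ones are ROLE_DEFINITIONS above, and `decide` is an illustrative reduction):

```python
# Toy reimplementation of the _decide role walk: allow if any
# held role grants the capability AND its scope permits the
# target workspace. System-level operations pass target_ws=None
# and skip workspace scoping.
ROLES = {
    "reader": {"capabilities": {"config:read"},
               "workspace_scope": "assigned"},
    "admin": {"capabilities": {"config:read", "users:admin"},
              "workspace_scope": "*"},
}

def decide(roles, assigned_ws, capability, target_ws):
    for name in roles:
        defn = ROLES.get(name)
        if defn is None or capability not in defn["capabilities"]:
            continue
        scope = defn["workspace_scope"]
        if target_ws is None or scope == "*" or (
                scope == "assigned" and target_ws == assigned_ws):
            return True
    return False          # no role matched: deny

assert decide({"reader"}, "w1", "config:read", "w1")      # own workspace
assert not decide({"reader"}, "w1", "config:read", "w2")  # scope denies
assert not decide({"reader"}, "w1", "users:admin", "w1")  # capability denies
assert decide({"admin"}, "w1", "users:admin", "w2")       # wildcard scope
```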