mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-05-03 04:12:37 +02:00
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
412 lines
14 KiB
Python
412 lines
14 KiB
Python
|
|
import asyncio
|
|
from aiohttp import web
|
|
import uuid
|
|
import logging
|
|
|
|
# Module logger
|
|
logger = logging.getLogger(__name__)
|
|
|
|
from . config import ConfigRequestor
|
|
from . flow import FlowRequestor
|
|
from . iam import IamRequestor
|
|
from . librarian import LibrarianRequestor
|
|
from . knowledge import KnowledgeRequestor
|
|
from . collection_management import CollectionManagementRequestor
|
|
|
|
from . embeddings import EmbeddingsRequestor
|
|
from . agent import AgentRequestor
|
|
from . text_completion import TextCompletionRequestor
|
|
from . prompt import PromptRequestor
|
|
from . graph_rag import GraphRagRequestor
|
|
from . document_rag import DocumentRagRequestor
|
|
from . triples_query import TriplesQueryRequestor
|
|
from . rows_query import RowsQueryRequestor
|
|
from . nlp_query import NLPQueryRequestor
|
|
from . sparql_query import SparqlQueryRequestor
|
|
from . structured_query import StructuredQueryRequestor
|
|
from . structured_diag import StructuredDiagRequestor
|
|
from . embeddings import EmbeddingsRequestor
|
|
from . graph_embeddings_query import GraphEmbeddingsQueryRequestor
|
|
from . document_embeddings_query import DocumentEmbeddingsQueryRequestor
|
|
from . row_embeddings_query import RowEmbeddingsQueryRequestor
|
|
from . mcp_tool import McpToolRequestor
|
|
from . text_load import TextLoad
|
|
from . document_load import DocumentLoad
|
|
|
|
from . triples_export import TriplesExport
|
|
from . graph_embeddings_export import GraphEmbeddingsExport
|
|
from . document_embeddings_export import DocumentEmbeddingsExport
|
|
from . entity_contexts_export import EntityContextsExport
|
|
|
|
from . triples_import import TriplesImport
|
|
from . graph_embeddings_import import GraphEmbeddingsImport
|
|
from . document_embeddings_import import DocumentEmbeddingsImport
|
|
from . entity_contexts_import import EntityContextsImport
|
|
from . rows_import import RowsImport
|
|
|
|
from . core_export import CoreExport
|
|
from . core_import import CoreImport
|
|
from . document_stream import DocumentStreamExport
|
|
|
|
from . mux import Mux
|
|
|
|
request_response_dispatchers = {
|
|
"agent": AgentRequestor,
|
|
"text-completion": TextCompletionRequestor,
|
|
"prompt": PromptRequestor,
|
|
"mcp-tool": McpToolRequestor,
|
|
"graph-rag": GraphRagRequestor,
|
|
"document-rag": DocumentRagRequestor,
|
|
"embeddings": EmbeddingsRequestor,
|
|
"graph-embeddings": GraphEmbeddingsQueryRequestor,
|
|
"document-embeddings": DocumentEmbeddingsQueryRequestor,
|
|
"triples": TriplesQueryRequestor,
|
|
"rows": RowsQueryRequestor,
|
|
"nlp-query": NLPQueryRequestor,
|
|
"structured-query": StructuredQueryRequestor,
|
|
"structured-diag": StructuredDiagRequestor,
|
|
"row-embeddings": RowEmbeddingsQueryRequestor,
|
|
"sparql": SparqlQueryRequestor,
|
|
}
|
|
|
|
global_dispatchers = {
|
|
"config": ConfigRequestor,
|
|
"flow": FlowRequestor,
|
|
"iam": IamRequestor,
|
|
"librarian": LibrarianRequestor,
|
|
"knowledge": KnowledgeRequestor,
|
|
"collection-management": CollectionManagementRequestor,
|
|
}
|
|
|
|
sender_dispatchers = {
|
|
"text-load": TextLoad,
|
|
"document-load": DocumentLoad,
|
|
}
|
|
|
|
export_dispatchers = {
|
|
"triples": TriplesExport,
|
|
"graph-embeddings": GraphEmbeddingsExport,
|
|
"document-embeddings": DocumentEmbeddingsExport,
|
|
"entity-contexts": EntityContextsExport,
|
|
}
|
|
|
|
import_dispatchers = {
|
|
"triples": TriplesImport,
|
|
"graph-embeddings": GraphEmbeddingsImport,
|
|
"document-embeddings": DocumentEmbeddingsImport,
|
|
"entity-contexts": EntityContextsImport,
|
|
"rows": RowsImport,
|
|
}
|
|
|
|
class DispatcherWrapper:
|
|
def __init__(self, handler):
|
|
self.handler = handler
|
|
async def process(self, *args):
|
|
return await self.handler(*args)
|
|
|
|
class DispatcherManager:
|
|
|
|
def __init__(self, backend, config_receiver, auth,
|
|
prefix="api-gateway", queue_overrides=None):
|
|
"""
|
|
``auth`` is required. It flows into the Mux for first-frame
|
|
WebSocket authentication and into downstream dispatcher
|
|
construction. There is no permissive default — constructing
|
|
a DispatcherManager without an authenticator would be a
|
|
silent downgrade to no-auth on the socket path.
|
|
"""
|
|
if auth is None:
|
|
raise ValueError(
|
|
"DispatcherManager requires an 'auth' argument — there "
|
|
"is no no-auth mode"
|
|
)
|
|
|
|
self.backend = backend
|
|
self.config_receiver = config_receiver
|
|
self.config_receiver.add_handler(self)
|
|
self.prefix = prefix
|
|
|
|
# Gateway IamAuth — used by the socket Mux for first-frame
|
|
# auth and by any dispatcher that needs to resolve caller
|
|
# identity out-of-band.
|
|
self.auth = auth
|
|
|
|
# Store queue overrides for global services
|
|
# Format: {"config": {"request": "...", "response": "..."}, ...}
|
|
self.queue_overrides = queue_overrides or {}
|
|
|
|
# Flows keyed by (workspace, flow_id)
|
|
self.flows = {}
|
|
# Dispatchers keyed by (workspace, flow_id, kind)
|
|
self.dispatchers = {}
|
|
self.dispatcher_lock = asyncio.Lock()
|
|
|
|
async def start_flow(self, workspace, id, flow):
|
|
logger.info(f"Starting flow {workspace}/{id}")
|
|
self.flows[(workspace, id)] = flow
|
|
return
|
|
|
|
async def stop_flow(self, workspace, id, flow):
|
|
logger.info(f"Stopping flow {workspace}/{id}")
|
|
self.flows.pop((workspace, id), None)
|
|
|
|
# Drop any cached dispatchers for this (workspace, flow).
|
|
# Their publishers and subscribers were wired to the flow's
|
|
# per-flow exchanges, which flow-svc tears down when the flow
|
|
# stops. Leaving the cached dispatcher in place means a
|
|
# subsequent restart of the same flow id would reuse a
|
|
# dispatcher whose subscription queue is still bound to the
|
|
# torn-down (now re-created) response exchange — requests go
|
|
# out but responses are silently dropped and the caller hangs.
|
|
#
|
|
# Per-flow dispatchers are keyed (workspace, flow_id, kind).
|
|
# Global dispatchers are keyed (None, kind) — the len==3
|
|
# check naturally excludes them.
|
|
async with self.dispatcher_lock:
|
|
stale_keys = [
|
|
k for k in self.dispatchers
|
|
if isinstance(k, tuple) and len(k) == 3
|
|
and k[0] == workspace and k[1] == id
|
|
]
|
|
for key in stale_keys:
|
|
dispatcher = self.dispatchers.pop(key)
|
|
try:
|
|
await dispatcher.stop()
|
|
except Exception as e:
|
|
logger.warning(
|
|
f"Error stopping cached dispatcher {key}: {e}"
|
|
)
|
|
|
|
return
|
|
|
|
def dispatch_global_service(self):
|
|
return DispatcherWrapper(self.process_global_service)
|
|
|
|
def dispatch_auth_iam(self):
|
|
"""Pre-configured IAM dispatcher for the gateway's auth
|
|
endpoints (login, bootstrap, change-password). Pins the
|
|
kind to ``iam`` so these handlers don't have to supply URL
|
|
params the global dispatcher would expect."""
|
|
async def _process(data, responder):
|
|
return await self.invoke_global_service(data, responder, "iam")
|
|
return DispatcherWrapper(_process)
|
|
|
|
def dispatch_core_export(self):
|
|
return DispatcherWrapper(self.process_core_export)
|
|
|
|
def dispatch_core_import(self):
|
|
return DispatcherWrapper(self.process_core_import)
|
|
|
|
def dispatch_document_stream(self):
|
|
return DispatcherWrapper(self.process_document_stream)
|
|
|
|
async def process_document_stream(self, data, error, ok, request):
|
|
|
|
ds = DocumentStreamExport(self.backend)
|
|
return await ds.process(data, error, ok, request)
|
|
|
|
async def process_core_import(self, data, error, ok, request):
|
|
|
|
ci = CoreImport(self.backend)
|
|
return await ci.process(data, error, ok, request)
|
|
|
|
async def process_core_export(self, data, error, ok, request):
|
|
|
|
ce = CoreExport(self.backend)
|
|
return await ce.process(data, error, ok, request)
|
|
|
|
async def process_global_service(self, data, responder, params):
|
|
|
|
kind = params.get("kind")
|
|
return await self.invoke_global_service(data, responder, kind)
|
|
|
|
async def invoke_global_service(self, data, responder, kind):
|
|
|
|
key = (None, kind)
|
|
|
|
if key not in self.dispatchers:
|
|
async with self.dispatcher_lock:
|
|
if key not in self.dispatchers:
|
|
request_queue = None
|
|
response_queue = None
|
|
if kind in self.queue_overrides:
|
|
request_queue = self.queue_overrides[kind].get("request")
|
|
response_queue = self.queue_overrides[kind].get("response")
|
|
|
|
dispatcher = global_dispatchers[kind](
|
|
backend = self.backend,
|
|
timeout = 120,
|
|
consumer = f"{self.prefix}-{kind}-request",
|
|
subscriber = f"{self.prefix}-{kind}-request",
|
|
request_queue = request_queue,
|
|
response_queue = response_queue,
|
|
)
|
|
|
|
await dispatcher.start()
|
|
self.dispatchers[key] = dispatcher
|
|
|
|
return await self.dispatchers[key].process(data, responder)
|
|
|
|
def dispatch_flow_import(self):
|
|
return self.process_flow_import
|
|
|
|
def dispatch_flow_export(self):
|
|
return self.process_flow_export
|
|
|
|
def dispatch_socket(self):
|
|
return self.process_socket
|
|
|
|
def dispatch_flow_service(self):
|
|
return DispatcherWrapper(self.process_flow_service)
|
|
|
|
async def process_flow_import(self, ws, running, params):
|
|
|
|
workspace = params.get("workspace", "default")
|
|
flow = params.get("flow")
|
|
kind = params.get("kind")
|
|
|
|
flow_key = (workspace, flow)
|
|
if flow_key not in self.flows:
|
|
raise RuntimeError(f"Invalid flow {workspace}/{flow}")
|
|
|
|
if kind not in import_dispatchers:
|
|
raise RuntimeError("Invalid kind")
|
|
|
|
key = (workspace, flow, kind)
|
|
|
|
intf_defs = self.flows[flow_key]["interfaces"]
|
|
|
|
# FIXME: The -store bit, does it make sense?
|
|
if kind == "entity-contexts":
|
|
int_kind = kind + "-load"
|
|
else:
|
|
int_kind = kind + "-store"
|
|
|
|
if int_kind not in intf_defs:
|
|
raise RuntimeError("This kind not supported by flow")
|
|
|
|
# FIXME: The -store bit, does it make sense?
|
|
qconfig = intf_defs[int_kind]["flow"]
|
|
|
|
id = str(uuid.uuid4())
|
|
dispatcher = import_dispatchers[kind](
|
|
backend = self.backend,
|
|
ws = ws,
|
|
running = running,
|
|
queue = qconfig,
|
|
)
|
|
|
|
await dispatcher.start()
|
|
|
|
return dispatcher
|
|
|
|
async def process_flow_export(self, ws, running, params):
|
|
|
|
workspace = params.get("workspace", "default")
|
|
flow = params.get("flow")
|
|
kind = params.get("kind")
|
|
|
|
flow_key = (workspace, flow)
|
|
if flow_key not in self.flows:
|
|
raise RuntimeError(f"Invalid flow {workspace}/{flow}")
|
|
|
|
if kind not in export_dispatchers:
|
|
raise RuntimeError("Invalid kind")
|
|
|
|
key = (workspace, flow, kind)
|
|
|
|
intf_defs = self.flows[flow_key]["interfaces"]
|
|
|
|
# FIXME: The -store bit, does it make sense?
|
|
if kind == "entity-contexts":
|
|
int_kind = kind + "-load"
|
|
else:
|
|
int_kind = kind + "-store"
|
|
|
|
if int_kind not in intf_defs:
|
|
raise RuntimeError("This kind not supported by flow")
|
|
|
|
qconfig = intf_defs[int_kind]["flow"]
|
|
|
|
id = str(uuid.uuid4())
|
|
dispatcher = export_dispatchers[kind](
|
|
backend = self.backend,
|
|
ws = ws,
|
|
running = running,
|
|
queue = qconfig,
|
|
consumer = f"{self.prefix}-{id}",
|
|
subscriber = f"{self.prefix}-{id}",
|
|
)
|
|
|
|
return dispatcher
|
|
|
|
async def process_socket(self, ws, running, params):
|
|
|
|
# The mux self-authenticates via the first-frame protocol;
|
|
# pass the gateway's IamAuth so it can validate tokens
|
|
# without reaching back into the endpoint layer.
|
|
dispatcher = Mux(self, ws, running, auth=self.auth)
|
|
|
|
return dispatcher
|
|
|
|
async def process_flow_service(self, data, responder, params):
|
|
|
|
# Workspace can come from URL or from request body, defaulting
|
|
# to "default". Having it in the URL allows gateway routing to
|
|
# be workspace-aware without touching the body.
|
|
workspace = params.get("workspace")
|
|
if not workspace and isinstance(data, dict):
|
|
workspace = data.get("workspace")
|
|
if not workspace:
|
|
workspace = "default"
|
|
|
|
flow = params.get("flow")
|
|
kind = params.get("kind")
|
|
|
|
return await self.invoke_flow_service(
|
|
data, responder, workspace, flow, kind,
|
|
)
|
|
|
|
async def invoke_flow_service(
|
|
self, data, responder, workspace, flow, kind,
|
|
):
|
|
|
|
flow_key = (workspace, flow)
|
|
if flow_key not in self.flows:
|
|
raise RuntimeError(f"Invalid flow {workspace}/{flow}")
|
|
|
|
key = (workspace, flow, kind)
|
|
|
|
if key not in self.dispatchers:
|
|
async with self.dispatcher_lock:
|
|
if key not in self.dispatchers:
|
|
intf_defs = self.flows[flow_key]["interfaces"]
|
|
|
|
if kind not in intf_defs:
|
|
raise RuntimeError("This kind not supported by flow")
|
|
|
|
qconfig = intf_defs[kind]
|
|
|
|
if kind in request_response_dispatchers:
|
|
dispatcher = request_response_dispatchers[kind](
|
|
backend = self.backend,
|
|
request_queue = qconfig["request"],
|
|
response_queue = qconfig["response"],
|
|
timeout = 120,
|
|
consumer = f"{self.prefix}-{workspace}-{flow}-{kind}-request",
|
|
subscriber = f"{self.prefix}-{workspace}-{flow}-{kind}-request",
|
|
)
|
|
elif kind in sender_dispatchers:
|
|
dispatcher = sender_dispatchers[kind](
|
|
backend = self.backend,
|
|
queue = qconfig["flow"],
|
|
)
|
|
else:
|
|
raise RuntimeError("Invalid kind")
|
|
|
|
await dispatcher.start()
|
|
self.dispatchers[key] = dispatcher
|
|
|
|
return await self.dispatchers[key].process(data, responder)
|
|
|