mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 00:16:23 +02:00
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
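Below is a minimal sketch of the password and API-key handling described above, using only the Python standard library. The PBKDF2-HMAC-SHA-256 digest, the 600k iteration count and the 128-bit key size come from this change. The 16-byte salt, the hex encoding of the key and the helper names (hash_password, verify_password, new_api_key) are assumptions for illustration, not the iam-svc code.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# From the change description: PBKDF2-HMAC-SHA-256, 600k iterations.
PBKDF2_ITERATIONS = 600_000
# Assumption: the per-user salt length is not specified; 16 bytes is illustrative.
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh random salt is drawn if none is given."""
    if salt is None:
        salt = secrets.token_bytes(SALT_BYTES)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


def new_api_key() -> Tuple[str, str]:
    """Return (plaintext, sha256_hex) for a new 128-bit API key.

    Assumption: hex encoding; the real key format may differ.
    """
    plaintext = secrets.token_hex(16)  # 16 random bytes = 128 bits
    digest = hashlib.sha256(plaintext.encode("ascii")).hexdigest()
    return plaintext, digest
```

In this sketch only the salt, the derived key and the key digest would be persisted. The plaintext API key is handed back to the caller once and then discarded.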
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* The public key is fetched at gateway startup with a bounded retry
loop; the gateway does not accept traffic until auth start-up has
completed (or timed out).
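The shape test and the hash-keyed cache can be sketched as follows. looks_like_jwt and ApiKeyCache are hypothetical names. The resolve callable stands in for the IAM resolve-api-key round-trip, and the 60-second default TTL is an assumed value. In this sketch a failed resolution raises and is not cached, so a later retry reaches IAM again.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


def looks_like_jwt(token: str) -> bool:
    """Compact JWTs are three non-empty dot-separated segments
    (header.payload.signature); anything else is treated as an API key."""
    parts = token.split(".")
    return len(parts) == 3 and all(parts)


class ApiKeyCache:
    """Hypothetical short-TTL cache keyed by SHA-256(api_key), so the
    plaintext key is never held as a dictionary key."""

    def __init__(
        self,
        resolve: Callable[[str], str],
        ttl: float = 60.0,  # assumed value
        clock: Callable[[], float] = time.monotonic,
    ):
        self._resolve = resolve  # stands in for IAM resolve-api-key
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def lookup(self, api_key: str) -> str:
        key_hash = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
        now = self._clock()
        cached = self._entries.get(key_hash)
        if cached is not None and cached[0] > now:
            return cached[1]
        # On failure resolve() raises, so nothing is cached.
        identity = self._resolve(api_key)
        self._entries[key_hash] = (now + self._ttl, identity)
        return identity
```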
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are assigned per
workspace, while admin is cross-workspace ("*"). There is no
"cross-workspace" pseudo-capability: workspace permission is a
property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
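Below is a sketch of the two-dimensional role model and the check() rule, assuming a simple in-memory role table. The capability strings (data:read, data:write, iam:admin) and the Identity / RoleGrant dataclasses are hypothetical; the real vocabulary lives in docs/tech-specs/capabilities.md. Treating target_workspace=None as "no workspace constraint" is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional

# Hypothetical capability names, for illustration only.
ROLE_CAPABILITIES: Dict[str, FrozenSet[str]] = {
    "reader": frozenset({"data:read"}),
    "writer": frozenset({"data:read", "data:write"}),
    "admin": frozenset({"data:read", "data:write", "iam:admin"}),
}


@dataclass
class RoleGrant:
    role: str
    workspace: str  # a workspace id, or "*" for cross-workspace


@dataclass
class Identity:
    user: str
    grants: List[RoleGrant] = field(default_factory=list)


def check(identity: Identity, capability: str,
          target_workspace: Optional[str] = None) -> bool:
    """Allow only if some role grants the capability AND is active in
    the target workspace. Unknown roles/capabilities fail closed."""
    for grant in identity.grants:
        caps = ROLE_CAPABILITIES.get(grant.role)
        if caps is None or capability not in caps:
            continue
        if target_workspace is None:
            return True
        if grant.workspace == "*" or grant.workspace == target_workspace:
            return True
    return False
```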
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."}),
which is answered with {"type":"auth-ok","workspace":"..."} or
{"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
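The first-frame protocol amounts to a small per-connection state machine. The sketch below is illustrative only:

* FirstFrameSession is a hypothetical name.
* authenticate() stands in for IamAuth and returns a workspace or None.
* The "error" and "accepted" replies for non-auth frames are invented placeholders. The protocol above only defines auth, auth-ok and auth-failed.

```python
import json
from typing import Callable, Optional


class FirstFrameSession:
    """Hypothetical per-connection state for first-frame auth."""

    def __init__(self, authenticate: Callable[[str], Optional[str]]):
        # authenticate(token) returns the caller's workspace, or None.
        self._authenticate = authenticate
        self.workspace: Optional[str] = None

    def receive(self, raw: str) -> dict:
        try:
            frame = json.loads(raw)
        except ValueError:
            frame = None
        if not isinstance(frame, dict):
            # Placeholder reply shape (not defined by the protocol above).
            return {"type": "error", "error": "malformed frame"}

        if frame.get("type") == "auth":
            workspace = self._authenticate(str(frame.get("token", "")))
            if workspace is None:
                # The socket stays open; the client may send another auth
                # frame. This sketch also drops any earlier identity.
                self.workspace = None
                return {"type": "auth-failed"}
            self.workspace = workspace  # first auth or mid-session re-auth
            return {"type": "auth-ok", "workspace": workspace}

        if self.workspace is None:
            # Every non-auth frame is rejected until auth succeeds.
            return {"type": "error", "error": "not authenticated"}
        # Placeholder: a real mux would enforce the workspace and dispatch here.
        return {"type": "accepted", "workspace": self.workspace}
```

Dropping the previous identity on a failed re-auth is a choice made in this sketch. The change description does not say how Mux handles that case.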
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
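Both fail-closed rules in this change (no auth=None, and no endpoint without a capability) can be expressed as constructor guards. Dispatcher and Endpoint below are hypothetical stand-ins, not the gateway's DispatcherManager, Mux or endpoint classes. Making capability a keyword-only argument with no default lets Python itself reject an omitted capability.

```python
from typing import Any


class Dispatcher:
    """Hypothetical stand-in for a component that must never run
    unauthenticated (cf. DispatcherManager and Mux)."""

    def __init__(self, auth: Any):
        if auth is None:
            # No silent downgrade: refuse to construct at all.
            raise RuntimeError("an auth handle is required")
        self.auth = auth


class Endpoint:
    """Hypothetical endpoint: capability is keyword-only with no default,
    so omitting it raises TypeError at construction time."""

    def __init__(self, path: str, *, capability: str):
        if not capability:
            raise ValueError("endpoint must declare a capability")
        self.path = path
        self.capability = capability
```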
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
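The stdout / stderr split can be sketched as a tiny helper. emit_secret is a hypothetical name. The point is only that the secret is the sole thing written to stdout. Wrapping one of these tools in shell command substitution then captures just the secret, while the operator still sees the context on the terminal.

```python
import sys
from typing import Optional, TextIO


def emit_secret(
    secret: str,
    context: str,
    out: Optional[TextIO] = None,
    err: Optional[TextIO] = None,
) -> None:
    """Write operator-facing context to stderr and only the secret to stdout."""
    # Resolve the streams at call time so redirections still apply.
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    print(context, file=err)
    print(secret, file=out)
```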
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
179 lines
6.1 KiB
Python
"""
Tests for gateway/service.py — the Api class that wires together
the pub/sub backend, IAM auth, config receiver, dispatcher manager,
and endpoint manager.

The legacy ``GATEWAY_SECRET`` / ``default_api_token`` / allow-all
surface is gone, so the tests here focus on the Api's construction
and composition rather than the removed auth behaviour. IamAuth's
own behaviour is covered in test_auth.py.
"""

import pytest
from unittest.mock import AsyncMock, Mock, patch
from aiohttp import web

from trustgraph.gateway.service import (
    Api,
    default_pulsar_host, default_prometheus_url,
    default_timeout, default_port,
)
from trustgraph.gateway.auth import IamAuth


# -- constants -------------------------------------------------------------


class TestDefaults:

    def test_exports_default_constants(self):
        # These are consumed by CLIs / tests / docs. Sanity-check
        # that they're the expected shape.
        assert default_port == 8088
        assert default_timeout == 600
        assert default_pulsar_host.startswith("pulsar://")
        assert default_prometheus_url.startswith("http")


# -- Api construction ------------------------------------------------------


@pytest.fixture
def mock_backend():
    return Mock()


@pytest.fixture
def api(mock_backend):
    with patch(
        "trustgraph.gateway.service.get_pubsub",
        return_value=mock_backend,
    ):
        yield Api()


class TestApiConstruction:

    def test_defaults(self, api):
        assert api.port == default_port
        assert api.timeout == default_timeout
        assert api.pulsar_host == default_pulsar_host
        assert api.pulsar_api_key is None
        # prometheus_url gets normalised with a trailing slash
        assert api.prometheus_url == default_prometheus_url + "/"

    def test_auth_is_iam_backed(self, api):
        # Any Api always gets an IamAuth. There is no "no auth" mode
        # (GATEWAY_SECRET / allow_all has been removed — see IAM spec).
        assert isinstance(api.auth, IamAuth)

    def test_components_wired(self, api):
        assert api.config_receiver is not None
        assert api.dispatcher_manager is not None
        assert api.endpoint_manager is not None

    def test_dispatcher_manager_has_auth(self, api):
        # The Mux uses this handle for first-frame socket auth.
        assert api.dispatcher_manager.auth is api.auth

    def test_custom_config(self, mock_backend):
        config = {
            "port": 9000,
            "timeout": 300,
            "pulsar_host": "pulsar://custom-host:6650",
            "pulsar_api_key": "custom-key",
            "prometheus_url": "http://custom-prometheus:9090",
        }
        with patch(
            "trustgraph.gateway.service.get_pubsub",
            return_value=mock_backend,
        ):
            a = Api(**config)

        assert a.port == 9000
        assert a.timeout == 300
        assert a.pulsar_host == "pulsar://custom-host:6650"
        assert a.pulsar_api_key == "custom-key"
        # Trailing slash added.
        assert a.prometheus_url == "http://custom-prometheus:9090/"

    def test_prometheus_url_already_has_trailing_slash(self, mock_backend):
        with patch(
            "trustgraph.gateway.service.get_pubsub",
            return_value=mock_backend,
        ):
            a = Api(prometheus_url="http://p:9090/")
        assert a.prometheus_url == "http://p:9090/"

    def test_queue_overrides_parsed_for_config(self, mock_backend):
        with patch(
            "trustgraph.gateway.service.get_pubsub",
            return_value=mock_backend,
        ):
            a = Api(
                config_request_queue="alt-config-req",
                config_response_queue="alt-config-resp",
            )
        overrides = a.dispatcher_manager.queue_overrides
        assert overrides.get("config", {}).get("request") == "alt-config-req"
        assert overrides.get("config", {}).get("response") == "alt-config-resp"


# -- app_factory -----------------------------------------------------------


class TestAppFactory:

    @pytest.mark.asyncio
    async def test_creates_aiohttp_app(self, api):
        # Stub out the long-tail dependencies that reach out to IAM /
        # pub/sub so we can exercise the factory in isolation.
        api.auth.start = AsyncMock()
        api.config_receiver = Mock()
        api.config_receiver.start = AsyncMock()
        api.endpoint_manager = Mock()
        api.endpoint_manager.add_routes = Mock()
        api.endpoint_manager.start = AsyncMock()
        api.endpoints = []

        app = await api.app_factory()

        assert isinstance(app, web.Application)
        assert app._client_max_size == 256 * 1024 * 1024
        api.auth.start.assert_called_once()
        api.config_receiver.start.assert_called_once()
        api.endpoint_manager.add_routes.assert_called_once_with(app)
        api.endpoint_manager.start.assert_called_once()

    @pytest.mark.asyncio
    async def test_auth_start_runs_before_accepting_traffic(self, api):
        """``auth.start()`` fetches the IAM signing key, and must
        complete (or time out) before the gateway begins accepting
        requests. It's the first await in app_factory."""
        order = []

        # A plain sync callable is the simplest side_effect here:
        # AsyncMock calls it when the mock is awaited and returns its
        # result. (A lambda that *returns* a coroutine would not be
        # awaited, since AsyncMock only awaits coroutine functions.)
        api.auth.start = AsyncMock(
            side_effect=lambda: order.append("auth"),
        )
        api.config_receiver = Mock()
        api.config_receiver.start = AsyncMock(
            side_effect=lambda: order.append("config"),
        )
        api.endpoint_manager = Mock()
        api.endpoint_manager.add_routes = Mock()
        api.endpoint_manager.start = AsyncMock(
            side_effect=lambda: order.append("endpoints"),
        )
        api.endpoints = []

        await api.app_factory()

        # auth.start must be first (before config receiver, before
        # any endpoint starts).
        assert order[0] == "auth"
        # All three must have run.
        assert set(order) == {"auth", "config", "endpoints"}
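The ordering test in TestAppFactory depends on how AsyncMock treats a synchronous side_effect. Below is a standalone sketch of the same pattern. start_all is a hypothetical stand-in for app_factory's start-up sequence, not the gateway's code.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def start_all(auth, config_receiver, endpoint_manager):
    # Hypothetical start-up sequence: auth first, then config, then endpoints.
    await auth.start()
    await config_receiver.start()
    await endpoint_manager.start()


def recorded_order():
    order = []

    def component(name):
        # A synchronous side_effect: AsyncMock calls it when the mock is
        # awaited and returns its result (None here) as the awaited value.
        return SimpleNamespace(
            start=AsyncMock(side_effect=lambda: order.append(name))
        )

    asyncio.run(start_all(component("auth"), component("config"),
                          component("endpoints")))
    return order
```

Each mock appends its name at the moment it runs, so the list records the real call order. That is what lets the test assert that auth came first.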