Mirror of https://github.com/trustgraph-ai/trustgraph.git (synced 2026-04-25 00:16:23 +02:00)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model. The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.
IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
passwords and JWT signing keys in Cassandra. Reached over the
standard pub/sub request/response pattern; gateway is the only
caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
rotate-signing-key, create/list/get/update/disable/delete/enable-user,
change-password, reset-password, create/list/get/update/disable-
workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and
retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed. Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
required startup argument with no permissive default. Masked
"auth failure" errors hide whether a refused bootstrap request was
due to mode, state, or authorisation.
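
The password and API-key storage rules above can be sketched with the Python standard library. This is a minimal illustration, not the iam-svc code: the function names, the 16-byte salt length and the hex encoding of API keys are assumptions; only the algorithm names, the 600k iteration count and the 128-bit key size come from the text above.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

PBKDF2_ITERATIONS = 600_000  # iteration count stated above
SALT_BYTES = 16              # assumption: the salt length is not given above


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a key with PBKDF2-HMAC-SHA-256; returns (salt, derived_key)."""
    if salt is None:
        salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per user
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


def new_api_key() -> Tuple[str, str]:
    """Return (plaintext, sha256_hex); only the hash would be stored."""
    plaintext = secrets.token_hex(16)  # 16 random bytes = 128 bits
    digest = hashlib.sha256(plaintext.encode("ascii")).hexdigest()
    return plaintext, digest
```

Because only the SHA-256 digest is kept, the plaintext key can be shown to the operator exactly once, which matches the "plaintext returned once" rule.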
Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator. Distinguishes JWTs
(three-segment dotted) from API keys by shape; verifies JWTs
locally using the cached IAM public key; resolves API keys via
IAM with a short-TTL hash-keyed cache. Every failure path
surfaces the same 401 body ("auth failure") so callers cannot
enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
traffic does not begin flowing until auth has started.
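
The credential-shape test and the hash-keyed cache described above might look roughly like this. It is a sketch under assumptions: `looks_like_jwt`, `ApiKeyCache`, the 30-second default TTL and the injectable clock are hypothetical names and values, not the IamAuth implementation.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple


def looks_like_jwt(token: str) -> bool:
    """True when the token has three non-empty dot-separated segments."""
    parts = token.split(".")
    return len(parts) == 3 and all(parts)


class ApiKeyCache:
    """Short-TTL cache keyed by the SHA-256 of the key, never the plaintext."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds  # assumption: the real TTL is not stated above
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def get(self, token: str) -> Optional[Any]:
        """Return the cached identity, or None if absent or expired."""
        key = self._digest(token)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, identity = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: force a fresh IAM lookup
            return None
        return identity

    def put(self, token: str, identity: Any) -> None:
        """Store an identity resolved by IAM until the TTL elapses."""
        self._entries[self._digest(token)] = (self._clock() + self._ttl, identity)
```

A short TTL bounds how long a revoked key can keep working from cache, and keying by hash keeps plaintext keys out of gateway memory structures.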
Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
OSS ships reader / writer / admin; the first two are workspace-
assigned, admin is cross-workspace ("*"). No "cross-workspace"
pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
authorisation test: some role must grant the capability *and* be
active in the target workspace.
* enforce_workspace validates a request-body workspace against the
caller's role scopes and injects the resolved value. Cross-
workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
permissive default. Construction fails fast if omitted. Enterprise
editions can replace the role table without changing the wire
protocol.
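
As a rough illustration of the two-dimensional role model, the sketch below implements a `check` with the signature given above. The role table contents, the capability names other than `graph:write`, the simplified `Identity` fields, and the treatment of `target_workspace=None` as "the caller's own workspace" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical role table: a capability set plus a workspace scope per role.
# "assigned" means the identity's own workspace; "*" means every workspace.
ROLES: Dict[str, dict] = {
    "reader": {"caps": {"graph:read"}, "scope": "assigned"},
    "writer": {"caps": {"graph:read", "graph:write"}, "scope": "assigned"},
    "admin": {"caps": {"graph:read", "graph:write", "iam:admin"}, "scope": "*"},
}


@dataclass
class Identity:
    user_id: str
    workspace: str
    roles: List[str] = field(default_factory=list)


def check(identity: Identity, capability: str,
          target_workspace: Optional[str] = None) -> bool:
    """True if some role grants the capability AND covers the target workspace."""
    # Assumption: no explicit target means the caller's own workspace.
    target = target_workspace or identity.workspace
    for name in identity.roles:
        role = ROLES.get(name)
        if role is None:
            continue  # unknown role: fail closed
        caps: Set[str] = role["caps"]
        if capability not in caps:
            continue  # this role does not grant the capability
        if role["scope"] == "*" or target == identity.workspace:
            return True
    return False
```

Note that cross-workspace access falls out of the admin role's scope; there is no separate bypass branch, which mirrors the design point made in the list above.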
WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
runs on the first WebSocket frame ({"type":"auth","token":"..."})
with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
The socket stays open on failure so the client can re-authenticate
— browsers treat a handshake-time 401 as terminal, breaking
reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
enforces the caller's workspace (envelope + inner payload) using
the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
handshake (URL-scoped short-lived transfers; no re-auth need).
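
The first-frame protocol on /api/v1/socket can be modelled as a small state machine. The sketch below is illustrative only: `FirstFrameGate` and its `authenticate` callback are hypothetical, and two behaviours are assumptions rather than facts from the text above: the reply sent for a non-auth frame before authentication, and dropping the current identity when a re-auth attempt fails.

```python
from typing import Callable, Optional


class FirstFrameGate:
    """Tracks auth state for one socket; nothing is dispatched before auth-ok."""

    def __init__(self, authenticate: Callable[[str], Optional[str]]):
        # authenticate(token) returns a workspace name, or None on failure.
        self._authenticate = authenticate
        self.workspace: Optional[str] = None

    def handle(self, frame: dict) -> Optional[dict]:
        """Return a reply frame, or None when the frame may be dispatched."""
        if frame.get("type") == "auth":
            workspace = self._authenticate(str(frame.get("token", "")))
            if workspace is None:
                # Assumption: a failed (re-)auth leaves the socket unauthenticated.
                self.workspace = None
                return {"type": "auth-failed"}
            self.workspace = workspace
            return {"type": "auth-ok", "workspace": workspace}
        if self.workspace is None:
            # Non-auth frame before auth is rejected; reply shape is an assumption.
            return {"type": "auth-failed"}
        return None  # authenticated: hand the frame to the dispatcher
```

The gate never closes the connection itself, so a client that receives `auth-failed` can send another auth frame on the same socket.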
Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
the IAM API (per-op REST endpoints to follow in a later change).
Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
Authenticator.permitted contract. The gateway cannot run without
IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
downgrade path.
CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
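
The stdout/stderr split used by the CLI tools for secrets can be sketched as follows. `emit_secret` is a hypothetical helper written for this example, not a function from trustgraph-cli.

```python
import sys
from typing import Optional, TextIO


def emit_secret(secret: str, context: str,
                out: Optional[TextIO] = None,
                err: Optional[TextIO] = None) -> None:
    """Write the secret alone to stdout and the human context to stderr."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    print(context, file=err)  # operator-facing note, ignored by pipes
    print(secret, file=out)   # the only thing a shell capture will see
```

With this split, a shell command substitution captures just the token, while the explanatory text still reaches the terminal.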
Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
role bundles, agent-as-composition note, enforcement-boundary
policy, enterprise extensibility.
Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
role x workspace combinations, enforce_workspace paths,
unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
explicitly (no permissive defaults relied upon). New tests pin
the fail-closed invariants: DispatcherManager / Mux refuse
auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
437 lines
No EOL
14 KiB
Python
"""Unit tests for SocketEndpoint graceful shutdown functionality.

These tests exercise SocketEndpoint in its handshake-auth
configuration (``in_band_auth=False``), the mode used in production
for the flow import/export streaming endpoints. The mux socket at
``/api/v1/socket`` uses ``in_band_auth=True`` instead, where the
handshake always accepts and authentication runs on the first
WebSocket frame; that path is covered by the Mux tests.

Every endpoint constructor here passes an explicit capability; no
permissive default is relied upon.
"""

import pytest
import asyncio
from unittest.mock import AsyncMock, MagicMock, patch
from aiohttp import web, WSMsgType
from trustgraph.gateway.endpoint.socket import SocketEndpoint
from trustgraph.gateway.running import Running
from trustgraph.gateway.auth import Identity


# Representative capability used across these tests; corresponds to
# the flow-import streaming endpoint pattern that uses this class.
TEST_CAP = "graph:write"


def _valid_identity(roles=("admin",)):
    return Identity(
        user_id="test-user",
        workspace="default",
        roles=list(roles),
        source="api-key",
    )


@pytest.fixture
def mock_auth():
    """Mock IAM-backed authenticator. Successful by default:
    ``authenticate`` returns a valid admin identity. Tests that
    need the auth failure path override the ``authenticate``
    attribute locally."""
    auth = MagicMock()
    auth.authenticate = AsyncMock(return_value=_valid_identity())
    return auth


@pytest.fixture
def mock_dispatcher_factory():
    """Mock dispatcher factory function."""
    async def dispatcher_factory(ws, running, match_info):
        dispatcher = AsyncMock()
        dispatcher.run = AsyncMock()
        dispatcher.receive = AsyncMock()
        dispatcher.destroy = AsyncMock()
        return dispatcher

    return dispatcher_factory


@pytest.fixture
def socket_endpoint(mock_auth, mock_dispatcher_factory):
    """Create SocketEndpoint for testing."""
    return SocketEndpoint(
        endpoint_path="/test-socket",
        auth=mock_auth,
        dispatcher=mock_dispatcher_factory,
        capability=TEST_CAP,
    )


@pytest.fixture
def mock_websocket():
    """Mock websocket response."""
    ws = AsyncMock(spec=web.WebSocketResponse)
    ws.prepare = AsyncMock()
    ws.close = AsyncMock()
    ws.closed = False
    return ws


@pytest.fixture
def mock_request():
    """Mock HTTP request."""
    request = MagicMock()
    request.query = {"token": "test-token"}
    request.match_info = {}
    return request


@pytest.mark.asyncio
async def test_listener_graceful_shutdown_on_close():
    """Test listener handles websocket close gracefully."""
    socket_endpoint = SocketEndpoint(
        "/test", MagicMock(), AsyncMock(),
        capability=TEST_CAP,
    )

    # Mock websocket that closes after one message
    ws = AsyncMock()

    # Create async iterator that yields one message then closes
    async def mock_iterator(self):
        # Yield normal message
        msg = MagicMock()
        msg.type = WSMsgType.TEXT
        yield msg

        # Yield close message
        close_msg = MagicMock()
        close_msg.type = WSMsgType.CLOSE
        yield close_msg

    # Set the async iterator method
    ws.__aiter__ = mock_iterator

    dispatcher = AsyncMock()
    running = Running()

    with patch('asyncio.sleep') as mock_sleep:
        await socket_endpoint.listener(ws, dispatcher, running)

    # Should have processed one message
    dispatcher.receive.assert_called_once()

    # Should have initiated graceful shutdown
    assert running.get() is False

    # Should have slept for grace period
    mock_sleep.assert_called_once_with(1.0)


@pytest.mark.asyncio
async def test_handle_normal_flow():
    """Valid bearer → handshake accepted, dispatcher created."""
    mock_auth = MagicMock()
    mock_auth.authenticate = AsyncMock(return_value=_valid_identity())

    dispatcher_created = False
    async def mock_dispatcher_factory(ws, running, match_info):
        nonlocal dispatcher_created
        dispatcher_created = True
        dispatcher = AsyncMock()
        dispatcher.destroy = AsyncMock()
        return dispatcher

    socket_endpoint = SocketEndpoint(
        "/test", mock_auth, mock_dispatcher_factory,
        capability=TEST_CAP,
    )

    request = MagicMock()
    request.query = {"token": "valid-token"}
    request.match_info = {}

    with patch('aiohttp.web.WebSocketResponse') as mock_ws_class:
        mock_ws = AsyncMock()
        mock_ws.prepare = AsyncMock()
        mock_ws.close = AsyncMock()
        mock_ws.closed = False
        mock_ws_class.return_value = mock_ws

        with patch('asyncio.TaskGroup') as mock_task_group:
            # Mock task group context manager
            mock_tg = AsyncMock()
            mock_tg.__aenter__ = AsyncMock(return_value=mock_tg)
            mock_tg.__aexit__ = AsyncMock(return_value=None)

            # Create proper mock tasks that look like asyncio.Task objects
            def create_task_mock(coro):
                # Consume the coroutine to avoid "was never awaited" warning
                coro.close()
                task = AsyncMock()
                task.done = MagicMock(return_value=True)
                task.cancelled = MagicMock(return_value=False)
                return task

            mock_tg.create_task = MagicMock(side_effect=create_task_mock)
            mock_task_group.return_value = mock_tg

            result = await socket_endpoint.handle(request)

    # Should have created dispatcher
    assert dispatcher_created is True

    # Should return websocket
    assert result == mock_ws


@pytest.mark.asyncio
async def test_handle_exception_group_cleanup():
    """Test exception group triggers dispatcher cleanup."""
    mock_auth = MagicMock()
    mock_auth.authenticate = AsyncMock(return_value=_valid_identity())

    mock_dispatcher = AsyncMock()
    mock_dispatcher.destroy = AsyncMock()

    async def mock_dispatcher_factory(ws, running, match_info):
        return mock_dispatcher

    socket_endpoint = SocketEndpoint(
        "/test", mock_auth, mock_dispatcher_factory,
        capability=TEST_CAP,
    )

    request = MagicMock()
    request.query = {"token": "valid-token"}
    request.match_info = {}

    # Mock TaskGroup to raise ExceptionGroup
    class TestException(Exception):
        pass

    exception_group = ExceptionGroup("Test exceptions", [TestException("test")])

    with patch('aiohttp.web.WebSocketResponse') as mock_ws_class:
        mock_ws = AsyncMock()
        mock_ws.prepare = AsyncMock()
        mock_ws.close = AsyncMock()
        mock_ws.closed = False
        mock_ws_class.return_value = mock_ws

        with patch('asyncio.TaskGroup') as mock_task_group:
            mock_tg = AsyncMock()
            mock_tg.__aenter__ = AsyncMock(return_value=mock_tg)
            mock_tg.__aexit__ = AsyncMock(side_effect=exception_group)

            # Create proper mock tasks that look like asyncio.Task objects
            def create_task_mock(coro):
                # Consume the coroutine to avoid "was never awaited" warning
                coro.close()
                task = AsyncMock()
                task.done = MagicMock(return_value=True)
                task.cancelled = MagicMock(return_value=False)
                return task

            mock_tg.create_task = MagicMock(side_effect=create_task_mock)
            mock_task_group.return_value = mock_tg

            with patch('trustgraph.gateway.endpoint.socket.asyncio.wait_for', new_callable=AsyncMock) as mock_wait_for:
                # Make wait_for consume the coroutine passed to it
                async def wait_for_side_effect(coro, timeout=None):
                    coro.close()  # Consume the coroutine
                    return None
                mock_wait_for.side_effect = wait_for_side_effect

                result = await socket_endpoint.handle(request)

    # Should have attempted graceful cleanup
    mock_wait_for.assert_called_once()

    # Should have called destroy in finally block
    assert mock_dispatcher.destroy.call_count >= 1

    # Should have closed websocket
    mock_ws.close.assert_called()


@pytest.mark.asyncio
async def test_handle_dispatcher_cleanup_timeout():
    """Test dispatcher cleanup with timeout."""
    mock_auth = MagicMock()
    mock_auth.authenticate = AsyncMock(return_value=_valid_identity())

    # Mock dispatcher that takes long to destroy
    mock_dispatcher = AsyncMock()
    mock_dispatcher.destroy = AsyncMock()

    async def mock_dispatcher_factory(ws, running, match_info):
        return mock_dispatcher

    socket_endpoint = SocketEndpoint(
        "/test", mock_auth, mock_dispatcher_factory,
        capability=TEST_CAP,
    )

    request = MagicMock()
    request.query = {"token": "valid-token"}
    request.match_info = {}

    # Mock TaskGroup to raise exception
    exception_group = ExceptionGroup("Test", [Exception("test")])

    with patch('aiohttp.web.WebSocketResponse') as mock_ws_class:
        mock_ws = AsyncMock()
        mock_ws.prepare = AsyncMock()
        mock_ws.close = AsyncMock()
        mock_ws.closed = False
        mock_ws_class.return_value = mock_ws

        with patch('asyncio.TaskGroup') as mock_task_group:
            mock_tg = AsyncMock()
            mock_tg.__aenter__ = AsyncMock(return_value=mock_tg)
            mock_tg.__aexit__ = AsyncMock(side_effect=exception_group)

            # Create proper mock tasks that look like asyncio.Task objects
            def create_task_mock(coro):
                # Consume the coroutine to avoid "was never awaited" warning
                coro.close()
                task = AsyncMock()
                task.done = MagicMock(return_value=True)
                task.cancelled = MagicMock(return_value=False)
                return task

            mock_tg.create_task = MagicMock(side_effect=create_task_mock)
            mock_task_group.return_value = mock_tg

            # Mock asyncio.wait_for to raise TimeoutError
            with patch('trustgraph.gateway.endpoint.socket.asyncio.wait_for', new_callable=AsyncMock) as mock_wait_for:
                # Make wait_for consume the coroutine before raising
                async def wait_for_timeout(coro, timeout=None):
                    coro.close()  # Consume the coroutine
                    raise asyncio.TimeoutError("Cleanup timeout")
                mock_wait_for.side_effect = wait_for_timeout

                result = await socket_endpoint.handle(request)

    # Should have attempted cleanup with timeout
    mock_wait_for.assert_called_once()
    # Check that timeout was passed correctly
    assert mock_wait_for.call_args[1]['timeout'] == 5.0

    # Should still call destroy in finally block
    assert mock_dispatcher.destroy.call_count >= 1


@pytest.mark.asyncio
async def test_handle_unauthorized_request():
    """A bearer that the IAM layer rejects causes the handshake to
    fail with 401. IamAuth surfaces an HTTPUnauthorized; the
    endpoint propagates it. Note that the endpoint intentionally
    does NOT distinguish 'bad token', 'expired', 'revoked', etc.;
    that's the IAM error-masking policy."""
    mock_auth = MagicMock()
    mock_auth.authenticate = AsyncMock(side_effect=web.HTTPUnauthorized(
        text='{"error":"auth failure"}',
        content_type="application/json",
    ))

    socket_endpoint = SocketEndpoint(
        "/test", mock_auth, AsyncMock(),
        capability=TEST_CAP,
    )

    request = MagicMock()
    request.query = {"token": "invalid-token"}

    result = await socket_endpoint.handle(request)

    assert isinstance(result, web.HTTPUnauthorized)
    # authenticate must have been invoked with a synthetic request
    # carrying Bearer <the-token>. The endpoint wraps the query-
    # string token into an Authorization header for a uniform auth
    # path; the IAM layer does not look at query strings directly.
    mock_auth.authenticate.assert_called_once()
    passed_req = mock_auth.authenticate.call_args.args[0]
    assert passed_req.headers["Authorization"] == "Bearer invalid-token"


@pytest.mark.asyncio
async def test_handle_missing_token():
    """Request with no ``token`` query param → 401 before any
    IAM call is made (cheap short-circuit)."""
    mock_auth = MagicMock()
    mock_auth.authenticate = AsyncMock(
        side_effect=AssertionError(
            "authenticate must not be invoked when no token is present"
        ),
    )

    socket_endpoint = SocketEndpoint(
        "/test", mock_auth, AsyncMock(),
        capability=TEST_CAP,
    )

    request = MagicMock()
    request.query = {}  # No token

    result = await socket_endpoint.handle(request)

    assert isinstance(result, web.HTTPUnauthorized)
    mock_auth.authenticate.assert_not_called()


@pytest.mark.asyncio
async def test_handle_websocket_already_closed():
    """Test handling when websocket is already closed."""
    mock_auth = MagicMock()
    mock_auth.authenticate = AsyncMock(return_value=_valid_identity())

    mock_dispatcher = AsyncMock()
    mock_dispatcher.destroy = AsyncMock()

    async def mock_dispatcher_factory(ws, running, match_info):
        return mock_dispatcher

    socket_endpoint = SocketEndpoint(
        "/test", mock_auth, mock_dispatcher_factory,
        capability=TEST_CAP,
    )

    request = MagicMock()
    request.query = {"token": "valid-token"}
    request.match_info = {}

    with patch('aiohttp.web.WebSocketResponse') as mock_ws_class:
        mock_ws = AsyncMock()
        mock_ws.prepare = AsyncMock()
        mock_ws.close = AsyncMock()
        mock_ws.closed = True  # Already closed
        mock_ws_class.return_value = mock_ws

        with patch('asyncio.TaskGroup') as mock_task_group:
            mock_tg = AsyncMock()
            mock_tg.__aenter__ = AsyncMock(return_value=mock_tg)
            mock_tg.__aexit__ = AsyncMock(return_value=None)

            # Create proper mock tasks that look like asyncio.Task objects
            def create_task_mock(coro):
                # Consume the coroutine to avoid "was never awaited" warning
                coro.close()
                task = AsyncMock()
                task.done = MagicMock(return_value=True)
                task.cancelled = MagicMock(return_value=False)
                return task

            mock_tg.create_task = MagicMock(side_effect=create_task_mock)
            mock_task_group.return_value = mock_tg

            result = await socket_endpoint.handle(request)

    # Should still have called destroy
    mock_dispatcher.destroy.assert_called()

    # Should not attempt to close already closed websocket
    mock_ws.close.assert_not_called()  # Not called in finally since ws.closed = True