trustgraph/tests/unit/test_gateway/test_endpoint_socket.py
cybermaggedon 67b2fc448f
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model.  The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.

IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
  passwords and JWT signing keys in Cassandra.  Reached over the
  standard pub/sub request/response pattern; gateway is the only
  caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
  rotate-signing-key, create/list/get/update/disable/delete/enable-user,
  change-password, reset-password, create/list/get/update/disable-
  workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA).  Key rotation writes a new kid and
  retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed.  Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
  required startup argument with no permissive default.  Masked
  "auth failure" errors hide whether a refused bootstrap request was
  due to mode, state, or authorisation.

Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator.  Distinguishes JWTs
  (three-segment dotted) from API keys by shape; verifies JWTs
  locally using the cached IAM public key; resolves API keys via
  IAM with a short-TTL hash-keyed cache.  Every failure path
  surfaces the same 401 body ("auth failure") so callers cannot
  enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
  traffic does not begin flowing until auth has started.

Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
  OSS ships reader / writer / admin; the first two are workspace-
  assigned, admin is cross-workspace ("*").  No "cross-workspace"
  pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
  authorisation test: some role must grant the capability *and* be
  active in the target workspace.
* enforce_workspace validates a request-body workspace against the
  caller's role scopes and injects the resolved value.  Cross-
  workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
  permissive default.  Construction fails fast if omitted.  Enterprise
  editions can replace the role table without changing the wire
  protocol.

WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
  runs on the first WebSocket frame ({"type":"auth","token":"..."})
  with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
  The socket stays open on failure so the client can re-authenticate
  — browsers treat a handshake-time 401 as terminal, breaking
  reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
  enforces the caller's workspace (envelope + inner payload) using
  the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
  handshake (URL-scoped short-lived transfers; no re-auth need).

Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
  op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
  the IAM API (per-op REST endpoints to follow in a later change).

Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
  Authenticator.permitted contract.  The gateway cannot run without
  IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
  downgrade path.

CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces.  Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.

Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
  resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
  operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
  role bundles, agent-as-composition note, enforcement-boundary
  policy, enterprise extensibility.

Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
  Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
  role x workspace combinations, enforce_workspace paths,
  unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
  explicitly (no permissive defaults relied upon).  New tests pin
  the fail-closed invariants: DispatcherManager / Mux refuse
  auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00

158 lines
No EOL
5.3 KiB
Python

"""
Tests for Gateway Socket Endpoint.
In production the only SocketEndpoint registered with HTTP-layer
auth is ``/api/v1/socket`` using ``capability=AUTHENTICATED`` with
``in_band_auth=True`` (first-frame auth over the websocket frames,
not at the handshake). The tests below use AUTHENTICATED as the
representative capability; construction / worker / listener
behaviour is independent of which capability is configured.
"""
import pytest
from unittest.mock import MagicMock, AsyncMock
from aiohttp import WSMsgType
from trustgraph.gateway.endpoint.socket import SocketEndpoint
from trustgraph.gateway.capabilities import AUTHENTICATED
class TestSocketEndpoint:
"""Test cases for SocketEndpoint class"""
def test_socket_endpoint_initialization(self):
"""Construction records the configured capability on the
instance. No permissive default is applied."""
mock_auth = MagicMock()
mock_dispatcher = MagicMock()
endpoint = SocketEndpoint(
endpoint_path="/api/socket",
auth=mock_auth,
dispatcher=mock_dispatcher,
capability=AUTHENTICATED,
)
assert endpoint.path == "/api/socket"
assert endpoint.auth == mock_auth
assert endpoint.dispatcher == mock_dispatcher
assert endpoint.capability == AUTHENTICATED
@pytest.mark.asyncio
async def test_worker_method(self):
"""Test SocketEndpoint worker method"""
mock_auth = MagicMock()
mock_dispatcher = AsyncMock()
endpoint = SocketEndpoint(
"/api/socket", mock_auth, mock_dispatcher,
capability=AUTHENTICATED,
)
mock_ws = MagicMock()
mock_running = MagicMock()
# Call worker method
await endpoint.worker(mock_ws, mock_dispatcher, mock_running)
# Verify dispatcher.run was called
mock_dispatcher.run.assert_called_once()
@pytest.mark.asyncio
async def test_listener_method_with_text_message(self):
"""Test SocketEndpoint listener method with text message"""
mock_auth = MagicMock()
mock_dispatcher = AsyncMock()
endpoint = SocketEndpoint(
"/api/socket", mock_auth, mock_dispatcher,
capability=AUTHENTICATED,
)
# Mock websocket with text message
mock_msg = MagicMock()
mock_msg.type = WSMsgType.TEXT
# Create async iterator for websocket
async def async_iter():
yield mock_msg
mock_ws = AsyncMock()
mock_ws.__aiter__ = lambda self: async_iter()
mock_ws.closed = False # Set closed attribute
mock_running = MagicMock()
# Call listener method
await endpoint.listener(mock_ws, mock_dispatcher, mock_running)
# Verify dispatcher.receive was called with the message
mock_dispatcher.receive.assert_called_once_with(mock_msg)
# Verify cleanup methods were called
mock_running.stop.assert_called_once()
mock_ws.close.assert_called_once()
@pytest.mark.asyncio
async def test_listener_method_with_binary_message(self):
"""Test SocketEndpoint listener method with binary message"""
mock_auth = MagicMock()
mock_dispatcher = AsyncMock()
endpoint = SocketEndpoint(
"/api/socket", mock_auth, mock_dispatcher,
capability=AUTHENTICATED,
)
# Mock websocket with binary message
mock_msg = MagicMock()
mock_msg.type = WSMsgType.BINARY
# Create async iterator for websocket
async def async_iter():
yield mock_msg
mock_ws = AsyncMock()
mock_ws.__aiter__ = lambda self: async_iter()
mock_ws.closed = False # Set closed attribute
mock_running = MagicMock()
# Call listener method
await endpoint.listener(mock_ws, mock_dispatcher, mock_running)
# Verify dispatcher.receive was called with the message
mock_dispatcher.receive.assert_called_once_with(mock_msg)
# Verify cleanup methods were called
mock_running.stop.assert_called_once()
mock_ws.close.assert_called_once()
@pytest.mark.asyncio
async def test_listener_method_with_close_message(self):
"""Test SocketEndpoint listener method with close message"""
mock_auth = MagicMock()
mock_dispatcher = AsyncMock()
endpoint = SocketEndpoint(
"/api/socket", mock_auth, mock_dispatcher,
capability=AUTHENTICATED,
)
# Mock websocket with close message
mock_msg = MagicMock()
mock_msg.type = WSMsgType.CLOSE
# Create async iterator for websocket
async def async_iter():
yield mock_msg
mock_ws = AsyncMock()
mock_ws.__aiter__ = lambda self: async_iter()
mock_ws.closed = False # Set closed attribute
mock_running = MagicMock()
# Call listener method
await endpoint.listener(mock_ws, mock_dispatcher, mock_running)
# Verify dispatcher.receive was NOT called for close message
mock_dispatcher.receive.assert_not_called()
# Verify cleanup methods were called after break
mock_running.stop.assert_called_once()
mock_ws.close.assert_called_once()