feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)

Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model.  The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.

IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
  passwords and JWT signing keys in Cassandra.  Reached over the
  standard pub/sub request/response pattern; gateway is the only
  caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
  rotate-signing-key, create/list/get/update/disable/delete/enable-user,
  change-password, reset-password, create/list/get/update/disable-
  workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA).  Key rotation writes a new kid and
  retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed.  Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
  required startup argument with no permissive default.  Masked
  "auth failure" errors hide whether a refused bootstrap request was
  due to mode, state, or authorisation.
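
The password and API-key parameters above can be sketched with the standard
library alone. This is an illustration under stated assumptions, not the
iam-svc code: the function names, the 16-byte salt length, and the hex
encoding of the key are hypothetical; the iteration count, the hash
functions, the 128-bit key size and the `tg_` prefix (seen in the tests)
come from this commit.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

PBKDF2_ITERATIONS = 600_000  # iteration count stated in the commit message


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA-256."""
    if salt is None:
        salt = secrets.token_bytes(16)  # assumption: 16-byte per-user salt
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


def new_api_key() -> Tuple[str, str]:
    """Return (plaintext, sha256_hex); only the hash would be stored."""
    # 16 random bytes = 128 bits. Hex encoding is an assumption.
    plaintext = "tg_" + secrets.token_hex(16)
    digest = hashlib.sha256(plaintext.encode("ascii")).hexdigest()
    return plaintext, digest
```

Storing only the SHA-256 digest is what makes "plaintext returned once"
necessary: the service cannot recover the key later.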

Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator.  Distinguishes JWTs
  (three-segment dotted) from API keys by shape; verifies JWTs
  locally using the cached IAM public key; resolves API keys via
  IAM with a short-TTL hash-keyed cache.  Every failure path
  surfaces the same 401 body ("auth failure") so callers cannot
  enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
  traffic does not begin flowing until auth has started.
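
A rough sketch of the shape-based dispatch and the hash-keyed TTL cache
described above. The names `classify_bearer` and `HashKeyedTtlCache` are
hypothetical and the real IamAuth may be structured differently; the
`tg_` prefix and the three-segment JWT shape are taken from this commit.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


def classify_bearer(header: Optional[str]) -> Tuple[str, str]:
    """Return (kind, token); kind is 'jwt', 'api-key' or 'invalid'."""
    if not header or not header.startswith("Bearer "):
        return "invalid", ""
    token = header[len("Bearer "):].strip()
    if not token:
        return "invalid", ""
    if token.startswith("tg_"):
        return "api-key", token
    parts = token.split(".")
    if len(parts) == 3 and all(parts):
        return "jwt", token  # three non-empty dotted segments
    return "invalid", token


class HashKeyedTtlCache:
    """Cache keyed by SHA-256 of the credential, so plaintext is not held as a key."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    @staticmethod
    def _key(plaintext: str) -> str:
        return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()

    def get(self, plaintext: str):
        key = self._key(plaintext)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires, value = entry
        if self._clock() >= expires:
            del self._entries[key]  # expired: force a fresh IAM lookup
            return None
        return value

    def put(self, plaintext: str, value) -> None:
        self._entries[self._key(plaintext)] = (self._clock() + self._ttl, value)
```

In this sketch every 'invalid' result would map to the same 401 body, so a
caller cannot tell a malformed token from an unknown one.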

Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
  OSS ships reader / writer / admin; the first two are workspace-
  assigned, admin is cross-workspace ("*").  No "cross-workspace"
  pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
  authorisation test: some role must grant the capability *and* be
  active in the target workspace.
* enforce_workspace validates a request-body workspace against the
  caller's role scopes and injects the resolved value.  Cross-
  workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
  permissive default.  Construction fails fast if omitted.  Enterprise
  editions can replace the role table without changing the wire
  protocol.
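
One way to read the two-dimensional role model is the sketch below. The
capability names ("graph:read" and so on), the `Role` dataclass and the
rule that a workspace-assigned role is active only in the identity's own
workspace are assumptions for illustration; the real vocabulary lives in
docs/tech-specs/capabilities.md. The `Identity` fields mirror those used
in the tests.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Role:
    capabilities: FrozenSet[str]
    cross_workspace: bool  # True models the "*" workspace scope


# Hypothetical role table; capability names are placeholders.
ROLES: Dict[str, Role] = {
    "reader": Role(frozenset({"graph:read"}), False),
    "writer": Role(frozenset({"graph:read", "graph:write"}), False),
    "admin": Role(frozenset({"graph:read", "graph:write", "iam:admin"}), True),
}


@dataclass
class Identity:
    user_id: str
    workspace: str
    roles: List[str]
    source: str = "jwt"


def check(identity: Identity, capability: str,
          target_workspace: Optional[str] = None) -> bool:
    """True if some role grants the capability and is active in the target."""
    target = target_workspace or identity.workspace
    for name in identity.roles:
        role = ROLES.get(name)
        if role is None:
            continue  # unknown role: fail closed
        if capability not in role.capabilities:
            continue
        if role.cross_workspace or target == identity.workspace:
            return True
    return False


def enforce_workspace(identity: Identity, body: dict, capability: str) -> dict:
    """Validate the requested workspace and return the body with it resolved."""
    requested = body.get("workspace") or identity.workspace
    if not check(identity, capability, requested):
        raise PermissionError("access denied")
    return {**body, "workspace": requested}
```

Note that the admin case needs no special branch in `check`: the role's
scope alone decides, which matches the "no bypass" wording above.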

WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
  runs on the first WebSocket frame ({"type":"auth","token":"..."})
  with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
  The socket stays open on failure so the client can re-authenticate
  — browsers treat a handshake-time 401 as terminal, breaking
  reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
  enforces the caller's workspace (envelope + inner payload) using
  the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
  handshake (URL-scoped short-lived transfers; no re-auth need).
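
The first-frame protocol can be pictured as a small state machine. This is
a transport-free sketch, not Mux.receive itself: the class name, the
`authenticate` callback, the error-frame shape for pre-auth traffic, and
the choice to drop the previous identity on a failed re-auth are all
assumptions. Only the auth, auth-ok and auth-failed frames come from this
commit.

```python
import json
from typing import Callable, Optional


class FirstFrameAuthSession:
    """Tracks auth state for one socket; `authenticate` maps token -> workspace or None."""

    def __init__(self, authenticate: Callable[[str], Optional[str]]):
        self._authenticate = authenticate
        self.workspace: Optional[str] = None

    def handle(self, raw: str) -> dict:
        try:
            frame = json.loads(raw)
        except ValueError:
            frame = None
        if not isinstance(frame, dict):
            # Hypothetical error frame; the real shape is not shown in the commit.
            return {"type": "error", "error": "malformed frame"}

        if frame.get("type") == "auth":
            token = frame.get("token")
            workspace = self._authenticate(token) if isinstance(token, str) else None
            if workspace is None:
                self.workspace = None  # assumption: failed re-auth drops identity
                return {"type": "auth-failed"}  # socket stays open for a retry
            self.workspace = workspace
            return {"type": "auth-ok", "workspace": workspace}

        if self.workspace is None:
            # Non-auth frames are rejected until auth succeeds.
            return {"type": "error", "error": "auth required"}

        # Authenticated: the caller's workspace is imposed on the request.
        return {"type": "accepted", "workspace": self.workspace}
```

Because the session object outlives a failed attempt, a client can send a
second auth frame on the same socket, which is the reconnection behaviour
the handshake-time 401 would have prevented.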

Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
  op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
  the IAM API (per-op REST endpoints to follow in a later change).

Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
  Authenticator.permitted contract.  The gateway cannot run without
  IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
  downgrade path.

CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces.  Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
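
The stdout/stderr split can be illustrated as follows. `emit_secret` is a
hypothetical helper, not a function from trustgraph-cli; the point is only
that the secret is the sole thing written to stdout.

```python
import io
import sys


def emit_secret(secret: str, context: str, out=None, err=None) -> None:
    """Write the secret to stdout and human-facing context to stderr."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    print(context, file=err)   # operator context, not captured by $(...)
    print(secret, file=out)    # the only bytes a pipeline would see


# Demonstration with in-memory streams instead of the real terminal.
out, err = io.StringIO(), io.StringIO()
emit_secret("tg_example", "API key created; it will not be shown again.",
            out=out, err=err)
```

With this split, a shell command substitution around such a tool would
capture just the token while the operator still sees the explanation.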

Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
  resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
  operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
  role bundles, agent-as-composition note, enforcement-boundary
  policy, enterprise extensibility.

Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
  Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
  role x workspace combinations, enforce_workspace paths,
  unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
  explicitly (no permissive defaults relied upon).  New tests pin
  the fail-closed invariants: DispatcherManager / Mux refuse
  auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
cybermaggedon 2026-04-24 17:29:10 +01:00 committed by GitHub
parent ae9936c9cc
commit 67b2fc448f
61 changed files with 6474 additions and 792 deletions

@@ -1,69 +1,312 @@
 """
-Tests for Gateway Authentication
+Tests for gateway/auth.py IamAuth, JWT verification, API key
+resolution cache.
+JWTs are signed with real Ed25519 keypairs generated per-test, so
+the crypto path is exercised end-to-end without mocks. API-key
+resolution is tested against a stubbed IamClient since the real
+one requires pub/sub.
 """
+import base64
+import json
+import time
+from unittest.mock import AsyncMock, Mock, patch
 import pytest
+from aiohttp import web
+from cryptography.hazmat.primitives import serialization
+from cryptography.hazmat.primitives.asymmetric import ed25519
-from trustgraph.gateway.auth import Authenticator
+from trustgraph.gateway.auth import (
+    IamAuth, Identity,
+    _b64url_decode, _verify_jwt_eddsa,
+    API_KEY_CACHE_TTL,
+)
-class TestAuthenticator:
-    """Test cases for Authenticator class"""
+# -- helpers ---------------------------------------------------------------
-    def test_authenticator_initialization_with_token(self):
-        """Test Authenticator initialization with valid token"""
-        auth = Authenticator(token="test-token-123")
-        assert auth.token == "test-token-123"
-        assert auth.allow_all is False
-    def test_authenticator_initialization_with_allow_all(self):
-        """Test Authenticator initialization with allow_all=True"""
-        auth = Authenticator(allow_all=True)
-        assert auth.token is None
-        assert auth.allow_all is True
+def _b64url(data: bytes) -> str:
+    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
-    def test_authenticator_initialization_without_token_raises_error(self):
-        """Test Authenticator initialization without token raises RuntimeError"""
-        with pytest.raises(RuntimeError, match="Need a token"):
-            Authenticator()
-    def test_authenticator_initialization_with_empty_token_raises_error(self):
-        """Test Authenticator initialization with empty token raises RuntimeError"""
-        with pytest.raises(RuntimeError, match="Need a token"):
-            Authenticator(token="")
+def make_keypair():
+    priv = ed25519.Ed25519PrivateKey.generate()
+    public_pem = priv.public_key().public_bytes(
+        encoding=serialization.Encoding.PEM,
+        format=serialization.PublicFormat.SubjectPublicKeyInfo,
+    ).decode("ascii")
+    return priv, public_pem
-    def test_permitted_with_allow_all_returns_true(self):
-        """Test permitted method returns True when allow_all is enabled"""
-        auth = Authenticator(allow_all=True)
-        # Should return True regardless of token or roles
-        assert auth.permitted("any-token", []) is True
-        assert auth.permitted("different-token", ["admin"]) is True
-        assert auth.permitted(None, ["user"]) is True
-    def test_permitted_with_matching_token_returns_true(self):
-        """Test permitted method returns True with matching token"""
-        auth = Authenticator(token="secret-token")
-        # Should return True when tokens match
-        assert auth.permitted("secret-token", []) is True
-        assert auth.permitted("secret-token", ["admin", "user"]) is True
+def sign_jwt(priv, claims, alg="EdDSA"):
+    header = {"alg": alg, "typ": "JWT", "kid": "kid-test"}
+    h = _b64url(json.dumps(header, separators=(",", ":"), sort_keys=True).encode())
+    p = _b64url(json.dumps(claims, separators=(",", ":"), sort_keys=True).encode())
+    signing_input = f"{h}.{p}".encode("ascii")
+    if alg == "EdDSA":
+        sig = priv.sign(signing_input)
+    else:
+        raise ValueError(f"test helper doesn't sign {alg}")
+    return f"{h}.{p}.{_b64url(sig)}"
-    def test_permitted_with_non_matching_token_returns_false(self):
-        """Test permitted method returns False with non-matching token"""
-        auth = Authenticator(token="secret-token")
-        # Should return False when tokens don't match
-        assert auth.permitted("wrong-token", []) is False
-        assert auth.permitted("different-token", ["admin"]) is False
-        assert auth.permitted(None, ["user"]) is False
-    def test_permitted_with_token_and_allow_all_returns_true(self):
-        """Test permitted method with both token and allow_all set"""
-        auth = Authenticator(token="test-token", allow_all=True)
-        # allow_all should take precedence
-        assert auth.permitted("any-token", []) is True
-        assert auth.permitted("wrong-token", ["admin"]) is True
+def make_request(auth_header):
+    """Minimal stand-in for an aiohttp request — IamAuth only reads
+    ``request.headers["Authorization"]``."""
+    req = Mock()
+    req.headers = {}
+    if auth_header is not None:
+        req.headers["Authorization"] = auth_header
+    return req
+# -- pure helpers ----------------------------------------------------------
+class TestB64UrlDecode:
+    def test_round_trip_without_padding(self):
+        data = b"hello"
+        encoded = _b64url(data)
+        assert _b64url_decode(encoded) == data
+    def test_handles_various_lengths(self):
+        for s in (b"a", b"ab", b"abc", b"abcd", b"abcde"):
+            assert _b64url_decode(_b64url(s)) == s
+# -- JWT verification -----------------------------------------------------
+class TestVerifyJwtEddsa:
+    def test_valid_jwt_passes(self):
+        priv, pub = make_keypair()
+        claims = {
+            "sub": "user-1", "workspace": "default",
+            "roles": ["reader"],
+            "iat": int(time.time()),
+            "exp": int(time.time()) + 60,
+        }
+        token = sign_jwt(priv, claims)
+        got = _verify_jwt_eddsa(token, pub)
+        assert got["sub"] == "user-1"
+        assert got["workspace"] == "default"
+    def test_expired_jwt_rejected(self):
+        priv, pub = make_keypair()
+        claims = {
+            "sub": "user-1", "workspace": "default", "roles": [],
+            "iat": int(time.time()) - 3600,
+            "exp": int(time.time()) - 1,
+        }
+        token = sign_jwt(priv, claims)
+        with pytest.raises(ValueError, match="expired"):
+            _verify_jwt_eddsa(token, pub)
+    def test_bad_signature_rejected(self):
+        priv_a, _ = make_keypair()
+        _, pub_b = make_keypair()
+        claims = {
+            "sub": "user-1", "workspace": "default", "roles": [],
+            "iat": int(time.time()),
+            "exp": int(time.time()) + 60,
+        }
+        token = sign_jwt(priv_a, claims)
+        # pub_b never signed this token.
+        with pytest.raises(Exception):
+            _verify_jwt_eddsa(token, pub_b)
+    def test_malformed_jwt_rejected(self):
+        _, pub = make_keypair()
+        with pytest.raises(ValueError, match="malformed"):
+            _verify_jwt_eddsa("not-a-jwt", pub)
+    def test_unsupported_algorithm_rejected(self):
+        priv, pub = make_keypair()
+        # Manually build an "alg":"HS256" header — no signer needed
+        # since we expect it to bail before verifying.
+        header = {"alg": "HS256", "typ": "JWT", "kid": "x"}
+        payload = {
+            "sub": "user-1", "workspace": "default", "roles": [],
+            "iat": int(time.time()), "exp": int(time.time()) + 60,
+        }
+        h = _b64url(json.dumps(header, separators=(",", ":")).encode())
+        p = _b64url(json.dumps(payload, separators=(",", ":")).encode())
+        sig = _b64url(b"not-a-real-sig")
+        token = f"{h}.{p}.{sig}"
+        with pytest.raises(ValueError, match="unsupported alg"):
+            _verify_jwt_eddsa(token, pub)
+# -- Identity --------------------------------------------------------------
+class TestIdentity:
+    def test_fields(self):
+        i = Identity(
+            user_id="u", workspace="w", roles=["reader"], source="api-key",
+        )
+        assert i.user_id == "u"
+        assert i.workspace == "w"
+        assert i.roles == ["reader"]
+        assert i.source == "api-key"
+# -- IamAuth.authenticate --------------------------------------------------
+class TestIamAuthDispatch:
+    """``authenticate()`` chooses between the JWT and API-key paths
+    by shape of the bearer."""
+    @pytest.mark.asyncio
+    async def test_no_authorization_header_raises_401(self):
+        auth = IamAuth(backend=Mock())
+        with pytest.raises(web.HTTPUnauthorized):
+            await auth.authenticate(make_request(None))
+    @pytest.mark.asyncio
+    async def test_non_bearer_header_raises_401(self):
+        auth = IamAuth(backend=Mock())
+        with pytest.raises(web.HTTPUnauthorized):
+            await auth.authenticate(make_request("Basic whatever"))
+    @pytest.mark.asyncio
+    async def test_empty_bearer_raises_401(self):
+        auth = IamAuth(backend=Mock())
+        with pytest.raises(web.HTTPUnauthorized):
+            await auth.authenticate(make_request("Bearer "))
+    @pytest.mark.asyncio
+    async def test_unknown_format_raises_401(self):
+        # Not tg_... and not dotted-JWT shape.
+        auth = IamAuth(backend=Mock())
+        with pytest.raises(web.HTTPUnauthorized):
+            await auth.authenticate(make_request("Bearer garbage"))
+    @pytest.mark.asyncio
+    async def test_valid_jwt_resolves_to_identity(self):
+        priv, pub = make_keypair()
+        claims = {
+            "sub": "user-1", "workspace": "default",
+            "roles": ["writer"],
+            "iat": int(time.time()),
+            "exp": int(time.time()) + 60,
+        }
+        token = sign_jwt(priv, claims)
+        auth = IamAuth(backend=Mock())
+        auth._signing_public_pem = pub
+        ident = await auth.authenticate(
+            make_request(f"Bearer {token}")
+        )
+        assert ident.user_id == "user-1"
+        assert ident.workspace == "default"
+        assert ident.roles == ["writer"]
+        assert ident.source == "jwt"
+    @pytest.mark.asyncio
+    async def test_jwt_without_public_key_fails(self):
+        # If the gateway hasn't fetched IAM's public key yet, JWTs
+        # must not validate — even ones that would otherwise pass.
+        priv, _ = make_keypair()
+        claims = {
+            "sub": "user-1", "workspace": "default", "roles": [],
+            "iat": int(time.time()), "exp": int(time.time()) + 60,
+        }
+        token = sign_jwt(priv, claims)
+        auth = IamAuth(backend=Mock())
+        # _signing_public_pem defaults to None
+        with pytest.raises(web.HTTPUnauthorized):
+            await auth.authenticate(make_request(f"Bearer {token}"))
+    @pytest.mark.asyncio
+    async def test_api_key_path(self):
+        auth = IamAuth(backend=Mock())
+        async def fake_resolve(api_key):
+            assert api_key == "tg_testkey"
+            return ("user-xyz", "default", ["admin"])
+        async def fake_with_client(op):
+            return await op(Mock(resolve_api_key=fake_resolve))
+        with patch.object(auth, "_with_client", side_effect=fake_with_client):
+            ident = await auth.authenticate(
+                make_request("Bearer tg_testkey")
+            )
+        assert ident.user_id == "user-xyz"
+        assert ident.workspace == "default"
+        assert ident.roles == ["admin"]
+        assert ident.source == "api-key"
+    @pytest.mark.asyncio
+    async def test_api_key_rejection_masked_as_401(self):
+        auth = IamAuth(backend=Mock())
+        async def fake_with_client(op):
+            raise RuntimeError("auth-failed: unknown api key")
+        with patch.object(auth, "_with_client", side_effect=fake_with_client):
+            with pytest.raises(web.HTTPUnauthorized):
+                await auth.authenticate(
+                    make_request("Bearer tg_bogus")
+                )
+# -- API key cache ---------------------------------------------------------
+class TestApiKeyCache:
+    @pytest.mark.asyncio
+    async def test_cache_hit_skips_iam(self):
+        auth = IamAuth(backend=Mock())
+        calls = {"n": 0}
+        async def fake_with_client(op):
+            calls["n"] += 1
+            return await op(Mock(
+                resolve_api_key=AsyncMock(
+                    return_value=("u", "default", ["reader"]),
+                )
+            ))
+        with patch.object(auth, "_with_client", side_effect=fake_with_client):
+            await auth.authenticate(make_request("Bearer tg_k1"))
+            await auth.authenticate(make_request("Bearer tg_k1"))
+            await auth.authenticate(make_request("Bearer tg_k1"))
+        # Only the first lookup reaches IAM; the rest are cache hits.
+        assert calls["n"] == 1
+    @pytest.mark.asyncio
+    async def test_different_keys_are_separately_cached(self):
+        auth = IamAuth(backend=Mock())
+        seen = []
+        async def fake_with_client(op):
+            async def resolve(plaintext):
+                seen.append(plaintext)
+                return ("u-" + plaintext, "default", ["reader"])
+            return await op(Mock(resolve_api_key=resolve))
+        with patch.object(auth, "_with_client", side_effect=fake_with_client):
+            a = await auth.authenticate(make_request("Bearer tg_a"))
+            b = await auth.authenticate(make_request("Bearer tg_b"))
+        assert a.user_id == "u-tg_a"
+        assert b.user_id == "u-tg_b"
+        assert seen == ["tg_a", "tg_b"]
+    @pytest.mark.asyncio
+    async def test_cache_has_ttl_constant_set(self):
+        # Not a behaviour test — just ensures we don't accidentally
+        # set TTL to 0 (which would defeat the cache) or to a week.
+        assert 10 <= API_KEY_CACHE_TTL <= 3600