trustgraph/iam-testing.txt
cybermaggedon 67b2fc448f
feat: IAM service, gateway auth middleware, capability model, and CLIs (#849)
Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed
identity and authorisation model.  The gateway no longer has an
"allow-all" or "no auth" mode; every request is authenticated via the
IAM service, authorised against a capability model that encodes both
the operation and the workspace it targets, and rejected with a
deliberately-uninformative 401 / 403 on any failure.

IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam)
-----------------------------------------------------------------------
* New backend service (iam-svc) owning users, workspaces, API keys,
  passwords and JWT signing keys in Cassandra.  Reached over the
  standard pub/sub request/response pattern; gateway is the only
  caller.
* Operations: bootstrap, resolve-api-key, login, get-signing-key-public,
  rotate-signing-key, create/list/get/update/disable/delete/enable-user,
  change-password, reset-password, create/list/get/update/disable-
  workspace, create/list/revoke-api-key.
* Ed25519 JWT signing (alg=EdDSA).  Key rotation writes a new kid and
  retires the previous one; validation is grace-period friendly.
* Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt.
* API keys: 128-bit random, SHA-256 hashed.  Plaintext returned once.
* Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a
  required startup argument with no permissive default.  Masked
  "auth failure" errors hide whether a refused bootstrap request was
  due to mode, state, or authorisation.
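
The password and API-key parameters listed above can be sketched with the Python standard library. This is a minimal illustration, not the iam-svc code: the salt length, the "tg_" key layout, and the function names are assumptions; only PBKDF2-HMAC-SHA-256, the 600k iteration count, the 128-bit key size, and SHA-256 hashing come from the notes.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

PBKDF2_ITERATIONS = 600_000  # figure quoted in the commit notes
SALT_BYTES = 16              # assumption: the salt length is not stated


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = PBKDF2_ITERATIONS) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA-256."""
    if salt is None:
        salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per user
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = PBKDF2_ITERATIONS) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt, iterations)
    return hmac.compare_digest(derived, expected)


def new_api_key() -> Tuple[str, str]:
    """Return (plaintext, sha256_hex); only the hash would be stored."""
    # 16 random bytes = 128 bits; the "tg_" prefix layout is an assumption.
    plaintext = "tg_" + secrets.token_urlsafe(16)
    digest = hashlib.sha256(plaintext.encode("ascii")).hexdigest()
    return plaintext, digest
```

Storing only the digest is what makes "plaintext returned once" hold: a lost key cannot be recovered, only revoked and reissued.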

Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py)
-------------------------------------------------------------------
* IamAuth replaces the legacy Authenticator.  Distinguishes JWTs
  (three-segment dotted) from API keys by shape; verifies JWTs
  locally using the cached IAM public key; resolves API keys via
  IAM with a short-TTL hash-keyed cache.  Every failure path
  surfaces the same 401 body ("auth failure") so callers cannot
  enumerate credential state.
* Public key is fetched at gateway startup with a bounded retry loop;
  traffic does not begin flowing until auth has started.
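
The shape test and the hash-keyed cache described above might look roughly like this. It is a sketch under assumptions: the class and function names, the 30-second TTL, and the injected clock are illustrative and are not taken from auth.py.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


def token_kind(token: str) -> str:
    """Classify a bearer token by shape only: 'jwt' or 'api-key'."""
    parts = token.split(".")
    # A JWT has exactly three non-empty dot-separated segments.
    if len(parts) == 3 and all(parts):
        return "jwt"
    return "api-key"


class ApiKeyCache:
    """Short-TTL cache keyed by the SHA-256 of the API key, so the
    plaintext key is never held as a dictionary key."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds      # assumption: actual TTL is not stated
        self.clock = clock          # injectable for testing
        self._entries: Dict[str, Tuple[float, dict]] = {}

    @staticmethod
    def _key(api_key: str) -> str:
        return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

    def get(self, api_key: str) -> Optional[dict]:
        k = self._key(api_key)
        entry = self._entries.get(k)
        if entry is None:
            return None
        expires, identity = entry
        if self.clock() >= expires:
            del self._entries[k]    # expired: force a fresh IAM lookup
            return None
        return identity

    def put(self, api_key: str, identity: dict) -> None:
        self._entries[self._key(api_key)] = (self.clock() + self.ttl, identity)
```

A short TTL bounds how long a revoked key can keep working from cache, which is the trade-off against calling IAM on every request.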

Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py)
---------------------------------------------------------------------
* Roles have two dimensions: a capability set and a workspace scope.
  OSS ships reader / writer / admin; the first two are workspace-
  assigned, admin is cross-workspace ("*").  No "cross-workspace"
  pseudo-capability — workspace permission is a property of the role.
* check(identity, capability, target_workspace=None) is the single
  authorisation test: some role must grant the capability *and* be
  active in the target workspace.
* enforce_workspace validates a request-body workspace against the
  caller's role scopes and injects the resolved value.  Cross-
  workspace admin is permitted by role scope, not by a bypass.
* Gateway endpoints declare a required capability explicitly — no
  permissive default.  Construction fails fast if omitted.  Enterprise
  editions can replace the role table without changing the wire
  protocol.
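
To make the two-dimensional role model concrete, here is a small sketch of a check() with the signature given above. The capability names, the Identity shape, and the treatment of target_workspace=None (capability test only) are assumptions made for illustration; capabilities.py may differ on all three.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

ASSIGNED = "assigned"  # role is active only in the identity's own workspace
ANY = "*"              # role is active in every workspace


@dataclass(frozen=True)
class Role:
    capabilities: FrozenSet[str]
    scope: str


# Hypothetical capability names; the real vocabulary lives in the spec.
ROLES = {
    "reader": Role(frozenset({"graph:read"}), ASSIGNED),
    "writer": Role(frozenset({"graph:read", "graph:write"}), ASSIGNED),
    "admin": Role(frozenset({"graph:read", "graph:write", "users:admin"}), ANY),
}


@dataclass(frozen=True)
class Identity:
    user_id: str
    workspace: str
    roles: Tuple[str, ...]


def check(identity: Identity, capability: str,
          target_workspace: Optional[str] = None) -> bool:
    """True if some role grants the capability AND is active in the target."""
    for name in identity.roles:
        role = ROLES.get(name)
        if role is None:
            continue  # unknown role grants nothing (fail closed)
        if capability not in role.capabilities:
            continue  # unknown capability is never granted (fail closed)
        if (target_workspace is None          # assumption: no target to test
                or role.scope == ANY
                or target_workspace == identity.workspace):
            return True
    return False
```

Because admin reaches other workspaces through its "*" scope, there is no separate bypass branch to audit.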

WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py)
----------------------------------------------------------------
* /api/v1/socket handshake unconditionally accepts; authentication
  runs on the first WebSocket frame ({"type":"auth","token":"..."})
  with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}.
  The socket stays open on failure so the client can re-authenticate
  — browsers treat a handshake-time 401 as terminal, breaking
  reconnection.
* Mux.receive rejects every non-auth frame before auth succeeds,
  enforces the caller's workspace (envelope + inner payload) using
  the role-scope resolver, and supports mid-session re-auth.
* Flow import/export streaming endpoints keep the legacy ?token=
  handshake (URL-scoped short-lived transfers; no re-auth need).
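
The first-frame protocol above can be modelled as a small state machine. This sketch is not the Mux implementation: the verify callback, the "dispatch" return value, the reply sent to a non-auth frame before authentication, and the decision that a failed re-auth clears the previous identity are all assumptions.

```python
from typing import Callable, Optional


class FirstFrameAuth:
    """Tracks per-socket auth state; the socket is never closed here."""

    def __init__(self, verify: Callable[[str], Optional[str]]):
        # verify(token) returns the caller's workspace, or None on failure.
        self.verify = verify
        self.workspace: Optional[str] = None

    def handle(self, frame: dict) -> dict:
        if frame.get("type") == "auth":
            workspace = self.verify(frame.get("token", ""))
            if workspace is None:
                # Assumption: a failed (re-)auth drops any earlier identity.
                self.workspace = None
                return {"type": "auth-failed"}
            self.workspace = workspace  # also covers mid-session re-auth
            return {"type": "auth-ok", "workspace": workspace}
        if self.workspace is None:
            # Non-auth frame before auth succeeds: reject, keep socket open.
            return {"type": "auth-failed"}
        # Hypothetical stand-in for handing the frame to the dispatcher,
        # with the authenticated workspace enforced on the request.
        return {"type": "dispatch", "workspace": self.workspace,
                "request": frame}
```

Keeping the socket open after "auth-failed" lets a browser client retry with a fresh token on the same connection.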

Auth surface
------------
* POST /api/v1/auth/login — public, returns a JWT.
* POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap
  op which itself enforces mode + tables-empty.
* POST /api/v1/auth/change-password — any authenticated user.
* POST /api/v1/iam — admin-only generic forwarder for the rest of
  the IAM API (per-op REST endpoints to follow in a later change).

Removed / breaking
------------------
* GATEWAY_SECRET / --api-token / default_api_token and the legacy
  Authenticator.permitted contract.  The gateway cannot run without
  IAM.
* ?token= on /api/v1/socket.
* DispatcherManager and Mux both raise on auth=None — no silent
  downgrade path.

CLI tools (trustgraph-cli)
--------------------------
tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users,
tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password,
tg-reset-password, tg-create-api-key, tg-list-api-keys,
tg-revoke-api-key, tg-create-workspace, tg-list-workspaces.  Passwords
read via getpass; tokens / one-time secrets written to stdout with
operator context on stderr so shell composition works cleanly.
AsyncSocketClient / SocketClient updated to the first-frame auth
protocol.
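
The stdout/stderr split described above is what makes TOKEN=$(tg-...) capture only the secret. A minimal sketch of the convention follows; emit_secret and the sample strings are hypothetical and are not taken from the CLI sources.

```python
import io
import sys
from typing import Optional, TextIO


def emit_secret(secret: str, context: str,
                out: Optional[TextIO] = None,
                err: Optional[TextIO] = None) -> None:
    """Write the secret alone to stdout and operator context to stderr."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    print(context, file=err)   # human-readable, not captured by $(...)
    print(secret, file=out)    # the only thing a shell substitution sees


# Demonstration with in-memory streams instead of the real terminal.
out_buf, err_buf = io.StringIO(), io.StringIO()
emit_secret("tg_example", "created API key 'alice-laptop'",
            out=out_buf, err=err_buf)
```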

Specifications
--------------
* docs/tech-specs/iam.md updated with the error policy, workspace
  resolver extension point, and OSS role-scope model.
* docs/tech-specs/iam-protocol.md (new) — transport, dataclasses,
  operation table, error taxonomy, bootstrap modes.
* docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS
  role bundles, agent-as-composition note, enforcement-boundary
  policy, enterprise extensibility.

Tests
-----
* test_auth.py (rewritten) — IamAuth + JWT round-trip with real
  Ed25519 keypairs + API-key cache behaviour.
* test_capabilities.py (new) — role table sanity, check across
  role x workspace combinations, enforce_workspace paths,
  unknown-cap / unknown-role fail-closed.
* Every endpoint test construction now names its capability
  explicitly (no permissive defaults relied upon).  New tests pin
  the fail-closed invariants: DispatcherManager / Mux refuse
  auth=None; i18n path-traversal defense is exercised.
* test_socket_graceful_shutdown rewritten against IamAuth.
2026-04-24 17:29:10 +01:00


curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation": "bootstrap"}'
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation": "resolve-api-key", "api_key": "tg_r-n43hDWV9WOY06w6o5YpevAxirlS33D"}'
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation": "resolve-api-key", "api_key": "asdalsdjasdkasdasda"}'
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"list-users","workspace":"default"}'
# 1. Admin creates a writer user "alice"
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{
"operation": "create-user",
"workspace": "default",
"user": {
"username": "alice",
"name": "Alice",
"email": "alice@example.com",
"password": "changeme",
"roles": ["writer"]
}
}'
# expect: {"user": {"id": "<alice-uuid>", ...}} — grab alice's uuid
# 2. Issue alice an API key
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{
"operation": "create-api-key",
"workspace": "default",
"key": {
"user_id": "f2363a10-3b83-44ea-a008-43caae8ba607",
"name": "alice-laptop"
}
}'
# expect: {"api_key_plaintext": "tg_...", "api_key": {"id": "<key-uuid>", "prefix": "tg_xxxx", ...}}
# 3. Resolve alice's key — should return alice's id + workspace + writer role
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"resolve-api-key","api_key":"tg_gt4buvk5NG-QS7oP_0Gk5yTWyj1qensf"}'
# expect: {"resolved_user_id":"<alice-uuid>","resolved_workspace":"default","resolved_roles":["writer"]}
# 4. List alice's keys (admin view of alice's keys)
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"list-api-keys","workspace":"default","user_id":"f2363a10-3b83-44ea-a008-43caae8ba607"}'
# expect: {"api_keys": [{"id":"<key-uuid>","user_id":"<alice-uuid>","name":"alice-laptop","prefix":"tg_xxxx",...}]}
# 5. Revoke alice's key
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"revoke-api-key","workspace":"default","key_id":"55f1c1f7-5448-49fd-9eda-56c192b61177"}'
# expect: {} (empty, no error)
# 6. Confirm the revoked key no longer resolves
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"resolve-api-key","api_key":"tg_gt4buvk5NG-QS7oP_0Gk5yTWyj1qensf"}'
# expect: {"error":{"type":"auth-failed","message":"unknown api key"}}
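
Steps 1 to 6 above hard-code ids copied from earlier responses. The sketch below chains the same operations and carries the ids forward automatically. The field names come from the "expect" comments; the function names and the injectable post callable are illustrative, and it assumes the endpoint is reachable without an Authorization header, as in the curl commands above.

```python
import json
import urllib.request
from typing import Callable

IAM_URL = "http://localhost:8088/api/v1/iam"


def post_iam(body: dict) -> dict:
    """POST one IAM operation and decode the JSON reply."""
    req = urllib.request.Request(
        IAM_URL, data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def api_key_lifecycle(post: Callable[[dict], dict] = post_iam) -> dict:
    """Create a user, issue a key, resolve it, revoke it, resolve again."""
    user = post({"operation": "create-user", "workspace": "default",
                 "user": {"username": "alice", "password": "changeme",
                          "roles": ["writer"]}})["user"]
    created = post({"operation": "create-api-key", "workspace": "default",
                    "key": {"user_id": user["id"], "name": "alice-laptop"}})
    plaintext = created["api_key_plaintext"]   # shown once, keep it
    key_id = created["api_key"]["id"]
    resolved = post({"operation": "resolve-api-key", "api_key": plaintext})
    post({"operation": "revoke-api-key", "workspace": "default",
          "key_id": key_id})
    after = post({"operation": "resolve-api-key", "api_key": plaintext})
    return {
        "user_id": user["id"],
        "resolved_user_id": resolved.get("resolved_user_id"),
        "rejected_after_revoke": "error" in after,
    }
```

Passing a different post callable makes it possible to exercise the sequence against a stub before pointing it at a live gateway.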
----------------------------------------------------------------------------
You'll want to re-bootstrap a fresh deployment to pick up the new
signing-key row (or accept that login will lazily generate one on
first call). Then:
# 1. Create a user with a known password (admin's password is random)
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"create-user","workspace":"default","user":{"username":"alice","password":"s3cret","roles":["writer"]}}'
# 2. Log alice in
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"login","username":"alice","password":"s3cret"}'
# expect: {"jwt":"eyJ...","jwt_expires":"2026-..."}
# 3. Fetch the public key (what the gateway will use later to verify)
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"get-signing-key-public"}'
# expect: {"signing_key_public":"-----BEGIN PUBLIC KEY-----\n..."}
# 4. Wrong password
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"login","username":"alice","password":"nope"}'
# expect: {"error":{"type":"auth-failed","message":"bad credentials"}}
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAseLB/a9Bo/RN/Rb/x763
+vdxmUKG75oWsXBmbwZGDXyN6fwqZ3L7cEje93qK0PYFuCHxhY1Hn0gW7FZ8ovH+
qEksekUlpfPYqKGiT5Mb0DKk49D4yKkIbJFugWalpwIilvRbQO0jy3V8knqGQ1xL
NfNYFrI2Rxe0Tq2OHVYc5YwYbyj1nz2TY5fd9qrzXtGRv5HZztkl25lWhRvG9G0K
urKDdBDbi894gIYorXvcwZw/b1GDXG/aUy/By1Oy3hXnCLsN8pA3nA437TTTWxHx
QgPH15jIF9hezO+3/ESZ7EhVEtgmwTxPddfXRa0ZoT6JyWOgcloKtnP4Lp9eQ4va
yQIDAQAB
-----END PUBLIC KEY-----
New operations:
- change-password: self-service. Requires current + new password.
- reset-password: admin-driven. Generates a random temporary password,
  sets must_change_password=true, returns the plaintext once.
- get-user, update-user, disable-user: workspace-scoped. update-user
  refuses to change username (immutable; error if different) and refuses
  password-via-update. disable-user also revokes all the user's API keys,
  per spec.
- create-workspace, list-workspaces, get-workspace, update-workspace,
  disable-workspace: system-level. disable-workspace cascades: disables
  all users + revokes all their keys. Workspace ids starting with _ are
  rejected (reserved, per the bootstrap framework convention).
- rotate-signing-key: generates a new Ed25519 key, retires the current
  one (sets retired timestamp; row stays for future grace-period
  validation), switches the in-memory cache.
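
The cascade and refusal rules in the list above can be illustrated with an in-memory model. This is only a sketch: the dataclasses, function names, and error messages are hypothetical, and the real handlers operate on Cassandra tables rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ApiKey:
    id: str
    revoked: bool = False


@dataclass
class User:
    id: str
    username: str
    workspace: str
    name: str = ""
    email: str = ""
    enabled: bool = True
    keys: List[ApiKey] = field(default_factory=list)


def validate_workspace_id(workspace_id: str) -> None:
    """Ids starting with '_' are reserved and must be rejected."""
    if workspace_id.startswith("_"):
        raise ValueError("workspace ids starting with '_' are reserved")


def disable_user(user: User) -> None:
    """Disabling a user also revokes every API key the user holds."""
    user.enabled = False
    for key in user.keys:
        key.revoked = True


def disable_workspace(workspace_id: str, users: Dict[str, User]) -> int:
    """Cascade: disable each user in the workspace; return how many."""
    count = 0
    for user in users.values():
        if user.workspace == workspace_id:
            disable_user(user)
            count += 1
    return count


def update_user(user: User, changes: dict) -> None:
    """Profile-only update: username is immutable, password is refused."""
    if "password" in changes:
        raise ValueError("password cannot be set via update-user")
    if changes.get("username", user.username) != user.username:
        raise ValueError("username is immutable")
    user.name = changes.get("name", user.name)
    user.email = changes.get("email", user.email)
```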
Touched files:
- trustgraph-flow/trustgraph/tables/iam.py: added retire_signing_key,
  update_user_profile, update_user_password, update_user_enabled,
  update_workspace.
- trustgraph-flow/trustgraph/iam/service/iam.py: 12 new handlers +
  dispatch entries.
- trustgraph-base/trustgraph/base/iam_client.py: matching client helpers
  for all of them.
Smoke-test suggestions:
# change password for alice (from "s3cret" → "n3wer")
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"change-password","user_id":"b2960feb-caef-401d-af65-01bdb6960cad","password":"s3cret","new_password":"n3wer"}'
# login with new password
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"login","username":"alice","password":"n3wer"}'
# admin resets alice's password
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"reset-password","workspace":"default","user_id":"b2960feb-caef-401d-af65-01bdb6960cad"}'
# → {"temporary_password":"..."}
# login with the temporary password returned by reset-password
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"login","username":"alice","password":"fH2ttyrIcVXCIkH_"}'
# create a second workspace
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"create-workspace","workspace_record":{"id":"acme","name":"Acme Corp","enabled":true}}'
# rotate signing key (next login produces a JWT signed by a new kid)
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-d '{"operation":"rotate-signing-key"}'
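
One way to confirm the rotation took effect is to compare the kid of a JWT issued before the rotation with one issued after. The sketch below only decodes the unverified JWT header for inspection; it performs no signature check. It assumes the kid is carried in the standard JWT header field, which the notes imply but do not state.

```python
import base64
import json


def jwt_header(token: str) -> dict:
    """Decode the (unverified) header segment of a JWT."""
    header_b64 = token.split(".")[0]
    # base64url segments in JWTs omit padding; restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def signed_by_new_key(before: str, after: str) -> bool:
    """True if the two tokens carry different key ids."""
    return jwt_header(before).get("kid") != jwt_header(after).get("kid")
```

Log in once before rotate-signing-key and once after, then pass both "jwt" values to signed_by_new_key.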
curl -s -X POST "http://localhost:8088/api/v1/flow" \
-H "Authorization: Bearer tg_bs_kBAhfejiEJmbcO1gElbxk3MpV7wQFygP" \
-H "Content-Type: application/json" \
-d '{"operation":"list-flows"}'
curl -s -X POST "http://localhost:8088/api/v1/iam" \
-H "Authorization: Bearer tg_bs_kBAhfejiEJmbcO1gElbxk3MpV7wQFygP" \
-H "Content-Type: application/json" \
-d '{"operation":"list-users"}'
curl -s -X POST http://localhost:8088/api/v1/iam \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tg_bs_kBAhfejiEJmbcO1gElbxk3MpV7wQFygP" \
-d '{
"operation": "create-user",
"workspace": "default",
"user": {
"username": "alice",
"name": "Alice",
"email": "alice@example.com",
"password": "s3cret",
"roles": ["writer"]
}
}'
# Login (public, no token needed) → returns a JWT
curl -s -X POST "http://localhost:8088/api/v1/auth/login" \
-H "Content-Type: application/json" \
-d '{"username":"alice","password":"s3cret"}'
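
The same login-then-call pattern can be scripted. This sketch builds the two requests with urllib: a public login with no Authorization header, then a Bearer-authenticated call. The helper names are illustrative, and the "jwt" response field is assumed to match the IAM login reply shown earlier in these notes.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8088"


def login_request(username: str, password: str) -> urllib.request.Request:
    """Public endpoint: no Authorization header is attached."""
    body = json.dumps({"username": username, "password": password})
    return urllib.request.Request(
        GATEWAY + "/api/v1/auth/login", data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")


def bearer_request(path: str, token: str,
                   payload: dict) -> urllib.request.Request:
    """Authenticated endpoint: JWT or API key goes in a Bearer header."""
    return urllib.request.Request(
        GATEWAY + path, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + token},
        method="POST")


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON reply."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage against a running gateway (field name "jwt" is an assumption):
#   jwt = send(login_request("alice", "s3cret"))["jwt"]
#   flows = send(bearer_request("/api/v1/flow", jwt,
#                               {"operation": "list-flows"}))
```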
export TRUSTGRAPH_TOKEN=$(tg-bootstrap-iam) # on fresh bootstrap-mode deployment
# or set to your existing admin API key
tg-create-user --username alice --roles writer
# → prints alice's user id
ALICE_ID="<uuid from above>"   # quoted so the shell does not parse < > as redirection
ALICE_KEY=$(tg-create-api-key --user-id $ALICE_ID --name alice-laptop)
# → alice's plaintext API key
tg-list-users
tg-list-api-keys --user-id $ALICE_ID
tg-revoke-api-key --key-id <...>
tg-disable-user --user-id $ALICE_ID
# User self-service:
tg-login --username alice # prompts for password, prints JWT
tg-change-password # prompts for current + new