mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-04-30 19:06:21 +02:00

iam: self-service ops, optional workspace filters, Mux service routing (#855)

Three threads, all reinforcing the contract's system-level vs.
workspace-association distinction.

WS Mux service routing
- tg-show-flows (and any workspace-level service over the WS) was
  failing with "unknown service" because the post-refactor Mux
  unconditionally looked up flow-service:<kind>.  Now branches on
  the envelope's flow field: with flow → flow-service:<kind>;
  without flow → <kind>:<op> from the inner body; with bare op
  lookup for service=iam.  Resource and parameters come from the
  matched op's own extractors — same path the HTTP endpoints take.

Optional workspace on system-level user/key ops
- list-users returns the deployment-wide list when no workspace is
  supplied, filters when one is.  get-user, update-user,
  disable-user, enable-user, delete-user, reset-password,
  create-api-key, list-api-keys, revoke-api-key all treat workspace
  as an optional integrity check rather than a required argument.
- create-user keeps workspace required — there it's the new user's
  home-workspace binding, a parameter rather than an address.
- API keys reclassified as SYSTEM-level resources.  By the same
  reasoning that makes users system-level, an API key is a
  credential record on a deployment-wide registry; the workspace it
  authenticates to is a property, not a containment.

Self-service surface
- whoami: returns the caller's own user record.  AUTHENTICATED-only;
  no users:read capability required.  Foundation for UI affordances
  that depend on the caller's permissions.
- bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC,
  side-effect-free.  Returns {bootstrap_available: bool} so a
  first-run UI can decide whether to render setup without consuming
  the bootstrap op.
- Gateway now injects actor=identity.handle on every authenticated
  forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting
  any caller-supplied value.  Underpins whoami, audit logging, and
  future regime-side decisions that need actor identity.
- tg-whoami and tg-update-user CLIs.

Spec polish
- iam-contract.md: actor-injection rule documented; whoami /
  bootstrap-status added to operations list; permission-scope
  framing tightened (workspace scope is a property of the grant,
  not the user or role).
- iam.md: self-service section; gateway flow gains the actor-
  injection step; role section reframed so iam-svc constraints
  don't leak into contract-level prose.
- iam-protocol.md: ops table updated for whoami, bootstrap-status,
  optional-workspace pattern; bootstrap_available added to the
  IamResponse listing.

2026-04-28 22:13:12 +01:00


layout	title	parent
default	IAM Service Protocol Technical Specification	Tech Specs

IAM Service Protocol Technical Specification

Overview

This document specifies the wire protocol of the open-source IAM regime — one implementation of the abstract IAM contract defined in iam-contract.md. Other regimes (OIDC / SSO, ABAC, ReBAC, external policy engines) implement the same contract with different transports, data models, and policy semantics; the gateway is unaware of which regime it's wired against.

The OSS regime is a backend processor (iam-svc) reached over the standard request/response pub/sub pattern. It owns users, workspaces, API keys, login credentials, and JWT signing keys, all backed by Cassandra. The API gateway is its only caller.

This document defines:

- the IamRequest and IamResponse dataclasses on the bus,
- the operation set the OSS regime implements,
- per-operation input and output fields,
- the error taxonomy,
- the bootstrap modes,
- the initial HTTP forwarding endpoint used while the protocol is being exercised.

The mapping from this regime onto the abstract contract is direct:

Contract operation	OSS regime operation
`authenticate(credential)`	`resolve-api-key` (for API keys); local JWT validation against `get-signing-key-public` (for JWTs)
`authorise(identity, capability, resource, parameters)`	Role-table lookup against the OSS role bundles defined in `capabilities.md`, gated by workspace scope. Workspace can come from the resource address (workspace- and flow-level resources) or from a parameter (system-level resources whose parameters reference a workspace, e.g. `create-user with workspace association W`).
`authorise_many`	Loop over `authorise`
Identity / credential / workspace management	`create-user`, `create-api-key`, etc. as listed below. These are operations on system-level resources (the user / workspace / credential registries); workspace, where it appears in the body, is a parameter.
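As a rough illustration of the `authorise_many` row, the loop can be as small as the sketch below. The `authorise` callable, the tuple layout of each check, and the boolean return values are assumptions made for this example; the contract's actual signatures are defined in iam-contract.md.

```python
from typing import Any, Callable, Iterable

# One check = (capability, resource, parameters); this layout is an assumption.
Check = tuple[str, Any, dict]


def authorise_many(
    authorise: Callable[[Any, str, Any, dict], bool],
    identity: Any,
    checks: Iterable[Check],
) -> list[bool]:
    """Evaluate each check independently by looping over authorise."""
    return [
        authorise(identity, capability, resource, parameters)
        for capability, resource, parameters in checks
    ]
```

Passing `authorise` in as a parameter keeps the sketch independent of how the role-table lookup itself is implemented.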

Architectural context — roles, capabilities, workspace as resource scope, enforcement boundary — lives in iam.md and capabilities.md. The contract abstraction lives in iam-contract.md.

Transport

Request topic: request:tg/request/iam-request
Response topic: response:tg/response/iam-response
Pattern: request/response, correlated by the id message property, the same pattern used by config-svc and flow-svc.
Caller: the API gateway only. Under the enforcement-boundary policy (see capabilities spec), the IAM service trusts the bus and performs no per-request authentication or capability check against the caller. The gateway has already evaluated capability membership and workspace scoping before sending the request.
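The id-based correlation described above can be sketched as follows. This is a minimal illustration rather than the gateway's implementation: `PendingRequests`, `register`, and `deliver` are hypothetical names, and a real caller would publish to the request topic and consume from the response topic instead of calling `deliver` directly.

```python
import uuid
from typing import Any, Callable, Dict


class PendingRequests:
    """Tracks in-flight requests by the id carried as a message property."""

    def __init__(self) -> None:
        self._pending: Dict[str, Callable[[Any], None]] = {}

    def register(self, on_response: Callable[[Any], None]) -> str:
        """Allocate a fresh id; the caller attaches it to the outgoing request."""
        request_id = str(uuid.uuid4())
        self._pending[request_id] = on_response
        return request_id

    def deliver(self, request_id: str, response: Any) -> bool:
        """Route a response to its waiting caller; False if the id is unknown."""
        callback = self._pending.pop(request_id, None)
        if callback is None:
            return False
        callback(response)
        return True
```

Each id is removed as soon as its response arrives, so a duplicate or stray response is simply dropped.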

Dataclasses

`IamRequest`

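from __future__ import annotations  # fields below name types that are defined later

from dataclasses import dataclass, field

@dataclass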
class IamRequest:
    # One of the operation strings below.
    operation: str = ""

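    # Workspace for this request.  Required on create-user (the new
    # user's home-workspace binding).  Optional on login, list-users
    # (filter) and the other user / API-key ops (integrity check).
    # Omitted (or empty) for the remaining system-level ops
    # (workspace CRUD, signing-key ops, bootstrap, resolve-api-key).
    workspace: str = ""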

    # Acting user id.  Set by the gateway to the authenticated
    # caller's identity handle for every authenticated request
    # (overwrites any caller-supplied value — the gateway is the
    # only authority for actor identity, so handlers can rely on it
    # being authentic).  Used for audit logging, self-service ops
    # like ``whoami`` that resolve "the caller", and future actor-
    # scoped policy checks.  Empty for unauthenticated ops
    # (``login``, ``bootstrap``, ``bootstrap-status``,
    # ``get-signing-key-public``, ``resolve-api-key``).  See the
    # actor-injection rule in the IAM contract spec.
    actor: str = ""

    # --- identity selectors ---
    user_id: str = ""
    username: str = ""          # login; unique within a workspace
    key_id: str = ""            # revoke-api-key, list-api-keys (own)
    api_key: str = ""           # resolve-api-key (plaintext)

    # --- credentials ---
    password: str = ""          # login, change-password (current)
    new_password: str = ""      # change-password

    # --- user fields ---
    user: UserInput | None = None       # create-user, update-user

    # --- workspace fields ---
    workspace_record: WorkspaceInput | None = None   # create-workspace, update-workspace

    # --- api key fields ---
    key: ApiKeyInput | None = None      # create-api-key
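The actor rule in the comment above amounts to an unconditional overwrite at the gateway. A minimal sketch, assuming the request is handled as a plain dict before it is put on the bus; `inject_actor` is a hypothetical helper name, not a function from the codebase:

```python
def inject_actor(request: dict, identity_handle: str) -> dict:
    """Return a copy of the request whose actor is the authenticated handle.

    Any actor value supplied by the caller is discarded, so handlers
    downstream can treat the field as authentic.
    """
    forwarded = dict(request)
    forwarded["actor"] = identity_handle
    return forwarded


# A caller trying to impersonate someone else is silently corrected.
spoofed = {"operation": "whoami", "actor": "someone-else"}
forwarded = inject_actor(spoofed, "alice")
```

Working on a copy leaves the caller's original payload untouched, which is convenient if the gateway also logs what it received.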

`IamResponse`

@dataclass
class IamResponse:
    # Populated on success of operations that return them.
    user: UserRecord | None = None              # create-user, get-user, update-user, whoami
    users: list[UserRecord] = field(default_factory=list)   # list-users
    workspace: WorkspaceRecord | None = None    # create-workspace, get-workspace, update-workspace
    workspaces: list[WorkspaceRecord] = field(default_factory=list)  # list-workspaces

    # create-api-key returns the plaintext once.  Never populated
    # on any other operation.
    api_key_plaintext: str = ""
    api_key: ApiKeyRecord | None = None          # create-api-key
    api_keys: list[ApiKeyRecord] = field(default_factory=list)  # list-api-keys

    # login, rotate-signing-key
    jwt: str = ""
    jwt_expires: str = ""        # ISO-8601 UTC

    # get-signing-key-public
    signing_key_public: str = ""  # PEM

    # resolve-api-key returns who this key authenticates as.
    resolved_user_id: str = ""
    resolved_workspace: str = ""
    resolved_roles: list[str] = field(default_factory=list)

    # reset-password
    temporary_password: str = ""  # returned once to the operator

    # bootstrap: on first run, the initial admin's one-time API key
    # is returned for the operator to capture.
    bootstrap_admin_user_id: str = ""
    bootstrap_admin_api_key: str = ""

    # bootstrap-status: true iff an unconsumed ``bootstrap`` call
    # would currently succeed.  Always emitted by the response
    # translator (the false case is meaningful for first-run UIs).
    bootstrap_available: bool = False

    # Present on any failed operation.
    error: Error | None = None

Value types

@dataclass
class UserInput:
    username: str = ""
    name: str = ""
    email: str = ""
    password: str = ""          # only on create-user; never on update-user
    roles: list[str] = field(default_factory=list)
    enabled: bool = True
    must_change_password: bool = False

@dataclass
class UserRecord:
    id: str = ""
    workspace: str = ""
    username: str = ""
    name: str = ""
    email: str = ""
    roles: list[str] = field(default_factory=list)
    enabled: bool = True
    must_change_password: bool = False
    created: str = ""           # ISO-8601 UTC
    # Password hash is never included in any response.

@dataclass
class WorkspaceInput:
    id: str = ""
    name: str = ""
    enabled: bool = True

@dataclass
class WorkspaceRecord:
    id: str = ""
    name: str = ""
    enabled: bool = True
    created: str = ""           # ISO-8601 UTC

@dataclass
class ApiKeyInput:
    user_id: str = ""
    name: str = ""              # operator-facing label, e.g. "laptop"
    expires: str = ""           # optional ISO-8601 UTC; empty = no expiry

@dataclass
class ApiKeyRecord:
    id: str = ""
    user_id: str = ""
    name: str = ""
    prefix: str = ""            # first 4 chars of plaintext, for identification in lists
    expires: str = ""           # empty = no expiry
    created: str = ""
    last_used: str = ""         # empty if never used
    # key_hash is never included in any response.

Operations

Operation	Request fields	Response fields	Notes
`login`	`username`, `password`, `workspace` (optional)	`jwt`, `jwt_expires`	If `workspace` is omitted, IAM resolves to the user's assigned workspace.
`whoami`	`actor` (gateway-injected)	`user`	Returns the calling user's own record. AUTHENTICATED-only; no `users:read` capability required.
`resolve-api-key`	`api_key` (plaintext)	`resolved_user_id`, `resolved_workspace`, `resolved_roles`	Gateway-internal. Service returns `auth-failed` for unknown / expired / revoked keys.
`change-password`	`user_id`, `password` (current), `new_password`	—	Self-service. IAM validates `password` against stored hash.
`reset-password`	`user_id`, `workspace` (optional integrity check)	`temporary_password`	Admin-initiated. IAM generates a random password, sets `must_change_password=true` on the user, returns the plaintext once.
`create-user`	`workspace`, `user`	`user`	`user.password` is hashed and stored; `user.roles` must be a subset of known roles. `workspace` is the new user's home-workspace binding (a required parameter, not an address).
`list-users`	`workspace` (optional filter)	`users`	If `workspace` is omitted, returns the deployment-wide list.
`get-user`	`user_id`, `workspace` (optional integrity check)	`user`
`update-user`	`user_id`, `user`, `workspace` (optional integrity check)	`user`	`password` field on `user` is rejected; use `change-password` / `reset-password`. Username is immutable.
`disable-user`	`user_id`, `workspace` (optional integrity check)	—	Soft-delete; sets `enabled=false`. Revokes all the user's API keys.
`enable-user`	`user_id`, `workspace` (optional integrity check)	—	Re-enables a previously disabled user; does not restore API keys.
`delete-user`	`user_id`, `workspace` (optional integrity check)	—	Hard-delete; removes user record, username lookup, and all the user's API keys.
`create-workspace`	`workspace_record`	`workspace`	System-level.
`list-workspaces`	—	`workspaces`	System-level.
`get-workspace`	`workspace_record` (id only)	`workspace`	System-level.
`update-workspace`	`workspace_record`	`workspace`	System-level.
`disable-workspace`	`workspace_record` (id only)	—	System-level. Sets `enabled=false`; revokes all workspace API keys; disables all users in the workspace.
`create-api-key`	`key`, `workspace` (optional integrity check)	`api_key_plaintext`, `api_key`	Plaintext returned once; only hash stored. `key.name` required.
`list-api-keys`	`user_id`, `workspace` (optional integrity check)	`api_keys`
`revoke-api-key`	`key_id`, `workspace` (optional integrity check)	—	Deletes the key record.
`get-signing-key-public`	—	`signing_key_public`	Gateway fetches this at startup.
`rotate-signing-key`	—	—	System-level. Introduces a new signing key; old key continues to validate JWTs for a grace period (implementation-defined, minimum 1h).
`bootstrap`	—	`bootstrap_admin_user_id`, `bootstrap_admin_api_key`	If IAM tables are empty and the service is in `bootstrap` mode, creates the initial `default` workspace, an `admin` user, an initial API key, and an initial signing key; returns them once. Otherwise returns a masked auth failure.
`bootstrap-status`	—	`bootstrap_available`	Side-effect-free probe; `true` iff iam-svc is in `bootstrap` mode and tables are empty. Intended for first-run UX.
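The request-field column of the table lends itself to a table-driven check. The sketch below covers only a handful of operations and treats the request as a plain dict; `REQUIRED_FIELDS` and `validate` are hypothetical names, and the real service may validate differently.

```python
from typing import Optional

# Required request fields for a subset of operations, taken from the table
# above.  Optional fields (workspace filters / integrity checks) are omitted.
REQUIRED_FIELDS = {
    "login": ("username", "password"),
    "change-password": ("user_id", "password", "new_password"),
    "reset-password": ("user_id",),
    "create-user": ("workspace", "user"),
    "list-users": (),
    "get-user": ("user_id",),
    "revoke-api-key": ("key_id",),
}


def validate(request: dict) -> Optional[dict]:
    """Return an invalid-argument error dict, or None if the request is well formed."""
    operation = request.get("operation", "")
    if operation not in REQUIRED_FIELDS:
        return {"type": "invalid-argument",
                "message": f"unknown operation: {operation!r}"}
    missing = [name for name in REQUIRED_FIELDS[operation]
               if not request.get(name)]
    if missing:
        return {"type": "invalid-argument",
                "message": "missing required field(s): " + ", ".join(missing)}
    return None
```

Because the subset is deliberately partial, an operation missing from `REQUIRED_FIELDS` is reported as unknown here even if the table above lists it.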

Error taxonomy

All errors are carried in the IamResponse.error field. error.type is one of the values below; error.message is a human-readable string that is not surfaced verbatim to external callers (the gateway maps to auth failure / access denied per the IAM error policy).

`type`	When
`invalid-argument`	Malformed request (missing required field, unknown operation, invalid format).
`not-found`	Named resource does not exist (`user_id`, `key_id`, workspace).
`duplicate`	Create operation collides with an existing resource (username, workspace id, key name).
`auth-failed`	`login` with wrong credentials; `resolve-api-key` with an unknown / expired / revoked key; `change-password` with the wrong current password. A single bucket, so that distinct errors cannot serve as an oracle for attackers.
`weak-password`	Password does not meet policy (length, complexity — policy defined at service level).
`disabled`	Target user or workspace has `enabled=false`.
`operation-not-permitted`	Non-admin attempting system-level operation, or workspace-scoped operation attempting to affect another workspace.
`internal-error`	Unexpected IAM-side failure. Log and surface as 500 at the gateway.

The gateway is responsible for translating auth-failed and operation-not-permitted into the obfuscated external error response ("auth failure" / "access denied"); invalid-argument becomes a descriptive 400; not-found / duplicate / weak-password / disabled become descriptive 4xx but never leak IAM-internal detail.
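The translation described above might look like the following sketch. Only the 400 for `invalid-argument` and the 500 for `internal-error` come from this spec; the other status codes (401, 403, 404, 409, 422) and all of the descriptive messages are assumptions chosen for illustration, and `to_http` is a hypothetical name.

```python
# Masked errors: the external wording never varies with the underlying cause.
MASKED = {
    "auth-failed": (401, "auth failure"),               # status code assumed
    "operation-not-permitted": (403, "access denied"),  # status code assumed
}

# Descriptive errors: fixed gateway-side text, never IamResponse.error.message.
DESCRIPTIVE = {
    "invalid-argument": (400, "invalid request"),
    "not-found": (404, "resource not found"),           # status code assumed
    "duplicate": (409, "resource already exists"),      # status code assumed
    "weak-password": (422, "password does not meet policy"),  # assumed
    "disabled": (403, "user or workspace is disabled"),       # assumed
}


def to_http(error_type: str) -> tuple[int, str]:
    """Map an IAM error type to an external (status, message) pair."""
    if error_type in MASKED:
        return MASKED[error_type]
    if error_type in DESCRIPTIVE:
        return DESCRIPTIVE[error_type]
    # internal-error and anything unrecognised surface as a generic 500.
    return (500, "internal error")
```

Keying the lookup on `error.type` alone means the IAM-side `error.message` never reaches an external caller.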

Credential storage

Passwords are stored using a slow KDF (bcrypt or argon2id; the service picks, and the choice is documented as an implementation detail). The password_hash column stores the full KDF-encoded string (algorithm, cost, salt, hash). It is never a plain SHA-256 digest.
API keys are stored as SHA-256 of the plaintext. API keys are 128-bit random values (tg_ + base64url); the entropy makes a slow hash unnecessary. The hash serves as the primary key on the iam_api_keys table, enabling O(1) lookup on resolve-api-key.
JWT signing key is stored as an RSA or Ed25519 private key (implementation choice) in a dedicated iam_signing_keys table with a kid, created, and optional retired timestamp. At most one active key; up to N retired keys are kept for a grace period to validate previously-issued JWTs.

Passwords, API-key plaintext, and signing-key private material are never returned in any response other than the explicit one-time responses above (reset-password, create-api-key, bootstrap).
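The API-key scheme above (128 random bits, a `tg_` prefix, base64url encoding, SHA-256 as the lookup key) can be sketched with the standard library. Stripping the base64 padding, hex-encoding the digest, and using a dict in place of the iam_api_keys table are assumptions of this example; the function names are hypothetical.

```python
import base64
import hashlib
import secrets
from typing import Dict, Optional


def generate_api_key() -> str:
    """128 bits of randomness, base64url-encoded, with the tg_ prefix."""
    raw = secrets.token_bytes(16)  # 16 bytes = 128 bits
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "tg_" + encoded


def api_key_hash(plaintext: str) -> str:
    """SHA-256 of the plaintext; this value is what gets stored."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def resolve(store: Dict[str, str], plaintext: str) -> Optional[str]:
    """Look a key up by its hash; None stands in for auth-failed."""
    return store.get(api_key_hash(plaintext))
```

Since the hash is the primary key, resolving a presented key is a single keyed read with no scan over stored records.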

Bootstrap modes

iam-svc requires a bootstrap mode to be chosen at startup. There is no default — an unset or invalid mode causes the service to refuse to start. The purpose is to force the operator to make an explicit security decision rather than rely on an implicit "safe" fallback.

Mode	Startup behaviour	`bootstrap` operation	Suitability
`token`	On first start with empty tables, auto-seeds the `default` workspace, admin user, admin API key (using the operator-provided `--bootstrap-token`), and an initial signing key. No-op on subsequent starts.	Refused — returns `auth-failed` / `"auth failure"` regardless of caller.	Production, any public-exposure deployment.
`bootstrap`	No startup seeding. Tables remain empty until the `bootstrap` operation is invoked over the pub/sub bus (typically via `tg-bootstrap-iam`).	Live while tables are empty. Generates and returns the admin API key once. Refused (`auth-failed`) once tables are populated.	Dev / compose up / CI. Not safe under public exposure — any caller reaching the gateway's `/api/v1/iam` forwarder before the operator can cause a token to be issued to them. Operators choosing this mode accept that risk.

Error masking

In both modes, any refused invocation of the bootstrap operation returns the same error (auth-failed / "auth failure"). A caller cannot distinguish:

- "service is in token mode"
- "service is in bootstrap mode but already bootstrapped"
- "operation forbidden"

This matches the general IAM error-policy stance (see iam.md) and prevents externally enumerating IAM's state.
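A minimal sketch of the masking rule, assuming responses are plain dicts and that `seed` is a caller-supplied function that creates the initial workspace, admin user, and keys. Both function names are hypothetical.

```python
from typing import Callable

AUTH_FAILURE = {"type": "auth-failed", "message": "auth failure"}


def bootstrap_available(mode: str, tables_empty: bool) -> bool:
    """What bootstrap-status reports: bootstrap mode and empty tables."""
    return mode == "bootstrap" and tables_empty


def handle_bootstrap(mode: str, tables_empty: bool,
                     seed: Callable[[], dict]) -> dict:
    """Run the bootstrap op, or return the one masked error for every refusal."""
    if not bootstrap_available(mode, tables_empty):
        # Same response whether the mode is token or the tables are populated.
        return {"error": dict(AUTH_FAILURE)}
    return seed()
```

Routing every refusal through one branch makes it hard to introduce a second, distinguishable error by accident.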

Configuration sources

The mode and token can be supplied two ways. Resolution order is fixed; there is no permissive fallback.

Source	Field
Processor-group YAML / CLI argument	`bootstrap_mode`, `bootstrap_token`
Environment variable	`IAM_BOOTSTRAP_MODE`, `IAM_BOOTSTRAP_TOKEN`

For each setting the service uses the explicit param value if present; otherwise the environment variable; otherwise the service refuses to start. The env-var path is intended for the K8s deployment pattern where the token is injected from a Secret via secretKeyRef, so the plaintext never has to live in YAML or git. A typical production manifest holds bootstrap_mode: "token" in the YAML and pulls IAM_BOOTSTRAP_TOKEN from the Secret; the YAML is then safe to version-control.
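The resolution order can be sketched as below. Treating an empty string as unset and raising `RuntimeError` to represent "refuses to start" are assumptions of this example; `resolve_setting` and `resolve_bootstrap_mode` are hypothetical names.

```python
import os
from typing import Mapping, Optional

VALID_MODES = ("token", "bootstrap")


def resolve_setting(param: Optional[str], env_name: str,
                    environ: Mapping[str, str] = os.environ) -> str:
    """Explicit parameter first, then the environment variable, else refuse."""
    if param:
        return param
    value = environ.get(env_name, "")
    if value:
        return value
    raise RuntimeError(f"{env_name} is not configured; refusing to start")


def resolve_bootstrap_mode(param: Optional[str],
                           environ: Mapping[str, str] = os.environ) -> str:
    """Resolve the mode and reject anything outside the two defined modes."""
    mode = resolve_setting(param, "IAM_BOOTSTRAP_MODE", environ)
    if mode not in VALID_MODES:
        raise RuntimeError(f"invalid bootstrap mode {mode!r}; refusing to start")
    return mode
```

With `bootstrap_mode: "token"` in the YAML and the token only in the environment, the same helper serves both settings without the plaintext appearing in configuration.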

Bootstrap-token lifecycle

The bootstrap token — whether operator-supplied (token mode) or service-generated (bootstrap mode) — is a one-time credential. It is stored as admin's single API key, tagged name="bootstrap". The operator's first admin action after bootstrap should be:

- Create a durable admin user and API key (or issue a durable API key to the bootstrap admin).
- Revoke the bootstrap key via revoke-api-key.
- Remove the bootstrap token from any deployment configuration (Secret, env var, or YAML field, wherever it was sourced).

The name="bootstrap" marker makes bootstrap keys easy to detect in tooling (e.g. a tg-list-api-keys filter).
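Tooling that looks for leftover bootstrap keys only needs to filter on the name. A small sketch, assuming the key records have already been decoded into dicts with the ApiKeyRecord field names; `bootstrap_keys` is a hypothetical helper:

```python
from typing import Iterable, List


def bootstrap_keys(api_keys: Iterable[dict]) -> List[dict]:
    """Return the key records still tagged name="bootstrap"."""
    return [key for key in api_keys if key.get("name") == "bootstrap"]


# Example records; the ids and the "laptop" label are made up.
records = [
    {"id": "k1", "user_id": "admin", "name": "bootstrap"},
    {"id": "k2", "user_id": "admin", "name": "laptop"},
]
leftover = bootstrap_keys(records)
```

A non-empty result after the first admin session suggests the revoke step has not been carried out yet.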

HTTP forwarding (initial integration)

For the initial gateway integration — before the IAM service is wired into the authentication middleware — the gateway exposes a single forwarding endpoint:

POST /api/v1/iam

Request body is a JSON encoding of IamRequest.
Response body is a JSON encoding of IamResponse.
The gateway's existing authentication (GATEWAY_SECRET bearer) gates access to this endpoint so the IAM protocol can be exercised end-to-end in tests without touching the live auth path.
This endpoint is not the final shape. Once the middleware is in place, per-operation REST endpoints replace it (for example POST /api/v1/auth/login, POST /api/v1/users, DELETE /api/v1/api-keys/{id}), and this generic forwarder is removed.

Apart from setting actor as described under IamRequest, the endpoint performs only message marshalling: it does not read or rewrite any other field in the request, and it applies no capability check. All authorisation for user / workspace / key management lands in the subsequent middleware work.
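For exercising the forwarder, a request body can be assembled as in the sketch below. It assumes the JSON keys match the IamRequest field names exactly and that unset fields may be left out; neither is confirmed here, so check against the gateway's translator before relying on it. `iam_request_body` is a hypothetical helper.

```python
import json


def iam_request_body(operation: str, **fields) -> str:
    """Encode an IamRequest as JSON, leaving out fields that are unset."""
    body = {"operation": operation}
    for name, value in fields.items():
        if value not in ("", None):
            body[name] = value
    return json.dumps(body)


# Body for POST /api/v1/iam asking for the users of one workspace.
list_users_body = iam_request_body("list-users", workspace="default")
```

The resulting string is what a test client would send as the POST body, alongside the gateway's bearer credential.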

Non-goals for this spec

REST endpoint shape for the final gateway surface — covered in Phase 2 of the IAM implementation plan, not here.
OIDC / SAML external IdP protocol — out of scope for open source.
Key-signing algorithm choice, password KDF choice, JWT claim layout — implementation details captured in code + ADRs, not locked in the protocol spec.

References

IAM Contract Specification — the abstract gateway↔IAM regime contract this protocol implements.
Identity and Access Management Specification
Capability Vocabulary Specification
