refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853)

The gateway no longer holds any policy state: capability sets, role
definitions, or workspace scope rules.  Under the IAM contract it asks
the regime "may this identity perform this capability on this
resource?" on every request.  That moves the OSS role-based regime
entirely into
iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing
the gateway, the wire protocol, or backend services.

Contract:
- authenticate(credential) -> Identity (handle, workspace,
  principal_id, source).  No roles, claims, or policy state surface
  to the gateway.
- authorise(identity, capability, resource, parameters) -> (allow,
  ttl).  Decisions are cached individually: the regime's TTL is
  clamped above by the gateway, and regime errors fail closed.
- authorise_many available as a fan-out variant.
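
The caching rules in the contract can be sketched as below.  This is a
minimal illustration, not the gateway's implementation: DecisionCache,
MAX_TTL, the 60-second bound, and the synchronous `regime` callable are
all assumptions made for the example.

```python
import json
import time

MAX_TTL = 60  # assumed gateway-side upper bound, in seconds


class DecisionCache:
    """Hypothetical per-decision cache in front of an IAM regime."""

    def __init__(self, regime, max_ttl=MAX_TTL, clock=time.monotonic):
        self.regime = regime      # callable returning (allow, ttl)
        self.max_ttl = max_ttl
        self.clock = clock
        self.entries = {}         # key -> (allow, expires_at)

    def authorise(self, handle, capability, resource, parameters):
        # One cache entry per distinct question put to the regime.
        key = (
            handle,
            capability,
            json.dumps(resource or {}, sort_keys=True),
            json.dumps(parameters or {}, sort_keys=True),
        )
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        try:
            allow, ttl = self.regime(handle, capability, resource, parameters)
        except Exception:
            # Fail closed: deny, and do not cache the failure.
            return False
        # Clamp the regime's suggested TTL to the gateway's bound.
        ttl = max(0, min(int(ttl), self.max_ttl))
        if ttl > 0:
            self.entries[key] = (bool(allow), now + ttl)
        return bool(allow)
```

Both allow and deny verdicts are cached in this sketch; only regime
errors bypass the cache, so a recovered regime is consulted again on
the next request.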

Operation registry drives every authorisation decision:
- /api/v1/iam -> IamEndpoint, looks up bare op name (create-user,
  list-workspaces, ...).
- /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op>
  (config:get, flow:list-blueprints, librarian:add-document, ...).
- /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>.
- /api/v1/flow/{flow}/{import,export}/{kind} ->
  flow-{import,export}:<kind>.
- WS Mux per-frame -> flow-service:<kind>; closes a gap where
  authenticated users could hit any service kind.
85 operations registered across the surface.
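
The naming conventions above can be sketched as a route-to-operation
mapping.  The function operation_name and its error handling are
hypothetical, written only to illustrate the scheme; the real registry
lookup lives in the endpoint classes named in the list.

```python
def operation_name(path, op=None):
    """Map an HTTP route (plus the request's op, where one applies)
    to a registry operation name.  Illustrative only."""
    parts = [p for p in path.split("/") if p]
    if parts[:2] != ["api", "v1"]:
        raise ValueError(f"not an API path: {path}")
    rest = parts[2:]
    if len(rest) == 4 and rest[0] == "flow":
        # /api/v1/flow/{flow}/{service,import,export}/{kind}
        mode, kind = rest[2], rest[3]
        if mode == "service":
            return f"flow-service:{kind}"
        if mode in ("import", "export"):
            return f"flow-{mode}:{kind}"
    elif len(rest) == 1 and op:
        # /api/v1/iam uses the bare op; /api/v1/{kind} uses <kind>:<op>
        return op if rest[0] == "iam" else f"{rest[0]}:{op}"
    raise ValueError(f"no registered operation for {path}")
```

For example, operation_name("/api/v1/config", "get") gives
"config:get", and operation_name("/api/v1/iam", "create-user") gives
"create-user".  Routes that match none of the patterns raise in this
sketch rather than falling back to a default.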

JWT carries identity only — sub + workspace.  The roles claim is gone;
the gateway never reads policy state from a credential.
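
A sketch of what "identity only" means for token handling, assuming
the JWT has already been verified and decoded into a dict.  The helper
identity_from_claims, the use of sub for both handle and principal_id,
and the "jwt" source label are assumptions for illustration; only the
four Identity fields and the sub + workspace claims come from this
change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    # Fields as listed in the contract; no roles or claims.
    handle: str
    workspace: str
    principal_id: str
    source: str


def identity_from_claims(claims):
    """Build an Identity from already-verified JWT claims.
    Any other claim (including a legacy roles claim) is ignored."""
    sub = claims.get("sub")
    workspace = claims.get("workspace")
    if not sub or not workspace:
        raise ValueError("token must carry sub and workspace")
    # Assumption: sub doubles as handle and principal_id here.
    return Identity(handle=sub, workspace=workspace,
                    principal_id=sub, source="jwt")
```

A token minted before this change that still carries a roles claim
would yield the same Identity as one without it.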

The three coarse *_KIND_CAPABILITY maps are removed.  The registry is
the only source of truth for the capability + resource shape of an
operation.  Tests are migrated to the new Identity shape and to auth
doubles that mock authorise().

Specs updated: docs/tech-specs/iam-contract.md (Identity surface,
caching, registry-naming conventions), iam.md (JWT shape, gateway
flow, role section reframed as OSS-regime detail), iam-protocol.md
(positioned as one implementation of the contract).
cybermaggedon 2026-04-28 16:19:41 +01:00 committed by GitHub
parent 9f2d9adcb1
commit 5e28d3cce0
24 changed files with 2359 additions and 587 deletions


@@ -1,4 +1,6 @@
import json
from . request_response_spec import RequestResponse, RequestResponseSpec
from .. schema import (
    IamRequest, IamResponse,
@@ -44,7 +46,13 @@ class IamClient(RequestResponse):
        Returns ``(user_id, workspace, roles)`` or raises
        ``RuntimeError`` with error type ``auth-failed`` if the key is
        unknown / expired / revoked.

        Note: the ``roles`` value is a regime-internal hint and is
        not used by the gateway directly under the IAM contract;
        all authorisation decisions go through ``authorise()``.
        Returned here only for backward compatibility with callers
        that haven't migrated."""

        resp = await self._request(
            operation="resolve-api-key",
            api_key=api_key,
@@ -56,6 +64,40 @@ class IamClient(RequestResponse):
            list(resp.resolved_roles),
        )

    async def authorise(self, identity_handle, capability,
                        resource, parameters, timeout=IAM_TIMEOUT):
        """Ask the IAM regime whether ``identity_handle`` may perform
        ``capability`` on ``resource`` given ``parameters``.

        Implements the contract ``authorise(identity, capability,
        resource, parameters) -> (decision, ttl)``.  Returns a tuple
        ``(allow: bool, ttl_seconds: int)``.  The TTL is the
        regime's suggested cache lifetime for this decision; the
        gateway honours it (clamped above by gateway-side policy)."""
        resp = await self._request(
            operation="authorise",
            user_id=identity_handle,
            capability=capability,
            resource_json=json.dumps(resource or {}, sort_keys=True),
            parameters_json=json.dumps(parameters or {}, sort_keys=True),
            timeout=timeout,
        )
        return resp.decision_allow, resp.decision_ttl_seconds

    async def authorise_many(self, identity_handle, checks,
                             timeout=IAM_TIMEOUT):
        """Bulk authorise.  ``checks`` is a list of dicts each
        carrying ``capability``, ``resource``, and ``parameters``.
        Returns a list of ``(allow, ttl)`` tuples in the same order."""
        resp = await self._request(
            operation="authorise-many",
            user_id=identity_handle,
            authorise_checks=json.dumps(list(checks), sort_keys=True),
            timeout=timeout,
        )
        decisions = json.loads(resp.decisions_json or "[]")
        return [(d.get("allow", False), d.get("ttl", 0)) for d in decisions]

    async def create_user(self, workspace, user, actor="",
                          timeout=IAM_TIMEOUT):
        """Create a user.  ``user`` is a ``UserInput``."""

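The decoding step at the end of authorise_many defaults every missing
field to a deny with zero TTL.  The standalone helper below reproduces
just that step so the behaviour can be seen in isolation;
decode_decisions is a hypothetical name, and "example:read" is a
placeholder rather than an entry from capabilities.md.

```python
import json


def decode_decisions(decisions_json):
    """Same parsing as the tail of authorise_many: an empty payload
    gives no decisions, and a missing field reads as deny / ttl 0."""
    decisions = json.loads(decisions_json or "[]")
    return [(d.get("allow", False), d.get("ttl", 0)) for d in decisions]


# Shape of the request side, per the authorise_many docstring.
checks = [
    {"capability": "example:read", "resource": {}, "parameters": {}},
]
authorise_checks = json.dumps(list(checks), sort_keys=True)
```

Note that the decoder does not check that the number of decisions
matches the number of checks; a caller that zips the two lists would
want to verify the lengths first.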

@@ -99,6 +99,21 @@ class IamRequest:
    workspace_record: WorkspaceInput | None = None
    key: ApiKeyInput | None = None

    # ---- authorise / authorise-many inputs ----
    # Capability string from the vocabulary in capabilities.md.
    capability: str = ""
    # Resource identifier as JSON.  See the IAM contract spec for
    # the resource-component vocabulary.  An empty dict denotes a
    # system-level resource.
    resource_json: str = ""
    # Operation parameters as JSON.  Decision-relevant fields the
    # operation supplied that are not part of the resource address
    # (e.g. workspace association on create-user).
    parameters_json: str = ""
    # For authorise-many: a JSON-serialised list of
    # {"capability": str, "resource": dict, "parameters": dict}.
    authorise_checks: str = ""


@dataclass
class IamResponse:
@@ -133,6 +148,18 @@ class IamResponse:
    bootstrap_admin_user_id: str = ""
    bootstrap_admin_api_key: str = ""

    # ---- authorise / authorise-many outputs ----
    # authorise: the regime's allow / deny verdict.
    decision_allow: bool = False
    # Cache TTL the regime suggests, in seconds.  Gateway respects
    # this for both allow and deny decisions; bounded above by
    # gateway-side policy (typically <= 60s).
    decision_ttl_seconds: int = 0
    # authorise-many: a JSON-serialised list of {"allow": bool,
    # "ttl": int} in the same order as the request's
    # authorise_checks.
    decisions_json: str = ""

    error: Error | None = None
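
The JSON-valued request fields can be illustrated with a small
round-trip.  encode_resource mirrors the json.dumps call used by the
client's authorise(); is_system_level is a hypothetical helper, and
treating an unset (empty-string) field as an empty dict is an
assumption, as are the "workspace" and "flow" component names.

```python
import json


def encode_resource(resource):
    """Serialise a resource dict the way the client does: None and {}
    both become "{}", and keys are sorted for a stable encoding."""
    return json.dumps(resource or {}, sort_keys=True)


def is_system_level(resource_json):
    """An empty dict denotes a system-level resource.  Assumption: an
    unset field (the "" default) is treated the same way."""
    return json.loads(resource_json or "{}") == {}
```

Because keys are sorted, two dicts with the same components encode to
the same string regardless of insertion order.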