release/v2.4 -> master (#844)

cybermaggedon 2026-04-22 15:19:57 +01:00 committed by GitHub
parent a24df8e990
commit 89cabee1b4
386 changed files with 7202 additions and 5741 deletions


@ -0,0 +1,309 @@
---
layout: default
title: "Data Ownership and Information Separation"
parent: "Tech Specs"
---
# Data Ownership and Information Separation
## Purpose
This document defines the logical ownership model for data in
TrustGraph: what the artefacts are, who owns them, and how they relate
to each other.
The IAM spec ([iam.md](iam.md)) describes authentication and
authorisation mechanics. This spec addresses the prior question: what
are the boundaries around data, and who owns what?
## Concepts
### Workspace
A workspace is the primary isolation boundary. It represents an
organisation, team, or independent operating unit. All data belongs to
exactly one workspace. Cross-workspace access is never permitted through
the API.
A workspace owns, among other artefacts listed in the ownership summary:
- Source documents
- Flows (processing pipeline definitions)
- Knowledge cores (stored extraction output)
- Collections (organisational units for extracted knowledge)
### Collection
A collection is an organisational unit within a workspace. It groups
extracted knowledge produced from source documents. A workspace can
have multiple collections, allowing:
- Processing the same documents with different parameters or models.
- Maintaining separate knowledge bases for different purposes.
- Deleting extracted knowledge without deleting source documents.
Collections do not own source documents. A source document exists at the
workspace level and can be processed into multiple collections.
### Source document
A source document (PDF, text file, etc.) is raw input uploaded to the
system. Documents belong to the workspace, not to a specific collection.
This is intentional. A document is an asset that exists independently
of how it is processed. The same PDF might be processed into multiple
collections with different chunking parameters or extraction models.
Tying a document to a single collection would force re-upload for each
collection.
### Flow
A flow defines a processing pipeline: which models to use, what
parameters to apply (chunk size, temperature, etc.), and how processing
services are connected. Flows belong to a workspace.
The processing services themselves (document-decoder, chunker,
embeddings, LLM completion, etc.) are shared infrastructure — they serve
all workspaces. Each flow has its own queues, keeping data from
different workspaces and flows separate as it moves through the
pipeline.
Different workspaces can define different flows. Workspace A might use
GPT-5.2 with a chunk size of 2000, while workspace B uses Claude with a
chunk size of 1000.
### Prompts
Prompts are templates that control how the LLM behaves during knowledge
extraction and query answering. They belong to a workspace, allowing
different workspaces to have different extraction strategies, response
styles, or domain-specific instructions.
### Ontology
An ontology defines the concepts, entities, and relationships that the
extraction pipeline looks for in source documents. Ontologies belong to
a workspace. A medical workspace might define ontologies around diseases,
symptoms, and treatments, while a legal workspace defines ontologies
around statutes, precedents, and obligations.
### Schemas
Schemas define structured data types for extraction. They specify what
fields to extract, their types, and how they relate. Schemas belong to
a workspace, as different workspaces extract different structured
information from their documents.
### Tools, tool services, and MCP tools
Tools define capabilities available to agents: what actions they can
take, what external services they can call. Tool services configure how
tools connect to backend services. MCP tools configure connections to
remote MCP servers, including authentication tokens. All belong to a
workspace.
### Agent patterns and agent task types
Agent patterns define agent behaviour strategies (how an agent reasons,
what steps it follows). Agent task types define the kinds of tasks
agents can perform. Both belong to a workspace, as different workspaces
may have different agent configurations.
### Token costs
Token cost definitions specify pricing for LLM token usage per model.
These belong to a workspace since different workspaces may use different
models or have different billing arrangements.
### Flow blueprints
Flow blueprints are templates for creating flows. They define the
default pipeline structure and parameters. Blueprints belong to a
workspace, allowing workspaces to define custom processing templates.
### Parameter types
Parameter types define the kinds of parameters that flows accept (e.g.
"llm-model", "temperature"), including their defaults and validation
rules. They belong to a workspace since workspaces that define custom
flows need to define the parameter types those flows use.
### Interface descriptions
Interface descriptions define the connection points of a flow — what
queues and topics it uses. They belong to a workspace since they
describe workspace-owned flows.
### Knowledge core
A knowledge core is a stored snapshot of extracted knowledge (triples
and graph embeddings). Knowledge cores belong to a workspace and can be
loaded into any collection within that workspace.
Knowledge cores serve as a portable extraction output. You process
documents through a flow, the pipeline produces triples and embeddings,
and the results can be stored as a knowledge core. That core can later
be loaded into a different collection or reloaded after a collection is
cleared.
### Extracted knowledge
Extracted knowledge is the live, queryable content within a collection:
triples in the knowledge graph, graph embeddings, and document
embeddings. It is the product of processing source documents through a
flow into a specific collection.
Extracted knowledge is scoped to a workspace and a collection. It
cannot exist without both.
### Processing record
A processing record tracks which source document was processed, through
which flow, into which collection. It links the source document
(workspace-scoped) to the extracted knowledge (workspace + collection
scoped).
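The link a processing record provides can be sketched as follows. This is a minimal illustration, not the actual TrustGraph schema: the class name `ProcessingRecord` and the fields `document_id` and `flow_id` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingRecord:
    # Hypothetical field names; the real schema may differ.
    workspace: str    # owning workspace (the isolation boundary)
    collection: str   # target collection within that workspace
    document_id: str  # workspace-scoped source document
    flow_id: str      # flow used to process the document


def collections_for_document(records, workspace, document_id):
    """Return the collections a document has been processed into."""
    return sorted({
        r.collection
        for r in records
        if r.workspace == workspace and r.document_id == document_id
    })


records = [
    ProcessingRecord("acme", "research", "doc-1", "flow-a"),
    ProcessingRecord("acme", "legal", "doc-1", "flow-b"),
    ProcessingRecord("globex", "research", "doc-1", "flow-a"),
]
```

Because the document is workspace-scoped, the same document can appear in several records within one workspace, one per target collection, while records from other workspaces are never considered.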
## Ownership summary
| Artefact | Owned by | Shared across collections? |
|----------|----------|---------------------------|
| Workspaces | Global (platform) | N/A |
| User accounts | Global (platform) | N/A |
| API keys | Global (platform) | N/A |
| Source documents | Workspace | Yes |
| Flows | Workspace | N/A |
| Flow blueprints | Workspace | N/A |
| Prompts | Workspace | N/A |
| Ontologies | Workspace | N/A |
| Schemas | Workspace | N/A |
| Tools | Workspace | N/A |
| Tool services | Workspace | N/A |
| MCP tools | Workspace | N/A |
| Agent patterns | Workspace | N/A |
| Agent task types | Workspace | N/A |
| Token costs | Workspace | N/A |
| Parameter types | Workspace | N/A |
| Interface descriptions | Workspace | N/A |
| Knowledge cores | Workspace | Yes — can be loaded into any collection |
| Collections | Workspace | N/A |
| Extracted knowledge | Workspace + collection | No |
| Processing records | Workspace + collection | No |
## Scoping summary
### Global (system-level)
A small number of artefacts exist outside any workspace:
- **Workspace registry** — the list of workspaces itself
- **User accounts** — users reference a workspace but are not owned by
one
- **API keys** — belong to users, not workspaces
These are managed by the IAM layer and exist at the platform level.
### Workspace-owned
All other configuration and data is workspace-owned:
- Flow definitions and parameters
- Flow blueprints
- Prompts
- Ontologies
- Schemas
- Tools, tool services, and MCP tools
- Agent patterns and agent task types
- Token costs
- Parameter types
- Interface descriptions
- Collection definitions
- Knowledge cores
- Source documents
- Collections and their extracted knowledge
## Relationship between artefacts
```
Platform (global)
|
+-- Workspaces
| |
+-- User accounts (each assigned to a workspace)
| |
+-- API keys (belong to users)
Workspace
|
+-- Source documents (uploaded, unprocessed)
|
+-- Flows (pipeline definitions: models, parameters, queues)
|
+-- Flow blueprints (templates for creating flows)
|
+-- Prompts (LLM instruction templates)
|
+-- Ontologies (entity and relationship definitions)
|
+-- Schemas (structured data type definitions)
|
+-- Tools, tool services, MCP tools (agent capabilities)
|
+-- Agent patterns and agent task types (agent behaviour)
|
+-- Token costs (LLM pricing per model)
|
+-- Parameter types (flow parameter definitions)
|
+-- Interface descriptions (flow connection points)
|
+-- Knowledge cores (stored extraction snapshots)
|
+-- Collections
    |
    +-- Extracted knowledge (triples, embeddings)
    |
    +-- Processing records (links documents to collections)
```
A typical workflow:
1. A source document is uploaded to the workspace.
2. A flow defines how to process it (which models, what parameters).
3. The document is processed through the flow into a collection.
4. Processing records track what was processed.
5. Extracted knowledge (triples, embeddings) is queryable within the
collection.
6. Optionally, the extracted knowledge is stored as a knowledge core
for later reuse.
## Implementation notes
The current codebase uses a `user` field in message metadata and storage
partition keys to identify the workspace. The `collection` field
identifies the collection within that workspace. The IAM spec describes
how the gateway maps authenticated credentials to a workspace identity
and sets these fields.
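As a rough sketch of how the two fields combine into a scope, assuming a simplified metadata record (the class `MessageMetadata` and the helper `scope_key` are hypothetical names, not taken from the codebase):

```python
from dataclasses import dataclass


@dataclass
class MessageMetadata:
    # Simplified stand-in for the message metadata described above.
    user: str = ""        # carries the workspace identifier
    collection: str = ""  # collection within that workspace


def scope_key(metadata):
    """Return the (workspace, collection) pair that scopes extracted knowledge."""
    if not metadata.user or not metadata.collection:
        # Extracted knowledge cannot exist without both parts of the scope.
        raise ValueError("both workspace and collection are required")
    return (metadata.user, metadata.collection)
```

A storage backend could use a pair like this as (part of) a partition key or as filter properties; the linked specs describe what each backend actually does.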
For details on how each storage backend implements this scoping, see:
- [Entity-Centric Graph](entity-centric-graph.md) — Cassandra KG schema
- [Neo4j User Collection Isolation](neo4j-user-collection-isolation.md)
- [Collection Management](collection-management.md)
### Known inconsistencies in current implementation
- **Pipeline intermediate tables** do not include collection in their
partition keys. Re-processing the same document into a different
collection may overwrite intermediate state.
- **Processing metadata** stores collection in the row payload but not
in the partition key, making collection-based queries inefficient.
- **Upload sessions** are keyed by upload ID, not workspace. The
gateway should validate workspace ownership before allowing
operations on upload sessions.
## References
- [Identity and Access Management](iam.md)
- [Collection Management](collection-management.md)
- [Entity-Centric Graph](entity-centric-graph.md)
- [Neo4j User Collection Isolation](neo4j-user-collection-isolation.md)
- [Multi-Tenant Support](multi-tenant-support.md)


@ -20,8 +20,8 @@ Defines shared service processors that are instantiated once per flow blueprint.
```json
"class": {
"service-name:{class}": {
-"request": "queue-pattern:{class}",
-"response": "queue-pattern:{class}",
+"request": "queue-pattern:{workspace}:{class}",
+"response": "queue-pattern:{workspace}:{class}",
"settings": {
"setting-name": "fixed-value",
"parameterized-setting": "{parameter-name}"
@ -31,11 +31,11 @@ Defines shared service processors that are instantiated once per flow blueprint.
```
**Characteristics:**
-- Shared across all flow instances of the same class
+- Shared across all flow instances of the same class within a workspace
- Typically expensive or stateless services (LLMs, embedding models)
-- Use `{class}` template variable for queue naming
+- Use `{workspace}` and `{class}` template variables for queue naming
- Settings can be fixed values or parameterized with `{parameter-name}` syntax
-- Examples: `embeddings:{class}`, `text-completion:{class}`, `graph-rag:{class}`
+- Examples: `embeddings:{workspace}:{class}`, `text-completion:{workspace}:{class}`
### 2. Flow Section
Defines flow-specific processors that are instantiated for each individual flow instance. Each flow gets its own isolated set of these processors.
@ -43,8 +43,8 @@ Defines flow-specific processors that are instantiated for each individual flow
```json
"flow": {
"processor-name:{id}": {
-"input": "queue-pattern:{id}",
-"output": "queue-pattern:{id}",
+"input": "queue-pattern:{workspace}:{id}",
+"output": "queue-pattern:{workspace}:{id}",
"settings": {
"setting-name": "fixed-value",
"parameterized-setting": "{parameter-name}"
@ -56,9 +56,9 @@ Defines flow-specific processors that are instantiated for each individual flow
**Characteristics:**
- Unique instance per flow
- Handle flow-specific data and state
-- Use `{id}` template variable for queue naming
+- Use `{workspace}` and `{id}` template variables for queue naming
- Settings can be fixed values or parameterized with `{parameter-name}` syntax
-- Examples: `chunker:{id}`, `pdf-decoder:{id}`, `kg-extract-relationships:{id}`
+- Examples: `chunker:{workspace}:{id}`, `pdf-decoder:{workspace}:{id}`
### 3. Interfaces Section
Defines the entry points and interaction contracts for the flow. These form the API surface for external systems and internal component communication.
@ -68,8 +68,8 @@ Interfaces can take two forms:
**Fire-and-Forget Pattern** (single queue):
```json
"interfaces": {
-"document-load": "persistent://tg/flow/document-load:{id}",
-"triples-store": "persistent://tg/flow/triples-store:{id}"
+"document-load": "persistent://tg/flow/{workspace}:document-load:{id}",
+"triples-store": "persistent://tg/flow/{workspace}:triples-store:{id}"
}
```
@ -77,8 +77,8 @@ Interfaces can take two forms:
```json
"interfaces": {
"embeddings": {
-"request": "non-persistent://tg/request/embeddings:{class}",
-"response": "non-persistent://tg/response/embeddings:{class}"
+"request": "non-persistent://tg/request/{workspace}:embeddings:{class}",
+"response": "non-persistent://tg/response/{workspace}:embeddings:{class}"
}
}
```
@ -117,6 +117,16 @@ Additional information about the flow blueprint:
### System Variables
#### {workspace}
- Replaced with the workspace identifier
- Isolates queue names between workspaces so that two workspaces
starting the same flow do not share queues
- Must be included in all queue name patterns to ensure workspace
isolation
- Example: `ws-acme`, `ws-globex`
- All blueprint templates must include `{workspace}` in queue name
patterns
#### {id}
- Replaced with the unique flow instance identifier
- Creates isolated resources for each flow
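As a rough illustration of how the system variables might be substituted into a queue pattern, consider the sketch below. The helper name `expand_queue_name` and the flow identifier `flow-1` are assumptions for illustration; the actual substitution code in TrustGraph may work differently.

```python
def expand_queue_name(pattern, **variables):
    """Substitute {name} placeholders in a queue name pattern."""
    if "{workspace}" not in pattern:
        # Every queue pattern must carry the workspace for isolation.
        raise ValueError(f"pattern lacks {{workspace}}: {pattern!r}")
    return pattern.format(**variables)


name = expand_queue_name(
    "persistent://tg/flow/{workspace}:document-load:{id}",
    workspace="ws-acme",
    id="flow-1",
)
```

Rejecting patterns that omit `{workspace}` is one way to enforce the rule at blueprint load time rather than discovering shared queues at runtime.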

docs/tech-specs/iam.md

@ -0,0 +1,858 @@
---
layout: default
title: "Identity and Access Management"
parent: "Tech Specs"
---
# Identity and Access Management
## Problem Statement
TrustGraph has no meaningful identity or access management. The system
relies on a single shared gateway token for authentication and an
honour-system `user` query parameter for data isolation. This creates
several problems:
- **No user identity.** There are no user accounts, no login, and no way
to know who is making a request. The `user` field in message metadata
is a caller-supplied string with no validation — any client can claim
to be any user.
- **No access control.** A valid gateway token grants unrestricted access
to every endpoint, every user's data, every collection, and every
administrative operation. There is no way to limit what an
authenticated caller can do.
- **No credential isolation.** All callers share one static token. There
is no per-user credential, no token expiration, and no rotation
mechanism. Revoking access means changing the shared token, which
affects all callers.
- **Data isolation is unenforced.** Storage backends (Cassandra, Neo4j,
Qdrant) filter queries by `user` and `collection`, but the gateway
does not prevent a caller from specifying another user's identity.
Cross-user data access is trivial.
- **No audit trail.** There is no logging of who accessed what. Without
user identity, audit logging is impossible.
These gaps make the system unsuitable for multi-user deployments,
multi-tenant SaaS, or any environment where access needs to be
controlled or audited.
## Current State
### Authentication
The API gateway supports a single shared token configured via the
`GATEWAY_SECRET` environment variable or `--api-token` CLI argument. If
unset, authentication is disabled entirely. When enabled, every HTTP
endpoint requires an `Authorization: Bearer <token>` header. WebSocket
connections pass the token as a query parameter.
Implementation: `trustgraph-flow/trustgraph/gateway/auth.py`
```python
class Authenticator:
    def __init__(self, token=None, allow_all=False):
        self.token = token
        self.allow_all = allow_all

    def permitted(self, token, roles):
        # `roles` is accepted but never evaluated.
        if self.allow_all: return True
        if self.token != token: return False
        return True
```
The `roles` parameter is accepted but never evaluated. All authenticated
requests have identical privileges.
MCP tool configurations support an optional per-tool `auth-token` for
service-to-service authentication with remote MCP servers. These are
static, system-wide tokens — not per-user credentials. See
[mcp-tool-bearer-token.md](mcp-tool-bearer-token.md) for details.
### User identity
The `user` field is passed explicitly by the caller as a query parameter
(e.g. `?user=trustgraph`) or set by CLI tools. It flows through the
system in the core `Metadata` dataclass:
```python
from dataclasses import dataclass

@dataclass
class Metadata:
    id: str = ""
    root: str = ""
    user: str = ""
    collection: str = ""
```
There is no user registration, login, user database, or session
management.
### Data isolation
The `user` + `collection` pair is used at the storage layer to partition
data:
- **Cassandra**: queries filter by `user` and `collection` columns
- **Neo4j**: queries filter by `user` and `collection` properties
- **Qdrant**: vector search filters by `user` and `collection` metadata
| Layer | Isolation mechanism | Enforced by |
|-------|-------------------|-------------|
| Gateway | Single shared token | `Authenticator` class |
| Message metadata | `user` + `collection` fields | Caller (honour system) |
| Cassandra | Column filters on `user`, `collection` | Query layer |
| Neo4j | Property filters on `user`, `collection` | Query layer |
| Qdrant | Metadata filters on `user`, `collection` | Query layer |
| Pub/sub topics | Per-flow topic namespacing | Flow service |
The storage-layer isolation depends on all queries correctly filtering by
`user` and `collection`. There is no gateway-level enforcement preventing
a caller from querying another user's data by passing a different `user`
parameter.
### Configuration and secrets
| Setting | Source | Default | Purpose |
|---------|--------|---------|---------|
| `GATEWAY_SECRET` | Env var | Empty (auth disabled) | Gateway bearer token |
| `--api-token` | CLI arg | None | Gateway bearer token (overrides env) |
| `PULSAR_API_KEY` | Env var | None | Pub/sub broker auth |
| MCP `auth-token` | Config service | None | Per-tool MCP server auth |
No secrets are encrypted at rest. The gateway token and MCP tokens are
stored and transmitted in plaintext (aside from any transport-layer
encryption such as TLS).
### Capabilities that do not exist
- Per-user authentication (JWT, OAuth, SAML, API keys per user)
- User accounts or user management
- Role-based access control (RBAC)
- Attribute-based access control (ABAC)
- Per-user or per-workspace API keys
- Token expiration or rotation
- Session management
- Per-user rate limiting
- Audit logging of user actions
- Permission checks preventing cross-user data access
- Multi-workspace credential isolation
### Key files
| File | Purpose |
|------|---------|
| `trustgraph-flow/trustgraph/gateway/auth.py` | Authenticator class |
| `trustgraph-flow/trustgraph/gateway/service.py` | Gateway init, token config |
| `trustgraph-flow/trustgraph/gateway/endpoint/*.py` | Per-endpoint auth checks |
| `trustgraph-base/trustgraph/schema/core/metadata.py` | `Metadata` dataclass with `user` field |
## Technical Design
### Design principles
- **Auth at the edge.** The gateway is the single enforcement point.
Internal services trust the gateway and do not re-authenticate.
This avoids distributing credential validation across dozens of
microservices.
- **Identity from credentials, not from callers.** The gateway derives
user identity from authentication credentials. Callers can no longer
self-declare their identity via query parameters.
- **Workspace isolation by default.** Every authenticated user belongs to
a workspace. All data operations are scoped to that workspace.
Cross-workspace access is not possible through the API.
- **Extensible API contract.** The API accepts an optional workspace
parameter on every request. This allows the same protocol to support
single-workspace deployments today and multi-workspace extensions in
the future without breaking changes.
- **Simple roles, not fine-grained permissions.** A small number of
predefined roles controls what operations a user can perform. This is
sufficient for the current API surface and avoids the complexity of
per-resource permission management.
### Authentication
The gateway supports two credential types. Both are carried as a Bearer
token in the `Authorization` header for HTTP requests. The gateway
distinguishes them by format.
For WebSocket connections, credentials are not passed in the URL or
headers. Instead, the client authenticates after connecting by sending
an auth message as the first frame:
```
Client: opens WebSocket to /api/v1/socket
Server: accepts connection (unauthenticated state)
Client: sends {"type": "auth", "token": "tg_abc123..."}
Server: validates token
success → {"type": "auth-ok", "workspace": "acme"}
failure → {"type": "auth-failed", "error": "invalid token"}
```
The server rejects all non-auth messages until authentication succeeds.
The socket remains open on auth failure, allowing the client to retry
with a different token without reconnecting. The client can also send
a new auth message at any time to re-authenticate — for example, to
refresh an expiring JWT or to switch workspace. The
resolved identity (user, workspace, roles) is updated on each
successful auth.
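The handshake above can be sketched as a small per-connection state machine. This is an illustration under assumptions, not the gateway implementation: `SocketSession` and `validate_token` are hypothetical names, the shape of the rejection message for non-auth frames is invented, and keeping the previous identity after a failed re-authentication is a choice the text above does not specify.

```python
import json


class SocketSession:
    """Tracks the authentication state of one WebSocket connection."""

    def __init__(self, validate_token):
        # validate_token: hypothetical callable returning an identity dict
        # ({"user": ..., "workspace": ..., "roles": [...]}) or None.
        self.validate_token = validate_token
        self.identity = None  # None means unauthenticated

    def handle(self, raw_message):
        message = json.loads(raw_message)
        if message.get("type") == "auth":
            identity = self.validate_token(message.get("token", ""))
            if identity is None:
                # The socket stays open so the client can retry.
                return {"type": "auth-failed", "error": "invalid token"}
            self.identity = identity  # updated on each successful auth
            return {"type": "auth-ok", "workspace": identity["workspace"]}
        if self.identity is None:
            # Assumed reply format for messages sent before authentication.
            return {"type": "error", "error": "not authenticated"}
        return {"type": "accepted"}  # placeholder for normal dispatch
```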
#### API keys
For programmatic access: CLI tools, scripts, and integrations.
- Opaque tokens (e.g. `tg_a1b2c3d4e5f6...`). Not JWTs — short,
simple, easy to paste into CLI tools and headers.
- Each user has one or more API keys.
- Keys are stored hashed (SHA-256 with salt) in the IAM service. The
plaintext key is returned once at creation time and cannot be
retrieved afterwards.
- Keys can be revoked individually without affecting other users.
- Keys optionally have an expiry date. Expired keys are rejected.
On each request, the gateway resolves an API key by:
1. Hashing the token.
2. Checking a local cache (hash → user/workspace/roles).
3. On cache miss, calling the IAM service to resolve.
4. Caching the result with a short TTL (e.g. 60 seconds).
Revoked keys stop working once the cached entry expires, so a revoked
key may remain usable for up to one TTL (e.g. 60 seconds). No push
invalidation is needed.
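A minimal sketch of the resolve-and-cache steps, under stated assumptions: `ApiKeyCache` and the `resolve` callable (standing in for the IAM service call) are hypothetical, the salt is assumed to be deployment-wide so that the hash can act as a lookup key, and failed lookups are not cached.

```python
import hashlib
import time


class ApiKeyCache:
    def __init__(self, resolve, ttl_seconds=60.0, clock=time.monotonic):
        # resolve: hypothetical callable (key_hash -> identity or None)
        # standing in for the request to the IAM service.
        self.resolve = resolve
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key_hash -> (expiry_time, identity)

    @staticmethod
    def hash_key(token, salt=b""):
        # Assumes a deployment-wide salt; a per-key random salt would
        # prevent using the hash as a lookup key.
        return hashlib.sha256(salt + token.encode("utf-8")).hexdigest()

    def lookup(self, token):
        key_hash = self.hash_key(token)
        now = self.clock()
        cached = self.entries.get(key_hash)
        if cached and cached[0] > now:
            return cached[1]  # cache hit, entry still fresh
        identity = self.resolve(key_hash)  # cache miss or expired entry
        if identity is not None:
            self.entries[key_hash] = (now + self.ttl, identity)
        else:
            self.entries.pop(key_hash, None)
        return identity
```

With this shape, a revoked key keeps resolving from the cache until its entry expires, after which the next lookup reaches the IAM service and fails.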
#### JWTs (login sessions)
For interactive access via the UI or WebSocket connections.
- A user logs in with username and password. The gateway forwards the
request to the IAM service, which validates the credentials and
returns a signed JWT.
- The JWT carries the user ID, workspace, and roles as claims.
- The gateway validates JWTs locally using the IAM service's public
signing key — no service call needed on subsequent requests.
- Token expiry is enforced by standard JWT validation at the time the
request is made or, for WebSocket connections, when the auth message
is processed.
- For long-lived WebSocket connections, the JWT is validated only when
an auth message is processed. The connection then remains
authenticated for its lifetime.
The IAM service manages the signing key. The gateway fetches the public
key at startup (or on first JWT encounter) and caches it.
#### Login endpoint
```
POST /api/v1/auth/login
{
"username": "alice",
"password": "..."
}
→ {
"token": "eyJ...",
"expires": "2026-04-20T19:00:00Z"
}
```
The gateway forwards this to the IAM service, which validates
credentials and returns a signed JWT. The gateway returns the JWT to
the caller.
#### IAM service delegation
The gateway stays thin. Its authentication logic is:
1. Extract the Bearer token from the `Authorization` header (or from
the auth message for WebSocket connections).
2. If the token has JWT format (dotted structure), validate the
signature locally and extract claims.
3. Otherwise, treat as an API key: hash it and check the local cache.
On cache miss, call the IAM service to resolve.
4. If neither succeeds, return 401.
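The format check in step 2 could look roughly like the sketch below. The function names are hypothetical, and inspecting the decoded header for an `alg` field goes slightly beyond the "dotted structure" test described above; it is an assumption made here to reduce false positives. Signature validation is deliberately not shown.

```python
import base64
import json


def looks_like_jwt(token):
    """Heuristic: three dot-separated parts with a base64url JSON header."""
    parts = token.split(".")
    if len(parts) != 3 or not all(parts):
        return False
    try:
        # base64url segments in JWTs omit padding, so restore it first.
        padded = parts[0] + "=" * (-len(parts[0]) % 4)
        header = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        # Covers invalid base64, invalid JSON, and undecodable bytes.
        return False
    return isinstance(header, dict) and "alg" in header


def credential_type(token):
    """Classify a bearer token as 'jwt' or 'api-key'."""
    return "jwt" if looks_like_jwt(token) else "api-key"
```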
All user management, key management, credential validation, and token
signing logic lives in the IAM service. The gateway is a generic
enforcement point that can be replaced without changing the IAM
service.
#### No legacy token support
The existing `GATEWAY_SECRET` shared token is removed. All
authentication uses API keys or JWTs. On first start, the bootstrap
process creates a default workspace and admin user with an initial API
key.
### User identity
A user belongs to exactly one workspace. The design supports extending
this to multi-workspace access in the future (see
[Extension points](#extension-points)).
A user record contains:
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique user identifier (UUID) |
| `name` | string | Display name |
| `email` | string | Email address (optional) |
| `workspace` | string | Workspace the user belongs to |
| `roles` | list[string] | Assigned roles (e.g. `["reader"]`) |
| `enabled` | bool | Whether the user can authenticate |
| `created` | datetime | Account creation timestamp |
The `workspace` field maps to the existing `user` field in `Metadata`.
This means the storage-layer isolation (Cassandra, Neo4j, Qdrant
filtering by `user` + `collection`) works without changes — the gateway
sets the `user` metadata field to the authenticated user's workspace.
### Workspaces
A workspace is an isolated data boundary. Users belong to a workspace,
and all data operations are scoped to it. Workspaces map to the existing
`user` field in `Metadata` and the corresponding Cassandra keyspace,
Qdrant collection prefix, and Neo4j property filters.
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique workspace identifier |
| `name` | string | Display name |
| `enabled` | bool | Whether the workspace is active |
| `created` | datetime | Creation timestamp |
All data operations are scoped to a workspace. The gateway determines
the effective workspace for each request as follows:
1. If the request includes a `workspace` parameter, validate it against
the user's assigned workspace.
- If it matches, use it.
- If it does not match, return 403. (This could be extended to
check a workspace access grant list.)
2. If no `workspace` parameter is provided, use the user's assigned
workspace.
The gateway sets the `user` field in `Metadata` to the effective
workspace ID, replacing the caller-supplied `?user=` query parameter.
This design ensures forward compatibility. Clients that pass a
workspace parameter will work unchanged if multi-workspace support is
added later. Requests for an unassigned workspace get a clear 403
rather than silent misbehaviour.
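The resolution rule is small enough to sketch directly. The names `Forbidden` and `effective_workspace` are hypothetical; the exception stands in for whatever mechanism the gateway uses to produce a 403 response.

```python
from typing import Optional


class Forbidden(Exception):
    """Stands in for an HTTP 403 response (hypothetical exception type)."""


def effective_workspace(assigned: str, requested: Optional[str] = None) -> str:
    """Return the workspace a request operates on, or raise Forbidden."""
    if requested is None:
        return assigned  # rule 2: default to the user's workspace
    if requested != assigned:
        # Extension point: a workspace access grant list could be
        # consulted here before rejecting the request.
        raise Forbidden(f"no access to workspace {requested!r}")
    return requested  # rule 1: explicit parameter matches
```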
### Roles and access control
Three roles with fixed permissions:
| Role | Data operations | Admin operations | System |
|------|----------------|-----------------|--------|
| `reader` | Query knowledge graph, embeddings, RAG | None | None |
| `writer` | All reader operations + load documents, manage collections | None | None |
| `admin` | All writer operations | Config, flows, collection management, user management | Metrics |
Role checks happen at the gateway before dispatching to backend
services. Each endpoint declares the minimum role required:
| Endpoint pattern | Minimum role |
|-----------------|--------------|
| `GET /api/v1/socket` (queries) | `reader` |
| `POST /api/v1/librarian` | `writer` |
| `POST /api/v1/flow/*/import/*` | `writer` |
| `POST /api/v1/config` | `admin` |
| `GET /api/v1/flow/*` | `admin` |
| `GET /api/metrics` | `admin` |
Roles are hierarchical: `admin` implies `writer`, which implies
`reader`.
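One simple way to express the hierarchy is a numeric rank per role. This is a sketch; `ROLE_RANK` and `has_role` are illustrative names, and unknown role strings are treated as granting nothing.

```python
# Higher rank implies every lower-ranked role.
ROLE_RANK = {"reader": 0, "writer": 1, "admin": 2}


def has_role(user_roles, minimum):
    """True if any of the user's roles meets the endpoint's minimum role."""
    required = ROLE_RANK[minimum]
    return any(ROLE_RANK.get(role, -1) >= required for role in user_roles)
```

For example, a user with `["writer"]` passes a `reader` check but fails an `admin` check.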
### IAM service
The IAM service is a new backend service that manages all identity and
access data. It is the authority for users, workspaces, API keys, and
credentials. The gateway delegates to it.
#### Data model
```
iam_workspaces (
    id text PRIMARY KEY,
    name text,
    enabled boolean,
    created timestamp
)

iam_users (
    id text PRIMARY KEY,
    workspace text,
    name text,
    email text,
    password_hash text,
    roles set<text>,
    enabled boolean,
    created timestamp
)

iam_api_keys (
    key_hash text PRIMARY KEY,
    user_id text,
    name text,
    expires timestamp,
    created timestamp
)
```
A secondary index on `iam_api_keys.user_id` supports listing a user's
keys.
#### Responsibilities
- User CRUD (create, list, update, disable)
- Workspace CRUD (create, list, update, disable)
- API key management (create, revoke, list)
- API key resolution (hash → user/workspace/roles)
- Credential validation (username/password → signed JWT)
- JWT signing key management (initialise, rotate)
- Bootstrap (create default workspace and admin user on first start)
#### Communication
The IAM service communicates via the standard request/response pub/sub
pattern, the same as the config service. The gateway calls it to
resolve API keys and to handle login requests. User management
operations (create user, revoke key, etc.) also go through the IAM
service.
### Gateway changes
The current `Authenticator` class is replaced with a thin authentication
middleware that delegates to the IAM service.
For HTTP requests:
1. Extract Bearer token from the `Authorization` header.
2. If the token has JWT format (dotted structure):
- Validate signature locally using the cached public key.
- Extract user ID, workspace, and roles from claims.
3. Otherwise, treat as an API key:
- Hash the token and check the local cache.
- On cache miss, call the IAM service to resolve.
- Cache the result (user/workspace/roles) with a short TTL.
4. If neither succeeds, return 401.
5. If the user or workspace is disabled, return 403.
6. Check the user's role against the endpoint's minimum role. If
insufficient, return 403.
7. Resolve the effective workspace:
- If the request includes a `workspace` parameter, validate it
against the user's assigned workspace. Return 403 on mismatch.
- If no `workspace` parameter, use the user's assigned workspace.
8. Set the `user` field in the request context to the effective
workspace ID. This propagates through `Metadata` to all downstream
services.
For WebSocket connections:
1. Accept the connection in an unauthenticated state.
2. Wait for an auth message (`{"type": "auth", "token": "..."}`).
3. Validate the token using the same logic as steps 2-7 above.
4. On success, attach the resolved identity to the connection and
send `{"type": "auth-ok", ...}`.
5. On failure, send `{"type": "auth-failed", ...}` but keep the
socket open.
6. Reject all non-auth messages until authentication succeeds.
7. Accept new auth messages at any time to re-authenticate.
### CLI changes
CLI tools authenticate with API keys:
- `--api-key` argument on all CLI tools, replacing `--api-token`.
- `tg-create-workspace`, `tg-list-workspaces` for workspace management.
- `tg-create-user`, `tg-list-users`, `tg-disable-user` for user
management.
- `tg-create-api-key`, `tg-list-api-keys`, `tg-revoke-api-key` for
key management.
- `--workspace` argument on tools that operate on workspace-scoped
data.
- The API key is passed as a Bearer token in the same way as the
current shared token, so the transport protocol is unchanged.
### Audit logging
With user identity established, the gateway logs the following for each
request: timestamp, user ID, workspace, endpoint, HTTP method, and
response status.
Audit logs are written to the standard logging output as structured
JSON. Integration with external log aggregation (Loki, ELK) is a
deployment concern, not an application concern.
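A structured audit entry could be produced along these lines. This is a sketch only: the logger name `gateway.audit`, the function names, and the JSON field names are assumptions, not a defined log format.

```python
import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("gateway.audit")  # hypothetical logger name


def audit_record(user_id, workspace, endpoint, method, status, now=None):
    """Serialise one audit entry as a JSON string."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": now.isoformat(),
        "user": user_id,
        "workspace": workspace,
        "endpoint": endpoint,
        "method": method,
        "status": status,
    })


def log_request(user_id, workspace, endpoint, method, status):
    # Emits through standard logging; handlers decide where it goes.
    audit_logger.info(audit_record(user_id, workspace, endpoint, method, status))
```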
### Config service changes
All configuration is workspace-scoped (see
[data-ownership-model.md](data-ownership-model.md)). The config service
needs to support this.
#### Schema change
The config table adds workspace as a key dimension:
```
config (
workspace text,
class text,
key text,
value text,
PRIMARY KEY ((workspace, class), key)
)
```
#### Request format
Config requests add a `workspace` field at the request level. The
existing `(type, key)` structure is unchanged within each workspace.
**Get:**
```json
{
"operation": "get",
"workspace": "workspace-a",
"keys": [{"type": "prompt", "key": "rag-prompt"}]
}
```
**Put:**
```json
{
"operation": "put",
"workspace": "workspace-a",
"values": [{"type": "prompt", "key": "rag-prompt", "value": "..."}]
}
```
**List (all keys of a type within a workspace):**
```json
{
"operation": "list",
"workspace": "workspace-a",
"type": "prompt"
}
```
**Delete:**
```json
{
"operation": "delete",
"workspace": "workspace-a",
"keys": [{"type": "prompt", "key": "rag-prompt"}]
}
```
The workspace is set by:
- **Gateway** — from the authenticated user's workspace for API-facing
requests.
- **Internal services** — explicitly, based on `Metadata.user` from
the message being processed, or `_system` for operational config.
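The two cases can be illustrated with a small helper that binds a config request to a workspace. The helper name `scope_request` and the `SYSTEM_WORKSPACE` constant are hypothetical; in the gateway case this sketch assumes the workspace has already been validated against the authenticated user.

```python
import copy

SYSTEM_WORKSPACE = "_system"  # reserved workspace for operational config


def scope_request(request: dict, workspace: str) -> dict:
    """Return a copy of a config request bound to the given workspace."""
    scoped = copy.deepcopy(request)
    # The caller decides the workspace: the gateway passes the
    # authenticated workspace, internal services pass Metadata.user
    # or SYSTEM_WORKSPACE.
    scoped["workspace"] = workspace
    return scoped
```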
#### System config namespace
Processor-level operational config (logging levels, connection strings,
resource limits) is not workspace-specific. This stays in a reserved
`_system` workspace that is not associated with any user workspace.
Services read system config at startup without needing a workspace
context.
#### Config change notifications
The config notify mechanism pushes change notifications via pub/sub
when config is updated. A single update may affect multiple workspaces
and multiple config types. The notification message carries a dict of
changes keyed by config type, with each value being the list of
affected workspaces:
```json
{
"version": 42,
"changes": {
"prompt": ["workspace-a", "workspace-b"],
"schema": ["workspace-a"]
}
}
```
System config changes use the reserved `_system` workspace:
```json
{
"version": 43,
"changes": {
"logging": ["_system"]
}
}
```
This structure is keyed by type because handlers register by type. A
handler registered for `prompt` looks up `"prompt"` directly and gets
the list of affected workspaces — no iteration over unrelated types.
#### Config change handlers
The current `on_config` hook mechanism needs two modes to support shared
processing services:
- **Workspace-scoped handlers** — notify when a config type changes in a
specific workspace. The handler looks up its registered type in the
changes dict and checks if its workspace is in the list. Used by the
gateway and by services that serve a single workspace.
- **Global handlers** — notify when a config type changes in any
workspace. The handler looks up its registered type in the changes
dict and gets the full list of affected workspaces. Used by shared
processing services (prompt-rag, agent manager, etc.) that serve all
workspaces. Each workspace in the list tells the handler which cache
entry to update rather than reloading everything.
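The two handler modes can be sketched as a small dispatcher over the notification format shown earlier. The `ConfigDispatcher` class and the handler signature are assumptions for illustration, not the existing `on_config` API; passing `workspace=None` stands for a global registration.

```python
from typing import Callable, List, Optional, Tuple

# A handler receives the config type and the workspaces it should act on.
Handler = Callable[[str, List[str]], None]


class ConfigDispatcher:
    """Routes change notifications to registered handlers (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: List[Tuple[str, Optional[str], Handler]] = []

    def on_config(self, config_type: str, handler: Handler,
                  workspace: Optional[str] = None) -> None:
        # workspace=None registers a global handler.
        self._handlers.append((config_type, workspace, handler))

    def dispatch(self, notification: dict) -> None:
        changes = notification.get("changes", {})
        for config_type, workspace, handler in self._handlers:
            # Direct lookup by type; unrelated types are never visited.
            affected = changes.get(config_type)
            if not affected:
                continue
            if workspace is None:
                handler(config_type, list(affected))   # global mode
            elif workspace in affected:
                handler(config_type, [workspace])      # workspace-scoped mode
```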
#### Per-workspace config caching
Shared services that handle messages from multiple workspaces maintain a
per-workspace config cache. When a message arrives, the service looks up
the config for the workspace identified in `Metadata.user`. If the
workspace is not yet cached, the service fetches its config on demand.
Config change notifications update the relevant cache entry.
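A minimal sketch of such a cache follows. The `WorkspaceConfigCache` class and the `fetch` callback are hypothetical. Refreshing only workspaces that are already cached is one possible reading of "update the relevant cache entry"; a workspace that has never been seen is still fetched on first use.

```python
from typing import Callable, Dict, List


class WorkspaceConfigCache:
    """Per-workspace config cache with on-demand fetch (illustrative)."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch              # fetch(workspace) -> config dict
        self._cache: Dict[str, dict] = {}

    def get(self, workspace: str) -> dict:
        # Fetch on first use, then serve from the cache.
        if workspace not in self._cache:
            self._cache[workspace] = self._fetch(workspace)
        return self._cache[workspace]

    def on_change(self, config_type: str, workspaces: List[str]) -> None:
        # Refresh only entries that are already cached.
        for workspace in workspaces:
            if workspace in self._cache:
                self._cache[workspace] = self._fetch(workspace)
```

A service would call `get(metadata.user)` when a message arrives and register `on_change` as a global handler for the config types it uses.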
### Flow and queue isolation
Flows are workspace-owned. When two workspaces start flows with the same
name and blueprint, their queues must be separate to prevent data
mixing.
Flow blueprint templates currently use `{id}` (flow instance ID) and
`{class}` (blueprint name) as template variables in queue names. A new
`{workspace}` variable is added so queue names include the workspace:
**Current queue names (no workspace isolation):**
```
flow:tg:document-load:{id} → flow:tg:document-load:default
request:tg:embeddings:{class} → request:tg:embeddings:everything
```
**With workspace isolation:**
```
flow:tg:{workspace}:document-load:{id} → flow:tg:ws-a:document-load:default
request:tg:{workspace}:embeddings:{class} → request:tg:ws-a:embeddings:everything
```
The flow service substitutes `{workspace}` from the authenticated
workspace when starting a flow, the same way it substitutes `{id}` and
`{class}` today.
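The substitution itself is simple string templating. A minimal sketch, assuming plain placeholder replacement (the function name `queue_name` is hypothetical and the flow service's actual mechanism may differ):

```python
def queue_name(template: str, workspace: str, flow_id: str,
               blueprint: str) -> str:
    """Expand the template variables in a queue name pattern."""
    return (template
            .replace("{workspace}", workspace)
            .replace("{id}", flow_id)
            .replace("{class}", blueprint))
```

With `workspace="ws-a"`, `flow_id="default"`, and `blueprint="everything"`, the two isolated templates above expand to the queue names shown on their right-hand sides.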
Processing services are shared infrastructure: they consume from
workspace-specific queues but are not bound to any single workspace.
The workspace is carried in `Metadata.user` on every message, so a
service knows which workspace's data it is processing.
Blueprint templates need updating to include `{workspace}` in all queue
name patterns. For migration, the flow service can inject the workspace
into queue names automatically if the template does not include
`{workspace}`, defaulting to the legacy behaviour for existing
blueprints.
See [flow-class-definition.md](flow-class-definition.md) for the full
blueprint template specification.
### What changes and what doesn't
**Changes:**
| Component | Change |
|-----------|--------|
| `gateway/auth.py` | Replace `Authenticator` with new auth middleware |
| `gateway/service.py` | Initialise IAM client, configure JWT validation |
| `gateway/endpoint/*.py` | Add role requirement per endpoint |
| Metadata propagation | Gateway sets `user` from workspace, ignores query param |
| Config service | Add workspace dimension to config schema |
| Config table | `PRIMARY KEY ((workspace, class), key)` |
| Config request/response schema | Add `workspace` field |
| Config notify messages | Include workspace ID in change notifications |
| `on_config` handlers | Support workspace-scoped and global modes |
| Shared services | Per-workspace config caching |
| Flow blueprints | Add `{workspace}` template variable to queue names |
| Flow service | Substitute `{workspace}` when starting flows |
| CLI tools | New user management commands, `--api-key` argument |
| Cassandra schema | New `iam_workspaces`, `iam_users`, `iam_api_keys` tables |
**Does not change:**
| Component | Reason |
|-----------|--------|
| Internal service-to-service pub/sub | Services trust the gateway |
| `Metadata` dataclass | `user` field continues to carry workspace identity |
| Storage-layer isolation | Same `user` + `collection` filtering |
| Message serialisation | No schema changes |
### Migration
This is a breaking change. Existing deployments must be reconfigured:
1. `GATEWAY_SECRET` is removed. Authentication requires API keys or
JWT login tokens.
2. The `?user=` query parameter is removed. Workspace identity comes
from authentication.
3. On first start, the IAM service bootstraps a default workspace and
admin user. The initial API key is output to the service log.
4. Operators create additional workspaces and users via CLI tools.
5. Flow blueprints must be updated to include `{workspace}` in queue
name patterns.
6. Config data must be migrated to include the workspace dimension.
## Extension points
The design includes deliberate extension points for future capabilities.
These are not implemented but the architecture does not preclude them:
- **Multi-workspace access.** Users could be granted access to
additional workspaces beyond their primary assignment. The workspace
validation step checks a grant list instead of a single assignment.
- **Rules-based access control.** A separate access control service
could evaluate fine-grained policies (per-collection permissions,
operation-level restrictions, time-based access). The gateway
delegates authorisation decisions to this service.
- **External identity provider integration.** SAML, LDAP, and OIDC
flows (group mapping, claims-based role assignment) could be added
to the IAM service.
- **Cross-workspace administration.** A `superadmin` role for platform
operators who manage multiple workspaces.
- **Delegated workspace provisioning.** APIs for programmatic workspace
creation and user onboarding.
These extensions are additive — they extend the validation logic
without changing the request/response protocol. The gateway can be
replaced with an alternative implementation that supports these
capabilities while the IAM service and backend services remain
unchanged.
## Implementation plan
Workspace support is a prerequisite for auth — users are assigned to
workspaces, config is workspace-scoped, and flows use workspace in
queue names. Implementing workspaces first allows the structural changes
to be tested end-to-end without auth complicating debugging.
### Phase 1: Workspace support (no auth)
All workspace-scoped data and processing changes. The system works with
workspaces but without authentication: callers pass the workspace as a
parameter and are trusted to supply the right one (an honour system).
This allows full end-to-end testing: multiple workspaces with separate
flows, config, queues, and data.
#### Config service
- Update config client API to accept a workspace parameter on all
requests
- Update config storage schema to add workspace as a key dimension
- Update config notification API to report changes as a dict of
type → workspace list
- Update the processor base class to understand workspaces in config
notifications (workspace-scoped and global handler modes)
- Update all processors to implement workspace-aware config handling
(per-workspace config caching, on-demand fetch)
#### Flow and queue isolation
- Update flow blueprints to include `{workspace}` in all queue name
patterns
- Update the flow service to substitute `{workspace}` when starting
flows
- Update all built-in blueprints to include `{workspace}`
#### CLI tools (workspace support)
- Add `--workspace` argument to CLI tools that operate on
workspace-scoped data
- Add `tg-create-workspace`, `tg-list-workspaces` commands
### Phase 2: Authentication and access control
With workspaces working, add the IAM service and lock down the gateway.
#### IAM service
A new service handling identity and access management on behalf of the
API gateway:
- Add workspace table support (CRUD, enable/disable)
- Add user table support (CRUD, enable/disable, workspace assignment)
- Add roles support (role assignment, role validation)
- Add API key support (create, revoke, list, hash storage)
- Add ability to initialise a JWT signing key for token grants
- Add token grant endpoint: user/password login returns a signed JWT
- Add bootstrap/initialisation mechanism: ability to set the signing
key and create the initial workspace + admin user on first start
#### API gateway integration
- Add IAM middleware to the API gateway replacing the current
`Authenticator`
- Add local JWT validation (public key from IAM service)
- Add API key resolution with local cache (hash → user/workspace/roles,
cache miss calls IAM service, short TTL)
- Add login endpoint forwarding to IAM service
- Add workspace resolution: validate requested workspace against user
assignment
- Add role-based endpoint access checks
- Add user management API endpoints (forwarded to IAM service)
- Add audit logging (user ID, workspace, endpoint, method, status)
- Add WebSocket auth via first-message protocol (auth message after
connect, socket stays open on failure, re-auth supported)
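The API key resolution item can be sketched as a small TTL cache. Everything here is an assumption made for illustration: the `ApiKeyCache` class, the use of SHA-256 as the key hash, the 60-second default TTL, and the choice not to cache failed lookups. The `lookup` callback stands in for the call to the IAM service.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class ApiKeyCache:
    """Resolves API keys to identities with a short-lived local cache."""

    def __init__(self, lookup: Callable[[str], Optional[dict]],
                 ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup      # lookup(key_hash) -> identity or None
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def resolve(self, api_key: str) -> Optional[dict]:
        # Only the hash is used as the cache key and sent to the lookup.
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        now = self._clock()
        entry = self._entries.get(key_hash)
        if entry is not None and entry[0] > now:
            return entry[1]                        # cache hit
        identity = self._lookup(key_hash)          # cache miss or expired
        if identity is not None:
            self._entries[key_hash] = (now + self._ttl, identity)
        else:
            self._entries.pop(key_hash, None)      # failures are not cached
        return identity
```

A short TTL bounds how long a revoked key can keep working from the gateway's cache.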
#### CLI tools (auth support)
- Add `tg-create-user`, `tg-list-users`, `tg-disable-user` commands
- Add `tg-create-api-key`, `tg-list-api-keys`, `tg-revoke-api-key`
commands
- Replace `--api-token` with `--api-key` on existing CLI tools
#### Bootstrap and cutover
- Create default workspace and admin user on first start if IAM tables
are empty
- Remove `GATEWAY_SECRET` and `?user=` query parameter support
## Design decisions
### IAM data store
IAM data is stored in dedicated Cassandra tables owned by the IAM
service, not in the config service. Reasons:
- **Security isolation.** The config service has a broad, generic
protocol. An access control failure on the config service could
expose credentials. A dedicated IAM service with a purpose-built
protocol limits the attack surface and makes security auditing
clearer.
- **Data model fit.** IAM needs indexed lookups (API key hash → user,
list keys by user). The config service's `(workspace, type, key) →
value` model stores opaque JSON strings with no secondary indexes.
- **Scope.** IAM data is global (workspaces, users, keys). Config is
workspace-scoped. Mixing global and workspace-scoped data in the
same store adds complexity.
- **Audit.** IAM operations (key creation, revocation, login attempts)
are security events that should be logged separately from general
config changes.
## Deferred to future design
- **External identity provider integration.** Support for external
identity providers (SAML, LDAP, OIDC) is left for future
implementation. The extension points section describes where this
fits architecturally.
- **API key scoping.** API keys could be scoped to specific collections
within a workspace rather than granting workspace-wide access. To be
designed when the need arises.
- **Multi-workspace initialisation.** `tg-init-trustgraph` only
initialises a single workspace.
## References
- [Data Ownership and Information Separation](data-ownership-model.md)
- [MCP Tool Bearer Token Specification](mcp-tool-bearer-token.md)
- [Multi-Tenant Support Specification](multi-tenant-support.md)
- [Neo4j User Collection Isolation](neo4j-user-collection-isolation.md)