mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 00:16:23 +02:00
release/v2.4 -> master (#844)
This commit is contained in:
parent
a24df8e990
commit
89cabee1b4
386 changed files with 7202 additions and 5741 deletions
309
docs/tech-specs/data-ownership-model.md
Normal file
309
docs/tech-specs/data-ownership-model.md
Normal file
|
|
@ -0,0 +1,309 @@
|
|||
---
|
||||
layout: default
|
||||
title: "Data Ownership and Information Separation"
|
||||
parent: "Tech Specs"
|
||||
---
|
||||
|
||||
# Data Ownership and Information Separation
|
||||
|
||||
## Purpose
|
||||
|
||||
This document defines the logical ownership model for data in
|
||||
TrustGraph: what the artefacts are, who owns them, and how they relate
|
||||
to each other.
|
||||
|
||||
The IAM spec ([iam.md](iam.md)) describes authentication and
|
||||
authorisation mechanics. This spec addresses the prior question: what
|
||||
are the boundaries around data, and who owns what?
|
||||
|
||||
## Concepts
|
||||
|
||||
### Workspace
|
||||
|
||||
A workspace is the primary isolation boundary. It represents an
|
||||
organisation, team, or independent operating unit. All data belongs to
|
||||
exactly one workspace. Cross-workspace access is never permitted through
|
||||
the API.
|
||||
|
||||
A workspace owns:
|
||||
- Source documents
|
||||
- Flows (processing pipeline definitions)
|
||||
- Knowledge cores (stored extraction output)
|
||||
- Collections (organisational units for extracted knowledge)
|
||||
|
||||
### Collection
|
||||
|
||||
A collection is an organisational unit within a workspace. It groups
|
||||
extracted knowledge produced from source documents. A workspace can
|
||||
have multiple collections, allowing:
|
||||
|
||||
- Processing the same documents with different parameters or models.
|
||||
- Maintaining separate knowledge bases for different purposes.
|
||||
- Deleting extracted knowledge without deleting source documents.
|
||||
|
||||
Collections do not own source documents. A source document exists at the
|
||||
workspace level and can be processed into multiple collections.
|
||||
|
||||
### Source document
|
||||
|
||||
A source document (PDF, text file, etc.) is raw input uploaded to the
|
||||
system. Documents belong to the workspace, not to a specific collection.
|
||||
|
||||
This is intentional. A document is an asset that exists independently
|
||||
of how it is processed. The same PDF might be processed into multiple
|
||||
collections with different chunking parameters or extraction models.
|
||||
Tying a document to a single collection would force re-upload for each
|
||||
collection.
|
||||
|
||||
### Flow
|
||||
|
||||
A flow defines a processing pipeline: which models to use, what
|
||||
parameters to apply (chunk size, temperature, etc.), and how processing
|
||||
services are connected. Flows belong to a workspace.
|
||||
|
||||
The processing services themselves (document-decoder, chunker,
|
||||
embeddings, LLM completion, etc.) are shared infrastructure — they serve
|
||||
all workspaces. Each flow has its own queues, keeping data from
|
||||
different workspaces and flows separate as it moves through the
|
||||
pipeline.
|
||||
|
||||
Different workspaces can define different flows. Workspace A might use
|
||||
GPT-5.2 with a chunk size of 2000, while workspace B uses Claude with a
|
||||
chunk size of 1000.
|
||||
|
||||
### Prompts
|
||||
|
||||
Prompts are templates that control how the LLM behaves during knowledge
|
||||
extraction and query answering. They belong to a workspace, allowing
|
||||
different workspaces to have different extraction strategies, response
|
||||
styles, or domain-specific instructions.
|
||||
|
||||
### Ontology
|
||||
|
||||
An ontology defines the concepts, entities, and relationships that the
|
||||
extraction pipeline looks for in source documents. Ontologies belong to
|
||||
a workspace. A medical workspace might define ontologies around diseases,
|
||||
symptoms, and treatments, while a legal workspace defines ontologies
|
||||
around statutes, precedents, and obligations.
|
||||
|
||||
### Schemas
|
||||
|
||||
Schemas define structured data types for extraction. They specify what
|
||||
fields to extract, their types, and how they relate. Schemas belong to
|
||||
a workspace, as different workspaces extract different structured
|
||||
information from their documents.
|
||||
|
||||
### Tools, tool services, and MCP tools
|
||||
|
||||
Tools define capabilities available to agents: what actions they can
|
||||
take, what external services they can call. Tool services configure how
|
||||
tools connect to backend services. MCP tools configure connections to
|
||||
remote MCP servers, including authentication tokens. All belong to a
|
||||
workspace.
|
||||
|
||||
### Agent patterns and agent task types
|
||||
|
||||
Agent patterns define agent behaviour strategies (how an agent reasons,
|
||||
what steps it follows). Agent task types define the kinds of tasks
|
||||
agents can perform. Both belong to a workspace, as different workspaces
|
||||
may have different agent configurations.
|
||||
|
||||
### Token costs
|
||||
|
||||
Token cost definitions specify pricing for LLM token usage per model.
|
||||
These belong to a workspace since different workspaces may use different
|
||||
models or have different billing arrangements.
|
||||
|
||||
### Flow blueprints
|
||||
|
||||
Flow blueprints are templates for creating flows. They define the
|
||||
default pipeline structure and parameters. Blueprints belong to a
|
||||
workspace, allowing workspaces to define custom processing templates.
|
||||
|
||||
### Parameter types
|
||||
|
||||
Parameter types define the kinds of parameters that flows accept (e.g.
|
||||
"llm-model", "temperature"), including their defaults and validation
|
||||
rules. They belong to a workspace since workspaces that define custom
|
||||
flows need to define the parameter types those flows use.
|
||||
|
||||
### Interface descriptions
|
||||
|
||||
Interface descriptions define the connection points of a flow — what
|
||||
queues and topics it uses. They belong to a workspace since they
|
||||
describe workspace-owned flows.
|
||||
|
||||
### Knowledge core
|
||||
|
||||
A knowledge core is a stored snapshot of extracted knowledge (triples
|
||||
and graph embeddings). Knowledge cores belong to a workspace and can be
|
||||
loaded into any collection within that workspace.
|
||||
|
||||
Knowledge cores serve as a portable extraction output. You process
|
||||
documents through a flow, the pipeline produces triples and embeddings,
|
||||
and the results can be stored as a knowledge core. That core can later
|
||||
be loaded into a different collection or reloaded after a collection is
|
||||
cleared.
|
||||
|
||||
### Extracted knowledge
|
||||
|
||||
Extracted knowledge is the live, queryable content within a collection:
|
||||
triples in the knowledge graph, graph embeddings, and document
|
||||
embeddings. It is the product of processing source documents through a
|
||||
flow into a specific collection.
|
||||
|
||||
Extracted knowledge is scoped to a workspace and a collection. It
|
||||
cannot exist without both.
|
||||
|
||||
### Processing record
|
||||
|
||||
A processing record tracks which source document was processed, through
|
||||
which flow, into which collection. It links the source document
|
||||
(workspace-scoped) to the extracted knowledge (workspace + collection
|
||||
scoped).
|
||||
|
||||
## Ownership summary
|
||||
|
||||
| Artefact | Owned by | Shared across collections? |
|
||||
|----------|----------|---------------------------|
|
||||
| Workspaces | Global (platform) | N/A |
|
||||
| User accounts | Global (platform) | N/A |
|
||||
| API keys | Global (platform) | N/A |
|
||||
| Source documents | Workspace | Yes |
|
||||
| Flows | Workspace | N/A |
|
||||
| Flow blueprints | Workspace | N/A |
|
||||
| Prompts | Workspace | N/A |
|
||||
| Ontologies | Workspace | N/A |
|
||||
| Schemas | Workspace | N/A |
|
||||
| Tools | Workspace | N/A |
|
||||
| Tool services | Workspace | N/A |
|
||||
| MCP tools | Workspace | N/A |
|
||||
| Agent patterns | Workspace | N/A |
|
||||
| Agent task types | Workspace | N/A |
|
||||
| Token costs | Workspace | N/A |
|
||||
| Parameter types | Workspace | N/A |
|
||||
| Interface descriptions | Workspace | N/A |
|
||||
| Knowledge cores | Workspace | Yes — can be loaded into any collection |
|
||||
| Collections | Workspace | N/A |
|
||||
| Extracted knowledge | Workspace + collection | No |
|
||||
| Processing records | Workspace + collection | No |
|
||||
|
||||
## Scoping summary
|
||||
|
||||
### Global (system-level)
|
||||
|
||||
A small number of artefacts exist outside any workspace:
|
||||
|
||||
- **Workspace registry** — the list of workspaces itself
|
||||
- **User accounts** — users reference a workspace but are not owned by
|
||||
one
|
||||
- **API keys** — belong to users, not workspaces
|
||||
|
||||
These are managed by the IAM layer and exist at the platform level.
|
||||
|
||||
### Workspace-owned
|
||||
|
||||
All other configuration and data is workspace-owned:
|
||||
|
||||
- Flow definitions and parameters
|
||||
- Flow blueprints
|
||||
- Prompts
|
||||
- Ontologies
|
||||
- Schemas
|
||||
- Tools, tool services, and MCP tools
|
||||
- Agent patterns and agent task types
|
||||
- Token costs
|
||||
- Parameter types
|
||||
- Interface descriptions
|
||||
- Collection definitions
|
||||
- Knowledge cores
|
||||
- Source documents
|
||||
- Collections and their extracted knowledge
|
||||
|
||||
## Relationship between artefacts
|
||||
|
||||
```
|
||||
Platform (global)
|
||||
|
|
||||
+-- Workspaces
|
||||
| |
|
||||
+-- User accounts (each assigned to a workspace)
|
||||
| |
|
||||
+-- API keys (belong to users)
|
||||
|
||||
Workspace
|
||||
|
|
||||
+-- Source documents (uploaded, unprocessed)
|
||||
|
|
||||
+-- Flows (pipeline definitions: models, parameters, queues)
|
||||
|
|
||||
+-- Flow blueprints (templates for creating flows)
|
||||
|
|
||||
+-- Prompts (LLM instruction templates)
|
||||
|
|
||||
+-- Ontologies (entity and relationship definitions)
|
||||
|
|
||||
+-- Schemas (structured data type definitions)
|
||||
|
|
||||
+-- Tools, tool services, MCP tools (agent capabilities)
|
||||
|
|
||||
+-- Agent patterns and agent task types (agent behaviour)
|
||||
|
|
||||
+-- Token costs (LLM pricing per model)
|
||||
|
|
||||
+-- Parameter types (flow parameter definitions)
|
||||
|
|
||||
+-- Interface descriptions (flow connection points)
|
||||
|
|
||||
+-- Knowledge cores (stored extraction snapshots)
|
||||
|
|
||||
+-- Collections
|
||||
|
|
||||
+-- Extracted knowledge (triples, embeddings)
|
||||
|
|
||||
+-- Processing records (links documents to collections)
|
||||
```
|
||||
|
||||
A typical workflow:
|
||||
|
||||
1. A source document is uploaded to the workspace.
|
||||
2. A flow defines how to process it (which models, what parameters).
|
||||
3. The document is processed through the flow into a collection.
|
||||
4. Processing records track what was processed.
|
||||
5. Extracted knowledge (triples, embeddings) is queryable within the
|
||||
collection.
|
||||
6. Optionally, the extracted knowledge is stored as a knowledge core
|
||||
for later reuse.
|
||||
|
||||
## Implementation notes
|
||||
|
||||
The current codebase uses a `user` field in message metadata and storage
|
||||
partition keys to identify the workspace. The `collection` field
|
||||
identifies the collection within that workspace. The IAM spec describes
|
||||
how the gateway maps authenticated credentials to a workspace identity
|
||||
and sets these fields.
|
||||
|
||||
For details on how each storage backend implements this scoping, see:
|
||||
|
||||
- [Entity-Centric Graph](entity-centric-graph.md) — Cassandra KG schema
|
||||
- [Neo4j User Collection Isolation](neo4j-user-collection-isolation.md)
|
||||
- [Collection Management](collection-management.md)
|
||||
|
||||
### Known inconsistencies in current implementation
|
||||
|
||||
- **Pipeline intermediate tables** do not include collection in their
|
||||
partition keys. Re-processing the same document into a different
|
||||
collection may overwrite intermediate state.
|
||||
- **Processing metadata** stores collection in the row payload but not
|
||||
in the partition key, making collection-based queries inefficient.
|
||||
- **Upload sessions** are keyed by upload ID, not workspace. The
|
||||
gateway should validate workspace ownership before allowing
|
||||
operations on upload sessions.
|
||||
|
||||
## References
|
||||
|
||||
- [Identity and Access Management](iam.md)
|
||||
- [Collection Management](collection-management.md)
|
||||
- [Entity-Centric Graph](entity-centric-graph.md)
|
||||
- [Neo4j User Collection Isolation](neo4j-user-collection-isolation.md)
|
||||
- [Multi-Tenant Support](multi-tenant-support.md)
|
||||
|
|
@ -20,8 +20,8 @@ Defines shared service processors that are instantiated once per flow blueprint.
|
|||
```json
|
||||
"class": {
|
||||
"service-name:{class}": {
|
||||
"request": "queue-pattern:{class}",
|
||||
"response": "queue-pattern:{class}",
|
||||
"request": "queue-pattern:{workspace}:{class}",
|
||||
"response": "queue-pattern:{workspace}:{class}",
|
||||
"settings": {
|
||||
"setting-name": "fixed-value",
|
||||
"parameterized-setting": "{parameter-name}"
|
||||
|
|
@ -31,11 +31,11 @@ Defines shared service processors that are instantiated once per flow blueprint.
|
|||
```
|
||||
|
||||
**Characteristics:**
|
||||
- Shared across all flow instances of the same class
|
||||
- Shared across all flow instances of the same class within a workspace
|
||||
- Typically expensive or stateless services (LLMs, embedding models)
|
||||
- Use `{class}` template variable for queue naming
|
||||
- Use `{workspace}` and `{class}` template variables for queue naming
|
||||
- Settings can be fixed values or parameterized with `{parameter-name}` syntax
|
||||
- Examples: `embeddings:{class}`, `text-completion:{class}`, `graph-rag:{class}`
|
||||
- Examples: `embeddings:{workspace}:{class}`, `text-completion:{workspace}:{class}`
|
||||
|
||||
### 2. Flow Section
|
||||
Defines flow-specific processors that are instantiated for each individual flow instance. Each flow gets its own isolated set of these processors.
|
||||
|
|
@ -43,8 +43,8 @@ Defines flow-specific processors that are instantiated for each individual flow
|
|||
```json
|
||||
"flow": {
|
||||
"processor-name:{id}": {
|
||||
"input": "queue-pattern:{id}",
|
||||
"output": "queue-pattern:{id}",
|
||||
"input": "queue-pattern:{workspace}:{id}",
|
||||
"output": "queue-pattern:{workspace}:{id}",
|
||||
"settings": {
|
||||
"setting-name": "fixed-value",
|
||||
"parameterized-setting": "{parameter-name}"
|
||||
|
|
@ -56,9 +56,9 @@ Defines flow-specific processors that are instantiated for each individual flow
|
|||
**Characteristics:**
|
||||
- Unique instance per flow
|
||||
- Handle flow-specific data and state
|
||||
- Use `{id}` template variable for queue naming
|
||||
- Use `{workspace}` and `{id}` template variables for queue naming
|
||||
- Settings can be fixed values or parameterized with `{parameter-name}` syntax
|
||||
- Examples: `chunker:{id}`, `pdf-decoder:{id}`, `kg-extract-relationships:{id}`
|
||||
- Examples: `chunker:{workspace}:{id}`, `pdf-decoder:{workspace}:{id}`
|
||||
|
||||
### 3. Interfaces Section
|
||||
Defines the entry points and interaction contracts for the flow. These form the API surface for external systems and internal component communication.
|
||||
|
|
@ -68,8 +68,8 @@ Interfaces can take two forms:
|
|||
**Fire-and-Forget Pattern** (single queue):
|
||||
```json
|
||||
"interfaces": {
|
||||
"document-load": "persistent://tg/flow/document-load:{id}",
|
||||
"triples-store": "persistent://tg/flow/triples-store:{id}"
|
||||
"document-load": "persistent://tg/flow/{workspace}:document-load:{id}",
|
||||
"triples-store": "persistent://tg/flow/{workspace}:triples-store:{id}"
|
||||
}
|
||||
```
|
||||
|
||||
|
|
@ -77,8 +77,8 @@ Interfaces can take two forms:
|
|||
```json
|
||||
"interfaces": {
|
||||
"embeddings": {
|
||||
"request": "non-persistent://tg/request/embeddings:{class}",
|
||||
"response": "non-persistent://tg/response/embeddings:{class}"
|
||||
"request": "non-persistent://tg/request/{workspace}:embeddings:{class}",
|
||||
"response": "non-persistent://tg/response/{workspace}:embeddings:{class}"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
|
@ -117,6 +117,16 @@ Additional information about the flow blueprint:
|
|||
|
||||
### System Variables
|
||||
|
||||
#### {workspace}
|
||||
- Replaced with the workspace identifier
|
||||
- Isolates queue names between workspaces so that two workspaces
|
||||
starting the same flow do not share queues
|
||||
- Must be included in all queue name patterns to ensure workspace
|
||||
isolation
|
||||
- Example: `ws-acme`, `ws-globex`
|
||||
- All blueprint templates must include `{workspace}` in queue name
|
||||
patterns
|
||||
|
||||
#### {id}
|
||||
- Replaced with the unique flow instance identifier
|
||||
- Creates isolated resources for each flow
|
||||
|
|
|
|||
858
docs/tech-specs/iam.md
Normal file
858
docs/tech-specs/iam.md
Normal file
|
|
@ -0,0 +1,858 @@
|
|||
---
|
||||
layout: default
|
||||
title: "Identity and Access Management"
|
||||
parent: "Tech Specs"
|
||||
---
|
||||
|
||||
# Identity and Access Management
|
||||
|
||||
## Problem Statement
|
||||
|
||||
TrustGraph has no meaningful identity or access management. The system
|
||||
relies on a single shared gateway token for authentication and an
|
||||
honour-system `user` query parameter for data isolation. This creates
|
||||
several problems:
|
||||
|
||||
- **No user identity.** There are no user accounts, no login, and no way
|
||||
to know who is making a request. The `user` field in message metadata
|
||||
is a caller-supplied string with no validation — any client can claim
|
||||
to be any user.
|
||||
|
||||
- **No access control.** A valid gateway token grants unrestricted access
|
||||
to every endpoint, every user's data, every collection, and every
|
||||
administrative operation. There is no way to limit what an
|
||||
authenticated caller can do.
|
||||
|
||||
- **No credential isolation.** All callers share one static token. There
|
||||
is no per-user credential, no token expiration, and no rotation
|
||||
mechanism. Revoking access means changing the shared token, which
|
||||
affects all callers.
|
||||
|
||||
- **Data isolation is unenforced.** Storage backends (Cassandra, Neo4j,
|
||||
Qdrant) filter queries by `user` and `collection`, but the gateway
|
||||
does not prevent a caller from specifying another user's identity.
|
||||
Cross-user data access is trivial.
|
||||
|
||||
- **No audit trail.** There is no logging of who accessed what. Without
|
||||
user identity, audit logging is impossible.
|
||||
|
||||
These gaps make the system unsuitable for multi-user deployments,
|
||||
multi-tenant SaaS, or any environment where access needs to be
|
||||
controlled or audited.
|
||||
|
||||
## Current State
|
||||
|
||||
### Authentication
|
||||
|
||||
The API gateway supports a single shared token configured via the
|
||||
`GATEWAY_SECRET` environment variable or `--api-token` CLI argument. If
|
||||
unset, authentication is disabled entirely. When enabled, every HTTP
|
||||
endpoint requires an `Authorization: Bearer <token>` header. WebSocket
|
||||
connections pass the token as a query parameter.
|
||||
|
||||
Implementation: `trustgraph-flow/trustgraph/gateway/auth.py`
|
||||
|
||||
```python
|
||||
class Authenticator:
|
||||
def __init__(self, token=None, allow_all=False):
|
||||
self.token = token
|
||||
self.allow_all = allow_all
|
||||
|
||||
def permitted(self, token, roles):
|
||||
if self.allow_all: return True
|
||||
if self.token != token: return False
|
||||
return True
|
||||
```
|
||||
|
||||
The `roles` parameter is accepted but never evaluated. All authenticated
|
||||
requests have identical privileges.
|
||||
|
||||
MCP tool configurations support an optional per-tool `auth-token` for
|
||||
service-to-service authentication with remote MCP servers. These are
|
||||
static, system-wide tokens — not per-user credentials. See
|
||||
[mcp-tool-bearer-token.md](mcp-tool-bearer-token.md) for details.
|
||||
|
||||
### User identity
|
||||
|
||||
The `user` field is passed explicitly by the caller as a query parameter
|
||||
(e.g. `?user=trustgraph`) or set by CLI tools. It flows through the
|
||||
system in the core `Metadata` dataclass:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class Metadata:
|
||||
id: str = ""
|
||||
root: str = ""
|
||||
user: str = ""
|
||||
collection: str = ""
|
||||
```
|
||||
|
||||
There is no user registration, login, user database, or session
|
||||
management.
|
||||
|
||||
### Data isolation
|
||||
|
||||
The `user` + `collection` pair is used at the storage layer to partition
|
||||
data:
|
||||
|
||||
- **Cassandra**: queries filter by `user` and `collection` columns
|
||||
- **Neo4j**: queries filter by `user` and `collection` properties
|
||||
- **Qdrant**: vector search filters by `user` and `collection` metadata
|
||||
|
||||
| Layer | Isolation mechanism | Enforced by |
|
||||
|-------|-------------------|-------------|
|
||||
| Gateway | Single shared token | `Authenticator` class |
|
||||
| Message metadata | `user` + `collection` fields | Caller (honour system) |
|
||||
| Cassandra | Column filters on `user`, `collection` | Query layer |
|
||||
| Neo4j | Property filters on `user`, `collection` | Query layer |
|
||||
| Qdrant | Metadata filters on `user`, `collection` | Query layer |
|
||||
| Pub/sub topics | Per-flow topic namespacing | Flow service |
|
||||
|
||||
The storage-layer isolation depends on all queries correctly filtering by
|
||||
`user` and `collection`. There is no gateway-level enforcement preventing
|
||||
a caller from querying another user's data by passing a different `user`
|
||||
parameter.
|
||||
|
||||
### Configuration and secrets
|
||||
|
||||
| Setting | Source | Default | Purpose |
|
||||
|---------|--------|---------|---------|
|
||||
| `GATEWAY_SECRET` | Env var | Empty (auth disabled) | Gateway bearer token |
|
||||
| `--api-token` | CLI arg | None | Gateway bearer token (overrides env) |
|
||||
| `PULSAR_API_KEY` | Env var | None | Pub/sub broker auth |
|
||||
| MCP `auth-token` | Config service | None | Per-tool MCP server auth |
|
||||
|
||||
No secrets are encrypted at rest. The gateway token and MCP tokens are
|
||||
stored and transmitted in plaintext (aside from any transport-layer
|
||||
encryption such as TLS).
|
||||
|
||||
### Capabilities that do not exist
|
||||
|
||||
- Per-user authentication (JWT, OAuth, SAML, API keys per user)
|
||||
- User accounts or user management
|
||||
- Role-based access control (RBAC)
|
||||
- Attribute-based access control (ABAC)
|
||||
- Per-user or per-workspace API keys
|
||||
- Token expiration or rotation
|
||||
- Session management
|
||||
- Per-user rate limiting
|
||||
- Audit logging of user actions
|
||||
- Permission checks preventing cross-user data access
|
||||
- Multi-workspace credential isolation
|
||||
|
||||
### Key files
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `trustgraph-flow/trustgraph/gateway/auth.py` | Authenticator class |
|
||||
| `trustgraph-flow/trustgraph/gateway/service.py` | Gateway init, token config |
|
||||
| `trustgraph-flow/trustgraph/gateway/endpoint/*.py` | Per-endpoint auth checks |
|
||||
| `trustgraph-base/trustgraph/schema/core/metadata.py` | `Metadata` dataclass with `user` field |
|
||||
|
||||
## Technical Design
|
||||
|
||||
### Design principles
|
||||
|
||||
- **Auth at the edge.** The gateway is the single enforcement point.
|
||||
Internal services trust the gateway and do not re-authenticate.
|
||||
This avoids distributing credential validation across dozens of
|
||||
microservices.
|
||||
|
||||
- **Identity from credentials, not from callers.** The gateway derives
|
||||
user identity from authentication credentials. Callers can no longer
|
||||
self-declare their identity via query parameters.
|
||||
|
||||
- **Workspace isolation by default.** Every authenticated user belongs to
|
||||
a workspace. All data operations are scoped to that workspace.
|
||||
Cross-workspace access is not possible through the API.
|
||||
|
||||
- **Extensible API contract.** The API accepts an optional workspace
|
||||
parameter on every request. This allows the same protocol to support
|
||||
single-workspace deployments today and multi-workspace extensions in
|
||||
the future without breaking changes.
|
||||
|
||||
- **Simple roles, not fine-grained permissions.** A small number of
|
||||
predefined roles controls what operations a user can perform. This is
|
||||
sufficient for the current API surface and avoids the complexity of
|
||||
per-resource permission management.
|
||||
|
||||
### Authentication
|
||||
|
||||
The gateway supports two credential types. Both are carried as a Bearer
|
||||
token in the `Authorization` header for HTTP requests. The gateway
|
||||
distinguishes them by format.
|
||||
|
||||
For WebSocket connections, credentials are not passed in the URL or
|
||||
headers. Instead, the client authenticates after connecting by sending
|
||||
an auth message as the first frame:
|
||||
|
||||
```
|
||||
Client: opens WebSocket to /api/v1/socket
|
||||
Server: accepts connection (unauthenticated state)
|
||||
Client: sends {"type": "auth", "token": "tg_abc123..."}
|
||||
Server: validates token
|
||||
success → {"type": "auth-ok", "workspace": "acme"}
|
||||
failure → {"type": "auth-failed", "error": "invalid token"}
|
||||
```
|
||||
|
||||
The server rejects all non-auth messages until authentication succeeds.
|
||||
The socket remains open on auth failure, allowing the client to retry
|
||||
with a different token without reconnecting. The client can also send
|
||||
a new auth message at any time to re-authenticate — for example, to
|
||||
refresh an expiring JWT or to switch workspace. The
|
||||
resolved identity (user, workspace, roles) is updated on each
|
||||
successful auth.
|
||||
|
||||
#### API keys
|
||||
|
||||
For programmatic access: CLI tools, scripts, and integrations.
|
||||
|
||||
- Opaque tokens (e.g. `tg_a1b2c3d4e5f6...`). Not JWTs — short,
|
||||
simple, easy to paste into CLI tools and headers.
|
||||
- Each user has one or more API keys.
|
||||
- Keys are stored hashed (SHA-256 with salt) in the IAM service. The
|
||||
plaintext key is returned once at creation time and cannot be
|
||||
retrieved afterwards.
|
||||
- Keys can be revoked individually without affecting other users.
|
||||
- Keys optionally have an expiry date. Expired keys are rejected.
|
||||
|
||||
On each request, the gateway resolves an API key by:
|
||||
|
||||
1. Hashing the token.
|
||||
2. Checking a local cache (hash → user/workspace/roles).
|
||||
3. On cache miss, calling the IAM service to resolve.
|
||||
4. Caching the result with a short TTL (e.g. 60 seconds).
|
||||
|
||||
Revoked keys stop working when the cache entry expires. No push
|
||||
invalidation is needed.
|
||||
|
||||
#### JWTs (login sessions)
|
||||
|
||||
For interactive access via the UI or WebSocket connections.
|
||||
|
||||
- A user logs in with username and password. The gateway forwards the
|
||||
request to the IAM service, which validates the credentials and
|
||||
returns a signed JWT.
|
||||
- The JWT carries the user ID, workspace, and roles as claims.
|
||||
- The gateway validates JWTs locally using the IAM service's public
|
||||
signing key — no service call needed on subsequent requests.
|
||||
- Token expiry is enforced by standard JWT validation at the time the
|
||||
request (or WebSocket connection) is made.
|
||||
- For long-lived WebSocket connections, the JWT is validated at connect
|
||||
time only. The connection remains authenticated for its lifetime.
|
||||
|
||||
The IAM service manages the signing key. The gateway fetches the public
|
||||
key at startup (or on first JWT encounter) and caches it.
|
||||
|
||||
#### Login endpoint
|
||||
|
||||
```
|
||||
POST /api/v1/auth/login
|
||||
{
|
||||
"username": "alice",
|
||||
"password": "..."
|
||||
}
|
||||
→ {
|
||||
"token": "eyJ...",
|
||||
"expires": "2026-04-20T19:00:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
The gateway forwards this to the IAM service, which validates
|
||||
credentials and returns a signed JWT. The gateway returns the JWT to
|
||||
the caller.
|
||||
|
||||
#### IAM service delegation
|
||||
|
||||
The gateway stays thin. Its authentication logic is:
|
||||
|
||||
1. Extract Bearer token from header (or query param for WebSocket).
|
||||
2. If the token has JWT format (dotted structure), validate the
|
||||
signature locally and extract claims.
|
||||
3. Otherwise, treat as an API key: hash it and check the local cache.
|
||||
On cache miss, call the IAM service to resolve.
|
||||
4. If neither succeeds, return 401.
|
||||
|
||||
All user management, key management, credential validation, and token
|
||||
signing logic lives in the IAM service. The gateway is a generic
|
||||
enforcement point that can be replaced without changing the IAM
|
||||
service.
|
||||
|
||||
#### No legacy token support
|
||||
|
||||
The existing `GATEWAY_SECRET` shared token is removed. All
|
||||
authentication uses API keys or JWTs. On first start, the bootstrap
|
||||
process creates a default workspace and admin user with an initial API
|
||||
key.
|
||||
|
||||
### User identity
|
||||
|
||||
A user belongs to exactly one workspace. The design supports extending
|
||||
this to multi-workspace access in the future (see
|
||||
[Extension points](#extension-points)).
|
||||
|
||||
A user record contains:
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `id` | string | Unique user identifier (UUID) |
|
||||
| `name` | string | Display name |
|
||||
| `email` | string | Email address (optional) |
|
||||
| `workspace` | string | Workspace the user belongs to |
|
||||
| `roles` | list[string] | Assigned roles (e.g. `["reader"]`) |
|
||||
| `enabled` | bool | Whether the user can authenticate |
|
||||
| `created` | datetime | Account creation timestamp |
|
||||
|
||||
The `workspace` field maps to the existing `user` field in `Metadata`.
|
||||
This means the storage-layer isolation (Cassandra, Neo4j, Qdrant
|
||||
filtering by `user` + `collection`) works without changes — the gateway
|
||||
sets the `user` metadata field to the authenticated user's workspace.
|
||||
|
||||
### Workspaces
|
||||
|
||||
A workspace is an isolated data boundary. Users belong to a workspace,
|
||||
and all data operations are scoped to it. Workspaces map to the existing
|
||||
`user` field in `Metadata` and the corresponding Cassandra keyspace,
|
||||
Qdrant collection prefix, and Neo4j property filters.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `id` | string | Unique workspace identifier |
|
||||
| `name` | string | Display name |
|
||||
| `enabled` | bool | Whether the workspace is active |
|
||||
| `created` | datetime | Creation timestamp |
|
||||
|
||||
All data operations are scoped to a workspace. The gateway determines
|
||||
the effective workspace for each request as follows:
|
||||
|
||||
1. If the request includes a `workspace` parameter, validate it against
|
||||
the user's assigned workspace.
|
||||
- If it matches, use it.
|
||||
- If it does not match, return 403. (This could be extended to
|
||||
check a workspace access grant list.)
|
||||
2. If no `workspace` parameter is provided, use the user's assigned
|
||||
workspace.
|
||||
|
||||
The gateway sets the `user` field in `Metadata` to the effective
|
||||
workspace ID, replacing the caller-supplied `?user=` query parameter.
|
||||
|
||||
This design ensures forward compatibility. Clients that pass a
|
||||
workspace parameter will work unchanged if multi-workspace support is
|
||||
added later. Requests for an unassigned workspace get a clear 403
|
||||
rather than silent misbehaviour.
|
||||
|
||||
### Roles and access control
|
||||
|
||||
Three roles with fixed permissions:
|
||||
|
||||
| Role | Data operations | Admin operations | System |
|
||||
|------|----------------|-----------------|--------|
|
||||
| `reader` | Query knowledge graph, embeddings, RAG | None | None |
|
||||
| `writer` | All reader operations + load documents, manage collections | None | None |
|
||||
| `admin` | All writer operations | Config, flows, collection management, user management | Metrics |
|
||||
|
||||
Role checks happen at the gateway before dispatching to backend
|
||||
services. Each endpoint declares the minimum role required:
|
||||
|
||||
| Endpoint pattern | Minimum role |
|
||||
|-----------------|--------------|
|
||||
| `GET /api/v1/socket` (queries) | `reader` |
|
||||
| `POST /api/v1/librarian` | `writer` |
|
||||
| `POST /api/v1/flow/*/import/*` | `writer` |
|
||||
| `POST /api/v1/config` | `admin` |
|
||||
| `GET /api/v1/flow/*` | `admin` |
|
||||
| `GET /api/metrics` | `admin` |
|
||||
|
||||
Roles are hierarchical: `admin` implies `writer`, which implies
|
||||
`reader`.
|
||||
|
||||
### IAM service
|
||||
|
||||
The IAM service is a new backend service that manages all identity and
|
||||
access data. It is the authority for users, workspaces, API keys, and
|
||||
credentials. The gateway delegates to it.
|
||||
|
||||
#### Data model
|
||||
|
||||
```
|
||||
iam_workspaces (
|
||||
id text PRIMARY KEY,
|
||||
name text,
|
||||
enabled boolean,
|
||||
created timestamp
|
||||
)
|
||||
|
||||
iam_users (
|
||||
id text PRIMARY KEY,
|
||||
workspace text,
|
||||
name text,
|
||||
email text,
|
||||
password_hash text,
|
||||
roles set<text>,
|
||||
enabled boolean,
|
||||
created timestamp
|
||||
)
|
||||
|
||||
iam_api_keys (
|
||||
key_hash text PRIMARY KEY,
|
||||
user_id text,
|
||||
name text,
|
||||
expires timestamp,
|
||||
created timestamp
|
||||
)
|
||||
```
|
||||
|
||||
A secondary index on `iam_api_keys.user_id` supports listing a user's
|
||||
keys.
|
||||
|
||||
#### Responsibilities
|
||||
|
||||
- User CRUD (create, list, update, disable)
|
||||
- Workspace CRUD (create, list, update, disable)
|
||||
- API key management (create, revoke, list)
|
||||
- API key resolution (hash → user/workspace/roles)
|
||||
- Credential validation (username/password → signed JWT)
|
||||
- JWT signing key management (initialise, rotate)
|
||||
- Bootstrap (create default workspace and admin user on first start)
|
||||
|
||||
#### Communication
|
||||
|
||||
The IAM service communicates via the standard request/response pub/sub
|
||||
pattern, the same as the config service. The gateway calls it to
|
||||
resolve API keys and to handle login requests. User management
|
||||
operations (create user, revoke key, etc.) also go through the IAM
|
||||
service.
|
||||
|
||||
### Gateway changes
|
||||
|
||||
The current `Authenticator` class is replaced with a thin authentication
|
||||
middleware that delegates to the IAM service:
|
||||
|
||||
For HTTP requests:
|
||||
|
||||
1. Extract Bearer token from the `Authorization` header.
|
||||
2. If the token has JWT format (dotted structure):
|
||||
- Validate signature locally using the cached public key.
|
||||
- Extract user ID, workspace, and roles from claims.
|
||||
3. Otherwise, treat as an API key:
|
||||
- Hash the token and check the local cache.
|
||||
- On cache miss, call the IAM service to resolve.
|
||||
- Cache the result (user/workspace/roles) with a short TTL.
|
||||
4. If neither succeeds, return 401.
|
||||
5. If the user or workspace is disabled, return 403.
|
||||
6. Check the user's role against the endpoint's minimum role. If
|
||||
insufficient, return 403.
|
||||
7. Resolve the effective workspace:
|
||||
- If the request includes a `workspace` parameter, validate it
|
||||
against the user's assigned workspace. Return 403 on mismatch.
|
||||
- If no `workspace` parameter, use the user's assigned workspace.
|
||||
8. Set the `user` field in the request context to the effective
|
||||
workspace ID. This propagates through `Metadata` to all downstream
|
||||
services.
|
||||
|
||||
For WebSocket connections:
|
||||
|
||||
1. Accept the connection in an unauthenticated state.
|
||||
2. Wait for an auth message (`{"type": "auth", "token": "..."}`).
|
||||
3. Validate the token using the same logic as steps 2-7 above.
|
||||
4. On success, attach the resolved identity to the connection and
|
||||
send `{"type": "auth-ok", ...}`.
|
||||
5. On failure, send `{"type": "auth-failed", ...}` but keep the
|
||||
socket open.
|
||||
6. Reject all non-auth messages until authentication succeeds.
|
||||
7. Accept new auth messages at any time to re-authenticate.
|
||||
|
||||
### CLI changes
|
||||
|
||||
CLI tools authenticate with API keys:
|
||||
|
||||
- `--api-key` argument on all CLI tools, replacing `--api-token`.
|
||||
- `tg-create-workspace`, `tg-list-workspaces` for workspace management.
|
||||
- `tg-create-user`, `tg-list-users`, `tg-disable-user` for user
|
||||
management.
|
||||
- `tg-create-api-key`, `tg-list-api-keys`, `tg-revoke-api-key` for
|
||||
key management.
|
||||
- `--workspace` argument on tools that operate on workspace-scoped
|
||||
data.
|
||||
- The API key is passed as a Bearer token in the same way as the
|
||||
current shared token, so the transport protocol is unchanged.
|
||||
|
||||
### Audit logging
|
||||
|
||||
With user identity established, the gateway logs:
|
||||
|
||||
- Timestamp, user ID, workspace, endpoint, HTTP method, response status.
|
||||
- Audit logs are written to the standard logging output (structured
|
||||
JSON). Integration with external log aggregation (Loki, ELK) is a
|
||||
deployment concern, not an application concern.
|
||||
|
||||
### Config service changes
|
||||
|
||||
All configuration is workspace-scoped (see
|
||||
[data-ownership-model.md](data-ownership-model.md)). The config service
|
||||
needs to support this.
|
||||
|
||||
#### Schema change
|
||||
|
||||
The config table adds workspace as a key dimension:
|
||||
|
||||
```
|
||||
config (
|
||||
workspace text,
|
||||
class text,
|
||||
key text,
|
||||
value text,
|
||||
PRIMARY KEY ((workspace, class), key)
|
||||
)
|
||||
```
|
||||
|
||||
#### Request format
|
||||
|
||||
Config requests add a `workspace` field at the request level. The
|
||||
existing `(type, key)` structure is unchanged within each workspace.
|
||||
|
||||
**Get:**
|
||||
```json
|
||||
{
|
||||
"operation": "get",
|
||||
"workspace": "workspace-a",
|
||||
"keys": [{"type": "prompt", "key": "rag-prompt"}]
|
||||
}
|
||||
```
|
||||
|
||||
**Put:**
|
||||
```json
|
||||
{
|
||||
"operation": "put",
|
||||
"workspace": "workspace-a",
|
||||
"values": [{"type": "prompt", "key": "rag-prompt", "value": "..."}]
|
||||
}
|
||||
```
|
||||
|
||||
**List (all keys of a type within a workspace):**
|
||||
```json
|
||||
{
|
||||
"operation": "list",
|
||||
"workspace": "workspace-a",
|
||||
"type": "prompt"
|
||||
}
|
||||
```
|
||||
|
||||
**Delete:**
|
||||
```json
|
||||
{
|
||||
"operation": "delete",
|
||||
"workspace": "workspace-a",
|
||||
"keys": [{"type": "prompt", "key": "rag-prompt"}]
|
||||
}
|
||||
```
|
||||
|
||||
The workspace is set by:
|
||||
|
||||
- **Gateway** — from the authenticated user's workspace for API-facing
|
||||
requests.
|
||||
- **Internal services** — explicitly, based on `Metadata.user` from
|
||||
the message being processed, or `_system` for operational config.
|
||||
|
||||
#### System config namespace
|
||||
|
||||
Processor-level operational config (logging levels, connection strings,
|
||||
resource limits) is not workspace-specific. This stays in a reserved
|
||||
`_system` workspace that is not associated with any user workspace.
|
||||
Services read system config at startup without needing a workspace
|
||||
context.
|
||||
|
||||
#### Config change notifications
|
||||
|
||||
The config notify mechanism pushes change notifications via pub/sub
|
||||
when config is updated. A single update may affect multiple workspaces
|
||||
and multiple config types. The notification message carries a dict of
|
||||
changes keyed by config type, with each value being the list of
|
||||
affected workspaces:
|
||||
|
||||
```json
|
||||
{
|
||||
"version": 42,
|
||||
"changes": {
|
||||
"prompt": ["workspace-a", "workspace-b"],
|
||||
"schema": ["workspace-a"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
System config changes use the reserved `_system` workspace:
|
||||
|
||||
```json
|
||||
{
|
||||
"version": 43,
|
||||
"changes": {
|
||||
"logging": ["_system"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This structure is keyed by type because handlers register by type. A
|
||||
handler registered for `prompt` looks up `"prompt"` directly and gets
|
||||
the list of affected workspaces — no iteration over unrelated types.
|
||||
|
||||
#### Config change handlers
|
||||
|
||||
The current `on_config` hook mechanism needs two modes to support shared
|
||||
processing services:
|
||||
|
||||
- **Workspace-scoped handlers** — notify when a config type changes in a
|
||||
specific workspace. The handler looks up its registered type in the
|
||||
changes dict and checks if its workspace is in the list. Used by the
|
||||
gateway and by services that serve a single workspace.
|
||||
|
||||
- **Global handlers** — notify when a config type changes in any
|
||||
workspace. The handler looks up its registered type in the changes
|
||||
dict and gets the full list of affected workspaces. Used by shared
|
||||
processing services (prompt-rag, agent manager, etc.) that serve all
|
||||
workspaces. Each workspace in the list tells the handler which cache
|
||||
entry to update rather than reloading everything.
|
||||
|
||||
#### Per-workspace config caching
|
||||
|
||||
Shared services that handle messages from multiple workspaces maintain a
|
||||
per-workspace config cache. When a message arrives, the service looks up
|
||||
the config for the workspace identified in `Metadata.user`. If the
|
||||
workspace is not yet cached, the service fetches its config on demand.
|
||||
Config change notifications update the relevant cache entry.
|
||||
|
||||
### Flow and queue isolation
|
||||
|
||||
Flows are workspace-owned. When two workspaces start flows with the same
|
||||
name and blueprint, their queues must be separate to prevent data
|
||||
mixing.
|
||||
|
||||
Flow blueprint templates currently use `{id}` (flow instance ID) and
|
||||
`{class}` (blueprint name) as template variables in queue names. A new
|
||||
`{workspace}` variable is added so queue names include the workspace:
|
||||
|
||||
**Current queue names (no workspace isolation):**
|
||||
```
|
||||
flow:tg:document-load:{id} → flow:tg:document-load:default
|
||||
request:tg:embeddings:{class} → request:tg:embeddings:everything
|
||||
```
|
||||
|
||||
**With workspace isolation:**
|
||||
```
|
||||
flow:tg:{workspace}:document-load:{id} → flow:tg:ws-a:document-load:default
|
||||
request:tg:{workspace}:embeddings:{class} → request:tg:ws-a:embeddings:everything
|
||||
```
|
||||
|
||||
The flow service substitutes `{workspace}` from the authenticated
|
||||
workspace when starting a flow, the same way it substitutes `{id}` and
|
||||
`{class}` today.
|
||||
|
||||
Processing services are shared infrastructure — they consume from
|
||||
workspace-specific queues but are not themselves workspace-aware. The
|
||||
workspace is carried in `Metadata.user` on every message, so services
|
||||
know which workspace's data they are processing.
|
||||
|
||||
Blueprint templates need updating to include `{workspace}` in all queue
|
||||
name patterns. For migration, the flow service can inject the workspace
|
||||
into queue names automatically if the template does not include
|
||||
`{workspace}`, defaulting to the legacy behaviour for existing
|
||||
blueprints.
|
||||
|
||||
See [flow-class-definition.md](flow-class-definition.md) for the full
|
||||
blueprint template specification.
|
||||
|
||||
### What changes and what doesn't
|
||||
|
||||
**Changes:**
|
||||
|
||||
| Component | Change |
|
||||
|-----------|--------|
|
||||
| `gateway/auth.py` | Replace `Authenticator` with new auth middleware |
|
||||
| `gateway/service.py` | Initialise IAM client, configure JWT validation |
|
||||
| `gateway/endpoint/*.py` | Add role requirement per endpoint |
|
||||
| Metadata propagation | Gateway sets `user` from workspace, ignores query param |
|
||||
| Config service | Add workspace dimension to config schema |
|
||||
| Config table | `PRIMARY KEY ((workspace, class), key)` |
|
||||
| Config request/response schema | Add `workspace` field |
|
||||
| Config notify messages | Include workspace ID in change notifications |
|
||||
| `on_config` handlers | Support workspace-scoped and global modes |
|
||||
| Shared services | Per-workspace config caching |
|
||||
| Flow blueprints | Add `{workspace}` template variable to queue names |
|
||||
| Flow service | Substitute `{workspace}` when starting flows |
|
||||
| CLI tools | New user management commands, `--api-key` argument |
|
||||
| Cassandra schema | New `iam_workspaces`, `iam_users`, `iam_api_keys` tables |
|
||||
|
||||
**Does not change:**
|
||||
|
||||
| Component | Reason |
|
||||
|-----------|--------|
|
||||
| Internal service-to-service pub/sub | Services trust the gateway |
|
||||
| `Metadata` dataclass | `user` field continues to carry workspace identity |
|
||||
| Storage-layer isolation | Same `user` + `collection` filtering |
|
||||
| Message serialisation | No schema changes |
|
||||
|
||||
### Migration
|
||||
|
||||
This is a breaking change. Existing deployments must be reconfigured:
|
||||
|
||||
1. `GATEWAY_SECRET` is removed. Authentication requires API keys or
|
||||
JWT login tokens.
|
||||
2. The `?user=` query parameter is removed. Workspace identity comes
|
||||
from authentication.
|
||||
3. On first start, the IAM service bootstraps a default workspace and
|
||||
admin user. The initial API key is output to the service log.
|
||||
4. Operators create additional workspaces and users via CLI tools.
|
||||
5. Flow blueprints must be updated to include `{workspace}` in queue
|
||||
name patterns.
|
||||
6. Config data must be migrated to include the workspace dimension.
|
||||
|
||||
## Extension points
|
||||
|
||||
The design includes deliberate extension points for future capabilities.
|
||||
These are not implemented but the architecture does not preclude them:
|
||||
|
||||
- **Multi-workspace access.** Users could be granted access to
|
||||
additional workspaces beyond their primary assignment. The workspace
|
||||
validation step checks a grant list instead of a single assignment.
|
||||
- **Rules-based access control.** A separate access control service
|
||||
could evaluate fine-grained policies (per-collection permissions,
|
||||
operation-level restrictions, time-based access). The gateway
|
||||
delegates authorisation decisions to this service.
|
||||
- **External identity provider integration.** SAML, LDAP, and OIDC
|
||||
flows (group mapping, claims-based role assignment) could be added
|
||||
to the IAM service.
|
||||
- **Cross-workspace administration.** A `superadmin` role for platform
|
||||
operators who manage multiple workspaces.
|
||||
- **Delegated workspace provisioning.** APIs for programmatic workspace
|
||||
creation and user onboarding.
|
||||
|
||||
These extensions are additive — they extend the validation logic
|
||||
without changing the request/response protocol. The gateway can be
|
||||
replaced with an alternative implementation that supports these
|
||||
capabilities while the IAM service and backend services remain
|
||||
unchanged.
|
||||
|
||||
## Implementation plan
|
||||
|
||||
Workspace support is a prerequisite for auth — users are assigned to
|
||||
workspaces, config is workspace-scoped, and flows use workspace in
|
||||
queue names. Implementing workspaces first allows the structural changes
|
||||
to be tested end-to-end without auth complicating debugging.
|
||||
|
||||
### Phase 1: Workspace support (no auth)
|
||||
|
||||
All workspace-scoped data and processing changes. The system works with
|
||||
workspaces but no authentication — callers pass workspace as a
|
||||
parameter, honour system. This allows full end-to-end testing: multiple
|
||||
workspaces with separate flows, config, queues, and data.
|
||||
|
||||
#### Config service
|
||||
|
||||
- Update config client API to accept a workspace parameter on all
|
||||
requests
|
||||
- Update config storage schema to add workspace as a key dimension
|
||||
- Update config notification API to report changes as a dict of
|
||||
type → workspace list
|
||||
- Update the processor base class to understand workspaces in config
|
||||
notifications (workspace-scoped and global handler modes)
|
||||
- Update all processors to implement workspace-aware config handling
|
||||
(per-workspace config caching, on-demand fetch)
|
||||
|
||||
#### Flow and queue isolation
|
||||
|
||||
- Update flow blueprints to include `{workspace}` in all queue name
|
||||
patterns
|
||||
- Update the flow service to substitute `{workspace}` when starting
|
||||
flows
|
||||
- Update all built-in blueprints to include `{workspace}`
|
||||
|
||||
#### CLI tools (workspace support)
|
||||
|
||||
- Add `--workspace` argument to CLI tools that operate on
|
||||
workspace-scoped data
|
||||
- Add `tg-create-workspace`, `tg-list-workspaces` commands
|
||||
|
||||
### Phase 2: Authentication and access control
|
||||
|
||||
With workspaces working, add the IAM service and lock down the gateway.
|
||||
|
||||
#### IAM service
|
||||
|
||||
A new service handling identity and access management on behalf of the
|
||||
API gateway:
|
||||
|
||||
- Add workspace table support (CRUD, enable/disable)
|
||||
- Add user table support (CRUD, enable/disable, workspace assignment)
|
||||
- Add roles support (role assignment, role validation)
|
||||
- Add API key support (create, revoke, list, hash storage)
|
||||
- Add ability to initialise a JWT signing key for token grants
|
||||
- Add token grant endpoint: user/password login returns a signed JWT
|
||||
- Add bootstrap/initialisation mechanism: ability to set the signing
|
||||
key and create the initial workspace + admin user on first start
|
||||
|
||||
#### API gateway integration
|
||||
|
||||
- Add IAM middleware to the API gateway replacing the current
|
||||
`Authenticator`
|
||||
- Add local JWT validation (public key from IAM service)
|
||||
- Add API key resolution with local cache (hash → user/workspace/roles,
|
||||
cache miss calls IAM service, short TTL)
|
||||
- Add login endpoint forwarding to IAM service
|
||||
- Add workspace resolution: validate requested workspace against user
|
||||
assignment
|
||||
- Add role-based endpoint access checks
|
||||
- Add user management API endpoints (forwarded to IAM service)
|
||||
- Add audit logging (user ID, workspace, endpoint, method, status)
|
||||
- WebSocket auth via first-message protocol (auth message after
|
||||
connect, socket stays open on failure, re-auth supported)
|
||||
|
||||
#### CLI tools (auth support)
|
||||
|
||||
- Add `tg-create-user`, `tg-list-users`, `tg-disable-user` commands
|
||||
- Add `tg-create-api-key`, `tg-list-api-keys`, `tg-revoke-api-key`
|
||||
commands
|
||||
- Replace `--api-token` with `--api-key` on existing CLI tools
|
||||
|
||||
#### Bootstrap and cutover
|
||||
|
||||
- Create default workspace and admin user on first start if IAM tables
|
||||
are empty
|
||||
- Remove `GATEWAY_SECRET` and `?user=` query parameter support
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### IAM data store
|
||||
|
||||
IAM data is stored in dedicated Cassandra tables owned by the IAM
|
||||
service, not in the config service. Reasons:
|
||||
|
||||
- **Security isolation.** The config service has a broad, generic
|
||||
protocol. An access control failure on the config service could
|
||||
expose credentials. A dedicated IAM service with a purpose-built
|
||||
protocol limits the attack surface and makes security auditing
|
||||
clearer.
|
||||
- **Data model fit.** IAM needs indexed lookups (API key hash → user,
|
||||
list keys by user). The config service's `(workspace, type, key) →
|
||||
value` model stores opaque JSON strings with no secondary indexes.
|
||||
- **Scope.** IAM data is global (workspaces, users, keys). Config is
|
||||
workspace-scoped. Mixing global and workspace-scoped data in the
|
||||
same store adds complexity.
|
||||
- **Audit.** IAM operations (key creation, revocation, login attempts)
|
||||
are security events that should be logged separately from general
|
||||
config changes.
|
||||
|
||||
## Deferred to future design
|
||||
|
||||
- **OIDC integration.** External identity provider support (SAML, LDAP,
|
||||
OIDC) is left for future implementation. The extension points section
|
||||
describes where this fits architecturally.
|
||||
- **API key scoping.** API keys could be scoped to specific collections
|
||||
within a workspace rather than granting workspace-wide access. To be
|
||||
designed when the need arises.
|
||||
- **tg-init-trustgraph** only initialises a single workspace.
|
||||
|
||||
## References
|
||||
|
||||
- [Data Ownership and Information Separation](data-ownership-model.md)
|
||||
- [MCP Tool Bearer Token Specification](mcp-tool-bearer-token.md)
|
||||
- [Multi-Tenant Support Specification](multi-tenant-support.md)
|
||||
- [Neo4j User Collection Isolation](neo4j-user-collection-isolation.md)
|
||||
Loading…
Add table
Add a link
Reference in a new issue