mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-05-06 05:42:36 +02:00
Per-workspace queue routing for workspace-scoped services (#862)
Workspace identity is now determined by queue infrastructure instead of message body fields, closing a privilege-escalation vector where a caller could spoof workspace in the request payload. - Add WorkspaceProcessor base class: discovers workspaces from config at startup, creates per-workspace consumers (queue:workspace), and manages consumer lifecycle on workspace create/delete events - Roll out to librarian, flow-svc, knowledge cores, and config-svc - Config service gets a dual-queue regime: a system queue for cross-workspace ops (getvalues-all-ws, bootstrapper writes to __workspaces__) and per-workspace queues for tenant-scoped ops, with workspace discovery from its own Cassandra store - Remove workspace field from request schemas (FlowRequest, LibrarianRequest, KnowledgeRequest, CollectionManagementRequest) and from DocumentMetadata / ProcessingMetadata — table stores now accept workspace as an explicit parameter - Strip workspace encode/decode from all message translators and gateway serializers - Gateway enforces workspace existence: reject requests targeting non-existent workspaces instead of routing to queues with no consumer - Config service provisions new workspaces from __template__ on creation - Add workspace lifecycle hooks to AsyncProcessor so any processor can react to workspace create/delete without subclassing WorkspaceProcessor
This commit is contained in:
parent
9be257ceee
commit
9f2bfbce0c
53 changed files with 1565 additions and 677 deletions
366
docs/tech-specs/workspace-scoped-services.md
Normal file
366
docs/tech-specs/workspace-scoped-services.md
Normal file
|
|
@ -0,0 +1,366 @@
|
|||
---
|
||||
layout: default
|
||||
title: "Workspace-Scoped Services"
|
||||
parent: "Tech Specs"
|
||||
---
|
||||
|
||||
# Workspace-Scoped Services
|
||||
|
||||
## Problem Statement
|
||||
|
||||
Workspace-scoped services (librarian, config, knowledge, collection
|
||||
management) currently operate on global queues — a single
|
||||
`request:tg:librarian` queue handles requests for all workspaces.
|
||||
Workspace identity is carried as a field in the request body, set by
|
||||
the gateway after authentication. This creates several problems:
|
||||
|
||||
- **No structural isolation.** All workspaces share a single queue.
|
||||
Workspace scoping depends entirely on a body field being populated
|
||||
correctly. If the field is missing or wrong, the service operates
|
||||
on the wrong workspace — or fails with a confusing error. This is
|
||||
a security concern: workspace isolation should be enforced by
|
||||
infrastructure, not by trusting a field.
|
||||
|
||||
- **Redundant workspace fields.** Nested objects within requests
|
||||
(e.g. `processing-metadata`, `document-metadata`) carry their own
|
||||
`workspace` fields alongside the top-level request workspace.
|
||||
The gateway resolves the top-level workspace but does not
|
||||
propagate into nested payloads. Services that read workspace from
|
||||
a nested object instead of the top-level address see `None` and
|
||||
fail.
|
||||
|
||||
- **No workspace lifecycle awareness.** Workspace-scoped services
|
||||
have no mechanism to learn when workspaces are created or deleted.
|
||||
Flow processors discover workspaces indirectly through config
|
||||
entries, but workspace-scoped services on global queues have no
|
||||
equivalent. There is no event when a workspace appears or
|
||||
disappears.
|
||||
|
||||
- **Inconsistency with flow-scoped services.** Flow-scoped services
|
||||
already use per-workspace, per-flow queue names
|
||||
(`request:tg:{workspace}:embeddings:{class}`). Workspace-scoped
|
||||
services are the exception — they sit on global queues while
|
||||
everything else is structurally isolated.
|
||||
|
||||
## Design
|
||||
|
||||
### Per-workspace queues for workspace-scoped services
|
||||
|
||||
Workspace-scoped services move from global queues to per-workspace
|
||||
queues. The queue name includes the workspace identifier:
|
||||
|
||||
**Current (global):**
|
||||
```
|
||||
request:tg:librarian
|
||||
request:tg:config
|
||||
```
|
||||
|
||||
**Proposed (per-workspace):**
|
||||
```
|
||||
request:tg:librarian:{workspace}
|
||||
request:tg:config:{workspace}
|
||||
```
|
||||
|
||||
The gateway routes requests to the correct queue based on the
|
||||
resolved workspace from authentication — the same workspace that
|
||||
today gets written into the request body. The workspace is now part
|
||||
of the queue address, not just a field in the payload.
|
||||
|
||||
Services subscribe to per-workspace queues. When a new workspace is
|
||||
created, they subscribe to its queue. When a workspace is deleted,
|
||||
they unsubscribe.
|
||||
|
||||
### Workspace lifecycle via the `__workspaces__` config namespace
|
||||
|
||||
Workspace lifecycle events are modelled as config changes in a
|
||||
reserved `__workspaces__` namespace. This mirrors the existing
|
||||
`__system__` namespace — a reserved space for infrastructure
|
||||
concerns that don't belong to any user workspace.
|
||||
|
||||
When IAM creates a workspace, it writes an entry to the config
|
||||
service:
|
||||
|
||||
```
|
||||
workspace: __workspaces__
|
||||
type: workspace
|
||||
key: <workspace-id>
|
||||
value: {"enabled": true}
|
||||
```
|
||||
|
||||
When IAM deletes (or disables) a workspace, it updates or deletes
|
||||
the entry. The config service sees this as a normal config change
|
||||
and pushes a notification through the existing `ConfigPush`
|
||||
mechanism.
|
||||
|
||||
This avoids introducing a new notification channel. The config
|
||||
service already has the machinery to notify subscribers of changes
|
||||
by type and workspace. Workspace lifecycle is just another config
|
||||
type that services can register handlers for.
|
||||
|
||||
### Config push changes
|
||||
|
||||
#### Remove `_`-prefix suppression
|
||||
|
||||
The config service currently suppresses notifications for workspaces
|
||||
whose names start with `_`. This suppression is removed — the
|
||||
config service pushes notifications for all workspaces
|
||||
unconditionally.
|
||||
|
||||
The filtering moves to the consumer side. `AsyncProcessor` already
|
||||
filters `_`-prefixed workspaces in its config handler dispatch
|
||||
(lines 212 and 315 of `async_processor.py`). This filtering is
|
||||
retained as the default behaviour, but handlers can opt in to
|
||||
infrastructure namespaces by registering for them explicitly (see
|
||||
`WorkspaceProcessor` below).
|
||||
|
||||
#### Workspace change events
|
||||
|
||||
The `ConfigPush` message gains a `workspace_changes` field alongside
|
||||
the existing `changes` field:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class ConfigPush:
|
||||
version: int = 0
|
||||
|
||||
# Config changes: type -> [affected workspaces]
|
||||
changes: dict[str, list[str]] = field(default_factory=dict)
|
||||
|
||||
# Workspace lifecycle: created/deleted workspace lists
|
||||
workspace_changes: WorkspaceChanges | None = None
|
||||
|
||||
@dataclass
|
||||
class WorkspaceChanges:
|
||||
created: list[str] = field(default_factory=list)
|
||||
deleted: list[str] = field(default_factory=list)
|
||||
```
|
||||
|
||||
The config service populates `workspace_changes` when it detects
|
||||
changes to the `__workspaces__` config namespace. A new key
|
||||
appearing is a creation; a key being deleted is a deletion.
|
||||
|
||||
Services that don't care about workspace lifecycle ignore the field.
|
||||
Services that do (workspace-scoped services, the gateway) react by
|
||||
subscribing to or tearing down per-workspace queues.
|
||||
|
||||
### The `WorkspaceProcessor` base class
|
||||
|
||||
A new base class sits between `AsyncProcessor` and `FlowProcessor`
|
||||
in the processor hierarchy:
|
||||
|
||||
```
|
||||
AsyncProcessor → WorkspaceProcessor → FlowProcessor
|
||||
```
|
||||
|
||||
`WorkspaceProcessor` manages per-workspace queue lifecycle the same
|
||||
way `FlowProcessor` manages per-flow lifecycle. It:
|
||||
|
||||
1. On startup, discovers existing workspaces by fetching config from
|
||||
the `__workspaces__` namespace (using the existing
|
||||
`_fetch_type_all_workspaces` pattern).
|
||||
|
||||
2. For each workspace, subscribes to the service's per-workspace
|
||||
queue (e.g. `request:tg:librarian:{workspace}`).
|
||||
|
||||
3. Registers a config handler for the `workspace` type in the
|
||||
`__workspaces__` namespace. When a workspace is created, it
|
||||
subscribes to the new queue. When a workspace is deleted, it
|
||||
unsubscribes and tears down.
|
||||
|
||||
4. Exposes hooks for derived classes:
|
||||
- `on_workspace_created(workspace)` — called after subscribing
|
||||
to the new workspace's queue.
|
||||
- `on_workspace_deleted(workspace)` — called before
|
||||
unsubscribing from the workspace's queue.
|
||||
|
||||
`FlowProcessor` extends `WorkspaceProcessor` instead of
|
||||
`AsyncProcessor`. Flows exist within workspaces, so the hierarchy
|
||||
is natural: workspace creation triggers queue subscription, then
|
||||
flow config changes within that workspace trigger flow start/stop.
|
||||
|
||||
Services that are workspace-scoped but not flow-scoped (librarian,
|
||||
knowledge, collection management) extend `WorkspaceProcessor`
|
||||
directly.
|
||||
|
||||
### Gateway routing changes
|
||||
|
||||
The gateway currently dispatches workspace-scoped requests to global
|
||||
service dispatchers. This changes to per-workspace dispatchers that
|
||||
route to per-workspace queues.
|
||||
|
||||
For HTTP requests, the resolved workspace from the URL path
|
||||
(`/api/v1/workspaces/{w}/library`) determines the target queue.
|
||||
|
||||
For WebSocket requests via the Mux, the resolved workspace from
|
||||
`enforce_workspace` determines the target queue. The Mux already
|
||||
resolves workspace before dispatching (line 214 of `mux.py`); the
|
||||
change is that `invoke_global_service` uses workspace to select the
|
||||
queue, rather than routing to a single global queue.
|
||||
|
||||
System-level services (IAM) remain on global queues — they are not
|
||||
workspace-scoped.
|
||||
|
||||
### Workspace field on nested metadata objects
|
||||
|
||||
With per-workspace queues, the workspace is part of the queue
|
||||
address. Services know which workspace they are serving by which
|
||||
queue a message arrived on.
|
||||
|
||||
The `workspace` field on `DocumentMetadata` and
|
||||
`ProcessingMetadata` in the librarian schema becomes a storage
|
||||
attribute — the workspace the record belongs to, populated by the
|
||||
service from the request context, not by the caller. The service
|
||||
reads workspace from `request.workspace` (the resolved address) or
|
||||
from the queue context, never from a nested payload field.
|
||||
|
||||
Callers are not required to populate workspace on nested objects.
|
||||
The service fills it in authoritatively from the request context
|
||||
before storing.
|
||||
|
||||
## Interaction with existing specs
|
||||
|
||||
### IAM (`iam.md`, `iam-contract.md`)
|
||||
|
||||
IAM is the authority for workspace existence. When IAM creates or
|
||||
deletes a workspace, it writes to the `__workspaces__` config
|
||||
namespace. This is a two-step operation: register the workspace in
|
||||
IAM's own store (`iam_workspaces` table), then announce it via
|
||||
config.
|
||||
|
||||
The IAM service itself remains on a global queue — it is a
|
||||
system-level service, not workspace-scoped.
|
||||
|
||||
### Config service
|
||||
|
||||
The config service is workspace-scoped — it stores per-workspace
|
||||
configuration. Under this design, the config service moves to
|
||||
per-workspace queues like other workspace-scoped services.
|
||||
|
||||
On startup, the config service discovers workspaces from its own
|
||||
store (it has direct access to the config tables, unlike other
|
||||
services that fetch via request/response). It subscribes to
|
||||
per-workspace queues for each known workspace.
|
||||
|
||||
When IAM writes a new workspace entry to the `__workspaces__`
|
||||
namespace, the config service sees the write directly (it is the
|
||||
config service), creates the per-workspace queue, and pushes the
|
||||
notification.
|
||||
|
||||
### Flow blueprints (`flow-blueprint-definition.md`)
|
||||
|
||||
Flow blueprints already use `{workspace}` in queue name templates.
|
||||
No changes needed — flows are created within an already-existing
|
||||
workspace, so the per-workspace infrastructure is in place before
|
||||
flow start.
|
||||
|
||||
### Data ownership (`data-ownership-model.md`)
|
||||
|
||||
This spec reinforces the data ownership model: a workspace is the
|
||||
primary isolation boundary, and per-workspace queues make that
|
||||
boundary structural rather than conventional.
|
||||
|
||||
## Migration
|
||||
|
||||
### Queue naming
|
||||
|
||||
Existing deployments use global queues for workspace-scoped
|
||||
services. Migration requires:
|
||||
|
||||
1. Deploy updated services that subscribe to both global and
|
||||
per-workspace queues during a transition period.
|
||||
2. Update the gateway to route to per-workspace queues.
|
||||
3. Drain the global queues.
|
||||
4. Remove global queue subscriptions from services.
|
||||
|
||||
### `__workspaces__` bootstrap
|
||||
|
||||
On first start after migration, IAM populates the `__workspaces__`
|
||||
config namespace with entries for all existing workspaces from
|
||||
`iam_workspaces`. This seeds the config store so that
|
||||
workspace-scoped services discover existing workspaces on startup.
|
||||
|
||||
### Config push compatibility
|
||||
|
||||
The `workspace_changes` field on `ConfigPush` is additive.
|
||||
Services that don't understand it ignore it (the field defaults to
|
||||
`None`). No breaking change to the push protocol.
|
||||
|
||||
## Summary of changes
|
||||
|
||||
| Component | Change |
|
||||
|-----------|--------|
|
||||
| Queue names | Workspace-scoped services move from `request:tg:{service}` to `request:tg:{service}:{workspace}` |
|
||||
| `__workspaces__` namespace | New reserved config namespace for workspace lifecycle |
|
||||
| IAM service | Writes to `__workspaces__` on workspace create/delete |
|
||||
| Config service | Removes `_`-prefix notification suppression; generates `workspace_changes` events; moves to per-workspace queues |
|
||||
| `ConfigPush` schema | Adds `workspace_changes` field (`WorkspaceChanges` dataclass) |
|
||||
| `WorkspaceProcessor` | New base class managing per-workspace queue lifecycle |
|
||||
| `FlowProcessor` | Extends `WorkspaceProcessor` instead of `AsyncProcessor` |
|
||||
| `AsyncProcessor` | Relaxes `_`-prefix filtering to allow opt-in for infrastructure namespaces |
|
||||
| Gateway | Routes workspace-scoped requests to per-workspace queues |
|
||||
| Librarian schema | `workspace` on nested metadata becomes a service-populated storage attribute, not a caller-supplied address |
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Phase 1: Foundation — `__workspaces__` namespace and config push
|
||||
|
||||
- **`ConfigPush` schema** (`trustgraph-base/trustgraph/schema/services/config.py`): Add `WorkspaceChanges` dataclass and `workspace_changes` field.
|
||||
- **Config push serialization** (`trustgraph-base/trustgraph/messaging/translators/`): Encode/decode the new field.
|
||||
- **Config service** (`trustgraph-flow/trustgraph/config/`): Detect writes to `__workspaces__` namespace and populate `workspace_changes` on the push message. Remove `_`-prefix notification suppression.
|
||||
- **`AsyncProcessor`** (`trustgraph-base/trustgraph/base/async_processor.py`): Relax `_`-prefix filtering so handlers can opt in to infrastructure namespaces.
|
||||
- **IAM service** (`trustgraph-flow/trustgraph/iam/`): Write to `__workspaces__` config namespace on `create-workspace` and `delete-workspace`. Add bootstrap step to seed `__workspaces__` entries for existing workspaces.
|
||||
|
||||
### Phase 2: `WorkspaceProcessor` base class
|
||||
|
||||
- **New `WorkspaceProcessor`** (`trustgraph-base/trustgraph/base/workspace_processor.py`): Implements workspace discovery on startup, per-workspace queue subscribe/unsubscribe, workspace lifecycle handler registration, `on_workspace_created`/`on_workspace_deleted` hooks.
|
||||
- **`FlowProcessor`** (`trustgraph-base/trustgraph/base/flow_processor.py`): Re-parent from `AsyncProcessor` to `WorkspaceProcessor`.
|
||||
- **Verify** existing flow processors continue to work — the new layer should be transparent to them.
|
||||
|
||||
### Phase 3: Per-workspace queues for workspace-scoped services
|
||||
|
||||
- **Queue definitions** (`trustgraph-base/trustgraph/schema/`): Update queue names for librarian, config, knowledge, collection management to include `{workspace}`.
|
||||
- **Librarian** (`trustgraph-flow/trustgraph/librarian/`): Extend `WorkspaceProcessor`. Remove reliance on workspace from nested metadata objects.
|
||||
- **Knowledge service, collection management** and other workspace-scoped services: Extend `WorkspaceProcessor`.
|
||||
- **Config service**: Self-bootstrap per-workspace queues from its own store on startup; subscribe to new workspace queues when `__workspaces__` entries appear.
|
||||
|
||||
### Phase 4: Gateway routing
|
||||
|
||||
- **Gateway dispatcher manager** (`trustgraph-flow/trustgraph/gateway/dispatch/manager.py`): Route workspace-scoped services to per-workspace queues using resolved workspace. System-level services (IAM) remain on global queues.
|
||||
- **Mux** (`trustgraph-flow/trustgraph/gateway/dispatch/mux.py`): Pass workspace to `invoke_global_service` for workspace-scoped services.
|
||||
- **HTTP endpoints** (`trustgraph-flow/trustgraph/gateway/endpoint/`): Route to per-workspace queues based on URL path workspace.
|
||||
|
||||
### Phase 5: Schema cleanup
|
||||
|
||||
- **`DocumentMetadata`, `ProcessingMetadata`** (`trustgraph-base/trustgraph/schema/services/library.py`): Remove `workspace` field from nested metadata objects, or retain as a service-populated storage attribute only.
|
||||
- **Serialization** (`trustgraph-flow/trustgraph/gateway/dispatch/serialize.py`, `trustgraph-base/trustgraph/messaging/translators/metadata.py`): Update translators to match.
|
||||
- **API client** (`trustgraph-base/trustgraph/api/library.py`): Stop sending workspace in nested payloads.
|
||||
- **Librarian service** (`trustgraph-flow/trustgraph/librarian/`): Populate workspace on stored records from request context.
|
||||
|
||||
### Dependencies
|
||||
|
||||
```
|
||||
Phase 1 (foundation)
|
||||
↓
|
||||
Phase 2 (WorkspaceProcessor)
|
||||
↓
|
||||
Phase 3 (per-workspace queues) ←→ Phase 4 (gateway routing)
|
||||
↓ ↓
|
||||
Phase 5 (schema cleanup)
|
||||
```
|
||||
|
||||
Phases 3 and 4 can be developed in parallel but must be deployed together — services expecting per-workspace queues need the gateway to route to them.
|
||||
|
||||
## References
|
||||
|
||||
- [Identity and Access Management](iam.md) — workspace registry,
|
||||
authentication, and workspace resolution.
|
||||
- [IAM Contract](iam-contract.md) — resource model and workspace as
|
||||
address vs. parameter.
|
||||
- [Data Ownership and Information Separation](data-ownership-model.md)
|
||||
— workspace as isolation boundary.
|
||||
- [Config Push and Poke](config-push-poke.md) — config notification
|
||||
mechanism.
|
||||
- [Flow Blueprint Definition](flow-blueprint-definition.md) —
|
||||
`{workspace}` template variable in queue names.
|
||||
- [Flow Service Queue Lifecycle](flow-service-queue-lifecycle.md) —
|
||||
queue ownership and lifecycle model.
|
||||
Loading…
Add table
Add a link
Reference in a new issue