mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-04-25 16:36:21 +02:00

feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840)

Introduces `workspace` as the isolation boundary for config, flows,
library, and knowledge data. Removes `user` as a schema-level field
throughout the code, API specs, and tests; workspace provides the
same separation more cleanly at the trusted flow.workspace layer
rather than through client-supplied message fields.

Design
------
- IAM tech spec (docs/tech-specs/iam.md) documents current state,
  proposed auth/access model, and migration direction.
- Data ownership model (docs/tech-specs/data-ownership-model.md)
  captures the workspace/collection/flow hierarchy.

Schema + messaging
------------------
- Drop `user` field from AgentRequest/Step, GraphRagQuery,
  DocumentRagQuery, Triples/Graph/Document/Row EmbeddingsRequest,
  Sparql/Rows/Structured QueryRequest, ToolServiceRequest.
- Keep collection/workspace routing via flow.workspace at the
  service layer.
- Translators updated to not serialise/deserialise user.

API specs
---------
- OpenAPI schemas and path examples cleaned of user fields.
- Websocket async-api messages updated.
- Removed the unused parameters/User.yaml.

Services + base
---------------
- Librarian, collection manager, knowledge, config: all operations
  scoped by workspace. Config client API takes workspace as first
  positional arg.
- `flow.workspace` set at flow start time by the infrastructure;
  no longer pass-through from clients.
- Tool service drops user-personalisation passthrough.

CLI + SDK
---------
- tg-init-workspace and workspace-aware import/export.
- All tg-* commands drop user args; accept --workspace.
- Python API/SDK (flow, socket_client, async_*, explainability,
  library) drop user kwargs from every method signature.

MCP server
----------
- All tool endpoints drop user parameters; socket_manager no longer
  keyed per user.

Flow service
------------
- Closure-based topic cleanup on flow stop: only delete topics
  whose blueprint template was parameterised AND no remaining
  live flow (across all workspaces) still resolves to that topic.
  Three scopes fall out naturally from template analysis:
    * {id} -> per-flow, deleted on stop
    * {blueprint} -> per-blueprint, kept while any flow of the
      same blueprint exists
    * {workspace} -> per-workspace, kept while any flow in the
      workspace exists
    * literal -> global, never deleted (e.g. tg.request.librarian)
  Fixes a bug where stopping a flow silently destroyed the global
  librarian exchange, wedging all library operations until manual
  restart.

RabbitMQ backend
----------------
- heartbeat=60, blocked_connection_timeout=300. Catches silently
  dead connections (broker restart, orphaned channels, network
  partitions) within ~2 heartbeat windows, so the consumer
  reconnects and re-binds its queue rather than sitting forever
  on a zombie connection.

Tests
-----
- Full test refresh: unit, integration, contract, provenance.
- Dropped user-field assertions and constructor kwargs across
  ~100 test files.
- Renamed user-collection isolation tests to workspace-collection.

2026-04-21 23:23:01 +01:00

layout	title	parent
default	Flow Blueprint Definition Specification	Tech Specs

Flow Blueprint Definition Specification

Overview

A flow blueprint defines a complete dataflow pattern template in the TrustGraph system. When instantiated, it creates an interconnected network of processors that handle data ingestion, processing, storage, and querying as a unified system.

Structure

A flow blueprint definition consists of five main sections:

1. Class Section

Defines shared service processors that are instantiated once per flow blueprint within a workspace. These processors handle requests from all flow instances of that blueprint; in templates, the blueprint name is available as {class}.

"class": {
  "service-name:{class}": {
    "request": "queue-pattern:{workspace}:{class}",
    "response": "queue-pattern:{workspace}:{class}",
    "settings": {
      "setting-name": "fixed-value",
      "parameterized-setting": "{parameter-name}"
    }
  }
}

Characteristics:

Shared across all flow instances of the same class within a workspace
Typically expensive or stateless services (LLMs, embedding models)
Use {workspace} and {class} template variables for queue naming
Settings can be fixed values or parameterized with {parameter-name} syntax
Examples: embeddings:{workspace}:{class}, text-completion:{workspace}:{class}

2. Flow Section

Defines flow-specific processors that are instantiated for each individual flow instance. Each flow gets its own isolated set of these processors.

"flow": {
  "processor-name:{id}": {
    "input": "queue-pattern:{workspace}:{id}",
    "output": "queue-pattern:{workspace}:{id}",
    "settings": {
      "setting-name": "fixed-value",
      "parameterized-setting": "{parameter-name}"
    }
  }
}

Characteristics:

Unique instance per flow
Handle flow-specific data and state
Use {workspace} and {id} template variables for queue naming
Settings can be fixed values or parameterized with {parameter-name} syntax
Examples: chunker:{workspace}:{id}, pdf-decoder:{workspace}:{id}

3. Interfaces Section

Defines the entry points and interaction contracts for the flow. These form the API surface for external systems and internal component communication.

Interfaces can take two forms:

Fire-and-Forget Pattern (single queue):

"interfaces": {
  "document-load": "persistent://tg/flow/{workspace}:document-load:{id}",
  "triples-store": "persistent://tg/flow/{workspace}:triples-store:{id}"
}

Request/Response Pattern (object with request/response fields):

"interfaces": {
  "embeddings": {
    "request": "non-persistent://tg/request/{workspace}:embeddings:{class}",
    "response": "non-persistent://tg/response/{workspace}:embeddings:{class}"
  }
}

Types of Interfaces:

Entry Points: Where external systems inject data (document-load, agent)
Service Interfaces: Request/response patterns for services (embeddings, text-completion)
Data Interfaces: Fire-and-forget data flow connection points (triples-store, entity-contexts-load)
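
The two interface forms can be told apart by shape alone: a string is a single fire-and-forget queue, an object with request and response fields is a request/response pair. The sketch below illustrates that distinction only; the function name `interface_kind` is hypothetical and not part of TrustGraph.

```python
from typing import Union

InterfaceSpec = Union[str, dict]


def interface_kind(spec: InterfaceSpec) -> str:
    """Classify an interface definition by its shape (illustrative helper)."""
    if isinstance(spec, str):
        # A bare string names a single queue.
        return "fire-and-forget"
    if isinstance(spec, dict) and set(spec) == {"request", "response"}:
        # An object with exactly these two fields names a queue pair.
        return "request-response"
    raise ValueError(f"unrecognised interface definition: {spec!r}")


interfaces = {
    "document-load": "persistent://tg/flow/{workspace}:document-load:{id}",
    "embeddings": {
        "request": "non-persistent://tg/request/{workspace}:embeddings:{class}",
        "response": "non-persistent://tg/response/{workspace}:embeddings:{class}",
    },
}

kinds = {name: interface_kind(spec) for name, spec in interfaces.items()}
# kinds == {"document-load": "fire-and-forget", "embeddings": "request-response"}
```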

4. Parameters Section

Maps flow-specific parameter names to centrally-stored parameter definitions:

"parameters": {
  "model": "llm-model",
  "temp": "temperature",
  "chunk": "chunk-size"
}

Characteristics:

Keys are parameter names used in processor settings (e.g., {model})
Values reference parameter definitions stored in schema/config
Enables reuse of common parameter definitions across flows
Reduces duplication of parameter schemas

5. Metadata

Additional information about the flow blueprint:

"description": "Human-readable description",
"tags": ["capability-1", "capability-2"]

Template Variables

System Variables

{workspace}

Replaced with the workspace identifier
Isolates queue names between workspaces so that two workspaces starting the same flow do not share queues
Must be included in every queue name pattern of every blueprint template to ensure workspace isolation
Examples: ws-acme, ws-globex
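
The {workspace} requirement lends itself to a mechanical check. The following is a minimal sketch, assuming a blueprint has been loaded into a Python dict with the class, flow, and interfaces sections described earlier; the helper names `queue_patterns` and `missing_workspace` are hypothetical and not part of TrustGraph.

```python
from typing import Iterator, List, Tuple


def queue_patterns(blueprint: dict) -> Iterator[Tuple[str, str]]:
    """Yield (location, pattern) for each queue name pattern in a blueprint."""
    sections = (("class", ("request", "response")), ("flow", ("input", "output")))
    for section, keys in sections:
        for name, processor in blueprint.get(section, {}).items():
            for key in keys:
                if key in processor:
                    yield f"{section}.{name}.{key}", processor[key]
    for name, spec in blueprint.get("interfaces", {}).items():
        if isinstance(spec, str):
            yield f"interfaces.{name}", spec
        else:
            for key in ("request", "response"):
                if key in spec:
                    yield f"interfaces.{name}.{key}", spec[key]


def missing_workspace(blueprint: dict) -> List[str]:
    """Return the locations whose queue pattern lacks {workspace}."""
    return [loc for loc, pattern in queue_patterns(blueprint)
            if "{workspace}" not in pattern]


blueprint = {
    "class": {
        "embeddings:{class}": {
            "request": "non-persistent://tg/request/{workspace}:embeddings:{class}",
            # Deliberately missing {workspace} to show a reported problem.
            "response": "non-persistent://tg/response/embeddings:{class}",
        }
    },
    "flow": {
        "chunker:{id}": {
            "input": "persistent://tg/flow/{workspace}:chunk:{id}",
            "output": "persistent://tg/flow/{workspace}:chunk-load:{id}",
        }
    },
    "interfaces": {
        # Also deliberately missing {workspace}.
        "document-load": "persistent://tg/flow/document-load:{id}",
    },
}

problems = missing_workspace(blueprint)
# problems == ["class.embeddings:{class}.response", "interfaces.document-load"]
```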

{id}

Replaced with the unique flow instance identifier
Creates isolated resources for each flow
Example: flow-123, customer-A-flow

{class}

Replaced with the flow blueprint name
Creates shared resources across flows of the same class
Example: standard-rag, enterprise-rag
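
As an illustration of how the three system variables turn a template into a concrete queue name, here is a minimal sketch. It is not the TrustGraph implementation: the `expand` helper is hypothetical, and leaving unknown placeholders untouched (so that parameter variables can be resolved in a later step) is an assumption.

```python
import re

_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand(template: str, variables: dict) -> str:
    """Replace {name} placeholders found in `variables`; leave others as-is."""
    def substitute(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)
    return _PLACEHOLDER.sub(substitute, template)


system_vars = {
    "workspace": "ws-acme",
    "id": "customer-A-flow",
    "class": "standard-rag",
}

flow_queue = expand(
    "persistent://tg/flow/{workspace}:document-load:{id}", system_vars)
class_queue = expand(
    "non-persistent://tg/request/{workspace}:embeddings:{class}", system_vars)

# flow_queue  == "persistent://tg/flow/ws-acme:document-load:customer-A-flow"
# class_queue == "non-persistent://tg/request/ws-acme:embeddings:standard-rag"
```

A regular expression is used rather than `str.format` because placeholder names such as {parameter-name} may contain hyphens, which `str.format` keyword arguments cannot express.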

Parameter Variables

{parameter-name}

Custom parameters defined at flow launch time
Parameter names match keys in the flow's parameters section
Used in processor settings to customize behavior
Examples: {model}, {temp}, {chunk}
Replaced with values provided when launching the flow
Validated against centrally-stored parameter definitions
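
The lookup from a flow parameter name to its central definition, followed by validation of the launch-time value, might look like the sketch below. The format of the stored definitions is not described in this specification, so the `definitions` structure (a "type" field with values string, number, integer) and the `validate_parameters` helper are assumptions made purely for illustration.

```python
# Hypothetical definition store; the real schema is not given in this spec.
definitions = {
    "llm-model": {"type": "string"},
    "temperature": {"type": "number"},
    "chunk-size": {"type": "integer"},
}

# The blueprint's parameters section: flow parameter name -> definition name.
mapping = {"model": "llm-model", "temp": "temperature", "chunk": "chunk-size"}

# Assumed mapping from definition type names to Python types.
PY_TYPES = {"string": str, "number": (int, float), "integer": int}


def validate_parameters(mapping: dict, definitions: dict, supplied: dict) -> dict:
    """Check each supplied value against the definition its name maps to."""
    validated = {}
    for name, value in supplied.items():
        if name not in mapping:
            raise KeyError(f"unknown flow parameter: {name}")
        definition = definitions[mapping[name]]
        expected = PY_TYPES[definition["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(
                f"{name}: expected {definition['type']}, got {type(value).__name__}")
        validated[name] = value
    return validated


values = validate_parameters(
    mapping, definitions, {"model": "gpt-4", "temp": 0.5, "chunk": 512})
# values == {"model": "gpt-4", "temp": 0.5, "chunk": 512}
```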

Processor Settings

Settings provide configuration values to processors at instantiation time. They can be:

Fixed Settings

Direct values that don't change:

"settings": {
  "model": "gemma3:12b",
  "temperature": 0.7,
  "max_retries": 3
}

Parameterized Settings

Values that use parameters provided at flow launch:

"settings": {
  "model": "{model}",
  "temperature": "{temp}",
  "endpoint": "https://{region}.api.example.com"
}

Parameter names in settings correspond to keys in the flow's parameters section.
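
A settings block can mix fixed and parameterized values, so resolution has to leave the former alone and substitute only inside strings. The sketch below shows one way to do that; `resolve_settings` is a hypothetical helper, the region value is made up for the example, and producing string results for substituted values follows the expansions shown in the instantiation example rather than any confirmed implementation detail.

```python
import re

_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def resolve_settings(settings: dict, parameters: dict) -> dict:
    """Substitute {name} placeholders in string settings; pass others through."""
    def fill(value):
        if not isinstance(value, str):
            # Fixed non-string setting such as 100 or 0.7.
            return value
        # Raises KeyError if a placeholder has no supplied parameter.
        return _PLACEHOLDER.sub(lambda m: str(parameters[m.group(1)]), value)
    return {key: fill(value) for key, value in settings.items()}


settings = {
    "chunk_size": "{chunk}",
    "chunk_overlap": 100,
    "encoding": "utf-8",
    "endpoint": "https://{region}.api.example.com",
}

resolved = resolve_settings(settings, {"chunk": 512, "region": "eu-west"})
# resolved == {
#     "chunk_size": "512",
#     "chunk_overlap": 100,
#     "encoding": "utf-8",
#     "endpoint": "https://eu-west.api.example.com",
# }
```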

Settings Examples

LLM Processor with Parameters:

// In parameters section:
"parameters": {
  "model": "llm-model",
  "temp": "temperature",
  "tokens": "max-tokens",
  "key": "openai-api-key"
}

// In processor definition:
"text-completion:{class}": {
  "request": "non-persistent://tg/request/{workspace}:text-completion:{class}",
  "response": "non-persistent://tg/response/{workspace}:text-completion:{class}",
  "settings": {
    "model": "{model}",
    "temperature": "{temp}",
    "max_tokens": "{tokens}",
    "api_key": "{key}"
  }
}

Chunker with Fixed and Parameterized Settings:

// In parameters section:
"parameters": {
  "chunk": "chunk-size"
}

// In processor definition:
"chunker:{id}": {
  "input": "persistent://tg/flow/{workspace}:chunk:{id}",
  "output": "persistent://tg/flow/{workspace}:chunk-load:{id}",
  "settings": {
    "chunk_size": "{chunk}",
    "chunk_overlap": 100,
    "encoding": "utf-8"
  }
}

Queue Patterns (Pulsar)

Flow blueprints use Apache Pulsar for messaging. Queue names follow the Pulsar format:

<persistence>://<tenant>/<namespace>/<topic>

Components:

persistence: persistent or non-persistent (Pulsar persistence mode)
tenant: tg for TrustGraph-supplied flow blueprint definitions
namespace: Indicates the messaging pattern
- flow: Fire-and-forget services
- request: Request portion of request/response services
- response: Response portion of request/response services
topic: The specific queue/topic name with template variables
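
The four components can be recovered from a queue name with a single regular expression. This is a minimal sketch of the naming format described above, not code from TrustGraph; `QueueName` and `parse_queue` are hypothetical names.

```python
import re
from typing import NamedTuple


class QueueName(NamedTuple):
    persistence: str
    tenant: str
    namespace: str
    topic: str


# <persistence>://<tenant>/<namespace>/<topic>
_QUEUE = re.compile(r"^(persistent|non-persistent)://([^/]+)/([^/]+)/(.+)$")


def parse_queue(name: str) -> QueueName:
    """Split a queue name into its four components."""
    match = _QUEUE.match(name)
    if match is None:
        raise ValueError(f"not a valid queue name: {name!r}")
    return QueueName(*match.groups())


q = parse_queue("non-persistent://tg/request/{workspace}:embeddings:{class}")
# q.persistence == "non-persistent"
# q.tenant      == "tg"
# q.namespace   == "request"
# q.topic       == "{workspace}:embeddings:{class}"
```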

Persistent Queues

Pattern: persistent://tg/flow/{workspace}:<topic>:{id}
Used for fire-and-forget services and durable data flow
Data persists in Pulsar storage across restarts
Example: persistent://tg/flow/{workspace}:chunk-load:{id}

Non-Persistent Queues

Pattern: non-persistent://tg/request/{workspace}:<topic>:{class} or non-persistent://tg/response/{workspace}:<topic>:{class}
Used for request/response messaging patterns
Ephemeral, not persisted to disk by Pulsar
Lower latency, suitable for RPC-style communication
Example: non-persistent://tg/request/{workspace}:embeddings:{class}

Dataflow Architecture

The flow blueprint creates a unified dataflow where:

Document Processing Pipeline: Flows from ingestion through transformation to storage
Query Services: Integrated processors that query the same data stores and services
Shared Services: Centralized processors that all flows can utilize
Storage Writers: Persist processed data to appropriate stores

All processors (both {id} and {class}) work together as a cohesive dataflow graph, not as separate systems.

Example Flow Instantiation

Given:

Workspace: ws-acme
Flow Instance ID: customer-A-flow
Flow Blueprint: standard-rag
Flow parameter mappings:
- "model": "llm-model"
- "temp": "temperature"
- "chunk": "chunk-size"
User-provided parameters:
- model: gpt-4
- temp: 0.5
- chunk: 512

Template expansions:

persistent://tg/flow/{workspace}:chunk-load:{id} → persistent://tg/flow/ws-acme:chunk-load:customer-A-flow
non-persistent://tg/request/{workspace}:embeddings:{class} → non-persistent://tg/request/ws-acme:embeddings:standard-rag
"model": "{model}" → "model": "gpt-4"
"temperature": "{temp}" → "temperature": "0.5"
"chunk_size": "{chunk}" → "chunk_size": "512"

This creates:

Isolated document processing pipeline for customer-A-flow
Shared embedding service for all standard-rag flows in the same workspace
Complete dataflow from document ingestion through querying
Processors configured with the provided parameter values

Benefits

Resource Efficiency: Expensive services are shared across flows
Flow Isolation: Each flow has its own data processing pipeline
Scalability: Can instantiate multiple flows from the same template
Modularity: Clear separation between shared and flow-specific components
Unified Architecture: Query and processing are part of the same dataflow
