---
layout: default
title: "Technical Specification: Multi-Tenant Support"
parent: "Tech Specs"
---

# Technical Specification: Multi-Tenant Support

## Overview

Enable multi-tenant deployments by fixing the parameter name mismatches that prevent queue customization, adding Cassandra keyspace parameterization, and moving collection management onto the config service.

## Architecture Context

### Flow-Based Queue Resolution

The TrustGraph system uses a **flow-based architecture** for dynamic queue resolution, which inherently supports multi-tenancy:

- **Flow Definitions** are stored in Cassandra and specify queue names via interface definitions
- **Queue names use templates** with `{id}` variables that are replaced with flow instance IDs
- **Services dynamically resolve queues** by looking up flow configurations at request time
- **Each tenant can have unique flows** with different queue names, providing isolation

Example flow interface definition:
```json
{
  "interfaces": {
    "triples-store": "persistent://tg/flow/triples-store:{id}",
    "graph-embeddings-store": "persistent://tg/flow/graph-embeddings-store:{id}"
  }
}
```

When tenant A starts flow `tenant-a-prod` and tenant B starts flow `tenant-b-prod`, they automatically get isolated queues:
- `persistent://tg/flow/triples-store:tenant-a-prod`
- `persistent://tg/flow/triples-store:tenant-b-prod`

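The template substitution described above can be sketched in a few lines. This is an illustration only: `resolve_queue` and `resolve_interfaces` are hypothetical helper names, and the sketch assumes the `{id}` placeholder is replaced by plain string substitution, which the spec does not confirm.

```python
# Hypothetical helpers: not part of the TrustGraph codebase.
def resolve_queue(template: str, flow_id: str) -> str:
    """Substitute the flow instance ID for the {id} placeholder."""
    return template.replace("{id}", flow_id)


def resolve_interfaces(interfaces: dict, flow_id: str) -> dict:
    """Resolve every queue template in a flow interface definition."""
    return {name: resolve_queue(t, flow_id) for name, t in interfaces.items()}


# Interface definition copied from the example above.
interfaces = {
    "triples-store": "persistent://tg/flow/triples-store:{id}",
    "graph-embeddings-store": "persistent://tg/flow/graph-embeddings-store:{id}",
}

tenant_a = resolve_interfaces(interfaces, "tenant-a-prod")
tenant_b = resolve_interfaces(interfaces, "tenant-b-prod")

print(tenant_a["triples-store"])
print(tenant_b["triples-store"])
```

Because the flow ID is part of every resolved name, two flows with different IDs never share a queue.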
**Services correctly designed for multi-tenancy:**
- ✅ **Knowledge Management (cores)** - Dynamically resolves queues from flow configuration passed in requests

**Services needing fixes:**
- 🔴 **Config Service** - Parameter name mismatch prevents queue customization
- 🔴 **Librarian Service** - Hardcoded storage management topics (discussed below)
- 🔴 **All Services** - Cannot customize Cassandra keyspace

## Problem Statement

### Issue #1: Parameter Name Mismatch in AsyncProcessor
- **CLI defines:** `--config-queue` (unclear naming)
- **Argparse converts to:** `config_queue` (in params dict)
- **Code looks for:** `config_push_queue`
- **Result:** Parameter is ignored, defaults to `persistent://tg/config/config`
- **Impact:** Affects all 32+ services inheriting from AsyncProcessor
- **Blocks:** Multi-tenant deployments cannot use tenant-specific config queues
- **Solution:** Rename CLI parameter to `--config-push-queue` for clarity (breaking change acceptable since feature is currently broken)

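The mismatch is easy to reproduce with `argparse` alone. The sketch below is a standalone reproduction, not the TrustGraph code: the `parse` helper and the tenant queue name are assumptions made for illustration, and only the flag names and the default queue come from this spec.

```python
import argparse

DEFAULT_QUEUE = "persistent://tg/config/config"
CUSTOM_QUEUE = "persistent://tg-dev/config/config"  # illustrative tenant queue


def parse(flag, argv):
    """Build a one-flag parser and return the parsed values as a params dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument(flag, default=DEFAULT_QUEUE)
    return vars(parser.parse_args(argv))


# Before the fix: the flag is stored under "config_queue", so the lookup
# for "config_push_queue" silently falls back to the default.
params = parse("--config-queue", ["--config-queue", CUSTOM_QUEUE])
before = params.get("config_push_queue", DEFAULT_QUEUE)

# After the rename: argparse derives "config_push_queue" from the flag name,
# which is the key the code already looks up.
params = parse("--config-push-queue", ["--config-push-queue", CUSTOM_QUEUE])
after = params.get("config_push_queue", DEFAULT_QUEUE)

print(before)
print(after)
```

No error is raised in the broken case, which is why the problem goes unnoticed until a deployment needs a non-default queue.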
### Issue #2: Parameter Name Mismatch in Config Service
- **CLI defines:** `--push-queue` (ambiguous naming)
- **Argparse converts to:** `push_queue` (in params dict)
- **Code looks for:** `config_push_queue`
- **Result:** Parameter is ignored
- **Impact:** Config service cannot use custom push queue
- **Solution:** Rename CLI parameter to `--config-push-queue` for consistency and clarity (breaking change acceptable)

### Issue #3: Hardcoded Cassandra Keyspace
- **Current:** Keyspace hardcoded as `"config"`, `"knowledge"`, `"librarian"` in various services
- **Result:** Cannot customize keyspace for multi-tenant deployments
- **Impact:** Config, cores, and librarian services
- **Blocks:** Multiple tenants cannot use separate Cassandra keyspaces

### Issue #4: Collection Management Architecture ✅ COMPLETED
- **Previous:** Collections stored in Cassandra librarian keyspace via separate collections table
- **Previous:** Librarian used 4 hardcoded storage management topics to coordinate collection create/delete:
  - `vector_storage_management_topic`
  - `object_storage_management_topic`
  - `triples_storage_management_topic`
  - `storage_management_response_topic`
- **Problems (Resolved):**
  - Hardcoded topics could not be customized for multi-tenant deployments
  - Complex async coordination between librarian and 4+ storage services
  - Separate Cassandra table and management infrastructure
  - Non-persistent request/response queues for critical operations
- **Solution Implemented:** Migrated collections to config service storage, use config push for distribution
- **Status:** All storage backends migrated to `CollectionConfigHandler` pattern

## Solution

This spec addresses Issues #1, #2, #3, and #4.

### Part 1: Fix Parameter Name Mismatches

#### Change 1: AsyncProcessor Base Class - Rename CLI Parameter
**File:** `trustgraph-base/trustgraph/base/async_processor.py`
**Lines:** 260-264

**Current:**
```python
parser.add_argument(
    '--config-queue',
    default=default_config_queue,
    help=f'Config push queue {default_config_queue}',
)
```

**Fixed:**
```python
parser.add_argument(
    '--config-push-queue',
    default=default_config_queue,
    help=f'Config push queue (default: {default_config_queue})',
)
```

**Rationale:**
- Clearer, more explicit naming
- Matches the internal variable name `config_push_queue`
- Breaking change acceptable since feature is currently non-functional
- No code change needed in `params.get()` - it already looks for the correct name

#### Change 2: Config Service - Rename CLI Parameter
**File:** `trustgraph-flow/trustgraph/config/service/service.py`
**Lines:** 276-279

**Current:**
```python
parser.add_argument(
    '--push-queue',
    default=default_config_push_queue,
    help=f'Config push queue (default: {default_config_push_queue})'
)
```

**Fixed:**
```python
parser.add_argument(
    '--config-push-queue',
    default=default_config_push_queue,
    help=f'Config push queue (default: {default_config_push_queue})'
)
```

**Rationale:**
- Clearer naming - "config-push-queue" is more explicit than just "push-queue"
- Matches the internal variable name `config_push_queue`
- Consistent with AsyncProcessor's `--config-push-queue` parameter
- Breaking change acceptable since feature is currently non-functional
- No code change needed in `params.get()` - it already looks for the correct name

### Part 2: Add Cassandra Keyspace Parameterization

#### Change 3: Add Keyspace Parameter to cassandra_config Module
**File:** `trustgraph-base/trustgraph/base/cassandra_config.py`

**Add CLI argument** (in `add_cassandra_args()` function):
```python
parser.add_argument(
    '--cassandra-keyspace',
    default=None,
    help='Cassandra keyspace (default: service-specific)'
)
```

**Add environment variable support** (in `resolve_cassandra_config()` function):
```python
# argparse stores None when --cassandra-keyspace is omitted, so the key is
# always present in params. Test for a value instead of relying on the
# dict.get() default, otherwise CASSANDRA_KEYSPACE would never be consulted.
keyspace = (
    params.get("cassandra_keyspace")
    or os.environ.get("CASSANDRA_KEYSPACE")
    or default_keyspace
)
```

**Update signature and return value** of `resolve_cassandra_config()`:
- Add a `default_keyspace` keyword parameter, which Changes 4-6 use to pass the service-specific default
- Currently returns: `(hosts, username, password)`
- Change to return: `(hosts, username, password, keyspace)`

**Rationale:**
- Consistent with existing Cassandra configuration pattern
- Available to all services via `add_cassandra_args()`
- Supports both CLI and environment variable configuration

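The intended precedence (CLI flag, then `CASSANDRA_KEYSPACE`, then the service default) can be checked in isolation. The sketch below is an assumption-laden stand-in for the keyspace part of `resolve_cassandra_config()`: the `resolve_keyspace` name and its injectable `environ` argument are hypothetical, introduced only so the behaviour can be exercised without touching the real environment.

```python
import argparse
import os


def resolve_keyspace(params, default_keyspace, environ=os.environ):
    """Hypothetical helper: CLI value wins, then the env var, then the default."""
    return (
        params.get("cassandra_keyspace")
        or environ.get("CASSANDRA_KEYSPACE")
        or default_keyspace
    )


parser = argparse.ArgumentParser()
parser.add_argument("--cassandra-keyspace", default=None)

# Omitting the flag still produces the key, with a value of None.
no_flag = vars(parser.parse_args([]))
with_flag = vars(parser.parse_args(["--cassandra-keyspace", "tg_dev_config"]))

env = {"CASSANDRA_KEYSPACE": "tg_env_config"}

print(resolve_keyspace(no_flag, "config", environ={}))
print(resolve_keyspace(no_flag, "config", environ=env))
print(resolve_keyspace(with_flag, "config", environ=env))
```

The first call falls through to the service default, the second picks up the environment variable, and the third shows the CLI flag overriding both.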
#### Change 4: Config Service - Use Parameterized Keyspace
**File:** `trustgraph-flow/trustgraph/config/service/service.py`

**Line 30** - Remove hardcoded keyspace:
```python
# DELETE THIS LINE:
keyspace = "config"
```

**Lines 69-73** - Update cassandra config resolution:

**Current:**
```python
cassandra_host, cassandra_username, cassandra_password = \
    resolve_cassandra_config(params)
```

**Fixed:**
```python
cassandra_host, cassandra_username, cassandra_password, keyspace = \
    resolve_cassandra_config(params, default_keyspace="config")
```

**Rationale:**
- Maintains backward compatibility with "config" as default
- Allows override via `--cassandra-keyspace` or `CASSANDRA_KEYSPACE`

#### Change 5: Cores/Knowledge Service - Use Parameterized Keyspace
**File:** `trustgraph-flow/trustgraph/cores/service.py`

**Line 37** - Remove hardcoded keyspace:
```python
# DELETE THIS LINE:
keyspace = "knowledge"
```

**Update cassandra config resolution** (similar location as config service):
```python
cassandra_host, cassandra_username, cassandra_password, keyspace = \
    resolve_cassandra_config(params, default_keyspace="knowledge")
```

#### Change 6: Librarian Service - Use Parameterized Keyspace
**File:** `trustgraph-flow/trustgraph/librarian/service.py`

**Line 51** - Remove hardcoded keyspace:
```python
# DELETE THIS LINE:
keyspace = "librarian"
```

**Update cassandra config resolution** (similar location as config service):
```python
cassandra_host, cassandra_username, cassandra_password, keyspace = \
    resolve_cassandra_config(params, default_keyspace="librarian")
```

### Part 3: Migrate Collection Management to Config Service

#### Overview
Migrate collections from Cassandra librarian keyspace to config service storage. This eliminates hardcoded storage management topics and simplifies the architecture by using the existing config push mechanism for distribution.

#### Current Architecture
```
API Request → Gateway → Librarian Service
                            ↓
                    CollectionManager
                            ↓
    Cassandra Collections Table (librarian keyspace)
                            ↓
    Broadcast to 4 Storage Management Topics (hardcoded)
                            ↓
        Wait for 4+ Storage Service Responses
                            ↓
                    Response to Gateway
```

#### New Architecture
```
API Request → Gateway → Librarian Service
                            ↓
                    CollectionManager
                            ↓
    Config Service API (put/delete/getvalues)
                            ↓
    Cassandra Config Table (class='collections', key='user:collection')
                            ↓
    Config Push (to all subscribers on config-push-queue)
                            ↓
    All Storage Services receive config update independently
```

#### Change 7: Collection Manager - Use Config Service API
**File:** `trustgraph-flow/trustgraph/librarian/collection_manager.py`

**Remove:**
- `LibraryTableStore` usage (Lines 33, 40-41)
- Storage management producers initialization (Lines 86-140)
- `on_storage_response` method (Lines 400-430)
- `pending_deletions` tracking (Lines 57, 90-96, and usage throughout)

**Add:**
- Config service client for API calls (request/response pattern)

**Config Client Setup:**
```python
# In __init__, add config request/response producers/consumers
from trustgraph.schema.services.config import ConfigRequest, ConfigResponse

# Producer for config requests
self.config_request_producer = Producer(
    client=pulsar_client,
    topic=config_request_queue,
    schema=ConfigRequest,
)

# Consumer for config responses (with correlation ID)
self.config_response_consumer = Consumer(
    taskgroup=taskgroup,
    client=pulsar_client,
    flow=None,
    topic=config_response_queue,
    subscriber=f"{id}-config",
    schema=ConfigResponse,
    handler=self.on_config_response,
)

# Tracking for pending config requests
self.pending_config_requests = {}  # request_id -> asyncio.Event
```

**Modify `list_collections` (Lines 145-180):**
```python
import asyncio
import json
import uuid

async def list_collections(self, user, tag_filter=None, limit=None):
    """List collections from config service"""
    # Send getvalues request to config service
    request = ConfigRequest(
        id=str(uuid.uuid4()),
        operation='getvalues',
        type='collections',
    )

    # Send request and wait for response
    response = await self.send_config_request(request)

    # Parse collections from response, keeping only this user's entries
    collections = []
    for key, value_json in response.values.items():
        if ":" in key:
            coll_user, collection = key.split(":", 1)
            if coll_user == user:
                metadata = json.loads(value_json)
                collections.append(CollectionMetadata(**metadata))

    # Apply tag filtering in-memory (as before)
    if tag_filter:
        collections = [c for c in collections if any(tag in c.tags for tag in tag_filter)]

    # Apply limit
    if limit:
        collections = collections[:limit]

    return collections

async def send_config_request(self, request):
    """Send config request and wait for response"""
    event = asyncio.Event()
    self.pending_config_requests[request.id] = event

    await self.config_request_producer.send(request)
    await event.wait()

    # Drop the event entry as well as the response so the dict does not grow
    self.pending_config_requests.pop(request.id, None)
    return self.pending_config_requests.pop(request.id + "_response")

async def on_config_response(self, message, consumer, flow):
    """Handle config response"""
    response = message.value()
    # Ignore responses that do not match a request sent by this instance
    if response.id in self.pending_config_requests:
        self.pending_config_requests[response.id + "_response"] = response
        self.pending_config_requests[response.id].set()
```

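The request/response correlation above waits indefinitely if the config service never answers. One possible variation, shown here purely as a sketch and not as the implemented design, keeps one `asyncio` future per request ID and bounds the wait with `asyncio.wait_for`. The `PendingRequests` class, the `fake_send` stand-in for the Pulsar producer, and the 5 second timeout are all assumptions made for illustration.

```python
import asyncio
import uuid


class PendingRequests:
    """Hypothetical correlation table: request ID -> future holding the response."""

    def __init__(self):
        self._futures = {}

    async def call(self, send, request_id, timeout=5.0):
        """Send a request and wait (bounded) for the matching response."""
        future = asyncio.get_running_loop().create_future()
        self._futures[request_id] = future
        try:
            await send(request_id)
            return await asyncio.wait_for(future, timeout)
        finally:
            # Always forget the request, whether it succeeded or timed out.
            self._futures.pop(request_id, None)

    def resolve(self, request_id, response):
        """Called by the response handler; unknown IDs are ignored."""
        future = self._futures.get(request_id)
        if future is not None and not future.done():
            future.set_result(response)


async def demo():
    pending = PendingRequests()

    async def fake_send(request_id):
        # Stand-in for the config service: reply shortly after the send.
        reply = {"id": request_id, "values": {}}
        asyncio.get_running_loop().call_later(0.01, pending.resolve, request_id, reply)

    request_id = str(uuid.uuid4())
    response = await pending.call(fake_send, request_id)
    return request_id, response, len(pending._futures)


request_id, response, leftover = asyncio.run(demo())
print(response["id"] == request_id, leftover)
```

A timeout turns a lost response into an `asyncio.TimeoutError` that the caller can report, and the `finally` clause keeps the table from accumulating abandoned entries.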
**Modify `update_collection` (Lines 182-312):**
```python
async def update_collection(self, user, collection, name, description, tags):
    """Update collection via config service"""
    # Create metadata
    metadata = CollectionMetadata(
        user=user,
        collection=collection,
        name=name,
        description=description,
        tags=tags,
    )

    # Send put request to config service
    request = ConfigRequest(
        id=str(uuid.uuid4()),
        operation='put',
        type='collections',
        key=f'{user}:{collection}',
        value=json.dumps(metadata.to_dict()),
    )

    response = await self.send_config_request(request)

    if response.error:
        raise RuntimeError(f"Config update failed: {response.error.message}")

    # Config service will trigger config push automatically
    # Storage services will receive update and create collections
```

**Modify `delete_collection` (Lines 314-398):**
```python
async def delete_collection(self, user, collection):
    """Delete collection via config service"""
    # Send delete request to config service
    request = ConfigRequest(
        id=str(uuid.uuid4()),
        operation='delete',
        type='collections',
        key=f'{user}:{collection}',
    )

    response = await self.send_config_request(request)

    if response.error:
        raise RuntimeError(f"Config delete failed: {response.error.message}")

    # Config service will trigger config push automatically
    # Storage services will receive update and delete collections
```

**Collection Metadata Format:**
- Stored in config table as: `class='collections', key='user:collection'`
- Value is JSON-serialized CollectionMetadata (without timestamp fields)
- Fields: `user`, `collection`, `name`, `description`, `tags`
- Example: `class='collections', key='alice:my-docs', value='{"user":"alice","collection":"my-docs","name":"My Documents","description":"...","tags":["work"]}'`

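The `user:collection` key format can be exercised with a small round-trip sketch. The helper names (`make_key`, `parse_key`, `collections_for`) are hypothetical, and rejecting a `:` in the user name is an assumption added here: the spec splits on the first colon only, so a user name containing one would be parsed ambiguously.

```python
import json


def make_key(user, collection):
    """Build the config key; a ':' in the user name would break parsing."""
    if ":" in user:
        raise ValueError("user must not contain ':'")
    return f"{user}:{collection}"


def parse_key(key):
    """Split on the first ':' only, mirroring key.split(':', 1) in the spec."""
    user, sep, collection = key.partition(":")
    if not sep:
        return None
    return user, collection


def collections_for(values, user):
    """Return the decoded metadata of every collection owned by `user`."""
    result = []
    for key, value_json in values.items():
        parsed = parse_key(key)
        if parsed and parsed[0] == user:
            result.append(json.loads(value_json))
    return result


# Illustrative config values, shaped like the example entry above.
values = {
    make_key("alice", "my-docs"): json.dumps(
        {"user": "alice", "collection": "my-docs", "name": "My Documents",
         "description": "...", "tags": ["work"]}
    ),
    make_key("bob", "notes:2024"): json.dumps(
        {"user": "bob", "collection": "notes:2024", "name": "Notes",
         "description": "...", "tags": []}
    ),
}

print(parse_key("bob:notes:2024"))
print([c["collection"] for c in collections_for(values, "alice")])
```

Collection names may contain colons without ambiguity because only the first colon is treated as the separator.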
#### Change 8: Librarian Service - Remove Storage Management Infrastructure
**File:** `trustgraph-flow/trustgraph/librarian/service.py`

**Remove:**
- Storage management producers (Lines 173-190):
  - `vector_storage_management_producer`
  - `object_storage_management_producer`
  - `triples_storage_management_producer`
- Storage response consumer (Lines 192-201)
- `on_storage_response` handler (Lines 467-473)

**Modify:**
- CollectionManager initialization (Lines 215-224) - remove storage producer parameters

**Note:** External collection API remains unchanged:
- `list-collections`
- `update-collection`
- `delete-collection`

#### Change 9: Remove Collections Table from LibraryTableStore
**File:** `trustgraph-flow/trustgraph/tables/library.py`

**Delete:**
- Collections table CREATE statement (Lines 114-127)
- Collections prepared statements (Lines 205-240)
- All collection methods (Lines 578-717):
  - `ensure_collection_exists`
  - `list_collections`
  - `update_collection`
  - `delete_collection`
  - `get_collection`
  - `create_collection`

**Rationale:**
- Collections now stored in config table
- Breaking change acceptable - no data migration needed
- Simplifies librarian service significantly

#### Change 10: Storage Services - Config-Based Collection Management ✅ COMPLETED

**Status:** All 11 storage backends have been migrated to use `CollectionConfigHandler`.

**Affected Services (11 total):**
- Document embeddings: milvus, pinecone, qdrant
- Graph embeddings: milvus, pinecone, qdrant
- Object storage: cassandra
- Triples storage: cassandra, falkordb, memgraph, neo4j

**Files:**
- `trustgraph-flow/trustgraph/storage/doc_embeddings/milvus/write.py`
- `trustgraph-flow/trustgraph/storage/doc_embeddings/pinecone/write.py`
- `trustgraph-flow/trustgraph/storage/doc_embeddings/qdrant/write.py`
- `trustgraph-flow/trustgraph/storage/graph_embeddings/milvus/write.py`
- `trustgraph-flow/trustgraph/storage/graph_embeddings/pinecone/write.py`
- `trustgraph-flow/trustgraph/storage/graph_embeddings/qdrant/write.py`
- `trustgraph-flow/trustgraph/storage/objects/cassandra/write.py`
- `trustgraph-flow/trustgraph/storage/triples/cassandra/write.py`
- `trustgraph-flow/trustgraph/storage/triples/falkordb/write.py`
- `trustgraph-flow/trustgraph/storage/triples/memgraph/write.py`
- `trustgraph-flow/trustgraph/storage/triples/neo4j/write.py`

**Implementation Pattern (all services):**

1. **Register config handler in `__init__`:**
   ```python
   # Add after AsyncProcessor initialization
   self.register_config_handler(self.on_collection_config)
   self.known_collections = set()  # Track (user, collection) tuples
   ```

2. **Implement config handler:**
   ```python
   async def on_collection_config(self, config, version):
       """Handle collection configuration updates"""
       logger.info(f"Collection config version: {version}")

       if "collections" not in config:
           return

       # Parse collections from config
       # Key format: "user:collection" in config["collections"]
       config_collections = set()
       for key in config["collections"].keys():
           if ":" in key:
               user, collection = key.split(":", 1)
               config_collections.add((user, collection))

       # Determine changes
       to_create = config_collections - self.known_collections
       to_delete = self.known_collections - config_collections

       # Create new collections (idempotent)
       for user, collection in to_create:
           try:
               await self.create_collection_internal(user, collection)
               self.known_collections.add((user, collection))
               logger.info(f"Created collection: {user}/{collection}")
           except Exception as e:
               logger.error(f"Failed to create {user}/{collection}: {e}")

       # Delete removed collections (idempotent)
       for user, collection in to_delete:
           try:
               await self.delete_collection_internal(user, collection)
               self.known_collections.discard((user, collection))
               logger.info(f"Deleted collection: {user}/{collection}")
           except Exception as e:
               logger.error(f"Failed to delete {user}/{collection}: {e}")
   ```

3. **Initialize known collections on startup:**
   ```python
   async def start(self):
       """Start the processor"""
       await super().start()
       await self.sync_known_collections()

   async def sync_known_collections(self):
       """Query backend to populate known_collections set"""
       # Backend-specific implementation:
       # - Milvus/Pinecone/Qdrant: List collections/indexes matching naming pattern
       # - Cassandra: Query keyspaces or collection metadata
       # - Neo4j/Memgraph/FalkorDB: Query CollectionMetadata nodes
       pass
   ```

4. **Refactor existing handler methods:**
   ```python
   # Rename and remove response sending:
   # handle_create_collection → create_collection_internal
   # handle_delete_collection → delete_collection_internal

   async def create_collection_internal(self, user, collection):
       """Create collection (idempotent)"""
       # Same logic as current handle_create_collection
       # But remove response producer calls
       # Handle "already exists" gracefully
       pass

   async def delete_collection_internal(self, user, collection):
       """Delete collection (idempotent)"""
       # Same logic as current handle_delete_collection
       # But remove response producer calls
       # Handle "not found" gracefully
       pass
   ```

5. **Remove storage management infrastructure:**
   - Remove `self.storage_request_consumer` setup and start
   - Remove `self.storage_response_producer` setup
   - Remove `on_storage_management` dispatcher method
   - Remove metrics for storage management
   - Remove imports: `StorageManagementRequest`, `StorageManagementResponse`

**Backend-Specific Considerations:**

- **Vector stores (Milvus, Pinecone, Qdrant):** Track logical `(user, collection)` in `known_collections`, but may create multiple backend collections per dimension. Continue lazy creation pattern. Delete operations must remove all dimension variants.

- **Cassandra Objects:** Collections are row properties, not structures. Track keyspace-level information.

- **Graph stores (Neo4j, Memgraph, FalkorDB):** Query `CollectionMetadata` nodes on startup. Create/delete metadata nodes on sync.

- **Cassandra Triples:** Use `KnowledgeGraph` API for collection operations.

**Key Design Points:**

- **Eventual consistency:** No request/response mechanism, config push is broadcast
- **Idempotency:** All create/delete operations must be safe to retry
- **Error handling:** Log errors but don't block config updates
- **Self-healing:** Failed operations will retry on next config push
- **Collection key format:** `"user:collection"` in `config["collections"]`

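The self-healing property follows from how the handler updates `known_collections`: an entry is added or removed only after the backend call succeeds, so a failed operation reappears in the set difference on the next push. The sketch below isolates that logic; `reconcile`, the in-memory `create`/`delete` callbacks, and the simulated one-off failure are assumptions for illustration, not the storage service code.

```python
import asyncio


async def reconcile(desired, known, create, delete):
    """Bring `known` towards `desired`; failures are logged and left for the next pass."""
    for item in sorted(desired - known):
        try:
            await create(*item)
            known.add(item)  # only recorded after the backend call succeeds
        except Exception as exc:
            print(f"create failed for {item}: {exc}")
    for item in sorted(known - desired):
        try:
            await delete(*item)
            known.discard(item)
        except Exception as exc:
            print(f"delete failed for {item}: {exc}")


async def demo():
    known = {("alice", "old")}
    desired = {("alice", "docs"), ("bob", "notes")}
    fail_once = {("bob", "notes")}  # simulate a transient backend error
    log = []

    async def create(user, collection):
        if (user, collection) in fail_once:
            fail_once.discard((user, collection))
            raise RuntimeError("backend unavailable")
        log.append(("create", user, collection))

    async def delete(user, collection):
        log.append(("delete", user, collection))

    await reconcile(desired, known, create, delete)  # first config push
    first_pass = set(known)
    await reconcile(desired, known, create, delete)  # next config push
    return first_pass, set(known), log


first_pass, second_pass, log = asyncio.run(demo())
print(first_pass)
print(second_pass)
```

After the first pass only the successful create is recorded; the second pass retries the failed one without repeating work that already succeeded.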
#### Change 11: Update Collection Schema - Remove Timestamps
**File:** `trustgraph-base/trustgraph/schema/services/collection.py`

**Modify CollectionMetadata (Lines 13-21):**
Remove `created_at` and `updated_at` fields:
```python
class CollectionMetadata(Record):
    user = String()
    collection = String()
    name = String()
    description = String()
    tags = Array(String())
    # Remove: created_at = String()
    # Remove: updated_at = String()
```

**Modify CollectionManagementRequest (Lines 25-47):**
Remove the `created_at` and `updated_at` fields (the request-level `timestamp` field stays):
```python
class CollectionManagementRequest(Record):
    operation = String()
    user = String()
    collection = String()
    timestamp = String()
    name = String()
    description = String()
    tags = Array(String())
    # Remove: created_at = String()
    # Remove: updated_at = String()
    tag_filter = Array(String())
    limit = Integer()
```

**Rationale:**
- Timestamps don't add value for collections
- Config service maintains its own version tracking
- Simplifies schema and reduces storage

#### Benefits of Config Service Migration

1. ✅ **Eliminates hardcoded storage management topics** - Solves multi-tenant blocker
2. ✅ **Simpler coordination** - No complex async waiting for 4+ storage responses
3. ✅ **Eventual consistency** - Storage services update independently via config push
4. ✅ **Better reliability** - Persistent config push vs non-persistent request/response
5. ✅ **Unified configuration model** - Collections treated as configuration
6. ✅ **Reduces complexity** - Removes ~300 lines of coordination code
7. ✅ **Multi-tenant ready** - Config already supports tenant isolation via keyspace
8. ✅ **Version tracking** - Config service version mechanism provides audit trail

## Implementation Notes

### Backward Compatibility

**Parameter Changes:**
- CLI parameter renames are breaking changes but acceptable (feature currently non-functional)
- Services work without parameters (use defaults)
- Default keyspaces preserved: "config", "knowledge", "librarian"
- Default queue: `persistent://tg/config/config`

**Collection Management:**
- **Breaking change:** Collections table removed from librarian keyspace
- **No data migration provided** - acceptable for this phase
- External collection API unchanged (list/update/delete operations)
- Collection metadata format simplified (timestamps removed)

### Testing Requirements

**Parameter Testing:**
1. Verify `--config-push-queue` parameter works on graph-embeddings service
2. Verify `--config-push-queue` parameter works on text-completion service
3. Verify `--config-push-queue` parameter works on config service
4. Verify `--cassandra-keyspace` parameter works for config service
5. Verify `--cassandra-keyspace` parameter works for cores service
6. Verify `--cassandra-keyspace` parameter works for librarian service
7. Verify services work without parameters (uses defaults)
8. Verify multi-tenant deployment with custom queue names and keyspace

**Collection Management Testing:**
9. Verify `list-collections` operation via config service
10. Verify `update-collection` creates/updates in config table
11. Verify `delete-collection` removes from config table
12. Verify config push is triggered on collection updates
13. Verify tag filtering works with config-based storage
14. Verify collection operations work without timestamp fields

### Multi-Tenant Deployment Example
```bash
# Tenant: tg-dev
graph-embeddings \
    -p pulsar+ssl://broker:6651 \
    --pulsar-api-key <KEY> \
    --config-push-queue persistent://tg-dev/config/config

config-service \
    -p pulsar+ssl://broker:6651 \
    --pulsar-api-key <KEY> \
    --config-push-queue persistent://tg-dev/config/config \
    --cassandra-keyspace tg_dev_config
```

## Impact Analysis

### Services Affected by Changes 1-2 (CLI Parameter Rename)
All services inheriting from AsyncProcessor or FlowProcessor:
- config-service
- cores-service
- librarian-service
- graph-embeddings
- document-embeddings
- text-completion-* (all providers)
- extract-* (all extractors)
- query-* (all query services)
- retrieval-* (all RAG services)
- storage-* (all storage services)
- And 20+ more services

### Services Affected by Changes 3-6 (Cassandra Keyspace)
- config-service
- cores-service
- librarian-service

### Services Affected by Changes 7-11 (Collection Management)

**Immediate Changes:**
- librarian-service (collection_manager.py, service.py)
- tables/library.py (collections table removal)
- schema/services/collection.py (timestamp removal)

**Completed Changes (Change 10):** ✅
- All storage services (11 total) - migrated to config push for collection updates via `CollectionConfigHandler`
- Storage management schema removed from `storage.py`

## Future Considerations

### Per-User Keyspace Model

Some services use **per-user keyspaces** dynamically, where each user gets their own Cassandra keyspace:

**Services with per-user keyspaces:**
1. **Triples Query Service** (`trustgraph-flow/trustgraph/query/triples/cassandra/service.py:65`)
   - Uses `keyspace=query.user`
2. **Objects Query Service** (`trustgraph-flow/trustgraph/query/objects/cassandra/service.py:479`)
   - Uses `keyspace=self.sanitize_name(user)`
3. **KnowledgeGraph Direct Access** (`trustgraph-flow/trustgraph/direct/cassandra_kg.py:18`)
   - Default parameter `keyspace="trustgraph"`

**Status:** These are **not modified** in this specification.

**Future Review Required:**
- Evaluate whether per-user keyspace model creates tenant isolation issues
- Consider if multi-tenant deployments need keyspace prefix patterns (e.g., `tenant_a_user1`)
- Review for potential user ID collision across tenants
- Assess if single shared keyspace per tenant with user-based row isolation is preferable

**Note:** This does not block the current multi-tenant implementation but should be reviewed before production multi-tenant deployments.

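To make the keyspace prefix idea concrete, here is one way a tenant-prefixed name such as `tenant_a_user1` could be derived. This is a sketch of an option under review, not a decided design: the `tenant_keyspace` function is hypothetical, and the 48-character limit and leading-letter rule are assumptions about Cassandra keyspace naming that should be checked against the Cassandra documentation for the deployed version.

```python
import re

MAX_KEYSPACE_LENGTH = 48  # assumed Cassandra limit; verify before relying on it


def tenant_keyspace(tenant, user):
    """Hypothetical naming scheme: '<tenant>_<user>' reduced to [a-z0-9_]."""
    raw = f"{tenant}_{user}".lower()
    name = re.sub(r"[^a-z0-9_]", "_", raw)
    if not name[0].isalpha():
        # Assumed rule: start the keyspace name with a letter.
        name = "t_" + name
    if len(name) > MAX_KEYSPACE_LENGTH:
        raise ValueError(f"keyspace name too long: {name!r}")
    return name


print(tenant_keyspace("tenant-a", "user1"))
print(tenant_keyspace("tenant-b", "user1"))
```

The same user ID under two tenants now maps to two keyspaces, which addresses the collision concern listed above. Note that replacing characters is lossy (`tenant-a` and `tenant_a` produce the same prefix), so tenant IDs would need to be unique after sanitization.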
## Implementation Phases

### Phase 1: Parameter Fixes (Changes 1-6)
- Fix `--config-push-queue` parameter naming
- Add `--cassandra-keyspace` parameter support
- **Outcome:** Multi-tenant queue and keyspace configuration enabled

### Phase 2: Collection Management Migration (Changes 7-9, 11)
- Migrate collection storage to config service
- Remove collections table from librarian
- Update collection schema (remove timestamps)
- **Outcome:** Eliminates hardcoded storage management topics, simplifies librarian

### Phase 3: Storage Service Updates (Change 10) ✅ COMPLETED
- Updated all storage services to use config push for collections via `CollectionConfigHandler`
- Removed storage management request/response infrastructure
- Removed legacy schema definitions
- **Outcome:** Complete config-based collection management achieved

## References
- GitHub Issue: https://github.com/trustgraph-ai/trustgraph/issues/582
- Related Files:
  - `trustgraph-base/trustgraph/base/async_processor.py`
  - `trustgraph-base/trustgraph/base/cassandra_config.py`
  - `trustgraph-base/trustgraph/schema/core/topic.py`
  - `trustgraph-base/trustgraph/schema/services/collection.py`
  - `trustgraph-flow/trustgraph/config/service/service.py`
  - `trustgraph-flow/trustgraph/cores/service.py`
  - `trustgraph-flow/trustgraph/librarian/service.py`
  - `trustgraph-flow/trustgraph/librarian/collection_manager.py`
  - `trustgraph-flow/trustgraph/tables/library.py`