mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 00:16:23 +02:00
Basic multitenant support (#583)
* Tech spec
* Address multi-tenant queue option problems in CLI
* Modified collection service to use config
* Changed storage management to use the config service definition
parent
789d9713a0
commit
7d07f802a8
28 changed files with 1416 additions and 1731 deletions
768
docs/tech-specs/multi-tenant-support.md
Normal file
|
|
@ -0,0 +1,768 @@
|
|||
# Technical Specification: Multi-Tenant Support
|
||||
|
||||
## Overview
|
||||
|
||||
Enable multi-tenant deployments by fixing parameter name mismatches that prevent queue customization and adding Cassandra keyspace parameterization.
|
||||
|
||||
## Architecture Context
|
||||
|
||||
### Flow-Based Queue Resolution
|
||||
|
||||
The TrustGraph system uses a **flow-based architecture** for dynamic queue resolution, which inherently supports multi-tenancy:
|
||||
|
||||
- **Flow Definitions** are stored in Cassandra and specify queue names via interface definitions
|
||||
- **Queue names use templates** with `{id}` variables that are replaced with flow instance IDs
|
||||
- **Services dynamically resolve queues** by looking up flow configurations at request time
|
||||
- **Each tenant can have unique flows** with different queue names, providing isolation
|
||||
|
||||
Example flow interface definition:
|
||||
```json
{
  "interfaces": {
    "triples-store": "persistent://tg/flow/triples-store:{id}",
    "graph-embeddings-store": "persistent://tg/flow/graph-embeddings-store:{id}"
  }
}
```
|
||||
|
||||
When tenant A starts flow `tenant-a-prod` and tenant B starts flow `tenant-b-prod`, they automatically get isolated queues:
|
||||
- `persistent://tg/flow/triples-store:tenant-a-prod`
|
||||
- `persistent://tg/flow/triples-store:tenant-b-prod`
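
The template substitution above can be sketched in a few lines. This is a minimal illustration rather than TrustGraph's actual resolver: `resolve_queue` is a hypothetical helper, and the sketch assumes `{id}` is the only template variable.

```python
def resolve_queue(template: str, flow_id: str) -> str:
    """Substitute a flow instance ID into a queue-name template.

    Hypothetical helper for illustration; assumes `{id}` is the only variable.
    """
    return template.replace("{id}", flow_id)


template = "persistent://tg/flow/triples-store:{id}"
queue_a = resolve_queue(template, "tenant-a-prod")
queue_b = resolve_queue(template, "tenant-b-prod")
print(queue_a)
print(queue_b)
```

Because the flow ID is part of the queue name, two flows started from the same definition never share a queue as long as their IDs differ.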
|
||||
|
||||
**Services correctly designed for multi-tenancy:**
|
||||
- ✅ **Knowledge Management (cores)** - Dynamically resolves queues from flow configuration passed in requests
|
||||
|
||||
**Services needing fixes:**
|
||||
- 🔴 **Config Service** - Parameter name mismatch prevents queue customization
|
||||
- 🔴 **Librarian Service** - Hardcoded storage management topics (discussed below)
|
||||
- 🔴 **All Services** - Cannot customize Cassandra keyspace
|
||||
|
||||
## Problem Statement
|
||||
|
||||
### Issue #1: Parameter Name Mismatch in AsyncProcessor
|
||||
- **CLI defines:** `--config-queue` (unclear naming)
|
||||
- **Argparse converts to:** `config_queue` (in params dict)
|
||||
- **Code looks for:** `config_push_queue`
|
||||
- **Result:** Parameter is ignored, defaults to `persistent://tg/config/config`
|
||||
- **Impact:** Affects all 32+ services inheriting from AsyncProcessor
|
||||
- **Blocks:** Multi-tenant deployments cannot use tenant-specific config queues
|
||||
- **Solution:** Rename CLI parameter to `--config-push-queue` for clarity (breaking change acceptable since feature is currently broken)
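
The mismatch can be reproduced with `argparse` alone. The sketch below assumes the params dict is built from `vars(args)`; the flag and key names are the ones listed above, and the tenant queue name is only an example value.

```python
import argparse

DEFAULT_QUEUE = "persistent://tg/config/config"  # default quoted in this spec

parser = argparse.ArgumentParser()
parser.add_argument("--config-queue", default=DEFAULT_QUEUE)

# argparse derives the dict key from the flag: dashes become underscores
params = vars(parser.parse_args(
    ["--config-queue", "persistent://tg-dev/config/config"]
))

# The consumer asks for a different key, so the user's value is never seen
effective = params.get("config_push_queue", DEFAULT_QUEUE)

print(sorted(params))
print(effective)
```

The value supplied on the command line is stored under `config_queue`, so the lookup for `config_push_queue` silently falls back to the default. Issue #2 below follows the same pattern with `--push-queue`.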
|
||||
|
||||
### Issue #2: Parameter Name Mismatch in Config Service
|
||||
- **CLI defines:** `--push-queue` (ambiguous naming)
|
||||
- **Argparse converts to:** `push_queue` (in params dict)
|
||||
- **Code looks for:** `config_push_queue`
|
||||
- **Result:** Parameter is ignored
|
||||
- **Impact:** Config service cannot use custom push queue
|
||||
- **Solution:** Rename CLI parameter to `--config-push-queue` for consistency and clarity (breaking change acceptable)
|
||||
|
||||
### Issue #3: Hardcoded Cassandra Keyspace
|
||||
- **Current:** Keyspace hardcoded as `"config"`, `"knowledge"`, `"librarian"` in various services
|
||||
- **Result:** Cannot customize keyspace for multi-tenant deployments
|
||||
- **Impact:** Config, cores, and librarian services
|
||||
- **Blocks:** Multiple tenants cannot use separate Cassandra keyspaces
|
||||
|
||||
### Issue #4: Collection Management Architecture
|
||||
- **Current:** Collections stored in Cassandra librarian keyspace via separate collections table
- **Current:** Librarian uses 4 hardcoded storage management topics to coordinate collection create/delete:
  - `vector_storage_management_topic`
  - `object_storage_management_topic`
  - `triples_storage_management_topic`
  - `storage_management_response_topic`
- **Problems:**
  - Hardcoded topics cannot be customized for multi-tenant deployments
  - Complex async coordination between librarian and 4+ storage services
  - Separate Cassandra table and management infrastructure
  - Non-persistent request/response queues for critical operations
- **Solution:** Migrate collections to config service storage, use config push for distribution
|
||||
|
||||
## Solution
|
||||
|
||||
This spec addresses Issues #1, #2, #3, and #4.
|
||||
|
||||
### Part 1: Fix Parameter Name Mismatches
|
||||
|
||||
#### Change 1: AsyncProcessor Base Class - Rename CLI Parameter
|
||||
**File:** `trustgraph-base/trustgraph/base/async_processor.py`
|
||||
**Lines:** 260-264
|
||||
|
||||
**Current:**
|
||||
```python
parser.add_argument(
    '--config-queue',
    default=default_config_queue,
    help=f'Config push queue {default_config_queue}',
)
```
|
||||
|
||||
**Fixed:**
|
||||
```python
parser.add_argument(
    '--config-push-queue',
    default=default_config_queue,
    help=f'Config push queue (default: {default_config_queue})',
)
```
|
||||
|
||||
**Rationale:**
|
||||
- Clearer, more explicit naming
|
||||
- Matches the internal variable name `config_push_queue`
|
||||
- Breaking change acceptable since feature is currently non-functional
|
||||
- No code change needed in params.get() - it already looks for the correct name
|
||||
|
||||
#### Change 2: Config Service - Rename CLI Parameter
|
||||
**File:** `trustgraph-flow/trustgraph/config/service/service.py`
|
||||
**Lines:** 276-279
|
||||
|
||||
**Current:**
|
||||
```python
parser.add_argument(
    '--push-queue',
    default=default_config_push_queue,
    help=f'Config push queue (default: {default_config_push_queue})'
)
```
|
||||
|
||||
**Fixed:**
|
||||
```python
parser.add_argument(
    '--config-push-queue',
    default=default_config_push_queue,
    help=f'Config push queue (default: {default_config_push_queue})'
)
```
|
||||
|
||||
**Rationale:**
|
||||
- Clearer naming - "config-push-queue" is more explicit than just "push-queue"
|
||||
- Matches the internal variable name `config_push_queue`
|
||||
- Consistent with AsyncProcessor's `--config-push-queue` parameter
|
||||
- Breaking change acceptable since feature is currently non-functional
|
||||
- No code change needed in params.get() - it already looks for the correct name
|
||||
|
||||
### Part 2: Add Cassandra Keyspace Parameterization
|
||||
|
||||
#### Change 3: Add Keyspace Parameter to cassandra_config Module
|
||||
**File:** `trustgraph-base/trustgraph/base/cassandra_config.py`
|
||||
|
||||
**Add CLI argument** (in `add_cassandra_args()` function):
|
||||
```python
parser.add_argument(
    '--cassandra-keyspace',
    default=None,
    help='Cassandra keyspace (default: service-specific)'
)
```
|
||||
|
||||
**Add environment variable support** (in `resolve_cassandra_config()` function):
|
||||
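```python
# If params comes from argparse, "cassandra_keyspace" is present with value
# None when the flag is omitted, so fall back on falsy values, not missing keys.
# default_keyspace is the argument supplied by the caller (see Changes 4-6).
keyspace = (
    params.get("cassandra_keyspace")
    or os.environ.get("CASSANDRA_KEYSPACE")
    or default_keyspace
)
```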
|
||||
|
||||
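**Update signature and return value** of `resolve_cassandra_config()`:
- Currently returns: `(hosts, username, password)`
- Change to return: `(hosts, username, password, keyspace)`
- Accept a `default_keyspace` argument (passed by Changes 4-6) that applies when neither the CLI argument nor the environment variable is set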
|
||||
|
||||
**Rationale:**
|
||||
- Consistent with existing Cassandra configuration pattern
|
||||
- Available to all services via `add_cassandra_args()`
|
||||
- Supports both CLI and environment variable configuration
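
The intended precedence can be checked in isolation. The sketch below assumes the order is CLI flag, then `CASSANDRA_KEYSPACE`, then the service default; `resolve_keyspace` is a hypothetical helper, and the environment is passed in as a plain dict so the example does not depend on the real process environment.

```python
import argparse


def resolve_keyspace(params, default_keyspace, environ):
    """Hypothetical helper: CLI value, then CASSANDRA_KEYSPACE, then default."""
    return (
        params.get("cassandra_keyspace")
        or environ.get("CASSANDRA_KEYSPACE")
        or default_keyspace
    )


parser = argparse.ArgumentParser()
parser.add_argument("--cassandra-keyspace", default=None)

no_flag = vars(parser.parse_args([]))
with_flag = vars(parser.parse_args(["--cassandra-keyspace", "tg_dev_config"]))

from_default = resolve_keyspace(no_flag, "config", {})
from_env = resolve_keyspace(no_flag, "config", {"CASSANDRA_KEYSPACE": "tenant_a"})
from_cli = resolve_keyspace(with_flag, "config", {"CASSANDRA_KEYSPACE": "tenant_a"})
print(from_default, from_env, from_cli)
```

Note that `no_flag` contains the key `cassandra_keyspace` with value `None`, which is why the helper chains with `or` instead of relying on the default argument of `dict.get`.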
|
||||
|
||||
#### Change 4: Config Service - Use Parameterized Keyspace
|
||||
**File:** `trustgraph-flow/trustgraph/config/service/service.py`
|
||||
|
||||
**Line 30** - Remove hardcoded keyspace:
|
||||
```python
# DELETE THIS LINE:
keyspace = "config"
```
|
||||
|
||||
**Lines 69-73** - Update cassandra config resolution:
|
||||
|
||||
**Current:**
|
||||
```python
cassandra_host, cassandra_username, cassandra_password = \
    resolve_cassandra_config(params)
```
|
||||
|
||||
**Fixed:**
|
||||
```python
cassandra_host, cassandra_username, cassandra_password, keyspace = \
    resolve_cassandra_config(params, default_keyspace="config")
```
|
||||
|
||||
**Rationale:**
|
||||
- Maintains backward compatibility with "config" as default
|
||||
- Allows override via `--cassandra-keyspace` or `CASSANDRA_KEYSPACE`
|
||||
|
||||
#### Change 5: Cores/Knowledge Service - Use Parameterized Keyspace
|
||||
**File:** `trustgraph-flow/trustgraph/cores/service.py`
|
||||
|
||||
**Line 37** - Remove hardcoded keyspace:
|
||||
```python
# DELETE THIS LINE:
keyspace = "knowledge"
```
|
||||
|
||||
**Update cassandra config resolution** (similar location as config service):
|
||||
```python
cassandra_host, cassandra_username, cassandra_password, keyspace = \
    resolve_cassandra_config(params, default_keyspace="knowledge")
```
|
||||
|
||||
#### Change 6: Librarian Service - Use Parameterized Keyspace
|
||||
**File:** `trustgraph-flow/trustgraph/librarian/service.py`
|
||||
|
||||
**Line 51** - Remove hardcoded keyspace:
|
||||
```python
# DELETE THIS LINE:
keyspace = "librarian"
```
|
||||
|
||||
**Update cassandra config resolution** (similar location as config service):
|
||||
```python
cassandra_host, cassandra_username, cassandra_password, keyspace = \
    resolve_cassandra_config(params, default_keyspace="librarian")
```
|
||||
|
||||
### Part 3: Migrate Collection Management to Config Service
|
||||
|
||||
#### Overview
|
||||
Migrate collections from Cassandra librarian keyspace to config service storage. This eliminates hardcoded storage management topics and simplifies the architecture by using the existing config push mechanism for distribution.
|
||||
|
||||
#### Current Architecture
|
||||
```
API Request → Gateway → Librarian Service
                              ↓
                      CollectionManager
                              ↓
      Cassandra Collections Table (librarian keyspace)
                              ↓
    Broadcast to 4 Storage Management Topics (hardcoded)
                              ↓
            Wait for 4+ Storage Service Responses
                              ↓
                     Response to Gateway
```
|
||||
|
||||
#### New Architecture
|
||||
```
API Request → Gateway → Librarian Service
                              ↓
                      CollectionManager
                              ↓
          Config Service API (put/delete/getvalues)
                              ↓
 Cassandra Config Table (class='collections', key='user:collection')
                              ↓
    Config Push (to all subscribers on config-push-queue)
                              ↓
  All Storage Services receive config update independently
```
|
||||
|
||||
#### Change 7: Collection Manager - Use Config Service API
|
||||
**File:** `trustgraph-flow/trustgraph/librarian/collection_manager.py`
|
||||
|
||||
**Remove:**
|
||||
- `LibraryTableStore` usage (Lines 33, 40-41)
|
||||
- Storage management producers initialization (Lines 86-140)
|
||||
- `on_storage_response` method (Lines 400-430)
|
||||
- `pending_deletions` tracking (Lines 57, 90-96, and usage throughout)
|
||||
|
||||
**Add:**
|
||||
- Config service client for API calls (request/response pattern)
|
||||
|
||||
**Config Client Setup:**
|
||||
```python
# In __init__, add config request/response producers/consumers
from trustgraph.schema.services.config import ConfigRequest, ConfigResponse

# Producer for config requests
self.config_request_producer = Producer(
    client=pulsar_client,
    topic=config_request_queue,
    schema=ConfigRequest,
)

# Consumer for config responses (with correlation ID)
self.config_response_consumer = Consumer(
    taskgroup=taskgroup,
    client=pulsar_client,
    flow=None,
    topic=config_response_queue,
    subscriber=f"{id}-config",
    schema=ConfigResponse,
    handler=self.on_config_response,
)

# Tracking for pending config requests:
#   request_id -> asyncio.Event, "<request_id>_response" -> ConfigResponse
self.pending_config_requests = {}
```
|
||||
|
||||
**Modify `list_collections` (Lines 145-180):**
|
||||
```python
import asyncio
import json
import uuid

async def list_collections(self, user, tag_filter=None, limit=None):
    """List collections from config service"""
    # Send getvalues request to config service
    request = ConfigRequest(
        id=str(uuid.uuid4()),
        operation='getvalues',
        type='collections',
    )

    # Send request and wait for response
    response = await self.send_config_request(request)

    # Parse collections from response
    collections = []
    for key, value_json in response.values.items():
        if ":" in key:
            coll_user, collection = key.split(":", 1)
            if coll_user == user:
                metadata = json.loads(value_json)
                collections.append(CollectionMetadata(**metadata))

    # Apply tag filtering in-memory (as before)
    if tag_filter:
        collections = [c for c in collections if any(tag in c.tags for tag in tag_filter)]

    # Apply limit
    if limit:
        collections = collections[:limit]

    return collections

async def send_config_request(self, request):
    """Send config request and wait for the matching response"""
    event = asyncio.Event()
    self.pending_config_requests[request.id] = event

    try:
        await self.config_request_producer.send(request)
        await event.wait()
        # on_config_response stores the response under "<id>_response"
        return self.pending_config_requests.pop(request.id + "_response")
    finally:
        # Always drop the event entry so the dict does not grow without bound
        self.pending_config_requests.pop(request.id, None)

async def on_config_response(self, message, consumer, flow):
    """Handle config response"""
    response = message.value()
    if response.id in self.pending_config_requests:
        self.pending_config_requests[response.id + "_response"] = response
        self.pending_config_requests[response.id].set()
```
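
The request/response correlation used here can be exercised without Pulsar. The following is a sketch under stated assumptions, not the librarian's implementation: `Correlator` and `fake_send` are hypothetical, an `asyncio.Future` stands in for the event-plus-dict pair, and the timeout is an addition that the spec does not require.

```python
import asyncio
import uuid


class Correlator:
    """Hypothetical in-memory stand-in for the request/response plumbing."""

    def __init__(self, send):
        self._send = send      # coroutine that delivers a request
        self._pending = {}     # request id -> Future awaiting the response

    async def request(self, payload, timeout=5.0):
        request_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            await self._send(request_id, payload)
            return await asyncio.wait_for(future, timeout)
        finally:
            # Remove the entry on success, timeout, or send failure
            self._pending.pop(request_id, None)

    def on_response(self, request_id, response):
        future = self._pending.get(request_id)
        if future is not None and not future.done():
            future.set_result(response)


async def demo():
    async def fake_send(request_id, payload):
        # Simulate the config service replying asynchronously
        asyncio.get_running_loop().call_soon(
            correlator.on_response, request_id, {"echo": payload}
        )

    correlator = Correlator(fake_send)
    reply = await correlator.request("getvalues")
    return reply, len(correlator._pending)


reply, pending = asyncio.run(demo())
print(reply, pending)
```

Responses whose ID is unknown are ignored, and every pending entry is removed once its request completes, so the tracking dict stays bounded even when requests fail.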
|
||||
|
||||
**Modify `update_collection` (Lines 182-312):**
|
||||
```python
async def update_collection(self, user, collection, name, description, tags):
    """Update collection via config service"""
    # Create metadata
    metadata = CollectionMetadata(
        user=user,
        collection=collection,
        name=name,
        description=description,
        tags=tags,
    )

    # Send put request to config service
    request = ConfigRequest(
        id=str(uuid.uuid4()),
        operation='put',
        type='collections',
        key=f'{user}:{collection}',
        value=json.dumps(metadata.to_dict()),
    )

    response = await self.send_config_request(request)

    if response.error:
        raise RuntimeError(f"Config update failed: {response.error.message}")

    # Config service will trigger config push automatically
    # Storage services will receive update and create collections
```
|
||||
|
||||
**Modify `delete_collection` (Lines 314-398):**
|
||||
```python
async def delete_collection(self, user, collection):
    """Delete collection via config service"""
    # Send delete request to config service
    request = ConfigRequest(
        id=str(uuid.uuid4()),
        operation='delete',
        type='collections',
        key=f'{user}:{collection}',
    )

    response = await self.send_config_request(request)

    if response.error:
        raise RuntimeError(f"Config delete failed: {response.error.message}")

    # Config service will trigger config push automatically
    # Storage services will receive update and delete collections
```
|
||||
|
||||
**Collection Metadata Format:**
|
||||
- Stored in config table as: `class='collections', key='user:collection'`
|
||||
- Value is JSON-serialized CollectionMetadata (without timestamp fields)
|
||||
- Fields: `user`, `collection`, `name`, `description`, `tags`
|
||||
- Example: `class='collections', key='alice:my-docs', value='{"user":"alice","collection":"my-docs","name":"My Documents","description":"...","tags":["work"]}'`
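
The key and value layout above can be round-tripped as follows. This is an illustrative sketch: `encode_collection` and `decode_collection` are hypothetical helpers, and the check that rejects `:` in user IDs is an assumption added here, since the spec does not say how such IDs are handled.

```python
import json


def encode_collection(user, collection, name, description, tags):
    """Build the (key, value) pair for the 'collections' config class."""
    if ":" in user:
        # Assumption: split(":", 1) on read would mis-parse such a user ID
        raise ValueError("user must not contain ':'")
    key = f"{user}:{collection}"
    value = json.dumps({
        "user": user, "collection": collection, "name": name,
        "description": description, "tags": tags,
    })
    return key, value


def decode_collection(key, value):
    """Split the key at the first ':' and parse the JSON metadata."""
    user, collection = key.split(":", 1)
    metadata = json.loads(value)
    return user, collection, metadata


key, value = encode_collection("alice", "my-docs", "My Documents", "...", ["work"])
user, collection, metadata = decode_collection(key, value)
print(key, metadata["tags"])
```

Splitting at the first `:` means a collection name may itself contain a colon, while a user ID may not.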
|
||||
|
||||
#### Change 8: Librarian Service - Remove Storage Management Infrastructure
|
||||
**File:** `trustgraph-flow/trustgraph/librarian/service.py`
|
||||
|
||||
**Remove:**
|
||||
- Storage management producers (Lines 173-190):
  - `vector_storage_management_producer`
  - `object_storage_management_producer`
  - `triples_storage_management_producer`
- Storage response consumer (Lines 192-201)
- `on_storage_response` handler (Lines 467-473)
|
||||
|
||||
**Modify:**
|
||||
- CollectionManager initialization (Lines 215-224) - remove storage producer parameters
|
||||
|
||||
**Note:** External collection API remains unchanged:
|
||||
- `list-collections`
|
||||
- `update-collection`
|
||||
- `delete-collection`
|
||||
|
||||
#### Change 9: Remove Collections Table from LibraryTableStore
|
||||
**File:** `trustgraph-flow/trustgraph/tables/library.py`
|
||||
|
||||
**Delete:**
|
||||
- Collections table CREATE statement (Lines 114-127)
- Collections prepared statements (Lines 205-240)
- All collection methods (Lines 578-717):
  - `ensure_collection_exists`
  - `list_collections`
  - `update_collection`
  - `delete_collection`
  - `get_collection`
  - `create_collection`
|
||||
|
||||
**Rationale:**
|
||||
- Collections now stored in config table
|
||||
- Breaking change acceptable - no data migration needed
|
||||
- Simplifies librarian service significantly
|
||||
|
||||
#### Change 10: Storage Services - Config-Based Collection Management
|
||||
|
||||
**Affected Services (11 total):**
|
||||
- Document embeddings: milvus, pinecone, qdrant
|
||||
- Graph embeddings: milvus, pinecone, qdrant
|
||||
- Object storage: cassandra
|
||||
- Triples storage: cassandra, falkordb, memgraph, neo4j
|
||||
|
||||
**Files:**
|
||||
- `trustgraph-flow/trustgraph/storage/doc_embeddings/milvus/write.py`
|
||||
- `trustgraph-flow/trustgraph/storage/doc_embeddings/pinecone/write.py`
|
||||
- `trustgraph-flow/trustgraph/storage/doc_embeddings/qdrant/write.py`
|
||||
- `trustgraph-flow/trustgraph/storage/graph_embeddings/milvus/write.py`
|
||||
- `trustgraph-flow/trustgraph/storage/graph_embeddings/pinecone/write.py`
|
||||
- `trustgraph-flow/trustgraph/storage/graph_embeddings/qdrant/write.py`
|
||||
- `trustgraph-flow/trustgraph/storage/objects/cassandra/write.py`
|
||||
- `trustgraph-flow/trustgraph/storage/triples/cassandra/write.py`
|
||||
- `trustgraph-flow/trustgraph/storage/triples/falkordb/write.py`
|
||||
- `trustgraph-flow/trustgraph/storage/triples/memgraph/write.py`
|
||||
- `trustgraph-flow/trustgraph/storage/triples/neo4j/write.py`
|
||||
|
||||
**Implementation Pattern (all services):**
|
||||
|
||||
1. **Register config handler in `__init__`:**
|
||||
```python
# Add after AsyncProcessor initialization
self.register_config_handler(self.on_collection_config)
self.known_collections = set()  # Track (user, collection) tuples
```
|
||||
|
||||
2. **Implement config handler:**
|
||||
```python
async def on_collection_config(self, config, version):
    """Handle collection configuration updates"""
    logger.info(f"Collection config version: {version}")

    if "collections" not in config:
        return

    # Parse collections from config
    # Key format: "user:collection" in config["collections"]
    config_collections = set()
    for key in config["collections"].keys():
        if ":" in key:
            user, collection = key.split(":", 1)
            config_collections.add((user, collection))

    # Determine changes
    to_create = config_collections - self.known_collections
    to_delete = self.known_collections - config_collections

    # Create new collections (idempotent)
    for user, collection in to_create:
        try:
            await self.create_collection_internal(user, collection)
            self.known_collections.add((user, collection))
            logger.info(f"Created collection: {user}/{collection}")
        except Exception as e:
            logger.error(f"Failed to create {user}/{collection}: {e}")

    # Delete removed collections (idempotent)
    for user, collection in to_delete:
        try:
            await self.delete_collection_internal(user, collection)
            self.known_collections.discard((user, collection))
            logger.info(f"Deleted collection: {user}/{collection}")
        except Exception as e:
            logger.error(f"Failed to delete {user}/{collection}: {e}")
```
|
||||
|
||||
3. **Initialize known collections on startup:**
|
||||
```python
async def start(self):
    """Start the processor"""
    await super().start()
    await self.sync_known_collections()

async def sync_known_collections(self):
    """Query backend to populate known_collections set"""
    # Backend-specific implementation:
    # - Milvus/Pinecone/Qdrant: List collections/indexes matching naming pattern
    # - Cassandra: Query keyspaces or collection metadata
    # - Neo4j/Memgraph/FalkorDB: Query CollectionMetadata nodes
    pass
```
|
||||
|
||||
4. **Refactor existing handler methods:**
|
||||
```python
# Rename and remove response sending:
# handle_create_collection → create_collection_internal
# handle_delete_collection → delete_collection_internal

async def create_collection_internal(self, user, collection):
    """Create collection (idempotent)"""
    # Same logic as current handle_create_collection
    # But remove response producer calls
    # Handle "already exists" gracefully
    pass

async def delete_collection_internal(self, user, collection):
    """Delete collection (idempotent)"""
    # Same logic as current handle_delete_collection
    # But remove response producer calls
    # Handle "not found" gracefully
    pass
```
|
||||
|
||||
5. **Remove storage management infrastructure:**
|
||||
- Remove `self.storage_request_consumer` setup and start
|
||||
- Remove `self.storage_response_producer` setup
|
||||
- Remove `on_storage_management` dispatcher method
|
||||
- Remove metrics for storage management
|
||||
- Remove imports: `StorageManagementRequest`, `StorageManagementResponse`
|
||||
|
||||
**Backend-Specific Considerations:**
|
||||
|
||||
- **Vector stores (Milvus, Pinecone, Qdrant):** Track logical `(user, collection)` in `known_collections`, but may create multiple backend collections per dimension. Continue lazy creation pattern. Delete operations must remove all dimension variants.
|
||||
|
||||
- **Cassandra Objects:** Collections are row properties, not structures. Track keyspace-level information.
|
||||
|
||||
- **Graph stores (Neo4j, Memgraph, FalkorDB):** Query `CollectionMetadata` nodes on startup. Create/delete metadata nodes on sync.
|
||||
|
||||
- **Cassandra Triples:** Use `KnowledgeGraph` API for collection operations.
|
||||
|
||||
**Key Design Points:**
|
||||
|
||||
- **Eventual consistency:** No request/response mechanism, config push is broadcast
|
||||
- **Idempotency:** All create/delete operations must be safe to retry
|
||||
- **Error handling:** Log errors but don't block config updates
|
||||
- **Self-healing:** Failed operations will retry on next config push
|
||||
- **Collection key format:** `"user:collection"` in `config["collections"]`
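
The reconciliation at the heart of the config handler is a pair of set differences, which can be tested on its own. The sketch below is illustrative: `plan_collection_changes` is a hypothetical helper, and unlike the handler in this spec it treats a missing `"collections"` key as an empty set rather than returning early.

```python
def plan_collection_changes(config, known):
    """Return (to_create, to_delete) as sets of (user, collection) tuples."""
    desired = set()
    for key in config.get("collections", {}):
        if ":" in key:
            user, collection = key.split(":", 1)
            desired.add((user, collection))
    return desired - known, known - desired


known = {("alice", "my-docs"), ("bob", "old")}
config = {"collections": {"alice:my-docs": "{}", "alice:notes": "{}"}}

to_create, to_delete = plan_collection_changes(config, known)
print(to_create, to_delete)

# Applying the plan and re-running it yields no further work
known_after = (known | to_create) - to_delete
again = plan_collection_changes(config, known_after)
print(again)
```

Because the plan is derived from the full desired state on every push, an operation that failed and was never recorded in `known_collections` reappears in the next plan, which is what makes the retry behaviour described above possible.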
|
||||
|
||||
#### Change 11: Update Collection Schema - Remove Timestamps
|
||||
**File:** `trustgraph-base/trustgraph/schema/services/collection.py`
|
||||
|
||||
**Modify CollectionMetadata (Lines 13-21):**
|
||||
Remove `created_at` and `updated_at` fields:
|
||||
```python
class CollectionMetadata(Record):
    user = String()
    collection = String()
    name = String()
    description = String()
    tags = Array(String())
    # Remove: created_at = String()
    # Remove: updated_at = String()
```
|
||||
|
||||
**Modify CollectionManagementRequest (Lines 25-47):**
|
||||
Remove the `created_at` and `updated_at` fields (the request-level `timestamp` field stays):
|
||||
```python
class CollectionManagementRequest(Record):
    operation = String()
    user = String()
    collection = String()
    timestamp = String()
    name = String()
    description = String()
    tags = Array(String())
    # Remove: created_at = String()
    # Remove: updated_at = String()
    tag_filter = Array(String())
    limit = Integer()
```
|
||||
|
||||
**Rationale:**
|
||||
- Timestamps don't add value for collections
|
||||
- Config service maintains its own version tracking
|
||||
- Simplifies schema and reduces storage
|
||||
|
||||
#### Benefits of Config Service Migration
|
||||
|
||||
1. ✅ **Eliminates hardcoded storage management topics** - Solves multi-tenant blocker
|
||||
2. ✅ **Simpler coordination** - No complex async waiting for 4+ storage responses
|
||||
3. ✅ **Eventual consistency** - Storage services update independently via config push
|
||||
4. ✅ **Better reliability** - Persistent config push vs non-persistent request/response
|
||||
5. ✅ **Unified configuration model** - Collections treated as configuration
|
||||
6. ✅ **Reduces complexity** - Removes ~300 lines of coordination code
|
||||
7. ✅ **Multi-tenant ready** - Config already supports tenant isolation via keyspace
|
||||
8. ✅ **Version tracking** - Config service version mechanism provides audit trail
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
### Backward Compatibility
|
||||
|
||||
**Parameter Changes:**
|
||||
- CLI parameter renames are breaking changes but acceptable (feature currently non-functional)
|
||||
- Services work without parameters (use defaults)
|
||||
- Default keyspaces preserved: "config", "knowledge", "librarian"
|
||||
- Default queue: `persistent://tg/config/config`
|
||||
|
||||
**Collection Management:**
|
||||
- **Breaking change:** Collections table removed from librarian keyspace
|
||||
- **No data migration provided** - acceptable for this phase
|
||||
- External collection API unchanged (list/update/delete operations)
|
||||
- Collection metadata format simplified (timestamps removed)
|
||||
|
||||
### Testing Requirements
|
||||
|
||||
**Parameter Testing:**
|
||||
1. Verify `--config-push-queue` parameter works on graph-embeddings service
|
||||
2. Verify `--config-push-queue` parameter works on text-completion service
|
||||
3. Verify `--config-push-queue` parameter works on config service
|
||||
4. Verify `--cassandra-keyspace` parameter works for config service
|
||||
5. Verify `--cassandra-keyspace` parameter works for cores service
|
||||
6. Verify `--cassandra-keyspace` parameter works for librarian service
|
||||
7. Verify services work without parameters (uses defaults)
|
||||
8. Verify multi-tenant deployment with custom queue names and keyspace
|
||||
|
||||
**Collection Management Testing:**
|
||||
9. Verify `list-collections` operation via config service
|
||||
10. Verify `update-collection` creates/updates in config table
|
||||
11. Verify `delete-collection` removes from config table
|
||||
12. Verify config push is triggered on collection updates
|
||||
13. Verify tag filtering works with config-based storage
|
||||
14. Verify collection operations work without timestamp fields
|
||||
|
||||
### Multi-Tenant Deployment Example
|
||||
```bash
# Tenant: tg-dev
graph-embeddings \
    -p pulsar+ssl://broker:6651 \
    --pulsar-api-key <KEY> \
    --config-push-queue persistent://tg-dev/config/config

config-service \
    -p pulsar+ssl://broker:6651 \
    --pulsar-api-key <KEY> \
    --config-push-queue persistent://tg-dev/config/config \
    --cassandra-keyspace tg_dev_config
```
|
||||
|
||||
## Impact Analysis
|
||||
|
||||
### Services Affected by Changes 1-2 (CLI Parameter Rename)
|
||||
All services inheriting from AsyncProcessor or FlowProcessor:
|
||||
- config-service
|
||||
- cores-service
|
||||
- librarian-service
|
||||
- graph-embeddings
|
||||
- document-embeddings
|
||||
- text-completion-* (all providers)
|
||||
- extract-* (all extractors)
|
||||
- query-* (all query services)
|
||||
- retrieval-* (all RAG services)
|
||||
- storage-* (all storage services)
|
||||
- And 20+ more services
|
||||
|
||||
### Services Affected by Changes 3-6 (Cassandra Keyspace)
|
||||
- config-service
|
||||
- cores-service
|
||||
- librarian-service
|
||||
|
||||
### Services Affected by Changes 7-11 (Collection Management)
|
||||
|
||||
**Immediate Changes:**
|
||||
- librarian-service (collection_manager.py, service.py)
|
||||
- tables/library.py (collections table removal)
|
||||
- schema/services/collection.py (timestamp removal)
|
||||
|
||||
**Deferred Changes (Change 10):**
|
||||
- All storage services (11 total) - will subscribe to config push for collection updates
|
||||
- Storage management schema (potentially removable if unused elsewhere)
|
||||
|
||||
## Future Considerations
|
||||
|
||||
### Per-User Keyspace Model
|
||||
|
||||
Some services use **per-user keyspaces** dynamically, where each user gets their own Cassandra keyspace:
|
||||
|
||||
**Services with per-user keyspaces:**
|
||||
1. **Triples Query Service** (`trustgraph-flow/trustgraph/query/triples/cassandra/service.py:65`)
   - Uses `keyspace=query.user`
2. **Objects Query Service** (`trustgraph-flow/trustgraph/query/objects/cassandra/service.py:479`)
   - Uses `keyspace=self.sanitize_name(user)`
3. **KnowledgeGraph Direct Access** (`trustgraph-flow/trustgraph/direct/cassandra_kg.py:18`)
   - Default parameter `keyspace="trustgraph"`
|
||||
|
||||
**Status:** These are **not modified** in this specification.
|
||||
|
||||
**Future Review Required:**
|
||||
- Evaluate whether per-user keyspace model creates tenant isolation issues
|
||||
- Consider if multi-tenant deployments need keyspace prefix patterns (e.g., `tenant_a_user1`)
|
||||
- Review for potential user ID collision across tenants
|
||||
- Assess if single shared keyspace per tenant with user-based row isolation is preferable
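
One way to picture the keyspace prefix option is the sketch below. It is purely hypothetical and not part of this specification: `tenant_keyspace` and its naming scheme are assumptions, chosen so that the example reproduces the `tenant_a_user1` pattern mentioned above while staying within Cassandra's 48-character keyspace name limit and the rule that unquoted CQL identifiers start with a letter.

```python
import re

MAX_KEYSPACE_LENGTH = 48  # Cassandra's limit on keyspace name length


def tenant_keyspace(tenant: str, user: str) -> str:
    """Hypothetical naming scheme: '<tenant>_<user>', sanitized for Cassandra."""
    raw = f"{tenant}_{user}".lower()
    # Replace anything outside [a-z0-9_] so the name needs no quoting
    name = re.sub(r"[^a-z0-9_]", "_", raw)
    if not name[0].isalpha():
        # Unquoted CQL identifiers must start with a letter
        name = "t_" + name
    if len(name) > MAX_KEYSPACE_LENGTH:
        raise ValueError(f"keyspace name too long: {name}")
    return name


print(tenant_keyspace("tenant-a", "user1"))
print(tenant_keyspace("9corp", "bob"))
```

Sanitizing is lossy (for example, `tenant-a` and `tenant_a` map to the same prefix), so a real scheme would also need a collision check, which is one of the review items listed above.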
|
||||
|
||||
**Note:** This does not block the current multi-tenant implementation but should be reviewed before production multi-tenant deployments.
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase 1: Parameter Fixes (Changes 1-6)
|
||||
- Fix `--config-push-queue` parameter naming
|
||||
- Add `--cassandra-keyspace` parameter support
|
||||
- **Outcome:** Multi-tenant queue and keyspace configuration enabled
|
||||
|
||||
### Phase 2: Collection Management Migration (Changes 7-9, 11)
|
||||
- Migrate collection storage to config service
|
||||
- Remove collections table from librarian
|
||||
- Update collection schema (remove timestamps)
|
||||
- **Outcome:** Eliminates hardcoded storage management topics, simplifies librarian
|
||||
|
||||
### Phase 3: Storage Service Updates (Change 10) - Deferred
|
||||
- Update all storage services to use config push for collections
|
||||
- Remove storage management request/response infrastructure
|
||||
- **Outcome:** Complete config-based collection management
|
||||
|
||||
## References
|
||||
- GitHub Issue: https://github.com/trustgraph-ai/trustgraph/issues/582
|
||||
- Related Files:
  - `trustgraph-base/trustgraph/base/async_processor.py`
  - `trustgraph-base/trustgraph/base/cassandra_config.py`
  - `trustgraph-base/trustgraph/schema/core/topic.py`
  - `trustgraph-base/trustgraph/schema/services/collection.py`
  - `trustgraph-flow/trustgraph/config/service/service.py`
  - `trustgraph-flow/trustgraph/cores/service.py`
  - `trustgraph-flow/trustgraph/librarian/service.py`
  - `trustgraph-flow/trustgraph/librarian/collection_manager.py`
  - `trustgraph-flow/trustgraph/tables/library.py`
|
||||