mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-26 00:46:22 +02:00
781 lines
27 KiB
Markdown
781 lines
27 KiB
Markdown
|
|
---
|
|||
|
|
layout: default
|
|||
|
|
title: "技术规范:多租户支持"
|
|||
|
|
parent: "Chinese (Beta)"
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
# 技术规范:多租户支持
|
|||
|
|
|
|||
|
|
> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
|
|||
|
|
|
|||
|
|
## 概述
|
|||
|
|
|
|||
|
|
通过修复参数名称不匹配的问题,从而解决阻止队列自定义的问题,并添加 Cassandra 键空间参数化,以实现多租户部署。
|
|||
|
|
|
|||
|
|
## 架构上下文
|
|||
|
|
|
|||
|
|
### 基于流的队列解析
|
|||
|
|
|
|||
|
|
TrustGraph 系统使用**基于流的架构**进行动态队列解析,该架构本质上支持多租户:
|
|||
|
|
|
|||
|
|
**流定义**存储在 Cassandra 中,并通过接口定义指定队列名称。
|
|||
|
|
**队列名称使用模板**,其中包含 `{id}` 变量,这些变量会被替换为流实例 ID。
|
|||
|
|
**服务在请求时动态解析队列**,通过查找流配置。
|
|||
|
|
**每个租户可以拥有独特的流**,具有不同的队列名称,从而提供隔离。
|
|||
|
|
|
|||
|
|
示例流接口定义:
|
|||
|
|
```json
|
|||
|
|
{
|
|||
|
|
"interfaces": {
|
|||
|
|
"triples-store": "persistent://tg/flow/triples-store:{id}",
|
|||
|
|
"graph-embeddings-store": "persistent://tg/flow/graph-embeddings-store:{id}"
|
|||
|
|
}
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
当租户 A 启动流程 `tenant-a-prod`,而租户 B 启动流程 `tenant-b-prod` 时,它们会自动获得隔离的队列:
|
|||
|
|
`persistent://tg/flow/triples-store:tenant-a-prod`
|
|||
|
|
`persistent://tg/flow/triples-store:tenant-b-prod`
|
|||
|
|
|
|||
|
|
**为多租户设计的服务:**
|
|||
|
|
✅ **知识管理 (核心)** - 动态解析从请求中传递的流程配置中的队列
|
|||
|
|
|
|||
|
|
**需要修复的服务:**
|
|||
|
|
🔴 **配置服务** - 参数名称不匹配,无法自定义队列
|
|||
|
|
🔴 **图书管理员服务** - 预定义的存储管理主题(见下文)
|
|||
|
|
🔴 **所有服务** - 无法自定义 Cassandra keyspace
|
|||
|
|
|
|||
|
|
## 问题描述
|
|||
|
|
|
|||
|
|
### 问题 #1:AsyncProcessor 中的参数名称不匹配
|
|||
|
|
**CLI 定义:** `--config-queue` (命名不明确)
|
|||
|
|
**Argparse 转换:** `config_queue` (在 params 字典中)
|
|||
|
|
**代码查找:** `config_push_queue`
|
|||
|
|
**结果:** 参数被忽略,默认为 `persistent://tg/config/config`
|
|||
|
|
**影响:** 影响所有 32 多个从 AsyncProcessor 继承的服务
|
|||
|
|
**问题:** 多租户部署无法使用租户特定的配置队列
|
|||
|
|
**解决方案:** 将 CLI 参数重命名为 `--config-push-queue`,以提高清晰度(可以接受破坏性更改,因为该功能当前已损坏)
|
|||
|
|
|
|||
|
|
### 问题 #2:配置服务中的参数名称不匹配
|
|||
|
|
**CLI 定义:** `--push-queue` (命名模糊)
|
|||
|
|
**Argparse 转换:** `push_queue` (在 params 字典中)
|
|||
|
|
**代码查找:** `config_push_queue`
|
|||
|
|
**结果:** 参数被忽略
|
|||
|
|
**影响:** 配置服务无法使用自定义推送队列
|
|||
|
|
**解决方案:** 将 CLI 参数重命名为 `--config-push-queue`,以提高一致性和清晰度(可以接受破坏性更改)
|
|||
|
|
|
|||
|
|
### 问题 #3:预定义的 Cassandra Keyspace
|
|||
|
|
**当前:** Keyspace 在各种服务中硬编码为 `"config"`、`"knowledge"`、`"librarian"`
|
|||
|
|
**结果:** 无法为多租户部署自定义 keyspace
|
|||
|
|
**影响:** 配置、核心和图书管理员服务
|
|||
|
|
**问题:** 多个租户无法使用单独的 Cassandra keyspace
|
|||
|
|
|
|||
|
|
### 问题 #4:集合管理架构 ✅ 已完成
|
|||
|
|
**之前:** 集合存储在图书管理员 keyspace 中的单独的集合表中
|
|||
|
|
**之前:** 图书管理员使用 4 个硬编码的存储管理主题来协调集合创建/删除:
|
|||
|
|
`vector_storage_management_topic`
|
|||
|
|
`object_storage_management_topic`
|
|||
|
|
`triples_storage_management_topic`
|
|||
|
|
`storage_management_response_topic`
|
|||
|
|
**问题(已解决):**
|
|||
|
|
无法为多租户部署自定义硬编码主题
|
|||
|
|
图书管理员和 4 个或更多存储服务之间的复杂异步协调
|
|||
|
|
单独的 Cassandra 表和管理基础设施
|
|||
|
|
关键操作的非持久性请求/响应队列
|
|||
|
|
**已实施的解决方案:** 将集合迁移到配置服务存储,使用配置推送进行分发
|
|||
|
|
**状态:** 所有存储后端已迁移到 `CollectionConfigHandler` 模式
|
|||
|
|
|
|||
|
|
## 解决方案
|
|||
|
|
|
|||
|
|
此规范解决了问题 #1、#2、#3 和 #4。
|
|||
|
|
|
|||
|
|
### 第一部分:修复参数名称不匹配
|
|||
|
|
|
|||
|
|
#### 更改 1:AsyncProcessor 基类 - 重命名 CLI 参数
|
|||
|
|
**文件:** `trustgraph-base/trustgraph/base/async_processor.py`
|
|||
|
|
**行:** 260-264
|
|||
|
|
|
|||
|
|
**当前:**
|
|||
|
|
```python
|
|||
|
|
parser.add_argument(
|
|||
|
|
'--config-queue',
|
|||
|
|
default=default_config_queue,
|
|||
|
|
help=f'Config push queue {default_config_queue}',
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**已修复:**
|
|||
|
|
```python
|
|||
|
|
parser.add_argument(
|
|||
|
|
'--config-push-queue',
|
|||
|
|
default=default_config_queue,
|
|||
|
|
help=f'Config push queue (default: {default_config_queue})',
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**理由:**
|
|||
|
|
命名更清晰、更明确
|
|||
|
|
与内部变量名 `config_push_queue` 匹配
|
|||
|
|
允许进行重大更改,因为该功能目前不可用
|
|||
|
|
params.get() 不需要任何代码更改,因为它已经查找正确的名称
|
|||
|
|
|
|||
|
|
#### 更改 2:配置服务 - 重命名 CLI 参数
|
|||
|
|
**文件:** `trustgraph-flow/trustgraph/config/service/service.py`
|
|||
|
|
**行:** 276-279
|
|||
|
|
|
|||
|
|
**当前:**
|
|||
|
|
```python
|
|||
|
|
parser.add_argument(
|
|||
|
|
'--push-queue',
|
|||
|
|
default=default_config_push_queue,
|
|||
|
|
help=f'Config push queue (default: {default_config_push_queue})'
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**固定:**
|
|||
|
|
```python
|
|||
|
|
parser.add_argument(
|
|||
|
|
'--config-push-queue',
|
|||
|
|
default=default_config_push_queue,
|
|||
|
|
help=f'Config push queue (default: {default_config_push_queue})'
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**理由:**
|
|||
|
|
更清晰的命名 - "config-push-queue" 比仅仅 "push-queue" 更明确。
|
|||
|
|
与内部变量名 `config_push_queue` 匹配。
|
|||
|
|
与 AsyncProcessor 的 `--config-push-queue` 参数一致。
|
|||
|
|
即使是重大更改,也是可以接受的,因为该功能目前不可用。
|
|||
|
|
params.get() 中不需要任何代码更改 - 它已经查找正确的名称。
|
|||
|
|
|
|||
|
|
### 第二部分:添加 Cassandra 键空间参数化
|
|||
|
|
|
|||
|
|
#### 更改 3:向 cassandra_config 模块添加键空间参数
|
|||
|
|
**文件:** `trustgraph-base/trustgraph/base/cassandra_config.py`
|
|||
|
|
|
|||
|
|
**添加 CLI 参数**(在 `add_cassandra_args()` 函数中):
|
|||
|
|
```python
|
|||
|
|
parser.add_argument(
|
|||
|
|
'--cassandra-keyspace',
|
|||
|
|
default=None,
|
|||
|
|
help='Cassandra keyspace (default: service-specific)'
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**添加环境变量支持** (在 `resolve_cassandra_config()` 函数中):
|
|||
|
|
```python
|
|||
|
|
keyspace = params.get(
|
|||
|
|
"cassandra_keyspace",
|
|||
|
|
os.environ.get("CASSANDRA_KEYSPACE")
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**更新 `resolve_cassandra_config()` 的返回值:**
|
|||
|
|
当前返回:`(hosts, username, password)`
|
|||
|
|
更改为返回:`(hosts, username, password, keyspace)`
|
|||
|
|
|
|||
|
|
**理由:**
|
|||
|
|
与现有的 Cassandra 配置模式一致
|
|||
|
|
通过 `add_cassandra_args()` 可供所有服务使用
|
|||
|
|
支持 CLI 和环境变量配置
|
|||
|
|
|
|||
|
|
#### 变更 4:配置服务 - 使用参数化 Keyspace
|
|||
|
|
**文件:** `trustgraph-flow/trustgraph/config/service/service.py`
|
|||
|
|
|
|||
|
|
**第 30 行** - 移除硬编码的 Keyspace:
|
|||
|
|
```python
|
|||
|
|
# DELETE THIS LINE:
|
|||
|
|
keyspace = "config"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**第69-73行** - 更新 Cassandra 配置解析:
|
|||
|
|
|
|||
|
|
**当前:**
|
|||
|
|
```python
|
|||
|
|
cassandra_host, cassandra_username, cassandra_password = \
|
|||
|
|
resolve_cassandra_config(params)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**已修复:**
|
|||
|
|
```python
|
|||
|
|
cassandra_host, cassandra_username, cassandra_password, keyspace = \
|
|||
|
|
resolve_cassandra_config(params, default_keyspace="config")
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**原因:**
|
|||
|
|
保持与默认值为 "config" 的配置向后兼容。
|
|||
|
|
允许通过 `--cassandra-keyspace` 或 `CASSANDRA_KEYSPACE` 进行覆盖。
|
|||
|
|
|
|||
|
|
#### 变更 5:核心/知识服务 - 使用参数化键空间
|
|||
|
|
**文件:** `trustgraph-flow/trustgraph/cores/service.py`
|
|||
|
|
|
|||
|
|
**第 37 行** - 移除硬编码的键空间:
|
|||
|
|
```python
|
|||
|
|
# DELETE THIS LINE:
|
|||
|
|
keyspace = "knowledge"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**更新 Cassandra 配置解析**(位置类似于配置服务):
|
|||
|
|
```python
|
|||
|
|
cassandra_host, cassandra_username, cassandra_password, keyspace = \
|
|||
|
|
resolve_cassandra_config(params, default_keyspace="knowledge")
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
#### 变更 6:图书管理员服务 - 使用参数化键空间
|
|||
|
|
**文件:** `trustgraph-flow/trustgraph/librarian/service.py`
|
|||
|
|
|
|||
|
|
**第 51 行** - 移除硬编码的键空间:
|
|||
|
|
```python
|
|||
|
|
# DELETE THIS LINE:
|
|||
|
|
keyspace = "librarian"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**更新 Cassandra 配置解析**(位置与配置服务类似):
|
|||
|
|
```python
|
|||
|
|
cassandra_host, cassandra_username, cassandra_password, keyspace = \
|
|||
|
|
resolve_cassandra_config(params, default_keyspace="librarian")
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 第三部分:将集合管理迁移到配置服务
|
|||
|
|
|
|||
|
|
#### 概述
|
|||
|
|
将集合从 Cassandra librarian 键空间迁移到配置服务存储。这消除了硬编码的存储管理主题,并通过使用现有的配置推送机制进行分发,简化了架构。
|
|||
|
|
|
|||
|
|
#### 当前架构
|
|||
|
|
```
|
|||
|
|
API Request → Gateway → Librarian Service
|
|||
|
|
↓
|
|||
|
|
CollectionManager
|
|||
|
|
↓
|
|||
|
|
Cassandra Collections Table (librarian keyspace)
|
|||
|
|
↓
|
|||
|
|
Broadcast to 4 Storage Management Topics (hardcoded)
|
|||
|
|
↓
|
|||
|
|
Wait for 4+ Storage Service Responses
|
|||
|
|
↓
|
|||
|
|
Response to Gateway
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
#### 新架构
|
|||
|
|
```
|
|||
|
|
API Request → Gateway → Librarian Service
|
|||
|
|
↓
|
|||
|
|
CollectionManager
|
|||
|
|
↓
|
|||
|
|
Config Service API (put/delete/getvalues)
|
|||
|
|
↓
|
|||
|
|
Cassandra Config Table (class='collections', key='user:collection')
|
|||
|
|
↓
|
|||
|
|
Config Push (to all subscribers on config-push-queue)
|
|||
|
|
↓
|
|||
|
|
All Storage Services receive config update independently
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
#### 变更 7:集合管理器 - 使用配置服务 API
|
|||
|
|
**文件:** `trustgraph-flow/trustgraph/librarian/collection_manager.py`
|
|||
|
|
|
|||
|
|
**移除:**
|
|||
|
|
`LibraryTableStore` 的使用(第 33 行,第 40-41 行)
|
|||
|
|
存储管理生产者初始化(第 86-140 行)
|
|||
|
|
`on_storage_response` 方法(第 400-430 行)
|
|||
|
|
`pending_deletions` 跟踪(第 57 行,第 90-96 行,以及整个使用过程)
|
|||
|
|
|
|||
|
|
**添加:**
|
|||
|
|
用于 API 调用的配置服务客户端(请求/响应模式)
|
|||
|
|
|
|||
|
|
**配置客户端设置:**
|
|||
|
|
```python
|
|||
|
|
# In __init__, add config request/response producers/consumers
|
|||
|
|
from trustgraph.schema.services.config import ConfigRequest, ConfigResponse
|
|||
|
|
|
|||
|
|
# Producer for config requests
|
|||
|
|
self.config_request_producer = Producer(
|
|||
|
|
client=pulsar_client,
|
|||
|
|
topic=config_request_queue,
|
|||
|
|
schema=ConfigRequest,
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
# Consumer for config responses (with correlation ID)
|
|||
|
|
self.config_response_consumer = Consumer(
|
|||
|
|
taskgroup=taskgroup,
|
|||
|
|
client=pulsar_client,
|
|||
|
|
flow=None,
|
|||
|
|
topic=config_response_queue,
|
|||
|
|
subscriber=f"{id}-config",
|
|||
|
|
schema=ConfigResponse,
|
|||
|
|
handler=self.on_config_response,
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
# Tracking for pending config requests
|
|||
|
|
self.pending_config_requests = {} # request_id -> asyncio.Event
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**修改 `list_collections` (第145-180行):**
|
|||
|
|
```python
|
|||
|
|
async def list_collections(self, user, tag_filter=None, limit=None):
|
|||
|
|
"""List collections from config service"""
|
|||
|
|
# Send getvalues request to config service
|
|||
|
|
request = ConfigRequest(
|
|||
|
|
id=str(uuid.uuid4()),
|
|||
|
|
operation='getvalues',
|
|||
|
|
type='collections',
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
# Send request and wait for response
|
|||
|
|
response = await self.send_config_request(request)
|
|||
|
|
|
|||
|
|
# Parse collections from response
|
|||
|
|
collections = []
|
|||
|
|
for key, value_json in response.values.items():
|
|||
|
|
if ":" in key:
|
|||
|
|
coll_user, collection = key.split(":", 1)
|
|||
|
|
if coll_user == user:
|
|||
|
|
metadata = json.loads(value_json)
|
|||
|
|
collections.append(CollectionMetadata(**metadata))
|
|||
|
|
|
|||
|
|
# Apply tag filtering in-memory (as before)
|
|||
|
|
if tag_filter:
|
|||
|
|
collections = [c for c in collections if any(tag in c.tags for tag in tag_filter)]
|
|||
|
|
|
|||
|
|
# Apply limit
|
|||
|
|
if limit:
|
|||
|
|
collections = collections[:limit]
|
|||
|
|
|
|||
|
|
return collections
|
|||
|
|
|
|||
|
|
async def send_config_request(self, request):
|
|||
|
|
"""Send config request and wait for response"""
|
|||
|
|
event = asyncio.Event()
|
|||
|
|
self.pending_config_requests[request.id] = event
|
|||
|
|
|
|||
|
|
await self.config_request_producer.send(request)
|
|||
|
|
await event.wait()
|
|||
|
|
|
|||
|
|
return self.pending_config_requests.pop(request.id + "_response")
|
|||
|
|
|
|||
|
|
async def on_config_response(self, message, consumer, flow):
|
|||
|
|
"""Handle config response"""
|
|||
|
|
response = message.value()
|
|||
|
|
if response.id in self.pending_config_requests:
|
|||
|
|
self.pending_config_requests[response.id + "_response"] = response
|
|||
|
|
self.pending_config_requests[response.id].set()
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**修改 `update_collection` (第182-312行):**
|
|||
|
|
```python
|
|||
|
|
async def update_collection(self, user, collection, name, description, tags):
|
|||
|
|
"""Update collection via config service"""
|
|||
|
|
# Create metadata
|
|||
|
|
metadata = CollectionMetadata(
|
|||
|
|
user=user,
|
|||
|
|
collection=collection,
|
|||
|
|
name=name,
|
|||
|
|
description=description,
|
|||
|
|
tags=tags,
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
# Send put request to config service
|
|||
|
|
request = ConfigRequest(
|
|||
|
|
id=str(uuid.uuid4()),
|
|||
|
|
operation='put',
|
|||
|
|
type='collections',
|
|||
|
|
key=f'{user}:{collection}',
|
|||
|
|
value=json.dumps(metadata.to_dict()),
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
response = await self.send_config_request(request)
|
|||
|
|
|
|||
|
|
if response.error:
|
|||
|
|
raise RuntimeError(f"Config update failed: {response.error.message}")
|
|||
|
|
|
|||
|
|
# Config service will trigger config push automatically
|
|||
|
|
# Storage services will receive update and create collections
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**修改 `delete_collection` (第314-398行):**
|
|||
|
|
```python
|
|||
|
|
async def delete_collection(self, user, collection):
|
|||
|
|
"""Delete collection via config service"""
|
|||
|
|
# Send delete request to config service
|
|||
|
|
request = ConfigRequest(
|
|||
|
|
id=str(uuid.uuid4()),
|
|||
|
|
operation='delete',
|
|||
|
|
type='collections',
|
|||
|
|
key=f'{user}:{collection}',
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
response = await self.send_config_request(request)
|
|||
|
|
|
|||
|
|
if response.error:
|
|||
|
|
raise RuntimeError(f"Config delete failed: {response.error.message}")
|
|||
|
|
|
|||
|
|
# Config service will trigger config push automatically
|
|||
|
|
# Storage services will receive update and delete collections
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**集合元数据格式:**
|
|||
|
|
存储在配置表中,格式为:`class='collections', key='user:collection'`
|
|||
|
|
值是 JSON 序列化的 CollectionMetadata (不包含时间戳字段)
|
|||
|
|
字段:`user`, `collection`, `name`, `description`, `tags`
|
|||
|
|
示例:`class='collections', key='alice:my-docs', value='{"user":"alice","collection":"my-docs","name":"My Documents","description":"...","tags":["work"]}'`
|
|||
|
|
|
|||
|
|
#### 变更 8: Librarian Service - 移除存储管理基础设施
|
|||
|
|
**文件:** `trustgraph-flow/trustgraph/librarian/service.py`
|
|||
|
|
|
|||
|
|
**移除:**
|
|||
|
|
存储管理生产者 (173-190 行):
|
|||
|
|
`vector_storage_management_producer`
|
|||
|
|
`object_storage_management_producer`
|
|||
|
|
`triples_storage_management_producer`
|
|||
|
|
存储响应消费者 (192-201 行)
|
|||
|
|
`on_storage_response` 处理程序 (467-473 行)
|
|||
|
|
|
|||
|
|
**修改:**
|
|||
|
|
CollectionManager 初始化 (215-224 行) - 移除存储生产者参数
|
|||
|
|
|
|||
|
|
**注意:** 外部集合 API 保持不变:
|
|||
|
|
`list-collections`
|
|||
|
|
`update-collection`
|
|||
|
|
`delete-collection`
|
|||
|
|
|
|||
|
|
#### 变更 9: 从 LibraryTableStore 中移除 Collections 表
|
|||
|
|
**文件:** `trustgraph-flow/trustgraph/tables/library.py`
|
|||
|
|
|
|||
|
|
**删除:**
|
|||
|
|
Collections 表的 CREATE 语句 (114-127 行)
|
|||
|
|
Collections 预处理语句 (205-240 行)
|
|||
|
|
所有集合方法 (578-717 行):
|
|||
|
|
`ensure_collection_exists`
|
|||
|
|
`list_collections`
|
|||
|
|
`update_collection`
|
|||
|
|
`delete_collection`
|
|||
|
|
`get_collection`
|
|||
|
|
`create_collection`
|
|||
|
|
|
|||
|
|
**原因:**
|
|||
|
|
集合现在存储在配置表中
|
|||
|
|
这是一个破坏性变更,但无需数据迁移
|
|||
|
|
显著简化了 librarian service
|
|||
|
|
|
|||
|
|
#### 变更 10: 存储服务 - 基于配置的集合管理 ✅ 已完成
|
|||
|
|
|
|||
|
|
**状态:** 所有 11 个存储后端都已迁移到使用 `CollectionConfigHandler`。
|
|||
|
|
|
|||
|
|
**受影响的服务 (总共 11 个):**
|
|||
|
|
文档嵌入: milvus, pinecone, qdrant
|
|||
|
|
图嵌入: milvus, pinecone, qdrant
|
|||
|
|
对象存储: cassandra
|
|||
|
|
三元组存储: cassandra, falkordb, memgraph, neo4j
|
|||
|
|
|
|||
|
|
**文件:**
|
|||
|
|
`trustgraph-flow/trustgraph/storage/doc_embeddings/milvus/write.py`
|
|||
|
|
`trustgraph-flow/trustgraph/storage/doc_embeddings/pinecone/write.py`
|
|||
|
|
`trustgraph-flow/trustgraph/storage/doc_embeddings/qdrant/write.py`
|
|||
|
|
`trustgraph-flow/trustgraph/storage/graph_embeddings/milvus/write.py`
|
|||
|
|
`trustgraph-flow/trustgraph/storage/graph_embeddings/pinecone/write.py`
|
|||
|
|
`trustgraph-flow/trustgraph/storage/graph_embeddings/qdrant/write.py`
|
|||
|
|
`trustgraph-flow/trustgraph/storage/objects/cassandra/write.py`
|
|||
|
|
`trustgraph-flow/trustgraph/storage/triples/cassandra/write.py`
|
|||
|
|
`trustgraph-flow/trustgraph/storage/triples/falkordb/write.py`
|
|||
|
|
`trustgraph-flow/trustgraph/storage/triples/memgraph/write.py`
|
|||
|
|
`trustgraph-flow/trustgraph/storage/triples/neo4j/write.py`
|
|||
|
|
|
|||
|
|
**实现模式 (所有服务):**
|
|||
|
|
|
|||
|
|
1. **在 `__init__` 中注册配置处理程序:**
|
|||
|
|
```python
|
|||
|
|
# Add after AsyncProcessor initialization
|
|||
|
|
self.register_config_handler(self.on_collection_config)
|
|||
|
|
self.known_collections = set() # Track (user, collection) tuples
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
2. **实现配置处理器:**
|
|||
|
|
```python
|
|||
|
|
async def on_collection_config(self, config, version):
|
|||
|
|
"""Handle collection configuration updates"""
|
|||
|
|
logger.info(f"Collection config version: {version}")
|
|||
|
|
|
|||
|
|
if "collections" not in config:
|
|||
|
|
return
|
|||
|
|
|
|||
|
|
# Parse collections from config
|
|||
|
|
# Key format: "user:collection" in config["collections"]
|
|||
|
|
config_collections = set()
|
|||
|
|
for key in config["collections"].keys():
|
|||
|
|
if ":" in key:
|
|||
|
|
user, collection = key.split(":", 1)
|
|||
|
|
config_collections.add((user, collection))
|
|||
|
|
|
|||
|
|
# Determine changes
|
|||
|
|
to_create = config_collections - self.known_collections
|
|||
|
|
to_delete = self.known_collections - config_collections
|
|||
|
|
|
|||
|
|
# Create new collections (idempotent)
|
|||
|
|
for user, collection in to_create:
|
|||
|
|
try:
|
|||
|
|
await self.create_collection_internal(user, collection)
|
|||
|
|
self.known_collections.add((user, collection))
|
|||
|
|
logger.info(f"Created collection: {user}/{collection}")
|
|||
|
|
except Exception as e:
|
|||
|
|
logger.error(f"Failed to create {user}/{collection}: {e}")
|
|||
|
|
|
|||
|
|
# Delete removed collections (idempotent)
|
|||
|
|
for user, collection in to_delete:
|
|||
|
|
try:
|
|||
|
|
await self.delete_collection_internal(user, collection)
|
|||
|
|
self.known_collections.discard((user, collection))
|
|||
|
|
logger.info(f"Deleted collection: {user}/{collection}")
|
|||
|
|
except Exception as e:
|
|||
|
|
logger.error(f"Failed to delete {user}/{collection}: {e}")
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
3. **初始化启动时的已知集合:**
|
|||
|
|
```python
|
|||
|
|
async def start(self):
|
|||
|
|
"""Start the processor"""
|
|||
|
|
await super().start()
|
|||
|
|
await self.sync_known_collections()
|
|||
|
|
|
|||
|
|
async def sync_known_collections(self):
|
|||
|
|
"""Query backend to populate known_collections set"""
|
|||
|
|
# Backend-specific implementation:
|
|||
|
|
# - Milvus/Pinecone/Qdrant: List collections/indexes matching naming pattern
|
|||
|
|
# - Cassandra: Query keyspaces or collection metadata
|
|||
|
|
# - Neo4j/Memgraph/FalkorDB: Query CollectionMetadata nodes
|
|||
|
|
pass
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
4. **重构现有的处理方法:**
|
|||
|
|
```python
|
|||
|
|
# Rename and remove response sending:
|
|||
|
|
# handle_create_collection → create_collection_internal
|
|||
|
|
# handle_delete_collection → delete_collection_internal
|
|||
|
|
|
|||
|
|
async def create_collection_internal(self, user, collection):
|
|||
|
|
"""Create collection (idempotent)"""
|
|||
|
|
# Same logic as current handle_create_collection
|
|||
|
|
# But remove response producer calls
|
|||
|
|
# Handle "already exists" gracefully
|
|||
|
|
pass
|
|||
|
|
|
|||
|
|
async def delete_collection_internal(self, user, collection):
|
|||
|
|
"""Delete collection (idempotent)"""
|
|||
|
|
# Same logic as current handle_delete_collection
|
|||
|
|
# But remove response producer calls
|
|||
|
|
# Handle "not found" gracefully
|
|||
|
|
pass
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
5. **移除存储管理基础设施:**
|
|||
|
|
移除 `self.storage_request_consumer` 的配置和启动
|
|||
|
|
移除 `self.storage_response_producer` 的配置
|
|||
|
|
移除 `on_storage_management` 的调度器方法
|
|||
|
|
移除存储管理的指标
|
|||
|
|
移除导入:`StorageManagementRequest`, `StorageManagementResponse`
|
|||
|
|
|
|||
|
|
**后端特定注意事项:**
|
|||
|
|
|
|||
|
|
**向量存储 (Milvus, Pinecone, Qdrant):** 跟踪 `(user, collection)` 在 `known_collections` 中的逻辑,但可能会为每个维度创建多个后端集合。继续采用延迟创建模式。删除操作必须删除所有维度变体。
|
|||
|
|
|
|||
|
|
**Cassandra Objects:** 集合是行属性,而不是结构。跟踪键空间级别的信息。
|
|||
|
|
|
|||
|
|
**图数据库 (Neo4j, Memgraph, FalkorDB):** 启动时查询 `CollectionMetadata` 节点。在同步时创建/删除元数据节点。
|
|||
|
|
|
|||
|
|
**Cassandra 三元组:** 使用 `KnowledgeGraph` API 进行集合操作。
|
|||
|
|
|
|||
|
|
**关键设计要点:**
|
|||
|
|
|
|||
|
|
**最终一致性:** 没有请求/响应机制,配置推送是广播的
|
|||
|
|
**幂等性:** 所有创建/删除操作都必须可以安全重试
|
|||
|
|
**错误处理:** 记录错误,但不要阻止配置更新
|
|||
|
|
**自愈:** 失败的操作将在下一次配置推送时重试
|
|||
|
|
**集合键格式:** `"user:collection"` 在 `config["collections"]` 中
|
|||
|
|
|
|||
|
|
#### 变更 11:更新集合模式 - 移除时间戳
|
|||
|
|
**文件:** `trustgraph-base/trustgraph/schema/services/collection.py`
|
|||
|
|
|
|||
|
|
**修改 CollectionMetadata (第 13-21 行):**
|
|||
|
|
移除 `created_at` 和 `updated_at` 字段:
|
|||
|
|
```python
|
|||
|
|
class CollectionMetadata(Record):
|
|||
|
|
user = String()
|
|||
|
|
collection = String()
|
|||
|
|
name = String()
|
|||
|
|
description = String()
|
|||
|
|
tags = Array(String())
|
|||
|
|
# Remove: created_at = String()
|
|||
|
|
# Remove: updated_at = String()
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**修改 CollectionManagementRequest (第 25-47 行):**
|
|||
|
|
移除时间戳字段:
|
|||
|
|
```python
|
|||
|
|
class CollectionManagementRequest(Record):
|
|||
|
|
operation = String()
|
|||
|
|
user = String()
|
|||
|
|
collection = String()
|
|||
|
|
timestamp = String()
|
|||
|
|
name = String()
|
|||
|
|
description = String()
|
|||
|
|
tags = Array(String())
|
|||
|
|
# Remove: created_at = String()
|
|||
|
|
# Remove: updated_at = String()
|
|||
|
|
tag_filter = Array(String())
|
|||
|
|
limit = Integer()
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Rationale:**
|
|||
|
|
Timestamps don't add value for collections
|
|||
|
|
Config service maintains its own version tracking
|
|||
|
|
Simplifies schema and reduces storage
|
|||
|
|
|
|||
|
|
#### Benefits of Config Service Migration
|
|||
|
|
|
|||
|
|
1. ✅ **Eliminates hardcoded storage management topics** - Solves multi-tenant blocker
|
|||
|
|
2. ✅ **Simpler coordination** - No complex async waiting for 4+ storage responses
|
|||
|
|
3. ✅ **Eventual consistency** - Storage services update independently via config push
|
|||
|
|
4. ✅ **Better reliability** - Persistent config push vs non-persistent request/response
|
|||
|
|
5. ✅ **Unified configuration model** - Collections treated as configuration
|
|||
|
|
6. ✅ **Reduces complexity** - Removes ~300 lines of coordination code
|
|||
|
|
7. ✅ **Multi-tenant ready** - Config already supports tenant isolation via keyspace
|
|||
|
|
8. ✅ **Version tracking** - Config service version mechanism provides audit trail
|
|||
|
|
|
|||
|
|
## Implementation Notes
|
|||
|
|
|
|||
|
|
### Backward Compatibility
|
|||
|
|
|
|||
|
|
**Parameter Changes:**
|
|||
|
|
CLI parameter renames are breaking changes but acceptable (feature currently non-functional)
|
|||
|
|
Services work without parameters (use defaults)
|
|||
|
|
Default keyspaces preserved: "config", "knowledge", "librarian"
|
|||
|
|
Default queue: `persistent://tg/config/config`
|
|||
|
|
|
|||
|
|
**Collection Management:**
|
|||
|
|
**Breaking change:** Collections table removed from librarian keyspace
|
|||
|
|
**No data migration provided** - acceptable for this phase
|
|||
|
|
External collection API unchanged (list/update/delete operations)
|
|||
|
|
Collection metadata format simplified (timestamps removed)
|
|||
|
|
|
|||
|
|
### Testing Requirements
|
|||
|
|
|
|||
|
|
**Parameter Testing:**
|
|||
|
|
1. Verify `--config-push-queue` parameter works on graph-embeddings service
|
|||
|
|
2. Verify `--config-push-queue` parameter works on text-completion service
|
|||
|
|
3. Verify `--config-push-queue` parameter works on config service
|
|||
|
|
4. Verify `--cassandra-keyspace` parameter works for config service
|
|||
|
|
5. Verify `--cassandra-keyspace` parameter works for cores service
|
|||
|
|
6. Verify `--cassandra-keyspace` parameter works for librarian service
|
|||
|
|
7. Verify services work without parameters (uses defaults)
|
|||
|
|
8. Verify multi-tenant deployment with custom queue names and keyspace
|
|||
|
|
|
|||
|
|
**Collection Management Testing:**
|
|||
|
|
9. Verify `list-collections` operation via config service
|
|||
|
|
10. Verify `update-collection` creates/updates in config table
|
|||
|
|
11. Verify `delete-collection` removes from config table
|
|||
|
|
12. Verify config push is triggered on collection updates
|
|||
|
|
13. Verify tag filtering works with config-based storage
|
|||
|
|
14. Verify collection operations work without timestamp fields
|
|||
|
|
|
|||
|
|
### Multi-Tenant Deployment Example
|
|||
|
|
```bash
|
|||
|
|
# Tenant: tg-dev
|
|||
|
|
graph-embeddings \
|
|||
|
|
-p pulsar+ssl://broker:6651 \
|
|||
|
|
--pulsar-api-key <KEY> \
|
|||
|
|
--config-push-queue persistent://tg-dev/config/config
|
|||
|
|
|
|||
|
|
config-service \
|
|||
|
|
-p pulsar+ssl://broker:6651 \
|
|||
|
|
--pulsar-api-key <KEY> \
|
|||
|
|
--config-push-queue persistent://tg-dev/config/config \
|
|||
|
|
--cassandra-keyspace tg_dev_config
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## 影响分析
|
|||
|
|
|
|||
|
|
### 受变更 1-2 影响的服务 (CLI 参数重命名)
|
|||
|
|
所有继承自 AsyncProcessor 或 FlowProcessor 的服务:
|
|||
|
|
config-service
|
|||
|
|
cores-service
|
|||
|
|
librarian-service
|
|||
|
|
graph-embeddings
|
|||
|
|
document-embeddings
|
|||
|
|
text-completion-* (所有提供者)
|
|||
|
|
extract-* (所有提取器)
|
|||
|
|
query-* (所有查询服务)
|
|||
|
|
retrieval-* (所有 RAG 服务)
|
|||
|
|
storage-* (所有存储服务)
|
|||
|
|
还有 20 多个服务
|
|||
|
|
|
|||
|
|
### 受变更 3-6 影响的服务 (Cassandra Keyspace)
|
|||
|
|
config-service
|
|||
|
|
cores-service
|
|||
|
|
librarian-service
|
|||
|
|
|
|||
|
|
### 受变更 7-11 影响的服务 (集合管理)
|
|||
|
|
|
|||
|
|
**即时变更:**
|
|||
|
|
librarian-service (collection_manager.py, service.py)
|
|||
|
|
tables/library.py (删除 collections 表)
|
|||
|
|
schema/services/collection.py (删除时间戳)
|
|||
|
|
|
|||
|
|
**已完成的变更 (变更 10):** ✅
|
|||
|
|
所有存储服务 (共 11 个) - 已迁移到配置推送,用于通过 `CollectionConfigHandler` 更新集合
|
|||
|
|
存储管理模式已从 `storage.py` 中移除
|
|||
|
|
|
|||
|
|
## 未来考虑
|
|||
|
|
|
|||
|
|
### 基于用户的 Keyspace 模式
|
|||
|
|
|
|||
|
|
一些服务使用 **基于用户的 Keyspace** 动态模式,其中每个用户都拥有自己的 Cassandra Keyspace:
|
|||
|
|
|
|||
|
|
**使用基于用户的 Keyspace 的服务:**
|
|||
|
|
1. **三元组查询服务** (`trustgraph-flow/trustgraph/query/triples/cassandra/service.py:65`)
|
|||
|
|
使用 `keyspace=query.user`
|
|||
|
|
2. **对象查询服务** (`trustgraph-flow/trustgraph/query/objects/cassandra/service.py:479`)
|
|||
|
|
使用 `keyspace=self.sanitize_name(user)`
|
|||
|
|
3. **知识图谱直接访问** (`trustgraph-flow/trustgraph/direct/cassandra_kg.py:18`)
|
|||
|
|
默认参数 `keyspace="trustgraph"`
|
|||
|
|
|
|||
|
|
**状态:** 这些 **未进行修改**,在本规范中。
|
|||
|
|
|
|||
|
|
**需要未来审查:**
|
|||
|
|
评估基于用户的 Keyspace 模式是否会产生租户隔离问题
|
|||
|
|
考虑是否需要为多租户部署使用 Keyspace 前缀模式 (例如,`tenant_a_user1`)
|
|||
|
|
审查是否存在用户 ID 在租户之间的冲突
|
|||
|
|
评估是否更倾向于使用单个共享 Keyspace,每个租户使用基于用户的行隔离
|
|||
|
|
|
|||
|
|
**注意:** 这不会阻止当前的 multi-tenant 实现,但在进行生产 multi-tenant 部署之前应进行审查。
|
|||
|
|
|
|||
|
|
## 实施阶段
|
|||
|
|
|
|||
|
|
### 第一阶段:参数修复 (变更 1-6)
|
|||
|
|
修复 `--config-push-queue` 参数命名
|
|||
|
|
添加 `--cassandra-keyspace` 参数支持
|
|||
|
|
**结果:** 启用了 multi-tenant 队列和 Keyspace 配置
|
|||
|
|
|
|||
|
|
### 第二阶段:集合管理迁移 (变更 7-9, 11)
|
|||
|
|
将集合存储迁移到配置服务
|
|||
|
|
从 librarian 中删除 collections 表
|
|||
|
|
更新集合模式 (删除时间戳)
|
|||
|
|
**结果:** 消除硬编码的存储管理主题,简化 librarian
|
|||
|
|
|
|||
|
|
### 第三阶段:存储服务更新 (变更 10) ✅ 已完成
|
|||
|
|
所有存储服务已更新为使用配置推送进行集合管理,通过 `CollectionConfigHandler`
|
|||
|
|
移除了存储管理请求/响应基础设施
|
|||
|
|
移除了旧的模式定义
|
|||
|
|
**结果:** 实现了基于配置的集合管理
|
|||
|
|
|
|||
|
|
## 引用
|
|||
|
|
GitHub Issue: https://github.com/trustgraph-ai/trustgraph/issues/582
|
|||
|
|
相关文件:
|
|||
|
|
`trustgraph-base/trustgraph/base/async_processor.py`
|
|||
|
|
`trustgraph-base/trustgraph/base/cassandra_config.py`
|
|||
|
|
`trustgraph-base/trustgraph/schema/core/topic.py`
|
|||
|
|
`trustgraph-base/trustgraph/schema/services/collection.py`
|
|||
|
|
`trustgraph-flow/trustgraph/config/service/service.py`
|
|||
|
|
`trustgraph-flow/trustgraph/cores/service.py`
|
|||
|
|
`trustgraph-flow/trustgraph/librarian/service.py`
|
|||
|
|
`trustgraph-flow/trustgraph/librarian/collection_manager.py`
|
|||
|
|
`trustgraph-flow/trustgraph/tables/library.py`
|