| layout | title | parent |
|---|---|---|
| default | Structured Data Pulsar Schema Changes | Chinese (Beta) |
Structured Data Pulsar Schema Changes
Beta Translation: This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
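Overview
Based on the STRUCTURED_DATA.md specification, this document proposes the Pulsar schema additions and modifications needed to support structured data capabilities in TrustGraph.
Required Schema Changes
1. Core Schema Enhancements
Enhanced Field Definition
The existing Field class in core/primitives.py needs additional attributes: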
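from pulsar.schema import Record, String, Integer, Boolean, Array
class Field(Record):
    name = String()
    type = String()  # int, string, long, bool, float, double, timestamp
    size = Integer()
    primary = Boolean()
    description = String()
    # New fields:
    required = Boolean()  # Whether the field is required
    enum_values = Array(String())  # Allowed values for enum-type fields
    indexed = Boolean()  # Whether the field should be indexed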
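To illustrate how the new attributes might be consumed, here is a minimal sketch of row validation against a list of field definitions. It uses a plain dataclass, `FieldSpec`, as a stand-in for the Pulsar `Field` record so that it runs without the Pulsar client installed. `FieldSpec` and `validate_row` are hypothetical names introduced for this example, not part of TrustGraph.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """Plain-Python stand-in for the Pulsar Field record (hypothetical)."""
    name: str
    type: str = "string"
    required: bool = False
    enum_values: list = field(default_factory=list)
    indexed: bool = False


def validate_row(fields, row):
    """Return a list of human-readable problems found in `row`."""
    problems = []
    for spec in fields:
        value = row.get(spec.name)
        if value is None:
            if spec.required:
                problems.append(f"missing required field: {spec.name}")
            continue
        # An empty enum_values list means the field is unconstrained.
        if spec.enum_values and value not in spec.enum_values:
            problems.append(f"invalid value for {spec.name}: {value!r}")
    return problems


fields = [
    FieldSpec(name="id", type="long", required=True, indexed=True),
    FieldSpec(name="status", enum_values=["active", "inactive"]),
]

print(validate_row(fields, {"id": "42", "status": "active"}))
print(validate_row(fields, {"status": "archived"}))
```

The first call reports no problems; the second reports both the missing `id` and the out-of-range `status`. Treating an empty `enum_values` list as "no constraint" is a design assumption of this sketch.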
2. New Knowledge Schemas
2.1 Structured Data Submission
New file: knowledge/structured.py
from pulsar.schema import Record, String, Bytes, Map
from ..core.metadata import Metadata
class StructuredDataSubmission(Record):
    metadata = Metadata()
    format = String()  # "json", "csv", "xml"
    schema_name = String()  # References a schema in the configuration
    data = Bytes()  # Raw data to ingest
    options = Map(String())  # Format-specific options
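As a rough sketch of how a decoder might turn the `data` bytes into records, the function below dispatches on the `format` value using only the standard library. The function name `decode_submission` and the `"delimiter"` and `"encoding"` option keys are assumptions made for this example; the specification does not define which option names are supported. XML handling is left out.

```python
import csv
import io
import json


def decode_submission(fmt, data, options=None):
    """Decode raw submission bytes into a list of dict records.

    `fmt` mirrors StructuredDataSubmission.format and `options` mirrors
    its string-to-string options map. The "delimiter" and "encoding"
    option names are assumptions made for this sketch.
    """
    options = options or {}
    text = data.decode(options.get("encoding", "utf-8"))
    if fmt == "json":
        parsed = json.loads(text)
        # Accept either a single object or an array of objects.
        return parsed if isinstance(parsed, list) else [parsed]
    if fmt == "csv":
        reader = csv.DictReader(
            io.StringIO(text), delimiter=options.get("delimiter", ",")
        )
        return [dict(row) for row in reader]
    raise ValueError(f"unsupported format: {fmt}")


records = decode_submission(
    "csv", b"id;name\n1;Alice\n2;Bob\n", {"delimiter": ";"}
)
print(records)
```

Every decoded value is still a string at this point; converting values to the types declared in the target schema would be a separate step.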
3. Structured Query Schemas
3.1 NLP to Structured Query Service
New file: services/nlp_query.py
from pulsar.schema import Record, String, Array, Map, Integer, Double
from ..core.primitives import Error
class NLPToStructuredQueryRequest(Record):
    natural_language_query = String()
    max_results = Integer()
    context_hints = Map(String())  # Optional context for query generation
class NLPToStructuredQueryResponse(Record):
    error = Error()
    graphql_query = String()  # Generated GraphQL query
    variables = Map(String())  # GraphQL variables, if any
    detected_schemas = Array(String())  # Schemas the query targets
    confidence = Double()
3.2 Structured Query Service
New file: services/structured_query.py
from pulsar.schema import Record, String, Map, Array
from ..core.primitives import Error
class StructuredQueryRequest(Record):
    query = String()  # GraphQL query
    variables = Map(String())  # GraphQL variables
    operation_name = String()  # Optional operation name for multi-operation documents
class StructuredQueryResponse(Record):
    error = Error()
    data = String()  # JSON-encoded GraphQL response data
    errors = Array(String())  # GraphQL errors, if any
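Because `data` travels as a JSON-encoded string, a caller has to decode it after checking `errors`. The sketch below shows one way this might look; `unpack_response` is a hypothetical helper, and it takes the two fields as plain arguments so that it runs without the Pulsar client. The service-level `error` field is not handled here.

```python
import json


def unpack_response(data, errors):
    """Decode the JSON-encoded `data` string of a query response.

    Raises RuntimeError if the `errors` list is non-empty.
    """
    if errors:
        raise RuntimeError("; ".join(errors))
    return json.loads(data) if data else None


result = unpack_response('{"customers": [{"id": "1", "name": "Alice"}]}', [])
print(result["customers"][0]["name"])
```

GraphQL permits a response that carries both partial data and errors. This sketch treats any error as fatal, which is a simplifying assumption rather than something the schema requires.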
2.2 Object Extraction Output
New file: knowledge/object.py
from pulsar.schema import Record, String, Map, Double
from ..core.metadata import Metadata
class ExtractedObject(Record):
    metadata = Metadata()
    schema_name = String()  # Which schema this object belongs to
    values = Map(String())  # Field name -> value
    confidence = Double()
    source_span = String()  # Text span where the object was found
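Since `values` is a string-to-string map, a consumer needs the schema's field types to recover typed values. The following sketch converts the strings using the type names listed in the `Field.type` comment. The `CONVERTERS` table and `coerce_values` function are hypothetical, and the wire encodings assumed here (`"true"`/`"false"` for booleans, ISO 8601 for timestamps) are not defined by the specification.

```python
from datetime import datetime

# Converters keyed by the type names listed in the Field.type comment.
# The bool and timestamp string encodings are assumptions of this sketch.
CONVERTERS = {
    "string": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "bool": lambda s: s.strip().lower() == "true",
    "timestamp": datetime.fromisoformat,
}


def coerce_values(values, field_types):
    """Convert string values to Python types; unlisted fields stay strings."""
    return {
        name: CONVERTERS[field_types.get(name, "string")](raw)
        for name, raw in values.items()
    }


typed = coerce_values(
    {"id": "42", "score": "0.5", "active": "true", "seen": "2024-01-15T10:30:00"},
    {"id": "long", "score": "double", "active": "bool", "seen": "timestamp"},
)
print(typed)
```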
4. Enhanced Knowledge Schemas
4.1 Object Embeddings Enhancement
Update knowledge/embeddings.py to better support structured object embeddings:
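from pulsar.schema import Record, String, Array, Map, Double
from ..core.metadata import Metadata
class StructuredObjectEmbedding(Record):
    metadata = Metadata()
    vectors = Array(Array(Double()))
    schema_name = String()
    object_id = String()  # Primary key value
    field_embeddings = Map(Array(Double()))  # Per-field embeddings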
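A producer may want to sanity-check the vectors before publishing such a record. The sketch below checks that the object-level vectors and the per-field embeddings all have the same length. The helper `embedding_dimension` is hypothetical, and the requirement that every vector shares one dimension is an assumption of this example rather than something the schema enforces.

```python
def embedding_dimension(vectors, field_embeddings):
    """Return the shared vector length, or raise ValueError on a mismatch."""
    candidates = list(vectors) + list(field_embeddings.values())
    if not candidates:
        raise ValueError("no vectors supplied")
    dim = len(candidates[0])
    for vec in candidates:
        if len(vec) != dim:
            raise ValueError(f"expected dimension {dim}, got {len(vec)}")
    return dim


dim = embedding_dimension(
    vectors=[[0.1, 0.2, 0.3]],
    field_embeddings={"name": [0.0, 0.5, 1.0], "city": [0.9, 0.1, 0.4]},
)
print(dim)
```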
Integration Points
Flow Integration
These schemas will be used by the new flow modules:
- trustgraph-flow/trustgraph/decoding/structured - uses StructuredDataSubmission
- trustgraph-flow/trustgraph/query/nlp_query/cassandra - uses the NLP query schemas
- trustgraph-flow/trustgraph/query/objects/cassandra - uses the structured query schemas
- trustgraph-flow/trustgraph/extract/object/row/ - consumes Chunk, produces ExtractedObject
- trustgraph-flow/trustgraph/storage/objects/cassandra - uses the Rows schema
- trustgraph-flow/trustgraph/embeddings/object_embeddings/qdrant - uses the object embedding schemas
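Implementation Notes
- Schema versioning: Consider adding a version field to RowSchema to support future migrations
- Type system: Field.type should support all Cassandra native types
- Batch operations: Most services should support both single and batch operations
- Error handling: Error reporting should be consistent across all new services
- Backward compatibility: Existing schemas are unaffected, apart from minor enhancements to Field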
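For the type-system note, one possible starting point is a lookup from the `Field.type` names listed earlier to CQL type names. The right-hand values below are real CQL types, but the pairing itself, along with the names `CQL_TYPES` and `cql_column`, is an assumption of this sketch. Supporting all Cassandra native types would mean extending the table (for example with `uuid` or `decimal`).

```python
# Field.type name -> CQL type name. The pairing is an assumption.
CQL_TYPES = {
    "int": "int",
    "long": "bigint",
    "string": "text",
    "bool": "boolean",
    "float": "float",
    "double": "double",
    "timestamp": "timestamp",
}


def cql_column(name, field_type):
    """Render one column definition, e.g. for a CREATE TABLE statement."""
    try:
        return f"{name} {CQL_TYPES[field_type]}"
    except KeyError:
        raise ValueError(f"no CQL mapping for type: {field_type}") from None


print(cql_column("customer_id", "long"))
```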
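Next Steps
- Implement the schema files in the new structure
- Update existing services to recognize the new schema types
- Implement the flow modules that use these schemas
- Add gateway/reverse-gateway endpoints for the new services
- Create unit tests for schema validation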