mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 08:26:21 +02:00
Feat: TrustGraph i18n & Documentation Translation Updates (#781)
Native CLI i18n: The TrustGraph CLI has built-in translation support that dynamically loads language strings. You can test and use different languages by passing the --lang flag (e.g., --lang es for Spanish, --lang ru for Russian) or by configuring your environment's LANG variable.

Automated Docs Translations: This PR introduces autonomously translated Markdown documentation in several target languages, including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew, Arabic, Simplified Chinese, and Russian.
parent
19f73e4cdc
commit
f95fd4f052
560 changed files with 236300 additions and 99 deletions
192
docs/tech-specs/explainability-cli.zh-cn.md
Normal file
@@ -0,0 +1,192 @@
---
layout: default
title: "Explainability CLI Technical Specification"
parent: "Chinese (Beta)"
---

# Explainability CLI Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Status

Draft
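
## Overview

This specification describes CLI tools for debugging and exploring explainability data in TrustGraph. These tools let users trace how an answer was generated and follow the provenance chain from edges back to source documents.

The three CLI tools:

1. **`tg-show-document-hierarchy`** - Displays the document → page → chunk → edge hierarchy
2. **`tg-list-explain-traces`** - Lists all GraphRAG sessions along with their questions
3. **`tg-show-explain-trace`** - Displays the full explainability trace for a session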
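
## Goals

- **Debugging**: Let developers inspect document processing results
- **Traceability**: Trace any extracted fact back to its original document
- **Transparency**: Show exactly how GraphRAG arrived at an answer
- **Usability**: A simple CLI interface with sensible defaults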
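
## Background

TrustGraph has two provenance systems:

1. **Extraction-time provenance** (see `extraction-time-provenance.md`): Records the document → page → chunk → edge relationships at ingestion time. Stored in the named graph `urn:graph:source` using the `prov:wasDerivedFrom` property.

2. **Query-time explainability** (see `query-time-explainability.md`): Records the question → exploration → focus → synthesis chain during a GraphRAG query. Stored in the named graph `urn:graph:retrieval`.

Current limitations:
- No easy way to visualize the document hierarchy after processing
- Triples must be queried manually to view explainability data
- No consolidated view of a GraphRAG session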

## Technical Design

### Tool 1: `tg-show-document-hierarchy`

**Purpose**: Given a document ID, traverse and display all entities derived from it.

**Usage**:
```bash
tg-show-document-hierarchy "urn:trustgraph:doc:abc123"
tg-show-document-hierarchy --show-content --max-content 500 "urn:trustgraph:doc:abc123"
```

**Parameters**:
| Parameter | Description |
|---|---|
| `document_id` | Document URI (positional) |
| `-u/--api-url` | API URL |
| `-t/--token` | Authentication token |
| `-U/--user` | User ID (default: `trustgraph`) |
| `-C/--collection` | Collection (default: `default`) |
| `--show-content` | Include blob/document content |
| `--max-content` | Maximum characters per blob (default: 200) |
| `--format` | Output format: `tree` (default), `json` |
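
The parameter table above maps naturally onto a command-line parser. The following is a minimal sketch only, assuming the tool is built on Python's standard `argparse` module; the function name `build_parser` is hypothetical, and the `None` defaults for the API URL and token are assumptions, since the specification does not state them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented parameters of Tool 1."""
    parser = argparse.ArgumentParser(prog="tg-show-document-hierarchy")
    parser.add_argument("document_id", help="Document URI (positional)")
    # Assumption: the spec documents no default for the API URL or the token.
    parser.add_argument("-u", "--api-url", default=None, help="API URL")
    parser.add_argument("-t", "--token", default=None, help="Authentication token")
    parser.add_argument("-U", "--user", default="trustgraph", help="User ID")
    parser.add_argument("-C", "--collection", default="default", help="Collection")
    parser.add_argument("--show-content", action="store_true",
                        help="Include blob/document content")
    parser.add_argument("--max-content", type=int, default=200,
                        help="Maximum characters per blob")
    parser.add_argument("--format", choices=["tree", "json"], default="tree",
                        help="Output format")
    return parser


# Parse the second usage example from the spec.
args = build_parser().parse_args(
    ["--show-content", "--max-content", "500", "urn:trustgraph:doc:abc123"]
)
```

Restricting `--format` with `choices` makes an unsupported format fail at parse time rather than at render time.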

**Implementation**:
1. Query triples: `?child prov:wasDerivedFrom <document_id>` in the `urn:graph:source` graph
2. Recursively query the children of each result
3. Build the tree structure: document → page → chunk
4. If `--show-content` is set, fetch content from the librarian API
5. Display as an indented tree or as JSON
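
Steps 1 to 3 and step 5 can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: triples are held in memory as `(subject, predicate, object)` tuples instead of being fetched from the TrustGraph API, and `build_hierarchy` and `render` are hypothetical helper names. Fetching content from the librarian API (step 4) is left out.

```python
from collections import defaultdict

DERIVED_FROM = "prov:wasDerivedFrom"


def build_hierarchy(triples, root):
    """Return a nested dict {"id": ..., "children": [...]} rooted at `root`."""
    # Index children by parent: (child, prov:wasDerivedFrom, parent).
    children = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == DERIVED_FROM:
            children[obj].append(subject)

    def expand(node, seen):
        # `seen` guards against cycles in malformed provenance data.
        if node in seen:
            return {"id": node, "children": []}
        seen = seen | {node}
        return {"id": node,
                "children": [expand(c, seen) for c in sorted(children[node])]}

    return expand(root, frozenset())


def render(node, prefix="", is_last=True, is_root=True):
    """Render the nested dict as a list of indented tree lines."""
    if is_root:
        lines = [node["id"]]
        child_prefix = ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + node["id"]]
        child_prefix = prefix + ("    " if is_last else "│   ")
    kids = node["children"]
    for i, kid in enumerate(kids):
        lines.extend(render(kid, child_prefix, i == len(kids) - 1, False))
    return lines


# Illustrative data shaped like the document, page and chunk URIs in this spec.
triples = [
    ("urn:trustgraph:doc:abc123/p1", DERIVED_FROM, "urn:trustgraph:doc:abc123"),
    ("urn:trustgraph:doc:abc123/p1/c0", DERIVED_FROM, "urn:trustgraph:doc:abc123/p1"),
    ("urn:trustgraph:doc:abc123/p1/c1", DERIVED_FROM, "urn:trustgraph:doc:abc123/p1"),
]
tree = build_hierarchy(triples, "urn:trustgraph:doc:abc123")
lines = render(tree)
print("\n".join(lines))
```

Building a parent-to-children index once keeps the recursion linear in the number of triples; a real implementation would instead issue one query per node, as step 2 describes.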

**Output example**:
```
Document: urn:trustgraph:doc:abc123
  Title: "Sample PDF"
  Type: application/pdf

  └── Page 1: urn:trustgraph:doc:abc123/p1
      ├── Chunk 0: urn:trustgraph:doc:abc123/p1/c0
          Content: "The quick brown fox..." [truncated]
      └── Chunk 1: urn:trustgraph:doc:abc123/p1/c1
          Content: "Machine learning is..." [truncated]
```

### Tool 2: `tg-list-explain-traces`

**Purpose**: List all GraphRAG sessions (questions) in a collection.

**Usage**:
```bash
tg-list-explain-traces
tg-list-explain-traces --limit 20 --format json
```

**Parameters**:
| Parameter | Description |
|---|---|
| `-u/--api-url` | API URL |
| `-t/--token` | Authentication token |
| `-U/--user` | User ID |
| `-C/--collection` | Collection |
| `--limit` | Maximum number of results (default: 50) |
| `--format` | Output format: `table` (default), `json` |

**Implementation**:
1. Query: `?session tg:query ?text` in the `urn:graph:retrieval` graph
2. Query timestamps: `?session prov:startedAtTime ?time`
3. Display as a table
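
The three steps above amount to joining two predicate lookups on the session ID and formatting the result. A minimal sketch, assuming the triples are already available in memory as `(subject, predicate, object)` tuples; `list_sessions` and `format_table` are hypothetical names, and the newest-first ordering is an assumption that the specification does not state.

```python
def list_sessions(triples, limit=50):
    """Join question text and start time per session, newest first."""
    questions = {s: o for s, p, o in triples if p == "tg:query"}
    times = {s: o for s, p, o in triples if p == "prov:startedAtTime"}
    rows = [(sid, text, times.get(sid, "")) for sid, text in questions.items()]
    # Assumption: timestamps are strings that sort chronologically.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


def format_table(rows):
    """Format rows as a plain-text table with padded columns."""
    headers = ("Session ID", "Question", "Time")
    widths = [max([len(h)] + [len(r[i]) for r in rows])
              for i, h in enumerate(headers)]

    def line(cells):
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths))

    out = [line(headers), "-|-".join("-" * w for w in widths)]
    out += [line(r) for r in rows]
    return "\n".join(out)


# Illustrative data taken from the output example in this spec.
triples = [
    ("urn:trustgraph:question:def456", "tg:query", "Who founded OpenAI?"),
    ("urn:trustgraph:question:def456", "prov:startedAtTime", "2024-01-15 09:15:00"),
    ("urn:trustgraph:question:abc123", "tg:query", "What was the War on Terror?"),
    ("urn:trustgraph:question:abc123", "prov:startedAtTime", "2024-01-15 10:30:00"),
]
rows = list_sessions(triples)
print(format_table(rows))
```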

**Output example**:
```
Session ID                                    | Question                       | Time
----------------------------------------------|--------------------------------|---------------------
urn:trustgraph:question:abc123                | What was the War on Terror?    | 2024-01-15 10:30:00
urn:trustgraph:question:def456                | Who founded OpenAI?            | 2024-01-15 09:15:00
```

### Tool 3: `tg-show-explain-trace`

**Purpose**: Display the full explainability trace for a GraphRAG session.

**Usage**:
```bash
tg-show-explain-trace "urn:trustgraph:question:abc123"
tg-show-explain-trace --max-answer 1000 --show-provenance "urn:trustgraph:question:abc123"
```

**Parameters**:
| Parameter | Description |
|---|---|
| `question_id` | Question URI (positional) |
| `-u/--api-url` | API URL |
| `-t/--token` | Authentication token |
| `-U/--user` | User ID |
| `-C/--collection` | Collection |
| `--max-answer` | Maximum characters for the answer (default: 500) |
| `--show-provenance` | Trace edges back to their source documents |
| `--format` | Output format: `text` (default), `json` |

**Implementation**:
1. Get the question text from the `tg:query` predicate
2. Find the exploration: `?exp prov:wasGeneratedBy <question_id>`
3. Find the focus: `?focus prov:wasDerivedFrom <exploration_id>`
4. Get the selected edges: `<focus_id> tg:selectedEdge ?edge`
5. For each edge, get `tg:edge` (the triple) and `tg:reasoning`
6. Find the synthesis: `?synth prov:wasDerivedFrom <focus_id>`
7. Fetch the answer via the librarian API
8. If `--show-provenance` is set, trace the edges back to their source documents
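
Steps 1 to 6 walk a chain of provenance links starting from the question. The sketch below follows that chain over in-memory `(subject, predicate, object)` tuples. It is an illustration under assumptions: `collect_trace` and its helpers are hypothetical names, the `urn:example:` identifiers are invented sample data, and a `tg:edge` value is represented as a plain Python tuple. The librarian API call (step 7) and provenance tracing (step 8) are omitted.

```python
def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is present."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is present."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def first(items):
    return items[0] if items else None


def collect_trace(triples, question_id):
    """Follow question -> exploration -> focus -> edges -> synthesis."""
    question = first(objects(triples, question_id, "tg:query"))
    exploration = first(subjects(triples, "prov:wasGeneratedBy", question_id))
    focus = None
    if exploration is not None:
        focus = first(subjects(triples, "prov:wasDerivedFrom", exploration))
    edges, synthesis = [], None
    if focus is not None:
        for edge_id in objects(triples, focus, "tg:selectedEdge"):
            edges.append({
                "edge": first(objects(triples, edge_id, "tg:edge")),
                "reasoning": first(objects(triples, edge_id, "tg:reasoning")),
            })
        synthesis = first(subjects(triples, "prov:wasDerivedFrom", focus))
    return {"question": question, "exploration": exploration,
            "focus": focus, "edges": edges, "synthesis": synthesis}


# Invented sample data; only the question URI and texts come from this spec.
Q = "urn:trustgraph:question:abc123"
triples = [
    (Q, "tg:query", "What was the War on Terror?"),
    ("urn:example:exp:1", "prov:wasGeneratedBy", Q),
    ("urn:example:focus:1", "prov:wasDerivedFrom", "urn:example:exp:1"),
    ("urn:example:focus:1", "tg:selectedEdge", "urn:example:edge:1"),
    ("urn:example:edge:1", "tg:edge",
     ("War on Terror", "definition", "A military campaign...")),
    ("urn:example:edge:1", "tg:reasoning",
     "Directly defines the subject of the query"),
    ("urn:example:synth:1", "prov:wasDerivedFrom", "urn:example:focus:1"),
]
trace = collect_trace(triples, Q)
```

Each stage returns `None` or an empty list when a link is missing, so a partially recorded session still yields a usable, if incomplete, trace.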

**Output example**:
```
=== GraphRAG Session: urn:trustgraph:question:abc123 ===

Question: What was the War on Terror?
Time: 2024-01-15 10:30:00

--- Exploration ---
Retrieved 50 edges from knowledge graph

--- Focus (Edge Selection) ---
Selected 12 edges:

1. (War on Terror, definition, "A military campaign...")
   Reasoning: Directly defines the subject of the query
   Source: chunk → page 2 → "Beyond the Vigilant State"

2. (Guantanamo Bay, part_of, War on Terror)
   Reasoning: Shows key component of the campaign

--- Synthesis ---
Answer:
The War on Terror was a military campaign initiated...
[truncated at 500 chars]
```
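
The `[truncated at 500 chars]` marker in the output above suggests a simple length cap driven by `--max-answer`. A minimal sketch, assuming truncation counts characters and appends the marker on its own line; the helper name `truncate_answer` is hypothetical.

```python
def truncate_answer(text: str, limit: int = 500) -> str:
    """Cap `text` at `limit` characters and note the cut on a new line."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated at {limit} chars]"


short = truncate_answer("The War on Terror was a military campaign.")
long = truncate_answer("x" * 600, limit=500)
```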

## Files Created

| File | Purpose |
|---|---|
| `trustgraph-cli/trustgraph/cli/show_document_hierarchy.py` | Tool 1 |
| `trustgraph-cli/trustgraph/cli/list_explain_traces.py` | Tool 2 |
| `trustgraph-cli/trustgraph/cli/show_explain_trace.py` | Tool 3 |

## References

- Query-time explainability: `docs/tech-specs/query-time-explainability.md`
- Extraction-time provenance: `docs/tech-specs/extraction-time-provenance.md`
- Existing CLI example: `trustgraph-cli/trustgraph/cli/invoke_graph_rag.py`