mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 16:36:21 +02:00
* Added tech spec
* Add provenance recording to React agent loop
Enables agent sessions to be traced and debugged using the same
explainability infrastructure as GraphRAG. Agent traces record:
- Session start with query and timestamp
- Each iteration's thought, action, arguments, and observation
- Final answer with derivation chain
Changes:
- Add session_id and collection fields to AgentRequest schema
- Add agent predicates (TG_THOUGHT, TG_ACTION, etc.) to namespaces
- Create agent provenance triple generators in provenance/agent.py
- Register explainability producer in agent service
- Emit provenance triples during agent execution
- Update CLI tools to detect and render agent traces alongside GraphRAG
* Updated explainability taxonomy:
GraphRAG: tg:Question → tg:Exploration → tg:Focus → tg:Synthesis
Agent: tg:Question → tg:Analysis(s) → tg:Conclusion
All entities also have their PROV-O type (prov:Activity or prov:Entity).
Updated commit message:
Add provenance recording to React agent loop
Enables agent sessions to be traced and debugged using the same
explainability infrastructure as GraphRAG.
Entity types follow human reasoning patterns:
- tg:Question - the user's query (shared with GraphRAG)
- tg:Analysis - each think/act/observe cycle
- tg:Conclusion - the final answer
Also adds explicit TG types to GraphRAG entities:
- tg:Question, tg:Exploration, tg:Focus, tg:Synthesis
All types retain their PROV-O base types (prov:Activity, prov:Entity).
Changes:
- Add session_id and collection fields to AgentRequest schema
- Add explainability entity types to namespaces.py
- Create agent provenance triple generators
- Register explainability producer in agent service
- Emit provenance triples during agent execution
- Update CLI tools to detect and render both trace types
* Document RAG explainability is now complete. Here's a summary of the
changes made:
Schema Changes:
- trustgraph-base/trustgraph/schema/services/retrieval.py: Added
explain_id and explain_graph fields to DocumentRagResponse
- trustgraph-base/trustgraph/messaging/translators/retrieval.py:
Updated translator to handle explainability fields
Provenance Changes:
- trustgraph-base/trustgraph/provenance/namespaces.py: Added
TG_CHUNK_COUNT and TG_SELECTED_CHUNK predicates
- trustgraph-base/trustgraph/provenance/uris.py: Added
docrag_question_uri, docrag_exploration_uri, docrag_synthesis_uri
generators
- trustgraph-base/trustgraph/provenance/triples.py: Added
docrag_question_triples, docrag_exploration_triples,
docrag_synthesis_triples builders
- trustgraph-base/trustgraph/provenance/__init__.py: Exported all
new Document RAG functions and predicates
Service Changes:
- trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py:
Added explainability callback support and triple emission at each
phase (Question → Exploration → Synthesis)
- trustgraph-flow/trustgraph/retrieval/document_rag/rag.py:
Registered explainability producer and wired up the callback
Documentation:
- docs/tech-specs/agent-explainability.md: Added Document RAG entity
types and provenance model documentation
Document RAG Provenance Model:
Question (urn:trustgraph:docrag:{uuid})
│
│ tg:query, prov:startedAtTime
│ rdf:type = prov:Activity, tg:Question
│
↓ prov:wasGeneratedBy
│
Exploration (urn:trustgraph:docrag:{uuid}/exploration)
│
│ tg:chunkCount, tg:selectedChunk (multiple)
│ rdf:type = prov:Entity, tg:Exploration
│
↓ prov:wasDerivedFrom
│
Synthesis (urn:trustgraph:docrag:{uuid}/synthesis)
│
│ tg:content = "The answer..."
│ rdf:type = prov:Entity, tg:Synthesis
* Specific subtype that makes the retrieval mechanism immediately
obvious:
System: GraphRAG
TG Types on Question: tg:Question, tg:GraphRagQuestion
URI Pattern: urn:trustgraph:question:{uuid}
────────────────────────────────────────
System: Document RAG
TG Types on Question: tg:Question, tg:DocRagQuestion
URI Pattern: urn:trustgraph:docrag:{uuid}
────────────────────────────────────────
System: Agent
TG Types on Question: tg:Question, tg:AgentQuestion
URI Pattern: urn:trustgraph:agent:{uuid}
Files modified:
- trustgraph-base/trustgraph/provenance/namespaces.py - Added
TG_GRAPH_RAG_QUESTION, TG_DOC_RAG_QUESTION, TG_AGENT_QUESTION
- trustgraph-base/trustgraph/provenance/triples.py - Added subtype to
question_triples and docrag_question_triples
- trustgraph-base/trustgraph/provenance/agent.py - Added subtype to
agent_session_triples
- trustgraph-base/trustgraph/provenance/__init__.py - Exported new types
- docs/tech-specs/agent-explainability.md - Documented the subtypes
This allows:
- Query all questions: ?q rdf:type tg:Question
- Query only GraphRAG: ?q rdf:type tg:GraphRagQuestion
- Query only Document RAG: ?q rdf:type tg:DocRagQuestion
- Query only Agent: ?q rdf:type tg:AgentQuestion
* Fixed tests
272 lines
9.8 KiB
Markdown
272 lines
9.8 KiB
Markdown
# Agent Explainability: Provenance Recording
|
|
|
|
## Overview
|
|
|
|
Add provenance recording to the React agent loop so agent sessions can be traced and debugged using the same explainability infrastructure as GraphRAG.
|
|
|
|
**Design Decisions:**
|
|
- Write to `urn:graph:retrieval` (generic explainability graph)
|
|
- Linear dependency chain for now (analysis N → wasDerivedFrom → analysis N-1)
|
|
- Tools are opaque black boxes (record input/output only)
|
|
- DAG support deferred to future iteration
|
|
|
|
## Entity Types
|
|
|
|
Both GraphRAG and Agent use PROV-O as the base ontology with TrustGraph-specific subtypes:
|
|
|
|
### GraphRAG Types
|
|
| Entity | PROV-O Type | TG Types | Description |
|
|
|--------|-------------|----------|-------------|
|
|
| Question | `prov:Activity` | `tg:Question`, `tg:GraphRagQuestion` | The user's query |
|
|
| Exploration | `prov:Entity` | `tg:Exploration` | Edges retrieved from knowledge graph |
|
|
| Focus | `prov:Entity` | `tg:Focus` | Selected edges with reasoning |
|
|
| Synthesis | `prov:Entity` | `tg:Synthesis` | Final answer |
|
|
|
|
### Agent Types
|
|
| Entity | PROV-O Type | TG Types | Description |
|
|
|--------|-------------|----------|-------------|
|
|
| Question | `prov:Activity` | `tg:Question`, `tg:AgentQuestion` | The user's query |
|
|
| Analysis | `prov:Entity` | `tg:Analysis` | Each think/act/observe cycle |
|
|
| Conclusion | `prov:Entity` | `tg:Conclusion` | Final answer |
|
|
|
|
### Document RAG Types
|
|
| Entity | PROV-O Type | TG Types | Description |
|
|
|--------|-------------|----------|-------------|
|
|
| Question | `prov:Activity` | `tg:Question`, `tg:DocRagQuestion` | The user's query |
|
|
| Exploration | `prov:Entity` | `tg:Exploration` | Chunks retrieved from document store |
|
|
| Synthesis | `prov:Entity` | `tg:Synthesis` | Final answer |
|
|
|
|
**Note:** Document RAG uses a subset of GraphRAG's types (no Focus step since there's no edge selection/reasoning phase).
|
|
|
|
### Question Subtypes
|
|
|
|
All Question entities share `tg:Question` as a base type but have a specific subtype to identify the retrieval mechanism:
|
|
|
|
| Subtype | URI Pattern | Mechanism |
|
|
|---------|-------------|-----------|
|
|
| `tg:GraphRagQuestion` | `urn:trustgraph:question:{uuid}` | Knowledge graph RAG |
|
|
| `tg:DocRagQuestion` | `urn:trustgraph:docrag:{uuid}` | Document/chunk RAG |
|
|
| `tg:AgentQuestion` | `urn:trustgraph:agent:{uuid}` | ReAct agent |
|
|
|
|
This allows querying all questions via `tg:Question` while filtering by specific mechanism via the subtype.
|
|
|
|
## Provenance Model
|
|
|
|
```
|
|
Question (urn:trustgraph:agent:{uuid})
|
|
│
|
|
│ tg:query = "User's question"
|
|
│ prov:startedAtTime = timestamp
|
|
│ rdf:type = prov:Activity, tg:Question
|
|
│
|
|
↓ prov:wasDerivedFrom
|
|
│
|
|
Analysis1 (urn:trustgraph:agent:{uuid}/i1)
|
|
│
|
|
│ tg:thought = "I need to query the knowledge base..."
|
|
│ tg:action = "knowledge-query"
|
|
│ tg:arguments = {"question": "..."}
|
|
│ tg:observation = "Result from tool..."
|
|
│ rdf:type = prov:Entity, tg:Analysis
|
|
│
|
|
↓ prov:wasDerivedFrom
|
|
│
|
|
Analysis2 (urn:trustgraph:agent:{uuid}/i2)
|
|
│ ...
|
|
↓ prov:wasDerivedFrom
|
|
│
|
|
Conclusion (urn:trustgraph:agent:{uuid}/final)
|
|
│
|
|
│ tg:answer = "The final response..."
|
|
│ rdf:type = prov:Entity, tg:Conclusion
|
|
```
|
|
|
|
### Document RAG Provenance Model
|
|
|
|
```
|
|
Question (urn:trustgraph:docrag:{uuid})
|
|
│
|
|
│ tg:query = "User's question"
|
|
│ prov:startedAtTime = timestamp
|
|
│ rdf:type = prov:Activity, tg:Question
|
|
│
|
|
↓ prov:wasGeneratedBy
|
|
│
|
|
Exploration (urn:trustgraph:docrag:{uuid}/exploration)
|
|
│
|
|
│ tg:chunkCount = 5
|
|
│ tg:selectedChunk = "chunk-id-1"
|
|
│ tg:selectedChunk = "chunk-id-2"
|
|
│ ...
|
|
│ rdf:type = prov:Entity, tg:Exploration
|
|
│
|
|
↓ prov:wasDerivedFrom
|
|
│
|
|
Synthesis (urn:trustgraph:docrag:{uuid}/synthesis)
|
|
│
|
|
│ tg:content = "The synthesized answer..."
|
|
│ rdf:type = prov:Entity, tg:Synthesis
|
|
```
|
|
|
|
## Changes Required
|
|
|
|
### 1. Schema Changes
|
|
|
|
**File:** `trustgraph-base/trustgraph/schema/services/agent.py`
|
|
|
|
Add `session_id` and `collection` fields to `AgentRequest`:
|
|
```python
|
|
@dataclass
|
|
class AgentRequest:
|
|
question: str = ""
|
|
state: str = ""
|
|
group: list[str] | None = None
|
|
history: list[AgentStep] = field(default_factory=list)
|
|
user: str = ""
|
|
collection: str = "default" # NEW: Collection for provenance traces
|
|
streaming: bool = False
|
|
session_id: str = "" # NEW: For provenance tracking across iterations
|
|
```
|
|
|
|
**File:** `trustgraph-base/trustgraph/messaging/translators/agent.py`
|
|
|
|
Update translator to handle `session_id` and `collection` in both `to_pulsar()` and `from_pulsar()`.
|
|
|
|
### 2. Add Explainability Producer to Agent Service
|
|
|
|
**File:** `trustgraph-flow/trustgraph/agent/react/service.py`
|
|
|
|
Register an "explainability" producer (same pattern as GraphRAG):
|
|
```python
|
|
from ... base import ProducerSpec
|
|
from ... schema import Triples
|
|
|
|
# In __init__:
|
|
self.register_specification(
|
|
ProducerSpec(
|
|
name = "explainability",
|
|
schema = Triples,
|
|
)
|
|
)
|
|
```
|
|
|
|
### 3. Provenance Triple Generation
|
|
|
|
**File:** `trustgraph-base/trustgraph/provenance/agent.py`
|
|
|
|
Create helper functions (similar to GraphRAG's `question_triples`, `exploration_triples`, etc.):
|
|
```python
|
|
def agent_session_triples(session_uri, query, timestamp):
|
|
"""Generate triples for agent Question."""
|
|
return [
|
|
Triple(s=session_uri, p=RDF_TYPE, o=PROV_ACTIVITY),
|
|
Triple(s=session_uri, p=RDF_TYPE, o=TG_QUESTION),
|
|
Triple(s=session_uri, p=TG_QUERY, o=query),
|
|
Triple(s=session_uri, p=PROV_STARTED_AT_TIME, o=timestamp),
|
|
]
|
|
|
|
def agent_iteration_triples(iteration_uri, parent_uri, thought, action, arguments, observation):
|
|
"""Generate triples for one Analysis step."""
|
|
return [
|
|
Triple(s=iteration_uri, p=RDF_TYPE, o=PROV_ENTITY),
|
|
Triple(s=iteration_uri, p=RDF_TYPE, o=TG_ANALYSIS),
|
|
Triple(s=iteration_uri, p=TG_THOUGHT, o=thought),
|
|
Triple(s=iteration_uri, p=TG_ACTION, o=action),
|
|
Triple(s=iteration_uri, p=TG_ARGUMENTS, o=json.dumps(arguments)),
|
|
Triple(s=iteration_uri, p=TG_OBSERVATION, o=observation),
|
|
Triple(s=iteration_uri, p=PROV_WAS_DERIVED_FROM, o=parent_uri),
|
|
]
|
|
|
|
def agent_final_triples(final_uri, parent_uri, answer):
|
|
"""Generate triples for Conclusion."""
|
|
return [
|
|
Triple(s=final_uri, p=RDF_TYPE, o=PROV_ENTITY),
|
|
Triple(s=final_uri, p=RDF_TYPE, o=TG_CONCLUSION),
|
|
Triple(s=final_uri, p=TG_ANSWER, o=answer),
|
|
Triple(s=final_uri, p=PROV_WAS_DERIVED_FROM, o=parent_uri),
|
|
]
|
|
```
|
|
|
|
### 4. Type Definitions
|
|
|
|
**File:** `trustgraph-base/trustgraph/provenance/namespaces.py`
|
|
|
|
Add explainability entity types and agent predicates:
|
|
```python
|
|
# Explainability entity types (used by both GraphRAG and Agent)
|
|
TG_QUESTION = TG + "Question"
|
|
TG_EXPLORATION = TG + "Exploration"
|
|
TG_FOCUS = TG + "Focus"
|
|
TG_SYNTHESIS = TG + "Synthesis"
|
|
TG_ANALYSIS = TG + "Analysis"
|
|
TG_CONCLUSION = TG + "Conclusion"
|
|
|
|
# Agent predicates
|
|
TG_THOUGHT = TG + "thought"
|
|
TG_ACTION = TG + "action"
|
|
TG_ARGUMENTS = TG + "arguments"
|
|
TG_OBSERVATION = TG + "observation"
|
|
TG_ANSWER = TG + "answer"
|
|
```
|
|
|
|
## Files Modified
|
|
|
|
| File | Change |
|
|
|------|--------|
|
|
| `trustgraph-base/trustgraph/schema/services/agent.py` | Add session_id and collection to AgentRequest |
|
|
| `trustgraph-base/trustgraph/messaging/translators/agent.py` | Update translator for new fields |
|
|
| `trustgraph-base/trustgraph/provenance/namespaces.py` | Add entity types, agent predicates, and Document RAG predicates |
|
|
| `trustgraph-base/trustgraph/provenance/triples.py` | Add TG types to GraphRAG triple builders, add Document RAG triple builders |
|
|
| `trustgraph-base/trustgraph/provenance/uris.py` | Add Document RAG URI generators |
|
|
| `trustgraph-base/trustgraph/provenance/__init__.py` | Export new types, predicates, and Document RAG functions |
|
|
| `trustgraph-base/trustgraph/schema/services/retrieval.py` | Add explain_id and explain_graph to DocumentRagResponse |
|
|
| `trustgraph-base/trustgraph/messaging/translators/retrieval.py` | Update DocumentRagResponseTranslator for explainability fields |
|
|
| `trustgraph-flow/trustgraph/agent/react/service.py` | Add explainability producer + recording logic |
|
|
| `trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py` | Add explainability callback and emit provenance triples |
|
|
| `trustgraph-flow/trustgraph/retrieval/document_rag/rag.py` | Add explainability producer and wire up callback |
|
|
| `trustgraph-cli/trustgraph/cli/show_explain_trace.py` | Handle agent trace types |
|
|
| `trustgraph-cli/trustgraph/cli/list_explain_traces.py` | List agent sessions alongside GraphRAG |
|
|
|
|
## Files Created
|
|
|
|
| File | Purpose |
|
|
|------|---------|
|
|
| `trustgraph-base/trustgraph/provenance/agent.py` | Agent-specific triple generators |
|
|
|
|
## CLI Updates
|
|
|
|
**Detection:** Both GraphRAG and Agent Questions have `tg:Question` type. Distinguished by:
|
|
1. URI pattern: `urn:trustgraph:agent:` vs `urn:trustgraph:question:`
|
|
2. Derived entities: `tg:Analysis` (agent) vs `tg:Exploration` (GraphRAG)
|
|
|
|
**`list_explain_traces.py`:**
|
|
- Shows Type column (Agent vs GraphRAG)
|
|
|
|
**`show_explain_trace.py`:**
|
|
- Auto-detects trace type
|
|
- Agent rendering shows: Question → Analysis step(s) → Conclusion
|
|
|
|
## Backwards Compatibility
|
|
|
|
- `session_id` defaults to `""` - old requests work, just won't have provenance
|
|
- `collection` defaults to `"default"` - reasonable fallback
|
|
- CLI gracefully handles both trace types
|
|
|
|
## Verification
|
|
|
|
```bash
|
|
# Run an agent query
|
|
tg-invoke-agent -q "What is the capital of France?"
|
|
|
|
# List traces (should show agent sessions with Type column)
|
|
tg-list-explain-traces -U trustgraph -C default
|
|
|
|
# Show agent trace
|
|
tg-show-explain-trace "urn:trustgraph:agent:xxx"
|
|
```
|
|
|
|
## Future Work (Not This PR)
|
|
|
|
- DAG dependencies (when analysis N uses results from multiple prior analyses)
|
|
- Tool-specific provenance linking (KnowledgeQuery → its GraphRAG trace)
|
|
- Streaming provenance emission (emit as we go, not batch at end)
|