Addresses recommendations from the UX developer's agent experience report.
Adds provenance predicates, DAG structure changes, error resilience, and
a published OWL ontology.
Explainability additions:
- Tool candidates: tg:toolCandidate on Analysis events lists the tools
visible to the LLM for each iteration (names only, descriptions in config)
- Termination reason: tg:terminationReason on Conclusion/Synthesis events
(final-answer, plan-complete, subagents-complete)
- Step counter: tg:stepNumber on iteration events
- Pattern decision: new tg:PatternDecision entity in the DAG between
session and first iteration, carrying tg:pattern and tg:taskType
- Latency: tg:llmDurationMs on Analysis events, tg:toolDurationMs on
Observation events
- Token counts on events: tg:inToken/tg:outToken/tg:llmModel on
Grounding, Focus, Synthesis, and Analysis events
- Tool/parse errors: tg:toolError on Observation events with tg:Error
mixin type. Parse failures return as error observations instead of
crashing the agent, giving it a chance to retry.
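A minimal sketch (rdflib) of how these predicates might be attached to an
Analysis event. The tg: namespace URI, the event URI, and the values are
illustrative assumptions; only the predicate names come from the list above.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import XSD

    TG = Namespace("http://trustgraph.ai/ns/")   # assumed namespace URI

    g = Graph()
    analysis = URIRef("urn:example:analysis-3")  # hypothetical event URI

    g.add((analysis, TG.stepNumber, Literal(3, datatype=XSD.integer)))
    g.add((analysis, TG.llmDurationMs, Literal(412, datatype=XSD.integer)))
    for tool in ("knowledge_query", "calculator"):  # names only, per above
        g.add((analysis, TG.toolCandidate, Literal(tool)))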
Envelope unification:
- Rename chunk_type to message_type across AgentResponse schema,
translator, SDK types, socket clients, CLI, and all tests.
Agent and RAG services now both use message_type on the wire.
Ontology:
- specs/ontology/trustgraph.ttl — OWL vocabulary covering all 26 classes,
7 object properties, and 36+ datatype properties including new predicates.
DAG structure tests:
- tests/unit/test_provenance/test_dag_structure.py verifies the
wasDerivedFrom chain for GraphRAG, DocumentRAG, and all three agent
patterns (react, plan, supervisor) including the pattern-decision link.
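The rough shape of such a check, for orientation only (helper and fixture
names here are hypothetical, not the actual test code):

    from rdflib import Graph, Namespace, URIRef

    PROV = Namespace("http://www.w3.org/ns/prov#")

    def walk_chain(g: Graph, start: URIRef) -> list:
        # Follow prov:wasDerivedFrom links from start until the chain ends
        chain, node = [start], start
        while (nxt := g.value(node, PROV.wasDerivedFrom)) is not None:
            chain.append(nxt)
            node = nxt
        return chain

    def test_react_chain(react_trace):  # fixture assumed
        chain = walk_chain(react_trace, URIRef("urn:example:conclusion"))
        # Conclusion back through Observations/Analyses to the Question
        assert len(chain) >= 3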
Expose LLM token usage (in_token, out_token, model) across all
service layers
Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional;
None means "not available", distinguishing it from a real zero count.
Key changes:
- Schema: Add in_token/out_token/model to TextCompletionResponse,
PromptResponse, GraphRagResponse, DocumentRagResponse,
AgentResponse
- TextCompletionClient: New TextCompletionResult return type. Split
into text_completion() (non-streaming) and
text_completion_stream() (streaming with per-chunk handler
callback)
- PromptClient: New PromptResult with response_type
(text/json/jsonl), typed fields (text/object/objects), and token
usage. All callers updated.
- RAG services: Accumulate token usage across all prompt calls
(extract-concepts, edge-scoring, edge-reasoning,
synthesis). Non-streaming path sends single combined response
instead of chunk + end_of_session.
- Agent orchestrator: UsageTracker accumulates tokens across
meta-router, pattern prompt calls, and react reasoning. Attached
to end_of_dialog.
- Translators: Encode token fields only when they are not None (checked
with is not None rather than truthiness, so a zero count still encodes)
- Python SDK: RAG and text-completion methods return
TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
token fields (streaming); the result shape is sketched after this list
- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
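To make the None-versus-zero semantics concrete, here is the rough shape
of TextCompletionResult as described above (a sketch; the real class may
differ in detail):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class TextCompletionResult:
        text: str
        in_token: Optional[int] = None   # None = "not available"
        out_token: Optional[int] = None  # 0 would be a real zero count
        model: Optional[str] = None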
Refactor agent provenance so that the decision (thought + tool
selection) and the result (observation) are separate DAG entities:
Question ← Analysis+ToolUse ← Observation ← ... ← Conclusion
Analysis gains tg:ToolUse as a mixin RDF type and is emitted
before tool execution via an on_action callback in react().
This ensures sub-traces (e.g. GraphRAG) appear after their
parent Analysis in the streaming event order.
Observation becomes a standalone prov:Entity with tg:Observation
type, emitted after tool execution. The linear DAG chain runs
through Observation — subsequent iterations and the Conclusion
derive from it, not from the Analysis.
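The flow, roughly (parameter names and the Action shape are assumptions
for illustration, not the actual trustgraph react() signature):

    async def react(question, reason, run_tool, on_action, on_observation):
        prev = question
        while True:
            action = await reason(prev)   # thought + tool selection
            await on_action(action)       # emit Analysis+ToolUse first, so
                                          # sub-traces (e.g. GraphRAG) stream
                                          # after their parent Analysis
            if action.is_final:
                return action.answer
            result = await run_tool(action)
            # next iteration derives from the Observation, not the Analysis
            prev = await on_observation(action, result)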
message_id is populated on streaming AgentResponse for thought
and observation chunks, using the provenance URI of the entity
being built. This lets clients group streamed chunks by entity.
Wire changes:
- provenance/agent.py: Add ToolUse type, new
agent_observation_triples(), remove observation from iteration
- agent_manager.py: Add on_action callback between reason() and
tool execution
- orchestrator/pattern_base.py: Split emit, wire message_id,
chain through observation URIs
- orchestrator/react_pattern.py: Emit Analysis via on_action
before tool runs
- agent/react/service.py: Same for non-orchestrator path
- api/explainability.py: New Observation class, updated dispatch
and chain walker
- api/types.py: Add message_id to AgentThought/AgentObservation
- cli: Render Observation separately, [analysis: tool] labels
Use max_completion_tokens for OpenAI and Azure OpenAI providers:
The OpenAI API deprecated max_tokens in favor of
max_completion_tokens for chat completions. Newer models
(gpt-4o, o1, o3) reject the old parameter with a 400 error.
The AZURE_API_VERSION env var now overrides the default API version
(falls back to 2024-12-01-preview).
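The parameter change in the OpenAI Python client looks like this (the
model name and values are illustrative):

    import os
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Say hello."}],
        max_completion_tokens=256,  # max_tokens now 400s on newer models
    )

    # Azure: honour the override, falling back to the default version
    api_version = os.environ.get("AZURE_API_VERSION", "2024-12-01-preview")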
Update tests to verify the expected structures
The subjectOf triples were redundant with the subgraph provenance model
introduced in e8407b34. Entity-to-source lineage can be traced via
tg:contains -> subgraph -> prov:wasDerivedFrom -> chunk, making the
direct subjectOf edges unnecessary metadata polluting the knowledge graph.
Removed from all three extractors (agent, definitions, relationships),
cleaned up the SUBJECT_OF constant and vocabulary label, and updated
tests accordingly.
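Lineage traversal under the subgraph model, sketched with rdflib (the
tg: namespace URI and the exact direction of tg:contains are assumptions
based on the description above):

    from rdflib import Graph, Namespace

    TG = Namespace("http://trustgraph.ai/ns/")
    PROV = Namespace("http://www.w3.org/ns/prov#")

    def source_chunks(g: Graph, node):
        # node <- tg:contains - subgraph - prov:wasDerivedFrom -> chunk
        for subgraph in g.subjects(TG.contains, node):
            yield from g.objects(subgraph, PROV.wasDerivedFrom)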
Replace per-triple provenance reification with subgraph model
Extraction provenance previously created a full reification (statement
URI, activity, agent) for every single extracted triple, producing ~13
provenance triples per knowledge triple. Since each chunk is processed
by a single LLM call, this was both redundant and semantically
inaccurate.
Now one subgraph object is created per chunk extraction, with
tg:contains linking to each extracted triple. For 20 extractions from
a chunk this reduces provenance from ~260 triples to ~33.
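A sketch of the per-chunk provenance shape; the exact triples emitted by
the real subgraph_provenance_triples() are an assumption based on the
description above.

    from rdflib import Namespace
    from rdflib.namespace import RDF

    TG = Namespace("http://trustgraph.ai/ns/")
    PROV = Namespace("http://www.w3.org/ns/prov#")

    def subgraph_provenance_triples(subgraph_uri, chunk_uri, activity_uri,
                                    extracted):
        triples = [
            (subgraph_uri, RDF.type, TG.Subgraph),
            (subgraph_uri, PROV.wasDerivedFrom, chunk_uri),
            (subgraph_uri, PROV.wasGeneratedBy, activity_uri),
        ]
        # one tg:contains per extracted triple, replacing the ~13
        # reification triples previously created for each one
        triples += [(subgraph_uri, TG.contains, t) for t in extracted]
        return triples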
- Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri
- Replace triple_provenance_triples() with subgraph_provenance_triples()
- Refactor kg-extract-definitions and kg-extract-relationships to
generate provenance once per chunk instead of per triple
- Add subgraph provenance to kg-extract-ontology and kg-extract-agent
(previously had none)
- Update CLI tools and tech specs to match
Also rename tg-show-document-hierarchy to tg-show-extraction-provenance.
Added extra typing for extraction provenance and fixed the extraction-provenance CLI
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual-consistency handling for trace
fetching (see the sketch after this list)
- Content fetching from librarian with retry logic
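The quiescence idea, roughly (names are hypothetical, not the actual
ExplainabilityClient API): keep re-fetching until the trace stops growing
for a settle period, since provenance triples land eventually rather than
atomically.

    import time

    def fetch_until_quiescent(fetch_trace, explain_id,
                              settle=2.0, timeout=30.0):
        deadline = time.monotonic() + timeout
        prev_size, last_growth = -1, time.monotonic()
        while time.monotonic() < deadline:
            trace = fetch_trace(explain_id)
            if len(trace) != prev_size:
                prev_size, last_growth = len(trace), time.monotonic()
            elif time.monotonic() - last_growth >= settle:
                return trace       # stable for `settle` seconds
            time.sleep(0.5)
        return fetch_trace(explain_id)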
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
Update RAG and Agent clients for streaming message handling
GraphRAG now sends multiple message types in a stream:
- 'explain' messages with explain_id and explain_graph for
provenance
- 'chunk' messages with response text fragments
- end_of_session marker for stream completion
Updated all clients to handle this properly:
CLI clients (trustgraph-base/trustgraph/clients/):
- graph_rag_client.py: Added chunk_callback and explain_callback
- document_rag_client.py: Added chunk_callback and explain_callback
- agent_client.py: Added think, observe, answer_callback,
error_callback
Internal clients (trustgraph-base/trustgraph/base/):
- graph_rag_client.py: Async callbacks for streaming
- agent_client.py: Async callbacks for streaming
All clients now (see the sketch after this list):
- Route messages by chunk_type/message_type
- Stream via optional callbacks for incremental delivery
- Wait for proper completion signals
(end_of_dialog/end_of_session/end_of_stream)
- Accumulate and return complete response for callers not using
callbacks
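The common routing shape, as a sketch (the message format is assumed;
the real clients differ in detail):

    async def consume(stream, chunk_callback=None, explain_callback=None):
        parts = []
        async for msg in stream:
            kind = msg.get("message_type") or msg.get("chunk_type")
            if kind == "chunk":
                parts.append(msg["response"])
                if chunk_callback:
                    await chunk_callback(msg["response"])
            elif kind == "explain" and explain_callback:
                await explain_callback(msg["explain_id"],
                                       msg["explain_graph"])
            elif kind == "end_of_session":
                break                 # wait for the completion signal
        return "".join(parts)         # full answer for non-callback callers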
Updated callers:
- extract/kg/agent/extract.py: Uses new invoke(question=...) API
- tests/integration/test_agent_kg_extraction_integration.py:
Updated mocks
This fixes the agent infinite loop issue where knowledge_query was
returning the first 'explain' message (empty response) instead of
waiting for the actual answer chunks.
Concurrency in triples query
The metadata field (list of triples) in the pipeline Metadata class
was redundant. Document metadata triples already flow directly from
librarian to triple-store via emit_document_provenance(); they don't
need to pass through the extraction pipeline.
Additionally, chunker and PDF decoder were overwriting metadata to []
anyway, so any metadata passed through the pipeline was being
discarded.
Changes:
- Remove metadata field from Metadata dataclass
(schema/core/metadata.py)
- Update all Metadata instantiations to remove metadata=[]
parameter
- Remove metadata handling from translators (document_loading,
knowledge)
- Remove metadata consumption from extractors (ontology, agent)
- Update gateway serializers and import handlers
- Update all unit, integration, and contract tests
Terminology rename and named graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri -> question_uri,
retrieval_uri -> exploration_uri, selection_uri -> focus_uri,
answer_uri -> synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- __init__.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples (sketched at the end of this section)
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
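A plausible shape for set_graph(), for illustration; the real signature
and the constant's value are assumptions:

    GRAPH_RETRIEVAL = "http://trustgraph.ai/graph/retrieval"  # assumed URI

    def set_graph(triples, graph_uri=GRAPH_RETRIEVAL):
        # Tag (s, p, o) triples with a named graph, yielding quads
        return [(graph_uri, s, p, o) for (s, p, o) in triples]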
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
* Don't emit graph embeddings if there aren't any.
* Don't store graph embeddings in a knowledge store if there's an empty list.
* Translate between Cassandra's 'null' (its representation of an empty
list) and the empty list that the surrounding code expects (and stored
in the first place).
* Avoid emitting empty embedding lists
* Avoid emitting empty triple lists
* Fix tests
* Changed schema Value -> Term, a major breaking change
* Following the schema change, propagated Value -> Term through all processing
* Updated Cassandra for g, p, s, o index patterns (7 indexes)
* Reviewed and updated all tests
* Neo4j, Memgraph and FalkorDB remain broken; will revisit once things settle down
* Removed legacy storage management cruft. Tidied tech specs.
* Fix deletion of last collection
* Storage processor ignores data on the queue which is for a deleted collection
* Updated tests
* Tech spec
* Address multi-tenant queue option problems in CLI
* Modified collection service to use config
* Changed storage management to use the config service definition
* Updates for agent API with streaming support
* Added tg-dump-queues tool to dump Pulsar queues to a log
* Updated tg-invoke-agent, incremental output
* Queue dumper CLI; might be useful for debugging
* Updated tests
* Tidy up duplicate tech specs in doc directory
* Streaming LLM text-completion service tech spec.
* text-completion and prompt interfaces
* Streaming change applied to all LLMs; so far tested with VertexAI
* Skipped Pinecone unit tests while an upstream module issue was affecting them; tests are passing again
* Added agent streaming; not yet working, with broken tests
* Tweak the structured query schema
* Structured-query service
* Gateway support for nlp-query and structured-query
* API support
* Added CLI
* Update tests
* More tests
* Tech spec for graceful shutdown
* Graceful shutdown of importers/exporters
* Update socket to include graceful shutdown orchestration
* Added tests for conditions tracked in this PR (a generic sketch of the drain-then-close idea follows)
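For illustration only, a generic asyncio drain-then-close pattern; this
is not TrustGraph's actual importer/exporter shutdown orchestration, and
the worker/drain callables are assumptions:

    import asyncio, signal

    async def run(worker, drain):
        stop = asyncio.Event()
        loop = asyncio.get_running_loop()
        for sig in (signal.SIGINT, signal.SIGTERM):
            loop.add_signal_handler(sig, stop.set)
        task = asyncio.create_task(worker(stop))
        await stop.wait()
        await drain()   # flush in-flight messages before closing sockets
        await task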