2024-09-05 16:45:22 +01:00
|
|
|
"""
|
2025-01-02 19:49:22 +00:00
|
|
|
Uses the GraphRAG service to answer a question
|
2024-09-05 16:45:22 +01:00
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
import argparse
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
import json
|
2024-09-05 16:45:22 +01:00
|
|
|
import os
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
import sys
|
|
|
|
|
import websockets
|
|
|
|
|
import asyncio
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
from trustgraph.api import (
|
|
|
|
|
Api,
|
|
|
|
|
ExplainabilityClient,
|
|
|
|
|
RAGChunk,
|
|
|
|
|
ProvenanceEvent,
|
|
|
|
|
Question,
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
Grounding,
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
Exploration,
|
|
|
|
|
Focus,
|
|
|
|
|
Synthesis,
|
|
|
|
|
)
|
2024-09-05 16:45:22 +01:00
|
|
|
|
2025-01-02 19:49:22 +00:00
|
|
|
default_url = os.getenv("TRUSTGRAPH_URL", 'http://localhost:8088/')
|
2025-12-04 17:38:57 +00:00
|
|
|
default_token = os.getenv("TRUSTGRAPH_TOKEN", None)
|
2024-10-02 18:14:29 +01:00
|
|
|
default_user = 'trustgraph'
|
|
|
|
|
default_collection = 'default'
|
2025-03-13 00:38:18 +00:00
|
|
|
default_entity_limit = 50
|
|
|
|
|
default_triple_limit = 30
|
|
|
|
|
default_max_subgraph_size = 150
|
|
|
|
|
default_max_path_length = 2
|
2026-03-21 20:06:29 +00:00
|
|
|
default_edge_score_limit = 30
|
|
|
|
|
default_edge_limit = 25
|
2024-09-05 16:45:22 +01:00
|
|
|
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
# Provenance predicates
|
|
|
|
|
TG = "https://trustgraph.ai/ns/"
|
|
|
|
|
TG_QUERY = TG + "query"
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
TG_CONCEPT = TG + "concept"
|
|
|
|
|
TG_ENTITY = TG + "entity"
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
TG_EDGE_COUNT = TG + "edgeCount"
|
|
|
|
|
TG_SELECTED_EDGE = TG + "selectedEdge"
|
|
|
|
|
TG_EDGE = TG + "edge"
|
|
|
|
|
TG_REASONING = TG + "reasoning"
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
TG_DOCUMENT = TG + "document"
|
2026-03-13 11:37:59 +00:00
|
|
|
TG_CONTAINS = TG + "contains"
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
PROV = "http://www.w3.org/ns/prov#"
|
|
|
|
|
PROV_STARTED_AT_TIME = PROV + "startedAtTime"
|
|
|
|
|
PROV_WAS_DERIVED_FROM = PROV + "wasDerivedFrom"
|
|
|
|
|
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _get_event_type(prov_id):
|
|
|
|
|
"""Extract event type from provenance_id"""
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
if "question" in prov_id:
|
|
|
|
|
return "question"
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
elif "grounding" in prov_id:
|
|
|
|
|
return "grounding"
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
elif "exploration" in prov_id:
|
|
|
|
|
return "exploration"
|
|
|
|
|
elif "focus" in prov_id:
|
|
|
|
|
return "focus"
|
|
|
|
|
elif "synthesis" in prov_id:
|
|
|
|
|
return "synthesis"
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
return "provenance"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _format_provenance_details(event_type, triples):
|
|
|
|
|
"""Format provenance details based on event type and triples"""
|
|
|
|
|
lines = []
|
|
|
|
|
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
if event_type == "question":
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
# Show query and timestamp
|
|
|
|
|
for s, p, o in triples:
|
|
|
|
|
if p == TG_QUERY:
|
|
|
|
|
lines.append(f" Query: {o}")
|
|
|
|
|
elif p == PROV_STARTED_AT_TIME:
|
|
|
|
|
lines.append(f" Time: {o}")
|
|
|
|
|
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
elif event_type == "grounding":
|
|
|
|
|
# Show extracted concepts
|
|
|
|
|
concepts = [o for s, p, o in triples if p == TG_CONCEPT]
|
|
|
|
|
if concepts:
|
|
|
|
|
lines.append(f" Concepts: {len(concepts)}")
|
|
|
|
|
for concept in concepts:
|
|
|
|
|
lines.append(f" - {concept}")
|
|
|
|
|
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
elif event_type == "exploration":
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
# Show edge count (seed entities resolved separately with labels)
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
for s, p, o in triples:
|
|
|
|
|
if p == TG_EDGE_COUNT:
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
lines.append(f" Edges explored: {o}")
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
elif event_type == "focus":
|
|
|
|
|
# For focus, just count edge selection URIs
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
# The actual edge details are fetched separately via edge_selections parameter
|
|
|
|
|
edge_sel_uris = []
|
|
|
|
|
for s, p, o in triples:
|
|
|
|
|
if p == TG_SELECTED_EDGE:
|
|
|
|
|
edge_sel_uris.append(o)
|
|
|
|
|
if edge_sel_uris:
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
lines.append(f" Focused on {len(edge_sel_uris)} edge(s)")
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
elif event_type == "synthesis":
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
# Show document reference (content already streamed)
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
for s, p, o in triples:
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
if p == TG_DOCUMENT:
|
|
|
|
|
lines.append(f" Document: {o}")
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
|
|
|
|
|
return lines
|
|
|
|
|
|
|
|
|
|
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
async def _query_triples_once(ws_url, flow_id, prov_id, user, collection, graph=None, debug=False):
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
"""Query triples for a provenance node (single attempt)"""
|
|
|
|
|
request = {
|
|
|
|
|
"id": "triples-request",
|
|
|
|
|
"service": "triples",
|
|
|
|
|
"flow": flow_id,
|
|
|
|
|
"request": {
|
|
|
|
|
"s": {"t": "i", "i": prov_id},
|
|
|
|
|
"user": user,
|
|
|
|
|
"collection": collection,
|
|
|
|
|
"limit": 100
|
|
|
|
|
}
|
|
|
|
|
}
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
# Add graph filter if specified (for named graph queries)
|
|
|
|
|
if graph is not None:
|
|
|
|
|
request["request"]["g"] = graph
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] querying triples for s={prov_id}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
triples = []
|
|
|
|
|
try:
|
|
|
|
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=30) as websocket:
|
|
|
|
|
await websocket.send(json.dumps(request))
|
|
|
|
|
|
|
|
|
|
async for raw_message in websocket:
|
|
|
|
|
response = json.loads(raw_message)
|
|
|
|
|
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] response: {json.dumps(response)[:200]}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
if response.get("id") != "triples-request":
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
if "error" in response:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] error: {response['error']}", file=sys.stderr)
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
if "response" in response:
|
|
|
|
|
resp = response["response"]
|
|
|
|
|
# Handle triples response
|
|
|
|
|
# Response format: {"response": [triples...]}
|
|
|
|
|
# Each triple uses compact keys: "i" for iri, "v" for value, "t" for type
|
|
|
|
|
triple_list = resp.get("response", [])
|
|
|
|
|
for t in triple_list:
|
|
|
|
|
s = t.get("s", {}).get("i", t.get("s", {}).get("v", ""))
|
|
|
|
|
p = t.get("p", {}).get("i", t.get("p", {}).get("v", ""))
|
|
|
|
|
# Handle quoted triples (type "t") and regular values
|
|
|
|
|
o_term = t.get("o", {})
|
|
|
|
|
if o_term.get("t") == "t":
|
|
|
|
|
# Quoted triple - extract s, p, o from nested structure
|
|
|
|
|
tr = o_term.get("tr", {})
|
|
|
|
|
o = {
|
|
|
|
|
"s": tr.get("s", {}).get("i", ""),
|
|
|
|
|
"p": tr.get("p", {}).get("i", ""),
|
|
|
|
|
"o": tr.get("o", {}).get("i", tr.get("o", {}).get("v", "")),
|
|
|
|
|
}
|
|
|
|
|
else:
|
|
|
|
|
o = o_term.get("i", o_term.get("v", ""))
|
|
|
|
|
triples.append((s, p, o))
|
|
|
|
|
|
|
|
|
|
if resp.get("complete") or response.get("complete"):
|
|
|
|
|
break
|
|
|
|
|
except Exception as e:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] exception: {e}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] got {len(triples)} triples", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
return triples
|
|
|
|
|
|
|
|
|
|
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
async def _query_triples(ws_url, flow_id, prov_id, user, collection, graph=None, max_retries=5, retry_delay=0.2, debug=False):
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
"""Query triples for a provenance node with retries for race condition"""
|
|
|
|
|
for attempt in range(max_retries):
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
triples = await _query_triples_once(ws_url, flow_id, prov_id, user, collection, graph=graph, debug=debug)
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
if triples:
|
|
|
|
|
return triples
|
|
|
|
|
# Wait before retry if empty (triples may not be stored yet)
|
|
|
|
|
if attempt < max_retries - 1:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] retry {attempt + 1}/{max_retries}...", file=sys.stderr)
|
|
|
|
|
await asyncio.sleep(retry_delay)
|
|
|
|
|
return []
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
async def _query_edge_provenance(ws_url, flow_id, edge_s, edge_p, edge_o, user, collection, debug=False):
|
|
|
|
|
"""
|
|
|
|
|
Query for provenance of an edge (s, p, o) in the knowledge graph.
|
|
|
|
|
|
2026-03-13 11:37:59 +00:00
|
|
|
Finds subgraphs that contain the edge via tg:contains, then follows
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
prov:wasDerivedFrom to find source documents.
|
|
|
|
|
|
|
|
|
|
Returns list of source URIs (chunks, pages, documents).
|
|
|
|
|
"""
|
2026-03-13 11:37:59 +00:00
|
|
|
# Query for subgraphs that contain this edge: ?subgraph tg:contains <<s p o>>
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
request = {
|
|
|
|
|
"id": "edge-prov-request",
|
|
|
|
|
"service": "triples",
|
|
|
|
|
"flow": flow_id,
|
|
|
|
|
"request": {
|
2026-03-13 11:37:59 +00:00
|
|
|
"p": {"t": "i", "i": TG_CONTAINS},
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
"o": {
|
|
|
|
|
"t": "t", # Quoted triple type
|
|
|
|
|
"tr": {
|
|
|
|
|
"s": {"t": "i", "i": edge_s},
|
|
|
|
|
"p": {"t": "i", "i": edge_p},
|
|
|
|
|
"o": {"t": "i", "i": edge_o} if edge_o.startswith("http") or edge_o.startswith("urn:") else {"t": "l", "v": edge_o},
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"user": user,
|
|
|
|
|
"collection": collection,
|
|
|
|
|
"limit": 10
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] querying edge provenance for ({edge_s}, {edge_p}, {edge_o})", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
stmt_uris = []
|
|
|
|
|
try:
|
|
|
|
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=30) as websocket:
|
|
|
|
|
await websocket.send(json.dumps(request))
|
|
|
|
|
|
|
|
|
|
async for raw_message in websocket:
|
|
|
|
|
response = json.loads(raw_message)
|
|
|
|
|
|
|
|
|
|
if response.get("id") != "edge-prov-request":
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
if "error" in response:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] error: {response['error']}", file=sys.stderr)
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
if "response" in response:
|
|
|
|
|
resp = response["response"]
|
|
|
|
|
triple_list = resp.get("response", [])
|
|
|
|
|
for t in triple_list:
|
|
|
|
|
s = t.get("s", {}).get("i", "")
|
|
|
|
|
if s:
|
|
|
|
|
stmt_uris.append(s)
|
|
|
|
|
|
|
|
|
|
if resp.get("complete") or response.get("complete"):
|
|
|
|
|
break
|
|
|
|
|
except Exception as e:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] exception querying edge provenance: {e}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] found {len(stmt_uris)} reifying statements", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
# For each statement, query wasDerivedFrom to find sources
|
|
|
|
|
sources = []
|
|
|
|
|
for stmt_uri in stmt_uris:
|
|
|
|
|
# Query: stmt_uri prov:wasDerivedFrom ?source
|
|
|
|
|
request = {
|
|
|
|
|
"id": "derived-from-request",
|
|
|
|
|
"service": "triples",
|
|
|
|
|
"flow": flow_id,
|
|
|
|
|
"request": {
|
|
|
|
|
"s": {"t": "i", "i": stmt_uri},
|
|
|
|
|
"p": {"t": "i", "i": PROV_WAS_DERIVED_FROM},
|
|
|
|
|
"user": user,
|
|
|
|
|
"collection": collection,
|
|
|
|
|
"limit": 10
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=30) as websocket:
|
|
|
|
|
await websocket.send(json.dumps(request))
|
|
|
|
|
|
|
|
|
|
async for raw_message in websocket:
|
|
|
|
|
response = json.loads(raw_message)
|
|
|
|
|
|
|
|
|
|
if response.get("id") != "derived-from-request":
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
if "error" in response:
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
if "response" in response:
|
|
|
|
|
resp = response["response"]
|
|
|
|
|
triple_list = resp.get("response", [])
|
|
|
|
|
for t in triple_list:
|
|
|
|
|
o = t.get("o", {}).get("i", "")
|
|
|
|
|
if o:
|
|
|
|
|
sources.append(o)
|
|
|
|
|
|
|
|
|
|
if resp.get("complete") or response.get("complete"):
|
|
|
|
|
break
|
|
|
|
|
except Exception as e:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] exception querying wasDerivedFrom: {e}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] found {len(sources)} source(s): {sources}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
return sources
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
async def _query_derived_from(ws_url, flow_id, uri, user, collection, debug=False):
|
|
|
|
|
"""Query for the prov:wasDerivedFrom parent of a URI. Returns None if no parent."""
|
|
|
|
|
request = {
|
|
|
|
|
"id": "parent-request",
|
|
|
|
|
"service": "triples",
|
|
|
|
|
"flow": flow_id,
|
|
|
|
|
"request": {
|
|
|
|
|
"s": {"t": "i", "i": uri},
|
|
|
|
|
"p": {"t": "i", "i": PROV_WAS_DERIVED_FROM},
|
|
|
|
|
"user": user,
|
|
|
|
|
"collection": collection,
|
|
|
|
|
"limit": 1
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=30) as websocket:
|
|
|
|
|
await websocket.send(json.dumps(request))
|
|
|
|
|
|
|
|
|
|
async for raw_message in websocket:
|
|
|
|
|
response = json.loads(raw_message)
|
|
|
|
|
|
|
|
|
|
if response.get("id") != "parent-request":
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
if "error" in response:
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
if "response" in response:
|
|
|
|
|
resp = response["response"]
|
|
|
|
|
triple_list = resp.get("response", [])
|
|
|
|
|
if triple_list:
|
|
|
|
|
return triple_list[0].get("o", {}).get("i", None)
|
|
|
|
|
|
|
|
|
|
if resp.get("complete") or response.get("complete"):
|
|
|
|
|
break
|
|
|
|
|
except Exception as e:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] exception querying parent: {e}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
return None
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
async def _trace_provenance_chain(ws_url, flow_id, source_uri, user, collection, label_cache, debug=False):
|
|
|
|
|
"""
|
|
|
|
|
Trace the full provenance chain from a source URI up to the root document.
|
|
|
|
|
Returns a list of (uri, label) tuples from leaf to root.
|
|
|
|
|
"""
|
|
|
|
|
chain = []
|
|
|
|
|
current = source_uri
|
|
|
|
|
max_depth = 10 # Prevent infinite loops
|
|
|
|
|
|
|
|
|
|
for _ in range(max_depth):
|
|
|
|
|
if not current:
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
# Get label for current entity
|
|
|
|
|
label = await _query_label(ws_url, flow_id, current, user, collection, label_cache, debug)
|
|
|
|
|
chain.append((current, label))
|
|
|
|
|
|
|
|
|
|
# Get parent
|
|
|
|
|
parent = await _query_derived_from(ws_url, flow_id, current, user, collection, debug)
|
|
|
|
|
if not parent or parent == current:
|
|
|
|
|
break
|
|
|
|
|
current = parent
|
|
|
|
|
|
|
|
|
|
return chain
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _format_provenance_chain(chain):
|
|
|
|
|
"""
|
|
|
|
|
Format a provenance chain as a human-readable string.
|
|
|
|
|
Chain is [(uri, label), ...] from leaf to root.
|
|
|
|
|
"""
|
|
|
|
|
if not chain:
|
|
|
|
|
return ""
|
|
|
|
|
|
|
|
|
|
# Show labels, from leaf to root
|
|
|
|
|
labels = [label for uri, label in chain]
|
|
|
|
|
return " → ".join(labels)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _is_iri(value):
|
|
|
|
|
"""Check if a value looks like an IRI."""
|
|
|
|
|
if not isinstance(value, str):
|
|
|
|
|
return False
|
|
|
|
|
return value.startswith("http://") or value.startswith("https://") or value.startswith("urn:")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
async def _query_label(ws_url, flow_id, iri, user, collection, label_cache, debug=False):
|
|
|
|
|
"""
|
|
|
|
|
Query for the rdfs:label of an IRI.
|
|
|
|
|
Uses label_cache to avoid repeated queries.
|
|
|
|
|
Returns the label if found, otherwise returns the IRI.
|
|
|
|
|
"""
|
|
|
|
|
if not _is_iri(iri):
|
|
|
|
|
return iri
|
|
|
|
|
|
|
|
|
|
# Check cache first
|
|
|
|
|
if iri in label_cache:
|
|
|
|
|
return label_cache[iri]
|
|
|
|
|
|
|
|
|
|
request = {
|
|
|
|
|
"id": "label-request",
|
|
|
|
|
"service": "triples",
|
|
|
|
|
"flow": flow_id,
|
|
|
|
|
"request": {
|
|
|
|
|
"s": {"t": "i", "i": iri},
|
|
|
|
|
"p": {"t": "i", "i": RDFS_LABEL},
|
|
|
|
|
"user": user,
|
|
|
|
|
"collection": collection,
|
|
|
|
|
"limit": 1
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
label = iri # Default to IRI if no label found
|
|
|
|
|
try:
|
|
|
|
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=30) as websocket:
|
|
|
|
|
await websocket.send(json.dumps(request))
|
|
|
|
|
|
|
|
|
|
async for raw_message in websocket:
|
|
|
|
|
response = json.loads(raw_message)
|
|
|
|
|
|
|
|
|
|
if response.get("id") != "label-request":
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
if "error" in response:
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
if "response" in response:
|
|
|
|
|
resp = response["response"]
|
|
|
|
|
triple_list = resp.get("response", [])
|
|
|
|
|
if triple_list:
|
|
|
|
|
# Get the label value
|
|
|
|
|
o = triple_list[0].get("o", {})
|
|
|
|
|
label = o.get("v", o.get("i", iri))
|
|
|
|
|
|
|
|
|
|
if resp.get("complete") or response.get("complete"):
|
|
|
|
|
break
|
|
|
|
|
except Exception as e:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] exception querying label for {iri}: {e}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
# Cache the result
|
|
|
|
|
label_cache[iri] = label
|
|
|
|
|
return label
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
async def _resolve_edge_labels(ws_url, flow_id, edge_triple, user, collection, label_cache, debug=False):
|
|
|
|
|
"""
|
|
|
|
|
Resolve labels for all IRI components of an edge triple.
|
|
|
|
|
Returns (s_label, p_label, o_label).
|
|
|
|
|
"""
|
|
|
|
|
s = edge_triple.get("s", "?")
|
|
|
|
|
p = edge_triple.get("p", "?")
|
|
|
|
|
o = edge_triple.get("o", "?")
|
|
|
|
|
|
|
|
|
|
s_label = await _query_label(ws_url, flow_id, s, user, collection, label_cache, debug)
|
|
|
|
|
p_label = await _query_label(ws_url, flow_id, p, user, collection, label_cache, debug)
|
|
|
|
|
o_label = await _query_label(ws_url, flow_id, o, user, collection, label_cache, debug)
|
|
|
|
|
|
|
|
|
|
return s_label, p_label, o_label
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
async def _question_explainable(
|
|
|
|
|
url, flow_id, question, user, collection, entity_limit, triple_limit,
|
|
|
|
|
max_subgraph_size, max_path_length, token=None, debug=False
|
|
|
|
|
):
|
|
|
|
|
"""Execute graph RAG with explainability - shows provenance events with details"""
|
|
|
|
|
# Convert HTTP URL to WebSocket URL
|
|
|
|
|
if url.startswith("http://"):
|
|
|
|
|
ws_url = url.replace("http://", "ws://", 1)
|
|
|
|
|
elif url.startswith("https://"):
|
|
|
|
|
ws_url = url.replace("https://", "wss://", 1)
|
|
|
|
|
else:
|
|
|
|
|
ws_url = f"ws://{url}"
|
|
|
|
|
|
|
|
|
|
ws_url = f"{ws_url.rstrip('/')}/api/v1/socket"
|
|
|
|
|
if token:
|
|
|
|
|
ws_url = f"{ws_url}?token={token}"
|
|
|
|
|
|
|
|
|
|
# Cache for label lookups to avoid repeated queries
|
|
|
|
|
label_cache = {}
|
|
|
|
|
|
|
|
|
|
request = {
|
|
|
|
|
"id": "cli-request",
|
|
|
|
|
"service": "graph-rag",
|
|
|
|
|
"flow": flow_id,
|
|
|
|
|
"request": {
|
|
|
|
|
"query": question,
|
|
|
|
|
"user": user,
|
|
|
|
|
"collection": collection,
|
|
|
|
|
"entity-limit": entity_limit,
|
|
|
|
|
"triple-limit": triple_limit,
|
|
|
|
|
"max-subgraph-size": max_subgraph_size,
|
|
|
|
|
"max-path-length": max_path_length,
|
|
|
|
|
"streaming": True
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=300) as websocket:
|
|
|
|
|
await websocket.send(json.dumps(request))
|
|
|
|
|
|
|
|
|
|
async for raw_message in websocket:
|
|
|
|
|
response = json.loads(raw_message)
|
|
|
|
|
|
|
|
|
|
if response.get("id") != "cli-request":
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
if "error" in response:
|
|
|
|
|
print(f"\nError: {response['error']}", file=sys.stderr)
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
if "response" in response:
|
|
|
|
|
resp = response["response"]
|
|
|
|
|
|
|
|
|
|
# Check for errors in response
|
|
|
|
|
if "error" in resp and resp["error"]:
|
|
|
|
|
err = resp["error"]
|
|
|
|
|
print(f"\nError: {err.get('message', 'Unknown error')}", file=sys.stderr)
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
message_type = resp.get("message_type", "")
|
|
|
|
|
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] message_type={message_type}, keys={list(resp.keys())}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
if message_type == "explain":
|
|
|
|
|
# Display explain event with details
|
|
|
|
|
explain_id = resp.get("explain_id", "")
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
explain_graph = resp.get("explain_graph") # Named graph (e.g., urn:graph:retrieval)
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
if explain_id:
|
|
|
|
|
event_type = _get_event_type(explain_id)
|
|
|
|
|
print(f"\n [{event_type}] {explain_id}", file=sys.stderr)
|
|
|
|
|
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
# Query triples for this explain node (using named graph filter)
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
triples = await _query_triples(
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
ws_url, flow_id, explain_id, user, collection, graph=explain_graph, debug=debug
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
)
|
|
|
|
|
|
|
|
|
|
# Format and display details
|
|
|
|
|
details = _format_provenance_details(event_type, triples)
|
|
|
|
|
for line in details:
|
|
|
|
|
print(line, file=sys.stderr)
|
|
|
|
|
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
# For exploration events, resolve entity labels
|
|
|
|
|
if event_type == "exploration":
|
|
|
|
|
entity_iris = [o for s, p, o in triples if p == TG_ENTITY]
|
|
|
|
|
if entity_iris:
|
|
|
|
|
print(f" Seed entities: {len(entity_iris)}", file=sys.stderr)
|
|
|
|
|
for iri in entity_iris:
|
|
|
|
|
label = await _query_label(
|
|
|
|
|
ws_url, flow_id, iri, user, collection,
|
|
|
|
|
label_cache, debug=debug
|
|
|
|
|
)
|
|
|
|
|
print(f" - {label}", file=sys.stderr)
|
|
|
|
|
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
# For focus events, query each edge selection for details
|
|
|
|
|
if event_type == "focus":
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
for s, p, o in triples:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] triple: p={p}, o={o}, o_type={type(o).__name__}", file=sys.stderr)
|
|
|
|
|
if p == TG_SELECTED_EDGE and isinstance(o, str):
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] querying edge selection: {o}", file=sys.stderr)
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
# Query the edge selection entity (using named graph filter)
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
edge_triples = await _query_triples(
|
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data
Changed terminology:
- session -> question
- retrieval -> exploration
- selection -> focus
- answer -> synthesis
- uris.py: Renamed query_session_uri → question_uri,
retrieval_uri → exploration_uri, selection_uri → focus_uri,
answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
updated labels ("GraphRAG question", "Exploration", "Focus",
"Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
(Question, Exploration, Focus, Synthesis)
Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
named graph
- rag.py: Explainability triples stored in user's collection (not
separate collection) with named graph
Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph
CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
--show-graph to display graph column
Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
|
|
|
ws_url, flow_id, o, user, collection, graph=explain_graph, debug=debug
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
)
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] got {len(edge_triples)} edge triples", file=sys.stderr)
|
|
|
|
|
# Extract edge and reasoning
|
|
|
|
|
edge_triple = None # Store the actual triple for provenance lookup
|
|
|
|
|
reasoning = None
|
|
|
|
|
for es, ep, eo in edge_triples:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f" [debug] edge triple: ep={ep}, eo={eo}", file=sys.stderr)
|
|
|
|
|
if ep == TG_EDGE and isinstance(eo, dict):
|
|
|
|
|
# eo is a quoted triple dict
|
|
|
|
|
edge_triple = eo
|
|
|
|
|
elif ep == TG_REASONING:
|
|
|
|
|
reasoning = eo
|
|
|
|
|
if edge_triple:
|
|
|
|
|
# Resolve labels for edge components
|
|
|
|
|
s_label, p_label, o_label = await _resolve_edge_labels(
|
|
|
|
|
ws_url, flow_id, edge_triple, user, collection,
|
|
|
|
|
label_cache, debug=debug
|
|
|
|
|
)
|
|
|
|
|
print(f" Edge: ({s_label}, {p_label}, {o_label})", file=sys.stderr)
|
|
|
|
|
if reasoning:
|
|
|
|
|
r_short = reasoning[:100] + "..." if len(reasoning) > 100 else reasoning
|
|
|
|
|
print(f" Reason: {r_short}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
# Trace edge provenance in the user's collection (not explainability)
|
|
|
|
|
if edge_triple:
|
|
|
|
|
sources = await _query_edge_provenance(
|
|
|
|
|
ws_url, flow_id,
|
|
|
|
|
edge_triple.get("s", ""),
|
|
|
|
|
edge_triple.get("p", ""),
|
|
|
|
|
edge_triple.get("o", ""),
|
|
|
|
|
user, collection, # Use the query collection, not explainability
|
|
|
|
|
debug=debug
|
|
|
|
|
)
|
|
|
|
|
if sources:
|
|
|
|
|
for src in sources:
|
|
|
|
|
# Trace full chain from source to root document
|
|
|
|
|
chain = await _trace_provenance_chain(
|
|
|
|
|
ws_url, flow_id, src, user, collection,
|
|
|
|
|
label_cache, debug=debug
|
|
|
|
|
)
|
|
|
|
|
chain_str = _format_provenance_chain(chain)
|
|
|
|
|
print(f" Source: {chain_str}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
elif message_type == "chunk" or not message_type:
|
|
|
|
|
# Display response chunk
|
|
|
|
|
chunk = resp.get("response", "")
|
|
|
|
|
if chunk:
|
|
|
|
|
print(chunk, end="", flush=True)
|
|
|
|
|
|
|
|
|
|
# Check if session is complete
|
|
|
|
|
if resp.get("end_of_session"):
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
print() # Final newline
|
|
|
|
|
|
|
|
|
|
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
def _question_explainable_api(
|
|
|
|
|
url, flow_id, question_text, user, collection, entity_limit, triple_limit,
|
2026-03-21 20:06:29 +00:00
|
|
|
max_subgraph_size, max_path_length, edge_score_limit=30,
|
|
|
|
|
edge_limit=25, token=None, debug=False
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
):
|
|
|
|
|
"""Execute graph RAG with explainability using the new API classes."""
|
|
|
|
|
api = Api(url=url, token=token)
|
|
|
|
|
socket = api.socket()
|
|
|
|
|
flow = socket.flow(flow_id)
|
|
|
|
|
explain_client = ExplainabilityClient(flow, retry_delay=0.2, max_retries=10)
|
|
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
# Stream GraphRAG with explainability - process events as they arrive
|
|
|
|
|
for item in flow.graph_rag_explain(
|
|
|
|
|
query=question_text,
|
|
|
|
|
user=user,
|
|
|
|
|
collection=collection,
|
2026-03-21 20:06:29 +00:00
|
|
|
entity_limit=entity_limit,
|
|
|
|
|
triple_limit=triple_limit,
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
max_subgraph_size=max_subgraph_size,
|
2026-03-21 20:06:29 +00:00
|
|
|
max_path_length=max_path_length,
|
|
|
|
|
edge_score_limit=edge_score_limit,
|
|
|
|
|
edge_limit=edge_limit,
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
):
|
|
|
|
|
if isinstance(item, RAGChunk):
|
|
|
|
|
# Print response content
|
|
|
|
|
print(item.content, end="", flush=True)
|
|
|
|
|
|
|
|
|
|
elif isinstance(item, ProvenanceEvent):
|
Deliver explainability triples inline in retrieval response stream (#763)
Provenance triples are now included directly in explain messages from
GraphRAG, DocumentRAG, and Agent services, eliminating the need for
follow-up knowledge graph queries to retrieve explainability details.
Each explain message in the response stream now carries:
- explain_id: root URI for this provenance step (unchanged)
- explain_graph: named graph where triples are stored (unchanged)
- explain_triples: the actual provenance triples for this step (new)
Changes across the stack:
- Schema: added explain_triples field to GraphRagResponse,
DocumentRagResponse, and AgentResponse
- Services: all explain message call sites pass triples through
(graph_rag, document_rag, agent react, agent orchestrator)
- Translators: encode explain_triples via TripleTranslator for
gateway wire format
- Python SDK: ProvenanceEvent now includes parsed ExplainEntity
and raw triples; expanded event_type detection
- CLI: invoke_graph_rag, invoke_agent, invoke_document_rag use
inline entity when available, fall back to graph query
- Tech specs updated
Additional explainability test
2026-04-07 12:19:05 +01:00
|
|
|
# Use inline entity if available, otherwise fetch from graph
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
prov_id = item.explain_id
|
|
|
|
|
explain_graph = item.explain_graph or "urn:graph:retrieval"
|
|
|
|
|
|
Deliver explainability triples inline in retrieval response stream (#763)
Provenance triples are now included directly in explain messages from
GraphRAG, DocumentRAG, and Agent services, eliminating the need for
follow-up knowledge graph queries to retrieve explainability details.
Each explain message in the response stream now carries:
- explain_id: root URI for this provenance step (unchanged)
- explain_graph: named graph where triples are stored (unchanged)
- explain_triples: the actual provenance triples for this step (new)
Changes across the stack:
- Schema: added explain_triples field to GraphRagResponse,
DocumentRagResponse, and AgentResponse
- Services: all explain message call sites pass triples through
(graph_rag, document_rag, agent react, agent orchestrator)
- Translators: encode explain_triples via TripleTranslator for
gateway wire format
- Python SDK: ProvenanceEvent now includes parsed ExplainEntity
and raw triples; expanded event_type detection
- CLI: invoke_graph_rag, invoke_agent, invoke_document_rag use
inline entity when available, fall back to graph query
- Tech specs updated
Additional explainability test
2026-04-07 12:19:05 +01:00
|
|
|
entity = item.entity
|
|
|
|
|
if entity is None:
|
|
|
|
|
entity = explain_client.fetch_entity(
|
|
|
|
|
prov_id,
|
|
|
|
|
graph=explain_graph,
|
|
|
|
|
user=user,
|
|
|
|
|
collection=collection
|
|
|
|
|
)
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
|
|
|
|
|
if entity is None:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f"\n [warning] Could not fetch entity: {prov_id}", file=sys.stderr)
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
# Display based on entity type
|
|
|
|
|
if isinstance(entity, Question):
|
|
|
|
|
print(f"\n [question] {prov_id}", file=sys.stderr)
|
|
|
|
|
if entity.query:
|
|
|
|
|
print(f" Query: {entity.query}", file=sys.stderr)
|
|
|
|
|
if entity.timestamp:
|
|
|
|
|
print(f" Time: {entity.timestamp}", file=sys.stderr)
|
|
|
|
|
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
elif isinstance(entity, Grounding):
|
|
|
|
|
print(f"\n [grounding] {prov_id}", file=sys.stderr)
|
|
|
|
|
if entity.concepts:
|
|
|
|
|
print(f" Concepts: {len(entity.concepts)}", file=sys.stderr)
|
|
|
|
|
for concept in entity.concepts:
|
|
|
|
|
print(f" - {concept}", file=sys.stderr)
|
|
|
|
|
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
elif isinstance(entity, Exploration):
|
|
|
|
|
print(f"\n [exploration] {prov_id}", file=sys.stderr)
|
|
|
|
|
if entity.edge_count:
|
|
|
|
|
print(f" Edges explored: {entity.edge_count}", file=sys.stderr)
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
if entity.entities:
|
|
|
|
|
print(f" Seed entities: {len(entity.entities)}", file=sys.stderr)
|
|
|
|
|
for ent in entity.entities:
|
|
|
|
|
label = explain_client.resolve_label(ent, user, collection)
|
|
|
|
|
print(f" - {label}", file=sys.stderr)
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
|
|
|
|
|
elif isinstance(entity, Focus):
|
|
|
|
|
print(f"\n [focus] {prov_id}", file=sys.stderr)
|
|
|
|
|
if entity.selected_edge_uris:
|
|
|
|
|
print(f" Focused on {len(entity.selected_edge_uris)} edge(s)", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
# Fetch full focus with edge details
|
|
|
|
|
focus_full = explain_client.fetch_focus_with_edges(
|
|
|
|
|
prov_id,
|
|
|
|
|
graph=explain_graph,
|
|
|
|
|
user=user,
|
|
|
|
|
collection=collection
|
|
|
|
|
)
|
|
|
|
|
if focus_full and focus_full.edge_selections:
|
|
|
|
|
for edge_sel in focus_full.edge_selections:
|
|
|
|
|
if edge_sel.edge:
|
|
|
|
|
# Resolve labels for edge components
|
|
|
|
|
s_label, p_label, o_label = explain_client.resolve_edge_labels(
|
|
|
|
|
edge_sel.edge, user, collection
|
|
|
|
|
)
|
|
|
|
|
print(f" Edge: ({s_label}, {p_label}, {o_label})", file=sys.stderr)
|
|
|
|
|
if edge_sel.reasoning:
|
|
|
|
|
r_short = edge_sel.reasoning[:100] + "..." if len(edge_sel.reasoning) > 100 else edge_sel.reasoning
|
|
|
|
|
print(f" Reason: {r_short}", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
elif isinstance(entity, Synthesis):
|
|
|
|
|
print(f"\n [synthesis] {prov_id}", file=sys.stderr)
|
2026-03-16 14:47:37 +00:00
|
|
|
if entity.document:
|
|
|
|
|
print(f" Document: {entity.document}", file=sys.stderr)
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
|
|
|
|
|
else:
|
|
|
|
|
if debug:
|
|
|
|
|
print(f"\n [unknown] {prov_id} (type: {entity.entity_type})", file=sys.stderr)
|
|
|
|
|
|
|
|
|
|
print() # Final newline
|
|
|
|
|
|
|
|
|
|
finally:
|
|
|
|
|
socket.close()
|
|
|
|
|
|
|
|
|
|
|
2025-12-04 17:38:57 +00:00
|
|
|
def question(
|
2025-05-03 10:39:53 +01:00
|
|
|
url, flow_id, question, user, collection, entity_limit, triple_limit,
|
2026-03-21 20:06:29 +00:00
|
|
|
max_subgraph_size, max_path_length, edge_score_limit=50,
|
|
|
|
|
edge_limit=25, streaming=True, token=None,
|
Expose LLM token usage (in_token, out_token, model) across all
service layers
Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional
— None means "not available", distinguishing from a real zero count.
Key changes:
- Schema: Add in_token/out_token/model to TextCompletionResponse,
PromptResponse, GraphRagResponse, DocumentRagResponse,
AgentResponse
- TextCompletionClient: New TextCompletionResult return type. Split
into text_completion() (non-streaming) and
text_completion_stream() (streaming with per-chunk handler
callback)
- PromptClient: New PromptResult with response_type
(text/json/jsonl), typed fields (text/object/objects), and token
usage. All callers updated.
- RAG services: Accumulate token usage across all prompt calls
(extract-concepts, edge-scoring, edge-reasoning,
synthesis). Non-streaming path sends single combined response
instead of chunk + end_of_session.
- Agent orchestrator: UsageTracker accumulates tokens across
meta-router, pattern prompt calls, and react reasoning. Attached
to end_of_dialog.
- Translators: Encode token fields when not None (is not None, not truthy)
- Python SDK: RAG and text-completion methods return
TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
token fields (streaming)
- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
2026-04-11 20:22:35 +01:00
|
|
|
explainable=False, debug=False, show_usage=False
|
2025-03-13 00:38:18 +00:00
|
|
|
):
|
2025-01-02 19:49:22 +00:00
|
|
|
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
# Explainable mode uses the API to capture and process provenance events
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
if explainable:
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
_question_explainable_api(
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
url=url,
|
|
|
|
|
flow_id=flow_id,
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
question_text=question,
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
user=user,
|
|
|
|
|
collection=collection,
|
|
|
|
|
entity_limit=entity_limit,
|
|
|
|
|
triple_limit=triple_limit,
|
|
|
|
|
max_subgraph_size=max_subgraph_size,
|
|
|
|
|
max_path_length=max_path_length,
|
2026-03-21 20:06:29 +00:00
|
|
|
edge_score_limit=edge_score_limit,
|
|
|
|
|
edge_limit=edge_limit,
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
token=token,
|
|
|
|
|
debug=debug
|
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines
Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.
Explainability API:
- New explainability.py module with entity classes (Question,
Exploration, Focus, Synthesis, Analysis, Conclusion) and
ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
fetching
- Content fetching from librarian with retry logic
CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
explain_id
- tg-invoke-document-rag -x/--explainable flag returns
explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types
Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
references
- New predicates: tg:thoughtDocument, tg:observationDocument
DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references
Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
observation_document_id
Update tests.
2026-03-12 21:40:09 +00:00
|
|
|
)
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
return
|
|
|
|
|
|
2025-12-04 17:38:57 +00:00
|
|
|
# Create API client
|
|
|
|
|
api = Api(url=url, token=token)
|
|
|
|
|
|
|
|
|
|
if streaming:
|
|
|
|
|
# Use socket client for streaming
|
|
|
|
|
socket = api.socket()
|
|
|
|
|
flow = socket.flow(flow_id)
|
|
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
response = flow.graph_rag(
|
2025-12-17 21:40:43 +00:00
|
|
|
query=question,
|
2025-12-04 17:38:57 +00:00
|
|
|
user=user,
|
|
|
|
|
collection=collection,
|
|
|
|
|
entity_limit=entity_limit,
|
|
|
|
|
triple_limit=triple_limit,
|
|
|
|
|
max_subgraph_size=max_subgraph_size,
|
|
|
|
|
max_path_length=max_path_length,
|
2026-03-21 20:06:29 +00:00
|
|
|
edge_score_limit=edge_score_limit,
|
|
|
|
|
edge_limit=edge_limit,
|
2025-12-04 17:38:57 +00:00
|
|
|
streaming=True
|
|
|
|
|
)
|
2024-09-05 16:45:22 +01:00
|
|
|
|
2025-12-04 17:38:57 +00:00
|
|
|
# Stream output
|
Expose LLM token usage (in_token, out_token, model) across all
service layers
Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional
— None means "not available", distinguishing from a real zero count.
Key changes:
- Schema: Add in_token/out_token/model to TextCompletionResponse,
PromptResponse, GraphRagResponse, DocumentRagResponse,
AgentResponse
- TextCompletionClient: New TextCompletionResult return type. Split
into text_completion() (non-streaming) and
text_completion_stream() (streaming with per-chunk handler
callback)
- PromptClient: New PromptResult with response_type
(text/json/jsonl), typed fields (text/object/objects), and token
usage. All callers updated.
- RAG services: Accumulate token usage across all prompt calls
(extract-concepts, edge-scoring, edge-reasoning,
synthesis). Non-streaming path sends single combined response
instead of chunk + end_of_session.
- Agent orchestrator: UsageTracker accumulates tokens across
meta-router, pattern prompt calls, and react reasoning. Attached
to end_of_dialog.
- Translators: Encode token fields when not None (is not None, not truthy)
- Python SDK: RAG and text-completion methods return
TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
token fields (streaming)
- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
2026-04-11 20:22:35 +01:00
|
|
|
last_chunk = None
|
2025-12-04 17:38:57 +00:00
|
|
|
for chunk in response:
|
Expose LLM token usage (in_token, out_token, model) across all
service layers
Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional
— None means "not available", distinguishing from a real zero count.
Key changes:
- Schema: Add in_token/out_token/model to TextCompletionResponse,
PromptResponse, GraphRagResponse, DocumentRagResponse,
AgentResponse
- TextCompletionClient: New TextCompletionResult return type. Split
into text_completion() (non-streaming) and
text_completion_stream() (streaming with per-chunk handler
callback)
- PromptClient: New PromptResult with response_type
(text/json/jsonl), typed fields (text/object/objects), and token
usage. All callers updated.
- RAG services: Accumulate token usage across all prompt calls
(extract-concepts, edge-scoring, edge-reasoning,
synthesis). Non-streaming path sends single combined response
instead of chunk + end_of_session.
- Agent orchestrator: UsageTracker accumulates tokens across
meta-router, pattern prompt calls, and react reasoning. Attached
to end_of_dialog.
- Translators: Encode token fields when not None (is not None, not truthy)
- Python SDK: RAG and text-completion methods return
TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
token fields (streaming)
- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
2026-04-11 20:22:35 +01:00
|
|
|
print(chunk.content, end="", flush=True)
|
|
|
|
|
last_chunk = chunk
|
2025-12-04 17:38:57 +00:00
|
|
|
print() # Final newline
|
|
|
|
|
|
Expose LLM token usage (in_token, out_token, model) across all
service layers
Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional
— None means "not available", distinguishing from a real zero count.
Key changes:
- Schema: Add in_token/out_token/model to TextCompletionResponse,
PromptResponse, GraphRagResponse, DocumentRagResponse,
AgentResponse
- TextCompletionClient: New TextCompletionResult return type. Split
into text_completion() (non-streaming) and
text_completion_stream() (streaming with per-chunk handler
callback)
- PromptClient: New PromptResult with response_type
(text/json/jsonl), typed fields (text/object/objects), and token
usage. All callers updated.
- RAG services: Accumulate token usage across all prompt calls
(extract-concepts, edge-scoring, edge-reasoning,
synthesis). Non-streaming path sends single combined response
instead of chunk + end_of_session.
- Agent orchestrator: UsageTracker accumulates tokens across
meta-router, pattern prompt calls, and react reasoning. Attached
to end_of_dialog.
- Translators: Encode token fields when not None (is not None, not truthy)
- Python SDK: RAG and text-completion methods return
TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
token fields (streaming)
- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
2026-04-11 20:22:35 +01:00
|
|
|
if show_usage and last_chunk:
|
|
|
|
|
print(
|
|
|
|
|
f"Input tokens: {last_chunk.in_token} "
|
|
|
|
|
f"Output tokens: {last_chunk.out_token} "
|
|
|
|
|
f"Model: {last_chunk.model}",
|
|
|
|
|
file=sys.stderr,
|
|
|
|
|
)
|
|
|
|
|
|
2025-12-04 17:38:57 +00:00
|
|
|
finally:
|
|
|
|
|
socket.close()
|
|
|
|
|
else:
|
|
|
|
|
# Use REST API for non-streaming
|
|
|
|
|
flow = api.flow().id(flow_id)
|
Expose LLM token usage (in_token, out_token, model) across all
service layers
Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional
— None means "not available", distinguishing from a real zero count.
Key changes:
- Schema: Add in_token/out_token/model to TextCompletionResponse,
PromptResponse, GraphRagResponse, DocumentRagResponse,
AgentResponse
- TextCompletionClient: New TextCompletionResult return type. Split
into text_completion() (non-streaming) and
text_completion_stream() (streaming with per-chunk handler
callback)
- PromptClient: New PromptResult with response_type
(text/json/jsonl), typed fields (text/object/objects), and token
usage. All callers updated.
- RAG services: Accumulate token usage across all prompt calls
(extract-concepts, edge-scoring, edge-reasoning,
synthesis). Non-streaming path sends single combined response
instead of chunk + end_of_session.
- Agent orchestrator: UsageTracker accumulates tokens across
meta-router, pattern prompt calls, and react reasoning. Attached
to end_of_dialog.
- Translators: Encode token fields when not None (is not None, not truthy)
- Python SDK: RAG and text-completion methods return
TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
token fields (streaming)
- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
2026-04-11 20:22:35 +01:00
|
|
|
result = flow.graph_rag(
|
2025-12-17 21:40:43 +00:00
|
|
|
query=question,
|
2025-12-04 17:38:57 +00:00
|
|
|
user=user,
|
|
|
|
|
collection=collection,
|
|
|
|
|
entity_limit=entity_limit,
|
|
|
|
|
triple_limit=triple_limit,
|
|
|
|
|
max_subgraph_size=max_subgraph_size,
|
2026-03-21 20:06:29 +00:00
|
|
|
max_path_length=max_path_length,
|
|
|
|
|
edge_score_limit=edge_score_limit,
|
|
|
|
|
edge_limit=edge_limit,
|
2025-12-04 17:38:57 +00:00
|
|
|
)
|
Expose LLM token usage (in_token, out_token, model) across all
service layers
Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional
— None means "not available", distinguishing from a real zero count.
Key changes:
- Schema: Add in_token/out_token/model to TextCompletionResponse,
PromptResponse, GraphRagResponse, DocumentRagResponse,
AgentResponse
- TextCompletionClient: New TextCompletionResult return type. Split
into text_completion() (non-streaming) and
text_completion_stream() (streaming with per-chunk handler
callback)
- PromptClient: New PromptResult with response_type
(text/json/jsonl), typed fields (text/object/objects), and token
usage. All callers updated.
- RAG services: Accumulate token usage across all prompt calls
(extract-concepts, edge-scoring, edge-reasoning,
synthesis). Non-streaming path sends single combined response
instead of chunk + end_of_session.
- Agent orchestrator: UsageTracker accumulates tokens across
meta-router, pattern prompt calls, and react reasoning. Attached
to end_of_dialog.
- Translators: Encode token fields when not None (is not None, not truthy)
- Python SDK: RAG and text-completion methods return
TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
token fields (streaming)
- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
2026-04-11 20:22:35 +01:00
|
|
|
print(result.text)
|
|
|
|
|
|
|
|
|
|
if show_usage:
|
|
|
|
|
print(
|
|
|
|
|
f"Input tokens: {result.in_token} "
|
|
|
|
|
f"Output tokens: {result.out_token} "
|
|
|
|
|
f"Model: {result.model}",
|
|
|
|
|
file=sys.stderr,
|
|
|
|
|
)
|
2024-09-05 16:45:22 +01:00
|
|
|
|
|
|
|
|
def main():
|
|
|
|
|
|
|
|
|
|
parser = argparse.ArgumentParser(
|
2025-01-02 19:49:22 +00:00
|
|
|
prog='tg-invoke-graph-rag',
|
2024-09-05 16:45:22 +01:00
|
|
|
description=__doc__,
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
parser.add_argument(
|
2025-01-02 19:49:22 +00:00
|
|
|
'-u', '--url',
|
|
|
|
|
default=default_url,
|
|
|
|
|
help=f'API URL (default: {default_url})',
|
2024-09-05 16:45:22 +01:00
|
|
|
)
|
|
|
|
|
|
2025-12-04 17:38:57 +00:00
|
|
|
parser.add_argument(
|
|
|
|
|
'-t', '--token',
|
|
|
|
|
default=default_token,
|
|
|
|
|
help='Authentication token (default: $TRUSTGRAPH_TOKEN)',
|
|
|
|
|
)
|
|
|
|
|
|
2025-05-03 10:39:53 +01:00
|
|
|
parser.add_argument(
|
|
|
|
|
'-f', '--flow-id',
|
2025-05-24 12:27:56 +01:00
|
|
|
default="default",
|
|
|
|
|
help=f'Flow ID (default: default)'
|
2025-05-03 10:39:53 +01:00
|
|
|
)
|
|
|
|
|
|
2024-09-05 16:45:22 +01:00
|
|
|
parser.add_argument(
|
2025-01-02 19:49:22 +00:00
|
|
|
'-q', '--question',
|
2024-09-05 16:45:22 +01:00
|
|
|
required=True,
|
2025-01-02 19:49:22 +00:00
|
|
|
help=f'Question to answer',
|
2024-09-05 16:45:22 +01:00
|
|
|
)
|
|
|
|
|
|
2024-10-02 18:14:29 +01:00
|
|
|
parser.add_argument(
|
2025-01-02 19:49:22 +00:00
|
|
|
'-U', '--user',
|
2024-10-02 18:14:29 +01:00
|
|
|
default=default_user,
|
|
|
|
|
help=f'User ID (default: {default_user})'
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
parser.add_argument(
|
2025-01-02 19:49:22 +00:00
|
|
|
'-C', '--collection',
|
2024-10-02 18:14:29 +01:00
|
|
|
default=default_collection,
|
|
|
|
|
help=f'Collection ID (default: {default_collection})'
|
|
|
|
|
)
|
|
|
|
|
|
2025-03-13 00:38:18 +00:00
|
|
|
parser.add_argument(
|
|
|
|
|
'-e', '--entity-limit',
|
2025-12-04 17:38:57 +00:00
|
|
|
type=int,
|
2025-03-13 00:38:18 +00:00
|
|
|
default=default_entity_limit,
|
|
|
|
|
help=f'Entity limit (default: {default_entity_limit})'
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
parser.add_argument(
|
2025-12-04 17:38:57 +00:00
|
|
|
'--triple-limit',
|
|
|
|
|
type=int,
|
2025-03-13 00:38:18 +00:00
|
|
|
default=default_triple_limit,
|
|
|
|
|
help=f'Triple limit (default: {default_triple_limit})'
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
parser.add_argument(
|
|
|
|
|
'-s', '--max-subgraph-size',
|
2025-12-04 17:38:57 +00:00
|
|
|
type=int,
|
2025-03-13 00:38:18 +00:00
|
|
|
default=default_max_subgraph_size,
|
|
|
|
|
help=f'Max subgraph size (default: {default_max_subgraph_size})'
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
parser.add_argument(
|
|
|
|
|
'-p', '--max-path-length',
|
2025-12-04 17:38:57 +00:00
|
|
|
type=int,
|
2025-03-13 00:38:18 +00:00
|
|
|
default=default_max_path_length,
|
|
|
|
|
help=f'Max path length (default: {default_max_path_length})'
|
|
|
|
|
)
|
|
|
|
|
|
2026-03-21 20:06:29 +00:00
|
|
|
parser.add_argument(
|
|
|
|
|
'--edge-score-limit',
|
|
|
|
|
type=int,
|
|
|
|
|
default=default_edge_score_limit,
|
|
|
|
|
help=f'Semantic pre-filter limit before LLM scoring (default: {default_edge_score_limit})'
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
parser.add_argument(
|
|
|
|
|
'--edge-limit',
|
|
|
|
|
type=int,
|
|
|
|
|
default=default_edge_limit,
|
|
|
|
|
help=f'Max edges after LLM scoring (default: {default_edge_limit})'
|
|
|
|
|
)
|
|
|
|
|
|
2025-11-26 19:47:39 +00:00
|
|
|
parser.add_argument(
|
|
|
|
|
'--no-streaming',
|
|
|
|
|
action='store_true',
|
|
|
|
|
help='Disable streaming (use non-streaming mode)'
|
|
|
|
|
)
|
|
|
|
|
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
parser.add_argument(
|
|
|
|
|
'-x', '--explainable',
|
|
|
|
|
action='store_true',
|
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O
GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
kg-edge-scoring,
kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
provenance/explainability edges
- Add source document edges to knowledge graph
DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
pattern:
Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication
Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
|
|
|
help='Show provenance events: Question, Grounding, Exploration, Focus, Synthesis (implies streaming)'
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
)
|
|
|
|
|
|
|
|
|
|
parser.add_argument(
|
|
|
|
|
'--debug',
|
|
|
|
|
action='store_true',
|
|
|
|
|
help='Show debug output for troubleshooting'
|
|
|
|
|
)
|
|
|
|
|
|
Expose LLM token usage (in_token, out_token, model) across all
service layers
Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional
— None means "not available", distinguishing from a real zero count.
Key changes:
- Schema: Add in_token/out_token/model to TextCompletionResponse,
PromptResponse, GraphRagResponse, DocumentRagResponse,
AgentResponse
- TextCompletionClient: New TextCompletionResult return type. Split
into text_completion() (non-streaming) and
text_completion_stream() (streaming with per-chunk handler
callback)
- PromptClient: New PromptResult with response_type
(text/json/jsonl), typed fields (text/object/objects), and token
usage. All callers updated.
- RAG services: Accumulate token usage across all prompt calls
(extract-concepts, edge-scoring, edge-reasoning,
synthesis). Non-streaming path sends single combined response
instead of chunk + end_of_session.
- Agent orchestrator: UsageTracker accumulates tokens across
meta-router, pattern prompt calls, and react reasoning. Attached
to end_of_dialog.
- Translators: Encode token fields when not None (is not None, not truthy)
- Python SDK: RAG and text-completion methods return
TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
token fields (streaming)
- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
2026-04-11 20:22:35 +01:00
|
|
|
parser.add_argument(
|
|
|
|
|
'--show-usage',
|
|
|
|
|
action='store_true',
|
|
|
|
|
help='Show token usage and model on stderr'
|
|
|
|
|
)
|
|
|
|
|
|
2024-09-05 16:45:22 +01:00
|
|
|
args = parser.parse_args()
|
|
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
|
2025-12-04 17:38:57 +00:00
|
|
|
question(
|
|
|
|
|
url=args.url,
|
|
|
|
|
flow_id=args.flow_id,
|
|
|
|
|
question=args.question,
|
|
|
|
|
user=args.user,
|
|
|
|
|
collection=args.collection,
|
|
|
|
|
entity_limit=args.entity_limit,
|
|
|
|
|
triple_limit=args.triple_limit,
|
|
|
|
|
max_subgraph_size=args.max_subgraph_size,
|
|
|
|
|
max_path_length=args.max_path_length,
|
2026-03-21 20:06:29 +00:00
|
|
|
edge_score_limit=args.edge_score_limit,
|
|
|
|
|
edge_limit=args.edge_limit,
|
2025-12-04 17:38:57 +00:00
|
|
|
streaming=not args.no_streaming,
|
|
|
|
|
token=args.token,
|
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
|
|
|
explainable=args.explainable,
|
|
|
|
|
debug=args.debug,
|
Expose LLM token usage (in_token, out_token, model) across all
service layers
Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional
— None means "not available", distinguishing from a real zero count.
Key changes:
- Schema: Add in_token/out_token/model to TextCompletionResponse,
PromptResponse, GraphRagResponse, DocumentRagResponse,
AgentResponse
- TextCompletionClient: New TextCompletionResult return type. Split
into text_completion() (non-streaming) and
text_completion_stream() (streaming with per-chunk handler
callback)
- PromptClient: New PromptResult with response_type
(text/json/jsonl), typed fields (text/object/objects), and token
usage. All callers updated.
- RAG services: Accumulate token usage across all prompt calls
(extract-concepts, edge-scoring, edge-reasoning,
synthesis). Non-streaming path sends single combined response
instead of chunk + end_of_session.
- Agent orchestrator: UsageTracker accumulates tokens across
meta-router, pattern prompt calls, and react reasoning. Attached
to end_of_dialog.
- Translators: Encode token fields when not None (is not None, not truthy)
- Python SDK: RAG and text-completion methods return
TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
token fields (streaming)
- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
2026-04-11 20:22:35 +01:00
|
|
|
show_usage=args.show_usage,
|
2025-12-04 17:38:57 +00:00
|
|
|
)
|
2024-09-05 16:45:22 +01:00
|
|
|
|
|
|
|
|
except Exception as e:
|
|
|
|
|
|
|
|
|
|
print("Exception:", e, flush=True)
|
|
|
|
|
|
2025-07-23 21:22:08 +01:00
|
|
|
if __name__ == "__main__":
|
2025-12-04 17:38:57 +00:00
|
|
|
main()
|