cybermaggedon
7a6197d8c3
GraphRAG Query-Time Explainability ( #677 )
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.
Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"
GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
reasoning)
4. Answer - reference to synthesized response
Events stream via explain_callback during query(), enabling
real-time UX.
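The sketch below shows a consumer receiving these events in stage order. It assumes the callback receives an event kind and an identifier. `fake_query`, the `(kind, explain_id)` argument shape and the `urn:example:` IDs are all hypothetical; they are not the service's actual signature.

```python
from typing import Callable, List, Tuple

# Hypothetical event kinds, in the order listed above.
EVENT_ORDER = ["session", "retrieval", "selection", "answer"]


def fake_query(question: str, explain_callback: Callable[[str, str], None]) -> str:
    """Stand-in for a GraphRAG query() that emits explain events while it runs."""
    for kind in EVENT_ORDER:
        # Each event carries an ID the client can resolve later (hypothetical URN).
        explain_callback(kind, f"urn:example:{kind}")
    return f"answer to: {question}"


received: List[Tuple[str, str]] = []


def on_explain(kind: str, explain_id: str) -> None:
    # Real-time UX: a client would render each stage as soon as it arrives.
    received.append((kind, explain_id))


answer = fake_query("Who wrote X?", on_explain)
```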
- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service
- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
(uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
?stmt tg:reifies <<s p o>>
- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
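The source tracing and label caching described above can be sketched as follows. In-memory dictionaries stand in for the triple-store queries. `DERIVED_FROM`, `LABELS`, `LabelResolver` and `trace_source` are hypothetical names, and `max_depth` is an added guard against cycles.

```python
from typing import Dict, List

# Hypothetical stand-ins for prov:wasDerivedFrom and rdfs:label lookups.
DERIVED_FROM: Dict[str, str] = {
    "urn:chunk:1": "urn:page:2",
    "urn:page:2": "urn:doc:A",
}
LABELS: Dict[str, str] = {
    "urn:chunk:1": "Chunk 1",
    "urn:page:2": "Page 2",
    "urn:doc:A": "Document Title",
}


class LabelResolver:
    """Caches label lookups so each URI is resolved at most once."""

    def __init__(self, labels: Dict[str, str]) -> None:
        self._labels = labels
        self._cache: Dict[str, str] = {}
        self.queries = 0  # counts simulated backend lookups

    def label(self, uri: str) -> str:
        if uri not in self._cache:
            self.queries += 1
            # Fall back to the URI itself when no label is known.
            self._cache[uri] = self._labels.get(uri, uri)
        return self._cache[uri]


def trace_source(uri: str, resolver: LabelResolver, max_depth: int = 10) -> str:
    """Follow derivation links from a chunk up to its root document."""
    chain: List[str] = [resolver.label(uri)]
    current = uri
    for _ in range(max_depth):
        parent = DERIVED_FROM.get(current)
        if parent is None:
            break  # reached the root document
        chain.append(resolver.label(parent))
        current = parent
    return "Source: " + " → ".join(chain)
```

With these sample dictionaries, `trace_source("urn:chunk:1", LabelResolver(LABELS))` returns `Source: Chunk 1 → Page 2 → Document Title`. A second call on the same resolver issues no further label lookups.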
GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
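The sketch below models the response fields above as a dataclass and shows a client separating answer text from explainability IDs. The `response` text field and the `consume` helper are assumptions added for illustration; they are not part of the listed schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GraphRagResponse:
    """Mirrors the fields listed above; `response` is an assumed text field."""
    message_type: str                         # "chunk" or "explain"
    explain_id: Optional[str] = None
    explain_collection: Optional[str] = None
    end_of_session: bool = False
    response: Optional[str] = None            # assumption: carries chunk text


def consume(messages: List[GraphRagResponse]) -> Tuple[str, List[str]]:
    """Separate streamed answer text from explainability event IDs."""
    text_parts: List[str] = []
    explain_ids: List[str] = []
    for msg in messages:
        if msg.message_type == "explain" and msg.explain_id:
            explain_ids.append(msg.explain_id)
        elif msg.message_type == "chunk" and msg.response:
            text_parts.append(msg.response)
        if msg.end_of_session:
            break  # nothing after this belongs to the session
    return "".join(text_parts), explain_ids
```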
trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()
trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session
trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission
trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching
trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
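The URI preservation in graph_rag.py can be sketched as follows. Labels go to the LLM, and a side map recovers the original URIs for the explainability data. `label_graph` mirrors the `(labeled_edges, uri_map)` return shape described for `get_labelgraph()`. However, `label_graph` itself and the hashing scheme inside this `edge_id` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def edge_id(s: str, p: str, o: str) -> str:
    """Hypothetical stable ID derived from an edge's three labels."""
    return hashlib.sha256("\n".join((s, p, o)).encode("utf-8")).hexdigest()[:16]


def label_graph(
    uri_edges: List[Triple], labels: Dict[str, str]
) -> Tuple[List[Triple], Dict[str, Triple]]:
    """Return (labeled_edges, uri_map), keeping the original URIs per edge."""
    labeled_edges: List[Triple] = []
    uri_map: Dict[str, Triple] = {}
    for s, p, o in uri_edges:
        # Fall back to the raw term when no label is known.
        ls, lp, lo = (labels.get(t, t) for t in (s, p, o))
        labeled_edges.append((ls, lp, lo))
        uri_map[edge_id(ls, lp, lo)] = (s, p, o)
    return labeled_edges, uri_map
```

In this sketch, two distinct edges with identical labels would share a key, and the later edge would overwrite the earlier one.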
2026-03-10 10:00:01 +00:00
cybermaggedon
d2d71f859d
Feature/streaming triples ( #676 )
* Streaming triples
* Also GraphRAG service uses this
* Updated tests
2026-03-09 15:46:33 +00:00
cybermaggedon
f2ae0e8623
Embeddings API scores ( #671 )
- Put scores in all responses
- Remove unused 'middle' vector layer: a vector of texts now maps to a vector of embeddings, one embedding vector per text
2026-03-09 10:53:44 +00:00
cybermaggedon
4fa7cc7d7c
Fix/embeddings integration 2 ( #670 )
2026-03-08 19:42:26 +00:00
cybermaggedon
3bf8a65409
Fix tests ( #666 )
2026-03-07 23:38:09 +00:00
cybermaggedon
1809c1f56d
Structured data 2 ( #645 )
* Structured data refactor - multi-index tables, remove need for manual mods to the Cassandra tables
* Tech spec updated to track implementation
2026-02-23 15:56:29 +00:00
cybermaggedon
d886358be6
Entity & triple batch size limits ( #635 )
* Entities and triples are emitted in batches with a batch limit to manage
overloading downstream.
* Update tests
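The batch limit can be sketched as a generic generator. `batched` and its `limit` parameter are illustrative names, not TrustGraph's API. Python 3.12 also provides `itertools.batched`, which yields tuples rather than lists.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], limit: int) -> Iterator[List[T]]:
    """Yield lists of at most `limit` items; never yields an empty batch."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == limit:
            yield batch  # emit a full batch downstream
            batch = []
    if batch:
        yield batch  # trailing partial batch, if any
```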
2026-02-16 17:38:03 +00:00
cybermaggedon
8574861196
Protect null embeddings - v2.0 ( #627 )
* Don't emit graph embeddings if there aren't any.
* Don't store graph embeddings in a knowledge store if there's an empty list.
* Translate Cassandra's 'null', which represents an empty list, back into an
empty list, which is what the surrounding code expects (and what was stored
in the first place).
* Avoid emitting empty embedding lists
* Avoid outputting empty triple lists
* Fix tests
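The sketch below shows the null/empty-list handling. It assumes the Cassandra Python driver returns `None` for an empty list column. `from_cassandra`, `store_embeddings` and the dict-backed store are hypothetical stand-ins.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def from_cassandra(value: Optional[Sequence[Vector]]) -> List[Vector]:
    """Map Cassandra's null (how an empty list comes back) to []."""
    return [] if value is None else list(value)


def store_embeddings(store: Dict[str, List[Vector]], key: str,
                     vectors: Sequence[Vector]) -> bool:
    """Write only non-empty embedding lists; report whether a write happened."""
    if not vectors:
        return False  # don't emit or store an empty list
    store[key] = list(vectors)
    return True
```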
2026-02-09 14:57:36 +00:00
cybermaggedon
cf0daedefa
Changed schema for Value -> Term, major breaking change ( #622 )
* Changed schema for Value -> Term, major breaking change
* Following the schema change, propagated Value -> Term through all processing
* Updated Cassandra for g, p, s, o index patterns (7 indexes)
* Reviewed and updated all tests
* Neo4j, Memgraph and FalkorDB remain broken; will revisit once things settle down
2026-01-27 13:48:08 +00:00
cybermaggedon
e214eb4e02
Feature/prompts jsonl ( #619 )
* Tech spec
* JSONL implementation complete
* Updated prompt client users
* Fix tests
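The sketch below shows tolerant JSONL parsing. It assumes the LLM returns one JSON object per line, such as the id + reasoning records used for edge selection. The format has a useful property: a malformed or truncated line can be skipped without discarding the rest. `parse_jsonl` is a hypothetical helper, not the prompt client's API.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_jsonl(text: str) -> Tuple[List[Dict[str, Any]], int]:
    """Parse one JSON object per line; return (records, number of bad lines)."""
    records: List[Dict[str, Any]] = []
    errors = 0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("```"):
            continue  # ignore blank lines and markdown fences
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            errors += 1  # e.g. a line truncated mid-object
            continue
        if isinstance(obj, dict):
            records.append(obj)
        else:
            errors += 1  # valid JSON, but not an object
    return records, errors
```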
2026-01-26 17:38:00 +00:00
cybermaggedon
16a5cf966a
Fix agent streaming tool failure ( #602 )
* Fix agent streaming linkage
* Update tests
2026-01-06 23:00:50 +00:00
cybermaggedon
f79d0603f7
Update to add streaming tests ( #600 )
2026-01-06 21:48:05 +00:00
cybermaggedon
ae13190093
Address legacy issues in storage management ( #595 )
* Removed legacy storage management cruft. Tidied tech specs.
* Fix deletion of last collection
* Storage processor ignores queued data belonging to a deleted collection
* Updated tests
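The sketch below shows how a storage processor might drop queued messages for a deleted collection. It assumes the processor tracks known `(user, collection)` pairs, for example from config. `Message` and `StorageProcessor` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Message:
    user: str
    collection: str
    payload: str


class StorageProcessor:
    """Hypothetical processor that ignores data for unknown collections."""

    def __init__(self, known: Iterable[Tuple[str, str]]) -> None:
        self.known = set(known)  # (user, collection) pairs that still exist
        self.stored: List[str] = []
        self.dropped = 0

    def delete_collection(self, user: str, collection: str) -> None:
        self.known.discard((user, collection))

    def handle(self, msg: Message) -> None:
        if (msg.user, msg.collection) not in self.known:
            self.dropped += 1  # collection was deleted while data was queued
            return
        self.stored.append(msg.payload)
```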
2026-01-05 13:45:14 +00:00
cybermaggedon
5304f96fe6
Fix tests ( #593 )
* Fix unit/integration/contract tests which were broken by messaging fabric work
2025-12-19 08:53:21 +00:00
cybermaggedon
7d07f802a8
Basic multitenant support ( #583 )
* Tech spec
* Address multi-tenant queue option problems in CLI
* Modified collection service to use config
* Changed storage management to use the config service definition
2025-12-05 21:45:30 +00:00
cybermaggedon
72cb1c98e0
Fix tests ( #571 )
2025-11-28 16:37:01 +00:00
cybermaggedon
e24de6081f
Fix streaming agent interactions ( #570 )
* Fix observer, thought streaming
* Fix end of message indicators
* Remove double-delivery of answer
2025-11-28 16:25:57 +00:00
cybermaggedon
1948edaa50
Streaming rag responses ( #568 )
* Tech spec for streaming RAG
* Support for streaming Graph/Doc RAG
2025-11-26 19:47:39 +00:00
cybermaggedon
b1cc724f7d
Streaming LLM part 2 ( #567 )
* Updates for agent API with streaming support
* Added tg-dump-queues tool to dump Pulsar queues to a log
* Updated tg-invoke-agent, incremental output
* Queue dumper CLI - might be useful for debugging
* Updating for tests
2025-11-26 15:16:17 +00:00
cybermaggedon
310a2deb06
Feature/streaming llm phase 1 ( #566 )
* Tidy up duplicate tech specs in doc directory
* Streaming LLM text-completion service tech spec.
* text-completion and prompt interfaces
* streaming change applied to all LLMs, so far tested with VertexAI
* Skip Pinecone unit tests because an upstream module issue is affecting them; tests are passing again
* Added agent streaming; not yet working, and tests are broken
2025-11-26 09:59:10 +00:00
cybermaggedon
51107008fd
master -> 1.5 (README updates) ( #552 )
2025-10-11 11:46:03 +01:00
cybermaggedon
52b133fc86
Collection delete pt. 3 ( #542 )
* Fixing collection deletion
* Fixing collection management param error
* Always test for collections
* Add Cassandra collection table
* Updated tech spec for explicit creation/deletion
* Remove implicit collection creation
* Fix up collection tracking in all processors
2025-09-30 16:02:33 +01:00
cybermaggedon
43cfcb18a0
More LLM param test coverage ( #535 )
* More LLM tests
* Fixing tests
2025-09-26 01:00:30 +01:00
cybermaggedon
b0a3716b0e
Tests are failing ( #534 )
* Fix tests, update to new model parameter usage
2025-09-25 21:32:19 +01:00
cybermaggedon
13ff7d765d
Collection management ( #520 )
* Tech spec
* Refactored Cassandra knowledge graph to use a single table
* Collection management, librarian services to manage metadata and collection deletion
2025-09-18 15:57:52 +01:00
cybermaggedon
f22bf13aa6
Extend use of user + collection fields ( #503 )
* Collection+user fields in structured query
* User/collection in structured query & agent
2025-09-08 18:28:38 +01:00
cybermaggedon
5537fac731
Structured data, minor features ( #500 )
- Sorted out confusing --auto mode in tg-load-structured-data
- Fixed tests & added CLI tests
2025-09-05 17:25:12 +01:00
cybermaggedon
0b7620bc04
Object batching ( #499 )
* Object batching
* Update tests
2025-09-05 15:59:06 +01:00
cybermaggedon
50c37407c5
Fix/sys integration issues ( #494 )
* Fix integration issues
* Fix query defaults
* Fix tests
2025-09-05 08:38:15 +01:00
cybermaggedon
ed0e02791d
Feature/structured query tool integration ( #493 )
* Agent integration to structured query
* Update tests
2025-09-04 16:23:43 +01:00
cybermaggedon
a6d9f5e849
Structured query support ( #492 )
* Tweak the structured query schema
* Structured query service
* Gateway support for nlp-query and structured-query
* API support
* Added CLI
* Update tests
* More tests
2025-09-04 16:06:18 +01:00
cybermaggedon
85e669c763
Fixing more Cassandra consistency issues ( #488 )
* Fixing more Cassandra work
* Fix tests
2025-09-04 00:58:11 +01:00
cybermaggedon
ccaec88a72
Feature/consolidate cassandra config ( #483 )
* Consolidated Cassandra parameters
* New Cassandra configuration helper
* Implemented Cassandra config refactor
* New tests
2025-09-03 23:41:22 +01:00
cybermaggedon
e74eb5d1ff
Feature/tool group ( #484 )
* Tech spec for tool group
* Partial tool group implementation
* Tool group tests
2025-09-03 23:39:49 +01:00
cybermaggedon
672e358b2f
Feature/graphql table query ( #486 )
* Tech spec
* Object query service for Cassandra
* Gateway support for objects-query
* GraphQL query utility
* Filters, ordering
2025-09-03 23:39:11 +01:00
cybermaggedon
96c2b73457
Fix import export graceful shutdown ( #476 )
* Tech spec for graceful shutdown
* Graceful shutdown of importers/exporters
* Update socket to include graceful shutdown orchestration
* Adding tests for conditions tracked in this PR
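The importer/exporter shutdown mechanism isn't described here. The sketch below shows a generic asyncio pattern for the same goal: stop accepting work, drain what is queued, then exit, with the wait bounded by a timeout. `Exporter`, its methods and the sentinel approach are assumptions, not TrustGraph's implementation.

```python
import asyncio
from typing import List, Optional


class Exporter:
    """Sketch: refuse new items, flush queued ones, then stop the worker."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()
        self.sent: List[str] = []
        self.accepting = True
        self._task: Optional[asyncio.Task] = None

    async def start(self) -> None:
        self._task = asyncio.create_task(self._worker())

    async def _worker(self) -> None:
        while True:
            item = await self.queue.get()
            self.queue.task_done()
            if item is None:        # sentinel: everything before it was flushed
                return
            self.sent.append(item)  # stand-in for writing to a socket

    async def submit(self, item: str) -> bool:
        if not self.accepting:
            return False
        await self.queue.put(item)
        return True

    async def shutdown(self, timeout: float = 5.0) -> None:
        self.accepting = False       # 1. refuse new work
        await self.queue.put(None)   # 2. sentinel goes behind pending items
        if self._task is not None:   # 3. wait for the drain, bounded in time
            await asyncio.wait_for(self._task, timeout)


async def demo() -> List[str]:
    exp = Exporter()
    await exp.start()
    for i in range(3):
        await exp.submit(f"msg{i}")
    await exp.shutdown()
    late = await exp.submit("late")  # rejected after shutdown
    return exp.sent + (["late"] if late else [])


result = asyncio.run(demo())
```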
2025-08-28 13:39:28 +01:00
cybermaggedon
e5b9b4976a
Fix agent knowledge query initialisation failure ( #469 )
* Back out agent change
* Fixed broken tests
2025-08-26 19:41:04 +01:00
cybermaggedon
6e9e2a11b1
Fix knowledge query ignoring the collection ( #467 )
* Fix knowledge query ignoring the collection
* Updated the agent_manager.py to properly pass config parameters when instantiating tool implementations
* Added tests for agent collection parameter
2025-08-26 19:05:48 +01:00
cybermaggedon
28190fea8a
More config cli ( #466 )
* Extra config CLI tech spec
* Describe packaging
* Added CLI commands
* Add tests
2025-08-22 13:36:10 +01:00
cybermaggedon
83f0c1e7f3
Structured data MVP ( #452 )
* Structured data tech spec
* Architecture principles
* New schemas
* Updated schemas and specs
* Object extractor
* Add .coveragerc
* New tests
* Cassandra object storage
* Trying to get object extraction working; issues remain
2025-08-07 20:47:20 +01:00
cybermaggedon
dd70aade11
Implement logging strategy ( #444 )
* Logging strategy; converted all print() calls to logging invocations
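The sketch below shows the usual standard-library pattern that replaces `print()`: a module-level logger named after the module, with lazy `%s` formatting. The `process` function is a hypothetical example, and the project's actual logging configuration may differ.

```python
import logging

# Module-level logger named after the module, used instead of print().
logger = logging.getLogger(__name__)


def process(item: str) -> str:
    logger.debug("Processing %s", item)  # args formatted only if emitted
    result = item.upper()
    logger.info("Processed %s", item)
    return result


if __name__ == "__main__":
    # Output format and level are configured once, at the entry point.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    process("hello")
```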
2025-07-30 23:18:38 +01:00
cybermaggedon
d83e4e3d59
Update to enable knowledge extraction using the agent framework ( #439 )
* Implement KG extraction agent (kg-extract-agent)
* Using ReAct framework (agent-manager-react)
* ReAct manager had an issue when emitting JSON, which conflicted with ReAct manager's own JSON messages, so refactored ReAct manager to use traditional, non-JSON ReAct messages.
* Minor refactor to take the prompt template client out of prompt-template so it can be more readily used by other modules. kg-extract-agent uses this framework.
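The sketch below parses a plain-text ReAct turn. It assumes the classic `Thought:` / `Action:` / `Action Input:` / `Final Answer:` labels; the exact labels agent-manager-react uses are not stated here. Because the outer envelope is not JSON, a tool argument that is itself JSON passes through untouched. `parse_react` is a hypothetical helper.

```python
import re
from typing import Dict

# Field labels at the start of a line; "Action Input" is tried before "Action".
_PATTERN = re.compile(
    r"^(Final Answer|Action Input|Action|Thought):[ \t]*", re.MULTILINE
)


def parse_react(text: str) -> Dict[str, str]:
    """Split a plain-text ReAct turn into its labelled sections."""
    sections: Dict[str, str] = {}
    matches = list(_PATTERN.finditer(text))
    for i, m in enumerate(matches):
        # A section runs until the next label, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1)] = text[m.end():end].strip()
    return sections
```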
2025-07-21 14:31:57 +01:00
cybermaggedon
81c7c1181b
Updated CLI invocation and config model for tools and mcp ( #438 )
* Updated CLI invocation and config model for tools and mcp
* CLI anomalies
* Tweaked the MCP tool implementation for new model
* Update agent implementation to match the new model
* Fix agent tools, now all tested
* Fixed integration tests
* Fix MCP delete tool params
2025-07-16 23:09:32 +01:00
cybermaggedon
f37decea2b
Increase storage test coverage ( #435 )
* Fixing storage and adding tests
* PR pipeline only runs quick tests
2025-07-15 09:33:35 +01:00
cybermaggedon
2f7fddd206
Test suite executed from CI pipeline ( #433 )
* Test strategy & test cases
* Unit tests
* Integration tests
2025-07-14 14:57:44 +01:00