Commit graph

46 commits

Author SHA1 Message Date
cybermaggedon
e1bc4c04a4
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data

Changed terminology:
  - session -> question
  - retrieval -> exploration
  - selection -> focus
  - answer -> synthesis

- uris.py: Renamed query_session_uri → question_uri,
  retrieval_uri → exploration_uri, selection_uri → focus_uri,
  answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
  updated labels ("GraphRAG question", "Exploration", "Focus",
  "Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
  GRAPH_SOURCE, GRAPH_RETRIEVAL
- __init__.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
  (Question, Exploration, Focus, Synthesis)

Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
  on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
  named graph
- rag.py: Explainability triples are stored in the user's collection (not
  a separate collection) in the named graph
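The following is a minimal sketch of what a `set_graph()` helper might look like. It assumes triples are immutable records with an optional graph field `g`. The `Triple` class and the IRI values assigned to `GRAPH_SOURCE` and `GRAPH_RETRIEVAL` are hypothetical: the commit names the constants but does not give their values.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional

# Hypothetical IRIs: the commit names these constants but not their values.
GRAPH_SOURCE = "urn:graph:source"
GRAPH_RETRIEVAL = "urn:graph:retrieval"


@dataclass(frozen=True)
class Triple:
    s: str
    p: str
    o: str
    g: Optional[str] = None  # named graph; None stands for the default graph


def set_graph(triples: Iterable[Triple], graph: str) -> List[Triple]:
    """Return copies of the triples, each placed in the given named graph."""
    return [replace(t, g=graph) for t in triples]


# Explainability triples stay in the user's collection, in their own named graph.
explain = [Triple("urn:question:1", "rdfs:label", "GraphRAG question")]
stored = set_graph(explain, GRAPH_RETRIEVAL)
```

Because the helper returns copies instead of mutating its input, the caller's triples stay untouched.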

Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
  named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
  named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph

CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
  --show-graph to display graph column
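The sketch below illustrates the two new `show_graph.py` options. Only the flag names `-g/--graph` and `--show-graph` come from the commit. The `(s, p, o, g)` row shape, the graph IRIs and the space-separated output layout are assumptions. The real tool works against a live store, not an in-memory list.

```python
import argparse
from typing import Iterable, List, Optional, Tuple

# Each row is (s, p, o, g); g is the named graph the triple belongs to.
Row = Tuple[str, str, str, str]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Show graph (sketch)")
    parser.add_argument("-g", "--graph", default=None,
                        help="only show triples in this named graph")
    parser.add_argument("--show-graph", action="store_true",
                        help="add a column showing each triple's named graph")
    return parser.parse_args(argv)


def format_rows(rows: Iterable[Row], graph: Optional[str],
                show_graph: bool) -> List[str]:
    lines = []
    for s, p, o, g in rows:
        if graph is not None and g != graph:
            continue  # -g/--graph: skip triples in other named graphs
        columns = [s, p, o] + ([g] if show_graph else [])
        lines.append(" ".join(columns))
    return lines


rows = [
    ("urn:chunk:1", "prov:wasDerivedFrom", "urn:page:2", "urn:graph:source"),
    ("urn:question:1", "rdfs:label", "GraphRAG question", "urn:graph:retrieval"),
]
args = parse_args(["-g", "urn:graph:source", "--show-graph"])
output = format_rows(rows, args.graph, args.show_graph)
```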

Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
cybermaggedon
7a6197d8c3
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.

Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"

GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
   reasoning)
4. Answer - reference to synthesized response

Events stream via explain_callback during query(), enabling
real-time UX.
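To make the four-stage event stream concrete, here is a hedged sketch of a `query()` that calls an `explain_callback` at each stage before returning. The retrieval and selection steps are trivial stand-ins. A real implementation fetches a subgraph and asks an LLM to select edges with reasoning. The event names follow the list above, and the answer reference reuses the `urn:trustgraph:answer:{session_id}` pattern from this commit. The callback signature `(kind, data)` is an assumption.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, List, Tuple

Edge = Tuple[str, str, str]
ExplainCallback = Callable[[str, dict], None]


def query(question: str, subgraph: List[Edge],
          explain_callback: ExplainCallback) -> str:
    session_id = str(uuid.uuid4())

    # 1. Session: query text and timestamp
    explain_callback("session", {
        "query": question,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

    # 2. Retrieval: stand-in for fetching edges from the subgraph
    retrieved = list(subgraph)
    explain_callback("retrieval", {"edges": retrieved})

    # 3. Selection: stand-in for LLM selection, which would return
    #    the chosen edges plus a reasoning string per edge
    selected = [e for e in retrieved if e[0].lower() in question.lower()]
    explain_callback("selection", {
        "edges": [{"edge": e, "reasoning": "subject appears in the question"}
                  for e in selected],
    })

    # 4. Answer: synthesise text; the event carries a reference, not the text
    answer = "; ".join(f"{s} {p} {o}" for s, p, o in selected)
    explain_callback("answer", {"document_id": f"urn:trustgraph:answer:{session_id}"})
    return answer


events = []
result = query("What does Alice know?",
               [("Alice", "knows", "Bob"), ("Carol", "likes", "Dave")],
               lambda kind, data: events.append((kind, data)))
```

Each event reaches the caller while the query is still running, so a UI can render progress before the final answer arrives.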

- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service

- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
  (uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies
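The following is a simplified analogue of the `(labeled_edges, uri_map)` return shape described above. The LLM sees labels, and `uri_map` recovers the original URIs from an edge ID. The hashing scheme in `edge_id` and the function name `label_edges` are hypothetical. The commit does not say how edge IDs are computed.

```python
import hashlib
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def edge_id(label_s: str, label_p: str, label_o: str) -> str:
    # Hypothetical: a short, stable hash of the three labels
    key = "\x1f".join((label_s, label_p, label_o)).encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def label_edges(edges: List[Triple],
                labels: Dict[str, str]) -> Tuple[List[Triple], Dict[str, Triple]]:
    """Return (labeled_edges, uri_map), mirroring get_labelgraph()'s return shape."""
    labeled_edges: List[Triple] = []
    uri_map: Dict[str, Triple] = {}
    for s, p, o in edges:
        # Fall back to the raw term when no rdfs:label is known
        ls, lp, lo = (labels.get(term, term) for term in (s, p, o))
        labeled_edges.append((ls, lp, lo))
        uri_map[edge_id(ls, lp, lo)] = (s, p, o)
    return labeled_edges, uri_map


edges = [("urn:alice", "urn:knows", "urn:bob")]
labels = {"urn:alice": "Alice", "urn:knows": "knows", "urn:bob": "Bob"}
labeled, uri_map = label_edges(edges, labels)
```

In this sketch, two distinct edges with identical labels map to the same ID and the later one wins. A real implementation needs a policy for that case.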

- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
  ?stmt tg:reifies <<s p o>>
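For a query like `?stmt tg:reifies <<s p o>>` to match, the query side must encode the quoted triple exactly as the storage side did. The sketch below demonstrates that idea with an in-memory object index. The compact-JSON encoding is a stand-in, because the commit does not describe the actual Cassandra storage format. `term_value` is only an analogue of `get_term_value()`, not its real behaviour.

```python
import json
from typing import Dict, List, Tuple, Union

Triple = Tuple[str, str, str]
Term = Union[str, Triple]  # an IRI/literal, or a quoted triple


def serialize_triple(t: Triple) -> str:
    # Hypothetical canonical encoding; it only has to be identical on write and read
    s, p, o = t
    return json.dumps({"s": s, "p": p, "o": o}, sort_keys=True, separators=(",", ":"))


def term_value(term: Term) -> str:
    # Quoted triples are serialized; plain terms are used as-is
    return serialize_triple(term) if isinstance(term, tuple) else term


class ObjectIndex:
    def __init__(self) -> None:
        self.by_object: Dict[str, List[Tuple[str, str]]] = {}

    def add(self, s: str, p: str, o: Term) -> None:
        self.by_object.setdefault(term_value(o), []).append((s, p))

    def find_subjects(self, p: str, o: Term) -> List[str]:
        return [s for s, pred in self.by_object.get(term_value(o), []) if pred == p]


index = ObjectIndex()
index.add("urn:stmt:1", "tg:reifies", ("urn:alice", "urn:knows", "urn:bob"))
```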

- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries
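Here is a hedged sketch of the source-chain display. It follows `prov:wasDerivedFrom` links from a chunk up to the root document and caches `rdfs:label` lookups. The `derived_from` dictionary and the `lookup_label` callable stand in for triple-store queries, and the `SourceTracer` class name is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set


class SourceTracer:
    def __init__(self, derived_from: Dict[str, str],
                 lookup_label: Callable[[str], str]) -> None:
        self.derived_from = derived_from  # child URI -> parent URI (prov:wasDerivedFrom)
        self.lookup_label = lookup_label  # stand-in for an rdfs:label query
        self.label_cache: Dict[str, str] = {}

    def label(self, uri: str) -> str:
        # Cache labels so each URI is looked up at most once
        if uri not in self.label_cache:
            self.label_cache[uri] = self.lookup_label(uri)
        return self.label_cache[uri]

    def chain(self, uri: str) -> List[str]:
        # Follow wasDerivedFrom links to the root, guarding against cycles
        seen: Set[str] = set()
        path: List[str] = []
        current: Optional[str] = uri
        while current is not None and current not in seen:
            seen.add(current)
            path.append(current)
            current = self.derived_from.get(current)
        return path

    def describe(self, uri: str) -> str:
        return "Source: " + " → ".join(self.label(u) for u in self.chain(uri))


lookups: List[str] = []
labels = {"urn:chunk:1": "Chunk 1", "urn:page:2": "Page 2", "urn:doc:1": "Document Title"}


def lookup(uri: str) -> str:
    lookups.append(uri)  # record each (simulated) store query
    return labels[uri]


tracer = SourceTracer({"urn:chunk:1": "urn:page:2", "urn:page:2": "urn:doc:1"}, lookup)
```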

GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool
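The dataclass below is a plain-Python analogue of the response fields listed above, with a consumer that separates answer text from explainability IDs. The actual schema lives in `retrieval.py` and may use a different record system. The `response` text field is a hypothetical addition so that `chunk` messages carry something.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class GraphRagResponse:
    message_type: str = "chunk"             # "chunk" or "explain"
    response: Optional[str] = None          # hypothetical: answer text for chunks
    explain_id: Optional[str] = None
    explain_collection: Optional[str] = None
    end_of_session: bool = False


def consume(messages: Iterable[GraphRagResponse]) -> Tuple[str, List[str]]:
    """Accumulate answer text and explain IDs until end_of_session."""
    text: List[str] = []
    explain_ids: List[str] = []
    for msg in messages:
        if msg.message_type == "explain" and msg.explain_id:
            explain_ids.append(msg.explain_id)
        elif msg.message_type == "chunk" and msg.response:
            text.append(msg.response)
        if msg.end_of_session:
            break  # nothing after this belongs to the session
    return "".join(text), explain_ids


stream = [
    GraphRagResponse("explain", explain_id="urn:q:1", explain_collection="default"),
    GraphRagResponse("chunk", response="Alice "),
    GraphRagResponse("chunk", response="knows Bob"),
    GraphRagResponse("explain", explain_id="urn:a:1", end_of_session=True),
    GraphRagResponse("chunk", response="ignored"),
]
```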

trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()

trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session

trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission

trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching

trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
cybermaggedon
d2d71f859d
Feature/streaming triples (#676)
* Streaming triples

* Also GraphRAG service uses this

* Updated tests
2026-03-09 15:46:33 +00:00
cybermaggedon
f2ae0e8623
Embeddings API scores (#671)
- Put scores in all responses
- Remove unused 'middle' vector layer: a vector of texts now maps to a vector of embedding vectors (one per text)
2026-03-09 10:53:44 +00:00
cybermaggedon
4fa7cc7d7c
Fix/embeddings integration 2 (#670) 2026-03-08 19:42:26 +00:00
cybermaggedon
3bf8a65409
Fix tests (#666) 2026-03-07 23:38:09 +00:00
cybermaggedon
1809c1f56d
Structured data 2 (#645)
* Structured data refactor - multi-index tables, remove need for manual mods to the Cassandra tables

* Tech spec updated to track implementation
2026-02-23 15:56:29 +00:00
cybermaggedon
d886358be6
Entity & triple batch size limits (#635)
* Entities and triples are emitted in batches with a batch limit to manage
overloading downstream.

* Update tests
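A minimal sketch of emitting entities or triples in bounded batches. The `batched` name and the idea of a single integer limit are assumptions. The commit states that a batch limit exists but does not say how it is configured.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items; never yields an empty list."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch  # full batch: emit and start a new one
            batch = []
    if batch:
        yield batch  # final partial batch


batches = list(batched(range(5), 2))
```

Python 3.12 also ships `itertools.batched`, which yields tuples rather than lists. A hand-written generator like this one also runs on older interpreters.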
2026-02-16 17:38:03 +00:00
cybermaggedon
8574861196
Protect null embeddings - v2.0 (#627)
* Don't emit graph embeddings if there aren't any.

* Don't store graph embeddings in a knowledge store if there's an empty list.

* Translate Cassandra's 'null', which represents an empty list, back into an
  empty list, which is what the surrounding code expects (and what was stored
  in the first place).

* Avoid emitting empty embedding lists

* Avoid outputting empty triple lists

* Fix tests
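The sketch below shows the two guards described in this entry. The first maps Cassandra's `null` back to an empty list on read. The second skips emitting a message when there are no embeddings. The function names and the `send` callable are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def from_cassandra(value: Optional[Sequence[Vector]]) -> List[Vector]:
    # Cassandra represents an empty list as null, which arrives here as None
    return [] if value is None else list(value)


def emit_if_nonempty(send: Callable[[List[Vector]], None],
                     embeddings: Sequence[Vector]) -> bool:
    # Skip the message entirely rather than emitting an empty list
    if not embeddings:
        return False
    send(list(embeddings))
    return True


sent: List[List[Vector]] = []
```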
2026-02-09 14:57:36 +00:00
cybermaggedon
cf0daedefa
Changed schema for Value -> Term, majorly breaking change (#622)
* Changed schema for Value -> Term, majorly breaking change

* Following the schema change, propagated Value -> Term into all processing

* Updated Cassandra for g, p, s, o index patterns (7 indexes)

* Reviewed and updated all tests

* Neo4j, Memgraph and FalkorDB remain broken; will revisit once things settle down
2026-01-27 13:48:08 +00:00
cybermaggedon
e214eb4e02
Feature/prompts jsonl (#619)
* Tech spec

* JSONL implementation complete

* Updated prompt client users

* Fix tests
2026-01-26 17:38:00 +00:00
cybermaggedon
16a5cf966a
Fix agent streaming tool failure (#602)
* Fix agent streaming linkage

* Update tests
2026-01-06 23:00:50 +00:00
cybermaggedon
f79d0603f7
Update to add streaming tests (#600) 2026-01-06 21:48:05 +00:00
cybermaggedon
ae13190093
Address legacy issues in storage management (#595)
* Removed legacy storage management cruft.  Tidied tech specs.

* Fix deletion of last collection

* Storage processor ignores data on the queue which is for a deleted collection

* Updated tests
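Here is a hedged sketch of a storage processor that drops queued data addressed to a deleted collection. It assumes that collections are keyed by `(user, collection)` and that the processor learns about creation and deletion explicitly. The class, method and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple


@dataclass
class StorageMessage:
    user: str
    collection: str
    payload: object


class StorageProcessor:
    def __init__(self, write: Callable[[StorageMessage], None]) -> None:
        self.write = write
        self.collections: Set[Tuple[str, str]] = set()  # known (user, collection) pairs

    def create_collection(self, user: str, collection: str) -> None:
        self.collections.add((user, collection))

    def delete_collection(self, user: str, collection: str) -> None:
        self.collections.discard((user, collection))

    def on_message(self, msg: StorageMessage) -> bool:
        # Data still on the queue for a deleted collection is ignored, not stored
        if (msg.user, msg.collection) not in self.collections:
            return False
        self.write(msg)
        return True


written = []
proc = StorageProcessor(written.append)
```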
2026-01-05 13:45:14 +00:00
cybermaggedon
5304f96fe6
Fix tests (#593)
* Fix unit/integration/contract tests which were broken by messaging fabric work
2025-12-19 08:53:21 +00:00
cybermaggedon
7d07f802a8
Basic multitenant support (#583)
* Tech spec

* Address multi-tenant queue option problems in CLI

* Modified collection service to use config

* Changed storage management to use the config service definition
2025-12-05 21:45:30 +00:00
cybermaggedon
72cb1c98e0
Fix tests (#571) 2025-11-28 16:37:01 +00:00
cybermaggedon
e24de6081f
Fix streaming agent interactions (#570)
* Fix observer, thought streaming

* Fix end of message indicators

* Remove double-delivery of answer
2025-11-28 16:25:57 +00:00
cybermaggedon
1948edaa50
Streaming rag responses (#568)
* Tech spec for streaming RAG

* Support for streaming Graph/Doc RAG
2025-11-26 19:47:39 +00:00
cybermaggedon
b1cc724f7d
Streaming LLM part 2 (#567)
* Updates for agent API with streaming support

* Added tg-dump-queues tool to dump Pulsar queues to a log

* Updated tg-invoke-agent, incremental output

* Queue dumper CLI - might be useful for debugging

* Updating for tests
2025-11-26 15:16:17 +00:00
cybermaggedon
310a2deb06
Feature/streaming llm phase 1 (#566)
* Tidy up duplicate tech specs in doc directory

* Streaming LLM text-completion service tech spec.

* text-completion and prompt interfaces

* streaming change applied to all LLMs, so far tested with VertexAI

* Skip Pinecone unit tests: an upstream module issue is affecting them. Tests are passing again

* Added agent streaming; not yet working, and tests are broken
2025-11-26 09:59:10 +00:00
cybermaggedon
51107008fd
master -> 1.5 (README updates) (#552) 2025-10-11 11:46:03 +01:00
cybermaggedon
52b133fc86
Collection delete pt. 3 (#542)
* Fixing collection deletion

* Fixing collection management param error

* Always test for collections

* Add Cassandra collection table

* Updated tech spec for explicit creation/deletion

* Remove implicit collection creation

* Fix up collection tracking in all processors
2025-09-30 16:02:33 +01:00
cybermaggedon
43cfcb18a0
More LLM param test coverage (#535)
* More LLM tests

* Fixing tests
2025-09-26 01:00:30 +01:00
cybermaggedon
b0a3716b0e
Tests are failing (#534)
* Fix tests, update to new model parameter usage
2025-09-25 21:32:19 +01:00
cybermaggedon
13ff7d765d
Collection management (#520)
* Tech spec

* Refactored Cassandra knowledge graph for single table

* Collection management, librarian services to manage metadata and collection deletion
2025-09-18 15:57:52 +01:00
cybermaggedon
f22bf13aa6
Extend use of user + collection fields (#503)
* Collection+user fields in structured query

* User/collection in structured query & agent
2025-09-08 18:28:38 +01:00
cybermaggedon
5537fac731
Structured data, minor features (#500)
- Sorted out the confusing --auto mode in tg-load-structured-data
- Fixed tests & added CLI tests
2025-09-05 17:25:12 +01:00
cybermaggedon
0b7620bc04
Object batching (#499)
* Object batching

* Update tests
2025-09-05 15:59:06 +01:00
cybermaggedon
50c37407c5
Fix/sys integration issues (#494)
* Fix integration issues

* Fix query defaults

* Fix tests
2025-09-05 08:38:15 +01:00
cybermaggedon
ed0e02791d
Feature/structured query tool integration (#493)
* Agent integration to structured query

* Update tests
2025-09-04 16:23:43 +01:00
cybermaggedon
a6d9f5e849
Structured query support (#492)
* Tweak the structured query schema

* Structured query service

* Gateway support for nlp-query and structured-query

* API support

* Added CLI

* Update tests

* More tests
2025-09-04 16:06:18 +01:00
cybermaggedon
85e669c763
Fixing more Cassandra consistency issues (#488)
* Fixing more Cassandra work

* Fix tests
2025-09-04 00:58:11 +01:00
cybermaggedon
ccaec88a72
Feature/consolidate cassandra config (#483)
* Cassandra consolidation of parameters

* New Cassandra configuration helper

* Implemented Cassandra config refactor

* New tests
2025-09-03 23:41:22 +01:00
cybermaggedon
e74eb5d1ff
Feature/tool group (#484)
* Tech spec for tool group

* Partial tool group implementation

* Tool group tests
2025-09-03 23:39:49 +01:00
cybermaggedon
672e358b2f
Feature/graphql table query (#486)
* Tech spec

* Object query service for Cassandra

* Gateway support for objects-query

* GraphQL query utility

* Filters, ordering
2025-09-03 23:39:11 +01:00
cybermaggedon
96c2b73457
Fix import export graceful shutdown (#476)
* Tech spec for graceful shutdown

* Graceful shutdown of importers/exporters

* Update socket to include graceful shutdown orchestration

* Adding tests for conditions tracked in this PR
2025-08-28 13:39:28 +01:00
cybermaggedon
e5b9b4976a
Fix agent knowledge query initialisation failure (#469)
* Back out agent change

* Fixed broken tests
2025-08-26 19:41:04 +01:00
cybermaggedon
6e9e2a11b1
Fix knowledge query ignoring the collection (#467)
* Fix knowledge query ignoring the collection

* Updated the agent_manager.py to properly pass config parameters when instantiating tool implementations

* Added tests for agent collection parameter
2025-08-26 19:05:48 +01:00
cybermaggedon
28190fea8a
More config cli (#466)
* Extra config CLI tech spec

* Describe packaging

* Added CLI commands

* Add tests
2025-08-22 13:36:10 +01:00
cybermaggedon
83f0c1e7f3
Structure data mvp (#452)
* Structured data tech spec

* Architecture principles

* New schemas

* Updated schemas and specs

* Object extractor

* Add .coveragerc

* New tests

* Cassandra object storage

* Trying to get object extraction working; issues remain
2025-08-07 20:47:20 +01:00
cybermaggedon
dd70aade11
Implement logging strategy (#444)
* Logging strategy, and converted all print() calls to logging invocations
2025-07-30 23:18:38 +01:00
cybermaggedon
d83e4e3d59
Update to enable knowledge extraction using the agent framework (#439)
* Implement KG extraction agent (kg-extract-agent)

* Using ReAct framework (agent-manager-react)
 
* The ReAct manager had an issue when emitting JSON, which conflicted with the ReAct manager's own JSON messages, so the ReAct manager was refactored to use traditional, non-JSON ReAct messages.
 
* Minor refactor to take the prompt template client out of prompt-template so it can be more readily used by other modules. kg-extract-agent uses this framework.
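The next sketch shows why a plain-text ReAct format avoids the JSON clash. The framework's own structure lives in line prefixes, so an `Action Input` that is itself JSON (even multi-line) passes through untouched. The field names follow the common ReAct convention and are assumptions about the exact format `agent-manager-react` uses. `parse_react` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Field prefixes assumed from the classic text ReAct convention
FIELD = re.compile(r"^(Thought|Action Input|Action|Final Answer):\s*(.*)$")


def parse_react(text: str) -> Dict[str, str]:
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            current = m.group(1)
            fields[current] = m.group(2)
        elif current is not None:
            fields[current] += "\n" + line  # continuation, e.g. multi-line JSON input
    return {k: v.strip() for k, v in fields.items()}


text = ('Thought: I should look this up\n'
        'Action: knowledge-query\n'
        'Action Input: {"question":\n'
        '  "What is X?"}')
parsed = parse_react(text)
```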
2025-07-21 14:31:57 +01:00
cybermaggedon
81c7c1181b
Updated CLI invocation and config model for tools and mcp (#438)
* Updated CLI invocation and config model for tools and mcp

* CLI anomalies

* Tweaked the MCP tool implementation for new model

* Update agent implementation to match the new model

* Fix agent tools, now all tested

* Fixed integration tests

* Fix MCP delete tool params
2025-07-16 23:09:32 +01:00
cybermaggedon
f37decea2b
Increase storage test coverage (#435)
* Fixing storage and adding tests

* PR pipeline only runs quick tests
2025-07-15 09:33:35 +01:00
cybermaggedon
2f7fddd206
Test suite executed from CI pipeline (#433)
* Test strategy & test cases

* Unit tests

* Integration tests
2025-07-14 14:57:44 +01:00