Commit graph

32 commits

Author SHA1 Message Date
cybermaggedon
d9dc4cbab5
SPARQL query service (#754)
SPARQL 1.1 query service wrapping pub/sub triples interface

Add a backend-agnostic SPARQL query service that parses SPARQL
queries using rdflib, decomposes them into triple pattern lookups
via the existing TriplesClient pub/sub interface, and performs
in-memory joins, filters, and projections.

Includes:
- SPARQL parser, algebra evaluator, expression evaluator, solution
  sequence operations (BGP, JOIN, OPTIONAL, UNION, FILTER, BIND,
  VALUES, GROUP BY, ORDER BY, LIMIT/OFFSET, DISTINCT, aggregates)
- FlowProcessor service with TriplesClientSpec
- Gateway dispatcher, request/response translators, API spec
- Python SDK method (FlowInstance.sparql_query)
- CLI command (tg-invoke-sparql-query)
- Tech spec (docs/tech-specs/sparql-query.md)

New unit tests for SPARQL query
2026-04-02 17:21:39 +01:00
cybermaggedon
5a9db2da50
Add tg-monitor-prompts CLI tool for prompt queue monitoring (#737)
Subscribes to prompt request/response Pulsar queues, correlates
messages by ID, and logs a summary with template name, truncated
terms, and elapsed time. Streaming responses are accumulated and
shown at completion. Supports prompt and prompt-rag queue types.
2026-03-30 16:08:46 +01:00
cybermaggedon
4609424afe
Prepare 2.2 release branch (#704) 2026-03-22 15:23:23 +00:00
cybermaggedon
64e3f6bd0d
Subgraph provenance (#694)
Replace per-triple provenance reification with subgraph model

Extraction provenance previously created a full reification (statement
URI, activity, agent) for every single extracted triple, producing ~13
provenance triples per knowledge triple.  Since each chunk is processed
by a single LLM call, this was both redundant and semantically
inaccurate.

Now one subgraph object is created per chunk extraction, with
tg:contains linking to each extracted triple.  For 20 extractions from
a chunk this reduces provenance from ~260 triples to ~33.

- Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri
- Replace triple_provenance_triples() with subgraph_provenance_triples()
- Refactor kg-extract-definitions and kg-extract-relationships to
  generate provenance once per chunk instead of per triple
- Add subgraph provenance to kg-extract-ontology and kg-extract-agent
  (previously had none)
- Update CLI tools and tech specs to match

Also rename tg-show-document-hierarchy to tg-show-extraction-provenance.

Added extra typing for extraction provenance, fixed extraction prov CLI
2026-03-13 11:37:59 +00:00
cybermaggedon
a53ed41da2
Add explainability CLI tools (#688)
Add explainability CLI tools for debugging provenance data
- tg-show-document-hierarchy: Display document → page → chunk → edge
  hierarchy by traversing prov:wasDerivedFrom relationships
- tg-list-explain-traces: List all GraphRAG sessions with questions
  and timestamps from the retrieval graph
- tg-show-explain-trace: Show full explainability cascade for a
  GraphRAG session (question → exploration → focus → synthesis)

These tools query the source and retrieval graphs to help debug
and explore provenance/explainability data stored during document
processing and GraphRAG queries.
2026-03-11 13:44:29 +00:00
cybermaggedon
c951562189
Graph query CLI tool (#679)
New CLI tool that enables selective queries against the triple store
unlike tg-show-graph which dumps the entire graph.

Features:
- Filter by subject, predicate, object, and/or named graph
- Auto-detection of term types (IRI, literal, quoted triple)
- Two ways to specify quoted triples:
  - Inline Turtle-style: -o "<<s p o>>"
  - Explicit flags: --qt-subject, --qt-predicate, --qt-object
- Output formats: space-separated, pipe-separated, JSON, JSON Lines
- Streaming mode for efficient large result sets

Auto-detection rules:
- http://, https://, urn:, or <wrapped> -> IRI
- <<s p o>> -> quoted triple
- Otherwise -> literal
2026-03-10 11:03:34 +00:00
cybermaggedon
3c3e11bef5
Fix/librarian broken (#674)
* Set end-of-stream cleanly - clean streaming message structures

* Add tg-get-document-content
2026-03-09 13:36:24 +00:00
cybermaggedon
a630e143ef
Incremental / large document loading (#659)
Tech spec

BlobStore (trustgraph-flow/trustgraph/librarian/blob_store.py):
- get_stream() - yields document content in chunks for streaming retrieval
- create_multipart_upload() - initializes S3 multipart upload, returns
  upload_id
- upload_part() - uploads a single part, returns etag
- complete_multipart_upload() - finalizes upload with part etags
- abort_multipart_upload() - cancels and cleans up

Cassandra schema (trustgraph-flow/trustgraph/tables/library.py):
- New upload_session table with 24-hour TTL
- Index on user for listing sessions
- Prepared statements for all operations
- Methods: create_upload_session(), get_upload_session(),
  update_upload_session_chunk(), delete_upload_session(),
  list_upload_sessions()

- Schema extended with UploadSession, UploadProgress, and new
  request/response fields
- Librarian methods: begin_upload, upload_chunk, complete_upload,
  abort_upload, get_upload_status, list_uploads
- Service routing for all new operations
- Python SDK with transparent chunked upload:
  - add_document() auto-switches to chunked for files > 10MB
  - Progress callback support (on_progress)
  - get_pending_uploads(), get_upload_status(), abort_upload(),
    resume_upload()

- Document table: Added parent_id and document_type columns with index
- Document schema (knowledge/document.py): Added document_id field for
  streaming retrieval
- Librarian operations:
  - add-child-document for extracted PDF pages
  - list-children to get child documents
  - stream-document for chunked content retrieval
  - Cascade delete removes children when parent is deleted
  - list-documents filters children by default
- PDF decoder (decoding/pdf/pdf_decoder.py): Updated to stream large
  documents from librarian API to temp file
- Librarian service (librarian/service.py): Sends document_id instead of
  content for large PDFs (>2MB)
- Deprecated tools (load_pdf.py, load_text.py): Added deprecation
  warnings directing users to tg-add-library-document +
  tg-start-library-processing

Remove load_pdf and load_text utils

Move chunker/librarian comms to base class

Updating tests
2026-03-04 16:57:58 +00:00
cybermaggedon
88fe8468bc
Update CI for 2.1 release (#653) 2026-02-28 11:10:11 +00:00
cybermaggedon
4bbc6d844f
Row embeddings APIs exposed (#646)
* Added row embeddings API and CLI support

* Updated protocol specs

* Row embeddings agent tool

* Add new agent tool to CLI
2026-02-23 21:52:56 +00:00
cybermaggedon
1809c1f56d
Structured data 2 (#645)
* Structured data refactor - multi-index tables, remove need for manual mods to the Cassandra tables

* Tech spec updated to track implementation
2026-02-23 15:56:29 +00:00
cybermaggedon
6bf08c3ace
Feature/more cli diags (#624)
* CLI tools for tg-invoke-graph-embeddings, tg-invoke-document-embeddings,
and tg-invoke-embeddings.  Just useful for diagnostics.

* Fix tg-load-knowledge
2026-02-04 14:10:30 +00:00
cybermaggedon
cf0daedefa
Changed schema for Value -> Term, majorly breaking change (#622)
* Changed schema for Value -> Term, majorly breaking change

* Following the schema change, Value -> Term into all processing

* Updated Cassandra for g, p, s, o index patterns (7 indexes)

* Reviewed and updated all tests

* Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down
2026-01-27 13:48:08 +00:00
cybermaggedon
e4f0013841
Open 1.9 branch (#620) 2026-01-26 17:36:25 +00:00
cybermaggedon
b08db761d7
Fix config inconsistency (#609)
* Plural/singular confusion in config key

* Flow class vs flow blueprint nomenclature change

* Update docs & CLI to reflect the above
2026-01-14 12:31:40 +00:00
Cyber MacGeddon
1865b3f3c8 Start 1.8 release branch 2025-12-17 21:32:13 +00:00
cybermaggedon
52ca74bbbc
System startup tracker (#579) 2025-12-04 18:01:47 +00:00
Cyber MacGeddon
98aaa4f67e Configure for 1.7 release branch 2025-12-03 09:46:55 +00:00
cybermaggedon
b1cc724f7d
Streaming LLM part 2 (#567)
* Updates for agent API with streaming support

* Added tg-dump-queues tool to dump Pulsar queues to a log

* Updated tg-invoke-agent, incremental output

* Queue dumper CLI - might be useful for debug

* Updating for tests
2025-11-26 15:16:17 +00:00
cybermaggedon
97d8b84d7f
Open 1.6 release branch (#564) 2025-11-24 10:05:29 +00:00
cybermaggedon
ad35656811
Prepare 1.5 release branch (#550) 2025-10-11 11:44:00 +01:00
cybermaggedon
8354ea1276
Update flow parameter tech spec for advanced params (#537)
* Add advanced mode to tech spec,  fix enum description in tech spec

* Updated tech-spec for controlled-by relationship between parameters

* Update tg-show-flows CLI

* Update tg-show-flows, tg-show-flow-classes, tg-start-flow CLI

* Add tg-show-parameter-types
2025-09-26 10:55:10 +01:00
cybermaggedon
fcd15d1833
Collection management part 2 (#522)
* Plumb collection manager into librarian

* Test end-to-end
2025-09-19 16:08:47 +01:00
cybermaggedon
13ff7d765d
Collection management (#520)
* Tech spec

* Refactored Cassanda knowledge graph for single table

* Collection management, librarian services to manage metadata and collection deletion
2025-09-18 15:57:52 +01:00
cybermaggedon
0b59f0c828
Maint/open 1.4 release branch (#508)
* Change pyproject files for 1.4

* Fix tests to track 1.4
2025-09-10 22:11:03 +01:00
cybermaggedon
ebca467ed8
Structured data loader cli (#498) 2025-09-05 15:38:18 +01:00
cybermaggedon
a6d9f5e849
Structured query support (#492)
* Tweak the structured query schema

* Structure query service

* Gateway support for nlp-query and structured-query

* API support

* Added CLI

* Update tests

* More tests
2025-09-04 16:06:18 +01:00
cybermaggedon
672e358b2f
Feature/graphql table query (#486)
* Tech spec

* Object query service for Cassandra

* Gateway support for objects-query

* GraphQL query utility

* Filters, ordering
2025-09-03 23:39:11 +01:00
cybermaggedon
5139c6ad5d
Bump pyproject.toml constraints (#477) 2025-08-28 13:45:58 +01:00
cybermaggedon
28190fea8a
More config cli (#466)
* Extra config CLI tech spec

* Describe packaging

* Added CLI commands

* Add tests
2025-08-22 13:36:10 +01:00
cybermaggedon
e89a5b5d23
Knowledge load utility CLI (#456)
* Knowledge loader

* More tests
2025-08-13 16:07:58 +01:00
cybermaggedon
98022d6af4
Migrate from setup.py to pyproject.toml (#440)
* Converted setup.py to pyproject.toml

* Modern package infrastructure as recommended by py docs
2025-07-23 21:22:08 +01:00