Commit graph

42 commits

Author SHA1 Message Date
cybermaggedon
24bbe94136
Document chunks not stored in vector store (#665)
- Schema - ChunkEmbeddings now uses chunk_id: str instead of chunk: bytes
- Schema - DocumentEmbeddingsResponse now returns chunk_ids: list[str]
  instead of chunks
- Translators - Updated to serialize/deserialize chunk_id
- Clients - DocumentEmbeddingsClient.query() returns chunk_ids
- SDK/API - flow.py, socket_client.py, bulk_client.py updated
- Document embeddings service - Stores chunk_id (document ID) instead
  of chunk text
- Storage writers - Qdrant, Milvus, Pinecone store chunk_id in payload
- Query services - Return chunk_id from vector store searches
- Gateway dispatchers - Serialize chunk_id in API responses
- Document RAG - Added librarian client to fetch chunk content from
  Garage using chunk_ids
- CLI tools - Updated all three tools:
  - invoke_document_embeddings.py - displays chunk_ids, removed
    max_chunk_length
  - save_doc_embeds.py - exports chunk_id
  - load_doc_embeds.py - imports chunk_id
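
The lookup flow described above (vector store returns chunk_ids, Document RAG then fetches the text) can be sketched roughly as follows. This is a minimal illustration, not the project's code: ChunkMatch, InMemoryLibrarian, fetch_chunk() and resolve_chunks() are hypothetical names, and the in-memory dict stands in for content held in Garage.

```python
from dataclasses import dataclass


@dataclass
class ChunkMatch:
    # Hypothetical search hit: the vector store payload carries only an ID.
    chunk_id: str
    score: float


class InMemoryLibrarian:
    """Stand-in for a librarian client; maps chunk IDs to chunk text."""

    def __init__(self, chunks: dict[str, str]):
        self._chunks = chunks

    def fetch_chunk(self, chunk_id: str) -> str:
        return self._chunks[chunk_id]


def resolve_chunks(matches: list[ChunkMatch],
                   librarian: InMemoryLibrarian) -> list[str]:
    """Turn vector-search hits into chunk text, best score first."""
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    return [librarian.fetch_chunk(m.chunk_id) for m in ranked]


librarian = InMemoryLibrarian({
    "doc-1/c0": "First chunk.",
    "doc-1/c1": "Second chunk.",
})
matches = [ChunkMatch("doc-1/c1", 0.71), ChunkMatch("doc-1/c0", 0.93)]
context = resolve_chunks(matches, librarian)
```

Keeping only the ID in the vector store payload means the chunk text lives in one place, at the cost of one extra fetch per hit at query time.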
2026-03-07 23:10:45 +00:00
cybermaggedon
2b9232917c
Fix/extraction prov (#662)
Quoted triple fixes, including...

1. Updated triple_provenance_triples() in triples.py:
   - Now accepts a Triple object directly
   - Creates the reification triple using TRIPLE term type:
     stmt_uri tg:reifies <<extracted_triple>>
   - Includes it in the returned provenance triples
    
2. Updated definitions extractor:
   - Added imports for provenance functions and component version
   - Added ParameterSpec for optional llm-model and ontology flow parameters
   - For each definition triple, generates provenance with reification
    
3. Updated relationships extractor:
   - Same changes as definitions extractor
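
As a rough illustration of the reification step above, the sketch below builds a statement whose object is the extracted triple itself. Only the tg:reifies predicate comes from the commit message; the Triple dataclass, reify(), provenance_triples() and the ex:generatedBy predicate are assumptions made for the example and do not reflect the real signature of triple_provenance_triples().

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Triple:
    s: str
    p: str
    # The object may be a plain term or a quoted triple (hypothetical model).
    o: Union[str, "Triple"]


TG_REIFIES = "tg:reifies"  # predicate name taken from the commit message


def reify(stmt_uri: str, extracted: Triple) -> Triple:
    """Build: stmt_uri tg:reifies <<extracted>>."""
    return Triple(s=stmt_uri, p=TG_REIFIES, o=extracted)


def provenance_triples(stmt_uri: str, extracted: Triple,
                       llm_model: str) -> list[Triple]:
    """Return the reification triple plus one illustrative provenance fact."""
    return [
        reify(stmt_uri, extracted),
        # "ex:generatedBy" is an invented predicate, for illustration only.
        Triple(stmt_uri, "ex:generatedBy", llm_model),
    ]


fact = Triple("ex:cat", "ex:isA", "ex:animal")
prov = provenance_triples("ex:stmt-1", fact, "some-model")
```

Provenance attaches to the statement URI rather than to the subject or object, so several extractions of the same fact can each carry their own metadata.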
2026-03-06 12:23:58 +00:00
cybermaggedon
a630e143ef
Incremental / large document loading (#659)
Tech spec

BlobStore (trustgraph-flow/trustgraph/librarian/blob_store.py):
- get_stream() - yields document content in chunks for streaming retrieval
- create_multipart_upload() - initializes S3 multipart upload, returns
  upload_id
- upload_part() - uploads a single part, returns etag
- complete_multipart_upload() - finalizes upload with part etags
- abort_multipart_upload() - cancels and cleans up

Cassandra schema (trustgraph-flow/trustgraph/tables/library.py):
- New upload_session table with 24-hour TTL
- Index on user for listing sessions
- Prepared statements for all operations
- Methods: create_upload_session(), get_upload_session(),
  update_upload_session_chunk(), delete_upload_session(),
  list_upload_sessions()

- Schema extended with UploadSession, UploadProgress, and new
  request/response fields
- Librarian methods: begin_upload, upload_chunk, complete_upload,
  abort_upload, get_upload_status, list_uploads
- Service routing for all new operations
- Python SDK with transparent chunked upload:
  - add_document() auto-switches to chunked for files > 10MB
  - Progress callback support (on_progress)
  - get_pending_uploads(), get_upload_status(), abort_upload(),
    resume_upload()
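
The size-based switch and progress callback described for the SDK might look something like this. It is a sketch under stated assumptions: the 10MB threshold and the on_progress name come from the commit message, while plan_upload(), upload_in_chunks(), send_chunk and the 5MB part size are hypothetical.

```python
import io
from typing import BinaryIO, Callable, Optional

CHUNKED_THRESHOLD = 10 * 1024 * 1024  # 10MB, per the commit message
CHUNK_SIZE = 5 * 1024 * 1024          # assumed part size, not from the source


def plan_upload(size: int, threshold: int = CHUNKED_THRESHOLD) -> str:
    """Choose the upload mode from the document size."""
    return "chunked" if size > threshold else "single"


def upload_in_chunks(
    stream: BinaryIO,
    total: int,
    send_chunk: Callable[[int, bytes], None],
    chunk_size: int = CHUNK_SIZE,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> int:
    """Read the stream in fixed-size parts and hand each to send_chunk.

    Returns the number of parts sent.
    """
    index = 0
    sent = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        send_chunk(index, data)
        index += 1
        sent += len(data)
        if on_progress is not None:
            on_progress(sent, total)
    return index


# Small demonstration with a 25-byte payload and 10-byte parts.
payload = b"x" * 25
received: list[tuple[int, bytes]] = []
progress: list[tuple[int, int]] = []
parts = upload_in_chunks(
    io.BytesIO(payload),
    len(payload),
    send_chunk=lambda i, d: received.append((i, d)),
    chunk_size=10,
    on_progress=lambda done, total: progress.append((done, total)),
)
```

Because each part is indexed, a resumed upload only needs to know which indexes the server already holds.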

- Document table: Added parent_id and document_type columns with index
- Document schema (knowledge/document.py): Added document_id field for
  streaming retrieval
- Librarian operations:
  - add-child-document for extracted PDF pages
  - list-children to get child documents
  - stream-document for chunked content retrieval
  - Cascade delete removes children when parent is deleted
  - list-documents filters children by default
- PDF decoder (decoding/pdf/pdf_decoder.py): Updated to stream large
  documents from librarian API to temp file
- Librarian service (librarian/service.py): Sends document_id instead of
  content for large PDFs (>2MB)
- Deprecated tools (load_pdf.py, load_text.py): Added deprecation
  warnings directing users to tg-add-library-document +
  tg-start-library-processing

Remove load_pdf and load_text utils

Move chunker/librarian comms to base class

Updating tests
2026-03-04 16:57:58 +00:00
cybermaggedon
4bbc6d844f
Row embeddings APIs exposed (#646)
* Added row embeddings API and CLI support

* Updated protocol specs

* Row embeddings agent tool

* Add new agent tool to CLI
2026-02-23 21:52:56 +00:00
cybermaggedon
1809c1f56d
Structured data 2 (#645)
* Structured data refactor - multi-index tables, remove need for manual mods to the Cassandra tables

* Tech spec updated to track implementation
2026-02-23 15:56:29 +00:00
cybermaggedon
b2e768c309
Fixing Uri import error (#636)
2026-02-16 19:18:40 +00:00
cybermaggedon
6bf08c3ace
Feature/more cli diags (#624)
* CLI tools for tg-invoke-graph-embeddings, tg-invoke-document-embeddings,
and tg-invoke-embeddings.  Just useful for diagnostics.

* Fix tg-load-knowledge
2026-02-04 14:10:30 +00:00
cybermaggedon
cf0daedefa
Changed schema for Value -> Term, majorly breaking change (#622)
* Changed schema for Value -> Term, majorly breaking change

* Following the schema change, Value -> Term into all processing

* Updated Cassandra for g, p, s, o index patterns (7 indexes)

* Reviewed and updated all tests

* Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down
2026-01-27 13:48:08 +00:00
cybermaggedon
1c006d5b14
Python API docs (#614)
* Python API docs working

* Python API doc generation
2026-01-15 15:12:32 +00:00
cybermaggedon
b08db761d7
Fix config inconsistency (#609)
* Plural/singular confusion in config key

* Flow class vs flow blueprint nomenclature change

* Update docs & CLI to reflect the above
2026-01-14 12:31:40 +00:00
cybermaggedon
99f17d1b9d
Fix non-streaming (2) (#608)
2026-01-12 21:21:51 +00:00
cybermaggedon
807f6cc4e2
Fix non streaming RAG problems (#607)
* Fix non-streaming failure in RAG services

* Fix non-streaming failure in API

* Fix agent non-streaming messaging

* Agent messaging unit & contract tests
2026-01-12 18:45:52 +00:00
cybermaggedon
34eb083836
Messaging fabric plugins (#592)
* Plugin architecture for messaging fabric

* Schemas use a technology neutral expression

* Schema strictness uncovered some incorrect schema use, which is now fixed
2025-12-17 21:40:43 +00:00
cybermaggedon
39f6a8b940
Fix/queue configurations (#585)
* Fix config-svc startup dupe CLI args

* Fix missing params on collection service

* Fix collection management handling
2025-12-06 14:54:47 +00:00
cybermaggedon
664bce6182
Fix Python streaming SDK issues (#580)
* Fix verify CLI issues

* Fixing content mechanisms in API

* Fixing error handling

* Fixing invoke_prompt, invoke_llm, invoke_agent
2025-12-04 20:42:25 +00:00
cybermaggedon
01aeede78b
Python API implements streaming interfaces (#577)
* Tech spec

* Python CLI utilities updated to use the API including streaming features

* Added type safety to Python API

* Completed missing auth token support in CLI
2025-12-04 17:38:57 +00:00
cybermaggedon
dc2fa1f31e
flow parameters (#526)
* Flow parameter tech spec

* Flow configurable parameters implemented
2025-09-23 23:18:04 +01:00
cybermaggedon
fcd15d1833
Collection management part 2 (#522)
* Plumb collection manager into librarian

* Test end-to-end
2025-09-19 16:08:47 +01:00
cybermaggedon
13ff7d765d
Collection management (#520)
* Tech spec

* Refactored Cassandra knowledge graph for single table

* Collection management, librarian services to manage metadata and collection deletion
2025-09-18 15:57:52 +01:00
cybermaggedon
48016d8fb2
Added XML, JSON, CSV detection (#519)
* Improved XML detect, added schema selection

* Add schema select + tests

* API additions

* More tests

* Fixed tests
2025-09-16 23:53:43 +01:00
cybermaggedon
5867f45c3a
Fix/agent groups broken (#504)
* Fix non-backward-compatible agent changes

* Fix broken agents
2025-09-08 21:17:18 +01:00
cybermaggedon
f22bf13aa6
Extend use of user + collection fields (#503)
* Collection+user fields in structured query

* User/collection in structured query & agent
2025-09-08 18:28:38 +01:00
cybermaggedon
a6d9f5e849
Structured query support (#492)
* Tweak the structured query schema

* Structured query service

* Gateway support for nlp-query and structured-query

* API support

* Added CLI

* Update tests

* More tests
2025-09-04 16:06:18 +01:00
cybermaggedon
672e358b2f
Feature/graphql table query (#486)
* Tech spec

* Object query service for Cassandra

* Gateway support for objects-query

* GraphQL query utility

* Filters, ordering
2025-09-03 23:39:11 +01:00
cybermaggedon
dd70aade11
Implement logging strategy (#444)
* Logging strategy; convert all print() calls to logging invocations
2025-07-30 23:18:38 +01:00
cybermaggedon
9c7a070681
Feature/react call mcp (#428)
Key Features

  - MCP Tool Integration: Added core MCP tool support with ToolClientSpec and ToolClient classes
  - API Enhancement: New mcp_tool method for flow-specific tool invocation
  - CLI Tooling: New tg-invoke-mcp-tool command for testing MCP integration
  - React Agent Enhancement: Fixed and improved multi-tool invocation capabilities
  - Tool Management: Enhanced CLI for tool configuration and management

Changes

  - Added MCP tool invocation to API with flow-specific integration
  - Implemented ToolClientSpec and ToolClient for tool call handling
  - Updated agent-manager-react to invoke MCP tools with configurable types
  - Enhanced CLI with new commands and improved help text
  - Added comprehensive documentation for new CLI commands
  - Improved tool configuration management

Testing

  - Added tg-invoke-mcp-tool CLI command for isolated MCP integration testing
  - Enhanced agent capability to invoke multiple tools simultaneously
2025-07-08 16:19:19 +01:00
cybermaggedon
ef34d951fe
Renamed default flow from 0000 to default (#395)
2025-05-24 12:27:56 +01:00
cybermaggedon
6be0ca1990
Add optional timeout to API, 60s default (#376)
2025-05-08 19:00:17 +01:00
cybermaggedon
31b7ade44d
Feature/knowledge load (#372)
* Switch off retry in Cassandra until we can differentiate retryable errors

* Fix config getvalues

* Loading knowledge cores works
2025-05-08 00:41:45 +01:00
cybermaggedon
8080b54328
Knowledge core CLI (#368)
2025-05-07 00:20:59 +01:00
cybermaggedon
54e475fa3a
Sample docs loader (#365)
2025-05-06 13:43:17 +01:00
cybermaggedon
844547ab5f
Feature/library cli (#363)
* Major Python client API rework, break down API & colossal class

* Complete rest of library API

* Library CLI support
2025-05-05 11:09:18 +01:00
cybermaggedon
3b8b9ea866
Feature/flow api 3 (#358)
* Working mux socket

* Change API to incorporate flow

* Add Flow ID to all relevant CLIs, not completely implemented

* Change tg-processor-state to use API gateway

* Updated all CLIs

* New tg-show-flow-state command

* tg-show-flow-state shows classes too
2025-05-03 10:39:53 +01:00
cybermaggedon
3b021720c5
Feature/flow management cli (#346)
Flow management API + various flow management commands

trustgraph-cli/scripts/tg-delete-flow-class
trustgraph-cli/scripts/tg-get-flow-class
trustgraph-cli/scripts/tg-put-flow-class
trustgraph-cli/scripts/tg-show-flow-classes
trustgraph-cli/scripts/tg-show-flows
trustgraph-cli/scripts/tg-start-flow
trustgraph-cli/scripts/tg-stop-flow
2025-04-24 18:57:33 +01:00
cybermaggedon
1d222235d3
Configuration initialisation (#335)
* Fixed error reporting in config

* Updated tg-init-pulsar to be able to load initial config to config-svc

* Tweaked API naming and added more config calls

* Tools to dump out prompts and agent tools
2025-04-02 13:52:33 +01:00
cybermaggedon
88eae0a9f0
Fix no version/config at startup (#333)
2025-04-01 20:54:59 +01:00
cybermaggedon
fa09dc319e
Feature/config service (#332)
The configuration service provides an API to change configuration. The complete configuration is pushed down a config queue so that users have a complete copy of the config object.
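
The full-state push described above can be sketched as below. This is an illustration only: the ConfigService class, subscribe(), put() and the version counter are hypothetical, and plain callbacks stand in for the config queue.

```python
import copy
from typing import Any, Callable

Subscriber = Callable[[int, dict[str, Any]], None]


class ConfigService:
    """Toy config service that pushes the whole config on every change."""

    def __init__(self) -> None:
        self._config: dict[str, Any] = {}
        self._version = 0  # assumed counter, incremented on each change
        self._subscribers: list[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        # New subscribers receive the current state straight away.
        self._subscribers.append(callback)
        callback(self._version, copy.deepcopy(self._config))

    def put(self, key: str, value: Any) -> None:
        self._config[key] = value
        self._version += 1
        # Push a complete copy, not a delta, to every subscriber.
        for callback in self._subscribers:
            callback(self._version, copy.deepcopy(self._config))


seen: list[tuple[int, dict[str, Any]]] = []
svc = ConfigService()
svc.subscribe(lambda version, config: seen.append((version, config)))
svc.put("a", 1)
svc.put("b", 2)
```

Pushing the complete object means a consumer that misses an update still converges on the next one, since it never has to replay deltas.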
2025-04-01 19:47:05 +01:00
cybermaggedon
ef845d6c9b
Feature/rag parameters (#311)
* Change document-rag and graph-rag processing so that the user can
specify parameters.  Changes in Pulsar services, Pulsar message
schemas, gateway and command-line tools.  User-visible changes in
new parameters on command-line tools.

* Fix bugs, graph-rag working

* Get subgraph truncation in the right place

* Graph RAG and document RAG working and configurable

* Multi-hop path traversal GraphRAG

* Add safety valve for path_size set too high
2025-03-13 00:38:18 +00:00
cybermaggedon
6aa212061d
Fix/document embeddings (#247)
* Update schema for doc embeddings

* Rename embeddings-vectorize to graph-embeddings

* Added document-embeddings processor (broken, needs fixing)

* Added scripts

* Fixed DE queue schema

* Add missing DE process

* Fix doc RAG processing, put graph-rag and doc-rag in appropriate component files.
2025-01-04 21:51:28 +00:00
cybermaggedon
e3d06ab80b
Fix isinstance test on null values (#192)
Co-authored-by: Mark Adams <mark.adams@surevine.com>
2024-12-04 14:42:55 +00:00
cybermaggedon
887fafcf8c
Fix/core save api (#172)
* Acknowledge messages from Pulsar, doh!
* Change API to deliver a boolean e indicating whether the value is an entity
* Change loaders to use new API
* Changes, entity-aware API is complete
2024-11-26 16:46:38 +00:00
cybermaggedon
ae1264f5c4
Add Python support to calling the API (#169)
2024-11-22 15:55:32 +00:00