trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-04-26 00:46:22 +02:00

Author	SHA1	Message	Date
cybermaggedon	cd5580be59	Extract-time provenance (#661 ) 1. Shared Provenance Module - URI generators, namespace constants, triple builders, vocabulary bootstrap 2. Librarian - Emits document metadata to graph on processing initiation (vocabulary bootstrap + PROV-O triples) 3. PDF Extractor - Saves pages as child documents, emits parent-child provenance edges, forwards page IDs 4. Chunker - Saves chunks as child documents, emits provenance edges, forwards chunk ID + content 5. Knowledge Extractors (both definitions and relationships): - Link entities to chunks via SUBJECT_OF (not top-level document) - Removed duplicate metadata emission (now handled by librarian) - Get chunk_doc_id and chunk_uri from incoming Chunk message 6. Embedding Provenance: - EntityContext schema has chunk_id field - EntityEmbeddings schema has chunk_id field - Definitions extractor sets chunk_id when creating EntityContext - Graph embeddings processor passes chunk_id through to EntityEmbeddings. Provenance Flow: Document [librarian + graph] → Page (PDF) [librarian + graph] → Chunk [librarian + graph] → Extracted Facts/Embeddings [chunk_id reference]. Each artifact is stored in librarian with parent-child linking, and PROV-O edges are emitted to the knowledge graph for full traceability from any extracted fact back to its source document. Also, updating tests	2026-03-05 18:36:10 +00:00
cybermaggedon	d8f0a576af	Document API updates (#660 ) * Doc streaming from librarian * Fix chunk minimum confusion * Add CLI args	2026-03-05 15:20:45 +00:00
cybermaggedon	a630e143ef	Incremental / large document loading (#659 ) Tech spec. BlobStore (trustgraph-flow/trustgraph/librarian/blob_store.py): - get_stream() - yields document content in chunks for streaming retrieval - create_multipart_upload() - initializes S3 multipart upload, returns upload_id - upload_part() - uploads a single part, returns etag - complete_multipart_upload() - finalizes upload with part etags - abort_multipart_upload() - cancels and cleans up Cassandra schema (trustgraph-flow/trustgraph/tables/library.py): - New upload_session table with 24-hour TTL - Index on user for listing sessions - Prepared statements for all operations - Methods: create_upload_session(), get_upload_session(), update_upload_session_chunk(), delete_upload_session(), list_upload_sessions() - Schema extended with UploadSession, UploadProgress, and new request/response fields - Librarian methods: begin_upload, upload_chunk, complete_upload, abort_upload, get_upload_status, list_uploads - Service routing for all new operations - Python SDK with transparent chunked upload: - add_document() auto-switches to chunked for files > 10MB - Progress callback support (on_progress) - get_pending_uploads(), get_upload_status(), abort_upload(), resume_upload() - Document table: Added parent_id and document_type columns with index - Document schema (knowledge/document.py): Added document_id field for streaming retrieval - Librarian operations: - add-child-document for extracted PDF pages - list-children to get child documents - stream-document for chunked content retrieval - Cascade delete removes children when parent is deleted - list-documents filters children by default - PDF decoder (decoding/pdf/pdf_decoder.py): Updated to stream large documents from librarian API to temp file - Librarian service (librarian/service.py): Sends document_id instead of content for large PDFs (>2MB) - Deprecated tools (load_pdf.py, load_text.py): Added deprecation warnings directing users to tg-add-library-document + tg-start-library-processing. Remove load_pdf and load_text utils. Move chunker/librarian comms to base class. Updating tests	2026-03-04 16:57:58 +00:00
cybermaggedon	a38ca9474f	Tool services - dynamically pluggable tool implementations for agent frameworks (#658 ) * New schema * Tool service implementation * Base class * Joke service, for testing * Update unit tests for tool services	2026-03-04 14:51:32 +00:00
cybermaggedon	0b83c08ae4	Use model in Azure LLM integration (#657 )	2026-03-04 12:06:06 +00:00
cybermaggedon	88fe8468bc	Update CI for 2.1 release (#653 )	2026-02-28 11:10:11 +00:00
cybermaggedon	6d8da748d7	Fix mismatching ge-query / graph-embeddings-query service idents (#648 )	2026-02-24 12:17:29 +00:00
cybermaggedon	4bbc6d844f	Row embeddings APIs exposed (#646 ) * Added row embeddings API and CLI support * Updated protocol specs * Row embeddings agent tool * Add new agent tool to CLI	2026-02-23 21:52:56 +00:00
cybermaggedon	1809c1f56d	Structured data 2 (#645 ) * Structured data refactor - multi-index tables, remove need for manual mods to the Cassandra tables * Tech spec updated to track implementation	2026-02-23 15:56:29 +00:00
cybermaggedon	2d8dbf4cdb	Move GAIStudio to vertexai package to simplify deps (#639 )	2026-02-20 08:46:29 +00:00
cybermaggedon	769c56bbea	Use ClientError & code to determine 429 error (#638 )	2026-02-20 08:00:07 +00:00
cybermaggedon	89b69fdb08	Fix weird Ontology URI issue (#637 )	2026-02-16 19:18:29 +00:00
cybermaggedon	d886358be6	Entity & triple batch size limits (#635 ) * Entities and triples are emitted in batches with a batch limit to manage overloading downstream. * Update tests	2026-02-16 17:38:03 +00:00
cybermaggedon	fe389354f6	Fix d/g attribute error (#634 )	2026-02-16 13:34:08 +00:00
cybermaggedon	00c1ca681b	Entity-centric graph (#633 ) * Tech spec for new entity-centric graph schema * Graph implementation	2026-02-16 13:26:43 +00:00
cybermaggedon	f24f1ebd80	Migrate VertexAI to google-genai SDK from deprecated library (#632 ) * Migrate VertexAI to google-genai SDK from deprecated library * Fix tests, mock the correct API	2026-02-09 20:43:33 +00:00
cybermaggedon	2781c7d87c	Fix LLM metrics (#631 ) * Fix mistral metrics * Fix to other models	2026-02-09 19:35:42 +00:00
cybermaggedon	4fca97d555	Output the entity term as well as its definition as entity contexts (#629 )	2026-02-09 15:18:05 +00:00
cybermaggedon	8574861196	Protect null embeddings - v2.0 (#627 ) * Don't emit graph embeddings if there aren't any. * Don't store graph embeddings in a knowledge store if there's an empty list. * Translate between Cassandra's 'null' representing an empty list and an empty list which is what the surrounding code wants (and stored in the first place). * Avoid emitting empty embedding lists * Avoid outputting empty triple lists * Fix tests	2026-02-09 14:57:36 +00:00
cybermaggedon	6bf08c3ace	Feature/more cli diags (#624 ) * CLI tools for tg-invoke-graph-embeddings, tg-invoke-document-embeddings, and tg-invoke-embeddings. Just useful for diagnostics. * Fix tg-load-knowledge	2026-02-04 14:10:30 +00:00
cybermaggedon	cf0daedefa	Changed schema for Value -> Term, majorly breaking change (#622 ) * Changed schema for Value -> Term, majorly breaking change * Following the schema change, Value -> Term into all processing * Updated Cassandra for g, p, s, o index patterns (7 indexes) * Reviewed and updated all tests * Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down	2026-01-27 13:48:08 +00:00
cybermaggedon	e214eb4e02	Feature/prompts jsonl (#619 ) * Tech spec * JSONL implementation complete * Updated prompt client users * Fix tests	2026-01-26 17:38:00 +00:00
cybermaggedon	e4f0013841	Open 1.9 branch (#620 )	2026-01-26 17:36:25 +00:00
cybermaggedon	11f41b07ab	Get neo4j to use limit (#618 ) * Get neo4j to use limit * Fix tests - they were exact matching on query strings	2026-01-22 15:16:34 +00:00
cybermaggedon	58c00149a7	Normalise URLs so that / suffix is optional (#617 )	2026-01-19 13:34:33 +00:00
cybermaggedon	83ea15dae7	Fixed flows/flow key issue in config (#616 )	2026-01-16 00:10:44 +00:00
cybermaggedon	62b754d788	Fix flow loading (#611 )	2026-01-14 16:23:15 +00:00
cybermaggedon	387afee7b7	Fix load-doc (#610 )	2026-01-14 15:46:29 +00:00
cybermaggedon	b08db761d7	Fix config inconsistency (#609 ) * Plural/singular confusion in config key * Flow class vs flow blueprint nomenclature change * Update docs & CLI to reflect the above	2026-01-14 12:31:40 +00:00
cybermaggedon	807f6cc4e2	Fix non streaming RAG problems (#607 ) * Fix non-streaming failure in RAG services * Fix non-streaming failure in API * Fix agent non-streaming messaging * Agent messaging unit & contract tests	2026-01-12 18:45:52 +00:00
cybermaggedon	16a5cf966a	Fix agent streaming tool failure (#602 ) * Fix agent streaming linkage * Update tests	2026-01-06 23:00:50 +00:00
cybermaggedon	f0c95a4c5e	Fix streaming API niggles (#599 ) * Fix end-of-stream anomaly with some graph-rag and document-rag * Fix gateway translators dropping responses	2026-01-06 16:41:35 +00:00
cybermaggedon	7a5bf47959	Fix collection existence test logic (#597 )	2026-01-05 16:31:26 +00:00
cybermaggedon	ae13190093	Address legacy issues in storage management (#595 ) * Removed legacy storage management cruft. Tidied tech specs. * Fix deletion of last collection * Storage processor ignores data on the queue which is for a deleted collection * Updated tests	2026-01-05 13:45:14 +00:00
cybermaggedon	25563bae3c	Change MinIO integration options in librarian to be more generic - to support a Garage integration (#594 ) * Tweak object store parameters to be more generic for other S3-type store integration * Update librarian to have region & SSL params * Update MinIO migration tech spec	2025-12-27 18:01:51 +00:00
cybermaggedon	34eb083836	Messaging fabric plugins (#592 ) * Plugin architecture for messaging fabric * Schemas use a technology-neutral expression * Schema strictness has uncovered some incorrect schema use which is fixed	2025-12-17 21:40:43 +00:00
Cyber MacGeddon	1865b3f3c8	Start 1.8 release branch	2025-12-17 21:32:13 +00:00
cybermaggedon	2b5ba68f00	Add model to meter metrics (#589 )	2025-12-10 11:34:48 +00:00
cybermaggedon	727b6bc9d6	Add service ID to log entry instead of module name (#588 )	2025-12-10 11:07:43 +00:00
cybermaggedon	aebdf9444b	Fix incorrect Cassandra config invocation (#587 )	2025-12-10 10:55:14 +00:00
cybermaggedon	f12fcc2652	Loki logging (#586 ) * Consolidate logging into a single module * Added Loki logging * Update tech spec * Add processor label * Fix recursive log entries, logging Loki's internals	2025-12-09 23:24:41 +00:00
cybermaggedon	39f6a8b940	Fix/queue configurations (#585 ) * Fix config-svc startup dupe CLI args * Fix missing params on collection service * Fix collection management handling	2025-12-06 14:54:47 +00:00
cybermaggedon	ba95fa226b	Gateway queue overrides (#584 )	2025-12-06 11:01:20 +00:00
cybermaggedon	7d07f802a8	Basic multitenant support (#583 ) * Tech spec * Address multi-tenant queue option problems in CLI * Modified collection service to use config * Changed storage management to use the config service definition	2025-12-05 21:45:30 +00:00
cybermaggedon	b957004db9	Feature/improve ontology extract (#576 ) * Tech spec to change ontology extraction * Ontology extract refactoring	2025-12-03 13:36:10 +00:00
Cyber MacGeddon	98aaa4f67e	Configure for 1.7 release branch	2025-12-03 09:46:55 +00:00
cybermaggedon	72cb1c98e0	Fix tests (#571 )	2025-11-28 16:37:01 +00:00
cybermaggedon	e24de6081f	Fix streaming agent interactions (#570 ) * Fix observer, thought streaming * Fix end of message indicators * Remove double-delivery of answer	2025-11-28 16:25:57 +00:00
cybermaggedon	1948edaa50	Streaming rag responses (#568 ) * Tech spec for streaming RAG * Support for streaming Graph/Doc RAG	2025-11-26 19:47:39 +00:00
cybermaggedon	b1cc724f7d	Streaming LLM part 2 (#567 ) * Updates for agent API with streaming support * Added tg-dump-queues tool to dump Pulsar queues to a log * Updated tg-invoke-agent, incremental output * Queue dumper CLI - might be useful for debug * Updating for tests	2025-11-26 15:16:17 +00:00

280 commits