trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-04-25 08:26:21 +02:00

Author	SHA1	Message	Date
cybermaggedon	e1bc4c04a4	Terminology Rename, and named-graphs for explainability (#682 ) Terminology Rename, and named-graphs for explainability data Changed terminology: - session -> question - retrieval -> exploration - selection -> focus - answer -> synthesis - uris.py: Renamed query_session_uri → question_uri, retrieval_uri → exploration_uri, selection_uri → focus_uri, answer_uri → synthesis_uri - triples.py: Renamed corresponding triple generation functions with updated labels ("GraphRAG question", "Exploration", "Focus", "Synthesis") - namespaces.py: Added named graph constants GRAPH_DEFAULT, GRAPH_SOURCE, GRAPH_RETRIEVAL - init.py: Updated exports - graph_rag.py: Updated to use new terminology - invoke_graph_rag.py: Updated CLI to display new stage names (Question, Exploration, Focus, Synthesis) Query-Time Explainability → Named Graph - triples.py: Added set_graph() helper function to set named graph on triples - graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL named graph - rag.py: Explainability triples stored in user's collection (not separate collection) with named graph Extraction Provenance → Named Graph - relationships/extract.py: Provenance triples use GRAPH_SOURCE named graph - definitions/extract.py: Provenance triples use GRAPH_SOURCE named graph - chunker.py: Provenance triples use GRAPH_SOURCE named graph - pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph CLI Updates - show_graph.py: Added -g/--graph option to filter by named graph and --show-graph to display graph column Also: - Fix knowledge core schemas	2026-03-10 14:35:21 +00:00
cybermaggedon	57eda65674	Knowledge core processing updated for embeddings interface change (#681 ) Knowledge core fixed: - trustgraph-flow/trustgraph/tables/knowledge.py - v.vector, v.chunk_id - trustgraph-base/trustgraph/messaging/translators/document_loading.py - chunk.vector - trustgraph-base/trustgraph/messaging/translators/knowledge.py - entity.vector - trustgraph-flow/trustgraph/gateway/dispatch/serialize.py - entity.vector, chunk.vector Test fixtures fixed: - tests/unit/test_storage/conftest.py - All mock entities/chunks use vector - tests/unit/test_query/conftest.py - All mock requests use vector - tests/unit/test_query/test_doc_embeddings_pinecone_query.py - All mock messages use vector These changes align with commit `f2ae0e86` which changed the schema from vectors: list[list[float]] to vector: list[float].	2026-03-10 13:28:16 +00:00
cybermaggedon	7a6197d8c3	GraphRAG Query-Time Explainability (#677 ) Implements full explainability pipeline for GraphRAG queries, enabling traceability from answers back to source documents. Renamed throughout for clarity: - provenance_callback → explain_callback - provenance_id → explain_id - provenance_collection → explain_collection - message_type "provenance" → "explain" - Queue name "provenance" → "explainability" GraphRAG queries now emit explainability events as they execute: 1. Session - query text and timestamp 2. Retrieval - edges retrieved from subgraph 3. Selection - selected edges with LLM reasoning (JSONL with id + reasoning) 4. Answer - reference to synthesized response Events stream via explain_callback during query(), enabling real-time UX. - Answers stored in librarian service (not inline in graph - too large) - Document ID as URN: urn:trustgraph:answer:{session_id} - Graph stores tg:document reference (IRI) to librarian document - Added librarian producer/consumer to graph-rag service - get_labelgraph() now returns (labeled_edges, uri_map) - uri_map maps edge_id(label_s, label_p, label_o) → (uri_s, uri_p, uri_o) - Explainability data stores original URIs, not labels - Enables tracing edges back to reifying statements via tg:reifies - Added serialize_triple() to query service (matches storage format) - get_term_value() now handles TRIPLE type terms - Enables querying by quoted triple in object position: ?stmt tg:reifies <<s p o>> - Displays real-time explainability events during query - Resolves rdfs:label for edge components (s, p, o) - Traces source chain via prov:wasDerivedFrom to root document - Output: "Source: Chunk 1 → Page 2 → Document Title" - Label caching to avoid repeated queries GraphRagResponse: - explain_id: str \| None - explain_collection: str \| None - message_type: str ("chunk" or "explain") - end_of_session: bool trustgraph-base/trustgraph/provenance/: - namespaces.py - Added TG_DOCUMENT predicate - triples.py - answer_triples() supports document_id reference - uris.py - Added edge_selection_uri() trustgraph-base/trustgraph/schema/services/retrieval.py: - GraphRagResponse with explain_id, explain_collection, end_of_session trustgraph-flow/trustgraph/retrieval/graph_rag/: - graph_rag.py - URI preservation, streaming answer accumulation - rag.py - Librarian integration, real-time explain emission trustgraph-flow/trustgraph/query/triples/cassandra/service.py: - Quoted triple serialization for query matching trustgraph-cli/trustgraph/cli/invoke_graph_rag.py: - Full explainability display with label resolution and source tracing	2026-03-10 10:00:01 +00:00
cybermaggedon	d2d71f859d	Feature/streaming triples (#676 ) * Steaming triples * Also GraphRAG service uses this * Updated tests	2026-03-09 15:46:33 +00:00
cybermaggedon	3c3e11bef5	Fix/librarian broken (#674 ) * Set end-of-stream cleanly - clean streaming message structures * Add tg-get-document-content	2026-03-09 13:36:24 +00:00
cybermaggedon	df1808768d	Fix/doc streaming proto (#673 ) * Librarian streaming doc download * Document stream download endpoint	2026-03-09 12:36:10 +00:00
cybermaggedon	b2ef7bbb8c	Fix doc embeddings invocation (#672 ) * Fix doc embeddings invocation * Tidy query embeddings invocation	2026-03-09 11:07:32 +00:00
cybermaggedon	f2ae0e8623	Embeddings API scores (#671 ) - Put scores in all responses - Remove unused 'middle' vector layer. Vector of texts -> vector of (vector embedding)	2026-03-09 10:53:44 +00:00
cybermaggedon	919b760c05	Update embeddings integration for new batch embeddings interfaces (#669 ) * Fix vector extraction * Fix embeddings integration	2026-03-08 19:41:52 +00:00
cybermaggedon	0a2ce47a88	Batch embeddings (#668 ) Base Service (trustgraph-base/trustgraph/base/embeddings_service.py): - Changed on_request to use request.texts FastEmbed Processor (trustgraph-flow/trustgraph/embeddings/fastembed/processor.py): - on_embeddings(texts, model=None) now processes full batch efficiently - Returns [[v.tolist()] for v in vecs] - list of vector sets Ollama Processor (trustgraph-flow/trustgraph/embeddings/ollama/processor.py): - on_embeddings(texts, model=None) passes list directly to Ollama - Returns [[embedding] for embedding in embeds.embeddings] EmbeddingsClient (trustgraph-base/trustgraph/base/embeddings_client.py): - embed(texts, timeout=300) accepts list of texts Tests Updated: - test_fastembed_dynamic_model.py - 4 tests updated for new interface - test_ollama_dynamic_model.py - 4 tests updated for new interface Updated CLI, SDK and APIs	2026-03-08 18:36:54 +00:00
cybermaggedon	24bbe94136	Document chunks not stored in vector store (#665 ) - Schema - ChunkEmbeddings now uses chunk_id: str instead of chunk: bytes - Schema - DocumentEmbeddingsResponse now returns chunk_ids: list[str] instead of chunks - Translators - Updated to serialize/deserialize chunk_id - Clients - DocumentEmbeddingsClient.query() returns chunk_ids - SDK/API - flow.py, socket_client.py, bulk_client.py updated - Document embeddings service - Stores chunk_id (document ID) instead of chunk text - Storage writers - Qdrant, Milvus, Pinecone store chunk_id in payload - Query services - Return chunk_id from vector store searches - Gateway dispatchers - Serialize chunk_id in API responses - Document RAG - Added librarian client to fetch chunk content from Garage using chunk_ids - CLI tools - Updated all three tools: - invoke_document_embeddings.py - displays chunk_ids, removed max_chunk_length - save_doc_embeds.py - exports chunk_id - load_doc_embeds.py - imports chunk_id	2026-03-07 23:10:45 +00:00
cybermaggedon	2b9232917c	Fix/extraction prov (#662 ) Quoted triple fixes, including... 1. Updated triple_provenance_triples() in triples.py: - Now accepts a Triple object directly - Creates the reification triple using TRIPLE term type: stmt_uri tg:reifies <<extracted_triple>> - Includes it in the returned provenance triples 2. Updated definitions extractor: - Added imports for provenance functions and component version - Added ParameterSpec for optional llm-model and ontology flow parameters - For each definition triple, generates provenance with reification 3. Updated relationships extractor: - Same changes as definitions extractor	2026-03-06 12:23:58 +00:00
cybermaggedon	cd5580be59	Extract-time provenance (#661 ) 1. Shared Provenance Module - URI generators, namespace constants, triple builders, vocabulary bootstrap 2. Librarian - Emits document metadata to graph on processing initiation (vocabulary bootstrap + PROV-O triples) 3. PDF Extractor - Saves pages as child documents, emits parent-child provenance edges, forwards page IDs 4. Chunker - Saves chunks as child documents, emits provenance edges, forwards chunk ID + content 5. Knowledge Extractors (both definitions and relationships): - Link entities to chunks via SUBJECT_OF (not top-level document) - Removed duplicate metadata emission (now handled by librarian) - Get chunk_doc_id and chunk_uri from incoming Chunk message 6. Embedding Provenance: - EntityContext schema has chunk_id field - EntityEmbeddings schema has chunk_id field - Definitions extractor sets chunk_id when creating EntityContext - Graph embeddings processor passes chunk_id through to EntityEmbeddings Provenance Flow: Document → Page (PDF) → Chunk → Extracted Facts/Embeddings ↓ ↓ ↓ ↓ librarian librarian librarian (chunk_id reference) + graph + graph + graph Each artifact is stored in librarian with parent-child linking, and PROV-O edges are emitted to the knowledge graph for full traceability from any extracted fact back to its source document. Also, updating tests	2026-03-05 18:36:10 +00:00
cybermaggedon	a630e143ef	Incremental / large document loading (#659 ) Tech spec BlobStore (trustgraph-flow/trustgraph/librarian/blob_store.py): - get_stream() - yields document content in chunks for streaming retrieval - create_multipart_upload() - initializes S3 multipart upload, returns upload_id - upload_part() - uploads a single part, returns etag - complete_multipart_upload() - finalizes upload with part etags - abort_multipart_upload() - cancels and cleans up Cassandra schema (trustgraph-flow/trustgraph/tables/library.py): - New upload_session table with 24-hour TTL - Index on user for listing sessions - Prepared statements for all operations - Methods: create_upload_session(), get_upload_session(), update_upload_session_chunk(), delete_upload_session(), list_upload_sessions() - Schema extended with UploadSession, UploadProgress, and new request/response fields - Librarian methods: begin_upload, upload_chunk, complete_upload, abort_upload, get_upload_status, list_uploads - Service routing for all new operations - Python SDK with transparent chunked upload: - add_document() auto-switches to chunked for files > 10MB - Progress callback support (on_progress) - get_pending_uploads(), get_upload_status(), abort_upload(), resume_upload() - Document table: Added parent_id and document_type columns with index - Document schema (knowledge/document.py): Added document_id field for streaming retrieval - Librarian operations: - add-child-document for extracted PDF pages - list-children to get child documents - stream-document for chunked content retrieval - Cascade delete removes children when parent is deleted - list-documents filters children by default - PDF decoder (decoding/pdf/pdf_decoder.py): Updated to stream large documents from librarian API to temp file - Librarian service (librarian/service.py): Sends document_id instead of content for large PDFs (>2MB) - Deprecated tools (load_pdf.py, load_text.py): Added deprecation warnings directing users to tg-add-library-document + tg-start-library-processing Remove load_pdf and load_text utils Move chunker/librarian comms to base class Updating tests	2026-03-04 16:57:58 +00:00
cybermaggedon	a38ca9474f	Tool services - dynamically pluggable tool implementations for agent frameworks (#658 ) * New schema * Tool service implementation * Base class * Joke service, for testing * Update unit tests for tool services	2026-03-04 14:51:32 +00:00
cybermaggedon	6d8da748d7	Fix mismatching ge-query / graph-embeddings-query service idents (#648 )	2026-02-24 12:17:29 +00:00
cybermaggedon	4bbc6d844f	Row embeddings APIs exposed (#646 ) * Added row embeddings API and CLI support * Updated protocol specs * Row embeddings agent tool * Add new agent tool to CLI	2026-02-23 21:52:56 +00:00
cybermaggedon	1809c1f56d	Structured data 2 (#645 ) * Structured data refactor - multi-index tables, remove need for manual mods to the Cassandra tables * Tech spec updated to track implementation	2026-02-23 15:56:29 +00:00
cybermaggedon	5ffad92345	Fix subscriber unexpected message causing queue clogging (#642 ) queue clogging.	2026-02-23 14:34:05 +00:00
cybermaggedon	b2e768c309	Fixing Uri import error (#636 )	2026-02-16 19:18:40 +00:00
cybermaggedon	6bf08c3ace	Feature/more cli diags (#624 ) * CLI tools for tg-invoke-graph-embeddings, tg-invoke-document-embeddings, and tg-invoke-embeddings. Just useful for diagnostics. * Fix tg-load-knowledge	2026-02-04 14:10:30 +00:00
cybermaggedon	cf0daedefa	Changed schema for Value -> Term, majorly breaking change (#622 ) * Changed schema for Value -> Term, majorly breaking change * Following the schema change, Value -> Term into all processing * Updated Cassandra for g, p, s, o index patterns (7 indexes) * Reviewed and updated all tests * Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down	2026-01-27 13:48:08 +00:00
cybermaggedon	1c006d5b14	Python API docs (#614 ) * Python API docs working * Python API doc generation	2026-01-15 15:12:32 +00:00
cybermaggedon	b08db761d7	Fix config inconsistency (#609 ) * Plural/singular confusion in config key * Flow class vs flow blueprint nomenclature change * Update docs & CLI to reflect the above	2026-01-14 12:31:40 +00:00
cybermaggedon	99f17d1b9d	Fix non-streaming (2) (#608 )	2026-01-12 21:21:51 +00:00
cybermaggedon	807f6cc4e2	Fix non streaming RAG problems (#607 ) * Fix non-streaming failure in RAG services * Fix non-streaming failure in API * Fix agent non-streaming messaging * Agent messaging unit & contract tests	2026-01-12 18:45:52 +00:00
cybermaggedon	f79d0603f7	Update to add streaming tests (#600 )	2026-01-06 21:48:05 +00:00
cybermaggedon	f0c95a4c5e	Fix streaming API niggles (#599 ) * Fix end-of-stream anomally with some graph-rag and document-rag * Fix gateway translators dropping responses	2026-01-06 16:41:35 +00:00
cybermaggedon	3c675b8cfc	Fix doc embedding schema messages (#598 )	2026-01-05 17:46:08 +00:00
cybermaggedon	fe2dd704a2	Fix optionality in objects-query schema (#596 )	2026-01-05 15:40:53 +00:00
cybermaggedon	ae13190093	Address legacy issues in storage management (#595 ) * Removed legacy storage management cruft. Tidied tech specs. * Fix deletion of last collection * Storage processor ignores data on the queue which is for a deleted collection * Updated tests	2026-01-05 13:45:14 +00:00
cybermaggedon	5304f96fe6	Fix tests (#593 ) * Fix unit/integration/contract tests which were broken by messaging fabric work	2025-12-19 08:53:21 +00:00
cybermaggedon	34eb083836	Messaging fabric plugins (#592 ) * Plugin architecture for messaging fabric * Schemas use a technology neutral expression * Schemas strictness has uncovered some incorrect schema use which is fixed	2025-12-17 21:40:43 +00:00
cybermaggedon	727b6bc9d6	Add service ID to log entry instead of module name (#588 )	2025-12-10 11:07:43 +00:00
cybermaggedon	f12fcc2652	Loki logging (#586 ) * Consolidate logging into a single module * Added Loki logging * Update tech spec * Add processor label * Fix recursive log entries, logging Loki"s internals	2025-12-09 23:24:41 +00:00
cybermaggedon	39f6a8b940	Fix/queue configurations (#585 ) * Fix config-svc startup dupe CLI args * Fix missing params on collection service * Fix collection management handling	2025-12-06 14:54:47 +00:00
cybermaggedon	7d07f802a8	Basic multitenant support (#583 ) * Tech spec * Address multi-tenant queue option problems in CLI * Modified collection service to use config * Changed storage management to use the config service definition	2025-12-05 21:45:30 +00:00
cybermaggedon	664bce6182	Fix Python streaming SDK issues (#580 ) * Fix verify CLI issues * Fixing content mechanisms in API * Fixing error handling * Fixing invoke_prompt, invoke_llm, invoke_agent	2025-12-04 20:42:25 +00:00
cybermaggedon	01aeede78b	Python API implements streaming interfaces (#577 ) * Tech spec * Python CLI utilities updated to use the API including streaming features * Added type safety to Python API * Completed missing auth token support in CLI	2025-12-04 17:38:57 +00:00
cybermaggedon	1948edaa50	Streaming rag responses (#568 ) * Tech spec for streaming RAG * Support for streaming Graph/Doc RAG	2025-11-26 19:47:39 +00:00
cybermaggedon	b1cc724f7d	Streaming LLM part 2 (#567 ) * Updates for agent API with streaming support * Added tg-dump-queues tool to dump Pulsar queues to a log * Updated tg-invoke-agent, incremental output * Queue dumper CLI - might be useful for debug * Updating for tests	2025-11-26 15:16:17 +00:00
cybermaggedon	310a2deb06	Feature/streaming llm phase 1 (#566 ) * Tidy up duplicate tech specs in doc directory * Streaming LLM text-completion service tech spec. * text-completion and prompt interfaces * streaming change applied to all LLMs, so far tested with VertexAI * Skip Pinecone unit tests, upstream module issue is affecting things, tests are passing again * Added agent streaming, not working and has broken tests	2025-11-26 09:59:10 +00:00
cybermaggedon	c69f5207a4	OntoRAG: Ontology-Based Knowledge Extraction and Query Technical Specification (#523 ) * Onto-rag tech spec * New processor kg-extract-ontology, use 'ontology' objects from config to guide triple extraction * Also entity contexts * Integrate with ontology extractor from workbench This is first phase, the extraction is tested and working, also GraphRAG with the extracted knowledge works	2025-11-12 20:38:08 +00:00
cybermaggedon	d9d4c91363	Dynamic embeddings model (#556 ) * Dynamic embeddings model selection * Added tests * HF embeddings are skipped, tests don't run with that package currently tests	2025-11-10 20:38:01 +00:00
cybermaggedon	d1456e547c	Fix label issue in metrics (#540 )	2025-09-26 14:13:22 +01:00
cybermaggedon	10806caac2	Fix label names (#539 )	2025-09-26 12:34:40 +01:00
cybermaggedon	8929a680a1	Chunking dynamic params (#536 ) * Chunking params are dynamic * Update tests	2025-09-26 10:53:32 +01:00
cybermaggedon	6f4f7ce6b4	Flow temperature parameter (#533 ) * Add temperature parameter to LlmService and roll out to all LLMs	2025-09-25 21:26:11 +01:00
cybermaggedon	aa8e422e8c	Flow configurable parameters (#532 ) * Fix pyproject.toml missing requests dep * parameters is now parameter-types * Update flow parameters tech spec for recent changes (no impact on this repo)	2025-09-25 19:11:40 +01:00
cybermaggedon	7a3bfad826	LLM dynamic settings, using the llm-model and llm-rag-model paramters to a flow (#531 ) * Ported LLMs to dynamic models	2025-09-24 16:36:25 +01:00

1 2 3

139 commits