trustgraph

Mirror of https://github.com/trustgraph-ai/trustgraph.git, synced 2026-05-17 03:15:14 +02:00

Author	SHA1	Message	Date
cybermaggedon	142dd0231c	release/v2.4 -> master (#924) * CLI auth migration, document embeddings core lifecycle (#913) Migrate get_kg_core and put_kg_core CLI tools to use Api/SocketClient with first-frame auth (fixes broken raw websocket path). Fix wire format field names (root/vector). Remove ~600 lines of dead raw websocket code from invoke_graph_rag.py. Add document embeddings core lifecycle to the knowledge service: list/get/put/delete/load operations across schema, translator, Cassandra table store, knowledge manager, gateway registry, REST API, socket client, and CLI (tg-get-de-core, tg-put-de-core). Fix delete_kg_core to also clean up document embeddings rows. * Remove spurious workspace parameter from SPARQL algebra evaluator (#915) Fix threading of workspace parameter: - The SPARQL algebra evaluator was threading a workspace parameter through every function and passing it to TriplesClient.query(), which doesn't accept it. Workspace isolation is handled by pub/sub topic routing: the TriplesClient is already scoped to a workspace-specific flow, same as GraphRAG. Passing workspace explicitly was both incorrect and unnecessary. Update tests: - tests/unit/test_query/test_sparql_algebra.py (new): tests _query_pattern, _eval_bgp, and evaluate() with various algebra nodes. Key tests assert workspace is never in tc.query() kwargs, plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and edge cases. - tests/unit/test_retrieval/test_graph_rag.py: added test_triples_query_never_passes_workspace (checks query()) and test_follow_edges_never_passes_workspace (checks query_stream()). * Make all Cassandra and Qdrant I/O async-safe with proper concurrency controls (#916) Cassandra triples services were using synchronous EntityCentricKnowledgeGraph methods from async contexts, and connection state was managed with threading.local, which is wrong for asyncio coroutines sharing a single thread. 
Qdrant services had no async wrapping at all, blocking the event loop on every network call. Rows services had unprotected shared state mutations across concurrent coroutines. - Add async methods to EntityCentricKnowledgeGraph (async_insert, async_get_s/p/o/sp/po/os/spo/all, async_collection_exists, async_create_collection, async_delete_collection) using the existing cassandra_async.async_execute bridge - Rewrite triples write + query services: replace threading.local with asyncio.Lock + dict cache for per-workspace connections, use async ECKG methods for all data operations, keep asyncio.to_thread only for one-time blocking ECKG construction - Wrap all Qdrant calls in asyncio.to_thread across all 6 services (doc/graph/row embeddings write + query), add asyncio.Lock + set cache for collection existence checks - Add asyncio.Lock to rows write + query services to protect shared state (schemas, sessions, config caches) from concurrent mutation - Update all affected tests to match new async patterns * Fixed error only returning a page of results (#921) The root cause: async_execute only materialises the first result page (by design; its docstring says so). The streaming query set fetch_size=20 and expected to iterate all results, but only got the first 20 rows back. The fix uses asyncio.to_thread(lambda: list(tg.session.execute(...))), which lets the sync driver iterate all pages in a worker thread, exactly what the pre-async code did. 
* Optional test warning suppression (#923) * Fix test collection module errors & silence upstream Pytest warnings (#823) * chore: add virtual environment and .env directories to gitignore * test: filter upstream DeprecationWarning and UserWarning messages * fix(namespace): remove empty __init__.py files to fix PEP 420 implicit namespace routing for trustgraph sub-packages * Revert __init__.py deletions * Add .ini changes, commented out (will be useful at times) --------- Co-authored-by: Salil M <d2kyt@protonmail.com>	2026-05-15 13:02:51 +01:00
cybermaggedon	89cabee1b4	release/v2.4 -> master (#844)	2026-04-22 15:19:57 +01:00
cybermaggedon	22096e07e2	Fix tests broken by the recent RabbitMQ/Cassandra async fixes (#815) - Fix invalid key in config causing rogue warning - Fix asyncio test tags	2026-04-16 10:00:18 +01:00
cybermaggedon	d9dc4cbab5	SPARQL query service (#754) SPARQL 1.1 query service wrapping pub/sub triples interface Add a backend-agnostic SPARQL query service that parses SPARQL queries using rdflib, decomposes them into triple pattern lookups via the existing TriplesClient pub/sub interface, and performs in-memory joins, filters, and projections. Includes: - SPARQL parser, algebra evaluator, expression evaluator, solution sequence operations (BGP, JOIN, OPTIONAL, UNION, FILTER, BIND, VALUES, GROUP BY, ORDER BY, LIMIT/OFFSET, DISTINCT, aggregates) - FlowProcessor service with TriplesClientSpec - Gateway dispatcher, request/response translators, API spec - Python SDK method (FlowInstance.sparql_query) - CLI command (tg-invoke-sparql-query) - Tech spec (docs/tech-specs/sparql-query.md) New unit tests for SPARQL query	2026-04-02 17:21:39 +01:00
cybermaggedon	57eda65674	Knowledge core processing updated for embeddings interface change (#681) Knowledge core fixed: - trustgraph-flow/trustgraph/tables/knowledge.py - v.vector, v.chunk_id - trustgraph-base/trustgraph/messaging/translators/document_loading.py - chunk.vector - trustgraph-base/trustgraph/messaging/translators/knowledge.py - entity.vector - trustgraph-flow/trustgraph/gateway/dispatch/serialize.py - entity.vector, chunk.vector Test fixtures fixed: - tests/unit/test_storage/conftest.py - All mock entities/chunks use vector - tests/unit/test_query/conftest.py - All mock requests use vector - tests/unit/test_query/test_doc_embeddings_pinecone_query.py - All mock messages use vector These changes align with commit `f2ae0e86` which changed the schema from vectors: list[list[float]] to vector: list[float].	2026-03-10 13:28:16 +00:00
cybermaggedon	7a6197d8c3	GraphRAG Query-Time Explainability (#677) Implements full explainability pipeline for GraphRAG queries, enabling traceability from answers back to source documents. Renamed throughout for clarity: - provenance_callback → explain_callback - provenance_id → explain_id - provenance_collection → explain_collection - message_type "provenance" → "explain" - Queue name "provenance" → "explainability" GraphRAG queries now emit explainability events as they execute: 1. Session - query text and timestamp 2. Retrieval - edges retrieved from subgraph 3. Selection - selected edges with LLM reasoning (JSONL with id + reasoning) 4. Answer - reference to synthesized response Events stream via explain_callback during query(), enabling real-time UX. - Answers stored in librarian service (not inline in graph - too large) - Document ID as URN: urn:trustgraph:answer:{session_id} - Graph stores tg:document reference (IRI) to librarian document - Added librarian producer/consumer to graph-rag service - get_labelgraph() now returns (labeled_edges, uri_map) - uri_map maps edge_id(label_s, label_p, label_o) → (uri_s, uri_p, uri_o) - Explainability data stores original URIs, not labels - Enables tracing edges back to reifying statements via tg:reifies - Added serialize_triple() to query service (matches storage format) - get_term_value() now handles TRIPLE type terms - Enables querying by quoted triple in object position: ?stmt tg:reifies <<s p o>> - Displays real-time explainability events during query - Resolves rdfs:label for edge components (s, p, o) - Traces source chain via prov:wasDerivedFrom to root document - Output: "Source: Chunk 1 → Page 2 → Document Title" - Label caching to avoid repeated queries GraphRagResponse: - explain_id: str | None - explain_collection: str | None - message_type: str ("chunk" or "explain") - end_of_session: bool 
trustgraph-base/trustgraph/provenance/: - namespaces.py - Added TG_DOCUMENT predicate - triples.py - answer_triples() supports document_id reference - uris.py - Added edge_selection_uri() trustgraph-base/trustgraph/schema/services/retrieval.py: - GraphRagResponse with explain_id, explain_collection, end_of_session trustgraph-flow/trustgraph/retrieval/graph_rag/: - graph_rag.py - URI preservation, streaming answer accumulation - rag.py - Librarian integration, real-time explain emission trustgraph-flow/trustgraph/query/triples/cassandra/service.py: - Quoted triple serialization for query matching trustgraph-cli/trustgraph/cli/invoke_graph_rag.py: - Full explainability display with label resolution and source tracing	2026-03-10 10:00:01 +00:00
cybermaggedon	f2ae0e8623	Embeddings API scores (#671) - Put scores in all responses - Remove unused 'middle' vector layer. Vector of texts -> vector of (vector embedding)	2026-03-09 10:53:44 +00:00
cybermaggedon	3bf8a65409	Fix tests (#666)	2026-03-07 23:38:09 +00:00
cybermaggedon	1809c1f56d	Structured data 2 (#645) * Structured data refactor - multi-index tables, remove need for manual mods to the Cassandra tables * Tech spec updated to track implementation	2026-02-23 15:56:29 +00:00
cybermaggedon	00c1ca681b	Entity-centric graph (#633) * Tech spec for new entity-centric graph schema * Graph implementation	2026-02-16 13:26:43 +00:00
cybermaggedon	cf0daedefa	Changed schema for Value -> Term, majorly breaking change (#622) * Changed schema for Value -> Term, majorly breaking change * Following the schema change, Value -> Term into all processing * Updated Cassandra for g, p, s, o index patterns (7 indexes) * Reviewed and updated all tests * Neo4j, Memgraph and FalkorDB remain broken, will revisit once things settle down	2026-01-27 13:48:08 +00:00
cybermaggedon	11f41b07ab	Get neo4j to use limit (#618) * Get neo4j to use limit * Fix tests: they were exact-matching on query strings	2026-01-22 15:16:34 +00:00
cybermaggedon	310a2deb06	Feature/streaming llm phase 1 (#566) * Tidy up duplicate tech specs in doc directory * Streaming LLM text-completion service tech spec. * text-completion and prompt interfaces * streaming change applied to all LLMs, so far tested with VertexAI * Skip Pinecone unit tests, upstream module issue is affecting things, tests are passing again * Added agent streaming, not working and has broken tests	2025-11-26 09:59:10 +00:00
cybermaggedon	6129bb68c1	Fix hard coded vector size (#555) * Fixed hard-coded embeddings store size * Vector store lazy-creates collections, different collections for different dimension lengths. * Added tech spec for vector store lifecycle * Fixed some tests for the new spec	2025-11-10 16:56:51 +00:00
cybermaggedon	fcd15d1833	Collection management part 2 (#522) * Plumb collection manager into librarian * Test end-to-end	2025-09-19 16:08:47 +01:00
cybermaggedon	d378db9370	Cassandra performance enhancement (#521) * Tech spec * Tech spec complete * Cassandra multi-table for performance	2025-09-18 19:52:05 +01:00
cybermaggedon	13ff7d765d	Collection management (#520) * Tech spec * Refactored Cassandra knowledge graph for single table * Collection management, librarian services to manage metadata and collection deletion	2025-09-18 15:57:52 +01:00
cybermaggedon	7f57bc6a0a	Feature/memgraph user collection isolation (#510) * User/collection processing in memgraph * Update tests	2025-09-10 22:11:35 +01:00
cybermaggedon	c694b12e9c	Feature/neo4j user collection isolation (#509) * Tech spec * User/collection separation * Update tests	2025-09-10 22:11:21 +01:00
cybermaggedon	314ce76b81	Feature/fix milvus (#507) - Remove object embeddings, which were broken and not used - Fixed Milvus collection names * Updating tests * Remove unused entrypoint	2025-09-09 21:44:55 +01:00
cybermaggedon	85e669c763	Fixing more Cassandra consistency issues (#488) * Fixing more Cassandra work * Fix tests	2025-09-04 00:58:11 +01:00
cybermaggedon	ccaec88a72	Feature/consolidate cassandra config (#483) * Cassandra consolidation of parameters * New Cassandra configuration helper * Implemented Cassandra config refactor * New tests	2025-09-03 23:41:22 +01:00
cybermaggedon	672e358b2f	Feature/graphql table query (#486) * Tech spec * Object query service for Cassandra * Gateway support for objects-query * GraphQL query utility * Filters, ordering	2025-09-03 23:39:11 +01:00
cybermaggedon	f37decea2b	Increase storage test coverage (#435) * Fixing storage and adding tests * PR pipeline only runs quick tests	2025-07-15 09:33:35 +01:00
cybermaggedon	2f7fddd206	Test suite executed from CI pipeline (#433) * Test strategy & test cases * Unit tests * Integration tests	2025-07-14 14:57:44 +01:00

25 commits
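
The #915 entry removes the workspace argument from `TriplesClient.query()` calls. Its tests assert that `workspace` never appears in that call's kwargs. Below is a sketch of that kind of assertion using `unittest.mock.AsyncMock`. The `query_pattern` helper and the `s`/`p`/`o`/`limit` keyword names are hypothetical and are not TrustGraph's real signature.

```python
import asyncio
from unittest.mock import AsyncMock


async def query_pattern(triples_client, s, p, o, limit=100):
    # Hypothetical helper: workspace is deliberately NOT passed, because the
    # client is assumed to be scoped to a workspace-specific flow already.
    return await triples_client.query(s=s, p=p, o=o, limit=limit)


tc = AsyncMock()            # stand-in for a TriplesClient
tc.query.return_value = []  # awaiting tc.query(...) yields an empty result

asyncio.run(query_pattern(tc, None, "http://example.org/knows", None))

# Inspect the keyword arguments of the most recent query() call.
kwargs = tc.query.call_args.kwargs
assert "workspace" not in kwargs
```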
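
The #916 entry replaces `threading.local` with an `asyncio.Lock` plus a dict cache for per-workspace connections. It keeps `asyncio.to_thread` only for the one-time blocking construction. The sketch below isolates that pattern. `WorkspaceConnectionCache` and the `connect` factory are hypothetical names, not TrustGraph classes. The factory stands in for the blocking EntityCentricKnowledgeGraph construction.

```python
import asyncio


class WorkspaceConnectionCache:
    """Hypothetical per-workspace connection cache (illustration only)."""

    def __init__(self, connect):
        # connect: blocking factory taking a workspace name; it runs in a
        # worker thread so it never blocks the event loop.
        self._connect = connect
        self._lock = asyncio.Lock()  # coroutine-safe, unlike threading.local
        self._conns = {}

    async def get(self, workspace):
        async with self._lock:
            conn = self._conns.get(workspace)
            if conn is None:
                # One-time blocking construction, moved off the event loop.
                conn = await asyncio.to_thread(self._connect, workspace)
                self._conns[workspace] = conn
            return conn


async def main():
    calls = []

    def connect(ws):
        calls.append(ws)
        return f"conn-{ws}"

    cache = WorkspaceConnectionCache(connect)
    # Ten concurrent coroutines ask for the same workspace at once.
    results = await asyncio.gather(*(cache.get("default") for _ in range(10)))
    return results, calls


results, calls = asyncio.run(main())
```

Holding the lock across the `to_thread` call means concurrent requests for one workspace trigger a single construction. It also serialises first-time construction across different workspaces. That cost is acceptable when construction happens once per workspace.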
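
The #921 entry explains that `async_execute` only materialises the first result page, so a query with `fetch_size=20` returned at most 20 rows. The following sketch reproduces the difference with a hypothetical `PagedResult` stand-in. It is loosely modelled on a driver result set whose `current_rows` holds only the page already fetched, while iteration pulls in the remaining pages. It is not the Cassandra driver itself, and `execute` is an invented placeholder.

```python
import asyncio


class PagedResult:
    """Hypothetical stand-in for a paged driver result set."""

    def __init__(self, rows, fetch_size):
        self._pages = [rows[i:i + fetch_size]
                       for i in range(0, len(rows), fetch_size)]
        # Only the first page has been "fetched" up front.
        self.current_rows = self._pages[0] if self._pages else []

    def __iter__(self):
        # Iterating walks every page (blocking I/O in a real driver).
        for page in self._pages:
            yield from page


def execute(query, fetch_size=20):
    # Hypothetical blocking execute that always yields 55 rows.
    return PagedResult(list(range(55)), fetch_size)


async def first_page_only():
    # Mirrors the bug: only the materialised first page comes back.
    return list(execute("SELECT ...").current_rows)


async def all_pages():
    # Mirrors the fix: iterate every page inside a worker thread.
    return await asyncio.to_thread(lambda: list(execute("SELECT ...")))


short = asyncio.run(first_page_only())
full = asyncio.run(all_pages())
```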
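
The SPARQL service (#754) decomposes queries into triple-pattern lookups and joins the results in memory. A minimal sketch of basic graph pattern (BGP) evaluation follows. It assumes a plain in-memory list of triples in place of the TriplesClient pub/sub interface. It uses a nested-loop join that substitutes earlier bindings into each lookup, which may not be the strategy the service actually uses. `query_pattern` and `eval_bgp` here are illustrative and are not the service's `_query_pattern`/`_eval_bgp`.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def query_pattern(triples: List[Triple], s: Optional[str],
                  p: Optional[str], o: Optional[str]) -> List[Triple]:
    # Stand-in for a triple-store lookup: None acts as a wildcard.
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def eval_bgp(triples: List[Triple], patterns: List[Triple]) -> List[Solution]:
    solutions: List[Solution] = [{}]
    for pattern in patterns:
        joined: List[Solution] = []
        for sol in solutions:
            # Substitute variables bound by earlier patterns; unbound -> None.
            s, p, o = (sol.get(t) if is_var(t) else t for t in pattern)
            for match in query_pattern(triples, s, p, o):
                ext = dict(sol)
                # Bind new variables; reject conflicting repeated variables.
                if all(ext.setdefault(t, v) == v
                       for t, v in zip(pattern, match) if is_var(t)):
                    joined.append(ext)
        solutions = joined
    return solutions


triples = [("alice", "knows", "bob"),
           ("bob", "knows", "carol"),
           ("alice", "name", "Alice")]
result = eval_bgp(triples, [("?a", "knows", "?b"), ("?b", "knows", "?c")])
```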
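
#677 changes `get_labelgraph()` to return `(labeled_edges, uri_map)`, where `uri_map` maps `edge_id(label_s, label_p, label_o)` back to `(uri_s, uri_p, uri_o)`. It also stores answers under the URN `urn:trustgraph:answer:{session_id}`. The sketch below illustrates that mapping. The `edge_id` key format, the `build_label_graph` name and the `labels` dictionary are assumptions, since the log does not say how the real function builds its key or resolves labels.

```python
def edge_id(label_s, label_p, label_o):
    # Hypothetical key format; the real edge_id() may hash or format differently.
    return f"{label_s}|{label_p}|{label_o}"


def build_label_graph(edges, labels):
    """Return (labeled_edges, uri_map) for a list of URI triples."""
    labeled_edges = []
    uri_map = {}
    for s, p, o in edges:
        # Fall back to the URI itself when no label is known.
        ls, lp, lo = (labels.get(u, u) for u in (s, p, o))
        labeled_edges.append((ls, lp, lo))
        # Keep the original URIs so explainability data can trace them back.
        uri_map[edge_id(ls, lp, lo)] = (s, p, o)
    return labeled_edges, uri_map


def answer_uri(session_id):
    # URN scheme quoted in the #677 commit message.
    return f"urn:trustgraph:answer:{session_id}"


edges = [("http://example.org/alice", "http://example.org/knows",
          "http://example.org/bob")]
labels = {"http://example.org/alice": "Alice",
          "http://example.org/knows": "knows",
          "http://example.org/bob": "Bob"}
labeled, uri_map = build_label_graph(edges, labels)
```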
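
#555 replaces a hard-coded embeddings size with lazily created collections, one per vector dimension. A minimal in-memory sketch of that idea follows. The `LazyVectorStore` class and the `{collection}_{dim}` naming scheme are hypothetical. They only show that the dimension comes from the vector being written.

```python
class LazyVectorStore:
    """Hypothetical in-memory stand-in for a vector store client."""

    def __init__(self):
        self.collections = {}  # name -> list of (vector, payload)

    @staticmethod
    def collection_name(collection, dim):
        # Hypothetical scheme: the dimension is part of the name, so
        # embeddings of different lengths never share a collection.
        return f"{collection}_{dim}"

    def store(self, collection, vector, payload):
        name = self.collection_name(collection, len(vector))
        if name not in self.collections:
            # Lazy creation on first write, sized from the actual vector
            # rather than from a hard-coded dimension.
            self.collections[name] = []
        self.collections[name].append((list(vector), payload))
        return name


store = LazyVectorStore()
a = store.store("docs", [0.1] * 384, "chunk-1")
b = store.store("docs", [0.2] * 1536, "chunk-2")
c = store.store("docs", [0.3] * 384, "chunk-3")
```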