* CLI auth migration, document embeddings core lifecycle (#913)
Migrate get_kg_core and put_kg_core CLI tools to use Api/SocketClient
with first-frame auth (fixes broken raw websocket path). Fix wire
format field names (root/vector). Remove ~600 lines of dead raw
websocket code from invoke_graph_rag.py.
Add document embeddings core lifecycle to the knowledge service:
list/get/put/delete/load operations across schema, translator,
Cassandra table store, knowledge manager, gateway registry, REST API,
socket client, and CLI (tg-get-de-core, tg-put-de-core).
Fix delete_kg_core to also clean up document embeddings rows.
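The delete clean-up described above can be sketched as follows; the dict-backed store and method names are illustrative stand-ins for the real Cassandra-backed manager, not its API:

```python
class CoreStore:
    """Sketch of the delete behaviour: removing a knowledge core also
    removes its document-embeddings rows. Dict-backed stand-in."""

    def __init__(self):
        self.kg_rows = {}    # core id -> knowledge-graph rows
        self.de_rows = {}    # core id -> document-embeddings rows

    def put(self, core_id, kg, de):
        self.kg_rows[core_id] = kg
        self.de_rows[core_id] = de

    def delete_kg_core(self, core_id):
        # clean up both tables, not just the knowledge-graph rows
        self.kg_rows.pop(core_id, None)
        self.de_rows.pop(core_id, None)
```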
* Remove spurious workspace parameter from SPARQL algebra evaluator (#915)
Fix threading of workspace parameter:
- The SPARQL algebra evaluator was threading a workspace parameter
through every function and passing it to TriplesClient.query(),
which doesn't accept it. Workspace isolation is handled by pub/sub
topic routing — the TriplesClient is already scoped to a
workspace-specific flow, same as GraphRAG. Passing workspace
explicitly was both incorrect and unnecessary.
Update tests:
- tests/unit/test_query/test_sparql_algebra.py (new) — Tests
_query_pattern, _eval_bgp, and evaluate() with various algebra
nodes. Key tests assert workspace is never in tc.query() kwargs,
plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and
edge cases.
- tests/unit/test_retrieval/test_graph_rag.py — Added
test_triples_query_never_passes_workspace (checks query()) and
test_follow_edges_never_passes_workspace (checks query_stream()).
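The assertion pattern those tests use can be sketched with a mock client; run_query and its argument names are illustrative, only the "no workspace kwarg" check reflects the description above:

```python
import asyncio
from unittest.mock import AsyncMock

async def run_query(tc, s, p, o):
    # the evaluator queries the TriplesClient without a workspace kwarg;
    # isolation is handled by the flow the client is already scoped to
    return await tc.query(s=s, p=p, o=o)

async def check():
    tc = AsyncMock()
    tc.query.return_value = []
    await run_query(tc, "s1", "p1", None)
    _, kwargs = tc.query.call_args
    assert "workspace" not in kwargs

asyncio.run(check())
```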
* Make all Cassandra and Qdrant I/O async-safe with proper concurrency controls (#916)
Cassandra triples services were using synchronous EntityCentricKnowledgeGraph
methods from async contexts, and connection state was managed with
threading.local which is wrong for asyncio coroutines sharing a single
thread. Qdrant services had no async wrapping at all, blocking the event
loop on every network call. Rows services had unprotected shared state
mutations across concurrent coroutines.
- Add async methods to EntityCentricKnowledgeGraph (async_insert,
async_get_s/p/o/sp/po/os/spo/all, async_collection_exists,
async_create_collection, async_delete_collection) using the existing
cassandra_async.async_execute bridge
- Rewrite triples write + query services: replace threading.local with
asyncio.Lock + dict cache for per-workspace connections, use async
ECKG methods for all data operations, keep asyncio.to_thread only for
one-time blocking ECKG construction
- Wrap all Qdrant calls in asyncio.to_thread across all 6 services
(doc/graph/row embeddings write + query), add asyncio.Lock + set cache
for collection existence checks
- Add asyncio.Lock to rows write + query services to protect shared
state (schemas, sessions, config caches) from concurrent mutation
- Update all affected tests to match new async patterns
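The per-workspace connection caching described in the second bullet can be sketched like this; ConnectionCache and its names are hypothetical, only the asyncio.Lock + dict + to_thread pattern comes from the description above:

```python
import asyncio

class ConnectionCache:
    """Sketch: asyncio.Lock guards a dict of per-workspace connections;
    one-time blocking construction is pushed to a worker thread."""

    def __init__(self, factory):
        self._factory = factory        # blocking constructor
        self._lock = asyncio.Lock()
        self._cache = {}               # workspace -> connection

    async def get(self, workspace):
        async with self._lock:
            if workspace not in self._cache:
                # blocking ECKG-style construction runs off the loop
                self._cache[workspace] = await asyncio.to_thread(
                    self._factory, workspace)
            return self._cache[workspace]
```

Unlike threading.local, every coroutine on the same thread sees the same cache, and the lock prevents two coroutines constructing the same connection concurrently.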
* Fixed error where only a single page of results was returned (#921)
The root cause: async_execute only materialises the first result
page (by design — it says so in its docstring). The streaming query
set fetch_size=20 and expected to iterate all results, but only got
the first 20 rows back.
The fix uses asyncio.to_thread(lambda: list(tg.session.execute(...))),
which lets the sync driver iterate all pages in a worker thread,
exactly what the pre-async code did.
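A minimal sketch of that pattern, assuming a session object with a sync execute() returning an iterable ResultSet:

```python
import asyncio

async def fetch_all_rows(session, stmt):
    # Iterating the sync driver's ResultSet fetches every page
    # transparently; doing it inside to_thread keeps the event
    # loop unblocked while the worker thread pages through.
    return await asyncio.to_thread(lambda: list(session.execute(stmt)))
```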
* Optional test warning suppression (#923)
* Fix test collection module errors & silence upstream Pytest warnings (#823)
* chore: add virtual environment and .env directories to gitignore
* test: filter upstream DeprecationWarning and UserWarning messages
* fix(namespace): remove empty __init__.py files to fix PEP 420 implicit namespace routing for trustgraph sub-packages
* Revert __init__.py deletions
* Add .ini changes, commented out; will be useful at times
---------
Co-authored-by: Salil M <d2kyt@protonmail.com>
* SPARQL 1.1 query service wrapping pub/sub triples interface
Add a backend-agnostic SPARQL query service that parses SPARQL
queries using rdflib, decomposes them into triple pattern lookups
via the existing TriplesClient pub/sub interface, and performs
in-memory joins, filters, and projections.
Includes:
- SPARQL parser, algebra evaluator, expression evaluator, solution
sequence operations (BGP, JOIN, OPTIONAL, UNION, FILTER, BIND,
VALUES, GROUP BY, ORDER BY, LIMIT/OFFSET, DISTINCT, aggregates)
- FlowProcessor service with TriplesClientSpec
- Gateway dispatcher, request/response translators, API spec
- Python SDK method (FlowInstance.sparql_query)
- CLI command (tg-invoke-sparql-query)
- Tech spec (docs/tech-specs/sparql-query.md)
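The in-memory join over solution sequences can be sketched as follows; bindings are plain dicts from variable to term, a simplified stand-in for the evaluator's solution-sequence operations:

```python
def compatible(mu1, mu2):
    # two solution mappings are compatible when every shared
    # variable is bound to the same term in both
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())

def join(left, right):
    # SPARQL algebra Join: merge every compatible pair of mappings
    return [{**a, **b} for a in left for b in right if compatible(a, b)]
```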
New unit tests for SPARQL query
Knowledge core field names fixed:
- trustgraph-flow/trustgraph/tables/knowledge.py - v.vector, v.chunk_id
- trustgraph-base/trustgraph/messaging/translators/document_loading.py -
chunk.vector
- trustgraph-base/trustgraph/messaging/translators/knowledge.py -
entity.vector
- trustgraph-flow/trustgraph/gateway/dispatch/serialize.py - entity.vector,
chunk.vector
Test fixtures fixed:
- tests/unit/test_storage/conftest.py - All mock entities/chunks use vector
- tests/unit/test_query/conftest.py - All mock requests use vector
- tests/unit/test_query/test_doc_embeddings_pinecone_query.py - All mock
messages use vector
These changes align with commit f2ae0e86 which changed the schema from
vectors: list[list[float]] to vector: list[float].
* Changed schema from Value -> Term, a major breaking change
* Following the schema change, applied Value -> Term throughout all processing
* Updated Cassandra for g, p, s, o index patterns (7 indexes)
* Reviewed and updated all tests
* Neo4j, Memgraph and FalkorDB remain broken; will look at them once things settle down
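The index patterns above map naturally onto a dispatch by which triple positions are bound; a sketch, where the table follows the async_get_s/p/o/sp/po/os/spo/all naming mentioned earlier and the function itself is illustrative:

```python
# Which of the seven index tables to query, keyed by whether the
# subject, predicate, and object positions are bound.
INDEXES = {
    (True,  False, False): "s",
    (False, True,  False): "p",
    (False, False, True):  "o",
    (True,  True,  False): "sp",
    (False, True,  True):  "po",
    (True,  False, True):  "os",
    (True,  True,  True):  "spo",
}

def choose_index(s, p, o):
    key = (s is not None, p is not None, o is not None)
    return INDEXES.get(key, "all")   # nothing bound: full scan
```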
* Tidy up duplicate tech specs in doc directory
* Streaming LLM text-completion service tech spec.
* text-completion and prompt interfaces
* streaming change applied to all LLMs, so far tested with VertexAI
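The streaming shape can be sketched as an async generator that yields tokens as they arrive; purely illustrative, not the actual text-completion interface:

```python
import asyncio

async def stream_completion(tokens):
    # illustrative backend: yield each token as it arrives instead of
    # buffering the whole completion into one final response
    for tok in tokens:
        await asyncio.sleep(0)      # stand-in for network latency
        yield tok

async def collect(tokens):
    # consumer side: tokens can be forwarded (or here, joined) as
    # soon as they appear, rather than after the full response
    return "".join([t async for t in stream_completion(tokens)])
```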
* Skip Pinecone unit tests while an upstream module issue was affecting them; tests are passing again
* Added agent streaming; not working yet and has broken tests
* Fixed hard-coded embeddings store size
* Vector store lazy-creates collections, different collections for
different dimension lengths.
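A minimal sketch of the lazy, per-dimension collection scheme; the naming and dict-backed store are illustrative, not the real vector-store code:

```python
class VectorStore:
    """Sketch: collections are created on first write and keyed by
    embedding dimension, so vectors of different lengths never mix."""

    def __init__(self):
        self.collections = {}

    def insert(self, vector):
        name = f"vec-{len(vector)}"        # illustrative naming scheme
        if name not in self.collections:   # lazy create on first write
            self.collections[name] = []
        self.collections[name].append(vector)
```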
* Added tech spec for vector store lifecycle
* Fixed some tests for the new spec