* CLI auth migration, document embeddings core lifecycle (#913)
Migrate get_kg_core and put_kg_core CLI tools to use Api/SocketClient
with first-frame auth (fixes broken raw websocket path). Fix wire
format field names (root/vector). Remove ~600 lines of dead raw
websocket code from invoke_graph_rag.py.
Add document embeddings core lifecycle to the knowledge service:
list/get/put/delete/load operations across schema, translator,
Cassandra table store, knowledge manager, gateway registry, REST API,
socket client, and CLI (tg-get-de-core, tg-put-de-core).
Fix delete_kg_core to also clean up document embeddings rows.
* Remove spurious workspace parameter from SPARQL algebra evaluator (#915)
Fix threading of workspace paramater:
- The SPARQL algebra evaluator was threading a workspace parameter
through every function and passing it to TriplesClient.query(),
which doesn't accept it. Workspace isolation is handled by pub/sub
topic routing — the TriplesClient is already scoped to a
workspace-specific flow, same as GraphRAG. Passing workspace
explicitly was both incorrect and unnecessary.
Update tests:
- tests/unit/test_query/test_sparql_algebra.py (new) — Tests
_query_pattern, _eval_bgp, and evaluate() with various algebra
nodes. Key tests assert workspace is never in tc.query() kwargs,
plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and
edge cases.
- tests/unit/test_retrieval/test_graph_rag.py — Added
test_triples_query_never_passes_workspace (checks query()) and
test_follow_edges_never_passes_workspace (checks query_stream()).
* Make all Cassandra and Qdrant I/O async-safe with proper concurrency controls (#916)
Cassandra triples services were using syncronous EntityCentricKnowledgeGraph
methods from async contexts, and connection state was managed with
threading.local which is wrong for asyncio coroutines sharing a single
thread. Qdrant services had no async wrapping at all, blocking the event
loop on every network call. Rows services had unprotected shared state
mutations across concurrent coroutines.
- Add async methods to EntityCentricKnowledgeGraph (async_insert,
async_get_s/p/o/sp/po/os/spo/all, async_collection_exists,
async_create_collection, async_delete_collection) using the existing
cassandra_async.async_execute bridge
- Rewrite triples write + query services: replace threading.local with
asyncio.Lock + dict cache for per-workspace connections, use async
ECKG methods for all data operations, keep asyncio.to_thread only for
one-time blocking ECKG construction
- Wrap all Qdrant calls in asyncio.to_thread across all 6 services
(doc/graph/row embeddings write + query), add asyncio.Lock + set cache
for collection existence checks
- Add asyncio.Lock to rows write + query services to protect shared
state (schemas, sessions, config caches) from concurrent mutation
- Update all affected tests to match new async patterns
* Fixed error only returning a page of results (#921)
The root cause: async_execute only materialises the first result
page (by design — it says so in its docstring). The streaming query
set fetch_size=20 and expected to iterate all results, but only got
the first 20 rows back.
The fix uses
asyncio.to_thread(lambda: list(tg.session.execute(...)))
which lets the sync driver iterate
all pages in a worker thread — exactly what the pre-async code did.
* Optional test warning suppression (#923)
* Fix test collection module errors & silence upstream Pytest warnings (#823)
* chore: add virtual environment and .env directories to gitignore
* test: filter upstream DeprecationWarning and UserWarning messages
* fix(namespace): remove empty __init__.py files to fix PEP 420 implicit namespace routing for trustgraph sub-packages
* Revert __init__.py deletions
* Add .ini changes but commented out, will be useful at times
---------
Co-authored-by: Salil M <d2kyt@protonmail.com>
* fix(openai): fail fast on unrecoverable RateLimitError codes (#901) (#904) (#925)
Co-authored-by: Sahil Yadav <sahilyadav.sy2004@gmail.com>
* Ensure retry exception is properly raised (#926)
* fix: library API get/update document round-trip bugs (#893) (#928)
Fix 5 cascading bugs in the Library API wrapper that prevented
the get_documents → update_document round-trip from working:
- Tolerate missing title field in document metadata (use .get())
- Use attribute access on Triple objects instead of subscript
- Serialize datetime to int seconds for JSON compatibility
- Handle empty server response on successful update
- Send both id and document-id keys in update request
Added library API tests
* Fix ontology selector defaults, add bypass mode, enforce domain/range (#929)
- Align similarity_threshold default to 0.3 everywhere (class signature
had stale 0.7). Fix matching contradiction in tech-spec.
- Add bypass_selector_below parameter (default 5) to skip vector
similarity selection when ontology element count is small enough.
- Enforce domain/range constraints in TripleConverter for object
properties and datatype properties, with subclass hierarchy support.
Properties with no declared domain/range pass through unchanged.
- Add unit tests for domain/range validation, subclass acceptance,
polymorphic pass-through, and selector bypass.
Fixes#908, #920
* Close producers on flow stop to prevent stale non-persistent topics (#930)
Flow.stop() only stopped consumers, leaving response producers
connected to non-persistent Pulsar topics. After flow restart, the
orphaned producers held stale broker routing state, causing response
messages to never reach new consumers — manifesting as 120s timeouts
on document-embeddings and similar RPC paths.
Fix: Flow.stop() now explicitly stops all producers. Producer.stop()
closes the underlying Pulsar producer connection rather than just
setting a flag.
Fixes#906
* fix(gateway): propagate --timeout flag to per-service dispatchers (#931)
The api-gateway accepts a --timeout flag (default 600s) but the value
was not propagated into DispatcherManager, which hard-coded
timeout=120 for every per-service dispatcher (graph-rag, document-rag,
text-completion, embeddings, librarian, etc.).
This meant any synchronous request taking more than 120 seconds would
always return a Timeout error at the 120s mark, regardless of the
--timeout value set on the gateway.
Changes:
- Add timeout parameter to DispatcherManager.__init__ (default: 120
for backward compatibility)
- Store self.timeout in DispatcherManager
- Replace both hardcoded timeout=120 with self.timeout in
invoke_global_service and invoke_flow_service
- Pass self.timeout from Api to DispatcherManager in service.py
- Document the timeout parameter in the docstring
Fixes#894
---------
Co-authored-by: Salil M <d2kyt@protonmail.com>
Co-authored-by: Sahil Yadav <sahilyadav.sy2004@gmail.com>
Co-authored-by: Mister Lobster <jlaportebot@gmail.com>
Native CLI i18n: The TrustGraph CLI has built-in translation support
that dynamically loads language strings. You can test and use
different languages by simply passing the --lang flag (e.g., --lang
es for Spanish, --lang ru for Russian) or by configuring your
environment's LANG variable.
Automated Docs Translations: This PR introduces autonomously
translated Markdown documentation into several target languages,
including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew,
Arabic, Simplified Chinese, and Russian.
* Onto-rag tech spec
* New processor kg-extract-ontology, use 'ontology' objects from config to guide triple extraction
* Also entity contexts
* Integrate with ontology extractor from workbench
This is first phase, the extraction is tested and working, also GraphRAG with the extracted knowledge works