release/v2.4 -> master (#932)

* CLI auth migration, document embeddings core lifecycle (#913) Migrate get_kg_core and put_kg_core CLI tools to use Api/SocketClient with first-frame auth (fixes broken raw websocket path). Fix wire format field names (root/vector). Remove ~600 lines of dead raw websocket code from invoke_graph_rag.py. Add document embeddings core lifecycle to the knowledge service: list/get/put/delete/load operations across schema, translator, Cassandra table store, knowledge manager, gateway registry, REST API, socket client, and CLI (tg-get-de-core, tg-put-de-core). Fix delete_kg_core to also clean up document embeddings rows. * Remove spurious workspace parameter from SPARQL algebra evaluator (#915) Fix threading of workspace paramater: - The SPARQL algebra evaluator was threading a workspace parameter through every function and passing it to TriplesClient.query(), which doesn't accept it. Workspace isolation is handled by pub/sub topic routing — the TriplesClient is already scoped to a workspace-specific flow, same as GraphRAG. Passing workspace explicitly was both incorrect and unnecessary. Update tests: - tests/unit/test_query/test_sparql_algebra.py (new) — Tests _query_pattern, _eval_bgp, and evaluate() with various algebra nodes. Key tests assert workspace is never in tc.query() kwargs, plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and edge cases. - tests/unit/test_retrieval/test_graph_rag.py — Added test_triples_query_never_passes_workspace (checks query()) and test_follow_edges_never_passes_workspace (checks query_stream()). * Make all Cassandra and Qdrant I/O async-safe with proper concurrency controls (#916) Cassandra triples services were using syncronous EntityCentricKnowledgeGraph methods from async contexts, and connection state was managed with threading.local which is wrong for asyncio coroutines sharing a single thread. Qdrant services had no async wrapping at all, blocking the event loop on every network call. Rows services had unprotected shared state mutations across concurrent coroutines. - Add async methods to EntityCentricKnowledgeGraph (async_insert, async_get_s/p/o/sp/po/os/spo/all, async_collection_exists, async_create_collection, async_delete_collection) using the existing cassandra_async.async_execute bridge - Rewrite triples write + query services: replace threading.local with asyncio.Lock + dict cache for per-workspace connections, use async ECKG methods for all data operations, keep asyncio.to_thread only for one-time blocking ECKG construction - Wrap all Qdrant calls in asyncio.to_thread across all 6 services (doc/graph/row embeddings write + query), add asyncio.Lock + set cache for collection existence checks - Add asyncio.Lock to rows write + query services to protect shared state (schemas, sessions, config caches) from concurrent mutation - Update all affected tests to match new async patterns * Fixed error only returning a page of results (#921) The root cause: async_execute only materialises the first result page (by design — it says so in its docstring). The streaming query set fetch_size=20 and expected to iterate all results, but only got the first 20 rows back. The fix uses asyncio.to_thread(lambda: list(tg.session.execute(...))) which lets the sync driver iterate all pages in a worker thread — exactly what the pre-async code did. * Optional test warning suppression (#923) * Fix test collection module errors & silence upstream Pytest warnings (#823) * chore: add virtual environment and .env directories to gitignore * test: filter upstream DeprecationWarning and UserWarning messages * fix(namespace): remove empty __init__.py files to fix PEP 420 implicit namespace routing for trustgraph sub-packages * Revert __init__.py deletions * Add .ini changes but commented out, will be useful at times --------- Co-authored-by: Salil M <d2kyt@protonmail.com> * fix(openai): fail fast on unrecoverable RateLimitError codes (#901) (#904) (#925) Co-authored-by: Sahil Yadav <sahilyadav.sy2004@gmail.com> * Ensure retry exception is properly raised (#926) * fix: library API get/update document round-trip bugs (#893) (#928) Fix 5 cascading bugs in the Library API wrapper that prevented the get_documents → update_document round-trip from working: - Tolerate missing title field in document metadata (use .get()) - Use attribute access on Triple objects instead of subscript - Serialize datetime to int seconds for JSON compatibility - Handle empty server response on successful update - Send both id and document-id keys in update request Added library API tests * Fix ontology selector defaults, add bypass mode, enforce domain/range (#929) - Align similarity_threshold default to 0.3 everywhere (class signature had stale 0.7). Fix matching contradiction in tech-spec. - Add bypass_selector_below parameter (default 5) to skip vector similarity selection when ontology element count is small enough. - Enforce domain/range constraints in TripleConverter for object properties and datatype properties, with subclass hierarchy support. Properties with no declared domain/range pass through unchanged. - Add unit tests for domain/range validation, subclass acceptance, polymorphic pass-through, and selector bypass. Fixes #908, #920 * Close producers on flow stop to prevent stale non-persistent topics (#930) Flow.stop() only stopped consumers, leaving response producers connected to non-persistent Pulsar topics. After flow restart, the orphaned producers held stale broker routing state, causing response messages to never reach new consumers — manifesting as 120s timeouts on document-embeddings and similar RPC paths. Fix: Flow.stop() now explicitly stops all producers. Producer.stop() closes the underlying Pulsar producer connection rather than just setting a flag. Fixes #906 * fix(gateway): propagate --timeout flag to per-service dispatchers (#931) The api-gateway accepts a --timeout flag (default 600s) but the value was not propagated into DispatcherManager, which hard-coded timeout=120 for every per-service dispatcher (graph-rag, document-rag, text-completion, embeddings, librarian, etc.). This meant any synchronous request taking more than 120 seconds would always return a Timeout error at the 120s mark, regardless of the --timeout value set on the gateway. Changes: - Add timeout parameter to DispatcherManager.__init__ (default: 120 for backward compatibility) - Store self.timeout in DispatcherManager - Replace both hardcoded timeout=120 with self.timeout in invoke_global_service and invoke_flow_service - Pass self.timeout from Api to DispatcherManager in service.py - Document the timeout parameter in the docstring Fixes #894 --------- Co-authored-by: Salil M <d2kyt@protonmail.com> Co-authored-by: Sahil Yadav <sahilyadav.sy2004@gmail.com> Co-authored-by: Mister Lobster <jlaportebot@gmail.com>
2026-07-03 06:51:00 +02:00 · 2026-05-18 09:46:58 +01:00 · 2026-05-18 09:46:58 +01:00 · 71517e6417
commit 71517e6417
parent 142dd0231c
12 changed files with 849 additions and 29 deletions
--- a/trustgraph-base/trustgraph/api/library.py
+++ b/trustgraph-base/trustgraph/api/library.py
@ -365,7 +365,7 @@ class Library:
                    id = v["id"],
                    time = datetime.datetime.fromtimestamp(v["time"]),
                    kind = v["kind"],
-                    title = v["title"],
+                    title = v.get("title", ""),
                    comments = v.get("comments", ""),
                    metadata = [
                        Triple(
@ -482,14 +482,15 @@ class Library:
            "workspace": self.api.workspace,
            "document-metadata": {
                "document-id": id,
-                "time": metadata.time,
+                "id": id,
+                "time": int(metadata.time.timestamp()) if hasattr(metadata.time, 'timestamp') else metadata.time,
                "title": metadata.title,
                "comments": metadata.comments,
                "metadata": [
                    {
-                        "s": from_value(t["s"]),
-                        "p": from_value(t["p"]),
-                        "o": from_value(t["o"]),
+                        "s": from_value(t.s),
+                        "p": from_value(t.p),
+                        "o": from_value(t.o),
                    }
                    for t in metadata.metadata
                ],
@ -498,14 +499,17 @@ class Library:
        }

        object = self.request(input)
-        doc = object["document-metadata"]
+        doc = object.get("document-metadata") if isinstance(object, dict) else None
+
+        if not doc:
+            return metadata

        try:
-            DocumentMetadata(
+            return DocumentMetadata(
                id = doc["id"],
                time = datetime.datetime.fromtimestamp(doc["time"]),
                kind = doc["kind"],
-                title = doc["title"],
+                title = doc.get("title", ""),
                comments = doc.get("comments", ""),
                metadata = [
                    Triple(
@ -513,10 +517,11 @@ class Library:
                        p = to_value(w["p"]),
                        o = to_value(w["o"])
                    )
-                    for w in doc["metadata"]
+                    for w in doc.get("metadata", [])
                ],
-                workspace = doc.get("workspace", ""),
-                tags = doc["tags"]
+                tags = doc.get("tags", []),
+                parent_id = doc.get("parent-id", ""),
+                document_type = doc.get("document-type", "source"),
            )
        except Exception as e:
            logger.error("Failed to parse document update response", exc_info=True)
--- a/trustgraph-base/trustgraph/base/flow.py
+++ b/trustgraph-base/trustgraph/base/flow.py
@ -34,6 +34,8 @@ class Flow:
    async def stop(self):
        for c in self.consumer.values():
            await c.stop()
+        for p in self.producer.values():
+            await p.stop()
        if self.librarian:
            await self.librarian.stop()

--- a/trustgraph-base/trustgraph/base/producer.py
+++ b/trustgraph-base/trustgraph/base/producer.py
@ -34,6 +34,9 @@ class Producer:

    async def stop(self):
        self.running = False
+        if self.producer:
+            self.producer.close()
+            self.producer = None

    async def send(self, msg, properties={}):