trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-07-08 12:52:12 +02:00

Author	SHA1	Message	Date
Cyber MacGeddon	668b64742f	Merge branch 'release/v2.4'	2026-05-19 18:01:35 +01:00
cybermaggedon	fd6e3e1269	fix: stop dropping messages on Pulsar flow restarts (#938 ) consumer.py called unsubscribe() on every flow stop, deleting the server-side subscription cursor. On restart, initial_position='latest' skipped any messages published during the gap — causing intermittent data loss (e.g. graph embeddings silently never reaching Qdrant). Replace unsubscribe() with close() so the cursor survives restarts. Move subscription cleanup to where it belongs: the Pulsar backend's delete_topic(), called by the flow controller on deliberate flow deletion. This was previously a no-op TODO.	2026-05-19 13:26:39 +01:00
cybermaggedon	47dfc30c1c	fix: suppress Pulsar C++ client log noise (#936 ) Revert consumer receive timeout from 100ms back to the original 2000ms. The 100ms change was based on a misunderstanding — receive() is a blocking call that returns immediately when a message arrives, so the timeout only affects how quickly a consumer checks the shutdown flag during idle periods. 100ms generated ~200 WARN lines/sec from the C++ client with no latency benefit. Also set the Pulsar C++ client logger to Error level so residual timeout warnings from the subscriber (250ms) don't produce noise. Update poll timeout test to match reverted 2000ms value	2026-05-18 22:08:52 +01:00
cybermaggedon	29d3100c46	fix: IAM bootstrap atomicity and bootstrapper startup ordering (#935 ) IAM auto-bootstrap could get permanently stuck in a half-done state: _seed_tables wrote the workspace first, so any_workspace_exists() returned true on restart even when user/key/signing-key creation had failed. Remove workspace creation from _seed_tables (WorkspaceInit handles it) and use any_signing_key_exists() as the completion check since the signing key is the last thing written. Run pre-service initialisers (PulsarTopology) in start() before opening pub/sub connections, breaking the chicken-and-egg where the bootstrapper needed Pulsar namespaces that it was responsible for creating. Guard against empty cluster list when broker isn't ready.	2026-05-18 22:08:12 +01:00
cybermaggedon	76e3358ed3	fix: guard against empty query in SPARQL generator (#934 ) Split the query once and check the parts list before indexing, preventing an IndexError if the LLM returns an empty or whitespace-only string. Fixes #870.	2026-05-18 14:19:19 +01:00
cybermaggedon	da7d10e995	feat: add no-auth IAM regime as a drop-in replacement for iam-svc (#933 ) Adds `no-auth-svc`, a lightweight IAM service that permits all access unconditionally — no database, no bootstrap, no signing keys. Deploy it in place of `iam-svc` for development, demos, and single-user setups where authentication overhead is unwanted. The gateway no longer hard-codes a 401 on missing credentials. Instead it asks the IAM regime via a new `authenticate-anonymous` operation whether token-free access is allowed. This keeps the gateway regime-agnostic: `iam-svc` rejects anonymous auth (preserving existing security), while `no-auth-svc` grants it with a configurable default user and workspace. Includes a tech spec (docs/tech-specs/no-auth-regime.md) and tests that pin the safety boundary — malformed tokens never fall through to the anonymous path, and a contract test ensures the full iam-svc always rejects `authenticate-anonymous`.	2026-05-18 14:10:05 +01:00
cybermaggedon	71517e6417	release/v2.4 -> master (#932 ) * CLI auth migration, document embeddings core lifecycle (#913) Migrate get_kg_core and put_kg_core CLI tools to use Api/SocketClient with first-frame auth (fixes broken raw websocket path). Fix wire format field names (root/vector). Remove ~600 lines of dead raw websocket code from invoke_graph_rag.py. Add document embeddings core lifecycle to the knowledge service: list/get/put/delete/load operations across schema, translator, Cassandra table store, knowledge manager, gateway registry, REST API, socket client, and CLI (tg-get-de-core, tg-put-de-core). Fix delete_kg_core to also clean up document embeddings rows. * Remove spurious workspace parameter from SPARQL algebra evaluator (#915) Fix threading of workspace paramater: - The SPARQL algebra evaluator was threading a workspace parameter through every function and passing it to TriplesClient.query(), which doesn't accept it. Workspace isolation is handled by pub/sub topic routing — the TriplesClient is already scoped to a workspace-specific flow, same as GraphRAG. Passing workspace explicitly was both incorrect and unnecessary. Update tests: - tests/unit/test_query/test_sparql_algebra.py (new) — Tests _query_pattern, _eval_bgp, and evaluate() with various algebra nodes. Key tests assert workspace is never in tc.query() kwargs, plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and edge cases. - tests/unit/test_retrieval/test_graph_rag.py — Added test_triples_query_never_passes_workspace (checks query()) and test_follow_edges_never_passes_workspace (checks query_stream()). * Make all Cassandra and Qdrant I/O async-safe with proper concurrency controls (#916) Cassandra triples services were using syncronous EntityCentricKnowledgeGraph methods from async contexts, and connection state was managed with threading.local which is wrong for asyncio coroutines sharing a single thread. Qdrant services had no async wrapping at all, blocking the event loop on every network call. Rows services had unprotected shared state mutations across concurrent coroutines. - Add async methods to EntityCentricKnowledgeGraph (async_insert, async_get_s/p/o/sp/po/os/spo/all, async_collection_exists, async_create_collection, async_delete_collection) using the existing cassandra_async.async_execute bridge - Rewrite triples write + query services: replace threading.local with asyncio.Lock + dict cache for per-workspace connections, use async ECKG methods for all data operations, keep asyncio.to_thread only for one-time blocking ECKG construction - Wrap all Qdrant calls in asyncio.to_thread across all 6 services (doc/graph/row embeddings write + query), add asyncio.Lock + set cache for collection existence checks - Add asyncio.Lock to rows write + query services to protect shared state (schemas, sessions, config caches) from concurrent mutation - Update all affected tests to match new async patterns * Fixed error only returning a page of results (#921) The root cause: async_execute only materialises the first result page (by design — it says so in its docstring). The streaming query set fetch_size=20 and expected to iterate all results, but only got the first 20 rows back. The fix uses asyncio.to_thread(lambda: list(tg.session.execute(...))) which lets the sync driver iterate all pages in a worker thread — exactly what the pre-async code did. * Optional test warning suppression (#923) * Fix test collection module errors & silence upstream Pytest warnings (#823) * chore: add virtual environment and .env directories to gitignore * test: filter upstream DeprecationWarning and UserWarning messages * fix(namespace): remove empty __init__.py files to fix PEP 420 implicit namespace routing for trustgraph sub-packages * Revert __init__.py deletions * Add .ini changes but commented out, will be useful at times --------- Co-authored-by: Salil M <d2kyt@protonmail.com> * fix(openai): fail fast on unrecoverable RateLimitError codes (#901) (#904) (#925) Co-authored-by: Sahil Yadav <sahilyadav.sy2004@gmail.com> * Ensure retry exception is properly raised (#926) * fix: library API get/update document round-trip bugs (#893) (#928) Fix 5 cascading bugs in the Library API wrapper that prevented the get_documents → update_document round-trip from working: - Tolerate missing title field in document metadata (use .get()) - Use attribute access on Triple objects instead of subscript - Serialize datetime to int seconds for JSON compatibility - Handle empty server response on successful update - Send both id and document-id keys in update request Added library API tests * Fix ontology selector defaults, add bypass mode, enforce domain/range (#929) - Align similarity_threshold default to 0.3 everywhere (class signature had stale 0.7). Fix matching contradiction in tech-spec. - Add bypass_selector_below parameter (default 5) to skip vector similarity selection when ontology element count is small enough. - Enforce domain/range constraints in TripleConverter for object properties and datatype properties, with subclass hierarchy support. Properties with no declared domain/range pass through unchanged. - Add unit tests for domain/range validation, subclass acceptance, polymorphic pass-through, and selector bypass. Fixes #908, #920 * Close producers on flow stop to prevent stale non-persistent topics (#930) Flow.stop() only stopped consumers, leaving response producers connected to non-persistent Pulsar topics. After flow restart, the orphaned producers held stale broker routing state, causing response messages to never reach new consumers — manifesting as 120s timeouts on document-embeddings and similar RPC paths. Fix: Flow.stop() now explicitly stops all producers. Producer.stop() closes the underlying Pulsar producer connection rather than just setting a flag. Fixes #906 * fix(gateway): propagate --timeout flag to per-service dispatchers (#931) The api-gateway accepts a --timeout flag (default 600s) but the value was not propagated into DispatcherManager, which hard-coded timeout=120 for every per-service dispatcher (graph-rag, document-rag, text-completion, embeddings, librarian, etc.). This meant any synchronous request taking more than 120 seconds would always return a Timeout error at the 120s mark, regardless of the --timeout value set on the gateway. Changes: - Add timeout parameter to DispatcherManager.__init__ (default: 120 for backward compatibility) - Store self.timeout in DispatcherManager - Replace both hardcoded timeout=120 with self.timeout in invoke_global_service and invoke_flow_service - Pass self.timeout from Api to DispatcherManager in service.py - Document the timeout parameter in the docstring Fixes #894 --------- Co-authored-by: Salil M <d2kyt@protonmail.com> Co-authored-by: Sahil Yadav <sahilyadav.sy2004@gmail.com> Co-authored-by: Mister Lobster <jlaportebot@gmail.com>	2026-05-18 09:46:58 +01:00
Mister Lobster	ab83c81d8a	fix(gateway): propagate --timeout flag to per-service dispatchers (#931 ) The api-gateway accepts a --timeout flag (default 600s) but the value was not propagated into DispatcherManager, which hard-coded timeout=120 for every per-service dispatcher (graph-rag, document-rag, text-completion, embeddings, librarian, etc.). This meant any synchronous request taking more than 120 seconds would always return a Timeout error at the 120s mark, regardless of the --timeout value set on the gateway. Changes: - Add timeout parameter to DispatcherManager.__init__ (default: 120 for backward compatibility) - Store self.timeout in DispatcherManager - Replace both hardcoded timeout=120 with self.timeout in invoke_global_service and invoke_flow_service - Pass self.timeout from Api to DispatcherManager in service.py - Document the timeout parameter in the docstring Fixes #894	2026-05-18 09:44:37 +01:00
cybermaggedon	2b70a1ea8e	Close producers on flow stop to prevent stale non-persistent topics (#930 ) Flow.stop() only stopped consumers, leaving response producers connected to non-persistent Pulsar topics. After flow restart, the orphaned producers held stale broker routing state, causing response messages to never reach new consumers — manifesting as 120s timeouts on document-embeddings and similar RPC paths. Fix: Flow.stop() now explicitly stops all producers. Producer.stop() closes the underlying Pulsar producer connection rather than just setting a flag. Fixes #906	2026-05-16 16:07:16 +01:00
cybermaggedon	38d9c746a8	Fix ontology selector defaults, add bypass mode, enforce domain/range (#929 ) - Align similarity_threshold default to 0.3 everywhere (class signature had stale 0.7). Fix matching contradiction in tech-spec. - Add bypass_selector_below parameter (default 5) to skip vector similarity selection when ontology element count is small enough. - Enforce domain/range constraints in TripleConverter for object properties and datatype properties, with subclass hierarchy support. Properties with no declared domain/range pass through unchanged. - Add unit tests for domain/range validation, subclass acceptance, polymorphic pass-through, and selector bypass. Fixes #908, #920	2026-05-16 15:13:38 +01:00
cybermaggedon	aea4c2df8e	fix: library API get/update document round-trip bugs (#893 ) (#928 ) Fix 5 cascading bugs in the Library API wrapper that prevented the get_documents → update_document round-trip from working: - Tolerate missing title field in document metadata (use .get()) - Use attribute access on Triple objects instead of subscript - Serialize datetime to int seconds for JSON compatibility - Handle empty server response on successful update - Send both id and document-id keys in update request Added library API tests	2026-05-16 11:32:51 +01:00
cybermaggedon	913f610db5	Ensure retry exception is properly raised (#926 )	2026-05-15 13:35:04 +01:00
cybermaggedon	58b5c5c8d5	fix(openai): fail fast on unrecoverable RateLimitError codes (#901 ) (#904 ) (#925 ) Co-authored-by: Sahil Yadav <sahilyadav.sy2004@gmail.com>	2026-05-15 13:32:30 +01:00
cybermaggedon	142dd0231c	release/v2.4 -> master (#924 ) * CLI auth migration, document embeddings core lifecycle (#913) Migrate get_kg_core and put_kg_core CLI tools to use Api/SocketClient with first-frame auth (fixes broken raw websocket path). Fix wire format field names (root/vector). Remove ~600 lines of dead raw websocket code from invoke_graph_rag.py. Add document embeddings core lifecycle to the knowledge service: list/get/put/delete/load operations across schema, translator, Cassandra table store, knowledge manager, gateway registry, REST API, socket client, and CLI (tg-get-de-core, tg-put-de-core). Fix delete_kg_core to also clean up document embeddings rows. * Remove spurious workspace parameter from SPARQL algebra evaluator (#915) Fix threading of workspace paramater: - The SPARQL algebra evaluator was threading a workspace parameter through every function and passing it to TriplesClient.query(), which doesn't accept it. Workspace isolation is handled by pub/sub topic routing — the TriplesClient is already scoped to a workspace-specific flow, same as GraphRAG. Passing workspace explicitly was both incorrect and unnecessary. Update tests: - tests/unit/test_query/test_sparql_algebra.py (new) — Tests _query_pattern, _eval_bgp, and evaluate() with various algebra nodes. Key tests assert workspace is never in tc.query() kwargs, plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and edge cases. - tests/unit/test_retrieval/test_graph_rag.py — Added test_triples_query_never_passes_workspace (checks query()) and test_follow_edges_never_passes_workspace (checks query_stream()). * Make all Cassandra and Qdrant I/O async-safe with proper concurrency controls (#916) Cassandra triples services were using syncronous EntityCentricKnowledgeGraph methods from async contexts, and connection state was managed with threading.local which is wrong for asyncio coroutines sharing a single thread. Qdrant services had no async wrapping at all, blocking the event loop on every network call. Rows services had unprotected shared state mutations across concurrent coroutines. - Add async methods to EntityCentricKnowledgeGraph (async_insert, async_get_s/p/o/sp/po/os/spo/all, async_collection_exists, async_create_collection, async_delete_collection) using the existing cassandra_async.async_execute bridge - Rewrite triples write + query services: replace threading.local with asyncio.Lock + dict cache for per-workspace connections, use async ECKG methods for all data operations, keep asyncio.to_thread only for one-time blocking ECKG construction - Wrap all Qdrant calls in asyncio.to_thread across all 6 services (doc/graph/row embeddings write + query), add asyncio.Lock + set cache for collection existence checks - Add asyncio.Lock to rows write + query services to protect shared state (schemas, sessions, config caches) from concurrent mutation - Update all affected tests to match new async patterns * Fixed error only returning a page of results (#921) The root cause: async_execute only materialises the first result page (by design — it says so in its docstring). The streaming query set fetch_size=20 and expected to iterate all results, but only got the first 20 rows back. The fix uses asyncio.to_thread(lambda: list(tg.session.execute(...))) which lets the sync driver iterate all pages in a worker thread — exactly what the pre-async code did. * Optional test warning suppression (#923) * Fix test collection module errors & silence upstream Pytest warnings (#823) * chore: add virtual environment and .env directories to gitignore * test: filter upstream DeprecationWarning and UserWarning messages * fix(namespace): remove empty __init__.py files to fix PEP 420 implicit namespace routing for trustgraph sub-packages * Revert __init__.py deletions * Add .ini changes but commented out, will be useful at times --------- Co-authored-by: Salil M <d2kyt@protonmail.com>	2026-05-15 13:02:51 +01:00
cybermaggedon	01b1fd849d	Optional test warning suppression (#923 ) * Fix test collection module errors & silence upstream Pytest warnings (#823) * chore: add virtual environment and .env directories to gitignore * test: filter upstream DeprecationWarning and UserWarning messages * fix(namespace): remove empty __init__.py files to fix PEP 420 implicit namespace routing for trustgraph sub-packages * Revert __init__.py deletions * Add .ini changes but commented out, will be useful at times --------- Co-authored-by: Salil M <d2kyt@protonmail.com>	2026-05-15 12:58:12 +01:00
cybermaggedon	846282c375	Fixed error only returning a page of results (#921 ) The root cause: async_execute only materialises the first result page (by design — it says so in its docstring). The streaming query set fetch_size=20 and expected to iterate all results, but only got the first 20 rows back. The fix uses asyncio.to_thread(lambda: list(tg.session.execute(...))) which lets the sync driver iterate all pages in a worker thread — exactly what the pre-async code did.	2026-05-14 21:03:09 +01:00
cybermaggedon	a2dde9cafb	Make all Cassandra and Qdrant I/O async-safe with proper concurrency controls (#916 ) Cassandra triples services were using syncronous EntityCentricKnowledgeGraph methods from async contexts, and connection state was managed with threading.local which is wrong for asyncio coroutines sharing a single thread. Qdrant services had no async wrapping at all, blocking the event loop on every network call. Rows services had unprotected shared state mutations across concurrent coroutines. - Add async methods to EntityCentricKnowledgeGraph (async_insert, async_get_s/p/o/sp/po/os/spo/all, async_collection_exists, async_create_collection, async_delete_collection) using the existing cassandra_async.async_execute bridge - Rewrite triples write + query services: replace threading.local with asyncio.Lock + dict cache for per-workspace connections, use async ECKG methods for all data operations, keep asyncio.to_thread only for one-time blocking ECKG construction - Wrap all Qdrant calls in asyncio.to_thread across all 6 services (doc/graph/row embeddings write + query), add asyncio.Lock + set cache for collection existence checks - Add asyncio.Lock to rows write + query services to protect shared state (schemas, sessions, config caches) from concurrent mutation - Update all affected tests to match new async patterns	2026-05-14 16:00:54 +01:00
cybermaggedon	bb1109963c	Remove spurious workspace parameter from SPARQL algebra evaluator (#915 ) Fix threading of workspace paramater: - The SPARQL algebra evaluator was threading a workspace parameter through every function and passing it to TriplesClient.query(), which doesn't accept it. Workspace isolation is handled by pub/sub topic routing — the TriplesClient is already scoped to a workspace-specific flow, same as GraphRAG. Passing workspace explicitly was both incorrect and unnecessary. Update tests: - tests/unit/test_query/test_sparql_algebra.py (new) — Tests _query_pattern, _eval_bgp, and evaluate() with various algebra nodes. Key tests assert workspace is never in tc.query() kwargs, plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and edge cases. - tests/unit/test_retrieval/test_graph_rag.py — Added test_triples_query_never_passes_workspace (checks query()) and test_follow_edges_never_passes_workspace (checks query_stream()).	2026-05-14 12:03:43 +01:00
cybermaggedon	f0ad282708	CLI auth migration, document embeddings core lifecycle (#913 ) Migrate get_kg_core and put_kg_core CLI tools to use Api/SocketClient with first-frame auth (fixes broken raw websocket path). Fix wire format field names (root/vector). Remove ~600 lines of dead raw websocket code from invoke_graph_rag.py. Add document embeddings core lifecycle to the knowledge service: list/get/put/delete/load operations across schema, translator, Cassandra table store, knowledge manager, gateway registry, REST API, socket client, and CLI (tg-get-de-core, tg-put-de-core). Fix delete_kg_core to also clean up document embeddings rows.	2026-05-14 10:30:21 +01:00
Cyber MacGeddon	159b1e2824	Merge branch 'release/v2.4'	2026-05-11 15:15:50 +01:00
KOTHA-SRIVIBHU	dd974b0cac	fix: replace bare excepts in NLTK initialization (#896 )	2026-05-11 15:12:25 +01:00
KOTHA-SRIVIBHU	ab02c02b33	fix: replace bare excepts in NLTK initialization (#896 )	2026-05-11 15:11:30 +01:00
Sahil Yadav	d08ec56a73	fix: resolve publisher resource leak and field parse validation (#886 )	2026-05-11 15:06:54 +01:00
Sahil Yadav	c2f1759bdf	fix: resolve publisher resource leak and field parse validation (#886 )	2026-05-11 15:06:24 +01:00
Jack Colquitt	80a7579639	Enhance README.md descriptions for TrustGraph and Context Core (#892 ) Refined the description of TrustGraph and updated the Context Core explanation for clarity.	2026-05-08 13:45:06 -07:00
cybermaggedon	fd8d5b2c42	Recent fixes -> release/v2.4 (#891 ) * Fix publisher resource leak in librarian submit_document (#883) Wrap pub.start()/pub.send() in try/finally to guarantee pub.stop() is called on error. Remove unnecessary asyncio.sleep(1) kludge. * Make Cassandra replication factor configurable (issue #787) (#887) Add CASSANDRA_REPLICATION_FACTOR environment variable and --cassandra-replication-factor CLI argument to cassandra_config.py. Update all four table store constructors (ConfigTableStore, KnowledgeTableStore, LibraryTableStore, IamTableStore) to accept an optional replication_factor parameter and use it in keyspace creation CQL queries. Thread the replication factor through all service constructors: Configuration, KnowledgeManager, Librarian, IamService, and knowledge store Processor. * Update tests --------- Co-authored-by: gittihub-jpg <rico@springer-mail.net>	2026-05-08 19:48:12 +01:00
gittihub-jpg	e23d4a5b58	Make Cassandra replication factor configurable (issue #787 ) (#887 ) Add CASSANDRA_REPLICATION_FACTOR environment variable and --cassandra-replication-factor CLI argument to cassandra_config.py. Update all four table store constructors (ConfigTableStore, KnowledgeTableStore, LibraryTableStore, IamTableStore) to accept an optional replication_factor parameter and use it in keyspace creation CQL queries. Thread the replication factor through all service constructors: Configuration, KnowledgeManager, Librarian, IamService, and knowledge store Processor.	2026-05-08 19:31:58 +01:00
gittihub-jpg	f9d6606423	Fix publisher resource leak in librarian submit_document (#883 ) Wrap pub.start()/pub.send() in try/finally to guarantee pub.stop() is called on error. Remove unnecessary asyncio.sleep(1) kludge.	2026-05-08 19:22:48 +01:00
cybermaggedon	fe542b3d33	Remove race condition in workspace initialisation if iam-svc is up (#867 ) Remove race condition in workspace initialisation if iam-svc is up before config-svc. iam.py — handle_create_workspace: - Config registration (_on_workspace_created) moves before the IAM table write, so it's a prerequisite. If the config put fails, the exception propagates and the IAM create doesn't happen. - On duplicate, the IAM table write is skipped but config registration still runs (idempotent put). Returns the existing record with no error instead of returning _err("duplicate", ...). service.py — _announce_workspace_created → _ensure_workspace_registered: - Renamed to reflect the new semantics. - Exceptions propagate instead of being swallowed — if config registration fails, the caller sees the error.	2026-05-06 21:21:48 +01:00
cybermaggedon	d282d72db1	Fixed document-rag workspace problem (#866 ) - Fixed document-rag workspace problem - OpenAI text-completion processor now puts 'not-set' in the token if no token is set (new OpenAI library requires it to be set to something. - Update tests	2026-05-06 14:55:21 +01:00
cybermaggedon	03cc5ac80f	Per-flow librarian clients and per-workspace response queues (#865 ) Replace singleton LibrarianClient with per-flow instances via the new LibrarianSpec, giving each flow its own librarian tied to the workspace-scoped request/response queues from the blueprint. Move all workspace-scoped services (config, flow, librarian, knowledge) from a single base-queue response producer to per-workspace response producers created alongside the existing per-workspace request consumers. Update the gateway dispatcher and bootstrapper flow client to subscribe to the matching workspace-scoped response queues. Fix WorkspaceInit to register workspaces through the IAM create-workspace API so they appear in __workspaces__ and are visible to the gateway. Simplify the bootstrapper gate to only check config-svc reachability. Updated tests accordingly.	2026-05-06 12:01:01 +01:00
Jack Colquitt	1ffae12559	Revise README to reflect agent runtime platform (#864 ) Updated platform description and added messaging systems.	2026-05-05 11:47:19 -07:00
cybermaggedon	01bf1d89d5	Fixed a circular dependency causing bootstrap to fail (#863 ) TemplateSeed and WorkspaceInit now run pre-gate. They'll write templates and register the default workspace before the gate checks flow-svc, breaking the circular dependency.	2026-05-05 16:00:21 +01:00
cybermaggedon	9f2bfbce0c	Per-workspace queue routing for workspace-scoped services (#862 ) Workspace identity is now determined by queue infrastructure instead of message body fields, closing a privilege-escalation vector where a caller could spoof workspace in the request payload. - Add WorkspaceProcessor base class: discovers workspaces from config at startup, creates per-workspace consumers (queue:workspace), and manages consumer lifecycle on workspace create/delete events - Roll out to librarian, flow-svc, knowledge cores, and config-svc - Config service gets a dual-queue regime: a system queue for cross-workspace ops (getvalues-all-ws, bootstrapper writes to __workspaces__) and per-workspace queues for tenant-scoped ops, with workspace discovery from its own Cassandra store - Remove workspace field from request schemas (FlowRequest, LibrarianRequest, KnowledgeRequest, CollectionManagementRequest) and from DocumentMetadata / ProcessingMetadata — table stores now accept workspace as an explicit parameter - Strip workspace encode/decode from all message translators and gateway serializers - Gateway enforces workspace existence: reject requests targeting non-existent workspaces instead of routing to queues with no consumer - Config service provisions new workspaces from __template__ on creation - Add workspace lifecycle hooks to AsyncProcessor so any processor can react to workspace create/delete without subclassing WorkspaceProcessor	2026-05-04 10:30:03 +01:00
cybermaggedon	9be257ceee	Update packages with vulns in container builds (#861 ) * Fix vulns-flagged imports * Fix archaic pulls in the "trustgraph" package * Add unstructured to meta package	2026-04-30 20:02:53 +01:00
cybermaggedon	89f058d35b	Fix erroneous handling of __template__ and __system__ workspaces (#860 ) __template__ and __system__ are not real workspace but config zones for workspace template and init handling. They should not be processed as workspaces. Now ignores workspace beginning with '_' prefix.	2026-04-30 09:53:58 +01:00
cybermaggedon	69b0b9b326	Add missing websockets dep (#859 ) Added 'websockets' to pyproject.toml in trustgraph-base	2026-04-30 09:53:32 +01:00
Cyber MacGeddon	c112af0ab0	align chunker + googleaistudio fixes with release/v2.4 Master had a parallel sibling fix for issue #821 (PR #828) using self.RecursiveCharacterTextSplitter / self.TokenTextSplitter; release branches converged on the bare module-level form. Adopt release/v2.4's version so downstream branches don't drift further.	2026-04-29 18:01:31 +01:00
Cyber MacGeddon	f3434307c5	Merge branch 'release/v2.4'	2026-04-29 17:57:24 +01:00
Jack Colquitt	627cb1e0d8	Remove Cossmology badge from README (#857 ) Removed the badge for trustgraph on Cossmology from the README.	2026-04-28 16:54:23 -07:00
cybermaggedon	d0850ff381	Delete some stuff to free up disk space (#856 )	2026-04-28 22:46:02 +01:00
cybermaggedon	9fc1d4527b	iam: self-service ops, optional workspace filters, Mux service routing (#855 ) Three threads, all reinforcing the contract's system-level vs. workspace-association distinction. WS Mux service routing - tg-show-flows (and any workspace-level service over the WS) was failing with "unknown service" because the post-refactor Mux unconditionally looked up flow-service:<kind>. Now branches on the envelope's flow field: with flow → flow-service:<kind>; without flow → <kind>:<op> from the inner body; with bare op lookup for service=iam. Resource and parameters come from the matched op's own extractors — same path the HTTP endpoints take. Optional workspace on system-level user/key ops - list-users returns the deployment-wide list when no workspace is supplied, filters when one is. get-user, update-user, disable-user, enable-user, delete-user, reset-password, create-api-key, list-api-keys, revoke-api-key all treat workspace as an optional integrity check rather than a required argument. - create-user keeps workspace required — there it's the new user's home-workspace binding, a parameter rather than an address. - API keys reclassified as SYSTEM-level resources. By the same reasoning that makes users system-level, an API key is a credential record on a deployment-wide registry; the workspace it authenticates to is a property, not a containment. Self-service surface - whoami: returns the caller's own user record. AUTHENTICATED-only; no users:read capability required. Foundation for UI affordances that depend on the caller's permissions. - bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC, side-effect-free. Returns {bootstrap_available: bool} so a first-run UI can decide whether to render setup without consuming the bootstrap op. - Gateway now injects actor=identity.handle on every authenticated forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting any caller-supplied value. Underpins whoami, audit logging, and future regime-side decisions that need actor identity. - tg-whoami and tg-update-user CLIs. Spec polish - iam-contract.md: actor-injection rule documented; whoami / bootstrap-status added to operations list; permission-scope framing tightened (workspace scope is a property of the grant, not the user or role). - iam.md: self-service section; gateway flow gains the actor- injection step; role section reframed so iam-svc constraints don't leak into contract-level prose. - iam-protocol.md: ops table updated for whoami, bootstrap-status, optional-workspace pattern; bootstrap_available added to the IamResponse listing.	2026-04-28 22:13:12 +01:00
Trevin Chow	6302eb8c97	test(ontology): harden domain/range validation + add missing tests (#848 ) Fixes #826. Addresses all five points the maintainer called out in the follow-up to #825. Source change (trustgraph-flow/trustgraph/extract/kg/ontology/extract.py): - Added `_is_subclass_of(cls, target, ontology_subset, max_depth=100)` helper with visited-set cycle detection + a defensive depth cap. LLM-generated ontologies may emit cycles (A subclass_of B, B subclass_of A); the prior while-loop would infinite-loop on that. - Replaced both near-identical domain and range subclass walks in `is_valid_triple` with a single call to the new helper. Net is -20 duplicated lines + 26-line helper. Tests (tests/unit/test_extract/test_ontology/test_prompt_and_extraction.py): - test_is_valid_triple_subclass_is_accepted: domain expects Recipe, actual type is Cake (subclass), validates. - test_is_valid_triple_handles_subclass_cycle_without_infinite_loop: A subclass_of B, B subclass_of A; call returns False within the depth cap rather than hanging. - test_parse_and_validate_triples_collects_entity_types_from_rdf_type: end-to-end path: rdf:type triples build the entity_types dict, subsequent domain-check triples validate against it. - test_is_valid_triple_entity_types_none_default: the None default path now has explicit coverage. 156 existing tests in tests/unit/test_extract/test_ontology still pass.	2026-04-28 16:37:42 +01:00
Trevin Chow	e59aa023d4	test(ontology): harden domain/range validation + add missing tests (#848 ) Fixes #826. Addresses all five points the maintainer called out in the follow-up to #825. Source change (trustgraph-flow/trustgraph/extract/kg/ontology/extract.py): - Added `_is_subclass_of(cls, target, ontology_subset, max_depth=100)` helper with visited-set cycle detection + a defensive depth cap. LLM-generated ontologies may emit cycles (A subclass_of B, B subclass_of A); the prior while-loop would infinite-loop on that. - Replaced both near-identical domain and range subclass walks in `is_valid_triple` with a single call to the new helper. Net is -20 duplicated lines + 26-line helper. Tests (tests/unit/test_extract/test_ontology/test_prompt_and_extraction.py): - test_is_valid_triple_subclass_is_accepted: domain expects Recipe, actual type is Cake (subclass), validates. - test_is_valid_triple_handles_subclass_cycle_without_infinite_loop: A subclass_of B, B subclass_of A; call returns False within the depth cap rather than hanging. - test_parse_and_validate_triples_collects_entity_types_from_rdf_type: end-to-end path: rdf:type triples build the entity_types dict, subsequent domain-check triples validate against it. - test_is_valid_triple_entity_types_none_default: the None default path now has explicit coverage. 156 existing tests in tests/unit/test_extract/test_ontology still pass.	2026-04-28 16:33:49 +01:00
cybermaggedon	5e28d3cce0	refactor(iam): pluggable IAM regime via authenticate/authorise contract (#853 ) The gateway no longer holds any policy state — capability sets, role definitions, workspace scope rules. Per the IAM contract it asks the regime "may this identity perform this capability on this resource?" per request. That moves the OSS role-based regime entirely into iam-svc, which can be replaced (SSO, ABAC, ReBAC) without changing the gateway, the wire protocol, or backend services. Contract: - authenticate(credential) -> Identity (handle, workspace, principal_id, source). No roles, claims, or policy state surface to the gateway. - authorise(identity, capability, resource, parameters) -> (allow, ttl). Cached per-decision (regime TTL clamped above; fail-closed on regime errors). - authorise_many available as a fan-out variant. Operation registry drives every authorisation decision: - /api/v1/iam -> IamEndpoint, looks up bare op name (create-user, list-workspaces, ...). - /api/v1/{kind} -> RegistryRoutedVariableEndpoint, <kind>:<op> (config:get, flow:list-blueprints, librarian:add-document, ...). - /api/v1/flow/{flow}/service/{kind} -> flow-service:<kind>. - /api/v1/flow/{flow}/{import,export}/{kind} -> flow-{import,export}:<kind>. - WS Mux per-frame -> flow-service:<kind>; closes a gap where authenticated users could hit any service kind. 85 operations registered across the surface. JWT carries identity only — sub + workspace. The roles claim is gone; the gateway never reads policy state from a credential. The three coarse *_KIND_CAPABILITY maps are removed. The registry is the only source of truth for the capability + resource shape of an operation. Tests migrated to the new Identity shape and to authorise()-mocked auth doubles. Specs updated: docs/tech-specs/iam-contract.md (Identity surface, caching, registry-naming conventions), iam.md (JWT shape, gateway flow, role section reframed as OSS-regime detail), iam-protocol.md (positioned as one implementation of the contract).	2026-04-28 16:19:41 +01:00
cybermaggedon	9f2d9adcb1	Fix Ollama async issue (#854 ) * Fix Ollama sync issues - replaced with async * Fix tests	2026-04-28 15:43:04 +01:00
cybermaggedon	b15f1a167c	Smoke-test websocket tool (#852 )	2026-04-28 15:05:35 +01:00
cybermaggedon	666af1c4b3	feat(iam): allow bootstrap mode and token to be sourced from env vars (#851 ) Adds an environment-variable fallback for the iam-svc bootstrap configuration so the token can be injected from a Kubernetes Secret (or any equivalent secret store) without ever appearing in the processor-group YAML — which is typically version-controlled. Resolution order is fixed and per-setting: bootstrap_mode = params["bootstrap_mode"] or $IAM_BOOTSTRAP_MODE bootstrap_token = params["bootstrap_token"] or $IAM_BOOTSTRAP_TOKEN If neither source supplies a value, the service refuses to start with a clear message naming both options. The two settings are resolved independently, which lets operators commit the mode in YAML (it is not a secret) while pulling the token from a Secret-backed ``IAM_BOOTSTRAP_TOKEN`` env var. Validation invariants are unchanged: * mode must be 'token' or 'bootstrap' * mode='token' requires a token (from any source) * mode='bootstrap' must NOT have a token (ambiguous intent) There is no permissive fallback — the service fails closed in every branch where configuration is incomplete. docs/tech-specs/iam-protocol.md gains a 'Configuration sources' subsection under 'Bootstrap modes' that documents the precedence table and the K8s injection pattern. The 'Bootstrap-token lifecycle' step about removing the token after rotation now applies to whichever source was used (Secret, env var, or YAML field).	2026-04-28 15:00:33 +01:00
cybermaggedon	67b2fc448f	feat: IAM service, gateway auth middleware, capability model, and CLIs (#849 ) Replaces the legacy GATEWAY_SECRET shared-token gate with an IAM-backed identity and authorisation model. The gateway no longer has an "allow-all" or "no auth" mode; every request is authenticated via the IAM service, authorised against a capability model that encodes both the operation and the workspace it targets, and rejected with a deliberately-uninformative 401 / 403 on any failure. IAM service (trustgraph-flow/trustgraph/iam, trustgraph-base/schema/iam) ----------------------------------------------------------------------- * New backend service (iam-svc) owning users, workspaces, API keys, passwords and JWT signing keys in Cassandra. Reached over the standard pub/sub request/response pattern; gateway is the only caller. * Operations: bootstrap, resolve-api-key, login, get-signing-key-public, rotate-signing-key, create/list/get/update/disable/delete/enable-user, change-password, reset-password, create/list/get/update/disable- workspace, create/list/revoke-api-key. * Ed25519 JWT signing (alg=EdDSA). Key rotation writes a new kid and retires the previous one; validation is grace-period friendly. * Passwords: PBKDF2-HMAC-SHA-256, 600k iterations, per-user salt. * API keys: 128-bit random, SHA-256 hashed. Plaintext returned once. * Bootstrap is explicit: --bootstrap-mode {token,bootstrap} is a required startup argument with no permissive default. Masked "auth failure" errors hide whether a refused bootstrap request was due to mode, state, or authorisation. Gateway authentication (trustgraph-flow/trustgraph/gateway/auth.py) ------------------------------------------------------------------- * IamAuth replaces the legacy Authenticator. Distinguishes JWTs (three-segment dotted) from API keys by shape; verifies JWTs locally using the cached IAM public key; resolves API keys via IAM with a short-TTL hash-keyed cache. Every failure path surfaces the same 401 body ("auth failure") so callers cannot enumerate credential state. * Public key is fetched at gateway startup with a bounded retry loop; traffic does not begin flowing until auth has started. Capability model (trustgraph-flow/trustgraph/gateway/capabilities.py) --------------------------------------------------------------------- * Roles have two dimensions: a capability set and a workspace scope. OSS ships reader / writer / admin; the first two are workspace- assigned, admin is cross-workspace (""). No "cross-workspace" pseudo-capability — workspace permission is a property of the role. check(identity, capability, target_workspace=None) is the single authorisation test: some role must grant the capability and be active in the target workspace. * enforce_workspace validates a request-body workspace against the caller's role scopes and injects the resolved value. Cross- workspace admin is permitted by role scope, not by a bypass. * Gateway endpoints declare a required capability explicitly — no permissive default. Construction fails fast if omitted. Enterprise editions can replace the role table without changing the wire protocol. WebSocket first-frame auth (dispatch/mux.py, endpoint/socket.py) ---------------------------------------------------------------- * /api/v1/socket handshake unconditionally accepts; authentication runs on the first WebSocket frame ({"type":"auth","token":"..."}) with {"type":"auth-ok","workspace":"..."} / {"type":"auth-failed"}. The socket stays open on failure so the client can re-authenticate — browsers treat a handshake-time 401 as terminal, breaking reconnection. * Mux.receive rejects every non-auth frame before auth succeeds, enforces the caller's workspace (envelope + inner payload) using the role-scope resolver, and supports mid-session re-auth. * Flow import/export streaming endpoints keep the legacy ?token= handshake (URL-scoped short-lived transfers; no re-auth need). Auth surface ------------ * POST /api/v1/auth/login — public, returns a JWT. * POST /api/v1/auth/bootstrap — public; forwards to IAM's bootstrap op which itself enforces mode + tables-empty. * POST /api/v1/auth/change-password — any authenticated user. * POST /api/v1/iam — admin-only generic forwarder for the rest of the IAM API (per-op REST endpoints to follow in a later change). Removed / breaking ------------------ * GATEWAY_SECRET / --api-token / default_api_token and the legacy Authenticator.permitted contract. The gateway cannot run without IAM. * ?token= on /api/v1/socket. * DispatcherManager and Mux both raise on auth=None — no silent downgrade path. CLI tools (trustgraph-cli) -------------------------- tg-bootstrap-iam, tg-login, tg-create-user, tg-list-users, tg-disable-user, tg-enable-user, tg-delete-user, tg-change-password, tg-reset-password, tg-create-api-key, tg-list-api-keys, tg-revoke-api-key, tg-create-workspace, tg-list-workspaces. Passwords read via getpass; tokens / one-time secrets written to stdout with operator context on stderr so shell composition works cleanly. AsyncSocketClient / SocketClient updated to the first-frame auth protocol. Specifications -------------- * docs/tech-specs/iam.md updated with the error policy, workspace resolver extension point, and OSS role-scope model. * docs/tech-specs/iam-protocol.md (new) — transport, dataclasses, operation table, error taxonomy, bootstrap modes. * docs/tech-specs/capabilities.md (new) — capability vocabulary, OSS role bundles, agent-as-composition note, enforcement-boundary policy, enterprise extensibility. Tests ----- * test_auth.py (rewritten) — IamAuth + JWT round-trip with real Ed25519 keypairs + API-key cache behaviour. * test_capabilities.py (new) — role table sanity, check across role x workspace combinations, enforce_workspace paths, unknown-cap / unknown-role fail-closed. * Every endpoint test construction now names its capability explicitly (no permissive defaults relied upon). New tests pin the fail-closed invariants: DispatcherManager / Mux refuse auth=None; i18n path-traversal defense is exercised. * test_socket_graceful_shutdown rewritten against IamAuth.	2026-04-24 17:29:10 +01:00
Jack Colquitt	cd9c307453	Add Cossmology badge to README (#850 ) Added a badge for trustgraph on Cossmology to README.	2026-04-23 13:40:39 -07:00

1 2 3 4 5 ...

1377 commits