Commit graph

1478 commits

Author SHA1 Message Date
Cyber MacGeddon
4e3bd85abc Merge branch 'master' into release/v2.5 2026-05-19 18:57:26 +01:00
Cyber MacGeddon
668b64742f Merge branch 'release/v2.4' 2026-05-19 18:01:35 +01:00
cybermaggedon
66912d9f55
Open 2.5 release branch (#939) 2026-05-19 16:07:27 +01:00
cybermaggedon
fd6e3e1269
fix: stop dropping messages on Pulsar flow restarts (#938)
consumer.py called unsubscribe() on every flow stop, deleting the
server-side subscription cursor. On restart, initial_position='latest'
skipped any messages published during the gap — causing intermittent
data loss (e.g. graph embeddings silently never reaching Qdrant).

Replace unsubscribe() with close() so the cursor survives restarts.
Move subscription cleanup to where it belongs: the Pulsar backend's
delete_topic(), called by the flow controller on deliberate flow
deletion. This was previously a no-op TODO.
2026-05-19 13:26:39 +01:00
cybermaggedon
47dfc30c1c
fix: suppress Pulsar C++ client log noise (#936)
Revert consumer receive timeout from 100ms back to the original
2000ms.  The 100ms change was based on a misunderstanding — receive()
is a blocking call that returns immediately when a message arrives,
so the timeout only affects how quickly a consumer checks the shutdown
flag during idle periods.  100ms generated ~200 WARN lines/sec from
the C++ client with no latency benefit.

Also set the Pulsar C++ client logger to Error level so residual
timeout warnings from the subscriber (250ms) don't produce noise.

Update poll timeout test to match reverted 2000ms value
2026-05-18 22:08:52 +01:00
cybermaggedon
29d3100c46
fix: IAM bootstrap atomicity and bootstrapper startup ordering (#935)
IAM auto-bootstrap could get permanently stuck in a half-done state:
_seed_tables wrote the workspace first, so any_workspace_exists()
returned true on restart even when user/key/signing-key creation had
failed.  Remove workspace creation from _seed_tables (WorkspaceInit
handles it) and use any_signing_key_exists() as the completion check
since the signing key is the last thing written.

Run pre-service initialisers (PulsarTopology) in start() before
opening pub/sub connections, breaking the chicken-and-egg where the
bootstrapper needed Pulsar namespaces that it was responsible for
creating.  Guard against empty cluster list when broker isn't ready.
2026-05-18 22:08:12 +01:00
cybermaggedon
76e3358ed3
fix: guard against empty query in SPARQL generator (#934)
Split the query once and check the parts list before indexing,
preventing an IndexError if the LLM returns an empty or
whitespace-only string.

Fixes #870.
2026-05-18 14:19:19 +01:00
cybermaggedon
da7d10e995
feat: add no-auth IAM regime as a drop-in replacement for iam-svc (#933)
Adds `no-auth-svc`, a lightweight IAM service that permits all access
unconditionally — no database, no bootstrap, no signing keys.  Deploy
it in place of `iam-svc` for development, demos, and single-user
setups where authentication overhead is unwanted.

The gateway no longer hard-codes a 401 on missing credentials.
Instead it asks the IAM regime via a new `authenticate-anonymous`
operation whether token-free access is allowed.  This keeps the
gateway regime-agnostic: `iam-svc` rejects anonymous auth (preserving
existing security), while `no-auth-svc` grants it with a configurable
default user and workspace.

Includes a tech spec (docs/tech-specs/no-auth-regime.md) and tests
that pin the safety boundary — malformed tokens never fall through
to the anonymous path, and a contract test ensures the full iam-svc
always rejects `authenticate-anonymous`.
2026-05-18 14:10:05 +01:00
cybermaggedon
71517e6417
release/v2.4 -> master (#932)
* CLI auth migration, document embeddings core lifecycle (#913)

Migrate get_kg_core and put_kg_core CLI tools to use Api/SocketClient
with first-frame auth (fixes broken raw websocket path). Fix wire
format field names (root/vector). Remove ~600 lines of dead raw
websocket code from invoke_graph_rag.py.

Add document embeddings core lifecycle to the knowledge service:
list/get/put/delete/load operations across schema, translator,
Cassandra table store, knowledge manager, gateway registry, REST API,
socket client, and CLI (tg-get-de-core, tg-put-de-core).

Fix delete_kg_core to also clean up document embeddings rows.

* Remove spurious workspace parameter from SPARQL algebra evaluator (#915)

Fix threading of workspace paramater:
- The SPARQL algebra evaluator was threading a workspace parameter
  through every function and passing it to TriplesClient.query(),
  which doesn't accept it. Workspace isolation is handled by pub/sub
  topic routing — the TriplesClient is already scoped to a
  workspace-specific flow, same as GraphRAG. Passing workspace
  explicitly was both incorrect and unnecessary.

Update tests:
- tests/unit/test_query/test_sparql_algebra.py (new) — Tests
  _query_pattern, _eval_bgp, and evaluate() with various algebra
  nodes. Key tests assert workspace is never in tc.query() kwargs,
  plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and
  edge cases.
- tests/unit/test_retrieval/test_graph_rag.py — Added
  test_triples_query_never_passes_workspace (checks query()) and
  test_follow_edges_never_passes_workspace (checks query_stream()).

* Make all Cassandra and Qdrant I/O async-safe with proper concurrency controls (#916)

Cassandra triples services were using syncronous EntityCentricKnowledgeGraph
methods from async contexts, and connection state was managed with
threading.local which is wrong for asyncio coroutines sharing a single
thread. Qdrant services had no async wrapping at all, blocking the event
loop on every network call. Rows services had unprotected shared state
mutations across concurrent coroutines.

- Add async methods to EntityCentricKnowledgeGraph (async_insert,
  async_get_s/p/o/sp/po/os/spo/all, async_collection_exists,
  async_create_collection, async_delete_collection) using the existing
  cassandra_async.async_execute bridge
- Rewrite triples write + query services: replace threading.local with
  asyncio.Lock + dict cache for per-workspace connections, use async
  ECKG methods for all data operations, keep asyncio.to_thread only for
  one-time blocking ECKG construction
- Wrap all Qdrant calls in asyncio.to_thread across all 6 services
  (doc/graph/row embeddings write + query), add asyncio.Lock + set cache
  for collection existence checks
- Add asyncio.Lock to rows write + query services to protect shared
  state (schemas, sessions, config caches) from concurrent mutation
- Update all affected tests to match new async patterns

* Fixed error only returning a page of results (#921)

The root cause: async_execute only materialises the first result
page (by design — it says so in its docstring). The streaming query
set fetch_size=20 and expected to iterate all results, but only got
the first 20 rows back.

The fix uses
  asyncio.to_thread(lambda: list(tg.session.execute(...)))
which lets the sync driver iterate
all pages in a worker thread — exactly what the pre-async code did.

* Optional test warning suppression (#923)

* Fix test collection module errors & silence upstream Pytest warnings (#823)

* chore: add virtual environment and .env directories to gitignore

* test: filter upstream DeprecationWarning and UserWarning messages

* fix(namespace): remove empty __init__.py files to fix PEP 420 implicit namespace routing for trustgraph sub-packages

* Revert __init__.py deletions

* Add .ini changes but commented out, will be useful at times

---------

Co-authored-by: Salil M <d2kyt@protonmail.com>

* fix(openai): fail fast on unrecoverable RateLimitError codes (#901) (#904) (#925)

Co-authored-by: Sahil Yadav <sahilyadav.sy2004@gmail.com>

* Ensure retry exception is properly raised (#926)

* fix: library API get/update document round-trip bugs (#893) (#928)

Fix 5 cascading bugs in the Library API wrapper that prevented
the get_documents → update_document round-trip from working:

- Tolerate missing title field in document metadata (use .get())
- Use attribute access on Triple objects instead of subscript
- Serialize datetime to int seconds for JSON compatibility
- Handle empty server response on successful update
- Send both id and document-id keys in update request

Added library API tests

* Fix ontology selector defaults, add bypass mode, enforce domain/range (#929)

- Align similarity_threshold default to 0.3 everywhere (class signature
  had stale 0.7). Fix matching contradiction in tech-spec.
- Add bypass_selector_below parameter (default 5) to skip vector
  similarity selection when ontology element count is small enough.
- Enforce domain/range constraints in TripleConverter for object
  properties and datatype properties, with subclass hierarchy support.
  Properties with no declared domain/range pass through unchanged.
- Add unit tests for domain/range validation, subclass acceptance,
  polymorphic pass-through, and selector bypass.

Fixes #908, #920

* Close producers on flow stop to prevent stale non-persistent topics (#930)

Flow.stop() only stopped consumers, leaving response producers
connected to non-persistent Pulsar topics. After flow restart, the
orphaned producers held stale broker routing state, causing response
messages to never reach new consumers — manifesting as 120s timeouts
on document-embeddings and similar RPC paths.

Fix: Flow.stop() now explicitly stops all producers. Producer.stop()
closes the underlying Pulsar producer connection rather than just
setting a flag.

Fixes #906

* fix(gateway): propagate --timeout flag to per-service dispatchers (#931)

The api-gateway accepts a --timeout flag (default 600s) but the value
was not propagated into DispatcherManager, which hard-coded
timeout=120 for every per-service dispatcher (graph-rag, document-rag,
text-completion, embeddings, librarian, etc.).

This meant any synchronous request taking more than 120 seconds would
always return a Timeout error at the 120s mark, regardless of the
--timeout value set on the gateway.

Changes:
- Add timeout parameter to DispatcherManager.__init__ (default: 120
  for backward compatibility)
- Store self.timeout in DispatcherManager
- Replace both hardcoded timeout=120 with self.timeout in
  invoke_global_service and invoke_flow_service
- Pass self.timeout from Api to DispatcherManager in service.py
- Document the timeout parameter in the docstring

Fixes #894

---------

Co-authored-by: Salil M <d2kyt@protonmail.com>
Co-authored-by: Sahil Yadav <sahilyadav.sy2004@gmail.com>
Co-authored-by: Mister Lobster <jlaportebot@gmail.com>
2026-05-18 09:46:58 +01:00
Mister Lobster
ab83c81d8a fix(gateway): propagate --timeout flag to per-service dispatchers (#931)
The api-gateway accepts a --timeout flag (default 600s) but the value
was not propagated into DispatcherManager, which hard-coded
timeout=120 for every per-service dispatcher (graph-rag, document-rag,
text-completion, embeddings, librarian, etc.).

This meant any synchronous request taking more than 120 seconds would
always return a Timeout error at the 120s mark, regardless of the
--timeout value set on the gateway.

Changes:
- Add timeout parameter to DispatcherManager.__init__ (default: 120
  for backward compatibility)
- Store self.timeout in DispatcherManager
- Replace both hardcoded timeout=120 with self.timeout in
  invoke_global_service and invoke_flow_service
- Pass self.timeout from Api to DispatcherManager in service.py
- Document the timeout parameter in the docstring

Fixes #894
2026-05-18 09:44:37 +01:00
cybermaggedon
2b70a1ea8e
Close producers on flow stop to prevent stale non-persistent topics (#930)
Flow.stop() only stopped consumers, leaving response producers
connected to non-persistent Pulsar topics. After flow restart, the
orphaned producers held stale broker routing state, causing response
messages to never reach new consumers — manifesting as 120s timeouts
on document-embeddings and similar RPC paths.

Fix: Flow.stop() now explicitly stops all producers. Producer.stop()
closes the underlying Pulsar producer connection rather than just
setting a flag.

Fixes #906
2026-05-16 16:07:16 +01:00
cybermaggedon
38d9c746a8
Fix ontology selector defaults, add bypass mode, enforce domain/range (#929)
- Align similarity_threshold default to 0.3 everywhere (class signature
  had stale 0.7). Fix matching contradiction in tech-spec.
- Add bypass_selector_below parameter (default 5) to skip vector
  similarity selection when ontology element count is small enough.
- Enforce domain/range constraints in TripleConverter for object
  properties and datatype properties, with subclass hierarchy support.
  Properties with no declared domain/range pass through unchanged.
- Add unit tests for domain/range validation, subclass acceptance,
  polymorphic pass-through, and selector bypass.

Fixes #908, #920
2026-05-16 15:13:38 +01:00
cybermaggedon
aea4c2df8e
fix: library API get/update document round-trip bugs (#893) (#928)
Fix 5 cascading bugs in the Library API wrapper that prevented
the get_documents → update_document round-trip from working:

- Tolerate missing title field in document metadata (use .get())
- Use attribute access on Triple objects instead of subscript
- Serialize datetime to int seconds for JSON compatibility
- Handle empty server response on successful update
- Send both id and document-id keys in update request

Added library API tests
2026-05-16 11:32:51 +01:00
cybermaggedon
913f610db5
Ensure retry exception is properly raised (#926) 2026-05-15 13:35:04 +01:00
cybermaggedon
58b5c5c8d5
fix(openai): fail fast on unrecoverable RateLimitError codes (#901) (#904) (#925)
Co-authored-by: Sahil Yadav <sahilyadav.sy2004@gmail.com>
2026-05-15 13:32:30 +01:00
cybermaggedon
142dd0231c
release/v2.4 -> master (#924)
* CLI auth migration, document embeddings core lifecycle (#913)

Migrate get_kg_core and put_kg_core CLI tools to use Api/SocketClient
with first-frame auth (fixes broken raw websocket path). Fix wire
format field names (root/vector). Remove ~600 lines of dead raw
websocket code from invoke_graph_rag.py.

Add document embeddings core lifecycle to the knowledge service:
list/get/put/delete/load operations across schema, translator,
Cassandra table store, knowledge manager, gateway registry, REST API,
socket client, and CLI (tg-get-de-core, tg-put-de-core).

Fix delete_kg_core to also clean up document embeddings rows.

* Remove spurious workspace parameter from SPARQL algebra evaluator (#915)

Fix threading of workspace paramater:
- The SPARQL algebra evaluator was threading a workspace parameter
  through every function and passing it to TriplesClient.query(),
  which doesn't accept it. Workspace isolation is handled by pub/sub
  topic routing — the TriplesClient is already scoped to a
  workspace-specific flow, same as GraphRAG. Passing workspace
  explicitly was both incorrect and unnecessary.

Update tests:
- tests/unit/test_query/test_sparql_algebra.py (new) — Tests
  _query_pattern, _eval_bgp, and evaluate() with various algebra
  nodes. Key tests assert workspace is never in tc.query() kwargs,
  plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and
  edge cases.
- tests/unit/test_retrieval/test_graph_rag.py — Added
  test_triples_query_never_passes_workspace (checks query()) and
  test_follow_edges_never_passes_workspace (checks query_stream()).

* Make all Cassandra and Qdrant I/O async-safe with proper concurrency controls (#916)

Cassandra triples services were using syncronous EntityCentricKnowledgeGraph
methods from async contexts, and connection state was managed with
threading.local which is wrong for asyncio coroutines sharing a single
thread. Qdrant services had no async wrapping at all, blocking the event
loop on every network call. Rows services had unprotected shared state
mutations across concurrent coroutines.

- Add async methods to EntityCentricKnowledgeGraph (async_insert,
  async_get_s/p/o/sp/po/os/spo/all, async_collection_exists,
  async_create_collection, async_delete_collection) using the existing
  cassandra_async.async_execute bridge
- Rewrite triples write + query services: replace threading.local with
  asyncio.Lock + dict cache for per-workspace connections, use async
  ECKG methods for all data operations, keep asyncio.to_thread only for
  one-time blocking ECKG construction
- Wrap all Qdrant calls in asyncio.to_thread across all 6 services
  (doc/graph/row embeddings write + query), add asyncio.Lock + set cache
  for collection existence checks
- Add asyncio.Lock to rows write + query services to protect shared
  state (schemas, sessions, config caches) from concurrent mutation
- Update all affected tests to match new async patterns

* Fixed error only returning a page of results (#921)

The root cause: async_execute only materialises the first result
page (by design — it says so in its docstring). The streaming query
set fetch_size=20 and expected to iterate all results, but only got
the first 20 rows back.

The fix uses
  asyncio.to_thread(lambda: list(tg.session.execute(...)))
which lets the sync driver iterate
all pages in a worker thread — exactly what the pre-async code did.

* Optional test warning suppression (#923)

* Fix test collection module errors & silence upstream Pytest warnings (#823)

* chore: add virtual environment and .env directories to gitignore

* test: filter upstream DeprecationWarning and UserWarning messages

* fix(namespace): remove empty __init__.py files to fix PEP 420 implicit namespace routing for trustgraph sub-packages

* Revert __init__.py deletions

* Add .ini changes but commented out, will be useful at times

---------

Co-authored-by: Salil M <d2kyt@protonmail.com>
2026-05-15 13:02:51 +01:00
cybermaggedon
01b1fd849d
Optional test warning suppression (#923)
* Fix test collection module errors & silence upstream Pytest warnings (#823)

* chore: add virtual environment and .env directories to gitignore

* test: filter upstream DeprecationWarning and UserWarning messages

* fix(namespace): remove empty __init__.py files to fix PEP 420 implicit namespace routing for trustgraph sub-packages

* Revert __init__.py deletions

* Add .ini changes but commented out, will be useful at times

---------

Co-authored-by: Salil M <d2kyt@protonmail.com>
2026-05-15 12:58:12 +01:00
cybermaggedon
846282c375
Fixed error only returning a page of results (#921)
The root cause: async_execute only materialises the first result
page (by design — it says so in its docstring). The streaming query
set fetch_size=20 and expected to iterate all results, but only got
the first 20 rows back.

The fix uses
  asyncio.to_thread(lambda: list(tg.session.execute(...)))
which lets the sync driver iterate
all pages in a worker thread — exactly what the pre-async code did.
2026-05-14 21:03:09 +01:00
cybermaggedon
a2dde9cafb
Make all Cassandra and Qdrant I/O async-safe with proper concurrency controls (#916)
Cassandra triples services were using syncronous EntityCentricKnowledgeGraph
methods from async contexts, and connection state was managed with
threading.local which is wrong for asyncio coroutines sharing a single
thread. Qdrant services had no async wrapping at all, blocking the event
loop on every network call. Rows services had unprotected shared state
mutations across concurrent coroutines.

- Add async methods to EntityCentricKnowledgeGraph (async_insert,
  async_get_s/p/o/sp/po/os/spo/all, async_collection_exists,
  async_create_collection, async_delete_collection) using the existing
  cassandra_async.async_execute bridge
- Rewrite triples write + query services: replace threading.local with
  asyncio.Lock + dict cache for per-workspace connections, use async
  ECKG methods for all data operations, keep asyncio.to_thread only for
  one-time blocking ECKG construction
- Wrap all Qdrant calls in asyncio.to_thread across all 6 services
  (doc/graph/row embeddings write + query), add asyncio.Lock + set cache
  for collection existence checks
- Add asyncio.Lock to rows write + query services to protect shared
  state (schemas, sessions, config caches) from concurrent mutation
- Update all affected tests to match new async patterns
2026-05-14 16:00:54 +01:00
cybermaggedon
bb1109963c
Remove spurious workspace parameter from SPARQL algebra evaluator (#915)
Fix threading of workspace paramater:
- The SPARQL algebra evaluator was threading a workspace parameter
  through every function and passing it to TriplesClient.query(),
  which doesn't accept it. Workspace isolation is handled by pub/sub
  topic routing — the TriplesClient is already scoped to a
  workspace-specific flow, same as GraphRAG. Passing workspace
  explicitly was both incorrect and unnecessary.

Update tests:
- tests/unit/test_query/test_sparql_algebra.py (new) — Tests
  _query_pattern, _eval_bgp, and evaluate() with various algebra
  nodes. Key tests assert workspace is never in tc.query() kwargs,
  plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and
  edge cases.
- tests/unit/test_retrieval/test_graph_rag.py — Added
  test_triples_query_never_passes_workspace (checks query()) and
  test_follow_edges_never_passes_workspace (checks query_stream()).
2026-05-14 12:03:43 +01:00
cybermaggedon
f0ad282708
CLI auth migration, document embeddings core lifecycle (#913)
Migrate get_kg_core and put_kg_core CLI tools to use Api/SocketClient
with first-frame auth (fixes broken raw websocket path). Fix wire
format field names (root/vector). Remove ~600 lines of dead raw
websocket code from invoke_graph_rag.py.

Add document embeddings core lifecycle to the knowledge service:
list/get/put/delete/load operations across schema, translator,
Cassandra table store, knowledge manager, gateway registry, REST API,
socket client, and CLI (tg-get-de-core, tg-put-de-core).

Fix delete_kg_core to also clean up document embeddings rows.
2026-05-14 10:30:21 +01:00
elpresidank
ffd97375a8 saving 2026-05-12 08:06:58 -05:00
elpresidank
e8c7a4f6e0 Merge remote-tracking branch 'origin/master' into ts-port 2026-05-11 19:46:08 -05:00
elpresidank
a20dd1999c saving 2026-05-11 19:44:40 -05:00
Cyber MacGeddon
159b1e2824 Merge branch 'release/v2.4' 2026-05-11 15:15:50 +01:00
KOTHA-SRIVIBHU
dd974b0cac fix: replace bare excepts in NLTK initialization (#896) 2026-05-11 15:12:25 +01:00
KOTHA-SRIVIBHU
ab02c02b33
fix: replace bare excepts in NLTK initialization (#896) 2026-05-11 15:11:30 +01:00
Sahil Yadav
d08ec56a73 fix: resolve publisher resource leak and field parse validation (#886) 2026-05-11 15:06:54 +01:00
Sahil Yadav
c2f1759bdf
fix: resolve publisher resource leak and field parse validation (#886) 2026-05-11 15:06:24 +01:00
Jack Colquitt
80a7579639
Enhance README.md descriptions for TrustGraph and Context Core (#892)
Refined the description of TrustGraph and updated the Context Core explanation for clarity.
2026-05-08 13:45:06 -07:00
cybermaggedon
fd8d5b2c42
Recent fixes -> release/v2.4 (#891)
* Fix publisher resource leak in librarian submit_document (#883)

Wrap pub.start()/pub.send() in try/finally to guarantee pub.stop() is
called on error. Remove unnecessary asyncio.sleep(1) kludge.

* Make Cassandra replication factor configurable (issue #787) (#887)

Add CASSANDRA_REPLICATION_FACTOR environment variable and
--cassandra-replication-factor CLI argument to cassandra_config.py.

Update all four table store constructors (ConfigTableStore,
KnowledgeTableStore, LibraryTableStore, IamTableStore) to accept
an optional replication_factor parameter and use it in keyspace
creation CQL queries.

Thread the replication factor through all service constructors:
Configuration, KnowledgeManager, Librarian, IamService, and
knowledge store Processor.

* Update tests

---------

Co-authored-by: gittihub-jpg <rico@springer-mail.net>
2026-05-08 19:48:12 +01:00
gittihub-jpg
e23d4a5b58
Make Cassandra replication factor configurable (issue #787) (#887)
Add CASSANDRA_REPLICATION_FACTOR environment variable and
--cassandra-replication-factor CLI argument to cassandra_config.py.

Update all four table store constructors (ConfigTableStore,
KnowledgeTableStore, LibraryTableStore, IamTableStore) to accept
an optional replication_factor parameter and use it in keyspace
creation CQL queries.

Thread the replication factor through all service constructors:
Configuration, KnowledgeManager, Librarian, IamService, and
knowledge store Processor.
2026-05-08 19:31:58 +01:00
gittihub-jpg
f9d6606423
Fix publisher resource leak in librarian submit_document (#883)
Wrap pub.start()/pub.send() in try/finally to guarantee pub.stop() is
called on error. Remove unnecessary asyncio.sleep(1) kludge.
2026-05-08 19:22:48 +01:00
cybermaggedon
fe542b3d33
Remove race condition in workspace initialisation if iam-svc is up (#867)
Remove race condition in workspace initialisation if iam-svc is up
before config-svc.

iam.py — handle_create_workspace:
- Config registration (_on_workspace_created) moves before the IAM
  table write, so it's a prerequisite. If the config put fails,
  the exception propagates and the IAM create doesn't happen.
- On duplicate, the IAM table write is skipped but config
  registration still runs (idempotent put). Returns the existing
  record with no error instead of returning _err("duplicate", ...).

service.py — _announce_workspace_created → _ensure_workspace_registered:
- Renamed to reflect the new semantics.
- Exceptions propagate instead of being swallowed — if config
  registration fails, the caller sees the error.
2026-05-06 21:21:48 +01:00
cybermaggedon
d282d72db1
Fixed document-rag workspace problem (#866)
- Fixed document-rag workspace problem
- OpenAI text-completion processor now puts 'not-set' in the token
  if no token is set (new OpenAI library requires it to be set to
  something.
- Update tests
2026-05-06 14:55:21 +01:00
cybermaggedon
03cc5ac80f
Per-flow librarian clients and per-workspace response queues (#865)
Replace singleton LibrarianClient with per-flow instances via the new
LibrarianSpec, giving each flow its own librarian tied to the
workspace-scoped request/response queues from the blueprint.

Move all workspace-scoped services (config, flow, librarian, knowledge)
from a single base-queue response producer to per-workspace response
producers created alongside the existing per-workspace request
consumers.  Update the gateway dispatcher and bootstrapper flow client
to subscribe to the matching workspace-scoped response queues.

Fix WorkspaceInit to register workspaces through the IAM
create-workspace API so they appear in __workspaces__ and are visible
to the gateway.  Simplify the bootstrapper gate to only check
config-svc reachability.

Updated tests accordingly.
2026-05-06 12:01:01 +01:00
Jack Colquitt
1ffae12559
Revise README to reflect agent runtime platform (#864)
Updated platform description and added messaging systems.
2026-05-05 11:47:19 -07:00
cybermaggedon
01bf1d89d5
Fixed a circular dependency causing bootstrap to fail (#863)
TemplateSeed and WorkspaceInit now run pre-gate. They'll
write templates and register the default workspace before the gate
checks flow-svc, breaking the circular dependency.
2026-05-05 16:00:21 +01:00
cybermaggedon
9f2bfbce0c
Per-workspace queue routing for workspace-scoped services (#862)
Workspace identity is now determined by queue infrastructure instead of
message body fields, closing a privilege-escalation vector where a caller
could spoof workspace in the request payload.

- Add WorkspaceProcessor base class: discovers workspaces from config at
  startup, creates per-workspace consumers (queue:workspace), and manages
  consumer lifecycle on workspace create/delete events
- Roll out to librarian, flow-svc, knowledge cores, and config-svc
- Config service gets a dual-queue regime: a system queue for
  cross-workspace ops (getvalues-all-ws, bootstrapper writes to
  __workspaces__) and per-workspace queues for tenant-scoped ops, with
  workspace discovery from its own Cassandra store
- Remove workspace field from request schemas (FlowRequest,
  LibrarianRequest, KnowledgeRequest, CollectionManagementRequest) and
  from DocumentMetadata / ProcessingMetadata — table stores now accept
  workspace as an explicit parameter
- Strip workspace encode/decode from all message translators and gateway
  serializers
- Gateway enforces workspace existence: reject requests targeting
  non-existent workspaces instead of routing to queues with no consumer
- Config service provisions new workspaces from __template__ on creation
- Add workspace lifecycle hooks to AsyncProcessor so any processor can
  react to workspace create/delete without subclassing WorkspaceProcessor
2026-05-04 10:30:03 +01:00
elpresidank
54a6e49bd3 Merge remote-tracking branch 'origin/master' into ts-port 2026-05-01 22:16:51 -05:00
elpresidank
6ac5446a76 feat(mcp-tool): wire McpToolService into deploy stack
Three pieces, all required for an end-to-end MCP tool call:

* McpToolService used generic spec names "request"/"response" instead of
  "mcp-tool-request"/"mcp-tool-response", so RequestResponseSpec's
  flow-config topic lookup never matched and consumers bound to literal
  subjects nobody else publishes to.

* Add entrypoints/mcp-tool.mjs (mirrors agent/librarian entrypoints) so
  the service can be launched in the prebuilt trustgraph-ts image.

* Add a `mcp-tool` service block to deploy/docker-compose.yml.

With these three fixes plus a `mcp-tool-request`/`mcp-tool-response`
entry in each flow's topics map, the agent ReAct loop can now invoke
remote MCP tools (verified end-to-end against Brave Search and FireCrawl).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:16:37 -05:00
elpresidank
4c356cd24c fix(client): use correct put/delete config wire shape
ConfigApi.putConfig and deleteConfig (and the duplicate in FlowsApi) sent
a flat values:[{type,key,value}] array and a keys:{type,key} object —
neither matches the ConfigService schema, which requires keys:[namespace,
...innerKeys] and values:Record<string,unknown>. Every save in the
workbench /mcp-tools page returned `Put requires at least one key
(namespace)`.

putConfig now groups items by type (namespace) and issues one put per
group; deleteConfig sends keys:[type, key].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:16:28 -05:00
cybermaggedon
9be257ceee
Update packages with vulns in container builds (#861)
* Fix vulns-flagged imports

* Fix archaic pulls in the "trustgraph" package

* Add unstructured to meta package
2026-04-30 20:02:53 +01:00
cybermaggedon
89f058d35b
Fix erroneous handling of __template__ and __system__ workspaces (#860)
__template__ and __system__ are not real workspace but config
zones for workspace template and init handling.  They should not
be processed as workspaces.

Now ignores workspace beginning with '_' prefix.
2026-04-30 09:53:58 +01:00
cybermaggedon
69b0b9b326
Add missing websockets dep (#859)
Added 'websockets' to pyproject.toml in trustgraph-base
2026-04-30 09:53:32 +01:00
Cyber MacGeddon
c112af0ab0 align chunker + googleaistudio fixes with release/v2.4
Master had a parallel sibling fix for issue #821 (PR #828) using
self.RecursiveCharacterTextSplitter / self.TokenTextSplitter; release
branches converged on the bare module-level form.  Adopt release/v2.4's
version so downstream branches don't drift further.
2026-04-29 18:01:31 +01:00
Cyber MacGeddon
f3434307c5 Merge branch 'release/v2.4' 2026-04-29 17:57:24 +01:00
Jack Colquitt
627cb1e0d8
Remove Cossmology badge from README (#857)
Removed the badge for trustgraph on Cossmology from the README.
2026-04-28 16:54:23 -07:00
cybermaggedon
d0850ff381
Delete some stuff to free up disk space (#856) 2026-04-28 22:46:02 +01:00
cybermaggedon
9fc1d4527b
iam: self-service ops, optional workspace filters, Mux service routing (#855)
Three threads, all reinforcing the contract's system-level vs.
workspace-association distinction.

WS Mux service routing
- tg-show-flows (and any workspace-level service over the WS) was
  failing with "unknown service" because the post-refactor Mux
  unconditionally looked up flow-service:<kind>.  Now branches on
  the envelope's flow field: with flow → flow-service:<kind>;
  without flow → <kind>:<op> from the inner body; with bare op
  lookup for service=iam.  Resource and parameters come from the
  matched op's own extractors — same path the HTTP endpoints take.

Optional workspace on system-level user/key ops
- list-users returns the deployment-wide list when no workspace is
  supplied, filters when one is.  get-user, update-user,
  disable-user, enable-user, delete-user, reset-password,
  create-api-key, list-api-keys, revoke-api-key all treat workspace
  as an optional integrity check rather than a required argument.
- create-user keeps workspace required — there it's the new user's
  home-workspace binding, a parameter rather than an address.
- API keys reclassified as SYSTEM-level resources.  By the same
  reasoning that makes users system-level, an API key is a
  credential record on a deployment-wide registry; the workspace it
  authenticates to is a property, not a containment.

Self-service surface
- whoami: returns the caller's own user record.  AUTHENTICATED-only;
  no users:read capability required.  Foundation for UI affordances
  that depend on the caller's permissions.
- bootstrap-status: POST /api/v1/auth/bootstrap-status, PUBLIC,
  side-effect-free.  Returns {bootstrap_available: bool} so a
  first-run UI can decide whether to render setup without consuming
  the bootstrap op.
- Gateway now injects actor=identity.handle on every authenticated
  forward to iam-svc (IamEndpoint and WS Mux iam path), overwriting
  any caller-supplied value.  Underpins whoami, audit logging, and
  future regime-side decisions that need actor identity.
- tg-whoami and tg-update-user CLIs.

Spec polish
- iam-contract.md: actor-injection rule documented; whoami /
  bootstrap-status added to operations list; permission-scope
  framing tightened (workspace scope is a property of the grant,
  not the user or role).
- iam.md: self-service section; gateway flow gains the actor-
  injection step; role section reframed so iam-svc constraints
  don't leak into contract-level prose.
- iam-protocol.md: ops table updated for whoami, bootstrap-status,
  optional-workspace pattern; bootstrap_available added to the
  IamResponse listing.
2026-04-28 22:13:12 +01:00