Provenance triples are now included directly in explain messages from
GraphRAG, DocumentRAG, and Agent services, eliminating the need for
follow-up knowledge graph queries to retrieve explainability details.
Each explain message in the response stream now carries:
- explain_id: root URI for this provenance step (unchanged)
- explain_graph: named graph where triples are stored (unchanged)
- explain_triples: the actual provenance triples for this step (new)
Changes across the stack:
- Schema: added explain_triples field to GraphRagResponse,
DocumentRagResponse, and AgentResponse
- Services: all explain message call sites pass triples through
(graph_rag, document_rag, agent react, agent orchestrator)
- Translators: encode explain_triples via TripleTranslator for
gateway wire format
- Python SDK: ProvenanceEvent now includes parsed ExplainEntity
and raw triples; expanded event_type detection
- CLI: invoke_graph_rag, invoke_agent, invoke_document_rag use
inline entity when available, fall back to graph query
- Tech specs updated
Additional explainability test
Replace the config push mechanism that broadcast the full config
blob on a 'state' class pub/sub queue with a lightweight notify
signal containing only the version number and affected config
types. Processors fetch the full config via request/response from
the config service when notified.
This eliminates the need for the pub/sub 'state' queue class and
stateful pub/sub services entirely. The config push queue moves
from 'state' to 'flow' class — a simple transient signal rather
than a retained message. This solves the RabbitMQ
late-subscriber problem where restarting processes never received
the current config because their fresh queue had no historical
messages.
Key changes:
- ConfigPush schema: config dict replaced with types list
- Subscribe-then-fetch startup with retry: processors subscribe
to notify queue, fetch config via request/response, then
process buffered notifies with version comparison to avoid race
conditions
- register_config_handler() accepts optional types parameter so
handlers only fire when their config types change
- Short-lived config request/response clients to avoid subscriber
contention on non-persistent response topics
- Config service passes affected types through put/delete/flow
operations
- Gateway ConfigReceiver rewritten with same notify pattern and
retry loop
Tests updated
New tests:
- register_config_handler: without types, with types, multiple
types, multiple handlers
- on_config_notify: old/same version skipped, irrelevant types
skipped (version still updated), relevant type triggers fetch,
handler without types always called, mixed handler filtering,
empty types invokes all, fetch failure handled gracefully
- fetch_config: returns config+version, raises on error response,
stops client even on exception
- fetch_and_apply_config: applies to all handlers on startup,
retries on failure
* fix: prevent duplicate dispatcher creation race condition in invoke_global_service
Concurrent coroutines could all pass the `if key in self.dispatchers` check
before any of them wrote the result back, because `await dispatcher.start()`
yields to the event loop. This caused multiple Pulsar consumers to be created
on the same shared subscription, distributing responses round-robin and
dropping ~2/3 of them — manifesting as a permanent spinner in the Workbench UI.
Apply a double-checked asyncio.Lock in both `invoke_global_service` and
`invoke_flow_service` so only one dispatcher is ever created per service key.
* test: add concurrent-dispatch tests for race condition fix
Add asyncio.gather-based tests that verify invoke_global_service and
invoke_flow_service create exactly one dispatcher under concurrent calls,
preventing the duplicate Pulsar consumer bug.
Remove Pulsar-specific concepts from application code so that
the pub/sub backend is swappable via configuration.
Rename translators:
- to_pulsar/from_pulsar → decode/encode across all translator
classes, dispatch handlers, and tests (55+ files)
- from_response_with_completion → encode_with_completion
- Remove pulsar.schema.Record from translator base class
Queue naming (CLASS:TOPICSPACE:TOPIC):
- Replace topic() helper with queue() using new format:
flow:tg:name, request:tg:name, response:tg:name, state:tg:name
- Queue class implies persistence/TTL (no QoS in names)
- Update Pulsar backend map_topic() to parse new format
- Librarian queues use flow class (persistent, for chunking)
- Config push uses state class (persistent, last-value)
- Remove 15 dead topic imports from schema files
- Update init_trustgraph.py namespace: config → state
Confine Pulsar to pulsar_backend.py:
- Delete legacy PulsarClient class from pubsub.py
- Move add_args to add_pubsub_args() with standalone flag
for CLI tools (defaults to localhost)
- PulsarBackendConsumer.receive() catches _pulsar.Timeout,
raises standard TimeoutError
- Remove Pulsar imports from: async_processor, flow_processor,
log_level, all 11 client files, 4 storage writers, gateway
service, gateway config receiver
- Remove log_level/LoggerLevel from client API
- Rewrite tg-monitor-prompts to use backend abstraction
- Update tg-dump-queues to use add_pubsub_args
Also: pubsub-abstraction.md tech spec covering problem statement,
design goals, as-is requirements, candidate broker assessment,
approach, and implementation order.
Error responses from the websocket multiplexer were missing the
request ID and using a bare string format instead of the structured
error protocol. This caused clients to hang when a request failed
(e.g. unsupported service for a flow) because the error could not
be routed to the waiting caller.
Include request ID in all error paths, use structured error format
({message, type}) with complete flag, and extract the ID early in
receive() so even malformed requests get a routable error when
possible.
Updated tests - tests were coded against invalid protocol messages
The metadata field (list of triples) in the pipeline Metadata class
was redundant. Document metadata triples already flow directly from
librarian to triple-store via emit_document_provenance() - they don't
need to pass through the extraction pipeline.
Additionally, chunker and PDF decoder were overwriting metadata to []
anyway, so any metadata passed through the pipeline was being
discarded.
Changes:
- Remove metadata field from Metadata dataclass
(schema/core/metadata.py)
- Update all Metadata instantiations to remove metadata=[]
parameter
- Remove metadata handling from translators (document_loading,
knowledge)
- Remove metadata consumption from extractors (ontology, agent)
- Update gateway serializers and import handlers
- Update all unit, integration, and contract tests
* Changed schema for Value -> Term, majorly breaking change
* Following the schema change, Value -> Term into all processing
* Updated Cassandra for g, p, s, o index patterns (7 indexes)
* Reviewed and updated all tests
* Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down
* Tech spec for graceful shutdown
* Graceful shutdown of importers/exporters
* Update socket to include graceful shutdown orchestration
* Adding tests for conditions tracked in this PR