Commit graph

126 commits

Author SHA1 Message Date
Het Patel
adea976203 feat: implement retry logic and exponential backoff for S3 operations (#829)
* feat: implement retry logic and exponential backoff for S3 operations

* test: fix librarian mocks after BlobStore async conversion
2026-04-18 12:05:37 +01:00
cybermaggedon
d7745baab4
Add Kafka pub/sub backend (#830)
Third backend alongside Pulsar and RabbitMQ. Topics map 1:1 to Kafka
topics, subscriptions map to consumer groups. Response/notify uses
unique consumer groups with correlation ID filtering. Topic lifecycle
managed via AdminClient with class-based retention.

Initial code drop: Needs major integration testing
2026-04-18 11:18:34 +01:00
cybermaggedon
3505bfdd25
refactor: use one fanout exchange per topic instead of shared topic exchange (#827)
The RabbitMQ backend used a single topic exchange per topicspace
with routing keys to differentiate logical topics. This meant the
flow service had to manually create named queues for every
processor-topic pair, including producer-side topics — creating
phantom queues that accumulated unread message copies indefinitely.

Replace with one fanout exchange per logical topic. Consumers now
declare and bind their own queues on connect. The flow service
manages topic lifecycle (create/delete exchanges) rather than queue
lifecycle, and only collects unique topic identifiers instead of
per-processor (topic, subscription) pairs.

Backend API: create_queue/delete_queue/ensure_queue replaced with
create_topic/delete_topic/ensure_topic (subscription parameter
removed).
2026-04-17 18:01:35 +01:00
Het Patel
391b9076f3 feat: add domain and range validation to triple extraction in extract.py (#825) 2026-04-17 11:29:57 +01:00
cybermaggedon
9f84891fcc
Flow service lifecycle management (#822)
feat: separate flow service from config service with explicit queue
lifecycle management

The flow service is now an independent service that owns the lifecycle
of flow and blueprint queues. System services own their own queues.
Consumers never create queues.

Flow service separation:
- New service at trustgraph-flow/trustgraph/flow/service/
- Uses async ConfigClient (RequestResponse pattern) to talk to config
  service
- Config service stripped of all flow handling

Queue lifecycle management:
- PubSubBackend protocol gains create_queue, delete_queue,
  queue_exists, ensure_queue — all async
- RabbitMQ: implements via pika with asyncio.to_thread internally
- Pulsar: stubs for future admin REST API implementation
- Consumer _connect() no longer creates queues (passive=True for named
  queues)
- System services call ensure_queue on startup
- Flow service creates queues on flow start, deletes on flow stop
- Flow service ensures queues for pre-existing flows on startup

Two-phase flow stop:
- Phase 1: set flow status to "stopping", delete processor config
  entries
- Phase 2: retry queue deletion, then delete flow record

Config restructure:
- active-flow config replaced with processor:{name} types
- Each processor has its own config type, each flow variant is a key
- Flow start/stop use batch put/delete — single config push per
  operation
- FlowProcessor subscribes to its own type only

Blueprint format:
- Processor entries split into topics and parameters dicts
- Flow interfaces use {"flow": "topic"} instead of bare strings
- Specs (ConsumerSpec, ProducerSpec, etc.) read from
  definition["topics"]

Tests updated
2026-04-16 17:19:39 +01:00
cybermaggedon
22096e07e2
Fix tests broken by the recent RabbitMQ/Cassandra async fixes (#815)
- Fix invalid key in config causing rogue warning
- Fix asyncio test tags
2026-04-16 10:00:18 +01:00
Alex Jenkins
8954fa3ad7 Feat: TrustGraph i18n & Documentation Translation Updates (#781)
Native CLI i18n: The TrustGraph CLI has built-in translation support
that dynamically loads language strings. You can test and use
different languages by simply passing the --lang flag (e.g., --lang
es for Spanish, --lang ru for Russian) or by configuring your
environment's LANG variable.

Automated Docs Translations: This PR introduces autonomously
translated Markdown documentation into several target languages,
including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew,
Arabic, Simplified Chinese, and Russian.
2026-04-14 12:08:32 +01:00
cybermaggedon
f976f1b6fe
Fix test suite Prometheus registry pollution and remove default (#806)
Fix test suite Prometheus registry pollution and remove default coverage

The test_metrics.py fixture (added in 4e63dbda) deleted
class-level metric singleton attributes without restoring them. This
stripped the hasattr() guards that prevent duplicate Prometheus
registration, so every subsequent Processor construction in the test
run raised ValueError: Duplicated timeseries. Fix by saving and
restoring the original attributes around each test.

Remove --cov=trustgraph from pytest.ini defaults — the coverage
import hooks interact badly with the trustgraph namespace package,
causing duplicate class loading. Coverage can still be requested
explicitly via the command line.
2026-04-14 11:07:23 +01:00
Zeel Desai
39dcd1d386 Add unit tests for base helper modules (#797)
- add unit tests for base metrics, logging, spec, parameter_spec,
  and flow modules
- add a lightweight test-only module loader so these tests can run
  without optional runtime dependencies
- fix Parameter.start/stop to accept self
2026-04-14 10:58:15 +01:00
cybermaggedon
d2751553a3
Add agent explainability instrumentation and unify envelope field naming (#795)
Addresses recommendations from the UX developer's agent experience report.
Adds provenance predicates, DAG structure changes, error resilience, and
a published OWL ontology.

Explainability additions:

- Tool candidates: tg:toolCandidate on Analysis events lists the tools
  visible to the LLM for each iteration (names only, descriptions in config)
- Termination reason: tg:terminationReason on Conclusion/Synthesis events
  (final-answer, plan-complete, subagents-complete)
- Step counter: tg:stepNumber on iteration events
- Pattern decision: new tg:PatternDecision entity in the DAG between
  session and first iteration, carrying tg:pattern and tg:taskType
- Latency: tg:llmDurationMs on Analysis events, tg:toolDurationMs on
  Observation events
- Token counts on events: tg:inToken/tg:outToken/tg:llmModel on
  Grounding, Focus, Synthesis, and Analysis events
- Tool/parse errors: tg:toolError on Observation events with tg:Error
  mixin type. Parse failures return as error observations instead of
  crashing the agent, giving it a chance to retry.

Envelope unification:

- Rename chunk_type to message_type across AgentResponse schema,
  translator, SDK types, socket clients, CLI, and all tests.
  Agent and RAG services now both use message_type on the wire.

Ontology:

- specs/ontology/trustgraph.ttl — OWL vocabulary covering all 26 classes,
  7 object properties, and 36+ datatype properties including new predicates.

DAG structure tests:

- tests/unit/test_provenance/test_dag_structure.py verifies the
  wasDerivedFrom chain for GraphRAG, DocumentRAG, and all three agent
  patterns (react, plan, supervisor) including the pattern-decision link.
2026-04-13 16:16:42 +01:00
cybermaggedon
14e49d83c7
Expose LLM token usage across all service layers (#782)
Expose LLM token usage (in_token, out_token, model) across all
service layers

Propagate token counts from LLM services through the prompt,
text-completion, graph-RAG, document-RAG, and agent orchestrator
pipelines to the API gateway and Python SDK. All fields are Optional
— None means "not available", distinguishing from a real zero count.

Key changes:

- Schema: Add in_token/out_token/model to TextCompletionResponse,
  PromptResponse, GraphRagResponse, DocumentRagResponse,
  AgentResponse

- TextCompletionClient: New TextCompletionResult return type. Split
  into text_completion() (non-streaming) and
  text_completion_stream() (streaming with per-chunk handler
  callback)

- PromptClient: New PromptResult with response_type
  (text/json/jsonl), typed fields (text/object/objects), and token
  usage. All callers updated.

- RAG services: Accumulate token usage across all prompt calls
  (extract-concepts, edge-scoring, edge-reasoning,
  synthesis). Non-streaming path sends single combined response
  instead of chunk + end_of_session.

- Agent orchestrator: UsageTracker accumulates tokens across
  meta-router, pattern prompt calls, and react reasoning. Attached
  to end_of_dialog.

- Translators: Encode token fields when not None (is not None, not truthy)

- Python SDK: RAG and text-completion methods return
  TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with
  token fields (streaming)

- CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt,
  tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent
2026-04-13 14:38:34 +01:00
cybermaggedon
ffe310af7c
Fix RabbitMQ request/response race and chunker Flow API drift (#779)
* Fix Metadata/EntityEmbeddings schema migration tail and add regression tests (#776)

The Metadata dataclass dropped its `metadata: list[Triple]` field
and EntityEmbeddings/ChunkEmbeddings settled on a singular
`vector: list[float]` field, but several call sites kept passing
`Metadata(metadata=...)` and `EntityEmbeddings(vectors=...)`. The
bugs were latent until a websocket client first hit
`/api/v1/flow/default/import/entity-contexts`, at which point the
dispatcher TypeError'd on construction.

Production fixes (5 call sites on the same migration tail):

  * trustgraph-flow gateway dispatchers entity_contexts_import.py
    and graph_embeddings_import.py — drop the stale
    Metadata(metadata=...)  kwarg; switch graph_embeddings_import
    to the singular `vector` wire key.
  * trustgraph-base messaging translators knowledge.py and
    document_loading.py — fix decode side to read the singular
    `"vector"` key, matching what their own encode sides have
    always written.
  * trustgraph-flow tables/knowledge.py — fix Cassandra row
    deserialiser to construct EntityEmbeddings(vector=...)
    instead of vectors=.
  * trustgraph-flow gateway core_import/core_export — switch the
    kg-core msgpack wire format to the singular `"v"`/`"vector"`
    key and drop the dead `m["m"]` envelope field that referenced
    the removed Metadata.metadata triples list (it was a
    guaranteed KeyError on the export side).

Defense-in-depth regression coverage (32 new tests across 7 files):

  * tests/contract/test_schema_field_contracts.py — pin the field
    set of Metadata, EntityEmbeddings, ChunkEmbeddings,
    EntityContext so any future schema rename fails CI loudly
    with a clear diff.
  * tests/unit/test_translators/test_knowledge_translator_roundtrip.py
    and test_document_embeddings_translator_roundtrip.py -
    encode→decode round-trip the affected translators end to end,
    locking in the singular `"vector"` wire key.
  * tests/unit/test_gateway/test_entity_contexts_import_dispatcher.py
    and test_graph_embeddings_import_dispatcher.py — exercise the
    websocket dispatchers' receive() path with realistic
    payloads, the direct regression test for the original
    production crash.
  * tests/unit/test_gateway/test_core_import_export_roundtrip.py
    — pack/unpack the kg-core msgpack format through the real
    dispatcher classes (with KnowledgeRequestor mocked),
    including a full export→import round-trip.
  * tests/unit/test_tables/test_knowledge_table_store.py —
    exercise the Cassandra row → schema conversion via __new__ to
    bypass the live cluster connection.

Also fixes an unrelated leaked-coroutine RuntimeWarning in
test_gateway/test_service.py::test_run_method_calls_web_run_app: the
mocked aiohttp.web.run_app now closes the coroutine that Api.run() hands
it, mirroring what the real run_app would do, instead of leaving it for
the GC to complain about.

* Fix RabbitMQ request/response race and chunker Flow API drift

Two unrelated regressions surfaced after the v2.2 queue class
refactor.  Bundled here because both are small and both block
production.

1. Request/response race against ephemeral RabbitMQ response
queues

Commit feeb92b3 switched response/notify queues to per-subscriber
auto-delete exclusive queues. That fixed orphaned-queue
accumulation but introduced a setup race: Subscriber.start()
created the run() task and returned immediately, while the
underlying RabbitMQ consumer only declared and bound its queue
lazily on the first receive() call.  RequestResponse.request()
therefore published the request before any queue was bound to the
matching routing key, and the broker dropped the reply. Symptoms:
"Failed to fetch config on notify" / "Request timeout exception"
repeating roughly every 10s in api-gateway, document-embeddings
and any other service exercising the config notify path.

Fix:

  * Add ensure_connected() to the BackendConsumer protocol;
    implement it on RabbitMQBackendConsumer (calls _connect
    synchronously, declaring and binding the queue) and as a
    no-op on PulsarBackendConsumer (Pulsar's client.subscribe is
    already synchronous at construction).

  * Convert Subscriber's readiness signal from a non-existent
    Event to an asyncio.Future created in start(). run() calls
    consumer.ensure_connected() immediately after
    create_consumer() and sets _ready.set_result(None) on first
    successful bind. start() awaits the future via asyncio.wait
    so it returns only once the consumer is fully bound. Any
    reply published after start() returns is therefore guaranteed
    to land in a bound queue.

  * First-attempt connection failures call
    _ready.set_exception(e) and exit run() so start() unblocks
    with the error rather than hanging forever — the existing
    higher-level retry pattern in fetch_and_apply_config takes
    over from there. Runtime failures after a successful start
    still go through the existing retry-with-backoff path.

  * Update the two existing graceful-shutdown tests that
    monkey-patch Subscriber.run with a custom coroutine to honor
    the new contract by signalling _ready themselves.

  * Add tests/unit/test_base/test_subscriber_readiness.py with
    five regression tests pinning the readiness contract:
    ensure_connected must be called before start() returns;
    start() must block while ensure_connected runs
    (race-condition guard with a threading.Event gate);
    first-attempt create_consumer and ensure_connected failures
    must propagate to start() instead of hanging;
    ensure_connected must run before any receive() call.

2. Chunker Flow parameter lookup using the wrong attribute

trustgraph-base/trustgraph/base/chunking_service.py was reading
flow.parameters.get("chunk-size") and chunk-overlap, but the Flow
class has no `parameters` attribute — parameter lookup is exposed
through Flow.__call__ (flow("chunk-size") returns the resolved
value or None).  The exception was caught and logged as a
WARNING, so chunking continued with the default sizes and any
configured chunk-size / chunk-overlap was silently ignored:

    chunker - WARNING - Could not parse chunk-size parameter:
    'Flow' object has no attribute 'parameters'

The chunker tests didn't catch this because they constructed
mock_flow = MagicMock() and configured
mock_flow.parameters.get.side_effect = ..., which is the same
phantom attribute MagicMock auto-creates on demand. Tests and
production agreed on the wrong API.

Fix: switch chunking_service.py to flow("chunk-size") /
flow("chunk-overlap"). Update both chunker test files to mock the
__call__ side_effect instead of the phantom parameters.get,
merging parameter values into the existing flow() lookup the
on_message tests already used for producer resolution.
2026-04-11 01:29:38 +01:00
cybermaggedon
c23e28aa66
Fix Metadata/EntityEmbeddings schema migration tail and add regression tests (#777)
The Metadata dataclass dropped its `metadata: list[Triple]` field
and EntityEmbeddings/ChunkEmbeddings settled on a singular
`vector: list[float]` field, but several call sites kept passing
`Metadata(metadata=...)` and `EntityEmbeddings(vectors=...)`. The
bugs were latent until a websocket client first hit
`/api/v1/flow/default/import/entity-contexts`, at which point the
dispatcher TypeError'd on construction.

Production fixes (5 call sites on the same migration tail):

  * trustgraph-flow gateway dispatchers entity_contexts_import.py
    and graph_embeddings_import.py — drop the stale
    Metadata(metadata=...)  kwarg; switch graph_embeddings_import
    to the singular `vector` wire key.
  * trustgraph-base messaging translators knowledge.py and
    document_loading.py — fix decode side to read the singular
    `"vector"` key, matching what their own encode sides have
    always written.
  * trustgraph-flow tables/knowledge.py — fix Cassandra row
    deserialiser to construct EntityEmbeddings(vector=...)
    instead of vectors=.
  * trustgraph-flow gateway core_import/core_export — switch the
    kg-core msgpack wire format to the singular `"v"`/`"vector"`
    key and drop the dead `m["m"]` envelope field that referenced
    the removed Metadata.metadata triples list (it was a
    guaranteed KeyError on the export side).

Defense-in-depth regression coverage (32 new tests across 7 files):

  * tests/contract/test_schema_field_contracts.py — pin the field
    set of Metadata, EntityEmbeddings, ChunkEmbeddings,
    EntityContext so any future schema rename fails CI loudly
    with a clear diff.
  * tests/unit/test_translators/test_knowledge_translator_roundtrip.py
    and test_document_embeddings_translator_roundtrip.py -
    encode→decode round-trip the affected translators end to end,
    locking in the singular `"vector"` wire key.
  * tests/unit/test_gateway/test_entity_contexts_import_dispatcher.py
    and test_graph_embeddings_import_dispatcher.py — exercise the
    websocket dispatchers' receive() path with realistic
    payloads, the direct regression test for the original
    production crash.
  * tests/unit/test_gateway/test_core_import_export_roundtrip.py
    — pack/unpack the kg-core msgpack format through the real
    dispatcher classes (with KnowledgeRequestor mocked),
    including a full export→import round-trip.
  * tests/unit/test_tables/test_knowledge_table_store.py —
    exercise the Cassandra row → schema conversion via __new__ to
    bypass the live cluster connection.

Also fixes an unrelated leaked-coroutine RuntimeWarning in
test_gateway/test_service.py::test_run_method_calls_web_run_app: the
mocked aiohttp.web.run_app now closes the coroutine that Api.run() hands
it, mirroring what the real run_app would do, instead of leaving it for
the GC to complain about.
2026-04-10 20:43:45 +01:00
cybermaggedon
feeb92b33f
Refactor: Derive consumer behaviour from queue class (#772)
Derive consumer behaviour from queue class, remove
consumer_type parameter

The queue class prefix (flow, request, response, notify) now
fully determines consumer behaviour in both RabbitMQ and Pulsar
backends.  Added 'notify' class for ephemeral broadcast (config
push notifications).  Response and notify classes always create
per-subscriber auto-delete queues, eliminating orphaned queues
that accumulated on service restarts.

Change init-trustgraph to set up the 'notify' namespace in
Pulsar instead of old hangover 'state'.

Fixes 'stuck backlog' on RabbitMQ config notification queue.
2026-04-09 09:55:41 +01:00
cybermaggedon
e81418c58f
fix: preserve literal types in focus quoted triples and document tracing (#769)
The triples client returns Uri/Literal (str subclasses), not Term
objects.  _quoted_triple() treated all values as IRIs, so literal
objects like skos:definition values were mistyped in focus
provenance events, and trace_source_documents could not match
them in the store.

Added to_term() to convert Uri/Literal back to Term, threaded a
term_map from follow_edges_batch through
get_subgraph/get_labelgraph into uri_map, and updated
_quoted_triple to accept Term objects directly.
2026-04-08 13:37:02 +01:00
cybermaggedon
4b5bfacab1
Forward missing explain_triples through RAG clients and agent tool callback (#768)
fix: forward explain_triples through RAG clients and agent tool callback
- RAG clients and the KnowledgeQueryImpl tool callback were
  dropping explain_triples from explain events, losing provenance
  data (including focus edge selections) when graph-rag is invoked
  via the agent.

Tests for provenance and explainability (56 new):
- Client-level forwarding of explain_triples
- Graph-RAG structural chain
  (question → grounding → exploration → focus → synthesis)
- Graph-RAG integration with mocked subsidiary clients
- Document-RAG integration
  (question → grounding → exploration → synthesis)
- Agent-orchestrator all 3 patterns: react, plan-then-execute,
  supervisor
2026-04-08 11:41:17 +01:00
cybermaggedon
c20e6540ec
Subscriber resilience and RabbitMQ fixes (#765)
Subscriber resilience: recreate consumer after connection failure

- Move consumer creation from Subscriber.start() into the run() loop,
  matching the pattern used by Consumer. If the connection drops and the
  consumer is closed in the finally block, the loop now recreates it on
  the next iteration instead of spinning forever on a None consumer.

Consumer thread safety:
- Dedicated ThreadPoolExecutor per consumer so all pika operations
  (create, receive, acknowledge, negative_acknowledge) run on the
  same thread — pika BlockingConnection is not thread-safe
- Applies to both Consumer and Subscriber classes

Config handler type audit — fix four mismatched type registrations:
- librarian: was ["librarian"] (non-existent type), now ["flow",
  "active-flow"] (matches config["flow"] that the handler reads)
- cores/service: was ["kg-core"], now ["flow"] (reads
  config["flow"])
- metering/counter: was ["token-costs"], now ["token-cost"]
  (singular)
- agent/mcp_tool: was ["mcp-tool"], now ["mcp"] (reads
  config["mcp"])

Update tests
2026-04-07 14:51:14 +01:00
cybermaggedon
ddd4bd7790
Deliver explainability triples inline in retrieval response stream (#763)
Provenance triples are now included directly in explain messages from
GraphRAG, DocumentRAG, and Agent services, eliminating the need for
follow-up knowledge graph queries to retrieve explainability details.

Each explain message in the response stream now carries:
- explain_id: root URI for this provenance step (unchanged)
- explain_graph: named graph where triples are stored (unchanged)
- explain_triples: the actual provenance triples for this step (new)

Changes across the stack:
- Schema: added explain_triples field to GraphRagResponse,
  DocumentRagResponse, and AgentResponse
- Services: all explain message call sites pass triples through
  (graph_rag, document_rag, agent react, agent orchestrator)
- Translators: encode explain_triples via TripleTranslator for
  gateway wire format
- Python SDK: ProvenanceEvent now includes parsed ExplainEntity
  and raw triples; expanded event_type detection
- CLI: invoke_graph_rag, invoke_agent, invoke_document_rag use
  inline entity when available, fall back to graph query
- Tech specs updated

Additional explainability test
2026-04-07 12:19:05 +01:00
Sreeram Venkatasubramanian
f0c9039b76 fix: reduce consumer poll timeout from 2000ms to 100ms 2026-04-07 12:02:27 +01:00
cybermaggedon
4acd853023
Config push notify pattern: replace stateful pub/sub with signal+ fetch (#760)
Replace the config push mechanism that broadcast the full config
blob on a 'state' class pub/sub queue with a lightweight notify
signal containing only the version number and affected config
types. Processors fetch the full config via request/response from
the config service when notified.

This eliminates the need for the pub/sub 'state' queue class and
stateful pub/sub services entirely. The config push queue moves
from 'state' to 'flow' class — a simple transient signal rather
than a retained message.  This solves the RabbitMQ
late-subscriber problem where restarting processes never received
the current config because their fresh queue had no historical
messages.

Key changes:
- ConfigPush schema: config dict replaced with types list
- Subscribe-then-fetch startup with retry: processors subscribe
  to notify queue, fetch config via request/response, then
  process buffered notifies with version comparison to avoid race
  conditions
- register_config_handler() accepts optional types parameter so
  handlers only fire when their config types change
- Short-lived config request/response clients to avoid subscriber
  contention on non-persistent response topics
- Config service passes affected types through put/delete/flow
  operations
- Gateway ConfigReceiver rewritten with same notify pattern and
  retry loop

Tests updated

New tests:
- register_config_handler: without types, with types, multiple
  types, multiple handlers
- on_config_notify: old/same version skipped, irrelevant types
  skipped (version still updated), relevant type triggers fetch,
  handler without types always called, mixed handler filtering,
  empty types invokes all, fetch failure handled gracefully
- fetch_config: returns config+version, raises on error response,
  stops client even on exception
- fetch_and_apply_config: applies to all handlers on startup,
  retries on failure
2026-04-06 16:57:27 +01:00
V.Sreeram
d4723566cb fix: prevent duplicate dispatcher creation race condition in invoke_global_service (#715)
* fix: prevent duplicate dispatcher creation race condition in invoke_global_service

Concurrent coroutines could all pass the `if key in self.dispatchers` check
before any of them wrote the result back, because `await dispatcher.start()`
yields to the event loop. This caused multiple Pulsar consumers to be created
on the same shared subscription, distributing responses round-robin and
dropping ~2/3 of them — manifesting as a permanent spinner in the Workbench UI.

Apply a double-checked asyncio.Lock in both `invoke_global_service` and
`invoke_flow_service` so only one dispatcher is ever created per service key.

* test: add concurrent-dispatch tests for race condition fix

Add asyncio.gather-based tests that verify invoke_global_service and
invoke_flow_service create exactly one dispatcher under concurrent calls,
preventing the duplicate Pulsar consumer bug.
2026-04-06 11:14:32 +01:00
Alex Jenkins
10a931f04c Feat: Auto-pull missing Ollama models (#757)
* fix deadlink in readme

Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu>

* feat: Auto-pull Ollama models

Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu>

* fix: Restore namespace __init__.py files for package resolution

Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu>

* fix CI

Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu>
2026-04-06 11:10:53 +01:00
cybermaggedon
d9dc4cbab5
SPARQL query service (#754)
SPARQL 1.1 query service wrapping pub/sub triples interface

Add a backend-agnostic SPARQL query service that parses SPARQL
queries using rdflib, decomposes them into triple pattern lookups
via the existing TriplesClient pub/sub interface, and performs
in-memory joins, filters, and projections.

Includes:
- SPARQL parser, algebra evaluator, expression evaluator, solution
  sequence operations (BGP, JOIN, OPTIONAL, UNION, FILTER, BIND,
  VALUES, GROUP BY, ORDER BY, LIMIT/OFFSET, DISTINCT, aggregates)
- FlowProcessor service with TriplesClientSpec
- Gateway dispatcher, request/response translators, API spec
- Python SDK method (FlowInstance.sparql_query)
- CLI command (tg-invoke-sparql-query)
- Tech spec (docs/tech-specs/sparql-query.md)

New unit tests for SPARQL query
2026-04-02 17:21:39 +01:00
cybermaggedon
24f0190ce7
RabbitMQ pub/sub backend with topic exchange architecture (#752)
Adds a RabbitMQ backend as an alternative to Pulsar, selectable via
PUBSUB_BACKEND=rabbitmq. Both backends implement the same PubSubBackend
protocol — no application code changes needed to switch.

RabbitMQ topology:
- Single topic exchange per topicspace (e.g. 'tg')
- Routing key derived from queue class and topic name
- Shared consumers: named queue bound to exchange (competing, round-robin)
- Exclusive consumers: anonymous auto-delete queue (broadcast, each gets
  every message). Used by Subscriber and config push consumer.
- Thread-local producer connections (pika is not thread-safe)
- Push-based consumption via basic_consume with process_data_events
  for heartbeat processing

Consumer model changes:
- Consumer class creates one backend consumer per concurrent task
  (required for pika thread safety, harmless for Pulsar)
- Consumer class accepts consumer_type parameter
- Subscriber passes consumer_type='exclusive' for broadcast semantics
- Config push consumer uses consumer_type='exclusive' so every
  processor instance receives config updates
- handle_one_from_queue receives consumer as parameter for correct
  per-connection ack/nack

LibrarianClient:
- New shared client class replacing duplicated librarian request-response
  code across 6+ services (chunking, decoders, RAG, etc.)
- Uses stream-document instead of get-document-content for fetching
  document content in 1MB chunks (avoids broker message size limits)
- Standalone object (self.librarian = LibrarianClient(...)) not a mixin
- get-document-content marked deprecated in schema and OpenAPI spec

Serialisation:
- Extracted dataclass_to_dict/dict_to_dataclass to shared
  serialization.py (used by both Pulsar and RabbitMQ backends)

Librarian queues:
- Changed from flow class (persistent) back to request/response class
  now that stream-document eliminates large single messages
- API upload chunk size reduced from 5MB to 3MB to stay under broker
  limits after base64 encoding

Factory and CLI:
- get_pubsub() handles 'rabbitmq' backend with RabbitMQ connection params
- add_pubsub_args() includes RabbitMQ options (host, port, credentials)
- add_pubsub_args(standalone=True) defaults to localhost for CLI tools
- init_trustgraph skips Pulsar admin setup for non-Pulsar backends
- tg-dump-queues and tg-monitor-prompts use backend abstraction
- BaseClient and ConfigClient accept generic pubsub config
2026-04-02 12:47:16 +01:00
cybermaggedon
4fb0b4d8e8
Pub/sub abstraction: decouple from Pulsar (#751)
Remove Pulsar-specific concepts from application code so that
the pub/sub backend is swappable via configuration.

Rename translators:
- to_pulsar/from_pulsar → decode/encode across all translator
  classes, dispatch handlers, and tests (55+ files)
- from_response_with_completion → encode_with_completion
- Remove pulsar.schema.Record from translator base class

Queue naming (CLASS:TOPICSPACE:TOPIC):
- Replace topic() helper with queue() using new format:
  flow:tg:name, request:tg:name, response:tg:name, state:tg:name
- Queue class implies persistence/TTL (no QoS in names)
- Update Pulsar backend map_topic() to parse new format
- Librarian queues use flow class (persistent, for chunking)
- Config push uses state class (persistent, last-value)
- Remove 15 dead topic imports from schema files
- Update init_trustgraph.py namespace: config → state

Confine Pulsar to pulsar_backend.py:
- Delete legacy PulsarClient class from pubsub.py
- Move add_args to add_pubsub_args() with standalone flag
  for CLI tools (defaults to localhost)
- PulsarBackendConsumer.receive() catches _pulsar.Timeout,
  raises standard TimeoutError
- Remove Pulsar imports from: async_processor, flow_processor,
  log_level, all 11 client files, 4 storage writers, gateway
  service, gateway config receiver
- Remove log_level/LoggerLevel from client API
- Rewrite tg-monitor-prompts to use backend abstraction
- Update tg-dump-queues to use add_pubsub_args

Also: pubsub-abstraction.md tech spec covering problem statement,
design goals, as-is requirements, candidate broker assessment,
approach, and implementation order.
2026-04-01 20:16:53 +01:00
cybermaggedon
dbf8daa74a Additional agent DAG tests (#750)
- test_agent_provenance.py: test_session_parent_uri,
  test_session_no_parent_uri, and 6 synthesis tests (types,
  single/multiple parents, document, label)
- test_on_action_callback.py: 3 tests — fires before tool, skipped
  for Final, works when None
- test_callback_message_id.py: 7 tests — message_id on think/observe/
  answer callbacks (streaming + non-streaming) and
  send_final_response
- test_parse_chunk_message_id.py (5 tests) - _parse_chunk propagates
  message_id for thought, observation, answer; handles missing
  gracefully
- test_explainability_parsing.py (+1) -
  test_dispatches_analysis_with_tooluse - Analysis+ToolUse mixin still
  dispatches to Analysis
- test_explainability.py (+1) -
  test_observation_found_via_subtrace_synthesis
- chain walker follows from sub-trace Synthesis to find Observation
  and
  Conclusion in correct order
- test_agent_provenance.py (+8) - session parent_uri (2), synthesis
  single/multiple parents, types, document, label (6)
2026-04-01 13:59:58 +01:00
cybermaggedon
153ae9ad30
Split Analysis into Analysis+ToolUse and Observation, add message_id (#747)
Refactor agent provenance so that the decision (thought + tool
selection) and the result (observation) are separate DAG entities:

  Question ← Analysis+ToolUse ← Observation ← ... ← Conclusion

Analysis gains tg:ToolUse as a mixin RDF type and is emitted
before tool execution via an on_action callback in react().
This ensures sub-traces (e.g. GraphRAG) appear after their
parent Analysis in the streaming event order.

Observation becomes a standalone prov:Entity with tg:Observation
type, emitted after tool execution. The linear DAG chain runs
through Observation — subsequent iterations and the Conclusion
derive from it, not from the Analysis.

message_id is populated on streaming AgentResponse for thought
and observation chunks, using the provenance URI of the entity
being built. This lets clients group streamed chunks by entity.

Wire changes:
- provenance/agent.py: Add ToolUse type, new
  agent_observation_triples(), remove observation from iteration
- agent_manager.py: Add on_action callback between reason() and
  tool execution
- orchestrator/pattern_base.py: Split emit, wire message_id,
  chain through observation URIs
- orchestrator/react_pattern.py: Emit Analysis via on_action
  before tool runs
- agent/react/service.py: Same for non-orchestrator path
- api/explainability.py: New Observation class, updated dispatch
  and chain walker
- api/types.py: Add message_id to AgentThought/AgentObservation
- cli: Render Observation separately, [analysis: tool] labels
2026-03-31 17:51:22 +01:00
cybermaggedon
816a8cfcf6
Update tests for agent-orchestrator (#745)
Add 96 tests covering the orchestrator's aggregation, provenance,
routing, and explainability parsing. These verify the supervisor
fan-out/fan-in lifecycle, the new RDF provenance types
(Decomposition, Finding, Plan, StepResult, Synthesis), and their
round-trip through the wire format.

Unit tests (84):
- Aggregator: register, record completion, peek, build synthesis,
  cleanup
- Provenance triple builders: types, provenance links,
  goals/steps, labels
- Explainability parsing: from_triples dispatch, field extraction
  for all new entity types, precedence over existing types
- PatternBase: is_subagent detection, emit_subagent_completion
  message shape
- Completion dispatch: detection logic, full aggregator
  integration flow, synthesis request not re-intercepted as
  completion
- MetaRouter: task type identification, pattern selection,
  valid_patterns constraints, fallback on LLM error or unknown
  response

Contract tests (12):
- Orchestration fields on AgentRequest round-trip correctly
- subagent-completion and synthesise step types in request
  history
- Plan steps with status and dependencies
- Provenance triple builder → wire format → from_triples
  round-trip for all five new entity types
2026-03-31 13:12:26 +01:00
cybermaggedon
849987f0e6
Add multi-pattern orchestrator with plan-then-execute and supervisor (#739)
Introduce an agent orchestrator service that supports three
execution patterns (ReAct, plan-then-execute, supervisor) with
LLM-based meta-routing to select the appropriate pattern and task
type per request. Update the agent schema to support
orchestration fields (correlation, sub-agents, plan steps) and
remove legacy response fields (answer, thought, observation).
2026-03-31 00:32:49 +01:00
CommitHu502Craft
7af1d60db8 fix(gateway): accept raw utf-8 text in text-load (#729)
Co-authored-by: nanqinhu <139929317+nanqinhu@users.noreply.github.com>
2026-03-30 17:00:10 +01:00
cybermaggedon
20204d87c3
Fix OpenAI compatibility issues for newer models and Azure config (#727)
Use max_completion_tokens for OpenAI and Azure OpenAI providers:
The OpenAI API deprecated max_tokens in favor of
max_completion_tokens for chat completions. Newer models
(gpt-4o, o1, o3) reject the old parameter with a 400 error.

AZURE_API_VERSION env var now overrides the default API version:
(falls back to 2024-12-01-preview).

Update tests to test for expected structures
2026-03-28 11:19:45 +00:00
cybermaggedon
a634520509
Fix websocket error responses in Mux dispatcher (#726)
Error responses from the websocket multiplexer were missing the
request ID and using a bare string format instead of the structured
error protocol. This caused clients to hang when a request failed
(e.g. unsupported service for a flow) because the error could not
be routed to the waiting caller.

Include request ID in all error paths, use structured error format
({message, type}) with complete flag, and extract the ID early in
receive() so even malformed requests get a routable error when
possible.

Updated tests - tests were coded against invalid protocol messages
2026-03-28 10:58:28 +00:00
cybermaggedon
5c6fe90fe2
Add universal document decoder with multi-format support (#705)
Add universal document decoder with multi-format support
using 'unstructured'.

New universal decoder service powered by the unstructured
library, handling DOCX, XLSX, PPTX, HTML, Markdown, CSV, RTF,
ODT, EPUB and more through a single service. Tables are preserved
as HTML markup for better downstream extraction. Images are
stored in the librarian but excluded from the text
pipeline. Configurable section grouping strategies
(whole-document, heading, element-type, count, size) for non-page
formats. Page-based formats (PDF, PPTX, XLSX) are automatically
grouped by page.

All four decoders (PDF, Mistral OCR, Tesseract OCR, universal)
now share the "document-decoder" ident so they are
interchangeable.  PDF-only decoders fetch document metadata to
check MIME type and gracefully skip unsupported formats.

Librarian changes: removed MIME type whitelist validation so any
document format can be ingested. Simplified routing so text/plain
goes to text-load and everything else goes to document-load.
Removed dual inline/streaming data paths — documents always use
document_id for content retrieval.

New provenance entity types (tg:Section, tg:Image) and metadata
predicates (tg:elementTypes, tg:tableCount, tg:imageCount) for
richer explainability.

Universal decoder is in its own package (trustgraph-unstructured)
and container image (trustgraph-unstructured).
2026-03-23 12:56:35 +00:00
cybermaggedon
96fd1eab15
Use UUID-based URNs for page and chunk IDs (#703)
Page and chunk document IDs were deterministic ({doc_id}/p{num},
{doc_id}/p{num}/c{num}), causing "Document already exists" errors
when reprocessing documents through different flows. Content may
differ between runs due to different parameters or extractors, so
deterministic IDs are incorrect.

Pages now use urn:page:{uuid}, chunks use
urn:chunk:{uuid}. Parent- child relationships are tracked via
librarian metadata and provenance triples.

Also brings Mistral OCR and Tesseract OCR decoders up to parity
with the PDF decoder: librarian fetch/save support, per-page
output with unique IDs, and provenance triple emission. Fixes
Mistral OCR bug where only the first 5 pages were processed.
2026-03-21 21:17:03 +00:00
cybermaggedon
c387670944
Fix incorrect property names in explainability (#698)
Remove type suffixes from explainability dataclass fields + fix show_explain_trace

Rename dataclass fields to match KG property naming conventions:
- Analysis: thought_uri/observation_uri → thought/observation
- Synthesis/Conclusion/Reflection: document_uri → document

Fix show_explain_trace for current API:
- Resolve document content via librarian fetch instead of removed
  inline content fields (synthesis.content, conclusion.answer)
- Add Grounding display for DocRAG traces
- Update fetch_docrag_trace chain: Question → Grounding → Exploration →
Synthesis
- Pass api/explain_client to all print functions for content resolution

Update all CLI tools and tests for renamed fields.
2026-03-16 14:47:37 +00:00
cybermaggedon
a115ec06ab
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O

GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
  kg-edge-scoring,
  kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
  provenance/explainability edges
- Add source document edges to knowledge graph

DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
  pattern:
  Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication

Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
  entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
  tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
cybermaggedon
29b4300808
Updated test suite for explainability & provenance (#696)
* Provenance tests

* Embeddings tests

* Test librarian

* Test triples stream

* Test concurrency

* Entity centric graph writes

* Agent tool service tests

* Structured data tests

* RDF tests

* Addition LLM tests

* Reliability tests
2026-03-13 14:27:42 +00:00
cybermaggedon
e6623fc915
Remove schema:subjectOf edges from KG extraction (#695)
The subjectOf triples were redundant with the subgraph provenance model
introduced in e8407b34. Entity-to-source lineage can be traced via
tg:contains -> subgraph -> prov:wasDerivedFrom -> chunk, making the
direct subjectOf edges unnecessary metadata polluting the knowledge graph.

Removed from all three extractors (agent, definitions, relationships),
cleaned up the SUBJECT_OF constant and vocabulary label, and updated
tests accordingly.
2026-03-13 12:11:21 +00:00
cybermaggedon
64e3f6bd0d
Subgraph provenance (#694)
Replace per-triple provenance reification with subgraph model

Extraction provenance previously created a full reification (statement
URI, activity, agent) for every single extracted triple, producing ~13
provenance triples per knowledge triple.  Since each chunk is processed
by a single LLM call, this was both redundant and semantically
inaccurate.

Now one subgraph object is created per chunk extraction, with
tg:contains linking to each extracted triple.  For 20 extractions from
a chunk this reduces provenance from ~260 triples to ~33.

- Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri
- Replace triple_provenance_triples() with subgraph_provenance_triples()
- Refactor kg-extract-definitions and kg-extract-relationships to
  generate provenance once per chunk instead of per triple
- Add subgraph provenance to kg-extract-ontology and kg-extract-agent
  (previously had none)
- Update CLI tools and tech specs to match

Also rename tg-show-document-hierarchy to tg-show-extraction-provenance.

Added extra typing for extraction provenance, fixed extraction prov CLI
2026-03-13 11:37:59 +00:00
cybermaggedon
35128ff019
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines

Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.

Explainability API:
- New explainability.py module with entity classes (Question,
  Exploration, Focus, Synthesis, Analysis, Conclusion) and
  ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
  fetching
- Content fetching from librarian with retry logic

CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
  explain_id
- tg-invoke-document-rag -x/--explainable flag returns
  explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types

Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
  references
- New predicates: tg:thoughtDocument, tg:observationDocument

DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references

Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
  observation_document_id

Update tests.
2026-03-12 21:40:09 +00:00
cybermaggedon
45e6ad4abc
Fix ontology RAG pipeline + add query concurrency (#691)
- Fix ontology RAG pipeline: embeddings API, chunker provenance, and query concurrency

- Fix ontology embeddings to use correct response shape from embed()
  API (returns list of vectors, not list of list of vectors).
- Simplify chunker URI logic to append /c{index} to parent ID
  instead of parsing page/doc URI structure which was fragile.

- Add provenance tracking and librarian integration to token
  chunker, matching recursive chunker capabilities.

- Add configurable concurrency (default 10) to Cassandra, Qdrant,
  and embeddings query services.
2026-03-12 11:34:42 +00:00
cybermaggedon
312174eb88
Adding explainability to the ReACT agent (#689)
* Added tech spec

* Add provenance recording to React agent loop

Enables agent sessions to be traced and debugged using the same
explainability infrastructure as GraphRAG. Agent traces record:
- Session start with query and timestamp
- Each iteration's thought, action, arguments, and observation
- Final answer with derivation chain

Changes:
- Add session_id and collection fields to AgentRequest schema
- Add agent predicates (TG_THOUGHT, TG_ACTION, etc.) to namespaces
- Create agent provenance triple generators in provenance/agent.py
- Register explainability producer in agent service
- Emit provenance triples during agent execution
- Update CLI tools to detect and render agent traces alongside GraphRAG

* Updated explainability taxonomy:

GraphRAG: tg:Question → tg:Exploration → tg:Focus → tg:Synthesis

Agent: tg:Question → tg:Analysis(s) → tg:Conclusion

All entities also have their PROV-O type (prov:Activity or prov:Entity).

Updated commit message:

Add provenance recording to React agent loop

Enables agent sessions to be traced and debugged using the same
explainability infrastructure as GraphRAG.

Entity types follow human reasoning patterns:
- tg:Question - the user's query (shared with GraphRAG)
- tg:Analysis - each think/act/observe cycle
- tg:Conclusion - the final answer

Also adds explicit TG types to GraphRAG entities:
- tg:Question, tg:Exploration, tg:Focus, tg:Synthesis

All types retain their PROV-O base types (prov:Activity, prov:Entity).

Changes:
- Add session_id and collection fields to AgentRequest schema
- Add explainability entity types to namespaces.py
- Create agent provenance triple generators
- Register explainability producer in agent service
- Emit provenance triples during agent execution
- Update CLI tools to detect and render both trace types

* Document RAG explainability is now complete. Here's a summary of the
changes made:

Schema Changes:
- trustgraph-base/trustgraph/schema/services/retrieval.py: Added
  explain_id and explain_graph fields to DocumentRagResponse
- trustgraph-base/trustgraph/messaging/translators/retrieval.py:
  Updated translator to handle explainability fields

Provenance Changes:
- trustgraph-base/trustgraph/provenance/namespaces.py: Added
  TG_CHUNK_COUNT and TG_SELECTED_CHUNK predicates
- trustgraph-base/trustgraph/provenance/uris.py: Added
  docrag_question_uri, docrag_exploration_uri, docrag_synthesis_uri
  generators
- trustgraph-base/trustgraph/provenance/triples.py: Added
  docrag_question_triples, docrag_exploration_triples,
  docrag_synthesis_triples builders
- trustgraph-base/trustgraph/provenance/__init__.py: Exported all
  new Document RAG functions and predicates

Service Changes:
- trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py:
  Added explainability callback support and triple emission at each
  phase (Question → Exploration → Synthesis)
- trustgraph-flow/trustgraph/retrieval/document_rag/rag.py:
  Registered explainability producer and wired up the callback

Documentation:
- docs/tech-specs/agent-explainability.md: Added Document RAG entity
  types and provenance model documentation

Document RAG Provenance Model:
Question (urn:trustgraph:docrag:{uuid})
    │
    │  tg:query, prov:startedAtTime
    │  rdf:type = prov:Activity, tg:Question
    │
    ↓ prov:wasGeneratedBy
    │
Exploration (urn:trustgraph:docrag:{uuid}/exploration)
    │
    │  tg:chunkCount, tg:selectedChunk (multiple)
    │  rdf:type = prov:Entity, tg:Exploration
    │
    ↓ prov:wasDerivedFrom
    │
Synthesis (urn:trustgraph:docrag:{uuid}/synthesis)
    │
    │  tg:content = "The answer..."
    │  rdf:type = prov:Entity, tg:Synthesis

* Specific subtype that makes the retrieval mechanism immediately
obvious:

System: GraphRAG
TG Types on Question: tg:Question, tg:GraphRagQuestion
URI Pattern: urn:trustgraph:question:{uuid}
────────────────────────────────────────
System: Document RAG
TG Types on Question: tg:Question, tg:DocRagQuestion
URI Pattern: urn:trustgraph:docrag:{uuid}
────────────────────────────────────────
System: Agent
TG Types on Question: tg:Question, tg:AgentQuestion
URI Pattern: urn:trustgraph:agent:{uuid}
Files modified:
- trustgraph-base/trustgraph/provenance/namespaces.py - Added
TG_GRAPH_RAG_QUESTION, TG_DOC_RAG_QUESTION, TG_AGENT_QUESTION
- trustgraph-base/trustgraph/provenance/triples.py - Added subtype to
question_triples and docrag_question_triples
- trustgraph-base/trustgraph/provenance/agent.py - Added subtype to
agent_session_triples
- trustgraph-base/trustgraph/provenance/__init__.py - Exported new types
- docs/tech-specs/agent-explainability.md - Documented the subtypes

This allows:
- Query all questions: ?q rdf:type tg:Question
- Query only GraphRAG: ?q rdf:type tg:GraphRagQuestion
- Query only Document RAG: ?q rdf:type tg:DocRagQuestion
- Query only Agent: ?q rdf:type tg:AgentQuestion

* Fixed tests
2026-03-11 15:28:15 +00:00
cybermaggedon
286f762369
The id field in pipeline Metadata was being overwritten at each processing (#686)
The id field in pipeline Metadata was being overwritten at each processing
stage (document → page → chunk), causing knowledge storage to create
separate cores per chunk instead of grouping by document.

Add a root field that:
- Is set by librarian to the original document ID
- Is copied unchanged through PDF decoder, chunkers, and extractors
- Is used by knowledge storage for document_id grouping (with fallback to id)

Changes:
- Add root field to Metadata schema with empty string default
- Set root=document.id in librarian when initiating document processing
- Copy root through PDF decoder, recursive chunker, and all extractors
- Update knowledge storage to use root (or id as fallback) for grouping
- Add root handling to translators and gateway serialization
- Update test mock Metadata class to include root parameter
2026-03-11 12:16:39 +00:00
cybermaggedon
aa4f5c6c00
Remove redundant metadata (#685)
The metadata field (list of triples) in the pipeline Metadata class
was redundant. Document metadata triples already flow directly from
librarian to triple-store via emit_document_provenance() - they don't
need to pass through the extraction pipeline.

Additionally, chunker and PDF decoder were overwriting metadata to []
anyway, so any metadata passed through the pipeline was being
discarded.

Changes:
- Remove metadata field from Metadata dataclass
  (schema/core/metadata.py)
- Update all Metadata instantiations to remove metadata=[]
  parameter
- Remove metadata handling from translators (document_loading,
  knowledge)
- Remove metadata consumption from extractors (ontology, agent)
- Update gateway serializers and import handlers
- Update all unit, integration, and contract tests
2026-03-11 10:51:39 +00:00
cybermaggedon
e1bc4c04a4
Terminology Rename, and named-graphs for explainability (#682)
Terminology Rename, and named-graphs for explainability data

Changed terminology:
  - session -> question
  - retrieval -> exploration
  - selection -> focus
  - answer -> synthesis

- uris.py: Renamed query_session_uri → question_uri,
  retrieval_uri → exploration_uri, selection_uri → focus_uri,
  answer_uri → synthesis_uri
- triples.py: Renamed corresponding triple generation functions with
  updated labels ("GraphRAG question", "Exploration", "Focus",
  "Synthesis")
- namespaces.py: Added named graph constants GRAPH_DEFAULT,
  GRAPH_SOURCE, GRAPH_RETRIEVAL
- init.py: Updated exports
- graph_rag.py: Updated to use new terminology
- invoke_graph_rag.py: Updated CLI to display new stage names
  (Question, Exploration, Focus, Synthesis)

Query-Time Explainability → Named Graph
- triples.py: Added set_graph() helper function to set named graph
  on triples
- graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL
  named graph
- rag.py: Explainability triples stored in user's collection (not
  separate collection) with named graph

Extraction Provenance → Named Graph
- relationships/extract.py: Provenance triples use GRAPH_SOURCE
  named graph
- definitions/extract.py: Provenance triples use GRAPH_SOURCE
  named graph
- chunker.py: Provenance triples use GRAPH_SOURCE named graph
- pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph

CLI Updates
- show_graph.py: Added -g/--graph option to filter by named graph and
  --show-graph to display graph column

Also:
- Fix knowledge core schemas
2026-03-10 14:35:21 +00:00
cybermaggedon
57eda65674
Knowledge core processing updated for embeddings interface change (#681)
Knowledge core fixed: 
- trustgraph-flow/trustgraph/tables/knowledge.py - v.vector, v.chunk_id
- trustgraph-base/trustgraph/messaging/translators/document_loading.py -
  chunk.vector
- trustgraph-base/trustgraph/messaging/translators/knowledge.py -
  entity.vector
- trustgraph-flow/trustgraph/gateway/dispatch/serialize.py - entity.vector,
  chunk.vector

Test fixtures fixed:
- tests/unit/test_storage/conftest.py - All mock entities/chunks use vector
- tests/unit/test_query/conftest.py - All mock requests use vector
- tests/unit/test_query/test_doc_embeddings_pinecone_query.py - All mock
  messages use vector

These changes align with commit f2ae0e86 which changed the schema from
vectors: list[list[float]] to vector: list[float].
2026-03-10 13:28:16 +00:00
cybermaggedon
84941ce645
Fix Cassandra schema and graph filter semantics (#680)
Schema fix (dtype/lang clustering key):
- Add dtype and lang to PRIMARY KEY in quads_by_entity table
- Add otype, dtype, lang to PRIMARY KEY in quads_by_collection table
- Fixes deduplication bug where literals with same value but different
  datatype or language tag were collapsed (e.g., "thing" vs "thing"@en)
- Update delete_collection to pass new clustering columns
- Update tech spec to reflect new schema

Graph filter semantics (simplified, no wildcard constant):
- g=None means all graphs (no filter)
- g="" means default graph only
- g="uri" means specific named graph
- Remove GRAPH_WILDCARD usage from EntityCentricKnowledgeGraph
- Fix service.py streaming and non-streaming paths
- Fix CLI to preserve empty string for -g '' argument
2026-03-10 12:52:51 +00:00
cybermaggedon
7a6197d8c3
GraphRAG Query-Time Explainability (#677)
Implements full explainability pipeline for GraphRAG queries, enabling
traceability from answers back to source documents.

Renamed throughout for clarity:
- provenance_callback → explain_callback
- provenance_id → explain_id
- provenance_collection → explain_collection
- message_type "provenance" → "explain"
- Queue name "provenance" → "explainability"

GraphRAG queries now emit explainability events as they execute:
1. Session - query text and timestamp
2. Retrieval - edges retrieved from subgraph
3. Selection - selected edges with LLM reasoning (JSONL with id +
   reasoning)
4. Answer - reference to synthesized response

Events stream via explain_callback during query(), enabling
real-time UX.

- Answers stored in librarian service (not inline in graph - too large)
- Document ID as URN: urn:trustgraph:answer:{session_id}
- Graph stores tg:document reference (IRI) to librarian document
- Added librarian producer/consumer to graph-rag service

- get_labelgraph() now returns (labeled_edges, uri_map)
- uri_map maps edge_id(label_s, label_p, label_o) →
  (uri_s, uri_p, uri_o)
- Explainability data stores original URIs, not labels
- Enables tracing edges back to reifying statements via tg:reifies

- Added serialize_triple() to query service (matches storage format)
- get_term_value() now handles TRIPLE type terms
- Enables querying by quoted triple in object position:
  ?stmt tg:reifies <<s p o>>

- Displays real-time explainability events during query
- Resolves rdfs:label for edge components (s, p, o)
- Traces source chain via prov:wasDerivedFrom to root document
- Output: "Source: Chunk 1 → Page 2 → Document Title"
- Label caching to avoid repeated queries

GraphRagResponse:
- explain_id: str | None
- explain_collection: str | None
- message_type: str ("chunk" or "explain")
- end_of_session: bool

trustgraph-base/trustgraph/provenance/:
- namespaces.py - Added TG_DOCUMENT predicate
- triples.py - answer_triples() supports document_id reference
- uris.py - Added edge_selection_uri()

trustgraph-base/trustgraph/schema/services/retrieval.py:
- GraphRagResponse with explain_id, explain_collection, end_of_session

trustgraph-flow/trustgraph/retrieval/graph_rag/:
- graph_rag.py - URI preservation, streaming answer accumulation
- rag.py - Librarian integration, real-time explain emission

trustgraph-flow/trustgraph/query/triples/cassandra/service.py:
- Quoted triple serialization for query matching

trustgraph-cli/trustgraph/cli/invoke_graph_rag.py:
- Full explainability display with label resolution and source tracing
2026-03-10 10:00:01 +00:00
cybermaggedon
d2d71f859d
Feature/streaming triples (#676)
* Steaming triples

* Also GraphRAG service uses this

* Updated tests
2026-03-09 15:46:33 +00:00
cybermaggedon
f2ae0e8623
Embeddings API scores (#671)
- Put scores in all responses
- Remove unused 'middle' vector layer. Vector of texts -> vector of (vector embedding)
2026-03-09 10:53:44 +00:00