trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-04-25 16:36:21 +02:00

Author	SHA1	Message	Date
cybermaggedon	e8bc96ef7e	Release/v2.3 -> master	2026-04-17 09:09:22 +01:00
Alex Jenkins	8954fa3ad7	Feat: TrustGraph i18n & Documentation Translation Updates (#781) Native CLI i18n: The TrustGraph CLI has built-in translation support that dynamically loads language strings. You can test and use different languages by simply passing the --lang flag (e.g., --lang es for Spanish, --lang ru for Russian) or by configuring your environment's LANG variable. Automated Docs Translations: This PR introduces autonomously translated Markdown documentation into several target languages, including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew, Arabic, Simplified Chinese, and Russian.	2026-04-14 12:08:32 +01:00
cybermaggedon	d2751553a3	Add agent explainability instrumentation and unify envelope field naming (#795) Addresses recommendations from the UX developer's agent experience report. Adds provenance predicates, DAG structure changes, error resilience, and a published OWL ontology. Explainability additions: - Tool candidates: tg:toolCandidate on Analysis events lists the tools visible to the LLM for each iteration (names only, descriptions in config) - Termination reason: tg:terminationReason on Conclusion/Synthesis events (final-answer, plan-complete, subagents-complete) - Step counter: tg:stepNumber on iteration events - Pattern decision: new tg:PatternDecision entity in the DAG between session and first iteration, carrying tg:pattern and tg:taskType - Latency: tg:llmDurationMs on Analysis events, tg:toolDurationMs on Observation events - Token counts on events: tg:inToken/tg:outToken/tg:llmModel on Grounding, Focus, Synthesis, and Analysis events - Tool/parse errors: tg:toolError on Observation events with tg:Error mixin type. Parse failures return as error observations instead of crashing the agent, giving it a chance to retry. Envelope unification: - Rename chunk_type to message_type across AgentResponse schema, translator, SDK types, socket clients, CLI, and all tests. Agent and RAG services now both use message_type on the wire. Ontology: - specs/ontology/trustgraph.ttl - OWL vocabulary covering all 26 classes, 7 object properties, and 36+ datatype properties including new predicates. DAG structure tests: - tests/unit/test_provenance/test_dag_structure.py verifies the wasDerivedFrom chain for GraphRAG, DocumentRAG, and all three agent patterns (react, plan, supervisor) including the pattern-decision link.	2026-04-13 16:16:42 +01:00
cybermaggedon	c23e28aa66	Fix Metadata/EntityEmbeddings schema migration tail and add regression tests (#777) The Metadata dataclass dropped its `metadata: list[Triple]` field and EntityEmbeddings/ChunkEmbeddings settled on a singular `vector: list[float]` field, but several call sites kept passing `Metadata(metadata=...)` and `EntityEmbeddings(vectors=...)`. The bugs were latent until a websocket client first hit `/api/v1/flow/default/import/entity-contexts`, at which point the dispatcher TypeError'd on construction. Production fixes (5 call sites on the same migration tail): * trustgraph-flow gateway dispatchers entity_contexts_import.py and graph_embeddings_import.py - drop the stale Metadata(metadata=...) kwarg; switch graph_embeddings_import to the singular `vector` wire key. * trustgraph-base messaging translators knowledge.py and document_loading.py - fix decode side to read the singular `"vector"` key, matching what their own encode sides have always written. * trustgraph-flow tables/knowledge.py - fix Cassandra row deserialiser to construct EntityEmbeddings(vector=...) instead of vectors=. * trustgraph-flow gateway core_import/core_export - switch the kg-core msgpack wire format to the singular `"v"`/`"vector"` key and drop the dead `m["m"]` envelope field that referenced the removed Metadata.metadata triples list (it was a guaranteed KeyError on the export side). Defense-in-depth regression coverage (32 new tests across 7 files): * tests/contract/test_schema_field_contracts.py - pin the field set of Metadata, EntityEmbeddings, ChunkEmbeddings, EntityContext so any future schema rename fails CI loudly with a clear diff. * tests/unit/test_translators/test_knowledge_translator_roundtrip.py and test_document_embeddings_translator_roundtrip.py - encode→decode round-trip the affected translators end to end, locking in the singular `"vector"` wire key. * tests/unit/test_gateway/test_entity_contexts_import_dispatcher.py and test_graph_embeddings_import_dispatcher.py - exercise the websocket dispatchers' receive() path with realistic payloads, the direct regression test for the original production crash. * tests/unit/test_gateway/test_core_import_export_roundtrip.py - pack/unpack the kg-core msgpack format through the real dispatcher classes (with KnowledgeRequestor mocked), including a full export→import round-trip. * tests/unit/test_tables/test_knowledge_table_store.py - exercise the Cassandra row → schema conversion via __new__ to bypass the live cluster connection. Also fixes an unrelated leaked-coroutine RuntimeWarning in test_gateway/test_service.py::test_run_method_calls_web_run_app: the mocked aiohttp.web.run_app now closes the coroutine that Api.run() hands it, mirroring what the real run_app would do, instead of leaving it for the GC to complain about.	2026-04-10 20:43:45 +01:00
cybermaggedon	feeb92b33f	Refactor: Derive consumer behaviour from queue class (#772) Derive consumer behaviour from queue class, remove consumer_type parameter The queue class prefix (flow, request, response, notify) now fully determines consumer behaviour in both RabbitMQ and Pulsar backends. Added 'notify' class for ephemeral broadcast (config push notifications). Response and notify classes always create per-subscriber auto-delete queues, eliminating orphaned queues that accumulated on service restarts. Change init-trustgraph to set up the 'notify' namespace in Pulsar instead of old hangover 'state'. Fixes 'stuck backlog' on RabbitMQ config notification queue.	2026-04-09 09:55:41 +01:00
cybermaggedon	ddd4bd7790	Deliver explainability triples inline in retrieval response stream (#763) Provenance triples are now included directly in explain messages from GraphRAG, DocumentRAG, and Agent services, eliminating the need for follow-up knowledge graph queries to retrieve explainability details. Each explain message in the response stream now carries: - explain_id: root URI for this provenance step (unchanged) - explain_graph: named graph where triples are stored (unchanged) - explain_triples: the actual provenance triples for this step (new) Changes across the stack: - Schema: added explain_triples field to GraphRagResponse, DocumentRagResponse, and AgentResponse - Services: all explain message call sites pass triples through (graph_rag, document_rag, agent react, agent orchestrator) - Translators: encode explain_triples via TripleTranslator for gateway wire format - Python SDK: ProvenanceEvent now includes parsed ExplainEntity and raw triples; expanded event_type detection - CLI: invoke_graph_rag, invoke_agent, invoke_document_rag use inline entity when available, fall back to graph query - Tech specs updated Additional explainability test	2026-04-07 12:19:05 +01:00
cybermaggedon	4acd853023	Config push notify pattern: replace stateful pub/sub with signal+fetch (#760) Replace the config push mechanism that broadcast the full config blob on a 'state' class pub/sub queue with a lightweight notify signal containing only the version number and affected config types. Processors fetch the full config via request/response from the config service when notified. This eliminates the need for the pub/sub 'state' queue class and stateful pub/sub services entirely. The config push queue moves from 'state' to 'flow' class - a simple transient signal rather than a retained message. This solves the RabbitMQ late-subscriber problem where restarting processes never received the current config because their fresh queue had no historical messages. Key changes: - ConfigPush schema: config dict replaced with types list - Subscribe-then-fetch startup with retry: processors subscribe to notify queue, fetch config via request/response, then process buffered notifies with version comparison to avoid race conditions - register_config_handler() accepts optional types parameter so handlers only fire when their config types change - Short-lived config request/response clients to avoid subscriber contention on non-persistent response topics - Config service passes affected types through put/delete/flow operations - Gateway ConfigReceiver rewritten with same notify pattern and retry loop Tests updated New tests: - register_config_handler: without types, with types, multiple types, multiple handlers - on_config_notify: old/same version skipped, irrelevant types skipped (version still updated), relevant type triggers fetch, handler without types always called, mixed handler filtering, empty types invokes all, fetch failure handled gracefully - fetch_config: returns config+version, raises on error response, stops client even on exception - fetch_and_apply_config: applies to all handlers on startup, retries on failure	2026-04-06 16:57:27 +01:00
V.Sreeram	d4723566cb	fix: prevent duplicate dispatcher creation race condition in invoke_global_service (#715) * fix: prevent duplicate dispatcher creation race condition in invoke_global_service Concurrent coroutines could all pass the `if key in self.dispatchers` check before any of them wrote the result back, because `await dispatcher.start()` yields to the event loop. This caused multiple Pulsar consumers to be created on the same shared subscription, distributing responses round-robin and dropping ~2/3 of them - manifesting as a permanent spinner in the Workbench UI. Apply a double-checked asyncio.Lock in both `invoke_global_service` and `invoke_flow_service` so only one dispatcher is ever created per service key. * test: add concurrent-dispatch tests for race condition fix Add asyncio.gather-based tests that verify invoke_global_service and invoke_flow_service create exactly one dispatcher under concurrent calls, preventing the duplicate Pulsar consumer bug.	2026-04-06 11:14:32 +01:00
cybermaggedon	4fb0b4d8e8	Pub/sub abstraction: decouple from Pulsar (#751) Remove Pulsar-specific concepts from application code so that the pub/sub backend is swappable via configuration. Rename translators: - to_pulsar/from_pulsar → decode/encode across all translator classes, dispatch handlers, and tests (55+ files) - from_response_with_completion → encode_with_completion - Remove pulsar.schema.Record from translator base class Queue naming (CLASS:TOPICSPACE:TOPIC): - Replace topic() helper with queue() using new format: flow:tg:name, request:tg:name, response:tg:name, state:tg:name - Queue class implies persistence/TTL (no QoS in names) - Update Pulsar backend map_topic() to parse new format - Librarian queues use flow class (persistent, for chunking) - Config push uses state class (persistent, last-value) - Remove 15 dead topic imports from schema files - Update init_trustgraph.py namespace: config → state Confine Pulsar to pulsar_backend.py: - Delete legacy PulsarClient class from pubsub.py - Move add_args to add_pubsub_args() with standalone flag for CLI tools (defaults to localhost) - PulsarBackendConsumer.receive() catches _pulsar.Timeout, raises standard TimeoutError - Remove Pulsar imports from: async_processor, flow_processor, log_level, all 11 client files, 4 storage writers, gateway service, gateway config receiver - Remove log_level/LoggerLevel from client API - Rewrite tg-monitor-prompts to use backend abstraction - Update tg-dump-queues to use add_pubsub_args Also: pubsub-abstraction.md tech spec covering problem statement, design goals, as-is requirements, candidate broker assessment, approach, and implementation order.	2026-04-01 20:16:53 +01:00
CommitHu502Craft	7af1d60db8	fix(gateway): accept raw utf-8 text in text-load (#729) Co-authored-by: nanqinhu <139929317+nanqinhu@users.noreply.github.com>	2026-03-30 17:00:10 +01:00
cybermaggedon	a634520509	Fix websocket error responses in Mux dispatcher (#726) Error responses from the websocket multiplexer were missing the request ID and using a bare string format instead of the structured error protocol. This caused clients to hang when a request failed (e.g. unsupported service for a flow) because the error could not be routed to the waiting caller. Include request ID in all error paths, use structured error format ({message, type}) with complete flag, and extract the ID early in receive() so even malformed requests get a routable error when possible. Updated tests - tests were coded against invalid protocol messages	2026-03-28 10:58:28 +00:00
cybermaggedon	aa4f5c6c00	Remove redundant metadata (#685) The metadata field (list of triples) in the pipeline Metadata class was redundant. Document metadata triples already flow directly from librarian to triple-store via emit_document_provenance() - they don't need to pass through the extraction pipeline. Additionally, chunker and PDF decoder were overwriting metadata to [] anyway, so any metadata passed through the pipeline was being discarded. Changes: - Remove metadata field from Metadata dataclass (schema/core/metadata.py) - Update all Metadata instantiations to remove metadata=[] parameter - Remove metadata handling from translators (document_loading, knowledge) - Remove metadata consumption from extractors (ontology, agent) - Update gateway serializers and import handlers - Update all unit, integration, and contract tests	2026-03-11 10:51:39 +00:00
cybermaggedon	7a6197d8c3	GraphRAG Query-Time Explainability (#677) Implements full explainability pipeline for GraphRAG queries, enabling traceability from answers back to source documents. Renamed throughout for clarity: - provenance_callback → explain_callback - provenance_id → explain_id - provenance_collection → explain_collection - message_type "provenance" → "explain" - Queue name "provenance" → "explainability" GraphRAG queries now emit explainability events as they execute: 1. Session - query text and timestamp 2. Retrieval - edges retrieved from subgraph 3. Selection - selected edges with LLM reasoning (JSONL with id + reasoning) 4. Answer - reference to synthesized response Events stream via explain_callback during query(), enabling real-time UX. - Answers stored in librarian service (not inline in graph - too large) - Document ID as URN: urn:trustgraph:answer:{session_id} - Graph stores tg:document reference (IRI) to librarian document - Added librarian producer/consumer to graph-rag service - get_labelgraph() now returns (labeled_edges, uri_map) - uri_map maps edge_id(label_s, label_p, label_o) → (uri_s, uri_p, uri_o) - Explainability data stores original URIs, not labels - Enables tracing edges back to reifying statements via tg:reifies - Added serialize_triple() to query service (matches storage format) - get_term_value() now handles TRIPLE type terms - Enables querying by quoted triple in object position: ?stmt tg:reifies <<s p o>> - Displays real-time explainability events during query - Resolves rdfs:label for edge components (s, p, o) - Traces source chain via prov:wasDerivedFrom to root document - Output: "Source: Chunk 1 → Page 2 → Document Title" - Label caching to avoid repeated queries GraphRagResponse: - explain_id: str | None - explain_collection: str | None - message_type: str ("chunk" or "explain") - end_of_session: bool trustgraph-base/trustgraph/provenance/: - namespaces.py - Added TG_DOCUMENT predicate - triples.py - answer_triples() supports document_id reference - uris.py - Added edge_selection_uri() trustgraph-base/trustgraph/schema/services/retrieval.py: - GraphRagResponse with explain_id, explain_collection, end_of_session trustgraph-flow/trustgraph/retrieval/graph_rag/: - graph_rag.py - URI preservation, streaming answer accumulation - rag.py - Librarian integration, real-time explain emission trustgraph-flow/trustgraph/query/triples/cassandra/service.py: - Quoted triple serialization for query matching trustgraph-cli/trustgraph/cli/invoke_graph_rag.py: - Full explainability display with label resolution and source tracing	2026-03-10 10:00:01 +00:00
cybermaggedon	1809c1f56d	Structured data 2 (#645) * Structured data refactor - multi-index tables, remove need for manual mods to the Cassandra tables * Tech spec updated to track implementation	2026-02-23 15:56:29 +00:00
cybermaggedon	cf0daedefa	Changed schema for Value -> Term, majorly breaking change (#622) * Changed schema for Value -> Term, majorly breaking change * Following the schema change, Value -> Term into all processing * Updated Cassandra for g, p, s, o index patterns (7 indexes) * Reviewed and updated all tests * Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down	2026-01-27 13:48:08 +00:00
cybermaggedon	62b754d788	Fix flow loading (#611)	2026-01-14 16:23:15 +00:00
cybermaggedon	53cf5fd7f9	Fix test async warnings (#601) * Fix tracemalloc async warnings * Comment out debug, left in for use if needed	2026-01-06 22:09:34 +00:00
cybermaggedon	f79d0603f7	Update to add streaming tests (#600)	2026-01-06 21:48:05 +00:00
cybermaggedon	5304f96fe6	Fix tests (#593) * Fix unit/integration/contract tests which were broken by messaging fabric work	2025-12-19 08:53:21 +00:00
cybermaggedon	ba95fa226b	Gateway queue overrides (#584)	2025-12-06 11:01:20 +00:00
cybermaggedon	0b7620bc04	Object batching (#499) * Object batching * Update tests	2025-09-05 15:59:06 +01:00
cybermaggedon	257a7951a7	Object import (#497) * Object import dispatcher * Add object import gateway test	2025-09-05 14:06:01 +01:00
cybermaggedon	96c2b73457	Fix import export graceful shutdown (#476) * Tech spec for graceful shutdown * Graceful shutdown of importers/exporters * Update socket to include graceful shutdown orchestration * Adding tests for conditions tracked in this PR	2025-08-28 13:39:28 +01:00
cybermaggedon	2f7fddd206	Test suite executed from CI pipeline (#433) * Test strategy & test cases * Unit tests * Integration tests	2025-07-14 14:57:44 +01:00

24 commits