trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-04-25 16:36:21 +02:00

Author	SHA1	Message	Date
cybermaggedon	9f84891fcc	Flow service lifecycle management (#822 ) feat: separate flow service from config service with explicit queue lifecycle management The flow service is now an independent service that owns the lifecycle of flow and blueprint queues. System services own their own queues. Consumers never create queues. Flow service separation: - New service at trustgraph-flow/trustgraph/flow/service/ - Uses async ConfigClient (RequestResponse pattern) to talk to config service - Config service stripped of all flow handling Queue lifecycle management: - PubSubBackend protocol gains create_queue, delete_queue, queue_exists, ensure_queue — all async - RabbitMQ: implements via pika with asyncio.to_thread internally - Pulsar: stubs for future admin REST API implementation - Consumer _connect() no longer creates queues (passive=True for named queues) - System services call ensure_queue on startup - Flow service creates queues on flow start, deletes on flow stop - Flow service ensures queues for pre-existing flows on startup Two-phase flow stop: - Phase 1: set flow status to "stopping", delete processor config entries - Phase 2: retry queue deletion, then delete flow record Config restructure: - active-flow config replaced with processor:{name} types - Each processor has its own config type, each flow variant is a key - Flow start/stop use batch put/delete — single config push per operation - FlowProcessor subscribes to its own type only Blueprint format: - Processor entries split into topics and parameters dicts - Flow interfaces use {"flow": "topic"} instead of bare strings - Specs (ConsumerSpec, ProducerSpec, etc.) read from definition["topics"] Tests updated	2026-04-16 17:19:39 +01:00
cybermaggedon	ce3c8b421b	Remove Pulsar healthcheck from tg-verify-system-health (#809 ) The Pulsar healthcheck was removed from tg-verify-system-health and re-added accidentally in a PR. This removes it again.	2026-04-14 16:12:24 +01:00
Alex Jenkins	8954fa3ad7	Feat: TrustGraph i18n & Documentation Translation Updates (#781 ) Native CLI i18n: The TrustGraph CLI has built-in translation support that dynamically loads language strings. You can test and use different languages by simply passing the --lang flag (e.g., --lang es for Spanish, --lang ru for Russian) or by configuring your environment's LANG variable. Automated Docs Translations: This PR introduces autonomously translated Markdown documentation into several target languages, including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew, Arabic, Simplified Chinese, and Russian.	2026-04-14 12:08:32 +01:00
cybermaggedon	d2751553a3	Add agent explainability instrumentation and unify envelope field naming (#795 ) Addresses recommendations from the UX developer's agent experience report. Adds provenance predicates, DAG structure changes, error resilience, and a published OWL ontology. Explainability additions: - Tool candidates: tg:toolCandidate on Analysis events lists the tools visible to the LLM for each iteration (names only, descriptions in config) - Termination reason: tg:terminationReason on Conclusion/Synthesis events (final-answer, plan-complete, subagents-complete) - Step counter: tg:stepNumber on iteration events - Pattern decision: new tg:PatternDecision entity in the DAG between session and first iteration, carrying tg:pattern and tg:taskType - Latency: tg:llmDurationMs on Analysis events, tg:toolDurationMs on Observation events - Token counts on events: tg:inToken/tg:outToken/tg:llmModel on Grounding, Focus, Synthesis, and Analysis events - Tool/parse errors: tg:toolError on Observation events with tg:Error mixin type. Parse failures return as error observations instead of crashing the agent, giving it a chance to retry. Envelope unification: - Rename chunk_type to message_type across AgentResponse schema, translator, SDK types, socket clients, CLI, and all tests. Agent and RAG services now both use message_type on the wire. Ontology: - specs/ontology/trustgraph.ttl — OWL vocabulary covering all 26 classes, 7 object properties, and 36+ datatype properties including new predicates. DAG structure tests: - tests/unit/test_provenance/test_dag_structure.py verifies the wasDerivedFrom chain for GraphRAG, DocumentRAG, and all three agent patterns (react, plan, supervisor) including the pattern-decision link.	2026-04-13 16:16:42 +01:00
cybermaggedon	14e49d83c7	Expose LLM token usage across all service layers (#782 ) Expose LLM token usage (in_token, out_token, model) across all service layers Propagate token counts from LLM services through the prompt, text-completion, graph-RAG, document-RAG, and agent orchestrator pipelines to the API gateway and Python SDK. All fields are Optional — None means "not available", distinguishing from a real zero count. Key changes: - Schema: Add in_token/out_token/model to TextCompletionResponse, PromptResponse, GraphRagResponse, DocumentRagResponse, AgentResponse - TextCompletionClient: New TextCompletionResult return type. Split into text_completion() (non-streaming) and text_completion_stream() (streaming with per-chunk handler callback) - PromptClient: New PromptResult with response_type (text/json/jsonl), typed fields (text/object/objects), and token usage. All callers updated. - RAG services: Accumulate token usage across all prompt calls (extract-concepts, edge-scoring, edge-reasoning, synthesis). Non-streaming path sends single combined response instead of chunk + end_of_session. - Agent orchestrator: UsageTracker accumulates tokens across meta-router, pattern prompt calls, and react reasoning. Attached to end_of_dialog. - Translators: Encode token fields when not None (is not None, not truthy) - Python SDK: RAG and text-completion methods return TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with token fields (streaming) - CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt, tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent	2026-04-13 14:38:34 +01:00
cybermaggedon	67cfa80836	SPARQL CLI reports errors from service (#794 ) SPARQL query CLI ignores errors from the SPARQL service and just emits a zero row output. This change causes an error to be reported	2026-04-13 14:31:33 +01:00
cybermaggedon	0994d4b05f	Open 2.3 release branch (#775 ) * Update packages and CI for new release branch	2026-04-10 14:42:19 +01:00
cybermaggedon	feeb92b33f	Refactor: Derive consumer behaviour from queue class (#772 ) Derive consumer behaviour from queue class, remove consumer_type parameter The queue class prefix (flow, request, response, notify) now fully determines consumer behaviour in both RabbitMQ and Pulsar backends. Added 'notify' class for ephemeral broadcast (config push notifications). Response and notify classes always create per-subscriber auto-delete queues, eliminating orphaned queues that accumulated on service restarts. Change init-trustgraph to set up the 'notify' namespace in Pulsar instead of old hangover 'state'. Fixes 'stuck backlog' on RabbitMQ config notification queue.	2026-04-09 09:55:41 +01:00
cybermaggedon	ddd4bd7790	Deliver explainability triples inline in retrieval response stream (#763 ) Provenance triples are now included directly in explain messages from GraphRAG, DocumentRAG, and Agent services, eliminating the need for follow-up knowledge graph queries to retrieve explainability details. Each explain message in the response stream now carries: - explain_id: root URI for this provenance step (unchanged) - explain_graph: named graph where triples are stored (unchanged) - explain_triples: the actual provenance triples for this step (new) Changes across the stack: - Schema: added explain_triples field to GraphRagResponse, DocumentRagResponse, and AgentResponse - Services: all explain message call sites pass triples through (graph_rag, document_rag, agent react, agent orchestrator) - Translators: encode explain_triples via TripleTranslator for gateway wire format - Python SDK: ProvenanceEvent now includes parsed ExplainEntity and raw triples; expanded event_type detection - CLI: invoke_graph_rag, invoke_agent, invoke_document_rag use inline entity when available, fall back to graph query - Tech specs updated Additional explainability test	2026-04-07 12:19:05 +01:00
cybermaggedon	ee65d90fdd	SPARQL service supports batching/streaming (#755 )	2026-04-02 17:54:07 +01:00
cybermaggedon	d9dc4cbab5	SPARQL query service (#754 ) SPARQL 1.1 query service wrapping pub/sub triples interface Add a backend-agnostic SPARQL query service that parses SPARQL queries using rdflib, decomposes them into triple pattern lookups via the existing TriplesClient pub/sub interface, and performs in-memory joins, filters, and projections. Includes: - SPARQL parser, algebra evaluator, expression evaluator, solution sequence operations (BGP, JOIN, OPTIONAL, UNION, FILTER, BIND, VALUES, GROUP BY, ORDER BY, LIMIT/OFFSET, DISTINCT, aggregates) - FlowProcessor service with TriplesClientSpec - Gateway dispatcher, request/response translators, API spec - Python SDK method (FlowInstance.sparql_query) - CLI command (tg-invoke-sparql-query) - Tech spec (docs/tech-specs/sparql-query.md) New unit tests for SPARQL query	2026-04-02 17:21:39 +01:00
cybermaggedon	62c30a3a50	Skip Pulsar check in tg-verify-system-status (#753 )	2026-04-02 13:20:39 +01:00
cybermaggedon	24f0190ce7	RabbitMQ pub/sub backend with topic exchange architecture (#752 ) Adds a RabbitMQ backend as an alternative to Pulsar, selectable via PUBSUB_BACKEND=rabbitmq. Both backends implement the same PubSubBackend protocol — no application code changes needed to switch. RabbitMQ topology: - Single topic exchange per topicspace (e.g. 'tg') - Routing key derived from queue class and topic name - Shared consumers: named queue bound to exchange (competing, round-robin) - Exclusive consumers: anonymous auto-delete queue (broadcast, each gets every message). Used by Subscriber and config push consumer. - Thread-local producer connections (pika is not thread-safe) - Push-based consumption via basic_consume with process_data_events for heartbeat processing Consumer model changes: - Consumer class creates one backend consumer per concurrent task (required for pika thread safety, harmless for Pulsar) - Consumer class accepts consumer_type parameter - Subscriber passes consumer_type='exclusive' for broadcast semantics - Config push consumer uses consumer_type='exclusive' so every processor instance receives config updates - handle_one_from_queue receives consumer as parameter for correct per-connection ack/nack LibrarianClient: - New shared client class replacing duplicated librarian request-response code across 6+ services (chunking, decoders, RAG, etc.) - Uses stream-document instead of get-document-content for fetching document content in 1MB chunks (avoids broker message size limits) - Standalone object (self.librarian = LibrarianClient(...)) not a mixin - get-document-content marked deprecated in schema and OpenAPI spec Serialisation: - Extracted dataclass_to_dict/dict_to_dataclass to shared serialization.py (used by both Pulsar and RabbitMQ backends) Librarian queues: - Changed from flow class (persistent) back to request/response class now that stream-document eliminates large single messages - API upload chunk size reduced from 5MB to 3MB to stay under broker limits after base64 encoding Factory and CLI: - get_pubsub() handles 'rabbitmq' backend with RabbitMQ connection params - add_pubsub_args() includes RabbitMQ options (host, port, credentials) - add_pubsub_args(standalone=True) defaults to localhost for CLI tools - init_trustgraph skips Pulsar admin setup for non-Pulsar backends - tg-dump-queues and tg-monitor-prompts use backend abstraction - BaseClient and ConfigClient accept generic pubsub config	2026-04-02 12:47:16 +01:00
cybermaggedon	4fb0b4d8e8	Pub/sub abstraction: decouple from Pulsar (#751 ) Remove Pulsar-specific concepts from application code so that the pub/sub backend is swappable via configuration. Rename translators: - to_pulsar/from_pulsar → decode/encode across all translator classes, dispatch handlers, and tests (55+ files) - from_response_with_completion → encode_with_completion - Remove pulsar.schema.Record from translator base class Queue naming (CLASS:TOPICSPACE:TOPIC): - Replace topic() helper with queue() using new format: flow:tg:name, request:tg:name, response:tg:name, state:tg:name - Queue class implies persistence/TTL (no QoS in names) - Update Pulsar backend map_topic() to parse new format - Librarian queues use flow class (persistent, for chunking) - Config push uses state class (persistent, last-value) - Remove 15 dead topic imports from schema files - Update init_trustgraph.py namespace: config → state Confine Pulsar to pulsar_backend.py: - Delete legacy PulsarClient class from pubsub.py - Move add_args to add_pubsub_args() with standalone flag for CLI tools (defaults to localhost) - PulsarBackendConsumer.receive() catches _pulsar.Timeout, raises standard TimeoutError - Remove Pulsar imports from: async_processor, flow_processor, log_level, all 11 client files, 4 storage writers, gateway service, gateway config receiver - Remove log_level/LoggerLevel from client API - Rewrite tg-monitor-prompts to use backend abstraction - Update tg-dump-queues to use add_pubsub_args Also: pubsub-abstraction.md tech spec covering problem statement, design goals, as-is requirements, candidate broker assessment, approach, and implementation order.	2026-04-01 20:16:53 +01:00
cybermaggedon	153ae9ad30	Split Analysis into Analysis+ToolUse and Observation, add message_id (#747 ) Refactor agent provenance so that the decision (thought + tool selection) and the result (observation) are separate DAG entities: Question ← Analysis+ToolUse ← Observation ← ... ← Conclusion Analysis gains tg:ToolUse as a mixin RDF type and is emitted before tool execution via an on_action callback in react(). This ensures sub-traces (e.g. GraphRAG) appear after their parent Analysis in the streaming event order. Observation becomes a standalone prov:Entity with tg:Observation type, emitted after tool execution. The linear DAG chain runs through Observation — subsequent iterations and the Conclusion derive from it, not from the Analysis. message_id is populated on streaming AgentResponse for thought and observation chunks, using the provenance URI of the entity being built. This lets clients group streamed chunks by entity. Wire changes: - provenance/agent.py: Add ToolUse type, new agent_observation_triples(), remove observation from iteration - agent_manager.py: Add on_action callback between reason() and tool execution - orchestrator/pattern_base.py: Split emit, wire message_id, chain through observation URIs - orchestrator/react_pattern.py: Emit Analysis via on_action before tool runs - agent/react/service.py: Same for non-orchestrator path - api/explainability.py: New Observation class, updated dispatch and chain walker - api/types.py: Add message_id to AgentThought/AgentObservation - cli: Render Observation separately, [analysis: tool] labels	2026-03-31 17:51:22 +01:00
cybermaggedon	89e13a756a	Minor agent-orchestrator updates (#746 ) Tidy agent-orchestrator logs Added CLI support for selecting the pattern... tg-invoke-agent -q "What is the document about?" -p supervisor -v tg-invoke-agent -q "What is the document about?" -p plan-then-execute -v tg-invoke-agent -q "What is the document about?" -p react -v Added new event types to tg-show-explain-trace	2026-03-31 13:29:04 +01:00
cybermaggedon	7b734148b3	agent-orchestrator: add explainability provenance for all patterns (#744 ) agent-orchestrator: add explainability provenance for all agent patterns Extend the provenance/explainability system to provide human-readable reasoning traces for the orchestrator's three agent patterns. Previously only ReAct emitted provenance (session, iteration, conclusion). Now each pattern records its cognitive steps as typed RDF entities in the knowledge graph, using composable mixin types (e.g. Finding + Answer). New provenance chains: - Supervisor: Question → Decomposition → Finding ×N → Synthesis - Plan-then-Execute: Question → Plan → StepResult ×N → Synthesis - ReAct: Question → Analysis ×N → Conclusion (unchanged) New RDF types: Decomposition, Finding, Plan, StepResult. New predicates: tg:subagentGoal, tg:planStep. Reuses existing Synthesis + Answer mixin for final answers. Provenance library (trustgraph-base): - Triple builders, URI generators, vocabulary labels for new types - Client dataclasses with from_triples() dispatch - fetch_agent_trace() follows branching provenance chains - API exports updated Orchestrator (trustgraph-flow): - PatternBase emit methods for decomposition, finding, plan, step result, and synthesis - SupervisorPattern emits decomposition during fan-out - PlanThenExecutePattern emits plan and step results - Service emits finding triples on subagent completion - Synthesis provenance replaces generic final triples CLI (trustgraph-cli): - invoke_agent -x displays new entity types inline	2026-03-31 12:54:51 +01:00
cybermaggedon	e65ea217a2	agent-orchestrator improvements (#743 ) agent-orchestrator improvements: - Improve agent trace - Improve queue dumping - Fixing supervisor pattern - Fix synthesis step to remove loop Minor dev environment improvements: - Improve queue dump output for JSON - Reduce dev container rebuild	2026-03-31 11:24:30 +01:00
cybermaggedon	81ca7bbc11	Change monitor default to prompts-rag (#742 )	2026-03-31 09:35:58 +01:00
cybermaggedon	5a9db2da50	Add tg-monitor-prompts CLI tool for prompt queue monitoring (#737 ) Subscribes to prompt request/response Pulsar queues, correlates messages by ID, and logs a summary with template name, truncated terms, and elapsed time. Streaming responses are accumulated and shown at completion. Supports prompt and prompt-rag queue types.	2026-03-30 16:08:46 +01:00
cybermaggedon	ea33620fb2	Fix missing auth header in verify_system_status (#724 ) Fix missing auth header in verify_system_status processor check The check_processors function received the token parameter but did not include it in the Authorization header when calling the metrics endpoint, causing 401 errors when gateway auth is enabled.	2026-03-26 16:58:30 +00:00
cybermaggedon	9c55a0a0ff	Persistent websocket connections for socket clients and CLI tools (#723 ) Replace per-request websocket connections in SocketClient and AsyncSocketClient with a single persistent connection that multiplexes requests by ID via a background reader task. This eliminates repeated TCP+WS handshakes which caused significant latency over proxies. Convert show_flows, show_flow_blueprints, and show_parameter_types CLI tools from sequential HTTP requests to concurrent websocket requests using AsyncSocketClient, reducing round trips from O(N) sequential to a small number of parallel batches. Also fix describe_interfaces bug in show_flows where response queue was reading the request field instead of the response field.	2026-03-26 16:46:28 +00:00
cybermaggedon	9330730afb	Add chunk content ID to explain trace provenance output (#708 ) When --show-provenance is used with tg-show-explain-trace, display the chunk URI on a Content: line below each Source: chain. This allows the user to easily fetch the source text with tg-get-document-content.	2026-03-23 16:20:52 +00:00
cybermaggedon	4609424afe	Prepare 2.2 release branch (#704 )	2026-03-22 15:23:23 +00:00
cybermaggedon	1a7b654bd3	Add semantic pre-filter for GraphRAG edge scoring (#702 ) Embed edge descriptions and compute cosine similarity against grounding concepts to reduce the number of edges sent to expensive LLM scoring. Controlled by edge_score_limit parameter (default 30), skipped when edge count is already below the limit. Also plumbs edge_score_limit and edge_limit parameters end-to-end: - CLI args (--edge-score-limit, --edge-limit) in both invoke and service - Socket client: fix parameter mapping to use hyphenated wire-format keys - Flow API, message translator, gateway all pass through correctly - Explainable code path (_question_explainable_api) now forwards all params - Default edge_score_limit changed from 50 to 30 based on typical subgraph sizes	2026-03-21 20:06:29 +00:00
cybermaggedon	c387670944	Fix incorrect property names in explainability (#698 ) Remove type suffixes from explainability dataclass fields + fix show_explain_trace Rename dataclass fields to match KG property naming conventions: - Analysis: thought_uri/observation_uri → thought/observation - Synthesis/Conclusion/Reflection: document_uri → document Fix show_explain_trace for current API: - Resolve document content via librarian fetch instead of removed inline content fields (synthesis.content, conclusion.answer) - Add Grounding display for DocRAG traces - Update fetch_docrag_trace chain: Question → Grounding → Exploration → Synthesis - Pass api/explain_client to all print functions for content resolution Update all CLI tools and tests for renamed fields.	2026-03-16 14:47:37 +00:00
cybermaggedon	a115ec06ab	Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697 ) Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding, consistent PROV-O GraphRAG: - Split retrieval into 4 prompt stages: extract-concepts, kg-edge-scoring, kg-edge-reasoning, kg-synthesis (was single-stage) - Add concept extraction (grounding) for per-concept embedding - Filter main query to default graph, ignoring provenance/explainability edges - Add source document edges to knowledge graph DocumentRAG: - Add grounding step with concept extraction, matching GraphRAG's pattern: Question → Grounding → Exploration → Synthesis - Per-concept embedding and chunk retrieval with deduplication Cross-pipeline: - Make PROV-O derivation links consistent: wasGeneratedBy for first entity from Activity, wasDerivedFrom for entity-to-entity chains - Update CLIs (tg-invoke-agent, tg-invoke-graph-rag, tg-invoke-document-rag) for new explainability structure - Fix all affected unit and integration tests	2026-03-16 12:12:13 +00:00
cybermaggedon	64e3f6bd0d	Subgraph provenance (#694 ) Replace per-triple provenance reification with subgraph model Extraction provenance previously created a full reification (statement URI, activity, agent) for every single extracted triple, producing ~13 provenance triples per knowledge triple. Since each chunk is processed by a single LLM call, this was both redundant and semantically inaccurate. Now one subgraph object is created per chunk extraction, with tg:contains linking to each extracted triple. For 20 extractions from a chunk this reduces provenance from ~260 triples to ~33. - Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri - Replace triple_provenance_triples() with subgraph_provenance_triples() - Refactor kg-extract-definitions and kg-extract-relationships to generate provenance once per chunk instead of per triple - Add subgraph provenance to kg-extract-ontology and kg-extract-agent (previously had none) - Update CLI tools and tech specs to match Also rename tg-show-document-hierarchy to tg-show-extraction-provenance. Added extra typing for extraction provenance, fixed extraction prov CLI	2026-03-13 11:37:59 +00:00
cybermaggedon	35128ff019	Add unified explainability support and librarian storage for (#693 ) Add unified explainability support and librarian storage for all retrieval engines Implements consistent explainability/provenance tracking across GraphRAG, DocumentRAG, and Agent retrieval engines. All large content (answers, thoughts, observations) is now stored in librarian rather than as inline literals in the knowledge graph. Explainability API: - New explainability.py module with entity classes (Question, Exploration, Focus, Synthesis, Analysis, Conclusion) and ExplainabilityClient - Quiescence-based eventual consistency handling for trace fetching - Content fetching from librarian with retry logic CLI updates: - tg-invoke-graph-rag -x/--explainable flag returns explain_id - tg-invoke-document-rag -x/--explainable flag returns explain_id - tg-invoke-agent -x/--explainable flag returns explain_id - tg-list-explain-traces uses new explainability API - tg-show-explain-trace handles all three trace types Agent provenance: - Records session, iterations (think/act/observe), and conclusion - Stores thoughts and observations in librarian with document references - New predicates: tg:thoughtDocument, tg:observationDocument DocumentRAG provenance: - Records question, exploration (chunk retrieval), and synthesis - Stores answers in librarian with document references Schema changes: - AgentResponse: added explain_id, explain_graph fields - RetrievalResponse: added explain_id, explain_graph fields - agent_iteration_triples: supports thought_document_id, observation_document_id Update tests.	2026-03-12 21:40:09 +00:00
cybermaggedon	312174eb88	Adding explainability to the ReACT agent (#689 ) * Added tech spec * Add provenance recording to React agent loop Enables agent sessions to be traced and debugged using the same explainability infrastructure as GraphRAG. Agent traces record: - Session start with query and timestamp - Each iteration's thought, action, arguments, and observation - Final answer with derivation chain Changes: - Add session_id and collection fields to AgentRequest schema - Add agent predicates (TG_THOUGHT, TG_ACTION, etc.) to namespaces - Create agent provenance triple generators in provenance/agent.py - Register explainability producer in agent service - Emit provenance triples during agent execution - Update CLI tools to detect and render agent traces alongside GraphRAG * Updated explainability taxonomy: GraphRAG: tg:Question → tg:Exploration → tg:Focus → tg:Synthesis Agent: tg:Question → tg:Analysis(s) → tg:Conclusion All entities also have their PROV-O type (prov:Activity or prov:Entity). Updated commit message: Add provenance recording to React agent loop Enables agent sessions to be traced and debugged using the same explainability infrastructure as GraphRAG. Entity types follow human reasoning patterns: - tg:Question - the user's query (shared with GraphRAG) - tg:Analysis - each think/act/observe cycle - tg:Conclusion - the final answer Also adds explicit TG types to GraphRAG entities: - tg:Question, tg:Exploration, tg:Focus, tg:Synthesis All types retain their PROV-O base types (prov:Activity, prov:Entity). Changes: - Add session_id and collection fields to AgentRequest schema - Add explainability entity types to namespaces.py - Create agent provenance triple generators - Register explainability producer in agent service - Emit provenance triples during agent execution - Update CLI tools to detect and render both trace types * Document RAG explainability is now complete. Here's a summary of the changes made: Schema Changes: - trustgraph-base/trustgraph/schema/services/retrieval.py: Added explain_id and explain_graph fields to DocumentRagResponse - trustgraph-base/trustgraph/messaging/translators/retrieval.py: Updated translator to handle explainability fields Provenance Changes: - trustgraph-base/trustgraph/provenance/namespaces.py: Added TG_CHUNK_COUNT and TG_SELECTED_CHUNK predicates - trustgraph-base/trustgraph/provenance/uris.py: Added docrag_question_uri, docrag_exploration_uri, docrag_synthesis_uri generators - trustgraph-base/trustgraph/provenance/triples.py: Added docrag_question_triples, docrag_exploration_triples, docrag_synthesis_triples builders - trustgraph-base/trustgraph/provenance/__init__.py: Exported all new Document RAG functions and predicates Service Changes: - trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py: Added explainability callback support and triple emission at each phase (Question → Exploration → Synthesis) - trustgraph-flow/trustgraph/retrieval/document_rag/rag.py: Registered explainability producer and wired up the callback Documentation: - docs/tech-specs/agent-explainability.md: Added Document RAG entity types and provenance model documentation Document RAG Provenance Model: Question (urn:trustgraph:docrag:{uuid}) │ │ tg:query, prov:startedAtTime │ rdf:type = prov:Activity, tg:Question │ ↓ prov:wasGeneratedBy │ Exploration (urn:trustgraph:docrag:{uuid}/exploration) │ │ tg:chunkCount, tg:selectedChunk (multiple) │ rdf:type = prov:Entity, tg:Exploration │ ↓ prov:wasDerivedFrom │ Synthesis (urn:trustgraph:docrag:{uuid}/synthesis) │ │ tg:content = "The answer..." │ rdf:type = prov:Entity, tg:Synthesis * Specific subtype that makes the retrieval mechanism immediately obvious: System: GraphRAG TG Types on Question: tg:Question, tg:GraphRagQuestion URI Pattern: urn:trustgraph:question:{uuid} ──────────────────────────────────────── System: Document RAG TG Types on Question: tg:Question, tg:DocRagQuestion URI Pattern: urn:trustgraph:docrag:{uuid} ──────────────────────────────────────── System: Agent TG Types on Question: tg:Question, tg:AgentQuestion URI Pattern: urn:trustgraph:agent:{uuid} Files modified: - trustgraph-base/trustgraph/provenance/namespaces.py - Added TG_GRAPH_RAG_QUESTION, TG_DOC_RAG_QUESTION, TG_AGENT_QUESTION - trustgraph-base/trustgraph/provenance/triples.py - Added subtype to question_triples and docrag_question_triples - trustgraph-base/trustgraph/provenance/agent.py - Added subtype to agent_session_triples - trustgraph-base/trustgraph/provenance/__init__.py - Exported new types - docs/tech-specs/agent-explainability.md - Documented the subtypes This allows: - Query all questions: ?q rdf:type tg:Question - Query only GraphRAG: ?q rdf:type tg:GraphRagQuestion - Query only Document RAG: ?q rdf:type tg:DocRagQuestion - Query only Agent: ?q rdf:type tg:AgentQuestion * Fixed tests	2026-03-11 15:28:15 +00:00
cybermaggedon	a53ed41da2	Add explainability CLI tools (#688 ) Add explainability CLI tools for debugging provenance data - tg-show-document-hierarchy: Display document → page → chunk → edge hierarchy by traversing prov:wasDerivedFrom relationships - tg-list-explain-traces: List all GraphRAG sessions with questions and timestamps from the retrieval graph - tg-show-explain-trace: Show full explainability cascade for a GraphRAG session (question → exploration → focus → synthesis) These tools query the source and retrieval graphs to help debug and explore provenance/explainability data stored during document processing and GraphRAG queries.	2026-03-11 13:44:29 +00:00
cybermaggedon	e1bc4c04a4	Terminology Rename, and named-graphs for explainability (#682 ) Terminology Rename, and named-graphs for explainability data Changed terminology: - session -> question - retrieval -> exploration - selection -> focus - answer -> synthesis - uris.py: Renamed query_session_uri → question_uri, retrieval_uri → exploration_uri, selection_uri → focus_uri, answer_uri → synthesis_uri - triples.py: Renamed corresponding triple generation functions with updated labels ("GraphRAG question", "Exploration", "Focus", "Synthesis") - namespaces.py: Added named graph constants GRAPH_DEFAULT, GRAPH_SOURCE, GRAPH_RETRIEVAL - init.py: Updated exports - graph_rag.py: Updated to use new terminology - invoke_graph_rag.py: Updated CLI to display new stage names (Question, Exploration, Focus, Synthesis) Query-Time Explainability → Named Graph - triples.py: Added set_graph() helper function to set named graph on triples - graph_rag.py: All explainability triples now use GRAPH_RETRIEVAL named graph - rag.py: Explainability triples stored in user's collection (not separate collection) with named graph Extraction Provenance → Named Graph - relationships/extract.py: Provenance triples use GRAPH_SOURCE named graph - definitions/extract.py: Provenance triples use GRAPH_SOURCE named graph - chunker.py: Provenance triples use GRAPH_SOURCE named graph - pdf_decoder.py: Provenance triples use GRAPH_SOURCE named graph CLI Updates - show_graph.py: Added -g/--graph option to filter by named graph and --show-graph to display graph column Also: - Fix knowledge core schemas	2026-03-10 14:35:21 +00:00
cybermaggedon	84941ce645	Fix Cassandra schema and graph filter semantics (#680 ) Schema fix (dtype/lang clustering key): - Add dtype and lang to PRIMARY KEY in quads_by_entity table - Add otype, dtype, lang to PRIMARY KEY in quads_by_collection table - Fixes deduplication bug where literals with same value but different datatype or language tag were collapsed (e.g., "thing" vs "thing"@en) - Update delete_collection to pass new clustering columns - Update tech spec to reflect new schema Graph filter semantics (simplified, no wildcard constant): - g=None means all graphs (no filter) - g="" means default graph only - g="uri" means specific named graph - Remove GRAPH_WILDCARD usage from EntityCentricKnowledgeGraph - Fix service.py streaming and non-streaming paths - Fix CLI to preserve empty string for -g '' argument	2026-03-10 12:52:51 +00:00
cybermaggedon	c951562189	Graph query CLI tool (#679 ) New CLI tool that enables selective queries against the triple store unlike tg-show-graph which dumps the entire graph. Features: - Filter by subject, predicate, object, and/or named graph - Auto-detection of term types (IRI, literal, quoted triple) - Two ways to specify quoted triples: - Inline Turtle-style: -o "<<s p o>>" - Explicit flags: --qt-subject, --qt-predicate, --qt-object - Output formats: space-separated, pipe-separated, JSON, JSON Lines - Streaming mode for efficient large result sets Auto-detection rules: - http://, https://, urn:, or <wrapped> -> IRI - <<s p o>> -> quoted triple - Otherwise -> literal	2026-03-10 11:03:34 +00:00
cybermaggedon	7a6197d8c3	GraphRAG Query-Time Explainability (#677 ) Implements full explainability pipeline for GraphRAG queries, enabling traceability from answers back to source documents. Renamed throughout for clarity: - provenance_callback → explain_callback - provenance_id → explain_id - provenance_collection → explain_collection - message_type "provenance" → "explain" - Queue name "provenance" → "explainability" GraphRAG queries now emit explainability events as they execute: 1. Session - query text and timestamp 2. Retrieval - edges retrieved from subgraph 3. Selection - selected edges with LLM reasoning (JSONL with id + reasoning) 4. Answer - reference to synthesized response Events stream via explain_callback during query(), enabling real-time UX. - Answers stored in librarian service (not inline in graph - too large) - Document ID as URN: urn:trustgraph:answer:{session_id} - Graph stores tg:document reference (IRI) to librarian document - Added librarian producer/consumer to graph-rag service - get_labelgraph() now returns (labeled_edges, uri_map) - uri_map maps edge_id(label_s, label_p, label_o) → (uri_s, uri_p, uri_o) - Explainability data stores original URIs, not labels - Enables tracing edges back to reifying statements via tg:reifies - Added serialize_triple() to query service (matches storage format) - get_term_value() now handles TRIPLE type terms - Enables querying by quoted triple in object position: ?stmt tg:reifies <<s p o>> - Displays real-time explainability events during query - Resolves rdfs:label for edge components (s, p, o) - Traces source chain via prov:wasDerivedFrom to root document - Output: "Source: Chunk 1 → Page 2 → Document Title" - Label caching to avoid repeated queries GraphRagResponse: - explain_id: str \| None - explain_collection: str \| None - message_type: str ("chunk" or "explain") - end_of_session: bool trustgraph-base/trustgraph/provenance/: - namespaces.py - Added TG_DOCUMENT predicate - triples.py - answer_triples() supports document_id reference - uris.py - Added edge_selection_uri() trustgraph-base/trustgraph/schema/services/retrieval.py: - GraphRagResponse with explain_id, explain_collection, end_of_session trustgraph-flow/trustgraph/retrieval/graph_rag/: - graph_rag.py - URI preservation, streaming answer accumulation - rag.py - Librarian integration, real-time explain emission trustgraph-flow/trustgraph/query/triples/cassandra/service.py: - Quoted triple serialization for query matching trustgraph-cli/trustgraph/cli/invoke_graph_rag.py: - Full explainability display with label resolution and source tracing	2026-03-10 10:00:01 +00:00
cybermaggedon	d2d71f859d	Feature/streaming triples (#676 ) * Steaming triples * Also GraphRAG service uses this * Updated tests	2026-03-09 15:46:33 +00:00
cybermaggedon	3c3e11bef5	Fix/librarian broken (#674 ) * Set end-of-stream cleanly - clean streaming message structures * Add tg-get-document-content	2026-03-09 13:36:24 +00:00
cybermaggedon	b2ef7bbb8c	Fix doc embeddings invocation (#672 ) * Fix doc embeddings invocation * Tidy query embeddings invocation	2026-03-09 11:07:32 +00:00
cybermaggedon	0a2ce47a88	Batch embeddings (#668 ) Base Service (trustgraph-base/trustgraph/base/embeddings_service.py): - Changed on_request to use request.texts FastEmbed Processor (trustgraph-flow/trustgraph/embeddings/fastembed/processor.py): - on_embeddings(texts, model=None) now processes full batch efficiently - Returns [[v.tolist()] for v in vecs] - list of vector sets Ollama Processor (trustgraph-flow/trustgraph/embeddings/ollama/processor.py): - on_embeddings(texts, model=None) passes list directly to Ollama - Returns [[embedding] for embedding in embeds.embeddings] EmbeddingsClient (trustgraph-base/trustgraph/base/embeddings_client.py): - embed(texts, timeout=300) accepts list of texts Tests Updated: - test_fastembed_dynamic_model.py - 4 tests updated for new interface - test_ollama_dynamic_model.py - 4 tests updated for new interface Updated CLI, SDK and APIs	2026-03-08 18:36:54 +00:00
cybermaggedon	24bbe94136	Document chunks not stored in vector store (#665 ) - Schema - ChunkEmbeddings now uses chunk_id: str instead of chunk: bytes - Schema - DocumentEmbeddingsResponse now returns chunk_ids: list[str] instead of chunks - Translators - Updated to serialize/deserialize chunk_id - Clients - DocumentEmbeddingsClient.query() returns chunk_ids - SDK/API - flow.py, socket_client.py, bulk_client.py updated - Document embeddings service - Stores chunk_id (document ID) instead of chunk text - Storage writers - Qdrant, Milvus, Pinecone store chunk_id in payload - Query services - Return chunk_id from vector store searches - Gateway dispatchers - Serialize chunk_id in API responses - Document RAG - Added librarian client to fetch chunk content from Garage using chunk_ids - CLI tools - Updated all three tools: - invoke_document_embeddings.py - displays chunk_ids, removed max_chunk_length - save_doc_embeds.py - exports chunk_id - load_doc_embeds.py - imports chunk_id	2026-03-07 23:10:45 +00:00
cybermaggedon	2b9232917c	Fix/extraction prov (#662 ) Quoted triple fixes, including... 1. Updated triple_provenance_triples() in triples.py: - Now accepts a Triple object directly - Creates the reification triple using TRIPLE term type: stmt_uri tg:reifies <<extracted_triple>> - Includes it in the returned provenance triples 2. Updated definitions extractor: - Added imports for provenance functions and component version - Added ParameterSpec for optional llm-model and ontology flow parameters - For each definition triple, generates provenance with reification 3. Updated relationships extractor: - Same changes as definitions extractor	2026-03-06 12:23:58 +00:00
cybermaggedon	a630e143ef	Incremental / large document loading (#659 ) Tech spec BlobStore (trustgraph-flow/trustgraph/librarian/blob_store.py): - get_stream() - yields document content in chunks for streaming retrieval - create_multipart_upload() - initializes S3 multipart upload, returns upload_id - upload_part() - uploads a single part, returns etag - complete_multipart_upload() - finalizes upload with part etags - abort_multipart_upload() - cancels and cleans up Cassandra schema (trustgraph-flow/trustgraph/tables/library.py): - New upload_session table with 24-hour TTL - Index on user for listing sessions - Prepared statements for all operations - Methods: create_upload_session(), get_upload_session(), update_upload_session_chunk(), delete_upload_session(), list_upload_sessions() - Schema extended with UploadSession, UploadProgress, and new request/response fields - Librarian methods: begin_upload, upload_chunk, complete_upload, abort_upload, get_upload_status, list_uploads - Service routing for all new operations - Python SDK with transparent chunked upload: - add_document() auto-switches to chunked for files > 10MB - Progress callback support (on_progress) - get_pending_uploads(), get_upload_status(), abort_upload(), resume_upload() - Document table: Added parent_id and document_type columns with index - Document schema (knowledge/document.py): Added document_id field for streaming retrieval - Librarian operations: - add-child-document for extracted PDF pages - list-children to get child documents - stream-document for chunked content retrieval - Cascade delete removes children when parent is deleted - list-documents filters children by default - PDF decoder (decoding/pdf/pdf_decoder.py): Updated to stream large documents from librarian API to temp file - Librarian service (librarian/service.py): Sends document_id instead of content for large PDFs (>2MB) - Deprecated tools (load_pdf.py, load_text.py): Added deprecation warnings directing users to tg-add-library-document + tg-start-library-processing Remove load_pdf and load_text utils Move chunker/librarian comms to base class Updating tests	2026-03-04 16:57:58 +00:00
cybermaggedon	88fe8468bc	Update CI for 2.1 release (#653 )	2026-02-28 11:10:11 +00:00
cybermaggedon	4bbc6d844f	Row embeddings APIs exposed (#646 ) * Added row embeddings API and CLI support * Updated protocol specs * Row embeddings agent tool * Add new agent tool to CLI	2026-02-23 21:52:56 +00:00
cybermaggedon	1809c1f56d	Structured data 2 (#645 ) * Structured data refactor - multi-index tables, remove need for manual mods to the Cassandra tables * Tech spec updated to track implementation	2026-02-23 15:56:29 +00:00
cybermaggedon	6bf08c3ace	Feature/more cli diags (#624 ) * CLI tools for tg-invoke-graph-embeddings, tg-invoke-document-embeddings, and tg-invoke-embeddings. Just useful for diagnostics. * Fix tg-load-knowledge	2026-02-04 14:10:30 +00:00
cybermaggedon	cf0daedefa	Changed schema for Value -> Term, majorly breaking change (#622 ) * Changed schema for Value -> Term, majorly breaking change * Following the schema change, Value -> Term into all processing * Updated Cassandra for g, p, s, o index patterns (7 indexes) * Reviewed and updated all tests * Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down	2026-01-27 13:48:08 +00:00
cybermaggedon	e4f0013841	Open 1.9 branch (#620 )	2026-01-26 17:36:25 +00:00
cybermaggedon	b08db761d7	Fix config inconsistency (#609 ) * Plural/singular confusion in config key * Flow class vs flow blueprint nomenclature change * Update docs & CLI to reflect the above	2026-01-14 12:31:40 +00:00
cybermaggedon	99f17d1b9d	Fix non-streaming (2) (#608 )	2026-01-12 21:21:51 +00:00

1 2 3 4

171 commits