Commit graph

1385 commits

Author SHA1 Message Date
cybermaggedon
816a8cfcf6
Update tests for agent-orchestrator (#745)
Add 96 tests covering the orchestrator's aggregation, provenance,
routing, and explainability parsing. These verify the supervisor
fan-out/fan-in lifecycle, the new RDF provenance types
(Decomposition, Finding, Plan, StepResult, Synthesis), and their
round-trip through the wire format.

Unit tests (84):
- Aggregator: register, record completion, peek, build synthesis,
  cleanup
- Provenance triple builders: types, provenance links,
  goals/steps, labels
- Explainability parsing: from_triples dispatch, field extraction
  for all new entity types, precedence over existing types
- PatternBase: is_subagent detection, emit_subagent_completion
  message shape
- Completion dispatch: detection logic, full aggregator
  integration flow, synthesis request not re-intercepted as
  completion
- MetaRouter: task type identification, pattern selection,
  valid_patterns constraints, fallback on LLM error or unknown
  response

Contract tests (12):
- Orchestration fields on AgentRequest round-trip correctly
- subagent-completion and synthesise step types in request
  history
- Plan steps with status and dependencies
- Provenance triple builder → wire format → from_triples
  round-trip for all five new entity types
2026-03-31 13:12:26 +01:00
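The aggregator lifecycle exercised by these tests (register, record completion, peek, build synthesis, cleanup) can be sketched as a small fan-in tracker keyed by correlation ID. This is a hypothetical illustration: the class name, method signatures and synthesis text format are assumptions, not the orchestrator's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class _FanInGroup:
    expected: int                                  # number of sub-agents dispatched
    findings: dict = field(default_factory=dict)   # goal -> sub-agent answer


class Aggregator:
    """Hypothetical fan-in tracker: one group per supervisor correlation ID."""

    def __init__(self) -> None:
        self._groups = {}

    def register(self, correlation_id: str, expected: int) -> None:
        # Called at fan-out time, once the supervisor knows how many sub-agents it started
        self._groups[correlation_id] = _FanInGroup(expected=expected)

    def record_completion(self, correlation_id: str, goal: str, answer: str) -> bool:
        # Store one sub-agent result; True means every sub-agent has now reported
        group = self._groups[correlation_id]
        group.findings[goal] = answer
        return len(group.findings) >= group.expected

    def peek(self, correlation_id: str) -> dict:
        # Read-only snapshot of the findings gathered so far
        return dict(self._groups[correlation_id].findings)

    def build_synthesis(self, correlation_id: str, question: str) -> str:
        # Fan-in: fold all findings into a single synthesis prompt
        findings = self._groups[correlation_id].findings
        lines = "\n".join(f"- {goal}: {answer}" for goal, answer in findings.items())
        return f"Question: {question}\nFindings:\n{lines}"

    def cleanup(self, correlation_id: str) -> None:
        # Drop state once the synthesis request has been issued
        self._groups.pop(correlation_id, None)
```

Keeping completion detection (`record_completion` returning a flag) separate from synthesis means the caller decides when to issue the synthesis request, which is one way to avoid re-intercepting that request as another completion.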
cybermaggedon
7b734148b3
agent-orchestrator: add explainability provenance for all patterns (#744)
agent-orchestrator: add explainability provenance for all agent
patterns

Extend the provenance/explainability system to provide
human-readable reasoning traces for the orchestrator's three
agent patterns. Previously only ReAct emitted provenance
(session, iteration, conclusion). Now each pattern records its
cognitive steps as typed RDF entities in the knowledge graph,
using composable mixin types (e.g. Finding + Answer).

New provenance chains:
- Supervisor: Question → Decomposition → Finding ×N → Synthesis
- Plan-then-Execute: Question → Plan → StepResult ×N → Synthesis
- ReAct: Question → Analysis ×N → Conclusion (unchanged)

New RDF types: Decomposition, Finding, Plan, StepResult.
New predicates: tg:subagentGoal, tg:planStep.
Reuses existing Synthesis + Answer mixin for final answers.

Provenance library (trustgraph-base):
- Triple builders, URI generators, vocabulary labels for new types
- Client dataclasses with from_triples() dispatch
- fetch_agent_trace() follows branching provenance chains
- API exports updated

Orchestrator (trustgraph-flow):
- PatternBase emit methods for decomposition, finding, plan, step result, and synthesis
- SupervisorPattern emits decomposition during fan-out
- PlanThenExecutePattern emits plan and step results
- Service emits finding triples on subagent completion
- Synthesis provenance replaces generic final triples

CLI (trustgraph-cli):
- invoke_agent -x displays new entity types inline
2026-03-31 12:54:51 +01:00
cybermaggedon
e65ea217a2
agent-orchestrator improvements (#743)
agent-orchestrator improvements:
- Improve agent trace
- Improve queue dumping
- Fix supervisor pattern
- Fix synthesis step to remove loop

Minor dev environment improvements:
- Improve queue dump output for JSON
- Reduce dev container rebuild
2026-03-31 11:24:30 +01:00
cybermaggedon
81ca7bbc11
Change monitor default to prompts-rag (#742) 2026-03-31 09:35:58 +01:00
cybermaggedon
0781d3e6a7
Remove unnecessary prompt-client logging (#740) 2026-03-31 09:12:33 +01:00
cybermaggedon
849987f0e6
Add multi-pattern orchestrator with plan-then-execute and supervisor (#739)
Introduce an agent orchestrator service that supports three
execution patterns (ReAct, plan-then-execute, supervisor) with
LLM-based meta-routing to select the appropriate pattern and task
type per request. Update the agent schema to support
orchestration fields (correlation, sub-agents, plan steps) and
remove legacy response fields (answer, thought, observation).
2026-03-31 00:32:49 +01:00
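LLM-based meta-routing with the constraints and fallbacks mentioned in the #745 tests (valid_patterns, fallback on LLM error or unknown response) might look roughly like this. The pattern identifiers, the prompt wording and the choice of ReAct as the fallback are all assumptions for illustration.

```python
from typing import Callable, Iterable

# Hypothetical identifiers for the three execution patterns
KNOWN_PATTERNS = ("react", "plan-then-execute", "supervisor")
FALLBACK_PATTERN = "react"  # assumed default when routing fails


def select_pattern(
    question: str,
    llm: Callable[[str], str],
    valid_patterns: Iterable[str] = KNOWN_PATTERNS,
) -> str:
    """Ask an LLM to choose a pattern, constrained to valid_patterns, with a safe fallback."""
    allowed = [p for p in valid_patterns if p in KNOWN_PATTERNS]
    if FALLBACK_PATTERN in allowed or not allowed:
        fallback = FALLBACK_PATTERN
    else:
        fallback = allowed[0]
    prompt = (
        "Choose the best execution pattern for this request.\n"
        f"Options: {', '.join(allowed)}\n"
        f"Request: {question}\n"
        "Answer with the option name only."
    )
    try:
        reply = llm(prompt)
    except Exception:
        return fallback  # LLM error: route to the fallback instead of failing the request
    choice = reply.strip().lower()
    return choice if choice in allowed else fallback  # unknown response: fallback
```

Passing the LLM in as a plain callable keeps the routing logic testable without a model.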
CommitHu502Craft
7af1d60db8 fix(gateway): accept raw utf-8 text in text-load (#729)
Co-authored-by: nanqinhu <139929317+nanqinhu@users.noreply.github.com>
2026-03-30 17:00:10 +01:00
cybermaggedon
5a9db2da50
Add tg-monitor-prompts CLI tool for prompt queue monitoring (#737)
Subscribes to prompt request/response Pulsar queues, correlates
messages by ID, and logs a summary with template name, truncated
terms, and elapsed time. Streaming responses are accumulated and
shown at completion. Supports prompt and prompt-rag queue types.
2026-03-30 16:08:46 +01:00
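The correlation step described above (match request and response by ID, accumulate streamed fragments, report template, truncated terms and elapsed time) can be sketched without any Pulsar dependency. The class name, message fields and summary shape are hypothetical; a real monitor would feed these methods from queue subscriptions.

```python
import time
from typing import Callable, Optional


class PromptCorrelator:
    """Pairs prompt requests with responses by message ID (hypothetical sketch)."""

    def __init__(self, max_term_chars: int = 40, clock: Callable[[], float] = time.monotonic):
        self._pending = {}
        self._max = max_term_chars
        self._clock = clock

    def _truncate(self, value) -> str:
        text = str(value)
        return text if len(text) <= self._max else text[: self._max] + "..."

    def on_request(self, msg_id: str, template: str, terms: dict) -> None:
        self._pending[msg_id] = {"template": template, "terms": terms,
                                 "start": self._clock(), "chunks": []}

    def on_response(self, msg_id: str, text: str, end_of_stream: bool = True) -> Optional[dict]:
        entry = self._pending.get(msg_id)
        if entry is None:
            return None                  # response for a request we never saw
        entry["chunks"].append(text)     # streaming: accumulate fragments
        if not end_of_stream:
            return None
        del self._pending[msg_id]
        return {
            "template": entry["template"],
            "terms": {k: self._truncate(v) for k, v in entry["terms"].items()},
            "elapsed": self._clock() - entry["start"],
            "response": "".join(entry["chunks"]),
        }
```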
Alex Jenkins
2f484b4c15
fix dead link in README (#735)
Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu>
2026-03-29 16:51:40 -07:00
cybermaggedon
2449392896
release/v2.2 -> master (#733) 2026-03-29 20:27:25 +01:00
cybermaggedon
687a9e08fe
master -> release/v2.2 (#732) 2026-03-29 20:26:26 +01:00
cybermaggedon
413f917676 Add missing pdf extra to unstructured dependency (#728)
* Fix PDF processing deps so that PDF processing works
2026-03-29 20:22:45 +01:00
Alex Jenkins
3ed71a5620
Add security policy (#731) 2026-03-29 20:17:48 +01:00
cybermaggedon
20204d87c3
Fix OpenAI compatibility issues for newer models and Azure config (#727)
Use max_completion_tokens for OpenAI and Azure OpenAI providers:
The OpenAI API deprecated max_tokens in favor of
max_completion_tokens for chat completions. Newer models
(gpt-4o, o1, o3) reject the old parameter with a 400 error.

AZURE_API_VERSION env var now overrides the default API version;
when it is unset, the default 2024-12-01-preview is used.

Update tests to test for expected structures
2026-03-28 11:19:45 +00:00
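A minimal sketch of the two changes, assuming request arguments are assembled as a plain dict before the SDK call. The helper names are hypothetical, and treating an empty AZURE_API_VERSION the same as unset is an assumption.

```python
import os
from typing import Mapping

DEFAULT_AZURE_API_VERSION = "2024-12-01-preview"


def azure_api_version(env: Mapping[str, str] = os.environ) -> str:
    # AZURE_API_VERSION overrides the default; unset (or, assumed here, empty) falls back
    return env.get("AZURE_API_VERSION") or DEFAULT_AZURE_API_VERSION


def chat_completion_kwargs(model: str, messages: list, max_output: int) -> dict:
    # Newer chat models reject max_tokens, so only max_completion_tokens is sent
    return {
        "model": model,
        "messages": messages,
        "max_completion_tokens": max_output,
    }
```

With the official `openai` Python SDK these would typically be used as `AzureOpenAI(api_version=azure_api_version(), ...)` and `client.chat.completions.create(**chat_completion_kwargs(...))`.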
cybermaggedon
a634520509
Fix websocket error responses in Mux dispatcher (#726)
Error responses from the websocket multiplexer were missing the
request ID and using a bare string format instead of the structured
error protocol. This caused clients to hang when a request failed
(e.g. unsupported service for a flow) because the error could not
be routed to the waiting caller.

Include request ID in all error paths, use structured error format
({message, type}) with complete flag, and extract the ID early in
receive() so even malformed requests get a routable error when
possible.

Updated tests - tests were coded against invalid protocol messages
2026-03-28 10:58:28 +00:00
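The error-path fix can be sketched as follows: pull the request ID out as early as possible, then always reply with a structured, routable error. The `{message, type}` body and `complete` flag come from the description above; the top-level field names (`id`, `service`, `error`) and the error type strings are assumptions.

```python
import json
from typing import Optional


def error_response(request_id: Optional[str], message: str, error_type: str) -> dict:
    # Structured error: routable by ID, {message, type} body, and marked complete
    return {"id": request_id, "error": {"message": message, "type": error_type}, "complete": True}


def receive(raw: str, supported_services: set) -> Optional[dict]:
    """Return an error response, or None if the request would be dispatched."""
    request_id = None
    try:
        msg = json.loads(raw)
        if isinstance(msg, dict):
            request_id = msg.get("id")   # extract the ID before any other validation
        service = msg["service"]
    except (ValueError, KeyError, TypeError) as exc:
        return error_response(request_id, f"malformed request: {exc}", "bad-request")
    if service not in supported_services:
        return error_response(request_id, f"unsupported service: {service}", "unsupported-service")
    return None  # a real dispatcher would route the request here
```

Because the ID is captured before validation, a request with a valid `id` but a missing or unknown service still produces an error the waiting caller can match.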
Jack Colquitt
060ed258eb
Add license badge to README (#725) 2026-03-27 11:28:22 -07:00
cybermaggedon
ea33620fb2
Fix missing auth header in verify_system_status (#724)
Fix missing auth header in verify_system_status processor check

The check_processors function received the token parameter but
did not include it in the Authorization header when calling the
metrics endpoint, causing 401 errors when gateway auth is enabled.
2026-03-26 16:58:30 +00:00
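The shape of the fix, sketched with the standard library: when a token is supplied, it must actually end up in the request's Authorization header. The helper name, the Bearer scheme and the metrics URL are assumptions for illustration.

```python
import urllib.request
from typing import Optional


def metrics_request(url: str, token: Optional[str]) -> urllib.request.Request:
    # The fix: the token is actually placed in the Authorization header
    headers = {}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)
```

The request would then be sent with `urllib.request.urlopen(req)`; with no token, no header is added, so unauthenticated gateways keep working.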
cybermaggedon
9c55a0a0ff
Persistent websocket connections for socket clients and CLI tools (#723)
Replace per-request websocket connections in SocketClient and
AsyncSocketClient with a single persistent connection that
multiplexes requests by ID via a background reader task. This
eliminates repeated TCP+WS handshakes which caused significant
latency over proxies.

Convert show_flows, show_flow_blueprints, and
show_parameter_types CLI tools from sequential HTTP requests to
concurrent websocket requests using AsyncSocketClient, reducing
round trips from O(N) sequential to a small number of parallel
batches.

Also fix describe_interfaces bug in show_flows where response
queue was reading the request field instead of the response
field.
2026-03-26 16:46:28 +00:00
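Multiplexing by ID over one persistent connection with a background reader task can be sketched in asyncio as below. This is not the SocketClient implementation: the frame fields (`id`, `service`, `request`) are assumptions, and `conn` stands in for any websocket connection exposing `async send(str)` and `async recv()`.

```python
import asyncio
import itertools
import json


class MultiplexedClient:
    """One persistent connection shared by many requests, matched by request ID."""

    def __init__(self, conn) -> None:
        self._conn = conn                   # assumed: async send(str), async recv() -> str
        self._pending = {}                  # request ID -> Future awaiting the response
        self._ids = itertools.count(1)
        # Must be constructed inside a running event loop
        self._reader = asyncio.create_task(self._read_loop())

    async def _read_loop(self) -> None:
        try:
            while True:
                frame = json.loads(await self._conn.recv())
                fut = self._pending.pop(frame.get("id"), None)
                if fut is not None and not fut.done():
                    fut.set_result(frame)
        except Exception as exc:
            # Connection failed: wake every waiter instead of leaving it hanging
            for fut in self._pending.values():
                if not fut.done():
                    fut.set_exception(exc)
            self._pending.clear()

    async def request(self, service: str, payload: dict) -> dict:
        req_id = str(next(self._ids))
        fut = asyncio.get_running_loop().create_future()
        self._pending[req_id] = fut         # register before sending, so no reply is missed
        await self._conn.send(json.dumps({"id": req_id, "service": service, "request": payload}))
        return await fut

    async def close(self) -> None:
        self._reader.cancel()
        try:
            await self._reader
        except asyncio.CancelledError:
            pass
```

With this structure, N concurrent `request()` calls issued via `asyncio.gather` share one handshake instead of paying for N.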
cybermaggedon
1ec081f42f
Update CLA notice in repo (#722) 2026-03-26 14:18:13 +00:00
Cyber MacGeddon
5702bcae1d New CLA workflow: Uses a GitHub Action in
trustgraph-ai/contributor-license-agreement

This blocks a PR until the committer responds with a message
of agreement with the CLA terms.
2026-03-26 14:11:36 +00:00
Cyber MacGeddon
f02bbdb442 New CLA workflow: Uses a GitHub Action in
trustgraph-ai/contributor-license-agreement

This blocks a PR until the committer responds with a message
of agreement with the CLA terms.
2026-03-26 14:09:07 +00:00
cybermaggedon
4164ef1c47
Add GATEWAY_SECRET support for MCP server to API gateway auth (#721)
Pass bearer token from GATEWAY_SECRET environment variable as a
URL query parameter on websocket connections to the API gateway.
When unset or empty, no auth is applied (backwards compatible).
2026-03-26 10:49:28 +00:00
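Appending the secret as a query parameter, while leaving the URL unchanged when GATEWAY_SECRET is unset or empty, might look like this. The query parameter name `token` and the example URL are assumptions; the actual gateway may expect a different name.

```python
import os
from typing import Mapping
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def gateway_ws_url(base_url: str, env: Mapping[str, str] = os.environ) -> str:
    secret = env.get("GATEWAY_SECRET", "")
    if not secret:
        return base_url                       # unset or empty: no auth, as before
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query)            # keep any existing query parameters
    query.append(("token", secret))           # parameter name "token" is an assumption
    return urlunsplit(parts._replace(query=urlencode(query)))
```

`urlencode` escapes the secret, so characters such as spaces or `&` cannot corrupt the query string.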
cybermaggedon
97f5645ea0
CLA (#716)
Explanatory text for the CLA process
2026-03-26 09:08:09 +00:00
cybermaggedon
1f67fc2312
master -> release/v2.2 (#713)
Merge doc updates from master into release branch
2026-03-25 17:53:20 +00:00
cybermaggedon
3ccff800c7
Merge pull request #712 from trustgraph-ai/release/v2.2
release/v2.2 -> master
2026-03-25 17:49:19 +00:00
cybermaggedon
9330730afb
Add chunk content ID to explain trace provenance output (#708)
When --show-provenance is used with tg-show-explain-trace,
display the chunk URI on a Content: line below each Source:
chain. This allows the user to easily fetch the source text
with tg-get-document-content.
2026-03-23 16:20:52 +00:00
cybermaggedon
25995d03f4
Fix stray log messages caused by librarian messages (#706)
Warnings were generated by librarian responses meant for other
services (chunker, embeddings, etc.) arriving on the shared
response queue. The decoder's subscription picks them up, can't
match them to a pending request, and logs a warning.

Removed the warnings, as they served no purpose.
2026-03-23 13:16:39 +00:00
cybermaggedon
5c6fe90fe2
Add universal document decoder with multi-format support (#705)
Add universal document decoder with multi-format support
using 'unstructured'.

New universal decoder service powered by the unstructured
library, handling DOCX, XLSX, PPTX, HTML, Markdown, CSV, RTF,
ODT, EPUB and more through a single service. Tables are preserved
as HTML markup for better downstream extraction. Images are
stored in the librarian but excluded from the text
pipeline. Configurable section grouping strategies
(whole-document, heading, element-type, count, size) for non-page
formats. Page-based formats (PDF, PPTX, XLSX) are automatically
grouped by page.

All four decoders (PDF, Mistral OCR, Tesseract OCR, universal)
now share the "document-decoder" ident so they are
interchangeable.  PDF-only decoders fetch document metadata to
check MIME type and gracefully skip unsupported formats.

Librarian changes: removed MIME type whitelist validation so any
document format can be ingested. Simplified routing so text/plain
goes to text-load and everything else goes to document-load.
Removed dual inline/streaming data paths — documents always use
document_id for content retrieval.

New provenance entity types (tg:Section, tg:Image) and metadata
predicates (tg:elementTypes, tg:tableCount, tg:imageCount) for
richer explainability.

Universal decoder is in its own package (trustgraph-unstructured)
and container image (trustgraph-unstructured).
2026-03-23 12:56:35 +00:00
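The five section grouping strategies for non-page formats can be illustrated over a flat element list. This is a sketch, not the decoder's code: the `Element` type only mimics unstructured's element categories, "element-type" is assumed to mean consecutive runs of the same category, and the `count`/`max_chars` defaults are invented.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Element:
    category: str   # e.g. "Title", "NarrativeText", "Table" (mimics unstructured's categories)
    text: str


def group_sections(elements, strategy="whole-document", count=10, max_chars=4000):
    if strategy == "whole-document":
        return [list(elements)] if elements else []
    if strategy == "heading":
        # Start a new section at each Title element
        sections, current = [], []
        for el in elements:
            if el.category == "Title" and current:
                sections.append(current)
                current = []
            current.append(el)
        if current:
            sections.append(current)
        return sections
    if strategy == "element-type":
        # Consecutive runs of the same element category (assumed interpretation)
        return [list(g) for _, g in groupby(elements, key=lambda e: e.category)]
    if strategy == "count":
        return [elements[i:i + count] for i in range(0, len(elements), count)]
    if strategy == "size":
        # Pack elements until adding the next one would exceed max_chars
        sections, current, size = [], [], 0
        for el in elements:
            if current and size + len(el.text) > max_chars:
                sections.append(current)
                current, size = [], 0
            current.append(el)
            size += len(el.text)
        if current:
            sections.append(current)
        return sections
    raise ValueError(f"unknown strategy: {strategy}")
```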
cybermaggedon
4609424afe
Prepare 2.2 release branch (#704) 2026-03-22 15:23:23 +00:00
cybermaggedon
96fd1eab15
Use UUID-based URNs for page and chunk IDs (#703)
Page and chunk document IDs were deterministic ({doc_id}/p{num},
{doc_id}/p{num}/c{num}), causing "Document already exists" errors
when reprocessing documents through different flows. Content may
differ between runs due to different parameters or extractors, so
deterministic IDs are incorrect.

Pages now use urn:page:{uuid}, chunks use
urn:chunk:{uuid}. Parent-child relationships are tracked via
librarian metadata and provenance triples.

Also brings Mistral OCR and Tesseract OCR decoders up to parity
with the PDF decoder: librarian fetch/save support, per-page
output with unique IDs, and provenance triple emission. Fixes
Mistral OCR bug where only the first 5 pages were processed.
2026-03-21 21:17:03 +00:00
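A minimal sketch of the ID scheme: random UUID-based URNs never collide across reprocessing runs, so the hierarchy that the old `{doc_id}/p{num}/c{num}` form encoded must be recorded explicitly. The record shape with a `parent` field is a hypothetical stand-in for the librarian metadata and provenance triples.

```python
import uuid


def new_page_id() -> str:
    # Random UUID: reprocessing the same document in another flow yields a fresh ID
    return f"urn:page:{uuid.uuid4()}"


def new_chunk_id() -> str:
    return f"urn:chunk:{uuid.uuid4()}"


def page_with_chunks(doc_id: str, n_chunks: int) -> list:
    """IDs no longer encode the hierarchy, so parent links are stored explicitly
    (hypothetical record shape)."""
    page_id = new_page_id()
    records = [{"id": page_id, "parent": doc_id}]
    records += [{"id": new_chunk_id(), "parent": page_id} for _ in range(n_chunks)]
    return records
```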
cybermaggedon
1a7b654bd3
Add semantic pre-filter for GraphRAG edge scoring (#702)
Embed edge descriptions and compute cosine similarity against grounding
concepts to reduce the number of edges sent to expensive LLM scoring.
Controlled by edge_score_limit parameter (default 30), skipped when edge
count is already below the limit.

Also plumbs edge_score_limit and edge_limit parameters end-to-end:
- CLI args (--edge-score-limit, --edge-limit) in both invoke and service
- Socket client: fix parameter mapping to use hyphenated wire-format keys
- Flow API, message translator, gateway all pass through correctly
- Explainable code path (_question_explainable_api) now forwards all params
- Default edge_score_limit changed from 50 to 30 based on typical subgraph
  sizes
2026-03-21 20:06:29 +00:00
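The semantic pre-filter can be sketched in plain Python: score each edge embedding by cosine similarity against the grounding concept embeddings and keep only the top `edge_score_limit` for LLM scoring. Ranking each edge by its best match across concepts is an assumption (a mean would also be plausible), as is preserving the original edge order among survivors.

```python
import math


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prefilter_edges(edges, edge_vectors, concept_vectors, edge_score_limit=30):
    # Skip the filter when there is nothing to cut, or nothing to rank against
    if len(edges) <= edge_score_limit or not concept_vectors:
        return list(edges)
    # Assumed scoring: an edge's best similarity to any grounding concept
    scored = [(max(cosine(v, c) for c in concept_vectors), i) for i, v in enumerate(edge_vectors)]
    scored.sort(key=lambda s: s[0], reverse=True)
    keep = sorted(i for _, i in scored[:edge_score_limit])  # restore original order
    return [edges[i] for i in keep]
```

Only the surviving edges would then go to the expensive LLM scoring stage.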
Jack Colquitt
d30857b5c3
Update video links and section titles in README 2026-03-20 21:33:16 -07:00
Jack Colquitt
b8ed36401a
Update README to reflect new section and links 2026-03-20 21:17:17 -07:00
Jack Colquitt
690ca4e837
Revise README to reflect context development platform
Updated project description and platform details in README.
2026-03-19 15:51:02 -07:00
Cyber MacGeddon
3ec5e1b385 Merge branch 'release/v2.1' 2026-03-17 20:59:48 +00:00
cybermaggedon
bc68738c37
README.md from master (#701) 2026-03-17 20:54:04 +00:00
Cyber MacGeddon
c818a1fe17 Fix broken merge 2026-03-17 20:52:51 +00:00
Cyber MacGeddon
64b934c814 Fix changing the README 2026-03-17 20:51:17 +00:00
Cyber MacGeddon
824f993985 Merge branch 'release/v2.1' 2026-03-17 20:44:03 +00:00
cybermaggedon
664d1d0384
Update API specs for 2.1 (#699)
* Updating API specs for 2.1

* Updated API and SDK docs
2026-03-17 20:36:31 +00:00
cybermaggedon
c387670944
Fix incorrect property names in explainability (#698)
Remove type suffixes from explainability dataclass fields + fix show_explain_trace

Rename dataclass fields to match KG property naming conventions:
- Analysis: thought_uri/observation_uri → thought/observation
- Synthesis/Conclusion/Reflection: document_uri → document

Fix show_explain_trace for current API:
- Resolve document content via librarian fetch instead of removed
  inline content fields (synthesis.content, conclusion.answer)
- Add Grounding display for DocRAG traces
- Update fetch_docrag_trace chain: Question → Grounding → Exploration →
  Synthesis
- Pass api/explain_client to all print functions for content resolution

Update all CLI tools and tests for renamed fields.
2026-03-16 14:47:37 +00:00
cybermaggedon
a115ec06ab
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O

GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
  kg-edge-scoring, kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
  provenance/explainability edges
- Add source document edges to knowledge graph

DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
  pattern: Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication

Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
  entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
  tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
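The consistent PROV-O rule (first entity `prov:wasGeneratedBy` the activity, each later entity `prov:wasDerivedFrom` its predecessor) can be shown as a small triple builder. The `urn:example:` URIs are hypothetical; only the `prov:` namespace and predicates come from the W3C PROV-O vocabulary.

```python
PROV = "http://www.w3.org/ns/prov#"


def derivation_triples(activity_uri: str, entity_uris: list) -> list:
    """First entity is generated by the activity; each later entity derives from its predecessor."""
    if not entity_uris:
        return []
    triples = [(entity_uris[0], PROV + "wasGeneratedBy", activity_uri)]
    for prev, cur in zip(entity_uris, entity_uris[1:]):
        triples.append((cur, PROV + "wasDerivedFrom", prev))
    return triples
```

For the DocRAG chain, the question plays the activity role and Grounding, Exploration and Synthesis form the entity chain.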
cybermaggedon
29b4300808
Updated test suite for explainability & provenance (#696)
* Provenance tests

* Embeddings tests

* Test librarian

* Test triples stream

* Test concurrency

* Entity centric graph writes

* Agent tool service tests

* Structured data tests

* RDF tests

* Additional LLM tests

* Reliability tests
2026-03-13 14:27:42 +00:00
cybermaggedon
e6623fc915
Remove schema:subjectOf edges from KG extraction (#695)
The subjectOf triples were redundant with the subgraph provenance model
introduced in e8407b34. Entity-to-source lineage can be traced via
tg:contains -> subgraph -> prov:wasDerivedFrom -> chunk, making the
direct subjectOf edges unnecessary metadata polluting the knowledge graph.

Removed from all three extractors (agent, definitions, relationships),
cleaned up the SUBJECT_OF constant and vocabulary label, and updated
tests accordingly.
2026-03-13 12:11:21 +00:00
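Entity-to-source lineage via `tg:contains -> subgraph -> prov:wasDerivedFrom -> chunk` can be sketched over an in-memory triple set. Representing the object of `tg:contains` as a nested `(s, p, o)` tuple (a quoted triple) and writing predicates as CURIE strings are simplifying assumptions.

```python
def source_chunks(entity: str, store: set) -> set:
    """Find the chunks an entity was extracted from, without schema:subjectOf edges."""
    # 1. Subgraphs that contain any triple mentioning the entity
    subgraphs = {
        s for s, p, o in store
        if p == "tg:contains" and isinstance(o, tuple) and entity in (o[0], o[2])
    }
    # 2. The chunks those subgraphs were derived from
    return {o for s, p, o in store if p == "prov:wasDerivedFrom" and s in subgraphs}
```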
cybermaggedon
64e3f6bd0d
Subgraph provenance (#694)
Replace per-triple provenance reification with subgraph model

Extraction provenance previously created a full reification (statement
URI, activity, agent) for every single extracted triple, producing ~13
provenance triples per knowledge triple.  Since each chunk is processed
by a single LLM call, this was both redundant and semantically
inaccurate.

Now one subgraph object is created per chunk extraction, with
tg:contains linking to each extracted triple.  For 20 extractions from
a chunk this reduces provenance from ~260 triples to ~33.

- Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri
- Replace triple_provenance_triples() with subgraph_provenance_triples()
- Refactor kg-extract-definitions and kg-extract-relationships to
  generate provenance once per chunk instead of per triple
- Add subgraph provenance to kg-extract-ontology and kg-extract-agent
  (previously had none)
- Update CLI tools and tech specs to match

Also rename tg-show-document-hierarchy to tg-show-extraction-provenance.

Added extra typing for extraction provenance, fixed extraction prov CLI
2026-03-13 11:37:59 +00:00
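The subgraph model emits one provenance node per chunk extraction plus one `tg:contains` link per extracted triple, rather than a full reification per triple. The function below is a hypothetical illustration of that shape, not the project's `subgraph_provenance_triples()`; the quoted-triple representation and the activity link are assumptions.

```python
def build_subgraph_provenance(subgraph_uri: str, chunk_uri: str,
                              activity_uri: str, extracted: list) -> list:
    # One tg:contains link per extracted knowledge triple (quoted as a nested tuple)
    triples = [(subgraph_uri, "tg:contains", t) for t in extracted]
    # Per-chunk metadata, emitted once rather than once per triple
    triples += [
        (subgraph_uri, "prov:wasDerivedFrom", chunk_uri),
        (subgraph_uri, "prov:wasGeneratedBy", activity_uri),  # assumed activity link
    ]
    return triples
```

The per-chunk overhead stays constant, so provenance grows by one triple per extraction instead of roughly thirteen.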
cybermaggedon
35128ff019
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines

Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.

Explainability API:
- New explainability.py module with entity classes (Question,
  Exploration, Focus, Synthesis, Analysis, Conclusion) and
  ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
  fetching
- Content fetching from librarian with retry logic

CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
  explain_id
- tg-invoke-document-rag -x/--explainable flag returns
  explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types

Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
  references
- New predicates: tg:thoughtDocument, tg:observationDocument

DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references

Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
  observation_document_id

Update tests.
2026-03-12 21:40:09 +00:00
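Quiescence-based handling of eventual consistency can be read as "keep polling until the trace stops changing for a few consecutive polls, or give up at a timeout". The sketch below implements that reading. The parameter names and defaults are assumptions, and `sleep`/`clock` are injectable only to make the logic testable.

```python
import time
from typing import Any, Callable


def fetch_until_quiescent(
    fetch: Callable[[], Any],
    stable_polls: int = 3,
    interval: float = 0.5,
    timeout: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll until the result is unchanged for `stable_polls` consecutive polls,
    or the timeout expires (then return the latest result)."""
    deadline = clock() + timeout
    last, stable = None, 0
    while True:
        current = fetch()
        if current == last:
            stable += 1
            if stable >= stable_polls:
                return current
        else:
            last, stable = current, 0      # still changing: restart the count
        if clock() >= deadline:
            return current
        sleep(interval)
```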
cybermaggedon
aecf00f040
Minor agent tweaks (#692)
Update RAG and Agent clients for streaming message handling

GraphRAG now sends multiple message types in a stream:
- 'explain' messages with explain_id and explain_graph for
  provenance
- 'chunk' messages with response text fragments
- end_of_session marker for stream completion

Updated all clients to handle this properly:

CLI clients (trustgraph-base/trustgraph/clients/):
- graph_rag_client.py: Added chunk_callback and explain_callback
- document_rag_client.py: Added chunk_callback and explain_callback
- agent_client.py: Added think, observe, answer_callback,
  error_callback

Internal clients (trustgraph-base/trustgraph/base/):
- graph_rag_client.py: Async callbacks for streaming
- agent_client.py: Async callbacks for streaming

All clients now:
- Route messages by chunk_type/message_type
- Stream via optional callbacks for incremental delivery
- Wait for proper completion signals
  (end_of_dialog/end_of_session/end_of_stream)
- Accumulate and return complete response for callers not using
  callbacks

Updated callers:
- extract/kg/agent/extract.py: Uses new invoke(question=...) API
- tests/integration/test_agent_kg_extraction_integration.py:
  Updated mocks

This fixes the agent infinite loop issue where knowledge_query was
returning the first 'explain' message (empty response) instead of
waiting for the actual answer chunks.

Concurrency in triples query
2026-03-12 17:59:02 +00:00
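The client-side fix routes each message by type and keeps reading until the end-of-session marker, instead of returning on the first message (which, for GraphRAG, is an `explain` message with an empty response). The field names `message_type`, `explain_id`, `response` and `end_of_session` are assumptions for illustration.

```python
from typing import Callable, Iterable, Optional


def consume_stream(
    messages: Iterable[dict],
    chunk_callback: Optional[Callable[[str], None]] = None,
    explain_callback: Optional[Callable[[str], None]] = None,
) -> dict:
    answer_parts, explain_ids = [], []
    for msg in messages:
        kind = msg.get("message_type")
        if kind == "explain":
            # Provenance pointer, not an answer: record it and keep reading
            explain_ids.append(msg["explain_id"])
            if explain_callback:
                explain_callback(msg["explain_id"])
        elif kind == "chunk":
            text = msg.get("response", "")
            answer_parts.append(text)
            if chunk_callback:
                chunk_callback(text)          # incremental delivery
        if msg.get("end_of_session"):
            break                              # only now is the answer complete
    # Callers not using callbacks still get the full accumulated response
    return {"response": "".join(answer_parts), "explain_ids": explain_ids}
```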
cybermaggedon
45e6ad4abc
Fix ontology RAG pipeline + add query concurrency (#691)
- Fix ontology RAG pipeline: embeddings API, chunker provenance, and query concurrency

- Fix ontology embeddings to use correct response shape from embed()
  API (returns a list of vectors, not a list of lists of vectors).
- Simplify chunker URI logic to append /c{index} to parent ID
  instead of parsing page/doc URI structure which was fragile.

- Add provenance tracking and librarian integration to token
  chunker, matching recursive chunker capabilities.

- Add configurable concurrency (default 10) to Cassandra, Qdrant,
  and embeddings query services.
2026-03-12 11:34:42 +00:00
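One common way to implement a configurable query concurrency limit in an asyncio service is a semaphore around each backend call. This is a sketch under that assumption; the services may instead use multiple consumers, and the class and `backend_query` names are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable


class BoundedQueryService:
    """Caps in-flight backend queries at `concurrency` (hypothetical sketch)."""

    def __init__(self, backend_query: Callable[[Any], Awaitable[Any]], concurrency: int = 10):
        self._query = backend_query
        # Create inside a running loop to stay portable across Python versions
        self._sem = asyncio.Semaphore(concurrency)

    async def handle(self, request: Any) -> Any:
        async with self._sem:            # waits while `concurrency` queries are running
            return await self._query(request)
```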
Jack Colquitt
b8013fbed0
Update description of TrustGraph for clarity 2026-03-11 15:40:00 -07:00
Jack Colquitt
73ba197b89
Update TrustGraph link in README.md 2026-03-11 13:52:55 -07:00