Commit graph

1234 commits

Author SHA1 Message Date
elpresidank
f09ef4de45 feat: add document pipeline, ReAct agent, and knowledge core services
Document Pipeline (Team A):
- LibrarianService: document storage with filesystem backend, metadata
  persistence, child document hierarchy, collection management
- ChunkingService: recursive character text splitter with configurable
  chunk size/overlap, FlowProcessor pattern
- KnowledgeExtractService: combined relationship + definition extraction
  using prompt service and LLM, emits RDF triples and entity contexts
- KnowledgeCoreService: knowledge core CRUD with streaming export and
  flow-based loading

ReAct Agent (Team B):
- StreamingReActParser: state machine for parsing LLM output into
  Thought/Action/ActionInput/FinalAnswer sections
- Three MVP tools: KnowledgeQuery (GraphRAG), DocumentQuery (DocRAG),
  TriplesQuery with RequestResponse clients
- AgentService FlowProcessor with ReAct loop, tool execution, and
  streaming chunk responses (thought/observation/answer)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:19:37 -05:00
elpresidank
5ed3f0e2d8 feat: add schema foundation for document pipeline, agent, and deployment
Add missing topics (librarian, knowledge, collection-management, flow),
pipeline message types (TextDocument, Chunk, Triples, EntityContexts),
service message types (Librarian, Knowledge, Collection, Flow CRUD),
and update AgentResponse for streaming chunk format.

Add RequestResponseSpec enabling flow-scoped request/response calls
(needed by knowledge extraction and agent services). Add requestor
registry to Flow class with proper lifecycle management.

Add end_of_dialog to gateway's isComplete() check for agent streaming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:11:29 -05:00
elpresidank
28747e1a92 fix: NATS pipeline bugs, add integration tests and service runners
Fix three critical bugs preventing the NATS message pipeline from working:

- FlowProcessor now subscribes to config-push topic (was missing entirely),
  using DeliverPolicy.All to replay config on service restart
- NATS streams use wildcard subjects (tg.flow.>) instead of per-topic
  narrow filters that caused 503 errors on publish
- Subscriber dispatch loop has exponential backoff on errors to prevent
  tight error loops

Add service runner scripts (gateway, config, LLM) and a 7-test
integration suite that verifies config CRUD, WebSocket round-trip,
and full LLM text-completion through the NATS pipeline.

Fix Docker Compose infra: pin Tempo to v2.6.1, remove deprecated Loki
config fields, add user:0 for volume permissions, remap conflicting
ports (FalkorDB 6380, OTLP 4327/4328).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:41:39 -05:00
elpresidank
0042f9259c fix: linter cleanup on flow service implementations
Minor fixes from linter: readonly modifiers, unused parameter prefixes,
type narrowing in graph-rag BFS traversal and edge scoring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 22:52:40 -05:00
elpresidank
b6536eca38 init 2026-04-05 22:44:45 -05:00
elpresidank
c386f68743 Merge commit '74cc8a4685' as 'ai-context/trustgraph-templates' 2026-04-05 21:09:49 -05:00
elpresidank
74cc8a4685 Squashed 'ai-context/trustgraph-templates/' content from commit 42a5fd1b
git-subtree-dir: ai-context/trustgraph-templates
git-subtree-split: 42a5fd1b678f32be378062e30451e2052ccb95dd
2026-04-05 21:09:49 -05:00
elpresidank
e26caa0b12 saving 2026-04-05 21:09:33 -05:00
elpresidank
9e9307a2aa Merge commit 'ad40332d56' as 'ai-context/trustgraph-templates' 2026-04-05 21:08:57 -05:00
elpresidank
ad40332d56 Squashed 'ai-context/trustgraph-templates/' content from commit 338a8ffa
git-subtree-dir: ai-context/trustgraph-templates
git-subtree-split: 338a8ffadb1439013071ae922e55ed2421f17025
2026-04-05 21:08:57 -05:00
elpresidank
ecaf3489f1 Merge commit '9b2f675702' as 'ai-context/context-graph-demo' 2026-04-05 21:08:35 -05:00
elpresidank
9b2f675702 Squashed 'ai-context/context-graph-demo/' content from commit 338a8ffa
git-subtree-dir: ai-context/context-graph-demo
git-subtree-split: 338a8ffadb1439013071ae922e55ed2421f17025
2026-04-05 21:08:35 -05:00
elpresidank
1a72bfdec0 Merge commit 'a8390532f7' as 'ai-context/workbench-ui' 2026-04-05 21:08:02 -05:00
elpresidank
a8390532f7 Squashed 'ai-context/workbench-ui/' content from commit 32e36a5c
git-subtree-dir: ai-context/workbench-ui
git-subtree-split: 32e36a5c2131e429a7081cfaf67dabad3193cda3
2026-04-05 21:08:02 -05:00
elpresidank
05d87964c2 Merge commit 'deff028fed' as 'ai-context/trustgraph-client' 2026-04-05 21:07:35 -05:00
elpresidank
deff028fed Squashed 'ai-context/trustgraph-client/' content from commit 908f18cf
git-subtree-dir: ai-context/trustgraph-client
git-subtree-split: 908f18cf814470ec3b72cc336bb945fb792ffdec
2026-04-05 21:07:35 -05:00
Jack Colquitt
be443a1679
Refine README content and remove Table of Contents (#759)
Updated the README to improve clarity and remove the Table of Contents section.
2026-04-04 13:40:12 -07:00
Jack Colquitt
8d1a4ae3bf
Revise quickstart instructions in README.md (#758)
Updated the README to clarify the configuration process and improve wording.
2026-04-04 13:34:12 -07:00
Alex Jenkins
2f484b4c15
fix deadlink in readme (#735)
Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu>
2026-03-29 16:51:40 -07:00
cybermaggedon
2449392896
release/v2.2 -> master (#733) 2026-03-29 20:27:25 +01:00
Alex Jenkins
3ed71a5620
Add security policy (#731) 2026-03-29 20:17:48 +01:00
Jack Colquitt
060ed258eb
Add license badge to README (#725) 2026-03-27 11:28:22 -07:00
Cyber MacGeddon
5702bcae1d New CLA workflow: Uses a github action in
trustgraph-ai/contributor-license-agreement

This blocks a PR until the commiter responds with a message
of agreement with the CLA terms.
2026-03-26 14:11:36 +00:00
cybermaggedon
3ccff800c7
Merge pull request #712 from trustgraph-ai/release/v2.2
release/v2.2 -> master
2026-03-25 17:49:19 +00:00
cybermaggedon
9330730afb
Add chunk content ID to explain trace provenance output (#708)
When --show-provenance is used with tg-show-explain-trace,
display the chunk URI on a Content: line below each Source:
chain. This allows the user to easily fetch the source text
with tg-get-document-content.
2026-03-23 16:20:52 +00:00
cybermaggedon
25995d03f4
Fix stray log messages caused by librarian messages (#706)
Warning generated by librarian responses meant for other
services (chunker, embeddings, etc.) arriving on the shared
response queue. The decoder's subscription picks them up, can't
match them to a pending request, and logs a warning.

Removed the warnings, as not serving a purpose.
2026-03-23 13:16:39 +00:00
cybermaggedon
5c6fe90fe2
Add universal document decoder with multi-format support (#705)
Add universal document decoder with multi-format support
using 'unstructured'.

New universal decoder service powered by the unstructured
library, handling DOCX, XLSX, PPTX, HTML, Markdown, CSV, RTF,
ODT, EPUB and more through a single service. Tables are preserved
as HTML markup for better downstream extraction. Images are
stored in the librarian but excluded from the text
pipeline. Configurable section grouping strategies
(whole-document, heading, element-type, count, size) for non-page
formats. Page-based formats (PDF, PPTX, XLSX) are automatically
grouped by page.

All four decoders (PDF, Mistral OCR, Tesseract OCR, universal)
now share the "document-decoder" ident so they are
interchangeable.  PDF-only decoders fetch document metadata to
check MIME type and gracefully skip unsupported formats.

Librarian changes: removed MIME type whitelist validation so any
document format can be ingested. Simplified routing so text/plain
goes to text-load and everything else goes to document-load.
Removed dual inline/streaming data paths — documents always use
document_id for content retrieval.

New provenance entity types (tg:Section, tg:Image) and metadata
predicates (tg:elementTypes, tg:tableCount, tg:imageCount) for
richer explainability.

Universal decoder is in its own package (trustgraph-unstructured)
and container image (trustgraph-unstructured).
2026-03-23 12:56:35 +00:00
cybermaggedon
4609424afe
Prepare 2.2 release branch (#704) 2026-03-22 15:23:23 +00:00
cybermaggedon
96fd1eab15
Use UUID-based URNs for page and chunk IDs (#703)
Page and chunk document IDs were deterministic ({doc_id}/p{num},
{doc_id}/p{num}/c{num}), causing "Document already exists" errors
when reprocessing documents through different flows. Content may
differ between runs due to different parameters or extractors, so
deterministic IDs are incorrect.

Pages now use urn:page:{uuid}, chunks use
urn:chunk:{uuid}. Parent- child relationships are tracked via
librarian metadata and provenance triples.

Also brings Mistral OCR and Tesseract OCR decoders up to parity
with the PDF decoder: librarian fetch/save support, per-page
output with unique IDs, and provenance triple emission. Fixes
Mistral OCR bug where only the first 5 pages were processed.
2026-03-21 21:17:03 +00:00
cybermaggedon
1a7b654bd3
Add semantic pre-filter for GraphRAG edge scoring (#702)
Embed edge descriptions and compute cosine similarity against grounding
concepts to reduce the number of edges sent to expensive LLM scoring.
Controlled by edge_score_limit parameter (default 30), skipped when edge
count is already below the limit.

Also plumbs edge_score_limit and edge_limit parameters end-to-end:
- CLI args (--edge-score-limit, --edge-limit) in both invoke and service
- Socket client: fix parameter mapping to use hyphenated wire-format keys
- Flow API, message translator, gateway all pass through correctly
- Explainable code path (_question_explainable_api) now forwards all params
- Default edge_score_limit changed from 50 to 30 based on typical subgraph
  sizes
2026-03-21 20:06:29 +00:00
Jack Colquitt
d30857b5c3
Update video links and section titles in README 2026-03-20 21:33:16 -07:00
Jack Colquitt
b8ed36401a
Update README to reflect new section and links 2026-03-20 21:17:17 -07:00
Jack Colquitt
690ca4e837
Revise README to reflect context development platform
Updated project description and platform details in README.
2026-03-19 15:51:02 -07:00
Cyber MacGeddon
3ec5e1b385 Merge branch 'release/v2.1' 2026-03-17 20:59:48 +00:00
cybermaggedon
bc68738c37
README.md from master (#701) 2026-03-17 20:54:04 +00:00
Cyber MacGeddon
c818a1fe17 Fix broken merge 2026-03-17 20:52:51 +00:00
Cyber MacGeddon
64b934c814 Fix changing the README 2026-03-17 20:51:17 +00:00
Cyber MacGeddon
824f993985 Merge branch 'release/v2.1' 2026-03-17 20:44:03 +00:00
cybermaggedon
664d1d0384
Update API specs for 2.1 (#699)
* Updating API specs for 2.1

* Updated API and SDK docs
2026-03-17 20:36:31 +00:00
cybermaggedon
c387670944
Fix incorrect property names in explainability (#698)
Remove type suffixes from explainability dataclass fields + fix show_explain_trace

Rename dataclass fields to match KG property naming conventions:
- Analysis: thought_uri/observation_uri → thought/observation
- Synthesis/Conclusion/Reflection: document_uri → document

Fix show_explain_trace for current API:
- Resolve document content via librarian fetch instead of removed
  inline content fields (synthesis.content, conclusion.answer)
- Add Grounding display for DocRAG traces
- Update fetch_docrag_trace chain: Question → Grounding → Exploration →
Synthesis
- Pass api/explain_client to all print functions for content resolution

Update all CLI tools and tests for renamed fields.
2026-03-16 14:47:37 +00:00
cybermaggedon
a115ec06ab
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O

GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
  kg-edge-scoring,
  kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
  provenance/explainability edges
- Add source document edges to knowledge graph

DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
  pattern:
  Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication

Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
  entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
  tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00
cybermaggedon
29b4300808
Updated test suite for explainability & provenance (#696)
* Provenance tests

* Embeddings tests

* Test librarian

* Test triples stream

* Test concurrency

* Entity centric graph writes

* Agent tool service tests

* Structured data tests

* RDF tests

* Addition LLM tests

* Reliability tests
2026-03-13 14:27:42 +00:00
cybermaggedon
e6623fc915
Remove schema:subjectOf edges from KG extraction (#695)
The subjectOf triples were redundant with the subgraph provenance model
introduced in e8407b34. Entity-to-source lineage can be traced via
tg:contains -> subgraph -> prov:wasDerivedFrom -> chunk, making the
direct subjectOf edges unnecessary metadata polluting the knowledge graph.

Removed from all three extractors (agent, definitions, relationships),
cleaned up the SUBJECT_OF constant and vocabulary label, and updated
tests accordingly.
2026-03-13 12:11:21 +00:00
cybermaggedon
64e3f6bd0d
Subgraph provenance (#694)
Replace per-triple provenance reification with subgraph model

Extraction provenance previously created a full reification (statement
URI, activity, agent) for every single extracted triple, producing ~13
provenance triples per knowledge triple.  Since each chunk is processed
by a single LLM call, this was both redundant and semantically
inaccurate.

Now one subgraph object is created per chunk extraction, with
tg:contains linking to each extracted triple.  For 20 extractions from
a chunk this reduces provenance from ~260 triples to ~33.

- Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri
- Replace triple_provenance_triples() with subgraph_provenance_triples()
- Refactor kg-extract-definitions and kg-extract-relationships to
  generate provenance once per chunk instead of per triple
- Add subgraph provenance to kg-extract-ontology and kg-extract-agent
  (previously had none)
- Update CLI tools and tech specs to match

Also rename tg-show-document-hierarchy to tg-show-extraction-provenance.

Added extra typing for extraction provenance, fixed extraction prov CLI
2026-03-13 11:37:59 +00:00
cybermaggedon
35128ff019
Add unified explainability support and librarian storage for (#693)
Add unified explainability support and librarian storage for all retrieval engines

Implements consistent explainability/provenance tracking
across GraphRAG, DocumentRAG, and Agent retrieval
engines. All large content (answers, thoughts, observations)
is now stored in librarian rather than as inline literals in
the knowledge graph.

Explainability API:
- New explainability.py module with entity classes (Question,
  Exploration, Focus, Synthesis, Analysis, Conclusion) and
  ExplainabilityClient
- Quiescence-based eventual consistency handling for trace
  fetching
- Content fetching from librarian with retry logic

CLI updates:
- tg-invoke-graph-rag -x/--explainable flag returns
  explain_id
- tg-invoke-document-rag -x/--explainable flag returns
  explain_id
- tg-invoke-agent -x/--explainable flag returns explain_id
- tg-list-explain-traces uses new explainability API
- tg-show-explain-trace handles all three trace types

Agent provenance:
- Records session, iterations (think/act/observe), and conclusion
- Stores thoughts and observations in librarian with document
  references
- New predicates: tg:thoughtDocument, tg:observationDocument

DocumentRAG provenance:
- Records question, exploration (chunk retrieval), and synthesis
- Stores answers in librarian with document references

Schema changes:
- AgentResponse: added explain_id, explain_graph fields
- RetrievalResponse: added explain_id, explain_graph fields
- agent_iteration_triples: supports thought_document_id,
  observation_document_id

Update tests.
2026-03-12 21:40:09 +00:00
cybermaggedon
aecf00f040
Minor agent tweaks (#692)
Update RAG and Agent clients for streaming message handling

GraphRAG now sends multiple message types in a stream:
- 'explain' messages with explain_id and explain_graph for
  provenance
- 'chunk' messages with response text fragments
- end_of_session marker for stream completion

Updated all clients to handle this properly:

CLI clients (trustgraph-base/trustgraph/clients/):
- graph_rag_client.py: Added chunk_callback and explain_callback
- document_rag_client.py: Added chunk_callback and explain_callback
- agent_client.py: Added think, observe, answer_callback,
  error_callback

Internal clients (trustgraph-base/trustgraph/base/):
- graph_rag_client.py: Async callbacks for streaming
- agent_client.py: Async callbacks for streaming

All clients now:
- Route messages by chunk_type/message_type
- Stream via optional callbacks for incremental delivery
- Wait for proper completion signals
(end_of_dialog/end_of_session/end_of_stream)
- Accumulate and return complete response for callers not using
  callbacks

Updated callers:
- extract/kg/agent/extract.py: Uses new invoke(question=...) API
- tests/integration/test_agent_kg_extraction_integration.py:
  Updated mocks

This fixes the agent infinite loop issue where knowledge_query was
returning the first 'explain' message (empty response) instead of
waiting for the actual answer chunks.

Concurrency in triples query
2026-03-12 17:59:02 +00:00
cybermaggedon
45e6ad4abc
Fix ontology RAG pipeline + add query concurrency (#691)
- Fix ontology RAG pipeline: embeddings API, chunker provenance, and query concurrency

- Fix ontology embeddings to use correct response shape from embed()
  API (returns list of vectors, not list of list of vectors).
- Simplify chunker URI logic to append /c{index} to parent ID
  instead of parsing page/doc URI structure which was fragile.

- Add provenance tracking and librarian integration to token
  chunker, matching recursive chunker capabilities.

- Add configurable concurrency (default 10) to Cassandra, Qdrant,
  and embeddings query services.
2026-03-12 11:34:42 +00:00
Jack Colquitt
b8013fbed0
Update description of TrustGraph for clarity 2026-03-11 15:40:00 -07:00
Jack Colquitt
73ba197b89
Update TrustGraph link in README.md 2026-03-11 13:52:55 -07:00
Jack Colquitt
d464d552e9
Revise README for clarity and API key details
Updated various sections for clarity and added information about API key requirements.
2026-03-11 13:50:32 -07:00