trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-07-01 17:39:39 +02:00

Author	SHA1	Message	Date
elpresidank	f09ef4de45	feat: add document pipeline, ReAct agent, and knowledge core services Document Pipeline (Team A): - LibrarianService: document storage with filesystem backend, metadata persistence, child document hierarchy, collection management - ChunkingService: recursive character text splitter with configurable chunk size/overlap, FlowProcessor pattern - KnowledgeExtractService: combined relationship + definition extraction using prompt service and LLM, emits RDF triples and entity contexts - KnowledgeCoreService: knowledge core CRUD with streaming export and flow-based loading ReAct Agent (Team B): - StreamingReActParser: state machine for parsing LLM output into Thought/Action/ActionInput/FinalAnswer sections - Three MVP tools: KnowledgeQuery (GraphRAG), DocumentQuery (DocRAG), TriplesQuery with RequestResponse clients - AgentService FlowProcessor with ReAct loop, tool execution, and streaming chunk responses (thought/observation/answer) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 00:19:37 -05:00
elpresidank	5ed3f0e2d8	feat: add schema foundation for document pipeline, agent, and deployment Add missing topics (librarian, knowledge, collection-management, flow), pipeline message types (TextDocument, Chunk, Triples, EntityContexts), service message types (Librarian, Knowledge, Collection, Flow CRUD), and update AgentResponse for streaming chunk format. Add RequestResponseSpec enabling flow-scoped request/response calls (needed by knowledge extraction and agent services). Add requestor registry to Flow class with proper lifecycle management. Add end_of_dialog to gateway's isComplete() check for agent streaming. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 00:11:29 -05:00
elpresidank	28747e1a92	fix: NATS pipeline bugs, add integration tests and service runners Fix three critical bugs preventing the NATS message pipeline from working: - FlowProcessor now subscribes to config-push topic (was missing entirely), using DeliverPolicy.All to replay config on service restart - NATS streams use wildcard subjects (tg.flow.>) instead of per-topic narrow filters that caused 503 errors on publish - Subscriber dispatch loop has exponential backoff on errors to prevent tight error loops Add service runner scripts (gateway, config, LLM) and a 7-test integration suite that verifies config CRUD, WebSocket round-trip, and full LLM text-completion through the NATS pipeline. Fix Docker Compose infra: pin Tempo to v2.6.1, remove deprecated Loki config fields, add user:0 for volume permissions, remap conflicting ports (FalkorDB 6380, OTLP 4327/4328). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 23:41:39 -05:00
elpresidank	0042f9259c	fix: linter cleanup on flow service implementations Minor fixes from linter: readonly modifiers, unused parameter prefixes, type narrowing in graph-rag BFS traversal and edge scoring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 22:52:40 -05:00
elpresidank	b6536eca38	init	2026-04-05 22:44:45 -05:00
elpresidank	c386f68743	Merge commit '74cc8a4685' as 'ai-context/trustgraph-templates'	2026-04-05 21:09:49 -05:00
elpresidank	74cc8a4685	Squashed 'ai-context/trustgraph-templates/' content from commit 42a5fd1b git-subtree-dir: ai-context/trustgraph-templates git-subtree-split: 42a5fd1b678f32be378062e30451e2052ccb95dd	2026-04-05 21:09:49 -05:00
elpresidank	e26caa0b12	saving	2026-04-05 21:09:33 -05:00
elpresidank	9e9307a2aa	Merge commit 'ad40332d56' as 'ai-context/trustgraph-templates'	2026-04-05 21:08:57 -05:00
elpresidank	ad40332d56	Squashed 'ai-context/trustgraph-templates/' content from commit 338a8ffa git-subtree-dir: ai-context/trustgraph-templates git-subtree-split: 338a8ffadb1439013071ae922e55ed2421f17025	2026-04-05 21:08:57 -05:00
elpresidank	ecaf3489f1	Merge commit '9b2f675702' as 'ai-context/context-graph-demo'	2026-04-05 21:08:35 -05:00
elpresidank	9b2f675702	Squashed 'ai-context/context-graph-demo/' content from commit 338a8ffa git-subtree-dir: ai-context/context-graph-demo git-subtree-split: 338a8ffadb1439013071ae922e55ed2421f17025	2026-04-05 21:08:35 -05:00
elpresidank	1a72bfdec0	Merge commit 'a8390532f7' as 'ai-context/workbench-ui'	2026-04-05 21:08:02 -05:00
elpresidank	a8390532f7	Squashed 'ai-context/workbench-ui/' content from commit 32e36a5c git-subtree-dir: ai-context/workbench-ui git-subtree-split: 32e36a5c2131e429a7081cfaf67dabad3193cda3	2026-04-05 21:08:02 -05:00
elpresidank	05d87964c2	Merge commit 'deff028fed' as 'ai-context/trustgraph-client'	2026-04-05 21:07:35 -05:00
elpresidank	deff028fed	Squashed 'ai-context/trustgraph-client/' content from commit 908f18cf git-subtree-dir: ai-context/trustgraph-client git-subtree-split: 908f18cf814470ec3b72cc336bb945fb792ffdec	2026-04-05 21:07:35 -05:00
Jack Colquitt	be443a1679	Refine README content and remove Table of Contents (#759) Updated the README to improve clarity and remove the Table of Contents section.	2026-04-04 13:40:12 -07:00
Jack Colquitt	8d1a4ae3bf	Revise quickstart instructions in README.md (#758) Updated the README to clarify the configuration process and improve wording.	2026-04-04 13:34:12 -07:00
Alex Jenkins	2f484b4c15	fix dead link in readme (#735) Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu>	2026-03-29 16:51:40 -07:00
cybermaggedon	2449392896	release/v2.2 -> master (#733)	2026-03-29 20:27:25 +01:00
Alex Jenkins	3ed71a5620	Add security policy (#731)	2026-03-29 20:17:48 +01:00
Jack Colquitt	060ed258eb	Add license badge to README (#725)	2026-03-27 11:28:22 -07:00
Cyber MacGeddon	5702bcae1d	New CLA workflow: uses a GitHub action in trustgraph-ai/contributor-license-agreement. This blocks a PR until the committer responds with a message of agreement with the CLA terms.	2026-03-26 14:11:36 +00:00
cybermaggedon	3ccff800c7	Merge pull request #712 from trustgraph-ai/release/v2.2 release/v2.2 -> master	2026-03-25 17:49:19 +00:00
cybermaggedon	9330730afb	Add chunk content ID to explain trace provenance output (#708) When --show-provenance is used with tg-show-explain-trace, display the chunk URI on a Content: line below each Source: chain. This allows the user to easily fetch the source text with tg-get-document-content.	2026-03-23 16:20:52 +00:00
cybermaggedon	25995d03f4	Fix stray log messages caused by librarian messages (#706) Librarian responses meant for other services (chunker, embeddings, etc.) arrive on the shared response queue. The decoder's subscription picks them up, can't match them to a pending request, and logs a warning. Removed the warnings, as they serve no purpose.	2026-03-23 13:16:39 +00:00
cybermaggedon	5c6fe90fe2	Add universal document decoder with multi-format support (#705 ) Add universal document decoder with multi-format support using 'unstructured'. New universal decoder service powered by the unstructured library, handling DOCX, XLSX, PPTX, HTML, Markdown, CSV, RTF, ODT, EPUB and more through a single service. Tables are preserved as HTML markup for better downstream extraction. Images are stored in the librarian but excluded from the text pipeline. Configurable section grouping strategies (whole-document, heading, element-type, count, size) for non-page formats. Page-based formats (PDF, PPTX, XLSX) are automatically grouped by page. All four decoders (PDF, Mistral OCR, Tesseract OCR, universal) now share the "document-decoder" ident so they are interchangeable. PDF-only decoders fetch document metadata to check MIME type and gracefully skip unsupported formats. Librarian changes: removed MIME type whitelist validation so any document format can be ingested. Simplified routing so text/plain goes to text-load and everything else goes to document-load. Removed dual inline/streaming data paths — documents always use document_id for content retrieval. New provenance entity types (tg:Section, tg:Image) and metadata predicates (tg:elementTypes, tg:tableCount, tg:imageCount) for richer explainability. Universal decoder is in its own package (trustgraph-unstructured) and container image (trustgraph-unstructured).	2026-03-23 12:56:35 +00:00
cybermaggedon	4609424afe	Prepare 2.2 release branch (#704)	2026-03-22 15:23:23 +00:00
cybermaggedon	96fd1eab15	Use UUID-based URNs for page and chunk IDs (#703) Page and chunk document IDs were deterministic ({doc_id}/p{num}, {doc_id}/p{num}/c{num}), causing "Document already exists" errors when reprocessing documents through different flows. Content may differ between runs due to different parameters or extractors, so deterministic IDs are incorrect. Pages now use urn:page:{uuid}, chunks use urn:chunk:{uuid}. Parent-child relationships are tracked via librarian metadata and provenance triples. Also brings Mistral OCR and Tesseract OCR decoders up to parity with the PDF decoder: librarian fetch/save support, per-page output with unique IDs, and provenance triple emission. Fixes Mistral OCR bug where only the first 5 pages were processed.	2026-03-21 21:17:03 +00:00
cybermaggedon	1a7b654bd3	Add semantic pre-filter for GraphRAG edge scoring (#702 ) Embed edge descriptions and compute cosine similarity against grounding concepts to reduce the number of edges sent to expensive LLM scoring. Controlled by edge_score_limit parameter (default 30), skipped when edge count is already below the limit. Also plumbs edge_score_limit and edge_limit parameters end-to-end: - CLI args (--edge-score-limit, --edge-limit) in both invoke and service - Socket client: fix parameter mapping to use hyphenated wire-format keys - Flow API, message translator, gateway all pass through correctly - Explainable code path (_question_explainable_api) now forwards all params - Default edge_score_limit changed from 50 to 30 based on typical subgraph sizes	2026-03-21 20:06:29 +00:00
Jack Colquitt	d30857b5c3	Update video links and section titles in README	2026-03-20 21:33:16 -07:00
Jack Colquitt	b8ed36401a	Update README to reflect new section and links	2026-03-20 21:17:17 -07:00
Jack Colquitt	690ca4e837	Revise README to reflect context development platform Updated project description and platform details in README.	2026-03-19 15:51:02 -07:00
Cyber MacGeddon	3ec5e1b385	Merge branch 'release/v2.1'	2026-03-17 20:59:48 +00:00
cybermaggedon	bc68738c37	README.md from master (#701)	2026-03-17 20:54:04 +00:00
Cyber MacGeddon	c818a1fe17	Fix broken merge	2026-03-17 20:52:51 +00:00
Cyber MacGeddon	64b934c814	Fix changing the README	2026-03-17 20:51:17 +00:00
Cyber MacGeddon	824f993985	Merge branch 'release/v2.1'	2026-03-17 20:44:03 +00:00
cybermaggedon	664d1d0384	Update API specs for 2.1 (#699) * Updating API specs for 2.1 * Updated API and SDK docs	2026-03-17 20:36:31 +00:00
cybermaggedon	c387670944	Fix incorrect property names in explainability (#698 ) Remove type suffixes from explainability dataclass fields + fix show_explain_trace Rename dataclass fields to match KG property naming conventions: - Analysis: thought_uri/observation_uri → thought/observation - Synthesis/Conclusion/Reflection: document_uri → document Fix show_explain_trace for current API: - Resolve document content via librarian fetch instead of removed inline content fields (synthesis.content, conclusion.answer) - Add Grounding display for DocRAG traces - Update fetch_docrag_trace chain: Question → Grounding → Exploration → Synthesis - Pass api/explain_client to all print functions for content resolution Update all CLI tools and tests for renamed fields.	2026-03-16 14:47:37 +00:00
cybermaggedon	a115ec06ab	Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697 ) Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding, consistent PROV-O GraphRAG: - Split retrieval into 4 prompt stages: extract-concepts, kg-edge-scoring, kg-edge-reasoning, kg-synthesis (was single-stage) - Add concept extraction (grounding) for per-concept embedding - Filter main query to default graph, ignoring provenance/explainability edges - Add source document edges to knowledge graph DocumentRAG: - Add grounding step with concept extraction, matching GraphRAG's pattern: Question → Grounding → Exploration → Synthesis - Per-concept embedding and chunk retrieval with deduplication Cross-pipeline: - Make PROV-O derivation links consistent: wasGeneratedBy for first entity from Activity, wasDerivedFrom for entity-to-entity chains - Update CLIs (tg-invoke-agent, tg-invoke-graph-rag, tg-invoke-document-rag) for new explainability structure - Fix all affected unit and integration tests	2026-03-16 12:12:13 +00:00
cybermaggedon	29b4300808	Updated test suite for explainability & provenance (#696) * Provenance tests * Embeddings tests * Test librarian * Test triples stream * Test concurrency * Entity centric graph writes * Agent tool service tests * Structured data tests * RDF tests * Additional LLM tests * Reliability tests	2026-03-13 14:27:42 +00:00
cybermaggedon	e6623fc915	Remove schema:subjectOf edges from KG extraction (#695 ) The subjectOf triples were redundant with the subgraph provenance model introduced in `e8407b34`. Entity-to-source lineage can be traced via tg:contains -> subgraph -> prov:wasDerivedFrom -> chunk, making the direct subjectOf edges unnecessary metadata polluting the knowledge graph. Removed from all three extractors (agent, definitions, relationships), cleaned up the SUBJECT_OF constant and vocabulary label, and updated tests accordingly.	2026-03-13 12:11:21 +00:00
cybermaggedon	64e3f6bd0d	Subgraph provenance (#694 ) Replace per-triple provenance reification with subgraph model Extraction provenance previously created a full reification (statement URI, activity, agent) for every single extracted triple, producing ~13 provenance triples per knowledge triple. Since each chunk is processed by a single LLM call, this was both redundant and semantically inaccurate. Now one subgraph object is created per chunk extraction, with tg:contains linking to each extracted triple. For 20 extractions from a chunk this reduces provenance from ~260 triples to ~33. - Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri - Replace triple_provenance_triples() with subgraph_provenance_triples() - Refactor kg-extract-definitions and kg-extract-relationships to generate provenance once per chunk instead of per triple - Add subgraph provenance to kg-extract-ontology and kg-extract-agent (previously had none) - Update CLI tools and tech specs to match Also rename tg-show-document-hierarchy to tg-show-extraction-provenance. Added extra typing for extraction provenance, fixed extraction prov CLI	2026-03-13 11:37:59 +00:00
cybermaggedon	35128ff019	Add unified explainability support and librarian storage for (#693 ) Add unified explainability support and librarian storage for all retrieval engines Implements consistent explainability/provenance tracking across GraphRAG, DocumentRAG, and Agent retrieval engines. All large content (answers, thoughts, observations) is now stored in librarian rather than as inline literals in the knowledge graph. Explainability API: - New explainability.py module with entity classes (Question, Exploration, Focus, Synthesis, Analysis, Conclusion) and ExplainabilityClient - Quiescence-based eventual consistency handling for trace fetching - Content fetching from librarian with retry logic CLI updates: - tg-invoke-graph-rag -x/--explainable flag returns explain_id - tg-invoke-document-rag -x/--explainable flag returns explain_id - tg-invoke-agent -x/--explainable flag returns explain_id - tg-list-explain-traces uses new explainability API - tg-show-explain-trace handles all three trace types Agent provenance: - Records session, iterations (think/act/observe), and conclusion - Stores thoughts and observations in librarian with document references - New predicates: tg:thoughtDocument, tg:observationDocument DocumentRAG provenance: - Records question, exploration (chunk retrieval), and synthesis - Stores answers in librarian with document references Schema changes: - AgentResponse: added explain_id, explain_graph fields - RetrievalResponse: added explain_id, explain_graph fields - agent_iteration_triples: supports thought_document_id, observation_document_id Update tests.	2026-03-12 21:40:09 +00:00
cybermaggedon	aecf00f040	Minor agent tweaks (#692 ) Update RAG and Agent clients for streaming message handling GraphRAG now sends multiple message types in a stream: - 'explain' messages with explain_id and explain_graph for provenance - 'chunk' messages with response text fragments - end_of_session marker for stream completion Updated all clients to handle this properly: CLI clients (trustgraph-base/trustgraph/clients/): - graph_rag_client.py: Added chunk_callback and explain_callback - document_rag_client.py: Added chunk_callback and explain_callback - agent_client.py: Added think, observe, answer_callback, error_callback Internal clients (trustgraph-base/trustgraph/base/): - graph_rag_client.py: Async callbacks for streaming - agent_client.py: Async callbacks for streaming All clients now: - Route messages by chunk_type/message_type - Stream via optional callbacks for incremental delivery - Wait for proper completion signals (end_of_dialog/end_of_session/end_of_stream) - Accumulate and return complete response for callers not using callbacks Updated callers: - extract/kg/agent/extract.py: Uses new invoke(question=...) API - tests/integration/test_agent_kg_extraction_integration.py: Updated mocks This fixes the agent infinite loop issue where knowledge_query was returning the first 'explain' message (empty response) instead of waiting for the actual answer chunks. Concurrency in triples query	2026-03-12 17:59:02 +00:00
cybermaggedon	45e6ad4abc	Fix ontology RAG pipeline + add query concurrency (#691 ) - Fix ontology RAG pipeline: embeddings API, chunker provenance, and query concurrency - Fix ontology embeddings to use correct response shape from embed() API (returns list of vectors, not list of list of vectors). - Simplify chunker URI logic to append /c{index} to parent ID instead of parsing page/doc URI structure which was fragile. - Add provenance tracking and librarian integration to token chunker, matching recursive chunker capabilities. - Add configurable concurrency (default 10) to Cassandra, Qdrant, and embeddings query services.	2026-03-12 11:34:42 +00:00
Jack Colquitt	b8013fbed0	Update description of TrustGraph for clarity	2026-03-11 15:40:00 -07:00
Jack Colquitt	73ba197b89	Update TrustGraph link in README.md	2026-03-11 13:52:55 -07:00
Jack Colquitt	d464d552e9	Revise README for clarity and API key details Updated various sections for clarity and added information about API key requirements.	2026-03-11 13:50:32 -07:00
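
Commit 1a7b654bd3 above describes a semantic pre-filter for GraphRAG edge scoring: edge descriptions are embedded, compared by cosine similarity against the grounding concepts, and only the best `edge_score_limit` edges (default 30) go on to LLM scoring. A minimal sketch of that idea follows. It is not the repository's code: the function names, the use of precomputed vectors, and the choice to score each edge by its best-matching concept are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prefilter_edges(
    edges: List[str],
    edge_vectors: List[Vector],
    concept_vectors: List[Vector],
    edge_score_limit: int = 30,
) -> List[str]:
    """Hypothetical helper: keep the edges most similar to any concept.

    edge_vectors[i] is assumed to be the embedding of edges[i]'s
    description; concept_vectors are embeddings of the grounding concepts.
    """
    # Skip the filter when the edge count is already within the limit.
    if len(edges) <= edge_score_limit:
        return list(edges)

    # Assumption: an edge's score is its best similarity to any concept.
    scored = [
        (max((cosine(ev, cv) for cv in concept_vectors), default=0.0), i)
        for i, ev in enumerate(edge_vectors)
    ]
    # Highest similarity first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    # Return survivors in their original order.
    keep = sorted(i for _, i in scored[:edge_score_limit])
    return [edges[i] for i in keep]
```

In this sketch the filter only narrows the candidate set; the surviving edges would still be passed to the LLM scoring stage the commit message mentions.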


1234 commits