Commit graph

1243 commits

Author SHA1 Message Date
elpresidank
50fb311d2d feat: real PDF pipeline test — end-to-end knowledge extraction working
Add full pipeline test that generates a real PDF, processes it through
the entire pipeline, and verifies knowledge lands in FalkorDB:

- Create test PDF generator using pdf-lib (2-page doc about Acme Corp)
- Add testFullPipeline() to integration tests with store verification
- Fix FalkorDB client connect() — createClient returns unconnected client
  in both TriplesStore and TriplesQuery classes

Results: PDF decoded (2 pages) → chunked (2 chunks) → extracted
(4 relationships) → 16 triples stored in FalkorDB including:
  alice-johnson → is-a-senior-engineer → acme-corporation
  cloudsync → uses-aws-for-hosting → amazon-web-services
  provenance: pages → prov:wasDerivedFrom → source document

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 02:19:12 -05:00
elpresidank
5bc7a1b6fc fix: resolve FlowProcessor topic collisions, librarian timeout, tests
Two bugs found during end-to-end testing:

1. FlowProcessor never restarted flows when config changed — it only
   started them once. Stale NATS JetStream data from previous sessions
   caused services to bind to wrong topics. Fix: stop and restart flows
   on every config push that includes flow definitions.

2. Gateway publishToTopic sent messages without an id property. Pipeline
   FlowProcessor handlers check properties.id and silently return if
   missing. Fix: auto-generate a message id when publishing to topics.

Both fixes validated: 13/13 integration tests passing, PDF decoder
correctly receives and processes document messages through the pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 01:53:55 -05:00
elpresidank
c545213224 feat: add query/retrieval FlowProcessor services and missing runner scripts
Wire up the query and retrieval side of the pipeline so the agent can
answer questions from stored knowledge:

- Triples query service (FalkorDB) — all SPO pattern queries via NATS
- Graph embeddings query service (Qdrant) — entity vector similarity
- Document embeddings query service (Qdrant) — chunk vector similarity
- Graph RAG service — full concept→entity→traverse→score→synthesize pipeline
- Document RAG service — embed→find chunks→synthesize pipeline
- Runner scripts for chunker, extractor, embeddings (missing from Phase 5)
- Add DocumentEmbeddingsRequest/Response schema types
- Add RAG prompt templates (extract-concepts, edge-scoring, synthesize)
- Add graph/doc embeddings query topics to seed config + flow manager
- Add all pipeline/query/retrieval services to docker-compose
- 8 new runner scripts, 8 new pnpm script aliases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 01:05:54 -05:00
elpresidank
8f7008822a feat: add document pipeline — PDF decoder, Ollama LLM, storage services
Add end-to-end document processing pipeline:
- PDF decoder service (pdfjs-dist) extracts text per page from librarian docs
- Ollama native LLM service for local model inference
- FalkorDB triples store FlowProcessor consumer
- Qdrant graph embeddings store FlowProcessor consumer
- Fix spec name collisions in chunker/extractor (input→chunk-input, etc.)
- Gateway /load endpoint to trigger document processing
- Align flow manager blueprint and seed config with full pipeline topics
- Add runner scripts and test coverage for document load

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:47:43 -05:00
elpresidank
8f9de7604e fix: make abstract class constructors protected
Marks FlowProcessor and EmbeddingsService constructors as protected
since these classes should only be instantiated via subclasses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 21:52:00 -05:00
elpresidank
25d4227cb5 fix: resolve FlowProcessor topic collisions, librarian timeout, tests
Fix critical bug where all FlowProcessor services shared the same spec
names ("request"/"response"), causing them to steal each other's NATS
topics. Now each service uses unique spec names matching the flow config
topic keys (e.g., "text-completion-request", "prompt-request",
"agent-request").

Fix librarian NATS consumer timeout (500ms → 2000ms, below NATS minimum).

Update seed-config and test-pipeline with correct flow topic mappings.
Add prompt template runner script.

Smoke test results: 11/11 passing (config CRUD, WebSocket, LLM,
librarian CRUD). Agent routing verified via manual curl test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:02:10 -05:00
elpresidank
515fc0c264 fix: Docker build fixes, add agent/librarian/flow-manager to compose
Fix Containerfiles:
- Move tsconfig.json to workspace config layer for early availability
- Add missing workspace package.json entries for pnpm lockfile resolution

Docker Compose:
- Move Grafana from port 3000 to 3030 (avoid conflicts)
- Add agent, librarian, and flow-manager app services
- Add librarian-data volume for document persistence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:41:01 -05:00
elpresidank
7db5a1023e feat: add flow manager, config seeding, and expanded integration tests
Flow Management Service:
- FlowManagerService (AsyncProcessor) handling list/get/start/stop flows
  and list/get blueprints via kebab-case wire format
- Default blueprint with all service topic mappings
- Pushes flow config to config service on start/stop

Config Seeding:
- seed-config.ts script pushes prompt templates (extract-relationships,
  extract-definitions, document-prompt, kg-prompt) and default flow
  definition via gateway REST API

Integration Tests:
- Librarian CRUD: add-document, list-documents, get-content, delete
- Agent query: verifies routing through gateway to agent service
- Skip flags: SKIP_LIBRARIAN=1, SKIP_AGENT=1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:37:03 -05:00
elpresidank
d1f24cf759 feat: add Docker deployment with Containerfile, entrypoints, and nginx
Multi-stage Containerfile for all Node.js services (single image,
different CMD per docker-compose service). ESM entrypoints for gateway,
config, text-completion, prompt, embeddings, agent, and librarian.

Workbench gets a separate Containerfile (nginx:alpine) with SPA routing
and API/WebSocket proxy to gateway.

Docker Compose updated with 6 app services (gateway, config-service,
text-completion, prompt, embeddings, workbench) using shared
trustgraph-ts:local image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:21:00 -05:00
elpresidank
f09ef4de45 feat: add document pipeline, ReAct agent, and knowledge core services
Document Pipeline (Team A):
- LibrarianService: document storage with filesystem backend, metadata
  persistence, child document hierarchy, collection management
- ChunkingService: recursive character text splitter with configurable
  chunk size/overlap, FlowProcessor pattern
- KnowledgeExtractService: combined relationship + definition extraction
  using prompt service and LLM, emits RDF triples and entity contexts
- KnowledgeCoreService: knowledge core CRUD with streaming export and
  flow-based loading

ReAct Agent (Team B):
- StreamingReActParser: state machine for parsing LLM output into
  Thought/Action/ActionInput/FinalAnswer sections
- Three MVP tools: KnowledgeQuery (GraphRAG), DocumentQuery (DocRAG),
  TriplesQuery with RequestResponse clients
- AgentService FlowProcessor with ReAct loop, tool execution, and
  streaming chunk responses (thought/observation/answer)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:19:37 -05:00
elpresidank
5ed3f0e2d8 feat: add schema foundation for document pipeline, agent, and deployment
Add missing topics (librarian, knowledge, collection-management, flow),
pipeline message types (TextDocument, Chunk, Triples, EntityContexts),
service message types (Librarian, Knowledge, Collection, Flow CRUD),
and update AgentResponse for streaming chunk format.

Add RequestResponseSpec enabling flow-scoped request/response calls
(needed by knowledge extraction and agent services). Add requestor
registry to Flow class with proper lifecycle management.

Add end_of_dialog to gateway's isComplete() check for agent streaming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:11:29 -05:00
elpresidank
28747e1a92 fix: NATS pipeline bugs, add integration tests and service runners
Fix three critical bugs preventing the NATS message pipeline from working:

- FlowProcessor now subscribes to config-push topic (was missing entirely),
  using DeliverPolicy.All to replay config on service restart
- NATS streams use wildcard subjects (tg.flow.>) instead of per-topic
  narrow filters that caused 503 errors on publish
- Subscriber dispatch loop has exponential backoff on errors to prevent
  tight error loops

Add service runner scripts (gateway, config, LLM) and a 7-test
integration suite that verifies config CRUD, WebSocket round-trip,
and full LLM text-completion through the NATS pipeline.

Fix Docker Compose infra: pin Tempo to v2.6.1, remove deprecated Loki
config fields, add user:0 for volume permissions, remap conflicting
ports (FalkorDB 6380, OTLP 4327/4328).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:41:39 -05:00
elpresidank
0042f9259c fix: linter cleanup on flow service implementations
Minor fixes from linter: readonly modifiers, unused parameter prefixes,
type narrowing in graph-rag BFS traversal and edge scoring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 22:52:40 -05:00
elpresidank
b6536eca38 init 2026-04-05 22:44:45 -05:00
elpresidank
c386f68743 Merge commit '74cc8a4685' as 'ai-context/trustgraph-templates' 2026-04-05 21:09:49 -05:00
elpresidank
74cc8a4685 Squashed 'ai-context/trustgraph-templates/' content from commit 42a5fd1b
git-subtree-dir: ai-context/trustgraph-templates
git-subtree-split: 42a5fd1b678f32be378062e30451e2052ccb95dd
2026-04-05 21:09:49 -05:00
elpresidank
e26caa0b12 saving 2026-04-05 21:09:33 -05:00
elpresidank
9e9307a2aa Merge commit 'ad40332d56' as 'ai-context/trustgraph-templates' 2026-04-05 21:08:57 -05:00
elpresidank
ad40332d56 Squashed 'ai-context/trustgraph-templates/' content from commit 338a8ffa
git-subtree-dir: ai-context/trustgraph-templates
git-subtree-split: 338a8ffadb1439013071ae922e55ed2421f17025
2026-04-05 21:08:57 -05:00
elpresidank
ecaf3489f1 Merge commit '9b2f675702' as 'ai-context/context-graph-demo' 2026-04-05 21:08:35 -05:00
elpresidank
9b2f675702 Squashed 'ai-context/context-graph-demo/' content from commit 338a8ffa
git-subtree-dir: ai-context/context-graph-demo
git-subtree-split: 338a8ffadb1439013071ae922e55ed2421f17025
2026-04-05 21:08:35 -05:00
elpresidank
1a72bfdec0 Merge commit 'a8390532f7' as 'ai-context/workbench-ui' 2026-04-05 21:08:02 -05:00
elpresidank
a8390532f7 Squashed 'ai-context/workbench-ui/' content from commit 32e36a5c
git-subtree-dir: ai-context/workbench-ui
git-subtree-split: 32e36a5c2131e429a7081cfaf67dabad3193cda3
2026-04-05 21:08:02 -05:00
elpresidank
05d87964c2 Merge commit 'deff028fed' as 'ai-context/trustgraph-client' 2026-04-05 21:07:35 -05:00
elpresidank
deff028fed Squashed 'ai-context/trustgraph-client/' content from commit 908f18cf
git-subtree-dir: ai-context/trustgraph-client
git-subtree-split: 908f18cf814470ec3b72cc336bb945fb792ffdec
2026-04-05 21:07:35 -05:00
Jack Colquitt
be443a1679
Refine README content and remove Table of Contents (#759)
Updated the README to improve clarity and remove the Table of Contents section.
2026-04-04 13:40:12 -07:00
Jack Colquitt
8d1a4ae3bf
Revise quickstart instructions in README.md (#758)
Updated the README to clarify the configuration process and improve wording.
2026-04-04 13:34:12 -07:00
Alex Jenkins
2f484b4c15
fix deadlink in readme (#735)
Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu>
2026-03-29 16:51:40 -07:00
cybermaggedon
2449392896
release/v2.2 -> master (#733) 2026-03-29 20:27:25 +01:00
Alex Jenkins
3ed71a5620
Add security policy (#731) 2026-03-29 20:17:48 +01:00
Jack Colquitt
060ed258eb
Add license badge to README (#725) 2026-03-27 11:28:22 -07:00
Cyber MacGeddon
5702bcae1d New CLA workflow: Uses a github action in
trustgraph-ai/contributor-license-agreement

This blocks a PR until the commiter responds with a message
of agreement with the CLA terms.
2026-03-26 14:11:36 +00:00
cybermaggedon
3ccff800c7
Merge pull request #712 from trustgraph-ai/release/v2.2
release/v2.2 -> master
2026-03-25 17:49:19 +00:00
cybermaggedon
9330730afb
Add chunk content ID to explain trace provenance output (#708)
When --show-provenance is used with tg-show-explain-trace,
display the chunk URI on a Content: line below each Source:
chain. This allows the user to easily fetch the source text
with tg-get-document-content.
2026-03-23 16:20:52 +00:00
cybermaggedon
25995d03f4
Fix stray log messages caused by librarian messages (#706)
Warning generated by librarian responses meant for other
services (chunker, embeddings, etc.) arriving on the shared
response queue. The decoder's subscription picks them up, can't
match them to a pending request, and logs a warning.

Removed the warnings, as not serving a purpose.
2026-03-23 13:16:39 +00:00
cybermaggedon
5c6fe90fe2
Add universal document decoder with multi-format support (#705)
Add universal document decoder with multi-format support
using 'unstructured'.

New universal decoder service powered by the unstructured
library, handling DOCX, XLSX, PPTX, HTML, Markdown, CSV, RTF,
ODT, EPUB and more through a single service. Tables are preserved
as HTML markup for better downstream extraction. Images are
stored in the librarian but excluded from the text
pipeline. Configurable section grouping strategies
(whole-document, heading, element-type, count, size) for non-page
formats. Page-based formats (PDF, PPTX, XLSX) are automatically
grouped by page.

All four decoders (PDF, Mistral OCR, Tesseract OCR, universal)
now share the "document-decoder" ident so they are
interchangeable.  PDF-only decoders fetch document metadata to
check MIME type and gracefully skip unsupported formats.

Librarian changes: removed MIME type whitelist validation so any
document format can be ingested. Simplified routing so text/plain
goes to text-load and everything else goes to document-load.
Removed dual inline/streaming data paths — documents always use
document_id for content retrieval.

New provenance entity types (tg:Section, tg:Image) and metadata
predicates (tg:elementTypes, tg:tableCount, tg:imageCount) for
richer explainability.

Universal decoder is in its own package (trustgraph-unstructured)
and container image (trustgraph-unstructured).
2026-03-23 12:56:35 +00:00
cybermaggedon
4609424afe
Prepare 2.2 release branch (#704) 2026-03-22 15:23:23 +00:00
cybermaggedon
96fd1eab15
Use UUID-based URNs for page and chunk IDs (#703)
Page and chunk document IDs were deterministic ({doc_id}/p{num},
{doc_id}/p{num}/c{num}), causing "Document already exists" errors
when reprocessing documents through different flows. Content may
differ between runs due to different parameters or extractors, so
deterministic IDs are incorrect.

Pages now use urn:page:{uuid}, chunks use
urn:chunk:{uuid}. Parent- child relationships are tracked via
librarian metadata and provenance triples.

Also brings Mistral OCR and Tesseract OCR decoders up to parity
with the PDF decoder: librarian fetch/save support, per-page
output with unique IDs, and provenance triple emission. Fixes
Mistral OCR bug where only the first 5 pages were processed.
2026-03-21 21:17:03 +00:00
cybermaggedon
1a7b654bd3
Add semantic pre-filter for GraphRAG edge scoring (#702)
Embed edge descriptions and compute cosine similarity against grounding
concepts to reduce the number of edges sent to expensive LLM scoring.
Controlled by edge_score_limit parameter (default 30), skipped when edge
count is already below the limit.

Also plumbs edge_score_limit and edge_limit parameters end-to-end:
- CLI args (--edge-score-limit, --edge-limit) in both invoke and service
- Socket client: fix parameter mapping to use hyphenated wire-format keys
- Flow API, message translator, gateway all pass through correctly
- Explainable code path (_question_explainable_api) now forwards all params
- Default edge_score_limit changed from 50 to 30 based on typical subgraph
  sizes
2026-03-21 20:06:29 +00:00
Jack Colquitt
d30857b5c3
Update video links and section titles in README 2026-03-20 21:33:16 -07:00
Jack Colquitt
b8ed36401a
Update README to reflect new section and links 2026-03-20 21:17:17 -07:00
Jack Colquitt
690ca4e837
Revise README to reflect context development platform
Updated project description and platform details in README.
2026-03-19 15:51:02 -07:00
Cyber MacGeddon
3ec5e1b385 Merge branch 'release/v2.1' 2026-03-17 20:59:48 +00:00
cybermaggedon
bc68738c37
README.md from master (#701) 2026-03-17 20:54:04 +00:00
Cyber MacGeddon
c818a1fe17 Fix broken merge 2026-03-17 20:52:51 +00:00
Cyber MacGeddon
64b934c814 Fix changing the README 2026-03-17 20:51:17 +00:00
Cyber MacGeddon
824f993985 Merge branch 'release/v2.1' 2026-03-17 20:44:03 +00:00
cybermaggedon
664d1d0384
Update API specs for 2.1 (#699)
* Updating API specs for 2.1

* Updated API and SDK docs
2026-03-17 20:36:31 +00:00
cybermaggedon
c387670944
Fix incorrect property names in explainability (#698)
Remove type suffixes from explainability dataclass fields + fix show_explain_trace

Rename dataclass fields to match KG property naming conventions:
- Analysis: thought_uri/observation_uri → thought/observation
- Synthesis/Conclusion/Reflection: document_uri → document

Fix show_explain_trace for current API:
- Resolve document content via librarian fetch instead of removed
  inline content fields (synthesis.content, conclusion.answer)
- Add Grounding display for DocRAG traces
- Update fetch_docrag_trace chain: Question → Grounding → Exploration →
Synthesis
- Pass api/explain_client to all print functions for content resolution

Update all CLI tools and tests for renamed fields.
2026-03-16 14:47:37 +00:00
cybermaggedon
a115ec06ab
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding (#697)
Enhance retrieval pipelines: 4-stage GraphRAG, DocRAG grounding,
consistent PROV-O

GraphRAG:
- Split retrieval into 4 prompt stages: extract-concepts,
  kg-edge-scoring,
  kg-edge-reasoning, kg-synthesis (was single-stage)
- Add concept extraction (grounding) for per-concept embedding
- Filter main query to default graph, ignoring
  provenance/explainability edges
- Add source document edges to knowledge graph

DocumentRAG:
- Add grounding step with concept extraction, matching GraphRAG's
  pattern:
  Question → Grounding → Exploration → Synthesis
- Per-concept embedding and chunk retrieval with deduplication

Cross-pipeline:
- Make PROV-O derivation links consistent: wasGeneratedBy for first
  entity from Activity, wasDerivedFrom for entity-to-entity chains
- Update CLIs (tg-invoke-agent, tg-invoke-graph-rag,
  tg-invoke-document-rag) for new explainability structure
- Fix all affected unit and integration tests
2026-03-16 12:12:13 +00:00