Commit graph

3 commits

Author SHA1 Message Date
elpresidank
50fb311d2d feat: real PDF pipeline test — end-to-end knowledge extraction working
Add full pipeline test that generates a real PDF, processes it through
the entire pipeline, and verifies knowledge lands in FalkorDB:

- Create test PDF generator using pdf-lib (2-page doc about Acme Corp)
- Add testFullPipeline() to integration tests with store verification
- Fix FalkorDB client connect() — createClient returns unconnected client
  in both TriplesStore and TriplesQuery classes

Results: PDF decoded (2 pages) → chunked (2 chunks) → extracted
(4 relationships) → 16 triples stored in FalkorDB including:
  alice-johnson → is-a-senior-engineer → acme-corporation
  cloudsync → uses-aws-for-hosting → amazon-web-services
  provenance: pages → prov:wasDerivedFrom → source document

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 02:19:12 -05:00
elpresidank
c545213224 feat: add query/retrieval FlowProcessor services and missing runner scripts
Wire up the query and retrieval side of the pipeline so the agent can
answer questions from stored knowledge:

- Triples query service (FalkorDB) — all SPO pattern queries via NATS
- Graph embeddings query service (Qdrant) — entity vector similarity
- Document embeddings query service (Qdrant) — chunk vector similarity
- Graph RAG service — full concept→entity→traverse→score→synthesize pipeline
- Document RAG service — embed→find chunks→synthesize pipeline
- Runner scripts for chunker, extractor, embeddings (missing from Phase 5)
- Add DocumentEmbeddingsRequest/Response schema types
- Add RAG prompt templates (extract-concepts, edge-scoring, synthesize)
- Add graph/doc embeddings query topics to seed config + flow manager
- Add all pipeline/query/retrieval services to docker-compose
- 8 new runner scripts, 8 new pnpm script aliases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 01:05:54 -05:00
elpresidank
b6536eca38 init 2026-04-05 22:44:45 -05:00