Commit graph

5 commits

Author SHA1 Message Date
elpresidank
50fb311d2d feat: real PDF pipeline test — end-to-end knowledge extraction working
Add full pipeline test that generates a real PDF, processes it through
the entire pipeline, and verifies knowledge lands in FalkorDB:

- Create test PDF generator using pdf-lib (2-page doc about Acme Corp)
- Add testFullPipeline() to integration tests with store verification
- Fix FalkorDB client connect() — createClient returns unconnected client
  in both TriplesStore and TriplesQuery classes

Results: PDF decoded (2 pages) → chunked (2 chunks) → extracted
(4 relationships) → 16 triples stored in FalkorDB including:
  alice-johnson → is-a-senior-engineer → acme-corporation
  cloudsync → uses-aws-for-hosting → amazon-web-services
  provenance: pages → prov:wasDerivedFrom → source document

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 02:19:12 -05:00
elpresidank
8f7008822a feat: add document pipeline — PDF decoder, Ollama LLM, storage services
Add end-to-end document processing pipeline:
- PDF decoder service (pdfjs-dist) extracts text per page from librarian docs
- Ollama native LLM service for local model inference
- FalkorDB triples store FlowProcessor consumer
- Qdrant graph embeddings store FlowProcessor consumer
- Fix spec name collisions in chunker/extractor (input→chunk-input, etc.)
- Gateway /load endpoint to trigger document processing
- Align flow manager blueprint and seed config with full pipeline topics
- Add runner scripts and test coverage for document load

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:47:43 -05:00
elpresidank
8f9de7604e fix: make abstract class constructors protected
Marks FlowProcessor and EmbeddingsService constructors as protected
since these classes should only be instantiated via subclasses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 21:52:00 -05:00
elpresidank
28747e1a92 fix: NATS pipeline bugs, add integration tests and service runners
Fix three critical bugs preventing the NATS message pipeline from working:

- FlowProcessor now subscribes to config-push topic (was missing entirely),
  using DeliverPolicy.All to replay config on service restart
- NATS streams use wildcard subjects (tg.flow.>) instead of per-topic
  narrow filters that caused 503 errors on publish
- Subscriber dispatch loop has exponential backoff on errors to prevent
  tight error loops

Add service runner scripts (gateway, config, LLM) and a 7-test
integration suite that verifies config CRUD, WebSocket round-trip,
and full LLM text-completion through the NATS pipeline.

Fix Docker Compose infra: pin Tempo to v2.6.1, remove deprecated Loki
config fields, add user:0 for volume permissions, remap conflicting
ports (FalkorDB 6380, OTLP 4327/4328).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:41:39 -05:00
elpresidank
b6536eca38 init 2026-04-05 22:44:45 -05:00