feat: real PDF pipeline test — end-to-end knowledge extraction working

Add full pipeline test that generates a real PDF, processes it through
the entire pipeline, and verifies knowledge lands in FalkorDB:

- Create test PDF generator using pdf-lib (2-page doc about Acme Corp)
- Add testFullPipeline() to integration tests with store verification
- Fix FalkorDB connection in both the TriplesStore and TriplesQuery
  classes: createClient returns an unconnected client, so connect()
  must be called before use
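
The connect() fix above can be sketched roughly as follows. This is an
illustration only, not the code in this commit: GraphClient, the local
createClient stub, and LazyConnection are hypothetical stand-ins that
mimic a factory returning an unconnected client, so the sketch runs
without a FalkorDB server.

```typescript
// Hypothetical stand-in for a Redis-style graph client (not the real FalkorDB API).
interface GraphClient {
  isOpen: boolean;
  connect(): Promise<void>;
  query(cypher: string): Promise<string>;
}

// Stub factory: like the createClient described above, it returns a
// client that has NOT connected yet.
function createClient(): GraphClient {
  const client: GraphClient = {
    isOpen: false,
    async connect() {
      client.isOpen = true;
    },
    async query(cypher: string) {
      if (!client.isOpen) throw new Error("client is closed");
      return `ok: ${cypher}`;
    },
  };
  return client;
}

// Hypothetical helper that a store class and a query class could share.
// It caches the connection promise so concurrent callers connect once.
class LazyConnection {
  private ready: Promise<GraphClient> | undefined;

  get(): Promise<GraphClient> {
    if (!this.ready) {
      const client = createClient();
      // The step that was missing: connect before handing the client out.
      this.ready = client.connect().then(() => client);
    }
    return this.ready;
  }
}
```

Caching the promise rather than the client is a design choice made for
this sketch; it keeps two simultaneous first calls from opening two
connections.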

Results: PDF decoded (2 pages) → chunked (2 chunks) → extracted
(4 relationships) → 16 triples stored in FalkorDB including:
  alice-johnson → is-a-senior-engineer → acme-corporation
  cloudsync → uses-aws-for-hosting → amazon-web-services
  provenance: pages → prov:wasDerivedFrom → source document
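
The identifiers in the sample triples above look like slugified labels.
How the pipeline actually normalizes names is not shown in this commit,
so the sketch below is an assumption: slugify, Triple, and toTriple are
hypothetical names used only to illustrate one way such identifiers
could be produced.

```typescript
// Hypothetical normalizer: lowercases a label and joins its
// alphanumeric runs with hyphens.
function slugify(label: string): string {
  return label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces and punctuation
    .replace(/^-+|-+$/g, ""); // strip leading and trailing hyphens
}

// Hypothetical triple shape: subject, predicate, object.
type Triple = { s: string; p: string; o: string };

// Build a triple whose three parts are all slugified labels.
function toTriple(subject: string, predicate: string, object: string): Triple {
  return { s: slugify(subject), p: slugify(predicate), o: slugify(object) };
}

const example = toTriple("CloudSync", "uses AWS for hosting", "Amazon Web Services");
console.log(`${example.s} → ${example.p} → ${example.o}`);
```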

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
elpresidank 2026-04-07 02:19:12 -05:00
parent 5bc7a1b6fc
commit 50fb311d2d
6 changed files with 269 additions and 1 deletions

@@ -29,10 +29,12 @@
 "graph-embeddings-query": "tsx scripts/run-graph-embeddings-query.ts",
 "doc-embeddings-query": "tsx scripts/run-doc-embeddings-query.ts",
 "graph-rag": "tsx scripts/run-graph-rag.ts",
-"document-rag": "tsx scripts/run-document-rag.ts"
+"document-rag": "tsx scripts/run-document-rag.ts",
+"create-test-pdf": "tsx scripts/create-test-pdf.ts"
 },
 "devDependencies": {
 "nats": "^2.29.0",
+"pdf-lib": "^1.17.1",
 "tsx": "^4.21.0",
 "turbo": "^2.5.0",
 "typescript": "^5.8.0"