mirror of
https://github.com/asg017/sqlite-vec.git
synced 2026-04-25 16:56:27 +02:00
Add approximate nearest neighbor infrastructure to vec0: shared distance dispatch (vec0_distance_full), flat index type with parser, NEON-optimized cosine/Hamming for float32/int8, amalgamation script, and benchmark suite (benchmarks-ann/) with ground-truth generation and profiling tools. Remove unused vec_npy_each/vec_static_blobs code, fix missing stdint.h include.
TODO: ann base branch + consolidated benchmarks
1. Create ann branch with shared code
1.1 Branch setup
- git checkout -B ann origin/main
- Cherry-pick 624f998 (vec0_distance_full shared distance dispatch)
- Cherry-pick stdint.h fix for test header
- Pull NEON cosine optimization from ivf-yolo3 into shared code
  - Currently only in ivf branch but is general-purpose (benefits all distance calcs)
  - Lives in distance_cosine_float(): ~57 lines of ARM NEON vectorized cosine
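For reference when porting the NEON path, a scalar version of the same computation can be sketched as below. This assumes the kernel returns cosine distance as 1 minus cosine similarity; the function name `cosine_distance` is made up for illustration and is not the C function in the repo.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Scalar reference: 1 - (a . b) / (|a| * |b|).

    Assumption: this mirrors what the vectorized kernel computes.
    Raises ZeroDivisionError if either vector has zero magnitude.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = a_mag = b_mag = 0.0
    # A SIMD kernel keeps these three running sums in vector lanes
    # and reduces them once at the end; here they are plain floats.
    for x, y in zip(a, b):
        dot += x * y
        a_mag += x * x
        b_mag += y * y
    return 1.0 - dot / (math.sqrt(a_mag) * math.sqrt(b_mag))
```

A scalar reference like this is handy as an oracle in tests: run both paths on random vectors and compare within a small tolerance.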
1.2 Benchmark infrastructure (benchmarks-ann/)
- Seed data pipeline (seed/Makefile, seed/build_base_db.py)
- Ground truth generator (ground_truth.py)
- Results schema (schema.sql)
- Benchmark runner with INDEX_REGISTRY extension point (bench.py)
  - Baseline configs (float, int8-rescore, bit-rescore) implemented
  - Index branches register their types via INDEX_REGISTRY dict
- Makefile with baseline targets
- README
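As a rough illustration of what a ground truth generator and the recall metric built on it involve, here is a minimal brute-force sketch. It is not the contents of ground_truth.py; the function names, the use of L2 distance, and the in-memory dict of vectors are all assumptions made for the example.

```python
import heapq
import math
from typing import Dict, List, Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(
    base: Dict[int, Sequence[float]], query: Sequence[float], k: int
) -> List[int]:
    """Exact k nearest rowids by scanning every base vector (hypothetical helper)."""
    scored = ((l2(vec, query), rowid) for rowid, vec in base.items())
    return [rowid for _, rowid in heapq.nsmallest(k, scored)]


def recall_at_k(found: List[int], truth: List[int], k: int) -> float:
    """Fraction of the true top-k that an index returned in its top-k."""
    if k <= 0:
        return 0.0
    return len(set(found[:k]) & set(truth[:k])) / k
```

The exact scan is O(n) per query, which is why storing its output once and reusing it across runs is attractive for larger subsets.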
1.3 Rebase feature branches onto ann
- Rebase diskann-yolo2 onto ann (1 commit: DiskANN implementation)
- Rebase ivf-yolo3 onto ann (1 commit: IVF implementation)
- Rebase annoy-yolo2 onto ann (2 commits: Annoy implementation + schema fix)
- Verify each branch has only its index-specific commits remaining
- Force-push all 4 branches to origin
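The verification step can be scripted. The sketch below counts the commits each feature branch carries on top of ann with `git rev-list --count` and compares them to the counts listed above. The helper names are hypothetical, and it assumes it runs inside the repository with all branches present locally.

```python
import subprocess
from typing import Dict, List

# Expected commits on top of ann, taken from the rebase list above.
EXPECTED = {"diskann-yolo2": 1, "ivf-yolo3": 1, "annoy-yolo2": 2}


def commits_ahead(branch: str, base: str = "ann") -> int:
    """Number of commits reachable from branch but not from base."""
    result = subprocess.run(
        ["git", "rev-list", "--count", f"{base}..{branch}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def mismatches(actual: Dict[str, int], expected: Dict[str, int] = EXPECTED) -> List[str]:
    """Return one message per branch whose commit count differs from expected."""
    problems = []
    for branch, want in expected.items():
        got = actual.get(branch)
        if got != want:
            problems.append(f"{branch}: expected {want} commit(s), found {got}")
    return problems
```

Inside the repo, `mismatches({b: commits_ahead(b) for b in EXPECTED})` should return an empty list before force-pushing.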
2. Per-branch: register index type in benchmarks
Once rebased onto ann, each index branch should add the following to benchmarks-ann/:
2.1 Register in bench.py
Add an INDEX_REGISTRY entry. Each entry provides:
- defaults: default param values
- create_table_sql(params): CREATE VIRTUAL TABLE with INDEXED BY clause
- insert_sql(params): custom insert SQL, or None for default
- post_insert_hook(conn, params): training/building step, returns time
- run_query(conn, params, query, k): custom query, or None for default MATCH
- describe(params): one-line description for report output
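A registry entry following the fields above might look roughly like this. Everything specific here is a placeholder: the index type name `example`, the parameters `dims` and `n`, the table name, and the exact INDEXED BY syntax are assumptions, and `INDEX_REGISTRY` is redefined locally only so the sketch runs on its own. It also assumes "None for default" means the key itself is set to None.

```python
import time

# Stand-in for the dict that bench.py defines; an index branch would import it instead.
INDEX_REGISTRY = {}


def _create_table_sql(params):
    # Hypothetical table name and INDEXED BY syntax; substitute whatever
    # the index branch's parser actually accepts.
    return (
        "CREATE VIRTUAL TABLE vec_items USING vec0("
        f"embedding float[{params['dims']}] INDEXED BY example(n={params['n']})"
        ")"
    )


def _post_insert_hook(conn, params):
    # Training or index building would happen here; return elapsed seconds.
    start = time.perf_counter()
    return time.perf_counter() - start


def _describe(params):
    return f"example n={params['n']}"


INDEX_REGISTRY["example"] = {
    "defaults": {"dims": 768, "n": 16},
    "create_table_sql": _create_table_sql,
    "insert_sql": None,  # None: fall back to the runner's default insert
    "post_insert_hook": _post_insert_hook,
    "run_query": None,  # None: fall back to the default MATCH query
    "describe": _describe,
}
```

Keeping each entry a plain dict of callables means an index branch only touches one place in bench.py, which should keep rebases onto ann small.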
2.2 Add configs to Makefile
Append index-specific config variables and targets. Example pattern:
DISKANN_CONFIGS = \
  "diskann-R48-binary:type=diskann,R=48,L=128,quantizer=binary" \
  ...

ALL_CONFIGS += $(DISKANN_CONFIGS)

bench-diskann: seed
	$(BENCH) --subset-size 10000 -k 10 -o runs/diskann $(BASELINES) $(DISKANN_CONFIGS)
	...
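The config strings in the pattern above follow a `name:key=value,key=value` shape. How bench.py actually parses them is not shown here, so the following is only a sketch of one plausible parser; `parse_config` and the choice to convert all-digit values to integers are assumptions.

```python
from typing import Dict, Tuple, Union

ParamValue = Union[int, str]


def parse_config(spec: str) -> Tuple[str, Dict[str, ParamValue]]:
    """Split 'name:k1=v1,k2=v2' into (name, {k1: v1, k2: v2})."""
    name, sep, rest = spec.partition(":")
    if not sep or not name:
        raise ValueError(f"expected 'name:key=value,...', got {spec!r}")
    params: Dict[str, ParamValue] = {}
    for pair in rest.split(","):
        key, eq, value = pair.partition("=")
        if not eq or not key:
            raise ValueError(f"bad key=value pair {pair!r} in {spec!r}")
        # Assumption: all-digit values are integer parameters such as R or L.
        params[key] = int(value) if value.isdigit() else value
    return name, params
```

With this shape, the `type` key can select the INDEX_REGISTRY entry and the remaining keys can override that entry's defaults.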
2.3 Migrate existing benchmark results/docs
- Move useful results docs (RESULTS.md, etc.) into benchmarks-ann/results/
- Delete redundant per-branch benchmark directories once consolidated infra is proven
3. Future improvements
- Reporting script (report.py): query results.db, produce markdown comparison tables
- Profiling targets in Makefile (lift from ivf-yolo3's Instruments/perf wrappers)
- Pre-computed ground truth integration (use GT DB files instead of on-the-fly brute-force)
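A first cut of the reporting script could be as small as the sketch below. The real layout lives in schema.sql and is not reproduced here, so the `results` table and its `config`, `recall`, and `qps` columns are hypothetical, as is the function name `markdown_report`.

```python
import sqlite3


def markdown_report(conn: sqlite3.Connection) -> str:
    """Render one markdown table row per config, best recall first.

    Assumes a hypothetical table: results(config TEXT, recall REAL, qps REAL).
    """
    rows = conn.execute(
        "SELECT config, recall, qps FROM results ORDER BY recall DESC, config"
    ).fetchall()
    lines = ["| config | recall | qps |", "| --- | --- | --- |"]
    for config, recall, qps in rows:
        lines.append(f"| {config} | {recall:.3f} | {qps:.1f} |")
    return "\n".join(lines)
```

Usage would be along the lines of `print(markdown_report(sqlite3.connect("results.db")))`, with the query adjusted to the actual schema.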