Mirror of https://github.com/asg017/sqlite-vec.git, synced 2026-04-26 01:06:27 +02:00
Add ANN search support for vec0 virtual table
Add approximate nearest neighbor infrastructure to vec0: shared distance dispatch (vec0_distance_full), flat index type with parser, NEON-optimized cosine/Hamming for float32/int8, amalgamation script, and benchmark suite (benchmarks-ann/) with ground-truth generation and profiling tools. Remove unused vec_npy_each/vec_static_blobs code, fix missing stdint.h include.
Parent: dfd8dc5290
Commit: bf2455f2ba
27 changed files with 2177 additions and 2116 deletions
TODO.md (new file, 73 lines)
# TODO: `ann` base branch + consolidated benchmarks

## 1. Create `ann` branch with shared code

### 1.1 Branch setup
- [x] `git checkout -B ann origin/main`
- [x] Cherry-pick `624f998` (vec0_distance_full shared distance dispatch)
- [x] Cherry-pick stdint.h fix for test header
- [ ] Pull NEON cosine optimization from ivf-yolo3 into shared code
  - Currently only in the ivf branch, but it is general-purpose (benefits all distance calculations, not just IVF)
  - Lives in `distance_cosine_float()` — ~57 lines of ARM NEON vectorized cosine
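
The NEON routine itself lives on `ivf-yolo3` and is not reproduced here. The plain-Python sketch below can serve as a reference when testing the port. It mirrors the accumulation pattern a 4-lane SIMD cosine typically uses: separate dot-product and squared-norm accumulators per lane, a horizontal sum, then a scalar tail for leftover elements. Several details are assumptions rather than facts taken from `distance_cosine_float()`: that cosine distance is `1 - dot(a, b) / (|a| * |b|)`, the lane count, the reduction order, and the zero-vector handling.

```python
import math
from typing import Sequence

# Assumption: one float32x4_t NEON register holds four float32 lanes.
LANES = 4


def cosine_distance_lanes(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance computed with per-lane accumulators, as a SIMD loop would."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")

    dot = [0.0] * LANES  # per-lane sum of a[i] * b[i]
    aa = [0.0] * LANES   # per-lane sum of a[i] ** 2
    bb = [0.0] * LANES   # per-lane sum of b[i] ** 2

    main = len(a) - len(a) % LANES  # elements handled by the "vectorized" loop
    for i in range(0, main, LANES):
        for lane in range(LANES):
            x, y = a[i + lane], b[i + lane]
            dot[lane] += x * y
            aa[lane] += x * x
            bb[lane] += y * y

    # Horizontal reduction of the lanes (the role vaddvq_f32 plays on AArch64).
    d, na, nb = sum(dot), sum(aa), sum(bb)

    # Scalar tail for dimensions that are not a multiple of LANES.
    for i in range(main, len(a)):
        d += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]

    if na == 0.0 or nb == 0.0:
        # Assumption: zero vectors are rejected; the C code may handle them differently.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - d / (math.sqrt(na) * math.sqrt(nb))
```

Python floats are doubles, so a float32 NEON result will not match this bit for bit. Comparisons against the C implementation should use a tolerance.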
### 1.2 Benchmark infrastructure (`benchmarks-ann/`)
- [x] Seed data pipeline (`seed/Makefile`, `seed/build_base_db.py`)
- [x] Ground truth generator (`ground_truth.py`)
- [x] Results schema (`schema.sql`)
- [x] Benchmark runner with `INDEX_REGISTRY` extension point (`bench.py`)
  - Baseline configs (float, int8-rescore, bit-rescore) implemented
  - Index branches register their types via `INDEX_REGISTRY` dict
- [x] Makefile with baseline targets
- [x] README
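
The ground-truth step amounts to an exact k-nearest-neighbor search, and the approximate indexes are then scored against its output. The minimal sketch below assumes L2 distance and in-memory Python lists. `ground_truth.py` may use a different metric, storage format, or tie-breaking rule. The names `brute_force_knn` and `recall_at_k` are illustrative and are not taken from the benchmark code.

```python
import heapq
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(
    base: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[int]:
    """Exact top-k: indices of the k base vectors closest to `query`.

    Ties on distance are broken by the smaller index so results are deterministic.
    """
    return heapq.nsmallest(
        k, range(len(base)), key=lambda i: (l2_distance(base[i], query), i)
    )


def recall_at_k(approx: Sequence[int], exact: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbors that the approximate result recovered."""
    return len(set(approx[:k]) & set(exact[:k])) / k
```

Brute force costs O(N·d) per query. That cost is why reusing stored ground truth, rather than recomputing it on every run, is worthwhile once the base set grows.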
### 1.3 Rebase feature branches onto `ann`
- [x] Rebase `diskann-yolo2` onto `ann` (1 commit: DiskANN implementation)
- [x] Rebase `ivf-yolo3` onto `ann` (1 commit: IVF implementation)
- [x] Rebase `annoy-yolo2` onto `ann` (2 commits: Annoy implementation + schema fix)
- [x] Verify each branch has only its index-specific commits remaining
- [ ] Force-push all 4 branches to origin
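
The "only its index-specific commits" check can be partly automated. The idea is to count how many commits each branch has on top of `ann`, using `git rev-list --count ann..<branch>`. The expected counts below are copied from the list above, and the helper names are hypothetical. A matching count does not prove that the commits are the right ones, so the log still deserves a read before force-pushing.

```python
import subprocess
from typing import Dict, List

# Commit counts per branch, as listed in section 1.3.
EXPECTED_COMMITS: Dict[str, int] = {
    "diskann-yolo2": 1,
    "ivf-yolo3": 1,
    "annoy-yolo2": 2,
}


def commits_ahead(base: str, branch: str) -> int:
    """Number of commits reachable from `branch` but not from `base`."""
    result = subprocess.run(
        ["git", "rev-list", "--count", f"{base}..{branch}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def mismatched_branches(actual: Dict[str, int], expected: Dict[str, int]) -> List[str]:
    """Branches whose commit count differs from the expected count (or is missing)."""
    return sorted(b for b, n in expected.items() if actual.get(b) != n)


# Usage, from inside the repository:
#   actual = {b: commits_ahead("ann", b) for b in EXPECTED_COMMITS}
#   print(mismatched_branches(actual, EXPECTED_COMMITS))  # [] when every branch is clean
```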
---

## 2. Per-branch: register index type in benchmarks

Each index branch should add the following to `benchmarks-ann/` when it is rebased onto `ann`:

### 2.1 Register in `bench.py`

Add an `INDEX_REGISTRY` entry. Each entry provides:
- `defaults` — default param values
- `create_table_sql(params)` — CREATE VIRTUAL TABLE statement with the INDEXED BY clause
- `insert_sql(params)` — custom insert SQL, or None for the default insert
- `post_insert_hook(conn, params)` — training/building step, returns the elapsed time
- `run_query(conn, params, query, k)` — custom query, or None for the default MATCH query
- `describe(params)` — one-line description for report output
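
A hypothetical entry shaped after the field list above might look like the following. Several names are placeholders:

- the key `"example"`
- the `index` and `dim` parameters
- the table name `vec_items`
- the exact `INDEXED BY` spelling in the DDL
- the `resolve_params` helper

Each index branch defines its own syntax, and the dispatch code in `bench.py` is not reproduced here. The sketch also assumes that the hook reports elapsed time in seconds.

```python
import sqlite3
import time
from typing import Any, Dict

INDEX_REGISTRY: Dict[str, Dict[str, Any]] = {}


def _example_create_table_sql(params: Dict[str, Any]) -> str:
    # Placeholder DDL: the real INDEXED BY clause depends on the index branch.
    return (
        "CREATE VIRTUAL TABLE vec_items USING vec0("
        f"embedding float[{params['dim']}] INDEXED BY {params['index']}())"
    )


def _example_post_insert_hook(conn: sqlite3.Connection, params: Dict[str, Any]) -> float:
    # Time the build/training step and return elapsed seconds (assumed unit).
    start = time.perf_counter()
    # An IVF or DiskANN entry would train or build its index here;
    # this placeholder only commits the inserted rows.
    conn.commit()
    return time.perf_counter() - start


def _example_describe(params: Dict[str, Any]) -> str:
    return f"{params['index']} index, dim={params['dim']}"


INDEX_REGISTRY["example"] = {
    "defaults": {"index": "example", "dim": 768},
    "create_table_sql": _example_create_table_sql,
    "insert_sql": None,  # None: use the runner's default insert
    "post_insert_hook": _example_post_insert_hook,
    "run_query": None,  # None: use the runner's default MATCH query
    "describe": _example_describe,
}


def resolve_params(name: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: layer a config's key=value overrides over the defaults."""
    return {**INDEX_REGISTRY[name]["defaults"], **overrides}
```

Keeping every field a plain callable, or `None`, lets the runner stay index-agnostic. An entry only overrides the steps that differ from the baseline flat scan.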
### 2.2 Add configs to `Makefile`

Append index-specific config variables and targets. Example pattern:

```makefile
DISKANN_CONFIGS = \
  "diskann-R48-binary:type=diskann,R=48,L=128,quantizer=binary" \
  ...

ALL_CONFIGS += $(DISKANN_CONFIGS)

# Recipe lines must be indented with a tab, not spaces.
bench-diskann: seed
	$(BENCH) --subset-size 10000 -k 10 -o runs/diskann $(BASELINES) $(DISKANN_CONFIGS)
...
```
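
Each quoted config string packs a run name and the index parameters into one token of the form `name:key=value,...`. The sketch below shows one way such a string could be split into a name and a parameter dict. Converting numeric values to `int`/`float` and leaving everything else as a string is an assumption; `bench.py` may parse these strings differently.

```python
from typing import Dict, Tuple, Union

Value = Union[int, float, str]


def _coerce(raw: str) -> Value:
    """Try int, then float; otherwise keep the raw string (assumed convention)."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_config(spec: str) -> Tuple[str, Dict[str, Value]]:
    """Split 'name:key=value,key=value' into (name, {key: value})."""
    name, sep, body = spec.partition(":")
    if not sep or not name:
        raise ValueError(f"expected 'name:key=value,...', got {spec!r}")
    params: Dict[str, Value] = {}
    for pair in body.split(","):
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, eq, value = pair.partition("=")
        if not eq or not key:
            raise ValueError(f"bad key=value pair {pair!r} in {spec!r}")
        params[key.strip()] = _coerce(value.strip())
    return name, params
```

Using the example target above, `parse_config("diskann-R48-binary:type=diskann,R=48,L=128,quantizer=binary")` returns the name `"diskann-R48-binary"` with `R` and `L` as integers and `type` and `quantizer` as strings.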
### 2.3 Migrate existing benchmark results/docs

- Move useful results docs (RESULTS.md, etc.) into `benchmarks-ann/results/`
- Delete redundant per-branch benchmark directories once consolidated infra is proven

---

## 3. Future improvements

- [ ] Reporting script (`report.py`) — query results.db, produce markdown comparison tables
- [ ] Profiling targets in Makefile (lift from ivf-yolo3's Instruments/perf wrappers)
- [ ] Pre-computed ground truth integration (use GT DB files instead of on-the-fly brute-force)
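
For the planned `report.py`, the core step is turning a query over `results.db` into a markdown table. The minimal sketch below assumes a hypothetical `results` table with `config`, `recall`, and `qps` columns. The real table and column names come from `schema.sql`, which is not shown here.

```python
import sqlite3
from typing import Iterable, Sequence


def _cell(value: object) -> str:
    """Format one table cell: floats to 4 decimals, pipes escaped."""
    if value is None:
        return ""
    if isinstance(value, float):
        return f"{value:.4f}"
    return str(value).replace("|", "\\|")  # keep a literal pipe from splitting the row


def markdown_table(headers: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(_cell(v) for v in row) + " |")
    return "\n".join(lines)


def report(conn: sqlite3.Connection) -> str:
    # Hypothetical table and columns; adjust to the actual schema.sql.
    cur = conn.execute("SELECT config, recall, qps FROM results ORDER BY recall DESC")
    headers = [col[0] for col in cur.description]
    return markdown_table(headers, cur.fetchall())


# Usage: print(report(sqlite3.connect("results.db")))
```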