mirror of
https://github.com/asg017/sqlite-vec.git
synced 2026-04-25 08:46:49 +02:00
74 lines
2.8 KiB
Markdown
74 lines
2.8 KiB
Markdown
|
|
# TODO: `ann` base branch + consolidated benchmarks
|
||
|
|
|
||
|
|
## 1. Create `ann` branch with shared code
|
||
|
|
|
||
|
|
### 1.1 Branch setup
|
||
|
|
- [x] `git checkout -B ann origin/main`
|
||
|
|
- [x] Cherry-pick `624f998` (vec0_distance_full shared distance dispatch)
|
||
|
|
- [x] Cherry-pick stdint.h fix for test header
|
||
|
|
- [ ] Pull NEON cosine optimization from ivf-yolo3 into shared code
|
||
|
|
- Currently only in ivf branch but is general-purpose (benefits all distance calcs)
|
||
|
|
- Lives in `distance_cosine_float()` — ~57 lines of ARM NEON vectorized cosine
|
||
|
|
|
||
|
|
### 1.2 Benchmark infrastructure (`benchmarks-ann/`)
|
||
|
|
- [x] Seed data pipeline (`seed/Makefile`, `seed/build_base_db.py`)
|
||
|
|
- [x] Ground truth generator (`ground_truth.py`)
|
||
|
|
- [x] Results schema (`schema.sql`)
|
||
|
|
- [x] Benchmark runner with `INDEX_REGISTRY` extension point (`bench.py`)
|
||
|
|
- Baseline configs (float, int8-rescore, bit-rescore) implemented
|
||
|
|
- Index branches register their types via `INDEX_REGISTRY` dict
|
||
|
|
- [x] Makefile with baseline targets
|
||
|
|
- [x] README
|
||
|
|
|
||
|
|
### 1.3 Rebase feature branches onto `ann`
|
||
|
|
- [x] Rebase `diskann-yolo2` onto `ann` (1 commit: DiskANN implementation)
|
||
|
|
- [x] Rebase `ivf-yolo3` onto `ann` (1 commit: IVF implementation)
|
||
|
|
- [x] Rebase `annoy-yolo2` onto `ann` (2 commits: Annoy implementation + schema fix)
|
||
|
|
- [x] Verify each branch has only its index-specific commits remaining
|
||
|
|
- [ ] Force-push all 4 branches to origin
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## 2. Per-branch: register index type in benchmarks
|
||
|
|
|
||
|
|
Each index branch should add to `benchmarks-ann/` when rebased onto `ann`:
|
||
|
|
|
||
|
|
### 2.1 Register in `bench.py`
|
||
|
|
|
||
|
|
Add an `INDEX_REGISTRY` entry. Each entry provides:
|
||
|
|
- `defaults` — default param values
|
||
|
|
- `create_table_sql(params)` — CREATE VIRTUAL TABLE with INDEXED BY clause
|
||
|
|
- `insert_sql(params)` — custom insert SQL, or None for default
|
||
|
|
- `post_insert_hook(conn, params)` — training/building step, returns time
|
||
|
|
- `run_query(conn, params, query, k)` — custom query, or None for default MATCH
|
||
|
|
- `describe(params)` — one-line description for report output
|
||
|
|
|
||
|
|
### 2.2 Add configs to `Makefile`
|
||
|
|
|
||
|
|
Append index-specific config variables and targets. Example pattern:
|
||
|
|
|
||
|
|
```makefile
|
||
|
|
DISKANN_CONFIGS = \
|
||
|
|
"diskann-R48-binary:type=diskann,R=48,L=128,quantizer=binary" \
|
||
|
|
...
|
||
|
|
|
||
|
|
ALL_CONFIGS += $(DISKANN_CONFIGS)
|
||
|
|
|
||
|
|
bench-diskann: seed
|
||
|
|
$(BENCH) --subset-size 10000 -k 10 -o runs/diskann $(BASELINES) $(DISKANN_CONFIGS)
|
||
|
|
...
|
||
|
|
```
|
||
|
|
|
||
|
|
### 2.3 Migrate existing benchmark results/docs
|
||
|
|
|
||
|
|
- Move useful results docs (RESULTS.md, etc.) into `benchmarks-ann/results/`
|
||
|
|
- Delete redundant per-branch benchmark directories once consolidated infra is proven
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## 3. Future improvements
|
||
|
|
|
||
|
|
- [ ] Reporting script (`report.py`) — query results.db, produce markdown comparison tables
|
||
|
|
- [ ] Profiling targets in Makefile (lift from ivf-yolo3's Instruments/perf wrappers)
|
||
|
|
- [ ] Pre-computed ground truth integration (use GT DB files instead of on-the-fly brute-force)
|