diff --git a/TODO.md b/TODO.md
deleted file mode 100644
index 4c3cc19..0000000
--- a/TODO.md
+++ /dev/null
@@ -1,73 +0,0 @@
-# TODO: `ann` base branch + consolidated benchmarks
-
-## 1. Create `ann` branch with shared code
-
-### 1.1 Branch setup
-- [x] `git checkout -B ann origin/main`
-- [x] Cherry-pick `624f998` (vec0_distance_full shared distance dispatch)
-- [x] Cherry-pick stdint.h fix for test header
-- [ ] Pull NEON cosine optimization from ivf-yolo3 into shared code
-  - Currently only in ivf branch but is general-purpose (benefits all distance calcs)
-  - Lives in `distance_cosine_float()` — ~57 lines of ARM NEON vectorized cosine
-
-### 1.2 Benchmark infrastructure (`benchmarks-ann/`)
-- [x] Seed data pipeline (`seed/Makefile`, `seed/build_base_db.py`)
-- [x] Ground truth generator (`ground_truth.py`)
-- [x] Results schema (`schema.sql`)
-- [x] Benchmark runner with `INDEX_REGISTRY` extension point (`bench.py`)
-  - Baseline configs (float, int8-rescore, bit-rescore) implemented
-  - Index branches register their types via `INDEX_REGISTRY` dict
-- [x] Makefile with baseline targets
-- [x] README
-
-### 1.3 Rebase feature branches onto `ann`
-- [x] Rebase `diskann-yolo2` onto `ann` (1 commit: DiskANN implementation)
-- [x] Rebase `ivf-yolo3` onto `ann` (1 commit: IVF implementation)
-- [x] Rebase `annoy-yolo2` onto `ann` (2 commits: Annoy implementation + schema fix)
-- [x] Verify each branch has only its index-specific commits remaining
-- [ ] Force-push all 4 branches to origin
-
----
-
-## 2. Per-branch: register index type in benchmarks
-
-Each index branch should add to `benchmarks-ann/` when rebased onto `ann`:
-
-### 2.1 Register in `bench.py`
-
-Add an `INDEX_REGISTRY` entry. Each entry provides:
-- `defaults` — default param values
-- `create_table_sql(params)` — CREATE VIRTUAL TABLE with INDEXED BY clause
-- `insert_sql(params)` — custom insert SQL, or None for default
-- `post_insert_hook(conn, params)` — training/building step, returns time
-- `run_query(conn, params, query, k)` — custom query, or None for default MATCH
-- `describe(params)` — one-line description for report output
-
-### 2.2 Add configs to `Makefile`
-
-Append index-specific config variables and targets. Example pattern:
-
-```makefile
-DISKANN_CONFIGS = \
-	"diskann-R48-binary:type=diskann,R=48,L=128,quantizer=binary" \
-	...
-
-ALL_CONFIGS += $(DISKANN_CONFIGS)
-
-bench-diskann: seed
-	$(BENCH) --subset-size 10000 -k 10 -o runs/diskann $(BASELINES) $(DISKANN_CONFIGS)
-	...
-```
-
-### 2.3 Migrate existing benchmark results/docs
-
-- Move useful results docs (RESULTS.md, etc.) into `benchmarks-ann/results/`
-- Delete redundant per-branch benchmark directories once consolidated infra is proven
-
----
-
-## 3. Future improvements
-
-- [ ] Reporting script (`report.py`) — query results.db, produce markdown comparison tables
-- [ ] Profiling targets in Makefile (lift from ivf-yolo3's Instruments/perf wrappers)
-- [ ] Pre-computed ground truth integration (use GT DB files instead of on-the-fly brute-force)
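The `INDEX_REGISTRY` contract listed in section 2.1 of the deleted TODO can be sketched in Python as shown here. This is a minimal illustration under stated assumptions and is not the actual `bench.py` code. The following are all hypothetical: the `toy` index type, its `R`/`L`/`dims` defaults, the `parse_config` and `resolve` helpers, and the `INDEXED BY` DDL string. The sketch also makes three assumptions. Config strings are taken to follow the `name:key=value,...` shape used in the Makefile example. Purely numeric values are parsed as ints. A hook returning `None` is read as "use the runner's default path".

```python
"""Hypothetical INDEX_REGISTRY sketch; not the real bench.py implementation."""
import sqlite3
import time
from typing import Any, Dict, Optional, Tuple


def parse_config(spec: str) -> Tuple[str, Dict[str, Any]]:
    """Split 'name:key=val,key=val' into (name, params).

    Assumption: purely numeric values become ints; everything else stays a string.
    """
    name, _, rest = spec.partition(":")
    params: Dict[str, Any] = {}
    for pair in filter(None, rest.split(",")):
        key, _, value = pair.partition("=")
        params[key] = int(value) if value.isdigit() else value
    return name, params


def _toy_create_table_sql(params: Dict[str, Any]) -> str:
    # Hypothetical DDL: the real INDEXED BY grammar is defined by each index branch.
    return (
        "CREATE VIRTUAL TABLE vec_items USING vec0("
        f"embedding float[{params['dims']}] "
        f"INDEXED BY {params['type']}(R={params['R']}, L={params['L']}))"
    )


def _toy_insert_sql(params: Dict[str, Any]) -> Optional[str]:
    return None  # None: fall back to the runner's default INSERT


def _toy_post_insert_hook(conn: sqlite3.Connection, params: Dict[str, Any]) -> float:
    # A real index would train or build here; this sketch only times a no-op commit.
    start = time.perf_counter()
    conn.commit()
    return time.perf_counter() - start


def _toy_run_query(
    conn: sqlite3.Connection, params: Dict[str, Any], query: Any, k: int
) -> Optional[list]:
    return None  # None: fall back to the runner's default MATCH query


def _toy_describe(params: Dict[str, Any]) -> str:
    return f"{params['type']} R={params['R']} L={params['L']}"


INDEX_REGISTRY: Dict[str, Dict[str, Any]] = {
    "toy": {  # hypothetical index type
        "defaults": {"R": 32, "L": 64, "dims": 768},  # hypothetical defaults
        "create_table_sql": _toy_create_table_sql,
        "insert_sql": _toy_insert_sql,
        "post_insert_hook": _toy_post_insert_hook,
        "run_query": _toy_run_query,
        "describe": _toy_describe,
    },
}


def resolve(spec: str) -> Tuple[str, Dict[str, Any], Dict[str, Any]]:
    """Find the registry entry for a config string; explicit params override defaults."""
    name, params = parse_config(spec)
    entry = INDEX_REGISTRY[params["type"]]
    return name, entry, {**entry["defaults"], **params}


if __name__ == "__main__":
    name, entry, params = resolve("toy-R16:type=toy,R=16")
    print(name, "->", entry["describe"](params))  # prints: toy-R16 -> toy R=16 L=64
```

In this sketch every hook lives in a plain dict keyed by `type`. A feature branch therefore only appends one entry, and the runner looks up hooks by the `type=` value in each config string.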