diff --git a/IVF_PLAN.md b/IVF_PLAN.md
deleted file mode 100644
index 91bb85a..0000000
--- a/IVF_PLAN.md
+++ /dev/null
@@ -1,264 +0,0 @@
-# IVF Index for sqlite-vec
-
-## Overview
-
-IVF (Inverted File Index) is an approximate nearest neighbor index for
-sqlite-vec's `vec0` virtual table. It partitions vectors into clusters via
-k-means, then at query time only scans the nearest clusters instead of all
-vectors. Combined with scalar or binary quantization, this gives 5-20x query
-speedups over brute-force with tunable recall.
-
-## SQL API
-
-### Table Creation
-
-```sql
-CREATE VIRTUAL TABLE vec_items USING vec0(
-  id INTEGER PRIMARY KEY,
-  embedding float[768] distance_metric=cosine
-    INDEXED BY ivf(nlist=128, nprobe=16)
-);
-
--- With quantization (4x smaller cells, rescore for recall)
-CREATE VIRTUAL TABLE vec_items USING vec0(
-  id INTEGER PRIMARY KEY,
-  embedding float[768] distance_metric=cosine
-    INDEXED BY ivf(nlist=128, nprobe=16, quantizer=int8, oversample=4)
-);
-```
-
-### Parameters
-
-| Parameter | Values | Default | Description |
-|-----------|--------|---------|-------------|
-| `nlist` | 1-65536, or 0 | 128 | Number of k-means clusters. Rule of thumb: `sqrt(N)` |
-| `nprobe` | 1-nlist | 10 | Clusters to search at query time. More = better recall, slower |
-| `quantizer` | `none`, `int8`, `binary` | `none` | How vectors are stored in cells |
-| `oversample` | >= 1 | 1 | Re-rank `oversample * k` candidates with full-precision distance |
-
-### Inserting Vectors
-
-```sql
--- Works immediately, even before training
-INSERT INTO vec_items(id, embedding) VALUES (1, :vector);
-```
-
-Before centroids exist, vectors go to an "unassigned" partition, and queries
-scan by brute force. After training, new inserts go to the nearest centroid.
-
-### Training (Computing Centroids)
-
-```sql
--- Run built-in k-means on all vectors
-INSERT INTO vec_items(id) VALUES ('compute-centroids');
-```
-
-This loads all vectors into memory, runs k-means++ with Lloyd's algorithm,
-creates quantized centroids, and redistributes all vectors into cluster cells.
-It's a blocking operation — run it once after bulk insert.
-
-### Manual Centroid Import
-
-```sql
--- Import externally-computed centroids
-INSERT INTO vec_items(id, embedding) VALUES ('set-centroid:0', :centroid_0);
-INSERT INTO vec_items(id, embedding) VALUES ('set-centroid:1', :centroid_1);
-
--- Assign vectors to imported centroids
-INSERT INTO vec_items(id) VALUES ('assign-vectors');
-```
-
-### Runtime Parameter Tuning
-
-```sql
--- Change nprobe without rebuilding the index
-INSERT INTO vec_items(id) VALUES ('nprobe=32');
-```
-
-### KNN Queries
-
-```sql
--- Same syntax as standard vec0
-SELECT id, distance
-FROM vec_items
-WHERE embedding MATCH :query AND k = 10;
-```
-
-### Other Commands
-
-```sql
--- Remove centroids, move all vectors back to unassigned
-INSERT INTO vec_items(id) VALUES ('clear-centroids');
-```
-
-## How It Works
-
-### Architecture
-
-```
-User vector (float32)
-  → quantize to int8/binary (if quantizer != none)
-  → find nearest centroid (quantized distance)
-  → store quantized vector in cell blob
-  → store full vector in KV table (if quantizer != none)
-  → query:
-      1. quantize query vector
-      2. find top nprobe centroids by quantized distance
-      3. scan cell blobs: quantized distance (fast, small I/O)
-      4. if oversample > 1: re-score top oversample*k with full vectors
-      5. return top k
-```
-
-### Shadow Tables
-
-For a table `vec_items` with vector column index 0:
-
-| Table | Schema | Purpose |
-|-------|--------|---------|
-| `vec_items_ivf_centroids00` | `centroid_id PK, centroid BLOB` | K-means centroids (quantized) |
-| `vec_items_ivf_cells00` | `centroid_id, n_vectors, validity BLOB, rowids BLOB, vectors BLOB` | Packed vector cells, 64 vectors max per row. Multiple rows per centroid. Index on centroid_id. |
-| `vec_items_ivf_rowid_map00` | `rowid PK, cell_id, slot` | Maps vector rowid → cell location for O(1) delete |
-| `vec_items_ivf_vectors00` | `rowid PK, vector BLOB` | Full-precision vectors (only when quantizer != none) |
-
-### Cell Storage
-
-Cells use packed blob storage identical to vec0's chunk layout:
-- **validity**: bitmap (1 bit per slot) marking live vectors
-- **rowids**: packed i64 array
-- **vectors**: packed array of quantized vectors
-
-Cells are capped at 64 vectors (~200KB at 768-dim float32, ~48KB for int8,
-~6KB for binary). When a cell fills, a new row is created for the same
-centroid. This avoids SQLite overflow page traversal, which caused a 110x
-insert slowdown with unbounded cells.
-
-### Quantization
-
-**int8**: Each float32 dimension clamped to [-1,1] and scaled to int8
-[-127,127]. 4x storage reduction. Distance computed via int8 L2.
-
-**binary**: Sign-bit quantization — each bit is 1 if the float is positive.
-32x storage reduction. Distance computed via hamming distance.
-
-**Oversample re-ranking**: When `oversample > 1`, the quantized scan collects
-`oversample * k` candidates, then looks up each candidate's full-precision
-vector from the KV table and re-computes exact distance. This recovers nearly
-all recall lost from quantization. At oversample=4 with int8, recall stays
-within 0.002 of non-quantized IVF at the same nprobe in the benchmarks below.
-
-### K-Means
-
-Uses Lloyd's algorithm with k-means++ initialization:
-1. K-means++ picks initial centroids weighted by distance
-2. Lloyd's iterations: assign vectors to nearest centroid, recompute centroids as cluster means
-3. Empty cluster handling: reassign to farthest point
-4. K-means runs in float32; centroids are quantized before storage
-
-Training data: recommend 16× nlist vectors. At nlist=1000, that's 16k
-vectors — k-means takes ~140s on 768-dim data.
-
-## Performance
-
-### 100k vectors (COHERE 768-dim cosine)
-
-```
-                          name  qry(ms)  recall
-───────────────────────────────────────────────
-          ivf(q=int8,os=4),p=8    5.3ms  0.934  ← 6x faster than flat
-         ivf(q=int8,os=4),p=16    5.4ms  0.968
-               ivf(q=none),p=8    5.3ms  0.934
-      ivf(q=binary,os=10),p=16    1.3ms  0.832  ← 26x faster than flat
-         ivf(q=int8,os=4),p=32    7.4ms  0.990
-              ivf(q=none),p=32   15.5ms  0.992
-                    int8(os=4)   18.7ms  0.996
-                     bit(os=8)   18.7ms  0.884
-                          flat   33.7ms  1.000
-```
-
-### 1M vectors (COHERE 768-dim cosine)
-
-```
-                            name  insert  train    MB  qry(ms)  recall
-──────────────────────────────────────────────────────────────────────
-            ivf(q=int8,os=4),p=8   163s   142s  4725   16.3ms  0.892
-        ivf(q=binary,os=10),p=16   118s   144s  4073   17.7ms  0.830
-           ivf(q=int8,os=4),p=16   163s   142s  4725   24.3ms  0.950
-           ivf(q=int8,os=4),p=32   163s   142s  4725   41.6ms  0.980
-                 ivf(q=none),p=8   497s   144s  3101   52.1ms  0.890
-                 ivf(q=none),p=16  497s   144s  3101   56.6ms  0.950
-                       bit(os=8)    18s      -  3048   83.5ms  0.918
-                 ivf(q=none),p=32  497s   144s  3101  103.9ms  0.980
-                      int8(os=4)    19s      -  3689  169.1ms  0.994
-                            flat    20s      -  2955  338.0ms  1.000
-```
-
-**Best config at 1M: `ivf(quantizer=int8, oversample=4, nprobe=16)`** —
-24ms query, 0.95 recall, 14x faster than flat, 7x faster than int8 baseline.
-
-### Scaling Characteristics
-
-| Metric | 100k | 1M | Scaling |
-|--------|------|-----|---------|
-| Flat query | 34ms | 338ms | 10x (linear) |
-| IVF int8 p=16 | 5.4ms | 24.3ms | 4.5x (sublinear) |
-| IVF insert rate | ~10k/s | ~6k/s | Slight degradation |
-| Training (nlist=1000) | 13s | 142s | ~11x |
-
-## Implementation
-
-### File Structure
-
-```
-sqlite-vec-ivf-kmeans.c    K-means++ algorithm (pure C, no SQLite deps)
-sqlite-vec-ivf.c           All IVF logic: parser, shadow tables, insert,
-                           delete, query, centroid commands, quantization
-sqlite-vec.c               ~50 lines of additions: struct fields, #includes,
-                           dispatch hooks in parse/create/insert/delete/filter
-```
-
-Both IVF files are `#include`d into `sqlite-vec.c`. No Makefile changes needed.
-
-### Key Design Decisions
-
-1. **Fixed-size cells (64 vectors)** instead of one blob per centroid. Avoids
-   SQLite overflow page traversal which caused 110x insert slowdown.
-
-2. **Multiple cell rows per centroid** with an index on centroid_id. When a
-   cell fills, a new row is created. Query scans all rows for probed centroids
-   via `WHERE centroid_id IN (...)`.
-
-3. **Always store full vectors** when quantizer != none (in `_ivf_vectors` KV
-   table). Enables oversample re-ranking and point queries returning original
-   precision.
-
-4. **K-means in float32, quantize after**. Simpler than quantized k-means,
-   and assignment accuracy doesn't suffer much since nprobe compensates.
-
-5. **NEON SIMD for cosine distance**. Added `cosine_float_neon()` with 4-wide
-   FMA for dot product + magnitudes. Benefits all vec0 queries, not just IVF.
-
-6. **Runtime nprobe tuning**. `INSERT INTO t(id) VALUES ('nprobe=N')` changes
-   the probe count without rebuilding — enables fast parameter sweeps.
-
-### Optimization History
-
-| Optimization | Impact |
-|-------------|--------|
-| Fixed-size cells (64 max) | 110x insert speedup |
-| Skip chunk writes for IVF | 2x DB size reduction |
-| NEON cosine distance | 2x query speedup + 13% recall improvement (correct metric) |
-| Cached prepared statements | Eliminated per-insert prepare/finalize |
-| Batched cell reads (IN clause) | Fewer SQLite queries per KNN |
-| int8 quantization | 2.5x query speedup at same recall |
-| Binary quantization | 32x less cell I/O |
-| Oversample re-ranking | Recovers quantization recall loss |
-
-## Remaining Work
-
-See `ivf-benchmarks/TODO.md` for the full list. Key items:
-
-- **Cache centroids in memory** — each insert re-reads all centroids from SQLite
-- **Runtime oversample** — same pattern as nprobe runtime command
-- **SIMD k-means** — training uses scalar distance, could be 4x faster
-- **Top-k heap** — replace qsort with min-heap for large nprobe
-- **IVF-PQ** — product quantization for better compression/recall tradeoff