sqlite-vec/benchmarks-ann/bench-delete
Alex Garcia 6e2c4c6bab Add FTS5-style command column and runtime oversample for rescore
Replace the old INSERT INTO t(rowid) VALUES('command') hack with a
proper hidden command column named after the table (FTS5 pattern):

  INSERT INTO t(t) VALUES ('oversample=16')

The command column is the first hidden column (before distance and k),
reserving that position for future use as a table-valued function argument.

Schema: CREATE TABLE x(rowid, <cols>, "<table>" hidden, distance hidden, k hidden)

For backwards compat, pre-v0.1.10 tables (detected via _info shadow
table version) skip the command column to avoid name conflicts with
user columns that may share the table's name. Verified with legacy
fixture DB generated by sqlite-vec v0.1.6.

Changes:
- Add hidden command column to sqlite3_declare_vtab for new tables
- Version-gate via _info shadow table for existing tables
- Validate at CREATE time that no column name matches the table name
- Add rescore_handle_command() with oversample=N support
- rescore_knn() prefers runtime oversample_search over CREATE default
- Remove old rowid-based command dispatch
- Migrate all DiskANN/IVF/fuzz tests and benchmarks to new syntax
- Add legacy DB fixture (v0.1.6) and 9 backwards-compat tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:39:18 -07:00
.gitignore Add delete recall benchmark suite 2026-03-31 17:13:40 -07:00
bench_delete.py Add FTS5-style command column and runtime oversample for rescore 2026-03-31 22:39:18 -07:00
Makefile Add delete recall benchmark suite 2026-03-31 17:13:40 -07:00
README.md Add delete recall benchmark suite 2026-03-31 17:13:40 -07:00
test_smoke.py Add delete recall benchmark suite 2026-03-31 17:13:40 -07:00

bench-delete: Recall degradation after random deletion

Measures how KNN recall changes after deleting a random percentage of rows from different index types (flat, rescore, DiskANN).
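Recall@k here is the fraction of the true top-k neighbors (recomputed over surviving rows) that the index actually returns. A minimal sketch of the metric; the function name is illustrative, not taken from bench_delete.py:

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors found by the approximate index."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# Example: the index misses one of the 10 true neighbors.
exact = list(range(10))          # ground-truth rowids over survivors
approx = list(range(9)) + [42]   # 9/10 correct
print(recall_at_k(approx, exact))  # 0.9
```

Because set intersection ignores ordering, a result list that returns the right neighbors in a different order still scores 1.0.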

Quick start

# Ensure dataset exists
make -C ../datasets/cohere1m

# Ensure extension is built
make -C ../.. loadable

# Quick smoke test
make smoke

# Full benchmark at 10k vectors
make bench-10k

Usage

python bench_delete.py --subset-size 10000 --delete-pct 10,25,50,75 \
  "flat:type=vec0-flat,variant=float" \
  "diskann-R72:type=diskann,R=72,L=128,quantizer=binary" \
  "rescore-bit:type=rescore,quantizer=bit,oversample=8"
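Each positional argument is a config spec of the form name:key=val,key=val. A hedged sketch of how such a spec could be parsed (the format is inferred from the usage line above; the actual parser in bench_delete.py may differ):

```python
def parse_config(spec):
    """Split a 'name:key=val,key=val' config spec into a name and an option dict."""
    name, _, opts = spec.partition(":")
    return name, dict(kv.split("=", 1) for kv in opts.split(",") if kv)

print(parse_config("rescore-bit:type=rescore,quantizer=bit,oversample=8"))
# ('rescore-bit', {'type': 'rescore', 'quantizer': 'bit', 'oversample': '8'})
```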

What it measures

For each config and delete percentage:

Metric           Description
recall           KNN recall@k after deletion (ground truth recomputed over surviving rows)
delta            Recall change vs. the 0% baseline
query latency    Mean/median query time after deletion
db_size_mb       DB file size before VACUUM
vacuum_size_mb   DB file size after VACUUM (space reclaimed)
delete_time_s    Wall time for the DELETE operations
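The two size metrics can be captured by measuring the file before and after a VACUUM. A minimal sketch (function name illustrative):

```python
import os
import sqlite3

def vacuum_savings(db_path):
    """Return (size_before_mb, size_after_mb) around a VACUUM of the database."""
    before = os.path.getsize(db_path) / 1e6   # db_size_mb
    con = sqlite3.connect(db_path)
    con.execute("VACUUM")                     # must run outside any transaction
    con.close()
    after = os.path.getsize(db_path) / 1e6    # vacuum_size_mb
    return before, after
```

VACUUM rewrites the whole file, so the difference between the two numbers is the space the deleted rows were still occupying.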

How it works

  1. Build index with N vectors (one copy per config)
  2. Measure recall at k=10 (pre-delete baseline)
  3. For each delete %:
    • Copy the master DB
    • Delete a random selection of rows (deterministic seed)
    • Measure recall (ground truth recomputed over surviving rows only)
    • VACUUM and measure size savings
  4. Print comparison table
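Steps 3a and 3b above can be sketched as follows. The table name "vectors" and the function name are hypothetical stand-ins for whatever bench_delete.py actually uses; the key point is the seeded RNG, which makes the deleted subset identical across configs:

```python
import random
import shutil
import sqlite3

def delete_random_rows(master_db, work_db, delete_pct, seed=0):
    """Copy the master DB, then delete a deterministic random subset of rows."""
    shutil.copyfile(master_db, work_db)        # leave the master untouched
    con = sqlite3.connect(work_db)
    rowids = [r[0] for r in con.execute("SELECT rowid FROM vectors")]
    rng = random.Random(seed)                  # deterministic across configs
    victims = rng.sample(rowids, int(len(rowids) * delete_pct / 100))
    con.executemany("DELETE FROM vectors WHERE rowid = ?",
                    [(v,) for v in victims])
    con.commit()
    con.close()
    return victims
```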

Expected behavior

  • Flat index: Recall should be 1.0 at all delete percentages (brute-force is always exact)
  • Rescore: Recall should stay close to baseline (quantized scan + rescore is robust)
  • DiskANN: Recall may degrade at high delete % due to graph fragmentation (dangling edges, broken connectivity)

Results DB

Results are stored in runs/<dataset>/<subset_size>/delete_results.db:

SELECT config_name, delete_pct, recall, vacuum_size_mb
FROM delete_runs
ORDER BY config_name, delete_pct;
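The same query can be run from Python for plotting or post-processing. A small sketch, assuming the delete_runs table has the columns shown above:

```python
import sqlite3

def load_results(db_path):
    """Read (config_name, delete_pct, recall, vacuum_size_mb) rows, sorted."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row   # access columns by name
    rows = con.execute(
        """SELECT config_name, delete_pct, recall, vacuum_size_mb
           FROM delete_runs
           ORDER BY config_name, delete_pct"""
    ).fetchall()
    con.close()
    return rows
```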