mirror of
https://github.com/asg017/sqlite-vec.git
synced 2026-04-25 08:46:49 +02:00
KNN Benchmarks for sqlite-vec
Benchmarking infrastructure for vec0 KNN configurations. Includes brute-force
baselines (float, int8, bit); index-specific branches add their own types
via the INDEX_REGISTRY in bench.py.
Prerequisites
- Built `dist/vec0` extension (run `make` from repo root)
- Python 3.10+
- `uv` (for seed data prep): `pip install uv`
Quick start
# 1. Download dataset and build seed DB (~3 GB download, ~5 min)
make seed
# 2. Run a quick smoke test (5k vectors, ~1 min)
make bench-smoke
# 3. Run full benchmark at 10k
make bench-10k
Usage
Direct invocation
python bench.py --subset-size 10000 \
  "brute-float:type=baseline,variant=float" \
  "brute-int8:type=baseline,variant=int8" \
  "brute-bit:type=baseline,variant=bit"
Config format
name:type=<index_type>,key=val,key=val
| Index type | Keys | Branch |
|---|---|---|
| `baseline` | `variant` (float/int8/bit), `oversample` | this branch |
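To make the config format concrete, here is a minimal sketch of how a string such as `brute-int8:type=baseline,variant=int8` could be split into a name and a key/value mapping. The function name `parse_config` and its error handling are assumptions made for illustration; the parser that bench.py actually uses may differ.

```python
def parse_config(spec: str) -> tuple[str, dict[str, str]]:
    """Split 'name:type=<index_type>,key=val,...' into (name, params).

    Hypothetical helper for illustration; not taken from bench.py.
    """
    name, sep, rest = spec.partition(":")
    if not sep or not name:
        raise ValueError(f"expected 'name:key=val,...', got {spec!r}")

    params: dict[str, str] = {}
    for pair in rest.split(","):
        key, eq, value = pair.partition("=")
        if not eq or not key:
            raise ValueError(f"malformed key=val pair {pair!r} in {spec!r}")
        params[key.strip()] = value.strip()

    # Every config must say which index type it targets.
    if "type" not in params:
        raise ValueError(f"missing 'type' key in {spec!r}")
    return name, params


name, params = parse_config("brute-int8:type=baseline,variant=int8")
print(name, params)  # brute-int8 {'type': 'baseline', 'variant': 'int8'}
```

All values are kept as strings here; converting `oversample` to an integer is left to whichever code consumes the mapping.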
Index branches register additional types in INDEX_REGISTRY. See the
docstring in bench.py for the extension API.
Make targets
| Target | Description |
|---|---|
| `make seed` | Download COHERE 1M dataset |
| `make ground-truth` | Pre-compute ground truth for 10k/50k/100k |
| `make bench-smoke` | Quick 5k baseline test |
| `make bench-10k` | All configs at 10k vectors |
| `make bench-50k` | All configs at 50k vectors |
| `make bench-100k` | All configs at 100k vectors |
| `make bench-all` | 10k + 50k + 100k |
Adding an index type
In your index branch, add an entry to INDEX_REGISTRY in bench.py and
append your configs to ALL_CONFIGS in the Makefile. See the existing
baseline entry and the comments in both files for the pattern.
Results
Results are stored in runs/<dir>/results.db using the schema in schema.sql.
sqlite3 runs/10k/results.db "
SELECT config_name, recall, mean_ms, qps
FROM bench_results
ORDER BY recall DESC
"
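The same query can be run from Python with the standard-library `sqlite3` module, which is convenient when comparing several runs in a script. This is a sketch that assumes only the table and column names shown in the query above; the helper name `top_configs` is hypothetical, and schema.sql remains the reference for the full schema.

```python
import sqlite3


def top_configs(con: sqlite3.Connection, limit: int = 10) -> list[tuple]:
    """Return (config_name, recall, mean_ms, qps) rows, best recall first.

    Hypothetical helper; relies only on the columns used in the README query.
    """
    return con.execute(
        "SELECT config_name, recall, mean_ms, qps "
        "FROM bench_results "
        "ORDER BY recall DESC "
        "LIMIT ?",
        (limit,),
    ).fetchall()
```

For example, after `make bench-10k` one might call `top_configs(sqlite3.connect("runs/10k/results.db"))` and print each row.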
Dataset
Zilliz COHERE Medium 1M: 768 dimensions, cosine distance, 1M train vectors + 10k query vectors with precomputed neighbors.
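The `recall` column in the results is measured against ground-truth neighbors. As a rough illustration, recall@k is commonly defined as the fraction of the true top-k neighbors that also appear in the top-k returned by the index. The sketch below assumes that definition and uses a hypothetical function name; the exact formula in bench.py may differ.

```python
def recall_at_k(retrieved: list[int], ground_truth: list[int], k: int) -> float:
    """Fraction of the true top-k ids found among the retrieved top-k ids.

    Assumes the common recall@k definition; not copied from bench.py.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    # Set intersection so a duplicated id in `retrieved` is not counted twice.
    hits = len(set(retrieved[:k]) & truth)
    return hits / len(truth)


print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4))  # 0.75
```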