mirror of
https://github.com/asg017/sqlite-vec.git
synced 2026-04-25 08:46:49 +02:00
sqlite-vec In-memory benchmark comparisons
This repo contains benchmarks that compare KNN queries of sqlite-vec to other in-process vector search tools using brute-force linear scans only. These include:
- Faiss IndexFlatL2
- usearch with exact=True
- libsql vector search with vector_distance_cos
- numpy, using this approach
- duckdb with list_cosine_similarity
- sentence_transformers.util.semantic_search
- hnswlib BFIndex
Again, ONLY BRUTE FORCE LINEAR SCANS ARE TESTED. This benchmark does not test approximate nearest neighbors (ANN) implementations. It is deliberately narrow: it only tests KNN searches using brute force.
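To make "brute-force linear scan" concrete, here is a minimal pure-Python sketch of the idea: compute the distance from the query to every stored vector, then keep the k smallest. The function names and toy data are illustrative only and are not taken from bench.py; the real tools do the same work with optimized native code.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(base, query, k):
    """Return the k nearest (distance, index) pairs by scanning every vector."""
    # Linear scan: one distance computation per stored vector, no index structure.
    scored = ((l2_distance(vec, query), idx) for idx, vec in enumerate(base))
    return heapq.nsmallest(k, scored)


# Toy data (hypothetical), just to exercise the function.
base = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
neighbors = brute_force_knn(base, query=[0.9, 0.1], k=2)
print(neighbors)
```

Every query costs one pass over the whole dataset, which is exactly the workload this benchmark measures.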
A few other caveats:
- Only brute-force linear scans, no ANN
- Only CPU is used. The only tool that does offer GPU is Faiss anyway.
- Only in-memory datasets are used. Many of these tools do support serializing and reading from disk (including sqlite-vec) and possibly mmap'ing, but this only tests in-memory datasets. Mostly because of numpy.
- Queries are made one after the other, not batched. Some tools offer APIs to query multiple inputs at the same time, but this benchmark runs queries sequentially. This was done to emulate "server request"-style queries, where multiple users would send queries at different times, making batching more difficult. To note, sqlite-vec does not support batched queries yet.
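The sequential query pattern described in the last caveat can be sketched roughly as follows. This is an assumption about the general shape of such a loop, not the code in bench.py; `run_sequential` and `toy_search` are hypothetical names, and `toy_search` is only a stand-in for a real tool's search call.

```python
import time


def run_sequential(search, queries, k):
    """Issue one search per query, in order, and time the whole loop."""
    results = []
    started = time.perf_counter()
    for q in queries:
        # Each query is its own call, as if it arrived in a separate request.
        results.append(search(q, k))
    elapsed = time.perf_counter() - started
    return results, elapsed


def toy_search(query, k):
    # Hypothetical stand-in: ranks the integers 0..9 by distance to `query`.
    return sorted(range(10), key=lambda v: abs(v - query))[:k]


results, elapsed = run_sequential(toy_search, [2, 7], k=3)
print(results, f"{elapsed:.6f}s")
```

A batched API would instead receive all queries in a single call, which lets a tool amortize per-call overhead; that advantage is intentionally left out of this benchmark.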
These tests are run in Python. Vectors are provided as an in-memory numpy array, and each test converts that numpy array into whatever makes sense for the given tool. For example, sqlite-vec tests will read those vectors into a SQLite table. DuckDB will read them into an Arrow array then create a DuckDB table from that.
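As a rough illustration of the "read those vectors into a SQLite table" step, the sketch below packs each vector into raw float32 bytes and inserts it as a BLOB using only the standard library. The table and column names are hypothetical, plain Python lists stand in for the numpy array, and an ordinary table is used so the snippet runs without the sqlite-vec extension loaded; the schema the benchmark actually uses is not shown here.

```python
import sqlite3
import struct


def to_float32_blob(vector):
    """Pack a sequence of floats into raw float32 bytes (native byte order)."""
    return struct.pack(f"{len(vector)}f", *vector)


# Stand-in for the in-memory numpy array of base vectors (hypothetical data).
vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

db = sqlite3.connect(":memory:")
# Hypothetical schema: one row per vector, stored as a float32 BLOB.
db.execute("CREATE TABLE base_vectors(id INTEGER PRIMARY KEY, vector BLOB)")
db.executemany(
    "INSERT INTO base_vectors(id, vector) VALUES (?, ?)",
    ((i, to_float32_blob(v)) for i, v in enumerate(vectors)),
)

count, blob_bytes = db.execute(
    "SELECT count(*), max(length(vector)) FROM base_vectors"
).fetchone()
print(count, blob_bytes)
```

Each BLOB holds 4 bytes per dimension, so the conversion cost is a straight copy of the vector data into SQLite.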