Commit graph

72 commits

Author SHA1 Message Date
Alex Garcia
7de925be70 Fix int16 overflow in l2_sqr_int8_neon SIMD distance
vmulq_s16(diff, diff) produced int16 results, but diff can be up to
255 for int8 vectors (-128 vs 127), and 255^2 = 65025 overflows
int16 (max 32767). This caused NaN/wrong results for int8 vectors
with large differences.

Fix: use vmull_s16 (widening multiply) to produce int32 results
directly, avoiding the intermediate int16 overflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:55:37 -07:00
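
The arithmetic in this commit message can be checked with a scalar sketch. This is not the NEON code from the commit: the function names below are hypothetical, and the 16-bit lane is modeled by keeping the low 16 bits of the product and reading them as a signed two's-complement value.

```c
#include <stddef.h>
#include <stdint.h>

/* Models the buggy path: the squared difference is kept to 16 bits and then
 * read as a signed value, which is how a two's-complement int16 lane wraps. */
int32_t sq_diff_wrapped_int16(int8_t a, int8_t b) {
    int32_t diff = (int32_t)a - (int32_t)b;   /* range -255..255 */
    uint16_t low = (uint16_t)(diff * diff);   /* keep only the low 16 bits */
    return low >= 32768u ? (int32_t)low - 65536 : (int32_t)low;
}

/* Models the fixed path: the product is computed at 32-bit width. */
int32_t sq_diff_widened(int8_t a, int8_t b) {
    int32_t diff = (int32_t)a - (int32_t)b;
    return diff * diff;                       /* at most 65025, fits in int32 */
}

/* Scalar squared L2 distance over int8 vectors using the widened product.
 * A 64-bit accumulator is used so that long vectors cannot overflow the sum. */
int64_t l2_sqr_int8_scalar(const int8_t *a, const int8_t *b, size_t n) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += sq_diff_widened(a[i], b[i]);
    }
    return sum;
}
```

For the extreme pair (-128, 127), the wrapped model yields -511 while the widened product yields 65025. Small differences are unaffected, which is why the bug only shows up for int8 vectors with large per-element differences.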
Alex Garcia
4bee88384b Reject IVF binary quantizer when dimensions not divisible by 8
The binary quantizer uses D/8 for buffer sizes and memset, which
truncates for non-multiple-of-8 dimensions, causing OOB writes.
Rather than using ceiling division, enforce the constraint at
table creation time with a clear parse error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:51:27 -07:00
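
A minimal sketch of why D/8 is unsafe for dimensions that are not a multiple of 8, and of the kind of check the commit describes. The helper names are hypothetical and are not taken from sqlite-vec.

```c
#include <stddef.h>

/* Integer division truncates: 12 dimensions would get 1 byte, not 2. */
size_t bit_bytes_truncating(size_t dims) {
    return dims / 8;
}

/* Ceiling division, the alternative the commit chose not to use. */
size_t bit_bytes_ceiling(size_t dims) {
    return (dims + 7) / 8;
}

/* Creation-time constraint: accept only positive multiples of 8. */
int binary_quantizer_dims_ok(size_t dims) {
    return dims > 0 && dims % 8 == 0;
}
```

With 12 dimensions the truncating size is 1 byte while 12 bits need 2, so any write of the last 4 bits lands past the end of the buffer. Rejecting such dimensions up front keeps D/8 exact everywhere else.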
Alex Garcia
82f4eb08bf Add NULL checks after sqlite3_column_blob in rescore and DiskANN
sqlite3_column_blob() returns NULL for zero-length blobs or on OOM.
Several call sites in rescore KNN and DiskANN node/vector read passed
the result directly to memcpy without checking, risking NULL deref on
corrupt or empty databases. IVF already had proper NULL checks.

Adds corruption regression tests that truncate shadow table blobs and
verify the query errors cleanly instead of crashing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:31:49 -07:00
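
The guard described above might look roughly like the following. It is a sketch that compiles without SQLite: in real code the `blob` and `blob_bytes` arguments would come from sqlite3_column_blob() and sqlite3_column_bytes(), and the helper name and return convention are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Copies a blob into dst only if it is present and has the expected size.
 * Returns 0 on success and -1 if the blob is missing or has the wrong size,
 * so the caller can report a corruption error instead of crashing. */
int copy_blob_checked(void *dst, size_t expected_bytes,
                      const void *blob, int blob_bytes) {
    if (blob == NULL) {
        return -1;                      /* zero-length blob or OOM */
    }
    if (blob_bytes < 0 || (size_t)blob_bytes != expected_bytes) {
        return -1;                      /* truncated or oversized blob */
    }
    memcpy(dst, blob, expected_bytes);
    return 0;
}
```

Checking the length as well as the pointer also covers the truncated-blob case that the regression tests exercise.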
Alex Garcia
9df59b4c03 Temporarily block vector UPDATE for DiskANN and IVF indexes
vec0Update_UpdateVectorColumn writes to flat chunk blobs but does not
update DiskANN graph or IVF index structures, silently corrupting KNN
results. Now returns a clear error for these index types. Rescore
UPDATE is unaffected — it already has a full implementation that
updates both quantized chunks and float vectors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:08:08 -07:00
Alex Garcia
575371d751 Add DiskANN index for vec0 virtual table
Add DiskANN graph-based index: builds a Vamana graph with configurable R
(max degree) and L (search list size, separate for insert/query), supports
int8 quantization with rescore, lazy reverse-edge replacement, pre-quantized
query optimization, and insert buffer reuse. Includes shadow table management,
delete support, KNN integration, compile flag (SQLITE_VEC_ENABLE_DISKANN),
release-demo workflow, fuzz targets, and tests. Fixes rescore int8
quantization bug.
2026-03-31 01:21:54 -07:00
Alex Garcia
bb3ef78f75 Hide IVF behind SQLITE_VEC_EXPERIMENTAL_IVF_ENABLE, default off
Rename SQLITE_VEC_ENABLE_IVF to SQLITE_VEC_EXPERIMENTAL_IVF_ENABLE and
flip the default from 1 to 0. IVF tests are automatically skipped when
the build flag is not set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 01:18:47 -07:00
Alex Garcia
3358e127f6 Add IVF index for vec0 virtual table
Add inverted file (IVF) index type: partitions vectors into clusters via
k-means, quantizes to int8, and scans only the nearest nprobe partitions at
query time. Includes shadow table management, insert/delete, KNN integration,
compile flag (SQLITE_VEC_ENABLE_IVF), fuzz targets, and tests. Removes
superseded ivf-benchmarks/ directory.
2026-03-31 01:18:47 -07:00
Alex Garcia
45d1375602 Merge branch 'main' into pr/rescore 2026-03-31 01:12:50 -07:00
Alex Garcia
0de765f457
Add ANN search support for vec0 virtual table (#273)
Add approximate nearest neighbor infrastructure to vec0: shared distance
dispatch (vec0_distance_full), flat index type with parser, NEON-optimized
cosine/Hamming for float32/int8, amalgamation script, and benchmark suite
(benchmarks-ann/) with ground-truth generation and profiling tools. Remove
unused vec_npy_each/vec_static_blobs code, fix missing stdint.h include.
2026-03-31 01:03:32 -07:00
Alex Garcia
ee9bd2ba4d
Fix SQLITE_DONE leak in ClearMetadata that broke DELETE on long text metadata (#274) (#275)
vec0Update_Delete_ClearMetadata's long-text branch runs a DELETE via
sqlite3_step, which returns SQLITE_DONE (101) on success. The code
checked for failure but never normalized the success case to SQLITE_OK.
The function's epilogue returned SQLITE_DONE as-is, which the caller
(vec0Update_Delete) treated as an error, aborting the DELETE scan and
silently leaving rows behind.

- Normalize rc to SQLITE_OK after successful sqlite3_step in ClearMetadata
- Move sqlite3_finalize before the rc check (cleanup on all paths)
- Add test_delete_by_metadata_with_long_text regression test
- Update test_deletes snapshot (row 3 now correctly deleted)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 16:39:59 -07:00
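
The return-code issue can be illustrated in isolation. The constants below mirror the values of SQLITE_OK, SQLITE_ERROR, and SQLITE_DONE so the sketch compiles without sqlite3.h; the function names are hypothetical and do not reproduce the actual ClearMetadata code.

```c
/* Same numeric values as SQLITE_OK, SQLITE_ERROR and SQLITE_DONE. */
enum { RC_OK = 0, RC_ERROR = 1, RC_DONE = 101 };

/* A caller that, like the one described above, treats any non-OK code
 * as a failure. */
int caller_sees_failure(int rc) {
    return rc != RC_OK;
}

/* Normalizes the result of stepping a statement that returns no rows:
 * "done" means success, anything else is passed through as an error. */
int normalize_step_rc(int step_rc) {
    return step_rc == RC_DONE ? RC_OK : step_rc;
}
```

Without normalization, a successful step (101) looks like a failure to the caller. After normalization only genuine error codes propagate.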
Alex Garcia
ba0db0b6d6 Add rescore index for ANN queries
Add rescore index type: stores full-precision float vectors in a rowid-keyed
shadow table, quantizes to int8 for fast initial scan, then rescores top
candidates with original vectors. Includes config parser, shadow table
management, insert/delete support, KNN integration, compile flag
(SQLITE_VEC_ENABLE_RESCORE), fuzz targets, and tests.
2026-03-29 19:45:54 -07:00
Alex Garcia
bf2455f2ba Add ANN search support for vec0 virtual table
Add approximate nearest neighbor infrastructure to vec0: shared distance
dispatch (vec0_distance_full), flat index type with parser, NEON-optimized
cosine/Hamming for float32/int8, amalgamation script, and benchmark suite
(benchmarks-ann/) with ground-truth generation and profiling tools. Remove
unused vec_npy_each/vec_static_blobs code, fix missing stdint.h include.
2026-03-29 19:44:44 -07:00
Alex Garcia
380b0bb032 Redact version from info table snapshot to avoid test failures on version bumps
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 00:09:28 -07:00
Alex Garcia
cb147c8834
Complete vec0 DELETE: zero data, reclaim empty chunks, fix metadata rc bug (#268)
When a row is deleted from a vec0 virtual table, the rowid slot in
_chunks.rowids and vector data in _vector_chunksNN.vectors are now
zeroed out (previously left as stale data, tracked in #54). When all
rows in a chunk are deleted (validity bitmap all zeros), the chunk and
its associated vector/metadata shadow table rows are reclaimed.

- Add vec0Update_Delete_ClearRowid to zero the rowid blob slot
- Add vec0Update_Delete_ClearVectors to zero all vector blob slots
- Add vec0Update_Delete_DeleteChunkIfEmpty to detect and delete
  fully-empty chunks from _chunks, _vector_chunksNN, _metadatachunksNN
- Fix missing rc check in ClearMetadata loop (bug: errors were silently
  ignored)
- Fix vec0_new_chunk to explicitly set _rowid_ on shadow table INSERTs
  (SHADOW_TABLE_ROWID_QUIRK: "rowid PRIMARY KEY" without INTEGER type
  is not a true rowid alias, causing blob_open failures after chunk
  delete+recreate cycles)
- Add 13 new tests covering rowid/vector zeroing, chunk reclamation,
  metadata/auxiliary/partition/text-PK/int8/bit variants, and
  page_count shrinkage verification
- Add vec0-delete-completeness fuzz target
- Update snapshots for new delete zeroing behavior

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 00:02:36 -07:00
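
A sketch of the "validity bitmap all zeros" test that decides whether a chunk can be reclaimed. The bit layout (slot i stored in byte i/8, bit i%8) and the helper names are assumptions for illustration, not the layout used by vec0.

```c
#include <stddef.h>
#include <stdint.h>

/* Marks one slot as deleted by clearing its bit in the validity bitmap. */
void validity_clear_slot(uint8_t *validity, size_t slot) {
    validity[slot / 8] &= (uint8_t)~(1u << (slot % 8));
}

/* Returns 1 if no slot in the chunk is still valid, 0 otherwise. */
int chunk_is_empty(const uint8_t *validity, size_t nbytes) {
    for (size_t i = 0; i < nbytes; i++) {
        if (validity[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

A delete path under these assumptions would clear the slot, zero the rowid and vector data for that slot, and then delete the chunk's shadow-table rows once `chunk_is_empty` returns 1.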
Alex Garcia
b669801d31 Add UBSAN findings TODO and improve vec-mismatch fuzzer
Document three classes of undefined behavior found by UBSAN:
function pointer type mismatches, misaligned f32 reads, and
float-to-integer overflow in vec_quantize_int8.

Improve vec-mismatch fuzzer to cover all error-path cleanup patterns:
type mismatches, dimension mismatches, single-arg functions, and
both text and blob inputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 21:19:33 -08:00
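
Two of the undefined-behavior classes listed above have well-known portable workarounds, sketched below. These helpers are hypothetical and are not the fixes applied in sqlite-vec; mapping NaN to 0 is an arbitrary policy chosen for the example.

```c
#include <stdint.h>
#include <string.h>

/* Reads a float from a possibly misaligned address. Copying through memcpy
 * avoids dereferencing a misaligned float pointer. */
float read_f32_unaligned(const void *p) {
    float value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Converts a float to int8 without out-of-range conversion: the value is
 * clamped first, so the final cast is always applied to an in-range value. */
int8_t float_to_int8_clamped(float x) {
    if (x != x) {
        return 0;                 /* NaN: arbitrary choice for this sketch */
    }
    if (x >= 127.0f) {
        return 127;
    }
    if (x <= -128.0f) {
        return -128;
    }
    return (int8_t)x;             /* in range; truncates toward zero */
}
```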
Alex Garcia
0dd0765cc6 Add vec-mismatch fuzz target that catches aCleanup(a) bug in ensure_vector_match
Targeted fuzzer for two-argument vector functions (vec_distance_*,
vec_add, vec_sub) that binds a valid JSON vector as arg1 and fuzz
data as arg2. This exercises the error path in ensure_vector_match()
where the first vector parses successfully (with sqlite3_free cleanup)
but the second fails, triggering the buggy aCleanup(a) call on line
1031 of sqlite-vec.c (should be aCleanup(*a)).

The fuzzer catches this immediately — ASAN reports "bad-free" when
sqlite3_free is called on a stack address.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 20:45:50 -08:00
Alex Garcia
a61d45183b Add comprehensive fuzz testing infrastructure with 6 new targets
- Fix numpy.c: tautology bug (|| → &&), infinite loop, and missing
  sqlite3_vec_numpy_init call
- Replace tests/fuzz/Makefile: auto-detect clang, add UBSAN, macOS
  ld_classic workaround, generic build rules for all 10 targets
- Add 6 new fuzz targets: shadow-corrupt (corrupted shadow tables),
  vec0-operations (INSERT/DELETE/query sequences), scalar-functions
  (all 18 SQL scalar functions), vec0-create-full (CREATE + lifecycle),
  metadata-columns (metadata/auxiliary columns), vec-each (vec_each TVF)
- Add seed corpora for shadow-corrupt, vec0-operations, exec, and json
- Add fuzz-build/fuzz-quick/fuzz-long targets to root Makefile

All 10 targets verified building and running on macOS ARM (Apple Silicon).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 20:33:05 -08:00
Alex Garcia
9a6bf96b92 Extract shared Python test utilities into tests/helpers.py
Deduplicates exec(), vec0_shadow_table_contents(), _f32(), _i64(), and
_int8() helpers that were copied across six test files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 20:05:21 -08:00
Alex Garcia
206fbc2bdd Add Python regression tests for existing insert/delete paths
Baseline tests protecting non-DiskANN chunk-based insert and delete
behavior: vector round-trips, auto rowids, text primary keys, delete
validity, reinsert after delete, dimension/type validation, and v_info
snapshot.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:12:01 -08:00
Alex Garcia
0bca960e9d Add LPAREN, RPAREN, COMMA token types to the scanner
Extends the vec0 tokenizer to recognize '(', ')', and ',' as
single-character tokens, preparing for DiskANN index option parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:07:57 -08:00
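
Recognizing single-character tokens can be sketched as a simple classification step. The enum and function names here are hypothetical and do not reproduce the vec0 scanner's actual types.

```c
/* Token kinds for the three single-character tokens named in the commit,
 * plus a catch-all for everything else. */
enum sketch_token_type {
    SKETCH_TOKEN_LPAREN,
    SKETCH_TOKEN_RPAREN,
    SKETCH_TOKEN_COMMA,
    SKETCH_TOKEN_OTHER
};

/* Maps one input character to its token kind. */
enum sketch_token_type classify_single_char(char c) {
    switch (c) {
    case '(':
        return SKETCH_TOKEN_LPAREN;
    case ')':
        return SKETCH_TOKEN_RPAREN;
    case ',':
        return SKETCH_TOKEN_COMMA;
    default:
        return SKETCH_TOKEN_OTHER;
    }
}
```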
Alex Garcia
aab9b37de2 Add unit tests for distance functions (L2, cosine, hamming)
Add test-only wrappers behind SQLITE_VEC_TEST compile flag to expose
static distance functions for unit testing. Includes tests for
distance_l2_sqr_float (4 cases), distance_cosine_float (3 cases),
and distance_hamming (4 cases). Print active SIMD flags at test start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:04:30 -08:00
Alex Garcia
79d5818015 Add regression tests for vec0_parse_vector_column edge cases
Add SQLITE_EMPTY tests for non-vector column inputs (primary key,
partition key, unknown types) and SQLITE_ERROR tests for zero
dimensions and empty brackets. Tighten existing error assertions
from rc != SQLITE_OK to exact expected return codes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:50:31 -08:00
Alex Garcia
0659d8848d Update test-unit.c and unittest.rs functions to enforce pre-existing behavior
- Expand sqlite-vec-internal.h with scanner/tokenizer types, vector column
  definition types, and parser function declarations
- Fix min_idx declaration to match actual C signature (add candidates,
  bTaken, k_used params)
- Compile test-unit with -DSQLITE_CORE and link vendor/sqlite3.c so
  sqlite3 API functions (sqlite3_strnicmp, sqlite3_mprintf, etc.) resolve
- Add unit tests for vec0_token_next, Vec0Scanner, and
  vec0_parse_vector_column
- Fix Rust build.rs to define SQLITE_CORE and compile vendor/sqlite3.c
- Fix Rust min_idx FFI signature and wrapper to match actual C function

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:46:11 -08:00
Alex Garcia
681f9028af --managed-python, fix flakey tests 2026-02-13 07:08:48 -08:00
Alex Garcia
8bc206fa0b test: drop vec_xyz if it exists 2026-02-13 07:05:39 -08:00
Alex Garcia
d9020b7ded test: shadow snapshot update 2026-02-13 06:59:18 -08:00
Alex Garcia
f8db17fded fix test table ordering 2026-02-13 06:55:49 -08:00
Alex Garcia
611ca631ea
Support constraints on distance column in KNN queries, for pagination and range queries (#166)
* Initial pass, needs tests+docs

* old: test-knn-constraints

* cleanup
2026-02-13 06:38:26 -08:00
Alex Garcia
44e4438ed5 fix segfault on invalid vec_each() input, fixes #163 2025-01-10 14:44:37 -08:00
Alex Garcia
352f953fc0
Metadata filtering (#124)
* initial pass at PARTITION KEY support.

* Initial pass, allow auxiliary columns on vec0 virtual tables

* update TODO

* Initial pass at metadata filtering

* unit tests

* gha this PR branch

* fixup tests

* doc internal

* fix tests, KNN/rowids in

* define SQLITE_INDEX_CONSTRAINT_OFFSET

* whoops

* update tests, syrupy, use uv

* un ignore pyproject.toml

* dot

* tests/

* type error?

* win: .exe, update error name

* try fix macos python, paren around expr?

* win bash?

* dbg :(

* explicit error

* op

* dbg win

* win ./tests/.venv/Scripts/python.exe

* block UPDATEs on partition key values for now

* test this branch

* accidentally removed "partition key type mistmatch" block during merge

* typo ugh

* bruv

* start aux snapshots

* drop aux shadow table on destroy

* enforce column types

* block WHERE constraints on auxiliary columns in KNN queries

* support delete

* support UPDATE on auxiliary columns

* test this PR

* dont inline that

* test-metadata.py

* memzero text buffer

* stress test

* more snapshot tests

* rm double/int32, just float/int64

* finish type checking

* long text support

* DELETE support

* UPDATE support

* fix snapshot names

* drop not-used in eqp

* small fixes

* boolean comparison handling

* ensure error is raised when long string constraint

* new version string for beta builds

* typo whoops

* ann-filtering-benchmark directory

* test-case

* updates

* fix aux column error when using non-default rowid values, needs test

* refactor some text knn filtering

* rowids blob read only on text metadata filters

* refactor

* add failing test causes for non eq text knn

* text knn NE

* test cases diff

* GT

* text knn GT/GE fixes

* text knn LT/LE

* clean

* vtab_in handling

* unblock aux failures for now

* guard sqlite3_vtab_in

* else in guard?

* fixes and tests

* add broken shadow table test

* rename _metadata_chunksNN shadow table to _metadatachunksNN, for proper shadowName detection

* _metadata_text_NN shadow tables to _metadatatextNN

* SQLITE_VEC_VERSION_MAJOR SQLITE_VEC_VERSION_MINOR and SQLITE_VEC_VERSION_PATCH in sqlite-vec.h

* _info shadow table

* forgot to update aux snapshot?

* fix aux tests
2024-11-20 00:59:34 -08:00
Alex Garcia
9bfeaa7842
Auxiliary column support (#123)
* initial pass at PARTITION KEY support.

* Initial pass, allow auxiliary columns on vec0 virtual tables

* update TODO

* unit tests

* gha this PR branch

* fixup tests

* doc internal

* fix tests, KNN/rowids in

* define SQLITE_INDEX_CONSTRAINT_OFFSET

* whoops

* update tests, syrupy, use uv

* un ignore pyproject.toml

* dot

* tests/

* type error?

* win: .exe, update error name

* try fix macos python, paren around expr?

* win bash?

* dbg :(

* explicit error

* op

* dbg win

* win ./tests/.venv/Scripts/python.exe

* block UPDATEs on partition key values for now

* test this branch

* accidentally removed "partition key type mistmatch" block during merge

* typo ugh

* bruv

* start aux snapshots

* drop aux shadow table on destroy

* enforce column types

* block WHERE constraints on auxiliary columns in KNN queries

* support delete

* support UPDATE on auxiliary columns
2024-11-20 00:30:23 -08:00
Alex Garcia
6658624172
PARTITION KEY support (#122)
* initial pass at PARTITION KEY support.

* unit tests

* gha this PR branch

* fixup tests

* doc internal

* fix tests, KNN/rowids in

* define SQLITE_INDEX_CONSTRAINT_OFFSET

* whoops

* update tests, syrupy, use uv

* un ignore pyproject.toml

* dot

* tests/

* type error?

* win: .exe, update error name

* try fix macos python, paren around expr?

* win bash?

* dbg :(

* explicit error

* op

* dbg win

* win ./tests/.venv/Scripts/python.exe

* block UPDATEs on partition key values for now
2024-11-20 00:02:04 -08:00
Alex Garcia
763aad5d6a Remove vec_npy_each from default entrypoint and move to sqlite3_vec_numpy_init entrypoint 2024-09-25 23:07:17 -07:00
Alex Garcia
7ea402931e fmt and SQLITE_VEC_OMIT_FS fixes 2024-08-10 23:33:28 -07:00
Alex Garcia
a6498d04b8 properly check SQLITE_THREADSAFE for static compilation 2024-08-09 13:23:18 -07:00
Alex Garcia
2ed95aacc5 ensure UPDATEs and DELETEs work on vec0 tables with text primary keys, refs #77 2024-08-09 12:16:56 -07:00
Alex Garcia
fbd9790542 gha: math 2024-08-05 16:55:45 -07:00
Alex Garcia
530a3c95d2 Explicitly test that SQLite version 3.31.1 is compatible with sqlite-vec when statically compiling 2024-08-05 16:46:35 -07:00
Alex Garcia
e379c205c8 limit checks 2024-08-01 02:45:51 -07:00
Alex Garcia
a0bc9404ce static updates 2024-07-31 12:56:09 -07:00
Alex Garcia
e8219064cb fmt 2024-07-31 12:55:46 -07:00
Alex Garcia
156d6c1e3b Merge branch 'main' of github.com:asg017/sqlite-vec into main 2024-07-25 11:23:00 -07:00
Alex Garcia
65656cbadc fuzz work 2024-07-25 11:16:06 -07:00
Alex Garcia
0f5bc2f254 fmt 2024-07-23 23:57:42 -07:00
Alex Garcia
21d442903e test vec0 vacuums 2024-07-23 22:36:42 -07:00
Alex Garcia
633db6e9cc add l1 distance to vec0 tables 2024-07-23 14:04:17 -07:00
Alex Garcia
79491542e5 Merge branch 'main' of github.com:asg017/sqlite-vec into main 2024-07-23 12:27:37 -07:00
Daniel Levi-Minzi
25b85afc89
l1 distance (#39)
* initial work on l1

* l1 int8 neon implementation

* tweak l1 int8 and add test

* broken overflow still

* some progress on l1

* change to i32 instead of i64

* remove comment

* ignore poetry stuff

* unrolled l1 int8 and format

* remove comments
2024-07-23 09:04:15 -07:00
Alex Garcia
7fc8248f28 ensure statements opened by vec0 are finalized before commits. 2024-07-23 08:59:34 -07:00
Alex Garcia
ff6cf96e2a vec_type(), API references 2024-07-22 21:24:44 -07:00