62 commits

Author SHA1 Message Date
Alex Garcia
ba0db0b6d6 Add rescore index for ANN queries
Add rescore index type: stores full-precision float vectors in a rowid-keyed
shadow table, quantizes to int8 for fast initial scan, then rescores top
candidates with original vectors. Includes config parser, shadow table
management, insert/delete support, KNN integration, compile flag
(SQLITE_VEC_ENABLE_RESCORE), fuzz targets, and tests.
2026-03-29 19:45:54 -07:00
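
The two-pass idea described in the commit above (quantize to int8 for a fast scan, then rescore the top candidates with the original vectors) can be sketched roughly as follows. This is an illustrative sketch only, not the sqlite-vec implementation: the function names, the `oversample` parameter, the fixed scale of 127, and the assumption that vector components lie in [-1, 1] are all hypothetical.

```python
import heapq
import math

def quantize_int8(vec, scale=127.0):
    # Hypothetical quantizer: assumes components lie in [-1, 1];
    # the clamp guards against out-of-range inputs.
    return [max(-128, min(127, round(x * scale))) for x in vec]

def l2_sqr(a, b):
    # Squared Euclidean distance between two equal-length sequences.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rescore_knn(rows, query, k, oversample=4):
    """rows maps rowid -> full-precision vector (stand-in for a shadow table)."""
    quantized = {rowid: quantize_int8(v) for rowid, v in rows.items()}
    q8 = quantize_int8(query)
    # Pass 1: coarse scan over the int8 copies, keeping k * oversample candidates.
    candidates = heapq.nsmallest(
        k * oversample, quantized, key=lambda r: l2_sqr(quantized[r], q8)
    )
    # Pass 2: rescore only those candidates with the original float vectors.
    rescored = sorted((l2_sqr(rows[r], query), r) for r in candidates)
    return [(r, math.sqrt(d)) for d, r in rescored[:k]]
```

Oversampling in the first pass gives the rescoring pass room to correct ranking errors introduced by quantization; the factor to use is a tuning choice, not something stated in the commit.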
Alex Garcia
bf2455f2ba Add ANN search support for vec0 virtual table
Add approximate nearest neighbor infrastructure to vec0: shared distance
dispatch (vec0_distance_full), flat index type with parser, NEON-optimized
cosine/Hamming for float32/int8, amalgamation script, and benchmark suite
(benchmarks-ann/) with ground-truth generation and profiling tools. Remove
unused vec_npy_each/vec_static_blobs code, fix missing stdint.h include.
2026-03-29 19:44:44 -07:00
Alex Garcia
380b0bb032 Redact version from info table snapshot to avoid test failures on version bumps
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 00:09:28 -07:00
Alex Garcia
cb147c8834
Complete vec0 DELETE: zero data, reclaim empty chunks, fix metadata rc bug (#268)
When a row is deleted from a vec0 virtual table, the rowid slot in
_chunks.rowids and vector data in _vector_chunksNN.vectors are now
zeroed out (previously left as stale data, tracked in #54). When all
rows in a chunk are deleted (validity bitmap all zeros), the chunk and
its associated vector/metadata shadow table rows are reclaimed.

- Add vec0Update_Delete_ClearRowid to zero the rowid blob slot
- Add vec0Update_Delete_ClearVectors to zero all vector blob slots
- Add vec0Update_Delete_DeleteChunkIfEmpty to detect and delete
  fully-empty chunks from _chunks, _vector_chunksNN, _metadatachunksNN
- Fix missing rc check in ClearMetadata loop (bug: errors were silently
  ignored)
- Fix vec0_new_chunk to explicitly set _rowid_ on shadow table INSERTs
  (SHADOW_TABLE_ROWID_QUIRK: "rowid PRIMARY KEY" without INTEGER type
  is not a true rowid alias, causing blob_open failures after chunk
  delete+recreate cycles)
- Add 13 new tests covering rowid/vector zeroing, chunk reclamation,
  metadata/auxiliary/partition/text-PK/int8/bit variants, and
  page_count shrinkage verification
- Add vec0-delete-completeness fuzz target
- Update snapshots for new delete zeroing behavior

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 00:02:36 -07:00
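
The delete behavior described in the commit above (clear the validity bit, zero the rowid and vector slots, reclaim the chunk once its validity bitmap is all zeros) can be modeled in memory as below. This is a simplified sketch under stated assumptions: `Chunk`, `insert_row`, and `delete_row` are hypothetical names, and the real code operates on shadow-table blobs rather than Python lists.

```python
class Chunk:
    """Hypothetical in-memory stand-in for one chunk of shadow-table rows."""

    def __init__(self, size, dims):
        self.validity = bytearray((size + 7) // 8)  # one bit per slot
        self.rowids = [0] * size
        self.vectors = [[0.0] * dims for _ in range(size)]

def insert_row(chunk, slot, rowid, vector):
    # Mark the slot valid and store its rowid and vector.
    chunk.validity[slot // 8] |= 1 << (slot % 8)
    chunk.rowids[slot] = rowid
    chunk.vectors[slot] = list(vector)

def delete_row(chunks, chunk_id, slot):
    """Delete one slot; return True if the whole chunk was reclaimed."""
    chunk = chunks[chunk_id]
    # Clear the validity bit for this slot.
    chunk.validity[slot // 8] &= ~(1 << (slot % 8)) & 0xFF
    # Zero the rowid and vector data instead of leaving stale values behind.
    chunk.rowids[slot] = 0
    chunk.vectors[slot] = [0.0] * len(chunk.vectors[slot])
    # A chunk whose validity bitmap is all zeros holds no live rows.
    if not any(chunk.validity):
        del chunks[chunk_id]
        return True
    return False
```

Deleting the last live row of a chunk removes the chunk from `chunks`, which mirrors the reclamation step in spirit only.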
Alex Garcia
b669801d31 Add UBSAN findings TODO and improve vec-mismatch fuzzer
Document three classes of undefined behavior found by UBSAN:
function pointer type mismatches, misaligned f32 reads, and
float-to-integer overflow in vec_quantize_int8.

Improve vec-mismatch fuzzer to cover all error-path cleanup patterns:
type mismatches, dimension mismatches, single-arg functions, and
both text and blob inputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 21:19:33 -08:00
Alex Garcia
0dd0765cc6 Add vec-mismatch fuzz target that catches aCleanup(a) bug in ensure_vector_match
Targeted fuzzer for two-argument vector functions (vec_distance_*,
vec_add, vec_sub) that binds a valid JSON vector as arg1 and fuzz
data as arg2. This exercises the error path in ensure_vector_match()
where the first vector parses successfully (with sqlite3_free cleanup)
but the second fails, triggering the buggy aCleanup(a) call on line
1031 of sqlite-vec.c (should be aCleanup(*a)).

The fuzzer catches this immediately — ASAN reports "bad-free" when
sqlite3_free is called on a stack address.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 20:45:50 -08:00
Alex Garcia
a61d45183b Add comprehensive fuzz testing infrastructure with 6 new targets
- Fix numpy.c: tautology bug (|| → &&), infinite loop, and missing
  sqlite3_vec_numpy_init call
- Replace tests/fuzz/Makefile: auto-detect clang, add UBSAN, macOS
  ld_classic workaround, generic build rules for all 10 targets
- Add 6 new fuzz targets: shadow-corrupt (corrupted shadow tables),
  vec0-operations (INSERT/DELETE/query sequences), scalar-functions
  (all 18 SQL scalar functions), vec0-create-full (CREATE + lifecycle),
  metadata-columns (metadata/auxiliary columns), vec-each (vec_each TVF)
- Add seed corpora for shadow-corrupt, vec0-operations, exec, and json
- Add fuzz-build/fuzz-quick/fuzz-long targets to root Makefile

All 10 targets verified building and running on macOS ARM (Apple Silicon).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 20:33:05 -08:00
Alex Garcia
9a6bf96b92 Extract shared Python test utilities into tests/helpers.py
Deduplicates exec(), vec0_shadow_table_contents(), _f32(), _i64(), and
_int8() helpers that were copied across six test files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 20:05:21 -08:00
Alex Garcia
206fbc2bdd Add Python regression tests for existing insert/delete paths
Baseline tests protecting non-DiskANN chunk-based insert and delete
behavior: vector round-trips, auto rowids, text primary keys, delete
validity, reinsert after delete, dimension/type validation, and v_info
snapshot.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:12:01 -08:00
Alex Garcia
0bca960e9d Add LPAREN, RPAREN, COMMA token types to the scanner
Extends the vec0 tokenizer to recognize '(', ')', and ',' as
single-character tokens, preparing for DiskANN index option parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:07:57 -08:00
Alex Garcia
aab9b37de2 Add unit tests for distance functions (L2, cosine, hamming)
Add test-only wrappers behind SQLITE_VEC_TEST compile flag to expose
static distance functions for unit testing. Includes tests for
distance_l2_sqr_float (4 cases), distance_cosine_float (3 cases),
and distance_hamming (4 cases). Print active SIMD flags at test start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:04:30 -08:00
Alex Garcia
79d5818015 Add regression tests for vec0_parse_vector_column edge cases
Add SQLITE_EMPTY tests for non-vector column inputs (primary key,
partition key, unknown types) and SQLITE_ERROR tests for zero
dimensions and empty brackets. Tighten existing error assertions
from rc != SQLITE_OK to exact expected return codes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:50:31 -08:00
Alex Garcia
0659d8848d Update test-unit.c and unittest.rs functions to enforce pre-existing behavior
- Expand sqlite-vec-internal.h with scanner/tokenizer types, vector column
  definition types, and parser function declarations
- Fix min_idx declaration to match actual C signature (add candidates,
  bTaken, k_used params)
- Compile test-unit with -DSQLITE_CORE and link vendor/sqlite3.c so
  sqlite3 API functions (sqlite3_strnicmp, sqlite3_mprintf, etc.) resolve
- Add unit tests for vec0_token_next, Vec0Scanner, and
  vec0_parse_vector_column
- Fix Rust build.rs to define SQLITE_CORE and compile vendor/sqlite3.c
- Fix Rust min_idx FFI signature and wrapper to match actual C function

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:46:11 -08:00
Alex Garcia
681f9028af --managed-python, fix flakey tests 2026-02-13 07:08:48 -08:00
Alex Garcia
8bc206fa0b test: drop vec_xyz if it exists 2026-02-13 07:05:39 -08:00
Alex Garcia
d9020b7ded test: shadow snapshot update 2026-02-13 06:59:18 -08:00
Alex Garcia
f8db17fded fix test table ordering 2026-02-13 06:55:49 -08:00
Alex Garcia
611ca631ea
Support constraints on distance column in KNN queries, for pagination and range queries (#166)
* Initial pass, needs tests+docs

* old: test-knn-constraints

* cleanup
2026-02-13 06:38:26 -08:00
Alex Garcia
44e4438ed5 fix segfault on invalid vec_each() input, fixes #163 2025-01-10 14:44:37 -08:00
Alex Garcia
352f953fc0
Metadata filtering (#124)
* initial pass at PARTITION KEY support.

* Initial pass, allow auxiliary columns on vec0 virtual tables

* update TODO

* Initial pass at metadata filtering

* unit tests

* gha this PR branch

* fixup tests

* doc internal

* fix tests, KNN/rowids in

* define SQLITE_INDEX_CONSTRAINT_OFFSET

* whoops

* update tests, syrupy, use uv

* un ignore pyproject.toml

* dot

* tests/

* type error?

* win: .exe, update error name

* try fix macos python, paren around expr?

* win bash?

* dbg :(

* explicit error

* op

* dbg win

* win ./tests/.venv/Scripts/python.exe

* block UPDATEs on partition key values for now

* test this branch

* accidentally removed "partition key type mistmatch" block during merge

* typo ugh

* bruv

* start aux snapshots

* drop aux shadow table on destroy

* enforce column types

* block WHERE constraints on auxiliary columns in KNN queries

* support delete

* support UPDATE on auxiliary columns

* test this PR

* dont inline that

* test-metadata.py

* memzero text buffer

* stress test

* more snapshot tests

* rm double/int32, just float/int64

* finish type checking

* long text support

* DELETE support

* UPDATE support

* fix snapshot names

* drop not-used in eqp

* small fixes

* boolean comparison handling

* ensure error is raised when long string constraint

* new version string for beta builds

* typo whoops

* ann-filtering-benchmark directory

* test-case

* updates

* fix aux column error when using non-default rowid values, needs test

* refactor some text knn filtering

* rowids blob read only on text metadata filters

* refactor

* add failing test cases for non eq text knn

* text knn NE

* test cases diff

* GT

* text knn GT/GE fixes

* text knn LT/LE

* clean

* vtab_in handling

* unblock aux failures for now

* guard sqlite3_vtab_in

* else in guard?

* fixes and tests

* add broken shadow table test

* rename _metadata_chunksNN shadow table to _metadatachunksNN, for proper shadowName detection

* _metadata_text_NN shadow tables to _metadatatextNN

* SQLITE_VEC_VERSION_MAJOR SQLITE_VEC_VERSION_MINOR and SQLITE_VEC_VERSION_PATCH in sqlite-vec.h

* _info shadow table

* forgot to update aux snapshot?

* fix aux tests
2024-11-20 00:59:34 -08:00
Alex Garcia
9bfeaa7842
Auxiliary column support (#123)
* initial pass at PARTITION KEY support.

* Initial pass, allow auxiliary columns on vec0 virtual tables

* update TODO

* unit tests

* gha this PR branch

* fixup tests

* doc internal

* fix tests, KNN/rowids in

* define SQLITE_INDEX_CONSTRAINT_OFFSET

* whoops

* update tests, syrupy, use uv

* un ignore pyproject.toml

* dot

* tests/

* type error?

* win: .exe, update error name

* try fix macos python, paren around expr?

* win bash?

* dbg :(

* explicit error

* op

* dbg win

* win ./tests/.venv/Scripts/python.exe

* block UPDATEs on partition key values for now

* test this branch

* accidentally removed "partition key type mistmatch" block during merge

* typo ugh

* bruv

* start aux snapshots

* drop aux shadow table on destroy

* enforce column types

* block WHERE constraints on auxiliary columns in KNN queries

* support delete

* support UPDATE on auxiliary columns
2024-11-20 00:30:23 -08:00
Alex Garcia
6658624172
PARTITION KEY support (#122)
* initial pass at PARTITION KEY support.

* unit tests

* gha this PR branch

* fixup tests

* doc internal

* fix tests, KNN/rowids in

* define SQLITE_INDEX_CONSTRAINT_OFFSET

* whoops

* update tests, syrupy, use uv

* un ignore pyproject.toml

* dot

* tests/

* type error?

* win: .exe, update error name

* try fix macos python, paren around expr?

* win bash?

* dbg :(

* explicit error

* op

* dbg win

* win ./tests/.venv/Scripts/python.exe

* block UPDATEs on partition key values for now
2024-11-20 00:02:04 -08:00
Alex Garcia
763aad5d6a Remove vec_npy_each from default entrypoint and move to sqlite3_vec_numpy_init entrypoint 2024-09-25 23:07:17 -07:00
Alex Garcia
7ea402931e fmt and SQLITE_VEC_OMIT_FS fixes 2024-08-10 23:33:28 -07:00
Alex Garcia
a6498d04b8 properly check SQLITE_THREADSAFE for static compilation 2024-08-09 13:23:18 -07:00
Alex Garcia
2ed95aacc5 ensure UPDATEs and DELETEs work on vec0 tables with text primary keys, refs #77 2024-08-09 12:16:56 -07:00
Alex Garcia
fbd9790542 gha: math 2024-08-05 16:55:45 -07:00
Alex Garcia
530a3c95d2 Explicitly test that SQLite version 3.31.1 is compatible with sqlite-vec when statically compiling 2024-08-05 16:46:35 -07:00
Alex Garcia
e379c205c8 limit checks 2024-08-01 02:45:51 -07:00
Alex Garcia
a0bc9404ce static updates 2024-07-31 12:56:09 -07:00
Alex Garcia
e8219064cb fmt 2024-07-31 12:55:46 -07:00
Alex Garcia
156d6c1e3b Merge branch 'main' of github.com:asg017/sqlite-vec into main 2024-07-25 11:23:00 -07:00
Alex Garcia
65656cbadc fuzz work 2024-07-25 11:16:06 -07:00
Alex Garcia
0f5bc2f254 fmt 2024-07-23 23:57:42 -07:00
Alex Garcia
21d442903e test vec0 vacuums 2024-07-23 22:36:42 -07:00
Alex Garcia
633db6e9cc add l1 distance to vec0 tables 2024-07-23 14:04:17 -07:00
Alex Garcia
79491542e5 Merge branch 'main' of github.com:asg017/sqlite-vec into main 2024-07-23 12:27:37 -07:00
Daniel Levi-Minzi
25b85afc89
l1 distance (#39)
* initial work on l1

* l1 int8 neon implementation

* tweak l1 int8 and add test

* broken overflow still

* some progress on l1

* change to i32 instead of i64

* remove comment

* ignore poetry stuff

* unrolled l1 int8 and format

* remove comments
2024-07-23 09:04:15 -07:00
Alex Garcia
7fc8248f28 ensure statements opened by vec0 are finalized before commits. 2024-07-23 08:59:34 -07:00
Alex Garcia
ff6cf96e2a vec_type(), API references 2024-07-22 21:24:44 -07:00
Alex Garcia
f4fe53e584 docs and fuzz 2024-07-16 22:28:15 -07:00
Alex Garcia
f217cbf2bd knn cleanups and tests 2024-07-05 12:07:45 -07:00
Alex Garcia
3442131613 test_vec_quantize_binary small test for windows 2024-06-28 16:27:41 -07:00
Alex Garcia
37b4c2e9dc is it q?? trying to fix windows i64 stuff 2024-06-28 16:06:04 -07:00
Alex Garcia
2eafd843d7 no inline, windows i64 fix? 2024-06-28 16:00:58 -07:00
Alex Garcia
a5525c9a5d vec0 point and knn error handling 2024-06-28 15:29:13 -07:00
Alex Garcia
2fdd760dd1 fmt 2024-06-28 10:51:59 -07:00
Alex Garcia
b923c596df a ton more error handling, vec0 insert/delete/update, npy fixes 2024-06-28 10:51:49 -07:00
Alex Garcia
9dc772e9f9 format, pragma_table_list -> sqlite_master 2024-06-25 08:54:51 -07:00
Alex Garcia
feea3bfe43 remove vec_expo, impl drop vec0 2024-06-25 08:52:48 -07:00