Commit graph

80 commits

Author SHA1 Message Date
Alex Garcia
bf2455f2ba Add ANN search support for vec0 virtual table
Add approximate nearest neighbor infrastructure to vec0: shared distance
dispatch (vec0_distance_full), flat index type with parser, NEON-optimized
cosine/Hamming for float32/int8, amalgamation script, and benchmark suite
(benchmarks-ann/) with ground-truth generation and profiling tools. Remove
unused vec_npy_each/vec_static_blobs code, fix missing stdint.h include.
2026-03-29 19:44:44 -07:00
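The "shared distance dispatch" idea can be pictured as one entry point that switches on the element type and metric. The sketch below is a minimal, hypothetical illustration. The enum, `distance_dispatch_sketch`, and its signature are assumptions, not the real `vec0_distance_full`. It shows only portable scalar kernels for two cases, with no NEON code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (element type, metric) tags; the real vec0 enums and the
   vec0_distance_full signature are not shown in this log. */
typedef enum {
  SKETCH_INT8_L2SQ,  /* squared Euclidean distance over int8 */
  SKETCH_BIT_HAMMING /* number of differing bits over packed bit vectors */
} sketch_kind;

static double l2sq_int8(const int8_t *a, const int8_t *b, size_t n) {
  int64_t sum = 0;
  for (size_t i = 0; i < n; i++) {
    int32_t d = (int32_t)a[i] - (int32_t)b[i];
    sum += (int64_t)d * d;
  }
  return (double)sum;
}

/* n is the number of packed bytes; each byte holds 8 dimensions. */
static double hamming_bits(const uint8_t *a, const uint8_t *b, size_t n) {
  uint64_t count = 0;
  for (size_t i = 0; i < n; i++) {
    uint8_t x = (uint8_t)(a[i] ^ b[i]);
    while (x) { x &= (uint8_t)(x - 1); count++; } /* clear lowest set bit */
  }
  return (double)count;
}

/* Single entry point: brute-force scans and index code share one dispatch,
   and a SIMD build could swap in an optimized kernel per case. */
static double distance_dispatch_sketch(sketch_kind kind, const void *a,
                                       const void *b, size_t n) {
  switch (kind) {
    case SKETCH_INT8_L2SQ:   return l2sq_int8(a, b, n);
    case SKETCH_BIT_HAMMING: return hamming_bits(a, b, n);
  }
  return -1.0; /* unknown kind */
}
```

With a single dispatch function, adding a NEON variant only touches one case. Callers do not need to change.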
Alex Garcia
cb147c8834
Complete vec0 DELETE: zero data, reclaim empty chunks, fix metadata rc bug (#268)
When a row is deleted from a vec0 virtual table, the rowid slot in
_chunks.rowids and vector data in _vector_chunksNN.vectors are now
zeroed out (previously left as stale data, tracked in #54). When all
rows in a chunk are deleted (validity bitmap all zeros), the chunk and
its associated vector/metadata shadow table rows are reclaimed.

- Add vec0Update_Delete_ClearRowid to zero the rowid blob slot
- Add vec0Update_Delete_ClearVectors to zero all vector blob slots
- Add vec0Update_Delete_DeleteChunkIfEmpty to detect and delete
  fully-empty chunks from _chunks, _vector_chunksNN, _metadatachunksNN
- Fix missing rc check in ClearMetadata loop (bug: errors were silently
  ignored)
- Fix vec0_new_chunk to explicitly set _rowid_ on shadow table INSERTs
  (SHADOW_TABLE_ROWID_QUIRK: "rowid PRIMARY KEY" without INTEGER type
  is not a true rowid alias, causing blob_open failures after chunk
  delete+recreate cycles)
- Add 13 new tests covering rowid/vector zeroing, chunk reclamation,
  metadata/auxiliary/partition/text-PK/int8/bit variants, and
  page_count shrinkage verification
- Add vec0-delete-completeness fuzz target
- Update snapshots for new delete zeroing behavior

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 00:02:36 -07:00
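The delete flow described above has three steps. It clears the slot's validity bit, zeroes the slot's rowid and vector bytes, and reclaims the chunk once the bitmap is all zeros. The sketch below models one chunk in memory. The struct layout, the LSB-first bit order, and every name are assumptions. In vec0 this data lives in shadow-table BLOBs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_CHUNK_SIZE = 8, SKETCH_DIMS = 4 };

/* Hypothetical in-memory model of one chunk (validity bitmap, rowids,
   vectors); bit i of the bitmap (LSB first) marks slot i as live. */
typedef struct {
  uint8_t validity[(SKETCH_CHUNK_SIZE + 7) / 8];
  int64_t rowids[SKETCH_CHUNK_SIZE];
  float vectors[SKETCH_CHUNK_SIZE * SKETCH_DIMS];
} chunk_sketch;

static int bitmap_all_zero(const uint8_t *bitmap, size_t n_bytes) {
  for (size_t i = 0; i < n_bytes; i++)
    if (bitmap[i] != 0) return 0;
  return 1;
}

/* Delete one slot: clear its validity bit, zero its rowid and vector so no
   stale data remains, and return 1 if the chunk is now empty (and could be
   reclaimed together with its vector/metadata rows). */
static int chunk_delete_slot(chunk_sketch *c, int slot) {
  assert(slot >= 0 && slot < SKETCH_CHUNK_SIZE);
  c->validity[slot / 8] &= (uint8_t)~(1u << (slot % 8));
  c->rowids[slot] = 0;
  memset(&c->vectors[slot * SKETCH_DIMS], 0, SKETCH_DIMS * sizeof(float));
  return bitmap_all_zero(c->validity, sizeof c->validity);
}
```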
Alex Garcia
d04e2aeda1 Fix remaining fuzzer issues: leaks and macOS SDK headers
sqlite-vec.c:
- vec0_free: add loops to free partition, auxiliary, and metadata
  column names (previously leaked on error paths)
- vec0_init: update pNew->numXxxColumns incrementally in the parse
  loop so vec0_free sees correct counts on early goto-error paths
  (previously the counts were only written after the loop, so vec0_free
  would loop 0 times and leak names allocated inside the loop)

fuzz.yaml:
- macOS: pass -isysroot $(xcrun --sdk macosx --show-sdk-path) so
  Xcode clang can find system headers (stdio.h, assert.h, etc.)
- Fix artifact upload paths: libFuzzer writes crash-*/leak-* to
  the cwd (repo root), not tests/fuzz/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 17:35:41 -08:00
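The vec0_init fix follows a general C cleanup pattern. Bump the owned-item count immediately after each allocation, so a shared `goto error` path frees exactly what exists. The sketch below is a hypothetical stand-in for the partition, auxiliary and metadata name arrays. `config_sketch`, `config_init`, `config_free`, and the fixed capacity are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_NAMES 16

/* Hypothetical config: n_names is how many entries of names[] are valid,
   i.e. exactly how many strings the free routine must release. */
typedef struct {
  char *names[SKETCH_MAX_NAMES];
  int n_names;
} config_sketch;

static void config_free(config_sketch *c) {
  for (int i = 0; i < c->n_names; i++) free(c->names[i]);
  c->n_names = 0;
}

/* Copy column names, failing on an empty one. Updating n_names inside the
   loop (not after it) lets the error path free partial work instead of
   looping zero times and leaking. */
static int config_init(config_sketch *c, const char **cols, int n) {
  c->n_names = 0;
  for (int i = 0; i < n; i++) {
    size_t len;
    char *copy;
    if (cols[i][0] == '\0' || c->n_names >= SKETCH_MAX_NAMES) goto error;
    len = strlen(cols[i]);
    copy = malloc(len + 1);
    if (copy == NULL) goto error;
    memcpy(copy, cols[i], len + 1);
    c->names[c->n_names++] = copy; /* count tracks every allocation */
  }
  return 0;
error:
  config_free(c); /* frees only what was actually allocated */
  return -1;
}
```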
Alex Garcia
1b53b942e0 Fix remaining fuzzer issues: leaks, UBSAN NaN, macOS LLVM version
- fuzz.yaml: switch macOS to llvm@18 (latest LLVM uses typed allocation
  C++ ABI symbols not available on macOS 14 runner's system libc++)
- sqlite-vec.c: fix NaN input in vec_quantize_int8 by using !(val <= X)
  comparisons which evaluate to true for NaN, ensuring the clamp fires
- sqlite-vec.c: free pzErrMsg in vec_eachFilter error path (was leaking
  the error string returned by vector_from_value)
- sqlite-vec.c: add sqlite3_free(pNew) to vec0_init error path; vec0_free
  frees the contents but not the struct itself, mirroring vec0Disconnect
- sqlite-vec.c: free knn_data in vec0Filter_knn cleanup when rc != SQLITE_OK;
  on error the cursor's knn_data field is never set so it would not be
  freed by the cursor teardown path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 08:36:59 -08:00
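The NaN fix works because every ordered comparison involving NaN is false. A negated test such as `!(v >= lo)` is therefore true for NaN and routes it into a clamp branch. Without that, NaN would reach a float-to-int8 cast, and an out-of-range conversion is undefined behavior in C. The quantizer below is a hypothetical sketch that maps `[lo, hi]` onto int8. The real vec_quantize_int8 formula may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical linear quantizer from [lo, hi] to [-128, 127]. Negated
   comparisons make NaN take the first branch instead of the cast. */
static int8_t quantize_int8_sketch(float v, float lo, float hi) {
  float scaled;
  if (!(v >= lo)) return -128; /* below range, or NaN */
  if (!(v <= hi)) return 127;  /* above range */
  scaled = ((v - lo) / (hi - lo)) * 255.0f - 128.0f;
  /* Clamp before casting: an out-of-range float->int8 conversion is UB. */
  if (!(scaled >= -128.0f)) scaled = -128.0f;
  if (!(scaled <= 127.0f)) scaled = 127.0f;
  return (int8_t)scaled;
}
```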
Alex Garcia
cdbc34785f Fix fuzzer-found bugs and CI build issues
- fuzz.yaml: embed rpath to Homebrew LLVM's libc++ so macOS binaries can
  find the right C++ runtime at load time (fixes dyld weak-def crash)
- fuzz.yaml: add `make sqlite-vec.h` step on all platforms before building
  fuzz targets (the header is generated from a template, not checked in)
- fuzz.yaml: drop llvm version pin on Windows so choco succeeds when a
  newer LLVM is already installed on the runner
- sqlite-vec.c: change fvec_cleanup / fvec_cleanup_noop to take void*
  instead of f32* so they are ABI-compatible with vector_cleanup; removes
  UBSAN indirect-call errors at many call sites
- sqlite-vec.c: copy BLOB data into sqlite3_malloc'd buffer in
  fvec_from_value instead of aliasing the raw blob pointer, fixing UBSAN
  misaligned-load errors when SQLite hands us an unaligned blob
- sqlite-vec.c: guard npy_token_next string scan with ptr < end check
  before the closing-quote dereference (heap-buffer-overflow)
- sqlite-vec.c: clamp vec_quantize_int8 intermediate value to [-128, 127]
  before casting to i8 (UBSAN out-of-range conversion)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 07:16:33 -08:00
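Two of the fixes above are standard C techniques. First, every cleanup function takes `void *`, so calls through a shared `void (*)(void *)` pointer match the callee's real type. Clang's UBSan `function` check reports mismatched indirect calls. Second, blob bytes are copied into a fresh allocation instead of reading floats through a possibly unaligned pointer. In the hypothetical sketch below, plain `malloc` stands in for `sqlite3_malloc`, and all names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One cleanup signature shared by every vector type. */
typedef void (*vector_cleanup_sketch)(void *);

static void cleanup_free(void *p) { free(p); }
static void cleanup_noop(void *p) { (void)p; } /* for borrowed buffers */

/* Copy possibly unaligned blob bytes into a new buffer; malloc returns
   memory suitably aligned for float, so later loads are well defined. */
static float *floats_from_blob(const void *blob, size_t n_bytes,
                               size_t *n_out, vector_cleanup_sketch *cleanup) {
  float *buf;
  if (n_bytes % sizeof(float) != 0) return NULL;
  buf = malloc(n_bytes ? n_bytes : 1);
  if (buf == NULL) return NULL;
  memcpy(buf, blob, n_bytes);
  *n_out = n_bytes / sizeof(float);
  *cleanup = cleanup_free; /* caller releases via the shared signature */
  return buf;
}
```

Copying costs one allocation per value. In exchange, the code never depends on how SQLite aligns blob memory.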
Alex Garcia
4ce1ef3c6f Fix bad-free in ensure_vector_match: aCleanup(a) → aCleanup(*a)
When the second vector argument fails to parse, the cleanup of the
first vector was called with the double-pointer 'a' instead of '*a'.
When the first vector was parsed from JSON text (cleanup = sqlite3_free),
this called sqlite3_free on a stack address, causing a crash.

Found by the vec-mismatch fuzz target.

Shout out to @renatgalimov in #257 for finding the original bug!

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 20:50:54 -08:00
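The bad free comes from an out-parameter. When a function receives `float **a`, the heap buffer is `*a`, while `a` itself usually points at the caller's stack variable. The sketch below reproduces the corrected shape of the error path. `parse_vec`, `parse_pair`, and the `atof` "parser" are hypothetical stand-ins for the real vector parsing and `ensure_vector_match`.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*cleanup_fn)(void *);
static void free_cleanup(void *p) { free(p); }

/* Hypothetical parser: on success *out owns a heap buffer and *cleanup
   releases it; empty input is treated as a parse failure. */
static int parse_vec(const char *text, float **out, cleanup_fn *cleanup) {
  if (text == NULL || text[0] == '\0') return -1;
  *out = malloc(sizeof(float));
  if (*out == NULL) return -1;
  **out = (float)atof(text);
  *cleanup = free_cleanup;
  return 0;
}

/* If the second parse fails, release the first vector. The buffer is *a;
   passing a itself would hand a stack address to free(). */
static int parse_pair(const char *t1, const char *t2,
                      float **a, cleanup_fn *a_cleanup,
                      float **b, cleanup_fn *b_cleanup) {
  if (parse_vec(t1, a, a_cleanup) != 0) return -1;
  if (parse_vec(t2, b, b_cleanup) != 0) {
    (*a_cleanup)(*a); /* correct: free the buffer, not &caller_var */
    *a = NULL;
    return -1;
  }
  return 0;
}
```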
Alex Garcia
0bca960e9d Add LPAREN, RPAREN, COMMA token types to the scanner
Extends the vec0 tokenizer to recognize '(', ')', and ',' as
single-character tokens, preparing for DiskANN index option parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:07:57 -08:00
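Single-character tokens such as `(`, `)` and `,` are usually scanned with one `switch` case each, ahead of the multi-character rules. The scanner below is a hypothetical sketch with its own token enum and an identifier rule. It does not reproduce the vec0 tokenizer's actual types, and no DiskANN option syntax is implied.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOK_EOF, TOK_IDENT, TOK_LPAREN, TOK_RPAREN, TOK_COMMA, TOK_ERR } tok_kind;

typedef struct { tok_kind kind; const char *start; size_t len; } token;

static int is_ident_char(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_';
}

/* Scan one token at *p (skipping spaces) and advance *p past it. */
static token next_token(const char **p) {
  const char *s = *p;
  token t;
  while (*s == ' ') s++;
  t.start = s;
  t.len = 0;
  switch (*s) {
    case '\0': t.kind = TOK_EOF; break;
    case '(':  t.kind = TOK_LPAREN; t.len = 1; s++; break;
    case ')':  t.kind = TOK_RPAREN; t.len = 1; s++; break;
    case ',':  t.kind = TOK_COMMA;  t.len = 1; s++; break;
    default:
      if (is_ident_char(*s)) {
        while (is_ident_char(*s)) s++;
        t.kind = TOK_IDENT;
        t.len = (size_t)(s - t.start);
      } else {
        t.kind = TOK_ERR; t.len = 1; s++;
      }
  }
  *p = s;
  return t;
}
```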
Alex Garcia
aab9b37de2 Add unit tests for distance functions (L2, cosine, hamming)
Add test-only wrappers behind SQLITE_VEC_TEST compile flag to expose
static distance functions for unit testing. Includes tests for
distance_l2_sqr_float (4 cases), distance_cosine_float (3 cases),
and distance_hamming (4 cases). Print active SIMD flags at test start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:04:30 -08:00
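A compile-flag-gated wrapper is a common way to unit-test `static` functions. The wrapper has external linkage only when a flag like SQLITE_VEC_TEST is defined. In the sketch below, the flag is defined in the file for self-containment, and `vec_test_distance_l2_sqr_float` is a hypothetical wrapper name. The static function mirrors the `distance_l2_sqr_float` named above, but its body is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Normally supplied by the build (e.g. -DSQLITE_VEC_TEST). */
#define SQLITE_VEC_TEST

/* Internal helper: static, so not visible to other translation units. */
static float distance_l2_sqr_float(const float *a, const float *b, size_t n) {
  float sum = 0.0f;
  for (size_t i = 0; i < n; i++) {
    float d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

#ifdef SQLITE_VEC_TEST
/* Hypothetical exported wrapper (prototype would live in a test header). */
float vec_test_distance_l2_sqr_float(const float *a, const float *b, size_t n);
float vec_test_distance_l2_sqr_float(const float *a, const float *b, size_t n) {
  return distance_l2_sqr_float(a, b, n);
}
#endif
```

Release builds compile the wrapper out, so the function's public surface does not grow.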
Alex Garcia
611ca631ea
Support constraints on distance column in KNN queries, for pagination and range queries (#166)
* Initial pass, needs tests+docs

* old: test-knn-constraints

* cleanup
2026-02-13 06:38:26 -08:00
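Conceptually, distance constraints filter candidates that are already ordered by distance. A lower bound such as "distance > last value seen" gives keyset-style pagination, and an upper bound gives a range query. The function below is a hypothetical sketch of that filtering over a sorted array. It makes no claim about how vec0 applies these constraints internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical post-filter over candidates sorted by ascending distance.
   after:    skip distances <= the last one returned on the previous page.
   max_dist: stop once distances exceed the range-query radius.
   Writes at most k results to out and returns how many were written. */
static size_t knn_page_sketch(const float *sorted, size_t n, float after,
                              float max_dist, size_t k, float *out) {
  size_t written = 0;
  for (size_t i = 0; i < n && written < k; i++) {
    if (sorted[i] <= after) continue; /* like: distance > :after */
    if (sorted[i] > max_dist) break;  /* sorted input, so nothing later fits */
    out[written++] = sorted[i];
  }
  return written;
}
```

A strict `>` lower bound drops any later result whose distance ties the last one already returned. Pagination built this way may need a tie-breaker such as rowid.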
Alex Garcia
44e4438ed5 fix segfault on invalid vec_each() input, fixes #163 2025-01-10 14:44:37 -08:00
Alex Garcia
352f953fc0
Metadata filtering (#124)
* initial pass at PARTITION KEY support.

* Initial pass, allow auxiliary columns on vec0 virtual tables

* update TODO

* Initial pass at metadata filtering

* unit tests

* gha this PR branch

* fixup tests

* doc internal

* fix tests, KNN/rowids in

* define SQLITE_INDEX_CONSTRAINT_OFFSET

* whoops

* update tests, syrupy, use uv

* un ignore pyproject.toml

* dot

* tests/

* type error?

* win: .exe, update error name

* try fix macos python, paren around expr?

* win bash?

* dbg :(

* explicit error

* op

* dbg win

* win ./tests/.venv/Scripts/python.exe

* block UPDATEs on partition key values for now

* test this branch

* accidentally removed "partition key type mistmatch" block during merge

* typo ugh

* bruv

* start aux snapshots

* drop aux shadow table on destroy

* enforce column types

* block WHERE constraints on auxiliary columns in KNN queries

* support delete

* support UPDATE on auxiliary columns

* test this PR

* dont inline that

* test-metadata.py

* memzero text buffer

* stress test

* more snapshot tests

* rm double/int32, just float/int64

* finish type checking

* long text support

* DELETE support

* UPDATE support

* fix snapshot names

* drop not-used in eqp

* small fixes

* boolean comparison handling

* ensure error is raised when long string constraint

* new version string for beta builds

* typo whoops

* ann-filtering-benchmark directory

* test-case

* updates

* fix aux column error when using non-default rowid values, needs test

* refactor some text knn filtering

* rowids blob read only on text metadata filters

* refactor

* add failing test cases for non-eq text knn

* text knn NE

* test cases diff

* GT

* text knn GT/GE fixes

* text knn LT/LE

* clean

* vtab_in handling

* unblock aux failures for now

* guard sqlite3_vtab_in

* else in guard?

* fixes and tests

* add broken shadow table test

* rename _metadata_chunksNN shadow table to _metadatachunksNN, for proper shadowName detection

* _metadata_text_NN shadow tables to _metadatatextNN

* SQLITE_VEC_VERSION_MAJOR SQLITE_VEC_VERSION_MINOR and SQLITE_VEC_VERSION_PATCH in sqlite-vec.h

* _info shadow table

* forgot to update aux snapshot?

* fix aux tests
2024-11-20 00:59:34 -08:00
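The shadow-table rename matters because of how SQLite matches shadow tables, assuming it behaves as in build.c. SQLite splits a table name at its last underscore, looks up the prefix as the virtual table, and only then asks the module's xShadowName about the suffix. Under that behavior, `t_metadata_chunks00` splits into `t_metadata` and `chunks00`, and the lookup misses the vtab `t`. The function below is a simplified, hypothetical model of that split.

```c
#include <assert.h>
#include <string.h>

/* Simplified model: split at the LAST underscore, require the prefix to be
   the virtual table's name, then compare the suffix. */
static int shadow_split_matches(const char *full, const char *vtab,
                                const char *expected_suffix) {
  const char *tail = strrchr(full, '_');
  size_t prefix_len;
  if (tail == NULL) return 0;
  prefix_len = (size_t)(tail - full);
  if (strlen(vtab) != prefix_len || strncmp(full, vtab, prefix_len) != 0)
    return 0; /* prefix is not the virtual table name */
  return strcmp(tail + 1, expected_suffix) == 0;
}
```

This also explains why the virtual table name itself may contain underscores. Only the suffix must be underscore-free.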
Alex Garcia
9bfeaa7842
Auxiliary column support (#123)
* initial pass at PARTITION KEY support.

* Initial pass, allow auxiliary columns on vec0 virtual tables

* update TODO

* unit tests

* gha this PR branch

* fixup tests

* doc internal

* fix tests, KNN/rowids in

* define SQLITE_INDEX_CONSTRAINT_OFFSET

* whoops

* update tests, syrupy, use uv

* un ignore pyproject.toml

* dot

* tests/

* type error?

* win: .exe, update error name

* try fix macos python, paren around expr?

* win bash?

* dbg :(

* explicit error

* op

* dbg win

* win ./tests/.venv/Scripts/python.exe

* block UPDATEs on partition key values for now

* test this branch

* accidentally removed "partition key type mistmatch" block during merge

* typo ugh

* bruv

* start aux snapshots

* drop aux shadow table on destroy

* enforce column types

* block WHERE constraints on auxiliary columns in KNN queries

* support delete

* support UPDATE on auxiliary columns
2024-11-20 00:30:23 -08:00
Alex Garcia
6658624172
PARTITION KEY support (#122)
* initial pass at PARTITION KEY support.

* unit tests

* gha this PR branch

* fixup tests

* doc internal

* fix tests, KNN/rowids in

* define SQLITE_INDEX_CONSTRAINT_OFFSET

* whoops

* update tests, syrupy, use uv

* un ignore pyproject.toml

* dot

* tests/

* type error?

* win: .exe, update error name

* try fix macos python, paren around expr?

* win bash?

* dbg :(

* explicit error

* op

* dbg win

* win ./tests/.venv/Scripts/python.exe

* block UPDATEs on partition key values for now
2024-11-20 00:02:04 -08:00
Alex Garcia
cc12e44d4c small docs work 2024-10-11 09:09:32 -07:00
Alex Garcia
763aad5d6a Remove vec_npy_each from default entrypoint and move to sqlite3_vec_numpy_init entrypoint 2024-09-25 23:07:17 -07:00
Alex Garcia
f09f6a0215 fmt 2024-09-20 13:17:57 -07:00
Alex Garcia
89faa5be15 don't call SQLITE_EXTENSION_INIT2 if SQLITE_CORE 2024-09-13 12:50:05 -07:00
Alex Garcia
fb11a2b32e guard #include "sqlite3ext.h" with SQLITE_CORE 2024-09-13 12:46:13 -07:00
Alex Garcia
7ea402931e fmt and SQLITE_VEC_OMIT_FS fixes 2024-08-10 23:33:28 -07:00
Alex Garcia
a6498d04b8 properly check SQLITE_THREADSAFE for static compilation 2024-08-09 13:23:18 -07:00
Alex Garcia
2ed95aacc5 ensure UPDATEs and DELETEs work on vec0 tables with text primary keys, refs #77 2024-08-09 12:16:56 -07:00
Alex Garcia
ac87b06b02 Add SQLITE_VEC_STATIC option, prefix json function 2024-08-09 10:44:39 -07:00
Alex Garcia
65c4aa3754 Merge branch 'main' of github.com:asg017/sqlite-vec into main 2024-08-09 10:26:50 -07:00
Sheldon Robinson
6c26399269
Fix compilation error for redefinition of jsonIsspace (#75)
* Fix compilation error for redefinition of jsonIsspace when including in amalgamation build of sqlite3.c

* Fix redefinition variable jsonIsSpaceX[]

* Add check for SQLITE_AMALGMATION

* Add check for SQLITE_CORE
2024-08-09 10:26:45 -07:00
Alex Garcia
fdd1b2679e control path fixes 2024-08-09 10:25:31 -07:00
Sheldon Robinson
6cccfae273
Add implementation for __builtin_popcountl for Windows on ARM (#72)
Windows on ARM is missing the __popcnt64 function.
Add a static implementation based on b64f1e77b5/lib/ngtcp2_ringbuf.c, lines 34-43.
2024-08-09 10:07:15 -07:00
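When neither `__builtin_popcountl` nor `__popcnt64` is available, a portable fallback typically uses SWAR bit counting. It sums adjacent bit pairs, then nibbles, then bytes, and a multiply adds the eight byte counts into the top byte. The function below is a generic sketch of that technique with a hypothetical name. It is not the code from the file referenced above.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 64-bit population count (SWAR). */
static unsigned popcount64_sketch(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit sums */
  return (unsigned)((x * 0x0101010101010101ULL) >> 56);                 /* total in top byte */
}
```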
Alex Garcia
530a3c95d2 Explicitly test that SQLite version 3.31.1 is compatible with sqlite-vec when statically compiling 2024-08-05 16:46:35 -07:00
Ikko Eltociear Ashimine
bd5c847a97
chore: update sqlite-vec.c (#61)
identifer -> identifier
2024-08-05 11:15:02 -07:00
Alex Garcia
e379c205c8 limit checks 2024-08-01 02:45:51 -07:00
Alex Garcia
a0bc9404ce static updates 2024-07-31 12:56:09 -07:00
Alex Garcia
0f5bc2f254 fmt 2024-07-23 23:57:42 -07:00
Alex Garcia
7a1b14976a vec_blob_close proper handling 2024-07-23 23:57:28 -07:00
Alex Garcia
633db6e9cc add l1 distance to vec0 tables 2024-07-23 14:04:17 -07:00
Alex Garcia
79491542e5 Merge branch 'main' of github.com:asg017/sqlite-vec into main 2024-07-23 12:27:37 -07:00
Daniel Levi-Minzi
25b85afc89
l1 distance (#39)
* initial work on l1

* l1 int8 neon implementation

* tweak l1 int8 and add test

* broken overflow still

* some progress on l1

* change to i32 instead of i64

* remove comment

* ignore poetry stuff

* unrolled l1 int8 and format

* remove comments
2024-07-23 09:04:15 -07:00
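The int8 L1 work raises an overflow question, and the "change to i32" step is one answer. Each per-element difference lies in [-255, 255], which does not fit in int8. An int32 accumulator then holds the sum safely for fewer than about 8.4 million dimensions (2^31 / 255). The scalar sketch below uses a hypothetical name and is not the NEON kernel from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar L1 (Manhattan) distance between two int8 vectors. */
static int32_t distance_l1_int8_sketch(const int8_t *a, const int8_t *b, size_t n) {
  int32_t sum = 0;
  for (size_t i = 0; i < n; i++) {
    int32_t d = (int32_t)a[i] - (int32_t)b[i]; /* widen before subtracting */
    sum += d < 0 ? -d : d;
  }
  return sum;
}
```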
Alex Garcia
7fc8248f28 ensure statements opened by vec0 are finalized before commits. 2024-07-23 08:59:34 -07:00
Alex Garcia
ff6cf96e2a vec_type(), API references 2024-07-22 21:24:44 -07:00
Alex Garcia
f4fe53e584 docs and fuzz 2024-07-16 22:28:15 -07:00
Alex Garcia
73b9156a7c changes for ncruces go 2024-07-11 22:36:18 -07:00
Alex Garcia
23f0b75f9c fix win cl.exe, void unknown size 2024-07-05 12:10:05 -07:00
Alex Garcia
f217cbf2bd knn cleanups and tests 2024-07-05 12:07:45 -07:00
Alex Garcia
f602ae1396 cast ambiguous to i64 2024-06-28 22:15:53 -07:00
Alex Garcia
39f6fa3dc9 gha: cl DEFAULT_FLAGS 2024-06-28 22:07:02 -07:00
Alex Garcia
be6900b0f9 gha: yeet out p 2024-06-28 22:03:54 -07:00
Alex Garcia
cc95770edd gha: please 2024-06-28 21:58:53 -07:00
Alex Garcia
50f6886ac3 drop the const? 2024-06-28 21:38:50 -07:00
Alex Garcia
b7bfe1f805 address some cl.exe issues 2024-06-28 20:56:51 -07:00
Alex Garcia
76c421e0b9 win32 try 2024-06-28 20:50:20 -07:00
Alex Garcia
44aef7a50f memset 0 all applicable mallocs, fix windows? 2024-06-28 19:21:50 -07:00
Alex Garcia
2eafd843d7 no inline, windows i64 fix? 2024-06-28 16:00:58 -07:00