vec0Update_UpdateVectorColumn writes to flat chunk blobs but does not
update DiskANN graph or IVF index structures, silently corrupting KNN
results. Now returns a clear error for these index types. Rescore
UPDATE is unaffected — it already has a full implementation that
updates both quantized chunks and float vectors.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Six sites used #if SQLITE_VEC_ENABLE_RESCORE to guard _vector_chunks
skip logic that applies to ALL non-flat index types. When RESCORE was
compiled out, DiskANN and IVF columns would incorrectly access flat
chunk tables. Two sites also missed DiskANN in the skip enumeration,
which would break mixed flat+DiskANN tables.
Fix: replace all six compile-time guards with unconditional runtime
`!= VEC0_INDEX_TYPE_FLAT` checks. Also move rescore_on_delete inside
the !vec0_all_columns_diskann guard to prevent use of uninitialized
chunk_id/chunk_offset, and initialize those variables to 0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The UINT32_TYPE/UINT16_TYPE/INT16_TYPE/UINT8_TYPE/INT8_TYPE/LONGDOUBLE_TYPE
macros (copied from sqlite3.c) were never used anywhere in sqlite-vec.
The u_int8_t/u_int16_t/u_int64_t typedefs redefined standard types using
BSD-only types despite <stdint.h> already being included, breaking builds
on musl/Alpine, strict C99, and requiring reactive platform guards.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend benchmarks-ann/ with results database (SQLite with per-query detail
and continuous writes), dataset subfolder organization, --subset-size and
--warmup options. Supports systematic comparison across flat, rescore, IVF,
and DiskANN index types.
The command insert handler (used for runtime config like
search_list_size_search) was gated behind SQLITE_VEC_EXPERIMENTAL_IVF_ENABLE,
which defaults to 0. DiskANN commands were unreachable unless IVF was
also compiled in. Widen the guard to also activate when
SQLITE_VEC_ENABLE_DISKANN is set.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add DiskANN graph-based index: builds a Vamana graph with configurable R
(max degree) and L (search list size, separate for insert/query), supports
int8 quantization with rescore, lazy reverse-edge replacement, pre-quantized
query optimization, and insert buffer reuse. Includes shadow table management,
delete support, KNN integration, compile flag (SQLITE_VEC_ENABLE_DISKANN),
release-demo workflow, fuzz targets, and tests. Fixes rescore int8
quantization bug.
Rename SQLITE_VEC_ENABLE_IVF to SQLITE_VEC_EXPERIMENTAL_IVF_ENABLE and
flip the default from 1 to 0. IVF tests are automatically skipped when
the build flag is not set.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add inverted file (IVF) index type: partitions vectors into clusters via
k-means, quantizes to int8, and scans only the nearest nprobe partitions at
query time. Includes shadow table management, insert/delete, KNN integration,
compile flag (SQLITE_VEC_ENABLE_IVF), fuzz targets, and tests. Removes
superseded ivf-benchmarks/ directory.
The Linux AVX auto-detection checked the host's /proc/cpuinfo, which
passes on x86 CI runners even when cross-compiling for Android ARM
targets. Skip AVX detection when CC contains 'android'.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add approximate nearest neighbor infrastructure to vec0: shared distance
dispatch (vec0_distance_full), flat index type with parser, NEON-optimized
cosine/Hamming for float32/int8, amalgamation script, and benchmark suite
(benchmarks-ann/) with ground-truth generation and profiling tools. Remove
unused vec_npy_each/vec_static_blobs code, fix missing stdint.h include.
vec0Update_Delete_ClearMetadata's long-text branch runs a DELETE via
sqlite3_step, which returns SQLITE_DONE (101) on success. The code
checked for failure but never normalized the success case to SQLITE_OK.
The function's epilogue returned SQLITE_DONE as-is, which the caller
(vec0Update_Delete) treated as an error, aborting the DELETE scan and
silently leaving rows behind.
- Normalize rc to SQLITE_OK after successful sqlite3_step in ClearMetadata
- Move sqlite3_finalize before the rc check (cleanup on all paths)
- Add test_delete_by_metadata_with_long_text regression test
- Update test_deletes snapshot (row 3 now correctly deleted)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add rescore index type: stores full-precision float vectors in a rowid-keyed
shadow table, quantizes to int8 for fast initial scan, then rescores top
candidates with original vectors. Includes config parser, shadow table
management, insert/delete support, KNN integration, compile flag
(SQLITE_VEC_ENABLE_RESCORE), fuzz targets, and tests.
Add approximate nearest neighbor infrastructure to vec0: shared distance
dispatch (vec0_distance_full), flat index type with parser, NEON-optimized
cosine/Hamming for float32/int8, amalgamation script, and benchmark suite
(benchmarks-ann/) with ground-truth generation and profiling tools. Remove
unused vec_npy_each/vec_static_blobs code, fix missing stdint.h include.
When a row is deleted from a vec0 virtual table, the rowid slot in
_chunks.rowids and vector data in _vector_chunksNN.vectors are now
zeroed out (previously left as stale data, tracked in #54). When all
rows in a chunk are deleted (validity bitmap all zeros), the chunk and
its associated vector/metadata shadow table rows are reclaimed.
- Add vec0Update_Delete_ClearRowid to zero the rowid blob slot
- Add vec0Update_Delete_ClearVectors to zero all vector blob slots
- Add vec0Update_Delete_DeleteChunkIfEmpty to detect and delete
fully-empty chunks from _chunks, _vector_chunksNN, _metadatachunksNN
- Fix missing rc check in ClearMetadata loop (bug: errors were silently
ignored)
- Fix vec0_new_chunk to explicitly set _rowid_ on shadow table INSERTs
(SHADOW_TABLE_ROWID_QUIRK: "rowid PRIMARY KEY" without INTEGER type
is not a true rowid alias, causing blob_open failures after chunk
delete+recreate cycles)
- Add 13 new tests covering rowid/vector zeroing, chunk reclamation,
metadata/auxiliary/partition/text-PK/int8/bit variants, and
page_count shrinkage verification
- Add vec0-delete-completeness fuzz target
- Update snapshots for new delete zeroing behavior
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Homebrew LLVM 18 runtime dylibs use typed allocation ABI symbols
(__ZnwmSt19__type_descriptor_t) not available in macOS 14's system
libc++, causing dyld to abort. Xcode clang doesn't ship libFuzzer.
Mark fuzz-macos as continue-on-error (same as fuzz-windows) so it
doesn't block CI. Linux fuzzing remains the primary bug detector.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sqlite-vec.c:
- vec0_free: add loops to free partition, auxiliary, and metadata
column names (previously leaked on error paths)
- vec0_init: update pNew->numXxxColumns incrementally in the parse
loop so vec0_free sees correct counts on early goto-error paths
(previously the counts were only written after the loop, so vec0_free
would loop 0 times and leak names allocated inside the loop)
fuzz.yaml:
- macOS: pass -isysroot $(xcrun --sdk macosx --show-sdk-path) so
Xcode clang can find system headers (stdio.h, assert.h, etc.)
- Fix artifact upload paths: libFuzzer writes crash-*/leak-* to
the cwd (repo root), not tests/fuzz/
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Homebrew LLVM 18's ASAN/libFuzzer runtimes contain weak-def symbols
for typed allocation operators (__ZnwmSt19__type_descriptor_t) that
don't exist in macOS 14's system libc++. Since the symbol is embedded
in the pre-built runtime dylibs (not our code), link-time flags cannot
fix it.
Switch to Apple's Xcode clang which ships its own libFuzzer and ASAN
runtime built against the system libc++ — no ABI mismatch possible.
No Homebrew LLVM install needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The fuzz targets were crashing on macOS 14 with:
dyld: weak-def symbol not found '__ZnwmSt19__type_descriptor_t'
libFuzzer compiled with LLVM 18 uses typed allocation ABI symbols
not present in macOS 14's system libc++. Since DYLD_LIBRARY_PATH
cannot override SIP-protected /usr/lib/libc++.1.dylib at runtime,
we fix this at link time:
- -nostdlib++: suppress implicit system libc++ linking
- -L$LLVM/lib/c++ -lc++: explicitly link LLVM's libc++ (which has the symbol)
- -Wl,-rpath,$LLVM/lib/c++: embed rpath so dyld finds LLVM's libc++ at runtime
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- fuzz.yaml: switch macOS to llvm@18 (latest LLVM uses typed allocation
C++ ABI symbols not available on macOS 14 runner's system libc++)
- sqlite-vec.c: fix NaN input in vec_quantize_int8 by using !(val <= X)
comparisons which evaluate to true for NaN, ensuring the clamp fires
- sqlite-vec.c: free pzErrMsg in vec_eachFilter error path (was leaking
the error string returned by vector_from_value)
- sqlite-vec.c: add sqlite3_free(pNew) to vec0_init error path; vec0_free
frees the contents but not the struct itself, mirroring vec0Disconnect
- sqlite-vec.c: free knn_data in vec0Filter_knn cleanup when rc != SQLITE_OK;
on error the cursor's knn_data field is never set so it would not be
freed by the cursor teardown path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- fuzz.yaml: embed rpath to Homebrew LLVM's libc++ so macOS binaries can
find the right C++ runtime at load time (fixes dyld weak-def crash)
- fuzz.yaml: add `make sqlite-vec.h` step on all platforms before building
fuzz targets (the header is generated from a template, not checked in)
- fuzz.yaml: drop llvm version pin on Windows so choco succeeds when a
newer LLVM is already installed on the runner
- sqlite-vec.c: change fvec_cleanup / fvec_cleanup_noop to take void*
instead of f32* so they are ABI-compatible with vector_cleanup; removes
UBSAN indirect-call errors at many call sites
- sqlite-vec.c: copy BLOB data into sqlite3_malloc'd buffer in
fvec_from_value instead of aliasing the raw blob pointer, fixing UBSAN
misaligned-load errors when SQLite hands us an unaligned blob
- sqlite-vec.c: guard npy_token_next string scan with ptr < end check
before the closing-quote dereference (heap-buffer-overflow)
- sqlite-vec.c: clamp vec_quantize_int8 intermediate value to [-128, 127]
before casting to i8 (UBSAN out-of-range conversion)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Document three classes of undefined behavior found by UBSAN:
function pointer type mismatches, misaligned f32 reads, and
float-to-integer overflow in vec_quantize_int8.
Improve vec-mismatch fuzzer to cover all error-path cleanup patterns:
type mismatches, dimension mismatches, single-arg functions, and
both text and blob inputs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the second vector argument fails to parse, the cleanup of the
first vector was called with the double-pointer 'a' instead of '*a'.
When the first vector was parsed from JSON text (cleanup = sqlite3_free),
this called sqlite3_free on a stack address, causing a crash.
Found by the vec-mismatch fuzz target.
Shout out to @renatgalimov in #257 for finding the original bug!
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Targeted fuzzer for two-argument vector functions (vec_distance_*,
vec_add, vec_sub) that binds a valid JSON vector as arg1 and fuzz
data as arg2. This exercises the error path in ensure_vector_match()
where the first vector parses successfully (with sqlite3_free cleanup)
but the second fails, triggering the buggy aCleanup(a) call on line
1031 of sqlite-vec.c (should be aCleanup(*a)).
The fuzzer catches this immediately — ASAN reports "bad-free" when
sqlite3_free is called on a stack address.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runs on push to main, nightly at 2am UTC, and manual dispatch.
Linux (ubuntu-22.04) and macOS (macos-14) are fully supported.
Windows (windows-2022) is best-effort with continue-on-error.
Crash artifacts are uploaded on failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deduplicates exec(), vec0_shadow_table_contents(), _f32(), _i64(), and
_int8() helpers that were copied across six test files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Baseline tests protecting non-DiskANN chunk-based insert and delete
behavior: vector round-trips, auto rowids, text primary keys, delete
validity, reinsert after delete, dimension/type validation, and v_info
snapshot.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extends the vec0 tokenizer to recognize '(', ')', and ',' as
single-character tokens, preparing for DiskANN index option parsing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add test-only wrappers behind SQLITE_VEC_TEST compile flag to expose
static distance functions for unit testing. Includes tests for
distance_l2_sqr_float (4 cases), distance_cosine_float (3 cases),
and distance_hamming (4 cases). Print active SIMD flags at test start.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>