Commit graph

5 commits

Author SHA1 Message Date
Alex Garcia
6e2c4c6bab Add FTS5-style command column and runtime oversample for rescore
Replace the old INSERT INTO t(rowid) VALUES('command') hack with a
proper hidden command column named after the table (FTS5 pattern):

  INSERT INTO t(t) VALUES ('oversample=16')

The command column is the first hidden column (before distance and k)
to reserve ability for future table-valued function argument use.

Schema: CREATE TABLE x(rowid, <cols>, "<table>" hidden, distance hidden, k hidden)

For backwards compat, pre-v0.1.10 tables (detected via _info shadow
table version) skip the command column to avoid name conflicts with
user columns that may share the table's name. Verified with legacy
fixture DB generated by sqlite-vec v0.1.6.

Changes:
- Add hidden command column to sqlite3_declare_vtab for new tables
- Version-gate via _info shadow table for existing tables
- Validate at CREATE time that no column name matches table name
- Add rescore_handle_command() with oversample=N support
- rescore_knn() prefers runtime oversample_search over CREATE default
- Remove old rowid-based command dispatch
- Migrate all DiskANN/IVF/fuzz tests and benchmarks to new syntax
- Add legacy DB fixture (v0.1.6) and 9 backwards-compat tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:39:18 -07:00
Alex Garcia
5522e86cd2 Validate validity/rowids blob sizes in rescore KNN path
The rescore KNN loop read validity and rowids blobs from the chunks
iterator without checking their sizes matched chunk_size expectations.
A truncated or corrupt blob could cause OOB reads in bitmap_copy or
rowid array access. The flat KNN path already had these checks.

Adds corruption tests: truncated rowids blob and truncated validity
blob both produce errors instead of crashes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:49:40 -07:00
Alex Garcia
f2c9fb8f08 Add text PK, WAL concurrency tests, and fix bench-smoke config
Infrastructure improvements:
- Fix benchmarks-ann Makefile: type=baseline -> type=vec0-flat (baseline
  was never a valid INDEX_REGISTRY key)
- Add DiskANN + text primary key test: insert, KNN, delete, KNN
- Add rescore + text primary key test: insert, KNN, delete, KNN
- Add WAL concurrency test: reader sees snapshot isolation while
  writer has an open transaction, KNN works on reader's snapshot

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:43:49 -07:00
Alex Garcia
82f4eb08bf Add NULL checks after sqlite3_column_blob in rescore and DiskANN
sqlite3_column_blob() returns NULL for zero-length blobs or on OOM.
Several call sites in rescore KNN and DiskANN node/vector read passed
the result directly to memcpy without checking, risking NULL deref on
corrupt or empty databases. IVF already had proper NULL checks.

Adds corruption regression tests that truncate shadow table blobs and
verify the query errors cleanly instead of crashing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:31:49 -07:00
Alex Garcia
ba0db0b6d6 Add rescore index for ANN queries
Add rescore index type: stores full-precision float vectors in a rowid-keyed
shadow table, quantizes to int8 for fast initial scan, then rescores top
candidates with original vectors. Includes config parser, shadow table
management, insert/delete support, KNN integration, compile flag
(SQLITE_VEC_ENABLE_RESCORE), fuzz targets, and tests.
2026-03-29 19:45:54 -07:00