sqlite-vec Architecture
Internal documentation on how sqlite-vec works under the hood. It is not meant
for users of the sqlite-vec project; for how-to guides, consult
the official sqlite-vec documentation.
Rather, this is for people interested in how sqlite-vec works,
and it offers some guidelines for future contributors.
Very much a WIP.
vec0
Shadow Tables
xyz_chunks
chunk_id INTEGER, size INTEGER, validity BLOB, rowids BLOB
xyz_rowids
rowid INTEGER, id, chunk_id INTEGER, chunk_offset INTEGER
xyz_vector_chunksNN
rowid INTEGER, vector BLOB
xyz_auxiliary
rowid INTEGER, valueNN [type]
xyz_metadatachunksNN
rowid INTEGER, data BLOB
xyz_metadatatextNN
rowid INTEGER, data TEXT
idxStr
The vec0 idxStr is a string composed of a single "header" character and 0 or
more "blocks" of 4 characters each.
The "header" character denotes the type of query plan, as determined by the
enum vec0_query_plan values. The current possible values are:
| Name | Value | Description |
|---|---|---|
| VEC0_QUERY_PLAN_FULLSCAN | '1' | Perform a full scan on all rows |
| VEC0_QUERY_PLAN_POINT | '2' | Perform a single-lookup point query for the provided rowid |
| VEC0_QUERY_PLAN_KNN | '3' | Perform a KNN-style query on the provided query vector and parameters |
Each 4-character "block" is associated with a corresponding value in argv[].
For example, the 1st block occupies byte offsets 1-4 (inclusive) and is
associated with argv[1]. The 2nd block occupies byte offsets 5-8 (inclusive) and
is associated with argv[2], and so on. The first character of each block
describes what kind of value or filter the given argv[i] value is.
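The layout described above can be walked with simple pointer arithmetic. The following is a minimal sketch, not code from sqlite-vec itself: the helper names `idxstr_block_count` and `idxstr_block` and the macro names are hypothetical, and only the header values and the 4-character block width come from this document. Block indexes here are 0-based, so the document's "1st block" is `i = 0`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Header values as listed in the query plan table. */
#define QUERY_PLAN_FULLSCAN '1'
#define QUERY_PLAN_POINT    '2'
#define QUERY_PLAN_KNN      '3'

/* Hypothetical name for the fixed block width described above. */
#define IDXSTR_BLOCK_SIZE 4

/* Number of complete blocks that follow the header character, or -1 if
 * the string is empty or its length is not of the form 1 + 4*n. */
int idxstr_block_count(const char *idxStr) {
  size_t len = strlen(idxStr);
  if (len < 1 || (len - 1) % IDXSTR_BLOCK_SIZE != 0) {
    return -1;
  }
  return (int)((len - 1) / IDXSTR_BLOCK_SIZE);
}

/* Pointer to the first character of block i (0-based), which sits at
 * byte offset 1 + 4*i. The first character is the block's "kind". */
const char *idxstr_block(const char *idxStr, int i) {
  assert(i >= 0 && i < idxstr_block_count(idxStr));
  return idxStr + 1 + (size_t)i * IDXSTR_BLOCK_SIZE;
}
```

For instance, for the string `"3{___}___"` the header is `'3'`, `idxstr_block_count` reports 2 blocks, and the kind characters of the two blocks are `'{'` and `'}'`.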
VEC0_IDXSTR_KIND_KNN_MATCH ('{')
argv[i] is the query vector of the KNN query.
The remaining 3 characters of the block are _ fillers.
VEC0_IDXSTR_KIND_KNN_K ('}')
argv[i] is the limit/k value of the KNN query.
The remaining 3 characters of the block are _ fillers.
VEC0_IDXSTR_KIND_KNN_ROWID_IN ('[')
argv[i] is the optional rowid in (...) value, and must be handled with
sqlite3_vtab_in_first() / sqlite3_vtab_in_next().
The remaining 3 characters of the block are _ fillers.
VEC0_IDXSTR_KIND_KNN_PARTITON_CONSTRAINT (']')
argv[i] is a "constraint" on a specific partition key.
The second character of the block denotes which partition key to filter on,
using A to denote the first partition key column, B for the second, etc. It
is encoded with 'A' + partition_idx and can be decoded with c - 'A'.
The third character of the block denotes which operator is used in the
constraint. It will be one of the values of enum vec0_partition_operator, as
only a subset of operations are supported on partition keys.
The fourth character of the block is a _ filler.
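As an illustration of the partition constraint block layout, here is a small sketch of writing and reading such a block. The function names are hypothetical, and the operator character is passed in as an opaque `char` because the values of enum vec0_partition_operator are not listed in this document.

```c
#include <assert.h>

/* Kind character of VEC0_IDXSTR_KIND_KNN_PARTITON_CONSTRAINT. */
#define KIND_PARTITION_CONSTRAINT ']'

/* Write one 4-character partition constraint block into out[0..3].
 * `op` stands in for a value of enum vec0_partition_operator. */
void write_partition_block(char out[4], int partition_idx, char op) {
  assert(partition_idx >= 0);
  out[0] = KIND_PARTITION_CONSTRAINT;
  out[1] = (char)('A' + partition_idx); /* 'A' = first partition key column */
  out[2] = op;
  out[3] = '_';                         /* filler */
}

/* Recover the partition key column index from a block: c - 'A'. */
int partition_idx_from_block(const char block[4]) {
  return block[1] - 'A';
}
```

With `partition_idx = 1`, the second character becomes `'B'`, and decoding the block yields 1 again.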
VEC0_IDXSTR_KIND_POINT_ID ('!')
argv[i] is the value of the rowid or id to match against for the point query.
The remaining 3 characters of the block are _ fillers.
VEC0_IDXSTR_KIND_METADATA_CONSTRAINT ('&')
argv[i] is the value of the WHERE constraint for a metadata column in a KNN
query.
The second character of the block denotes which metadata column the constraint
belongs to, using A to denote the first metadata column, B for the
second, etc. It is encoded with 'A' + metadata_idx and can be decoded with
c - 'A'.
The third character of the block is the constraint operator. It will be one of
enum vec0_metadata_operator, as only a subset of operators are supported on
metadata column KNN filters.
The fourth character of the block is a _ filler.
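Reading a metadata constraint block back out is the reverse of the encoding described above. The sketch below is illustrative only: `struct metadata_block` and `parse_metadata_block` are hypothetical names, and the operator is kept as a raw `char` since the values of enum vec0_metadata_operator are not listed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical holder for the decoded fields of one block. */
struct metadata_block {
  int metadata_idx; /* 0 for the first metadata column, 1 for the second, ... */
  char op;          /* raw operator character from the third position */
};

/* Decode a 4-character metadata constraint block. Returns false if the
 * kind character is not '&', the column character is below 'A', or the
 * fourth character is not the '_' filler. */
bool parse_metadata_block(const char block[4], struct metadata_block *out) {
  if (block[0] != '&') return false;
  if (block[1] < 'A') return false;
  if (block[3] != '_') return false;
  out->metadata_idx = block[1] - 'A';
  out->op = block[2];
  return true;
}
```

For a block such as `"&Cq_"` (where `'q'` is just a placeholder operator character), this yields `metadata_idx == 2` and `op == 'q'`.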