sqlite-vec/ARCHITECTURE.md
2024-11-12 21:35:39 -08:00

2.5 KiB

vec0

idxStr

The vec0 idxStr is a string composed of single "header" character and 0 or more "blocks" of 4 characters each.

The "header" charcter denotes the type of query plan, as determined by the enum vec0_query_plan values. The current possible values are:

Name Value Description
VEC0_QUERY_PLAN_FULLSCAN '1' Perform a full-scan on all rows
VEC0_QUERY_PLAN_POINT '2' Perform a single-lookup point query for the provided rowid
VEC0_QUERY_PLAN_KNN '3' Perform a KNN-style query on the provided query vector and parameters.

Each 4-character "block" is associated with a corresponding value in argv[]. For example, the 1st block at byte offset 1-4 (inclusive) is the 1st block and is associated with argv[1]. The 2nd block at byte offset 5-8 (inclusive) is associated with argv[2] and so on. Each block describes what kind of value or filter the given argv[i] value is.

VEC0_IDXSTR_KIND_KNN_MATCH ('{')

argv[i] is the query vector of the KNN query.

The remaining 3 characters of the block are _ fillers.

VEC0_IDXSTR_KIND_KNN_K ('}')

argv[i] is the limit/k value of the KNN query.

The remaining 3 characters of the block are _ fillers.

VEC0_IDXSTR_KIND_KNN_ROWID_IN ('[')

argv[i] is the optional rowid in (...) value, and must be handled with sqlite3_vtab_in_first() / sqlite3_vtab_in_next().

The remaining 3 characters of the block are _ fillers.

VEC0_IDXSTR_KIND_KNN_PARTITON_CONSTRAINT (']')

argv[i] is a "constraint" on a specific partition key.

The second character of the block denotes which partition key to filter on, using A to denote the first partition key column, B for the second, etc. It is encoded with 'A' + partition_idx and can be decoded with c - 'A'.

The third character of the block denotes which operator is used in the constraint. It will be one of the values of enum vec0_partition_operator, as only a subset of operations are supported on partition keys.

The fourth character of the block is a _ filler.

VEC0_IDXSTR_KIND_POINT_ID ('!')

argv[i] is the value of the rowid or id to match against for the point query.

The remaining 3 characters of the block are _ fillers.