mirror of
https://github.com/asg017/sqlite-vec.git
synced 2026-04-25 16:56:27 +02:00
* initial pass at PARTITION KEY support. * Initial pass, allow auxiliary columns on vec0 virtual tables * update TODO * Initial pass at metadata filtering * unit tests * gha this PR branch * fixup tests * doc internal * fix tests, KNN/rowids in * define SQLITE_INDEX_CONSTRAINT_OFFSET * whoops * update tests, syrupy, use uv * un ignore pyproject.toml * dot * tests/ * type error? * win: .exe, update error name * try fix macos python, paren around expr? * win bash? * dbg :( * explicit error * op * dbg win * win ./tests/.venv/Scripts/python.exe * block UPDATEs on partition key values for now * test this branch * accidentally removved "partition key type mistmatch" block during merge * typo ugh * bruv * start aux snapshots * drop aux shadow table on destroy * enforce column types * block WHERE constraints on auxiliary columns in KNN queries * support delete * support UPDATE on auxiliary columns * test this PR * dont inline that * test-metadata.py * memzero text buffer * stress test * more snpashot tests * rm double/int32, just float/int64 * finish type checking * long text support * DELETE support * UPDATE support * fix snapshot names * drop not-used in eqp * small fixes * boolean comparison handling * ensure error is raised when long string constraint * new version string for beta builds * typo whoops * ann-filtering-benchmark directory * test-case * updates * fix aux column error when using non-default rowid values, needs test * refactor some text knn filtering * rowids blob read only on text metadata filters * refactor * add failing test causes for non eq text knn * text knn NE * test cases diff * GT * text knn GT/GE fixes * text knn LT/LE * clean * vtab_in handling * unblock aux failures for now * guard sqlite3_vtab_in * else in guard? * fixes and tests * add broken shadow table test * rename _metadata_chunksNN shadown table to _metadatachunksNN, for proper shadowName detection * _metadata_text_NN shadow tables to _metadatatextNN * SQLITE_VEC_VERSION_MAJOR SQLITE_VEC_VERSION_MINOR and SQLITE_VEC_VERSION_PATCH in sqlite-vec.h * _info shadow table * forgot to update aux snapshot? * fix aux tests
122 lines
3.9 KiB
Markdown
122 lines
3.9 KiB
Markdown
# `sqlite-vec` Architecture
|
|
|
|
Internal documentation for how `sqlite-vec` works under-the-hood. Not meant for
|
|
users of the `sqlite-vec` project, consult
|
|
[the official `sqlite-vec` documentation](https://alexgarcia.xyz/sqlite-vec) for
|
|
how-to-guides. Rather, this is for people interested in how `sqlite-vec` works
|
|
and some guidelines to any future contributors.
|
|
|
|
Very much a WIP.
|
|
|
|
## `vec0`
|
|
|
|
### Shadow Tables
|
|
|
|
#### `xyz_chunks`
|
|
|
|
- `chunk_id INTEGER`
|
|
- `size INTEGER`
|
|
- `validity BLOB`
|
|
- `rowids BLOB`
|
|
|
|
#### `xyz_rowids`
|
|
|
|
- `rowid INTEGER`
|
|
- `id`
|
|
- `chunk_id INTEGER`
|
|
- `chunk_offset INTEGER`
|
|
|
|
#### `xyz_vector_chunksNN`
|
|
|
|
- `rowid INTEGER`
|
|
- `vector BLOB`
|
|
|
|
#### `xyz_auxiliary`
|
|
|
|
- `rowid INTEGER`
|
|
- `valueNN [type]`
|
|
|
|
#### `xyz_metadatachunksNN`
|
|
|
|
- `rowid INTEGER`
|
|
- `data BLOB`
|
|
|
|
#### `xyz_metadatatextNN`
|
|
|
|
- `rowid INTEGER`
|
|
- `data TEXT`
|
|
|
|
### idxStr
|
|
|
|
The `vec0` idxStr is a string composed of single "header" character and 0 or
|
|
more "blocks" of 4 characters each.
|
|
|
|
The "header" charcter denotes the type of query plan, as determined by the
|
|
`enum vec0_query_plan` values. The current possible values are:
|
|
|
|
| Name | Value | Description |
|
|
| -------------------------- | ----- | ---------------------------------------------------------------------- |
|
|
| `VEC0_QUERY_PLAN_FULLSCAN` | `'1'` | Perform a full-scan on all rows |
|
|
| `VEC0_QUERY_PLAN_POINT` | `'2'` | Perform a single-lookup point query for the provided rowid |
|
|
| `VEC0_QUERY_PLAN_KNN` | `'3'` | Perform a KNN-style query on the provided query vector and parameters. |
|
|
|
|
Each 4-character "block" is associated with a corresponding value in `argv[]`.
|
|
For example, the 1st block at byte offset `1-4` (inclusive) is the 1st block and
|
|
is associated with `argv[1]`. The 2nd block at byte offset `5-8` (inclusive) is
|
|
associated with `argv[2]` and so on. Each block describes what kind of value or
|
|
filter the given `argv[i]` value is.
|
|
|
|
#### `VEC0_IDXSTR_KIND_KNN_MATCH` (`'{'`)
|
|
|
|
`argv[i]` is the query vector of the KNN query.
|
|
|
|
The remaining 3 characters of the block are `_` fillers.
|
|
|
|
#### `VEC0_IDXSTR_KIND_KNN_K` (`'}'`)
|
|
|
|
`argv[i]` is the limit/k value of the KNN query.
|
|
|
|
The remaining 3 characters of the block are `_` fillers.
|
|
|
|
#### `VEC0_IDXSTR_KIND_KNN_ROWID_IN` (`'['`)
|
|
|
|
`argv[i]` is the optional `rowid in (...)` value, and must be handled with
|
|
[`sqlite3_vtab_in_first()` / `sqlite3_vtab_in_next()`](https://www.sqlite.org/c3ref/vtab_in_first.html).
|
|
|
|
The remaining 3 characters of the block are `_` fillers.
|
|
|
|
#### `VEC0_IDXSTR_KIND_KNN_PARTITON_CONSTRAINT` (`']'`)
|
|
|
|
`argv[i]` is a "constraint" on a specific partition key.
|
|
|
|
The second character of the block denotes which partition key to filter on,
|
|
using `A` to denote the first partition key column, `B` for the second, etc. It
|
|
is encoded with `'A' + partition_idx` and can be decoded with `c - 'A'`.
|
|
|
|
The third character of the block denotes which operator is used in the
|
|
constraint. It will be one of the values of `enum vec0_partition_operator`, as
|
|
only a subset of operations are supported on partition keys.
|
|
|
|
The fourth character of the block is a `_` filler.
|
|
|
|
#### `VEC0_IDXSTR_KIND_POINT_ID` (`'!'`)
|
|
|
|
`argv[i]` is the value of the rowid or id to match against for the point query.
|
|
|
|
The remaining 3 characters of the block are `_` fillers.
|
|
|
|
#### `VEC0_IDXSTR_KIND_METADATA_CONSTRAINT` (`'&'`)
|
|
|
|
`argv[i]` is the value of the `WHERE` constraint for a metdata column in a KNN
|
|
query.
|
|
|
|
The second character of the block denotes which metadata column the constraint
|
|
belongs to, using `A` to denote the first metadata column column, `B` for the
|
|
second, etc. It is encoded with `'A' + metadata_idx` and can be decoded with
|
|
`c - 'A'`.
|
|
|
|
The third character of the block is the constraint operator. It will be one of
|
|
`enum vec0_metadata_operator`, as only a subset of operators are supported on
|
|
metadata column KNN filters.
|
|
|
|
The foruth character of the block is a `_` filler.
|