mirror of
https://github.com/asg017/sqlite-vec.git
synced 2026-04-26 01:06:27 +02:00
add broken shadow table test
parent
a657b3a216
commit
e99e31feb7
5 changed files with 226 additions and 36 deletions
|
|
@ -1,3 +1,13 @@
|
||||||
|
# `sqlite-vec` Architecture
|
||||||
|
|
||||||
|
Internal documentation for how `sqlite-vec` works under the hood. Not meant for
|
||||||
|
users of the `sqlite-vec` project; consult
|
||||||
|
[the official `sqlite-vec` documentation](https://alexgarcia.xyz/sqlite-vec) for
|
||||||
|
how-to guides. Rather, this is for people interested in how `sqlite-vec` works
|
||||||
|
and some guidelines for any future contributors.
|
||||||
|
|
||||||
|
Very much a WIP.
|
||||||
|
|
||||||
## `vec0`
|
## `vec0`
|
||||||
|
|
||||||
### Shadow Tables
|
### Shadow Tables
|
||||||
|
|
@ -9,7 +19,6 @@
|
||||||
- `validity BLOB`
|
- `validity BLOB`
|
||||||
- `rowids BLOB`
|
- `rowids BLOB`
|
||||||
|
|
||||||
|
|
||||||
#### `xyz_rowids`
|
#### `xyz_rowids`
|
||||||
|
|
||||||
- `rowid INTEGER`
|
- `rowid INTEGER`
|
||||||
|
|
@ -32,7 +41,6 @@
|
||||||
- `rowid INTEGER`
|
- `rowid INTEGER`
|
||||||
- `data BLOB`
|
- `data BLOB`
|
||||||
|
|
||||||
|
|
||||||
#### `xyz_metadata_text_data_00`
|
#### `xyz_metadata_text_data_00`
|
||||||
|
|
||||||
- `rowid INTEGER`
|
- `rowid INTEGER`
|
||||||
|
|
@ -52,8 +60,11 @@ The "header" character denotes the type of query plan, as determined by the
|
||||||
| `VEC0_QUERY_PLAN_POINT` | `'2'` | Perform a single-lookup point query for the provided rowid |
|
| `VEC0_QUERY_PLAN_POINT` | `'2'` | Perform a single-lookup point query for the provided rowid |
|
||||||
| `VEC0_QUERY_PLAN_KNN` | `'3'` | Perform a KNN-style query on the provided query vector and parameters. |
|
| `VEC0_QUERY_PLAN_KNN` | `'3'` | Perform a KNN-style query on the provided query vector and parameters. |
|
||||||
|
|
||||||
Each 4-character "block" is associated with a corresponding value in `argv[]`. For example, the 1st block, at byte offsets `1-4` (inclusive), is associated with `argv[1]`. The 2nd block, at byte offsets `5-8` (inclusive), is associated with `argv[2]` and so on. Each block describes what kind of value or filter the given `argv[i]` value is.
|
Each 4-character "block" is associated with a corresponding value in `argv[]`.
|
||||||
|
For example, the 1st block, at byte offsets `1-4` (inclusive), is associated
|
||||||
|
with `argv[1]`. The 2nd block, at byte offsets `5-8` (inclusive), is
|
||||||
|
associated with `argv[2]` and so on. Each block describes what kind of value or
|
||||||
|
filter the given `argv[i]` value is.
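The layout described above (one header character followed by 4-character blocks) can be sketched in Python as follows. This is only an illustration, not the extension's C code: `split_idxstr` is a hypothetical helper, and the sample string is made up from the header and kind characters listed in this document. Blocks are numbered from 1 to match the wording above; the sketch makes no claim about which `argv[]` slot each block maps to.

```python
def split_idxstr(idxstr: str) -> tuple[str, list[str]]:
    """Split an idxStr-style string into its header character and 4-character blocks.

    Hypothetical helper for illustration; not part of sqlite-vec.
    """
    if not idxstr:
        raise ValueError("idxStr must contain at least the header character")
    header, body = idxstr[0], idxstr[1:]
    if len(body) % 4 != 0:
        raise ValueError("expected a whole number of 4-character blocks")
    # Block n (counting from 1) occupies byte offsets 4n-3 .. 4n of the full string.
    blocks = [body[i : i + 4] for i in range(0, len(body), 4)]
    return header, blocks


# Made-up input: a KNN header ('3') followed by a KNN_MATCH block and a ROWID_IN block.
header, blocks = split_idxstr("3{___[___")
```

Here `header` is the query plan character and `blocks[0][0]`, `blocks[1][0]` are the kind characters of the first and second block.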
|
||||||
|
|
||||||
#### `VEC0_IDXSTR_KIND_KNN_MATCH` (`'{'`)
|
#### `VEC0_IDXSTR_KIND_KNN_MATCH` (`'{'`)
|
||||||
|
|
||||||
|
|
@ -69,7 +80,8 @@ The remaining 3 characters of the block are `_` fillers.
|
||||||
|
|
||||||
#### `VEC0_IDXSTR_KIND_KNN_ROWID_IN` (`'['`)
|
#### `VEC0_IDXSTR_KIND_KNN_ROWID_IN` (`'['`)
|
||||||
|
|
||||||
`argv[i]` is the optional `rowid in (...)` value, and must be handled with [`sqlite3_vtab_in_first()` /
|
`argv[i]` is the optional `rowid in (...)` value, and must be handled with
|
||||||
|
[`sqlite3_vtab_in_first()` /
|
||||||
`sqlite3_vtab_in_next()`](https://www.sqlite.org/c3ref/vtab_in_first.html).
|
`sqlite3_vtab_in_next()`](https://www.sqlite.org/c3ref/vtab_in_first.html).
|
||||||
|
|
||||||
The remaining 3 characters of the block are `_` fillers.
|
The remaining 3 characters of the block are `_` fillers.
|
||||||
|
|
@ -78,13 +90,16 @@ The remaining 3 characters of the block are `_` fillers.
|
||||||
|
|
||||||
`argv[i]` is a "constraint" on a specific partition key.
|
`argv[i]` is a "constraint" on a specific partition key.
|
||||||
|
|
||||||
The second character of the block denotes which partition key to filter on, using `A` to denote the first partition key column, `B` for the second, etc. It is encoded with `'A' + partition_idx` and can be decoded with `c - 'A'`.
|
The second character of the block denotes which partition key to filter on,
|
||||||
|
using `A` to denote the first partition key column, `B` for the second, etc. It
|
||||||
|
is encoded with `'A' + partition_idx` and can be decoded with `c - 'A'`.
|
||||||
|
|
||||||
The third character of the block denotes which operator is used in the constraint. It will be one of the values of `enum vec0_partition_operator`, as only a subset of operations are supported on partition keys.
|
The third character of the block denotes which operator is used in the
|
||||||
|
constraint. It will be one of the values of `enum vec0_partition_operator`, as
|
||||||
|
only a subset of operations are supported on partition keys.
|
||||||
|
|
||||||
The fourth character of the block is a `_` filler.
|
The fourth character of the block is a `_` filler.
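The `'A' + partition_idx` encoding and its `c - 'A'` inverse can be written out as a small Python sketch. The function names are hypothetical, and no upper bound on the index is enforced because the text does not state one.

```python
def encode_column_index(idx: int) -> str:
    """Encode a 0-based column index as a single character: 0 -> 'A', 1 -> 'B', ..."""
    if idx < 0:
        raise ValueError("column index must be non-negative")
    return chr(ord("A") + idx)


def decode_column_index(c: str) -> int:
    """Inverse of encode_column_index: 'A' -> 0, 'B' -> 1, ..."""
    return ord(c) - ord("A")
```

For example, `encode_column_index(1)` gives `'B'`, and decoding that character gives back `1`.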
|
||||||
|
|
||||||
|
|
||||||
#### `VEC0_IDXSTR_KIND_POINT_ID` (`'!'`)
|
#### `VEC0_IDXSTR_KIND_POINT_ID` (`'!'`)
|
||||||
|
|
||||||
`argv[i]` is the value of the rowid or id to match against for the point query.
|
`argv[i]` is the value of the rowid or id to match against for the point query.
|
||||||
|
|
@ -93,11 +108,16 @@ The remaining 3 characters of the block are `_` fillers.
|
||||||
|
|
||||||
#### `VEC0_IDXSTR_KIND_METADATA_CONSTRAINT` (`'&'`)
|
#### `VEC0_IDXSTR_KIND_METADATA_CONSTRAINT` (`'&'`)
|
||||||
|
|
||||||
`argv[i]` is the value of the `WHERE` constraint for a metadata column in a KNN query.
|
`argv[i]` is the value of the `WHERE` constraint for a metadata column in a KNN
|
||||||
|
query.
|
||||||
|
|
||||||
The second character of the block denotes which metadata column the constraint belongs to, using `A` to denote the first metadata column, `B` for the second, etc. It is encoded with `'A' + metadata_idx` and can be decoded with `c - 'A'`.
|
The second character of the block denotes which metadata column the constraint
|
||||||
|
belongs to, using `A` to denote the first metadata column, `B` for the
|
||||||
|
second, etc. It is encoded with `'A' + metadata_idx` and can be decoded with
|
||||||
|
`c - 'A'`.
|
||||||
|
|
||||||
The third character of the block is the constraint operator. It will be one of `enum vec0_metadata_operator`, as only a subset of operators are supported on metadata column KNN filters.
|
The third character of the block is the constraint operator. It will be one of
|
||||||
|
`enum vec0_metadata_operator`, as only a subset of operators are supported on
|
||||||
|
metadata column KNN filters.
|
||||||
|
|
||||||
The fourth character of the block is a `_` filler.
|
The fourth character of the block is a `_` filler.
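Putting the four positions of a metadata constraint block together, a decoder might look like the Python sketch below. It is an illustration under assumptions: `parse_metadata_block` is a hypothetical name, and the operator character `x` in the example is a placeholder, since the actual values of `enum vec0_metadata_operator` are not listed here.

```python
METADATA_KIND = "&"  # VEC0_IDXSTR_KIND_METADATA_CONSTRAINT, per the heading above


def parse_metadata_block(block: str) -> tuple[int, str]:
    """Return (metadata_idx, operator_char) for a 4-character metadata block.

    Hypothetical helper; the operator is returned as a raw character because
    the enum values are not shown in this document.
    """
    if len(block) != 4:
        raise ValueError("a block is exactly 4 characters")
    if block[0] != METADATA_KIND:
        raise ValueError("not a metadata constraint block")
    if block[3] != "_":
        raise ValueError("fourth character must be the '_' filler")
    metadata_idx = ord(block[1]) - ord("A")  # decode with c - 'A'
    return metadata_idx, block[2]


# 'x' is a placeholder operator character, not a real enum value.
idx, op = parse_metadata_block("&Bx_")
```

In this made-up block, `B` decodes to the second metadata column (index 1).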
|
||||||
|
|
||||||
|
|
|
||||||
38
TODO
|
|
@ -1,25 +1,17 @@
|
||||||
# partition
|
|
||||||
|
|
||||||
- [ ] UPDATE on partition key values
|
|
||||||
- remove previous row from chunk, insert into new one?
|
|
||||||
- [ ] properly sqlite3_vtab_nochange / sqlite3_value_nochange handling
|
|
||||||
|
|
||||||
|
|
||||||
# auxiliary columns
|
|
||||||
|
|
||||||
- later:
|
|
||||||
- NOT NULL?
|
|
||||||
- perf: INSERT stmt should be cached on vec0_vtab
|
|
||||||
- perf: LEFT JOIN aux table to rowids query in vec0_cursor for rowid/point stmts, to avoid N lookup queries
|
|
||||||
|
|
||||||
# metadata filtering
|
|
||||||
- `v in (...)` handling
|
|
||||||
- [ ] test accessing aux values when rowid is different than 1,2,3 etc.
|
|
||||||
- [ ] add `xyz_info` shadow table with version etc.
|
- [ ] add `xyz_info` shadow table with version etc.
|
||||||
|
|
||||||
- later
|
- later
|
||||||
- null!
|
- [ ] partition: UPDATE support
|
||||||
- date/datetime
|
- [ ] skip invalid validity entries in knn filter?
|
||||||
- remaining TODO items
|
- [ ] nulls in metadata
|
||||||
- skip invalid validity entries in knn filter?
|
- [ ] partition `x in (...)` handling
|
||||||
- dictionary encoding?
|
- [ ] blobs/date/datetime
|
||||||
- partition `x in (...)` handling
|
- [ ] uuid/ulid perf
|
||||||
|
- [ ] Aux columns: `NOT NULL` constraint
|
||||||
|
- [ ] Metadata columns: `NOT NULL` constraint
|
||||||
|
- [ ] Partition key: `NOT NULL` constraint
|
||||||
|
- [ ] dictionary encoding?
|
||||||
|
- [ ] properly sqlite3_vtab_nochange / sqlite3_value_nochange handling
|
||||||
|
- [ ] perf
|
||||||
|
- [ ] aux: cache INSERT
|
||||||
|
- [ ] aux: LEFT JOIN on `_rowids` queries to avoid N lookup queries
|
||||||
|
|
|
||||||
|
|
@ -8759,7 +8759,7 @@ static int vec0Update(sqlite3_vtab *pVTab, int argc, sqlite3_value **argv,
|
||||||
}
|
}
|
||||||
|
|
||||||
static int vec0ShadowName(const char *zName) {
|
static int vec0ShadowName(const char *zName) {
|
||||||
static const char *azName[] = {"rowids", "chunks", "auxiliary", "vector_chunks", "metadata_chunks"};
|
static const char *azName[] = {"rowids", "chunks", "auxiliary"};
|
||||||
|
|
||||||
for (size_t i = 0; i < sizeof(azName) / sizeof(azName[0]); i++) {
|
for (size_t i = 0; i < sizeof(azName) / sizeof(azName[0]); i++) {
|
||||||
if (sqlite3_stricmp(zName, azName[i]) == 0)
|
if (sqlite3_stricmp(zName, azName[i]) == 0)
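Only the start of the loop is visible in this hunk, so the following Python sketch fills in the rest under an assumption: a successful comparison means the name is reported as a shadow table. It mirrors the exact, case-insensitive comparison done by `sqlite3_stricmp()`. The names `matches_shadow_suffix`, `CURRENT_SUFFIXES`, and `PREVIOUS_SUFFIXES` are hypothetical, and the input `vector_chunks00` assumes SQLite passes the table name with the `v_` prefix stripped.

```python
CURRENT_SUFFIXES = ("rowids", "chunks", "auxiliary")
PREVIOUS_SUFFIXES = CURRENT_SUFFIXES + ("vector_chunks", "metadata_chunks")


def matches_shadow_suffix(name: str, suffixes=CURRENT_SUFFIXES) -> bool:
    """Exact, case-insensitive match against a list of known suffixes.

    sqlite3_stricmp() ignores ASCII case only; lower() behaves the same
    for the ASCII names used here.
    """
    return any(name.lower() == suffix for suffix in suffixes)


# An exact comparison never matches a numbered name, with either list.
numbered_old = matches_shadow_suffix("vector_chunks00", PREVIOUS_SUFFIXES)
numbered_new = matches_shadow_suffix("vector_chunks00", CURRENT_SUFFIXES)
```

Under these assumptions both `numbered_old` and `numbered_new` are `False`, which is consistent with the snapshot in this commit listing only `v_auxiliary`, `v_rowids`, and `v_chunks` as shadow tables.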
|
||||||
|
|
|
||||||
123
tests/__snapshots__/test-general.ambr
Normal file
|
|
@ -0,0 +1,123 @@
# serializer version: 1
# name: test_shadow
  OrderedDict({
    'sql': 'select * from sqlite_master order by name',
    'rows': list([
      OrderedDict({
        'type': 'index',
        'name': 'sqlite_autoindex_v_metadata_chunks00_1',
        'tbl_name': 'v_metadata_chunks00',
        'rootpage': 8,
        'sql': None,
      }),
      OrderedDict({
        'type': 'index',
        'name': 'sqlite_autoindex_v_metadata_text_data_00_1',
        'tbl_name': 'v_metadata_text_data_00',
        'rootpage': 10,
        'sql': None,
      }),
      OrderedDict({
        'type': 'index',
        'name': 'sqlite_autoindex_v_vector_chunks00_1',
        'tbl_name': 'v_vector_chunks00',
        'rootpage': 6,
        'sql': None,
      }),
      OrderedDict({
        'type': 'table',
        'name': 'sqlite_sequence',
        'tbl_name': 'sqlite_sequence',
        'rootpage': 3,
        'sql': 'CREATE TABLE sqlite_sequence(name,seq)',
      }),
      OrderedDict({
        'type': 'table',
        'name': 'v',
        'tbl_name': 'v',
        'rootpage': 0,
        'sql': 'CREATE VIRTUAL TABLE v using vec0(a float[1], partition text partition key, metadata text, +name text, chunk_size=8)',
      }),
      OrderedDict({
        'type': 'table',
        'name': 'v_auxiliary',
        'tbl_name': 'v_auxiliary',
        'rootpage': 11,
        'sql': 'CREATE TABLE "v_auxiliary"( rowid integer PRIMARY KEY , value00)',
      }),
      OrderedDict({
        'type': 'table',
        'name': 'v_chunks',
        'tbl_name': 'v_chunks',
        'rootpage': 2,
        'sql': 'CREATE TABLE "v_chunks"(chunk_id INTEGER PRIMARY KEY AUTOINCREMENT,size INTEGER NOT NULL,sequence_id integer,partition00,validity BLOB NOT NULL, rowids BLOB NOT NULL)',
      }),
      OrderedDict({
        'type': 'table',
        'name': 'v_metadata_chunks00',
        'tbl_name': 'v_metadata_chunks00',
        'rootpage': 7,
        'sql': 'CREATE TABLE "v_metadata_chunks00"(rowid PRIMARY KEY, data BLOB NOT NULL)',
      }),
      OrderedDict({
        'type': 'table',
        'name': 'v_metadata_text_data_00',
        'tbl_name': 'v_metadata_text_data_00',
        'rootpage': 9,
        'sql': 'CREATE TABLE "v_metadata_text_data_00"(rowid PRIMARY KEY, data TEXT)',
      }),
      OrderedDict({
        'type': 'table',
        'name': 'v_rowids',
        'tbl_name': 'v_rowids',
        'rootpage': 4,
        'sql': 'CREATE TABLE "v_rowids"(rowid INTEGER PRIMARY KEY AUTOINCREMENT,id,chunk_id INTEGER,chunk_offset INTEGER)',
      }),
      OrderedDict({
        'type': 'table',
        'name': 'v_vector_chunks00',
        'tbl_name': 'v_vector_chunks00',
        'rootpage': 5,
        'sql': 'CREATE TABLE "v_vector_chunks00"(rowid PRIMARY KEY,vectors BLOB NOT NULL)',
      }),
    ]),
  })
# ---
# name: test_shadow.1
  OrderedDict({
    'sql': "select * from pragma_table_list where type = 'shadow'",
    'rows': list([
      OrderedDict({
        'schema': 'main',
        'name': 'v_auxiliary',
        'type': 'shadow',
        'ncol': 2,
        'wr': 0,
        'strict': 0,
      }),
      OrderedDict({
        'schema': 'main',
        'name': 'v_rowids',
        'type': 'shadow',
        'ncol': 4,
        'wr': 0,
        'strict': 0,
      }),
      OrderedDict({
        'schema': 'main',
        'name': 'v_chunks',
        'type': 'shadow',
        'ncol': 6,
        'wr': 0,
        'strict': 0,
      }),
    ]),
  })
# ---
# name: test_shadow.2
  OrderedDict({
    'sql': "select * from pragma_table_list where type = 'shadow'",
    'rows': list([
    ]),
  })
# ---
55
tests/test-general.py
Normal file
|
|
@ -0,0 +1,55 @@
import sqlite3
from collections import OrderedDict
import pytest


@pytest.mark.skipif(
    sqlite3.sqlite_version_info[1] < 37,
    reason="pragma_table_list was added in SQLite 3.37",
)
def test_shadow(db, snapshot):
    db.execute(
        "create virtual table v using vec0(a float[1], partition text partition key, metadata text, +name text, chunk_size=8)"
    )
    assert exec(db, "select * from sqlite_master order by name") == snapshot()
    assert (
        exec(db, "select * from pragma_table_list where type = 'shadow'") == snapshot()
    )

    db.execute("drop table v;")
    assert (
        exec(db, "select * from pragma_table_list where type = 'shadow'") == snapshot()
    )


def exec(db, sql, parameters=[]):
    try:
        rows = db.execute(sql, parameters).fetchall()
    except (sqlite3.OperationalError, sqlite3.DatabaseError) as e:
        return {
            "error": e.__class__.__name__,
            "message": str(e),
        }
    a = []
    for row in rows:
        o = OrderedDict()
        for k in row.keys():
            o[k] = row[k]
        a.append(o)
    result = OrderedDict()
    result["sql"] = sql
    result["rows"] = a
    return result


def vec0_shadow_table_contents(db, v):
    shadow_tables = [
        row[0]
        for row in db.execute(
            "select name from sqlite_master where name like ? order by 1", [f"{v}_%"]
        ).fetchall()
    ]
    o = {}
    for shadow_table in shadow_tables:
        o[shadow_table] = exec(db, f"select * from {shadow_table}")
    return o
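The `db` fixture used by `test_shadow` is not part of this file. Because the `exec` helper calls `row.keys()`, the connection presumably uses `row_factory = sqlite3.Row`. The sketch below assumes exactly that and queries a plain table instead of `vec0`, so it runs without loading the extension. `rows_as_dicts` is a hypothetical, trimmed-down stand-in for the helper's success path.

```python
import sqlite3
from collections import OrderedDict


def rows_as_dicts(db, sql, parameters=()):
    """Fetch all rows and convert each sqlite3.Row into an OrderedDict."""
    rows = db.execute(sql, parameters).fetchall()
    return [OrderedDict((key, row[key]) for key in row.keys()) for row in rows]


db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # assumption: the real fixture does the same
db.execute("create table t(a, b)")
db.execute("insert into t values (?, ?)", (1, "x"))
result = rows_as_dicts(db, "select * from t")
```

Without the `row_factory` line, rows come back as plain tuples, which have no `keys()` method.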