Metadata filtering (#124)

* initial pass at PARTITION KEY support.
* Initial pass, allow auxiliary columns on vec0 virtual tables
* update TODO
* Initial pass at metadata filtering
* unit tests
* gha this PR branch
* fixup tests
* doc internal
* fix tests, KNN/rowids in
* define SQLITE_INDEX_CONSTRAINT_OFFSET
* whoops
* update tests, syrupy, use uv
* un ignore pyproject.toml
* dot
* tests/
* type error?
* win: .exe, update error name
* try fix macos python, paren around expr?
* win bash?
* dbg :(
* explicit error
* op
* dbg win
* win ./tests/.venv/Scripts/python.exe
* block UPDATEs on partition key values for now
* test this branch
* accidentally removed "partition key type mismatch" block during merge
* typo ugh
* bruv
* start aux snapshots
* drop aux shadow table on destroy
* enforce column types
* block WHERE constraints on auxiliary columns in KNN queries
* support delete
* support UPDATE on auxiliary columns
* test this PR
* dont inline that
* test-metadata.py
* memzero text buffer
* stress test
* more snapshot tests
* rm double/int32, just float/int64
* finish type checking
* long text support
* DELETE support
* UPDATE support
* fix snapshot names
* drop not-used in eqp
* small fixes
* boolean comparison handling
* ensure error is raised when long string constraint
* new version string for beta builds
* typo whoops
* ann-filtering-benchmark directory
* test-case
* updates
* fix aux column error when using non-default rowid values, needs test
* refactor some text knn filtering
* rowids blob read only on text metadata filters
* refactor
* add failing test causes for non eq text knn
* text knn NE
* test cases diff
* GT
* text knn GT/GE fixes
* text knn LT/LE
* clean
* vtab_in handling
* unblock aux failures for now
* guard sqlite3_vtab_in
* else in guard?
* fixes and tests
* add broken shadow table test
* rename _metadata_chunksNN shadow table to _metadatachunksNN, for proper shadowName detection
* _metadata_text_NN shadow tables to _metadatatextNN
* SQLITE_VEC_VERSION_MAJOR SQLITE_VEC_VERSION_MINOR and SQLITE_VEC_VERSION_PATCH in sqlite-vec.h
* _info shadow table
* forgot to update aux snapshot?
* fix aux tests
2026-06-26 15:49:42 +02:00 · 2024-11-20 00:59:34 -08:00 · 352f953fc0
commit 352f953fc0
parent 9bfeaa7842
21 changed files with 7361 additions and 105 deletions
--- a/.github/workflows/test.yaml
+++ b/.github/workflows/test.yaml
@@ -5,6 +5,7 @@ on:
      - main
      - partition-by
      - auxiliary
+      - metadata-filtering
 permissions:
  contents: read
 jobs:
--- a/.gitignore
+++ b/.gitignore
@@ -26,3 +26,8 @@ sqlite-vec.h
 tmp/

 poetry.lock
+
+*.jsonl
+
+memstat.c
+memstat.*
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -1,5 +1,51 @@
+# `sqlite-vec` Architecture
+
+Internal documentation for how `sqlite-vec` works under the hood. It is not
+meant for users of the `sqlite-vec` project; consult
+[the official `sqlite-vec` documentation](https://alexgarcia.xyz/sqlite-vec) for
+how-to guides. Rather, this is for people interested in how `sqlite-vec` works,
+plus some guidelines for future contributors.
+
+Very much a WIP.
+
 ## `vec0`

+### Shadow Tables
+
+#### `xyz_chunks`
+
+- `chunk_id INTEGER`
+- `size INTEGER`
+- `validity BLOB`
+- `rowids BLOB`
+
+#### `xyz_rowids`
+
+- `rowid INTEGER`
+- `id`
+- `chunk_id INTEGER`
+- `chunk_offset INTEGER`
+
+#### `xyz_vector_chunksNN`
+
+- `rowid INTEGER`
+- `vector BLOB`
+
+#### `xyz_auxiliary`
+
+- `rowid INTEGER`
+- `valueNN [type]`
+
+#### `xyz_metadatachunksNN`
+
+- `rowid INTEGER`
+- `data BLOB`
+
+#### `xyz_metadatatextNN`
+
+- `rowid INTEGER`
+- `data TEXT`
+
 ### idxStr

 The `vec0` idxStr is a string composed of single "header" character and 0 or
@@ -14,8 +60,11 @@ The "header" charcter denotes the type of query plan, as determined by the
 | `VEC0_QUERY_PLAN_POINT`    | `'2'` | Perform a single-lookup point query for the provided rowid             |
 | `VEC0_QUERY_PLAN_KNN`      | `'3'` | Perform a KNN-style query on the provided query vector and parameters. |

-Each 4-character "block" is associated with a corresponding value in `argv[]`. For example, the 1st block at byte offset `1-4` (inclusive) is the 1st block and is associated with `argv[1]`. The 2nd block at byte offset `5-8` (inclusive) is associated with `argv[2]` and so on. Each block describes what kind of value or filter the given `argv[i]` value is.
-
+Each 4-character "block" is associated with a corresponding value in `argv[]`.
+For example, the 1st block at byte offset `1-4` (inclusive) is associated with
+`argv[1]`. The 2nd block at byte offset `5-8` (inclusive) is associated with
+`argv[2]`, and so on. Each block describes what kind of value or filter the
+given `argv[i]` value is.

 #### `VEC0_IDXSTR_KIND_KNN_MATCH` (`'{'`)

@@ -31,8 +80,8 @@ The remaining 3 characters of the block are `_` fillers.

 #### `VEC0_IDXSTR_KIND_KNN_ROWID_IN` (`'['`)

-`argv[i]` is the optional `rowid in (...)` value, and must be handled with [`sqlite3_vtab_in_first()` /
-`sqlite3_vtab_in_next()`](https://www.sqlite.org/c3ref/vtab_in_first.html).
+`argv[i]` is the optional `rowid in (...)` value, and must be handled with
+[`sqlite3_vtab_in_first()` / `sqlite3_vtab_in_next()`](https://www.sqlite.org/c3ref/vtab_in_first.html).

 The remaining 3 characters of the block are `_` fillers.

@@ -40,15 +89,34 @@ The remaining 3 characters of the block are `_` fillers.

 `argv[i]` is a "constraint" on a specific partition key.

-The second character of the block denotes which partition key to filter on, using `A` to denote the first partition key column, `B` for the second, etc. It is encoded with `'A' + partition_idx` and can be decoded with `c - 'A'`.
+The second character of the block denotes which partition key to filter on,
+using `A` to denote the first partition key column, `B` for the second, etc. It
+is encoded with `'A' + partition_idx` and can be decoded with `c - 'A'`.

-The third character of the block denotes which operator is used in the constraint. It will be one of the values of `enum vec0_partition_operator`, as only a subset of operations are supported on partition keys.
+The third character of the block denotes which operator is used in the
+constraint. It will be one of the values of `enum vec0_partition_operator`, as
+only a subset of operations are supported on partition keys.

 The fourth character of the block is a `_` filler.

-
 #### `VEC0_IDXSTR_KIND_POINT_ID` (`'!'`)

 `argv[i]` is the value of the rowid or id to match against for the point query.

 The remaining 3 characters of the block are `_` fillers.
+
+#### `VEC0_IDXSTR_KIND_METADATA_CONSTRAINT` (`'&'`)
+
+`argv[i]` is the value of the `WHERE` constraint for a metadata column in a KNN
+query.
+
+The second character of the block denotes which metadata column the constraint
+belongs to, using `A` to denote the first metadata column, `B` for the
+second, etc. It is encoded with `'A' + metadata_idx` and can be decoded with
+`c - 'A'`.
+
+The third character of the block is the constraint operator. It will be one of
+`enum vec0_metadata_operator`, as only a subset of operators are supported on
+metadata column KNN filters.
+
+The fourth character of the block is a `_` filler.
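
As a rough illustration of the block layout described in the idxStr section, the Python sketch below splits an idxStr into its header character and its 4-character blocks, then decodes the column index of a metadata-constraint block. This is not the extension's C code. `parse_idxstr`, the `Block` tuple, and the operator character `'b'` in the example string are hypothetical; only the header values, the `'{'` and `'&'` kind characters, the `'A' + idx` encoding, and the `_` filler are taken from the text above.

```python
from typing import List, NamedTuple, Optional, Tuple

BLOCK_SIZE = 4                   # every block is 4 characters wide
KIND_KNN_MATCH = "{"             # VEC0_IDXSTR_KIND_KNN_MATCH
KIND_METADATA_CONSTRAINT = "&"   # VEC0_IDXSTR_KIND_METADATA_CONSTRAINT


class Block(NamedTuple):
    kind: str                    # first character of the block
    column_idx: Optional[int]    # decoded with c - 'A', when the kind carries a column
    operator: Optional[str]      # raw operator character, left undecoded here


def parse_idxstr(idxstr: str) -> Tuple[str, List[Block]]:
    """Split an idxStr into (header, blocks). Illustrative only."""
    if not idxstr:
        raise ValueError("idxStr must contain at least the header character")
    header, body = idxstr[0], idxstr[1:]
    if len(body) % BLOCK_SIZE != 0:
        raise ValueError("idxStr body must be a multiple of 4 characters")

    blocks = []
    for start in range(0, len(body), BLOCK_SIZE):
        raw = body[start:start + BLOCK_SIZE]
        kind = raw[0]
        if kind == KIND_METADATA_CONSTRAINT:
            # second character: 'A' + metadata_idx, third: operator, fourth: '_' filler
            blocks.append(Block(kind, ord(raw[1]) - ord("A"), raw[2]))
        else:
            # other kinds are kept opaque in this sketch
            blocks.append(Block(kind, None, None))
    return header, blocks


# '3' is the KNN query plan header; 'b' is a made-up operator character.
header, blocks = parse_idxstr("3{___&Bb_")
```

Here the second block decodes to metadata column index 1 (`'B' - 'A'`). The operator character is left as-is because the values of `enum vec0_metadata_operator` are not listed in this document.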
--- a/3
+++ b/3
@@ -153,6 +153,9 @@ sqlite-vec.h: sqlite-vec.h.tmpl VERSION
 	VERSION=$(shell cat VERSION) \
 	DATE=$(shell date -r VERSION +'%FT%TZ%z') \
 	SOURCE=$(shell git log -n 1 --pretty=format:%H -- VERSION) \
+	VERSION_MAJOR=$$(echo $$VERSION | cut -d. -f1) \
+	VERSION_MINOR=$$(echo $$VERSION | cut -d. -f2) \
+	VERSION_PATCH=$$(echo $$VERSION | cut -d. -f3 | cut -d- -f1) \
 	envsubst < $< > $@

 clean:
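
The three `cut` pipelines added in the hunk above split the contents of `VERSION` on `.` and drop anything after a `-` in the third field. A minimal Python sketch of the same splitting follows, assuming a `MAJOR.MINOR.PATCH[-suffix]` version string. `split_version` and the sample version strings are hypothetical and not part of the build.

```python
from typing import Tuple


def split_version(version: str) -> Tuple[int, int, int]:
    """Mirror `cut -d. -f1`, `cut -d. -f2`, and `cut -d. -f3 | cut -d- -f1`."""
    fields = version.split(".")
    major = fields[0]
    minor = fields[1]
    # The third dot-separated field may carry a pre-release suffix such as "6-alpha".
    patch = fields[2].split("-")[0]
    return int(major), int(minor), int(patch)


plain = split_version("1.2.3")
prerelease = split_version("0.1.6-alpha.2")
```

With a suffix like `-alpha.2`, the third field is `6-alpha`, so the second `cut` keeps only `6`.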
--- a/28
+++ b/28
@@ -1,13 +1,17 @@
-# partition
+- [ ] add `xyz_info` shadow table with version etc.

- [ ] UPDATE on partition key values
-  - remove previous row from chunk, insert into new one?
- [ ] properly sqlite3_vtab_nochange / sqlite3_value_nochange handling
-
-# auxiliary columns
-
- later:
-  - NOT NULL?
-  - perf: INSERT stmt should be cached on vec0_vtab
-  - perf: LEFT JOIN aux table to rowids query in vec0_cursor for rowid/point
-    stmts, to avoid N lookup queries
+- later
+  - [ ] partition: UPDATE support
+  - [ ] skip invalid validity entries in knn filter?
+  - [ ] nulls in metadata
+  - [ ] partition `x in (...)` handling
+  - [ ] blobs/date/datetime
+  - [ ] uuid/ulid perf
+  - [ ] Aux columns: `NOT NULL` constraint
+  - [ ] Metadata columns: `NOT NULL` constraint
+  - [ ] Partition key: `NOT NULL` constraint
+  - [ ] dictionary encoding?
+  - [ ] properly sqlite3_vtab_nochange / sqlite3_value_nochange handling
+  - [ ] perf
+    - [ ] aux: cache INSERT
+    - [ ] aux: LEFT JOIN on `_rowids` queries to avoid N lookup queries
--- a/sqlite-vec.c
+++ b/sqlite-vec.c
--- a/sqlite-vec.h.tmpl
+++ b/sqlite-vec.h.tmpl
@@ -18,9 +18,16 @@
 #endif

 #define SQLITE_VEC_VERSION "v${VERSION}"
+// TODO rm
+#define SQLITE_VEC_VERSION "v-metadata-experiment.01"
 #define SQLITE_VEC_DATE "${DATE}"
 #define SQLITE_VEC_SOURCE "${SOURCE}"

+
+#define SQLITE_VEC_VERSION_MAJOR ${VERSION_MAJOR}
+#define SQLITE_VEC_VERSION_MINOR ${VERSION_MINOR}
+#define SQLITE_VEC_VERSION_PATCH ${VERSION_PATCH}
+
 #ifdef __cplusplus
 extern "C" {
 #endif
--- a/test.sql
+++ b/test.sql
@@ -1,10 +1,333 @@

-.load dist/vec0
-.echo on
+.load dist/vec0main
 .bail on

 .mode qbox

+
+.load ./memstat
+.echo on
+
+select name, value from sqlite_memstat where name = 'MEMORY_USED';
+
+create virtual table v using vec0(
+  vector float[1],
+  name1 text,
+  name2 text,
+  age int,
+  chunk_size=8
+);
+
+select name, value from sqlite_memstat where name = 'MEMORY_USED';
+
+insert into v(vector, name1, name2, age) values
+  ('[1]', 'alex', 'xxxx', 1),
+  ('[2]', 'alex', 'aaaa', 2),
+  ('[3]', 'alex', 'aaaa', 3),
+  ('[4]', 'brian', 'aaaa', 1),
+  ('[5]', 'brian', 'aaaa', 2),
+  ('[6]', 'brian', 'aaaa', 3),
+  ('[7]', 'craig', 'aaaa', 1),
+  ('[8]', 'craig', 'xxxx', 2),
+  ('[9]', 'craig', 'xxxx', 3),
+  ('[10]', '123456789012345', 'xxxx', 3);
+
+select name, value from sqlite_memstat where name = 'MEMORY_USED';
+
+select rowid, name1, name2, age, vec_to_json(vector)
+from v
+where vector match '[0]'
+  and k = 5
+  and name1 in ('alex', 'brian', 'craig')
+  --and name2 in ('aaaa', 'xxxx')
+  and age in (1, 2, 3, 2222,3333,4444);
+
+select name, value from sqlite_memstat where name = 'MEMORY_USED';
+
+select rowid, name1, name2, age, vec_to_json(vector)
+from v
+where vector match '[0]'
+  and k = 5
+  and name1 in ('123456789012345', 'superfluous');
+
+
+.exit
+
+create virtual table v using vec0(
+  vector float[1],
+  +description text
+);
+insert into v(rowid, vector, description) values (1, '[1]', 'aaa');
+select * from v;
+
+.exit
+
+create virtual table vec_articles using vec0(
+  article_id integer primary key,
+  year integer partition key,
+  headline_embedding float[1],
+  +headline text,
+  +url text,
+  word_count integer,
+  print_section text,
+  print_page integer,
+  pub_date text,
+);
+
+insert into vec_articles values (1111, 2020, '[1]', 'headline', 'https://...', 200, 'A', 1, '2020-01-01');
+
+select * from vec_articles;
+
+.exit
+
+
+create table movies(movie_id integer primary key, synopsis text);
+INSERT INTO movies(movie_id, synopsis)
+VALUES
+  (1, 'A family is haunted by demonic spirits after moving into a new house, requiring the help of paranormal investigators.'),
+  (2, 'Two dim-witted friends embark on a cross-country road trip to return a briefcase full of money to its owner.'),
+  (3, 'A team of explorers travels through a wormhole in space in an attempt to ensure humanity’s survival.'),
+  (4, 'A young hobbit embarks on a journey with a fellowship to destroy a powerful ring and save Middle-earth from darkness.'),
+  (5, 'A documentary about the dangers of global warming, featuring former U.S. Vice President Al Gore.'),
+  (6, 'After the death of her secretive mother, a woman discovers terrifying secrets about her family lineage.'),
+  (7, 'A clueless but charismatic TV anchorman struggles to stay relevant in the world of broadcast journalism.'),
+  (8, 'A young blade runner uncovers a long-buried secret that leads him to track down former blade runner Rick Deckard.'),
+  (9, 'A young boy discovers he is a wizard and attends a magical school, where he learns about his destiny.'),
+  (10, 'A rock climber attempts to scale El Capitan in Yosemite National Park without the use of ropes or safety gear.'),
+  (11, 'A young African-American man uncovers a disturbing secret when he visits his white girlfriend''s family estate.'),
+  (12, 'Three friends wake up from a bachelor party in Las Vegas with no memory of the previous night and must retrace their steps.'),
+  (13, 'A computer hacker learns about the true nature of his reality and his role in the war against its controllers.'),
+  (14, 'In post-Civil War Spain, a young girl escapes into an eerie but captivating fantasy world.'),
+  (15, 'A documentary that explores racial inequality in the United States, focusing on the prison system and mass incarceration.'),
+  (16, 'A young woman is followed by an unknown supernatural force after a sexual encounter.'),
+  (17, 'Two immature but well-meaning stepbrothers become instant rivals when their single parents marry.'),
+  (18, 'A thief with the ability to enter people''s dreams is tasked with planting an idea into a target''s subconscious.'),
+  (19, 'A mute woman forms a unique relationship with a mysterious aquatic creature being held in a secret research facility.'),
+  (20, 'A documentary about the life and legacy of Fred Rogers, the beloved host of the children''s TV show "Mister Rogers'' Neighborhood."');
+
+
+create virtual table vec_movies using vec0(
+  movie_id integer primary key,
+  synopsis_embedding float[1],
+  +title text,
+  genre text,
+  num_reviews int,
+  mean_rating float,
+  chunk_size=8
+);
+
+.schema
+/*
+insert into vec_movies(movie_id, synopsis_embedding, num_reviews, mean_rating) values
+  (1, '[1]', 153, 4.6),
+  (2, '[2]', 382, 2.6),
+  (3, '[3]', 53, 5.0),
+  (4, '[4]', 210, 4.2),
+  (5, '[5]', 93, 3.4),
+  (6, '[6]', 167, 4.7),
+  (7, '[7]', 482, 2.9),
+  (8, '[8]', 301, 5.0),
+  (9, '[9]', 134, 4.1),
+  (10, '[10]', 66, 3.2),
+  (11, '[11]', 88, 4.9),
+  (12, '[12]', 59, 2.8),
+  (13, '[13]', 423, 4.5),
+  (14, '[14]', 275, 3.6),
+  (15, '[15]', 191, 4.4),
+  (16, '[16]', 314, 4.3),
+  (17, '[17]', 74, 3.0),
+  (18, '[18]', 201, 5.0),
+  (19, '[19]', 399, 2.7),
+  (20, '[20]', 186, 4.8);
+*/
+
+/*
+
+INSERT INTO vec_movies(movie_id, synopsis_embedding, genre, num_reviews, mean_rating)
+VALUES
+  (1, '[1]', 'horror', 153, 4.6),
+  (2, '[2]', 'comedy', 382, 2.6),
+  (3, '[3]', 'scifi', 53, 5.0),
+  (4, '[4]', 'fantasy', 210, 4.2),
+  (5, '[5]', 'documentary', 93, 3.4),
+  (6, '[6]', 'horror', 167, 4.7),
+  (7, '[7]', 'comedy', 482, 2.9),
+  (8, '[8]', 'scifi', 301, 5.0),
+  (9, '[9]', 'fantasy', 134, 4.1),
+  (10, '[10]', 'documentary', 66, 3.2),
+  (11, '[11]', 'horror', 88, 4.9),
+  (12, '[12]', 'comedy', 59, 2.8),
+  (13, '[13]', 'scifi', 423, 4.5),
+  (14, '[14]', 'fantasy', 275, 3.6),
+  (15, '[15]', 'documentary', 191, 4.4),
+  (16, '[16]', 'horror', 314, 4.3),
+  (17, '[17]', 'comedy', 74, 3.0),
+  (18, '[18]', 'scifi', 201, 5.0),
+  (19, '[19]', 'fantasy', 399, 2.7),
+  (20, '[20]', 'documentary', 186, 4.8);
+*/
+
+INSERT INTO vec_movies(movie_id, synopsis_embedding, genre, title, num_reviews, mean_rating)
+VALUES
+  (1, '[1]', 'horror', 'The Conjuring', 153, 4.6),
+  (2, '[2]', 'comedy', 'Dumb and Dumber', 382, 2.6),
+  (3, '[3]', 'scifi', 'Interstellar', 53, 5.0),
+  (4, '[4]', 'fantasy', 'The Lord of the Rings: The Fellowship of the Ring', 210, 4.2),
+  (5, '[5]', 'documentary', 'An Inconvenient Truth', 93, 3.4),
+  (6, '[6]', 'horror', 'Hereditary', 167, 4.7),
+  (7, '[7]', 'comedy', 'Anchorman: The Legend of Ron Burgundy', 482, 2.9),
+  (8, '[8]', 'scifi', 'Blade Runner 2049', 301, 5.0),
+  (9, '[9]', 'fantasy', 'Harry Potter and the Sorcerer''s Stone', 134, 4.1),
+  (10, '[10]', 'documentary', 'Free Solo', 66, 3.2),
+  (11, '[11]', 'horror', 'Get Out', 88, 4.9),
+  (12, '[12]', 'comedy', 'The Hangover', 59, 2.8),
+  (13, '[13]', 'scifi', 'The Matrix', 423, 4.5),
+  (14, '[14]', 'fantasy', 'Pan''s Labyrinth', 275, 3.6),
+  (15, '[15]', 'documentary', '13th', 191, 4.4),
+  (16, '[16]', 'horror', 'It Follows', 314, 4.3),
+  (17, '[17]', 'comedy', 'Step Brothers', 74, 3.0),
+  (18, '[18]', 'scifi', 'Inception', 201, 5.0),
+  (19, '[19]', 'fantasy', 'The Shape of Water', 399, 2.7),
+  (20, '[20]', 'documentary', 'Won''t You Be My Neighbor?', 186, 4.8),
+  (21, '[21]', 'scifi', 'Gravity', 342, 4.0),
+  (22, '[22]', 'scifi', 'Dune', 451, 4.4),
+  (23, '[23]', 'scifi', 'The Martian', 522, 4.6),
+  (24, '[24]', 'horror', 'A Quiet Place', 271, 4.3),
+  (25, '[25]', 'fantasy', 'The Chronicles of Narnia: The Lion, the Witch and the Wardrobe', 310, 3.9);
+
+--select * from vec_movies;
+--select * from vec_movies_metadata_chunks00;
+
+
+create virtual table vec_chunks using vec0(
+  user_id integer partition key,
+  +contents text,
+  contents_embedding float[1],
+);
+
+INSERT INTO vec_chunks (rowid, user_id, contents, contents_embedding) VALUES
+(1, 123, 'Our PTO policy allows employees to take both vacation and sick leave as needed.', '[1]'),
+(2, 123, 'Employees must provide notice at least two weeks in advance for planned vacations.', '[2]'),
+(3, 123, 'Sick leave can be taken without advance notice, but employees must inform their manager.', '[3]'),
+(4, 123, 'Unused PTO can be carried over to the following year, up to a maximum of 40 hours.', '[4]'),
+(5, 123, 'PTO must be used in increments of at least 4 hours.', '[5]'),
+(6, 456, 'New employees are granted 10 days of PTO during their first year of employment.', '[6]'),
+(7, 456, 'After the first year, employees earn an additional day of PTO for each year of service.', '[7]'),
+(8, 789, 'PTO requests will be reviewed by the HR department and are subject to approval.', '[8]'),
+(9, 789, 'The company reserves the right to deny PTO requests during peak operational periods.', '[9]'),
+(10, 456, 'If PTO is denied, the employee will be given an alternative time to take leave.', '[10]'),
+(11, 789, 'Employees who are out of PTO must request unpaid leave for any additional time off.', '[11]'),
+(12, 789, 'In case of a family emergency, employees can request emergency leave.', '[12]'),
+(13, 456, 'Emergency leave may be granted for personal or family illness, or other critical situations.', '[13]'),
+(14, 789, 'The maximum length of emergency leave is subject to company discretion.', '[14]'),
+(15, 123, 'All PTO balances will be displayed on the employee self-service portal.', '[15]'),
+(16, 456, 'Employees who are terminated will be paid for unused PTO, as per state law.', '[16]'),
+(17, 123, 'Part-time employees are eligible for PTO on a pro-rata basis.', '[17]'),
+(18, 789, 'The company encourages employees to use their PTO to maintain work-life balance.', '[18]'),
+(19, 456, 'Employees should not book travel plans until their PTO request has been approved.', '[19]'),
+(20, 123, 'Managers are responsible for tracking their team members'' PTO usage.', '[20]');
+
+select rowid, user_id, contents, distance
+from vec_chunks
+where contents_embedding match '[19]'
+  and user_id = 123
+  and k = 5;
+
+.exit
+
+
+
+
+
+-- PARTITION KEY and auxiliary columns!
+create virtual table vec_chunks using vec0(
+  -- internally shard the vector index by user
+  user_id integer partition key,
+  -- store the chunk text pre-embedding as an "auxiliary column"
+  +contents text,
+  contents_embedding float[1024],
+);
+
+select rowid, user_id, contents, distance
+from vec_chunks
+where contents_embedding match '[...]'
+  and user_id = 123
+  and k = 5;
+/*
+┌───────┬─────────┬──────────────────────────────────────────────────────────────┬──────────┐
+│ rowid │ user_id │                           contents                           │ distance │
+├───────┼─────────┼──────────────────────────────────────────────────────────────┼──────────┤
+│ 20    │ 123     │ 'Managers are responsible for tracking their team members''  │ 1.0      │
+│       │         │ PTO usage.'                                                  │          │
+├───────┼─────────┼──────────────────────────────────────────────────────────────┼──────────┤
+│ 17    │ 123     │ 'Part-time employees are eligible for PTO on a pro-rata basi │ 2.0      │
+│       │         │ s.'                                                          │          │
+├───────┼─────────┼──────────────────────────────────────────────────────────────┼──────────┤
+│ 15    │ 123     │ 'All PTO balances will be displayed on the employee self-ser │ 4.0      │
+│       │         │ vice portal.'                                                │          │
+├───────┼─────────┼──────────────────────────────────────────────────────────────┼──────────┤
+│ 5     │ 123     │ 'PTO must be used in increments of at least 4 hours.'        │ 14.0     │
+├───────┼─────────┼──────────────────────────────────────────────────────────────┼──────────┤
+│ 4     │ 123     │ 'Unused PTO can be carried over to the following year, up to │ 15.0     │
+│       │         │  a maximum of 40 hours.'                                     │          │
+└───────┴─────────┴──────────────────────────────────────────────────────────────┴──────────┘
+*/
+
+
+
+
+
+-- metadata filters!
+create virtual table vec_movies using vec0(
+  movie_id integer primary key,
+  synopsis_embedding float[1024],
+  +title text,
+  genre text,
+  num_reviews int,
+  mean_rating float
+);
+
+select
+  movie_id,
+  title,
+  genre,
+  num_reviews,
+  mean_rating,
+  distance
+from vec_movies
+where synopsis_embedding match '[15.5]'
+  and genre = 'scifi'
+  and num_reviews between 100 and 500
+  and mean_rating > 3.5
+  and k = 5;
+/*
+┌──────────┬─────────────────────┬─────────┬─────────────┬──────────────────┬──────────┐
+│ movie_id │        title        │  genre  │ num_reviews │   mean_rating    │ distance │
+├──────────┼─────────────────────┼─────────┼─────────────┼──────────────────┼──────────┤
+│ 13       │ 'The Matrix'        │ 'scifi' │ 423         │ 4.5              │ 2.5      │
+│ 18       │ 'Inception'         │ 'scifi' │ 201         │ 5.0              │ 2.5      │
+│ 21       │ 'Gravity'           │ 'scifi' │ 342         │ 4.0              │ 5.5      │
+│ 22       │ 'Dune'              │ 'scifi' │ 451         │ 4.40000009536743 │ 6.5      │
+│ 8        │ 'Blade Runner 2049' │ 'scifi' │ 301         │ 5.0              │ 7.5      │
+└──────────┴─────────────────────┴─────────┴─────────────┴──────────────────┴──────────┘
+*/
+
+
+
+
+.exit
+
+create virtual table vec_movies using vec0(
+  movie_id integer primary key,
+  synopsis_embedding float[768],
+  genre text,
+  num_reviews int,
+  mean_rating float,
+);
+
+
+.exit
+
+
 create virtual table vec_chunks using vec0(
  chunk_id integer primary key,
  contents_embedding float[1],
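
The `-- metadata filters!` query in `test.sql` combines a KNN match with `WHERE` constraints on metadata columns. The brute-force Python sketch below reproduces the expected result table for that query. It assumes the one-dimensional test vectors (`'[N]'` for `movie_id` N), a distance equal to the absolute difference, and filters applied before the `k` nearest rows are chosen. `filtered_knn` and the `Movie` tuple are hypothetical names, and this is not how `vec0` implements filtering internally.

```python
from typing import Callable, List, NamedTuple, Tuple


class Movie(NamedTuple):
    movie_id: int
    title: str
    genre: str
    num_reviews: int
    mean_rating: float
    embedding: float  # 1-dimensional vector, as in the test data


# A subset of the rows inserted into vec_movies in test.sql.
MOVIES = [
    Movie(3, "Interstellar", "scifi", 53, 5.0, 3.0),
    Movie(8, "Blade Runner 2049", "scifi", 301, 5.0, 8.0),
    Movie(13, "The Matrix", "scifi", 423, 4.5, 13.0),
    Movie(15, "13th", "documentary", 191, 4.4, 15.0),
    Movie(16, "It Follows", "horror", 314, 4.3, 16.0),
    Movie(18, "Inception", "scifi", 201, 5.0, 18.0),
    Movie(21, "Gravity", "scifi", 342, 4.0, 21.0),
    Movie(22, "Dune", "scifi", 451, 4.4, 22.0),
    Movie(23, "The Martian", "scifi", 522, 4.6, 23.0),
]


def filtered_knn(
    rows: List[Movie], query: float, k: int, keep: Callable[[Movie], bool]
) -> List[Tuple[int, float]]:
    """Return (movie_id, distance) for the k nearest rows that pass `keep`."""
    candidates = [(m.movie_id, abs(m.embedding - query)) for m in rows if keep(m)]
    # sorted() is stable, so ties keep their original (movie_id) order.
    return sorted(candidates, key=lambda pair: pair[1])[:k]


result = filtered_knn(
    MOVIES,
    query=15.5,
    k=5,
    keep=lambda m: m.genre == "scifi"
    and 100 <= m.num_reviews <= 500
    and m.mean_rating > 3.5,
)
```

The nearest rows overall (`13th`, `It Follows`) are excluded by the genre constraint, which is why the result starts at distance 2.5 rather than 0.5.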
--- a/tests/snapshots/test-auxiliary.ambr
+++ b/tests/snapshots/test-auxiliary.ambr
@@ -316,7 +316,7 @@
        'type': 'table',
        'name': 'sqlite_sequence',
        'tbl_name': 'sqlite_sequence',
-        'rootpage': 3,
+        'rootpage': 5,
        'sql': 'CREATE TABLE sqlite_sequence(name,seq)',
      }),
    ]),
@@ -326,18 +326,25 @@
  OrderedDict({
    'sql': 'select * from sqlite_master order by name',
    'rows': list([
+      OrderedDict({
+        'type': 'index',
+        'name': 'sqlite_autoindex_v_info_1',
+        'tbl_name': 'v_info',
+        'rootpage': 3,
+        'sql': None,
+      }),
      OrderedDict({
        'type': 'index',
        'name': 'sqlite_autoindex_v_vector_chunks00_1',
        'tbl_name': 'v_vector_chunks00',
-        'rootpage': 6,
+        'rootpage': 8,
        'sql': None,
      }),
      OrderedDict({
        'type': 'table',
        'name': 'sqlite_sequence',
        'tbl_name': 'sqlite_sequence',
-        'rootpage': 3,
+        'rootpage': 5,
        'sql': 'CREATE TABLE sqlite_sequence(name,seq)',
      }),
      OrderedDict({
@@ -351,28 +358,35 @@
        'type': 'table',
        'name': 'v_auxiliary',
        'tbl_name': 'v_auxiliary',
-        'rootpage': 7,
+        'rootpage': 9,
        'sql': 'CREATE TABLE "v_auxiliary"( rowid integer PRIMARY KEY , value00)',
      }),
      OrderedDict({
        'type': 'table',
        'name': 'v_chunks',
        'tbl_name': 'v_chunks',
-        'rootpage': 2,
+        'rootpage': 4,
        'sql': 'CREATE TABLE "v_chunks"(chunk_id INTEGER PRIMARY KEY AUTOINCREMENT,size INTEGER NOT NULL,validity BLOB NOT NULL,rowids BLOB NOT NULL)',
      }),
+      OrderedDict({
+        'type': 'table',
+        'name': 'v_info',
+        'tbl_name': 'v_info',
+        'rootpage': 2,
+        'sql': 'CREATE TABLE "v_info" (key text primary key, value any)',
+      }),
      OrderedDict({
        'type': 'table',
        'name': 'v_rowids',
        'tbl_name': 'v_rowids',
-        'rootpage': 4,
+        'rootpage': 6,
        'sql': 'CREATE TABLE "v_rowids"(rowid INTEGER PRIMARY KEY AUTOINCREMENT,id,chunk_id INTEGER,chunk_offset INTEGER)',
      }),
      OrderedDict({
        'type': 'table',
        'name': 'v_vector_chunks00',
        'tbl_name': 'v_vector_chunks00',
-        'rootpage': 5,
+        'rootpage': 7,
        'sql': 'CREATE TABLE "v_vector_chunks00"(rowid PRIMARY KEY,vectors BLOB NOT NULL)',
      }),
    ]),
@@ -409,25 +423,25 @@
 # ---
 # name: test_types.3
  dict({
-    'error': 'OperationalError',
+    'error': 'IntegrityError',
    'message': 'Auxiliary column type mismatch: The auxiliary column aux_int has type INTEGER, but TEXT was provided.',
  })
 # ---
 # name: test_types.4
  dict({
-    'error': 'OperationalError',
+    'error': 'IntegrityError',
    'message': 'Auxiliary column type mismatch: The auxiliary column aux_float has type FLOAT, but TEXT was provided.',
  })
 # ---
 # name: test_types.5
  dict({
-    'error': 'OperationalError',
+    'error': 'IntegrityError',
    'message': 'Auxiliary column type mismatch: The auxiliary column aux_text has type TEXT, but INTEGER was provided.',
  })
 # ---
 # name: test_types.6
  dict({
-    'error': 'OperationalError',
+    'error': 'IntegrityError',
    'message': 'Auxiliary column type mismatch: The auxiliary column aux_blob has type BLOB, but INTEGER was provided.',
  })
 # ---
--- a/tests/snapshots/test-general.ambr
+++ b/tests/snapshots/test-general.ambr
@@ -0,0 +1,184 @@
+# serializer version: 1
+# name: test_info
+  OrderedDict({
+    'sql': 'select key, typeof(value) from v_info order by 1',
+    'rows': list([
+      OrderedDict({
+        'key': 'CREATE_VERSION',
+        'typeof(value)': 'text',
+      }),
+      OrderedDict({
+        'key': 'CREATE_VERSION_MAJOR',
+        'typeof(value)': 'integer',
+      }),
+      OrderedDict({
+        'key': 'CREATE_VERSION_MINOR',
+        'typeof(value)': 'integer',
+      }),
+      OrderedDict({
+        'key': 'CREATE_VERSION_PATCH',
+        'typeof(value)': 'integer',
+      }),
+    ]),
+  })
+# ---
+# name: test_shadow
+  OrderedDict({
+    'sql': 'select * from sqlite_master order by name',
+    'rows': list([
+      OrderedDict({
+        'type': 'index',
+        'name': 'sqlite_autoindex_v_info_1',
+        'tbl_name': 'v_info',
+        'rootpage': 3,
+        'sql': None,
+      }),
+      OrderedDict({
+        'type': 'index',
+        'name': 'sqlite_autoindex_v_metadatachunks00_1',
+        'tbl_name': 'v_metadatachunks00',
+        'rootpage': 10,
+        'sql': None,
+      }),
+      OrderedDict({
+        'type': 'index',
+        'name': 'sqlite_autoindex_v_metadatatext00_1',
+        'tbl_name': 'v_metadatatext00',
+        'rootpage': 12,
+        'sql': None,
+      }),
+      OrderedDict({
+        'type': 'index',
+        'name': 'sqlite_autoindex_v_vector_chunks00_1',
+        'tbl_name': 'v_vector_chunks00',
+        'rootpage': 8,
+        'sql': None,
+      }),
+      OrderedDict({
+        'type': 'table',
+        'name': 'sqlite_sequence',
+        'tbl_name': 'sqlite_sequence',
+        'rootpage': 5,
+        'sql': 'CREATE TABLE sqlite_sequence(name,seq)',
+      }),
+      OrderedDict({
+        'type': 'table',
+        'name': 'v',
+        'tbl_name': 'v',
+        'rootpage': 0,
+        'sql': 'CREATE VIRTUAL TABLE v using vec0(a float[1], partition text partition key, metadata text, +name text, chunk_size=8)',
+      }),
+      OrderedDict({
+        'type': 'table',
+        'name': 'v_auxiliary',
+        'tbl_name': 'v_auxiliary',
+        'rootpage': 13,
+        'sql': 'CREATE TABLE "v_auxiliary"( rowid integer PRIMARY KEY , value00)',
+      }),
+      OrderedDict({
+        'type': 'table',
+        'name': 'v_chunks',
+        'tbl_name': 'v_chunks',
+        'rootpage': 4,
+        'sql': 'CREATE TABLE "v_chunks"(chunk_id INTEGER PRIMARY KEY AUTOINCREMENT,size INTEGER NOT NULL,sequence_id integer,partition00,validity BLOB NOT NULL, rowids BLOB NOT NULL)',
+      }),
+      OrderedDict({
+        'type': 'table',
+        'name': 'v_info',
+        'tbl_name': 'v_info',
+        'rootpage': 2,
+        'sql': 'CREATE TABLE "v_info" (key text primary key, value any)',
+      }),
+      OrderedDict({
+        'type': 'table',
+        'name': 'v_metadatachunks00',
+        'tbl_name': 'v_metadatachunks00',
+        'rootpage': 9,
+        'sql': 'CREATE TABLE "v_metadatachunks00"(rowid PRIMARY KEY, data BLOB NOT NULL)',
+      }),
+      OrderedDict({
+        'type': 'table',
+        'name': 'v_metadatatext00',
+        'tbl_name': 'v_metadatatext00',
+        'rootpage': 11,
+        'sql': 'CREATE TABLE "v_metadatatext00"(rowid PRIMARY KEY, data TEXT)',
+      }),
+      OrderedDict({
+        'type': 'table',
+        'name': 'v_rowids',
+        'tbl_name': 'v_rowids',
+        'rootpage': 6,
+        'sql': 'CREATE TABLE "v_rowids"(rowid INTEGER PRIMARY KEY AUTOINCREMENT,id,chunk_id INTEGER,chunk_offset INTEGER)',
+      }),
+      OrderedDict({
+        'type': 'table',
+        'name': 'v_vector_chunks00',
+        'tbl_name': 'v_vector_chunks00',
+        'rootpage': 7,
+        'sql': 'CREATE TABLE "v_vector_chunks00"(rowid PRIMARY KEY,vectors BLOB NOT NULL)',
+      }),
+    ]),
+  })
+# ---
+# name: test_shadow.1
+  OrderedDict({
+    'sql': "select * from pragma_table_list where type = 'shadow'",
+    'rows': list([
+      OrderedDict({
+        'schema': 'main',
+        'name': 'v_auxiliary',
+        'type': 'shadow',
+        'ncol': 2,
+        'wr': 0,
+        'strict': 0,
+      }),
+      OrderedDict({
+        'schema': 'main',
+        'name': 'v_chunks',
+        'type': 'shadow',
+        'ncol': 6,
+        'wr': 0,
+        'strict': 0,
+      }),
+      OrderedDict({
+        'schema': 'main',
+        'name': 'v_info',
+        'type': 'shadow',
+        'ncol': 2,
+        'wr': 0,
+        'strict': 0,
+      }),
+      OrderedDict({
+        'schema': 'main',
+        'name': 'v_rowids',
+        'type': 'shadow',
+        'ncol': 4,
+        'wr': 0,
+        'strict': 0,
+      }),
+      OrderedDict({
+        'schema': 'main',
+        'name': 'v_metadatachunks00',
+        'type': 'shadow',
+        'ncol': 2,
+        'wr': 0,
+        'strict': 0,
+      }),
+      OrderedDict({
+        'schema': 'main',
+        'name': 'v_metadatatext00',
+        'type': 'shadow',
+        'ncol': 2,
+        'wr': 0,
+        'strict': 0,
+      }),
+    ]),
+  })
+# ---
+# name: test_shadow.2
+  OrderedDict({
+    'sql': "select * from pragma_table_list where type = 'shadow'",
+    'rows': list([
+    ]),
+  })
+# ---
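
The `test_info` snapshot above records only the keys of the `_info` shadow table and the storage type of each value. The sketch below shows the same shape with Python's built-in `sqlite3` module on an ordinary table that copies the `v_info` schema from the snapshot. It does not load `sqlite-vec`, and the inserted version values are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Same DDL as the "v_info" entry in the snapshot, but created as a plain table.
db.execute('CREATE TABLE "v_info" (key text primary key, value any)')

# Hypothetical values; a real vec0 table writes its own version numbers here.
db.executemany(
    "insert into v_info(key, value) values (?, ?)",
    [
        ("CREATE_VERSION", "v0.0.0-example"),
        ("CREATE_VERSION_MAJOR", 0),
        ("CREATE_VERSION_MINOR", 0),
        ("CREATE_VERSION_PATCH", 0),
    ],
)

# The query used by the snapshot test.
rows = db.execute("select key, typeof(value) from v_info order by 1").fetchall()
```

The version string is stored as `text` and the three numeric components as `integer`, matching the types listed in the snapshot.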
--- a/tests/snapshots/test-metadata.ambr
+++ b/tests/snapshots/test-metadata.ambr
--- a/tests/afbd/.gitignore
+++ b/tests/afbd/.gitignore
@@ -0,0 +1 @@
+*.tgz
--- a/tests/afbd/.python-version
+++ b/tests/afbd/.python-version
@@ -0,0 +1 @@
+3.12
--- a/tests/afbd/Makefile
+++ b/tests/afbd/Makefile
@@ -0,0 +1,9 @@
+random_ints_1m.tgz:
+	curl -o $@ https://storage.googleapis.com/ann-filtered-benchmark/datasets/random_ints_1m.tgz
+
+random_float_1m.tgz:
+	curl -o $@ https://storage.googleapis.com/ann-filtered-benchmark/datasets/random_float_1m.tgz
+
+random_keywords_1m.tgz:
+	curl -o $@ https://storage.googleapis.com/ann-filtered-benchmark/datasets/random_keywords_1m.tgz
+all: random_ints_1m.tgz random_float_1m.tgz random_keywords_1m.tgz
--- a/tests/afbd/README.md
+++ b/tests/afbd/README.md
@ -0,0 +1,12 @@
+
+# hnm
+
+```
+tar -xOzf hnm.tgz ./tests.jsonl  > tests.jsonl
+solite q "select group_concat(distinct key) from lines_read('tests.jsonl'), json_each(line -> '$.conditions.and[0]')"
+```
+
+
+```
+> python test-afbd.py build hnm.tgz --metadata product_group_name,colour_group_name,index_group_name,perceived_colour_value_name,section_name,product_type_name,department_name,graphical_appearance_name,garment_group_name,perceived_colour_master_name
+```
--- a/tests/afbd/test-afbd.py
+++ b/tests/afbd/test-afbd.py
@ -0,0 +1,231 @@
+import numpy as np
+from tqdm import tqdm
+from deepdiff import DeepDiff
+
+import tarfile
+import json
+from io import BytesIO
+import sqlite3
+from typing import List
+from struct import pack
+import time
+from pathlib import Path
+import argparse
+
+
+def serialize_float32(vector: List[float]) -> bytes:
+    """Serializes a list of floats into the "raw bytes" format sqlite-vec expects"""
+    return pack("%sf" % len(vector), *vector)
+
+
+def build_command(file_path, metadata_set=None):
+    if metadata_set:
+        metadata_set = set(metadata_set.split(","))
+
+    file_path = Path(file_path)
+    print(f"reading {file_path}...")
+    t0 = time.time()
+    payloads = tests = vectors = None  # stay None if the archive lacks a member
+    with tarfile.open(file_path, "r:gz") as archive:
+        for file in archive:
+            if file.name == "./payloads.jsonl":
+                payloads = [
+                    json.loads(line)
+                    for line in archive.extractfile(file.name).readlines()
+                ]
+            if file.name == "./tests.jsonl":
+                tests = [
+                    json.loads(line)
+                    for line in archive.extractfile(file.name).readlines()
+                ]
+            if file.name == "./vectors.npy":
+                f = BytesIO()
+                f.write(archive.extractfile(file.name).read())
+                f.seek(0)
+                vectors = np.load(f)
+
+    assert payloads is not None
+    assert tests is not None
+    assert vectors is not None
+    dimensions = vectors.shape[1]
+    metadata_columns = sorted(list(payloads[0].keys()))
+
+    def col_type(v):
+        if isinstance(v, int):
+            return "integer"
+        if isinstance(v, float):
+            return "float"
+        if isinstance(v, str):
+            return "text"
+        raise Exception(f"Unknown column type: {v}")
+
+    metadata_columns_types = [col_type(payloads[0][col]) for col in metadata_columns]
+
+    print(time.time() - t0)
+    t0 = time.time()
+    print("seeding...")
+
+    db = sqlite3.connect(f"{file_path.stem}.db")
+    db.execute("PRAGMA page_size = 16384")
+    db.row_factory = sqlite3.Row
+    db.enable_load_extension(True)
+    db.load_extension("../../dist/vec0")
+    db.enable_load_extension(False)
+
+    with db:
+        db.execute("create table tests(data)")
+
+        for test in tests:
+            db.execute("insert into tests values (?)", [json.dumps(test)])
+
+    with db:
+        create_sql = f"create virtual table v using vec0(vector float[{dimensions}] distance_metric=cosine"
+        insert_sql = "insert into v(rowid, vector"
+        for name, type in zip(metadata_columns, metadata_columns_types):
+            if metadata_set:
+                if name in metadata_set:
+                    create_sql += f", {name} {type}"
+                else:
+                    create_sql += f", +{name} {type}"
+            else:
+                create_sql += f", {name} {type}"
+
+            insert_sql += f", {name}"
+        create_sql += ")"
+        insert_sql += ") values (" + ",".join("?" * (2 + len(metadata_columns))) + ")"
+        print(create_sql)
+        print(insert_sql)
+
+        db.execute(create_sql)
+
+        for idx, (payload, vector) in enumerate(
+            tqdm(zip(payloads, vectors), total=len(payloads))
+        ):
+            params = [idx, vector]
+            for c in metadata_columns:
+                params.append(payload[c])
+            db.execute(insert_sql, params)
+
+    print(time.time() - t0)
+
+
+def tests_command(file_path):
+    file_path = Path(file_path)
+    db = sqlite3.connect(f"{file_path.stem}.db")
+    db.execute("PRAGMA cache_size = -100000000")
+    db.row_factory = sqlite3.Row
+    db.enable_load_extension(True)
+    db.load_extension("../../dist/vec0")
+    db.enable_load_extension(False)
+
+    tests = [
+        json.loads(row["data"])
+        for row in db.execute("select data from tests").fetchall()
+    ]
+
+    num_or_skips = 0
+    num_1off_errors = 0
+
+    t0 = time.time()
+    print("testing...")
+    for idx, test in enumerate(tqdm(tests)):
+        query = test["query"]
+        conditions = test["conditions"]
+        expected_closest_ids = test["closest_ids"]
+        expected_closest_scores = test["closest_scores"]
+
+        sql = "select rowid, 1 - distance as similarity from v where vector match ? and k = ?"
+        params = [serialize_float32(query), len(expected_closest_ids)]
+
+        if "and" in conditions:
+            for condition in conditions["and"]:
+                assert len(condition.keys()) == 1
+                column = list(condition.keys())[0]
+                assert len(list(condition[column].keys())) == 1
+                condition_type = list(condition[column].keys())[0]
+                if condition_type == "match":
+                    value = condition[column]["match"]["value"]
+                    sql += f" and {column} = ?"
+                    params.append(value)
+                elif condition_type == "range":
+                    sql += f" and {column} between ? and ?"
+                    params.append(condition[column]["range"]["gt"])
+                    params.append(condition[column]["range"]["lt"])
+                else:
+                    raise Exception(f"Unknown condition type: {condition_type}")
+        elif "or" in conditions:
+            column = list(conditions["or"][0].keys())[0]
+            condition_type = list(conditions["or"][0][column].keys())[0]
+            assert condition_type == "match"
+            sql += f" and {column} in ("
+            for idx, condition in enumerate(conditions["or"]):
+                if condition_type == "match":
+                    value = condition[column]["match"]["value"]
+                    if idx != 0:
+                        sql += ","
+                    sql += "?"
+                    params.append(value)
+                elif condition_type == "range":
+                    raise NotImplementedError("range conditions in 'or' clauses are not supported")
+                else:
+                    raise Exception(f"Unknown condition type: {condition_type}")
+            sql += ")"
+
+        # print(sql, params[1:])
+        rows = db.execute(sql, params).fetchall()
+        actual_closest_ids = [row["rowid"] for row in rows]
+        matches = expected_closest_ids == actual_closest_ids
+        if not matches:
+            diff = DeepDiff(
+                expected_closest_ids, actual_closest_ids, ignore_order=False
+            )
+            assert len(list(diff.keys())) == 1
+            assert "values_changed" in diff.keys()
+            keys_changed = list(diff["values_changed"].keys())
+            if len(keys_changed) == 2:
+                akey, bkey = keys_changed
+                a = int(akey.removeprefix("root[").removesuffix("]"))
+                b = int(bkey.removeprefix("root[").removesuffix("]"))
+                assert abs(a - b) == 1
+                assert (
+                    diff["values_changed"][akey]["new_value"]
+                    == diff["values_changed"][bkey]["old_value"]
+                )
+                assert (
+                    diff["values_changed"][akey]["old_value"]
+                    == diff["values_changed"][bkey]["new_value"]
+                )
+            elif len(keys_changed) == 1:
+                v = int(keys_changed[0].removeprefix("root[").removesuffix("]"))
+                assert (v + 1) == len(expected_closest_ids)
+            else:
+                raise Exception(f"Unexpected diff in closest ids: {diff}")
+            num_1off_errors += 1
+        # print(closest_scores)
+        # print([row["similarity"] for row in rows])
+        # assert closest_scores == [row["similarity"] for row in rows]
+    print("Number skipped: ", num_or_skips)
+    print("Num 1 off errors: ", num_1off_errors)
+    print("1 off error rate: ", num_1off_errors / (len(tests) - num_or_skips))
+    print(time.time() - t0)
+    print("done")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Build and test sqlite-vec databases from ann-filtered-benchmark datasets"
+    )
+    subparsers = parser.add_subparsers(dest="command", required=True)
+
+    build_parser = subparsers.add_parser("build")
+    build_parser.add_argument("file", type=str, help="Path to input file")
+    build_parser.add_argument("--metadata", type=str, help="Metadata columns")
+    build_parser.set_defaults(func=lambda args: build_command(args.file, args.metadata))
+
+    tests_parser = subparsers.add_parser("test")
+    tests_parser.add_argument("file", type=str, help="Path to input file")
+    tests_parser.set_defaults(func=lambda args: tests_command(args.file))
+
+    args = parser.parse_args()
+    args.func(args)
+
+
+if __name__ == "__main__":
+    main()
--- a/tests/test-auxiliary.py
+++ b/tests/test-auxiliary.py
@ -55,7 +55,10 @@ def test_types(db, snapshot):
    )
    assert exec(db, "select * from v") == snapshot()

+    # TODO: integrity test transaction failures in shadow tables
+    db.commit()
    # bad types
+    db.execute("BEGIN")
    assert (
        exec(db, INSERT, [b"\x11\x11\x11\x11", "not int", 1.2, "text", b"blob"])
        == snapshot()
@ -66,6 +69,7 @@ def test_types(db, snapshot):
    )
    assert exec(db, INSERT, [b"\x11\x11\x11\x11", 1, 1.2, 1, b"blob"]) == snapshot()
    assert exec(db, INSERT, [b"\x11\x11\x11\x11", 1, 1.2, "text", 1]) == snapshot()
+    db.execute("ROLLBACK")

    # NULLs are totally chill
    assert exec(db, INSERT, [b"\x11\x11\x11\x11", None, None, None, None]) == snapshot()
@ -151,5 +155,7 @@ def vec0_shadow_table_contents(db, v):
    ]
    o = {}
    for shadow_table in shadow_tables:
+        if shadow_table.endswith("_info"):
+            continue
        o[shadow_table] = exec(db, f"select * from {shadow_table}")
    return o
--- a/tests/test-general.py
+++ b/tests/test-general.py
@ -0,0 +1,60 @@
+import sqlite3
+from collections import OrderedDict
+import pytest
+
+
+@pytest.mark.skipif(
+    sqlite3.sqlite_version_info < (3, 37),
+    reason="pragma_table_list was added in SQLite 3.37",
+)
+def test_shadow(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(a float[1], partition text partition key, metadata text, +name text, chunk_size=8)"
+    )
+    assert exec(db, "select * from sqlite_master order by name") == snapshot()
+    assert (
+        exec(db, "select * from pragma_table_list where type = 'shadow'") == snapshot()
+    )
+
+    db.execute("drop table v;")
+    assert (
+        exec(db, "select * from pragma_table_list where type = 'shadow'") == snapshot()
+    )
+
+
+def test_info(db, snapshot):
+    db.execute("create virtual table v using vec0(a float[1])")
+    assert exec(db, "select key, typeof(value) from v_info order by 1") == snapshot()
+
+
+def exec(db, sql, parameters=()):
+    try:
+        rows = db.execute(sql, parameters).fetchall()
+    except (sqlite3.OperationalError, sqlite3.DatabaseError) as e:
+        return {
+            "error": e.__class__.__name__,
+            "message": str(e),
+        }
+    a = []
+    for row in rows:
+        o = OrderedDict()
+        for k in row.keys():
+            o[k] = row[k]
+        a.append(o)
+    result = OrderedDict()
+    result["sql"] = sql
+    result["rows"] = a
+    return result
+
+
+def vec0_shadow_table_contents(db, v):
+    shadow_tables = [
+        row[0]
+        for row in db.execute(
+            "select name from sqlite_master where name like ? order by 1", [f"{v}_%"]
+        ).fetchall()
+    ]
+    o = {}
+    for shadow_table in shadow_tables:
+        o[shadow_table] = exec(db, f"select * from {shadow_table}")
+    return o
--- a/tests/test-loadable.py
+++ b/tests/test-loadable.py
@ -1022,6 +1022,7 @@ def test_vec0_drops():
    ] == [
        "t1",
        "t1_chunks",
+        "t1_info",
        "t1_rowids",
        "t1_vector_chunks00",
        "t1_vector_chunks01",
@ -2216,6 +2217,9 @@ def test_smoke():
        {
            "name": "vec_xyz_chunks",
        },
+        {
+            "name": "vec_xyz_info",
+        },
        {
            "name": "vec_xyz_rowids",
        },
--- a/tests/test-metadata.py
+++ b/tests/test-metadata.py
@ -0,0 +1,629 @@
+import pytest
+import sqlite3
+from collections import OrderedDict
+import json
+
+
+def test_constructor_limit(db, snapshot):
+    assert exec(
+        db,
+        f"""
+        create virtual table v using vec0(
+          {",".join([f"metadata{x} integer" for x in range(17)])},
+          v float[1]
+        )
+      """,
+    ) == snapshot(name="max 16 metadata columns")
+
+
+def test_normal(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(vector float[1], b boolean, n int, f float, t text, chunk_size=8)"
+    )
+    assert exec(
+        db, "select * from sqlite_master where type = 'table' order by name"
+    ) == snapshot(name="sqlite_master")
+
+    assert vec0_shadow_table_contents(db, "v") == snapshot()
+
+    INSERT = "insert into v(vector, b, n, f, t) values (?, ?, ?, ?, ?)"
+    assert exec(db, INSERT, [b"\x11\x11\x11\x11", 1, 1, 1.1, "one"]) == snapshot()
+    assert exec(db, INSERT, [b"\x22\x22\x22\x22", 1, 2, 2.2, "two"]) == snapshot()
+    assert exec(db, INSERT, [b"\x33\x33\x33\x33", 1, 3, 3.3, "three"]) == snapshot()
+
+    assert exec(db, "select * from v") == snapshot()
+    assert vec0_shadow_table_contents(db, "v") == snapshot()
+
+    assert exec(db, "drop table v") == snapshot()
+    assert exec(db, "select * from sqlite_master") == snapshot()
+
+
+#
+# assert exec(db, "select * from v") == snapshot()
+# assert vec0_shadow_table_contents(db, "v") == snapshot()
+#
+# db.execute("drop table v;")
+# assert exec(db, "select * from sqlite_master order by name") == snapshot(
+#    name="sqlite_master post drop"
+# )
+
+
+def test_text_knn(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(vector float[1], name text, chunk_size=8)"
+    )
+    assert vec0_shadow_table_contents(db, "v") == snapshot()
+    INSERT = "insert into v(vector, name) values (?, ?)"
+    db.execute(
+        """
+      INSERT INTO v(vector, name) VALUES
+        ('[.11]', 'aaa'),
+        ('[.22]', 'bbb'),
+        ('[.33]', 'ccc'),
+        ('[.44]', 'ddd'),
+        ('[.55]', 'eee'),
+        ('[.66]', 'fff'),
+        ('[.77]', 'ggg'),
+        ('[.88]', 'hhh'),
+        ('[.99]', 'iii');
+    """
+    )
+    assert exec(db, "select * from v") == snapshot()
+    assert vec0_shadow_table_contents(db, "v") == snapshot()
+
+    assert (
+        exec(
+            db,
+            "select rowid, name, distance from v where vector match '[1]' and k = 5",
+        )
+        == snapshot()
+    )
+
+    assert (
+        exec(
+            db,
+            "select rowid, name, distance from v where vector match '[1]' and k = 5 and name < 'ddd'",
+        )
+        == snapshot()
+    )
+    assert (
+        exec(
+            db,
+            "select rowid, name, distance from v where vector match '[1]' and k = 5 and name <= 'ddd'",
+        )
+        == snapshot()
+    )
+    assert (
+        exec(
+            db,
+            "select rowid, name, distance from v where vector match '[1]' and k = 5 and name > 'fff'",
+        )
+        == snapshot()
+    )
+    assert (
+        exec(
+            db,
+            "select rowid, name, distance from v where vector match '[1]' and k = 5 and name >= 'fff'",
+        )
+        == snapshot()
+    )
+    assert (
+        exec(
+            db,
+            "select rowid, name, distance from v where vector match '[1]' and k = 5 and name = 'aaa'",
+        )
+        == snapshot()
+    )
+    assert (
+        exec(
+            db,
+            "select rowid, name, distance from v where vector match '[.01]' and k = 5 and name != 'aaa'",
+        )
+        == snapshot()
+    )
+
+
+def test_long_text_updates(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(vector float[1], name text, chunk_size=8)"
+    )
+    assert vec0_shadow_table_contents(db, "v") == snapshot()
+    INSERT = "insert into v(vector, name) values (?, ?)"
+    exec(db, INSERT, [b"\x11\x11\x11\x11", "123456789a12"])
+    exec(db, INSERT, [b"\x11\x11\x11\x11", "123456789a123"])
+    assert exec(db, "select * from v") == snapshot()
+    assert vec0_shadow_table_contents(db, "v") == snapshot()
+
+
+def test_long_text_knn(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(vector float[1], name text, chunk_size=8)"
+    )
+    INSERT = "insert into v(vector, name) values (?, ?)"
+    exec(db, INSERT, ["[1]", "aaaa"])
+    exec(db, INSERT, ["[2]", "aaaaaaaaaaaa_aaa"])
+    exec(db, INSERT, ["[3]", "bbbb"])
+    exec(db, INSERT, ["[4]", "bbbbbbbbbbbb_bbb"])
+    exec(db, INSERT, ["[5]", "cccc"])
+    exec(db, INSERT, ["[6]", "cccccccccccc_ccc"])
+
+    tests = [
+        "bbbb",
+        "bb",
+        "bbbbbb",
+        "bbbbbbbbbbbb_bbb",
+        "bbbbbbbbbbbb_aaa",
+        "bbbbbbbbbbbb_ccc",
+        "longlonglonglonglonglonglong",
+    ]
+    ops = ["=", "!=", "<", "<=", ">", ">="]
+    op_names = ["eq", "ne", "lt", "le", "gt", "ge"]
+
+    for test in tests:
+        for op, op_name in zip(ops, op_names):
+            assert exec(
+                db,
+                f"select rowid, name, distance from v where vector match '[100]' and k = 5 and name {op} ?",
+                [test],
+            ) == snapshot(name=f"{op_name}-{test}")
+
+
+def test_types(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(vector float[1], b boolean, n int, f float, t text, chunk_size=8)"
+    )
+    INSERT = "insert into v(vector, b, n, f, t) values (?, ?, ?, ?, ?)"
+
+    assert exec(db, INSERT, [b"\x11\x11\x11\x11", 1, 1, 1.1, "test"]) == snapshot(
+        name="legal"
+    )
+
+    # fmt: off
+    assert exec(db, INSERT, [b"\x11\x11\x11\x11", 'illegal', 1, 1.1, 'test']) == snapshot(name="illegal-type-boolean")
+    assert exec(db, INSERT, [b"\x11\x11\x11\x11", 1, 'illegal', 1.1, 'test']) == snapshot(name="illegal-type-int")
+    assert exec(db, INSERT, [b"\x11\x11\x11\x11", 1, 1, 'illegal', 'test']) == snapshot(name="illegal-type-float")
+    assert exec(db, INSERT, [b"\x11\x11\x11\x11", 1, 1, 1.1, 420]) == snapshot(name="illegal-type-text")
+    # fmt: on
+
+    assert exec(db, INSERT, [b"\x11\x11\x11\x11", 44, 1, 1.1, "test"]) == snapshot(
+        name="illegal-boolean"
+    )
+
+
+def test_updates(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(vector float[1], b boolean, n int, f float, t text, chunk_size=8)"
+    )
+    INSERT = "insert into v(rowid, vector, b, n, f, t) values (?, ?, ?, ?, ?, ?)"
+
+    exec(db, INSERT, [1, b"\x11\x11\x11\x11", 1, 1, 1.1, "test1"])
+    exec(db, INSERT, [2, b"\x22\x22\x22\x22", 1, 2, 2.2, "test2"])
+    exec(db, INSERT, [3, b"\x33\x33\x33\x33", 1, 3, 3.3, "1234567890123"])
+    assert exec(db, "select * from v") == snapshot(name="1-init-contents")
+    assert vec0_shadow_table_contents(db, "v") == snapshot(name="1-init-shadow")
+
+    exec(
+        db, "UPDATE v SET b = 0, n = 11, f = 11.11, t = 'newtest1' where rowid = 1"
+    )
+    assert exec(db, "select * from v") == snapshot(name="general-update-contents")
+    assert vec0_shadow_table_contents(db, "v") == snapshot(
+        name="general-update-shaodnw"
+    )
+
+    # string update #1: long string updated to long string
+    exec(db, "UPDATE v SET t = '1234567890123-updated' where rowid = 3")
+    assert exec(db, "select * from v") == snapshot(name="string-update-1-contents")
+    assert vec0_shadow_table_contents(db, "v") == snapshot(
+        name="string-update-1-shadow"
+    )
+
+    # string update #2: short string updated to short string
+    exec(db, "UPDATE v SET t = 'test2-short' where rowid = 2")
+    assert exec(db, "select * from v") == snapshot(name="string-update-2-contents")
+    assert vec0_shadow_table_contents(db, "v") == snapshot(
+        name="string-update-2-shadow"
+    )
+
+    # string update #3: short string updated to long string
+    exec(db, "UPDATE v SET t = 'test2-long-long-long' where rowid = 2")
+    assert exec(db, "select * from v") == snapshot(name="string-update-3-contents")
+    assert vec0_shadow_table_contents(db, "v") == snapshot(
+        name="string-update-3-shadow"
+    )
+
+    # string update #4: long string updated to short string
+    exec(db, "UPDATE v SET t = 'test2-shortx' where rowid = 2")
+    assert exec(db, "select * from v") == snapshot(name="string-update-4-contents")
+    assert vec0_shadow_table_contents(db, "v") == snapshot(
+        name="string-update-4-shadow"
+    )
+
+
+def test_deletes(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(vector float[1], b boolean, n int, f float, t text, chunk_size=8)"
+    )
+    INSERT = "insert into v(rowid, vector, b, n, f, t) values (?, ?, ?, ?, ?, ?)"
+
+    assert exec(db, INSERT, [1, b"\x11\x11\x11\x11", 1, 1, 1.1, "test1"]) == snapshot()
+    assert exec(db, INSERT, [2, b"\x22\x22\x22\x22", 1, 2, 2.2, "test2"]) == snapshot()
+    assert (
+        exec(db, INSERT, [3, b"\x33\x33\x33\x33", 1, 3, 3.3, "1234567890123"])
+        == snapshot()
+    )
+
+    assert exec(db, "select * from v") == snapshot()
+    assert vec0_shadow_table_contents(db, "v") == snapshot()
+
+    assert exec(db, "DELETE FROM v where rowid = 1") == snapshot()
+    assert exec(db, "select * from v") == snapshot()
+    assert vec0_shadow_table_contents(db, "v") == snapshot()
+
+    assert exec(db, "DELETE FROM v where rowid = 3") == snapshot()
+    assert exec(db, "select * from v") == snapshot()
+    assert vec0_shadow_table_contents(db, "v") == snapshot()
+
+
+def test_knn(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(vector float[1], name text, chunk_size=8)"
+    )
+    assert exec(
+        db, "select * from sqlite_master where type = 'table' order by name"
+    ) == snapshot(name="sqlite_master")
+    db.executemany(
+        "insert into v(vector, name) values (?, ?)",
+        [("[1]", "alex"), ("[2]", "brian"), ("[3]", "craig")],
+    )
+
+    # EVIDENCE-OF: V16511_00582 catches "illegal" constraints on metadata columns
+    assert (
+        exec(
+            db,
+            "select *, distance from v where vector match '[5]' and k = 3 and name like 'illegal'",
+        )
+        == snapshot()
+    )
+
+
+SUPPORTS_VTAB_IN = sqlite3.sqlite_version_info >= (3, 38)
+
+
+@pytest.mark.skipif(
+    not SUPPORTS_VTAB_IN, reason="requires vtab `x in (...)` support in SQLite >=3.38"
+)
+def test_vtab_in(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(vector float[1], n int, t text, b boolean, f float, chunk_size=8)"
+    )
+    db.executemany(
+        "insert into v(rowid, vector, n, t, b, f) values (?, ?, ?, ?, ?, ?)",
+        [
+            (1, "[1]", 999, "aaaa", 0, 1.1),
+            (2, "[2]", 555, "aaaa", 0, 1.1),
+            (3, "[3]", 999, "aaaa", 0, 1.1),
+            (4, "[4]", 555, "aaaa", 0, 1.1),
+            (5, "[5]", 999, "zzzz", 0, 1.1),
+            (6, "[6]", 555, "zzzz", 0, 1.1),
+            (7, "[7]", 999, "zzzz", 0, 1.1),
+            (8, "[8]", 555, "zzzz", 0, 1.1),
+        ],
+    )
+
+    # EVIDENCE-OF: V15248_32086
+    assert exec(
+        db, "select *  from v where vector match '[0]' and k = 8 and b in (1, 0)"
+    ) == snapshot(name="block-bool")
+
+    assert exec(
+        db, "select *  from v where vector match '[0]' and k = 8 and f in (1.1, 0.0)"
+    ) == snapshot(name="block-float")
+
+    assert exec(
+        db,
+        "select rowid, n, distance  from v where vector match '[0]' and k = 8 and n in (555, 999)",
+    ) == snapshot(name="allow-int-all")
+    assert exec(
+        db,
+        "select rowid, n, distance from v where vector match '[0]' and k = 8 and n in (555, -1, -2)",
+    ) == snapshot(name="allow-int-superfluous")
+
+    assert exec(
+        db,
+        "select rowid, t, distance  from v where vector match '[0]' and k = 8 and t in ('aaaa', 'zzzz')",
+    ) == snapshot(name="allow-text-all")
+    assert exec(
+        db,
+        "select rowid, t, distance from v where vector match '[0]' and k = 8 and t in ('aaaa', 'foo', 'bar')",
+    ) == snapshot(name="allow-text-superfluous")
+
+
+def test_vtab_in_long_text(db, snapshot):
+    db.execute(
+        "create virtual table v using vec0(vector float[1], t text, chunk_size=8)"
+    )
+    data = [
+        (1, "aaaa"),
+        (2, "aaaaaaaaaaaa_aaa"),
+        (3, "bbbb"),
+        (4, "bbbbbbbbbbbb_bbb"),
+        (5, "cccc"),
+        (6, "cccccccccccc_ccc"),
+    ]
+    db.executemany(
+        "insert into v(rowid, vector, t) values (:rowid, printf('[%d]', :rowid), :t)",
+        [{"rowid": row[0], "t": row[1]} for row in data],
+    )
+
+    for _, lookup in data:
+        assert exec(
+            db,
+            "select rowid, t from v where vector match '[0]' and k = 10 and t in (?, 'nonsense')",
+            [lookup],
+        ) == snapshot(name=f"individual-{lookup}")
+
+    assert exec(
+        db,
+        "select rowid, t from v where vector match '[0]' and k = 10 and t in (select value from json_each(?))",
+        [json.dumps([row[1] for row in data])],
+    ) == snapshot(name="all")
+
+
+def test_idxstr(db, snapshot):
+    db.execute(
+        """
+          create virtual table vec_movies using vec0(
+            movie_id integer primary key,
+            synopsis_embedding float[1],
+            +title text,
+            is_favorited boolean,
+            genre text,
+            num_reviews int,
+            mean_rating float,
+            chunk_size=8
+          );
+        """
+    )
+
+    assert (
+        eqp(
+            db,
+            "select * from vec_movies where synopsis_embedding match '' and k = 0 and is_favorited = true",
+        )
+        == snapshot()
+    )
+
+    ops = ["<", ">", "<=", ">=", "!="]
+
+    for op in ops:
+        assert eqp(
+            db,
+            f"select * from vec_movies where synopsis_embedding match '' and k = 0 and genre {op} NULL",
+        ) == snapshot(name=f"knn-constraint-text {op}")
+
+    for op in ops:
+        assert eqp(
+            db,
+            f"select * from vec_movies where synopsis_embedding match '' and k = 0 and num_reviews {op} NULL",
+        ) == snapshot(name=f"knn-constraint-int {op}")
+
+    for op in ops:
+        assert eqp(
+            db,
+            f"select * from vec_movies where synopsis_embedding match '' and k = 0 and mean_rating {op} NULL",
+        ) == snapshot(name=f"knn-constraint-float {op}")
+
+    # for op in ops:
+    #    assert eqp(
+    #        db,
+    #        f"select * from vec_movies where synopsis_embedding match '' and k = 0 and is_favorited {op} NULL",
+    #    ) == snapshot(name=f"knn-constraint-boolean {op}")
+
+
+def eqp(db, sql):
+    o = OrderedDict()
+    o["sql"] = sql
+    o["plan"] = [
+        dict(row) for row in db.execute(f"explain query plan {sql}").fetchall()
+    ]
+    for p in o["plan"]:
+        # "notused" differs on macos-aarch64 in GitHub Actions (cause unknown), so drop it
+        del p["notused"]
+    return o
+
+
+def test_stress(db, snapshot):
+    db.execute(
+        """
+          create virtual table vec_movies using vec0(
+            movie_id integer primary key,
+            synopsis_embedding float[1],
+            +title text,
+            is_favorited boolean,
+            genre text,
+            num_reviews int,
+            mean_rating float,
+            chunk_size=8
+          );
+        """
+    )
+
+    db.execute(
+        """
+          INSERT INTO vec_movies(movie_id, synopsis_embedding, is_favorited, genre, title, num_reviews, mean_rating)
+          VALUES
+            (1, '[1]', 0, 'horror', 'The Conjuring', 153, 4.6),
+            (2, '[2]', 0, 'comedy', 'Dumb and Dumber', 382, 2.6),
+            (3, '[3]', 0, 'scifi', 'Interstellar', 53, 5.0),
+            (4, '[4]', 0, 'fantasy', 'The Lord of the Rings: The Fellowship of the Ring', 210, 4.2),
+            (5, '[5]', 1, 'documentary', 'An Inconvenient Truth', 93, 3.4),
+            (6, '[6]', 1, 'horror', 'Hereditary', 167, 4.7),
+            (7, '[7]', 1, 'comedy', 'Anchorman: The Legend of Ron Burgundy', 482, 2.9),
+            (8, '[8]', 0, 'scifi', 'Blade Runner 2049', 301, 5.0),
+            (9, '[9]', 1, 'fantasy', 'Harry Potter and the Sorcerer''s Stone', 134, 4.1),
+            (10, '[10]', 0, 'documentary', 'Free Solo', 66, 3.2),
+            (11, '[11]', 1, 'horror', 'Get Out', 88, 4.9),
+            (12, '[12]', 0, 'comedy', 'The Hangover', 59, 2.8),
+            (13, '[13]', 1, 'scifi', 'The Matrix', 423, 4.5),
+            (14, '[14]', 0, 'fantasy', 'Pan''s Labyrinth', 275, 3.6),
+            (15, '[15]', 1, 'documentary', '13th', 191, 4.4),
+            (16, '[16]', 0, 'horror', 'It Follows', 314, 4.3),
+            (17, '[17]', 1, 'comedy', 'Step Brothers', 74, 3.0),
+            (18, '[18]', 1, 'scifi', 'Inception', 201, 5.0),
+            (19, '[19]', 1, 'fantasy', 'The Shape of Water', 399, 2.7),
+            (20, '[20]', 1, 'documentary', 'Won''t You Be My Neighbor?', 186, 4.8),
+            (21, '[21]', 1, 'scifi', 'Gravity', 342, 4.0),
+            (22, '[22]', 1, 'scifi', 'Dune', 451, 4.4),
+            (23, '[23]', 1, 'scifi', 'The Martian', 522, 4.6),
+            (24, '[24]', 1, 'horror', 'A Quiet Place', 271, 4.3),
+            (25, '[25]', 1, 'fantasy', 'The Chronicles of Narnia: The Lion, the Witch and the Wardrobe', 310, 3.9);
+
+        """
+    )
+
+    assert vec0_shadow_table_contents(db, "vec_movies") == snapshot()
+    assert (
+        exec(
+            db,
+            """
+          select
+            movie_id,
+            title,
+            genre,
+            num_reviews,
+            mean_rating,
+            is_favorited,
+            distance
+          from vec_movies
+          where synopsis_embedding match '[15.5]'
+            and genre = 'scifi'
+            and num_reviews between 100 and 500
+            and mean_rating > 3.5
+            and k = 5;
+        """,
+        )
+        == snapshot()
+    )
+
+    assert (
+        exec(
+            db,
+            "select movie_id, genre, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and genre = 'horror'",
+        )
+        == snapshot()
+    )
+    assert (
+        exec(
+            db,
+            "select movie_id, genre, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and genre = 'comedy'",
+        )
+        == snapshot()
+    )
+    assert (
+        exec(
+            db,
+            "select movie_id, num_reviews, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and num_reviews between 100 and 500",
+        )
+        == snapshot()
+    )
+    assert (
+        exec(
+            db,
+            "select movie_id, num_reviews, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and num_reviews >= 500",
+        )
+        == snapshot()
+    )
+    assert (
+        exec(
+            db,
+            "select movie_id, mean_rating, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and mean_rating < 3.0",
+        )
+        == snapshot()
+    )
+    assert (
+        exec(
+            db,
+            "select movie_id, mean_rating, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and mean_rating between 4.0 and 5.0",
+        )
+        == snapshot()
+    )
+
+    assert exec(
+        db,
+        "select movie_id, is_favorited, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and is_favorited = TRUE",
+    ) == snapshot(name="bool-eq-true")
+    assert exec(
+        db,
+        "select movie_id, is_favorited, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and is_favorited != TRUE",
+    ) == snapshot(name="bool-ne-true")
+    assert exec(
+        db,
+        "select movie_id, is_favorited, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and is_favorited = FALSE",
+    ) == snapshot(name="bool-eq-false")
+    assert exec(
+        db,
+        "select movie_id, is_favorited, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and is_favorited != FALSE",
+    ) == snapshot(name="bool-ne-false")
+
+    # EVIDENCE-OF: V10145_26984
+    assert exec(
+        db,
+        "select movie_id, is_favorited, distance from vec_movies where synopsis_embedding match '[100]' and k = 5 and is_favorited >= 999",
+    ) == snapshot(name="bool-other-op")
+
+
+def test_errors(db, snapshot):
+    db.execute("create virtual table v using vec0(vector float[1], t text)")
+    db.execute("insert into v(vector, t) values ('[1]', 'aaaaaaaaaaaax')")
+
+    assert exec(db, "select * from v") == snapshot()
+
+    # EVIDENCE-OF: V15466_32305
+    db.set_authorizer(
+        authorizer_deny_on(sqlite3.SQLITE_READ, "v_metadatatext00", "data")
+    )
+    assert exec(db, "select * from v") == snapshot()
+
+
+def authorizer_deny_on(operation, x1, x2=None):
+    def _auth(op, p1, p2, p3, p4):
+        if op == operation and p1 == x1 and p2 == x2:
+            return sqlite3.SQLITE_DENY
+        return sqlite3.SQLITE_OK
+
+    return _auth
+
+
+def exec(db, sql, parameters=()):
+    try:
+        rows = db.execute(sql, parameters).fetchall()
+    except (sqlite3.OperationalError, sqlite3.DatabaseError) as e:
+        return {
+            "error": e.__class__.__name__,
+            "message": str(e),
+        }
+    a = []
+    for row in rows:
+        o = OrderedDict()
+        for k in row.keys():
+            o[k] = row[k]
+        a.append(o)
+    result = OrderedDict()
+    result["sql"] = sql
+    result["rows"] = a
+    return result
+
+
+def vec0_shadow_table_contents(db, v):
+    shadow_tables = [
+        row[0]
+        for row in db.execute(
+            "select name from sqlite_master where name like ? order by 1", [f"{v}_%"]
+        ).fetchall()
+    ]
+    o = {}
+    for shadow_table in shadow_tables:
+        if shadow_table.endswith("_info"):
+            continue
+        o[shadow_table] = exec(db, f"select * from {shadow_table}")
+    return o
--- a/tests/test-partition-keys.py
+++ b/tests/test-partition-keys.py
@@ -111,5 +111,7 @@ def vec0_shadow_table_contents(db, v):
     ]
     o = {}
     for shadow_table in shadow_tables:
+        if shadow_table.endswith("_info"):
+            continue
         o[shadow_table] = exec(db, f"select * from {shadow_table}")
     return o