sqlite-vec/reference.yaml

sections:
meta:
title: Meta
desc: TODO
constructors:
title: Constructors
desc: TODO
op:
title: Operations
desc: TODO
distance:
title: Distance functions
desc: TODO
quantization:
title: Quantization
desc: TODO
functions:
vec_version:
params: []
section: meta
desc: Returns a version string of the current `sqlite-vec` installation.
example: select vec_version();
vec_debug:
params: []
section: meta
desc: Returns debugging information about the current `sqlite-vec` installation.
example: select vec_debug();
vec_f32:
params: [vector]
section: constructors
desc: |
Creates a float vector from a BLOB or JSON text. If a BLOB is provided,
the length must be divisible by 4, as each float takes up 4 bytes of space.
The returned value is a BLOB with 4 bytes per element, with a special [subtype](https://www.sqlite.org/c3ref/result_subtype.html)
of `223`.
example:
- select vec_f32('[.1, .2, .3, 4]');
- select subtype(vec_f32('[.1, .2, .3, 4]'));
- select vec_f32(X'AABBCCDD');
- select vec_to_json(vec_f32(X'AABBCCDD'));
- select vec_f32(X'AA');
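The 4-bytes-per-element rule for `vec_f32()` can be sketched outside SQLite. The following is a minimal illustration in Python, not the extension's implementation: the helper names `pack_f32` and `unpack_f32` are hypothetical, and the little-endian byte order is an assumption that holds on common platforms.

```python
import struct

def pack_f32(values):
    """Pack numbers as consecutive 4-byte floats (little-endian is an assumption)."""
    return struct.pack(f"<{len(values)}f", *values)

def unpack_f32(blob):
    """Decode a float32 BLOB, rejecting lengths that are not divisible by 4."""
    if len(blob) % 4 != 0:
        raise ValueError("float32 BLOB length must be divisible by 4")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

blob = pack_f32([0.1, 0.2, 0.3, 4])
print(len(blob))  # 16 bytes: 4 elements * 4 bytes each
```

Under this sketch, a 4-byte BLOB such as `X'AABBCCDD'` decodes to a single element, while a 1-byte BLOB such as `X'AA'` is rejected.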
vec_int8:
params: [vector]
section: constructors
desc: |
Creates an 8-bit integer vector from a BLOB or JSON text. If a BLOB is provided,
each byte is read as one element, as an 8-bit integer takes up 1 byte of space.
If JSON text is provided, each element must be an integer between -128 and 127 inclusive.
The returned value is a BLOB with 1 byte per element, with a special [subtype](https://www.sqlite.org/c3ref/result_subtype.html)
of `225`.
example:
- select vec_int8('[1, 2, 3, 4]');
- select subtype(vec_int8('[1, 2, 3, 4]'));
- select vec_int8(X'AABBCCDD');
- select vec_to_json(vec_int8(X'AABBCCDD'));
- select vec_int8('[999]');
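The range rule and the 1-byte-per-element layout for `vec_int8()` can be illustrated with a short Python sketch. This is an illustration of the documented behavior only; `pack_int8` and `unpack_int8` are hypothetical helper names, not part of `sqlite-vec`.

```python
import struct

def pack_int8(values):
    """Pack integers as signed bytes, enforcing the -128..127 range."""
    for v in values:
        if not isinstance(v, int) or not -128 <= v <= 127:
            raise ValueError(f"element {v!r} is not an integer in [-128, 127]")
    return struct.pack(f"{len(values)}b", *values)

def unpack_int8(blob):
    """Decode a BLOB as one signed 8-bit integer per byte."""
    return list(struct.unpack(f"{len(blob)}b", blob))

print(pack_int8([1, 2, 3, 4]).hex())             # 01020304
print(unpack_int8(bytes.fromhex("AABBCCDD")))    # [-86, -69, -52, -35]
```

In this sketch an out-of-range element such as `999` raises an error, mirroring the last example above.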
vec_bit:
params: [vector]
section: constructors
desc: |
Creates a binary vector from a BLOB.
The returned value is a BLOB with 1 byte per 8 elements, with a special [subtype](https://www.sqlite.org/c3ref/result_subtype.html)
of `224`.
example:
- select vec_bit(X'F0');
- select subtype(vec_bit(X'F0'));
- select vec_to_json(vec_bit(X'F0'));
vec_length:
params: [vector]
section: op
desc: |
Returns the number of elements in the given vector.
The vector can be `JSON`, `BLOB`, or the result of a [constructor function](#constructors).
This function will return an error if `vector` is invalid.
example:
- select vec_length('[.1, .2]');
- select vec_length(X'AABBCCDD');
- select vec_length(vec_int8(X'AABBCCDD'));
- select vec_length(vec_bit(X'AABBCCDD'));
- select vec_length(X'CCDD');
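How the element count follows from the BLOB size can be sketched in Python. This is a rough model rather than the extension's code: `element_count` is a hypothetical helper, and it assumes float32 vectors use 4 bytes per element, int8 vectors 1 byte per element, and bit vectors 8 elements per byte.

```python
def element_count(blob, kind="float32"):
    """Return the number of elements a BLOB holds for a given vector type."""
    if kind == "float32":
        if len(blob) % 4 != 0:
            raise ValueError("float32 BLOB length must be divisible by 4")
        return len(blob) // 4
    if kind == "int8":
        return len(blob)        # one element per byte
    if kind == "bit":
        return len(blob) * 8    # eight elements per byte (assumption)
    raise ValueError(f"unknown vector type: {kind}")

four_bytes = bytes.fromhex("AABBCCDD")
print(element_count(four_bytes, "float32"))  # 1
print(element_count(four_bytes, "int8"))     # 4
print(element_count(four_bytes, "bit"))      # 32
```

A plain 2-byte BLOB such as `X'CCDD'` is treated as float32 in this model and is rejected, since 2 is not divisible by 4.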
vec_add:
params: [a, b]
section: op
desc: |
Adds each element of vector `a` to the corresponding element of vector `b`, returning a new vector `c`. Both vectors
must be of the same type and same length. Only `float32` and `int8` vectors are supported.
An error is raised if either `a` or `b` is invalid, or if they are not the same type or same length.
See also [`vec_sub()`](#vec_sub).
example:
- |
select vec_add(
'[.1, .2, .3]',
'[.4, .5, .6]'
);
- |
select vec_to_json(
vec_add(
'[.1, .2, .3]',
'[.4, .5, .6]'
)
);
- |
select vec_to_json(
vec_add(
vec_int8('[1, 2, 3]'),
vec_int8('[4, 5, 6]')
)
);
- select vec_add('[.1]', vec_int8('[1]'));
- select vec_add(vec_bit(X'AA'), vec_bit(X'BB'));
vec_sub:
params: [a, b]
section: op
desc: |
Subtracts each element of vector `b` from the corresponding element of vector `a`, returning a new vector `c`. Both vectors
must be of the same type and same length. Only `float32` and `int8` vectors are supported.
An error is raised if either `a` or `b` is invalid, or if they are not the same type or same length.
See also [`vec_add()`](#vec_add).
example:
- |
select vec_sub(
'[.1, .2, .3]',
'[.4, .5, .6]'
);
- |
select vec_to_json(
vec_sub(
'[.1, .2, .3]',
'[.4, .5, .6]'
)
);
- |
select vec_to_json(
vec_sub(
vec_int8('[1, 2, 3]'),
vec_int8('[4, 5, 6]')
)
);
- select vec_sub('[.1]', vec_int8('[1]'));
- select vec_sub(vec_bit(X'AA'), vec_bit(X'BB'));
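The element-wise behavior of `vec_add()` and `vec_sub()` can be sketched on plain Python lists. This is only a model of the arithmetic and the length check; the helper names `add_vectors` and `sub_vectors` are hypothetical, and the type check and any int8 overflow handling in the extension are not modeled.

```python
def add_vectors(a, b):
    """Element-wise a[i] + b[i]; both vectors must have the same length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]

def sub_vectors(a, b):
    """Element-wise a[i] - b[i]; both vectors must have the same length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x - y for x, y in zip(a, b)]

print(add_vectors([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
print(sub_vectors([1, 2, 3], [4, 5, 6]))  # [-3, -3, -3]
```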
vec_normalize:
params: [vector]
section: op
desc: |
Performs L2 normalization on the given vector. Only float32 vectors are currently supported.
Returns an error if the input is an invalid vector or not a float32 vector.
example:
- select vec_normalize('[2, 3, 1, -4]');
- |
select vec_to_json(
vec_normalize('[2, 3, 1, -4]')
);
- |
-- for matryoshka embeddings - slice then normalize
select vec_to_json(
vec_normalize(
vec_slice('[2, 3, 1, -4]', 0, 2)
)
);
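For readers unfamiliar with L2 normalization, the computation can be sketched in Python: each element is divided by the vector's Euclidean norm so that the result has unit length. The helper name `l2_normalize` is hypothetical, and the sketch does not model how the extension treats an all-zero vector.

```python
import math

def l2_normalize(v):
    """Divide each element by the Euclidean (L2) norm of the vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

full = l2_normalize([2, 3, 1, -4])
# Slice first, then normalize, as in the Matryoshka example above.
truncated = l2_normalize([2, 3, 1, -4][0:2])

print(math.isclose(sum(x * x for x in full), 1.0))       # True
print(math.isclose(sum(x * x for x in truncated), 1.0))  # True
```

Slicing a unit-length vector generally leaves a vector shorter than unit length, which is why the truncated vector is normalized again.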
vec_slice:
params: [vector, start, end]
section: op
desc: |
Extracts a subset of `vector` from the `start` element (inclusive) to the `end` element (exclusive). TODO check
This is especially useful for [Matryoshka embeddings](#TODO), also known as "adaptive length" embeddings.
Normalize the sliced vector with [`vec_normalize()`](#vec_normalize) to get proper results.
Returns an error in the following conditions:
- If `vector` is not a valid vector
- If `start` is less than zero or greater than or equal to `end`
- If `end` is greater than the length of `vector`, or less than or equal to `start`
- If `vector` is a bitvector and `start` or `end` is not divisible by 8
example:
- select vec_slice('[1, 2,3, 4]', 0, 2);
- |
select vec_to_json(
vec_slice('[1, 2,3, 4]', 0, 2)
);
- |
select vec_to_json(
vec_slice('[1, 2,3, 4]', 2, 4)
);
- |
select vec_to_json(
vec_slice('[1, 2,3, 4]', -1, 4)
);
- |
select vec_to_json(
vec_slice('[1, 2,3, 4]', 0, 5)
);
- |
select vec_to_json(
vec_slice('[1, 2,3, 4]', 0, 0)
);
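The bounds rules listed for `vec_slice()` can be modeled on a Python list. This is a sketch of the documented error conditions for float32 and int8 vectors only; `slice_vector` is a hypothetical helper, and the bitvector divisible-by-8 rule is not modeled.

```python
def slice_vector(v, start, end):
    """Return v[start:end], applying the documented bounds checks."""
    if start < 0 or start >= end:
        raise ValueError("start must be >= 0 and less than end")
    if end > len(v):
        raise ValueError("end must not exceed the vector length")
    return v[start:end]

print(slice_vector([1, 2, 3, 4], 0, 2))  # [1, 2]
print(slice_vector([1, 2, 3, 4], 2, 4))  # [3, 4]
```

Under these rules the last three examples above (`-1, 4`, `0, 5`, and `0, 0`) all fail: the first has a negative `start`, the second an `end` past the vector length, and the third a `start` equal to `end`.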
vec_to_json:
params: [vector]
section: op
desc: |
Represents a vector as JSON text. The input vector can be a vector BLOB or JSON text.
Returns an error if `vector` is an invalid vector, or when memory cannot be allocated.
example:
- select vec_to_json(X'AABBCCDD');
- select vec_to_json(vec_int8(X'AABBCCDD'));
- select vec_to_json(vec_bit(X'AABBCCDD'));
- select vec_to_json('[1,2,3,4]');
- select vec_to_json('invalid');
vec_distance_cosine:
params: [a, b]
section: distance
desc: |
Calculates the cosine distance between vectors `a` and `b`. Only valid for float32 or int8 vectors.
Returns an error under the following conditions:
- `a` or `b` is an invalid vector
- `a` and `b` do not share the same vector element type (e.g. float32 or int8)
- `a` or `b` is a bit vector. Use [`vec_distance_hamming()`](#vec_distance_hamming) for distance calculations between two bitvectors.
- `a` and `b` do not have the same length.
example:
- select vec_distance_cosine('[1, 1]', '[2, 2]');
- select vec_distance_cosine('[1, 1]', '[-2, -2]');
- select vec_distance_cosine('[1.1, 2.2, 3.3]', '[4.4, 5.5, 6.6]');
- select vec_distance_cosine(X'AABBCCDD', X'00112233');
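As a rough guide to the values the first two examples produce, cosine distance can be sketched in Python. The sketch assumes the common definition of one minus the cosine similarity; `cosine_distance` is a hypothetical helper, and results from the extension may differ slightly because it works in float32.

```python
import math

def cosine_distance(a, b):
    """1 - (a . b) / (|a| * |b|), assuming the common definition."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

same_direction = cosine_distance([1, 1], [2, 2])       # close to 0
opposite_direction = cosine_distance([1, 1], [-2, -2])  # close to 2
print(round(same_direction, 6), round(opposite_direction, 6))
```

Vectors pointing the same way have a distance near 0 regardless of magnitude, and vectors pointing in opposite directions have a distance near 2.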
vec_distance_hamming:
params: [a, b]
section: distance
desc: |
Calculates the Hamming distance between two bitvectors `a` and `b`. Only valid for bitvectors.
Returns an error under the following conditions:
- `a` or `b` are not bitvectors
- `a` and `b` do not share the same length
- Memory cannot be allocated
example:
- select vec_distance_hamming(vec_bit(X'00'), vec_bit(X'FF'));
- select vec_distance_hamming(vec_bit(X'FF'), vec_bit(X'FF'));
- select vec_distance_hamming(vec_bit(X'F0'), vec_bit(X'44'));
- select vec_distance_hamming(X'F0', X'00');
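The Hamming distance between two bitvectors is the number of bit positions in which they differ. A minimal Python sketch, using the hypothetical helper name `hamming_distance`, XORs the bytes and counts the set bits:

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("bitvectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

print(hamming_distance(bytes.fromhex("00"), bytes.fromhex("FF")))  # 8
print(hamming_distance(bytes.fromhex("FF"), bytes.fromhex("FF")))  # 0
print(hamming_distance(bytes.fromhex("F0"), bytes.fromhex("44")))  # 4
```

The sketch only models the bit counting; it does not model the requirement that both arguments be bitvectors created with `vec_bit()`.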
vec_distance_l2:
params: [a, b]
section: distance
desc: x
example: select 'todo';
vec_quantize_binary:
params: [vector]
section: quantization
desc: x
example: select 'todo';
vec_quantize_i8:
params: [vector, "[start]", "[end]"]
section: quantization
desc: x
example: select 'todo';
#table_functions:
# vec_each:
# columns: [rowid, value]
# inputs: ["vector"]
# desc:
# example:
# vec_npy_each:
# columns: [rowid, vector]
# inputs: ["input"]
# desc:
# example:
#virtual_tables:
# vec0:
# desc:
# example:
#entrypoints:
# sqlite3_vec_init: {}
# sqlite3_vec_fs_read_init: {}
#compile_options:
# - SQLITE_VEC_ENABLE_AVX
# - SQLITE_VEC_ENABLE_NEON
# - SQLITE_VEC_OMIT_FS
#