mirror of https://github.com/asg017/sqlite-vec.git
synced 2026-04-25 08:46:49 +02:00

doc updates

This commit is contained in:
parent df48ac2416
commit b62f6f19a8

31 changed files with 751 additions and 97 deletions
5 site/guides/arithmetic.md (Normal file)

@@ -0,0 +1,5 @@
# Vector Arithmetic

- `vec_add()`
- `vec_sub()`
- `vec_mean()`

120 site/guides/binary-quant.md (Normal file)

@@ -0,0 +1,120 @@
# Binary Quantization

"Quantization" refers to a variety of methods and techniques for reducing the
size of vectors in a vector index. **Binary quantization** (BQ) refers to a
specific technique where each individual floating point element in a vector is
reduced to a single bit, typically by assigning `0` to negative numbers and `1`
to positive numbers.

For example, take this 8-dimensional `float32` vector:

```json
[-0.73, -0.80, 0.12, -0.73, 0.79, -0.11, 0.23, 0.97]
```

Applying binary quantization results in the following `bit` vector:

```json
[0, 0, 1, 0, 1, 0, 1, 1]
```

The original 8-dimensional `float32` vector requires `8 * 4 = 32` bytes of space
to store. For 1 million vectors, that would be `32MB`. On the other hand, the
binary-quantized 8-dimensional vector can be stored in a single byte, one bit
per element. For 1 million vectors, that would be just `1MB`, a 32x reduction!

Keep in mind, though, that you're bound to lose a lot of quality when reducing
32 bits of information to 1 bit. [Over-sampling and re-scoring](#re-scoring)
will help a lot.

The main goal of BQ is to dramatically reduce the size of your vector index,
resulting in faster searches and fewer resources. This is especially useful in
`sqlite-vec`, which is (currently) brute-force only and meant to run on small
devices. BQ is an easy, low-cost method to make larger vector datasets easy to
manage.

## Binary Quantization in `sqlite-vec`

The `sqlite-vec` extension offers a `vec_quantize_binary()` SQL scalar function,
which applies binary quantization to a `float32` or `int8` vector. For every
element in a given vector, it will assign `0` to negative values and `1` to
positive values, and pack them into a `BLOB`.

```sqlite
select vec_quantize_binary('[-0.73, -0.80, 0.12, -0.73, 0.79, -0.11, 0.23, 0.97]');
-- X'D4'
```

The single byte `0xd4` in hexadecimal is `11010100` in binary.
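To make the packing concrete, here is a minimal Python sketch of the same transformation. The `quantize_binary` helper is hypothetical, not the extension's actual C implementation, and the least-significant-bit-first ordering is inferred from the `0xd4` example above:

```python
def quantize_binary(vector):
    """Binary-quantize a list of floats into a packed bytes object.

    Element i maps to bit (i % 8) of byte (i // 8), least-significant
    bit first: positive values become 1, everything else becomes 0.
    """
    out = bytearray((len(vector) + 7) // 8)
    for i, x in enumerate(vector):
        if x > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

packed = quantize_binary([-0.73, -0.80, 0.12, -0.73, 0.79, -0.11, 0.23, 0.97])
print(packed.hex())  # d4
```

The positive elements sit at indices 2, 4, 6, and 7, so the packed byte is `4 + 16 + 64 + 128 = 212 = 0xd4`.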

<!-- TODO what https://github.com/asg017/sqlite-vec/issues/23 -->

## Demo

```sqlite
create virtual table vec_movies using vec0(
  synopsis_embedding bit[768]
);
```

```sqlite
insert into vec_movies(rowid, synopsis_embedding)
  values (:id, vec_quantize_binary(:vector));
```

```sqlite
select
  rowid,
  distance
from vec_movies
where synopsis_embedding match vec_quantize_binary(:query)
order by distance
limit 20;
```

### Re-scoring

```sqlite
create virtual table vec_movies using vec0(
  synopsis_embedding float[768],
  synopsis_embedding_coarse bit[768]
);
```

```sqlite
insert into vec_movies(rowid, synopsis_embedding, synopsis_embedding_coarse)
  values (:id, :vector, vec_quantize_binary(:vector));
```

```sqlite
with coarse_matches as (
  select
    rowid,
    synopsis_embedding
  from vec_movies
  where synopsis_embedding_coarse match vec_quantize_binary(:query)
  order by distance
  limit 20 * 8
)
select
  rowid,
  vec_distance_L2(synopsis_embedding, :query)
from coarse_matches
order by 2
limit 20;
```

# Benchmarks

## Model support

Certain embedding models, like [Nomic](https://nomic.ai/)'s
[`nomic-embed-text-v1.5`](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5)
text embedding model and
[mixedbread.ai](https://www.mixedbread.ai/blog/mxbai-embed-2d-large-v1)'s
[`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1),
are specifically trained to perform well after binary quantization.

Other embedding models may not be, but you can still try BQ and see if it works
for your datasets. If your vectors are normalized (i.e. each element falls
between `-1.0` and `1.0`), there's a good chance you will see acceptable
results with BQ.

0 site/guides/classifiers.md (Normal file)

0 site/guides/hybrid-search.md (Normal file)

49 site/guides/matryoshka.md (Normal file)

@@ -0,0 +1,49 @@
# Matryoshka (Adaptive-Length) Embeddings

Matryoshka embeddings are a class of embedding models introduced in the 2022
paper [_Matryoshka Representation Learning_](https://arxiv.org/abs/2205.13147).
They allow one to truncate excess dimensions in large vectors without losing
much quality.

Let's say your embedding model generates 1024-dimensional vectors. If you have 1
million of these 1024-dimensional vectors, they would take up `4.096 GB` of
space! You can't simply drop dimensions without losing a lot of quality: if you
removed half of the dimensions to get 512-dimensional vectors, you could expect
to also lose 50% or more of the quality of your results. There are other
dimensionality-reduction techniques, like [PCA](#TODO), but these require a
complicated and expensive training process.

Matryoshka embeddings, on the other hand, _can_ be truncated without losing
much quality. Using [`mixedbread.ai`](#TODO)'s `mxbai-embed-large-v1` model,
they claim that

They are called "Matryoshka" embeddings because, like Russian nesting dolls,
smaller embeddings are nested inside the leading dimensions of larger ones.

## Matryoshka Embeddings with `sqlite-vec`

You can use a combination of [`vec_slice()`](/api-reference#vec_slice) and
[`vec_normalize()`](/api-reference#vec_normalize) on Matryoshka embeddings to
truncate them.

```sql
select
  vec_normalize(vec_slice(title_embeddings, 0, 256)) as title_embeddings_256d
from vec_articles;
```
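The slice-then-normalize step amounts to the following Python sketch. The `truncate_embedding` helper is hypothetical, shown only to illustrate what `vec_slice` plus `vec_normalize` compute:

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` dimensions, then re-normalize to unit length,
    mirroring vec_normalize(vec_slice(v, 0, dim))."""
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# keep only the leading 2 of 4 dimensions, then restore unit L2 norm
shorter = truncate_embedding([0.5, 0.5, 0.1, 0.2], 2)
```

Re-normalizing matters because distance comparisons (especially cosine) assume unit-length vectors, and truncation shrinks the norm.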

## Benchmarks

## Supported Models

https://supabase.com/blog/matryoshka-embeddings#which-granularities-were-openais-text-embedding-3-models-trained-on

`text-embedding-3-small`: 1536, 512
`text-embedding-3-large`: 3072, 1024, 256

https://x.com/ZainHasan6/status/1757519325202686255

`text-embedding-3-large`: 3072, 1536, 1024, 512

https://www.mixedbread.ai/blog/binary-mrl

`mxbai-embed-large-v1`: 1024, 512, 256, 128, 64

`nomic-embed-text-v1.5`: 768, 512, 256, 128, 64

4 site/guides/performance.md (Normal file)

@@ -0,0 +1,4 @@
- page_size
- memory mapping
- in-memory index
- chunk_size (?)

4 site/guides/rag.md (Normal file)

@@ -0,0 +1,4 @@
# Retrieval Augmented Generation (RAG)

- "memories"?
- chunking

27 site/guides/scalar-quant.md (Normal file)

@@ -0,0 +1,27 @@
# Scalar Quantization (SQ)

"Quantization" refers to a variety of methods and techniques for reducing the
size of vectors in a vector index. **Scalar quantization** (SQ) refers to a
specific technique where each individual floating point element in a vector is
scaled down to a smaller element type, like `float16` or `int8`.

Most embedding models generate `float32` vectors. Each `float32` takes up 4
bytes of space. This can add up, especially when working with a large number of
vectors or vectors with many dimensions. However, if you scale them to `float16`
or `int8` vectors, they only take up 2 bytes and 1 byte of space
respectively, saving you precious space at the expense of some quality.
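As an illustration of the idea, a minimal int8 scalar quantizer over a known `[-1.0, 1.0]` ("unit") range might look like this in Python. This is a hypothetical sketch of the technique, not the extension's exact scaling or rounding behavior:

```python
def quantize_int8_unit(vector):
    """Map floats in [-1.0, 1.0] linearly onto int8 values in [-127, 127]."""
    quantized = []
    for x in vector:
        q = round(x * 127)                        # scale the unit range onto int8
        quantized.append(max(-128, min(127, q)))  # clamp out-of-range inputs
    return quantized
```

Each output element fits in one byte instead of four; the reconstruction error per element is bounded by half a quantization step, roughly `1/254` of the unit range.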

```sql
select vec_quantize_float16(vec_f32('[]'), 'unit');
select vec_quantize_int8(vec_f32('[]'), 'unit');

select vec_quantize('float16', vec_f32('...'));
select vec_quantize('int8', vec_f32('...'));
select vec_quantize('bit', vec_f32('...'));

select vec_quantize('sqf16', vec_f32('...'));
select vec_quantize('sqi8', vec_f32('...'));
select vec_quantize('bq2', vec_f32('...'));
```

## Benchmarks

0 site/guides/semantic-search.md (Normal file)