doc updates

This commit is contained in:
Alex Garcia 2024-06-22 16:46:33 -07:00
parent df48ac2416
commit b62f6f19a8
31 changed files with 751 additions and 97 deletions

View file

@ -0,0 +1,5 @@
# Vector Arithmetic
- `vec_add()`
- `vec_sub()`
- `vec_mean()`

120
site/guides/binary-quant.md Normal file
View file

@ -0,0 +1,120 @@
# Binary Quantization
"Quantization" refers to a variety of methods and techniques for reducing the
size of vectors in a vector index. **Binary quantization** (BQ) refers to a
specific technique where each individual floating point element in a vector is
reduced to a single bit, typically by assigning `0` to negative numbers and `1`
to positive numbers.
For example, in this 8-dimensional `float32` vector:
```json
[-0.73, -0.80, 0.12, -0.73, 0.79, -0.11, 0.23, 0.97]
```
Applying binary quantization would result in the following `bit` vector:
```json
[0, 0, 1, 0, 1, 0, 1, 1]
```
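As a rough illustration, the sign-based rule above can be sketched in a few lines of Python. The `binary_quantize` helper is hypothetical, and treating an exact `0.0` as `0` is an assumption, since the text above only covers negative and positive values.

```python
def binary_quantize(vector):
    """Map each element to 1 if it is positive, otherwise 0."""
    # Assumption: an exact 0.0 is treated like a negative value and maps to 0.
    return [1 if x > 0 else 0 for x in vector]


vector = [-0.73, -0.80, 0.12, -0.73, 0.79, -0.11, 0.23, 0.97]
bits = binary_quantize(vector)
print(bits)  # [0, 0, 1, 0, 1, 0, 1, 1]
```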
The original 8-dimensional `float32` vector requires `8 * 4 = 32` bytes of space
to store. For 1 million vectors, that would be `32MB`. On the other hand, the
binary quantized 8-dimensional vector can be stored in a single byte — one bit
per element. For 1 million vectors, that would be just `1MB`, a 32x reduction!
Though keep in mind, you're bound to lose a lot of quality when reducing 32 bits
of information to 1 bit. [Over-sampling and re-scoring](#re-scoring) will help a
lot.
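The storage arithmetic above can be checked with a small calculation. This is only a sketch: `index_size_bytes` is a hypothetical helper that counts raw vector bytes and ignores any per-row or per-chunk overhead a real index would add.

```python
def index_size_bytes(num_vectors, dimensions, bits_per_element):
    """Raw storage for the vectors alone, ignoring any index overhead."""
    return num_vectors * dimensions * bits_per_element // 8


float32_size = index_size_bytes(1_000_000, 8, 32)  # 4 bytes per element
bit_size = index_size_bytes(1_000_000, 8, 1)       # 1 bit per element

print(float32_size)              # 32000000 bytes, i.e. 32MB
print(bit_size)                  # 1000000 bytes, i.e. 1MB
print(float32_size // bit_size)  # 32
```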
The main goal of BQ is to dramatically reduce the size of your vector index,
resulting in faster searches that use fewer resources. This is especially useful
in `sqlite-vec`, which is (currently) brute-force only and meant to run on small
devices. BQ is an easy, low-cost method to make larger vector datasets easier to
manage.
## Binary Quantization with `sqlite-vec`
The `sqlite-vec` extension offers a `vec_quantize_binary()` SQL scalar function,
which applies binary quantization to a `float32` or `int8` vector. For every
element in a given vector, it will assign `0` to negative values and `1` to
positive values, and pack the resulting bits into a `BLOB`.
```sqlite
select vec_quantize_binary('[-0.73, -0.80, 0.12, -0.73, 0.79, -0.11, 0.23, 0.97]');
-- X'd4'
```
The single byte `0xd4` in hexadecimal is `11010100` in binary.
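Note that `11010100` reads as the reverse of the bit vector `[0, 0, 1, 0, 1, 0, 1, 1]`. That is consistent with the first element being stored in the least-significant bit of the byte. The sketch below assumes that bit order; it is inferred from the example output above rather than from the `sqlite-vec` source, and `pack_bits` is a hypothetical helper.

```python
def pack_bits(bits):
    """Pack 0/1 values into bytes, first element in the lowest bit."""
    if len(bits) % 8 != 0:
        raise ValueError("bit vector length must be a multiple of 8")
    out = bytearray(len(bits) // 8)
    for i, bit in enumerate(bits):
        if bit:
            # Element i lands in byte i // 8, at bit position i % 8.
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


packed = pack_bits([0, 0, 1, 0, 1, 0, 1, 1])
print(packed.hex())              # d4
print(format(packed[0], "08b"))  # 11010100
```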
<!-- TODO what https://github.com/asg017/sqlite-vec/issues/23 -->
## Demo
```sqlite
create virtual table vec_movies using vec0(
synopsis_embedding bit[768]
);
```
```sqlite
insert into vec_movies(rowid, synopsis_embedding)
VALUES (:id, vec_quantize_binary(:vector));
```
```sqlite
select
rowid,
distance
from vec_movies
where synopsis_embedding match vec_quantize_binary(:query)
order by distance
limit 20;
```
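Bit vectors are usually compared with Hamming distance: the number of bit positions where two vectors differ. The sketch below illustrates that metric on packed bytes in plain Python. It assumes this is what the `distance` column reports for `bit` columns, and `hamming_distance` is a hypothetical helper, not the extension's implementation.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    # XOR leaves a 1 wherever the two bytes disagree; count those 1s.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


print(hamming_distance(b"\xd4", b"\xd4"))  # 0
print(hamming_distance(b"\xd4", b"\xd5"))  # 1
```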
### Re-scoring
```sqlite
create virtual table vec_movies using vec0(
synopsis_embedding float[768],
synopsis_embedding_coarse bit[768]
);
```
```sqlite
insert into vec_movies(rowid, synopsis_embedding, synopsis_embedding_coarse)
VALUES (:id, :vector, vec_quantize_binary(:vector));
```
```sqlite
with coarse_matches as (
select
rowid,
synopsis_embedding
from vec_movies
where synopsis_embedding_coarse match vec_quantize_binary(:query)
order by distance
limit 20 * 8 -- over-sample: fetch 8x the final result count
)
select
rowid,
vec_distance_L2(synopsis_embedding, :query)
from coarse_matches
order by 2
limit 20;
```
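The same over-sample-then-re-score idea can be sketched outside of SQL. This is a toy, pure-Python illustration under stated assumptions: vectors live in a plain dict, the coarse pass uses Hamming distance on sign bits, and all helper names are hypothetical. A real index would store the quantized vectors instead of recomputing them per query.

```python
import math


def quantize(vector):
    """Binary-quantize a vector into an int bitmask (1 for positive values)."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    return bin(a ^ b).count("1")


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(vectors, query, k, oversample=8):
    """vectors: dict of rowid -> list of floats. Returns [(rowid, distance)]."""
    query_bits = quantize(query)
    # Coarse pass: rank by Hamming distance, keep k * oversample candidates.
    coarse = sorted(
        vectors, key=lambda rowid: hamming(quantize(vectors[rowid]), query_bits)
    )[: k * oversample]
    # Re-score: rank only those candidates by full-precision L2 distance.
    rescored = sorted((l2(vectors[rowid], query), rowid) for rowid in coarse)
    return [(rowid, dist) for dist, rowid in rescored[:k]]


vectors = {1: [1.0, 1.0], 2: [0.9, 0.1], 3: [-1.0, -1.0], 4: [0.5, 0.5]}
print(search(vectors, [1.0, 0.9], k=2))
```

In this toy data, rows 1, 2, and 4 all have the same sign bits as the query, so the coarse pass alone cannot tell them apart. Re-scoring against the full-precision vectors is what recovers the correct ordering.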
## Benchmarks
## Model support
Certain embedding models, like [Nomic](https://nomic.ai/)'s
[`nomic-embed-text-v1.5`](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5)
text embedding model and
[mixedbread.ai](https://www.mixedbread.ai/blog/mxbai-embed-2d-large-v1)'s
[`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1)
are specifically trained to perform well after binary quantization.
Other embedding models may not, but you can still try BQ and see if it works
for your datasets. If your vectors are normalized (i.e. every element is between
`-1.0` and `1.0`), there's a good chance you will see acceptable results with BQ.

49
site/guides/matryoshka.md Normal file
View file

@ -0,0 +1,49 @@
# Matryoshka (Adaptive-Length) Embeddings
Matryoshka embeddings are a new class of embedding models introduced in the
TODO-YYY paper [_TODO title_](https://arxiv.org/abs/2205.13147). They allow one
to truncate excess dimensions in large vectors, without losing much quality.
Let's say your embedding model generates 1024-dimensional vectors. If you have 1
million of these 1024-dimensional vectors, they would take up `4.096 GB` of
space! With ordinary embeddings, you can't reduce the dimensions without losing
a lot of quality: if you were to remove half of the dimensions, leaving
512-dimensional vectors, you could expect to also lose 50% or more of the
quality of results. There are other dimensionality-reduction techniques, like
[PCA](#TODO), but they require a complicated and expensive training process.
Matryoshka embeddings, on the other hand, _can_ be truncated without losing much
quality. Using [`mixedbread.ai`](#TODO)'s `mxbai-embed-large-v1` model, they claim
that
They are called "Matryoshka" embeddings because ... TODO
## Matryoshka Embeddings with `sqlite-vec`
You can use a combination of [`vec_slice()`](/api-reference#vec_slice) and
[`vec_normalize()`](/api-reference#vec_normalize) to truncate Matryoshka
embeddings and re-normalize the result.
```sql
select
vec_normalize(vec_slice(title_embeddings, 0, 256)) as title_embeddings_256d
from vec_articles;
```
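To make the two steps concrete, here is a minimal Python sketch of the same truncate-then-normalize operation. It assumes `vec_normalize()` performs L2 normalization, and `truncate_and_normalize` is a hypothetical helper rather than part of `sqlite-vec`.

```python
import math


def truncate_and_normalize(vector, dims):
    """Keep the first `dims` elements, then rescale them to unit L2 length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


truncated = truncate_and_normalize([3.0, 4.0, 12.0], 2)
print(truncated)
```

Re-normalizing matters because a truncated vector no longer has unit length, which would skew distance metrics that expect normalized inputs.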
## Benchmarks
## Supported Models
https://supabase.com/blog/matryoshka-embeddings#which-granularities-were-openais-text-embedding-3-models-trained-on

- `text-embedding-3-small`: 1536, 512
- `text-embedding-3-large`: 3072, 1024, 256

https://x.com/ZainHasan6/status/1757519325202686255

- `text-embedding-3-large`: 3072, 1536, 1024, 512

https://www.mixedbread.ai/blog/binary-mrl

- `mxbai-embed-large-v1`: 1024, 512, 256, 128, 64
- `nomic-embed-text-v1.5`: 768, 512, 256, 128, 64

View file

@ -0,0 +1,4 @@
- page_size
- memory mapping
- in-memory index
- chunk_size (?)

4
site/guides/rag.md Normal file
View file

@ -0,0 +1,4 @@
# Retrieval Augmented Generation (RAG)
- "memories"?
- chunking

View file

@ -0,0 +1,27 @@
# Scalar Quantization (SQ)
"Quantization" refers to a variety of methods and techniques for reducing the
size of vectors in a vector index. **Scalar quantization** (SQ) refers to a
specific technique where each individual floating point element in a vector is
scaled to a smaller element type, like `float16` or `int8`.
Most embedding models generate `float32` vectors. Each `float32` takes up 4
bytes of space. This can add up, especially when working with a large number of
vectors or vectors with many dimensions. However, if you scale them to `float16`
or `int8` vectors, they only take up 2 bytes and 1 byte of space per element
respectively, saving you precious space at the expense of some quality.
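As a rough sketch of what scaling to `int8` can look like, the Python below linearly maps floats in `[-1.0, 1.0]` onto the 256 available `int8` values. The range, the clamping, and the `quantize_int8_unit` name are all assumptions for illustration, not a description of how `sqlite-vec` implements it.

```python
def quantize_int8_unit(vector):
    """Linearly map floats in [-1.0, 1.0] to integers in [-128, 127]."""
    step = 2.0 / 255  # width of one bucket across the [-1.0, 1.0] range
    out = []
    for x in vector:
        x = max(-1.0, min(1.0, x))  # clamp out-of-range values (assumption)
        out.append(int(round((x + 1.0) / step)) - 128)
    return out


print(quantize_int8_unit([-1.0, -0.25, 0.5, 1.0]))
```

Each element now fits in a single byte, at the cost of collapsing nearby `float32` values into the same bucket.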
```sql
select vec_quantize_float16(vec_f32('[]'), 'unit');
select vec_quantize_int8(vec_f32('[]'), 'unit');
select vec_quantize('float16', vec_f32('...'));
select vec_quantize('int8', vec_f32('...'));
select vec_quantize('bit', vec_f32('...'));
select vec_quantize('sqf16', vec_f32('...'));
select vec_quantize('sqi8', vec_f32('...'));
select vec_quantize('bq2', vec_f32('...'));
```
## Benchmarks

View file