This commit is contained in:
Alex Garcia 2024-07-31 12:55:03 -07:00
parent 4febdff11a
commit 356f75cca7
17 changed files with 350 additions and 166 deletions


@@ -34,126 +34,77 @@ print(f"vec_version={vec_version}")
### Lists
If the vectors you are working with are provided as a list of floats, you can convert them into the compact BLOB format that `sqlite-vec` uses with [`struct.pack()`](https://docs.python.org/3/library/struct.html#struct.pack).
If your vectors in Python are provided as a list of floats, you can
convert them into the compact BLOB format that `sqlite-vec` uses with
`serialize_float32()`. This will internally call [`struct.pack()`](https://docs.python.org/3/library/struct.html#struct.pack).
```python
import struct
from typing import List

def serialize(vector: List[float]) -> bytes:
    """ serializes a list of floats into a compact "raw bytes" format """
    return struct.pack('%sf' % len(vector), *vector)
from sqlite_vec import serialize_float32
embedding = [0.1, 0.2, 0.3, 0.4]
result = db.execute('select vec_length(?)', [serialize(embedding)]).fetchone()[0]
print(result) # 4
result = db.execute('select vec_length(?)', [serialize_float32(embedding)])
print(result.fetchone()[0]) # 4
```
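To make the BLOB format concrete, here is a minimal sketch, assuming the format is simply a run of native-endian 32-bit floats as produced by `struct.pack()`. The `serialize` helper mirrors the older snippet; `deserialize` is a hypothetical inverse added only for illustration and is not part of `sqlite-vec`.

```python
import struct
from typing import List


def serialize(vector: List[float]) -> bytes:
    """Pack floats as consecutive 32-bit values (4 bytes each)."""
    return struct.pack("%sf" % len(vector), *vector)


def deserialize(blob: bytes) -> List[float]:
    """Hypothetical inverse: unpack a BLOB back into Python floats."""
    return list(struct.unpack("%sf" % (len(blob) // 4), blob))


blob = serialize([0.1, 0.2, 0.3, 0.4])
print(len(blob))  # 16: four elements, 4 bytes each

# Values come back rounded to 32-bit precision, so compare with a tolerance.
roundtrip = deserialize(blob)
```

The round trip is lossy in the usual float32 sense: `0.1` comes back as the nearest 32-bit value, not the original 64-bit Python float.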
### NumPy Arrays
If your vectors come from `numpy` arrays, the Python SQLite package allows you to pass them along as-is. Make sure you convert your array elements to 32-bit floats with [`.astype(np.float32)`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html), as some embedding services will use `np.float64` elements.
If your vectors are NumPy arrays, the Python SQLite package allows you to
pass them along as-is, since NumPy arrays implement [the Buffer protocol](https://docs.python.org/3/c-api/buffer.html). Make sure you cast your array elements to 32-bit floats
with
[`.astype(np.float32)`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html),
as some embeddings use `np.float64` elements.
```python
import numpy as np
import sqlite3
import sqlite_vec
db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)
db.execute("CREATE VIRTUAL TABLE vec_demo USING vec0(sample_embedding float[4])")
embedding = np.array([0.1, 0.2, 0.3, 0.4])
db.execute(
    "INSERT INTO vec_demo(sample_embedding) VALUES (?)", [embedding.astype(np.float32)]
)
result = db.execute(
    "SELECT vec_length(?)", [embedding.astype(np.float32)]
).fetchone()[0]
print(result) # 4
```
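The reason the cast matters can be sketched without NumPy. The example below uses the standard library `array` module as a stand-in (an assumption made for illustration, not something `sqlite-vec` requires), since it also exposes raw bytes through the buffer protocol. The same four values take twice as many bytes as 64-bit floats, so a consumer expecting 32-bit elements would likely misread them.

```python
from array import array

values = [0.1, 0.2, 0.3, 0.4]

# array("d") stores 64-bit floats, array("f") stores 32-bit floats;
# both expose their raw bytes through the buffer protocol.
as_float64 = memoryview(array("d", values))
as_float32 = memoryview(array("f", values))

print(as_float64.itemsize, as_float64.nbytes)  # 8 32
print(as_float32.itemsize, as_float32.nbytes)  # 4 16
```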
## Recipes
### OpenAI
https://platform.openai.com/docs/guides/embeddings/what-are-embeddings?lang=python
TODO
```python
from openai import OpenAI
import sqlite3
import sqlite_vec
texts = [
    'Capri-Sun is a brand of juice concentrate-based drinks manufactured by the German company Wild and regional licensees.',
    'Shohei Ohtani is a Japanese professional baseball pitcher and designated hitter for the Los Angeles Dodgers of Major League Baseball.',
    'George V was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.',
    'Alan Mathison Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.',
    'Alaqua Cox is a Native American (Menominee) actress.'
]
# change ':memory:' to a filepath to persist data
db = sqlite3.connect(':memory:')
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)
client = OpenAI()
response = client.embeddings.create(
    # `texts` is already a list of strings, so pass it directly
    input=texts,
    model="text-embedding-3-small"
)
print(response.data[0].embedding)
```
### llamafile
https://github.com/Mozilla-Ocho/llamafile
TODO
### llama-cpp-python
https://github.com/abetlen/llama-cpp-python
TODO
### sentence-transformers (etc.)
https://github.com/UKPLab/sentence-transformers
TODO
## Using an up-to-date version of SQLite
Some features of `sqlite-vec` will require an up-to-date SQLite library. You can see what version of SQLite your Python environment uses with [`sqlite3.sqlite_version`](https://docs.python.org/3/library/sqlite3.html#sqlite3.sqlite_version), or with this one-line command:
Some features of `sqlite-vec` will require an up-to-date SQLite library. You can
see what version of SQLite your Python environment uses with
[`sqlite3.sqlite_version`](https://docs.python.org/3/library/sqlite3.html#sqlite3.sqlite_version),
or with this one-line command:
```bash
python -c 'import sqlite3; print(sqlite3.sqlite_version)'
```
Currently, **SQLite version 3.41 or higher** is recommended but not required. `sqlite-vec` will work with older versions, but certain features and queries will only work correctly in >=3.41.
Currently, **SQLite version 3.41 or higher** is recommended but not required.
`sqlite-vec` will work with older versions, but certain features and queries will
only work correctly in >=3.41.
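The same check can be done programmatically. This is a minimal sketch using the standard library's `sqlite3.sqlite_version_info` tuple; the `RECOMMENDED` constant and the `sqlite_is_recent` helper are hypothetical names introduced here, not part of `sqlite-vec`.

```python
import sqlite3

# Recommended minimum from the text above (hypothetical constant name).
RECOMMENDED = (3, 41, 0)


def sqlite_is_recent(version_info=sqlite3.sqlite_version_info) -> bool:
    """Return True if the given SQLite version tuple is at least 3.41."""
    return tuple(version_info) >= RECOMMENDED


if not sqlite_is_recent():
    print(
        f"SQLite {sqlite3.sqlite_version} is older than 3.41; "
        "some sqlite-vec features may not work correctly"
    )
```

Comparing tuples instead of the version string avoids mistakes such as `"3.9" > "3.41"` under string comparison.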
To "upgrade" the SQLite version your Python installation uses, you have a few options.
To "upgrade" the SQLite version your Python installation uses, you have a few
options.
### Compile your own SQLite version
You can compile an up-to-date version of SQLite and use some system environment variables (like `LD_PRELOAD` and `DYLD_LIBRARY_PATH`) to force Python to use a different SQLite library. [This guide](https://til.simonwillison.net/sqlite/sqlite-version-macos-python) covers this approach in more detail.
You can compile an up-to-date version of SQLite and use some system environment
variables (like `LD_PRELOAD` and `DYLD_LIBRARY_PATH`) to force Python to use a
different SQLite library.
[This guide](https://til.simonwillison.net/sqlite/sqlite-version-macos-python)
covers this approach in more detail.
Although compiling SQLite can be straightforward, there are a lot of different compilation options to consider, which makes it confusing. This also doesn't work with Windows, which statically compiles its own SQLite library.
Although compiling SQLite can be straightforward, there are a lot of different
compilation options to consider, which makes it confusing. This also doesn't
work with Windows, which statically compiles its own SQLite library.
### Use `pysqlite3`
[`pysqlite3`](https://github.com/coleifer/pysqlite3) is a third-party PyPI package that bundles an up-to-date SQLite library as a separate pip package.
[`pysqlite3`](https://github.com/coleifer/pysqlite3) is a third-party PyPI package
that bundles an up-to-date SQLite library as a separate pip package.
While it's mostly compatible with the Python `sqlite3` module, there are a few rare edge cases where the APIs don't match.
While it's mostly compatible with the Python `sqlite3` module, there are a few
rare edge cases where the APIs don't match.
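One common way to adopt it is a guarded import, sketched below under the assumption that `pysqlite3` is installed under that module name and re-exports the usual DB-API functions such as `connect()`. If it is not installed, the code falls back to the standard library module.

```python
try:
    # Third-party package; only importable if it has been installed.
    import pysqlite3 as sqlite3
except ImportError:
    # Fall back to the standard library module.
    import sqlite3

db = sqlite3.connect(":memory:")

# Ask the linked SQLite library for its own version string.
version = db.execute("select sqlite_version()").fetchone()[0]
print(version)
```

Printing the version after the import confirms which SQLite library is actually in use, which is the point of swapping modules in the first place.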
### Upgrading your Python version
Sometimes installing a newer version of Python will "magically" upgrade your SQLite version as well. This is a nuclear option, as upgrading Python installations can be quite the hassle, but most Python 3.12 builds will have a very recent SQLite version.
Sometimes installing a newer version of Python will "magically" upgrade your
SQLite version as well. This is a nuclear option, as upgrading Python
installations can be quite the hassle, but most Python 3.12 builds will have a
very recent SQLite version.