This commit is contained in:
Alex Garcia 2024-07-31 12:55:03 -07:00
parent 4febdff11a
commit 356f75cca7
17 changed files with 350 additions and 166 deletions

View file

@ -1,5 +1,7 @@
# Using `sqlite-vec` in Datasette
[![Datasette](https://img.shields.io/pypi/v/datasette-sqlite-vec.svg?color=B6B6D9&label=Datasette+plugin&logoColor=white&logo=python)](https://datasette.io/plugins/datasette-sqlite-vec)
```bash
datasette install datasette-sqlite-vec
```

View file

@ -1,5 +1,7 @@
# Using `sqlite-vec` in Go
[![Go Reference](https://pkg.go.dev/badge/github.com/asg017/sqlite-vec-go-bindings/cgo.svg)](https://pkg.go.dev/github.com/asg017/sqlite-vec-go-bindings/cgo) [![Go Reference](https://pkg.go.dev/badge/github.com/asg017/sqlite-vec-go-bindings/ncruces.svg)](https://pkg.go.dev/github.com/asg017/sqlite-vec-go-bindings/ncruces)
There are two ways you can embed `sqlite-vec` into Go applications: a CGO option
for libraries like
[`github.com/mattn/go-sqlite3`](https://github.com/mattn/go-sqlite3), or a
@ -8,14 +10,87 @@ WASM-based option with
## Option 1: CGO
If using [`github.com/mattn/go-sqlite3`](https://github.com/mattn/go-sqlite3) or another CGO-based SQLite library, then use the `github.com/asg017/sqlite-vec-go-bindings/cgo` module to embed `sqlite-vec` into your Go application.
```bash
go get -u github.com/asg017/sqlite-vec/bindings/go/cgo
go get -u github.com/asg017/sqlite-vec-go-bindings/cgo
```
This will compile and statically link `sqlite-vec` into your project. The initial build will be slow, but later builds will be cached and much faster.
Use `sqlite_vec.Auto()` to enable `sqlite-vec` functions in all future database connections. Also `sqlite_vec.Cancel()` is available to undo `Auto()`.
```go
package main
import (
"database/sql"
"log"
sqlite_vec "github.com/asg017/sqlite-vec-go-bindings/cgo"
_ "github.com/mattn/go-sqlite3"
)
func main() {
sqlite_vec.Auto()
db, err := sql.Open("sqlite3", ":memory:")
if err != nil {
log.Fatal(err)
}
defer db.Close()
var vecVersion string
err = db.QueryRow("select vec_version()").Scan(&vecVersion)
if err != nil {
log.Fatal(err)
}
log.Printf("sqlite_version=%s, vec_version=%s\n",vecVersion)
}
```
## Option 2: WASM based with `ncruces/go-sqlite3`
```
go
[`github.com/ncruces/go-sqlite3`](https://github.com/ncruces/go-sqlite3) is an alternative SQLite Go driver that avoids CGO by using a custom WASM build of SQLite. To use `sqlite-vec` from this library, use the specicial WASM binary provided in `github.com/asg017/sqlite-vec-go-bindings/ncruces`.
```bash
go get -u github.com/asg017/sqlite-vec-go-bindings/ncruces
```
```go
package main
import (
_ "embed"
"log"
_ "github.com/asg017/sqlite-vec-go-bindings/ncruces"
"github.com/ncruces/go-sqlite3"
)
func main() {
db, err := sqlite3.Open(":memory:")
if err != nil {
log.Fatal(err)
}
stmt, _, err := db.Prepare(`SELECT sqlite_version(), vec_version()`)
if err != nil {
log.Fatal(err)
}
stmt.Step()
log.Printf("vec_version=%s\n", stmt.ColumnText(0))
stmt.Close()
}
```
The `github.com/asg017/sqlite-vec-go-bindings/ncruces` package embeds a custom WASM build of SQLite, so there's no need to use `github.com/ncruces/go-sqlite3/embed`.
## Working with vectors in Go
If vectors are provided as a list of floats, use `SerializeFloat32(list)` to serialize them into the compact BLOB format that `sqlite-vec` expects.
```go
TODO
```

View file

@ -56,41 +56,48 @@ accessor to bind as a parameter to `sqlite-vec` SQL functions.
```js
// TODO
const embedding = new Float32Array([0.1, 0.2, 0.3, 0.4]);
const stmt = db.prepare("INSERT INTO vss_demo VALUES (?)");
stmt.run(embedding.buffer);
const stmt = db.prepare("select vec_length(?)");
console.log(stmt.run(embedding.buffer));
```
## Node.js
Here's a quick recipe of using `sqlite-vec` with [`better-sqlite3`](https://github.com/WiseLibs/better-sqlite3) in Node.js.
Here's a quick recipe of using `sqlite-vec` with
[`better-sqlite3`](https://github.com/WiseLibs/better-sqlite3) in Node.js.
```js
```
See [`simple-node/demo.mjs`](https://github.com/asg017/sqlite-vec/blob/main/examples/simple-node/demo.mjs)
See
[`simple-node/demo.mjs`](https://github.com/asg017/sqlite-vec/blob/main/examples/simple-node/demo.mjs)
for a more complete Node.js demo.
## Deno
Here's a quick recipe of using `sqlite-vec` with [`jsr:@db/sqlite`](https://jsr.io/@db/sqlite) in Deno. It will only work on Deno version `1.44` or greater, because of a bug in previous Deno version.
Here's a quick recipe of using `sqlite-vec` with
[`jsr:@db/sqlite`](https://jsr.io/@db/sqlite) in Deno. It will only work on Deno
version `1.44` or greater, because of a bug in previous Deno version.
Keep in mind, the `better-sqlite3` example above also works in Deno, you just need to prefix the `better-sqlite3` import with `npm:`, like `import * from "npm:better-sqlite3"`.
Keep in mind, the `better-sqlite3` example above also works in Deno, you just
need to prefix the `better-sqlite3` import with `npm:`, like
`import * from "npm:better-sqlite3"`.
```ts
```
See [`simple-deno/demo.ts`](https://github.com/asg017/sqlite-vec/blob/main/examples/simple-deno/demo.ts)
See
[`simple-deno/demo.ts`](https://github.com/asg017/sqlite-vec/blob/main/examples/simple-deno/demo.ts)
for a more complete Deno demo.
## Bun
Here's a quick recipe of using `sqlite-vec` with [`bun:sqlite`](https://bun.sh/docs/api/sqlite) in Bun. The `better-sqlite3` example above also works with Bun.
Here's a quick recipe of using `sqlite-vec` with
[`bun:sqlite`](https://bun.sh/docs/api/sqlite) in Bun. The `better-sqlite3`
example above also works with Bun.
```ts
```
See [`simple-bun/demo.ts`](https://github.com/asg017/sqlite-vec/blob/main/examples/simple-bun/demo.ts)
See
[`simple-bun/demo.ts`](https://github.com/asg017/sqlite-vec/blob/main/examples/simple-bun/demo.ts)
for a more complete Bun demo.

View file

@ -34,126 +34,77 @@ print(f"vec_version={vec_version}")
### Lists
If the vectors you are working with are provided as a list of floats, you can convert them into the compact BLOB format that `sqlite-vec` uses with [`struct.pack()`](https://docs.python.org/3/library/struct.html#struct.pack).
If your vectors in Python are provided as a list of floats, you can
convert them into the compact BLOB format that `sqlite-vec` uses with
`serialize_float32()`. This will internally call [`struct.pack()`](https://docs.python.org/3/library/struct.html#struct.pack).
```python
import struct
def serialize(vector: List[float]) -> bytes:
""" serializes a list of floats into a compact "raw bytes" format """
return struct.pack('%sf' % len(vector), *vector)
from sqlite_vec import serialize_float32
embedding = [0.1, 0.2, 0.3, 0.4]
result = db.execute('select vec_length(?)', [serialize(embedding)]).fetchone()[0]
result = db.execute('select vec_length(?)', [serialize_float32(embedding)])
print(result) # 4
print(result.fetchone()[0]) # 4
```
### NumPy Arrays
If your vectors are from `numpy` arrays, the Python SQLite package allows you to pass it along as-is. Make sure you convert your array elements to 32-bit floats with [`.astype(np.float32)`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html), as some embedding services will use `np.float64` elements.
If your vectors are NumPy arrays, the Python SQLite package allows you to
pass it along as-is, since NumPy arrays implement [the Buffer protocol](https://docs.python.org/3/c-api/buffer.html). Make sure you cast your array elements to 32-bit floats
with
[`.astype(np.float32)`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html),
as some embeddings will use `np.float64`.
```python
import numpy as np
import sqlite3
import sqlite_vec
db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)
db.execute("CREATE VIRTUAL TABLE vec_demo(sample_embedding float[4])")
embedding = np.array([0.1, 0.2, 0.3, 0.4])
db.execute(
"INSERT INTO vec_demo(sample_embedding) VALUES (?)", [embedding.astype(np.float32)]
)
"SELECT vec_length(?)", [embedding.astype(np.float32)]
) # 4
```
## Recipes
### OpenAI
https://platform.openai.com/docs/guides/embeddings/what-are-embeddings?lang=python
TODO
```python
from openai import OpenAI
import sqlite3
import sqlite_vec
texts = [
'Capri-Sun is a brand of juice concentratebased drinks manufactured by the German company Wild and regional licensees.',
'Shohei Ohtani is a Japanese professional baseball pitcher and designated hitter for the Los Angeles Dodgers of Major League Baseball.',
'George V was King of the United Kingdom and the British Dominions, and Emperor of India, from 6 May 1910 until his death in 1936.',
'Alan Mathison Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.',
'Alaqua Cox is a Native American (Menominee) actress.'
]
# change ':memory:' to a filepath to persist data
db = sqlite3.connect(':memory:')
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)
client = OpenAI()
response = client.embeddings.create(
input=[texts],
model="text-embedding-3-small"
)
print(response.data[0].embedding)
```
### llamafile
https://github.com/Mozilla-Ocho/llamafile
TODO
### llama-cpp-python
https://github.com/abetlen/llama-cpp-python
TODO
### sentence-transformers (etc.)
https://github.com/UKPLab/sentence-transformers
TODO
## Using an up-to-date version of SQLite
Some features of `sqlite-vec` will require an up-to-date SQLite library. You can see what version of SQLite your Python environment uses with [`sqlite3.sqlite-version`](https://docs.python.org/3/library/sqlite3.html#sqlite3.sqlite_version), or with this one-line command:
Some features of `sqlite-vec` will require an up-to-date SQLite library. You can
see what version of SQLite your Python environment uses with
[`sqlite3.sqlite_version`](https://docs.python.org/3/library/sqlite3.html#sqlite3.sqlite_version),
or with this one-line command:
```bash
python -c 'import sqlite3; print(sqlite3.sqlite_version)'
```
Currently, **SQLite version 3.41 or higher** is recommended but not required. `sqlite-vec` will work with older version, but certain features and queries will only work correctly in >=3.41.
Currently, **SQLite version 3.41 or higher** is recommended but not required.
`sqlite-vec` will work with older versions, but certain features and queries will
only work correctly in >=3.41.
To "upgrade" the SQLite version your Python installation uses, you have a few options.
To "upgrade" the SQLite version your Python installation uses, you have a few
options.
### Compile your own SQLite version
You can compile an up-to-date version of SQLite and use some system environment variables (like `LD_PRELOAD` and `DYLD_LIBRARY_PATH`) to force Python to use a different SQLite library. [This guide](https://til.simonwillison.net/sqlite/sqlite-version-macos-python) goes into this approach in more details.
You can compile an up-to-date version of SQLite and use some system environment
variables (like `LD_PRELOAD` and `DYLD_LIBRARY_PATH`) to force Python to use a
different SQLite library.
[This guide](https://til.simonwillison.net/sqlite/sqlite-version-macos-python)
goes into this approach in more details.
Although compiling SQLite can be straightforward, there are a lot of different compilation options to consider, which makes it confusing. This also doesn't work with Windows, which statically compiles its own SQLite library.
Although compiling SQLite can be straightforward, there are a lot of different
compilation options to consider, which makes it confusing. This also doesn't
work with Windows, which statically compiles its own SQLite library.
### Use `pysqlite3`
[`pysqlite3`](https://github.com/coleifer/pysqlite3) is a 3rd party PyPi package that bundles an up-to-date SQLite library as a separate pip package.
[`pysqlite3`](https://github.com/coleifer/pysqlite3) is a 3rd party PyPi package
that bundles an up-to-date SQLite library as a separate pip package.
While it's mostly compatible with the Python `sqlite3` module, there are a few rare edge cases where the APIs don't match.
While it's mostly compatible with the Python `sqlite3` module, there are a few
rare edge cases where the APIs don't match.
### Upgrading your Python version
Sometimes installing a latest version of Python will "magically" upgrade your SQLite version as well. This is a nuclear option, as upgrading Python installations can be quite the hassle, but most Python 3.12 builds will have a very recent SQLite version.
Sometimes installing a latest version of Python will "magically" upgrade your
SQLite version as well. This is a nuclear option, as upgrading Python
installations can be quite the hassle, but most Python 3.12 builds will have a
very recent SQLite version.

View file

@ -1,9 +1,37 @@
# Using `sqlite-vec` in Ruby
https://rubygems.org/gems/sqlite-vec
![Gem](https://img.shields.io/gem/v/sqlite-vec?color=red&logo=rubygems&logoColor=white)
Ruby developers can use `sqlite-vec` with the [`sqlite-vec` Gem](https://rubygems.org/gems/sqlite-vec).
```bash
gem install sqlite-vec
```
You can then use `SqliteVss.load()` to load `sqlite-vss` SQL functions in a given SQLite connection.
```ruby
require 'sqlite3'
require 'sqlite_vec'
db = SQLite3::Database.new(':memory:')
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)
result = db.execute('SELECT vec_version()')
puts result.first.first
```
## Working with vectors in Ruby
If your embeddings are provided as a list of numbers, use `.pack("f*")` to convert them into the compact BLOB format that `sqlite-vec` uses.
```ruby
embedding = [0.1, 0.2, 0.3, 0.4]
result = db.execute("SELECT vec_length(?)", [query.pack("f*")]])
puts result.first.first # 4
```

View file

@ -1,4 +1,5 @@
# Using `sqlite-vec` in Rust
[![Crates.io](https://img.shields.io/crates/v/sqlite-vec?logo=rust)](https://crates.io/crates/sqlite-vec)
You can embed `sqlite-vec` into your Rust projects using the official
[`sqlite-vec` crate](https://crates.io/crates/sqlite-vec).
@ -18,16 +19,29 @@ SQLite library's `sqlite3_auto_extension()` function. Here's an example with
```rs
use sqlite_vec::sqlite3_vec_init;
use rusqlite::{ffi::sqlite3_auto_extension};
use rusqlite::{ffi::sqlite3_auto_extension, Result};
fn main() {
fn main()-> Result<()> {
unsafe {
sqlite3_auto_extension(Some(std::mem::transmute(sqlite3_vec_init as *const ())));
}
// future database connection will now automatically include sqlite-vec functions!
let db = Connection::open_in_memory()?;
let vec_version: String = db.query_row("select vec_version()", &[v.as_bytes()], |x| x.get(0)?)?;
println!("vec_version={vec_version}");
Ok(())
}
```
A full [`sqlite-vec` Rust demo](#TODO) is also available.
## Working with vectors in Rust
If your vectors are provided as a `Vec<f32>` type, the [`zerocopy` crate](https://crates.io/crates/zerocopy) is recommended, specifically `zerocopy::AsBytes`. This will allow you to pass in vectors into `sqlite-vec` without any copying.
```rs
let query: Vec<f32> = vec![0.1, 0.2, 0.3, 0.4];
let mut stmt = db.prepare("SELECT vec_length(?)")?;
stmt.execute(&[item.1.as_bytes()])?;
```

View file

@ -1,5 +1,7 @@
# Using `sqlite-vec` in `sqlite-utils`
![sqlite-utils](https://img.shields.io/pypi/v/sqlite-utils-sqlite-vec.svg?color=B6B6D9&label=sqlite-utils+plugin&logoColor=white&logo=python)
```bash
sqlite-utils install sqlite-utils-sqlite-vec
```

View file

@ -0,0 +1,17 @@
# `sqlite-vec` in the Browser with WebAssembly
```html
<html>
<body>
<script type="module">
import {default as init} from "https://cdn.jsdelivr.net/npm/sqlite-vec-wasm-demo@latest/sqlite3.mjs";
const sqlite3 = await init();
const db = new sqlite3.oo1.DB(":memory:");
const [sqlite_version, vec_version] = db.selectArray('select vec_version();')
console.log(`vec_version=${vec_version}`);
</script>
</body>
</html>
```