docs(embeddings): provider-independent config surface (RFC-012 Phase 2)

Rewrite docs/user/search/embeddings.md for the resolved Provider model: the provider table, the OMNIGRAPH_EMBED_* env surface (default OpenRouter), the deadline/observability/reuse behavior, and a no-back-compat migration note for gemini-preview graphs.
2026-06-18 02:24:27 +02:00 · 2026-06-15 17:28:03 +02:00 · 2026-06-15 17:28:03 +02:00 · e7361dce49
commit e7361dce49
parent 532631bca0
1 changed files with 51 additions and 13 deletions
--- a/docs/user/search/embeddings.md
+++ b/docs/user/search/embeddings.md
@ -1,17 +1,47 @@
 # Embeddings

-OmniGraph embeds text through a **single engine-side client** (Google Gemini). It is used in two places: the
-query-time auto-embed of a string passed to `nearest($v, "string")`, and the offline `omnigraph embed` file
-pipeline. Both paths use the same client, so query vectors and CLI-produced document vectors share one model
-and one vector space.
+OmniGraph embeds text through a **single, provider-independent client** resolved from one
+`EmbeddingConfig { provider, model, base_url, api_key }`. The same resolved config is used by the query-time
+auto-embed of a string in `nearest($v, "string")` and by the offline `omnigraph embed` file pipeline, so
+query vectors and document vectors share one model and one vector space.

-## Engine embedding client
+## Providers

- Model: `gemini-embedding-2-preview`
- Env: `GEMINI_API_KEY`, `OMNIGRAPH_GEMINI_BASE_URL` (default Google generativelanguage v1beta), `OMNIGRAPH_EMBED_TIMEOUT_MS=30000`, `OMNIGRAPH_EMBED_RETRY_ATTEMPTS=4`, `OMNIGRAPH_EMBED_RETRY_BACKOFF_MS=200`, `OMNIGRAPH_EMBEDDINGS_MOCK`
- Two task types: `embed_query_text` (RETRIEVAL_QUERY) for query strings and `embed_document_text` (RETRIEVAL_DOCUMENT) for stored documents
- Exponential backoff with retryable detection (timeouts, 429, 5xx)
- Vectors are stored as L2-normalized `FixedSizeList(Float32, dim)`; the requested dimension is driven by the target column width
+| `provider` | Wire shape | Use it for |
+|---|---|---|
+| `openai-compatible` (default) | `POST {base}/embeddings`, bearer auth, `{model, input, dimensions}` | **OpenRouter** (the default gateway — one key for many models), OpenAI direct, or a self-hosted endpoint (vLLM / Ollama / LM Studio) |
+| `gemini` | `POST {base}/models/{model}:embedContent`, `x-goog-api-key`, with `RETRIEVAL_QUERY` / `RETRIEVAL_DOCUMENT` task types | Reaching Google's `generativelanguage` API directly |
+| `mock` | none — deterministic offline vectors | Tests and local dev without a key |
+
+Vectors are stored L2-normalized as `FixedSizeList(Float32, dim)`; the requested output dimension is driven by
+the target column width and sent as Gemini `outputDimensionality` / OpenAI `dimensions`.
+
+## Configuration (environment)
+
+| Variable | Meaning |
+|---|---|
+| `OMNIGRAPH_EMBED_PROVIDER` | `openai-compatible` (default) \| `gemini` \| `mock` |
+| `OMNIGRAPH_EMBED_BASE_URL` | endpoint base; default `https://openrouter.ai/api/v1` (openai-compatible) or `https://generativelanguage.googleapis.com/v1beta` (gemini) |
+| `OMNIGRAPH_EMBED_MODEL` | model id; default `openai/text-embedding-3-large` (openai-compatible) or `gemini-embedding-2` (gemini) |
+| `OPENROUTER_API_KEY` / `OPENAI_API_KEY` | api key for `openai-compatible` (OpenRouter preferred) |
+| `GEMINI_API_KEY` | api key for `gemini` |
+| `OMNIGRAPH_EMBED_QUERY_DEADLINE_MS` | total wall-clock budget for one embed call across all retries (default `60000`; `0` = unbounded) |
+| `OMNIGRAPH_EMBED_TIMEOUT_MS` | per-request HTTP timeout (default `30000`) |
+| `OMNIGRAPH_EMBED_RETRY_ATTEMPTS` / `OMNIGRAPH_EMBED_RETRY_BACKOFF_MS` | retry policy (defaults `4` / `200`) |
+| `OMNIGRAPH_EMBEDDINGS_MOCK` | set truthy to force the deterministic mock provider |
+
+The default zero-config path is OpenRouter: set `OPENROUTER_API_KEY` and run. Reaching Gemini takes
+`OMNIGRAPH_EMBED_PROVIDER=gemini` plus `GEMINI_API_KEY`.
+
+### Behavior notes
+
+- **Bounded latency.** Each embed call is wrapped in `OMNIGRAPH_EMBED_QUERY_DEADLINE_MS`, so a degraded
+  provider cannot hang a read for the full retry envelope.
+- **Reuse.** The query path builds the client once per graph handle (on the first `nearest($v, "string")`
+  that needs embedding) and reuses it, keeping the provider connection pool warm. A graph that never embeds
+  needs no provider key.
+- **Observability.** Embed calls emit `tracing` events under `target = "omnigraph::embedding"` (provider,
+  model, dim, attempt, elapsed, outcome).

 ## `@embed` schema annotation

@ -20,15 +50,23 @@ by the query typechecker and linter: it records which String property is the emb
 `nearest($v, "string")` auto-embed a query string for comparison against that vector column.

 **It does not embed at ingest.** Stored vectors are supplied directly in your load data, or pre-filled by the
-offline `omnigraph embed` pipeline below. (Ingest-time execution of `@embed` is a planned enhancement; until
-it ships, populate the vector column yourself.)
+offline `omnigraph embed` pipeline below. (Ingest-time execution of `@embed` is a planned enhancement.)

 ## CLI `omnigraph embed` (offline file pipeline)

-Operates on **JSONL files** (not on a graph), using the same engine client. Three modes (mutually exclusive):
+Operates on **JSONL files** (not on a graph), using the same resolved provider config. Three modes (mutually
+exclusive):

 - (default) `fill_missing` — only embed rows whose target field is empty
 - `--reembed-all` — overwrite all
 - `--clean` — strip embeddings

 Inputs are either a single seed manifest YAML or `--input/--output/--spec`. Selectors `--type T`, `--select T:field=value` filter rows. Streams JSONL → JSONL.
+
+## Migration
+
+This release has no backwards-compatibility shim (pre-release). The default provider is now OpenRouter, and
+the legacy `OMNIGRAPH_GEMINI_BASE_URL` is removed. A graph whose vectors were produced with
+`gemini-embedding-2-preview` should either re-embed, or pin the query-time embedder to match by setting
+`OMNIGRAPH_EMBED_PROVIDER=gemini` and `OMNIGRAPH_EMBED_MODEL=gemini-embedding-2-preview` (the stored and query
+vectors must come from the same model to be comparable).