omnigraph/docs/user/search/embeddings.md
Ragnor Comerford 30377c453b
fix(embedding): address PR review feedback (RFC-012 Phase 2)
openai-alias host (Cursor): OMNIGRAPH_EMBED_PROVIDER=openai now defaults its base URL to https://api.openai.com/v1 (model text-embedding-3-large), while openai-compatible/unset keep the OpenRouter gateway default. The default is derived from the alias rather than the Provider enum, so an operator's stated intent can no longer be silently routed to OpenRouter; an explicit OMNIGRAPH_EMBED_BASE_URL still wins. New test from_env_openai_alias_uses_openai_host_not_openrouter.

single model source of truth (Cursor): remove the EmbedSpec.model field. The provider config is authoritative for the model, so a spec can no longer declare a model that is silently ignored while the API uses another (the wrong-space-vectors footgun); the embed summary reports the model actually resolved. Correct by construction rather than a truthful-echo patch.

stale @embed docs (Codex): docs/user/schema/index.md and docs/dev/execution.md still claimed @embed embeds at ingest; corrected to the real contract (catalog annotation; vectors supplied or pre-filled by 'omnigraph embed'). Also documented the openai-vs-OpenRouter base default in embeddings.md.

Greptile's RFC-status note is declined: the repo lifecycle keeps an RFC Status: Proposed while its PR is open and flips to Accepted on merge.
2026-06-15 18:37:34 +02:00

# Embeddings
OmniGraph embeds text through a **single, provider-independent client** resolved from one
`EmbeddingConfig { provider, model, base_url, api_key }`. The same resolved config is used by the query-time
auto-embed of a string in `nearest($v, "string")` and by the offline `omnigraph embed` file pipeline, so
query vectors and document vectors share one model and one vector space.
## Providers
| `provider` | Wire shape | Use it for |
|---|---|---|
| `openai-compatible` (default) | `POST {base}/embeddings`, bearer auth, `{model, input, dimensions}` | **OpenRouter** (the default gateway: one key for many models), OpenAI direct (the `openai` alias defaults to OpenAI's own host), or a self-hosted endpoint (vLLM / Ollama / LM Studio) |
| `gemini` | `POST {base}/models/{model}:embedContent`, `x-goog-api-key`, with `RETRIEVAL_QUERY` / `RETRIEVAL_DOCUMENT` task types | Reaching Google's `generativelanguage` API directly |
| `mock` | none — deterministic offline vectors | Tests and local dev without a key |

Vectors are stored L2-normalized as `FixedSizeList(Float32, dim)`. The requested output dimension follows the
target column width and is sent as `outputDimensionality` (Gemini) or `dimensions` (OpenAI-compatible).
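To make "L2-normalized" concrete, here is a minimal sketch of the operation on a single vector. The function name `l2_normalize` and the choice to leave an all-zero vector untouched are assumptions for illustration, not OmniGraph's actual code.

```rust
/// Hypothetical helper: scale `v` in place so its L2 norm is 1.
/// An all-zero vector has no direction, so this sketch leaves it unchanged.
fn l2_normalize(v: &mut [f32]) {
    // Euclidean length: square root of the sum of squares.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}
```

For unit-length vectors, cosine similarity reduces to a plain dot product, which is one common reason to normalize before storing.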
## Configuration (environment)
| Variable | Meaning |
|---|---|
| `OMNIGRAPH_EMBED_PROVIDER` | `openai-compatible` (default, → OpenRouter) \| `openai` (→ OpenAI's own host) \| `gemini` \| `mock` |
| `OMNIGRAPH_EMBED_BASE_URL` | endpoint base; defaults to `https://openrouter.ai/api/v1` (`openai-compatible`/unset), `https://api.openai.com/v1` (`openai`), or `https://generativelanguage.googleapis.com/v1beta` (`gemini`); an explicit value always wins |
| `OMNIGRAPH_EMBED_MODEL` | model id; defaults to `openai/text-embedding-3-large` (OpenRouter), `text-embedding-3-large` (`openai`), or `gemini-embedding-2` (`gemini`) |
| `OPENROUTER_API_KEY` / `OPENAI_API_KEY` | api key for `openai-compatible` (OpenRouter preferred) |
| `GEMINI_API_KEY` | api key for `gemini` |
| `OMNIGRAPH_EMBED_QUERY_DEADLINE_MS` | total wall-clock budget for one embed call across all retries (default `60000`; `0` = unbounded) |
| `OMNIGRAPH_EMBED_TIMEOUT_MS` | per-request HTTP timeout (default `30000`) |
| `OMNIGRAPH_EMBED_RETRY_ATTEMPTS` / `OMNIGRAPH_EMBED_RETRY_BACKOFF_MS` | retry policy (defaults `4` / `200`) |
| `OMNIGRAPH_EMBEDDINGS_MOCK` | set truthy to force the deterministic mock provider |

The zero-config path is OpenRouter: set `OPENROUTER_API_KEY` and run. To reach Gemini, set
`OMNIGRAPH_EMBED_PROVIDER=gemini` and `GEMINI_API_KEY`.
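The defaults in the table can be read as a small lookup keyed on the provider value, with explicit settings taking precedence. The sketch below mirrors that reading; `Resolved` and `resolve` are hypothetical names, each argument stands for one environment variable (`None` when unset), and the handling of `mock` and unknown values is an assumption, so treat it as an illustration rather than the real resolver.

```rust
/// Hypothetical result type: the endpoint and model a provider setting resolves to.
#[derive(Debug, PartialEq)]
struct Resolved {
    base_url: String,
    model: String,
}

/// Hypothetical helper mirroring the defaults table (not OmniGraph's actual code).
/// `provider`, `base_url`, and `model` stand for OMNIGRAPH_EMBED_PROVIDER,
/// OMNIGRAPH_EMBED_BASE_URL, and OMNIGRAPH_EMBED_MODEL.
fn resolve(provider: Option<&str>, base_url: Option<&str>, model: Option<&str>) -> Option<Resolved> {
    let (default_base, default_model) = match provider.unwrap_or("openai-compatible") {
        "openai-compatible" => ("https://openrouter.ai/api/v1", "openai/text-embedding-3-large"),
        "openai" => ("https://api.openai.com/v1", "text-embedding-3-large"),
        "gemini" => ("https://generativelanguage.googleapis.com/v1beta", "gemini-embedding-2"),
        // Assumption: `mock` needs no endpoint, and unknown values are rejected here.
        _ => return None,
    };
    Some(Resolved {
        // An explicit value wins over the provider default.
        base_url: base_url.unwrap_or(default_base).to_string(),
        model: model.unwrap_or(default_model).to_string(),
    })
}
```

Under this reading, `resolve(Some("openai"), None, None)` lands on OpenAI's own host, while passing a base URL for a self-hosted endpoint overrides whichever default the provider would have picked.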
### Behavior notes
- **Bounded latency.** Each embed call is bounded by the total deadline `OMNIGRAPH_EMBED_QUERY_DEADLINE_MS`, so a
degraded provider cannot hold a read for the full retry envelope (every attempt running to its per-request timeout, plus backoff).
- **Reuse.** The query path builds the client once per graph handle (on the first `nearest($v, "string")`
that needs embedding) and reuses it, keeping the provider connection pool warm. A graph that never embeds
needs no provider key.
- **Observability.** Embed calls emit `tracing` events under `target = "omnigraph::embedding"` (provider,
model, dim, attempt, elapsed, outcome).
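The interaction between the retry policy and the total deadline can be sketched as a retry loop that checks one wall-clock budget. Assuming `4` counts total attempts, four attempts that each run to the default 30 s timeout would exceed the default 60 s deadline, so the deadline is what actually bounds the call. The function name `embed_with_deadline` and its signature are hypothetical, the `0` = unbounded case is not modeled, and this synchronous sketch cannot interrupt an attempt already in flight (the per-request timeout covers that).

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: retry `call` up to `attempts` times with a fixed `backoff`,
/// but stop as soon as `deadline` of total wall-clock time has elapsed.
/// `call` receives the 1-based attempt number.
fn embed_with_deadline<T, E>(
    mut call: impl FnMut(u32) -> Result<T, E>,
    attempts: u32,
    backoff: Duration,
    deadline: Duration,
) -> Option<T> {
    let start = Instant::now();
    for attempt in 1..=attempts {
        if start.elapsed() >= deadline {
            return None; // budget exhausted before this attempt could start
        }
        if let Ok(value) = call(attempt) {
            return Some(value);
        }
        if attempt < attempts {
            // Never sleep past what is left of the budget.
            let remaining = deadline.saturating_sub(start.elapsed());
            sleep(backoff.min(remaining));
        }
    }
    None
}
```

A fixed backoff keeps the sketch short; whether the real client uses fixed or growing delays is not stated here.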
## `@embed` schema annotation
Mark a Vector property with `@embed("source_text_property")`. Today this is a **catalog annotation** consumed
by the query typechecker and linter: it records which String property is the embedding source and lets
`nearest($v, "string")` auto-embed a query string for comparison against that vector column.
**It does not embed at ingest.** Stored vectors are supplied directly in your load data, or pre-filled by the
offline `omnigraph embed` pipeline below. (Ingest-time execution of `@embed` is a planned enhancement.)
## CLI `omnigraph embed` (offline file pipeline)
Operates on **JSONL files** (not on a graph), using the same resolved provider config. Three modes (mutually
exclusive):
- (default) `fill_missing` — only embed rows whose target field is empty
- `--reembed-all` — overwrite all
- `--clean` — strip embeddings

Inputs are either a single seed manifest YAML or `--input/--output/--spec`. The selectors `--type T` and `--select T:field=value` filter rows. The pipeline streams JSONL to JSONL.
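The three modes differ only in what they do to each row's target field. The sketch below models that per-row decision with an in-memory stand-in for a JSONL row; `Mode`, `Row`, `apply`, and the `embed` callback are hypothetical names, and treating an empty array the same as a missing field is an assumption.

```rust
/// The three mutually exclusive modes described above.
#[derive(Clone, Copy)]
enum Mode {
    FillMissing, // default
    ReembedAll,  // --reembed-all
    Clean,       // --clean
}

/// Hypothetical in-memory stand-in for one JSONL row.
struct Row {
    text: String,
    embedding: Option<Vec<f32>>,
}

/// Apply one mode to a row; `embed` stands in for the provider call.
fn apply(mode: Mode, row: &mut Row, embed: impl Fn(&str) -> Vec<f32>) {
    match mode {
        Mode::FillMissing => {
            // Assumption: a missing field and an empty array both count as "empty".
            let is_empty = row.embedding.as_ref().map_or(true, |e| e.is_empty());
            if is_empty {
                row.embedding = Some(embed(&row.text));
            }
        }
        Mode::ReembedAll => row.embedding = Some(embed(&row.text)),
        Mode::Clean => row.embedding = None,
    }
}
```

Because `FillMissing` skips rows that already carry a vector, rerunning it over the same file should only call the provider for rows added since the last run.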
## Migration
This release has no backwards-compatibility shim (pre-release). The default provider is now OpenRouter, and
the legacy `OMNIGRAPH_GEMINI_BASE_URL` is removed. A graph whose vectors were produced with
`gemini-embedding-2-preview` should either re-embed, or pin the query-time embedder to match by setting
`OMNIGRAPH_EMBED_PROVIDER=gemini` and `OMNIGRAPH_EMBED_MODEL=gemini-embedding-2-preview` (the stored and query
vectors must come from the same model to be comparable).