mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-18 02:24:27 +02:00
docs(rfc): add RFC-012 provider-independent embedding configuration
Records the design for one resolved EmbeddingConfig behind a sealed Provider enum (OpenAiCompatible primary, covering OpenRouter as the default gateway, OpenAI direct, and self-hosted endpoints; native Gemini for task-type asymmetry; Mock). Identity recorded in the schema IR with query-time same-space validation, NFR floor (deadline, observability, single normalize), and a 5-phase rollout. Corrects the prior handoff's misdiagnosis (the OpenAI client it blamed was dead code). Links from docs/dev/index.md.
This commit is contained in:
parent
c3d7639377
commit
5b83e6b56d
2 changed files with 286 additions and 0 deletions
|
|
@ -81,6 +81,7 @@ Working documents for in-flight feature work. Removed when the work lands.
|
|||
| Unify CLI embedded/remote access paths — parity referee, shared wire-DTO crate, `GraphClient` trait, declared plane capabilities | [rfc-009-unify-access-paths.md](rfc-009-unify-access-paths.md) |
|
||||
| Restructure the CLI around explicit planes — one graph-addressing model, declared capability surface, plane-grouped help (expands RFC-009 Phase 4) | [rfc-010-cli-planes-restructure.md](rfc-010-cli-planes-restructure.md) |
|
||||
| CLI refactoring — one addressing & config model post-`omnigraph.yaml`: scope + `--graph` + derived access path, served-default / privileged-direct, profiles, named queries, capability classifier (completes RFC-008) | [rfc-011-cli-refactoring.md](rfc-011-cli-refactoring.md) |
|
||||
| Provider-independent embedding configuration — one resolved `EmbeddingConfig` + sealed provider enum (Gemini/OpenAI/Mock), identity recorded in the schema IR, query-time same-space validation, NFR floor | [rfc-012-embedding-provider-config.md](rfc-012-embedding-provider-config.md) |
|
||||
|
||||
## Boundary
|
||||
|
||||
|
|
|
|||
285
docs/dev/rfc-012-embedding-provider-config.md
Normal file
285
docs/dev/rfc-012-embedding-provider-config.md
Normal file
|
|
@ -0,0 +1,285 @@
|
|||
# RFC: Provider-Independent Embedding Configuration
|
||||
|
||||
**Status:** Proposed
|
||||
**Date:** 2026-06-15
|
||||
**Builds on:** the engine embedding client (`crates/omnigraph/src/embedding.rs`), the `@embed` catalog
|
||||
annotation (`omnigraph-compiler/src/catalog`), the reserved cluster `embeddings`/`providers` fields
|
||||
([cluster-config-specs.md](cluster-config-specs.md), [rfc-007-operator-config.md](rfc-007-operator-config.md)
|
||||
for the secret-resolution pattern).
|
||||
**Target release:** staged — NFR floor first, then the provider-independent config core; ingest-time `@embed`
|
||||
execution is a separate later phase.
|
||||
|
||||
## Summary
|
||||
|
||||
OmniGraph's embedding subsystem is **hardwired to a single provider (Google Gemini)** and has no recorded
|
||||
link between the model that produced a stored vector and the model that embeds a query string. Today that
|
||||
happens to be self-consistent (one live client embeds both sides), but it is consistent by accident, not by
|
||||
construction: the provider is hardcoded, the model is a moving `-preview` target, nothing validates that a
|
||||
query vector and a stored vector share a space, and the one configurable knob (key + base URL) cannot change
|
||||
the provider or model.
|
||||
|
||||
This RFC makes embedding **provider-independent**: one resolved `EmbeddingConfig { provider, model, base_url,
|
||||
api_key, dim, normalize }` behind a sealed provider abstraction, resolved once and shared by every embedder.
|
||||
The **primary variant is OpenAI-compatible** — a single request/response shape (`POST {base}/embeddings`,
|
||||
`{model, input, dimensions}`) that covers **OpenRouter** (the recommended default gateway, one key for Gemini,
|
||||
OpenAI, Mistral, BGE, Qwen, sentence-transformers, …), OpenAI direct, and any self-hosted OpenAI-compatible
|
||||
endpoint (vLLM, Ollama, LM Studio, Together). A native **Gemini** (`generativelanguage`) variant is retained
|
||||
for shops that want to hit Google directly with its `RETRIEVAL_QUERY`/`RETRIEVAL_DOCUMENT` task-type
|
||||
asymmetry, plus a deterministic **Mock**. The embedding *identity* (provider + model + dim) is recorded in the
|
||||
schema IR so it travels with the data, and a query whose resolved embedder cannot match the stored vectors'
|
||||
recorded identity is **rejected with a typed error instead of silently ranking across vector spaces.**
|
||||
Provider/endpoint wiring lands on the already-reserved cluster `providers.embedding` field; secrets follow the
|
||||
existing operator-credential pattern; no secret ever enters the schema.
|
||||
|
||||
This RFC supersedes the framing in `docs/user/search/embeddings.md` that described "two embedding clients
|
||||
with different defaults" — one of those clients was dead code with zero callers and has been removed (see
|
||||
Phase 1); the OpenAI request shape returns as a first-class *provider variant* of the one client, not as a
|
||||
second parallel client.
|
||||
|
||||
## Motivation
|
||||
|
||||
This work originated in an external handoff that reported a live cross-provider bug: gemini-3072 stored
|
||||
vectors compared against OpenAI-1536 query vectors, silently. Investigation against the current source showed
|
||||
the reported mechanism is **inaccurate** — the OpenAI client it blamed (`omnigraph-compiler/src/embedding.rs`)
|
||||
was `pub(crate)`, `#![allow(dead_code)]`, and had **zero callers**; the live `nearest("string")` path and the
|
||||
offline `omnigraph embed` CLI both use the engine **Gemini** client; and `@embed` does no ingest-time
|
||||
embedding at all. So the documented happy path is self-consistent. But the investigation surfaced four real
|
||||
problems the handoff's instincts correctly smelled:
|
||||
|
||||
- **P1 — Provider is hardwired.** The one live client builds Google `generativelanguage` requests; only key +
|
||||
base URL are configurable, not the provider or model. A non-Gemini shop cannot use `nearest("string")`
|
||||
without a Gemini key, and cannot make it produce non-Gemini vectors. If they store their own vectors and
|
||||
query with `nearest("string")`, the query is embedded with Gemini → a silent cross-space ranking. This is
|
||||
the handoff's failure, reached by a different cause.
|
||||
- **P2 — A dead, divergent second client + stale docs** invited exactly the misdiagnosis the handoff made.
|
||||
- **P3 — No same-space guarantee recorded with the data.** Nothing stamps which model/dim produced a stored
|
||||
vector, so write-side and read-side embedders can drift with no validation.
|
||||
- **P4 — `@embed` is declarative-in-name-only.** It records a source property for the typechecker but never
|
||||
embeds at ingest; the docs claimed otherwise.
|
||||
|
||||
Per the project's first principle, the lower-liability shape is **one provider-independent client with the
|
||||
identity recorded next to the data**, not N independently-defaulted clients kept in lockstep by discipline.
|
||||
Hardcoding one provider mortgages every future "we need OpenAI / a local model / Vertex" against a rewrite;
|
||||
recording identity once closes the silent-wrong-results class by construction.
|
||||
|
||||
## Current state — which API we actually use
|
||||
|
||||
| | Live engine client (`crates/omnigraph/src/embedding.rs`) | Deleted dead client (was `omnigraph-compiler/src/embedding.rs`) |
|
||||
|---|---|---|
|
||||
| Provider | **Google Gemini Developer API** (`generativelanguage`, *not* Vertex AI) | OpenAI |
|
||||
| Endpoint | `POST {base}/models/{model}:embedContent` | `POST {base}/embeddings` |
|
||||
| Auth | header `x-goog-api-key`, env `GEMINI_API_KEY` | `Authorization: Bearer`, env `OPENAI_API_KEY` |
|
||||
| Model | `gemini-embedding-2-preview` (hardcoded) | `text-embedding-3-small` (env `NANOGRAPH_EMBED_MODEL`) |
|
||||
| Base default | `https://generativelanguage.googleapis.com/v1beta` | `https://api.openai.com/v1` |
|
||||
| Request body | `{model, content:{parts:[{text}]}, taskType, outputDimensionality}` | `{model, input:[…], dimensions}` |
|
||||
| Response | `{embedding:{values:[f32]}}` | `{data:[{index, embedding:[f32]}]}` |
|
||||
| Task types | `RETRIEVAL_QUERY` / `RETRIEVAL_DOCUMENT` | none |
|
||||
| Status | **live** — used by `nearest("string")` and `omnigraph embed` | **removed in Phase 1** (zero callers) |
|
||||
|
||||
Both shapes honour a requested output dimensionality (Gemini `outputDimensionality`, OpenAI `dimensions`)
|
||||
driven by the target column width, so dimension is already schema-driven. The two known shapes are exactly the
|
||||
two initial provider variants this RFC defines — the OpenAI shape returns from git history as a `Provider`
|
||||
variant of the single client.
|
||||
|
||||
## Guide-level explanation
|
||||
|
||||
### Configuring a provider (operator view)
|
||||
|
||||
Pick a provider for the graph in `cluster.yaml` (the team-owned surface), referencing a secret by name. The
|
||||
recommended default routes through OpenRouter (OpenAI-compatible, one key for many models):
|
||||
|
||||
```yaml
|
||||
providers:
|
||||
embedding:
|
||||
default:
|
||||
kind: openai-compatible # openai-compatible | gemini | mock
|
||||
base_url: https://openrouter.ai/api/v1
|
||||
model: google/gemini-embedding-2 # or openai/text-embedding-3-large, mistralai/mistral-embed, …
|
||||
dimension: 3072
|
||||
api_key: ${OPENROUTER_API_KEY}
|
||||
```
|
||||
|
||||
The same `openai-compatible` kind points at OpenAI direct (`base_url: https://api.openai.com/v1`,
|
||||
`model: text-embedding-3-large`) or a self-hosted endpoint (vLLM/Ollama/LM Studio) by changing `base_url`. Use
|
||||
`kind: gemini` only to reach Google's `generativelanguage` API directly (it keeps the query/document
|
||||
task-type asymmetry that the OpenAI-compatible shape does not expose).
|
||||
|
||||
The zero-config tier keeps working with env only (`OMNIGRAPH_EMBED_PROVIDER`, `OMNIGRAPH_EMBED_BASE_URL`,
|
||||
`OMNIGRAPH_EMBED_MODEL`, and the provider api-key env — `OPENROUTER_API_KEY` / `OPENAI_API_KEY` /
|
||||
`GEMINI_API_KEY`), so no cluster file is required for a single-graph setup.
|
||||
|
||||
### Recording identity in the schema
|
||||
|
||||
`@embed` grows optional arguments that pin the embedding identity to the vector column:
|
||||
|
||||
```pg
|
||||
node Doc {
|
||||
slug: String @key
|
||||
text: String
|
||||
v: Vector(3072) @embed("text", model="gemini-embedding-2", dim=3072) @index
|
||||
}
|
||||
```
|
||||
|
||||
The single-argument form `@embed("text")` keeps working unchanged. The recorded identity persists in the
|
||||
schema IR (`_schema.ir.json`) and so travels with `schema apply` and `schema show`.
|
||||
|
||||
### What a mismatch looks like
|
||||
|
||||
If the resolved read-side embedder cannot produce the recorded identity (wrong model, wrong dim, wrong
|
||||
provider), `nearest($v, "string")` fails with a typed error naming both sides, instead of returning a
|
||||
plausible-but-meaningless ranking. Changing the recorded identity on an existing column is a loud schema-apply
|
||||
refusal (it is a re-embed, a deliberate migration step), reusing the migration planner's existing
|
||||
annotation-change rejection.
|
||||
|
||||
## Reference-level design
|
||||
|
||||
### One client, sealed provider abstraction
|
||||
|
||||
Replace the two-variant `EmbeddingTransport` with a resolved config plus a sealed provider enum:
|
||||
|
||||
```text
|
||||
EmbeddingConfig { provider: Provider, model, base_url, api_key, dim, normalize }
|
||||
enum Provider {
|
||||
OpenAiCompatible, // POST {base}/embeddings, Bearer auth, {model, input, dimensions} → {data:[{embedding,index}]}
|
||||
// covers OpenRouter (default gateway), OpenAI direct, vLLM/Ollama/LM Studio/Together
|
||||
Gemini, // POST {base}/models/{model}:embedContent, x-goog-api-key, with RETRIEVAL_QUERY/DOCUMENT task types
|
||||
Mock, // deterministic, offline
|
||||
}
|
||||
struct EmbeddingClient { config, http, retry, deadline }
|
||||
```
|
||||
|
||||
`Provider` owns the per-API differences (endpoint suffix, auth header, request JSON, response JSON, task-type
|
||||
support); the client owns retry/backoff, the deadline, normalization, and tracing — all provider-independent.
|
||||
**OpenRouter is not a distinct variant** — it is `OpenAiCompatible` with `base_url =
|
||||
https://openrouter.ai/api/v1`, which is the point: one OpenAI-compatible shape gives provider-independence
|
||||
across every model OpenRouter fronts, so the gateway does the multi-provider fan-out and OmniGraph carries one
|
||||
request shape. The native `Gemini` variant exists only for direct-to-Google with task-type asymmetry. An enum
|
||||
(not a trait) is the earned complexity for this small, first-party set; if third-party plug-in providers are
|
||||
ever needed, the enum becomes a trait behind the same `EmbeddingConfig` surface without touching callers.
|
||||
|
||||
The OpenAI-compatible `input` accepts an **array**, giving batch embedding for free — which the later
|
||||
ingest phase needs for throughput, and which removes the open dependency on Gemini's native
|
||||
`batchEmbedContents`.
|
||||
|
||||
### Config resolution (resolved once, shared)
|
||||
|
||||
Precedence, highest first: cluster `providers.embedding.<name>` profile → env (`OMNIGRAPH_EMBED_*`, provider
|
||||
api-key env) → built-in defaults. The api-key is resolved through the existing operator credential chain
|
||||
(`${NAME}` → env / `~/.omnigraph/credentials` / server `TokenSource`); it never lives in the schema or any
|
||||
checked-in file. Resolution happens once; the resolved client is shared by `nearest("string")` and the
|
||||
offline CLI (replacing the per-query `EmbeddingClient::from_env()` rebuild at `exec/query.rs:238`).
|
||||
|
||||
### Identity recorded in the schema IR (not a new store)
|
||||
|
||||
The `@embed` args serialize into `PropertyIR.annotations` → `_schema.ir.json`, which `schema apply` already
|
||||
persists atomically and which the catalog (the one thing `nearest()` reads at query time) is built from. No
|
||||
new metadata store, no manifest column, no extra read on the query path. The migration planner already rejects
|
||||
non-description annotation changes as `UnsupportedChange`, so "recorded identity is immutable without a
|
||||
deliberate re-embed migration" is the default behaviour, not new code. (A second, optional copy in Lance
|
||||
field metadata — co-located with the vectors — is available later by activating the currently no-op
|
||||
`UpdatePropertyMetadata` migration step; out of scope here.)
|
||||
|
||||
### Query-time validation
|
||||
|
||||
`resolve_nearest_query_vec` compares the resolved read-side identity against the column's recorded identity
|
||||
before embedding; on mismatch it returns a typed `OmniError` naming recorded vs resolved (model, dim,
|
||||
provider). This is the only behaviour that closes P3 by construction.
|
||||
|
||||
### NFR floor (independent of the provider work)
|
||||
|
||||
- **Deadline:** wrap the query-time embed in a total-operation deadline (`OMNIGRAPH_EMBED_QUERY_DEADLINE_MS`)
|
||||
so a degraded provider cannot hang a read for the current ~121 s worst case (4 × 30 s timeout + backoff).
|
||||
- **Observability:** `tracing` span per embed call (provider, model, dim, attempts, outcome, elapsed; `warn!`
|
||||
per retry; token usage when the provider returns it). The subsystem has zero instrumentation today.
|
||||
- **Single normalization:** one `normalize_vector` (the dead client carried a divergent second copy; removed
|
||||
in Phase 1).
|
||||
- **Stable model:** make the model configurable and default to a stable (non-`-preview`) model once the GA
|
||||
name is confirmed.
|
||||
|
||||
### Ingest-time `@embed` (later phase, not this RFC's core)
|
||||
|
||||
Making `@embed` embed at ingest is a separate phase with a hard constraint: embedding is a slow, external,
|
||||
**non-idempotent** side effect, so it must run **entirely before staging** — in the pure in-memory phase,
|
||||
before any `stage_*`/Lance HEAD move, alongside the existing constraint validation — so a mid-load provider
|
||||
failure aborts with zero drift. It must never sit inside or after the commit protocol, because the recovery
|
||||
sweep cannot re-run or undo an external embedding. It also needs a content-hash skip (so `load --mode
|
||||
overwrite` does not re-bill every row), batching, and a bounded-concurrency stage. Specified here only to fix
|
||||
the design constraint; deferred to its own RFC/phase.
|
||||
|
||||
### Phasing (implementation order)
|
||||
|
||||
| Phase | Scope | Demo |
|
||||
|---|---|---|
|
||||
| **1 — NFR floor + dead-client removal** | deadline, observability, single normalize, configurable model, delete dead client + `NANOGRAPH_*` | a hung provider fails at the deadline; embed calls traced; `rg NANOGRAPH_` empty |
|
||||
| **2 — Provider-independent config** | `EmbeddingConfig` + `Provider` enum (OpenAiCompatible covering OpenRouter/OpenAI/local, Gemini, Mock); env-first resolution; client reuse | point `base_url` at OpenRouter, run `nearest("string")`, get correct neighbours vs OpenRouter-stored vectors; CLI shares the config |
|
||||
| **3 — Record identity in schema IR** | `@embed` args grammar + catalog + IR persistence | `schema show` reflects recorded model/dim |
|
||||
| **4 — Query-time validation** | compare resolved vs recorded; typed error; planner refusal on identity change | stored model A vs read model B → loud error, never silent garbage |
|
||||
| **5 — Cluster provider wiring** | un-reserve `providers.embedding`; `${NAME}` resolution | provider profile resolved from `cluster.yaml`; legacy `omnigraph.yaml` untouched |
|
||||
| later | ingest-time `@embed` (Shape C) | separate RFC |
|
||||
|
||||
## Invariants & deny-list check
|
||||
|
||||
- **Invariant 9 (integrity failures are loud):** strengthened — query-time identity mismatch becomes a typed
|
||||
error instead of silent wrong results.
|
||||
- **Invariant 10 (query semantics are first-class IR concepts):** embedding identity becomes IR/catalog data,
|
||||
not an out-of-band env guess.
|
||||
- **Invariant 11 (transport stays at the boundary):** strengthened — Phase 1 removes the HTTP client + async
|
||||
runtime (`reqwest`, `tokio`) from `omnigraph-compiler`, whose own manifest advertises "Zero Lance
|
||||
dependency"; the embedding HTTP client lives only in the engine.
|
||||
- **Invariant 12 / secret handling:** api-keys resolve through the existing credential chain; never in schema
|
||||
or checked-in config.
|
||||
- **Invariant 13 (bounded & observable):** addressed — the deadline bounds latency; tracing makes the
|
||||
subsystem observable.
|
||||
- **Deny-list — "silent fallback / dropped rows":** the cross-space ranking is exactly a silent-wrong-result;
|
||||
this RFC closes it.
|
||||
- **Deny-list — "new write paths that advance Lance HEAD before manifest publish without a recovery
|
||||
sidecar":** the ingest phase (deferred) explicitly keeps embedding *before* staging, so it does not create a
|
||||
new HEAD-advancing write path. No invariant is weakened.
|
||||
|
||||
## Drawbacks & alternatives
|
||||
|
||||
- **Do nothing.** The happy path works today, so the live risk is narrow (P1 + P3). But the provider hardwiring
|
||||
and missing validation are a latent silent-wrong-results class that bites the first non-Gemini user.
|
||||
- **Interim env-only provider switch (no schema record).** Cheaper, but leaves the same-space guarantee to
|
||||
operator discipline (fails P3). Folded in as Phase 2's env-first resolution, with Phases 3–4 adding the
|
||||
record/validate guarantee.
|
||||
- **Trait-based provider plug-ins now.** Rejected as unearned complexity for two first-party providers; the
|
||||
enum upgrades to a trait behind the same surface if needed.
|
||||
- **Stamp identity in the manifest or Lance field metadata instead of the IR.** The manifest is the wrong
|
||||
granularity; field metadata needs net-new wiring and a query-path dataset open. The IR is where `@embed`
|
||||
already lives and is already read at query time (see spike).
|
||||
|
||||
## Reversibility
|
||||
|
||||
Mostly reversible. Phases 1–2 and 5 are code/config (env, CLI, cluster keys) and cheap to undo. Phase 3
|
||||
(recording identity in the schema IR) is **near-permanent** — it changes the on-disk `_schema.ir.json` shape
|
||||
and the schema hash — so it earns the most scrutiny: the single-arg `@embed` form stays byte-compatible, and
|
||||
recorded identity is additive (absent identity = today's behaviour). Provider request/response shapes are
|
||||
external API contracts, not our format, so adding providers is reversible.
|
||||
|
||||
## Gateway tradeoff (OpenRouter)
|
||||
|
||||
Routing through OpenRouter (the default) buys provider-independence with one key and one billing relationship,
|
||||
batch input, and access to the GA `google/gemini-embedding-2`. Costs to accept, all controllable:
|
||||
|
||||
- **Extra network hop** → more query-path latency. The Phase-1 deadline bounds it; the cache mitigates repeats.
|
||||
- **Text transits a third party.** OpenRouter's `provider: { data_collection }` routing preference controls
|
||||
retention; shops with strict residency requirements use `kind: gemini`/`openai-compatible` pointed at the
|
||||
provider (or a self-hosted endpoint) directly instead of the gateway. Provider-independence means this is a
|
||||
config change, not a code change.
|
||||
- **Loses Gemini's task-type asymmetry** when Gemini is reached via the OpenAI-compatible gateway (both sides
|
||||
embed symmetrically). This is a retrieval-quality cost, **not** a same-space correctness cost — both stored
|
||||
and query vectors take the identical path, so they stay in one space by construction. Shops that want the
|
||||
asymmetry use `kind: gemini`.
|
||||
|
||||
## Unresolved questions
|
||||
|
||||
- GA Gemini model name — **resolved:** `google/gemini-embedding-2` (via OpenRouter) / `gemini-embedding-2`
|
||||
(direct), 128–3072 dims (recommended 768/1536/3072). Default flips off `-preview` in Phase 2.
|
||||
- Gemini `batchEmbedContents` availability — **moot** when going through the OpenAI-compatible gateway (its
|
||||
`input` array batches); still relevant only for the direct `kind: gemini` path.
|
||||
- Identity granularity: per-vector-property args vs one graph-level default profile referenced by name.
|
||||
- Whether to backfill recorded identity for existing graphs, or treat absent-identity as "unvalidated, legacy"
|
||||
permanently.
|
||||
- Default model for the zero-config tier: `google/gemini-embedding-2` vs `openai/text-embedding-3-large`
|
||||
(both 3072-capable) — pick the project default.
|
||||
Loading…
Add table
Add a link
Reference in a new issue