Mirror of https://github.com/ModernRelay/omnigraph.git (synced 2026-06-24 02:38:06 +02:00)

Wire cluster embedding providers

parent 8d2128438e
commit 16e4a833c0
16 changed files with 1132 additions and 165 deletions

@@ -1,9 +1,9 @@
 # RFC: Provider-Independent Embedding Configuration

-**Status:** Proposed
+**Status:** Accepted — Phases 1-5 implemented
 **Date:** 2026-06-15
 **Builds on:** the engine embedding client (`crates/omnigraph/src/embedding.rs`), the `@embed` catalog
-annotation (`omnigraph-compiler/src/catalog`), the reserved cluster `embeddings`/`providers` fields
+annotation (`omnigraph-compiler/src/catalog`), the cluster `providers.embedding` surface
 ([cluster-config-specs.md](cluster-config-specs.md), [rfc-007-operator-config.md](rfc-007-operator-config.md)
 for the secret-resolution pattern).
 **Target release:** staged — NFR floor first, then the provider-independent config core; ingest-time `@embed`
@@ -95,14 +95,18 @@ providers:
       kind: openai-compatible # openai-compatible | gemini | mock
       base_url: https://openrouter.ai/api/v1
       model: google/gemini-embedding-2 # or openai/text-embedding-3-large, mistralai/mistral-embed, …
-      dimension: 3072
       api_key: ${OPENROUTER_API_KEY}
+graphs:
+  knowledge:
+    schema: knowledge.pg
+    embedding_provider: default
 ```

 The same `openai-compatible` kind points at OpenAI direct (`base_url: https://api.openai.com/v1`,
 `model: text-embedding-3-large`) or a self-hosted endpoint (vLLM/Ollama/LM Studio) by changing `base_url`. Use
 `kind: gemini` only to reach Google's `generativelanguage` API directly (it keeps the query/document
-task-type asymmetry that the OpenAI-compatible shape does not expose).
+task-type asymmetry that the OpenAI-compatible shape does not expose). Dimensions are schema-driven by the
+target `Vector(N)` column, not duplicated in the provider profile.

 The zero-config tier keeps working with env only (`OMNIGRAPH_EMBED_PROVIDER`, `OMNIGRAPH_EMBED_BASE_URL`,
 `OMNIGRAPH_EMBED_MODEL`, and the provider api-key env — `OPENROUTER_API_KEY` / `OPENAI_API_KEY` /
@@ -163,11 +167,12 @@ ingest phase needs for throughput, and which removes the open dependency on Gemi

 ### Config resolution (resolved once, shared)

-Precedence, highest first: cluster `providers.embedding.<name>` profile → env (`OMNIGRAPH_EMBED_*`, provider
-api-key env) → built-in defaults. The api-key is resolved through the existing operator credential chain
-(`${NAME}` → env / `~/.omnigraph/credentials` / server `TokenSource`); it never lives in the schema or any
-checked-in file. Resolution happens once; the resolved client is shared by `nearest("string")` and the
-offline CLI (replacing the per-query `EmbeddingClient::from_env()` rebuild at `exec/query.rs:238`).
+Precedence, highest first for served cluster graphs: applied cluster `providers.embedding.<name>` profile →
+env (`OMNIGRAPH_EMBED_*`, provider api-key env) → built-in defaults. The cluster `api_key` value is a
+`${NAME}` env reference resolved at server boot; plaintext never lives in the schema, state ledger, or any
+checked-in file. Resolution happens once per graph handle; the resolved client is shared by
+`nearest("string")`. Direct single-graph serving, embedded callers, and the offline CLI keep the env path
+unless they inject an `EmbeddingConfig` directly.

 ### Identity recorded in the schema IR (not a new store)

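
The precedence in the "Config resolution" hunk above (cluster profile, then env, then built-in defaults) can be sketched as a per-field lookup. This is an illustrative Python sketch, not the engine's Rust implementation: `resolve_embedding_config` is a hypothetical helper, per-field fallthrough is an assumption the RFC text does not spell out, and the default `base_url` is assumed. The env variable names and the default kind and model are taken from this commit.

```python
from typing import Mapping, Optional

# Built-in defaults. The provider kind and model come from the constants
# table changed in this commit; the base URL is an assumption.
DEFAULTS = {
    "kind": "openai-compatible",
    "base_url": "https://openrouter.ai/api/v1",  # assumed default
    "model": "openai/text-embedding-3-large",
}

# Env variables named in the RFC's zero-config tier.
ENV_KEYS = {
    "kind": "OMNIGRAPH_EMBED_PROVIDER",
    "base_url": "OMNIGRAPH_EMBED_BASE_URL",
    "model": "OMNIGRAPH_EMBED_MODEL",
}


def resolve_embedding_config(
    profile: Optional[Mapping[str, str]],
    env: Mapping[str, str],
) -> dict:
    """Resolve each field: cluster profile, then env, then built-in default."""
    profile = profile or {}
    resolved = {}
    for field, default in DEFAULTS.items():
        if profile.get(field):
            resolved[field] = profile[field]
        elif env.get(ENV_KEYS[field]):
            resolved[field] = env[ENV_KEYS[field]]
        else:
            resolved[field] = default
    return resolved


# A cluster-served graph with a bound profile ignores the env model.
served = resolve_embedding_config(
    {"kind": "openai-compatible", "model": "openai/text-embedding-3-large"},
    {"OMNIGRAPH_EMBED_MODEL": "text-embedding-3-small"},
)

# Direct single-graph serving has no profile and falls back to env.
direct = resolve_embedding_config(
    None, {"OMNIGRAPH_EMBED_MODEL": "text-embedding-3-small"}
)
```

Passing the environment in as a mapping rather than reading `os.environ` inside the function keeps the sketch testable and mirrors the "resolved once, shared" intent: the caller resolves at startup and hands the result to every query.
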
@@ -215,12 +220,12 @@ the design constraint; deferred to its own RFC/phase.
 | **2 — Provider-independent config** | `EmbeddingConfig` + `Provider` enum (OpenAiCompatible covering OpenRouter/OpenAI/local, Gemini, Mock); env-first resolution; client reuse | point `base_url` at OpenRouter, run `nearest("string")`, get correct neighbours vs OpenRouter-stored vectors; CLI shares the config |
 | **3 — Record identity in schema IR** | `@embed` args grammar + catalog + IR persistence | `schema show` reflects recorded model/dim |
 | **4 — Query-time validation** | compare resolved vs recorded; typed error; planner refusal on identity change | stored model A vs read model B → loud error, never silent garbage |
-| **5 — Cluster provider wiring** | un-reserve `providers.embedding`; `${NAME}` resolution | provider profile resolved from `cluster.yaml`; legacy `omnigraph.yaml` untouched |
+| **5 — Cluster provider wiring** | `providers.embedding` resources; `graphs.<id>.embedding_provider`; `${NAME}` resolution at server boot | provider profile resolved from applied cluster state; legacy `omnigraph.yaml` untouched |
 | later | ingest-time `@embed` (Shape C) | separate RFC |

-**Status:** Phases 1–4 are implemented (`@embed("…", model="…")` is recorded in the schema IR and validated at
-query time with a typed same-space error; an unrecorded `@embed` keeps working with no check). Phase 5 (cluster
-`providers.embedding` wiring) and ingest-time `@embed` remain.
+**Status:** Phases 1–5 are implemented (`@embed("…", model="…")` is recorded in the schema IR and validated at
+query time with a typed same-space error; an unrecorded `@embed` keeps working with no check; cluster-served
+graphs can bind an applied `providers.embedding` profile). Ingest-time `@embed` remains.

 ## Invariants & deny-list check

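
The Phase 4 behaviour summarised in the status paragraph above (a typed same-space error when the recorded and resolved models differ, and no check for an unrecorded `@embed`) might look roughly like this. It is a minimal Python sketch under stated assumptions: `EmbeddingSpaceMismatch` and `check_same_space` are hypothetical names, and comparing model identifiers by string equality is an assumption about how identity is compared.

```python
from typing import Optional


class EmbeddingSpaceMismatch(Exception):
    """Hypothetical typed error: stored and query vectors use different models."""

    def __init__(self, recorded: str, resolved: str) -> None:
        super().__init__(
            f"column was embedded with {recorded!r} "
            f"but the resolved provider uses {resolved!r}"
        )
        self.recorded = recorded
        self.resolved = resolved


def check_same_space(recorded_model: Optional[str], resolved_model: str) -> None:
    """Raise when the recorded model differs from the resolved one.

    An unrecorded @embed (recorded_model is None) is accepted with no check.
    """
    if recorded_model is None:
        return
    if recorded_model != resolved_model:
        raise EmbeddingSpaceMismatch(recorded_model, resolved_model)
```

Raising a dedicated exception type rather than returning a flag matches the "loud error, never silent garbage" goal in the phase table: a caller cannot accidentally ignore the mismatch and rank neighbours across two vector spaces.
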
@@ -13,7 +13,8 @@ catalog writes, **graph creation** (a declared graph that does not exist yet
 is initialized by apply at the derived root), **schema updates** (soft drops
 only), and — behind an explicit, digest-bound **approval** — **graph
 deletion**. It does not perform data-loss schema migrations, start servers,
-or serve anything it applies: the server still boots from `omnigraph.yaml`.
+or run data loads. A server can boot from the applied ledger with
+`omnigraph-server --cluster <config-dir | storage-root>`.

 ## Commands

@@ -57,7 +58,7 @@ The exact contract:

 ## Supported `cluster.yaml`

-Stage 3A accepts only this resource subset:
+The current config surface accepts this resource subset:

 ```yaml
 version: 1
@@ -68,9 +69,18 @@ state:
   backend: cluster
   lock: true

+providers:
+  embedding:
+    default:
+      kind: openai-compatible
+      base_url: https://openrouter.ai/api/v1
+      model: openai/text-embedding-3-large
+      api_key: ${OPENROUTER_API_KEY}
+
 graphs:
   knowledge:
     schema: knowledge.pg
+    embedding_provider: default
     queries: queries/ # discover every `query <name>` in queries/*.gq

 policies:
@@ -99,6 +109,17 @@ updates all of its queries together. Paths are relative to the config
 directory — the cluster is one explicit folder, so no `./` prefixes are
 needed.

+`providers.embedding.<name>` defines a query-time embedding provider profile
+for cluster-served graphs. A graph opts in with `embedding_provider: <name>`;
+bare names normalize to `provider.embedding.<name>`. Supported provider
+`kind` values are `openai-compatible` (default/OpenRouter-compatible),
+`openai` (OpenAI's own host), `gemini`, and `mock`. Real providers require
+`api_key: ${ENV_VAR}`; inline secrets are rejected. The env var is resolved
+only when a `--cluster` server boots, so `cluster validate`, `plan`, and
+`apply` do not need deployment secrets. `mock` is deterministic and does not
+require `api_key`. Vector dimensions stay schema-driven by the target
+`Vector(N)` column, not the provider profile.
+
 `storage:` (optional) is the **storage root URI** for everything the cluster
 stores — the state ledger, lock, content-addressed catalog, recovery
 sidecars, approval artifacts, and the derived graph roots
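
The `api_key` rules in the hunk above split into two steps: a validate-time check that never touches the environment, and a boot-time lookup. The Python sketch below illustrates that split under stated assumptions: both function names are hypothetical, and the exact `${NAME}` pattern (a whole-value reference with an identifier-style name) is an assumption rather than the documented grammar.

```python
import re
from typing import Mapping, Optional

# Accept only a whole-value ${NAME} reference; this pattern is an assumption.
ENV_REF = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")


def validate_api_key(kind: str, api_key: Optional[str]) -> Optional[str]:
    """Validate-time check: return the referenced env var name, never its value."""
    if kind == "mock":
        return None  # mock is deterministic and needs no key
    if api_key is None:
        raise ValueError(f"provider kind {kind!r} requires api_key: ${{ENV_VAR}}")
    match = ENV_REF.match(api_key)
    if match is None:
        raise ValueError("inline secrets are rejected; use api_key: ${ENV_VAR}")
    return match.group(1)


def resolve_api_key_at_boot(
    kind: str, api_key: Optional[str], env: Mapping[str, str]
) -> Optional[str]:
    """Boot-time step: only now is the environment consulted."""
    name = validate_api_key(kind, api_key)
    if name is None:
        return None
    if name not in env:
        raise KeyError(f"environment variable {name} is not set")
    return env[name]
```

Because `validate_api_key` only inspects the string shape, a step like `cluster validate`, `plan`, or `apply` could run it on a machine that holds no deployment secrets, which is the property the paragraph above describes.
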
@@ -133,10 +154,12 @@ operation is active.
 - stored-query parsing and query-name matching
 - stored-query type-checking against the desired schema
 - policy `applies_to` graph references
+- embedding provider profiles and graph `embedding_provider` references

-Fields reserved for later phases, such as `pipelines`, `embeddings`, `ui`,
-`aliases`, and `bindings`, fail with a typed diagnostic instead of being
-silently ignored.
+Fields reserved for later phases, such as `pipelines`, top-level
+`embeddings`, `ui`, `aliases`, and `bindings`, fail with a typed diagnostic
+instead of being silently ignored. Under `providers`, only `embedding` is
+supported today; other provider namespaces fail as unsupported config.

 ## Planning

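
The reserved-field rule in the hunk above can be pictured as a pass over the parsed config that collects diagnostics instead of ignoring keys. This is a rough Python sketch: `config_diagnostics` and the message strings are hypothetical, the reserved names are those listed in the text, and the supported top-level set is assumed from the keys that appear in this commit's `cluster.yaml` examples.

```python
from typing import Any, List, Mapping

# Reserved names are listed in the text; the supported set is an assumption
# based on the keys shown in this commit's cluster.yaml examples.
RESERVED_TOP_LEVEL = {"pipelines", "embeddings", "ui", "aliases", "bindings"}
SUPPORTED_TOP_LEVEL = {"version", "state", "storage", "providers", "graphs", "policies"}
SUPPORTED_PROVIDER_NAMESPACES = {"embedding"}


def config_diagnostics(config: Mapping[str, Any]) -> List[str]:
    """Return one diagnostic per reserved, unknown, or unsupported key."""
    diagnostics = []
    for key in config:
        if key in RESERVED_TOP_LEVEL:
            diagnostics.append(f"reserved field: {key}")
        elif key not in SUPPORTED_TOP_LEVEL:
            diagnostics.append(f"unknown field: {key}")
    # Only providers.embedding is supported; other namespaces are reported.
    for namespace in config.get("providers") or {}:
        if namespace not in SUPPORTED_PROVIDER_NAMESPACES:
            diagnostics.append(
                f"unsupported provider namespace: providers.{namespace}"
            )
    return diagnostics
```

Collecting every diagnostic before failing, rather than stopping at the first, is a design choice of this sketch; the source only states that such fields fail with a typed diagnostic.
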
@@ -156,9 +179,21 @@ resource is planned as a create. If present, the file must use this shape:
   "applied_revision": {
     "config_digest": "...",
     "resources": {
-      "graph.knowledge": { "digest": "..." },
       "schema.knowledge": { "digest": "..." },
       "query.knowledge.find_experts": { "digest": "..." },
+      "provider.embedding.default": {
+        "digest": "...",
+        "embedding_profile": {
+          "kind": "openai-compatible",
+          "base_url": "https://openrouter.ai/api/v1",
+          "model": "openai/text-embedding-3-large",
+          "api_key": "${OPENROUTER_API_KEY}"
+        }
+      },
+      "graph.knowledge": {
+        "digest": "...",
+        "embedding_provider": "provider.embedding.default"
+      },
       "policy.base": {
         "digest": "...",
         "applies_to": ["cluster", "graph.knowledge"]

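
Given the ledger shape in the hunk above, binding a graph to its provider is a two-step lookup: normalize the graph's `embedding_provider` reference, then read that resource's `embedding_profile`. The Python sketch below shows the idea only; both helper names are hypothetical, and returning `None` for a graph that did not opt in is an assumption about how the env fallback would be signalled.

```python
from typing import Any, Mapping, Optional

PROVIDER_PREFIX = "provider.embedding."


def normalize_provider_ref(ref: str) -> str:
    """Turn a bare name such as 'default' into 'provider.embedding.default'."""
    return ref if ref.startswith(PROVIDER_PREFIX) else PROVIDER_PREFIX + ref


def embedding_profile_for_graph(
    resources: Mapping[str, Any], graph_id: str
) -> Optional[Mapping[str, Any]]:
    """Follow graph.<id>.embedding_provider to the provider's embedding_profile."""
    graph = resources[f"graph.{graph_id}"]
    ref = graph.get("embedding_provider")
    if ref is None:
        return None  # graph did not opt in; the caller would use the env path
    return resources[normalize_provider_ref(ref)]["embedding_profile"]


# Resources mirroring the ledger example, with digests elided as in the source.
resources = {
    "provider.embedding.default": {
        "digest": "...",
        "embedding_profile": {
            "kind": "openai-compatible",
            "base_url": "https://openrouter.ai/api/v1",
            "model": "openai/text-embedding-3-large",
            "api_key": "${OPENROUTER_API_KEY}",
        },
    },
    "graph.knowledge": {
        "digest": "...",
        "embedding_provider": "provider.embedding.default",
    },
}

profile = embedding_profile_for_graph(resources, "knowledge")
```

Note that the profile read back from the ledger still carries the `${OPENROUTER_API_KEY}` reference rather than a secret, consistent with the RFC's statement that plaintext never lives in the state ledger.
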
@@ -19,11 +19,13 @@
 | Expand mode override | `OMNIGRAPH_TRAVERSAL_MODE` (`indexed`\|`csr`; unset = cost-based auto) | traversal |
 | Default body limit | `1 MB` | HTTP server |
 | Ingest body limit | `32 MB` | HTTP server |
-| Engine embed model | `gemini-embedding-2-preview` | engine embedding |
-| Compiler embed model | `text-embedding-3-small` | compiler embedding |
-| Embed timeout | `30 000 ms` | both clients |
-| Embed retries | `4` | both clients |
-| Embed retry backoff | `200 ms` | both clients |
+| Default embed provider/model | `openai-compatible` / `openai/text-embedding-3-large` | engine embedding |
+| OpenAI-direct embed model | `text-embedding-3-large` | engine embedding |
+| Gemini-direct embed model | `gemini-embedding-2` | engine embedding |
+| Embed deadline | `OMNIGRAPH_EMBED_DEADLINE_MS=60000` | engine embedding |
+| Embed timeout | `OMNIGRAPH_EMBED_TIMEOUT_MS=30000` | engine embedding |
+| Embed retries | `OMNIGRAPH_EMBED_RETRY_ATTEMPTS=4` | engine embedding |
+| Embed retry backoff | `OMNIGRAPH_EMBED_RETRY_BACKOFF_MS=200` | engine embedding |
 | LANCE memory pool default | `1 GB` (raised in v0.3.0) | runtime |

 **Expand traversal dispatch.** With `OMNIGRAPH_TRAVERSAL_MODE` unset, the engine

@@ -16,6 +16,36 @@ query vectors and document vectors share one model and one vector space.
 Vectors are stored L2-normalized as `FixedSizeList(Float32, dim)`; the requested output dimension is driven by
 the target column width and sent as Gemini `outputDimensionality` / OpenAI `dimensions`.

+## Configuration (cluster)
+
+Cluster-served graphs can pin their query-time embedder in `cluster.yaml`:
+
+```yaml
+providers:
+  embedding:
+    default:
+      kind: openai-compatible
+      base_url: https://openrouter.ai/api/v1
+      model: openai/text-embedding-3-large
+      api_key: ${OPENROUTER_API_KEY}
+
+graphs:
+  knowledge:
+    schema: knowledge.pg
+    embedding_provider: default
+```
+
+`embedding_provider` references `providers.embedding.<name>`; bare names are
+normalized to that typed ref. The server resolves `${ENV_VAR}` only when it
+boots from the applied cluster ledger, so `cluster validate`, `plan`, and
+`apply` do not need provider secrets. Inline API keys are rejected. `mock`
+needs no key. Vector dimensions stay schema-driven by the target `Vector(N)`
+column.
+
+Direct single-graph serving, embedded callers, and the offline
+`omnigraph embed` pipeline use environment configuration unless they inject an
+`EmbeddingConfig` directly.
+
 ## Configuration (environment)

 | Variable | Meaning |