docs(embeddings): @embed model arg + same-space validation (RFC-012 Phase 3-4)

Document the optional @embed model kwarg, the query-time same-space rejection, model-string strictness, and the loud schema-apply refusal on model change. Mark RFC-012 phases 1-4 implemented.
2026-07-03 02:51:04 +02:00 · 2026-06-15 21:09:35 +02:00 · 70ed848b9d
commit 70ed848b9d
parent 0a34f9011b
3 changed files with 17 additions and 3 deletions
--- a/docs/user/search/embeddings.md
+++ b/docs/user/search/embeddings.md
@@ -45,10 +45,20 @@ The default zero-config path is OpenRouter: set `OPENROUTER_API_KEY` and run. Re

 ## `@embed` schema annotation

-Mark a Vector property with `@embed("source_text_property")`. Today this is a **catalog annotation** consumed
-by the query typechecker and linter: it records which String property is the embedding source and lets
+Mark a Vector property with `@embed("source_text_property")`. This is a **catalog annotation** consumed by the
+query typechecker and linter: it records which String property is the embedding source and lets
 `nearest($v, "string")` auto-embed a query string for comparison against that vector column.

+Optionally record the model that produced the stored vectors:
+`@embed("source_text_property", model="openai/text-embedding-3-large")`. When a model is recorded, a
+`nearest($v, "string")` query is **rejected with a typed error** unless the resolved query embedder uses the
+same model — so stored and query vectors are guaranteed same-space instead of silently ranking across spaces.
+To fix a mismatch, set `OMNIGRAPH_EMBED_MODEL` (and the matching provider) to the recorded model, or re-embed.
+The recorded model is the literal string, so `openai/text-embedding-3-large` (via OpenRouter) and
+`text-embedding-3-large` (OpenAI direct) are distinct identities; use the matching string. Changing a recorded
+model is a loud `schema apply` refusal (treat it as a re-embed migration). `@embed` without a model keeps
+working with no validation. `model` is the only supported `@embed` argument; any other is a parse error.
+
 **It does not embed at ingest.** Stored vectors are supplied directly in your load data, or pre-filled by the
 offline `omnigraph embed` pipeline below. (Ingest-time execution of `@embed` is a planned enhancement.)
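
The same-space rule documented in the added paragraph can be sketched as a literal string comparison between the recorded model and the model of the resolved query embedder. This is a minimal illustration, not the project's implementation: `EmbedAnnotation`, `EmbedModelMismatch`, and `check_same_space` are hypothetical names, and the only behavior taken from the docs is that a recorded model must match exactly and that `@embed` without a model skips validation.

```python
from dataclasses import dataclass
from typing import Optional


class EmbedModelMismatch(Exception):
    """Hypothetical typed error for a query embedded in a different space."""


@dataclass(frozen=True)
class EmbedAnnotation:
    """Hypothetical in-memory form of @embed("source", model="...")."""
    source_property: str
    model: Optional[str] = None  # None means no model was recorded


def check_same_space(annotation: EmbedAnnotation, query_model: str) -> None:
    """Raise unless the query embedder matches the recorded model.

    The comparison is on the literal string, so a provider-prefixed name
    and a bare name count as different identities.
    """
    if annotation.model is None:
        return  # no recorded model: nothing to validate
    if annotation.model != query_model:
        raise EmbedModelMismatch(
            f"stored vectors use {annotation.model!r}, "
            f"query embedder uses {query_model!r}"
        )


# Assumed example values; "body" is a made-up source property name.
recorded = EmbedAnnotation("body", model="openai/text-embedding-3-large")
check_same_space(recorded, "openai/text-embedding-3-large")  # passes
check_same_space(EmbedAnnotation("body"), "any-model")       # passes, no model recorded
```

Under this sketch, calling `check_same_space(recorded, "text-embedding-3-large")` raises `EmbedModelMismatch`, which mirrors the documented distinction between the OpenRouter and OpenAI-direct model strings.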