Mirror of https://github.com/ModernRelay/omnigraph.git, synced 2026-06-21 02:28:07 +02:00
docs(embeddings): @embed model arg + same-space validation (RFC-012 Phase 3-4)
Document the optional @embed model kwarg, the query-time same-space rejection, model-string strictness, and the loud schema-apply refusal on model change. Mark RFC-012 phases 1-4 implemented.
Parent: 0a34f9011b
Commit: 70ed848b9d
3 changed files with 17 additions and 3 deletions
@@ -45,7 +45,7 @@ Edge bodies only allow `@unique` and `@index`.

 - `@<ident>` or `@<ident>(<literal>)` on any declaration or property.
 - Known annotations:
-  - `@embed("source_property")` on a Vector property: records which String property is the embedding source for query-time `nearest($v, "string")` auto-embedding. It is a catalog annotation; it does **not** populate the vector at ingest (supply vectors in load data, or pre-fill via the offline `omnigraph embed` pipeline).
+  - `@embed("source_property")` on a Vector property: records which String property is the embedding source for query-time `nearest($v, "string")` auto-embedding. It is a catalog annotation; it does **not** populate the vector at ingest (supply vectors in load data, or pre-fill via the offline `omnigraph embed` pipeline). An optional `model="…"` kwarg (`@embed("source_property", model="openai/text-embedding-3-large")`) records the embedding model so a `nearest()` query whose embedder uses a different model is rejected loudly; `model` is the only supported kwarg. See [search/embeddings.md](../search/embeddings.md).
   - `@description("…")`, `@instruction("…")` on query declarations (carried through to clients).
 - Custom annotations are accepted by the parser and surfaced in catalog metadata; unrecognized annotations don't fail compilation.
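The argument rule in the hunk above (one positional source property, plus an optional `model` kwarg and nothing else) can be illustrated with a small Python sketch. This is not the project's parser: the regular expression is a simplified, hypothetical grammar, and `EmbedAnnotation` and `parse_embed` are names invented for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified grammar: @embed("<source>"[, <kwarg>="<value>"])
_EMBED_RE = re.compile(
    r'^@embed\(\s*"(?P<source>[^"]+)"\s*'
    r'(?:,\s*(?P<key>\w+)\s*=\s*"(?P<value>[^"]*)"\s*)?\)$'
)


@dataclass(frozen=True)
class EmbedAnnotation:
    source: str                  # String property that is the embedding source
    model: Optional[str] = None  # recorded embedding model, if any


def parse_embed(text: str) -> EmbedAnnotation:
    """Parse an @embed annotation, accepting `model` as the only kwarg."""
    match = _EMBED_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid @embed annotation: {text!r}")
    key = match.group("key")
    if key is not None and key != "model":
        # Any kwarg other than `model` is treated as a parse error.
        raise ValueError(f"unsupported @embed argument: {key!r}")
    return EmbedAnnotation(
        source=match.group("source"),
        model=match.group("value") if key is not None else None,
    )
```

Under these assumptions, `parse_embed('@embed("body")')` yields an annotation with no recorded model, while a kwarg such as `dim="3"` raises `ValueError`.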
@@ -45,10 +45,20 @@ The default zero-config path is OpenRouter: set `OPENROUTER_API_KEY` and run. Re
 ## `@embed` schema annotation

-Mark a Vector property with `@embed("source_text_property")`. Today this is a **catalog annotation** consumed
-by the query typechecker and linter: it records which String property is the embedding source and lets
+Mark a Vector property with `@embed("source_text_property")`. This is a **catalog annotation** consumed by the
+query typechecker and linter: it records which String property is the embedding source and lets
 `nearest($v, "string")` auto-embed a query string for comparison against that vector column.

+Optionally record the model that produced the stored vectors:
+`@embed("source_text_property", model="openai/text-embedding-3-large")`. When a model is recorded, a
+`nearest($v, "string")` query is **rejected with a typed error** unless the resolved query embedder uses the
+same model, so stored and query vectors are guaranteed same-space instead of silently ranking across spaces.
+To fix a mismatch, set `OMNIGRAPH_EMBED_MODEL` (and the matching provider) to the recorded model, or re-embed.
+The recorded model is the literal string, so `openai/text-embedding-3-large` (via OpenRouter) and
+`text-embedding-3-large` (OpenAI direct) are distinct identities; use the matching string. Changing a recorded
+model makes `schema apply` refuse loudly (treat it as a re-embed migration). `@embed` without a model keeps
+working with no validation. `model` is the only supported `@embed` argument; any other is a parse error.
+
 **It does not embed at ingest.** Stored vectors are supplied directly in your load data, or pre-filled by the
 offline `omnigraph embed` pipeline below. (Ingest-time execution of `@embed` is a planned enhancement.)
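The same-space rule described in this hunk comes down to a literal string comparison between the recorded model and the model of the resolved query embedder. The Python sketch below illustrates that behavior only; `EmbedModelMismatch`, `check_same_space`, and `check_schema_apply` are hypothetical names, not the project's API, and the schema-apply check covers only the case the text states (a recorded model changed to a different one).

```python
from typing import Optional


class EmbedModelMismatch(Exception):
    """Hypothetical typed error: the query embedder is in a different space."""

    def __init__(self, recorded: str, resolved: str) -> None:
        super().__init__(
            f"@embed records model {recorded!r} but the query embedder uses {resolved!r}"
        )
        self.recorded = recorded
        self.resolved = resolved


def check_same_space(recorded: Optional[str], resolved: str) -> None:
    """Reject a nearest() query whose embedder model differs from the recorded one."""
    if recorded is None:
        return  # @embed without a model: no validation is performed
    if recorded != resolved:  # literal comparison, no provider-prefix normalization
        raise EmbedModelMismatch(recorded, resolved)


def check_schema_apply(old: Optional[str], new: Optional[str]) -> None:
    """Refuse a schema change that alters an already recorded model."""
    # Assumption: only recorded -> different recorded is covered by the text;
    # adding or dropping a model is left unspecified here.
    if old is not None and new is not None and old != new:
        raise ValueError(
            f"recorded model changed from {old!r} to {new!r}; treat as a re-embed migration"
        )
```

Because the comparison is literal, `check_same_space("openai/text-embedding-3-large", "text-embedding-3-large")` raises even though both strings may name the same underlying model.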