fix(embedding): openai-alias api key + deadline scope (PR review)

openai-alias api key (Cursor High): key resolution was dispatched on the Provider enum, which is OpenAiCompatible for both openai and openai-compatible, so OMNIGRAPH_EMBED_PROVIDER=openai (base api.openai.com) could send OPENROUTER_API_KEY and 401. Fix: the alias match now yields the ordered key-env list too, so base, model and key are all alias-derived in one place. openai takes only OPENAI_API_KEY (errors loudly if absent); openai-compatible/unset prefer OPENROUTER_API_KEY then OPENAI_API_KEY. Closes the class, not the instance. deadline scope (Cursor Medium): the deadline bounds every embed call (query and document), which is correct, but the name said query-only. Renamed OMNIGRAPH_EMBED_QUERY_DEADLINE_MS -> OMNIGRAPH_EMBED_DEADLINE_MS (field query_deadline_ms -> deadline_ms) and updated the doc wording so the name matches the behavior. Two new alias-key tests; 20 embedding unit tests pass. Resolved earlier in 30377c4: openai-alias base URL, single embed-model source, stale @embed ingest docs. Declined: RFC Status (the lifecycle keeps it Proposed while the PR is open).
2026-06-18 02:24:27 +02:00 · 2026-06-15 21:44:31 +02:00 · 2026-06-15 21:44:31 +02:00 · ffb4a2c9ab
commit ffb4a2c9ab
parent 70ed848b9d
3 changed files with 68 additions and 37 deletions
--- a/docs/dev/rfc-012-embedding-provider-config.md
+++ b/docs/dev/rfc-012-embedding-provider-config.md
@ -187,8 +187,9 @@ provider). This is the only behaviour that closes P3 by construction.

 ### NFR floor (independent of the provider work)

- **Deadline:** wrap the query-time embed in a total-operation deadline (`OMNIGRAPH_EMBED_QUERY_DEADLINE_MS`)
-  so a degraded provider cannot hang a read for the current ~121 s worst case (4 × 30 s timeout + backoff).
+- **Deadline:** wrap every embed call (query or document) in a total-operation deadline
+  (`OMNIGRAPH_EMBED_DEADLINE_MS`) so a degraded provider cannot hang the caller for the current ~121 s worst
+  case (4 × 30 s timeout + backoff).
 - **Observability:** `tracing` span per embed call (provider, model, dim, attempts, outcome, elapsed; `warn!`
  per retry; token usage when the provider returns it). The subsystem has zero instrumentation today.
 - **Single normalization:** one `normalize_vector` (the dead client carried a divergent second copy; removed
--- a/docs/user/search/embeddings.md
+++ b/docs/user/search/embeddings.md
@ -25,7 +25,7 @@ the target column width and sent as Gemini `outputDimensionality` / OpenAI `dime
 | `OMNIGRAPH_EMBED_MODEL` | model id; defaults `openai/text-embedding-3-large` (OpenRouter), `text-embedding-3-large` (`openai`), `gemini-embedding-2` (`gemini`) |
 | `OPENROUTER_API_KEY` / `OPENAI_API_KEY` | api key for `openai-compatible` (OpenRouter preferred) |
 | `GEMINI_API_KEY` | api key for `gemini` |
-| `OMNIGRAPH_EMBED_QUERY_DEADLINE_MS` | total wall-clock budget for one embed call across all retries (default `60000`; `0` = unbounded) |
+| `OMNIGRAPH_EMBED_DEADLINE_MS` | total wall-clock budget for one embed call across all retries (default `60000`; `0` = unbounded) |
 | `OMNIGRAPH_EMBED_TIMEOUT_MS` | per-request HTTP timeout (default `30000`) |
 | `OMNIGRAPH_EMBED_RETRY_ATTEMPTS` / `OMNIGRAPH_EMBED_RETRY_BACKOFF_MS` | retry policy (defaults `4` / `200`) |
 | `OMNIGRAPH_EMBEDDINGS_MOCK` | set truthy to force the deterministic mock provider |
@ -35,7 +35,7 @@ The default zero-config path is OpenRouter: set `OPENROUTER_API_KEY` and run. Re

 ### Behavior notes

- **Bounded latency.** Each embed call is wrapped in `OMNIGRAPH_EMBED_QUERY_DEADLINE_MS`, so a degraded
+- **Bounded latency.** Each embed call is wrapped in `OMNIGRAPH_EMBED_DEADLINE_MS`, so a degraded
  provider cannot hang a read for the full retry envelope.
 - **Reuse.** The query path builds the client once per graph handle (on the first `nearest($v, "string")`
  that needs embedding) and reuses it, keeping the provider connection pool warm. A graph that never embeds