mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-15 01:55:13 +02:00
Remove developer-only scaffolding that leaked into the public user/operator docs, while preserving every user-facing behavior, command, flag, endpoint, constant, and env var. No behavior changes.

Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types, internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause, sidecar Phase-B/C mechanics), keeping the user-visible behavior (e.g. "optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.

Internal IR/lowering/recovery-internals sections were either trimmed to a brief user-facing note (e.g. "Traversal execution", "interrupted writes recover automatically; recovery commits are recorded under actor omnigraph:recovery") or removed.

Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints, error taxonomy, every constant and env var (verified none dropped from the constants cheat-sheet), and the operator-facing explanations of on-disk artifacts. Residual "legacy" mentions are all user-facing (the deprecated omnigraph.yaml, the legacy token chain, old command names).

Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across docs/user; zero broken links; check-agents-md.sh green.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1.7 KiB
Embeddings
OmniGraph has two embedding clients with different defaults and purposes.
Compiler-side client — query-time normalization
- Default model: text-embedding-3-small (OpenAI-style schema)
- Env: NANOGRAPH_EMBED_MODEL, OPENAI_API_KEY, OPENAI_BASE_URL (default https://api.openai.com/v1), NANOGRAPH_EMBEDDINGS_MOCK, NANOGRAPH_EMBED_TIMEOUT_MS=30000, NANOGRAPH_EMBED_RETRY_ATTEMPTS=4, NANOGRAPH_EMBED_RETRY_BACKOFF_MS=200
- Methods: embed_text(input, expected_dim), embed_texts(inputs, expected_dim)
- Mock mode: deterministic FNV-1a + xorshift64 → L2-normalized vectors
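To make the mock mode concrete, here is a minimal Python sketch of how a deterministic FNV-1a + xorshift64 pipeline can produce an L2-normalized vector. It is an illustration only: the seeding, the shift constants (13, 7, 17), the mapping of integers to floats, and the name mock_embed are assumptions, so the values will not match what OmniGraph's own mock produces.

```python
import math
from typing import List

MASK64 = 0xFFFFFFFFFFFFFFFF
FNV_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3              # standard 64-bit FNV prime


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


def xorshift64(state: int) -> int:
    """One step of a xorshift64 generator (shift triple assumed)."""
    state ^= (state << 13) & MASK64
    state ^= state >> 7
    state ^= (state << 17) & MASK64
    return state


def mock_embed(text: str, dim: int) -> List[float]:
    """Hypothetical mock: hash the text, expand it with xorshift64, L2-normalize."""
    state = fnv1a_64(text.encode("utf-8")) or 1  # xorshift needs a nonzero state
    values = []
    for _ in range(dim):
        state = xorshift64(state)
        values.append(state / 2**63 - 1.0)  # map the 64-bit integer to roughly [-1, 1]
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else values
```

Because the vector depends only on the input text and the dimension, repeated calls return identical output, which is the property that makes a mock like this usable in tests without network access.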
Engine-side client — runtime ingest
- Model: gemini-embedding-2-preview
- Env: GEMINI_API_KEY, OMNIGRAPH_GEMINI_BASE_URL (default Google generativelanguage v1beta), OMNIGRAPH_EMBED_TIMEOUT_MS=30000, OMNIGRAPH_EMBED_RETRY_ATTEMPTS=4, OMNIGRAPH_EMBED_RETRY_BACKOFF_MS=200, OMNIGRAPH_EMBEDDINGS_MOCK
- Two task types: embed_query_text (RETRIEVAL_QUERY) and embed_document_text (RETRIEVAL_DOCUMENT)
- Exponential backoff with retryable detection (timeouts, 429, 5xx)
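The retry behavior described above can be sketched as follows. This is a hedged Python illustration, not the engine's code: EmbedError, is_retryable, call_with_backoff, and retry_settings_from_env are hypothetical names, the doubling factor is an assumption (the docs only say "exponential"), and the values 4 and 200 are treated as defaults on the assumption that this is what the listing means.

```python
import os
import time
from typing import Callable, Mapping, Optional, Tuple


class EmbedError(Exception):
    """Hypothetical error type carrying the HTTP status or a timeout flag."""

    def __init__(self, message: str, status: Optional[int] = None, timed_out: bool = False):
        super().__init__(message)
        self.status = status
        self.timed_out = timed_out


def is_retryable(err: EmbedError) -> bool:
    """Timeouts, 429, and 5xx are retried; everything else fails immediately."""
    if err.timed_out:
        return True
    return err.status is not None and (err.status == 429 or 500 <= err.status <= 599)


def retry_settings_from_env(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Read (attempts, backoff_ms), falling back to the documented values."""
    attempts = int(env.get("OMNIGRAPH_EMBED_RETRY_ATTEMPTS", "4"))
    backoff_ms = int(env.get("OMNIGRAPH_EMBED_RETRY_BACKOFF_MS", "200"))
    return attempts, backoff_ms


def call_with_backoff(request: Callable[[], object], attempts: int = 4,
                      backoff_ms: int = 200, sleep: Callable[[float], None] = time.sleep):
    """Run request(), sleeping backoff_ms * 2**attempt between retryable failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except EmbedError as err:
            is_last = attempt == attempts - 1
            if is_last or not is_retryable(err):
                raise
            sleep(backoff_ms * (2 ** attempt) / 1000.0)  # assumed doubling schedule
```

Under these assumptions, four attempts with a 200 ms base wait 0.2 s, 0.4 s, and 0.8 s between tries before the final error is surfaced. The sleep function is injectable so the schedule can be checked without real delays.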
Schema integration
Mark a Vector property with @embed("source_text_property"). At ingest, the engine reads the text from the named source property and writes its embedding into the vector column. Embeddings are stored as L2-normalized FixedSizeList(Float32, dim) values.
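As a rough picture of what "L2-normalized FixedSizeList(Float32, dim)" implies for a single row, the Python sketch below checks the dimension, normalizes, and rounds each component to 32-bit precision. The function fill_vector, its arguments, and the dict-shaped row are hypothetical stand-ins; the real engine writes into a columnar table rather than a dict.

```python
import math
import struct
from typing import Callable, Dict, List


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fill_vector(row: Dict[str, object], source_prop: str, vector_prop: str,
                embed: Callable[[str], List[float]], dim: int) -> Dict[str, object]:
    """Embed row[source_prop] and store a unit-length, fixed-size vector."""
    raw = embed(str(row[source_prop]))
    if len(raw) != dim:
        # A fixed-size list column cannot hold a vector of a different length.
        raise ValueError(f"expected {dim} dimensions, got {len(raw)}")
    norm = math.sqrt(sum(v * v for v in raw))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    out = dict(row)
    out[vector_prop] = [to_float32(v / norm) for v in raw]
    return out
```

One practical consequence of storing unit-length vectors is that the dot product of two stored embeddings equals their cosine similarity.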
CLI omnigraph embed (offline file pipeline)
Operates on JSONL files (not on a graph). Three modes (mutually exclusive):
- (default) fill_missing: only embed rows whose target field is empty
- --reembed-all: overwrite all
- --clean: strip embeddings
Inputs are either a single seed manifest YAML file or the --input/--output/--spec flags. The selectors --type T and --select T:field=value filter rows. The command streams JSONL → JSONL.
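The three modes can be illustrated with a small streaming loop over JSONL. This Python sketch is not the CLI's implementation: process_jsonl, the field names, and the select callback are hypothetical, and it assumes that rows not matched by a selector are passed through unchanged, which the text above does not specify.

```python
import io
import json
from typing import Callable, List, Optional, TextIO


def process_jsonl(src: TextIO, dst: TextIO, text_field: str, vector_field: str,
                  embed: Callable[[str], List[float]], mode: str = "fill_missing",
                  select: Optional[Callable[[dict], bool]] = None) -> None:
    """Stream rows from src to dst, applying exactly one of the three modes."""
    if mode not in ("fill_missing", "reembed_all", "clean"):
        raise ValueError(f"unknown mode: {mode}")
    for line in src:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        if select is None or select(row):
            if mode == "clean":
                row.pop(vector_field, None)                 # strip embeddings
            elif mode == "reembed_all" or not row.get(vector_field):
                row[vector_field] = embed(row[text_field])  # overwrite, or fill if empty
        # Assumption: non-selected rows are written back unchanged.
        dst.write(json.dumps(row) + "\n")
```

Since each row is read, transformed, and written before the next one is loaded, memory use stays flat regardless of file size, which matches the streaming behavior described above.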