mirror of https://github.com/ModernRelay/omnigraph.git
synced 2026-06-24 02:38:06 +02:00
feat(engine): cost-based Expand dispatch chooser (C5)

Replace the fixed frontier<=1024 && hops<=6 dispatch threshold with a pure,
IO-free cost model. choose_expand_mode compares the indexed path's
frontier-relative work (hops * frontier * fanout, or hops * |E| when BTREE
coverage is degraded) against the cost of building the whole-graph CSR
(BUILD_FACTOR * |E|), from cheap manifest row counts. Under good coverage this
reduces to a selectivity ratio independent of |E|, preserving the flat-in-|E|
indexed win for selective traversals while routing dense / deep / high-fanout
or degraded-and-expensive traversals to CSR.

execute_expand decides cardinality-first and only opens the edge dataset to
confirm coverage when it leans indexed (no open on a clearly-CSR traversal).
The two env knobs become hard ceilings layered on the model; the
OMNIGRAPH_TRAVERSAL_MODE override still forces a path; the chosen mode is
traced. Results are unchanged across modes — only the path differs.

Adds inline crossover unit tests and extends the traversal_indexed both_modes
harness with an auto pass asserting the chooser is result-preserving across
every traversal shape. Documents the new flag semantics in
docs/user/{constants,query-language}.md.
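The comparison described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `exec/query.rs`: the `ExpandStats` struct, the `decide_cardinality_first` helper, the definition of fanout as `|E| / source_vertices`, and the strict `<` tie-break are all assumptions made for the example. Only the name `choose_expand_mode`, the two cost formulas, and the `1.5` factor come from the commit itself.

```rust
/// Physical path an Expand operator can take.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExpandMode {
    Indexed,
    Csr,
}

/// Relative per-edge cost of building the whole-graph CSR (value from the docs table).
const CSR_BUILD_FACTOR: f64 = 1.5;

/// Hypothetical bundle of the cheap manifest counts the model consumes.
#[derive(Debug, Clone, Copy)]
struct ExpandStats {
    frontier: u64,
    hops: u64,
    edge_count: u64,      // |E|
    source_vertices: u64, // number of distinct source vertices
}

/// Pure, IO-free comparison of indexed work against the CSR build cost.
fn choose_expand_mode(s: ExpandStats, coverage_ok: bool) -> ExpandMode {
    let e = s.edge_count as f64;
    let hops = s.hops as f64;
    let indexed_cost = if coverage_ok {
        // Assumption: average fanout is |E| / source-vertex count.
        let fanout = e / (s.source_vertices.max(1) as f64);
        hops * s.frontier as f64 * fanout
    } else {
        // Degraded BTREE coverage: a full edge scan is paid on every hop.
        hops * e
    };
    let csr_cost = CSR_BUILD_FACTOR * e;
    if indexed_cost < csr_cost {
        ExpandMode::Indexed
    } else {
        ExpandMode::Csr
    }
}

/// Cardinality-first decision: `confirm_coverage` stands in for opening the
/// edge dataset, and is only invoked when the counts alone lean indexed.
fn decide_cardinality_first<F: FnOnce() -> bool>(
    s: ExpandStats,
    confirm_coverage: F,
) -> ExpandMode {
    // First pass assumes good coverage and uses only the manifest counts.
    if choose_expand_mode(s, true) == ExpandMode::Csr {
        return ExpandMode::Csr; // clearly CSR: the dataset is never opened
    }
    // Leaning indexed: pay for the coverage check, then decide for real.
    choose_expand_mode(s, confirm_coverage())
}
```

With these assumptions, a 2-hop traversal from a 10-vertex frontier over 1,000,000 edges and 100,000 source vertices stays indexed, while a 3-hop traversal from a 100,000-vertex frontier over the same graph goes to CSR without the coverage closure ever being called.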
parent 236e66c789
commit 787d41ec5f
4 changed files with 367 additions and 31 deletions
@@ -13,9 +13,10 @@
 | Maintenance concurrency | `OMNIGRAPH_MAINTENANCE_CONCURRENCY=8` | `db/omnigraph/optimize.rs` |
 | Lance blob compaction support | `LANCE_SUPPORTS_BLOB_COMPACTION = false` | `db/omnigraph/optimize.rs` |
 | Graph index cache size | `8` (LRU) | `runtime_cache.rs` |
-| Expand indexed-path max frontier | `OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER=1024` | `exec/query.rs` |
-| Expand indexed-path max hops | `OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS=6` | `exec/query.rs` |
-| Expand mode override | `OMNIGRAPH_TRAVERSAL_MODE` (`indexed`\|`csr`; unset = auto) | `exec/query.rs` |
+| Expand indexed-path frontier ceiling | `OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER=1024` | `exec/query.rs` |
+| Expand indexed-path hop ceiling | `OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS=6` | `exec/query.rs` |
+| Expand CSR-build cost factor | `CSR_BUILD_FACTOR = 1.5` | `exec/query.rs` |
+| Expand mode override | `OMNIGRAPH_TRAVERSAL_MODE` (`indexed`\|`csr`; unset = cost-based auto) | `exec/query.rs` |
 | Default body limit | `1 MB` | `omnigraph-server/lib.rs` |
 | Ingest body limit | `32 MB` | `omnigraph-server/lib.rs` |
 | Engine embed model | `gemini-embedding-2-preview` | `omnigraph/embedding.rs` |
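The knob rows in the hunk above might be read and layered along these lines. The environment variable names and the defaults (`1024`, `6`) are taken from the table; everything else is hypothetical, including the `ExpandKnobs` struct, the helper names, the fallback to the default on an unparsable value, and the assumption that the override takes precedence over the ceilings.

```rust
use std::env;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExpandMode {
    Indexed,
    Csr,
}

/// Parse a numeric ceiling, falling back to `default` when the variable is
/// unset or not a valid integer (the fallback behaviour is an assumption).
fn parse_ceiling(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(default)
}

/// Parse the mode override; anything other than `indexed` or `csr` is
/// treated as unset, which leaves the decision to the cost model.
fn parse_override(raw: Option<&str>) -> Option<ExpandMode> {
    match raw.map(str::trim) {
        Some("indexed") => Some(ExpandMode::Indexed),
        Some("csr") => Some(ExpandMode::Csr),
        _ => None,
    }
}

/// Hypothetical container for the three Expand knobs.
#[derive(Debug, Clone, Copy)]
struct ExpandKnobs {
    max_frontier: u64,
    max_hops: u64,
    mode_override: Option<ExpandMode>,
}

impl ExpandKnobs {
    /// Read the knobs from the process environment, using the documented defaults.
    fn from_env() -> Self {
        let get = |key: &str| env::var(key).ok();
        ExpandKnobs {
            max_frontier: parse_ceiling(
                get("OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER").as_deref(),
                1024,
            ),
            max_hops: parse_ceiling(get("OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS").as_deref(), 6),
            mode_override: parse_override(get("OMNIGRAPH_TRAVERSAL_MODE").as_deref()),
        }
    }

    /// Returns a forced mode, or `None` when the cost model should decide.
    fn forced_mode(&self, frontier: u64, hops: u64) -> Option<ExpandMode> {
        // Assumption: an explicit override wins over the ceilings.
        if let Some(mode) = self.mode_override {
            return Some(mode);
        }
        // Beyond either ceiling, CSR is always used.
        if frontier > self.max_frontier || hops > self.max_hops {
            return Some(ExpandMode::Csr);
        }
        None
    }
}
```

Keeping the parsing in pure functions that take `Option<&str>` means the layering can be unit-tested without touching the process environment; `ExpandKnobs::from_env()` is the only part that reads it.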
@@ -24,3 +25,14 @@
 | Embed retries | `4` | both clients |
 | Embed retry backoff | `200 ms` | both clients |
 | LANCE memory pool default | `1 GB` (raised in v0.3.0) | runtime |
+
+**Expand traversal dispatch.** With `OMNIGRAPH_TRAVERSAL_MODE` unset, the engine
+chooses the indexed (per-hop BTREE) vs CSR (whole-graph in-memory) path with a
+cost model over cheap manifest counts (frontier size, |E|, source-vertex count,
+hops) plus the index-coverage signal: the indexed path is preferred when its
+frontier-relative work beats building the CSR (≈ when `hops × frontier` is a
+small fraction of the source-vertex set), and CSR is preferred for dense/deep
+traversals or when the BTREE coverage is degraded and a full scan would be paid
+per hop. The two ceilings above are hard caps — beyond them CSR is always used —
+and the override flag forces a path (the `auto` result is identical either way;
+only the path differs).
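The claim that the good-coverage comparison reduces to a ratio independent of |E| can be checked with a short derivation. It assumes that fanout is the average out-degree over source vertices, written here as |E| / |V_s|, and that |E| > 0. The symbols h (hops), F (frontier size), V_s (source-vertex set) and β (`CSR_BUILD_FACTOR`) are notation introduced for this sketch, and the engine's actual formula may differ in detail.

```latex
\begin{aligned}
C_{\text{idx}} &= h \cdot F \cdot \bar{d}, \qquad \bar{d} = \frac{|E|}{|V_s|} \\
C_{\text{csr}} &= \beta \cdot |E|, \qquad \beta = 1.5 \\
C_{\text{idx}} < C_{\text{csr}}
  &\iff h \cdot F \cdot \frac{|E|}{|V_s|} < \beta \, |E|
   \iff \frac{h \cdot F}{|V_s|} < \beta \\
\text{degraded coverage:} \quad h \cdot |E| < \beta \, |E| &\iff h < \beta
\end{aligned}
```

Under these assumptions, h = 2 and F = 10 over |V_s| = 100,000 gives a ratio of 0.0002, well below β, so the indexed path is chosen regardless of |E|. With h = 3 and F = 100,000 the ratio is 3, which exceeds β and routes to CSR. With degraded coverage and β = 1.5, only single-hop traversals would satisfy h < β.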