mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
feat(engine): frontier-size Expand dispatcher + lazy CSR build
Replace the env-only mode switch with an auto policy: Expand uses the BTREE-indexed path when the source frontier is small and the hop count bounded (OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER=1024, OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS=6), else the in-memory CSR. OMNIGRAPH_TRAVERSAL_MODE=indexed|csr still forces a mode. Make the CSR index lazy: thread a GraphIndexHandle (memoizing OnceCell over a Cached/Direct/None builder) through execute_query/execute_pipeline/ execute_rrf_query/execute_anti_join instead of a pre-built Option<&GraphIndex>. A query served entirely by the indexed path with no AntiJoin never pays the O(|E|) CSR build — the perf win of Tier 3. AntiJoin still realizes the index (its negation uses CSR has_neighbors). Net effect: selective traversals (the common case) skip the whole-graph CSR build and resolve neighbors from the persisted, incrementally-maintained src/dst BTREE. Existing traversal/aggregation/end_to_end/search suites now run the indexed path by default and stay green. Docs: constants.md (new env knobs), query-language.md (Expand dual path), indexes.md (graph index is lazy + the indexed alternative).
This commit is contained in:
parent
7337be60c8
commit
c594aeb868
4 changed files with 127 additions and 34 deletions
|
|
@ -13,6 +13,9 @@
|
|||
| Maintenance concurrency | `OMNIGRAPH_MAINTENANCE_CONCURRENCY=8` | `db/omnigraph/optimize.rs` |
|
||||
| Lance blob compaction support | `LANCE_SUPPORTS_BLOB_COMPACTION = false` | `db/omnigraph/optimize.rs` |
|
||||
| Graph index cache size | `8` (LRU) | `runtime_cache.rs` |
|
||||
| Expand indexed-path max frontier | `OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER=1024` | `exec/query.rs` |
|
||||
| Expand indexed-path max hops | `OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS=6` | `exec/query.rs` |
|
||||
| Expand mode override | `OMNIGRAPH_TRAVERSAL_MODE` (`indexed`\|`csr`; unset = auto) | `exec/query.rs` |
|
||||
| Default body limit | `1 MB` | `omnigraph-server/lib.rs` |
|
||||
| Ingest body limit | `32 MB` | `omnigraph-server/lib.rs` |
|
||||
| Engine embed model | `gemini-embedding-2-preview` | `omnigraph/embedding.rs` |
|
||||
|
|
|
|||
|
|
@ -21,6 +21,6 @@ This is OmniGraph-specific (not Lance):
|
|||
|
||||
- `TypeIndex`: dense `u32 ↔ String id` mapping per node type.
|
||||
- `CsrIndex`: Compressed Sparse Row representation of edges per edge type — `offsets[i]..offsets[i+1]` slices into `targets`.
|
||||
- `GraphIndex { type_indices, csr (out), csc (in) }` — built on demand from a snapshot's edge tables.
|
||||
- `GraphIndex { type_indices, csr (out), csc (in) }` — built on demand from a snapshot's edge tables, **lazily**: only when an `Expand` the planner routes to the CSR path (dense / large frontier) or an `AntiJoin` actually needs it.
|
||||
- Cached in `RuntimeCache::graph_indices` (LRU, max 8 entries, keyed by snapshot id + edge table versions).
|
||||
- Built only when an `Expand` or `AntiJoin` IR op is present in the lowered query, so pure scans skip it.
|
||||
- Selective `Expand`s resolve neighbors from the persisted `src`/`dst` BTREE instead (one indexed scan per hop) and never trigger the CSR build; see [query-language](query-language.md) → Expand. Pure scans, and queries served entirely by the indexed traversal path, skip it.
|
||||
|
|
|
|||
|
|
@ -79,7 +79,7 @@ Reason: under the staged-write rewire (MR-794), inserts and updates accumulate i
|
|||
Pipeline operations:
|
||||
|
||||
- `NodeScan { variable, type_name, filters }`
|
||||
- `Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }` — destination filters are pushed *into* the expand so Lance scalar pushdown can prune.
|
||||
- `Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }` — destination filters are pushed *into* the expand so Lance scalar pushdown can prune. Executed one of two ways, chosen per-expand by frontier size: selective traversals (small frontier) resolve neighbors from the persisted `src`/`dst` BTREE (one indexed scan per hop, cost ∝ frontier); dense / whole-graph traversals fall back to the in-memory CSR adjacency index. Both produce identical results. Tunable via `OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER` / `OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS`, with `OMNIGRAPH_TRAVERSAL_MODE=indexed|csr` forcing a mode (see [constants](constants.md)).
|
||||
- `Filter { left, op, right }`
|
||||
- `AntiJoin { outer_var, inner: Vec<IROp> }` — for `not { … }`
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue