feat(engine): frontier-size Expand dispatcher + lazy CSR build

Replace the env-only mode switch with an auto policy: Expand uses the
BTREE-indexed path when the source frontier is small and the hop count bounded
(OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER=1024, OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS=6),
else the in-memory CSR. OMNIGRAPH_TRAVERSAL_MODE=indexed|csr still forces a mode.

Make the CSR index lazy: thread a GraphIndexHandle (memoizing OnceCell over a
Cached/Direct/None builder) through execute_query/execute_pipeline/
execute_rrf_query/execute_anti_join instead of a pre-built Option<&GraphIndex>.
A query served entirely by the indexed path with no AntiJoin never pays the
O(|E|) CSR build — the perf win of Tier 3. AntiJoin still realizes the index
(its negation uses CSR has_neighbors).

Net effect: selective traversals (the common case) skip the whole-graph CSR
build and resolve neighbors from the persisted, incrementally-maintained
src/dst BTREE. Existing traversal/aggregation/end_to_end/search suites now run
the indexed path by default and stay green.

Docs: constants.md (new env knobs), query-language.md (Expand dual path),
indexes.md (graph index is lazy + the indexed alternative).
This commit is contained in:
Ragnor Comerford 2026-06-08 11:32:06 +02:00
parent 7337be60c8
commit c594aeb868
No known key found for this signature in database
4 changed files with 127 additions and 34 deletions

View file

@ -79,7 +79,7 @@ Reason: under the staged-write rewire (MR-794), inserts and updates accumulate i
Pipeline operations:
- `NodeScan { variable, type_name, filters }`
- `Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }` — destination filters are pushed *into* the expand so Lance scalar pushdown can prune.
- `Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }` — destination filters are pushed *into* the expand so Lance scalar pushdown can prune. Executed one of two ways, chosen per-expand by frontier size: selective traversals (small frontier) resolve neighbors from the persisted `src`/`dst` BTREE (one indexed scan per hop, cost ∝ frontier); dense / whole-graph traversals fall back to the in-memory CSR adjacency index. Both produce identical results. Tunable via `OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER` / `OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS`, with `OMNIGRAPH_TRAVERSAL_MODE=indexed|csr` forcing a mode (see [constants](constants.md)).
- `Filter { left, op, right }`
- `AntiJoin { outer_var, inner: Vec<IROp> }` — for `not { … }`