From f96682bd521a66bfce2780ba85704497a7d39129 Mon Sep 17 00:00:00 2001
From: Ragnor Comerford
Date: Tue, 9 Jun 2026 17:23:42 +0200
Subject: [PATCH] docs: clarify the Expand frontier ceiling bounds the initial dispatch frontier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The cap is applied at dispatch on the initial frontier; per-hop fan-out
(union_dense) is not hard-capped. Correct the constants.md and
query-language.md claims: the ceilings bound the initial-dispatch
frontier/hops, the cost model estimates total indexed work as
~hops*frontier*fanout (pricing dense fan-out toward CSR), and per-hop
work is not a hard bound. Drops the overstated 'hard caps bound indexed
work' / 'cost ∝ frontier' wording.
---
 docs/user/constants.md      | 6 ++++--
 docs/user/query-language.md | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/user/constants.md b/docs/user/constants.md
index 8a45ca7..f523042 100644
--- a/docs/user/constants.md
+++ b/docs/user/constants.md
@@ -33,6 +33,8 @@ hops) plus the index-coverage signal: the indexed path is preferred when its
 frontier-relative work beats building the CSR (≈ when `hops × frontier` is a
 small fraction of the source-vertex set), and CSR is preferred for dense/deep
 traversals or when the BTREE coverage is degraded and a full scan would be paid
-per hop. The two ceilings above are hard caps — beyond them CSR is always used —
-and the override flag forces a path (the `auto` result is identical either way;
+per hop. The two ceilings bound the **initial dispatch** frontier/hops (beyond
+them CSR is always used); they are not a hard per-hop bound — the cost model
+*estimates* total indexed work as ~`hops × frontier × fanout`, so dense fan-out is
+priced toward CSR rather than capped mid-traversal. The override flag forces a path (the `auto` result is identical either way;
 only the path differs).
diff --git a/docs/user/query-language.md b/docs/user/query-language.md
index bc22064..acdc45d 100644
--- a/docs/user/query-language.md
+++ b/docs/user/query-language.md
@@ -81,7 +81,7 @@ Reason: under the staged-write rewire (MR-794), inserts and updates accumulate i
 Pipeline operations:
 - `NodeScan { variable, type_name, filters }`
-- `Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }` — destination filters are pushed *into* the expand so Lance scalar pushdown can prune. Executed one of two ways, chosen per-expand by a cost model over cheap manifest counts (frontier size, |E|, source-vertex count, hops) plus index coverage: selective traversals (small frontier relative to the source set) resolve neighbors from the persisted `src`/`dst` BTREE (one indexed scan per hop, cost ∝ frontier); dense / deep / large-frontier traversals — or those whose BTREE coverage is degraded so a full scan would be paid per hop — use the in-memory CSR adjacency index. Both produce identical results. Bounded by `OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER` / `OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS` (hard ceilings, beyond which CSR is always used), with `OMNIGRAPH_TRAVERSAL_MODE=indexed|csr` forcing a mode (see [constants](constants.md)).
+- `Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }` — destination filters are pushed *into* the expand so Lance scalar pushdown can prune. Executed one of two ways, chosen per-expand by a cost model over cheap manifest counts (frontier size, |E|, source-vertex count, hops) plus index coverage: selective traversals (small frontier relative to the source set) resolve neighbors from the persisted `src`/`dst` BTREE (one indexed scan per hop); dense / deep / large-frontier traversals — or those whose BTREE coverage is degraded so a full scan would be paid per hop — use the in-memory CSR adjacency index. Both produce identical results. The `OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER` / `OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS` ceilings bound the *initial dispatch* frontier/hops (beyond them CSR is always used); the cost model estimates total indexed work as ~`hops × frontier × fanout` and prices dense fan-out toward CSR — they are not a hard per-hop bound. `OMNIGRAPH_TRAVERSAL_MODE=indexed|csr` forces a mode (see [constants](constants.md)).
 - `Filter { left, op, right }`
 - `AntiJoin { outer_var, inner: Vec }` — for `not { … }`
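
For reviewers, the dispatch rule described in this patch can be sketched roughly as follows. This is an illustrative model, not the project's code: the names `ExpandStats`, `estimate_indexed_work`, and `choose_path` are hypothetical, the ceilings are passed in as parameters because their default values are not stated here, the fan-out estimate `|E| / source_vertices` and the CSR build cost of about `|E|` are assumptions, and checking the override before the ceilings is also an assumption. Python is used only for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpandStats:
    """Cheap counts available at dispatch time (hypothetical container)."""
    frontier: int          # size of the initial frontier when the Expand is dispatched
    hops: int              # max_hops of the Expand
    edge_count: int        # |E| for the traversed edge type
    source_vertices: int   # number of source vertices
    btree_covered: bool    # False when BTREE coverage is degraded


def estimate_indexed_work(stats: ExpandStats) -> float:
    """Estimate total indexed work as roughly hops * frontier * fanout.

    Assumption: average fan-out is approximated by |E| / source_vertices.
    """
    fanout = stats.edge_count / max(stats.source_vertices, 1)
    return stats.hops * stats.frontier * fanout


def choose_path(stats: ExpandStats, max_frontier: int, max_hops: int,
                mode: str = "auto") -> str:
    """Return "indexed" or "csr" for a single Expand."""
    # Assumption: an explicit override wins before any other check.
    if mode in ("indexed", "csr"):
        return mode
    # The ceilings are checked once, against the initial frontier and hop count.
    # Later per-hop frontiers are not re-checked against them.
    if stats.frontier > max_frontier or stats.hops > max_hops:
        return "csr"
    # Degraded coverage would mean a full scan per hop, so fall back to CSR.
    if not stats.btree_covered:
        return "csr"
    # Assumption: building the CSR costs on the order of |E|.
    csr_build_cost = stats.edge_count
    return "indexed" if estimate_indexed_work(stats) < csr_build_cost else "csr"


selective = ExpandStats(frontier=10, hops=2, edge_count=1_000_000,
                        source_vertices=100_000, btree_covered=True)
dense = ExpandStats(frontier=50_000, hops=3, edge_count=1_000_000,
                    source_vertices=100_000, btree_covered=True)

print(choose_path(selective, max_frontier=100_000, max_hops=4))
print(choose_path(dense, max_frontier=100_000, max_hops=4))
```

In this sketch the dense case stays under both ceilings and is still routed to CSR, because the estimated fan-out makes the indexed path more expensive than the assumed CSR build. That is the sense in which dense fan-out is priced toward CSR rather than capped mid-traversal.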