Replace the fixed frontier<=1024 && hops<=6 dispatch threshold with a pure,
IO-free cost model. choose_expand_mode compares the indexed path's
frontier-relative work (hops * frontier * fanout, or hops * |E| when BTREE
coverage is degraded) against the cost of building the whole-graph CSR
(BUILD_FACTOR * |E|), from cheap manifest row counts. Under good coverage this
reduces to a selectivity ratio independent of |E|, preserving the flat-in-|E|
indexed win for selective traversals while routing dense / deep / high-fanout
or degraded-and-expensive traversals to CSR.
execute_expand decides cardinality-first and only opens the edge dataset to
confirm coverage when it leans indexed (no open on a clearly-CSR traversal).
The two env knobs become hard ceilings layered on the model; the
OMNIGRAPH_TRAVERSAL_MODE override still forces a path; the chosen mode is
traced. Results are unchanged across modes — only the path differs.
Adds inline crossover unit tests and extends the traversal_indexed both_modes
harness with an auto pass asserting the chooser is result-preserving across
every traversal shape. Documents the new flag semantics in
docs/user/{constants,query-language}.md.
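
The decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the `ExpandStats` fields, the `BUILD_FACTOR` value of 4.0, the signature of `choose_expand_mode`, the helper `decide_expand_mode`, and the estimate fanout = |E| / source-vertex count are all hypothetical. Only the two cost formulas and the order of checks come from the description.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExpandMode {
    Indexed,
    Csr,
}

/// Cheap row counts read from the manifest (field names are assumptions).
pub struct ExpandStats {
    pub frontier: u64,
    pub hops: u64,
    pub edges: u64,        // |E|
    pub src_vertices: u64, // distinct source vertices
}

/// Assumed value; the real constant is not given in the description.
pub const BUILD_FACTOR: f64 = 4.0;

/// Pure, IO-free comparison of indexed work against the CSR build cost.
pub fn choose_expand_mode(s: &ExpandStats, coverage_ok: bool) -> ExpandMode {
    let edges = s.edges as f64;
    let indexed_cost = if coverage_ok {
        // Frontier-relative work: hops * frontier * fanout.
        let fanout = edges / s.src_vertices.max(1) as f64;
        s.hops as f64 * s.frontier as f64 * fanout
    } else {
        // Degraded BTREE coverage: a full scan is paid on every hop.
        s.hops as f64 * edges
    };
    let csr_cost = BUILD_FACTOR * edges;
    if indexed_cost < csr_cost {
        ExpandMode::Indexed
    } else {
        ExpandMode::Csr
    }
}

/// Cardinality-first decision: `check_coverage` stands in for opening the
/// edge dataset and is only called when the counts lean indexed.
pub fn decide_expand_mode(
    s: &ExpandStats,
    max_frontier: u64,
    max_hops: u64,
    forced: Option<ExpandMode>,
    check_coverage: impl FnOnce() -> bool,
) -> ExpandMode {
    if let Some(mode) = forced {
        return mode; // explicit override wins
    }
    if s.frontier > max_frontier || s.hops > max_hops {
        return ExpandMode::Csr; // hard ceilings
    }
    if choose_expand_mode(s, true) == ExpandMode::Csr {
        return ExpandMode::Csr; // clearly CSR: no dataset open
    }
    choose_expand_mode(s, check_coverage())
}
```

With good coverage, |E| cancels on both sides of the comparison, so the test becomes hops * frontier / source-vertex count < BUILD_FACTOR. That is the selectivity ratio mentioned above.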
Query Language (.gq)
Pest grammar at crates/omnigraph-compiler/src/query/query.pest. AST in query/ast.rs. Type checker in query/typecheck.rs. Lowering in ir/lower.rs.
Query declarations
query <name>($p1: T1, $p2: T2?, …)
@description("…") @instruction("…") {
…
}
Two body shapes:
- Read: match { … } return { … } [order { … }] [limit N]
- Mutation: one or more of insert | update | delete statements
Param types reuse all schema scalars; trailing ? makes a param optional. The compiler reserves $__nanograph_now for now().
MATCH clauses
- Binding: $x: NodeType { prop: <literal | $param | now()>, … }
- Traversal: $src EDGE_NAME { min, max? } $dst (variable-length paths via hop bounds; default 1..1 if bounds omitted)
- Filter: <expr> <op> <expr> with operators >=, <=, !=, >, <, =, and string contains
- Negation: not { clause+ }, which desugars to an anti-join over the inner pipeline
Search clauses (multi-modal)
Used inside MATCH or as expressions inside RETURN/ORDER:
| Function | Purpose | Underlying Lance facility |
|---|---|---|
| nearest($x.vec, $q) | k-NN vector search (cosine) | Lance vector index (IVF / HNSW) |
| search(field, q) | Generic FTS | Inverted index |
| fuzzy(field, q [, max_edits]) | Levenshtein-tolerant text search | Inverted index |
| match_text(field, q) | Pattern match | Inverted index |
| bm25(field, q) | BM25 scoring | Inverted index |
| rrf(rank_a, rank_b [, k]) | Reciprocal Rank Fusion of two rankings (default k=60) | OmniGraph fuses scored rankings |
nearest() requires a LIMIT; the compiler resolves the query vector via the param map (or via the runtime embedding client when bound to a text input).
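
For reference, the standard Reciprocal Rank Fusion score of an item is the sum of 1 / (k + rank) over the rankings in which it appears, with 1-based ranks. The sketch below shows that formula for two rankings. The function name `rrf_score` and its signature are hypothetical, and OmniGraph's actual fusion code may differ in detail (tie handling, missing items, and so on).

```rust
/// Standard RRF score for one item across two rankings.
/// Ranks are 1-based; `None` means the item is absent from that ranking.
/// `rrf_score` is a hypothetical helper, not an OmniGraph API.
pub fn rrf_score(rank_a: Option<usize>, rank_b: Option<usize>, k: f64) -> f64 {
    [rank_a, rank_b]
        .iter()
        .flatten() // skip rankings that do not contain the item
        .map(|&r| 1.0 / (k + r as f64))
        .sum()
}
```

With k = 60, an item ranked second in both rankings scores 2/62, which beats an item ranked first in only one ranking (1/61). A larger k flattens the difference between adjacent ranks.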
RETURN clause
return { <expr> [as <alias>], … } with expressions:
- Variable / property access: $x, $x.prop
- Literals: string, int, float, bool, list
- now()
- Aggregates: count, sum, avg, min, max
- All search functions above (so you can return a score column)
- AliasRef: re-use a previous projection alias
ORDER & LIMIT
- order { <expr> [asc|desc], … }: supports plain expressions and nearest(...).
- limit <integer>: required when there is a nearest(...) ordering.
Mutation statements
- insert <Type> { prop: <value>, … }
- update <Type> set { prop: <value>, … } where <prop> <op> <value>
- delete <Type> where <prop> <op> <value>
<value> is a literal, $param, or now(). Multi-statement mutations execute atomically (added in v0.2.0).
D₂ — mixed insert/update + delete is rejected at parse time
A single mutation query must be either insert/update-only or delete-only. Mixed → rejected before any I/O with the message:
mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).
Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance 4.0.0 has no public two-phase delete). Mixing creates ordering hazards (same-row insert→delete becomes a no-op because the staged insert isn't visible to delete; cascading deletes of just-inserted edges break referential integrity by silent design). Until Lance exposes DeleteJob::execute_uncommitted, the parse-time rejection keeps both paths atomic and correct. See docs/dev/writes.md and docs/dev/invariants.md.
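
The rejection rule can be illustrated with a small check over the statement kinds of one mutation query. This is a sketch under assumptions: `StmtKind` and `check_mutation_kinds` are hypothetical names, and the real check runs inside the compiler's parse path rather than as a standalone function. The error text is abbreviated from the message quoted above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StmtKind {
    Insert,
    Update,
    Delete,
}

/// Reject a mutation that mixes inserts/updates with deletes.
/// Hypothetical helper; it performs no I/O.
pub fn check_mutation_kinds(name: &str, stmts: &[StmtKind]) -> Result<(), String> {
    let has_delete = stmts.iter().any(|s| *s == StmtKind::Delete);
    let has_write = stmts.iter().any(|s| *s != StmtKind::Delete);
    if has_delete && has_write {
        return Err(format!(
            "mutation '{}' on the same query mixes inserts/updates and deletes; \
             split into separate mutations: (1) inserts and updates, then (2) deletes.",
            name
        ));
    }
    Ok(())
}
```

An insert/update-only or delete-only statement list passes. Any combination containing both a delete and a non-delete is rejected before anything is written.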
IR (Intermediate Representation)
QueryIR { name, params, pipeline: Vec<IROp>, return_exprs, order_by, limit }
Pipeline operations:
- NodeScan { variable, type_name, filters }
- Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }: destination filters are pushed into the expand so Lance scalar pushdown can prune. Executed one of two ways, chosen per-expand by a cost model over cheap manifest counts (frontier size, |E|, source-vertex count, hops) plus index coverage. Selective traversals (small frontier relative to the source set) resolve neighbors from the persisted src/dst BTREE (one indexed scan per hop, cost ∝ frontier). Dense, deep, or large-frontier traversals, and those whose BTREE coverage is degraded so that a full scan would be paid per hop, use the in-memory CSR adjacency index. Both produce identical results. Bounded by OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER / OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS (hard ceilings, beyond which CSR is always used), with OMNIGRAPH_TRAVERSAL_MODE=indexed|csr forcing a mode (see constants).
- Filter { left, op, right }
- AntiJoin { outer_var, inner: Vec<IROp> }: for not { … }
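
As a rough picture of the shapes listed above, the IR could be modeled along these lines. The field names follow the text. The field types (strings for expressions, `Option<u32>` for max_hops, `Option<u64>` for limit), the `FilterExpr` struct, and the `expand_count` helper are assumptions made for illustration. The real definitions live in the compiler crate and may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Out,
    In,
}

/// Placeholder for a filter expression (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct FilterExpr {
    pub left: String,
    pub op: String,
    pub right: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum IROp {
    NodeScan { variable: String, type_name: String, filters: Vec<FilterExpr> },
    Expand {
        src_var: String,
        dst_var: String,
        edge_type: String,
        direction: Direction,
        dst_type: String,
        min_hops: u32,
        max_hops: Option<u32>, // assumed optional, mirroring `{ min, max? }`
        dst_filters: Vec<FilterExpr>, // pushed down into the expand
    },
    Filter { left: String, op: String, right: String },
    AntiJoin { outer_var: String, inner: Vec<IROp> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct QueryIR {
    pub name: String,
    pub params: Vec<String>,
    pub pipeline: Vec<IROp>,
    pub return_exprs: Vec<String>,
    pub order_by: Vec<String>,
    pub limit: Option<u64>,
}

/// Count top-level Expand operations (does not descend into AntiJoin).
pub fn expand_count(ir: &QueryIR) -> usize {
    ir.pipeline
        .iter()
        .filter(|op| matches!(op, IROp::Expand { .. }))
        .count()
}
```

Because AntiJoin carries its own `Vec<IROp>`, a negated sub-pattern is itself a small pipeline nested inside the outer one.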
Lowering:
- Partition MATCH clauses (bindings, traversals, filters, negations).
- Identify "deferred" bindings (a destination of a traversal that has filters) so the Expand can carry the filter as a pushdown.
- Emit NodeScan for the first binding, then Expand operations, then remaining Filter operations, then AntiJoins for negations.
- Translate RETURN / ORDER expressions; preserve LIMIT.
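
The first three lowering steps can be sketched with simplified types. Everything here is hypothetical scaffolding: `Clause`, `Op`, and `lower` are illustration-only names, filters are reduced to strings, and a deferred binding is represented by a `pushdown` flag on the expand rather than by the actual filter payload.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Clause {
    Binding { var: String, has_filters: bool },
    Traversal { src: String, dst: String },
    Filter(String),
    Negation(Vec<Clause>),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Op {
    NodeScan(String),
    Expand { src: String, dst: String, pushdown: bool },
    Filter(String),
    AntiJoin(Vec<Op>),
}

pub fn lower(clauses: &[Clause]) -> Vec<Op> {
    // Step 1: partition the MATCH clauses by kind.
    let mut bindings: Vec<(&str, bool)> = Vec::new();
    let mut traversals: Vec<(&str, &str)> = Vec::new();
    let mut filters: Vec<&str> = Vec::new();
    let mut negations: Vec<&[Clause]> = Vec::new();
    for c in clauses {
        match c {
            Clause::Binding { var, has_filters } => bindings.push((var.as_str(), *has_filters)),
            Clause::Traversal { src, dst } => traversals.push((src.as_str(), dst.as_str())),
            Clause::Filter(f) => filters.push(f.as_str()),
            Clause::Negation(inner) => negations.push(inner.as_slice()),
        }
    }
    // Step 2: a binding is deferred when it is a traversal destination
    // and carries filters, so the Expand can take them as a pushdown.
    let deferred = |dst: &str| bindings.iter().any(|(v, f)| *v == dst && *f);

    // Step 3: NodeScan for the first binding, then Expands, Filters, AntiJoins.
    let mut ops = Vec::new();
    if let Some((var, _)) = bindings.first() {
        ops.push(Op::NodeScan(var.to_string()));
    }
    for (src, dst) in &traversals {
        ops.push(Op::Expand {
            src: src.to_string(),
            dst: dst.to_string(),
            pushdown: deferred(dst),
        });
    }
    for f in &filters {
        ops.push(Op::Filter(f.to_string()));
    }
    for inner in &negations {
        ops.push(Op::AntiJoin(lower(inner)));
    }
    ops
}
```

The emitted order is fixed regardless of the order in which clauses were written, which is why partitioning happens before any operation is emitted.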
Linting & validation (query/lint.rs)
Codes seen so far:
- Q000 (Error): parse error
- L201 (Warning): nullable property never set by any UPDATE — "{type}.{prop} exists in schema but no update query sets it"
- (Warning): mutation declares no params — hardcoded mutations are easy to miss
- Plus all type errors from typecheck_query_decl() (undefined types, mismatched operators, undefined edges, etc.)
Output:
QueryLintOutput { status, schema_source, query_path,
queries_processed, errors, warnings, infos,
results: [{ name, kind, status, error?, warnings[] }],
findings: [{ severity, code, message, type_name?, property?, query_names[] }] }
CLI exits non-zero only on status = Error.