mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-15 01:55:13 +02:00
Query Language (.gq)
Pest grammar at crates/omnigraph-compiler/src/query/query.pest. AST in query/ast.rs. Type checker in query/typecheck.rs. Lowering in ir/lower.rs.
Query declarations
query <name>($p1: T1, $p2: T2?, …)
@description("…") @instruction("…") {
…
}
Two body shapes:
- Read: match { … } return { … } [order { … }] [limit N] (covered on this page).
- Mutation: one or more of insert | update | delete statements (see mutations).
Multi-modal search functions (nearest, bm25, rrf, …) used inside match,
return, and order are documented on the search page.
Param types reuse all schema scalars; trailing ? makes a param optional. The compiler reserves $__nanograph_now for now().
MATCH clauses
- Binding: $x: NodeType { prop: <literal | $param | now()>, … }
- Traversal: $src EDGE_NAME { min, max? } $dst. Variable-length paths via hop bounds; default 1..1 if bounds omitted.
- Filter: <expr> <op> <expr> with operators >=, <=, !=, >, <, =, and string contains.
- Negation: not { clause+ }. Desugars to an anti-join over the inner pipeline.
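To make the negation rule concrete, here is a minimal sketch of anti-join semantics in plain Rust: an outer row survives only if the inner pipeline finds no match for it. The `Row` alias and the `anti_join` and `row` helpers are hypothetical and are not OmniGraph APIs; a closure stands in for executing the clauses inside not { … }.

```rust
use std::collections::HashMap;

/// Hypothetical row shape: variable name -> bound entity id.
type Row = HashMap<String, u64>;

/// Keep only the outer rows for which the inner pipeline finds no match.
/// `inner_matches` stands in for running the clauses inside `not { … }`.
fn anti_join<F>(outer: Vec<Row>, inner_matches: F) -> Vec<Row>
where
    F: Fn(&Row) -> bool,
{
    outer.into_iter().filter(|row| !inner_matches(row)).collect()
}

/// Illustration-only helper that builds a row binding a single variable.
fn row(var: &str, id: u64) -> Row {
    let mut r = Row::new();
    r.insert(var.to_string(), id);
    r
}
```

Calling `anti_join` with rows binding `$x` to ids 1, 2, 3 and an inner predicate that matches only id 2 would keep the rows for ids 1 and 3.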
RETURN clause
return { <expr> [as <alias>], … } with expressions:
- Variable / property access: $x, $x.prop
- Literals: string, int, float, bool, list
- now()
- Aggregates: count, sum, avg, min, max
- Search functions (so you can return a score column)
- AliasRef: re-use a previous projection alias
ORDER & LIMIT
- order { <expr> [asc|desc], … }: supports plain expressions and nearest(...).
- limit <integer>: required when there is a nearest(...) ordering.
- Total, deterministic order. Ties between rows with equal user-sort keys are broken by the bound entities' key columns (<var>.id, ascending), appended as a final tie-break. The result is therefore a total order that is reproducible across runs, and order … limit N returns a deterministic top-N even when ties straddle the cutoff. (Aggregate results have no entity-key columns; their group rows are already distinct on the projected group keys.)
- NULL placement is nulls-first ascending, nulls-last descending (i.e. nulls_first = !descending): a NULL sorts as if smaller than any value.
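The ordering rules above can be sketched as a comparator. This is an illustration under stated assumptions, not the engine's code: the `Row` struct with a single nullable integer key, and the `compare` and `order_limit` functions, are hypothetical names. It relies on Rust ordering `None` before `Some(_)`, which matches "a NULL sorts as if smaller than any value".

```rust
use std::cmp::Ordering;

/// Hypothetical result row: an entity id plus one nullable sort key.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: u64,
    key: Option<i64>,
}

/// Compare two rows on the user sort key, then on `id` ascending.
/// `None < Some(_)` in Rust, so ascending order puts NULLs first and
/// reversing the comparison for descending order puts them last.
fn compare(a: &Row, b: &Row, descending: bool) -> Ordering {
    let by_key = a.key.cmp(&b.key);
    let by_key = if descending { by_key.reverse() } else { by_key };
    // The id tie-break stays ascending regardless of the key direction.
    by_key.then(a.id.cmp(&b.id))
}

/// Sort, then truncate: ties at the cutoff resolve the same way every run.
fn order_limit(mut rows: Vec<Row>, descending: bool, limit: usize) -> Vec<Row> {
    rows.sort_by(|a, b| compare(a, b, descending));
    rows.truncate(limit);
    rows
}
```

Because every comparison ends on the unique `id`, no two distinct rows compare equal, so the top-N does not depend on input order.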
Write statements (insert / update / delete) are documented on the
mutations page.
IR (Intermediate Representation)
QueryIR { name, params, pipeline: Vec<IROp>, return_exprs, order_by, limit }
Pipeline operations:
- NodeScan { variable, type_name, filters }
- Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }: destination filters are pushed into the expand so Lance scalar pushdown can prune. Each expand is executed one of two ways, chosen by a cost model over cheap manifest counts (frontier size, |E|, source-vertex count, hops) plus index coverage. Selective traversals (small frontier relative to the source set) resolve neighbors from the persisted src/dst BTREE, with one indexed scan per hop. Dense, deep, or large-frontier traversals, and those whose BTREE coverage is degraded so that a full scan would be paid per hop, use the in-memory CSR adjacency index. Both produce identical results. The OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER / OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS ceilings bound the initial dispatch frontier/hops (beyond them CSR is always used) and are not a hard per-hop bound; the cost model estimates total indexed work as ~hops × frontier × fanout and prices dense fan-out toward CSR. OMNIGRAPH_TRAVERSAL_MODE=indexed|csr forces a mode (see constants).
- Filter { left, op, right }
- AntiJoin { outer_var, inner: Vec<IROp> }: for not { … }
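The per-expand dispatch can be pictured with the rough sketch below. Only the overall shape comes from the description above (forced mode, initial ceilings, index coverage, and a work estimate of hops × frontier × fanout). Everything else is an assumption made for illustration: the `ExpandStats` and `Limits` structs, the `work_budget` threshold, deriving fanout as |E| divided by the source-vertex count, and the order of the checks.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Indexed,
    Csr,
}

/// Hypothetical inputs; field names are illustrative only.
struct ExpandStats {
    frontier: u64,
    edge_count: u64,
    source_vertices: u64,
    hops: u64,
    btree_covered: bool,
}

/// Hypothetical knobs: two ceilings plus an invented work budget.
struct Limits {
    max_frontier: u64,
    max_hops: u64,
    work_budget: f64,
}

fn choose_mode(forced: Option<Mode>, s: &ExpandStats, l: &Limits) -> Mode {
    // A forced traversal mode bypasses the cost model entirely.
    if let Some(mode) = forced {
        return mode;
    }
    // Beyond either ceiling, CSR is always used.
    if s.frontier > l.max_frontier || s.hops > l.max_hops {
        return Mode::Csr;
    }
    // Degraded index coverage would mean a full scan per hop.
    if !s.btree_covered {
        return Mode::Csr;
    }
    // Assumed fanout estimate: average out-degree of a source vertex.
    let fanout = s.edge_count as f64 / s.source_vertices.max(1) as f64;
    let work = s.hops as f64 * s.frontier as f64 * fanout;
    if work <= l.work_budget { Mode::Indexed } else { Mode::Csr }
}
```

With these made-up numbers, a 2-hop expand from a frontier of 10 over a graph averaging 5 edges per source estimates 100 units of work and stays indexed, while a frontier of 900 at 3 hops estimates 13,500 and is priced toward CSR even though it is under both ceilings.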
Lowering:
- Partition MATCH clauses (bindings, traversals, filters, negations).
- Identify "deferred" bindings (a destination of a traversal that has filters) so the Expand can carry the filter as a pushdown.
- Emit NodeScan for the first binding, then Expand operations, then remaining Filter operations, then AntiJoins for negations.
- Translate RETURN / ORDER expressions; preserve LIMIT.
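The emission order in the lowering steps can be sketched with simplified stand-in types. The `Clause` and `Op` enums and the `lower` function here are hypothetical reductions of the real AST and IROp; the sketch omits the deferred-binding filter pushdown and the RETURN / ORDER / LIMIT translation, and only shows that clauses are partitioned and then emitted as NodeScan, Expand, Filter, AntiJoin regardless of the order they were written in.

```rust
/// Hypothetical, simplified stand-ins for MATCH clauses.
#[derive(Debug, Clone, PartialEq)]
enum Clause {
    Binding { var: String },
    Traversal { src: String, dst: String },
    Filter { expr: String },
    Negation { inner: Vec<Clause> },
}

/// Hypothetical, simplified stand-ins for pipeline operations.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    NodeScan(String),
    Expand { src: String, dst: String },
    Filter(String),
    AntiJoin(Vec<Op>),
}

fn lower(clauses: &[Clause]) -> Vec<Op> {
    let (mut scan, mut expands, mut filters, mut anti_joins) =
        (None, Vec::new(), Vec::new(), Vec::new());
    // Partition the clauses by kind.
    for clause in clauses {
        match clause {
            // Only the first binding becomes a NodeScan in this sketch.
            Clause::Binding { var } => {
                if scan.is_none() {
                    scan = Some(Op::NodeScan(var.clone()));
                }
            }
            Clause::Traversal { src, dst } => {
                expands.push(Op::Expand { src: src.clone(), dst: dst.clone() })
            }
            Clause::Filter { expr } => filters.push(Op::Filter(expr.clone())),
            // The inner clauses of a negation are lowered as their own pipeline.
            Clause::Negation { inner } => anti_joins.push(Op::AntiJoin(lower(inner))),
        }
    }
    // Emit in the fixed order: scan, expands, filters, anti-joins.
    let mut pipeline: Vec<Op> = scan.into_iter().collect();
    pipeline.extend(expands);
    pipeline.extend(filters);
    pipeline.extend(anti_joins);
    pipeline
}
```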
Linting & validation (query/lint.rs)
Codes seen so far:
- Q000 (Error): parse error
- L201 (Warning): nullable property never set by any UPDATE — "{type}.{prop} exists in schema but no update query sets it"
- (Warning): mutation declares no params — hardcoded mutations are easy to miss
- Plus all type errors from typecheck_query_decl() (undefined types, mismatched operators, undefined edges, etc.)
Output:
QueryLintOutput { status, schema_source, query_path,
queries_processed, errors, warnings, infos,
results: [{ name, kind, status, error?, warnings[] }],
findings: [{ severity, code, message, type_name?, property?, query_names[] }] }
CLI exits non-zero only on status = Error.
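The exit rule can be illustrated with a small sketch. The `Status` variants other than `Error`, the `overall_status` derivation from error and warning counts, and the specific exit code 1 are all assumptions made for illustration; the source only states that the exit is non-zero when status = Error.

```rust
/// Hypothetical status enum; only `Error` is named in the text above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Ok,
    Warning,
    Error,
}

/// Assumed derivation: any error wins, then any warning, else Ok.
fn overall_status(errors: usize, warnings: usize) -> Status {
    if errors > 0 {
        Status::Error
    } else if warnings > 0 {
        Status::Warning
    } else {
        Status::Ok
    }
}

/// Warnings alone do not fail the run; 1 is an assumed non-zero code.
fn exit_code(status: Status) -> i32 {
    if status == Status::Error { 1 } else { 0 }
}
```

Under this reading, a run that reports only L201 warnings still exits 0, so the lint can run in CI without blocking on advisory findings.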