omnigraph/docs/query-language.md
Ragnor Comerford a61e82f47a
2026-05-01 10:43:19 +02:00


Query Language (.gq)

Pest grammar at crates/omnigraph-compiler/src/query/query.pest. AST in query/ast.rs. Type checker in query/typecheck.rs. Lowering in ir/lower.rs.

Query declarations

query <name>($p1: T1, $p2: T2?, …)
  @description("…") @instruction("…") {
  …
}

Two body shapes:

  • Read: match { … } return { … } [order { … }] [limit N]
  • Mutation: one or more of insert | update | delete statements

Param types reuse all schema scalars; trailing ? makes a param optional. The compiler reserves $__nanograph_now for now().
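As a rough illustration of the trailing `?` rule, the sketch below parses a single parameter declaration into a name, a type name, and an optional flag. `ParamDecl` and `parse_param` are hypothetical names, not part of the compiler, and type names are treated as opaque strings rather than checked against the schema scalars.

```rust
/// Hypothetical, simplified view of one declared query parameter.
#[derive(Debug, PartialEq)]
pub struct ParamDecl {
    pub name: String,
    pub type_name: String,
    pub optional: bool,
}

/// Parse one `$name: Type` or `$name: Type?` declaration.
/// Returns `None` when the `$` sigil, the colon, or either side is missing.
pub fn parse_param(src: &str) -> Option<ParamDecl> {
    let (name, ty) = src.split_once(':')?;
    let name = name.trim().strip_prefix('$')?;
    let ty = ty.trim();
    // A trailing `?` marks the parameter as optional.
    let (type_name, optional) = match ty.strip_suffix('?') {
        Some(base) => (base.trim_end(), true),
        None => (ty, false),
    };
    if name.is_empty() || type_name.is_empty() {
        return None;
    }
    Some(ParamDecl {
        name: name.to_string(),
        type_name: type_name.to_string(),
        optional,
    })
}
```

Calling `parse_param("$limit: I32?")` under these assumptions yields a declaration with `optional` set, while `parse_param("$q: String")` yields a required one; `I32` and `String` stand in for whatever scalars the schema defines.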

MATCH clauses

  • Binding: $x: NodeType { prop: <literal | $param | now()>, … }
  • Traversal: $src EDGE_NAME { min, max? } $dst — variable-length paths via hop bounds; default 1..1 if bounds omitted.
  • Filter: <expr> <op> <expr> with operators >=, <=, !=, >, <, =, and string contains.
  • Negation: not { clause+ } — desugars to anti-join over the inner pipeline.
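The hop-bound defaulting described for traversals can be sketched as follows. `HopBounds` and `resolve_bounds` are hypothetical names. The sketch leaves an omitted `max` uninterpreted (`None`), since the text above does not say what it means, and the `max < min` check is an assumption rather than documented behavior.

```rust
/// Hypothetical resolved hop bounds for a traversal clause.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HopBounds {
    pub min: u32,
    /// `None` means `max` was not written; its meaning is not specified here.
    pub max: Option<u32>,
}

/// Resolve the optional `{ min, max? }` suffix of a traversal.
/// `written` is `None` when the bounds are omitted entirely.
pub fn resolve_bounds(written: Option<(u32, Option<u32>)>) -> Result<HopBounds, String> {
    match written {
        // Bounds omitted: a single hop, i.e. 1..1.
        None => Ok(HopBounds { min: 1, max: Some(1) }),
        // Assumed sanity check: an explicit max must not be below min.
        Some((min, Some(max))) if max < min => {
            Err(format!("max hops {} is below min hops {}", max, min))
        }
        Some((min, max)) => Ok(HopBounds { min, max }),
    }
}
```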

Search clauses (multi-modal)

Used inside MATCH or as expressions inside RETURN/ORDER:

| Function | Purpose | Underlying Lance facility |
| --- | --- | --- |
| nearest($x.vec, $q) | k-NN vector search (cosine) | Lance vector index (IVF / HNSW) |
| search(field, q) | Generic FTS | Inverted index |
| fuzzy(field, q [, max_edits]) | Levenshtein-tolerant text search | Inverted index |
| match_text(field, q) | Pattern match | Inverted index |
| bm25(field, q) | BM25 scoring | Inverted index |
| rrf(rank_a, rank_b [, k]) | Reciprocal Rank Fusion of two rankings (default k=60) | OmniGraph fuses scored rankings |

nearest() requires a LIMIT; the compiler resolves the query vector via the param map (or via the runtime embedding client when bound to a text input).
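For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the textbook formula, where each item scores the sum of 1 / (k + rank) over the rankings it appears in, with 1-based ranks. It is not OmniGraph's implementation: the function name `rrf_fuse`, the use of string ids, and the tie-breaking by id are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Fuse two rankings (best item first) with textbook Reciprocal Rank Fusion.
/// Returns (id, fused score) pairs sorted by descending score, ties broken by id.
pub fn rrf_fuse(rank_a: &[&str], rank_b: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in &[rank_a, rank_b] {
        for (i, id) in ranking.iter().enumerate() {
            // Ranks are 1-based, so the top item contributes 1 / (k + 1).
            let contribution = 1.0 / (k + (i as f64 + 1.0));
            *scores.entry((*id).to_string()).or_insert(0.0) += contribution;
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|x, y| {
        y.1.partial_cmp(&x.1)
            .unwrap()
            .then_with(|| x.0.cmp(&y.0))
    });
    fused
}
```

With `rank_a = ["d1", "d2", "d3"]`, `rank_b = ["d2", "d4"]`, and the default `k = 60`, the item `d2` comes out on top because it is the only one present in both rankings.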

RETURN clause

return { <expr> [as <alias>], … } with expressions:

  • Variable / property access: $x, $x.prop
  • Literals: string, int, float, bool, list
  • now()
  • Aggregates: count, sum, avg, min, max
  • All search functions above (so you can return a score column)
  • AliasRef — re-use a previous projection alias

ORDER & LIMIT

  • order { <expr> [asc|desc], … } — supports plain expressions and nearest(...).
  • limit <integer> — required when there is a nearest(...) ordering.
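The "limit is required with nearest(...)" rule amounts to a small validation pass. The sketch below uses hypothetical types (`OrderExpr`, `check_limit`) and an invented error message; the real check lives somewhere in the compiler and may look quite different.

```rust
/// Hypothetical, simplified ORDER expression.
#[derive(Debug, PartialEq)]
pub enum OrderExpr {
    /// Any plain expression, kept as source text for this sketch.
    Plain(String),
    /// A `nearest(field, query)` ordering.
    Nearest { field: String, query: String },
}

/// Reject a query that orders by `nearest(...)` without a `limit`.
pub fn check_limit(order: &[OrderExpr], limit: Option<u64>) -> Result<(), String> {
    let has_nearest = order
        .iter()
        .any(|e| matches!(e, OrderExpr::Nearest { .. }));
    if has_nearest && limit.is_none() {
        // Message text is illustrative only.
        return Err("nearest(...) ordering requires a limit".to_string());
    }
    Ok(())
}
```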

Mutation statements

  • insert <Type> { prop: <value>, … }
  • update <Type> set { prop: <value>, … } where <prop> <op> <value>
  • delete <Type> where <prop> <op> <value>

<value> is a literal, $param, or now(). Multi-statement mutations execute atomically (added in v0.2.0).

D₂ — mixed insert/update + delete is rejected at parse time

A single mutation query must be either insert/update-only or delete-only. A query that mixes the two is rejected before any I/O with the message:

mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).

Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still commit inline (Lance 4.0.0 has no public two-phase delete). Mixing the two creates ordering hazards: a same-row insert followed by a delete becomes a no-op because the staged insert isn't visible to the delete, and cascading deletes of just-inserted edges silently break referential integrity. Until Lance exposes DeleteJob::execute_uncommitted, the parse-time rejection keeps both paths atomic and correct. See docs/runs.md and docs/invariants.md §VI.25.
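The D₂ rule itself is a simple scan over the statement kinds of one mutation query. The following is a minimal sketch assuming a hypothetical `MutationKind` enum and `check_d2` function; the error string is abbreviated to the first clause of the documented message.

```rust
/// Hypothetical tag for each statement in a mutation query body.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MutationKind {
    Insert,
    Update,
    Delete,
}

/// Reject a mutation that mixes inserts/updates with deletes.
/// Runs on statement kinds only, so no I/O is needed before rejecting.
pub fn check_d2(query_name: &str, stmts: &[MutationKind]) -> Result<(), String> {
    let has_write = stmts
        .iter()
        .any(|k| matches!(k, MutationKind::Insert | MutationKind::Update));
    let has_delete = stmts.iter().any(|k| matches!(k, MutationKind::Delete));
    if has_write && has_delete {
        // Abbreviated form of the documented message.
        return Err(format!(
            "mutation '{}' on the same query mixes inserts/updates and deletes",
            query_name
        ));
    }
    Ok(())
}
```

Under this sketch a query with `[Insert, Update]` or `[Delete, Delete]` passes, while `[Insert, Delete]` is rejected regardless of statement order.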

IR (Intermediate Representation)

QueryIR { name, params, pipeline: Vec<IROp>, return_exprs, order_by, limit }

Pipeline operations:

  • NodeScan { variable, type_name, filters }
  • Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters } — destination filters are pushed into the expand so Lance scalar pushdown can prune.
  • Filter { left, op, right }
  • AntiJoin { outer_var, inner: Vec<IROp> } — for not { … }

Lowering:

  1. Partition MATCH clauses (bindings, traversals, filters, negations).
  2. Identify "deferred" bindings (a destination of a traversal that has filters) so the Expand can carry the filter as a pushdown.
  3. Emit NodeScan for the first binding, then Expand operations, then remaining Filter operations, then AntiJoins for negations.
  4. Translate RETURN / ORDER expressions; preserve LIMIT.
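The ordering in steps 1 to 3 can be sketched with heavily simplified types. Everything here is an assumption for illustration: `Clause` is a stand-in for the real AST, this `IROp` drops fields such as direction and hop bounds, filters are plain strings, and step 4 (RETURN / ORDER / LIMIT) is omitted. The sketch also defers every destination binding, not only those with filters, so the Expand can carry the destination type.

```rust
/// Hypothetical, simplified MATCH clause.
#[derive(Debug, Clone, PartialEq)]
pub enum Clause {
    Binding { var: String, type_name: String, filters: Vec<String> },
    Traversal { src: String, edge: String, dst: String },
    Filter(String),
    Not(Vec<Clause>),
}

/// Simplified IR operation (fewer fields than the real one).
#[derive(Debug, Clone, PartialEq)]
pub enum IROp {
    NodeScan { variable: String, type_name: String, filters: Vec<String> },
    Expand {
        src_var: String,
        dst_var: String,
        edge_type: String,
        dst_type: Option<String>,
        dst_filters: Vec<String>,
    },
    Filter(String),
    AntiJoin { inner: Vec<IROp> },
}

pub fn lower(clauses: &[Clause]) -> Vec<IROp> {
    let mut ops = Vec::new();
    // Step 1: pull out the bindings so traversals can look them up.
    let bindings: Vec<(&String, &String, &Vec<String>)> = clauses
        .iter()
        .filter_map(|c| match c {
            Clause::Binding { var, type_name, filters } => Some((var, type_name, filters)),
            _ => None,
        })
        .collect();
    // Step 3a: NodeScan for the first binding.
    if let Some(&(var, type_name, filters)) = bindings.first() {
        ops.push(IROp::NodeScan {
            variable: var.clone(),
            type_name: type_name.clone(),
            filters: filters.clone(),
        });
    }
    // Steps 2 and 3b: each traversal becomes an Expand; a later binding on the
    // destination variable is deferred into it as type + pushdown filters.
    for c in clauses {
        if let Clause::Traversal { src, edge, dst } = c {
            let deferred = bindings.iter().skip(1).find(|b| b.0 == dst);
            let (dst_type, dst_filters) = match deferred {
                Some(b) => (Some(b.1.clone()), b.2.clone()),
                None => (None, Vec::new()),
            };
            ops.push(IROp::Expand {
                src_var: src.clone(),
                dst_var: dst.clone(),
                edge_type: edge.clone(),
                dst_type,
                dst_filters,
            });
        }
    }
    // Step 3c: remaining standalone filters.
    for c in clauses {
        if let Clause::Filter(expr) = c {
            ops.push(IROp::Filter(expr.clone()));
        }
    }
    // Step 3d: negations become anti-joins over the lowered inner pipeline.
    for c in clauses {
        if let Clause::Not(inner) = c {
            ops.push(IROp::AntiJoin { inner: lower(inner) });
        }
    }
    ops
}
```

Whatever order the clauses are written in, the output under this sketch is always NodeScan, then Expands, then Filters, then AntiJoins.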

Linting & validation (query/lint.rs)

Codes seen so far:

  • Q000 (Error): parse error
  • L201 (Warning): nullable property never set by any UPDATE — "{type}.{prop} exists in schema but no update query sets it"
  • (Warning): mutation declares no params — hardcoded mutations are easy to miss
  • Plus all type errors from typecheck_query_decl() (undefined types, mismatched operators, undefined edges, etc.)

Output:

QueryLintOutput { status, schema_source, query_path,
  queries_processed, errors, warnings, infos,
  results: [{ name, kind, status, error?, warnings[] }],
  findings: [{ severity, code, message, type_name?, property?, query_names[] }] }

CLI exits non-zero only on status = Error.
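The exit-code rule can be sketched as "take the worst status, then fail only on Error". The `Status` variants, the function names, and the concrete exit code `1` are assumptions; the text above only guarantees a non-zero exit on Error.

```rust
/// Hypothetical lint status, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Ok,
    Warning,
    Error,
}

/// Overall status is the most severe per-query status (Ok when there are none).
pub fn overall_status(per_query: &[Status]) -> Status {
    per_query.iter().copied().max().unwrap_or(Status::Ok)
}

/// Warnings alone do not fail the run; only Error maps to a non-zero code.
pub fn exit_code(status: Status) -> i32 {
    if status == Status::Error {
        1
    } else {
        0
    }
}
```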