mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
# Query Language (`.gq`)

Pest grammar at `crates/omnigraph-compiler/src/query/query.pest`. AST in `query/ast.rs`. Type checker in `query/typecheck.rs`. Lowering in `ir/lower.rs`.

## Query declarations

```
query <name>($p1: T1, $p2: T2?, …)
@description("…") @instruction("…") {
…
}
```

Two body shapes:

- **Read**: `match { … } return { … } [order { … }] [limit N]`
- **Mutation**: one or more of `insert | update | delete` statements

Param types reuse all schema scalars; trailing `?` makes a param optional. The compiler reserves `$__nanograph_now` for `now()`.
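
As a rough illustration of the optional-param rule, the sketch below checks a supplied param map against a declaration list. `ParamDecl` and `missing_required` are hypothetical names invented for this example; the compiler's real representation in `query/ast.rs` may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical declared parameter: a name plus whether it carried a trailing `?`.
pub struct ParamDecl {
    pub name: &'static str,
    pub optional: bool,
}

/// Returns the names of required params that are absent from `supplied`.
/// Optional params (declared with `?`) may be left out without error.
pub fn missing_required(
    decls: &[ParamDecl],
    supplied: &HashMap<String, String>,
) -> Vec<&'static str> {
    decls
        .iter()
        .filter(|d| !d.optional && !supplied.contains_key(d.name))
        .map(|d| d.name)
        .collect()
}
```

With declarations `$p1: T1, $p2: T2?`, an empty param map would report only `p1` as missing.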

## MATCH clauses

- **Binding**: `$x: NodeType { prop: <literal | $param | now()>, … }`
- **Traversal**: `$src EDGE_NAME { min, max? } $dst` — variable-length paths via hop bounds; default 1..1 if bounds omitted.
- **Filter**: `<expr> <op> <expr>` with operators `>=`, `<=`, `!=`, `>`, `<`, `=`, and string `contains`.
- **Negation**: `not { clause+ }` — desugars to anti-join over the inner pipeline.
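
To make the hop bounds concrete, here is a minimal sketch of a bounded expansion over an in-memory adjacency map. It is not the engine's traversal code. The `expand` function, the `u32` node ids, and the choice to count walks (so cycles may revisit nodes) are all assumptions made for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Nodes reachable from `src` by a walk of k edges, with min_hops <= k <= max_hops.
/// Omitted bounds in a query would correspond to min_hops = max_hops = 1.
pub fn expand(
    adj: &HashMap<u32, Vec<u32>>,
    src: u32,
    min_hops: u32,
    max_hops: u32,
) -> BTreeSet<u32> {
    let mut frontier: BTreeSet<u32> = BTreeSet::from([src]);
    let mut reached = BTreeSet::new();
    if min_hops == 0 {
        // Zero hops means the source itself qualifies (assumed semantics).
        reached.insert(src);
    }
    for hop in 1..=max_hops {
        // Advance the frontier by exactly one edge.
        frontier = frontier
            .iter()
            .flat_map(|n| adj.get(n).into_iter().flatten().copied())
            .collect();
        if hop >= min_hops {
            reached.extend(frontier.iter().copied());
        }
        if frontier.is_empty() {
            break;
        }
    }
    reached
}
```

On a chain `1 -> 2 -> 3 -> 4`, bounds `{1, 1}` from node 1 yield only node 2, while `{2, 3}` yield nodes 3 and 4.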

## Search clauses (multi-modal)

Used inside MATCH or as expressions inside RETURN/ORDER:

| Function | Purpose | Underlying Lance facility |
|---|---|---|
| `nearest($x.vec, $q)` | k-NN vector search (cosine) | Lance vector index (IVF / HNSW) |
| `search(field, q)` | Generic FTS | Inverted index |
| `fuzzy(field, q [, max_edits])` | Levenshtein-tolerant text search | Inverted index |
| `match_text(field, q)` | Pattern match | Inverted index |
| `bm25(field, q)` | BM25 scoring | Inverted index |
| `rrf(rank_a, rank_b [, k])` | Reciprocal Rank Fusion of two rankings (default k=60) | OmniGraph fuses scored rankings |

`nearest()` requires a `LIMIT`; the compiler resolves the query vector via the param map (or via the runtime embedding client when bound to a text input).
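
For readers unfamiliar with Reciprocal Rank Fusion, the sketch below applies the commonly used formula: each ranking contributes `1 / (k + rank)` per item, with 1-based ranks. Whether OmniGraph's `rrf` uses exactly this formula and tie-breaking is an assumption. The function name and the string ids are illustrative only.

```rust
use std::collections::HashMap;

/// Fuse two best-first rankings of ids with Reciprocal Rank Fusion.
/// Returns (id, fused score) pairs, highest score first.
pub fn rrf(rank_a: &[&str], rank_b: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in &[rank_a, rank_b] {
        for (i, id) in ranking.iter().enumerate() {
            // Ranks are 1-based: the top hit contributes 1 / (k + 1).
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id for a deterministic order.
    fused.sort_by(|x, y| y.1.total_cmp(&x.1).then_with(|| x.0.cmp(&y.0)));
    fused
}
```

An id that appears near the top of both rankings outranks one that tops a single ranking, which is the point of fusing, for example, a `nearest` ranking with a `bm25` ranking.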

## RETURN clause

`return { <expr> [as <alias>], … }` with expressions:

- Variable / property access: `$x`, `$x.prop`
- Literals: string, int, float, bool, list
- `now()`
- Aggregates: `count`, `sum`, `avg`, `min`, `max`
- All search functions above (so you can return a score column)
- `AliasRef` — re-use a previous projection alias

## ORDER & LIMIT

- `order { <expr> [asc|desc], … }` — supports plain expressions and `nearest(...)`.
- `limit <integer>` — required when there is a `nearest(...)` ordering.
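
The limit requirement can be pictured as a small validation pass. The sketch below is a hypothetical shape for such a check. `OrderKey`, `check_limit`, and the error text are invented here and are not taken from `query/typecheck.rs`.

```rust
/// Hypothetical, simplified ordering key.
pub enum OrderKey {
    /// A plain expression such as `$x.name`.
    Expr(String),
    /// A `nearest(field, query)` ordering.
    Nearest { field: String, query: String },
}

/// Rejects an ordering that uses `nearest(...)` without a limit.
pub fn check_limit(order: &[OrderKey], limit: Option<u64>) -> Result<(), String> {
    let has_nearest = order.iter().any(|k| matches!(k, OrderKey::Nearest { .. }));
    if has_nearest && limit.is_none() {
        return Err("nearest(...) ordering requires a limit".to_string());
    }
    Ok(())
}
```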

## Mutation statements

- `insert <Type> { prop: <value>, … }`
- `update <Type> set { prop: <value>, … } where <prop> <op> <value>`
- `delete <Type> where <prop> <op> <value>`

`<value>` is a literal, `$param`, or `now()`. Multi-statement mutations execute atomically (added in v0.2.0).

### D₂ — mixed insert/update + delete is rejected at parse time

A single mutation query must be **either insert/update-only or delete-only**. A query that mixes the two is rejected before any I/O with the message:

> `mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).`

Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance 4.0.0 has no public two-phase delete). Mixing creates ordering hazards: a same-row insert→delete becomes a no-op because the staged insert isn't visible to the delete, and cascading deletes of just-inserted edges break referential integrity by silent design. Until Lance exposes `DeleteJob::execute_uncommitted`, the parse-time rejection keeps both paths atomic and correct. See [docs/runs.md](runs.md) and [docs/invariants.md §VI.25](invariants.md).
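
A minimal sketch of how such a check could look, assuming the statement kinds are already known after parsing. `MutationKind` and `check_d2` are hypothetical names, and the error string is abbreviated from the message quoted above.

```rust
/// Hypothetical statement kind for one mutation statement.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MutationKind {
    Insert,
    Update,
    Delete,
}

/// Accepts insert/update-only or delete-only queries; rejects a mix.
pub fn check_d2(name: &str, stmts: &[MutationKind]) -> Result<(), String> {
    let has_delete = stmts.iter().any(|s| *s == MutationKind::Delete);
    let has_write = stmts.iter().any(|s| *s != MutationKind::Delete);
    if has_delete && has_write {
        // Abbreviated form of the documented rejection message.
        return Err(format!(
            "mutation '{}' on the same query mixes inserts/updates and deletes; \
             split into separate mutations: (1) inserts and updates, then (2) deletes.",
            name
        ));
    }
    Ok(())
}
```

Because the check needs only the statement kinds, it can run before any storage access, which matches the "before any I/O" behaviour described above.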

## IR (Intermediate Representation)

`QueryIR { name, params, pipeline: Vec<IROp>, return_exprs, order_by, limit }`

Pipeline operations:

- `NodeScan { variable, type_name, filters }`
- `Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }` — destination filters are pushed *into* the expand so Lance scalar pushdown can prune.
- `Filter { left, op, right }`
- `AntiJoin { outer_var, inner: Vec<IROp> }` — for `not { … }`
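
The anti-join semantics behind `AntiJoin` can be shown on plain in-memory rows: keep every outer row whose key has no match in the inner result. This is only a sketch. The `anti_join` helper and the `(id, payload)` row shape are assumptions, and the real operator runs over the inner pipeline's output instead of a precomputed set.

```rust
use std::collections::HashSet;

/// Keeps the outer rows whose id does NOT appear among the inner matches.
pub fn anti_join<T: Clone>(outer: &[(u32, T)], inner_matches: &HashSet<u32>) -> Vec<(u32, T)> {
    outer
        .iter()
        .filter(|(id, _)| !inner_matches.contains(id))
        .cloned()
        .collect()
}
```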

Lowering:

1. Partition MATCH clauses (bindings, traversals, filters, negations).
2. Identify "deferred" bindings (a destination of a traversal that has filters) so the Expand can carry the filter as a pushdown.
3. Emit NodeScan for the first binding, then Expand operations, then remaining Filter operations, then AntiJoins for negations.
4. Translate RETURN / ORDER expressions; preserve LIMIT.
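
The partition-then-emit order of steps 1 and 3 can be sketched as follows. The `Clause` and `Op` enums are heavily simplified stand-ins for the real AST and `IROp` types. Step 2 (deferred bindings and filter pushdown) and step 4 are left out, and lowering a negation by recursing on its inner clauses is an assumption.

```rust
/// Simplified MATCH clause; the field shapes are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum Clause {
    Binding(String),           // variable name
    Traversal(String, String), // (src var, dst var)
    Filter(String),            // opaque predicate text
    Negation(Vec<Clause>),     // inner clauses of `not { … }`
}

/// Simplified pipeline operation.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    NodeScan(String),
    Expand(String, String),
    Filter(String),
    AntiJoin(Vec<Op>),
}

pub fn lower(clauses: &[Clause]) -> Vec<Op> {
    // Step 1: partition the clauses by kind, preserving source order within each kind.
    let mut bindings = Vec::new();
    let mut traversals = Vec::new();
    let mut filters = Vec::new();
    let mut negations = Vec::new();
    for c in clauses {
        match c {
            Clause::Binding(v) => bindings.push(v.clone()),
            Clause::Traversal(s, d) => traversals.push((s.clone(), d.clone())),
            Clause::Filter(p) => filters.push(p.clone()),
            Clause::Negation(inner) => negations.push(inner.clone()),
        }
    }
    // Step 3: NodeScan for the first binding, then Expands, Filters, AntiJoins.
    let mut ops = Vec::new();
    if let Some(first) = bindings.first() {
        ops.push(Op::NodeScan(first.clone()));
    }
    ops.extend(traversals.into_iter().map(|(s, d)| Op::Expand(s, d)));
    ops.extend(filters.into_iter().map(Op::Filter));
    for inner in negations {
        ops.push(Op::AntiJoin(lower(&inner)));
    }
    ops
}
```

The emitted order depends only on clause kind, so a filter written before its binding in the query still lands after the scan and expands in the pipeline.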

## Linting & validation (`query/lint.rs`)

Codes seen so far:

- **Q000** (Error): parse error
- **L201** (Warning): nullable property never set by any UPDATE — "{type}.{prop} exists in schema but no update query sets it"
- (Warning): mutation declares no params — hardcoded mutations are easy to miss
- Plus all type errors from `typecheck_query_decl()` (undefined types, mismatched operators, undefined edges, etc.)

Output:

```
QueryLintOutput { status, schema_source, query_path,
                  queries_processed, errors, warnings, infos,
                  results: [{ name, kind, status, error?, warnings[] }],
                  findings: [{ severity, code, message, type_name?, property?, query_names[] }] }
```

CLI exits non-zero only on `status = Error`.
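
One way to read the exit rule, sketched under two assumptions: that the overall `status` is the highest severity among the findings, and that the non-zero code is `1`. The `Severity` enum and both functions are illustrative and are not the actual `query/lint.rs` API.

```rust
/// Hypothetical severity levels, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Warning,
    Error,
}

/// Overall status: the most severe finding, or None when there are no findings.
pub fn overall_status(findings: &[Severity]) -> Option<Severity> {
    findings.iter().copied().max()
}

/// Process exit code: non-zero only when the overall status is Error.
pub fn exit_code(findings: &[Severity]) -> i32 {
    if overall_status(findings) == Some(Severity::Error) {
        1
    } else {
        0
    }
}
```

Under this reading, a run that produces only warnings such as L201 still exits with 0.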