mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-12 01:45:14 +02:00
* MR-854: convert engine call sites to &dyn TableStorage; demote legacy methods
Phase 1b: every db.table_store.X(...) call site converts to
db.storage().X(...), reaching the storage layer through the sealed
TableStorage trait (returns &dyn TableStorage). Opaque SnapshotHandle
and StagedHandle replace bare lance::Dataset and Transaction in the
threaded values.
Phase 9: the inherent inline-commit methods on TableStore
(append_batch, merge_insert_batch{,es}, overwrite_batch,
create_btree_index, create_inverted_index) demote from pub to
pub(crate). Their only remaining direct users are table_store.rs
itself and the bulk loader's LoadMode::{Append, Overwrite, Merge}
concurrent fast-paths in loader::write_batch_to_dataset (no
two-phase shape in Lance 4.0.0 — closes after lance#6658 and #6666).
Docs:
- invariants.md \u00a7VI.23: drop "at the writer-trait surface"
qualifier; staged primitives are now the only engine surface.
- runs.md: residual matrix shrinks to delete_where and
create_vector_index (the two upstream-blocked residuals).
- forbidden_apis.rs: replace transitional language with the
current allow-list shape (table_store.rs + loader concurrent
fast-path only).
Files touched:
- changes/mod.rs, db/omnigraph.rs (+export/optimize/schema_apply/
table_ops.rs), exec/{merge,mod,mutation,staging}.rs,
loader/mod.rs, storage_layer.rs, table_store.rs,
tests/forbidden_apis.rs, docs/{invariants,runs}.md.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* MR-854: replace test-only inline-commit append callers with local Lance helpers
After demoting TableStore::append_batch from pub to pub(crate), the
integration tests in tests/recovery.rs and tests/staged_writes.rs
that previously called store.append_batch(...) directly to simulate
HEAD-ahead-of-manifest drift can no longer access the inherent
method. Replace those calls with small in-test helpers that do a raw
Dataset::append (the same body the inherent method runs).
- tests/helpers/mod.rs gains lance_append_inline (shared helper).
- tests/staged_writes.rs gets a file-local lance_append_inline_local
(staged_writes.rs does not import helpers::).
- tests/recovery.rs drops the unused TableStore import in the one
function whose store binding became unused after the conversion.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* MR-854: retrigger CI for flaky Test Workspace job
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* MR-854: convert remaining table_store call sites in export.rs / read_blob
Two leftover `self.table_store.X` / `db.table_store.X` call sites were
missed in the initial sweep — flagged by Devin Review on PR #86. Both
now go through the trait surface:
- `entity_from_snapshot` (db/omnigraph/export.rs): switch from
`db.table_store.open_snapshot_table` + `db.table_store.scan` to
`db.storage().open_snapshot_at_table` + `db.storage().scan`.
- `read_blob` (db/omnigraph.rs): replace
`snapshot.open(table_key)` + `self.table_store.first_row_id_for_filter`
with `self.storage().open_snapshot_at_table` +
`self.storage().first_row_id_for_filter`. The follow-up
`take_blobs` call still needs an `Arc<Dataset>` (it's a Lance blob
accessor not surfaced through the trait), so we hand off via
`SnapshotHandle::into_arc()` with a comment.
After this commit, no engine code outside `table_store.rs` reaches the
inherent `TableStore` API — the docs/runs.md and docs/invariants.md
claim is now uniformly true.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* MR-854: post-rebase doc fixes (Lance 6.0.1, MR-A framing, into_dataset note)
Reviewer feedback on the rebased PR:
* docs/dev/writes.md residuals matrix: drop demoted methods from the trait-surface table (now `pub(crate)`); keep only the two genuine trait-surface residuals (`delete_where`, `create_vector_index`); reframe under MR-A (Lance v7.x bump) per docs/dev/lance.md.
* tests/forbidden_apis.rs: update transitional allow-list header to (a) drop the truncate_table mislabel (truncate_table is a Lance Dataset method, not a TableStore method — overwrite_batch's internal call), (b) reframe trait-surface residuals under MR-A / Lance #6666.
* crates/omnigraph/src/storage_layer.rs::SnapshotHandle::{into_arc, into_dataset}: add single-ref invariant doc — both consume Arc via try_unwrap-or-clone; sibling SnapshotHandle clones across an await point force a deep Dataset clone.
* Replace lance-4.0.0 version refs with lance-6.0.1 in active source/test/dev-doc comments (storage_layer.rs, table_store.rs, table_ops.rs, schema_apply.rs, merge.rs, recovery.rs, staged_writes.rs, consistency.rs, docs/dev/execution.md, docs/user/query-language.md). Historical refs in docs/releases/v0.4.1.md and the canonical "Lance 4.0.0 → 6.0.1 migration" line in docs/dev/lance.md left intact.
No engine code changes.
* MR-854: update docs/dev/invariants.md Storage trait row + gap entry
Reviewer feedback: the docs reorg landed; the invariant row now lives in
docs/dev/invariants.md with stable headings (no more numbered §VI.23).
Update two pieces to reflect MR-854 completion:
* Status table 'Storage trait' row: was 'full call-site migration ... incomplete';
now 'engine call sites all route through db.storage() (MR-854); inline-commit
inherent methods are pub(crate)-demoted; capability/stat surfaces are roadmap'.
* 'Known Gaps' 'Storage abstraction' entry: was 'older inherent TableStore call
sites and inline residuals remain'; now names the closed scope (MR-854 — call
sites migrated, methods demoted, loader fast-paths) and the remaining
trait-surface residuals under MR-A (Lance v7.x bump) and Lance #6666.
Cross-links to docs/dev/lance.md and docs/dev/writes.md so the framing stays
co-located with the canonical Lance surface tracking.
* MR-854: remove dead inline-commit methods from the storage surface
The loader concurrent fast-path (write_batch_to_dataset) is only reached
for LoadMode::Overwrite — Append/Merge route through MutationStaging — so
its Append/Merge arms were unreachable. Collapse it to overwrite-only and
drop the now-unused mode params, which removes the only callers of:
- TableStorage::append_batch + TableStorage::merge_insert_batches (trait)
- TableStore::merge_insert_batch + merge_insert_batches (inherent)
create_btree_index / create_inverted_index had zero callers anywhere
(scalar index builds use the stage_* primitives). Remove both from the
trait and the inherent impl.
Inherent append_batch stays pub(crate): overwrite_batch and recovery
tests use it. Migrate the one trait-append_batch test caller
(seed_person_row) to stage_append + commit_staged. The merge_insert
FirstSeen-workaround rationale moves from the deleted merge_insert_batch
into stage_merge_insert (now the sole merge path). No behavior change.
Also corrects the inaccurate loader residual comment (the prior text
blamed Lance #6658/#6666, which are the delete and vector-index issues,
for keeping overwrite inline; a stage_overwrite primitive already exists
and schema_apply uses it).
* MR-854: seal db.storage() to staged-only; move residuals to InlineCommitResidual
Split the three remaining inline-commit writes (overwrite_batch,
delete_where, create_vector_index) off the TableStorage trait onto a new
sealed InlineCommitResidual trait, reachable only via the explicit
Omnigraph::storage_inline_residual() accessor. db.storage() now exposes
only staged primitives + reads, so engine code cannot couple a write
with a Lance HEAD advance through the default surface — MR-793 acceptance
§1 ("no public method commits as a side effect of writing") now holds by
construction, not by review + naming.
Call sites moved to storage_inline_residual(): loader overwrite
fast-path, the three mutation delete_where paths, the branch-merge
delete, and the vector-index build. Impl bodies are unchanged (same
delegation to the pub(crate) inherent methods); this is a pure surface
reshape with no behavior change.
The residual trait holds two genuinely upstream-blocked methods
(delete_where -> Lance #6658/v7.x, create_vector_index -> Lance #6666)
plus overwrite_batch, kept for the loader's cross-table bulk-overwrite
concurrency until its staged migration lands (tracked follow-up).
* MR-854 docs: describe the staged-only seal; fix stale Lance index URLs
- writes.md / invariants.md / AGENTS.md: the inline-commit residuals now
live on InlineCommitResidual behind db.storage_inline_residual(), so
acceptance §1 holds by construction rather than 'option (b)' per-method
enumeration. Drop the inaccurate 'until Lance exposes
Operation::Overwrite { fragments }' claim (that op exists; stage_overwrite
already builds it) and reframe overwrite_batch as a removable legacy
residual gated on the loader's bulk-overwrite concurrency.
- forbidden_apis.rs: rewrite the allow-list doc for the split surface.
- lance.md: the index spec pages moved from /format/table/index/ to
/format/index/ in Lance 6.x (the old paths 404). Fix all 13 URLs.
* MR-854: fix stale lance-4.0.0 comment refs flagged in review
Addresses greptile (exec/merge.rs) and aaltshuler's stale-version blocker:
update lance-4.0.0 -> 6.0.1 in the comment/doc refs within this PR's
footprint (exec/merge.rs, exec/mutation.rs, docs/dev/writes.md). Also
corrects exec/merge.rs to cite lance#6666 (not #6658) for
build_index_metadata_from_segments — that is the vector-index segment-commit
API; #6658 is the two-phase delete. (Pre-existing 4.0.0 refs in untouched
files like architecture.md/storage.md are main's incomplete migration
cleanup, left out of scope.)
* fix(storage): stage loader overwrites
* fix(storage): stage empty schema rewrites
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Ragnor Comerford <ragnor.comerford@gmail.com>
Co-authored-by: Ragnor Comerford <hello@ragnor.co>
113 lines
6.7 KiB
Markdown
113 lines
6.7 KiB
Markdown
# Query Language (`.gq`)
|
||
|
||
Pest grammar at `crates/omnigraph-compiler/src/query/query.pest`. AST in `query/ast.rs`. Type checker in `query/typecheck.rs`. Lowering in `ir/lower.rs`.
|
||
|
||
## Query declarations
|
||
|
||
```
|
||
query <name>($p1: T1, $p2: T2?, …)
|
||
@description("…") @instruction("…") {
|
||
…
|
||
}
|
||
```
|
||
|
||
Two body shapes:
|
||
|
||
- **Read**: `match { … } return { … } [order { … }] [limit N]`
|
||
- **Mutation**: one or more of `insert | update | delete` statements
|
||
|
||
Param types reuse all schema scalars; trailing `?` makes a param optional. The compiler reserves `$__nanograph_now` for `now()`.
|
||
|
||
## MATCH clauses
|
||
|
||
- **Binding**: `$x: NodeType { prop: <literal | $param | now()>, … }`
|
||
- **Traversal**: `$src EDGE_NAME { min, max? } $dst` — variable-length paths via hop bounds; default 1..1 if bounds omitted.
|
||
- **Filter**: `<expr> <op> <expr>` with operators `>=`, `<=`, `!=`, `>`, `<`, `=`, and string `contains`.
|
||
- **Negation**: `not { clause+ }` — desugars to anti-join over the inner pipeline.
|
||
|
||
## Search clauses (multi-modal)
|
||
|
||
Used inside MATCH or as expressions inside RETURN/ORDER:
|
||
|
||
| Function | Purpose | Underlying Lance facility |
|
||
|---|---|---|
|
||
| `nearest($x.vec, $q)` | k-NN vector search (cosine) | Lance vector index (IVF / HNSW) |
|
||
| `search(field, q)` | Generic FTS | Inverted index |
|
||
| `fuzzy(field, q [, max_edits])` | Levenshtein-tolerant text search | Inverted index |
|
||
| `match_text(field, q)` | Pattern match | Inverted index |
|
||
| `bm25(field, q)` | BM25 scoring | Inverted index |
|
||
| `rrf(rank_a, rank_b [, k])` | Reciprocal Rank Fusion of two rankings (default k=60) | OmniGraph fuses scored rankings |
|
||
|
||
`nearest()` requires a `LIMIT`; the compiler resolves the query vector via the param map (or via the runtime embedding client when bound to a text input).
|
||
|
||
## RETURN clause
|
||
|
||
`return { <expr> [as <alias>], … }` with expressions:
|
||
|
||
- Variable / property access: `$x`, `$x.prop`
|
||
- Literals: string, int, float, bool, list
|
||
- `now()`
|
||
- Aggregates: `count`, `sum`, `avg`, `min`, `max`
|
||
- All search functions above (so you can return a score column)
|
||
- `AliasRef` — re-use a previous projection alias
|
||
|
||
## ORDER & LIMIT
|
||
|
||
- `order { <expr> [asc|desc], … }` — supports plain expressions and `nearest(...)`.
|
||
- `limit <integer>` — required when there is a `nearest(...)` ordering.
|
||
- **Total, deterministic order.** Rows with equal user-sort keys are broken by the bound entities' key columns (`<var>.id`, ascending) appended as a final tie-break, so the result is a *total* order — reproducible across runs, and `order … limit N` returns a deterministic top-N even when ties straddle the cutoff. (Aggregate results have no entity-key columns; their group rows are already distinct on the projected group keys.)
|
||
- **NULL placement** is *nulls-first ascending, nulls-last descending* (i.e. `nulls_first = !descending`): a NULL sorts as if smaller than any value.
|
||
|
||
## Mutation statements
|
||
|
||
- `insert <Type> { prop: <value>, … }`
|
||
- `update <Type> set { prop: <value>, … } where <prop> <op> <value>`
|
||
- `delete <Type> where <prop> <op> <value>`
|
||
|
||
`<value>` is a literal, `$param`, or `now()`. Multi-statement mutations execute atomically (added in v0.2.0).
|
||
|
||
### D₂ — mixed insert/update + delete is rejected at parse time
|
||
|
||
A single mutation query must be **either insert/update-only or delete-only**. Mixed → rejected before any I/O with the message:
|
||
|
||
> `mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).`
|
||
|
||
Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance v6.0.1 has no public two-phase delete). Mixing creates ordering hazards (same-row insert→delete becomes a no-op because the staged insert isn't visible to delete; cascading deletes of just-inserted edges break referential integrity by silent design). Until the MR-A Lance v7 bump migrates `delete_where` to staged (`DeleteBuilder::execute_uncommitted` first ships in `v7.0.0-beta.10`), the parse-time rejection keeps both paths atomic and correct. See [docs/dev/writes.md](../dev/writes.md), [docs/dev/lance.md](../dev/lance.md), and [docs/dev/invariants.md](../dev/invariants.md).
|
||
|
||
## IR (Intermediate Representation)
|
||
|
||
`QueryIR { name, params, pipeline: Vec<IROp>, return_exprs, order_by, limit }`
|
||
|
||
Pipeline operations:
|
||
|
||
- `NodeScan { variable, type_name, filters }`
|
||
- `Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }` — destination filters are pushed *into* the expand so Lance scalar pushdown can prune. Executed one of two ways, chosen per-expand by a cost model over cheap manifest counts (frontier size, |E|, source-vertex count, hops) plus index coverage: selective traversals (small frontier relative to the source set) resolve neighbors from the persisted `src`/`dst` BTREE (one indexed scan per hop); dense / deep / large-frontier traversals — or those whose BTREE coverage is degraded so a full scan would be paid per hop — use the in-memory CSR adjacency index. Both produce identical results. The `OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER` / `OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS` ceilings bound the *initial dispatch* frontier/hops (beyond them CSR is always used); the cost model estimates total indexed work as ~`hops × frontier × fanout` and prices dense fan-out toward CSR — they are not a hard per-hop bound. `OMNIGRAPH_TRAVERSAL_MODE=indexed|csr` forces a mode (see [constants](constants.md)).
|
||
- `Filter { left, op, right }`
|
||
- `AntiJoin { outer_var, inner: Vec<IROp> }` — for `not { … }`
|
||
|
||
Lowering:
|
||
|
||
1. Partition MATCH clauses (bindings, traversals, filters, negations).
|
||
2. Identify "deferred" bindings (a destination of a traversal that has filters) so the Expand can carry the filter as a pushdown.
|
||
3. Emit NodeScan for the first binding, then Expand operations, then remaining Filter operations, then AntiJoins for negations.
|
||
4. Translate RETURN / ORDER expressions; preserve LIMIT.
|
||
|
||
## Linting & validation (`query/lint.rs`)
|
||
|
||
Codes seen so far:
|
||
|
||
- **Q000** (Error): parse error
|
||
- **L201** (Warning): nullable property never set by any UPDATE — "{type}.{prop} exists in schema but no update query sets it"
|
||
- (Warning): mutation declares no params — hardcoded mutations are easy to miss
|
||
- Plus all type errors from `typecheck_query_decl()` (undefined types, mismatched operators, undefined edges, etc.)
|
||
|
||
Output:
|
||
|
||
```
|
||
QueryLintOutput { status, schema_source, query_path,
|
||
queries_processed, errors, warnings, infos,
|
||
results: [{ name, kind, status, error?, warnings[] }],
|
||
findings: [{ severity, code, message, type_name?, property?, query_names[] }] }
|
||
```
|
||
|
||
CLI exits non-zero only on `status = Error`.
|