From a61e82f47acdd49bf75a7d21a5e837e58dfdeee4 Mon Sep 17 00:00:00 2001 From: Ragnor Comerford Date: Fri, 1 May 2026 10:43:19 +0200 Subject: [PATCH] =?UTF-8?q?MR-794=20step=202:=20docs=20=E2=80=94=20runs/in?= =?UTF-8?q?variants/architecture/execution=20+=20cleanup?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Refresh user-facing and agent-facing docs for the staged-write rewire and clean up stale Run-state-machine references that survived MR-771. MR-794-specific updates: * docs/runs.md — remove "Known limitation: mid-query partial failure" section; document the in-memory accumulator + D₂ rule + the LoadMode::Overwrite residual. * docs/invariants.md §VI.25 — flip from aspirational/open to upheld for inserts/updates. Within-query read-your-writes is now load-bearing for the publisher CAS contract. * docs/architecture.md — add "Mutation atomicity — in-memory accumulator (MR-794)" subsection with per-op flow; refresh the engine + state diagrams to drop RunRegistry and add MutationStaging. * docs/execution.md — rewrite the mutation flow sequence diagram for the staged-write path; updated the LoadMode table to call out per-mode commit semantics; rewrote load vs ingest. * docs/query-language.md — document the D₂ parse-time rule. * docs/errors.md — add the D₂ BadRequest rejection path. * docs/testing.md — extend the runs.rs row to cover the new MR-794 contract tests; add the staged_writes.rs row. * docs/releases/v0.4.1.md (new) — release note covering the rewire, test additions, residuals, and files changed. * AGENTS.md (CLAUDE.md symlink) — update the atomic-per-query description and the L2 capability matrix row. Stale-reference cleanup (MR-771 leftovers): * docs/storage.md — drop live _graph_runs.lance / _graph_run_actors.lance from the layout diagram and prose; mark legacy. * docs/branches-commits.md — move __run__ to a legacy note; remove publish_run from the publish-trigger list. * docs/audit.md — refresh _as API list (drop begin_run_as / publish_run_as); legacy RunRecord.actor_id moved to a historical note. * docs/constants.md — mark run registry / branch-prefix rows as legacy. * docs/cli.md — replace the legacy omnigraph run * quickstart block with omnigraph commit list/show. Co-Authored-By: Claude Opus 4.7 (1M context) --- AGENTS.md | 4 +- docs/architecture.md | 53 +++++++++++++-- docs/audit.md | 5 +- docs/branches-commits.md | 4 +- docs/cli.md | 8 +-- docs/constants.md | 4 +- docs/errors.md | 1 + docs/execution.md | 109 ++++++++++++++++--------------- docs/invariants.md | 2 +- docs/query-language.md | 8 +++ docs/releases/v0.4.1.md | 138 +++++++++++++++++++++++++++++++++++++++ docs/runs.md | 118 ++++++++++++++++++++------------- docs/storage.md | 6 +- docs/testing.md | 3 +- 14 files changed, 342 insertions(+), 121 deletions(-) create mode 100644 docs/releases/v0.4.1.md diff --git a/AGENTS.md b/AGENTS.md index c851128..d347343 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -32,7 +32,7 @@ OmniGraph is a typed property-graph engine built as a coordination layer over ma - **Languages**: a `.pg` schema language and a `.gq` query language, both Pest-based, with a typed IR. - **Multi-modal querying**: vector ANN (`nearest`), full-text (`search`/`fuzzy`/`match_text`/`bm25`), Reciprocal Rank Fusion (`rrf`), and graph traversal (`Expand`, anti-join `not { … }`) in one runtime. - **Branches and commits across the whole graph**: Git-style — every successful publish appends to a commit DAG; merges are three-way at the row level. -- **Atomic per-query writes**: `mutate_as` and `load` capture per-table `expected_table_versions` before writing and call `ManifestBatchPublisher::publish` once at the end. Cross-table OCC enforced inside the publisher's row-level CAS; no staging branches, no run state machine. +- **Atomic per-query writes**: `mutate_as` and `load` accumulate insert/update batches into an in-memory `MutationStaging.pending` per touched table; one `stage_*` + `commit_staged` per table runs at end-of-query, then `ManifestBatchPublisher::publish` commits the manifest atomically with per-table `expected_table_versions` CAS. A mid-query failure leaves Lance HEAD untouched on staged tables — no drift, no run state machine, no staging branches. Deletes still inline-commit; D₂ at parse time prevents inserts/updates and deletes from coexisting in one query. - **HTTP server**: Axum + utoipa OpenAPI, bearer auth (SHA-256 hashed, optional AWS Secrets Manager), Cedar policy gating. - **CLI** driven by a single `omnigraph.yaml`; multi-format output (json/jsonl/csv/kv/table). @@ -211,7 +211,7 @@ omnigraph policy explain --actor act-alice --action change --branch main | Query language | — | `.gq` + Pest grammar + IR + lowering + linter | | Schema migration planning | — | `plan_schema_migration` + `apply_schema` step types + `__schema_apply_lock__` | | Commit graph (DAG) across whole repo | — | `_graph_commits.lance` with linear + merge parents, ULID ids, actor map | -| Per-query atomic writes | — | `MutationStaging` accumulator + `commit_with_expected` publisher CAS, single commit per `mutate_as` / `load` | +| Per-query atomic writes | — | In-memory `MutationStaging.pending` accumulator + `stage_*` / `commit_staged` per touched table at end-of-query + publisher CAS via `commit_with_expected` (single manifest commit per `mutate_as` / `load`); D₂ parse-time rule keeps inserts/updates and deletes from mixing | | Three-way row-level merge | — | `OrderedTableCursor` + `StagedTableWriter`, structured `MergeConflictKind` | | Change feeds | — | `diff_between` / `diff_commits` with manifest fast path + ID streaming | | Cedar policy | — | 8 actions, branch / target_branch / protected scopes, validate/test/explain CLI | diff --git a/docs/architecture.md b/docs/architecture.md index 5bd252e..0357b5d 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -63,7 +63,7 @@ flowchart TB subgraph engine[omnigraph engine] plan[exec query and mutation]:::l2 gi[graph index CSR/CSC
RuntimeCache LRU 8]:::l2 - coord[coordinator
ManifestRepo · CommitGraph · RunRegistry]:::l2 + coord[coordinator
ManifestRepo · CommitGraph]:::l2 end subgraph storage[storage trait — wraps Lance] @@ -134,7 +134,7 @@ flowchart TB coord[GraphCoordinator]:::l2 mr[ManifestRepo
db/manifest.rs]:::l2 cg[CommitGraph
_graph_commits.lance]:::l2 - rr[RunRegistry
_graph_runs.lance]:::l2 + stg[MutationStaging
per-query in-memory accumulator
exec/staging.rs]:::l2 end subgraph idx[graph index] @@ -149,17 +149,18 @@ flowchart TB eq --> gi eq --> ts + em --> stg em --> ts + ld --> stg ld --> ts eq --> mr em --> mr coord --> mr coord --> cg - coord --> rr ts --> st ``` -The engine binds the compiler IR to Lance. It owns multi-dataset coordination, the graph topology index, the run registry, and the snapshot/manifest read path. +The engine binds the compiler IR to Lance. It owns multi-dataset coordination, the graph topology index, the per-query staging accumulator, and the snapshot/manifest read path. Code paths: @@ -169,6 +170,46 @@ Code paths: - Graph index: `crates/omnigraph/src/graph_index/` - Loader: `Omnigraph::ingest` at `crates/omnigraph/src/loader/mod.rs:74` +### Mutation atomicity — in-memory accumulator (MR-794) + +Inserts and updates inside `mutate_as` and the bulk loader's +Append/Merge modes go through `MutationStaging` +([`crates/omnigraph/src/exec/staging.rs`](../crates/omnigraph/src/exec/staging.rs)), +a per-query in-memory accumulator. No Lance HEAD advance happens during +op execution; one `stage_*` + `commit_staged` per touched table runs +at end-of-query, then the publisher commits the manifest atomically. + +``` +op-1 (insert/update) → push RecordBatch → MutationStaging.pending[table] +op-2 (insert/update) → read committed via Lance + pending via DataFusion + MemTable (read-your-writes) → push batch +op-N → push batch +─── end of query ─────────────────────────────────────── +finalize: per pending table: + concat batches → stage_append OR stage_merge_insert → commit_staged +publisher: ManifestBatchPublisher::publish (one cross-table CAS) +``` + +A failed op leaves Lance HEAD untouched on the staged tables: the next +mutation proceeds normally with no drift to reconcile. Concrete +contracts: + +- `D₂` parse-time rule: a query is either insert/update-only or + delete-only. Mixed → reject. Deletes still inline-commit (Lance + 4.0.0 has no public two-phase delete); D₂ keeps the inline path safe. +- `LoadMode::Overwrite` keeps the inline-commit path + (truncate-then-append doesn't fit the staged shape; overwrite has no + in-flight read-your-writes requirement). +- Read sites consume `TableStore::scan_with_pending`, which Lance-scans + the committed snapshot at the captured `expected_version` and unions + with a DataFusion `MemTable` over the pending batches. + +This pattern realizes [docs/invariants.md §VI.25](invariants.md) +(read-your-writes within a multi-statement mutation) and §VI.32 +(failure scope bounded) for inserts/updates by construction at the +writer layer. See [docs/runs.md](runs.md) for the publisher CAS +contract this builds on. + ### Storage trait — today vs. roadmap ```mermaid @@ -256,13 +297,13 @@ Throughout the docs, capabilities are split into: - **MVCC**: every Lance write bumps a per-dataset version; the OmniGraph manifest version coordinates which sub-table versions are visible together. - **Snapshot isolation**: a query holds one `Snapshot` for its lifetime; concurrent writes don't leak in. - **Cross-branch isolation**: copy-on-write means readers and writers on different branches don't block each other. -- **Run isolation**: each transactional run lives on its own `__run__` branch. +- **Per-query staging**: `mutate_as` and `load` (Append/Merge) accumulate insert/update batches in an in-memory `MutationStaging`; one `stage_*` + `commit_staged` per touched table runs at end-of-query, then the publisher commits the manifest atomically. A mid-query failure leaves Lance HEAD untouched on staged tables. (MR-794; pre-v0.4.0 used a `__run__` staging branch + Run state machine, removed in MR-771.) - **Schema-apply lock**: `__schema_apply_lock__` system branch serializes schema migrations. - **Fail-points** (`failpoints` cargo feature): `failpoints::maybe_fail("operation.step")?` in `branch_create`, publish, etc., for deterministic failure injection in tests. ## Workspace crates - `omnigraph-compiler` — schema and query grammars, catalog, IR, lowering, type checker, lint, migration planner, OpenAI-style embedding client. -- `omnigraph` (engine, published as `omnigraph-engine` on crates.io since v0.2.2) — the Lance-backed runtime: manifest, commit graph, run registry, snapshot, exec, merge, loader, Gemini embedding client. +- `omnigraph` (engine, published as `omnigraph-engine` on crates.io since v0.2.2) — the Lance-backed runtime: manifest, commit graph, snapshot, exec (incl. per-query `MutationStaging` accumulator), merge, loader, Gemini embedding client. - `omnigraph-cli` — the `omnigraph` binary. - `omnigraph-server` — the `omnigraph-server` binary (Axum HTTP server). diff --git a/docs/audit.md b/docs/audit.md index 591c016..80ac137 100644 --- a/docs/audit.md +++ b/docs/audit.md @@ -1,6 +1,7 @@ # Audit / Actor tracking - `Omnigraph::audit_actor_id: Option` is the actor in effect. -- `_as` variants of every write API let callers override the actor: `begin_run_as`, `publish_run_as`, `ingest_as`, `mutate_as`, `branch_merge_as`, etc. -- Actor IDs are persisted both on `RunRecord.actor_id` and on `GraphCommit.actor_id`, with optional split storage in `_graph_commit_actors.lance` and `_graph_run_actors.lance`. +- `_as` variants of every write API let callers override the actor: `mutate_as`, `ingest_as`, `branch_merge_as`, `apply_schema_as`, etc. +- Actor IDs are persisted on `GraphCommit.actor_id` with split storage in `_graph_commit_actors.lance` (the commit graph is split into `_graph_commits.lance` for the linkage and `_graph_commit_actors.lance` for the actor map). - HTTP server uses the bearer-token actor automatically; CLI uses the local user / explicit env (no implicit actor). +- Pre-v0.4.0 repos also stored actor IDs on `RunRecord.actor_id` in `_graph_runs.lance` / `_graph_run_actors.lance`. The Run state machine was removed in MR-771; those files are inert post-v0.4.0 and reclaimed by MR-770's production sweep. diff --git a/docs/branches-commits.md b/docs/branches-commits.md index 2e8cca8..4501822 100644 --- a/docs/branches-commits.md +++ b/docs/branches-commits.md @@ -37,7 +37,7 @@ Storage is split across two Lance datasets (both with stable row IDs): Notes: -- Every successful publish (load / change / merge / schema_apply / publish_run) appends one commit. +- Every successful publish (load / change / merge / schema_apply) appends one commit. - Merge commits have two parents; linear commits have one. - API: `list_commits(branch)`, `get_commit(id)`, `head_commit_id_for_branch(branch)`. @@ -53,5 +53,5 @@ Notes: Filtered from `branch_list()` but visible to internals: -- `__run__` — ephemeral isolation branch for a transactional run. - `__schema_apply_lock__` — serializes schema migrations. +- `__run__` — legacy from the pre-v0.4.0 Run state machine (removed in MR-771). The branch-name guard predicate `is_internal_run_branch` is kept as defense-in-depth so users cannot create a branch matching the legacy prefix; the filter will be removed once production legacy branches are swept (MR-770). diff --git a/docs/cli.md b/docs/cli.md index 0634bea..ae8c152 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -56,12 +56,12 @@ omnigraph policy validate --config ./omnigraph.yaml omnigraph policy test --config ./omnigraph.yaml omnigraph policy explain --config ./omnigraph.yaml --actor act-alice --action read --branch main -omnigraph run list ./repo.omni --json -omnigraph run show --uri ./repo.omni --json -omnigraph run publish --uri ./repo.omni --json -omnigraph run abort --uri ./repo.omni --json +omnigraph commit list ./repo.omni --json +omnigraph commit show --uri ./repo.omni --json ``` +(The legacy `omnigraph run list/show/publish/abort` subcommands were removed in MR-771; mutations and loads publish atomically and the commit graph (`omnigraph commit list`) is the audit surface.) + `query lint` and `query check` are the same command surface. In v1, repo-backed lint uses local or `s3://` repo URIs; HTTP targets are only supported when you also pass `--schema`. diff --git a/docs/constants.md b/docs/constants.md index 198b77e..527aaea 100644 --- a/docs/constants.md +++ b/docs/constants.md @@ -4,8 +4,8 @@ |---|---|---| | `MANIFEST_DIR` | `__manifest` | `db/manifest/layout.rs` | | Commit graph dir | `_graph_commits.lance` | `db/commit_graph.rs` | -| Run registry dir | `_graph_runs.lance` | `db/run_registry.rs` | -| Run branch prefix | `__run__` | `db/run_registry.rs` | +| Run registry dir (legacy, removed MR-771) | `_graph_runs.lance` | inert post-v0.4.0; reclaimed by MR-770 | +| Run branch prefix (legacy, removed MR-771) | `__run__` | filtered by `is_internal_run_branch` defense-in-depth | | Schema apply lock | `__schema_apply_lock__` | `db/mod.rs` | | Manifest publisher retry budget | `PUBLISHER_RETRY_BUDGET = 5` | `db/manifest/publisher.rs` | | Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` | `db/manifest/migrations.rs` | diff --git a/docs/errors.md b/docs/errors.md index 257ae4c..ad79e66 100644 --- a/docs/errors.md +++ b/docs/errors.md @@ -9,6 +9,7 @@ - `Manifest(ManifestError { kind: BadRequest|NotFound|Conflict|Internal, details: Option, … })` - `ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected, actual }` — caller's `expected_table_versions` did not match the manifest's current latest non-tombstoned version (set by `OmniError::manifest_expected_version_mismatch`). - `ManifestConflictDetails::RowLevelCasContention` — Lance row-level CAS rejected the publish because a concurrent writer landed the same `object_id`. Retried internally by the publisher; only surfaces if the retry budget exhausts. + - **D₂ parse-time rejection** (MR-794): a single mutation query that mixes inserts/updates with deletes errors out *before any I/O* with kind `BadRequest`. Message: `mutation '' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes`. See [docs/query-language.md](query-language.md) for the rule and [docs/runs.md](runs.md) for the underlying staged-write rationale. - `MergeConflicts(Vec)` Compiler-side `NanoError` covers parse / catalog / type / storage / plan / execution / arrow / lance / IO / manifest / unique-constraint, each with structured spans (`SourceSpan { start, end }`) for ariadne-style diagnostics. diff --git a/docs/execution.md b/docs/execution.md index 6bf55a5..bd4842c 100644 --- a/docs/execution.md +++ b/docs/execution.md @@ -79,13 +79,16 @@ Hybrid example: `order { rrf(nearest($d.embedding, $q), bm25($d.body, $q_text)) ## Mutation execution (`exec/mutation.rs`) -Resolves expression values to literals, converts to typed Arrow arrays (`literal_to_typed_array(lit, DataType, num_rows)`), then writes: +Resolves expression values to literals, converts to typed Arrow arrays (`literal_to_typed_array(lit, DataType, num_rows)`), then writes via Lance's two-phase distributed-write API at end-of-query: -- `insert` → Lance `WriteMode::Append` -- `update` → Lance `merge_insert(WhenMatched::Update)` -- `delete` → Lance `merge_insert(WhenMatched::Delete)` (logical) or filtered overwrite. +- `insert` (no `@key`, edges) → accumulate into `MutationStaging.pending` (Append mode); finalize calls `stage_append` once per touched table. +- `insert` (`@key` node) → accumulate into `pending` (Merge mode); finalize calls `stage_merge_insert` once per touched table. +- `update` → scan committed via Lance + pending via DataFusion `MemTable` (read-your-writes), apply assignments, accumulate into `pending` (Merge mode). +- `delete` → still inline-commits via `delete_where` (Lance 4.0.0 has no public two-phase delete); recorded into `MutationStaging.inline_committed`. -Multi-statement mutations are atomic at the manifest commit boundary. +**D₂ parse-time rule.** A single mutation query is either insert/update-only or delete-only. Mixed → reject before any I/O. The check fires in `enforce_no_mixed_destructive_constructive(&ir)` inside `execute_named_mutation`. + +Multi-statement mutations are atomic at the publisher commit boundary: every insert/update batch lives in memory until end-of-query, then exactly one `stage_*` + `commit_staged` runs per touched table, then `ManifestBatchPublisher::publish` commits the manifest atomically with per-table `expected_table_versions` CAS. ### Mutation flow — sequence @@ -93,57 +96,58 @@ Multi-statement mutations are atomic at the manifest commit boundary. sequenceDiagram autonumber participant client as Client - participant og as Omnigraph::mutate
(mutation.rs:511) + participant og as Omnigraph::mutate_as
(mutation.rs) participant cmp as omnigraph-compiler - participant runs as RunRegistry + participant stg as MutationStaging
(exec/staging.rs) participant ts as table_store participant lance as Lance dataset - participant mr as ManifestRepo
(manifest.rs:280) + participant pub as ManifestBatchPublisher - client->>og: mutate(target, source, name, params) - og->>cmp: parse + typecheck_query - cmp-->>og: CheckedQuery (Mutation IR) - og->>runs: begin_run(target, op_hash)
fork __run__ from target head - runs-->>og: RunRecord - loop for each mutation statement (on __run__) - og->>og: resolve expression literals
literal_to_typed_array(lit, type, n) - alt insert - og->>ts: append RecordBatches - ts->>lance: WriteMode::Append → new fragment(s) - else update - og->>ts: merge_insert keyed by id - ts->>lance: merge_insert(WhenMatched::Update) - else delete - og->>ts: merge_insert with delete predicate - ts->>lance: merge_insert(WhenMatched::Delete) + client->>og: mutate_as(branch, source, name, params, actor_id) + og->>cmp: parse + typecheck + lower_mutation_query + cmp-->>og: MutationIR + og->>og: enforce_no_mixed_destructive_constructive (D₂) + loop for each mutation op + og->>og: resolve literals + build batch + alt insert / update (accumulate) + og->>ts: open dataset @ pre-write version (first touch) + og->>stg: ensure_path + append_batch (PendingMode) + opt update — scan committed + pending + og->>ts: scan_with_pending (Lance + DataFusion MemTable union) + ts-->>og: matched batches + end + else delete (inline-commit, D₂ keeps separate) + og->>ts: delete_where (advances Lance HEAD) + og->>stg: record_inline (SubTableUpdate) end - lance-->>ts: new dataset version - og->>mr: commit_updates(SubTableUpdate)
per-statement commit on __run__ - mr-->>og: ack end - og->>og: OCC: target head unchanged since begin_run? - og->>og: publish_run(run_id) - alt fast path (target hasn't moved) - og->>mr: commit_updates_on_branch(target, updates)
promote run snapshot - else merge path (target advanced) - og->>og: branch_merge_internal(__run__, target)
three-way merge + og->>stg: finalize(db, branch) + loop per pending table + stg->>ts: stage_append OR stage_merge_insert (one per table) + ts-->>stg: StagedWrite (transaction + fragments) + stg->>ts: commit_staged (advances Lance HEAD) + ts-->>stg: new Dataset end - mr-->>og: new target snapshot - og->>runs: terminate_run(Published) + stg-->>og: (updates: Vec, expected_versions) + og->>pub: commit_updates_on_branch_with_expected + pub->>pub: publisher CAS (cross-table OCC on __manifest) + pub-->>og: new manifest version og-->>client: MutationResult ``` **Code paths:** -- Entry: `Omnigraph::mutate` at `crates/omnigraph/src/exec/mutation.rs:511` -- Per-mutation orchestration: `mutate_with_current_actor` at `crates/omnigraph/src/exec/mutation.rs:539` -- Per-statement commit on the run-branch: `commit_updates` (called from `execute_insert` / `execute_update` / `execute_delete` in `crates/omnigraph/src/exec/mutation.rs`) -- Run publish: `Omnigraph::publish_run` at `crates/omnigraph/src/db/omnigraph.rs:858` -- Manifest commit primitive: `ManifestRepo::commit` at `crates/omnigraph/src/db/manifest.rs:280` (called from both per-statement `commit_updates` and the publish path) +- Entry: `Omnigraph::mutate_as` at `crates/omnigraph/src/exec/mutation.rs` +- Per-mutation orchestration: `mutate_with_current_actor` at `crates/omnigraph/src/exec/mutation.rs` +- D₂ check: `enforce_no_mixed_destructive_constructive` (in the same file) +- Per-op execution: `execute_insert`, `execute_update`, `execute_delete_node`, `execute_delete_edge` +- Pending-aware reads: `TableStore::scan_with_pending` / `count_rows_with_pending` at `crates/omnigraph/src/table_store.rs` +- Edge cardinality with pending: `validate_edge_cardinality_with_pending` at `crates/omnigraph/src/exec/mutation.rs` +- Per-query accumulator: `crates/omnigraph/src/exec/staging.rs` (`MutationStaging`, `PendingTable`, `PendingMode`, `finalize`) +- End-of-query Lance commit: `TableStore::stage_append`, `stage_merge_insert`, `commit_staged` at `crates/omnigraph/src/table_store.rs` +- Manifest commit primitive: `commit_updates_on_branch_with_expected` at `crates/omnigraph/src/db/omnigraph/table_ops.rs` -Multi-statement mutations don't get atomicity from a single final `commit` — they get it from the **run-branch + publish_run** pattern. By default a mutation forks a fresh `__run__` branch (`begin_run`); each statement individually commits its sub-table updates to that run-branch. After all statements complete, an OCC pre-check verifies the target hasn't moved since the run started, then `publish_run` atomically promotes the run-branch into the target — either via the fast path (direct promotion if the target hasn't moved) or a three-way merge. That final publish is what gives multi-statement mutations their atomicity guarantee (per [`docs/invariants.md`](invariants.md) §VI.26). If anything fails mid-run, the run is failed and the run-branch is dropped without affecting the target. - -One exception: if the caller already targets a `__run__` branch (mutation.rs:555), the mutation runs directly on that branch with no nested run wrapping — the assumption is the caller is managing the surrounding run lifecycle themselves. See [runs.md](runs.md) for the full run lifecycle. +Atomicity guarantee for multi-statement mutations: a mid-query failure leaves Lance HEAD untouched on staged tables (no inline commit happened during op execution), so the next mutation proceeds normally with no `ExpectedVersionMismatch`. The publisher CAS at the very end either succeeds (manifest advances atomically across all touched sub-tables) or fails with a typed `ManifestConflictDetails::ExpectedVersionMismatch` (no partial publish). See [docs/invariants.md §VI.25 / §VI.32](invariants.md) and [docs/runs.md](runs.md). ## Bulk loader (`loader/mod.rs`) @@ -156,19 +160,18 @@ One exception: if the caller already targets a `__run__` branch (mutation.rs ## Load modes (`LoadMode`) -| Mode | Semantics | -|---|---| -| `Overwrite` | Replace all data in the target tables on the branch | -| `Append` | Strict insert; duplicates error | -| `Merge` | Upsert by id (`merge_insert`) | +| Mode | Semantics | Path (post-MR-794) | +|---|---|---| +| `Overwrite` | Replace all data in the target tables on the branch | Inline-commit per type, then publisher CAS at end-of-load. Truncate-then-append doesn't fit the staged shape; documented residual. | +| `Append` | Strict insert; duplicates error | In-memory `MutationStaging` accumulator; one `stage_append` + `commit_staged` per touched table at end-of-load; publisher CAS. | +| `Merge` | Upsert by `id` (`merge_insert`) | Same accumulator; one `stage_merge_insert` per touched table at end-of-load (Merge mode dedupes by `id`, last-write-wins); publisher CAS. | + +For Append/Merge, a mid-load failure (RI / cardinality violation, validation error) leaves Lance HEAD untouched on the staged tables — the next load on the same tables proceeds normally with no `ExpectedVersionMismatch`. For Overwrite, a mid-load failure can still leave Lance HEAD on a partially-truncated table; the next overwrite replaces it. ## `load` vs `ingest` -- `load(branch, data, mode)` — direct load to a branch. -- `ingest(branch, from, data, mode)` — branch-creating, transactional load: - 1. If target advanced since the run started, fork a fresh run branch from `from`. - 2. Load into the run branch (Append). - 3. If target hasn't moved, fast-publish; otherwise abort. +- `load(branch, data, mode)` — direct load to a branch (single publisher commit per call). +- `ingest(branch, from, data, mode)` — branch-creating wrapper: if `branch` doesn't exist, fork it from `from` (default `main`) via `branch_create_from`, then call `load(branch, data, mode)`. - Returns `IngestResult { branch, base_branch, branch_created, mode, tables[] }`. - `ingest_as(actor_id)` records the actor on the resulting commit. diff --git a/docs/invariants.md b/docs/invariants.md index 96a99b4..3e3af86 100644 --- a/docs/invariants.md +++ b/docs/invariants.md @@ -110,7 +110,7 @@ Specific defaults (timeout values, memory caps, TTL windows) are *configuration* *Status: aspirational — referential integrity at scale requires SIP-backed cross-table validation; not yet implemented. Cross-batch / cross-version uniqueness tracked in MR-714.* 25. **Isolation: per-query snapshot; read-your-writes within and across queries in a session.** Each query reads from one consistent manifest version. Within a multi-statement mutation, the read subplan inside each write operator sees the writes from earlier statements. Across queries in a session, reads always resolve the latest manifest version — no reader pinning to older snapshots. - *Status: open — read-your-writes within a multi-statement mutation requires Kuzu-style local-uncommitted scan path; deferred per MR-737 §10.10.* + *Status: upheld for inserts/updates after MR-794 step 2+ — `MutationStaging`'s in-memory accumulator + `TableStore::scan_with_pending` (DataFusion `MemTable` union with the committed Lance scan) implements read-your-writes within a multi-statement mutation. Delete-touching mutations are limited to delete-only by parse-time D₂; closing the within-query RYW gap for deletes requires Lance's two-phase delete API (tracked: MR-793 / Lance-upstream).* 26. **Durability before acknowledgement.** Commit returns only after the substrate has confirmed durable persistence. No "fast" or "fire-and-forget" durability levels. diff --git a/docs/query-language.md b/docs/query-language.md index f90fe60..5c98959 100644 --- a/docs/query-language.md +++ b/docs/query-language.md @@ -64,6 +64,14 @@ Used inside MATCH or as expressions inside RETURN/ORDER: `` is a literal, `$param`, or `now()`. Multi-statement mutations execute atomically (added in v0.2.0). +### D₂ — mixed insert/update + delete is rejected at parse time + +A single mutation query must be **either insert/update-only or delete-only**. Mixed → rejected before any I/O with the message: + +> `mutation '' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).` + +Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance 4.0.0 has no public two-phase delete). Mixing creates ordering hazards (same-row insert→delete becomes a no-op because the staged insert isn't visible to delete; cascading deletes of just-inserted edges break referential integrity by silent design). Until Lance exposes `DeleteJob::execute_uncommitted`, the parse-time rejection keeps both paths atomic and correct. See [docs/runs.md](runs.md) and [docs/invariants.md §VI.25](invariants.md). + ## IR (Intermediate Representation) `QueryIR { name, params, pipeline: Vec, return_exprs, order_by, limit }` diff --git a/docs/releases/v0.4.1.md b/docs/releases/v0.4.1.md new file mode 100644 index 0000000..4c5d44a --- /dev/null +++ b/docs/releases/v0.4.1.md @@ -0,0 +1,138 @@ +# Omnigraph v0.4.1 + +Omnigraph v0.4.1 closes the multi-statement-mutation atomicity gap that +v0.4.0 documented as a known limitation. Inserts and updates now route +through an in-memory `MutationStaging` accumulator and commit via Lance's +two-phase distributed-write API at end-of-query. A failed mid-query op +no longer leaves Lance HEAD drifted on the touched table — the next +mutation proceeds normally. + +## Highlights + +- **Staged-write rewire (MR-794)**: `mutate_as` and `load` (Append / + Merge modes) accumulate insert/update batches into + `MutationStaging.pending` per touched table. No Lance HEAD advance + happens during op execution; one `stage_*` + `commit_staged` per + table runs at end-of-query, then `ManifestBatchPublisher::publish` + commits the manifest atomically. A mid-query failure leaves Lance + HEAD untouched on staged tables. +- **D₂ parse-time rule**: a single mutation query is either + insert/update-only or delete-only. Mixed → rejected with a clear + error directing the caller to split into two queries. Lance 4.0.0 + has no public two-phase delete; deletes still inline-commit, and D₂ + keeps that path safe. +- **Read-your-writes via DataFusion `MemTable`**: read sites in + multi-statement mutations consume `TableStore::scan_with_pending`, + which Lance-scans the committed snapshot at the captured + `expected_version` and unions with a DataFusion `MemTable` over the + pending batches. Replaces the previous "reopen at staged Lance + version" pattern. +- **Coordinator swap-restore eliminated** from `mutate_with_current_actor`. + Branch is threaded explicitly through the per-op execution path + (`execute_named_mutation`, `execute_insert`, `execute_update`, + `execute_delete*`, `validate_edge_insert_endpoints`, + `ensure_node_id_exists`). The `swap_coordinator_for_branch` / + `restore_coordinator` API and `CoordinatorRestoreGuard` are removed + from `mutation.rs`. (`merge.rs` keeps its own swap pattern; that's + a separate workflow tracked in MR-793.) +- **`docs/invariants.md` §VI.25** flips from `aspirational/open` to + `upheld for inserts/updates`. The within-query read-your-writes + guarantee is now load-bearing for the publisher CAS contract. + +## Behavior changes + +- A failed multi-statement mutation no longer surfaces + `ExpectedVersionMismatch` on the *next* mutation against the same + table. The next call proceeds normally — Lance HEAD on staged + tables is unchanged. +- Mixed insert/update + delete in one query is rejected at parse + time. Existing test queries that mixed both must be split. +- `MutationStaging`'s shape changed: `pending: HashMap` + + `inline_committed: HashMap` replaces the + previous `latest: HashMap`. This is an internal + type; no public API impact. + +## Residual / out of scope + +- **`LoadMode::Overwrite`** keeps the legacy inline-commit path + (truncate-then-append doesn't fit the staged shape). A mid-overwrite + failure can still drift Lance HEAD on a partially-truncated table; + the next overwrite replaces it. Operator-driven, rare. +- **Delete-only multi-statement mutations** still inline-commit per op. + D₂ keeps inserts/updates from coexisting with deletes, so the + inline path remains atomic per op but not per query for delete-only + cascades. Closing this requires Lance to expose + `DeleteJob::execute_uncommitted`; tracked in MR-793 / Lance-upstream. +- **`schema_apply`, `branch_merge_internal`, `ensure_indices`** still + use Lance's inline-commit APIs. The two-phase pattern is in + `mutate_as` and `load` only; hoisting it to a storage-trait + invariant covering all writers is MR-793. + +## Tests added + +- `tests/runs.rs::partial_failure_leaves_target_queryable_and_unblocks_next_mutation` + (replaces the old `partial_failure_observably_rolls_back_but_blocks_next_mutation_on_same_table`) +- `tests/runs.rs::mutation_rejects_mixed_insert_and_delete_at_parse_time` +- `tests/runs.rs::mixed_insert_and_update_on_same_person_coalesces_to_one_merge` +- `tests/runs.rs::multiple_appends_to_same_edge_coalesce_to_one_append` +- `tests/runs.rs::multi_statement_inserts_publish_exactly_once` +- `tests/runs.rs::load_with_bad_edge_reference_unblocks_next_load` +- `tests/runs.rs::load_with_cardinality_violation_unblocks_next_load` + +## Files changed + +- `crates/omnigraph/src/exec/staging.rs` (NEW) — `MutationStaging`, + `PendingTable`, `PendingMode`, `StagedTablePath`, + `dedupe_merge_batches_by_id`. +- `crates/omnigraph/src/exec/mutation.rs` — D₂ check; per-op + rewires (`execute_insert`, `execute_update`, `execute_delete*`); + branch threading; coordinator-swap removal; helper + `validate_edge_cardinality_with_pending`; helper + `concat_match_batches_to_schema`; `apply_assignments` updated to + copy unassigned blob columns from full-schema scans. +- `crates/omnigraph/src/loader/mod.rs` — `load_jsonl_reader` split: + staged path for Append/Merge, legacy inline-commit path for + Overwrite. Helpers `collect_node_ids_with_pending` and + `validate_edge_cardinality_with_pending_loader`. +- `crates/omnigraph/src/table_store.rs` — `scan_with_pending`, + `count_rows_with_pending` (DataFusion `MemTable`-backed union with + Lance scan). +- `Cargo.toml` (workspace) + `crates/omnigraph/Cargo.toml` — added + `datafusion = "52"` direct dep (transitively pulled by Lance + already; required for `MemTable`). +- `docs/runs.md` — removed "Known limitation" section; documented + the new accumulator + D₂ + LoadMode::Overwrite residual. +- `docs/invariants.md` — §VI.25 status flipped to `upheld for + inserts/updates`. +- `docs/architecture.md` — added "Mutation atomicity — in-memory + accumulator (MR-794)" subsection; refreshed the engine + state + diagrams to drop `RunRegistry` and add `MutationStaging`. +- `docs/execution.md` — rewrote the mutation flow sequence diagram + for the staged-write path; updated the `LoadMode` table to call + out per-mode commit semantics; rewrote `load` vs `ingest`. +- `docs/query-language.md` — documented the D₂ parse-time rule. +- `docs/errors.md` — added the D₂ `BadRequest` rejection path. +- `docs/storage.md` — dropped the live `_graph_runs.lance` reference + (legacy from MR-771) from the layout diagram and prose. +- `docs/branches-commits.md` — moved `__run__` to a legacy note; + removed `publish_run` from the publish-trigger list. +- `docs/audit.md` — current `_as` API list refreshed; legacy + `RunRecord.actor_id` moved to a historical note. +- `docs/constants.md` — marked the run registry / branch-prefix rows + as legacy. +- `docs/cli.md` — replaced the legacy `omnigraph run *` quickstart + block with `omnigraph commit list/show`. +- `docs/testing.md` — extended the `runs.rs` row to cover the new + MR-794 contract tests; added the `staged_writes.rs` row. +- `AGENTS.md` (CLAUDE.md symlink) — updated the atomic-per-query + description and the L2 capability matrix row. + +## Included Changes + +- MR-794 step 2+ — rewire `mutate_as` and `load` via in-memory + `MutationStaging` + `stage_*` / `commit_staged` per touched table at + end-of-query. +- (MR-794 step 1 shipped in v0.4.0's PR #67 — `StagedWrite`, + `stage_append`, `stage_merge_insert`, `commit_staged`, + `scan_with_staged`, `count_rows_with_staged` — and is the substrate + this release builds on.) diff --git a/docs/runs.md b/docs/runs.md index 789d91d..63295fd 100644 --- a/docs/runs.md +++ b/docs/runs.md @@ -20,22 +20,60 @@ publisher's row-level CAS on `__manifest` is the single fence. A `.gq` query with multiple ops (e.g. `insert Person … insert Knows …`) must observe earlier ops' writes when validating later ops (referential -integrity, edge cardinality). After demotion this is implemented via an -in-process `MutationStaging` accumulator in -`crates/omnigraph/src/exec/mutation.rs`: +integrity, edge cardinality). After MR-794 step 2+ this is implemented +via an in-memory `MutationStaging` accumulator in +[`crates/omnigraph/src/exec/staging.rs`](../crates/omnigraph/src/exec/staging.rs), +shared by both `mutate_as` and the bulk loader: - On the first touch of each table, the pre-write manifest version is - captured into `expected_versions[table_key]`. -- Subsequent ops on the same table re-open the dataset at the locally - staged Lance version (bypassing the manifest, which has not been - committed yet) so they see prior writes. -- One `commit_with_expected(updates, expected_versions)` at the end - publishes the lot atomically. Cross-table conflicts surface as + captured into `expected_versions[table_key]` (the publisher's CAS + fence at end-of-query). +- Each insert/update op pushes a `RecordBatch` into the per-table + pending accumulator. Lance HEAD does **not** advance during op + execution. +- Read sites (validation, predicate matching for `update`) consume + `TableStore::scan_with_pending`, which scans committed via Lance + and applies the same SQL filter to the pending batches via DataFusion + `MemTable`. Same-query writes are visible to subsequent reads. +- At end-of-query, `MutationStaging::finalize` issues exactly one + `stage_*` + `commit_staged` per touched table (concatenating + accumulated batches; merge-mode dedupes by `id`, last-write-wins), + and the publisher publishes the manifest atomically across all + touched sub-tables. Cross-table conflicts surface as `ManifestConflictDetails::ExpectedVersionMismatch`. +- **Deletes still inline-commit.** Lance's `Dataset::delete` is not + exposed as a two-phase op in 4.0.0; deletes go through `delete_where` + immediately and record their post-write state in + `MutationStaging.inline_committed`. The parse-time D₂ rule (below) + prevents inserts/updates from coexisting with deletes in one query, + so the inline path is safe for delete-only mutations. This upholds [docs/invariants.md §VI.23](invariants.md) (atomicity per -query) and §VI.25 (read-your-writes within a multi-statement mutation — -previously aspirational, now upheld). +query) and §VI.25 (read-your-writes within a multi-statement mutation, +upheld). + +### D₂ — parse-time mixed-mode rejection + +A single mutation query is either insert/update-only or delete-only. +Mixed → rejected at parse time with a clear error directing the user to +split the query. Reason: mixing creates ordering hazards +(insert→delete on the same row would silently no-op because the staged +insert isn't visible to delete; cascading deletes of just-inserted +edges break referential integrity). Until Lance exposes a two-phase +delete API, the parse-time rejection keeps both paths atomic and +correct. Tracked: MR-793, plus a Lance-upstream ticket. + +### `LoadMode::Overwrite` residual + +The bulk loader's Append and Merge modes use the staged-write path +described above. `LoadMode::Overwrite` keeps the legacy inline-commit +path: truncate-then-append doesn't fit the staged shape cleanly in +Lance 4.0.0, and overwrite has no in-flight read-your-writes +requirement (the prior data is being wiped). A mid-overwrite failure +can leave Lance HEAD on a partially-truncated table; the next overwrite +will replace it. Operator-driven (rare in agent workloads); document +permanently until Lance exposes `Operation::Overwrite { fragments }` as +a two-phase op. ## Conflict shape @@ -59,40 +97,32 @@ list`. `_graph_runs.lance` belongs in MR-770 (the production sweep) — this PR stops *creating* run state but does not destroy legacy bytes on disk. -## Known limitation: mid-query partial failure on the same table +## Mid-query partial failure: closed by MR-794 -A multi-statement `.gq` mutation where op-N writes a Lance fragment -successfully and op-N+1 then fails leaves the touched table at -`Lance HEAD = manifest_version + 1`. The query is atomic at the manifest -level (the publisher never publishes, so reads at the pinned manifest -version do *not* see op-N's data), but the *next* mutation against the -same table fails loudly with -`ManifestConflictDetails::ExpectedVersionMismatch` because -`ensure_expected_version` enforces strict equality between Lance HEAD and -the manifest's pinned version. +The pre-MR-794 design had a known limitation: a multi-statement `.gq` +mutation where op-N inline-committed a Lance fragment and op-N+1 then +failed left the touched table at `Lance HEAD = manifest_version + 1`, +blocking the next mutation with `ExpectedVersionMismatch`. -**Why the engine doesn't auto-rollback**: Lance's `Dataset::restore()` is -*not* a rewind — it appends a new commit (containing the desired -historical version's data) and advances HEAD further. There is no Lance -API to delete a committed version. A proper fix requires writing each -mutation's per-table fragments to a *transient Lance branch* on the -sub-table, then fast-forwarding main on success or dropping the branch -on failure. That work is tracked as a follow-up to MR-771; in the -meantime: +MR-794 (step 1 + step 2+) closed this for inserts/updates **by +construction at the writer layer**: insert and update batches accumulate +in memory; no Lance HEAD advance happens during op execution; one +`stage_*` + `commit_staged` per touched table runs at end-of-query, and +only after every op succeeded. A failed op leaves Lance HEAD untouched +on the staged tables, so the next mutation proceeds normally with no +drift to reconcile. -- **In practice this is rare.** Most schema-language validation - (`@key`, `@enum`, `@range`, intra-batch uniqueness, edge-endpoint - existence) runs *before* any Lance write inside the failing op, so - single-statement mutations never trip this. The narrow path is - multi-statement queries (`insert ... insert ...`, - `insert ... update ...`) where a late op fails on validation that - depends on earlier ops' staged data. -- **Workaround**: callers that hit this should refresh the handle and - retry the mutation; if Lance HEAD remains drifted the - `omnigraph cleanup` command will GC the orphan version once a later - successful commit on the same table moves HEAD past it. (`cleanup` - cannot reclaim an orphan that *is* the current Lance HEAD; that case - needs the per-table-branch follow-up to fully heal.) +The cancellation case (future drop mid-mutation) inherits the same +guarantee — the in-memory accumulator evaporates with the dropped task +and no Lance write was ever issued. -The cancellation case (future drop mid-mutation) has the same shape and -the same workaround. +For delete-touching mutations the legacy inline-commit shape is +preserved (Lance has no public two-phase delete in 4.0.0) — the same +narrow window remains. The parse-time D₂ rule prevents inserts/updates +from coexisting with deletes in one query, so a pure-delete failure +cannot drift any staged-table state. If a delete-only multi-table +mutation fails mid-cascade, the same workaround as before applies +(retry; rely on `omnigraph cleanup` once a later successful commit +moves HEAD past the orphan version). Closing this requires Lance to +expose `DeleteJob::execute_uncommitted`; tracked in MR-793 and a +Lance-upstream ticket. diff --git a/docs/storage.md b/docs/storage.md index 153b080..90806d0 100644 --- a/docs/storage.md +++ b/docs/storage.md @@ -22,7 +22,7 @@ OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordin - `edges/{fnv1a64-hex(edge_type_name)}` — one Lance dataset per edge type - `__manifest/` — the catalog of all sub-tables and their published versions - `_graph_commits.lance` / `_graph_commit_actors.lance` — the commit graph and its actor map - - `_graph_runs.lance` / `_graph_run_actors.lance` — the run registry and its actor map + - (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 repos are inert; the run state machine was removed in MR-771 and these files are cleaned up via MR-770's production sweep) - **Manifest row schema** (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count`): - `object_type` ∈ `table | table_version | table_tombstone` - `table_key` ∈ `node: | edge:` @@ -63,14 +63,12 @@ flowchart TB nodes["nodes/{fnv1a64-hex}/
one dataset per node type"]:::l2 edges["edges/{fnv1a64-hex}/
one dataset per edge type"]:::l2 cgraph["_graph_commits.lance/
_graph_commit_actors.lance/"]:::l2 - runs["_graph_runs.lance/
_graph_run_actors.lance/"]:::l2 refs["_refs/branches/{name}.json
graph-level branches"]:::l2 repo --> manifest repo --> nodes repo --> edges repo --> cgraph - repo --> runs repo --> refs subgraph dataset[Inside each Lance dataset — L1] @@ -91,7 +89,7 @@ flowchart TB - **Repo root** is one directory (or S3 prefix). Everything below is part of one OmniGraph repo. - **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here. - **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe. -- **`_graph_commits.lance` / `_graph_runs.lance`** are L2 datasets that record the graph-level commit DAG and run registry respectively (each has a paired `*_actors.lance` for the actor map). +- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 repos also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; MR-770 sweeps these in production.) - **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads. - **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags. diff --git a/docs/testing.md b/docs/testing.md index f4330db..4ba3cbc 100644 --- a/docs/testing.md +++ b/docs/testing.md @@ -19,7 +19,8 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav |---|---| | `end_to_end.rs` | Full init → load → query/mutate flow | | `branching.rs` | Branch create / list / delete, lazy fork | -| `runs.rs` | Transactional runs (begin/publish/abort), idempotency | +| `runs.rs` | Direct-publish writes: cancellation, concurrent-writer CAS, multi-statement atomicity, MR-794 staged-write rewire (D₂ rejection, insert+update coalesce, multi-append coalesce, partial-failure recovery, load RI/cardinality recovery) | +| `staged_writes.rs` | TableStore staged-write primitives (`stage_append`, `stage_merge_insert`, `commit_staged`, `scan_with_staged`, `count_rows_with_staged`) — primitive-level only; engine code uses the in-memory `MutationStaging` accumulator instead | | `lifecycle.rs` | Repo lifecycle, schema state | | `point_in_time.rs` | Snapshots, time travel (`snapshot_at_version`, `entity_at`) | | `changes.rs` | `diff_between` / `diff_commits` |