Merge pull request #68 from ModernRelay/ragnorc/mr794-rewire

MR-794 step 2: in-memory accumulator rewire for mutate_as + load
2026-06-24 02:38:06 +02:00 · 2026-05-01 23:06:10 +02:00 · 2026-05-01 23:06:10 +02:00 · 6f60c0cbcf
commit 6f60c0cbcf
parent 68c9d1bc91 044ed46019
36 changed files with 3093 additions and 697 deletions
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@ -161,6 +161,16 @@ jobs:
          OMNIGRAPH_UPDATE_OPENAPI: ${{ (github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository) && '1' || '' }}
        run: cargo test --workspace --locked
      - name: Run failpoints feature test
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        # Run after the workspace test so the build cache is warm —
        # enabling --features failpoints is just an incremental rebuild
        # of omnigraph-engine + the small `fail` crate, not the full
        # dep tree (lance, datafusion). A separate job with its own
        # cache key would be a fresh ~20min build on first run; this
        # is ~30s on a warm cache.
        run: cargo test --locked -p omnigraph-engine --features failpoints --test failpoints
      - name: Commit regenerated openapi.json to PR branch
        if: |
          needs.classify_changes.outputs.run_full_ci == 'true' &&
--- a/AGENTS.md
+++ b/AGENTS.md
@ -32,7 +32,7 @@ OmniGraph is a typed property-graph engine built as a coordination layer over ma
 - **Languages**: a `.pg` schema language and a `.gq` query language, both Pest-based, with a typed IR.
 - **Multi-modal querying**: vector ANN (`nearest`), full-text (`search`/`fuzzy`/`match_text`/`bm25`), Reciprocal Rank Fusion (`rrf`), and graph traversal (`Expand`, anti-join `not { … }`) in one runtime.
 - **Branches and commits across the whole graph**: Git-style — every successful publish appends to a commit DAG; merges are three-way at the row level.
- **Atomic per-query writes**: `mutate_as` and `load` capture per-table `expected_table_versions` before writing and call `ManifestBatchPublisher::publish` once at the end. Cross-table OCC enforced inside the publisher's row-level CAS; no staging branches, no run state machine.
+- **Atomic per-query writes**: `mutate_as` and `load` accumulate insert/update batches into an in-memory `MutationStaging.pending` per touched table; one `stage_*` + `commit_staged` per table runs at end-of-query, then `ManifestBatchPublisher::publish` commits the manifest atomically with per-table `expected_table_versions` CAS. A mid-query failure leaves Lance HEAD untouched on staged tables — no drift, no run state machine, no staging branches. Deletes still inline-commit; D₂ at parse time prevents inserts/updates and deletes from coexisting in one query.
 - **HTTP server**: Axum + utoipa OpenAPI, bearer auth (SHA-256 hashed, optional AWS Secrets Manager), Cedar policy gating.
 - **CLI** driven by a single `omnigraph.yaml`; multi-format output (json/jsonl/csv/kv/table).
@ -211,7 +211,7 @@ omnigraph policy explain --actor act-alice --action change --branch main
 | Query language | — | `.gq` + Pest grammar + IR + lowering + linter |
 | Schema migration planning | — | `plan_schema_migration` + `apply_schema` step types + `__schema_apply_lock__` |
 | Commit graph (DAG) across whole repo | — | `_graph_commits.lance` with linear + merge parents, ULID ids, actor map |
-| Per-query atomic writes | — | `MutationStaging` accumulator + `commit_with_expected` publisher CAS, single commit per `mutate_as` / `load` |
+| Per-query atomic writes | — | In-memory `MutationStaging.pending` accumulator + `stage_*` / `commit_staged` per touched table at end-of-query + publisher CAS via `commit_with_expected` (single manifest commit per `mutate_as` / `load`); D₂ parse-time rule keeps inserts/updates and deletes from mixing |
 | Three-way row-level merge | — | `OrderedTableCursor` + `StagedTableWriter`, structured `MergeConflictKind` |
 | Change feeds | — | `diff_between` / `diff_commits` with manifest fast path + ID streaming |
 | Cedar policy | — | 8 actions, branch / target_branch / protected scopes, validate/test/explain CLI |
--- a/Cargo.lock
+++ b/Cargo.lock
@ -4647,6 +4647,7 @@ dependencies = [
 "async-trait",
 "base64",
 "chrono",
 "datafusion",
 "fail",
 "futures",
 "lance",
--- a/Cargo.toml
+++ b/Cargo.toml
@ -20,6 +20,7 @@ arrow-select = "57"
 arrow-cast = { version = "57", features = ["prettyprint"] }
 arrow-ord = "57"
 datafusion = { version = "52", default-features = false }
 datafusion-physical-plan = "52"
 datafusion-physical-expr = "52"
 datafusion-execution = "52"
--- a/crates/omnigraph-cli/tests/cli.rs
+++ b/crates/omnigraph-cli/tests/cli.rs
@ -1905,7 +1905,7 @@ fn cli_fails_for_invalid_merge_requests() {
    );
 }
-// MR-771: `omnigraph run list/show/publish/abort` subcommands removed
+// `omnigraph run list/show/publish/abort` subcommands removed
 // alongside the run state machine. Direct-to-target writes leave nothing
 // for these CLIs to manage. Audit history is now visible via
 // `omnigraph commit list` reading the commit graph.
--- a/crates/omnigraph-cli/tests/system_local.rs
+++ b/crates/omnigraph-cli/tests/system_local.rs
@ -307,7 +307,7 @@ fn local_cli_end_to_end_branch_change_merge_flow() {
    assert_eq!(main_read["row_count"], 1);
    assert_eq!(main_read["rows"][0]["p.name"], "Zoe");
-    // MR-771: `omnigraph run list` removed. Audit visible via commit list.
+    // `omnigraph run list` removed. Audit visible via commit list.
    let commits_payload = parse_stdout_json(&output_success(
        cli()
            .arg("commit")
@ -597,8 +597,8 @@ fn local_cli_failed_load_keeps_target_state_unchanged() {
        snapshot_table_row_count(&repo, "edge:Knows"),
        knows_rows_before
    );
-    // MR-771: failed loads no longer leave a RunRecord. The atomicity
+    // Failed loads leave no run record (the run lifecycle has been
-    // guarantee is verified above (target tables are unchanged).
+    // removed); atomicity is verified above by the unchanged target.
 }
 #[test]
@ -631,9 +631,8 @@ fn local_cli_failed_change_keeps_target_state_unchanged() {
            .arg("--json"),
    ));
    assert_eq!(friends_payload["row_count"], 2);
-    // MR-771: failed mutations no longer leave a RunRecord. The atomicity
+    // Failed mutations leave no run record (the run lifecycle has been
-    // guarantee is verified above (the friends_of read above shows main
+    // removed); atomicity is verified above by the unchanged target.
    // unchanged).
 }
 #[test]
@ -941,12 +940,13 @@ query vector_search($q: String) {
    assert_eq!(result["rows"][0]["d.slug"], "alpha-doc");
 }
-// MR-771: the publisher CAS conflict shape is verified end-to-end at the
+// The publisher CAS conflict shape is verified end-to-end at the engine
-// engine level in `crates/omnigraph/tests/runs.rs::concurrent_writers_one_succeeds_one_gets_expected_version_mismatch`
+// level in
 // `crates/omnigraph/tests/runs.rs::concurrent_writers_one_succeeds_one_gets_expected_version_mismatch`
 // and at the HTTP boundary in
 // `crates/omnigraph-server/tests/server.rs::change_conflict_returns_manifest_conflict_409`.
-// The pre-MR-771 CLI-level race was timing-dependent; with direct-publish
+// A CLI-level race would be timing-dependent; with direct-publish the
-// the surface is the same engine path the unit test already covers.
+// surface is the same engine path the unit test already covers.
 #[test]
 fn local_cli_policy_tooling_is_end_to_end_while_local_writes_stay_unenforced() {
--- a/crates/omnigraph-cli/tests/system_remote.rs
+++ b/crates/omnigraph-cli/tests/system_remote.rs
@ -192,7 +192,7 @@ query insert_person($name: String, $age: I32) {
    assert_eq!(local_verify["row_count"], 1);
    assert_eq!(local_verify["rows"][0]["p.name"], "Mina");
-    // MR-771: `run publish` / `run list` removed. Direct-to-target writes
+    // `run publish` / `run list` removed. Direct-to-target writes
    // already landed via the change call above; the commit graph is now
    // the audit surface (verified separately by `commit list`).
 }
--- a/crates/omnigraph-server/tests/server.rs
+++ b/crates/omnigraph-server/tests/server.rs
@ -1346,7 +1346,7 @@ async fn policy_blocks_non_admin_merge_to_main_and_allows_admin() {
 #[tokio::test(flavor = "multi_thread")]
 async fn authenticated_change_stamps_actor_on_commits() {
-    // MR-771: with the Run state machine removed, actor_id is recorded
+    // With the Run state machine removed, actor_id is recorded
    // directly on the commit graph (no intermediate run record).
    let (_temp, app) = app_for_loaded_repo_with_auth_tokens(&[("act-andrew", "token-one")]).await;
@ -2108,10 +2108,10 @@ query vector_search_string($q: String) {
 #[tokio::test(flavor = "multi_thread")]
 async fn change_conflict_returns_manifest_conflict_409() {
-    // MR-771: a write that races with another writer surfaces as HTTP 409
+    // A write that races with another writer surfaces as HTTP 409 with
-    // with a structured `manifest_conflict` body — `table_key`, `expected`,
+    // a structured `manifest_conflict` body — `table_key`, `expected`,
-    // and `actual` — so clients can detect-and-retry without parsing the
+    // and `actual` — so clients can detect-and-retry without parsing
-    // message. (Replaces the old run-publish merge-conflict shape.)
+    // the message.
    let temp = init_loaded_repo().await;
    let repo = repo_path(temp.path());
--- a/crates/omnigraph/Cargo.toml
+++ b/crates/omnigraph/Cargo.toml
@ -19,6 +19,7 @@ failpoints = ["dep:fail", "fail/failpoints"]
 omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.3.1" }
 lance = { workspace = true }
 lance-datafusion = { workspace = true }
 datafusion = { workspace = true }
 lance-file = { workspace = true }
 lance-index = { workspace = true }
 lance-linalg = { workspace = true }
--- a/crates/omnigraph/src/db/omnigraph.rs
+++ b/crates/omnigraph/src/db/omnigraph.rs
@ -1430,12 +1430,12 @@ edge WorksAt: Person -> Company
    #[tokio::test]
    async fn test_apply_schema_succeeds_after_load() {
-        // MR-670 + MR-674: schema apply used to be blocked by leftover
+        // Historical: schema apply used to be blocked by leftover
-        // __run__ branches. MR-670 added a defense-in-depth filter that
+        // `__run__` branches. A defense-in-depth filter now skips
-        // skips internal system branches. MR-674 made run branches
+        // internal system branches, and run branches were made
-        // ephemeral on every terminal state, so in practice no __run__
+        // ephemeral on every terminal state — so in practice no
-        // branch survives publish — but the filter still guards the
+        // `__run__` branch survives publish. The filter still guards
-        // invariant.
+        // the invariant.
        let dir = tempfile::tempdir().unwrap();
        let uri = dir.path().to_str().unwrap();
        let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
@ -1451,7 +1451,7 @@ edge WorksAt: Person -> Company
        let all_branches = db.coordinator.all_branches().await.unwrap();
        assert!(
            !all_branches.iter().any(|b| is_internal_run_branch(b)),
-            "MR-674: run branch should be deleted after publish, got: {:?}",
+            "run branch should be deleted after publish, got: {:?}",
            all_branches
        );
--- a/crates/omnigraph/src/db/omnigraph/optimize.rs
+++ b/crates/omnigraph/src/db/omnigraph/optimize.rs
@ -15,8 +15,8 @@
 //!   retention. Destructive to version history — callers should gate this
 //!   behind an explicit confirm flag at the CLI layer.
 //!
-//! Both walk every node + edge table on the `main` branch. Run branches are
+//! Both walk every node + edge table on the `main` branch. Run branches
-//! ephemeral by design (MR-670 / MR-674) so we do not optimize them.
+//! are ephemeral by design so we do not optimize them.
 use std::time::Duration;
--- a/crates/omnigraph/src/db/omnigraph/schema_apply.rs
+++ b/crates/omnigraph/src/db/omnigraph/schema_apply.rs
@ -34,9 +34,9 @@ pub(super) async fn apply_schema_with_lock(
    let branches = db.coordinator.all_branches().await?;
    // Skip `main` and internal system branches. The schema-apply lock branch
    // is excluded because it is the cluster-wide schema-apply serializer.
-    // `__run__*` branches are no longer created (MR-771); the filter remains
+    // `__run__*` branches are no longer created; the filter remains as
-    // as defense-in-depth for legacy repos with leftover staging branches —
+    // defense-in-depth for legacy repos with leftover staging branches.
-    // MR-770 will sweep them and this guard can go.
+    // A future production sweep will let this guard go.
    let blocking_branches = branches
        .into_iter()
        .filter(|branch| branch != "main" && !is_internal_system_branch(branch))
--- a/crates/omnigraph/src/db/run_registry.rs
+++ b/crates/omnigraph/src/db/run_registry.rs
@ -1,12 +1,12 @@
-// Run state-machine has been removed (MR-771). Mutations now write directly
+// The Run state machine has been removed. Mutations now write directly
-// to target tables and use the publisher's `expected_table_versions` CAS for
+// to target tables and use the publisher's `expected_table_versions`
-// cross-table OCC; the `__run__<id>` staging branches and `_graph_runs.lance`
+// CAS for cross-table OCC; `__run__<id>` staging branches and the
-// state machine no longer exist.
+// `_graph_runs.lance` state machine no longer exist.
 //
-// What remains is the branch-name predicate, kept as a defense-in-depth guard
+// What remains is the branch-name predicate, kept as a defense-in-depth
-// against users naming a public branch `__run__*`. MR-770 owns the production
+// guard against users naming a public branch `__run__*`. A future
-// sweep of legacy `_graph_runs.lance` rows and stale `__run__*` branches; once
+// production sweep of legacy `_graph_runs.lance` rows and stale
-// that lands the predicate (and this file) can go too.
+// `__run__*` branches will let this predicate (and this file) go too.
 pub(crate) const INTERNAL_RUN_BRANCH_PREFIX: &str = "__run__";
--- a/crates/omnigraph/src/exec/mod.rs
+++ b/crates/omnigraph/src/exec/mod.rs
@ -46,3 +46,4 @@ mod merge;
 mod mutation;
 mod projection;
 mod query;
 pub(crate) mod staging;
--- a/crates/omnigraph/src/exec/mutation.rs
+++ b/crates/omnigraph/src/exec/mutation.rs
--- a/crates/omnigraph/src/exec/staging.rs
+++ b/crates/omnigraph/src/exec/staging.rs
@ -0,0 +1,662 @@
 //! Per-query staging accumulator for direct-publish writes.
 //!
 //! `MutationStaging` accumulates per-table input batches in memory during a
 //! `mutate_as` or `load` query, then at end-of-query commits each touched
 //! table via Lance's distributed-write API (one `stage_*` + `commit_staged`
 //! per table) and returns the publisher inputs (`SubTableUpdate` list +
 //! `expected_table_versions`).
 //!
 //! Read-your-writes within the same query is satisfied by the in-memory
 //! pending batches (see `pending_batches`) — read sites union the committed
 //! Lance scan with the pending Arrow batches via DataFusion `MemTable` (see
 //! `crate::table_store::TableStore::scan_with_pending`).
 //!
 //! This module is shared by the engine's mutation path (`exec/mutation.rs`)
 //! and the bulk loader (`loader/mod.rs`); both feed insert/update batches
 //! into `pending` and route end-of-query commits through `finalize`.
 //! Deletes follow the inline-commit path and are recorded via
 //! `record_inline` (parse-time D₂ rule prevents mixed insert/delete in a
 //! single query, so no flushing is required).
 use std::collections::{HashMap, HashSet};
 use std::sync::Arc;
 use arrow_array::{Array, RecordBatch, StringArray, UInt32Array};
 use arrow_schema::SchemaRef;
 use lance::Dataset;
 use omnigraph_compiler::catalog::EdgeType;
 use crate::db::SubTableUpdate;
 use crate::error::{OmniError, Result};
 /// Whether the per-table accumulator should commit via `stage_append`
 /// (no @key inserts, edge inserts) or `stage_merge_insert` (any @key insert
 /// or update). Once set to `Merge` for a table within a query, subsequent
 /// inserts on that table are rolled into the same merge — a `WhenNotMatched
 /// = InsertAll` merge is correct for both cases.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub(crate) enum PendingMode {
    Append,
    Merge,
 }
 /// Per-table accumulator. Each insert/update op pushes a `RecordBatch` into
 /// `batches`; at end-of-query the accumulated batches concat into a single
 /// stage call.
 #[derive(Debug)]
 pub(crate) struct PendingTable {
    pub(crate) schema: SchemaRef,
    pub(crate) mode: PendingMode,
    pub(crate) batches: Vec<RecordBatch>,
 }
 impl PendingTable {
    fn new(schema: SchemaRef, mode: PendingMode) -> Self {
        Self {
            schema,
            mode,
            batches: Vec::new(),
        }
    }
    fn total_rows(&self) -> usize {
        self.batches.iter().map(|b| b.num_rows()).sum()
    }
 }
 /// Stable per-table identifiers captured on first touch and reused at
 /// finalize time. Avoids re-resolving the dataset path / branch.
 #[derive(Debug, Clone)]
 pub(crate) struct StagedTablePath {
    pub(crate) full_path: String,
    pub(crate) table_branch: Option<String>,
 }
 /// Per-query staging state.
 ///
 /// Replaces the legacy inline-commit `MutationStaging.latest` map with
 /// an in-memory accumulator that defers all Lance HEAD advances to
 /// end-of-query. After this rewire the bug class "Lance HEAD drifts ahead
 /// of `__manifest`" is unreachable in `mutate_as` and `load` for inserts
 /// and updates by construction.
 #[derive(Default)]
 pub(crate) struct MutationStaging {
    /// Pre-write manifest version per table — the publisher's CAS fence at
    /// end-of-query.
    pub(crate) expected_versions: HashMap<String, u64>,
    /// Per-table identifiers captured on first touch.
    pub(crate) paths: HashMap<String, StagedTablePath>,
    /// In-memory accumulated batches per table (insert/update path).
    pub(crate) pending: HashMap<String, PendingTable>,
    /// Inline-committed updates from delete-touching ops (D₂ guarantees no
    /// pending batches exist on a delete-touched table).
    pub(crate) inline_committed: HashMap<String, SubTableUpdate>,
 }
 impl MutationStaging {
    /// Capture pre-write metadata on first touch of a table. Subsequent
    /// touches are no-ops (paths and `expected_version` are stable for the
    /// lifetime of one query).
    pub(crate) fn ensure_path(
        &mut self,
        table_key: &str,
        full_path: String,
        table_branch: Option<String>,
        expected_version: u64,
    ) {
        self.paths.entry(table_key.to_string()).or_insert(StagedTablePath {
            full_path,
            table_branch,
        });
        self.expected_versions
            .entry(table_key.to_string())
            .or_insert(expected_version);
    }
    /// Append a batch to the per-table accumulator.
    ///
    /// `mode` is asserted-consistent with prior pushes for the same table:
    /// `Append`+`Append` stays Append; any `Merge` upgrades the table to
    /// Merge (e.g. an `update Person` after `insert Knows from='X' to='Y'`
    /// when both produce content on `node:Person`). Once Merge is set,
    /// subsequent appends roll into the merge stream — `WhenNotMatched =
    /// InsertAll` correctly inserts append-shaped rows.
    pub(crate) fn append_batch(
        &mut self,
        table_key: &str,
        schema: SchemaRef,
        mode: PendingMode,
        batch: RecordBatch,
    ) -> Result<()> {
        if batch.num_rows() == 0 {
            // No-op — staging is purely additive; an empty batch should not
            // be appended.
            return Ok(());
        }
        // If we've already accumulated a batch on this table, the new
        // batch's schema MUST match the existing accumulator's schema.
        // The mismatch case in practice is a blob-bearing table that
        // sees an `insert` (full schema, blob columns included) and
        // then an `update` whose `apply_assignments` output omits
        // unassigned blob columns (subset schema). Concat-time and
        // MemTable-construction errors would catch this later, but
        // surfacing it at the offending `append_batch` call gives the
        // caller a clearer point of failure attached to the specific
        // op that introduced the drift.
        if let Some(existing) = self.pending.get(table_key) {
            if !schemas_compatible(&existing.schema, &batch.schema()) {
                return Err(OmniError::manifest(format!(
                    "table '{}' accumulated mutation batches with mismatched schemas: \
                     prior batches have {} columns, this batch has {}. \
                     This typically happens on a blob-bearing table when one \
                     op uses the full schema (e.g. an `insert`) and another \
                     omits unassigned blob columns (e.g. an `update` that \
                     doesn't set every blob property). Split the mutation \
                     into two queries: one for the inserts, one for the \
                     updates.",
                    table_key,
                    existing.schema.fields().len(),
                    batch.schema().fields().len(),
                )));
            }
        }
        let entry = self
            .pending
            .entry(table_key.to_string())
            .or_insert_with(|| PendingTable::new(schema.clone(), mode));
        // Upgrade Append -> Merge if any op needs merge semantics.
        if mode == PendingMode::Merge {
            entry.mode = PendingMode::Merge;
        }
        entry.batches.push(batch);
        Ok(())
    }
    /// Record a delete that already inline-committed at the Lance layer.
    pub(crate) fn record_inline(&mut self, update: SubTableUpdate) {
        self.inline_committed.insert(update.table_key.clone(), update);
    }
    /// Read-your-writes accessor: the accumulated pending batches for
    /// `table_key`, or `&[]` if none.
    pub(crate) fn pending_batches(&self, table_key: &str) -> &[RecordBatch] {
        self.pending
            .get(table_key)
            .map(|p| p.batches.as_slice())
            .unwrap_or(&[])
    }
    /// Schema of the accumulated batches for `table_key`, or `None` if no
    /// op has touched the table. Used by `scan_with_pending` to construct
    /// the in-memory `MemTable`.
    pub(crate) fn pending_schema(&self, table_key: &str) -> Option<SchemaRef> {
        self.pending.get(table_key).map(|p| p.schema.clone())
    }
    /// `true` if neither pending nor inline_committed has any state — the
    /// query made no observable writes.
    pub(crate) fn is_empty(&self) -> bool {
        self.pending.is_empty() && self.inline_committed.is_empty()
    }
    /// Total count of pending rows across all tables. Used by tests and
    /// (eventually) memory-budget enforcement.
    #[allow(dead_code)]
    pub(crate) fn pending_row_count(&self) -> usize {
        self.pending.values().map(|p| p.total_rows()).sum()
    }
    /// End-of-query: for each pending table, concat batches and commit via
    /// `stage_append` or `stage_merge_insert` followed by `commit_staged`.
    /// Merge with inline-committed entries. Return `(updates,
    /// expected_versions)` for `commit_updates_on_branch_with_expected`.
    ///
    /// Sequential per-table — no cross-table dependency, but a parallel
    /// version is a perf optimization for multi-table writes (loader with
    /// many node + edge types). v1 ships sequential; the fan-out can land
    /// in a follow-up.
    pub(crate) async fn finalize(
        self,
        db: &crate::db::Omnigraph,
        _branch: Option<&str>,
    ) -> Result<(Vec<SubTableUpdate>, HashMap<String, u64>)> {
        let MutationStaging {
            expected_versions,
            paths,
            pending,
            inline_committed,
        } = self;
        let mut updates: Vec<SubTableUpdate> =
            inline_committed.into_values().collect();
        for (table_key, table) in pending {
            let path = paths.get(&table_key).ok_or_else(|| {
                OmniError::manifest_internal(format!(
                    "MutationStaging::finalize: missing path for table '{}'",
                    table_key
                ))
            })?;
            let expected = *expected_versions.get(&table_key).ok_or_else(|| {
                OmniError::manifest_internal(format!(
                    "MutationStaging::finalize: missing expected version for table '{}'",
                    table_key
                ))
            })?;
            // Reopen at the pre-write version. Lance HEAD has not advanced
            // since `ensure_path` captured it — no prior op committed to
            // this dataset.
            let ds = db
                .reopen_for_mutation(
                    &table_key,
                    &path.full_path,
                    path.table_branch.as_deref(),
                    expected,
                )
                .await?;
            if table.batches.is_empty() {
                continue;
            }
            // For Merge mode, dedupe accumulated batches by `id`, keeping
            // the LAST occurrence (last-write-wins for the query). This
            // is required because Lance's `MergeInsertBuilder` produces
            // arbitrary results on duplicate keys in the source. Append
            // mode is exempt because no-key node and edge inserts use
            // ULID-generated ids that are unique within a query.
            let combined = match table.mode {
                PendingMode::Merge => {
                    dedupe_merge_batches_by_id(&table.schema, table.batches)?
                }
                PendingMode::Append => {
                    if table.batches.len() == 1 {
                        table.batches.into_iter().next().unwrap()
                    } else {
                        arrow_select::concat::concat_batches(
                            &table.schema,
                            &table.batches,
                        )
                        .map_err(|e| OmniError::Lance(e.to_string()))?
                    }
                }
            };
            // Commit via Lance's two-phase write: stage produces
            // uncommitted fragments + transaction; commit advances HEAD.
            let staged = match table.mode {
                PendingMode::Append => {
                    db.table_store().stage_append(&ds, combined, &[]).await?
                }
                PendingMode::Merge => {
                    db.table_store()
                        .stage_merge_insert(
                            ds.clone(),
                            combined,
                            vec!["id".to_string()],
                            lance::dataset::WhenMatched::UpdateAll,
                            lance::dataset::WhenNotMatched::InsertAll,
                        )
                        .await?
                }
            };
            let new_ds = db
                .table_store()
                .commit_staged(Arc::new(ds), staged.transaction)
                .await?;
            let state = db
                .table_store()
                .table_state(&path.full_path, &new_ds)
                .await?;
            updates.push(SubTableUpdate {
                table_key: table_key.clone(),
                table_version: state.version,
                table_branch: path.table_branch.clone(),
                row_count: state.row_count,
                version_metadata: state.version_metadata,
            });
        }
        Ok((updates, expected_versions))
    }
 }
 /// Walk `batches` in reverse, tracking seen `id` values; for each row
 /// whose id we have NOT seen yet, mark it as a keeper. After the walk,
 /// take the kept rows in forward (input) order and concat into one batch.
 ///
 /// Result: a deduped batch where each `id` appears at most once, with
 /// the LAST occurrence's column values. Required by `stage_merge_insert`,
 /// which needs unique source keys (Lance's `MergeInsertBuilder` produces
 /// arbitrary results on duplicates).
 ///
 /// `batches` must be non-empty and all share `schema` (caller enforces).
 /// Compare two schemas for the purposes of `MutationStaging::append_batch`'s
 /// accumulator-compatibility check. We treat schemas as compatible if
 /// they have the same field names and data types in the same order.
 /// Nullability and field metadata differences are tolerated — Lance and
 /// Arrow round-trip these freely and the accumulator's downstream
 /// `concat_batches` is also permissive on those.
 fn schemas_compatible(a: &SchemaRef, b: &SchemaRef) -> bool {
    if a.fields().len() != b.fields().len() {
        return false;
    }
    for (af, bf) in a.fields().iter().zip(b.fields().iter()) {
        if af.name() != bf.name() || af.data_type() != bf.data_type() {
            return false;
        }
    }
    true
 }
 fn dedupe_merge_batches_by_id(
    schema: &SchemaRef,
    batches: Vec<RecordBatch>,
 ) -> Result<RecordBatch> {
    if batches.is_empty() {
        return Err(OmniError::manifest_internal(
            "dedupe_merge_batches_by_id: batches is empty".to_string(),
        ));
    }
    // Walk in reverse, tracking seen ids. For each row whose id we
    // haven't seen yet, record (batch_idx, row_idx) for the kept set.
    let mut seen: HashSet<String> = HashSet::new();
    let mut keep: Vec<Vec<u32>> = vec![Vec::new(); batches.len()];
    let mut any_duplicates = false;
    for (b_idx, batch) in batches.iter().enumerate().rev() {
        let id_col = batch
            .column_by_name("id")
            .ok_or_else(|| {
                OmniError::manifest_internal(
                    "dedupe_merge_batches_by_id: batch has no 'id' column".to_string(),
                )
            })?
            .as_any()
            .downcast_ref::<StringArray>()
            .ok_or_else(|| {
                OmniError::manifest_internal(
                    "dedupe_merge_batches_by_id: 'id' column is not Utf8".to_string(),
                )
            })?;
        for r_idx in (0..batch.num_rows()).rev() {
            if !id_col.is_valid(r_idx) {
                // NULL ids — keep all (NULL != NULL in Lance/SQL semantics).
                keep[b_idx].push(r_idx as u32);
                continue;
            }
            let id = id_col.value(r_idx);
            if seen.insert(id.to_string()) {
                keep[b_idx].push(r_idx as u32);
            } else {
                any_duplicates = true;
            }
        }
        // We pushed in reverse-row order; flip to forward order so the
        // emitted batch reflects insertion order.
        keep[b_idx].reverse();
    }
    // Fast path: no duplicates → simple concat.
    if !any_duplicates {
        if batches.len() == 1 {
            return Ok(batches.into_iter().next().unwrap());
        }
        return arrow_select::concat::concat_batches(schema, &batches)
            .map_err(|e| OmniError::Lance(e.to_string()));
    }
    // Slow path: build per-batch slices via `take`, then concat.
    let mut sliced: Vec<RecordBatch> = Vec::with_capacity(batches.len());
    for (b_idx, idxs) in keep.into_iter().enumerate() {
        if idxs.is_empty() {
            continue;
        }
        let take_array = UInt32Array::from(idxs);
        let columns: Vec<Arc<dyn Array>> = batches[b_idx]
            .columns()
            .iter()
            .map(|col| arrow_select::take::take(col, &take_array, None))
            .collect::<std::result::Result<_, _>>()
            .map_err(|e| OmniError::Lance(e.to_string()))?;
        let new_batch = RecordBatch::try_new(batches[b_idx].schema(), columns)
            .map_err(|e| OmniError::Lance(e.to_string()))?;
        sliced.push(new_batch);
    }
    if sliced.is_empty() {
        return Err(OmniError::manifest_internal(
            "dedupe_merge_batches_by_id: all rows were dropped (unexpected)".to_string(),
        ));
    }
    if sliced.len() == 1 {
        return Ok(sliced.into_iter().next().unwrap());
    }
    arrow_select::concat::concat_batches(schema, &sliced)
        .map_err(|e| OmniError::Lance(e.to_string()))
 }
 // ─── Cardinality helpers (shared by mutation + loader paths) ────────────────
 /// Count edges per `src` value across committed (Lance scan) + pending
 /// (in-memory). Caller supplies an opened committed dataset so the
 /// mutation path (which already has one) and the loader path (which
 /// opens via snapshot) share the same body.
 ///
 /// `dedupe_key_column` controls whether committed rows are shadowed by
 /// pending:
 /// - `None` — every committed row counts, every pending row counts.
 ///   Correct when committed and pending cannot share a primary key
 ///   (engine inserts always use fresh ULID edge ids; loader Append
 ///   mode uses fresh ids too).
 /// - `Some(col)` — committed rows whose `col` value also appears in any
 ///   pending batch are EXCLUDED from the committed count, so a Merge-mode
 ///   load that *updates* an existing edge (potentially changing its
 ///   `src`) counts the post-update row exactly once. Without this,
 ///   `LoadMode::Merge` double-counts.
 pub(crate) async fn count_src_per_edge(
    db: &crate::db::Omnigraph,
    committed_ds: &Dataset,
    table_key: &str,
    staging: &MutationStaging,
    dedupe_key_column: Option<&str>,
 ) -> Result<HashMap<String, u32>> {
    let mut counts: HashMap<String, u32> = HashMap::new();
    let pending_batches = staging.pending_batches(table_key);
    // Collect pending key values (for shadow-on-merge dedupe). Only when
    // dedupe is requested AND there's anything pending.
    let pending_keys: Option<HashSet<String>> = match dedupe_key_column {
        Some(col) if !pending_batches.is_empty() => {
            let mut set = HashSet::new();
            for batch in pending_batches {
                if let Some(arr) = batch
                    .column_by_name(col)
                    .and_then(|c| c.as_any().downcast_ref::<StringArray>())
                {
                    for i in 0..arr.len() {
                        if arr.is_valid(i) {
                            set.insert(arr.value(i).to_string());
                        }
                    }
                }
            }
            Some(set)
        }
        _ => None,
    };
    // Committed side: scan `src` plus the dedupe key column when set, so
    // we can both count and shadow in one pass.
    let projection: Vec<&str> = match dedupe_key_column {
        Some(col) if pending_keys.as_ref().is_some_and(|s| !s.is_empty()) => vec!["src", col],
        _ => vec!["src"],
    };
    let committed = db
        .table_store()
        .scan(committed_ds, Some(&projection), None, None)
        .await?;
    for batch in &committed {
        let srcs = batch
            .column_by_name("src")
            .ok_or_else(|| OmniError::Lance("missing 'src' column on edge table".into()))?
            .as_any()
            .downcast_ref::<StringArray>()
            .ok_or_else(|| OmniError::Lance("'src' column is not Utf8".into()))?;
        // Optional shadow-key column (only present when dedupe is on).
        let key_arr = match (&pending_keys, dedupe_key_column) {
            (Some(set), Some(col)) if !set.is_empty() => batch
                .column_by_name(col)
                .and_then(|c| c.as_any().downcast_ref::<StringArray>()),
            _ => None,
        };
        for i in 0..srcs.len() {
            if !srcs.is_valid(i) {
                continue;
            }
            // Shadow this committed row if its key is in pending.
            if let (Some(arr), Some(set)) = (key_arr, pending_keys.as_ref()) {
                if arr.is_valid(i) && set.contains(arr.value(i)) {
                    continue;
                }
            }
            *counts.entry(srcs.value(i).to_string()).or_insert(0) += 1;
        }
    }
    // Pending side: walk in-memory batches for `src`. When dedupe is on,
    // collapse rows that share `dedupe_key_column` to their last occurrence
    // — mirrors `dedupe_merge_batches_by_id`'s last-write-wins applied at
    // finalize time, so cardinality counts what `commit_staged` will
    // actually publish, not raw input duplicates.
    //
    // Without this, a Merge-mode load whose input JSONL has two rows with
    // the same edge id would be double-counted here, even though the
    // finalize-time dedupe would collapse them to one. The result: spurious
    // `@card` violations on perfectly valid Merge inputs.
    match dedupe_key_column {
        Some(key_col) => count_pending_src_with_dedupe(pending_batches, key_col, &mut counts)?,
        None => count_pending_src_naive(pending_batches, &mut counts),
    }
    Ok(counts)
 }
 /// Count pending edges per `src` with NO dedup. Correct when caller
 /// guarantees pending rows have unique primary keys (engine inserts via
 /// fresh ULID; loader Append mode).
 fn count_pending_src_naive(
    pending_batches: &[RecordBatch],
    counts: &mut HashMap<String, u32>,
 ) {
    for batch in pending_batches {
        let Some(col) = batch.column_by_name("src") else {
            continue;
        };
        let Some(srcs) = col.as_any().downcast_ref::<StringArray>() else {
            continue;
        };
        for i in 0..srcs.len() {
            if srcs.is_valid(i) {
                *counts.entry(srcs.value(i).to_string()).or_insert(0) += 1;
            }
        }
    }
 }
 /// Count pending edges per `src` after deduping rows that share
 /// `dedupe_key_column`. Last occurrence wins (mirrors
 /// `dedupe_merge_batches_by_id`'s walk-in-reverse contract). Required for
 /// `LoadMode::Merge` where the same edge id may appear multiple times in
 /// one load and finalize will collapse them to the last value.
 fn count_pending_src_with_dedupe(
    pending_batches: &[RecordBatch],
    dedupe_key_column: &str,
    counts: &mut HashMap<String, u32>,
 ) -> Result<()> {
    // Walk in reverse, track seen keys, keep one (key, src) pair per key.
    let mut seen: HashSet<String> = HashSet::new();
    let mut kept_srcs: Vec<String> = Vec::new();
    for batch in pending_batches.iter().rev() {
        let Some(key_col) = batch.column_by_name(dedupe_key_column) else {
            // Pending batch is missing the key column. By construction
            // this is unreachable: callers in dedupe mode always push
            // batches whose schema contains the key (loader Merge mode
            // builds via build_edge_batch which always emits `id`; the
            // append_batch schema-compatibility check at the call site
            // would also reject a heterogeneous mix). If it ever fires
            // it's a programmer error — fail loudly rather than skip
            // counting (which would let `@card` violations slip).
            return Err(OmniError::manifest_internal(format!(
                "count_pending_src_with_dedupe: pending batch missing dedup key column '{}' \
                 (schema-compat check at append_batch should have rejected this)",
                dedupe_key_column
            )));
        };
        let key_arr = key_col.as_any().downcast_ref::<StringArray>().ok_or_else(|| {
            OmniError::Lance(format!(
                "count_src_per_edge: pending '{}' column is not Utf8",
                dedupe_key_column
            ))
        })?;
        let src_arr = batch
            .column_by_name("src")
            .and_then(|c| c.as_any().downcast_ref::<StringArray>());
        let Some(srcs) = src_arr else {
            continue;
        };
        for i in (0..batch.num_rows()).rev() {
            if !srcs.is_valid(i) {
                continue;
            }
            // NULL key: keep (NULL != NULL semantics — every NULL counts).
            if !key_arr.is_valid(i) {
                kept_srcs.push(srcs.value(i).to_string());
                continue;
            }
            let key = key_arr.value(i);
            if seen.insert(key.to_string()) {
                kept_srcs.push(srcs.value(i).to_string());
            }
        }
    }
    for src in kept_srcs {
        *counts.entry(src).or_insert(0) += 1;
    }
    Ok(())
 }
 /// Apply `@card(min..max)` bounds to a per-source count map.
 ///
 /// Both bounds are checked. The `min` check produces a misleading error
 /// during a per-op insert mid-query (a bound of `2..` requires both
 /// edges to be inserted before validation passes), but the historical
 /// behavior was to enforce min per-op anyway — keeping users from
 /// accidentally publishing a graph that violates the schema. Consumers
 /// that need end-of-query semantics call this from after all edge ops
 /// are accumulated (the loader does, via Phase 3).
 pub(crate) fn enforce_cardinality_bounds(
    edge_type: &EdgeType,
    counts: &HashMap<String, u32>,
 ) -> Result<()> {
    let card = &edge_type.cardinality;
    for (src, count) in counts {
        if let Some(max) = card.max {
            if *count > max {
                return Err(OmniError::manifest(format!(
                    "@card violation on edge {}: source '{}' has {} edges (max {})",
                    edge_type.name, src, count, max
                )));
            }
        }
        if *count < card.min {
            return Err(OmniError::manifest(format!(
                "@card violation on edge {}: source '{}' has {} edges (min {})",
                edge_type.name, src, count, card.min
            )));
        }
    }
    Ok(())
 }
--- a/crates/omnigraph/src/loader/mod.rs
+++ b/crates/omnigraph/src/loader/mod.rs
@ -21,6 +21,7 @@ use serde_json::Value as JsonValue;
 use crate::db::Omnigraph;
 use crate::error::{OmniError, Result};
 use crate::exec::staging::{MutationStaging, PendingMode};
 /// Result of a load operation.
 #[derive(Debug, Clone, Default)]
@ -154,11 +155,10 @@ impl Omnigraph {
    pub async fn load(&mut self, branch: &str, data: &str, mode: LoadMode) -> Result<LoadResult> {
        self.ensure_schema_state_valid().await?;
-        // Reject internal `__run__*` / system-prefixed branches at the public
+        // Reject internal `__run__*` / system-prefixed branches at the
-        // write boundary. The pre-MR-771 path got this guard transitively via
+        // public write boundary. Direct-publish paths assert this
-        // `begin_run`'s `ensure_public_branch_ref` call; the direct-publish
+        // explicitly so a caller can't write to legacy or system
-        // path needs to assert it explicitly so a caller can't write to
+        // staging branches by passing the prefix verbatim.
        // legacy or system staging branches by passing the prefix verbatim.
        crate::db::ensure_public_branch_ref(branch, "load")?;
        // Branch convention: `None` represents `main`. Re-normalizing to
        // `Some("main")` here would route the publisher commit through a
@ -304,21 +304,29 @@ async fn load_jsonl_reader<R: BufRead>(
        }
    }
-    // Phase 2: Build per-type RecordBatches and write to Lance.
+    // Phase 2: Build per-type RecordBatches and accumulate into the
-    //
+    // staging pipeline. For Append/Merge, batches go into an in-memory
-    // Writes to different tables are independent in Lance (each table has its
+    // accumulator and a single `stage_*` + `commit_staged` per touched
-    // own manifest + fragments), so we parallelize across types with a bounded
+    // table runs at end-of-load — a mid-load failure (RI / cardinality
-    // concurrency limit. Serial writes against S3 were the dominant cost of
+    // violation) leaves Lance HEAD untouched. For Overwrite, the legacy
-    // load — batching and parallelizing per-type cuts wall time by roughly
+    // inline-commit path is preserved (truncate+append doesn't fit the
-    // `LOAD_WRITE_CONCURRENCY`× for wide schemas (see MR-677).
+    // staged shape cleanly, and overwrite has no in-flight read-your-writes
    // requirement).
    let mut updates = Vec::new();
    let mut result = LoadResult::default();
    let snapshot = db.snapshot_for_branch(branch).await?;
-    // Capture per-table manifest versions before any write so the publisher
+    let use_staging = !matches!(mode, LoadMode::Overwrite);
-    // CAS at commit-time can detect concurrent writers landing between our
+    let mut staging = MutationStaging::default();
-    // read snapshot and our publish.
+    let mut overwrite_updates: Vec<crate::db::SubTableUpdate> = Vec::new();
-    let mut expected_table_versions: HashMap<String, u64> = HashMap::new();
+    let mut overwrite_expected: HashMap<String, u64> = HashMap::new();
    let pending_mode = match mode {
        LoadMode::Merge => PendingMode::Merge,
        // Append-mode loads accumulate as Append. Edge tables (no @key)
        // and no-key node tables stay safe on the stage_append path. The
        // Merge mode applies dedupe-by-id; Append assumes unique inputs.
        LoadMode::Append => PendingMode::Append,
        LoadMode::Overwrite => PendingMode::Append, // unused
    };
    // Phase 2a: build and validate every node batch up front. Cheap and
    // synchronous — surfaces validation errors before any S3 traffic.
@ -338,46 +346,89 @@ async fn load_jsonl_reader<R: BufRead>(
        let entry = snapshot
            .entry(&table_key)
            .ok_or_else(|| OmniError::manifest(format!("no manifest entry for {}", table_key)))?;
-        expected_table_versions.insert(table_key.clone(), entry.table_version);
+        if !use_staging {
            overwrite_expected.insert(table_key.clone(), entry.table_version);
        }
        prepared_nodes.push((type_name.clone(), table_key, batch, loaded_count));
    }
-    // Phase 2b: write every node type concurrently, bounded.
+    // Phase 2b: write every node type. Append/Merge → in-memory
-    let node_write_results = write_batches_concurrently(db, branch, mode, prepared_nodes).await?;
+    // accumulator. Overwrite → concurrent inline-commit (legacy path).
-
+    if use_staging {
-    for (type_name, table_key, loaded_count, state, table_branch) in node_write_results {
+        for (type_name, table_key, batch, loaded_count) in prepared_nodes {
-        updates.push(crate::db::SubTableUpdate {
+            let (ds, full_path, table_branch) = db
-            table_key,
+                .open_for_mutation_on_branch(branch, &table_key)
-            table_version: state.version,
+                .await?;
-            table_branch,
+            let expected_version = ds.version().version;
-            row_count: state.row_count,
+            staging.ensure_path(
-            version_metadata: state.version_metadata,
+                &table_key,
-        });
+                full_path,
-        result.nodes_loaded.insert(type_name, loaded_count);
+                table_branch,
                expected_version,
            );
            let schema = batch.schema();
            staging.append_batch(&table_key, schema, pending_mode, batch)?;
            result.nodes_loaded.insert(type_name, loaded_count);
        }
    } else {
        let node_write_results =
            write_batches_concurrently(db, branch, mode, prepared_nodes).await?;
        for (type_name, table_key, loaded_count, state, table_branch) in node_write_results {
            overwrite_updates.push(crate::db::SubTableUpdate {
                table_key,
                table_version: state.version,
                table_branch,
                row_count: state.row_count,
                version_metadata: state.version_metadata,
            });
            result.nodes_loaded.insert(type_name, loaded_count);
        }
    }
-    // Phase 2b: Validate edge referential integrity — every src/dst must
+    // Phase 2c: Validate edge referential integrity — every src/dst must
-    // reference an existing node ID in the appropriate type.
+    // reference an existing node ID in the appropriate type. For staged
    // loads, the lookup unions snapshot-committed IDs with the in-memory
    // pending batches (which carry the just-staged node inserts).
    for (edge_name, rows) in &edge_rows {
        let edge_type = &catalog.edge_types[edge_name];
-        let from_ids = collect_node_ids(
+        let from_ids = if use_staging {
-            db,
+            collect_node_ids_with_pending(
-            branch,
+                db,
-            &edge_type.from_type,
+                branch,
-            &node_rows,
+                &edge_type.from_type,
-            &catalog,
+                &staging,
-            &updates,
+            )
-        )
+            .await?
-        .await?;
+        } else {
-        let to_ids = collect_node_ids(
+            collect_node_ids(
-            db,
+                db,
-            branch,
+                branch,
-            &edge_type.to_type,
+                &edge_type.from_type,
-            &node_rows,
+                &node_rows,
-            &catalog,
+                &catalog,
-            &updates,
+                &overwrite_updates,
-        )
+            )
-        .await?;
+            .await?
        };
        let to_ids = if use_staging {
            collect_node_ids_with_pending(
                db,
                branch,
                &edge_type.to_type,
                &staging,
            )
            .await?
        } else {
            collect_node_ids(
                db,
                branch,
                &edge_type.to_type,
                &node_rows,
                &catalog,
                &overwrite_updates,
            )
            .await?
        };
        for (i, (src, dst, _)) in rows.iter().enumerate() {
            if !from_ids.contains(src.as_str()) {
@ -401,7 +452,7 @@ async fn load_jsonl_reader<R: BufRead>(
        }
    }
-    // Write edges (parallel per edge type, same pattern as nodes)
+    // Phase 2d: build edge batches.
    let mut prepared_edges: Vec<(String, String, RecordBatch, usize)> =
        Vec::with_capacity(edge_rows.len());
    for (edge_name, rows) in &edge_rows {
@ -417,29 +468,62 @@ async fn load_jsonl_reader<R: BufRead>(
        let entry = snapshot
            .entry(&table_key)
            .ok_or_else(|| OmniError::manifest(format!("no manifest entry for {}", table_key)))?;
-        expected_table_versions.insert(table_key.clone(), entry.table_version);
+        if !use_staging {
            overwrite_expected.insert(table_key.clone(), entry.table_version);
        }
        prepared_edges.push((edge_name.clone(), table_key, batch, loaded_count));
    }
-    let edge_write_results = write_batches_concurrently(db, branch, mode, prepared_edges).await?;
+    // Phase 2e: write every edge type. Same dispatch as Phase 2b.
-
+    if use_staging {
-    for (edge_name, table_key, loaded_count, state, table_branch) in edge_write_results {
+        for (edge_name, table_key, batch, loaded_count) in prepared_edges {
-        updates.push(crate::db::SubTableUpdate {
+            let (ds, full_path, table_branch) = db
-            table_key,
+                .open_for_mutation_on_branch(branch, &table_key)
-            table_version: state.version,
+                .await?;
-            table_branch,
+            let expected_version = ds.version().version;
-            row_count: state.row_count,
+            staging.ensure_path(
-            version_metadata: state.version_metadata,
+                &table_key,
-        });
+                full_path,
-        result.edges_loaded.insert(edge_name, loaded_count);
+                table_branch,
                expected_version,
            );
            let schema = batch.schema();
            staging.append_batch(&table_key, schema, pending_mode, batch)?;
            result.edges_loaded.insert(edge_name, loaded_count);
        }
    } else {
        let edge_write_results =
            write_batches_concurrently(db, branch, mode, prepared_edges).await?;
        for (edge_name, table_key, loaded_count, state, table_branch) in edge_write_results {
            overwrite_updates.push(crate::db::SubTableUpdate {
                table_key,
                table_version: state.version,
                table_branch,
                row_count: state.row_count,
                version_metadata: state.version_metadata,
            });
            result.edges_loaded.insert(edge_name, loaded_count);
        }
    }
-    // Phase 3: Validate edge cardinality constraints (before commit — invalid
+    // Phase 3: Validate edge cardinality constraints (before commit —
-    // data must not be committed). Opens edge sub-tables at their just-written
+    // invalid data must not be committed). Staged path scans committed
-    // versions, not through the snapshot (which still pins to pre-write state).
+    // edges via Lance + iterates pending edges in-memory. Overwrite path
    // opens the just-written version (legacy behavior).
    for (edge_name, _) in &edge_rows {
        let edge_type = &catalog.edge_types[edge_name];
        let table_key = format!("edge:{}", edge_name);
-        if let Some(update) = updates.iter().find(|u| u.table_key == table_key) {
+        if use_staging {
            validate_edge_cardinality_with_pending_loader(
                db,
                branch,
                edge_type,
                &table_key,
                &staging,
                mode,
            )
            .await?;
        } else if let Some(update) = overwrite_updates.iter().find(|u| u.table_key == table_key) {
            validate_edge_cardinality(
                db,
                branch,
@ -452,8 +536,18 @@ async fn load_jsonl_reader<R: BufRead>(
    }
    // Phase 4: Atomic manifest commit with publisher-level OCC.
-    db.commit_updates_on_branch_with_expected(branch, &updates, &expected_table_versions)
+    if use_staging {
        let (updates, expected_versions) = staging.finalize(db, branch).await?;
        db.commit_updates_on_branch_with_expected(branch, &updates, &expected_versions)
            .await?;
    } else {
        db.commit_updates_on_branch_with_expected(
            branch,
            &overwrite_updates,
            &overwrite_expected,
        )
        .await?;
    }
    Ok(result)
 }
@ -1456,6 +1550,123 @@ pub(crate) async fn validate_edge_cardinality(
    Ok(())
 }
 /// Validate edge `@card` cardinality with in-memory pending edges visible.
 ///
 /// Loader-level analog to `exec::mutation::validate_edge_cardinality_with_pending`:
 /// opens the committed dataset at the pre-load snapshot version, then
 /// delegates to the shared `count_src_per_edge` + `enforce_cardinality_bounds`
 /// helpers in `exec::staging`. Used by Append/Merge loads (the Overwrite
 /// path uses `validate_edge_cardinality` which opens the just-written
 /// Lance version).
 ///
 /// `mode` controls dedup behavior. `LoadMode::Merge` passes `Some("id")`
 /// so committed edges that the load is *updating* (same edge id,
 /// possibly changed `src`) are not double-counted. `LoadMode::Append`
 /// passes `None` because each line generates a fresh ULID id that
 /// never collides with committed.
 async fn validate_edge_cardinality_with_pending_loader(
    db: &Omnigraph,
    branch: Option<&str>,
    edge_type: &omnigraph_compiler::catalog::EdgeType,
    table_key: &str,
    staging: &MutationStaging,
    mode: LoadMode,
 ) -> Result<()> {
    if edge_type.cardinality.is_default() {
        return Ok(());
    }
    let snapshot = db.snapshot_for_branch(branch).await?;
    let Some(entry) = snapshot.entry(table_key) else {
        // No manifest entry — table doesn't exist yet. Pending-only is
        // fine; the helper handles empty committed scans.
        return Ok(());
    };
    let ds = db
        .open_dataset_at_state(
            &entry.table_path,
            entry.table_branch.as_deref(),
            entry.table_version,
        )
        .await?;
    let dedupe_key = match mode {
        LoadMode::Merge => Some("id"),
        LoadMode::Append | LoadMode::Overwrite => None,
    };
    let counts =
        crate::exec::staging::count_src_per_edge(db, &ds, table_key, staging, dedupe_key)
            .await?;
    crate::exec::staging::enforce_cardinality_bounds(edge_type, &counts)
 }
 /// Collect all valid node IDs for a given type, with in-memory pending
 /// node inserts visible. Used by the staged loader's Phase 2c
 /// referential-integrity validation.
 ///
 /// Union of:
 /// - IDs from the staged loader's pending batches (in-memory; just-staged
 ///   inserts of this type)
 /// - IDs from the committed sub-table at the pre-load snapshot version
 async fn collect_node_ids_with_pending(
    db: &Omnigraph,
    branch: Option<&str>,
    type_name: &str,
    staging: &MutationStaging,
 ) -> Result<HashSet<String>> {
    let mut ids = HashSet::new();
    let table_key = format!("node:{}", type_name);
    // From staging.pending: walk the in-memory accumulator's id column.
    for batch in staging.pending_batches(&table_key) {
        if let Some(col) = batch.column_by_name("id") {
            if let Some(arr) = col.as_any().downcast_ref::<StringArray>() {
                for i in 0..arr.len() {
                    if arr.is_valid(i) {
                        ids.insert(arr.value(i).to_string());
                    }
                }
            }
        }
    }
    // From the committed Lance sub-table at the pre-load snapshot version.
    let snapshot = db.snapshot_for_branch(branch).await?;
    let Some(entry) = snapshot.entry(&table_key) else {
        return Ok(ids);
    };
    let ds = db
        .open_dataset_at_state(
            &entry.table_path,
            entry.table_branch.as_deref(),
            entry.table_version,
        )
        .await?;
    let batches = db
        .table_store()
        .scan(&ds, Some(&["id"]), None, None)
        .await?;
    for batch in &batches {
        let id_col = batch
            .column_by_name("id")
            .ok_or_else(|| OmniError::Lance("missing 'id' column".into()))?
            .as_any()
            .downcast_ref::<StringArray>()
            .ok_or_else(|| OmniError::Lance("'id' column is not Utf8".into()))?;
        for i in 0..batch.num_rows() {
            // Defensive: `id` is the @key column on every node type and
            // is non-nullable by schema, but a committed-row corruption
            // (or future schema change) could surface a NULL. Skip
            // rather than insert "" — pending-side does the same.
            if id_col.is_valid(i) {
                ids.insert(id_col.value(i).to_string());
            }
        }
    }
    Ok(ids)
 }
 /// Collect all valid node IDs for a given type. Union of:
 /// - IDs from the just-loaded batch (in memory, from node_rows)
 /// - IDs from the sub-table at the just-written version (if it was updated)
@ -1512,6 +1723,9 @@ async fn collect_node_ids(
            .downcast_ref::<StringArray>()
            .unwrap();
        for i in 0..batch.num_rows() {
            if !id_col.is_valid(i) {
                continue;
            }
            ids.insert(id_col.value(i).to_string());
        }
    }
--- a/crates/omnigraph/src/table_store.rs
+++ b/crates/omnigraph/src/table_store.rs
@ -40,12 +40,12 @@ pub struct DeleteState {
 /// A Lance write that has produced fragment files on object storage but is
 /// not yet committed to the dataset's manifest. The staged-write primitives
-/// are defined here for later integration in `MutationStaging`
+/// are consumed by `MutationStaging` (`exec/staging.rs`,
-/// (`exec/mutation.rs`) and the loader (`loader/mod.rs`) — those rewires
+/// `exec/mutation.rs`) and the bulk loader (`loader/mod.rs`). The
-/// land in [MR-794](https://linear.app/modernrelay/issue/MR-794) step 2+.
+/// intent: defer Lance commits to end-of-query so a mid-query failure
-/// The intent: defer Lance commits to end-of-query so a mid-query failure
+/// leaves the touched table at the pre-mutation HEAD instead of
-/// leaves the touched table at the pre-mutation HEAD instead of drifting
+/// drifting ahead. See `docs/runs.md` for the publisher-CAS contract
-/// ahead.
+/// this builds on.
 ///
 /// `transaction` is opaque from our side — Lance owns its semantics. We
 /// commit it via `CommitBuilder::execute(transaction)` (see
@ -545,7 +545,7 @@ impl TableStore {
        })
    }
-    // ─── Staged-write API (MR-794) ───────────────────────────────────────────
+    // ─── Staged-write API ────────────────────────────────────────────────────
    //
    // These primitives wrap Lance's distributed-write API: each call writes
    // fragment files to object storage but does NOT advance the dataset's
@ -672,16 +672,16 @@ impl TableStore {
    ///
    /// This is intrinsic to the underlying Lance API: there is no public
    /// way to make `MergeInsertBuilder` see uncommitted fragments. The
-    /// engine's mutation path enforces the rule "per touched table: all
+    /// engine's `MutationStaging` accumulator works around this by
-    /// stage_append OR exactly one stage_merge_insert" at parse time
+    /// concatenating per-table batches in memory and issuing exactly
-    /// (the D₂′ check landing with [MR-794](https://linear.app/modernrelay/issue/MR-794)
+    /// one `stage_merge_insert` per touched table at end-of-query (with
-    /// step 2+ in `exec/mutation.rs`). Multi-table queries and append-chains
+    /// last-write-wins dedupe by id) — see `exec/staging.rs`. Direct
-    /// remain safe; only chained merges on a single table are rejected.
+    /// callers of this primitive must respect the contract themselves.
    ///
    /// Lift path: either a Lance API extension that lets
    /// `MergeInsertBuilder` accept additional staged fragments, or an
    /// in-memory pre-merge here that folds prior staged batches into the
-    /// input stream. See `docs/runs.md` and MR-793.
+    /// input stream. See `docs/runs.md`.
    pub async fn stage_merge_insert(
        &self,
        ds: Dataset,
@ -780,10 +780,10 @@ impl TableStore {
    /// filtered scan even when their data would match. Staged-fragment
    /// rows are silently absent from the result. `scanner.use_stats(false)`
    /// does not fix this in lance 4.0.0. Callers needing correct filtered
-    /// reads against staged data should use a different strategy (the
+    /// reads against staged data should use a different strategy — the
-    /// engine's MR-794 step 2+ design uses in-memory pending-batch
+    /// engine's `MutationStaging` accumulator unions in-memory pending
-    /// accumulation + DataFusion `MemTable` instead — see
+    /// batches with the committed scan via DataFusion `MemTable` (see
-    /// `.context/mr-794-step2-design.md`).
+    /// `scan_with_pending`).
    ///
    /// This method remains on the surface for primitive-level testing
    /// (basic stage + scan correctness without filters works) and for
@ -821,6 +821,106 @@ impl TableStore {
            .map_err(|e| OmniError::Lance(e.to_string()))
    }
    /// Scan committed via Lance + apply the same filter to in-memory
    /// pending batches via DataFusion `MemTable`, concat the two result
    /// streams. The replacement for `scan_with_staged` in engine code:
    /// the staged-write writer accumulates input batches in memory and
    /// unions them with the committed snapshot at read time,
    /// sidestepping the `Scanner::with_fragments` filter-pushdown
    /// limitation documented on `scan_with_staged`.
    ///
    /// `committed_ds` should be opened at the pre-mutation
    /// `expected_version` (the same version captured in `MutationStaging::expected_versions`
    /// at first touch of the table). `pending_batches` are the per-table
    /// accumulator's batches in their input shape. `pending_schema` is
    /// the schema of the accumulated batches; passing `None` falls back
    /// to the schema of the first pending batch.
    ///
    /// `filter` is the Lance / DataFusion SQL predicate. It is applied
    /// to both sides — Lance pushes it down on the committed side; the
    /// pending side runs it through a fresh DataFusion `SessionContext`
    /// with the batches registered as a `MemTable` named `pending`.
    ///
    /// `key_column` controls how committed and pending are unioned:
    /// - **`None` (union semantics)**: every committed row that matches
    ///   the filter and every pending row that matches the filter is
    ///   returned. Correct when committed and pending cannot share a
    ///   primary key — e.g., Append-mode loads with ULID-generated ids,
    ///   or any read where pending hasn't been used to update committed
    ///   rows.
    /// - **`Some(col)` (merge / shadow semantics)**: committed rows whose
    ///   `col` value appears in any pending batch are EXCLUDED from the
    ///   result; only pending's view of those rows is returned. Required
    ///   for Merge-mode reads (e.g., `execute_update` on the engine path)
    ///   so a chained `update` doesn't see stale committed values that
    ///   a prior op already updated in pending. Without this, a predicate
    ///   like `where age > 30` can match a row that an earlier
    ///   `set age = 20` already moved out of range.
    ///
    /// When `pending_batches` is empty this delegates to the regular
    /// scan path.
    pub async fn scan_with_pending(
        &self,
        committed_ds: &Dataset,
        pending_batches: &[RecordBatch],
        pending_schema: Option<SchemaRef>,
        projection: Option<&[&str]>,
        filter: Option<&str>,
        key_column: Option<&str>,
    ) -> Result<Vec<RecordBatch>> {
        // Contract: when merge-shadow semantics are requested via
        // `key_column`, the committed-side projection MUST include that
        // column so we can filter committed rows whose key appears in
        // pending. Silently dropping the shadow when projection omits
        // the key would re-introduce union semantics behind the
        // caller's back. Reject up front with a clear error so callers
        // either (a) include the key in projection or (b) drop
        // `key_column` if union is what they wanted.
        if let (Some(key_col), Some(cols)) = (key_column, projection) {
            if !cols.iter().any(|c| *c == key_col) {
                return Err(OmniError::Lance(format!(
                    "scan_with_pending: key_column '{}' must appear in projection \
                     when merge-shadow semantics are requested (got projection = {:?})",
                    key_col, cols
                )));
            }
        }
        let committed = self.scan(committed_ds, projection, filter, None).await?;
        if pending_batches.is_empty() {
            return Ok(committed);
        }
        // Shadow committed rows whose key value also appears in pending.
        // This makes scan_with_pending implement merge semantics rather
        // than naive union: any row that has a pending update is
        // represented ONLY by its pending value, never by both its
        // (stale) committed value and its (current) pending value.
        let committed = match key_column {
            Some(key_col) => {
                let pending_keys = collect_string_column_values(pending_batches, key_col)?;
                if pending_keys.is_empty() {
                    committed
                } else {
                    filter_out_rows_where_string_in(committed, key_col, &pending_keys)?
                }
            }
            None => committed,
        };
        let pending = scan_pending_batches(
            pending_batches,
            pending_schema,
            projection,
            filter,
        )
        .await?;
        let mut out = committed;
        out.extend(pending);
        Ok(out)
    }
    /// `count_rows` variant that respects staged writes. Used for
    /// edge-cardinality validation that needs to see staged edges before
    /// commit. Same `committed - removed + new` composition as
@ -1068,6 +1168,139 @@ fn assign_row_id_meta(fragments: &mut [Fragment], start_row_id: u64) -> Result<(
    Ok(())
 }
 /// Collect the set of values in a Utf8 column across multiple batches.
 /// Used by `scan_with_pending`'s merge-semantic path to identify
 /// committed rows that are shadowed by pending writes. NULL values are
 /// skipped.
 fn collect_string_column_values(
    batches: &[RecordBatch],
    column: &str,
 ) -> Result<std::collections::HashSet<String>> {
    use arrow_array::{Array, StringArray};
    let mut out = std::collections::HashSet::new();
    for batch in batches {
        let Some(col) = batch.column_by_name(column) else {
            return Err(OmniError::Lance(format!(
                "scan_with_pending: pending batch missing key column '{}'",
                column
            )));
        };
        let arr = col.as_any().downcast_ref::<StringArray>().ok_or_else(|| {
            OmniError::Lance(format!(
                "scan_with_pending: key column '{}' is not Utf8",
                column
            ))
        })?;
        for i in 0..arr.len() {
            if arr.is_valid(i) {
                out.insert(arr.value(i).to_string());
            }
        }
    }
    Ok(out)
 }
 /// Drop rows from `batches` whose Utf8 `column` value is in `excluded`.
 /// Used by `scan_with_pending`'s merge-semantic path to shadow committed
 /// rows that pending has already updated. Returns the surviving rows.
 ///
 /// `scan_with_pending` validates up front that the projection contains
 /// `column`, so a missing column here is a programmer error — error
 /// loudly instead of silently passing batches through (which would
 /// re-introduce the union semantics the caller asked us to avoid).
 fn filter_out_rows_where_string_in(
    batches: Vec<RecordBatch>,
    column: &str,
    excluded: &std::collections::HashSet<String>,
 ) -> Result<Vec<RecordBatch>> {
    use arrow_array::{Array, BooleanArray, StringArray};
    let mut out = Vec::with_capacity(batches.len());
    for batch in batches {
        if batch.num_rows() == 0 {
            out.push(batch);
            continue;
        }
        let col = batch.column_by_name(column).ok_or_else(|| {
            OmniError::manifest_internal(format!(
                "scan_with_pending: committed batch missing key column '{}' \
                 (the up-front projection check should have rejected this)",
                column
            ))
        })?;
        let arr = col.as_any().downcast_ref::<StringArray>().ok_or_else(|| {
            OmniError::Lance(format!(
                "scan_with_pending: committed column '{}' is not Utf8",
                column
            ))
        })?;
        let mask: BooleanArray = (0..arr.len())
            .map(|i| {
                if arr.is_valid(i) {
                    Some(!excluded.contains(arr.value(i)))
                } else {
                    Some(true)
                }
            })
            .collect();
        let filtered = arrow_select::filter::filter_record_batch(&batch, &mask)
            .map_err(|e| OmniError::Lance(e.to_string()))?;
        out.push(filtered);
    }
    Ok(out)
 }
 /// Apply `projection` and `filter` to in-memory pending batches via a
 /// fresh DataFusion `SessionContext`. Used by `scan_with_pending` for
 /// the read-your-writes side of the in-memory staging accumulator.
 ///
 /// `pending_batches` must be non-empty (the caller short-circuits on
 /// empty).
 ///
 /// **SQL dialect contract.** `filter` is also passed to Lance's scanner
 /// on the committed side. Lance and DataFusion both accept standard
 /// SQL comparison predicates (`col op literal`) and OmniGraph's
 /// `predicate_to_sql` only emits those shapes today (`=`, `!=`, `>`,
 /// `<`, `>=`, `<=`). If a future caller introduces a Lance-specific
 /// scanner extension (vector search, FTS, `_rowid` references) into
 /// the filter, this function will need explicit translation — DataFusion
 /// won't recognize those operators against the in-memory `MemTable`.
 async fn scan_pending_batches(
    pending_batches: &[RecordBatch],
    pending_schema: Option<SchemaRef>,
    projection: Option<&[&str]>,
    filter: Option<&str>,
 ) -> Result<Vec<RecordBatch>> {
    let schema = pending_schema.unwrap_or_else(|| pending_batches[0].schema());
    let ctx = datafusion::execution::context::SessionContext::new();
    let mem = datafusion::datasource::MemTable::try_new(
        schema,
        vec![pending_batches.to_vec()],
    )
    .map_err(|e| OmniError::Lance(e.to_string()))?;
    ctx.register_table("pending", Arc::new(mem))
        .map_err(|e| OmniError::Lance(e.to_string()))?;
    let proj = projection
        .map(|cols| {
            cols.iter()
                .map(|c| format!("\"{}\"", c.replace('"', "\"\"")))
                .collect::<Vec<_>>()
                .join(", ")
        })
        .unwrap_or_else(|| "*".to_string());
    let where_clause = filter
        .map(|f| format!("WHERE {f}"))
        .unwrap_or_default();
    let sql = format!("SELECT {proj} FROM pending {where_clause}");
    let df = ctx
        .sql(&sql)
        .await
        .map_err(|e| OmniError::Lance(e.to_string()))?;
    df.collect()
        .await
        .map_err(|e| OmniError::Lance(e.to_string()))
 }
 fn combine_committed_with_staged(ds: &Dataset, staged: &[StagedWrite]) -> Vec<Fragment> {
    let removed: std::collections::HashSet<u64> = staged
        .iter()
--- a/crates/omnigraph/tests/consistency.rs
+++ b/crates/omnigraph/tests/consistency.rs
@ -522,11 +522,12 @@ query high_value() {
 #[tokio::test]
 async fn stale_handle_public_mutation_must_refresh_then_retry() {
-    // MR-771: with the Run state machine removed, the engine no longer
+    // With the Run state machine removed, the engine no longer
-    // auto-rebases stale-handle mutations onto the latest target head. The
+    // auto-rebases stale-handle mutations onto the latest target head.
-    // publisher's `expected_table_versions` CAS makes the contract explicit
+    // The publisher's `expected_table_versions` CAS makes the contract
-    // — a stale writer fails loudly with `ExpectedVersionMismatch` and the
+    // explicit — a stale writer fails loudly with
-    // client decides whether to refresh-and-retry.
+    // `ExpectedVersionMismatch` and the client decides whether to
    // refresh-and-retry.
    let dir = tempfile::tempdir().unwrap();
    let _db = init_and_load(&dir).await;
    drop(_db);
--- a/crates/omnigraph/tests/failpoints.rs
+++ b/crates/omnigraph/tests/failpoints.rs
@ -140,6 +140,131 @@ async fn schema_apply_recovers_partial_rename() {
    assert_no_staging_files(dir.path());
 }
 /// Pin the documented "finalize → publisher residual" of the
 /// staged-write commit path.
 ///
 /// `MutationStaging::finalize` runs `commit_staged` per touched table
 /// sequentially before the publisher commits the manifest. Lance has no
 /// multi-dataset atomic commit primitive, so a failure between the
 /// per-table staged commits and the manifest commit leaves Lance HEAD
 /// advanced on the touched tables with no manifest update — and the
 /// next mutation surfaces `ExpectedVersionMismatch` on those tables.
 ///
 /// This isn't a code bug we can fix without an upstream Lance change;
 /// it's the documented residual (see `docs/runs.md` "Finalize →
 /// publisher residual"). The test pins the behavior so future code
 /// changes catch any silent regression: if someone widens the residual
 /// (e.g. failing earlier in finalize without rolling back), this test
 /// will surface a different error than `ExpectedVersionMismatch`. If
 /// someone narrows the residual (e.g. lance ships multi-dataset commit
 /// and we plumb it), this test will start passing the next mutation
 /// — and someone has to update the assertion + the docs.
 #[tokio::test]
 async fn finalize_publisher_residual_drifts_lance_head_until_next_writer_recovers() {
    use omnigraph::error::{ManifestConflictDetails, OmniError};
    let _scenario = FailScenario::setup();
    let dir = tempfile::tempdir().unwrap();
    let mut db = Omnigraph::init(dir.path().to_str().unwrap(), helpers::TEST_SCHEMA)
        .await
        .unwrap();
    {
        let _failpoint =
            ScopedFailPoint::new("mutation.post_finalize_pre_publisher", "return");
        // First mutation: finalize succeeds (commit_staged advances Lance
        // HEAD on node:Person), then the failpoint kicks before the
        // publisher's manifest commit. The caller sees the synthetic
        // error.
        let err = mutate_main(
            &mut db,
            MUTATION_QUERIES,
            "insert_person",
            &mixed_params(&[("$name", "Eve")], &[("$age", 22)]),
        )
        .await
        .unwrap_err();
        assert!(
            err.to_string().contains(
                "injected failpoint triggered: mutation.post_finalize_pre_publisher"
            ),
            "unexpected error: {err}"
        );
    }
    // Failpoint dropped — subsequent calls are not synthetic-failed.
    // Next mutation against the same table surfaces the documented
    // residual: Lance HEAD on node:Person advanced (commit_staged ran),
    // manifest didn't, so the publisher CAS at next-mutation time
    // surfaces ExpectedVersionMismatch.
    let err = mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "insert_person",
        &mixed_params(&[("$name", "Frank")], &[("$age", 33)]),
    )
    .await
    .unwrap_err();
    let OmniError::Manifest(manifest_err) = err else {
        panic!("expected Manifest error, got {err:?}");
    };
    let Some(ManifestConflictDetails::ExpectedVersionMismatch {
        ref table_key,
        expected,
        actual,
    }) = manifest_err.details
    else {
        panic!(
            "expected ExpectedVersionMismatch (the documented residual), got {:?}",
            manifest_err.details
        );
    };
    assert_eq!(
        table_key, "node:Person",
        "drift should be on the table the failed finalize touched"
    );
    assert!(
        actual > expected,
        "Lance HEAD on the drifted table should be ahead of manifest pinned: actual={actual} expected={expected}",
    );
 }
 /// Companion to the above — confirms that a finalize→publisher failure
 /// on one table leaves OTHER tables untouched. Subsequent writes to
 /// non-drifted tables proceed normally; the drift is contained.
 #[tokio::test]
 async fn finalize_publisher_residual_does_not_drift_untouched_tables() {
    let _scenario = FailScenario::setup();
    let dir = tempfile::tempdir().unwrap();
    let mut db = Omnigraph::init(dir.path().to_str().unwrap(), helpers::TEST_SCHEMA)
        .await
        .unwrap();
    {
        let _failpoint =
            ScopedFailPoint::new("mutation.post_finalize_pre_publisher", "return");
        let _ = mutate_main(
            &mut db,
            MUTATION_QUERIES,
            "insert_person",
            &mixed_params(&[("$name", "Eve")], &[("$age", 22)]),
        )
        .await
        .expect_err("synthetic failpoint must fire");
    }
    // node:Person drifted. node:Company didn't — try a Company write.
    use omnigraph::loader::{LoadMode, load_jsonl};
    load_jsonl(
        &mut db,
        r#"{"type": "Company", "data": {"name": "Acme"}}"#,
        LoadMode::Append,
    )
    .await
    .expect("Company write on a non-drifted table should succeed");
 }
 fn assert_no_staging_files(repo: &std::path::Path) {
    for name in [
        "_schema.pg.staging",
--- a/crates/omnigraph/tests/runs.rs
+++ b/crates/omnigraph/tests/runs.rs
--- a/crates/omnigraph/tests/s3_storage.rs
+++ b/crates/omnigraph/tests/s3_storage.rs
@ -44,7 +44,7 @@ async fn s3_compatible_repo_lifecycle_works() {
        .await
        .unwrap();
-    // Direct-to-target load (MR-771): no run lifecycle, single publisher
+    // Direct-to-target load: no run lifecycle, single publisher
    // commit lands the row.
    reopened
        .load(
@ -153,7 +153,7 @@ async fn s3_public_load_uses_hidden_run_and_publishes() {
    .await
    .unwrap();
-    // Direct-to-target writes (MR-771): no run state machine, just the
+    // Direct-to-target writes: no run state machine, just the
    // published commit lands the row. Verify by reopening and reading.
    let mut reopened = Omnigraph::open(&uri).await.unwrap();
    let loaded = query_main(
--- a/crates/omnigraph/tests/staged_writes.rs
+++ b/crates/omnigraph/tests/staged_writes.rs
@ -1,20 +1,21 @@
-//! Primitive-level tests for `TableStore`'s staged-write API
+//! Primitive-level tests for `TableStore`'s staged-write API. These
-//! (MR-794 step 1). These exercise `stage_append`, `stage_merge_insert`,
+//! exercise `stage_append`, `stage_merge_insert`, `scan_with_staged`,
-//! `scan_with_staged`, and `count_rows_with_staged` directly against a
+//! and `count_rows_with_staged` directly against a Lance dataset — no
-//! Lance dataset — no Omnigraph engine involved. The engine-level rewire
+//! Omnigraph engine involved. The engine-level use of these primitives
-//! (MR-794 step 2+) lives in `tests/runs.rs` once it lands.
+//! is exercised by `tests/runs.rs`.
 //!
 //! Test surface here:
 //! 1. `stage_append` + `scan_with_staged` shows committed + staged data
 //!    without duplicates.
 //! 2. `stage_merge_insert` of a row that supersedes a committed fragment
-//!    surfaces only the rewritten row, not both — the
+//!    surfaces only the rewritten row, not both, via the
-//!    `removed_fragment_ids` dedup landed in PR #66's `730631c`.
+//!    `removed_fragment_ids` dedup contract.
-//! 3. **Documented contract**: chained `stage_merge_insert` calls on the
+//! 3. **Documented contract**: chained `stage_merge_insert` calls on
-//!    same dataset whose source rows share keys produce duplicate rows in
+//!    the same dataset whose source rows share keys produce duplicate
-//!    `scan_with_staged`. The engine's parse-time D₂′ check (MR-794 step
+//!    rows in `scan_with_staged`. The engine's accumulator dedupes by
-//!    2+) prevents callers from triggering this; this test pins the
+//!    id at finalize time so this primitive-level pitfall doesn't
-//!    primitive's behavior so a future change either (a) preserves it or
+//!    surface in production paths; this test pins the primitive's
 //!    behavior so a future change either (a) preserves it or
 //!    (b) consciously fixes it (and updates this test).
 use arrow_array::{Array, Int32Array, RecordBatch, StringArray, UInt64Array};
@ -120,8 +121,9 @@ async fn stage_merge_insert_dedupes_superseded_committed_fragment() {
    assert!(
        !staged.removed_fragment_ids.is_empty(),
        "merge_insert that rewrites a committed row must set removed_fragment_ids \
-         (this is the dedup invariant from PR #66 commit 730631c — its absence \
+         so the scan-with-staged composer can shadow the superseded committed \
-         was caught by Cubic/Cursor/Codex on PR #66)"
+         fragment — without it, the committed row and its rewrite both appear, \
         producing duplicates by key"
    );
    // scan_with_staged: alice appears exactly once, with the new age.
@ -347,11 +349,11 @@ async fn stage_merge_insert_then_commit_persists_merged_view() {
 /// silently absent. `scanner.use_stats(false)` does not bypass this in
 /// lance 4.0.0.
 ///
-/// This test pins the actual behavior so a future change either preserves
+/// This test pins the actual behavior so a future change either
-/// it (and updates the doc) or fixes it (and rewrites this test). The
+/// preserves it (and updates the doc) or fixes it (and rewrites this
-/// engine's MR-794 step 2+ design uses in-memory pending-batch
+/// test). The engine's `MutationStaging` accumulator unions in-memory
-/// accumulation + DataFusion `MemTable` for read-your-writes instead, so
+/// pending batches with the committed scan via DataFusion `MemTable`
-/// production code is unaffected.
+/// for read-your-writes instead, so production code is unaffected.
 #[tokio::test]
 async fn scan_with_staged_with_filter_silently_drops_staged_rows() {
    let dir = tempfile::tempdir().unwrap();
@ -485,6 +487,8 @@ async fn chained_stage_merge_insert_with_shared_key_documents_duplicate_behavior
         here because this assertion failed: either (a) the primitive was \
         improved to dedupe across stages (good — update to assert == 1) \
         or (b) something subtler broke (investigate before changing the \
-         assertion). See PR #67 Codex P1 thread + .context/mr-794-step2-design.md §3.1."
+         assertion). The engine's MutationStaging accumulator dedupes by \
         id at finalize time so this primitive-level pitfall doesn't \
         surface in production paths — see exec/staging.rs."
    );
 }
--- a/docs/architecture.md
+++ b/docs/architecture.md
@ -63,7 +63,7 @@ flowchart TB
    subgraph engine[omnigraph engine]
        plan[exec query and mutation]:::l2
        gi[graph index CSR/CSC<br/>RuntimeCache LRU 8]:::l2
-        coord[coordinator<br/>ManifestRepo · CommitGraph · RunRegistry]:::l2
+        coord[coordinator<br/>ManifestRepo · CommitGraph]:::l2
    end
    subgraph storage[storage trait — wraps Lance]
@ -134,7 +134,7 @@ flowchart TB
        coord[GraphCoordinator]:::l2
        mr[ManifestRepo<br/>db/manifest.rs]:::l2
        cg[CommitGraph<br/>_graph_commits.lance]:::l2
-        rr[RunRegistry<br/>_graph_runs.lance]:::l2
+        stg[MutationStaging<br/>per-query in-memory accumulator<br/>exec/staging.rs]:::l2
    end
    subgraph idx[graph index]
@ -149,17 +149,18 @@ flowchart TB
    eq --> gi
    eq --> ts
    em --> stg
    em --> ts
    ld --> stg
    ld --> ts
    eq --> mr
    em --> mr
    coord --> mr
    coord --> cg
    coord --> rr
    ts --> st
 ```
-The engine binds the compiler IR to Lance. It owns multi-dataset coordination, the graph topology index, the run registry, and the snapshot/manifest read path.
+The engine binds the compiler IR to Lance. It owns multi-dataset coordination, the graph topology index, the per-query staging accumulator, and the snapshot/manifest read path.
 Code paths:
@ -169,6 +170,46 @@ Code paths:
 - Graph index: `crates/omnigraph/src/graph_index/`
 - Loader: `Omnigraph::ingest` at `crates/omnigraph/src/loader/mod.rs:74`
 ### Mutation atomicity — in-memory accumulator (MR-794)
 Inserts and updates inside `mutate_as` and the bulk loader's
 Append/Merge modes go through `MutationStaging`
 ([`crates/omnigraph/src/exec/staging.rs`](../crates/omnigraph/src/exec/staging.rs)),
 a per-query in-memory accumulator. No Lance HEAD advance happens during
 op execution; one `stage_*` + `commit_staged` per touched table runs
 at end-of-query, then the publisher commits the manifest atomically.
 ```
 op-1 (insert/update) → push RecordBatch → MutationStaging.pending[table]
 op-2 (insert/update) → read committed via Lance + pending via DataFusion
                       MemTable (read-your-writes) → push batch
 op-N → push batch
 ─── end of query ───────────────────────────────────────
 finalize: per pending table:
   concat batches → stage_append OR stage_merge_insert → commit_staged
 publisher: ManifestBatchPublisher::publish (one cross-table CAS)
 ```
 A failed op leaves Lance HEAD untouched on the staged tables: the next
 mutation proceeds normally with no drift to reconcile. Concrete
 contracts:
 - `D₂` parse-time rule: a query is either insert/update-only or
  delete-only. Mixed → reject. Deletes still inline-commit (Lance
  4.0.0 has no public two-phase delete); D₂ keeps the inline path safe.
 - `LoadMode::Overwrite` keeps the inline-commit path
  (truncate-then-append doesn't fit the staged shape; overwrite has no
  in-flight read-your-writes requirement).
 - Read sites consume `TableStore::scan_with_pending`, which Lance-scans
  the committed snapshot at the captured `expected_version` and unions
  with a DataFusion `MemTable` over the pending batches.
 This pattern realizes [docs/invariants.md §VI.25](invariants.md)
 (read-your-writes within a multi-statement mutation) and §VI.32
 (failure scope bounded) for inserts/updates by construction at the
 writer layer. See [docs/runs.md](runs.md) for the publisher CAS
 contract this builds on.
 ### Storage trait — today vs. roadmap
 ```mermaid
@ -256,13 +297,13 @@ Throughout the docs, capabilities are split into:
 - **MVCC**: every Lance write bumps a per-dataset version; the OmniGraph manifest version coordinates which sub-table versions are visible together.
 - **Snapshot isolation**: a query holds one `Snapshot` for its lifetime; concurrent writes don't leak in.
 - **Cross-branch isolation**: copy-on-write means readers and writers on different branches don't block each other.
- **Run isolation**: each transactional run lives on its own `__run__<id>` branch.
+- **Per-query staging**: `mutate_as` and `load` (Append/Merge) accumulate insert/update batches in an in-memory `MutationStaging`; one `stage_*` + `commit_staged` per touched table runs at end-of-query, then the publisher commits the manifest atomically. A mid-query failure leaves Lance HEAD untouched on staged tables. (MR-794; pre-v0.4.0 used a `__run__<id>` staging branch + Run state machine, removed in MR-771.)
 - **Schema-apply lock**: `__schema_apply_lock__` system branch serializes schema migrations.
 - **Fail-points** (`failpoints` cargo feature): `failpoints::maybe_fail("operation.step")?` in `branch_create`, publish, etc., for deterministic failure injection in tests.
 ## Workspace crates
 - `omnigraph-compiler` — schema and query grammars, catalog, IR, lowering, type checker, lint, migration planner, OpenAI-style embedding client.
- `omnigraph` (engine, published as `omnigraph-engine` on crates.io since v0.2.2) — the Lance-backed runtime: manifest, commit graph, run registry, snapshot, exec, merge, loader, Gemini embedding client.
+- `omnigraph` (engine, published as `omnigraph-engine` on crates.io since v0.2.2) — the Lance-backed runtime: manifest, commit graph, snapshot, exec (incl. per-query `MutationStaging` accumulator), merge, loader, Gemini embedding client.
 - `omnigraph-cli` — the `omnigraph` binary.
 - `omnigraph-server` — the `omnigraph-server` binary (Axum HTTP server).
--- a/docs/audit.md
+++ b/docs/audit.md
@ -1,6 +1,7 @@
 # Audit / Actor tracking
 - `Omnigraph::audit_actor_id: Option<String>` is the actor in effect.
- `_as` variants of every write API let callers override the actor: `begin_run_as`, `publish_run_as`, `ingest_as`, `mutate_as`, `branch_merge_as`, etc.
+- `_as` variants of every write API let callers override the actor: `mutate_as`, `ingest_as`, `branch_merge_as`, `apply_schema_as`, etc.
- Actor IDs are persisted both on `RunRecord.actor_id` and on `GraphCommit.actor_id`, with optional split storage in `_graph_commit_actors.lance` and `_graph_run_actors.lance`.
+- Actor IDs are persisted on `GraphCommit.actor_id` with split storage in `_graph_commit_actors.lance` (the commit graph is split into `_graph_commits.lance` for the linkage and `_graph_commit_actors.lance` for the actor map).
 - HTTP server uses the bearer-token actor automatically; CLI uses the local user / explicit env (no implicit actor).
 - Pre-v0.4.0 repos also stored actor IDs on `RunRecord.actor_id` in `_graph_runs.lance` / `_graph_run_actors.lance`. The Run state machine was removed in MR-771; those files are inert post-v0.4.0 and reclaimed by MR-770's production sweep.
--- a/docs/branches-commits.md
+++ b/docs/branches-commits.md
@ -37,7 +37,7 @@ Storage is split across two Lance datasets (both with stable row IDs):
 Notes:
- Every successful publish (load / change / merge / schema_apply / publish_run) appends one commit.
+- Every successful publish (load / change / merge / schema_apply) appends one commit.
 - Merge commits have two parents; linear commits have one.
 - API: `list_commits(branch)`, `get_commit(id)`, `head_commit_id_for_branch(branch)`.
@ -53,5 +53,5 @@ Notes:
 Filtered from `branch_list()` but visible to internals:
 - `__run__<run-id>` — ephemeral isolation branch for a transactional run.
 - `__schema_apply_lock__` — serializes schema migrations.
 - `__run__<run-id>` — legacy from the pre-v0.4.0 Run state machine (removed in MR-771). The branch-name guard predicate `is_internal_run_branch` is kept as defense-in-depth so users cannot create a branch matching the legacy prefix; the filter will be removed once production legacy branches are swept (MR-770).
--- a/docs/cli.md
+++ b/docs/cli.md
@ -56,12 +56,12 @@ omnigraph policy validate --config ./omnigraph.yaml
 omnigraph policy test --config ./omnigraph.yaml
 omnigraph policy explain --config ./omnigraph.yaml --actor act-alice --action read --branch main
-omnigraph run list ./repo.omni --json
+omnigraph commit list ./repo.omni --json
-omnigraph run show --uri ./repo.omni <run-id> --json
+omnigraph commit show --uri ./repo.omni <commit-id> --json
 omnigraph run publish --uri ./repo.omni <run-id> --json
 omnigraph run abort --uri ./repo.omni <run-id> --json
 ```
 (The legacy `omnigraph run list/show/publish/abort` subcommands were removed in MR-771; mutations and loads publish atomically and the commit graph (`omnigraph commit list`) is the audit surface.)
 `query lint` and `query check` are the same command surface. In v1, repo-backed
 lint uses local or `s3://` repo URIs; HTTP targets are only supported when you
 also pass `--schema`.
--- a/docs/constants.md
+++ b/docs/constants.md
@ -4,8 +4,8 @@
 |---|---|---|
 | `MANIFEST_DIR` | `__manifest` | `db/manifest/layout.rs` |
 | Commit graph dir | `_graph_commits.lance` | `db/commit_graph.rs` |
-| Run registry dir | `_graph_runs.lance` | `db/run_registry.rs` |
+| Run registry dir (legacy, removed MR-771) | `_graph_runs.lance` | inert post-v0.4.0; reclaimed by MR-770 |
-| Run branch prefix | `__run__` | `db/run_registry.rs` |
+| Run branch prefix (legacy, removed MR-771) | `__run__` | filtered by `is_internal_run_branch` defense-in-depth |
 | Schema apply lock | `__schema_apply_lock__` | `db/mod.rs` |
 | Manifest publisher retry budget | `PUBLISHER_RETRY_BUDGET = 5` | `db/manifest/publisher.rs` |
 | Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` | `db/manifest/migrations.rs` |
--- a/docs/errors.md
+++ b/docs/errors.md
@ -9,6 +9,7 @@
 - `Manifest(ManifestError { kind: BadRequest|NotFound|Conflict|Internal, details: Option<ManifestConflictDetails>, … })`
  - `ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected, actual }` — caller's `expected_table_versions` did not match the manifest's current latest non-tombstoned version (set by `OmniError::manifest_expected_version_mismatch`).
  - `ManifestConflictDetails::RowLevelCasContention` — Lance row-level CAS rejected the publish because a concurrent writer landed the same `object_id`. Retried internally by the publisher; only surfaces if the retry budget exhausts.
  - **D₂ parse-time rejection** (MR-794): a single mutation query that mixes inserts/updates with deletes errors out *before any I/O* with kind `BadRequest`. Message: `mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes`. See [docs/query-language.md](query-language.md) for the rule and [docs/runs.md](runs.md) for the underlying staged-write rationale.
 - `MergeConflicts(Vec<MergeConflict>)`
 Compiler-side `NanoError` covers parse / catalog / type / storage / plan / execution / arrow / lance / IO / manifest / unique-constraint, each with structured spans (`SourceSpan { start, end }`) for ariadne-style diagnostics.
--- a/docs/execution.md
+++ b/docs/execution.md
@ -79,13 +79,16 @@ Hybrid example: `order { rrf(nearest($d.embedding, $q), bm25($d.body, $q_text))
 ## Mutation execution (`exec/mutation.rs`)
-Resolves expression values to literals, converts to typed Arrow arrays (`literal_to_typed_array(lit, DataType, num_rows)`), then writes:
+Resolves expression values to literals, converts to typed Arrow arrays (`literal_to_typed_array(lit, DataType, num_rows)`), then writes via Lance's two-phase distributed-write API at end-of-query:
- `insert` → Lance `WriteMode::Append`
+- `insert` (no `@key`, edges) → accumulate into `MutationStaging.pending` (Append mode); finalize calls `stage_append` once per touched table.
- `update` → Lance `merge_insert(WhenMatched::Update)`
+- `insert` (`@key` node) → accumulate into `pending` (Merge mode); finalize calls `stage_merge_insert` once per touched table.
- `delete` → Lance `merge_insert(WhenMatched::Delete)` (logical) or filtered overwrite.
+- `update` → scan committed via Lance + pending via DataFusion `MemTable` (read-your-writes), apply assignments, accumulate into `pending` (Merge mode).
 - `delete` → still inline-commits via `delete_where` (Lance 4.0.0 has no public two-phase delete); recorded into `MutationStaging.inline_committed`.
-Multi-statement mutations are atomic at the manifest commit boundary.
+**D₂ parse-time rule.** A single mutation query is either insert/update-only or delete-only. Mixed → reject before any I/O. The check fires in `enforce_no_mixed_destructive_constructive(&ir)` inside `execute_named_mutation`.
 Multi-statement mutations are atomic at the publisher commit boundary: every insert/update batch lives in memory until end-of-query, then exactly one `stage_*` + `commit_staged` runs per touched table, then `ManifestBatchPublisher::publish` commits the manifest atomically with per-table `expected_table_versions` CAS.
 ### Mutation flow — sequence
@ -93,57 +96,58 @@ Multi-statement mutations are atomic at the manifest commit boundary.
 sequenceDiagram
    autonumber
    participant client as Client
-    participant og as Omnigraph::mutate<br/>(mutation.rs:511)
+    participant og as Omnigraph::mutate_as<br/>(mutation.rs)
    participant cmp as omnigraph-compiler
-    participant runs as RunRegistry
+    participant stg as MutationStaging<br/>(exec/staging.rs)
    participant ts as table_store
    participant lance as Lance dataset
-    participant mr as ManifestRepo<br/>(manifest.rs:280)
+    participant pub as ManifestBatchPublisher
-    client->>og: mutate(target, source, name, params)
+    client->>og: mutate_as(branch, source, name, params, actor_id)
-    og->>cmp: parse + typecheck_query
+    og->>cmp: parse + typecheck + lower_mutation_query
-    cmp-->>og: CheckedQuery (Mutation IR)
+    cmp-->>og: MutationIR
-    og->>runs: begin_run(target, op_hash)<br/>fork __run__<id> from target head
+    og->>og: enforce_no_mixed_destructive_constructive (D₂)
-    runs-->>og: RunRecord
+    loop for each mutation op
-    loop for each mutation statement (on __run__<id>)
+        og->>og: resolve literals + build batch
-        og->>og: resolve expression literals<br/>literal_to_typed_array(lit, type, n)
+        alt insert / update (accumulate)
-        alt insert
+            og->>ts: open dataset @ pre-write version (first touch)
-            og->>ts: append RecordBatches
+            og->>stg: ensure_path + append_batch (PendingMode)
-            ts->>lance: WriteMode::Append → new fragment(s)
+            opt update — scan committed + pending
-        else update
+                og->>ts: scan_with_pending (Lance + DataFusion MemTable union)
-            og->>ts: merge_insert keyed by id
+                ts-->>og: matched batches
-            ts->>lance: merge_insert(WhenMatched::Update)
+            end
-        else delete
+        else delete (inline-commit, D₂ keeps separate)
-            og->>ts: merge_insert with delete predicate
+            og->>ts: delete_where (advances Lance HEAD)
-            ts->>lance: merge_insert(WhenMatched::Delete)
+            og->>stg: record_inline (SubTableUpdate)
        end
        lance-->>ts: new dataset version
        og->>mr: commit_updates(SubTableUpdate)<br/>per-statement commit on __run__<id>
        mr-->>og: ack
    end
-    og->>og: OCC: target head unchanged since begin_run?
+    og->>stg: finalize(db, branch)
-    og->>og: publish_run(run_id)
+    loop per pending table
-    alt fast path (target hasn't moved)
+        stg->>ts: stage_append OR stage_merge_insert (one per table)
-        og->>mr: commit_updates_on_branch(target, updates)<br/>promote run snapshot
+        ts-->>stg: StagedWrite (transaction + fragments)
-    else merge path (target advanced)
+        stg->>ts: commit_staged (advances Lance HEAD)
-        og->>og: branch_merge_internal(__run__<id>, target)<br/>three-way merge
+        ts-->>stg: new Dataset
    end
-    mr-->>og: new target snapshot
+    stg-->>og: (updates: Vec<SubTableUpdate>, expected_versions)
-    og->>runs: terminate_run(Published)
+    og->>pub: commit_updates_on_branch_with_expected
    pub->>pub: publisher CAS (cross-table OCC on __manifest)
    pub-->>og: new manifest version
    og-->>client: MutationResult
 ```
 **Code paths:**
- Entry: `Omnigraph::mutate` at `crates/omnigraph/src/exec/mutation.rs:511`
+- Entry: `Omnigraph::mutate_as` at `crates/omnigraph/src/exec/mutation.rs`
- Per-mutation orchestration: `mutate_with_current_actor` at `crates/omnigraph/src/exec/mutation.rs:539`
+- Per-mutation orchestration: `mutate_with_current_actor` at `crates/omnigraph/src/exec/mutation.rs`
- Per-statement commit on the run-branch: `commit_updates` (called from `execute_insert` / `execute_update` / `execute_delete` in `crates/omnigraph/src/exec/mutation.rs`)
+- D₂ check: `enforce_no_mixed_destructive_constructive` (in the same file)
- Run publish: `Omnigraph::publish_run` at `crates/omnigraph/src/db/omnigraph.rs:858`
+- Per-op execution: `execute_insert`, `execute_update`, `execute_delete_node`, `execute_delete_edge`
- Manifest commit primitive: `ManifestRepo::commit` at `crates/omnigraph/src/db/manifest.rs:280` (called from both per-statement `commit_updates` and the publish path)
+- Pending-aware reads: `TableStore::scan_with_pending` / `count_rows_with_pending` at `crates/omnigraph/src/table_store.rs`
 - Edge cardinality with pending: `validate_edge_cardinality_with_pending` at `crates/omnigraph/src/exec/mutation.rs`
 - Per-query accumulator: `crates/omnigraph/src/exec/staging.rs` (`MutationStaging`, `PendingTable`, `PendingMode`, `finalize`)
 - End-of-query Lance commit: `TableStore::stage_append`, `stage_merge_insert`, `commit_staged` at `crates/omnigraph/src/table_store.rs`
 - Manifest commit primitive: `commit_updates_on_branch_with_expected` at `crates/omnigraph/src/db/omnigraph/table_ops.rs`
-Multi-statement mutations don't get atomicity from a single final `commit` — they get it from the **run-branch + publish_run** pattern. By default a mutation forks a fresh `__run__<id>` branch (`begin_run`); each statement individually commits its sub-table updates to that run-branch. After all statements complete, an OCC pre-check verifies the target hasn't moved since the run started, then `publish_run` atomically promotes the run-branch into the target — either via the fast path (direct promotion if the target hasn't moved) or a three-way merge. That final publish is what gives multi-statement mutations their atomicity guarantee (per [`docs/invariants.md`](invariants.md) §VI.26). If anything fails mid-run, the run is failed and the run-branch is dropped without affecting the target.
+Atomicity guarantee for multi-statement mutations: a mid-query failure leaves Lance HEAD untouched on staged tables (no inline commit happened during op execution), so the next mutation proceeds normally with no `ExpectedVersionMismatch`. The publisher CAS at the very end either succeeds (manifest advances atomically across all touched sub-tables) or fails with a typed `ManifestConflictDetails::ExpectedVersionMismatch` (no partial publish). See [docs/invariants.md §VI.25 / §VI.32](invariants.md) and [docs/runs.md](runs.md).
 One exception: if the caller already targets a `__run__<id>` branch (mutation.rs:555), the mutation runs directly on that branch with no nested run wrapping — the assumption is the caller is managing the surrounding run lifecycle themselves. See [runs.md](runs.md) for the full run lifecycle.
 ## Bulk loader (`loader/mod.rs`)
@ -156,19 +160,18 @@ One exception: if the caller already targets a `__run__<id>` branch (mutation.rs
 ## Load modes (`LoadMode`)
-| Mode | Semantics |
+| Mode | Semantics | Path (post-MR-794) |
-|---|---|
+|---|---|---|
-| `Overwrite` | Replace all data in the target tables on the branch |
+| `Overwrite` | Replace all data in the target tables on the branch | Inline-commit per type, then publisher CAS at end-of-load. Truncate-then-append doesn't fit the staged shape; documented residual. |
-| `Append` | Strict insert; duplicates error |
+| `Append` | Strict insert; duplicates error | In-memory `MutationStaging` accumulator; one `stage_append` + `commit_staged` per touched table at end-of-load; publisher CAS. |
-| `Merge` | Upsert by id (`merge_insert`) |
+| `Merge` | Upsert by `id` (`merge_insert`) | Same accumulator; one `stage_merge_insert` per touched table at end-of-load (Merge mode dedupes by `id`, last-write-wins); publisher CAS. |
 For Append/Merge, a mid-load failure (RI / cardinality violation, validation error) leaves Lance HEAD untouched on the staged tables — the next load on the same tables proceeds normally with no `ExpectedVersionMismatch`. For Overwrite, a mid-load failure can still leave Lance HEAD on a partially-truncated table; the next overwrite replaces it.
 ## `load` vs `ingest`
- `load(branch, data, mode)` — direct load to a branch.
+- `load(branch, data, mode)` — direct load to a branch (single publisher commit per call).
- `ingest(branch, from, data, mode)` — branch-creating, transactional load:
+- `ingest(branch, from, data, mode)` — branch-creating wrapper: if `branch` doesn't exist, fork it from `from` (default `main`) via `branch_create_from`, then call `load(branch, data, mode)`.
  1. If target advanced since the run started, fork a fresh run branch from `from`.
  2. Load into the run branch (Append).
  3. If target hasn't moved, fast-publish; otherwise abort.
 - Returns `IngestResult { branch, base_branch, branch_created, mode, tables[] }`.
 - `ingest_as(actor_id)` records the actor on the resulting commit.
--- a/docs/invariants.md
+++ b/docs/invariants.md
@ -110,7 +110,7 @@ Specific defaults (timeout values, memory caps, TTL windows) are *configuration*
    *Status: aspirational — referential integrity at scale requires SIP-backed cross-table validation; not yet implemented. Cross-batch / cross-version uniqueness tracked in MR-714.*
 25. **Isolation: per-query snapshot; read-your-writes within and across queries in a session.** Each query reads from one consistent manifest version. Within a multi-statement mutation, the read subplan inside each write operator sees the writes from earlier statements. Across queries in a session, reads always resolve the latest manifest version — no reader pinning to older snapshots.
-    *Status: open — read-your-writes within a multi-statement mutation requires Kuzu-style local-uncommitted scan path; deferred per MR-737 §10.10.*
+    *Status: upheld for inserts/updates after MR-794 step 2+ — `MutationStaging`'s in-memory accumulator + `TableStore::scan_with_pending` (DataFusion `MemTable` union with the committed Lance scan, with merge-shadow semantics for chained updates) implements read-your-writes within a multi-statement mutation. Delete-touching mutations are limited to delete-only by parse-time D₂; closing the within-query RYW gap for deletes requires Lance's two-phase delete API (tracked: MR-793 / Lance-upstream lance-format/lance#6658). The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures (the partial-failure test pins this), but a narrowed residual remains at the finalize→publisher boundary because Lance has no multi-dataset commit primitive — see [docs/runs.md](runs.md) "Finalize → publisher residual".*
 26. **Durability before acknowledgement.** Commit returns only after the substrate has confirmed durable persistence. No "fast" or "fire-and-forget" durability levels.
--- a/docs/query-language.md
+++ b/docs/query-language.md
@ -64,6 +64,14 @@ Used inside MATCH or as expressions inside RETURN/ORDER:
 `<value>` is a literal, `$param`, or `now()`. Multi-statement mutations execute atomically (added in v0.2.0).
 ### D₂ — mixed insert/update + delete is rejected at parse time
 A single mutation query must be **either insert/update-only or delete-only**. Mixed → rejected before any I/O with the message:
 > `mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).`
 Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance 4.0.0 has no public two-phase delete). Mixing creates ordering hazards (same-row insert→delete becomes a no-op because the staged insert isn't visible to delete; cascading deletes of just-inserted edges break referential integrity by silent design). Until Lance exposes `DeleteJob::execute_uncommitted`, the parse-time rejection keeps both paths atomic and correct. See [docs/runs.md](runs.md) and [docs/invariants.md §VI.25](invariants.md).
 ## IR (Intermediate Representation)
 `QueryIR { name, params, pipeline: Vec<IROp>, return_exprs, order_by, limit }`
--- a/docs/releases/v0.4.1.md
+++ b/docs/releases/v0.4.1.md
@ -0,0 +1,143 @@
 # Omnigraph v0.4.1
 Omnigraph v0.4.1 closes the multi-statement-mutation atomicity gap that
 v0.4.0 documented as a known limitation. Inserts and updates now route
 through an in-memory `MutationStaging` accumulator and commit via Lance's
 two-phase distributed-write API at end-of-query. A failed mid-query op
 no longer leaves Lance HEAD drifted on the touched table — the next
 mutation proceeds normally.
 ## Highlights
 - **Staged-write rewire (MR-794)**: `mutate_as` and `load` (Append /
  Merge modes) accumulate insert/update batches into
  `MutationStaging.pending` per touched table. No Lance HEAD advance
  happens during op execution; one `stage_*` + `commit_staged` per
  table runs at end-of-query, then `ManifestBatchPublisher::publish`
  commits the manifest atomically. **For op-execution failures**
  (validation errors, missing endpoints, parse-time D₂ rejection), Lance
  HEAD on every staged table is untouched and the next mutation
  proceeds normally. A narrowed residual remains at the
  finalize→publisher boundary (multi-table `commit_staged` is not
  atomic with the manifest commit) — see [docs/runs.md](../runs.md)
  "Finalize → publisher residual" for details.
 - **D₂ parse-time rule**: a single mutation query is either
  insert/update-only or delete-only. Mixed → rejected with a clear
  error directing the caller to split into two queries. Lance 4.0.0
  has no public two-phase delete; deletes still inline-commit, and D₂
  keeps that path safe.
 - **Read-your-writes via DataFusion `MemTable`**: read sites in
  multi-statement mutations consume `TableStore::scan_with_pending`,
  which Lance-scans the committed snapshot at the captured
  `expected_version` and unions with a DataFusion `MemTable` over the
  pending batches. Replaces the previous "reopen at staged Lance
  version" pattern.
 - **Coordinator swap-restore eliminated** from `mutate_with_current_actor`.
  Branch is threaded explicitly through the per-op execution path
  (`execute_named_mutation`, `execute_insert`, `execute_update`,
  `execute_delete*`, `validate_edge_insert_endpoints`,
  `ensure_node_id_exists`). The `swap_coordinator_for_branch` /
  `restore_coordinator` API and `CoordinatorRestoreGuard` are removed
  from `mutation.rs`. (`merge.rs` keeps its own swap pattern; that's
  a separate workflow tracked in MR-793.)
 - **`docs/invariants.md` §VI.25** flips from `aspirational/open` to
  `upheld for inserts/updates`. The within-query read-your-writes
  guarantee is now load-bearing for the publisher CAS contract.
 ## Behavior changes
 - A failed multi-statement mutation no longer surfaces
  `ExpectedVersionMismatch` on the *next* mutation against the same
  table. The next call proceeds normally — Lance HEAD on staged
  tables is unchanged.
 - Mixed insert/update + delete in one query is rejected at parse
  time. Existing test queries that mixed both must be split.
 - `MutationStaging`'s shape changed: `pending: HashMap<String, PendingTable>`
  + `inline_committed: HashMap<String, SubTableUpdate>` replaces the
  previous `latest: HashMap<String, StagedTable>`. This is an internal
  type; no public API impact.
 ## Residual / out of scope
 - **`LoadMode::Overwrite`** keeps the legacy inline-commit path
  (truncate-then-append doesn't fit the staged shape). A mid-overwrite
  failure can still drift Lance HEAD on a partially-truncated table;
  the next overwrite replaces it. Operator-driven, rare.
 - **Delete-only multi-statement mutations** still inline-commit per op.
  D₂ keeps inserts/updates from coexisting with deletes, so the
  inline path remains atomic per op but not per query for delete-only
  cascades. Closing this requires Lance to expose
  `DeleteJob::execute_uncommitted`; tracked in MR-793 / Lance-upstream.
 - **`schema_apply`, `branch_merge_internal`, `ensure_indices`** still
  use Lance's inline-commit APIs. The two-phase pattern is in
  `mutate_as` and `load` only; hoisting it to a storage-trait
  invariant covering all writers is MR-793.
 ## Tests added
 - `tests/runs.rs::partial_failure_leaves_target_queryable_and_unblocks_next_mutation`
  (replaces the old `partial_failure_observably_rolls_back_but_blocks_next_mutation_on_same_table`)
 - `tests/runs.rs::mutation_rejects_mixed_insert_and_delete_at_parse_time`
 - `tests/runs.rs::mixed_insert_and_update_on_same_person_coalesces_to_one_merge`
 - `tests/runs.rs::multiple_appends_to_same_edge_coalesce_to_one_append`
 - `tests/runs.rs::multi_statement_inserts_publish_exactly_once`
 - `tests/runs.rs::load_with_bad_edge_reference_unblocks_next_load`
 - `tests/runs.rs::load_with_cardinality_violation_unblocks_next_load`
 ## Files changed
 - `crates/omnigraph/src/exec/staging.rs` (NEW) — `MutationStaging`,
  `PendingTable`, `PendingMode`, `StagedTablePath`,
  `dedupe_merge_batches_by_id`.
 - `crates/omnigraph/src/exec/mutation.rs` — D₂ check; per-op
  rewires (`execute_insert`, `execute_update`, `execute_delete*`);
  branch threading; coordinator-swap removal; helper
  `validate_edge_cardinality_with_pending`; helper
  `concat_match_batches_to_schema`; `apply_assignments` updated to
  copy unassigned blob columns from full-schema scans.
 - `crates/omnigraph/src/loader/mod.rs` — `load_jsonl_reader` split:
  staged path for Append/Merge, legacy inline-commit path for
  Overwrite. Helpers `collect_node_ids_with_pending` and
  `validate_edge_cardinality_with_pending_loader`.
 - `crates/omnigraph/src/table_store.rs` — `scan_with_pending`,
  `count_rows_with_pending` (DataFusion `MemTable`-backed union with
  Lance scan).
 - `Cargo.toml` (workspace) + `crates/omnigraph/Cargo.toml` — added
  `datafusion = "52"` direct dep (transitively pulled by Lance
  already; required for `MemTable`).
 - `docs/runs.md` — removed "Known limitation" section; documented
  the new accumulator + D₂ + LoadMode::Overwrite residual.
 - `docs/invariants.md` — §VI.25 status flipped to `upheld for
  inserts/updates`.
 - `docs/architecture.md` — added "Mutation atomicity — in-memory
  accumulator (MR-794)" subsection; refreshed the engine + state
  diagrams to drop `RunRegistry` and add `MutationStaging`.
 - `docs/execution.md` — rewrote the mutation flow sequence diagram
  for the staged-write path; updated the `LoadMode` table to call
  out per-mode commit semantics; rewrote `load` vs `ingest`.
 - `docs/query-language.md` — documented the D₂ parse-time rule.
 - `docs/errors.md` — added the D₂ `BadRequest` rejection path.
 - `docs/storage.md` — dropped the live `_graph_runs.lance` reference
  (legacy from MR-771) from the layout diagram and prose.
 - `docs/branches-commits.md` — moved `__run__<id>` to a legacy note;
  removed `publish_run` from the publish-trigger list.
 - `docs/audit.md` — current `_as` API list refreshed; legacy
  `RunRecord.actor_id` moved to a historical note.
 - `docs/constants.md` — marked the run registry / branch-prefix rows
  as legacy.
 - `docs/cli.md` — replaced the legacy `omnigraph run *` quickstart
  block with `omnigraph commit list/show`.
 - `docs/testing.md` — extended the `runs.rs` row to cover the new
  MR-794 contract tests; added the `staged_writes.rs` row.
 - `AGENTS.md` (CLAUDE.md symlink) — updated the atomic-per-query
  description and the L2 capability matrix row.
 ## Included Changes
 - MR-794 step 2+ — rewire `mutate_as` and `load` via in-memory
  `MutationStaging` + `stage_*` / `commit_staged` per touched table at
  end-of-query.
 - (MR-794 step 1 shipped in v0.4.0's PR #67 — `StagedWrite`,
  `stage_append`, `stage_merge_insert`, `commit_staged`,
  `scan_with_staged`, `count_rows_with_staged` — and is the substrate
  this release builds on.)
--- a/docs/runs.md
+++ b/docs/runs.md
@ -20,22 +20,96 @@ publisher's row-level CAS on `__manifest` is the single fence.
 A `.gq` query with multiple ops (e.g. `insert Person … insert Knows …`)
 must observe earlier ops' writes when validating later ops (referential
-integrity, edge cardinality). After demotion this is implemented via an
+integrity, edge cardinality). After MR-794 step 2+ this is implemented
-in-process `MutationStaging` accumulator in
+via an in-memory `MutationStaging` accumulator in
-`crates/omnigraph/src/exec/mutation.rs`:
+[`crates/omnigraph/src/exec/staging.rs`](../crates/omnigraph/src/exec/staging.rs),
 shared by both `mutate_as` and the bulk loader:
 - On the first touch of each table, the pre-write manifest version is
-  captured into `expected_versions[table_key]`.
+  captured into `expected_versions[table_key]` (the publisher's CAS
- Subsequent ops on the same table re-open the dataset at the locally
+  fence at end-of-query).
-  staged Lance version (bypassing the manifest, which has not been
+- Each insert/update op pushes a `RecordBatch` into the per-table
-  committed yet) so they see prior writes.
+  pending accumulator. Lance HEAD does **not** advance during op
- One `commit_with_expected(updates, expected_versions)` at the end
+  execution.
-  publishes the lot atomically. Cross-table conflicts surface as
+- Read sites (validation, predicate matching for `update`) consume
  `TableStore::scan_with_pending`, which scans committed via Lance
  and applies the same SQL filter to the pending batches via DataFusion
  `MemTable`. Same-query writes are visible to subsequent reads.
 - At end-of-query, `MutationStaging::finalize` issues exactly one
  `stage_*` + `commit_staged` per touched table (concatenating
  accumulated batches; merge-mode dedupes by `id`, last-write-wins),
  and the publisher publishes the manifest atomically across all
  touched sub-tables. Cross-table conflicts surface as
  `ManifestConflictDetails::ExpectedVersionMismatch`.
 - **Deletes still inline-commit.** Lance's `Dataset::delete` is not
  exposed as a two-phase op in 4.0.0; deletes go through `delete_where`
  immediately and record their post-write state in
  `MutationStaging.inline_committed`. The parse-time D₂ rule (below)
  prevents inserts/updates from coexisting with deletes in one query,
  so the inline path is safe for delete-only mutations.
 This upholds [docs/invariants.md §VI.23](invariants.md) (atomicity per
-query) and §VI.25 (read-your-writes within a multi-statement mutation —
+query) and §VI.25 (read-your-writes within a multi-statement mutation,
-previously aspirational, now upheld).
+upheld).
 ### D₂ — parse-time mixed-mode rejection
 A single mutation query is either insert/update-only or delete-only.
 Mixed → rejected at parse time with a clear error directing the user to
 split the query. Reason: mixing creates ordering hazards
 (insert→delete on the same row would silently no-op because the staged
 insert isn't visible to delete; cascading deletes of just-inserted
 edges break referential integrity). Until Lance exposes a two-phase
 delete API, the parse-time rejection keeps both paths atomic and
 correct. Tracked: MR-793, plus a Lance-upstream ticket.
 ### `LoadMode::Overwrite` residual
 The bulk loader's Append and Merge modes use the staged-write path
 described above. `LoadMode::Overwrite` keeps the legacy inline-commit
 path: truncate-then-append doesn't fit the staged shape cleanly in
 Lance 4.0.0, and overwrite has no in-flight read-your-writes
 requirement (the prior data is being wiped). A mid-overwrite failure
 can leave Lance HEAD on a partially-truncated table; the next overwrite
 will replace it. Operator-driven (rare in agent workloads); document
 permanently until Lance exposes `Operation::Overwrite { fragments }` as
 a two-phase op.
 ### Finalize → publisher residual
 The staged-write rewire eliminates one drift class **by construction at
 the writer layer**: an op that fails before pushing to the in-memory
 accumulator (validation errors, missing endpoints, parse-time D₂
 rejection) leaves Lance HEAD untouched on every staged table. This is
 the case the `partial_failure_leaves_target_queryable_and_unblocks_next_mutation`
 test pins.
 A second, narrower drift class remains. `MutationStaging::finalize`
 runs `stage_*` + `commit_staged` per touched table sequentially, then
 the publisher commits the manifest. Lance has no multi-dataset atomic
 commit, so the per-table `commit_staged` calls are independent
 operations: if commit_staged on table N+1 fails *after* commit_staged
 on tables 1..N succeeded, or if the publisher's CAS pre-check rejects
 *after* every commit_staged succeeded, tables 1..N are left at
 `Lance HEAD = manifest_pinned + 1`. The next mutation against those
 tables surfaces `ManifestConflictDetails::ExpectedVersionMismatch` —
 the same loud failure mode the rewire was designed to make rare, just
 no longer "unreachable."
 Triggers: transient Lance write errors during finalize (object-store
 retry budget exhaustion, disk full); persistent publisher contention
 exceeding `PUBLISHER_RETRY_BUDGET = 5` retries. Closing this requires
 either a Lance multi-dataset atomic-commit primitive (filed upstream
 alongside the two-phase delete request) or a manifest-layer journal
 that replays staged commits on next open. Both are heavyweight; the
 v1 stance is "narrowed window, documented residual, surface the loud
 error when it fires."
 The publisher-CAS contract is unchanged: a *concurrent writer* that
 advances any of our touched tables between snapshot capture and
 publisher commit produces exactly one winner. The residual above is
 about *our* abandoned commits in the failure path, not about
 concurrency races.
 ## Conflict shape
@ -59,40 +133,32 @@ list`.
 `_graph_runs.lance` belongs in MR-770 (the production sweep) — this PR
 stops *creating* run state but does not destroy legacy bytes on disk.
-## Known limitation: mid-query partial failure on the same table
+## Mid-query partial failure: closed by MR-794
-A multi-statement `.gq` mutation where op-N writes a Lance fragment
+The pre-MR-794 design had a known limitation: a multi-statement `.gq`
-successfully and op-N+1 then fails leaves the touched table at
+mutation where op-N inline-committed a Lance fragment and op-N+1 then
-`Lance HEAD = manifest_version + 1`. The query is atomic at the manifest
+failed left the touched table at `Lance HEAD = manifest_version + 1`,
-level (the publisher never publishes, so reads at the pinned manifest
+blocking the next mutation with `ExpectedVersionMismatch`.
 version do *not* see op-N's data), but the *next* mutation against the
 same table fails loudly with
 `ManifestConflictDetails::ExpectedVersionMismatch` because
 `ensure_expected_version` enforces strict equality between Lance HEAD and
 the manifest's pinned version.
-**Why the engine doesn't auto-rollback**: Lance's `Dataset::restore()` is
+MR-794 (step 1 + step 2+) closed this for inserts/updates **by
-*not* a rewind — it appends a new commit (containing the desired
+construction at the writer layer**: insert and update batches accumulate
-historical version's data) and advances HEAD further. There is no Lance
+in memory; no Lance HEAD advance happens during op execution; one
-API to delete a committed version. A proper fix requires writing each
+`stage_*` + `commit_staged` per touched table runs at end-of-query, and
-mutation's per-table fragments to a *transient Lance branch* on the
+only after every op succeeded. A failed op leaves Lance HEAD untouched
-sub-table, then fast-forwarding main on success or dropping the branch
+on the staged tables, so the next mutation proceeds normally with no
-on failure. That work is tracked as a follow-up to MR-771; in the
+drift to reconcile.
 meantime:
- **In practice this is rare.** Most schema-language validation
+The cancellation case (future drop mid-mutation) inherits the same
-  (`@key`, `@enum`, `@range`, intra-batch uniqueness, edge-endpoint
+guarantee — the in-memory accumulator evaporates with the dropped task
-  existence) runs *before* any Lance write inside the failing op, so
+and no Lance write was ever issued.
  single-statement mutations never trip this. The narrow path is
  multi-statement queries (`insert ... insert ...`,
  `insert ... update ...`) where a late op fails on validation that
  depends on earlier ops' staged data.
 - **Workaround**: callers that hit this should refresh the handle and
  retry the mutation; if Lance HEAD remains drifted the
  `omnigraph cleanup` command will GC the orphan version once a later
  successful commit on the same table moves HEAD past it. (`cleanup`
  cannot reclaim an orphan that *is* the current Lance HEAD; that case
  needs the per-table-branch follow-up to fully heal.)
-The cancellation case (future drop mid-mutation) has the same shape and
+For delete-touching mutations the legacy inline-commit shape is
-the same workaround.
+preserved (Lance has no public two-phase delete in 4.0.0) — the same
 narrow window remains. The parse-time D₂ rule prevents inserts/updates
 from coexisting with deletes in one query, so a pure-delete failure
 cannot drift any staged-table state. If a delete-only multi-table
 mutation fails mid-cascade, the same workaround as before applies
 (retry; rely on `omnigraph cleanup` once a later successful commit
 moves HEAD past the orphan version). Closing this requires Lance to
 expose `DeleteJob::execute_uncommitted`; tracked in MR-793 and a
 Lance-upstream ticket.
--- a/docs/storage.md
+++ b/docs/storage.md
@ -22,7 +22,7 @@ OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordin
  - `edges/{fnv1a64-hex(edge_type_name)}` — one Lance dataset per edge type
  - `__manifest/` — the catalog of all sub-tables and their published versions
  - `_graph_commits.lance` / `_graph_commit_actors.lance` — the commit graph and its actor map
-  - `_graph_runs.lance` / `_graph_run_actors.lance` — the run registry and its actor map
+  - (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 repos are inert; the run state machine was removed in MR-771 and these files are cleaned up via MR-770's production sweep)
 - **Manifest row schema** (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count`):
  - `object_type` ∈ `table | table_version | table_tombstone`
  - `table_key` ∈ `node:<TypeName> | edge:<EdgeName>`
@ -63,14 +63,12 @@ flowchart TB
    nodes["nodes/{fnv1a64-hex}/<br/>one dataset per node type"]:::l2
    edges["edges/{fnv1a64-hex}/<br/>one dataset per edge type"]:::l2
    cgraph["_graph_commits.lance/<br/>_graph_commit_actors.lance/"]:::l2
    runs["_graph_runs.lance/<br/>_graph_run_actors.lance/"]:::l2
    refs["_refs/branches/{name}.json<br/>graph-level branches"]:::l2
    repo --> manifest
    repo --> nodes
    repo --> edges
    repo --> cgraph
    repo --> runs
    repo --> refs
    subgraph dataset[Inside each Lance dataset — L1]
@ -91,7 +89,7 @@ flowchart TB
 - **Repo root** is one directory (or S3 prefix). Everything below is part of one OmniGraph repo.
 - **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
 - **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
- **`_graph_commits.lance` / `_graph_runs.lance`** are L2 datasets that record the graph-level commit DAG and run registry respectively (each has a paired `*_actors.lance` for the actor map).
+- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 repos also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; MR-770 sweeps these in production.)
 - **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.
 - **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags.
--- a/docs/testing.md
+++ b/docs/testing.md
@ -19,7 +19,8 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
 |---|---|
 | `end_to_end.rs` | Full init → load → query/mutate flow |
 | `branching.rs` | Branch create / list / delete, lazy fork |
-| `runs.rs` | Transactional runs (begin/publish/abort), idempotency |
+| `runs.rs` | Direct-publish writes: cancellation, concurrent-writer CAS, multi-statement atomicity, MR-794 staged-write rewire (D₂ rejection, insert+update coalesce, multi-append coalesce, partial-failure recovery, load RI/cardinality recovery) |
 | `staged_writes.rs` | TableStore staged-write primitives (`stage_append`, `stage_merge_insert`, `commit_staged`, `scan_with_staged`, `count_rows_with_staged`) — primitive-level only; engine code uses the in-memory `MutationStaging` accumulator instead |
 | `lifecycle.rs` | Repo lifecycle, schema state |
 | `point_in_time.rs` | Snapshots, time travel (`snapshot_at_version`, `entity_at`) |
 | `changes.rs` | `diff_between` / `diff_commits` |