Merge branch 'main' into ragnorc/omnigraph-mcp-crate

Folds in v0.7.2 (release #301) + RFC-013 Phase 7 (graph lineage in __manifest, internal schema v3→v4 migration #299; WriteTxn #298; recovery convergence #296) under the MCP branch. Conflict resolutions (2 files): - crates/omnigraph-server/Cargo.toml: take main's 0.7.2 path-dep constraints; keep our omnigraph-mcp dep (bumped to 0.7.2). - docs/releases/v0.8.0.md (add/add): both branches drafted v0.8.0 notes for the same next minor — combined them. v0.8.0 now documents BOTH the MCP surface (ours) and main's __manifest lineage fold + the breaking internal-schema-v4 upgrade-order requirement (kept prominent under Upgrade notes). Corrected our 'no breaking changes / on-disk format unchanged' line, which the v4 migration makes false. Coherence: omnigraph-mcp [package] + Cargo.lock bumped 0.7.1→0.7.2; openapi.json auto-merged to info.version 0.7.2 (no API-surface drift from the incoming engine-internal commits). Verification deferred to CI (no local rebuild).
2026-06-27 02:39:38 +02:00 · 2026-06-25 15:53:53 +02:00 · 2026-06-25 15:53:53 +02:00 · 4d4c2164de
commit 4d4c2164de
parent 83446cdb2f 1c5cb8741e
62 changed files with 5898 additions and 1053 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@ -16,7 +16,7 @@ Tools that support `@`-imports (Claude Code) auto-include all three files via th

 `CLAUDE.md` is a symlink to this file — there is exactly one source of truth. Edit `AGENTS.md`.

-**Version surveyed:** 0.7.1
+**Version surveyed:** 0.7.2
 **Workspace crates:** `omnigraph-compiler`, `omnigraph` (engine), `omnigraph-policy`, `omnigraph-api-types` (shared HTTP wire DTOs), `omnigraph-cluster`, `omnigraph-cli`, `omnigraph-server`
 **Storage substrate:** Lance 7.x (columnar, versioned, branchable)
 **License:** MIT
@ -264,7 +264,7 @@ omnigraph policy explain --cluster ./company-brain --graph knowledge --actor act
 | Schema language | — | `.pg` + Pest grammar + catalog + interfaces + constraints + annotations |
 | Query language | — | `.gq` + Pest grammar + IR + lowering + linter |
 | Schema migration planning | — | `plan_schema_migration` + `apply_schema` step types + `__schema_apply_lock__` |
-| Commit graph (DAG) across whole graph | — | `_graph_commits.lance` with linear + merge parents, ULID ids, actor map |
+| Commit graph (DAG) across whole graph | — | Lineage (linear + merge parents, ULID ids, actor) stored as `graph_commit`/`graph_head` rows in `__manifest`, written in the same publish CAS as the table-version rows (RFC-013 Phase 7 — no separate `_graph_commits.lance` write; manifest→commit-graph atomicity gap closed); the in-memory commit graph is a projection of those rows |
 | Per-query atomic writes | — | In-memory `MutationStaging.pending` accumulator + `stage_*` / `commit_staged` per touched table at end-of-query + publisher CAS via `commit_with_expected` (single manifest commit per `mutate_as` / `load`); D₂ parse-time rule keeps inserts/updates and deletes from mixing |
 | Three-way row-level merge | — | `OrderedTableCursor` + `StagedTableWriter`, structured `MergeConflictKind` |
 | Change feeds | — | `diff_between` / `diff_commits` with manifest fast path + ID streaming |
--- a/Cargo.lock
+++ b/Cargo.lock
@ -4851,7 +4851,7 @@ dependencies = [

 [[package]]
 name = "omnigraph-api-types"
-version = "0.7.1"
+version = "0.7.2"
 dependencies = [
 "omnigraph-compiler",
 "omnigraph-engine",
@ -4863,7 +4863,7 @@ dependencies = [

 [[package]]
 name = "omnigraph-cli"
-version = "0.7.1"
+version = "0.7.2"
 dependencies = [
 "assert_cmd",
 "clap",
@ -4887,7 +4887,7 @@ dependencies = [

 [[package]]
 name = "omnigraph-cluster"
-version = "0.7.1"
+version = "0.7.2"
 dependencies = [
 "fail",
 "omnigraph-compiler",
@ -4895,6 +4895,7 @@ dependencies = [
 "serde",
 "serde_json",
 "serde_yaml",
+ "serial_test",
 "sha2 0.10.9",
 "tempfile",
 "thiserror",
@ -4905,7 +4906,7 @@ dependencies = [

 [[package]]
 name = "omnigraph-compiler"
-version = "0.7.1"
+version = "0.7.2"
 dependencies = [
 "ahash",
 "arrow-array",
@ -4924,7 +4925,7 @@ dependencies = [

 [[package]]
 name = "omnigraph-engine"
-version = "0.7.1"
+version = "0.7.2"
 dependencies = [
 "arc-swap",
 "arrow-array",
@ -4968,7 +4969,7 @@ dependencies = [

 [[package]]
 name = "omnigraph-mcp"
-version = "0.7.1"
+version = "0.7.2"
 dependencies = [
 "async-trait",
 "axum",
@ -4982,7 +4983,7 @@ dependencies = [

 [[package]]
 name = "omnigraph-policy"
-version = "0.7.1"
+version = "0.7.2"
 dependencies = [
 "cedar-policy",
 "clap",
@ -4995,7 +4996,7 @@ dependencies = [

 [[package]]
 name = "omnigraph-server"
-version = "0.7.1"
+version = "0.7.2"
 dependencies = [
 "arc-swap",
 "async-trait",
--- a/crates/omnigraph-api-types/Cargo.toml
+++ b/crates/omnigraph-api-types/Cargo.toml
@ -1,6 +1,6 @@
 [package]
 name = "omnigraph-api-types"
-version = "0.7.1"
+version = "0.7.2"
 edition = "2024"
 description = "Shared HTTP wire DTOs for Omnigraph — request/response types and engine-result → DTO mappings used by both omnigraph-server and omnigraph-cli (RFC-009). Plain serde/utoipa types; no transport or server internals."
 license = "MIT"
@ -9,8 +9,8 @@ homepage = "https://github.com/ModernRelay/omnigraph"
 documentation = "https://docs.rs/omnigraph-api-types"

 [dependencies]
-omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.7.1" }
-omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.1" }
+omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.7.2" }
+omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.2" }
 serde = { workspace = true }
 serde_json = { workspace = true }
 utoipa = { workspace = true }
--- a/crates/omnigraph-cli/Cargo.toml
+++ b/crates/omnigraph-cli/Cargo.toml
@ -1,6 +1,6 @@
 [package]
 name = "omnigraph-cli"
-version = "0.7.1"
+version = "0.7.2"
 edition = "2024"
 description = "CLI for the Omnigraph graph database."
 license = "MIT"
@ -13,12 +13,12 @@ name = "omnigraph"
 path = "src/main.rs"

 [dependencies]
-omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.7.1" }
-omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.1" }
-omnigraph-api-types = { path = "../omnigraph-api-types", version = "0.7.1" }
-omnigraph-cluster = { path = "../omnigraph-cluster", version = "0.7.1" }
-omnigraph-policy = { path = "../omnigraph-policy", version = "0.7.1" }
-omnigraph-server = { path = "../omnigraph-server", version = "0.7.1" }
+omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.7.2" }
+omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.2" }
+omnigraph-api-types = { path = "../omnigraph-api-types", version = "0.7.2" }
+omnigraph-cluster = { path = "../omnigraph-cluster", version = "0.7.2" }
+omnigraph-policy = { path = "../omnigraph-policy", version = "0.7.2" }
+omnigraph-server = { path = "../omnigraph-server", version = "0.7.2" }
 clap = { workspace = true }
 color-eyre = { workspace = true }
 serde = { workspace = true }
--- a/crates/omnigraph-cluster/Cargo.toml
+++ b/crates/omnigraph-cluster/Cargo.toml
@ -1,6 +1,6 @@
 [package]
 name = "omnigraph-cluster"
-version = "0.7.1"
+version = "0.7.2"
 edition = "2024"
 description = "Cluster configuration validation, planning, and config-only apply for Omnigraph."
 license = "MIT"
@ -14,8 +14,8 @@ documentation = "https://docs.rs/omnigraph-cluster"
 failpoints = ["dep:fail", "fail/failpoints", "omnigraph/failpoints"]

 [dependencies]
-omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.1" }
-omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.7.1" }
+omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.2" }
+omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.7.2" }
 fail = { workspace = true, optional = true }
 serde = { workspace = true }
 serde_json = { workspace = true }
@ -30,5 +30,6 @@ tokio = { workspace = true }
 ulid = { workspace = true }

 [dev-dependencies]
+serial_test = "3"
 tempfile = { workspace = true }
 tokio = { workspace = true }
--- a/crates/omnigraph-cluster/src/config.rs
+++ b/crates/omnigraph-cluster/src/config.rs
@ -474,7 +474,7 @@ pub(crate) async fn preview_schema_migration(
    Ok(preview.plan)
 }

-struct LiveGraphObservation {
+pub(crate) struct LiveGraphObservation {
    manifest_version: u64,
    schema_digest: String,
 }
@ -494,7 +494,7 @@ pub(crate) async fn observe_live_graph(graph_uri: &str) -> Result<LiveGraphObser
    })
 }

-struct GraphObservationJson<'a> {
+pub(crate) struct GraphObservationJson<'a> {
    address: &'a str,
    graph_uri: &'a str,
    observed_at: &'a str,
@ -949,7 +949,7 @@ pub(crate) fn validate_id(kind: &str, path: &str, value: &str, diagnostics: &mut
    }
 }

-enum PolicyTarget {
+pub(crate) enum PolicyTarget {
    Cluster,
    Graph(String),
    WrongKind(String),
--- a/crates/omnigraph-cluster/src/failpoints.rs
+++ b/crates/omnigraph-cluster/src/failpoints.rs
@ -1,6 +1,13 @@
 //! Fault-injection hooks for the cluster apply protocol, mirroring the
 //! engine's `omnigraph::failpoints` pattern. With the `failpoints` feature
 //! off, every call site compiles to `Ok(())`.
+//!
+//! Only `maybe_fail` lives here — it returns the cluster's [`Diagnostic`]
+//! error type. The test-side configuration guard is shared: use
+//! [`omnigraph::failpoints::ScopedFailPoint`], which is registry-only
+//! (error-type agnostic) and reachable because the cluster's `failpoints`
+//! feature enables `omnigraph/failpoints`. One `ScopedFailPoint`, in the
+//! lowest crate, avoids a drifting duplicate.

 use crate::Diagnostic;

@ -19,38 +26,16 @@ pub(crate) fn maybe_fail(_name: &str) -> Result<(), Diagnostic> {
    Ok(())
 }

-#[cfg(feature = "failpoints")]
-pub struct ScopedFailPoint {
-    name: String,
-}
-
-#[cfg(feature = "failpoints")]
-impl ScopedFailPoint {
-    pub fn new(name: &str, action: &str) -> Self {
-        fail::cfg(name, action).expect("configure failpoint");
-        Self {
-            name: name.to_string(),
-        }
-    }
-
-    /// Register a callback failpoint with the same Drop-based cleanup as
-    /// `new`. Without the guard, a panic while the point is active would
-    /// leak the callback into the process-global registry and fire it under
-    /// later tests in the same binary.
-    pub fn with_callback<F>(name: &str, callback: F) -> Self
-    where
-        F: Fn() + Send + Sync + 'static,
-    {
-        fail::cfg_callback(name, callback).expect("configure callback failpoint");
-        Self {
-            name: name.to_string(),
-        }
-    }
-}
-
-#[cfg(feature = "failpoints")]
-impl Drop for ScopedFailPoint {
-    fn drop(&mut self) {
-        fail::remove(&self.name);
-    }
+/// Compile-checked catalog of this crate's apply-protocol failpoint names.
+/// Engine-scoped names referenced from cluster tests live in
+/// [`omnigraph::failpoints::names`].
+pub mod names {
+    pub const CLUSTER_APPLY_AFTER_GRAPH_CREATE: &str = "cluster_apply.after_graph_create";
+    pub const CLUSTER_APPLY_AFTER_GRAPH_DELETE: &str = "cluster_apply.after_graph_delete";
+    pub const CLUSTER_APPLY_AFTER_PAYLOAD_PHASE: &str = "cluster_apply.after_payload_phase";
+    pub const CLUSTER_APPLY_AFTER_SCHEMA_APPLY: &str = "cluster_apply.after_schema_apply";
+    pub const CLUSTER_APPLY_BEFORE_GRAPH_CREATE: &str = "cluster_apply.before_graph_create";
+    pub const CLUSTER_APPLY_BEFORE_GRAPH_DELETE: &str = "cluster_apply.before_graph_delete";
+    pub const CLUSTER_APPLY_BEFORE_SCHEMA_APPLY: &str = "cluster_apply.before_schema_apply";
+    pub const CLUSTER_APPLY_BEFORE_STATE_WRITE: &str = "cluster_apply.before_state_write";
 }
--- a/crates/omnigraph-cluster/src/lib.rs
+++ b/crates/omnigraph-cluster/src/lib.rs
@ -1,8 +1,6 @@
 use std::collections::{BTreeMap, BTreeSet};
-use std::fs::{self, OpenOptions};
-use std::io::{ErrorKind, Write};
+use std::fs::{self};
 use std::path::{Path, PathBuf};
-use std::process;

 use omnigraph::db::{Omnigraph, ReadTarget, SchemaApplyOptions};
 use omnigraph_compiler::SchemaMigrationPlan;
@ -26,11 +24,7 @@ mod store;
 mod sweep;
 mod types;
 use config::{
-    QueriesDecl, future_field_diagnostics, graph_address, initial_import_state, load_desired,
-    normalize_policy_target, observe_declared_graphs, observe_live_graph, parse_cluster_config,
-    policy_address, preview_schema_migration, query_address, resolve_config_path,
-    resolve_query_decls, schema_address, state_resource_digests, validate_cluster_header,
-    validate_id, validate_query_source,
+    QueriesDecl, graph_address, initial_import_state, load_desired, observe_declared_graphs, parse_cluster_config, preview_schema_migration, schema_address, state_resource_digests, validate_cluster_header,
 };
 use diff::{
    FailedGraphOrigin, ResourceKind, append_embedding_profile_changes,
@ -42,13 +36,12 @@ pub use serve::{
    cluster_root_for_graph_uri, read_serving_snapshot, read_serving_snapshot_from_storage,
    resolve_graph_storage_uri,
 };
-use store::{ClusterStore, StateLockGuard, StateSnapshot};
+use store::ClusterStore;
 use sweep::{
    mark_approvals_consumed, record_approval_consumed, sweep_recovery_sidecars,
    tombstone_graph_subtree, warn_pending_recovery_sidecars,
 };
 pub use types::*;
-use types::*;

 pub const CLUSTER_CONFIG_FILE: &str = "cluster.yaml";
 pub const CLUSTER_GRAPHS_DIR: &str = "graphs";
@ -510,7 +503,7 @@ pub async fn apply_config_dir_with_options(
                continue;
            }
        };
-        if let Err(diagnostic) = failpoints::maybe_fail("cluster_apply.before_graph_create") {
+        if let Err(diagnostic) = failpoints::maybe_fail(crate::failpoints::names::CLUSTER_APPLY_BEFORE_GRAPH_CREATE) {
            // Simulated crash before the init: the sidecar stays for the
            // sweep (row 1: root absent -> intent removed next run).
            diagnostics.push(diagnostic);
@ -587,7 +580,7 @@ pub async fn apply_config_dir_with_options(
        // Crash point: the graph exists, the cluster state does not record it
        // yet. A failure here must acknowledge nothing; the next run's sweep
        // rolls the ledger forward (row 4).
-        if let Err(diagnostic) = failpoints::maybe_fail("cluster_apply.after_graph_create") {
+        if let Err(diagnostic) = failpoints::maybe_fail(crate::failpoints::names::CLUSTER_APPLY_AFTER_GRAPH_CREATE) {
            diagnostics.push(diagnostic);
            return early_return(
                display_path(&desired.config_dir),
@ -727,7 +720,7 @@ pub async fn apply_config_dir_with_options(
                continue;
            }
        };
-        if let Err(diagnostic) = failpoints::maybe_fail("cluster_apply.before_schema_apply") {
+        if let Err(diagnostic) = failpoints::maybe_fail(crate::failpoints::names::CLUSTER_APPLY_BEFORE_SCHEMA_APPLY) {
            // Simulated crash before the engine call: the sidecar stays; the
            // sweep retires it next run (ledger still consistent with live).
            diagnostics.push(diagnostic);
@ -787,7 +780,7 @@ pub async fn apply_config_dir_with_options(
        }
        // Crash point: the manifest moved, the ledger does not record it yet.
        // A failure here acknowledges nothing; the sweep rolls forward.
-        if let Err(diagnostic) = failpoints::maybe_fail("cluster_apply.after_schema_apply") {
+        if let Err(diagnostic) = failpoints::maybe_fail(crate::failpoints::names::CLUSTER_APPLY_AFTER_SCHEMA_APPLY) {
            diagnostics.push(diagnostic);
            return early_return(
                display_path(&desired.config_dir),
@ -872,7 +865,7 @@ pub async fn apply_config_dir_with_options(
    // Crash point: payloads are on disk, state has not moved. A failure here
    // must leave state.json byte-identical and acknowledge nothing; re-running
    // apply repairs via the skip-if-exists blob reuse.
-    if let Err(diagnostic) = failpoints::maybe_fail("cluster_apply.after_payload_phase") {
+    if let Err(diagnostic) = failpoints::maybe_fail(crate::failpoints::names::CLUSTER_APPLY_AFTER_PAYLOAD_PHASE) {
        diagnostics.push(diagnostic);
        return early_return(
            display_path(&desired.config_dir),
@ -949,7 +942,7 @@ pub async fn apply_config_dir_with_options(
                continue;
            }
        };
-        if let Err(diagnostic) = failpoints::maybe_fail("cluster_apply.before_graph_delete") {
+        if let Err(diagnostic) = failpoints::maybe_fail(crate::failpoints::names::CLUSTER_APPLY_BEFORE_GRAPH_DELETE) {
            // Simulated crash before removal: row 8 retires the intent and
            // the still-valid approval lets a later run retry.
            diagnostics.push(diagnostic);
@ -974,7 +967,7 @@ pub async fn apply_config_dir_with_options(
        }
        // Crash point: the root is gone, the ledger does not record it yet.
        // The sweep rolls forward (row 7b) and consumes the approval.
-        if let Err(diagnostic) = failpoints::maybe_fail("cluster_apply.after_graph_delete") {
+        if let Err(diagnostic) = failpoints::maybe_fail(crate::failpoints::names::CLUSTER_APPLY_AFTER_GRAPH_DELETE) {
            diagnostics.push(diagnostic);
            return early_return(
                display_path(&desired.config_dir),
@ -1080,7 +1073,7 @@ pub async fn apply_config_dir_with_options(
        // persisted-statuses revert contract below is exercised; a cfg_callback
        // on this point can mutate state.json to simulate a concurrent writer,
        // making write_state's CAS check fail organically.
-        let write_result = match failpoints::maybe_fail("cluster_apply.before_state_write") {
+        let write_result = match failpoints::maybe_fail(crate::failpoints::names::CLUSTER_APPLY_BEFORE_STATE_WRITE) {
            Ok(()) => {
                backend
                    .write_state(&new_state, expected_cas.as_deref(), &mut observations)
--- a/crates/omnigraph-cluster/src/store.rs
+++ b/crates/omnigraph-cluster/src/store.rs
@ -408,10 +408,6 @@ impl ClusterStore {
        }
    }

-    pub(crate) fn payload_display(&self, kind: &ResourceKind, digest: &str) -> Option<String> {
-        Self::payload_relative(kind, digest).map(|relative| self.display(&relative))
-    }
-
    pub(crate) async fn payload_exists(&self, kind: &ResourceKind, digest: &str) -> bool {
        let Some(relative) = Self::payload_relative(kind, digest) else {
            return false;
--- a/crates/omnigraph-cluster/tests/failpoints.rs
+++ b/crates/omnigraph-cluster/tests/failpoints.rs
@ -13,9 +13,11 @@ use std::fs;
 use std::path::{Path, PathBuf};

 use fail::FailScenario;
+use serial_test::serial;
 use omnigraph::db::Omnigraph;
-use omnigraph::failpoints::ScopedFailPoint as EngineScopedFailPoint;
-use omnigraph_cluster::failpoints::ScopedFailPoint;
+// One ScopedFailPoint for both engine- and cluster-scoped failpoint names:
+// it is registry-only (error-type agnostic) and lives in the lowest crate.
+use omnigraph::failpoints::ScopedFailPoint;
 use omnigraph_cluster::{
    ApplyOptions, apply_config_dir, apply_config_dir_with_options, approve_config_dir,
    validate_config_dir,
@ -105,12 +107,13 @@ fn query_blob(config_dir: &Path, digests: &BTreeMap<String, String>) -> PathBuf
 }

 #[tokio::test]
+#[serial]
 async fn failpoint_wiring_returns_injected_diagnostic() {
    let scenario = FailScenario::setup();
    let dir = fixture();
    seed_applyable_state(dir.path());

-    let _failpoint = ScopedFailPoint::new("cluster_apply.after_payload_phase", "return");
+    let _failpoint = ScopedFailPoint::new(omnigraph_cluster::failpoints::names::CLUSTER_APPLY_AFTER_PAYLOAD_PHASE, "return");
    let out = apply_config_dir(dir.path()).await;
    assert!(!out.ok);
    assert!(out.diagnostics.iter().any(|diagnostic| {
@ -127,6 +130,7 @@ async fn failpoint_wiring_returns_injected_diagnostic() {
 /// state.json is byte-identical, nothing is acknowledged — and a plain re-run
 /// repairs by trusting the existing content-addressed blobs.
 #[tokio::test]
+#[serial]
 async fn apply_crash_after_payload_phase_leaves_state_unmoved_then_recovers() {
    let scenario = FailScenario::setup();
    let dir = fixture();
@ -134,7 +138,7 @@ async fn apply_crash_after_payload_phase_leaves_state_unmoved_then_recovers() {
    let state_before = fs::read(state_path(dir.path())).unwrap();

    {
-        let _failpoint = ScopedFailPoint::new("cluster_apply.after_payload_phase", "return");
+        let _failpoint = ScopedFailPoint::new(omnigraph_cluster::failpoints::names::CLUSTER_APPLY_AFTER_PAYLOAD_PHASE, "return");
        let out = apply_config_dir(dir.path()).await;
        assert!(!out.ok);
        assert!(!out.state_written);
@ -169,6 +173,7 @@ async fn apply_crash_after_payload_phase_leaves_state_unmoved_then_recovers() {
 /// (possible under `state.lock: false`) must surface `state_cas_mismatch`,
 /// acknowledge nothing, and leave the concurrent writer's state on disk.
 #[tokio::test]
+#[serial]
 async fn apply_cas_race_surfaces_state_cas_mismatch() {
    let scenario = FailScenario::setup();
    let dir = fixture();
@ -179,7 +184,7 @@ async fn apply_cas_race_surfaces_state_cas_mismatch() {
    // after apply read it but before apply writes. RAII-guarded so a panic
    // inside apply cannot leak the callback into the global registry.
    let race_path = state_path(dir.path());
-    let failpoint = ScopedFailPoint::with_callback("cluster_apply.before_state_write", move || {
+    let failpoint = ScopedFailPoint::with_callback(omnigraph_cluster::failpoints::names::CLUSTER_APPLY_BEFORE_STATE_WRITE, move || {
        let mut state: serde_json::Value =
            serde_json::from_str(&fs::read_to_string(&race_path).unwrap()).unwrap();
        state["state_revision"] = serde_json::json!(99);
@ -256,13 +261,14 @@ fn recovery_sidecars(config_dir: &Path) -> Vec<PathBuf> {
 /// The next run's sweep removes the intent (row 1) and the same run creates
 /// the graph and converges.
 #[tokio::test]
+#[serial]
 async fn create_crash_before_init_recovers_via_sweep() {
    let scenario = FailScenario::setup();
    let dir = fixture();
    seed_empty_state(dir.path());

    {
-        let _failpoint = ScopedFailPoint::new("cluster_apply.before_graph_create", "return");
+        let _failpoint = ScopedFailPoint::new(omnigraph_cluster::failpoints::names::CLUSTER_APPLY_BEFORE_GRAPH_CREATE, "return");
        let out = apply_config_dir(dir.path()).await;
        assert!(!out.ok);
        assert!(out.diagnostics.iter().any(|diagnostic| {
@ -298,6 +304,7 @@ async fn create_crash_before_init_recovers_via_sweep() {
 /// ledger is stale, nothing was acknowledged. The next run's sweep rolls the
 /// ledger forward (row 4) with an audit entry, and the run converges.
 #[tokio::test]
+#[serial]
 async fn create_crash_after_init_rolls_state_forward() {
    let scenario = FailScenario::setup();
    let dir = fixture();
@ -305,7 +312,7 @@ async fn create_crash_after_init_rolls_state_forward() {
    let state_before = fs::read(dir.path().join("__cluster/state.json")).unwrap();

    {
-        let _failpoint = ScopedFailPoint::new("cluster_apply.after_graph_create", "return");
+        let _failpoint = ScopedFailPoint::new(omnigraph_cluster::failpoints::names::CLUSTER_APPLY_AFTER_GRAPH_CREATE, "return");
        let out = apply_config_dir(dir.path()).await;
        assert!(!out.ok);
        assert!(!out.state_written);
@ -385,6 +392,7 @@ async fn live_schema_digest(dir: &Path) -> String {
 /// live schema and ledger are untouched; the next run's sweep retires the
 /// stale intent and the same run applies and converges.
 #[tokio::test]
+#[serial]
 async fn schema_crash_before_apply_recovers_via_sweep() {
    let scenario = FailScenario::setup();
    let dir = fixture();
@ -393,7 +401,7 @@ async fn schema_crash_before_apply_recovers_via_sweep() {
    fs::write(dir.path().join("people.pg"), SCHEMA_V2).unwrap();

    {
-        let _failpoint = ScopedFailPoint::new("cluster_apply.before_schema_apply", "return");
+        let _failpoint = ScopedFailPoint::new(omnigraph_cluster::failpoints::names::CLUSTER_APPLY_BEFORE_SCHEMA_APPLY, "return");
        let out = apply_config_dir_with_options(
            dir.path(),
            ApplyOptions {
@ -425,6 +433,7 @@ async fn schema_crash_before_apply_recovers_via_sweep() {
 /// the graph manifest moves. The defensive cleanup proof should remove the
 /// cluster sidecar immediately so a pre-movement error cannot brick boot.
 #[tokio::test]
+#[serial]
 async fn schema_apply_error_before_graph_movement_removes_sidecar() {
    let scenario = FailScenario::setup();
    let dir = fixture();
@ -433,7 +442,7 @@ async fn schema_apply_error_before_graph_movement_removes_sidecar() {
    fs::write(dir.path().join("people.pg"), SCHEMA_V2).unwrap();

    {
-        let _failpoint = EngineScopedFailPoint::new("schema_apply.before_staging_write", "return");
+        let _failpoint = ScopedFailPoint::new(omnigraph::failpoints::names::SCHEMA_APPLY_BEFORE_STAGING_WRITE, "return");
        let out = apply_config_dir(dir.path()).await;
        assert!(!out.ok);
        assert!(
@ -462,6 +471,7 @@ async fn schema_apply_error_before_graph_movement_removes_sidecar() {
 /// prove this is a pre-movement failure, so the sidecar must survive for
 /// explicit recovery/quarantine instead of being cleaned up defensively.
 #[tokio::test]
+#[serial]
 async fn schema_apply_error_after_graph_movement_keeps_sidecar() {
    let scenario = FailScenario::setup();
    let dir = fixture();
@ -472,7 +482,7 @@ async fn schema_apply_error_after_graph_movement_keeps_sidecar() {
    let v2_digest = desired.resource_digests["schema.knowledge"].clone();

    {
-        let _failpoint = EngineScopedFailPoint::new("schema_apply.after_manifest_commit", "return");
+        let _failpoint = ScopedFailPoint::new(omnigraph::failpoints::names::SCHEMA_APPLY_AFTER_MANIFEST_COMMIT, "return");
        let out = apply_config_dir(dir.path()).await;
        assert!(!out.ok);
        assert!(
@ -524,6 +534,7 @@ async fn schema_apply_error_after_graph_movement_keeps_sidecar() {
 /// moved, the ledger is stale, nothing acknowledged; the next run's sweep
 /// rolls the ledger forward with an audit entry and the run converges.
 #[tokio::test]
+#[serial]
 async fn schema_crash_after_apply_rolls_state_forward() {
    let scenario = FailScenario::setup();
    let dir = fixture();
@ -534,7 +545,7 @@ async fn schema_crash_after_apply_rolls_state_forward() {
    let v2_digest = desired.resource_digests["schema.knowledge"].clone();

    {
-        let _failpoint = ScopedFailPoint::new("cluster_apply.after_schema_apply", "return");
+        let _failpoint = ScopedFailPoint::new(omnigraph_cluster::failpoints::names::CLUSTER_APPLY_AFTER_SCHEMA_APPLY, "return");
        let out = apply_config_dir(dir.path()).await;
        assert!(!out.ok);
        assert!(!out.state_written);
@ -608,13 +619,14 @@ async fn seed_approved_delete(dir: &Path) -> String {
 /// next run retires the stale intent (row 8) and the still-approved delete
 /// completes in the same run.
 #[tokio::test]
+#[serial]
 async fn delete_crash_before_removal_reproposes() {
    let scenario = FailScenario::setup();
    let dir = fixture();
    let approval_id = seed_approved_delete(dir.path()).await;

    {
-        let _failpoint = ScopedFailPoint::new("cluster_apply.before_graph_delete", "return");
+        let _failpoint = ScopedFailPoint::new(omnigraph_cluster::failpoints::names::CLUSTER_APPLY_BEFORE_GRAPH_DELETE, "return");
        let out = apply_config_dir(dir.path()).await;
        assert!(!out.ok);
        assert!(dir.path().join("graphs/old.omni").exists());
@ -650,6 +662,7 @@ async fn delete_crash_before_removal_reproposes() {
 /// nothing acknowledged; the next run's sweep rolls the tombstone forward,
 /// consumes the approval the sidecar carries, and audits the recovery.
 #[tokio::test]
+#[serial]
 async fn delete_crash_after_removal_rolls_forward() {
    let scenario = FailScenario::setup();
    let dir = fixture();
@ -657,7 +670,7 @@ async fn delete_crash_after_removal_rolls_forward() {
    let state_before = fs::read(state_path(dir.path())).unwrap();

    {
-        let _failpoint = ScopedFailPoint::new("cluster_apply.after_graph_delete", "return");
+        let _failpoint = ScopedFailPoint::new(omnigraph_cluster::failpoints::names::CLUSTER_APPLY_AFTER_GRAPH_DELETE, "return");
        let out = apply_config_dir(dir.path()).await;
        assert!(!out.ok);
        assert!(!out.state_written);
--- a/crates/omnigraph-compiler/Cargo.toml
+++ b/crates/omnigraph-compiler/Cargo.toml
@ -1,6 +1,6 @@
 [package]
 name = "omnigraph-compiler"
-version = "0.7.1"
+version = "0.7.2"
 edition = "2024"
 description = "Schema/query compiler for Omnigraph. Zero Lance dependency."
 license = "MIT"
--- a/crates/omnigraph-mcp/Cargo.toml
+++ b/crates/omnigraph-mcp/Cargo.toml
@ -1,6 +1,6 @@
 [package]
 name = "omnigraph-mcp"
-version = "0.7.1"
+version = "0.7.2"
 edition = "2024"
 description = "MCP (Model Context Protocol) Streamable-HTTP transport and backend seam for Omnigraph. Contains the rmcp dependency and defines the McpBackend trait the server implements; names no omnigraph engine/server type, so the dependency edge is server → mcp."
 license = "MIT"
--- a/crates/omnigraph-policy/Cargo.toml
+++ b/crates/omnigraph-policy/Cargo.toml
@ -1,6 +1,6 @@
 [package]
 name = "omnigraph-policy"
-version = "0.7.1"
+version = "0.7.2"
 edition = "2024"
 description = "Policy / authorization layer for Omnigraph — Cedar-backed PolicyEngine, PolicyChecker trait, ResourceScope enum."
 license = "MIT"
--- a/crates/omnigraph-server/Cargo.toml
+++ b/crates/omnigraph-server/Cargo.toml
@ -1,6 +1,6 @@
 [package]
 name = "omnigraph-server"
-version = "0.7.1"
+version = "0.7.2"
 edition = "2024"
 description = "HTTP server for the Omnigraph graph database."
 license = "MIT"
@ -19,14 +19,14 @@ default = []
 aws = ["dep:aws-config", "dep:aws-sdk-secretsmanager"]

 [dependencies]
-omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.7.1" }
-omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.1" }
-omnigraph-policy = { path = "../omnigraph-policy", version = "0.7.1" }
-omnigraph-api-types = { path = "../omnigraph-api-types", version = "0.7.1" }
-omnigraph-cluster = { path = "../omnigraph-cluster", version = "0.7.1" }
+omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.7.2" }
+omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.2" }
+omnigraph-policy = { path = "../omnigraph-policy", version = "0.7.2" }
+omnigraph-api-types = { path = "../omnigraph-api-types", version = "0.7.2" }
+omnigraph-cluster = { path = "../omnigraph-cluster", version = "0.7.2" }
 # The MCP surface. rmcp is contained to omnigraph-mcp — the server carries NO
 # direct rmcp dependency (verify: `cargo tree -p omnigraph-server -e normal | grep rmcp`).
-omnigraph-mcp = { path = "../omnigraph-mcp", version = "0.7.1" }
+omnigraph-mcp = { path = "../omnigraph-mcp", version = "0.7.2" }
 axum = { workspace = true }
 http = "1"
 clap = { workspace = true }
--- a/crates/omnigraph/Cargo.toml
+++ b/crates/omnigraph/Cargo.toml
@ -1,6 +1,6 @@
 [package]
 name = "omnigraph-engine"
-version = "0.7.1"
+version = "0.7.2"
 edition = "2024"
 description = "Runtime engine for the Omnigraph graph database."
 license = "MIT"
@ -16,8 +16,8 @@ default = []
 failpoints = ["dep:fail", "fail/failpoints"]

 [dependencies]
-omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.1" }
-omnigraph-policy = { path = "../omnigraph-policy", version = "0.7.1" }
+omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.2" }
+omnigraph-policy = { path = "../omnigraph-policy", version = "0.7.2" }
 lance = { workspace = true }
 lance-datafusion = { workspace = true }
 datafusion = { workspace = true }
@ -52,7 +52,7 @@ chrono = { workspace = true }
 arc-swap = { workspace = true }

 [dev-dependencies]
-omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.1" }
+omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.2" }
 tokio = { workspace = true }
 lance-namespace-impls = { workspace = true }
 lance-io = "7.0.0"
--- a/crates/omnigraph/src/db/commit_graph.rs
+++ b/crates/omnigraph/src/db/commit_graph.rs
@ -1,6 +1,5 @@
 use std::collections::{HashMap, VecDeque};
 use std::sync::Arc;
-use std::time::{SystemTime, UNIX_EPOCH};

 use arrow_array::{
    Array, RecordBatch, RecordBatchIterator, StringArray, TimestampMicrosecondArray, UInt64Array,
@ -29,7 +28,16 @@ pub struct GraphCommit {

 pub struct CommitGraph {
    root_uri: String,
-    dataset: Dataset,
+    /// Handle on `_graph_commits.lance` at the active branch, held only for the
+    /// branch-management WRITES (`create_branch`, formerly `version`) and
+    /// `refresh`. It is a DERIVED artifact (RFC-013 Phase 7): graph lineage lives
+    /// in `__manifest`, and reads (`head_commit`/`load_commits`/`get_commit`/
+    /// `merge_base`) never touch it. `None` means the branch's
+    /// `_graph_commits.lance` ref is missing (an interrupted fork-reclaim or a
+    /// `cleanup` race) while the manifest lineage is still authoritative — so the
+    /// READS stay correct and only a subsequent `create_branch` surfaces the loud
+    /// actionable error. Mirrors `actor_dataset`'s best-effort `Option`.
+    dataset: Option<Dataset>,
    actor_dataset: Option<Dataset>,
    active_branch: Option<String>,
    actor_by_commit_id: HashMap<String, String>,
@ -38,20 +46,19 @@ pub struct CommitGraph {
 }

 impl CommitGraph {
-    pub async fn init(root_uri: &str, manifest_version: u64) -> Result<Self> {
+    /// Create the commit-graph datasets for a fresh graph. The genesis
+    /// `graph_commit` + `graph_head` rows live in `__manifest` (folded into the
+    /// init write — RFC-013 Phase 7), so `_graph_commits.lance` is created EMPTY
+    /// here: it exists only to carry the Lance branch refs that `create_branch` /
+    /// `list_branches` / the `cleanup` orphan reconciler operate on. No commit
+    /// rows are ever written to it. The in-memory cache is sourced from the
+    /// manifest projection — the same path as [`open`], so genesis is seen
+    /// identically whether the graph was just initialized or reopened.
+    pub async fn init(root_uri: &str) -> Result<Self> {
        let root = root_uri.trim_end_matches('/');
        let uri = graph_commits_uri(root);
-        let genesis = GraphCommit {
-            graph_commit_id: ulid::Ulid::new().to_string(),
-            manifest_branch: None,
-            manifest_version,
-            parent_commit_id: None,
-            merged_parent_commit_id: None,
-            actor_id: None,
-            created_at: now_micros()?,
-        };

-        let batch = commits_to_batch(&[genesis.clone()])?;
+        let batch = RecordBatch::new_empty(commit_graph_schema());
        let reader = RecordBatchIterator::new(vec![Ok(batch)], commit_graph_schema());
        let params = WriteParams {
            mode: WriteMode::Create,
@ -66,17 +73,30 @@ impl CommitGraph {
            .map_err(|e| OmniError::Lance(e.to_string()))?;
        let actor_dataset = create_commit_actor_dataset(root).await?;

+        let (commit_by_id, head_commit) = load_commit_cache_from_manifest(root, None).await?;
        Ok(Self {
            root_uri: root.to_string(),
-            dataset,
+            dataset: Some(dataset),
            actor_dataset: Some(actor_dataset),
            active_branch: None,
            actor_by_commit_id: HashMap::new(),
-            commit_by_id: HashMap::from([(genesis.graph_commit_id.clone(), genesis.clone())]),
-            head_commit: Some(genesis),
+            commit_by_id,
+            head_commit,
        })
    }

+    /// Insert a just-published commit into the in-memory cache (RFC-013 Phase 7).
+    /// The durable write already happened in the manifest publish CAS; this only
+    /// keeps the cache consistent for same-handle reads, with no storage I/O.
+    /// Head selection matches the manifest-sourced load (`should_replace_head`).
+    pub fn insert_committed(&mut self, commit: GraphCommit) {
+        if should_replace_head(self.head_commit.as_ref(), &commit) {
+            self.head_commit = Some(commit.clone());
+        }
+        self.commit_by_id
+            .insert(commit.graph_commit_id.clone(), commit);
+    }
+
    pub async fn open(root_uri: &str) -> Result<Self> {
        let root = root_uri.trim_end_matches('/');
        let wrapper = crate::instrumentation::commit_graph_wrapper();
@ -87,17 +107,24 @@ impl CommitGraph {
            crate::instrumentation::open_dataset_tracked(&graph_commit_actors_uri(root), wrapper)
                .await
                .ok();
-        let actor_by_commit_id = match &actor_dataset {
-            Some(dataset) => load_commit_actor_cache(dataset).await?,
-            None => HashMap::new(),
-        };
-        let (commit_by_id, head_commit) = load_commit_cache(&dataset, &actor_by_commit_id).await?;
+        // RFC-013 step 4: source the in-memory cache from the `__manifest`
+        // lineage projection (which carries the actor inline), not from
+        // `_graph_commits.lance`. The dataset handles above are retained for the
+        // branch-management ops (create/delete/list/version) that still target
+        // the commit-graph dataset; the actor dataset is only kept for the
+        // dual-write append path. The projection-equivalence gate proves this
+        // cache equals the prior `_graph_commits.lance` read. A pre-Phase-7 (v3)
+        // graph not yet migrated falls back to the legacy read — see
+        // `load_commit_cache_for_branch`.
+        let (commit_by_id, head_commit) = load_commit_cache_for_branch(root, None).await?;
        Ok(Self {
            root_uri: root.to_string(),
-            dataset,
+            // `open` targets main and never checks out a branch (main cannot be
+            // deleted/recreated), so the handle is always present here.
+            dataset: Some(dataset),
            actor_dataset,
            active_branch: None,
-            actor_by_commit_id,
+            actor_by_commit_id: HashMap::new(),
            commit_by_id,
            head_commit,
        })
@ -109,25 +136,33 @@ impl CommitGraph {
        let dataset =
            crate::instrumentation::open_dataset_tracked(&graph_commits_uri(root), wrapper.clone())
                .await?;
-        let dataset = dataset
-            .checkout_branch(branch)
-            .await
-            .map_err(|e| OmniError::Lance(e.to_string()))?;
+        // Best-effort checkout of the DERIVED `_graph_commits.lance` branch ref.
+        // It is held only for `create_branch` (a write); the lineage READ below
+        // comes from `__manifest`. A missing ref (interrupted fork-reclaim /
+        // `cleanup` race) must not wedge the read, so a typed not-found yields a
+        // `None` handle — a subsequent `create_branch` then surfaces the loud
+        // error. Any OTHER open error (transient IO / corrupt) still propagates,
+        // matching the `force_delete_branch` / `read_legacy_commit_cache` idiom.
+        let dataset = match dataset.checkout_branch(branch).await {
+            Ok(ds) => Some(ds),
+            Err(lance::Error::RefNotFound { .. }) | Err(lance::Error::NotFound { .. }) => None,
+            Err(e) => return Err(OmniError::Lance(e.to_string())),
+        };
        let actor_dataset =
            crate::instrumentation::open_dataset_tracked(&graph_commit_actors_uri(root), wrapper)
                .await
                .ok();
-        let actor_by_commit_id = match &actor_dataset {
-            Some(dataset) => load_commit_actor_cache(dataset).await?,
-            None => HashMap::new(),
-        };
-        let (commit_by_id, head_commit) = load_commit_cache(&dataset, &actor_by_commit_id).await?;
+        // Hard `?`: the manifest existence gate. `load_commit_cache_for_branch`
+        // opens the branch's `__manifest` (its own `checkout_branch` on the
+        // authoritative table), so a TRULY absent branch still fails loudly here —
+        // only the derived `_graph_commits.lance` ref is allowed to be missing.
+        let (commit_by_id, head_commit) = load_commit_cache_for_branch(root, Some(branch)).await?;
        Ok(Self {
            root_uri: root.to_string(),
            dataset,
            actor_dataset,
            active_branch: Some(branch.to_string()),
-            actor_by_commit_id,
+            actor_by_commit_id: HashMap::new(),
            commit_by_id,
            head_commit,
        })
@ -136,40 +171,49 @@ impl CommitGraph {
    pub async fn refresh(&mut self) -> Result<()> {
        let root = self.root_uri.clone();
        let wrapper = crate::instrumentation::commit_graph_wrapper();
-        self.dataset = crate::instrumentation::open_dataset_tracked(
+        let dataset = crate::instrumentation::open_dataset_tracked(
            &graph_commits_uri(&root),
            wrapper.clone(),
        )
        .await?;
-        if let Some(branch) = &self.active_branch {
-            self.dataset = self
-                .dataset
-                .checkout_branch(branch)
-                .await
-                .map_err(|e| OmniError::Lance(e.to_string()))?;
-        }
+        // Same best-effort checkout as `open_at_branch`: a missing DERIVED branch
+        // ref leaves the handle `None` (only `create_branch` then errors), while
+        // the in-memory cache re-syncs from the authoritative manifest below.
+        self.dataset = match &self.active_branch {
+            Some(branch) => match dataset.checkout_branch(branch).await {
+                Ok(ds) => Some(ds),
+                Err(lance::Error::RefNotFound { .. }) | Err(lance::Error::NotFound { .. }) => None,
+                Err(e) => return Err(OmniError::Lance(e.to_string())),
+            },
+            None => Some(dataset),
+        };
        self.actor_dataset =
            crate::instrumentation::open_dataset_tracked(&graph_commit_actors_uri(&root), wrapper)
                .await
                .ok();
-        self.actor_by_commit_id = match &self.actor_dataset {
-            Some(dataset) => load_commit_actor_cache(dataset).await?,
-            None => HashMap::new(),
-        };
        let (commit_by_id, head_commit) =
-            load_commit_cache(&self.dataset, &self.actor_by_commit_id).await?;
+            load_commit_cache_for_branch(&root, self.active_branch.as_deref()).await?;
        self.commit_by_id = commit_by_id;
        self.head_commit = head_commit;
        Ok(())
    }

-    pub fn version(&self) -> u64 {
-        self.dataset.version().version
-    }
-
    pub async fn create_branch(&mut self, name: &str) -> Result<()> {
-        let mut ds = self.dataset.clone();
-        ds.create_branch(name, self.version(), None)
+        // The held `_graph_commits.lance` handle is the only thing that can fork a
+        // branch ref. If it is missing (an interrupted fork-reclaim or a `cleanup`
+        // race dropped the derived ref while manifest lineage stayed authoritative),
+        // fail loudly + actionably rather than silently. Repair is the existing
+        // `cleanup` orphan reconciler (`reconcile_commit_graph_orphans`), not an
+        // inline write on this path.
+        let Some(dataset) = &self.dataset else {
+            let branch = self.active_branch.as_deref().unwrap_or("main");
+            return Err(OmniError::manifest_internal(format!(
+                "commit-graph branch ref for '{branch}' is missing; run `omnigraph cleanup` then retry"
+            )));
+        };
+        let version = dataset.version().version;
+        let mut ds = dataset.clone();
+        ds.create_branch(name, version, None)
            .await
            .map_err(|e| OmniError::Lance(e.to_string()))?;
        Ok(())
@ -216,7 +260,17 @@ impl CommitGraph {
        Ok(branches.into_keys().collect())
    }

-    pub async fn append_commit(
+    // DEAD as of RFC-013 Phase 7: graph commits are recorded in `__manifest`
+    // (folded into the publish CAS), never appended to `_graph_commits.lance`.
+    // These append helpers are retained only because the actor sidecar table they
+    // touch is still enumerated by `optimize` (internal-table compaction); they
+    // have no caller on any write path. The single-source invariant is guarded by
+    // `tests/lineage_projection.rs`, which fails if `_graph_commits.lance` ever
+    // gains a commit row. Do NOT call these to record a commit — use the
+    // coordinator's `commit_*_with_actor` / `commit_merge_with_actor`, which carry
+    // the lineage intent into the manifest publish.
+    #[allow(dead_code)]
+    async fn append_commit(
        &mut self,
        manifest_branch: Option<&str>,
        manifest_version: u64,
@ -233,7 +287,8 @@ impl CommitGraph {
        .await
    }

-    pub async fn append_merge_commit(
+    #[allow(dead_code)]
+    async fn append_merge_commit(
        &mut self,
        manifest_branch: Option<&str>,
        manifest_version: u64,
@ -251,6 +306,7 @@ impl CommitGraph {
        .await
    }

+    #[allow(dead_code)]
    async fn append_commit_with_parents(
        &mut self,
        manifest_branch: Option<&str>,
@ -267,16 +323,22 @@ impl CommitGraph {
            parent_commit_id: parent_commit_id.map(|s| s.to_string()),
            merged_parent_commit_id: merged_parent_commit_id.map(|s| s.to_string()),
            actor_id: actor_id.map(str::to_string),
-            created_at: now_micros()?,
+            created_at: crate::db::now_micros()?,
        };

        let batch = commits_to_batch(&[commit.clone()])?;
        let reader = RecordBatchIterator::new(vec![Ok(batch)], commit_graph_schema());
-        let mut ds = self.dataset.clone();
+        // This helper is dead on every write path (RFC-013 Phase 7) — reached only
+        // by the transitional v3 fixtures, which always hold the commits dataset.
+        // A `None` here would be a fixture bug, so fail loudly rather than silently.
+        let mut ds = self
+            .dataset
+            .clone()
+            .ok_or_else(|| OmniError::manifest_internal("commit-graph dataset is missing"))?;
        ds.append(reader, None)
            .await
            .map_err(|e| OmniError::Lance(e.to_string()))?;
-        self.dataset = ds;
+        self.dataset = Some(ds);
        if let Some(actor_id) = actor_id {
            self.append_actor(&graph_commit_id, actor_id).await?;
        }
@ -289,6 +351,7 @@ impl CommitGraph {
        Ok(graph_commit_id)
    }

+    #[allow(dead_code)] // RFC-013 Phase 7: dead — see `append_commit`.
    async fn append_actor(&mut self, graph_commit_id: &str, actor_id: &str) -> Result<()> {
        if self
            .actor_by_commit_id
@ -301,7 +364,7 @@ impl CommitGraph {
        let record = CommitActorRecord {
            graph_commit_id: graph_commit_id.to_string(),
            actor_id: actor_id.to_string(),
-            created_at: now_micros()?,
+            created_at: crate::db::now_micros()?,
        };
        let batch = commit_actors_to_batch(&[record])?;
        let reader = RecordBatchIterator::new(vec![Ok(batch)], commit_actor_schema());
@ -452,7 +515,12 @@ async fn create_commit_actor_dataset(root_uri: &str) -> Result<Dataset> {
    };
    match Dataset::write(reader, &uri as &str, Some(params)).await {
        Ok(dataset) => Ok(dataset),
-        Err(err) if err.to_string().contains("Dataset already exists") => Dataset::open(&uri)
+        // Create-or-open idempotency: a concurrent/prior create raced us. Match
+        // the typed `DatasetAlreadyExists` variant, not the display string — the
+        // message is not a Lance API contract (a wording change would silently
+        // break this fallback). Pinned by
+        // `lance_surface_guards.rs::lance_error_dataset_already_exists_variant_exists`.
+        Err(lance::Error::DatasetAlreadyExists { .. }) => Dataset::open(&uri)
            .await
            .map_err(|open_err| OmniError::Lance(open_err.to_string())),
        Err(err) => Err(OmniError::Lance(err.to_string())),
@ -490,6 +558,156 @@ fn commits_to_batch(commits: &[GraphCommit]) -> Result<RecordBatch> {
    .map_err(|e| OmniError::Lance(e.to_string()))
 }

+/// Build the in-memory commit cache for `branch`, choosing the source by the
+/// branch manifest's internal-schema stamp (RFC-013 step 4 forward/back-compat):
+///
+/// - stamp ≥ v4 (post-Phase-7, the normal case): the `__manifest` lineage
+///   projection — `graph_commit`/`graph_head` rows folded into the publish CAS.
+/// - stamp < v4 (a pre-Phase-7 graph not yet migrated): the legacy
+///   `_graph_commits.lance` read. This is the **transitional v3 fallback** that
+///   lets a READ-ONLY open of an un-migrated graph still see correct history —
+///   a read-only open never runs the v3→v4 backfill (it must not write), so
+///   without this gate it would read an empty DAG from `__manifest`. A
+///   read-write open backfills `__manifest` on its first write and thereafter
+///   takes the projection branch.
+///
+/// Both sources pick the head with `should_replace_head`, so the cache is
+/// identical regardless of which branch is taken. Remove the fallback once no
+/// graph below internal-schema v4 remains.
+async fn load_commit_cache_for_branch(
+    root_uri: &str,
+    branch: Option<&str>,
+) -> Result<(HashMap<String, GraphCommit>, Option<GraphCommit>)> {
+    let stamp = crate::db::manifest::internal_schema_stamp_at(root_uri, branch).await?;
+    // Defense-in-depth: refuse a branch whose stamp this binary cannot serve —
+    // newer than CURRENT, or below MIN_SUPPORTED — for the same reason the main
+    // read path does (`refuse_if_internal_schema_unsupported`). A `> CURRENT` stamp
+    // means a newer binary wrote a shape we can't read, so the projection below
+    // would misread it; a `< MIN` stamp predates the legacy readers this binary
+    // still carries. Not a live hole today: migrations run main-first
+    // (`migrate_on_open` migrates main; each branch migrates on its own first
+    // write), so main's stamp bounds every branch's and the main read path already
+    // refuses first. The guard closes the gap if that ordering is ever weakened.
+    crate::db::manifest::refuse_if_stamp_unsupported(stamp)?;
+    if stamp < crate::db::manifest::INTERNAL_MANIFEST_SCHEMA_VERSION {
+        // Transitional: un-migrated v3 graph — read lineage from the legacy
+        // `_graph_commits.lance` so reads (incl. read-only opens) see history.
+        return read_legacy_commit_cache(root_uri, branch).await;
+    }
+    load_commit_cache_from_manifest(root_uri, branch).await
+}
+
+/// Build the in-memory commit cache from the `__manifest` graph-lineage
+/// projection (RFC-013 step 4) rather than `_graph_commits.lance`. The lineage
+/// rows carry the actor inline, so no separate actor-table read is needed. Head
+/// selection is identical to [`load_commit_cache`] (`should_replace_head`), so
+/// the resulting cache is equivalent to the prior `_graph_commits.lance` read.
+async fn load_commit_cache_from_manifest(
+    root_uri: &str,
+    branch: Option<&str>,
+) -> Result<(HashMap<String, GraphCommit>, Option<GraphCommit>)> {
+    let (rows, _heads) =
+        crate::db::manifest::ManifestCoordinator::read_graph_lineage_at(root_uri, branch).await?;
+    let mut commit_by_id = HashMap::with_capacity(rows.len());
+    let mut head_commit = None;
+    for row in rows {
+        let commit = GraphCommit {
+            graph_commit_id: row.graph_commit_id,
+            manifest_branch: row.manifest_branch,
+            manifest_version: row.manifest_version,
+            parent_commit_id: row.parent_commit_id,
+            merged_parent_commit_id: row.merged_parent_commit_id,
+            actor_id: row.actor_id,
+            created_at: row.created_at,
+        };
+        if should_replace_head(head_commit.as_ref(), &commit) {
+            head_commit = Some(commit.clone());
+        }
+        commit_by_id.insert(commit.graph_commit_id.clone(), commit);
+    }
+    Ok((commit_by_id, head_commit))
+}
+
+/// Read the legacy `_graph_commits.lance` (+ its actor sidecar) for `branch`
+/// into an in-memory cache — the transitional source for graphs not yet
+/// migrated to internal-schema v4 (RFC-013 step 4). Two callers, both
+/// transitional: the v3→v4 migration backfill (which copies these rows into
+/// `__manifest`) and the read-only v3 fallback in `CommitGraph::open*`. Returns
+/// `(commit_by_id, head)`, with the head picked by `should_replace_head` —
+/// identical to the manifest projection. A genuinely ABSENT (not-found) commit
+/// dataset or actor sidecar yields an empty cache (no head); any OTHER open error
+/// (transient IO / corrupt file) propagates loudly rather than being read as
+/// "empty" — a swallow here would let the v3→v4 migration backfill nothing and
+/// still stamp v4, orphaning the real lineage permanently. This keeps the legacy
+/// readers alive while any v3 graph survives; once no graph is below v4 it can
+/// retire.
+pub(crate) async fn read_legacy_commit_cache(
+    root_uri: &str,
+    branch: Option<&str>,
+) -> Result<(HashMap<String, GraphCommit>, Option<GraphCommit>)> {
+    let root = root_uri.trim_end_matches('/');
+    let commits_uri = graph_commits_uri(root);
+    let commits_open = match crate::failpoints::maybe_fail_lance_open("migration.v3_to_v4.legacy_open")
+    {
+        Ok(()) => Dataset::open(&commits_uri).await,
+        Err(injected) => Err(injected),
+    };
+    let mut dataset = match commits_open {
+        Ok(dataset) => dataset,
+        // An ABSENT commits dataset is the legitimate "no legacy data" signal —
+        // a graph with no `_graph_commits.lance` (or none on this branch) yields
+        // an empty cache. But ONLY a genuine not-found gets that treatment: a
+        // transient/corrupt open (IO / CorruptFile / …) must propagate, never be
+        // read as "empty". The v3→v4 migration calls this once before stamping
+        // v4; swallowing a non-not-found error here would backfill nothing and
+        // stamp v4 anyway, orphaning the real lineage permanently (the migration
+        // never re-runs, and the v3 fallback is then disabled). Lance maps an
+        // object-store NotFound to `DatasetNotFound`; the variant match (vs an
+        // existence probe) is exactly right and not over-strict — pinned by
+        // `lance_surface_guards.rs::dataset_open_missing_returns_not_found_variant`.
+        Err(lance::Error::DatasetNotFound { .. }) | Err(lance::Error::NotFound { .. }) => {
+            return Ok((HashMap::new(), None));
+        }
+        Err(e) => return Err(OmniError::Lance(e.to_string())),
+    };
+    if let Some(branch) = branch.filter(|b| *b != "main") {
+        dataset = dataset
+            .checkout_branch(branch)
+            .await
+            .map_err(|e| OmniError::Lance(e.to_string()))?;
+    }
+
+    // The actor sidecar may be absent (older graphs without authored commits);
+    // an empty actor map then leaves every commit's actor `None`. It is read
+    // FLAT (no branch checkout): the pre-Phase-7 commit graph never forked the
+    // actor dataset — actors are keyed by `graph_commit_id` globally — so a
+    // branch's commits resolve their actor from the same single actor table.
+    // This matches the live `CommitGraph::open_at_branch`, which also opens the
+    // actor dataset on main while checking out the branch only on the commits
+    // dataset.
+    let actors_open =
+        match crate::failpoints::maybe_fail_lance_open("migration.v3_to_v4.legacy_open") {
+            Ok(()) => Dataset::open(&graph_commit_actors_uri(root)).await,
+            Err(injected) => Err(injected),
+        };
+    let actor_by_commit_id = match actors_open {
+        Ok(actor_dataset) => load_commit_actor_cache(&actor_dataset).await?,
+        // An ABSENT actor sidecar is benign (older graphs without authored
+        // commits) — every commit's actor stays `None`. A not-found is therefore
+        // the empty-map signal. But a CORRUPT/transient actor open must NOT be
+        // read as "no authors": silently wiping all authorship and then stamping
+        // v4 is the same permanent-loss hole as the commits arm, so anything
+        // other than not-found propagates. (Same variant contract, different
+        // rationale — absence is normal here, error is not.)
+        Err(lance::Error::DatasetNotFound { .. }) | Err(lance::Error::NotFound { .. }) => {
+            HashMap::new()
+        }
+        Err(e) => return Err(OmniError::Lance(e.to_string())),
+    };
+
+    load_commit_cache(&dataset, &actor_by_commit_id).await
+}
+
 async fn load_commit_cache(
    dataset: &Dataset,
    actor_by_commit_id: &HashMap<String, String>,
@ -694,11 +912,170 @@ async fn open_for_branch(root_uri: &str, branch: Option<&str>) -> Result<CommitG
    }
 }

-fn now_micros() -> Result<i64> {
-    let duration = SystemTime::now()
-        .duration_since(UNIX_EPOCH)
-        .map_err(|e| OmniError::manifest(format!("system clock before UNIX_EPOCH: {}", e)))?;
-    Ok(duration.as_micros() as i64)
+/// Identities of the commits written into a synthetic pre-Phase-7 (v3) graph by
+/// [`seed_legacy_v3_lineage`], for assertions after migration.
+//
+// Gated on `test` OR the `failpoints` feature: the v3→v4 migration fault-injection
+// test lives in the `failpoints` integration binary (the fail registry is
+// process-global, so failpoint tests must not run in-source), and that binary
+// compiles the crate without `cfg(test)` — so it needs this fixture under the
+// feature too. Still excluded from release builds.
+#[cfg(any(test, feature = "failpoints"))]
+#[derive(Debug, Clone)]
+pub struct V3LineageFixture {
+    /// The genesis (parentless) commit id.
+    pub genesis: String,
+    /// A direct, authored commit on main (actor `act-a`).
+    pub commit_a: String,
+    /// A commit tagged to the `feature` branch (actor `act-feature`).
+    pub feature_commit: String,
+    /// The merge commit on main: parent = `commit_a`, merged_parent =
+    /// `feature_commit`, actor `act-merger`. This is the head of main.
+    pub merge_commit: String,
+    /// Every commit id written, in append order (for count assertions).
+    pub all_ids: Vec<String>,
+}
+
+/// Build a synthetic pre-Phase-7 (internal-schema v3) graph at `root_uri`: graph
+/// lineage lives ONLY in `_graph_commits.lance` (+ its actor sidecar), `__manifest`
+/// carries NO `graph_commit`/`graph_head` rows, and the stamp is set to v3. This
+/// reproduces exactly the on-disk shape a graph created by a pre-RFC-013-Phase-7
+/// binary would have, so the v3→v4 migration and the v3-read fallback can be
+/// tested against it.
+///
+/// The lineage is a realistic DAG with a branch + a real merge: genesis → A →
+/// (feature commit, off to the side) → merge(A, feature) at the head of main,
+/// with authored actors on the non-genesis commits. Reaches the dead-on-the-
+/// write-path `append_commit_with_parents`/`append_actor` (still present for
+/// exactly this transitional purpose) to write the legacy rows.
+#[cfg(any(test, feature = "failpoints"))]
+pub async fn seed_legacy_v3_lineage(root_uri: &str) -> Result<V3LineageFixture> {
+    let root = root_uri.trim_end_matches('/');
+
+    // 1. Create `__manifest` (Phase-7 folds genesis lineage into it) and the
+    //    EMPTY legacy `_graph_commits.lance`. We then append the v3-style commit
+    //    rows below — a real v3 graph carried its genesis in `_graph_commits`.
+    crate::db::manifest::seed_manifest_for_v3_fixture(root).await?;
+    let mut cg = CommitGraph::init(root).await?;
+    // Clear the cache that init seeded from the (genesis-bearing) manifest, so
+    // the appended rows below are the whole story and parents come out right.
+    cg.commit_by_id.clear();
+    cg.head_commit = None;
+
+    // 2. Append the legacy lineage to `_graph_commits.lance` on main.
+    let genesis = cg
+        .append_commit_with_parents(None, 1, None, None, None)
+        .await?;
+    let commit_a = cg
+        .append_commit_with_parents(None, 2, Some(&genesis), None, Some("act-a"))
+        .await?;
+    let feature_commit = cg
+        .append_commit_with_parents(Some("feature"), 3, Some(&commit_a), None, Some("act-feature"))
+        .await?;
+    let merge_commit = cg
+        .append_commit_with_parents(
+            None,
+            4,
+            Some(&commit_a),
+            Some(&feature_commit),
+            Some("act-merger"),
+        )
+        .await?;
+
+    // 3. Strip the genesis lineage rows the Phase-7 init folded into `__manifest`
+    //    and rewind the stamp to v3, so the manifest matches a true pre-Phase-7
+    //    graph (no lineage in `__manifest`, stamp v3).
+    crate::db::manifest::strip_lineage_and_set_v3_stamp_for_fixture(root).await?;
+
+    Ok(V3LineageFixture {
+        genesis: genesis.clone(),
+        commit_a: commit_a.clone(),
+        feature_commit: feature_commit.clone(),
+        merge_commit: merge_commit.clone(),
+        all_ids: vec![genesis, commit_a, feature_commit, merge_commit],
+    })
+}
+
+/// Identities of a synthetic pre-Phase-7 (v3) graph that carries a REAL Lance
+/// branch (built by [`seed_legacy_v3_lineage_with_branch`]).
+#[cfg(test)]
+#[derive(Debug, Clone)]
+pub struct V3BranchedLineageFixture {
+    /// The genesis (parentless) commit on main.
+    pub genesis: String,
+    /// A direct authored commit on main (actor `act-a`). The head of main.
+    pub commit_a: String,
+    /// A commit on the real `feature` Lance branch (actor `act-branch`),
+    /// parented off `commit_a`. The head of `feature`.
+    pub branch_commit: String,
+    /// The branch name forked on both `_graph_commits.lance` and `__manifest`.
+    pub branch: String,
+}
+
+/// Build a synthetic pre-Phase-7 (internal-schema v3) graph at `root_uri` that
+/// carries a REAL Lance branch `feature` on BOTH `_graph_commits.lance` and
+/// `__manifest`, reproducing exactly the on-disk shape of a branched graph
+/// created by a pre-RFC-013-Phase-7 binary:
+///
+/// - `_graph_commits.lance`: main has `genesis → A`; the `feature` Lance branch
+///   adds `branch_commit` (parent `A`). Authored actors land in the FLAT actor
+///   sidecar (the pre-Phase-7 commit graph never forked the actor table).
+/// - `__manifest`: main is stamped v3 with NO lineage rows; the `feature` branch
+///   is forked from main's v3 state, so it too is v3 with NO lineage of its own.
+///
+/// This is the fixture the per-branch v3→v4 migration runs against: it lets a
+/// test prove that migrating the `feature` branch reads the branch's legacy
+/// lineage, writes it into the BRANCH's `__manifest`, and leaves main untouched —
+/// the case the main-only [`seed_legacy_v3_lineage`] cannot exercise.
+#[cfg(test)]
+pub async fn seed_legacy_v3_lineage_with_branch(root_uri: &str) -> Result<V3BranchedLineageFixture> {
+    let root = root_uri.trim_end_matches('/');
+
+    // 1. `__manifest` (genesis folded by Phase-7 init) + an empty legacy
+    //    `_graph_commits.lance`. Clear the init-seeded cache so the rows we
+    //    append below are the whole story.
+    crate::db::manifest::seed_manifest_for_v3_fixture(root).await?;
+    let mut cg = CommitGraph::init(root).await?;
+    cg.commit_by_id.clear();
+    cg.head_commit = None;
+
+    // 2. Main lineage on `_graph_commits.lance`: genesis → A (authored).
+    let genesis = cg
+        .append_commit_with_parents(None, 1, None, None, None)
+        .await?;
+    let commit_a = cg
+        .append_commit_with_parents(None, 2, Some(&genesis), None, Some("act-a"))
+        .await?;
+
+    // 3. Fork a real `feature` Lance branch on `_graph_commits.lance`, switch the
+    //    handle to it, and append an authored branch commit (its actor lands in
+    //    the flat main actor table — exactly the pre-Phase-7 shape).
+    cg.create_branch("feature").await?;
+    let commits_ds = cg
+        .dataset
+        .take()
+        .expect("commits dataset present after create_branch")
+        .checkout_branch("feature")
+        .await
+        .map_err(|e| OmniError::Lance(e.to_string()))?;
+    cg.dataset = Some(commits_ds);
+    cg.active_branch = Some("feature".to_string());
+    let branch_commit = cg
+        .append_commit_with_parents(Some("feature"), 3, Some(&commit_a), None, Some("act-branch"))
+        .await?;
+
+    // 4. Rewind main's `__manifest` to the v3 shape (strip the folded genesis
+    //    lineage, set stamp 3) BEFORE forking — so the `feature` manifest branch
+    //    inherits the stripped v3 state (no lineage, stamp 3).
+    crate::db::manifest::strip_lineage_and_set_v3_stamp_for_fixture(root).await?;
+    crate::db::manifest::fork_manifest_branch_for_v3_fixture(root, "feature").await?;
+
+    Ok(V3BranchedLineageFixture {
+        genesis,
+        commit_a,
+        branch_commit,
+        branch: "feature".to_string(),
+    })
 }

 #[cfg(test)]
@ -709,6 +1086,83 @@ mod tests {

    use super::*;

+    // RFC-013 step 4: the v3-read fallback / migration source reads a NAMED
+    // branch's lineage from a real Lance branch on `_graph_commits.lance`, while
+    // resolving actors from the FLAT actor table (the pre-Phase-7 commit graph
+    // forked only the commits dataset, never the actor sidecar). This guards
+    // both that branch-checkout path and the flat-actor resolution — the case
+    // the main-branch fixture (commits on main only) does not exercise.
+    #[tokio::test]
+    async fn read_legacy_commit_cache_resolves_branch_commits_with_flat_actors() {
+        let dir = tempfile::tempdir().unwrap();
+        let uri = dir.path().to_str().unwrap();
+
+        // A v3 graph needs `__manifest` to exist for `CommitGraph::init`'s
+        // genesis-cache seed; we clear that cache and write our own legacy rows.
+        crate::db::manifest::seed_manifest_for_v3_fixture(uri)
+            .await
+            .unwrap();
+        let mut cg = CommitGraph::init(uri).await.unwrap();
+        cg.commit_by_id.clear();
+        cg.head_commit = None;
+
+        // Main lineage: genesis → A (authored). The actor lands in the FLAT
+        // `_graph_commit_actors.lance` (never branched).
+        let genesis = cg
+            .append_commit_with_parents(None, 1, None, None, None)
+            .await
+            .unwrap();
+        let commit_a = cg
+            .append_commit_with_parents(None, 2, Some(&genesis), None, Some("act-a"))
+            .await
+            .unwrap();
+
+        // Fork a real Lance branch on `_graph_commits.lance`, switch the handle
+        // to it, and append an authored branch commit (its actor also goes to
+        // the flat main actor table — exactly the pre-Phase-7 shape).
+        cg.create_branch("feature").await.unwrap();
+        cg.dataset = Some(
+            cg.dataset
+                .take()
+                .unwrap()
+                .checkout_branch("feature")
+                .await
+                .unwrap(),
+        );
+        cg.active_branch = Some("feature".to_string());
+        let branch_commit = cg
+            .append_commit_with_parents(
+                Some("feature"),
+                3,
+                Some(&commit_a),
+                None,
+                Some("act-branch"),
+            )
+            .await
+            .unwrap();
+
+        // The legacy read at the branch sees the inherited main commits + the
+        // branch commit, the head is the branch commit, and the authored actors
+        // resolve from the flat table (no branch checkout on the actor dataset).
+        let (commits, head) = read_legacy_commit_cache(uri, Some("feature")).await.unwrap();
+        assert_eq!(commits.len(), 3, "branch inherits genesis + A + its own commit");
+        assert_eq!(
+            head.as_ref().unwrap().graph_commit_id,
+            branch_commit,
+            "the branch commit is the head"
+        );
+        assert_eq!(
+            commits.get(&commit_a).unwrap().actor_id.as_deref(),
+            Some("act-a"),
+            "main commit's actor resolves from the flat actor table",
+        );
+        assert_eq!(
+            commits.get(&branch_commit).unwrap().actor_id.as_deref(),
+            Some("act-branch"),
+            "branch commit's actor resolves from the flat actor table",
+        );
+    }
+
    #[test]
    fn load_commits_from_batches_returns_error_for_bad_schema() {
        let batch = RecordBatch::try_new(
--- a/crates/omnigraph/src/db/graph_coordinator.rs
+++ b/crates/omnigraph/src/db/graph_coordinator.rs
@ -106,13 +106,17 @@ impl GraphCoordinator {
        storage: Arc<dyn StorageAdapter>,
    ) -> Result<Self> {
        let root = normalize_root_uri(root_uri)?;
+        // The genesis graph commit is folded into the manifest init write, so
+        // `__manifest` is the single source of graph lineage from version one
+        // (RFC-013 Phase 7). `CommitGraph::init` then creates the empty
+        // branch-ref dataset and seeds its cache from that manifest genesis.
        let manifest = ManifestCoordinator::init(&root, catalog).await?;
-        let commit_graph = Some(CommitGraph::init(&root, manifest.version()).await?);
+        let commit_graph = CommitGraph::init(&root).await?;
        Ok(Self {
            root_uri: root,
            storage,
            manifest,
-            commit_graph,
+            commit_graph: Some(commit_graph),
            bound_branch: None,
        })
    }
@ -257,7 +261,7 @@ impl GraphCoordinator {
    /// fresh, so any existing commit-graph branch with this name is provably
    /// orphaned and is force-dropped before recreating.
    async fn create_commit_graph_branch(&mut self, branch: &str) -> Result<()> {
-        failpoints::maybe_fail("branch_create.after_manifest_branch_create")?;
+        failpoints::maybe_fail(crate::failpoints::names::BRANCH_CREATE_AFTER_MANIFEST_BRANCH_CREATE)?;
        let Some(commit_graph) = &mut self.commit_graph else {
            return Ok(());
        };
@ -306,7 +310,7 @@ impl GraphCoordinator {
    /// Best-effort, idempotent reclaim of the commit-graph branch `branch`.
    /// Tolerates an absent commit-graph dataset (a graph that never committed).
    async fn reclaim_commit_graph_branch(&mut self, branch: &str) -> Result<()> {
-        failpoints::maybe_fail("branch_delete.before_commit_graph_reclaim")?;
+        failpoints::maybe_fail(crate::failpoints::names::BRANCH_DELETE_BEFORE_COMMIT_GRAPH_RECLAIM)?;
        if let Some(commit_graph) = &mut self.commit_graph {
            commit_graph.force_delete_branch(branch).await
        } else if self
@ -438,7 +442,12 @@ impl GraphCoordinator {
            .exists(&graph_commits_uri(self.root_uri()))
            .await?
        {
-            let _ = CommitGraph::init(self.root_uri(), self.manifest.version()).await?;
+            // A graph opened without a commit-graph dataset gets the empty
+            // branch-ref dataset created lazily here. Graph lineage lives in
+            // `__manifest` (RFC-013 Phase 7) — a graph initialized by current
+            // code already carries its genesis there, and the commit graph
+            // sources its cache from it. No genesis is written here.
+            CommitGraph::init(self.root_uri()).await?;
        }
        self.commit_graph = match self.current_branch() {
            Some(branch) => Some(CommitGraph::open_at_branch(self.root_uri(), branch).await?),
@ -452,12 +461,8 @@ impl GraphCoordinator {
        updates: &[SubTableUpdate],
        actor_id: Option<&str>,
    ) -> Result<PublishedSnapshot> {
-        let manifest_version = self.commit_manifest_updates(updates).await?;
-        let snapshot_id = self.record_graph_commit(manifest_version, actor_id).await?;
-        Ok(PublishedSnapshot {
-            manifest_version,
-            _snapshot_id: snapshot_id,
-        })
+        self.commit_updates_with_actor_with_expected(updates, &HashMap::new(), actor_id)
+            .await
    }

    /// Commit with publisher-level OCC fence. The `expected_table_versions` map
@ -471,45 +476,9 @@ impl GraphCoordinator {
        expected_table_versions: &HashMap<String, u64>,
        actor_id: Option<&str>,
    ) -> Result<PublishedSnapshot> {
-        let manifest_version = self
-            .commit_manifest_updates_with_expected(updates, expected_table_versions)
-            .await?;
-        let snapshot_id = self.record_graph_commit(manifest_version, actor_id).await?;
-        Ok(PublishedSnapshot {
-            manifest_version,
-            _snapshot_id: snapshot_id,
-        })
-    }
-
-    pub(crate) async fn commit_manifest_updates(
-        &mut self,
-        updates: &[SubTableUpdate],
-    ) -> Result<u64> {
-        let manifest_version = self.manifest.commit(updates).await?;
-        failpoints::maybe_fail("graph_publish.after_manifest_commit")?;
-        Ok(manifest_version)
-    }
-
-    pub(crate) async fn commit_manifest_updates_with_expected(
-        &mut self,
-        updates: &[SubTableUpdate],
-        expected_table_versions: &HashMap<String, u64>,
-    ) -> Result<u64> {
-        let manifest_version = self
-            .manifest
-            .commit_with_expected(updates, expected_table_versions)
-            .await?;
-        failpoints::maybe_fail("graph_publish.after_manifest_commit")?;
-        Ok(manifest_version)
-    }
-
-    pub(crate) async fn commit_manifest_changes(
-        &mut self,
-        changes: &[ManifestChange],
-    ) -> Result<u64> {
-        let manifest_version = self.manifest.commit_changes(changes).await?;
-        failpoints::maybe_fail("graph_publish.after_manifest_commit")?;
-        Ok(manifest_version)
+        let changes = updates_to_changes(updates);
+        self.commit_changes_with_actor_with_expected(&changes, expected_table_versions, actor_id)
+            .await
    }

    pub(crate) async fn commit_changes_with_actor(
@ -517,71 +486,110 @@ impl GraphCoordinator {
        changes: &[ManifestChange],
        actor_id: Option<&str>,
    ) -> Result<PublishedSnapshot> {
-        let manifest_version = self.commit_manifest_changes(changes).await?;
-        let snapshot_id = self.record_graph_commit(manifest_version, actor_id).await?;
+        self.commit_changes_with_actor_with_expected(changes, &HashMap::new(), actor_id)
+            .await
+    }
+
+    /// Publish `changes` and record one graph commit in the SAME manifest CAS
+    /// (RFC-013 Phase 7). The lineage intent (a freshly minted commit id, the
+    /// branch, the actor) rides the publish so the `graph_commit` + `graph_head`
+    /// rows land atomically with the table-version rows — one manifest version,
+    /// no separate write, no `commit_graph.refresh()` to pick a parent (the
+    /// publisher resolves it under the CAS). The in-memory commit cache is then
+    /// updated from the intent + the resolved parent without a re-read.
+    async fn commit_changes_with_actor_with_expected(
+        &mut self,
+        changes: &[ManifestChange],
+        expected_table_versions: &HashMap<String, u64>,
+        actor_id: Option<&str>,
+    ) -> Result<PublishedSnapshot> {
+        self.ensure_commit_graph_initialized().await?;
+        let intent = self.new_lineage_intent(actor_id, None)?;
+        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_BEFORE_COMMIT_APPEND)?;
+        let outcome = self
+            .manifest
+            .commit_changes_with_lineage(changes, expected_table_versions, Some(&intent))
+            .await?;
+        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_AFTER_MANIFEST_COMMIT)?;
+        let snapshot_id = self.apply_lineage_to_cache(intent, &outcome);
        Ok(PublishedSnapshot {
-            manifest_version,
+            manifest_version: outcome.version,
            _snapshot_id: snapshot_id,
        })
    }

-    pub(crate) async fn record_graph_commit(
+    /// Publish a branch-merge: `updates` (the merged table versions) plus the
+    /// merge commit, in one manifest CAS (RFC-013 Phase 7). The merge commit's
+    /// merged-in parent is `merged_parent_commit_id` (the source head, stable);
+    /// its first parent is resolved by the publisher as the current target-branch
+    /// head — the live head, which is the post-merge correct parent even if the
+    /// target advanced since the merge began.
+    pub(crate) async fn commit_merge_with_actor(
        &mut self,
-        manifest_version: u64,
-        actor_id: Option<&str>,
-    ) -> Result<SnapshotId> {
-        self.ensure_commit_graph_initialized().await?;
-        let current_branch = self.current_branch().map(str::to_string);
-        let Some(commit_graph) = &mut self.commit_graph else {
-            return Ok(SnapshotId::synthetic(
-                current_branch.as_deref(),
-                manifest_version,
-                self.manifest_incarnation().e_tag.as_deref(),
-            ));
-        };
-        failpoints::maybe_fail("graph_publish.before_commit_append")?;
-        // Refresh the commit-graph head from storage before selecting the
-        // parent. `append_commit` parents the new commit on the IN-MEMORY head
-        // (`head_commit_id`, zero storage read), but the manifest was just
-        // committed against a freshly rebased pin (`commit_all` opens a fresh
-        // coordinator) while THIS coordinator's cached head may be stale because
-        // an external writer advanced the branch. Without this refresh a
-        // same-branch write after an external commit appends off the stale head
-        // and FORKS the commit DAG (the new commit and the external commit
-        // sharing a parent). Refreshing makes the parent the true current head;
-        // the just-committed manifest version has no commit-graph row yet, so the
-        // fresh head is exactly the prior commit. (record_merge_commit is
-        // unaffected — it passes explicit parents, never the cached head.)
-        commit_graph.refresh().await?;
-        let graph_commit_id = commit_graph
-            .append_commit(current_branch.as_deref(), manifest_version, actor_id)
-            .await?;
-        Ok(SnapshotId::new(graph_commit_id))
-    }
-
-    pub(crate) async fn record_merge_commit(
-        &mut self,
-        manifest_version: u64,
-        parent_commit_id: &str,
+        updates: &[SubTableUpdate],
        merged_parent_commit_id: &str,
        actor_id: Option<&str>,
    ) -> Result<SnapshotId> {
        self.ensure_commit_graph_initialized().await?;
-        let current_branch = self.current_branch().map(str::to_string);
-        let commit_graph = self.commit_graph.as_mut().ok_or_else(|| {
-            OmniError::manifest("branch merge requires _graph_commits.lance".to_string())
-        })?;
-        failpoints::maybe_fail("graph_publish.before_commit_append")?;
-        let graph_commit_id = commit_graph
-            .append_merge_commit(
-                current_branch.as_deref(),
-                manifest_version,
-                parent_commit_id,
-                merged_parent_commit_id,
-                actor_id,
-            )
+        let intent =
+            self.new_lineage_intent(actor_id, Some(merged_parent_commit_id.to_string()))?;
+        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_BEFORE_COMMIT_APPEND)?;
+        let changes = updates_to_changes(updates);
+        let outcome = self
+            .manifest
+            .commit_changes_with_lineage(&changes, &HashMap::new(), Some(&intent))
            .await?;
-        Ok(SnapshotId::new(graph_commit_id))
+        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_AFTER_MANIFEST_COMMIT)?;
+        Ok(self.apply_lineage_to_cache(intent, &outcome))
+    }
+
+    /// Mint a [`LineageIntent`] for the next commit on the current branch: a
+    /// fresh ULID (stable across the publisher's CAS retries) and a timestamp.
+    /// The parent is NOT chosen here — the publisher resolves it per attempt
+    /// against the manifest it commits against.
+    fn new_lineage_intent(
+        &self,
+        actor_id: Option<&str>,
+        merged_parent_commit_id: Option<String>,
+    ) -> Result<crate::db::manifest::LineageIntent> {
+        Ok(crate::db::manifest::LineageIntent {
+            graph_commit_id: ulid::Ulid::new().to_string(),
+            branch: self.current_branch().map(str::to_string),
+            actor_id: actor_id.map(str::to_string),
+            merged_parent_commit_id,
+            created_at: crate::db::now_micros()?,
+        })
+    }
+
+    /// Insert the just-published commit into the in-memory commit cache from the
+    /// intent + the publisher-resolved parent + the new manifest version. No
+    /// storage I/O: the durable write already happened in the publish CAS, and
+    /// this keeps a same-handle read's `head_commit_id` consistent with the
+    /// snapshot it just advanced. Falls back to a synthetic id only when the
+    /// commit graph is somehow absent (never on a real write).
+    fn apply_lineage_to_cache(
+        &mut self,
+        intent: crate::db::manifest::LineageIntent,
+        outcome: &crate::db::manifest::CommitOutcome,
+    ) -> SnapshotId {
+        let Some(commit_graph) = &mut self.commit_graph else {
+            return SnapshotId::synthetic(
+                self.bound_branch.as_deref(),
+                outcome.version,
+                self.manifest.incarnation().e_tag.as_deref(),
+            );
+        };
+        let commit = GraphCommit {
+            graph_commit_id: intent.graph_commit_id.clone(),
+            manifest_branch: intent.branch,
+            manifest_version: outcome.version,
+            parent_commit_id: outcome.parent_commit_id.clone(),
+            merged_parent_commit_id: intent.merged_parent_commit_id,
+            actor_id: intent.actor_id,
+            created_at: intent.created_at,
+        };
+        commit_graph.insert_committed(commit);
+        SnapshotId::new(intent.graph_commit_id)
    }

    async fn open_commit_graph_for_branch(
@ -625,6 +633,15 @@ fn graph_commits_uri(root_uri: &str) -> String {
    join_uri(root_uri, GRAPH_COMMITS_DIR)
 }

+/// Wrap each `SubTableUpdate` as a `ManifestChange::Update` for the publisher.
+fn updates_to_changes(updates: &[SubTableUpdate]) -> Vec<ManifestChange> {
+    updates
+        .iter()
+        .cloned()
+        .map(ManifestChange::Update)
+        .collect()
+}
+
 fn normalize_branch_name(branch: &str) -> Result<Option<String>> {
    let branch = branch.trim();
    if branch.is_empty() {
--- a/crates/omnigraph/src/db/manifest.rs
+++ b/crates/omnigraph/src/db/manifest.rs
@ -35,7 +35,9 @@ pub(crate) use metadata::TableVersionMetadata;
 use metadata::{OMNIGRAPH_ROW_COUNT_KEY, table_version_metadata_for_state};
 #[cfg(test)]
 use namespace::{branch_manifest_namespace, staged_table_namespace};
-use publisher::{GraphNamespacePublisher, ManifestBatchPublisher};
+pub(crate) use migrations::refuse_if_stamp_unsupported;
+pub(crate) use publisher::LineageIntent;
+use publisher::{GraphNamespacePublisher, ManifestBatchPublisher, PublishOutcome};
 pub(crate) use recovery::{
    RecoveryMode, RecoverySidecar, RecoverySidecarHandle, SidecarKind, SidecarTablePin,
    SidecarTableRegistration, SidecarTombstone, confirm_sidecar_phase_b, delete_sidecar,
@ -43,6 +45,7 @@ pub(crate) use recovery::{
    recover_manifest_drift, schema_apply_serial_queue_key, write_sidecar,
 };
 pub use state::SubTableEntry;
+pub(crate) use state::{GraphLineageRow, read_graph_lineage};
 #[cfg(test)]
 use state::string_column;
 use state::{ManifestState, read_manifest_state};
@ -50,8 +53,34 @@ use state::{ManifestState, read_manifest_state};
 const OBJECT_TYPE_TABLE: &str = "table";
 const OBJECT_TYPE_TABLE_VERSION: &str = "table_version";
 const OBJECT_TYPE_TABLE_TOMBSTONE: &str = "table_tombstone";
+/// Immutable per-commit graph-lineage row (RFC-013 Phase 7). One row per graph
+/// commit; the projected form reconstructs a [`GraphCommit`]. `__manifest` is
+/// the single source — written in the same publish CAS as the table-version
+/// rows (no `_graph_commits.lance` row).
+const OBJECT_TYPE_GRAPH_COMMIT: &str = "graph_commit";
+/// Mutable per-branch head pointer for the graph lineage (RFC-013 Phase 7).
+/// `object_id` is `graph_head:<branch>` (`graph_head:main` for the main branch).
+const OBJECT_TYPE_GRAPH_HEAD: &str = "graph_head";
 const TABLE_VERSION_MANAGEMENT_KEY: &str = "table_version_management";

+/// Stable head-key segment for the main branch in `graph_head:<branch>` rows.
+/// `table_branch`/`manifest_branch` encode main as null, but `object_id` must be
+/// non-null, so the head row needs a literal — matching the `"main"` sentinel
+/// already used by `SnapshotId::synthetic` and `open_for_branch`.
+pub(crate) const MAIN_BRANCH_HEAD_KEY: &str = "main";
+
+/// The result of a manifest commit that may have folded in a graph commit
+/// (RFC-013 Phase 7).
+#[derive(Debug, Clone)]
+pub(crate) struct CommitOutcome {
+    /// The new `__manifest` version after the publish.
+    pub version: u64,
+    /// The parent the publisher resolved for the recorded commit, or `None` when
+    /// no lineage was recorded or the commit is the genesis. Lets the caller
+    /// update its in-memory commit cache without re-reading the manifest.
+    pub parent_commit_id: Option<String>,
+}
+
 /// Apply pending internal-schema migrations against `__manifest` on the
 /// open-for-write path, independent of a publish.
 ///
@ -65,7 +94,105 @@ const TABLE_VERSION_MANAGEMENT_KEY: &str = "table_version_management";
 /// Idempotent: a no-op stamp read when the on-disk version already matches.
 pub(crate) async fn migrate_on_open(root_uri: &str) -> Result<()> {
    let mut dataset = open_manifest_dataset(root_uri, None).await?;
-    migrations::migrate_internal_schema(&mut dataset).await
+    // Main branch: the v3→v4 lineage backfill reads `_graph_commits.lance` at
+    // main. Named branches migrate on their own first write via the publisher.
+    migrations::migrate_internal_schema(&mut dataset, root_uri, None).await
+}
+
+/// The on-disk internal-schema stamp of `__manifest` at `branch` (main when
+/// `None`). The transitional v3-read fallback in `CommitGraph` uses this to
+/// decide whether to source lineage from `__manifest` (stamp ≥ v4, post-Phase-7)
+/// or from the legacy `_graph_commits.lance` (stamp < v4, not yet migrated).
+pub(crate) async fn internal_schema_stamp_at(root_uri: &str, branch: Option<&str>) -> Result<u32> {
+    let dataset = open_manifest_dataset(root_uri, branch).await?;
+    Ok(migrations::read_stamp(&dataset))
+}
+
+/// Refuse to open a graph whose `__manifest` is stamped outside this binary's
+/// supported internal-schema range (newer than CURRENT, or older than
+/// MIN_SUPPORTED). The read-only open path calls this — it skips the write-path
+/// migration where the refusal otherwise lives — so an old binary still refuses a
+/// newer graph instead of silently misreading it, and a too-new binary refuses a
+/// below-floor graph instead of opening an unmigrated one.
+pub(crate) async fn refuse_if_internal_schema_unsupported(root_uri: &str) -> Result<()> {
+    let stamp = internal_schema_stamp_at(root_uri, None).await?;
+    migrations::refuse_if_stamp_unsupported(stamp)
+}
+
+/// The internal-schema version this binary writes. Exposed so the v3-read
+/// fallback can compare a branch's on-disk stamp against it.
+pub(crate) const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 =
+    migrations::INTERNAL_MANIFEST_SCHEMA_VERSION;
+
+/// Test-only: create a `__manifest` for a minimal catalog, the first half of a
+/// synthetic pre-Phase-7 (v3) graph (see `commit_graph::seed_legacy_v3_lineage`).
+/// A small two-type schema is enough — the v3→v4 migration touches only the
+/// lineage rows, never the table-version rows.
+#[cfg(any(test, feature = "failpoints"))]
+pub(crate) async fn seed_manifest_for_v3_fixture(root_uri: &str) -> Result<()> {
+    let schema = omnigraph_compiler::schema::parser::parse_schema(
+        "node Person { name: String }\nedge Knows: Person -> Person { }\n",
+    )
+    .map_err(|e| OmniError::manifest(e.to_string()))?;
+    let catalog =
+        omnigraph_compiler::catalog::build_catalog(&schema).map_err(|e| OmniError::manifest(e.to_string()))?;
+    ManifestCoordinator::init(root_uri, &catalog).await?;
+    Ok(())
+}
+
+/// Test-only: strip the `graph_commit`/`graph_head` rows that Phase-7 init folds
+/// into `__manifest`, then rewind the internal-schema stamp to v3 — completing a
+/// synthetic pre-Phase-7 graph whose lineage lives only in `_graph_commits.lance`.
+#[cfg(any(test, feature = "failpoints"))]
+pub(crate) async fn strip_lineage_and_set_v3_stamp_for_fixture(root_uri: &str) -> Result<()> {
+    let mut dataset = open_manifest_dataset(root_uri, None).await?;
+    dataset
+        .delete("object_type = 'graph_commit' OR object_type = 'graph_head'")
+        .await
+        .map_err(|e| OmniError::Lance(e.to_string()))?;
+    // Re-open so the stamp write lands on the post-delete HEAD.
+    let mut dataset = open_manifest_dataset(root_uri, None).await?;
+    migrations::set_stamp_for_test(&mut dataset, 3).await
+}
+
+/// Test-only: fork a real Lance branch `name` on `__manifest` from main's CURRENT
+/// state. Call AFTER `strip_lineage_and_set_v3_stamp_for_fixture` so the forked
+/// branch inherits the v3 stamp with no lineage rows — i.e. a faithful
+/// pre-Phase-7 branch whose `__manifest` carries no lineage of its own. The
+/// branch's commits live only on the `_graph_commits.lance` branch until the
+/// per-branch v3→v4 migration runs against this branch's `__manifest`.
+#[cfg(test)]
+pub(crate) async fn fork_manifest_branch_for_v3_fixture(root_uri: &str, name: &str) -> Result<()> {
+    let mut dataset = open_manifest_dataset(root_uri, None).await?;
+    let version = dataset.version().version;
+    dataset
+        .create_branch(name, version, None)
+        .await
+        .map_err(|e| OmniError::Lance(e.to_string()))?;
+    Ok(())
+}
+
+/// Test-support re-export of the read-write migration entry point for the
+/// `failpoints` integration binary (which can't reach `pub(crate)` items). Gated
+/// on `test` OR `failpoints`; never in a release build.
+#[cfg(any(test, feature = "failpoints"))]
+pub async fn migrate_on_open_for_test(root_uri: &str) -> Result<()> {
+    migrate_on_open(root_uri).await
+}
+
+/// Test-support: the number of `graph_commit` lineage rows in `__manifest` at
+/// `branch` (main when `None`), plus the on-disk internal-schema stamp. Lets the
+/// `failpoints` integration binary assert the migration neither stamped nor
+/// backfilled when a legacy-open fault fired. Gated on `test` OR `failpoints`.
+#[cfg(any(test, feature = "failpoints"))]
+pub async fn lineage_row_count_and_stamp_for_test(
+    root_uri: &str,
+    branch: Option<&str>,
+) -> Result<(usize, u32)> {
+    let dataset = open_manifest_dataset(root_uri, branch).await?;
+    let stamp = migrations::read_stamp(&dataset);
+    let (rows, _heads) = read_graph_lineage(&dataset).await?;
+    Ok((rows.len(), stamp))
 }

 /// Immutable point-in-time view of the database.
@ -313,6 +440,9 @@ impl ManifestCoordinator {
    /// Create a new graph at `root_uri` from a catalog.
    ///
    /// Creates per-type Lance datasets and the namespace `__manifest` table.
+    /// The genesis graph commit is folded into the init write, so `__manifest`
+    /// is the single source of graph lineage from version one — callers read it
+    /// back through the lineage projection rather than via a second write.
    pub async fn init(root_uri: &str, catalog: &Catalog) -> Result<Self> {
        let root = root_uri.trim_end_matches('/');
        let (dataset, known_state) = init_manifest_graph(root, catalog).await?;
@ -419,17 +549,58 @@ impl ManifestCoordinator {
        changes: &[ManifestChange],
        expected_table_versions: &HashMap<String, u64>,
    ) -> Result<u64> {
-        if changes.is_empty() && expected_table_versions.is_empty() {
-            return Ok(self.version());
+        Ok(self
+            .commit_changes_with_lineage(changes, expected_table_versions, None)
+            .await?
+            .version)
+    }
+
+    /// Publish `changes` and, when `lineage` is present, record the graph commit
+    /// in the SAME merge-insert (RFC-013 Phase 7). `__manifest` is the single
+    /// source of graph lineage: the `graph_commit` + `graph_head:<branch>` rows
+    /// ride the table-version publish so the whole commit lands at one manifest
+    /// version — no separate write, no manifest→commit-graph atomicity gap, no
+    /// per-write commit-graph refresh. Returns the new version and the parent the
+    /// publisher resolved for the commit (so the caller can update its in-memory
+    /// commit cache without a re-read).
+    pub(crate) async fn commit_changes_with_lineage(
+        &mut self,
+        changes: &[ManifestChange],
+        expected_table_versions: &HashMap<String, u64>,
+        lineage: Option<&LineageIntent>,
+    ) -> Result<CommitOutcome> {
+        if changes.is_empty() && expected_table_versions.is_empty() && lineage.is_none() {
+            return Ok(CommitOutcome {
+                version: self.version(),
+                parent_commit_id: None,
+            });
        }

-        self.dataset = self
+        let PublishOutcome {
+            dataset,
+            parent_commit_id,
+        } = self
            .publisher
-            .publish(changes, expected_table_versions)
+            .publish(changes, expected_table_versions, lineage)
            .await?;
+        self.dataset = dataset;

        self.known_state = read_manifest_state(&self.dataset).await?;
-        Ok(self.version())
+        Ok(CommitOutcome {
+            version: self.version(),
+            parent_commit_id,
+        })
+    }
+
+    /// Project the graph-lineage rows out of `__manifest` at `branch` without an
+    /// open coordinator. Opens the manifest fresh; used by `CommitGraph` to
+    /// source its in-memory cache from the manifest projection.
+    pub(crate) async fn read_graph_lineage_at(
+        root_uri: &str,
+        branch: Option<&str>,
+    ) -> Result<(Vec<GraphLineageRow>, HashMap<String, String>)> {
+        let dataset = open_manifest_dataset(root_uri, branch).await?;
+        read_graph_lineage(&dataset).await
    }

    /// Current manifest version.
--- a/crates/omnigraph/src/db/manifest/graph.rs
+++ b/crates/omnigraph/src/db/manifest/graph.rs
@ -14,9 +14,17 @@ use super::layout::{manifest_uri, open_manifest_dataset, type_name_hash};
 use super::metadata::TableVersionMetadata;
 use super::migrations::stamp_current_version;
 use super::state::{
-    ManifestState, SubTableEntry, entries_to_batch, manifest_schema, read_manifest_state,
+    GraphLineageRow, ManifestState, SubTableEntry, entries_to_batch, graph_lineage_row_parts,
+    manifest_schema, read_manifest_state,
 };

+/// The manifest version the init `Dataset::write` produces (Lance datasets start
+/// at version one). The genesis graph commit pins this version — a snapshot at
+/// it is the empty, freshly-initialized graph. The two config-only commits that
+/// follow (`update_config`, `stamp_current_version`) advance the live manifest
+/// version but add no table data, so genesis correctly stays pinned at one.
+const GENESIS_MANIFEST_VERSION: u64 = 1;
+
 pub(super) async fn init_manifest_graph(
    root_uri: &str,
    catalog: &Catalog,
@ -24,7 +32,21 @@ pub(super) async fn init_manifest_graph(
    let root = root_uri.trim_end_matches('/');
    let (entries, version_metadata) = build_initial_entries(root, catalog).await?;

-    let manifest_batch = entries_to_batch(&entries, &version_metadata)?;
+    // Genesis graph commit: parentless, actorless, minted once and folded into
+    // the init write so `__manifest` is the single source of graph lineage from
+    // version one (no `_graph_commits.lance` row, no separate publish).
+    let genesis = GraphLineageRow {
+        graph_commit_id: ulid::Ulid::new().to_string(),
+        manifest_branch: None,
+        manifest_version: GENESIS_MANIFEST_VERSION,
+        parent_commit_id: None,
+        merged_parent_commit_id: None,
+        actor_id: None,
+        created_at: crate::db::now_micros()?,
+    };
+    let genesis_lineage = graph_lineage_row_parts(&genesis, None)?;
+
+    let manifest_batch = entries_to_batch(&entries, &version_metadata, &genesis_lineage)?;
    let schema = manifest_schema();
    let reader = RecordBatchIterator::new(vec![Ok(manifest_batch)], schema);
    let params = WriteParams {
--- a/crates/omnigraph/src/db/manifest/migrations.rs
+++ b/crates/omnigraph/src/db/manifest/migrations.rs
@ -37,6 +37,9 @@ use lance::Dataset;

 use crate::error::{OmniError, Result};

+use crate::db::commit_graph::GraphCommit;
+use super::state::{GraphLineageRow, graph_lineage_row_parts, merge_lineage_rows, read_graph_lineage};
+
 /// Current internal schema version this binary expects to find on disk.
 ///
 /// History:
@ -50,14 +53,62 @@ use crate::error::{OmniError, Result};
 ///   `__manifest` dataset by the pre-v0.4.0 Run state machine (removed in
 ///   MR-771). Once swept, the `is_internal_run_branch` defense-in-depth guard
 ///   is no longer needed (MR-770).
-pub(super) const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 3;
+/// - v4 — RFC-013 Phase 7 folds graph lineage into `__manifest` as
+///   `graph_commit`/`graph_head` rows written in the publish CAS. A pre-Phase-7
+///   (v3) graph has its lineage only in `_graph_commits.lance`, so the new
+///   binary would read an empty commit DAG. This one-time per-branch backfill
+///   copies the lineage from `_graph_commits.lance` into `__manifest`
+///   (`migrate_v3_to_v4`). `_graph_commits.lance` is left in place as the
+///   branch-ref carrier; no commit rows are ever written to it again.
+pub(crate) const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 4;
+
+/// The oldest on-disk internal-schema stamp this binary will open. A graph below
+/// this floor is refused (`refuse_if_stamp_unsupported`) with a "migrate it
+/// forward with an older release first" error, instead of obliging this binary to
+/// carry that version's `migrate_vN_…` arm and the legacy readers it needs
+/// forever. Raising the floor is how the migration chain sheds old code.
+///
+/// **Retirement runbook** — turning "accumulates forever" into a sliding window:
+/// 1. *Shed version N* once no graph below `N+1` remains in the fleet: bump this
+///    floor AND `LOWEST_REGISTERED_MIGRATION_SOURCE` to `N+1`, then delete the
+///    `N =>` arm in `migrate_internal_schema`, `migrate_vN_to_vN+1`, and its
+///    helpers + tests. The tripwire test keeps the two consts in lockstep, so a
+///    half-done shed fails CI.
+/// 2. *Retire the v3 legacy readers entirely* once MIN ≥ 4: `git rm` the
+///    `commit_graph/commit_graph_legacy_v3.rs` seam file and flip the single
+///    `stamp < CURRENT` gate in `load_commit_cache_for_branch` to read the
+///    manifest projection unconditionally.
+///
+/// MIN = 1 today is a pure no-op: `read_stamp` floors an absent stamp at 1 and no
+/// real graph carries 0, so nothing is refused.
+pub(crate) const MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION: u32 = 1;
+
+/// The lowest `current` value the `migrate_internal_schema` dispatcher still has a
+/// `match` arm for. Mirrors the lowest registered migration source so a floor bump
+/// that forgets to delete the now-dead arm (or vice versa) is caught by the
+/// compile-time tripwire below. Migration arms aren't an enumerable registry, so
+/// this hand-mirrored const is the minimal enforced coupling — cheaper than
+/// reshaping the dispatcher into a data-driven table.
+const LOWEST_REGISTERED_MIGRATION_SOURCE: u32 = 1;
+
+/// Retirement tripwire (compile-time): the refusal floor and the lowest migration
+/// arm must move together. Raising `MIN_SUPPORTED` without deleting the now-dead
+/// below-floor arm — or vice versa — fails the build with this message, which is
+/// stronger than a runtime test and impossible to skip. Migration arms can't be
+/// enumerated, so this const-mirror is the check.
+const _: () = assert!(
+    LOWEST_REGISTERED_MIGRATION_SOURCE == MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION,
+    "internal-schema floor drifted from the lowest registered migration arm: when raising \
+     MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION, delete every below-floor `N =>` arm + migrate_vN_… \
+     + its helpers/tests and bump LOWEST_REGISTERED_MIGRATION_SOURCE to match (or vice versa)",
+);

 const INTERNAL_SCHEMA_VERSION_KEY: &str = "omnigraph:internal_schema_version";
 const OBJECT_ID_PK_KEY: &str = "lance-schema:unenforced-primary-key";

 /// Read the on-disk stamp from `__manifest`'s schema-level metadata.
 /// Absent ⇒ v1 (pre-stamp world).
-pub(super) fn read_stamp(dataset: &Dataset) -> u32 {
+pub(crate) fn read_stamp(dataset: &Dataset) -> u32 {
    dataset
        .schema()
        .metadata
@ -72,20 +123,52 @@ pub(super) async fn stamp_current_version(dataset: &mut Dataset) -> Result<()> {
    set_stamp(dataset, INTERNAL_MANIFEST_SCHEMA_VERSION).await
 }

+/// Refuse to open a manifest whose stamp this binary cannot serve — in either
+/// direction — with a clear upgrade path. Shared by every place a stamp is read
+/// and enforced: the write-path migration dispatcher, the read-only open guard,
+/// and the branch lineage-read path. Checking both bounds in one function means a
+/// new stamp-reading caller gets the floor and the ceiling together and cannot
+/// half-enforce.
+///
+/// - `stamp > CURRENT`: the graph was written by a newer binary — upgrade omnigraph.
+/// - `stamp < MIN_SUPPORTED`: the graph predates the oldest migration this binary
+///   still carries — migrate it forward with an older release first, then reopen.
+pub(crate) fn refuse_if_stamp_unsupported(stamp: u32) -> Result<()> {
+    if stamp > INTERNAL_MANIFEST_SCHEMA_VERSION {
+        return Err(OmniError::manifest(format!(
+            "__manifest is stamped at internal schema v{} but this binary expects v{} \
+             — upgrade omnigraph before opening this graph",
+            stamp, INTERNAL_MANIFEST_SCHEMA_VERSION,
+        )));
+    }
+    if stamp < MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION {
+        return Err(OmniError::manifest(format!(
+            "__manifest is stamped at internal schema v{} but this binary supports v{} or later \
+             — open it with an older omnigraph release to migrate it forward first, then reopen",
+            stamp, MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION,
+        )));
+    }
+    Ok(())
+}
+
 /// Apply any pending internal-schema migrations to the manifest dataset.
 ///
 /// Idempotent: when the on-disk stamp matches the binary, this is a single
 /// metadata read with no writes.
-pub(super) async fn migrate_internal_schema(dataset: &mut Dataset) -> Result<()> {
+///
+/// `root_uri` + `branch` identify which graph + branch this `dataset` is a
+/// manifest for. The v3→v4 lineage backfill needs them to read that branch's
+/// `_graph_commits.lance`. `migrate_on_open` passes the main branch
+/// (`branch = None`); the publisher's `load_publish_state` passes its own
+/// branch, so each branch backfills on its first write.
+pub(super) async fn migrate_internal_schema(
+    dataset: &mut Dataset,
+    root_uri: &str,
+    branch: Option<&str>,
+) -> Result<()> {
    let mut current = read_stamp(dataset);

-    if current > INTERNAL_MANIFEST_SCHEMA_VERSION {
-        return Err(OmniError::manifest(format!(
-            "__manifest is stamped at internal schema v{} but this binary expects v{} \
-             — upgrade omnigraph before opening this graph for writes",
-            current, INTERNAL_MANIFEST_SCHEMA_VERSION,
-        )));
-    }
+    refuse_if_stamp_unsupported(current)?;

    while current < INTERNAL_MANIFEST_SCHEMA_VERSION {
        match current {
@ -97,6 +180,10 @@ pub(super) async fn migrate_internal_schema(dataset: &mut Dataset) -> Result<()>
                migrate_v2_to_v3(dataset).await?;
                current = 3;
            }
+            3 => {
+                migrate_v3_to_v4(dataset, root_uri, branch).await?;
+                current = 4;
+            }
            other => {
                return Err(OmniError::manifest_internal(format!(
                    "no internal-schema migration registered for v{} → v{}",
@ -202,6 +289,218 @@ async fn migrate_v2_to_v3(dataset: &mut Dataset) -> Result<()> {
    set_stamp(dataset, 3).await
 }

+/// v3 → v4: backfill the graph lineage from `_graph_commits.lance` into
+/// `__manifest`, then bump the stamp.
+///
+/// RFC-013 Phase 7 made `__manifest` the single source of graph lineage
+/// (`graph_commit` / `graph_head:<branch>` rows, written in the publish CAS).
+/// A pre-Phase-7 (v3) graph has its lineage only in `_graph_commits.lance` and
+/// none in `__manifest`, so the new binary would read an EMPTY commit DAG. This
+/// one-time per-branch migration copies that branch's commits + the single head
+/// into `__manifest` so reads see the real history. `_graph_commits.lance`
+/// itself is left untouched as the branch-ref carrier (no commit row is ever
+/// written to it again).
+///
+/// `dataset` is the `__manifest` for `branch` (main when `branch` is `None`);
+/// the migration runs per-branch on that branch's first write, so it reads
+/// `_graph_commits.lance` at the SAME branch.
+///
+/// Idempotency + crash recovery: the stamp bump is the LAST step, and the
+/// lineage merge is keyed on `object_id` (re-inserting the same commit rows is a
+/// no-op update). A crash after the merge but before the stamp bump re-enters
+/// here at v3 and re-runs harmlessly. As a fast path, if `__manifest` already
+/// carries `graph_commit` rows (a previous run completed the merge), we skip
+/// straight to the stamp bump.
+///
+/// Concurrent runners: two processes (or two open-for-write handles) can open the
+/// same legacy graph at once and both reach the backfill merge. `merge_lineage_rows`
+/// uses `conflict_retries(0)`, so the row-level CAS loser on `graph_head:<branch>`
+/// must be re-driven here rather than failing the open — `migrate_v2_to_v3` is
+/// concurrent-runner idempotent and this step must be too. The bounded loop
+/// re-reads the fast path (a concurrent winner's merge is one atomic Lance commit,
+/// so a re-read sees either zero or all of its rows, never partial), re-opens the
+/// stale handle past the winner's commit, and retries. On budget exhaustion it
+/// returns a `RowLevelCasContention`-typed error so the publisher's OUTER retry
+/// loop (which only re-runs `is_retryable_publish_conflict` conflicts) completes
+/// it on the next attempt — the same converge-on-next-attempt contract the
+/// recovery sweep uses.
+async fn migrate_v3_to_v4(
+    dataset: &mut Dataset,
+    root_uri: &str,
+    branch: Option<&str>,
+) -> Result<()> {
+    // Mirror the publisher's budget (`publisher::PUBLISHER_RETRY_BUDGET = 5`); kept
+    // as a local const rather than re-exporting that private one — the two are the
+    // same shape (bounded row-level-CAS retries) but independent knobs.
+    const MIGRATION_MERGE_RETRY_BUDGET: u32 = 5;
+
+    // Exclusive range + an unguarded retryable arm (see `commit_v4_stamp_idempotently`
+    // for the rationale): every retryable conflict re-opens and retries inside the
+    // loop, and the SINGLE reachable exhaustion path is the typed contention return
+    // below — so the retryable variant can never fall through to the `Err(err)`
+    // propagate arm on the last iteration.
+    for _ in 0..MIGRATION_MERGE_RETRY_BUDGET {
+        // Fast path / idempotency + concurrent-winner guard: if the backfill
+        // already landed (a previous run, OR a concurrent runner that won the CAS
+        // — its merge is atomic, so this is all-or-nothing), don't re-merge — just
+        // (re)stamp. `dataset` is re-opened past any winner's commit below, so this
+        // re-read sees the winner's rows on a retry.
+        let (existing_lineage, _heads) = read_graph_lineage(dataset).await?;
+        if !existing_lineage.is_empty() {
+            return commit_v4_stamp_idempotently(dataset, root_uri, branch).await;
+        }
+
+        // Read this branch's legacy commit cache (commits + the head). An absent or
+        // empty `_graph_commits.lance` yields no commits — nothing to backfill.
+        let (commit_by_id, head) =
+            crate::db::commit_graph::read_legacy_commit_cache(root_uri, branch).await?;
+        if commit_by_id.is_empty() {
+            return commit_v4_stamp_idempotently(dataset, root_uri, branch).await;
+        }
+
+        let parts = build_lineage_backfill_parts(&commit_by_id, head.as_ref(), branch)?;
+
+        match merge_lineage_rows(dataset.clone(), &parts).await {
+            Ok(new_dataset) => {
+                *dataset = new_dataset;
+                // Stamp LAST. Crash window: a failure between the merge above and
+                // this stamp bump leaves stamp v3 + lineage present in `__manifest`.
+                // The next open re-enters at v3, the fast path at the top sees the
+                // lineage and skips straight to the stamp bump — completing the
+                // migration with no duplicate rows (the merge is keyed on
+                // `object_id`). Pinned by
+                // `crash_after_merge_before_stamp_completes_on_next_open`.
+                return commit_v4_stamp_idempotently(dataset, root_uri, branch).await;
+            }
+            // A concurrent runner won the `graph_head:<branch>` CAS. Our in-hand
+            // handle is stale at the pre-contention HEAD, so a re-open is required
+            // to see the winner's commit; then re-loop (the fast path will see the
+            // winner's lineage and stamp). Bounded by the budget.
+            Err(err) if super::publisher::is_retryable_publish_conflict(&err) => {
+                *dataset = super::layout::open_manifest_dataset(root_uri, branch).await?;
+                continue;
+            }
+            Err(err) => return Err(err),
+        }
+    }
+
+    // Budget exhausted under sustained contention. Return a CAS-typed error (not a
+    // plain conflict) so the publisher's outer retry loop — which only re-runs
+    // `is_retryable_publish_conflict` — re-runs `load_publish_state` and completes
+    // the migration, rather than giving up.
+    Err(OmniError::manifest_row_level_cas_contention(format!(
+        "v3→v4 lineage backfill exhausted {} retries against concurrent runners",
+        MIGRATION_MERGE_RETRY_BUDGET
+    )))
+}
+
+/// Stamp the v3→v4 migration's terminal version idempotently under concurrent
+/// runners. `set_stamp` issues an `UpdateConfig` Lance commit; once the merge CAS
+/// loser is made to converge (above), BOTH runners reach this stamp bump and race
+/// it — the loser gets `lance::Error::IncompatibleTransaction` (two `UpdateConfig`
+/// commits touching the same metadata key), which is NOT a row-level CAS
+/// contention and so is not caught by the merge loop. But both write the SAME
+/// value, so the conflict is benign: re-open and, if the stamp already reached the
+/// target (the concurrent runner finished it), succeed; otherwise re-apply.
+/// Bounded; on exhaustion surface a CAS-typed error for the publisher's outer
+/// retry, same as the merge loop.
+async fn commit_v4_stamp_idempotently(
+    dataset: &mut Dataset,
+    root_uri: &str,
+    branch: Option<&str>,
+) -> Result<()> {
+    const STAMP_RETRY_BUDGET: u32 = 5;
+    // Exclusive range + an UNGUARDED `IncompatibleTransaction` arm: the retryable
+    // variant is always handled inside the loop (re-open + same-value check + retry),
+    // so it can never fall through to the stringifying `Err(e)` catch-all, and the
+    // SINGLE reachable exhaustion path is the typed contention return below. (A
+    // `0..=BUDGET` range with an `attempt < BUDGET` guard let the last iteration's
+    // retryable conflict reach the catch-all and return a non-retryable
+    // `OmniError::Lance` — the publisher's outer retry would then give up.)
+    for _ in 0..STAMP_RETRY_BUDGET {
+        // Inline the `update_schema_metadata` write (rather than `set_stamp`) so the
+        // raw Lance error variant is in hand — `set_stamp` pre-stringifies it.
+        let stamp_result = stamp_internal_schema(dataset).await;
+        match stamp_result {
+            Ok(_) => return Ok(()),
+            Err(lance::Error::IncompatibleTransaction { .. }) => {
+                // A concurrent runner's `UpdateConfig` preempted ours — the
+                // retryable case. Re-open past its commit; if it already stamped to
+                // the target we're done (the value is identical), else fall through
+                // to retry on the advanced handle.
+                *dataset = super::layout::open_manifest_dataset(root_uri, branch).await?;
+                if read_stamp(dataset) >= INTERNAL_MANIFEST_SCHEMA_VERSION {
+                    return Ok(());
+                }
+            }
+            Err(e) => return Err(OmniError::Lance(e.to_string())),
+        }
+    }
+
+    // Exhausted the budget against sustained concurrent stampers. Return a
+    // CAS-typed (retryable) error so the publisher's OUTER retry — which only
+    // re-runs `is_retryable_publish_conflict` — completes it, rather than the
+    // stringified `OmniError::Lance` it would treat as fatal.
+    Err(OmniError::manifest_row_level_cas_contention(format!(
+        "v3→v4 stamp bump exhausted {} retries against concurrent runners",
+        STAMP_RETRY_BUDGET
+    )))
+}
+
+/// The single `update_schema_metadata` write that bumps the on-disk internal-schema
+/// stamp to the current version. Extracted from `commit_v4_stamp_idempotently`'s
+/// retry loop so a `failpoints` test can inject a concurrent-stamper
+/// `IncompatibleTransaction` deterministically (the loop's exhaustion path is
+/// otherwise near-unreachable). Returns the RAW `lance::Error` so the loop can match
+/// the `IncompatibleTransaction` variant — `set_stamp` pre-stringifies it.
+async fn stamp_internal_schema(dataset: &mut Dataset) -> std::result::Result<(), lance::Error> {
+    crate::failpoints::maybe_fail_lance_incompatible("migration.v4_stamp.force_incompatible")?;
+    dataset
+        .update_schema_metadata([(
+            INTERNAL_SCHEMA_VERSION_KEY.to_string(),
+            INTERNAL_MANIFEST_SCHEMA_VERSION.to_string(),
+        )])
+        .await
+        .map(|_| ())
+}
+
+/// Build the `__manifest` rows for the v3→v4 backfill: one immutable
+/// `graph_commit` row per commit, plus EXACTLY ONE `graph_head:<branch>` row for
+/// the actual head. Each commit encodes to a `[graph_commit, graph_head]` pair,
+/// but only the head commit's head row is kept — the others would be redundant
+/// updates of the same `graph_head:<branch>` object_id (the head is per-branch,
+/// not per-commit).
+fn build_lineage_backfill_parts(
+    commit_by_id: &std::collections::HashMap<String, GraphCommit>,
+    head: Option<&GraphCommit>,
+    branch: Option<&str>,
+) -> Result<Vec<super::state::GraphLineageRowPart>> {
+    let head_id = head.map(|h| h.graph_commit_id.as_str());
+    // Deterministic iteration order (the source is a HashMap): merge-insert is
+    // keyed on `object_id` so the final manifest content is order-independent,
+    // but a stable order keeps the produced batch reproducible regardless.
+    let mut commits: Vec<&GraphCommit> = commit_by_id.values().collect();
+    commits.sort_by(|a, b| a.graph_commit_id.cmp(&b.graph_commit_id));
+    let mut parts = Vec::with_capacity(commits.len() + 1);
+    for commit in commits {
+        let row = GraphLineageRow {
+            graph_commit_id: commit.graph_commit_id.clone(),
+            manifest_branch: commit.manifest_branch.clone(),
+            manifest_version: commit.manifest_version,
+            parent_commit_id: commit.parent_commit_id.clone(),
+            merged_parent_commit_id: commit.merged_parent_commit_id.clone(),
+            actor_id: commit.actor_id.clone(),
+            created_at: commit.created_at,
+        };
+        let [commit_part, head_part] = graph_lineage_row_parts(&row, branch)?;
+        parts.push(commit_part);
+        if Some(commit.graph_commit_id.as_str()) == head_id {
+            parts.push(head_part);
+        }
+    }
+    Ok(parts)
+}
+
 async fn set_stamp(dataset: &mut Dataset, version: u32) -> Result<()> {
    dataset
        .update_schema_metadata([(INTERNAL_SCHEMA_VERSION_KEY.to_string(), version.to_string())])
@ -209,3 +508,42 @@ async fn set_stamp(dataset: &mut Dataset, version: u32) -> Result<()> {
        .map_err(|e| OmniError::Lance(e.to_string()))?;
    Ok(())
 }
+
+/// Test-only: force the on-disk internal-schema stamp to `version`. Used to
+/// synthesize a pre-migration graph (rewinding to v3) and to simulate a crash
+/// that lost the final stamp bump. Gated on `test` OR `failpoints` so the
+/// fault-injection migration test (in the `failpoints` integration binary,
+/// compiled without `cfg(test)`) can reach it too.
+#[cfg(any(test, feature = "failpoints"))]
+pub(crate) async fn set_stamp_for_test(dataset: &mut Dataset, version: u32) -> Result<()> {
+    set_stamp(dataset, version).await
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    /// The floor never refuses any stamp the binary can actually serve — a graph
+    /// at MIN through CURRENT passes, only sub-MIN / super-CURRENT are rejected.
+    /// With MIN = 1 and CURRENT = 4 this proves the live range is exactly [1, 4]
+    /// and that the floor is a no-op for every real graph (lowest real stamp is 1).
+    #[test]
+    fn unsupported_guard_accepts_exactly_the_supported_range() {
+        for stamp in MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION..=INTERNAL_MANIFEST_SCHEMA_VERSION {
+            assert!(
+                refuse_if_stamp_unsupported(stamp).is_ok(),
+                "stamp v{stamp} is within [MIN, CURRENT] and must be accepted"
+            );
+        }
+        if MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION > 0 {
+            assert!(
+                refuse_if_stamp_unsupported(MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION - 1).is_err(),
+                "a sub-floor stamp must be refused"
+            );
+        }
+        assert!(
+            refuse_if_stamp_unsupported(INTERNAL_MANIFEST_SCHEMA_VERSION + 1).is_err(),
+            "a future stamp must be refused"
+        );
+    }
+}
--- a/crates/omnigraph/src/db/manifest/publisher.rs
+++ b/crates/omnigraph/src/db/manifest/publisher.rs
@ -35,8 +35,8 @@ use super::layout::{open_manifest_dataset, tombstone_object_id, version_object_i
 use super::metadata::parse_namespace_version_request;
 use super::migrations::migrate_internal_schema;
 use super::state::{
-    manifest_rows_batch, manifest_schema, read_manifest_entries, read_registered_table_locations,
-    read_tombstone_versions,
+    GraphLineageRow, GraphLineageRowPart, graph_lineage_row_parts, head_lineage_row,
+    manifest_rows_batch, manifest_schema, read_publish_scan,
 };
 use super::{
    ManifestChange, OBJECT_TYPE_TABLE, OBJECT_TYPE_TABLE_TOMBSTONE, OBJECT_TYPE_TABLE_VERSION,
@ -50,13 +50,48 @@ use super::{
 /// iteration re-runs `load_publish_state` and the expected-version pre-check.
 const PUBLISHER_RETRY_BUDGET: u32 = 5;

+/// The graph-lineage commit to record atomically with a manifest publish
+/// (RFC-013 Phase 7). One logical commit per publish: the `graph_commit_id` is
+/// minted once by the caller and stays stable across the publisher's CAS
+/// retries; only the parent re-resolves per attempt (against the freshly loaded
+/// `__manifest`), so a retry after a concurrent commit parents off the new head
+/// — the TOCTOU the dual-write era's `commit_graph.refresh()` guarded is closed
+/// by construction.
+#[derive(Debug, Clone)]
+pub(crate) struct LineageIntent {
+    /// ULID minted once before the publish loop; the graph commit's identity.
+    pub graph_commit_id: String,
+    /// The branch this commit lands on (`None` = main). Selects the
+    /// `graph_head:<branch>` pointer row that gets updated.
+    pub branch: Option<String>,
+    /// Authoring actor, or `None` for unauthored / system writes.
+    pub actor_id: Option<String>,
+    /// The merged-in source head — `Some` only for a branch-merge commit.
+    pub merged_parent_commit_id: Option<String>,
+    /// Commit timestamp (microseconds since the UNIX epoch).
+    pub created_at: i64,
+}
+
+/// The result of a manifest publish that may have folded in a graph commit.
+#[derive(Debug)]
+pub(super) struct PublishOutcome {
+    /// The advanced `__manifest` dataset (its version is the published version).
+    pub dataset: Dataset,
+    /// The parent the publisher resolved for the recorded commit, if a
+    /// [`LineageIntent`] was supplied. Returned so the caller can update its
+    /// in-memory commit cache without a re-read. `None` when no lineage was
+    /// recorded, or when the commit is the genesis (no parent).
+    pub parent_commit_id: Option<String>,
+}
+
 #[async_trait]
 pub(super) trait ManifestBatchPublisher: Send + Sync {
    async fn publish(
        &self,
        changes: &[ManifestChange],
        expected_table_versions: &HashMap<String, u64>,
-    ) -> Result<Dataset>;
+        lineage: Option<&LineageIntent>,
+    ) -> Result<PublishOutcome>;
 }

 pub(super) struct GraphNamespacePublisher {
@ -76,6 +111,19 @@ struct PendingVersionRow {
    row_count: Option<u64>,
 }

+/// Everything one CAS attempt needs out of a single `__manifest` scan
+/// (RFC-013 P2): the open dataset, table state for the pre-check + pending-row
+/// build, and the `graph_commit` lineage rows for parent resolution. Folding the
+/// lineage into this struct is what lets `resolve_lineage_rows` skip its own
+/// `read_graph_lineage` scan.
+struct LoadedPublishState {
+    dataset: Dataset,
+    registered_tables: HashMap<String, String>,
+    existing_versions: HashMap<(String, u64), SubTableEntry>,
+    existing_tombstones: HashMap<(String, u64), ()>,
+    lineage_rows: Vec<GraphLineageRow>,
+}
+
 impl GraphNamespacePublisher {
    pub(super) fn new(root_uri: &str, branch: Option<&str>) -> Self {
        Self {
@ -90,22 +138,31 @@ impl GraphNamespacePublisher {
        open_manifest_dataset(&self.root_uri, self.branch.as_deref()).await
    }

-    async fn load_publish_state(
-        &self,
-    ) -> Result<(
-        Dataset,
-        HashMap<String, String>,
-        HashMap<(String, u64), SubTableEntry>,
-        HashMap<(String, u64), ()>,
-    )> {
+    async fn load_publish_state(&self) -> Result<LoadedPublishState> {
+        // Test seam: inject a retryable contention here to exercise the outer
+        // retry loop's re-run-on-retryable-load-error path (no-op without the
+        // `failpoints` feature). The migration surfaces the same typed error.
+        crate::failpoints::maybe_fail_retryable_contention(
+            crate::failpoints::names::PUBLISH_LOAD_STATE_RETRYABLE_CONTENTION,
+        )?;
        let mut dataset = self.dataset().await?;
        // Run pending internal-schema migrations exactly once per publish on
        // the open-for-write path; idempotent when the on-disk stamp already
-        // matches this binary. See `db/manifest/migrations.rs`.
-        migrate_internal_schema(&mut dataset).await?;
-        let registered_tables = read_registered_table_locations(&dataset).await?;
-        let existing_entries = read_manifest_entries(&dataset).await?;
-        let existing_versions = existing_entries
+        // matches this binary. Pass this publisher's branch so the v3→v4 lineage
+        // backfill reads `_graph_commits.lance` at the SAME branch it is
+        // publishing to (each branch backfills on its first write). See
+        // `db/manifest/migrations.rs`.
+        migrate_internal_schema(&mut dataset, &self.root_uri, self.branch.as_deref()).await?;
+        // ONE `__manifest` scan for everything the publish needs: table
+        // locations, version entries, tombstones, AND the `graph_commit` lineage
+        // rows for parent resolution (RFC-013 P2). The lineage extraction rides
+        // this pass instead of a second `read_graph_lineage` scan in
+        // `resolve_lineage_rows`; the per-attempt re-read is preserved because
+        // `load_publish_state` runs once per CAS attempt, so a retry sees the
+        // advanced head and re-parents correctly.
+        let scan = read_publish_scan(&dataset).await?;
+        let existing_versions = scan
+            .version_entries
            .iter()
            .map(|entry| {
                (
@ -114,13 +171,14 @@ impl GraphNamespacePublisher {
                )
            })
            .collect();
-        let existing_tombstones = read_tombstone_versions(&dataset).await?;
-        Ok((
+        let existing_tombstones = scan.tombstones.into_iter().collect();
+        Ok(LoadedPublishState {
            dataset,
-            registered_tables,
+            registered_tables: scan.table_locations,
            existing_versions,
            existing_tombstones,
-        ))
+            lineage_rows: scan.lineage_rows,
+        })
    }

    fn build_pending_rows(
@ -266,6 +324,50 @@ impl GraphNamespacePublisher {
        Ok(rows)
    }

+    /// Resolve the parent for `intent` against the just-loaded `dataset` and
+    /// build the two lineage rows (`graph_commit` + `graph_head:<branch>`) to
+    /// fold into the publish batch. Runs INSIDE the CAS retry loop, so the
+    /// parent is read from the manifest state this attempt will commit against —
+    /// a retry after a concurrent commit re-reads the advanced head and parents
+    /// correctly (TOCTOU closed). `new_manifest_version` is the version this
+    /// publish produces (the recorded commit pins it).
+    ///
+    /// The parent is the current head of the branch's lineage — the
+    /// `should_replace_head` winner over the visible `graph_commit` rows, the
+    /// same selection the commit-graph cache uses. (The denormalized
+    /// `graph_head:<branch>` row is written for forward-compat but is not the
+    /// parent source here: a branch freshly forked from main inherits main's
+    /// commits but not yet a `graph_head:<its-name>` row, and the head-over-rows
+    /// computation gives the correct fork-point parent in that case.)
+    ///
+    /// `lineage_rows` is the `graph_commit` set this attempt already parsed in
+    /// `load_publish_state`'s single scan (RFC-013 P2) — NOT a fresh
+    /// `read_graph_lineage` scan. The per-attempt re-read is still preserved: the
+    /// retry loop re-runs `load_publish_state`, so each attempt's `lineage_rows`
+    /// reflects the head as it stands for that attempt.
+    fn resolve_lineage_rows(
+        lineage_rows: &[GraphLineageRow],
+        intent: &LineageIntent,
+        new_manifest_version: u64,
+    ) -> Result<(Vec<PendingVersionRow>, Option<String>)> {
+        let parent_commit_id = head_lineage_row(lineage_rows).map(|h| h.graph_commit_id.clone());
+
+        let commit = GraphLineageRow {
+            graph_commit_id: intent.graph_commit_id.clone(),
+            manifest_branch: intent.branch.clone(),
+            manifest_version: new_manifest_version,
+            parent_commit_id: parent_commit_id.clone(),
+            merged_parent_commit_id: intent.merged_parent_commit_id.clone(),
+            actor_id: intent.actor_id.clone(),
+            created_at: intent.created_at,
+        };
+        let parts = graph_lineage_row_parts(&commit, intent.branch.as_deref())?;
+        Ok((
+            parts.into_iter().map(lineage_part_to_pending).collect(),
+            parent_commit_id,
+        ))
+    }
+
    fn pending_rows_to_batch(rows: Vec<PendingVersionRow>) -> Result<arrow_array::RecordBatch> {
        let mut object_ids = Vec::with_capacity(rows.len());
        let mut object_types = Vec::with_capacity(rows.len());
@ -420,7 +522,25 @@ impl GraphNamespacePublisher {
                }))
            })
            .collect::<Result<Vec<_>>>()?;
-        self.publish(&changes, &HashMap::new()).await
+        Ok(self.publish(&changes, &HashMap::new(), None).await?.dataset)
+    }
+}
+
+/// Map a `state::GraphLineageRowPart` onto a `PendingVersionRow` so a graph
+/// commit's two lineage rows ride the same publish batch as the table-version
+/// rows (RFC-013 Phase 7). Lineage rows carry no table identity: `table_key` is
+/// the empty string (never matched by a real key) and `location`/`row_count`
+/// are null.
+fn lineage_part_to_pending(part: GraphLineageRowPart) -> PendingVersionRow {
+    PendingVersionRow {
+        object_id: part.object_id,
+        object_type: part.object_type.to_string(),
+        location: None,
+        metadata: Some(part.metadata),
+        table_key: String::new(),
+        table_version: part.table_version,
+        table_branch: part.table_branch,
+        row_count: None,
    }
 }

@ -429,7 +549,17 @@ impl GraphNamespacePublisher {
 /// merge-insert join key, annotated as an unenforced primary key on
 /// `__manifest`). Translate it to a typed manifest conflict so callers can
 /// match without parsing strings; everything else is opaque storage.
-fn map_lance_publish_error(err: LanceError) -> OmniError {
+///
+/// Shared (`pub(crate)`) with the v3→v4 lineage backfill
+/// (`state::merge_lineage_rows`), which issues its own `__manifest` merge-insert
+/// outside the publisher and must surface the SAME typed
+/// `RowLevelCasContention` so the migration's re-open retry loop can recognize a
+/// CAS loss. This is the merge-insert (`execute_reader`) conflict vocabulary
+/// only. It is deliberately NOT `optimize::is_retryable_lance_conflict`: that one
+/// also matches `CommitConflict`/`RetryableCommitConflict` from the COMPACTION
+/// commit path (`compact_files` -> `apply_commit`), which a row-level merge-insert
+/// never emits — folding it in here would match impossible variants.
+pub(crate) fn map_lance_publish_error(err: LanceError) -> OmniError {
    if matches!(err, LanceError::TooMuchWriteContention { .. }) {
        return OmniError::manifest_row_level_cas_contention(format!(
            "manifest publish lost a row-level CAS race: {}",
@ -445,14 +575,40 @@ impl ManifestBatchPublisher for GraphNamespacePublisher {
        &self,
        changes: &[ManifestChange],
        expected_table_versions: &HashMap<String, u64>,
-    ) -> Result<Dataset> {
-        if changes.is_empty() && expected_table_versions.is_empty() {
-            return self.dataset().await;
+        lineage: Option<&LineageIntent>,
+    ) -> Result<PublishOutcome> {
+        if changes.is_empty() && expected_table_versions.is_empty() && lineage.is_none() {
+            return Ok(PublishOutcome {
+                dataset: self.dataset().await?,
+                parent_commit_id: None,
+            });
        }

        for attempt in 0..=PUBLISHER_RETRY_BUDGET {
-            let (dataset, known_tables, existing_versions, existing_tombstones) =
-                self.load_publish_state().await?;
+            // `load_publish_state` runs the v3→v4 migration (`migrate_internal_schema`)
+            // on its first scan. The migration's bounded merge/stamp retries surface a
+            // retryable `RowLevelCasContention` on exhaustion EXPECTING this outer loop
+            // to re-run them — a re-run re-reads the manifest, by which point a
+            // concurrent winner has usually completed the migration (next scan is a
+            // no-op). Route a retryable load error through the SAME retry path as a
+            // retryable `merge_rows` conflict below, so that typed contention actually
+            // composes with the publisher retry instead of aborting the publish.
+            let loaded = match self.load_publish_state().await {
+                Ok(loaded) => loaded,
+                Err(err)
+                    if attempt < PUBLISHER_RETRY_BUDGET && is_retryable_publish_conflict(&err) =>
+                {
+                    continue;
+                }
+                Err(err) => return Err(err),
+            };
+            let LoadedPublishState {
+                dataset,
+                registered_tables: known_tables,
+                existing_versions,
+                existing_tombstones,
+                lineage_rows,
+            } = loaded;

            let latest_per_table =
                Self::latest_visible_per_table(&existing_versions, &existing_tombstones);
@ -461,19 +617,48 @@ impl ManifestBatchPublisher for GraphNamespacePublisher {
            // surfaced as `ExpectedVersionMismatch` rather than retried.
            Self::check_expected_table_versions(&latest_per_table, expected_table_versions)?;

-            if changes.is_empty() {
-                return Ok(dataset);
-            }
-
-            let rows = Self::build_pending_rows(
+            let mut rows = Self::build_pending_rows(
                changes,
                &known_tables,
                &existing_versions,
                &existing_tombstones,
            )?;

+            // Fold the graph commit into the SAME batch so table-version rows
+            // and lineage rows land in one merge-insert (one Lance commit, one
+            // manifest version) — no separate write, no manifest→commit-graph
+            // atomicity gap. The merge-insert advances exactly one version on
+            // top of the loaded dataset, so the commit pins
+            // `current + 1`. The parent is resolved here, per attempt, from the
+            // lineage rows THIS attempt's scan loaded (TOCTOU closed on a CAS
+            // retry — a retry re-runs `load_publish_state` → fresh lineage).
+            let parent_commit_id = match lineage {
+                Some(intent) => {
+                    let new_manifest_version = dataset.version().version + 1;
+                    let (commit_rows, parent) =
+                        Self::resolve_lineage_rows(&lineage_rows, intent, new_manifest_version)?;
+                    rows.extend(commit_rows);
+                    parent
+                }
+                None => None,
+            };
+
+            if rows.is_empty() {
+                // Expected-version-only publish with no changes and no lineage:
+                // the precondition held, nothing to write.
+                return Ok(PublishOutcome {
+                    dataset,
+                    parent_commit_id,
+                });
+            }
+
            match self.merge_rows(dataset, rows).await {
-                Ok(new_dataset) => return Ok(new_dataset),
+                Ok(new_dataset) => {
+                    return Ok(PublishOutcome {
+                        dataset: new_dataset,
+                        parent_commit_id,
+                    });
+                }
                Err(err) => {
                    if attempt < PUBLISHER_RETRY_BUDGET && is_retryable_publish_conflict(&err) {
                        continue;
@ -497,7 +682,12 @@ impl ManifestBatchPublisher for GraphNamespacePublisher {
 /// contention; if the caller's `expected_table_versions` still holds against
 /// the new manifest state, we re-attempt. Other conflict variants (notably
 /// `ExpectedVersionMismatch`) propagate so the caller learns immediately.
-fn is_retryable_publish_conflict(err: &OmniError) -> bool {
+///
+/// Shared (`pub(crate)`) with the v3→v4 lineage backfill's re-open retry loop
+/// (`migrations::migrate_v3_to_v4`), so the migration's retry decision matches the
+/// publisher's by construction — both retry exactly `RowLevelCasContention` and
+/// propagate everything else.
+pub(crate) fn is_retryable_publish_conflict(err: &OmniError) -> bool {
    matches!(
        err,
        OmniError::Manifest(m)
--- a/crates/omnigraph/src/db/manifest/recovery.rs
+++ b/crates/omnigraph/src/db/manifest/recovery.rs
@ -40,17 +40,14 @@ use lance::Dataset;
 use serde::{Deserialize, Serialize};
 use tracing::warn;

-use crate::db::commit_graph::CommitGraph;
 use crate::db::graph_coordinator::GraphCoordinator;
-use crate::db::recovery_audit::{
-    RecoveryAudit, RecoveryAuditRecord, RecoveryKind, TableOutcome, now_micros,
-};
+use crate::db::recovery_audit::{RecoveryAudit, RecoveryAuditRecord, RecoveryKind, TableOutcome};
 use crate::db::schema_state::SchemaStateRecovery;
 use crate::error::{OmniError, Result};
 use crate::storage::StorageAdapter;

 use super::Snapshot;
-use super::publisher::{GraphNamespacePublisher, ManifestBatchPublisher};
+use super::publisher::{GraphNamespacePublisher, LineageIntent, ManifestBatchPublisher};
 use super::{ManifestChange, SubTableUpdate, TableRegistration, TableTombstone};

 /// System actor identifier recorded on every recovery commit. Operators
@ -59,6 +56,44 @@ use super::{ManifestChange, SubTableUpdate, TableRegistration, TableTombstone};
 /// into the audit row's `recovery_for_actor` field.
 pub(crate) const RECOVERY_ACTOR: &str = "omnigraph:recovery";

+/// Publish a recovery action's manifest `updates` AND its recovery commit in one
+/// CAS (RFC-013 Phase 7). The recovery commit's lineage (`graph_commit` +
+/// `graph_head`) rides the same merge-insert as the table-version re-pin — there
+/// is no separate `_graph_commits.lance` write and no manifest→commit-graph gap.
+/// `updates` is empty for the no-table-change recovery paths (all-NoMovement
+/// roll-back, stale-sidecar cleanup, orphaned-branch discard); the lineage rows
+/// still publish, so the recovery commit is always durable.
+///
+/// The commit's first parent is resolved by the publisher (the live head of the
+/// recovery's branch); its merged-in parent is the sidecar's recorded source
+/// head for a rolled-forward branch merge, matching the pre-Phase-7 merge-commit
+/// shape. Returns the new manifest version and the minted recovery commit id
+/// (which the audit row references).
+async fn publish_recovery_commit(
+    root_uri: &str,
+    sidecar: &RecoverySidecar,
+    kind: RecoveryKind,
+    updates: &[ManifestChange],
+    expected: &HashMap<String, u64>,
+) -> Result<(u64, String)> {
+    let merged_parent_commit_id = match (sidecar.writer_kind, kind) {
+        (SidecarKind::BranchMerge, RecoveryKind::RolledForward) => {
+            sidecar.merge_source_commit_id.clone()
+        }
+        _ => None,
+    };
+    let intent = LineageIntent {
+        graph_commit_id: ulid::Ulid::new().to_string(),
+        branch: sidecar.branch.clone(),
+        actor_id: Some(RECOVERY_ACTOR.to_string()),
+        merged_parent_commit_id,
+        created_at: crate::db::now_micros()?,
+    };
+    let publisher = GraphNamespacePublisher::new(root_uri, sidecar.branch.as_deref());
+    let outcome = publisher.publish(updates, expected, Some(&intent)).await?;
+    Ok((outcome.dataset.version().version, intent.graph_commit_id))
+}
+
 /// Subdirectory under the graph root holding sidecar files.
 pub(crate) const RECOVERY_DIR_NAME: &str = "__recovery";

@ -416,7 +451,7 @@ pub(crate) async fn write_sidecar(
 ) -> Result<RecoverySidecarHandle> {
    // Failpoint: models a storage put failure (S3 PutObject / fs write)
    // in Phase A — every writer must abort before any HEAD advance.
-    crate::failpoints::maybe_fail("recovery.sidecar_write")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::RECOVERY_SIDECAR_WRITE)?;
    debug_assert_eq!(sidecar.schema_version, SIDECAR_SCHEMA_VERSION);
    let uri = sidecar_uri(root_uri, &sidecar.operation_id);
    let json = serde_json::to_string_pretty(sidecar).map_err(|err| {
@ -457,7 +492,7 @@ pub(crate) async fn confirm_sidecar_phase_b(
 ) -> Result<()> {
    // Failpoint: models a storage failure on the confirmation write — the
    // pre-confirm sidecar stays on disk, so recovery rolls the operation back.
-    crate::failpoints::maybe_fail("recovery.sidecar_confirm")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::RECOVERY_SIDECAR_CONFIRM)?;
    for pin in &mut sidecar.tables {
        // Every pinned table MUST have an achieved version. A miss means the
        // pin set and the publish `updates` diverged — fail loudly at the
@ -489,7 +524,7 @@ pub(crate) async fn delete_sidecar(
    // Failpoint: models a storage delete failure (S3 DeleteObject) in
    // Phase D — callers swallow it (the write already published) and the
    // stale sidecar is healed by the next write or open.
-    crate::failpoints::maybe_fail("recovery.sidecar_delete")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::RECOVERY_SIDECAR_DELETE)?;
    storage.delete(&handle.sidecar_uri).await
 }

@ -507,7 +542,7 @@ pub(crate) async fn list_sidecars(
    // Failpoint: models a storage list failure (S3 ListObjectsV2) — every
    // consumer (open-time sweep, write-entry heal) must fail loudly
    // rather than silently skipping recovery.
-    crate::failpoints::maybe_fail("recovery.sidecar_list")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::RECOVERY_SIDECAR_LIST)?;
    let dir = recovery_dir_uri(root_uri);
    let mut uris = storage.list_dir(&dir).await?;
    // Sort by URI so the sweep processes sidecars deterministically.
@ -831,20 +866,13 @@ pub(crate) async fn heal_pending_sidecars_roll_forward(
                // authority) BEFORE opening: a deferred sidecar whose
                // branch was deleted would otherwise wedge every write
                // on the dead-branch open.
-                let (branch_exists, main_version) = {
+                let branch_exists = {
                    let mut coord = coordinator.write().await;
                    coord.refresh().await?;
-                    let exists = coord.all_branches().await?.iter().any(|name| name == b);
-                    (exists, coord.snapshot().version())
+                    coord.all_branches().await?.iter().any(|name| name == b)
                };
                if !branch_exists {
-                    discard_orphaned_branch_sidecar(
-                        root_uri,
-                        storage.as_ref(),
-                        &sidecar,
-                        main_version,
-                    )
-                    .await?;
+                    discard_orphaned_branch_sidecar(root_uri, storage.as_ref(), &sidecar).await?;
                    processed_any = true;
                    continue;
                }
@ -862,7 +890,7 @@ pub(crate) async fn heal_pending_sidecars_roll_forward(
        };
        if process_sidecar(
            root_uri,
-            storage.as_ref(),
+            &storage,
            &branch_snapshot,
            &sidecar,
            RecoveryMode::RollForwardOnly,
@ -893,7 +921,6 @@ async fn discard_orphaned_branch_sidecar(
    root_uri: &str,
    storage: &dyn StorageAdapter,
    sidecar: &RecoverySidecar,
-    manifest_version: u64,
 ) -> Result<()> {
    warn!(
        operation_id = sidecar.operation_id.as_str(),
@ -922,22 +949,31 @@ async fn discard_orphaned_branch_sidecar(
            && record.recovery_kind == RecoveryKind::OrphanedBranchDiscarded
    });
    if !already_recorded {
-        let mut graph = CommitGraph::open(root_uri).await?;
-        let graph_commit_id = graph
-            .append_commit(None, manifest_version, Some(RECOVERY_ACTOR))
-            .await?;
-        // Failpoint: the residual window above — commit appended, audit
+        // The orphan-discard commit is recorded on MAIN (the sidecar's own
+        // branch is gone), via a lineage-only publish into `__manifest` (RFC-013
+        // Phase 7) — no `_graph_commits.lance` row. The publisher stamps the
+        // commit at the version it produces.
+        let intent = LineageIntent {
+            graph_commit_id: ulid::Ulid::new().to_string(),
+            branch: None,
+            actor_id: Some(RECOVERY_ACTOR.to_string()),
+            merged_parent_commit_id: None,
+            created_at: crate::db::now_micros()?,
+        };
+        let publisher = GraphNamespacePublisher::new(root_uri, None);
+        publisher.publish(&[], &HashMap::new(), Some(&intent)).await?;
+        // Failpoint: the residual window above — commit published, audit
        // not yet durable.
-        crate::failpoints::maybe_fail("recovery.orphan_discard_audit_append")?;
+        crate::failpoints::maybe_fail(crate::failpoints::names::RECOVERY_ORPHAN_DISCARD_AUDIT_APPEND)?;
        audit
            .append(RecoveryAuditRecord {
-                graph_commit_id,
+                graph_commit_id: intent.graph_commit_id,
                recovery_kind: RecoveryKind::OrphanedBranchDiscarded,
                recovery_for_actor: sidecar.actor_id.clone(),
                operation_id: sidecar.operation_id.clone(),
                sidecar_writer_kind: format!("{:?}", sidecar.writer_kind),
                per_table_outcomes: Vec::new(),
-                created_at: now_micros()?,
+                created_at: crate::db::now_micros()?,
            })
            .await?;
    }
@ -1014,13 +1050,7 @@ pub(crate) async fn recover_manifest_drift(
                    .iter()
                    .any(|name| name == b)
                {
-                    discard_orphaned_branch_sidecar(
-                        root_uri,
-                        storage.as_ref(),
-                        &sidecar,
-                        coordinator.snapshot().version(),
-                    )
-                    .await?;
+                    discard_orphaned_branch_sidecar(root_uri, storage.as_ref(), &sidecar).await?;
                    continue;
                }
                let mut branch_coord =
@ -1036,7 +1066,7 @@ pub(crate) async fn recover_manifest_drift(
        };
        process_sidecar(
            root_uri,
-            storage.as_ref(),
+            &storage,
            &branch_snapshot,
            &sidecar,
            mode,
@ -1051,7 +1081,7 @@ pub(crate) async fn recover_manifest_drift(

 async fn process_sidecar(
    root_uri: &str,
-    storage: &dyn StorageAdapter,
+    storage: &std::sync::Arc<dyn StorageAdapter>,
    snapshot: &Snapshot,
    sidecar: &RecoverySidecar,
    mode: RecoveryMode,
@ -1154,7 +1184,7 @@ async fn process_sidecar(
                    );
                }
                return record_audit_recovery_rollforward(
-                    root_uri, storage, snapshot, sidecar, &states,
+                    root_uri, storage.as_ref(), sidecar, &states,
                )
                .await
                .map(|()| true);
@ -1176,7 +1206,7 @@ async fn process_sidecar(
                writer_kind = ?sidecar.writer_kind,
                "recovery: rolling back sidecar (mixed or unexpected state)"
            );
-            roll_back_sidecar(root_uri, storage, snapshot, sidecar, &states)
+            roll_back_sidecar(root_uri, storage.as_ref(), sidecar, &states)
                .await
                .map(|()| true)
        }
@ -1191,7 +1221,7 @@ async fn process_sidecar(
                            "recovery: rolling back SchemaApply sidecar because schema staging \
                             files were not promoted in this recovery pass"
                        );
-                        roll_back_sidecar(root_uri, storage, snapshot, sidecar, &states)
+                        roll_back_sidecar(root_uri, storage.as_ref(), sidecar, &states)
                            .await
                            .map(|()| true)
                    }
@ -1211,8 +1241,36 @@ async fn process_sidecar(
                "recovery: rolling forward sidecar (Phase B completed; \
                 Phase C did not land)"
            );
-            let (new_manifest_version, published_versions) =
-                roll_forward_all(root_uri, sidecar, &states, snapshot).await?;
+            // TOCTOU window: between `classify_table` (which read the manifest
+            // pin) and the publish CAS below, a concurrent live writer can
+            // advance the manifest past our expected version. The failpoint
+            // lets a test force that interleave deterministically.
+            crate::failpoints::maybe_fail(
+                crate::failpoints::names::RECOVERY_BEFORE_ROLL_FORWARD_PUBLISH,
+            )?;
+            // RFC-013 Phase 7: `roll_forward_all` folds the recovery commit into the
+            // manifest publish CAS, so it also returns the minted `graph_commit_id`
+            // for the audit row below.
+            let (new_manifest_version, published_versions, graph_commit_id) =
+                match roll_forward_all(root_uri, sidecar, &states, snapshot).await {
+                    Ok(published) => published,
+                    // Convergence-idempotent (invariants 7 & 15): a roll-forward's
+                    // postcondition is "the manifest reflects the sidecar's committed
+                    // Lance state", NOT "this sweep personally won the CAS". A
+                    // concurrent writer that advanced the manifest to/past that goal
+                    // during the classify→publish window is convergence, not a logical
+                    // conflict — so re-read and either record the already-achieved
+                    // roll-forward or defer to the next pass; never fail the open.
+                    // Any other error still propagates.
+                    Err(err) if is_expected_version_mismatch(&err) => {
+                        return converge_or_defer_roll_forward(
+                            root_uri, storage, sidecar, &states, err,
+                        )
+                        .await;
+                    }
+                    Err(err) => return Err(err),
+                };
+            let _ = new_manifest_version;
            // `to_version` records the ACTUAL Lance HEAD published for
            // each table (not pin.post_commit_pin, which is a lower bound
            // for loose-match writers like SchemaApply / EnsureIndices /
@ -1242,17 +1300,214 @@ async fn process_sidecar(
            record_audit(
                root_uri,
                sidecar,
-                new_manifest_version,
+                graph_commit_id,
                RecoveryKind::RolledForward,
                outcomes,
            )
            .await?;
-            delete_sidecar_by_operation_id(root_uri, storage, &sidecar.operation_id).await?;
+            delete_sidecar_by_operation_id(root_uri, storage.as_ref(), &sidecar.operation_id)
+                .await?;
            Ok(true)
        }
    }
 }

+/// True if `err` is the publisher's per-table CAS precondition failure
+/// (`ExpectedVersionMismatch`) — the signal that a concurrent writer advanced
+/// the manifest past what this caller expected.
+fn is_expected_version_mismatch(err: &OmniError) -> bool {
+    matches!(
+        err,
+        OmniError::Manifest(m)
+            if matches!(
+                m.details,
+                Some(crate::error::ManifestConflictDetails::ExpectedVersionMismatch { .. })
+            )
+    )
+}
+
+/// Whether the live manifest already reflects everything this sidecar intended
+/// to publish.
+///
+/// SOUNDNESS: the per-table test is `current_version >= observed lance_head`, a
+/// *proxy* for "the sidecar's committed Lance commit is an ancestor of the
+/// published HEAD" (so a higher version is a descendant that contains it). The
+/// proxy is sound only because of the heal-first invariant: every writer that
+/// can advance a table's manifest version first heals pending sidecars
+/// (`heal_pending_recovery_sidecars` runs at the head of `load`/`mutate`/
+/// schema-apply/branch-merge) or refuses on an unrecovered graph (`optimize`).
+/// So the only path past `expected_version` is one that first publishes THIS
+/// sidecar's commit at `lance_head` — version ordering then implies lineage
+/// containment. A future writer that advances a pinned table WITHOUT healing
+/// first (e.g. a non-heal-first `Overwrite` that replaces rows) would void this
+/// proxy and must be re-validated by row-id lineage, not version ordering.
+/// Added tables must be registered; tombstoned tables must be gone.
+fn sidecar_intent_satisfied(
+    snapshot: &Snapshot,
+    sidecar: &RecoverySidecar,
+    states: &[ClassifiedTable],
+) -> bool {
+    for (pin, state) in sidecar.tables.iter().zip(states.iter()) {
+        let current = snapshot
+            .entry(&pin.table_key)
+            .map(|e| e.table_version)
+            .unwrap_or(0);
+        if current < state.lance_head {
+            return false;
+        }
+    }
+    for reg in &sidecar.additional_registrations {
+        if snapshot.entry(&reg.table_key).is_none() {
+            return false;
+        }
+    }
+    for tomb in &sidecar.tombstones {
+        if snapshot.entry(&tomb.table_key).is_some() {
+            return false;
+        }
+    }
+    true
+}
+
+/// Re-read the live manifest snapshot for the sidecar's branch.
+async fn fresh_snapshot_for_sidecar(
+    root_uri: &str,
+    storage: &std::sync::Arc<dyn StorageAdapter>,
+    sidecar: &RecoverySidecar,
+) -> Result<Snapshot> {
+    let mut coordinator = match sidecar.branch.as_deref() {
+        Some(branch) if branch != "main" => {
+            GraphCoordinator::open_branch(root_uri, branch, std::sync::Arc::clone(storage)).await?
+        }
+        _ => GraphCoordinator::open(root_uri, std::sync::Arc::clone(storage)).await?,
+    };
+    coordinator.refresh().await?;
+    Ok(coordinator.snapshot())
+}
+
+/// Convergence-idempotent handling of a roll-forward publish CAS that lost to a
+/// concurrent writer (`ExpectedVersionMismatch`). A roll-forward's postcondition
+/// is "the manifest reflects the sidecar's committed Lance state", not "this
+/// sweep won the CAS" (invariants 7 & 15). Re-read the live manifest:
+///
+/// - if it already reached the sidecar's goal, the work is done (just not by us)
+///   — record the `RolledForward` audit and delete the sidecar idempotently;
+/// - otherwise the manifest is progressing but not yet at the goal — leave the
+///   sidecar for the next open / the live writer's own Phase D.
+///
+/// Either way the open does NOT fail. A genuine logical conflict (a table below
+/// `expected_version`, i.e. data lost) is not satisfiable here and re-surfaces
+/// loudly via the classifier's `InvariantViolation` on the next pass.
+/// See iss-schema-apply-reopen-recovery-race.
+async fn converge_or_defer_roll_forward(
+    root_uri: &str,
+    storage: &std::sync::Arc<dyn StorageAdapter>,
+    sidecar: &RecoverySidecar,
+    states: &[ClassifiedTable],
+    conflict: OmniError,
+) -> Result<bool> {
+    let fresh = fresh_snapshot_for_sidecar(root_uri, storage, sidecar).await?;
+    if !sidecar_intent_satisfied(&fresh, sidecar, states) {
+        warn!(
+            operation_id = sidecar.operation_id.as_str(),
+            writer_kind = ?sidecar.writer_kind,
+            "recovery: roll-forward publish lost a CAS and the manifest has not \
+             yet reached the sidecar's goal; deferring to the next pass \
+             (conflict: {conflict})"
+        );
+        return Ok(false);
+    }
+    // The manifest already reached the sidecar's goal — some other actor
+    // advanced it. Under the heal-first invariant, whoever advanced past
+    // `expected_version` first healed THIS sidecar (recorded its RolledForward
+    // audit and deleted it). So the audit row already exists; recording another
+    // here would put two RolledForward rows in `_graph_commit_recoveries` for
+    // one recovery event (visible in `commit list --filter actor=…recovery`).
+    // Only finish the bookkeeping if the sidecar is still on disk (the winner
+    // crashed between audit and delete); if it is already gone, the winner
+    // completed it — return success WITHOUT a duplicate audit, keeping the
+    // audit append-idempotent per operation_id across concurrent sweeps.
+    let sidecar_path = sidecar_uri(root_uri, &sidecar.operation_id);
+    if !storage.exists(&sidecar_path).await? {
+        warn!(
+            operation_id = sidecar.operation_id.as_str(),
+            writer_kind = ?sidecar.writer_kind,
+            "recovery: roll-forward publish lost a CAS; the winner already \
+             converged and cleaned up this sidecar — nothing to do"
+        );
+        return Ok(true);
+    }
+    warn!(
+        operation_id = sidecar.operation_id.as_str(),
+        writer_kind = ?sidecar.writer_kind,
+        "recovery: roll-forward publish lost a CAS to a concurrent writer that \
+         already reached the goal; converging (RolledForward audit + delete)"
+    );
+    let mut outcomes: Vec<TableOutcome> = sidecar
+        .tables
+        .iter()
+        .map(|pin| TableOutcome {
+            table_key: pin.table_key.clone(),
+            from_version: pin.expected_version,
+            to_version: fresh
+                .entry(&pin.table_key)
+                .map(|e| e.table_version)
+                .unwrap_or(pin.post_commit_pin),
+        })
+        .collect();
+    // Mirror the normal roll-forward audit shape: SchemaApply sidecars also
+    // register added tables, so the audit must list them too (else a converge
+    // audit row is incomplete vs the `roll_forward_all` path for the same
+    // recovery kind).
+    for reg in &sidecar.additional_registrations {
+        outcomes.push(TableOutcome {
+            table_key: reg.table_key.clone(),
+            from_version: 0,
+            to_version: fresh
+                .entry(&reg.table_key)
+                .map(|e| e.table_version)
+                .unwrap_or(0),
+        });
+    }
+    // RFC-013 Phase 7: the winning writer folded its recovery commit into the
+    // manifest CAS, so the converge audit references THAT commit. We lost the CAS
+    // and never minted it, but a recovery commit is distinguishable by its
+    // `RECOVERY_ACTOR` authorship (`publish_recovery_commit`), so the latest
+    // recovery-actored commit on this branch IS it. Do NOT use the branch head:
+    // a concurrent USER write can advance `graph_head` past the recovery commit
+    // between the winner's publish and this read, which would attribute the audit
+    // row to the wrong (later, user) commit. (We only reach here with the sidecar
+    // still on disk: the winner advanced the manifest but crashed before its own
+    // audit+delete, so we finish its bookkeeping.)
+    let cache = match sidecar.branch.as_deref() {
+        Some(branch) => {
+            crate::db::commit_graph::CommitGraph::open_at_branch(root_uri, branch).await?
+        }
+        None => crate::db::commit_graph::CommitGraph::open(root_uri).await?,
+    };
+    let converged_commit_id = match cache
+        .load_commits()
+        .await?
+        .into_iter()
+        .rfind(|c| c.actor_id.as_deref() == Some(RECOVERY_ACTOR))
+    {
+        Some(recovery_commit) => recovery_commit.graph_commit_id,
+        // No recovery commit visible — unexpected on this path (the winner just
+        // published one); fall back to the head rather than an empty id.
+        None => cache.head_commit_id().await?.unwrap_or_default(),
+    };
+    record_audit(
+        root_uri,
+        sidecar,
+        converged_commit_id,
+        RecoveryKind::RolledForward,
+        outcomes,
+    )
+    .await?;
+    delete_sidecar_by_operation_id(root_uri, storage.as_ref(), &sidecar.operation_id).await?;
+    Ok(true)
+}
+
 #[derive(Debug, Clone, Copy)]
 struct ClassifiedTable {
    classification: TableClassification,
@ -1268,7 +1523,6 @@ struct ClassifiedTable {
 async fn roll_back_sidecar(
    root_uri: &str,
    storage: &dyn StorageAdapter,
-    snapshot: &Snapshot,
    sidecar: &RecoverySidecar,
    states: &[ClassifiedTable],
 ) -> Result<()> {
@ -1328,23 +1582,18 @@ async fn roll_back_sidecar(
            });
        }
    }
-    // Publish the restored HEADs so manifest == HEAD. A degenerate all-NoMovement
-    // roll-back restores nothing — there's nothing to publish, and the audit
-    // records the unchanged snapshot version.
-    let manifest_version = if updates.is_empty() {
-        snapshot.version()
-    } else {
-        let publisher = GraphNamespacePublisher::new(root_uri, sidecar.branch.as_deref());
-        publisher
-            .publish(&updates, &expected)
-            .await?
-            .version()
-            .version
-    };
+    // Publish the restored HEADs so manifest == HEAD AND record the recovery
+    // commit in the same CAS (RFC-013 Phase 7). A degenerate all-NoMovement
+    // roll-back restores no table — `updates` is empty — but the recovery commit
+    // lineage still publishes (a lineage-only merge), so the rollback is recorded
+    // in the commit history just like a roll-forward.
+    let (_manifest_version, graph_commit_id) =
+        publish_recovery_commit(root_uri, sidecar, RecoveryKind::RolledBack, &updates, &expected)
+            .await?;
    record_audit(
        root_uri,
        sidecar,
-        manifest_version,
+        graph_commit_id,
        RecoveryKind::RolledBack,
        outcomes,
    )
@ -1370,7 +1619,6 @@ async fn roll_back_sidecar(
 async fn record_audit_recovery_rollforward(
    root_uri: &str,
    storage: &dyn StorageAdapter,
-    snapshot: &Snapshot,
    sidecar: &RecoverySidecar,
    states: &[ClassifiedTable],
 ) -> Result<()> {
@ -1384,10 +1632,22 @@ async fn record_audit_recovery_rollforward(
            to_version: state.manifest_pinned,
        })
        .collect();
+    // The substrate is already in the post-roll-forward state (the prior pass's
+    // table re-pin landed), so there are no table `updates` — but a recovery
+    // commit is still recorded for this cleanup pass via a lineage-only publish
+    // (RFC-013 Phase 7), which the audit row references.
+    let (_manifest_version, graph_commit_id) = publish_recovery_commit(
+        root_uri,
+        sidecar,
+        RecoveryKind::RolledForward,
+        &[],
+        &HashMap::new(),
+    )
+    .await?;
    record_audit(
        root_uri,
        sidecar,
-        snapshot.version(),
+        graph_commit_id,
        RecoveryKind::RolledForward,
        outcomes,
    )
@ -1407,17 +1667,19 @@ async fn record_audit_recovery_rollforward(
 /// contention; persistent contention surfaces the typed conflict error to
 /// the recovery sweep, which leaves the sidecar in place for the next
 /// open's retry.
-/// Returns `(new_manifest_version, per_table_published_versions)`. The
-/// per-table map is what the audit row's `to_version` should record —
-/// for loose-match writers the actual Lance HEAD can be higher than the
-/// sidecar's `post_commit_pin` (which is a lower bound), so the pin is
-/// the wrong source of truth for an operator-facing audit field.
+/// Returns `(new_manifest_version, per_table_published_versions,
+/// recovery_commit_id)`. The per-table map is what the audit row's `to_version`
+/// should record — for loose-match writers the actual Lance HEAD can be higher
+/// than the sidecar's `post_commit_pin` (which is a lower bound), so the pin is
+/// the wrong source of truth for an operator-facing audit field. The recovery
+/// commit id is the `graph_commit` folded into the publish CAS (RFC-013
+/// Phase 7), which the audit row references.
 async fn roll_forward_all(
    root_uri: &str,
    sidecar: &RecoverySidecar,
    states: &[ClassifiedTable],
    snapshot: &Snapshot,
-) -> Result<(u64, HashMap<String, u64>)> {
+) -> Result<(u64, HashMap<String, u64>, String)> {
    let total_changes =
        sidecar.tables.len() + sidecar.additional_registrations.len() + sidecar.tombstones.len();
    let mut updates: Vec<ManifestChange> = Vec::with_capacity(total_changes);
@ -1528,9 +1790,10 @@ async fn roll_forward_all(
        );
    }

-    let publisher = GraphNamespacePublisher::new(root_uri, sidecar.branch.as_deref());
-    let new_dataset = publisher.publish(&updates, &expected).await?;
-    Ok((new_dataset.version().version, published_versions))
+    let (new_manifest_version, graph_commit_id) =
+        publish_recovery_commit(root_uri, sidecar, RecoveryKind::RolledForward, &updates, &expected)
+            .await?;
+    Ok((new_manifest_version, published_versions, graph_commit_id))
 }

 /// Open `table_path` at its branch HEAD, read the current Lance HEAD version,
@ -1600,62 +1863,27 @@ async fn push_table_update(
    Ok(published_version)
 }

-/// Append the audit row describing this recovery action.
+/// Append the audit row describing this recovery action (RFC-013 Phase 7).
 ///
-/// Two-part write: (a) `_graph_commits.lance` row anchored on the recovery
-/// actor (`omnigraph:recovery`); (b) `_graph_commit_recoveries.lance` row
-/// linking back to (a) and naming the original actor + per-table outcomes.
-/// Same not-atomic-pair-write shape as the existing `_graph_commits`
-/// + `_graph_commit_actors` split — a crash between the two leaves an
-/// orphan commit row with no audit row. The recovery sweep tolerates this:
-/// on re-entry the classifier surfaces `NoMovement` for already-restored /
-/// already-published tables, the action is a no-op, and the audit append
-/// is retried.
+/// The recovery COMMIT (`graph_commit` + `graph_head`) was already recorded
+/// durably in `__manifest` by `publish_recovery_commit` (folded into the same
+/// CAS as the table re-pin), so this only writes the `_graph_commit_recoveries`
+/// row, referencing that commit by `graph_commit_id`. A crash between the
+/// recovery publish and this audit append leaves a recovery commit with no audit
+/// row — the same not-atomic-pair-write shape as before; the sweep tolerates it
+/// (on re-entry the classifier surfaces `NoMovement`, the action is a no-op, and
+/// the audit append is retried, minting a fresh recovery commit).
 async fn record_audit(
    root_uri: &str,
    sidecar: &RecoverySidecar,
-    manifest_version: u64,
+    graph_commit_id: String,
    kind: RecoveryKind,
    outcomes: Vec<TableOutcome>,
 ) -> Result<()> {
    // Failpoint: models an audit write failure after the roll-forward /
-    // roll-back publish already landed — the sweep aborts, the sidecar
-    // stays, and re-entry records the audit row (see the retry note in
-    // the doc comment above).
-    crate::failpoints::maybe_fail("recovery.record_audit")?;
-    // Non-main recovery commits must be appended on the sidecar branch's
-    // commit graph, otherwise parent_commit_id comes from the global
-    // main head. BranchMerge additionally records the source branch's
-    // HEAD as merged_parent_commit_id so future merges between the same
-    // pair recognize "already up-to-date".
-    let target_branch = sidecar.branch.as_deref();
-    let mut graph = match target_branch {
-        Some(branch) => CommitGraph::open_at_branch(root_uri, branch).await?,
-        None => CommitGraph::open(root_uri).await?,
-    };
-    let graph_commit_id = match (
-        sidecar.writer_kind,
-        sidecar.merge_source_commit_id.as_deref(),
-        kind,
-    ) {
-        (SidecarKind::BranchMerge, Some(source_id), RecoveryKind::RolledForward) => {
-            let parent_commit_id = graph.head_commit_id().await?.unwrap_or_default();
-            graph
-                .append_merge_commit(
-                    target_branch,
-                    manifest_version,
-                    &parent_commit_id,
-                    source_id,
-                    Some(RECOVERY_ACTOR),
-                )
-                .await?
-        }
-        _ => {
-            graph
-                .append_commit(target_branch, manifest_version, Some(RECOVERY_ACTOR))
-                .await?
-        }
-    };
+    // roll-back publish (with its folded-in recovery commit) already landed —
+    // the sweep aborts, the sidecar stays, and re-entry records the audit row.
+    crate::failpoints::maybe_fail(crate::failpoints::names::RECOVERY_RECORD_AUDIT)?;
    let mut audit = RecoveryAudit::open(root_uri).await?;
    audit
        .append(RecoveryAuditRecord {
@ -1665,7 +1893,7 @@ async fn record_audit(
            operation_id: sidecar.operation_id.clone(),
            sidecar_writer_kind: format!("{:?}", sidecar.writer_kind),
            per_table_outcomes: outcomes,
-            created_at: now_micros()?,
+            created_at: crate::db::now_micros()?,
        })
        .await?;
    Ok(())
--- a/crates/omnigraph/src/db/manifest/state.rs
+++ b/crates/omnigraph/src/db/manifest/state.rs
@ -10,7 +10,10 @@ use crate::error::{OmniError, Result};

 use super::layout::version_object_id;
 use super::metadata::TableVersionMetadata;
-use super::{OBJECT_TYPE_TABLE, OBJECT_TYPE_TABLE_TOMBSTONE, OBJECT_TYPE_TABLE_VERSION};
+use super::{
+    MAIN_BRANCH_HEAD_KEY, OBJECT_TYPE_GRAPH_COMMIT, OBJECT_TYPE_GRAPH_HEAD, OBJECT_TYPE_TABLE,
+    OBJECT_TYPE_TABLE_TOMBSTONE, OBJECT_TYPE_TABLE_VERSION,
+};

 #[derive(Debug, Clone)]
 pub struct SubTableEntry {
@ -34,11 +37,64 @@ struct TableTombstoneEntry {
    tombstone_version: u64,
 }

+/// A graph-lineage commit projected out of the `__manifest` `graph_commit`
+/// rows (RFC-013 step 4). Field-for-field identical to `commit_graph::GraphCommit`
+/// so the commit-graph cache can be sourced from the manifest projection without
+/// touching any reader above that boundary. Kept as a separate struct here to
+/// keep `state.rs` free of the `commit_graph` module dependency.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub(crate) struct GraphLineageRow {
+    pub(crate) graph_commit_id: String,
+    pub(crate) manifest_branch: Option<String>,
+    pub(crate) manifest_version: u64,
+    pub(crate) parent_commit_id: Option<String>,
+    pub(crate) merged_parent_commit_id: Option<String>,
+    pub(crate) actor_id: Option<String>,
+    pub(crate) created_at: i64,
+}
+
+/// JSON payload of a `graph_commit` row's `metadata` column. The immutable
+/// commit fields that have no dedicated manifest column live here; the mutable
+/// ones (`graph_commit_id`, `manifest_branch`, `manifest_version`) reuse
+/// `object_id` / `table_branch` / `table_version`.
+#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
+struct GraphCommitMetadata {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    parent_commit_id: Option<String>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    merged_parent_commit_id: Option<String>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    actor_id: Option<String>,
+    created_at: i64,
+}
+
+/// JSON payload of a `graph_head` row's `metadata` column.
+#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
+struct GraphHeadMetadata {
+    head_commit_id: String,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    parent_commit_id: Option<String>,
+}
+
+/// The `object_id` for a branch's mutable head pointer row. Main encodes as
+/// `graph_head:main`; named branches as `graph_head:<branch>`.
+pub(crate) fn graph_head_object_id(branch: Option<&str>) -> String {
+    format!("graph_head:{}", branch.unwrap_or(MAIN_BRANCH_HEAD_KEY))
+}
+
 #[derive(Debug, Clone)]
 struct ManifestScan {
    table_locations: HashMap<String, String>,
    version_entries: Vec<SubTableEntry>,
    tombstones: Vec<TableTombstoneEntry>,
+    /// Graph-lineage `graph_commit` rows, collected in the SAME pass only when
+    /// the caller asked (`collect_lineage`). Empty on the table-state read hot
+    /// path so it never pays the O(commits) lineage JSON decode; populated on the
+    /// publish path, where `load_publish_state` already needs the parent and would
+    /// otherwise scan `__manifest` a second time via `read_graph_lineage`. `graph_head`
+    /// rows are not collected here — parent resolution uses the head-over-commits
+    /// computation, not the denormalized head pointer (see `resolve_lineage_rows`).
+    lineage_rows: Vec<GraphLineageRow>,
 }

 pub(super) fn manifest_schema() -> SchemaRef {
@ -73,7 +129,8 @@ pub(super) fn manifest_schema() -> SchemaRef {

 pub(super) async fn read_manifest_state(dataset: &Dataset) -> Result<ManifestState> {
    let version = dataset.version().version;
-    let scan = read_manifest_scan(dataset).await?;
+    // The table-state hot path never needs lineage, so don't pay its JSON decode.
+    let scan = read_manifest_scan(dataset, false).await?;
    let mut latest_versions = HashMap::<String, SubTableEntry>::new();

    for entry in scan.version_entries {
@ -109,28 +166,85 @@ pub(super) async fn read_manifest_state(dataset: &Dataset) -> Result<ManifestSta
    Ok(ManifestState { version, entries })
 }

+// After RFC-013 P2 folded the publish path off this accessor (it now projects
+// version entries out of `read_publish_scan`'s single scan), the only remaining
+// caller is `BranchManifestNamespace::version_entries`. That namespace module is
+// `#[cfg(test)]` (see `db/manifest.rs`: "nothing in production routes through it;
+// the `LanceNamespace` impls are retained only to validate the contract in unit
+// tests"), so this stays `#[cfg(test)]` too — otherwise it is dead code in
+// non-test builds.
+#[cfg(test)]
 pub(super) async fn read_manifest_entries(dataset: &Dataset) -> Result<Vec<SubTableEntry>> {
-    Ok(read_manifest_scan(dataset).await?.version_entries)
+    Ok(read_manifest_scan(dataset, false).await?.version_entries)
 }

-pub(super) async fn read_registered_table_locations(
-    dataset: &Dataset,
-) -> Result<HashMap<String, String>> {
-    Ok(read_manifest_scan(dataset).await?.table_locations)
+/// The full table state the publisher needs to build its CAS batch, plus the
+/// `graph_commit` lineage rows for parent resolution — all from ONE `__manifest`
+/// scan (RFC-013 P2). Replaces the prior four scans on the publish path (three
+/// thin accessors + a separate `read_graph_lineage`): `load_publish_state`
+/// projects every piece it needs out of this single result.
+pub(super) struct PublishScan {
+    pub(super) table_locations: HashMap<String, String>,
+    pub(super) version_entries: Vec<SubTableEntry>,
+    pub(super) tombstones: Vec<((String, u64), ())>,
+    pub(super) lineage_rows: Vec<GraphLineageRow>,
 }

-pub(super) async fn read_tombstone_versions(
-    dataset: &Dataset,
-) -> Result<HashMap<(String, u64), ()>> {
-    Ok(read_manifest_scan(dataset)
-        .await?
-        .tombstones
-        .into_iter()
-        .map(|tombstone| ((tombstone.table_key, tombstone.tombstone_version), ()))
-        .collect())
+/// One-scan read of everything the publish path needs. `collect_lineage` is
+/// always on here (the publisher resolves a parent), so the lineage JSON decode
+/// rides the same pass as the table-state assembly instead of a second scan.
+pub(super) async fn read_publish_scan(dataset: &Dataset) -> Result<PublishScan> {
+    let scan = read_manifest_scan(dataset, true).await?;
+    Ok(PublishScan {
+        table_locations: scan.table_locations,
+        version_entries: scan.version_entries,
+        tombstones: scan
+            .tombstones
+            .into_iter()
+            .map(|tombstone| ((tombstone.table_key, tombstone.tombstone_version), ()))
+            .collect(),
+        lineage_rows: scan.lineage_rows,
+    })
 }

-async fn read_manifest_scan(dataset: &Dataset) -> Result<ManifestScan> {
+/// Decode one `graph_commit` row (`object_type == OBJECT_TYPE_GRAPH_COMMIT`) into
+/// a [`GraphLineageRow`]. The single decode for both lineage readers — the
+/// dedicated `read_graph_lineage` scan and the folded `collect_lineage` branch of
+/// `read_manifest_scan` — so the two cannot drift. The caller has already matched
+/// the object type; `row` indexes into the per-batch columns.
+fn decode_graph_commit_row(
+    object_ids: &StringArray,
+    metadata: &StringArray,
+    versions: &UInt64Array,
+    branches: &StringArray,
+    row: usize,
+) -> Result<GraphLineageRow> {
+    if metadata.is_null(row) {
+        return Err(OmniError::manifest_internal(format!(
+            "manifest graph_commit row missing metadata for {}",
+            object_ids.value(row)
+        )));
+    }
+    let commit_meta: GraphCommitMetadata =
+        serde_json::from_str(metadata.value(row)).map_err(|e| {
+            OmniError::manifest_internal(format!("failed to decode graph_commit metadata: {e}"))
+        })?;
+    Ok(GraphLineageRow {
+        graph_commit_id: object_ids.value(row).to_string(),
+        manifest_branch: if branches.is_null(row) {
+            None
+        } else {
+            Some(branches.value(row).to_string())
+        },
+        manifest_version: required_u64(versions, row, "table_version")?,
+        parent_commit_id: commit_meta.parent_commit_id,
+        merged_parent_commit_id: commit_meta.merged_parent_commit_id,
+        actor_id: commit_meta.actor_id,
+        created_at: commit_meta.created_at,
+    })
+}
+
+async fn read_manifest_scan(dataset: &Dataset, collect_lineage: bool) -> Result<ManifestScan> {
    let batches: Vec<RecordBatch> = dataset
        .scan()
        .try_into_stream()
@ -143,6 +257,7 @@ async fn read_manifest_scan(dataset: &Dataset) -> Result<ManifestScan> {
    let mut table_locations = HashMap::new();
    let mut version_entries = Vec::new();
    let mut tombstones = Vec::new();
+    let mut lineage_rows = Vec::new();

    for batch in &batches {
        let object_types = string_column(batch, "object_type")?;
@ -152,6 +267,13 @@ async fn read_manifest_scan(dataset: &Dataset) -> Result<ManifestScan> {
        let versions = u64_column(batch, "table_version")?;
        let branches = string_column(batch, "table_branch")?;
        let row_counts = u64_column(batch, "row_count")?;
+        // `object_id` is only needed for lineage decoding; skip the lookup
+        // entirely on the table-state hot path (`collect_lineage == false`).
+        let object_ids = if collect_lineage {
+            Some(string_column(batch, "object_id")?)
+        } else {
+            None
+        };

        for row in 0..batch.num_rows() {
            let table_key = table_keys.value(row).to_string();
@ -195,6 +317,21 @@ async fn read_manifest_scan(dataset: &Dataset) -> Result<ManifestScan> {
                        tombstone_version,
                    });
                }
+                // `graph_commit` rows (RFC-013) are decoded into the scan ONLY
+                // when `collect_lineage` is set (the publish path, which resolves
+                // a parent). The table-state hot path leaves them — and
+                // `graph_head` + any future object type — in the `_` arm so it
+                // never pays the O(commits) lineage JSON decode. When NOT
+                // collecting, `object_ids` is `None`, so this arm is the same
+                // forward-compat skip as the `_` arm.
+                OBJECT_TYPE_GRAPH_COMMIT if collect_lineage => {
+                    let object_ids = object_ids.expect("object_ids read when collect_lineage");
+                    lineage_rows.push(decode_graph_commit_row(
+                        object_ids, metadata, versions, branches, row,
+                    )?);
+                }
+                // Skipped on the table-state path (and for `graph_head` / unknown
+                // future object types on every path): no table snapshot needs them.
                _ => {}
            }
        }
@ -225,21 +362,167 @@ async fn read_manifest_scan(dataset: &Dataset) -> Result<ManifestScan> {
        table_locations,
        version_entries: entries,
        tombstones,
+        lineage_rows,
    })
 }

+/// Project the graph-lineage rows (`graph_commit` + `graph_head`) out of
+/// `__manifest` (RFC-013 step 4). Returns every commit and the per-branch head
+/// map (keyed by branch name, `"main"` for main). `__manifest` is the single
+/// source of graph lineage: the commit-graph cache is sourced from here, and the
+/// publisher resolves a new commit's parent from here inside its CAS loop.
+///
+/// Dedicated scan (separate from `read_manifest_scan`): it decodes ONLY the two
+/// lineage object types and builds no table snapshot, so the table-state hot
+/// path never pays for lineage JSON and this path never pays for table-entry
+/// assembly.
+pub(crate) async fn read_graph_lineage(
+    dataset: &Dataset,
+) -> Result<(Vec<GraphLineageRow>, HashMap<String, String>)> {
+    let batches: Vec<RecordBatch> = dataset
+        .scan()
+        .try_into_stream()
+        .await
+        .map_err(|e| OmniError::Lance(e.to_string()))?
+        .try_collect()
+        .await
+        .map_err(|e| OmniError::Lance(e.to_string()))?;
+
+    let mut graph_commits = Vec::new();
+    let mut graph_heads = HashMap::new();
+
+    for batch in &batches {
+        let object_ids = string_column(batch, "object_id")?;
+        let object_types = string_column(batch, "object_type")?;
+        let metadata = string_column(batch, "metadata")?;
+        let versions = u64_column(batch, "table_version")?;
+        let branches = string_column(batch, "table_branch")?;
+
+        for row in 0..batch.num_rows() {
+            match object_types.value(row) {
+                OBJECT_TYPE_GRAPH_COMMIT => {
+                    graph_commits.push(decode_graph_commit_row(
+                        object_ids, metadata, versions, branches, row,
+                    )?);
+                }
+                OBJECT_TYPE_GRAPH_HEAD => {
+                    if metadata.is_null(row) {
+                        return Err(OmniError::manifest_internal(format!(
+                            "manifest graph_head row missing metadata for {}",
+                            object_ids.value(row)
+                        )));
+                    }
+                    let head_meta: GraphHeadMetadata = serde_json::from_str(metadata.value(row))
+                        .map_err(|e| {
+                            OmniError::manifest_internal(format!(
+                                "failed to decode graph_head metadata: {e}"
+                            ))
+                        })?;
+                    // `object_id` is `graph_head:<branch>`; the branch key after
+                    // the prefix is the projection's map key (`main` for main).
+                    let branch_key = object_ids
+                        .value(row)
+                        .strip_prefix("graph_head:")
+                        .unwrap_or_default()
+                        .to_string();
+                    graph_heads.insert(branch_key, head_meta.head_commit_id);
+                }
+                _ => {}
+            }
+        }
+    }
+
+    Ok((graph_commits, graph_heads))
+}
+
+/// The current head of a branch's lineage: the [`GraphLineageRow`] with the
+/// greatest `(manifest_version, created_at, graph_commit_id)`. This is the same
+/// ordering the commit-graph cache uses to pick its head (`should_replace_head`)
+/// — kept in one place so the publisher's per-attempt parent resolution and the
+/// cache agree by construction. `None` only for a graph with no commits yet
+/// (a parentless genesis).
+pub(crate) fn head_lineage_row(rows: &[GraphLineageRow]) -> Option<&GraphLineageRow> {
+    rows.iter().max_by(|a, b| {
+        a.manifest_version
+            .cmp(&b.manifest_version)
+            .then_with(|| a.created_at.cmp(&b.created_at))
+            .then_with(|| a.graph_commit_id.cmp(&b.graph_commit_id))
+    })
+}
+
+/// One `__manifest` row materializing a piece of a graph commit's lineage. The
+/// publisher maps these onto its `PendingVersionRow`s (folding lineage into the
+/// table-version publish batch), and the genesis init path pushes them straight
+/// into the init batch.
+pub(crate) struct GraphLineageRowPart {
+    pub(crate) object_id: String,
+    pub(crate) object_type: &'static str,
+    pub(crate) metadata: String,
+    pub(crate) table_version: Option<u64>,
+    pub(crate) table_branch: Option<String>,
+}
+
+/// Encode one graph commit into its two `__manifest` rows: the immutable
+/// `graph_commit` row plus the mutable `graph_head:<branch>` pointer (a
+/// merge-insert on `object_id` updates the head in place). `branch` is `None`
+/// for main. The immutable commit fields with no dedicated column live in the
+/// `graph_commit` row's `metadata` JSON; the mutable head pointer payload lives
+/// in the `graph_head` row's `metadata`.
+pub(crate) fn graph_lineage_row_parts(
+    commit: &GraphLineageRow,
+    branch: Option<&str>,
+) -> Result<[GraphLineageRowPart; 2]> {
+    let commit_metadata = serde_json::to_string(&GraphCommitMetadata {
+        parent_commit_id: commit.parent_commit_id.clone(),
+        merged_parent_commit_id: commit.merged_parent_commit_id.clone(),
+        actor_id: commit.actor_id.clone(),
+        created_at: commit.created_at,
+    })
+    .map_err(|e| {
+        OmniError::manifest_internal(format!("failed to encode graph_commit metadata: {e}"))
+    })?;
+    let head_metadata = serde_json::to_string(&GraphHeadMetadata {
+        head_commit_id: commit.graph_commit_id.clone(),
+        parent_commit_id: commit.parent_commit_id.clone(),
+    })
+    .map_err(|e| {
+        OmniError::manifest_internal(format!("failed to encode graph_head metadata: {e}"))
+    })?;
+
+    Ok([
+        // Only the immutable commit row carries the manifest version + branch.
+        GraphLineageRowPart {
+            object_id: commit.graph_commit_id.clone(),
+            object_type: OBJECT_TYPE_GRAPH_COMMIT,
+            metadata: commit_metadata,
+            table_version: Some(commit.manifest_version),
+            table_branch: commit.manifest_branch.clone(),
+        },
+        // The head row reuses `metadata` for its pointer payload.
+        GraphLineageRowPart {
+            object_id: graph_head_object_id(branch),
+            object_type: OBJECT_TYPE_GRAPH_HEAD,
+            metadata: head_metadata,
+            table_version: None,
+            table_branch: None,
+        },
+    ])
+}
+
 pub(super) fn entries_to_batch(
    entries: &[SubTableEntry],
    version_metadata: &HashMap<String, String>,
+    genesis_lineage: &[GraphLineageRowPart],
 ) -> Result<RecordBatch> {
-    let mut object_ids = Vec::with_capacity(entries.len() * 2);
-    let mut object_types = Vec::with_capacity(entries.len() * 2);
-    let mut locations = Vec::with_capacity(entries.len() * 2);
-    let mut metadata = Vec::with_capacity(entries.len() * 2);
-    let mut table_keys = Vec::with_capacity(entries.len() * 2);
-    let mut table_versions = Vec::with_capacity(entries.len() * 2);
-    let mut table_branches = Vec::with_capacity(entries.len() * 2);
-    let mut row_counts = Vec::with_capacity(entries.len() * 2);
+    let cap = entries.len() * 2 + genesis_lineage.len();
+    let mut object_ids = Vec::with_capacity(cap);
+    let mut object_types = Vec::with_capacity(cap);
+    let mut locations = Vec::with_capacity(cap);
+    let mut metadata = Vec::with_capacity(cap);
+    let mut table_keys = Vec::with_capacity(cap);
+    let mut table_versions = Vec::with_capacity(cap);
+    let mut table_branches = Vec::with_capacity(cap);
+    let mut row_counts = Vec::with_capacity(cap);

    for entry in entries {
        object_ids.push(entry.table_key.clone());
@ -271,6 +554,22 @@ pub(super) fn entries_to_batch(
        row_counts.push(Some(entry.row_count));
    }

+    // Genesis graph-lineage rows ride the init write so a fresh graph carries
+    // its `graph_commit` + `graph_head` in `__manifest` from version one (no
+    // separate lineage fragment, no second commit). `table_key` is non-nullable
+    // but lineage rows have no table identity, so the empty string stands in
+    // (never matched by a real key).
+    for part in genesis_lineage {
+        object_ids.push(part.object_id.clone());
+        object_types.push(part.object_type.to_string());
+        locations.push(None);
+        metadata.push(Some(part.metadata.clone()));
+        table_keys.push(String::new());
+        table_versions.push(part.table_version);
+        table_branches.push(part.table_branch.clone());
+        row_counts.push(None);
+    }
+
    manifest_rows_batch(
        object_ids,
        object_types,
@ -283,6 +582,72 @@ pub(super) fn entries_to_batch(
    )
 }

+/// Merge-insert a set of graph-lineage rows (`graph_commit` + `graph_head`)
+/// straight into `__manifest`, keyed on `object_id`. Used only by the v3→v4
+/// internal-schema backfill (RFC-013 step 4): the normal write path folds
+/// lineage into the publisher's batch, but the migration writes lineage with
+/// no accompanying table-version change, so it issues its own merge.
+///
+/// Mirrors the publisher's merge knobs (`use_index(false)`, `skip_auto_cleanup`,
+/// `conflict_retries(0)`) so it has identical CAS / cleanup semantics. The
+/// migration runs under the open-for-write path and is idempotent (re-inserting
+/// the same `object_id` rows updates them in place), so it does not need the
+/// publisher's retry loop. Returns the advanced dataset (its version is the
+/// commit the lineage landed in).
+pub(crate) async fn merge_lineage_rows(
+    dataset: Dataset,
+    parts: &[GraphLineageRowPart],
+) -> Result<Dataset> {
+    let len = parts.len();
+    let mut object_ids = Vec::with_capacity(len);
+    let mut object_types = Vec::with_capacity(len);
+    let mut metadata = Vec::with_capacity(len);
+    let mut table_versions = Vec::with_capacity(len);
+    let mut table_branches = Vec::with_capacity(len);
+    for part in parts {
+        object_ids.push(part.object_id.clone());
+        object_types.push(part.object_type.to_string());
+        metadata.push(Some(part.metadata.clone()));
+        table_versions.push(part.table_version);
+        table_branches.push(part.table_branch.clone());
+    }
+    // Lineage rows carry no table identity: empty `table_key`, null location /
+    // row_count (matching `lineage_part_to_pending` in the publisher).
+    let batch = manifest_rows_batch(
+        object_ids,
+        object_types,
+        vec![None; len],
+        metadata,
+        vec![String::new(); len],
+        table_versions,
+        table_branches,
+        vec![None; len],
+    )?;
+    let reader =
+        arrow_array::RecordBatchIterator::new(vec![Ok(batch)], manifest_schema());
+    let dataset = Arc::new(dataset);
+    let mut merge_builder =
+        lance::dataset::MergeInsertBuilder::try_new(dataset, vec!["object_id".to_string()])
+            .map_err(|e| OmniError::Lance(e.to_string()))?;
+    merge_builder.when_matched(lance::dataset::WhenMatched::UpdateAll);
+    merge_builder.when_not_matched(lance::dataset::WhenNotMatched::InsertAll);
+    merge_builder.conflict_retries(0);
+    merge_builder.use_index(false);
+    merge_builder.skip_auto_cleanup(true);
+    let (new_dataset, _stats) = merge_builder
+        .try_build()
+        .map_err(|e| OmniError::Lance(e.to_string()))?
+        .execute_reader(Box::new(reader))
+        // Route through the publisher's classifier (not a stringify) so a
+        // concurrent first-open's CAS loss on `__manifest` surfaces as the SAME
+        // typed `RowLevelCasContention` the publisher's retry consumes. The
+        // migration's re-open retry loop matches on that to converge instead of
+        // erroring out (FIX B).
+        .await
+        .map_err(super::publisher::map_lance_publish_error)?;
+    Ok(Arc::try_unwrap(new_dataset).unwrap_or_else(|arc| (*arc).clone()))
+}
+
 pub(super) fn manifest_rows_batch(
    object_ids: Vec<String>,
    object_types: Vec<String>,
--- a/crates/omnigraph/src/db/manifest/tests.rs
+++ b/crates/omnigraph/src/db/manifest/tests.rs
--- a/crates/omnigraph/src/db/mod.rs
+++ b/crates/omnigraph/src/db/mod.rs
@ -10,12 +10,15 @@ pub use commit_graph::GraphCommit;
 pub use graph_coordinator::{GraphCoordinator, ReadTarget, ResolvedTarget, SnapshotId};
 pub use manifest::{Snapshot, SubTableEntry, SubTableUpdate};
 pub(crate) use omnigraph::ensure_public_branch_ref;
+pub(crate) use omnigraph::WriteTxn;
 pub use omnigraph::{
    CleanupPolicyOptions, InitOptions, MergeOutcome, Omnigraph, OpenMode, PendingIndex,
    RepairAction, RepairClassification, RepairOptions, RepairStats, SchemaApplyOptions,
    SchemaApplyResult, SkipReason, TableCleanupStats, TableOptimizeStats, TableRepairStats,
 };

+use crate::error::{OmniError, Result};
+
 pub(crate) const SCHEMA_APPLY_LOCK_BRANCH: &str = "__schema_apply_lock__";

 /// Mutation kind, threaded through the version-check call sites so the
@ -73,3 +76,14 @@ pub(crate) fn is_internal_system_branch(name: &str) -> bool {
    // only internal branch the engine still creates is the schema-apply lock.
    is_schema_apply_lock_branch(name)
 }
+
+/// Microseconds since the UNIX epoch — the `created_at` stamp threaded through
+/// every graph-lineage / recovery-audit / commit-graph row. One canonical
+/// helper so the clock-error mapping (variant + message) cannot drift across
+/// the call sites that record those timestamps.
+pub(crate) fn now_micros() -> Result<i64> {
+    let duration = std::time::SystemTime::now()
+        .duration_since(std::time::UNIX_EPOCH)
+        .map_err(|e| OmniError::manifest(format!("system clock before UNIX_EPOCH: {e}")))?;
+    Ok(duration.as_micros() as i64)
+}
--- a/crates/omnigraph/src/db/omnigraph.rs
+++ b/crates/omnigraph/src/db/omnigraph.rs
@ -41,6 +41,7 @@ pub use repair::{
 };
 pub use schema_apply::SchemaApplyOptions;
 pub use table_ops::PendingIndex;
+pub(crate) use table_ops::OpenedForMutation;

 use super::commit_graph::GraphCommit;
 use super::manifest::{
@ -79,6 +80,35 @@ pub struct SchemaApplyPreview {
    pub catalog: Catalog,
 }

+/// A capture-once write transaction (RFC-013 step 3b). Pins the operation's read
+/// base ONCE so the per-table opens reuse the pinned version instead of
+/// re-resolving / re-validating per table. The schema contract is validated once
+/// (when `base` is captured). NOT a general "no re-resolution" handle — the
+/// commit-time OCC re-read, the live-HEAD drift probe, and the fork-authority reads
+/// stay fresh (correctness machinery). Step 5 (PublishPlan unification) makes this
+/// the non-optional publish carrier and adds session-aware base opens there, gated
+/// by an S3 cost test — the warm-session benefit on the single remaining open is an
+/// object-store phenomenon, so it earns its own gate rather than riding this PR.
+///
+/// Threaded as `Option<&WriteTxn>` through the mutate/load write chain
+/// (`open_for_mutation_on_branch`, `commit_all`, `commit_updates_on_branch_with_expected`)
+/// so a single write validates the schema contract EXACTLY ONCE — at capture. When
+/// present, the per-table resolves source the pinned `base` entry instead of calling
+/// `resolved_branch_target` / `snapshot_for_branch` / `fresh_snapshot_for_branch`
+/// (each of which re-runs `ensure_schema_state_valid`). When absent (`None` — every
+/// non-mutate/load caller), every threaded function behaves byte-identically to
+/// before. The carrier never removes a version guard or changes which dataset version
+/// the per-table open targets: strict ops keep `open_dataset_head_for_write` +
+/// `ensure_expected_version`, and the commit-time OCC re-read still opens a fresh
+/// manifest snapshot (via `fresh_snapshot_for_branch_unchecked`) — only the redundant
+/// schema re-validation is dropped.
+pub(crate) struct WriteTxn {
+    /// The resolved branch (`None` = main).
+    pub(crate) branch: Option<String>,
+    /// The pinned base snapshot (per-table location + version + e_tag), captured once.
+    pub(crate) base: Snapshot,
+}
+
 /// Top-level handle to an Omnigraph database.
 ///
 /// An Omnigraph is a Lance-native graph database with git-style branching.
@ -93,12 +123,12 @@ pub struct Omnigraph {
    /// calls without a global write lock). Reads (`snapshot`, `version`,
    /// `current_branch`, `branch_list`, `resolve_*`, `head_commit_id`,
    /// `list_commits`, …) acquire `.read().await` and parallelize.
-    /// Writes (`refresh`, `branch_create`, `branch_delete`, `commit_*`,
-    /// `record_*`) acquire `.write().await` and serialize. The atomic
-    /// commit invariant — `commit_manifest_updates` followed by
-    /// `record_graph_commit` must be atomic — is preserved by the
-    /// single `.write()` covering both calls inside
-    /// `commit_updates_with_actor_with_expected`. PR 2 Phase 2
+    /// Writes (`refresh`, `branch_create`, `branch_delete`, `commit_*`)
+    /// acquire `.write().await` and serialize. The atomic commit invariant —
+    /// table-version rows and the graph commit are one unit — holds by
+    /// construction since RFC-013 Phase 7: both ride a SINGLE manifest publish
+    /// CAS (`commit_changes_with_lineage`), so there is no two-write window to
+    /// keep atomic. PR 2 Phase 2
    /// converted from `Mutex` to `RwLock` because the bench showed
    /// the Mutex was the dominant serializer for disjoint-table
    /// workloads. Lock acquisition order: always before `runtime_cache`
@ -287,7 +317,7 @@ impl Omnigraph {
            {
                return Err(OmniError::AlreadyInitialized { uri: root.clone() });
            }
-            if let Err(err) = crate::failpoints::maybe_fail("init.after_schema_pg_written") {
+            if let Err(err) = crate::failpoints::maybe_fail(crate::failpoints::names::INIT_AFTER_SCHEMA_PG_WRITTEN) {
                best_effort_cleanup_init_artifacts(&root, storage.as_ref()).await;
                return Err(err);
            }
@ -387,6 +417,14 @@ impl Omnigraph {
        // first read-write open (an accepted, documented limitation).
        if matches!(mode, OpenMode::ReadWrite) {
            crate::db::manifest::migrate_on_open(&root).await?;
+        } else {
+            // A read-only open skips `migrate_on_open` (no object-store writes),
+            // which is where the version refusal otherwise lives. Still refuse a
+            // `__manifest` stamped outside this binary's supported range — newer
+            // than CURRENT (an old binary cannot silently misread a newer graph,
+            // e.g. one folded to internal-schema v4 lineage), or below
+            // MIN_SUPPORTED (predates the readers we carry). Read-only, no write.
+            crate::db::manifest::refuse_if_internal_schema_unsupported(&root).await?;
        }
        // Open the coordinator first so the schema-staging recovery sweep can
        // compare its snapshot against any leftover staging files.
@ -736,6 +774,29 @@ impl Omnigraph {
        *self.coordinator.write().await = coordinator;
    }

+    /// Open a capture-once write transaction (RFC-013 step 3b): validate the schema
+    /// contract ONCE and pin the base snapshot. The per-table opens take
+    /// `Option<&WriteTxn>` and, on the bound branch for the non-strict (Insert/Merge)
+    /// path, source the pinned base entry — instead of re-resolving (re-validating the
+    /// schema) per table. Strict ops, the fork path, and the commit-time OCC re-read
+    /// keep their fresh reads (those are correctness machinery — see the handoff doc).
+    ///
+    /// "Once" covers the table-touch hot path captured here (proven by the node-insert
+    /// gate `write_validates_schema_contract_once`); it does NOT yet cover edge endpoint
+    /// / cardinality RI validation (`ensure_node_id_exists`, the loader's RI/cardinality),
+    /// which still resolve through `snapshot_for_branch` and re-validate. Those reads must
+    /// observe LIVE committed state, so unifying them (validate-once + pinned + re-checked
+    /// read-set) is step 4's §7.1 work — threading `txn.base` there would re-introduce the
+    /// stale-read class the #298 cardinality fix removed. A session-aware base open is
+    /// likewise deferred to step 5 (handoff §1d).
+    pub(crate) async fn open_write_txn(&self, branch: Option<&str>) -> Result<WriteTxn> {
+        let resolved = self.resolved_branch_target(branch).await?;
+        Ok(WriteTxn {
+            branch: resolved.branch,
+            base: resolved.snapshot,
+        })
+    }
+
    pub(crate) async fn resolved_branch_target(
        &self,
        branch: Option<&str>,
@ -770,12 +831,39 @@ impl Omnigraph {

    pub(crate) async fn fresh_snapshot_for_branch(&self, branch: Option<&str>) -> Result<Snapshot> {
        self.ensure_schema_state_valid().await?;
-        let requested = ReadTarget::Branch(branch.unwrap_or("main").to_string());
-        let coord = self.coordinator.read().await;
-        coord
-            .resolve_target(&requested)
-            .await
-            .map(|resolved| resolved.snapshot)
+        self.fresh_snapshot_for_branch_unchecked(branch).await
+    }
+
+    /// Fresh per-branch manifest snapshot WITHOUT the schema-contract
+    /// re-validation. Identical OCC freshness to [`fresh_snapshot_for_branch`]
+    /// — a fresh manifest re-read from storage, never the warm cache — only the
+    /// redundant `ensure_schema_state_valid` is dropped. Used inside a single
+    /// write once a `WriteTxn` has already validated the contract at capture: the
+    /// commit-time drift re-read needs the live manifest, not a second contract
+    /// read. Callers with no `WriteTxn` MUST use the checked variant.
+    ///
+    /// Reads the manifest directly via `ManifestCoordinator` rather than
+    /// `resolve_target`. The OCC re-read uses only the returned `Snapshot`
+    /// (per-table location + version), which `ManifestCoordinator::open().snapshot()`
+    /// produces identically to `GraphCoordinator::open(...).snapshot()` — but
+    /// `resolve_target` additionally opens the commit graph (an extra
+    /// `_graph_commits.lance` probe) the OCC read never consults. Skipping that
+    /// load is a pure read-cost reduction, not a freshness change. The checked
+    /// `fresh_snapshot_for_branch` delegates here, so its no-`txn` callers
+    /// (commit_all's None arm, optimize, repair, fork reclaim) get the same
+    /// identical `Snapshot` via this lighter manifest-only read; they consume
+    /// only the snapshot and never relied on the commit-graph side load.
+    pub(crate) async fn fresh_snapshot_for_branch_unchecked(
+        &self,
+        branch: Option<&str>,
+    ) -> Result<Snapshot> {
+        let manifest = match branch {
+            Some(branch) => {
+                crate::db::manifest::ManifestCoordinator::open_at_branch(self.uri(), branch).await?
+            }
+            None => crate::db::manifest::ManifestCoordinator::open(self.uri()).await?,
+        };
+        Ok(manifest.snapshot())
    }

    pub(crate) async fn version(&self) -> u64 {
@ -1367,7 +1455,7 @@ impl Omnigraph {

        for (table_key, table_path) in cleanup_targets {
            let dataset_uri = self.storage().dataset_uri(&table_path);
-            let outcome = match crate::failpoints::maybe_fail("branch_delete.before_table_cleanup")
+            let outcome = match crate::failpoints::maybe_fail(crate::failpoints::names::BRANCH_DELETE_BEFORE_TABLE_CLEANUP)
            {
                Ok(()) => {
                    self.storage()
@ -1599,7 +1687,7 @@ impl Omnigraph {
        &self,
        table_key: &str,
        op_kind: crate::db::MutationOpKind,
-    ) -> Result<(SnapshotHandle, String, Option<String>)> {
+    ) -> Result<OpenedForMutation> {
        table_ops::open_for_mutation(self, table_key, op_kind).await
    }

@ -1608,8 +1696,9 @@ impl Omnigraph {
        branch: Option<&str>,
        table_key: &str,
        op_kind: crate::db::MutationOpKind,
-    ) -> Result<(SnapshotHandle, String, Option<String>)> {
-        table_ops::open_for_mutation_on_branch(self, branch, table_key, op_kind).await
+        txn: Option<&crate::db::WriteTxn>,
+    ) -> Result<OpenedForMutation> {
+        table_ops::open_for_mutation_on_branch(self, branch, table_key, op_kind, txn).await
    }

    /// Fork `table_key` onto `active_branch` from the given source state,
@ -1698,28 +1787,17 @@ impl Omnigraph {
        table_ops::commit_updates(self, updates).await
    }

-    pub(crate) async fn commit_manifest_updates(
+    /// Publish a branch merge: the merged table `updates` and the merge commit
+    /// in one manifest CAS (RFC-013 Phase 7). The merge commit's merged-in parent
+    /// is `merged_parent_commit_id` (the source head); its first parent is the
+    /// live target-branch head, resolved by the publisher.
+    pub(crate) async fn commit_merge_with_actor(
        &self,
        updates: &[crate::db::SubTableUpdate],
-    ) -> Result<u64> {
-        table_ops::commit_manifest_updates(self, updates).await
-    }
-
-    pub(crate) async fn record_merge_commit(
-        &self,
-        manifest_version: u64,
-        parent_commit_id: &str,
        merged_parent_commit_id: &str,
        actor_id: Option<&str>,
    ) -> Result<String> {
-        table_ops::record_merge_commit(
-            self,
-            manifest_version,
-            parent_commit_id,
-            merged_parent_commit_id,
-            actor_id,
-        )
-        .await
+        table_ops::commit_merge_with_actor(self, updates, merged_parent_commit_id, actor_id).await
    }

    pub(crate) async fn commit_updates_on_branch_with_expected(
@ -1728,6 +1806,8 @@ impl Omnigraph {
        updates: &[crate::db::SubTableUpdate],
        expected_table_versions: &std::collections::HashMap<String, u64>,
        actor_id: Option<&str>,
+        txn: Option<&crate::db::WriteTxn>,
+        committed_handles: std::collections::HashMap<String, crate::storage_layer::SnapshotHandle>,
    ) -> Result<u64> {
        table_ops::commit_updates_on_branch_with_expected(
            self,
@ -1735,6 +1815,8 @@ impl Omnigraph {
            updates,
            expected_table_versions,
            actor_id,
+            txn,
+            committed_handles,
        )
        .await
    }
@ -1939,14 +2021,14 @@ async fn init_storage_phase(
    if write_schema_pg {
        let schema_path = join_uri(root, SCHEMA_SOURCE_FILENAME);
        storage.write_text(&schema_path, schema_source).await?;
-        crate::failpoints::maybe_fail("init.after_schema_pg_written")?;
+        crate::failpoints::maybe_fail(crate::failpoints::names::INIT_AFTER_SCHEMA_PG_WRITTEN)?;
    }

    write_schema_contract(root, storage.as_ref(), schema_ir).await?;
-    crate::failpoints::maybe_fail("init.after_schema_contract_written")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::INIT_AFTER_SCHEMA_CONTRACT_WRITTEN)?;

    let coordinator = GraphCoordinator::init(root, catalog, Arc::clone(storage)).await?;
-    crate::failpoints::maybe_fail("init.after_coordinator_init")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::INIT_AFTER_COORDINATOR_INIT)?;

    Ok(coordinator)
 }
@ -2466,10 +2548,13 @@ edge WorksAt: Person -> Company
    }

    async fn seed_person_row(db: &mut Omnigraph, name: &str, age: Option<i32>) {
+        // No-txn entry, so the handle is always `Some` (collapse #1's skip is
+        // gated on `txn.is_some()`).
        let (ds, full_path, table_branch) = db
            .open_for_mutation("node:Person", crate::db::MutationOpKind::Insert)
            .await
-            .unwrap();
+            .unwrap()
+            .require_handle("seed_person_row test");
        let schema: Arc<Schema> = Arc::new(ds.dataset().schema().into());
        let columns: Vec<Arc<dyn Array>> = schema
            .fields()
--- a/crates/omnigraph/src/db/omnigraph/optimize.rs
+++ b/crates/omnigraph/src/db/omnigraph/optimize.rs
@ -512,7 +512,7 @@ async fn optimize_one_table(

        // Test seam: a concurrent (cross-process) writer can interleave here, before
        // any Phase-B commit lands, to exercise the reopen+replan path.
-        crate::failpoints::maybe_fail("optimize.before_compact")?;
+        crate::failpoints::maybe_fail(crate::failpoints::names::OPTIMIZE_BEFORE_COMPACT)?;

        // Phase B: scrub stale auto_cleanup (keeps optimize non-destructive on a
        // graph upgraded from a pre-v7 binary whose `compact_files`/`optimize_indices`
@ -549,7 +549,7 @@ async fn optimize_one_table(
        // committed (so HEAD is already ahead of the manifest from our own work),
        // exercising the own-HEAD (not external) drift classification on the next
        // reopened attempt.
-        if crate::failpoints::maybe_fail("optimize.inject_reindex_conflict").is_err()
+        if crate::failpoints::maybe_fail(crate::failpoints::names::OPTIMIZE_INJECT_REINDEX_CONFLICT).is_err()
            && attempt < COMPACTION_RETRY_BUDGET
        {
            continue;
@ -584,7 +584,7 @@ async fn optimize_one_table(

    // Pin the per-writer Phase B → Phase C residual: Lance HEAD has advanced but the
    // manifest publish below hasn't run.
-    crate::failpoints::maybe_fail("optimize.post_phase_b_pre_manifest_commit")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::OPTIMIZE_POST_PHASE_B_PRE_MANIFEST_COMMIT)?;

    // Phase C: monotonic fast-forward publish. The compaction is committed at Lance
    // HEAD `N`; publish a manifest pointer that includes it. If a concurrent writer
@ -921,7 +921,7 @@ pub async fn cleanup_all_tables(
    let results: Vec<TableCleanupStats> = futures::stream::iter(table_tasks.into_iter())
        .map(|(table_key, full_path)| async move {
            let outcome: Result<RemovalStats> = async {
-                crate::failpoints::maybe_fail("cleanup.table_gc")?;
+                crate::failpoints::maybe_fail(crate::failpoints::names::CLEANUP_TABLE_GC)?;
                // `cleanup_old_versions` is a Lance-only maintenance API not
                // surfaced through `TableStorage` — see the optimize path
                // above for the same rationale. Unwrap via `into_dataset()`.
@ -1079,7 +1079,7 @@ pub async fn reconcile_orphaned_branches(db: &Omnigraph) -> Result<BranchReconci
                }
                if !branch_snapshots.contains_key(&branch) {
                    let branch_snapshot =
-                        match crate::failpoints::maybe_fail("cleanup.resolve_branch_snapshot") {
+                        match crate::failpoints::maybe_fail(crate::failpoints::names::CLEANUP_RESOLVE_BRANCH_SNAPSHOT) {
                            Ok(()) => db.snapshot_for_branch(Some(&branch)).await,
                            Err(injected) => Err(injected),
                        };
@ -1158,7 +1158,7 @@ pub async fn reconcile_orphaned_branches(db: &Omnigraph) -> Result<BranchReconci
                    continue;
                }
            }
-            let outcome = match crate::failpoints::maybe_fail("cleanup.reconcile_fork") {
+            let outcome = match crate::failpoints::maybe_fail(crate::failpoints::names::CLEANUP_RECONCILE_FORK) {
                Ok(()) => storage.force_delete_branch(&full_path, &branch).await,
                Err(injected) => Err(injected),
            };
@ -1308,7 +1308,10 @@ mod tests {
            ds.create_branch("feature", base, None).await.unwrap();
        }

-        let _fp = ScopedFailPoint::new("cleanup.resolve_branch_snapshot", "return");
+        let _fp = ScopedFailPoint::new(
+            crate::failpoints::names::CLEANUP_RESOLVE_BRANCH_SNAPSHOT,
+            "return",
+        );
        let stats = reconcile_orphaned_branches(&db).await.unwrap();

        assert_eq!(
--- a/crates/omnigraph/src/db/omnigraph/schema_apply.rs
+++ b/crates/omnigraph/src/db/omnigraph/schema_apply.rs
@ -648,7 +648,7 @@ where
    // `recover_schema_state_files`:
    //   - crash before commit  → manifest unchanged; staging deleted on open
    //   - crash after commit   → manifest advanced; staging renamed on open
-    crate::failpoints::maybe_fail("schema_apply.before_staging_write")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::SCHEMA_APPLY_BEFORE_STAGING_WRITE)?;

    let staging_pg_uri = schema_source_staging_uri(&db.root_uri);
    db.storage
@ -656,7 +656,7 @@ where
        .await?;
    write_schema_contract_staging(&db.root_uri, db.storage.as_ref(), &desired_ir).await?;

-    crate::failpoints::maybe_fail("schema_apply.after_staging_write")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::SCHEMA_APPLY_AFTER_STAGING_WRITE)?;

    // `apply_schema` doesn't currently take an actor; system-attributed.
    let PublishedSnapshot {
@ -669,7 +669,7 @@ where
        .commit_changes_with_actor(&manifest_changes, None)
        .await?;

-    crate::failpoints::maybe_fail("schema_apply.after_manifest_commit")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::SCHEMA_APPLY_AFTER_MANIFEST_COMMIT)?;

    db.storage
        .rename_text(&staging_pg_uri, &schema_source_uri(&db.root_uri))
--- a/crates/omnigraph/src/db/omnigraph/table_ops.rs
+++ b/crates/omnigraph/src/db/omnigraph/table_ops.rs
@ -296,7 +296,7 @@ pub(super) async fn ensure_indices_for_branch(
    // (one commit_staged per index built) but the manifest publish below
    // hasn't run. Used by
    // `tests/failpoints.rs::ensure_indices_phase_b_failure_recovered_on_next_open`.
-    crate::failpoints::maybe_fail("ensure_indices.post_phase_b_pre_manifest_commit")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::ENSURE_INDICES_POST_PHASE_B_PRE_MANIFEST_COMMIT)?;

    if !updates.is_empty() {
        commit_prepared_updates_on_branch(db, branch, &updates, None).await?;
@ -488,18 +488,52 @@ pub(super) async fn needs_index_work_edge(
        || !db.storage().has_btree_index(&ds, "dst").await?)
 }

+/// Result of opening a sub-table for mutation. `handle` is `None` only when a
+/// non-strict (Insert/Merge) op on the WriteTxn's own branch skipped the
+/// accumulation open (RFC-013 step 3b collapse #1) — there the caller needs just
+/// `expected_version`. It is ALWAYS `Some` for strict ops, the fork path, and
+/// every no-`txn` caller (branch merge), which use [`Self::require_handle`].
+#[derive(Debug)]
+pub(crate) struct OpenedForMutation {
+    /// The opened dataset, or `None` on the non-strict-txn open-skip path.
+    pub(crate) handle: Option<SnapshotHandle>,
+    /// The publisher's CAS fence: the opened handle's version, or — when the open
+    /// was skipped — the pinned base entry's version (equal absent uncovered drift).
+    pub(crate) expected_version: u64,
+    pub(crate) full_path: String,
+    pub(crate) table_branch: Option<String>,
+}
+
+impl OpenedForMutation {
+    /// Destructure for a caller that REQUIRES the handle (strict ops, the fork
+    /// path, every no-`txn` caller). The `None` skip fires solely on the
+    /// non-strict `txn` path, which these callers are not — so a panic here means
+    /// a future change broke that contract, named by `ctx`.
+    pub(crate) fn require_handle(self, ctx: &str) -> (SnapshotHandle, String, Option<String>) {
+        let handle = self.handle.unwrap_or_else(|| {
+            panic!("{ctx}: open_for_mutation returned no handle on a path that requires one")
+        });
+        (handle, self.full_path, self.table_branch)
+    }
+}
+
 pub(super) async fn open_for_mutation(
    db: &Omnigraph,
    table_key: &str,
    op_kind: crate::db::MutationOpKind,
-) -> Result<(SnapshotHandle, String, Option<String>)> {
+) -> Result<OpenedForMutation> {
    let current_branch = db
        .coordinator
        .read()
        .await
        .current_branch()
        .map(str::to_string);
-    open_for_mutation_on_branch(db, current_branch.as_deref(), table_key, op_kind).await
+    // `open_for_mutation` is the no-txn entry (branch merge). Passing `None`
+    // keeps the exact pre-WriteTxn code path (a fresh `resolved_branch_target`
+    // that re-validates the schema). With `txn = None` the non-strict early-skip
+    // in `open_for_mutation_on_branch` never fires, so this always returns a
+    // `Some(handle)` for its callers.
+    open_for_mutation_on_branch(db, current_branch.as_deref(), table_key, op_kind, None).await
 }

 /// Open a sub-table for mutation. The `op_kind` selects the strict-vs-relaxed
@ -513,15 +547,69 @@ pub(super) async fn open_for_mutation_on_branch(
    branch: Option<&str>,
    table_key: &str,
    op_kind: crate::db::MutationOpKind,
-) -> Result<(SnapshotHandle, String, Option<String>)> {
+    txn: Option<&crate::db::WriteTxn>,
+) -> Result<OpenedForMutation> {
    db.ensure_schema_apply_not_locked("write").await?;
-    let resolved = db.resolved_branch_target(branch).await?;
-    let entry = resolved
-        .snapshot
+    // Source the resolved (snapshot, branch). With a `WriteTxn` the contract was
+    // validated once at capture, so use the pinned base + resolved branch instead
+    // of `resolved_branch_target` (which re-runs `ensure_schema_state_valid`). The
+    // base is the same fresh per-branch manifest read the no-txn path would have
+    // resolved — only the redundant schema re-validation is dropped. Without a txn
+    // this is byte-identical to the prior `resolved_branch_target` call.
+    let (snapshot, resolved_branch) = match txn {
+        Some(txn) => (txn.base.clone(), txn.branch.clone()),
+        None => {
+            let resolved = db.resolved_branch_target(branch).await?;
+            (resolved.snapshot, resolved.branch)
+        }
+    };
+    let entry = snapshot
        .entry(table_key)
        .ok_or_else(|| OmniError::manifest(format!("no manifest entry for {}", table_key)))?;
    let full_path = format!("{}/{}", db.root_uri, entry.table_path);
-    match resolved.branch.as_deref() {
+
+    // Collapse #1 (RFC-013 step 3b): a non-strict op (Insert/Merge) on the txn's
+    // own branch needs no dataset open for ACCUMULATION — the only thing the
+    // caller reads from this handle on the non-strict path is `.version()` (the
+    // publisher's CAS fence), which is exactly the pinned base version. The base
+    // already validated the schema contract once, and the staging reopen
+    // (`reopen_for_mutation`) plus the publisher CAS in `commit_all` are the real
+    // drift guards. So skip `open_dataset_head_for_write` entirely and source the
+    // expected version from the pinned entry.
+    //
+    // Gated on `txn.is_some()`: without a txn (branch merge's `open_for_mutation`)
+    // every arm below is byte-identical to before. STRICT ops (Update/Delete/
+    // SchemaRewrite) always open live HEAD + run `ensure_expected_version`
+    // (read-modify-write SI), and any write that must FORK (the table isn't yet on
+    // the resolved branch) opens too (the fork is a real Lance state advance the
+    // manifest snapshot can't substitute for).
+    if txn.is_some() && !op_kind.strict_pre_stage_version_check() {
+        match resolved_branch.as_deref() {
+            // Non-strict, table already on the active branch → no open, no fork.
+            Some(active_branch) if entry.table_branch.as_deref() == Some(active_branch) => {
+                return Ok(OpenedForMutation {
+                    handle: None,
+                    expected_version: entry.table_version,
+                    full_path,
+                    table_branch: Some(active_branch.to_string()),
+                });
+            }
+            // Main branch, non-strict → no open. (Main never forks.)
+            None => {
+                return Ok(OpenedForMutation {
+                    handle: None,
+                    expected_version: entry.table_version,
+                    full_path,
+                    table_branch: None,
+                });
+            }
+            // Non-strict but the table isn't on the active branch yet — falls
+            // through to fork below.
+            Some(_) => {}
+        }
+    }
+
+    match resolved_branch.as_deref() {
        None => {
            let ds = db
                .storage()
@ -531,7 +619,13 @@ pub(super) async fn open_for_mutation_on_branch(
                db.storage()
                    .ensure_expected_version(&ds, table_key, entry.table_version)?;
            }
-            Ok((ds, full_path, None))
+            let version = ds.version();
+            Ok(OpenedForMutation {
+                handle: Some(ds),
+                expected_version: version,
+                full_path,
+                table_branch: None,
+            })
        }
        Some(active_branch) => {
            let (ds, table_branch) = open_owned_dataset_for_branch_write(
@ -544,7 +638,13 @@ pub(super) async fn open_for_mutation_on_branch(
                op_kind,
            )
            .await?;
-            Ok((ds, full_path, table_branch))
+            let version = ds.version();
+            Ok(OpenedForMutation {
+                handle: Some(ds),
+                expected_version: version,
+                full_path,
+                table_branch,
+            })
        }
    }
 }
@ -571,7 +671,7 @@ pub(super) async fn open_owned_dataset_for_branch_write(
            Ok((ds, Some(active_branch.to_string())))
        }
        source_branch => {
-            crate::failpoints::maybe_fail("fork.before_classify")?;
+            crate::failpoints::maybe_fail(crate::failpoints::names::FORK_BEFORE_CLASSIFY)?;
            // Authority check before forking: re-read the live manifest. If this
            // table is already forked on active_branch, a concurrent first-write
            // won the race and our snapshot is stale — that is a retryable
@ -667,7 +767,7 @@ pub(crate) async fn classify_fork_ref(
    // fresh-authority read (no-op without the `failpoints` feature). Lets a
    // test exercise the Indeterminate path — a read failure on a live branch
    // must classify as Indeterminate (skip), never Orphan (destroy).
-    let fresh = match crate::failpoints::maybe_fail("classify.fresh_read") {
+    let fresh = match crate::failpoints::maybe_fail(crate::failpoints::names::CLASSIFY_FRESH_READ) {
        Ok(()) => db.fresh_snapshot_for_branch(Some(branch)).await,
        Err(injected) => Err(injected),
    };
@ -751,7 +851,7 @@ pub(super) async fn reclaim_orphaned_fork_and_refork(
        }
    }

-    crate::failpoints::maybe_fail("fork.before_reclaim")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::FORK_BEFORE_RECLAIM)?;
    db.storage()
        .force_delete_branch(full_path, active_branch)
        .await
@ -1014,7 +1114,7 @@ async fn stage_and_commit_btree(
    // to demonstrate that a stage-step failure in the staged-index
    // path (`stage_create_btree_index` succeeded; `commit_staged` not
    // yet called) leaves no Lance-HEAD drift on the touched table.
-    crate::failpoints::maybe_fail("ensure_indices.post_stage_pre_commit_btree")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::ENSURE_INDICES_POST_STAGE_PRE_COMMIT_BTREE)?;
    let new_ds = db
        .storage()
        .commit_staged(ds.clone(), staged)
@ -1065,12 +1165,30 @@ async fn prepare_updates_for_commit(
    db: &Omnigraph,
    branch: Option<&str>,
    updates: &[crate::db::SubTableUpdate],
+    txn: Option<&crate::db::WriteTxn>,
+    // Post-`commit_staged` handles handed out by `StagedMutation::commit_all`
+    // (RFC-013 step 3b, collapse #4): table_key → the handle already open at
+    // its just-committed version. When a table's handle is present, the index
+    // build below reuses it and SKIPS the `reopen_for_mutation` open. Absent
+    // entries (other writers — schema apply, merge, ensure_indices, tests —
+    // pass `HashMap::new()`; inline-committed/delete tables are never staged)
+    // keep the byte-identical `reopen_for_mutation` path.
+    mut committed_handles: std::collections::HashMap<String, SnapshotHandle>,
 ) -> Result<Vec<crate::db::SubTableUpdate>> {
    if updates.is_empty() {
        return Ok(Vec::new());
    }

-    let snapshot = db.snapshot_for_branch(branch).await?;
+    // With a `WriteTxn` the schema contract was validated once at capture, so
+    // reuse the pinned base entries (same per-branch manifest snapshot) instead
+    // of `snapshot_for_branch` (which re-runs `ensure_schema_state_valid`). Only
+    // the `entry(table_key).table_path` is read out of it here, identical to the
+    // no-txn path; the post-`commit_staged` index build below still reopens the
+    // dataset at its just-committed version. Without a txn, byte-identical.
+    let snapshot = match txn {
+        Some(txn) => txn.base.clone(),
+        None => db.snapshot_for_branch(branch).await?,
+    };
    let mut prepared = Vec::with_capacity(updates.len());

    for update in updates {
@ -1084,21 +1202,34 @@ async fn prepare_updates_for_commit(
        let mut prepared_update = update.clone();
        if prepared_update.row_count > 0 {
            let full_path = format!("{}/{}", db.root_uri, entry.table_path);
-            // Strict version check is correct here: this runs INSIDE
+            // Reuse the post-`commit_staged` handle when the caller handed one
+            // out (collapse #4): it is already open at exactly
+            // `prepared_update.table_version`, so the defense-in-depth strict
+            // re-check `reopen_for_mutation` would run is trivially satisfied
+            // and the open is redundant. When no handle is present (other
+            // writers, or any non-staged table), fall back to the byte-identical
+            // `reopen_for_mutation` path.
+            //
+            // Strict version check is correct on the fallback: this runs INSIDE
            // the publisher commit path, after `commit_staged` already
            // advanced Lance HEAD to `prepared_update.table_version`.
            // The check is a defense-in-depth assertion that the
            // dataset state matches what we just committed; not the
            // pre-stage race the op-kind policy targets.
-            let mut ds = reopen_for_mutation(
-                db,
-                &prepared_update.table_key,
-                &full_path,
-                prepared_update.table_branch.as_deref(),
-                prepared_update.table_version,
-                crate::db::MutationOpKind::SchemaRewrite,
-            )
-            .await?;
+            let mut ds = match committed_handles.remove(&prepared_update.table_key) {
+                Some(ds) => ds,
+                None => {
+                    reopen_for_mutation(
+                        db,
+                        &prepared_update.table_key,
+                        &full_path,
+                        prepared_update.table_branch.as_deref(),
+                        prepared_update.table_version,
+                        crate::db::MutationOpKind::SchemaRewrite,
+                    )
+                    .await?
+                }
+            };
            // Any column not yet buildable (e.g. a vector column whose rows
            // have null embeddings) is deferred and logged inside
            // build_indices; a later ensure_indices/optimize materializes it.
@ -1237,37 +1368,27 @@ pub(super) async fn commit_updates(
        .await
        .current_branch()
        .map(str::to_string);
-    let prepared = prepare_updates_for_commit(db, current_branch.as_deref(), updates).await?;
+    let prepared = prepare_updates_for_commit(
+        db,
+        current_branch.as_deref(),
+        updates,
+        None,
+        std::collections::HashMap::new(),
+    )
+    .await?;
    commit_prepared_updates(db, &prepared, None).await
 }

-pub(super) async fn commit_manifest_updates(
+pub(super) async fn commit_merge_with_actor(
    db: &Omnigraph,
    updates: &[crate::db::SubTableUpdate],
-) -> Result<u64> {
-    db.coordinator
-        .write()
-        .await
-        .commit_manifest_updates(updates)
-        .await
-}
-
-pub(super) async fn record_merge_commit(
-    db: &Omnigraph,
-    manifest_version: u64,
-    parent_commit_id: &str,
    merged_parent_commit_id: &str,
    actor_id: Option<&str>,
 ) -> Result<String> {
    db.coordinator
        .write()
        .await
-        .record_merge_commit(
-            manifest_version,
-            parent_commit_id,
-            merged_parent_commit_id,
-            actor_id,
-        )
+        .commit_merge_with_actor(updates, merged_parent_commit_id, actor_id)
        .await
        .map(|snapshot_id| snapshot_id.as_str().to_string())
 }
@ -1281,9 +1402,12 @@ pub(super) async fn commit_updates_on_branch_with_expected(
    updates: &[crate::db::SubTableUpdate],
    expected_table_versions: &std::collections::HashMap<String, u64>,
    actor_id: Option<&str>,
+    txn: Option<&crate::db::WriteTxn>,
+    committed_handles: std::collections::HashMap<String, SnapshotHandle>,
 ) -> Result<u64> {
    db.ensure_schema_apply_not_locked("write commit").await?;
-    let prepared = prepare_updates_for_commit(db, branch, updates).await?;
+    let prepared =
+        prepare_updates_for_commit(db, branch, updates, txn, committed_handles).await?;
    commit_prepared_updates_on_branch_with_expected(
        db,
        branch,
--- a/crates/omnigraph/src/db/recovery_audit.rs
+++ b/crates/omnigraph/src/db/recovery_audit.rs
@ -14,15 +14,14 @@
 //! this change additive.
 //!
 //! Atomicity caveat: append to `_graph_commit_recoveries.lance` is
-//! sequential w.r.t. the `CommitGraph::append_commit` write. A crash
-//! between the two leaves an orphan commit-graph row with no audit row.
-//! Same shape as the existing `_graph_commits` + `_graph_commit_actors`
-//! split; the recovery sweep tolerates it the same way (re-entry sees
-//! `NoMovement` for already-restored / already-published tables; the
-//! audit append is retried).
+//! sequential w.r.t. the recovery commit, which RFC-013 Phase 7 records in
+//! `__manifest` (folded into the recovery publish CAS via `publish_recovery_commit`).
+//! A crash between the publish and this audit append leaves a recovery commit
+//! with no audit row. The recovery sweep tolerates it the same way (re-entry
+//! sees `NoMovement` for already-restored / already-published tables; the audit
+//! append is retried, minting a fresh recovery commit).

 use std::sync::Arc;
-use std::time::{SystemTime, UNIX_EPOCH};

 use arrow_array::{
    Array, RecordBatch, RecordBatchIterator, StringArray, TimestampMicrosecondArray,
@ -195,7 +194,11 @@ async fn create_recoveries_dataset(root_uri: &str) -> Result<Dataset> {
    };
    match Dataset::write(reader, &uri as &str, Some(params)).await {
        Ok(dataset) => Ok(dataset),
-        Err(err) if err.to_string().contains("Dataset already exists") => Dataset::open(&uri)
+        // Create-or-open idempotency — match the typed `DatasetAlreadyExists`
+        // variant, not the display string (not a Lance API contract). Same
+        // discipline as `commit_graph.rs`'s create-or-open; pinned by
+        // `lance_surface_guards.rs::lance_error_dataset_already_exists_variant_exists`.
+        Err(lance::Error::DatasetAlreadyExists { .. }) => Dataset::open(&uri)
            .await
            .map_err(|open_err| OmniError::Lance(open_err.to_string())),
        Err(err) => Err(OmniError::Lance(err.to_string())),
@ -276,13 +279,6 @@ fn decode_row(batch: &RecordBatch, row: usize) -> Result<RecoveryAuditRecord> {
    })
 }

-pub(crate) fn now_micros() -> Result<i64> {
-    SystemTime::now()
-        .duration_since(UNIX_EPOCH)
-        .map(|d| d.as_micros() as i64)
-        .map_err(|e| OmniError::manifest_internal(format!("system clock before unix epoch: {}", e)))
-}
-
 #[cfg(test)]
 mod tests {
    use super::*;
--- a/crates/omnigraph/src/exec/merge.rs
+++ b/crates/omnigraph/src/exec/merge.rs
@ -1068,10 +1068,13 @@ async fn publish_rewritten_merge_table(
    // source onto target). The inline `delete_where` later in this
    // function operates on rows the rewrite chose to remove, not
    // user-facing predicates, so Merge is the correct policy here.
-    let (ds, full_path, table_branch) = target_db
+    // `open_for_mutation` is the no-txn entry, so collapse #1's non-strict
+    // open-skip (gated on `txn.is_some()`) never fires here — the handle is
+    // always `Some`.
+    let (mut current_ds, full_path, table_branch) = target_db
        .open_for_mutation(table_key, crate::db::MutationOpKind::Merge)
-        .await?;
-    let mut current_ds = ds;
+        .await?
+        .require_handle("branch merge");

    // Phase 1: merge_insert changed/new rows (preserves _row_created_at_version for
    // existing rows, bumps _row_last_updated_at_version only for actually-changed rows).
@ -1125,7 +1128,7 @@ async fn publish_rewritten_merge_table(
    // rows are on Lance HEAD but the delete has not committed and the
    // achieved-version intent has not been recorded, so recovery must roll BACK.
    // See tests/failpoints.rs::branch_merge_rewrite_partial_after_merge_rolls_back.
-    crate::failpoints::maybe_fail("branch_merge.rewrite_after_merge_pre_delete")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::BRANCH_MERGE_REWRITE_AFTER_MERGE_PRE_DELETE)?;

    // Phase 2: delete removed rows via deletion vectors.
    //
@ -1156,7 +1159,7 @@ async fn publish_rewritten_merge_table(
    // recorded, so recovery must roll BACK (the index is reconciler-owned derived
    // state, but the merge itself never reached its commit boundary). See
    // tests/failpoints.rs::branch_merge_rewrite_partial_after_delete_rolls_back.
-    crate::failpoints::maybe_fail("branch_merge.rewrite_after_delete_pre_index")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::BRANCH_MERGE_REWRITE_AFTER_DELETE_PRE_INDEX)?;

    // Phase 3: rebuild indices.
    //
@ -1237,10 +1240,13 @@ async fn publish_adopted_delta(
    table_key: &str,
    delta: &AdoptDelta,
 ) -> Result<crate::db::SubTableUpdate> {
-    let (ds, full_path, table_branch) = target_db
+    // `open_for_mutation` is the no-txn entry, so collapse #1's non-strict
+    // open-skip (gated on `txn.is_some()`) never fires here — the handle is
+    // always `Some`.
+    let (mut current_ds, full_path, table_branch) = target_db
        .open_for_mutation(table_key, crate::db::MutationOpKind::Merge)
-        .await?;
-    let mut current_ds = ds;
+        .await?
+        .require_handle("branch merge");

    // Phase 1a: append the NEW rows. `stage_append_stream` is a streaming
    // `Operation::Append` — no hash join — so it never buffers the delta and
@ -1270,7 +1276,7 @@ async fn publish_adopted_delta(
    // have not committed and the achieved-version intent has not been recorded, so
    // recovery must roll BACK (not publish the appends-only state). See
    // tests/failpoints.rs::branch_merge_adopt_partial_after_append_rolls_back.
-    crate::failpoints::maybe_fail("branch_merge.adopt_after_append_pre_upsert")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::BRANCH_MERGE_ADOPT_AFTER_APPEND_PRE_UPSERT)?;

    // Phase 1b: upsert the CHANGED rows. The merge_insert hash join is now
    // bounded to the genuinely-changed set, not the whole delta. It runs against
@ -1302,7 +1308,7 @@ async fn publish_adopted_delta(
    // has not committed and the achieved-version intent has not been recorded, so
    // recovery must roll BACK. See
    // tests/failpoints.rs::branch_merge_adopt_partial_after_upsert_rolls_back.
-    crate::failpoints::maybe_fail("branch_merge.adopt_after_upsert_pre_delete")?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::BRANCH_MERGE_ADOPT_AFTER_UPSERT_PRE_DELETE)?;

    // Phase 2: delete removed rows via deletion vectors (inline-commit residual,
    // same as the three-way path until Lance ships a public two-phase delete).
@ -1787,17 +1793,22 @@ impl Omnigraph {
        // (publish_*) AND the sidecar is confirmed, but the manifest publish
        // below hasn't run — so recovery rolls FORWARD. Used by
        // `tests/failpoints.rs::branch_merge_phase_b_failure_recovered_on_next_open`.
-        crate::failpoints::maybe_fail("branch_merge.post_phase_b_pre_manifest_commit")?;
+        crate::failpoints::maybe_fail(crate::failpoints::names::BRANCH_MERGE_POST_PHASE_B_PRE_MANIFEST_COMMIT)?;

-        let manifest_version = if updates.is_empty() {
-            self.version().await
-        } else {
-            self.commit_manifest_updates(&updates).await?
-        };
+        // Publish the merged table versions AND the merge commit in one manifest
+        // CAS (RFC-013 Phase 7): `graph_commit` + `graph_head` rows ride the same
+        // merge-insert as the table-version rows. The merge commit's first parent
+        // is resolved by the publisher as the live target-branch head (the
+        // post-merge correct parent even if the target advanced); its merged-in
+        // parent is the source head. `target_head_commit_id` is no longer passed
+        // — it was the pre-merge target head, which the publisher reads live.
+        let _ = target_head_commit_id;
+        self.commit_merge_with_actor(&updates, source_head_commit_id, actor_id)
+            .await?;

-        // Recovery sidecar lifecycle: delete after manifest publish.
-        // Best-effort cleanup; the merge already landed durably so
-        // failing the user here is undesirable.
+        // Recovery sidecar lifecycle: delete after the manifest publish (Phase C).
+        // Best-effort cleanup; the merge already landed durably so failing the
+        // user here is undesirable.
        if let Some((_, handle)) = recovery {
            if let Err(err) =
                crate::db::manifest::delete_sidecar(&handle, self.storage_adapter()).await
@ -1809,13 +1820,6 @@ impl Omnigraph {
                );
            }
        }
-        self.record_merge_commit(
-            manifest_version,
-            target_head_commit_id,
-            source_head_commit_id,
-            actor_id,
-        )
-        .await?;

        if changed_edge_tables {
            self.invalidate_graph_index().await;
--- a/crates/omnigraph/src/exec/mutation.rs
+++ b/crates/omnigraph/src/exec/mutation.rs
@ -601,13 +601,51 @@ use super::staging::{MutationStaging, PendingMode};
 /// away once Lance exposes a two-phase delete API
 /// ([lance-format/lance#6658](https://github.com/lance-format/lance/issues/6658))
 /// and we can stage deletes on the same path as inserts/updates.
+impl Omnigraph {
+    /// Resolve a LIVE-HEAD read handle for an edge table's committed-state `@card`
+    /// scan when collapse #1 skipped the accumulation open. The edge-insert path no
+    /// longer opens the edge dataset (non-strict op + txn), but cardinality is
+    /// validated ONCE (never rechecked at commit), so the scan must observe the
+    /// freshest committed edges — NOT the pinned `txn.base`. A concurrent writer can
+    /// commit edges to this table after `txn` capture; counting against the stale
+    /// base undercounts and lets a violating insert through (invariant 9). The table
+    /// LOCATION is read from the pinned entry (stable across versions); the dataset is
+    /// opened at live HEAD via `open_dataset_head_for_write` (a read here despite the
+    /// name — no lock/stage), restoring the pre-3b image (the mutation's own open).
+    /// The residual validate→commit race (a writer committing between this scan and
+    /// the end-of-query commit) is the §7.1 gap, closed by RFC-013 step 4.
+    async fn edge_cardinality_read_handle(
+        &self,
+        txn: Option<&crate::db::WriteTxn>,
+        table_key: &str,
+    ) -> Result<SnapshotHandle> {
+        let branch = txn.and_then(|t| t.branch.as_deref());
+        match txn.and_then(|t| t.base.entry(table_key)) {
+            Some(entry) => {
+                let full_path = self.storage().dataset_uri(&entry.table_path);
+                self.storage()
+                    .open_dataset_head_for_write(table_key, &full_path, branch)
+                    .await
+            }
+            // Unreachable today (the `None` handle only reaches here under a txn whose
+            // base contains the table). Defensive: resolve the table fresh (live)
+            // without the schema re-validation `snapshot_for_branch` would re-run.
+            None => {
+                let snapshot = self.fresh_snapshot_for_branch_unchecked(branch).await?;
+                self.storage().open_snapshot_at_table(&snapshot, table_key).await
+            }
+        }
+    }
+}
+
 async fn open_table_for_mutation(
    db: &Omnigraph,
    staging: &mut MutationStaging,
    branch: Option<&str>,
    table_key: &str,
    op_kind: crate::db::MutationOpKind,
-) -> Result<(SnapshotHandle, String, Option<String>)> {
+    txn: Option<&crate::db::WriteTxn>,
+) -> Result<(Option<SnapshotHandle>, String, Option<String>)> {
    if let Some(prior) = staging.inline_committed.get(table_key) {
        let path = staging.paths.get(table_key).ok_or_else(|| {
            OmniError::manifest_internal(format!(
@ -615,6 +653,10 @@ async fn open_table_for_mutation(
                table_key
            ))
        })?;
+        // The inline-committed reopen does NOT validate the schema contract
+        // (it reopens at the post-inline-commit Lance version directly), so it
+        // takes no `txn` — threading it here would change nothing. Deletes are
+        // strict ops, so this always opens (returns `Some`).
        let ds = db
            .reopen_for_mutation(
                table_key,
@ -624,20 +666,32 @@ async fn open_table_for_mutation(
                op_kind,
            )
            .await?;
-        return Ok((ds, path.full_path.clone(), path.table_branch.clone()));
+        return Ok((Some(ds), path.full_path.clone(), path.table_branch.clone()));
    }
-    let (ds, full_path, table_branch) = db
-        .open_for_mutation_on_branch(branch, table_key, op_kind)
+    // `open_for_mutation_on_branch` returns the expected version even when it
+    // skips the open (collapse #1, the non-strict insert/merge path): the version
+    // is the pinned base's, identical to the opened handle's `.version()`. Use it
+    // directly for `ensure_path` so the no-open path still captures the publisher
+    // CAS fence.
+    let opened = db
+        .open_for_mutation_on_branch(branch, table_key, op_kind, txn)
        .await?;
-    let expected_version = ds.version();
+    // Pin the open-skip contract (collapse #1): a missing handle is legal ONLY on
+    // the non-strict `txn` path. A future change that returns `None` elsewhere
+    // (e.g. a new strict arm) trips this in debug builds rather than silently
+    // handing a `None` to a `require_handle` consumer.
+    debug_assert!(
+        opened.handle.is_some() || (txn.is_some() && !op_kind.strict_pre_stage_version_check()),
+        "open_for_mutation_on_branch returned no handle outside the non-strict txn open-skip path",
+    );
    staging.ensure_path(
        table_key,
-        full_path.clone(),
-        table_branch.clone(),
-        expected_version,
+        opened.full_path.clone(),
+        opened.table_branch.clone(),
+        opened.expected_version,
        op_kind,
    );
-    Ok((ds, full_path, table_branch))
+    Ok((opened.handle, opened.full_path, opened.table_branch))
 }

 /// D₂ parse-time check: a single mutation query is either insert/update-only
@ -720,14 +774,14 @@ impl Omnigraph {
        params: &ParamMap,
        actor_id: Option<&str>,
    ) -> Result<MutationResult> {
-        self.ensure_schema_state_valid().await?;
        // Converge any pending recovery sidecar (a previously failed
        // writer's Phase B → Phase C residual) before executing: the
        // inline delete path advances Lance HEAD during execution and
        // the staged path's commit-time drift guard refuses
        // sidecar-covered drift, so a long-lived handle must heal here
        // — not at restart. One `list_dir` when no sidecars exist (the
-        // steady state).
+        // steady state). MUST run before `open_write_txn` below — the heal
+        // may advance the manifest, so the pinned base must be captured after.
        self.heal_pending_recovery_sidecars().await?;
        let requested = Self::normalize_branch_name(branch)?;
        // Reject internal `__run__*` / system-prefixed branches at the
@ -737,6 +791,16 @@ impl Omnigraph {
        if let Some(name) = requested.as_deref() {
            crate::db::ensure_public_branch_ref(name, "mutate")?;
        }
+        // Capture-once write transaction (RFC-013 step 3b). `open_write_txn`
+        // validates the schema contract ONCE (it resolves the branch target,
+        // whose first line is `ensure_schema_state_valid`) and pins the base
+        // snapshot for this write. Threaded as `Some(&txn)` through execution,
+        // staging commit, and the manifest publish so the per-table opens and
+        // the commit-time OCC re-read reuse the pinned base instead of
+        // re-validating the contract at every resolve point. Captured AFTER the
+        // recovery heal (which may advance the manifest) and AFTER `requested`
+        // is known so it pins the post-heal snapshot for the correct branch.
+        let txn = self.open_write_txn(requested.as_deref()).await?;
        let resolved_params = enrich_mutation_params(params)?;

        // Per-query staging accumulator. Inserts and updates push batches
@ -785,7 +849,13 @@ impl Omnigraph {
        };

        let exec_result = self
-            .execute_named_mutation(&ir, &resolved_params, requested.as_deref(), &mut staging)
+            .execute_named_mutation(
+                &ir,
+                &resolved_params,
+                requested.as_deref(),
+                &mut staging,
+                Some(&txn),
+            )
            .await;

        match exec_result {
@ -799,13 +869,20 @@ impl Omnigraph {
                // interleave between our commit_staged and our publish
                // (which would correctly fail our CAS but leave Lance
                // HEAD advanced — the residual class MR-870 recovers).
-                let (updates, expected_versions, sidecar_handle, _queue_guards) = staged
+                let super::staging::CommittedMutation {
+                    updates,
+                    expected_versions,
+                    sidecar_handle,
+                    guards: _queue_guards,
+                    committed_handles,
+                } = staged
                    .commit_all(
                        self,
                        requested.as_deref(),
                        crate::db::manifest::SidecarKind::Mutation,
                        actor_id,
                        fork_queue_guards,
+                        Some(&txn),
                    )
                    .await?;
                // Failpoint that wedges the documented finalize→publisher
@ -818,12 +895,14 @@ impl Omnigraph {
                // across this failure so the next `Omnigraph::open`'s
                // recovery sweep can roll forward — see
                // `tests/failpoints.rs::recovery_rolls_forward_after_finalize_publisher_failure`.
-                crate::failpoints::maybe_fail("mutation.post_finalize_pre_publisher")?;
+                crate::failpoints::maybe_fail(crate::failpoints::names::MUTATION_POST_FINALIZE_PRE_PUBLISHER)?;
                self.commit_updates_on_branch_with_expected(
                    requested.as_deref(),
                    &updates,
                    &expected_versions,
                    actor_id,
+                    Some(&txn),
+                    committed_handles,
                )
                .await?;
                // Phase C succeeded — sidecar can be deleted. If this
@ -938,6 +1017,7 @@ impl Omnigraph {
        params: &ParamMap,
        branch: Option<&str>,
        staging: &mut MutationStaging,
+        txn: Option<&crate::db::WriteTxn>,
    ) -> Result<MutationResult> {
        let mut total = MutationResult::default();
        for op in &ir.ops {
@ -946,7 +1026,7 @@ impl Omnigraph {
                    type_name,
                    assignments,
                } => {
-                    self.execute_insert(type_name, assignments, params, branch, staging)
+                    self.execute_insert(type_name, assignments, params, branch, staging, txn)
                        .await?
                }
                MutationOpIR::Update {
@ -954,14 +1034,16 @@ impl Omnigraph {
                    assignments,
                    predicate,
                } => {
-                    self.execute_update(type_name, assignments, predicate, params, branch, staging)
-                        .await?
+                    self.execute_update(
+                        type_name, assignments, predicate, params, branch, staging, txn,
+                    )
+                    .await?
                }
                MutationOpIR::Delete {
                    type_name,
                    predicate,
                } => {
-                    self.execute_delete(type_name, predicate, params, branch, staging)
+                    self.execute_delete(type_name, predicate, params, branch, staging, txn)
                        .await?
                }
            };
@ -978,6 +1060,7 @@ impl Omnigraph {
        params: &ParamMap,
        branch: Option<&str>,
        staging: &mut MutationStaging,
+        txn: Option<&crate::db::WriteTxn>,
    ) -> Result<MutationResult> {
        let mut resolved: HashMap<String, Literal> = HashMap::new();
        for a in assignments {
@ -1025,8 +1108,12 @@ impl Omnigraph {
            } else {
                crate::db::MutationOpKind::Insert
            };
+            // Node inserts are non-strict (Insert/Merge), so with a `WriteTxn`
+            // this opens NOTHING (collapse #1) — the handle is discarded anyway;
+            // only `ensure_path`'s captured version (read inside
+            // `open_table_for_mutation`) is used downstream.
            let (_ds, _full_path, _table_branch) =
-                open_table_for_mutation(self, staging, branch, &table_key, insert_kind).await?;
+                open_table_for_mutation(self, staging, branch, &table_key, insert_kind, txn).await?;
            // Accumulate. @key inserts go into the Merge stream (so a
            // later update on the same id coalesces correctly); no-key
            // inserts go into the Append stream.
@ -1059,13 +1146,16 @@ impl Omnigraph {
                )?;
            }
            let table_key = format!("edge:{}", type_name);
-            // Capture pre-write metadata on first touch (no Lance write).
-            let (ds, _full_path, _table_branch) = open_table_for_mutation(
+            // Capture pre-write metadata on first touch. Edge inserts are
+            // non-strict, so with a `WriteTxn` this opens NOTHING (collapse #1)
+            // and returns `None`.
+            let (handle, _full_path, _table_branch) = open_table_for_mutation(
                self,
                staging,
                branch,
                &table_key,
                crate::db::MutationOpKind::Insert,
+                txn,
            )
            .await?;
            // Accumulate the new edge row. Edge IDs are ULID-generated so
@ -1075,9 +1165,27 @@ impl Omnigraph {
            // Edge cardinality validation: scan committed edges via Lance
            // + iterate pending edges in-memory for the `src` column,
            // group-by-src. The pending side already includes the row
-            // we just appended (above).
-            validate_edge_cardinality_with_pending(self, &ds, staging, &table_key, edge_type)
+            // we just appended (above). When the open was skipped (collapse
+            // #1), resolve a read handle for the committed scan at LIVE HEAD
+            // (`edge_cardinality_read_handle`, #298) — NOT the pinned txn.base,
+            // which would undercount edges a concurrent writer committed since
+            // capture. Only when cardinality is non-default, so the common
+            // default-cardinality edge keeps the open-free path. (The residual
+            // validate→commit race is the §7.1 gap — step 4.)
+            if !edge_type.cardinality.is_default() {
+                let committed_ds = match handle {
+                    Some(h) => h,
+                    None => self.edge_cardinality_read_handle(txn, &table_key).await?,
+                };
+                validate_edge_cardinality_with_pending(
+                    self,
+                    &committed_ds,
+                    staging,
+                    &table_key,
+                    edge_type,
+                )
                .await?;
+            }

            self.invalidate_graph_index().await;

@ -1098,6 +1206,7 @@ impl Omnigraph {
        params: &ParamMap,
        branch: Option<&str>,
        staging: &mut MutationStaging,
+        txn: Option<&crate::db::WriteTxn>,
    ) -> Result<MutationResult> {
        // Defense in depth: ensure this is a node type
        if !self.catalog().node_types.contains_key(type_name) {
@ -1122,14 +1231,18 @@ impl Omnigraph {
        let blob_props = self.catalog().node_types[type_name].blob_properties.clone();

        let table_key = format!("node:{}", type_name);
-        let (ds, _full_path, _table_branch) = open_table_for_mutation(
+        let (handle, _full_path, _table_branch) = open_table_for_mutation(
            self,
            staging,
            branch,
            &table_key,
            crate::db::MutationOpKind::Update,
+            txn,
        )
        .await?;
+        // Update is a STRICT op, so collapse #1 never skips its open — the
+        // handle is always `Some` (and it's needed for the committed scan below).
+        let ds = handle.expect("strict Update op always opens its dataset");

        // Scan committed via Lance + apply the same predicate to pending
        // batches via DataFusion `MemTable` (read-your-writes for prior
@ -1228,13 +1341,14 @@ impl Omnigraph {
        params: &ParamMap,
        branch: Option<&str>,
        staging: &mut MutationStaging,
+        txn: Option<&crate::db::WriteTxn>,
    ) -> Result<MutationResult> {
        let is_node = self.catalog().node_types.contains_key(type_name);
        if is_node {
-            self.execute_delete_node(type_name, predicate, params, branch, staging)
+            self.execute_delete_node(type_name, predicate, params, branch, staging, txn)
                .await
        } else {
-            self.execute_delete_edge(type_name, predicate, params, branch, staging)
+            self.execute_delete_edge(type_name, predicate, params, branch, staging, txn)
                .await
        }
    }
@ -1246,18 +1360,22 @@ impl Omnigraph {
        params: &ParamMap,
        branch: Option<&str>,
        staging: &mut MutationStaging,
+        txn: Option<&crate::db::WriteTxn>,
    ) -> Result<MutationResult> {
        let pred_sql = predicate_to_sql(predicate, params, false)?;

        let table_key = format!("node:{}", type_name);
-        let (ds, full_path, table_branch) = open_table_for_mutation(
+        let (handle, full_path, table_branch) = open_table_for_mutation(
            self,
            staging,
            branch,
            &table_key,
            crate::db::MutationOpKind::Delete,
+            txn,
        )
        .await?;
+        // Delete is a STRICT op, so collapse #1 never skips its open.
+        let ds = handle.expect("strict Delete op always opens its dataset");
        let initial_version = ds.version();

        // Scan matching IDs for cascade. Per D₂ this never overlaps with
@ -1305,7 +1423,7 @@ impl Omnigraph {
                crate::db::MutationOpKind::Delete,
            )
            .await?;
-        crate::failpoints::maybe_fail("mutation.delete_node_pre_primary_delete")?;
+        crate::failpoints::maybe_fail(crate::failpoints::names::MUTATION_DELETE_NODE_PRE_PRIMARY_DELETE)?;
        let (_new_ds, delete_state) = self
            .storage_inline_residual()
            .delete_where(&full_path, ds, &pred_sql)
@ -1347,14 +1465,17 @@ impl Omnigraph {

            let edge_table_key = format!("edge:{}", edge_name);
            let cascade_filter = cascade_filters.join(" OR ");
-            let (edge_ds, edge_full_path, edge_table_branch) = open_table_for_mutation(
+            let (edge_handle, edge_full_path, edge_table_branch) = open_table_for_mutation(
                self,
                staging,
                branch,
                &edge_table_key,
                crate::db::MutationOpKind::Delete,
+                txn,
            )
            .await?;
+            // Delete is a STRICT op, so collapse #1 never skips its open.
+            let edge_ds = edge_handle.expect("strict Delete op always opens its dataset");

            let (_new_edge_ds, edge_delete) = self
                .storage_inline_residual()
@ -1391,18 +1512,22 @@ impl Omnigraph {
        params: &ParamMap,
        branch: Option<&str>,
        staging: &mut MutationStaging,
+        txn: Option<&crate::db::WriteTxn>,
    ) -> Result<MutationResult> {
        let pred_sql = predicate_to_sql(predicate, params, true)?;

        let table_key = format!("edge:{}", type_name);
-        let (ds, full_path, table_branch) = open_table_for_mutation(
+        let (handle, full_path, table_branch) = open_table_for_mutation(
            self,
            staging,
            branch,
            &table_key,
            crate::db::MutationOpKind::Delete,
+            txn,
        )
        .await?;
+        // Delete is a STRICT op, so collapse #1 never skips its open.
+        let ds = handle.expect("strict Delete op always opens its dataset");

        let (_new_ds, delete_state) = self
            .storage_inline_residual()
--- a/crates/omnigraph/src/exec/staging.rs
+++ b/crates/omnigraph/src/exec/staging.rs
@ -440,6 +440,26 @@ struct StagedTableEntry {
    staged_write: StagedHandle,
 }

+/// Output of [`StagedMutation::commit_all`] (Phase B): the publisher's input plus
+/// the queue guards the caller must hold across the manifest publish.
+pub(crate) struct CommittedMutation {
+    /// Per-table updates to publish to the manifest.
+    pub(crate) updates: Vec<SubTableUpdate>,
+    /// Per-table manifest pins refreshed under the write queue — the publisher's CAS fence.
+    pub(crate) expected_versions: HashMap<String, u64>,
+    /// Recovery sidecar to delete after Phase C succeeds (`None` when nothing staged).
+    pub(crate) sidecar_handle: Option<RecoverySidecarHandle>,
+    /// Per-`(table, branch)` write-queue guards — the caller MUST hold these across
+    /// the manifest publish (see `commit_all`) so no writer interleaves between
+    /// `commit_staged` and the publish.
+    pub(crate) guards: Vec<tokio::sync::OwnedMutexGuard<()>>,
+    /// Post-`commit_staged` handle per STAGED table (table_key → handle at the
+    /// just-committed version). Carried out (RFC-013 step 3b, collapse #4) so the
+    /// publish-prepare index build reuses it instead of a fresh `reopen_for_mutation`
+    /// at the same version. Inline-committed / delete tables are absent (no staged handle).
+    pub(crate) committed_handles: HashMap<String, SnapshotHandle>,
+}
+
 impl StagedMutation {
    /// **Phase B** of the two-phase commit: acquire per-`(table_key,
    /// branch)` queues, revalidate manifest pins, write the recovery
@ -485,12 +505,8 @@ impl StagedMutation {
            Vec<(String, Option<String>)>,
            Vec<tokio::sync::OwnedMutexGuard<()>>,
        )>,
-    ) -> Result<(
-        Vec<SubTableUpdate>,
-        HashMap<String, u64>,
-        Option<RecoverySidecarHandle>,
-        Vec<tokio::sync::OwnedMutexGuard<()>>,
-    )> {
+        txn: Option<&crate::db::WriteTxn>,
+    ) -> Result<CommittedMutation> {
        let StagedMutation {
            inline_committed,
            mut staged,
@ -585,7 +601,18 @@ impl StagedMutation {
        // Multi-coordinator deployments (§VI.27 aspirational) get
        // genuine cross-process drift detection from this read for
        // free.
-        let snapshot = db.fresh_snapshot_for_branch(branch).await?;
+        //
+        // This MUST be a FRESH per-branch manifest read (never the warm
+        // cache) for the OCC re-capture below — but with a `WriteTxn` the
+        // schema contract was already validated at capture, so use the
+        // `_unchecked` variant, which drops the redundant
+        // `ensure_schema_state_valid` AND the commit-graph load the OCC read
+        // never consults (a fresh manifest read yields the same `Snapshot`).
+        // Without a txn this is byte-identical to the prior checked call.
+        let snapshot = match txn {
+            Some(_) => db.fresh_snapshot_for_branch_unchecked(branch).await?,
+            None => db.fresh_snapshot_for_branch(branch).await?,
+        };
        for entry in staged.iter_mut() {
            let current = snapshot
                .entry(&entry.table_key)
@ -619,15 +646,20 @@ impl StagedMutation {
            // live Lance HEAD still equals that manifest pin. If an external
            // raw Lance write or a pre-fix maintenance path moved HEAD without
            // publishing `__manifest`, this write must not silently fold it.
-            let head = db
-                .storage()
-                .open_dataset_head_for_write(
-                    &entry.table_key,
-                    &entry.path.full_path,
-                    entry.path.table_branch.as_deref(),
-                )
-                .await?
-                .version();
+            //
+            // `latest_version_id` reads the latest manifest pointer off the
+            // already-open staged handle (the #2 staging open) WITHOUT a fresh
+            // `Dataset::open` — the same cheap live-HEAD probe
+            // `ManifestCoordinator::probe_latest_version` uses. This replaces a
+            // redundant `open_dataset_head_for_write` (RFC-013 step 3b, collapse
+            // #3): the drift comparison below is byte-identical; only how `head`
+            // is obtained changes (probe vs cold open).
+            let head = entry
+                .dataset
+                .dataset()
+                .latest_version_id()
+                .await
+                .map_err(|e| OmniError::Lance(e.to_string()))?;
            if head < current {
                return Err(OmniError::manifest_internal(format!(
                    "table '{}' Lance HEAD version {} is behind manifest version {}",
@ -786,6 +818,12 @@ impl StagedMutation {

        let mut updates: Vec<SubTableUpdate> = inline_committed.into_values().collect();

+        // Carry each staged table's post-`commit_staged` handle out so the
+        // publish-prepare index build reuses it (collapse #4) instead of
+        // re-opening the dataset at the same just-committed version.
+        let mut committed_handles: HashMap<String, SnapshotHandle> =
+            HashMap::with_capacity(staged.len());
+
        for entry in staged {
            let StagedTableEntry {
                table_key,
@ -798,15 +836,22 @@ impl StagedMutation {
            let new_ds = db.storage().commit_staged(dataset, staged_write).await?;
            let state = db.storage().table_state(&path.full_path, &new_ds).await?;
            updates.push(SubTableUpdate {
-                table_key,
+                table_key: table_key.clone(),
                table_version: state.version,
                table_branch: path.table_branch.clone(),
                row_count: state.row_count,
                version_metadata: state.version_metadata,
            });
+            committed_handles.insert(table_key, new_ds);
        }

-        Ok((updates, expected_versions, sidecar_handle, guards))
+        Ok(CommittedMutation {
+            updates,
+            expected_versions,
+            sidecar_handle,
+            guards,
+            committed_handles,
+        })
    }
 }

--- a/crates/omnigraph/src/failpoints.rs
+++ b/crates/omnigraph/src/failpoints.rs
@ -14,6 +14,115 @@ pub(crate) fn maybe_fail(_name: &str) -> Result<()> {
    Ok(())
 }

+/// Failpoint that injects a *Lance* error rather than an `OmniError`. Used to
+/// stand in for a `Dataset::open` failing with a transient/corrupt (non-not-found)
+/// error, so a test can drive the caller's lance-error classification — the
+/// behavior FIX A (`read_legacy_commit_cache`) relies on: a not-found is benign
+/// (empty), anything else propagates. A no-op without the `failpoints` feature
+/// (the injected variant is therefore unreachable in release builds).
+#[allow(unused_variables)]
+pub(crate) fn maybe_fail_lance_open(name: &str) -> std::result::Result<(), lance::Error> {
+    #[cfg(feature = "failpoints")]
+    {
+        fail::fail_point!(name, |_| {
+            Err(lance::Error::io(format!(
+                "injected failpoint triggered: {name}"
+            )))
+        });
+    }
+    Ok(())
+}
+
+/// Failpoint that injects a Lance `IncompatibleTransaction` — the variant a
+/// concurrent `UpdateConfig` stamp race produces. Lets a test drive the v3→v4
+/// stamp loop's exhaustion path (`commit_v4_stamp_idempotently`) deterministically;
+/// it is otherwise near-unreachable, since a real concurrent winner stamps the SAME
+/// value, so the loop's re-read returns `Ok` on the first retry. A no-op without the
+/// `failpoints` feature.
+#[allow(unused_variables)]
+pub(crate) fn maybe_fail_lance_incompatible(name: &str) -> std::result::Result<(), lance::Error> {
+    #[cfg(feature = "failpoints")]
+    {
+        fail::fail_point!(name, |_| {
+            Err(lance::Error::incompatible_transaction_source(
+                format!("injected failpoint triggered: {name}").into(),
+            ))
+        });
+    }
+    Ok(())
+}
+
+/// Failpoint that injects a *retryable* `RowLevelCasContention` `OmniError` — the
+/// typed conflict the manifest publisher's outer retry treats as retryable
+/// (`is_retryable_publish_conflict`). Used to drive the publisher's
+/// retry-on-`load_publish_state`-error path deterministically: the v3→v4 migration
+/// surfaces this same type on exhaustion EXPECTING the publisher to re-run the
+/// load, a path otherwise reachable only under sustained multi-writer contention.
+/// A no-op without the `failpoints` feature.
+#[allow(unused_variables)]
+pub(crate) fn maybe_fail_retryable_contention(name: &str) -> Result<()> {
+    #[cfg(feature = "failpoints")]
+    {
+        fail::fail_point!(name, |_| {
+            return Err(crate::error::OmniError::manifest_row_level_cas_contention(
+                format!("injected retryable contention failpoint: {name}"),
+            ));
+        });
+    }
+    Ok(())
+}
+
+/// Compile-checked catalog of every failpoint name in this crate. Call sites
+/// (`maybe_fail`) and tests (`ScopedFailPoint` / the test rendezvous helper)
+/// reference these constants instead of bare string literals, so a typo is a
+/// compile error rather than a silently-never-firing failpoint.
+pub mod names {
+    pub const BRANCH_CREATE_AFTER_MANIFEST_BRANCH_CREATE: &str = "branch_create.after_manifest_branch_create";
+    pub const BRANCH_DELETE_BEFORE_COMMIT_GRAPH_RECLAIM: &str = "branch_delete.before_commit_graph_reclaim";
+    pub const BRANCH_DELETE_BEFORE_TABLE_CLEANUP: &str = "branch_delete.before_table_cleanup";
+    pub const BRANCH_MERGE_ADOPT_AFTER_APPEND_PRE_UPSERT: &str = "branch_merge.adopt_after_append_pre_upsert";
+    pub const BRANCH_MERGE_ADOPT_AFTER_UPSERT_PRE_DELETE: &str = "branch_merge.adopt_after_upsert_pre_delete";
+    pub const BRANCH_MERGE_POST_PHASE_B_PRE_MANIFEST_COMMIT: &str = "branch_merge.post_phase_b_pre_manifest_commit";
+    pub const BRANCH_MERGE_REWRITE_AFTER_DELETE_PRE_INDEX: &str = "branch_merge.rewrite_after_delete_pre_index";
+    pub const BRANCH_MERGE_REWRITE_AFTER_MERGE_PRE_DELETE: &str = "branch_merge.rewrite_after_merge_pre_delete";
+    pub const CLASSIFY_FRESH_READ: &str = "classify.fresh_read";
+    pub const CLEANUP_RECONCILE_FORK: &str = "cleanup.reconcile_fork";
+    pub const CLEANUP_RESOLVE_BRANCH_SNAPSHOT: &str = "cleanup.resolve_branch_snapshot";
+    pub const CLEANUP_TABLE_GC: &str = "cleanup.table_gc";
+    pub const ENSURE_INDICES_POST_PHASE_B_PRE_MANIFEST_COMMIT: &str = "ensure_indices.post_phase_b_pre_manifest_commit";
+    pub const ENSURE_INDICES_POST_STAGE_PRE_COMMIT_BTREE: &str = "ensure_indices.post_stage_pre_commit_btree";
+    pub const FORK_BEFORE_CLASSIFY: &str = "fork.before_classify";
+    pub const FORK_BEFORE_RECLAIM: &str = "fork.before_reclaim";
+    pub const GRAPH_PUBLISH_AFTER_MANIFEST_COMMIT: &str = "graph_publish.after_manifest_commit";
+    pub const GRAPH_PUBLISH_BEFORE_COMMIT_APPEND: &str = "graph_publish.before_commit_append";
+    pub const INIT_AFTER_COORDINATOR_INIT: &str = "init.after_coordinator_init";
+    pub const INIT_AFTER_SCHEMA_CONTRACT_WRITTEN: &str = "init.after_schema_contract_written";
+    pub const INIT_AFTER_SCHEMA_PG_WRITTEN: &str = "init.after_schema_pg_written";
+    pub const MUTATION_DELETE_NODE_PRE_PRIMARY_DELETE: &str = "mutation.delete_node_pre_primary_delete";
+    pub const MUTATION_POST_FINALIZE_PRE_PUBLISHER: &str = "mutation.post_finalize_pre_publisher";
+    pub const OPTIMIZE_BEFORE_COMPACT: &str = "optimize.before_compact";
+    pub const OPTIMIZE_INJECT_REINDEX_CONFLICT: &str = "optimize.inject_reindex_conflict";
+    pub const OPTIMIZE_POST_PHASE_B_PRE_MANIFEST_COMMIT: &str = "optimize.post_phase_b_pre_manifest_commit";
+    pub const RECOVERY_BEFORE_ROLL_FORWARD_PUBLISH: &str = "recovery.before_roll_forward_publish";
+    pub const RECOVERY_ORPHAN_DISCARD_AUDIT_APPEND: &str = "recovery.orphan_discard_audit_append";
+    pub const RECOVERY_RECORD_AUDIT: &str = "recovery.record_audit";
+    pub const RECOVERY_SIDECAR_CONFIRM: &str = "recovery.sidecar_confirm";
+    pub const RECOVERY_SIDECAR_DELETE: &str = "recovery.sidecar_delete";
+    pub const RECOVERY_SIDECAR_LIST: &str = "recovery.sidecar_list";
+    pub const RECOVERY_SIDECAR_WRITE: &str = "recovery.sidecar_write";
+    pub const SCHEMA_APPLY_AFTER_MANIFEST_COMMIT: &str = "schema_apply.after_manifest_commit";
+    pub const SCHEMA_APPLY_AFTER_STAGING_WRITE: &str = "schema_apply.after_staging_write";
+    pub const SCHEMA_APPLY_BEFORE_STAGING_WRITE: &str = "schema_apply.before_staging_write";
+    // RFC-013 Phase 7 migration failpoints (this branch).
+    pub const MIGRATION_V3_TO_V4_LEGACY_OPEN: &str = "migration.v3_to_v4.legacy_open";
+    pub const MIGRATION_V4_STAMP_FORCE_INCOMPATIBLE: &str = "migration.v4_stamp.force_incompatible";
+    /// Injects a retryable `RowLevelCasContention` from `load_publish_state` so a
+    /// test can prove the publisher's outer retry re-runs the load (the migration
+    /// surfaces this same typed error on exhaustion).
+    pub const PUBLISH_LOAD_STATE_RETRYABLE_CONTENTION: &str =
+        "publish.load_state_retryable_contention";
+}
+
 #[cfg(feature = "failpoints")]
 pub struct ScopedFailPoint {
    name: String,
@ -27,6 +136,20 @@ impl ScopedFailPoint {
            name: name.to_string(),
        }
    }
+
+    /// Register a callback failpoint with the same Drop-based cleanup as
+    /// `new`. Without the guard, a panic while the point is active would
+    /// leak the callback into the process-global registry and fire it under
+    /// later tests in the same binary.
+    pub fn with_callback<F>(name: &str, callback: F) -> Self
+    where
+        F: Fn() + Send + Sync + 'static,
+    {
+        fail::cfg_callback(name, callback).expect("configure callback failpoint");
+        Self {
+            name: name.to_string(),
+        }
+    }
 }

 #[cfg(feature = "failpoints")]
--- a/crates/omnigraph/src/instrumentation.rs
+++ b/crates/omnigraph/src/instrumentation.rs
@ -43,6 +43,23 @@ pub struct QueryIoProbes {
    /// handle cache (Fix 3) serves them.
    pub table_wrapper: Option<Arc<dyn WrappingObjectStore>>,
    pub probe_count: Arc<AtomicU64>,
+    /// Counts DATA-table open CALLS through the two instrumented chokepoints
+    /// (`open_dataset_tracked` / `open_table_dataset`), classified by URI so the
+    /// internal/system tables (`__manifest`, `_graph_commits*`) are EXCLUDED — the
+    /// publisher CAS and commit-graph append open those every write, and counting
+    /// them would make the `data_open_count <= |touched_tables|` write gate
+    /// (RFC-013 step 3b) unreachable by threading alone. Unlike the opener-read
+    /// term (which mixes with the merge-insert/RI scan on the write path), this is
+    /// an exact open-invocation count. `forbidden_apis` keeps engine code OUTSIDE the
+    /// storage layer (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) from opening
+    /// datasets except through these chokepoints, so the count is complete for the
+    /// keyed-write data path the gate measures. (`table_store.rs` is allow-listed and
+    /// does hold direct `Dataset::open`s — but only for branch-management ops
+    /// (`delete_branch`/`list_branches`/`force_delete_branch`), never that hot path.)
+    pub data_open_count: Arc<AtomicU64>,
+    /// Internal/system-table (`__manifest`, `_graph_commits*`) open CALLS — the
+    /// complement of `data_open_count`, kept for symmetry and debugging.
+    pub internal_open_count: Arc<AtomicU64>,
 }

 tokio::task_local! {
@ -80,6 +97,39 @@ pub(crate) fn record_probe() {
    let _ = current(|p| p.probe_count.fetch_add(1, Ordering::Relaxed));
 }

+/// Internal/system table directory names. An open of one of these is a metadata
+/// open (publisher CAS, commit-graph append, recovery audit), NOT a data-table
+/// open. Kept in sync with the dir constants in `db/manifest/layout.rs`,
+/// `db/commit_graph.rs`, and `db/recovery_audit.rs`.
+const INTERNAL_TABLE_DIRS: [&str; 4] = [
+    "__manifest",
+    "_graph_commits.lance",
+    "_graph_commit_actors.lance",
+    "_graph_commit_recoveries.lance",
+];
+
+/// True when `uri`'s last path segment names an internal/system table.
+fn open_is_internal(uri: &str) -> bool {
+    let trimmed = uri.trim_end_matches('/');
+    let last = trimmed.rsplit('/').next().unwrap_or(trimmed);
+    INTERNAL_TABLE_DIRS.contains(&last)
+}
+
+/// Record one table-open call against the active per-query probes, classified by
+/// table class (the URI's last segment) so the write gate counts DATA-table opens
+/// only and ignores the publisher/commit-graph metadata opens. No-op in production
+/// (the classification runs only inside the probe closure, which `current` skips
+/// when no probes are installed). Called at both open chokepoints.
+pub(crate) fn record_open(uri: &str) {
+    let _ = current(|p| {
+        if open_is_internal(uri) {
+            p.internal_open_count.fetch_add(1, Ordering::Relaxed);
+        } else {
+            p.data_open_count.fetch_add(1, Ordering::Relaxed);
+        }
+    });
+}
+
 /// Per-operation staged-write counts, installed for a task via
 /// [`with_merge_write_probes`]. Lets a cost-budget test assert WHICH staged-write
 /// primitive an operation invokes — e.g. that an append-only fast-forward merge
@ -177,6 +227,7 @@ pub(crate) async fn open_dataset_tracked(
    uri: &str,
    wrapper: Option<Arc<dyn WrappingObjectStore>>,
 ) -> Result<Dataset> {
+    record_open(uri);
    let result = match wrapper {
        None => Dataset::open(uri).await,
        Some(wrapper) => {
@ -203,6 +254,7 @@ pub(crate) async fn open_table_dataset(
    version: u64,
    session: Option<&Arc<lance::session::Session>>,
 ) -> Result<Dataset> {
+    record_open(location);
    let mut builder = DatasetBuilder::from_uri(location).with_version(version);
    if let Some(session) = session {
        builder = builder.with_session(session.clone());
--- a/crates/omnigraph/src/loader/mod.rs
+++ b/crates/omnigraph/src/loader/mod.rs
@ -187,7 +187,10 @@ impl Omnigraph {
            &omnigraph_policy::ResourceScope::Branch(branch.to_string()),
            actor_id,
        )?;
-        self.ensure_schema_state_valid().await?;
+        // Schema-contract validation is captured ONCE per write via the
+        // `WriteTxn` opened in `load_jsonl_reader` (after branch resolution).
+        // The redundant `ensure_schema_state_valid` that used to run here is
+        // subsumed by `open_write_txn`'s `resolved_branch_target` call.
        // Converge any pending recovery sidecar (a previously failed
        // writer's Phase B → Phase C residual) before staging anything:
        // without this, sidecar-covered drift wedges every load on the
@ -397,7 +400,16 @@ async fn load_jsonl_reader<R: BufRead>(
    // inline path.

    let mut result = LoadResult::default();
-    let snapshot = db.snapshot_for_branch(branch).await?;
+    // Capture-once write transaction (RFC-013 step 3b). `open_write_txn`
+    // validates the schema contract ONCE and pins the base snapshot. Threaded
+    // as `Some(&txn)` through the per-table opens and the manifest publish so
+    // each resolve point reuses the pinned base instead of re-validating the
+    // contract. The branch already exists here (fork-if-missing ran in
+    // `load_as` before this), so this captures the post-fork snapshot. The
+    // load's own base read (`db.snapshot_for_branch` previously) is the same
+    // per-branch snapshot, so reuse `txn.base` for it — dropping a validation.
+    let txn = db.open_write_txn(branch).await?;
+    let snapshot = txn.base.clone();
    let mut staging = MutationStaging::default();
    let pending_mode = match mode {
        LoadMode::Merge => PendingMode::Merge,
@ -481,15 +493,18 @@ async fn load_jsonl_reader<R: BufRead>(
    // Phase 2b: accumulate every node type in memory. Fragment writes are
    // delayed until after all validation succeeds.
    for (type_name, table_key, batch, loaded_count) in prepared_nodes {
-        let (ds, full_path, table_branch) = db
-            .open_for_mutation_on_branch(branch, &table_key, load_op_kind)
+        // The loader only needs the captured expected version (the publisher's
+        // CAS fence) for `ensure_path` — it discards the handle. With a
+        // non-strict load op (Merge/Append) and a `WriteTxn`, collapse #1 skips
+        // the dataset open and returns the pinned base version directly.
+        let opened = db
+            .open_for_mutation_on_branch(branch, &table_key, load_op_kind, Some(&txn))
            .await?;
-        let expected_version = ds.version();
        staging.ensure_path(
            &table_key,
-            full_path,
-            table_branch,
-            expected_version,
+            opened.full_path,
+            opened.table_branch,
+            opened.expected_version,
            load_op_kind,
        );
        let schema = batch.schema();
@ -553,15 +568,16 @@ async fn load_jsonl_reader<R: BufRead>(

    // Phase 2e: accumulate every edge type. Same dispatch as Phase 2b.
    for (edge_name, table_key, batch, loaded_count) in prepared_edges {
-        let (ds, full_path, table_branch) = db
-            .open_for_mutation_on_branch(branch, &table_key, load_op_kind)
+        // Same as the node phase: only the captured expected version is used;
+        // collapse #1 skips the open for a non-strict load op under a `WriteTxn`.
+        let opened = db
+            .open_for_mutation_on_branch(branch, &table_key, load_op_kind, Some(&txn))
            .await?;
-        let expected_version = ds.version();
        staging.ensure_path(
            &table_key,
-            full_path,
-            table_branch,
-            expected_version,
+            opened.full_path,
+            opened.table_branch,
+            opened.expected_version,
            load_op_kind,
        );
        let schema = batch.schema();
@ -589,22 +605,36 @@ async fn load_jsonl_reader<R: BufRead>(
    // `_queue_guards` holds per-(table_key, branch) write queues
    // across the manifest publish below — see exec/mutation.rs for
    // the rationale (interleaving prevention).
-    let (updates, expected_versions, sidecar_handle, _queue_guards) = staged
+    let crate::exec::staging::CommittedMutation {
+        updates,
+        expected_versions,
+        sidecar_handle,
+        guards: _queue_guards,
+        committed_handles,
+    } = staged
        .commit_all(
            db,
            branch,
            crate::db::manifest::SidecarKind::Load,
            actor_id,
            fork_queue_guards,
+            Some(&txn),
        )
        .await?;
    // Same finalize → publisher residual as mutations: per-table
    // staged commits have advanced Lance HEAD, but the manifest
    // publish has not run yet. Reuse the mutation failpoint name so
    // one failpoint pins the shared `MutationStaging` boundary.
-    crate::failpoints::maybe_fail("mutation.post_finalize_pre_publisher")?;
-    db.commit_updates_on_branch_with_expected(branch, &updates, &expected_versions, actor_id)
-        .await?;
+    crate::failpoints::maybe_fail(crate::failpoints::names::MUTATION_POST_FINALIZE_PRE_PUBLISHER)?;
+    db.commit_updates_on_branch_with_expected(
+        branch,
+        &updates,
+        &expected_versions,
+        actor_id,
+        Some(&txn),
+        committed_handles,
+    )
+    .await?;
    // The recovery sidecar protects the per-table commit_staged →
    // manifest publish window. Phase C succeeded — clean up
    // best-effort: failing the user here would error out a write
@ -1548,80 +1578,14 @@ fn literal_value_to_f64(v: &omnigraph_compiler::catalog::LiteralValue) -> f64 {

 // ─── Edge cardinality validation ─────────────────────────────────────────────

-pub(crate) async fn validate_edge_cardinality(
-    db: &crate::db::Omnigraph,
-    branch: Option<&str>,
-    edge_name: &str,
-    written_version: u64,
-    written_branch: Option<&str>,
-) -> Result<()> {
-    use arrow_array::Array;
-    let catalog = db.catalog();
-    let edge_type = &catalog.edge_types[edge_name];
-    if edge_type.cardinality.is_default() {
-        return Ok(());
-    }
-
-    // Open edge sub-table at the just-written version, not the snapshot's
-    // (the snapshot still pins to the pre-write version).
-    let snapshot = db.snapshot_for_branch(branch).await?;
-    let table_key = format!("edge:{}", edge_name);
-    let entry = snapshot
-        .entry(&table_key)
-        .ok_or_else(|| OmniError::manifest(format!("no manifest entry for {}", table_key)))?;
-    let ds = db
-        .open_dataset_at_state(
-            &entry.table_path,
-            written_branch.or(entry.table_branch.as_deref()),
-            written_version,
-        )
-        .await?;
-
-    // Scan src column, count per source
-    let batches = db.storage().scan(&ds, Some(&["src"]), None, None).await?;
-
-    let mut counts: HashMap<String, u32> = HashMap::new();
-    for batch in &batches {
-        let srcs = batch
-            .column_by_name("src")
-            .unwrap()
-            .as_any()
-            .downcast_ref::<StringArray>()
-            .unwrap();
-        for i in 0..srcs.len() {
-            *counts.entry(srcs.value(i).to_string()).or_insert(0) += 1;
-        }
-    }
-
-    let card = &edge_type.cardinality;
-    for (src, count) in &counts {
-        if let Some(max) = card.max {
-            if *count > max {
-                return Err(OmniError::manifest(format!(
-                    "@card violation on edge {}: source '{}' has {} edges (max {})",
-                    edge_name, src, count, max
-                )));
-            }
-        }
-        if *count < card.min {
-            return Err(OmniError::manifest(format!(
-                "@card violation on edge {}: source '{}' has {} edges (min {})",
-                edge_name, src, count, card.min
-            )));
-        }
-    }
-
-    Ok(())
-}
-
 /// Validate edge `@card` cardinality with in-memory pending edges visible.
 ///
 /// Loader-level analog to `exec::mutation::validate_edge_cardinality_with_pending`:
 /// opens the committed dataset at the pre-load snapshot version, then
 /// delegates to the shared `count_src_per_edge` + `enforce_cardinality_bounds`
-/// helpers in `exec::staging`. Used by Append/Merge loads (the Overwrite
-/// path uses `validate_edge_cardinality` which opens the just-written
-/// Lance version).
+/// helpers in `exec::staging`. Used by every load mode; for `LoadMode::Overwrite`
+/// it treats the pending edge batches as the replacement table image (the
+/// committed rows are being replaced, so only the pending set is counted).
 ///
 /// `mode` controls dedup behavior. `LoadMode::Merge` passes `Some("id")`
 /// so committed edges that the load is *updating* (same edge id,
--- a/crates/omnigraph/src/table_store.rs
+++ b/crates/omnigraph/src/table_store.rs
@ -812,10 +812,12 @@ impl TableStore {
    /// Legacy inline-commit append: writes fragments AND commits in one
    /// call, advancing Lance HEAD as a side effect. Not on the
    /// `TableStorage` trait surface — the staged primitive `stage_append`
-    /// + `commit_staged` is the engine write path. This inherent
-    /// `pub(crate)` method survives only for recovery test setup. Do not
-    /// add new engine call sites — they re-introduce the multi-phase
-    /// commit drift the trait surface was designed to eliminate.
+    /// + `commit_staged` is the engine write path. This inherent method
+    /// survives only for in-source recovery test setup, so it is
+    /// `#[cfg(test)]`-gated: engine code physically cannot call it (which
+    /// enforces "no new call sites" by construction and silences the
+    /// dead-code warning the non-test lib build would otherwise emit).
+    #[cfg(test)]
    pub(crate) async fn append_batch(
        &self,
        dataset_uri: &str,
--- a/crates/omnigraph/tests/failpoint_names_guard.rs
+++ b/crates/omnigraph/tests/failpoint_names_guard.rs
@ -0,0 +1,96 @@
+//! Guard: failpoint names must come from the compile-checked `names` catalog
+//! (`omnigraph::failpoints::names` / `omnigraph_cluster::failpoints::names`),
+//! never bare string literals.
+//!
+//! The `names` consts give compile-time typo protection only if every call
+//! site uses them. A bare `maybe_fail("typo.literal")` still compiles (the
+//! arg is `&str`), so a typo there would silently never fire. This
+//! source-walk closes that gap by construction — the same defense-in-depth
+//! shape as `forbidden_apis.rs`. Add a new failpoint by adding its const to
+//! the catalog first; this guard then forces every call site to reference it.
+
+use std::path::{Path, PathBuf};
+
+/// Call-site prefixes whose first argument must be a `names::` constant. The
+/// check is whitespace/newline-tolerant (it skips past the open paren to the
+/// first non-whitespace token), so wrapping the call across lines cannot hide
+/// a literal — a per-line `contains` scan would miss
+/// `park_first(\n    "name",\n)`.
+const CALL_PREFIXES: &[&str] = &[
+    "maybe_fail(",
+    "ScopedFailPoint::new(",
+    "ScopedFailPoint::with_callback(",
+    "park_first(",
+];
+
+/// 1-based line number of `byte_off` within `contents`.
+fn line_of(contents: &str, byte_off: usize) -> usize {
+    contents[..byte_off].bytes().filter(|&b| b == b'\n').count() + 1
+}
+
+fn manifest_dir() -> PathBuf {
+    PathBuf::from(env!("CARGO_MANIFEST_DIR"))
+}
+
+/// Production call sites live under each crate's `src`; test call sites live
+/// in the two failpoint integration binaries. This guard file is deliberately
+/// not in the set (it names the patterns as literals itself).
+fn files_to_scan() -> Vec<PathBuf> {
+    let engine = manifest_dir();
+    let cluster = engine.join("../omnigraph-cluster");
+    let mut out = Vec::new();
+    collect_rs(&engine.join("src"), &mut out);
+    collect_rs(&cluster.join("src"), &mut out);
+    out.push(engine.join("tests/failpoints.rs"));
+    out.push(cluster.join("tests/failpoints.rs"));
+    out
+}
+
+fn collect_rs(dir: &Path, out: &mut Vec<PathBuf>) {
+    let Ok(entries) = std::fs::read_dir(dir) else {
+        return;
+    };
+    for entry in entries.flatten() {
+        let path = entry.path();
+        if path.is_dir() {
+            collect_rs(&path, out);
+        } else if path.extension().is_some_and(|e| e == "rs") {
+            out.push(path);
+        }
+    }
+}
+
+#[test]
+fn failpoint_names_use_the_compile_checked_catalog() {
+    let mut violations = Vec::new();
+    for file in files_to_scan() {
+        let Ok(contents) = std::fs::read_to_string(&file) else {
+            continue;
+        };
+        for prefix in CALL_PREFIXES {
+            let mut from = 0;
+            while let Some(rel) = contents[from..].find(prefix) {
+                let after_open = from + rel + prefix.len();
+                // Skip whitespace (incl. newlines) after the open paren. If the
+                // first argument token is a `"`, it's a literal failpoint name
+                // — across a line break or not.
+                if contents[after_open..].trim_start().starts_with('"') {
+                    violations.push(format!(
+                        "{}:{}: literal failpoint name at `{}` — use a `names::` const",
+                        file.display(),
+                        line_of(&contents, from + rel),
+                        prefix.trim_end_matches('('),
+                    ));
+                }
+                from = after_open;
+            }
+        }
+    }
+    assert!(
+        violations.is_empty(),
+        "failpoint names must reference the compile-checked \
+         `omnigraph::failpoints::names::*` (or `omnigraph_cluster::failpoints::names::*`) \
+         constants, not string literals — a literal typo would silently never fire:\n{}",
+        violations.join("\n")
+    );
+}
--- a/crates/omnigraph/tests/failpoints.rs
+++ b/crates/omnigraph/tests/failpoints.rs
--- a/crates/omnigraph/tests/helpers/cost.rs
+++ b/crates/omnigraph/tests/helpers/cost.rs
@ -58,6 +58,14 @@ pub struct IoCounts {
    pub commit_graph_reads: u64,
    /// Version-probe invocations (the cheap freshness check).
    pub version_probes: u64,
+    /// DATA-table open CALL count through the two instrumented chokepoints — an
+    /// exact open-invocation count (not the opener-read term), classified by URI so
+    /// internal/system-table opens are excluded. Step-3b target:
+    /// `data_open_count <= |touched_tables|` for a write.
+    pub data_open_count: u64,
+    /// Internal/system-table (`__manifest`, `_graph_commits*`) open CALL count —
+    /// the complement of `data_open_count` (publisher CAS + commit-graph append).
+    pub internal_open_count: u64,
 }

 impl IoCounts {
@ -225,6 +233,8 @@ struct ProbeHandles {
    commit_graph: IOTracker,
    table: PrefixCounter,
    probe_count: Arc<AtomicU64>,
+    data_open_count: Arc<AtomicU64>,
+    internal_open_count: Arc<AtomicU64>,
 }

 impl ProbeHandles {
@ -234,6 +244,8 @@ impl ProbeHandles {
            commit_graph: IOTracker::default(),
            table: PrefixCounter::default(),
            probe_count: Arc::new(AtomicU64::new(0)),
+            data_open_count: Arc::new(AtomicU64::new(0)),
+            internal_open_count: Arc::new(AtomicU64::new(0)),
        };
        let probes = QueryIoProbes {
            manifest_wrapper: Some(Arc::new(h.manifest.clone()) as Arc<dyn WrappingObjectStore>),
@ -242,6 +254,8 @@ impl ProbeHandles {
            ),
            table_wrapper: Some(Arc::new(h.table.clone()) as Arc<dyn WrappingObjectStore>),
            probe_count: Arc::clone(&h.probe_count),
+            data_open_count: Arc::clone(&h.data_open_count),
+            internal_open_count: Arc::clone(&h.internal_open_count),
        };
        (probes, h)
    }
@ -256,6 +270,8 @@ impl ProbeHandles {
            manifest_reads: self.manifest.stats().read_iops,
            commit_graph_reads: self.commit_graph.stats().read_iops,
            version_probes: self.probe_count.load(Ordering::Relaxed),
+            data_open_count: self.data_open_count.load(Ordering::Relaxed),
+            internal_open_count: self.internal_open_count.load(Ordering::Relaxed),
        }
    }
 }
--- a/crates/omnigraph/tests/helpers/failpoint.rs
+++ b/crates/omnigraph/tests/helpers/failpoint.rs
@ -0,0 +1,84 @@
+//! Deterministic rendezvous for concurrent failpoint tests.
+//!
+//! The pattern: park the FIRST thread that hits a failpoint until the test
+//! explicitly releases it, while later arrivals fall through. This replaces
+//! fixed "guess" `sleep`s for cross-thread coordination — the test waits on
+//! the *condition* (the point was reached) with a bounded timeout that fails
+//! loudly, instead of betting a fixed duration is long enough.
+//!
+//! Extracted from the open-coded `AtomicBool` + callback pattern that
+//! `fork_collision_with_live_concurrent_fork_is_retryable` proved out.
+//!
+//! The `reached` flag also doubles as a fired-assertion: a point that is
+//! never hit makes [`Rendezvous::wait_until_reached`] panic, so a typo'd or
+//! misplaced failpoint cannot pass silently.
+
+use std::sync::Arc;
+use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
+use std::time::Duration;
+
+use omnigraph::failpoints::ScopedFailPoint;
+
+/// A parked-on-first-arrival rendezvous bound to a failpoint name. The
+/// underlying callback is RAII-cleaned when this guard drops.
+pub struct Rendezvous {
+    name: String,
+    reached: Arc<AtomicBool>,
+    release: Arc<AtomicBool>,
+    _failpoint: ScopedFailPoint,
+}
+
+impl Rendezvous {
+    /// Register `name` so the FIRST thread to hit it records readiness and
+    /// blocks until [`release`](Self::release); later arrivals fall through
+    /// immediately. The park is bounded (~30s) so a test bug cannot hang the
+    /// suite forever.
+    pub fn park_first(name: &str) -> Self {
+        let reached = Arc::new(AtomicBool::new(false));
+        let release = Arc::new(AtomicBool::new(false));
+        let (cb_reached, cb_release) = (Arc::clone(&reached), Arc::clone(&release));
+        let _failpoint = ScopedFailPoint::with_callback(name, move || {
+            if cb_reached
+                .compare_exchange(false, true, SeqCst, SeqCst)
+                .is_ok()
+            {
+                // ~30s bound (6000 * 5ms); released earlier on the common path.
+                for _ in 0..6000 {
+                    if cb_release.load(SeqCst) {
+                        return;
+                    }
+                    std::thread::sleep(Duration::from_millis(5));
+                }
+            }
+        });
+        Self {
+            name: name.to_string(),
+            reached,
+            release,
+            _failpoint,
+        }
+    }
+
+    /// Async-wait until the parked thread has reached the failpoint, polling
+    /// the readiness condition with a bounded (~12s) timeout. Panics if the
+    /// point is never hit — the fired-assertion.
+    pub async fn wait_until_reached(&self) {
+        for _ in 0..2400 {
+            if self.reached.load(SeqCst) {
+                return;
+            }
+            tokio::time::sleep(Duration::from_millis(5)).await;
+        }
+        panic!("rendezvous: failpoint '{}' was never reached", self.name);
+    }
+
+    /// Whether the parked thread has reached the failpoint yet.
+    pub fn reached(&self) -> bool {
+        self.reached.load(SeqCst)
+    }
+
+    /// Release the parked thread so it resumes past the failpoint.
+    pub fn release(&self) {
+        self.release.store(true, SeqCst);
+    }
+}
--- a/crates/omnigraph/tests/helpers/mod.rs
+++ b/crates/omnigraph/tests/helpers/mod.rs
@ -1,6 +1,8 @@
 #![allow(dead_code)]

 pub mod cost;
+#[cfg(feature = "failpoints")]
+pub mod failpoint;
 pub mod recovery;

 use arrow_array::{Array, RecordBatch, StringArray};
--- a/crates/omnigraph/tests/lance_surface_guards.rs
+++ b/crates/omnigraph/tests/lance_surface_guards.rs
@ -86,6 +86,83 @@ async fn lance_error_too_much_write_contention_variant_exists() {
    );
 }

+// --- Guard 1a: LanceError::IncompatibleTransaction variant exists ----------
+//
+// `db/manifest/migrations.rs::commit_v4_stamp_idempotently` pattern-matches on
+// this variant: two concurrent v3→v4 runners both bump the internal-schema stamp
+// (an `UpdateConfig` commit on the same metadata key), and the loser gets
+// `IncompatibleTransaction`. Since both write the same value the conflict is
+// benign and is retried idempotently. If Lance renames the variant or removes the
+// builder, the match silently stops catching the conflict — this guard fails to
+// force an update.
+
+#[tokio::test]
+async fn lance_error_incompatible_transaction_variant_exists() {
+    let err =
+        lance::Error::incompatible_transaction_source("concurrent UpdateConfig at version N".into());
+    assert!(
+        matches!(err, lance::Error::IncompatibleTransaction { .. }),
+        "Lance::Error::IncompatibleTransaction variant missing or renamed; \
+         update db/manifest/migrations.rs::commit_v4_stamp_idempotently and \
+         this guard, then re-pin docs/dev/lance.md."
+    );
+}
+
+// --- Guard 1c: LanceError::DatasetAlreadyExists variant exists --------------
+//
+// `db/commit_graph.rs` and `db/recovery_audit.rs` create internal Lance tables
+// with a create-or-open idempotency fallback: a concurrent/prior create races,
+// and the `DatasetAlreadyExists` arm falls back to `Dataset::open`. They match
+// the typed variant, NOT the display string ("Dataset already exists: ..."),
+// which is not a Lance API contract. If Lance renames the variant the match
+// silently stops catching the race and a re-create errors instead of opening —
+// this guard turns red to force an update.
+
+#[tokio::test]
+async fn lance_error_dataset_already_exists_variant_exists() {
+    let err = lance::Error::dataset_already_exists("guard");
+    assert!(
+        matches!(err, lance::Error::DatasetAlreadyExists { .. }),
+        "Lance::Error::DatasetAlreadyExists variant missing or renamed; update the \
+         db/commit_graph.rs + db/recovery_audit.rs create-or-open fallbacks and \
+         this guard, then re-pin docs/dev/lance.md."
+    );
+}
+
+// --- Guard 1b: Dataset::open on a missing path returns a not-found variant --
+//
+// `db/commit_graph.rs::read_legacy_commit_cache` (the v3→v4 lineage migration
+// source) classifies a legacy-open error: a genuine not-found is the benign
+// "no legacy data" signal (empty cache), and ANY OTHER error propagates loudly
+// rather than being read as "empty" — a swallow there would let the migration
+// stamp v4 over an empty backfill, orphaning real lineage permanently. That
+// classification relies on Lance mapping an object-store NotFound to
+// `DatasetNotFound` (or, for some paths, `NotFound`). If a Lance bump emits a
+// different variant for a missing dataset, the migration would propagate a
+// genuine "no legacy data" as a hard error — this guard turns red to force the
+// classifier (and this guard) to be updated together.
+
+#[tokio::test]
+async fn dataset_open_missing_returns_not_found_variant() {
+    let dir = tempfile::tempdir().unwrap();
+    // A path that was never written — nothing to open.
+    let missing = dir.path().join("does-not-exist.lance");
+    let err = match Dataset::open(missing.to_str().unwrap()).await {
+        Ok(_) => panic!("opening a never-written dataset path must error"),
+        Err(e) => e,
+    };
+    assert!(
+        matches!(
+            err,
+            lance::Error::DatasetNotFound { .. } | lance::Error::NotFound { .. }
+        ),
+        "Dataset::open on a missing path no longer returns DatasetNotFound/NotFound \
+         (got: {err:?}); update db/commit_graph.rs::read_legacy_commit_cache's \
+         legacy-open classification and this guard together, then re-pin \
+         docs/dev/lance.md."
+    );
+}
+
 // --- Guard 2: ManifestLocation field shape ---------------------------------
 //
 // `db/manifest/metadata.rs:84-88` reads `.path`, `.size`, `.e_tag`,
--- a/crates/omnigraph/tests/lineage_projection.rs
+++ b/crates/omnigraph/tests/lineage_projection.rs
@ -0,0 +1,235 @@
+//! RFC-013 Phase 7 acceptance gate: graph lineage lives ONLY in `__manifest`.
+//!
+//! The `graph_commit` + `graph_head` rows ride the same publish CAS as the
+//! table-version rows, so `_graph_commits.lance` carries NO commit rows. This
+//! gate proves two things over a realistic history (commits on main, a branch,
+//! a merge, all with actors):
+//!
+//! 1. The production commit-graph projection (`CommitGraph::open(...)`, which now
+//!    reads `__manifest`) reconstructs the full lineage correctly — commit set,
+//!    parents, the merge commit's two parents + merge actor, per-branch heads,
+//!    and the inline actors.
+//! 2. `_graph_commits.lance` (and its actor sidecar) hold ZERO commit rows: the
+//!    dual-write is gone and nothing appends to them. This is the load-bearing
+//!    "single source" assertion.
+
+mod helpers;
+
+use futures::TryStreamExt;
+use lance::Dataset;
+
+use omnigraph::db::commit_graph::CommitGraph;
+use omnigraph::db::{GraphCommit, Omnigraph};
+
+use helpers::*;
+
+/// Count rows in a Lance dataset directory under the graph root, or `0` if it
+/// does not exist.
+async fn row_count(root: &str, dir: &str) -> usize {
+    let uri = format!("{}/{}", root.trim_end_matches('/'), dir);
+    let Ok(dataset) = Dataset::open(&uri).await else {
+        return 0;
+    };
+    let batches: Vec<arrow_array::RecordBatch> = dataset
+        .scan()
+        .try_into_stream()
+        .await
+        .unwrap()
+        .try_collect()
+        .await
+        .unwrap();
+    batches.iter().map(|b| b.num_rows()).sum()
+}
+
+/// The production commit-graph projection at `branch`, sourced from `__manifest`.
+async fn projected_commits(root: &str, branch: Option<&str>) -> Vec<GraphCommit> {
+    let graph = match branch {
+        Some(branch) => CommitGraph::open_at_branch(root, branch).await.unwrap(),
+        None => CommitGraph::open(root).await.unwrap(),
+    };
+    let mut commits = graph.load_commits().await.unwrap();
+    commits.sort_by(|a, b| {
+        a.manifest_version
+            .cmp(&b.manifest_version)
+            .then_with(|| a.created_at.cmp(&b.created_at))
+            .then_with(|| a.graph_commit_id.cmp(&b.graph_commit_id))
+    });
+    commits
+}
+
+async fn head_id(root: &str, branch: Option<&str>) -> String {
+    let graph = match branch {
+        Some(branch) => CommitGraph::open_at_branch(root, branch).await.unwrap(),
+        None => CommitGraph::open(root).await.unwrap(),
+    };
+    graph
+        .head_commit()
+        .await
+        .unwrap()
+        .unwrap()
+        .graph_commit_id
+}
+
+#[tokio::test]
+async fn graph_lineage_lives_only_in_manifest() {
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap().to_string();
+
+    // Build a realistic history: several authored commits on main, a branch with
+    // its own authored commits, then an authored merge back into main.
+    let main = init_and_load(&dir).await;
+
+    main.mutate_as(
+        "main",
+        MUTATION_QUERIES,
+        "insert_person",
+        &mixed_params(&[("$name", "Alice")], &[("$age", 30)]),
+        Some("act-alice"),
+    )
+    .await
+    .unwrap();
+    main.mutate_as(
+        "main",
+        MUTATION_QUERIES,
+        "insert_person",
+        &mixed_params(&[("$name", "Bob")], &[("$age", 41)]),
+        Some("act-bob"),
+    )
+    .await
+    .unwrap();
+
+    main.branch_create("feature").await.unwrap();
+
+    let feature = Omnigraph::open(&uri).await.unwrap();
+    feature
+        .mutate_as(
+            "feature",
+            MUTATION_QUERIES,
+            "insert_person",
+            &mixed_params(&[("$name", "Carol")], &[("$age", 27)]),
+            Some("act-carol"),
+        )
+        .await
+        .unwrap();
+    feature
+        .mutate_as(
+            "feature",
+            MUTATION_QUERIES,
+            "insert_person",
+            &mixed_params(&[("$name", "Dave")], &[("$age", 33)]),
+            Some("act-dave"),
+        )
+        .await
+        .unwrap();
+
+    // Advance main once more so the merge is a real (non-fast-forward) merge with
+    // two distinct parents.
+    main.mutate_as(
+        "main",
+        MUTATION_QUERIES,
+        "insert_person",
+        &mixed_params(&[("$name", "Erin")], &[("$age", 38)]),
+        Some("act-erin"),
+    )
+    .await
+    .unwrap();
+
+    let outcome = main
+        .branch_merge_as("feature", "main", Some("act-merger"))
+        .await
+        .unwrap();
+    // A genuine three-way merge (both sides advanced past the base).
+    assert_eq!(
+        outcome,
+        omnigraph::db::MergeOutcome::Merged,
+        "expected a real merge, not fast-forward/up-to-date"
+    );
+
+    // ── single source: nothing writes `_graph_commits.lance` ─────────────────
+    // RFC-013 Phase 7 folds lineage into `__manifest`; the commit-graph dataset
+    // exists only to carry branch refs, so it (and its actor sidecar) hold ZERO
+    // commit rows. If a stray `append_commit` reappears, this turns red.
+    assert_eq!(
+        row_count(&uri, "_graph_commits.lance").await,
+        0,
+        "_graph_commits.lance must carry no commit rows — lineage lives in __manifest"
+    );
+    assert_eq!(
+        row_count(&uri, "_graph_commit_actors.lance").await,
+        0,
+        "_graph_commit_actors.lance must carry no rows — actors live inline in __manifest"
+    );
+
+    // ── main lineage projected from `__manifest` ─────────────────────────────
+    let main_commits = projected_commits(&uri, None).await;
+    // genesis + Alice + Bob + Erin + the merge = 5 on main.
+    assert!(
+        main_commits.len() >= 5,
+        "expected a non-trivial main history, got {} commits",
+        main_commits.len()
+    );
+
+    // Genesis is the unique parentless commit and carries no actor.
+    let genesis: Vec<&GraphCommit> = main_commits
+        .iter()
+        .filter(|c| c.parent_commit_id.is_none())
+        .collect();
+    assert_eq!(genesis.len(), 1, "exactly one genesis (parentless) commit");
+    assert!(
+        genesis[0].actor_id.is_none(),
+        "genesis commit carries no actor"
+    );
+
+    // Every non-genesis commit's parent resolves to a known commit (a connected
+    // lineage — the publisher resolved each parent under the CAS).
+    for commit in &main_commits {
+        if let Some(parent) = &commit.parent_commit_id {
+            assert!(
+                main_commits.iter().any(|c| &c.graph_commit_id == parent),
+                "parent {parent} of {} must be a known commit",
+                commit.graph_commit_id
+            );
+        }
+    }
+
+    // The merge commit carries both parents and the merge actor.
+    let merge_commit = main_commits
+        .iter()
+        .find(|c| c.merged_parent_commit_id.is_some())
+        .expect("a merge commit with a merged parent must exist");
+    assert_eq!(merge_commit.actor_id.as_deref(), Some("act-merger"));
+    assert!(merge_commit.parent_commit_id.is_some());
+    // The merge is the head of main.
+    assert_eq!(
+        head_id(&uri, None).await,
+        merge_commit.graph_commit_id,
+        "the merge commit is the head of main"
+    );
+
+    // ── feature lineage projected from `__manifest` ──────────────────────────
+    let feature_commits = projected_commits(&uri, Some("feature")).await;
+    // The feature head is Dave's commit (the last authored on the branch).
+    let feature_head = head_id(&uri, Some("feature")).await;
+    let feature_head_commit = feature_commits
+        .iter()
+        .find(|c| c.graph_commit_id == feature_head)
+        .expect("feature head must be in the feature projection");
+    assert_eq!(
+        feature_head_commit.actor_id.as_deref(),
+        Some("act-dave"),
+        "feature head is Dave's authored commit"
+    );
+
+    // ── actors surface inline from the manifest metadata ─────────────────────
+    // main's authored commits: Alice, Bob, Erin (direct) + the merge (act-merger)
+    // = 4. Carol/Dave were authored on the feature branch, not main. Genesis has
+    // no actor.
+    let authored = main_commits
+        .iter()
+        .filter(|c| c.actor_id.is_some())
+        .count();
+    assert!(
+        authored >= 4,
+        "expected the authored commits to surface their actor in the projection, saw {authored}"
+    );
+}
--- a/crates/omnigraph/tests/maintenance.rs
+++ b/crates/omnigraph/tests/maintenance.rs
@ -97,7 +97,9 @@ async fn optimize_on_empty_graph_returns_stats_per_table_with_no_changes() {
    // Schema declares 2 nodes + 2 edges = 4 data tables, plus the 3 internal
    // system tables (`__manifest`, `_graph_commits`, `_graph_commit_actors`) optimize
    // also compacts (RFC-013 step 2) = 7. Compaction should run on each but find
-    // nothing to merge.
+    // nothing to merge. The genesis graph commit rides the SINGLE init
+    // `__manifest` write (RFC-013 Phase 7), so a fresh graph has one fragment per
+    // table — nothing to compact anywhere.
    assert_eq!(stats.len(), 7);
    for s in &stats {
        assert_eq!(s.fragments_removed, 0, "{} should not remove", s.table_key);
@ -143,17 +145,20 @@ async fn optimize_after_load_then_again_is_idempotent() {
    }
 }

-/// RFC-013 step 2: `optimize` compacts the internal system tables
-/// (`__manifest`, `_graph_commits`), which accumulate one fragment per commit.
-/// After compaction they shed fragments, write no recovery sidecar (a single
-/// atomic Lance commit — no HEAD-before-publish gap), and the graph stays
-/// coherent for subsequent reads + strict writes.
+/// RFC-013 step 2 + Phase 7: `optimize` compacts `__manifest`, which now
+/// accumulates one fragment per commit for BOTH the table-version rows and the
+/// folded-in graph-lineage rows (`graph_commit` + `graph_head`). The
+/// commit-graph datasets (`_graph_commits`, `_graph_commit_actors`) no longer
+/// take a per-commit row (lineage lives in `__manifest`), so they stay flat —
+/// nothing to compact. After compaction `__manifest` sheds fragments, writes no
+/// recovery sidecar (a single atomic Lance commit — no HEAD-before-publish gap),
+/// and the graph stays coherent for subsequent reads + strict writes.
 #[tokio::test]
 async fn optimize_compacts_internal_tables() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;

-    // Build version-history depth so the internal tables accumulate fragments.
+    // Build version-history depth so `__manifest` accumulates fragments.
    for i in 0..20 {
        mutate_main(
            &mut db,
@ -167,16 +172,32 @@ async fn optimize_compacts_internal_tables() {

    let stats = db.optimize().await.unwrap();

-    for key in ["__manifest", "_graph_commits"] {
+    // `__manifest` carries every per-commit fragment (table versions + lineage)
+    // and compacts.
+    let manifest_stats = stats
+        .iter()
+        .find(|s| s.table_key == "__manifest")
+        .expect("optimize stats missing internal table __manifest");
+    assert!(
+        manifest_stats.committed,
+        "__manifest should compact after 20 commits"
+    );
+    assert!(
+        manifest_stats.fragments_removed > 0,
+        "__manifest should shed fragments, removed {}",
+        manifest_stats.fragments_removed
+    );
+
+    // The commit-graph datasets take no per-commit row anymore (RFC-013 Phase 7
+    // folds lineage into `__manifest`), so they stay at one fragment — no-ops.
+    for key in ["_graph_commits", "_graph_commit_actors"] {
        let s = stats
            .iter()
            .find(|s| s.table_key == key)
            .unwrap_or_else(|| panic!("optimize stats missing internal table {key}"));
-        assert!(s.committed, "{key} should compact after 20 commits");
        assert!(
-            s.fragments_removed > 0,
-            "{key} should shed fragments, removed {}",
-            s.fragments_removed
+            !s.committed,
+            "{key} carries no per-commit rows after Phase 7 — nothing to compact"
        );
    }

--- a/crates/omnigraph/tests/recovery.rs
+++ b/crates/omnigraph/tests/recovery.rs
@ -685,38 +685,21 @@ async fn list_recovery_audit_kinds(graph_root: &Path) -> Vec<String> {
    out
 }

-/// Helper: count `_graph_commits.lance` rows tagged with the recovery actor.
+/// Helper: count graph commits authored by the recovery actor. RFC-013 Phase 7
+/// records the recovery commit in `__manifest` (folded into the recovery publish
+/// CAS), not `_graph_commits.lance`, so this counts through the production
+/// commit-graph projection (`load_commits`), filtering on the inline actor.
 async fn count_recovery_actor_commits(graph_root: &Path) -> usize {
-    let actors_dir = graph_root.join("_graph_commit_actors.lance");
-    if !actors_dir.exists() {
-        return 0;
-    }
-    let ds = Dataset::open(actors_dir.to_str().unwrap()).await.unwrap();
-    use arrow_array::{Array, StringArray};
-    use futures::TryStreamExt;
-    let batches: Vec<arrow_array::RecordBatch> = ds
-        .scan()
-        .try_into_stream()
+    let commits = omnigraph::db::commit_graph::CommitGraph::open(graph_root.to_str().unwrap())
        .await
        .unwrap()
-        .try_collect()
+        .load_commits()
        .await
        .unwrap();
-    let mut count = 0;
-    for batch in &batches {
-        let actors = batch
-            .column_by_name("actor_id")
-            .unwrap()
-            .as_any()
-            .downcast_ref::<StringArray>()
-            .unwrap();
-        for i in 0..actors.len() {
-            if actors.value(i) == "omnigraph:recovery" {
-                count += 1;
-            }
-        }
-    }
-    count
+    commits
+        .iter()
+        .filter(|c| c.actor_id.as_deref() == Some("omnigraph:recovery"))
+        .count()
 }

 #[tokio::test]
--- a/crates/omnigraph/tests/validators.rs
+++ b/crates/omnigraph/tests/validators.rs
@ -237,6 +237,58 @@ async fn cardinality_rejected_on_mutation_insert_edge() {
    );
 }

+/// RFC-013 step 3b regression guard (cursor High / codex P1 on #298): edge `@card`
+/// validation must scan LIVE committed HEAD, not the pinned `txn.base`. Collapse #1
+/// skips the edge accumulation open, so a non-strict edge insert under a `WriteTxn`
+/// reopens for the cardinality scan — and that scan must observe edges a concurrent
+/// writer committed after this mutation captured its base, or a `@card` max is
+/// silently exceeded (invariant 9). The residual validate→commit TOCTOU is the §7.1
+/// gap (step 4); this only un-widens what 3b widened (live HEAD vs mutation-start base).
+///
+/// Deterministic — no failpoint: handle B's coordinator is stale by construction
+/// (the write path does not probe the manifest version, unlike the read path). B MUST
+/// NOT read between A's commit and B's insert — a read refreshes B's coordinator and
+/// masks the bug (the same caveat as the served stale-view repro in `writes.rs`).
+#[tokio::test]
+async fn cardinality_rejected_for_stale_handle_after_concurrent_edge_commit() {
+    let (dir, mut db_a) = init_with(CARDINALITY_SCHEMA, CARDINALITY_SEED).await;
+    let uri = dir.path().to_str().unwrap();
+
+    // Handle B opens the same graph at the seed version (no edges yet); it then
+    // never reads again, so its in-memory coordinator stays pinned at the seed.
+    let mut db_b = Omnigraph::open(uri).await.unwrap();
+
+    // Handle A commits WorksAt(Alice -> Acme): Alice is now at the @card(0..1) max.
+    // This advances the on-disk manifest; B's coordinator is now stale.
+    mutate_main(
+        &mut db_a,
+        CARDINALITY_MUTATIONS,
+        "add_employment",
+        &params(&[("$person", "Alice"), ("$company", "Acme")]),
+    )
+    .await
+    .unwrap();
+
+    // Handle B (stale, never read since A committed) inserts a second WorksAt for
+    // Alice. B is non-strict + under a WriteTxn, so collapse #1 skips the open and the
+    // cardinality scan reopens: it MUST read live HEAD (Alice has 1) → reject (1+1 > 1),
+    // not the stale base (Alice has 0) → which would wrongly pass and commit a 2nd edge.
+    let err = mutate_main(
+        &mut db_b,
+        CARDINALITY_MUTATIONS,
+        "add_employment",
+        &params(&[("$person", "Alice"), ("$company", "Beta")]),
+    )
+    .await
+    .unwrap_err();
+    assert!(
+        err.to_string().to_lowercase().contains("cardinality")
+            || err.to_string().to_lowercase().contains("@card"),
+        "a stale-handle edge insert must be rejected by @card against live HEAD, got: {}",
+        err
+    );
+}
+
 #[tokio::test]
 async fn cardinality_rejected_on_jsonl_load() {
    // Already covered by existing loader Phase 3 logic but assert the
--- a/crates/omnigraph/tests/write_cost.rs
+++ b/crates/omnigraph/tests/write_cost.rs
@ -24,10 +24,10 @@
 mod helpers;

 use helpers::cost::{
-    IoCounts, assert_flat, assert_grows, local_graph, measure_insert, measure_insert_as,
+    IoCounts, assert_flat, assert_grows, local_graph, measure, measure_insert, measure_insert_as,
    measure_with_staged,
 };
-use helpers::{MUTATION_QUERIES, commit_many, commit_many_as, mixed_params};
+use helpers::{MUTATION_QUERIES, commit_many, commit_many_as, init_and_load, mixed_params};

 // ── (A) The internal-table LOCK — the acceptance test for step 2 (compaction) ──
 //
@ -130,7 +130,16 @@ async fn single_insert_data_write_is_bounded() {

 /// At a fixed shallow depth, the per-write object-store read count is below a
 /// documented ceiling. Fails the moment a change *adds* a round-trip on the write
-/// path — the "no new round-trip" guard (calibrated: ~50 at depth ~5).
+/// path — the "no new round-trip" guard.
+///
+/// Two folds keep the count low: RFC-013 Phase 7 put the `graph_commit` +
+/// `graph_head` rows in the same publish merge-insert (no extra `__manifest`
+/// write/scan per commit), and RFC-013 P2 collapsed the publish path's FOUR
+/// `__manifest` scans (table locations + version entries + tombstones + a
+/// separate `read_graph_lineage` for the parent) into ONE — the
+/// `manifest_reads` sub-ceiling below would trip if any of those scans crept
+/// back. Calibrated at depth ~5: ~26 `__manifest` reads / ~36 total after the
+/// P2 fold (was ~44 / ~54 with the four separate scans).
 #[tokio::test]
 async fn write_op_count_ceiling_at_shallow_depth() {
    let dir = tempfile::tempdir().unwrap();
@ -141,6 +150,16 @@ async fn write_op_count_ceiling_at_shallow_depth() {
        "depth~5: data={} __manifest={} _graph_commits={} total_reads={}",
        io.data_reads, io.manifest_reads, io.commit_graph_reads, io.total_reads()
    );
+    // Sub-ceiling on `__manifest` reads specifically: the publish path does one
+    // scan, not four. ~26 measured at this depth; a re-added scan would push it
+    // well past this. (Deterministic on local FS.)
+    const MANIFEST_CEILING: u64 = 34;
+    assert!(
+        io.manifest_reads <= MANIFEST_CEILING,
+        "per-write __manifest reads {} exceeded ceiling {MANIFEST_CEILING} — a publish-path \
+         scan was re-added (RFC-013 P2 folds them into one)",
+        io.manifest_reads,
+    );
    const CEILING: u64 = 80;
    assert!(
        io.total_reads() <= CEILING,
@ -169,3 +188,86 @@ async fn keyed_insert_routes_through_merge_insert_only() {
    assert_eq!(staged.stage_append, 0, "keyed insert must not stage_append");
    assert_eq!(staged.create_vector_index, 0, "no inline vector-index build on a plain insert");
 }
+
+// ── (D) Step-3b capture-once fitness asserts (RED today → GREEN after WriteTxn) ──
+
+/// A write must validate the schema contract EXACTLY ONCE (3 `read_text` + 2 `exists`).
+/// Today the write path re-validates at every resolve point (entry, per-table
+/// `resolved_branch_target`, commit-time `fresh_snapshot_for_branch`), so the delta is
+/// a multiple of that. Step 3b's `WriteTxn` validates once and threads it. The shape is
+/// the write twin of `warm_read_cost.rs::warm_query_validates_schema_contract_once`,
+/// built with ZERO production change via the counting storage adapter.
+#[tokio::test]
+async fn write_validates_schema_contract_once() {
+    use omnigraph::instrumentation::CountingStorageAdapter;
+    use omnigraph::storage::storage_for_uri;
+
+    let dir = tempfile::tempdir().unwrap();
+    let _ = init_and_load(&dir).await;
+    let uri = dir.path().to_str().unwrap();
+    let (adapter, counts) = CountingStorageAdapter::new(storage_for_uri(uri).unwrap());
+    let db = omnigraph::db::Omnigraph::open_with_storage(uri, adapter)
+        .await
+        .unwrap();
+
+    let before_read_text = counts.read_text();
+    let before_exists = counts.exists();
+    db.mutate(
+        "main",
+        MUTATION_QUERIES,
+        "insert_person",
+        &mixed_params(&[("$name", "schema_once")], &[("$age", 30)]),
+    )
+    .await
+    .unwrap();
+
+    let read_text_delta = counts.read_text() - before_read_text;
+    let exists_delta = counts.exists() - before_exists;
+    eprintln!("schema-contract reads on one write: read_text={read_text_delta} exists={exists_delta}");
+    assert_eq!(
+        read_text_delta, 3,
+        "a write must validate the schema contract once (3 reads), not N times",
+    );
+    assert_eq!(
+        exists_delta, 2,
+        "a write must probe contract-file existence once (2 probes), not N times",
+    );
+}
+
+/// A keyed single-table write must open its DATA table AT MOST ONCE. Today it opens
+/// ~4× (accumulation, staging, commit drift-guard, publish-prepare/index-build), each
+/// a fresh cold `Dataset::open`. Step 3b opens the base once (a *session-aware* base
+/// open is deferred to step 5), threads the commit-return handle, and replaces the
+/// drift-guard open with a cheap `latest_version_id` probe — collapsing to 1 open.
+/// Counted by `data_open_count`, the
+/// table-class-scoped chokepoint probe: the internal-table opens (publisher CAS +
+/// commit-graph append) are EXCLUDED, since they are unrelated to data-table reuse and
+/// would otherwise keep this count >1 regardless of threading. (`forbidden_apis` keeps
+/// engine code outside the storage layer from opening datasets except through the
+/// instrumented chokepoints — `table_store.rs`'s own direct opens are branch-management
+/// ops, not this keyed-write path.)
+#[tokio::test]
+async fn keyed_insert_opens_table_at_most_once() {
+    let dir = tempfile::tempdir().unwrap();
+    let mut db = local_graph(&dir).await;
+    let io = {
+        let (res, io) = measure(db.mutate(
+            "main",
+            MUTATION_QUERIES,
+            "insert_person",
+            &mixed_params(&[("$name", "opens")], &[("$age", 30)]),
+        ))
+        .await;
+        res.unwrap();
+        io
+    };
+    eprintln!(
+        "data_open_count={} internal_open_count={} for a single-table keyed insert",
+        io.data_open_count, io.internal_open_count
+    );
+    assert!(
+        io.data_open_count <= 1,
+        "a keyed single-table write must open its data table at most once, got {}",
+        io.data_open_count,
+    );
+}
--- a/crates/omnigraph/tests/writes.rs
+++ b/crates/omnigraph/tests/writes.rs
@ -613,7 +613,10 @@ async fn mixed_insert_and_update_on_same_person_coalesces_to_one_merge() {
        "dedupe must keep the update's age value, not the insert's",
    );

-    // One-publish guarantee: manifest version advanced by exactly 1.
+    // One-publish guarantee: manifest version advanced by exactly 1. The graph
+    // commit (`graph_commit` + `graph_head` rows) rides the SAME publish CAS as
+    // the table-version rows (RFC-013 Phase 7), so one graph commit is exactly
+    // one manifest version bump.
    let post_version = version_main(&db).await.unwrap();
    assert_eq!(
        post_version,
@ -659,7 +662,9 @@ async fn multiple_appends_to_same_edge_coalesce_to_one_append() {
    let edges_after = count_rows(&db, "edge:Knows").await;
    assert_eq!(edges_after, edges_before + 2);

-    // One manifest version bump for the two-edge query (atomic publish).
+    // One manifest version bump for the two-edge query (atomic publish): the
+    // graph commit rides the same publish CAS as the table-version rows
+    // (RFC-013 Phase 7).
    let post_version = version_main(&db).await.unwrap();
    assert_eq!(
        post_version,
@ -690,6 +695,8 @@ async fn multi_statement_inserts_publish_exactly_once() {
    .await
    .unwrap();

+    // One manifest version bump: the graph commit rides the same publish CAS
+    // as the table-version rows (RFC-013 Phase 7).
    let post_version = version_main(&db).await.unwrap();
    assert_eq!(
        post_version,
@ -1005,6 +1012,8 @@ async fn chained_updates_with_overlapping_predicate_respects_intermediate_value(
        "chained-update final value must reflect the second update applied to op-1's pending value"
    );

+    // One manifest version bump: the graph commit rides the same publish CAS
+    // as the table-version rows (RFC-013 Phase 7).
    let post_version = version_main(&db).await.unwrap();
    assert_eq!(
        post_version,
@ -1043,6 +1052,9 @@ async fn multi_statement_delete_on_same_node_table() {
        pre_persons - 2,
        "both deletes must land",
    );
+    // One manifest version bump: the graph commit (delete-only queries record
+    // one too) rides the same publish CAS as the table-version rows
+    // (RFC-013 Phase 7).
    let post_version = version_main(&db).await.unwrap();
    assert_eq!(
        post_version,
--- a/docs/dev/architecture.md
+++ b/docs/dev/architecture.md
@ -133,7 +133,7 @@ flowchart TB
    subgraph state[graph state]
        coord[GraphCoordinator]:::l2
        mr[ManifestCoordinator<br/>db/manifest.rs]:::l2
-        cg[CommitGraph<br/>_graph_commits.lance]:::l2
+        cg[CommitGraph<br/>projection of __manifest graph_commit/graph_head rows]:::l2
        stg[MutationStaging<br/>per-query in-memory accumulator<br/>exec/staging.rs]:::l2
    end

--- a/docs/dev/handoff-rfc-013-write-path.md
+++ b/docs/dev/handoff-rfc-013-write-path.md
@ -0,0 +1,460 @@
+# Handoff: finishing RFC-013 (write-path latency + correctness)
+
+**Status:** living handoff. **Source of truth is [`rfc-013-write-path-latency.md`](rfc-013-write-path-latency.md)** —
+this doc is the *current-state map + the decisions/validation from the latest work cycle
+ the concrete next actions*. When they disagree, the RFC wins (and fix this doc).
+
+**Audience:** the engineer/agent who picks up RFC-013 next.
+
+---
+
+## 0. TL;DR — where we are and what's next
+
+RFC-013 makes the write path fast **and** correct on object storage (217 Lance tables
+under one `__manifest` catalog, on R2/S3). It is sequenced as steps; read §9 of the RFC
+for the canonical list. Current reality:
+
+**Landed on `main`:**
+- **Step 1** — Tier-1 cost gate + the shared `helpers::cost` harness (#288).
+- **Step 3a** — opener bypass: write opens go direct (`Dataset::open` by URI + version)
+  instead of the Lance-namespace builder (#288). **This already banked the dominant
+  depth win** — see §2 below; it reframes everything.
+- **Step 2a** — internal-table compaction: `optimize` now compacts `__manifest` /
+  `_graph_commits` / `_graph_commit_actors` (#291). Plus the RFC latency-model
+  correction (#292).
+- **Optimize-vs-write race** — optimize survives a cross-process write race on the
+  same table (#297, **LANDED** — origin/main `6d4606a8`; see §6 for why it's not
+  redundant with Design A). Step 3b stacks on top of this.
+
+**Open PRs (land these; relationships in §7):**
+- **#296** `correctness-by-design-fix` — recovery roll-forward converges on a concurrent
+  manifest advance (the fix for the flaky `iss-schema-apply-reopen-recovery-race`).
+  **MERGED to main and integrated into this branch** — the converge helper now threads
+  Phase-7's manifest-CAS recovery `graph_commit_id` (see `converge_or_defer_roll_forward`).
+- **#295** `docs/rfc-013-step-3b` — the step-3b RFC doc.
+- **#254** `ragnorc/bug-4-schema-apply-occ` — schema-apply vs optimize false-fail
+  (same op-class family as #297, logical side).
+
+**Step 3b is DONE** (capture-once `WriteTxn`, schema-once + open-collapse; see §4) on
+`rfc-013-step-3b-writetxn-v2`. **Next: Phase 7 (step 4), then the big one — Design A /
+`PublishPlan` unification (step 5)** — see §5, the convergent fix for the bug *class* this
+area keeps generating, which also absorbs 3b's deferred session-aware write opens.
+
+---
+
+## 1. The corrected mental model (read this before touching anything)
+
+Three reframes from the latest cycle that the older RFC prose may not fully reflect:
+
+### 1a. 3a already won the depth fight → the residual is constant-factor + RTT
+Before 3a, the write re-opened each table through the lance-namespace builder ~13×, and
+that path was **O(depth)** (it re-opened `__manifest` + `list_table_versions` per open —
+**not** a Lance back-walk; the root cause was OmniGraph's own namespace round-trips, not
+Lance — validated against Lance source). 3a swapped it for the direct opener, which is
+**O(1)** (`from_uri(loc).with_version(N)` = arithmetic path + one HEAD). So:
+
+- The dominant **O(depth) data-table** term is **gone**.
+- Step 2a flattened the secondary **internal-table** scan term.
+- What remains is the **~110-hop serial backbone × RTT + compute** — a constant in
+  depth. The latency model is **`wall = (serial_hops + ops/effective_concurrency)·RTT
+  + compute`**; on a capped store (R2) the op-count term re-enters wall-clock, on an
+  unlimited store it parallelizes away. Measured: prod one-row write 27→15.76s after
+  2a; the remaining 15.76s is the serial backbone — **step 3b's target**, not step 2's.
+- Step 3b's win is therefore the **call-count/RTT collapse** (redundant opens, the
+  flat-46 schema reads), NOT a depth slope. Don't expect a depth-slope improvement from
+  3b; gate it on the constant-factor (S3 round-trips), not a curve.
+
+### 1b. Two op classes, two commit models (the §6.6 principle)
+Every concurrency bug in this area is **one op class using the other's commit model**:
+
+| class | examples | commutes? | correct commit model |
+|---|---|---|---|
+| **maintenance** | compaction (`Rewrite`), `optimize_indices` | yes (content-preserving) | Lance native rebase + app reopen/replan on real overlap + **monotonic manifest fast-forward** — no epoch, no read-set |
+| **logical mutation** | load / mutate / merge / delete | no (lost-update, write-skew) | strict cross-process OCC: read-set + write-set CAS under the `writer_epoch` fence |
+
+Applying strict OCC + equality-CAS uniformly is the mistake: too strong for maintenance
+(false conflicts — #297's bug), too weak for logical cross-process (§6.5 corruption).
+
+### 1c. The root liability (what keeps generating these bugs)
+Lance gives **per-table atomic commits** but **no cross-table/cross-step atomicity**, so
+every multi-commit op advances per-table Lance HEAD **before** the manifest references it
+(the "A-before-B window"). The resulting `HEAD vs manifest` delta is **ambiguous**
+(external drift? my own in-flight work? a crashed writer?), and **many uncoordinated code
+paths each re-interpret it** (4 writers + the maintenance path + recovery + the write-path
+drift guard). Each interpreter is a fresh chance to misclassify. That is the bug class:
+- §6.5 cross-process logical corruption,
+- #297's own-HEAD-drift misclassification,
+- the flaky write-path "HEAD ahead of manifest, run repair" guard,
+- the recovery classifier edges.
+
+**The convergent fix is Design A (one publish authority — step 5); Lance MTT eventually
+retires the window entirely.** See §5.
+
+### 1d. The second facet: the write base is a stale pin (no probe)
+The READ path resolves its base behind a freshness probe (`resolve_target_inner`
+omnigraph.rs:~1072 → `probe_latest_incarnation` → `refresh_manifest_only`); the WRITE path
+does NOT (`resolved_branch_target` omnigraph.rs:~778 returns the warm `coord.snapshot()` for
+the bound branch, no probe). So a long-lived server's write base lags the live manifest. That
+single staleness feeds **two distinct failure modes**, both surfaced this cycle:
+
+1. **Stale validation *reads* → integrity under-enforced.** Write-path RI checks read
+   committed state off the stale base. 3b's collapse #1 made it worse for edge `@card`:
+   `edge_cardinality_read_handle` (mutation.rs:~614) scans the pinned `txn.base` instead of
+   live HEAD (was live HEAD pre-3b), so a concurrent edge committed after `txn` capture is
+   uncounted → a `@card` max can be exceeded (cursor **High** / codex **P1** on #298,
+   **VALID**). **#298 fix: restore the live-HEAD read for that scan** (un-regress; gate-safe —
+   the `data_open_count` gate is a node insert) + a deterministic regression test (commit A's
+   edge, then B validates → must see A) + correct the wrong "pinned base == live HEAD" doc
+   comment (mutation.rs:~605-613, which assumes a single writer). The *structural* liability
+   underneath: there is **no unified write-validation read-set** — endpoint
+   (`ensure_node_id_exists`, warm `snapshot_for_branch`), cardinality (mutation: pinned
+   `txn.base`; loader: warm `snapshot_for_branch` — the SAME check forks per write path),
+   commit drift guard (live `fresh_snapshot_for_branch`), and uniqueness
+   (`enforce_unique_constraints_intra_batch`, intra-batch only — cross-version uniqueness is a
+   documented gap). Three freshness levels chosen ad hoc, none re-validated at commit → the
+   §7.1 TOCTOU class, and each new constraint forks the pattern again.
+
+2. **Stale OCC *pin* → false-fail on a maintenance advance.** A served strict update/delete
+   pins the stale base version, then false-fails `ExpectedVersionMismatch` after an external
+   `optimize` advanced `__manifest` — even though the advance was content-preserving
+   compaction the logical write should fast-forward past (invariant 7). It's the **write-side
+   mirror of #297/§6.6** (#297 made optimize fast-forward past a logical write; this is a
+   logical write that must fast-forward past optimize). A served read clears it (the read
+   probes the shared coordinator). Validated repro on prod (omnigraph.ragnor.co) +
+   `writes.rs::served_strict_delete_after_external_optimize_advance_auto_refreshes`
+   (`#[ignore]` on branch `fix/write-path-stale-view-probe`). **The naive "just probe" fix is
+   proven wrong** — a blanket probe silently refreshes past *logical* advances too, breaking
+   `consistency::stale_handle_public_mutation_must_refresh_then_retry` (the deliberate
+   cross-process lost-update OCC primitive). The fix must **discriminate by op class**.
+
+**Both fold into Design A (step 5), same as §1c.** `open_txn`'s one warm probe makes the base
+fresh (absorbs maintenance advances cheaply); the **op-class-aware strict precondition** —
+derive from Lance's per-version transaction metadata (all `Rewrite`/`ReserveFragments` =
+maintenance → fast-forward the pin; any `Append`/`Update`/`Delete`/`Merge` = logical → fail
+loudly; NO parallel marker, invariant 1/15) — is the correctness fence for anything that lands
+after. And the §7.1 read-set-in-CAS unifies the validation read-set + re-validates it under the
+`graph_head` contention. So **the stale-view false-fail, the cardinality/validation-read-set
+liability, and #297's mirror are one bug** (the write base is a stale, un-probed, un-classified
+pin) with **one home: the single PublishPlan delta-interpreter** (§1c + §5). Strong corroboration
+of Design A — three symptoms, one fix.
+
+---
+
+## 2. Validated facts — do NOT re-derive these
+
+Established this cycle against **Lance 7.0.0 source**
+(`~/.cargo/registry/src/index.crates.io-*/lance-7.0.0`) and current engine code. Cited so
+you can trust them without re-investigating.
+
+**Lance (upstream):**
+- `from_uri(loc).with_version(N).load()` and `checkout_version(N)` are **O(1)** (computed
+  V2 path `_versions/{u64::MAX-N:020}.manifest` + one HEAD; no listing/back-walk).
+  (`lance-table/src/io/commit.rs` `default_resolve_version`.)
+- A shared `Arc<Session>` (`DatasetBuilder::with_session`) warms metadata/index caches
+  keyed by `(URI, version, e_tag)`. Caveat: the *first* manifest read on open is uncached
+  — the Session warms the *scan/index* metadata, not the first open. **`WriteParams` *does*
+  carry a `session` field** (`lance/src/dataset/write.rs`), but it only matters on the
+  `WriteDestination::Uri` arm; OmniGraph's staged path always drives off an **already-open
+  `Dataset`**, and Lance takes the store/session from that handle. So to attach the shared
+  Session to a write base, open read-style (`open_table_dataset` → `from_uri().with_version()
+  .with_session()`) and drive the staged write off that handle.
+- A held `Arc<Dataset>` at a pinned version is `Send + Sync`, immutable, safe to reuse for
+  many scans/count/staged-write base in one txn (OmniGraph's `TableHandleCache` already
+  relies on this).
+- **No compaction `RetryExecutor`** (only Delete/MergeInsert/Update have one).
+  `commit_compaction` commits a fixed `Rewrite` via `apply_commit` direct. In
+  `commit_transaction`, a semantic `RetryableCommitConflict` **escapes the retry loop**
+  via `?` at `io/commit.rs:979`; the loop only retries the OCC `CommitConflict`
+  (`:1096`), and even that re-rebases the *same* transaction (never re-plans). ⇒
+  **compaction needs app-level reopen+REPLAN; you cannot "set conflict_retries" and let
+  Lance own it.**
+- `check_rewrite_txn`: a `Rewrite` rebases **cleanly** past a concurrent `Append`/disjoint
+  `Update`/`Delete` (preserving both); only a same-fragment overlap yields a retryable
+  conflict. ⇒ the common concurrent insert/update/delete is rebased for free; the app
+  retry fires only on real overlap.
+
+**Engine (internal):**
+- Read path (post-#268) already has the capture-once machinery: `Snapshot` (`db/manifest.rs`),
+  warm `GraphCoordinator` behind a `latest_version_id`/incarnation probe, a held
+  `TableHandleCache` keyed `(table,branch,version,e_tag)`, **one shared `Session` per
+  graph** (`read_caches.session`). **Writes bypass all of it by construction**
+  (`resolved_branch_target` returns `read_caches: None`; the 3a write opener attaches no
+  session and opens by latest, not pinned version).
+- A single write opens each table **3–4×** (accumulation → staging reopen → commit
+  drift-guard → publish prepare), each a fresh cold open. `validate_schema_contract`
+  (`db/schema_state.rs`, via `ensure_schema_state_valid`) runs uncached (~3 `read_text`
+  + 2 `exists`) at every resolve point (~the flat-46). Both are constant-factor, flat in
+  depth — 3b's targets.
+- Strict-op guards are the lost-update floor (3 layers: pre-stage `ensure_expected_version`
+  `table_store.rs`; commit-time strict drift `exec/staging.rs`; publisher CAS
+  `publisher.rs`). Capture-once **supplies** the pinned operand — never remove a guard.
+- Fork-on-first-write authority reads (`classify_fork_ref` → `fresh_snapshot_for_branch`)
+  must stay **fresh** (not served from a pinned base).
+- Cost harness: `helpers::cost` (`measure`/`measure_with_staged`/`IoCounts`/`assert_flat`/
+  `local_graph`/`s3_graph`). The schema-once assert can reuse `CountingStorageAdapter`
+  (`warm_read_cost.rs::warm_query_validates_schema_contract_once`) with **zero** prod
+  change; an open-count assert wants a small `open_count` AtomicU64 in `QueryIoProbes`
+  (copy the `probe_count`/`record_probe` pattern). The forbidden-API guard
+  (`tests/forbidden_apis.rs`) makes an instrumentation-level counter complete.
+
+---
+
+## 3. The #297 cycle (this branch) — what it is, and the lesson
+
+`fix-optimize-concurrency-race` (5 commits): a CLI `optimize` racing a served write on the
+same table failed (Lance Rewrite lost, or the equality-CAS publish lost). Fix: unify both
+compaction paths on the internal path's **reopen+replan** shape, with a **two-level retry**
+— outer loop reopens+replans on a real Lance overlap; inner Phase-C loop makes the manifest
+publish a **monotonic fast-forward** (advance to compacted version `N`, or no-op when the
+manifest already moved to `≥ N`), never the strict equality CAS. Sidecar written once;
+in-process queue kept as a contention reducer (not the cross-process guard); no `writer_epoch`.
+
+**Two review rounds surfaced two follow-on bugs I introduced with the retry loop** — both
+fixed, both regression-tested (own-HEAD-drift via negative control):
+1. **Own-HEAD-drift misclassification** (`56d004e0`): the drift guard re-ran every
+   iteration and, after a partial Phase-B commit (auto_cleanup strip or compact, then a
+   later op conflicts), saw `HEAD > manifest` from *our own* covered work and deleted the
+   sidecar + returned `skipped_for_drift` (stranding uncovered drift). Fix: track
+   `head_advanced`; the drift guard fires only when `!head_advanced`.
+2. **Publish exhaustion spurious error** (`e9d16a2c`): the publish loop returned `Err` on
+   its final retry even if the conflict meant a concurrent writer already published `≥ N`
+   (postcondition met). Fix: re-check `current >= state.version` on exhaustion.
+
+**The lesson (write it on the wall):** *wrapping a sequence of side-effecting commits in a
+retry silently converts every "checked once, before any side effect" precondition into
+"re-checked after partial side effects."* That's a distinct bug class; it needs
+fault-injection tests **at each commit boundary**, not just end-to-end concurrency tests.
+(The `optimize.before_compact` / `optimize.inject_reindex_conflict` failpoints exist for
+exactly this.)
+
+**Temporary mechanism flag:** `head_advanced` is an in-memory proxy for "is this HEAD
+movement mine." Under Design A the authority answers that from the plan/sidecar **identity**
+— so `head_advanced` is the part that gets *replaced*, while the monotonic-publish +
+reopen/replan **semantics** are permanent. (Noted in RFC §6.6.)
+
+---
+
+## 4. DONE: Step 3b — capture-once `WriteTxn` (shipped on `rfc-013-step-3b-writetxn-v2`)
+
+**Delivered:** on the **table-touch hot path**, a single `mutate`/`load` validates the schema
+contract **once** and opens each touched data table **at most once** — a constant-factor/RTT
+win (not a depth-slope win; 1a). Two cost gates in `write_cost.rs` lock it (both on a node
+insert): `write_validates_schema_contract_once` (3 `read_text` / 2 `exists`, was 12/9) and
+`keyed_insert_opens_table_at_most_once` (`data_open_count <= 1`, was 4). The carrier is the
+minimal `WriteTxn { branch, base }`, threaded as `Option<&WriteTxn>` (`Some` on the hot
+mutate/load path, `None` byte-identical everywhere else); it **converges into** step 5's
+`PublishPlan`.
+
+**Not "once" everywhere (scope, not regression):** edge endpoint / cardinality RI validation
+(`ensure_node_id_exists`, the loader's RI + cardinality) still resolves through
+`snapshot_for_branch` and re-validates the schema — and reads **warm**, not live. Threading
+`txn.base` there to make it "once" would re-introduce the stale-read class the #298 cardinality
+fix removed (it now reads live HEAD). Doing schema-once *and* fresh reads for those validations
+needs the unified, re-checked read-set — **step 4 §7.1** (§1d). So #298 **un-regresses
+cardinality only; it does not close write-validation freshness.** No edge-insert/load schema-once
+gate yet (only the node gates above).
+
+Commits (off merged-#297 main):
+- **Stage 0** — scope `open_count` → `data_open_count`/`internal_open_count` by URI class
+  (the review fix: `open_dataset_tracked` also opens `__manifest`/`_graph_commits`, so the
+  raw counter conflated them and the gate was unreachable). Re-baselined RED 4.
+- **Commit A (schema-once)** — capture `txn` once at entry (the single validation); the 4
+  validation sites collapse: S1 (entry `ensure_schema_state_valid`) removed; S3a
+  (`open_for_mutation_on_branch`) + S3b (`prepare_updates_for_commit`) source `txn.base`;
+  S4 (`commit_all`) uses new `fresh_snapshot_for_branch_unchecked` (the OCC manifest re-read
+  minus the schema re-validation). `fresh_snapshot_for_branch{,_unchecked}` now read the
+  manifest directly via `ManifestCoordinator` (drops a spurious commit-graph `exists` probe;
+  same `Snapshot`).
+- **Commit B (open collapse 4→1)** — #1 accumulation open ELIMINATED (the node path discarded
+  the handle; read `txn.base.entry().table_version`); #2 staging open KEPT (the one open);
+  #3 commit drift-guard reads live HEAD via `entry.dataset.dataset().latest_version_id()` (a
+  cheap manifest-pointer probe off the staged handle, not a fresh open); #4 index build reuses
+  the `commit_staged` handle threaded through `CommittedMutation`/`prepare_updates_for_commit`.
+- **Commit B.1 + cleanup** — named the two positional returns (`OpenedForMutation`,
+  `CommittedMutation`) + a `debug_assert` pinning the open-skip contract; **removed the
+  unearned `WriteTxn.session` field** (the collapse uses skip/probe/reuse, not a session).
+
+**RFC §4.1 corrections — how they resolved:**
+1. *Thread the evolving handle, not a version-keyed cache* → realized as collapse #4 (carry
+   the `commit_staged` handle forward into the index build).
+2. *Don't forbid re-resolution* → honored: the commit-time OCC re-read
+   (`fresh_snapshot_for_branch_unchecked` — fresh manifest, only schema-revalidation dropped)
+   and the fork-authority reads stay fresh.
+3. *Minimal carrier* → `WriteTxn { branch, base }` (even the `session` from the original
+   sketch was dropped as unearned).
+
+**Deferred to step 5 (NOT in this PR):** session-aware write base opens. The one remaining
+open (#2) stays a HEAD open; warming the shared `Session` across writes is an object-store
+(S3) phenomenon invisible on local FS, so it earns its own `write_cost_s3.rs` gate in step 5,
+where `txn` becomes the non-optional publish carrier. No new concurrency test was needed here:
+#2 stays a HEAD open (no pinned+session base introduced), so the publisher CAS + #3 live-HEAD
+probe fences are unchanged (covered by the green `writes.rs`/`consistency.rs`).
+
+**Guardrails (don't regress):** schema validation is deliberately uncached for drift
+detection — collapse to 1 *per write*, never cache across writes on a long-lived handle
+(`lifecycle::long_lived_handle_rejects_schema_*`). The commit-time fresh read is OCC
+machinery, not redundancy. Keep all 3 strict-op guards. Keep fork-authority reads fresh.
+Pin the *correct* branch (server-bound-to-main writing a feature branch falls to a fresh
+open). A branch `rfc-013-step-3b-writetxn` exists off an earlier main; rebase onto the
+post-#297 main before starting.
+
+---
+
+## 5. Design A — the `PublishPlan` unification (step 5) = the convergent fix
+
+**This is the real fix for the bug class in §1c.** Collapse the four hand-rolled writers +
+the maintenance path into **one `publish(txn, plan)` authority** where the CAS + bounded
+retry is **unconditional and unbypassable** (no caller can "hold the queue → skip the CAS").
+Properties:
+- **One interpreter of the `HEAD vs manifest` delta** — and "is this my work?" is answered
+  by the plan/sidecar **identity**, not a re-derived comparison. The own-HEAD-drift bug, the
+  §6.5 writers, the write-path guard — all close *by construction*.
+- **Recovery = the same `PublishPlan` re-applied** — the crash-recovery interpreter and the
+  live interpreter become the same code (`iss-merge-recovery-partial-rollforward` gone).
+- Each `TableAction` commits by its **class** (§1b): `Rewrite` = maintenance (Lance rebase
+  + reopen/replan + monotonic fast-forward, **no epoch**); load/mutate = logical (strict OCC
+  + `writer_epoch`).
+
+**Why it composes with Lance MTT (don't over-build):**
+- The **unification itself is convergent** — when MTT lands, it slots *underneath* the same
+  authority; nothing wasted. Build this.
+- The **`writer_epoch`** is the one MTT-redundant piece (MTT's commit-handler lease subsumes
+  a cross-process fence). Build it *last and minimally*, gated on actually deploying
+  multi-writer topologies. Per the deny-list, don't reimplement what the substrate will own.
+
+**Sequencing judgment (this cycle's strongest signal):** the bug density here (this PR alone
+= 3 review rounds, all "a writer re-interprets the delta") means the current N-writers interim
+is high integrated-over-time liability. **Consider pulling the *convergent half* of step 5
+(the single authority + recovery-as-plan) forward — possibly ahead of 3b** — because it stops
+the bug class rather than patching instances. #297 + #254 are the *de-risking inputs*: they
+validate the maintenance-class and logical-class commit models in isolation first, so Design
+A implements a known spec rather than designing under refactor pressure. Do NOT build more
+substrate-shaped scaffolding (custom WAL / job queue / second coordination table) to paper
+over the window — strictly higher liability than either Design A or waiting for MTT.
+
+**Deeper-than-A (post-MTT or as Lance exposes uncommitted variants):** all-uncommitted-fragments
+ one manifest commit would shrink the A-before-B window itself, blocked today by Lance not
+exposing uncommitted variants for `compact_files` / `optimize_indices` / vector index (#6666
+open; delete #6658 shipped). Track, don't build yet.
+
+### 5.1 Step-5 design constraints inherited from the #295 spec review
+3b shipped a **minimal** `WriteTxn { branch, base }` (schema-once + open-collapse via
+eliminate/probe/thread) and **deferred** the full §4.1 opener-unification — the pinned-base
+opener, the shared-`Session` open, the write-local **handle cache**, and the strict-op
+conflict-timing move — to step 5. So the greptile-bot comments on the #295 *spec* were **moot
+for #298** (which built none of those constructs) but are **load-bearing constraints for step
+5** when it builds them. Bank them:
+1. **Handle cache must be `Send + Sync`** (`Mutex<HashMap<…, Dataset>>`, not `RefCell`) if
+   `WriteTxn::open(&self)` is shared across concurrent stage futures — a `RefCell` compiles
+   but panics when two stages poll. Or make it `&mut self` (no parallel-stage sharing). This
+   is the deny-list "in-process-only `Dataset` impls — `Send + Sync`" item.
+2. **The strict-op timing move needs an explicit retry contract.** If step 5 moves
+   strict-op conflict detection from open-time `ensure_expected_version` to commit-time CAS
+   (the §4.1 pinned-base design), it MUST specify: the txn is **discarded after any commit**
+   (success or conflict — the handle cache is commit-invalidated), and the retry **re-opens a
+   fresh `WriteTxn` at the new HEAD** (never re-stages against the stale pinned base — that
+   reproduces the lost-update). **This is the same retry/refresh contract as the stale-view
+   false-fail (§1d.2)** — the op-class-aware precondition + "fresh base on retry" are one
+   design point. Today (#298) strict ops keep open-at-HEAD + `ensure_expected_version`, so the
+   contract is unchanged; step 5 owns it the moment it pins strict reads to the base.
+3. **The opener-equivalence test must be non-trivial.** A differential test that only passes
+   when `HEAD == base` proves nothing about pinning. To actually prove "`WriteTxn::open`
+   returns the pinned base, not HEAD," the test must **advance the branch HEAD externally
+   (direct Lance write), then assert the txn open still reads the base version** — and that a
+   strict write then fails `ExpectedVersionMismatch` at commit (verifying the timing move).
+
+---
+
+## 6. Why #297 is still needed even if you do Design A
+- Design A **relocates** #297's maintenance-class commit logic into the authority's
+  `TableAction::Rewrite` path; it does not eliminate it. #297 is the *validated spec + tests*.
+- The two regression tests + §6.6 are the **contract** Design A must keep green.
+- The prod bug is **live**; Design A is the largest write-path change in the RFC. Don't hold a
+  correctness fix hostage to a big refactor, and don't do a big refactor under bug-fix urgency.
+- Genuinely throwaway under Design A: only the loop's *location* + the `head_advanced` proxy
+  (~a dozen lines). Everything else relocates or persists. **#297 LANDED.**
+
+---
+
+## 7. Open PRs and their relationships
+- **#297** — maintenance-class fix (optimize vs write). **LANDED** (origin/main `6d4606a8`);
+  step 3b stacks on it.
+- **#254** — logical-class fix (schema-apply vs optimize false-fail). Same op-class family;
+  both are de-risking inputs for Design A's per-class commit models.
+- **#296** — recovery roll-forward converges on concurrent manifest advance. The fix
+  for the flaky `iss-schema-apply-reopen-recovery-race`. It touches `recovery.rs` and is
+  *aligned* with #297's "postcondition is the state, not winning the CAS" principle. **#296
+  landed on main first and is merged into this branch:** the converge helper
+  (`converge_or_defer_roll_forward`) was reconciled with Phase-7's manifest-CAS roll-forward —
+  on convergence the audit references the winner's folded `graph_commit_id` (the current
+  `graph_head`), not a freshly minted one.
+- **#295** — the step-3b RFC doc (apply §4's three corrections to it).
+
+---
+
+## 8. Remaining RFC steps after 3b (RFC §9 is canonical)
+- **#298 follow-up (do on the 3b PR, before merge): the edge-`@card` stale-read regression**
+  (§1d.1). Restore the live-HEAD cardinality scan, add the deterministic regression test, fix
+  the wrong doc comment. Small, gate-safe, un-regresses an integrity check (invariant 9). The
+  residual concurrent TOCTOU is the §7.1 gap (step 4) — un-widen here, don't over-reach.
+- **Step 4 / Phase 7** (`iss-991`): lineage into `__manifest` (publish `graph_commit` +
+  mutable `graph_head:<branch>` in the same merge-insert; `_graph_commits` becomes a
+  projection). Removes the per-write `commit_graph.refresh`; closes the manifest→commit-graph
+  atomicity + commit-graph-parent-under-concurrency gaps. **Hard prereq: step 2 (done).**
+  Carries the §7.1 *concurrent* write-skew fix (needs the `graph_head` contention row) —
+  **frame §7.1 as "unify the entire write-validation read-set" (endpoint + cardinality +
+  cross-version uniqueness), not merely "add `graph_head`"** (§1d.1): the bespoke
+  `edge_cardinality_read_handle` and the mutation-vs-loader freshness fork dissolve into one
+  pinned read-set re-validated under the `graph_head` contention, or the liability survives as
+  a second special-case.
+- **Step 5 / Design A** — §5 above. **Acceptance item: the served-strict-write stale-view
+  false-fail** (§1d.2) — the op-class-aware precondition + `open_txn` probe. The contract is
+  two tests passing *together*: un-ignore
+  `writes.rs::served_strict_delete_after_external_optimize_advance_auto_refreshes` (goes green)
+  *while* `consistency::stale_handle_public_mutation_must_refresh_then_retry` stays green
+  (maintenance fast-forwards; logical fails loudly). Self-contained enough to ship standalone
+  like #297 if prod pain is acute; otherwise fold into the single PublishPlan delta-interpreter.
+- **Step 2b** — internal-table cleanup + the Q8 monotonic watermark (a Lance boundary tag).
+  Deferred: only the secondary version-count/space term, touches the read/open path, and is
+  MTT-redundant. Land when version-count cost bites.
+- **§7.1 sequential write-skew** (`iss-overwrite-orphans-committed-edges`) — inbound-RI
+  validation on node removal; independent, ships anytime.
+- **#20** — the prod per-write `storage.ops` span metric (RFC §5.3), still owed.
+- Branch ops: Lance `Clone` for create (`iss-691`).
+
+---
+
+## 9. Gotchas / traps (learned the hard way)
+- **In-process queue ≠ cross-process lock.** Any "I hold the queue → skip the retry/CAS"
+  reasoning is a bug across processes. This is the recurring trap.
+- **Monotonic publish must be `≥`-conditional, never "no assertion."** The `__manifest`
+  merge-insert is unconditional `UpdateAll` keyed on `object_id` (`publisher.rs:379`), so
+  the equality (or monotonic) pre-check is the *only* guard — dropping it lets `UpdateAll`
+  regress a newer version = lost write.
+- **The drift guard interprets an ambiguous delta.** Re-evaluating it in a retry over
+  self-mutated state is how #297's follow-on bug happened. Gate any HEAD-vs-manifest
+  interpretation on "have *we* committed yet."
+- **`compact_files` fires Lance's auto_cleanup GC hook** (commits with
+  `skip_auto_cleanup=false`, no override) — optimize strips stale `lance.auto_cleanup.*`
+  config before compacting to stay non-destructive on upgraded graphs. The strip is a
+  separate commit (relevant to the partial-commit retry trap).
+- **Lance rebases the common concurrent case for free** — so the data-table conflict usually
+  surfaces as the manifest fast-forward, not a Lance error. The Lance-Rewrite-overlap path is
+  rare and needs failpoint injection to test.
+
+---
+
+## 10. Verification (the gate)
+- `cargo test --workspace --locked` — the canonical gate (matches CI).
+- `cargo test -p omnigraph-engine --features failpoints --test failpoints optimize` —
+  the optimize concurrency/recovery tests.
+- `cargo test -p omnigraph-engine --test write_cost` / `write_cost_s3` (bucket-gated) —
+  cost gates (3b adds the schema-once + open-count asserts here).
+- `cargo test -p omnigraph-engine --test maintenance` — optimize/repair/cleanup.
+- Re-read [`invariants.md`](invariants.md), [`lance.md`](lance.md), [`testing.md`](testing.md)
+  before each change (always-on requirement).
+
+Lance source for re-validation:
+`/Users/ragnor/.cargo/registry/src/index.crates.io-*/lance-7.0.0` (key files: `io/commit.rs`,
+`io/commit/conflict_resolver.rs`, `dataset/optimize.rs`, `dataset/write/retry.rs`,
+`dataset/builder.rs`).
--- a/docs/dev/index.md
+++ b/docs/dev/index.md
@ -93,6 +93,7 @@ Working documents for in-flight feature work. Removed when the work lands.
 | CLI refactoring — one addressing & config model post-`omnigraph.yaml`: scope + `--graph` + derived access path, served-default / privileged-direct, profiles, named queries, capability classifier (completes RFC-008) | [rfc-011-cli-refactoring.md](rfc-011-cli-refactoring.md) |
 | Provider-independent embedding configuration — one resolved `EmbeddingConfig` + sealed provider enum (Gemini/OpenAI/Mock), identity recorded in the schema IR, query-time same-space validation, NFR floor | [rfc-012-embedding-provider-config.md](rfc-012-embedding-provider-config.md) |
 | Write-path latency — capture-once `WriteTxn`, version-pinned opens, one `GraphPublishAuthority` fed declarative `PublishPlan`s, manifest-authoritative lineage, epoch fence, bounded history (compaction + cleanup), and an IO-counted cost contract (`iss-write-s3-roundtrip-amplification`, `iss-991`) | [rfc-013-write-path-latency.md](rfc-013-write-path-latency.md) |
+| RFC-013 handoff — current-state map, latest validation, and concrete next actions for finishing write-path latency and correctness work | [handoff-rfc-013-write-path.md](handoff-rfc-013-write-path.md) |

 ## Boundary

--- a/docs/dev/invariants.md
+++ b/docs/dev/invariants.md
@ -211,10 +211,21 @@ them explicit.
  sweep has the same exposure, and always has): it may roll a live foreign
  writer's sidecar forward, which degrades to publisher-CAS contention for
  data writes but can race the schema-staging promotion for a foreign live
-  schema apply. Multi-process writers on one graph are already documented
-  one-winner-CAS territory; closing this fully needs a cross-process
-  serialization primitive (e.g. lease-based use of the schema-apply lock
-  branch) — design it before promoting multi-process write topologies.
+  schema apply. The roll-**forward** CAS contention is now
+  convergence-idempotent: when the publish loses the CAS to a concurrent
+  writer that already reached the sidecar's goal, the sweep treats it as
+  convergence (record the `RolledForward` audit + delete) rather than a fatal
+  `ExpectedVersionMismatch`, and defers when the manifest is only partway
+  (`converge_or_defer_roll_forward` in `db/manifest/recovery.rs`;
+  iss-schema-apply-reopen-recovery-race). So a concurrent advance no longer
+  fails the open. The schema-staging promotion race and the destructive
+  roll-**back** path (Lance `Restore` "trumps" a concurrent commit, so it
+  cannot be made idempotent — iss-recovery-sweep-live-writer-rollback) still
+  need the cross-process primitive. Multi-process writers on one graph are
+  already documented one-winner-CAS territory; closing this fully needs a
+  cross-process serialization primitive (e.g. lease-based use of the
+  schema-apply lock branch) — design it before promoting multi-process write
+  topologies.
 - **Fork reclaim is in-process-safe only:** the first write to a table on a
  branch forks it (a Lance `create_branch` that advances state before the
  manifest publish). An interrupted fork (crash, or a cancelled request
@ -242,20 +253,43 @@ them explicit.
  acknowledged-before-visible bug this branch fixed. Close it (local CAS
  primitive, or a trait-level lock requirement) before admitting any
  lock-free `if_match` caller.
- **Manifest→commit-graph publish atomicity:** a graph commit advances
-  `__manifest` (the visibility authority) and then appends `_graph_commits` as
-  two separate writes (`commit_updates_with_actor_with_expected`, failpoint
-  `graph_publish.before_commit_append`). A crash between them leaves the manifest
-  at version N with no commit-graph row for N. Live reads and durability are
-  unaffected — the live version resolves via the manifest
-  (`GraphCoordinator::version()`), not the commit-graph head — and the open-time
-  recovery sweep does NOT repair it (`lance_head == manifest_pinned` classifies
-  `NoMovement`; a recovery sidecar would not change this). Impact is bounded to
-  commit history: `commit list` misses N, time-travel by commit id to N fails,
-  and merge-base loses a node (a likely-benign off-by-one re-merge). This affects
-  every publish, not a specific maintenance command. Eventual fix: make the
-  commit graph reconcilable from the manifest (or the two writes atomic) — not a
-  recovery-sidecar concern.
+- **Manifest→commit-graph publish atomicity — CLOSED (RFC-013 Phase 7):** graph
+  lineage now lives ONLY in `__manifest`, as `graph_commit` + `graph_head:<branch>`
+  rows written in the SAME `MergeInsertBuilder` commit as the table-version rows
+  (`commit_changes_with_lineage` → `GraphNamespacePublisher::publish` with a
+  `LineageIntent`). There is no second write to fail between — a graph commit and
+  its lineage land at one manifest version atomically, so a crash after the publish
+  leaves no gap. The commit-graph cache is a derived projection of those manifest
+  rows; nothing writes `_graph_commits.lance` (it persists only to carry branch
+  refs). The prior two-write gap (manifest at N with no `_graph_commits` row for N)
+  is gone by construction. A graph created before Phase 7 (internal schema v3)
+  carries its lineage only in `_graph_commits.lance`; the `migrate_v3_to_v4`
+  internal-schema step (`db/manifest/migrations.rs`) backfills it into `__manifest`
+  per-branch on the first read-write open (idempotent, crash-safe, data-preserving),
+  and a read-only open of an un-migrated v3 graph sources the DAG from
+  `_graph_commits.lance` via a stamp-gated transitional fallback so reads stay
+  correct until the first write migrates it. An old binary refuses a v4-stamped
+  graph (read-write and read-only) with the standard upgrade error. The migration
+  is **loud on failure and concurrent-runner idempotent**: the legacy-open read
+  (`read_legacy_commit_cache`) treats only a genuine not-found as "no legacy data"
+  and propagates any other open error (so a transient/corrupt open can never stamp
+  v4 over an empty backfill — orphaning lineage permanently), and the backfill
+  converges all-or-nothing when two runners open the same legacy graph at once — a
+  bounded re-open retry on the `graph_head:<branch>` row-level CAS plus an
+  idempotent terminal stamp bump (both runners write the same value, so a concurrent
+  `UpdateConfig`/`IncompatibleTransaction` loss re-opens and no-ops if the stamp
+  already landed). The branch read path (`load_commit_cache_for_branch`) also
+  refuses an out-of-range branch stamp (`> CURRENT` or `< MIN_SUPPORTED`;
+  defense-in-depth; not a live hole because migrations run main-first, so main
+  refuses first). The migration chain is **floor-bounded**:
+  `MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION` (migrations.rs; 1 today, a pure no-op) is
+  the oldest stamp this binary opens, enforced symmetrically with the ceiling by the
+  single `refuse_if_stamp_unsupported` guard at all three stamp-read sites
+  (write-path migrate, read-only open, branch lineage-read). Raising MIN sheds the
+  now-dead `migrate_vN_…` arms and (at MIN ≥ 4) the `commit_graph_legacy_v3` legacy
+  readers; a compile-time tripwire (`LOWEST_REGISTERED_MIGRATION_SOURCE`) fails the
+  build if the floor and the lowest registered arm drift. Retirement runbook lives on
+  the `MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION` doc-comment.
 - **Planner capability/stat surfaces:** cost-aware planning, complete
  capability advertisement, and explain-with-cost are roadmap. Do not describe
  them as implemented.
@ -291,19 +325,23 @@ them explicit.
  in history; but they are not yet brought into `cleanup` (version GC), so the
  `_versions/` chain still grows until an explicit cleanup (the cleanup half is
  deferred — it needs the Q8 cleanup-resurrection watermark first). The commit
-  graph is not yet reconcilable from the manifest; and the traversal id-map is
+  graph IS now reconcilable from the manifest (RFC-013 Phase 7 — it is a pure
+  projection of the `graph_commit`/`graph_head` rows); the traversal id-map is
  still rebuilt.
- **Commit-graph parent under concurrency:** `record_graph_commit` now refreshes
-  the commit-graph head from storage before appending, so a same-branch write
-  after an external commit no longer forks the commit DAG by parenting off a
-  stale cached head (the single-process fork, pre-existing for non-strict
-  inserts and widened to strict ops by Fix 1's `refresh_manifest_only`, is now
-  closed). Residual: two processes writing disjoint tables can still pass their
-  per-table manifest CAS and append off the same parent (a refresh-then-append
-  TOCTOU). The convergent fix is reconcile-from-manifest (parent = the commit at
-  the manifest version the publisher CAS'd against; `manifest_version` is on
-  every commit row), composing with the manifest-to-commit-graph atomicity gap;
-  it needs commit-graph append ordering or a Lance append-CAS to fully close.
+- **Commit-graph parent under concurrency — CLOSED (RFC-013 Phase 7):** the graph
+  commit is now recorded in the manifest publish CAS, and the publisher resolves
+  the new commit's parent INSIDE its retry loop, per attempt, from the just-loaded
+  `__manifest` (the `should_replace_head` winner over the visible `graph_commit`
+  rows). A CAS-conflict retry re-reads the advanced head and parents correctly, so
+  the refresh-then-append TOCTOU is gone. Two processes writing disjoint tables on
+  the same branch now also contend on the shared `graph_head:<branch>` row (one
+  `object_id`, `WhenMatched::UpdateAll`): one wins, the other retries and re-parents
+  — so the cross-process disjoint-table fork is closed too. This is the intended
+  §7.1 contention point, pinned by
+  `manifest::tests::concurrent_disjoint_writes_share_head_and_form_linear_chain`
+  (two disjoint writers → both commit, single linear chain) and
+  `manifest::tests::n_concurrent_disjoint_writers_converge_to_one_linear_chain`
+  (N=8 disjoint writers with app-level retry → one linear chain of 8, no fork).

 ## Deny-list

--- a/docs/dev/lance.md
+++ b/docs/dev/lance.md
@ -170,6 +170,7 @@ Migration from Lance 6.0.1 → 7.0.0 landed in this cycle. **Arrow stayed 58, Da
 - **Native `DirectoryNamespace` no longer recognizes omnigraph's manifest-tracked tables** (`lance-namespace-impls` dir.rs ~L1310): `list/describe/create_table_version` route through `check_table_status`, which reports an omnigraph table absent → `TableNotFound`. The decoupling is *contingent on omnigraph's legacy boolean PK key*, not an unconditional v7 property: v7's namespace eagerly adds the new `lance-schema:unenforced-primary-key:position` key to any `__manifest` lacking it; that write hits the immutable-PK rule above (the boolean key already set the PK), so `ensure_manifest_table_up_to_date` errors and the namespace silently falls back to directory listing. omnigraph keeps the boolean key deliberately — Lance honors it permanently (maps to PK position 0), and one uniform on-disk format beats a new-vs-old split (existing graphs can't be re-keyed to the position key under that same immutability rule). omnigraph production never uses Lance's native namespace (its publisher writes `__manifest` directly via merge_insert; its own `namespace.rs` impls are custom), so this is test-only — the `test_directory_namespace_direct_publish_cannot_replace_native_omnigraph_write_path` surface guard was realigned to the v7 behavior (it now asserts the native namespace is fully decoupled, which only strengthens the guard's thesis).
 - **Still NOT fixed in 7.0.0:** vector-index two-phase (Lance #6666 open) — `create_vector_index` inline residual retained; blob-column compaction — `compact_files_still_fails_on_blob_columns` guard still red on a fix, `optimize` still skips blob tables behind `LANCE_SUPPORTS_BLOB_COMPACTION`.
 - **No Lance API surface omnigraph uses changed at *compile* time** (the only compile break was object_store) — but **two runtime behaviors did** (the unenforced-PK immutability and the native-namespace `TableNotFound`, above), each caught by the full engine test suite rather than the build. `CleanupPolicy`, `WriteParams` (apart from the `auto_cleanup` default), `CompactionOptions`, the namespace models (resolved via `lance-namespace-reqwest-client` 0.7.7, unchanged across the bump), `Operation`, `ManifestLocation`, and `MergeInsertBuilder` shapes are all stable. Lesson: a clean build is not a clean alignment — run `cargo test --workspace` before declaring a Lance bump done.
+- **Two surface guards added by the v3→v4 migration-robustness follow-up** (not a Lance bump, but they pin Lance error surfaces the migration now classifies on): `dataset_open_missing_returns_not_found_variant` (a missing `Dataset::open` returns `DatasetNotFound`/`NotFound` — the legacy-open read in `db/commit_graph.rs::read_legacy_commit_cache` treats only those as "no legacy data" and propagates everything else) and `lance_error_incompatible_transaction_variant_exists` (a concurrent `UpdateConfig` stamp-bump loses with `IncompatibleTransaction` — `db/manifest/migrations.rs::commit_v4_stamp_idempotently` matches it to retry the benign same-value race). Re-run on a Lance bump like the others.

 Bump this date stanza on the next alignment pass.

--- a/docs/dev/rfc-013-write-path-latency.md
+++ b/docs/dev/rfc-013-write-path-latency.md
@ -523,7 +523,10 @@ struct WriteTxn {
    branch: BranchRef,
    base: PinnedSnapshot,   // {manifest_version, per-table (loc,version,e_tag), schema_hash, writer_epoch}
    session: Arc<Session>,  // shared per-graph; warms metadata/index caches across opens
-    handles: HandleCache,   // open-by-version; each table opened once, reused across stages
+    handles: HandleMap,     // open the base once WITH session; thread the handle each
+                            // commit RETURNS forward (HEAD walks N→N+1→N+2). NOT a
+                            // version-keyed cache — HEAD moves, so a (table,version) key
+                            // misses; reuse = forward the commit-return handle. [3b-validated]
 }

 // A typed, declarative publish plan — the COMPLETE "what", built before any HEAD moves.
@ -546,8 +549,17 @@ impl GraphPublishAuthority {

 Properties that make it optimal:

- **Stages take `&WriteTxn`/`&PublishPlan`, never storage** — re-resolution and
-  open-latest are *unrepresentable*. Invariants 2/3/15 hold by construction.
+- **Stages take `&WriteTxn`/`&PublishPlan` for the BASE** — re-resolving the pinned
+  read base / open-latest for the pre-commit phase is unrepresentable; invariants 2/3/15
+  hold for the base by construction. **Caveat [3b-validated]:** this is NOT "no
+  re-resolution anywhere." Three commit-boundary reads are irreducible correctness
+  machinery and MUST stay fresh: the commit-time `fresh_snapshot_for_branch` (cross-process
+  OCC), the live-HEAD drift probe (a concurrent writer may have moved HEAD since staging),
+  and the fork-authority reads (`classify_fork_ref` deliberately bypasses the cached base —
+  a pinned base there re-opens the "force-delete a live fork" bug). Model "pinned base for
+  the pre-commit phase + named fresh re-reads at the commit/fork boundary." The achievable
+  open count is **1 base open (with session) + 1 cheap `latest_version_id` probe + threaded
+  commit handles**, not literally one open.
 - **The recovery sidecar *is* the serialized `PublishPlan`.** Phase C and
  recovery both call `plan.apply()` — a merge that bumps tables A+B can never
  roll A forward and silently drop B. The
--- a/docs/dev/testing.md
+++ b/docs/dev/testing.md
@ -47,7 +47,7 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
 | `validators.rs` | Schema constraint enforcement (enum, range, unique, cardinality) across JSONL, insert, update paths |
 | `policy_engine_chassis.rs` | Engine-layer Cedar enforcement (MR-722): allow + deny through every `_as` writer via the SDK directly — no HTTP — proving embedded and CLI callers hit the same gate as the server, with action × scope shapes matching `authorize_request` |
 | `maintenance.rs` | `optimize` (compaction), `repair` (explicit uncovered-drift publish), and `cleanup` (version GC): empty/idempotent/no-op edges, policy validation, head preservation; `optimize` publishes its own compaction (`optimize_publishes_compaction_to_manifest_so_schema_apply_succeeds`), skips pre-existing uncovered drift (`optimize_skips_preexisting_manifest_head_drift`), and refuses to run while a `__recovery` sidecar is pending (`optimize_defers_when_recovery_sidecar_is_pending`); `repair` previews/heals verified maintenance drift, refuses raw semantic drift without `--force`, and forced repair publishes only by explicit operator choice; the index reconciler (iss-848): `index_build_tolerates_null_vector_rows` (an untrainable Vector column defers instead of aborting the build, sibling indexes still build) and `optimize_materializes_index_declared_but_unbuilt` (optimize creates a declared-but-deferred index) |
-| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the five per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`, `optimize_phase_b_failure_recovered_on_next_open`) and the write-entry in-process heal contract (the four `*_after_finalize_publisher_failure_heals_without_reopen` tests — load, mutation, schema apply, branch merge: a follow-up write on the same handle rolls a sidecar-covered residual forward without reopen/refresh) and the storage-fault matrix for the sidecar lifecycle (`recovery.sidecar_{write,delete,list}` / `recovery.record_audit` failpoints: Phase A put failure aborts with zero drift, Phase D delete failure is swallowed and healed by the next write, list failures are loud at heal and open, audit-append failures are retried to exactly one audit row; plus the bucket-gated `s3_load_recovers_after_publisher_failure_without_reopen`). |
+| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the five per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`, `optimize_phase_b_failure_recovered_on_next_open`) and the write-entry in-process heal contract (the four `*_after_finalize_publisher_failure_heals_without_reopen` tests — load, mutation, schema apply, branch merge: a follow-up write on the same handle rolls a sidecar-covered residual forward without reopen/refresh) and the storage-fault matrix for the sidecar lifecycle (`recovery.sidecar_{write,delete,list}` / `recovery.record_audit` failpoints: Phase A put failure aborts with zero drift, Phase D delete failure is swallowed and healed by the next write, list failures are loud at heal and open, audit-append failures are retried to exactly one audit row; plus the bucket-gated `s3_load_recovers_after_publisher_failure_without_reopen`). Also the v3→v4 migration fault-injection test (`transient_legacy_open_failure_aborts_migration_without_stamping_v4`, `migration.v3_to_v4.legacy_open` failpoint): a transient legacy-open failure aborts the migration loudly and leaves it retryable (stamp stays v3, no partial backfill), never stamping v4 over an empty backfill. Also the v4 stamp-bump exhaustion regression (`v4_stamp_exhaustion_returns_retryable_contention`, `migration.v4_stamp.force_incompatible` failpoint): the stamp retry loop surfaces a retryable `RowLevelCasContention` on exhaustion, not a stringified `Lance`. And the convergence-idempotent roll-forward regression (`open_sweep_roll_forward_converges_when_manifest_advances_concurrently`: two concurrent open-sweeps race one sidecar at the `recovery.before_roll_forward_publish` rendezvous; the CAS loser must converge, not fail the open — iss-schema-apply-reopen-recovery-race). |
 | `recovery.rs` | Open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via `ManifestBatchPublisher::publish`, roll-back via `Dataset::restore`, audit row in `_graph_commit_recoveries.lance`, `OpenMode::ReadOnly` skip path |
 | `composite_flow.rs` | Compositional/narrative end-to-end stories — multi-step flows that compose mechanics covered by other test files. Catches integration regressions where individual operations all pass their unit tests but their composition breaks (sequential merges, post-merge main writes, time-travel through merge DAG, reopen consistency over multi-merge histories, post-optimize and post-cleanup strict writes). |

@ -65,10 +65,12 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav

 ## Failpoints (fault injection)

- Cargo feature: `failpoints = ["dep:fail", "fail/failpoints"]` (in `crates/omnigraph/Cargo.toml` **and** `crates/omnigraph-cluster/Cargo.toml`; the cluster feature does not enable the engine's).
- Wrappers: `crates/omnigraph/src/failpoints.rs` and `crates/omnigraph-cluster/src/failpoints.rs` expose `maybe_fail("name")` and `ScopedFailPoint` for tests.
- Call sites are inserted at sensitive transaction boundaries (branch create, graph publish commit, cluster apply's payload→state-write window, etc.).
- Activated tests: `crates/omnigraph/tests/failpoints.rs` and `crates/omnigraph-cluster/tests/failpoints.rs` (crash-mid-apply + state CAS race via `fail::cfg_callback`; integration binaries, never in-source — the fail registry is process-global). Run with `cargo test -p omnigraph-engine --features failpoints --test failpoints` / `cargo test -p omnigraph-cluster --features failpoints --test failpoints`.
+- Cargo feature: `failpoints = ["dep:fail", "fail/failpoints"]` in `crates/omnigraph/Cargo.toml`; the cluster's `failpoints` feature additionally enables `omnigraph/failpoints` (`crates/omnigraph-cluster/Cargo.toml`), so the shared test guard is available to cluster tests.
+- Wrappers: `crates/omnigraph/src/failpoints.rs` and `crates/omnigraph-cluster/src/failpoints.rs` each expose `maybe_fail("name")` (per-crate error type). The test-side config guard `ScopedFailPoint` (`new` for action strings, `with_callback` for callbacks; RAII `Drop` removes the point) lives **once** in the engine and is reused by both test binaries.
+- **Names are compile-checked.** Every failpoint name is a `pub const` in `omnigraph::failpoints::names` (engine) / `omnigraph_cluster::failpoints::names` (cluster). Call sites and tests reference the constant, never a bare literal — a typo is a compile error, not a silently-never-firing point. Add a new failpoint by adding its const first.
+- Call sites are inserted at sensitive transaction boundaries (branch create, graph publish commit, the recovery sweep's classify→roll-forward-publish window, cluster apply's payload→state-write window, etc.).
+- **Serialize and rendezvous, never sleep.** The `fail` registry is process-global, so every failpoint test carries `#[serial]` (`serial_test`). For concurrent tests, use `helpers::failpoint::Rendezvous` (`tests/helpers/failpoint.rs`): `park_first(name)` parks the first thread to hit the point until `release()`, and `wait_until_reached().await` blocks on that condition (it doubles as a fired-assertion). Do not coordinate threads with fixed `sleep`s.
+- Activated tests: `crates/omnigraph/tests/failpoints.rs` and `crates/omnigraph-cluster/tests/failpoints.rs` (integration binaries, never in-source — the fail registry is process-global). Run with `cargo test -p omnigraph-engine --features failpoints --test failpoints` / `cargo test -p omnigraph-cluster --features failpoints --test failpoints`.

 ## RustFS / S3 integration

--- a/docs/dev/writes.md
+++ b/docs/dev/writes.md
@ -230,8 +230,9 @@ recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs`:
  rolled-back-to version (`manifest_pinned`); the manifest is published at the
  restore commit (`manifest_pinned + 1`, same content).
 - After a successful roll-forward or roll-back, an audit row is
-  recorded — `_graph_commits.lance` carries
-  a commit tagged `actor_id = "omnigraph:recovery"`, and a sibling
+  recorded — the graph commit lineage (the `graph_commit` rows in `__manifest`
+  since RFC-013 Phase 7) carries a commit tagged
+  `actor_id = "omnigraph:recovery"`, and a sibling
  `_graph_commit_recoveries.lance` row carries `recovery_kind`,
  `recovery_for_actor` (the original sidecar's actor), `operation_id`,
  per-table outcomes. Operators run `omnigraph commit list --filter
@ -336,20 +337,40 @@ actual }`. The HTTP server maps this to **409 Conflict** with body

 ## Audit

-`actor_id` lands in `_graph_commits.lance` via `record_graph_commit` (no
-intermediate run record). Audit history is queried via `omnigraph commit
-list`.
+`actor_id` lands in the graph commit lineage — the `graph_commit` rows in
+`__manifest`, written in the publish CAS (RFC-013 Phase 7; previously
+`_graph_commits.lance`). Audit history is queried via `omnigraph commit list`.

 ## Migration code

-`db/manifest/migrations.rs` carries the v2→v3 internal-schema step (MR-770):
-a one-time sweep that deletes legacy `__run__*` staging branches off
-`__manifest`. It runs in `Omnigraph::open(ReadWrite)` (via
-`manifest::migrate_on_open`, before the coordinator reads branch state) and
-again on the publisher's write path; both are idempotent once the stamp is at
-v3. Deleting the inert `_graph_runs.lance` / `_graph_run_actors.lance` dataset
-*bytes* is still deferred — it needs a `StorageAdapter::delete_prefix`
-primitive — but those bytes are invisible to graph-level state.
+`db/manifest/migrations.rs` is the single place on-disk `__manifest` shape is
+reconciled with what the binary expects, stepping the
+`omnigraph:internal_schema_version` stamp forward one `match`-arm at a time. It
+runs in `Omnigraph::open(ReadWrite)` (via `manifest::migrate_on_open`, before the
+coordinator reads branch state) and again on the publisher's write path, so each
+branch migrates on its first write; every step is idempotent under crash-retry
+(work first, stamp bump last).
+
+- **v2→v3** (MR-770): a one-time sweep that deletes legacy `__run__*` staging
+  branches off `__manifest`. Deleting the inert `_graph_runs.lance` /
+  `_graph_run_actors.lance` dataset *bytes* is still deferred — it needs a
+  `StorageAdapter::delete_prefix` primitive — but those bytes are invisible to
+  graph-level state.
+- **v3→v4** (RFC-013 Phase 7, `migrate_v3_to_v4`): backfills the graph lineage
+  from `_graph_commits.lance` into `__manifest` as `graph_commit` / `graph_head`
+  rows. A graph created before Phase 7 has its lineage only in
+  `_graph_commits.lance`; the new binary reads lineage from the `__manifest`
+  projection, so without this backfill it would see an empty commit DAG. The
+  backfill is per-branch (each branch migrates on its first write), idempotent
+  (keyed on `object_id`; a fast-path guard skips when `__manifest` already
+  carries `graph_commit` rows), and writes exactly one `graph_head:<branch>` row
+  for the actual head. `_graph_commits.lance` is left in place as the branch-ref
+  carrier — no commit row is written to it again. While a graph is below v4, a
+  **read-only** open (which never writes, so never migrates) sources the commit
+  DAG from `_graph_commits.lance` via the stamp-gated transitional fallback in
+  `CommitGraph::open*`, so reads see correct history before the first write
+  migrates the graph. An old binary opening a v4-stamped graph is refused with an
+  "upgrade omnigraph" error in both read-write and read-only modes.

 ## Mid-query partial failure: closed by MR-794

--- a/docs/releases/v0.7.2.md
+++ b/docs/releases/v0.7.2.md
@ -0,0 +1,60 @@
+# Omnigraph v0.7.2
+
+A patch release over v0.7.1: write-path latency reductions plus three
+correctness fixes on the maintenance and recovery paths. No breaking changes, no
+on-disk format change, and no migration — drop-in over v0.7.1.
+
+## Performance
+
+- **Write opens go direct, schema validates once (#288, #298).** Write opens
+  used to route through the per-table Lance namespace catalog, which re-opened
+  the dataset just to read its location and re-resolved the latest version on
+  every table open — an O(commit-depth) double resolution that dominated write
+  latency on object stores (~70%). Writes now open each touched data table
+  directly by its manifest-recorded location (Lance's O(1) version-hint path),
+  validate the schema contract once per write instead of ~4×, and open each
+  touched table once instead of 4×.
+
+- **`optimize` compacts the internal metadata tables (#291).** `optimize`
+  previously iterated only node/edge tables, so the internal `__manifest`,
+  `_graph_commits`, and `_graph_commit_actors` tables accumulated one fragment
+  per commit and were never compacted — making every write's metadata scan grow
+  with commit history. `optimize` now compacts all three, so a periodically
+  optimized long-lived graph keeps its per-write metadata scan flat in history.
+
+## Fixes
+
+- **`optimize` survives a cross-process write race (#297).** A CLI `optimize`
+  racing a served write on the same table could fail: the in-process write queue
+  doesn't serialize across processes, so a concurrent insert/delete advancing the
+  manifest between optimize's compaction and its publish broke the strict
+  equality CAS. Optimize now reopens-and-replans on a genuine Lance conflict and
+  fast-forwards its publish monotonically, so a maintenance compaction never
+  fails a live write. Bounded retry; sustained contention surfaces a loud
+  conflict rather than dropping work.
+
+- **`optimize` is non-destructive on upgraded graphs (#291).** A graph created by
+  a pre-0.7.0 binary carries an on-by-default Lance auto-cleanup config; under it,
+  optimize's compaction commit could fire Lance's version-GC hook and prune
+  `__manifest`-pinned versions (breaking snapshots and time travel). Optimize now
+  strips any stale `lance.auto_cleanup.*` config off every table — data and
+  internal — before its HEAD-advancing commits, so compaction can never GC pinned
+  versions.
+
+- **Recovery converges instead of failing `open` under a concurrent manifest
+  advance (#296).** The open-time recovery sweep published its roll-forward at the
+  sidecar's pinned expected version; if another writer advanced the manifest
+  during the classify→publish window, the CAS failed and aborted the whole
+  `Omnigraph::open`. The sweep now treats roll-forward as "the manifest reflects
+  the sidecar's committed state," not "this sweep won the CAS": on a CAS loss it
+  re-reads the live manifest and, when the sidecar's intent is already satisfied,
+  records the recovery and deletes the sidecar idempotently — so a concurrent
+  advance no longer fails the open. (The destructive roll-back twin still defers
+  to a cross-process lease, as documented.)
+
+## Upgrade notes
+
+Drop-in over v0.7.1 — no configuration, schema, or data changes. Upgrade the
+server and CLI together as usual. Graphs created on v0.7.1 read and write
+identically on v0.7.2; the optimize non-destructive fix additionally protects
+graphs created by pre-0.7.0 binaries from version GC during compaction.
--- a/docs/releases/v0.8.0.md
+++ b/docs/releases/v0.8.0.md
@ -1,16 +1,23 @@
 # Omnigraph v0.8.0

-v0.8.0 makes every served graph an **MCP (Model Context Protocol) server**. An
-MCP-capable agent — Claude Code/Desktop, Cursor, the OpenAI Responses `mcp` tool,
-and others — can connect to a graph and operate it directly: run reads and
-mutations, load data, manage branches, browse commits, read the schema, and
-invoke the graph's curated stored queries. The surface adds no new capability and
-no new business logic; every tool delegates to the same engine/handler path the
-REST routes use and is gated by the same Cedar policy.
+v0.8.0 has two headline changes:

-## Highlights
+1. **Every served graph becomes an MCP (Model Context Protocol) server** — an
+   MCP-capable agent (Claude Code/Desktop, Cursor, the OpenAI Responses `mcp`
+   tool, and others) can connect to a graph and operate it directly. The surface
+   adds no new capability and no new business logic; every tool delegates to the
+   same engine/handler path the REST routes use and is gated by the same Cedar
+   policy. It is **additive**.
+2. **Graph commit lineage moves into `__manifest`** (RFC-013 Phase 7), folded
+   into the publish CAS, via a one-time on-disk migration (internal schema
+   **v3 → v4**). This is the first internal-schema change since v0.4.0 and carries
+   an **upgrade-order requirement** — read the upgrade notes before rolling it out.

-### MCP surface (`POST /graphs/{id}/mcp`)
+## MCP surface (`POST /graphs/{id}/mcp`)
+
+An MCP-capable agent can connect to a graph and run reads and mutations, load
+data, manage branches, browse commits, read the schema, and invoke the graph's
+curated stored queries.

 - **One MCP endpoint per served graph**, mounted automatically by the cluster
  server — no separate flag. It is a stateless Streamable-HTTP transport: a
@ -78,8 +85,56 @@ carried in the query source:
  unsupported version is a `400`); `initialize` negotiates the version in its
  body and is exempt by design.

+## Graph lineage now lives in `__manifest` (internal schema v4)
+
+The graph commit DAG (commits, parents, merge parents, per-branch heads, and the
+authoring actor) is now stored in `__manifest` as `graph_commit` / `graph_head`
+rows, written in the **same commit (CAS)** as the table-version rows of a graph
+publish. Previously the lineage lived in a separate `_graph_commits.lance`
+dataset written after the manifest commit, leaving a narrow window where a crash
+could land a manifest version with no matching lineage row. Folding the lineage
+into the publish closes that gap by construction: a graph commit and its lineage
+now land atomically at one manifest version. The in-memory commit graph is a
+projection of those manifest rows; `_graph_commits.lance` is retained only as a
+carrier for Lance branch refs and no longer receives commit rows.
+
+This bumps the `__manifest` internal schema stamp from **v3 to v4**.
+
+### Existing graphs migrate seamlessly on first write
+
+A graph created by an earlier binary (internal schema v3) keeps its lineage in
+`_graph_commits.lance` with none in `__manifest`. On the **first read-write
+open**, Omnigraph backfills that lineage into `__manifest` (the `migrate_v3_to_v4`
+internal-schema step) and bumps the stamp to v4. The migration:
+
+- is **per-branch** — each branch backfills on its first write;
+- is **idempotent and crash-safe** — the stamp bump is the last step, and the
+  backfill is keyed on the commit id, so a crash mid-migration re-runs harmlessly
+  on the next open;
+- **preserves all data** — every commit, parent, merge parent, actor, and head is
+  carried over; commit ids are stable, so existing references still resolve.
+
+No data is lost and no operator action is required beyond upgrading the binary.
+
+Before its first write migrates the graph, a **read-only** open of a v3 graph
+(e.g. `omnigraph commit list`, NDJSON export) still reads correct history via a
+transitional fallback that sources the commit DAG from `_graph_commits.lance` —
+read-only opens never write, so they never migrate, but they never show an empty
+history either.
+
 ## Upgrade notes

+- **Breaking: internal schema v4 — upgrade writer (and reader) binaries first.**
+  Internal schema v4 is a hard version gate. Once a graph has been opened for
+  write by a v0.8.0 binary, its `__manifest` is stamped v4, and an **older binary
+  will refuse to open it** — read-write *and* read-only — with an
+  `upgrade omnigraph before opening this graph` error rather than silently
+  misreading the new lineage. This is the standard forward-version protection
+  (same shape as the v1→v2 / v2→v3 steps), now enforced on the read-only path
+  too. Upgrade every writer (and reader) binary that touches a graph to v0.8.0
+  before, or together with, the first write under the new version. A mixed fleet
+  where an old binary still writes the same graph is unsupported, as with any
+  internal-schema bump.
 - **`GET /graphs/{id}/queries` is now `invoke_query`-gated (was `read`).** The
  stored-query catalog uses the same authority as invocation and the MCP
  `tools/list` surface, so discovery and invocation agree ("see the menu iff you
@ -87,8 +142,9 @@ carried in the query source:
  `403` instead of a listing; in default-deny mode the endpoint returns `403`
  until an `invoke_query` rule is configured. This is the one observable REST
  behavior change in this release.
- Otherwise no breaking changes: the rest of the REST surface, CLI, cluster
-  config, and on-disk format are unchanged. The MCP endpoint is additive.
+- **The MCP endpoint is additive.** Apart from the `GET /queries` gate change and
+  the v4 on-disk migration above, the REST surface, CLI, and cluster config are
+  unchanged.
 - **Pointing an agent at a graph:** configure your MCP client with the URL
  `https://<host>/graphs/<id>/mcp` and the same bearer token you use for REST.
  See [docs/user/operations/mcp.md](../user/operations/mcp.md) for the connect
--- a/docs/user/concepts/storage.md
+++ b/docs/user/concepts/storage.md
@ -20,13 +20,14 @@ OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordin
 - **Layout**:
  - `nodes/{fnv1a64-hex(type_name)}` — one Lance dataset per node type
  - `edges/{fnv1a64-hex(edge_type_name)}` — one Lance dataset per edge type
-  - `__manifest/` — the catalog of all sub-tables and their published versions
-  - `_graph_commits.lance` / `_graph_commit_actors.lance` — the commit graph and its actor map
+  - `__manifest/` — the catalog of all sub-tables and their published versions, **and** the graph commit lineage (RFC-013 Phase 7)
+  - `_graph_commits.lance` / `_graph_commit_actors.lance` — legacy / branch-ref carriers. Since RFC-013 Phase 7 the graph lineage lives in `__manifest` (`graph_commit` / `graph_head` rows, written in the publish CAS); `_graph_commits.lance` no longer receives commit rows, but is retained to carry the Lance branch refs that `create_branch` / `list_branches` / the `cleanup` orphan reconciler operate on. A graph created before Phase 7 (internal schema v3) keeps its lineage here until its first read-write open, which migrates it into `__manifest` via `migrate_v3_to_v4`.
  - (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 graphs are inert; the run state machine was removed. The internal schema migration sweeps stale `__run__*` branches on first write-open; the inert dataset bytes themselves remain until a prefix-delete storage primitive lands)
 - **Manifest row schema** (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count`):
-  - `object_type` ∈ `table | table_version | table_tombstone`
-  - `table_key` ∈ `node:<TypeName> | edge:<EdgeName>`
+  - `object_type` ∈ `table | table_version | table_tombstone | graph_commit | graph_head`
+  - `table_key` ∈ `node:<TypeName> | edge:<EdgeName>` (empty for `graph_commit` / `graph_head` lineage rows)
  - `table_branch` is `null` for the main lineage and the branch name otherwise
+  - **Graph lineage rows** (RFC-013 Phase 7): one immutable `graph_commit` row per commit (`object_id` = the commit ULID; `metadata` JSON carries parent / merged-parent / actor / timestamp) plus one mutable `graph_head:<branch>` pointer per branch (`graph_head:main` for main). The in-memory commit DAG is a projection of these rows.
 - **Snapshot reconstruction**: latest visible `table_version` per `(table_key, table_branch)` minus tombstones — rows where `object_type = table_tombstone`, whose own `table_version` (acting as the tombstone version) is `>= the entry's table_version`.
 - **Atomic publish**: multi-dataset commits publish so that a single write to `__manifest` flips all the new sub-table versions visible at once.
 - **Row-level CAS on the merge-insert join key**: `object_id` carries an unenforced-primary-key annotation so Lance's bloom-filter conflict resolver rejects two concurrent commits that land the same `object_id` row. Without this annotation, Lance's transparent rebase would admit silent duplicates from racing publishers.
@ -90,8 +91,8 @@ flowchart TB
 - **Graph root** is one directory (or S3 prefix). Everything below is part of one OmniGraph graph.
 - **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
 - **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 graphs also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; the internal schema migration sweeps their stale `__run__*` branches, and the dataset bytes are reclaimed once a prefix-delete primitive lands.)
- **`_graph_commit_recoveries.lance`** — one row per crash-recovery action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join.
+- **`_graph_commits.lance`** is an L2 dataset retained only as a branch-ref carrier (and, on a pre-Phase-7 graph, the migration source). Since RFC-013 Phase 7 the graph commit DAG lives in `__manifest` as `graph_commit` / `graph_head` rows written in the publish CAS — `_graph_commits.lance` and its paired `_graph_commit_actors.lance` no longer receive commit rows. A graph created before Phase 7 (internal schema v3) backfills its lineage into `__manifest` on its first read-write open (`migrate_v3_to_v4`). (Pre-v0.4.0 graphs also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; the internal schema migration sweeps their stale `__run__*` branches, and the dataset bytes are reclaimed once a prefix-delete primitive lands.)
+- **`_graph_commit_recoveries.lance`** — one row per crash-recovery action. Joined by `graph_commit_id` to the graph commit lineage (the `graph_commit` rows in `__manifest` since RFC-013 Phase 7); the linked commit carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join.
 - **`__recovery/{ulid}.json`** — transient sidecar files written by a writer before it advances the underlying dataset, deleted once the matching manifest publish succeeds. A sidecar persisting after process exit means the writer crashed mid-commit; the next read-write open processes it. Steady-state directory is empty.
 - **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.
 - **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags.
--- a/docs/user/reference/constants.md
+++ b/docs/user/reference/constants.md
@ -3,12 +3,12 @@
 | Name | Value | Area |
 |---|---|---|
 | `MANIFEST_DIR` | `__manifest` | manifest layout |
-| Commit graph dir | `_graph_commits.lance` | commit graph |
+| Commit graph dir | `_graph_commits.lance` | branch-ref carrier + pre-v4 lineage source (lineage lives in `__manifest` since RFC-013 Phase 7) |
 | Run registry dir (legacy, removed) | `_graph_runs.lance` | inert post-v0.4.0; bytes remain until a prefix-delete primitive lands |
 | Run branch prefix (legacy, removed) | `__run__` | swept off `__manifest` by the internal schema migration; no longer a reserved name |
 | Schema apply lock | `__schema_apply_lock__` | schema apply |
 | Manifest publisher retry budget | `PUBLISHER_RETRY_BUDGET = 5` | manifest publish |
-| Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 3` | manifest migrations |
+| Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 4` | manifest migrations (v4 = graph lineage in `__manifest`, RFC-013 Phase 7) |
 | Merge stage batch | `MERGE_STAGE_BATCH_ROWS = 8192` | merge execution |
 | Maintenance concurrency | `OMNIGRAPH_MAINTENANCE_CONCURRENCY=8` | optimize/cleanup |
 | Lance blob compaction support | `LANCE_SUPPORTS_BLOB_COMPACTION = false` | optimize |
--- a/openapi.json
+++ b/openapi.json
@ -7,7 +7,7 @@
      "name": "MIT",
      "identifier": "MIT"
    },
-    "version": "0.7.1"
+    "version": "0.7.2"
  },
  "paths": {
    "/graphs": {