From 05e52f2ee0a05c5264192b7da761969faf15a7d8 Mon Sep 17 00:00:00 2001 From: Ragnor Comerford Date: Sun, 3 May 2026 13:56:36 +0200 Subject: [PATCH] recovery: rename composite test, strip ticket references, address review Three bundled changes: 1. Rename `tests/agent_lifecycle.rs` -> `tests/composite_flow.rs` (and the test function). OmniGraph is consumed by both humans and agents - naming the test after one audience misframes the library. 2. Strip Linear ticket IDs, PR numbers, bot reviewer names, and review-round labels from source, tests, and docs added by this branch. Internal traceability belongs in commit messages and PR descriptions, not in checked-in artifacts. Upstream lance-format/lance issue refs and pre-existing MR-XXX refs in docs not touched by this branch are left alone. 3. Two outstanding review findings addressed: - `needs_index_work_node` / `needs_index_work_edge`: propagate `count_rows` errors instead of `unwrap_or(0)`. Silently treating transient I/O failures as "0 rows" risked skipping a table from the recovery sidecar pin set that was actually about to be modified. - `recovery_multi_sidecar_requires_fresh_snapshot_for_correctness`: strengthen the assertion to fail when sidecar B classifies under a stale snapshot. The new assertion checks post-recovery Lance HEAD == v3 (no `Dataset::restore` ran). The previous "sidecar deleted + audit rows present" pair passed in both the bug and fix paths because both delete the sidecar and write an audit row; the differentiator is the post-recovery HEAD. Strengthening the assertion exposed an additional nuance: in this overlapping- sidecar scenario sidecar B's audit kind is RolledBack (no-op) rather than RolledForward, since sidecar A's roll-forward publishes Lance HEAD as the new manifest pin (absorbing B's work). The docstring now explains why this is correct given current `roll_forward_all` semantics. All workspace tests pass with --features failpoints. Co-Authored-By: Claude Opus 4.7 (1M context) --- AGENTS.md | 2 +- crates/omnigraph/src/db/manifest/recovery.rs | 71 +++---- crates/omnigraph/src/db/omnigraph.rs | 57 +++-- .../src/db/omnigraph/schema_apply.rs | 22 +- .../omnigraph/src/db/omnigraph/table_ops.rs | 72 ++++--- crates/omnigraph/src/db/recovery_audit.rs | 12 +- crates/omnigraph/src/exec/merge.rs | 74 +++---- crates/omnigraph/src/exec/mutation.rs | 8 +- crates/omnigraph/src/exec/staging.rs | 12 +- crates/omnigraph/src/loader/mod.rs | 17 +- crates/omnigraph/src/storage.rs | 8 +- .../{agent_lifecycle.rs => composite_flow.rs} | 69 +++---- crates/omnigraph/tests/failpoints.rs | 77 +++---- crates/omnigraph/tests/forbidden_apis.rs | 22 +- crates/omnigraph/tests/recovery.rs | 194 ++++++++++++------ crates/omnigraph/tests/staged_writes.rs | 61 +++--- docs/branches-commits.md | 2 +- docs/invariants.md | 4 +- docs/maintenance.md | 2 +- docs/runs.md | 8 +- docs/storage.md | 6 +- docs/testing.md | 4 +- 22 files changed, 430 insertions(+), 374 deletions(-) rename crates/omnigraph/tests/{agent_lifecycle.rs => composite_flow.rs} (84%) diff --git a/AGENTS.md b/AGENTS.md index be197fa..a98b974 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -198,7 +198,7 @@ omnigraph policy explain --actor act-alice --action change --branch main | Columnar storage on object store | ✅ Arrow/Lance | URI normalization, S3 env-var plumbing | | Per-dataset versioning + time travel | ✅ | `snapshot_at_version`, `entity_at`, snapshot-pinned reads across many tables | | Per-dataset branches | ✅ | **Graph-level** branches (atomic across all sub-tables), lazy fork, system branch filtering | -| Atomic single-dataset commits | ✅ | **Multi-table publish via three layers**, NOT a single Lance primitive: (1) per-table Lance `commit_staged` for the data write, (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering, (3) recovery-on-open reconciler (MR-847) for the residual gap between (1) and (2). All three layers ship; the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) write a `__recovery/{ulid}.json` sidecar before Phase B and delete it after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the sweep in `db/manifest/recovery.rs`: classify, decide all-or-nothing per sidecar, roll forward via single `ManifestBatchPublisher::publish` or roll back via `Dataset::restore`, and record an audit row in `_graph_commit_recoveries.lance` (queryable via `omnigraph commit list --filter actor=omnigraph:recovery`). Continuous in-process recovery (no restart needed between Phase B failure and recovery) arrives with [MR-856](https://linear.app/modernrelay/issue/MR-856) (background reconciler). Engine writes route through a sealed `TableStorage` trait (MR-793) exposing `stage_*` + `commit_staged` as the canonical staged-write surface; documented inline-commit residuals (`delete_where`, `create_vector_index`, plus legacy `append_batch` / `merge_insert_batches` / `overwrite_batch` / `create_*_index` pending Phase 9) remain on the trait until upstream Lance ships a public two-phase API ([#6658](https://github.com/lance-format/lance/issues/6658), [#6666](https://github.com/lance-format/lance/issues/6666)) and call-site conversion (Phase 1b) completes. | +| Atomic single-dataset commits | ✅ | **Multi-table publish via three layers**, NOT a single Lance primitive: (1) per-table Lance `commit_staged` for the data write, (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering, (3) the open-time recovery sweep for the residual gap between (1) and (2). All three layers ship; the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) write a `__recovery/{ulid}.json` sidecar before Phase B and delete it after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the sweep in `db/manifest/recovery.rs`: classify, decide all-or-nothing per sidecar, roll forward via single `ManifestBatchPublisher::publish` or roll back via `Dataset::restore`, and record an audit row in `_graph_commit_recoveries.lance` (queryable via `omnigraph commit list --filter actor=omnigraph:recovery`). Continuous in-process recovery (no restart needed between Phase B failure and recovery) is the goal of a future background reconciler. Engine writes route through a sealed `TableStorage` trait exposing `stage_*` + `commit_staged` as the canonical staged-write surface; documented inline-commit residuals (`delete_where`, `create_vector_index`, plus legacy `append_batch` / `merge_insert_batches` / `overwrite_batch` / `create_*_index`) remain on the trait until upstream Lance ships a public two-phase API ([#6658](https://github.com/lance-format/lance/issues/6658), [#6666](https://github.com/lance-format/lance/issues/6666)) and the migration of every call site completes. | | Compaction (`compact_files`) | ✅ | `omnigraph optimize` orchestrates over all node/edge tables, bounded concurrency | | Cleanup (`cleanup_old_versions`) | ✅ | `omnigraph cleanup` with `--keep` / `--older-than` policy | | BTREE / inverted (FTS) / vector indexes | ✅ | `ensure_indices` builds them on every relevant column; idempotent; lazy across branches | diff --git a/crates/omnigraph/src/db/manifest/recovery.rs b/crates/omnigraph/src/db/manifest/recovery.rs index 49f03d6..8a8f23a 100644 --- a/crates/omnigraph/src/db/manifest/recovery.rs +++ b/crates/omnigraph/src/db/manifest/recovery.rs @@ -1,8 +1,8 @@ -//! MR-847 — Recovery-on-open primitives. +//! Recovery-on-open primitives. //! //! This module implements the building blocks of the per-sidecar recovery //! sweep that closes the documented Phase B → Phase C residual (see -//! `docs/runs.md` "Finalize → publisher residual"). The high-level shape: +//! `docs/runs.md` "Open-time recovery sweep"). The high-level shape: //! //! 1. Each writer that performs a multi-table commit writes a small JSON //! sidecar at `__recovery/{ulid}.json` BEFORE its per-table @@ -17,11 +17,6 @@ //! `post_commit_pin` AND matches the sidecar) or rolls back all //! `RolledPastExpected` tables to `expected_version`. //! -//! Phase 2 (this commit) ships only the primitives: sidecar I/O, -//! classifier, decision dispatcher, restore-with-fragment-set-shortcut. -//! No integration into `Omnigraph::open` or any writer yet — those land -//! in Phase 3+. -//! //! ## Verified Lance behavior the rollback path depends on //! //! - `Dataset::restore()` takes no version arg; restores @@ -33,12 +28,10 @@ //! CreateIndex/Merge — see `check_restore_txn` at lance-4.0.0 //! `src/io/commit/conflict_resolver.rs:986`. The hazard is documented //! by `tests/staged_writes.rs::lance_restore_loses_to_concurrent_append_via_orphaning`. -//! MR-847 sidesteps this by running recovery only at `Omnigraph::open` -//! (before any other writers can race); MR-856's continuous-recovery -//! reconciler must guard via per-(table_key, branch) queue acquisition -//! once MR-686 lands. -//! -//! See `.context/mr-847-design.md` for the full design. +//! This module sidesteps the hazard by running recovery only at +//! `Omnigraph::open` (before any other writers can race). A future +//! continuous in-process recovery reconciler will need to guard via +//! per-(table_key, branch) queue acquisition. use std::collections::HashMap; @@ -271,7 +264,7 @@ pub(crate) async fn list_sidecars( // === chronologically sortable; the older sidecar is processed // before the newer one. Without this sort, `list_dir` returns // filesystem-order results which are nondeterministic and can mask - // ordering-sensitive bugs. (PR #72 review.) + // ordering-sensitive bugs. uris.sort(); let mut out = Vec::with_capacity(uris.len()); for uri in uris { @@ -335,8 +328,7 @@ pub(crate) fn parse_sidecar(sidecar_uri: &str, body: &str) -> Result Vec { /// in [`restore_table_to_version`] prevents version pile-up under /// repeated mid-rollback crashes. /// -/// Concurrency: today (pre-MR-686) recovery runs synchronously in -/// `Omnigraph::open` *before* the engine is wrapped in the server's -/// `Arc>`. No request handlers can race. Under MR-686 -/// + MR-856 (background reconciler) the per-(table_key, branch) queues -/// will need acquisition before the sweep restores or publishes — see -/// `.context/mr-847-design.md` "Concurrency policy" §"After MR-686". +/// Concurrency: today recovery runs synchronously in `Omnigraph::open` +/// *before* the engine is wrapped in the server's `Arc>`. +/// No request handlers can race. A future per-(table_key, branch) writer +/// queue model (paired with a background reconciler) will need to acquire +/// queues before the sweep restores or publishes. pub(crate) async fn recover_manifest_drift( root_uri: &str, storage: &dyn StorageAdapter, @@ -467,12 +458,12 @@ pub(crate) async fn recover_manifest_drift( return Ok(()); } - // PR #72 review (chatgpt-codex + cubic): refresh the coordinator - // snapshot BEFORE each sidecar's classification. Sidecar N's - // roll-forward writes manifest changes that sidecar N+1 must - // observe, otherwise sidecar N+1 classifies its tables against - // stale pins and may incorrectly roll back work that landed - // moments earlier. Refresh is cheap (one Lance manifest read). + // Refresh the coordinator snapshot BEFORE each sidecar's + // classification. Sidecar N's roll-forward writes manifest changes + // that sidecar N+1 must observe, otherwise sidecar N+1 classifies + // its tables against stale pins and may incorrectly roll back work + // that landed moments earlier. Refresh is cheap (one Lance manifest + // read). for sidecar in sidecars { coordinator.refresh().await?; let snapshot = coordinator.snapshot(); @@ -896,13 +887,12 @@ mod tests { ); } - /// PR #72 review (cubic + cursor) flagged that BranchMerge is in - /// the strict classifier set, but `publish_rewritten_merge_table` - /// runs multiple `commit_staged` calls per table (merge_insert + - /// delete_where + index rebuilds — the comment in `merge.rs` - /// explicitly says so). Strict classification rolls back valid - /// completed Phase B work as `UnexpectedMultistep`. BranchMerge - /// must be loose-matched like SchemaApply / EnsureIndices. + /// BranchMerge must be loose-matched, not strict: while the strict + /// classifier expects exactly one `commit_staged` per table, + /// `publish_rewritten_merge_table` runs multiple per table + /// (merge_insert + delete_where + index rebuilds — the comment in + /// `merge.rs` explicitly says so). Strict classification would roll + /// back valid completed Phase B work as `UnexpectedMultistep`. #[test] fn classify_loose_match_accepts_multi_commit_drift_for_branch_merge() { let pin = make_pin("node:Person", "irrelevant", 5, 6); @@ -1091,12 +1081,11 @@ mod tests { assert!(result.is_empty()); } - /// PR #72 review (cubic) flagged that `list_dir` returns - /// filesystem-order results, making sidecar processing - /// nondeterministic. Sidecar filenames are ULIDs (lexicographically - /// sortable === chronologically sortable), so sorting by URI gives - /// deterministic, time-ordered processing — the older sidecar - /// processed before the newer one. + /// `list_dir` returns filesystem-order results, which would make + /// sidecar processing nondeterministic. Sidecar filenames are ULIDs + /// (lexicographically sortable === chronologically sortable), so + /// sorting by URI gives deterministic, time-ordered processing — + /// the older sidecar processed before the newer one. #[tokio::test] async fn list_sidecars_returns_deterministic_order() { let dir = tempfile::tempdir().unwrap(); diff --git a/crates/omnigraph/src/db/omnigraph.rs b/crates/omnigraph/src/db/omnigraph.rs index 1734281..700ae30 100644 --- a/crates/omnigraph/src/db/omnigraph.rs +++ b/crates/omnigraph/src/db/omnigraph.rs @@ -81,17 +81,15 @@ pub struct Omnigraph { pub(crate) audit_actor_id: Option, } -/// Whether [`Omnigraph::open`] runs the MR-847 recovery sweep. +/// Whether [`Omnigraph::open`] runs the open-time recovery sweep. /// /// Recovery requires Lance writes (`Dataset::restore`, `ManifestBatchPublisher::publish`). /// Read-only consumers — NDJSON export, `commit list`, `read`, schema /// inspection — should not trigger writes (they may run with read-only -/// object-store credentials, and silent open-time mutations are surprising). -/// They also don't need recovery: reads always resolve through the manifest -/// pin, which is the consistent snapshot regardless of any Phase B → Phase C -/// drift on the per-table side. -/// -/// See `.context/mr-847-design.md` § "Read-only opens". +/// object-store credentials, and silent open-time mutations are +/// surprising). They also don't need recovery: reads always resolve +/// through the manifest pin, which is the consistent snapshot regardless +/// of any Phase B → Phase C drift on the per-table side. #[derive(Debug, Clone, Copy, PartialEq, Eq)] pub enum OpenMode { /// Run the recovery sweep on open. Default for `Omnigraph::open`. @@ -142,13 +140,13 @@ impl Omnigraph { /// Open an existing repo (read-write). /// /// Reads `_schema.pg`, parses it, builds the catalog, and opens `__manifest`. - /// Runs the MR-847 recovery sweep before returning — see [`OpenMode`]. + /// Runs the open-time recovery sweep before returning — see [`OpenMode`]. pub async fn open(uri: &str) -> Result { Self::open_with_storage_and_mode(uri, storage_for_uri(uri)?, OpenMode::ReadWrite).await } /// Open an existing repo for read-only consumers (NDJSON export, - /// `commit list`, etc.). Skips the MR-847 recovery sweep — see [`OpenMode`]. + /// `commit list`, etc.). Skips the recovery sweep — see [`OpenMode`]. pub async fn open_read_only(uri: &str) -> Result { Self::open_with_storage_and_mode(uri, storage_for_uri(uri)?, OpenMode::ReadOnly).await } @@ -171,22 +169,23 @@ impl Omnigraph { // Open the coordinator first so the schema-staging recovery sweep can // compare its snapshot against any leftover staging files. let mut coordinator = GraphCoordinator::open(&root, Arc::clone(&storage)).await?; - // Both the schema-state recovery sweep AND the MR-847 recovery sweep - // are gated on `OpenMode::ReadWrite`. Read-only consumers (NDJSON - // export, `commit list`, schema show) shouldn't trigger object-store - // mutations: they may run with read-only credentials, and silent - // open-time writes are surprising. Both sweeps' work is recoverable - // on the next ReadWrite open, so skipping under ReadOnly doesn't - // lose any safety guarantees — the manifest pin is the consistent - // snapshot regardless of drift on the per-table side or leftover - // schema-staging files. + // Both the schema-state recovery sweep AND the manifest-drift + // recovery sweep are gated on `OpenMode::ReadWrite`. Read-only + // consumers (NDJSON export, `commit list`, schema show) shouldn't + // trigger object-store mutations: they may run with read-only + // credentials, and silent open-time writes are surprising. Both + // sweeps' work is recoverable on the next ReadWrite open, so + // skipping under ReadOnly doesn't lose any safety guarantees — + // the manifest pin is the consistent snapshot regardless of + // drift on the per-table side or leftover schema-staging files. if matches!(mode, OpenMode::ReadWrite) { recover_schema_state_files(&root, Arc::clone(&storage), &coordinator.snapshot()) .await?; - // MR-847 recovery sweep: close the Phase B → Phase C residual on + // Recovery sweep: close the Phase B → Phase C residual on // any sidecar left over from a crashed writer. Continuous - // in-process recovery for long-running servers is MR-856 - // (background reconciler). + // in-process recovery for long-running servers (no restart + // required between Phase B failure and recovery) is a + // separate background-reconciler effort. crate::db::manifest::recover_manifest_drift( &root, storage.as_ref(), @@ -259,27 +258,25 @@ impl Omnigraph { /// Engine-facing trait surface around `TableStore`. /// - /// MR-793 Phase 1: this is the canonical accessor for newly-written - /// engine code. The trait's signatures use opaque `SnapshotHandle` / - /// `StagedHandle` instead of leaking `lance::Dataset` / + /// This is the canonical accessor for newly-written engine code. The + /// trait's signatures use opaque `SnapshotHandle` / `StagedHandle` + /// instead of leaking `lance::Dataset` / /// `lance::dataset::transaction::Transaction`. Existing call sites /// that still use `db.table_store.X(...)` (the inherent struct - /// methods) are migrated incrementally — see §9 of - /// `.context/mr-793-design.md`. + /// methods) are migrated incrementally. pub(crate) fn storage(&self) -> &dyn crate::storage_layer::TableStorage { &self.table_store } /// Engine-level access to the object-store adapter (S3 / local fs). - /// Used by the MR-847 recovery sidecar protocol — writers in the - /// engine call this to write/delete sidecars at `__recovery/{ulid}.json`. + /// Used by the recovery sidecar protocol — writers in the engine + /// call this to write/delete sidecars at `__recovery/{ulid}.json`. pub(crate) fn storage_adapter(&self) -> &dyn crate::storage::StorageAdapter { self.storage.as_ref() } /// Engine-level access to the repo's normalized root URI. Used by - /// the MR-847 recovery sidecar protocol to compute `__recovery/` - /// paths. + /// the recovery sidecar protocol to compute `__recovery/` paths. pub(crate) fn root_uri(&self) -> &str { &self.root_uri } diff --git a/crates/omnigraph/src/db/omnigraph/schema_apply.rs b/crates/omnigraph/src/db/omnigraph/schema_apply.rs index 3ccd5a3..5d12e2a 100644 --- a/crates/omnigraph/src/db/omnigraph/schema_apply.rs +++ b/crates/omnigraph/src/db/omnigraph/schema_apply.rs @@ -151,7 +151,7 @@ pub(super) async fn apply_schema_with_lock( let mut table_updates = HashMap::::new(); let mut table_tombstones = HashMap::::new(); - // MR-847 sidecar: protect the per-table commit_staged loop in + // Recovery sidecar: protect the per-table commit_staged loop in // rewritten_tables + indexed_tables. The post_commit_pin we record // here is a lower bound (expected + 1); the classifier loose-matches // for SidecarKind::SchemaApply because the actual N depends on how @@ -281,8 +281,8 @@ pub(super) async fn apply_schema_with_lock( ) .await?; let dataset_uri = db.table_store.dataset_uri(&entry.table_path); - // MR-793 Phase 6: route through stage_overwrite + commit_staged - // for non-empty batches. Lance's `InsertBuilder::execute_uncommitted` + // Route through stage_overwrite + commit_staged for non-empty + // batches. Lance's `InsertBuilder::execute_uncommitted` // errors on empty data (lance-4.0.0 `src/dataset/write/insert.rs:144`), // so the empty-rewrite case stays on `overwrite_dataset` (which // accepts empty input). The empty case is rare in schema_apply @@ -440,13 +440,13 @@ pub(super) async fn apply_schema_with_lock( db.invalidate_graph_index().await; } - // MR-847 sidecar lifecycle: delete after the manifest commit succeeded. - // Best-effort: if this delete fails, the sidecar persists; on next open - // the sweep sees every table at the post-publish manifest pin - // (NoMovement) and the sidecar is treated as a stale artifact - // (recovery is a no-op and the sidecar is cleaned up). Failing the - // schema_apply call would report failure for a migration that - // already succeeded (PR #72 review). + // Recovery sidecar lifecycle: delete after the manifest commit + // succeeded. Best-effort: if this delete fails, the sidecar persists + // and on next open the sweep sees every table at the post-publish + // manifest pin (NoMovement) and the sidecar is treated as a stale + // artifact (recovery is a no-op and the sidecar is cleaned up). + // Failing the schema_apply call would report failure for a migration + // that already succeeded. if let Some(handle) = recovery_handle { if let Err(err) = crate::db::manifest::delete_sidecar(&handle, db.storage_adapter()).await @@ -454,7 +454,7 @@ pub(super) async fn apply_schema_with_lock( tracing::warn!( error = %err, operation_id = handle.operation_id.as_str(), - "MR-847 sidecar cleanup failed; the next open's recovery sweep will resolve it" + "recovery sidecar cleanup failed; the next open's recovery sweep will resolve it" ); } } diff --git a/crates/omnigraph/src/db/omnigraph/table_ops.rs b/crates/omnigraph/src/db/omnigraph/table_ops.rs index d4dd724..138d43d 100644 --- a/crates/omnigraph/src/db/omnigraph/table_ops.rs +++ b/crates/omnigraph/src/db/omnigraph/table_ops.rs @@ -42,14 +42,14 @@ pub(super) async fn ensure_indices_for_branch( let mut updates = Vec::new(); let active_branch = resolved.branch; - // MR-847 sidecar: protect the per-table commit_staged loop in + // Recovery sidecar: protect the per-table commit_staged loop in // build_indices_on_dataset (one commit per index built). Only pins // tables that ACTUALLY need index work — the classifier // loose-matches for SidecarKind::EnsureIndices (the actual N // depends on which indices are missing), but if a table needs zero - // commits and gets pinned, the all-or-nothing decision rule treats - // it as `NoMovement` and rolls back legitimately-committed work on - // sibling tables (PR #72 review). Steady-state runs (everything + // commits and gets pinned, the all-or-nothing decision rule + // classifies it as `NoMovement` and rolls back legitimately- + // committed work on sibling tables. Steady-state runs (everything // already indexed) skip the sidecar entirely. let mut recovery_pins: Vec = Vec::new(); for type_name in db.catalog.node_types.keys() { @@ -201,7 +201,7 @@ pub(super) async fn ensure_indices_for_branch( } } - // MR-847 failpoint: pin the per-writer Phase B → Phase C residual for + // Failpoint: pin the per-writer Phase B → Phase C residual for // ensure_indices. Lance HEAD has advanced on every touched table // (one commit_staged per index built) but the manifest publish below // hasn't run. Used by @@ -212,9 +212,10 @@ pub(super) async fn ensure_indices_for_branch( commit_prepared_updates_on_branch(db, branch, &updates).await?; } - // MR-847 sidecar lifecycle: delete after the manifest publish (or - // no-op when there were no updates — sidecar covered the per-table - // commit window regardless). Best-effort cleanup (PR #72 review). + // Recovery sidecar lifecycle: delete after the manifest publish (or + // no-op when there were no updates — the sidecar covered the + // per-table commit window regardless). Best-effort cleanup; failing + // the user here would error a call that already succeeded. if let Some(handle) = recovery_handle { if let Err(err) = crate::db::manifest::delete_sidecar(&handle, db.storage_adapter()).await @@ -222,7 +223,7 @@ pub(super) async fn ensure_indices_for_branch( tracing::warn!( error = %err, operation_id = handle.operation_id.as_str(), - "MR-847 sidecar cleanup failed; the next open's recovery sweep will resolve it" + "recovery sidecar cleanup failed; the next open's recovery sweep will resolve it" ); } } @@ -241,8 +242,8 @@ pub(super) async fn ensure_indices_for_branch( /// Per the actual `build_indices_on_dataset_for_catalog` implementation /// (this file, ~line 419-491), nodes get BTree (id) + per-prop FTS /// (@search String) + per-prop Vector indices; edges get BTree only -/// (id, src, dst). The two helpers mirror that asymmetry — see PR #72 -/// round-2 review and the `needs_index_work_edge` doc comment. +/// (id, src, dst). The two helpers mirror that asymmetry — see the +/// `needs_index_work_edge` doc comment. async fn needs_index_work_node( db: &Omnigraph, type_name: &str, @@ -254,9 +255,14 @@ async fn needs_index_work_node( .table_store .open_dataset_head_for_write(table_key, full_path, table_branch) .await?; - // Empty tables skipped by the ensure_indices loop — must not pin them - // in the sidecar (PR #72 round-2 review). - if db.table_store.count_rows(&ds, None).await.unwrap_or(0) == 0 { + // Empty tables are skipped by the ensure_indices loop, so they must + // not be pinned in the sidecar — pinning a table that produces zero + // commits classifies as NoMovement on recovery and forces all-or- + // nothing rollback of sibling tables' legitimate index work. + // Errors from count_rows are propagated: silently treating them as + // "0 rows" risks skipping a table that is actually about to be + // modified. + if db.table_store.count_rows(&ds, None).await? == 0 { return Ok(false); } if !db.table_store.has_btree_index(&ds, "id").await? { @@ -292,9 +298,7 @@ async fn needs_index_work_node( /// BTree indices (id, src, dst) per `build_indices_on_dataset_for_catalog` /// at the edge branch (this file, lines 474-485). FTS / vector indices /// on edge properties are not built today; if they ever are, this -/// helper plus the build function must be updated together. (PR #72 -/// round-1 cursor finding flagged the FTS/vector omission as a -/// possible inconsistency — confirmed intentional.) +/// helper plus the build function must be updated together. /// /// Empty edge tables are skipped by the ensure_indices loop the same /// way node tables are; see `needs_index_work_node`. @@ -308,7 +312,7 @@ async fn needs_index_work_edge( .table_store .open_dataset_head_for_write(table_key, full_path, table_branch) .await?; - if db.table_store.count_rows(&ds, None).await.unwrap_or(0) == 0 { + if db.table_store.count_rows(&ds, None).await? == 0 { return Ok(false); } Ok(!db.table_store.has_btree_index(&ds, "id").await? @@ -463,14 +467,14 @@ pub(super) async fn build_indices_on_dataset_for_catalog( } if let Some(node_type) = catalog.node_types.get(type_name) { - // Per MR-793 §10 OQ3: stage scalar indices first (BTree, - // Inverted), then call `create_vector_index` inline. The - // inline-commit on a vector index advances HEAD, which would - // invalidate any uncommitted scalar index transactions if we - // stacked them. Today the per-stage shape commits each - // scalar index immediately so the order constraint is - // implicit, but if we ever batch scalar stages we must - // ensure they all land before the vector inline-commit. + // Stage scalar indices first (BTree, Inverted), then call + // `create_vector_index` inline. The inline-commit on a vector + // index advances HEAD, which would invalidate any uncommitted + // scalar index transactions if we stacked them. Today the + // per-stage shape commits each scalar index immediately so + // the order constraint is implicit, but if we ever batch + // scalar stages we must ensure they all land before the + // vector inline-commit. for index_cols in &node_type.indices { if index_cols.len() != 1 { continue; @@ -526,13 +530,13 @@ pub(super) async fn build_indices_on_dataset_for_catalog( } /// Stage a BTREE index transaction and commit it, advancing the in-memory -/// `*ds` to the new HEAD. MR-793 Phase 4: replaces the previous -/// inline-commit `create_btree_index(ds)` call with the staged primitive -/// + an immediate `commit_staged`. Per-call behavior is unchanged -/// (HEAD advances once per index), but the bytes-on-disk and HEAD-advance -/// are now decoupled at the `TableStore` API surface — a caller that -/// needs end-of-batch atomicity can stage many transactions and commit -/// them in one pass (Phase 8's index reconciler relies on this). +/// `*ds` to the new HEAD. The staged primitive + immediate `commit_staged` +/// pair replaced the earlier inline-commit `create_btree_index(ds)` call. +/// Per-call behavior is unchanged (HEAD advances once per index), but +/// the bytes-on-disk and HEAD-advance are now decoupled at the +/// `TableStore` API surface — a caller that needs end-of-batch atomicity +/// can stage many transactions and commit them in one pass (the eventual +/// index reconciler relies on this). async fn stage_and_commit_btree( db: &Omnigraph, table_key: &str, @@ -568,7 +572,7 @@ async fn stage_and_commit_btree( } /// Stage an INVERTED (FTS) index transaction and commit it. See -/// `stage_and_commit_btree` for the MR-793 Phase 4 rationale. +/// `stage_and_commit_btree` for the rationale. async fn stage_and_commit_inverted( db: &Omnigraph, table_key: &str, diff --git a/crates/omnigraph/src/db/recovery_audit.rs b/crates/omnigraph/src/db/recovery_audit.rs index bf801ce..b7d4975 100644 --- a/crates/omnigraph/src/db/recovery_audit.rs +++ b/crates/omnigraph/src/db/recovery_audit.rs @@ -1,4 +1,4 @@ -//! MR-847 Phase 4 — Recovery audit row storage in `_graph_commit_recoveries.lance`. +//! Recovery audit row storage in `_graph_commit_recoveries.lance`. //! //! Sibling to `_graph_commits.lance` (`commit_graph.rs`). Each successful //! recovery sweep — roll-forward or roll-back — records one row here so @@ -6,12 +6,12 @@ //! `omnigraph commit list --filter actor=omnigraph:recovery` with the //! original actor whose mutation was rolled forward / back. //! -//! The schema-migration alternative (adding `recovery_for_actor` and -//! `recovery_kind` columns to `_graph_commits.lance` itself) was -//! considered and rejected for MR-847 — see `.context/mr-847-design.md` -//! § "Recovery audit model". Sibling-table is additive, doesn't bump +//! Sibling-table is additive: it doesn't bump //! `INTERNAL_MANIFEST_SCHEMA_VERSION`, and can be removed in favor of a -//! schema migration later if the join cost matters. +//! schema migration later if the join cost matters. The schema-migration +//! alternative (adding `recovery_for_actor` and `recovery_kind` columns +//! to `_graph_commits.lance` itself) was considered and rejected to keep +//! this change additive. //! //! Atomicity caveat: append to `_graph_commit_recoveries.lance` is //! sequential w.r.t. the `CommitGraph::append_commit` write. A crash diff --git a/crates/omnigraph/src/exec/merge.rs b/crates/omnigraph/src/exec/merge.rs index ae2b39d..e46a634 100644 --- a/crates/omnigraph/src/exec/merge.rs +++ b/crates/omnigraph/src/exec/merge.rs @@ -913,9 +913,9 @@ async fn publish_rewritten_merge_table( // Phase 1: merge_insert changed/new rows (preserves _row_created_at_version for // existing rows, bumps _row_last_updated_at_version only for actually-changed rows). // - // MR-793 Phase 5: routed through the staged primitive so a failure - // between writing fragments and committing leaves no Lance-HEAD - // drift. The commit_staged here is per-table per-call (Lance has no + // Routed through the staged primitive so a failure between writing + // fragments and committing leaves no Lance-HEAD drift. The + // commit_staged here is per-table per-call (Lance has no // multi-dataset atomic commit); the residual sits at this single // commit point, narrowed from the previous "merge_insert + delete + // index" multi-step inline-commit chain. @@ -957,11 +957,11 @@ async fn publish_rewritten_merge_table( // // INLINE-COMMIT RESIDUAL: lance-4.0.0 does not expose a public // two-phase delete API (DeleteJob is `pub(crate)` — - // lance-format/lance#6658 is open with no PRs). MR-793 deliberately - // does NOT introduce a `stage_delete` wrapper that would secretly - // inline-commit (a side-channel — see design doc §3.2). When the - // upstream API ships, swap this `delete_where` call for - // `stage_delete` + `commit_staged`. + // lance-format/lance#6658 is open with no PRs). We deliberately do + // NOT introduce a `stage_delete` wrapper that would secretly + // inline-commit (it would create a side-channel between the staged + // and inline write paths). When the upstream API ships, swap this + // `delete_where` call for `stage_delete` + `commit_staged`. if !staged.deleted_ids.is_empty() { let escaped: Vec = staged .deleted_ids @@ -977,11 +977,11 @@ async fn publish_rewritten_merge_table( // Phase 3: rebuild indices. // - // `build_indices_on_dataset` was migrated in MR-793 Phase 4 to use - // `stage_create_btree_index` / `stage_create_inverted_index` + - // `commit_staged` for scalar indices. Vector indices remain inline - // (residual — `build_index_metadata_from_segments` is `pub(crate)` - // in lance-4.0.0; companion ticket to lance-format/lance#6658). + // `build_indices_on_dataset` uses `stage_create_btree_index` / + // `stage_create_inverted_index` + `commit_staged` for scalar + // indices. Vector indices remain inline-commit + // (`build_index_metadata_from_segments` is `pub(crate)` in lance- + // 4.0.0 — companion ticket to lance-format/lance#6658). let row_count = target_db .table_store() .table_state(&full_path, ¤t_ds) @@ -1167,13 +1167,14 @@ impl Omnigraph { validate_merge_candidates(self, source_snapshot, &target_snapshot, &candidates).await?; - // MR-847 sidecar: protect the per-table commit_staged loop. Pins - // every table that will be touched by `publish_adopted_source_state` - // or `publish_rewritten_merge_table`. BranchMerge uses loose - // classification — the publish path may run multiple commit_staged - // calls per table (publish_rewritten_merge_table does - // stage_merge_insert + delete_where + index rebuilds per the - // existing branch-merge code path). + // Recovery sidecar: protect the per-table commit_staged loop. + // Pins every table that will be touched by + // `publish_adopted_source_state` or `publish_rewritten_merge_table`. + // BranchMerge uses loose classification — the publish path may + // run multiple commit_staged calls per table + // (publish_rewritten_merge_table does stage_merge_insert + + // delete_where + index rebuilds per the existing branch-merge + // code path). let recovery_pins: Vec = ordered_table_keys .iter() .filter(|tk| candidates.contains_key(*tk)) @@ -1190,15 +1191,15 @@ impl Omnigraph { let recovery_handle = if recovery_pins.is_empty() { None } else { - // PR #72 review (chatgpt-codex + cubic): use the merge target - // branch directly, NOT a heuristic derived from - // `ordered_table_keys.first()`. The first sorted table key may - // not be in the target snapshot at all (its `entry()` returns - // None → branch becomes None == main), and the SubTableEntry's - // `table_branch` field isn't necessarily the merge target - // branch. The caller `branch_merge` calls - // `swap_coordinator_for_branch(target_branch)` before invoking - // this function, so `self.active_branch()` is the target. + // Use the merge target branch directly, NOT a heuristic + // derived from `ordered_table_keys.first()`. The first + // sorted table key may not be in the target snapshot at all + // (its `entry()` returns None → branch becomes None == main), + // and the SubTableEntry's `table_branch` field isn't + // necessarily the merge target branch. The caller + // `branch_merge` calls `swap_coordinator_for_branch(target_branch)` + // before invoking this function, so `self.active_branch()` + // is the target. let target_branch = self.active_branch().map(str::to_string); let sidecar = crate::db::manifest::new_sidecar( crate::db::manifest::SidecarKind::BranchMerge, @@ -1244,10 +1245,10 @@ impl Omnigraph { updates.push(update); } - // MR-847 failpoint: pin the per-writer Phase B → Phase C residual - // for branch_merge. Lance HEAD has advanced on every touched - // table (publish_*) but the manifest publish below hasn't run. - // Used by `tests/failpoints.rs::branch_merge_phase_b_failure_recovered_on_next_open`. + // Failpoint: pin the per-writer Phase B → Phase C residual for + // branch_merge. Lance HEAD has advanced on every touched table + // (publish_*) but the manifest publish below hasn't run. Used + // by `tests/failpoints.rs::branch_merge_phase_b_failure_recovered_on_next_open`. crate::failpoints::maybe_fail("branch_merge.post_phase_b_pre_manifest_commit")?; let manifest_version = if updates.is_empty() { @@ -1256,8 +1257,9 @@ impl Omnigraph { self.commit_manifest_updates(&updates).await? }; - // MR-847 sidecar lifecycle: delete after manifest publish. - // Best-effort cleanup (PR #72 review). + // Recovery sidecar lifecycle: delete after manifest publish. + // Best-effort cleanup; the merge already landed durably so + // failing the user here is undesirable. if let Some(handle) = recovery_handle { if let Err(err) = crate::db::manifest::delete_sidecar(&handle, self.storage_adapter()).await @@ -1265,7 +1267,7 @@ impl Omnigraph { tracing::warn!( error = %err, operation_id = handle.operation_id.as_str(), - "MR-847 sidecar cleanup failed; the next open's recovery sweep will resolve it" + "recovery sidecar cleanup failed; the next open's recovery sweep will resolve it" ); } } diff --git a/crates/omnigraph/src/exec/mutation.rs b/crates/omnigraph/src/exec/mutation.rs index 81ae18d..c6d2737 100644 --- a/crates/omnigraph/src/exec/mutation.rs +++ b/crates/omnigraph/src/exec/mutation.rs @@ -751,8 +751,8 @@ impl Omnigraph { // the publisher's CAS pre-check rejects (or the manifest // write throws) after staged commits succeeded. The // sidecar written inside `staging.finalize()` persists - // across this failure so the next `Omnigraph::open` - // (MR-847 recovery sweep) can roll forward — see + // across this failure so the next `Omnigraph::open`'s + // recovery sweep can roll forward — see // `tests/failpoints.rs::recovery_rolls_forward_after_finalize_publisher_failure`. crate::failpoints::maybe_fail("mutation.post_finalize_pre_publisher")?; self.commit_updates_on_branch_with_expected( @@ -774,7 +774,7 @@ impl Omnigraph { // as `NoMovement` (manifest pin == Lance HEAD == // post_commit_pin) and tidies up. Failing the user // here would return an error for a write that - // already landed (PR #72 review). + // already landed. if let Err(err) = crate::db::manifest::delete_sidecar( &handle, self.storage_adapter(), @@ -784,7 +784,7 @@ impl Omnigraph { tracing::warn!( error = %err, operation_id = handle.operation_id.as_str(), - "MR-847 sidecar cleanup failed; the next open's recovery sweep will resolve it" + "recovery sidecar cleanup failed; the next open's recovery sweep will resolve it" ); } } diff --git a/crates/omnigraph/src/exec/staging.rs b/crates/omnigraph/src/exec/staging.rs index b1d392d..c32b688 100644 --- a/crates/omnigraph/src/exec/staging.rs +++ b/crates/omnigraph/src/exec/staging.rs @@ -238,12 +238,12 @@ impl MutationStaging { let mut updates: Vec = inline_committed.into_values().collect(); - // MR-847 — sidecar protocol. Build the per-table pin list BEFORE - // any Lance commit_staged runs, then write the sidecar so a crash - // between Phase B (this loop's commit_staged calls) and Phase C - // (the manifest publish in the caller) is recoverable on next - // open. Skipped when `pending` is empty (delete-only mutation; - // D₂ parse-time rule keeps deletes out of this code path so this + // Sidecar protocol: build the per-table pin list BEFORE any Lance + // commit_staged runs, then write the sidecar so a crash between + // Phase B (this loop's commit_staged calls) and Phase C (the + // manifest publish in the caller) is recoverable on next open. + // Skipped when `pending` is empty (delete-only mutation; the D₂ + // parse-time rule keeps deletes out of this code path so this // branch is reached only for the inline-committed-only case). let pins: Vec = pending .iter() diff --git a/crates/omnigraph/src/loader/mod.rs b/crates/omnigraph/src/loader/mod.rs index 99540c5..cbb3051 100644 --- a/crates/omnigraph/src/loader/mod.rs +++ b/crates/omnigraph/src/loader/mod.rs @@ -542,9 +542,10 @@ async fn load_jsonl_reader( .await?; db.commit_updates_on_branch_with_expected(branch, &updates, &expected_versions) .await?; - // MR-847: sidecar protects the per-table commit_staged → + // The recovery sidecar protects the per-table commit_staged → // manifest publish window. Phase C succeeded — clean up - // best-effort (PR #72 review). + // best-effort: failing the user here would error out a write + // that already landed durably. if let Some(handle) = sidecar_handle { if let Err(err) = crate::db::manifest::delete_sidecar(&handle, db.storage_adapter()).await @@ -552,18 +553,18 @@ async fn load_jsonl_reader( tracing::warn!( error = %err, operation_id = handle.operation_id.as_str(), - "MR-847 sidecar cleanup failed; the next open's recovery sweep will resolve it" + "recovery sidecar cleanup failed; the next open's recovery sweep will resolve it" ); } } } else { // LoadMode::Overwrite keeps the legacy inline-commit path — // truncate-then-append doesn't fit the staged shape (see - // `docs/runs.md` "LoadMode::Overwrite residual"). MR-847 sidecar - // is not applicable here because the writer doesn't go through - // MutationStaging; per-table inline commits + a final manifest - // publish handle their own residual via documented operator - // workflow (re-run overwrite to recover). + // `docs/runs.md` "LoadMode::Overwrite residual"). The recovery + // sidecar is not applicable here because the writer doesn't go + // through MutationStaging; per-table inline commits + a final + // manifest publish handle their own residual via the documented + // operator workflow (re-run overwrite to recover). db.commit_updates_on_branch_with_expected( branch, &overwrite_updates, diff --git a/crates/omnigraph/src/storage.rs b/crates/omnigraph/src/storage.rs index a895c02..5d2e568 100644 --- a/crates/omnigraph/src/storage.rs +++ b/crates/omnigraph/src/storage.rs @@ -64,10 +64,10 @@ impl StorageAdapter for LocalStorageAdapter { async fn write_text(&self, uri: &str, contents: &str) -> Result<()> { let path = local_path_from_uri(uri)?; // Ensure parent directory exists. S3 has no equivalent (PutObject - // is path-agnostic). For local fs, callers like the MR-847 - // recovery sidecar protocol expect transparent directory - // creation under the repo root (the `__recovery/` directory - // doesn't pre-exist; first sidecar write creates it). + // is path-agnostic). For local fs, callers like the recovery + // sidecar protocol expect transparent directory creation under + // the repo root (the `__recovery/` directory doesn't pre-exist; + // first sidecar write creates it). if let Some(parent) = path.parent() { if !parent.as_os_str().is_empty() { tokio::fs::create_dir_all(parent).await?; diff --git a/crates/omnigraph/tests/agent_lifecycle.rs b/crates/omnigraph/tests/composite_flow.rs similarity index 84% rename from crates/omnigraph/tests/agent_lifecycle.rs rename to crates/omnigraph/tests/composite_flow.rs index 965d6cf..d7a8725 100644 --- a/crates/omnigraph/tests/agent_lifecycle.rs +++ b/crates/omnigraph/tests/composite_flow.rs @@ -1,15 +1,13 @@ -//! MR-858 — Composite agent-lifecycle integration test. +//! Composite end-to-end flow integration test. //! -//! Walks the canonical agent narrative end to end in one fixture: -//! init → load → branch → mutate → query → merge → time-travel → -//! optimize → cleanup → reopen. Every numbered step has at least one -//! assertion. +//! Walks the canonical user flow in one fixture: init → load → branch → +//! mutate → query → merge → time-travel → optimize → cleanup → reopen. +//! Every numbered step has at least one assertion. //! -//! This is the **deterministic narrative** counterpart to MR-783's -//! randomized/property-based reliability harness — the test that -//! catches a regression where individual operations all work but their -//! composition under realistic agent usage breaks. It runs in CI on -//! every PR (no `#[ignore]`). +//! This is the deterministic narrative counterpart to a randomized / +//! property-based reliability harness — it catches regressions where +//! individual operations all pass their unit tests but their composition +//! breaks. It runs in CI on every PR (no `#[ignore]`). mod helpers; @@ -27,7 +25,7 @@ const TEST_DATA: &str = include_str!("fixtures/test.jsonl"); const TEST_QUERIES: &str = include_str!("fixtures/test.gq"); #[tokio::test] -async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() { +async fn composite_flow_init_load_branch_merge_time_travel_optimize_cleanup() { let dir = tempfile::tempdir().unwrap(); let uri = dir.path().to_str().unwrap(); @@ -215,7 +213,7 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() { v_pre_merge_main, v_post_merge, ); - let _ = merge_outcome; // outcome is structured; presence of Ok already validates audit/merge_commit recorded + let _ = merge_outcome; // ───────────────────────────────────────────────────────────────── // Step 8: query at the post-merge snapshot — verify both sides' @@ -277,24 +275,19 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() { // Step 10: optimize the post-merge graph — verify indices stay // valid and queryable. // - // **Known limitation** (uncovered by this composite test, surfaced - // for follow-up in MR-859 `omnigraph optimize` + `cleanup` integration - // coverage): `optimize_all_tables` (`db/omnigraph/optimize.rs:77`) - // calls Lance `compact_files` directly — it advances per-table Lance - // HEAD without updating the omnigraph `__manifest` pin. After - // optimize, the next writer's expected_table_versions captures the + // **Known limitation**: `optimize_all_tables` calls Lance + // `compact_files` directly — it advances per-table Lance HEAD + // without updating the omnigraph `__manifest` pin. After optimize, + // the next writer's expected_table_versions captures the // pre-optimize manifest pin, but the publisher's pre-check reads // a higher version from the manifest dataset (because some other - // path — possibly the schema-state recovery on reopen — wrote a - // newer __manifest row). The `ExpectedVersionMismatch` is benign - // (re-issuing the mutation after `db.refresh()` succeeds), but the - // composite test cannot reliably exercise post-optimize mutations - // until that path is investigated under MR-859. - // - // For this test we verify optimize completes and reads still work, - // then SKIP the post-optimize mutation step. The full coverage - // (mutation succeeds after optimize without manual refresh) lives in - // the MR-859 follow-up. + // path — possibly schema-state recovery on reopen — wrote a newer + // __manifest row). The `ExpectedVersionMismatch` is benign + // (re-issuing the mutation after a snapshot refresh succeeds), but + // a composite test cannot reliably exercise post-optimize mutations + // until that path is investigated. Coverage of post-optimize + // mutations is left to a focused optimize+cleanup integration test. + // ───────────────────────────────────────────────────────────────── let optimize_stats = db.optimize().await.unwrap(); assert!( !optimize_stats.is_empty(), @@ -324,9 +317,9 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() { // Step 11: cleanup — keep last 10 versions, only purge versions // older than 1 hour. With this small test, we have well under 10 // versions and nothing that old, so cleanup is a no-op except for - // any orphan files. The MR-847 recovery floor (--keep ≥ 3) is - // preserved by the keep-10 default. Verify the call doesn't break - // subsequent queries. + // any orphan files. The recovery floor (--keep ≥ 3) needed for the + // open-time recovery sweep is preserved by the keep-10 default. + // Verify the call doesn't break subsequent queries. // ───────────────────────────────────────────────────────────────── use omnigraph::db::CleanupPolicyOptions; use std::time::Duration; @@ -338,9 +331,6 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() { .await .unwrap(); - // Recovery audit dataset, if present, must survive cleanup. - // (No recovery happened in this test, so it may not exist.) - // ───────────────────────────────────────────────────────────────── // Step 12: reopen the engine — verify post-cleanup state is consistent. // ───────────────────────────────────────────────────────────────── @@ -366,7 +356,9 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() { ); // Final query exercise — full read path works post-reopen, - // post-cleanup. + // post-cleanup. Post-cleanup mutation is omitted here pending + // resolution of the optimize-vs-manifest-pin interaction documented + // in Step 10. let final_total = query_main( &mut db, TEST_QUERIES, @@ -377,13 +369,6 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() { .unwrap(); assert!(!final_total.batches().is_empty()); - // Final mutation skipped — post-optimize mutation surfaces - // `ExpectedVersionMismatch` because optimize advances Lance HEAD - // without updating the manifest pin (see Step 10 note above and - // MR-859 follow-up). The MR-859 ticket covers post-optimize - // mutation correctness explicitly. This test asserts the read - // path is intact post-cleanup-reopen, which is the more important - // user-visible property. let final_total = query_main( &mut db, TEST_QUERIES, diff --git a/crates/omnigraph/tests/failpoints.rs b/crates/omnigraph/tests/failpoints.rs index e882193..f782419 100644 --- a/crates/omnigraph/tests/failpoints.rs +++ b/crates/omnigraph/tests/failpoints.rs @@ -140,8 +140,8 @@ async fn schema_apply_recovers_partial_rename() { assert_no_staging_files(dir.path()); } -/// Prove the MR-847 recovery sweep closes the "finalize → publisher -/// residual" across one open cycle — the post-MR-847 contract. +/// Prove the recovery sweep closes the "finalize → publisher residual" +/// across one open cycle. /// /// `MutationStaging::finalize` runs `commit_staged` per touched table /// sequentially before the publisher commits the manifest. Lance has no @@ -149,16 +149,15 @@ async fn schema_apply_recovers_partial_rename() { /// per-table staged commits and the manifest commit leaves Lance HEAD /// advanced on the touched tables with no manifest update. /// -/// Pre-MR-847: the next mutation surfaced `ExpectedVersionMismatch` and -/// the residual persisted until process restart. Post-MR-847: the -/// finalize writes a sidecar at `__recovery/{ulid}.json` BEFORE Phase B, -/// the failpoint fires AFTER finalize but BEFORE the publisher, the -/// engine handle is dropped, and the next `Omnigraph::open` runs the -/// recovery sweep. The sweep classifies every table in the sidecar as -/// `RolledPastExpected` (Lance HEAD == expected + 1, post_commit_pin -/// matches), decides RollForward, atomically extends every manifest pin -/// via `ManifestBatchPublisher::publish`, records an audit row, and -/// deletes the sidecar. +/// Closing the residual: finalize writes a sidecar at +/// `__recovery/{ulid}.json` BEFORE Phase B, the failpoint fires AFTER +/// finalize but BEFORE the publisher, the engine handle is dropped, and +/// the next `Omnigraph::open` runs the recovery sweep. The sweep +/// classifies every table in the sidecar as `RolledPastExpected` (Lance +/// HEAD == expected + 1, post_commit_pin matches), decides RollForward, +/// atomically extends every manifest pin via +/// `ManifestBatchPublisher::publish`, records an audit row, and deletes +/// the sidecar. /// /// After this test passes: /// - The originally-attempted insert ("Eve") is visible via a normal @@ -169,7 +168,7 @@ async fn schema_apply_recovers_partial_rename() { /// `actor_id` in `recovery_for_actor`. /// /// Continuous in-process recovery (no restart needed between failure -/// and recovery) is MR-856 (background reconciler). +/// and recovery) is the goal of a future background reconciler. #[tokio::test] async fn recovery_rolls_forward_after_finalize_publisher_failure() { let _scenario = FailScenario::setup(); @@ -303,11 +302,11 @@ async fn finalize_publisher_residual_does_not_drift_untouched_tables() { .expect("Company write on a non-drifted table should succeed"); } -/// MR-793 Phase 4 acceptance bar — proves that a Phase A failure in -/// the staged-index path (`stage_create_btree_index` succeeded; -/// `commit_staged` not yet called) leaves NO Lance-HEAD drift on the -/// existing tables. Subsequent operations against those tables succeed -/// without `ExpectedVersionMismatch`. +/// Acceptance test: a Phase A failure in the staged-index path +/// (`stage_create_btree_index` succeeded; `commit_staged` not yet +/// called) leaves NO Lance-HEAD drift on the existing tables. +/// Subsequent operations against those tables succeed without +/// `ExpectedVersionMismatch`. /// /// Path: `apply_schema(v1 → v2)` adds a new node type. The /// `added_tables` loop in `schema_apply` creates the empty dataset and @@ -384,14 +383,15 @@ fn assert_no_staging_files(repo: &std::path::Path) { } // ===================================================================== -// MR-847 Phase 9 — per-writer Phase B → Phase C recovery integration +// Per-writer Phase B → Phase C recovery integration // ===================================================================== // -// Each of the four migrated writers writes a sidecar BEFORE its per-table -// commit_staged loop and deletes it AFTER the manifest publish. The -// `recovery_rolls_forward_after_finalize_publisher_failure` test above -// covers MutationStaging::finalize. The three tests below cover the -// other three writers: schema_apply, branch_merge, ensure_indices. +// Each of the four migrated writers writes a sidecar BEFORE its +// per-table commit_staged loop and deletes it AFTER the manifest +// publish. The `recovery_rolls_forward_after_finalize_publisher_failure` +// test above covers MutationStaging::finalize. The three tests below +// cover the other three writers: schema_apply, branch_merge, +// ensure_indices. // // Each follows the same shape: trigger the writer with a failpoint // active in the Phase B → Phase C window, drop the engine, reopen, @@ -424,7 +424,7 @@ async fn schema_apply_phase_b_failure_recovered_on_next_open() { // Phase A: trigger the residual via `schema_apply.after_staging_write`. // This failpoint fires AFTER the rewritten_tables/indexed_tables loops // (Lance HEAD advanced) AND AFTER the schema-state staging files are - // written, but BEFORE the manifest publish. The MR-847 sidecar persists. + // written, but BEFORE the manifest publish. The recovery sidecar persists. { let mut db = Omnigraph::open(&uri).await.unwrap(); let _failpoint = ScopedFailPoint::new("schema_apply.after_staging_write", "return"); @@ -434,7 +434,7 @@ async fn schema_apply_phase_b_failure_recovered_on_next_open() { // overall table set — required to keep `recover_schema_state_files` // (which runs BEFORE recover_manifest_drift) happy: it can't // disambiguate property-only migrations and would reject the - // open before the MR-847 sweep ever ran. + // open before the recovery sweep ever ran. let v2_schema = r#"node Person { name: String @key age: I32? @@ -582,11 +582,11 @@ async fn branch_merge_phase_b_failure_recovered_on_next_open() { ); } -/// PR #72 round-2 fix: `ensure_indices` only writes a sidecar when at -/// least one table genuinely needs index work (per `needs_index_work_*` -/// helpers in `db/omnigraph/table_ops.rs`). When all tables are -/// steady-state (every declared index already built, or empty tables -/// that the loop skips), the sidecar is omitted entirely. +/// `ensure_indices` only writes a sidecar when at least one table +/// genuinely needs index work (per `needs_index_work_*` helpers in +/// `db/omnigraph/table_ops.rs`). When all tables are steady-state +/// (every declared index already built, or empty tables that the loop +/// skips), the sidecar is omitted entirely. /// /// Test setup: `load_jsonl` auto-builds indices via /// `prepare_updates_for_commit`. So after the load, every Person/Knows @@ -595,11 +595,11 @@ async fn branch_merge_phase_b_failure_recovered_on_next_open() { /// after the loops), so the call returns Err — but no recovery state /// persists. Reopen is a clean no-op. /// -/// (Triggering an actual sidecar persistence requires bypassing +/// Triggering an actual sidecar persistence requires bypassing /// `load_jsonl`'s auto-build via raw `TableStore::append_batch` — the /// helper-direct path. That's covered structurally by the -/// `needs_index_work_*` code review + the -/// `recovery_ensure_indices_handles_empty_tables` integration test.) +/// `needs_index_work_*` code path and the +/// `recovery_ensure_indices_handles_empty_tables` integration test. #[tokio::test] async fn ensure_indices_phase_b_failure_does_not_leak_sidecar_when_no_work_needed() { use omnigraph::loader::{LoadMode, load_jsonl}; @@ -625,8 +625,9 @@ async fn ensure_indices_phase_b_failure_does_not_leak_sidecar_when_no_work_neede } // Phase A: trigger the failpoint. Steady-state ensure_indices - // produces zero sidecar pins (per the round-2 fix); no sidecar is - // written. The failpoint still fires, surfacing the Err. + // produces zero sidecar pins (the helpers scope pins to tables + // that genuinely need work); no sidecar is written. The failpoint + // still fires, surfacing the Err. { let mut db = Omnigraph::open(&uri).await.unwrap(); let _failpoint = ScopedFailPoint::new( @@ -641,8 +642,8 @@ async fn ensure_indices_phase_b_failure_does_not_leak_sidecar_when_no_work_neede "unexpected error: {err}" ); - // KEY ASSERTION: no sidecar persists, because the round-2 fix - // scopes pins to tables that genuinely need work. Steady-state + // KEY ASSERTION: no sidecar persists, because the helpers + // scope pins to tables that genuinely need work. Steady-state // = no pins = no sidecar = no recovery state = zero open-time // overhead. let recovery_dir = dir.path().join("__recovery"); diff --git a/crates/omnigraph/tests/forbidden_apis.rs b/crates/omnigraph/tests/forbidden_apis.rs index 30185a3..cc9f163 100644 --- a/crates/omnigraph/tests/forbidden_apis.rs +++ b/crates/omnigraph/tests/forbidden_apis.rs @@ -1,4 +1,4 @@ -//! MR-793 Phase 3 — forbidden-API guard test. +//! Forbidden-API guard test. //! //! Engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) MUST NOT //! call Lance's inline-commit data-write APIs directly. The @@ -29,15 +29,15 @@ //! the cross-table manifest commit. Documented exception. //! - `crates/omnigraph/src/storage_layer.rs` — IS the trait module. //! -//! ## Initial state (MR-793 Phase 3) +//! ## Transitional allow-list //! -//! At the time this test was written, MR-793 has migrated three writers -//! (ensure_indices, branch_merge, schema_apply rewrites) onto staged -//! primitives. Other engine call sites (the bulk loader, exec/mutation, -//! exec/query, etc.) still use the legacy inherent `TableStore` methods -//! — they're not visible at the trait boundary, but they DO call lance -//! types. The allow-list below reflects this transitional state. Phase 9 -//! tightens the allow-list as call sites migrate. +//! The migration of writers onto staged primitives is incremental. +//! Several writers (ensure_indices, branch_merge, schema_apply rewrites) +//! already route through the staged primitives; others (bulk loader, +//! exec/mutation, exec/query) still use the legacy inherent +//! `TableStore` methods — they're not visible at the trait boundary, but +//! they DO call lance types. The file-level allow-list below reflects +//! this transitional state and tightens as call sites migrate. use std::path::{Path, PathBuf}; @@ -99,7 +99,7 @@ const ALLOW_LIST_FILES: &[&str] = &[ "storage_layer.rs", // The trait module. "commit_graph.rs", // Maintains `_graph_commits.lance` system table. "graph_coordinator.rs", // Drives the manifest publisher / branch coordinator. - "recovery_audit.rs", // Maintains `_graph_commit_recoveries.lance` (MR-847 audit trail). + "recovery_audit.rs", // Maintains `_graph_commit_recoveries.lance` (recovery audit trail). ]; /// Directories exempt from the guard. Files under these paths may use @@ -203,7 +203,7 @@ fn engine_code_does_not_call_forbidden_lance_apis() { if !violations.is_empty() { panic!( - "MR-793 forbidden-API guard found {} violation(s) in engine code. \ + "Forbidden-API guard found {} violation(s) in engine code. \ Engine code MUST route through the `TableStorage` trait (or its \ inherent counterparts on `TableStore`) instead of calling Lance's \ inline-commit APIs directly. If a use is genuinely justified, add \ diff --git a/crates/omnigraph/tests/recovery.rs b/crates/omnigraph/tests/recovery.rs index e1e659b..edfbd16 100644 --- a/crates/omnigraph/tests/recovery.rs +++ b/crates/omnigraph/tests/recovery.rs @@ -1,13 +1,13 @@ -//! MR-847 — open-time recovery sweep integration tests. +//! Open-time recovery sweep integration tests. //! //! These exercise the full `Omnigraph::open` cycle: drop a synthetic //! sidecar into `__recovery/`, advance some Lance HEADs to simulate the //! Phase B → Phase C residual, reopen the engine, and assert the sweep's //! decision-tree dispatch did the right thing. //! -//! The Phase 3 tests pin open-time invocation, `OpenMode::{ReadWrite, -//! ReadOnly}`, the roll-back path, and schema-version refusal. The Phase -//! 4 tests pin the roll-forward path + audit row recording. +//! Coverage: open-time invocation, `OpenMode::{ReadWrite, ReadOnly}`, +//! roll-back path, schema-version refusal, roll-forward path, and audit +//! row recording. use std::path::Path; @@ -311,6 +311,35 @@ async fn read_latest_recovery_audit( )) } +/// Helper: read every recovery audit row's `recovery_kind` value, in +/// storage order (multiple batches concatenated). Used by the +/// multi-sidecar fresh-snapshot test as a diagnostic alongside the +/// post-recovery Lance HEAD assertion. +async fn list_recovery_audit_kinds(repo_root: &Path) -> Vec { + let recoveries_dir = repo_root.join("_graph_commit_recoveries.lance"); + if !recoveries_dir.exists() { + return Vec::new(); + } + let ds = Dataset::open(recoveries_dir.to_str().unwrap()).await.unwrap(); + use arrow_array::{Array, StringArray}; + use futures::TryStreamExt; + let batches: Vec = + ds.scan().try_into_stream().await.unwrap().try_collect().await.unwrap(); + let mut out = Vec::new(); + for batch in batches { + let kinds = batch + .column_by_name("recovery_kind") + .expect("recovery_kind column present") + .as_any() + .downcast_ref::() + .expect("recovery_kind is Utf8"); + for i in 0..kinds.len() { + out.push(kinds.value(i).to_string()); + } + } + out +} + /// Helper: count `_graph_commits.lance` rows tagged with the recovery actor. async fn count_recovery_actor_commits(repo_root: &Path) -> usize { let actors_dir = repo_root.join("_graph_commit_actors.lance"); @@ -566,21 +595,19 @@ async fn recovery_rolls_forward_with_null_actor() { } // ===================================================================== -// PR #72 review fixes — integration tests +// Multi-sidecar processing — integration tests // ===================================================================== -/// PR #72 review (chatgpt-codex + cubic): multiple sidecars must be -/// processed in deterministic ORDER and against FRESH manifest snapshots. -/// Without sort + per-sidecar refresh, sidecar B can be classified -/// against sidecar A's stale pre-publish snapshot and incorrectly roll -/// back work that just landed. +/// Multiple sidecars must be processed in deterministic ORDER and against +/// FRESH manifest snapshots. Without sort + per-sidecar refresh, sidecar +/// B can be classified against sidecar A's stale pre-publish snapshot +/// and incorrectly roll back work that just landed. /// /// This test drops two synthetic sidecars on independent tables and /// asserts the sweep processes both end-to-end (both deleted, both -/// audited). The unit test -/// `list_sidecars_returns_deterministic_order` pins the sort order; this -/// integration test pins the multi-sidecar flow against a real engine -/// state. +/// audited). The unit test `list_sidecars_returns_deterministic_order` +/// pins the sort order; this integration test pins the multi-sidecar +/// flow against a real engine state. #[tokio::test] async fn recovery_processes_multiple_sidecars_with_fresh_snapshot_per_iter() { use omnigraph::loader::{LoadMode, load_jsonl}; @@ -666,19 +693,18 @@ async fn recovery_processes_multiple_sidecars_with_fresh_snapshot_per_iter() { ); } -/// PR #72 review (cubic site #13): `ensure_indices_for_branch` previously -/// pinned every catalog table in the sidecar. If only ONE table needed +/// `ensure_indices_for_branch` must only pin tables that actually need +/// new index work. If it pinned every catalog table and only one needed /// new indices, the others would classify as `NoMovement` on recovery, /// triggering the all-or-nothing decision rule to roll BACK the table /// that did get index work — destroying legitimate Phase B output. /// /// Steady-state case: when nothing needs indexing, no sidecar should -/// be written. A sibling test -/// `recovery_ensure_indices_skips_empty_tables_in_sidecar_scope` -/// (PR #72 round-2 review) covers the more nuanced empty-table case -/// where the existing ensure_indices loop has -/// `if row_count > 0 { build_indices(...) }` — empty tables produce -/// zero commits and would otherwise force NoMovement → rollback. +/// be written. The sibling test `recovery_ensure_indices_handles_empty_tables` +/// covers the more nuanced empty-table case where the existing +/// ensure_indices loop has `if row_count > 0 { build_indices(...) }` — +/// empty tables produce zero commits and would otherwise force +/// NoMovement → rollback. #[tokio::test] async fn recovery_ensure_indices_steady_state_no_sidecar() { use omnigraph::loader::{LoadMode, load_jsonl}; @@ -702,28 +728,26 @@ async fn recovery_ensure_indices_steady_state_no_sidecar() { ); } -/// PR #72 round-2 review (cubic): empty tables (zero rows) bypass -/// `build_indices_on_dataset` because `ensure_indices_for_branch` has -/// `if row_count > 0 { build_indices(...) }`. The needs_index_work_* -/// helpers must match this — pinning an empty table means recovery -/// classifies it as `NoMovement` (no commits ever ran) and rolls back -/// any sibling table's legitimate index work. +/// Empty tables (zero rows) bypass `build_indices_on_dataset` because +/// `ensure_indices_for_branch` has `if row_count > 0 { build_indices(...) }`. +/// The `needs_index_work_*` helpers must match this — pinning an empty +/// table means recovery classifies it as `NoMovement` (no commits ever +/// ran) and rolls back any sibling table's legitimate index work. /// /// Integration verification: after a real init + ensure_indices on a /// repo where every table is empty, the recovery sweep must complete -/// cleanly (no leftover sidecar) AND the next ensure_indices must -/// also leave no sidecar — proving the empty-table-scoping fix lets +/// cleanly (no leftover sidecar) AND the next ensure_indices must also +/// leave no sidecar — proving the empty-table-scoping behavior lets /// steady-state runs incur zero sidecar I/O. The -/// `count_rows == 0 → return false` short-circuit in -/// `needs_index_work_*` is what makes this work. +/// `count_rows == 0 → return false` short-circuit in `needs_index_work_*` +/// is what makes this work. /// -/// (A stronger assertion that captures the sidecar mid-flight and -/// verifies the persisted JSON omits empty tables would require -/// bypassing `load_jsonl` — which auto-builds indices via -/// `prepare_updates_for_commit`. Pinning that with a unit test on the -/// helpers directly would require bootstrapping an engine plus raw -/// Lance writes; deferred as a follow-up. The behavioral correctness -/// is verified by code inspection + bot review concurrence.) +/// A stronger assertion that captured the sidecar mid-flight and verified +/// the persisted JSON omits empty tables would require bypassing +/// `load_jsonl` (which auto-builds indices via +/// `prepare_updates_for_commit`); pinning that with a unit test on the +/// helpers directly would require bootstrapping an engine plus raw Lance +/// writes — left as a follow-up. #[tokio::test] async fn recovery_ensure_indices_handles_empty_tables() { let dir = tempfile::tempdir().unwrap(); @@ -745,26 +769,50 @@ async fn recovery_ensure_indices_handles_empty_tables() { ); } -/// PR #72 round-2 review (cubic site #4 follow-up): the original -/// `recovery_processes_multiple_sidecars_with_fresh_snapshot_per_iter` -/// test used independent tables so the fresh-snapshot fix wasn't -/// load-bearing. This test makes the second sidecar's classification -/// DEPEND on the first sidecar's manifest update — proving the refresh -/// is required for correctness. +/// Multi-sidecar processing must refresh the manifest snapshot between +/// sidecars: sidecar N's roll-forward writes manifest changes that +/// sidecar N+1 must observe, otherwise N+1 classifies its tables +/// against stale pins and may incorrectly run a Dataset::restore that +/// would not have run under a fresh view. /// /// Setup: /// - Sidecar A: kind=EnsureIndices (loose), refers to Person at -/// expected=v1, post=v2. After processing, manifest pin advances -/// to wherever Lance HEAD is at the time. +/// expected=v1, post=v2. /// - Sidecar B: kind=EnsureIndices (loose), refers to Person at -/// expected=v2 (the post-A manifest pin). +/// expected=v2, post=v3. +/// - Lance HEAD for Person sits at v3 (both writers' Phase B fragments +/// chained but neither's Phase C landed). /// -/// Without the fresh-snapshot refresh, sidecar B's `expected_version=v2` -/// is compared against the pre-A snapshot's pin (v1), failing the -/// loose-match `pin.expected_version == manifest_pinned` predicate -/// → classified as UnexpectedAtP1 → RollBack. With the refresh, -/// expected=v2 matches the new pin v2 → RolledPastExpected → roll -/// forward succeeds. +/// Outcome paths: +/// +/// **Stale-snapshot bug** (no per-sidecar refresh): +/// Sidecar A's classifier sees pre-recovery pin=v1, expected=v1 +/// matches → RolledPastExpected → RollForward to HEAD=v3. Manifest +/// advances Person v1 → v3. Sidecar B's classifier still sees the +/// STALE pin v1: lance_head=v3, manifest_pinned=v1, expected=v2. +/// Loose-match predicate `expected == manifest_pinned` fails (v2 != +/// v1); `lance_head == manifest_pinned + 1` fails (v3 != v2) → +/// UnexpectedMultistep → RollBack. Restore Person to expected=v2, +/// creating Lance HEAD v4. +/// +/// **Fresh-snapshot fix** (refresh per sidecar): +/// Sidecar A: same as above; manifest pin advances to v3. +/// Sidecar B refresh: classifier now sees pin=v3, lance_head=v3, +/// expected=v2. lance_head == manifest_pinned → NoMovement → RollBack +/// decision but the rollback loop has no eligible tables (only +/// {RolledPastExpected, UnexpectedAtP1, UnexpectedMultistep} are +/// restored), so it's a no-op rollback. Lance HEAD stays at v3. +/// +/// **Differentiating assertion**: post-recovery Lance HEAD for Person +/// must be == v3 (no restore happened). The stale-snapshot bug would +/// have advanced HEAD to v4 via Dataset::restore. +/// +/// Note: the audit row for sidecar B is "RolledBack" in the fix path +/// because the all-or-nothing decision sees NoMovement. Overlapping- +/// sidecar scenarios where one writer's HEAD-chained work absorbs the +/// other's are rare in practice — per-(table, branch) writer +/// serialization prevents them in steady state — but the recovery +/// sweep handles them safely without forward-progress drift. #[tokio::test] async fn recovery_multi_sidecar_requires_fresh_snapshot_for_correctness() { use omnigraph::loader::{LoadMode, load_jsonl}; @@ -856,13 +904,41 @@ async fn recovery_multi_sidecar_requires_fresh_snapshot_for_correctness() { 2, "two sidecars → two audit rows" ); + + // The "sidecars deleted + audit rows present" assertions above are + // necessary but not sufficient — both pass even when sidecar B rolls + // back under a stale snapshot (the bug path), because the sidecar is + // still deleted and an audit row is still written. The differentiating + // signal is the post-recovery Lance HEAD for Person: + // - Fresh-snapshot fix: sidecar B is no-op rollback (NoMovement); + // no Dataset::restore runs; HEAD stays at v3. + // - Stale-snapshot bug: sidecar B classifies as UnexpectedMultistep; + // restore advances HEAD to v4. + let ds_after = Dataset::open(&person_uri).await.unwrap(); + assert_eq!( + ds_after.version().version, + v3, + "Person Lance HEAD must remain v3 (no restore from stale-snapshot rollback); got {} \ + — a higher value indicates sidecar B classified UnexpectedMultistep against the \ + stale pre-recovery pin and ran a restore", + ds_after.version().version + ); + // Sanity: the audit kinds are diagnostic — first sidecar rolls forward + // (RolledPastExpected → RollForward); second is no-op rollback in this + // overlapping-sidecar scenario. + let kinds = list_recovery_audit_kinds(dir.path()).await; + assert_eq!(kinds.len(), 2, "expected 2 audit rows, got {:?}", kinds); + assert!( + matches!(kinds[0].as_str(), "RolledForward"), + "first sidecar must roll forward; got {:?}", + kinds + ); } -/// PR #72 review (cubic site #10): `OpenMode::ReadOnly` previously ran -/// `recover_schema_state_files` unconditionally, which can delete or -/// rename schema-staging files. Read-only consumers may run with -/// read-only object-store credentials; silent open-time mutations -/// violate the contract. +/// `OpenMode::ReadOnly` must NOT run `recover_schema_state_files`, +/// which can delete or rename schema-staging files. Read-only consumers +/// may run with read-only object-store credentials, and silent open-time +/// mutations violate the contract. /// /// This test drops a schema-staging file (which the recovery sweep /// would normally delete) then opens with ReadOnly mode. The staging diff --git a/crates/omnigraph/tests/staged_writes.rs b/crates/omnigraph/tests/staged_writes.rs index 4d3292f..83d5c30 100644 --- a/crates/omnigraph/tests/staged_writes.rs +++ b/crates/omnigraph/tests/staged_writes.rs @@ -397,8 +397,7 @@ async fn scan_with_staged_with_filter_silently_drops_staged_rows() { If you're here because this assertion failed: either (a) Lance \ exposed a way to scan uncommitted fragments without stats-based \ pruning (good — update to assert == [alice, carol, dave]), or \ - (b) something changed in our scan_with_staged path. See PR #67 \ - test fix discussion + .context/mr-794-step2-design.md §1.1." + (b) something changed in our scan_with_staged path." ); // Without filter, staged data IS visible — confirms the issue is @@ -493,7 +492,7 @@ async fn chained_stage_merge_insert_with_shared_key_documents_duplicate_behavior ); } -// ─── MR-793 Phase 2: stage_overwrite + scalar index staging ───────────────── +// ─── stage_overwrite + scalar index staging ───────────────── /// `stage_overwrite` writes replacement fragments to object storage but /// does NOT advance Lance HEAD until `commit_staged` runs. Mirrors @@ -663,11 +662,11 @@ async fn stage_create_inverted_index_does_not_advance_head_until_commit() { /// Pin the inline-commit behavior of `delete_where`. Lance 4.0.0 does /// NOT expose a public `DeleteJob::execute_uncommitted` -/// (`pub(crate)` — see lance-format/lance#6658). MR-793 deliberately +/// (`pub(crate)` — see lance-format/lance#6658). The trait deliberately /// does NOT introduce a `stage_delete` wrapper that would secretly -/// inline-commit (a side-channel — see design doc §3.2). Instead, the -/// trait keeps `delete_where` as the only delete entry point, named -/// honestly. +/// inline-commit (a side-channel between the staged and inline write +/// paths). Instead, the trait keeps `delete_where` as the only delete +/// entry point, named honestly. /// /// **When Lance #6658 lands**: this test will need to flip — replace /// the assertion with a `stage_delete` + `commit_staged` round-trip @@ -704,9 +703,10 @@ async fn delete_where_advances_head_inline_documents_residual() { /// `create_vector_index`. Lance 4.0.0 vector indices take the /// "segment commit path" which calls `build_index_metadata_from_segments` /// (`pub(crate)` in lance-4.0.0 `src/index.rs:111`). Until upstream -/// exposes that helper (companion ticket to #6658), MR-793's trait -/// surface deliberately does NOT include `stage_create_vector_index` — -/// see design doc Appendix A.3. +/// exposes that helper (companion ticket to lance-format/lance#6658), +/// the trait surface deliberately does NOT include +/// `stage_create_vector_index` — same rationale as `stage_delete`'s +/// absence (no side-channel between staged and inline write paths). #[tokio::test] async fn create_vector_index_advances_head_inline_documents_residual() { use arrow_array::FixedSizeListArray; @@ -759,13 +759,12 @@ async fn create_vector_index_advances_head_inline_documents_residual() { assert!(store.has_vector_index(&ds, "embedding").await.unwrap()); } -/// Empirical pin of `Dataset::restore` semantics for MR-847. +/// Empirical pin of `Dataset::restore` semantics for the recovery sweep. /// -/// MR-847's recovery sweep depends on the `restore` invariant: from -/// HEAD = `h`, calling `Dataset::checkout_version(p).await?` then +/// The recovery sweep depends on the `restore` invariant: from HEAD = +/// `h`, calling `Dataset::checkout_version(p).await?` then /// `Dataset::restore().await?` produces a NEW commit at HEAD = `h + 1` -/// (NOT `h + 2` as the v1 design draft assumed) with content == content -/// at version `p`. +/// with content == content at version `p`. /// /// The Lance source confirms this — `restore()` (no args) takes the /// currently-checked-out version's content and applies it via @@ -817,8 +816,8 @@ async fn lance_restore_appends_one_commit_with_checked_out_content() { head_before + 1, "Dataset::restore must append exactly one commit (HEAD + 1). If \ this assertion fires, lance changed restore semantics — re-read \ - lance src/dataset.rs::restore and update the MR-847 design AND \ - the recovery sweep's rollback path before proceeding." + lance src/dataset.rs::restore and update the recovery sweep's \ + rollback path before proceeding." ); // Content equality: the restored HEAD must match version 1 (just alice). @@ -839,8 +838,9 @@ async fn lance_restore_appends_one_commit_with_checked_out_content() { ); } -/// Empirical pin of the `Dataset::restore` concurrency hazard that motivates -/// MR-847's open-time-only invocation strategy and MR-856's queue-acquisition +/// Empirical pin of the `Dataset::restore` concurrency hazard that +/// motivates the recovery sweep's open-time-only invocation strategy +/// and any future continuous-recovery reconciler's queue-acquisition /// requirement. /// /// `Dataset::restore`'s `check_restore_txn` (lance-4.0.0 @@ -854,13 +854,14 @@ async fn lance_restore_appends_one_commit_with_checked_out_content() { /// rewind commit AFTER the legitimate concurrent Append, silently /// orphaning that Append's data from the active timeline. /// -/// MR-847 sidesteps this by running recovery only at `Omnigraph::open` -/// (before any other writers can race). MR-856's continuous-recovery +/// The recovery sweep sidesteps this by running only at `Omnigraph::open` +/// (before any other writers can race). A future continuous-recovery /// reconciler must acquire per-(table_key, branch) queues for sidecar /// tables before invoking restore — otherwise this hazard becomes /// reachable during in-flight tenant traffic. /// -/// This test is the load-bearing constraint MR-856 must honor. +/// This test is the load-bearing constraint any future reconciler must +/// honor. #[tokio::test] async fn lance_restore_loses_to_concurrent_append_via_orphaning() { let dir = tempfile::tempdir().unwrap(); @@ -879,8 +880,8 @@ async fn lance_restore_loses_to_concurrent_append_via_orphaning() { let mut recovery_handle = recovery_open.checkout_version(1).await.unwrap(); // Concurrent legitimate writer: appends bob, advancing HEAD to v2. - // This simulates MR-686's per-table-queue model where another tenant - // wrote between recovery's open and recovery's restore call. + // This simulates a per-table-queue model where another tenant wrote + // between recovery's open and recovery's restore call. let mut writer_handle = Dataset::open(&uri).await.unwrap(); store .append_batch(&uri, &mut writer_handle, person_batch(&[("bob", Some(25))])) @@ -901,8 +902,8 @@ async fn lance_restore_loses_to_concurrent_append_via_orphaning() { "Restore commits at HEAD+1 even when a concurrent commit landed \ between recovery's open and recovery's restore call. If this \ assertion fails, lance changed restore-vs-append conflict \ - semantics — re-read check_restore_txn and update MR-847's \ - concurrency analysis." + semantics — re-read check_restore_txn and update the recovery \ + sweep's concurrency analysis." ); let scanner = post.scan(); @@ -918,10 +919,10 @@ async fn lance_restore_loses_to_concurrent_append_via_orphaning() { ids, vec!["alice".to_string()], "Concurrent Append's row 'bob' was silently orphaned by the \ - Restore. Active-timeline contents == v1's contents. This is the \ - hazard MR-847 sidesteps via open-time-only invocation, and MR-856 \ - must guard against via per-(table, branch) queue acquisition. \ - Got: {:?}", + Restore. Active-timeline contents == v1's contents. The recovery \ + sweep sidesteps this hazard via open-time-only invocation; any \ + future continuous-recovery reconciler must guard against it via \ + per-(table, branch) queue acquisition. Got: {:?}", ids, ); diff --git a/docs/branches-commits.md b/docs/branches-commits.md index 37ed1d3..de6c653 100644 --- a/docs/branches-commits.md +++ b/docs/branches-commits.md @@ -56,7 +56,7 @@ Filtered from `branch_list()` but visible to internals: - `__schema_apply_lock__` — serializes schema migrations. - `__run__` — legacy from the pre-v0.4.0 Run state machine (removed in MR-771). The branch-name guard predicate `is_internal_run_branch` is kept as defense-in-depth so users cannot create a branch matching the legacy prefix; the filter will be removed once production legacy branches are swept (MR-770). -## L2 — Recovery audit trail (MR-847) +## L2 — Recovery audit trail The four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) protect their multi-table commits with a sidecar at `__recovery/{ulid}.json` written before Phase B and deleted after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs`: classify per-table state, decide all-or-nothing per sidecar, roll forward / back, record an audit row. diff --git a/docs/invariants.md b/docs/invariants.md index 4c8cfa0..68ab428 100644 --- a/docs/invariants.md +++ b/docs/invariants.md @@ -105,13 +105,13 @@ These are user-visible commitments. They state what the engine guarantees and wh Specific defaults (timeout values, memory caps, TTL windows) are *configuration*, not invariants — see [docs/constants.md](constants.md) and per-deployment configuration. The invariant is that bounds and contracts exist, not their numerical values. 23. **Atomicity is per-query.** Every `.gq` query is atomic — multi-statement mutations are all-or-nothing via the substrate's atomic-commit primitive. No cross-query `BEGIN`/`COMMIT`; branches and merges fill that role for agent workflows. - *Status: upheld at the writer-trait surface AND across process boundaries after MR-847 — the sealed `TableStorage` trait routes inserts / updates / scalar-index builds / merge_insert / overwrite through `stage_*` + `commit_staged` (Phase A is drift-free), and the open-time recovery sweep in `db/manifest/recovery.rs` (sidecars at `__recovery/{ulid}.json` written by `MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) closes the per-table commit_staged → manifest publish residual on the next `Omnigraph::open`. The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures and recoverable across process boundaries for finalize→publisher failures. Continuous in-process recovery (no restart required between Phase B failure and recovery) arrives with MR-856 (background recovery reconciler). Two writer paths still inline-commit pending upstream Lance work: `delete_where` (lance-format/lance#6658) and `create_vector_index` (lance-format/lance#6666).* + *Status: upheld at the writer-trait surface AND across process boundaries — the sealed `TableStorage` trait routes inserts / updates / scalar-index builds / merge_insert / overwrite through `stage_*` + `commit_staged` (Phase A is drift-free), and the open-time recovery sweep in `db/manifest/recovery.rs` (sidecars at `__recovery/{ulid}.json` written by `MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) closes the per-table commit_staged → manifest publish residual on the next `Omnigraph::open`. The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures and recoverable across process boundaries for finalize→publisher failures. Continuous in-process recovery (no restart required between Phase B failure and recovery) is the goal of a future background reconciler. Two writer paths still inline-commit pending upstream Lance work: `delete_where` (lance-format/lance#6658) and `create_vector_index` (lance-format/lance#6666).* 24. **Schema integrity is strict at commit.** Type validation, required-field presence (auto-filled from `@default` if declared), uniqueness across batches and versions, and referential integrity — all enforced before commit succeeds. Per-write softening flags are opt-in, never default. *Status: aspirational — referential integrity at scale requires SIP-backed cross-table validation; not yet implemented. Cross-batch / cross-version uniqueness tracked in MR-714.* 25. **Isolation: per-query snapshot; read-your-writes within and across queries in a session.** Each query reads from one consistent manifest version. Within a multi-statement mutation, the read subplan inside each write operator sees the writes from earlier statements. Across queries in a session, reads always resolve the latest manifest version — no reader pinning to older snapshots. - *Status: upheld for inserts/updates after MR-794 step 2+ — `MutationStaging`'s in-memory accumulator + `TableStore::scan_with_pending` (DataFusion `MemTable` union with the committed Lance scan, with merge-shadow semantics for chained updates) implements read-your-writes within a multi-statement mutation. Delete-touching mutations are limited to delete-only by parse-time D₂; closing the within-query RYW gap for deletes requires Lance's two-phase delete API (tracked: MR-793 / Lance-upstream lance-format/lance#6658). The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures (the partial-failure test pins this), and the narrower finalize→publisher residual is closed across one open cycle by the MR-847 recovery sweep — see [docs/runs.md](runs.md) "Open-time recovery sweep".* + *Status: upheld for inserts/updates — `MutationStaging`'s in-memory accumulator + `TableStore::scan_with_pending` (DataFusion `MemTable` union with the committed Lance scan, with merge-shadow semantics for chained updates) implements read-your-writes within a multi-statement mutation. Delete-touching mutations are limited to delete-only by parse-time D₂; closing the within-query RYW gap for deletes requires Lance's two-phase delete API (Lance-upstream lance-format/lance#6658). The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures (the partial-failure test pins this), and the narrower finalize→publisher residual is closed across one open cycle by the open-time recovery sweep — see [docs/runs.md](runs.md) "Open-time recovery sweep".* 26. **Durability before acknowledgement.** Commit returns only after the substrate has confirmed durable persistence. No "fast" or "fire-and-forget" durability levels. diff --git a/docs/maintenance.md b/docs/maintenance.md index af3ce45..6725269 100644 --- a/docs/maintenance.md +++ b/docs/maintenance.md @@ -16,7 +16,7 @@ - `CleanupPolicyOptions { keep_versions: Option, older_than: Option }` — at least one is required. - Returns `[TableCleanupStats { table_key, bytes_removed, old_versions_removed }]`. - CLI guards with `--confirm`; without it, prints a preview line. -- **MR-847 recovery floor:** `--keep < 3` may garbage-collect Lance versions that the open-time recovery sweep needs as a rollback target (the sweep restores to the manifest-pinned `expected_version`, which is HEAD-1 in the typical Phase B → Phase C drift case). Default `--keep 10` is safe. +- **Recovery floor:** `--keep < 3` may garbage-collect Lance versions that the open-time recovery sweep needs as a rollback target (the sweep restores to the manifest-pinned `expected_version`, which is HEAD-1 in the typical Phase B → Phase C drift case). Default `--keep 10` is safe. ## Tombstones diff --git a/docs/runs.md b/docs/runs.md index 1646464..4f6fd24 100644 --- a/docs/runs.md +++ b/docs/runs.md @@ -130,7 +130,7 @@ will replace it. Operator-driven (rare in agent workloads); document permanently until Lance exposes `Operation::Overwrite { fragments }` as a two-phase op. -### Open-time recovery sweep (MR-847) +### Open-time recovery sweep The staged-write rewire eliminates one drift class **by construction at the writer layer**: an op that fails before pushing to the in-memory @@ -140,7 +140,7 @@ the case the `partial_failure_leaves_target_queryable_and_unblocks_next_mutation test pins. A second, narrower drift class — the **finalize → publisher window** — -is closed across one open cycle by the MR-847 recovery sweep: +is closed across one open cycle by the open-time recovery sweep: `MutationStaging::finalize` runs `stage_*` + `commit_staged` per touched table sequentially, then the publisher commits the manifest. Lance has @@ -197,8 +197,8 @@ contention exceeding `PUBLISHER_RETRY_BUDGET = 5` retries. `Omnigraph::open` (typically a server restart), subsequent writers on the affected tables surface `ManifestConflictDetails::ExpectedVersionMismatch`. Continuous -in-process recovery (no restart required) arrives with MR-856 -(background recovery reconciler). +in-process recovery (no restart required) is the goal of a future +background reconciler. The publisher-CAS contract is unchanged: a *concurrent writer* that advances any of our touched tables between snapshot capture and diff --git a/docs/storage.md b/docs/storage.md index daca1e8..825fbbe 100644 --- a/docs/storage.md +++ b/docs/storage.md @@ -63,7 +63,7 @@ flowchart TB nodes["nodes/{fnv1a64-hex}/
one dataset per node type"]:::l2 edges["edges/{fnv1a64-hex}/
one dataset per edge type"]:::l2 cgraph["_graph_commits.lance/
_graph_commit_actors.lance/
_graph_commit_recoveries.lance/"]:::l2 - recovery["__recovery/{ulid}.json
MR-847 sidecars (transient)"]:::l2 + recovery["__recovery/{ulid}.json
recovery sidecars (transient)"]:::l2 refs["_refs/branches/{name}.json
graph-level branches"]:::l2 repo --> manifest @@ -92,8 +92,8 @@ flowchart TB - **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here. - **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe. - **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 repos also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; MR-770 sweeps these in production.) -- **`_graph_commit_recoveries.lance`** (MR-847) — one row per recovery sweep action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join. See `crates/omnigraph/src/db/recovery_audit.rs`. -- **`__recovery/{ulid}.json`** (MR-847) — transient sidecar files written by the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) before Phase B begins, deleted after Phase C succeeds. A sidecar persisting after process exit means the writer crashed in the Phase B → Phase C window; the next `Omnigraph::open` recovery sweep processes it. Steady-state directory is empty. See `crates/omnigraph/src/db/manifest/recovery.rs`. +- **`_graph_commit_recoveries.lance`** — one row per recovery sweep action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join. See `crates/omnigraph/src/db/recovery_audit.rs`. +- **`__recovery/{ulid}.json`** — transient sidecar files written by the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) before Phase B begins, deleted after Phase C succeeds. A sidecar persisting after process exit means the writer crashed in the Phase B → Phase C window; the next `Omnigraph::open` recovery sweep processes it. Steady-state directory is empty. See `crates/omnigraph/src/db/manifest/recovery.rs`. - **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads. - **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags. diff --git a/docs/testing.md b/docs/testing.md index 675491a..a53c447 100644 --- a/docs/testing.md +++ b/docs/testing.md @@ -32,8 +32,8 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav | `export.rs` | NDJSON streaming export filters | | `s3_storage.rs` | S3-backed repo (skipped unless `OMNIGRAPH_S3_TEST_BUCKET` is set) | | `lance_version_columns.rs` | Per-row `_row_last_updated_at_version` behavior | -| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the four MR-847 per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`). | -| `recovery.rs` | MR-847 open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via `ManifestBatchPublisher::publish`, roll-back via `Dataset::restore`, audit row in `_graph_commit_recoveries.lance`, `OpenMode::ReadOnly` skip path | +| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the four per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`). | +| `recovery.rs` | Open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via `ManifestBatchPublisher::publish`, roll-back via `Dataset::restore`, audit row in `_graph_commit_recoveries.lance`, `OpenMode::ReadOnly` skip path | ## Fixtures