mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
recovery: rename composite test, strip ticket references, address review
Three bundled changes:
1. Rename `tests/agent_lifecycle.rs` -> `tests/composite_flow.rs` (and
the test function). OmniGraph is consumed by both humans and agents
- naming the test after one audience misframes the library.
2. Strip Linear ticket IDs, PR numbers, bot reviewer names, and
review-round labels from source, tests, and docs added by this
branch. Internal traceability belongs in commit messages and PR
descriptions, not in checked-in artifacts. Upstream
lance-format/lance issue refs and pre-existing MR-XXX refs in docs
not touched by this branch are left alone.
3. Two outstanding review findings addressed:
- `needs_index_work_node` / `needs_index_work_edge`: propagate
`count_rows` errors instead of `unwrap_or(0)`. Silently treating
transient I/O failures as "0 rows" risked skipping a table from
the recovery sidecar pin set that was actually about to be
modified.
- `recovery_multi_sidecar_requires_fresh_snapshot_for_correctness`:
strengthen the assertion to fail when sidecar B classifies under
a stale snapshot. The new assertion checks post-recovery Lance
HEAD == v3 (no `Dataset::restore` ran). The previous "sidecar
deleted + audit rows present" pair passed in both the bug and
fix paths because both delete the sidecar and write an audit
row; the differentiator is the post-recovery HEAD. Strengthening
the assertion exposed an additional nuance: in this overlapping-
sidecar scenario sidecar B's audit kind is RolledBack (no-op)
rather than RolledForward, since sidecar A's roll-forward
publishes Lance HEAD as the new manifest pin (absorbing B's
work). The docstring now explains why this is correct given
current `roll_forward_all` semantics.
All workspace tests pass with --features failpoints.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
78cc548846
commit
05e52f2ee0
22 changed files with 430 additions and 374 deletions
|
|
@ -198,7 +198,7 @@ omnigraph policy explain --actor act-alice --action change --branch main
|
|||
| Columnar storage on object store | ✅ Arrow/Lance | URI normalization, S3 env-var plumbing |
|
||||
| Per-dataset versioning + time travel | ✅ | `snapshot_at_version`, `entity_at`, snapshot-pinned reads across many tables |
|
||||
| Per-dataset branches | ✅ | **Graph-level** branches (atomic across all sub-tables), lazy fork, system branch filtering |
|
||||
| Atomic single-dataset commits | ✅ | **Multi-table publish via three layers**, NOT a single Lance primitive: (1) per-table Lance `commit_staged` for the data write, (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering, (3) recovery-on-open reconciler (MR-847) for the residual gap between (1) and (2). All three layers ship; the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) write a `__recovery/{ulid}.json` sidecar before Phase B and delete it after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the sweep in `db/manifest/recovery.rs`: classify, decide all-or-nothing per sidecar, roll forward via single `ManifestBatchPublisher::publish` or roll back via `Dataset::restore`, and record an audit row in `_graph_commit_recoveries.lance` (queryable via `omnigraph commit list --filter actor=omnigraph:recovery`). Continuous in-process recovery (no restart needed between Phase B failure and recovery) arrives with [MR-856](https://linear.app/modernrelay/issue/MR-856) (background reconciler). Engine writes route through a sealed `TableStorage` trait (MR-793) exposing `stage_*` + `commit_staged` as the canonical staged-write surface; documented inline-commit residuals (`delete_where`, `create_vector_index`, plus legacy `append_batch` / `merge_insert_batches` / `overwrite_batch` / `create_*_index` pending Phase 9) remain on the trait until upstream Lance ships a public two-phase API ([#6658](https://github.com/lance-format/lance/issues/6658), [#6666](https://github.com/lance-format/lance/issues/6666)) and call-site conversion (Phase 1b) completes. |
|
||||
| Atomic single-dataset commits | ✅ | **Multi-table publish via three layers**, NOT a single Lance primitive: (1) per-table Lance `commit_staged` for the data write, (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering, (3) the open-time recovery sweep for the residual gap between (1) and (2). All three layers ship; the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) write a `__recovery/{ulid}.json` sidecar before Phase B and delete it after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the sweep in `db/manifest/recovery.rs`: classify, decide all-or-nothing per sidecar, roll forward via single `ManifestBatchPublisher::publish` or roll back via `Dataset::restore`, and record an audit row in `_graph_commit_recoveries.lance` (queryable via `omnigraph commit list --filter actor=omnigraph:recovery`). Continuous in-process recovery (no restart needed between Phase B failure and recovery) is the goal of a future background reconciler. Engine writes route through a sealed `TableStorage` trait exposing `stage_*` + `commit_staged` as the canonical staged-write surface; documented inline-commit residuals (`delete_where`, `create_vector_index`, plus legacy `append_batch` / `merge_insert_batches` / `overwrite_batch` / `create_*_index`) remain on the trait until upstream Lance ships a public two-phase API ([#6658](https://github.com/lance-format/lance/issues/6658), [#6666](https://github.com/lance-format/lance/issues/6666)) and the migration of every call site completes. |
|
||||
| Compaction (`compact_files`) | ✅ | `omnigraph optimize` orchestrates over all node/edge tables, bounded concurrency |
|
||||
| Cleanup (`cleanup_old_versions`) | ✅ | `omnigraph cleanup` with `--keep` / `--older-than` policy |
|
||||
| BTREE / inverted (FTS) / vector indexes | ✅ | `ensure_indices` builds them on every relevant column; idempotent; lazy across branches |
|
||||
|
|
|
|||
|
|
@ -1,8 +1,8 @@
|
|||
//! MR-847 — Recovery-on-open primitives.
|
||||
//! Recovery-on-open primitives.
|
||||
//!
|
||||
//! This module implements the building blocks of the per-sidecar recovery
|
||||
//! sweep that closes the documented Phase B → Phase C residual (see
|
||||
//! `docs/runs.md` "Finalize → publisher residual"). The high-level shape:
|
||||
//! `docs/runs.md` "Open-time recovery sweep"). The high-level shape:
|
||||
//!
|
||||
//! 1. Each writer that performs a multi-table commit writes a small JSON
|
||||
//! sidecar at `__recovery/{ulid}.json` BEFORE its per-table
|
||||
|
|
@ -17,11 +17,6 @@
|
|||
//! `post_commit_pin` AND matches the sidecar) or rolls back all
|
||||
//! `RolledPastExpected` tables to `expected_version`.
|
||||
//!
|
||||
//! Phase 2 (this commit) ships only the primitives: sidecar I/O,
|
||||
//! classifier, decision dispatcher, restore-with-fragment-set-shortcut.
|
||||
//! No integration into `Omnigraph::open` or any writer yet — those land
|
||||
//! in Phase 3+.
|
||||
//!
|
||||
//! ## Verified Lance behavior the rollback path depends on
|
||||
//!
|
||||
//! - `Dataset::restore()` takes no version arg; restores
|
||||
|
|
@ -33,12 +28,10 @@
|
|||
//! CreateIndex/Merge — see `check_restore_txn` at lance-4.0.0
|
||||
//! `src/io/commit/conflict_resolver.rs:986`. The hazard is documented
|
||||
//! by `tests/staged_writes.rs::lance_restore_loses_to_concurrent_append_via_orphaning`.
|
||||
//! MR-847 sidesteps this by running recovery only at `Omnigraph::open`
|
||||
//! (before any other writers can race); MR-856's continuous-recovery
|
||||
//! reconciler must guard via per-(table_key, branch) queue acquisition
|
||||
//! once MR-686 lands.
|
||||
//!
|
||||
//! See `.context/mr-847-design.md` for the full design.
|
||||
//! This module sidesteps the hazard by running recovery only at
|
||||
//! `Omnigraph::open` (before any other writers can race). A future
|
||||
//! continuous in-process recovery reconciler will need to guard via
|
||||
//! per-(table_key, branch) queue acquisition.
|
||||
|
||||
use std::collections::HashMap;
|
||||
|
||||
|
|
@ -271,7 +264,7 @@ pub(crate) async fn list_sidecars(
|
|||
// === chronologically sortable; the older sidecar is processed
|
||||
// before the newer one. Without this sort, `list_dir` returns
|
||||
// filesystem-order results which are nondeterministic and can mask
|
||||
// ordering-sensitive bugs. (PR #72 review.)
|
||||
// ordering-sensitive bugs.
|
||||
uris.sort();
|
||||
let mut out = Vec::with_capacity(uris.len());
|
||||
for uri in uris {
|
||||
|
|
@ -335,8 +328,7 @@ pub(crate) fn parse_sidecar(sidecar_uri: &str, body: &str) -> Result<RecoverySid
|
|||
/// `pin.expected_version == manifest_pinned` (the writer's CAS
|
||||
/// target matches what the manifest currently shows). The risk this
|
||||
/// admits — an external agent advancing HEAD between sidecar write
|
||||
/// and recovery — is out of scope for the single-coordinator model
|
||||
/// (MR-668 territory).
|
||||
/// and recovery — is out of scope for the single-coordinator model.
|
||||
pub(crate) fn classify_table(
|
||||
pin: &SidecarTablePin,
|
||||
lance_head: u64,
|
||||
|
|
@ -451,12 +443,11 @@ fn fragment_ids(ds: &Dataset) -> Vec<u64> {
|
|||
/// in [`restore_table_to_version`] prevents version pile-up under
|
||||
/// repeated mid-rollback crashes.
|
||||
///
|
||||
/// Concurrency: today (pre-MR-686) recovery runs synchronously in
|
||||
/// `Omnigraph::open` *before* the engine is wrapped in the server's
|
||||
/// `Arc<RwLock<Omnigraph>>`. No request handlers can race. Under MR-686
|
||||
/// + MR-856 (background reconciler) the per-(table_key, branch) queues
|
||||
/// will need acquisition before the sweep restores or publishes — see
|
||||
/// `.context/mr-847-design.md` "Concurrency policy" §"After MR-686".
|
||||
/// Concurrency: today recovery runs synchronously in `Omnigraph::open`
|
||||
/// *before* the engine is wrapped in the server's `Arc<RwLock<Omnigraph>>`.
|
||||
/// No request handlers can race. A future per-(table_key, branch) writer
|
||||
/// queue model (paired with a background reconciler) will need to acquire
|
||||
/// queues before the sweep restores or publishes.
|
||||
pub(crate) async fn recover_manifest_drift(
|
||||
root_uri: &str,
|
||||
storage: &dyn StorageAdapter,
|
||||
|
|
@ -467,12 +458,12 @@ pub(crate) async fn recover_manifest_drift(
|
|||
return Ok(());
|
||||
}
|
||||
|
||||
// PR #72 review (chatgpt-codex + cubic): refresh the coordinator
|
||||
// snapshot BEFORE each sidecar's classification. Sidecar N's
|
||||
// roll-forward writes manifest changes that sidecar N+1 must
|
||||
// observe, otherwise sidecar N+1 classifies its tables against
|
||||
// stale pins and may incorrectly roll back work that landed
|
||||
// moments earlier. Refresh is cheap (one Lance manifest read).
|
||||
// Refresh the coordinator snapshot BEFORE each sidecar's
|
||||
// classification. Sidecar N's roll-forward writes manifest changes
|
||||
// that sidecar N+1 must observe, otherwise sidecar N+1 classifies
|
||||
// its tables against stale pins and may incorrectly roll back work
|
||||
// that landed moments earlier. Refresh is cheap (one Lance manifest
|
||||
// read).
|
||||
for sidecar in sidecars {
|
||||
coordinator.refresh().await?;
|
||||
let snapshot = coordinator.snapshot();
|
||||
|
|
@ -896,13 +887,12 @@ mod tests {
|
|||
);
|
||||
}
|
||||
|
||||
/// PR #72 review (cubic + cursor) flagged that BranchMerge is in
|
||||
/// the strict classifier set, but `publish_rewritten_merge_table`
|
||||
/// runs multiple `commit_staged` calls per table (merge_insert +
|
||||
/// delete_where + index rebuilds — the comment in `merge.rs`
|
||||
/// explicitly says so). Strict classification rolls back valid
|
||||
/// completed Phase B work as `UnexpectedMultistep`. BranchMerge
|
||||
/// must be loose-matched like SchemaApply / EnsureIndices.
|
||||
/// BranchMerge must be loose-matched, not strict: while the strict
|
||||
/// classifier expects exactly one `commit_staged` per table,
|
||||
/// `publish_rewritten_merge_table` runs multiple per table
|
||||
/// (merge_insert + delete_where + index rebuilds — the comment in
|
||||
/// `merge.rs` explicitly says so). Strict classification would roll
|
||||
/// back valid completed Phase B work as `UnexpectedMultistep`.
|
||||
#[test]
|
||||
fn classify_loose_match_accepts_multi_commit_drift_for_branch_merge() {
|
||||
let pin = make_pin("node:Person", "irrelevant", 5, 6);
|
||||
|
|
@ -1091,12 +1081,11 @@ mod tests {
|
|||
assert!(result.is_empty());
|
||||
}
|
||||
|
||||
/// PR #72 review (cubic) flagged that `list_dir` returns
|
||||
/// filesystem-order results, making sidecar processing
|
||||
/// nondeterministic. Sidecar filenames are ULIDs (lexicographically
|
||||
/// sortable === chronologically sortable), so sorting by URI gives
|
||||
/// deterministic, time-ordered processing — the older sidecar
|
||||
/// processed before the newer one.
|
||||
/// `list_dir` returns filesystem-order results, which would make
|
||||
/// sidecar processing nondeterministic. Sidecar filenames are ULIDs
|
||||
/// (lexicographically sortable === chronologically sortable), so
|
||||
/// sorting by URI gives deterministic, time-ordered processing —
|
||||
/// the older sidecar processed before the newer one.
|
||||
#[tokio::test]
|
||||
async fn list_sidecars_returns_deterministic_order() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
|
|
|||
|
|
@ -81,17 +81,15 @@ pub struct Omnigraph {
|
|||
pub(crate) audit_actor_id: Option<String>,
|
||||
}
|
||||
|
||||
/// Whether [`Omnigraph::open`] runs the MR-847 recovery sweep.
|
||||
/// Whether [`Omnigraph::open`] runs the open-time recovery sweep.
|
||||
///
|
||||
/// Recovery requires Lance writes (`Dataset::restore`, `ManifestBatchPublisher::publish`).
|
||||
/// Read-only consumers — NDJSON export, `commit list`, `read`, schema
|
||||
/// inspection — should not trigger writes (they may run with read-only
|
||||
/// object-store credentials, and silent open-time mutations are surprising).
|
||||
/// They also don't need recovery: reads always resolve through the manifest
|
||||
/// pin, which is the consistent snapshot regardless of any Phase B → Phase C
|
||||
/// drift on the per-table side.
|
||||
///
|
||||
/// See `.context/mr-847-design.md` § "Read-only opens".
|
||||
/// object-store credentials, and silent open-time mutations are
|
||||
/// surprising). They also don't need recovery: reads always resolve
|
||||
/// through the manifest pin, which is the consistent snapshot regardless
|
||||
/// of any Phase B → Phase C drift on the per-table side.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub enum OpenMode {
|
||||
/// Run the recovery sweep on open. Default for `Omnigraph::open`.
|
||||
|
|
@ -142,13 +140,13 @@ impl Omnigraph {
|
|||
/// Open an existing repo (read-write).
|
||||
///
|
||||
/// Reads `_schema.pg`, parses it, builds the catalog, and opens `__manifest`.
|
||||
/// Runs the MR-847 recovery sweep before returning — see [`OpenMode`].
|
||||
/// Runs the open-time recovery sweep before returning — see [`OpenMode`].
|
||||
pub async fn open(uri: &str) -> Result<Self> {
|
||||
Self::open_with_storage_and_mode(uri, storage_for_uri(uri)?, OpenMode::ReadWrite).await
|
||||
}
|
||||
|
||||
/// Open an existing repo for read-only consumers (NDJSON export,
|
||||
/// `commit list`, etc.). Skips the MR-847 recovery sweep — see [`OpenMode`].
|
||||
/// `commit list`, etc.). Skips the recovery sweep — see [`OpenMode`].
|
||||
pub async fn open_read_only(uri: &str) -> Result<Self> {
|
||||
Self::open_with_storage_and_mode(uri, storage_for_uri(uri)?, OpenMode::ReadOnly).await
|
||||
}
|
||||
|
|
@ -171,22 +169,23 @@ impl Omnigraph {
|
|||
// Open the coordinator first so the schema-staging recovery sweep can
|
||||
// compare its snapshot against any leftover staging files.
|
||||
let mut coordinator = GraphCoordinator::open(&root, Arc::clone(&storage)).await?;
|
||||
// Both the schema-state recovery sweep AND the MR-847 recovery sweep
|
||||
// are gated on `OpenMode::ReadWrite`. Read-only consumers (NDJSON
|
||||
// export, `commit list`, schema show) shouldn't trigger object-store
|
||||
// mutations: they may run with read-only credentials, and silent
|
||||
// open-time writes are surprising. Both sweeps' work is recoverable
|
||||
// on the next ReadWrite open, so skipping under ReadOnly doesn't
|
||||
// lose any safety guarantees — the manifest pin is the consistent
|
||||
// snapshot regardless of drift on the per-table side or leftover
|
||||
// schema-staging files.
|
||||
// Both the schema-state recovery sweep AND the manifest-drift
|
||||
// recovery sweep are gated on `OpenMode::ReadWrite`. Read-only
|
||||
// consumers (NDJSON export, `commit list`, schema show) shouldn't
|
||||
// trigger object-store mutations: they may run with read-only
|
||||
// credentials, and silent open-time writes are surprising. Both
|
||||
// sweeps' work is recoverable on the next ReadWrite open, so
|
||||
// skipping under ReadOnly doesn't lose any safety guarantees —
|
||||
// the manifest pin is the consistent snapshot regardless of
|
||||
// drift on the per-table side or leftover schema-staging files.
|
||||
if matches!(mode, OpenMode::ReadWrite) {
|
||||
recover_schema_state_files(&root, Arc::clone(&storage), &coordinator.snapshot())
|
||||
.await?;
|
||||
// MR-847 recovery sweep: close the Phase B → Phase C residual on
|
||||
// Recovery sweep: close the Phase B → Phase C residual on
|
||||
// any sidecar left over from a crashed writer. Continuous
|
||||
// in-process recovery for long-running servers is MR-856
|
||||
// (background reconciler).
|
||||
// in-process recovery for long-running servers (no restart
|
||||
// required between Phase B failure and recovery) is a
|
||||
// separate background-reconciler effort.
|
||||
crate::db::manifest::recover_manifest_drift(
|
||||
&root,
|
||||
storage.as_ref(),
|
||||
|
|
@ -259,27 +258,25 @@ impl Omnigraph {
|
|||
|
||||
/// Engine-facing trait surface around `TableStore`.
|
||||
///
|
||||
/// MR-793 Phase 1: this is the canonical accessor for newly-written
|
||||
/// engine code. The trait's signatures use opaque `SnapshotHandle` /
|
||||
/// `StagedHandle` instead of leaking `lance::Dataset` /
|
||||
/// This is the canonical accessor for newly-written engine code. The
|
||||
/// trait's signatures use opaque `SnapshotHandle` / `StagedHandle`
|
||||
/// instead of leaking `lance::Dataset` /
|
||||
/// `lance::dataset::transaction::Transaction`. Existing call sites
|
||||
/// that still use `db.table_store.X(...)` (the inherent struct
|
||||
/// methods) are migrated incrementally — see §9 of
|
||||
/// `.context/mr-793-design.md`.
|
||||
/// methods) are migrated incrementally.
|
||||
pub(crate) fn storage(&self) -> &dyn crate::storage_layer::TableStorage {
|
||||
&self.table_store
|
||||
}
|
||||
|
||||
/// Engine-level access to the object-store adapter (S3 / local fs).
|
||||
/// Used by the MR-847 recovery sidecar protocol — writers in the
|
||||
/// engine call this to write/delete sidecars at `__recovery/{ulid}.json`.
|
||||
/// Used by the recovery sidecar protocol — writers in the engine
|
||||
/// call this to write/delete sidecars at `__recovery/{ulid}.json`.
|
||||
pub(crate) fn storage_adapter(&self) -> &dyn crate::storage::StorageAdapter {
|
||||
self.storage.as_ref()
|
||||
}
|
||||
|
||||
/// Engine-level access to the repo's normalized root URI. Used by
|
||||
/// the MR-847 recovery sidecar protocol to compute `__recovery/`
|
||||
/// paths.
|
||||
/// the recovery sidecar protocol to compute `__recovery/` paths.
|
||||
pub(crate) fn root_uri(&self) -> &str {
|
||||
&self.root_uri
|
||||
}
|
||||
|
|
|
|||
|
|
@ -151,7 +151,7 @@ pub(super) async fn apply_schema_with_lock(
|
|||
let mut table_updates = HashMap::<String, crate::db::SubTableUpdate>::new();
|
||||
let mut table_tombstones = HashMap::<String, u64>::new();
|
||||
|
||||
// MR-847 sidecar: protect the per-table commit_staged loop in
|
||||
// Recovery sidecar: protect the per-table commit_staged loop in
|
||||
// rewritten_tables + indexed_tables. The post_commit_pin we record
|
||||
// here is a lower bound (expected + 1); the classifier loose-matches
|
||||
// for SidecarKind::SchemaApply because the actual N depends on how
|
||||
|
|
@ -281,8 +281,8 @@ pub(super) async fn apply_schema_with_lock(
|
|||
)
|
||||
.await?;
|
||||
let dataset_uri = db.table_store.dataset_uri(&entry.table_path);
|
||||
// MR-793 Phase 6: route through stage_overwrite + commit_staged
|
||||
// for non-empty batches. Lance's `InsertBuilder::execute_uncommitted`
|
||||
// Route through stage_overwrite + commit_staged for non-empty
|
||||
// batches. Lance's `InsertBuilder::execute_uncommitted`
|
||||
// errors on empty data (lance-4.0.0 `src/dataset/write/insert.rs:144`),
|
||||
// so the empty-rewrite case stays on `overwrite_dataset` (which
|
||||
// accepts empty input). The empty case is rare in schema_apply
|
||||
|
|
@ -440,13 +440,13 @@ pub(super) async fn apply_schema_with_lock(
|
|||
db.invalidate_graph_index().await;
|
||||
}
|
||||
|
||||
// MR-847 sidecar lifecycle: delete after the manifest commit succeeded.
|
||||
// Best-effort: if this delete fails, the sidecar persists; on next open
|
||||
// the sweep sees every table at the post-publish manifest pin
|
||||
// (NoMovement) and the sidecar is treated as a stale artifact
|
||||
// (recovery is a no-op and the sidecar is cleaned up). Failing the
|
||||
// schema_apply call would report failure for a migration that
|
||||
// already succeeded (PR #72 review).
|
||||
// Recovery sidecar lifecycle: delete after the manifest commit
|
||||
// succeeded. Best-effort: if this delete fails, the sidecar persists
|
||||
// and on next open the sweep sees every table at the post-publish
|
||||
// manifest pin (NoMovement) and the sidecar is treated as a stale
|
||||
// artifact (recovery is a no-op and the sidecar is cleaned up).
|
||||
// Failing the schema_apply call would report failure for a migration
|
||||
// that already succeeded.
|
||||
if let Some(handle) = recovery_handle {
|
||||
if let Err(err) =
|
||||
crate::db::manifest::delete_sidecar(&handle, db.storage_adapter()).await
|
||||
|
|
@ -454,7 +454,7 @@ pub(super) async fn apply_schema_with_lock(
|
|||
tracing::warn!(
|
||||
error = %err,
|
||||
operation_id = handle.operation_id.as_str(),
|
||||
"MR-847 sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
"recovery sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -42,14 +42,14 @@ pub(super) async fn ensure_indices_for_branch(
|
|||
let mut updates = Vec::new();
|
||||
let active_branch = resolved.branch;
|
||||
|
||||
// MR-847 sidecar: protect the per-table commit_staged loop in
|
||||
// Recovery sidecar: protect the per-table commit_staged loop in
|
||||
// build_indices_on_dataset (one commit per index built). Only pins
|
||||
// tables that ACTUALLY need index work — the classifier
|
||||
// loose-matches for SidecarKind::EnsureIndices (the actual N
|
||||
// depends on which indices are missing), but if a table needs zero
|
||||
// commits and gets pinned, the all-or-nothing decision rule treats
|
||||
// it as `NoMovement` and rolls back legitimately-committed work on
|
||||
// sibling tables (PR #72 review). Steady-state runs (everything
|
||||
// commits and gets pinned, the all-or-nothing decision rule
|
||||
// classifies it as `NoMovement` and rolls back legitimately-
|
||||
// committed work on sibling tables. Steady-state runs (everything
|
||||
// already indexed) skip the sidecar entirely.
|
||||
let mut recovery_pins: Vec<crate::db::manifest::SidecarTablePin> = Vec::new();
|
||||
for type_name in db.catalog.node_types.keys() {
|
||||
|
|
@ -201,7 +201,7 @@ pub(super) async fn ensure_indices_for_branch(
|
|||
}
|
||||
}
|
||||
|
||||
// MR-847 failpoint: pin the per-writer Phase B → Phase C residual for
|
||||
// Failpoint: pin the per-writer Phase B → Phase C residual for
|
||||
// ensure_indices. Lance HEAD has advanced on every touched table
|
||||
// (one commit_staged per index built) but the manifest publish below
|
||||
// hasn't run. Used by
|
||||
|
|
@ -212,9 +212,10 @@ pub(super) async fn ensure_indices_for_branch(
|
|||
commit_prepared_updates_on_branch(db, branch, &updates).await?;
|
||||
}
|
||||
|
||||
// MR-847 sidecar lifecycle: delete after the manifest publish (or
|
||||
// no-op when there were no updates — sidecar covered the per-table
|
||||
// commit window regardless). Best-effort cleanup (PR #72 review).
|
||||
// Recovery sidecar lifecycle: delete after the manifest publish (or
|
||||
// no-op when there were no updates — the sidecar covered the
|
||||
// per-table commit window regardless). Best-effort cleanup; failing
|
||||
// the user here would error a call that already succeeded.
|
||||
if let Some(handle) = recovery_handle {
|
||||
if let Err(err) =
|
||||
crate::db::manifest::delete_sidecar(&handle, db.storage_adapter()).await
|
||||
|
|
@ -222,7 +223,7 @@ pub(super) async fn ensure_indices_for_branch(
|
|||
tracing::warn!(
|
||||
error = %err,
|
||||
operation_id = handle.operation_id.as_str(),
|
||||
"MR-847 sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
"recovery sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
@ -241,8 +242,8 @@ pub(super) async fn ensure_indices_for_branch(
|
|||
/// Per the actual `build_indices_on_dataset_for_catalog` implementation
|
||||
/// (this file, ~line 419-491), nodes get BTree (id) + per-prop FTS
|
||||
/// (@search String) + per-prop Vector indices; edges get BTree only
|
||||
/// (id, src, dst). The two helpers mirror that asymmetry — see PR #72
|
||||
/// round-2 review and the `needs_index_work_edge` doc comment.
|
||||
/// (id, src, dst). The two helpers mirror that asymmetry — see the
|
||||
/// `needs_index_work_edge` doc comment.
|
||||
async fn needs_index_work_node(
|
||||
db: &Omnigraph,
|
||||
type_name: &str,
|
||||
|
|
@ -254,9 +255,14 @@ async fn needs_index_work_node(
|
|||
.table_store
|
||||
.open_dataset_head_for_write(table_key, full_path, table_branch)
|
||||
.await?;
|
||||
// Empty tables skipped by the ensure_indices loop — must not pin them
|
||||
// in the sidecar (PR #72 round-2 review).
|
||||
if db.table_store.count_rows(&ds, None).await.unwrap_or(0) == 0 {
|
||||
// Empty tables are skipped by the ensure_indices loop, so they must
|
||||
// not be pinned in the sidecar — pinning a table that produces zero
|
||||
// commits classifies as NoMovement on recovery and forces all-or-
|
||||
// nothing rollback of sibling tables' legitimate index work.
|
||||
// Errors from count_rows are propagated: silently treating them as
|
||||
// "0 rows" risks skipping a table that is actually about to be
|
||||
// modified.
|
||||
if db.table_store.count_rows(&ds, None).await? == 0 {
|
||||
return Ok(false);
|
||||
}
|
||||
if !db.table_store.has_btree_index(&ds, "id").await? {
|
||||
|
|
@ -292,9 +298,7 @@ async fn needs_index_work_node(
|
|||
/// BTree indices (id, src, dst) per `build_indices_on_dataset_for_catalog`
|
||||
/// at the edge branch (this file, lines 474-485). FTS / vector indices
|
||||
/// on edge properties are not built today; if they ever are, this
|
||||
/// helper plus the build function must be updated together. (PR #72
|
||||
/// round-1 cursor finding flagged the FTS/vector omission as a
|
||||
/// possible inconsistency — confirmed intentional.)
|
||||
/// helper plus the build function must be updated together.
|
||||
///
|
||||
/// Empty edge tables are skipped by the ensure_indices loop the same
|
||||
/// way node tables are; see `needs_index_work_node`.
|
||||
|
|
@ -308,7 +312,7 @@ async fn needs_index_work_edge(
|
|||
.table_store
|
||||
.open_dataset_head_for_write(table_key, full_path, table_branch)
|
||||
.await?;
|
||||
if db.table_store.count_rows(&ds, None).await.unwrap_or(0) == 0 {
|
||||
if db.table_store.count_rows(&ds, None).await? == 0 {
|
||||
return Ok(false);
|
||||
}
|
||||
Ok(!db.table_store.has_btree_index(&ds, "id").await?
|
||||
|
|
@ -463,14 +467,14 @@ pub(super) async fn build_indices_on_dataset_for_catalog(
|
|||
}
|
||||
|
||||
if let Some(node_type) = catalog.node_types.get(type_name) {
|
||||
// Per MR-793 §10 OQ3: stage scalar indices first (BTree,
|
||||
// Inverted), then call `create_vector_index` inline. The
|
||||
// inline-commit on a vector index advances HEAD, which would
|
||||
// invalidate any uncommitted scalar index transactions if we
|
||||
// stacked them. Today the per-stage shape commits each
|
||||
// scalar index immediately so the order constraint is
|
||||
// implicit, but if we ever batch scalar stages we must
|
||||
// ensure they all land before the vector inline-commit.
|
||||
// Stage scalar indices first (BTree, Inverted), then call
|
||||
// `create_vector_index` inline. The inline-commit on a vector
|
||||
// index advances HEAD, which would invalidate any uncommitted
|
||||
// scalar index transactions if we stacked them. Today the
|
||||
// per-stage shape commits each scalar index immediately so
|
||||
// the order constraint is implicit, but if we ever batch
|
||||
// scalar stages we must ensure they all land before the
|
||||
// vector inline-commit.
|
||||
for index_cols in &node_type.indices {
|
||||
if index_cols.len() != 1 {
|
||||
continue;
|
||||
|
|
@ -526,13 +530,13 @@ pub(super) async fn build_indices_on_dataset_for_catalog(
|
|||
}
|
||||
|
||||
/// Stage a BTREE index transaction and commit it, advancing the in-memory
|
||||
/// `*ds` to the new HEAD. MR-793 Phase 4: replaces the previous
|
||||
/// inline-commit `create_btree_index(ds)` call with the staged primitive
|
||||
/// + an immediate `commit_staged`. Per-call behavior is unchanged
|
||||
/// (HEAD advances once per index), but the bytes-on-disk and HEAD-advance
|
||||
/// are now decoupled at the `TableStore` API surface — a caller that
|
||||
/// needs end-of-batch atomicity can stage many transactions and commit
|
||||
/// them in one pass (Phase 8's index reconciler relies on this).
|
||||
/// `*ds` to the new HEAD. The staged primitive + immediate `commit_staged`
|
||||
/// pair replaced the earlier inline-commit `create_btree_index(ds)` call.
|
||||
/// Per-call behavior is unchanged (HEAD advances once per index), but
|
||||
/// the bytes-on-disk and HEAD-advance are now decoupled at the
|
||||
/// `TableStore` API surface — a caller that needs end-of-batch atomicity
|
||||
/// can stage many transactions and commit them in one pass (the eventual
|
||||
/// index reconciler relies on this).
|
||||
async fn stage_and_commit_btree(
|
||||
db: &Omnigraph,
|
||||
table_key: &str,
|
||||
|
|
@ -568,7 +572,7 @@ async fn stage_and_commit_btree(
|
|||
}
|
||||
|
||||
/// Stage an INVERTED (FTS) index transaction and commit it. See
|
||||
/// `stage_and_commit_btree` for the MR-793 Phase 4 rationale.
|
||||
/// `stage_and_commit_btree` for the rationale.
|
||||
async fn stage_and_commit_inverted(
|
||||
db: &Omnigraph,
|
||||
table_key: &str,
|
||||
|
|
|
|||
|
|
@ -1,4 +1,4 @@
|
|||
//! MR-847 Phase 4 — Recovery audit row storage in `_graph_commit_recoveries.lance`.
|
||||
//! Recovery audit row storage in `_graph_commit_recoveries.lance`.
|
||||
//!
|
||||
//! Sibling to `_graph_commits.lance` (`commit_graph.rs`). Each successful
|
||||
//! recovery sweep — roll-forward or roll-back — records one row here so
|
||||
|
|
@ -6,12 +6,12 @@
|
|||
//! `omnigraph commit list --filter actor=omnigraph:recovery` with the
|
||||
//! original actor whose mutation was rolled forward / back.
|
||||
//!
|
||||
//! The schema-migration alternative (adding `recovery_for_actor` and
|
||||
//! `recovery_kind` columns to `_graph_commits.lance` itself) was
|
||||
//! considered and rejected for MR-847 — see `.context/mr-847-design.md`
|
||||
//! § "Recovery audit model". Sibling-table is additive, doesn't bump
|
||||
//! Sibling-table is additive: it doesn't bump
|
||||
//! `INTERNAL_MANIFEST_SCHEMA_VERSION`, and can be removed in favor of a
|
||||
//! schema migration later if the join cost matters.
|
||||
//! schema migration later if the join cost matters. The schema-migration
|
||||
//! alternative (adding `recovery_for_actor` and `recovery_kind` columns
|
||||
//! to `_graph_commits.lance` itself) was considered and rejected to keep
|
||||
//! this change additive.
|
||||
//!
|
||||
//! Atomicity caveat: append to `_graph_commit_recoveries.lance` is
|
||||
//! sequential w.r.t. the `CommitGraph::append_commit` write. A crash
|
||||
|
|
|
|||
|
|
@ -913,9 +913,9 @@ async fn publish_rewritten_merge_table(
|
|||
// Phase 1: merge_insert changed/new rows (preserves _row_created_at_version for
|
||||
// existing rows, bumps _row_last_updated_at_version only for actually-changed rows).
|
||||
//
|
||||
// MR-793 Phase 5: routed through the staged primitive so a failure
|
||||
// between writing fragments and committing leaves no Lance-HEAD
|
||||
// drift. The commit_staged here is per-table per-call (Lance has no
|
||||
// Routed through the staged primitive so a failure between writing
|
||||
// fragments and committing leaves no Lance-HEAD drift. The
|
||||
// commit_staged here is per-table per-call (Lance has no
|
||||
// multi-dataset atomic commit); the residual sits at this single
|
||||
// commit point, narrowed from the previous "merge_insert + delete +
|
||||
// index" multi-step inline-commit chain.
|
||||
|
|
@ -957,11 +957,11 @@ async fn publish_rewritten_merge_table(
|
|||
//
|
||||
// INLINE-COMMIT RESIDUAL: lance-4.0.0 does not expose a public
|
||||
// two-phase delete API (DeleteJob is `pub(crate)` —
|
||||
// lance-format/lance#6658 is open with no PRs). MR-793 deliberately
|
||||
// does NOT introduce a `stage_delete` wrapper that would secretly
|
||||
// inline-commit (a side-channel — see design doc §3.2). When the
|
||||
// upstream API ships, swap this `delete_where` call for
|
||||
// `stage_delete` + `commit_staged`.
|
||||
// lance-format/lance#6658 is open with no PRs). We deliberately do
|
||||
// NOT introduce a `stage_delete` wrapper that would secretly
|
||||
// inline-commit (it would create a side-channel between the staged
|
||||
// and inline write paths). When the upstream API ships, swap this
|
||||
// `delete_where` call for `stage_delete` + `commit_staged`.
|
||||
if !staged.deleted_ids.is_empty() {
|
||||
let escaped: Vec<String> = staged
|
||||
.deleted_ids
|
||||
|
|
@ -977,11 +977,11 @@ async fn publish_rewritten_merge_table(
|
|||
|
||||
// Phase 3: rebuild indices.
|
||||
//
|
||||
// `build_indices_on_dataset` was migrated in MR-793 Phase 4 to use
|
||||
// `stage_create_btree_index` / `stage_create_inverted_index` +
|
||||
// `commit_staged` for scalar indices. Vector indices remain inline
|
||||
// (residual — `build_index_metadata_from_segments` is `pub(crate)`
|
||||
// in lance-4.0.0; companion ticket to lance-format/lance#6658).
|
||||
// `build_indices_on_dataset` uses `stage_create_btree_index` /
|
||||
// `stage_create_inverted_index` + `commit_staged` for scalar
|
||||
// indices. Vector indices remain inline-commit
|
||||
// (`build_index_metadata_from_segments` is `pub(crate)` in lance-
|
||||
// 4.0.0 — companion ticket to lance-format/lance#6658).
|
||||
let row_count = target_db
|
||||
.table_store()
|
||||
.table_state(&full_path, ¤t_ds)
|
||||
|
|
@ -1167,13 +1167,14 @@ impl Omnigraph {
|
|||
|
||||
validate_merge_candidates(self, source_snapshot, &target_snapshot, &candidates).await?;
|
||||
|
||||
// MR-847 sidecar: protect the per-table commit_staged loop. Pins
|
||||
// every table that will be touched by `publish_adopted_source_state`
|
||||
// or `publish_rewritten_merge_table`. BranchMerge uses loose
|
||||
// classification — the publish path may run multiple commit_staged
|
||||
// calls per table (publish_rewritten_merge_table does
|
||||
// stage_merge_insert + delete_where + index rebuilds per the
|
||||
// existing branch-merge code path).
|
||||
// Recovery sidecar: protect the per-table commit_staged loop.
|
||||
// Pins every table that will be touched by
|
||||
// `publish_adopted_source_state` or `publish_rewritten_merge_table`.
|
||||
// BranchMerge uses loose classification — the publish path may
|
||||
// run multiple commit_staged calls per table
|
||||
// (publish_rewritten_merge_table does stage_merge_insert +
|
||||
// delete_where + index rebuilds per the existing branch-merge
|
||||
// code path).
|
||||
let recovery_pins: Vec<crate::db::manifest::SidecarTablePin> = ordered_table_keys
|
||||
.iter()
|
||||
.filter(|tk| candidates.contains_key(*tk))
|
||||
|
|
@ -1190,15 +1191,15 @@ impl Omnigraph {
|
|||
let recovery_handle = if recovery_pins.is_empty() {
|
||||
None
|
||||
} else {
|
||||
// PR #72 review (chatgpt-codex + cubic): use the merge target
|
||||
// branch directly, NOT a heuristic derived from
|
||||
// `ordered_table_keys.first()`. The first sorted table key may
|
||||
// not be in the target snapshot at all (its `entry()` returns
|
||||
// None → branch becomes None == main), and the SubTableEntry's
|
||||
// `table_branch` field isn't necessarily the merge target
|
||||
// branch. The caller `branch_merge` calls
|
||||
// `swap_coordinator_for_branch(target_branch)` before invoking
|
||||
// this function, so `self.active_branch()` is the target.
|
||||
// Use the merge target branch directly, NOT a heuristic
|
||||
// derived from `ordered_table_keys.first()`. The first
|
||||
// sorted table key may not be in the target snapshot at all
|
||||
// (its `entry()` returns None → branch becomes None == main),
|
||||
// and the SubTableEntry's `table_branch` field isn't
|
||||
// necessarily the merge target branch. The caller
|
||||
// `branch_merge` calls `swap_coordinator_for_branch(target_branch)`
|
||||
// before invoking this function, so `self.active_branch()`
|
||||
// is the target.
|
||||
let target_branch = self.active_branch().map(str::to_string);
|
||||
let sidecar = crate::db::manifest::new_sidecar(
|
||||
crate::db::manifest::SidecarKind::BranchMerge,
|
||||
|
|
@ -1244,10 +1245,10 @@ impl Omnigraph {
|
|||
updates.push(update);
|
||||
}
|
||||
|
||||
// MR-847 failpoint: pin the per-writer Phase B → Phase C residual
|
||||
// for branch_merge. Lance HEAD has advanced on every touched
|
||||
// table (publish_*) but the manifest publish below hasn't run.
|
||||
// Used by `tests/failpoints.rs::branch_merge_phase_b_failure_recovered_on_next_open`.
|
||||
// Failpoint: pin the per-writer Phase B → Phase C residual for
|
||||
// branch_merge. Lance HEAD has advanced on every touched table
|
||||
// (publish_*) but the manifest publish below hasn't run. Used
|
||||
// by `tests/failpoints.rs::branch_merge_phase_b_failure_recovered_on_next_open`.
|
||||
crate::failpoints::maybe_fail("branch_merge.post_phase_b_pre_manifest_commit")?;
|
||||
|
||||
let manifest_version = if updates.is_empty() {
|
||||
|
|
@ -1256,8 +1257,9 @@ impl Omnigraph {
|
|||
self.commit_manifest_updates(&updates).await?
|
||||
};
|
||||
|
||||
// MR-847 sidecar lifecycle: delete after manifest publish.
|
||||
// Best-effort cleanup (PR #72 review).
|
||||
// Recovery sidecar lifecycle: delete after manifest publish.
|
||||
// Best-effort cleanup; the merge already landed durably so
|
||||
// failing the user here is undesirable.
|
||||
if let Some(handle) = recovery_handle {
|
||||
if let Err(err) =
|
||||
crate::db::manifest::delete_sidecar(&handle, self.storage_adapter()).await
|
||||
|
|
@ -1265,7 +1267,7 @@ impl Omnigraph {
|
|||
tracing::warn!(
|
||||
error = %err,
|
||||
operation_id = handle.operation_id.as_str(),
|
||||
"MR-847 sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
"recovery sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -751,8 +751,8 @@ impl Omnigraph {
|
|||
// the publisher's CAS pre-check rejects (or the manifest
|
||||
// write throws) after staged commits succeeded. The
|
||||
// sidecar written inside `staging.finalize()` persists
|
||||
// across this failure so the next `Omnigraph::open`
|
||||
// (MR-847 recovery sweep) can roll forward — see
|
||||
// across this failure so the next `Omnigraph::open`'s
|
||||
// recovery sweep can roll forward — see
|
||||
// `tests/failpoints.rs::recovery_rolls_forward_after_finalize_publisher_failure`.
|
||||
crate::failpoints::maybe_fail("mutation.post_finalize_pre_publisher")?;
|
||||
self.commit_updates_on_branch_with_expected(
|
||||
|
|
@ -774,7 +774,7 @@ impl Omnigraph {
|
|||
// as `NoMovement` (manifest pin == Lance HEAD ==
|
||||
// post_commit_pin) and tidies up. Failing the user
|
||||
// here would return an error for a write that
|
||||
// already landed (PR #72 review).
|
||||
// already landed.
|
||||
if let Err(err) = crate::db::manifest::delete_sidecar(
|
||||
&handle,
|
||||
self.storage_adapter(),
|
||||
|
|
@ -784,7 +784,7 @@ impl Omnigraph {
|
|||
tracing::warn!(
|
||||
error = %err,
|
||||
operation_id = handle.operation_id.as_str(),
|
||||
"MR-847 sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
"recovery sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -238,12 +238,12 @@ impl MutationStaging {
|
|||
let mut updates: Vec<SubTableUpdate> =
|
||||
inline_committed.into_values().collect();
|
||||
|
||||
// MR-847 — sidecar protocol. Build the per-table pin list BEFORE
|
||||
// any Lance commit_staged runs, then write the sidecar so a crash
|
||||
// between Phase B (this loop's commit_staged calls) and Phase C
|
||||
// (the manifest publish in the caller) is recoverable on next
|
||||
// open. Skipped when `pending` is empty (delete-only mutation;
|
||||
// D₂ parse-time rule keeps deletes out of this code path so this
|
||||
// Sidecar protocol: build the per-table pin list BEFORE any Lance
|
||||
// commit_staged runs, then write the sidecar so a crash between
|
||||
// Phase B (this loop's commit_staged calls) and Phase C (the
|
||||
// manifest publish in the caller) is recoverable on next open.
|
||||
// Skipped when `pending` is empty (delete-only mutation; the D₂
|
||||
// parse-time rule keeps deletes out of this code path so this
|
||||
// branch is reached only for the inline-committed-only case).
|
||||
let pins: Vec<SidecarTablePin> = pending
|
||||
.iter()
|
||||
|
|
|
|||
|
|
@ -542,9 +542,10 @@ async fn load_jsonl_reader<R: BufRead>(
|
|||
.await?;
|
||||
db.commit_updates_on_branch_with_expected(branch, &updates, &expected_versions)
|
||||
.await?;
|
||||
// MR-847: sidecar protects the per-table commit_staged →
|
||||
// The recovery sidecar protects the per-table commit_staged →
|
||||
// manifest publish window. Phase C succeeded — clean up
|
||||
// best-effort (PR #72 review).
|
||||
// best-effort: failing the user here would error out a write
|
||||
// that already landed durably.
|
||||
if let Some(handle) = sidecar_handle {
|
||||
if let Err(err) =
|
||||
crate::db::manifest::delete_sidecar(&handle, db.storage_adapter()).await
|
||||
|
|
@ -552,18 +553,18 @@ async fn load_jsonl_reader<R: BufRead>(
|
|||
tracing::warn!(
|
||||
error = %err,
|
||||
operation_id = handle.operation_id.as_str(),
|
||||
"MR-847 sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
"recovery sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// LoadMode::Overwrite keeps the legacy inline-commit path —
|
||||
// truncate-then-append doesn't fit the staged shape (see
|
||||
// `docs/runs.md` "LoadMode::Overwrite residual"). MR-847 sidecar
|
||||
// is not applicable here because the writer doesn't go through
|
||||
// MutationStaging; per-table inline commits + a final manifest
|
||||
// publish handle their own residual via documented operator
|
||||
// workflow (re-run overwrite to recover).
|
||||
// `docs/runs.md` "LoadMode::Overwrite residual"). The recovery
|
||||
// sidecar is not applicable here because the writer doesn't go
|
||||
// through MutationStaging; per-table inline commits + a final
|
||||
// manifest publish handle their own residual via the documented
|
||||
// operator workflow (re-run overwrite to recover).
|
||||
db.commit_updates_on_branch_with_expected(
|
||||
branch,
|
||||
&overwrite_updates,
|
||||
|
|
|
|||
|
|
@ -64,10 +64,10 @@ impl StorageAdapter for LocalStorageAdapter {
|
|||
async fn write_text(&self, uri: &str, contents: &str) -> Result<()> {
|
||||
let path = local_path_from_uri(uri)?;
|
||||
// Ensure parent directory exists. S3 has no equivalent (PutObject
|
||||
// is path-agnostic). For local fs, callers like the MR-847
|
||||
// recovery sidecar protocol expect transparent directory
|
||||
// creation under the repo root (the `__recovery/` directory
|
||||
// doesn't pre-exist; first sidecar write creates it).
|
||||
// is path-agnostic). For local fs, callers like the recovery
|
||||
// sidecar protocol expect transparent directory creation under
|
||||
// the repo root (the `__recovery/` directory doesn't pre-exist;
|
||||
// first sidecar write creates it).
|
||||
if let Some(parent) = path.parent() {
|
||||
if !parent.as_os_str().is_empty() {
|
||||
tokio::fs::create_dir_all(parent).await?;
|
||||
|
|
|
|||
|
|
@ -1,15 +1,13 @@
|
|||
//! MR-858 — Composite agent-lifecycle integration test.
|
||||
//! Composite end-to-end flow integration test.
|
||||
//!
|
||||
//! Walks the canonical agent narrative end to end in one fixture:
|
||||
//! init → load → branch → mutate → query → merge → time-travel →
|
||||
//! optimize → cleanup → reopen. Every numbered step has at least one
|
||||
//! assertion.
|
||||
//! Walks the canonical user flow in one fixture: init → load → branch →
|
||||
//! mutate → query → merge → time-travel → optimize → cleanup → reopen.
|
||||
//! Every numbered step has at least one assertion.
|
||||
//!
|
||||
//! This is the **deterministic narrative** counterpart to MR-783's
|
||||
//! randomized/property-based reliability harness — the test that
|
||||
//! catches a regression where individual operations all work but their
|
||||
//! composition under realistic agent usage breaks. It runs in CI on
|
||||
//! every PR (no `#[ignore]`).
|
||||
//! This is the deterministic narrative counterpart to a randomized /
|
||||
//! property-based reliability harness — it catches regressions where
|
||||
//! individual operations all pass their unit tests but their composition
|
||||
//! breaks. It runs in CI on every PR (no `#[ignore]`).
|
||||
|
||||
mod helpers;
|
||||
|
||||
|
|
@ -27,7 +25,7 @@ const TEST_DATA: &str = include_str!("fixtures/test.jsonl");
|
|||
const TEST_QUERIES: &str = include_str!("fixtures/test.gq");
|
||||
|
||||
#[tokio::test]
|
||||
async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() {
|
||||
async fn composite_flow_init_load_branch_merge_time_travel_optimize_cleanup() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap();
|
||||
|
||||
|
|
@ -215,7 +213,7 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() {
|
|||
v_pre_merge_main,
|
||||
v_post_merge,
|
||||
);
|
||||
let _ = merge_outcome; // outcome is structured; presence of Ok already validates audit/merge_commit recorded
|
||||
let _ = merge_outcome;
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────
|
||||
// Step 8: query at the post-merge snapshot — verify both sides'
|
||||
|
|
@ -277,24 +275,19 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() {
|
|||
// Step 10: optimize the post-merge graph — verify indices stay
|
||||
// valid and queryable.
|
||||
//
|
||||
// **Known limitation** (uncovered by this composite test, surfaced
|
||||
// for follow-up in MR-859 `omnigraph optimize` + `cleanup` integration
|
||||
// coverage): `optimize_all_tables` (`db/omnigraph/optimize.rs:77`)
|
||||
// calls Lance `compact_files` directly — it advances per-table Lance
|
||||
// HEAD without updating the omnigraph `__manifest` pin. After
|
||||
// optimize, the next writer's expected_table_versions captures the
|
||||
// **Known limitation**: `optimize_all_tables` calls Lance
|
||||
// `compact_files` directly — it advances per-table Lance HEAD
|
||||
// without updating the omnigraph `__manifest` pin. After optimize,
|
||||
// the next writer's expected_table_versions captures the
|
||||
// pre-optimize manifest pin, but the publisher's pre-check reads
|
||||
// a higher version from the manifest dataset (because some other
|
||||
// path — possibly the schema-state recovery on reopen — wrote a
|
||||
// newer __manifest row). The `ExpectedVersionMismatch` is benign
|
||||
// (re-issuing the mutation after `db.refresh()` succeeds), but the
|
||||
// composite test cannot reliably exercise post-optimize mutations
|
||||
// until that path is investigated under MR-859.
|
||||
//
|
||||
// For this test we verify optimize completes and reads still work,
|
||||
// then SKIP the post-optimize mutation step. The full coverage
|
||||
// (mutation succeeds after optimize without manual refresh) lives in
|
||||
// the MR-859 follow-up.
|
||||
// path — possibly schema-state recovery on reopen — wrote a newer
|
||||
// __manifest row). The `ExpectedVersionMismatch` is benign
|
||||
// (re-issuing the mutation after a snapshot refresh succeeds), but
|
||||
// a composite test cannot reliably exercise post-optimize mutations
|
||||
// until that path is investigated. Coverage of post-optimize
|
||||
// mutations is left to a focused optimize+cleanup integration test.
|
||||
// ─────────────────────────────────────────────────────────────────
|
||||
let optimize_stats = db.optimize().await.unwrap();
|
||||
assert!(
|
||||
!optimize_stats.is_empty(),
|
||||
|
|
@ -324,9 +317,9 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() {
|
|||
// Step 11: cleanup — keep last 10 versions, only purge versions
|
||||
// older than 1 hour. With this small test, we have well under 10
|
||||
// versions and nothing that old, so cleanup is a no-op except for
|
||||
// any orphan files. The MR-847 recovery floor (--keep ≥ 3) is
|
||||
// preserved by the keep-10 default. Verify the call doesn't break
|
||||
// subsequent queries.
|
||||
// any orphan files. The recovery floor (--keep ≥ 3) needed for the
|
||||
// open-time recovery sweep is preserved by the keep-10 default.
|
||||
// Verify the call doesn't break subsequent queries.
|
||||
// ─────────────────────────────────────────────────────────────────
|
||||
use omnigraph::db::CleanupPolicyOptions;
|
||||
use std::time::Duration;
|
||||
|
|
@ -338,9 +331,6 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() {
|
|||
.await
|
||||
.unwrap();
|
||||
|
||||
// Recovery audit dataset, if present, must survive cleanup.
|
||||
// (No recovery happened in this test, so it may not exist.)
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────
|
||||
// Step 12: reopen the engine — verify post-cleanup state is consistent.
|
||||
// ─────────────────────────────────────────────────────────────────
|
||||
|
|
@ -366,7 +356,9 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() {
|
|||
);
|
||||
|
||||
// Final query exercise — full read path works post-reopen,
|
||||
// post-cleanup.
|
||||
// post-cleanup. Post-cleanup mutation is omitted here pending
|
||||
// resolution of the optimize-vs-manifest-pin interaction documented
|
||||
// in Step 10.
|
||||
let final_total = query_main(
|
||||
&mut db,
|
||||
TEST_QUERIES,
|
||||
|
|
@ -377,13 +369,6 @@ async fn agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup() {
|
|||
.unwrap();
|
||||
assert!(!final_total.batches().is_empty());
|
||||
|
||||
// Final mutation skipped — post-optimize mutation surfaces
|
||||
// `ExpectedVersionMismatch` because optimize advances Lance HEAD
|
||||
// without updating the manifest pin (see Step 10 note above and
|
||||
// MR-859 follow-up). The MR-859 ticket covers post-optimize
|
||||
// mutation correctness explicitly. This test asserts the read
|
||||
// path is intact post-cleanup-reopen, which is the more important
|
||||
// user-visible property.
|
||||
let final_total = query_main(
|
||||
&mut db,
|
||||
TEST_QUERIES,
|
||||
|
|
@ -140,8 +140,8 @@ async fn schema_apply_recovers_partial_rename() {
|
|||
assert_no_staging_files(dir.path());
|
||||
}
|
||||
|
||||
/// Prove the MR-847 recovery sweep closes the "finalize → publisher
|
||||
/// residual" across one open cycle — the post-MR-847 contract.
|
||||
/// Prove the recovery sweep closes the "finalize → publisher residual"
|
||||
/// across one open cycle.
|
||||
///
|
||||
/// `MutationStaging::finalize` runs `commit_staged` per touched table
|
||||
/// sequentially before the publisher commits the manifest. Lance has no
|
||||
|
|
@ -149,16 +149,15 @@ async fn schema_apply_recovers_partial_rename() {
|
|||
/// per-table staged commits and the manifest commit leaves Lance HEAD
|
||||
/// advanced on the touched tables with no manifest update.
|
||||
///
|
||||
/// Pre-MR-847: the next mutation surfaced `ExpectedVersionMismatch` and
|
||||
/// the residual persisted until process restart. Post-MR-847: the
|
||||
/// finalize writes a sidecar at `__recovery/{ulid}.json` BEFORE Phase B,
|
||||
/// the failpoint fires AFTER finalize but BEFORE the publisher, the
|
||||
/// engine handle is dropped, and the next `Omnigraph::open` runs the
|
||||
/// recovery sweep. The sweep classifies every table in the sidecar as
|
||||
/// `RolledPastExpected` (Lance HEAD == expected + 1, post_commit_pin
|
||||
/// matches), decides RollForward, atomically extends every manifest pin
|
||||
/// via `ManifestBatchPublisher::publish`, records an audit row, and
|
||||
/// deletes the sidecar.
|
||||
/// Closing the residual: finalize writes a sidecar at
|
||||
/// `__recovery/{ulid}.json` BEFORE Phase B, the failpoint fires AFTER
|
||||
/// finalize but BEFORE the publisher, the engine handle is dropped, and
|
||||
/// the next `Omnigraph::open` runs the recovery sweep. The sweep
|
||||
/// classifies every table in the sidecar as `RolledPastExpected` (Lance
|
||||
/// HEAD == expected + 1, post_commit_pin matches), decides RollForward,
|
||||
/// atomically extends every manifest pin via
|
||||
/// `ManifestBatchPublisher::publish`, records an audit row, and deletes
|
||||
/// the sidecar.
|
||||
///
|
||||
/// After this test passes:
|
||||
/// - The originally-attempted insert ("Eve") is visible via a normal
|
||||
|
|
@ -169,7 +168,7 @@ async fn schema_apply_recovers_partial_rename() {
|
|||
/// `actor_id` in `recovery_for_actor`.
|
||||
///
|
||||
/// Continuous in-process recovery (no restart needed between failure
|
||||
/// and recovery) is MR-856 (background reconciler).
|
||||
/// and recovery) is the goal of a future background reconciler.
|
||||
#[tokio::test]
|
||||
async fn recovery_rolls_forward_after_finalize_publisher_failure() {
|
||||
let _scenario = FailScenario::setup();
|
||||
|
|
@ -303,11 +302,11 @@ async fn finalize_publisher_residual_does_not_drift_untouched_tables() {
|
|||
.expect("Company write on a non-drifted table should succeed");
|
||||
}
|
||||
|
||||
/// MR-793 Phase 4 acceptance bar — proves that a Phase A failure in
|
||||
/// the staged-index path (`stage_create_btree_index` succeeded;
|
||||
/// `commit_staged` not yet called) leaves NO Lance-HEAD drift on the
|
||||
/// existing tables. Subsequent operations against those tables succeed
|
||||
/// without `ExpectedVersionMismatch`.
|
||||
/// Acceptance test: a Phase A failure in the staged-index path
|
||||
/// (`stage_create_btree_index` succeeded; `commit_staged` not yet
|
||||
/// called) leaves NO Lance-HEAD drift on the existing tables.
|
||||
/// Subsequent operations against those tables succeed without
|
||||
/// `ExpectedVersionMismatch`.
|
||||
///
|
||||
/// Path: `apply_schema(v1 → v2)` adds a new node type. The
|
||||
/// `added_tables` loop in `schema_apply` creates the empty dataset and
|
||||
|
|
@ -384,14 +383,15 @@ fn assert_no_staging_files(repo: &std::path::Path) {
|
|||
}
|
||||
|
||||
// =====================================================================
|
||||
// MR-847 Phase 9 — per-writer Phase B → Phase C recovery integration
|
||||
// Per-writer Phase B → Phase C recovery integration
|
||||
// =====================================================================
|
||||
//
|
||||
// Each of the four migrated writers writes a sidecar BEFORE its per-table
|
||||
// commit_staged loop and deletes it AFTER the manifest publish. The
|
||||
// `recovery_rolls_forward_after_finalize_publisher_failure` test above
|
||||
// covers MutationStaging::finalize. The three tests below cover the
|
||||
// other three writers: schema_apply, branch_merge, ensure_indices.
|
||||
// Each of the four migrated writers writes a sidecar BEFORE its
|
||||
// per-table commit_staged loop and deletes it AFTER the manifest
|
||||
// publish. The `recovery_rolls_forward_after_finalize_publisher_failure`
|
||||
// test above covers MutationStaging::finalize. The three tests below
|
||||
// cover the other three writers: schema_apply, branch_merge,
|
||||
// ensure_indices.
|
||||
//
|
||||
// Each follows the same shape: trigger the writer with a failpoint
|
||||
// active in the Phase B → Phase C window, drop the engine, reopen,
|
||||
|
|
@ -424,7 +424,7 @@ async fn schema_apply_phase_b_failure_recovered_on_next_open() {
|
|||
// Phase A: trigger the residual via `schema_apply.after_staging_write`.
|
||||
// This failpoint fires AFTER the rewritten_tables/indexed_tables loops
|
||||
// (Lance HEAD advanced) AND AFTER the schema-state staging files are
|
||||
// written, but BEFORE the manifest publish. The MR-847 sidecar persists.
|
||||
// written, but BEFORE the manifest publish. The recovery sidecar persists.
|
||||
{
|
||||
let mut db = Omnigraph::open(&uri).await.unwrap();
|
||||
let _failpoint = ScopedFailPoint::new("schema_apply.after_staging_write", "return");
|
||||
|
|
@ -434,7 +434,7 @@ async fn schema_apply_phase_b_failure_recovered_on_next_open() {
|
|||
// overall table set — required to keep `recover_schema_state_files`
|
||||
// (which runs BEFORE recover_manifest_drift) happy: it can't
|
||||
// disambiguate property-only migrations and would reject the
|
||||
// open before the MR-847 sweep ever ran.
|
||||
// open before the recovery sweep ever ran.
|
||||
let v2_schema = r#"node Person {
|
||||
name: String @key
|
||||
age: I32?
|
||||
|
|
@ -582,11 +582,11 @@ async fn branch_merge_phase_b_failure_recovered_on_next_open() {
|
|||
);
|
||||
}
|
||||
|
||||
/// PR #72 round-2 fix: `ensure_indices` only writes a sidecar when at
|
||||
/// least one table genuinely needs index work (per `needs_index_work_*`
|
||||
/// helpers in `db/omnigraph/table_ops.rs`). When all tables are
|
||||
/// steady-state (every declared index already built, or empty tables
|
||||
/// that the loop skips), the sidecar is omitted entirely.
|
||||
/// `ensure_indices` only writes a sidecar when at least one table
|
||||
/// genuinely needs index work (per `needs_index_work_*` helpers in
|
||||
/// `db/omnigraph/table_ops.rs`). When all tables are steady-state
|
||||
/// (every declared index already built, or empty tables that the loop
|
||||
/// skips), the sidecar is omitted entirely.
|
||||
///
|
||||
/// Test setup: `load_jsonl` auto-builds indices via
|
||||
/// `prepare_updates_for_commit`. So after the load, every Person/Knows
|
||||
|
|
@ -595,11 +595,11 @@ async fn branch_merge_phase_b_failure_recovered_on_next_open() {
|
|||
/// after the loops), so the call returns Err — but no recovery state
|
||||
/// persists. Reopen is a clean no-op.
|
||||
///
|
||||
/// (Triggering an actual sidecar persistence requires bypassing
|
||||
/// Triggering an actual sidecar persistence requires bypassing
|
||||
/// `load_jsonl`'s auto-build via raw `TableStore::append_batch` — the
|
||||
/// helper-direct path. That's covered structurally by the
|
||||
/// `needs_index_work_*` code review + the
|
||||
/// `recovery_ensure_indices_handles_empty_tables` integration test.)
|
||||
/// `needs_index_work_*` code path and the
|
||||
/// `recovery_ensure_indices_handles_empty_tables` integration test.
|
||||
#[tokio::test]
|
||||
async fn ensure_indices_phase_b_failure_does_not_leak_sidecar_when_no_work_needed() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
|
|
@ -625,8 +625,9 @@ async fn ensure_indices_phase_b_failure_does_not_leak_sidecar_when_no_work_neede
|
|||
}
|
||||
|
||||
// Phase A: trigger the failpoint. Steady-state ensure_indices
|
||||
// produces zero sidecar pins (per the round-2 fix); no sidecar is
|
||||
// written. The failpoint still fires, surfacing the Err.
|
||||
// produces zero sidecar pins (the helpers scope pins to tables
|
||||
// that genuinely need work); no sidecar is written. The failpoint
|
||||
// still fires, surfacing the Err.
|
||||
{
|
||||
let mut db = Omnigraph::open(&uri).await.unwrap();
|
||||
let _failpoint = ScopedFailPoint::new(
|
||||
|
|
@ -641,8 +642,8 @@ async fn ensure_indices_phase_b_failure_does_not_leak_sidecar_when_no_work_neede
|
|||
"unexpected error: {err}"
|
||||
);
|
||||
|
||||
// KEY ASSERTION: no sidecar persists, because the round-2 fix
|
||||
// scopes pins to tables that genuinely need work. Steady-state
|
||||
// KEY ASSERTION: no sidecar persists, because the helpers
|
||||
// scope pins to tables that genuinely need work. Steady-state
|
||||
// = no pins = no sidecar = no recovery state = zero open-time
|
||||
// overhead.
|
||||
let recovery_dir = dir.path().join("__recovery");
|
||||
|
|
|
|||
|
|
@ -1,4 +1,4 @@
|
|||
//! MR-793 Phase 3 — forbidden-API guard test.
|
||||
//! Forbidden-API guard test.
|
||||
//!
|
||||
//! Engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) MUST NOT
|
||||
//! call Lance's inline-commit data-write APIs directly. The
|
||||
|
|
@ -29,15 +29,15 @@
|
|||
//! the cross-table manifest commit. Documented exception.
|
||||
//! - `crates/omnigraph/src/storage_layer.rs` — IS the trait module.
|
||||
//!
|
||||
//! ## Initial state (MR-793 Phase 3)
|
||||
//! ## Transitional allow-list
|
||||
//!
|
||||
//! At the time this test was written, MR-793 has migrated three writers
|
||||
//! (ensure_indices, branch_merge, schema_apply rewrites) onto staged
|
||||
//! primitives. Other engine call sites (the bulk loader, exec/mutation,
|
||||
//! exec/query, etc.) still use the legacy inherent `TableStore` methods
|
||||
//! — they're not visible at the trait boundary, but they DO call lance
|
||||
//! types. The allow-list below reflects this transitional state. Phase 9
|
||||
//! tightens the allow-list as call sites migrate.
|
||||
//! The migration of writers onto staged primitives is incremental.
|
||||
//! Several writers (ensure_indices, branch_merge, schema_apply rewrites)
|
||||
//! already route through the staged primitives; others (bulk loader,
|
||||
//! exec/mutation, exec/query) still use the legacy inherent
|
||||
//! `TableStore` methods — they're not visible at the trait boundary, but
|
||||
//! they DO call lance types. The file-level allow-list below reflects
|
||||
//! this transitional state and tightens as call sites migrate.
|
||||
|
||||
use std::path::{Path, PathBuf};
|
||||
|
||||
|
|
@ -99,7 +99,7 @@ const ALLOW_LIST_FILES: &[&str] = &[
|
|||
"storage_layer.rs", // The trait module.
|
||||
"commit_graph.rs", // Maintains `_graph_commits.lance` system table.
|
||||
"graph_coordinator.rs", // Drives the manifest publisher / branch coordinator.
|
||||
"recovery_audit.rs", // Maintains `_graph_commit_recoveries.lance` (MR-847 audit trail).
|
||||
"recovery_audit.rs", // Maintains `_graph_commit_recoveries.lance` (recovery audit trail).
|
||||
];
|
||||
|
||||
/// Directories exempt from the guard. Files under these paths may use
|
||||
|
|
@ -203,7 +203,7 @@ fn engine_code_does_not_call_forbidden_lance_apis() {
|
|||
|
||||
if !violations.is_empty() {
|
||||
panic!(
|
||||
"MR-793 forbidden-API guard found {} violation(s) in engine code. \
|
||||
"Forbidden-API guard found {} violation(s) in engine code. \
|
||||
Engine code MUST route through the `TableStorage` trait (or its \
|
||||
inherent counterparts on `TableStore`) instead of calling Lance's \
|
||||
inline-commit APIs directly. If a use is genuinely justified, add \
|
||||
|
|
|
|||
|
|
@ -1,13 +1,13 @@
|
|||
//! MR-847 — open-time recovery sweep integration tests.
|
||||
//! Open-time recovery sweep integration tests.
|
||||
//!
|
||||
//! These exercise the full `Omnigraph::open` cycle: drop a synthetic
|
||||
//! sidecar into `__recovery/`, advance some Lance HEADs to simulate the
|
||||
//! Phase B → Phase C residual, reopen the engine, and assert the sweep's
|
||||
//! decision-tree dispatch did the right thing.
|
||||
//!
|
||||
//! The Phase 3 tests pin open-time invocation, `OpenMode::{ReadWrite,
|
||||
//! ReadOnly}`, the roll-back path, and schema-version refusal. The Phase
|
||||
//! 4 tests pin the roll-forward path + audit row recording.
|
||||
//! Coverage: open-time invocation, `OpenMode::{ReadWrite, ReadOnly}`,
|
||||
//! roll-back path, schema-version refusal, roll-forward path, and audit
|
||||
//! row recording.
|
||||
|
||||
use std::path::Path;
|
||||
|
||||
|
|
@ -311,6 +311,35 @@ async fn read_latest_recovery_audit(
|
|||
))
|
||||
}
|
||||
|
||||
/// Helper: read every recovery audit row's `recovery_kind` value, in
|
||||
/// storage order (multiple batches concatenated). Used by the
|
||||
/// multi-sidecar fresh-snapshot test as a diagnostic alongside the
|
||||
/// post-recovery Lance HEAD assertion.
|
||||
async fn list_recovery_audit_kinds(repo_root: &Path) -> Vec<String> {
|
||||
let recoveries_dir = repo_root.join("_graph_commit_recoveries.lance");
|
||||
if !recoveries_dir.exists() {
|
||||
return Vec::new();
|
||||
}
|
||||
let ds = Dataset::open(recoveries_dir.to_str().unwrap()).await.unwrap();
|
||||
use arrow_array::{Array, StringArray};
|
||||
use futures::TryStreamExt;
|
||||
let batches: Vec<arrow_array::RecordBatch> =
|
||||
ds.scan().try_into_stream().await.unwrap().try_collect().await.unwrap();
|
||||
let mut out = Vec::new();
|
||||
for batch in batches {
|
||||
let kinds = batch
|
||||
.column_by_name("recovery_kind")
|
||||
.expect("recovery_kind column present")
|
||||
.as_any()
|
||||
.downcast_ref::<StringArray>()
|
||||
.expect("recovery_kind is Utf8");
|
||||
for i in 0..kinds.len() {
|
||||
out.push(kinds.value(i).to_string());
|
||||
}
|
||||
}
|
||||
out
|
||||
}
|
||||
|
||||
/// Helper: count `_graph_commits.lance` rows tagged with the recovery actor.
|
||||
async fn count_recovery_actor_commits(repo_root: &Path) -> usize {
|
||||
let actors_dir = repo_root.join("_graph_commit_actors.lance");
|
||||
|
|
@ -566,21 +595,19 @@ async fn recovery_rolls_forward_with_null_actor() {
|
|||
}
|
||||
|
||||
// =====================================================================
|
||||
// PR #72 review fixes — integration tests
|
||||
// Multi-sidecar processing — integration tests
|
||||
// =====================================================================
|
||||
|
||||
/// PR #72 review (chatgpt-codex + cubic): multiple sidecars must be
|
||||
/// processed in deterministic ORDER and against FRESH manifest snapshots.
|
||||
/// Without sort + per-sidecar refresh, sidecar B can be classified
|
||||
/// against sidecar A's stale pre-publish snapshot and incorrectly roll
|
||||
/// back work that just landed.
|
||||
/// Multiple sidecars must be processed in deterministic ORDER and against
|
||||
/// FRESH manifest snapshots. Without sort + per-sidecar refresh, sidecar
|
||||
/// B can be classified against sidecar A's stale pre-publish snapshot
|
||||
/// and incorrectly roll back work that just landed.
|
||||
///
|
||||
/// This test drops two synthetic sidecars on independent tables and
|
||||
/// asserts the sweep processes both end-to-end (both deleted, both
|
||||
/// audited). The unit test
|
||||
/// `list_sidecars_returns_deterministic_order` pins the sort order; this
|
||||
/// integration test pins the multi-sidecar flow against a real engine
|
||||
/// state.
|
||||
/// audited). The unit test `list_sidecars_returns_deterministic_order`
|
||||
/// pins the sort order; this integration test pins the multi-sidecar
|
||||
/// flow against a real engine state.
|
||||
#[tokio::test]
|
||||
async fn recovery_processes_multiple_sidecars_with_fresh_snapshot_per_iter() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
|
|
@ -666,19 +693,18 @@ async fn recovery_processes_multiple_sidecars_with_fresh_snapshot_per_iter() {
|
|||
);
|
||||
}
|
||||
|
||||
/// PR #72 review (cubic site #13): `ensure_indices_for_branch` previously
|
||||
/// pinned every catalog table in the sidecar. If only ONE table needed
|
||||
/// `ensure_indices_for_branch` must only pin tables that actually need
|
||||
/// new index work. If it pinned every catalog table and only one needed
|
||||
/// new indices, the others would classify as `NoMovement` on recovery,
|
||||
/// triggering the all-or-nothing decision rule to roll BACK the table
|
||||
/// that did get index work — destroying legitimate Phase B output.
|
||||
///
|
||||
/// Steady-state case: when nothing needs indexing, no sidecar should
|
||||
/// be written. A sibling test
|
||||
/// `recovery_ensure_indices_skips_empty_tables_in_sidecar_scope`
|
||||
/// (PR #72 round-2 review) covers the more nuanced empty-table case
|
||||
/// where the existing ensure_indices loop has
|
||||
/// `if row_count > 0 { build_indices(...) }` — empty tables produce
|
||||
/// zero commits and would otherwise force NoMovement → rollback.
|
||||
/// be written. The sibling test `recovery_ensure_indices_handles_empty_tables`
|
||||
/// covers the more nuanced empty-table case where the existing
|
||||
/// ensure_indices loop has `if row_count > 0 { build_indices(...) }` —
|
||||
/// empty tables produce zero commits and would otherwise force
|
||||
/// NoMovement → rollback.
|
||||
#[tokio::test]
|
||||
async fn recovery_ensure_indices_steady_state_no_sidecar() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
|
|
@ -702,28 +728,26 @@ async fn recovery_ensure_indices_steady_state_no_sidecar() {
|
|||
);
|
||||
}
|
||||
|
||||
/// PR #72 round-2 review (cubic): empty tables (zero rows) bypass
|
||||
/// `build_indices_on_dataset` because `ensure_indices_for_branch` has
|
||||
/// `if row_count > 0 { build_indices(...) }`. The needs_index_work_*
|
||||
/// helpers must match this — pinning an empty table means recovery
|
||||
/// classifies it as `NoMovement` (no commits ever ran) and rolls back
|
||||
/// any sibling table's legitimate index work.
|
||||
/// Empty tables (zero rows) bypass `build_indices_on_dataset` because
|
||||
/// `ensure_indices_for_branch` has `if row_count > 0 { build_indices(...) }`.
|
||||
/// The `needs_index_work_*` helpers must match this — pinning an empty
|
||||
/// table means recovery classifies it as `NoMovement` (no commits ever
|
||||
/// ran) and rolls back any sibling table's legitimate index work.
|
||||
///
|
||||
/// Integration verification: after a real init + ensure_indices on a
|
||||
/// repo where every table is empty, the recovery sweep must complete
|
||||
/// cleanly (no leftover sidecar) AND the next ensure_indices must
|
||||
/// also leave no sidecar — proving the empty-table-scoping fix lets
|
||||
/// cleanly (no leftover sidecar) AND the next ensure_indices must also
|
||||
/// leave no sidecar — proving the empty-table-scoping behavior lets
|
||||
/// steady-state runs incur zero sidecar I/O. The
|
||||
/// `count_rows == 0 → return false` short-circuit in
|
||||
/// `needs_index_work_*` is what makes this work.
|
||||
/// `count_rows == 0 → return false` short-circuit in `needs_index_work_*`
|
||||
/// is what makes this work.
|
||||
///
|
||||
/// (A stronger assertion that captures the sidecar mid-flight and
|
||||
/// verifies the persisted JSON omits empty tables would require
|
||||
/// bypassing `load_jsonl` — which auto-builds indices via
|
||||
/// `prepare_updates_for_commit`. Pinning that with a unit test on the
|
||||
/// helpers directly would require bootstrapping an engine plus raw
|
||||
/// Lance writes; deferred as a follow-up. The behavioral correctness
|
||||
/// is verified by code inspection + bot review concurrence.)
|
||||
/// A stronger assertion that captured the sidecar mid-flight and verified
|
||||
/// the persisted JSON omits empty tables would require bypassing
|
||||
/// `load_jsonl` (which auto-builds indices via
|
||||
/// `prepare_updates_for_commit`); pinning that with a unit test on the
|
||||
/// helpers directly would require bootstrapping an engine plus raw Lance
|
||||
/// writes — left as a follow-up.
|
||||
#[tokio::test]
|
||||
async fn recovery_ensure_indices_handles_empty_tables() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
|
@ -745,26 +769,50 @@ async fn recovery_ensure_indices_handles_empty_tables() {
|
|||
);
|
||||
}
|
||||
|
||||
/// PR #72 round-2 review (cubic site #4 follow-up): the original
|
||||
/// `recovery_processes_multiple_sidecars_with_fresh_snapshot_per_iter`
|
||||
/// test used independent tables so the fresh-snapshot fix wasn't
|
||||
/// load-bearing. This test makes the second sidecar's classification
|
||||
/// DEPEND on the first sidecar's manifest update — proving the refresh
|
||||
/// is required for correctness.
|
||||
/// Multi-sidecar processing must refresh the manifest snapshot between
|
||||
/// sidecars: sidecar N's roll-forward writes manifest changes that
|
||||
/// sidecar N+1 must observe, otherwise N+1 classifies its tables
|
||||
/// against stale pins and may incorrectly run a Dataset::restore that
|
||||
/// would not have run under a fresh view.
|
||||
///
|
||||
/// Setup:
|
||||
/// - Sidecar A: kind=EnsureIndices (loose), refers to Person at
|
||||
/// expected=v1, post=v2. After processing, manifest pin advances
|
||||
/// to wherever Lance HEAD is at the time.
|
||||
/// expected=v1, post=v2.
|
||||
/// - Sidecar B: kind=EnsureIndices (loose), refers to Person at
|
||||
/// expected=v2 (the post-A manifest pin).
|
||||
/// expected=v2, post=v3.
|
||||
/// - Lance HEAD for Person sits at v3 (both writers' Phase B fragments
|
||||
/// chained but neither's Phase C landed).
|
||||
///
|
||||
/// Without the fresh-snapshot refresh, sidecar B's `expected_version=v2`
|
||||
/// is compared against the pre-A snapshot's pin (v1), failing the
|
||||
/// loose-match `pin.expected_version == manifest_pinned` predicate
|
||||
/// → classified as UnexpectedAtP1 → RollBack. With the refresh,
|
||||
/// expected=v2 matches the new pin v2 → RolledPastExpected → roll
|
||||
/// forward succeeds.
|
||||
/// Outcome paths:
|
||||
///
|
||||
/// **Stale-snapshot bug** (no per-sidecar refresh):
|
||||
/// Sidecar A's classifier sees pre-recovery pin=v1, expected=v1
|
||||
/// matches → RolledPastExpected → RollForward to HEAD=v3. Manifest
|
||||
/// advances Person v1 → v3. Sidecar B's classifier still sees the
|
||||
/// STALE pin v1: lance_head=v3, manifest_pinned=v1, expected=v2.
|
||||
/// Loose-match predicate `expected == manifest_pinned` fails (v2 !=
|
||||
/// v1); `lance_head == manifest_pinned + 1` fails (v3 != v2) →
|
||||
/// UnexpectedMultistep → RollBack. Restore Person to expected=v2,
|
||||
/// creating Lance HEAD v4.
|
||||
///
|
||||
/// **Fresh-snapshot fix** (refresh per sidecar):
|
||||
/// Sidecar A: same as above; manifest pin advances to v3.
|
||||
/// Sidecar B refresh: classifier now sees pin=v3, lance_head=v3,
|
||||
/// expected=v2. lance_head == manifest_pinned → NoMovement → RollBack
|
||||
/// decision but the rollback loop has no eligible tables (only
|
||||
/// {RolledPastExpected, UnexpectedAtP1, UnexpectedMultistep} are
|
||||
/// restored), so it's a no-op rollback. Lance HEAD stays at v3.
|
||||
///
|
||||
/// **Differentiating assertion**: post-recovery Lance HEAD for Person
|
||||
/// must be == v3 (no restore happened). The stale-snapshot bug would
|
||||
/// have advanced HEAD to v4 via Dataset::restore.
|
||||
///
|
||||
/// Note: the audit row for sidecar B is "RolledBack" in the fix path
|
||||
/// because the all-or-nothing decision sees NoMovement. Overlapping-
|
||||
/// sidecar scenarios where one writer's HEAD-chained work absorbs the
|
||||
/// other's are rare in practice — per-(table, branch) writer
|
||||
/// serialization prevents them in steady state — but the recovery
|
||||
/// sweep handles them safely without forward-progress drift.
|
||||
#[tokio::test]
|
||||
async fn recovery_multi_sidecar_requires_fresh_snapshot_for_correctness() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
|
|
@ -856,13 +904,41 @@ async fn recovery_multi_sidecar_requires_fresh_snapshot_for_correctness() {
|
|||
2,
|
||||
"two sidecars → two audit rows"
|
||||
);
|
||||
|
||||
// The "sidecars deleted + audit rows present" assertions above are
|
||||
// necessary but not sufficient — both pass even when sidecar B rolls
|
||||
// back under a stale snapshot (the bug path), because the sidecar is
|
||||
// still deleted and an audit row is still written. The differentiating
|
||||
// signal is the post-recovery Lance HEAD for Person:
|
||||
// - Fresh-snapshot fix: sidecar B is no-op rollback (NoMovement);
|
||||
// no Dataset::restore runs; HEAD stays at v3.
|
||||
// - Stale-snapshot bug: sidecar B classifies as UnexpectedMultistep;
|
||||
// restore advances HEAD to v4.
|
||||
let ds_after = Dataset::open(&person_uri).await.unwrap();
|
||||
assert_eq!(
|
||||
ds_after.version().version,
|
||||
v3,
|
||||
"Person Lance HEAD must remain v3 (no restore from stale-snapshot rollback); got {} \
|
||||
— a higher value indicates sidecar B classified UnexpectedMultistep against the \
|
||||
stale pre-recovery pin and ran a restore",
|
||||
ds_after.version().version
|
||||
);
|
||||
// Sanity: the audit kinds are diagnostic — first sidecar rolls forward
|
||||
// (RolledPastExpected → RollForward); second is no-op rollback in this
|
||||
// overlapping-sidecar scenario.
|
||||
let kinds = list_recovery_audit_kinds(dir.path()).await;
|
||||
assert_eq!(kinds.len(), 2, "expected 2 audit rows, got {:?}", kinds);
|
||||
assert!(
|
||||
matches!(kinds[0].as_str(), "RolledForward"),
|
||||
"first sidecar must roll forward; got {:?}",
|
||||
kinds
|
||||
);
|
||||
}
|
||||
|
||||
/// PR #72 review (cubic site #10): `OpenMode::ReadOnly` previously ran
|
||||
/// `recover_schema_state_files` unconditionally, which can delete or
|
||||
/// rename schema-staging files. Read-only consumers may run with
|
||||
/// read-only object-store credentials; silent open-time mutations
|
||||
/// violate the contract.
|
||||
/// `OpenMode::ReadOnly` must NOT run `recover_schema_state_files`,
|
||||
/// which can delete or rename schema-staging files. Read-only consumers
|
||||
/// may run with read-only object-store credentials, and silent open-time
|
||||
/// mutations violate the contract.
|
||||
///
|
||||
/// This test drops a schema-staging file (which the recovery sweep
|
||||
/// would normally delete) then opens with ReadOnly mode. The staging
|
||||
|
|
|
|||
|
|
@ -397,8 +397,7 @@ async fn scan_with_staged_with_filter_silently_drops_staged_rows() {
|
|||
If you're here because this assertion failed: either (a) Lance \
|
||||
exposed a way to scan uncommitted fragments without stats-based \
|
||||
pruning (good — update to assert == [alice, carol, dave]), or \
|
||||
(b) something changed in our scan_with_staged path. See PR #67 \
|
||||
test fix discussion + .context/mr-794-step2-design.md §1.1."
|
||||
(b) something changed in our scan_with_staged path."
|
||||
);
|
||||
|
||||
// Without filter, staged data IS visible — confirms the issue is
|
||||
|
|
@ -493,7 +492,7 @@ async fn chained_stage_merge_insert_with_shared_key_documents_duplicate_behavior
|
|||
);
|
||||
}
|
||||
|
||||
// ─── MR-793 Phase 2: stage_overwrite + scalar index staging ─────────────────
|
||||
// ─── stage_overwrite + scalar index staging ─────────────────
|
||||
|
||||
/// `stage_overwrite` writes replacement fragments to object storage but
|
||||
/// does NOT advance Lance HEAD until `commit_staged` runs. Mirrors
|
||||
|
|
@ -663,11 +662,11 @@ async fn stage_create_inverted_index_does_not_advance_head_until_commit() {
|
|||
|
||||
/// Pin the inline-commit behavior of `delete_where`. Lance 4.0.0 does
|
||||
/// NOT expose a public `DeleteJob::execute_uncommitted`
|
||||
/// (`pub(crate)` — see lance-format/lance#6658). MR-793 deliberately
|
||||
/// (`pub(crate)` — see lance-format/lance#6658). The trait deliberately
|
||||
/// does NOT introduce a `stage_delete` wrapper that would secretly
|
||||
/// inline-commit (a side-channel — see design doc §3.2). Instead, the
|
||||
/// trait keeps `delete_where` as the only delete entry point, named
|
||||
/// honestly.
|
||||
/// inline-commit (a side-channel between the staged and inline write
|
||||
/// paths). Instead, the trait keeps `delete_where` as the only delete
|
||||
/// entry point, named honestly.
|
||||
///
|
||||
/// **When Lance #6658 lands**: this test will need to flip — replace
|
||||
/// the assertion with a `stage_delete` + `commit_staged` round-trip
|
||||
|
|
@ -704,9 +703,10 @@ async fn delete_where_advances_head_inline_documents_residual() {
|
|||
/// `create_vector_index`. Lance 4.0.0 vector indices take the
|
||||
/// "segment commit path" which calls `build_index_metadata_from_segments`
|
||||
/// (`pub(crate)` in lance-4.0.0 `src/index.rs:111`). Until upstream
|
||||
/// exposes that helper (companion ticket to #6658), MR-793's trait
|
||||
/// surface deliberately does NOT include `stage_create_vector_index` —
|
||||
/// see design doc Appendix A.3.
|
||||
/// exposes that helper (companion ticket to lance-format/lance#6658),
|
||||
/// the trait surface deliberately does NOT include
|
||||
/// `stage_create_vector_index` — same rationale as `stage_delete`'s
|
||||
/// absence (no side-channel between staged and inline write paths).
|
||||
#[tokio::test]
|
||||
async fn create_vector_index_advances_head_inline_documents_residual() {
|
||||
use arrow_array::FixedSizeListArray;
|
||||
|
|
@ -759,13 +759,12 @@ async fn create_vector_index_advances_head_inline_documents_residual() {
|
|||
assert!(store.has_vector_index(&ds, "embedding").await.unwrap());
|
||||
}
|
||||
|
||||
/// Empirical pin of `Dataset::restore` semantics for MR-847.
|
||||
/// Empirical pin of `Dataset::restore` semantics for the recovery sweep.
|
||||
///
|
||||
/// MR-847's recovery sweep depends on the `restore` invariant: from
|
||||
/// HEAD = `h`, calling `Dataset::checkout_version(p).await?` then
|
||||
/// The recovery sweep depends on the `restore` invariant: from HEAD =
|
||||
/// `h`, calling `Dataset::checkout_version(p).await?` then
|
||||
/// `Dataset::restore().await?` produces a NEW commit at HEAD = `h + 1`
|
||||
/// (NOT `h + 2` as the v1 design draft assumed) with content == content
|
||||
/// at version `p`.
|
||||
/// with content == content at version `p`.
|
||||
///
|
||||
/// The Lance source confirms this — `restore()` (no args) takes the
|
||||
/// currently-checked-out version's content and applies it via
|
||||
|
|
@ -817,8 +816,8 @@ async fn lance_restore_appends_one_commit_with_checked_out_content() {
|
|||
head_before + 1,
|
||||
"Dataset::restore must append exactly one commit (HEAD + 1). If \
|
||||
this assertion fires, lance changed restore semantics — re-read \
|
||||
lance src/dataset.rs::restore and update the MR-847 design AND \
|
||||
the recovery sweep's rollback path before proceeding."
|
||||
lance src/dataset.rs::restore and update the recovery sweep's \
|
||||
rollback path before proceeding."
|
||||
);
|
||||
|
||||
// Content equality: the restored HEAD must match version 1 (just alice).
|
||||
|
|
@ -839,8 +838,9 @@ async fn lance_restore_appends_one_commit_with_checked_out_content() {
|
|||
);
|
||||
}
|
||||
|
||||
/// Empirical pin of the `Dataset::restore` concurrency hazard that motivates
|
||||
/// MR-847's open-time-only invocation strategy and MR-856's queue-acquisition
|
||||
/// Empirical pin of the `Dataset::restore` concurrency hazard that
|
||||
/// motivates the recovery sweep's open-time-only invocation strategy
|
||||
/// and any future continuous-recovery reconciler's queue-acquisition
|
||||
/// requirement.
|
||||
///
|
||||
/// `Dataset::restore`'s `check_restore_txn` (lance-4.0.0
|
||||
|
|
@ -854,13 +854,14 @@ async fn lance_restore_appends_one_commit_with_checked_out_content() {
|
|||
/// rewind commit AFTER the legitimate concurrent Append, silently
|
||||
/// orphaning that Append's data from the active timeline.
|
||||
///
|
||||
/// MR-847 sidesteps this by running recovery only at `Omnigraph::open`
|
||||
/// (before any other writers can race). MR-856's continuous-recovery
|
||||
/// The recovery sweep sidesteps this by running only at `Omnigraph::open`
|
||||
/// (before any other writers can race). A future continuous-recovery
|
||||
/// reconciler must acquire per-(table_key, branch) queues for sidecar
|
||||
/// tables before invoking restore — otherwise this hazard becomes
|
||||
/// reachable during in-flight tenant traffic.
|
||||
///
|
||||
/// This test is the load-bearing constraint MR-856 must honor.
|
||||
/// This test is the load-bearing constraint any future reconciler must
|
||||
/// honor.
|
||||
#[tokio::test]
|
||||
async fn lance_restore_loses_to_concurrent_append_via_orphaning() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
|
@ -879,8 +880,8 @@ async fn lance_restore_loses_to_concurrent_append_via_orphaning() {
|
|||
let mut recovery_handle = recovery_open.checkout_version(1).await.unwrap();
|
||||
|
||||
// Concurrent legitimate writer: appends bob, advancing HEAD to v2.
|
||||
// This simulates MR-686's per-table-queue model where another tenant
|
||||
// wrote between recovery's open and recovery's restore call.
|
||||
// This simulates a per-table-queue model where another tenant wrote
|
||||
// between recovery's open and recovery's restore call.
|
||||
let mut writer_handle = Dataset::open(&uri).await.unwrap();
|
||||
store
|
||||
.append_batch(&uri, &mut writer_handle, person_batch(&[("bob", Some(25))]))
|
||||
|
|
@ -901,8 +902,8 @@ async fn lance_restore_loses_to_concurrent_append_via_orphaning() {
|
|||
"Restore commits at HEAD+1 even when a concurrent commit landed \
|
||||
between recovery's open and recovery's restore call. If this \
|
||||
assertion fails, lance changed restore-vs-append conflict \
|
||||
semantics — re-read check_restore_txn and update MR-847's \
|
||||
concurrency analysis."
|
||||
semantics — re-read check_restore_txn and update the recovery \
|
||||
sweep's concurrency analysis."
|
||||
);
|
||||
|
||||
let scanner = post.scan();
|
||||
|
|
@ -918,10 +919,10 @@ async fn lance_restore_loses_to_concurrent_append_via_orphaning() {
|
|||
ids,
|
||||
vec!["alice".to_string()],
|
||||
"Concurrent Append's row 'bob' was silently orphaned by the \
|
||||
Restore. Active-timeline contents == v1's contents. This is the \
|
||||
hazard MR-847 sidesteps via open-time-only invocation, and MR-856 \
|
||||
must guard against via per-(table, branch) queue acquisition. \
|
||||
Got: {:?}",
|
||||
Restore. Active-timeline contents == v1's contents. The recovery \
|
||||
sweep sidesteps this hazard via open-time-only invocation; any \
|
||||
future continuous-recovery reconciler must guard against it via \
|
||||
per-(table, branch) queue acquisition. Got: {:?}",
|
||||
ids,
|
||||
);
|
||||
|
||||
|
|
|
|||
|
|
@ -56,7 +56,7 @@ Filtered from `branch_list()` but visible to internals:
|
|||
- `__schema_apply_lock__` — serializes schema migrations.
|
||||
- `__run__<run-id>` — legacy from the pre-v0.4.0 Run state machine (removed in MR-771). The branch-name guard predicate `is_internal_run_branch` is kept as defense-in-depth so users cannot create a branch matching the legacy prefix; the filter will be removed once production legacy branches are swept (MR-770).
|
||||
|
||||
## L2 — Recovery audit trail (MR-847)
|
||||
## L2 — Recovery audit trail
|
||||
|
||||
The four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) protect their multi-table commits with a sidecar at `__recovery/{ulid}.json` written before Phase B and deleted after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs`: classify per-table state, decide all-or-nothing per sidecar, roll forward / back, record an audit row.
|
||||
|
||||
|
|
|
|||
|
|
@ -105,13 +105,13 @@ These are user-visible commitments. They state what the engine guarantees and wh
|
|||
Specific defaults (timeout values, memory caps, TTL windows) are *configuration*, not invariants — see [docs/constants.md](constants.md) and per-deployment configuration. The invariant is that bounds and contracts exist, not their numerical values.
|
||||
|
||||
23. **Atomicity is per-query.** Every `.gq` query is atomic — multi-statement mutations are all-or-nothing via the substrate's atomic-commit primitive. No cross-query `BEGIN`/`COMMIT`; branches and merges fill that role for agent workflows.
|
||||
*Status: upheld at the writer-trait surface AND across process boundaries after MR-847 — the sealed `TableStorage` trait routes inserts / updates / scalar-index builds / merge_insert / overwrite through `stage_*` + `commit_staged` (Phase A is drift-free), and the open-time recovery sweep in `db/manifest/recovery.rs` (sidecars at `__recovery/{ulid}.json` written by `MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) closes the per-table commit_staged → manifest publish residual on the next `Omnigraph::open`. The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures and recoverable across process boundaries for finalize→publisher failures. Continuous in-process recovery (no restart required between Phase B failure and recovery) arrives with MR-856 (background recovery reconciler). Two writer paths still inline-commit pending upstream Lance work: `delete_where` (lance-format/lance#6658) and `create_vector_index` (lance-format/lance#6666).*
|
||||
*Status: upheld at the writer-trait surface AND across process boundaries — the sealed `TableStorage` trait routes inserts / updates / scalar-index builds / merge_insert / overwrite through `stage_*` + `commit_staged` (Phase A is drift-free), and the open-time recovery sweep in `db/manifest/recovery.rs` (sidecars at `__recovery/{ulid}.json` written by `MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) closes the per-table commit_staged → manifest publish residual on the next `Omnigraph::open`. The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures and recoverable across process boundaries for finalize→publisher failures. Continuous in-process recovery (no restart required between Phase B failure and recovery) is the goal of a future background reconciler. Two writer paths still inline-commit pending upstream Lance work: `delete_where` (lance-format/lance#6658) and `create_vector_index` (lance-format/lance#6666).*
|
||||
|
||||
24. **Schema integrity is strict at commit.** Type validation, required-field presence (auto-filled from `@default` if declared), uniqueness across batches and versions, and referential integrity — all enforced before commit succeeds. Per-write softening flags are opt-in, never default.
|
||||
*Status: aspirational — referential integrity at scale requires SIP-backed cross-table validation; not yet implemented. Cross-batch / cross-version uniqueness tracked in MR-714.*
|
||||
|
||||
25. **Isolation: per-query snapshot; read-your-writes within and across queries in a session.** Each query reads from one consistent manifest version. Within a multi-statement mutation, the read subplan inside each write operator sees the writes from earlier statements. Across queries in a session, reads always resolve the latest manifest version — no reader pinning to older snapshots.
|
||||
*Status: upheld for inserts/updates after MR-794 step 2+ — `MutationStaging`'s in-memory accumulator + `TableStore::scan_with_pending` (DataFusion `MemTable` union with the committed Lance scan, with merge-shadow semantics for chained updates) implements read-your-writes within a multi-statement mutation. Delete-touching mutations are limited to delete-only by parse-time D₂; closing the within-query RYW gap for deletes requires Lance's two-phase delete API (tracked: MR-793 / Lance-upstream lance-format/lance#6658). The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures (the partial-failure test pins this), and the narrower finalize→publisher residual is closed across one open cycle by the MR-847 recovery sweep — see [docs/runs.md](runs.md) "Open-time recovery sweep".*
|
||||
*Status: upheld for inserts/updates — `MutationStaging`'s in-memory accumulator + `TableStore::scan_with_pending` (DataFusion `MemTable` union with the committed Lance scan, with merge-shadow semantics for chained updates) implements read-your-writes within a multi-statement mutation. Delete-touching mutations are limited to delete-only by parse-time D₂; closing the within-query RYW gap for deletes requires Lance's two-phase delete API (Lance-upstream lance-format/lance#6658). The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures (the partial-failure test pins this), and the narrower finalize→publisher residual is closed across one open cycle by the open-time recovery sweep — see [docs/runs.md](runs.md) "Open-time recovery sweep".*
|
||||
|
||||
26. **Durability before acknowledgement.** Commit returns only after the substrate has confirmed durable persistence. No "fast" or "fire-and-forget" durability levels.
|
||||
|
||||
|
|
|
|||
|
|
@ -16,7 +16,7 @@
|
|||
- `CleanupPolicyOptions { keep_versions: Option<u32>, older_than: Option<Duration> }` — at least one is required.
|
||||
- Returns `[TableCleanupStats { table_key, bytes_removed, old_versions_removed }]`.
|
||||
- CLI guards with `--confirm`; without it, prints a preview line.
|
||||
- **MR-847 recovery floor:** `--keep < 3` may garbage-collect Lance versions that the open-time recovery sweep needs as a rollback target (the sweep restores to the manifest-pinned `expected_version`, which is HEAD-1 in the typical Phase B → Phase C drift case). Default `--keep 10` is safe.
|
||||
- **Recovery floor:** `--keep < 3` may garbage-collect Lance versions that the open-time recovery sweep needs as a rollback target (the sweep restores to the manifest-pinned `expected_version`, which is HEAD-1 in the typical Phase B → Phase C drift case). Default `--keep 10` is safe.
|
||||
|
||||
## Tombstones
|
||||
|
||||
|
|
|
|||
|
|
@ -130,7 +130,7 @@ will replace it. Operator-driven (rare in agent workloads); document
|
|||
permanently until Lance exposes `Operation::Overwrite { fragments }` as
|
||||
a two-phase op.
|
||||
|
||||
### Open-time recovery sweep (MR-847)
|
||||
### Open-time recovery sweep
|
||||
|
||||
The staged-write rewire eliminates one drift class **by construction at
|
||||
the writer layer**: an op that fails before pushing to the in-memory
|
||||
|
|
@ -140,7 +140,7 @@ the case the `partial_failure_leaves_target_queryable_and_unblocks_next_mutation
|
|||
test pins.
|
||||
|
||||
A second, narrower drift class — the **finalize → publisher window** —
|
||||
is closed across one open cycle by the MR-847 recovery sweep:
|
||||
is closed across one open cycle by the open-time recovery sweep:
|
||||
|
||||
`MutationStaging::finalize` runs `stage_*` + `commit_staged` per touched
|
||||
table sequentially, then the publisher commits the manifest. Lance has
|
||||
|
|
@ -197,8 +197,8 @@ contention exceeding `PUBLISHER_RETRY_BUDGET = 5` retries.
|
|||
`Omnigraph::open` (typically a server restart), subsequent writers on
|
||||
the affected tables surface
|
||||
`ManifestConflictDetails::ExpectedVersionMismatch`. Continuous
|
||||
in-process recovery (no restart required) arrives with MR-856
|
||||
(background recovery reconciler).
|
||||
in-process recovery (no restart required) is the goal of a future
|
||||
background reconciler.
|
||||
|
||||
The publisher-CAS contract is unchanged: a *concurrent writer* that
|
||||
advances any of our touched tables between snapshot capture and
|
||||
|
|
|
|||
|
|
@ -63,7 +63,7 @@ flowchart TB
|
|||
nodes["nodes/{fnv1a64-hex}/<br/>one dataset per node type"]:::l2
|
||||
edges["edges/{fnv1a64-hex}/<br/>one dataset per edge type"]:::l2
|
||||
cgraph["_graph_commits.lance/<br/>_graph_commit_actors.lance/<br/>_graph_commit_recoveries.lance/"]:::l2
|
||||
recovery["__recovery/{ulid}.json<br/>MR-847 sidecars (transient)"]:::l2
|
||||
recovery["__recovery/{ulid}.json<br/>recovery sidecars (transient)"]:::l2
|
||||
refs["_refs/branches/{name}.json<br/>graph-level branches"]:::l2
|
||||
|
||||
repo --> manifest
|
||||
|
|
@ -92,8 +92,8 @@ flowchart TB
|
|||
- **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
|
||||
- **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
|
||||
- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 repos also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; MR-770 sweeps these in production.)
|
||||
- **`_graph_commit_recoveries.lance`** (MR-847) — one row per recovery sweep action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join. See `crates/omnigraph/src/db/recovery_audit.rs`.
|
||||
- **`__recovery/{ulid}.json`** (MR-847) — transient sidecar files written by the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) before Phase B begins, deleted after Phase C succeeds. A sidecar persisting after process exit means the writer crashed in the Phase B → Phase C window; the next `Omnigraph::open` recovery sweep processes it. Steady-state directory is empty. See `crates/omnigraph/src/db/manifest/recovery.rs`.
|
||||
- **`_graph_commit_recoveries.lance`** — one row per recovery sweep action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join. See `crates/omnigraph/src/db/recovery_audit.rs`.
|
||||
- **`__recovery/{ulid}.json`** — transient sidecar files written by the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) before Phase B begins, deleted after Phase C succeeds. A sidecar persisting after process exit means the writer crashed in the Phase B → Phase C window; the next `Omnigraph::open` recovery sweep processes it. Steady-state directory is empty. See `crates/omnigraph/src/db/manifest/recovery.rs`.
|
||||
- **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.
|
||||
- **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags.
|
||||
|
||||
|
|
|
|||
|
|
@ -32,8 +32,8 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
|
|||
| `export.rs` | NDJSON streaming export filters |
|
||||
| `s3_storage.rs` | S3-backed repo (skipped unless `OMNIGRAPH_S3_TEST_BUCKET` is set) |
|
||||
| `lance_version_columns.rs` | Per-row `_row_last_updated_at_version` behavior |
|
||||
| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the four MR-847 per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`). |
|
||||
| `recovery.rs` | MR-847 open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via `ManifestBatchPublisher::publish`, roll-back via `Dataset::restore`, audit row in `_graph_commit_recoveries.lance`, `OpenMode::ReadOnly` skip path |
|
||||
| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the four per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`). |
|
||||
| `recovery.rs` | Open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via `ManifestBatchPublisher::publish`, roll-back via `Dataset::restore`, audit row in `_graph_commit_recoveries.lance`, `OpenMode::ReadOnly` skip path |
|
||||
|
||||
## Fixtures
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue