omnigraph/docs
Ragnor Comerford 815ff743f5
recovery: refresh-time roll-forward closes the in-process residual + invariants helper
Bundle of three correctness fixes plus a shared invariants helper that
existing tests now use.

1. SchemaApply atomicity: close the residual gap where a sidecar exists
   but staging files don't (e.g., Phase B failure BEFORE
   `_schema.pg.staging` write). `recover_schema_state_files` now returns
   a `SchemaStateRecovery` discriminator (`Noop` /
   `CleanedStaging` / `CompletedStagingRename { schema_apply_sidecar }`);
   the discriminator is threaded through `recover_manifest_drift` →
   `process_sidecar`. SchemaApply sidecars are eligible for roll-forward
   ONLY when the staging rename completed in the same recovery pass;
   otherwise Full mode rolls the sidecar back and RollForwardOnly mode
   defers it. Without this, recovery
   would publish the manifest pin against new-schema data while
   `_schema.pg` stayed old (real corruption). New failpoint
   `schema_apply.before_staging_write` + new test
   `schema_apply_without_schema_staging_rolls_back_on_next_open` pin
   the gating.
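   A minimal sketch of that gating. Only the three `SchemaStateRecovery`
   variant names come from this commit. `RecoveryMode`, `Decision`,
   `schema_apply_decision`, and the `String` sidecar-id payload are
   hypothetical. The sketch also assumes that the completed rename must
   belong to the same sidecar being processed.

```rust
/// Outcome of schema-state-file recovery. Variant names follow the commit;
/// the `String` payload is a hypothetical stand-in for the sidecar identity.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SchemaStateRecovery {
    Noop,
    CleanedStaging,
    CompletedStagingRename { schema_apply_sidecar: String },
}

/// Hypothetical recovery modes (Full vs RollForwardOnly).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryMode {
    Full,
    RollForwardOnly,
}

/// Hypothetical per-sidecar decision.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    RollForward,
    RollBack,
    Defer,
}

/// Decide what to do with a SchemaApply sidecar identified by `sidecar_id`.
pub fn schema_apply_decision(
    schema_state: &SchemaStateRecovery,
    sidecar_id: &str,
    mode: RecoveryMode,
) -> Decision {
    match schema_state {
        // Roll forward only if THIS recovery pass completed the staging
        // rename for this sidecar, so `_schema.pg` already holds the new schema.
        SchemaStateRecovery::CompletedStagingRename { schema_apply_sidecar }
            if schema_apply_sidecar == sidecar_id =>
        {
            Decision::RollForward
        }
        // Otherwise `_schema.pg` may still be old; publishing the manifest pin
        // would pair new-schema data with the old schema.
        _ => match mode {
            RecoveryMode::Full => Decision::RollBack,
            RecoveryMode::RollForwardOnly => Decision::Defer,
        },
    }
}
```

   Making the discriminator a return value, rather than re-probing the
   filesystem inside `process_sidecar`, lets the decision depend on what
   happened in the current pass and not only on which files exist now.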

2. Rollback target correction. Rollback now restores Lance HEAD to the
   current manifest pin (`state.manifest_pinned`) instead of the
   sidecar's `expected_version`. For UnexpectedAtP1/UnexpectedMultistep
   classifications these can differ; the old code could regress Lance
   HEAD past the manifest pin, re-introducing drift in the OTHER
   direction. The new behavior establishes `Lance HEAD == manifest pin`
   post-rollback — the canonical drift-free invariant. Param renamed
   from `expected_version` → `target_version` to match. Audit
   `to_version` records the actual restore target.

   This changes observable behavior: any external consumer that compared
   `audit.to_version` against `sidecar.expected_version` for non-trivial
   classifications now sees the manifest pin instead.
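   A minimal sketch of the corrected target selection. The struct shapes,
   the `rollback` function, and the plain `u64` standing in for Lance HEAD
   are hypothetical. Only `manifest_pinned`, `expected_version`,
   `target_version`, and `to_version` are names taken from the text above.

```rust
/// Hypothetical minimal view of recovery state.
pub struct RecoveryState {
    pub manifest_pinned: u64,
}

/// Hypothetical minimal view of a sidecar.
pub struct Sidecar {
    pub expected_version: u64,
}

/// Hypothetical audit record; `to_version` is the actual restore target.
#[derive(Debug, PartialEq, Eq)]
pub struct RollbackAudit {
    pub to_version: u64,
}

/// Restore HEAD to the manifest pin rather than the sidecar's expected_version,
/// so that `Lance HEAD == manifest pin` holds afterwards.
pub fn rollback(lance_head: &mut u64, state: &RecoveryState, _sidecar: &Sidecar) -> RollbackAudit {
    let target_version = state.manifest_pinned;
    // Stand-in for restoring the Lance dataset to `target_version`.
    *lance_head = target_version;
    RollbackAudit { to_version: target_version }
}
```

   With `expected_version = 5`, a manifest pin of 7, and HEAD at 9 (a
   multistep case), the old behavior would restore to 5, below the pin.
   The sketch restores to 7, leaving HEAD and pin equal.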

3. Audit commit-graph unification. `record_audit` now opens the
   per-branch commit graph for ANY sidecar with `sidecar.branch.is_some()`
   — not just BranchMerge. Plain Mutation/Load/EnsureIndices commits on a
   feature branch now correctly land on that branch's commit graph,
   instead of main's. Closes the class of bug analogous to D2 but for
   non-merge writers.

   Pre-existing repos with non-main commits already on main's commit
   graph stay where they are; future recoveries write to the per-branch
   ref. Mixed-version compatibility is asymmetric but safe (old binaries
   ignore per-branch refs they don't know about; new binaries read both).
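   The selection rule can be sketched as follows. `SidecarKind`, the
   `Sidecar` shape, and both functions are hypothetical illustrations, with
   `None` meaning "main's commit graph". Only the `sidecar.branch.is_some()`
   condition comes from the text.

```rust
/// Hypothetical sidecar kinds mentioned in this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SidecarKind {
    Mutation,
    Load,
    EnsureIndices,
    BranchMerge,
}

/// Hypothetical sidecar shape: which writer, and on which branch (None = main).
pub struct Sidecar {
    pub kind: SidecarKind,
    pub branch: Option<String>,
}

/// Old rule: only BranchMerge sidecars used the per-branch commit graph.
pub fn audit_graph_branch_old(s: &Sidecar) -> Option<&str> {
    match (s.kind, &s.branch) {
        (SidecarKind::BranchMerge, Some(b)) => Some(b.as_str()),
        _ => None,
    }
}

/// New rule: any sidecar with a branch uses that branch's commit graph.
pub fn audit_graph_branch(s: &Sidecar) -> Option<&str> {
    s.branch.as_deref()
}
```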

4. Recovery invariants helper + branch-axis cells. New
   `tests/helpers/recovery.rs` (~505 LOC) exports
   `assert_post_recovery_invariants(repo, op_id, RecoveryExpectation)`
   plus a `TableExpectation` builder. Six existing recovery tests
   refactored to call it; per-test bespoke assertions replaced. Two new
   branch-axis cells added in `tests/failpoints.rs`:
     - `recovery_rolls_forward_load_on_feature_branch`
     - `recovery_rolls_forward_ensure_indices_on_feature_branch`
   The loader gains a `mutation.post_finalize_pre_publisher` failpoint
   hook (gated on the `failpoints` feature; zero-cost in release) so the
   load test can pin the same Phase B → Phase C boundary the mutation
   path uses.
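   The shape of a builder-plus-check helper might look roughly like this.
   This is a hypothetical sketch, not the contents of
   `tests/helpers/recovery.rs`. `TableExpectation`'s fields, `at_version`,
   `ObservedTable`, and `check_tables` are invented for illustration. It
   checks the drift-free invariant (`Lance HEAD == manifest pin`) per table,
   plus an optional expected pin.

```rust
use std::collections::HashMap;

/// Hypothetical per-table expectation, built fluently.
#[derive(Debug, Default, Clone)]
pub struct TableExpectation {
    pub table: String,
    pub version: Option<u64>,
}

impl TableExpectation {
    pub fn new(table: &str) -> Self {
        Self { table: table.to_string(), version: None }
    }

    /// Require the manifest pin to equal `v` after recovery.
    pub fn at_version(mut self, v: u64) -> Self {
        self.version = Some(v);
        self
    }
}

/// Hypothetical observed post-recovery state for one table.
pub struct ObservedTable {
    pub lance_head: u64,
    pub manifest_pin: u64,
}

/// Check every expectation; return the first violation as an error message.
pub fn check_tables(
    observed: &HashMap<String, ObservedTable>,
    expected: &[TableExpectation],
) -> Result<(), String> {
    for exp in expected {
        let obs = observed
            .get(&exp.table)
            .ok_or_else(|| format!("{}: table missing", exp.table))?;
        // Drift-free invariant: HEAD and manifest pin agree.
        if obs.lance_head != obs.manifest_pin {
            return Err(format!(
                "{}: Lance HEAD {} != manifest pin {}",
                exp.table, obs.lance_head, obs.manifest_pin
            ));
        }
        if let Some(v) = exp.version {
            if obs.manifest_pin != v {
                return Err(format!("{}: pin {} != expected {}", exp.table, obs.manifest_pin, v));
            }
        }
    }
    Ok(())
}
```

   Centralizing the checks means each recovery test states only what it
   expects, and a newly added invariant applies to every caller at once.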

Misc:
   - `Omnigraph::refresh` extracts `reload_schema_if_source_changed`, which
     returns early when the schema source is unchanged (skipping the IR
     parse and catalog rebuild on the steady-state refresh path).
   - New test injection point
     `failpoint_publish_table_head_without_index_rebuild_for_test`
     under `#[cfg(feature = "failpoints")]`.
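   The early-return pattern can be sketched with a hypothetical
   `SchemaState` that caches the last-seen source text. The `rebuilds`
   counter stands in for the IR parse and catalog rebuild. None of these
   types are Omnigraph's.

```rust
/// Hypothetical cache of the schema source the current catalog was built from.
pub struct SchemaState {
    pub schema_source: String,
    /// Counts parse + catalog rebuilds (stand-in for the real work).
    pub rebuilds: u32,
}

impl SchemaState {
    pub fn new(source: &str) -> Self {
        Self { schema_source: source.to_string(), rebuilds: 0 }
    }

    /// Returns true if the schema was reloaded, false on the fast path.
    pub fn reload_schema_if_source_changed(&mut self, latest_source: &str) -> bool {
        if latest_source == self.schema_source {
            // Steady state: source unchanged, skip the parse and rebuild.
            return false;
        }
        self.schema_source = latest_source.to_string();
        self.rebuilds += 1; // stand-in for IR parse + catalog rebuild
        true
    }
}
```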

Tests: 31 recovery + failpoint integration tests pass (14 + 17, up from
14 + 16). Full workspace sweep with `--features failpoints` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 16:04:48 +02:00
releases MR-794 step 2: address PR #68 review — merge semantics, cardinality, residual 2026-05-01 13:47:55 +02:00
architecture.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
audit.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
branches-commits.md recovery: rename composite test, strip ticket references, address review 2026-05-03 13:56:36 +02:00
changes.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
ci.md Address reviewer feedback (Cursor + cubic) on PR #60 2026-04-29 00:09:06 +02:00
cli-reference.md Address reviewer feedback (Cursor + cubic) on PR #60 2026-04-29 00:09:06 +02:00
cli.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
constants.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
deployment.md Document AWS build variant and bearer-token sources 2026-04-18 04:04:45 +03:00
embeddings.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
errors.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
execution.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
indexes.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
install.md Remove stale Homebrew source-build note 2026-04-11 14:12:49 +03:00
invariants.md recovery: refresh-time roll-forward closes the in-process residual 2026-05-04 00:15:42 +02:00
lance.md lance: confirm MemWAL is opt-in, intra-table, no overlap with MR-847 2026-05-02 19:44:37 +02:00
maintenance.md recovery: refresh-time roll-forward closes the in-process residual + invariants helper 2026-05-05 16:04:48 +02:00
merge.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
policy.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
query-language.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
runs.md recovery: refresh-time roll-forward closes the in-process residual + invariants helper 2026-05-05 16:04:48 +02:00
schema-language.md Address reviewer feedback (Cursor + cubic) on PR #60 2026-04-29 00:09:06 +02:00
server.md MR-771: demote Run to direct-publish via expected_table_versions CAS 2026-04-30 08:52:50 +02:00
storage.md recovery: rename composite test, strip ticket references, address review 2026-05-03 13:56:36 +02:00
testing.md recovery: rename composite test, strip ticket references, address review 2026-05-03 13:56:36 +02:00