omnigraph/docs
Ragnor Comerford aaa031e834
recovery: refresh-time roll-forward closes the in-process residual
Adds RecoveryMode { Full, RollForwardOnly } and wires Omnigraph::refresh
to invoke roll-forward-only recovery. This closes the documented
"long-running server between Phase B failure and process restart"
residual without requiring a restart in the common case (a mutation or
load finalizes, then the publisher fails).

Why roll-forward only and not full sweep:
  * Roll-forward is safe under concurrency (publisher uses row-level
    CAS).
  * Roll-back uses Dataset::restore, which "wins" against concurrent
    Append/Update/Delete/CreateIndex/Merge per check_restore_txn —
    silently orphaning the concurrent writer's commit (pinned by
    tests/staged_writes.rs::lance_restore_loses_to_concurrent_append_via_orphaning).
    Sidecars that classify as RollBack-eligible are LEFT ON DISK for the
    next ReadWrite open, where no concurrent writers exist and full
    restore is safe.
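
The mode/classification split above can be sketched as a small decision
function. This is a minimal illustration, not the code in recovery.rs:
RecoveryMode and its two variants come from this commit, while
Classification, Action, and plan are hypothetical names introduced here
only to show which combinations act and which defer.

```rust
/// How much of the recovery sweep the caller may perform (from this commit).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RecoveryMode {
    Full,
    RollForwardOnly,
}

/// Hypothetical: the classifier's verdict for one sidecar.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Classification {
    RollForward,
    RollBack,
    Abort,
}

/// Hypothetical: what the sweep does with that sidecar.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    /// Publish the drifted commit; safe under concurrency.
    RollForward,
    /// Restore the dataset; only safe with no concurrent writers.
    RollBack,
    /// Whatever the full sweep does for an Abort-classified sidecar.
    Abort,
    /// Leave the sidecar on disk for the next ReadWrite open.
    Defer,
}

/// Assumed decision table: roll-forward always runs; everything else
/// runs only in Full mode and is deferred under RollForwardOnly.
pub fn plan(mode: RecoveryMode, class: Classification) -> Action {
    match (mode, class) {
        (_, Classification::RollForward) => Action::RollForward,
        (RecoveryMode::RollForwardOnly, _) => Action::Defer,
        (RecoveryMode::Full, Classification::RollBack) => Action::RollBack,
        (RecoveryMode::Full, Classification::Abort) => Action::Abort,
    }
}
```

Under this sketch, plan(RecoveryMode::RollForwardOnly, Classification::RollBack)
yields Action::Defer, matching the "left on disk" behavior described above.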

Implementation:
  * recovery.rs: RecoveryMode enum; recover_manifest_drift takes mode;
    process_sidecar branches on mode for Abort and RollBack — both
    defer to next ReadWrite open under RollForwardOnly. RollForward
    behavior unchanged.
  * omnigraph.rs: Omnigraph::refresh promoted to pub; calls
    recover_manifest_drift in RollForwardOnly mode after coordinator
    refresh. Steady-state cost: one list_dir of __recovery (early
    return on empty). Adds refresh_coordinator_only — pub(crate) —
    for engine-internal callers that hold an in-flight sidecar (the
    schema_apply lease-check + lock-release paths). Without this split,
    refresh would race the in-flight sidecar.
  * schema_apply.rs: switch all 6 internal db.refresh() call sites to
    refresh_coordinator_only().
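
The refresh split described above can be modeled in memory as follows.
This is a sketch under stated assumptions, not the real Omnigraph type:
Db, Sidecar, and every field are hypothetical stand-ins, the sidecar list
stands in for the __recovery directory listing, and
refresh_coordinator_only is pub here only so the example is easy to
exercise (the commit makes it pub(crate)).

```rust
/// Hypothetical stand-in for one recovery sidecar under `__recovery`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Sidecar {
    pub name: String,
    /// True when the sidecar classifies as roll-forward; anything else
    /// (RollBack, Abort) is deferred during refresh.
    pub roll_forward: bool,
}

/// Hypothetical in-memory model of the database handle.
#[derive(Default)]
pub struct Db {
    pub sidecars: Vec<Sidecar>,
    pub coordinator_refreshes: u32,
    pub sweeps: u32,
    pub published: Vec<String>,
}

impl Db {
    /// Public refresh: coordinator refresh, then roll-forward-only recovery.
    pub fn refresh(&mut self) {
        self.refresh_coordinator_only();
        self.recover_roll_forward_only();
    }

    /// Refresh without recovery, for callers holding an in-flight sidecar.
    pub fn refresh_coordinator_only(&mut self) {
        self.coordinator_refreshes += 1;
    }

    fn recover_roll_forward_only(&mut self) {
        // Steady state: one listing, early return when nothing is pending.
        if self.sidecars.is_empty() {
            return;
        }
        self.sweeps += 1;
        let (forward, deferred): (Vec<Sidecar>, Vec<Sidecar>) =
            std::mem::take(&mut self.sidecars)
                .into_iter()
                .partition(|s| s.roll_forward);
        // Roll-forward sidecars are published and removed.
        self.published.extend(forward.into_iter().map(|s| s.name));
        // Everything else stays on disk for the next ReadWrite open.
        self.sidecars = deferred;
    }
}
```

In this model, a caller that holds its own in-flight sidecar calls
refresh_coordinator_only and never triggers a sweep, so it cannot process
a sidecar it is still writing.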

Tests:
  * refresh_runs_roll_forward_recovery_in_process — trigger
    mutation.post_finalize_pre_publisher; without restart, call
    db.refresh(); assert sidecar deleted, drifted row visible,
    subsequent mutation succeeds.
  * refresh_defers_rollback_eligible_sidecar_to_next_open — synthesize
    a Mutation sidecar with bogus expected (UnexpectedAtP1 → RollBack);
    refresh leaves it on disk and Lance HEAD unchanged; drop and reopen
    then runs the full sweep, which advances HEAD via restore.

Docs:
  * docs/runs.md "Long-running servers" caveat updated to describe the
    refresh-time roll-forward path and the rollback-defer behavior.
  * docs/invariants.md §VI.23 status line updated to reflect in-process
    closure of the common case.

Workspace tests pass with --features failpoints; no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 00:15:42 +02:00
releases MR-794 step 2: address PR #68 review — merge semantics, cardinality, residual 2026-05-01 13:47:55 +02:00
architecture.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
audit.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
branches-commits.md recovery: rename composite test, strip ticket references, address review 2026-05-03 13:56:36 +02:00
changes.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
ci.md Address reviewer feedback (Cursor + cubic) on PR #60 2026-04-29 00:09:06 +02:00
cli-reference.md Address reviewer feedback (Cursor + cubic) on PR #60 2026-04-29 00:09:06 +02:00
cli.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
constants.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
deployment.md Document AWS build variant and bearer-token sources 2026-04-18 04:04:45 +03:00
embeddings.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
errors.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
execution.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
indexes.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
install.md Remove stale Homebrew source-build note 2026-04-11 14:12:49 +03:00
invariants.md recovery: refresh-time roll-forward closes the in-process residual 2026-05-04 00:15:42 +02:00
lance.md lance: confirm MemWAL is opt-in, intra-table, no overlap with MR-847 2026-05-02 19:44:37 +02:00
maintenance.md recovery: rename composite test, strip ticket references, address review 2026-05-03 13:56:36 +02:00
merge.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
policy.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
query-language.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
runs.md recovery: refresh-time roll-forward closes the in-process residual 2026-05-04 00:15:42 +02:00
schema-language.md Address reviewer feedback (Cursor + cubic) on PR #60 2026-04-29 00:09:06 +02:00
server.md MR-771: demote Run to direct-publish via expected_table_versions CAS 2026-04-30 08:52:50 +02:00
storage.md recovery: rename composite test, strip ticket references, address review 2026-05-03 13:56:36 +02:00
testing.md recovery: rename composite test, strip ticket references, address review 2026-05-03 13:56:36 +02:00