Mirror of https://github.com/ModernRelay/omnigraph.git (synced 2026-06-09 01:35:18 +02:00)
Implement the remaining half of the open-time recovery sweep.

Roll-forward execution (db/manifest/recovery.rs::roll_forward_all): constructs a GraphNamespacePublisher directly. Recovery runs inside Omnigraph::open before the engine struct exists, so it cannot go through Omnigraph::commit_updates_on_branch_with_expected. For each sidecar table it builds a ManifestChange::Update, reading row_count and TableVersionMetadata from the dataset at post_commit_pin (cheap: these are manifest-level reads, not a row scan). It then calls publisher.publish with expected_table_versions set to sidecar.expected_version for each table. A single __manifest CAS extends every pin atomically, so the roll-forward is all-or-nothing at the substrate. Persistent CAS contention surfaces as the typed ExpectedVersionMismatch error and leaves the sidecar in place for the next open to retry.

Audit model (new crates/omnigraph/src/db/recovery_audit.rs + record_audit() in recovery.rs): each successful recovery sweep records a graph-commit row tagged with actor_id="omnigraph:recovery". It also records a row in a new sibling table, _graph_commit_recoveries.lance, carrying:
- recovery_kind (RolledForward | RolledBack)
- recovery_for_actor (the sidecar's original actor_id)
- operation_id (sidecar ULID)
- sidecar_writer_kind
- per_table_outcomes (JSON-serialized for schema flexibility)
- created_at

Operators investigating "did my mutation land?" can find the answer via `omnigraph commit list --filter actor=omnigraph:recovery`, joined to the recoveries table by graph_commit_id.

The sibling-table choice avoids bumping INTERNAL_MANIFEST_SCHEMA_VERSION or migrating _graph_commits.lance. It has the same non-atomic pair-write shape as the existing _graph_commits + _graph_commit_actors split: a crash between the two sequential writes leaves an orphan commit row with no recovery row. The recovery sweep tolerates this. On re-entry it classifies already-restored and already-published tables as NoMovement, so the action is a no-op, and the audit append is retried.
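The all-or-nothing roll-forward and its idempotent re-entry can be sketched against an in-memory stand-in for __manifest. This is a minimal illustration under stated assumptions, not the GraphNamespacePublisher API. `Manifest`, `Update`, `publish`, and `roll_forward` are hypothetical names, and table pins are plain integers. A table with no pin is treated as version 0. The error struct only mirrors the name ExpectedVersionMismatch; its fields are assumed. `publish` validates every expected version before touching any pin, so one mismatch leaves the manifest unchanged. `roll_forward` skips tables already at their target, which models the NoMovement re-entry case.

```rust
// Hypothetical sketch of an all-or-nothing, version-checked publish.
// Not the omnigraph GraphNamespacePublisher API; names and types are illustrative.
use std::collections::HashMap;

/// A table pin, modelled as a plain version number.
pub type Version = u64;

/// Illustrative analogue of the typed ExpectedVersionMismatch error (fields assumed).
#[derive(Debug, PartialEq)]
pub struct ExpectedVersionMismatch {
    pub table: String,
    pub expected: Version,
    pub actual: Version,
}

/// In-memory stand-in for __manifest: table name -> pinned version.
#[derive(Debug, Default)]
pub struct Manifest {
    pub pins: HashMap<String, Version>,
}

/// Move `table` from `expected` (sidecar.expected_version) to `target` (post_commit_pin).
pub struct Update {
    pub table: String,
    pub expected: Version,
    pub target: Version,
}

impl Manifest {
    /// Checks every expected version before changing any pin, so a single
    /// mismatch leaves the manifest untouched (all-or-nothing).
    pub fn publish(&mut self, updates: &[Update]) -> Result<(), ExpectedVersionMismatch> {
        for u in updates {
            // A table with no pin yet is treated as version 0 (assumption).
            let actual = self.pins.get(&u.table).copied().unwrap_or(0);
            if actual != u.expected {
                return Err(ExpectedVersionMismatch {
                    table: u.table.clone(),
                    expected: u.expected,
                    actual,
                });
            }
        }
        for u in updates {
            self.pins.insert(u.table.clone(), u.target);
        }
        Ok(())
    }
}

/// Roll forward, skipping tables already at their target (the NoMovement case
/// on re-entry). Returns how many pins were moved; 0 means the sweep was a no-op.
pub fn roll_forward(
    manifest: &mut Manifest,
    updates: Vec<Update>,
) -> Result<usize, ExpectedVersionMismatch> {
    let pending: Vec<Update> = updates
        .into_iter()
        .filter(|u| manifest.pins.get(&u.table) != Some(&u.target))
        .collect();
    if !pending.is_empty() {
        manifest.publish(&pending)?;
    }
    Ok(pending.len())
}
```

Checking everything before writing anything is how this sketch gets all-or-nothing behaviour in memory. In the commit, that guarantee comes from the single __manifest CAS instead.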
Note on the classifier: process_sidecar's RollBack arm now restores RolledPastExpected, UnexpectedAtP1, AND UnexpectedMultistep tables (any drift class). The earlier Phase 3 logic restored RolledPastExpected tables only, which left UnexpectedAtP1 and UnexpectedMultistep tables drifted. The all-or-nothing decision rule in docs/invariants.md §VI.23 requires that every drifted table be restored.

3 new integration tests in tests/recovery.rs (7 total now):
- recovery_rolls_forward_after_phase_b_completes: happy-path roll-forward; the audit row is recorded; idempotent on a second open.
- recovery_rolls_back_records_audit_row_with_recovery_actor: the roll-back path also records an audit row carrying the original actor.
- recovery_rolls_forward_with_null_actor: a sidecar without an actor_id still records the audit row (recovery_for_actor = None).

3 new unit tests in db::recovery_audit pin the round-trip, persistence, and recovery_kind string parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
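The RollBack restore rule from the classifier note can be sketched as a filter over already-classified tables. The `Drift` variants reuse the class names from this commit. `tables_to_restore`, `phase3_tables_to_restore`, and the `(table, class)` input shape are hypothetical, and the sketch does not model how each class is detected. The narrower Phase 3 rule is included only to show which tables it left drifted.

```rust
// Hypothetical sketch of the RollBack restore rule; not the omnigraph classifier.

/// Drift classes named in the commit; how each is detected is out of scope here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Drift {
    NoMovement,
    RolledPastExpected,
    UnexpectedAtP1,
    UnexpectedMultistep,
}

impl Drift {
    /// Every class except NoMovement counts as drift.
    pub fn is_drifted(self) -> bool {
        !matches!(self, Drift::NoMovement)
    }
}

/// Current rule: the RollBack arm restores every drifted table, whatever its class.
pub fn tables_to_restore<'a>(classified: &[(&'a str, Drift)]) -> Vec<&'a str> {
    classified
        .iter()
        .filter(|entry| entry.1.is_drifted())
        .map(|entry| entry.0)
        .collect()
}

/// Earlier Phase 3 rule, shown only for contrast: it restored RolledPastExpected
/// tables alone and left UnexpectedAtP1 / UnexpectedMultistep tables drifted.
pub fn phase3_tables_to_restore<'a>(classified: &[(&'a str, Drift)]) -> Vec<&'a str> {
    classified
        .iter()
        .filter(|entry| entry.1 == Drift::RolledPastExpected)
        .map(|entry| entry.0)
        .collect()
}
```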