omnigraph/crates
Ragnor Comerford 8c6506f5cd
recovery: close four correctness gaps (schema-apply, branch-aware, restore short-circuit, merge parent)
B1. Schema-apply atomicity. Before this commit, a failure between the
    `_schema.pg.staging` write and the manifest publish left the repo
    corrupt: Lance HEADs had advanced under the new schema while the
    manifest stayed at the old pins. On reopen, schema-state recovery
    deleted the staging files (the manifest's table set still matched
    the live schema), and manifest-drift recovery then rolled the table
    versions forward, leaving new-schema data on disk with the old
    `_schema.pg` live.

    Fix: a SchemaApply sidecar is the marker that Phase B completed but
    Phase C didn't. New helper `has_schema_apply_sidecar` is consulted
    by `recover_schema_state_files` BEFORE its disambiguation logic;
    when present, it completes the staging→final rename so the
    subsequent manifest-drift roll-forward sees the new catalog.
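
    The ordering described above can be sketched as a small decision
    function. This is a simplified model, not the real
    `recover_schema_state_files`: the struct, enum, and function names
    below are hypothetical, and the existing disambiguation logic is
    represented only as a fall-through variant.

```rust
// Hypothetical model of what schema-state recovery sees on reopen.
// Names are illustrative; they are not omnigraph's real types.
#[derive(Debug, Clone, Copy)]
struct SchemaStateOnDisk {
    /// `_schema.pg.staging` exists.
    staging_present: bool,
    /// A SchemaApply sidecar exists (Phase B completed, Phase C did not).
    schema_apply_sidecar_present: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum SchemaRecoveryStep {
    /// No staging file, so there is nothing to finish or discard.
    NothingToDo,
    /// Finish the staging -> final rename before manifest-drift recovery runs.
    CompleteStagingRename,
    /// Fall through to the pre-existing disambiguation logic.
    Disambiguate,
}

fn next_schema_recovery_step(state: SchemaStateOnDisk) -> SchemaRecoveryStep {
    if !state.staging_present {
        return SchemaRecoveryStep::NothingToDo;
    }
    // The sidecar check comes first: without it, a manifest that still
    // matches the live schema would cause the staging files to be deleted.
    if state.schema_apply_sidecar_present {
        return SchemaRecoveryStep::CompleteStagingRename;
    }
    SchemaRecoveryStep::Disambiguate
}
```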

B2. Branch-aware recovery. Sidecars from feature-branch writers were
    being classified against main's snapshot and main's Lance HEAD,
    silently no-op'ing or rolling back the wrong table version (the
    classifier saw NoMovement; the writer's drift on the feature branch
    persisted; subsequent feature writers surfaced
    ExpectedVersionMismatch).

    Fix: SidecarTablePin gets an optional `table_branch` field;
    `recover_manifest_drift` opens a per-branch coordinator
    (`GraphCoordinator::open_branch`) per sidecar; `open_lance_head`,
    `restore_table_to_version`, and `roll_forward_all` honor the pin's
    branch via `Dataset::checkout_branch`.
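
    A minimal sketch of why the pin's branch matters for classification.
    Only `table_branch` comes from this commit; the other field names,
    the `Classification` enum, the in-memory head map, and the
    assumption that a missing branch means "main" are illustrative.

```rust
use std::collections::HashMap;

// Hypothetical shape of a sidecar pin; only `table_branch` is named
// in the commit message, the rest is assumed for illustration.
#[derive(Debug, Clone)]
struct SidecarTablePin {
    table: String,
    pinned_version: u64,
    /// Assumed here: `None` means the pin was written against "main".
    table_branch: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum Classification {
    NoMovement,
    Drifted { head: u64 },
}

/// Classify a pin against the HEAD of the branch it was written on.
/// `heads` maps (branch, table) to the current HEAD version and stands
/// in for opening the dataset on that branch.
fn classify(
    pin: &SidecarTablePin,
    heads: &HashMap<(String, String), u64>,
) -> Option<Classification> {
    let branch = pin.table_branch.as_deref().unwrap_or("main");
    let head = *heads.get(&(branch.to_string(), pin.table.clone()))?;
    if head == pin.pinned_version {
        Some(Classification::NoMovement)
    } else {
        Some(Classification::Drifted { head })
    }
}
```

    With main at version 5 and the feature branch at version 6, a pin
    at version 5 is classified as drifted only when it is read against
    the feature branch; read against main it looks like NoMovement,
    which is the misclassification described above.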

B3. Remove fragment-id short-circuit in `restore_table_to_version`.
    Equal fragment IDs do NOT imply equal content: Lance index commits
    and deletion-vector updates change the manifest without touching
    fragment IDs. Skipping restore in those cases would leave Lance HEAD
    ahead of the manifest with no recovery artifact left. Restore is
    now unconditional; pile-up under repeated mid-rollback crashes is
    bounded and is reclaimed by `omnigraph cleanup`.
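
    To illustrate why the removed check was unsafe, here is a toy
    model of a table version. It is not Lance's manifest format; the
    struct and its fields are hypothetical and exist only to show that
    two versions can share fragment IDs and still differ.

```rust
// Toy stand-in for a table version; not Lance's real manifest layout.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TableVersion {
    fragment_ids: Vec<u64>,
    /// Rows masked out by deletion vectors.
    deleted_rows: Vec<u64>,
    /// Names of indices committed at this version.
    indices: Vec<String>,
}

/// The kind of comparison the removed short-circuit relied on.
fn fragment_ids_match(a: &TableVersion, b: &TableVersion) -> bool {
    a.fragment_ids == b.fragment_ids
}

/// Build a later version that only adds a deletion; fragments are untouched.
fn with_deletion(base: &TableVersion, row: u64) -> TableVersion {
    let mut next = base.clone();
    next.deleted_rows.push(row);
    next
}
```

    In this model, `fragment_ids_match(&pinned, &with_deletion(&pinned, 7))`
    is true even though the two versions are not equal, so a restore
    gated on fragment IDs would be skipped when it is still needed.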

B4. Recovered branch_merge records its merge parent. `record_audit`
    always called `append_commit`, dropping `merged_parent_commit_id`,
    so a later `branch_merge feature -> main` between the same pair
    lost already-up-to-date detection. Fix: RecoverySidecar gets an
    optional `merge_source_commit_id`; `branch_merge_on_current_target`
    populates it from `source_head_commit_id`; `record_audit`
    dispatches to `append_merge_commit` when it is present.
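
    The dispatch can be sketched as a match on the optional field. The
    real `record_audit` signature is not shown in this commit message,
    so `plan_audit_write` and `AuditWrite` below are hypothetical
    stand-ins, and the sidecar is reduced to the one field that matters
    here.

```rust
// Reduced, hypothetical view of the sidecar: only the new field.
#[derive(Debug, Clone)]
struct RecoverySidecar {
    merge_source_commit_id: Option<String>,
}

// Which kind of commit record recovery would append (illustrative).
#[derive(Debug, PartialEq, Eq)]
enum AuditWrite {
    /// Corresponds to `append_commit`: no merge parent recorded.
    Commit,
    /// Corresponds to `append_merge_commit`: merge parent preserved.
    MergeCommit { merged_parent_commit_id: String },
}

fn plan_audit_write(sidecar: &RecoverySidecar) -> AuditWrite {
    match &sidecar.merge_source_commit_id {
        Some(id) => AuditWrite::MergeCommit {
            merged_parent_commit_id: id.clone(),
        },
        None => AuditWrite::Commit,
    }
}
```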

New tests: feature-branch sidecar classification (B2); B1 deepens the
existing schema_apply test with live-`_schema.pg` and new-type
assertions; B4 deepens the existing branch_merge test by reading
`_graph_commits.lance` and asserting a non-null `merged_parent_commit_id`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 23:39:41 +02:00
omnigraph           recovery: close four correctness gaps (schema-apply, branch-aware, restore short-circuit, merge parent)  2026-05-03 23:39:41 +02:00
omnigraph-cli       release: bump version to 0.4.1  2026-05-02 23:20:50 +02:00
omnigraph-compiler  release: bump version to 0.4.1  2026-05-02 23:20:50 +02:00
omnigraph-server    release: bump version to 0.4.1  2026-05-02 23:20:50 +02:00