omnigraph/crates
Ragnor Comerford 164bafbbe7
recovery: address PR #72 review findings
Bot reviewers (cubic, cursor, chatgpt-codex) caught 4 merge-blocking
bugs + 3 strongly-recommended fixes + 3 doc errors in the initial PR.
Each fix has a paired test demonstrating the bug before the fix.

Merge-blocking fixes:

- BranchMerge moved to loose-match classifier arm. publish_rewritten_
  merge_table runs multiple commit_staged calls per table (merge_insert
  + delete_where + index rebuilds). Strict classification rolled back
  valid completed Phase B work as UnexpectedMultistep. Three new unit
  tests pin the loose-match behavior for BranchMerge.

- branch_merge sidecar uses self.active_branch() (the resolved target
  branch) instead of inferring from the first sorted table key. The
  previous heuristic could record None (== main) when the merge target
  was a non-main branch, causing recovery to publish to the wrong
  manifest namespace.

- Best-effort sidecar delete in all 5 writer sites (mutation, loader,
  schema_apply, branch_merge, ensure_indices). Previously, a sidecar
  cleanup failure after a successful manifest publish would error out
  the user's call for a write that already landed. Now: log a warning
  and ignore — the next open's recovery sweep tidies the stale sidecar
  via NoMovement classification.

- ensure_indices sidecar scoped to tables that need work via new
  helpers needs_index_work_node / needs_index_work_edge. Previously
  the sidecar pinned every catalog table; if only one needed indexing,
  the others classified as NoMovement and the all-or-nothing decision
  rolled back legitimate index work.

Strongly-recommended fixes:

- recover_manifest_drift now takes &mut GraphCoordinator and refreshes
  between sidecars. Sidecar B's classification needs to see sidecar
  A's manifest changes, otherwise B can be classified against stale
  pins and incorrectly roll back work that just landed.

- list_sidecars sorts URIs before reading. Sidecar filenames are
  ULIDs (chronologically sortable), so this gives deterministic,
  time-ordered processing. Filesystem-order was nondeterministic.

- ReadOnly opens skip recover_schema_state_files too (was: only the
  MR-847 sweep was gated). Read-only consumers may run with read-only
  credentials; silent open-time mutations violate the contract.

Doc cleanups:

- Removed stale "Phase 4 placeholder" comment from
  recover_manifest_drift.
- docs/runs.md decision-tree wording now correctly surfaces the
  InvariantViolation abort path.
- docs/branches-commits.md clarifies actor_id is in
  _graph_commit_actors.lance (joined by graph_commit_id), not on
  _graph_commits.lance itself.

Test surface (post-fixes):
- 25 unit tests in db::manifest::recovery (+4 from this commit).
- 10 integration tests in tests/recovery.rs (+3 from this commit).
- ~672 tests across ~25 binaries pass with --features failpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 12:21:40 +02:00
..
omnigraph recovery: address PR #72 review findings 2026-05-03 12:21:40 +02:00
omnigraph-cli release: bump version to 0.4.1 2026-05-02 23:20:50 +02:00
omnigraph-compiler release: bump version to 0.4.1 2026-05-02 23:20:50 +02:00
omnigraph-server release: bump version to 0.4.1 2026-05-02 23:20:50 +02:00