recovery: rename composite test, strip ticket references, address review

Three bundled changes:

1. Rename `tests/agent_lifecycle.rs` -> `tests/composite_flow.rs` (and
   the test function). OmniGraph is consumed by both humans and agents
   - naming the test after one audience misframes the library.

2. Strip Linear ticket IDs, PR numbers, bot reviewer names, and
   review-round labels from source, tests, and docs added by this
   branch. Internal traceability belongs in commit messages and PR
   descriptions, not in checked-in artifacts. Upstream
   lance-format/lance issue refs and pre-existing MR-XXX refs in docs
   not touched by this branch are left alone.

3. Two outstanding review findings addressed:
   - `needs_index_work_node` / `needs_index_work_edge`: propagate
     `count_rows` errors instead of `unwrap_or(0)`. Silently treating
     transient I/O failures as "0 rows" risked skipping a table from
     the recovery sidecar pin set that was actually about to be
     modified.
   - `recovery_multi_sidecar_requires_fresh_snapshot_for_correctness`:
     strengthen the assertion to fail when sidecar B classifies under
     a stale snapshot. The new assertion checks post-recovery Lance
     HEAD == v3 (no `Dataset::restore` ran). The previous "sidecar
     deleted + audit rows present" pair passed in both the bug and
     fix paths because both delete the sidecar and write an audit
     row; the differentiator is the post-recovery HEAD. Strengthening
     the assertion exposed an additional nuance: in this overlapping-
     sidecar scenario sidecar B's audit kind is RolledBack (no-op)
     rather than RolledForward, since sidecar A's roll-forward
     publishes Lance HEAD as the new manifest pin (absorbing B's
     work). The docstring now explains why this is correct given
     current `roll_forward_all` semantics.

All workspace tests pass with --features failpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Ragnor Comerford 2026-05-03 13:56:36 +02:00
parent 78cc548846
commit 05e52f2ee0
No known key found for this signature in database
22 changed files with 430 additions and 374 deletions

View file

@ -63,7 +63,7 @@ flowchart TB
nodes["nodes/{fnv1a64-hex}/<br/>one dataset per node type"]:::l2
edges["edges/{fnv1a64-hex}/<br/>one dataset per edge type"]:::l2
cgraph["_graph_commits.lance/<br/>_graph_commit_actors.lance/<br/>_graph_commit_recoveries.lance/"]:::l2
recovery["__recovery/{ulid}.json<br/>MR-847 sidecars (transient)"]:::l2
recovery["__recovery/{ulid}.json<br/>recovery sidecars (transient)"]:::l2
refs["_refs/branches/{name}.json<br/>graph-level branches"]:::l2
repo --> manifest
@ -92,8 +92,8 @@ flowchart TB
- **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
- **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 repos also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; MR-770 sweeps these in production.)
- **`_graph_commit_recoveries.lance`** (MR-847) — one row per recovery sweep action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join. See `crates/omnigraph/src/db/recovery_audit.rs`.
- **`__recovery/{ulid}.json`** (MR-847) — transient sidecar files written by the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) before Phase B begins, deleted after Phase C succeeds. A sidecar persisting after process exit means the writer crashed in the Phase B → Phase C window; the next `Omnigraph::open` recovery sweep processes it. Steady-state directory is empty. See `crates/omnigraph/src/db/manifest/recovery.rs`.
- **`_graph_commit_recoveries.lance`** — one row per recovery sweep action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join. See `crates/omnigraph/src/db/recovery_audit.rs`.
- **`__recovery/{ulid}.json`** — transient sidecar files written by the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) before Phase B begins, deleted after Phase C succeeds. A sidecar persisting after process exit means the writer crashed in the Phase B → Phase C window; the next `Omnigraph::open` recovery sweep processes it. Steady-state directory is empty. See `crates/omnigraph/src/db/manifest/recovery.rs`.
- **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.
- **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags.