Bundle of three correctness fixes plus a shared invariants helper that
existing tests now use.
1. SchemaApply atomicity: close the residual gap where a sidecar exists
but staging files don't (e.g., Phase B failure BEFORE
`_schema.pg.staging` write). `recover_schema_state_files` now returns
a `SchemaStateRecovery` discriminator (`Noop` /
`CleanedStaging` / `CompletedStagingRename { schema_apply_sidecar }`);
the token threads through `recover_manifest_drift` →
`process_sidecar`. SchemaApply sidecars are eligible for roll-forward
ONLY when the staging rename completed in the same recovery pass;
otherwise Full mode rolls back and RollForwardOnly defers. Without
this gating, recovery
would publish the manifest pin against new-schema data while
`_schema.pg` stayed old (real corruption). New failpoint
`schema_apply.before_staging_write` + new test
`schema_apply_without_schema_staging_rolls_back_on_next_open` pin
the gating.
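
The gating in item 1 can be sketched as below. This is a minimal
illustration, not the code in this commit: the variant names come from
the text above, but the `String` payload, `RecoveryMode`,
`SchemaApplyDecision`, and `decide_schema_apply` are assumptions made
for the example.

```rust
/// Outcome of schema-state recovery (variant names from the commit text).
/// Assumption: the sidecar is identified by a plain operation-id string.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SchemaStateRecovery {
    Noop,
    CleanedStaging,
    CompletedStagingRename { schema_apply_sidecar: String },
}

/// Recovery modes named in the commit text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecoveryMode {
    Full,
    RollForwardOnly,
}

/// Hypothetical decision type used only in this sketch.
#[derive(Debug, PartialEq, Eq)]
enum SchemaApplyDecision {
    RollForward,
    RollBack,
    Defer,
}

/// A SchemaApply sidecar rolls forward only if the staging rename for
/// that same sidecar completed in this recovery pass.
fn decide_schema_apply(
    recovery: &SchemaStateRecovery,
    sidecar_id: &str,
    mode: RecoveryMode,
) -> SchemaApplyDecision {
    let rename_completed = matches!(
        recovery,
        SchemaStateRecovery::CompletedStagingRename { schema_apply_sidecar }
            if schema_apply_sidecar.as_str() == sidecar_id
    );
    match (rename_completed, mode) {
        (true, _) => SchemaApplyDecision::RollForward,
        (false, RecoveryMode::Full) => SchemaApplyDecision::RollBack,
        (false, RecoveryMode::RollForwardOnly) => SchemaApplyDecision::Defer,
    }
}
```

With `Noop` or `CleanedStaging`, the sketch never returns `RollForward`,
which is the property the new failpoint test pins.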
2. Rollback target correction. Rollback now restores Lance HEAD to the
current manifest pin (`state.manifest_pinned`) instead of the
sidecar's `expected_version`. For UnexpectedAtP1/UnexpectedMultistep
classifications these can differ; the old code could regress Lance
HEAD past the manifest pin, re-introducing drift in the OTHER
direction. The new behavior establishes `Lance HEAD == manifest pin`
post-rollback, the canonical drift-free invariant. The parameter is
renamed from `expected_version` to `target_version` to match, and the
audit `to_version` records the actual restore target.
This is a latent-behavior change. Any external consumer that compared
`audit.to_version` against `sidecar.expected_version` for non-trivial
classifications now sees the manifest pin instead.
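
A rough sketch of the rollback-target change in item 2. The structs
here are hypothetical, trimmed-down stand-ins (only `manifest_pinned`,
`expected_version`, and `target_version` are names from the text), and
the in-memory `lance_head` field stands in for the real Lance restore.

```rust
/// Hypothetical view of per-table recovery state.
struct TableState {
    manifest_pinned: u64,
    lance_head: u64,
}

/// Hypothetical view of a sidecar; only the field relevant here.
struct Sidecar {
    expected_version: u64,
}

/// Old behavior: restore to whatever the sidecar expected.
fn rollback_target_old(_state: &TableState, sidecar: &Sidecar) -> u64 {
    sidecar.expected_version
}

/// New behavior: restore to the current manifest pin.
fn rollback_target_new(state: &TableState, _sidecar: &Sidecar) -> u64 {
    state.manifest_pinned
}

/// Apply the rollback and return the value recorded as audit `to_version`.
fn rollback(state: &mut TableState, sidecar: &Sidecar) -> u64 {
    let target_version = rollback_target_new(state, sidecar);
    state.lance_head = target_version; // stand-in for the Lance restore
    target_version
}
```

When the two values differ, the old target leaves HEAD and the pin
disagreeing after rollback, while the new target makes them equal by
construction.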
3. Audit commit-graph unification. `record_audit` now opens the
per-branch commit graph for ANY sidecar with `sidecar.branch.is_some()`
— not just BranchMerge. Plain Mutation/Load/EnsureIndices commits on a
feature branch now correctly land on that branch's commit graph,
instead of main's. Closes the class of bug analogous to D2 but for
non-merge writers.
Pre-existing repos with non-main commits already on main's commit
graph stay where they are; future recoveries write to the per-branch
ref. Mixed-version compatibility is asymmetric but safe (old binaries
ignore per-branch refs they don't know about; new binaries read both).
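
The selection rule in item 3 can be illustrated as follows. This is a
sketch under assumptions: `CommitGraphRef`, the `SidecarKind` enum, and
both helper functions are hypothetical; only `sidecar.branch` and the
writer names come from the text.

```rust
/// Hypothetical handle for which commit graph an audit entry lands on.
#[derive(Debug, PartialEq, Eq)]
enum CommitGraphRef {
    Main,
    Branch(String),
}

/// Writer kinds mentioned in the commit text.
enum SidecarKind {
    Mutation,
    Load,
    EnsureIndices,
    BranchMerge,
}

struct Sidecar {
    kind: SidecarKind,
    branch: Option<String>,
}

/// Old rule: only BranchMerge sidecars used the per-branch graph.
fn audit_graph_old(s: &Sidecar) -> CommitGraphRef {
    match (&s.kind, &s.branch) {
        (SidecarKind::BranchMerge, Some(b)) => CommitGraphRef::Branch(b.clone()),
        _ => CommitGraphRef::Main,
    }
}

/// New rule: any sidecar with a branch uses that branch's graph.
fn audit_graph_new(s: &Sidecar) -> CommitGraphRef {
    match &s.branch {
        Some(b) => CommitGraphRef::Branch(b.clone()),
        None => CommitGraphRef::Main,
    }
}
```

For a `Load` sidecar on a feature branch, the old rule yields `Main`
and the new rule yields `Branch(..)`; sidecars without a branch are
unaffected.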
4. Recovery invariants helper + branch-axis cells. New
`tests/helpers/recovery.rs` (~505 LOC) exports
`assert_post_recovery_invariants(repo, op_id, RecoveryExpectation)`
plus a `TableExpectation` builder. Six existing recovery tests
refactored to call it; per-test bespoke assertions replaced. Two new
branch-axis cells added in `tests/failpoints.rs`:
- `recovery_rolls_forward_load_on_feature_branch`
- `recovery_rolls_forward_ensure_indices_on_feature_branch`
The loader gains a `mutation.post_finalize_pre_publisher` failpoint
hook (gated on the `failpoints` feature; zero-cost in release) so the
load test can pin the same Phase B → Phase C boundary the mutation
path uses.
Misc:
- `Omnigraph::refresh` extracts `reload_schema_if_source_changed`:
early-return when schema source unchanged (saves IR parse + catalog
rebuild on the steady-state refresh path).
- New test injection point
`failpoint_publish_table_head_without_index_rebuild_for_test`
under `#[cfg(feature = "failpoints")]`.
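
The early return in `reload_schema_if_source_changed` might look
roughly like this. The `SchemaCache` struct, the signature, and the
`rebuilds` counter are assumptions for illustration; the counter stands
in for the IR parse and catalog rebuild.

```rust
/// Hypothetical holder for the last-seen schema source.
struct SchemaCache {
    source: String,
    /// Counts expensive rebuilds; stands in for IR parse + catalog rebuild.
    rebuilds: u32,
}

impl SchemaCache {
    /// Returns true if a reload happened, false on the steady-state path.
    fn reload_schema_if_source_changed(&mut self, latest_source: &str) -> bool {
        if self.source == latest_source {
            return false; // unchanged: skip the expensive work entirely
        }
        self.source = latest_source.to_owned();
        self.rebuilds += 1;
        true
    }
}
```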
Tests: 31 recovery + failpoint integration tests pass (14 + 17, up from
14 + 16). Full workspace sweep with `--features failpoints` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Maintenance: Optimize & Cleanup
db/omnigraph/optimize.rs.
optimize_all_tables(db) — non-destructive
- Lance `compact_files()` on every node + edge table on `main`.
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older manifests.
- Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8).
- Returns `[TableOptimizeStats { table_key, fragments_removed, fragments_added, committed }]`.
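
As an illustration of the concurrency bound, the environment variable could be resolved along these lines. Only the variable name and the default of 8 come from the text above; the function names and the fallback for unset, unparsable, or zero values are assumptions, and the real parsing may be stricter.

```rust
/// Default taken from the text above.
const DEFAULT_MAINTENANCE_CONCURRENCY: usize = 8;

/// Resolve the bound from a raw environment value.
/// Assumption: unset, unparsable, or zero values fall back to the default.
fn resolve_concurrency(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_MAINTENANCE_CONCURRENCY)
}

/// Read the bound from the process environment.
fn maintenance_concurrency() -> usize {
    let raw = std::env::var("OMNIGRAPH_MAINTENANCE_CONCURRENCY").ok();
    resolve_concurrency(raw.as_deref())
}
```

Splitting the pure `resolve_concurrency` from the environment read keeps the fallback logic testable without mutating process state.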
cleanup_all_tables(db, options) — destructive
- Lance `cleanup_old_versions()` per table.
- Removes manifests (and their unique fragments) older than the retention policy.
- `CleanupPolicyOptions { keep_versions: Option<u32>, older_than: Option<Duration> }`: at least one is required.
- Returns `[TableCleanupStats { table_key, bytes_removed, old_versions_removed }]`.
- CLI guards with `--confirm`; without it, prints a preview line.
- Recovery floor: `--keep < 3` may garbage-collect Lance versions that the open-time recovery sweep needs as a rollback target (the sweep restores to the branch's manifest-pinned table version, which is HEAD-1 in the typical Phase B → Phase C drift case). Default `--keep 10` is safe.
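
The two policy rules above (at least one criterion, and the recovery floor on `--keep`) can be sketched as a pre-flight check. The struct fields match the documented `CleanupPolicyOptions`; `RECOVERY_FLOOR_KEEP`, `PolicyCheck`, and `check_cleanup_policy` are hypothetical names, and whether the real CLI warns or refuses below the floor is not stated here.

```rust
use std::time::Duration;

/// Fields as documented above.
#[derive(Debug, Clone, Default)]
struct CleanupPolicyOptions {
    keep_versions: Option<u32>,
    older_than: Option<Duration>,
}

/// Below this, cleanup may remove a version recovery needs (from the text).
const RECOVERY_FLOOR_KEEP: u32 = 3;

/// Hypothetical result of a pre-flight policy check.
#[derive(Debug, PartialEq, Eq)]
enum PolicyCheck {
    Ok,
    BelowRecoveryFloor { keep: u32 },
    MissingCriterion,
}

fn check_cleanup_policy(opts: &CleanupPolicyOptions) -> PolicyCheck {
    match (opts.keep_versions, opts.older_than) {
        // At least one of the two criteria is required.
        (None, None) => PolicyCheck::MissingCriterion,
        // Flag retention counts that can starve the recovery sweep.
        (Some(keep), _) if keep < RECOVERY_FLOOR_KEEP => {
            PolicyCheck::BelowRecoveryFloor { keep }
        }
        _ => PolicyCheck::Ok,
    }
}
```

An `older_than`-only policy passes this check, although an aggressive duration could in principle remove a needed version as well; the sketch does not attempt to model that case.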
Tombstones
Logical sub-table delete markers in `__manifest`; `tombstone_object_id(table_key, version)` excludes a sub-table version from snapshot reconstruction.
Internal schema migrations (db/manifest/migrations.rs)
Version evolutions of the on-disk `__manifest` shape are reconciled automatically on the first write under a new binary. `INTERNAL_MANIFEST_SCHEMA_VERSION` declares the shape the binary expects; the on-disk stamp `omnigraph:internal_schema_version` (Lance schema-level metadata) records the on-disk shape. The publisher's open-for-write path calls `migrate_internal_schema` before reading state; reads are side-effect-free. No operator action is required for in-place upgrades. See storage.md → Internal schema versioning for the full mechanism.
A binary opening a manifest stamped at a version higher than it knows about refuses to publish with a clear "upgrade omnigraph first" error — old binaries cannot clobber a newer schema.
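
The version comparison described above can be sketched as a small decision function. The constant name comes from the text, but its value here is a placeholder, and `WritePlan`, `plan_open_for_write`, and the exact error wording are assumptions rather than the real `migrate_internal_schema` behavior.

```rust
use std::cmp::Ordering;

/// Shape the binary expects. The value 2 is a placeholder for this sketch.
const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 2;

/// Hypothetical outcome of the open-for-write version check.
#[derive(Debug, PartialEq, Eq)]
enum WritePlan {
    UpToDate,
    Migrate { from: u32, to: u32 },
}

/// Compare the on-disk stamp with the version this binary knows.
fn plan_open_for_write(on_disk: u32) -> Result<WritePlan, String> {
    match on_disk.cmp(&INTERNAL_MANIFEST_SCHEMA_VERSION) {
        Ordering::Equal => Ok(WritePlan::UpToDate),
        // Older stamp: migrate in place before reading state.
        Ordering::Less => Ok(WritePlan::Migrate {
            from: on_disk,
            to: INTERNAL_MANIFEST_SCHEMA_VERSION,
        }),
        // Newer stamp: refuse to publish rather than clobber it.
        Ordering::Greater => Err(format!(
            "manifest stamped at internal schema version {}, binary knows {}: upgrade omnigraph first",
            on_disk, INTERNAL_MANIFEST_SCHEMA_VERSION
        )),
    }
}
```

Only the write path would call this; reads skip the check's side effects entirely, consistent with the read path being side-effect-free.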