mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-12 01:45:14 +02:00
Bundle of three correctness fixes plus a shared invariants helper that
existing tests now use.
1. SchemaApply atomicity: close the residual gap where a sidecar exists
but staging files don't (e.g., Phase B failure BEFORE
`_schema.pg.staging` write). `recover_schema_state_files` now returns
a `SchemaStateRecovery` discriminator (`Noop` /
`CleanedStaging` / `CompletedStagingRename { schema_apply_sidecar }`);
the token threads through `recover_manifest_drift` →
`process_sidecar`. SchemaApply sidecars are eligible for roll-forward
ONLY when the staging rename completed in the same recovery pass.
Full mode rolls back; RollForwardOnly defers. Without this, recovery
would publish the manifest pin against new-schema data while
`_schema.pg` stayed old (real corruption). New failpoint
`schema_apply.before_staging_write` + new test
`schema_apply_without_schema_staging_rolls_back_on_next_open` pin
the gating.
2. Rollback target correction. Rollback now restores Lance HEAD to the
current manifest pin (`state.manifest_pinned`) instead of the
sidecar's `expected_version`. For UnexpectedAtP1/UnexpectedMultistep
classifications these can differ; the old code could regress Lance
HEAD past the manifest pin, re-introducing drift in the OTHER
direction. The new behavior establishes `Lance HEAD == manifest pin`
post-rollback — the canonical drift-free invariant. Param renamed
from `expected_version` → `target_version` to match. Audit
`to_version` records the actual restore target.
This is a latent-behavior change. Any external consumer that compared
`audit.to_version` against `sidecar.expected_version` for non-trivial
classifications now sees the manifest pin instead.
3. Audit commit-graph unification. `record_audit` now opens the
per-branch commit graph for ANY sidecar with `sidecar.branch.is_some()`
— not just BranchMerge. Plain Mutation/Load/EnsureIndices commits on a
feature branch now correctly land on that branch's commit graph,
instead of main's. Closes the class of bug analogous to D2 but for
non-merge writers.
Pre-existing repos with non-main commits already on main's commit
graph stay where they are; future recoveries write to the per-branch
ref. Mixed-version compatibility is asymmetric but safe (old binaries
ignore per-branch refs they don't know about; new binaries read both).
4. Recovery invariants helper + branch-axis cells. New
`tests/helpers/recovery.rs` (~505 LOC) exports
`assert_post_recovery_invariants(repo, op_id, RecoveryExpectation)`
plus a `TableExpectation` builder. Six existing recovery tests
refactored to call it; per-test bespoke assertions replaced. Two new
branch-axis cells added in `tests/failpoints.rs`:
- `recovery_rolls_forward_load_on_feature_branch`
- `recovery_rolls_forward_ensure_indices_on_feature_branch`
The loader gains a `mutation.post_finalize_pre_publisher` failpoint
hook (gated on the `failpoints` feature; zero-cost in release) so the
load test can pin the same Phase B → Phase C boundary the mutation
path uses.
Misc:
- `Omnigraph::refresh` extracts `reload_schema_if_source_changed`:
early-return when schema source unchanged (saves IR parse + catalog
rebuild on the steady-state refresh path).
- New test injection point
`failpoint_publish_table_head_without_index_rebuild_for_test`
under `#[cfg(feature = "failpoints")]`.
Tests: 31 recovery + failpoint integration tests pass (14 + 17, up from
14 + 16). Full workspace sweep with `--features failpoints` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
29 lines
2.1 KiB
Markdown
29 lines
2.1 KiB
Markdown
# Maintenance: Optimize & Cleanup
|
|
|
|
`db/omnigraph/optimize.rs`.
|
|
|
|
## `optimize_all_tables(db)` — non-destructive
|
|
|
|
- Lance `compact_files()` on every node + edge table on `main`.
|
|
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older manifests.
|
|
- Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8).
|
|
- Returns `[TableOptimizeStats { table_key, fragments_removed, fragments_added, committed }]`.
|
|
|
|
## `cleanup_all_tables(db, options)` — destructive
|
|
|
|
- Lance `cleanup_old_versions()` per table.
|
|
- Removes manifests (and their unique fragments) older than the retention policy.
|
|
- `CleanupPolicyOptions { keep_versions: Option<u32>, older_than: Option<Duration> }` — at least one is required.
|
|
- Returns `[TableCleanupStats { table_key, bytes_removed, old_versions_removed }]`.
|
|
- CLI guards with `--confirm`; without it, prints a preview line.
|
|
- **Recovery floor:** `--keep < 3` may garbage-collect Lance versions that the open-time recovery sweep needs as a rollback target (the sweep restores to the branch's manifest-pinned table version, which is HEAD-1 in the typical Phase B → Phase C drift case). Default `--keep 10` is safe.
|
|
|
|
## Tombstones
|
|
|
|
Logical sub-table delete markers in `__manifest`; `tombstone_object_id(table_key, version)` excludes a sub-table version from snapshot reconstruction.
|
|
|
|
## Internal schema migrations (`db/manifest/migrations.rs`)
|
|
|
|
Version evolutions of the on-disk `__manifest` shape are reconciled automatically on the first write under a new binary. `INTERNAL_MANIFEST_SCHEMA_VERSION` declares the shape the binary expects; the on-disk stamp `omnigraph:internal_schema_version` (Lance schema-level metadata) records the on-disk shape. The publisher's open-for-write path calls `migrate_internal_schema` before reading state; reads are side-effect-free. No operator action is required for in-place upgrades. See [storage.md → Internal schema versioning](storage.md) for the full mechanism.
|
|
|
|
A binary opening a manifest stamped at a version *higher* than it knows about refuses to publish with a clear "upgrade omnigraph first" error — old binaries cannot clobber a newer schema.
|