omnigraph/docs/maintenance.md

30 lines
2.1 KiB
Markdown
Raw Normal View History

# Maintenance: Optimize & Cleanup
`db/omnigraph/optimize.rs`.
## `optimize_all_tables(db)` — non-destructive
- Lance `compact_files()` on every node + edge table on `main`.
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older manifests.
- Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8).
- Returns `[TableOptimizeStats { table_key, fragments_removed, fragments_added, committed }]`.
## `cleanup_all_tables(db, options)` — destructive
- Lance `cleanup_old_versions()` per table.
- Removes manifests (and their unique fragments) older than the retention policy.
- `CleanupPolicyOptions { keep_versions: Option<u32>, older_than: Option<Duration> }` — at least one is required.
- Returns `[TableCleanupStats { table_key, bytes_removed, old_versions_removed }]`.
- CLI guards with `--confirm`; without it, prints a preview line.
recovery: rename composite test, strip ticket references, address review Three bundled changes: 1. Rename `tests/agent_lifecycle.rs` -> `tests/composite_flow.rs` (and the test function). OmniGraph is consumed by both humans and agents - naming the test after one audience misframes the library. 2. Strip Linear ticket IDs, PR numbers, bot reviewer names, and review-round labels from source, tests, and docs added by this branch. Internal traceability belongs in commit messages and PR descriptions, not in checked-in artifacts. Upstream lance-format/lance issue refs and pre-existing MR-XXX refs in docs not touched by this branch are left alone. 3. Two outstanding review findings addressed: - `needs_index_work_node` / `needs_index_work_edge`: propagate `count_rows` errors instead of `unwrap_or(0)`. Silently treating transient I/O failures as "0 rows" risked skipping a table from the recovery sidecar pin set that was actually about to be modified. - `recovery_multi_sidecar_requires_fresh_snapshot_for_correctness`: strengthen the assertion to fail when sidecar B classifies under a stale snapshot. The new assertion checks post-recovery Lance HEAD == v3 (no `Dataset::restore` ran). The previous "sidecar deleted + audit rows present" pair passed in both the bug and fix paths because both delete the sidecar and write an audit row; the differentiator is the post-recovery HEAD. Strengthening the assertion exposed an additional nuance: in this overlapping- sidecar scenario sidecar B's audit kind is RolledBack (no-op) rather than RolledForward, since sidecar A's roll-forward publishes Lance HEAD as the new manifest pin (absorbing B's work). The docstring now explains why this is correct given current `roll_forward_all` semantics. All workspace tests pass with --features failpoints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:56:36 +02:00
- **Recovery floor:** `--keep < 3` may garbage-collect Lance versions that the open-time recovery sweep needs as a rollback target (the sweep restores to the manifest-pinned `expected_version`, which is HEAD-1 in the typical Phase B → Phase C drift case). Default `--keep 10` is safe.
## Tombstones
Logical sub-table delete markers in `__manifest`; `tombstone_object_id(table_key, version)` excludes a sub-table version from snapshot reconstruction.
Add internal-schema versioning + auto-migration for __manifest The on-disk shape of `__manifest` is reconciled with the binary via a single stamp + dispatcher in `db/manifest/migrations.rs`: - `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` declares the shape this binary writes. - The on-disk stamp `omnigraph:internal_schema_version` lives in the manifest dataset's schema-level metadata (Lance `update_schema_metadata`). - `migrate_internal_schema(&mut dataset)` walks `match`-arm steps forward from the on-disk stamp until it matches the binary, then returns. Idempotent. - `init_manifest_repo` stamps the current version at creation; the publisher's open-for-write path runs pending migrations before reading state. Reads stay side-effect-free. - Forward-version protection: a stamp higher than the binary's known version triggers a clear "upgrade omnigraph first" error so an old binary cannot clobber a newer schema. Self-heals existing pre-MR-766 deployments by auto-applying the v1→v2 step: the `lance-schema:unenforced-primary-key` annotation on `__manifest.object_id` that engages Lance's row-level CAS at commit time. New repos created via `init` are stamped at v2 immediately and don't need migration. Adding a future on-disk shape change is one constant bump, one match arm in `migrate_internal_schema`, and one test — no new branches in unrelated code paths. Code outside the migration module never inspects the stamp. New tests in `manifest/tests.rs`: - `test_init_stamps_internal_schema_version` - `test_publish_migrates_pre_stamp_manifest_to_current_version` - `test_publish_rejects_manifest_stamped_at_future_version` Docs: `docs/storage.md`, `docs/maintenance.md`, `docs/constants.md` updated per the AGENTS.md maintenance contract.
2026-04-29 11:44:14 +00:00
## Internal schema migrations (`db/manifest/migrations.rs`)
Version evolutions of the on-disk `__manifest` shape are reconciled automatically on the first write under a new binary. `INTERNAL_MANIFEST_SCHEMA_VERSION` declares the shape the binary expects; the on-disk stamp `omnigraph:internal_schema_version` (Lance schema-level metadata) records the on-disk shape. The publisher's open-for-write path calls `migrate_internal_schema` before reading state; reads are side-effect-free. No operator action is required for in-place upgrades. See [storage.md → Internal schema versioning](storage.md) for the full mechanism.
A binary opening a manifest stamped at a version *higher* than it knows about refuses to publish with a clear "upgrade omnigraph first" error — old binaries cannot clobber a newer schema.