Remove developer-only scaffolding that leaked into the public user/operator docs, while preserving every user-facing behavior, command, flag, endpoint, constant, and env var. No behavior changes.

Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types, internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause, sidecar Phase-B/C mechanics), keeping the user-visible behavior (e.g. "optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.

Internal IR/lowering/recovery-internals sections were either trimmed to a brief user-facing note (e.g. "Traversal execution", "interrupted writes recover automatically; recovery commits are recorded under actor omnigraph:recovery") or removed.

Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints, error taxonomy, every constant and env var (verified none dropped from the constants cheat-sheet), and the operator-facing explanations of on-disk artifacts. Residual "legacy" mentions are all user-facing (the deprecated omnigraph.yaml, the legacy token chain, old command names).

Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across docs/user; zero broken links; check-agents-md.sh green.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Maintenance: Optimize, Repair & Cleanup
Addressing. optimize, repair, and cleanup are storage-plane CLI commands: they run with direct storage access against a positional URI, --target, or --cluster <dir|s3://…> --cluster-graph <id> (which resolves the graph's storage URI from the served cluster state, so you needn't know the <storage>/graphs/<id>.omni layout). They never run through a server, and reject --server / --graph or a --target that resolves to a remote (http(s)://) URL with a declared error. There are no server routes for them by design — to maintain a server-backed graph, run them out-of-band against the graph's storage URI. See the Command planes section of cli-reference.md.
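The addressing rule can be modeled as a small guard. This is a minimal sketch, not the CLI's actual code: `resolve_storage_target`, `AddressingError`, and the error messages are hypothetical names chosen for illustration, and the `--cluster` / `--cluster-graph` resolution path is left out because its mechanics are not described here.

```python
from urllib.parse import urlparse


class AddressingError(ValueError):
    """Hypothetical stand-in for the CLI's declared addressing error."""


def resolve_storage_target(target=None, server=None, graph=None):
    """Return the storage URI a maintenance command may operate on.

    Models only the rule stated in the text: server-style addressing is
    rejected, and so is a target that resolves to an http(s) URL.
    """
    if server is not None or graph is not None:
        raise AddressingError("maintenance commands do not accept --server / --graph")
    if target is None:
        raise AddressingError("a storage URI is required")
    # Local paths parse with an empty scheme; object stores with e.g. "s3".
    scheme = urlparse(target).scheme.lower()
    if scheme in ("http", "https"):
        raise AddressingError(f"{target!r} is a remote server URL, not a storage URI")
    return target
```

Under these assumptions, a local path or an `s3://` URI passes through unchanged, while an `https://` target or any use of `server` / `graph` raises before storage is touched.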
optimize — non-destructive
- Compacts every node + edge table on main, then publishes the compacted version to the __manifest so the manifest's recorded version tracks the compacted state. Reads pin the manifest version, so without this publish, compaction would be invisible to readers and would break the version precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually compacted.
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older versions until cleanup runs.
- Each table's compact→publish serializes with concurrent mutations on the same table. A crash mid-operation is recovered automatically on the next open (compaction is content-preserving, so roll-forward is always safe).
- Requires a recovered graph. optimize refuses (errors) when a pending crash-recovery operation is present, because operating on an unrecovered graph could publish a partial write that recovery would roll back. Reopen the graph to run recovery, then re-run optimize.
- Uncovered drift is skipped, not interpreted. If a table's underlying version is ahead of the version recorded in __manifest and no crash-recovery record covers that movement, optimize reports skipped: DriftNeedsRepair with the manifest/head versions and leaves the table untouched. Run omnigraph repair to classify and explicitly publish that drift.
- Bounded by OMNIGRAPH_MAINTENANCE_CONCURRENCY (default 8).
- Returns per-table stats: table_key, fragments_removed, fragments_added, committed, skipped, manifest_version, lance_head_version.
- Blob tables are skipped. A table that declares any Blob property is not compacted: it is reported with skipped: BlobColumnsUnsupportedByLance (and logged) instead of compacted, and the rest of the sweep proceeds normally. Reads and writes are unaffected; only compaction is. Consequence: fragment count and deleted-row space on blob tables are not reclaimed; query results are never affected.
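An operator script might triage the per-table stats listed above along these lines. This is a sketch under assumptions: it takes the stats as a list of dicts keyed by the documented field names, treats `skipped` as either absent/null or one of the two reason strings named above, and treats `committed` as a boolean. The function name and the output buckets are hypothetical.

```python
def triage_optimize_stats(rows):
    """Group table keys by what the operator should do next.

    rows: iterable of dicts using the documented stat field names.
    The dict shape is an assumption; this document does not show the
    CLI's actual output encoding.
    """
    report = {"compacted": [], "needs_repair": [], "blob_skipped": [], "unchanged": []}
    for row in rows:
        key = row["table_key"]
        skipped = row.get("skipped")
        if skipped == "DriftNeedsRepair":
            # Uncovered drift: follow up with `omnigraph repair`.
            report["needs_repair"].append(key)
        elif skipped == "BlobColumnsUnsupportedByLance":
            # Expected for tables with Blob properties; nothing to do.
            report["blob_skipped"].append(key)
        elif row.get("committed"):
            report["compacted"].append(key)
        else:
            report["unchanged"].append(key)
    return report
```

Any table that lands in `needs_repair` is a candidate for an `omnigraph repair` preview; tables in `blob_skipped` need no action.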
repair — explicit
- Handles uncovered manifest/head drift: a table's underlying version is ahead of the manifest pin and no crash-recovery record explains the movement.
- Preview by default. omnigraph repair --json <uri> reports each table's classification, action, manifest/head versions, underlying operation names, and any classification error. --confirm publishes only verified maintenance drift; if any suspicious or unverifiable table is refused, the CLI prints the per-table output and exits non-zero. --force --confirm also publishes suspicious or unverifiable drift after operator review.
- Classifies drift by reading the table's transaction history from manifest_version + 1 through the current head. Only fragment-reservation and rewrite (compaction) operations are verified maintenance. Semantic operations such as append, delete, update, merge, or missing transaction history are not auto-healed.
- Publishes repair by advancing __manifest to the existing head; it does not rewrite data. If the publish succeeds, normal reads and strict writes use the repaired version. If it fails, no new data-side partial state was created.
- Requires a clean recovery state. A pending crash-recovery operation still belongs to automatic recovery, not manual repair.
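The classification and publish rules above can be sketched as two small functions. Everything named here is an assumption for illustration: the operation-name strings `ReserveFragments` and `Rewrite`, the classification labels, and the function names are hypothetical, and the real command's output may use different values.

```python
# Assumed operation names for "fragment-reservation" and "rewrite (compaction)".
VERIFIED_MAINTENANCE_OPS = {"ReserveFragments", "Rewrite"}


def classify_drift(operations):
    """Classify drift for one table.

    operations: operation names for versions manifest_version + 1 through
    head, with None where the transaction history is missing. Only called
    when the head is ahead of the manifest, so an empty list means the
    history could not be read at all.
    """
    if not operations or any(op is None for op in operations):
        return "unverifiable"
    if all(op in VERIFIED_MAINTENANCE_OPS for op in operations):
        return "verified_maintenance"
    return "suspicious"  # at least one semantic op such as append or delete


def should_publish(classification, confirm=False, force=False):
    """Mirror the flag rules: preview by default, --force widens --confirm."""
    if not confirm:
        return False  # preview only, nothing is published
    if classification == "verified_maintenance":
        return True
    return force  # suspicious / unverifiable need --force --confirm
```

In this model `--force` without `--confirm` still publishes nothing, which matches the preview-by-default behavior described above.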
cleanup — destructive
- Garbage-collects old versions per table.
- Removes versions (and their unique fragments) older than the retention policy.
- Policy options keep_versions and older_than; at least one is required.
- Returns per-table stats: table_key, bytes_removed, old_versions_removed, error.
- Fault-isolated per table. A single table's transient failure (version GC or orphan reclaim) is recorded on that table's stats row (with an error) and logged, and never aborts the healthy tables. Because cleanup is the convergence backstop, it does as much as it can and converges on re-run. The CLI reports any failed tables; rerun cleanup to retry them.
- CLI guards with --confirm; without it, prints a preview line.
- Recovery floor: --keep < 3 may garbage-collect versions that crash recovery needs as a rollback target. Default --keep 10 is safe.
- Orphaned-branch reconciliation: before the version GC, cleanup reclaims any per-table or commit-graph branch absent from the manifest branch list. These orphans arise when a branch_delete flips the manifest authority but a downstream best-effort reclaim does not complete (see branches-commits.md). The reconciler is idempotent (it no-ops once nothing is orphaned), runs regardless of the keep_versions / older_than values (those gate version GC only), and never reclaims main or system-branch forks. Reclaimed forks are logged.
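Two of the rules above lend themselves to a short sketch: the policy check with its recovery floor, and the orphan set computed as a difference against the manifest branch list. The function names, the warning text, and the `is_system_branch` predicate are hypothetical; how system branches are actually identified is not covered here, so the caller supplies that test.

```python
RECOVERY_FLOOR = 3  # from the text: keeping fewer than 3 versions may remove rollback targets


def check_cleanup_policy(keep_versions=None, older_than=None):
    """Validate a retention policy and return any operator warnings."""
    if keep_versions is None and older_than is None:
        raise ValueError("at least one of keep_versions / older_than is required")
    warnings = []
    if keep_versions is not None and keep_versions < RECOVERY_FLOOR:
        warnings.append(
            f"keep_versions={keep_versions} is below {RECOVERY_FLOOR}: "
            "crash recovery may lose its rollback target"
        )
    return warnings


def orphaned_branches(existing, manifest_branches, is_system_branch):
    """Branches present in storage but absent from the manifest branch list.

    main and system branches are never candidates. Running this again
    after a reclaim yields an empty list, which is the idempotence
    property described above.
    """
    candidates = set(existing) - set(manifest_branches)
    return sorted(
        b for b in candidates if b != "main" and not is_system_branch(b)
    )
```

The orphan computation deliberately takes no retention arguments, reflecting that reconciliation runs regardless of the keep_versions / older_than values.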
Tombstones
Logical sub-table delete markers in __manifest that exclude a sub-table version from snapshot reconstruction.
Internal schema migrations
Version evolutions of the on-disk __manifest shape are reconciled automatically on the first write under a new binary. An on-disk stamp records the shape; the binary migrates it forward before reading state, and reads are side-effect-free. No operator action is required for in-place upgrades. See storage.md → Internal schema versioning for the full mechanism.
A binary opening a manifest stamped at a version higher than it knows about refuses to publish with a clear "upgrade omnigraph first" error — old binaries cannot clobber a newer schema.
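The version gate on the publish path can be pictured as follows. This is a rough sketch only: it assumes the on-disk stamp is an integer and that migrations apply one version at a time, neither of which is stated here, and `plan_publish` and `UpgradeRequired` are hypothetical names. See storage.md for the actual mechanism.

```python
class UpgradeRequired(RuntimeError):
    """Hypothetical error type for the "upgrade omnigraph first" refusal."""


def plan_publish(on_disk_stamp, binary_known_version):
    """Return the ordered steps a write would take before publishing.

    Assumes integer stamps and single-step forward migrations.
    """
    if on_disk_stamp > binary_known_version:
        # An old binary must not clobber a newer on-disk shape.
        raise UpgradeRequired("upgrade omnigraph first")
    steps = [
        f"migrate {v}->{v + 1}"
        for v in range(on_disk_stamp, binary_known_version)
    ]
    steps.append("publish")
    return steps
```

When the stamp already matches the binary, the plan contains only the publish step, which corresponds to the case where no migration is needed.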