omnigraph/docs/user/maintenance.md
Andrew Altshuler 6144bb18d6
2026-06-14 02:52:21 +03:00


Maintenance: Optimize, Repair & Cleanup

Implemented in db/omnigraph/optimize.rs and db/omnigraph/repair.rs.

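Addressing (RFC-010). optimize, repair, and cleanup are storage-plane CLI commands: they run with direct storage access against a positional URI, --target, or --cluster <dir|s3://…> --cluster-graph <id>. The cluster form resolves the graph's storage URI from the cluster's state ledger, so you needn't know the <storage>/graphs/<id>.omni layout. The graph must be in the applied revision, otherwise the command errors. Catalog payloads are not validated, so corrupt or unrelated cluster resources cannot block maintenance. --cluster and --cluster-graph must be given together and are mutually exclusive with the positional URI and --target. A separate flag is needed because the global --graph requires --server and names a remote graph. The commands never run through a server: they reject --server / --graph, and a --target that resolves to a remote (http(s)://) URL, with a declared error. There are no server routes for them by design; to maintain a server-backed graph, run them out-of-band against the graph's storage URI. See the Command planes section of cli-reference.md.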
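The addressing rule can be sketched as a small resolver. This is an illustration, not the CLI's code. The Addressing enum and resolve_maintenance_target are hypothetical names. The applied_graphs set stands in for the graph ids in the ledger's applied revision. Only the <root>/graphs/<id>.omni layout and the refusal of remote URLs come from this page; clap's both-or-neither and exclusivity checks on the flags would run before anything like this.

```rust
use std::collections::HashSet;

/// Hypothetical model of how the operator addressed the graph.
pub enum Addressing<'a> {
    /// A positional URI, or a --target that resolved to a URI.
    Uri(&'a str),
    /// --cluster <root> --cluster-graph <id>.
    Cluster { root: &'a str, graph_id: &'a str },
}

/// Resolve a maintenance target to a storage URI, or explain why not.
/// `applied_graphs` stands in for the graph ids in the ledger's applied revision.
pub fn resolve_maintenance_target(
    addr: &Addressing,
    applied_graphs: &HashSet<String>,
) -> Result<String, String> {
    match addr {
        Addressing::Uri(uri) => {
            // Maintenance never runs through a server: refuse remote URLs.
            if uri.starts_with("http://") || uri.starts_with("https://") {
                return Err(format!("{} is a remote URL; use the graph's storage URI", uri));
            }
            Ok(uri.to_string())
        }
        Addressing::Cluster { root, graph_id } => {
            // Only ledger membership is checked; no catalog payload is read.
            if !applied_graphs.contains(*graph_id) {
                return Err(format!("graph '{}' is not in the applied revision", graph_id));
            }
            // The storage URI is derived from the layout, not looked up.
            Ok(format!("{}/graphs/{}.omni", root.trim_end_matches('/'), graph_id))
        }
    }
}
```

Deriving the URI from the layout, rather than reading a served snapshot, keeps maintenance usable when other cluster resources are degraded.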

optimize_all_tables(db) — non-destructive

  • Runs Lance compact_files() on every node and edge table on main, then publishes the compacted version to the __manifest so the manifest's table_version tracks the compacted Lance HEAD. Reads pin the manifest version. Without this publish, compaction would be invisible to readers and would break the HEAD-vs-manifest precondition of the next schema apply or strict update/delete ("stale view … refresh and retry"). The publish, a system-attributed commit that advances the graph version, happens only for tables that actually compacted.
  • Rewrites small fragments into fewer large ones; old fragments remain reachable via older manifests until cleanup runs.
  • Each table's compact→publish sequence runs under that table's per-(table, main) write queue, which serializes it with concurrent mutations. This matters because compaction is a Lance Rewrite operation, and it hits a retryable conflict with a concurrent merge/update/delete on overlapping fragments. The window between the Lance HEAD advancing and the manifest publish is covered by a SidecarKind::Optimize recovery sidecar (loose-match). A crash in that window rolls the compacted version forward on the next Omnigraph::open, which is always safe because compaction preserves content.
  • Requires a recovered graph. optimize errors out when an unresolved recovery sidecar is present under __recovery. Operating on an unrecovered graph could publish a partial write that the open-time recovery sweep would roll back. Reopen the graph to run the recovery sweep, then re-run optimize.
  • Uncovered drift is skipped, not interpreted. If a table's Lance HEAD is ahead of the version recorded in __manifest and no recovery sidecar covers that movement, optimize reports skipped: Some(DriftNeedsRepair) with the manifest/head versions and leaves the table untouched. Run omnigraph repair to classify and explicitly publish that drift.
  • Bounded by OMNIGRAPH_MAINTENANCE_CONCURRENCY (default 8).
  • Returns [TableOptimizeStats { table_key, fragments_removed, fragments_added, committed, skipped, manifest_version, lance_head_version }].
  • Blob tables are skipped. A table that declares any Blob property is not compacted. Instead it is reported with skipped: Some(BlobColumnsUnsupportedByLance) and logged via tracing::warn, and the rest of the sweep proceeds normally. The reason is that the current Lance compact_files mis-decodes blob-v2 columns under its forced BlobHandling::AllBinary read. Reads and writes are unaffected; only compaction is. The skip is gated by LANCE_SUPPORTS_BLOB_COMPACTION (db/omnigraph/optimize.rs) and will be removed when the upstream Lance fix lands (see docs/dev/lance.md). Until then, fragment count and deleted-row space on blob tables are not reclaimed; query results are never affected.
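The per-table decisions above reduce to a small pure function. This is a hedged sketch with hypothetical names: TableState, Plan, plan_table, and plan_optimize are invented, and the order of the checks is assumed. Only the two skip reasons and the gate name come from this page. Because optimize refuses to run while any recovery sidecar is pending, the sketch treats every table whose Lance HEAD is ahead of its manifest pin as uncovered drift.

```rust
/// Why optimize left a table untouched (variant names from this page).
#[derive(Debug, PartialEq)]
pub enum SkipReason {
    BlobColumnsUnsupportedByLance,
    DriftNeedsRepair { manifest_version: u64, lance_head_version: u64 },
}

#[derive(Debug, PartialEq)]
pub enum Plan {
    Skip(SkipReason),
    /// Compact under the table's write queue, then publish the new HEAD to __manifest.
    CompactAndPublish,
}

/// Hypothetical per-table inputs to the decision.
pub struct TableState {
    pub has_blob_property: bool,
    pub manifest_version: u64,
    pub lance_head_version: u64,
}

/// Gate name from db/omnigraph/optimize.rs; the value reflects the current Lance limitation.
const LANCE_SUPPORTS_BLOB_COMPACTION: bool = false;

pub fn plan_table(t: &TableState) -> Plan {
    if t.has_blob_property && !LANCE_SUPPORTS_BLOB_COMPACTION {
        return Plan::Skip(SkipReason::BlobColumnsUnsupportedByLance);
    }
    // No sidecars are pending (see plan_optimize), so HEAD past the pin is uncovered drift.
    if t.lance_head_version > t.manifest_version {
        return Plan::Skip(SkipReason::DriftNeedsRepair {
            manifest_version: t.manifest_version,
            lance_head_version: t.lance_head_version,
        });
    }
    Plan::CompactAndPublish
}

/// Graph-level precondition first, then one plan per table.
pub fn plan_optimize(pending_recovery_sidecars: usize, tables: &[TableState]) -> Result<Vec<Plan>, String> {
    if pending_recovery_sidecars > 0 {
        return Err("unresolved recovery sidecar under __recovery: reopen the graph, then re-run optimize".to_string());
    }
    Ok(tables.iter().map(plan_table).collect())
}
```

Skipped tables never fail the sweep. They only change what is reported for that table, which is why the real command can still compact every other table.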

repair_all_tables(db, options) — explicit

  • Handles uncovered manifest/head drift: a table's Lance HEAD is ahead of the manifest pin and no recovery sidecar records the writer intent.
  • Preview by default. omnigraph repair --json <uri> reports each table's classification, action, manifest/head versions, Lance operation names, and any classification error. --confirm publishes only verified maintenance drift; if any suspicious or unverifiable table is refused, the CLI prints the per-table output and exits non-zero. --force --confirm also publishes suspicious or unverifiable drift after operator review.
  • Classifies drift by reading Lance transactions from manifest_version + 1 through lance_head_version. Only ReserveFragments and Rewrite are verified maintenance. Semantic operations such as Append, Delete, Update, Merge, or missing transaction history are not auto-healed.
  • Repair publishes by advancing __manifest to the existing Lance HEAD; it does not rewrite Lance data. If the publish succeeds, normal reads and strict writes use the repaired version. If it fails, no new partial state has been created on the data side.
  • Requires a clean recovery state. Pending __recovery sidecars still belong to automatic sidecar recovery, not manual repair.
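Here is a minimal sketch of the classification and the CLI's confirm/force policy. It assumes a hypothetical history accessor that returns the Lance operation name committed at each version, or None when that transaction cannot be read. Several choices are assumptions for illustration:

- Semantic operations map to Suspicious.
- Missing history maps to Unverifiable.
- --force without --confirm still only previews.

```rust
/// How a drifted table is classified (names assumed for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Classification {
    VerifiedMaintenance,
    Suspicious,
    Unverifiable,
}

/// Classify the Lance transactions from manifest_version + 1 through
/// lance_head_version. `history(v)` returns the operation name committed at
/// version v, or None if that transaction cannot be read (hypothetical accessor).
pub fn classify(
    manifest_version: u64,
    lance_head_version: u64,
    history: impl Fn(u64) -> Option<String>,
) -> Classification {
    let mut verdict = Classification::VerifiedMaintenance;
    for v in (manifest_version + 1)..=lance_head_version {
        match history(v).as_deref() {
            // Only these two operations count as verified maintenance.
            Some("ReserveFragments") | Some("Rewrite") => {}
            // Any semantic operation (Append, Delete, Update, Merge, ...) is conclusive.
            Some(_) => return Classification::Suspicious,
            // Missing history: keep scanning in case a later version is suspicious.
            None => verdict = Classification::Unverifiable,
        }
    }
    verdict
}

#[derive(Debug, PartialEq)]
pub enum Action {
    Preview,
    Publish,
    Refuse,
}

/// Preview by default; --confirm publishes verified drift; --force --confirm publishes the rest.
pub fn repair_action(c: Classification, confirm: bool, force: bool) -> Action {
    match (confirm, c, force) {
        (false, _, _) => Action::Preview,
        (true, Classification::VerifiedMaintenance, _) => Action::Publish,
        (true, _, true) => Action::Publish,
        (true, _, false) => Action::Refuse,
    }
}
```

A Refuse on any table is what makes the real CLI exit non-zero after printing the per-table output.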

cleanup_all_tables(db, options) — destructive

  • Lance cleanup_old_versions() per table.
  • Removes manifests (and their unique fragments) older than the retention policy.
  • CleanupPolicyOptions { keep_versions: Option<u32>, older_than: Option<Duration> } — at least one is required.
  • Returns [TableCleanupStats { table_key, bytes_removed, old_versions_removed, error }].
  • Fault-isolated per table. A single table's transient failure (version GC or orphan reclaim) is recorded on that table's stats row (error: Some(..), logged via tracing) and never aborts the healthy tables — cleanup is the convergence backstop, so it does as much as it can and converges on re-run. The CLI reports any failed tables; rerun cleanup to retry them.
  • The CLI requires --confirm; without it, cleanup only prints a preview line.
  • Recovery floor: --keep < 3 may garbage-collect Lance versions that the open-time recovery sweep needs as a rollback target (the sweep restores to the branch's manifest-pinned table version, which is HEAD-1 in the typical Phase B → Phase C drift case). Default --keep 10 is safe.
  • Orphaned-branch reconciliation: before the version GC, cleanup runs reconcile_orphaned_branches, which force_delete_branches any per-table or commit-graph Lance branch absent from the manifest branch list. These orphans arise when a branch_delete flips the manifest authority but a downstream best-effort reclaim does not complete (see branches-commits.md). The reconciler is authority-derived and idempotent (it no-ops once nothing is orphaned), runs regardless of the keep_versions / older_than values (those gate version GC only), and never reclaims main or system-branch forks. Reclaimed forks are logged via tracing::info.
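The policy check and the orphan computation can be sketched as follows. CleanupPolicyOptions mirrors the fields listed above. validate_policy, orphaned_branches, and the warning text are hypothetical, and this page does not say whether the CLI warns below the recovery floor. The protected set stands in for main plus the system branches, whose exact identification is not documented here. Orphan reconciliation takes no retention input at all, matching the note that it runs regardless of keep_versions / older_than.

```rust
use std::collections::BTreeSet;
use std::time::Duration;

/// Field names follow the CleanupPolicyOptions described above.
pub struct CleanupPolicyOptions {
    pub keep_versions: Option<u32>,
    pub older_than: Option<Duration>,
}

/// Keeping fewer versions than this may remove the recovery sweep's rollback target.
const RECOVERY_FLOOR: u32 = 3;

/// Err if neither bound is set; Ok(Some(warning)) if --keep is below the floor.
pub fn validate_policy(p: &CleanupPolicyOptions) -> Result<Option<String>, String> {
    if p.keep_versions.is_none() && p.older_than.is_none() {
        return Err("cleanup needs keep_versions and/or older_than".to_string());
    }
    match p.keep_versions {
        Some(k) if k < RECOVERY_FLOOR => Ok(Some(format!(
            "--keep {} is below the recovery floor of {}",
            k, RECOVERY_FLOOR
        ))),
        _ => Ok(None),
    }
}

/// Lance branches absent from the manifest branch list, minus protected ones.
pub fn orphaned_branches(
    lance_branches: &BTreeSet<String>,
    manifest_branches: &BTreeSet<String>,
    protected: &BTreeSet<String>, // "main" plus system branches (identification assumed)
) -> Vec<String> {
    lance_branches
        .difference(manifest_branches)
        .filter(|b| !protected.contains(*b))
        .cloned()
        .collect()
}
```

Because the orphan set is computed from the manifest authority each time, running it again after a successful reclaim returns an empty list. This is the idempotence described above.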

Tombstones

Logical sub-table delete markers in __manifest; tombstone_object_id(table_key, version) excludes a sub-table version from snapshot reconstruction.
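To illustrate how tombstones filter snapshot reconstruction, this sketch keys tombstones by the same (table_key, version) pair that tombstone_object_id takes. The real object-id format is not described here, so SubTableVersion and live_versions are hypothetical.

```rust
use std::collections::HashSet;

/// A sub-table version referenced by a manifest (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct SubTableVersion {
    pub table_key: String,
    pub version: u64,
}

/// Drop every entry whose (table_key, version) carries a tombstone marker,
/// keeping the survivors in their original order.
pub fn live_versions(
    entries: &[SubTableVersion],
    tombstones: &HashSet<(String, u64)>,
) -> Vec<SubTableVersion> {
    entries
        .iter()
        .filter(|e| !tombstones.contains(&(e.table_key.clone(), e.version)))
        .cloned()
        .collect()
}
```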

Internal schema migrations (db/manifest/migrations.rs)

Version evolutions of the on-disk __manifest shape are reconciled automatically on the first write under a new binary. INTERNAL_MANIFEST_SCHEMA_VERSION declares the shape the binary expects; the on-disk stamp omnigraph:internal_schema_version (Lance schema-level metadata) records the on-disk shape. The publisher's open-for-write path calls migrate_internal_schema before reading state; reads are side-effect-free. No operator action is required for in-place upgrades. See storage.md → Internal schema versioning for the full mechanism.

A binary opening a manifest stamped at a version higher than it knows about refuses to publish with a clear "upgrade omnigraph first" error — old binaries cannot clobber a newer schema.
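The open-for-write version gate can be sketched as a three-way comparison between the on-disk stamp and the binary's expected version. The constant's value (3) and the WriteGate enum are illustrative assumptions; the real constant lives in db/manifest/migrations.rs. Reads do not migrate.

```rust
use std::cmp::Ordering;

/// Illustrative value only; the real constant lives in db/manifest/migrations.rs.
const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 3;

#[derive(Debug, PartialEq)]
pub enum WriteGate {
    UpToDate,
    /// Older on-disk shape: migrate in place before reading state.
    Migrate { from: u32, to: u32 },
    /// Newer on-disk shape: refuse to publish ("upgrade omnigraph first").
    RefuseUpgradeFirst { on_disk: u32 },
}

/// Decide what the open-for-write path does given the on-disk
/// omnigraph:internal_schema_version stamp.
pub fn write_gate(on_disk: u32) -> WriteGate {
    match on_disk.cmp(&INTERNAL_MANIFEST_SCHEMA_VERSION) {
        Ordering::Less => WriteGate::Migrate { from: on_disk, to: INTERNAL_MANIFEST_SCHEMA_VERSION },
        Ordering::Equal => WriteGate::UpToDate,
        Ordering::Greater => WriteGate::RefuseUpgradeFirst { on_disk },
    }
}
```

The Greater arm is what prevents an old binary from clobbering a newer schema.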