feat(cli): cluster-managed maintenance addressing + init signpost (RFC-010 Slice 3) (#221)

* feat(cluster): cluster_root_for_graph_uri detection helper (RFC-010 Slice 3) Public helper the CLI uses to refuse `init` into a cluster-managed location: given a graph storage URI of the cluster layout (`<root>/graphs/<id>.omni`), return the cluster root if `<root>` holds `__cluster/state.json`, else None. Cheap by construction — a URI that doesn't match the `<root>/graphs/<id>.omni` shape returns None with zero I/O, so ordinary `init` targets never probe storage. Works for file:// and s3:// via the storage adapter. Adds two ClusterStore accessors (`display_root`, `has_state`). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(cli): cluster-managed maintenance addressing + init signpost (RFC-010 Slice 3) Two cluster-graph-aware CLI behaviors, sharing the cluster-resolution path. Maintenance addressing. `optimize`/`repair`/`cleanup` gain `--cluster <dir|s3://…> --cluster-graph <id>`, which resolves the graph's storage URI from the served cluster snapshot (the same truth a `--cluster` server boots from — `read_serving_snapshot*`) and opens it embedded. The operator no longer hand-types `<storage>/graphs/<id>.omni`. A distinct flag is required because the global `--graph` is `requires = server` and means a remote multi-graph id. clap enforces both-or-neither and exclusion with the positional URI / `--target`; an unserved graph errors loudly, pointing at `cluster apply`. init signpost. `init` refuses a cluster-managed positional path (the `<root>/graphs/<id>.omni` layout where `<root>` holds `__cluster/state.json`, detected by `cluster_root_for_graph_uri`) and points at `cluster apply` — graphs in an established cluster are created with ledger/recovery/approvals, not by hand. The check is gated on the path shape, so ordinary `init` does no extra I/O and existing pre-apply cluster-graph inits are unaffected. planes guard remediation now also mentions `--cluster … --cluster-graph …` (the two Slice-1 guard-string tests track it). Docs updated (cli-reference Command planes, maintenance.md, cluster.md §7); the stale "no S3-hosted cluster directories" limitation is dropped (RFC-006 landed it). Tests (cli_cluster.rs, reusing the apply-a-cluster fixture): resolve by id, unknown-id error, `--cluster` requires `--cluster-graph`, init refusal + signpost, and ordinary init still works. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(cli): resolve cluster graphs from the state ledger, not the serving snapshot Addresses the Greptile review on #221. `read_serving_snapshot*` does all-or-nothing serving validation — recovery-sidecar checks plus a digest verify of every catalog payload (query .gq, policy blobs). Using it to resolve a maintenance target coupled `optimize`/`repair`/`cleanup` to the readiness of unrelated resources: a single corrupt policy blob, or a pending recovery sweep, would block the command before it could touch the graph — worst for `repair`, the tool you reach for *when the cluster is degraded*. Add `omnigraph_cluster::resolve_graph_storage_uri(cluster, graph_id)`: read the state ledger, confirm the graph is in the applied revision, return `graph_root(id)` — the URI is deterministically derivable, no catalog validation. The CLI's cluster resolver now calls it. Test: `optimize --cluster … --cluster-graph …` still resolves after the catalog payloads (`__cluster/resources/`) are removed — the ledger-only path is not blocked by degraded/unrelated catalog state. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 02:28:07 +02:00 · 2026-06-14 02:52:21 +03:00 · 2026-06-14 02:52:21 +03:00 · 6144bb18d6
commit 6144bb18d6
parent d6cf5b298c
13 changed files with 401 additions and 14 deletions
--- a/docs/user/cluster.md
+++ b/docs/user/cluster.md
@ -251,12 +251,28 @@ with an in-flight apply.
  loads). It just no longer describes the deployment — a server boots from
  one source or the other, never a merge of both.

+## 7. Maintaining a cluster graph
+
+Storage maintenance (`optimize` / `repair` / `cleanup`) is **not** a control-plane
+operation — it runs out-of-band, with direct storage access, against the graph's
+roots. Address a cluster graph by name instead of hand-typing its storage path:
+
+```bash
+omnigraph optimize --cluster ./company-brain --cluster-graph knowledge
+omnigraph cleanup  --cluster ./company-brain --cluster-graph knowledge --keep 10 --confirm
+# --cluster also takes the storage-root URI directly (config-free):
+omnigraph optimize --cluster s3://bucket/clusters/company-brain --cluster-graph knowledge
+```
+
+The graph's storage URI is resolved from the **served cluster state** (the same
+truth a `--cluster` server boots from); a graph that hasn't been applied yet is
+not resolvable. Run these from a host with storage access — there are no server
+routes for them. Conversely, **`init` refuses** a cluster-managed path: graphs in
+a cluster are created by `cluster apply`, not by hand.
+
 ## What the control plane does not do (yet)

 - **No hot reload** — applied changes serve on the next restart.
- **No S3-hosted cluster directories** — the config dir, ledger, catalog,
-  and derived graph roots are local-filesystem paths today. (Individual
-  *graphs* on S3 are a server feature outside cluster mode.)
 - **No data operations** — rows move through `omnigraph load / ingest /
  mutate` against the graph roots, with branches and merges as usual.
 - **Stored-query exposure is all-or-nothing per cluster** — every applied