# Cluster Config **Status:** Phase 5 — cluster-booted serving (`omnigraph-server --cluster`). > New to the cluster tooling? Start with the operator how-to guide, > [cluster.md](cluster.md) — this document is the reference. Cluster config is the future control-plane configuration surface for a whole OmniGraph deployment. In this stage, OmniGraph can validate a local `cluster.yaml` folder, produce a deterministic read-only plan, inspect the local JSON state ledger, explicitly refresh/import graph observations into that ledger, manually remove a held local state lock by exact lock id, and **apply the executable subset of the plan** — stored-query and policy-bundle catalog writes, **graph creation** (a declared graph that does not exist yet is initialized by apply at the derived root), **schema updates** (soft drops only), and — behind an explicit, digest-bound **approval** — **graph deletion**. It does not perform data-loss schema migrations, start servers, or serve anything it applies: the server still boots from `omnigraph.yaml`. ## Commands ```bash omnigraph cluster validate --config ./company-brain omnigraph cluster plan --config ./company-brain --json omnigraph cluster apply --config ./company-brain --json omnigraph cluster approve graph. --config ./company-brain --as omnigraph cluster status --config ./company-brain --json omnigraph cluster refresh --config ./company-brain --json omnigraph cluster import --config ./company-brain --json omnigraph cluster force-unlock --config ./company-brain --json ``` `--config` points at a directory, not a file. The directory must contain `cluster.yaml`. When omitted, it defaults to the current directory. ## Relationship to `omnigraph.yaml` `cluster.yaml` does not replace `omnigraph.yaml`, and the two never describe the same fact. `omnigraph.yaml` is the permanent **per-operator** layer (CLI defaults, the operator's identity and credential references, graph targets for data-plane commands); `cluster.yaml` is the shared desired state of a whole deployment, read only by the `cluster` commands via `--config`. The exact contract: - **Cluster commands read `omnigraph.yaml` for exactly one thing**: the `cli.actor` default used by `apply`/`approve` when `--as` is omitted — operator identity is a per-operator fact. With `--as` present, no config is read at all. Nothing else (its graph set, targets, bind, queries, policies) ever influences a cluster command; a malformed `omnigraph.yaml` breaks only the no-flag actor lookup, loudly. - **A `--cluster` server reads `omnigraph.yaml` for nothing** — not even the implicit current-directory search runs (mode-inference rule 0). Boot from cluster state XOR `omnigraph.yaml`, never a merge. - **The other direction is ergonomics, not coupling**: a per-operator `omnigraph.yaml` may point `graphs..uri` at a cluster's derived root (`./company-brain/graphs/knowledge.omni`) so data-plane commands can use `--target ` — an ordinary local path, no special handling. ## Supported `cluster.yaml` Stage 3A accepts only this resource subset: ```yaml version: 1 metadata: name: company-brain state: backend: cluster lock: true graphs: knowledge: schema: ./knowledge.pg queries: find_experts: file: ./knowledge.gq policies: base: file: ./base.policy.yaml applies_to: [knowledge] ``` `metadata.name` is a display label. `state.backend` may be omitted or set to `cluster`; external state backends are reserved for a later stage. `state.lock` defaults to `true`. When enabled, `cluster plan`, `cluster apply`, `cluster refresh`, and `cluster import` briefly acquire `/__cluster/lock.json`, then remove it before returning. `cluster status` never acquires the lock; it only reports whether one is present. `cluster force-unlock` is the only lock-removal command; it requires the exact lock id and should be run only after confirming no cluster operation is active. ## Validation `cluster validate` checks: - `cluster.yaml` syntax and supported fields - duplicate YAML keys - schema, query, and policy file existence - schema parsing and catalog construction - stored-query parsing and query-name matching - stored-query type-checking against the desired schema - policy `applies_to` graph references Fields reserved for later phases, such as `pipelines`, `embeddings`, `ui`, `aliases`, and `bindings`, fail with a typed diagnostic instead of being silently ignored. ## Planning `cluster plan` first performs validation, then reads local JSON state from: ```text /__cluster/state.json ``` If the file is missing, the state is treated as empty and every desired resource is planned as a create. If present, the file must use this shape: ```json { "version": 1, "state_revision": 0, "applied_revision": { "config_digest": "...", "resources": { "graph.knowledge": { "digest": "..." }, "schema.knowledge": { "digest": "..." }, "query.knowledge.find_experts": { "digest": "..." }, "policy.base": { "digest": "...", "applies_to": ["cluster", "graph.knowledge"] } } }, "resource_statuses": { "graph.knowledge": { "status": "applied", "conditions": [], "message": "optional status detail" } }, "approval_records": {}, "recovery_records": {}, "observations": {} } ``` `state_revision`, `resource_statuses`, `approval_records`, `recovery_records`, and `observations` are optional so older Stage 1 state fixtures keep working. Missing `state_revision` is treated as `0`. Resource status values are `pending`, `planned`, `applying`, `applied`, `drifted`, `blocked`, or `error`. Plan output compares desired resource digests against state resource digests and reports `create`, `update`, and `delete` changes. It also reports the state CAS (`sha256:`) and state revision. `state_observations.locked` means an existing lock file was observed, along with its metadata (`lock_id`, `lock_operation`, `lock_created_at`, `lock_pid`, `lock_age_seconds`); a successful `plan` instead reports `lock_acquired: true` and an `acquired_lock_id`, then releases the lock before returning. The command never writes `state.json` and does not scan live graphs. Use explicit `cluster refresh` / `cluster import` when the state ledger should be updated from live observations. Live drift scans during plan are later-stage work. Policy entries additionally record their applied `applies_to` bindings as normalized typed refs — the state ledger is serving-sufficient for the future server-boot stage. A change to `applies_to` alone (the policy file digest unchanged) appears in the plan as an Update marked `binding_change` (human output: `[bindings]`), applies like any catalog change, and counts toward convergence; ledgers written before this field existed are backfilled by the next apply. Each plan change carries a `disposition` field — an honest preview of what `cluster apply` will do with it in this stage: `applied` (executes), `derived` (a `graph.` composite-digest update that converges automatically once its query digests land), `deferred` (graph/schema change, later phase), or `blocked` (query/policy gated by an unapplied or missing dependency, with the condition in `reason`). ## Apply `cluster apply` executes the executable subset of the plan — stored-query and policy-bundle changes, graph creates, and schema updates. There is no confirm flag: `cluster plan` is the preview, and apply recomputes the same diff under the state lock before executing, so a stale preview can never be applied. Apply requires an existing `state.json` (`state_missing` directs you to `cluster import` first). For each applied create/update, the resource payload is written content-addressed into the local catalog: ```text /__cluster/resources/query///.gq /__cluster/resources/policy//.yaml ``` Extensions are fixed per kind regardless of the source file's name. Payloads are written before the state update because `state.json` is the publish point: if the final CAS-checked state write fails, no success is reported and the digest-named blobs already written are inert — re-running apply is the repair. Deletes remove the resource from state; their old payload blobs stay on disk (garbage collection is a later stage). Re-running a converged apply is a no-op: no state write, no revision change (`state_written: false`). **Applied means serving — for deployments that opt in.** A server started with `--cluster ` boots from the applied revision (see [Serving from the cluster](#serving-from-the-cluster-the-mode-switch)); it picks up newly applied state on its next restart. Deployments still booting from `omnigraph.yaml` are untouched: for them, applied means recorded in the catalog, nothing more. ### Graph creation A `graph.` create (the graph is declared but no root exists) is executed by apply: the graph is initialized at the derived root ```text /graphs/.omni ``` with the declared schema, before any catalog writes, so queries and policies that depend on the new graph apply **in the same run**. Each create is fenced by a recovery sidecar under `__cluster/recoveries/{ulid}.json`, written before the init and removed only after the state update lands. If apply crashes in between, the next state-mutating command (`apply`, `refresh`, `import`) runs a **recovery sweep** that classifies the survivor by observation: an absent root removes the stale intent; a completed create rolls the cluster state forward (recorded in the state's `recovery_records`); a partial root reports `graph_create_incomplete` (status `error` — remove the root and re-run apply; nothing is auto-deleted); unexpected graph content reports `actual_applied_state_pending` (status `drifted` — run `cluster refresh` and re-plan). While a kept sidecar is pending, that graph's create and its dependents are blocked with `cluster_recovery_pending`. Read-only commands (`status`, `plan`) warn about pending sidecars without acting on them. **Re-creation is convergence.** If a graph root disappears out-of-band, `refresh` records the drift and the next `plan` proposes a create — and apply will execute it, producing an **empty** graph at the root. The data was already lost when the root vanished; the create is visible in the plan (disposition `applied`) before anything runs. ### Schema updates A `schema.` update (the declared schema differs from what state records) is executed by apply via the engine's schema-apply, after graph creates and before catalog writes — so a query change that depends on the new schema applies in the same run. Each schema apply is sidecar-fenced like a create: pre-operation manifest version recorded, post-operation version written back, sidecar retired only after the state update lands; the recovery sweep classifies survivors by schema digest (consistent ledger → retired; completed on the graph → state rolled forward with an audit entry; anything else → `drifted`/`actual_applied_state_pending`, kept). Migrations run with **soft drops only** — a removed property disappears from the current version while prior versions retain the data (reversible until `cleanup`). Data-loss migrations (`allow_data_loss`) are not reachable from cluster apply until the approval-artifact stage. Unsupported migrations (e.g. changing a property's type), engine lock contention, or graphs with user branches fail loudly as `schema_apply_failed` with the engine's message; dependent changes are demoted to `blocked` and graph-moving work stops for the run. `cluster plan` previews schema updates with the engine's real migration plan: each schema change carries a `migration` field (`supported` + typed steps), and the human output prints the steps. If the live graph cannot be opened the preview degrades to the digest diff with a `schema_preview_unavailable` warning. **Drift is converged, not just reported.** A schema changed out-of-band on the live graph shows up as `drifted` after `refresh`, and the next plan proposes migrating it back to the declared schema — apply executes that like any other soft migration. Drift correction is gated by the same rules as any change; nothing about it is hidden (the plan shows the steps, including soft drops of out-of-band fields). **Attribution.** `cluster apply --as ` records the operator identity in recovery sidecars and audit entries and threads it to the engine's schema-apply (so commit attribution and Cedar enforcement — wherever a policy checker is installed — work unchanged). ### Approvals and graph deletion Deleting a graph is the irreversible tier: it requires a recorded human decision. `cluster plan` lists the gate under `approvals_required` (one gate per graph — the graph-level approval carries its schema and queries); `cluster approve graph. --as ` writes a digest-bound artifact to ```text /__cluster/approvals/.json ``` bound to the exact desired config digest and the change's state digest, so **any config or state drift after approving invalidates the artifact** automatically (`approval_stale` warning; it never authorizes a different change). An unapproved delete blocks with `approval_required`. An approved delete executes **last** in the apply run: the graph root is removed recursively, the subtree (graph, schema, its queries) is tombstoned out of the state ledger with a tombstone observation, and the approval is consumed — recorded in the state's `approval_records` in the same state update, and the artifact file rewritten with `consumed_at` (the file is never deleted: the audit fact survives the loss of either store). A failed run consumes nothing; the approval stays valid for the retry. Catalog blobs of the deleted graph's queries stay on disk (GC is a later stage). Crash recovery for deletes: a completed-but-unrecorded delete is rolled forward by the sweep (tombstone + approval consumption + audit entry); an incomplete delete (root still present) is retired with a `graph_delete_incomplete` warning and simply **re-proposed** — prefix removal is idempotent, so the still-approved retry is the repair. Standalone schema deletes are never executed by this stage. They are reported as `deferred` (warning `apply_unsupported_change`), and query/policy changes that depend on them are `blocked` (warning `apply_dependency_blocked`, status `blocked` in state). A partially-applicable plan still exits 0 with warnings; the JSON `converged` field is the automation signal for "state now matches the desired revision". The applied `config_digest` is only recorded when apply fully converges. The `graph.` composite digest is recomputed from state's own schema/query digests after each apply, so applied query changes converge without graph movement. ## Serving from the cluster (the mode switch) ```bash omnigraph-server --cluster ./company-brain --bind 0.0.0.0:8080 ``` `--cluster ` is an **exclusive boot source** (axiom 15): it cannot combine with a graph URI, `--target`, or `--config`, and in this mode `omnigraph.yaml` is never read — not for graphs, not for queries, not for policies. The server serves the **applied revision**: graph roots recorded in `state.json`, stored-query and policy content from the content-addressed catalog at the applied digests (re-verified at boot), and policy bundles wired by their applied `applies_to` bindings — `cluster`-bound bundles become the server-level Cedar engine, graph-bound bundles attach per graph. Un-applied config drift never leaks into serving; `cluster plan` is where drift is visible. Routing is always multi-graph (`/graphs/{id}/...`). Bearer tokens and the bind address stay process-level (flags/env) — they are per-replica facts, not cluster facts. Boot is fail-fast: missing or unreadable state, pending recovery sidecars, missing/tampered catalog blobs, policy entries without binding metadata (pre-binding ledgers — re-run `cluster apply`), an empty graph set, more than one policy bundle binding a single scope (split or merge bundles; stacked scopes are a later stage), unopenable graph roots, and stored queries that no longer type-check all refuse startup with a remedy. A held state lock is *not* an error — boot reads the atomically-replaced state file without locking. Serving is static per process: the server reads the applied revision once at startup, so picking up newly applied state means restarting it. Stored queries are all listed in `GET /queries` in cluster mode (the cluster registry has no expose flag; exposure becomes a policy decision in a later phase). ## Status `cluster status` reads the same local JSON state ledger and prints what the ledger says is deployed. It does not validate referenced schema/query/policy files and does not inspect live graphs. Missing `state.json` succeeds with a warning; invalid state JSON or an unsupported state version fails. If a lock is present, status reports its id, operation, creation time, pid, and age. Status also verifies the catalog payloads read-only: every query/policy digest recorded in state is checked against its content-addressed blob under `__cluster/resources/` (existence and full digest re-hash). A missing or mismatched blob is reported as a warning (`catalog_payload_missing` / `catalog_payload_mismatch`); an unreadable blob is an error (`catalog_payload_read_error`) because an unverifiable catalog must not report healthy. Status never writes state — persisting the `drifted` condition is refresh's job. The check runs without the state lock, so it is a point-in-time report. ## Refresh And Import `cluster refresh` updates an existing `state.json` from actual observations. `cluster import` creates the first `state.json` when the ledger is missing. Both commands open declared graphs read-only at: ```text /graphs/.omni ``` They observe only branch `main`, recording graph existence, manifest version, live schema digest, desired schema digest, and schema-match status under `observations["graph."]`. Missing graph roots are recorded as drift and remove the graph/schema digests from state so a later `plan` proposes creates. Invalid graph roots are recorded as errors; `refresh` persists the error observation and exits non-zero, while `import` exits non-zero without creating initial state. Refresh also verifies the catalog payloads of every query/policy digest recorded in state (the same check `cluster status` reports read-only), and closes the loop: - a **missing** or **digest-mismatched** blob marks the resource `drifted` (condition `payload_missing` / `payload_mismatch`) and removes its digest from state — so the next `cluster plan` proposes a create and the next `cluster apply` republishes the blob (the self-heal loop, mirroring how a missing graph root is handled); - an **unreadable** blob (IO error other than not-found) keeps the digest, marks the resource `error` (condition `payload_read_error`), and exits non-zero — transient IO must not trigger a spurious republish. Upgrade note: a state ledger written before catalog publish existed records query/policy digests with no blobs on disk; the first refresh after upgrading flags them all `payload_missing`, and a single `cluster apply` republishes everything and converges. Refresh/import do not observe query or policy resources beyond their catalog payloads yet. Existing query and policy state digests are preserved on refresh (unless their payload drifted, above) and are not invented on import. ## Force Unlock `cluster force-unlock ` removes `/__cluster/lock.json` only when the file exists, is valid version-1 lock JSON, and its `lock_id` exactly matches the argument. A wrong id, missing lock, invalid lock JSON, or unsupported lock version exits non-zero and leaves the file untouched. This is manual recovery for abandoned local locks. OmniGraph does not perform PID-liveness checks, TTL expiry, stale-lock breaking, or automatic unlock in Stage 2C.