Cluster Config
Status: Phase 5 — cluster-booted serving (omnigraph-server --cluster).
New to the cluster tooling? Start with the operator how-to guide, cluster.md — this document is the reference.
Cluster config is the future control-plane configuration surface for a whole
OmniGraph deployment. In this stage, OmniGraph can validate a local
cluster.yaml folder, produce a deterministic read-only plan, inspect the
local JSON state ledger, explicitly refresh/import graph observations into
that ledger, manually remove a held local state lock by exact lock id, and
apply the executable subset of the plan — stored-query and policy-bundle
catalog writes, graph creation (a declared graph that does not exist yet
is initialized by apply at the derived root), schema updates (soft drops
only), and — behind an explicit, digest-bound approval — graph
deletion. It does not perform data-loss schema migrations or start servers.
Applied state is served only by a server started with --cluster; any other
server still boots from omnigraph.yaml.
Commands
omnigraph cluster validate --config ./company-brain
omnigraph cluster plan --config ./company-brain --json
omnigraph cluster apply --config ./company-brain --json
omnigraph cluster approve graph.<id> --config ./company-brain --as <actor>
omnigraph cluster status --config ./company-brain --json
omnigraph cluster refresh --config ./company-brain --json
omnigraph cluster import --config ./company-brain --json
omnigraph cluster force-unlock <LOCK_ID> --config ./company-brain --json
--config points at a directory, not a file. The directory must contain
cluster.yaml. When omitted, it defaults to the current directory.
Relationship to omnigraph.yaml
cluster.yaml does not replace omnigraph.yaml, and the two never describe
the same fact. omnigraph.yaml is the permanent per-operator layer (CLI
defaults, the operator's identity and credential references, graph targets
for data-plane commands); cluster.yaml is the shared desired state of a
whole deployment, read only by the cluster commands via --config.
The exact contract:
- Cluster commands read omnigraph.yaml for exactly one thing: the cli.actor default used by apply/approve when --as is omitted, because operator identity is a per-operator fact. With --as present, no config is read at all. Nothing else (its graph set, targets, bind, queries, policies) ever influences a cluster command; a malformed omnigraph.yaml breaks only the no-flag actor lookup, loudly.
- A --cluster server reads omnigraph.yaml for nothing; not even the implicit current-directory search runs (mode-inference rule 0). Boot from cluster state XOR omnigraph.yaml, never a merge.
- The other direction is ergonomics, not coupling: a per-operator omnigraph.yaml may point graphs.<name>.uri at a cluster's derived root (./company-brain/graphs/knowledge.omni) so data-plane commands can use --target <name>. That is an ordinary local path with no special handling.
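The first rule can be sketched as a small lookup. This is a minimal illustration, not the CLI's actual code: resolve_actor and load_operator_config are hypothetical names, the config is modeled as a plain dict, and the error raised when neither source supplies an actor is an assumption.

```python
from typing import Callable, Optional


def resolve_actor(as_flag: Optional[str],
                  load_operator_config: Callable[[], dict]) -> str:
    """Pick the actor for apply/approve (hypothetical helper).

    With --as present the per-operator config is never loaded. Without it,
    the only value consulted is cli.actor; everything else is ignored.
    """
    if as_flag is not None:
        return as_flag  # no config read at all
    config = load_operator_config()  # a malformed file would raise here, loudly
    actor = (config.get("cli") or {}).get("actor")
    if not actor:
        # Assumption: the source does not say what happens in this case.
        raise ValueError("no --as given and cli.actor is not set")
    return actor
```

Passing the loader as a callable makes the "no config is read at all" property easy to check: with --as supplied, the loader is never invoked.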
Supported cluster.yaml
Stage 3A accepts only this resource subset:
version: 1
metadata:
  name: company-brain
state:
  backend: cluster
  lock: true
graphs:
  knowledge:
    schema: ./knowledge.pg
    queries:
      find_experts:
        file: ./knowledge.gq
policies:
  base:
    file: ./base.policy.yaml
    applies_to: [knowledge]
metadata.name is a display label. state.backend may be omitted or set to
cluster; external state backends are reserved for a later stage. state.lock
defaults to true. When enabled, cluster plan, cluster apply,
cluster refresh, and cluster import briefly acquire
<config-dir>/__cluster/lock.json, then remove it before returning. cluster status never acquires the lock; it only reports
whether one is present. cluster force-unlock is the only lock-removal command;
it requires the exact lock id and should be run only after confirming no cluster
operation is active.
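The acquire-then-remove lifecycle of the lock file might look like the sketch below. It assumes the lock is taken by exclusive file creation and that the file carries version, lock_id, operation, and pid fields; neither the mechanism nor the exact field names are confirmed by this document, and state_lock is a hypothetical name.

```python
import json
import os
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def state_lock(config_dir: Path, operation: str):
    """Hold <config-dir>/__cluster/lock.json for the duration of a block."""
    lock_path = config_dir / "__cluster" / "lock.json"
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    lock_id = uuid.uuid4().hex
    # O_EXCL makes creation fail with FileExistsError if a lock is already held.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    with os.fdopen(fd, "w") as handle:
        json.dump({"version": 1, "lock_id": lock_id,
                   "operation": operation, "pid": os.getpid()}, handle)
    try:
        yield lock_id
    finally:
        lock_path.unlink()  # removed before the command returns
```

A process that dies inside the block leaves the file behind, which is the situation cluster force-unlock exists for.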
Validation
cluster validate checks:
- cluster.yaml syntax and supported fields
- duplicate YAML keys
- schema, query, and policy file existence
- schema parsing and catalog construction
- stored-query parsing and query-name matching
- stored-query type-checking against the desired schema
- policy applies_to graph references
Fields reserved for later phases, such as pipelines, embeddings, ui,
aliases, and bindings, fail with a typed diagnostic instead of being
silently ignored.
Planning
cluster plan first performs validation, then reads local JSON state from:
<config-dir>/__cluster/state.json
If the file is missing, the state is treated as empty and every desired resource is planned as a create. If present, the file must use this shape:
{
  "version": 1,
  "state_revision": 0,
  "applied_revision": {
    "config_digest": "...",
    "resources": {
      "graph.knowledge": { "digest": "..." },
      "schema.knowledge": { "digest": "..." },
      "query.knowledge.find_experts": { "digest": "..." },
      "policy.base": {
        "digest": "...",
        "applies_to": ["cluster", "graph.knowledge"]
      }
    }
  },
  "resource_statuses": {
    "graph.knowledge": {
      "status": "applied",
      "conditions": [],
      "message": "optional status detail"
    }
  },
  "approval_records": {},
  "recovery_records": {},
  "observations": {}
}
state_revision, resource_statuses, approval_records, recovery_records,
and observations are optional so older Stage 1 state fixtures keep working.
Missing state_revision is treated as 0. Resource status values are
pending, planned, applying, applied, drifted, blocked, or error.
Plan output compares desired resource digests against state resource digests
and reports create, update, and delete changes. It also reports the state
CAS (sha256:<digest>) and state revision. state_observations.locked means an
existing lock file was observed, along with its metadata (lock_id,
lock_operation, lock_created_at, lock_pid, lock_age_seconds); a
successful plan instead reports lock_acquired: true and an
acquired_lock_id, then releases the lock before returning. The command never
writes state.json and does not scan live graphs. Use explicit
cluster refresh / cluster import when the state ledger should be updated
from live observations. Live drift scans during plan are later-stage work.
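The digest comparison described above reduces to a three-way set difference. The sketch below is illustrative only: diff_resources is a hypothetical name, and digests are treated as opaque strings.

```python
from typing import Dict, List, Tuple


def diff_resources(desired: Dict[str, str],
                   applied: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare desired digests with the digests recorded in state.

    Both arguments map a resource address (for example
    "query.knowledge.find_experts") to its digest. Returns (action, address)
    pairs sorted by address so the output is deterministic.
    """
    changes = []
    for address in sorted(set(desired) | set(applied)):
        if address not in applied:
            changes.append(("create", address))
        elif address not in desired:
            changes.append(("delete", address))
        elif desired[address] != applied[address]:
            changes.append(("update", address))
    return changes
```

With an empty applied map, which is how a missing state.json is treated, every desired resource comes back as a create.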
Policy entries additionally record their applied applies_to bindings as
normalized typed refs, so the state ledger alone carries what a --cluster
server needs to wire policy bundles at boot. A change to applies_to alone (the policy file
digest unchanged) appears in the plan as an Update marked binding_change
(human output: [bindings]), applies like any catalog change, and counts
toward convergence; ledgers written before this field existed are backfilled
by the next apply.
Each plan change carries a disposition field — an honest preview of what
cluster apply will do with it in this stage: applied (executes), derived
(a graph.<id> composite-digest update that converges automatically once its
query digests land), deferred (graph/schema change, later phase), or
blocked (query/policy gated by an unapplied or missing dependency, with the
condition in reason).
Apply
cluster apply executes the executable subset of the plan — stored-query and
policy-bundle changes, graph creates, and schema updates. There is no confirm
flag: cluster plan is the preview,
and apply recomputes the same diff under the state lock before executing, so a
stale preview can never be applied. Apply requires an existing state.json
(state_missing directs you to cluster import first).
For each applied create/update, the resource payload is written content-addressed into the local catalog:
<config-dir>/__cluster/resources/query/<graph>/<name>/<digest>.gq
<config-dir>/__cluster/resources/policy/<name>/<digest>.yaml
Extensions are fixed per kind regardless of the source file's name. Payloads
are written before the state update because state.json is the publish point:
if the final CAS-checked state write fails, no success is reported and the
digest-named blobs already written are inert — re-running apply is the repair.
Deletes remove the resource from state; their old payload blobs stay on disk
(garbage collection is a later stage). Re-running a converged apply is a no-op:
no state write, no revision change (state_written: false).
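The ordering argument (blobs first, state last) can be sketched as two steps. This is a simplified model under stated assumptions: the digest is taken to be the SHA-256 hex of the raw payload bytes, the CAS value is taken to be sha256: plus the hash of the state file's bytes, and publish_query and write_state_cas are hypothetical names.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def publish_query(config_dir: Path, graph: str, name: str, payload: bytes) -> str:
    """Write a query payload under its digest and return the digest."""
    digest = hashlib.sha256(payload).hexdigest()  # assumed digest scheme
    blob = (config_dir / "__cluster" / "resources" / "query"
            / graph / name / f"{digest}.gq")
    blob.parent.mkdir(parents=True, exist_ok=True)
    blob.write_bytes(payload)  # same content always lands at the same path
    return digest


def write_state_cas(state_path: Path, expected_cas: str, new_state: dict) -> bool:
    """Replace state.json only if it still hashes to expected_cas."""
    current = state_path.read_bytes()
    if "sha256:" + hashlib.sha256(current).hexdigest() != expected_cas:
        return False  # state moved underneath us; blobs already written stay inert
    fd, tmp = tempfile.mkstemp(dir=state_path.parent)
    with os.fdopen(fd, "w") as handle:
        json.dump(new_state, handle)
    os.replace(tmp, state_path)  # atomic replace on the same filesystem
    return True
```

Because a blob's path is derived from its content, a failed state write leaves nothing to clean up: the next apply writes the same bytes to the same path and retries the publish.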
Applied means serving — for deployments that opt in. A server started
with --cluster <dir> boots from the applied revision (see
Serving from the cluster); it
picks up newly applied state on its next restart. Deployments still booting
from omnigraph.yaml are untouched: for them, applied means recorded in the
catalog, nothing more.
Graph creation
A graph.<id> create (the graph is declared but no root exists) is executed
by apply: the graph is initialized at the derived root
<config-dir>/graphs/<graph-id>.omni
with the declared schema, before any catalog writes, so queries and policies
that depend on the new graph apply in the same run. Each create is fenced
by a recovery sidecar under __cluster/recoveries/{ulid}.json, written before
the init and removed only after the state update lands. If apply crashes in
between, the next state-mutating command (apply, refresh, import) runs a
recovery sweep that classifies the survivor by observation: an absent root
removes the stale intent; a completed create rolls the cluster state forward
(recorded in the state's recovery_records); a partial root reports
graph_create_incomplete (status error — remove the root and re-run apply;
nothing is auto-deleted); unexpected graph content reports
actual_applied_state_pending (status drifted — run cluster refresh and
re-plan). While a kept sidecar is pending, that graph's create and its
dependents are blocked with cluster_recovery_pending. Read-only commands
(status, plan) warn about pending sidecars without acting on them.
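The four sweep outcomes for a surviving create sidecar can be written as a lookup. This is a sketch: the enum and tuple names are hypothetical, and it assumes the sidecar is retired once a completed create has been rolled forward.

```python
from enum import Enum
from typing import NamedTuple, Optional


class RootObservation(Enum):
    ABSENT = "absent"          # no graph root on disk
    COMPLETE = "complete"      # the create finished before the crash
    PARTIAL = "partial"        # a half-initialized root
    UNEXPECTED = "unexpected"  # a root holding unexpected graph content


class SweepOutcome(NamedTuple):
    keep_sidecar: bool       # a kept sidecar blocks the create and its dependents
    roll_forward: bool       # record the create in state (recovery_records)
    status: Optional[str]    # resource status to report, if any
    code: Optional[str]      # diagnostic code, if any


def classify_create_sidecar(observed: RootObservation) -> SweepOutcome:
    if observed is RootObservation.ABSENT:
        return SweepOutcome(False, False, None, None)  # stale intent removed
    if observed is RootObservation.COMPLETE:
        return SweepOutcome(False, True, None, None)
    if observed is RootObservation.PARTIAL:
        return SweepOutcome(True, False, "error", "graph_create_incomplete")
    return SweepOutcome(True, False, "drifted", "actual_applied_state_pending")
```

Nothing in this table deletes data: the two kept cases only report, leaving removal of a partial root to the operator.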
Re-creation is convergence. If a graph root disappears out-of-band,
refresh records the drift and the next plan proposes a create — and apply
will execute it, producing an empty graph at the root. The data was
already lost when the root vanished; the create is visible in the plan
(disposition applied) before anything runs.
Schema updates
A schema.<id> update (the declared schema differs from what state records)
is executed by apply via the engine's schema-apply, after graph creates and
before catalog writes — so a query change that depends on the new schema
applies in the same run. Each schema apply is sidecar-fenced like a create:
pre-operation manifest version recorded, post-operation version written back,
sidecar retired only after the state update lands; the recovery sweep
classifies survivors by schema digest (consistent ledger → retired; completed
on the graph → state rolled forward with an audit entry; anything else →
drifted/actual_applied_state_pending, kept).
Migrations run with soft drops only — a removed property disappears from
the current version while prior versions retain the data (reversible until
cleanup). Data-loss migrations (allow_data_loss) are not reachable from
cluster apply until the approval-artifact stage. Unsupported migrations
(e.g. changing a property's type), engine lock contention, or graphs with
user branches fail loudly as schema_apply_failed with the engine's message;
dependent changes are demoted to blocked and graph-moving work stops for
the run.
cluster plan previews schema updates with the engine's real migration plan:
each schema change carries a migration field (supported + typed steps),
and the human output prints the steps. If the live graph cannot be opened the
preview degrades to the digest diff with a schema_preview_unavailable
warning.
Drift is converged, not just reported. A schema changed out-of-band on
the live graph shows up as drifted after refresh, and the next plan
proposes migrating it back to the declared schema — apply executes that like
any other soft migration. Drift correction is gated by the same rules as any
change; nothing about it is hidden (the plan shows the steps, including soft
drops of out-of-band fields).
Attribution. cluster apply --as <actor> records the operator identity
in recovery sidecars and audit entries and threads it to the engine's
schema-apply (so commit attribution and Cedar enforcement — wherever a policy
checker is installed — work unchanged).
Approvals and graph deletion
Deleting a graph is the irreversible tier: it requires a recorded human
decision. cluster plan lists the gate under approvals_required (one gate
per graph — the graph-level approval carries its schema and queries);
cluster approve graph.<id> --as <actor> writes a digest-bound artifact to
<config-dir>/__cluster/approvals/<approval-id>.json
bound to the exact desired config digest and the change's state digest, so
any config or state drift after approving invalidates the artifact
automatically (approval_stale warning; it never authorizes a different
change). An unapproved delete blocks with approval_required.
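Digest binding means an approval authorizes one exact change and nothing else. The sketch below models that check; the Approval fields and the gate_delete helper are hypothetical, and treating a consumed approval as no approval is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Approval:
    resource: str        # e.g. "graph.knowledge"
    config_digest: str   # desired config digest at approval time
    state_digest: str    # the change's state digest at approval time
    consumed: bool = False


def gate_delete(approval: Optional[Approval], resource: str,
                config_digest: str, state_digest: str) -> Tuple[bool, Optional[str]]:
    """Return (authorized, diagnostic code) for a planned graph delete."""
    if approval is None or approval.resource != resource or approval.consumed:
        return (False, "approval_required")
    if (approval.config_digest != config_digest
            or approval.state_digest != state_digest):
        return (False, "approval_stale")  # drift after approving invalidates it
    return (True, None)
```

Comparing both digests is what makes the approval self-invalidating: editing cluster.yaml or changing state after approving yields a different digest pair, so the old artifact can never authorize the new change.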
An approved delete executes last in the apply run: the graph root is
removed recursively, the subtree (graph, schema, its queries) is tombstoned
out of the state ledger with a tombstone observation, and the approval is
consumed — recorded in the state's approval_records in the same state
update, and the artifact file rewritten with consumed_at (the file is never
deleted: the audit fact survives the loss of either store). A failed run
consumes nothing; the approval stays valid for the retry. Catalog blobs of
the deleted graph's queries stay on disk (GC is a later stage).
Crash recovery for deletes: a completed-but-unrecorded delete is rolled
forward by the sweep (tombstone + approval consumption + audit entry); an
incomplete delete (root still present) is retired with a
graph_delete_incomplete warning and simply re-proposed — prefix removal
is idempotent, so the still-approved retry is the repair.
Standalone schema deletes are never executed by this stage. They are
reported as deferred (warning apply_unsupported_change), and query/policy
changes that depend on them are blocked (warning apply_dependency_blocked, status
blocked in state). A partially-applicable plan still exits 0 with warnings;
the JSON converged field is the automation signal for "state now matches the
desired revision". The applied config_digest is only recorded when apply
fully converges. The graph.<id> composite digest is recomputed from state's
own schema/query digests after each apply, so applied query changes converge
without graph movement.
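Because a partially applicable plan exits 0, automation should key on the converged field rather than the exit code. A minimal sketch, assuming the --json report is a single JSON object that carries converged at the top level (is_converged is a hypothetical helper):

```python
import json


def is_converged(apply_json: str) -> bool:
    """True only when the apply report says state matches the desired revision."""
    report = json.loads(apply_json)
    # A missing field is treated as not converged (a conservative assumption).
    return report.get("converged") is True
```

A deploy script would feed this the output of omnigraph cluster apply --config ./company-brain --json and fail the pipeline when it returns False, even though the command itself exited 0.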
Serving from the cluster (the mode switch)
omnigraph-server --cluster ./company-brain --bind 0.0.0.0:8080
--cluster <dir> is an exclusive boot source (axiom 15): it cannot
combine with a graph URI, --target, or --config, and in this mode
omnigraph.yaml is never read — not for graphs, not for queries, not for
policies. The server serves the applied revision: graph roots recorded in
state.json, stored-query and policy content from the content-addressed
catalog at the applied digests (re-verified at boot), and policy bundles
wired by their applied applies_to bindings — cluster-bound bundles become
the server-level Cedar engine, graph-bound bundles attach per graph.
Un-applied config drift never leaks into serving; cluster plan is where
drift is visible. Routing is always multi-graph (/graphs/{id}/...). Bearer
tokens and the bind address stay process-level (flags/env) — they are
per-replica facts, not cluster facts.
Boot is fail-fast: missing or unreadable state, pending recovery sidecars,
missing/tampered catalog blobs, policy entries without binding metadata
(pre-binding ledgers — re-run cluster apply), an empty graph set, more than
one policy bundle binding a single scope (split or merge bundles; stacked
scopes are a later stage), unopenable graph roots, and stored queries that no
longer type-check all refuse startup with a remedy. A held state lock is
not an error — boot reads the atomically-replaced state file without
locking.
Serving is static per process: the server reads the applied revision once at
startup, so picking up newly applied state means restarting it. Stored
queries are all listed in GET /queries in cluster mode (the cluster
registry has no expose flag; exposure becomes a policy decision in a later
phase).
Status
cluster status reads the same local JSON state ledger and prints what the
ledger says is deployed. It does not validate referenced schema/query/policy
files and does not inspect live graphs. Missing state.json succeeds with a
warning; invalid state JSON or an unsupported state version fails. If a lock is
present, status reports its id, operation, creation time, pid, and age.
Status also verifies the catalog payloads read-only: every query/policy digest
recorded in state is checked against its content-addressed blob under
__cluster/resources/ (existence and full digest re-hash). A missing or
mismatched blob is reported as a warning (catalog_payload_missing /
catalog_payload_mismatch); an unreadable blob is an error
(catalog_payload_read_error) because an unverifiable catalog must not report
healthy. Status never writes state — persisting the drifted condition is
refresh's job. The check runs without the state lock, so it is a point-in-time
report.
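The read-only catalog check can be sketched per blob as follows. It assumes the recorded digest is the SHA-256 hex of the blob's bytes, which this document does not confirm; verify_blob is a hypothetical name, while the returned codes are the ones listed above.

```python
import hashlib
from pathlib import Path
from typing import Optional


def verify_blob(blob: Path, expected_digest: str) -> Optional[str]:
    """Return None when the blob is healthy, else a diagnostic code."""
    try:
        data = blob.read_bytes()
    except FileNotFoundError:
        return "catalog_payload_missing"      # warning
    except OSError:
        return "catalog_payload_read_error"   # error: catalog is unverifiable
    if hashlib.sha256(data).hexdigest() != expected_digest:
        return "catalog_payload_mismatch"     # warning
    return None
```

The not-found case is caught before the general OSError so that a missing blob, which apply can republish, stays distinct from an IO failure.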
Refresh And Import
cluster refresh updates an existing state.json from actual observations.
cluster import creates the first state.json when the ledger is missing.
Both commands open declared graphs read-only at:
<config-dir>/graphs/<graph-id>.omni
They observe only branch main, recording graph existence, manifest version,
live schema digest, desired schema digest, and schema-match status under
observations["graph.<id>"]. Missing graph roots are recorded as drift and
remove the graph/schema digests from state so a later plan proposes creates.
Invalid graph roots are recorded as errors; refresh persists the error
observation and exits non-zero, while import exits non-zero without creating
initial state.
Refresh also verifies the catalog payloads of every query/policy digest
recorded in state (the same check cluster status reports read-only), and
closes the loop:
- a missing or digest-mismatched blob marks the resource drifted (condition payload_missing / payload_mismatch) and removes its digest from state, so the next cluster plan proposes a create and the next cluster apply republishes the blob (the self-heal loop, mirroring how a missing graph root is handled);
- an unreadable blob (IO error other than not-found) keeps the digest, marks the resource error (condition payload_read_error), and exits non-zero, because transient IO must not trigger a spurious republish.
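The two rules above amount to a small state transformation. In this sketch, reconcile_payloads is a hypothetical name and the findings map (resource address to "missing", "mismatch", or "read_error") stands in for the result of the catalog check.

```python
from typing import Dict, Tuple


def reconcile_payloads(
    digests: Dict[str, str],
    findings: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, Tuple[str, str]], bool]:
    """Apply catalog findings to the recorded digests.

    Returns (digests to keep, per-resource (status, condition), exit_nonzero).
    """
    kept = dict(digests)
    statuses: Dict[str, Tuple[str, str]] = {}
    exit_nonzero = False
    for address, finding in findings.items():
        if finding in ("missing", "mismatch"):
            kept.pop(address, None)  # the next plan will propose a create
            statuses[address] = ("drifted", f"payload_{finding}")
        elif finding == "read_error":
            # Digest is kept: a transient IO error must not cause a republish.
            statuses[address] = ("error", "payload_read_error")
            exit_nonzero = True
    return kept, statuses, exit_nonzero
```

Dropping the digest is what closes the loop: with no digest in state, the ordinary plan diff sees the resource as absent and proposes the create without any special repair path.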
Upgrade note: a state ledger written before catalog publish existed records
query/policy digests with no blobs on disk; the first refresh after upgrading
flags them all payload_missing, and a single cluster apply republishes
everything and converges.
Refresh/import do not observe query or policy resources beyond their catalog payloads yet. Existing query and policy state digests are preserved on refresh (unless their payload drifted, above) and are not invented on import.
Force Unlock
cluster force-unlock <LOCK_ID> removes <config-dir>/__cluster/lock.json only
when the file exists, is valid version-1 lock JSON, and its lock_id exactly
matches the argument. A wrong id, missing lock, invalid lock JSON, or unsupported
lock version exits non-zero and leaves the file untouched.
This is manual recovery for abandoned local locks. OmniGraph does not perform PID-liveness checks, TTL expiry, stale-lock breaking, or automatic unlock in Stage 2C.
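The removal rule is strict equality on the lock id, with every failure leaving the file in place. A minimal sketch, assuming the lock file stores its format version under a version key (the lock_id field is named above; force_unlock as a Python function is hypothetical):

```python
import json
from pathlib import Path


def force_unlock(config_dir: Path, lock_id: str) -> bool:
    """Remove lock.json only on an exact id match; return True if removed."""
    lock_path = config_dir / "__cluster" / "lock.json"
    try:
        lock = json.loads(lock_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return False  # missing lock or invalid lock JSON
    if not isinstance(lock, dict) or lock.get("version") != 1:
        return False  # unsupported lock version
    if lock.get("lock_id") != lock_id:
        return False  # wrong id: the file stays untouched
    lock_path.unlink()
    return True
```

Each False here corresponds to a non-zero exit in the CLI.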