RFC: Server Boots from Cluster State — Phase 5 of the Cluster Control Plane
Status: Proposed
Date: 2026-06-10
Builds on: Phase 4 complete (rfc-004-cluster-graph-schema-apply.md, Landed): cluster apply converges graphs, schemas, stored queries, and policies into the cluster catalog. Normative context: cluster-config-specs.md (the migration model's "window 2"), cluster-axioms.md (axiom 15), cluster-config-implementation-spec.md (Phase 5 rollout, Compatibility Stance #7–#9, exit criterion 7).
Target release: unversioned (phased — see Sequencing).
Summary
Give omnigraph-server a second boot source: omnigraph-server --cluster <dir> reads its graph set, stored queries, and Cedar policies from the cluster catalog — state.json's applied revision plus the content-addressed blobs under __cluster/resources/ — instead of omnigraph.yaml. This is the moment "applied" finally means "serving": the standing caveat in every cluster doc since Stage 3A ("the server still boots from omnigraph.yaml") retires for deployments that flip the switch.
Three commitments:
- An exclusive mode switch, never a merge (axiom 15, Compatibility Stance #7). --cluster <dir> is mutually exclusive with the positional URI, --target, and --config. In cluster mode, omnigraph.yaml is not read at all: not for graphs, not for queries, not for policies. There is no precedence, no key-level aliasing, no fallback read. A deployment serves from one source.
- The server serves the applied revision, not the desired config. What's live is what cluster apply converged: graph roots recorded in state, query/policy content at the applied digests from the content-addressed catalog. Un-applied config drift never leaks into serving; the serving surface and the ledger cannot disagree (axiom 5 extended to the data path).
- The state ledger becomes serving-sufficient. Today one fact needed to serve is missing from state: a policy's applies_to bindings live only in cluster.yaml. A prerequisite slice (5A) records binding metadata into the applied revision at apply time, so a booting server reads state + blobs and nothing else. Without this, "boot from state" would silently become "boot from state and config", which is the merged read axiom 15 forbids.
Motivation
Phase 4 closed the convergence loop but left it inert: an operator can declare, plan, approve, and apply an entire deployment, and the running server ignores all of it. The Sarah/Bob test still fails at the last step — Sarah's applied change is visible in cluster status but Bob's clients hit a server still wired to a hand-maintained omnigraph.yaml. Phase 5 makes the catalog the serving source, which is also the precondition for Phase 6 (policy-owned query exposure must filter a catalog the server actually reads).
Non-Goals
- Runtime reconciliation / hot reload. Cluster-mode boot is static, exactly like today's boot: the server reads the applied revision once at startup; picking up a newer applied state means restarting the process. The registry's runtime-mutation seam (the test-only insert() + mutate Mutex in registry.rs) stays future-proofing for a later watch-and-reload slice, not this RFC.
- Policy-owned query exposure (Phase 6), although this RFC defines the bridge it sunsets (§D5).
- Remote cluster roots. --cluster <dir> is a local directory in this phase, same as the cluster CLI commands; S3-hosted cluster roots arrive with external state backends.
- Retiring omnigraph.yaml server boot. It remains a fully supported mode indefinitely (Compatibility Stance #8: the file's job shrinks; the server-role keys become inert only for deployments that switch).
- New management endpoints (/cluster/status etc.), noted as future work; this RFC changes the boot source, not the HTTP surface (beyond OpenAPI regen if anything shifts).
Background (verified against main)
- Server boot today (omnigraph-server/src/main.rs, lib.rs:891-1029): load_server_settings applies a four-rule mode inference (positional URI / --target / server.graph → Single; --config + graphs: → Multi), builds ServerConfigMode::{Single,Multi} with per-graph GraphStartupConfig {graph_id, uri, policy_file, queries}, loads QueryRegistry from .gq files at settings time (identity-checked), type-checks queries at engine open (validate_and_attach), loads Cedar via PolicyEngine::load_graph / load_server, installs it with with_policy, and assembles GraphRegistry::from_handles (startup-only; lock-free ArcSwap reads). Bind address and bearer tokens come from flags/env, not from graph config. No reload machinery exists.
- The catalog today (omnigraph-cluster): state.json records applied_revision.resources (address → digest) for graph.*, schema.*, query.<graph>.<name>, policy.<name>, plus statuses, observations (incl. tombstones), approval and recovery records. Query/policy content lives content-addressed at __cluster/resources/query/<graph>/<name>/<digest>.gq and policy/<name>/<digest>.yaml. Graph roots are derived: <dir>/graphs/<id>.omni.
- The gap: state records a policy's digest only; applies_to (cluster vs graph refs) lives in cluster.yaml. Queries are fine: their graph binding is encoded in the address itself.
Design
D1. The mode switch
New server flag: omnigraph-server --cluster <dir> (the directory containing cluster.yaml, __cluster/, and graphs/). Mutually exclusive — a hard startup error, not a precedence rule — with the positional URI, --target, and --config. --bind, --unauthenticated, and the bearer-token env vars keep working identically: listen address and credentials are process-operational facts, not cluster facts (they differ per replica/host and never belonged to the shared catalog; if a serve: section ever joins cluster.yaml, that's a separate proposal).
Mode inference gains rule 0: --cluster <dir> → Cluster mode, which is always multi-graph routing (/graphs/{graph_id}/...), even for a single declared graph. No flat-route legacy surface in cluster mode — it's a new mode with no compatibility debt to carry.
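Rule 0 and the mutual-exclusion error can be sketched as a small pure function. This is a minimal sketch, not the server's actual CLI handling: BootFlags, BootMode, and infer_boot_mode are hypothetical names, and the existing four-rule inference is collapsed into a single Legacy variant.

```rust
use std::path::PathBuf;

/// Hypothetical, simplified view of the boot-source flags (not the real CLI struct).
#[derive(Debug, Default)]
pub struct BootFlags {
    pub cluster: Option<PathBuf>, // --cluster <dir>
    pub uri: Option<String>,      // positional URI
    pub target: Option<String>,   // --target
    pub config: Option<PathBuf>,  // --config
}

#[derive(Debug, PartialEq)]
pub enum BootMode {
    /// Cluster mode: always multi-graph routing, even for one declared graph.
    Cluster(PathBuf),
    /// Stand-in for the existing Single/Multi inference, elided in this sketch.
    Legacy,
}

/// Rule 0: `--cluster` selects cluster mode and rejects every legacy source.
/// A conflict is a hard error; no flag takes precedence over another.
pub fn infer_boot_mode(flags: &BootFlags) -> Result<BootMode, String> {
    let dir = match &flags.cluster {
        Some(dir) => dir,
        None => return Ok(BootMode::Legacy),
    };
    let mut conflicts = Vec::new();
    if flags.uri.is_some() {
        conflicts.push("positional URI");
    }
    if flags.target.is_some() {
        conflicts.push("--target");
    }
    if flags.config.is_some() {
        conflicts.push("--config");
    }
    if conflicts.is_empty() {
        Ok(BootMode::Cluster(dir.clone()))
    } else {
        Err(format!(
            "--cluster is mutually exclusive with: {}",
            conflicts.join(", ")
        ))
    }
}
```

Listing every conflicting flag in one message, rather than stopping at the first, lets an operator fix the invocation in one pass.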
D2. What the server reads (the applied revision, and only it)
load_server_settings grows a cluster branch that reads, in order:
- __cluster/state.json: missing state is a boot error ("run cluster import + cluster apply first"). Pending recovery sidecars under __cluster/recoveries/ are also a boot error (cluster_recovery_pending): a server must not start serving a ledger that a sweep is about to rewrite.
- Graph set = state's graph.<id> resources (tombstoned graphs are absent by construction). Each graph's URI is the derived root <dir>/graphs/<id>.omni. A recorded graph whose root does not open is a boot error, the same fail-fast posture as today's bad URI.
- Stored queries = state's query.<graph>.<name> entries, content loaded from the catalog blob at the recorded digest. Blob-missing or digest-mismatched is a boot error (the catalog verification semantics from Stage 3B, applied at boot). Queries type-check at engine open exactly as today (validate_and_attach, unchanged).
- Policies = state's policy.<name> entries, content from catalog blobs, bindings from the applied metadata of D3: bundles bound to cluster load as the server-level Cedar engine (PolicyEngine::load_server); bundles bound to graphs load per-graph (PolicyEngine::load_graph) and install via with_policy. This is the existing two-gate structure, unchanged.
- cluster.yaml is parsed only to validate that the directory is a cluster root, and for nothing else: explicitly not for resource content. A divergence between desired config and applied state is served as applied, visible via cluster plan.
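The address-to-content mapping behind these reads can be sketched as follows. The path layout comes from the Background section; everything else is an assumption made for illustration: Resource, parse_address, and content_path are hypothetical names, graph and query names are assumed to contain no dots, and the part after schema. is kept opaque because its shape is not spelled out here.

```rust
use std::path::{Path, PathBuf};

/// A resource address from `applied_revision.resources`, parsed by kind.
#[derive(Debug, PartialEq)]
pub enum Resource {
    Graph { id: String },
    /// The remainder after `schema.` is kept opaque in this sketch.
    Schema { key: String },
    Query { graph: String, name: String },
    Policy { name: String },
}

/// Parse `graph.<id>`, `schema.*`, `query.<graph>.<name>`, or `policy.<name>`.
/// Assumes names contain no dots; unknown kinds yield `None`.
pub fn parse_address(addr: &str) -> Option<Resource> {
    let (kind, rest) = addr.split_once('.')?;
    if rest.is_empty() {
        return None;
    }
    match kind {
        "graph" => Some(Resource::Graph { id: rest.to_string() }),
        "schema" => Some(Resource::Schema { key: rest.to_string() }),
        "policy" => Some(Resource::Policy { name: rest.to_string() }),
        "query" => {
            let (graph, name) = rest.split_once('.')?;
            if graph.is_empty() || name.is_empty() {
                return None;
            }
            Some(Resource::Query {
                graph: graph.to_string(),
                name: name.to_string(),
            })
        }
        _ => None,
    }
}

/// Where a booting server would look for a resource's content.
/// Graph roots are derived from the id; query and policy blobs are
/// content-addressed by the applied digest. Schemas return `None` here.
pub fn content_path(cluster_dir: &Path, res: &Resource, digest: &str) -> Option<PathBuf> {
    let catalog = cluster_dir.join("__cluster").join("resources");
    match res {
        Resource::Graph { id } => {
            Some(cluster_dir.join("graphs").join(format!("{}.omni", id)))
        }
        Resource::Query { graph, name } => Some(
            catalog
                .join("query")
                .join(graph)
                .join(name)
                .join(format!("{}.gq", digest)),
        ),
        Resource::Policy { name } => Some(
            catalog
                .join("policy")
                .join(name)
                .join(format!("{}.yaml", digest)),
        ),
        Resource::Schema { .. } => None,
    }
}
```

Because every path is a pure function of the cluster directory, the address, and the applied digest, nothing in this step needs cluster.yaml.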
Everything downstream of settings construction — GraphStartupConfig, parallel engine opens, GraphRegistry::from_handles, routing middleware, auth, workload admission, OpenAPI — is reused as-is. Cluster mode is a new source for the same boot pipeline, not a new pipeline.
D3. Prerequisite: serving metadata in the applied revision (slice 5A)
State's StateResource records only a digest. To make the ledger serving-sufficient, cluster apply (and the sweep's roll-forwards) additionally record binding metadata for policy resources at apply time:
"applied_revision": {
  "resources": {
    "policy.base_rbac": {
      "digest": "<sha256>",
      "applies_to": ["cluster", "graph.knowledge"]
    }
  }
}
- Additive and optional (#[serde(default)]): existing state files parse unchanged; a policy entry without applies_to (applied before 5A) is a boot error in cluster mode with the remedy "re-run cluster apply" (one apply rewrites the metadata; the digest needn't change, because the metadata write is part of the state mutation, not the blob).
- applies_to is normalized to typed addresses (cluster | graph.<id>) at apply time, mirroring the validator's normalization.
- Queries need no equivalent: the address (query.<graph>.<name>) already carries the binding, and the registry key/symbol invariant is enforced at apply (validate) time.
- This is deliberately applied metadata, not config mirroring: if cluster.yaml changes a binding, the server keeps serving the old binding until cluster apply converges it, the same contract as every other resource.
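A boot-side reading of this metadata might look like the sketch below. It uses only the standard library; Binding, AppliedPolicy, parse_binding, and bindings_for_boot are hypothetical names, and the Option field stands in for what #[serde(default)] would yield for an entry written before 5A.

```rust
/// A typed `applies_to` address: `cluster` or `graph.<id>`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Binding {
    Cluster,
    Graph(String),
}

/// Hypothetical in-memory view of one policy entry in the applied revision.
pub struct AppliedPolicy {
    pub name: String,
    pub digest: String,
    /// `None` models an entry applied before 5A (no `applies_to` recorded).
    pub applies_to: Option<Vec<String>>,
}

/// Parse one normalized address; anything else is rejected, not guessed at.
pub fn parse_binding(raw: &str) -> Result<Binding, String> {
    if raw == "cluster" {
        return Ok(Binding::Cluster);
    }
    match raw.strip_prefix("graph.") {
        Some(id) if !id.is_empty() => Ok(Binding::Graph(id.to_string())),
        _ => Err(format!("unrecognized applies_to address: {:?}", raw)),
    }
}

/// Resolve a policy's bindings for boot. Missing metadata refuses boot with
/// the remedy from D3 instead of falling back to cluster.yaml.
pub fn bindings_for_boot(policy: &AppliedPolicy) -> Result<Vec<Binding>, String> {
    let raw = policy.applies_to.as_ref().ok_or_else(|| {
        format!(
            "policy.{} has no applies_to metadata; re-run cluster apply",
            policy.name
        )
    })?;
    raw.iter().map(|s| parse_binding(s)).collect()
}
```

A Binding::Cluster entry would route the bundle to the server-level engine and a Binding::Graph entry to the per-graph load, matching the two-gate structure in D2.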
D4. Readiness and failure posture
Boot is fail-fast, matching the server's existing stance (bad policy YAML refuses boot):
| Condition | Behavior |
|---|---|
| state.json missing / unparseable / unsupported version | boot error |
| pending recovery sidecars | boot error (run any state-mutating cluster command to sweep) |
| recorded graph root missing or unopenable | boot error |
| query/policy blob missing or digest-mismatched | boot error (run cluster refresh + apply to self-heal, then restart) |
| policy entry without applies_to metadata | boot error ("re-run cluster apply", D3) |
| stored query fails type-check against the live schema | boot error (existing validate_and_attach behavior) |
| state lock held | not an error — boot takes no lock; it reads a point-in-time snapshot of an immutable-once-written state file (the CAS discipline means a concurrent apply produces a new file atomically; the server reads whichever was current at open) |
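The refusing rows of the table map naturally onto one typed error. The following is an illustrative sketch only: BootError and its variant names are hypothetical, the remedy strings are the ones quoted in D2 and in the table, and rows for which no remedy is named return None. Unparseable and unsupported-version state are left out for brevity.

```rust
use std::fmt;

/// Hypothetical typed boot error, one variant per refusing condition.
#[derive(Debug, PartialEq)]
pub enum BootError {
    StateMissing,
    RecoveryPending,
    GraphRootUnopenable { graph_id: String },
    BlobInvalid { address: String },
    PolicyUnbound { name: String },
    QueryTypeCheckFailed { address: String },
}

impl BootError {
    /// Operator remedy where the RFC names one; `None` otherwise.
    pub fn remedy(&self) -> Option<&'static str> {
        match self {
            BootError::StateMissing => Some("run cluster import + cluster apply first"),
            BootError::RecoveryPending => {
                Some("run any state-mutating cluster command to sweep")
            }
            BootError::BlobInvalid { .. } => {
                Some("run cluster refresh + apply to self-heal, then restart")
            }
            BootError::PolicyUnbound { .. } => Some("re-run cluster apply"),
            BootError::GraphRootUnopenable { .. }
            | BootError::QueryTypeCheckFailed { .. } => None,
        }
    }
}

impl fmt::Display for BootError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BootError::StateMissing => write!(f, "state.json is missing")?,
            BootError::RecoveryPending => {
                write!(f, "cluster_recovery_pending: recovery sidecars are pending")?
            }
            BootError::GraphRootUnopenable { graph_id } => {
                write!(f, "graph root for graph.{} is missing or unopenable", graph_id)?
            }
            BootError::BlobInvalid { address } => {
                write!(f, "blob for {} is missing or digest-mismatched", address)?
            }
            BootError::PolicyUnbound { name } => {
                write!(f, "policy.{} has no applies_to metadata", name)?
            }
            BootError::QueryTypeCheckFailed { address } => {
                write!(f, "{} fails type-check against the live schema", address)?
            }
        }
        // Append the remedy so the refusal is actionable from the log line alone.
        if let Some(remedy) = self.remedy() {
            write!(f, " (remedy: {})", remedy)?;
        }
        Ok(())
    }
}

impl std::error::Error for BootError {}
```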
D5. The mcp.expose bridge in cluster mode
The cluster query registry has no expose flag by design (axiom 14: exposure is a policy decision — Phase 6). Until Phase 6 ships, cluster-mode servers list all stored queries in GET /queries. This is the documented bridge: cluster mode = everything exposed; omnigraph.yaml mode = mcp.expose honored as today. Its named sunset is Phase 6's policy-filtered catalog (Compatibility Stance #9). Invocation remains gated by the existing coarse invoke_query Cedar action in both modes.
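The bridge amounts to a one-line difference in how the listing is filtered. A minimal sketch, assuming hypothetical names throughout (ServingMode, StoredQuery, listed_queries); the mcp_expose field models the omnigraph.yaml flag and would simply not exist on cluster-registry entries.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ServingMode {
    Cluster,
    Yaml,
}

/// Hypothetical stored-query record. `mcp_expose` is only meaningful in
/// yaml mode; the cluster registry has no such flag by design.
pub struct StoredQuery {
    pub name: String,
    pub mcp_expose: bool,
}

/// Names to list in `GET /queries` under the bridge: cluster mode lists
/// everything, yaml mode honors `mcp.expose`. Invocation gating by Cedar
/// is a separate step and is not modeled here.
pub fn listed_queries<'a>(mode: ServingMode, queries: &'a [StoredQuery]) -> Vec<&'a str> {
    queries
        .iter()
        .filter(|q| match mode {
            ServingMode::Cluster => true,
            ServingMode::Yaml => q.mcp_expose,
        })
        .map(|q| q.name.as_str())
        .collect()
}
```

Keeping the rule in one place would make the Phase 6 sunset a local change: the Cluster arm is replaced by a policy-filtered check.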
D6. Migration path (exit criterion 7)
For an operator running multi-graph from omnigraph.yaml:
- Author cluster.yaml declaring the same graphs/queries/policies; place existing graph roots under <dir>/graphs/<id>.omni (or start fresh).
- cluster import (observes live graphs) → cluster plan → cluster apply (publishes queries/policies into the catalog; with 5A, records policy bindings).
- Restart the server with --cluster <dir> instead of --config omnigraph.yaml.
- omnigraph.yaml's graphs: / serve: / queries: / policy: keys are now inert for this deployment; the file remains the CLI's per-operator config.
Rollback is the same switch in reverse — nothing in cluster mode mutates omnigraph.yaml or the graphs in a way the yaml mode can't serve.
D7. Invariants and axioms check
- Axiom 15 / Stance #7: exclusive flag, hard mutual-exclusion error, zero omnigraph.yaml reads in cluster mode. No fact has two readers.
- Axiom 5: the server serves deployed reality (applied digests), never desired intent; D3 keeps the ledger the single serving source.
- Axiom 12: boot reads without the lock but relies on the atomic-replace write discipline; it never writes state.
- Axiom 14 / Stance #9: the expose-all bridge is named, scoped to cluster mode, and carries its Phase 6 sunset.
- Loud failures (deny-list): every degraded condition is a typed boot error with a remedy; no partial serving, no silent fallback to the yaml.
- Respect the boundaries: omnigraph-cluster stays free of HTTP; the server reads the catalog through a small read-only loader (either a pub read surface on omnigraph-cluster or a thin module in the server consuming the documented file formats; implementation picks the one that keeps omnigraph-cluster dependency-light, and the state/blob formats are already a documented contract).
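The lock-free read under axiom 12 depends on the writer replacing state.json atomically. The sketch below shows the usual write-temp-then-rename pattern as an illustration of that discipline; it is an assumption about how such a writer could work, not a description of omnigraph-cluster's code, and replace_atomically and read_snapshot are hypothetical names.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Writer side: write the new contents to a sibling temp file, flush it to
/// disk, then rename over the target. On POSIX filesystems a rename within
/// one directory is atomic, so a reader sees the old file or the new one,
/// never a partially written file.
pub fn replace_atomically(target: &Path, bytes: &[u8]) -> std::io::Result<()> {
    // For a target named state.json this yields state.json.tmp.
    let tmp = target.with_extension("json.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?;
    }
    fs::rename(&tmp, target)
}

/// Reader side (the booting server): a plain read with no lock taken.
/// It gets whichever complete file was current when it was opened.
pub fn read_snapshot(target: &Path) -> std::io::Result<Vec<u8>> {
    fs::read(target)
}
```

The reader never writes, so a concurrent apply can at worst make the server boot from the revision that was current a moment earlier.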
Sequencing
| Slice | Scope | Gate |
|---|---|---|
| 5A: serving metadata in state | applies_to recorded on policy resources at apply + sweep roll-forward; additive state schema; status/plan surfacing | In-crate tests: metadata written/rolled-forward; old state parses; re-apply backfills |
| 5B: --cluster boot mode | Flag + mode inference rule 0; catalog loader (state → GraphStartupConfigs + registries + policy engines); readiness table; OpenAPI regen if surface shifts | Server tests: boot from a converged fixture dir, serve /graphs/{id}/query + stored queries + Cedar gates; every D4 row refuses boot; e2e: cluster apply then serve ("applied means serving") |
| 5C: docs + caveat retirement | cluster-config.md mode-switch section; server.md/deployment.md; retire the "not serving" caveats for cluster-mode deployments; migration guide (D6) | check-agents-md.sh; doc accuracy review |
Exit-criteria coverage
Answers implementation-spec exit criterion 7 (server startup + migration path) in full; touches 1 (state schema gains policy binding metadata — additive). Criteria 8 (per-query policy) and 9 (pipelines — descoped to a separate project) remain.
Open Questions
- Loader home: pub read-only API on omnigraph-cluster (server gains the dependency) vs a server-side reader of the documented formats. Leaning omnigraph-cluster API: one parser for the state schema beats two drifting ones; the crate stays HTTP-free either way.
- Boot-time blob re-hash: D4 requires digest verification at boot; for large catalogs a stat-only fast path with full hashes behind a flag may matter later. Start with full verification (catalogs are small).
- GET /graphs enrichment: cluster mode could expose applied digests/revision in the enumeration; deferred until a consumer exists.
- Watch-and-reload: the natural follow-up once cluster mode exists; the registry's mutation seam is ready, but reload semantics (drain? cutover?) deserve their own design.
References
- rfc-004-cluster-graph-schema-apply.md — the convergence machinery this serves
- cluster-config-specs.md §Migration model — window 2 is this RFC
- cluster-axioms.md — axioms 5, 12, 14, 15
- cluster-config-implementation-spec.md — Phase 5 rollout, Compatibility Stance #7–#9, blast-radius rows for the server registry
- crates/omnigraph-server/src/lib.rs (load_server_settings, ServerConfigMode, GraphRegistry) — the boot pipeline this extends without forking