docs(cluster,server): the Phase 5 mode switch; retire applied-not-serving caveats

The standing caveat ('applied means recorded in the cluster catalog — nothing more; the server still boots from omnigraph.yaml') retires. The cluster docs gain the 'Serving from the cluster' section (exclusivity, applied-revision serving, fail-fast readiness, restart-to-pick-up, expose-all bridge); server.md gains mode-inference rule 0 and the cluster-booted multi mode; deployment.md gains the boot-source choice; and the CLI's apply note plus the cli-reference cluster row (stale since Stage 3A) now describe the full convergence surface. RFC-005 flips to Landed with four implementation deviations recorded.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-15 01:55:13 +02:00 · 2026-06-10 17:55:15 +03:00 · 2026-06-10 17:55:15 +03:00 · 711865e6f1
commit 711865e6f1
parent f3eb60fa4e
7 changed files with 69 additions and 12 deletions
--- a/docs/user/cli-reference.md
+++ b/docs/user/cli-reference.md
@@ -19,7 +19,7 @@ Top-level command families and subcommands. Graph-targeting commands accept eith
 | `commit list \| show` | inspect commit graph |
 | `schema plan \| apply \| show (alias: get)` | migrations |
 | `lint` (alias: `check`) | offline / graph-backed query validation. Replaces `query lint` / `query check`, which are kept as deprecated argv-level shims that print a one-line warning and rewrite to `omnigraph lint` |
-| `cluster validate \| plan \| apply \| approve \| status \| refresh \| import \| force-unlock` | cluster-control preview. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json` and annotates each change with its apply disposition; `apply` executes the config-only (stored-query/policy) subset into the content-addressed local catalog under `__cluster/resources/` — graph/schema changes are deferred loudly, and nothing applied serves traffic (the server still boots from `omnigraph.yaml`); `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock <LOCK_ID>` manually removes a held local state lock by exact id. No graph-manifest movement, server change, automatic stale-lock breaking, or `plan --refresh` occurs in Stage 3A |
+| `cluster validate \| plan \| apply \| approve \| status \| refresh \| import \| force-unlock` | declarative cluster control plane. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json`, annotates each change with its apply disposition, and embeds real schema-migration previews; `apply` converges the cluster: stored-query/policy catalog writes (content-addressed under `__cluster/resources/`), graph creates, schema updates (soft drops only; `--as` records the actor), and graph deletes gated by a digest-bound approval from `cluster approve <resource> --as <actor>`. What `apply` converges is what an `omnigraph-server --cluster <dir>` deployment serves after its next restart; deployments booted from `omnigraph.yaml` are unaffected. `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock <LOCK_ID>` manually removes a held local state lock by exact id |
 | `optimize` | non-destructive Lance compaction (skips tables with `Blob` columns or uncovered drift; `--json` reports `skipped`) |
 | `repair [--confirm] [--force]` | preview or explicitly publish uncovered manifest/head drift. `--confirm` heals verified maintenance drift and exits non-zero if suspicious/unverifiable drift is refused; `--force --confirm` publishes suspicious/unverifiable drift after operator review |
 | `cleanup --keep N --older-than 7d --confirm` | destructive version GC |
--- a/docs/user/cluster-config.md
+++ b/docs/user/cluster-config.md
@@ -1,6 +1,6 @@
 # Cluster Config

-**Status:** Stage 4C — Phase 4 complete (graph create, schema apply, gated graph delete).
+**Status:** Phase 5 — cluster-booted serving (`omnigraph-server --cluster`).

 Cluster config is the future control-plane configuration surface for a whole
 OmniGraph deployment. In this stage, OmniGraph can validate a local
@@ -190,10 +190,12 @@ Deletes remove the resource from state; their old payload blobs stay on disk
 (garbage collection is a later stage). Re-running a converged apply is a no-op:
 no state write, no revision change (`state_written: false`).

-**Applied means recorded in the cluster catalog — nothing more.** The server
-still boots from `omnigraph.yaml`; no query or policy applied here serves
-traffic until the server-boot stage ships, as an explicit per-deployment mode
-switch.
+**Applied means serving — for deployments that opt in.** A server started
+with `--cluster <dir>` boots from the applied revision (see
+[Serving from the cluster](#serving-from-the-cluster-the-mode-switch)); it
+picks up newly applied state on its next restart. Deployments still booting
+from `omnigraph.yaml` are untouched: for them, applied means recorded in the
+catalog, nothing more.

 ### Graph creation

@@ -305,6 +307,40 @@ fully converges. The `graph.<id>` composite digest is recomputed from state's
 own schema/query digests after each apply, so applied query changes converge
 without graph movement.

+## Serving from the cluster (the mode switch)
+
+```bash
+omnigraph-server --cluster ./company-brain --bind 0.0.0.0:8080
+```
+
+`--cluster <dir>` is an **exclusive boot source** (axiom 15): it cannot
+combine with a graph URI, `--target`, or `--config`, and in this mode
+`omnigraph.yaml` is never read — not for graphs, not for queries, not for
+policies. The server serves the **applied revision**: graph roots recorded in
+`state.json`, stored-query and policy content from the content-addressed
+catalog at the applied digests (re-verified at boot), and policy bundles
+wired by their applied `applies_to` bindings — `cluster`-bound bundles become
+the server-level Cedar engine, graph-bound bundles attach per graph.
+Unapplied config drift never leaks into serving; `cluster plan` is where
+drift is visible. Routing is always multi-graph (`/graphs/{id}/...`). Bearer
+tokens and the bind address stay process-level (flags/env) — they are
+per-replica facts, not cluster facts.
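The binding rule can be sketched as a single grouping pass over the applied policy entries. This is a minimal illustration under stated assumptions, not the server's code: `Scope`, `AppliedBundle`, `Wiring`, and `wire_bundles` are hypothetical names, and the ledger's real format is not shown in this document. It assumes each applied entry carries an optional `applies_to` binding, and it refuses the two binding-related cases that the fail-fast rules list as startup errors (no binding metadata, and more than one bundle on one scope).

```rust
use std::collections::HashMap;

/// Hypothetical binding target of an applied policy bundle.
#[derive(Debug, Clone, PartialEq)]
enum Scope {
    /// Would become the server-level Cedar engine.
    Cluster,
    /// Would attach to the graph with this id.
    Graph(String),
}

/// Hypothetical view of one applied policy entry from the ledger.
#[derive(Debug, Clone)]
struct AppliedBundle {
    id: String,
    /// `None` models a pre-binding ledger entry with no `applies_to` metadata.
    applies_to: Option<Scope>,
}

/// Result of wiring: at most one bundle per scope.
#[derive(Debug, Default)]
struct Wiring {
    server_level: Option<String>,
    per_graph: HashMap<String, String>,
}

/// Groups applied bundles by scope, refusing unbound entries and double bindings.
fn wire_bundles(bundles: &[AppliedBundle]) -> Result<Wiring, String> {
    let mut wiring = Wiring::default();
    for b in bundles {
        match &b.applies_to {
            None => {
                return Err(format!(
                    "policy `{}` has no binding metadata; re-run `cluster apply`",
                    b.id
                ));
            }
            Some(Scope::Cluster) => {
                // Only one bundle may bind the cluster scope.
                if let Some(existing) = &wiring.server_level {
                    return Err(format!(
                        "scope `cluster` is bound by both `{}` and `{}`; split or merge bundles",
                        existing, b.id
                    ));
                }
                wiring.server_level = Some(b.id.clone());
            }
            Some(Scope::Graph(graph)) => {
                // Only one bundle may bind any given graph.
                if let Some(existing) = wiring.per_graph.get(graph) {
                    return Err(format!(
                        "graph `{}` is bound by both `{}` and `{}`; split or merge bundles",
                        graph, existing, b.id
                    ));
                }
                wiring.per_graph.insert(graph.clone(), b.id.clone());
            }
        }
    }
    Ok(wiring)
}
```

Keying the map on scope rather than on bundle id is what makes a double binding detectable in one pass; supporting stacked scopes would mean holding an ordered list per scope instead, which the text defers to a later stage.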
+
+Boot is fail-fast. Each of the following refuses startup with a remedy:
+
+- missing or unreadable state;
+- pending recovery sidecars;
+- missing or tampered catalog blobs;
+- policy entries without binding metadata (pre-binding ledgers: re-run
+  `cluster apply`);
+- an empty graph set;
+- more than one policy bundle binding a single scope (split or merge
+  bundles; stacked scopes are a later stage);
+- unopenable graph roots;
+- stored queries that no longer type-check.
+
+A held state lock is *not* an error: boot reads the atomically replaced
+state file without locking.
+
+Serving is static per process: the server reads the applied revision once at
+startup, so picking up newly applied state means restarting it. In cluster
+mode, `GET /queries` lists every stored query: the cluster registry has no
+expose flag, and exposure becomes a policy decision in a later phase.
+
 ## Status

 `cluster status` reads the same local JSON state ledger and prints what the
--- a/docs/user/deployment.md
+++ b/docs/user/deployment.md
@@ -13,6 +13,14 @@ Omnigraph supports two broad deployment shapes:

 The server binary and container image expose the same HTTP surface.

+The server also has two **boot sources**: `omnigraph.yaml` (graph targets
+declared in the per-operator config) or a **cluster directory**
+(`omnigraph-server --cluster <dir>`), which serves the cluster control
+plane's applied revision — see
+[cluster-config.md](cluster-config.md#serving-from-the-cluster-the-mode-switch).
+The two are exclusive per deployment; switching is a restart with a different
+flag.
+
 ## Binary Deployment

 Build or install:
--- a/docs/user/server.md
+++ b/docs/user/server.md
@@ -1,6 +1,6 @@
 # HTTP Server (`omnigraph-server`)

-Axum 0.8 + tokio + utoipa-generated OpenAPI. **Two modes** (v0.6.0+): single-graph (legacy) and multi-graph (MR-668). Mode is inferred from CLI args + config shape.
+Axum 0.8 + tokio + utoipa-generated OpenAPI. **Two modes** (v0.6.0+): single-graph (legacy) and multi-graph (MR-668), with **two boot sources** for multi mode: `omnigraph.yaml` or — exclusively — a cluster directory (`--cluster`, RFC-005). Mode is inferred from CLI args + config shape.

 ## Modes

@@ -14,8 +14,20 @@ Axum 0.8 + tokio + utoipa-generated OpenAPI. **Two modes** (v0.6.0+): single-gra

 `omnigraph-server --config omnigraph.yaml` with a non-empty `graphs:` map and **no** single-mode selector (no `server.graph`, no `<URI>`, no `--target`). The server opens every configured graph in parallel at startup (bounded concurrency = 4, fail-fast on the first open error). Routes are nested under `/graphs/{graph_id}/...`. Bare flat paths return 404 in multi mode.

-Mode inference (four-rule matrix):
+### Cluster-booted multi mode (Phase 5)

+`omnigraph-server --cluster <dir>` boots from the cluster catalog's **applied
+revision** (`state.json` + content-addressed blobs) instead of
+`omnigraph.yaml` — an exclusive boot source: combining it with `<URI>`,
+`--target`, or `--config` is a startup error, and `omnigraph.yaml` is never
+read in this mode. Routing is always multi-graph. See
+[cluster-config.md](cluster-config.md#serving-from-the-cluster-the-mode-switch)
+for what is read and the fail-fast readiness rules. `--bind`,
+`--unauthenticated`, and the bearer-token env vars work identically.
+
+Mode inference:
+
+0. CLI `--cluster <dir>` → **multi, cluster-booted** (exclusive)
 1. CLI positional `<URI>` → single
 2. CLI `--target <name>` → single
 3. `server.graph` in config → single
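The precedence can be sketched as a pure function over already-parsed boot inputs. This is a hypothetical illustration, not the server's implementation: the `BootArgs` fields, the `Mode` variants, and `infer_mode` are invented names. The multi-from-config branch follows the multi-mode description above (non-empty `graphs:` map, no single-mode selector); returning an error when nothing selects a graph is an assumption.

```rust
/// Hypothetical, already-parsed boot inputs (names are illustrative).
#[derive(Debug, Default)]
struct BootArgs {
    cluster: Option<String>,             // --cluster <dir>
    uri: Option<String>,                 // positional <URI>
    target: Option<String>,              // --target <name>
    config: Option<String>,              // --config <path>
    config_server_graph: Option<String>, // `server.graph` in the loaded config
    config_graph_count: usize,           // entries in the config's `graphs:` map
}

#[derive(Debug, PartialEq)]
enum Mode {
    /// Rule 0: multi-graph routing, booted from the cluster's applied revision.
    MultiClusterBooted,
    /// Rules 1-3: a single graph.
    Single,
    /// Non-empty `graphs:` map and no single-mode selector.
    MultiFromConfig,
}

fn infer_mode(args: &BootArgs) -> Result<Mode, String> {
    // Rule 0: --cluster is checked first and is exclusive, so no config
    // field is consulted once it is present.
    if args.cluster.is_some() {
        if args.uri.is_some() || args.target.is_some() || args.config.is_some() {
            return Err("--cluster cannot be combined with <URI>, --target, or --config".to_string());
        }
        return Ok(Mode::MultiClusterBooted);
    }
    // Rules 1-3: any single-mode selector selects single mode.
    if args.uri.is_some() || args.target.is_some() || args.config_server_graph.is_some() {
        return Ok(Mode::Single);
    }
    // Multi mode from config: non-empty `graphs:` map, no selector.
    if args.config_graph_count > 0 {
        return Ok(Mode::MultiFromConfig);
    }
    // Assumption: with no selector and no configured graphs, refuse to start.
    Err("no graph selected".to_string())
}
```

Checking `--cluster` before anything else is what makes it exclusive in this sketch: once rule 0 matches, no config-derived field is read, mirroring the guarantee that `omnigraph.yaml` is never read in cluster mode.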