docs(cluster,server): the Phase 5 mode switch; retire applied-not-serving caveats

The standing caveat ('applied means recorded in the cluster catalog —
nothing more; the server still boots from omnigraph.yaml') retires: cluster
docs gain the 'Serving from the cluster' section (exclusivity, applied-
revision serving, fail-fast readiness, restart-to-pick-up, expose-all
bridge), server.md gains mode-inference rule 0 and the cluster-booted multi
mode, deployment.md the boot-source choice, and the CLI's apply note plus
the cli-reference cluster row (stale back to Stage 3A) now describe the full
convergence surface. RFC-005 flips to Landed with four implementation
deviations recorded.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
aaltshuler 2026-06-10 17:55:15 +03:00
parent f3eb60fa4e
commit 711865e6f1
7 changed files with 69 additions and 12 deletions

View file

@ -19,7 +19,7 @@ Top-level command families and subcommands. Graph-targeting commands accept eith
| `commit list \| show` | inspect commit graph |
| `schema plan \| apply \| show (alias: get)` | migrations |
| `lint` (alias: `check`) | offline / graph-backed query validation. Replaces `query lint` / `query check`, which are kept as deprecated argv-level shims that print a one-line warning and rewrite to `omnigraph lint` |
| `cluster validate \| plan \| apply \| approve \| status \| refresh \| import \| force-unlock` | cluster-control preview. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json` and annotates each change with its apply disposition; `apply` executes the config-only (stored-query/policy) subset into the content-addressed local catalog under `__cluster/resources/` — graph/schema changes are deferred loudly, and nothing applied serves traffic (the server still boots from `omnigraph.yaml`); `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock <LOCK_ID>` manually removes a held local state lock by exact id. No graph-manifest movement, server change, automatic stale-lock breaking, or `plan --refresh` occurs in Stage 3A |
| `cluster validate \| plan \| apply \| approve \| status \| refresh \| import \| force-unlock` | declarative cluster control plane. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json`, annotates dispositions, and embeds real schema-migration previews; `apply` converges the cluster — stored-query/policy catalog writes (content-addressed under `__cluster/resources/`), graph creates, schema updates (soft drops only; `--as` records the actor), and graph deletes behind a digest-bound approval from `cluster approve <resource> --as <actor>`; what apply converges is what an `omnigraph-server --cluster <dir>` deployment serves on its next restart (omnigraph.yaml deployments are unaffected); `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock <LOCK_ID>` manually removes a held local state lock by exact id |
| `optimize` | non-destructive Lance compaction (skips tables with `Blob` columns or uncovered drift; `--json` reports `skipped`) |
| `repair [--confirm] [--force]` | preview or explicitly publish uncovered manifest/head drift. `--confirm` heals verified maintenance drift and exits non-zero if suspicious/unverifiable drift is refused; `--force --confirm` publishes suspicious/unverifiable drift after operator review |
| `cleanup --keep N --older-than 7d --confirm` | destructive version GC |

View file

@ -1,6 +1,6 @@
# Cluster Config
**Status:** Stage 4C — Phase 4 complete (graph create, schema apply, gated graph delete).
**Status:** Phase 5 — cluster-booted serving (`omnigraph-server --cluster`).
Cluster config is the future control-plane configuration surface for a whole
OmniGraph deployment. In this stage, OmniGraph can validate a local
@ -190,10 +190,12 @@ Deletes remove the resource from state; their old payload blobs stay on disk
(garbage collection is a later stage). Re-running a converged apply is a no-op:
no state write, no revision change (`state_written: false`).
**Applied means recorded in the cluster catalog — nothing more.** The server
still boots from `omnigraph.yaml`; no query or policy applied here serves
traffic until the server-boot stage ships, as an explicit per-deployment mode
switch.
**Applied means serving — for deployments that opt in.** A server started
with `--cluster <dir>` boots from the applied revision (see
[Serving from the cluster](#serving-from-the-cluster-the-mode-switch)); it
picks up newly applied state on its next restart. Deployments still booting
from `omnigraph.yaml` are untouched: for them, applied means recorded in the
catalog, nothing more.
### Graph creation
@ -305,6 +307,40 @@ fully converges. The `graph.<id>` composite digest is recomputed from state's
own schema/query digests after each apply, so applied query changes converge
without graph movement.
## Serving from the cluster (the mode switch)
```bash
omnigraph-server --cluster ./company-brain --bind 0.0.0.0:8080
```
`--cluster <dir>` is an **exclusive boot source** (axiom 15): it cannot
combine with a graph URI, `--target`, or `--config`, and in this mode
`omnigraph.yaml` is never read — not for graphs, not for queries, not for
policies. The server serves the **applied revision**: graph roots recorded in
`state.json`, stored-query and policy content from the content-addressed
catalog at the applied digests (re-verified at boot), and policy bundles
wired by their applied `applies_to` bindings — `cluster`-bound bundles become
the server-level Cedar engine, graph-bound bundles attach per graph.
Un-applied config drift never leaks into serving; `cluster plan` is where
drift is visible. Routing is always multi-graph (`/graphs/{id}/...`). Bearer
tokens and the bind address stay process-level (flags/env) — they are
per-replica facts, not cluster facts.
Boot is fail-fast: missing or unreadable state, pending recovery sidecars,
missing/tampered catalog blobs, policy entries without binding metadata
(pre-binding ledgers — re-run `cluster apply`), an empty graph set, more than
one policy bundle binding a single scope (split or merge bundles; stacked
scopes are a later stage), unopenable graph roots, and stored queries that no
longer type-check all refuse startup with a remedy. A held state lock is
*not* an error — boot reads the atomically-replaced state file without
locking.
Serving is static per process: the server reads the applied revision once at
startup, so picking up newly applied state means restarting it. Stored
queries are all listed in `GET /queries` in cluster mode (the cluster
registry has no expose flag; exposure becomes a policy decision in a later
phase).
## Status
`cluster status` reads the same local JSON state ledger and prints what the

View file

@ -13,6 +13,14 @@ Omnigraph supports two broad deployment shapes:
The server binary and container image expose the same HTTP surface.
The server also has two **boot sources**: `omnigraph.yaml` (graph targets
declared in the per-operator config) or a **cluster directory**
(`omnigraph-server --cluster <dir>`), which serves the cluster control
plane's applied revision — see
[cluster-config.md](cluster-config.md#serving-from-the-cluster-the-mode-switch).
The two are exclusive per deployment; switching is a restart with a different
flag.
## Binary Deployment
Build or install:

View file

@ -1,6 +1,6 @@
# HTTP Server (`omnigraph-server`)
Axum 0.8 + tokio + utoipa-generated OpenAPI. **Two modes** (v0.6.0+): single-graph (legacy) and multi-graph (MR-668). Mode is inferred from CLI args + config shape.
Axum 0.8 + tokio + utoipa-generated OpenAPI. **Two modes** (v0.6.0+): single-graph (legacy) and multi-graph (MR-668), with **two boot sources** for multi mode: `omnigraph.yaml` or — exclusively — a cluster directory (`--cluster`, RFC-005). Mode is inferred from CLI args + config shape.
## Modes
@ -14,8 +14,20 @@ Axum 0.8 + tokio + utoipa-generated OpenAPI. **Two modes** (v0.6.0+): single-gra
`omnigraph-server --config omnigraph.yaml` with a non-empty `graphs:` map and **no** single-mode selector (no `server.graph`, no `<URI>`, no `--target`). The server opens every configured graph in parallel at startup (bounded concurrency = 4, fail-fast on the first open error). Routes are nested under `/graphs/{graph_id}/...`. Bare flat paths return 404 in multi mode.
Mode inference (four-rule matrix):
### Cluster-booted multi mode (Phase 5)
`omnigraph-server --cluster <dir>` boots from the cluster catalog's **applied
revision** (`state.json` + content-addressed blobs) instead of
`omnigraph.yaml` — an exclusive boot source: combining it with `<URI>`,
`--target`, or `--config` is a startup error, and `omnigraph.yaml` is never
read in this mode. Always multi-graph routing. See
[cluster-config.md](cluster-config.md#serving-from-the-cluster-the-mode-switch)
for what is read and the fail-fast readiness rules. `--bind`,
`--unauthenticated`, and the bearer-token env vars work identically.
Mode inference:
0. CLI `--cluster <dir>`**multi, cluster-booted** (exclusive)
1. CLI positional `<URI>` → single
2. CLI `--target <name>` → single
3. `server.graph` in config → single