diff --git a/docs/dev/rfc-004-cluster-graph-schema-apply.md b/docs/dev/rfc-004-cluster-graph-schema-apply.md
index ca72fdc..e9c0336 100644
--- a/docs/dev/rfc-004-cluster-graph-schema-apply.md
+++ b/docs/dev/rfc-004-cluster-graph-schema-apply.md
@@ -1,6 +1,7 @@
 # RFC: Cluster Graph & Schema Apply — Phase 4 of the Cluster Control Plane
 
-**Status:** Proposed
+**Status:** Landed (4A #170, 4B #171, 4C — all shipped)
+**Implementation deviations:** (1) D3 row 8 retires the stale delete sidecar and lets the still-approved delete re-propose and retry, instead of a pending-block — prefix removal is idempotent, so the retry is the repair. (2) The approver/actor flag is the CLI's existing global `--as`, not a dedicated `--actor`/`--by`. (3) Consumed approval artifacts are rewritten with `consumed_at` rather than moved into state — the file and the ledger record both survive independently (axiom 11).
 **Date:** 2026-06-10
 **Builds on:** cluster Stages 1–3B (shipped: validate/plan/status/refresh/import/force-unlock, config-only `cluster apply` with content-addressed catalog publish, catalog payload verification, failpoint-proven crash/CAS recovery for the apply protocol). Normative context: [cluster-config-specs.md](cluster-config-specs.md), [cluster-axioms.md](cluster-axioms.md), [cluster-config-implementation-spec.md](cluster-config-implementation-spec.md).
 **Target release:** unversioned (phased — see Sequencing); no cluster functionality is in a tagged release yet.
diff --git a/docs/dev/testing.md b/docs/dev/testing.md
index 5402ccf..2302b13 100644
--- a/docs/dev/testing.md
+++ b/docs/dev/testing.md
@@ -8,7 +8,7 @@ This file is the always-on map of the test surface. **Consult it before every ta
 |---|---|---|
 | `omnigraph` (engine) | `crates/omnigraph/tests/` | Integration tests (21 files), fixture-driven, share `tests/helpers/mod.rs` |
 | `omnigraph-cli` | `crates/omnigraph-cli/tests/` | `cli.rs` (unit-ish; includes the `cluster_e2e_*` lifecycle compositions over the spawned binary — lost-state re-import recovery, out-of-band drift, graph-root destruction, multi-graph mixed-disposition convergence), `system_local.rs`, `system_remote.rs`, share `tests/support/mod.rs` |
-| `omnigraph-cluster` | mostly in-source `#[cfg(test)] mod tests`; `tests/failpoints.rs` (feature-gated) | Cluster config parser, local JSON state diff, state CAS/lock handling/recovery, read-only validate/plan/status plus explicit refresh/import graph observations, config-only apply (content-addressed payload publish, disposition gating, composite-digest convergence, idempotent re-apply), catalog payload verification (status read-only, refresh drift + self-heal), failpoint crash-mid-apply / CAS-race coverage, Stage 4A graph creation (create executor, recovery sidecars + sweep rows, create crash windows), and Stage 4B schema apply (migration previews in plan, schema executor, schema-apply sweep classification, schema crash windows) |
+| `omnigraph-cluster` | mostly in-source `#[cfg(test)] mod tests`; `tests/failpoints.rs` (feature-gated) | Cluster config parser, local JSON state diff, state CAS/lock handling/recovery, read-only validate/plan/status plus explicit refresh/import graph observations, config-only apply (content-addressed payload publish, disposition gating, composite-digest convergence, idempotent re-apply), catalog payload verification (status read-only, refresh drift + self-heal), failpoint crash-mid-apply / CAS-race coverage, Stage 4A graph creation (create executor, recovery sidecars + sweep rows, create crash windows), Stage 4B schema apply (migration previews in plan, schema executor, schema-apply sweep classification, schema crash windows), and Stage 4C gated deletes (digest-bound approvals, delete executor + tombstones, delete sweep rows, delete crash windows) |
 | `omnigraph-server` | `crates/omnigraph-server/tests/` | `server.rs` (HTTP-level), `openapi.rs` (OpenAPI drift / regeneration) |
 | `omnigraph-compiler` | mostly in-source `#[cfg(test)] mod tests` | Parser, type-checker, IR lowering, lint |
 
diff --git a/docs/user/cli-reference.md b/docs/user/cli-reference.md
index 774ea6b..9dc8a25 100644
--- a/docs/user/cli-reference.md
+++ b/docs/user/cli-reference.md
@@ -19,7 +19,7 @@ Top-level command families and subcommands. Graph-targeting commands accept eith
 | `commit list \| show` | inspect commit graph |
 | `schema plan \| apply \| show (alias: get)` | migrations |
 | `lint` (alias: `check`) | offline / graph-backed query validation. Replaces `query lint` / `query check`, which are kept as deprecated argv-level shims that print a one-line warning and rewrite to `omnigraph lint` |
-| `cluster validate \| plan \| apply \| status \| refresh \| import \| force-unlock` | cluster-control preview. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json` and annotates each change with its apply disposition; `apply` executes the config-only (stored-query/policy) subset into the content-addressed local catalog under `__cluster/resources/` — graph/schema changes are deferred loudly, and nothing applied serves traffic (the server still boots from `omnigraph.yaml`); `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock ` manually removes a held local state lock by exact id. No graph-manifest movement, server change, automatic stale-lock breaking, or `plan --refresh` occurs in Stage 3A |
+| `cluster validate \| plan \| apply \| approve \| status \| refresh \| import \| force-unlock` | cluster-control preview. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json` and annotates each change with its apply disposition; `apply` executes the config-only (stored-query/policy) subset into the content-addressed local catalog under `__cluster/resources/` — graph/schema changes are deferred loudly, and nothing applied serves traffic (the server still boots from `omnigraph.yaml`); `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock ` manually removes a held local state lock by exact id. No graph-manifest movement, server change, automatic stale-lock breaking, or `plan --refresh` occurs in Stage 3A |
 | `optimize` | non-destructive Lance compaction (skips tables with `Blob` columns or uncovered drift; `--json` reports `skipped`) |
 | `repair [--confirm] [--force]` | preview or explicitly publish uncovered manifest/head drift. `--confirm` heals verified maintenance drift and exits non-zero if suspicious/unverifiable drift is refused; `--force --confirm` publishes suspicious/unverifiable drift after operator review |
 | `cleanup --keep N --older-than 7d --confirm` | destructive version GC |
@@ -79,6 +79,7 @@ policy:
 omnigraph cluster validate --config ./company-brain
 omnigraph cluster plan --config ./company-brain --json
 omnigraph cluster apply --config ./company-brain --json
+omnigraph cluster approve graph. --config ./company-brain --as
 omnigraph cluster status --config ./company-brain --json
 omnigraph cluster refresh --config ./company-brain --json
 omnigraph cluster import --config ./company-brain --json
diff --git a/docs/user/cluster-config.md b/docs/user/cluster-config.md
index 9de305a..2df26be 100644
--- a/docs/user/cluster-config.md
+++ b/docs/user/cluster-config.md
@@ -1,6 +1,6 @@
 # Cluster Config
 
-**Status:** Stage 4B schema-apply preview.
+**Status:** Stage 4C — Phase 4 complete (graph create, schema apply, gated graph delete).
 
 Cluster config is the future control-plane configuration surface for a whole
 OmniGraph deployment. In this stage, OmniGraph can validate a local
@@ -9,11 +9,10 @@ local JSON state ledger, explicitly refresh/import graph observations into that
 ledger, manually remove a held local state lock by exact lock id, and **apply
 the executable subset of the plan** — stored-query and policy-bundle catalog
 writes, **graph creation** (a declared graph that does not exist yet
-is initialized by apply at the derived root), and **schema updates**: a
-changed schema is migrated on the live graph by apply itself, soft drops
-only. It does not delete graphs (a later stage), perform data-loss
-migrations, start servers, or serve anything it applies: the server still
-boots from `omnigraph.yaml`.
+is initialized by apply at the derived root), **schema updates** (soft drops
+only), and — behind an explicit, digest-bound **approval** — **graph
+deletion**. It does not perform data-loss schema migrations, start servers,
+or serve anything it applies: the server still boots from `omnigraph.yaml`.
 
 ## Commands
 
@@ -21,6 +20,7 @@ boots from `omnigraph.yaml`.
 omnigraph cluster validate --config ./company-brain
 omnigraph cluster plan --config ./company-brain --json
 omnigraph cluster apply --config ./company-brain --json
+omnigraph cluster approve graph. --config ./company-brain --as
 omnigraph cluster status --config ./company-brain --json
 omnigraph cluster refresh --config ./company-brain --json
 omnigraph cluster import --config ./company-brain --json
@@ -253,7 +253,38 @@ in recovery sidecars and audit entries and threads it to the engine's
 schema-apply (so commit attribution and Cedar enforcement — wherever a policy
 checker is installed — work unchanged).
 
-Schema deletes (removing a graph) are never executed by this stage. They are
+### Approvals and graph deletion
+
+Deleting a graph is the irreversible tier: it requires a recorded human
+decision. `cluster plan` lists the gate under `approvals_required` (one gate
+per graph — the graph-level approval carries its schema and queries);
+`cluster approve graph. --as ` writes a digest-bound artifact to
+
+```text
+/__cluster/approvals/.json
+```
+
+bound to the exact desired config digest and the change's state digest, so
+**any config or state drift after approving invalidates the artifact**
+automatically (`approval_stale` warning; it never authorizes a different
+change). An unapproved delete blocks with `approval_required`.
+
+An approved delete executes **last** in the apply run: the graph root is
+removed recursively, the subtree (graph, schema, its queries) is tombstoned
+out of the state ledger with a tombstone observation, and the approval is
+consumed — recorded in the state's `approval_records` in the same state
+update, and the artifact file rewritten with `consumed_at` (the file is never
+deleted: the audit fact survives the loss of either store). A failed run
+consumes nothing; the approval stays valid for the retry. Catalog blobs of
+the deleted graph's queries stay on disk (GC is a later stage).
+
+Crash recovery for deletes: a completed-but-unrecorded delete is rolled
+forward by the sweep (tombstone + approval consumption + audit entry); an
+incomplete delete (root still present) is retired with a
+`graph_delete_incomplete` warning and simply **re-proposed** — prefix removal
+is idempotent, so the still-approved retry is the repair.
+
+Standalone schema deletes are never executed by this stage. They are
 reported as `deferred` (warning `apply_unsupported_change`), and query/policy
 changes that depend on them are `blocked` (warning `apply_dependency_blocked`,
 status `blocked` in state). A partially-applicable plan still exits 0 with warnings;