mirror of https://github.com/ModernRelay/omnigraph.git
synced 2026-06-12 01:45:14 +02:00
docs(cluster): document Stage 4C — Phase 4 complete
Approvals + gated graph deletion in the user docs, the approve command in the CLI reference, RFC-004 flipped to Landed with its three implementation deviations recorded (row-8 retire-and-repropose, --as instead of --actor/--by, consumed artifacts rewritten in place rather than moved).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent 87691fe9c7
commit c949a2b717
4 changed files with 43 additions and 10 deletions
@@ -1,6 +1,7 @@
 # RFC: Cluster Graph & Schema Apply — Phase 4 of the Cluster Control Plane
 
-**Status:** Proposed
+**Status:** Landed (4A #170, 4B #171, 4C — all shipped)
+**Implementation deviations:** (1) D3 row 8 retires the stale delete sidecar and lets the still-approved delete re-propose and retry, instead of a pending-block — prefix removal is idempotent, so the retry is the repair. (2) The approver/actor flag is the CLI's existing global `--as`, not a dedicated `--actor`/`--by`. (3) Consumed approval artifacts are rewritten with `consumed_at` rather than moved into state — the file and the ledger record both survive independently (axiom 11).
 **Date:** 2026-06-10
 **Builds on:** cluster Stages 1–3B (shipped: validate/plan/status/refresh/import/force-unlock, config-only `cluster apply` with content-addressed catalog publish, catalog payload verification, failpoint-proven crash/CAS recovery for the apply protocol). Normative context: [cluster-config-specs.md](cluster-config-specs.md), [cluster-axioms.md](cluster-axioms.md), [cluster-config-implementation-spec.md](cluster-config-implementation-spec.md).
 **Target release:** unversioned (phased — see Sequencing); no cluster functionality is in a tagged release yet.
@@ -8,7 +8,7 @@ This file is the always-on map of the test surface. **Consult it before every ta
 |---|---|---|
 | `omnigraph` (engine) | `crates/omnigraph/tests/` | Integration tests (21 files), fixture-driven, share `tests/helpers/mod.rs` |
 | `omnigraph-cli` | `crates/omnigraph-cli/tests/` | `cli.rs` (unit-ish; includes the `cluster_e2e_*` lifecycle compositions over the spawned binary — lost-state re-import recovery, out-of-band drift, graph-root destruction, multi-graph mixed-disposition convergence), `system_local.rs`, `system_remote.rs`, share `tests/support/mod.rs` |
-| `omnigraph-cluster` | mostly in-source `#[cfg(test)] mod tests`; `tests/failpoints.rs` (feature-gated) | Cluster config parser, local JSON state diff, state CAS/lock handling/recovery, read-only validate/plan/status plus explicit refresh/import graph observations, config-only apply (content-addressed payload publish, disposition gating, composite-digest convergence, idempotent re-apply), catalog payload verification (status read-only, refresh drift + self-heal), failpoint crash-mid-apply / CAS-race coverage, Stage 4A graph creation (create executor, recovery sidecars + sweep rows, create crash windows), and Stage 4B schema apply (migration previews in plan, schema executor, schema-apply sweep classification, schema crash windows) |
+| `omnigraph-cluster` | mostly in-source `#[cfg(test)] mod tests`; `tests/failpoints.rs` (feature-gated) | Cluster config parser, local JSON state diff, state CAS/lock handling/recovery, read-only validate/plan/status plus explicit refresh/import graph observations, config-only apply (content-addressed payload publish, disposition gating, composite-digest convergence, idempotent re-apply), catalog payload verification (status read-only, refresh drift + self-heal), failpoint crash-mid-apply / CAS-race coverage, Stage 4A graph creation (create executor, recovery sidecars + sweep rows, create crash windows), Stage 4B schema apply (migration previews in plan, schema executor, schema-apply sweep classification, schema crash windows), and Stage 4C gated deletes (digest-bound approvals, delete executor + tombstones, delete sweep rows, delete crash windows) |
 | `omnigraph-server` | `crates/omnigraph-server/tests/` | `server.rs` (HTTP-level), `openapi.rs` (OpenAPI drift / regeneration) |
 | `omnigraph-compiler` | mostly in-source `#[cfg(test)] mod tests` | Parser, type-checker, IR lowering, lint |
 
@@ -19,7 +19,7 @@ Top-level command families and subcommands. Graph-targeting commands accept eith
 | `commit list \| show` | inspect commit graph |
 | `schema plan \| apply \| show (alias: get)` | migrations |
 | `lint` (alias: `check`) | offline / graph-backed query validation. Replaces `query lint` / `query check`, which are kept as deprecated argv-level shims that print a one-line warning and rewrite to `omnigraph lint` |
-| `cluster validate \| plan \| apply \| status \| refresh \| import \| force-unlock` | cluster-control preview. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json` and annotates each change with its apply disposition; `apply` executes the config-only (stored-query/policy) subset into the content-addressed local catalog under `__cluster/resources/` — graph/schema changes are deferred loudly, and nothing applied serves traffic (the server still boots from `omnigraph.yaml`); `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock <LOCK_ID>` manually removes a held local state lock by exact id. No graph-manifest movement, server change, automatic stale-lock breaking, or `plan --refresh` occurs in Stage 3A |
+| `cluster validate \| plan \| apply \| approve \| status \| refresh \| import \| force-unlock` | cluster-control preview. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json` and annotates each change with its apply disposition; `apply` executes the config-only (stored-query/policy) subset into the content-addressed local catalog under `__cluster/resources/` — graph/schema changes are deferred loudly, and nothing applied serves traffic (the server still boots from `omnigraph.yaml`); `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock <LOCK_ID>` manually removes a held local state lock by exact id. No graph-manifest movement, server change, automatic stale-lock breaking, or `plan --refresh` occurs in Stage 3A |
 | `optimize` | non-destructive Lance compaction (skips tables with `Blob` columns or uncovered drift; `--json` reports `skipped`) |
 | `repair [--confirm] [--force]` | preview or explicitly publish uncovered manifest/head drift. `--confirm` heals verified maintenance drift and exits non-zero if suspicious/unverifiable drift is refused; `--force --confirm` publishes suspicious/unverifiable drift after operator review |
 | `cleanup --keep N --older-than 7d --confirm` | destructive version GC |
@@ -79,6 +79,7 @@ policy:
 omnigraph cluster validate --config ./company-brain
 omnigraph cluster plan --config ./company-brain --json
 omnigraph cluster apply --config ./company-brain --json
+omnigraph cluster approve graph.<id> --config ./company-brain --as <actor>
 omnigraph cluster status --config ./company-brain --json
 omnigraph cluster refresh --config ./company-brain --json
 omnigraph cluster import --config ./company-brain --json
@@ -1,6 +1,6 @@
 # Cluster Config
 
-**Status:** Stage 4B schema-apply preview.
+**Status:** Stage 4C — Phase 4 complete (graph create, schema apply, gated graph delete).
 
 Cluster config is the future control-plane configuration surface for a whole
 OmniGraph deployment. In this stage, OmniGraph can validate a local
@@ -9,11 +9,10 @@ local JSON state ledger, explicitly refresh/import graph observations into
 that ledger, manually remove a held local state lock by exact lock id, and
 **apply the executable subset of the plan** — stored-query and policy-bundle
 catalog writes, **graph creation** (a declared graph that does not exist yet
-is initialized by apply at the derived root), and **schema updates**: a
-changed schema is migrated on the live graph by apply itself, soft drops
-only. It does not delete graphs (a later stage), perform data-loss
-migrations, start servers, or serve anything it applies: the server still
-boots from `omnigraph.yaml`.
+is initialized by apply at the derived root), **schema updates** (soft drops
+only), and — behind an explicit, digest-bound **approval** — **graph
+deletion**. It does not perform data-loss schema migrations, start servers,
+or serve anything it applies: the server still boots from `omnigraph.yaml`.
 
 ## Commands
 
@@ -21,6 +20,7 @@ boots from `omnigraph.yaml`.
 omnigraph cluster validate --config ./company-brain
 omnigraph cluster plan --config ./company-brain --json
 omnigraph cluster apply --config ./company-brain --json
+omnigraph cluster approve graph.<id> --config ./company-brain --as <actor>
 omnigraph cluster status --config ./company-brain --json
 omnigraph cluster refresh --config ./company-brain --json
 omnigraph cluster import --config ./company-brain --json
@@ -253,7 +253,38 @@ in recovery sidecars and audit entries and threads it to the engine's
 schema-apply (so commit attribution and Cedar enforcement — wherever a policy
 checker is installed — work unchanged).
 
-Schema deletes (removing a graph) are never executed by this stage. They are
+### Approvals and graph deletion
+
+Deleting a graph is the irreversible tier: it requires a recorded human
+decision. `cluster plan` lists the gate under `approvals_required` (one gate
+per graph — the graph-level approval carries its schema and queries);
+`cluster approve graph.<id> --as <actor>` writes a digest-bound artifact to
+
+```text
+<config-dir>/__cluster/approvals/<approval-id>.json
+```
+
+bound to the exact desired config digest and the change's state digest, so
+**any config or state drift after approving invalidates the artifact**
+automatically (`approval_stale` warning; it never authorizes a different
+change). An unapproved delete blocks with `approval_required`.
+
+An approved delete executes **last** in the apply run: the graph root is
+removed recursively, the subtree (graph, schema, its queries) is tombstoned
+out of the state ledger with a tombstone observation, and the approval is
+consumed — recorded in the state's `approval_records` in the same state
+update, and the artifact file rewritten with `consumed_at` (the file is never
+deleted: the audit fact survives the loss of either store). A failed run
+consumes nothing; the approval stays valid for the retry. Catalog blobs of
+the deleted graph's queries stay on disk (GC is a later stage).
+
+Crash recovery for deletes: a completed-but-unrecorded delete is rolled
+forward by the sweep (tombstone + approval consumption + audit entry); an
+incomplete delete (root still present) is retired with a
+`graph_delete_incomplete` warning and simply **re-proposed** — prefix removal
+is idempotent, so the still-approved retry is the repair.
+
+Standalone schema deletes are never executed by this stage. They are
 reported as `deferred` (warning `apply_unsupported_change`), and query/policy
 changes that depend on them are `blocked` (warning `apply_dependency_blocked`, status
 `blocked` in state). A partially-applicable plan still exits 0 with warnings;
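Reviewer sketch (not part of the commit): the digest binding described under "Approvals and graph deletion" can be illustrated with a small gate check. This is a minimal sketch under stated assumptions, not the `omnigraph-cluster` implementation. The `Approval` struct, its field names, the `Gate` enum, and `check_gate` are all hypothetical; only the behaviour (an approval bound to the desired config digest and the change's state digest, `approval_stale` on drift, `approval_required` when nothing is approved) comes from the documentation in this diff. Treating a consumed artifact as non-authorizing is also an assumption.

```rust
/// Hypothetical in-memory view of an approval artifact.
/// Field names are illustrative, not the on-disk JSON schema.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Approval {
    pub target: String,              // e.g. "graph.docs"
    pub config_digest: String,       // desired config digest at approval time
    pub state_digest: String,        // the change's state digest at approval time
    pub consumed_at: Option<String>, // set once an apply run has consumed it
}

/// Outcome of checking a gated delete against the recorded approvals.
/// Variant names mirror the documented `approval_required` / `approval_stale` codes.
#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Approved,
    ApprovalRequired,
    ApprovalStale,
}

/// An approval authorizes a delete only when both digests still match exactly.
/// An unconsumed approval for the same target with different digests is stale:
/// it never authorizes a different change.
pub fn check_gate(
    target: &str,
    config_digest: &str,
    state_digest: &str,
    approvals: &[Approval],
) -> Gate {
    let mut saw_stale = false;
    // Assumption: consumed artifacts are audit records only and are skipped here.
    for a in approvals
        .iter()
        .filter(|a| a.target == target && a.consumed_at.is_none())
    {
        if a.config_digest == config_digest && a.state_digest == state_digest {
            return Gate::Approved;
        }
        saw_stale = true;
    }
    if saw_stale {
        Gate::ApprovalStale
    } else {
        Gate::ApprovalRequired
    }
}
```

In this sketch a stale approval is reported separately from a missing one so the caller can surface the warning; in both cases the delete stays blocked until a fresh `cluster approve` is recorded against the current digests.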