omnigraph/docs
Andrew Altshuler 58c66a54a2
docs(cluster): RFC-004 — graph & schema apply design (Phase 4) (#168)
* docs(cluster): RFC-004 — graph & schema apply design (Phase 4)

The design the implementation spec's exit criteria require before
graph-moving cluster apply ships. Core positions:

- Cluster recovery is roll-forward-only: the engine's own sidecars make every
  graph-level operation atomic within the graph, so the cluster never rolls a
  graph back — its sidecars (__cluster/recoveries/{ulid}.json) classify and
  record, converging the ledger to observable reality (axiom 5) or surfacing
  a loud pending-repair condition. Eight-row decision matrix, every row
  testable with the Stage 3B failpoint harness.
- Irreversible operations (graph delete, allow_data_loss schema apply)
  consume digest-bound approval artifacts written by a new cluster approve
  command and retired into state.approval_records (axiom 11). A stale
  approval can never authorize a different change.
- cluster apply gains an actor, threaded to apply_schema_as so engine Cedar
  enforcement and commit attribution work unchanged; the cluster adds no
  policy engine of its own.
- Deterministic ordering (creates -> schema applies -> catalog -> deletes),
  per-resource apply groups, cross-graph atomicity explicitly not promised.
- Staged 4A graph create / 4B schema apply / 4C graph delete, each gated on
  per-matrix-row failpoint tests.

Answers exit criteria 2 and 4 fully, 1/5/6 partially; 3/7/8/9 deferred to
their phases (coverage table in the RFC). Linked from the dev index and the
implementation spec's Phase 4 section.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs(cluster): RFC-004 review fixes — graph_delete sweep rows, state_cas_base contract

Two greptile findings: (1) D3 row 2 could not be evaluated for graph_delete
(no manifest to version-check after prefix removal) and 'root absent, state
already tombstoned' fell into the stale row — split into rows 7 (delete's
analog of row 2) and 7b (the roll-forward), with expected_manifest_version
documented as always null for the delete kind. (2) state_cas_base is now
explicitly audit/diagnostics-only — the sweep never consults it; independent
state mutations are handled by the ordinary CAS like any concurrent write.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 04:34:14 +03:00
..
dev docs(cluster): RFC-004 — graph & schema apply design (Phase 4) (#168) 2026-06-10 04:34:14 +03:00
releases docs(releases): attribute the __run__ sweep (MR-770) to v0.6.2, not v0.6.1 (#161) 2026-06-09 22:31:50 +03:00
rfcs governance: external contribution model (issues/discussions/RFCs/PRs) (#143) 2026-06-06 23:58:08 +03:00
user feat(cluster): verify catalog payload blobs in status and refresh 2026-06-10 02:07:08 +03:00