Commit graph

19 commits

Author SHA1 Message Date
aaltshuler
8dc2f15255 feat(cluster): the storage: root — state, catalog, and graph roots relocatable
cluster.yaml gains an optional storage: URI deciding where everything the
cluster STORES lives: the state ledger, lock, content-addressed catalog,
recovery sidecars, approval artifacts, and the derived graph roots
(<storage>/graphs/<id>.omni). Absent, it defaults to the config directory
itself — the original layout, byte-compatible, so pre-existing clusters and
the whole test suite are untouched. Declared configuration always stays in
the working tree (Terraform's config-local/state-remote split); credentials
are env-only, never in cluster.yaml.

Every command resolves its store from the declared root (a bad root is a
loud invalid_storage_root). Graph-root derivation, the delete executor
(prefix delete via the adapter), the sweep's existence probes, the catalog
payload write/verify/read paths, and the serving snapshot all flow through
ClusterStore — the last raw-fs holdouts for stored state are gone, and the
deny-list gains the rule that keeps it that way.

Tests: default-layout byte-compat, a file:// root relocating the entire
cluster (ledger+catalog+graphs under the new root, nothing under the config
dir, serving snapshot follows), invalid-root validation. 98 in-crate + 9
failpoints + full workspace gate green. The s3:// flavor lands with PR 3's
gated RustFS e2e.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 14:28:04 +03:00
aaltshuler
44b5866516 docs: drop ./ path prefixes; document query discovery
Paths in cluster.yaml and command examples are relative to one explicit
config folder (Terraform-shaped) — the ./ prefixes were noise and are gone
across the user docs (109 instances; ../ links and ./scripts executables
untouched). The cluster docs now present directory discovery as the primary
queries form with the list and map forms documented alongside.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 01:33:30 +03:00
aaltshuler
99f7f36864 docs(cluster): the precise omnigraph.yaml contract
The 'Relationship to omnigraph.yaml' section becomes the exact rule set:
cluster commands read the per-operator config for exactly one thing (the
cli.actor default when --as is omitted), a --cluster server reads it for
nothing, and pointing data-plane targets at derived roots is ergonomics,
not coupling. Operator guide and CLI reference updated to match.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:30:40 +03:00
aaltshuler
97eb65e921 docs(cluster): operator how-to guide for deploying and managing clusters
New docs/user/cluster.md — the practical companion to cluster-config.md's
reference: zero-to-served walkthrough (validate/import/plan/apply, derived
roots, data loading, --cluster serving), the day-2 edit->plan->apply->restart
loop with a per-change-kind table, drift observation and convergence, the
approval gate for destructive changes, crash/lock/lost-ledger recovery, the
boot-refusal table with remedies, deployment patterns (replicas, backup
unit, CI gating), and the explicit not-yet list (hot reload, S3-hosted
cluster dirs, per-query exposure, pipelines). Linked from the user index,
the agent guide's topic map, and cross-linked from the reference.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:10:19 +03:00
aaltshuler
711865e6f1 docs(cluster,server): the Phase 5 mode switch; retire applied-not-serving caveats
The standing caveat ('applied means recorded in the cluster catalog —
nothing more; the server still boots from omnigraph.yaml') retires: cluster
docs gain the 'Serving from the cluster' section (exclusivity, applied-
revision serving, fail-fast readiness, restart-to-pick-up, expose-all
bridge), server.md gains mode-inference rule 0 and the cluster-booted multi
mode, deployment.md the boot-source choice, and the CLI's apply note plus
the cli-reference cluster row (stale back to Stage 3A) now describe the full
convergence surface. RFC-005 flips to Landed with four implementation
deviations recorded.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 17:56:54 +03:00
aaltshuler
6c98560dde docs(cluster): document policy binding metadata (5A)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 15:30:57 +03:00
aaltshuler
c949a2b717 docs(cluster): document Stage 4C — Phase 4 complete
Approvals + gated graph deletion in the user docs, the approve command in the
CLI reference, RFC-004 flipped to Landed with its three implementation
deviations recorded (row-8 retire-and-repropose, --as instead of --actor/--by,
consumed artifacts rewritten in place rather than moved).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:44:12 +03:00
aaltshuler
f217352c93 docs(cluster): document Stage 4B schema apply
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:14:20 +03:00
aaltshuler
cb6c67f196 docs(cluster): document Stage 4A graph create
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 05:00:42 +03:00
aaltshuler
15868972ff feat(cluster): verify catalog payload blobs in status and refresh
Closes the Stage 3A product gap where a deleted or corrupted blob under
__cluster/resources/ went unnoticed forever (status reported converged and
apply could not repair it because the digests matched).

verify_catalog_payloads checks every query/policy digest in state against its
content-addressed blob (existence + full sha256 re-hash; graph/schema/unknown
addresses have no payloads and are skipped). status reports findings read-only
(warnings catalog_payload_missing/_mismatch; error catalog_payload_read_error
— an unverifiable catalog must not report healthy). refresh closes the
self-heal loop: missing/mismatched blobs mark the resource drifted and remove
its digest from state so the next plan proposes a create and the next apply
republishes; unreadable blobs keep the digest (no spurious republish), mark
error, and exit non-zero. Verification runs before graph observation so the
recomputed graph composite already excludes removed query digests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 02:07:08 +03:00
aaltshuler
7f3ecf282a Merge origin/main (#164 axiom-15 docs, #86 TableStorage migration) into feat/cluster-apply-stage3a
Clean auto-merge; also fix the stale 'Stage 2C accepts' line in
cluster-config.md to Stage 3A.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 00:45:44 +03:00
aaltshuler
69b63c33ac Merge remote-tracking branch 'origin/main' into feat/cluster-apply-stage3a 2026-06-10 00:45:03 +03:00
Andrew Altshuler
cec65b8ef8
docs(cluster): axiom 15 — single ownership, mode-switch migration, per-operator layer (#164)
Encode the omnigraph.yaml ↔ cluster.yaml coexistence rules that were implicit
across the specs:

- cluster-axioms.md: new axiom 15 — every fact has exactly one owner at a time;
  coexistence is a mode switch, never a merge; omnigraph.yaml's job description
  shrinks to the permanent per-operator layer. Added review-tension bullet.
- cluster-config-specs.md: "Migration model" subsection (three coexistence
  windows: no-conflict, Phase-5 mode switch, bridges-with-sunsets) and a
  "per-operator layer" completeness table (connection, credential reference,
  active context, ergonomics, personal aliases) with its global-config-dir
  destination per the RFC-002 direction.
- cluster-config-implementation-spec.md: Compatibility Stance #7–#9 (single
  ownership, shrinking role, bridges carry sunsets); Phase 5 boot is an
  exclusive XOR mode switch; fixed the duplicated recoveries/recovery dirs in
  the Phase-1 storage layout.
- docs/user/cluster-config.md: "Relationship to omnigraph.yaml" section in
  current-reality terms (cluster catalog is inspectable, not live).

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 00:44:51 +03:00
aaltshuler
40a21e4e77 docs(cluster): document Stage 3A config-only cluster apply
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 23:36:33 +03:00
aaltshuler
89b876c797 Add cluster state lock recovery 2026-06-09 22:31:46 +03:00
aaltshuler
d00d42274e Implement cluster refresh and import 2026-06-09 21:17:23 +03:00
aaltshuler
2f19656c0e fix(cluster): tighten state lock observations 2026-06-09 18:30:33 +03:00
aaltshuler
a7956ea5a9 Add cluster JSON state ledger status 2026-06-08 21:09:23 +03:00
aaltshuler
043b02e617 feat(cluster): add read-only validate and plan 2026-06-08 20:07:39 +03:00