From a3a8b33b09c47ebd2bce50e7cbc22fd8a3f66e7c Mon Sep 17 00:00:00 2001
From: aaltshuler
Date: Thu, 11 Jun 2026 04:48:06 +0300
Subject: [PATCH] =?UTF-8?q?docs(rfc):=20RFC-006=20=E2=80=94=20object-stora?=
 =?UTF-8?q?ge-native=20omnigraph?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The migration design making object storage THE deployment model: a
sealed ClusterStore interface (object_store-backed) replaces every
raw-fs call in the cluster crate; cluster.yaml gains a storage: root
(s3://... — state ledger via conditional-put CAS, cross-machine
locking, catalog/sidecars/
approvals as objects, derived graph roots as engine-native S3 URIs);
the server takes --cluster s3://... and cluster deployments become
stateless (bucket, no volume). Config files stay in the working tree —
Terraform's config-local/state-remote split.

Local FS is demoted, not deleted: one interface, file:// as an explicit
dev/test backend, S3-first everywhere in docs, storage: required at the
v0.9 boundary.

Grounded: conditional writes (If-None-Match and If-Match) verified live
against RustFS 1.0.0-beta.8 — both probes pass; Lance 6 already commits
via S3 conditional writes; Omnigraph::init/open accept S3 URIs today.
Staged A-D with sizes and the migrate-storage cutover tool.

Co-Authored-By: Claude Fable 5
---
 docs/dev/index.md                         |   1 +
 docs/dev/rfc-006-object-storage-native.md | 193 ++++++++++++++++++++++
 2 files changed, 194 insertions(+)
 create mode 100644 docs/dev/rfc-006-object-storage-native.md

diff --git a/docs/dev/index.md b/docs/dev/index.md
index 4bc1e6a..30a19bb 100644
--- a/docs/dev/index.md
+++ b/docs/dev/index.md
@@ -76,6 +76,7 @@ Working documents for in-flight feature work. Removed when the work lands.
 | Future cluster control plane — declarative as-code config, JSON state ledger, reconciler | [cluster-config-specs.md](cluster-config-specs.md), [cluster-axioms.md](cluster-axioms.md), [cluster-config-implementation-spec.md](cluster-config-implementation-spec.md) |
 | Cluster graph & schema apply — Phase 4 sidecars, roll-forward recovery, approval artifacts | [rfc-004-cluster-graph-schema-apply.md](rfc-004-cluster-graph-schema-apply.md) |
 | Server boots from cluster state — Phase 5 mode switch, applied-revision serving | [rfc-005-server-cluster-boot.md](rfc-005-server-cluster-boot.md) |
+| Object-storage-native migration — S3-first cluster, local-FS deprecation | [rfc-006-object-storage-native.md](rfc-006-object-storage-native.md) |
 
 ## Boundary
 
diff --git a/docs/dev/rfc-006-object-storage-native.md b/docs/dev/rfc-006-object-storage-native.md
new file mode 100644
index 0000000..4fde255
--- /dev/null
+++ b/docs/dev/rfc-006-object-storage-native.md
@@ -0,0 +1,193 @@
+# RFC-006: Object-Storage-Native OmniGraph
+
+**Status:** Proposed
+**Depends on:** RFC-005 (cluster serving, landed), Phase 4 (landed)
+**Decides:** how the cluster control plane migrates from raw-filesystem I/O to
+object storage, and what "local filesystem support" means afterwards.
+
+## Motivation
+
+The engine is already object-storage-native: Lance datasets live behind the
+`object_store` abstraction, S3/RustFS graphs are CI-tested, and the classic
+single-graph deployment runs stateless against a bucket. The **cluster control
+plane is not**: the state ledger, lock, catalog blobs, recovery sidecars,
+approval artifacts, and — most consequentially — the derived graph roots are
+all raw `fs::*` against the config directory. Consequences:
+
+- A cluster cannot put its data on S3 at all; cloud deployments need a
+  persistent volume, abandoning the stateless-bucket shape the classic mode
+  already has (cookbooks PR #12 validated it on Railway).
+- The control plane carries a *second* storage layer with different semantics
+  (rename-CAS, `read_dir`, `remove_dir_all`) instead of the one the substrate
+  already provides.
+
+The directive this RFC implements: **OmniGraph is object-storage only.** One
+storage interface; every stored byte addressed by URI; S3-compatible object
+storage is the deployment model. The local filesystem does not disappear — it
+is demoted from "a separate code path" to "one backend of the same interface"
+(`file://`), retained for development and tests, and removed from the
+production story.
+
+## Verified foundations
+
+- **S3 conditional writes work on the target backend.** Tested against RustFS
+  1.0.0-beta.8 (2026-06-11): `PUT` with `If-None-Match: *` → first write 200,
+  second 412; `PUT` with `If-Match: ` → fresh etag 200, stale 412. AWS
+  S3 has shipped both since 2024/2025; the `object_store` crate exposes them
+  as `PutMode::Create` / `PutMode::Update(version)`.
+- **Lance 6.x commits natively via S3 conditional writes** — the engine's own
+  multi-version safety on S3 needs no external lock table.
+- **`Omnigraph::init/open` accept S3 URIs today** — derived graph roots on S3
+  require no engine work.
+
+## Design
+
+### D1. One storage interface (the rule)
+
+A new sealed `ClusterStore` abstraction (thin over the `object_store` crate,
+reusing the engine's `storage_for_uri` URI plumbing) carries **every** cluster
+read/write: state ledger, lock, catalog payloads, recovery sidecars, approval
+artifacts. Raw `fs::*` in `omnigraph-cluster` for stored state becomes a
+**deny-list entry** in [invariants.md](invariants.md). Backends: `s3://` (and
+any S3-compatible endpoint — RustFS, MinIO, R2) and `file://` (dev/test).
+
+The one deliberate exception: **declared configuration** — `cluster.yaml` and
+the `.pg`/`.gq`/policy files it references — stays in the operator's working
+tree, read-only, exactly like Terraform reads `.tf` files locally while the
+*state backend* is remote. Config is versioned in git; state and data live in
+the store.
+
+### D2. The `storage:` root
+
+```yaml
+version: 1
+storage: s3://omnigraph-local/clusters/intel  # the cluster's home
+graphs:
+  spike:
+    schema: schema.pg  # config: read from the working tree
+    queries: queries/
+```
+
+Everything currently under `/__cluster/` and `/graphs/`
+moves under the storage root:
+
+```
+s3://omnigraph-local/clusters/intel/
+├── state.json  # the ledger (CAS via conditional put)
+├── lock.json  # create-only put; force-unlock = delete
+├── resources/…  # content-addressed catalog (immutable puts)
+├── recoveries/…  # sidecars
+├── approvals/…  # approval artifacts
+└── graphs/  # derived Lance roots (engine-native S3)
+```
+
+During the migration window `storage:` defaults to `file://`
+(today's layout, byte-compatible). After the deprecation boundary (D8) the key
+is **required** — naming your storage is the point; `file://` remains legal
+but explicit.
+
+Credentials are never in `cluster.yaml`: the standard `AWS_*` env contract
+(already documented for the engine) applies to the control plane identically.
+
+### D3. Ledger CAS and locking on object storage
+
+- `write_state` (today: temp file + rename, guarded by recorded `state_cas`)
+  becomes `PutMode::Update(etag)` — the etag read with the state replaces the
+  sha256 sidecar field as the CAS token (the sha256 stays as content identity
+  in audit/output). A 412 maps to the existing `state_cas_conflict` path.
+- `acquire_lock` becomes `PutMode::Create` on `lock.json`; 412 → held (read
+  the holder for the message); `force-unlock ` = read, verify id, delete.
+  Same semantics as today, now correct across machines — which the file
+  backend never was.
+- Latency: one GET + one conditional PUT per command on the happy path —
+  noise next to graph opens. The recovery sweep adds a LIST of `recoveries/`.
+
+### D4. Catalog, sidecars, approvals
+
+Mechanical ports: content-addressed payloads are immutable puts (idempotent by
+construction — a re-put of the same digest is a no-op); sidecars and approvals
+are small JSON objects with LIST + GET + DELETE lifecycles. Approval files
+gain nothing; their digest-binding semantics are storage-agnostic.
+
+### D5. Derived graph roots become URIs
+
+`/graphs/` replaces `/graphs/.omni`. Executors:
+create = `Omnigraph::init(uri)` (works today); schema apply = `open(uri)`
+(works today); approved delete = object-store **prefix delete** replacing
+`remove_dir_all`. The recovery sweep's `root.exists()` becomes a prefix LIST
+(non-empty = exists). Tombstones and the digest classification logic are
+unchanged — they never depended on the filesystem.
+
+### D6. Serving from a bucket
+
+`omnigraph-server --cluster ` accepts the storage root URI directly
+(`--cluster s3://omnigraph-local/clusters/intel`). `read_serving_snapshot`
+reads ledger + catalog through `ClusterStore`; graphs open by their S3 URIs.
+**A cluster deployment becomes stateless again**: no volume, restart-to-adopt
+unchanged, replicas trivially safe (boot is read-only). Railway = service +
+Bucket; ECS = task + S3; the PR-#12 topology and the cluster topology
+converge.
+
+### D7. Migration tooling
+
+`omnigraph cluster migrate-storage --config `: object-copy
+graphs + catalog + approvals to the destination (Lance layouts are
+path-relative; immutable files copy safely), write the ledger last via
+create-only put, then print the `storage:` line to commit into `cluster.yaml`.
+Idempotent and resumable (copy is keyed by listing diff; the ledger write is
+the atomic cutover). The reverse direction works identically (S3 → `file://`
+for local debugging). Fallback path: `cluster import` against live S3 graphs
+already reconstructs a lost ledger.
+
+### D8. Deprecation of local-FS as an operating mode
+
+Per axiom 15's bridge rule (every bridge names its replacement and sunset):
+
+| Phase | local FS status |
+|---|---|
+| Now → Stage C lands | implicit default (`storage:` absent ⇒ `file://`) |
+| Stage D (docs flip) | S3-first everywhere: docs, cookbooks, skills, deployment recipes; `file://` documented **only** under development/testing; absent `storage:` emits a deprecation warning naming this RFC |
+| v0.9 boundary | `storage:` **required**; `file://` stays a legal explicit backend for dev/test — it is the same code path, costs no second implementation, and keeps `cargo test` hermetic (no daemon dependency in unit tests) |
+
+**Recommendation embedded here (the one place this RFC pushes back):** "object
+storage only" is enforced at the *interface* level — one code path, every
+location a URI — not by deleting the `file://` backend. Hard removal would
+force a RustFS daemon into every unit test and air-gapped dev loop while
+deleting zero code (the backend ships inside `object_store` either way).
+Terraform's local state backend survives for the same reason its S3 backend is
+still the only one anyone deploys. If a harder line is wanted later, it is a
+docs-and-validation flip, not an architecture change.
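
The D8 phase table amounts to a small resolution rule for the `storage:` key. The following is a minimal sketch of that rule, not the crate's implementation: `Phase`, `Resolved`, `StorageError`, and `resolve_storage` are hypothetical names, and the implicit default is assumed to point at the config directory because that is where today's layout lives.

```rust
/// Hypothetical model of the three D8 phases; names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    /// Now until Stage C lands: an absent `storage:` silently means `file://`.
    ImplicitDefault,
    /// Stage D: an absent `storage:` still resolves, with a deprecation warning.
    DocsFlip,
    /// v0.9 boundary: `storage:` must be declared.
    Required,
}

#[derive(Debug, PartialEq)]
struct Resolved {
    uri: String,
    warning: Option<String>,
}

#[derive(Debug, PartialEq)]
enum StorageError {
    Missing,
    UnsupportedScheme(String),
}

/// Resolve the storage root from the declared key (if any) and the phase.
/// Assumption: the implicit default is `file://` plus the config directory.
fn resolve_storage(
    declared: Option<&str>,
    config_dir: &str,
    phase: Phase,
) -> Result<Resolved, StorageError> {
    match declared {
        // An explicit root is accepted in every phase, `file://` included.
        Some(uri) if uri.starts_with("s3://") || uri.starts_with("file://") => {
            Ok(Resolved { uri: uri.to_string(), warning: None })
        }
        Some(uri) => Err(StorageError::UnsupportedScheme(uri.to_string())),
        None => match phase {
            Phase::Required => Err(StorageError::Missing),
            Phase::ImplicitDefault => Ok(Resolved {
                uri: format!("file://{config_dir}"),
                warning: None,
            }),
            Phase::DocsFlip => Ok(Resolved {
                uri: format!("file://{config_dir}"),
                warning: Some(
                    "`storage:` is absent; the implicit file:// default is deprecated (RFC-006)"
                        .to_string(),
                ),
            }),
        },
    }
}

fn main() {
    for phase in [Phase::ImplicitDefault, Phase::DocsFlip, Phase::Required] {
        println!("{:?}: {:?}", phase, resolve_storage(None, "/srv/intel", phase));
    }
    let explicit = resolve_storage(
        Some("s3://omnigraph-local/clusters/intel"),
        "/srv/intel",
        Phase::Required,
    );
    println!("explicit: {:?}", explicit);
}
```

Only the `None` arm varies by phase, which is the sense in which a harder line later would be a validation flip and not an architecture change.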
+
+## Staging
+
+| Stage | Delivers | Size |
+|---|---|---|
+| **A** | `ClusterStore` + ledger/lock/sidecars/approvals/catalog ported; `file://` behavior byte-compatible; S3 backend live behind `storage:`; conditional-put CAS + cross-machine lock; RustFS-gated integration tests | the big one — touches every backend call in `omnigraph-cluster` |
+| **B** | URI graph roots: executor init/apply/delete + sweep on URIs; prefix-delete; e2e: full lifecycle against RustFS | medium |
+| **C** | `--cluster ` serving + bucket-backed snapshot reads; system e2e: apply to RustFS, serve from it; Railway Bucket deploy validated (closes the loop with cookbooks PR #12's topology) | medium |
+| **D** | `migrate-storage`, docs flip (S3-first), cookbooks/skills update, deprecation warning, deny-list entry | small code, wide docs |
+
+Each stage is a PR with the usual gates; A and B are separable but land best
+back-to-back (B is where the user-visible payoff starts).
+
+## Open questions
+
+1. **RustFS GA & conditional-write contract stability** — beta.8 passes both
+   probes; pin the probes as a `lance_surface_guards`-style integration test
+   so a regression in a RustFS bump turns red here, not in production.
+2. **Multi-writer ergonomics** — conditional puts make concurrent applies
+   *safe* (one wins, one gets `state_cas_conflict`); whether we want lease
+   semantics (lock TTL + auto-break) is a later UX question, not a
+   correctness one.
+3. **Catalog GC on object storage** — deletes leave blobs today on FS too;
+   the existing gap carries over unchanged, tracked separately.
+4. **`--config` ergonomics** — once state is remote, two operators sharing a
+   bucket need only the config repo; document the "config in git, state in
+   S3" workflow as the primary pattern (it is the Terraform workflow).
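
Open questions 1 and 2 both rest on the same conditional-put contract. The sketch below models that contract with an in-memory stand-in so the expected outcome of each probe can be written as an assertion. `MemStore`, `Put`, and `Conflict` are hypothetical names, and the integer etag is a simplification; real probes would go through the `object_store` crate's `PutMode::Create` / `PutMode::Update(version)` against a live RustFS.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the two conditional write modes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Put {
    /// Models `If-None-Match: *`: succeed only if the key does not exist.
    Create,
    /// Models `If-Match`: succeed only if the stored etag equals this one.
    Update(u64),
}

/// Models an HTTP 412 Precondition Failed response.
#[derive(Debug, PartialEq)]
struct Conflict;

/// In-memory object store: key -> (etag, body). Not a real backend.
#[derive(Default)]
struct MemStore {
    objects: HashMap<String, (u64, Vec<u8>)>,
    next_etag: u64,
}

impl MemStore {
    /// Conditional put; returns the new etag, or `Conflict` on a failed precondition.
    fn put(&mut self, key: &str, body: &[u8], mode: Put) -> Result<u64, Conflict> {
        let current = self.objects.get(key).map(|(etag, _)| *etag);
        let allowed = match (mode, current) {
            (Put::Create, None) => true,
            (Put::Update(expected), Some(actual)) => expected == actual,
            _ => false,
        };
        if !allowed {
            return Err(Conflict);
        }
        self.next_etag += 1;
        self.objects
            .insert(key.to_string(), (self.next_etag, body.to_vec()));
        Ok(self.next_etag)
    }

    /// Read the current etag and body for a key, if present.
    fn get(&self, key: &str) -> Option<(u64, &[u8])> {
        self.objects
            .get(key)
            .map(|(etag, body)| (*etag, body.as_slice()))
    }
}

fn main() {
    let mut store = MemStore::default();

    // Probe 1 (lock shape): create-only put succeeds once, then conflicts.
    assert!(store.put("lock.json", b"{}", Put::Create).is_ok());
    assert_eq!(store.put("lock.json", b"{}", Put::Create), Err(Conflict));

    // Probe 2 (ledger shape): two writers read the same etag; one wins.
    let etag = store.put("state.json", b"v1", Put::Create).unwrap();
    let first = store.put("state.json", b"v2-a", Put::Update(etag));
    let second = store.put("state.json", b"v2-b", Put::Update(etag));
    assert!(first.is_ok());
    assert_eq!(second, Err(Conflict)); // the `state_cas_conflict` case

    // The winner's body is what remains in the store.
    let body = store.get("state.json").map(|(_, b)| b.to_vec());
    assert_eq!(body, Some(b"v2-a".to_vec()));
    println!("both probes behave as the contract requires");
}
```

The stale `Update` failing is what makes concurrent applies safe without a lease; a TTL would change only how long a held `lock.json` blocks other writers.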
+
+## What this explicitly does not change
+
+Engine storage (already object-store native), the `.pg`/`.gq` languages, the
+plan/apply/approve model, recovery semantics (sidecar classification is
+storage-agnostic), the serving API surface, and `omnigraph.yaml`'s
+per-operator role.