omnigraph/docs/dev/rfc-006-object-storage-native.md
aaltshuler a3a8b33b09 docs(rfc): RFC-006 — object-storage-native omnigraph
The migration design making object storage THE deployment model: a sealed
ClusterStore interface (object_store-backed) replaces every raw-fs call in
the cluster crate; cluster.yaml gains a storage: root (s3://... — state
ledger via conditional-put CAS, cross-machine locking, catalog/sidecars/
approvals as objects, derived graph roots as engine-native S3 URIs); the
server takes --cluster s3://... and cluster deployments become stateless
(bucket, no volume). Config files stay in the working tree — Terraform's
config-local/state-remote split. Local FS is demoted, not deleted: one
interface, file:// as an explicit dev/test backend, S3-first everywhere in
docs, storage: required at the v0.9 boundary.

Grounded: conditional writes (If-None-Match and If-Match) verified live
against RustFS 1.0.0-beta.8 — both probes pass; Lance 6 already commits via
S3 conditional writes; Omnigraph::init/open accept S3 URIs today. Staged
A-D with sizes and the migrate-storage cutover tool.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 04:48:06 +03:00


RFC-006: Object-Storage-Native OmniGraph

Status: Proposed
Depends on: RFC-005 (cluster serving, landed), Phase 4 (landed)
Decides: how the cluster control plane migrates from raw-filesystem I/O to object storage, and what "local filesystem support" means afterwards.

Motivation

The engine is already object-storage-native: Lance datasets live behind the object_store abstraction, S3/RustFS graphs are CI-tested, and the classic single-graph deployment runs stateless against a bucket. The cluster control plane is not: the state ledger, lock, catalog blobs, recovery sidecars, approval artifacts, and — most consequentially — the derived graph roots are all raw fs::* against the config directory. Consequences:

  • A cluster cannot put its data on S3 at all; cloud deployments need a persistent volume, abandoning the stateless-bucket shape the classic mode already has (cookbooks PR #12 validated it on Railway).
  • The control plane carries a second storage layer with different semantics (rename-CAS, read_dir, remove_dir_all) instead of the one the substrate already provides.

The directive this RFC implements: OmniGraph is object-storage only. One storage interface; every stored byte addressed by URI; S3-compatible object storage is the deployment model. The local filesystem does not disappear — it is demoted from "a separate code path" to "one backend of the same interface" (file://), retained for development and tests, and removed from the production story.

Verified foundations

  • S3 conditional writes work on the target backend. Tested against RustFS 1.0.0-beta.8 (2026-06-11): PUT with If-None-Match: * → first write 200, second 412; PUT with If-Match: <etag> → fresh etag 200, stale 412. AWS S3 shipped both in 2024 (If-None-Match in August, If-Match in November); the object_store crate exposes them as PutMode::Create / PutMode::Update(version).
  • Lance 6.x commits natively via S3 conditional writes — the engine's own multi-version safety on S3 needs no external lock table.
  • Omnigraph::init/open accept S3 URIs today — derived graph roots on S3 require no engine work.
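The two probes above can be modeled with a small in-memory bucket. This is a sketch of the expected semantics only, not the probe code that was run against RustFS: `ToyBucket`, `Precondition`, and the counter-based etags are hypothetical names invented for illustration, and no real object_store calls are made.

```rust
use std::collections::HashMap;

/// Precondition attached to a PUT, mirroring the two HTTP headers probed above.
#[derive(Debug, Clone, PartialEq)]
pub enum Precondition {
    /// `If-None-Match: *`: succeed only if the key does not exist yet.
    IfNoneMatch,
    /// `If-Match: <etag>`: succeed only if the stored etag equals this one.
    IfMatch(String),
}

/// Toy in-memory bucket (hypothetical, illustration only): key -> (etag, body).
#[derive(Default)]
pub struct ToyBucket {
    objects: HashMap<String, (String, Vec<u8>)>,
    next_version: u64,
}

impl ToyBucket {
    /// Returns Ok(new_etag) for a 200, or Err(412) when the precondition fails.
    pub fn put(&mut self, key: &str, body: &[u8], pre: Precondition) -> Result<String, u16> {
        // Clone the current etag so no borrow of the map is held while writing.
        let current = self.objects.get(key).map(|(etag, _)| etag.clone());
        let allowed = match (&pre, &current) {
            (Precondition::IfNoneMatch, None) => true,
            (Precondition::IfMatch(want), Some(have)) => want == have,
            _ => false,
        };
        if !allowed {
            return Err(412);
        }
        // Every successful write gets a fresh etag (a counter stands in for a real etag).
        self.next_version += 1;
        let etag = format!("v{}", self.next_version);
        self.objects
            .insert(key.to_string(), (etag.clone(), body.to_vec()));
        Ok(etag)
    }
}
```

In this model a second create-only put and a put carrying a stale etag both fail with 412, which is the behavior the RFC relies on for the lock and the ledger respectively.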

Design

D1. One storage interface (the rule)

A new sealed ClusterStore abstraction (thin over the object_store crate, reusing the engine's storage_for_uri URI plumbing) carries every cluster read/write: state ledger, lock, catalog payloads, recovery sidecars, approval artifacts. Raw fs::* in omnigraph-cluster for stored state becomes a deny-list entry in invariants.md. Backends: s3:// (and any S3-compatible endpoint — RustFS, MinIO, R2) and file:// (dev/test).

The one deliberate exception: declared configuration (cluster.yaml and the .pg/.gq/policy files it references) stays in the operator's working tree, read-only, exactly like Terraform reads .tf files locally while the state backend is remote. Config is versioned in git; state and data live in the store.
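One way to picture a sealed interface with interchangeable backends is the standard Rust sealed-trait pattern. The sketch below is an assumption about shape only: the method names (`get`, `put`, `list`, `delete`), the synchronous signatures, and `MemoryStore` are hypothetical, and the real ClusterStore is described as a thin layer over object_store, which this stdlib-only example does not use.

```rust
use std::collections::BTreeMap;

mod sealed {
    /// Supertrait living in a private module: code outside this crate cannot
    /// name it, so it cannot add its own `ClusterStore` implementations.
    pub trait Sealed {}
}

/// Hypothetical shape of the interface; method names are assumptions.
pub trait ClusterStore: sealed::Sealed {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&mut self, key: &str, body: Vec<u8>);
    fn list(&self, prefix: &str) -> Vec<String>;
    fn delete(&mut self, key: &str) -> bool;
}

/// In-memory backend standing in for the s3:// and file:// backends.
#[derive(Default)]
pub struct MemoryStore {
    objects: BTreeMap<String, Vec<u8>>,
}

impl sealed::Sealed for MemoryStore {}

impl ClusterStore for MemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.get(key).cloned()
    }
    fn put(&mut self, key: &str, body: Vec<u8>) {
        self.objects.insert(key.to_string(), body);
    }
    fn list(&self, prefix: &str) -> Vec<String> {
        self.objects
            .keys()
            .filter(|k| k.starts_with(prefix))
            .cloned()
            .collect()
    }
    fn delete(&mut self, key: &str) -> bool {
        self.objects.remove(key).is_some()
    }
}

/// Callers depend on the trait, never on a concrete backend.
pub fn count_sidecars(store: &dyn ClusterStore) -> usize {
    store.list("recoveries/").len()
}
```

Because callers such as `count_sidecars` only see the trait, swapping the backend does not touch them, which is the property the "one code path" rule depends on.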

D2. The storage: root

version: 1
storage: s3://omnigraph-local/clusters/intel   # the cluster's home
graphs:
  spike:
    schema: schema.pg          # config: read from the working tree
    queries: queries/

Everything currently under <config-dir>/__cluster/ and <config-dir>/graphs/ moves under the storage root:

s3://omnigraph-local/clusters/intel/
├── state.json                  # the ledger (CAS via conditional put)
├── lock.json                   # create-only put; force-unlock = delete
├── resources/…                 # content-addressed catalog (immutable puts)
├── recoveries/…                # sidecars
├── approvals/…                 # approval artifacts
└── graphs/<id>                 # derived Lance roots (engine-native S3)

During the migration window storage: defaults to file://<config-dir> (today's layout, byte-compatible). After the deprecation boundary (D8) the key is required — naming your storage is the point; file:// remains legal but explicit.
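The defaulting rule can be sketched as a small resolver. Everything named here is hypothetical (`Phase`, `Resolved`, `resolve_storage_root`, `object_uri`, and the wording of the messages); the sketch only encodes the behavior stated in this RFC: an absent storage: key falls back to file://<config-dir> during the migration window, warns after the docs flip, and is rejected at the v0.9 boundary.

```rust
use std::path::Path;

/// Rollout phases from D8; the enum itself is a hypothetical illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Phase {
    Migration, // storage: may be absent and silently defaults
    DocsFlip,  // absent storage: still defaults, but warns
    V09,       // storage: is required
}

#[derive(Debug, PartialEq)]
pub struct Resolved {
    pub root: String,
    pub warning: Option<String>,
}

pub fn resolve_storage_root(
    declared: Option<&str>,
    config_dir: &Path,
    phase: Phase,
) -> Result<Resolved, String> {
    // An explicit root is always honored, in every phase.
    if let Some(uri) = declared {
        if uri.starts_with("s3://") || uri.starts_with("file://") {
            let root = uri.trim_end_matches('/').to_string();
            return Ok(Resolved { root, warning: None });
        }
        return Err(format!("unsupported storage URI: {}", uri));
    }
    // No explicit root: behavior depends on the rollout phase.
    let default_root = format!("file://{}", config_dir.display());
    match phase {
        Phase::Migration => Ok(Resolved { root: default_root, warning: None }),
        Phase::DocsFlip => Ok(Resolved {
            root: default_root,
            warning: Some("storage: is absent; the file:// default is deprecated (RFC-006)".to_string()),
        }),
        Phase::V09 => Err("storage: is required".to_string()),
    }
}

/// Joins a storage root and a relative object key into one URI.
pub fn object_uri(root: &str, key: &str) -> String {
    format!("{}/{}", root.trim_end_matches('/'), key.trim_start_matches('/'))
}
```

With the example root from the YAML above, `object_uri(root, "state.json")` yields `s3://omnigraph-local/clusters/intel/state.json`, matching the layout tree.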

Credentials are never in cluster.yaml: the standard AWS_* env contract (already documented for the engine) applies to the control plane identically.

D3. Ledger CAS and locking on object storage

  • write_state (today: temp file + rename, guarded by recorded state_cas) becomes PutMode::Update(etag) — the etag read with the state replaces the sha256 sidecar field as the CAS token (the sha256 stays as content identity in audit/output). A 412 maps to the existing state_cas_conflict path.
  • acquire_lock becomes PutMode::Create on lock.json; 412 → held (read the holder for the message); force-unlock <id> = read, verify id, delete. Same semantics as today, now correct across machines — which the file backend never was.
  • Latency: one GET + one conditional PUT per command on the happy path — noise next to graph opens. The recovery sweep adds a LIST of recoveries/.
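The lock and ledger flows in the bullets above can be sketched against a toy conditional store. This is a minimal model under stated assumptions: `ToyStore`, `StoreError`, `ClusterError`, and the function signatures are hypothetical; the lock body is reduced to a bare holder id rather than JSON; and treating a missing lock as a successful force-unlock is an assumption, not something the RFC specifies.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum StoreError {
    /// Stands in for HTTP 412: the create-only or if-match precondition failed.
    PreconditionFailed,
}

/// Toy conditional store (hypothetical): key -> (etag, body).
#[derive(Default)]
pub struct ToyStore {
    objects: HashMap<String, (u64, String)>,
    counter: u64,
}

impl ToyStore {
    pub fn get(&self, key: &str) -> Option<(u64, String)> {
        self.objects.get(key).cloned()
    }
    /// Create-only put: fails if the key already exists.
    pub fn put_create(&mut self, key: &str, body: &str) -> Result<u64, StoreError> {
        if self.objects.contains_key(key) {
            return Err(StoreError::PreconditionFailed);
        }
        Ok(self.write(key, body))
    }
    /// Conditional update: fails unless the stored etag equals `etag`.
    pub fn put_if_match(&mut self, key: &str, etag: u64, body: &str) -> Result<u64, StoreError> {
        let matches = self.objects.get(key).map(|(cur, _)| *cur == etag).unwrap_or(false);
        if !matches {
            return Err(StoreError::PreconditionFailed);
        }
        Ok(self.write(key, body))
    }
    pub fn delete(&mut self, key: &str) {
        self.objects.remove(key);
    }
    fn write(&mut self, key: &str, body: &str) -> u64 {
        self.counter += 1;
        self.objects.insert(key.to_string(), (self.counter, body.to_string()));
        self.counter
    }
}

#[derive(Debug, PartialEq)]
pub enum ClusterError {
    LockHeld { holder: String },
    LockIdMismatch,
    StateCasConflict,
}

/// Create-only put on lock.json; on failure, read the holder for the message.
pub fn acquire_lock(store: &mut ToyStore, holder_id: &str) -> Result<(), ClusterError> {
    match store.put_create("lock.json", holder_id) {
        Ok(_) => Ok(()),
        Err(StoreError::PreconditionFailed) => {
            let holder = store.get("lock.json").map(|(_, body)| body).unwrap_or_default();
            Err(ClusterError::LockHeld { holder })
        }
    }
}

/// Read, verify the id, delete. A missing lock is treated as success (assumption).
pub fn force_unlock(store: &mut ToyStore, id: &str) -> Result<(), ClusterError> {
    match store.get("lock.json") {
        Some((_, body)) if body.as_str() == id => {
            store.delete("lock.json");
            Ok(())
        }
        Some(_) => Err(ClusterError::LockIdMismatch),
        None => Ok(()),
    }
}

/// The etag read alongside the state is the CAS token; a failed precondition
/// maps to the state_cas_conflict path.
pub fn write_state(store: &mut ToyStore, read_etag: u64, new_state: &str) -> Result<u64, ClusterError> {
    store
        .put_if_match("state.json", read_etag, new_state)
        .map_err(|_| ClusterError::StateCasConflict)
}
```

Two writers that both read the same etag illustrate the guarantee: the first `write_state` succeeds and changes the etag, so the second fails with `StateCasConflict` instead of silently overwriting.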

D4. Catalog, sidecars, approvals

Mechanical ports: content-addressed payloads are immutable puts (idempotent by construction: a re-put of the same digest is a no-op); sidecars and approvals are small JSON objects with LIST + GET + DELETE lifecycles. Approval files need no changes beyond the port; their digest-binding semantics are storage-agnostic.
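The idempotence claim for content-addressed payloads can be illustrated with a toy catalog. The sketch makes several assumptions that are not from the RFC: the `Catalog` type, the `resources/<digest>` key shape, and the digest function are all hypothetical, and `DefaultHasher` is used only to stay dependency-free (it is not a cryptographic hash; a real catalog would use one).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in digest for illustration only. NOT cryptographic.
pub fn toy_digest(payload: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    payload.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Toy catalog (hypothetical): object key -> payload bytes.
#[derive(Default)]
pub struct Catalog {
    objects: BTreeMap<String, Vec<u8>>,
    /// Counts physical writes, to make the no-op re-put observable.
    pub writes: usize,
}

impl Catalog {
    /// Stores the payload under a key derived from its digest and returns the key.
    /// A second put of identical bytes finds the key present and writes nothing.
    pub fn put_resource(&mut self, payload: &[u8]) -> String {
        let key = format!("resources/{}", toy_digest(payload));
        if !self.objects.contains_key(&key) {
            self.objects.insert(key.clone(), payload.to_vec());
            self.writes += 1;
        }
        key
    }

    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.objects.get(key).map(|v| v.as_slice())
    }
}
```

Since the key is a function of the bytes, retrying a put after a crash cannot corrupt or duplicate anything, which is why the port needs no extra coordination.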

D5. Derived graph roots become URIs

<storage>/graphs/<id> replaces <config-dir>/graphs/<id>.omni. Executors: create = Omnigraph::init(uri) (works today); schema apply = open(uri) (works today); approved delete = object-store prefix delete replacing remove_dir_all. The recovery sweep's root.exists() becomes a prefix LIST (non-empty = exists). Tombstones and the digest classification logic are unchanged — they never depended on the filesystem.
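Replacing root.exists() and remove_dir_all with prefix operations has one subtlety worth showing: a bare string prefix such as graphs/a also matches graphs/ab/. The sketch below models a bucket listing as a set of keys and normalizes the root to end in a slash; the function names and the sample keys are hypothetical, and a real implementation would issue LIST and DELETE calls against the store.

```rust
use std::collections::BTreeSet;

/// Normalizes a graph root to a directory-style prefix ending in '/',
/// so that `graphs/a` does not also match `graphs/ab/...`.
fn dir_prefix(root: &str) -> String {
    format!("{}/", root.trim_end_matches('/'))
}

/// Prefix LIST check: the root exists if at least one key lives under it.
pub fn root_exists(keys: &BTreeSet<String>, root: &str) -> bool {
    let prefix = dir_prefix(root);
    keys.iter().any(|k| k.starts_with(prefix.as_str()))
}

/// Prefix delete: removes every key under the root, returns how many were removed.
pub fn delete_root(keys: &mut BTreeSet<String>, root: &str) -> usize {
    let prefix = dir_prefix(root);
    let before = keys.len();
    keys.retain(|k| !k.starts_with(prefix.as_str()));
    before - keys.len()
}
```

Deleting graphs/a in this model leaves graphs/ab untouched, which is the behavior an approved delete of a single graph needs.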

D6. Serving from a bucket

omnigraph-server --cluster <uri> accepts the storage root URI directly (--cluster s3://omnigraph-local/clusters/intel). read_serving_snapshot reads ledger + catalog through ClusterStore; graphs open by their S3 URIs. A cluster deployment becomes stateless again: no volume, restart-to-adopt unchanged, replicas trivially safe (boot is read-only). Railway = service + Bucket; ECS = task + S3; the PR-#12 topology and the cluster topology converge.

D7. Migration tooling

omnigraph cluster migrate-storage <dest-uri> --config <dir>: object-copy graphs + catalog + approvals to the destination (Lance layouts are path-relative; immutable files copy safely), write the ledger last via create-only put, then print the storage: line to commit into cluster.yaml. Idempotent and resumable (copy is keyed by listing diff; the ledger write is the atomic cutover). The reverse direction works identically (S3 → file:// for local debugging). Fallback path: cluster import against live S3 graphs already reconstructs a lost ledger.
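The copy-then-cutover ordering can be sketched with two in-memory key spaces. This is a model of the idea, not the tool: `migrate_storage`, `MigrateError`, and the `Objects` alias are hypothetical, and the copied prefixes are taken from the layout in D2 (graphs, catalog under resources/, approvals).

```rust
use std::collections::BTreeMap;

pub type Objects = BTreeMap<String, Vec<u8>>;

#[derive(Debug, PartialEq)]
pub enum MigrateError {
    SourceLedgerMissing,
    /// The create-only ledger write failed: the destination was already cut over.
    LedgerAlreadyPresent,
}

const LEDGER: &str = "state.json";
const COPIED_PREFIXES: [&str; 3] = ["graphs/", "resources/", "approvals/"];

/// Copies objects missing from `dest` (the listing diff), then writes the
/// ledger last with create-only semantics. Returns the number of objects copied.
pub fn migrate_storage(src: &Objects, dest: &mut Objects) -> Result<usize, MigrateError> {
    let ledger = src.get(LEDGER).ok_or(MigrateError::SourceLedgerMissing)?;

    // Listing diff: keys under the copied prefixes that the destination lacks.
    let missing: Vec<&String> = src
        .keys()
        .filter(|k| COPIED_PREFIXES.iter().any(|p| k.starts_with(p)))
        .filter(|k| !dest.contains_key(*k))
        .collect();
    let copied = missing.len();
    for key in missing {
        dest.insert(key.clone(), src[key].clone());
    }

    // Atomic cutover: the ledger appears only after everything else is in place.
    if dest.contains_key(LEDGER) {
        return Err(MigrateError::LedgerAlreadyPresent);
    }
    dest.insert(LEDGER.to_string(), ledger.clone());
    Ok(copied)
}
```

An interrupted run leaves a destination with some objects but no ledger, so a rerun copies only the remainder; once the ledger exists, a further run is refused rather than overwriting live state.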

D8. Deprecation of local-FS as an operating mode

Per axiom 15's bridge rule (every bridge names its replacement and sunset):

| Phase | Local FS status |
| --- | --- |
| Now → Stage C lands | implicit default (storage: absent ⇒ file://<config-dir>) |
| Stage D (docs flip) | S3-first everywhere: docs, cookbooks, skills, deployment recipes; file:// documented only under development/testing; absent storage: emits a deprecation warning naming this RFC |
| v0.9 boundary | storage: required; file:// stays a legal explicit backend for dev/test: it is the same code path, costs no second implementation, and keeps cargo test hermetic (no daemon dependency in unit tests) |

Recommendation embedded here (the one place this RFC pushes back): "object storage only" is enforced at the interface level (one code path, every location a URI), not by deleting the file:// backend. Hard removal would force a RustFS daemon into every unit test and air-gapped dev loop while deleting zero code (the backend ships inside object_store either way). Terraform keeps its local state backend for the same reason, even though remote backends such as S3 are what production deployments use. If a harder line is wanted later, it is a docs-and-validation flip, not an architecture change.

Staging

| Stage | Delivers | Size |
| --- | --- | --- |
| A | ClusterStore + ledger/lock/sidecars/approvals/catalog ported; file:// behavior byte-compatible; S3 backend live behind storage:; conditional-put CAS + cross-machine lock; RustFS-gated integration tests | the big one: touches every backend call in omnigraph-cluster |
| B | URI graph roots: executor init/apply/delete + sweep on URIs; prefix-delete; e2e: full lifecycle against RustFS | medium |
| C | --cluster <uri> serving + bucket-backed snapshot reads; system e2e: apply to RustFS, serve from it; Railway Bucket deploy validated (closes the loop with cookbooks PR #12's topology) | medium |
| D | migrate-storage, docs flip (S3-first), cookbooks/skills update, deprecation warning, deny-list entry | small code, wide docs |

Each stage is a PR with the usual gates; A and B are separable but land best back-to-back (B is where the user-visible payoff starts).

Open questions

  1. RustFS GA & conditional-write contract stability — beta.8 passes both probes; pin the probes as a lance_surface_guards-style integration test so a regression in a RustFS bump turns red here, not in production.
  2. Multi-writer ergonomics — conditional puts make concurrent applies safe (one wins, one gets state_cas_conflict); whether we want lease semantics (lock TTL + auto-break) is a later UX question, not a correctness one.
  3. Catalog GC on object storage — deletes leave blobs today on FS too; the existing gap carries over unchanged, tracked separately.
  4. --config ergonomics — once state is remote, two operators sharing a bucket need only the config repo; document the "config in git, state in S3" workflow as the primary pattern (it is the Terraform workflow).

What this explicitly does not change

Engine storage (already object-store native), the .pg/.gq languages, the plan/apply/approve model, recovery semantics (sidecar classification is storage-agnostic), the serving API surface, and omnigraph.yaml's per-operator role.