docs(rfc-013): tenancy model — cluster-as-tenant cells, pooled compute

General server/topology/auth/deployment RFC resolving the half-built tenancy ambiguity (cluster-only server vs pooled tenant_id scaffolding).

Decision: the cluster is the tenant is the cell; silo the data (own storage/catalog/policy/tokens), pool the compute (one process : N cells). No row-level pooling (no engine RLS).

- §5.1 CellRuntime lifts today's per-cluster runtime into a value.
- §5.2/§5.3 AppState holds a CellRegistry; resolve_cell is one new outer middleware hop before auth; the per-graph + Cedar + MCP stack is unchanged.
- §5.4 per-cell CellAuth (Static | Oidc TokenVerifier); WorkOS org -> cell 1:1 with per-cell OAuth audience (cross-tenant token replay fails on aud).
- §5.5 Cedar stays per-graph/per-cell; default-deny-read becomes safe; no tenant dimension needed.
- §5.6 control plane = Cell Registry (metadata only) + provisioning-as-code; cell hot-load is the one safe runtime mutation (cell-granular, not graph).
- §5.7 tiered dedicated/pooled/on-prem on one binary; §7 backward-compatible (today's single-cluster server = a one-cell map).

MCP (rfc-003) is one consumer, not the driver. Linked from docs/dev/index.md.
2026-06-18 02:24:27 +02:00 · 2026-06-16 18:44:37 +02:00 · 2026-06-16 18:44:37 +02:00 · 0f58329ab7
commit 0f58329ab7
parent 86fbb62d12
2 changed files with 355 additions and 0 deletions
--- a/docs/dev/index.md
+++ b/docs/dev/index.md
@@ -82,6 +82,7 @@ Working documents for in-flight feature work. Removed when the work lands.
 | Restructure the CLI around explicit planes — one graph-addressing model, declared capability surface, plane-grouped help (expands RFC-009 Phase 4) | [rfc-010-cli-planes-restructure.md](rfc-010-cli-planes-restructure.md) |
 | CLI refactoring — one addressing & config model post-`omnigraph.yaml`: scope + `--graph` + derived access path, served-default / privileged-direct, profiles, named queries, capability classifier (completes RFC-008) | [rfc-011-cli-refactoring.md](rfc-011-cli-refactoring.md) |
 | Provider-independent embedding configuration — one resolved `EmbeddingConfig` + sealed provider enum (Gemini/OpenAI/Mock), identity recorded in the schema IR, query-time same-space validation, NFR floor | [rfc-012-embedding-provider-config.md](rfc-012-embedding-provider-config.md) |
+| Tenancy model — cluster-as-tenant cells (silo the data, pool the compute): `CellRuntime` lifts the per-cluster runtime, one server hosts N cells resolved by host before auth, WorkOS org→cell 1:1 with per-cell audience, tiered dedicated/pooled/on-prem on one binary | [rfc-013-tenancy-cells.md](rfc-013-tenancy-cells.md) |

 ## Boundary

--- a/docs/dev/rfc-013-tenancy-cells.md
+++ b/docs/dev/rfc-013-tenancy-cells.md
@@ -0,0 +1,354 @@
+# RFC-013: Tenancy model — cluster-as-tenant cells, pooled compute
+
+**Status:** Proposed — general architecture (server topology, identity, deployment).
+**Date:** 2026-06-16
+**Audience:** server / cluster / platform maintainers.
+**Builds on:** [rfc-005-server-cluster-boot.md](rfc-005-server-cluster-boot.md)
+(cluster-only boot, applied-revision serving), [rfc-011-cli-refactoring.md](rfc-011-cli-refactoring.md)
+(the store/cluster/server ontology), [rfc-004-cluster-graph-schema-apply.md](rfc-004-cluster-graph-schema-apply.md)
+(ledger/recovery/approvals — what makes a cell self-contained), [rfc-007-operator-config.md](rfc-007-operator-config.md)
+(keyed credentials / secret resolution).
+**Consumed by:** [rfc-003-mcp-server-surface.md](rfc-003-mcp-server-surface.md) (the MCP
+surface is one tenant-scoped consumer of this model, not its driver — see §6).
+**Target release:** v0.9.x (cell refactor) → cloud GA (pooled tier + WorkOS/OIDC).
+
+---
+
+## 1. Summary
+
+This RFC fixes the **tenancy model** for OmniGraph as a server/platform concern —
+independent of any one surface (HTTP data plane, MCP, CLI). It resolves an
+ambiguity that currently sits half-built in the code: the server is **cluster-only**
+(one cluster per process — [rfc-005](rfc-005-server-cluster-boot.md)), yet the
+identity layer carries **pooled multi-tenant scaffolding** (`GraphKey.tenant_id`,
+`ResolvedActor.tenant_id`, "Cloud will set `Some(...)` from the OAuth `org_id`
+claim"). Those two point at *different* tenancy architectures. We pick one:
+
+> **The cluster is the tenant is the cell.** A cell is the unit of data isolation —
+> its own storage root, catalog, Cedar policy bundle, and token source. Isolation is
+> **structural** (by deployment), never a row-level `tenant_id` filter on shared
+> data. **Density comes from one server process hosting many cells**, not from
+> pooling tenants into one graph.
+
+The one structural change: lift today's per-cluster server runtime into a
+**`CellRuntime`** value and let the server hold a map of them, resolved per request
+by host (or path) **before** authentication. The entire per-graph stack beneath —
+handlers, Cedar enforcement, stored queries, the RFC-003 MCP backend — is unchanged;
+it gains one outer dimension, not a rewrite. Identity maps **WorkOS Organization →
+cell, 1:1**, with a per-cell OAuth audience, so a token for one tenant cannot be
+verified against another's endpoint. The result is best-practice **tiered isolation**
+(silo the data, pool the compute) on one binary: dedicated/on-prem (1 process : 1
+cell) and pooled cloud (1 process : N cells) are the *same code*, different topology.
+
+## 2. Goals
+
+- **Decide the tenancy model** and make the code stop implying two.
+- **Isolation by construction:** a tenant cannot reach or enumerate another tenant's
+  data even if a Cedar policy is missing or a handler has a bug.
+- **Density without row-level pooling:** amortize compute across tenants while keeping
+  each tenant's storage, catalog, policy, and tokens fully separate.
+- **One binary, tiered topology:** dedicated, pooled, and on-prem are deployment
+  choices, not forks. No cloud-only correctness.
+- **Additive to the substrate:** no change to the manifest/commit/Cedar invariants;
+  the data plane and MCP surface ride on top unchanged.
+
+## 3. Non-Goals
+
+- **Row-level (pooled-into-one-graph) multi-tenancy.** OmniGraph has no engine-level
+  row security; pooling tenants into a shared graph would make isolation depend on a
+  per-query filter — the highest-risk pattern, explicitly rejected (§4).
+- **A new in-process `tenant_id` authorization dimension.** Cedar stays per-graph /
+  per-cell; the cell boundary does the tenant isolation (§5.5).
+- **Hosting an OAuth Authorization Server.** Each cell is a Resource Server; the AS is
+  WorkOS or the customer's IdP (§5.4), same posture as [rfc-003](rfc-003-mcp-server-surface.md) §3.
+- **Cross-cell queries / cross-tenant joins.** A cell is a hard boundary by design.
+
+## 4. The tenancy decision (and why)
+
+The industry framing is **silo / pool / bridge**, refined to **tiered + cell-based**:
+
+| Model | Isolation | Density | Fit |
+|---|---|---|---|
+| **Silo** — infra per tenant | strongest (structural) | worst | few, large, regulated |
+| **Pool** — shared infra, logical `tenant_id` | weakest (code-dependent) | best | many small — *only with engine RLS* |
+| **Bridge** — silo some layers, pool others | tunable | tunable | most real products |
+
+**Why not pool the data tier.** A pooled data store puts isolation in one place:
+every read/write must carry the tenant filter; one miss is a cross-tenant breach, and
+leaked customer data cannot be rotated. Postgres has Row-Level Security precisely to
+move that check into the engine. **OmniGraph has no RLS equivalent and no row-level
+tenant filtering** — so pooling tenants into a shared graph would adopt the most
+dangerous isolation pattern without the substrate support that makes it safe.
+Disqualifying for a data substrate.
+
+**Therefore: silo the data, pool the compute (the bridge model, biased to
+storage-silo because it is a database).** The isolation unit is the **cluster**, which
+already gives separate storage root + catalog + policy bundle + token source. A
+cluster is also a natural **cell** (a bounded blast-radius unit, the cell-based
+architecture pattern): a fault or breach is contained to one tenant, never the fleet.
+Density is then a *compute* concern — one process serving many cells — not a data
+concern. This is the model the code is already ~80% built for; the gap is
+compute-density routing, not isolation.
+
+## 5. Design
+
+### 5.1 The Cell abstraction
+
+A **cell** is exactly today's whole single-cluster server runtime, lifted into a
+value. The fields that are per-cluster today (registry, token table, server policy,
+boot source) move off `AppState` onto a `CellRuntime`:
+
+```rust
+// new — a cell == one cluster's runtime == one tenant
+pub struct CellRuntime {
+    pub cell_id:       CellId,                       // == cluster id == tenant id (audit/log key, NOT an isolation check)
+    pub registry:      Arc<GraphRegistry>,           // this cell's graphs (GraphHandle{ engine, policy, queries })
+    pub auth:          CellAuth,                      // per-cell token source — §5.4
+    pub server_policy: Option<Arc<PolicyEngine>>,    // this cell's server-scoped Cedar (GraphList, …)
+    pub config_path:   PathBuf,                       // this cell's cluster boot source (applied revision)
+}
+```
+
+Everything inside a cell — `GraphHandle`, per-graph Cedar bundles, stored-query
+registries, `GraphKey { tenant_id, graph_id }` — is **unchanged**. Graph ids are
+unique *within a cell*, which is all that is needed; there is no global graph
+namespace and no route↔key reconciliation problem because the cell is resolved first.
+
+### 5.2 Server: one process, many cells
+
+```rust
+pub struct AppState {
+    pub cells:    Arc<CellRegistry>,     // host/prefix -> Arc<CellRuntime>   (the ONLY new top-level field)
+    pub workload: Arc<WorkloadController>, // admission control (see Deferred — per-cell fairness)
+}
+
+// today's single `Multi { graphs, config_path, server_policy }` (rfc-005) becomes ONE cell;
+// the server boots N of them.
+pub enum ServerConfigMode {
+    MultiCluster { cells: Vec<CellBootConfig> },
+}
+```
+
+Boot opens each cell's applied revision (the existing rfc-005 path, run N times,
+bounded-concurrency) and inserts `Arc<CellRuntime>` into the `CellRegistry` keyed by
+its host (or path prefix). A **dedicated/on-prem** deployment boots a one-entry map; a
+**pooled** deployment boots many.
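
As an illustration of the boot step above (not part of the commit), the registry can be sketched as a selector-keyed map filled once at startup. `CellBootConfig`'s fields, `CellRegistry::boot`, and the duplicate-selector check are assumptions made for this sketch; the RFC names the types but does not specify their shape.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the RFC's `CellRuntime`; only the id is modelled here.
#[derive(Debug)]
pub struct CellRuntime {
    pub cell_id: String,
}

/// Hypothetical boot entry: the host (or path prefix) that selects a cell.
pub struct CellBootConfig {
    pub selector: String,
    pub cell_id: String,
}

/// Assumed shape of `CellRegistry`: a selector -> cell map filled once at boot.
#[derive(Debug)]
pub struct CellRegistry {
    cells: HashMap<String, Arc<CellRuntime>>,
}

impl CellRegistry {
    /// Build the map from the boot list. A duplicate selector is rejected
    /// instead of silently replacing an earlier cell.
    pub fn boot(configs: Vec<CellBootConfig>) -> Result<Self, String> {
        let mut cells = HashMap::new();
        for cfg in configs {
            // A real boot would open the cell's applied revision here (rfc-005).
            let cell = Arc::new(CellRuntime { cell_id: cfg.cell_id });
            if cells.insert(cfg.selector.clone(), cell).is_some() {
                return Err(format!("duplicate cell selector: {}", cfg.selector));
            }
        }
        Ok(Self { cells })
    }

    /// Look up the cell for a selector; `None` means "unknown cell".
    pub fn get(&self, selector: &str) -> Option<Arc<CellRuntime>> {
        self.cells.get(selector).cloned()
    }

    /// Number of cells hosted by this process.
    pub fn len(&self) -> usize {
        self.cells.len()
    }
}
```

Under these assumptions a dedicated or on-prem process is simply `boot` called with a one-element list, and a pooled process is the same call with many entries.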
+
+### 5.3 Routing & middleware — one new outer hop
+
+The existing `build_app` nests per-graph routes under `/graphs/{graph_id}` with two
+`route_layer`s (`resolve_graph_handle` inner, `require_bearer_auth` outer). We add
+**one outermost layer**, `resolve_cell`, and rebind two existing layers to read the
+cell instead of `AppState`:
+
+```rust
+let per_graph_protected = Router::new()
+    .route("/snapshot", get(server_snapshot))
+    // … /query /mutate /queries /schema /load /branches /commits …
+    .merge(mcp::mcp_router(state.clone()))                       // RFC-003 — unchanged
+    .route_layer(from_fn_with_state(state.clone(), resolve_graph_handle))  // inner: reads CELL.registry  (was AppState.routing)
+    .route_layer(from_fn_with_state(state.clone(), require_bearer_auth))   // mid:   reads CELL.auth      (was AppState.bearer_tokens)
+    .route_layer(from_fn_with_state(state.clone(), resolve_cell));         // OUTER: injects Arc<CellRuntime>   ← NEW
+```
+
+Request lifecycle:
+
+```
+resolve_cell        host/prefix → Arc<CellRuntime>            (404 unknown cell)   ← NEW, outermost
+  └─ require_bearer_auth   validate token vs CELL.auth → ResolvedActor   (401)     ← now cell-scoped
+       └─ resolve_graph_handle   {graph_id} in CELL.registry → Arc<GraphHandle> (404) ← now cell-scoped
+            └─ handler / MCP   run_query · run_mutate · /mcp · Cedar enforce(...)   ← UNCHANGED
+```
+
+The only handler-adjacent edits: `require_bearer_auth` reads `cell.auth`,
+`resolve_graph_handle` / `server_graphs_list` read `cell.registry`. The isolation is
+in the ordering: **cell-A's token table and registry are unreachable from a cell-B
+request** because the cell is resolved first and everything downstream reads *that*
+cell.
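
The ordering argument can be checked with a framework-free model (an editorial sketch, not code from the commit). `Server`, `Cell`, `handle`, and `demo_server` are hypothetical names; tokens are stored as plain strings for brevity, whereas the RFC's `Static` mode stores token hashes. Status codes follow the lifecycle diagram: 404 for an unknown cell, 401 for a bad token, 404 for an unknown graph.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified cell: a token set stands in for `CELL.auth`,
/// a graph map stands in for `CELL.registry`.
pub struct Cell {
    tokens: HashSet<String>,
    graphs: HashMap<String, String>,
}

/// Hypothetical server state: host -> cell.
pub struct Server {
    cells: HashMap<String, Cell>,
}

impl Server {
    /// Mirrors the middleware order: resolve_cell, then auth, then graph lookup.
    /// Every step after the first reads only the cell that was resolved.
    pub fn handle(&self, host: &str, bearer: &str, graph_id: &str) -> Result<&str, u16> {
        // resolve_cell: an unknown host is rejected before any token is inspected.
        let cell = self.cells.get(host).ok_or(404u16)?;
        // require_bearer_auth: only this cell's tokens are consulted.
        if !cell.tokens.contains(bearer) {
            return Err(401);
        }
        // resolve_graph_handle: only this cell's graphs are visible.
        let graph = cell.graphs.get(graph_id).ok_or(404u16)?;
        Ok(graph.as_str())
    }
}

/// Two-tenant fixture used to exercise the model.
pub fn demo_server() -> Server {
    let cell = |token: &str, graph: &str, payload: &str| Cell {
        tokens: HashSet::from([token.to_string()]),
        graphs: HashMap::from([(graph.to_string(), payload.to_string())]),
    };
    Server {
        cells: HashMap::from([
            ("a.example".to_string(), cell("tok-a", "orders", "orders@a")),
            ("b.example".to_string(), cell("tok-b", "ledger", "ledger@b")),
        ]),
    }
}
```

In this model a request to `b.example` carrying `tok-a` fails with 401, and a request to `a.example` for graph `ledger` fails with 404, because neither lookup ever touches the other cell's tables.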
+
+**Cell selector — host-based (recommended) vs path-based:**
+
+| | Host-based | Path-based |
+|---|---|---|
+| Selector | `Host: tenant-a.omnigraph.example.com` | `/clusters/{cell_id}/…` |
+| OAuth audience | the per-tenant origin (natural RFC 8707 resource) | `…/clusters/{cell_id}` |
+| Origin/CORS | isolated per subdomain (free) | shared origin |
+| DNS/cert | wildcard `*.example.com` → pooled fleet | one host |
+| `resolve_cell` | `cells.by_host(host)` | `cells.by_prefix(first_segment)` |
+
+Host-based wins for cloud (per-tenant audience, Origin, and cookie boundaries fall out
+for free). Path-based is the simple on-prem/dev shape. `resolve_cell` abstracts which.
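
A minimal sketch of how one `resolve_cell` could abstract over both selectors (an editorial illustration; `CellSelector` and `select` are hypothetical names). It assumes the `/clusters/{cell_id}/…` prefix from the table, lowercases the host because host names are case-insensitive, and strips a trailing port; IPv6 literals are not handled.

```rust
/// Hypothetical selector strategy; the RFC only says `resolve_cell` abstracts the choice.
pub enum CellSelector {
    Host,
    PathPrefix,
}

impl CellSelector {
    /// Returns the registry key and the path the inner router should see,
    /// or `None` when the request names no cell.
    pub fn select<'a>(&self, host: &str, path: &'a str) -> Option<(String, &'a str)> {
        match self {
            CellSelector::Host => {
                // Drop an optional ":port" and normalise case.
                let name = host.split(':').next().unwrap_or("").to_ascii_lowercase();
                if name.is_empty() {
                    None
                } else {
                    Some((name, path))
                }
            }
            CellSelector::PathPrefix => {
                // "/clusters/{cell_id}/rest" -> ("{cell_id}", "/rest")
                let rest = path.strip_prefix("/clusters/")?;
                let (cell_id, tail) = match rest.find('/') {
                    Some(i) => (&rest[..i], &rest[i..]),
                    None => (rest, "/"),
                };
                if cell_id.is_empty() {
                    None
                } else {
                    Some((cell_id.to_string(), tail))
                }
            }
        }
    }
}
```

Either variant yields a key for the cell map plus the unchanged `/graphs/{graph_id}/…` path, so the per-graph router beneath does not need to know which selector is in use.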
+
+### 5.4 Identity & auth — per cell, two modes
+
+```rust
+pub enum CellAuth {
+    Static(Arc<[(BearerTokenHash, Arc<str>)]>),   // on-prem / self-host / dev — today's path, per cell
+    Oidc(Arc<dyn TokenVerifier>),                  // WorkOS (or customer IdP) for this org — cloud
+}
+```
+
+- **WorkOS Organization → cell, 1:1.** The cell's `Oidc` verifier is configured with
+  *that org's* issuer + audience. A token minted for `tenant-a`'s audience **fails
+  verification at `tenant-b`** (wrong `aud`) — structural isolation that runs *before*
+  Cedar and is independent of policy completeness.
+- **Same Resource-Server endpoint, mode by cell.** `require_bearer_auth` dispatches on
+  `cell.auth`. Static and OIDC cells coexist in one process. This is the
+  `TokenVerifier` seam already drafted in `identity.rs` ("RFC 0001 step 1 adds
+  `AuthSource::Oidc` when the `OidcJwtVerifier` ships"); WorkOS is one implementation.
+- **`ResolvedActor` mapping:** `actor_id` ← `sub`; `tenant_id` ← the **cell id** (for
+  audit/log clarity, *not* an isolation mechanism, since the endpoint has already
+  isolated the request); `scopes` ← the OAuth `scope`/roles claim; `source` ← `Oidc`/`Static`.
+  This repurposes `tenant_id` from vestigial pooled scaffolding into "which cell logged
+  this," which is honest. **Identity stays server-resolved, never client-set** (the
+  MR-731 invariant, now applied per cell).
+- **Per-cell OAuth discovery:** each cell serves its own
+  `/.well-known/oauth-protected-resource` → that org's WorkOS AS, with the cell's
+  audience. Per-tenant PRM → per-tenant OAuth → per-tenant audience. (The RFC-003 §8
+  PRM config-gate for issue #59467 becomes a per-cell flag.)
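
The audience rule in the bullets above can be sketched as follows (an editorial illustration, not the drafted `TokenVerifier`). `OidcCellConfig`, `Claims`, and `resolve` are hypothetical; the sketch assumes signature and expiry checks have already passed and shows only the issuer/audience comparison and the `ResolvedActor` mapping. `aud` is a list because a JWT audience may be a single string or an array.

```rust
#[derive(Debug, PartialEq)]
pub enum AuthError {
    WrongIssuer,
    WrongAudience,
}

/// Claims of a token whose signature and expiry are ASSUMED already verified.
pub struct Claims {
    pub iss: String,
    pub aud: Vec<String>,
    pub sub: String,
}

/// Hypothetical per-cell OIDC settings: that org's issuer and the cell's audience.
pub struct OidcCellConfig {
    pub cell_id: String,
    pub issuer: String,
    pub audience: String,
}

/// Simplified `ResolvedActor`: only the two fields this sketch maps.
#[derive(Debug, PartialEq)]
pub struct ResolvedActor {
    pub actor_id: String,
    pub tenant_id: String,
}

impl OidcCellConfig {
    /// Accept the token only if it was minted for this cell's issuer and audience.
    pub fn resolve(&self, claims: &Claims) -> Result<ResolvedActor, AuthError> {
        if claims.iss != self.issuer {
            return Err(AuthError::WrongIssuer);
        }
        if !claims.aud.iter().any(|a| a == &self.audience) {
            return Err(AuthError::WrongAudience);
        }
        Ok(ResolvedActor {
            actor_id: claims.sub.clone(),
            // Set by the server from the cell, never read from the token.
            tenant_id: self.cell_id.clone(),
        })
    }
}
```

With two cells configured this way, a token whose `aud` names tenant-a's origin resolves at tenant-a and is rejected with `WrongAudience` at tenant-b, before any Cedar policy is consulted.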
+
+### 5.5 Authorization — Cedar stays per-graph / per-cell
+
+The cell boundary already guarantees a cell-A actor never reaches cell-B's policy
+engine, so **no tenant dimension is added to authorization**:
+
+- `PolicyRequest { action, branch, target_branch }` and `ResourceScope`
+  (Graph / Branch / TargetBranch) — **unchanged**. The principal stays `actor_id`.
+- `authorize`'s **default-deny-except-`Read`** becomes *safe*: "readable on missing
+  policy" now means the tenant's *own* graphs, not cross-tenant. The exact hazard that
+  would make this dangerous under pooled tenancy is structurally absent.
+- `GET /graphs` reads `cell.registry`, so it enumerates only the tenant's own graphs
+  and storage URIs — no cross-tenant topology leak.
+
+This is the payoff of cluster-as-tenant: the in-process tenant machinery a pooled
+model would require (tenant-keyed routing, a tenant Cedar principal, a tenant-aware
+deny default, a tenant-filtered enumeration) is **not built because it is not needed**.
+
+### 5.6 Control plane — the one legitimately pooled component
+
+A small **Cell Registry** holds *metadata only* (no tenant data):
+
+```
+org_id (WorkOS)  ──▶  cell_id  ──▶  { storage_root, issuer, audience, host, tier }
+```
+
+Onboarding a tenant is provisioning-as-code — the thing that makes silo *operable*
+(automated, not N hand-built stacks):
+
+```
+1. WorkOS Organization created / detected.
+2. `cluster apply` a NEW cell on a fresh storage root (own bucket/prefix), with the
+   org's schema.pg / queries / policy        → ledger + recovery + approvals (rfc-004).
+3. Register org_id → cell_id (+ issuer/audience/host) in the Cell Registry.
+4. Cell goes live:
+     • dedicated tier → its own process boots that one cell (today's exact path).
+     • pooled tier   → the fleet HOT-LOADS the cell into the CellRegistry map.
+5. DNS: tenant-a.example.com → the pooled fleet (wildcard); the host selects the cell.
+```
+
+Step 4's pooled hot-load is the **one new runtime-mutation primitive**, and it is
+deliberately **cell-granular, not graph-granular**: [rfc-011](rfc-011-cli-refactoring.md)
+closes runtime *graph*-add inside a cluster (correct — it mutates a live registry),
+but loading a **whole, independently-validated cell** is just "open a cluster" — its
+own ledger/recovery/catalog, nothing in any other cell moves. Far safer than the thing
+rfc-011 forbids. Eviction = drop the `Arc<CellRuntime>` from the map; in-flight
+requests keep their `Arc`.
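
The hot-load and eviction semantics can be sketched with a lock-guarded map of `Arc`s (an editorial illustration; the hot-load protocol itself is listed as open in §9). `hot_load`, `evict`, and `by_host` are hypothetical method names, and refusing to overwrite an existing host is an assumption of this sketch.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the RFC's `CellRuntime`.
pub struct CellRuntime {
    pub cell_id: String,
}

/// Assumed mutable registry for the pooled tier: host -> cell behind one lock.
#[derive(Default)]
pub struct CellRegistry {
    cells: RwLock<HashMap<String, Arc<CellRuntime>>>,
}

impl CellRegistry {
    /// Insert a fully built cell. The cell becomes visible in one step under the
    /// write lock; an existing host is never overwritten (returns false instead).
    pub fn hot_load(&self, host: &str, cell: CellRuntime) -> bool {
        let mut map = self.cells.write().unwrap();
        if map.contains_key(host) {
            return false;
        }
        map.insert(host.to_string(), Arc::new(cell));
        true
    }

    /// Remove the cell from the map. Requests that already cloned the `Arc`
    /// keep a live cell until they finish and drop it.
    pub fn evict(&self, host: &str) -> bool {
        self.cells.write().unwrap().remove(host).is_some()
    }

    /// Per-request lookup: clone the `Arc` so the lock is not held while serving.
    pub fn by_host(&self, host: &str) -> Option<Arc<CellRuntime>> {
        self.cells.read().unwrap().get(host).cloned()
    }
}
```

A request that called `by_host` before `evict` still holds a valid `Arc<CellRuntime>` afterwards, while new requests for that host see no cell; the runtime is freed when the last in-flight clone is dropped.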
+
+### 5.7 Deployment tiers — same binary, different topology
+
+| Tier | Topology | Mode | Use |
+|---|---|---|---|
+| **Dedicated** | 1 process : 1 cell | `MultiCluster { cells: [one] }` | enterprise / regulated / data-residency |
+| **Pooled** | 1 process : N cells | `MultiCluster { cells: [many] }` | SMB / free / long tail |
+| **On-prem** | 1 process : 1 cell, `Static` auth | `MultiCluster { cells: [one], Static }` | air-gapped / self-host |
+
+A tenant graduates pooled → dedicated by moving its cell to its own process — **no data
+migration** (the storage root does not move; the cell is already self-contained).
+
+## 6. How the surfaces ride on top
+
+This is a server/topology change; the surfaces are consumers and need little or no
+change:
+
+- **HTTP data plane.** Every protected route already resolves `Arc<GraphHandle>` from a
+  request extension; it now comes from the cell's registry. Handlers are unchanged.
+- **MCP ([rfc-003](rfc-003-mcp-server-surface.md)).** The MCP backend "consumes a
+  resolved actor and branches on nothing about how the token was verified" and mounts
+  under `per_graph_protected`. So `/graphs/{id}/mcp` simply lives under a cell now:
+  `https://tenant-a.example.com/graphs/{id}/mcp`. Per-graph isolation (rfc-003 §15.1)
+  is *sufficient* under cluster-as-tenant — each tenant's MCP clients point at their
+  own cell's endpoints; the discovery/enumeration concerns that would bite a pooled
+  model do not apply. MCP is **one** tenant-scoped consumer, not the reason for this
+  RFC.
+- **CLI ([rfc-011](rfc-011-cli-refactoring.md)).** A `--server <name|url>` scope already
+  addresses one served endpoint; a per-tenant subdomain is just a server URL. No new
+  addressing concept — the cell is reached as a server.
+
+## 7. Migration & backward compatibility
+
+Today's `omnigraph-server --cluster <one>` is a `MultiCluster` with **one cell** and a
+**host-agnostic** `resolve_cell` (any host → the sole cell). Therefore:
+
+- Existing single-cluster deployments keep working unchanged (one cell; `resolve_cell`
+  maps every host to it).
+- `--cluster <dir|s3>` stays the dedicated/on-prem entry point.
+- A pooled fleet boots from a cell list (e.g. repeated `--cluster`, or a
+  `--cells <registry>` source).
+- RFC-003 MCP, OpenAPI generation, and CLI addressing are unchanged; `/graphs/{id}/…`
+  just lives under a cell.
+- `ResolvedActor.tenant_id` / `GraphKey.tenant_id` are **repurposed to the cell id**
+  (or removed) — they stop implying pooled-row tenancy. This is the cleanup that ends
+  the two-models ambiguity.
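
One way to picture the backward-compatible case (an editorial sketch; `CellResolver` and its variants are hypothetical names) is a resolver with a host-agnostic variant for the single-cluster server and a host-keyed variant for a pooled fleet.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the RFC's `CellRuntime`.
pub struct CellRuntime {
    pub cell_id: String,
}

/// Assumed resolver shapes: `Sole` models today's `--cluster <one>` server,
/// `ByHost` models a pooled fleet.
pub enum CellResolver {
    Sole(Arc<CellRuntime>),
    ByHost(HashMap<String, Arc<CellRuntime>>),
}

impl CellResolver {
    /// Host-agnostic for a single cell, host-keyed otherwise.
    pub fn resolve(&self, host: &str) -> Option<Arc<CellRuntime>> {
        match self {
            // Any host reaches the sole cell, so existing deployments behave as before.
            CellResolver::Sole(cell) => Some(Arc::clone(cell)),
            // An unknown host yields None, which the server would answer with 404.
            CellResolver::ByHost(map) => map.get(host).cloned(),
        }
    }
}
```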
+
+## 8. Invariants & deny-list check
+
+- **§11 transport/auth at the boundary:** cell auth + `resolve_cell` live only in
+  `omnigraph-server`; engine/compiler/cluster crates never learn cells exist. ✓
+- **§12 no client-set identity:** both the cell (host/prefix → registry) and the actor
+  (token → `ResolvedActor`) are server-resolved; neither is client-settable. ✓
+- **No cloud-only correctness / no fork:** cells are a server-layer wrapper; the OSS
+  `Static`-auth cell is first-class, OIDC is additive. ✓
+- **Strong consistency / manifest atomicity (§1–§6):** untouched — this adds an outer
+  routing dimension, not a write path. Each cell's engine keeps its own snapshot
+  isolation and manifest publish. ✓
+- **No state that drifts:** the Cell Registry is control-plane metadata; per-cell state
+  remains the cluster's applied revision (rfc-005), derived from the ledger. ✓
+- **Least privilege:** a leaked cell token reaches one tenant; rotation is per-cell;
+  storage credentials stay with the server process, never operators. ✓
+
+## 9. Decisions, open questions, deferred
+
+**Decided:**
+- Cluster = tenant = cell; silo data, pool compute. No row-level pooling.
+- Cells resolved before auth; per-cell token source; per-cell Cedar.
+- WorkOS Organization → cell 1:1; per-cell OAuth audience.
+- `tenant_id` repurposed to cell id (audit), not an isolation mechanism.
+
+**Open / deferred:**
+- **Per-cell vs process-wide admission control.** `WorkloadController` is process-wide
+  today; pooled cells want per-cell fairness (noisy-neighbor). Make `workload` a
+  per-cell value or add per-cell quotas. This is the one shared resource pooling
+  reintroduces — design before the pooled tier ships.
+- **Cell hot-load / evict protocol.** Control-plane push vs poll of the Cell Registry;
+  pin the consistency story (a cell appears atomically or not at all; eviction drains).
+- **Cell Registry storage.** Its own OmniGraph graph, a control DB, or a config object
+  — metadata-only either way. Decide ownership and durability.
+- **`TokenVerifier` trait shape.** Still draft in `identity.rs`; this RFC fixes *where*
+  it plugs in (per-cell `CellAuth::Oidc`), not its exact signature.
+- **Scope semantics.** `ResolvedActor.scopes` is currently `[Full]` and read nowhere;
+  when OIDC populates real scopes, decide whether/how they feed Cedar context (a
+  behavior change to sequence deliberately, not silently).
+
+## 10. Relationship to prior RFCs
+
+[rfc-005](rfc-005-server-cluster-boot.md) made the server boot one cluster from its
+applied revision; this RFC makes the server boot *many* such clusters as cells and
+resolves one per request. [rfc-011](rfc-011-cli-refactoring.md) fixed the
+store/cluster/server ontology and closed runtime graph mutation; this RFC adds the
+*cell* as the tenant unit and the one safe runtime mutation (cell hot-load) that does
+not violate rfc-011's reasoning. [rfc-004](rfc-004-cluster-graph-schema-apply.md)'s
+ledger/recovery/approvals are what make a cell a self-contained, independently
+provisionable unit. [rfc-003](rfc-003-mcp-server-surface.md) is a consumer: its
+per-graph MCP surface becomes per-tenant for free once each cluster is a tenant. The
+net: tenancy is decided once, at the server topology layer, and every surface inherits
+it.