(feat): multi-graph server mode (#119)

* mr-668: add GraphId newtype + Cloud-mode forward identity stubs (PR 1/10) PR 1 of the MR-668 multi-graph server work. Pure types, no runtime behavior changes yet. Ships the validated identity vocabulary that the rest of the implementation will consume: - `GraphId(String)` — `^[a-zA-Z0-9-]{1,64}$`, leading underscore rejected (engine reserves every `_*` filename), reserved route names rejected (`policies`, `healthz`, `openapi`, `openapi.json`, `graphs`). Validation lives in `try_from` only; serde `Deserialize` re-runs it so JSON payloads cannot bypass. - `TenantId(String)` — same regex shape as GraphId. `None` in Cluster mode; reserved for Cloud mode (RFC 0003) where it carries the OAuth `org_id` claim. - `GraphKey { tenant_id: Option<TenantId>, graph_id }` — the registry HashMap key. `cluster()` constructor for the Cluster-mode default. - `Scope` enum with `Full` variant — Cluster mode default; RFC 0004 will extend with OAuth scopes (`graph:read`/`write`/`admin`/`*`). - `AuthSource` enum with `Static` variant — Cluster mode default; RFC 0001 step 1 will add `Oidc`. - `ResolvedActor { actor_id, tenant_id, scopes, source }` — replaces the upcoming refactor of `AuthenticatedActor(Arc<str>)` in PR 4a. Per MR-668 design decision 13: ship the Cloud-mode forward type shapes now (no `TokenVerifier` trait yet — that's RFC 0001 step 1) so handler signatures stay stable across the Cluster → Cloud trajectory. `Scope` and `AuthSource` use `#[non_exhaustive]` so future variants don't break caller matches. Tests: 26 new (15 graph_id + 11 identity), all passing. No regression in the existing 36 server library tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: Omnigraph::init error-path cleanup + three failpoints (PR 2a/10) PR 2a of the MR-668 multi-graph server work. Bug fix: a partially-failed `Omnigraph::init` previously left orphan schema files at the graph URI, making the URI unusable for a retry (the next `init` would refuse because `_schema.pg` already exists). Changes: 1. `init_with_storage` now wraps the I/O phase. On any error from `init_storage_phase`, calls `best_effort_cleanup_init_artifacts` to remove the three schema files before returning the original error: - `_schema.pg` - `_schema.ir.json` - `__schema_state.json` Cleanup is best-effort: a failure to delete is logged via `tracing::warn` but does NOT mask the init error. 2. Three failpoints added at the init phase boundaries: - `init.after_schema_pg_written` - `init.after_schema_contract_written` - `init.after_coordinator_init` 3. Four new failpoint tests in `tests/failpoints.rs` pin the cleanup behavior at each boundary plus the "original error wins over cleanup error" contract. All 23 failpoint tests pass. Coverage gap (documented in code comments): Lance per-type datasets and `__manifest/` directory created by `GraphCoordinator::init` are NOT cleaned up after a coordinator-init-phase failure. Recursive directory deletion requires `StorageAdapter::delete_prefix`, which was deferred along with `DELETE /graphs/{id}` (originally PR 2b). When that primitive lands, the third failpoint test can be tightened to assert the graph root is fully empty. Tests: 4 new (init_failpoint_*), all 23 failpoint tests green. No regression in the 105 engine library tests or 64 end_to_end tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: add GraphHandle + GraphRegistry data structure (PR 3/10) PR 3 of the MR-668 multi-graph server work. Pure data structure — no routing changes yet (that's PR 4a). New file: `crates/omnigraph-server/src/registry.rs` - `GraphHandle { key: GraphKey, uri: String, engine: Arc<Omnigraph>, policy: Option<Arc<PolicyEngine>> }` — the per-graph state that the routing middleware (PR 4a) will inject as a request extension. - `RegistrySnapshot { graphs: HashMap<GraphKey, Arc<GraphHandle>> }` — immutable snapshot; replaced atomically via `ArcSwap`. - `GraphRegistry { snapshot: ArcSwap<_>, mutate: Mutex<()> }` — lock-free reads, mutex-serialized mutations. - `RegistryLookup { Ready(Arc<GraphHandle>) | Gone }` — two-valued, no `Tombstoned` variant since DELETE is deferred in v0.7.0 scope. - `InsertError { DuplicateKey | DuplicateUri }` — both rejection cases for create-graph (maps to HTTP 409 in PR 7). - Methods: `new`, `from_handles` (bulk startup-time init), `get`, `list`, `len`, `insert`. Race semantics pinned by three multi-thread tests: - `concurrent_insert_same_key_exactly_one_succeeds` — N=8 spawned inserts with the same key; exactly 1 returns Ok, 7 return DuplicateKey. - `concurrent_insert_distinct_keys_all_succeed` — N=8 spawned inserts with distinct keys; all succeed. - `concurrent_reads_during_inserts_see_consistent_snapshots` — reader loop concurrent with sequential writes; every listed handle's key resolves via `get()` (no torn state). Why no tombstones field: `DELETE /graphs/{id}` is deferred to bound the scope of v0.7.0. Without a delete endpoint, there's no use for tombstones — every key in the registry is `Ready`, and every key not in the registry is `Gone`. When DELETE lands later, the `Tombstoned` variant + `tombstones: HashSet<GraphKey>` slot in additively without breaking caller signatures (the `Gone` variant remains the "not currently active" case). Why `tokio::sync::Mutex`: insert is async because PR 7's flow holds this mutex across the atomic YAML rewrite step (file I/O). std::Mutex would footgun across .await. Dependency additions: `arc-swap = { workspace = true }`, `thiserror = { workspace = true }` (used by InsertError). Tests: 12 new (12 passing). 74 server lib tests total green (62 from PR 1 + 12 new). Clippy clean on server crate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: router restructure + handler refactor for multi-graph (PR 4a/10) PR 4a of the MR-668 multi-graph server work. The heaviest single PR — rewires every handler to extract `Arc<GraphHandle>` from a routing middleware, replaces `AuthenticatedActor(Arc<str>)` with `ResolvedActor` everywhere, and adds the `ServerMode` discriminator. Behavior changes: - **Single mode** (legacy `omnigraph-server <URI>`): flat routes (`/snapshot`, `/read`, `/branches`, …) continue to work exactly as v0.6.0. Internally, the registry holds a single handle keyed by the sentinel `SINGLE_GRAPH_KEY_ID = "default"`; routing middleware injects that handle on every request. No HTTP-visible change. - **Multi mode** (new): routes nest under `/graphs/{graph_id}/...`. Routing middleware extracts the graph id from the path, looks it up in the registry, and injects the handle. 404 if not found. (Multi-mode startup itself lands in PR 5; this PR provides the router-side wiring.) AppState refactor: - `engine: Arc<Omnigraph>` and `policy_engine: Option<Arc<PolicyEngine>>` fields removed — both now live inside `GraphHandle` in the registry. - `mode: ServerMode { Single { uri } | Multi { config_path } }` added. - `registry: Arc<GraphRegistry>` added. - `server_policy: Option<Arc<PolicyEngine>>` added (placeholder for management endpoints in PR 6b; unused today). - Existing constructors (`new`, `new_with_bearer_token{s,_and_policy}`, `new_with_workload`, `open*`) build a single-mode AppState internally and remain source-compatible. Tests that constructed AppState via these constructors continue to work. - `with_policy_engine` post-construction setter — rebuilds the single-mode handle with the policy attached. Engine-layer enforcement is NOT reinstalled (matches the old single-field semantics; `open_with_bearer_tokens_and_policy` is the path that installs both layers). - `new_multi` constructor added for PR 5's startup loop. - `uri()` now returns `Option<&str>` (Some in single, None in multi). Routing middleware: - `resolve_graph_handle` injects `Arc<GraphHandle>` as a request extension. Mode-aware: single returns the only handle; multi parses `/graphs/{graph_id}/...` from the URI. Returns 404 in multi mode when the graph id is unregistered. Records `graph_id` on the current tracing span. - `require_bearer_auth` updated to insert `ResolvedActor` (was `AuthenticatedActor`). Handler refactor — every protected handler: - Gains `Extension(handle): Extension<Arc<GraphHandle>>` param. - Replaces `state.engine` → `handle.engine`. - Replaces `state.policy_engine()` → `handle.policy.as_deref()`. - Replaces `state.uri()` → `handle.uri.as_str()` (or `.clone()` where String is needed). - Replaces `Arc::clone(&state.engine)` → `Arc::clone(&handle.engine)` (the spawn-and-clone pattern in `server_export` — proof that a long-running export survives the registry being mutated later). authorize_request signature: - Was: `(state: &AppState, actor: Option<&AuthenticatedActor>, request: PolicyRequest)`. - Now: `(actor: Option<&ResolvedActor>, policy: Option<&PolicyEngine>, request: PolicyRequest)`. - Per-graph callers pass `handle.policy.as_deref()`. The (future PR 6b) management endpoints will pass `state.server_policy.as_deref()`. MR-731 invariant preserved: - The single chokepoint `request.actor_id = actor.actor_id.as_ref().to_string()` inside `authorize_request` still overwrites any client-supplied actor identity. Regression test `actor_id_resolves_from_bearer_token_ignoring_client_supplied_headers` at `tests/server.rs:1114-1216` passes unchanged. Tests: 0 new (the registry race tests in PR 3 already cover the data structure; this PR exercises them indirectly via the existing test suite). 74 lib + 57 server integration + 60 openapi = 191 tests green. Clippy clean. LOC: +397 insertions, -153 deletions in `crates/omnigraph-server/src/lib.rs`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: OpenAPI multi-mode cluster filter (PR 4b/10) PR 4b of the MR-668 multi-graph server work. In multi mode, the served `/openapi.json` reports cluster routes (`/graphs/{graph_id}/...`) instead of the legacy flat protected paths — matching what `build_app` actually mounts (PR 4a's `Router::nest`). Single mode is unchanged. Implementation: - New `server_openapi` branch: when `state.mode()` is `Multi`, call `nest_paths_under_cluster_prefix(&mut doc)` after `ApiDoc::openapi()`. - The rewrite consumes `doc.paths.paths`, then for every path-item: - If the path is in `ALWAYS_FLAT_PATHS` (`/healthz` for now), keep it flat. - Otherwise, prefix every operation_id with `cluster_` and reinsert the item at `/graphs/{graph_id}<original_path>`. - Single mode hits no extra work — the path map is untouched. - The static `ApiDoc::openapi()` still emits the flat surface, so in-process callers (the existing `openapi_json()` helper in tests) see the unmodified spec. Why cluster_ prefix on operation IDs: OpenAPI specs require unique operation_ids across the document. With both flat (single-mode) and cluster (multi-mode) surfaces ever co-existing in a generated SDK, the prefix prevents collision. The current served doc only carries one surface, so the prefix is forward-compat with potential future dual-surface generation. Tests: 6 new in `tests/openapi.rs`, all via the `/openapi.json` route (not the static `ApiDoc::openapi()` helper): - `multi_mode_openapi_lists_cluster_paths` — every protected path appears as a cluster variant. - `multi_mode_openapi_drops_flat_protected_paths` — flat protected paths are absent. - `multi_mode_openapi_keeps_healthz_flat` — `/healthz` survives. - `multi_mode_openapi_prefixes_operation_ids_with_cluster` — every cluster operation_id starts with `cluster_`. - `multi_mode_operation_ids_are_unique` — no operation_id collisions. - `single_mode_openapi_unchanged_by_cluster_filter` — single mode still emits the legacy flat surface (regression). New test helper `app_for_multi_mode(graph_ids)` exercises the new `AppState::new_multi` constructor from PR 4a — first user of multi-mode construction outside of unit tests. Result: 66 openapi tests + 57 server integration tests + 74 lib tests = 197 green. No regression in the existing OpenAPI drift check (`openapi_spec_is_up_to_date` still validates the static flat surface matches the committed openapi.json). LOC: +67 in lib.rs (rewrite logic), +219 in tests/openapi.rs (test suite + helper). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: multi-graph startup + mode inference (PR 5/10) PR 5 of the MR-668 multi-graph server work. This is the first PR that makes multi mode actually usable end-to-end: operators invoking `omnigraph-server --config omnigraph.yaml` with a non-empty `graphs:` map and no single-mode selector now get a running multi-graph server. Mode inference (MR-668 decision 2, four-rule matrix in `load_server_settings`): 1. CLI `<URI>` positional → Single 2. CLI `--target <name>` → Single (URI from graphs.<name>) 3. `server.graph` in config → Single (URI from graphs.<name>) 4. `--config` + non-empty `graphs:` + no single-mode selector → Multi (all entries in `graphs:`) 5. otherwise → error with migration hint Rule 5's error message names every escape hatch so operators can fix their invocation without grepping docs. Config schema extensions: - `TargetConfig.policy: PolicySettings` (per-graph Cedar policy file). `#[serde(default)]` so existing single-graph YAMLs keep parsing. - `ServerDefaults.policy: PolicySettings` (server-level Cedar policy for management endpoints — loaded in PR 5, wired into `GET /graphs` in PR 6b). - `OmnigraphConfig::resolve_target_policy_file(name)` and `resolve_server_policy_file()` helpers — both resolve relative to the config file's `base_dir`. Public types added to `omnigraph-server`: - `ServerConfigMode { Single { uri, policy_file } | Multi { graphs, config_path, server_policy_file } }`. - `GraphStartupConfig { graph_id, uri, policy_file }` — one entry per graph in multi mode. `ServerConfig` shape change: - WAS: `{ uri: String, bind, policy_file, allow_unauthenticated }`. - NOW: `{ mode: ServerConfigMode, bind, allow_unauthenticated }`. - Breaking for any code that constructs `ServerConfig` directly. `main.rs` is unaffected (uses `load_server_settings`). `serve()` now forks on `ServerConfig.mode`: - Single: existing flow via `AppState::open_with_bearer_tokens_and_policy`. - Multi: parallel open via `futures::stream::iter(graphs) .map(open_single_graph).buffer_unordered(4).collect()`. Bound 4 is a rule-of-thumb for I/O-bound work — at N≤10 this trades startup latency for a small amount of concurrent S3/Lance open pressure. Fail-fast: first open error aborts startup; in-flight opens drop their engine via Arc (Lance datasets close cleanly). New helper `open_single_graph(GraphStartupConfig)`: - Validates `GraphId` per the regex in PR 1. - `Omnigraph::open(uri).await` with descriptive error context. - Loads per-graph policy file and re-applies it via `Omnigraph::with_policy` (engine-layer enforcement, MR-722). - Returns `Arc<GraphHandle>` ready for the registry. Routing middleware bug fix: - `Router::nest("/graphs/{graph_id}", inner)` rewrites `request.uri().path()` to the inner suffix (e.g. `/snapshot`). The previous middleware tried to parse `{graph_id}` from `request.uri().path()` and got 400 instead of 200. Fixed by reading from `axum::extract::OriginalUri` request extension, which preserves the pre-rewrite URI. - Caught by the two new tests `cluster_routes_dispatch_per_graph_handle` and `cluster_route_for_unknown_graph_returns_404`. Tests (14 new, all passing): - Four-rule matrix: one test per branch + the joint case `mode_inference_cli_uri_overrides_graphs_map` + the empty-graphs-map error case. - Per-graph + server-level policy file path resolution. - Reserved `GraphId` rejection at startup. - End-to-end multi-graph routing: two graphs side by side, each cluster route hits the right engine. - Unknown graph id under cluster prefix → 404. - Flat routes 404 in multi mode. Inline `ServerConfig` test (`serve_refuses_to_start_in_state_1_without_unauthenticated`) and three `server_settings_*` tests updated to the new `mode` shape. Result: 211 server tests green (74 lib + 71 integration + 66 openapi), MR-731 regression test still pinned and passing. LOC: +45 config.rs, +281 lib.rs (net), +395 tests/server.rs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: Cedar resource-model refactor (PR 6a/10) PR 6a of the MR-668 multi-graph server work. Policy-crate-only refactor — no HTTP handler changes, no operator-supplied policy.yaml changes. Sets up the chassis that PR 6b's `GET /graphs` consumes. Two new `PolicyAction` variants: - `GraphCreate` — gates `POST /graphs` (deferred behavioral PR). - `GraphList` — gates `GET /graphs` (lands in PR 6b). Note: `GraphDelete` is intentionally NOT added in this PR. `DELETE /graphs/{id}` is deferred from MR-668's v0.7.0 scope to bound complexity (no `delete_prefix`, no tombstone, no `RegistryLookup::Tombstoned`). Adding the Cedar action without a consumer would be the same kind of "dead vocabulary" trap the `Admin` variant already documents. New `PolicyResourceKind { Graph, Server }` enum, plus a `PolicyAction::resource_kind()` method that classifies every action. Per-graph actions (Read, Change, BranchCreate, …) bind to `Omnigraph::Graph::"<graph_label>"`; server-scoped actions (GraphCreate, GraphList) bind to the singleton `Omnigraph::Server::"root"`. `Admin` stays classified as per-graph for now — MR-724 will pick the final shape when the first consumer surface ships. Cedar schema string additions: - `entity Server;` - `action "graph_create" appliesTo { principal: Actor, resource: Server, ... }` - `action "graph_list" appliesTo { principal: Actor, resource: Server, ... }` Compiler updates: - `compile_policy_source` picks the resource literal based on the action's `resource_kind`. Existing graph-only policies generate the same Cedar source as before — pinned by `per_graph_rules_continue_to_work_alongside_server_rules`. - `compile_entities` includes the `Server::"root"` entity only when a rule references a server-scoped action. Keeps test assertions for graph-only policies tight. - `PolicyEngine::authorize` builds the right resource UID at request time based on `request.action.resource_kind()`. Validation rules added to `PolicyConfig::validate`: - A rule may not mix server-scoped and per-graph actions (different resource kinds need different `permit` clauses). - Server-scoped actions cannot have `branch_scope` or `target_branch_scope` — there's no branch context at the server level. Operator impact: zero. The Cedar schema `Omnigraph::Server` entity is internally referenced by `compile_policy_source`; operator policy.yaml files only declare actions in `rules[].allow.actions` and never reference the resource entity directly. Decision 6's "internal rename only; operator policies unaffected" contract is preserved and pinned by `per_graph_rules_continue_to_work_alongside_server_rules`. Tests: 5 new (11 policy tests total, up from 6): - `graph_list_action_authorizes_against_server_resource` - `graph_create_action_authorizes_against_server_resource` - `server_scoped_rule_cannot_use_branch_scope` - `rule_mixing_server_and_per_graph_actions_is_rejected` - `per_graph_rules_continue_to_work_alongside_server_rules` No regression: 145 server tests (74 lib + 71 integration) still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: GET /graphs endpoint + per-graph policy wire-up (PR 6b/10) PR 6b of the MR-668 multi-graph server work. First management endpoint — `GET /graphs` lists every graph registered with the server, gated by the server-level Cedar policy from PR 6a. New API shapes (in `omnigraph-server::api`): - `GraphInfo { graph_id, uri }` — one entry per registered graph. - `GraphListResponse { graphs: Vec<GraphInfo> }` — sorted alphabetically by `graph_id` for deterministic output. Handler `server_graphs_list`: - Mounted at `GET /graphs` in both modes. - Single mode: returns 405 (resource exists in the API surface, just not operational without a `graphs:` map). 405 chosen over 404 so clients see "resource exists, wrong context" rather than "no such resource". - Multi mode: requires bearer auth (when configured); Cedar-gated by `PolicyAction::GraphList` against `Omnigraph::Server::"root"` (PR 6a's chassis). Returns the sorted registry list. Cedar gate composition: - When no `server.policy.file` is configured, the MR-723 default-deny falls through: `GraphList` is not `Read`, so an authenticated actor without a server policy gets 403. This is the right default — don't expose the registry until the operator explicitly authorizes it. - When a server policy is configured, Cedar evaluates the rule. The test `get_graphs_with_server_policy_authorizes_per_cedar` pins the admin-allow / viewer-deny split. Routing: - New `management` sub-router holding `/graphs` (auth-required, no `resolve_graph_handle` middleware — operates on the registry, not a single graph). - Single mode merges flat protected routes + management. - Multi mode merges nested `/graphs/{graph_id}/...` + management. OpenAPI: - `server_graphs_list` registered in `ApiDoc::paths(...)`. - `EXPECTED_PATHS` in `tests/openapi.rs` gains `/graphs`. - `openapi.json` regenerated (auto-tracked by `openapi_spec_is_up_to_date` in CI). Tests: 4 new in `tests/server.rs::multi_graph_startup`: - `get_graphs_lists_registered_graphs_in_multi_mode` - `get_graphs_returns_405_in_single_mode` - `get_graphs_requires_bearer_auth_when_configured` - `get_graphs_with_server_policy_authorizes_per_cedar` What's NOT in this PR (deferred): - Per-graph policy enforcement is wired through `handle.policy` (PR 4a already did this); PR 6b doesn't add new per-graph behavior beyond making sure the server policy lookup composes cleanly alongside it. - `POST /graphs` (PR 7) and `DELETE /graphs/{id}` (out of scope for v0.7.0). - CLI `omnigraph graphs list` (PR 8 will add). Result: 215 server tests green (74 lib + 66 openapi + 75 integration), 11 policy tests green. MR-731 spoof regression preserved across all this work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: POST /graphs runtime create endpoint (PR 7/10) PR 7 of the MR-668 multi-graph server work. Operators can now add a graph to a running multi-graph server without restarting: curl -X POST http://server/graphs \ -H "Content-Type: application/json" \ -d '{ "graph_id": "beta", "uri": "/data/beta.omni", "schema": { "source": "node Person { name: String @key }\n" }, "policy": { "file": "./policies/beta.yaml" } }' DELETE remains deferred (out of v0.7.0 scope per the trimmed plan — no `delete_prefix`, no tombstones). Body shape (decision 7): - Nested `schema: { source: "..." }` (mirrors the `policy: { file }` pattern; leaves room for future fields without breakage). - Optional nested `policy: { file: "..." }` for per-graph Cedar. - 32 MiB body limit (reuses `INGEST_REQUEST_BODY_LIMIT_BYTES`). - Asymmetric with `SchemaApplyRequest` which keeps flat `schema_source: String` — documented in api.rs. Atomic YAML rewrite + drift detection: - New `config::rewrite_atomic(path, new_config, expected_hash)`: flock → re-read + hash check → serialize → write `.tmp` → fsync → rename → fsync parent dir. Returns the new hash for the caller to update its in-memory baseline. - New `config::hash_config_file(path)` — SHA-256 of the on-disk bytes, used at startup and after each rewrite. - New `RewriteAtomicError { Drift | Io | Serialize }` enum. - `AppState.config_hash: Option<Arc<Mutex<[u8;32]>>>` carries the in-memory baseline. Updated after every successful rewrite so subsequent POSTs don't false-trigger drift. - The mutex is `std::sync::Mutex` (brief critical section, no .await inside). The flock itself serializes file access process-wide AND across multiple server instances (defense in depth). - All sync I/O runs inside `tokio::task::spawn_blocking` — flock is sync. Handler ordering (the load-bearing sequence): 1. Mode check: 405 in single mode. 2. Cedar authorize: `GraphCreate` against `Omnigraph::Server::"root"`. 3. Validate body: `GraphId::try_from` (regex + reserved-name), empty schema/uri checks, per-graph policy file parse. 4. Pre-check registry for duplicate graph_id / duplicate uri (409). 5. `Omnigraph::init` the new engine. 6. Atomic YAML rewrite (drift detection inside). 7. Publish in registry (atomic re-check via `GraphRegistry::insert`). Failure modes (documented in handler rustdoc): - Init fails → orphan storage at `req.uri` (PR 2a cleans up schema files; Lance datasets remain orphans until `delete_prefix` lands). - YAML rewrite fails (drift, IO) → orphan storage; YAML unchanged. - Registry insert fails (race) → YAML has entry but registry doesn't; next restart opens it cleanly. New dependency: `fs2 = "0.4"` (workspace + omnigraph-server). POSIX-only file locking. Linux/macOS deployment supported; Windows out of scope. Tests (10 new in `tests/server.rs::multi_graph_startup`): - `post_graphs_creates_a_new_graph_end_to_end` — happy path, includes YAML inspection to confirm the rewrite landed. - `post_graphs_baseline_hash_updates_between_rewrites` — two POSTs in a row both succeed (drift baseline updates correctly). - `post_graphs_duplicate_graph_id_returns_409` - `post_graphs_duplicate_uri_returns_409` - `post_graphs_invalid_graph_id_returns_400` (reserved name) - `post_graphs_empty_schema_source_returns_400` - `post_graphs_returns_405_in_single_mode` - `post_graphs_yaml_drift_detection_returns_503` — operator hand-edits omnigraph.yaml; server refuses to clobber. - `hash_config_file_is_deterministic_and_detects_changes` - `rewrite_atomic_refuses_when_hash_drifts` OpenAPI: `server_graphs_create` registered in `ApiDoc::paths(...)`; openapi.json regenerated. Result: 225 server tests green (74 lib + 66 openapi + 85 integration), all MR-731 regressions still pinned. LOC: ~580 lib.rs net (handler + helpers), ~120 config.rs (rewrite machinery), +71 api.rs (request/response shapes), +332 tests/server.rs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: CLI omnigraph graphs list/create (PR 8/10) PR 8 of the MR-668 multi-graph server work. CLI parity for the v0.7.0 management surface: operators can now manage graphs from the command line against a running multi-graph server. omnigraph graphs list --target dev --json omnigraph graphs create \ --target dev \ --graph-id beta \ --graph-uri /data/beta.omni \ --schema schema.pg DELETE is intentionally absent — server-side DELETE was deferred from v0.7.0 scope, and shipping a client subcommand for a server endpoint that doesn't exist would be dead vocabulary. The help output, the subcommand enum, and the test that pins it (`graphs_subcommand_help_ lists_list_and_create`) all agree. CLI architecture (modeled on `BranchCommand`): - New `Command::Graphs { command: GraphsCommand }` top-level variant. - `GraphsCommand { List, Create }` enum. - List: GET `<base>/graphs`. Stdout is `<graph_id>\t<uri>` per line, or JSON via `--json`. - Create: reads `--schema <path>` from local disk, inlines as `schema: { source: <file> }` in the POST body (nested per MR-668 decision 7). Optional `--policy-file <path>` becomes `policy: { file: <path> }`. Returns 201 → "created graph X at Y" or JSON via `--json`. - Both subcommands reject local URI targets with a clear "remote multi-graph server URL" error. New API type imports in the CLI: `GraphCreateRequest`, `GraphCreateResponse`, `GraphListResponse`, `GraphSchemaSpec`, `GraphPolicySpec` — all from `omnigraph-server::api`. Tests: - cli.rs (4 new, non-network): * `graphs_subcommand_help_lists_list_and_create` — pins the deferral of `delete` (catches scope creep). * `graphs_list_against_local_uri_errors_with_remote_only_message` * `graphs_create_against_local_uri_errors_with_remote_only_message` * `graphs_create_with_missing_schema_file_errors` — pins the IO context in the schema-read error path. - system_remote.rs (1 new, `#[ignore]` like its peers): * `graphs_list_and_create_against_multi_graph_server` — spawns a multi-mode server, calls `graphs list` (sees `alpha`), `graphs create` (adds `beta`), `graphs list` again (sees both), and confirms the new graph is reachable via its cluster route. CLI suite: 62 tests green (58 existing + 4 new). The new ignored end-to-end test runs locally with `cargo test --ignored`. LOC: +159 main.rs (enum + handlers), +88 cli.rs (unit tests), +131 system_remote.rs (integration test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: composite e2e tests, race fix, v0.7.0 release (PR 9/10) PR 9 — the final integration PR for MR-668 multi-graph server work. Closes the v0.7.0 release. Composite lifecycle tests (closes gaps flagged in PR 7's coverage review): - `multi_graph_lifecycle_post_query_restart_persistence` — POST a graph, query it via cluster route, reload the config from disk and confirm `load_server_settings` sees the rewritten YAML. Validates the "restart resolves orphans" failure-mode story. - `per_graph_policy_enforced_on_post_created_graph` — POST a graph with a per-graph policy attached, then send authenticated read and change requests. Per-graph Cedar enforcement fires correctly on a POST-created graph (engine-layer policy reinstalled via `Omnigraph::with_policy` inside the create flow). - `concurrent_post_graphs_distinct_ids_all_succeed` — 4 concurrent POSTs with distinct graph_ids all return 201. Caught a real race in `rewrite_atomic` (see below). Race fix — `rewrite_atomic_with_modify`: The first composite test surfaced a real bug. The old `rewrite_atomic(path, new_config, expected_hash)` captured the baseline hash OUTSIDE the flock, then called rewrite_atomic which re-acquired it inside. Under concurrent writers: - POST A: captures baseline H0, calls rewrite_atomic. - POST B: captures baseline H0 too (before A's update lands). - A: acquires flock, on-disk == H0, writes H1, releases. - A: updates baseline H0 → H1. - B: tries to acquire flock — waits. - B: acquires flock. On-disk is now H1. Expected (captured before A finished) is H0. MISMATCH → spurious Drift error. Worse: even if the timing happens to align, B's `updated` config was constructed from BYTES read before the flock. B writes a config that doesn't include A's new graph — silent data loss. The fix: new `config::rewrite_atomic_with_modify(path, baseline, modify)` takes a closure. Inside the flock + baseline mutex: 1. Read on-disk bytes, hash, compare to baseline. 2. Parse on-disk YAML. 3. Call `modify(parsed)` to produce the new config — receives fresh on-disk state, returns the modification. 4. Serialize + write + fsync + rename + update baseline. Everything is read-modify-write under the same critical section. Concurrent writers serialize cleanly. Test confirmed this is no longer a race. The old `rewrite_atomic(path, new_config, expected_hash)` API stays for tests that don't need the read-modify-write shape; the POST handler switches to the new shape. Version bump v0.6.0 → v0.7.0: - All 5 `crates/*/Cargo.toml` (compiler, engine, policy, cli, server) plus their inter-crate `path` dep version constraints. - `Cargo.lock` regenerated by `cargo build --workspace`. - `AGENTS.md` "Version surveyed" line, capability matrix HTTP-server row updated to mention multi-graph + cluster routes + atomic YAML rewrite. - `openapi.json` regenerated. Docs: - `docs/releases/v0.7.0.md` (new) — release notes with breaking changes, new features, deferred items (DELETE, `delete_prefix`, actor forwarding), and the single→multi migration recipe. - `docs/user/server.md` — substantial section additions for the two modes, mode inference, cluster endpoint table, management endpoints, `omnigraph.yaml` ownership contract, `POST /graphs` body shape + status codes. - `docs/user/cli.md` — `omnigraph graphs list/create` section, deferred-DELETE note. - `docs/user/policy.md` — server-scoped Cedar actions (`graph_create`, `graph_list`), per-graph vs server-level policy composition, example server-level policy. Workspace test pass: 573 tests green across all crates. Zero failures. MR-731 spoof regression still pinned and passing across the entire 10-PR series. This commit closes MR-668. v0.7.0 is ready for tagging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: remove POST /graphs and CLI graphs create (defer runtime graph mgmt) The POST /graphs runtime-create endpoint shipped in PR 7/10 has three unresolved high-severity bugs: - flock-on-renamed-inode race: the YAML flock is taken on omnigraph.yaml itself, then a temp file is renamed over it. Cross-process writers end up locking different inodes — both believing they hold exclusive access. - duplicate-check outside the file lock: precheck runs against the in-memory registry only; the locked closure does config.graphs.insert(...) unconditionally. Concurrent same-id POSTs can persist the loser in YAML while the in-memory registry keeps the winner — they disagree after restart. - best_effort_cleanup_init_artifacts deletes _schema.pg / _schema.ir.json / __schema_state.json on any init failure. An accidental re-init against an existing graph's URI destroys its schema; subsequent open() fails at read_text(_schema.pg). The correct fix is a Lance-style cluster catalog (reserve → init → publish with recovery sidecars), parallel to the engine's existing __manifest discipline. That work is out of scope for v0.7.0. For now, disable runtime add/remove from the network and CLI surface. Operators add graphs by editing omnigraph.yaml and restarting. The GET /graphs read-only enumeration stays. Removed: - POST /graphs handler + router fragment + utoipa registration - 13 post_graphs_* server tests + 3 composite POST tests + multi_mode_app_with_real_config / post_graph helpers - CLI omnigraph graphs create subcommand + its handler + cli.rs tests - system_remote.rs combined list+create test trimmed to list-only - YAML rewrite infra: rewrite_atomic[_with_modify], RewriteAtomicError, staging_path, hash_config_file, AppState::config_hash field + threading through new_multi and open_multi_graph_state - fs2 dependency (verified absent from cargo tree) - sha2/fs2 imports in config.rs (only the rewrite path used them) - Cedar PolicyAction::GraphCreate variant + "graph_create" match arms + action def in Cedar schema + graph_create_action_authorizes_against_server_resource test - GraphCreateRequest / GraphCreateResponse / GraphSchemaSpec / GraphPolicySpec API types (only the POST handler / CLI imported them) Kept: - GET /graphs (read-only enumeration) and graph_list Cedar action - omnigraph graphs list CLI subcommand - All multi-graph startup, mode inference, cluster routes, per-graph + server-level Cedar policies - server_settings_drive_multi_graph_startup_end_to_end (the test that covers operator-authored YAML + restart — the path that survives) - best_effort_cleanup_init_artifacts and the three init failpoints (still reachable from CLI `omnigraph init`; preflight fix deferred as a follow-up) - GraphRegistry::insert and its concurrency tests — production callers gone, but the method is the natural seam for the future cluster-catalog work Also fixed (transcript issue 4): - ALWAYS_FLAT_PATHS now includes /graphs so multi-mode OpenAPI advertises the management route correctly (was previously rewritten to /graphs/{graph_id}/graphs) - multi_mode_openapi_keeps_healthz_flat → renamed to multi_mode_openapi_keeps_management_paths_flat, asserts both /healthz and /graphs stay flat - multi_mode_openapi_prefixes_operation_ids_with_cluster skips /graphs in addition to /healthz Doc fixes: - docs/user/cli.md: graphs list example was --target http://..., but --target is a config-graph-name lookup; corrected to --uri. Removed the graphs create example. - docs/user/server.md: dropped POST /graphs row, "omnigraph.yaml ownership", and "POST /graphs body shape" sections. Added a paragraph stating runtime add/remove is not exposed in v0.7.0. - docs/user/policy.md: dropped graph_create action; reworded the "Configuration" line to clarify that server-scoped rules (graph_list) take neither branch_scope nor target_branch_scope. - docs/releases/v0.7.0.md: rewrote release narrative — multi-graph mode ships; runtime add/remove deferred. - AGENTS.md: HTTP server bullet and capability matrix row updated to reflect read-only GET /graphs and the operator-edit workflow. - openapi.json regenerated; /graphs has only .get, no .post. Diff: 17 files, +123 −1525 LOC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: comment cleanup and policy format style Strip "PR Na/Nb" sub-PR references throughout MR-668 surfaces — they were useful during the 10-PR delivery sequence but rot now that the work is in the tree. Keep the MR-668 umbrella references. Also: - Add explicit `when = when` and `resource_literal = resource_literal` named args in `compile_policy_source`'s outer `format!` to match the surrounding crate style (already explicit for `group` and `action`). - Rename the best-effort cleanup tracing target from "omnigraph::init" to "omnigraph::init::cleanup" so operators can filter init-failure cleanup events separately from init's other log lines. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: drop actor_id from PolicyRequest; pass actor as separate arg The MR-731 "server-authoritative actor identity" invariant was enforced by an in-function chokepoint (`request.actor_id = actor.actor_id...` overwrite inside `authorize_request`). That worked but relied on every caller passing in a `PolicyRequest` and trusting the overwrite — a comment-enforced invariant. Move the invariant into the type system: * `PolicyRequest` no longer carries `actor_id`. The struct now models what a caller wants to do, not who they are. * `PolicyEngine::authorize(actor_id: &str, request: &PolicyRequest)` and `validate_request(actor_id, request)` take identity as a separate argument. The same shape `PolicyChecker::check` already had for the engine layer. * `authorize_request` in the HTTP layer extracts `actor_id` from the bearer-resolved `ResolvedActor` and passes it positionally — no overwrite step that could be skipped. * CLI `omnigraph policy explain` updated (the only other consumer that built a `PolicyRequest`). Public API break for the `omnigraph-policy` crate. Worth it: handlers can no longer accidentally populate `actor_id` from a request body field, and external consumers are forced by the compiler to source actor identity from a trusted path. The MR-731 chokepoint test `actor_id_resolves_from_bearer_token_ignoring_client_supplied_headers` still passes — the bearer-resolved actor is what reaches the engine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: consolidate AppState single-mode constructors; delete with_policy_engine The prior `with_policy_engine` constructor reused the engine `Arc` from the existing handle (`engine: Arc::clone(&existing.engine)`) without re-applying `Omnigraph::with_policy`. Combined with `new_with_workload`, the documented composition pattern was `AppState::new_with_workload(...).with_policy_engine(p)` — which produced an `AppState` whose HTTP layer enforced Cedar but whose underlying engine had no `PolicyChecker` installed. Any caller reaching the engine via `state.registry().list()[i].engine` could bypass policy entirely. The doc comment named this gap; the type system didn't. Make composition impossible to get wrong: * Add `AppState::new_single(uri, db, tokens, Option<PolicyEngine>, WorkloadController)` — canonical single-mode constructor that takes every option together and routes through `build_single_mode` (which applies `db.with_policy(checker)` to the engine itself). * `new`, `new_with_bearer_token`, `new_with_bearer_tokens`, `new_with_bearer_tokens_and_policy`, `new_with_workload` all become thin wrappers around `new_single`. * Delete `with_policy_engine`. There is no post-construction policy install path any more; the single linear construction forces HTTP-layer and engine-layer policy to install together or not at all. Regression test `engine_layer_policy_fires_via_direct_arc_omnigraph_from_new_single` constructs an `AppState::new_single` with a deny-all policy, pulls the `Arc<Omnigraph>` from the registry handle (the same path an embedded SDK consumer would take), and asserts a direct `mutate_as` call returns `OmniError::Policy`. Pre-fix this test would have succeeded the mutation. Test caller in `ingest_per_actor_admission_cap_returns_429` migrates from `.with_policy_engine(...)` to `new_single(..., Some(policy_engine), workload)`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: derive any_per_graph_policy on RegistrySnapshot; simplify dup check `AppState::requires_bearer_auth` walked the entire registry per request (cloning Arcs into a `Vec`, then `.iter().any(|h| h.policy .is_some())`) to decide whether the auth middleware should challenge. The walk is unnecessary — the answer only changes when the registry mutates, which is exactly the moment a new snapshot is constructed. Move the flag onto the snapshot itself: * `RegistrySnapshot { graphs, any_per_graph_policy: bool }`. * `RegistrySnapshot::new(graphs)` is the only construction path — it derives the flag from `graphs.values().any(|h| h.policy .is_some())` so the cached value can't drift from the source data. * `Default` delegates to `new(HashMap::new())`. * `GraphRegistry::from_handles` and `insert` build snapshots via `RegistrySnapshot::new(...)`. * `GraphRegistry::snapshot_ref()` exposes the current snapshot through an `arc_swap::Guard`; callers that need cached derived state go through this accessor (callers that only want `graphs` still use `list` / `get`). `requires_bearer_auth` becomes one `ArcSwap::load` + bool read. Also (drive-by, same file, same hunk): replace the dead `if let Some(other) = seen_uris.get(...)` + `let _ = other;` pattern in `from_handles` with a plain `seen_uris.contains_key(...)`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: fail-fast multi-graph startup with try_collect The `open_multi_graph_state` doc comment claims "Fail-fast — the first open error aborts startup; other in-flight opens are dropped" but the code did .buffer_unordered(4) .collect::<Vec<_>>() .await .into_iter() .collect::<Result<Vec<_>>>()?; which drains every future in the stream before propagating the first `Err`. With N S3-backed graphs and graph #2 failing fast, the caller still waits for #1, #3, #4, … to either succeed or fail before seeing the error. Replace the four-line dance with `futures::TryStreamExt::try_collect`, which short-circuits on the first `Err` and drops the rest. The doc comment now matches behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: drop unused State extractor from 7 read-only handlers After the routing-middleware refactor moved the engine into the per-graph `GraphHandle` (extracted via `Extension<Arc<GraphHandle>>`), seven read-only handlers — `server_snapshot`, `server_read`, `server_export`, `server_schema_get`, `server_branch_list`, `server_commit_list`, `server_commit_show` — kept an unused `State(_state): State<AppState>` extractor. Drop it. Each request avoids one `FromRequestParts` clone of `AppState`'s Arcs. Handlers that actually use state (workload admission for write paths, `server_policy` for management endpoints) keep theirs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: emit info! for graph routing decision `tracing::Span::current().record("graph_id", ...)` in the routing middleware silently no-ops here: no upstream `#[tracing::instrument]` on the handlers declares a `graph_id` field, and `TraceLayer::new_for_http` doesn't either. The recorded value never lands anywhere visible. Replace with an explicit `info!(graph_id = %handle.key.graph_id, "graph routed")` event so operators can grep logs and correlate requests with the active graph. In single mode the value is the sentinel `"default"`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: align GET /graphs 405 body code with HTTP status The single-mode `GET /graphs` handler returned an `ApiError` built via struct literal with `status: METHOD_NOT_ALLOWED, code: BadRequest`. The body code disagreed with the HTTP status — clients deserializing on `code` saw `bad_request`, clients deserializing on `status` saw 405. Same bug class as the earlier 503+Conflict mismatch on the removed YAML drift path. Close the class for this one remaining instance: * Add `ErrorCode::MethodNotAllowed` to the API enum. * Add `ApiError::method_not_allowed(msg)` — pairs the 405 status with the matching code. * Replace the struct literal in `server_graphs_list` with the constructor. * Regenerate `openapi.json` (adds `method_not_allowed` to the ErrorCode schema enum). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: drop unused axum::handler::Handler import The import landed in earlier work but no current call site uses it. Emitted an `unused_imports` warning on every server build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: drop unused fs2 workspace dependency `fs2 = "0.4"` lingered in [workspace.dependencies] after the POST /graphs flock-on-rename design was pulled. `cargo tree -i fs2` reports no consumers in the workspace and the dep is not in Cargo.lock. Removing the declaration closes the "phantom dep" class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: AGENTS.md Cedar row no longer hardcodes action count The "8 actions" claim drifted as soon as MR-668 added `graph_list`. Bumping the count would just push the drift one PR forward; the correct-by-design fix is to defer to the canonical list in docs/user/policy.md and stop maintaining a duplicate count. Closes the "doc hardcodes a count that drifts from the enum" class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: cfg(test)-gate GraphRegistry::insert and its mutex `insert` and the `mutate: Mutex<()>` that serializes it had no runtime consumer in v0.7.0 — the only insertion path at startup is `from_handles`, and runtime add/remove is deferred until a managed cluster catalog ships. Leaving both `pub` and live made them a "looks like API, isn't" footgun: a future change could build on `insert` without re-establishing the concurrency contract with an actual consumer in scope. Gate both together (`#[cfg(test)]` on the method, the field, and the `tokio::sync::Mutex` import) so the race-pinning tests still compile but production cannot reach them. When a real consumer ships, ungate both — they're a unit. Closes the "public API with no runtime consumer drifts toward incorrect" class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: drop vestigial PolicyEngine surface * `validate_request` had zero callsites — pure surface for nothing. * `deny`'s `_actor_id` and `_request` parameters were both unused (the underscore prefix gave it away); the message is built by the caller before `deny` ever sees the request. Trim both. Closes the "public API that the type system can't justify" class for the policy engine. No behavior change; every existing test stays green because the deletions never had a runtime effect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: regression test for init re-init footgun (red) A second `Omnigraph::init` against an existing graph URI today destroys the existing graph's schema artifacts. `init_storage_phase` overwrites `_schema.pg` before any preflight, and on the inner `GraphCoordinator::init` failure that follows, `best_effort_cleanup_init_artifacts` deletes all three schema files. The existing Lance datasets and `__manifest/` survive but the schema metadata is gone — unrecoverable without operator surgery. This test exercises that path and currently fails with "_schema.pg must not be deleted by a failed re-init", confirming the destructive cleanup branch fires. The fix in the next commit makes the test pass by preflighting with `storage.exists()` and returning a typed error before any write touches disk. Per AGENTS.md rule 12, the test commit lands just before the fix commit so the red → green pair is visible in `git log` and a reviewer can check out this commit alone to reproduce. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: close init re-init footgun via InitOptions preflight (green) `Omnigraph::init` is "create a new graph"; existing graphs need an explicit overwrite. Today's behavior — silently overwrite schema files, then on inner failure delete them via best-effort cleanup — is destructive against an existing graph regardless of which branch fires. Correct-by-design fix: * New `InitOptions { force: bool }` struct (default `force: false`). * New `Omnigraph::init_with_options(uri, schema, options)`. The old `Omnigraph::init(uri, schema)` is a thin shortcut that passes `InitOptions::default()`. * `init_with_storage` runs a `storage.exists()` preflight on the three schema URIs BEFORE any parse, write, or coordinator call. Any hit → typed `OmniError::AlreadyInitialized { uri }`. The destructive code paths (the `write_text` overwrite and the best-effort cleanup) are now unreachable in strict mode against an existing graph. * `force: true` skips the preflight; existing operators who actually mean to overwrite opt in explicitly. * CLI: `omnigraph init --force` maps to `InitOptions { force: true }`. * HTTP: `OmniError::AlreadyInitialized` maps to 409 via `ApiError::from_omni`. Not currently HTTP-reachable (POST /graphs was pulled), but the wiring lands here so a future runtime create endpoint has one canonical translation. Closes the "init is destructive against existing state" class. The regression test added in the previous commit (`init_on_existing_graph_uri_does_not_destroy_existing_schema`) turns green: the original schema files now survive a second init attempt byte-for-byte, and the call errors cleanly with `AlreadyInitialized`. The four existing `init_failpoint_after_*_cleans_up_*` tests stay green — strict mode's preflight passes on a fresh tempdir, and cleanup still runs as before when a failpoint fires mid-write. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: split PolicyEngine::load into kind-typed loaders Pre-fix, every caller of `PolicyEngine::load(path, graph_id)` passed *some* `graph_id` argument — even when the policy was server-scoped and Cedar's resolution would never touch a Graph entity. The server-level loader at lib.rs passed the meaningless sentinel `"server"`. A graph policy file containing a `graph_list` rule compiled fine; a server policy file containing a `read` rule compiled fine. Both silently no-op'd at request time because the engine kind and the rule's resource kind disagreed. Correct-by-design fix: replace `load` with two kind-typed loaders. * `PolicyEngine::load_graph(path, graph_id)` — for per-graph policy files. Rejects any rule whose action `resource_kind()` is `Server`. * `PolicyEngine::load_server(path)` — for server-level policy files. Takes no `graph_id`: server-scoped actions resolve against the singleton `Omnigraph::Server::"root"` entity, never a Graph. Rejects any rule whose action `resource_kind()` is `Graph`. The old `load` is hard-deleted in the same commit because every in-tree consumer migrates here (no semver promise on the workspace crate, no external pinners). New `PolicyEngineKind` enum types the loader's intent; `validate_kind_alignment` is the load-time check that closes the "wrong action, wrong file, silent no-op" class — operators get a load-time error instead of confused-and- silent behavior at request time. Callsites migrated: * server lib.rs:374 (single-mode per-graph) → load_graph * server lib.rs:1065 (multi-mode server) → load_server * server lib.rs:1103 (multi-mode per-graph) → load_graph * CLI main.rs:732 (resolve_policy_engine) → load_graph * tests/server.rs ×5 (4 graph, 1 server) → load_graph/load_server * policy_engine_chassis.rs → load_graph Four new in-source tests pin the contract: both rejection paths and both positive paths. Closes the "operator puts an action in the wrong file and the rule silently never matches" class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: introduce GraphRouting, retire single_mode_handle Pre-fix, `AppState` always carried `Arc<GraphRegistry>` even when serving one graph. Single mode populated the registry with one handle keyed by the `SINGLE_GRAPH_KEY_ID = "default"` sentinel; `single_mode_handle` walked the registry, asserted `len == 1`, and returned the single element with a 500-class "programmer error" branch on mismatch. Three smells in a row — magic key, walk-and-assert, programmer-error guard — all because the single-mode runtime was forced through a multi-mode abstraction. Correct-by-design fix: type the routing. * New `pub enum GraphRouting { Single { handle }, Multi { registry, config_path } }` on `AppState`. The `Single` arm carries the handle directly — no registry, no key, no walk. * `resolve_graph_handle` middleware matches on `routing`. Single mode returns the handle in O(1); multi mode does the same path-extract + registry lookup as before. The 500-class programmer-error branch is gone — the type system now makes the violated invariant ("single mode has exactly one handle") unrepresentable. * `requires_bearer_auth` reads `handle.policy.is_some()` directly in the Single arm; Multi arm still uses the cached `any_per_graph_policy` flag. `ServerMode` and the legacy `registry` field on `AppState` are still populated for now — C-3 removes both once every reader is migrated. The `SINGLE_GRAPH_KEY_ID` sentinel and `ServerMode` will also go away in C-3. Closes the "single mode forced through a multi-mode abstraction" class. All 76 server integration tests stay green: handlers still extract `Extension<Arc<GraphHandle>>` from the request, so the middleware's internal change is invisible to them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: remove ServerMode, registry field, and the SINGLE_GRAPH sentinel C-1/C-2 introduced `GraphRouting` and pointed the middleware at it. This commit removes the legacy shape that's now dead: * `ServerMode` enum — deleted. Single mode's `uri` lives on `handle.uri`; multi mode's `config_path` lives on the `GraphRouting::Multi` arm. * `AppState.mode: ServerMode` field — deleted. * `AppState.registry: Arc<GraphRegistry>` field — deleted. Multi mode's registry is on `GraphRouting::Multi { registry, .. }`; single mode has no registry at all. * `AppState::mode()`, `AppState::uri()`, `AppState::registry()` accessors — deleted. New `AppState::routing() -> &GraphRouting` is the single public entry point. * `SINGLE_GRAPH_KEY_ID` constant — deleted. `GraphHandle.key` is still required by the struct, but in single mode the key is now only a tracing label (`"default"`, inlined with a comment naming its sole remaining purpose). Single-mode flat routes never carry a `{graph_id}` parameter, so the key is never compared against user input, and there is no registry where it could be a map key. C-1/C-2 already removed the registry walk that the sentinel was named for. Callers migrated: * `build_app` (lib.rs:944) — matches on `state.routing()` instead of `state.mode()`. * `server_graphs_list` (lib.rs:1162) — destructures the Multi arm to get the registry; Single arm short-circuits to 405. * `server_openapi` (lib.rs:1217) — matches the Multi arm for the cluster-prefix rewrite. * `tests/server.rs:3735` — the B2 footgun regression test now matches on `state.routing()` to extract the single-mode handle (the test's earlier `state.registry().list().next()` shape was the closest pre-fix analog to "embedded consumer reaches the engine"; the new shape is more direct). Closes the entire "single mode forced through a multi-mode abstraction" class. After this commit: * No magic sentinel as a routing key. * No `single_mode_handle` walk-and-assert helper. * No 500-class "programmer error" branch in the middleware. * No two-field discriminant on `AppState` where one would do. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: regression test for nested-route path extraction (red) `server_branch_delete` and `server_commit_show` use bare `Path<String>` extractors. In single-mode flat routes (`/branches/{branch}`, `/commits/{commit_id}`) this works — one capture, one value. In multi-graph cluster routes (`/graphs/{graph_id}/branches/{branch}`, `/graphs/{graph_id}/commits/{commit_id}`) axum 0.8 propagates the outer `{graph_id}` capture into the inner handler, so the extractor sees two captures and 500s with "Wrong number of path arguments. Expected 1 but got 2." `cluster_routes_dispatch_per_graph_handle` only exercises `/snapshot` (no Path extractor), so the regression slipped through. This test closes that gap structurally: every cluster route with an inner path param gets exercised here. Currently fails with the exact symptom above. Fix in the next commit makes it pass. Per AGENTS.md rule 12, the red test commit lands just before the fix so the pair is visible in `git log` and a reviewer can check out this commit alone to reproduce. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: named-field path-param structs for nested cluster routes (green) `Path<String>` deserializes one path-param value positionally. Single-mode flat routes (`/branches/{branch}`, `/commits/{commit_id}`) have one capture; multi-mode nested routes (`/graphs/{graph_id}/branches/{branch}`, `/graphs/{graph_id}/commits/{commit_id}`) have two — axum 0.8 propagates the outer capture into nested handlers. Same handler, two different shapes; the multi-mode shape 500s with "Wrong number of path arguments. Expected 1 but got 2." Symptomatic fix: change to `Path<(String, String)>` and ignore the first element. Breaks again the moment we add another nest layer (e.g. tenant in Cloud mode). Correct-by-design fix: named-field structs deserialized by name from axum's path-param map. Each handler picks only the fields it needs. Stable across single / multi / future-cloud nest depths because deserialization is by field name, not position. * New `BranchPath { branch: String }` (file-local to lib.rs) * New `CommitPath { commit_id: String }` * `server_branch_delete` extractor → `Path<BranchPath>` * `server_commit_show` extractor → `Path<CommitPath>` Closes the "handler path-extractor type is positional and breaks when route nesting changes" class. Red test from the previous commit turns green. All 77 server tests pass (single-mode branch delete + commit show, plus new multi-mode coverage). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: centralize policy-requires-tokens check in the runtime classifier Single-mode `open_with_bearer_tokens_and_policy` bailed at lib.rs:380 when policy was installed and no tokens. Multi-mode `open_multi_graph_state` had no equivalent: the server started, every request 401'd because no token could ever match, and the operator spent time debugging a misconfiguration the single-mode path would have caught at startup. The doc/code contradiction made the gap easy to miss: the `ServerRuntimeState::PolicyEnabled` docstring said tokens-or-not was "unusual but valid — every request fails 401 without a bearer, which is effectively 'locked'." The single-mode bail contradicted that. In practice, silent-401-on-every-request is bug-shaped, not feature-shaped (operators wanting deny-all should configure tokens plus a deny-all Cedar rule to get meaningful 403s with policy-decision logging). Symptomatic fix: add a copy of the bail to multi-mode. Two copies that can drift again the next time a startup path is added. Correct-by-design fix: hoist the check into `classify_server_runtime_state` so both modes get the same enforcement from one source of truth. The classifier becomes the single source of truth for "should we start?" and adding a startup invariant there is now the natural extension point for any future mode. Classifier matrix is now complete: | has_tokens | has_policy | allow_unauthenticated | Result | |---|---|---|---| | F | F | F | bail (existing) | | F | F | T | Open (existing) | | T | F | * | DefaultDeny (existing) | | F | T | * | bail (NEW — closes the gap) | | T | T | * | PolicyEnabled (existing) | Changes: * `classify_server_runtime_state` (lib.rs:870-890) gains the `(false, true, _) => bail!(…)` arm with a clear message naming the failure mode and the two valid resolutions. * `open_with_bearer_tokens_and_policy` (lib.rs:369+) drops its redundant local bail — the classifier rejected the invalid case before construction was reached. * `ServerRuntimeState::PolicyEnabled` docstring is rewritten: drops the "(unusual but valid)" carve-out and states plainly that PolicyEnabled requires tokens. Names the explicit alternative (tokens + deny-all Cedar rule) for operators who want the all-requests-denied behavior. * `classify_policy_enabled_always_wins` test is renamed to `classify_policy_enabled_requires_tokens` and the now-invalid `(false, true, _)` assertion is removed (covered by the new rejection test). * New `classify_policy_without_tokens_is_rejected` test covers the new arm. * New `serve_refuses_to_start_with_policy_but_no_tokens_multi_mode` integration test pins the multi-mode propagation path — symmetric with the existing single-mode `serve_refuses_to_start_in_state_1_without_unauthenticated`. Closes the "single mode and multi mode startup branches can drift on safety invariants" class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: close coverage gaps surfaced by the test-coverage audit The bot-review pass and the subsequent coverage audit surfaced two material gaps in PR #119's test surface — both easy to close, both worth closing before merge. * **Gap 1 — cluster-route sweep.** The Bug-1 path-extractor regression slipped through because `cluster_routes_dispatch_per_graph_handle` only exercised `/snapshot`. The other six protected cluster routes (`/read`, `/change`, `/export`, `/schema`, `/schema/apply`, `/ingest`, `/branches/merge`) were implicitly trusted to work without any multi-mode integration test. Add `all_protected_cluster_routes_resolve_to_their_handler` (`tests/server.rs`) that hits each protected cluster route with a minimal request and asserts the response is consistent with the handler being reached — no 404 (router didn't match), no 500 with "Wrong number of path arguments" (Bug-1 class), no 500 with "missing extension" (routing middleware didn't inject the handle). Status code is a negative assertion because each handler's happy-path inputs differ; what matters is "the request reached the handler," not "the handler returned 200" — that's already pinned by the single-mode tests. * **Gap 2 — `--force` happy path.** The strict re-init regression test (`init_on_existing_graph_uri_does_not_destroy_existing_schema`) pins the error path; nothing pinned the `force: true` escape hatch actually doing what its docstring claims. Add `init_with_force_recovers_from_orphan_schema_files` (`tests/lifecycle.rs`). Writes a bare `_schema.pg` to simulate orphan files from a failed prior init, confirms strict mode bails as expected, then confirms `init_with_options(force: true)` succeeds and produces a functional graph. Note: the test follows the documented semantics — force skips the preflight only, it does NOT purge existing Lance state. An earlier draft of the test (against full overwrite of an existing populated graph) failed because `GraphCoordinator::init` errored on the existing `__manifest`, which is exactly the limitation the `InitOptions::force` docstring already calls out. Recursive purge needs `StorageAdapter::delete_prefix` (tracked separately). Coverage is now fully aligned with the PR's claims. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: regression test for GraphList open-mode bypass (red) Cursor bot's review at commit 4120448 surfaced that `server_graphs_list` returns 200 in Open mode (`--unauthenticated`, no tokens, no policy), exposing the full graph registry — graph IDs and URIs that may contain S3 bucket paths or internal hostnames — to any unauthenticated caller. Root cause: `authorize_request`'s no-policy fallback only denies when `actor.is_some()`. In Open mode `actor: None`, so the denial branch never fires and the call returns `Ok(())`. The docstring on `server_graphs_list` claims the endpoint is "Cedar-gated" and that we "don't leak the registry until the operator explicitly authorizes it" — but Open mode has no Cedar at all, so the docstring intent and the code disagree. This commit renames the existing `get_graphs_lists_registered_graphs_in_multi_mode` test to `get_graphs_denied_in_open_mode_without_server_policy` and flips the assertion from 200 → 403. Today this fails (server returns 200) — exactly the symptom the bot named. The fix in the next commit tightens the no-policy fallback to deny server-scoped actions unconditionally, regardless of mode. Per AGENTS.md rule 12, the red test commit lands just before the fix so the red → green pair is visible in `git log` and a reviewer can check out this commit alone to reproduce. Sort-order coverage that previously lived in the renamed test moves to `get_graphs_with_server_policy_authorizes_per_cedar` in the next commit, where the admin-200 response is operator- authorized and a non-empty body is asserted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: server-scoped actions always require explicit policy (green) `server_graphs_list` returned 200 in Open mode (`--unauthenticated`, no tokens, no policy) because `authorize_request`'s no-policy fallback only denied when `actor.is_some()` AND action != Read. In Open mode `actor: None`, so the denial branch never fired and the call returned `Ok(())` — leaking the registry (graph IDs + URIs that may contain S3 bucket paths or internal hostnames) to any unauthenticated caller. The docstring on `server_graphs_list` claimed it was "Cedar-gated" and that the server should "not leak the registry until the operator explicitly authorizes it" — docstring intent and code disagreed. Symptomatic fix: special-case GraphList. Breaks the moment another server-scoped action (`graph_create`, `graph_delete`) is added. Correct-by-design fix: tie authorization to the action's `resource_kind()`. Server-scoped actions (`PolicyResourceKind::Server`) always require explicit policy authorization — there is no runtime state where they're served by default. Per-graph actions keep the existing default-deny logic (DefaultDeny denies non-Read for authenticated actors; Open mode allows everything per the operator's `--unauthenticated` opt-in for graph DATA, but not for server topology). The fix uses the existing `PolicyResourceKind` enum that #119 already added — no new abstraction. Future server-scoped actions (runtime `graph_create`/`graph_delete` when the cluster catalog ships) automatically pick up the same enforcement without any per-action handler change. Changes: * `crates/omnigraph-server/src/lib.rs:51` — re-export `PolicyResourceKind` (the kind discriminator was already public on the omnigraph-policy crate; needed in scope here). * `crates/omnigraph-server/src/lib.rs:1457` — `authorize_request`'s no-policy fallback gains a server-scoped-action check that fires before the actor-based default-deny logic. Error message names the failure mode and points at `server.policy.file`. * `crates/omnigraph-server/tests/server.rs:5037` — `get_graphs_with_server_policy_authorizes_per_cedar` extended to register two graphs in non-alphabetical order and assert the admin-200 response is sorted alphabetically. Restores the sort-order coverage that lived in `get_graphs_lists_registered_graphs_in_multi_mode` before the red commit renamed it to assert denial. Also bundles a small adjacent cleanup that the bot-review flagged: * `crates/omnigraph-server/src/graph_id.rs:124` — drop the unreachable `"openapi.json"` entry from `is_reserved`. The regex `^[a-zA-Z0-9-]{1,64}$` rejects every dot-containing name before `is_reserved` can run, so dotted entries in this list were dead code that misled readers into thinking the list needed to cover them. Comment now names the structural exclusion. The `rejects_reserved_route_names` test loses its `openapi.json` row (covered by `rejects_dots` via the regex). Closes the "server-scoped management actions silently leak in Open mode" class. Red test from the previous commit (`get_graphs_denied_in_open_mode_without_server_policy`) turns green; all 78 server integration tests + 76 lib tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: fold multi-graph work into v0.6.0 (no separate v0.7.0 release) The branch had bumped workspace versions to 0.7.0 and added a dedicated `docs/releases/v0.7.0.md` for the multi-graph work. Per scope decision: ship the graph-rename and the multi-graph mode in one v0.6.0 release. Changes: * Workspace versions bumped 0.7.0 → 0.6.0 in every crate manifest (`omnigraph`, `omnigraph-compiler`, `omnigraph-policy`, `omnigraph-server`, `omnigraph-cli`) and their internal `path = ..., version = "..."` dependency constraints. * `docs/releases/v0.7.0.md` content merged into `docs/releases/v0.6.0.md`, retargeted to a single coherent v0.6.0 release note covering both the graph terminology rename and the multi-graph server mode. The original v0.7.0.md is deleted. * All `v0.7.0` / `0.7.0` doc and comment references throughout `crates/`, `docs/`, `AGENTS.md`, and `openapi.json` retargeted to `v0.6.0` / `0.6.0`. `Cargo.lock` regenerated to match. * OpenAPI spec regenerated via `OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test openapi openapi_spec_is_up_to_date` — `"version": "0.6.0"` now. Verification: * `cargo build --workspace` — clean (6 pre-existing engine warnings only). * `cargo test --workspace --locked` — zero failures across all 39 test result groups. * `bash scripts/check-agents-md.sh` — passes (34 links / 33 docs). * `grep -rn "0\.7\.0\|v0\.7\.0" --include='*.rs' --include='*.md' --include='*.json' --include='*.toml' .` returns no workspace hits. The three remaining `0.7.0` strings in `Cargo.lock` belong to unrelated 3rd-party crates (`pem-rfc7468`, `radium`, `rand_xoshiro`). The git tag and crates.io publish happen later — this commit just consolidates the surface so the eventual release is one coherent v0.6.0 covering all the work since v0.5.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mr-668: sanitize internal refs from v0.6.0 release notes cubic-dev-ai P2 comments flagged that the release notes carried internal Linear ticket and RFC references (MR-668, MR-731, MR-723, RFC 0003, RFC 0004). Per AGENTS.md maintenance rule 5, "Release docs are public project history. Describe capabilities, behavior changes, breaking changes, upgrade notes, and user impact; do not reference private ticket systems, internal codenames, or planning shorthand that an outside contributor cannot inspect." The bot's comments are correct against our own published contract — they were a docs-quality regression introduced when I drafted these notes. Replaced each internal reference with the public-facing concept it stood for. The substantive content (capabilities, behavior, guarantees) was already present alongside the refs; sanitization just trimmed the bracketed ticket labels: * Line 6: dropped `(MR-668)` from the multi-graph mode summary — the descriptive name was already self-sufficient. * Line 24: `MR-731 spoof defense` → `the bearer-derived-actor- identity guarantee`; `Forward-compat for Cloud mode (RFC 0003) and OAuth provider (RFC 0004)` → "forward-compat seams for future multi-tenant and OAuth deployments; they're inert in this release" — describes what the operator sees instead of pointing at planning docs. * Line 26: `MR-731's server-authoritative-actor invariant` → "the server-authoritative-actor invariant: actor identity is always sourced from the bearer-token match resolved at the auth boundary" — the public-facing statement of the guarantee. * Line 36: `(MR-723 default-deny otherwise rejects …)` → "without a server policy the default-deny posture rejects …" — same content, no ticket label. * Line 121: `MR-731 spoof regression test` → "The bearer-auth- derived-actor-identity regression test (client-supplied identity headers are ignored; the server-resolved actor is the only identity Cedar sees)" — describes what the test guards instead of naming the originating ticket. Verified: `grep -E 'MR-\d+|RFC[ -]?\d+' docs/releases/v0.6.0.md` returns no matches; the rest of `docs/releases/` is also clean. `scripts/check-agents-md.sh` passes. Note: cubic-dev-ai also flagged `crates/omnigraph-cli/src/main.rs:276` ("doc comment incorrectly references v0.6.0 for a command that only exists in v0.7.0"). That comment is based on a stale model of the release surface — after folding v0.7.0 into v0.6.0 in the previous commit, the multi-graph CLI surface IS in v0.6.0 and the comment is correct as written. No change needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: close validated init and multi-graph gaps * chore: address review cleanup comments --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-09 01:35:18 +02:00 · 2026-05-28 16:19:31 +02:00 · 2026-05-28 16:19:31 +02:00 · e0f13b32c5
commit e0f13b32c5
parent fd41f798b7
54 changed files with 5745 additions and 747 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@ -17,7 +17,7 @@ Tools that support `@`-imports (Claude Code) auto-include all three files via th
 `CLAUDE.md` is a symlink to this file — there is exactly one source of truth. Edit `AGENTS.md`.

 **Version surveyed:** 0.6.0
-**Workspace crates:** `omnigraph-compiler`, `omnigraph` (engine), `omnigraph-cli`, `omnigraph-server`
+**Workspace crates:** `omnigraph-compiler`, `omnigraph` (engine), `omnigraph-policy`, `omnigraph-cli`, `omnigraph-server`
 **Storage substrate:** Lance 6.x (columnar, versioned, branchable)
 **License:** MIT
 **Toolchain:** Rust stable, edition 2024
@ -33,7 +33,7 @@ OmniGraph is a typed property-graph engine built as a coordination layer over ma
 - **Multi-modal querying**: vector ANN (`nearest`), full-text (`search`/`fuzzy`/`match_text`/`bm25`), Reciprocal Rank Fusion (`rrf`), and graph traversal (`Expand`, anti-join `not { … }`) in one runtime.
 - **Branches and commits across the whole graph**: Git-style — every successful publish appends to a commit DAG; merges are three-way at the row level.
 - **Atomic per-query writes**: `mutate_as` and `load` accumulate insert/update batches into an in-memory `MutationStaging.pending` per touched table; one `stage_*` + `commit_staged` per table runs at end-of-query, then `ManifestBatchPublisher::publish` commits the manifest atomically with per-table `expected_table_versions` CAS. A mid-query failure leaves Lance HEAD untouched on staged tables — no drift, no run state machine, no staging branches. Deletes still inline-commit; D₂ at parse time prevents inserts/updates and deletes from coexisting in one query.
- **HTTP server**: Axum + utoipa OpenAPI, bearer auth (SHA-256 hashed, optional AWS Secrets Manager). Cedar policy enforcement is engine-wide — every `_as` writer calls `Omnigraph::enforce(action, scope, actor)`, so HTTP, CLI, and embedded SDK consumers all hit the same gate.
+- **HTTP server**: Axum + utoipa OpenAPI, bearer auth (SHA-256 hashed, optional AWS Secrets Manager). Cedar policy enforcement is engine-wide — every `_as` writer calls `Omnigraph::enforce(action, scope, actor)`, so HTTP, CLI, and embedded SDK consumers all hit the same gate. **Two modes** (v0.6.0+): single-graph (legacy flat routes) and multi-graph (`/graphs/{graph_id}/...` cluster routes + read-only `GET /graphs` enumeration). Per-graph + server-level Cedar policies. Runtime add/remove (`POST /graphs`, `DELETE /graphs/{id}`) is not exposed — operators edit `omnigraph.yaml` and restart.
 - **CLI** driven by a single `omnigraph.yaml`; multi-format output (json/jsonl/csv/kv/table).

 Throughout the docs, capabilities are split into **L1 — Inherited from Lance** vs **L2 — Added by OmniGraph**.
@ -226,8 +226,8 @@ omnigraph policy explain --actor act-alice --action change --branch main
 | Per-query atomic writes | — | In-memory `MutationStaging.pending` accumulator + `stage_*` / `commit_staged` per touched table at end-of-query + publisher CAS via `commit_with_expected` (single manifest commit per `mutate_as` / `load`); D₂ parse-time rule keeps inserts/updates and deletes from mixing |
 | Three-way row-level merge | — | `OrderedTableCursor` + `StagedTableWriter`, structured `MergeConflictKind` |
 | Change feeds | — | `diff_between` / `diff_commits` with manifest fast path + ID streaming |
-| Cedar policy | — | 8 actions, branch / target_branch / protected scopes, validate/test/explain CLI. **Engine-wide enforcement** (MR-722): every `_as` writer (`apply_schema_as`, `mutate_as`, `load_as`, `ingest_as`, `branch_create_as` / `branch_create_from_as`, `branch_delete_as`, `branch_merge_as`) calls `Omnigraph::enforce(action, scope, actor)` — HTTP, CLI, embedded SDK all hit the same gate. |
-| HTTP server | — | Axum, OpenAPI via utoipa, bearer auth (SHA-256, AWS Secrets Manager option), `authorize_request` at the HTTP boundary (resolves bearer→actor, applies admission control), NDJSON streaming export |
+| Cedar policy | — | Per-graph actions plus server-scoped actions (see [docs/user/policy.md](docs/user/policy.md) for the current list), branch / target_branch / protected scopes, validate/test/explain CLI. **Engine-wide enforcement** (MR-722): every `_as` writer (`apply_schema_as`, `mutate_as`, `load_as`, `ingest_as`, `branch_create_as` / `branch_create_from_as`, `branch_delete_as`, `branch_merge_as`) calls `Omnigraph::enforce(action, scope, actor)` — HTTP, CLI, embedded SDK all hit the same gate. |
+| HTTP server | — | Axum, OpenAPI via utoipa, bearer auth (SHA-256, AWS Secrets Manager option), `authorize_request` at the HTTP boundary (resolves bearer→actor, applies admission control), NDJSON streaming export, **multi-graph mode (v0.6.0+) with cluster routes + read-only `GET /graphs` enumeration + per-graph + server-level Cedar policies. Add/remove graphs by editing `omnigraph.yaml` and restarting.** |
 | CLI with config | — | `omnigraph.yaml`, aliases, multi-format output (json/jsonl/csv/kv/table) |
 | Audit / actor tracking | — | `_as` write APIs + actor map in commit graph |
 | Local RustFS bootstrap | — | `scripts/local-rustfs-bootstrap.sh` one-shot S3-backed dev environment |
--- a/Cargo.lock
+++ b/Cargo.lock
@ -4642,6 +4642,7 @@ dependencies = [
 name = "omnigraph-server"
 version = "0.6.0"
 dependencies = [
+ "arc-swap",
 "async-trait",
 "aws-config",
 "aws-sdk-secretsmanager",
@ -4655,6 +4656,7 @@ dependencies = [
 "omnigraph-compiler",
 "omnigraph-engine",
 "omnigraph-policy",
+ "regex",
 "serde",
 "serde_json",
 "serde_yaml",
@ -4662,6 +4664,7 @@ dependencies = [
 "sha2",
 "subtle",
 "tempfile",
+ "thiserror",
 "tokio",
 "tower",
 "tower-http",
--- a/crates/omnigraph-cli/src/main.rs
+++ b/crates/omnigraph-cli/src/main.rs
@ -18,9 +18,10 @@ use omnigraph_compiler::{
 use omnigraph_server::api::{
    BranchCreateOutput, BranchCreateRequest, BranchDeleteOutput, BranchListOutput,
    BranchMergeOutput, BranchMergeRequest, ChangeOutput, ChangeRequest, CommitListOutput,
-    CommitOutput, ErrorOutput, ExportRequest, IngestOutput, IngestRequest, ReadOutput, ReadRequest,
-    SchemaApplyOutput, SchemaApplyRequest, SchemaOutput, SnapshotOutput, SnapshotTableOutput,
-    commit_output, ingest_output, read_output, schema_apply_output, snapshot_payload,
+    CommitOutput, ErrorOutput, ExportRequest, GraphListResponse, IngestOutput, IngestRequest,
+    ReadOutput, ReadRequest, SchemaApplyOutput, SchemaApplyRequest, SchemaOutput, SnapshotOutput,
+    SnapshotTableOutput, commit_output, ingest_output, read_output, schema_apply_output,
+    snapshot_payload,
 };
 use omnigraph_server::{
    AliasCommand, OmnigraphConfig, PolicyAction, PolicyDecision, PolicyEngine, PolicyRequest,
@ -72,6 +73,13 @@ enum Command {
        schema: PathBuf,
        /// Graph URI (local path or s3://)
        uri: String,
+        /// Overwrite existing schema artifacts at the URI. Without
+        /// this flag, init refuses to touch a URI that already holds
+        /// `_schema.pg`, `_schema.ir.json`, or `__schema_state.json`
+        /// — closes the re-init footgun (MR-668 follow-up). With the
+        /// flag, the operator opts in to destructive semantics.
+        #[arg(long)]
+        force: bool,
    },
    /// Load data into a graph
    Load {
@ -253,6 +261,33 @@ enum Command {
        #[arg(long)]
        json: bool,
    },
+    /// Manage graphs on a multi-graph server (MR-668)
+    Graphs {
+        #[command(subcommand)]
+        command: GraphsCommand,
+    },
+}
+
+/// Operations on the graph registry of a multi-graph server (MR-668).
+///
+/// All operations target a remote multi-graph server URL (http:// or
+/// https://). Local-URI invocations return a clear error. To add or
+/// remove graphs, operators edit `omnigraph.yaml` directly and restart
+/// the server — runtime mutation is not exposed in v0.6.0.
+#[derive(Debug, Subcommand)]
+enum GraphsCommand {
+    /// List every graph registered with the multi-graph server.
+    List {
+        /// Remote server URL (e.g. `https://server.example.com`).
+        #[arg(long)]
+        uri: Option<String>,
+        #[arg(long)]
+        target: Option<String>,
+        #[arg(long)]
+        config: Option<PathBuf>,
+        #[arg(long)]
+        json: bool,
+    },
 }

 #[derive(Debug, Subcommand)]
@ -694,7 +729,7 @@ fn resolve_policy_engine(config: &OmnigraphConfig) -> Result<PolicyEngine> {
    let policy_file = config
        .resolve_policy_file()
        .ok_or_else(|| color_eyre::eyre::eyre!("policy.file must be set in omnigraph.yaml"))?;
-    PolicyEngine::load(&policy_file, &policy_graph_id(config))
+    PolicyEngine::load_graph(&policy_file, &policy_graph_id(config))
 }

 /// Open a local-URI graph and, when `policy.file` is configured in
@ -1305,12 +1340,12 @@ fn print_commit_human(commit: &CommitOutput) {
    println!("created_at: {}", commit.created_at);
 }

-fn print_policy_explain(decision: &PolicyDecision, request: &PolicyRequest) {
+fn print_policy_explain(decision: &PolicyDecision, actor_id: &str, request: &PolicyRequest) {
    println!(
        "decision: {}",
        if decision.allowed { "allow" } else { "deny" }
    );
-    println!("actor: {}", request.actor_id);
+    println!("actor: {}", actor_id);
    println!("action: {}", request.action);
    if let Some(branch) = &request.branch {
        println!("branch: {}", branch);
@ -1718,10 +1753,15 @@ async fn main() -> Result<()> {
                print_embed_human(&output);
            }
        }
-        Command::Init { schema, uri } => {
+        Command::Init { schema, uri, force } => {
            let schema_source = fs::read_to_string(&schema)?;
            ensure_local_graph_parent(&uri)?;
-            Omnigraph::init(&uri, &schema_source).await?;
+            Omnigraph::init_with_options(
+                &uri,
+                &schema_source,
+                omnigraph::db::InitOptions { force },
+            )
+            .await?;
            scaffold_config_if_missing(&uri)?;
            println!("initialized {}", uri);
        }
@ -2443,13 +2483,12 @@ async fn main() -> Result<()> {
                let config = load_cli_config(config.as_ref())?;
                let engine = resolve_policy_engine(&config)?;
                let request = PolicyRequest {
-                    actor_id: actor,
                    action,
                    branch,
                    target_branch,
                };
-                let decision = engine.authorize(&request)?;
-                print_policy_explain(&decision, &request);
+                let decision = engine.authorize(&actor, &request)?;
+                print_policy_explain(&decision, &actor, &request);
            }
        },
        Command::Optimize {
@ -2556,6 +2595,41 @@ async fn main() -> Result<()> {
                );
            }
        }
+        Command::Graphs { command } => match command {
+            GraphsCommand::List {
+                uri,
+                target,
+                config,
+                json,
+            } => {
+                let config = load_cli_config(config.as_ref())?;
+                let bearer_token =
+                    resolve_remote_bearer_token(&config, uri.as_deref(), target.as_deref())?;
+                let uri = resolve_uri(&config, uri, target.as_deref())?;
+                if !is_remote_uri(&uri) {
+                    bail!(
+                        "`omnigraph graphs list` requires a remote multi-graph server URL \
+                         (http:// or https://). To enumerate local graphs, read `omnigraph.yaml` \
+                         directly."
+                    );
+                }
+                let payload = remote_json::<GraphListResponse>(
+                    &http_client,
+                    Method::GET,
+                    remote_url(&uri, "/graphs"),
+                    None,
+                    bearer_token.as_deref(),
+                )
+                .await?;
+                if json {
+                    print_json(&payload)?;
+                } else {
+                    for entry in payload.graphs {
+                        println!("{}\t{}", entry.graph_id, entry.uri);
+                    }
+                }
+            }
+        },
    }
    Ok(())
 }
--- a/crates/omnigraph-cli/tests/cli.rs
+++ b/crates/omnigraph-cli/tests/cli.rs
@ -2040,3 +2040,47 @@ fn schema_plan_parity_cli_and_sdk() {
    );
    assert_eq!(cli_payload["supported"], plan.supported);
 }
+
+// ─── MR-668 PR 8 — omnigraph graphs subcommand ─────────────────────────────
+
+/// `omnigraph graphs --help` lists only the read-only `list`
+/// subcommand. Runtime add (`create`) and remove (`delete`) are
+/// deferred — operators add/remove graphs by editing `omnigraph.yaml`
+/// and restarting. This test pins the deferral against accidental
+/// re-introduction.
+#[test]
+fn graphs_subcommand_help_lists_list_only() {
+    let output = output_success(cli().arg("graphs").arg("--help"));
+    let stdout = stdout_string(&output);
+    assert!(
+        stdout.contains("list"),
+        "expected `list` subcommand in help output:\n{stdout}"
+    );
+    let lowered = stdout.to_lowercase();
+    assert!(
+        !lowered.contains("create a new graph"),
+        "graph create should not be in v0.6.0 help; got:\n{stdout}"
+    );
+    assert!(
+        !lowered.contains("delete a graph"),
+        "graph delete should not be in v0.6.0 help; got:\n{stdout}"
+    );
+}
+
+/// `omnigraph graphs list` against a local URI errors with a clear
+/// message — the CLI only operates against remote multi-graph servers.
+#[test]
+fn graphs_list_against_local_uri_errors_with_remote_only_message() {
+    let output = output_failure(
+        cli()
+            .arg("graphs")
+            .arg("list")
+            .arg("--uri")
+            .arg("/tmp/local"),
+    );
+    let stderr = String::from_utf8_lossy(&output.stderr).into_owned();
+    assert!(
+        stderr.contains("remote multi-graph server URL"),
+        "expected 'remote multi-graph server URL' rejection in stderr; got:\n{stderr}"
+    );
+}
--- a/crates/omnigraph-cli/tests/system_remote.rs
+++ b/crates/omnigraph-cli/tests/system_remote.rs
@ -37,6 +37,17 @@ rules:
      target_branch_scope: protected
 "#;

+const GRAPH_LIST_SERVER_POLICY_YAML: &str = r#"
+version: 1
+groups:
+  admins: [act-admin]
+rules:
+  - id: admins-can-list-graphs
+    allow:
+      actors: { group: admins }
+      actions: [graph_list]
+"#;
+
 fn yaml_string(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
 }
@ -888,3 +899,112 @@ query insert_person($name: String, $age: I32) {
    assert_eq!(verify["row_count"], 1);
    assert_eq!(verify["rows"][0]["p.name"], "PolicyRemote");
 }
+
+// ─── MR-668 PR 8 — omnigraph graphs list end-to-end ────────────────────────
+
+/// Multi-graph server + CLI `omnigraph graphs list` end-to-end.
+///
+/// Steps:
+///   1. Init a graph `alpha` on disk and write an `omnigraph.yaml`
+///      whose `graphs:` map references it.
+///   2. Spawn the server with `--config <yaml>`.
+///   3. `omnigraph graphs list` — expect to see `alpha`.
+///
+/// Ignored by default — spawning servers needs loopback socket
+/// permissions some sandboxes lack.
+#[test]
+#[ignore = "requires loopback socket permissions in sandboxed runners"]
+fn graphs_list_against_multi_graph_server() {
+    let cfg_dir = tempfile::tempdir().unwrap();
+    let schema_path = fixture("test.pg");
+
+    // Init `alpha` on disk.
+    let alpha_uri = cfg_dir.path().join("alpha.omni");
+    tokio::runtime::Runtime::new().unwrap().block_on(async {
+        Omnigraph::init(
+            alpha_uri.to_str().unwrap(),
+            &fs::read_to_string(&schema_path).unwrap(),
+        )
+        .await
+        .unwrap();
+    });
+
+    fs::write(
+        cfg_dir.path().join("server-policy.yaml"),
+        GRAPH_LIST_SERVER_POLICY_YAML,
+    )
+    .unwrap();
+
+    // Server config with `graphs:` map and no `server.graph` selector
+    // — multi mode (rule 4 of the inference matrix). `GET /graphs` is a
+    // server-scoped action, so the success path needs an explicit server
+    // policy and bearer token.
+    let server_config_path = cfg_dir.path().join("omnigraph.yaml");
+    fs::write(
+        &server_config_path,
+        format!(
+            "\
+server:
+  policy:
+    file: ./server-policy.yaml
+graphs:
+  alpha:
+    uri: {}
+",
+            yaml_string(&alpha_uri.to_string_lossy())
+        ),
+    )
+    .unwrap();
+
+    let server = spawn_server_with_config_env(
+        &server_config_path,
+        &[(
+            "OMNIGRAPH_SERVER_BEARER_TOKENS_JSON",
+            r#"{"act-admin":"admin-token"}"#,
+        )],
+    );
+
+    // Client config — the CLI's `--target dev` resolves to `server.base_url`.
+    let client_config_path = cfg_dir.path().join("client.yaml");
+    fs::write(
+        &client_config_path,
+        format!(
+            "\
+graphs:
+  dev:
+    uri: {}
+    bearer_token_env: GRAPH_LIST_TOKEN
+cli:
+  graph: dev
+auth:
+  env_file: ./.env.omni
+",
+            yaml_string(&server.base_url)
+        ),
+    )
+    .unwrap();
+    fs::write(
+        cfg_dir.path().join(".env.omni"),
+        "GRAPH_LIST_TOKEN=admin-token\n",
+    )
+    .unwrap();
+
+    // `graphs list` lists `alpha`.
+    let payload = parse_stdout_json(&output_success(
+        cli()
+            .arg("graphs")
+            .arg("list")
+            .arg("--config")
+            .arg(&client_config_path)
+            .arg("--json"),
+    ));
+    let ids: Vec<&str> = payload["graphs"]
+        .as_array()
+        .unwrap()
+        .iter()
+        .map(|g| g["graph_id"].as_str().unwrap())
+        .collect();
+    assert_eq!(ids, vec!["alpha"]);
+
+    drop(server);
+}
--- a/crates/omnigraph-compiler/src/catalog/schema_plan.rs
+++ b/crates/omnigraph-compiler/src/catalog/schema_plan.rs
@ -150,9 +150,7 @@ impl SchemaMigrationStep {
    /// non-`UnsupportedChange` variant).
    pub fn diagnostic(&self) -> Option<&'static crate::lint::DiagnosticCode> {
        match self {
-            Self::UnsupportedChange {
-                code: Some(c), ..
-            } => crate::lint::lookup(c),
+            Self::UnsupportedChange { code: Some(c), .. } => crate::lint::lookup(c),
            _ => None,
        }
    }
@ -1037,10 +1035,7 @@ node Person {
        .unwrap();

        let plan = plan_schema_migration(&accepted, &desired).unwrap();
-        assert!(
-            plan.supported,
-            "drop-type plan must be supported: {plan:?}"
-        );
+        assert!(plan.supported, "drop-type plan must be supported: {plan:?}");
        assert!(
            plan.steps.iter().any(|step| matches!(
                step,
@ -1182,8 +1177,7 @@ node Person @description("new") {

        for step in steps {
            let json = serde_json::to_string(&step).expect("serialize");
-            let round_trip: SchemaMigrationStep =
-                serde_json::from_str(&json).expect("deserialize");
+            let round_trip: SchemaMigrationStep = serde_json::from_str(&json).expect("deserialize");
            assert_eq!(step, round_trip, "round-trip mismatch on {json}");
        }
    }
--- a/crates/omnigraph-compiler/src/ir/lower.rs
+++ b/crates/omnigraph-compiler/src/ir/lower.rs
@ -271,9 +271,7 @@ fn lower_clauses(
                .traversals
                .iter()
                .find(|rt| {
-                    rt.src == traversal.src
-                        && rt.dst == traversal.dst
-                        && rt.edge_type == edge.name
+                    rt.src == traversal.src && rt.dst == traversal.dst && rt.edge_type == edge.name
                })
                .map(|rt| rt.direction)
                .unwrap_or(Direction::Out);
--- a/crates/omnigraph-compiler/src/ir/lower_tests.rs
+++ b/crates/omnigraph-compiler/src/ir/lower_tests.rs
@ -205,12 +205,8 @@ insert Knows { from: $name, to: $friend }

    let ir = lower_mutation_query(&qf.queries[0]).unwrap();
    assert_eq!(ir.ops.len(), 2);
-    assert!(
-        matches!(&ir.ops[0], MutationOpIR::Insert { type_name, .. } if type_name == "Person")
-    );
-    assert!(
-        matches!(&ir.ops[1], MutationOpIR::Insert { type_name, .. } if type_name == "Knows")
-    );
+    assert!(matches!(&ir.ops[0], MutationOpIR::Insert { type_name, .. } if type_name == "Person"));
+    assert!(matches!(&ir.ops[1], MutationOpIR::Insert { type_name, .. } if type_name == "Knows"));
 }

 /// Destination binding is deferred: NodeScan + Expand + Filter (no cross-join).
--- a/crates/omnigraph-compiler/src/lib.rs
+++ b/crates/omnigraph-compiler/src/lib.rs
@ -18,9 +18,9 @@ pub use catalog::schema_ir::{
 pub use catalog::schema_plan::{
    DropMode, SchemaMigrationPlan, SchemaMigrationStep, SchemaTypeKind, plan_schema_migration,
 };
-pub use lint::{DiagnosticCode, Family, SafetyTier, Severity};
 pub use ir::ParamMap;
 pub use ir::lower::{lower_mutation_query, lower_query};
+pub use lint::{DiagnosticCode, Family, SafetyTier, Severity};
 pub use query::ast::Literal;
 pub use query::lint::{
    QueryLintFinding, QueryLintOutput, QueryLintQueryKind, QueryLintQueryResult,
--- a/crates/omnigraph-compiler/src/lint/codes.rs
+++ b/crates/omnigraph-compiler/src/lint/codes.rs
@ -116,7 +116,13 @@ pub const ALL_CODES: &[DiagnosticCode] = &[
 ];

 /// Codes actually emitted by the planner in v0 (i.e. not reserved).
-pub const EMITTED_IN_V0: &[&str] = &["OG-DS-102", "OG-DS-103", "OG-DS-104", "OG-MF-103", "OG-MF-106"];
+pub const EMITTED_IN_V0: &[&str] = &[
+    "OG-DS-102",
+    "OG-DS-103",
+    "OG-DS-104",
+    "OG-MF-103",
+    "OG-MF-106",
+];

 /// Look up a code by its string identifier.
 pub fn lookup(code: &str) -> Option<&'static DiagnosticCode> {
--- a/crates/omnigraph-compiler/src/lint/mod.rs
+++ b/crates/omnigraph-compiler/src/lint/mod.rs
@ -24,5 +24,5 @@
 pub mod codes;
 pub mod diagnostic;

-pub use codes::{lookup, DiagnosticCode, ALL_CODES};
+pub use codes::{ALL_CODES, DiagnosticCode, lookup};
 pub use diagnostic::{Family, SafetyTier, Severity};
--- a/crates/omnigraph-compiler/src/query/parser.rs
+++ b/crates/omnigraph-compiler/src/query/parser.rs
@ -137,12 +137,11 @@ fn parse_query_decl(pair: pest::iterators::Pair<Rule>) -> Result<QueryDecl> {
                    Rule::mutation_body => {
                        for mutation_pair in body.into_inner() {
                            if let Rule::mutation_stmt = mutation_pair.as_rule() {
-                                let stmt =
-                                    mutation_pair.into_inner().next().ok_or_else(|| {
-                                        NanoError::Parse(
-                                            "mutation statement cannot be empty".to_string(),
-                                        )
-                                    })?;
+                                let stmt = mutation_pair.into_inner().next().ok_or_else(|| {
+                                    NanoError::Parse(
+                                        "mutation statement cannot be empty".to_string(),
+                                    )
+                                })?;
                                mutations.push(parse_mutation_stmt(stmt)?);
                            }
                        }
--- a/crates/omnigraph-compiler/src/schema/parser_tests.rs
+++ b/crates/omnigraph-compiler/src/schema/parser_tests.rs
@ -271,9 +271,9 @@ age: I32?
    match &schema.declarations[0] {
        SchemaDecl::Node(n) => {
            assert!(
-                n.constraints.iter().any(
-                    |c| matches!(c, Constraint::Range { property, .. } if property == "age")
-                )
+                n.constraints
+                    .iter()
+                    .any(|c| matches!(c, Constraint::Range { property, .. } if property == "age"))
            );
        }
        _ => panic!("expected Node"),
--- a/crates/omnigraph-policy/src/lib.rs
+++ b/crates/omnigraph-policy/src/lib.rs
@ -39,6 +39,23 @@ pub enum PolicyAction {
    /// future shape. Avoid writing such rules until the first consumer
    /// endpoint ships to prevent confusion.
    Admin,
+    /// MR-668: management action that operates on the server's graph
+    /// registry, not on a single graph's contents. The Cedar `appliesTo`
+    /// declaration binds it to `resource: Server` instead of the
+    /// per-graph `resource: Graph`. Operators authorize a group with:
+    /// ```yaml
+    /// rules:
+    ///   - id: admins-can-list-graphs
+    ///     allow:
+    ///       actors: { group: admins }
+    ///       actions: [graph_list]
+    /// ```
+    /// `branch_scope` and `target_branch_scope` are NOT supported for
+    /// this action — there's no branch context at the server level.
+    /// Runtime `graph_create` / `graph_delete` are intentionally omitted
+    /// from v0.6.0; operators add and remove graphs by editing
+    /// `omnigraph.yaml` and restarting.
+    GraphList,
 }

 impl PolicyAction {
@ -52,6 +69,7 @@ impl PolicyAction {
            Self::BranchDelete => "branch_delete",
            Self::BranchMerge => "branch_merge",
            Self::Admin => "admin",
+            Self::GraphList => "graph_list",
        }
    }

@ -65,6 +83,56 @@ impl PolicyAction {
            Self::BranchCreate | Self::SchemaApply | Self::BranchDelete | Self::BranchMerge
        )
    }
+
+    /// Which Cedar resource entity governs this action.
+    /// Per-graph actions (Read, Change, …) apply to `Omnigraph::Graph::"<id>"`.
+    /// Server-scoped management actions (GraphList) apply to
+    /// `Omnigraph::Server::"root"`. `Admin` is reserved without a current
+    /// call site; classified as per-graph until MR-724 picks a shape.
+    pub fn resource_kind(self) -> PolicyResourceKind {
+        match self {
+            Self::GraphList => PolicyResourceKind::Server,
+            Self::Read
+            | Self::Export
+            | Self::Change
+            | Self::SchemaApply
+            | Self::BranchCreate
+            | Self::BranchDelete
+            | Self::BranchMerge
+            | Self::Admin => PolicyResourceKind::Graph,
+        }
+    }
+}
+
+/// Which Cedar entity an action's policies apply to. Internal to
+/// `omnigraph-policy` — drives the `compile_policy_source` template
+/// and the request-time resource UID construction.
+#[derive(Debug, Clone, Copy, Eq, PartialEq)]
+pub enum PolicyResourceKind {
+    /// `Omnigraph::Graph::"<graph_label>"` — per-graph actions.
+    Graph,
+    /// `Omnigraph::Server::"root"` — management actions.
+    Server,
+}
+
+/// Which kind of policy file the caller is loading. Drives the
+/// load-time validation that catches a "wrong action in wrong file"
+/// mistake — a graph policy with `graph_list` rules, or a server
+/// policy with `read` rules, both compile silently as Cedar but
+/// never match any actual request. Typing the loader makes the
+/// mistake a load-time error.
+///
+/// Pairs with [`PolicyAction::resource_kind`]: every action's resource
+/// kind must match the engine kind it's loaded under.
+#[derive(Debug, Clone, Copy, Eq, PartialEq)]
+pub enum PolicyEngineKind {
+    /// Engine is loaded for a single graph; only actions whose
+    /// `resource_kind()` is `PolicyResourceKind::Graph` are allowed.
+    Graph,
+    /// Engine is loaded for server-level management endpoints; only
+    /// actions whose `resource_kind()` is `PolicyResourceKind::Server`
+    /// are allowed.
+    Server,
 }

 impl fmt::Display for PolicyAction {
@ -86,6 +154,7 @@ impl FromStr for PolicyAction {
            "branch_delete" => Ok(Self::BranchDelete),
            "branch_merge" => Ok(Self::BranchMerge),
            "admin" => Ok(Self::Admin),
+            "graph_list" => Ok(Self::GraphList),
            other => bail!("unknown policy action '{other}'"),
        }
    }
@ -153,9 +222,16 @@ pub enum PolicyExpectation {
    Deny,
 }

+/// What a caller wants to do, sans identity. Actor identity flows
+/// through a separate `actor_id: &str` parameter on
+/// [`PolicyEngine::authorize`] / [`PolicyChecker::check`] — encoding
+/// the architectural invariant that actor identity is server-authoritative
+/// and must not be supplied by the same code path that supplies the
+/// requested action. In the HTTP layer, the bearer-token middleware
+/// resolves the actor and passes it independently; clients cannot
+/// smuggle identity inside this struct.
 #[derive(Debug, Clone)]
 pub struct PolicyRequest {
-    pub actor_id: String,
    pub action: PolicyAction,
    pub branch: Option<String>,
    pub target_branch: Option<String>,
@ -262,6 +338,34 @@ impl PolicyConfig {
                    }
                }
            }
+            // MR-668: server-scoped actions have no branch context and
+            // must not be mixed with per-graph actions in the same
+            // rule (each rule generates one Cedar `permit` referencing
+            // a specific resource kind).
+            let mut server_scoped = false;
+            let mut graph_scoped = false;
+            for action in &rule.allow.actions {
+                match action.resource_kind() {
+                    PolicyResourceKind::Server => server_scoped = true,
+                    PolicyResourceKind::Graph => graph_scoped = true,
+                }
+            }
+            if server_scoped && graph_scoped {
+                bail!(
+                    "policy rule '{}' mixes the server-scoped action `graph_list` \
+                     with per-graph actions; split into separate rules",
+                    rule.id
+                );
+            }
+            if server_scoped
+                && (rule.allow.branch_scope.is_some() || rule.allow.target_branch_scope.is_some())
+            {
+                bail!(
+                    "policy rule '{}' uses branch_scope/target_branch_scope with a \
+                     server-scoped action; server-scoped actions have no branch context",
+                    rule.id
+                );
+            }
        }

        Ok(())
@ -330,26 +434,61 @@ impl PolicyCompiler {
 }

 impl PolicyEngine {
-    pub fn load(path: &Path, graph_id: &str) -> Result<Self> {
+    /// Load a per-graph policy file. Rejects rules whose actions are
+    /// server-scoped (e.g. `graph_list`) — those belong in a server
+    /// policy file, not a per-graph one.
+    ///
+    /// `graph_id` is the label of the graph this engine governs;
+    /// becomes the Cedar `Omnigraph::Graph::"<graph_id>"` resource
+    /// for every per-graph action evaluated against this engine.
+    pub fn load_graph(path: &Path, graph_id: &str) -> Result<Self> {
        let config = PolicyConfig::load(path)?;
+        validate_kind_alignment(&config, PolicyEngineKind::Graph)?;
        PolicyCompiler::compile(&config, graph_id)
    }

-    pub fn authorize(&self, request: &PolicyRequest) -> Result<PolicyDecision> {
-        if !self.known_actors.contains(request.actor_id.as_str()) {
+    /// Load a server-level policy file. Rejects rules whose actions
+    /// are per-graph (e.g. `read`, `change`) — those belong in a
+    /// per-graph policy file, not the server one. Takes no `graph_id`:
+    /// server-scoped actions resolve against the singleton
+    /// `Omnigraph::Server::"root"` entity, never a Graph.
+    pub fn load_server(path: &Path) -> Result<Self> {
+        let config = PolicyConfig::load(path)?;
+        validate_kind_alignment(&config, PolicyEngineKind::Server)?;
+        // The Graph entity created by the compiler is never referenced
+        // by a server-scoped rule, so the label below is purely a
+        // placeholder. Use the canonical SERVER_RESOURCE_ID so any
+        // future inspection of an unreachable Graph entity at least
+        // points at the right concept.
+        PolicyCompiler::compile(&config, SERVER_RESOURCE_ID)
+    }
+
+    /// Evaluate a request. `actor_id` is supplied as a separate
+    /// argument (not inside `PolicyRequest`) so the type system enforces
+    /// the "server-authoritative actor identity" invariant — clients
+    /// supplying a `PolicyRequest` cannot smuggle identity through the
+    /// same struct that carries the requested action.
+    pub fn authorize(&self, actor_id: &str, request: &PolicyRequest) -> Result<PolicyDecision> {
+        if !self.known_actors.contains(actor_id) {
            return Ok(self.deny(
-                request,
                None,
                format!(
                    "policy denied action '{}' for unknown actor '{}'",
-                    request.action, request.actor_id
+                    request.action, actor_id
                ),
            ));
        }

-        let principal = entity_uid("Actor", &request.actor_id)?;
+        let principal = entity_uid("Actor", actor_id)?;
        let action = entity_uid("Action", request.action.as_str())?;
-        let resource = entity_uid("Graph", &self.graph_id)?;
+        // Pick the resource entity based on the action's `resource_kind`.
+        // Server-scoped actions (`graph_list`) bind to
+        // `Omnigraph::Server::"root"`; per-graph actions bind to
+        // `Omnigraph::Graph::"<graph_label>"`.
+        let resource = match request.action.resource_kind() {
+            PolicyResourceKind::Server => entity_uid("Server", SERVER_RESOURCE_ID)?,
+            PolicyResourceKind::Graph => entity_uid("Graph", &self.graph_id)?,
+        };
        let context_value = json!({
            "has_branch": request.branch.is_some(),
            "branch": request.branch.clone().unwrap_or_default(),
@ -386,7 +525,7 @@ impl PolicyEngine {
                matched_rule_id: matched_rule_id.clone(),
                message: format!(
                    "policy allowed action '{}' for actor '{}'",
-                    request.action, request.actor_id
+                    request.action, actor_id
                ),
            },
            Decision::Deny => {
@ -403,30 +542,27 @@ impl PolicyEngine {
                        .as_deref()
                        .map(|branch| format!(" targeting branch '{}'", branch))
                        .unwrap_or_default(),
-                    request.actor_id
+                    actor_id
                );
-                self.deny(request, matched_rule_id, message)
+                self.deny(matched_rule_id, message)
            }
        })
    }

-    pub fn validate_request(&self, request: &PolicyRequest) -> Result<()> {
-        let _ = self.authorize(request)?;
-        Ok(())
-    }
-
    pub fn run_tests(&self, tests: &PolicyTestConfig) -> Result<()> {
        if tests.version != 1 {
            bail!("policy test version must be 1");
        }
        let mut failures = Vec::new();
        for case in &tests.cases {
-            let decision = self.authorize(&PolicyRequest {
-                actor_id: case.actor.clone(),
-                action: case.action,
-                branch: case.branch.clone(),
-                target_branch: case.target_branch.clone(),
-            })?;
+            let decision = self.authorize(
+                &case.actor,
+                &PolicyRequest {
+                    action: case.action,
+                    branch: case.branch.clone(),
+                    target_branch: case.target_branch.clone(),
+                },
+            )?;
            let expected_allowed = matches!(case.expect, PolicyExpectation::Allow);
            if decision.allowed != expected_allowed {
                failures.push(format!(
@ -448,12 +584,7 @@ impl PolicyEngine {
        self.known_actors.len()
    }

-    fn deny(
-        &self,
-        _request: &PolicyRequest,
-        matched_rule_id: Option<String>,
-        message: String,
-    ) -> PolicyDecision {
+    fn deny(&self, matched_rule_id: Option<String>, message: String) -> PolicyDecision {
        PolicyDecision {
            allowed: false,
            matched_rule_id,
@ -462,6 +593,38 @@ impl PolicyEngine {
    }
 }

+/// Reject any rule whose actions don't match the engine kind
+/// being loaded. Closes the "wrong action in wrong file silently
+/// no-ops" class — `graph_list` in a per-graph file or `read` in
+/// a server file fails at load time instead of compiling cleanly
+/// and never matching a request.
+fn validate_kind_alignment(config: &PolicyConfig, kind: PolicyEngineKind) -> Result<()> {
+    let required = match kind {
+        PolicyEngineKind::Graph => PolicyResourceKind::Graph,
+        PolicyEngineKind::Server => PolicyResourceKind::Server,
+    };
+    for rule in &config.rules {
+        for action in &rule.allow.actions {
+            if action.resource_kind() != required {
+                let (got, expected_file) = match action.resource_kind() {
+                    PolicyResourceKind::Server => ("server-scoped", "server policy file"),
+                    PolicyResourceKind::Graph => ("per-graph", "per-graph policy file"),
+                };
+                bail!(
+                    "policy rule '{}' uses {} action '{}' in a {:?} policy file; \
+                     move it to a {}",
+                    rule.id,
+                    got,
+                    action,
+                    kind,
+                    expected_file
+                );
+            }
+        }
+    }
+    Ok(())
+}
+
 fn compile_entities(config: &PolicyConfig, graph_id: &str, schema: &Schema) -> Result<Entities> {
    let mut group_entities = Vec::new();
    for group in config.groups.keys() {
@ -505,6 +668,26 @@ fn compile_entities(config: &PolicyConfig, graph_id: &str, schema: &Schema) -> R
    entities.extend(group_entities);
    entities.extend(actor_entities);
    entities.push(graph_entity);
+
+    // MR-668: include the `Omnigraph::Server::"root"` entity
+    // whenever any rule references a server-scoped action. Cedar's
+    // schema validator will otherwise reject the policy. Keeping this
+    // conditional (rather than always-on) avoids polluting test
+    // assertions for graph-only policies.
+    let any_server_scoped = config.rules.iter().any(|rule| {
+        rule.allow
+            .actions
+            .iter()
+            .any(|action| action.resource_kind() == PolicyResourceKind::Server)
+    });
+    if any_server_scoped {
+        entities.push(Entity::new(
+            entity_uid("Server", SERVER_RESOURCE_ID)?,
+            HashMap::new(),
+            HashSet::<EntityUid>::new(),
+        )?);
+    }
+
    Ok(Entities::from_entities(entities, Some(schema))?)
 }

@ -543,16 +726,29 @@ fn compile_policy_source(rule: &PolicyRule, action: &PolicyAction, graph_id: &st
        format!("\nwhen {{ {} }}", conditions.join(" && "))
    };

+    // MR-668: emit the resource literal that matches the action's
+    // `resource_kind`. Per-graph actions reference the engine's
+    // `Omnigraph::Graph::"<graph_label>"` instance; server-scoped
+    // actions reference the singleton `Omnigraph::Server::"root"`.
+    let resource_literal = match action.resource_kind() {
+        PolicyResourceKind::Graph => {
+            format!("Omnigraph::Graph::{}", cedar_literal(graph_id))
+        }
+        PolicyResourceKind::Server => {
+            format!("Omnigraph::Server::{}", cedar_literal(SERVER_RESOURCE_ID))
+        }
+    };
+
    format!(
        r#"permit (
    principal in Omnigraph::Group::{group},
    action == Omnigraph::Action::{action},
-    resource == Omnigraph::Graph::{graph}
+    resource == {resource_literal}
 ){when};"#,
        group = cedar_literal(&rule.allow.actors.group),
        action = cedar_literal(action.as_str()),
-        graph = cedar_literal(graph_id),
        when = when,
+        resource_literal = resource_literal,
    )
 }

@ -581,6 +777,11 @@ fn target_branch_scope_condition(scope: PolicyBranchScope) -> String {
 }

 fn policy_schema_source() -> &'static str {
+    // MR-668: `entity Server;` plus the `graph_list` action that
+    // binds to it. Per-graph actions stay bound to `Graph`.
+    // The Cedar schema string lives here (not on a fixture file) so any
+    // omnigraph-policy build picks up the new vocabulary in lock-step
+    // with the Rust code.
    r#"
 namespace Omnigraph {
    type RequestContext = {
@ -595,6 +796,7 @@ namespace Omnigraph {
    entity Actor in [Group];
    entity Group;
    entity Graph;
+    entity Server;

    action "read" appliesTo { principal: Actor, resource: Graph, context: RequestContext };
    action "export" appliesTo { principal: Actor, resource: Graph, context: RequestContext };
@ -604,10 +806,17 @@ namespace Omnigraph {
    action "branch_delete" appliesTo { principal: Actor, resource: Graph, context: RequestContext };
    action "branch_merge" appliesTo { principal: Actor, resource: Graph, context: RequestContext };
    action "admin" appliesTo { principal: Actor, resource: Graph, context: RequestContext };
+
+    action "graph_list" appliesTo { principal: Actor, resource: Server, context: RequestContext };
 }
 "#
 }

+/// Canonical id of the `Omnigraph::Server` Cedar entity. There's only one
+/// (the running server); the id is fixed at `"root"` so Cedar rules can
+/// reference it unambiguously: `resource == Omnigraph::Server::"root"`.
+const SERVER_RESOURCE_ID: &str = "root";
+
 fn entity_uid(entity_type: &str, id: &str) -> Result<EntityUid> {
    let typename = EntityTypeName::from_str(&format!("Omnigraph::{entity_type}"))?;
    let entity_id = EntityId::from_str(id).map_err(|err| eyre!(err.to_string()))?;
@ -619,10 +828,6 @@ fn cedar_literal(value: &str) -> String {
 }

 impl PolicyRequest {
-    pub fn actor_id(&self) -> &str {
-        &self.actor_id
-    }
-
    pub fn action(&self) -> PolicyAction {
        self.action
    }
@ -761,13 +966,12 @@ impl PolicyChecker for PolicyEngine {
    ) -> Result<(), PolicyError> {
        let (branch, target_branch) = scope.to_branch_pair();
        let request = PolicyRequest {
-            actor_id: actor.to_string(),
            action,
            branch: branch.map(|s| s.to_string()),
            target_branch: target_branch.map(|s| s.to_string()),
        };
        let decision = self
-            .authorize(&request)
+            .authorize(actor, &request)
            .map_err(|e| PolicyError::Internal(e.to_string()))?;
        if decision.allowed {
            Ok(())
@ -780,7 +984,7 @@ impl PolicyChecker for PolicyEngine {
 #[cfg(test)]
 mod tests {
    use super::{
-        PolicyAction, PolicyCompiler, PolicyConfig, PolicyExpectation, PolicyRequest,
+        PolicyAction, PolicyCompiler, PolicyConfig, PolicyEngine, PolicyExpectation, PolicyRequest,
        PolicyTestCase, PolicyTestConfig,
    };

@ -883,33 +1087,39 @@ rules:

        let engine = PolicyCompiler::compile(&policy, "graph").unwrap();
        let allow = engine
-            .authorize(&PolicyRequest {
-                actor_id: "act-bruno".to_string(),
-                action: PolicyAction::Change,
-                branch: Some("feature".to_string()),
-                target_branch: None,
-            })
+            .authorize(
+                "act-bruno",
+                &PolicyRequest {
+                    action: PolicyAction::Change,
+                    branch: Some("feature".to_string()),
+                    target_branch: None,
+                },
+            )
            .unwrap();
        assert!(allow.allowed);
        assert_eq!(allow.matched_rule_id.as_deref(), Some("team-write"));

        let deny = engine
-            .authorize(&PolicyRequest {
-                actor_id: "act-bruno".to_string(),
-                action: PolicyAction::BranchDelete,
-                branch: None,
-                target_branch: Some("main".to_string()),
-            })
+            .authorize(
+                "act-bruno",
+                &PolicyRequest {
+                    action: PolicyAction::BranchDelete,
+                    branch: None,
+                    target_branch: Some("main".to_string()),
+                },
+            )
            .unwrap();
        assert!(!deny.allowed);

        let admin = engine
-            .authorize(&PolicyRequest {
-                actor_id: "act-andrew".to_string(),
-                action: PolicyAction::BranchDelete,
-                branch: None,
-                target_branch: Some("main".to_string()),
-            })
+            .authorize(
+                "act-andrew",
+                &PolicyRequest {
+                    action: PolicyAction::BranchDelete,
+                    branch: None,
+                    target_branch: Some("main".to_string()),
+                },
+            )
            .unwrap();
        assert!(admin.allowed);
        assert_eq!(admin.matched_rule_id.as_deref(), Some("admins-promote"));
@ -978,23 +1188,305 @@ rules:

        let engine = PolicyCompiler::compile(&policy, "graph").unwrap();
        let allow = engine
-            .authorize(&PolicyRequest {
-                actor_id: "act-ragnor".to_string(),
-                action: PolicyAction::SchemaApply,
-                branch: None,
-                target_branch: Some("main".to_string()),
-            })
+            .authorize(
+                "act-ragnor",
+                &PolicyRequest {
+                    action: PolicyAction::SchemaApply,
+                    branch: None,
+                    target_branch: Some("main".to_string()),
+                },
+            )
            .unwrap();
        assert!(allow.allowed);

        let deny = engine
-            .authorize(&PolicyRequest {
-                actor_id: "act-ragnor".to_string(),
-                action: PolicyAction::SchemaApply,
-                branch: None,
-                target_branch: Some("feature".to_string()),
-            })
+            .authorize(
+                "act-ragnor",
+                &PolicyRequest {
+                    action: PolicyAction::SchemaApply,
+                    branch: None,
+                    target_branch: Some("feature".to_string()),
+                },
+            )
            .unwrap();
        assert!(!deny.allowed);
    }
+
+    // ─── MR-668 — server-scoped action (graph_list) ─
+
+    #[test]
+    fn graph_list_action_authorizes_against_server_resource() {
+        let policy: PolicyConfig = serde_yaml::from_str(
+            r#"
+version: 1
+groups:
+  admins: [act-andrew]
+  viewers: [act-bruno]
+rules:
+  - id: admins-list-graphs
+    allow:
+      actors: { group: admins }
+      actions: [graph_list]
+"#,
+        )
+        .unwrap();
+
+        // The graph_label passed at compile time is irrelevant for
+        // server-scoped actions — they resolve against
+        // `Omnigraph::Server::"root"` regardless. We pass a sentinel
+        // so it's obvious the value isn't used.
+        let engine = PolicyCompiler::compile(&policy, "ignored").unwrap();
+
+        let allow = engine
+            .authorize(
+                "act-andrew",
+                &PolicyRequest {
+                    action: PolicyAction::GraphList,
+                    branch: None,
+                    target_branch: None,
+                },
+            )
+            .unwrap();
+        assert!(allow.allowed);
+        assert_eq!(allow.matched_rule_id.as_deref(), Some("admins-list-graphs"));
+
+        // Different actor, same policy → deny.
+        let deny = engine
+            .authorize(
+                "act-bruno",
+                &PolicyRequest {
+                    action: PolicyAction::GraphList,
+                    branch: None,
+                    target_branch: None,
+                },
+            )
+            .unwrap();
+        assert!(!deny.allowed);
+    }
+
+    #[test]
+    fn server_scoped_rule_cannot_use_branch_scope() {
+        let policy: PolicyConfig = serde_yaml::from_str(
+            r#"
+version: 1
+groups:
+  admins: [act-andrew]
+rules:
+  - id: bad-branch-scope-on-graph-list
+    allow:
+      actors: { group: admins }
+      actions: [graph_list]
+      branch_scope: any
+"#,
+        )
+        .unwrap();
+        let err = policy.validate().unwrap_err();
+        let msg = err.to_string();
+        assert!(
+            msg.contains("branch_scope") || msg.contains("server-scoped"),
+            "expected branch_scope rejection for server-scoped action; got: {msg}"
+        );
+    }
+
+    #[test]
+    fn rule_mixing_server_and_per_graph_actions_is_rejected() {
+        // A single rule must reference exactly one resource kind.
+        // `graph_list` (Server) + `read` (Graph) in one allow block
+        // is invalid — operators must split the rule.
+        let policy: PolicyConfig = serde_yaml::from_str(
+            r#"
+version: 1
+groups:
+  admins: [act-andrew]
+rules:
+  - id: mixed-resource-kinds
+    allow:
+      actors: { group: admins }
+      actions: [graph_list, read]
+"#,
+        )
+        .unwrap();
+        let err = policy.validate().unwrap_err();
+        let msg = err.to_string();
+        assert!(
+            msg.contains("server-scoped") || msg.contains("split into separate rules"),
+            "expected mix-resource-kinds rejection; got: {msg}"
+        );
+    }
+
+    #[test]
+    fn per_graph_rules_continue_to_work_alongside_server_rules() {
+        // Decision 6 contract: existing operator policies (which only
+        // reference per-graph actions) keep compiling and authorizing
+        // as before, even when the compiled-in schema now declares
+        // `Server` + `graph_*` actions. This pins the "Cedar refactor
+        // is operator-invisible" promise.
+        let policy: PolicyConfig = serde_yaml::from_str(
+            r#"
+version: 1
+groups:
+  team: [act-andrew]
+protected_branches: [main]
+rules:
+  - id: team-read
+    allow:
+      actors: { group: team }
+      actions: [read, export]
+      branch_scope: any
+"#,
+        )
+        .unwrap();
+        let engine = PolicyCompiler::compile(&policy, "graph").unwrap();
+        let allow = engine
+            .authorize(
+                "act-andrew",
+                &PolicyRequest {
+                    action: PolicyAction::Read,
+                    branch: Some("main".to_string()),
+                    target_branch: None,
+                },
+            )
+            .unwrap();
+        assert!(allow.allowed);
+        assert_eq!(allow.matched_rule_id.as_deref(), Some("team-read"));
+    }
+
+    // ─── MR-668 follow-up — load_graph / load_server kind alignment ─
+
+    /// A per-graph policy file containing a `graph_list` rule fails
+    /// at load time. Pre-fix, the file compiled cleanly and the rule
+    /// silently never matched (per-graph engine never gets a
+    /// `graph_list` check). Closes the "wrong action, wrong file,
+    /// silent no-op" class.
+    #[test]
+    fn load_graph_rejects_server_scoped_action() {
+        let dir = tempfile::tempdir().unwrap();
+        let path = dir.path().join("bad-graph-policy.yaml");
+        std::fs::write(
+            &path,
+            r#"
+version: 1
+groups:
+  admins: [act-andrew]
+rules:
+  - id: misplaced-graph-list
+    allow:
+      actors: { group: admins }
+      actions: [graph_list]
+"#,
+        )
+        .unwrap();
+        let err = match PolicyEngine::load_graph(&path, "g1") {
+            Ok(_) => panic!("expected server-scoped action in per-graph file to be rejected"),
+            Err(e) => e,
+        };
+        let msg = err.to_string();
+        assert!(
+            msg.contains("server-scoped") && msg.contains("graph_list"),
+            "expected server-scoped-in-graph-file rejection, got: {msg}"
+        );
+    }
+
+    /// A server policy file containing a `read` rule fails at load
+    /// time. Pre-fix, the file compiled cleanly and the rule silently
+    /// never matched (server engine never gets a `read` check).
+    #[test]
+    fn load_server_rejects_per_graph_action() {
+        let dir = tempfile::tempdir().unwrap();
+        let path = dir.path().join("bad-server-policy.yaml");
+        std::fs::write(
+            &path,
+            r#"
+version: 1
+groups:
+  team: [act-andrew]
+rules:
+  - id: misplaced-read
+    allow:
+      actors: { group: team }
+      actions: [read]
+      branch_scope: any
+"#,
+        )
+        .unwrap();
+        let err = match PolicyEngine::load_server(&path) {
+            Ok(_) => panic!("expected per-graph action in server file to be rejected"),
+            Err(e) => e,
+        };
+        let msg = err.to_string();
+        assert!(
+            msg.contains("per-graph") && msg.contains("read"),
+            "expected per-graph-in-server-file rejection, got: {msg}"
+        );
+    }
+
+    /// Positive case: a properly-shaped per-graph policy loads via
+    /// `load_graph` and authorizes as expected. Verifies the
+    /// kind-alignment check is permissive when the file is correct.
+    #[test]
+    fn load_graph_accepts_per_graph_only_policy() {
+        let dir = tempfile::tempdir().unwrap();
+        let path = dir.path().join("ok-graph-policy.yaml");
+        std::fs::write(
+            &path,
+            r#"
+version: 1
+groups:
+  team: [act-andrew]
+rules:
+  - id: team-read
+    allow:
+      actors: { group: team }
+      actions: [read]
+      branch_scope: any
+"#,
+        )
+        .unwrap();
+        let engine = PolicyEngine::load_graph(&path, "g1").unwrap();
+        let decision = engine
+            .authorize(
+                "act-andrew",
+                &PolicyRequest {
+                    action: PolicyAction::Read,
+                    branch: Some("main".to_string()),
+                    target_branch: None,
+                },
+            )
+            .unwrap();
+        assert!(decision.allowed);
+    }
+
+    /// Positive case: a properly-shaped server policy loads via
+    /// `load_server` and authorizes the `graph_list` action.
+    #[test]
+    fn load_server_accepts_server_only_policy() {
+        let dir = tempfile::tempdir().unwrap();
+        let path = dir.path().join("ok-server-policy.yaml");
+        std::fs::write(
+            &path,
+            r#"
+version: 1
+groups:
+  admins: [act-andrew]
+rules:
+  - id: admins-list-graphs
+    allow:
+      actors: { group: admins }
+      actions: [graph_list]
+"#,
+        )
+        .unwrap();
+        let engine = PolicyEngine::load_server(&path).unwrap();
+        let decision = engine
+            .authorize(
+                "act-andrew",
+                &PolicyRequest {
+                    action: PolicyAction::GraphList,
+                    branch: None,
+                    target_branch: None,
+                },
+            )
+            .unwrap();
+        assert!(decision.allowed);
+    }
 }
--- a/crates/omnigraph-server/Cargo.toml
+++ b/crates/omnigraph-server/Cargo.toml
@ -37,7 +37,10 @@ futures = { workspace = true }
 sha2 = { workspace = true }
 subtle = { workspace = true }
 async-trait = { workspace = true }
+arc-swap = { workspace = true }
 dashmap = "6"
+regex = { workspace = true }
+thiserror = { workspace = true }
 aws-config = { version = "1", optional = true, default-features = false, features = ["rustls", "rt-tokio", "credentials-process", "sso"] }
 aws-sdk-secretsmanager = { version = "1", optional = true, default-features = false, features = ["rustls", "rt-tokio"] }

--- a/crates/omnigraph-server/src/api.rs
+++ b/crates/omnigraph-server/src/api.rs
@ -235,7 +235,9 @@ pub struct CommitListOutput {
 pub struct ReadRequest {
    /// GQ query source. May declare one or more named queries; pick one with
    /// `query_name` if there is more than one.
-    #[schema(example = "query get_person($name: String) {\n    match {\n        $p: Person { name: $name }\n    }\n    return { $p.name, $p.age }\n}")]
+    #[schema(
+        example = "query get_person($name: String) {\n    match {\n        $p: Person { name: $name }\n    }\n    return { $p.name, $p.age }\n}"
+    )]
    pub query_source: String,
    /// Name of the query to run when `query_source` declares multiple. Optional
    /// when only one query is declared.
@ -252,7 +254,9 @@ pub struct ReadRequest {
 pub struct ChangeRequest {
    /// GQ mutation source containing `insert`, `update`, or `delete` statements.
    /// May declare multiple named mutations; pick one with `query_name`.
-    #[schema(example = "query insert_person($name: String, $age: I32) {\n    insert Person { name: $name, age: $age }\n}")]
+    #[schema(
+        example = "query insert_person($name: String, $age: I32) {\n    insert Person { name: $name, age: $age }\n}"
+    )]
    pub query_source: String,
    /// Name of the mutation to run when `query_source` declares multiple.
    pub query_name: Option<String>,
@ -266,7 +270,9 @@ pub struct ChangeRequest {
 pub struct SchemaApplyRequest {
    /// Project schema in `.pg` source form. The diff against the current
    /// schema produces the migration steps that will be applied.
-    #[schema(example = "node Person {\n    name: String @key\n    age: I32?\n}\n\nedge Knows: Person -> Person")]
+    #[schema(
+        example = "node Person {\n    name: String @key\n    age: I32?\n}\n\nedge Knows: Person -> Person"
+    )]
    pub schema_source: String,
    /// When true, promote every `DropMode::Soft` step in the plan to
    /// `DropMode::Hard`, making the prior column data unreachable
@ -303,7 +309,9 @@ pub struct IngestRequest {
    pub mode: Option<LoadMode>,
    /// NDJSON payload: one record per line, each shaped
    /// `{"type": "<TypeName>", "data": {...}}`.
-    #[schema(example = "{\"type\": \"Person\", \"data\": {\"name\": \"Alice\", \"age\": 30}}\n{\"type\": \"Person\", \"data\": {\"name\": \"Bob\", \"age\": 25}}")]
+    #[schema(
+        example = "{\"type\": \"Person\", \"data\": {\"name\": \"Alice\", \"age\": 30}}\n{\"type\": \"Person\", \"data\": {\"name\": \"Bob\", \"age\": 25}}"
+    )]
    pub data: String,
 }

@ -344,6 +352,11 @@ pub enum ErrorCode {
    Forbidden,
    BadRequest,
    NotFound,
+    /// 405 Method Not Allowed — the route exists but the active server
+    /// mode doesn't serve this method (e.g. `GET /graphs` in single-graph
+    /// mode). Distinct from 404 so clients can tell "wrong context" from
+    /// "no such resource."
+    MethodNotAllowed,
    Conflict,
    /// 429 Too Many Requests — per-actor admission cap exceeded.
    /// Clients should respect the `Retry-After` header.
@ -467,3 +480,23 @@ pub fn read_target_output(target: &ReadTarget) -> ReadTargetOutput {
        },
    }
 }
+
+// ─── MR-668 — management endpoint shapes ──────────────────────────────────
+
+/// One entry in the response from `GET /graphs`. Cluster operators
+/// consume this list to discover which graphs the server is currently
+/// serving. The shape is intentionally minimal — `graph_id` and `uri`
+/// are the only fields a routing client needs.
+#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
+pub struct GraphInfo {
+    pub graph_id: String,
+    pub uri: String,
+}
+
+/// Response from `GET /graphs`. Lists every graph registered with the
+/// server in alphabetical order by `graph_id` (sorted server-side so
+/// clients get deterministic output across requests).
+#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
+pub struct GraphListResponse {
+    pub graphs: Vec<GraphInfo>,
+}
--- a/crates/omnigraph-server/src/auth.rs
+++ b/crates/omnigraph-server/src/auth.rs
@ -119,7 +119,10 @@ pub(crate) fn parse_json_secret_payload(payload: &str) -> Result<Vec<(String, St
            bail!("bearer-token secret contains a blank actor id");
        }
        if token.is_empty() {
-            bail!("bearer-token secret has a blank token for actor '{}'", actor);
+            bail!(
+                "bearer-token secret has a blank token for actor '{}'",
+                actor
+            );
        }
        pairs.push((actor, token));
    }
@ -151,8 +154,7 @@ pub mod aws {
        /// Construct a new source. Resolves AWS credentials + region via the
        /// default chain — no explicit configuration needed on EC2/ECS/EKS.
        pub async fn new(secret_id: impl Into<String>) -> Result<Self> {
-            let config =
-                aws_config::load_defaults(aws_config::BehaviorVersion::latest()).await;
+            let config = aws_config::load_defaults(aws_config::BehaviorVersion::latest()).await;
            let client = aws_sdk_secretsmanager::Client::new(&config);
            Ok(Self {
                client,
@ -200,8 +202,8 @@ pub use aws::SecretsManagerTokenSource;
 #[cfg(test)]
 mod tests {
    use super::*;
-    use std::env;
    use serial_test::serial;
+    use std::env;

    fn clear_env() {
        unsafe {
@ -232,7 +234,10 @@ mod tests {
        unsafe {
            env::remove_var("OMNIGRAPH_SERVER_BEARER_TOKEN");
        }
-        assert_eq!(tokens, vec![("default".to_string(), "some-token".to_string())]);
+        assert_eq!(
+            tokens,
+            vec![("default".to_string(), "some-token".to_string())]
+        );
    }

    #[tokio::test]
--- a/crates/omnigraph-server/src/config.rs
+++ b/crates/omnigraph-server/src/config.rs
@ -6,6 +6,7 @@ use std::path::{Path, PathBuf};
 use clap::ValueEnum;
 use color_eyre::eyre::{Result, bail};
 use serde::{Deserialize, Serialize};
+
 pub const DEFAULT_CONFIG_FILE: &str = "omnigraph.yaml";

 #[derive(Debug, Clone, Default, Serialize, Deserialize)]
@ -17,6 +18,12 @@ pub struct ProjectConfig {
 pub struct TargetConfig {
    pub uri: String,
    pub bearer_token_env: Option<String>,
+    /// Per-graph Cedar policy file (MR-668). In single-graph mode this
+    /// field is unused — the top-level `policy.file` applies. In
+    /// multi-graph mode, each `graphs.<id>.policy.file` governs that
+    /// graph's HTTP-layer Cedar enforcement.
+    #[serde(default)]
+    pub policy: PolicySettings,
 }

 #[derive(Debug, Clone, Copy, Default, Eq, PartialEq, Serialize, Deserialize, ValueEnum)]
@ -59,6 +66,12 @@ pub struct ServerDefaults {
    #[serde(rename = "graph")]
    pub graph: Option<String>,
    pub bind: Option<String>,
+    /// Server-level Cedar policy (MR-668). Governs management endpoints
+    /// — currently `GET /graphs`; future runtime add/remove endpoints
+    /// will plug in here too. In single-graph mode this is unused — the
+    /// top-level `policy.file` covers the single graph.
+    #[serde(default)]
+    pub policy: PolicySettings,
 }

 #[derive(Debug, Clone, Default, Serialize, Deserialize)]
@ -197,23 +210,46 @@ impl OmnigraphConfig {
    }

    pub fn resolve_auth_env_file(&self) -> Option<PathBuf> {
-        let path = self.auth.env_file.as_deref()?;
-        let path = Path::new(path);
-        Some(if path.is_absolute() {
-            path.to_path_buf()
-        } else {
-            self.base_dir.join(path)
-        })
+        self.auth
+            .env_file
+            .as_deref()
+            .map(|path| self.resolve_config_path(path))
    }

    pub fn resolve_policy_file(&self) -> Option<PathBuf> {
-        let path = self.policy.file.as_deref()?;
-        let path = Path::new(path);
-        Some(if path.is_absolute() {
-            path.to_path_buf()
-        } else {
-            self.base_dir.join(path)
-        })
+        self.policy
+            .file
+            .as_deref()
+            .map(|path| self.resolve_config_path(path))
+    }
+
+    /// Resolve the per-graph policy file path for the named target,
+    /// relative to the config file's `base_dir`. Returns `None` if the
+    /// target is unknown or no per-graph `policy.file` is set.
+    pub fn resolve_target_policy_file(&self, target_name: &str) -> Option<PathBuf> {
+        let target = self.graphs.get(target_name)?;
+        target
+            .policy
+            .file
+            .as_deref()
+            .map(|path| self.resolve_config_path(path))
+    }
+
+    /// Resolve the server-level policy file path (used by management
+    /// endpoints). Returns `None` if `server.policy.file` is not set.
+    pub fn resolve_server_policy_file(&self) -> Option<PathBuf> {
+        self.server
+            .policy
+            .file
+            .as_deref()
+            .map(|path| self.resolve_config_path(path))
+    }
+
+    /// Resolve a raw config-supplied URI (which may be relative) to its
+    /// absolute form. URIs containing `://` are passed through as-is;
+    /// relative paths are joined with the config file's `base_dir`.
+    pub fn resolve_uri_value(&self, value: &str) -> String {
+        self.resolve_config_uri(value)
    }

    pub fn resolve_policy_tests_file(&self) -> Option<PathBuf> {
@ -282,6 +318,15 @@ impl OmnigraphConfig {
            self.base_dir.join(path).to_string_lossy().to_string()
        }
    }
+
+    fn resolve_config_path(&self, value: &str) -> PathBuf {
+        let path = Path::new(value);
+        if path.is_absolute() {
+            path.to_path_buf()
+        } else {
+            self.base_dir.join(path)
+        }
+    }
 }

 pub fn default_config_path() -> PathBuf {
--- a/crates/omnigraph-server/src/graph_id.rs
+++ b/crates/omnigraph-server/src/graph_id.rs
@ -0,0 +1,254 @@
+//! `GraphId` — registry-level identity for a graph in multi-graph mode (MR-668).
+//!
+//! Validation lives in `GraphId::try_from(String)`; nothing else can construct a
+//! `GraphId`. The newtype prevents `graph_id` strings from escaping the storage
+//! root via path traversal or colliding with engine-reserved filenames.
+//!
+//! Regex: `^[a-zA-Z0-9-]{1,64}$`
+//!
+//! The engine reserves every filename starting with `_` at the graph root
+//! (`_schema.pg`, `_schema.ir.json`, `__schema_state.json`, `__manifest/`,
+//! `__recovery/`, etc.). Disallowing leading underscores at the regex level
+//! means a `graph_id` can never collide with engine-managed files. Path
+//! traversal (`..`, `/`) is unrepresentable.
+//!
+//! `policies` is additionally reserved as a future-proofing measure for a
+//! potential `/graphs/policies/...` cluster route.
+
+use std::fmt;
+use std::sync::OnceLock;
+
+use color_eyre::eyre::{Result, bail};
+use regex::Regex;
+use serde::{Deserialize, Serialize};
+
+/// Maximum length of a `GraphId` value.
+pub const GRAPH_ID_MAX_LEN: usize = 64;
+
+/// Validated registry-level identity for a graph.
+///
+/// Constructed only via `GraphId::try_from(String)` or
+/// `GraphId::try_from(&str)`. The inner `String` is private to enforce the
+/// validation contract.
+#[derive(Debug, Clone, Eq, PartialEq, Hash, Serialize)]
+#[serde(transparent)]
+pub struct GraphId(String);
+
+impl GraphId {
+    /// View the validated identifier as `&str`.
+    pub fn as_str(&self) -> &str {
+        &self.0
+    }
+}
+
+impl fmt::Display for GraphId {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        f.write_str(&self.0)
+    }
+}
+
+impl AsRef<str> for GraphId {
+    fn as_ref(&self) -> &str {
+        &self.0
+    }
+}
+
+impl TryFrom<String> for GraphId {
+    type Error = color_eyre::eyre::Error;
+
+    fn try_from(value: String) -> Result<Self> {
+        validate(value.as_str())?;
+        Ok(Self(value))
+    }
+}
+
+impl TryFrom<&str> for GraphId {
+    type Error = color_eyre::eyre::Error;
+
+    fn try_from(value: &str) -> Result<Self> {
+        validate(value)?;
+        Ok(Self(value.to_string()))
+    }
+}
+
+// Custom Deserialize that re-runs validation. Otherwise a serde-derived impl
+// would accept any String, defeating the newtype's guarantee.
+impl<'de> Deserialize<'de> for GraphId {
+    fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
+    where
+        D: serde::Deserializer<'de>,
+    {
+        let s = String::deserialize(deserializer)?;
+        Self::try_from(s).map_err(serde::de::Error::custom)
+    }
+}
+
+fn validate(value: &str) -> Result<()> {
+    if value.is_empty() {
+        bail!("graph_id must not be empty");
+    }
+    if value.len() > GRAPH_ID_MAX_LEN {
+        bail!(
+            "graph_id '{}' is {} chars; max {}",
+            value,
+            value.len(),
+            GRAPH_ID_MAX_LEN
+        );
+    }
+    if !regex().is_match(value) {
+        bail!(
+            "graph_id '{}' must match ^[a-zA-Z0-9-]{{1,64}}$ — \
+             no underscores (engine reserves them), no path separators, no unicode",
+            value
+        );
+    }
+    if is_reserved(value) {
+        bail!(
+            "graph_id '{}' is reserved (would collide with engine-managed names or \
+             future cluster routes)",
+            value
+        );
+    }
+    Ok(())
+}
+
+fn regex() -> &'static Regex {
+    static RE: OnceLock<Regex> = OnceLock::new();
+    RE.get_or_init(|| Regex::new(r"^[a-zA-Z0-9-]{1,64}$").expect("regex literal"))
+}
+
+/// Reserved `graph_id` values that the regex alone wouldn't catch.
+/// The leading-underscore rule already excludes every engine-managed
+/// filename pattern (`_schema.pg`, `__manifest`, etc.); the regex
+/// `^[a-zA-Z0-9-]{1,64}$` (see `regex()`) additionally rejects every
+/// dot-containing name structurally — `openapi.json` and friends
+/// never reach this check.
+///
+/// This list only needs to cover route-prefix collisions and
+/// top-level endpoint names whose spellings DO satisfy the regex
+/// (no dots, no underscores).
+fn is_reserved(value: &str) -> bool {
+    matches!(value, "policies" | "healthz" | "openapi" | "graphs")
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn accepts_simple_alphanumeric_ids() {
+        for ok in ["alpha", "beta", "tenant-001", "A", "g", "X-9-z"] {
+            GraphId::try_from(ok).unwrap_or_else(|_| panic!("expected accept: {ok}"));
+        }
+    }
+
+    #[test]
+    fn accepts_64_char_max() {
+        let max = "a".repeat(64);
+        GraphId::try_from(max.as_str()).unwrap();
+    }
+
+    #[test]
+    fn rejects_empty() {
+        assert!(GraphId::try_from("").is_err());
+    }
+
+    #[test]
+    fn rejects_over_64_chars() {
+        let too_long = "a".repeat(65);
+        assert!(GraphId::try_from(too_long.as_str()).is_err());
+    }
+
+    #[test]
+    fn rejects_leading_underscore() {
+        // Engine reserves every `_*` filename at the graph root.
+        assert!(GraphId::try_from("_internal").is_err());
+        assert!(GraphId::try_from("__manifest").is_err());
+    }
+
+    #[test]
+    fn rejects_underscores_anywhere() {
+        // The regex doesn't allow `_` at all — keeps the disallow-leading-`_`
+        // rule cheap to enforce. If the rule changes later, we'd need to
+        // distinguish "starts with `_`" from "contains `_`".
+        assert!(GraphId::try_from("tenant_alpha").is_err());
+    }
+
+    #[test]
+    fn rejects_path_separators() {
+        for bad in ["alpha/beta", "../etc", "..", "alpha\\beta"] {
+            assert!(GraphId::try_from(bad).is_err(), "expected reject: {bad}");
+        }
+    }
+
+    #[test]
+    fn rejects_unicode() {
+        assert!(GraphId::try_from("αlpha").is_err());
+        assert!(GraphId::try_from("graph-✨").is_err());
+    }
+
+    #[test]
+    fn rejects_whitespace() {
+        assert!(GraphId::try_from(" alpha").is_err());
+        assert!(GraphId::try_from("alpha ").is_err());
+        assert!(GraphId::try_from("alpha beta").is_err());
+        assert!(GraphId::try_from("\talpha").is_err());
+    }
+
+    #[test]
+    fn rejects_dots() {
+        // Reserves the "extension"-shaped ids that look like filenames.
+        assert!(GraphId::try_from(".").is_err());
+        assert!(GraphId::try_from("alpha.beta").is_err());
+        assert!(GraphId::try_from("alpha.").is_err());
+    }
+
+    #[test]
+    fn rejects_reserved_route_names() {
+        // Names that satisfy the regex but are still reserved because
+        // they'd collide with top-level route prefixes / endpoint names.
+        // Dot-containing names (e.g. `openapi.json`) are rejected by the
+        // regex, not this list — `rejects_dots` above covers them.
+        for bad in ["policies", "healthz", "openapi", "graphs"] {
+            assert!(
+                GraphId::try_from(bad).is_err(),
+                "expected reject (reserved): {bad}"
+            );
+        }
+    }
+
+    #[test]
+    fn display_returns_inner_string() {
+        let id = GraphId::try_from("alpha").unwrap();
+        assert_eq!(format!("{id}"), "alpha");
+        assert_eq!(id.as_str(), "alpha");
+    }
+
+    #[test]
+    fn serialize_round_trips_via_json() {
+        let id = GraphId::try_from("tenant-007").unwrap();
+        let json = serde_json::to_string(&id).unwrap();
+        assert_eq!(json, "\"tenant-007\"");
+        let back: GraphId = serde_json::from_str(&json).unwrap();
+        assert_eq!(back, id);
+    }
+
+    #[test]
+    fn deserialize_runs_validation() {
+        // Hostile payload must not produce a GraphId.
+        let bad = serde_json::from_str::<GraphId>("\"_evil\"");
+        assert!(bad.is_err());
+        let bad = serde_json::from_str::<GraphId>("\"../../etc\"");
+        assert!(bad.is_err());
+    }
+
+    #[test]
+    fn hash_equality_works_for_use_as_map_key() {
+        use std::collections::HashMap;
+        let a = GraphId::try_from("alpha").unwrap();
+        let b = GraphId::try_from("alpha").unwrap();
+        let mut m = HashMap::new();
+        m.insert(a, 1u32);
+        assert_eq!(m.get(&b), Some(&1));
+    }
+}
--- a/crates/omnigraph-server/src/identity.rs
+++ b/crates/omnigraph-server/src/identity.rs
@ -0,0 +1,308 @@
+//! Identity types for the multi-graph server (MR-668) + forward-compatible
+//! shapes for Cloud mode (RFC 0003) and OAuth provider (RFC 0004).
+//!
+//! Per decision 13 in the implementation plan: ship the type shapes that
+//! Cloud mode will consume, without committing to any trait shape
+//! (`TokenVerifier` stays draft in RFC 0001). Every Cluster-mode call site
+//! constructs these types with their Cluster-mode-specific values:
+//!
+//! - `tenant_id: None` (Cloud will set `Some(...)` from the OAuth `org_id` claim)
+//! - `scopes: vec![Scope::Full]` (Cloud will populate from the OAuth `scope` claim)
+//! - `source: AuthSource::Static` (Cloud / OIDC will set `AuthSource::Oidc`)
+//!
+//! The enums use `#[non_exhaustive]` so RFC 0001 step 1 / RFC 0004 can
+//! add variants without breaking exhaustive matches in callers.
+
+use std::fmt;
+use std::sync::Arc;
+use std::sync::OnceLock;
+
+use color_eyre::eyre::{Result, bail};
+use regex::Regex;
+use serde::{Deserialize, Serialize};
+
+use crate::graph_id::GraphId;
+
+/// Maximum length of a `TenantId` value.
+pub const TENANT_ID_MAX_LEN: usize = 64;
+
+/// Cloud-mode tenant identifier. Validated with the same regex as
+/// `GraphId` so the two interchange syntactically.
+///
+/// `None` in Cluster mode; Cloud mode (RFC 0003) sets `Some(...)` from
+/// the OAuth `org_id` claim. Constructed only via `try_from` so callers
+/// cannot bypass validation.
+#[derive(Debug, Clone, Eq, PartialEq, Hash, Serialize)]
+#[serde(transparent)]
+pub struct TenantId(String);
+
+impl TenantId {
+    pub fn as_str(&self) -> &str {
+        &self.0
+    }
+}
+
+impl fmt::Display for TenantId {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        f.write_str(&self.0)
+    }
+}
+
+impl AsRef<str> for TenantId {
+    fn as_ref(&self) -> &str {
+        &self.0
+    }
+}
+
+impl TryFrom<String> for TenantId {
+    type Error = color_eyre::eyre::Error;
+
+    fn try_from(value: String) -> Result<Self> {
+        validate_tenant_id(value.as_str())?;
+        Ok(Self(value))
+    }
+}
+
+impl TryFrom<&str> for TenantId {
+    type Error = color_eyre::eyre::Error;
+
+    fn try_from(value: &str) -> Result<Self> {
+        validate_tenant_id(value)?;
+        Ok(Self(value.to_string()))
+    }
+}
+
+impl<'de> Deserialize<'de> for TenantId {
+    fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
+    where
+        D: serde::Deserializer<'de>,
+    {
+        let s = String::deserialize(deserializer)?;
+        Self::try_from(s).map_err(serde::de::Error::custom)
+    }
+}
+
+fn validate_tenant_id(value: &str) -> Result<()> {
+    if value.is_empty() {
+        bail!("tenant_id must not be empty");
+    }
+    if value.len() > TENANT_ID_MAX_LEN {
+        bail!(
+            "tenant_id '{}' is {} chars; max {}",
+            value,
+            value.len(),
+            TENANT_ID_MAX_LEN
+        );
+    }
+    if !tenant_id_regex().is_match(value) {
+        bail!("tenant_id '{}' must match ^[a-zA-Z0-9-]{{1,64}}$", value);
+    }
+    Ok(())
+}
+
+fn tenant_id_regex() -> &'static Regex {
+    static RE: OnceLock<Regex> = OnceLock::new();
+    RE.get_or_init(|| Regex::new(r"^[a-zA-Z0-9-]{1,64}$").expect("regex literal"))
+}
+
+/// Registry HashMap key. Cluster mode populates `tenant_id: None`;
+/// Cloud mode (RFC 0003) populates `tenant_id: Some(...)`.
+///
+/// The `Option<TenantId>` field is the **single forward-compatibility seam**
+/// between Cluster and Cloud modes. Every handler reaches the engine via
+/// `state.registry.get(&key)` — the key shape stays stable, so handlers
+/// don't get re-touched when Cloud mode lands.
+#[derive(Debug, Clone, Eq, PartialEq, Hash)]
+pub struct GraphKey {
+    pub tenant_id: Option<TenantId>,
+    pub graph_id: GraphId,
+}
+
+impl GraphKey {
+    /// Cluster-mode constructor (`tenant_id: None`).
+    pub fn cluster(graph_id: GraphId) -> Self {
+        Self {
+            tenant_id: None,
+            graph_id,
+        }
+    }
+
+    /// Cloud-mode constructor — reserved for RFC 0003; included here so
+    /// the seam is visible even though no Cluster-mode code path calls it.
+    pub fn cloud(tenant_id: TenantId, graph_id: GraphId) -> Self {
+        Self {
+            tenant_id: Some(tenant_id),
+            graph_id,
+        }
+    }
+}
+
+impl fmt::Display for GraphKey {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match &self.tenant_id {
+            Some(t) => write!(f, "{}/{}", t, self.graph_id),
+            None => write!(f, "{}", self.graph_id),
+        }
+    }
+}
+
+/// Authorization scope. Cluster mode: every authenticated actor gets
+/// `Scope::Full`. Cloud mode (RFC 0004) adds OAuth-style scopes via the
+/// dashboard-configured `graph:read`, `graph:write`, `graph:admin`,
+/// `graph:*` set; those become additional variants here.
+///
+/// `#[non_exhaustive]` so RFC 0004 can extend without breaking matches.
+#[derive(Debug, Clone, Copy, Eq, PartialEq, Hash)]
+#[non_exhaustive]
+pub enum Scope {
+    /// Full access. The Cluster-mode default — every authenticated actor
+    /// has unrestricted access subject to Cedar policy.
+    Full,
+}
+
+/// How the actor was authenticated. Cluster mode: every actor authenticates
+/// via the existing SHA-256 hash compare against a static token set, so
+/// `AuthSource::Static`. RFC 0001 step 1 adds `AuthSource::Oidc` when the
+/// `OidcJwtVerifier` ships.
+///
+/// `#[non_exhaustive]` so RFC 0001 can extend without breaking matches.
+#[derive(Debug, Clone, Copy, Eq, PartialEq, Hash)]
+#[non_exhaustive]
+pub enum AuthSource {
+    /// Authenticated via the static bearer-token hash table.
+    Static,
+}
+
+/// Server-resolved actor identity. Replaces the previous
+/// `AuthenticatedActor(Arc<str>)` from `lib.rs`.
+///
+/// The fields are populated by `authenticate_bearer_token` after a successful
+/// constant-time hash match. **Clients cannot set any of these fields directly**
+/// — this is the MR-731 invariant. See `authorize_request` in `lib.rs` for the
+/// chokepoint that overwrites any client-supplied actor identity.
+///
+/// Cluster mode constructs this with `tenant_id: None`, `scopes: vec![Scope::Full]`,
+/// `source: AuthSource::Static` via the convenience constructor below.
+#[derive(Debug, Clone)]
+pub struct ResolvedActor {
+    pub actor_id: Arc<str>,
+    pub tenant_id: Option<TenantId>,
+    pub scopes: Vec<Scope>,
+    pub source: AuthSource,
+}
+
+impl ResolvedActor {
+    /// Cluster-mode constructor — Static auth, no tenant, Full scope.
+    /// Used by `authenticate_bearer_token` after a successful hash match.
+    pub fn cluster_static(actor_id: Arc<str>) -> Self {
+        Self {
+            actor_id,
+            tenant_id: None,
+            scopes: vec![Scope::Full],
+            source: AuthSource::Static,
+        }
+    }
+
+    /// View the actor identifier as `&str`. Stable across the Cluster/Cloud
+    /// boundary — Cedar always sees this value as the principal.
+    pub fn actor_id_str(&self) -> &str {
+        &self.actor_id
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn tenant_id_accepts_simple_values() {
+        for ok in ["alpha", "tenant-001", "X", "01HZWA0KT0H0V0V0V0V0V0V0V0"] {
+            TenantId::try_from(ok).unwrap_or_else(|_| panic!("expected accept: {ok}"));
+        }
+    }
+
+    #[test]
+    fn tenant_id_rejects_empty_and_over_max() {
+        assert!(TenantId::try_from("").is_err());
+        let too_long = "a".repeat(65);
+        assert!(TenantId::try_from(too_long.as_str()).is_err());
+    }
+
+    #[test]
+    fn tenant_id_rejects_path_traversal() {
+        assert!(TenantId::try_from("../etc").is_err());
+        assert!(TenantId::try_from("alpha/beta").is_err());
+    }
+
+    #[test]
+    fn tenant_id_deserialize_runs_validation() {
+        let bad: Result<TenantId, _> = serde_json::from_str("\"../evil\"");
+        assert!(bad.is_err());
+    }
+
+    #[test]
+    fn graph_key_cluster_constructor_sets_no_tenant() {
+        let id = GraphId::try_from("alpha").unwrap();
+        let key = GraphKey::cluster(id.clone());
+        assert!(key.tenant_id.is_none());
+        assert_eq!(key.graph_id, id);
+    }
+
+    #[test]
+    fn graph_key_cloud_constructor_sets_tenant() {
+        let tenant = TenantId::try_from("acme").unwrap();
+        let id = GraphId::try_from("alpha").unwrap();
+        let key = GraphKey::cloud(tenant.clone(), id.clone());
+        assert_eq!(key.tenant_id.as_ref(), Some(&tenant));
+        assert_eq!(key.graph_id, id);
+    }
+
+    #[test]
+    fn graph_key_displays_with_or_without_tenant() {
+        let id = GraphId::try_from("alpha").unwrap();
+        let cluster_key = GraphKey::cluster(id.clone());
+        assert_eq!(format!("{cluster_key}"), "alpha");
+
+        let tenant = TenantId::try_from("acme").unwrap();
+        let cloud_key = GraphKey::cloud(tenant, id);
+        assert_eq!(format!("{cloud_key}"), "acme/alpha");
+    }
+
+    #[test]
+    fn graph_key_is_hashable_for_map_use() {
+        use std::collections::HashMap;
+        let a = GraphKey::cluster(GraphId::try_from("alpha").unwrap());
+        let b = GraphKey::cluster(GraphId::try_from("alpha").unwrap());
+        let mut m: HashMap<GraphKey, u32> = HashMap::new();
+        m.insert(a, 1);
+        assert_eq!(m.get(&b), Some(&1));
+    }
+
+    #[test]
+    fn graph_key_distinguishes_tenants() {
+        let id = GraphId::try_from("alpha").unwrap();
+        let t1 = TenantId::try_from("acme").unwrap();
+        let t2 = TenantId::try_from("globex").unwrap();
+        let k1 = GraphKey::cloud(t1, id.clone());
+        let k2 = GraphKey::cloud(t2, id);
+        assert_ne!(k1, k2);
+    }
+
+    #[test]
+    fn resolved_actor_cluster_defaults() {
+        let actor = ResolvedActor::cluster_static(Arc::<str>::from("act-alice"));
+        assert_eq!(actor.actor_id_str(), "act-alice");
+        assert!(actor.tenant_id.is_none());
+        assert_eq!(actor.scopes, vec![Scope::Full]);
+        assert_eq!(actor.source, AuthSource::Static);
+    }
+
+    #[test]
+    fn scope_and_auth_source_are_non_exhaustive() {
+        // Regression: keep the `#[non_exhaustive]` annotation. If someone
+        // removes it, this test still passes (matches are still legal); it's
+        // the cross-crate compile that catches it. Document the contract here.
+        let _scope = Scope::Full;
+        let _src = AuthSource::Static;
+    }
+}
--- a/crates/omnigraph-server/src/lib.rs
+++ b/crates/omnigraph-server/src/lib.rs
--- a/crates/omnigraph-server/src/registry.rs
+++ b/crates/omnigraph-server/src/registry.rs
@ -0,0 +1,558 @@
+//! `GraphRegistry` — the multi-graph routing substrate (MR-668).
+//!
+//! Holds the open `Arc<GraphHandle>` for every graph the server is currently
+//! serving. Lock-free reads via `ArcSwap<RegistrySnapshot>`; mutations
+//! serialize through `mutate: Mutex<()>` for read-modify-write atomicity.
+//!
+//! **Deletion is deferred** in v0.6.0 (MR-668 scope cut). The registry has
+//! no `tombstones` field, no `RegistryLookup::Tombstoned` variant, no
+//! `tombstone()` / `clear_tombstone()` methods. When `DELETE /graphs/{id}`
+//! lands in a follow-up release, those return without breaking caller
+//! signatures (`Gone` is the closest semantic — the graph is no longer
+//! in the registry).
+//!
+//! Engine instance survival across registry mutations:
+//! a request that grabbed `Arc<GraphHandle>` before a registry swap keeps
+//! the engine alive via its own `Arc` clone (see `server_export` at
+//! `lib.rs:1019-1033` for the spawn-and-clone pattern). The engine drops
+//! when the last `Arc<Omnigraph>` clone drops, regardless of the
+//! registry's current state.
+
+use std::collections::HashMap;
+use std::sync::Arc;
+
+use arc_swap::ArcSwap;
+use omnigraph::db::Omnigraph;
+use omnigraph::storage::normalize_root_uri;
+#[cfg(test)]
+use tokio::sync::Mutex;
+
+use crate::identity::GraphKey;
+use crate::policy::PolicyEngine;
+
+/// Open handle for a single graph in the registry. Cheap to clone (`Arc`-wrapped
+/// engine + policy). Cluster-mode handlers extract this via
+/// `Extension<Arc<GraphHandle>>` injected by the routing middleware.
+pub struct GraphHandle {
+    /// Registry key. In Cluster mode `key.tenant_id` is always `None`.
+    pub key: GraphKey,
+    /// The URI the engine was opened from (`s3://...` or local path).
+    /// Stable for the engine's lifetime; surfaced in responses like
+    /// `BranchCreateOutput.uri`.
+    pub uri: String,
+    /// Engine. Reads/writes go directly through `&self` methods on
+    /// `Omnigraph` (no `RwLock` — MR-686 preserved).
+    pub engine: Arc<Omnigraph>,
+    /// Per-graph Cedar policy. `None` means "no policy gate on engine-layer
+    /// `_as` writers"; the HTTP-layer `require_bearer_auth` middleware still
+    /// runs regardless.
+    pub policy: Option<Arc<PolicyEngine>>,
+}
+
+/// Immutable snapshot of the registry's current state. Replaced atomically
+/// via `ArcSwap`; readers see a consistent view of all graphs without locking.
+///
+/// Derived state (`any_per_graph_policy`) is computed at snapshot
+/// construction so request-time middleware doesn't have to walk the
+/// graph map every call. Construct only via [`RegistrySnapshot::new`]
+/// (or `Default`) so the field stays in sync with `graphs`.
+pub struct RegistrySnapshot {
+    pub graphs: HashMap<GraphKey, Arc<GraphHandle>>,
+    /// `true` iff any registered graph has a per-graph policy installed.
+    /// Used by `AppState::requires_bearer_auth` to decide whether the
+    /// auth middleware should challenge a request — a per-graph policy
+    /// implies bearer auth is required even when no server-level tokens
+    /// or policy are configured.
+    pub any_per_graph_policy: bool,
+}
+
+impl RegistrySnapshot {
+    /// Build a snapshot from a graph map, deriving cached fields.
+    /// The only construction path — direct struct-literal use elsewhere
+    /// would let derived state drift from `graphs`.
+    pub fn new(graphs: HashMap<GraphKey, Arc<GraphHandle>>) -> Self {
+        let any_per_graph_policy = graphs.values().any(|h| h.policy.is_some());
+        Self {
+            graphs,
+            any_per_graph_policy,
+        }
+    }
+}
+
+impl Default for RegistrySnapshot {
+    fn default() -> Self {
+        Self::new(HashMap::new())
+    }
+}
+
+/// Result of a registry lookup. Two-valued — `Tombstoned` deferred with DELETE.
+pub enum RegistryLookup {
+    /// Graph is open and ready to serve.
+    Ready(Arc<GraphHandle>),
+    /// Graph is not in the registry (never existed, or was unregistered in a
+    /// future release). Handlers respond with 404.
+    Gone,
+}
+
+/// Why an `insert` was rejected.
+#[derive(Debug, thiserror::Error)]
+pub enum InsertError {
+    /// Another handle already exists for this `GraphKey`. Maps to HTTP 409.
+    #[error("graph '{0}' is already registered")]
+    DuplicateKey(GraphKey),
+    /// Another handle is open against this URI. Two graphs sharing a URI
+    /// would commit through the same Lance manifest and corrupt each other.
+    /// Maps to HTTP 409.
+    #[error("URI '{0}' is already registered as another graph")]
+    DuplicateUri(String),
+    /// A handle carried an invalid graph URI. Maps to startup failure.
+    #[error("URI '{uri}' is invalid: {message}")]
+    InvalidUri { uri: String, message: String },
+}
+
+pub struct GraphRegistry {
+    snapshot: ArcSwap<RegistrySnapshot>,
+    /// Serializes runtime mutations through [`GraphRegistry::insert`].
+    /// Gated with `insert` because they share a single contract — if
+    /// the consumer goes away, so does the lock. Re-introducing one
+    /// requires re-introducing the other.
+    #[cfg(test)]
+    mutate: Mutex<()>,
+}
+
+impl GraphRegistry {
+    /// Empty registry. Used as a placeholder before startup populates it.
+    pub fn new() -> Self {
+        Self {
+            snapshot: ArcSwap::from_pointee(RegistrySnapshot::default()),
+            #[cfg(test)]
+            mutate: Mutex::new(()),
+        }
+    }
+
+    /// Build a registry from a startup-time list of open handles.
+    /// Rejects duplicate `GraphKey`s and duplicate URIs.
+    pub fn from_handles(handles: Vec<Arc<GraphHandle>>) -> Result<Self, InsertError> {
+        let mut graphs: HashMap<GraphKey, Arc<GraphHandle>> = HashMap::with_capacity(handles.len());
+        let mut seen_uris: HashMap<String, GraphKey> = HashMap::with_capacity(handles.len());
+        for handle in handles {
+            let (canonical_uri, handle) = canonicalize_handle_uri(handle)?;
+            if graphs.contains_key(&handle.key) {
+                return Err(InsertError::DuplicateKey(handle.key.clone()));
+            }
+            if seen_uris.contains_key(&canonical_uri) {
+                return Err(InsertError::DuplicateUri(handle.uri.clone()));
+            }
+            seen_uris.insert(canonical_uri, handle.key.clone());
+            graphs.insert(handle.key.clone(), handle);
+        }
+        Ok(Self {
+            snapshot: ArcSwap::from_pointee(RegistrySnapshot::new(graphs)),
+            #[cfg(test)]
+            mutate: Mutex::new(()),
+        })
+    }
+
+    /// Lock-free snapshot read. Callers that need derived state cached
+    /// on the snapshot (e.g. `any_per_graph_policy`) go through here;
+    /// callers that only need values of `graphs` should use [`list`]
+    /// or [`get`].
+    pub fn snapshot_ref(&self) -> arc_swap::Guard<Arc<RegistrySnapshot>> {
+        self.snapshot.load()
+    }
+
+    /// Lock-free read. Returns `Ready` if the graph is in the current snapshot,
+    /// `Gone` otherwise.
+    pub fn get(&self, key: &GraphKey) -> RegistryLookup {
+        let snapshot = self.snapshot.load();
+        match snapshot.graphs.get(key) {
+            Some(handle) => RegistryLookup::Ready(Arc::clone(handle)),
+            None => RegistryLookup::Gone,
+        }
+    }
+
+    /// Snapshot the full set of currently-registered handles. Ordering
+    /// matches the underlying `HashMap` iteration (intentionally
+    /// non-deterministic — callers that need a stable order sort by
+    /// `handle.key.graph_id`).
+    pub fn list(&self) -> Vec<Arc<GraphHandle>> {
+        let snapshot = self.snapshot.load();
+        snapshot.graphs.values().cloned().collect()
+    }
+
+    /// Number of registered graphs (excluding any future tombstones).
+    pub fn len(&self) -> usize {
+        self.snapshot.load().graphs.len()
+    }
+
+    pub fn is_empty(&self) -> bool {
+        self.len() == 0
+    }
+
+    /// Add a new handle. Async because the mutex is `tokio::sync::Mutex`
+    /// (a future managed-catalog flow may hold it across `.await` points
+    /// during atomic registry mutations). Rejects duplicate `GraphKey`
+    /// and duplicate `uri`.
+    ///
+    /// **Test-only surface.** No production code reaches this — startup
+    /// uses `from_handles`, and runtime add/remove is deferred. The
+    /// race-contract tests below pin the mutex linearization point so
+    /// that when a real consumer ships (managed cluster catalog), the
+    /// concurrency contract is already proven. Ungate by removing
+    /// `#[cfg(test)]` once that consumer is in scope.
+    ///
+    /// Race semantics (pinned by `concurrent_insert_same_key_exactly_one_succeeds`):
+    /// under N concurrent calls with the same key, exactly one returns
+    /// `Ok(())` and the rest return `Err(InsertError::DuplicateKey(_))`.
+    #[cfg(test)]
+    pub async fn insert(&self, handle: Arc<GraphHandle>) -> Result<(), InsertError> {
+        let _guard = self.mutate.lock().await;
+        let current = self.snapshot.load();
+        let (canonical_uri, handle) = canonicalize_handle_uri(handle)?;
+        if current.graphs.contains_key(&handle.key) {
+            return Err(InsertError::DuplicateKey(handle.key.clone()));
+        }
+        for existing in current.graphs.values() {
+            let existing_uri =
+                normalize_root_uri(&existing.uri).map_err(|err| InsertError::InvalidUri {
+                    uri: existing.uri.clone(),
+                    message: err.to_string(),
+                })?;
+            if existing_uri == canonical_uri {
+                return Err(InsertError::DuplicateUri(handle.uri.clone()));
+            }
+        }
+        let mut new_graphs = current.graphs.clone();
+        new_graphs.insert(handle.key.clone(), handle);
+        self.snapshot
+            .store(Arc::new(RegistrySnapshot::new(new_graphs)));
+        Ok(())
+    }
+}
+
+fn canonicalize_handle_uri(
+    handle: Arc<GraphHandle>,
+) -> Result<(String, Arc<GraphHandle>), InsertError> {
+    let canonical_uri = normalize_root_uri(&handle.uri).map_err(|err| InsertError::InvalidUri {
+        uri: handle.uri.clone(),
+        message: err.to_string(),
+    })?;
+    if canonical_uri == handle.uri {
+        return Ok((canonical_uri, handle));
+    }
+    let canonical_handle = Arc::new(GraphHandle {
+        key: handle.key.clone(),
+        uri: canonical_uri.clone(),
+        engine: Arc::clone(&handle.engine),
+        policy: handle.policy.clone(),
+    });
+    Ok((canonical_uri, canonical_handle))
+}
+
+impl Default for GraphRegistry {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use std::path::Path;
+
+    use tempfile::TempDir;
+
+    use super::*;
+    use crate::graph_id::GraphId;
+
+    const TEST_SCHEMA: &str = "node Person { name: String @key }\n";
+
+    async fn build_handle(graph_id: &str, dir: &Path) -> Arc<GraphHandle> {
+        let graph_uri = dir.join(graph_id).to_str().unwrap().to_string();
+        let engine = Omnigraph::init(&graph_uri, TEST_SCHEMA)
+            .await
+            .expect("init engine for registry test");
+        Arc::new(GraphHandle {
+            key: GraphKey::cluster(GraphId::try_from(graph_id).unwrap()),
+            uri: graph_uri,
+            engine: Arc::new(engine),
+            policy: None,
+        })
+    }
+
+    #[tokio::test]
+    async fn new_registry_is_empty() {
+        let registry = GraphRegistry::new();
+        assert!(registry.is_empty());
+        assert_eq!(registry.len(), 0);
+        assert!(registry.list().is_empty());
+    }
+
+    #[tokio::test]
+    async fn insert_then_get_returns_ready() {
+        let dir = TempDir::new().unwrap();
+        let registry = GraphRegistry::new();
+        let handle = build_handle("alpha", dir.path()).await;
+        registry.insert(Arc::clone(&handle)).await.unwrap();
+
+        match registry.get(&handle.key) {
+            RegistryLookup::Ready(found) => {
+                assert!(Arc::ptr_eq(&found, &handle));
+            }
+            RegistryLookup::Gone => panic!("expected Ready, got Gone"),
+        }
+    }
+
+    #[tokio::test]
+    async fn get_nonexistent_returns_gone() {
+        let registry = GraphRegistry::new();
+        let key = GraphKey::cluster(GraphId::try_from("ghost").unwrap());
+        match registry.get(&key) {
+            RegistryLookup::Gone => {}
+            RegistryLookup::Ready(_) => panic!("expected Gone"),
+        }
+    }
+
+    #[tokio::test]
+    async fn insert_duplicate_key_returns_error() {
+        let dir = TempDir::new().unwrap();
+        let registry = GraphRegistry::new();
+        let h1 = build_handle("alpha", dir.path()).await;
+        // Same key, different URI sub-path (build_handle uses graph_id as subdir).
+        let dir2 = TempDir::new().unwrap();
+        let h2 = build_handle("alpha", dir2.path()).await;
+        registry.insert(h1).await.unwrap();
+
+        match registry.insert(h2).await {
+            Err(InsertError::DuplicateKey(_)) => {}
+            other => panic!("expected DuplicateKey, got {other:?}"),
+        }
+    }
+
+    #[tokio::test]
+    async fn insert_duplicate_uri_returns_error() {
+        let dir = TempDir::new().unwrap();
+        // Two handles with the same URI but different keys.
+        let shared_uri = dir.path().join("shared").to_str().unwrap().to_string();
+        let engine = Omnigraph::init(&shared_uri, TEST_SCHEMA).await.unwrap();
+        let engine = Arc::new(engine);
+        let h1 = Arc::new(GraphHandle {
+            key: GraphKey::cluster(GraphId::try_from("alpha").unwrap()),
+            uri: shared_uri.clone(),
+            engine: Arc::clone(&engine),
+            policy: None,
+        });
+        let h2 = Arc::new(GraphHandle {
+            key: GraphKey::cluster(GraphId::try_from("beta").unwrap()),
+            uri: shared_uri,
+            engine,
+            policy: None,
+        });
+
+        let registry = GraphRegistry::new();
+        registry.insert(h1).await.unwrap();
+        match registry.insert(h2).await {
+            Err(InsertError::DuplicateUri(_)) => {}
+            other => panic!("expected DuplicateUri, got {other:?}"),
+        }
+    }
+
+    #[tokio::test]
+    async fn list_returns_all_inserted_handles() {
+        let dir = TempDir::new().unwrap();
+        let registry = GraphRegistry::new();
+        for name in ["alpha", "beta", "gamma"] {
+            let h = build_handle(name, dir.path()).await;
+            registry.insert(h).await.unwrap();
+        }
+        assert_eq!(registry.len(), 3);
+        let mut ids: Vec<_> = registry
+            .list()
+            .into_iter()
+            .map(|h| h.key.graph_id.as_str().to_string())
+            .collect();
+        ids.sort();
+        assert_eq!(ids, vec!["alpha", "beta", "gamma"]);
+    }
+
+    #[tokio::test]
+    async fn from_handles_bulk_init_succeeds() {
+        let dir = TempDir::new().unwrap();
+        let handles = vec![
+            build_handle("alpha", dir.path()).await,
+            build_handle("beta", dir.path()).await,
+        ];
+        let registry = GraphRegistry::from_handles(handles).unwrap();
+        assert_eq!(registry.len(), 2);
+    }
+
+    #[tokio::test]
+    async fn from_handles_rejects_duplicate_keys() {
+        let dir1 = TempDir::new().unwrap();
+        let dir2 = TempDir::new().unwrap();
+        let h1 = build_handle("alpha", dir1.path()).await;
+        let h2 = build_handle("alpha", dir2.path()).await;
+        let err = match GraphRegistry::from_handles(vec![h1, h2]) {
+            Ok(_) => panic!("expected DuplicateKey, got Ok"),
+            Err(err) => err,
+        };
+        assert!(
+            matches!(err, InsertError::DuplicateKey(_)),
+            "expected DuplicateKey, got {err}",
+        );
+    }
+
+    #[tokio::test]
+    async fn from_handles_rejects_duplicate_uris() {
+        let dir = TempDir::new().unwrap();
+        let shared_uri = dir.path().join("shared").to_str().unwrap().to_string();
+        let engine = Arc::new(Omnigraph::init(&shared_uri, TEST_SCHEMA).await.unwrap());
+        let h1 = Arc::new(GraphHandle {
+            key: GraphKey::cluster(GraphId::try_from("alpha").unwrap()),
+            uri: shared_uri.clone(),
+            engine: Arc::clone(&engine),
+            policy: None,
+        });
+        let h2 = Arc::new(GraphHandle {
+            key: GraphKey::cluster(GraphId::try_from("beta").unwrap()),
+            uri: shared_uri,
+            engine,
+            policy: None,
+        });
+        let err = match GraphRegistry::from_handles(vec![h1, h2]) {
+            Ok(_) => panic!("expected DuplicateUri, got Ok"),
+            Err(err) => err,
+        };
+        assert!(
+            matches!(err, InsertError::DuplicateUri(_)),
+            "expected DuplicateUri, got {err}",
+        );
+    }
+
+    /// Race test modeled on `actor_admission_race_does_not_exceed_cap`
+    /// at `tests/server.rs:3596+`. Spawn N concurrent inserts with the
+    /// same `GraphKey` (each constructing its own `GraphHandle` against
+    /// its own tempdir). Exactly one must succeed; the others must
+    /// return `DuplicateKey`. No `unwrap` panic: the `Mutex<()>` +
+    /// in-mutex re-check is the linearization point.
+    #[tokio::test(flavor = "multi_thread")]
+    async fn concurrent_insert_same_key_exactly_one_succeeds() {
+        const N: usize = 8;
+
+        let registry = Arc::new(GraphRegistry::new());
+        // Pre-create N handles (each in its own tempdir; same key).
+        let mut handles = Vec::with_capacity(N);
+        let mut dirs = Vec::with_capacity(N);
+        for _ in 0..N {
+            let d = TempDir::new().unwrap();
+            handles.push(build_handle("contested", d.path()).await);
+            dirs.push(d);
+        }
+
+        let barrier = Arc::new(tokio::sync::Barrier::new(N));
+        let mut tasks = Vec::with_capacity(N);
+        for handle in handles {
+            let registry = Arc::clone(&registry);
+            let barrier = Arc::clone(&barrier);
+            tasks.push(tokio::spawn(async move {
+                barrier.wait().await;
+                registry.insert(handle).await
+            }));
+        }
+
+        let mut ok_count = 0usize;
+        let mut dup_count = 0usize;
+        for t in tasks {
+            match t.await.unwrap() {
+                Ok(()) => ok_count += 1,
+                Err(InsertError::DuplicateKey(_)) => dup_count += 1,
+                Err(other) => panic!("unexpected error: {other:?}"),
+            }
+        }
+        assert_eq!(ok_count, 1, "exactly one insert must succeed");
+        assert_eq!(dup_count, N - 1, "the rest must return DuplicateKey");
+        assert_eq!(registry.len(), 1);
+
+        // Drop the dirs at the end (preserves engines until tasks finish).
+        drop(dirs);
+    }
+
+    /// Concurrent inserts with **distinct** keys all succeed.
+    /// Linearizability over the mutex still serializes them.
+    #[tokio::test(flavor = "multi_thread")]
+    async fn concurrent_insert_distinct_keys_all_succeed() {
+        const N: usize = 8;
+
+        let registry = Arc::new(GraphRegistry::new());
+        // Pre-create N handles with distinct ids, each in its own tempdir.
+        let mut handles = Vec::with_capacity(N);
+        let mut dirs = Vec::with_capacity(N);
+        for i in 0..N {
+            let d = TempDir::new().unwrap();
+            handles.push(build_handle(&format!("graph-{i}"), d.path()).await);
+            dirs.push(d);
+        }
+
+        let barrier = Arc::new(tokio::sync::Barrier::new(N));
+        let mut tasks = Vec::with_capacity(N);
+        for handle in handles {
+            let registry = Arc::clone(&registry);
+            let barrier = Arc::clone(&barrier);
+            tasks.push(tokio::spawn(async move {
+                barrier.wait().await;
+                registry.insert(handle).await
+            }));
+        }
+        for t in tasks {
+            t.await.unwrap().unwrap();
+        }
+        assert_eq!(registry.len(), N);
+        drop(dirs);
+    }
+
+    /// Concurrent reads during a write must always see a consistent
+    /// snapshot (no torn state). With `ArcSwap`, the read either sees
+    /// the old snapshot or the new one — never both, never neither.
+    #[tokio::test(flavor = "multi_thread")]
+    async fn concurrent_reads_during_inserts_see_consistent_snapshots() {
+        let dir = TempDir::new().unwrap();
+        let registry = Arc::new(GraphRegistry::new());
+
+        // Spawn a writer that inserts graph-0..graph-9 sequentially.
+        const N_WRITES: usize = 10;
+        let writer_registry = Arc::clone(&registry);
+        let writer_dir = dir.path().to_path_buf();
+        let writer = tokio::spawn(async move {
+            for i in 0..N_WRITES {
+                let h = build_handle(&format!("graph-{i}"), &writer_dir).await;
+                writer_registry.insert(h).await.unwrap();
+            }
+        });
+
+        // Reader loop: repeatedly snapshot the registry until the writer
+        // finishes. Every snapshot's len must be in [0, N_WRITES], and
+        // for every key g in the snapshot, get(g) must return Ready.
+        let reader_registry = Arc::clone(&registry);
+        let reader = tokio::spawn(async move {
+            for _ in 0..200 {
+                let snap = reader_registry.list();
+                assert!(snap.len() <= N_WRITES);
+                for handle in &snap {
+                    match reader_registry.get(&handle.key) {
+                        RegistryLookup::Ready(found) => {
+                            assert!(Arc::ptr_eq(&found, handle));
+                        }
+                        RegistryLookup::Gone => panic!(
+                            "snapshot listed key {} but get() returned Gone",
+                            handle.key.graph_id
+                        ),
+                    }
+                }
+                tokio::task::yield_now().await;
+            }
+        });
+
+        writer.await.unwrap();
+        reader.await.unwrap();
+        assert_eq!(registry.len(), N_WRITES);
+    }
+}
--- a/crates/omnigraph-server/src/workload.rs
+++ b/crates/omnigraph-server/src/workload.rs
@ -270,12 +270,13 @@ mod tests {
        let err = controller
            .try_admit(&actor, 100)
            .expect_err("third should reject on count");
-        assert!(matches!(err, RejectReason::InFlightCountExceeded { cap: 2 }));
+        assert!(matches!(
+            err,
+            RejectReason::InFlightCountExceeded { cap: 2 }
+        ));
        drop(g1);
        // After drop, a new admit succeeds again.
-        let _g3 = controller
-            .try_admit(&actor, 100)
-            .expect("admit after drop");
+        let _g3 = controller.try_admit(&actor, 100).expect("admit after drop");
    }

    #[tokio::test(flavor = "multi_thread", worker_threads = 4)]
@ -356,7 +357,9 @@ mod tests {
        let bob: Arc<str> = "bob".into();
        let _ga = controller.try_admit(&alice, 100).expect("alice ok");
        // Alice over count cap, Bob unaffected.
-        let err = controller.try_admit(&alice, 100).expect_err("alice rejected");
+        let err = controller
+            .try_admit(&alice, 100)
+            .expect_err("alice rejected");
        assert!(matches!(err, RejectReason::InFlightCountExceeded { .. }));
        let _gb = controller.try_admit(&bob, 100).expect("bob ok");
    }
--- a/crates/omnigraph-server/tests/openapi.rs
+++ b/crates/omnigraph-server/tests/openapi.rs
@ -161,6 +161,7 @@ fn openapi_info_contains_version() {

 const EXPECTED_PATHS: &[&str] = &[
    "/healthz",
+    "/graphs",
    "/snapshot",
    "/read",
    "/export",
@ -957,3 +958,289 @@ fn openapi_spec_is_up_to_date() {
        "openapi.json is out of date. Run: OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test openapi openapi_spec_is_up_to_date"
    );
 }
+
+// ---------------------------------------------------------------------------
+// MR-668 — multi-mode OpenAPI cluster filter
+// ---------------------------------------------------------------------------
+//
+// In multi-graph mode, `/openapi.json` reports cluster routes
+// (`/graphs/{graph_id}/...`) instead of the legacy flat routes. The
+// only flat path that survives is `/healthz`. Operation IDs gain a
+// `cluster_` prefix so SDK generators have stable, unique ids.
+//
+// These tests exercise the request-time `server_openapi` handler via
+// `oneshot`, not the static `ApiDoc::openapi()` — the rewrite happens
+// only on the served document.
+
+const EXPECTED_CLUSTER_PATHS: &[&str] = &[
+    "/graphs/{graph_id}/snapshot",
+    "/graphs/{graph_id}/read",
+    "/graphs/{graph_id}/export",
+    "/graphs/{graph_id}/change",
+    "/graphs/{graph_id}/schema",
+    "/graphs/{graph_id}/schema/apply",
+    "/graphs/{graph_id}/ingest",
+    "/graphs/{graph_id}/branches",
+    "/graphs/{graph_id}/branches/{branch}",
+    "/graphs/{graph_id}/branches/merge",
+    "/graphs/{graph_id}/commits",
+    "/graphs/{graph_id}/commits/{commit_id}",
+];
+
+async fn app_for_multi_mode(graph_ids: &[&str]) -> (Vec<tempfile::TempDir>, Router) {
+    use std::sync::Arc;
+
+    use omnigraph_server::{GraphHandle, GraphId, GraphKey};
+
+    let mut dirs = Vec::with_capacity(graph_ids.len());
+    let mut handles = Vec::with_capacity(graph_ids.len());
+    for id in graph_ids {
+        let dir = tempfile::tempdir().unwrap();
+        let graph_uri = dir.path().join(id).to_str().unwrap().to_string();
+        let schema = fs::read_to_string(fixture("test.pg")).unwrap();
+        let engine = Omnigraph::init(&graph_uri, &schema).await.unwrap();
+        handles.push(Arc::new(GraphHandle {
+            key: GraphKey::cluster(GraphId::try_from(*id).unwrap()),
+            uri: graph_uri,
+            engine: Arc::new(engine),
+            policy: None,
+        }));
+        dirs.push(dir);
+    }
+    let workload = omnigraph_server::workload::WorkloadController::from_env();
+    let state = AppState::new_multi(handles, Vec::new(), None, workload, None).unwrap();
+    let app = build_app(state);
+    (dirs, app)
+}
+
+#[tokio::test]
+async fn multi_mode_openapi_lists_cluster_paths() {
+    let (_dirs, app) = app_for_multi_mode(&["alpha"]).await;
+    let request = Request::builder()
+        .method(Method::GET)
+        .uri("/openapi.json")
+        .body(Body::empty())
+        .unwrap();
+    let (status, json) = json_response(&app, request).await;
+    assert_eq!(status, StatusCode::OK);
+    let paths = json["paths"].as_object().expect("paths must be an object");
+    let path_keys: HashSet<&str> = paths.keys().map(|k| k.as_str()).collect();
+    for expected in EXPECTED_CLUSTER_PATHS {
+        assert!(
+            path_keys.contains(expected),
+            "missing cluster path in multi-mode spec: {expected}. \
+             Found: {path_keys:?}"
+        );
+    }
+}
+
+#[tokio::test]
+async fn multi_mode_openapi_drops_flat_protected_paths() {
+    let (_dirs, app) = app_for_multi_mode(&["alpha"]).await;
+    let request = Request::builder()
+        .method(Method::GET)
+        .uri("/openapi.json")
+        .body(Body::empty())
+        .unwrap();
+    let (_, json) = json_response(&app, request).await;
+    let paths = json["paths"].as_object().unwrap();
+    // None of the legacy flat protected paths should appear in multi mode.
+    let flat_protected = [
+        "/snapshot",
+        "/read",
+        "/export",
+        "/change",
+        "/schema",
+        "/schema/apply",
+        "/ingest",
+        "/branches",
+        "/branches/{branch}",
+        "/branches/merge",
+        "/commits",
+        "/commits/{commit_id}",
+    ];
+    for flat in flat_protected {
+        assert!(
+            !paths.contains_key(flat),
+            "flat path {flat} must not appear in multi-mode spec; \
+             cluster routes are the only protected surface"
+        );
+    }
+}
+
+#[tokio::test]
+async fn multi_mode_openapi_keeps_management_paths_flat() {
+    let (_dirs, app) = app_for_multi_mode(&["alpha"]).await;
+    let request = Request::builder()
+        .method(Method::GET)
+        .uri("/openapi.json")
+        .body(Body::empty())
+        .unwrap();
+    let (_, json) = json_response(&app, request).await;
+    let paths = json["paths"].as_object().unwrap();
+    for flat in ["/healthz", "/graphs"] {
+        assert!(
+            paths.contains_key(flat),
+            "{flat} must remain flat in multi mode"
+        );
+        let nested = format!("/graphs/{{graph_id}}{flat}");
+        assert!(
+            !paths.contains_key(&nested),
+            "{flat} must NOT be cluster-prefixed to {nested}"
+        );
+    }
+}
+
+#[tokio::test]
+async fn multi_mode_openapi_prefixes_operation_ids_with_cluster() {
+    let (_dirs, app) = app_for_multi_mode(&["alpha"]).await;
+    let request = Request::builder()
+        .method(Method::GET)
+        .uri("/openapi.json")
+        .body(Body::empty())
+        .unwrap();
+    let (_, json) = json_response(&app, request).await;
+    // Every cluster path operation must have a `cluster_` operation_id.
+    // Flat-mounted paths (healthz, management /graphs) keep their
+    // original operation_ids — they're not per-graph.
+    let paths = json["paths"].as_object().unwrap();
+    let mut checked = 0;
+    for (path, item) in paths {
+        if path == "/healthz" || path == "/graphs" {
+            continue;
+        }
+        for method in ["get", "post", "put", "delete", "patch"] {
+            if let Some(op) = item.get(method).filter(|v| v.is_object()) {
+                if let Some(id) = op["operationId"].as_str() {
+                    assert!(
+                        id.starts_with("cluster_"),
+                        "operation_id at {path}.{method} must start with `cluster_`, got `{id}`"
+                    );
+                    checked += 1;
+                }
+            }
+        }
+    }
+    assert!(
+        checked >= EXPECTED_CLUSTER_PATHS.len(),
+        "expected at least {} cluster operation_ids; checked {checked}",
+        EXPECTED_CLUSTER_PATHS.len()
+    );
+}
+
+#[tokio::test]
+async fn multi_mode_openapi_declares_graph_id_path_parameter() {
+    let (_dirs, app) = app_for_multi_mode(&["alpha"]).await;
+    let request = Request::builder()
+        .method(Method::GET)
+        .uri("/openapi.json")
+        .body(Body::empty())
+        .unwrap();
+    let (_, json) = json_response(&app, request).await;
+    let paths = json["paths"].as_object().unwrap();
+
+    for expected_path in EXPECTED_CLUSTER_PATHS {
+        let item = paths
+            .get(*expected_path)
+            .unwrap_or_else(|| panic!("missing cluster path {expected_path}"));
+        for method in ["get", "post", "put", "delete", "patch"] {
+            let Some(operation) = item.get(method).filter(|value| value.is_object()) else {
+                continue;
+            };
+            let parameters = operation["parameters"]
+                .as_array()
+                .unwrap_or_else(|| panic!("{expected_path}.{method} missing parameters"));
+            let graph_id = parameters
+                .iter()
+                .find(|param| param["name"] == "graph_id" && param["in"] == "path")
+                .unwrap_or_else(|| {
+                    panic!("{expected_path}.{method} missing graph_id path parameter")
+                });
+            assert_eq!(
+                graph_id["required"].as_bool(),
+                Some(true),
+                "{expected_path}.{method} graph_id parameter must be required"
+            );
+            assert_eq!(
+                graph_id["schema"]["type"].as_str(),
+                Some("string"),
+                "{expected_path}.{method} graph_id parameter must be string typed"
+            );
+        }
+    }
+
+    for flat in ["/healthz", "/graphs"] {
+        let item = paths.get(flat).unwrap();
+        for method in ["get", "post", "put", "delete", "patch"] {
+            if let Some(operation) = item.get(method).filter(|value| value.is_object()) {
+                let has_graph_id = operation["parameters"]
+                    .as_array()
+                    .map(|params| {
+                        params
+                            .iter()
+                            .any(|param| param["name"] == "graph_id" && param["in"] == "path")
+                    })
+                    .unwrap_or(false);
+                assert!(
+                    !has_graph_id,
+                    "{flat}.{method} must not declare graph_id; it remains flat"
+                );
+            }
+        }
+    }
+}
+
+#[tokio::test]
+async fn multi_mode_operation_ids_are_unique() {
+    // Sanity check: the cluster_ prefix prevents collision with flat ids
+    // (which don't appear in multi mode, but the contract is "unique
+    // across the spec"). Verify every operation_id in the multi-mode
+    // spec is unique.
+    let (_dirs, app) = app_for_multi_mode(&["alpha"]).await;
+    let request = Request::builder()
+        .method(Method::GET)
+        .uri("/openapi.json")
+        .body(Body::empty())
+        .unwrap();
+    let (_, json) = json_response(&app, request).await;
+    let paths = json["paths"].as_object().unwrap();
+    let mut seen_ids: HashSet<String> = HashSet::new();
+    for (_, item) in paths {
+        for method in ["get", "post", "put", "delete", "patch"] {
+            if let Some(op) = item.get(method).filter(|v| v.is_object()) {
+                if let Some(id) = op["operationId"].as_str() {
+                    assert!(
+                        seen_ids.insert(id.to_string()),
+                        "duplicate operation_id `{id}` in multi-mode spec"
+                    );
+                }
+            }
+        }
+    }
+}
+
+#[tokio::test]
+async fn single_mode_openapi_unchanged_by_cluster_filter() {
+    // Regression: single mode still emits the legacy flat surface.
+    let (_temp, app) = app_for_loaded_graph().await;
+    let request = Request::builder()
+        .method(Method::GET)
+        .uri("/openapi.json")
+        .body(Body::empty())
+        .unwrap();
+    let (_, json) = json_response(&app, request).await;
+    let paths = json["paths"].as_object().unwrap();
+    let path_keys: HashSet<&str> = paths.keys().map(|k| k.as_str()).collect();
+    for expected in EXPECTED_PATHS {
+        assert!(
+            path_keys.contains(expected),
+            "single mode must still emit flat path: {expected}"
+        );
+    }
+    for cluster in EXPECTED_CLUSTER_PATHS {
+        assert!(
+            !path_keys.contains(cluster),
+            "single mode must NOT emit cluster path: {cluster}"
+        );
+    }
+}
--- a/crates/omnigraph-server/tests/server.rs
+++ b/crates/omnigraph-server/tests/server.rs
--- a/crates/omnigraph/examples/bench_expand.rs
+++ b/crates/omnigraph/examples/bench_expand.rs
@ -239,7 +239,9 @@ async fn main() {

        let jsonl = generate_jsonl(n, avg_deg, 42);
        let t = Instant::now();
-        load_jsonl(&mut db, &jsonl, LoadMode::Overwrite).await.unwrap();
+        load_jsonl(&mut db, &jsonl, LoadMode::Overwrite)
+            .await
+            .unwrap();
        let load_elapsed = t.elapsed();

        println!(
--- a/crates/omnigraph/src/db/mod.rs
+++ b/crates/omnigraph/src/db/mod.rs
@ -10,11 +10,11 @@ pub(crate) mod write_queue;
 pub use commit_graph::GraphCommit;
 pub use graph_coordinator::{GraphCoordinator, ReadTarget, ResolvedTarget, SnapshotId};
 pub use manifest::{Snapshot, SubTableEntry, SubTableUpdate};
+pub(crate) use omnigraph::ensure_public_branch_ref;
 pub use omnigraph::{
-    CleanupPolicyOptions, MergeOutcome, Omnigraph, OpenMode, SchemaApplyOptions,
+    CleanupPolicyOptions, InitOptions, MergeOutcome, Omnigraph, OpenMode, SchemaApplyOptions,
    SchemaApplyResult, TableCleanupStats, TableOptimizeStats,
 };
-pub(crate) use omnigraph::ensure_public_branch_ref;
 pub(crate) use run_registry::is_internal_run_branch;

 pub(crate) const SCHEMA_APPLY_LOCK_BRANCH: &str = "__schema_apply_lock__";
@ -59,9 +59,7 @@ impl MutationOpKind {
    pub(crate) fn strict_pre_stage_version_check(self) -> bool {
        match self {
            MutationOpKind::Insert | MutationOpKind::Merge => false,
-            MutationOpKind::Update
-            | MutationOpKind::Delete
-            | MutationOpKind::SchemaRewrite => true,
+            MutationOpKind::Update | MutationOpKind::Delete | MutationOpKind::SchemaRewrite => true,
        }
    }
 }
--- a/crates/omnigraph/src/db/omnigraph.rs
+++ b/crates/omnigraph/src/db/omnigraph.rs
@ -165,31 +165,137 @@ pub enum OpenMode {
    ReadOnly,
 }

+/// Options for [`Omnigraph::init_with_options`].
+///
+/// `force` controls the safety preflight that prevents an
+/// accidental re-init from overwriting an existing graph's schema
+/// metadata. Default behavior (`force: false`) fails fast with
+/// [`OmniError::AlreadyInitialized`] if any of `_schema.pg`,
+/// `_schema.ir.json`, or `__schema_state.json` already exists at
+/// the target URI. With `force: true` the preflight is skipped —
+/// existing schema files are overwritten in place. Force does NOT
+/// purge old Lance datasets or `__manifest/`; reclaiming those
+/// still requires deleting the graph directory by hand (or via a
+/// future `DELETE /graphs/{id}`).
+#[derive(Debug, Clone, Copy, Default)]
+pub struct InitOptions {
+    /// Skip the existing-graph preflight. Operators set this when
+    /// they actually mean to overwrite — e.g. `omnigraph init --force`.
+    pub force: bool,
+}
+
 impl Omnigraph {
    /// Create a new graph at `uri` from schema source.
    ///
-    /// Creates `_schema.pg`, per-type Lance datasets, and `__manifest`.
+    /// Strict mode: errors with [`OmniError::AlreadyInitialized`] if
+    /// `uri` already holds any of the three schema artifacts. To
+    /// overwrite an existing graph deliberately, call
+    /// [`Self::init_with_options`] with `InitOptions { force: true }`.
    pub async fn init(uri: &str, schema_source: &str) -> Result<Self> {
-        Self::init_with_storage(uri, schema_source, storage_for_uri(uri)?).await
+        Self::init_with_options(uri, schema_source, InitOptions::default()).await
+    }
+
+    /// Create a new graph at `uri`, with explicit init-time options.
+    ///
+    /// See [`InitOptions`] for the safety contract — by default this
+    /// behaves identically to [`Self::init`].
+    pub async fn init_with_options(
+        uri: &str,
+        schema_source: &str,
+        options: InitOptions,
+    ) -> Result<Self> {
+        Self::init_with_storage(uri, schema_source, storage_for_uri(uri)?, options).await
    }

    pub(crate) async fn init_with_storage(
        uri: &str,
        schema_source: &str,
        storage: Arc<dyn StorageAdapter>,
+        options: InitOptions,
    ) -> Result<Self> {
        let root = normalize_root_uri(uri)?;
+
+        // Preflight: refuse to clobber an existing graph unless the
+        // operator passed `force`. This runs BEFORE any parse or
+        // write so a misdirected `init` against an existing graph
+        // URI cannot reach a code path that overwrites or, on a
+        // later cleanup, deletes the schema files.
+        //
+        // Closes the "init is destructive against existing state"
+        // class: there is no longer a code path where strict-mode
+        // `init` can mutate a populated graph root.
+        if !options.force {
+            for candidate in [
+                schema_source_uri(&root),
+                schema_ir_uri(&root),
+                schema_state_uri(&root),
+            ] {
+                if storage.exists(&candidate).await? {
+                    return Err(OmniError::AlreadyInitialized { uri: root.clone() });
+                }
+            }
+        }
+
        let schema_ir = read_schema_ir_from_source(schema_source)?;
        let mut catalog = build_catalog_from_ir(&schema_ir)?;
        fixup_blob_schemas(&mut catalog);

-        // Write _schema.pg
-        let schema_path = join_uri(&root, SCHEMA_SOURCE_FILENAME);
-        storage.write_text(&schema_path, schema_source).await?;
-        write_schema_contract(&root, storage.as_ref(), &schema_ir).await?;
+        // Establish an atomic ownership claim on `_schema.pg` before
+        // writing the remaining init artifacts. A check-then-write preflight
+        // is not enough under concurrent `init` calls: two callers can both
+        // observe an empty root, one can successfully initialize, and the
+        // loser can then fail in Lance `WriteMode::Create`. Only the caller
+        // that atomically created `_schema.pg` may clean up schema artifacts
+        // on later failure.
+        let schema_pg_claimed = if options.force {
+            false
+        } else {
+            let schema_path = join_uri(&root, SCHEMA_SOURCE_FILENAME);
+            if !storage
+                .write_text_if_absent(&schema_path, schema_source)
+                .await?
+            {
+                return Err(OmniError::AlreadyInitialized { uri: root.clone() });
+            }
+            if let Err(err) = crate::failpoints::maybe_fail("init.after_schema_pg_written") {
+                best_effort_cleanup_init_artifacts(&root, storage.as_ref()).await;
+                return Err(err);
+            }
+            true
+        };

-        // Create manifest + per-type datasets
-        let coordinator = GraphCoordinator::init(&root, &catalog, Arc::clone(&storage)).await?;
+        // Run the I/O phase. On any error, best-effort-clean schema
+        // artifacts only when this invocation owns them: strict mode owns
+        // them after the atomic `_schema.pg` claim above; force mode owns
+        // destructive overwrite semantics by explicit operator request.
+        //
+        // Coverage gap: Lance per-type datasets and `__manifest/`
+        // directory created by `GraphCoordinator::init` are NOT cleaned
+        // up here — fully recursive directory deletion requires a
+        // `StorageAdapter::delete_prefix` primitive that's deferred
+        // along with `DELETE /graphs/{id}` (PR 2b in the MR-668 plan
+        // is currently deferred). If `init` fails after coordinator
+        // init succeeds, operators may need to remove the graph
+        // directory manually before retrying `init` on the same URI.
+        // Documented in the PR 2a commit message and `init` rustdoc.
+        let coordinator = match init_storage_phase(
+            &root,
+            schema_source,
+            &schema_ir,
+            &catalog,
+            &storage,
+            !schema_pg_claimed,
+        )
+        .await
+        {
+            Ok(coordinator) => coordinator,
+            Err(err) => {
+                if schema_pg_claimed || options.force {
+                    best_effort_cleanup_init_artifacts(&root, storage.as_ref()).await;
+                }
+                return Err(err);
+            }
+        };

        Ok(Self {
            root_uri: root.clone(),
@ -1477,6 +1583,71 @@ fn read_schema_ir_from_source(schema_source: &str) -> Result<SchemaIR> {
    build_schema_ir(&schema_ast).map_err(|err| OmniError::manifest(err.to_string()))
 }

+/// I/O phase of `Omnigraph::init_with_storage`. Split out so the caller
+/// can pattern-match on the result and run cleanup on error before
+/// returning the original error.
+///
+/// Failpoints fire at the phase boundaries:
+/// * `init.after_schema_pg_written` — `_schema.pg` is on disk. In strict mode
+///   this fires in the caller immediately after the atomic ownership claim; in
+///   force mode it fires here after the explicit overwrite.
+/// * `init.after_schema_contract_written` — `_schema.pg` + `_schema.ir.json`
+///   + `__schema_state.json` are on disk.
+/// * `init.after_coordinator_init` — all schema files plus Lance per-type
+///   datasets and `__manifest/` are on disk. (The cleanup wrapper can only
+///   remove the schema files; Lance directories need `delete_prefix` —
+///   deferred along with `DELETE /graphs/{id}`.)
+async fn init_storage_phase(
+    root: &str,
+    schema_source: &str,
+    schema_ir: &SchemaIR,
+    catalog: &Catalog,
+    storage: &Arc<dyn StorageAdapter>,
+    write_schema_pg: bool,
+) -> Result<GraphCoordinator> {
+    if write_schema_pg {
+        let schema_path = join_uri(root, SCHEMA_SOURCE_FILENAME);
+        storage.write_text(&schema_path, schema_source).await?;
+        crate::failpoints::maybe_fail("init.after_schema_pg_written")?;
+    }
+
+    write_schema_contract(root, storage.as_ref(), schema_ir).await?;
+    crate::failpoints::maybe_fail("init.after_schema_contract_written")?;
+
+    let coordinator = GraphCoordinator::init(root, catalog, Arc::clone(storage)).await?;
+    crate::failpoints::maybe_fail("init.after_coordinator_init")?;
+
+    Ok(coordinator)
+}
+
+/// Best-effort cleanup of init-phase artifacts. Called from
+/// `init_with_storage` on any error returned by `init_storage_phase`.
+///
+/// Removes the three schema files: `_schema.pg`, `_schema.ir.json`,
+/// `__schema_state.json`. Lance datasets and `__manifest/` are not
+/// touched here — recursive directory deletion requires a
+/// `StorageAdapter::delete_prefix` primitive that's deferred along
+/// with `DELETE /graphs/{id}` (MR-668 PR 2b).
+///
+/// Failures to delete are logged via `tracing::warn` and do not mask
+/// the original init error.
+async fn best_effort_cleanup_init_artifacts(root: &str, storage: &dyn StorageAdapter) {
+    for uri in [
+        schema_source_uri(root),
+        schema_ir_uri(root),
+        schema_state_uri(root),
+    ] {
+        if let Err(err) = storage.delete(&uri).await {
+            tracing::warn!(
+                target: "omnigraph::init::cleanup",
+                uri = %uri,
+                error = %err,
+                "init failed; best-effort cleanup could not delete artifact",
+            );
+        }
+    }
+}
+
 fn schema_table_key(type_kind: SchemaTypeKind, name: &str) -> String {
    match type_kind {
        SchemaTypeKind::Node => format!("node:{}", name),
@ -1686,7 +1857,7 @@ mod tests {
    use crate::db::manifest::ManifestCoordinator;
    use async_trait::async_trait;
    use serde_json::Value;
-    use std::sync::Mutex;
+    use std::sync::{Arc, Mutex};

    use crate::storage::{LocalStorageAdapter, StorageAdapter, join_uri};

@ -1740,6 +1911,11 @@ edge WorksAt: Person -> Company
            self.inner.write_text(uri, contents).await
        }

+        async fn write_text_if_absent(&self, uri: &str, contents: &str) -> Result<bool> {
+            self.writes.lock().unwrap().push(uri.to_string());
+            self.inner.write_text_if_absent(uri, contents).await
+        }
+
        async fn exists(&self, uri: &str) -> Result<bool> {
            self.exists_checks.lock().unwrap().push(uri.to_string());
            self.inner.exists(uri).await
@ -1763,13 +1939,96 @@ edge WorksAt: Person -> Company
        }
    }

+    #[derive(Debug)]
+    struct InitRaceStorageAdapter {
+        inner: LocalStorageAdapter,
+        root: String,
+        barrier: Arc<tokio::sync::Barrier>,
+    }
+
+    #[async_trait]
+    impl StorageAdapter for InitRaceStorageAdapter {
+        async fn read_text(&self, uri: &str) -> Result<String> {
+            self.inner.read_text(uri).await
+        }
+
+        async fn write_text(&self, uri: &str, contents: &str) -> Result<()> {
+            self.inner.write_text(uri, contents).await
+        }
+
+        async fn write_text_if_absent(&self, uri: &str, contents: &str) -> Result<bool> {
+            self.inner.write_text_if_absent(uri, contents).await
+        }
+
+        async fn exists(&self, uri: &str) -> Result<bool> {
+            let exists = self.inner.exists(uri).await?;
+            if uri == schema_state_uri(&self.root) {
+                self.barrier.wait().await;
+            }
+            Ok(exists)
+        }
+
+        async fn rename_text(&self, from_uri: &str, to_uri: &str) -> Result<()> {
+            self.inner.rename_text(from_uri, to_uri).await
+        }
+
+        async fn delete(&self, uri: &str) -> Result<()> {
+            self.inner.delete(uri).await
+        }
+
+        async fn list_dir(&self, dir_uri: &str) -> Result<Vec<String>> {
+            self.inner.list_dir(dir_uri).await
+        }
+    }
+
+    #[tokio::test(flavor = "multi_thread", worker_threads = 2)]
+    async fn concurrent_strict_init_does_not_delete_winning_schema_files() {
+        let dir = tempfile::tempdir().unwrap();
+        let uri = dir.path().to_str().unwrap().to_string();
+        let root = normalize_root_uri(&uri).unwrap();
+        let storage: Arc<dyn StorageAdapter> = Arc::new(InitRaceStorageAdapter {
+            inner: LocalStorageAdapter,
+            root,
+            barrier: Arc::new(tokio::sync::Barrier::new(2)),
+        });
+
+        let left = Omnigraph::init_with_storage(
+            &uri,
+            TEST_SCHEMA,
+            Arc::clone(&storage),
+            InitOptions::default(),
+        );
+        let right = Omnigraph::init_with_storage(
+            &uri,
+            TEST_SCHEMA,
+            Arc::clone(&storage),
+            InitOptions::default(),
+        );
+        let (left, right) = tokio::join!(left, right);
+        let ok_count = usize::from(left.is_ok()) + usize::from(right.is_ok());
+        assert_eq!(ok_count, 1, "exactly one concurrent init should win");
+
+        assert!(
+            dir.path().join("_schema.pg").exists(),
+            "winning init must leave _schema.pg in place"
+        );
+        assert!(
+            dir.path().join("_schema.ir.json").exists(),
+            "winning init must leave _schema.ir.json in place"
+        );
+        assert!(
+            dir.path().join("__schema_state.json").exists(),
+            "winning init must leave __schema_state.json in place"
+        );
+    }
+
    #[tokio::test]
    async fn test_init_and_open_route_graph_metadata_through_storage_adapter() {
        let dir = tempfile::tempdir().unwrap();
        let uri = dir.path().to_str().unwrap();
        let adapter = Arc::new(RecordingStorageAdapter::default());

-        Omnigraph::init_with_storage(uri, TEST_SCHEMA, adapter.clone())
+        Omnigraph::init_with_storage(uri, TEST_SCHEMA, adapter.clone(), InitOptions::default())
            .await
            .unwrap();
        assert!(adapter.writes().contains(&join_uri(uri, "_schema.pg")));
--- a/crates/omnigraph/src/db/omnigraph/export.rs
+++ b/crates/omnigraph/src/db/omnigraph/export.rs
@ -16,7 +16,12 @@ pub(super) async fn entity_at(
    id: &str,
    version: u64,
 ) -> Result<Option<serde_json::Value>> {
-    let snap = db.coordinator.read().await.snapshot_at_version(version).await?;
+    let snap = db
+        .coordinator
+        .read()
+        .await
+        .snapshot_at_version(version)
+        .await?;
    entity_from_snapshot(db, &snap, table_key, id).await
 }

--- a/crates/omnigraph/src/db/omnigraph/table_ops.rs
+++ b/crates/omnigraph/src/db/omnigraph/table_ops.rs
@ -22,7 +22,12 @@ pub(super) async fn graph_index_for_resolved(
 }

 pub(super) async fn ensure_indices(db: &Omnigraph) -> Result<()> {
-    let current_branch = db.coordinator.read().await.current_branch().map(str::to_string);
+    let current_branch = db
+        .coordinator
+        .read()
+        .await
+        .current_branch()
+        .map(str::to_string);
    ensure_indices_for_branch(db, current_branch.as_deref()).await
 }

@ -68,10 +73,7 @@ pub(super) async fn failpoint_publish_table_head_without_index_rebuild_for_test(
    .await
 }

-pub(super) async fn ensure_indices_for_branch(
-    db: &Omnigraph,
-    branch: Option<&str>,
-) -> Result<()> {
+pub(super) async fn ensure_indices_for_branch(db: &Omnigraph, branch: Option<&str>) -> Result<()> {
    db.ensure_schema_state_valid().await?;
    db.ensure_schema_apply_idle("ensure_indices").await?;
    let resolved = db.resolved_branch_target(branch).await?;
@ -403,7 +405,12 @@ pub(super) async fn open_for_mutation(
    table_key: &str,
    op_kind: crate::db::MutationOpKind,
 ) -> Result<(Dataset, String, Option<String>)> {
-    let current_branch = db.coordinator.read().await.current_branch().map(str::to_string);
+    let current_branch = db
+        .coordinator
+        .read()
+        .await
+        .current_branch()
+        .map(str::to_string);
    open_for_mutation_on_branch(db, current_branch.as_deref(), table_key, op_kind).await
 }

@ -807,7 +814,12 @@ pub(super) async fn commit_prepared_updates_on_branch(
    updates: &[crate::db::SubTableUpdate],
    actor_id: Option<&str>,
 ) -> Result<u64> {
-    let current_branch = db.coordinator.read().await.current_branch().map(str::to_string);
+    let current_branch = db
+        .coordinator
+        .read()
+        .await
+        .current_branch()
+        .map(str::to_string);
    let requested_branch = branch.map(str::to_string);
    if requested_branch == current_branch {
        return commit_prepared_updates(db, updates, actor_id).await;
@ -835,7 +847,12 @@ pub(super) async fn commit_prepared_updates_on_branch_with_expected(
    expected_table_versions: &std::collections::HashMap<String, u64>,
    actor_id: Option<&str>,
 ) -> Result<u64> {
-    let current_branch = db.coordinator.read().await.current_branch().map(str::to_string);
+    let current_branch = db
+        .coordinator
+        .read()
+        .await
+        .current_branch()
+        .map(str::to_string);
    let requested_branch = branch.map(str::to_string);
    if requested_branch == current_branch {
        return commit_prepared_updates_with_expected(
@ -870,7 +887,12 @@ pub(super) async fn commit_updates(
    updates: &[crate::db::SubTableUpdate],
 ) -> Result<u64> {
    db.ensure_schema_apply_not_locked("write commit").await?;
-    let current_branch = db.coordinator.read().await.current_branch().map(str::to_string);
+    let current_branch = db
+        .coordinator
+        .read()
+        .await
+        .current_branch()
+        .map(str::to_string);
    let prepared = prepare_updates_for_commit(db, current_branch.as_deref(), updates).await?;
    commit_prepared_updates(db, &prepared, None).await
 }
@ -879,7 +901,11 @@ pub(super) async fn commit_manifest_updates(
    db: &Omnigraph,
    updates: &[crate::db::SubTableUpdate],
 ) -> Result<u64> {
-    db.coordinator.write().await.commit_manifest_updates(updates).await
+    db.coordinator
+        .write()
+        .await
+        .commit_manifest_updates(updates)
+        .await
 }

 pub(super) async fn record_merge_commit(
@ -889,7 +915,9 @@ pub(super) async fn record_merge_commit(
    merged_parent_commit_id: &str,
    actor_id: Option<&str>,
 ) -> Result<String> {
-    db.coordinator.write().await
+    db.coordinator
+        .write()
+        .await
        .record_merge_commit(
            manifest_version,
            parent_commit_id,
@ -923,7 +951,11 @@ pub(super) async fn commit_updates_on_branch_with_expected(
 }

 pub(super) async fn ensure_commit_graph_initialized(db: &Omnigraph) -> Result<()> {
-    db.coordinator.write().await.ensure_commit_graph_initialized().await
+    db.coordinator
+        .write()
+        .await
+        .ensure_commit_graph_initialized()
+        .await
 }

 pub(super) async fn invalidate_graph_index(db: &Omnigraph) {
--- a/crates/omnigraph/src/db/write_queue.rs
+++ b/crates/omnigraph/src/db/write_queue.rs
@ -91,10 +91,7 @@ impl WriteQueueManager {
    /// Empty input returns an empty Vec without touching the map.
    /// Duplicates in `keys` are deduped before acquisition (the same
    /// key acquired twice would deadlock against itself).
-    pub(crate) async fn acquire_many(
-        &self,
-        keys: &[TableQueueKey],
-    ) -> Vec<OwnedMutexGuard<()>> {
+    pub(crate) async fn acquire_many(&self, keys: &[TableQueueKey]) -> Vec<OwnedMutexGuard<()>> {
        if keys.is_empty() {
            return Vec::new();
        }
@ -167,7 +164,10 @@ mod tests {
            qm2.acquire_many(&[z_clone, a_clone]).await
        })
        .await;
-        assert!(result.is_err(), "acquire_many should block on `a`, the lex-first key");
+        assert!(
+            result.is_err(),
+            "acquire_many should block on `a`, the lex-first key"
+        );
    }

    #[tokio::test]
@ -180,9 +180,10 @@ mod tests {
        // Second acquire on same key should NOT complete within 200ms.
        let qm2 = Arc::clone(&qm);
        let k2 = k.clone();
-        let blocked = timeout(Duration::from_millis(200), async move {
-            qm2.acquire(&k2).await
-        })
+        let blocked = timeout(
+            Duration::from_millis(200),
+            async move { qm2.acquire(&k2).await },
+        )
        .await;
        assert!(blocked.is_err(), "second acquire on same key must block");

--- a/crates/omnigraph/src/error.rs
+++ b/crates/omnigraph/src/error.rs
@ -92,6 +92,14 @@ pub enum OmniError {
    /// callers can match on this variant directly.
    #[error("policy: {0}")]
    Policy(String),
+    /// `Omnigraph::init` was called against a URI that already holds
+    /// schema artifacts from a previous init. Strict mode (the default)
+    /// fails fast with this error before touching disk so an existing
+    /// graph's metadata cannot be overwritten or destroyed. Operators
+    /// who actually want to overwrite pass `InitOptions { force: true }`
+    /// (CLI: `omnigraph init --force`).
+    #[error("graph already initialized at '{uri}'; pass --force to overwrite")]
+    AlreadyInitialized { uri: String },
 }

 impl OmniError {
--- a/crates/omnigraph/src/exec/mutation.rs
+++ b/crates/omnigraph/src/exec/mutation.rs
@ -794,11 +794,8 @@ impl Omnigraph {
                    // post_commit_pin) and tidies up. Failing the user
                    // here would return an error for a write that
                    // already landed.
-                    if let Err(err) = crate::db::manifest::delete_sidecar(
-                        &handle,
-                        self.storage_adapter(),
-                    )
-                    .await
+                    if let Err(err) =
+                        crate::db::manifest::delete_sidecar(&handle, self.storage_adapter()).await
                    {
                        tracing::warn!(
                            error = %err,
@ -852,15 +849,8 @@ impl Omnigraph {
                    assignments,
                    predicate,
                } => {
-                    self.execute_update(
-                        type_name,
-                        assignments,
-                        predicate,
-                        params,
-                        branch,
-                        staging,
-                    )
-                    .await?
+                    self.execute_update(type_name, assignments, predicate, params, branch, staging)
+                        .await?
                }
                MutationOpIR::Delete {
                    type_name,
@ -981,14 +971,8 @@ impl Omnigraph {
            // + iterate pending edges in-memory for the `src` column,
            // group-by-src. The pending side already includes the row
            // we just appended (above).
-            validate_edge_cardinality_with_pending(
-                self,
-                &ds,
-                staging,
-                &table_key,
-                edge_type,
-            )
-            .await?;
+            validate_edge_cardinality_with_pending(self, &ds, staging, &table_key, edge_type)
+                .await?;

            self.invalidate_graph_index().await;

@ -1379,14 +1363,8 @@ async fn validate_edge_cardinality_with_pending(
    if edge_type.cardinality.is_default() {
        return Ok(());
    }
-    let counts = super::staging::count_src_per_edge(
-        db,
-        committed_ds,
-        table_key,
-        staging,
-        None,
-    )
-    .await?;
+    let counts =
+        super::staging::count_src_per_edge(db, committed_ds, table_key, staging, None).await?;
    super::staging::enforce_cardinality_bounds(edge_type, &counts)
 }

--- a/crates/omnigraph/src/exec/projection.rs
+++ b/crates/omnigraph/src/exec/projection.rs
@ -345,10 +345,7 @@ fn evaluate_projection(
        IRExpr::PropAccess { variable, property } => {
            let col_name = format!("{}.{}", variable, property);
            let col = wide_batch.column_by_name(&col_name).ok_or_else(|| {
-                OmniError::manifest(format!(
-                    "column '{}' not found in wide batch",
-                    col_name
-                ))
+                OmniError::manifest(format!("column '{}' not found in wide batch", col_name))
            })?;
            Ok((col_name, col.clone()))
        }
@ -516,12 +513,10 @@ fn aggregate_return(
    }

    let num_groups = group_indices.len();
-    let mut result_columns: Vec<(usize, String, ArrayRef)> =
-        Vec::with_capacity(projections.len());
+    let mut result_columns: Vec<(usize, String, ArrayRef)> = Vec::with_capacity(projections.len());

    for gk in &group_keys {
-        let first_row_indices: Vec<u32> =
-            group_indices.iter().map(|rows| rows[0] as u32).collect();
+        let first_row_indices: Vec<u32> = group_indices.iter().map(|rows| rows[0] as u32).collect();
        let take_idx = UInt32Array::from(first_row_indices);
        let col = arrow_select::take::take(gk.column.as_ref(), &take_idx, None)
            .map_err(|e| OmniError::Lance(e.to_string()))?;
@ -584,11 +579,19 @@ fn compute_aggregate(
    }
 }

-fn compute_sum(arg: &ArrayRef, group_indices: &[Vec<usize>], num_groups: usize) -> Result<ArrayRef> {
+fn compute_sum(
+    arg: &ArrayRef,
+    group_indices: &[Vec<usize>],
+    num_groups: usize,
+) -> Result<ArrayRef> {
    macro_rules! sum_numeric {
        ($arr_type:ty, $arg:expr, $dt:expr) => {{
            let arr = $arg.as_any().downcast_ref::<$arr_type>().ok_or_else(|| {
-                OmniError::manifest(format!("sum: expected {:?}, got {:?}", $dt, $arg.data_type()))
+                OmniError::manifest(format!(
+                    "sum: expected {:?}, got {:?}",
+                    $dt,
+                    $arg.data_type()
+                ))
            })?;
            let mut builder = Float64Builder::with_capacity(num_groups);
            for group in group_indices {
@ -613,24 +616,42 @@ fn compute_sum(arg: &ArrayRef, group_indices: &[Vec<usize>], num_groups: usize)
        dt @ DataType::UInt64 => sum_numeric!(UInt64Array, arg, dt),
        dt @ DataType::Float32 => sum_numeric!(Float32Array, arg, dt),
        dt @ DataType::Float64 => sum_numeric!(Float64Array, arg, dt),
-        dt => Err(OmniError::manifest(format!("sum: unsupported type {:?}", dt))),
+        dt => Err(OmniError::manifest(format!(
+            "sum: unsupported type {:?}",
+            dt
+        ))),
    }
 }

-fn compute_avg(arg: &ArrayRef, group_indices: &[Vec<usize>], num_groups: usize) -> Result<ArrayRef> {
+fn compute_avg(
+    arg: &ArrayRef,
+    group_indices: &[Vec<usize>],
+    num_groups: usize,
+) -> Result<ArrayRef> {
    macro_rules! avg_typed {
        ($arr_type:ty, $arg:expr) => {{
            let arr = $arg.as_any().downcast_ref::<$arr_type>().ok_or_else(|| {
-                OmniError::manifest(format!("avg: expected {:?}, got {:?}", stringify!($arr_type), $arg.data_type()))
+                OmniError::manifest(format!(
+                    "avg: expected {:?}, got {:?}",
+                    stringify!($arr_type),
+                    $arg.data_type()
+                ))
            })?;
            let mut builder = Float64Builder::with_capacity(num_groups);
            for group in group_indices {
                let mut sum = 0.0f64;
                let mut count = 0usize;
                for &i in group {
-                    if !arr.is_null(i) { sum += arr.value(i) as f64; count += 1; }
+                    if !arr.is_null(i) {
+                        sum += arr.value(i) as f64;
+                        count += 1;
+                    }
+                }
+                if count > 0 {
+                    builder.append_value(sum / count as f64);
+                } else {
+                    builder.append_null();
                }
-                if count > 0 { builder.append_value(sum / count as f64); } else { builder.append_null(); }
            }
            Ok(Arc::new(builder.finish()) as ArrayRef)
        }};
@ -642,15 +663,27 @@ fn compute_avg(arg: &ArrayRef, group_indices: &[Vec<usize>], num_groups: usize)
        DataType::UInt64 => avg_typed!(UInt64Array, arg),
        DataType::Float32 => avg_typed!(Float32Array, arg),
        DataType::Float64 => avg_typed!(Float64Array, arg),
-        dt => Err(OmniError::manifest(format!("avg: unsupported type {:?}", dt))),
+        dt => Err(OmniError::manifest(format!(
+            "avg: unsupported type {:?}",
+            dt
+        ))),
    }
 }

-fn compute_min_max(arg: &ArrayRef, group_indices: &[Vec<usize>], num_groups: usize, is_min: bool) -> Result<ArrayRef> {
+fn compute_min_max(
+    arg: &ArrayRef,
+    group_indices: &[Vec<usize>],
+    num_groups: usize,
+    is_min: bool,
+) -> Result<ArrayRef> {
    macro_rules! minmax_typed {
        ($arr_type:ty, $builder_type:ty, $arg:expr, $is_min:expr) => {{
            let arr = $arg.as_any().downcast_ref::<$arr_type>().ok_or_else(|| {
-                OmniError::manifest(format!("min/max: expected {:?}, got {:?}", stringify!($arr_type), $arg.data_type()))
+                OmniError::manifest(format!(
+                    "min/max: expected {:?}, got {:?}",
+                    stringify!($arr_type),
+                    $arg.data_type()
+                ))
            })?;
            let mut builder = <$builder_type>::with_capacity(num_groups);
            for group in group_indices {
@ -660,11 +693,20 @@ fn compute_min_max(arg: &ArrayRef, group_indices: &[Vec<usize>], num_groups: usi
                        let v = arr.value(i);
                        result = Some(match result {
                            None => v,
-                            Some(cur) => if $is_min { if v < cur { v } else { cur } } else { if v > cur { v } else { cur } },
+                            Some(cur) => {
+                                if $is_min {
+                                    if v < cur { v } else { cur }
+                                } else {
+                                    if v > cur { v } else { cur }
+                                }
+                            }
                        });
                    }
                }
-                match result { Some(v) => builder.append_value(v), None => builder.append_null() }
+                match result {
+                    Some(v) => builder.append_value(v),
+                    None => builder.append_null(),
+                }
            }
            Ok(Arc::new(builder.finish()) as ArrayRef)
        }};
@ -688,15 +730,27 @@ fn compute_min_max(arg: &ArrayRef, group_indices: &[Vec<usize>], num_groups: usi
                        let v = arr.value(i);
                        result = Some(match result {
                            None => v,
-                            Some(cur) => if is_min { if v < cur { v } else { cur } } else { if v > cur { v } else { cur } },
+                            Some(cur) => {
+                                if is_min {
+                                    if v < cur { v } else { cur }
+                                } else {
+                                    if v > cur { v } else { cur }
+                                }
+                            }
                        });
                    }
                }
-                match result { Some(v) => builder.append_value(v), None => builder.append_null() }
+                match result {
+                    Some(v) => builder.append_value(v),
+                    None => builder.append_null(),
+                }
            }
            Ok(Arc::new(builder.finish()) as ArrayRef)
        }
-        dt => Err(OmniError::manifest(format!("min/max: unsupported type {:?}", dt))),
+        dt => Err(OmniError::manifest(format!(
+            "min/max: unsupported type {:?}",
+            dt
+        ))),
    }
 }

@ -715,7 +769,8 @@ fn build_empty_aggregate_result(projections: &[IRProjection]) -> Result<RecordBa
                }
                _ => {
                    fields.push(Field::new(name, DataType::Float64, true));
-                    columns.push(Arc::new(Float64Array::from(vec![None as Option<f64>])) as ArrayRef);
+                    columns
+                        .push(Arc::new(Float64Array::from(vec![None as Option<f64>])) as ArrayRef);
                }
            },
            _ => {
--- a/crates/omnigraph/src/exec/query.rs
+++ b/crates/omnigraph/src/exec/query.rs
@ -75,14 +75,7 @@ impl Omnigraph {
            None
        };

-        execute_query(
-            &ir,
-            params,
-            &snapshot,
-            graph_index.as_deref(),
-            &catalog,
-        )
-        .await
+        execute_query(&ir, params, &snapshot, graph_index.as_deref(), &catalog).await
    }
 }

@ -360,11 +353,23 @@ pub async fn execute_query(
    }

    let mut wide: Option<RecordBatch> = None;
-    execute_pipeline(&ir.pipeline, params, snapshot, graph_index, catalog, &mut wide, &search_mode).await?;
+    execute_pipeline(
+        &ir.pipeline,
+        params,
+        snapshot,
+        graph_index,
+        catalog,
+        &mut wide,
+        &search_mode,
+    )
+    .await?;
    let wide_batch = wide.unwrap_or_else(|| RecordBatch::new_empty(Arc::new(Schema::empty())));

    // Project return expressions
-    let has_aggregates = ir.return_exprs.iter().any(|p| matches!(&p.expr, IRExpr::Aggregate { .. }));
+    let has_aggregates = ir
+        .return_exprs
+        .iter()
+        .any(|p| matches!(&p.expr, IRExpr::Aggregate { .. }));
    let mut result_batch = project_return(&wide_batch, &ir.return_exprs, params)?;

    // Apply ordering (skip if search mode already ordered the results)
@ -516,9 +521,9 @@ async fn execute_rrf_query(
 }

 fn extract_id_column_by_name(batch: &RecordBatch, col_name: &str) -> Result<Vec<String>> {
-    let col = batch
-        .column_by_name(col_name)
-        .ok_or_else(|| OmniError::manifest(format!("batch missing '{}' column for RRF", col_name)))?;
+    let col = batch.column_by_name(col_name).ok_or_else(|| {
+        OmniError::manifest(format!("batch missing '{}' column for RRF", col_name))
+    })?;
    let ids = col
        .as_any()
        .downcast_ref::<StringArray>()
@ -653,8 +658,19 @@ fn execute_pipeline<'a>(
                    })?;
                    if let Some(batch) = wide.as_mut() {
                        execute_expand(
-                            batch, gi, snapshot, catalog, src_var, dst_var, edge_type, *direction,
-                            dst_type, *min_hops, *max_hops, dst_filters, params,
+                            batch,
+                            gi,
+                            snapshot,
+                            catalog,
+                            src_var,
+                            dst_var,
+                            edge_type,
+                            *direction,
+                            dst_type,
+                            *min_hops,
+                            *max_hops,
+                            dst_filters,
+                            params,
                        )
                        .await?;
                    }
@ -691,7 +707,9 @@ async fn execute_expand(
    let src_id_col_name = format!("{}.id", src_var);
    let src_ids = wide
        .column_by_name(&src_id_col_name)
-        .ok_or_else(|| OmniError::manifest(format!("wide batch missing '{}' column", src_id_col_name)))?
+        .ok_or_else(|| {
+            OmniError::manifest(format!("wide batch missing '{}' column", src_id_col_name))
+        })?
        .as_any()
        .downcast_ref::<StringArray>()
        .ok_or_else(|| OmniError::manifest(format!("'{}' column is not Utf8", src_id_col_name)))?
@ -1421,22 +1439,39 @@ fn literal_to_expr(lit: &Literal) -> Option<datafusion::prelude::Expr> {
 }

 fn prefix_batch(batch: &RecordBatch, variable: &str) -> Result<RecordBatch> {
-    let fields: Vec<Field> = batch.schema().fields().iter().map(|f| {
-        Field::new(format!("{}.{}", variable, f.name()), f.data_type().clone(), f.is_nullable())
-    }).collect();
+    let fields: Vec<Field> = batch
+        .schema()
+        .fields()
+        .iter()
+        .map(|f| {
+            Field::new(
+                format!("{}.{}", variable, f.name()),
+                f.data_type().clone(),
+                f.is_nullable(),
+            )
+        })
+        .collect();
    let schema = Arc::new(Schema::new(fields));
-    RecordBatch::try_new(schema, batch.columns().to_vec()).map_err(|e| OmniError::Lance(e.to_string()))
+    RecordBatch::try_new(schema, batch.columns().to_vec())
+        .map_err(|e| OmniError::Lance(e.to_string()))
 }

 fn cross_join_batches(left: &RecordBatch, right: &RecordBatch) -> Result<RecordBatch> {
    let n = left.num_rows();
    let m = right.num_rows();
    if n == 0 || m == 0 {
-        let mut fields: Vec<Field> = left.schema().fields().iter().map(|f| f.as_ref().clone()).collect();
+        let mut fields: Vec<Field> = left
+            .schema()
+            .fields()
+            .iter()
+            .map(|f| f.as_ref().clone())
+            .collect();
        fields.extend(right.schema().fields().iter().map(|f| f.as_ref().clone()));
        return Ok(RecordBatch::new_empty(Arc::new(Schema::new(fields))));
    }
-    let left_indices: Vec<u32> = (0..n as u32).flat_map(|i| std::iter::repeat(i).take(m)).collect();
+    let left_indices: Vec<u32> = (0..n as u32)
+        .flat_map(|i| std::iter::repeat(i).take(m))
+        .collect();
    let right_indices: Vec<u32> = (0..n).flat_map(|_| 0..m as u32).collect();
    let left_expanded = take_batch(left, &UInt32Array::from(left_indices))?;
    let right_expanded = take_batch(right, &UInt32Array::from(right_indices))?;
@ -1444,23 +1479,39 @@ fn cross_join_batches(left: &RecordBatch, right: &RecordBatch) -> Result<RecordB
 }

 fn hconcat_batches(left: &RecordBatch, right: &RecordBatch) -> Result<RecordBatch> {
-    let mut fields: Vec<Field> = left.schema().fields().iter().map(|f| f.as_ref().clone()).collect();
+    let mut fields: Vec<Field> = left
+        .schema()
+        .fields()
+        .iter()
+        .map(|f| f.as_ref().clone())
+        .collect();
    if cfg!(debug_assertions) {
        let left_schema = left.schema();
-        let left_names: HashSet<&str> = left_schema.fields().iter().map(|f| f.name().as_str()).collect();
+        let left_names: HashSet<&str> = left_schema
+            .fields()
+            .iter()
+            .map(|f| f.name().as_str())
+            .collect();
        let right_schema = right.schema();
        for f in right_schema.fields() {
-            debug_assert!(!left_names.contains(f.name().as_str()), "hconcat_batches: duplicate column '{}'", f.name());
+            debug_assert!(
+                !left_names.contains(f.name().as_str()),
+                "hconcat_batches: duplicate column '{}'",
+                f.name()
+            );
        }
    }
    fields.extend(right.schema().fields().iter().map(|f| f.as_ref().clone()));
    let mut columns: Vec<ArrayRef> = left.columns().to_vec();
    columns.extend(right.columns().to_vec());
-    RecordBatch::try_new(Arc::new(Schema::new(fields)), columns).map_err(|e| OmniError::Lance(e.to_string()))
+    RecordBatch::try_new(Arc::new(Schema::new(fields)), columns)
+        .map_err(|e| OmniError::Lance(e.to_string()))
 }

 fn take_batch(batch: &RecordBatch, indices: &UInt32Array) -> Result<RecordBatch> {
-    let columns: Vec<ArrayRef> = batch.columns().iter()
+    let columns: Vec<ArrayRef> = batch
+        .columns()
+        .iter()
        .map(|col| arrow_select::take::take(col.as_ref(), indices, None))
        .collect::<std::result::Result<Vec<_>, _>>()
        .map_err(|e| OmniError::Lance(e.to_string()))?;
--- a/crates/omnigraph/src/loader/mod.rs
+++ b/crates/omnigraph/src/loader/mod.rs
@ -212,12 +212,7 @@ impl Omnigraph {
            .await
    }

-    pub async fn load_file(
-        &self,
-        branch: &str,
-        path: &str,
-        mode: LoadMode,
-    ) -> Result<LoadResult> {
+    pub async fn load_file(&self, branch: &str, path: &str, mode: LoadMode) -> Result<LoadResult> {
        self.load_file_as(branch, path, mode, None).await
    }

@ -457,13 +452,7 @@ async fn load_jsonl_reader<R: BufRead>(
    for (edge_name, rows) in &edge_rows {
        let edge_type = &catalog.edge_types[edge_name];
        let from_ids = if use_staging {
-            collect_node_ids_with_pending(
-                db,
-                branch,
-                &edge_type.from_type,
-                &staging,
-            )
-            .await?
+            collect_node_ids_with_pending(db, branch, &edge_type.from_type, &staging).await?
        } else {
            collect_node_ids(
                db,
@ -476,13 +465,7 @@ async fn load_jsonl_reader<R: BufRead>(
            .await?
        };
        let to_ids = if use_staging {
-            collect_node_ids_with_pending(
-                db,
-                branch,
-                &edge_type.to_type,
-                &staging,
-            )
-            .await?
+            collect_node_ids_with_pending(db, branch, &edge_type.to_type, &staging).await?
        } else {
            collect_node_ids(
                db,
@ -581,12 +564,7 @@ async fn load_jsonl_reader<R: BufRead>(
        let table_key = format!("edge:{}", edge_name);
        if use_staging {
            validate_edge_cardinality_with_pending_loader(
-                db,
-                branch,
-                edge_type,
-                &table_key,
-                &staging,
-                mode,
+                db, branch, edge_type, &table_key, &staging, mode,
            )
            .await?;
        } else if let Some(update) = overwrite_updates.iter().find(|u| u.table_key == table_key) {
@ -1699,8 +1677,7 @@ async fn validate_edge_cardinality_with_pending_loader(
        LoadMode::Append | LoadMode::Overwrite => None,
    };
    let counts =
-        crate::exec::staging::count_src_per_edge(db, &ds, table_key, staging, dedupe_key)
-            .await?;
+        crate::exec::staging::count_src_per_edge(db, &ds, table_key, staging, dedupe_key).await?;
    crate::exec::staging::enforce_cardinality_bounds(edge_type, &counts)
 }

--- a/crates/omnigraph/src/storage.rs
+++ b/crates/omnigraph/src/storage.rs
@ -7,7 +7,8 @@ use async_trait::async_trait;
 use futures::TryStreamExt;
 use object_store::aws::AmazonS3Builder;
 use object_store::path::Path as ObjectPath;
-use object_store::{DynObjectStore, ObjectStore, PutPayload};
+use object_store::{DynObjectStore, ObjectStore, PutMode, PutPayload};
+use tokio::io::AsyncWriteExt;
 use url::Url;

 use crate::error::{OmniError, Result};
@ -19,6 +20,13 @@ const S3_SCHEME_PREFIX: &str = "s3://";
 pub trait StorageAdapter: Debug + Send + Sync {
    async fn read_text(&self, uri: &str) -> Result<String>;
    async fn write_text(&self, uri: &str, contents: &str) -> Result<()>;
+    /// Write a text object only if no object exists at `uri`.
+    ///
+    /// Returns `Ok(true)` when this call created the object, `Ok(false)`
+    /// when the object already existed, and propagates every other storage
+    /// error. Callers use this to establish ownership before running
+    /// best-effort cleanup on partial failure.
+    async fn write_text_if_absent(&self, uri: &str, contents: &str) -> Result<bool>;
    async fn exists(&self, uri: &str) -> Result<bool>;
    /// Move a file from `from_uri` to `to_uri`, replacing any existing file at
    /// `to_uri`. Atomic on local POSIX; on S3 implemented as copy + delete
@ -77,6 +85,30 @@ impl StorageAdapter for LocalStorageAdapter {
        Ok(())
    }

+    async fn write_text_if_absent(&self, uri: &str, contents: &str) -> Result<bool> {
+        let path = local_path_from_uri(uri)?;
+        if let Some(parent) = path.parent() {
+            if !parent.as_os_str().is_empty() {
+                tokio::fs::create_dir_all(parent).await?;
+            }
+        }
+        let mut file = match tokio::fs::OpenOptions::new()
+            .write(true)
+            .create_new(true)
+            .open(&path)
+            .await
+        {
+            Ok(file) => file,
+            Err(err) if err.kind() == std::io::ErrorKind::AlreadyExists => return Ok(false),
+            Err(err) => return Err(err.into()),
+        };
+        if let Err(err) = file.write_all(contents.as_bytes()).await {
+            let _ = tokio::fs::remove_file(&path).await;
+            return Err(err.into());
+        }
+        Ok(true)
+    }
+
    async fn exists(&self, uri: &str) -> Result<bool> {
        Ok(local_path_from_uri(uri)?.exists())
    }
@ -146,6 +178,24 @@ impl StorageAdapter for S3StorageAdapter {
        Ok(())
    }

+    async fn write_text_if_absent(&self, uri: &str, contents: &str) -> Result<bool> {
+        let location = self.object_path(uri)?;
+        match self
+            .store
+            .put_opts(
+                &location,
+                PutPayload::from(contents.as_bytes().to_vec()),
+                PutMode::Create.into(),
+            )
+            .await
+        {
+            Ok(_) => Ok(true),
+            Err(object_store::Error::AlreadyExists { .. })
+            | Err(object_store::Error::Precondition { .. }) => Ok(false),
+            Err(err) => Err(storage_backend_error("write_if_absent", uri, err)),
+        }
+    }
+
    async fn exists(&self, uri: &str) -> Result<bool> {
        let location = self.object_path(uri)?;
        match self.store.head(&location).await {
@ -447,4 +497,16 @@ mod tests {
        assert_eq!(location.bucket, "bucket");
        assert_eq!(location.key, "graph/_schema.pg");
    }
+
+    #[tokio::test]
+    async fn local_write_text_if_absent_creates_once_without_overwrite() {
+        let dir = tempfile::tempdir().unwrap();
+        let uri = dir.path().join("claim.txt");
+        let uri = uri.to_str().unwrap();
+        let storage = LocalStorageAdapter;
+
+        assert!(storage.write_text_if_absent(uri, "first").await.unwrap());
+        assert!(!storage.write_text_if_absent(uri, "second").await.unwrap());
+        assert_eq!(storage.read_text(uri).await.unwrap(), "first");
+    }
 }
--- a/crates/omnigraph/src/storage_layer.rs
+++ b/crates/omnigraph/src/storage_layer.rs
@ -94,7 +94,9 @@ impl SnapshotHandle {
    /// Construct from a Lance dataset. `pub(crate)` — only
    /// `TableStore` should produce these.
    pub(crate) fn new(ds: Dataset) -> Self {
-        Self { inner: Arc::new(ds) }
+        Self {
+            inner: Arc::new(ds),
+        }
    }

    /// Borrow the underlying Lance dataset. `pub(crate)` so only the
@ -242,16 +244,10 @@ pub trait TableStorage: sealed::Sealed + Send + Sync + Debug {

    async fn scan_batches(&self, snapshot: &SnapshotHandle) -> Result<Vec<RecordBatch>>;

-    async fn scan_batches_for_rewrite(
-        &self,
-        snapshot: &SnapshotHandle,
-    ) -> Result<Vec<RecordBatch>>;
+    async fn scan_batches_for_rewrite(&self, snapshot: &SnapshotHandle)
+    -> Result<Vec<RecordBatch>>;

-    async fn count_rows(
-        &self,
-        snapshot: &SnapshotHandle,
-        filter: Option<String>,
-    ) -> Result<usize>;
+    async fn count_rows(&self, snapshot: &SnapshotHandle, filter: Option<String>) -> Result<usize>;

    async fn count_rows_with_staged(
        &self,
@ -284,11 +280,8 @@ pub trait TableStorage: sealed::Sealed + Send + Sync + Debug {
        filter: &str,
    ) -> Result<Option<u64>>;

-    async fn table_state(
-        &self,
-        dataset_uri: &str,
-        snapshot: &SnapshotHandle,
-    ) -> Result<TableState>;
+    async fn table_state(&self, dataset_uri: &str, snapshot: &SnapshotHandle)
+    -> Result<TableState>;

    // ── Staged writes (no HEAD advance) ────────────────────────────────

@ -565,11 +558,7 @@ impl TableStorage for TableStore {
        TableStore::scan_batches_for_rewrite(self, snapshot.dataset()).await
    }

-    async fn count_rows(
-        &self,
-        snapshot: &SnapshotHandle,
-        filter: Option<String>,
-    ) -> Result<usize> {
+    async fn count_rows(&self, snapshot: &SnapshotHandle, filter: Option<String>) -> Result<usize> {
        TableStore::count_rows(self, snapshot.dataset(), filter).await
    }

@ -591,14 +580,8 @@ impl TableStorage for TableStore {
        filter: Option<&str>,
    ) -> Result<Vec<RecordBatch>> {
        let staged_writes = staged_handles_as_writes(staged);
-        TableStore::scan_with_staged(
-            self,
-            snapshot.dataset(),
-            &staged_writes,
-            projection,
-            filter,
-        )
-        .await
+        TableStore::scan_with_staged(self, snapshot.dataset(), &staged_writes, projection, filter)
+            .await
    }

    async fn scan_with_pending(
@ -658,18 +641,10 @@ impl TableStorage for TableStore {
        when_matched: WhenMatched,
        when_not_matched: WhenNotMatched,
    ) -> Result<StagedHandle> {
-        let ds = Arc::try_unwrap(snapshot.into_arc())
-            .unwrap_or_else(|arc| (*arc).clone());
-        TableStore::stage_merge_insert(
-            self,
-            ds,
-            batch,
-            key_columns,
-            when_matched,
-            when_not_matched,
-        )
-        .await
-        .map(StagedHandle::new)
+        let ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
+        TableStore::stage_merge_insert(self, ds, batch, key_columns, when_matched, when_not_matched)
+            .await
+            .map(StagedHandle::new)
    }

    async fn commit_staged(
@ -720,8 +695,7 @@ impl TableStorage for TableStore {
        snapshot: SnapshotHandle,
        batch: RecordBatch,
    ) -> Result<(SnapshotHandle, TableState)> {
-        let mut ds = Arc::try_unwrap(snapshot.into_arc())
-            .unwrap_or_else(|arc| (*arc).clone());
+        let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
        let state = TableStore::append_batch(self, dataset_uri, &mut ds, batch).await?;
        Ok((SnapshotHandle::new(ds), state))
    }
@ -735,8 +709,7 @@ impl TableStorage for TableStore {
        when_matched: WhenMatched,
        when_not_matched: WhenNotMatched,
    ) -> Result<TableState> {
-        let ds = Arc::try_unwrap(snapshot.into_arc())
-            .unwrap_or_else(|arc| (*arc).clone());
+        let ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
        TableStore::merge_insert_batches(
            self,
            dataset_uri,
@ -755,8 +728,7 @@ impl TableStorage for TableStore {
        snapshot: SnapshotHandle,
        batch: RecordBatch,
    ) -> Result<(SnapshotHandle, TableState)> {
-        let mut ds = Arc::try_unwrap(snapshot.into_arc())
-            .unwrap_or_else(|arc| (*arc).clone());
+        let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
        let state = TableStore::overwrite_batch(self, dataset_uri, &mut ds, batch).await?;
        Ok((SnapshotHandle::new(ds), state))
    }
@ -767,8 +739,7 @@ impl TableStorage for TableStore {
        snapshot: SnapshotHandle,
        filter: &str,
    ) -> Result<(SnapshotHandle, DeleteState)> {
-        let mut ds = Arc::try_unwrap(snapshot.into_arc())
-            .unwrap_or_else(|arc| (*arc).clone());
+        let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
        let state = TableStore::delete_where(self, dataset_uri, &mut ds, filter).await?;
        Ok((SnapshotHandle::new(ds), state))
    }
@ -790,8 +761,7 @@ impl TableStorage for TableStore {
        snapshot: SnapshotHandle,
        columns: &[&str],
    ) -> Result<SnapshotHandle> {
-        let mut ds = Arc::try_unwrap(snapshot.into_arc())
-            .unwrap_or_else(|arc| (*arc).clone());
+        let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
        TableStore::create_btree_index(self, &mut ds, columns).await?;
        Ok(SnapshotHandle::new(ds))
    }
@ -801,8 +771,7 @@ impl TableStorage for TableStore {
        snapshot: SnapshotHandle,
        column: &str,
    ) -> Result<SnapshotHandle> {
-        let mut ds = Arc::try_unwrap(snapshot.into_arc())
-            .unwrap_or_else(|arc| (*arc).clone());
+        let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
        TableStore::create_inverted_index(self, &mut ds, column).await?;
        Ok(SnapshotHandle::new(ds))
    }
@ -812,8 +781,7 @@ impl TableStorage for TableStore {
        snapshot: SnapshotHandle,
        column: &str,
    ) -> Result<SnapshotHandle> {
-        let mut ds = Arc::try_unwrap(snapshot.into_arc())
-            .unwrap_or_else(|arc| (*arc).clone());
+        let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
        TableStore::create_vector_index(self, &mut ds, column).await?;
        Ok(SnapshotHandle::new(ds))
    }
@ -837,6 +805,13 @@ impl TableStorage for TableStore {
        // Note: existing TableStore::scan_stream is an associated fn that
        // takes &Dataset, so we delegate via the dataset reference held by
        // the snapshot.
-        TableStore::scan_stream(snapshot.dataset(), projection, filter, order_by, with_row_id).await
+        TableStore::scan_stream(
+            snapshot.dataset(),
+            projection,
+            filter,
+            order_by,
+            with_row_id,
+        )
+        .await
    }
 }
--- a/crates/omnigraph/src/table_store.rs
+++ b/crates/omnigraph/src/table_store.rs
@ -1793,25 +1793,24 @@ mod tests {
    #[test]
    fn check_batch_unique_by_keys_errors_on_duplicate_id() {
        let batch = batch_with_ids(&["a", "b", "a"]);
-        let err =
-            check_batch_unique_by_keys(&batch, &["id".to_string()], "test").unwrap_err();
+        let err = check_batch_unique_by_keys(&batch, &["id".to_string()], "test").unwrap_err();
        let msg = err.to_string();
        assert!(
            msg.contains("duplicate source row for key 'a'"),
            "unexpected error: {msg}"
        );
-        assert!(msg.contains("MR-957"), "error should reference MR-957: {msg}");
+        assert!(
+            msg.contains("MR-957"),
+            "error should reference MR-957: {msg}"
+        );
    }

    #[test]
    fn check_batch_unique_by_keys_rejects_multi_column_keys() {
        let batch = batch_with_ids(&["a"]);
-        let err = check_batch_unique_by_keys(
-            &batch,
-            &["id".to_string(), "other".to_string()],
-            "test",
-        )
-        .unwrap_err();
+        let err =
+            check_batch_unique_by_keys(&batch, &["id".to_string(), "other".to_string()], "test")
+                .unwrap_err();
        assert!(err.to_string().contains("single-column keys only"));
    }
 }
--- a/crates/omnigraph/tests/end_to_end.rs
+++ b/crates/omnigraph/tests/end_to_end.rs
@ -1910,9 +1910,14 @@ query docs_with_tag($tag: String) {
    return { $d.slug }
 }
 "#;
-    let result = query_main(&mut db, queries, "docs_with_tag", &params(&[("$tag", "red")]))
-        .await
-        .unwrap();
+    let result = query_main(
+        &mut db,
+        queries,
+        "docs_with_tag",
+        &params(&[("$tag", "red")]),
+    )
+    .await
+    .unwrap();

    let batch = result.concat_batches().unwrap();
    let slugs = batch
--- a/crates/omnigraph/tests/failpoints.rs
+++ b/crates/omnigraph/tests/failpoints.rs
@ -1666,3 +1666,143 @@ async fn ensure_indices_phase_b_failure_does_not_leak_sidecar_when_no_work_neede
        "_graph_commit_recoveries.lance must NOT exist when no sidecar was processed"
    );
 }
+
+// ─── MR-668 PR 2a: Omnigraph::init cleanup on partial failure ──────────────
+//
+// `init_with_storage` writes three schema artifacts before invoking
+// `GraphCoordinator::init`. Without cleanup, a failure between any of those
+// steps left orphan files behind, making the URI unusable for a retry of
+// `init` (it would refuse because `_schema.pg` already exists). The tests
+// below pin: on failpoint trigger at each of the three phase boundaries,
+// the three schema files are removed before the error is returned.
+//
+// Coverage note: the third boundary (`init.after_coordinator_init`) only
+// asserts cleanup of the schema files. Lance per-type directories and
+// `__manifest/` are NOT cleaned up — that requires a recursive
+// `StorageAdapter::delete_prefix` primitive deferred along with
+// `DELETE /graphs/{id}` (MR-668 PR 2b). The orphan Lance directories
+// after a coordinator-init-phase failure are documented as a known
+// limitation.
+
+#[tokio::test]
+async fn init_failpoint_after_schema_pg_written_cleans_up_schema_file() {
+    let _scenario = FailScenario::setup();
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap();
+    let _failpoint = ScopedFailPoint::new("init.after_schema_pg_written", "return");
+
+    let err = match Omnigraph::init(uri, helpers::TEST_SCHEMA).await {
+        Ok(_) => panic!("expected Omnigraph::init to fail at the configured failpoint"),
+        Err(e) => e,
+    };
+    assert!(
+        err.to_string()
+            .contains("injected failpoint triggered: init.after_schema_pg_written"),
+        "got: {err}"
+    );
+
+    // Only `_schema.pg` was written at this phase boundary, but the
+    // cleanup attempts all three — `delete` treats not-found as Ok,
+    // so the other two deletes are no-ops.
+    assert!(
+        !dir.path().join("_schema.pg").exists(),
+        "_schema.pg must be cleaned up after init failure"
+    );
+}
+
+#[tokio::test]
+async fn init_failpoint_after_schema_contract_written_cleans_up_all_schema_files() {
+    let _scenario = FailScenario::setup();
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap();
+    let _failpoint = ScopedFailPoint::new("init.after_schema_contract_written", "return");
+
+    let err = match Omnigraph::init(uri, helpers::TEST_SCHEMA).await {
+        Ok(_) => panic!("expected Omnigraph::init to fail at the configured failpoint"),
+        Err(e) => e,
+    };
+    assert!(
+        err.to_string()
+            .contains("injected failpoint triggered: init.after_schema_contract_written"),
+        "got: {err}"
+    );
+
+    assert!(
+        !dir.path().join("_schema.pg").exists(),
+        "_schema.pg must be cleaned up"
+    );
+    assert!(
+        !dir.path().join("_schema.ir.json").exists(),
+        "_schema.ir.json must be cleaned up"
+    );
+    assert!(
+        !dir.path().join("__schema_state.json").exists(),
+        "__schema_state.json must be cleaned up"
+    );
+}
+
+#[tokio::test]
+async fn init_failpoint_after_coordinator_init_cleans_up_schema_files() {
+    let _scenario = FailScenario::setup();
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap();
+    let _failpoint = ScopedFailPoint::new("init.after_coordinator_init", "return");
+
+    let err = match Omnigraph::init(uri, helpers::TEST_SCHEMA).await {
+        Ok(_) => panic!("expected Omnigraph::init to fail at the configured failpoint"),
+        Err(e) => e,
+    };
+    assert!(
+        err.to_string()
+            .contains("injected failpoint triggered: init.after_coordinator_init"),
+        "got: {err}"
+    );
+
+    // Schema files are cleaned up by `best_effort_cleanup_init_artifacts`.
+    assert!(
+        !dir.path().join("_schema.pg").exists(),
+        "_schema.pg must be cleaned up after late-phase init failure"
+    );
+    assert!(
+        !dir.path().join("_schema.ir.json").exists(),
+        "_schema.ir.json must be cleaned up after late-phase init failure"
+    );
+    assert!(
+        !dir.path().join("__schema_state.json").exists(),
+        "__schema_state.json must be cleaned up after late-phase init failure"
+    );
+
+    // Documented limitation: Lance per-type datasets and `__manifest/`
+    // created by `GraphCoordinator::init` are NOT cleaned up — recursive
+    // deletion requires the deferred `delete_prefix` primitive. This
+    // assertion does NOT check for their absence; it merely documents
+    // the boundary by noting we don't validate orphan directories here.
+    // When PR 2b lands, this test can be tightened to assert the graph
+    // root is fully empty.
+}
+
+#[tokio::test]
+async fn init_failpoint_returns_original_error_not_cleanup_error() {
+    // The cleanup is best-effort. If `storage.delete` fails (e.g. transient
+    // network blip on S3), the original init failpoint error must still
+    // surface — not be masked by a cleanup failure. This test triggers the
+    // failpoint and asserts the returned error references the failpoint,
+    // not the cleanup. (The cleanup currently logs via `tracing::warn`;
+    // we can't easily fault-inject delete failures without another seam,
+    // so this is a smoke test for the precedence contract.)
+    let _scenario = FailScenario::setup();
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap();
+    let _failpoint = ScopedFailPoint::new("init.after_schema_pg_written", "return");
+
+    let err = match Omnigraph::init(uri, helpers::TEST_SCHEMA).await {
+        Ok(_) => panic!("expected Omnigraph::init to fail at the configured failpoint"),
+        Err(e) => e,
+    };
+    // Failpoint message wins; no "cleanup" substring expected.
+    let msg = err.to_string();
+    assert!(
+        msg.contains("init.after_schema_pg_written"),
+        "init error must surface the failpoint cause, got: {msg}"
+    );
+}
--- a/crates/omnigraph/tests/forbidden_apis.rs
+++ b/crates/omnigraph/tests/forbidden_apis.rs
@ -95,11 +95,11 @@ const FORBIDDEN_PATTERNS: &[&str] = &[
 /// provide the staged primitives or to maintain the system tables
 /// (commit graph, manifest).
 const ALLOW_LIST_FILES: &[&str] = &[
-    "table_store.rs",        // The storage layer itself.
-    "storage_layer.rs",      // The trait module.
-    "commit_graph.rs",       // Maintains `_graph_commits.lance` system table.
-    "graph_coordinator.rs",  // Drives the manifest publisher / branch coordinator.
-    "recovery_audit.rs",     // Maintains `_graph_commit_recoveries.lance` (recovery audit trail).
+    "table_store.rs",       // The storage layer itself.
+    "storage_layer.rs",     // The trait module.
+    "commit_graph.rs",      // Maintains `_graph_commits.lance` system table.
+    "graph_coordinator.rs", // Drives the manifest publisher / branch coordinator.
+    "recovery_audit.rs",    // Maintains `_graph_commit_recoveries.lance` (recovery audit trail).
 ];

 /// Directories exempt from the guard. Files under these paths may use
@ -168,10 +168,7 @@ fn engine_code_does_not_call_forbidden_lance_apis() {
            // comments are documentation, not code use. The trait
            // surface (sealed + trait-only) is the actual enforcement;
            // this test only catches code use.
-            if trimmed.starts_with("//")
-                || trimmed.starts_with("/*")
-                || trimmed.starts_with("*")
-            {
+            if trimmed.starts_with("//") || trimmed.starts_with("/*") || trimmed.starts_with("*") {
                continue;
            }
            // Allow lines marked with the sentinel on the SAME line or
--- a/crates/omnigraph/tests/lifecycle.rs
+++ b/crates/omnigraph/tests/lifecycle.rs
@ -2,7 +2,7 @@ mod helpers;

 use std::fs;

-use omnigraph::db::{Omnigraph, ReadTarget};
+use omnigraph::db::{InitOptions, Omnigraph, ReadTarget};
 use omnigraph_compiler::schema::parser::parse_schema;
 use omnigraph_compiler::{build_schema_ir, schema_ir_pretty_json};

@ -185,3 +185,122 @@ async fn snapshot_version_is_pinned() {

    assert_eq!(snap1.version(), v1);
 }
+
+/// Regression for the `Omnigraph::init` re-init footgun (MR-668
+/// follow-up): a second `init` against a URI that already holds a
+/// graph must NOT modify or destroy the existing graph's schema
+/// artifacts. Today's behavior is destructive either way — the
+/// `write_text(_schema.pg, ...)` call at the top of
+/// `init_storage_phase` overwrites the existing file before any
+/// preflight, and `best_effort_cleanup_init_artifacts` will later
+/// delete all three files if the inner `GraphCoordinator::init`
+/// fails. Both outcomes corrupt an existing graph.
+///
+/// After the fix: strict-mode `init` (no `force` flag) errors out
+/// before touching any file, and the original schema artifacts
+/// match their pre-attempt contents byte-for-byte.
+#[tokio::test]
+async fn init_on_existing_graph_uri_does_not_destroy_existing_schema() {
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap();
+
+    // Establish the first graph and snapshot its three schema files.
+    Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
+    let original_schema_pg = fs::read_to_string(dir.path().join("_schema.pg")).unwrap();
+    let original_schema_ir = fs::read_to_string(dir.path().join("_schema.ir.json")).unwrap();
+    let original_schema_state = fs::read_to_string(dir.path().join("__schema_state.json")).unwrap();
+
+    // Attempt a re-init with a deliberately different schema so any
+    // overwrite would be observable in the file contents.
+    let different_schema = "node Other { id: String @key }\n";
+    let result = Omnigraph::init(uri, different_schema).await;
+
+    // The new init must report the conflict, not silently mutate.
+    assert!(
+        result.is_err(),
+        "init against an existing graph URI must error, not silently overwrite"
+    );
+
+    // The three schema files must remain present and byte-identical to
+    // their pre-attempt contents.
+    assert!(
+        dir.path().join("_schema.pg").exists(),
+        "_schema.pg must not be deleted by a failed re-init"
+    );
+    assert!(
+        dir.path().join("_schema.ir.json").exists(),
+        "_schema.ir.json must not be deleted by a failed re-init"
+    );
+    assert!(
+        dir.path().join("__schema_state.json").exists(),
+        "__schema_state.json must not be deleted by a failed re-init"
+    );
+    assert_eq!(
+        fs::read_to_string(dir.path().join("_schema.pg")).unwrap(),
+        original_schema_pg,
+        "_schema.pg contents must be preserved when re-init is rejected"
+    );
+    assert_eq!(
+        fs::read_to_string(dir.path().join("_schema.ir.json")).unwrap(),
+        original_schema_ir,
+        "_schema.ir.json contents must be preserved when re-init is rejected"
+    );
+    assert_eq!(
+        fs::read_to_string(dir.path().join("__schema_state.json")).unwrap(),
+        original_schema_state,
+        "__schema_state.json contents must be preserved when re-init is rejected"
+    );
+}
+
+/// Happy-path sibling to the strict re-init regression above:
+/// `InitOptions { force: true }` must skip the schema-file preflight
+/// when the operator deliberately wants to recover from orphan
+/// schema artifacts (e.g. files left behind by a failed prior init).
+///
+/// Documented semantics per `InitOptions::force`: skips the preflight
+/// only. Force does NOT purge existing Lance datasets or `__manifest/`
+/// — that needs `StorageAdapter::delete_prefix`, which is tracked
+/// separately. The realistic recovery scenario is "schema files
+/// exist but Lance state doesn't," which this test reproduces.
+///
+/// Without this test, a future refactor could invert the `if !force`
+/// branch and silently break the operator-facing escape hatch.
+#[tokio::test]
+async fn init_with_force_recovers_from_orphan_schema_files() {
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap();
+
+    // Simulate orphan schema files: write `_schema.pg` to disk
+    // without running a full init. The preflight will see it and
+    // bail in strict mode.
+    fs::write(dir.path().join("_schema.pg"), TEST_SCHEMA).unwrap();
+
+    // Strict mode refuses because `_schema.pg` exists.
+    let strict_err = match Omnigraph::init(uri, TEST_SCHEMA).await {
+        Ok(_) => panic!("strict init must refuse when orphan _schema.pg exists"),
+        Err(e) => e,
+    };
+    assert!(
+        strict_err.to_string().contains("already initialized"),
+        "strict init must surface AlreadyInitialized (sanity check); got: {strict_err}"
+    );
+
+    // Force init succeeds: it skips the preflight, overwrites the
+    // orphan file, and proceeds to initialize Lance state (which
+    // didn't exist, so `GraphCoordinator::init` is unblocked).
+    let db = Omnigraph::init_with_options(uri, TEST_SCHEMA, InitOptions { force: true })
+        .await
+        .expect("force init must succeed when only orphan schema files block strict init");
+
+    // Confirm the catalog is populated as expected — proves the
+    // graph is functional after force-recovery, not just that the
+    // call returned Ok.
+    assert!(
+        db.catalog().node_types.contains_key("Person"),
+        "force-recovered graph must have the new catalog installed"
+    );
+    assert!(
+        dir.path().join("__schema_state.json").exists(),
+        "force-recovered graph must have full schema state written"
+    );
+}
--- a/crates/omnigraph/tests/policy_engine_chassis.rs
+++ b/crates/omnigraph/tests/policy_engine_chassis.rs
@ -23,8 +23,8 @@ use std::path::Path;
 use std::sync::Arc;

 use omnigraph::db::{Omnigraph, ReadTarget, SchemaApplyOptions};
-use omnigraph::loader::LoadMode;
 use omnigraph::error::OmniError;
+use omnigraph::loader::LoadMode;
 use omnigraph_policy::{PolicyChecker, PolicyEngine};

 use helpers::*;
@ -58,13 +58,16 @@ rules:
 "#;

 fn additive_schema() -> String {
-    helpers::TEST_SCHEMA.replace("    age: I32?\n}", "    age: I32?\n    nickname: String?\n}")
+    helpers::TEST_SCHEMA.replace(
+        "    age: I32?\n}",
+        "    age: I32?\n    nickname: String?\n}",
+    )
 }

 fn install_policy(db: Omnigraph, dir_path: &Path) -> (Omnigraph, Arc<PolicyEngine>) {
    let policy_path = dir_path.join("policy.yaml");
    fs::write(&policy_path, POLICY_YAML).unwrap();
-    let engine = PolicyEngine::load(&policy_path, dir_path.to_str().unwrap()).unwrap();
+    let engine = PolicyEngine::load_graph(&policy_path, dir_path.to_str().unwrap()).unwrap();
    let engine = Arc::new(engine);
    let db = db.with_policy(Arc::clone(&engine) as Arc<dyn PolicyChecker>);
    (db, engine)
@ -238,7 +241,12 @@ async fn load_as_denies_when_policy_rejects_actor() {
    let (db, _engine) = init_with_policy(&dir).await;

    let result = db
-        .load_as("main", ONE_PERSON_JSONL, LoadMode::Merge, Some("act-denied"))
+        .load_as(
+            "main",
+            ONE_PERSON_JSONL,
+            LoadMode::Merge,
+            Some("act-denied"),
+        )
        .await;
    assert_denied(result, "load_as");
 }
--- a/crates/omnigraph/tests/runs.rs
+++ b/crates/omnigraph/tests/runs.rs
@ -127,10 +127,7 @@ async fn multi_statement_mutation_is_atomic_with_read_your_writes() {
            "main",
            MUTATION_QUERIES,
            "insert_person_and_friend",
-            &mixed_params(
-                &[("$name", "Eve"), ("$friend", "Alice")],
-                &[("$age", 22)],
-            ),
+            &mixed_params(&[("$name", "Eve"), ("$friend", "Alice")], &[("$age", 22)]),
        )
        .await
        .unwrap();
@ -187,10 +184,7 @@ async fn partial_failure_leaves_target_queryable_and_unblocks_next_mutation() {
            "main",
            MUTATION_QUERIES,
            "insert_person_and_friend",
-            &mixed_params(
-                &[("$name", "Eve"), ("$friend", "Missing")],
-                &[("$age", 22)],
-            ),
+            &mixed_params(&[("$name", "Eve"), ("$friend", "Missing")], &[("$age", 22)]),
        )
        .await
        .expect_err("op-2 must fail");
@ -543,10 +537,7 @@ async fn mutation_rejects_mixed_insert_and_delete_at_parse_time() {
            "main",
            STAGED_QUERIES,
            "mixed_insert_and_delete",
-            &mixed_params(
-                &[("$name", "Eve"), ("$victim", "Alice")],
-                &[("$age", 22)],
-            ),
+            &mixed_params(&[("$name", "Eve"), ("$victim", "Alice")], &[("$age", 22)]),
        )
        .await
        .expect_err("D₂ must reject mixed insert+delete");
@ -559,7 +550,9 @@ async fn mutation_rejects_mixed_insert_and_delete_at_parse_time() {
        manifest_err.message,
    );
    assert!(
-        manifest_err.message.contains("split into separate mutations"),
+        manifest_err
+            .message
+            .contains("split into separate mutations"),
        "error message should direct user to split: {}",
        manifest_err.message,
    );
@ -668,11 +661,7 @@ async fn multiple_appends_to_same_edge_coalesce_to_one_append() {
            "main",
            STAGED_QUERIES,
            "insert_two_friends",
-            &params(&[
-                ("$from", "Alice"),
-                ("$a", "Bob"),
-                ("$b", "Eve"),
-            ]),
+            &params(&[("$from", "Alice"), ("$a", "Bob"), ("$b", "Eve")]),
        )
        .await
        .unwrap();
@ -782,8 +771,14 @@ async fn load_with_bad_edge_reference_unblocks_next_load() {
    // No write made it to disk: counts unchanged.
    let mid_persons = count_rows(&db, "node:Person").await;
    let mid_edges = count_rows(&db, "edge:Knows").await;
-    assert_eq!(mid_persons, pre_persons, "failed load must not advance Person count");
-    assert_eq!(mid_edges, pre_edges, "failed load must not advance Knows count");
+    assert_eq!(
+        mid_persons, pre_persons,
+        "failed load must not advance Person count"
+    );
+    assert_eq!(
+        mid_edges, pre_edges,
+        "failed load must not advance Knows count"
+    );

    // Second load against the same tables — succeeds (no HEAD drift).
    let good = r#"{"type": "Person", "data": {"name": "Pat", "age": 55}}"#;
@ -824,7 +819,9 @@ edge WorksAt: Person -> Company @card(0..1)
 {"type": "Company", "data": {"name": "Acme"}}
 {"type": "Company", "data": {"name": "Bigco"}}
 "#;
-    load_jsonl(&mut db, seed, LoadMode::Overwrite).await.unwrap();
+    load_jsonl(&mut db, seed, LoadMode::Overwrite)
+        .await
+        .unwrap();

    let pre_works = count_rows(&db, "edge:WorksAt").await;

@ -1014,7 +1011,10 @@ query cascade_then_explicit($name: String, $other: String) {
    // — Bob→Diana would survive. The exact-count check makes both ops
    // independently observable.
    let pre_knows = count_rows(&db, "edge:Knows").await;
-    assert_eq!(pre_knows, 3, "fixture invariant: TEST_DATA seeds 3 Knows edges");
+    assert_eq!(
+        pre_knows, 3,
+        "fixture invariant: TEST_DATA seeds 3 Knows edges"
+    );

    db.mutate(
        "main",
@ -1066,7 +1066,9 @@ query add_friend($from: String, $to: String) {
    let seed = r#"{"type": "Person", "data": {"name": "Alice"}}
 {"type": "Person", "data": {"name": "Bob"}}
 "#;
-    load_jsonl(&mut db, seed, LoadMode::Overwrite).await.unwrap();
+    load_jsonl(&mut db, seed, LoadMode::Overwrite)
+        .await
+        .unwrap();

    // Single insert: count=1 < min=2 → reject with clear message.
    let err = db
@ -1082,8 +1084,7 @@ query add_friend($from: String, $to: String) {
        panic!("expected Manifest error, got {err:?}");
    };
    assert!(
-        manifest_err.message.contains("@card violation")
-            && manifest_err.message.contains("min 2"),
+        manifest_err.message.contains("@card violation") && manifest_err.message.contains("min 2"),
        "unexpected error: {}",
        manifest_err.message,
    );
@ -1121,7 +1122,9 @@ edge WorksAt: Person -> Company @card(0..1)
 {"type": "Company", "data": {"name": "Bigco"}}
 {"edge": "WorksAt", "from": "Alice", "to": "Acme", "data": {"id": "w1"}}
 "#;
-    load_jsonl(&mut db, seed, LoadMode::Overwrite).await.unwrap();
+    load_jsonl(&mut db, seed, LoadMode::Overwrite)
+        .await
+        .unwrap();

    // Merge-update the same edge id w1 to point at Bigco. Counted naively
    // as union, Alice has 2 WorksAt (committed Acme + pending Bigco) which
@ -1167,7 +1170,9 @@ edge WorksAt: Person -> Company @card(0..1)
 {"type": "Company", "data": {"name": "Acme"}}
 {"type": "Company", "data": {"name": "Bigco"}}
 "#;
-    load_jsonl(&mut db, seed, LoadMode::Overwrite).await.unwrap();
+    load_jsonl(&mut db, seed, LoadMode::Overwrite)
+        .await
+        .unwrap();

    // Merge load with the SAME edge id twice — the second row supersedes
    // the first in the finalize-time dedupe. If pending-counting doesn't
@ -1364,7 +1369,11 @@ query insert_then_update_note(
        )
        .await
        .unwrap();
-    assert_eq!(qr.num_rows(), 0, "letter must not be visible after early error");
+    assert_eq!(
+        qr.num_rows(),
+        0,
+        "letter must not be visible after early error"
+    );
 }

 /// MR-920 regression: two sequential `update T set {f:v} where x=y`
@ -1446,5 +1455,9 @@ async fn second_sequential_update_on_same_row_succeeds() {
            }
        }
    }
-    assert_eq!(alice_age, Some(42), "Alice's age must reflect the second update");
+    assert_eq!(
+        alice_age,
+        Some(42),
+        "Alice's age must reflect the second update"
+    );
 }
--- a/crates/omnigraph/tests/staged_writes.rs
+++ b/crates/omnigraph/tests/staged_writes.rs
@ -132,7 +132,11 @@ async fn stage_merge_insert_dedupes_superseded_committed_fragment() {
        .await
        .unwrap();
    let ids = collect_ids(&batches);
-    assert_eq!(ids, vec!["alice"], "merge_insert must not surface duplicates");
+    assert_eq!(
+        ids,
+        vec!["alice"],
+        "merge_insert must not surface duplicates"
+    );

    // Confirm the visible row is the rewritten one.
    let total: usize = batches.iter().map(|b| b.num_rows()).sum();
@ -382,12 +386,7 @@ async fn scan_with_staged_with_filter_silently_drops_staged_rows() {
    // Actual: dave (staged, age=35) is dropped — only the committed matches
    // come back.
    let batches = store
-        .scan_with_staged(
-            &ds,
-            std::slice::from_ref(&staged),
-            None,
-            Some("age >= 30"),
-        )
+        .scan_with_staged(&ds, std::slice::from_ref(&staged), None, Some("age >= 30"))
        .await
        .unwrap();
    assert_eq!(
@ -403,12 +402,7 @@ async fn scan_with_staged_with_filter_silently_drops_staged_rows() {
    // Without filter, staged data IS visible — confirms the issue is
    // specifically filter pushdown, not fragment scanning per se.
    let unfiltered = store
-        .scan_with_staged(
-            &ds,
-            std::slice::from_ref(&staged),
-            None,
-            None,
-        )
+        .scan_with_staged(&ds, std::slice::from_ref(&staged), None, None)
        .await
        .unwrap();
    assert_eq!(
@ -686,10 +680,7 @@ async fn stage_create_inverted_index_does_not_advance_head_until_commit() {
    .unwrap();
    let pre_version = ds.version().version;

-    let staged = store
-        .stage_create_inverted_index(&ds, "id")
-        .await
-        .unwrap();
+    let staged = store.stage_create_inverted_index(&ds, "id").await.unwrap();
    assert_eq!(
        ds.version().version,
        pre_version,
@ -781,13 +772,9 @@ async fn create_vector_index_advances_head_inline_documents_residual() {
    let id_arr = StringArray::from(ids);
    let flat: Vec<f32> = (0..(n_rows * dim)).map(|i| i as f32).collect();
    let values = arrow_array::Float32Array::from(flat);
-    let vec_arr =
-        FixedSizeListArray::new(item_field, dim as i32, Arc::new(values), None);
-    let batch = RecordBatch::try_new(
-        schema.clone(),
-        vec![Arc::new(id_arr), Arc::new(vec_arr)],
-    )
-    .unwrap();
+    let vec_arr = FixedSizeListArray::new(item_field, dim as i32, Arc::new(values), None);
+    let batch =
+        RecordBatch::try_new(schema.clone(), vec![Arc::new(id_arr), Arc::new(vec_arr)]).unwrap();

    let mut ds = TableStore::write_dataset(&uri, batch).await.unwrap();
    let pre_version = ds.version().version;
--- a/crates/omnigraph/tests/traversal.rs
+++ b/crates/omnigraph/tests/traversal.rs
@ -504,9 +504,21 @@ query fof_chain($name: String) {

    let batch = result.concat_batches().unwrap();
    assert_eq!(batch.num_rows(), 1);
-    let col0 = batch.column(0).as_any().downcast_ref::<StringArray>().unwrap();
-    let col1 = batch.column(1).as_any().downcast_ref::<StringArray>().unwrap();
-    let col2 = batch.column(2).as_any().downcast_ref::<StringArray>().unwrap();
+    let col0 = batch
+        .column(0)
+        .as_any()
+        .downcast_ref::<StringArray>()
+        .unwrap();
+    let col1 = batch
+        .column(1)
+        .as_any()
+        .downcast_ref::<StringArray>()
+        .unwrap();
+    let col2 = batch
+        .column(2)
+        .as_any()
+        .downcast_ref::<StringArray>()
+        .unwrap();
    assert_eq!(col0.value(0), "Alice");
    assert_eq!(col1.value(0), "Bob");
    assert_eq!(col2.value(0), "Diana");
@ -574,8 +586,16 @@ query at_acme_named() {

    let batch = result.concat_batches().unwrap();
    assert_eq!(batch.num_rows(), 1);
-    let person = batch.column(0).as_any().downcast_ref::<StringArray>().unwrap();
-    let company = batch.column(1).as_any().downcast_ref::<StringArray>().unwrap();
+    let person = batch
+        .column(0)
+        .as_any()
+        .downcast_ref::<StringArray>()
+        .unwrap();
+    let company = batch
+        .column(1)
+        .as_any()
+        .downcast_ref::<StringArray>()
+        .unwrap();
    assert_eq!(person.value(0), "Alice");
    assert_eq!(company.value(0), "Acme");
 }
@ -608,8 +628,16 @@ query at_company($company: String) {

    let batch = result.concat_batches().unwrap();
    assert_eq!(batch.num_rows(), 1);
-    let person = batch.column(0).as_any().downcast_ref::<StringArray>().unwrap();
-    let company = batch.column(1).as_any().downcast_ref::<StringArray>().unwrap();
+    let person = batch
+        .column(0)
+        .as_any()
+        .downcast_ref::<StringArray>()
+        .unwrap();
+    let company = batch
+        .column(1)
+        .as_any()
+        .downcast_ref::<StringArray>()
+        .unwrap();
    assert_eq!(person.value(0), "Bob");
    assert_eq!(company.value(0), "Globex");
 }
@ -633,19 +661,22 @@ query fan_out($name: String) {
 "#;
    // Alice knows Bob and Charlie, works at Acme.
    // Each friend paired with her company → 2 rows.
-    let result = query_main(
-        &mut db,
-        queries,
-        "fan_out",
-        &params(&[("$name", "Alice")]),
-    )
-    .await
-    .unwrap();
+    let result = query_main(&mut db, queries, "fan_out", &params(&[("$name", "Alice")]))
+        .await
+        .unwrap();

    let batch = result.concat_batches().unwrap();
    assert_eq!(batch.num_rows(), 2);
-    let friends = batch.column(0).as_any().downcast_ref::<StringArray>().unwrap();
-    let companies = batch.column(1).as_any().downcast_ref::<StringArray>().unwrap();
+    let friends = batch
+        .column(0)
+        .as_any()
+        .downcast_ref::<StringArray>()
+        .unwrap();
+    let companies = batch
+        .column(1)
+        .as_any()
+        .downcast_ref::<StringArray>()
+        .unwrap();

    let mut pairs: Vec<(&str, &str)> = (0..batch.num_rows())
        .map(|i| (friends.value(i), companies.value(i)))
--- a/crates/omnigraph/tests/validators.rs
+++ b/crates/omnigraph/tests/validators.rs
@ -76,7 +76,9 @@ async fn init_with(schema: &str, data: &str) -> (tempfile::TempDir, Omnigraph) {
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, schema).await.unwrap();
    if !data.is_empty() {
-        load_jsonl(&mut db, data, LoadMode::Overwrite).await.unwrap();
+        load_jsonl(&mut db, data, LoadMode::Overwrite)
+            .await
+            .unwrap();
    }
    (dir, db)
 }
--- a/docs/releases/v0.6.0.md
+++ b/docs/releases/v0.6.0.md
@ -1,19 +1,127 @@
 # Omnigraph v0.6.0

+Two pieces of work land in this release:
+
+1. The **graph terminology rename** (renamed `Repo` → `Graph` across the Cedar resource model, policy API, and query-lint schema source).
+2. **Multi-graph server mode** — one `omnigraph-server` process can now serve 1–10 graphs concurrently behind cluster routes (`/graphs/{graph_id}/...`), with per-graph and server-level Cedar policy, read-only `GET /graphs` enumeration, and CLI parity (`omnigraph graphs list`).
+
+Runtime add/remove (`POST /graphs`, `DELETE /graphs/{id}`, `omnigraph graphs create`) is **not** in v0.6.0. Operators add or remove graphs by editing `omnigraph.yaml` and restarting. The first cut of `POST /graphs` shipped behind an atomic-YAML-rewrite design that we pulled before release once its concurrency guarantees were challenged (flock-on-renamed-inode race, duplicate-check outside the critical section, and an init-cleanup path that could destroy an existing graph's schema on re-init). The correct fix is a Lance-style cluster catalog (reserve → init → publish with recovery sidecars); that work is deferred.
+
 ## Breaking Changes

+### Graph terminology rename
+
 - Renamed the Cedar resource entity from `Omnigraph::Repo` to `Omnigraph::Graph`.
- Renamed policy API terminology from `repo_id` to `graph_id` on `PolicyCompiler::compile` and `PolicyEngine::load`.
+- Renamed policy API terminology from `repo_id` to `graph_id` on `PolicyCompiler::compile` (and on the new `PolicyEngine::load_graph` / `PolicyEngine::load_server` loaders described below).
 - Renamed query-lint schema source JSON from `"repo"` to `"graph"` for `schema_source.kind`.

+### Multi-graph server mode
+
+- **Multi-graph deployments lose flat routes.** Single-graph invocation (`omnigraph-server <URI>`) is unchanged — same flat `/snapshot`, `/read`, `/branches`, etc. Multi-graph deployments serve those routes under `/graphs/{graph_id}/...`; bare flat paths return 404 in multi mode.
+- **`ServerConfig` shape change** (programmatic embedders only): `ServerConfig { uri, policy_file }` is replaced by `ServerConfig { mode: ServerConfigMode }`, where `ServerConfigMode = Single { uri, policy_file } | Multi { graphs, config_path, server_policy_file }`. Callers that use `load_server_settings` are unaffected; callers that construct `ServerConfig` directly need to wrap their fields in `ServerConfigMode::Single`.
+- **`AppState`'s routing surface** is `AppState::routing() -> &GraphRouting`, where `GraphRouting = Single { handle } | Multi { registry, config_path }`. The previous `AppState::uri()`, `AppState::mode()`, `AppState::registry()` accessors and the `ServerMode` enum are gone — embedders read `state.routing()` and match on the arm they need. Per-graph URIs live on `handle.uri`.
+- **`AppState::new_multi`** is the new multi-graph constructor. Single-mode `new_*` / `open_*` constructors are unchanged.
+- **`AuthenticatedActor(Arc<str>)` → `ResolvedActor { actor_id, tenant_id, scopes, source }`** (programmatic embedders only). The struct shape changes, but the HTTP contract — bearer auth and the bearer-derived-actor-identity guarantee — is unchanged. Cluster-mode call sites construct with `tenant_id: None`, `scopes: vec![Scope::Full]`, `source: AuthSource::Static`. The new fields are forward-compat seams for future multi-tenant and OAuth deployments; they're inert in this release.
+- **`PolicyEngine::load(path, graph_id)` removed** in favor of two kind-typed loaders: `PolicyEngine::load_graph(path, graph_id)` for per-graph policies and `PolicyEngine::load_server(path)` for server-level policies. Each loader rejects rules whose action `resource_kind()` doesn't match the engine kind — operators who put a `graph_list` rule in a per-graph file (or a `read` rule in a server file) now get a load-time error instead of a silently-never-matching rule.
+- **`PolicyRequest::actor_id` field removed.** Actor identity is now a separate parameter on `PolicyEngine::authorize(actor_id, &request)`. The type system enforces the server-authoritative-actor invariant: actor identity is always sourced from the bearer-token match resolved at the auth boundary; handlers cannot smuggle identity through the request body.
+- **`Omnigraph::init` is strict by default.** Initialization at a URI that already holds schema files now errors with `OmniError::AlreadyInitialized` instead of silently overwriting. Operators who actually want to overwrite use `InitOptions { force: true }` (CLI: `omnigraph init --force`). Closes the destructive-cleanup footgun where a failed re-init would delete an existing graph's schema files.
+- **Top-level `policy.file` is rejected in multi-graph server mode.** It remains valid for single-graph / CLI-local policy. Multi-graph deployments must move graph rules to `graphs.<graph_id>.policy.file` and server-scoped `graph_list` rules to `server.policy.file`.
+- **Open server startup requires explicit opt-in.** A server with no bearer tokens and no policy now refuses to start unless passed `--unauthenticated` or `OMNIGRAPH_UNAUTHENTICATED=1`.
+- **Policy requires bearer tokens.** Configuring any policy file without bearer tokens now refuses startup; otherwise every protected request would 401 before Cedar could evaluate it.
+- **Tokens without policy default-deny non-read actions.** Existing authenticated deployments that relied on writes or admin routes without Cedar policy must add policy rules for those actions.
+- **`GET /graphs` requires `server.policy.file` in every runtime state.** Even `--unauthenticated` mode keeps server topology closed until the operator explicitly authorizes `graph_list`.
+
+## New
+
+- **Multi-graph mode**. Invoke with `omnigraph-server --config omnigraph.yaml` where the YAML has a non-empty `graphs:` map and no single-mode selector (no `server.graph`, no CLI `<URI>` or `--target`). At startup the server opens every configured graph in parallel (bounded concurrency, fail-fast).
+- **`GET /graphs`**. Lists every registered graph, sorted alphabetically by `graph_id`. Auth-required when bearer tokens are configured; Cedar-gated by `PolicyAction::GraphList` against `Omnigraph::Server::"root"`. Returns 405 in single mode. Server-scoped actions require an explicit `server.policy.file` in every runtime state — the management surface is closed by default even in `--unauthenticated` mode so that server topology is never exposed without operator opt-in.
+- **CLI `omnigraph graphs list`**. Mirrors the HTTP surface. Rejects local URI targets with a clear message — for remote multi-graph servers only.
+- **CLI `omnigraph init --force`**. Bypasses the strict-init preflight when an operator deliberately wants to recover from orphan schema files. Does NOT purge existing Lance datasets; recursive deletion needs `StorageAdapter::delete_prefix` (deferred — see below).
+- **Per-graph Cedar policy**. Each entry in the `graphs:` map can carry a `policy.file` path, loaded at startup via `PolicyEngine::load_graph`. Cedar's `Omnigraph::Graph::"<graph_id>"` resource is per-graph; the new `Omnigraph::Server::"root"` resource governs server-level actions.
+- **Server-level Cedar policy**. `server.policy.file` in the config governs the `graph_list` action on `Omnigraph::Server::"root"`. Required to expose `GET /graphs` in every runtime state — without a server policy the default-deny posture rejects `graph_list`, including in `--unauthenticated` mode.
+- **Cedar action vocabulary**: `graph_list` (server-scoped). Runtime `graph_create` / `graph_delete` are reserved but not shipped — see "Deferred."
+- **Canonical graph URI identity.** Server startup normalizes graph root URIs before registry insertion and response output, so aliases such as `/tmp/g`, `/tmp/g/`, and `file:///tmp/g` cannot register as distinct graphs that actually share one Lance root.
+
+## Configuration
+
+`omnigraph.yaml` schema additions (all optional, single-mode unaffected):
+
+```yaml
+server:
+  bind: 0.0.0.0:8080
+  policy:
+    file: ./server-policy.yaml          # server-level Cedar (graph_list)
+
+graphs:
+  alpha:
+    uri: s3://tenant-bucket/alpha
+    policy:
+      file: ./policies/alpha.yaml       # per-graph Cedar
+  beta:
+    uri: s3://tenant-bucket/beta
+    # no per-graph policy → engine-layer enforcement is a no-op
+```
+
+## Deferred
+
+- **`POST /graphs` runtime graph creation** and **CLI `omnigraph graphs create`**. Pulled before release after the YAML-rewrite design's correctness story didn't survive review. A future release will add a managed cluster catalog (Lance-backed reserve → init → publish with recovery sidecars) and re-expose runtime creation on top of it. Until then, operators add graphs by editing `omnigraph.yaml` and restarting.
+- **`DELETE /graphs/{id}`**. Never shipped in v0.6.0; deferred with the same cluster-catalog work.
+- **`StorageAdapter::delete_prefix`**. The substrate primitive a managed catalog would need. Will land alongside runtime mutation.
+- **`omnigraph init --force` purging Lance state.** Today `--force` only bypasses the schema-file preflight; recursive deletion of existing Lance datasets needs `delete_prefix`.
+- **`X-Actor-Id` service delegation forwarding**. Needs durable both-actor audit on `_graph_commits.lance` — out of scope.
+- **Hot policy reload**. Restart is cheap at N≤10 graphs.
+
 ## User Impact

- No on-disk migration is required. Existing `.omni` graphs continue to open with the same storage layout.
- Supported YAML policy authoring is unchanged because the YAML schema does not expose the Cedar entity type name.
- Operators with unsupported raw Cedar policy files should update `Omnigraph::Repo`
-  resource references to `Omnigraph::Graph`.
+- **No on-disk migration is required.** Existing `.omni` graphs from v0.5.0 (and earlier) open cleanly under v0.6.0 — Lance datasets, `__manifest`, `_schema.pg`, `_schema.ir.json`, `__schema_state.json`, `_graph_commits.lance`, `_graph_commit_recoveries.lance` all use unchanged formats. No conversion step.
+- **Existing single-graph storage upgrades without migration.** Server deployments may need auth/policy config changes: explicitly pass `--unauthenticated` for local open mode, configure tokens when using policy, and add Cedar policy for non-read authenticated actions.
+- **Multi-graph adoption is opt-in.** Add a `graphs:` map to `omnigraph.yaml` (and remove `server.graph`) to switch a deployment to multi mode.
+- **Cluster routes are breaking for client SDKs targeting multi mode.** Generated clients from previous v0.5.0 OpenAPI specs will hit 404 on flat paths against a multi-mode server. Regenerate against the v0.6.0 `openapi.json`.
+- **Supported YAML policy authoring is unchanged.** The Cedar `Omnigraph::Graph` and `Omnigraph::Server` entities are internally generated by `compile_policy_source` — operator YAML only references actions and groups.
+- **Operators with unsupported raw Cedar policy files** should update `Omnigraph::Repo` resource references to `Omnigraph::Graph`.
+
+## Migration: single → multi
+
+```yaml
+# Before (v0.5.0 single-mode invocation)
+server:
+  graph: my-graph
+graphs:
+  my-graph:
+    uri: /var/lib/omnigraph/my-graph
+policy:
+  file: ./policy.yaml
+```
+
+```yaml
+# After (v0.6.0 multi-mode — drop `server.graph` and the top-level `policy`)
+server:
+  policy:
+    file: ./server-policy.yaml      # NEW: governs GET /graphs
+graphs:
+  my-graph:
+    uri: /var/lib/omnigraph/my-graph
+    policy:
+      file: ./policy.yaml           # MOVED: was top-level
+```
+
+Same `omnigraph.yaml` file; restart the server. Clients targeting the old flat routes (`/snapshot`, `/read`, …) must update to `/graphs/my-graph/snapshot`, etc.
+
+To add a new graph after rollout: stop the server, append a new `graphs.<id>` entry, restart.

 ## Documentation

 - Public docs, CLI help, examples, server docs, and test helpers now consistently use "graph" for the OmniGraph data artifact.
 - GitHub/source repository terminology remains spelled out as "repository" where needed.
+- New: `docs/user/cli.md` documents `omnigraph graphs list`; `docs/user/server.md` documents the multi-graph mode and the cluster route convention; `docs/user/policy.md` documents the per-graph vs server-scoped action distinction.
+
+## Test coverage
+
+- `GraphId` newtype validation, registry race tests, init failpoints (still reachable from `omnigraph init` CLI).
+- Mode-inference four-rule matrix, parallel multi-graph startup, cluster routing.
+- Cedar `Server` resource refactor, backwards-compat for graph-only policies, kind-alignment rejection (server actions in graph files / vice versa).
+- `GET /graphs` enumeration, 405-in-single-mode, 403-in-Open-mode-without-server-policy, Cedar admin/viewer authorization.
+- Cluster routes with inner path params (`/branches/{branch}`, `/commits/{commit_id}`) deserialize correctly under axum 0.8 nested routing.
+- Policy-requires-tokens startup invariant enforced uniformly across single and multi mode.
+- The bearer-auth-derived-actor-identity regression test (client-supplied identity headers are ignored; the server-resolved actor is the only identity Cedar sees) stays green across the entire refactor.
+</content>
--- a/docs/user/cli.md
+++ b/docs/user/cli.md
@ -44,6 +44,27 @@ omnigraph read \
 If the server requires auth, set `OMNIGRAPH_SERVER_BEARER_TOKEN` on the server
 and configure the matching `bearer_token_env` in `omnigraph.yaml`.

+## Multi-graph servers (v0.6.0+)
+
+Against a multi-graph server (started with `--config omnigraph.yaml` referencing a non-empty `graphs:` map), use `omnigraph graphs list` to enumerate the registered graphs. The server must configure bearer tokens and `server.policy.file` with a rule that allows `graph_list`; `/graphs` is closed by default even when the server runs with `--unauthenticated`.
+
+```bash
+OMNIGRAPH_BEARER_TOKEN=admin-token \
+  omnigraph graphs list --uri http://server.example.com --json
+```
+
+For config-driven clients, set the remote graph's `bearer_token_env` to an environment variable containing a token whose actor is authorized by `server.policy.file`.
+
+`list` rejects local URI targets — it's for remote multi-graph servers only.
+
+Runtime add/remove is **not** in v0.6.0. To add a graph, stop the server, add a `graphs.<id>` entry to `omnigraph.yaml`, then restart. To remove, stop the server, delete the entry, restart.
+
+Per-graph URLs: hit a graph's cluster route from any subcommand by pointing `--uri` at it:
+
+```bash
+omnigraph read --uri http://server.example.com/graphs/beta --query ./q.gq ...
+```
+
 ## Runs, Policy, And Diagnostics

 ```bash
--- a/docs/user/deployment.md
+++ b/docs/user/deployment.md
@ -109,7 +109,8 @@ docker run --rm -p 8080:8080 \

 ## Auth

-The server can run unauthenticated for local development, but any shared or
+The server can run unauthenticated for local development only when explicitly
+started with `--unauthenticated` or `OMNIGRAPH_UNAUTHENTICATED=1`. Any shared or
 internet-facing deployment should set a bearer token source.

 ### Token sources
--- a/docs/user/policy.md
+++ b/docs/user/policy.md
@ -4,6 +4,8 @@ OmniGraph integrates AWS Cedar (`cedar-policy = 4.9`) for ABAC.

 ## Policy actions

+Per-graph actions (bind to `Omnigraph::Graph::"<graph_id>"`):
+
 1. `read` — query / snapshot / list branches & commits
 2. `export` — NDJSON export
 3. `change` — mutations
@ -13,12 +15,57 @@ OmniGraph integrates AWS Cedar (`cedar-policy = 4.9`) for ABAC.
 7. `branch_merge`
 8. `admin` — reserved for policy-management surfaces (hot reload, audit log, approvals). No call site today; see MR-724 for the reservation rationale.

+Server-scoped action (v0.6.0+; binds to `Omnigraph::Server::"root"`):
+
+9. `graph_list` — `GET /graphs` registry enumeration (multi-graph mode)
+
+Server-scoped actions cannot use `branch_scope` or `target_branch_scope` — they operate on the registry, not on a graph's branches. A rule cannot mix server-scoped and per-graph actions; split into separate rules. (Runtime `graph_create` / `graph_delete` are reserved but not shipped in v0.6.0; operators add/remove graphs by editing `omnigraph.yaml` and restarting.)
+
 ## Scope kinds

 - `branch_scope` — applied to source branch (`read`, `export`, `change`)
 - `target_branch_scope` — applied to destination (`schema_apply`, branch ops, run ops)
 - `protected_branches` — named list with special rules; rule scopes are `any | protected | unprotected`

+## Per-graph vs. server-level policy (multi-graph mode)
+
+In multi mode (`omnigraph.yaml` with a non-empty `graphs:` map), policy files attach at two levels:
+
+```yaml
+server:
+  policy:
+    file: ./server-policy.yaml          # server-level: graph_list
+
+graphs:
+  alpha:
+    uri: s3://tenant-bucket/alpha
+    policy:
+      file: ./policies/alpha.yaml       # per-graph: read, change, branch_*, schema_apply
+  beta:
+    uri: s3://tenant-bucket/beta
+    # no per-graph policy → no engine-layer Cedar enforcement on beta
+```
+
+Top-level `policy.file` is single-graph / CLI-local policy only. Multi-graph
+server startup rejects it because applying one graph policy to every configured
+graph is ambiguous. Move per-graph rules to `graphs.<graph_id>.policy.file` and
+move `graph_list` rules to `server.policy.file`.
+
+Each graph's HTTP request flows through its own per-graph policy. The management endpoint (`GET /graphs`) flows through the server-level policy. When `server.policy.file` is unset, `GET /graphs` is denied in every runtime state, including `--unauthenticated`; with bearer tokens configured, it returns 403 after admission control because `graph_list` is not a `read`-equivalent action. The operator must explicitly authorize via `server-policy.yaml` to expose `/graphs`.
+
+Example server-level policy:
+
+```yaml
+version: 1
+groups:
+  admins: [act-andrew]
+rules:
+  - id: admins-can-list-graphs
+    allow:
+      actors: { group: admins }
+      actions: [graph_list]
+```
+
 ## Configuration

 `omnigraph.yaml`:
@ -32,7 +79,7 @@ cli:
  actor: act-andrew            # default actor for CLI direct-engine writes
 ```

-Each rule must use exactly one of `branch_scope` or `target_branch_scope`.
+Each per-graph rule may use at most one of `branch_scope` or `target_branch_scope`. Server-scoped rules (`graph_list`) take neither — they have no branch context.

 `cli.actor` is the default actor identity for CLI direct-engine writes
 when `policy.file` is configured. Override per-invocation with `--as
@ -74,12 +121,13 @@ reaches `authorize_request()` without a matching policy permit.
 |---|---|---|---|
 | **Open** | no | no | Every request is permitted. Refuses to start unless `--unauthenticated` or `OMNIGRAPH_UNAUTHENTICATED=1` is set — the operator must explicitly opt in. |
 | **DefaultDeny** | yes | no | Every authenticated request for an action other than `read` is rejected with HTTP 403. Closes the "tokens but forgot the policy file" trap — an operator who sets up auth and forgot to point at a policy file used to ship the illusion of protection. |
-| **PolicyEnabled** | any | yes | Every request is evaluated by Cedar against the configured policy. |
+| **PolicyEnabled** | yes | yes | Authenticated requests that reach a configured policy engine are evaluated by Cedar. Server-scoped actions still require `server.policy.file`. |

 The classifier is `classify_server_runtime_state` in
 `crates/omnigraph-server/src/lib.rs`; it returns `Err` for the "no
-tokens, no policy, no flag" cell so the server refuses to start instead
-of silently shipping an open instance. Tests pin every cell of the
+tokens, no policy, no flag" cell and for "policy file, no tokens" so the
+server refuses to start instead of silently shipping an open instance or
+a policy-protected server that can only 401. Tests pin every cell of the
 matrix and the State-2 deny path.

 Server-side, `authorize_request()` still runs at the HTTP boundary —
--- a/docs/user/server.md
+++ b/docs/user/server.md
@ -1,26 +1,64 @@
 # HTTP Server (`omnigraph-server`)

-Axum 0.8 + tokio + utoipa-generated OpenAPI. Single graph per process; deploy multiple processes for multi-tenant.
+Axum 0.8 + tokio + utoipa-generated OpenAPI. **Two modes** (v0.6.0+): single-graph (legacy) and multi-graph (MR-668). Mode is inferred from CLI args + config shape.
+
+## Modes
+
+### Single-graph mode (legacy)
+
+`omnigraph-server <URI>` or `omnigraph-server --target <name> --config omnigraph.yaml`. Routes are flat — `/snapshot`, `/read`, `/branches`, etc. Behavior unchanged from v0.6.0.
+
+### Multi-graph mode (v0.6.0+)
+
+`omnigraph-server --config omnigraph.yaml` with a non-empty `graphs:` map and **no** single-mode selector (no `server.graph`, no `<URI>`, no `--target`). The server opens every configured graph in parallel at startup (bounded concurrency = 4, fail-fast on the first open error). Routes are nested under `/graphs/{graph_id}/...`. Bare flat paths return 404 in multi mode.
+
+Mode inference (four-rule matrix):
+
+1. CLI positional `<URI>` → single
+2. CLI `--target <name>` → single
+3. `server.graph` in config → single
+4. `--config` + non-empty `graphs:` + no single-mode selector → **multi**
+5. otherwise → error with migration hint

 ## Endpoint inventory

+Per-graph endpoints — same body shape across modes; URLs differ:
+
+| Method | Single-mode path | Multi-mode path | Auth | Action | Handler |
+|---|---|---|---|---|---|
+| GET | `/healthz` | `/healthz` | none | — | `server_health` |
+| GET | `/openapi.json` | `/openapi.json` | none | — | `server_openapi` (strips security if auth disabled; in multi mode emits cluster paths with `cluster_` operation-id prefix) |
+| GET | `/snapshot?branch=` | `/graphs/{id}/snapshot?branch=` | bearer + `read` | snapshot of branch | `server_snapshot` |
+| POST | `/read` | `/graphs/{id}/read` | bearer + `read` | run named query | `server_read` |
+| POST | `/export` | `/graphs/{id}/export` | bearer + `export` | NDJSON stream | `server_export` |
+| POST | `/change` | `/graphs/{id}/change` | bearer + `change` | mutation | `server_change` |
+| GET | `/schema` | `/graphs/{id}/schema` | bearer + `read` | get current `.pg` source | `server_schema_get` |
+| POST | `/schema/apply` | `/graphs/{id}/schema/apply` | bearer + `schema_apply` (target=`main`) | migrate | `server_schema_apply` |
+| POST | `/ingest` | `/graphs/{id}/ingest` | bearer + `branch_create` (if new) + `change` | bulk load | `server_ingest` (32 MB body limit) |
+| GET | `/branches` | `/graphs/{id}/branches` | bearer + `read` | list branches | `server_branch_list` |
+| POST | `/branches` | `/graphs/{id}/branches` | bearer + `branch_create` | create | `server_branch_create` |
+| DELETE | `/branches/{branch}` | `/graphs/{id}/branches/{branch}` | bearer + `branch_delete` | delete | `server_branch_delete` |
+| POST | `/branches/merge` | `/graphs/{id}/branches/merge` | bearer + `branch_merge` | merge `source → target` | `server_branch_merge` |
+| GET | `/commits?branch=` | `/graphs/{id}/commits?branch=` | bearer + `read` | list | `server_commit_list` |
+| GET | `/commits/{commit_id}` | `/graphs/{id}/commits/{commit_id}` | bearer + `read` | show | `server_commit_show` |
+
+Server-level management endpoints (v0.6.0+):
+
 | Method | Path | Auth | Action | Handler |
 |---|---|---|---|---|
-| GET | `/healthz` | none | — | `server_health` |
-| GET | `/openapi.json` | none | — | `server_openapi` (strips security if auth disabled) |
-| GET | `/snapshot?branch=` | bearer + `read` | snapshot of branch | `server_snapshot` |
-| POST | `/read` | bearer + `read` | run named query | `server_read` |
-| POST | `/export` | bearer + `export` | NDJSON stream | `server_export` |
-| POST | `/change` | bearer + `change` | mutation | `server_change` |
-| GET | `/schema` | bearer + `read` | get current `.pg` source | `server_schema_get` |
-| POST | `/schema/apply` | bearer + `schema_apply` (target=`main`) | migrate | `server_schema_apply` |
-| POST | `/ingest` | bearer + `branch_create` (if new) + `change` | bulk load | `server_ingest` (32 MB body limit) |
-| GET | `/branches` | bearer + `read` | list branches | `server_branch_list` |
-| POST | `/branches` | bearer + `branch_create` | create | `server_branch_create` |
-| DELETE | `/branches/{branch}` | bearer + `branch_delete` | delete | `server_branch_delete` |
-| POST | `/branches/merge` | bearer + `branch_merge` | merge `source → target` | `server_branch_merge` |
-| GET | `/commits?branch=` | bearer + `read` | list | `server_commit_list` |
-| GET | `/commits/{commit_id}` | bearer + `read` | show | `server_commit_show` |
+| GET | `/graphs` | bearer + `graph_list` on `Server::"root"` | list registered graphs | `server_graphs_list` (405 in single mode) |
+
+## Adding and removing graphs (multi mode)
+
+Runtime add/remove via API is **not** exposed in v0.6.0 — neither
+`POST /graphs` nor `DELETE /graphs/{id}` is implemented. Operators add
+or remove graphs by stopping the server, editing the `graphs:` map in
+`omnigraph.yaml`, then restarting. The server treats `omnigraph.yaml`
+as operator-owned configuration and never writes it.
+
+A future release may introduce a managed registry (Lance-backed,
+catalog-style: reserve → init → publish with recovery sidecars) and
+re-expose runtime mutation on top of it.

 ## Streaming

@ -79,7 +117,10 @@ endpoints (`/snapshot`, `/read`, `/export`, `/branches` GET, `/commits`,
  1. `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` — AWS Secrets Manager (build with `--features aws`)
  2. `OMNIGRAPH_SERVER_BEARER_TOKENS_FILE` or `OMNIGRAPH_SERVER_BEARER_TOKENS_JSON` — JSON `{actor_id: token, …}`
  3. `OMNIGRAPH_SERVER_BEARER_TOKEN` — single legacy token, actor `default`
- If no tokens configured, server runs unauthenticated (local dev) and `/openapi.json` strips the security scheme.
+- If no tokens are configured, startup refuses unless `--unauthenticated` or
+  `OMNIGRAPH_UNAUTHENTICATED=1` explicitly opts into open local-dev mode. A
+  policy file without tokens is also rejected at startup. In open mode
+  `/openapi.json` strips the security scheme.

 See [deployment.md](deployment.md) for token-source operational details.

@ -98,4 +139,4 @@ See [deployment.md](deployment.md) for token-source operational details.
  admission control" above). No global rate limiter is configured;
  add `tower_http::limit` if a graph-wide cap is needed.
 - Pagination — none (commits/branches return everything; export streams).
- Multi-tenant routing — one graph per process.
+- Runtime graph add/remove — edit `omnigraph.yaml` and restart.
--- a/openapi.json
+++ b/openapi.json
@ -585,6 +585,63 @@
        ]
      }
    },
+    "/graphs": {
+      "get": {
+        "tags": [
+          "management"
+        ],
+        "summary": "List every graph currently registered with this server (MR-668).",
+        "description": "Multi-graph mode only. In single mode, the route returns 405 — there's\nno registry to enumerate. Cedar-gated by the server-level policy via\nthe `graph_list` action against `Omnigraph::Server::\"root\"`.\n\nOrder: alphabetical by `graph_id` (server-sorted so clients see\ndeterministic output across requests).",
+        "operationId": "listGraphs",
+        "responses": {
+          "200": {
+            "description": "List of registered graphs",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/GraphListResponse"
+                }
+              }
+            }
+          },
+          "401": {
+            "description": "Unauthorized",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/ErrorOutput"
+                }
+              }
+            }
+          },
+          "403": {
+            "description": "Forbidden",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/ErrorOutput"
+                }
+              }
+            }
+          },
+          "405": {
+            "description": "Method not allowed (single-graph mode)",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/ErrorOutput"
+                }
+              }
+            }
+          }
+        },
+        "security": [
+          {
+            "bearer_token": []
+          }
+        ]
+      }
+    },
    "/healthz": {
      "get": {
        "tags": [
@ -1199,6 +1256,7 @@
          "forbidden",
          "bad_request",
          "not_found",
+          "method_not_allowed",
          "conflict",
          "too_many_requests",
          "internal"
@ -1268,6 +1326,37 @@
          }
        }
      },
+      "GraphInfo": {
+        "type": "object",
+        "description": "One entry in the response from `GET /graphs`. Cluster operators\nconsume this list to discover which graphs the server is currently\nserving. The shape is intentionally minimal — `graph_id` and `uri`\nare the only fields a routing client needs.",
+        "required": [
+          "graph_id",
+          "uri"
+        ],
+        "properties": {
+          "graph_id": {
+            "type": "string"
+          },
+          "uri": {
+            "type": "string"
+          }
+        }
+      },
+      "GraphListResponse": {
+        "type": "object",
+        "description": "Response from `GET /graphs`. Lists every graph registered with the\nserver in alphabetical order by `graph_id` (sorted server-side so\nclients get deterministic output across requests).",
+        "required": [
+          "graphs"
+        ],
+        "properties": {
+          "graphs": {
+            "type": "array",
+            "items": {
+              "$ref": "#/components/schemas/GraphInfo"
+            }
+          }
+        }
+      },
      "HealthOutput": {
        "type": "object",
        "required": [