fix: close validated init and multi-graph gaps

This commit is contained in:
Ragnor Comerford 2026-05-28 15:41:04 +02:00
parent 37ec7373f5
commit eab99e6f48
No known key found for this signature in database
45 changed files with 1058 additions and 454 deletions

View file

@ -25,6 +25,11 @@ Runtime add/remove (`POST /graphs`, `DELETE /graphs/{id}`, `omnigraph graphs cre
- **`PolicyEngine::load(path, graph_id)` removed** in favor of two kind-typed loaders: `PolicyEngine::load_graph(path, graph_id)` for per-graph policies and `PolicyEngine::load_server(path)` for server-level policies. Each loader rejects rules whose action `resource_kind()` doesn't match the engine kind — operators who put a `graph_list` rule in a per-graph file (or a `read` rule in a server file) now get a load-time error instead of a silently-never-matching rule.
- **`PolicyRequest::actor_id` field removed.** Actor identity is now a separate parameter on `PolicyEngine::authorize(actor_id, &request)`. The type system enforces the server-authoritative-actor invariant: actor identity is always sourced from the bearer-token match resolved at the auth boundary; handlers cannot smuggle identity through the request body.
- **`Omnigraph::init` is strict by default.** Initialization at a URI that already holds schema files now errors with `OmniError::AlreadyInitialized` instead of silently overwriting. Operators who actually want to overwrite use `InitOptions { force: true }` (CLI: `omnigraph init --force`). Closes the destructive-cleanup footgun where a failed re-init would delete an existing graph's schema files.
- **Top-level `policy.file` is rejected in multi-graph server mode.** It remains valid for single-graph / CLI-local policy. Multi-graph deployments must move graph rules to `graphs.<graph_id>.policy.file` and server-scoped `graph_list` rules to `server.policy.file`.
- **Open server startup requires explicit opt-in.** A server with no bearer tokens and no policy now refuses to start unless passed `--unauthenticated` or `OMNIGRAPH_UNAUTHENTICATED=1`.
- **Policy requires bearer tokens.** Configuring any policy file without bearer tokens now refuses startup; otherwise every protected request would 401 before Cedar could evaluate it.
- **Tokens without policy default-deny non-read actions.** Existing authenticated deployments that relied on writes or admin routes without Cedar policy must add policy rules for those actions.
- **`GET /graphs` requires `server.policy.file` in every runtime state.** Even `--unauthenticated` mode keeps server topology closed until the operator explicitly authorizes `graph_list`.
## New
@ -33,9 +38,9 @@ Runtime add/remove (`POST /graphs`, `DELETE /graphs/{id}`, `omnigraph graphs cre
- **CLI `omnigraph graphs list`**. Mirrors the HTTP surface. Rejects local URI targets with a clear message — for remote multi-graph servers only.
- **CLI `omnigraph init --force`**. Bypasses the strict-init preflight when an operator deliberately wants to recover from orphan schema files. Does NOT purge existing Lance datasets; recursive deletion needs `StorageAdapter::delete_prefix` (deferred — see below).
- **Per-graph Cedar policy**. Each entry in the `graphs:` map can carry a `policy.file` path, loaded at startup via `PolicyEngine::load_graph`. Cedar's `Omnigraph::Graph::"<graph_id>"` resource is per-graph; the new `Omnigraph::Server::"root"` resource governs server-level actions.
- **Server-level Cedar policy**. `server.policy.file` in the config governs the `graph_list` action on `Omnigraph::Server::"root"`. Required to expose `GET /graphs` once bearer tokens are configured — without a server policy the default-deny posture rejects `graph_list` as a non-`read` action.
- **Server-level Cedar policy**. `server.policy.file` in the config governs the `graph_list` action on `Omnigraph::Server::"root"`. Required to expose `GET /graphs` in every runtime state — without a server policy the default-deny posture rejects `graph_list`, including in `--unauthenticated` mode.
- **Cedar action vocabulary**: `graph_list` (server-scoped). Runtime `graph_create` / `graph_delete` are reserved but not shipped — see "Deferred."
- **Startup invariant: policy requires tokens.** Configuring any policy file (per-graph or server-level) without bearer tokens now refuses to boot — `serve()` would otherwise start a server that 401s every request. The check lives in `classify_server_runtime_state` and applies uniformly to single and multi mode.
- **Canonical graph URI identity.** Server startup normalizes graph root URIs before registry insertion and response output, so aliases such as `/tmp/g`, `/tmp/g/`, and `file:///tmp/g` cannot register as distinct graphs that actually share one Lance root.
## Configuration
@ -69,7 +74,7 @@ graphs:
## User Impact
- **No on-disk migration is required.** Existing `.omni` graphs from v0.5.0 (and earlier) open cleanly under v0.6.0 — Lance datasets, `__manifest`, `_schema.pg`, `_schema.ir.json`, `__schema_state.json`, `_graph_commits.lance`, `_graph_commit_recoveries.lance` all use unchanged formats. No conversion step.
- **Existing single-graph deployments upgrade with zero changes.** `omnigraph-server <URI>` with v0.5.0 config keeps working identically.
- **Existing single-graph storage upgrades without migration.** Server deployments may need auth/policy config changes: explicitly pass `--unauthenticated` for local open mode, configure tokens when using policy, and add Cedar policy for non-read authenticated actions.
- **Multi-graph adoption is opt-in.** Add a `graphs:` map to `omnigraph.yaml` (and remove `server.graph`) to switch a deployment to multi mode.
- **Cluster routes are breaking for client SDKs targeting multi mode.** Generated clients from previous v0.5.0 OpenAPI specs will hit 404 on flat paths against a multi-mode server. Regenerate against the v0.6.0 `openapi.json`.
- **Supported YAML policy authoring is unchanged.** The Cedar `Omnigraph::Graph` and `Omnigraph::Server` entities are internally generated by `compile_policy_source` — operator YAML only references actions and groups.

View file

@ -46,12 +46,15 @@ and configure the matching `bearer_token_env` in `omnigraph.yaml`.
## Multi-graph servers (v0.6.0+)
Against a multi-graph server (started with `--config omnigraph.yaml` referencing a non-empty `graphs:` map), use `omnigraph graphs list` to enumerate the registered graphs:
Against a multi-graph server (started with `--config omnigraph.yaml` referencing a non-empty `graphs:` map), use `omnigraph graphs list` to enumerate the registered graphs. The server must configure bearer tokens and `server.policy.file` with a rule that allows `graph_list`; `/graphs` is closed by default even when the server runs with `--unauthenticated`.
```bash
omnigraph graphs list --uri http://server.example.com --json
OMNIGRAPH_BEARER_TOKEN=admin-token \
omnigraph graphs list --uri http://server.example.com --json
```
For config-driven clients, set the remote graph's `bearer_token_env` to an environment variable containing a token whose actor is authorized by `server.policy.file`.
`list` rejects local URI targets — it's for remote multi-graph servers only.
Runtime add/remove is **not** in v0.6.0. To add a graph, stop the server, add a `graphs.<id>` entry to `omnigraph.yaml`, then restart. To remove, stop the server, delete the entry, restart.

View file

@ -109,7 +109,8 @@ docker run --rm -p 8080:8080 \
## Auth
The server can run unauthenticated for local development, but any shared or
The server can run unauthenticated for local development only when explicitly
started with `--unauthenticated` or `OMNIGRAPH_UNAUTHENTICATED=1`. Any shared or
internet-facing deployment should set a bearer token source.
### Token sources

View file

@ -46,7 +46,12 @@ graphs:
# no per-graph policy → no engine-layer Cedar enforcement on beta
```
Each graph's HTTP request flows through its own per-graph policy. The management endpoint (`GET /graphs`) flows through the server-level policy. When `server.policy.file` is unset and bearer tokens are configured, `GET /graphs` falls through to MR-723 default-deny (only `read`-equivalent actions allowed for authenticated actors — and `graph_list` is not `read`) → 403. So the operator must explicitly authorize via `server-policy.yaml` to expose `/graphs`.
Top-level `policy.file` is single-graph / CLI-local policy only. Multi-graph
server startup rejects it because applying one graph policy to every configured
graph is ambiguous. Move per-graph rules to `graphs.<graph_id>.policy.file` and
move `graph_list` rules to `server.policy.file`.
Each graph's HTTP request flows through its own per-graph policy. The management endpoint (`GET /graphs`) flows through the server-level policy. When `server.policy.file` is unset, `GET /graphs` is denied in every runtime state, including `--unauthenticated`; with bearer tokens configured, it returns 403 after admission control because `graph_list` is not a `read`-equivalent action. The operator must explicitly authorize via `server-policy.yaml` to expose `/graphs`.
Example server-level policy:
@ -116,12 +121,13 @@ reaches `authorize_request()` without a matching policy permit.
|---|---|---|---|
| **Open** | no | no | Every request is permitted. Refuses to start unless `--unauthenticated` or `OMNIGRAPH_UNAUTHENTICATED=1` is set — the operator must explicitly opt in. |
| **DefaultDeny** | yes | no | Every authenticated request for an action other than `read` is rejected with HTTP 403. Closes the "tokens but forgot the policy file" trap — an operator who sets up auth and forgot to point at a policy file used to ship the illusion of protection. |
| **PolicyEnabled** | any | yes | Every request is evaluated by Cedar against the configured policy. |
| **PolicyEnabled** | yes | yes | Every request is evaluated by Cedar against the configured policy. |
The classifier is `classify_server_runtime_state` in
`crates/omnigraph-server/src/lib.rs`; it returns `Err` for the "no
tokens, no policy, no flag" cell so the server refuses to start instead
of silently shipping an open instance. Tests pin every cell of the
tokens, no policy, no flag" cell and for "policy file, no tokens" so the
server refuses to start instead of silently shipping an open instance or
a policy-protected server that can only 401. Tests pin every cell of the
matrix and the State-2 deny path.
Server-side, `authorize_request()` still runs at the HTTP boundary —

View file

@ -117,7 +117,9 @@ endpoints (`/snapshot`, `/read`, `/export`, `/branches` GET, `/commits`,
1. `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` — AWS Secrets Manager (build with `--features aws`)
2. `OMNIGRAPH_SERVER_BEARER_TOKENS_FILE` or `OMNIGRAPH_SERVER_BEARER_TOKENS_JSON` — JSON `{actor_id: token, …}`
3. `OMNIGRAPH_SERVER_BEARER_TOKEN` — single legacy token, actor `default`
- If no tokens configured, server runs unauthenticated (local dev) and `/openapi.json` strips the security scheme.
- If no tokens and no policy are configured, startup refuses unless
`--unauthenticated` or `OMNIGRAPH_UNAUTHENTICATED=1` explicitly opts into
open local-dev mode. In that mode `/openapi.json` strips the security scheme.
See [deployment.md](deployment.md) for token-source operational details.
@ -136,4 +138,4 @@ See [deployment.md](deployment.md) for token-source operational details.
admission control" above). No global rate limiter is configured;
add `tower_http::limit` if a graph-wide cap is needed.
- Pagination — none (commits/branches return everything; export streams).
- Multi-tenant routing — one graph per process.
- Runtime graph add/remove — edit `omnigraph.yaml` and restart.