[codex] fix RFC-011 follow-up regressions (#258)

* fix rfc-011 follow-up regressions

* test(cli): remove served schema-apply tests obsoleted by the cluster 409

This PR disables server-side schema apply for cluster-backed serving (409 →
`omnigraph cluster apply`). Two system_local tests still drove *served* schema
apply against a spawned `--cluster` server and asserted the pre-409 behavior, so
they failed under `cargo test --workspace`:

- `local_cli_schema_apply_enforces_engine_layer_policy`: expected per-actor
  policy outcomes (`denied` or allowed) on the served route; the route now
  409s for every actor before policy runs.
- `local_cli_schema_apply_rejects_stored_query_breakage_before_publish`:
  expected a served apply to reject a stored-query breakage; the route now 409s
  before any apply.

Both exercise a path the PR intentionally removed. Their surviving coverage:

- the 409 itself is pinned by
  `schema_routes::schema_apply_route_refuses_cluster_backed_server_mode`
  (asserts 409 + no mutation);
- stored-query-breakage-before-publish stays covered by
  `schema_routes::schema_apply_route_rejects_stored_query_breakage_before_publish`
  (single-mode);
- engine-layer `schema_apply` Cedar enforcement stays covered by
  `policy_engine_chassis`.

Remove the obsolete served versions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(server): report the cluster-backed schema-apply 409 after the Cedar gate

The 409 ("schema apply is disabled for cluster-backed serving") fired at the top
of `server_schema_apply`, before `authorize_request`. An authenticated-but-
unauthorized actor therefore learned the server is cluster-backed (409) instead
of getting a normal 403 — leaking topology before authorization, against the
same posture that keeps `GET /graphs` default-deny.

Move the 409 below the Cedar gate so the route reports 401 → 403 → 409: an
unauthorized actor gets 403, and only an actor authorized for `schema_apply`
sees the actionable "use `omnigraph cluster apply`" 409. (An open/unauthenticated
server still 409s, as it has no topology to protect.)

Regression: `schema_apply_route_cluster_backed_denies_unauthorized_actor_before_409`
(POLICY_YAML grants no schema_apply → act-ragnor gets 403, not 409). Addresses the
bot-review finding on #258.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Andrew Altshuler 2026-06-16 03:11:43 +03:00 committed by GitHub
parent 9513b076d2
commit b5658dc696
GPG key ID: B5690EEEBB952194
19 changed files with 429 additions and 261 deletions

@@ -53,14 +53,13 @@ pub(crate) async fn server_graphs_list(
 ) -> std::result::Result<Json<GraphListResponse>, ApiError> {
 let registry = &state.routing().registry;
-// Server-level Cedar gate. `state.server_policy` is loaded from
-// `server.policy.file` in `omnigraph.yaml` at startup. When no
-// server policy is configured, `authorize_request_server` falls
-// through to the MR-723 default-deny semantics (every non-Read
-// action denied for an authenticated actor). `GraphList` is not
-// `Read`, so without a server policy the request gets 403 — which
-// is the right default (don't leak the registry until the operator
-// explicitly authorizes it).
+// Server-level Cedar gate. `state.server_policy` is loaded from the
+// cluster-scoped policy bundle at startup. When no server policy is
+// configured, `authorize_request_server` falls through to the MR-723
+// default-deny semantics (every non-Read action denied for an
+// authenticated actor). `GraphList` is not `Read`, so without a server
+// policy the request gets 403 — which is the right default (don't leak
+// the registry until the operator explicitly authorizes it).
 authorize_request(
 actor.as_ref().map(|Extension(actor)| actor),
 state.server_policy.as_deref(),
@@ -360,22 +359,25 @@ pub(crate) fn authorize(
 // runtime state means the docstring contract on
 // `server_graphs_list` ("don't leak the registry until the
 // operator explicitly authorizes it") holds uniformly; the
-// operator's only path to enabling it is configuring an
-// explicit `server.policy.file` in omnigraph.yaml.
+// operator's only path to enabling it is configuring a
+// cluster-scoped policy bundle, applying the cluster, and
+// restarting the server.
 if request.action.resource_kind() == PolicyResourceKind::Server {
 return Ok(Authz::Denied(
-"server-scoped actions require an explicit `server.policy.file` \
-configured in omnigraph.yaml; the management surface is closed \
-by default in every runtime state, including --unauthenticated, \
-so that server topology is never exposed without operator opt-in."
+"server-scoped actions require an explicit cluster policy bundle \
+applied with `omnigraph cluster apply` and served after restart; \
+the management surface is closed by default in every runtime state, \
+including --unauthenticated, so that server topology is never exposed \
+without operator opt-in."
 .to_string(),
 ));
 }
 if actor.is_some() && request.action != PolicyAction::Read {
 return Ok(Authz::Denied(
 "server runs in default-deny mode (bearer tokens configured but no \
-policy file). Only `read` actions are permitted; configure \
-`policy.file` in omnigraph.yaml to enable other actions."
+applied policy bundle). Only `read` actions are permitted; configure \
+a graph or cluster policy bundle in the cluster config, run \
+`omnigraph cluster apply`, and restart the server to enable other actions."
 .to_string(),
 ));
 }
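The two early returns in the `authorize` hunk amount to a small decision table for the no-policy case. The following is a minimal, std-only sketch of that table, not the server's code. `Action`, `ResourceKind`, and `Authz` are hypothetical stand-ins for `PolicyAction`, `PolicyResourceKind`, and `Authz`. Treating only `GraphList` as server-scoped is an assumption. The final fall-through to `Allowed` is also assumed, because the rest of `authorize` is not visible in this diff.

```rust
/// Hypothetical stand-in for `PolicyResourceKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResourceKind {
    Graph,
    Server,
}

/// Hypothetical stand-in for `PolicyAction`; only a few actions are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Read,
    Change,
    SchemaApply,
    GraphList,
}

impl Action {
    fn resource_kind(self) -> ResourceKind {
        match self {
            // Assumption: listing graphs acts on the registry (server) resource.
            Action::GraphList => ResourceKind::Server,
            Action::Read | Action::Change | Action::SchemaApply => ResourceKind::Graph,
        }
    }
}

/// Hypothetical stand-in for `Authz`.
#[derive(Debug, PartialEq, Eq)]
enum Authz {
    Allowed,
    Denied(String),
}

/// Decision when no Cedar policy is loaded; `actor` is `None` for an
/// unauthenticated request.
fn authorize_without_policy(actor: Option<&str>, action: Action) -> Authz {
    // Server-scoped actions stay closed in every runtime state,
    // including an open (unauthenticated) server.
    if action.resource_kind() == ResourceKind::Server {
        return Authz::Denied("server-scoped actions require a cluster policy bundle".to_string());
    }
    // Default-deny: an authenticated actor may only `read`.
    if actor.is_some() && action != Action::Read {
        return Authz::Denied("default-deny mode: only `read` is permitted".to_string());
    }
    // Assumption: anything else falls through to allowed.
    Authz::Allowed
}
```

Under that fall-through assumption, an unauthenticated request for a graph-scoped action passes both checks. This matches the commit message's note that an open server still reaches the schema-apply 409.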
@@ -488,7 +490,7 @@ pub(crate) fn deprecation_headers(successor_link: &'static str) -> [(HeaderName,
 operation_id = "read",
 request_body = ReadRequest,
 responses(
-(status = 200, description = "Query results (response includes `Deprecation: true` + `Link: </query>; rel=\"successor-version\"`)", body = ReadOutput),
+(status = 200, description = "Query results (response includes `Deprecation: true` + `Link: <query>; rel=\"successor-version\"`)", body = ReadOutput),
 (status = 400, description = "Bad request", body = ErrorOutput),
 (status = 401, description = "Unauthorized", body = ErrorOutput),
 (status = 403, description = "Forbidden", body = ErrorOutput),
@@ -502,7 +504,7 @@ pub(crate) fn deprecation_headers(successor_link: &'static str) -> [(HeaderName,
 /// route is kept indefinitely for byte-stable back-compat. New integrations
 /// should target `POST /query`, which has clean field names (`query` /
 /// `name`) and a 400-on-mutation guard. Responses from this route include
-/// `Deprecation: true` and `Link: </query>; rel="successor-version"`
+/// `Deprecation: true` and `Link: <query>; rel="successor-version"`
 /// headers per RFC 9745 / RFC 8288 so SDKs and proxies can surface the
 /// signal.
 pub(crate) async fn server_read(
@@ -522,7 +524,7 @@ pub(crate) async fn server_read(
 )
 .await?;
 Ok((
-deprecation_headers("</query>; rel=\"successor-version\""),
+deprecation_headers("<query>; rel=\"successor-version\""),
 Json(api::read_output(selected_name, &target, result)),
 ))
 }
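For reference, the header pair that the deprecated routes attach can be sketched without `axum` or `http`. This is an illustration under assumptions. The real `deprecation_headers` returns `HeaderName`-based pairs, and its full signature is truncated in the hunk header. This hypothetical version uses string slices instead. `render` is an invented helper that only formats the pairs as header lines.

```rust
/// Hypothetical std-only stand-in for `deprecation_headers`: the caller
/// passes the full `Link` value, e.g. `<query>; rel="successor-version"`.
fn deprecation_headers(successor_link: &'static str) -> [(&'static str, &'static str); 2] {
    [("Deprecation", "true"), ("Link", successor_link)]
}

/// Hypothetical helper: render the pairs as `Name: value` header lines.
fn render(headers: &[(&str, &str)]) -> Vec<String> {
    headers
        .iter()
        .map(|(name, value)| format!("{name}: {value}"))
        .collect()
}
```

Passing the whole `Link` value as a `&'static str`, as the diff does, keeps each route's successor target a compile-time literal.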
@@ -771,7 +773,7 @@ pub(crate) async fn run_query(
 operation_id = "change",
 request_body = ChangeRequest,
 responses(
-(status = 200, description = "Mutation results (response includes `Deprecation: true` + `Link: </mutate>; rel=\"successor-version\"`)", body = ChangeOutput),
+(status = 200, description = "Mutation results (response includes `Deprecation: true` + `Link: <mutate>; rel=\"successor-version\"`)", body = ChangeOutput),
 (status = 400, description = "Bad request", body = ErrorOutput),
 (status = 401, description = "Unauthorized", body = ErrorOutput),
 (status = 403, description = "Forbidden", body = ErrorOutput),
@@ -787,7 +789,7 @@ pub(crate) async fn run_query(
 /// kept indefinitely for back-compat. New integrations should target
 /// `POST /mutate`, which has identical semantics and a name that pairs
 /// cleanly with `POST /query`. Responses from this route include
-/// `Deprecation: true` and `Link: </mutate>; rel="successor-version"`
+/// `Deprecation: true` and `Link: <mutate>; rel="successor-version"`
 /// headers per RFC 9745 / RFC 8288 so SDKs and proxies can surface the
 /// signal.
 pub(crate) async fn server_change(
@@ -808,7 +810,7 @@ pub(crate) async fn server_change(
 )
 .await?;
 Ok((
-deprecation_headers("</mutate>; rel=\"successor-version\""),
+deprecation_headers("<mutate>; rel=\"successor-version\""),
 Json(output),
 ))
 }
@@ -1111,12 +1113,16 @@ pub(crate) async fn server_schema_get(
 (status = 400, description = "Bad request", body = ErrorOutput),
 (status = 401, description = "Unauthorized", body = ErrorOutput),
 (status = 403, description = "Forbidden", body = ErrorOutput),
+(status = 409, description = "Schema apply is disabled for cluster-backed serving; use `omnigraph cluster apply` and restart", body = ErrorOutput),
 (status = 429, description = "Per-actor admission cap exceeded; honor `Retry-After` header", body = ErrorOutput),
 ),
 security(("bearer_token" = [])),
 )]
 /// Apply a schema migration.
 ///
+/// Cluster-backed servers reject this route with `409 Conflict`; operators
+/// must apply schema changes through `omnigraph cluster apply` and restart.
+///
 /// Diffs `schema_source` against the current schema and applies the resulting
 /// migration steps (add/drop type, add/drop column, etc.). **Destructive**:
 /// some steps drop data. Returns the list of steps applied; if `applied` is
@@ -1143,6 +1149,17 @@ pub(crate) async fn server_schema_apply(
 target_branch: Some("main".to_string()),
 },
 )?;
+// Disable HTTP schema apply on cluster-backed serving AFTER the Cedar gate,
+// so an unauthorized actor gets a 403 (not a 409 that would disclose the
+// server is cluster-backed): 401 → 403 → 409, never leak topology before
+// authorization. An authorized actor gets the actionable 409 signpost.
+if state.routing().config_path.is_some() {
+return Err(ApiError::conflict(
+"server-side schema apply is disabled for cluster-backed serving; \
+update the cluster config, run `omnigraph cluster apply`, and restart \
+the server.",
+));
+}
 let est_bytes = request.schema_source.len() as u64;
 let _admission = state
 .workload
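The 401 → 403 → 409 ordering described in the new comment can be modeled as a pure function. This is a minimal sketch that assumes a simplified request context. `SchemaApplyCtx`, `Outcome`, and `schema_apply_gate` are hypothetical names. `cluster_backed` stands in for `state.routing().config_path.is_some()`. In the real server, the 401 may come from authentication middleware rather than from the handler itself.

```rust
/// Hypothetical result of the schema-apply gate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Unauthorized,    // 401
    Forbidden,       // 403
    ClusterConflict, // 409: "use `omnigraph cluster apply`"
    Proceed,         // continue to admission control and the migration
}

/// Hypothetical, simplified view of what the handler knows about a request.
struct SchemaApplyCtx {
    /// `false` when a bearer token is required but missing or invalid.
    authenticated: bool,
    /// Outcome of the Cedar `schema_apply` check.
    authorized: bool,
    /// Stand-in for `state.routing().config_path.is_some()`.
    cluster_backed: bool,
}

fn schema_apply_gate(ctx: &SchemaApplyCtx) -> Outcome {
    if !ctx.authenticated {
        return Outcome::Unauthorized;
    }
    // The Cedar gate runs before the topology check, so an actor without
    // `schema_apply` permission cannot tell that the server is cluster-backed.
    if !ctx.authorized {
        return Outcome::Forbidden;
    }
    if ctx.cluster_backed {
        return Outcome::ClusterConflict;
    }
    Outcome::Proceed
}

impl Outcome {
    /// HTTP status for the refusal cases; `None` means the request proceeds.
    fn status(self) -> Option<u16> {
        match self {
            Outcome::Unauthorized => Some(401),
            Outcome::Forbidden => Some(403),
            Outcome::ClusterConflict => Some(409),
            Outcome::Proceed => None,
        }
    }
}
```

Consider an actor with no `schema_apply` grant on a cluster-backed server. This function gives that actor `Some(403)`, which is the behavior the regression test `schema_apply_route_cluster_backed_denies_unauthorized_actor_before_409` pins.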
@@ -1324,7 +1341,7 @@ pub(crate) async fn server_load(
 operation_id = "ingest",
 request_body = IngestRequest,
 responses(
-(status = 200, description = "Load results (response includes `Deprecation: true` + `Link: </load>; rel=\"successor-version\"`)", body = IngestOutput),
+(status = 200, description = "Load results (response includes `Deprecation: true` + `Link: <load>; rel=\"successor-version\"`)", body = IngestOutput),
 (status = 400, description = "Bad request", body = ErrorOutput),
 (status = 401, description = "Unauthorized", body = ErrorOutput),
 (status = 403, description = "Forbidden", body = ErrorOutput),
@@ -1338,7 +1355,7 @@ pub(crate) async fn server_load(
 /// Bulk-load NDJSON data into a branch. Behavior is unchanged; the route is
 /// kept indefinitely for back-compat. New integrations should target
 /// `POST /load`, which has identical semantics. Responses from this route
-/// include `Deprecation: true` and `Link: </load>; rel="successor-version"`
+/// include `Deprecation: true` and `Link: <load>; rel="successor-version"`
 /// headers per RFC 9745 / RFC 8288 so SDKs and proxies can surface the signal.
 pub(crate) async fn server_ingest(
 State(state): State<AppState>,
@@ -1354,7 +1371,7 @@ pub(crate) async fn server_ingest(
 )
 .await?;
 Ok((
-deprecation_headers("</load>; rel=\"successor-version\""),
+deprecation_headers("<load>; rel=\"successor-version\""),
 Json(output),
 ))
 }
@@ -1738,4 +1755,3 @@ pub(crate) fn query_params_from_json(
 json_params_to_param_map(params_json, query_params, JsonParamMode::Standard)
 .map_err(|err| color_eyre::eyre::eyre!(err.to_string()))
 }

@@ -191,10 +191,10 @@ pub enum ServerConfigMode {
 },
 }
-/// Where a Cedar policy bundle comes from at startup. File-based for
-/// omnigraph.yaml deployments; inline (digest-verified catalog content)
-/// for cluster-mode boots, where the catalog may live on object storage
-/// and the server must not re-read mutable state after the snapshot.
+/// Where a Cedar policy bundle comes from at startup. Cluster-local files are
+/// used during config application; inline digest-verified catalog content is
+/// used for serving, where the catalog may live on object storage and the
+/// server must not re-read mutable state after the snapshot.
 #[derive(Debug, Clone)]
 pub enum PolicySource {
 File(PathBuf),
@@ -249,12 +249,10 @@ pub struct AppState {
 /// see MR-668 decision Q6.
 workload: Arc<workload::WorkloadController>,
 bearer_tokens: Arc<[(BearerTokenHash, Arc<str>)]>,
-/// Server-level Cedar policy. Used by management endpoints (`POST
-/// /graphs`, `GET /graphs`) which act on the registry resource,
-/// not on a per-graph resource. Loaded from `server.policy.file`
-/// in `omnigraph.yaml`. `None` outside multi mode and when no
-/// server policy is configured. Per-graph policies live on each
-/// `GraphHandle.policy`.
+/// Server-level Cedar policy. Used by management endpoints (`GET
+/// /graphs`) which act on the registry resource, not on a per-graph
+/// resource. Loaded from the cluster-scoped policy binding when
+/// configured. Per-graph policies live on each `GraphHandle.policy`.
 server_policy: Option<Arc<PolicyEngine>>,
 }
@@ -534,12 +532,11 @@ impl AppState {
 }
 /// Multi-mode constructor — used by the startup loop. Operators
-/// reach this by invoking `omnigraph-server --config omnigraph.yaml`
-/// with a non-empty `graphs:` map.
+/// reach this by invoking `omnigraph-server --cluster <dir|s3://...>`.
 ///
 /// Caller supplies the already-opened `GraphHandle`s and (optionally)
-/// the path to the source config file. `server_policy` is loaded
-/// from `server.policy.file` if configured.
+/// the path to the source cluster. `server_policy` is loaded from the
+/// cluster-scoped policy binding if configured.
 pub fn new_multi(
 handles: Vec<Arc<GraphHandle>>,
 bearer_tokens: Vec<(String, String)>,
@@ -993,7 +990,8 @@ pub async fn serve(config: ServerConfig) -> Result<()> {
 ServerRuntimeState::DefaultDeny => warn!(
 "bearer tokens are configured but no policy file is set — running in \
 default-deny mode (only `read` actions are permitted for authenticated \
-actors). Configure `policy.file` in omnigraph.yaml to enable Cedar rules."
+actors). Configure a graph or cluster policy bundle in the cluster config, \
+run `omnigraph cluster apply`, and restart to enable Cedar rules."
 ),
 ServerRuntimeState::PolicyEnabled => {}
 }
@@ -1123,5 +1121,3 @@ async fn shutdown_signal() {
 }
 info!("shutdown signal received");
 }

@@ -1,14 +1,13 @@
-//! Server settings: omnigraph.yaml/CLI/env resolution, mode inference
-//! (single vs multi vs cluster), bearer-token sources, and runtime-state
-//! classification (moved verbatim from lib.rs in the modularization).
+//! Server settings: cluster/CLI/env resolution, bearer-token sources, and
+//! runtime-state classification (moved verbatim from lib.rs in the
+//! modularization).
 use super::*;
 /// Build serving settings from a cluster directory's applied revision
 /// (RFC-005 §D2): graphs at derived roots, stored queries from verified
 /// catalog blob content, policy bundles from blob paths with their applied
-/// bindings. Always multi-graph routing. The unauthenticated/env handling
-/// matches the omnigraph.yaml path.
+/// bindings. Always multi-graph routing.
 pub(crate) async fn load_cluster_settings(
 cluster_dir: &PathBuf,
 cli_bind: Option<String>,
@@ -189,7 +188,8 @@ pub fn classify_server_runtime_state(
 "server has no bearer tokens and no policy file configured. This is a fully \
 open server; pass `--unauthenticated` (or set OMNIGRAPH_UNAUTHENTICATED=1) \
 if you actually want that, otherwise configure bearer tokens (see \
-docs/user/operations/server.md) and/or `policy.file` in omnigraph.yaml."
+docs/user/operations/server.md) and a graph or cluster policy bundle in \
+the cluster config, then run `omnigraph cluster apply` and restart."
 ),
 (false, false, true) => Ok(ServerRuntimeState::Open),
 (true, false, _) => Ok(ServerRuntimeState::DefaultDeny),
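The visible match arms suggest that `classify_server_runtime_state` classifies three flags. The sketch below rests on several assumptions, stated here up front. The tuple is taken to be `(has_bearer_tokens, has_policy, unauthenticated)`. The `(false, false, false)` error arm is inferred from the error message in this hunk. Mapping every configuration with a policy to `PolicyEnabled` is a guess, because those arms are not shown. `classify` is a hypothetical name, and its error text is abridged.

```rust
/// Variant names mirror those seen in the diff; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServerRuntimeState {
    Open,
    DefaultDeny,
    PolicyEnabled,
}

/// Hypothetical classifier over (has_bearer_tokens, has_policy, unauthenticated).
fn classify(
    has_bearer_tokens: bool,
    has_policy: bool,
    unauthenticated: bool,
) -> Result<ServerRuntimeState, String> {
    match (has_bearer_tokens, has_policy, unauthenticated) {
        // No tokens, no policy, no explicit opt-in: refuse to run fully open.
        (false, false, false) => Err(
            "no bearer tokens and no policy configured; pass --unauthenticated \
             or configure tokens and a policy bundle"
                .to_string(),
        ),
        // Explicit opt-in to an open server.
        (false, false, true) => Ok(ServerRuntimeState::Open),
        // Tokens but no policy: authenticated actors may only `read`.
        (true, false, _) => Ok(ServerRuntimeState::DefaultDeny),
        // Assumption: any applied policy enables Cedar evaluation.
        (_, true, _) => Ok(ServerRuntimeState::PolicyEnabled),
    }
}
```

Matching on the full tuple lets the compiler check that all eight flag combinations are handled. This is one reason a table-style `match` suits this kind of startup classification.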