PR 9 — the final integration PR for MR-668 multi-graph server work.
Closes the v0.7.0 release.
Composite lifecycle tests (closes gaps flagged in PR 7's coverage
review):
- `multi_graph_lifecycle_post_query_restart_persistence` — POST a
graph, query it via cluster route, reload the config from disk
and confirm `load_server_settings` sees the rewritten YAML.
Validates the "restart resolves orphans" failure-mode story.
- `per_graph_policy_enforced_on_post_created_graph` — POST a graph
with a per-graph policy attached, then send authenticated read
and change requests. Per-graph Cedar enforcement fires correctly
on a POST-created graph (engine-layer policy reinstalled via
`Omnigraph::with_policy` inside the create flow).
- `concurrent_post_graphs_distinct_ids_all_succeed` — 4 concurrent
POSTs with distinct graph_ids all return 201. Caught a real
race in `rewrite_atomic` (see below).
Race fix — `rewrite_atomic_with_modify`:
The first composite test surfaced a real bug. The old
`rewrite_atomic(path, new_config, expected_hash)` captured the
baseline hash OUTSIDE the flock, then called rewrite_atomic which
re-acquired it inside. Under concurrent writers:
- POST A: captures baseline H0, calls rewrite_atomic.
- POST B: captures baseline H0 too (before A's update lands).
- A: acquires flock, on-disk == H0, writes H1, releases.
- A: updates baseline H0 → H1.
- B: tries to acquire flock — waits.
- B: acquires flock. On-disk is now H1. Expected (captured
before A finished) is H0. MISMATCH → spurious Drift error.
Worse: even if the timing happens to align, B's `updated` config
was constructed from BYTES read before the flock. B writes a config
that doesn't include A's new graph — silent data loss.
The fix: new `config::rewrite_atomic_with_modify(path, baseline,
modify)` takes a closure. Inside the flock + baseline mutex:
1. Read on-disk bytes, hash, compare to baseline.
2. Parse on-disk YAML.
3. Call `modify(parsed)` to produce the new config — receives
fresh on-disk state, returns the modification.
4. Serialize + write + fsync + rename + update baseline.
Everything is read-modify-write under the same critical section.
Concurrent writers serialize cleanly. Test confirmed this is no
longer a race.
The old `rewrite_atomic(path, new_config, expected_hash)` API stays
for tests that don't need the read-modify-write shape; the POST
handler switches to the new shape.
Version bump v0.6.0 → v0.7.0:
- All 5 `crates/*/Cargo.toml` (compiler, engine, policy, cli, server)
plus their inter-crate `path` dep version constraints.
- `Cargo.lock` regenerated by `cargo build --workspace`.
- `AGENTS.md` "Version surveyed" line, capability matrix HTTP-server
row updated to mention multi-graph + cluster routes + atomic YAML
rewrite.
- `openapi.json` regenerated.
Docs:
- `docs/releases/v0.7.0.md` (new) — release notes with breaking
changes, new features, deferred items (DELETE, `delete_prefix`,
actor forwarding), and the single→multi migration recipe.
- `docs/user/server.md` — substantial section additions for the
two modes, mode inference, cluster endpoint table, management
endpoints, `omnigraph.yaml` ownership contract, `POST /graphs`
body shape + status codes.
- `docs/user/cli.md` — `omnigraph graphs list/create` section,
deferred-DELETE note.
- `docs/user/policy.md` — server-scoped Cedar actions
(`graph_create`, `graph_list`), per-graph vs server-level policy
composition, example server-level policy.
Workspace test pass: 573 tests green across all crates. Zero
failures. MR-731 spoof regression still pinned and passing across
the entire 10-PR series.
This commit closes MR-668. v0.7.0 is ready for tagging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8.2 KiB
HTTP Server (omnigraph-server)
Axum 0.8 + tokio + utoipa-generated OpenAPI. Two modes (v0.7.0+): single-graph (legacy) and multi-graph (MR-668). Mode is inferred from CLI args + config shape.
Modes
Single-graph mode (legacy)
omnigraph-server <URI> or omnigraph-server --target <name> --config omnigraph.yaml. Routes are flat — /snapshot, /read, /branches, etc. Behavior unchanged from v0.6.0.
Multi-graph mode (v0.7.0+)
omnigraph-server --config omnigraph.yaml with a non-empty graphs: map and no single-mode selector (no server.graph, no <URI>, no --target). The server opens every configured graph in parallel at startup (bounded concurrency = 4, fail-fast on the first open error). Routes are nested under /graphs/{graph_id}/.... Bare flat paths return 404 in multi mode.
Mode inference (four-rule matrix):
- CLI positional
<URI>→ single - CLI
--target <name>→ single server.graphin config → single--config+ non-emptygraphs:+ no single-mode selector → multi- otherwise → error with migration hint
Endpoint inventory
Per-graph endpoints — same body shape across modes; URLs differ:
| Method | Single-mode path | Multi-mode path | Auth | Action | Handler |
|---|---|---|---|---|---|
| GET | /healthz |
/healthz |
none | — | server_health |
| GET | /openapi.json |
/openapi.json |
none | — | server_openapi (strips security if auth disabled; in multi mode emits cluster paths with cluster_ operation-id prefix) |
| GET | /snapshot?branch= |
/graphs/{id}/snapshot?branch= |
bearer + read |
snapshot of branch | server_snapshot |
| POST | /read |
/graphs/{id}/read |
bearer + read |
run named query | server_read |
| POST | /export |
/graphs/{id}/export |
bearer + export |
NDJSON stream | server_export |
| POST | /change |
/graphs/{id}/change |
bearer + change |
mutation | server_change |
| GET | /schema |
/graphs/{id}/schema |
bearer + read |
get current .pg source |
server_schema_get |
| POST | /schema/apply |
/graphs/{id}/schema/apply |
bearer + schema_apply (target=main) |
migrate | server_schema_apply |
| POST | /ingest |
/graphs/{id}/ingest |
bearer + branch_create (if new) + change |
bulk load | server_ingest (32 MB body limit) |
| GET | /branches |
/graphs/{id}/branches |
bearer + read |
list branches | server_branch_list |
| POST | /branches |
/graphs/{id}/branches |
bearer + branch_create |
create | server_branch_create |
| DELETE | /branches/{branch} |
/graphs/{id}/branches/{branch} |
bearer + branch_delete |
delete | server_branch_delete |
| POST | /branches/merge |
/graphs/{id}/branches/merge |
bearer + branch_merge |
merge source → target |
server_branch_merge |
| GET | /commits?branch= |
/graphs/{id}/commits?branch= |
bearer + read |
list | server_commit_list |
| GET | /commits/{commit_id} |
/graphs/{id}/commits/{commit_id} |
bearer + read |
show | server_commit_show |
Server-level management endpoints (v0.7.0+):
| Method | Path | Auth | Action | Handler |
|---|---|---|---|---|
| GET | /graphs |
bearer + graph_list on Server::"root" |
list registered graphs | server_graphs_list (405 in single mode) |
| POST | /graphs |
bearer + graph_create on Server::"root" |
create new graph at runtime | server_graphs_create (405 in single mode, 32 MB body limit) |
DELETE /graphs/{id} is not in v0.7.0. Operators remove graphs by stopping the server, editing omnigraph.yaml, then restarting.
omnigraph.yaml ownership (multi mode)
The server owns omnigraph.yaml while running. POST /graphs rewrites the file atomically under an exclusive fcntl::flock with SHA-256 drift detection:
- The server hashes the file at startup.
POST /graphsre-hashes under the flock before rewriting. If the hash doesn't match (operator hand-edited), the rewrite refuses with 503. - Comments and blank-line structure are not preserved across server-side rewrites — the file is regenerated via
serde_yaml::to_string. - Operators must not edit the file while the server is running. To make offline changes: stop the server, edit, restart.
In single mode the server never writes omnigraph.yaml.
POST /graphs body shape
{
"graph_id": "alpha",
"uri": "s3://tenant-bucket/alpha",
"schema": { "source": "<inline .pg source>" },
"policy": { "file": "./policies/alpha.yaml" }
}
schemaandpolicyare nested — leaves room for future fields without breaking the shape.policyis optional; without it, no per-graph Cedar enforcement.- Status codes: 201 Created · 400 invalid body · 401 missing bearer · 403 Cedar denied · 405 single mode · 409 duplicate
graph_idoruri· 413 body >32 MiB · 500 init or rewrite failure · 503 YAML drift.
Streaming
Only /export streams (application/x-ndjson, MPSC channel + Body::from_stream). Everything else is buffered JSON.
Error model
Uniform ErrorOutput { error, code?, merge_conflicts[], manifest_conflict? } with code ∈ unauthorized | forbidden | bad_request | not_found | conflict | too_many_requests | internal. Merge conflicts attach structured MergeConflictOutput { table_key, row_id?, kind, message }.
manifest_conflict is set on publisher CAS rejections (HTTP 409): the
caller's pre-write view of one table's manifest version was stale.
ManifestConflictOutput { table_key, expected, actual } tells the client
which table to refresh and retry. This is the conflict shape produced by
concurrent /change or /ingest calls landing the same (table, branch)
race.
HTTP status codes used: 200, 400, 401, 403, 404, 409, 429, 500.
Per-actor admission control
Disjoint
(table, branch) writes from different actors now run concurrently,
guarded only by the engine's per-(table, branch) write queue. To keep
one heavy actor from exhausting shared capacity (Lance I/O, manifest
churn, network), the server gates mutating handlers through a
WorkloadController configured per-process from environment variables:
| Env var | Default | Purpose |
|---|---|---|
OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX |
16 | Concurrent in-flight mutations per actor |
OMNIGRAPH_PER_ACTOR_BYTES_MAX |
4 GiB | In-flight estimated bytes per actor |
When an actor exceeds its in-flight count or byte budget, the server
returns HTTP 429 Too Many Requests with code: too_many_requests
and a Retry-After header (seconds). The actor should back off; other
actors are unaffected.
Cedar policy authorization runs before admission accounting so denied requests don't consume admission slots.
Today admission gates every mutating handler: /change, /ingest,
/branches/{create,delete,merge}, and /schema/apply. Read-only
endpoints (/snapshot, /read, /export, /branches GET, /commits,
/schema GET) are not admission-gated.
Body limits
- Default: 1 MB
/ingest: 32 MB
Auth model (bearer + SHA-256)
- Tokens are SHA-256 hashed on startup; plaintext is never persisted in memory.
- Constant-time comparison via
subtle::ConstantTimeEq. - Three sources, in precedence:
OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET— AWS Secrets Manager (build with--features aws)OMNIGRAPH_SERVER_BEARER_TOKENS_FILEorOMNIGRAPH_SERVER_BEARER_TOKENS_JSON— JSON{actor_id: token, …}OMNIGRAPH_SERVER_BEARER_TOKEN— single legacy token, actordefault
- If no tokens configured, server runs unauthenticated (local dev) and
/openapi.jsonstrips the security scheme.
See deployment.md for token-source operational details.
Tracing & observability
tower_http::TraceLayer::new_for_http()- Policy decisions logged at INFO level with actor, action, branch, decision, matched rule
- Startup logs: token source name, graph URI, bind address
- Graceful SIGINT shutdown
Not implemented (by design or "TBD")
- CORS — not configured; add
tower_http::corsif needed. - Rate limiting — per-actor admission control gates
/change,/ingest,/branches/{create,delete,merge},/schema/apply(see "Per-actor admission control" above). No global rate limiter is configured; addtower_http::limitif a graph-wide cap is needed. - Pagination — none (commits/branches return everything; export streams).
- Multi-tenant routing — one graph per process.