The POST /graphs runtime-create endpoint shipped in PR 7/10 has three
unresolved high-severity bugs:
- flock-on-renamed-inode race: the YAML flock is taken on
omnigraph.yaml itself, then a temp file is renamed over it.
Cross-process writers end up locking different inodes — both
believing they hold exclusive access.
- duplicate-check outside the file lock: precheck runs against
the in-memory registry only; the locked closure does
config.graphs.insert(...) unconditionally. Concurrent same-id
POSTs can persist the loser in YAML while the in-memory registry
keeps the winner — they disagree after restart.
- best_effort_cleanup_init_artifacts deletes _schema.pg /
_schema.ir.json / __schema_state.json on any init failure. An
accidental re-init against an existing graph's URI destroys its
schema; subsequent open() fails at read_text(_schema.pg).
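The first bug above can be reproduced without any locking library, because the root cause is only that a rename swaps the inode behind the path. The following is a minimal sketch, not the removed rewrite code: the function name and the staging file name are hypothetical, and it is Unix-only because it reads inode numbers through `MetadataExt`.

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns (inode behind a handle opened before the rename,
///          inode the path resolves to after the rename).
/// Hypothetical helper for illustration only.
fn inodes_across_atomic_rewrite(dir: &Path) -> io::Result<(u64, u64)> {
    let config = dir.join("omnigraph.yaml");
    let staging = dir.join("omnigraph.yaml.tmp"); // assumed staging name

    fs::write(&config, "graphs: {}\n")?;

    // Writer A opens the config. An flock taken on this handle would
    // attach to this inode, not to the path.
    let held = File::open(&config)?;
    let locked_inode = held.metadata()?.ino();

    // The atomic rewrite: write a temp file, then rename it over the config.
    fs::write(&staging, "graphs:\n  alpha: {}\n")?;
    fs::rename(&staging, &config)?;

    // Writer B opening the path now reaches the new inode, which no one
    // has locked, while writer A still holds the old one.
    let current_inode = fs::metadata(&config)?.ino();
    Ok((locked_inode, current_inode))
}
```

Calling the helper on a scratch directory returns two different inode numbers, which is why two writers can each believe they hold the exclusive lock.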
The correct fix is a Lance-style cluster catalog (reserve → init →
publish with recovery sidecars), parallel to the engine's existing
__manifest discipline. That work is out of scope for v0.7.0.
For now, disable runtime add/remove from the network and CLI surface.
Operators add graphs by editing omnigraph.yaml and restarting. The
GET /graphs read-only enumeration stays.
Removed:
- POST /graphs handler + router fragment + utoipa registration
- 13 post_graphs_* server tests + 3 composite POST tests +
multi_mode_app_with_real_config / post_graph helpers
- CLI omnigraph graphs create subcommand + its handler + cli.rs tests
- system_remote.rs combined list+create test trimmed to list-only
- YAML rewrite infra: rewrite_atomic[_with_modify], RewriteAtomicError,
staging_path, hash_config_file, AppState::config_hash field +
threading through new_multi and open_multi_graph_state
- fs2 dependency (verified absent from cargo tree)
- sha2/fs2 imports in config.rs (only the rewrite path used them)
- Cedar PolicyAction::GraphCreate variant + "graph_create" match arms
+ action def in Cedar schema + graph_create_action_authorizes_against_server_resource test
- GraphCreateRequest / GraphCreateResponse / GraphSchemaSpec /
GraphPolicySpec API types (only the POST handler / CLI imported them)
Kept:
- GET /graphs (read-only enumeration) and graph_list Cedar action
- omnigraph graphs list CLI subcommand
- All multi-graph startup, mode inference, cluster routes,
per-graph + server-level Cedar policies
- server_settings_drive_multi_graph_startup_end_to_end (the test
that covers operator-authored YAML + restart — the path that
survives)
- best_effort_cleanup_init_artifacts and the three init failpoints
(still reachable from CLI `omnigraph init`; preflight fix deferred
as a follow-up)
- GraphRegistry::insert and its concurrency tests — production
callers gone, but the method is the natural seam for the future
cluster-catalog work
Also fixed (transcript issue 4):
- ALWAYS_FLAT_PATHS now includes /graphs so multi-mode OpenAPI
advertises the management route correctly (was previously rewritten
to /graphs/{graph_id}/graphs)
- multi_mode_openapi_keeps_healthz_flat → renamed to
multi_mode_openapi_keeps_management_paths_flat, asserts both
/healthz and /graphs stay flat
- multi_mode_openapi_prefixes_operation_ids_with_cluster skips
/graphs in addition to /healthz
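The flat-path fix can be pictured as follows. This is a sketch under assumptions, not the server's OpenAPI rewriting code: `multi_mode_path` is a hypothetical helper, and the real ALWAYS_FLAT_PATHS constant may list more entries than the two named above.

```rust
/// Paths that stay un-prefixed in multi mode. Only the two entries named
/// in this commit message are listed; the real constant may differ.
const ALWAYS_FLAT_PATHS: &[&str] = &["/healthz", "/graphs"];

/// Hypothetical helper: map a flat single-mode path to the path that the
/// multi-mode OpenAPI document should advertise.
fn multi_mode_path(flat: &str) -> String {
    if ALWAYS_FLAT_PATHS.contains(&flat) {
        // Management routes are served at the root in both modes.
        flat.to_string()
    } else {
        // Everything else moves under the cluster prefix.
        format!("/graphs/{{graph_id}}{flat}")
    }
}
```

Without "/graphs" in the list, the else branch is what produced the bogus /graphs/{graph_id}/graphs entry.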
Doc fixes:
- docs/user/cli.md: graphs list example was --target http://...,
but --target is a config-graph-name lookup; corrected to --uri.
Removed the graphs create example.
- docs/user/server.md: dropped POST /graphs row, "omnigraph.yaml
ownership", and "POST /graphs body shape" sections. Added a
paragraph stating runtime add/remove is not exposed in v0.7.0.
- docs/user/policy.md: dropped graph_create action; reworded the
"Configuration" line to clarify that server-scoped rules (graph_list)
take neither branch_scope nor target_branch_scope.
- docs/releases/v0.7.0.md: rewrote release narrative — multi-graph
mode ships; runtime add/remove deferred.
- AGENTS.md: HTTP server bullet and capability matrix row updated to
reflect read-only GET /graphs and the operator-edit workflow.
- openapi.json regenerated; /graphs has only .get, no .post.
Diff: 17 files, +123 −1525 LOC.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Omnigraph v0.7.0
Multi-graph server mode (MR-668). One omnigraph-server process can now serve 1–10 graphs concurrently behind cluster routes (/graphs/{graph_id}/...), with per-graph and server-level Cedar policy, read-only GET /graphs enumeration, and CLI parity (omnigraph graphs list).
Runtime add/remove (POST /graphs, DELETE /graphs/{id}, omnigraph graphs create) is not in v0.7.0. Operators add or remove graphs by editing omnigraph.yaml and restarting. The first cut of POST /graphs shipped behind an atomic-YAML-rewrite design that we pulled before release once its concurrency guarantees were challenged (flock-on-renamed-inode race, duplicate-check outside the critical section, and an init-cleanup path that could destroy an existing graph's schema on re-init). The correct fix is a Lance-style cluster catalog (reserve → init → publish with recovery sidecars); that work is deferred.
Breaking Changes
- Multi-graph deployments lose flat routes. Single-graph invocation (omnigraph-server <URI>) is unchanged: same flat /snapshot, /read, /branches, etc. Multi-graph deployments serve those routes under /graphs/{graph_id}/...; bare flat paths return 404 in multi mode.
- ServerConfig shape change (programmatic embedders only): ServerConfig { uri, policy_file } is replaced by ServerConfig { mode: ServerConfigMode }, where ServerConfigMode = Single { uri, policy_file } | Multi { graphs, config_path, server_policy_file }. Callers that use load_server_settings are unaffected; callers that construct ServerConfig directly need to wrap their fields in ServerConfigMode::Single.
- AppState::uri() now returns Option<&str> (was &str). Returns Some in single mode, None in multi mode; per-graph URIs live on GraphHandle.uri instead. AppState::new_multi is the new multi-graph constructor. Single-mode new_* / open_* constructors are unchanged.
- AuthenticatedActor(Arc<str>) → ResolvedActor { actor_id, tenant_id, scopes, source } (programmatic embedders only). The struct shape changes, but the HTTP contract (bearer auth, MR-731 spoof defense) is unchanged. Cluster-mode call sites construct with tenant_id: None, scopes: vec![Scope::Full], source: AuthSource::Static. Forward-compat for Cloud mode (RFC 0003) and OAuth provider (RFC 0004).
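For embedders, the ServerConfig change amounts to wrapping the old fields in the Single variant. The sketch below only mirrors the shapes named above: the field types, the derive list, and the two helper functions are assumptions for illustration, not the crate's actual definitions.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

// Field names come from the release notes; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum ServerConfigMode {
    Single {
        uri: String,
        policy_file: Option<PathBuf>,
    },
    Multi {
        graphs: BTreeMap<String, String>, // graph_id -> uri (simplified)
        config_path: PathBuf,
        server_policy_file: Option<PathBuf>,
    },
}

#[derive(Debug, Clone, PartialEq)]
struct ServerConfig {
    mode: ServerConfigMode,
}

/// Hypothetical shim for code that used to build
/// `ServerConfig { uri, policy_file }` directly.
fn single_mode_config(uri: String, policy_file: Option<PathBuf>) -> ServerConfig {
    ServerConfig {
        mode: ServerConfigMode::Single { uri, policy_file },
    }
}

/// Mirrors the documented `AppState::uri()` contract on the config type:
/// Some in single mode, None in multi mode.
fn config_uri(config: &ServerConfig) -> Option<&str> {
    match &config.mode {
        ServerConfigMode::Single { uri, .. } => Some(uri.as_str()),
        ServerConfigMode::Multi { .. } => None,
    }
}
```

Call sites that previously read a plain &str from the URI accessor now have to handle the None case, for example by looking up the per-graph handle instead.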
New
- Multi-graph mode. Invoke with omnigraph-server --config omnigraph.yaml where the YAML has a non-empty graphs: map and no single-mode selector (no server.graph, no CLI <URI> or --target). At startup the server opens every configured graph in parallel (bounded concurrency, fail-fast).
- GET /graphs. Lists every registered graph, sorted alphabetically by graph_id. Auth-required when bearer tokens are configured; Cedar-gated by PolicyAction::GraphList against Omnigraph::Server::"root". Returns 405 in single mode.
- CLI omnigraph graphs list. Mirrors the HTTP surface. Rejects local URI targets with a clear message; it is for remote multi-graph servers only.
- Per-graph Cedar policy. Each entry in the graphs: map can carry a policy.file path, loaded at startup. Cedar's Omnigraph::Graph::"<graph_id>" resource is per-graph; the new Omnigraph::Server::"root" resource governs server-level actions.
- Server-level Cedar policy. server.policy.file in the config governs the graph_list action on Omnigraph::Server::"root". Required to expose GET /graphs once bearer tokens are configured (MR-723 default-deny otherwise rejects graph_list as non-read).
- Cedar action vocabulary: graph_list (server-scoped). Runtime graph_create / graph_delete are reserved but not shipped; see "Deferred."
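The multi-mode condition in the first bullet can be written as a single predicate. This is a sketch of that one rule only, not the server's mode-inference code, and it does not cover the other rules of the inference matrix, which these notes do not spell out. The struct, its field names, and the function name are hypothetical.

```rust
/// The single-mode selectors named above. Any one of them being set
/// keeps the server out of multi mode. Names are hypothetical.
#[derive(Default)]
struct Selectors<'a> {
    server_graph: Option<&'a str>, // `server.graph` in omnigraph.yaml
    cli_uri: Option<&'a str>,      // positional <URI>
    cli_target: Option<&'a str>,   // --target
}

/// True when the YAML has a non-empty `graphs:` map and no single-mode
/// selector is present.
fn infers_multi_mode(configured_graphs: usize, s: &Selectors<'_>) -> bool {
    configured_graphs > 0
        && s.server_graph.is_none()
        && s.cli_uri.is_none()
        && s.cli_target.is_none()
}
```

Under this reading, a config with two graphs and no selectors yields multi mode, while the same config with server.graph set does not.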
Configuration
omnigraph.yaml schema additions (all optional, single-mode unaffected):
server:
  bind: 0.0.0.0:8080
  policy:
    file: ./server-policy.yaml   # server-level Cedar (graph_list)
graphs:
  alpha:
    uri: s3://tenant-bucket/alpha
    policy:
      file: ./policies/alpha.yaml   # per-graph Cedar
  beta:
    uri: s3://tenant-bucket/beta
    # no per-graph policy → engine-layer enforcement is a no-op
Deferred
- POST /graphs runtime graph creation and CLI omnigraph graphs create. Pulled before release after the YAML-rewrite design's correctness story didn't survive review. A future release will add a managed cluster catalog (Lance-backed reserve → init → publish with recovery sidecars) and re-expose runtime creation on top of it. Until then, operators add graphs by editing omnigraph.yaml and restarting.
- DELETE /graphs/{id}. Never shipped in v0.7.0; deferred with the same cluster-catalog work.
- StorageAdapter::delete_prefix. The substrate primitive a managed catalog would need. Will land alongside runtime mutation.
- X-Actor-Id service delegation forwarding. Needs durable both-actor audit on _graph_commits.lance; out of scope.
- Hot policy reload. Restart is cheap at N≤10 graphs.
User Impact
- Existing single-graph deployments upgrade with zero changes. omnigraph-server <URI> with v0.6.0 config keeps working identically.
- Multi-graph adoption is opt-in. Add a graphs: map to omnigraph.yaml (and remove server.graph) to switch a deployment to multi mode.
- Cluster routes are breaking for client SDKs targeting multi mode. Generated clients from previous v0.6.0 OpenAPI specs will hit 404 on flat paths against a multi-mode server. Regenerate against the v0.7.0 openapi.json.
- Operator-supplied policy.yaml files don't change. The Cedar Omnigraph::Graph and Omnigraph::Server entities are internally generated by compile_policy_source; operator YAML only references actions and groups.
Migration: single → multi
# Before (v0.6.0 single-mode invocation)
server:
  graph: my-graph
graphs:
  my-graph:
    uri: /var/lib/omnigraph/my-graph
policy:
  file: ./policy.yaml

# After (v0.7.0 multi-mode: drop `server.graph` and the top-level `policy`)
server:
  policy:
    file: ./server-policy.yaml   # NEW: governs GET /graphs
graphs:
  my-graph:
    uri: /var/lib/omnigraph/my-graph
    policy:
      file: ./policy.yaml   # MOVED: was top-level
Same omnigraph.yaml file; restart the server. Clients targeting the old flat routes (/snapshot, /read, …) must update to /graphs/my-graph/snapshot, etc.
To add a new graph after rollout: stop the server, append a new graphs.<id> entry, restart.
Test coverage
- GraphId newtype validation, registry race tests (PR 3), init failpoints (PR 2a; still reachable from omnigraph init CLI).
- Mode-inference four-rule matrix (PR 5), parallel multi-graph startup, cluster routing.
- Cedar Server resource refactor, backwards-compat for graph-only policies.
- GET /graphs enumeration, 405-in-single-mode.
- MR-731 spoof regression test stays green across the entire refactor.