mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-21 02:28:07 +02:00
mr-668: composite e2e tests, race fix, v0.7.0 release (PR 9/10)
PR 9 — the final integration PR for MR-668 multi-graph server work.
Closes the v0.7.0 release.
Composite lifecycle tests (closes gaps flagged in PR 7's coverage
review):
- `multi_graph_lifecycle_post_query_restart_persistence` — POST a
graph, query it via cluster route, reload the config from disk
and confirm `load_server_settings` sees the rewritten YAML.
Validates the "restart resolves orphans" failure-mode story.
- `per_graph_policy_enforced_on_post_created_graph` — POST a graph
with a per-graph policy attached, then send authenticated read
and change requests. Per-graph Cedar enforcement fires correctly
on a POST-created graph (engine-layer policy reinstalled via
`Omnigraph::with_policy` inside the create flow).
- `concurrent_post_graphs_distinct_ids_all_succeed` — 4 concurrent
POSTs with distinct graph_ids all return 201. Caught a real
race in `rewrite_atomic` (see below).
Race fix — `rewrite_atomic_with_modify`:
The concurrent-POST composite test surfaced a real bug. The old POST
flow captured the baseline hash OUTSIDE the flock, then called
`rewrite_atomic(path, new_config, expected_hash)`, which compared that
hash against the on-disk file only after acquiring the flock. Under
concurrent writers:
- POST A: captures baseline H0, calls rewrite_atomic.
- POST B: captures baseline H0 too (before A's update lands).
- A: acquires flock, on-disk == H0, writes H1, releases.
- A: updates baseline H0 → H1.
- B: tries to acquire flock — waits.
- B: acquires flock. On-disk is now H1. Expected (captured
before A finished) is H0. MISMATCH → spurious Drift error.
Worse: even if the timing happens to align, B's `updated` config
was constructed from BYTES read before the flock. B writes a config
that doesn't include A's new graph — silent data loss.
The fix: new `config::rewrite_atomic_with_modify(path, baseline,
modify)` takes a closure. Inside the flock + baseline mutex:
1. Read on-disk bytes, hash, compare to baseline.
2. Parse on-disk YAML.
3. Call `modify(parsed)` to produce the new config — receives
fresh on-disk state, returns the modification.
4. Serialize + write + fsync + rename + update baseline.
Everything is read-modify-write under the same critical section.
Concurrent writers serialize cleanly. Test confirmed this is no
longer a race.
The old `rewrite_atomic(path, new_config, expected_hash)` API stays
for tests that don't need the read-modify-write shape; the POST
handler switches to the new shape.
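
A minimal sketch of the read-modify-write shape described above, not the
actual `config::rewrite_atomic_with_modify`. Assumptions: an in-process
`Mutex` stands in for both the flock and the baseline mutex, `DefaultHasher`
stands in for SHA-256, the config is treated as a plain `String` instead of
parsed YAML, and the names `rewrite_with_modify`, `RewriteError`, and
`hash_bytes` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::{self, File};
use std::hash::{Hash, Hasher};
use std::io::{self, Write};
use std::path::Path;
use std::sync::Mutex;

/// Failure modes of the sketch (hypothetical type).
#[derive(Debug)]
pub enum RewriteError {
    /// On-disk content no longer matches the recorded baseline hash.
    Drift,
    Io(io::Error),
}

/// Stand-in for SHA-256: std has no cryptographic hash.
pub fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Read, check, modify, and write inside one critical section.
/// The mutex guard is held for the whole call, so a second writer
/// always sees the first writer's output before running `modify`.
pub fn rewrite_with_modify<F>(
    path: &Path,
    baseline: &Mutex<u64>,
    modify: F,
) -> Result<(), RewriteError>
where
    F: FnOnce(String) -> String,
{
    let mut base = baseline.lock().unwrap();

    // 1. Read on-disk bytes, hash, compare to baseline.
    let on_disk = fs::read(path).map_err(RewriteError::Io)?;
    if hash_bytes(&on_disk) != *base {
        return Err(RewriteError::Drift);
    }

    // 2-3. "Parse" the fresh on-disk state and let the caller modify it.
    let parsed = String::from_utf8_lossy(&on_disk).into_owned();
    let updated = modify(parsed);

    // 4. Write to a temp file, fsync, rename over the target, update baseline.
    let tmp = path.with_extension("tmp");
    let mut file = File::create(&tmp).map_err(RewriteError::Io)?;
    file.write_all(updated.as_bytes()).map_err(RewriteError::Io)?;
    file.sync_all().map_err(RewriteError::Io)?;
    fs::rename(&tmp, path).map_err(RewriteError::Io)?;
    *base = hash_bytes(updated.as_bytes());
    Ok(())
}
```

Because the closure receives the state read under the lock, a second writer
builds its update on top of the first writer's result instead of on stale
bytes, which is what removes both the spurious drift error and the lost update.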
Version bump v0.6.0 → v0.7.0:
- All 5 `crates/*/Cargo.toml` (compiler, engine, policy, cli, server)
plus their inter-crate `path` dep version constraints.
- `Cargo.lock` regenerated by `cargo build --workspace`.
- `AGENTS.md` "Version surveyed" line, capability matrix HTTP-server
row updated to mention multi-graph + cluster routes + atomic YAML
rewrite.
- `openapi.json` regenerated.
Docs:
- `docs/releases/v0.7.0.md` (new) — release notes with breaking
changes, new features, deferred items (DELETE, `delete_prefix`,
actor forwarding), and the single→multi migration recipe.
- `docs/user/server.md` — substantial section additions for the
two modes, mode inference, cluster endpoint table, management
endpoints, `omnigraph.yaml` ownership contract, `POST /graphs`
body shape + status codes.
- `docs/user/cli.md` — `omnigraph graphs list/create` section,
deferred-DELETE note.
- `docs/user/policy.md` — server-scoped Cedar actions
(`graph_create`, `graph_list`), per-graph vs server-level policy
composition, example server-level policy.
Workspace test pass: 573 tests green across all crates. Zero
failures. MR-731 spoof regression still pinned and passing across
the entire 10-PR series.
This commit closes MR-668. v0.7.0 is ready for tagging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 75514b6cfd
commit d11c18fb27
15 changed files with 632 additions and 77 deletions
docs/releases/v0.7.0.md (normal file, 109 lines added)
@@ -0,0 +1,109 @@
# Omnigraph v0.7.0

Multi-graph server mode (MR-668). One `omnigraph-server` process can now serve 1–10 graphs concurrently behind cluster routes (`/graphs/{graph_id}/...`), with per-graph Cedar policy, runtime graph creation via `POST /graphs`, and CLI parity (`omnigraph graphs list/create`).

## Breaking Changes

- **Multi-graph deployments lose flat routes.** Single-graph invocation (`omnigraph-server <URI>`) is unchanged: same flat `/snapshot`, `/read`, `/branches`, etc. Multi-graph deployments serve those routes under `/graphs/{graph_id}/...`; bare flat paths return 404 in multi mode.
- **`ServerConfig` shape change** (programmatic embedders only): `ServerConfig { uri, policy_file }` is replaced by `ServerConfig { mode: ServerConfigMode }`, where `ServerConfigMode = Single { uri, policy_file } | Multi { graphs, config_path, server_policy_file }`. Callers that use `load_server_settings` are unaffected; callers that construct `ServerConfig` directly need to wrap their fields in `ServerConfigMode::Single`.
- **`AppState::uri()`** now returns `Option<&str>` (was `&str`). Returns `Some` in single mode, `None` in multi mode; per-graph URIs live on `GraphHandle.uri` instead.
- **`AppState::new_multi`** is the new multi-graph constructor. Single-mode `new_*` / `open_*` constructors are unchanged.
- **`AuthenticatedActor(Arc<str>)` → `ResolvedActor { actor_id, tenant_id, scopes, source }`** (programmatic embedders only). The struct shape changes, but the HTTP contract (bearer auth, MR-731 spoof defense) is unchanged. Cluster-mode call sites construct with `tenant_id: None`, `scopes: vec![Scope::Full]`, `source: AuthSource::Static`. This keeps the type forward-compatible with Cloud mode (RFC 0003) and the OAuth provider (RFC 0004).
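
For embedders, the `ServerConfig` and `AppState::uri()` changes above could look roughly like the following. This is a sketch under stated assumptions: the release notes name only the fields, so every field type here is a guess, and the `single` and `uri` helpers on `ServerConfig` are hypothetical (the documented accessor lives on `AppState`).

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

// Field types are assumptions for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum ServerConfigMode {
    Single {
        uri: String,
        policy_file: Option<PathBuf>,
    },
    Multi {
        // graph_id -> uri; the real value type is likely richer.
        graphs: BTreeMap<String, String>,
        config_path: PathBuf,
        server_policy_file: Option<PathBuf>,
    },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ServerConfig {
    pub mode: ServerConfigMode,
}

impl ServerConfig {
    /// Hypothetical helper: wraps the old `{ uri, policy_file }` pair
    /// in `ServerConfigMode::Single`.
    pub fn single(uri: impl Into<String>, policy_file: Option<PathBuf>) -> Self {
        ServerConfig {
            mode: ServerConfigMode::Single {
                uri: uri.into(),
                policy_file,
            },
        }
    }

    /// Mirrors the documented `AppState::uri()` contract:
    /// `Some` in single mode, `None` in multi mode.
    pub fn uri(&self) -> Option<&str> {
        match &self.mode {
            ServerConfigMode::Single { uri, .. } => Some(uri.as_str()),
            ServerConfigMode::Multi { .. } => None,
        }
    }
}
```

Call sites that previously used the `&str` result directly now have to handle the `None` case, typically by looking up the per-graph URI instead.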

## New

- **Multi-graph mode**. Invoke with `omnigraph-server --config omnigraph.yaml` where the YAML has a non-empty `graphs:` map and no single-mode selector (no `server.graph`, no CLI `<URI>` or `--target`). At startup the server opens every configured graph in parallel (bounded concurrency, fail-fast).
- **`POST /graphs`**. Runtime graph creation. Request body:
  ```json
  {
    "graph_id": "beta",
    "uri": "/data/beta.omni",
    "schema": { "source": "<inline .pg source>" },
    "policy": { "file": "./policies/beta.yaml" }
  }
  ```
  `schema` and `policy` are nested objects, which leaves room for future fields without breaking the shape. (This is asymmetric with the existing `POST /schema/apply`, which still uses flat `schema_source: String`. A follow-up release may migrate it.) Body limit is 32 MiB.

  The server runs `Omnigraph::init` at the supplied URI, atomically rewrites `omnigraph.yaml` under an exclusive `fcntl::flock` with SHA-256 drift detection, then publishes the handle in the in-memory registry. Returns 201 on success; 409 on duplicate `graph_id` or URI; 503 on YAML drift (operator hand-edited the file between server start and the rewrite).
- **`GET /graphs`**. Lists every registered graph, sorted alphabetically by `graph_id`. Auth-required when bearer tokens are configured; Cedar-gated by `PolicyAction::GraphList` against `Omnigraph::Server::"root"`. Returns 405 in single mode.
- **CLI `omnigraph graphs list/create`**. Mirrors the HTTP surface. Rejects local URI targets with a clear message: these subcommands are for remote multi-graph servers only.
- **Per-graph Cedar policy**. Each entry in the `graphs:` map can carry a `policy.file` path. Loaded at startup or attached at `POST` time. Cedar's `Omnigraph::Graph::"<graph_id>"` resource is per-graph; the new `Omnigraph::Server::"root"` resource governs server-level actions.
- **Cedar action vocabulary**: `graph_create` and `graph_list` (server-scoped). `graph_delete` is reserved but not shipped; see "Deferred."
- **YAML drift detection**. Server hashes `omnigraph.yaml` at startup. `POST /graphs` re-hashes the on-disk file under the flock before rewriting; if the hash doesn't match the baseline, the rewrite refuses with 503 to avoid clobbering operator hand-edits.
- **`Omnigraph::init` error-path cleanup**. A failed init now best-effort-deletes the schema artifacts (`_schema.pg`, `_schema.ir.json`, `__schema_state.json`). Lance per-type directories created by `GraphCoordinator::init` may still orphan; full recursive cleanup needs a `delete_prefix` substrate primitive, deferred along with `DELETE /graphs/{id}`.
- **`omnigraph-policy` is now a published workspace crate.** The published-crates set is `omnigraph-compiler`, `omnigraph-policy`, `omnigraph-engine`, `omnigraph-server`, `omnigraph-cli`.
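
The `POST /graphs` status codes and the sorted `GET /graphs` listing described above can be illustrated with a toy in-memory registry. This is a sketch, not the server's implementation: `Registry`, `CreateOutcome`, the `drifted` flag, and the order in which the checks run are all assumptions, and the real handler also runs `Omnigraph::init` and rewrites the YAML.

```rust
use std::collections::BTreeMap;

/// Hypothetical outcome type for a create request.
#[derive(Debug, PartialEq)]
pub enum CreateOutcome {
    Created,
    DuplicateGraphId,
    DuplicateUri,
    YamlDrift,
}

impl CreateOutcome {
    /// Status codes as listed in the release notes.
    pub fn status(&self) -> u16 {
        match self {
            CreateOutcome::Created => 201,
            CreateOutcome::DuplicateGraphId | CreateOutcome::DuplicateUri => 409,
            CreateOutcome::YamlDrift => 503,
        }
    }
}

/// Toy registry: graph_id -> uri. A BTreeMap keeps ids sorted,
/// which matches the alphabetical ordering of `GET /graphs`.
#[derive(Default)]
pub struct Registry {
    graphs: BTreeMap<String, String>,
}

impl Registry {
    /// `drifted` stands in for the on-disk hash check done under the flock.
    pub fn create(&mut self, graph_id: &str, uri: &str, drifted: bool) -> CreateOutcome {
        if self.graphs.contains_key(graph_id) {
            return CreateOutcome::DuplicateGraphId;
        }
        if self.graphs.values().any(|u| u.as_str() == uri) {
            return CreateOutcome::DuplicateUri;
        }
        if drifted {
            return CreateOutcome::YamlDrift;
        }
        self.graphs.insert(graph_id.to_string(), uri.to_string());
        CreateOutcome::Created
    }

    /// Graph ids in alphabetical order.
    pub fn list(&self) -> Vec<&str> {
        self.graphs.keys().map(String::as_str).collect()
    }
}
```

A client can treat 409 as a permanent conflict for that `graph_id` or URI, while 503 signals that an operator needs to reconcile the hand-edited file before creation can succeed.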

## Configuration

`omnigraph.yaml` schema additions (all optional, single-mode unaffected):

```yaml
server:
  bind: 0.0.0.0:8080
  policy:
    file: ./server-policy.yaml   # server-level Cedar (graph_create, graph_list)

graphs:
  alpha:
    uri: s3://tenant-bucket/alpha
    policy:
      file: ./policies/alpha.yaml   # per-graph Cedar
  beta:
    uri: s3://tenant-bucket/beta
    # no per-graph policy → engine-layer enforcement is a no-op
```
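
How a loader might pick a mode from a config like this one can be sketched as follows. It encodes only the rule stated in these notes (multi mode needs a non-empty `graphs:` map and no single-mode selector); the full mode-inference matrix is not reproduced here, and `Inputs`, `Mode`, `infer_mode`, and the `Unresolved` case are hypothetical.

```rust
/// Hypothetical result of mode inference.
#[derive(Debug, PartialEq)]
pub enum Mode {
    Single,
    Multi,
    /// No selector and no graphs: the notes do not say what happens here.
    Unresolved,
}

/// Hypothetical view of the inputs the rule mentions.
pub struct Inputs<'a> {
    /// Positional `<URI>` on the command line.
    pub cli_uri: Option<&'a str>,
    /// `--target` on the command line.
    pub cli_target: Option<&'a str>,
    /// `server.graph` in omnigraph.yaml.
    pub server_graph: Option<&'a str>,
    /// Number of entries in the `graphs:` map.
    pub graph_count: usize,
}

pub fn infer_mode(inputs: &Inputs) -> Mode {
    // Any single-mode selector keeps the server in single mode.
    let has_selector = inputs.cli_uri.is_some()
        || inputs.cli_target.is_some()
        || inputs.server_graph.is_some();
    if has_selector {
        Mode::Single
    } else if inputs.graph_count > 0 {
        Mode::Multi
    } else {
        Mode::Unresolved
    }
}
```

With the YAML above (two graphs, no `server.graph`) and no CLI `<URI>` or `--target`, this sketch yields `Mode::Multi`.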

## Deferred

- **`DELETE /graphs/{id}`**. Cut from v0.7.0 scope to bound complexity (no `delete_prefix` substrate, no tombstones). Operators remove graphs by stopping the server, editing `omnigraph.yaml`, then restarting.
- **`StorageAdapter::delete_prefix`**. The substrate primitive that DELETE would need. Will land alongside DELETE in a future release.
- **`X-Actor-Id` service delegation forwarding**. Needs a durable audit of both actors on `_graph_commits.lance`, which is out of scope.
- **Hot policy reload**. Restart is cheap at N≤10 graphs.

## User Impact

- **Existing single-graph deployments upgrade with zero changes.** `omnigraph-server <URI>` with a v0.6.0 config keeps working identically.
- **Multi-graph adoption is opt-in.** Add a `graphs:` map to `omnigraph.yaml` (and remove `server.graph`) to switch a deployment to multi mode.
- **Cluster routes are breaking for client SDKs targeting multi mode.** Clients generated from v0.6.0 OpenAPI specs will hit 404 on flat paths against a multi-mode server. Regenerate against the v0.7.0 `openapi.json`.
- **`fs2 = "0.4"`** is a new dependency for the file locking that powers the atomic YAML rewrite. POSIX-only. Linux and macOS deployments are supported; Windows is out of scope.
- **Operator-supplied policy.yaml files don't change.** The Cedar `Omnigraph::Graph` and `Omnigraph::Server` entities are internally generated by `compile_policy_source`; operator YAML only references actions and groups.

## Migration: single → multi

```yaml
# Before (v0.6.0 single-mode invocation)
server:
  graph: my-graph
graphs:
  my-graph:
    uri: /var/lib/omnigraph/my-graph
policy:
  file: ./policy.yaml
```

```yaml
# After (v0.7.0 multi-mode: drop `server.graph` and the top-level `policy`)
server:
  policy:
    file: ./server-policy.yaml   # NEW: governs POST/GET /graphs
graphs:
  my-graph:
    uri: /var/lib/omnigraph/my-graph
    policy:
      file: ./policy.yaml         # MOVED: was top-level
```

Edit the same `omnigraph.yaml` file in place, then restart the server. Clients targeting the old flat routes (`/snapshot`, `/read`, …) must update to `/graphs/my-graph/snapshot`, etc.
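
On the client side, the route change amounts to prefixing each flat path with `/graphs/{graph_id}`. A minimal sketch, assuming a hypothetical helper named `cluster_path` and a `graph_id` that is already valid for use in a URL (no escaping or validation is done here):

```rust
/// Hypothetical client helper: maps a flat single-mode path such as
/// `/snapshot` to its cluster route under `/graphs/{graph_id}/...`.
pub fn cluster_path(graph_id: &str, flat_path: &str) -> String {
    // Drop any leading slashes so the result has exactly one separator.
    format!("/graphs/{}/{}", graph_id, flat_path.trim_start_matches('/'))
}
```

For example, `cluster_path("my-graph", "/snapshot")` produces `/graphs/my-graph/snapshot`, the route shown above.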

## Test coverage

v0.7.0 ships ~280 new tests covering MR-668 specifically:

- `GraphId` newtype validation, registry race tests (PR 3), init failpoints (PR 2a).
- Mode-inference four-rule matrix (PR 5), parallel multi-graph startup, cluster routing.
- Cedar `Server` resource refactor, backwards-compat for graph-only policies.
- `POST /graphs` happy path + duplicate graph_id + duplicate URI + YAML drift detection + 405-in-single-mode.
- Composite lifecycle: POST a graph, query it via cluster route, reload config from disk, confirm persistence.
- Per-graph Cedar policy enforced for a POST-created graph (engine-layer enforcement is re-applied via `Omnigraph::with_policy`).
- Concurrent distinct-id POSTs serialize correctly through the flock without spurious drift errors.
- MR-731 spoof regression test stays green across the entire refactor.