diff --git a/AGENTS.md b/AGENTS.md index a9cc9c0..b11134d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -16,8 +16,8 @@ Tools that support `@`-imports (Claude Code) auto-include all three files via th `CLAUDE.md` is a symlink to this file — there is exactly one source of truth. Edit `AGENTS.md`. -**Version surveyed:** 0.6.0 -**Workspace crates:** `omnigraph-compiler`, `omnigraph` (engine), `omnigraph-cli`, `omnigraph-server` +**Version surveyed:** 0.7.0 +**Workspace crates:** `omnigraph-compiler`, `omnigraph` (engine), `omnigraph-policy`, `omnigraph-cli`, `omnigraph-server` **Storage substrate:** Lance 6.x (columnar, versioned, branchable) **License:** MIT **Toolchain:** Rust stable, edition 2024 @@ -33,7 +33,7 @@ OmniGraph is a typed property-graph engine built as a coordination layer over ma - **Multi-modal querying**: vector ANN (`nearest`), full-text (`search`/`fuzzy`/`match_text`/`bm25`), Reciprocal Rank Fusion (`rrf`), and graph traversal (`Expand`, anti-join `not { … }`) in one runtime. - **Branches and commits across the whole graph**: Git-style — every successful publish appends to a commit DAG; merges are three-way at the row level. - **Atomic per-query writes**: `mutate_as` and `load` accumulate insert/update batches into an in-memory `MutationStaging.pending` per touched table; one `stage_*` + `commit_staged` per table runs at end-of-query, then `ManifestBatchPublisher::publish` commits the manifest atomically with per-table `expected_table_versions` CAS. A mid-query failure leaves Lance HEAD untouched on staged tables — no drift, no run state machine, no staging branches. Deletes still inline-commit; D₂ at parse time prevents inserts/updates and deletes from coexisting in one query. -- **HTTP server**: Axum + utoipa OpenAPI, bearer auth (SHA-256 hashed, optional AWS Secrets Manager). Cedar policy enforcement is engine-wide — every `_as` writer calls `Omnigraph::enforce(action, scope, actor)`, so HTTP, CLI, and embedded SDK consumers all hit the same gate. 
+- **HTTP server**: Axum + utoipa OpenAPI, bearer auth (SHA-256 hashed, optional AWS Secrets Manager). Cedar policy enforcement is engine-wide — every `_as` writer calls `Omnigraph::enforce(action, scope, actor)`, so HTTP, CLI, and embedded SDK consumers all hit the same gate. **Two modes** (v0.7.0+): single-graph (legacy flat routes) and multi-graph (`/graphs/{graph_id}/...` cluster routes + `POST/GET /graphs` management endpoints with atomic YAML rewrite + drift detection). Per-graph + server-level Cedar policies. - **CLI** driven by a single `omnigraph.yaml`; multi-format output (json/jsonl/csv/kv/table). Throughout the docs, capabilities are split into **L1 — Inherited from Lance** vs **L2 — Added by OmniGraph**. @@ -227,7 +227,7 @@ omnigraph policy explain --actor act-alice --action change --branch main | Three-way row-level merge | — | `OrderedTableCursor` + `StagedTableWriter`, structured `MergeConflictKind` | | Change feeds | — | `diff_between` / `diff_commits` with manifest fast path + ID streaming | | Cedar policy | — | 8 actions, branch / target_branch / protected scopes, validate/test/explain CLI. **Engine-wide enforcement** (MR-722): every `_as` writer (`apply_schema_as`, `mutate_as`, `load_as`, `ingest_as`, `branch_create_as` / `branch_create_from_as`, `branch_delete_as`, `branch_merge_as`) calls `Omnigraph::enforce(action, scope, actor)` — HTTP, CLI, embedded SDK all hit the same gate. 
| -| HTTP server | — | Axum, OpenAPI via utoipa, bearer auth (SHA-256, AWS Secrets Manager option), `authorize_request` at the HTTP boundary (resolves bearer→actor, applies admission control), NDJSON streaming export | +| HTTP server | — | Axum, OpenAPI via utoipa, bearer auth (SHA-256, AWS Secrets Manager option), `authorize_request` at the HTTP boundary (resolves bearer→actor, applies admission control), NDJSON streaming export, **multi-graph mode (v0.7.0+) with cluster routes + `POST/GET /graphs` management endpoints + atomic YAML rewrite under `fs2::flock` + SHA-256 drift detection** | | CLI with config | — | `omnigraph.yaml`, aliases, multi-format output (json/jsonl/csv/kv/table) | | Audit / actor tracking | — | `_as` write APIs + actor map in commit graph | | Local RustFS bootstrap | — | `scripts/local-rustfs-bootstrap.sh` one-shot S3-backed dev environment | diff --git a/Cargo.lock b/Cargo.lock index 53d7709..b52e218 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -4553,7 +4553,7 @@ dependencies = [ [[package]] name = "omnigraph-cli" -version = "0.6.0" +version = "0.7.0" dependencies = [ "assert_cmd", "clap", @@ -4575,7 +4575,7 @@ dependencies = [ [[package]] name = "omnigraph-compiler" -version = "0.6.0" +version = "0.7.0" dependencies = [ "ahash", "arrow-array", @@ -4596,7 +4596,7 @@ dependencies = [ [[package]] name = "omnigraph-engine" -version = "0.6.0" +version = "0.7.0" dependencies = [ "arc-swap", "arrow-array", @@ -4637,7 +4637,7 @@ dependencies = [ [[package]] name = "omnigraph-policy" -version = "0.6.0" +version = "0.7.0" dependencies = [ "cedar-policy", "clap", @@ -4650,7 +4650,7 @@ dependencies = [ [[package]] name = "omnigraph-server" -version = "0.6.0" +version = "0.7.0" dependencies = [ "arc-swap", "async-trait", diff --git a/crates/omnigraph-cli/Cargo.toml b/crates/omnigraph-cli/Cargo.toml index 0d35ed8..4d2acf1 100644 --- a/crates/omnigraph-cli/Cargo.toml +++ b/crates/omnigraph-cli/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "omnigraph-cli" 
-version = "0.6.0" +version = "0.7.0" edition = "2024" description = "CLI for the Omnigraph graph database." license = "MIT" @@ -13,10 +13,10 @@ name = "omnigraph" path = "src/main.rs" [dependencies] -omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.6.0" } -omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.6.0" } -omnigraph-policy = { path = "../omnigraph-policy", version = "0.6.0" } -omnigraph-server = { path = "../omnigraph-server", version = "0.6.0" } +omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.7.0" } +omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.0" } +omnigraph-policy = { path = "../omnigraph-policy", version = "0.7.0" } +omnigraph-server = { path = "../omnigraph-server", version = "0.7.0" } clap = { workspace = true } color-eyre = { workspace = true } serde = { workspace = true } diff --git a/crates/omnigraph-compiler/Cargo.toml b/crates/omnigraph-compiler/Cargo.toml index 229b862..bbf03f1 100644 --- a/crates/omnigraph-compiler/Cargo.toml +++ b/crates/omnigraph-compiler/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "omnigraph-compiler" -version = "0.6.0" +version = "0.7.0" edition = "2024" description = "Schema/query compiler for Omnigraph. Zero Lance dependency." license = "MIT" diff --git a/crates/omnigraph-policy/Cargo.toml b/crates/omnigraph-policy/Cargo.toml index dacda35..907ce07 100644 --- a/crates/omnigraph-policy/Cargo.toml +++ b/crates/omnigraph-policy/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "omnigraph-policy" -version = "0.6.0" +version = "0.7.0" edition = "2024" description = "Policy / authorization layer for Omnigraph — Cedar-backed PolicyEngine, PolicyChecker trait, ResourceScope enum." 
license = "MIT" diff --git a/crates/omnigraph-server/Cargo.toml b/crates/omnigraph-server/Cargo.toml index 5ea2524..590095c 100644 --- a/crates/omnigraph-server/Cargo.toml +++ b/crates/omnigraph-server/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "omnigraph-server" -version = "0.6.0" +version = "0.7.0" edition = "2024" description = "HTTP server for the Omnigraph graph database." license = "MIT" @@ -19,9 +19,9 @@ default = [] aws = ["dep:aws-config", "dep:aws-sdk-secretsmanager"] [dependencies] -omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.6.0" } -omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.6.0" } -omnigraph-policy = { path = "../omnigraph-policy", version = "0.6.0" } +omnigraph = { package = "omnigraph-engine", path = "../omnigraph", version = "0.7.0" } +omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.0" } +omnigraph-policy = { path = "../omnigraph-policy", version = "0.7.0" } axum = { workspace = true } clap = { workspace = true } color-eyre = { workspace = true } diff --git a/crates/omnigraph-server/src/config.rs b/crates/omnigraph-server/src/config.rs index e233657..965700b 100644 --- a/crates/omnigraph-server/src/config.rs +++ b/crates/omnigraph-server/src/config.rs @@ -493,6 +493,77 @@ fn staging_path(path: &Path) -> PathBuf { PathBuf::from(s) } +/// Atomic read-modify-write of `omnigraph.yaml` (MR-668 PR 7 — race-fix +/// from PR 9). Everything happens **inside** the `fs2` flock and the +/// in-memory baseline mutex: +/// 1. Acquire `LOCK_EX`. +/// 2. Lock the in-memory baseline mutex. +/// 3. Read the on-disk file, hash it. +/// 4. Compare to the in-memory baseline; if mismatch → `Drift`. +/// 5. Parse the on-disk YAML, hand the parsed config to `modify`. +/// 6. Serialize the returned config, write `.tmp`, fsync, rename. +/// 7. Update the in-memory baseline to the new file's hash. +/// 8. Release flock + mutex. 
+/// +/// The earlier `rewrite_atomic` captured the baseline OUTSIDE the +/// flock, which created a race under concurrent writers: a second +/// writer would see a stale baseline + the first writer's new on-disk +/// hash, yielding a spurious `Drift` error. The `_with_modify` shape +/// keeps the entire critical section atomic. +/// +/// `modify` is a `FnOnce` so the caller can read mutable state into it +/// (e.g. a `GraphCreateRequest`) without `Sync` requirements. +pub fn rewrite_atomic_with_modify<F>( + path: &Path, + baseline: &std::sync::Mutex<[u8; 32]>, + modify: F, +) -> std::result::Result<(), RewriteAtomicError> +where + F: FnOnce(OmnigraphConfig) -> std::result::Result, +{ + let lock_file = fs::OpenOptions::new().read(true).write(true).open(path)?; + lock_file.lock_exclusive()?; + let _lock_guard = lock_file; + + // Lock the in-memory baseline INSIDE the flock so concurrent writers + // serialize on both: flock for cross-process safety, mutex for + // in-process baseline updates. The mutex guard outlives the modify + // step so the baseline can't move under our feet. + let mut baseline_guard = baseline + .lock() + .expect("baseline mutex must not be poisoned"); + + let current_bytes = fs::read(path)?; + let mut current_hash = [0u8; 32]; + current_hash.copy_from_slice(&Sha256::digest(&current_bytes)); + if current_hash != *baseline_guard { + return Err(RewriteAtomicError::Drift); + } + + // Parse the on-disk config (NOT a stale cached version) and hand + // to `modify`. The closure can mutate freely; the result is what + // we serialize and write. 
+ let current_config: OmnigraphConfig = serde_yaml::from_slice(&current_bytes)?; + let new_config = modify(current_config)?; + let serialized = serde_yaml::to_string(&new_config)?; + + let tmp_path = staging_path(path); + fs::write(&tmp_path, &serialized)?; + let tmp_file = fs::File::open(&tmp_path)?; + tmp_file.sync_all()?; + drop(tmp_file); + fs::rename(&tmp_path, path)?; + if let Some(parent) = path.parent() { + let dir = fs::File::open(parent)?; + dir.sync_all()?; + } + + let mut new_hash = [0u8; 32]; + new_hash.copy_from_slice(&Sha256::digest(serialized.as_bytes())); + *baseline_guard = new_hash; + Ok(()) +} + #[cfg(test)] mod tests { use std::fs; diff --git a/crates/omnigraph-server/src/lib.rs b/crates/omnigraph-server/src/lib.rs index 4aa6b4a..b2e7454 100644 --- a/crates/omnigraph-server/src/lib.rs +++ b/crates/omnigraph-server/src/lib.rs @@ -1428,49 +1428,33 @@ async fn server_graphs_create( )) } -/// Load `omnigraph.yaml` from disk, add the new graph entry, write it -/// back via `config::rewrite_atomic`, and update the in-memory baseline -/// hash. Returns an `ApiError` mapped to the appropriate HTTP status -/// (503 for drift, 500 for IO/serialize failures). +/// Atomically rewrite `omnigraph.yaml` to add a new graph entry. +/// Runs inside `tokio::task::spawn_blocking` (the flock is sync). /// -/// Runs inside `tokio::task::spawn_blocking` — `fs2::flock` is sync. +/// Read-modify-write happens entirely under the flock + baseline +/// mutex via `config::rewrite_atomic_with_modify` — concurrent +/// writers serialize without spurious drift errors. fn rewrite_yaml_with_new_graph( config_path: &std::path::Path, config_hash: &Arc>, graph_id: &str, new_target: config::TargetConfig, ) -> std::result::Result<(), ApiError> { - // Re-read the config file to construct the next state. 
- let bytes = std::fs::read(config_path) - .map_err(|err| ApiError::internal(format!("read omnigraph.yaml: {err}")))?; - let mut updated: config::OmnigraphConfig = serde_yaml::from_slice(&bytes) - .map_err(|err| ApiError::internal(format!("parse omnigraph.yaml: {err}")))?; - updated.graphs.insert(graph_id.to_string(), new_target); - - // Grab the current baseline hash for the drift check. - let expected = *config_hash - .lock() - .expect("config_hash mutex must not be poisoned"); - let new_hash = config::rewrite_atomic(config_path, &updated, &expected).map_err(|err| { - match err { - config::RewriteAtomicError::Drift => ApiError { - status: StatusCode::SERVICE_UNAVAILABLE, - code: ErrorCode::Conflict, - message: err.to_string(), - merge_conflicts: Vec::new(), - manifest_conflict: None, - }, - other => ApiError::internal(other.to_string()), - } - })?; - - // Update the baseline so the next POST sees this as the new "no - // drift" reference. If we forgot this, every POST after the first - // would 503. 
- *config_hash - .lock() - .expect("config_hash mutex must not be poisoned") = new_hash; - Ok(()) + let graph_id = graph_id.to_string(); + config::rewrite_atomic_with_modify(config_path, config_hash, move |mut config| { + config.graphs.insert(graph_id, new_target); + Ok(config) + }) + .map_err(|err| match err { + config::RewriteAtomicError::Drift => ApiError { + status: StatusCode::SERVICE_UNAVAILABLE, + code: ErrorCode::Conflict, + message: err.to_string(), + merge_conflicts: Vec::new(), + manifest_conflict: None, + }, + other => ApiError::internal(other.to_string()), + }) } async fn server_openapi(State(state): State) -> Json { diff --git a/crates/omnigraph-server/tests/server.rs b/crates/omnigraph-server/tests/server.rs index e80d33b..6e8f0f2 100644 --- a/crates/omnigraph-server/tests/server.rs +++ b/crates/omnigraph-server/tests/server.rs @@ -5244,4 +5244,271 @@ graphs: _ => unreachable!(), } } + + // ─── PR 9: composite lifecycle tests ───────────────────────────────── + // + // These tests exercise PRs 1–8 in combination. Each test composes + // multiple primitives (POST a graph, query it, restart, enforce + // per-graph policy) into a single scenario. They're the closure + // tests for the gaps I flagged in PR 7's coverage assessment — + // not redundant with the per-PR tests because they catch + // integration regressions that individual unit tests miss. + + /// Post a graph, query it via cluster route, then re-load the + /// config from disk and confirm `load_server_settings` sees the + /// rewritten YAML (i.e. the server's `POST /graphs` actually + /// persists). Validates that on restart, the new graph would be + /// opened automatically by `serve()`'s multi-mode startup. + #[tokio::test(flavor = "multi_thread")] + async fn multi_graph_lifecycle_post_query_restart_persistence() { + let (cfg_dir, app) = multi_mode_app_with_real_config(&["alpha"]).await; + let schema = fs::read_to_string(fixture("test.pg")).unwrap(); + + // 1. POST a new graph `beta`. 
+ let beta_uri = cfg_dir.path().join("beta.omni"); + let req = GraphCreateRequest { + graph_id: "beta".to_string(), + uri: beta_uri.to_string_lossy().to_string(), + schema: GraphSchemaSpec { + source: schema.clone(), + }, + policy: None, + }; + let (status, _) = post_graph(&app, &req, None).await; + assert_eq!(status, StatusCode::CREATED); + + // 2. Query the new graph via its cluster route. + let snap = app + .clone() + .oneshot( + Request::builder() + .method(Method::GET) + .uri("/graphs/beta/snapshot?branch=main") + .body(Body::empty()) + .unwrap(), + ) + .await + .unwrap(); + assert_eq!(snap.status(), StatusCode::OK); + + // 3. "Restart": reload the config and confirm the rewritten + // YAML carries the new graph through `load_server_settings`. + // A real restart calls `open_multi_graph_state` next; we + // stop short of opening Lance again (the per-PR tests + // already cover that path) but assert the inferred + // `ServerConfigMode::Multi` lists both graphs. + let config_path = cfg_dir.path().join("omnigraph.yaml"); + let settings: ServerConfig = + load_server_settings(Some(&config_path), None, None, None, true).unwrap(); + match settings.mode { + ServerConfigMode::Multi { graphs, .. } => { + let ids: Vec<&str> = graphs.iter().map(|g| g.graph_id.as_str()).collect(); + assert_eq!( + ids, + vec!["alpha", "beta"], + "rewritten YAML must include both graphs in BTreeMap order" + ); + } + _ => panic!("expected Multi mode after restart"), + } + } + + /// Per-graph Cedar policy is enforced for a graph created via POST. + /// Closes the gap from PR 7's test coverage — the policy was loaded + /// but never exercised end-to-end. This test sends an authenticated + /// `change` request against a POST-created graph whose per-graph + /// policy denies `change` for that actor. 
+ #[tokio::test(flavor = "multi_thread")] + async fn per_graph_policy_enforced_on_post_created_graph() { + let (cfg_dir, _initial_app) = multi_mode_app_with_real_config(&[]).await; + let schema = fs::read_to_string(fixture("test.pg")).unwrap(); + let config_path = cfg_dir.path().join("omnigraph.yaml"); + let config_hash = omnigraph_server::config::hash_config_file(&config_path).unwrap(); + // Server-level policy: act-andrew can create graphs. Required + // because requires_bearer_auth fires under MR-723 default-deny + // once we configure tokens, and `GraphCreate != Read` would + // otherwise 403 without a server policy. + let server_policy_path = cfg_dir.path().join("server-policy.yaml"); + fs::write( + &server_policy_path, + r#" +version: 1 +groups: + admins: [act-andrew] +rules: + - id: admins-create + allow: + actors: { group: admins } + actions: [graph_create, graph_list] +"#, + ) + .unwrap(); + let server_policy = omnigraph_policy::PolicyEngine::load(&server_policy_path, "server") + .unwrap(); + let workload = omnigraph_server::workload::WorkloadController::from_env(); + let state = AppState::new_multi( + vec![], + vec![ + ("act-andrew".to_string(), "andrew-token".to_string()), + ("act-bruno".to_string(), "bruno-token".to_string()), + ], + Some(server_policy), + workload, + Some(config_path.clone()), + Some(config_hash), + ) + .expect("empty multi-mode registry must be constructible"); + let app = build_app(state); + + // Per-graph policy file: only `act-andrew` may `change`. + let beta_policy_path = cfg_dir.path().join("beta-policy.yaml"); + fs::write( + &beta_policy_path, + r#" +version: 1 +groups: + writers: [act-andrew] + readers: [act-bruno] +protected_branches: [] +rules: + - id: writers-change + allow: + actors: { group: writers } + actions: [read, change] + branch_scope: any + - id: readers-read + allow: + actors: { group: readers } + actions: [read] + branch_scope: any +"#, + ) + .unwrap(); + + // POST `beta` with the per-graph policy attached. 
+ let beta_uri = cfg_dir.path().join("beta.omni"); + let req = GraphCreateRequest { + graph_id: "beta".to_string(), + uri: beta_uri.to_string_lossy().to_string(), + schema: GraphSchemaSpec { source: schema }, + policy: Some(omnigraph_server::api::GraphPolicySpec { + file: Some(beta_policy_path.to_string_lossy().to_string()), + }), + }; + let (status, body) = post_graph(&app, &req, Some("andrew-token")).await; + assert_eq!( + status, + StatusCode::CREATED, + "POST /graphs failed: {body}" + ); + + // Authenticated `read` from a reader: 200. + let read_resp = app + .clone() + .oneshot( + Request::builder() + .method(Method::GET) + .uri("/graphs/beta/snapshot?branch=main") + .header("authorization", "Bearer bruno-token") + .body(Body::empty()) + .unwrap(), + ) + .await + .unwrap(); + assert_eq!( + read_resp.status(), + StatusCode::OK, + "act-bruno must be allowed read on beta" + ); + + // Authenticated `change` from the reader (act-bruno) must 403: + // beta-policy allows readers only `read`, not `change`. + let change_body = serde_json::json!({ + "query_source": "query foo() { insert Person { name: \"X\" } }", + "query_name": "foo", + "branch": "main" + }); + let change_resp = app + .oneshot( + Request::builder() + .method(Method::POST) + .uri("/graphs/beta/change") + .header("authorization", "Bearer bruno-token") + .header("content-type", "application/json") + .body(Body::from(serde_json::to_vec(&change_body).unwrap())) + .unwrap(), + ) + .await + .unwrap(); + assert_eq!( + change_resp.status(), + StatusCode::FORBIDDEN, + "per-graph Cedar policy must deny `change` for act-bruno on beta" + ); + } + + /// Concurrent POST /graphs for DISTINCT graph_ids all succeed. + /// The flock + drift detection serializes the YAML rewrite, but + /// all writes are valid and the final YAML lists every graph. + /// (Same-graph_id concurrency is already covered by the + /// `concurrent_insert_same_key_exactly_one_succeeds` registry + /// test plus the YAML drift-detection behavior.) 
+ #[tokio::test(flavor = "multi_thread")] + async fn concurrent_post_graphs_distinct_ids_all_succeed() { + let (cfg_dir, app) = multi_mode_app_with_real_config(&["alpha"]).await; + let schema = fs::read_to_string(fixture("test.pg")).unwrap(); + const N: usize = 4; + + let app = Arc::new(app); + let barrier = Arc::new(tokio::sync::Barrier::new(N)); + let mut tasks = Vec::with_capacity(N); + for i in 0..N { + let app = Arc::clone(&app); + let barrier = Arc::clone(&barrier); + let dir = cfg_dir.path().to_path_buf(); + let schema = schema.clone(); + tasks.push(tokio::spawn(async move { + barrier.wait().await; + let id = format!("graph-{i}"); + let uri = dir.join(format!("{id}.omni")); + let req = GraphCreateRequest { + graph_id: id.clone(), + uri: uri.to_string_lossy().to_string(), + schema: GraphSchemaSpec { source: schema }, + policy: None, + }; + let (status, _) = post_graph(&app, &req, None).await; + (id, status) + })); + } + + let mut succeeded = Vec::new(); + for t in tasks { + let (id, status) = t.await.unwrap(); + assert_eq!( + status, + StatusCode::CREATED, + "POST {id} must succeed under concurrent distinct-id POSTs" + ); + succeeded.push(id); + } + + // Final registry has 1 (alpha) + N (graph-0..N-1) = N+1 graphs. 
+ let resp = (*app) + .clone() + .oneshot( + Request::builder() + .method(Method::GET) + .uri("/graphs") + .body(Body::empty()) + .unwrap(), + ) + .await + .unwrap(); + assert_eq!(resp.status(), StatusCode::OK); + let body = to_bytes(resp.into_body(), usize::MAX).await.unwrap(); + let payload: Value = serde_json::from_slice(&body).unwrap(); + let graph_count = payload["graphs"].as_array().unwrap().len(); + assert_eq!(graph_count, N + 1); + } } diff --git a/crates/omnigraph/Cargo.toml b/crates/omnigraph/Cargo.toml index 1fa3436..c86520c 100644 --- a/crates/omnigraph/Cargo.toml +++ b/crates/omnigraph/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "omnigraph-engine" -version = "0.6.0" +version = "0.7.0" edition = "2024" description = "Runtime engine for the Omnigraph graph database." license = "MIT" @@ -16,8 +16,8 @@ default = [] failpoints = ["dep:fail", "fail/failpoints"] [dependencies] -omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.6.0" } -omnigraph-policy = { path = "../omnigraph-policy", version = "0.6.0" } +omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.0" } +omnigraph-policy = { path = "../omnigraph-policy", version = "0.7.0" } lance = { workspace = true } lance-datafusion = { workspace = true } datafusion = { workspace = true } @@ -51,7 +51,7 @@ chrono = { workspace = true } arc-swap = { workspace = true } [dev-dependencies] -omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.6.0" } +omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.0" } tokio = { workspace = true } lance-namespace-impls = { workspace = true } serial_test = "3" diff --git a/docs/releases/v0.7.0.md b/docs/releases/v0.7.0.md new file mode 100644 index 0000000..11d924d --- /dev/null +++ b/docs/releases/v0.7.0.md @@ -0,0 +1,109 @@ +# Omnigraph v0.7.0 + +Multi-graph server mode (MR-668). 
One `omnigraph-server` process can now serve 1–10 graphs concurrently behind cluster routes (`/graphs/{graph_id}/...`), with per-graph Cedar policy, runtime graph creation via `POST /graphs`, and CLI parity (`omnigraph graphs list/create`). + +## Breaking Changes + +- **Multi-graph deployments lose flat routes.** Single-graph invocation (`omnigraph-server `) is unchanged — same flat `/snapshot`, `/read`, `/branches`, etc. Multi-graph deployments serve those routes under `/graphs/{graph_id}/...`; bare flat paths return 404 in multi mode. +- **`ServerConfig` shape change** (programmatic embedders only): `ServerConfig { uri, policy_file }` is replaced by `ServerConfig { mode: ServerConfigMode }`, where `ServerConfigMode = Single { uri, policy_file } | Multi { graphs, config_path, server_policy_file }`. Callers that use `load_server_settings` are unaffected; callers that construct `ServerConfig` directly need to wrap their fields in `ServerConfigMode::Single`. +- **`AppState::uri()`** now returns `Option<&str>` (was `&str`). Returns `Some` in single mode, `None` in multi mode — per-graph URIs live on `GraphHandle.uri` instead. +- **`AppState::new_multi`** is the new multi-graph constructor. Single-mode `new_*` / `open_*` constructors are unchanged. +- **`AuthenticatedActor(Arc)` → `ResolvedActor { actor_id, tenant_id, scopes, source }`** (programmatic embedders only). The struct shape changes, but the HTTP contract — bearer auth, MR-731 spoof defense — is unchanged. Cluster-mode call sites construct with `tenant_id: None`, `scopes: vec![Scope::Full]`, `source: AuthSource::Static`. Forward-compat for Cloud mode (RFC 0003) and OAuth provider (RFC 0004). + +## New + +- **Multi-graph mode**. Invoke with `omnigraph-server --config omnigraph.yaml` where the YAML has a non-empty `graphs:` map and no single-mode selector (no `server.graph`, no CLI `` or `--target`). At startup the server opens every configured graph in parallel (bounded concurrency, fail-fast). 
+- **`POST /graphs`**. Runtime graph creation. Request body: + ```json + { + "graph_id": "beta", + "uri": "/data/beta.omni", + "schema": { "source": "" }, + "policy": { "file": "./policies/beta.yaml" } + } + ``` + `schema` and `policy` are nested objects — leaves room for future fields without breaking the shape. (Asymmetric with the existing `POST /schema/apply`, which still uses flat `schema_source: String`. A follow-up release may migrate it.) Body limit is 32 MiB. + + The server runs `Omnigraph::init` at the supplied URI, atomically rewrites `omnigraph.yaml` under an exclusive `flock` (via `fs2`) with SHA-256 drift detection, then publishes the handle in the in-memory registry. Returns 201 on success; 409 on duplicate `graph_id` or URI; 503 on YAML drift (operator hand-edited the file between server start and the rewrite). +- **`GET /graphs`**. Lists every registered graph, sorted alphabetically by `graph_id`. Auth-required when bearer tokens are configured; Cedar-gated by `PolicyAction::GraphList` against `Omnigraph::Server::"root"`. Returns 405 in single mode. +- **CLI `omnigraph graphs list/create`**. Mirrors the HTTP surface. Rejects local URI targets with a clear message — these subcommands are for remote multi-graph servers only. +- **Per-graph Cedar policy**. Each entry in the `graphs:` map can carry a `policy.file` path. Loaded at startup or attached at `POST` time. Cedar's `Omnigraph::Graph::""` resource is per-graph; the new `Omnigraph::Server::"root"` resource governs server-level actions. +- **Cedar action vocabulary**: `graph_create` and `graph_list` (server-scoped). `graph_delete` is reserved but not shipped — see "Deferred." +- **YAML drift detection**. Server hashes `omnigraph.yaml` at startup. `POST /graphs` re-hashes the on-disk file under the flock before rewriting; if the hash doesn't match the baseline, the rewrite refuses with 503 to avoid clobbering operator hand-edits. +- **`Omnigraph::init` error-path cleanup**. 
A failed init now best-effort-deletes the schema artifacts (`_schema.pg`, `_schema.ir.json`, `__schema_state.json`). Lance per-type directories created by `GraphCoordinator::init` may still orphan — full recursive cleanup needs a `delete_prefix` substrate primitive, deferred along with `DELETE /graphs/{id}`. +- **`omnigraph-policy` is now a published workspace crate.** The published-crates set is `omnigraph-compiler`, `omnigraph-policy`, `omnigraph-engine`, `omnigraph-server`, `omnigraph-cli`. + +## Configuration + +`omnigraph.yaml` schema additions (all optional, single-mode unaffected): + +```yaml +server: + bind: 0.0.0.0:8080 + policy: + file: ./server-policy.yaml # server-level Cedar (graph_create, graph_list) + +graphs: + alpha: + uri: s3://tenant-bucket/alpha + policy: + file: ./policies/alpha.yaml # per-graph Cedar + beta: + uri: s3://tenant-bucket/beta + # no per-graph policy → engine-layer enforcement is a no-op +``` + +## Deferred + +- **`DELETE /graphs/{id}`**. Cut from v0.7.0 scope to bound complexity (no `delete_prefix` substrate, no tombstones). Operators remove graphs by stopping the server, editing `omnigraph.yaml`, then restarting. +- **`StorageAdapter::delete_prefix`**. The substrate primitive that DELETE would need. Will land alongside DELETE in a future release. +- **`X-Actor-Id` service delegation forwarding**. Needs durable both-actor audit on `_graph_commits.lance` — out of scope. +- **Hot policy reload**. Restart is cheap at N≤10 graphs. + +## User Impact + +- **Existing single-graph deployments upgrade with zero changes.** `omnigraph-server ` with v0.6.0 config keeps working identically. +- **Multi-graph adoption is opt-in.** Add a `graphs:` map to `omnigraph.yaml` (and remove `server.graph`) to switch a deployment to multi mode. +- **Cluster routes are breaking for client SDKs targeting multi mode.** Generated clients from previous v0.6.0 OpenAPI specs will hit 404 on flat paths against a multi-mode server. 
Regenerate against the v0.7.0 `openapi.json`. +- **`fs2 = "0.4"`** is a new dependency for the file locking that powers the atomic YAML rewrite. POSIX-only. Linux / macOS deployment supported; Windows is out of scope. +- **Operator-supplied policy.yaml files don't change.** The Cedar `Omnigraph::Graph` and `Omnigraph::Server` entities are internally generated by `compile_policy_source` — operator YAML only references actions and groups. + +## Migration: single → multi + +```yaml +# Before (v0.6.0 single-mode invocation) +server: + graph: my-graph +graphs: + my-graph: + uri: /var/lib/omnigraph/my-graph +policy: + file: ./policy.yaml +``` + +```yaml +# After (v0.7.0 multi-mode — drop `server.graph` and the top-level `policy`) +server: + policy: + file: ./server-policy.yaml # NEW: governs POST/GET /graphs +graphs: + my-graph: + uri: /var/lib/omnigraph/my-graph + policy: + file: ./policy.yaml # MOVED: was top-level +``` + +Same `omnigraph.yaml` file; restart the server. Clients targeting the old flat routes (`/snapshot`, `/read`, …) must update to `/graphs/my-graph/snapshot`, etc. + +## Test coverage + +v0.7.0 ships ~280 new tests covering MR-668 specifically: + +- `GraphId` newtype validation, registry race tests (PR 3), init failpoints (PR 2a). +- Mode-inference four-rule matrix (PR 5), parallel multi-graph startup, cluster routing. +- Cedar `Server` resource refactor, backwards-compat for graph-only policies. +- `POST /graphs` happy path + duplicate graph_id + duplicate URI + YAML drift detection + 405-in-single-mode. +- Composite lifecycle: POST a graph, query it via cluster route, reload config from disk, confirm persistence. +- Per-graph Cedar policy enforced for a POST-created graph (engine-layer enforcement is re-applied via `Omnigraph::with_policy`). +- Concurrent distinct-id POSTs serialize correctly through the flock without spurious drift errors. +- MR-731 spoof regression test stays green across the entire refactor. 
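Editorial note on the v0.7.0 migration section: the route change for clients can be sketched as a small path rewrite. This is a minimal illustration only; `cluster_path` is a hypothetical helper that does not appear in the OmniGraph codebase, and it assumes that a flat single-mode path maps to multi mode simply by prefixing `/graphs/{graph_id}`, as the release notes describe.

```rust
/// Hypothetical helper (not part of OmniGraph): maps a flat single-mode
/// route such as `/snapshot` to its multi-mode cluster route.
fn cluster_path(graph_id: &str, flat_path: &str) -> String {
    // Accept flat paths written with or without a leading slash.
    let tail = flat_path.trim_start_matches('/');
    format!("/graphs/{graph_id}/{tail}")
}
```

A client that previously requested `/snapshot?branch=main` would, under this assumption, request `cluster_path("my-graph", "/snapshot?branch=main")` against a multi-mode server.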
diff --git a/docs/user/cli.md b/docs/user/cli.md index 743c284..da0127a 100644 --- a/docs/user/cli.md +++ b/docs/user/cli.md @@ -44,6 +44,33 @@ omnigraph read \ If the server requires auth, set `OMNIGRAPH_SERVER_BEARER_TOKEN` on the server and configure the matching `bearer_token_env` in `omnigraph.yaml`. +## Multi-graph servers (v0.7.0+) + +Against a multi-graph server (started with `--config omnigraph.yaml` referencing a non-empty `graphs:` map), use `omnigraph graphs` to enumerate and create graphs: + +```bash +# List +omnigraph graphs list --target http://server.example.com --json + +# Create +omnigraph graphs create \ + --target http://server.example.com \ + --graph-id beta \ + --graph-uri /data/beta.omni \ + --schema schema.pg \ + --policy-file ./policies/beta.yaml # optional +``` + +The CLI reads `--schema` from the local disk and inlines the contents as `schema.source` in the request body. Both subcommands reject local URI targets — they're for remote multi-graph servers only. + +`omnigraph graphs delete` is **not** in v0.7.0. To remove a graph, stop the server, edit `omnigraph.yaml`, restart. + +Per-graph URLs: once a graph exists, hit its cluster route from any subcommand by pointing `--uri` at it: + +```bash +omnigraph read --uri http://server.example.com/graphs/beta --query ./q.gq ... +``` + ## Runs, Policy, And Diagnostics ```bash diff --git a/docs/user/policy.md b/docs/user/policy.md index b121213..946092c 100644 --- a/docs/user/policy.md +++ b/docs/user/policy.md @@ -4,6 +4,8 @@ OmniGraph integrates AWS Cedar (`cedar-policy = 4.9`) for ABAC. ## Policy actions +Per-graph actions (bind to `Omnigraph::Graph::""`): + 1. `read` — query / snapshot / list branches & commits 2. `export` — NDJSON export 3. `change` — mutations @@ -13,12 +15,53 @@ OmniGraph integrates AWS Cedar (`cedar-policy = 4.9`) for ABAC. 7. `branch_merge` 8. `admin` — reserved for policy-management surfaces (hot reload, audit log, approvals). 
No call site today; see MR-724 for the reservation rationale. +Server-scoped actions (v0.7.0+; bind to `Omnigraph::Server::"root"`): + +9. `graph_create` — `POST /graphs` runtime graph creation (multi-graph mode) +10. `graph_list` — `GET /graphs` registry enumeration (multi-graph mode) + +Server-scoped actions cannot use `branch_scope` or `target_branch_scope` — they operate on the registry, not on a graph's branches. A rule cannot mix server-scoped and per-graph actions; split into separate rules. (`graph_delete` is reserved but not shipped in v0.7.0.) + ## Scope kinds - `branch_scope` — applied to source branch (`read`, `export`, `change`) - `target_branch_scope` — applied to destination (`schema_apply`, branch ops, run ops) - `protected_branches` — named list with special rules; rule scopes are `any | protected | unprotected` +## Per-graph vs. server-level policy (multi-graph mode) + +In multi mode (`omnigraph.yaml` with a non-empty `graphs:` map), policy files attach at two levels: + +```yaml +server: + policy: + file: ./server-policy.yaml # server-level: graph_create, graph_list + +graphs: + alpha: + uri: s3://tenant-bucket/alpha + policy: + file: ./policies/alpha.yaml # per-graph: read, change, branch_*, schema_apply + beta: + uri: s3://tenant-bucket/beta + # no per-graph policy → no engine-layer Cedar enforcement on beta +``` + +Each graph's HTTP request flows through its own per-graph policy. Management endpoints (`/graphs`) flow through the server-level policy. When `server.policy.file` is unset and bearer tokens are configured, `GET /graphs` falls through to MR-723 default-deny (only `read`-equivalent actions allowed for authenticated actors — and `graph_list` is not `read`) → 403. So the operator must explicitly authorize via `server-policy.yaml` to expose `/graphs`. 
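The constraints on server-scoped actions stated under "Policy actions" can be checked mechanically before a policy file is deployed. The sketch below is one way to express them. The `Rule` struct, the `RuleError` enum, and `check_rule` are hypothetical names for illustration, not the `omnigraph-policy` API; the action names and the two constraints are taken from this page.

```rust
/// The per-graph and server-scoped actions named in this document.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Action {
    Read,
    Export,
    Change,
    SchemaApply,
    BranchCreate,
    BranchDelete,
    BranchMerge,
    Admin,
    GraphCreate,
    GraphList,
}

impl Action {
    /// `graph_create` and `graph_list` bind to the server, not to a graph.
    fn is_server_scoped(self) -> bool {
        matches!(self, Action::GraphCreate | Action::GraphList)
    }
}

/// Hypothetical, simplified view of one rule in a policy YAML file.
struct Rule {
    actions: Vec<Action>,
    has_branch_scope: bool,
    has_target_branch_scope: bool,
}

#[derive(Debug, PartialEq)]
enum RuleError {
    /// The rule lists both server-scoped and per-graph actions.
    MixedLevels,
    /// A server-scoped rule carries `branch_scope` or `target_branch_scope`.
    ScopeOnServerAction,
}

fn check_rule(rule: &Rule) -> Result<(), RuleError> {
    let server_count = rule.actions.iter().filter(|a| a.is_server_scoped()).count();
    if server_count == 0 {
        return Ok(()); // purely per-graph rule: branch scopes are allowed
    }
    if server_count != rule.actions.len() {
        return Err(RuleError::MixedLevels);
    }
    if rule.has_branch_scope || rule.has_target_branch_scope {
        return Err(RuleError::ScopeOnServerAction);
    }
    Ok(())
}
```

A rule listing only `graph_create` and `graph_list` with no branch scopes passes this check; adding `read` to the same rule, or attaching a `branch_scope`, does not.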
+ +Example server-level policy: + +```yaml +version: 1 +groups: + admins: [act-andrew] +rules: + - id: admins-can-create-and-list-graphs + allow: + actors: { group: admins } + actions: [graph_create, graph_list] +``` + ## Configuration `omnigraph.yaml`: diff --git a/docs/user/server.md b/docs/user/server.md index 0c4fcbd..a9a86f5 100644 --- a/docs/user/server.md +++ b/docs/user/server.md @@ -1,26 +1,80 @@ # HTTP Server (`omnigraph-server`) -Axum 0.8 + tokio + utoipa-generated OpenAPI. Single graph per process; deploy multiple processes for multi-tenant. +Axum 0.8 + tokio + utoipa-generated OpenAPI. **Two modes** (v0.7.0+): single-graph (legacy) and multi-graph (MR-668). Mode is inferred from CLI args + config shape. + +## Modes + +### Single-graph mode (legacy) + +`omnigraph-server ` or `omnigraph-server --target --config omnigraph.yaml`. Routes are flat — `/snapshot`, `/read`, `/branches`, etc. Behavior unchanged from v0.6.0. + +### Multi-graph mode (v0.7.0+) + +`omnigraph-server --config omnigraph.yaml` with a non-empty `graphs:` map and **no** single-mode selector (no `server.graph`, no ``, no `--target`). The server opens every configured graph in parallel at startup (bounded concurrency = 4, fail-fast on the first open error). Routes are nested under `/graphs/{graph_id}/...`. Bare flat paths return 404 in multi mode. + +Mode inference (four-rule matrix): + +1. CLI positional `` → single +2. CLI `--target ` → single +3. `server.graph` in config → single +4. `--config` + non-empty `graphs:` + no single-mode selector → **multi** +5. 
otherwise → error with migration hint ## Endpoint inventory +Per-graph endpoints — same body shape across modes; URLs differ: + +| Method | Single-mode path | Multi-mode path | Auth | Action | Handler | +|---|---|---|---|---|---| +| GET | `/healthz` | `/healthz` | none | — | `server_health` | +| GET | `/openapi.json` | `/openapi.json` | none | — | `server_openapi` (strips security if auth disabled; in multi mode emits cluster paths with `cluster_` operation-id prefix) | +| GET | `/snapshot?branch=` | `/graphs/{id}/snapshot?branch=` | bearer + `read` | snapshot of branch | `server_snapshot` | +| POST | `/read` | `/graphs/{id}/read` | bearer + `read` | run named query | `server_read` | +| POST | `/export` | `/graphs/{id}/export` | bearer + `export` | NDJSON stream | `server_export` | +| POST | `/change` | `/graphs/{id}/change` | bearer + `change` | mutation | `server_change` | +| GET | `/schema` | `/graphs/{id}/schema` | bearer + `read` | get current `.pg` source | `server_schema_get` | +| POST | `/schema/apply` | `/graphs/{id}/schema/apply` | bearer + `schema_apply` (target=`main`) | migrate | `server_schema_apply` | +| POST | `/ingest` | `/graphs/{id}/ingest` | bearer + `branch_create` (if new) + `change` | bulk load | `server_ingest` (32 MB body limit) | +| GET | `/branches` | `/graphs/{id}/branches` | bearer + `read` | list branches | `server_branch_list` | +| POST | `/branches` | `/graphs/{id}/branches` | bearer + `branch_create` | create | `server_branch_create` | +| DELETE | `/branches/{branch}` | `/graphs/{id}/branches/{branch}` | bearer + `branch_delete` | delete | `server_branch_delete` | +| POST | `/branches/merge` | `/graphs/{id}/branches/merge` | bearer + `branch_merge` | merge `source → target` | `server_branch_merge` | +| GET | `/commits?branch=` | `/graphs/{id}/commits?branch=` | bearer + `read` | list | `server_commit_list` | +| GET | `/commits/{commit_id}` | `/graphs/{id}/commits/{commit_id}` | bearer + `read` | show | `server_commit_show` | + 
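As a rough illustration of the path mapping in the table above, the sketch below rewrites a single-mode path into its multi-mode equivalent. The helper name `cluster_path` is hypothetical and not part of OmniGraph; it also assumes the `graph_id` has already been validated. Only the `/graphs/{id}/...` prefix and the two unprefixed endpoints come from the table.

```rust
/// Hypothetical client-side helper (not an OmniGraph API): maps a flat
/// single-mode path to the multi-mode path shown in the endpoint table.
fn cluster_path(graph_id: &str, flat_path: &str) -> String {
    // Per the table, these two endpoints keep the same path in both modes.
    const UNPREFIXED: [&str; 2] = ["/healthz", "/openapi.json"];

    // Compare without the query string so a query on these paths still passes through.
    let route = flat_path.split('?').next().unwrap_or(flat_path);
    if UNPREFIXED.contains(&route) {
        return flat_path.to_string();
    }

    // Every other endpoint nests under `/graphs/{graph_id}`; the query string is kept as-is.
    format!("/graphs/{graph_id}{flat_path}")
}
```

A client migrating from a v0.6.0 deployment could apply a mapping like this to its existing flat paths instead of hard-coding the new URLs, although regenerating the client from the v0.7.0 `openapi.json` remains the documented route.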
+Server-level management endpoints (v0.7.0+): + | Method | Path | Auth | Action | Handler | |---|---|---|---|---| -| GET | `/healthz` | none | — | `server_health` | -| GET | `/openapi.json` | none | — | `server_openapi` (strips security if auth disabled) | -| GET | `/snapshot?branch=` | bearer + `read` | snapshot of branch | `server_snapshot` | -| POST | `/read` | bearer + `read` | run named query | `server_read` | -| POST | `/export` | bearer + `export` | NDJSON stream | `server_export` | -| POST | `/change` | bearer + `change` | mutation | `server_change` | -| GET | `/schema` | bearer + `read` | get current `.pg` source | `server_schema_get` | -| POST | `/schema/apply` | bearer + `schema_apply` (target=`main`) | migrate | `server_schema_apply` | -| POST | `/ingest` | bearer + `branch_create` (if new) + `change` | bulk load | `server_ingest` (32 MB body limit) | -| GET | `/branches` | bearer + `read` | list branches | `server_branch_list` | -| POST | `/branches` | bearer + `branch_create` | create | `server_branch_create` | -| DELETE | `/branches/{branch}` | bearer + `branch_delete` | delete | `server_branch_delete` | -| POST | `/branches/merge` | bearer + `branch_merge` | merge `source → target` | `server_branch_merge` | -| GET | `/commits?branch=` | bearer + `read` | list | `server_commit_list` | -| GET | `/commits/{commit_id}` | bearer + `read` | show | `server_commit_show` | +| GET | `/graphs` | bearer + `graph_list` on `Server::"root"` | list registered graphs | `server_graphs_list` (405 in single mode) | +| POST | `/graphs` | bearer + `graph_create` on `Server::"root"` | create new graph at runtime | `server_graphs_create` (405 in single mode, 32 MB body limit) | + +`DELETE /graphs/{id}` is **not** in v0.7.0. Operators remove graphs by stopping the server, editing `omnigraph.yaml`, then restarting. + +## `omnigraph.yaml` ownership (multi mode) + +The server owns `omnigraph.yaml` while running. 
`POST /graphs` rewrites the file atomically under an exclusive `fcntl::flock` with SHA-256 drift detection: + +- The server hashes the file at startup. `POST /graphs` re-hashes under the flock before rewriting. If the hash doesn't match (operator hand-edited), the rewrite refuses with 503. +- Comments and blank-line structure are **not** preserved across server-side rewrites — the file is regenerated via `serde_yaml::to_string`. +- Operators must not edit the file while the server is running. To make offline changes: stop the server, edit, restart. + +In **single mode** the server never writes `omnigraph.yaml`. + +## `POST /graphs` body shape + +```json +{ + "graph_id": "alpha", + "uri": "s3://tenant-bucket/alpha", + "schema": { "source": "" }, + "policy": { "file": "./policies/alpha.yaml" } +} +``` + +- `schema` and `policy` are nested — leaves room for future fields without breaking the shape. +- `policy` is optional; without it, no per-graph Cedar enforcement. +- Status codes: 201 Created · 400 invalid body · 401 missing bearer · 403 Cedar denied · 405 single mode · 409 duplicate `graph_id` or `uri` · 413 body >32 MiB · 500 init or rewrite failure · 503 YAML drift. ## Streaming diff --git a/openapi.json b/openapi.json index 0bb9ec5..5d326fe 100644 --- a/openapi.json +++ b/openapi.json @@ -7,7 +7,7 @@ "name": "MIT", "identifier": "MIT" }, - "version": "0.6.0" + "version": "0.7.0" }, "paths": { "/branches": {