omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Author	SHA1	Message	Date
Andrew Altshuler	aadfa11ecb	schema: HTTP allow_data_loss exposure + e2e drop coverage (MR-694 follow-up) (#107) The schema-lint chassis v1.2 (PR #100) shipped `--allow-data-loss` on the CLI, but `SchemaApplyRequest` had no equivalent field — Hard-mode drops were CLI-only. This commit closes that feature gap and adds e2e test coverage for drop modes across HTTP + CLI, plus data preservation on additive apply, plus a CLI↔SDK plan-parity assertion. Feature gap closed: - `crates/omnigraph-server/src/api.rs` — added `allow_data_loss: bool` (default false via `#[serde(default)]`) to `SchemaApplyRequest`. Added `Default` derive so test usages can use `..Default::default()`. - `crates/omnigraph-server/src/lib.rs` — `server_schema_apply` now constructs `SchemaApplyOptions { allow_data_loss: request.allow_data_loss }` and threads through to `apply_schema_as`. - `crates/omnigraph-cli/src/main.rs` — remote-URI schema-apply path used to bail with "--allow-data-loss not yet supported on remote"; now forwards the flag into the JSON payload so the CLI behaves identically against local and remote URIs. - `openapi.json` — regenerated; only diff is the new field on `SchemaApplyRequest`. 
Tests added (8 new): * `crates/omnigraph-server/tests/server.rs` (+5): - `schema_apply_route_soft_drops_property_via_http` — POST schema removing nullable property, verify catalog reflects the drop AND `snapshot_at_version(pre)` still has `age` in the field list (time-travel reachability is the Soft contract). - `schema_apply_route_soft_drops_node_type_via_http` — POST schema removing `Company` node + cascading `WorksAt` edge. - `schema_apply_route_hard_drops_property_with_allow_data_loss` — POST with `allow_data_loss: true`, verify plan step reports `mode: hard`. - `schema_apply_route_keeps_drops_soft_without_flag` — same schema without flag, verify `mode: soft`. Pins default semantics against accidental Hard promotion. - `schema_apply_route_additive_property_preserves_existing_rows` — load fixture, POST adding nullable property, verify row count preserved (SDK suite covers data preservation on drops + renames; additive AddProperty wasn't pinned). Plus helpers `schema_without_age` and `schema_without_company`. * `crates/omnigraph-cli/tests/cli.rs` (+3): - `schema_apply_allow_data_loss_flag_promotes_drops_to_hard` — CLI `omnigraph schema apply --allow-data-loss --schema X.pg --json`, verify plan step has `mode: hard`. - `schema_apply_without_allow_data_loss_keeps_soft_drops` — without flag, verify Soft. - `schema_plan_parity_cli_and_sdk` — same `.pg` source through `Omnigraph::plan_schema` (SDK) and `omnigraph schema plan --json` (CLI), assert the steps array is byte-identical post-JSON. HTTP has no `/schema/plan` endpoint; apply-side parity is implicitly covered by the HTTP drop tests + CLI drop tests using identical fixtures. Docs: - `docs/user/schema-language.md` — new "Destructive drops" section documenting Soft vs Hard semantics and that `allow_data_loss` is now honored uniformly across CLI / HTTP / SDK. Verification: every new test passes; full `cargo test --workspace --locked` green; `scripts/check-agents-md.sh` passes. 
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 01:56:46 +03:00
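The default semantics pinned by the drop tests in aadfa11ecb (Soft unless `allow_data_loss` is set) can be illustrated with a small sketch. `DropMode`, `ApplyOptions`, and `drop_mode_for` are hypothetical names chosen for this example; the commit only confirms the `allow_data_loss` field, its `false` default, and the `mode: soft` / `mode: hard` values reported on plan steps.

```rust
/// Hypothetical stand-in for the mode reported on a drop plan step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DropMode {
    /// Default: the dropped data stays reachable through earlier snapshots.
    Soft,
    /// Chosen only when the caller explicitly opts in to data loss.
    Hard,
}

/// Hypothetical stand-in for the apply options. `Default` yields
/// `allow_data_loss = false`, so omitting the field keeps drops Soft.
#[derive(Debug, Default, Clone, Copy)]
pub struct ApplyOptions {
    pub allow_data_loss: bool,
}

/// Maps the opt-in flag to the drop mode (assumed to be a pure mapping).
pub fn drop_mode_for(options: ApplyOptions) -> DropMode {
    if options.allow_data_loss {
        DropMode::Hard
    } else {
        DropMode::Soft
    }
}
```

Keeping the mapping a pure function of the options is one way to make "no flag means Soft" testable without touching storage.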
Andrew Altshuler	e8fec2fa0f	tests: policy chassis e2e gap-fills (MR-722 follow-up) (#106) * tests: policy chassis e2e gap-fills (MR-722 follow-up) Audit after PRs #101-105 surfaced real e2e gaps in the policy chassis that could let regressions ride through silently. Coverage was strong at the SDK level (18 chassis tests) and reasonable at HTTP (12+ policy tests), but the CLI×writer matrix was asymmetric (only `change` tested end-to-end), the `cli.actor` config-only precedence path was untested, the `OMNIGRAPH_UNAUTHENTICATED` env-var read path was unexercised, `serve()`'s startup-refusal propagation was structural-review only, and engine↔HTTP decision parity was a structural property without a test pinning it. This commit closes those gaps. Added (15 new tests, all test-only): * `policy_engine_chassis.rs` (+2): `load_file_as` allow + deny pair — PR #104 added the actor-aware mirror of `load_file` but it was only exercised via CLI integration; this is direct-SDK coverage. * `omnigraph-server/src/lib.rs` mod tests (+2): - `unauthenticated_env_var_classification` — consolidated single test (process-global env var; running parallel would race) that pins truthy values, falsy values, unset, and CLI-flag-overrides-env behavior of the `OMNIGRAPH_UNAUTHENTICATED` read path inside `load_server_settings`. - `serve_refuses_to_start_in_state_1_without_unauthenticated` — `#[serial]` integration test. 
Clears all bearer-token env vars, builds a `ServerConfig` with no policy file and no flag, calls `serve(config).await`, asserts Err before any side-effecting work (Lance dataset open, TcpListener::bind). Guards the classifier→serve propagation path so a future refactor that drops the call turns red. * `omnigraph-server/tests/server.rs` (+4): `policy_decision_parity_` — four cases (Change×allowed+denied, BranchMerge×allowed+denied). Each case runs the same Cedar decision via both SDK (`Omnigraph::with_policy().mutate_as` / `branch_merge_as`) and HTTP (`POST /change` / `POST /branches/merge`) and asserts both either Allow or Deny. The structural property (both paths call `PolicyChecker::check`) is now test-asserted. `omnigraph-cli/tests/system_local.rs` (+8): the CLI×writer matrix fan-out: - `local_cli_load_enforces_engine_layer_policy` - `local_cli_ingest_enforces_engine_layer_policy` - `local_cli_schema_apply_enforces_engine_layer_policy` - `local_cli_branch_create_enforces_engine_layer_policy` - `local_cli_branch_delete_enforces_engine_layer_policy` - `local_cli_branch_merge_enforces_engine_layer_policy` Each: one denied case (`--as act-bruno` against protected main) + one allowed case (`--as act-ragnor` via existing/extended admins-* rules). Plus: - `local_cli_actor_from_config_used_when_no_flag` — proves the config-only precedence path works. - `local_cli_actor_flag_overrides_config_actor` — proves the `--as` flag wins over `cli.actor` in the config. Adds `local_policy_config_with_actor` helper. Extends `POLICY_E2E_YAML` with `admins-branch-ops` (BranchCreate + BranchDelete) and `admins-schema-apply` rules so the CLI×writer matrix has positive-case rule coverage. Verification: all new tests pass; full `cargo test --workspace --locked` is green; `scripts/check-agents-md.sh` passes. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * tests: serialize env-touching server lib tests to fix CI flake CI flake on PR #106's Test Workspace job: two of the new tests (`serve_refuses_to_start_in_state_1_without_unauthenticated` and `unauthenticated_env_var_classification`) raced against `server_bearer_tokens_from_env_reads_legacy_token_and_token_file`, which sets `OMNIGRAPH_SERVER_BEARER_TOKEN` via `EnvGuard`. While `serve_refuses` was mid-execution with its EnvGuard cleared, the bearer-token test's EnvGuard had `OMNIGRAPH_SERVER_BEARER_TOKEN` set; `resolve_token_source()` saw it and classified the runtime state as `DefaultDeny` rather than refusing — so the test panicked with "Dataset at path X not found" instead of the expected refusal message. The unauthenticated test had the symmetric failure: its `OMNIGRAPH_UNAUTHENTICATED="anything"` got overwritten by a peer `EnvGuard` drop. Fix: mark every test that uses `EnvGuard` with `#[serial]` so they serialize against each other (default key). Already on `serve_refuses_to_start_in_state_1_without_unauthenticated`; added to `unauthenticated_env_var_classification` and `server_bearer_tokens_from_env_reads_legacy_token_and_token_file`. The `parse_bearer_tokens_json_*` tests don't touch env vars and stay parallel. Locally green (36 tests pass on my workstation); the parallelism issue is CI-runner-specific (more aggressive thread interleaving) but the fix is universal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 22:25:04 +03:00
Andrew Altshuler	f3f2a051ba	policy: server 3-state default-deny matrix (MR-723) (#105) Closes the "tokens but no policy" trap. Pre-MR-723, an operator who configured bearer tokens and forgot to set policy.file got a server that required auth and then permitted every action — the illusion of protection. After MR-723, that configuration is default-deny: only `read` actions succeed; every other action returns HTTP 403. Three startup states, classified deterministically: - Open — no tokens, no policy. Requires explicit `--unauthenticated` flag or `OMNIGRAPH_UNAUTHENTICATED=1`; otherwise `serve()` refuses to start. Forces the operator to opt in to "fully open dev mode" so it can't happen accidentally. - DefaultDeny — tokens configured, no policy. `authorize_request` rejects every action except `Read` with 403. The warn-log on startup names the misconfiguration explicitly. - PolicyEnabled — policy file configured. Cedar evaluates every request, unchanged from pre-MR-723. What landed: - `ServerConfig.allow_unauthenticated: bool` + `--unauthenticated` flag on the `omnigraph-server` bin + `OMNIGRAPH_UNAUTHENTICATED` env var (`load_server_settings` honors both). - New `classify_server_runtime_state(has_tokens, has_policy, allow_unauthenticated) -> Result<ServerRuntimeState>` pure function. `serve()` calls it before opening the engine and bails with a clear error when the operator hits the no-tokens-no-policy-no-flag cell. - `authorize_request` state-2 branch: when `policy_engine()` is None but the bearer-auth middleware delivered an authenticated actor, any action other than `Read` returns 403 with a message that names the misconfiguration. - `AppState::with_policy_engine(self, engine)` builder method so integration tests that need a custom workload (`new_with_workload`) can still install a permit-all policy without a new constructor. 
- `app_for_loaded_repo_with_auth(token)` and `app_for_loaded_repo_with_auth_tokens(tokens)` test helpers now install a permit-all policy alongside tokens — they previously represented the "tokens but no policy" state that MR-723 makes default-deny, and tests that don't care about policy were inadvertently coupled to the loophole. Tests: - `classify_` unit tests (3) — every cell of the matrix. - `default_deny_mode_allows_read_for_authenticated_actor` — GET /snapshot succeeds with bearer token + no policy. - `default_deny_mode_rejects_change_with_forbidden` — POST /change rejected with 403 + "default-deny" message. - `default_deny_mode_rejects_schema_apply_with_forbidden` — POST /schema/apply rejected with 403 + "default-deny" message. - New `app_for_repo_with_auth_tokens_only(schema, tokens)` helper builds the State-2 fixture without policy. The pre-MR-723 helpers `app_for_loaded_repo_with_auth` shift semantics to "tokens + permit-all" so existing tests retain their original intent. docs/user/policy.md: new "Server runtime states (MR-723)" section documents the matrix and the explicit `--unauthenticated` opt-in. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 17:02:26 +03:00
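The three-state matrix described in f3f2a051ba can be sketched as a pure function. This is a minimal illustration, not the server's code: the error type is simplified to `String` (the commit only shows `Result<ServerRuntimeState>`), and treating "policy configured" as `PolicyEnabled` regardless of tokens is an assumption read off the state descriptions.

```rust
/// The three startup states named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServerRuntimeState {
    /// No tokens, no policy, explicit opt-in given.
    Open,
    /// Tokens configured, no policy: only reads are permitted.
    DefaultDeny,
    /// Policy file configured: every request is evaluated by policy.
    PolicyEnabled,
}

/// Classifies the startup configuration. The `String` error is a
/// simplification for this sketch.
pub fn classify_server_runtime_state(
    has_tokens: bool,
    has_policy: bool,
    allow_unauthenticated: bool,
) -> Result<ServerRuntimeState, String> {
    if has_policy {
        // Assumption: a configured policy wins whether or not tokens exist.
        return Ok(ServerRuntimeState::PolicyEnabled);
    }
    if has_tokens {
        return Ok(ServerRuntimeState::DefaultDeny);
    }
    if allow_unauthenticated {
        Ok(ServerRuntimeState::Open)
    } else {
        // The no-tokens, no-policy, no-flag cell: refuse to start.
        Err("no tokens and no policy configured; pass --unauthenticated to run open".to_string())
    }
}
```

Because the function has no side effects, each cell of the matrix can be unit-tested before any dataset or listener is opened.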
Andrew Altshuler	a275306a15	policy: CLI policy injection — local writes go through engine enforce (MR-722) (#104) Closes the CLI side of the policy chassis fan-out. Before this commit, CLI direct-engine writes bypassed Cedar entirely because the CLI never called `Omnigraph::with_policy(...)` for non-`policy validate|test|explain` subcommands. After this commit, every CLI direct-engine writer (change, load, ingest, branch create/delete/merge, schema apply) opens the engine via a new `open_local_db_with_policy(uri, &config)` helper that installs the configured `PolicyEngine` when `policy.file` is set, and threads the resolved actor through to the `_as` writer methods. Actor identity resolution: - New top-level `--as <ACTOR>` global flag on the CLI overrides config. - New `cli.actor` field in `omnigraph.yaml` provides a default actor. - Precedence: `--as` > `cli.actor` > None. - When policy is configured and neither is set, the engine-layer footgun guard fires and the write is denied — silent bypass via "I forgot the actor" is exactly what the guard prevents. - Remote HTTP writes ignore both — bearer-token-resolved server-side. Helpers added in main.rs: - `open_local_db_with_policy(uri, &config) -> Result<Omnigraph>` — opens the DB and installs the PolicyEngine when configured. Without policy this is identical to a bare `Omnigraph::open`. - `resolve_cli_actor(cli_as, &config) -> Option<&str>` — implements the flag > config > None precedence. Engine: added `load_file_as` to the loader as the actor-aware mirror of `load_file`, so CLI file-path loads flow through the same enforce gate as in-memory `load_as` calls. Test rewrite: `local_cli_policy_tooling_is_end_to_end_while_local_writes_stay_unenforced` was the explicit assertion of the pre-chassis hole. Renamed and split: - `local_cli_policy_tooling_is_end_to_end` — sanity for the read-only policy CLI surfaces (validate/test/explain), unchanged behavior. 
- `local_cli_change_enforces_engine_layer_policy` — the new assertion: policy installed + no actor → footgun-guard denial; `--as act-bruno` on protected main → Cedar denial; `--as act-ragnor` (admins-write rule) on main → permit, write committed. POLICY_E2E_YAML gains an `admins-write` rule so the permit case has a non-trivial actor to exercise. docs/user/policy.md updated with `cli.actor` + `--as <ACTOR>` usage. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 04:06:21 +03:00
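The `--as` > `cli.actor` > None precedence from a275306a15 can be sketched in a few lines. `CliConfig` here is a hypothetical stand-in for the parsed `omnigraph.yaml`; only the function name, its `Option<&str>` return, and the precedence order come from the commit message.

```rust
/// Hypothetical stand-in for the parsed config; only the `cli.actor`
/// field matters for this sketch.
pub struct CliConfig {
    pub actor: Option<String>,
}

/// Flag wins over config; with neither set, no actor is resolved.
pub fn resolve_cli_actor<'a>(cli_as: Option<&'a str>, config: &'a CliConfig) -> Option<&'a str> {
    cli_as.or(config.actor.as_deref())
}
```

Returning `None` instead of inventing a default actor is what lets the engine-layer guard deny a write when policy is installed but no actor was supplied.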
Andrew Altshuler	da42beec41	policy: chassis fan-out — _as variants on the remaining 6 writers (MR-722) (#103) PR #102 wired apply_schema_as. This PR completes the chassis-side coverage so every public mutating engine entry point hits the same Omnigraph::enforce(action, scope, actor) gate regardless of transport: - mutate_as → enforce(Change, Branch(branch), actor) - load_as → enforce(Change, Branch(branch), actor) - ingest_as → enforce(Change, Branch(branch), actor); also threads actor through the implicit branch_create_from_as so fresh-branch ingest correctly hits BranchCreate too - branch_create_as → enforce(BranchCreate, TargetBranch(name), actor) - branch_create_from_as → enforce(BranchCreate, BranchTransition { source, target }, actor) - branch_delete_as → enforce(BranchDelete, TargetBranch(name), actor) - branch_merge_as → enforce(BranchMerge, BranchTransition { source, target }, actor) Three new _as variants for branch ops (create, create_from, delete) that had no actor surface before; existing actor-less variants delegate with actor=None so the no-policy path is a strict no-op. HTTP handlers updated to thread the resolved actor into the new _as variants for branch_create and branch_delete (was previously dropped). 14 new SDK chassis tests (one allow + one deny pair per wired writer); the existing 4 apply_schema_as tests stay. All 18 pass. docs/user/policy.md updated to describe engine-wide enforcement and the coarse-vs-fine layer split (engine = action gate, query layer per-row = MR-725 future). AGENTS.md capability matrix updated to match. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 03:38:18 +03:00
Andrew Altshuler	9973683261	policy: chassis core — omnigraph-policy crate + Omnigraph::enforce() (MR-722) (#102) PR #2 of the policy chassis series (PR #1 = MR-731, merged in #101). The structural fix that moves Cedar enforcement from HTTP-only to engine-wide. apply_schema is the proof-of-concept writer; PR #3 fans the enforce() call out to the remaining six (mutate_as, load, ingest_as, branch_create_from, branch_delete, branch_merge). ## What lands ### New crate: omnigraph-policy The 844-line policy.rs moves from `omnigraph-server` into a new `omnigraph-policy` workspace crate so both engine and server can depend on it. Cedar dependency moves with it. The server's policy.rs becomes a re-export shim (`pub use omnigraph_policy::`) so existing `omnigraph_server::PolicyAction` etc. paths keep working — CLI and test consumers don't have to migrate in one go. ### New trait: PolicyChecker
```rust
pub trait PolicyChecker: Send + Sync {
    fn check(&self, action: PolicyAction, scope: &ResourceScope, actor: &str) -> Result<(), PolicyError>;
}
```
`PolicyEngine` (Cedar-backed) implements it. `Omnigraph::with_policy()` takes `Arc<dyn PolicyChecker>`. Engine tests mock the trait without spinning up Cedar. MR-725 will extend the trait with `predicate_for()` for query-layer pushdown — additive, no call-site changes. ### New enum: ResourceScope Four variants — Graph, Branch, TargetBranch, BranchTransition — mapping cleanly to today's `(branch, target_branch)` shape on PolicyRequest via `to_branch_pair()`. Each engine writer picks the variant that matches the existing HTTP-layer convention so engine and HTTP evaluate the same Cedar decision. Invariant: ResourceScope stays at branch granularity. Per-type and per-row scope are MR-725's territory, not engine-layer's. Adding Type/Row variants here creates two places per-type policy can be evaluated, which can drift. See chassis design refinements comment on MR-722 (2026-05-17). 
### Omnigraph::with_policy() + enforce() New `policy: Option<Arc<dyn PolicyChecker>>` field on Omnigraph, None by default (preserves embedded/dev no-enforcement mode). * `with_policy(self, checker)` setter — builder-style, consumes self. * `enforce(action, scope, actor)` — the gate. When policy is None, no-op. When policy is Some AND actor is None, hard error — silent bypass via "I forgot the actor" is exactly the footgun this gate is here to prevent. ### apply_schema_as: first writer wired * New public method `apply_schema_as(source, options, actor)` that calls `enforce(SchemaApply, TargetBranch("main"), actor)` before acquiring the schema-apply lock or doing any other work. * Existing `apply_schema(source)` and `apply_schema_with_options(...)` delegate to it with actor=None (no-actor variants). * HTTP handler `server_schema_apply` updated to call apply_schema_as with the resolved actor. AppState construction injects the PolicyEngine into Omnigraph via `with_policy`. HTTP-layer authorize_request still fires first; the engine gate is the redundant-but-correct backstop and the only path that protects SDK / embedded callers. PR #3 removes the HTTP redundancy. ### OmniError::Policy New error variant for engine-layer policy denial / evaluation failure. ApiError::from_omni maps it to 403. ### MR-724 Admin action — Option A reservation PolicyAction::Admin kept in the enum with a load-bearing doc comment naming its future consumers (hot reload, audit log query, approvals list per MR-726 / MR-732 / MR-734). No enforce(Admin, ...) call site exists yet — the variant is reserved so the action vocabulary is complete from chassis day one. MR-724 closes when the first consumer surface ships. 
### New SDK-side integration test `crates/omnigraph/tests/policy_engine_chassis.rs` — four tests covering: * Policy denies for unauthorized actor → OmniError::Policy * Policy permits for authorized actor → apply succeeds * Policy installed + no actor → hard error (forget-the-actor footgun) * No policy → no-op (embedded/dev default still works) These exercise the engine path directly — no HTTP layer involved. ## Test results - cargo test --workspace --locked --no-fail-fast: 851 passed, 0 failed * 45 server tests (existing) pass * 14 schema_apply tests (existing) pass * 4 new chassis tests pass * 60 OpenAPI tests pass (no HTTP API surface changes) * No regressions across the workspace ## Architectural decisions baked in Per MR-722 chassis design refinements comment (2026-05-17): 1. PolicyChecker is a trait, not just a concrete. Engine and server consume the trait. MR-725 adds predicate_for() additively. 2. ResourceScope stays at branch granularity. No Type/Row variants. 3. Coarse-vs-fine framing pinned: engine-layer is action gate; query-layer (MR-725) is predicate gate. Both backed by same Cedar engine; non-overlapping responsibilities. 4. Admin action reserved for policy-management surfaces (MR-724 Option A). ## Pending follow-ups (PR #3+) - Fan-out enforce() to mutate_as, load, ingest_as, branch_create_from, branch_delete, branch_merge (PR #3). - Remove HTTP-layer authorize_request redundancy once engine gate covers all writers (PR #3). - CLI policy injection into Omnigraph for non-`policy validate|test|explain` subcommands (PR #3 or follow-up). - MR-723 default-deny 3-state matrix (PR #4). - MR-736 severity warn/deny (PR #5). - AGENTS.md scope-of-enforcement rewrite once chassis fully lands. - Coarse-vs-fine framing in docs/user/policy.md. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 00:36:36 +03:00
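The `enforce()` gate described in 9973683261 (no policy is a no-op, policy without an actor is a hard error, otherwise the checker decides) can be sketched with a mock checker. This is an illustration under stated assumptions: `Engine`, `AllowOnly`, the `PolicyError` variants, and the reduced `PolicyAction` set are hypothetical; only the `PolicyChecker::check` signature, the four `ResourceScope` variant names, and the three-way behavior come from the commit message.

```rust
use std::sync::Arc;

/// Reduced action set for the sketch (the real enum has more variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyAction {
    SchemaApply,
    Change,
}

/// The four branch-granularity scopes named in the commit; payload
/// shapes are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResourceScope {
    Graph,
    Branch(String),
    TargetBranch(String),
    BranchTransition { source: String, target: String },
}

/// Hypothetical error shape for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    Denied(String),
    MissingActor,
}

pub trait PolicyChecker: Send + Sync {
    fn check(&self, action: PolicyAction, scope: &ResourceScope, actor: &str) -> Result<(), PolicyError>;
}

/// Mock checker that permits exactly one actor, standing in for Cedar.
pub struct AllowOnly(pub &'static str);

impl PolicyChecker for AllowOnly {
    fn check(&self, _action: PolicyAction, _scope: &ResourceScope, actor: &str) -> Result<(), PolicyError> {
        if actor == self.0 {
            Ok(())
        } else {
            Err(PolicyError::Denied(actor.to_string()))
        }
    }
}

/// Hypothetical stand-in for the engine handle.
#[derive(Default)]
pub struct Engine {
    policy: Option<Arc<dyn PolicyChecker>>,
}

impl Engine {
    /// Builder-style setter that consumes `self`.
    pub fn with_policy(mut self, checker: Arc<dyn PolicyChecker>) -> Self {
        self.policy = Some(checker);
        self
    }

    /// The gate every writer would call before doing any work.
    pub fn enforce(&self, action: PolicyAction, scope: &ResourceScope, actor: Option<&str>) -> Result<(), PolicyError> {
        match (&self.policy, actor) {
            // No policy installed: embedded/dev mode, nothing to check.
            (None, _) => Ok(()),
            // Policy installed but no actor: fail loudly instead of bypassing.
            (Some(_), None) => Err(PolicyError::MissingActor),
            (Some(checker), Some(actor)) => checker.check(action, scope, actor),
        }
    }
}
```

Depending on a trait object rather than a concrete Cedar engine is what allows a mock like `AllowOnly` to exercise the gate in isolation.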
Andrew Altshuler	7a86f654d4	policy: codify signed-token-claim-only actor identity (MR-731) (#101) Warm-up commit for the policy chassis epic (MR-722). PR #1 of the chassis series — same role as schema-lint v1's commit #1 baseline. Zero behavioral change; establishes the regression test, the load-bearing doc comment, and the user-doc paragraph for an invariant already true in code. Server auth already resolves `actor_id` from the matched bearer token at `omnigraph-server/src/lib.rs:692-694`, overwriting whatever the handler put in the PolicyRequest. The principle is named in docs/dev/invariants.md Hard Invariant 11 ("clients cannot set actor identity directly"). What was missing: a regression test, a load-bearing doc comment at the resolution site, and a user-facing documentation paragraph. This commit adds all three. Why first. The actor-identity invariant is the foundation every other policy decision stands on. If `actor_id` can be spoofed, every chassis primitive (per-row scope, audit log, two-person rule) becomes ungated. Pinning the invariant first means PR #2 (the chassis core) doesn't have to re-prove this assertion. Changes: * crates/omnigraph-server/tests/server.rs — new regression test actor_id_resolves_from_bearer_token_ignoring_client_supplied_headers with three sub-assertions: - spoof-up: bearer for denied actor + X-Actor-Id naming allowed actor → 403 (header doesn't promote) - spoof-down: bearer for allowed actor + X-Actor-Id naming denied actor → 200 (header doesn't demote) - empty-string spoof: empty X-Actor-Id doesn't clear resolved actor Cross-link to MR-777 (auth boundary cases — actor-id collision + malformed bearer) noted in the test docstring. * crates/omnigraph-server/src/lib.rs — expanded doc comment at the actor-resolution site explaining the SECURITY INVARIANT, citing Hard Invariant 11, the Supabase RLS history footgun, and the regression test that pins the contract. 
Reader thinking "I should let clients override actor_id for impersonation" hits this comment first. * docs/user/policy.md — new "Actor identity (signed-claim-only)" section near the existing Server enforcement section. Closes the user-facing doc gap MR-731's "Done when" requires. Architectural decisions for PR #2+ pinned this session (not implemented here, recorded so future implementers don't re-litigate): - PolicyEngine moves to new `omnigraph-policy` workspace crate so both engine and server can depend on it (Q2). - `enforce(action, scope, actor)` will take a new `ResourceScope` enum, leaving room for MR-725's per-type and per-row variants (Q3). - `PolicyAction::Admin` is kept and wired (Option A) — meta-action for policy-management surfaces (hot reload, audit log query, approvals list) as those consumer features land (Q4). Test results: - cargo test -p omnigraph-server --test server: 45 pass (44 existing + 1 new); no regressions - scripts/check-agents-md.sh: passes (34 links / 33 docs OK) Out of scope (PR #2+): - Omnigraph::with_policy() + enforce() method - omnigraph-policy crate creation - ResourceScope enum - CLI policy injection into Omnigraph - HTTP-layer redundant-check removal - MR-724 Admin action wiring (PR #2) - MR-723 default-deny 3-state (PR #4) - MR-736 severity warn/deny (PR #5) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:51:34 +03:00
Devin AI	e44a4704eb	docs: fix admission gating description	2026-05-10 14:16:26 +00:00
Devin AI	6a3f0677ae	server: drop unwired try_admit_rewrite / 503 admission surface	2026-05-09 20:58:17 +00:00
Ragnor Comerford	f9a0f31f80	server: drop 503 from OpenAPI on admission-gated endpoints (unreachable) Cursor Bugbot LOW on commit `3ad359d`: try_admit_rewrite is defined and tested but no HTTP handler calls it; the six handler OpenAPI annotations declared status = 503 (added in `8e1a8e7`) but try_admit (the only path handlers invoke) returns 429 only. 503 was unreachable. Fix: remove (status = 503, ...) from the six handler OpenAPI annotations and regenerate openapi.json. Kept as forward-looking infrastructure: try_admit_rewrite, global rewrite semaphore, RejectReason::GlobalRewriteExhausted, ApiError::ServiceUnavailable, the 503 branch in IntoResponse, --global-rewrite-cap, and OMNIGRAPH_GLOBAL_REWRITE_MAX. When a future commit wires try_admit_rewrite into a handler, the 503 OpenAPI annotation lands alongside that wiring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 21:54:24 +02:00
Ragnor Comerford	22d76dbb40	server+bench: AppState::new_with_workload; bench drops set_var, exercises heavy cap Two cubic findings on bench_actor_isolation.rs flagged together: P2 (lib.rs:202): `unsafe { std::env::set_var(...) }` ran inside `#[tokio::main] async fn main()` AFTER the multi-thread tokio runtime was up. Rust 2024 made `set_var` unsafe because libc's `setenv` is not thread-safe; concurrent env reads from logging or runtime internals can race or read torn state. Fix (correct by design, AGENTS.md rule 9): add a public `AppState::new_with_workload(uri, db, bearer_tokens, workload)` constructor that takes a caller-built `WorkloadController`. Tests and benches override per-actor caps via the constructor instead of mutating global env. Closes the bug class "tests need to mutate global env to override AppState defaults." P2 (lib.rs:130): heavy actor's `oneshot.await` inside the loop serialized — heavy in-flight count was always 1, so cap=1 never tripped on the heavy side. The bench validated isolation (light p99 bounded) but didn't demonstrate the rejection path. Fix: add a `--heavy-concurrency` arg (default 4) and spawn batches as concurrent tokio tasks bounded by an internal semaphore. With heavy_concurrency=4 and inflight_cap=1, the bench now reports heavy_too_many_requests > 0 and heavy_ok == 1 at peak — proving the gate fires for the heavy actor. Sample run on local FS (4 light actors × 30 ops, 20 heavy batches × 50 rows, heavy_concurrency=4, cap=1): heavy_ok: 1 heavy_too_many_requests: 19 light_ok: 120 light_too_many_requests: 0 light_p99: 565 ms (target < 2 s) Heavy saturates its own cap; light actors are completely unaffected. The isolation property is now empirically proven by the rejection counts rather than just by the latency tail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 17:57:42 +02:00
Ragnor Comerford	8e1a8e7d55	server: document 429 / 503 in admission-gated endpoint OpenAPI responses Closes the cubic finding (P2) at lib.rs:1061: the new admission gates add HTTP 429 / 503 failure paths but the affected endpoint `#[utoipa::path(... responses(...) ...)]` annotations weren't updated. Also closes a pre-existing miss on /change (admission-gated since PR 2 Step F). Adds (status = 429, ...) and (status = 503, ...) to all six admission-gated endpoints: - POST /change (operation_id = "change") - POST /schema/apply (operation_id = "applySchema") - POST /ingest (operation_id = "ingest") - POST /branches (operation_id = "createBranch") - DELETE /branches/{branch} (operation_id = "deleteBranch") - POST /branches/merge (operation_id = "mergeBranches") The descriptions reference the `Retry-After` header, which the `IntoResponse for ApiError` impl emits on both codes (added in commit `c745dd6`). openapi.json regenerated via OMNIGRAPH_UPDATE_OPENAPI=1; the openapi sentinel test passes both with the regen flag and in strict-check mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 17:49:02 +02:00
Ragnor Comerford	c745dd69ae	server: emit Retry-After header on 429 / 503 responses Closes the doc-vs-code gap at api.rs:343 and lib.rs:344-355: the documentation claims `Retry-After` is set on TooManyRequests / ServiceUnavailable responses, but `IntoResponse for ApiError` emitted only `(StatusCode, Json(ErrorOutput))` — no header. Wires a constant `RETRY_AFTER_SECONDS = "60"` for both 429 and 503 codes. Plumbing per-RejectReason durations through is a follow-up; the admission rejects we surface today recover bounded by request handler duration rather than calendar wait, so a constant suffices. Pinned by `ingest_per_actor_admission_cap_returns_429`. Test now fully green: 1+ of 8 concurrent /ingest under cap=1 receives 429 with Retry-After: 60. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:58:47 +02:00
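The constant `Retry-After` behavior from c745dd69ae can be sketched as a status-to-header mapping. `retry_after_for` is a hypothetical helper for illustration (the commit wires the header inside `IntoResponse for ApiError`); the constant name and its `"60"` value come from the commit message.

```rust
/// Constant back-off hint, as named in the commit message.
pub const RETRY_AFTER_SECONDS: &str = "60";

/// Hypothetical helper: returns the header pair to attach for a given
/// HTTP status code, or `None` when no retry hint applies.
pub fn retry_after_for(status: u16) -> Option<(&'static str, &'static str)> {
    match status {
        // 429 Too Many Requests and 503 Service Unavailable both carry the hint.
        429 | 503 => Some(("Retry-After", RETRY_AFTER_SECONDS)),
        _ => None,
    }
}
```

A single constant keeps both codes consistent until per-reason durations are plumbed through, which the commit leaves as a follow-up.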
Ragnor Comerford	05a8bd5de1	server: gate /ingest /branches/* /schema/apply on per-actor admission Closes the gap that admission control only fired on /change. A heavy actor sending bulk-ingest traffic could exhaust shared engine capacity (Lance I/O threads, manifest churn) without hitting the per-actor cap. Wires `state.workload.try_admit(&actor_arc, est_bytes)` into the five remaining mutating handlers AFTER Cedar authorization (so denied requests don't consume admission slots) and BEFORE the engine call. Byte estimates per handler: - /ingest: request.data.len() (NDJSON body) - /schema/apply: request.schema_source.len() - /branches/create, /branches/delete, /branches/merge: 256 (small JSON; the heavy work is bounded per-(table, branch) by the engine's writer queue rather than by request size) The admission guard is held in `let _admission = ...` so it stays alive until handler return, releasing the count permit + decrementing the byte budget on drop. Pinned by `ingest_per_actor_admission_cap_returns_429` (previous commit). The test still fails on the Retry-After header assertion; the next commit emits the header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:57:53 +02:00
Ragnor Comerford	c15962e6b0	server: flip AppState to Arc<Omnigraph>, wire admission on /change (PR 2 Step F) The substantive PR 2 change. Removes the global server `RwLock<Omnigraph>` that has serialized every mutating request across all actors. Disjoint `(table, branch)` writes from different actors now run concurrently, guarded only by the engine's per-(table, branch) write queue (PR 1b) and per-actor admission control (PR 2 Step E). AppState changes: - `db: Arc<RwLock<Omnigraph>>` -> `engine: Arc<Omnigraph>` - New field: `workload: Arc<workload::WorkloadController>` initialized from env (`OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=16`, `OMNIGRAPH_PER_ACTOR_BYTES_MAX=4GiB`, `OMNIGRAPH_GLOBAL_REWRITE_MAX=4`). - `tokio::sync::RwLock` import dropped. Handler updates (16 sites): - All `Arc::clone(&state.db).read_owned().await` and `write_owned()` calls replaced with `let db = &state.engine`. Engine APIs are now `&self` (Step C) so this works directly. - `/export` clones `Arc<Omnigraph>` once and moves into the spawned task instead of acquiring a long-held read lock. - `/change` handler additionally wires `state.workload.try_admit(&actor_arc, est_bytes)`. Cedar runs FIRST so denied requests don't consume admission slots; admission runs SECOND before the engine call. `est_bytes` uses the request body size as a coarse proxy. API surface additions (`api::ErrorCode`): - `TooManyRequests` -> HTTP 429 (per-actor cap exceeded; respect `Retry-After`) - `ServiceUnavailable` -> HTTP 503 (global rewrite pool exhausted) `ApiError` constructors `too_many_requests` / `service_unavailable` and `from_workload_reject` (maps `RejectReason` variants to HTTP status). Other mutating handlers (`/ingest`, `/branches/*`, `/branches/merge`, `/schema/apply`) currently flow through the Arc<Omnigraph> path without admission gates; wiring those is mechanical and lands as a follow-up. The /change hot path covers the bulk of MR-686's load profile. OpenAPI regenerated to include the new ErrorCode variants. 
102 lib + 39 server tests + 5 workload tests pass. The regression sentinel `change_conflict_returns_manifest_conflict_409` continues to pass (revalidation perf opt + per-table queue + publisher CAS preserve manifest_conflict semantics under concurrent writers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 17:08:26 +02:00
Ragnor Comerford	17a1665002	server: add WorkloadController for per-actor admission (PR 2 Step E) PR 2 removes the global server `RwLock<Omnigraph>` (Step F). Without admission control, one heavy actor would exhaust shared capacity (Lance I/O threads, manifest churn, network) and starve other actors. The WorkloadController bounds per-actor in-flight count + bytes and provides a global rewrite-pool semaphore for compaction / index builds. New file: `crates/omnigraph-server/src/workload.rs` (~250 LOC + 5 tests). API: - `WorkloadController::new(inflight_cap, byte_cap, rewrite_cap)` / `from_env()` / `with_defaults()`. - `try_admit(actor_id, est_bytes) -> Result<AdmissionGuard, RejectReason>` acquires both an in-flight count permit and adds est_bytes to the per-actor counter atomically; returns RejectReason on either gate. - `try_admit_rewrite() -> Result<RewriteGuard, RejectReason>` for the global rewrite pool (Step F maps RewriteGuard exhaustion to HTTP 503). - `RejectReason::{InFlightCountExceeded, ByteBudgetExceeded, GlobalRewriteExhausted}`. Race-free admission via `tokio::sync::Semaphore::try_acquire_owned()` for the count gate (master plan Finding 6: independent atomic load+check+add lets two callers both pass a cap-N check; the Semaphore gate is atomic). Bytes use `fetch_add` + decrement-on-rejection so the cap is never exceeded even on rollback. Defaults (override via env): - OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=16 - OMNIGRAPH_PER_ACTOR_BYTES_MAX=4_294_967_296 (4 GiB) - OMNIGRAPH_GLOBAL_REWRITE_MAX=4 Tests cover under-cap admission, byte-budget rollback, per-actor isolation, global rewrite cap, and the load-bearing 32-concurrent-vs-cap-16 race test (forces real contention via a broadcast release channel so guards can't recycle permits task-by-task; pins the master plan's race-free invariant). Adds workspace dep `dashmap = "6"` for per-actor state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:59:45 +02:00
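The two-gate admission described in the commit above can be sketched with the standard library alone. This is an illustrative sketch, not the code in `workload.rs`: `ActorState` and its fields are hypothetical names, the count gate uses `AtomicUsize::fetch_update` as a stand-in for the `tokio::sync::Semaphore` the commit names, and the global rewrite pool is left out.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;

/// Why a request was refused (variant names taken from the commit message).
#[derive(Debug, PartialEq, Eq)]
pub enum RejectReason {
    InFlightCountExceeded,
    ByteBudgetExceeded,
}

/// Hypothetical per-actor state; the commit keeps per-actor state in a DashMap.
pub struct ActorState {
    inflight: AtomicUsize,
    bytes: AtomicU64,
    inflight_cap: usize,
    byte_cap: u64,
}

/// RAII guard: dropping it returns the count permit and the byte reservation.
pub struct AdmissionGuard {
    state: Arc<ActorState>,
    est_bytes: u64,
}

impl ActorState {
    pub fn new(inflight_cap: usize, byte_cap: u64) -> Arc<Self> {
        Arc::new(Self {
            inflight: AtomicUsize::new(0),
            bytes: AtomicU64::new(0),
            inflight_cap,
            byte_cap,
        })
    }

    pub fn try_admit(self: &Arc<Self>, est_bytes: u64) -> Result<AdmissionGuard, RejectReason> {
        // Count gate: one atomic read-modify-write, so two callers cannot
        // both observe "cap - 1" and both increment past the cap.
        let took_permit = self
            .inflight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < self.inflight_cap { Some(n + 1) } else { None }
            })
            .is_ok();
        if !took_permit {
            return Err(RejectReason::InFlightCountExceeded);
        }

        // Byte gate: reserve first, then undo both reservations on rejection.
        let prev = self.bytes.fetch_add(est_bytes, Ordering::AcqRel);
        if prev.saturating_add(est_bytes) > self.byte_cap {
            self.bytes.fetch_sub(est_bytes, Ordering::AcqRel);
            self.inflight.fetch_sub(1, Ordering::AcqRel);
            return Err(RejectReason::ByteBudgetExceeded);
        }
        Ok(AdmissionGuard { state: Arc::clone(self), est_bytes })
    }
}

impl Drop for AdmissionGuard {
    fn drop(&mut self) {
        self.state.bytes.fetch_sub(self.est_bytes, Ordering::AcqRel);
        self.state.inflight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

A handler would bind the result as `let _admission = actor.try_admit(est_bytes)?;` so the guard lives until the handler returns, which is the pattern the later `/ingest` commit describes.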
Ragnor Comerford	35be20cb05	MR-771: demote Run to direct-publish via expected_table_versions CAS mutate_as and load now write directly to target tables and call the publisher once at the end with per-table expected versions; the Run state machine, _graph_runs.lance writers, __run__ staging branches, and server /runs/* endpoints are removed. Multi-statement mutations remain atomic at the manifest level via an in-memory MutationStaging accumulator that gives read-your-writes within a query and a single publish at the end. Concurrent-writer conflicts surface as ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the old DivergentUpdate merge shape. Documents one known limitation in docs/runs.md: a multi-statement mid-query failure where op-N writes a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the manifest until a follow-up introduces per-table Lance branches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 08:52:50 +02:00
Andrew Altshuler	7310f69928	Revert "Merge pull request #49 from ModernRelay/ragnorc/x-request-id" (#54) This reverts commit `b352fca13c`, reversing changes made to `748ad334a9`.	2026-04-26 15:56:29 +03:00
Ragnor Comerford	b352fca13c	Merge pull request #49 from ModernRelay/ragnorc/x-request-id Add X-Request-Id middleware	2026-04-26 12:33:33 +02:00
Ragnor Comerford	e14b203208	Reuse X_REQUEST_ID constant for inbound header lookup Both Cursor Bugbot and Cubic flagged that the inbound `headers().get(...)` call constructed `HeaderName::from_static("x-request-id")` inline instead of reusing the `X_REQUEST_ID` constant defined at the top of the file. The two were already kept in sync by both being `from_static("x-request-id")`, but a future rename would have to touch both sites or risk silent drift between read and write. Also drops the now-unused `header` module import. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 12:05:19 +02:00
Ragnor Comerford	284c9377c2	Add X-Request-Id middleware Per-request ULID minted at the edge, exposed in request extensions and on the response header. Caller-supplied X-Request-Id is echoed when well-formed (1..=128 ASCII printable characters); otherwise rejected and replaced with a fresh ULID so the value is always safe to log. Companion to the TypeScript SDK redesign — clients now correlate logs across the wire by reading X-Request-Id from response headers (and the SDK already surfaces it on every OmnigraphError as `requestId`). No spec change required; the header is a transport-layer concern. Tests: - mint a ULID when no header is provided - echo a valid caller-supplied id - reject overlong header (200 chars), mint a fresh ULID Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 22:56:17 +02:00
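The echo-or-mint rule from the commit above might look roughly like the following. The function names are hypothetical, "ASCII printable" is assumed to mean bytes 0x20 through 0x7E (the middleware may draw the line differently, for example by excluding space), and ULID generation is passed in as a closure because it needs an external crate.

```rust
/// True when a caller-supplied ID is 1..=128 bytes of printable ASCII.
/// Assumption: "printable" means 0x20..=0x7E inclusive.
pub fn is_well_formed(id: &str) -> bool {
    (1..=128).contains(&id.len()) && id.bytes().all(|b| (0x20..=0x7e).contains(&b))
}

/// Echo a well-formed inbound ID; otherwise discard it and mint a fresh one.
/// `mint` stands in for the ULID generator used by the server.
pub fn resolve_request_id(inbound: Option<&str>, mint: impl FnOnce() -> String) -> String {
    match inbound {
        Some(id) if is_well_formed(id) => id.to_owned(),
        _ => mint(),
    }
}
```

Because a malformed value is replaced rather than passed through, the resolved ID is always safe to write to logs, which is the property the commit message calls out.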
Ragnor Comerford	7809bf607e	Polish OpenAPI spec for SDK generation Add operation descriptions and examples to utoipa annotations so the generated TypeScript SDK has rich JSDoc, and so future Python/Go SDKs and any /openapi.json docs UI benefit from the same effort. - Doc comments on all 18 handlers (utoipa picks up summary/description) - #[schema(example = ...)] on free-text fields (query_source, schema_source, NDJSON data) and i64 timestamps - Destructive/irreversible warnings on change, applySchema, ingest, mergeBranches, deleteBranch, publishRun, abortRun Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 16:36:51 +02:00
Ragnor Comerford	9de2079263	Merge remote-tracking branch 'origin/main' into ragnorc/explore-api # Conflicts: # CONTRIBUTING.md	2026-04-18 20:24:39 +02:00
andrew	7a3bf5c758	Add aws feature + SecretsManagerTokenSource backend Introduces an opt-in AWS Secrets Manager backend for bearer tokens, behind the `aws` Cargo feature. Default builds (on-prem, local dev) don't pull in the AWS SDK and don't pay its compile cost. - New Cargo feature `aws` gates the `aws-config` + `aws-sdk-secretsmanager` optional deps. Default features remain empty. - New `auth::aws::SecretsManagerTokenSource` implements `TokenSource` by fetching a JSON `{"actor_id": "token", ...}` payload from a named Secrets Manager secret. Credentials resolve via the AWS default chain (env, shared config, IMDSv2 instance role, ECS task role) so no explicit plumbing is needed under an IAM role. - New `resolve_token_source()` dispatches based on the `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` env var. If the var is set but the binary was built without `--features aws`, returns a clear rebuild instruction rather than silently falling back. - `serve()` now uses `resolve_token_source()` and logs which source was selected at startup. - `parse_json_secret_payload()` is factored out as a free function so the payload validation (trim whitespace, reject blank actor/token, reject non-object) is unit-testable without the AWS SDK. - New CI job `test_aws_feature` builds + tests with `--features aws`. Not in this PR (follow-ups): - Background refresh loop for rotation. `SecretsManagerTokenSource` advertises `supports_refresh: true` but the AppState-level refresh task isn't wired yet. - Config-YAML dispatch (today the AWS source is selected via env var only; eventually `server.bearer_tokens.source` in `omnigraph.yaml`). Tests: - Default-feature build: 33 lib + 41 integration + 64 openapi. - `--features aws` build: 32 lib (one test is cfg-gated) + 41 + 64. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 03:48:51 +03:00
andrew	af41630520	Extract TokenSource trait for bearer token loading Pure refactor. No behavior change. Introduces a TokenSource trait so additional backends (AWS Secrets Manager, Vault, etc.) can plug in behind feature flags without touching the server wiring. - New module crates/omnigraph-server/src/auth.rs with the TokenSource trait and a single EnvOrFileTokenSource implementation that delegates to the existing server_bearer_tokens_from_env() function. - serve() now constructs EnvOrFileTokenSource and calls load() instead of calling the free function directly. - The trait has a supports_refresh() hook (false for env/file) for future implementations that can rotate without restart. - async-trait added to omnigraph-server deps; it's already in the workspace. Tests: - Unit tests in auth.rs covering load paths and the default supports_refresh / name values. - Existing 128 tests (lib + integration + openapi) pass unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 03:31:43 +03:00
andrew	c338e80180	Harden bearer auth: constant-time compare, hashed at rest, authoritative actor_id Fixes two live authz bugs in omnigraph-server: - Bearer-token lookup previously used HashMap::get, which compares keys with Eq and short-circuits on the first differing byte — a network-observable timing oracle for brute-forcing tokens. Tokens are now stored as SHA-256 digests and compared with subtle::ConstantTimeEq, iterating every entry unconditionally so total work is independent of which slot matches. Raw token bytes no longer live in server memory after startup. - authorize_request now overwrites PolicyRequest.actor_id from the authenticated session instead of trusting the handler-supplied field, which previously defaulted to "" via unwrap_or_default(). The empty string can no longer reach Cedar as a policy subject even if a future refactor drops the None check. External API of AppState constructors is unchanged — tokens still enter as Vec<(String, String)> and are hashed on the way in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 01:41:02 +03:00
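The lookup strategy in the commit above (compare digests without short-circuiting, and visit every entry regardless of where the match is) can be illustrated with a standard-library sketch. The names here are hypothetical, the digests are assumed to be precomputed 32-byte SHA-256 values, and the hand-rolled comparison is only a teaching stand-in for `subtle::ConstantTimeEq`, since a compiler is free to optimize plain loops in ways that a vetted crate guards against.

```rust
/// Compare two 32-byte digests by OR-ing together the XOR of every byte pair.
/// There is no early exit, so the loop length does not depend on where the
/// inputs first differ.
fn digests_equal(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Scan every stored (digest, actor_id) pair, even after a match is found,
/// so the number of comparisons is independent of which slot matches.
pub fn lookup_actor<'a>(
    entries: &'a [([u8; 32], String)],
    presented: &[u8; 32],
) -> Option<&'a str> {
    let mut found = None;
    for (digest, actor) in entries {
        if digests_equal(digest, presented) {
            found = Some(actor.as_str());
        }
    }
    found
}
```

The caller would hash the presented bearer token once and pass the digest in, so raw token bytes never need to be stored alongside the table.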
andrew	be520f31f4	Polish schema endpoint: rename show, align field name, add tests Review feedback on #23, applied on top of the original commit: - Rename the CLI subcommand from `schema get` to `schema show` to match the existing `run show` / `commit show` convention. A `#[command(alias = "get")]` preserves muscle memory for anyone who already typed `get`. - Rename `SchemaGetOutput` → `SchemaOutput` and its field `source` → `schema_source`, so the get response and the apply request use the same field name for the same concept. - Use `println!` instead of `print!` in the CLI so the shell prompt doesn't land on the last line of schema output. - Add three integration tests on `/schema`: happy path (no auth), 401 when bearer is required but missing, 403 when the policy grants the actor branch_create but not read. Follow-ups left for a separate PR: include `schema_ir_hash` and `schema_identity_version` in the response payload so clients can do drift detection and the server can set an ETag; and a fast-path local read that skips `Omnigraph::open()` when only the schema source is needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 00:30:46 +03:00
Ragnor Comerford	228032a4ac	Add static OpenAPI spec and Stainless SDK config Introduce SDK generation scaffolding: commit a static openapi.json extracted from the Utoipa annotations via a golden-file test, add Stainless workspace/config for TypeScript and Python SDKs, and clean up operation IDs for ergonomic generated method names. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 14:26:31 +02:00
Claude	0c4df674fa	Add schema get command to CLI and HTTP API Exposes the existing schema_source() method via a new `omnigraph schema get` CLI subcommand and a `GET /schema` API endpoint, allowing users to retrieve the current accepted schema from any graph repository. https://claude.ai/code/session_01UYybeBQks3fz3RJrTHtwQw	2026-04-16 21:15:17 +00:00
andrew	1a26e2e654	Rename config targets to graphs	2026-04-14 04:12:14 +03:00
Claude	4c07d3c095	Make /openapi.json reflect runtime auth configuration The served OpenAPI spec now matches runtime behavior: when no bearer tokens or policy are configured (open mode), the spec omits security schemes and per-operation security requirements. When auth is active, the full bearer_token security metadata is included. Also fixes SecurityAddon to initialize components if absent, and removes the redundant utoipa dev-dependency. Adds 5 new tests covering open-mode vs auth-mode spec serving. https://claude.ai/code/session_01NfoPVx21rZUQned1f7WpXY	2026-04-12 11:04:13 +00:00
Claude	859ec9faa8	Add OpenAPI spec generation via utoipa with /openapi.json endpoint Integrate utoipa 5 to auto-generate an OpenAPI 3.1 spec from the existing Axum handlers and serde types. All 16 endpoints are annotated with path metadata, request/response schemas, security requirements, and tags. A public /openapi.json endpoint serves the spec without requiring auth. Includes 59 tests covering path completeness, HTTP methods, schema fields, enum variants, security scheme, path/query parameters, request bodies, response references, and endpoint integration. https://claude.ai/code/session_01NfoPVx21rZUQned1f7WpXY	2026-04-12 11:03:23 +00:00
andrew	92fa3189f7	Add schema apply command and policy support	2026-04-12 04:01:14 +03:00
andrew	4b058b9813	Fix CLI ergonomics and stream export output	2026-04-11 19:01:48 +03:00
andrew	338289656a	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00

35 commits