omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-12 01:45:14 +02:00

Author	SHA1	Message	Date
Devin AI	e44a4704eb	docs: fix admission gating description	2026-05-10 14:16:26 +00:00
Devin AI	6a3f0677ae	server: drop unwired try_admit_rewrite / 503 admission surface	2026-05-09 20:58:17 +00:00
Ragnor Comerford	f9a0f31f80	server: drop 503 from OpenAPI on admission-gated endpoints (unreachable) Cursor Bugbot LOW on commit `3ad359d`: try_admit_rewrite is defined and tested but no HTTP handler calls it; the six handler OpenAPI annotations declared status = 503 (added in `8e1a8e7`) but try_admit (the only path handlers invoke) returns 429 only. 503 was unreachable. Fix: remove (status = 503, ...) from the six handler OpenAPI annotations and regenerate openapi.json. Kept as forward-looking infrastructure: try_admit_rewrite, global rewrite semaphore, RejectReason::GlobalRewriteExhausted, ApiError::ServiceUnavailable, the 503 branch in IntoResponse, --global-rewrite-cap, and OMNIGRAPH_GLOBAL_REWRITE_MAX. When a future commit wires try_admit_rewrite into a handler, the 503 OpenAPI annotation lands alongside that wiring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 21:54:24 +02:00
Ragnor Comerford	22d76dbb40	server+bench: AppState::new_with_workload; bench drops set_var, exercises heavy cap Two cubic findings on bench_actor_isolation.rs flagged together: P2 (lib.rs:202): `unsafe { std::env::set_var(...) }` ran inside `#[tokio::main] async fn main()` AFTER the multi-thread tokio runtime was up. Rust 2024 made `set_var` unsafe because libc's `setenv` is not thread-safe; concurrent env reads from logging or runtime internals can race or read torn state. Fix (correct by design, AGENTS.md rule 9): add a public `AppState::new_with_workload(uri, db, bearer_tokens, workload)` constructor that takes a caller-built `WorkloadController`. Tests and benches override per-actor caps via the constructor instead of mutating global env. Closes the bug class "tests need to mutate global env to override AppState defaults." P2 (lib.rs:130): heavy actor's `oneshot.await` inside the loop serialized — heavy in-flight count was always 1, so cap=1 never tripped on the heavy side. The bench validated isolation (light p99 bounded) but didn't demonstrate the rejection path. Fix: add a `--heavy-concurrency` arg (default 4) and spawn batches as concurrent tokio tasks bounded by an internal semaphore. With heavy_concurrency=4 and inflight_cap=1, the bench now reports heavy_too_many_requests > 0 and heavy_ok == 1 at peak — proving the gate fires for the heavy actor. Sample run on local FS (4 light actors × 30 ops, 20 heavy batches × 50 rows, heavy_concurrency=4, cap=1): heavy_ok: 1 heavy_too_many_requests: 19 light_ok: 120 light_too_many_requests: 0 light_p99: 565 ms (target < 2 s) Heavy saturates its own cap; light actors are completely unaffected. The isolation property is now empirically proven by the rejection counts rather than just by the latency tail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 17:57:42 +02:00
Ragnor Comerford	8e1a8e7d55	server: document 429 / 503 in admission-gated endpoint OpenAPI responses Closes the cubic finding (P2) at lib.rs:1061: the new admission gates add HTTP 429 / 503 failure paths but the affected endpoint `#[utoipa::path(... responses(...) ...)]` annotations weren't updated. Also closes a pre-existing miss on /change (admission-gated since PR 2 Step F). Adds (status = 429, ...) and (status = 503, ...) to all six admission-gated endpoints: - POST /change (operation_id = "change") - POST /schema/apply (operation_id = "applySchema") - POST /ingest (operation_id = "ingest") - POST /branches (operation_id = "createBranch") - DELETE /branches/{branch} (operation_id = "deleteBranch") - POST /branches/merge (operation_id = "mergeBranches") The descriptions reference the `Retry-After` header, which the `IntoResponse for ApiError` impl emits on both codes (added in commit `c745dd6`). openapi.json regenerated via OMNIGRAPH_UPDATE_OPENAPI=1; the openapi sentinel test passes both with the regen flag and in strict-check mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 17:49:02 +02:00
Ragnor Comerford	c745dd69ae	server: emit Retry-After header on 429 / 503 responses Closes the doc-vs-code gap at api.rs:343 and lib.rs:344-355: the documentation claims `Retry-After` is set on TooManyRequests / ServiceUnavailable responses, but `IntoResponse for ApiError` emitted only `(StatusCode, Json(ErrorOutput))` — no header. Wires a constant `RETRY_AFTER_SECONDS = "60"` for both 429 and 503 codes. Plumbing per-RejectReason durations through is a follow-up; the admission rejects we surface today recover bounded by request handler duration rather than calendar wait, so a constant suffices. Pinned by `ingest_per_actor_admission_cap_returns_429`. Test now fully green: 1+ of 8 concurrent /ingest under cap=1 receives 429 with Retry-After: 60. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:58:47 +02:00
Ragnor Comerford	05a8bd5de1	server: gate /ingest /branches/* /schema/apply on per-actor admission Closes the gap that admission control only fired on /change. A heavy actor sending bulk-ingest traffic could exhaust shared engine capacity (Lance I/O threads, manifest churn) without hitting the per-actor cap. Wires `state.workload.try_admit(&actor_arc, est_bytes)` into the five remaining mutating handlers AFTER Cedar authorization (so denied requests don't consume admission slots) and BEFORE the engine call. Byte estimates per handler: - /ingest: request.data.len() (NDJSON body) - /schema/apply: request.schema_source.len() - /branches/create, /branches/delete, /branches/merge: 256 (small JSON; the heavy work is bounded per-(table, branch) by the engine's writer queue rather than by request size) The admission guard is held in `let _admission = ...` so it stays alive until handler return, releasing the count permit + decrementing the byte budget on drop. Pinned by `ingest_per_actor_admission_cap_returns_429` (previous commit). The test still fails on the Retry-After header assertion; the next commit emits the header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:57:53 +02:00
Ragnor Comerford	c15962e6b0	server: flip AppState to Arc<Omnigraph>, wire admission on /change (PR 2 Step F) The substantive PR 2 change. Removes the global server `RwLock<Omnigraph>` that has serialized every mutating request across all actors. Disjoint `(table, branch)` writes from different actors now run concurrently, guarded only by the engine's per-(table, branch) write queue (PR 1b) and per-actor admission control (PR 2 Step E). AppState changes: - `db: Arc<RwLock<Omnigraph>>` -> `engine: Arc<Omnigraph>` - New field: `workload: Arc<workload::WorkloadController>` initialized from env (`OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=16`, `OMNIGRAPH_PER_ACTOR_BYTES_MAX=4GiB`, `OMNIGRAPH_GLOBAL_REWRITE_MAX=4`). - `tokio::sync::RwLock` import dropped. Handler updates (16 sites): - All `Arc::clone(&state.db).read_owned().await` and `write_owned()` calls replaced with `let db = &state.engine`. Engine APIs are now `&self` (Step C) so this works directly. - `/export` clones `Arc<Omnigraph>` once and moves into the spawned task instead of acquiring a long-held read lock. - `/change` handler additionally wires `state.workload.try_admit(&actor_arc, est_bytes)`. Cedar runs FIRST so denied requests don't consume admission slots; admission runs SECOND before the engine call. `est_bytes` uses the request body size as a coarse proxy. API surface additions (`api::ErrorCode`): - `TooManyRequests` -> HTTP 429 (per-actor cap exceeded; respect `Retry-After`) - `ServiceUnavailable` -> HTTP 503 (global rewrite pool exhausted) `ApiError` constructors `too_many_requests` / `service_unavailable` and `from_workload_reject` (maps `RejectReason` variants to HTTP status). Other mutating handlers (`/ingest`, `/branches/*`, `/branches/merge`, `/schema/apply`) currently flow through the Arc<Omnigraph> path without admission gates; wiring those is mechanical and lands as a follow-up. The /change hot path covers the bulk of MR-686's load profile. OpenAPI regenerated to include the new ErrorCode variants. 102 lib + 39 server tests + 5 workload tests pass. The regression sentinel `change_conflict_returns_manifest_conflict_409` continues to pass (revalidation perf opt + per-table queue + publisher CAS preserve manifest_conflict semantics under concurrent writers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 17:08:26 +02:00
Ragnor Comerford	17a1665002	server: add WorkloadController for per-actor admission (PR 2 Step E) PR 2 removes the global server `RwLock<Omnigraph>` (Step F). Without admission control, one heavy actor would exhaust shared capacity (Lance I/O threads, manifest churn, network) and starve other actors. The WorkloadController bounds per-actor in-flight count + bytes and provides a global rewrite-pool semaphore for compaction / index builds. New file: `crates/omnigraph-server/src/workload.rs` (~250 LOC + 5 tests). API: - `WorkloadController::new(inflight_cap, byte_cap, rewrite_cap)` / `from_env()` / `with_defaults()`. - `try_admit(actor_id, est_bytes) -> Result<AdmissionGuard, RejectReason>` acquires both an in-flight count permit and adds est_bytes to the per-actor counter atomically; returns RejectReason on either gate. - `try_admit_rewrite() -> Result<RewriteGuard, RejectReason>` for the global rewrite pool (Step F maps RewriteGuard exhaustion to HTTP 503). - `RejectReason::{InFlightCountExceeded, ByteBudgetExceeded, GlobalRewriteExhausted}`. Race-free admission via `tokio::sync::Semaphore::try_acquire_owned()` for the count gate (master plan Finding 6: independent atomic load+check+add lets two callers both pass a cap-N check; the Semaphore gate is atomic). Bytes use `fetch_add` + decrement-on-rejection so the cap is never exceeded even on rollback. Defaults (override via env): - OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=16 - OMNIGRAPH_PER_ACTOR_BYTES_MAX=4_294_967_296 (4 GiB) - OMNIGRAPH_GLOBAL_REWRITE_MAX=4 Tests cover under-cap admission, byte-budget rollback, per-actor isolation, global rewrite cap, and the load-bearing 32-concurrent-vs- cap-16 race test (forces real contention via a broadcast release channel so guards can't recycle permits task-by-task; pins the master plan's race-free invariant). Adds workspace dep `dashmap = "6"` for per-actor state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:59:45 +02:00
Ragnor Comerford	35be20cb05	MR-771: demote Run to direct-publish via expected_table_versions CAS mutate_as and load now write directly to target tables and call the publisher once at the end with per-table expected versions; the Run state machine, _graph_runs.lance writers, __run__ staging branches, and server /runs/* endpoints are removed. Multi-statement mutations remain atomic at the manifest level via an in-memory MutationStaging accumulator that gives read-your-writes within a query and a single publish at the end. Concurrent-writer conflicts surface as ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the old DivergentUpdate merge shape. Documents one known limitation in docs/runs.md: a multi-statement mid-query failure where op-N writes a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the manifest until a follow-up introduces per-table Lance branches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 08:52:50 +02:00
Andrew Altshuler	7310f69928	Revert "Merge pull request #49 from ModernRelay/ragnorc/x-request-id" (#54 ) This reverts commit `b352fca13c`, reversing changes made to `748ad334a9`.	2026-04-26 15:56:29 +03:00
Ragnor Comerford	b352fca13c	Merge pull request #49 from ModernRelay/ragnorc/x-request-id Add X-Request-Id middleware	2026-04-26 12:33:33 +02:00
Ragnor Comerford	e14b203208	Reuse X_REQUEST_ID constant for inbound header lookup Both Cursor Bugbot and Cubic flagged that the inbound `headers().get(...)` call constructed `HeaderName::from_static("x-request-id")` inline instead of reusing the `X_REQUEST_ID` constant defined at the top of the file. The two were already kept in sync by both being `from_static("x-request-id")`, but a future rename would have to touch both sites or risk silent drift between read and write. Also drops the now-unused `header` module import. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 12:05:19 +02:00
Ragnor Comerford	284c9377c2	Add X-Request-Id middleware Per-request ULID minted at the edge, exposed in request extensions and on the response header. Caller-supplied X-Request-Id is echoed when well-formed (1..=128 ASCII printable characters); otherwise rejected and replaced with a fresh ULID so the value is always safe to log. Companion to the TypeScript SDK redesign — clients now correlate logs across the wire by reading X-Request-Id from response headers (and the SDK already surfaces it on every OmnigraphError as `requestId`). No spec change required; the header is a transport-layer concern. Tests: - mint a ULID when no header is provided - echo a valid caller-supplied id - reject overlong header (200 chars), mint a fresh ULID Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 22:56:17 +02:00
Ragnor Comerford	7809bf607e	Polish OpenAPI spec for SDK generation Add operation descriptions and examples to utoipa annotations so the generated TypeScript SDK has rich JSDoc, and so future Python/Go SDKs and any /openapi.json docs UI benefit from the same effort. - Doc comments on all 18 handlers (utoipa picks up summary/description) - #[schema(example = ...)] on free-text fields (query_source, schema_source, NDJSON data) and i64 timestamps - Destructive/irreversible warnings on change, applySchema, ingest, mergeBranches, deleteBranch, publishRun, abortRun Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 16:36:51 +02:00
Ragnor Comerford	9de2079263	Merge remote-tracking branch 'origin/main' into ragnorc/explore-api # Conflicts: # CONTRIBUTING.md	2026-04-18 20:24:39 +02:00
andrew	7a3bf5c758	Add aws feature + SecretsManagerTokenSource backend Introduces an opt-in AWS Secrets Manager backend for bearer tokens, behind the `aws` Cargo feature. Default builds (on-prem, local dev) don't pull in the AWS SDK and don't pay its compile cost. - New Cargo feature `aws` gates the `aws-config` + `aws-sdk-secretsmanager` optional deps. Default features remain empty. - New `auth::aws::SecretsManagerTokenSource` implements `TokenSource` by fetching a JSON `{"actor_id": "token", ...}` payload from a named Secrets Manager secret. Credentials resolve via the AWS default chain (env, shared config, IMDSv2 instance role, ECS task role) so no explicit plumbing is needed under an IAM role. - New `resolve_token_source()` dispatches based on the `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` env var. If the var is set but the binary was built without `--features aws`, returns a clear rebuild instruction rather than silently falling back. - `serve()` now uses `resolve_token_source()` and logs which source was selected at startup. - `parse_json_secret_payload()` is factored out as a free function so the payload validation (trim whitespace, reject blank actor/token, reject non-object) is unit-testable without the AWS SDK. - New CI job `test_aws_feature` builds + tests with `--features aws`. Not in this PR (follow-ups): - Background refresh loop for rotation. `SecretsManagerTokenSource` advertises `supports_refresh: true` but the AppState-level refresh task isn't wired yet. - Config-YAML dispatch (today the AWS source is selected via env var only; eventually `server.bearer_tokens.source` in `omnigraph.yaml`). Tests: - Default-feature build: 33 lib + 41 integration + 64 openapi. - `--features aws` build: 32 lib (one test is cfg-gated) + 41 + 64. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 03:48:51 +03:00
andrew	af41630520	Extract TokenSource trait for bearer token loading Pure refactor. No behavior change. Introduces a TokenSource trait so additional backends (AWS Secrets Manager, Vault, etc.) can plug in behind feature flags without touching the server wiring. - New module crates/omnigraph-server/src/auth.rs with the TokenSource trait and a single EnvOrFileTokenSource implementation that delegates to the existing server_bearer_tokens_from_env() function. - serve() now constructs EnvOrFileTokenSource and calls load() instead of calling the free function directly. - The trait has a supports_refresh() hook (false for env/file) for future implementations that can rotate without restart. - async-trait added to omnigraph-server deps; it's already in the workspace. Tests: - Unit tests in auth.rs covering load paths and the default supports_refresh / name values. - Existing 128 tests (lib + integration + openapi) pass unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 03:31:43 +03:00
andrew	c338e80180	Harden bearer auth: constant-time compare, hashed at rest, authoritative actor_id Fixes two live authz bugs in omnigraph-server: - Bearer-token lookup previously used HashMap::get, which compares keys with Eq and short-circuits on the first differing byte — a network-observable timing oracle for brute-forcing tokens. Tokens are now stored as SHA-256 digests and compared with subtle::ConstantTimeEq, iterating every entry unconditionally so total work is independent of which slot matches. Raw token bytes no longer live in server memory after startup. - authorize_request now overwrites PolicyRequest.actor_id from the authenticated session instead of trusting the handler-supplied field, which previously defaulted to "" via unwrap_or_default(). The empty string can no longer reach Cedar as a policy subject even if a future refactor drops the None check. External API of AppState constructors is unchanged — tokens still enter as Vec<(String, String)> and are hashed on the way in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 01:41:02 +03:00
andrew	be520f31f4	Polish schema endpoint: rename show, align field name, add tests Review feedback on #23, applied on top of the original commit: - Rename the CLI subcommand from `schema get` to `schema show` to match the existing `run show` / `commit show` convention. A `#[command(alias = "get")]` preserves muscle memory for anyone who already typed `get`. - Rename `SchemaGetOutput` → `SchemaOutput` and its field `source` → `schema_source`, so the get response and the apply request use the same field name for the same concept. - Use `println!` instead of `print!` in the CLI so the shell prompt doesn't land on the last line of schema output. - Add three integration tests on `/schema`: happy path (no auth), 401 when bearer is required but missing, 403 when the policy grants the actor branch_create but not read. Follow-ups left for a separate PR: include `schema_ir_hash` and `schema_identity_version` in the response payload so clients can do drift detection and the server can set an ETag; and a fast-path local read that skips `Omnigraph::open()` when only the schema source is needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 00:30:46 +03:00
Ragnor Comerford	228032a4ac	Add static OpenAPI spec and Stainless SDK config Introduce SDK generation scaffolding: commit a static openapi.json extracted from the Utoipa annotations via a golden-file test, add Stainless workspace/config for TypeScript and Python SDKs, and clean up operation IDs for ergonomic generated method names. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 14:26:31 +02:00
Claude	0c4df674fa	Add schema get command to CLI and HTTP API Exposes the existing schema_source() method via a new `omnigraph schema get` CLI subcommand and a `GET /schema` API endpoint, allowing users to retrieve the current accepted schema from any graph repository. https://claude.ai/code/session_01UYybeBQks3fz3RJrTHtwQw	2026-04-16 21:15:17 +00:00
andrew	1a26e2e654	Rename config targets to graphs	2026-04-14 04:12:14 +03:00
Claude	4c07d3c095	Make /openapi.json reflect runtime auth configuration The served OpenAPI spec now matches runtime behavior: when no bearer tokens or policy are configured (open mode), the spec omits security schemes and per-operation security requirements. When auth is active, the full bearer_token security metadata is included. Also fixes SecurityAddon to initialize components if absent, and removes the redundant utoipa dev-dependency. Adds 5 new tests covering open-mode vs auth-mode spec serving. https://claude.ai/code/session_01NfoPVx21rZUQned1f7WpXY	2026-04-12 11:04:13 +00:00
Claude	859ec9faa8	Add OpenAPI spec generation via utoipa with /openapi.json endpoint Integrate utoipa 5 to auto-generate an OpenAPI 3.1 spec from the existing Axum handlers and serde types. All 16 endpoints are annotated with path metadata, request/response schemas, security requirements, and tags. A public /openapi.json endpoint serves the spec without requiring auth. Includes 59 tests covering path completeness, HTTP methods, schema fields, enum variants, security scheme, path/query parameters, request bodies, response references, and endpoint integration. https://claude.ai/code/session_01NfoPVx21rZUQned1f7WpXY	2026-04-12 11:03:23 +00:00
andrew	92fa3189f7	Add schema apply command and policy support	2026-04-12 04:01:14 +03:00
andrew	4b058b9813	Fix CLI ergonomics and stream export output	2026-04-11 19:01:48 +03:00
andrew	338289656a	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00

28 commits