omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Author	SHA1	Message	Date
Devin AI	6a3f0677ae	server: drop unwired try_admit_rewrite / 503 admission surface	2026-05-09 20:58:17 +00:00
Devin AI	4bb7964af9	tests: matrix cell k asserts post-reopen row count	2026-05-09 20:16:44 +00:00
Devin AI	708e170dc5	engine: branch-merge revalidates target snapshot under queue	2026-05-09 20:16:12 +00:00
Ragnor Comerford	3ad359db8b	tests: admission test uses new_with_workload, drops env mutation + #[serial] Migrates `ingest_per_actor_admission_cap_returns_429` from env-var override to direct `WorkloadController::new(1, ...)` construction via `AppState::new_with_workload`. Removes the `EnvGuard` and the `#[serial]` annotation that paired with it. Why correct by design (AGENTS.md rule 9): the previous round's matrix fix (commit `8bd9a5f`) shielded the matrix from this test's env mutation, but the broader bug class — "test A's process-wide env mutation can leak into any test B that calls `AppState::open` / `WorkloadController::from_env()`" — was still reachable by any future test that didn't think to opt out. Closing the class at the source: this test no longer mutates global state at all, so no other test needs to defend against it. Net effect: - This test no longer needs `#[serial]` (was the only reason it was marked) — runs in parallel with the rest of the suite. - The matrix's defensive `with_defaults()` construction (commit `8bd9a5f`) remains correct but is no longer required for correctness; it's now a "belt and suspenders" guard against any FUTURE env-mutating test. Verified locally: both tests pass when run together; full server suite (44 tests) green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:35:41 +02:00
Ragnor Comerford	8bd9a5ff14	tests: matrix harness uses with_defaults() workload, not from_env() Round 4 CI failure: Test Workspace and server-aws both red on `concurrent_branch_ops_morphological_matrix` cell b ("merge × merge: same-target-distinct-sources") — second merge returned 429 instead of 200. The matrix passes locally. Root cause: cargo test runs tests in parallel by default. The admission test `ingest_per_actor_admission_cap_returns_429` is wrapped with `#[serial]` and an EnvGuard that sets `OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=1` for its duration. Process-wide env vars are visible to concurrently-running tests; the matrix's `Harness::new()` called `AppState::open()` which delegates to `WorkloadController::from_env()`, picking up cap=1 if it ran while the admission test held the EnvGuard. With cap=1 + 2 concurrent merges in cell b, one merge waits behind merge_exclusive while the other is admitted; the waiter holds its admission permit, but a fresh actor permit is needed when admission is per-actor — the second merge's permit acquisition fails because the first hasn't released yet, and 429 fires. Fix (correct by design, AGENTS.md rule 9): the matrix harness builds the WorkloadController explicitly via `WorkloadController::with_defaults()` and passes it to `AppState::new_with_workload`, the constructor added in commit `22d76db`. Closes the bug class "tests pick up another concurrent test's env override at construction time" — the matrix is now insulated from any env-var manipulation in the rest of the test suite. Verified locally: with `OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=1` set in the environment, the matrix passes (it ignores env entirely now). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:19:42 +02:00
Ragnor Comerford	99b0941478	tests: remove three narrow concurrent_branch_* tests subsumed by T1 The previous commit added `concurrent_branch_ops_morphological_matrix` covering 11 cells with stronger assertions (identity + post-op /change + reopen). The three narrow tests it replaces: - concurrent_branch_create_from_distinct_parents_does_not_corrupt_coordinator → matrix cell f, with identity assertions added - concurrent_branch_merges_distinct_targets_do_not_swap_into_each_other → matrix cells a + b + c, with identity assertions that close the symmetric-swap blind spot cubic flagged on commit `64f2b99` - concurrent_change_during_branch_merge_preserves_writes → matrix cell d The matrix retains the original tests' diagnostic granularity through named cell labels in every assertion message ("[a:merge×merge:distinct-targets] merge a"), so a CI failure points to the exact cell + invariant. Net: 522 lines removed, 0 coverage lost. All other server tests pass unchanged (44 total). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:09:21 +02:00
Ragnor Comerford	ac8594462e	tests: branch-ops morphological matrix (T1) Replaces three narrow concurrent_branch_* tests (folded in below) with one parameterized matrix test covering 11 representative (op_a, op_b, target_overlap) cells, asserting C1-C6 uniformly: C1 — both complete (no deadlock; tokio::time::timeout(15s)) C2 — status: both 200 or exactly one clean conflict; never 500 C3 — per-target row count C4 — per-target row identity (named persons present + absent — catches the symmetric-swap class that count assertions miss; cubic P2 on commit `64f2b99` flagged this gap on the round-3 merge race test) C5 — engine state coherent (subsequent /snapshot consistent) C6 — post-op /change on main succeeds (engine isn't poisoned) Cells: a. Merge × Merge, distinct targets — branch_merge_impl race pin b. Merge × Merge, same target / distinct sources — merge_exclusive serialization c. Merge × Merge, same source / distinct targets — fanout d. Merge × Change, into target — per-(table, branch) queue e. Merge × BranchCreateFrom, target — interaction with refresh path f. BranchCreateFrom × BranchCreateFrom, distinct parents — round-1 race pin g. BranchCreateFrom × BranchDelete, unrelated branches — disjoint state h. BranchDelete × BranchDelete, distinct branches — concurrent refresh i. BranchDelete × Change, distinct branch — refresh-side vs writer j. BranchCreateFrom × Change, on source — fork-while-writing k. Reopen consistency after concurrent pair — disk-vs-cache drift Each cell: - spins up its own tempdir + AppState so failures don't cascade, - aligns the pair at a tokio::sync::Barrier so both reach the engine close in time, - wraps in a 15s deadlock timeout, - asserts identity via a /read with the `get_person` fixture query (specific names must be present on the right branch and absent from the wrong one). 
Subsumes: - concurrent_branch_create_from_distinct_parents_does_not_corrupt_coordinator (now cell f, with identity assertions added) - concurrent_branch_merges_distinct_targets_do_not_swap_into_each_other (now cells a + b + c, with identity assertions; the symmetric-swap blind spot cubic flagged on commit `64f2b99` is closed) - concurrent_change_during_branch_merge_preserves_writes (now cell d) Those three narrow tests are removed in the next commit so this lands green standalone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:07:37 +02:00
Ragnor Comerford	2b2e723125	tests: pin branch_merge swap-restore race (red) Per AGENTS.md rule 8, this commit lands the failing regression test ahead of the fix. Cursor Bugbot HIGH on commit `22d76db` rediscovered the residual flagged in the round 1 honest-review note: `branch_merge_impl` at `crates/omnigraph/src/exec/merge.rs:1085-1100` still uses the swap_coordinator_for_branch + operate + restore_coordinator pattern across three separate `coordinator.write().await` acquisitions. The same shape that branch_create_from_impl shed in commit `4ffbf6e`. The test spawns two concurrent /branches/merge calls A (feature-a → target-a) and B (feature-b → target-b) aligned at a tokio::sync::Barrier so both reach swap_coordinator_for_branch close in time. M=4 iterations boost race-catching odds. Currently fails on `22d76db` with target-a=5, target-b=4: B's merge landed on the wrong coord — target-b never got Frank because A's swap pushed self.coordinator to target-a, B's swap captured target-a as B's "previous", and B's restore set self.coordinator back to target-a (not the original main). Subsequent operations using self.coordinator point at the wrong branch. Fix lands in the next commit: serialize concurrent branch merges via `merge_exclusive: Arc<tokio::sync::Mutex<()>>` held across the entire swap-operate-restore window. Closes the bug class "non-atomic three-step coordinator manipulation" for branch_merge by serializing merges relative to each other; per-(table, branch) queue inside the merge body still lets merges and other writers run concurrently. A deeper "operate on local coord" refactor (the round-1 fix shape for branch_create_from) requires unwinding `branch_merge_on_current_target` and its uses of `self.snapshot()` / `self.ensure_commit_graph_initialized()`; deferred to a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 19:12:03 +02:00
Ragnor Comerford	976aa0ec1d	tests: pin concurrent /change + branch_merge interleave preserves writes Future-proofs against MR-895 work that may move or remove the per-(table, branch) writer queue acquisition inside `branch_merge` (`crates/omnigraph/src/exec/merge.rs:1224`). Today the queue linearizes a concurrent /change on main against a `branch_merge feature → main` on the same touched tables; both succeed and the inserted row is preserved post-merge. Codex flagged this scenario as a P1 in PR #75 review claiming the merge could silently overwrite concurrent target writes because the source-rewrite path opens with `MutationOpKind::Merge` (skipping the strict pre-stage check). Validation showed the queue at merge.rs:1224 is held across both Phase B (per-table commit_staged) and Phase C (manifest publish), so there's no interleave window. The Merge op_kind only affects same-process pre-stage drift detection, not cross-write linearization. The test passes on `f925ad1`; landing it as a regression sentinel catches future changes that drop the queue acquisition. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 17:03:05 +02:00
Ragnor Comerford	5520ab72ff	tests: pin disjoint /change concurrency at HTTP level Closes the cubic acceptance-criteria gap (❌ "Integration test: two /change requests targeting different (table_key, branch) execute concurrently end-to-end"). The bench harness measures the throughput side; this test is the regression sentinel that catches a future change which accidentally re-introduces graph-wide serialization on the disjoint path. Spawns 4 concurrent /change inserts on node:Person and 4 on node:Company. All 8 must return 200, and the post-test row counts on each table must reflect every insert. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 17:01:52 +02:00
Ragnor Comerford	0976cbebc5	tests: pin /ingest admission gate + 429 Retry-After (red) Per AGENTS.md rule 8, this commit lands the failing regression test ahead of the fix. Currently fails on `f925ad1` with 8/8 statuses returning 200 because /ingest does not call WorkloadController::try_admit. The test pins: - /ingest is gated on per-actor admission control (returns 429 when the cap is exceeded). - 429 responses carry the structured `code: too_many_requests` error body so clients can distinguish them from generic conflicts. - 429 responses include a `Retry-After` header so clients can implement bounded backoff. The doc claim at api.rs:343 and lib.rs:344 was that this header exists; the IntoResponse impl currently emits no headers. Two follow-up commits will turn this green: 1. Wire WorkloadController::try_admit on /ingest and the four other mutating handlers (Block 2.1). 2. Emit the Retry-After header on 429/503 responses (Block 2.2). The test uses #[serial] + EnvGuard to override OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=1 without racing parallel tests, then spawns 8 concurrent /ingest tasks aligned at a tokio::sync::Barrier so multiple tasks reach try_admit close in time. With cap=1, at least one must be rejected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:57:01 +02:00
Ragnor Comerford	c263732b1a	tests: extend same-key insert test with /snapshot row-count assertion The existing change_concurrent_inserts_same_key_serialize_without_409 test claimed in its comment "asserts the final row count equals N" but only checked HTTP status codes. cubic flagged the gap; this commit adds the actual /snapshot read after the concurrent inserts to verify all N batches landed (no silent overwrite) by comparing the post-test node:Person row_count against SEED + N. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:49:38 +02:00
Ragnor Comerford	3b33e9ac56	tests: pin branch_create_from swap-restore race (red) Per AGENTS.md rule 8, this commit lands the failing regression test ahead of the fix so the red → green pair is visible in git log. The test demonstrates that two concurrent `POST /branches` calls with distinct `from` parents corrupt coordinator state: A's "operate" step runs against B's swapped coordinator instead of its own, forking the new branch off the wrong parent's HEAD. Currently fails on `f925ad1` with all 8 gamma branches (declared parent: alpha, 5 rows) reporting 4 rows — beta's row count. The operate step ran against beta's coord because B's swap interleaved between A's swap and A's operate. Fix lands in the next commit: hold a single `coordinator.write().await` guard across the entire swap-operate-restore sequence in `branch_create_from_impl` so the three steps are atomic relative to other callers. Closes the bug class "non-atomic three-step coordinator manipulation under &self callers" rather than guarding the specific call site — the right architectural seam (single critical section per swap-restore sequence) eliminates the interleave window for branch_create_from and any future swap-restore caller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:44:50 +02:00
Ragnor Comerford	ebf5a5769d	tests: pin UPDATE RYW under in-process concurrency (red) Per AGENTS.md rule 8, this commit lands the failing regression test ahead of the fix so the red → green pair is visible in git log. The test asserts the RYW invariant for in-process concurrent UPDATEs on the same row: exactly one writer commits and N-1 receive 409 manifest_conflict. Currently fails on `f925ad1` with 1 x 200 + 7 x 500: > "storage: Retryable commit conflict for version 6: This Update > transaction was preempted by concurrent transaction Update at > version 6. Please retry." Lance's transaction conflict resolver correctly detects the Update vs Update race, but the error wraps as `OmniError::Lance(<string>)` and the API surfaces it as 500 internal rather than 409 retryable conflict. Users see "internal server error" for what is documented as a retryable conflict path. The fix lands in the next commit: an op-kind-aware drift check at the commit_all entry that returns 409 ExpectedVersionMismatch for tables whose first touch was Update / Delete / SchemaRewrite when the staged dataset version drifts from the manifest pin under the queue. Closes the bug class "Lance internal conflict surfaces as 500 instead of 409" rather than mapping the specific Lance error variant — the right architectural layer (engine boundary, under the queue) catches the drift before commit_staged ever runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:33:53 +02:00
Ragnor Comerford	f925ad1739	mr-686: Phase 2 — op-kind-aware version check + coord Mutex → RwLock Fix A: op-kind-aware ensure_expected_version. Insert/Merge skip the strict pre-stage check; Update/Delete/SchemaRewrite keep it. New MutationOpKind enum threaded through open_for_mutation_on_branch / open_owned_dataset_for_branch_write / reopen_for_mutation and all callers (execute_insert/update/delete_node/delete_edge, branch_merge::publish_rewritten_merge_table, schema_apply, ensure_indices_for_branch, loader Append/Merge/Overwrite). Closes the 77% rejection rate on same-key concurrent inserts. Fix B: coordinator Mutex -> RwLock. Reads parallelize via .read(); writes serialize via .write(). Atomic-commit invariant preserved by the single .write() covering commit_manifest_updates + record_graph_commit. Bench-as-test change_concurrent_inserts_same_key_serialize_without_409 (server.rs:2180) spawns 12 concurrent /change inserts on a single (table, branch); asserts every request returns 200. Was failing pre-Phase-2; passes post-Phase-2. change_conflict_returns_manifest_conflict_409 (cross-process drift sentinel) and branch_merge_conflict_response_includes_structured_conflicts both still pass. Bench (after-pr2-phase2): - single-actor 1x1: 14.9 ops/s, p50 68ms (baseline 12.3, +22%) - disjoint 8x8: 7.04 ops/s, p50 1023ms (baseline 6.24, +13%) - same-key 8x1: 2.62 ops/s, 0 errors (after-pr2: 77% errors) Disjoint stayed at +13% — Fix B's RwLock helped read paths but the publisher's .write() critical section still serializes graph-wide. Splitting GraphCoordinator into per-concern primitives (manifest in ArcSwap, commit_graph in RwLock, atomic-commit serializer) is the deferred next step. 102 lib + 30 branching + 24 runs + 16 staged_writes + 63 end_to_end + 40 server tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 12:42:26 +02:00
Ragnor Comerford	044ed46019	chore: scrub Linear ticket numbers and review-bot mentions from code comments OmniGraph is OSS; internal Linear ticket references and code-review-bot mentions in source-code comments don't help external readers and leak internal tooling. Replace ticket numbers (MR-XXX) with descriptive prose, drop linear.app URLs, and remove inline mentions of Cursor/Bugbot/Cubic/Codex review threads. Scope is limited to source-code comments (`crates/`). Docs under `docs/` keep their MR-XXX references — those are part of the established change-history narrative for in-repo docs and don't require a Linear account to find context for. No behavior changes; no public API changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 22:45:38 +02:00
Ragnor Comerford	35be20cb05	MR-771: demote Run to direct-publish via expected_table_versions CAS mutate_as and load now write directly to target tables and call the publisher once at the end with per-table expected versions; the Run state machine, _graph_runs.lance writers, __run__ staging branches, and server /runs/* endpoints are removed. Multi-statement mutations remain atomic at the manifest level via an in-memory MutationStaging accumulator that gives read-your-writes within a query and a single publish at the end. Concurrent-writer conflicts surface as ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the old DivergentUpdate merge shape. Documents one known limitation in docs/runs.md: a multi-statement mid-query failure where op-N writes a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the manifest until a follow-up introduces per-table Lance branches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 08:52:50 +02:00
Andrew Altshuler	7310f69928	Revert "Merge pull request #49 from ModernRelay/ragnorc/x-request-id" (#54) This reverts commit `b352fca13c`, reversing changes made to `748ad334a9`.	2026-04-26 15:56:29 +03:00
Ragnor Comerford	284c9377c2	Add X-Request-Id middleware Per-request ULID minted at the edge, exposed in request extensions and on the response header. Caller-supplied X-Request-Id is echoed when well-formed (1..=128 ASCII printable characters); otherwise rejected and replaced with a fresh ULID so the value is always safe to log. Companion to the TypeScript SDK redesign — clients now correlate logs across the wire by reading X-Request-Id from response headers (and the SDK already surfaces it on every OmnigraphError as `requestId`). No spec change required; the header is a transport-layer concern. Tests: - mint a ULID when no header is provided - echo a valid caller-supplied id - reject overlong header (200 chars), mint a fresh ULID Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 22:56:17 +02:00
andrew	c338e80180	Harden bearer auth: constant-time compare, hashed at rest, authoritative actor_id Fixes two live authz bugs in omnigraph-server: - Bearer-token lookup previously used HashMap::get, which compares keys with Eq and short-circuits on the first differing byte — a network-observable timing oracle for brute-forcing tokens. Tokens are now stored as SHA-256 digests and compared with subtle::ConstantTimeEq, iterating every entry unconditionally so total work is independent of which slot matches. Raw token bytes no longer live in server memory after startup. - authorize_request now overwrites PolicyRequest.actor_id from the authenticated session instead of trusting the handler-supplied field, which previously defaulted to "" via unwrap_or_default(). The empty string can no longer reach Cedar as a policy subject even if a future refactor drops the None check. External API of AppState constructors is unchanged — tokens still enter as Vec<(String, String)> and are hashed on the way in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 01:41:02 +03:00
andrew	be520f31f4	Polish schema endpoint: rename show, align field name, add tests Review feedback on #23, applied on top of the original commit: - Rename the CLI subcommand from `schema get` to `schema show` to match the existing `run show` / `commit show` convention. A `#[command(alias = "get")]` preserves muscle memory for anyone who already typed `get`. - Rename `SchemaGetOutput` → `SchemaOutput` and its field `source` → `schema_source`, so the get response and the apply request use the same field name for the same concept. - Use `println!` instead of `print!` in the CLI so the shell prompt doesn't land on the last line of schema output. - Add three integration tests on `/schema`: happy path (no auth), 401 when bearer is required but missing, 403 when the policy grants the actor branch_create but not read. Follow-ups left for a separate PR: include `schema_ir_hash` and `schema_identity_version` in the response payload so clients can do drift detection and the server can set an ETag; and a fast-path local read that skips `Omnigraph::open()` when only the schema source is needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 00:30:46 +03:00
andrew	92fa3189f7	Add schema apply command and policy support	2026-04-12 04:01:14 +03:00
andrew	338289656a	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00
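
Commits 3b33e9ac56 and 2b2e723125 describe a swap, operate, restore sequence on the coordinator that raced when the three steps took separate lock acquisitions. The sketch below illustrates the fix shape named in 3b33e9ac56 (one guard held across all three steps), using only the standard library. `Coordinator`, `with_branch_atomic`, and `count_mismatches` are hypothetical names for illustration; the real engine uses async tokio locks and a much richer coordinator, so this is a simplified model under those assumptions, not the project's code.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical stand-in for the engine's coordinator: it only tracks
/// which branch is currently active.
struct Coordinator {
    current: Mutex<String>,
}

impl Coordinator {
    fn new(branch: &str) -> Self {
        Coordinator { current: Mutex::new(branch.to_string()) }
    }

    /// Swap, operate, and restore under ONE lock guard, so no other caller
    /// can interleave between the three steps. Returns the branch that the
    /// "operate" step actually observed.
    fn with_branch_atomic(&self, branch: &str) -> String {
        let mut guard = self.current.lock().unwrap();
        let previous = std::mem::replace(&mut *guard, branch.to_string()); // swap
        let observed = guard.clone(); // operate (here: just read the active branch)
        *guard = previous; // restore
        observed
    }

    fn current_branch(&self) -> String {
        self.current.lock().unwrap().clone()
    }
}

/// Runs `rounds` rounds of two barrier-aligned callers and returns how many
/// callers observed a branch other than the one they asked for.
fn count_mismatches(rounds: usize) -> usize {
    let coord = Arc::new(Coordinator::new("main"));
    let mut mismatches = 0;
    for _ in 0..rounds {
        // The barrier releases both threads together to maximize contention.
        let barrier = Arc::new(Barrier::new(2));
        let handles: Vec<_> = vec!["target-a", "target-b"]
            .into_iter()
            .map(|branch| {
                let coord = Arc::clone(&coord);
                let barrier = Arc::clone(&barrier);
                thread::spawn(move || {
                    barrier.wait();
                    coord.with_branch_atomic(branch) != branch
                })
            })
            .collect();
        for handle in handles {
            if handle.join().unwrap() {
                mismatches += 1;
            }
        }
        // After both callers finish, the coordinator must point at "main" again.
        assert_eq!(coord.current_branch(), "main");
    }
    mismatches
}

fn main() {
    println!("mismatches: {}", count_mismatches(100));
}
```

Because the guard spans the whole sequence, each caller's `previous` is always the value that was active before its own swap, which is the property the failing tests in those commits pin.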

23 commits
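
Commits 8bd9a5ff14 and 3ad359db8b move tests from an environment-variable override to explicit construction of the workload controller, so that one test's cap cannot leak into another. A minimal sketch of that idea follows, assuming a per-actor in-flight cap. The type and method names echo those mentioned in the commit messages (`WorkloadController`, `from_env`, `try_admit`), but the signatures, the `Permit` type, and the default cap of 8 are assumptions made for illustration; the real API is not shown in this log.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical per-actor admission controller (signatures are assumed).
#[derive(Clone)]
struct WorkloadController {
    cap: usize,
    inflight: Arc<Mutex<HashMap<String, usize>>>,
}

/// Holds one in-flight slot for an actor and releases it when dropped.
struct Permit {
    actor: String,
    inflight: Arc<Mutex<HashMap<String, usize>>>,
}

impl Drop for Permit {
    fn drop(&mut self) {
        let mut map = self.inflight.lock().unwrap();
        let remaining = match map.get_mut(&self.actor) {
            Some(n) => {
                *n -= 1;
                *n
            }
            None => return,
        };
        if remaining == 0 {
            map.remove(&self.actor);
        }
    }
}

impl WorkloadController {
    /// Explicit construction: the cap is a plain argument, so a test that
    /// builds its controller this way never reads process-wide state.
    fn new(cap: usize) -> Self {
        WorkloadController { cap, inflight: Arc::new(Mutex::new(HashMap::new())) }
    }

    /// Environment-driven construction. Anything built this way sees whatever
    /// another concurrently running test has set. The fallback of 8 is an
    /// assumed default, not taken from the project.
    fn from_env() -> Self {
        let cap = std::env::var("OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX")
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(8);
        Self::new(cap)
    }

    /// Returns a permit if the actor is under its cap; `None` corresponds to
    /// the HTTP 429 path described in the commit messages.
    fn try_admit(&self, actor: &str) -> Option<Permit> {
        let mut map = self.inflight.lock().unwrap();
        let count = map.entry(actor.to_string()).or_insert(0);
        if *count >= self.cap {
            return None;
        }
        *count += 1;
        Some(Permit { actor: actor.to_string(), inflight: Arc::clone(&self.inflight) })
    }
}

fn main() {
    let from_env = WorkloadController::from_env();
    println!("cap from environment (or assumed default): {}", from_env.cap);

    let controller = WorkloadController::new(1);
    let first = controller.try_admit("alice");
    println!("first admitted: {}", first.is_some());
    println!("second admitted: {}", controller.try_admit("alice").is_some());
}
```

With the cap passed as an argument, a test that needs cap=1 can build its own controller and needs neither an environment guard nor serialization against other tests.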