omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Author	SHA1	Message	Date
Andrew Altshuler	60eee78465	docs: split user and developer docs (#93 )	2026-05-15 03:45:22 +03:00
Andrew Altshuler	e8d49559c4	branch-protection: allow admin bypass on main (#94 ) Flip enforce_admins from true to false. Repo admins can now merge their own PRs without waiting for code-owner review, by clicking "Merge without waiting for requirements to be met" once CI is green. The action is recorded in the audit log. Non-admins still see full enforcement: code-owner review required, 1 approving review, required status checks must pass. Rationale: as the solo owner of most CODEOWNERS scopes, the author cannot satisfy GitHub's "non-self approver" rule on their own PRs, which made every PR block on a second human. Admin bypass restores the practical workflow while keeping the protection rules as the default for everyone else.	2026-05-15 03:32:12 +03:00
Andrew Altshuler	6bad829ed0	branch-protection: declarative policy + apply script (#89 ) Branch protection on main, declared as code rather than as opaque GitHub UI state. Pairs with the CODEOWNERS chassis (#88): once this PR lands and an admin runs the apply script, every PR to main must satisfy code-owner review and the listed required checks. Components: - .github/branch-protection.json — the policy. Edit this to change required checks, review counts, etc. Includes a _comment field for human readers; the apply script strips it before PUT. - scripts/apply-branch-protection.sh — idempotent apply via `gh api`. Reads back current state for verification. Supports DRY_RUN=1. - docs/branch-protection.md — explains the policy, how to apply, how to change, why declared as code. - AGENTS.md topic-index row. Policy summary: - Required status checks (strict): Classify Changes, Check AGENTS.md Links, Test Workspace, Test omnigraph-server --features aws, CODEOWNERS / drift, CODEOWNERS / noedit. - Required approving reviews: 1, must be a code owner. - Dismiss stale reviews on new commits. - Required linear history (squash or rebase merges only). - No force pushes, no deletions, no admin bypasses. - Required conversation resolution. What's NOT in this PR: - Required signed commits — not yet; maintainers must enroll GPG/SSH signing first or merges will block. - Tag protection for v* tags — separate PR. - Additional required checks (cargo deny, audit, fmt, clippy, CodeQL, schema-lint MR-946) — separate PRs as each lands. - The script is NOT run by CI. Branch-protection changes are admin actions; CI-driven auto-apply would defeat the purpose. Manual invocation is the audit point. How to apply after merge: ./scripts/apply-branch-protection.sh Requires gh-CLI auth with repo-admin permissions. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:38:20 +03:00
Andrew Altshuler	730712b73f	codeowners: yml source of truth + generator + drift CI (#88 ) * codeowners: generator + drift CI + initial roles Source-of-truth approach to CODEOWNERS: yml is hand-edited, CODEOWNERS is generated and CI-enforced. Every role change is a reviewable PR with a permanent in-repo audit trail. No GitHub UI clicks, no shadow state. Initial roles: engineering @aaltshuler owns crates/** + default (.github/, scripts/, Cargo., openapi.json, everything else not docs) docs @aaltshuler @ragnorc owns docs/, README.md, AGENTS.md, CLAUDE.md, SECURITY.md Per GitHub semantics, multiple owners on a CODEOWNERS line means "any one satisfies the review" — for docs, either named member can approve. Strict "N distinct approvers" would need a CI workaround (not wired today; tracked for future hardening). Components: - .github/codeowners-roles.yml — source of truth. Edit this. - .github/scripts/render-codeowners.py — generator (PyYAML; ~100 LoC). - .github/CODEOWNERS — generated. CI rejects hand-edits. - .github/workflows/codeowners.yml — two checks: drift: re-render and assert CODEOWNERS matches. * noedit: reject PRs that edit CODEOWNERS without editing the yml. - docs/codeowners.md — explains the source-of-truth pattern, how to change roles, how to add new roles. - AGENTS.md topic-index row. What's NOT in this PR: - Branch protection on main (separate PR; needs `gh api` call against the org). - Required-reviewer enforcement (depends on branch protection landing). - Required CI status checks (depends on branch protection landing). - Scheduled rotation (the schedule: block in the yml + a weekly workflow). Today's roles are stable; rotation isn't needed yet. - Linear-as-source-of-truth integration (Approach 4 from the design discussion; deferred). Verified: - Generator output is deterministic (idempotent re-runs). - scripts/check-agents-md.sh OK (28 links, 28 docs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * codeowners: fix catch-all ordering (Devin review #88) Devin caught a real bug: GitHub CODEOWNERS uses "last match wins" semantics, but the generator emitted the catch-all `` AFTER specific patterns. Net effect: `` won for every file, silently nullifying the docs role and never routing reviews to @ragnorc. Fix is one-line — emit the default `` line before iterating the specific paths. Also: - Added a regression assertion in the generator: after rendering, the first non-comment line must start with `` if a default is configured. Generator exits non-zero otherwise. Catches the same class of mistake in any future refactor. - Rewrote the yml header comment, which incorrectly stated "keep more-specific paths after broader patterns" (correct for GitHub semantics but the generator was doing the opposite — so the comment read as a description of behavior when it was actually a contradicted intention). Verified by re-rendering: `` is now line 12, `crates/` is line 14, `docs/` is line 15, etc. README.md matches both `` and `README.md`; `README.md` is later → wins → @aaltshuler + @ragnorc both assigned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:26:06 +03:00
Andrew Altshuler	c142dafdf3	schema-lint chassis v0: code-tagged diagnostics (MR-694) (#87 ) First slice of the schema-lint chassis. Adds stable `OG-XXX-NNN` codes to schema-migration rejections so operators can suppress, look up, and filter on identifiers rather than free-text prose. Atlas-style chassis adapted to omnigraph's typed-IR substrate (no SQL injection vector, no per-engine locks, native edge/vector/embedding types). What's in v0: - New `omnigraph-compiler/src/lint/` module with: - `diagnostic.rs` — Family / SafetyTier / Severity enums covering ten families (DS, MF, CD, BC, NM, OW, NL, VE, ED, LK). Only DS and MF are populated in this PR. - `codes.rs` — 8 DiagnosticCode constants (OG-DS-101..105, OG-MF-103, OG-MF-104, OG-MF-106). Five of the eight are wired to real emission sites; the other three are reserved. - Unit tests for catalog invariants: codes unique, prefix matches family, suffixes are 3-digit, destructive defaults to error, lookup() works, EMITTED_IN_V0 codes exist in ALL_CODES. - `SchemaMigrationStep::UnsupportedChange` gains an optional `code: Option<String>` field. New `unsupported_error_message()` helper prefixes the message with `[code]` when present. - 5 of 17 existing rejection paths now carry codes: - `removing node type` → OG-DS-102 - `removing edge type` → OG-DS-103 - `removing property` → OG-DS-104 - `adding required property without backfill` → OG-MF-103 - `changing property type` → OG-MF-106 Remaining 12 paths carry `code: None` and are tagged as future work. - `schema_apply` surfaces the formatted error (with `[code]` prefix); CLI `omnigraph schema plan` renders the code on the `unsupported change on <entity>` line. - PR #62 destructive-rejection tests in `tests/schema_apply.rs` now assert on the stable code (`msg.contains("OG-DS-104")`) instead of the error-message substring. 11/11 tests pass. - New `docs/schema-lint.md` documents the v0 catalog + the 10 families + Atlas prior art. AGENTS.md index updated. What's explicitly NOT in v0 (subsequent PRs): - No severity config in `omnigraph.yaml` (MR-694 §2). - No `@allow(OG-XXX-NNN, "rationale")` suppression directive (§3). - No `--allow-data-loss` flag or destructive-tier enforcement. - No new `SchemaMigrationStep` variants (soft/hard drops, default, widen/narrow). MR-700, MR-697 land those. - No pre-migration checks (MR-941). - No CD / VE / LK / NM family rules (MR-942..945). - No CI integration (MR-946). Tests: 235 compiler tests, 11 schema_apply integration tests, 14 lint module tests, 55 CLI tests — all green. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:08:18 +03:00
Ragnor Comerford	f28f644bf2	Merge pull request #83 from ModernRelay/devin/1778623807-remove-orphan-loader-files Remove orphaned loader/{constraints,embeddings,jsonl}.rs files	2026-05-12 21:03:36 -07:00
Ragnor Comerford	53d41a30b4	Merge pull request #85 from ModernRelay/ragnorc/survey-state engine: pin stable-row-id preservation through stage_overwrite	2026-05-12 17:24:55 -07:00
Ragnor Comerford	3cc5c6a9a2	chore: gitignore the mdrip/ markdown snapshot cache npx mdrip writes fetched-page snapshots under mdrip/. The cache is a local-only working artifact (docs/lance.md is the curated index of upstream Lance pages we fetch on demand). Keep the cache out of the tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:02:14 -07:00
Ragnor Comerford	a30d1cc0dc	engine: stage_overwrite sets enable_stable_row_ids explicitly Defensive — Lance 4.0.0 preserves the source dataset's flag through Operation::Overwrite even when WriteParams omits it (pinned by the prior commit's test), but setting it explicitly matches the public overwrite_dataset path at line 454 and documents the dependency at the call site so a future refactor doesn't accidentally drop it. Setting it on a dataset created without stable row IDs is a no-op per Lance's row-id-lineage spec, so this stays correct for legacy datasets. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:57:05 -07:00
Ragnor Comerford	549060f297	tests: pin stable-row-id preservation across stage_overwrite stage_overwrite is used by schema_apply to rewrite tables when an additive migration touches data. If Lance Operation::Overwrite ever stopped preserving the source dataset's enable_stable_row_ids flag, every schema_apply that triggers a rewrite would silently disable stable row IDs on the affected tables and downstream readers that depend on _rowid stability (change-feed validators, index reconcilers) would observe silent corruption. Empirically Lance 4.0.0 does preserve the flag through Overwrite even when WriteParams omits it — but the preservation isn't documented at the Lance spec level, so pin it here. Any future behaviour change surfaces as a test failure rather than silent corruption. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:56:58 -07:00
Ragnor Comerford	2121d9f6c3	docs: storage stable-row-ids reflects every dataset The L1 capability list claimed the flag was enabled "for the commit-graph and run-registry datasets" — stale. Every Lance dataset OmniGraph creates has enable_stable_row_ids: true; the run-registry datasets are gone since MR-771. Replace with a single paragraph capturing the invariant, the consequences (row-version columns available, CreateIndex × Rewrite not retryable, Lance reader version required), the legacy-dataset constraint (one-way at create, dump-and-reload to migrate), and a pointer to the regression test in staged_writes.rs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:56:51 -07:00
Ragnor Comerford	8427e705dd	Merge pull request #84 from ModernRelay/ragnorc/read-docs docs: lead AGENTS.md first principle with integrated-over-time framing	2026-05-12 16:38:52 -07:00
Ragnor Comerford	24c0558180	docs: lead AGENTS.md first principle with integrated-over-time framing Reframes the first-principle section to lead with Winters' "engineering is programming integrated over time" as the lens, keeping "minimize ongoing liability" as the operative directive and folding in "complexity should be earned." Adds a new Tiebreakers subsection with two rules that the prior section lacked clean appeals for: - correctness > simplicity > performance (lexicographic) - reversibility shapes evidence demand (reversible → prod metrics over napkin math over RFCs; irreversible → RFC up-front) Adds a Hyrum's-Law deny-list entry in both AGENTS.md and docs/invariants.md §IX: shipping observable behavior is shipping a contract, even when undocumented. Net always-on context cost: ~7 lines. No renumbering of §I–VIII invariants; Hyrum's Law lands in the deny-list to avoid breaking back-references. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:27:24 -07:00
Devin AI	327eb821b5	Remove orphaned loader/{constraints,embeddings,jsonl}.rs files These three files in crates/omnigraph/src/loader/ have no `mod` declaration anywhere in the workspace and no `#[path = "…"]` reference. They are not compiled — `touch`-ing them does not trigger `cargo check` to recompile anything. Their imports (`crate::catalog::schema_ir`, `crate::error::NanoError`, `crate::store::manifest::hash_string`, `crate::types::ScalarType`, `super::super::graph::DatasetAccumulator`) reference modules that no longer exist in the engine crate, so they could not even be wired in without further work. They are vestigial code from an earlier monolithic crate layout. The live functionality is independently implemented inside crates/omnigraph/src/loader/mod.rs. These files have been orphaned since the initial public commit. `cargo check --workspace --all-targets` and `cargo test --workspace --no-run` both pass with no new warnings. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-12 22:57:20 +00:00
Claude	a9c4423b82	Strengthen cleanup-then-optimize sequencing test with postconditions Reviewer feedback on PR #62: the original `cleanup_then_optimize_succeed_in_sequence` only unwrapped both calls and asserted nothing, so it didn't validate the claimed sequencing behavior. The concern that motivates the test is that cleanup destroys version history and optimize on a freshly-cleaned table could trip on dropped fragment refs or stale manifests. Rename to `cleanup_then_optimize_preserves_rows_and_table_remains_writable` and add three concrete postconditions: row counts in both Person and Company tables survive the sequence; the head remains readable; and a subsequent merge load still succeeds.	2026-05-12 23:36:01 +03:00
Claude	57a62756c5	Exercise actual type rename in schema-apply rename test The previous version of `apply_schema_renames_node_type_via_rename_from_and_preserves_rows` kept the node name as `Person` (`@rename_from("Person")`) and only renamed a property. The planner only emits a `RenameType` step when the new name differs from the accepted one, so the test name overstated what it covered: a regression in `RenameType` step emission or in the coordinator's table-key remap during type rename could pass while the test still went green. Rename the desired node from `Person` to `Human` (with `@rename_from("Person")`), update the dependent edge endpoints to point at `Human`, and assert both the `RenameType` step and that the manifest table key has moved from `node:Person` to `node:Human`.	2026-05-12 23:36:01 +03:00
Claude	e22d468e27	Add maintenance + destructive-migration test coverage The audit of test coverage flagged three holes: - `omnigraph optimize` and `omnigraph cleanup` had no integration tests (no `maintenance.rs`). Add one covering empty/idempotent edges, the policy-validation contract on `cleanup`, and head preservation under aggressive policies. - `apply_schema` only covered I32 -> I64 type-change rejection. Add the symmetric narrowing case plus rejections for the other destructive shapes (drop property with data, drop node type, drop edge type, add required property without backfill) and assert the manifest version doesn't advance. Add a positive `@rename_from` case to pin the stable-type-id contract preserves rows through a rename. - `docs/testing.md` was missing `validators.rs` and the new `maintenance.rs` from its file table; bump the count and add rows.	2026-05-12 23:36:01 +03:00
devin-ai-integration[bot]	6914e0256e	MR-786: merge-pair truth table with exhaustive op-variant matrix (#81 ) * MR-786: merge-pair truth table with exhaustive op-variant matrix Add crates/omnigraph/tests/merge_truth_table.rs that enumerates every (left_op, right_op) cell from the operation vocabulary named in the ticket — {noop, addNode, removeNode, addEdge, removeEdge, setProperty, dropProperty, addLabel, removeLabel} — and asserts the deterministic outcome of Omnigraph::branch_merge against a structured oracle. The matrix is built in a 9x9 match in build_case, so adding a new OpVariant is a compile-time, fail-on-omission task. Today's mutation grammar only exposes insert \| update set \| delete (see docs/query-language.md), so the 36 cells over the first six ops are executable and the 45 cells involving dropProperty/addLabel/removeLabel are recorded as Expected::Unsupported with a note. Each executable cell spins up a fresh tempdir, applies one mutation per branch, calls branch_merge, and asserts either: * MergeOutcome (AlreadyUpToDate / FastForward / Merged) plus a GraphAssert on the affected entities, or * an OmniError::MergeConflicts whose entries match the expected table_key + MergeConflictKind (row_id is optional because edge ULIDs are generated at runtime). branch_merge is directional, so the (L, R) and (R, L) cells live in separate entries in the matrix and are run independently — the op-pair symmetry encoded in build_case serves as the commutativity oracle without doubling the runtime. End-to-end the suite runs in ~10s on a fresh build, well under the 30s budget asserted at the bottom of the test. Also adds a row to docs/testing.md so the test-coverage map points future agents at this file. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Use one Omnigraph handle for both branches Self-review caught that the runner was opening two Omnigraph handles on the same temp dataset (one for main, a second via Omnigraph::open for feature). tests/branching.rs uses one handle and passes the branch name to mutate_branch — same pattern works here and avoids any cache-coherency surprises between the two handles. Also drops the post-merge reopen, which only existed to give the second handle a fresh snapshot. Runtime drops ~10s -> ~9s. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Assert exact conflict count, not subset inclusion cubic and Devin Review both flagged that check_outcome's Expected::Conflicts arm only enforces want ⊆ got, so a regression that produces a spurious extra conflict (e.g. emitting both OrphanEdge and a stray DivergentInsert) would silently pass the truth-table cell. For a deterministic oracle that's the wrong direction — the cell pins the exact conflict-artifact set, not a lower bound. Add an assert_eq!(got.len(), want.len()) before the existence loop. All 36 executable cells still pass; runtime unchanged. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Subsume 4 conflict tests in branching.rs into truth table The four `branch_merge_reports__conflict` tests (DivergentUpdate / DivergentInsert / DeleteVsUpdate / OrphanEdge) were redundant with the deterministic-oracle cells in the new `merge_truth_table.rs` and only added drift risk. To preserve the post-conflict invariant that lived in `branch_merge_reports_divergent_update_conflict` (target unchanged after a failed merge), the truth-table runner now generalizes it: on every `Conflicts` cell, main's state is asserted against `state_after_apply_only(right_op)`. That gives strictly more coverage than the deleted tests carried, since the invariant now applies to all* seven conflict cells, not just one. The `UniqueViolation` and `CardinalityViolation` cases stay in `branching.rs` — they're combinatorial (require >1 op per side with a non-default schema) and out of scope for the pair-wise truth table. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Fix misleading 'Total edges: 0' comment in (AddEdge, RemoveEdge) cell Devin Review flagged that the comment said 'Total edges: 0' while the parenthetical math evaluates to 1 (matching `GraphAssert::base()`). The assertion is correct; only the leading number in the comment was wrong. Reworded to 'Net edges: … = 1 (matches base)' so the prose agrees with both the math and the assertion. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> --------- Co-authored-by: Ragnor <ragnor@modernrelay.com> Co-authored-by: Ragnor Comerford <ragnor.comerford@gmail.com> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-12 22:36:01 +03:00
Ragnor Comerford	3bd072c917	docs: add docs/transactions.md — branch-as-transaction explainer (#69 ) The architectural rule "no cross-query BEGIN/COMMIT; branches fill that role" lives in docs/invariants.md §VI.23 but is not surfaced anywhere user-facing. New users coming from Postgres/MySQL hit the gap when they realize multiple queries on main are independently atomic, not jointly atomic. This page explains the model with worked examples: * Single-query multi-statement (atomic by default) * Two separate queries on main (NOT atomic — common surprise) * Many queries via a branch (atomic at merge) * Coordinating multiple agents via branch-per-agent Plus a comparison table to BEGIN/COMMIT, failure-mode rundown, and "when to use what" decision matrix. Linked from AGENTS.md "Where to find each topic" between branches-commits.md and runs.md. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:35:57 +03:00
Ragnor Comerford	c9c7c0672e	Update README.md	2026-05-12 08:17:31 -07:00
Ragnor Comerford	c2e3a9e5c3	Add use cases for unified company brain and context graphs	2026-05-12 08:08:08 -07:00
Ragnor Comerford	676c9eab05	Merge pull request #78 from ModernRelay/devin/1778363660-mr-901-blob-branch-merge Fix branch merge with blob columns	2026-05-12 07:31:04 -07:00
Ragnor Comerford	d6d2763609	Merge pull request #80 from ModernRelay/devin/1778524905-mr-923-merge-restore-refresh Fix MR-923: refresh restored coordinator on merge Err path	2026-05-11 15:55:43 -07:00
Devin AI	725d41205e	Drop redundant server-level regression test The matrix cell d:merge×change:into-target already exercises this race: pre-fix it flakes ~20% on shared-CPU hardware (sentinel 409s); post-fix it passes 100% regardless of which side of the racing pair returns first. That flake-to-stable transition is the regression signal. The replacement test (concurrent_merge_clean_409_does_not_poison_next_ change_on_target) tried to sharpen this by looping until the clean- 409 path fired and then strictly requiring it. On fast CI hardware the race window never opens in 50 iterations, which made the strict variant fail in CI despite passing 10/10 locally. The bug genuinely needs a real concurrent writer to advance on-disk manifest during the swap window — a deterministic failpoint can't substitute because forcing the merge body to Err without a real concurrent writer leaves no cache-vs-disk drift to validate. Reverting to the matrix cell as the sole regression coverage. Updated the comment in merge.rs accordingly. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:57:47 +00:00
Devin AI	a6c7e5fab5	Use if-let shape for refresh outcome handling Switch from match-on-Result to if-let-Err so the refresh outcome and merge_result outcome are checked independently, making the intent clearer: 'attempt refresh; on Ok-merge-with-refresh-error propagate; on Err-merge-with-refresh-error log and surface the original merge error'. No semantic change — both shapes were valid (wildcard patterns don't move the scrutinee) — but the if-let form sidesteps a needs-second-reading question raised in code review. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:50:26 +00:00
Devin AI	7d1a40102c	Address review feedback merge.rs: best-effort refresh on the Err path so a refresh-time storage error doesn't replace the merge body's structured error (typically the manifest_conflict that the HTTP layer maps to a 409 with a structured payload) with a less informative one. Ok-path behavior is unchanged — there a refresh failure is propagated so the caller knows the coord's cache is unsynced. server.rs: bump MAX_ITERATIONS to 50 and assert at the end that the named clean-409 path actually fired at least once. With ~20% per-iter rate on shared-CPU CI (per the original MR-923 repro), P(no hit in 50) is < 0.002%. Without this assertion the test silently degraded to exercising only the 200-merge path — covered already by the matrix cell. Both changes per Devin Review + cubic comments on PR #80. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:35:18 +00:00
Devin AI	b7353e1dc7	Use refresh_coordinator_only to avoid racing branch_merge's sidecar The previous fix used `self.refresh()` to sync the restored coordinator's cache after the swap-restore window. `refresh` runs the `RollForwardOnly` recovery sweep — which, on the merge Err path with a phase-B failure (sidecar written, per-table HEAD advanced, manifest publish skipped), would observe the merge's own in-flight sidecar and close it here. That violates the contract documented on `Omnigraph::refresh`: > Engine-internal callers that already hold an in-flight sidecar > (e.g. `schema_apply` mid-write) MUST use `refresh_coordinator_only` > to avoid the recovery sweep racing their own sidecar. The post-restore step's purpose is to sync the coord cache with disk, not to run recovery, so `refresh_coordinator_only` is the right primitive on both paths. CI surfaced this via `branch_merge_phase_b_failure_recovered_on_next_open` in `crates/omnigraph/tests/failpoints.rs`, which asserts the sidecar persists after the failpoint fires. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:09:44 +00:00
Devin AI	e91d5615c6	Fix MR-923: refresh restored coordinator on merge Err path branch_merge_impl swaps the coordinator for the merge target, runs the merge body, then restores the original coordinator. A concurrent /change on the same target during this window publishes against the swapped coord, advancing on-disk manifest state that the restored coord doesn't see. The post-restore refresh was previously gated on merge_result.is_ok(), so the clean-409 path (merge body's post_queue_snapshot drift check returning a recoverable conflict) left the restored coord's cached snapshot stale relative to disk. The next sequential /change seeded its publisher expected_versions from that stale cache and 409'd with ExpectedVersionMismatch — a non-retryable conflict surfaced to a caller with no concurrent writer of their own. Refresh on both Ok and Err paths so cached state cannot diverge from the manifest across the swap-restore window. Add a focused regression test (concurrent_merge_clean_409_does_not_poison_next_change_on_target) that loops the cell-d scenario until the clean-409 branch fires and asserts the follow-up sentinel succeeds in that branch specifically. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 20:31:18 +00:00
Devin AI	fca2b74dee	Materialize external blob URIs during branch merge Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 12:54:04 +00:00
Devin AI	da89e18e62	Merge main into blob merge fix Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-10 21:55:02 +00:00
Ragnor Comerford	19e9292ec0	Merge pull request #75 from ModernRelay/ragnorc/mr-686-lance Per-table writer queues + per-actor admission + op-kind-aware version check	2026-05-10 23:50:56 +02:00
Devin AI	7a338a8223	agents: keep guide short for context	2026-05-10 14:41:02 +00:00
Devin AI	4eb865b340	docs: expand 0.4.2 release notes	2026-05-10 14:37:58 +00:00
Devin AI	e44a4704eb	docs: fix admission gating description	2026-05-10 14:16:26 +00:00
Devin AI	a42d178119	release: prepare omnigraph 0.4.2	2026-05-10 14:02:28 +00:00
Devin AI	31b8ffe7b5	engine: inline-delete sidecar covers version-mismatch check	2026-05-10 10:37:46 +00:00
Devin AI	01660faa26	Tighten blob descriptor validation Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-10 09:28:44 +00:00
Devin AI	16ac166059	Fix branch merge with blob columns Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-09 22:33:29 +00:00
Devin AI	6a3f0677ae	server: drop unwired try_admit_rewrite / 503 admission surface	2026-05-09 20:58:17 +00:00
Devin AI	4bb7964af9	tests: matrix cell k asserts post-reopen row count	2026-05-09 20:16:44 +00:00
Devin AI	708e170dc5	engine: branch-merge revalidates target snapshot under queue	2026-05-09 20:16:12 +00:00
Devin AI	a6d244e648	engine: strict drift check uses read-time pin	2026-05-09 20:06:25 +00:00
Ragnor Comerford	f9a0f31f80	server: drop 503 from OpenAPI on admission-gated endpoints (unreachable) Cursor Bugbot LOW on commit `3ad359d`: try_admit_rewrite is defined and tested but no HTTP handler calls it; the six handler OpenAPI annotations declared status = 503 (added in `8e1a8e7`) but try_admit (the only path handlers invoke) returns 429 only. 503 was unreachable. Fix: remove (status = 503, ...) from the six handler OpenAPI annotations and regenerate openapi.json. Kept as forward-looking infrastructure: try_admit_rewrite, global rewrite semaphore, RejectReason::GlobalRewriteExhausted, ApiError::ServiceUnavailable, the 503 branch in IntoResponse, --global-rewrite-cap, and OMNIGRAPH_GLOBAL_REWRITE_MAX. When a future commit wires try_admit_rewrite into a handler, the 503 OpenAPI annotation lands alongside that wiring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 21:54:24 +02:00
Ragnor Comerford	3ad359db8b	tests: admission test uses new_with_workload, drops env mutation + #[serial] Migrates `ingest_per_actor_admission_cap_returns_429` from env-var override to direct `WorkloadController::new(1, ...)` construction via `AppState::new_with_workload`. Removes the `EnvGuard` and the `#[serial]` annotation that paired with it. Why correct by design (AGENTS.md rule 9): the previous round's matrix fix (commit `8bd9a5f`) shielded the matrix from this test's env mutation, but the broader bug class — "test A's process-wide env mutation can leak into any test B that calls `AppState::open` / `WorkloadController::from_env()`" — was still reachable by any future test that didn't think to opt out. Closing the class at the source: this test no longer mutates global state at all, so no other test needs to defend against it. Net effect: - This test no longer needs `#[serial]` (was the only reason it was marked) — runs in parallel with the rest of the suite. - The matrix's defensive `with_defaults()` construction (commit `8bd9a5f`) remains correct but is no longer required for correctness; it's now a "belt and suspenders" guard against any FUTURE env-mutating test. Verified locally: both tests pass when run together; full server suite (44 tests) green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:35:41 +02:00
Ragnor Comerford	8bd9a5ff14	tests: matrix harness uses with_defaults() workload, not from_env() Round 4 CI failure: Test Workspace and server-aws both red on `concurrent_branch_ops_morphological_matrix` cell b ("merge × merge: same-target-distinct-sources") — second merge returned 429 instead of 200. The matrix passes locally. Root cause: cargo test runs tests in parallel by default. The admission test `ingest_per_actor_admission_cap_returns_429` is wrapped with `#[serial]` and an EnvGuard that sets `OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=1` for its duration. Process-wide env vars are visible to concurrently-running tests; the matrix's `Harness::new()` called `AppState::open()` which delegates to `WorkloadController::from_env()`, picking up cap=1 if it ran while the admission test held the EnvGuard. With cap=1 + 2 concurrent merges in cell b, one merge waits behind merge_exclusive while the other is admitted; the waiter holds its admission permit, but a fresh actor permit is needed when admission is per-actor — the second merge's permit acquisition fails because the first hasn't released yet, and 429 fires. Fix (correct by design, AGENTS.md rule 9): the matrix harness builds the WorkloadController explicitly via `WorkloadController::with_defaults()` and passes it to `AppState::new_with_workload`, the constructor added in commit `22d76db`. Closes the bug class "tests pick up another concurrent test's env override at construction time" — the matrix is now insulated from any env-var manipulation in the rest of the test suite. Verified locally: with `OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=1` set in the environment, the matrix passes (it ignores env entirely now). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:19:42 +02:00
Ragnor Comerford	99b0941478	tests: remove three narrow concurrent_branch_* tests subsumed by T1 The previous commit added `concurrent_branch_ops_morphological_matrix` covering 11 cells with stronger assertions (identity + post-op /change + reopen). The three narrow tests it replaces: - concurrent_branch_create_from_distinct_parents_does_not_corrupt_coordinator → matrix cell f, with identity assertions added - concurrent_branch_merges_distinct_targets_do_not_swap_into_each_other → matrix cells a + b + c, with identity assertions that close the symmetric-swap blind spot cubic flagged on commit `64f2b99` - concurrent_change_during_branch_merge_preserves_writes → matrix cell d The matrix retains the original tests' diagnostic granularity through named cell labels in every assertion message ("[a:merge×merge:distinct-targets] merge a"), so a CI failure points to the exact cell + invariant. Net: 522 lines removed, 0 coverage lost. All other server tests pass unchanged (44 total). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:09:21 +02:00
Ragnor Comerford	ac8594462e	tests: branch-ops morphological matrix (T1) Replaces three narrow concurrent_branch_* tests (folded in below) with one parameterized matrix test covering 11 representative (op_a, op_b, target_overlap) cells, asserting C1-C6 uniformly: C1 — both complete (no deadlock; tokio::time::timeout(15s)) C2 — status: both 200 or exactly one clean conflict; never 500 C3 — per-target row count C4 — per-target row identity (named persons present + absent — catches the symmetric-swap class that count assertions miss; cubic P2 on commit `64f2b99` flagged this gap on the round-3 merge race test) C5 — engine state coherent (subsequent /snapshot consistent) C6 — post-op /change on main succeeds (engine isn't poisoned) Cells: a. Merge × Merge, distinct targets — branch_merge_impl race pin b. Merge × Merge, same target / distinct sources — merge_exclusive serialization c. Merge × Merge, same source / distinct targets — fanout d. Merge × Change, into target — per-(table, branch) queue e. Merge × BranchCreateFrom, target — interaction with refresh path f. BranchCreateFrom × BranchCreateFrom, distinct parents — round-1 race pin g. BranchCreateFrom × BranchDelete, unrelated branches — disjoint state h. BranchDelete × BranchDelete, distinct branches — concurrent refresh i. BranchDelete × Change, distinct branch — refresh-side vs writer j. BranchCreateFrom × Change, on source — fork-while-writing k. Reopen consistency after concurrent pair — disk-vs-cache drift Each cell: - spins up its own tempdir + AppState so failures don't cascade, - aligns the pair at a tokio::sync::Barrier so both reach the engine close in time, - wraps in a 15s deadlock timeout, - asserts identity via a /read with the `get_person` fixture query (specific names must be present on the right branch and absent from the wrong one). Subsumes: - concurrent_branch_create_from_distinct_parents_does_not_corrupt_coordinator (now cell f, with identity assertions added) - concurrent_branch_merges_distinct_targets_do_not_swap_into_each_other (now cells a + b + c, with identity assertions; the symmetric-swap blind spot cubic flagged on commit `64f2b99` is closed) - concurrent_change_during_branch_merge_preserves_writes (now cell d) Those three narrow tests are removed in the next commit so this lands green standalone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:07:37 +02:00
Ragnor Comerford	64f2b994f5	bench: assert --heavy-concurrency > 0 instead of silently clamping Closes the cubic P2 finding on commit `22d76db`: `Semaphore::new(concurrency.max(1))` silently coerced --heavy-concurrency=0 to 1, so the JSON output reported 0 while execution actually used 1. Reported settings differed from actual. Adds an explicit `--heavy-concurrency > 0` check in `main()` (with a helpful error message pointing to --heavy-batches=0 as the way to disable heavy traffic) and a defensive `assert!()` inside `drive_heavy_actor` so future callers can't pass 0 silently. Verified: `bench_actor_isolation --heavy-concurrency 0` exits with code 2 and the explanatory error message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 19:23:02 +02:00
Ragnor Comerford	3e6b2af4e9	engine: serialize concurrent branch merges via merge_exclusive mutex Closes the Cursor Bugbot HIGH on commit `22d76db` (round 2 review): `branch_merge_impl` at `crates/omnigraph/src/exec/merge.rs:1085-1100` still used the swap_coordinator_for_branch + operate + restore_coordinator pattern across three separate `coordinator.write().await` acquisitions. Two concurrent merges with distinct targets would interleave their swaps, leaving each merge's body running against the other's swapped coord — A's `feature_a → target_a` would land its rewrite in target_b instead. Adds `merge_exclusive: Arc<tokio::sync::Mutex<()>>` to `Omnigraph`, held across the entire swap → operate → restore window in `branch_merge_impl`. Concurrent branch merges now serialize relative to each other; everything else (per-(table, branch) writer queues, /change, /ingest) is unaffected. Why the mutex rather than the deeper "operate on local coord" refactor (the round-1 fix shape applied to `branch_create_from`): `branch_merge_on_current_target` calls `self.snapshot()` and `self.ensure_commit_graph_initialized()` internally, which use `self.coordinator` directly. Threading an explicit target coord parameter through the merge body would unwind dozens of call sites. The mutex is a smaller intrusion that fully closes the race. Documented as a follow-up if telemetry shows merge concurrency matters. Pinned by `concurrent_branch_merges_distinct_targets_do_not_swap_into_each_other` (previous commit). Pre-fix: M=4 iterations of concurrent merges deterministically corrupted target row counts. Post-fix: all M iterations land each merge on its declared target. The two adjacent branch concurrency tests (`concurrent_change_during_branch_merge_preserves_writes`, `concurrent_branch_create_from_distinct_parents_does_not_corrupt_coordinator`) still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 19:14:54 +02:00
Ragnor Comerford	2b2e723125	tests: pin branch_merge swap-restore race (red) Per AGENTS.md rule 8, this commit lands the failing regression test ahead of the fix. Cursor Bugbot HIGH on commit `22d76db` rediscovered the residual flagged in the round 1 honest-review note: `branch_merge_impl` at `crates/omnigraph/src/exec/merge.rs:1085-1100` still uses the swap_coordinator_for_branch + operate + restore_coordinator pattern across three separate `coordinator.write().await` acquisitions. The same shape that branch_create_from_impl shed in commit `4ffbf6e`. The test spawns two concurrent /branches/merge calls A (feature-a → target-a) and B (feature-b → target-b) aligned at a tokio::sync::Barrier so both reach swap_coordinator_for_branch close in time. M=4 iterations boost race-catching odds. Currently fails on `22d76db` with target-a=5, target-b=4: B's merge landed on the wrong coord — target-b never got Frank because A's swap pushed self.coordinator to target-a, B's swap captured target-a as B's "previous", and B's restore set self.coordinator back to target-a (not the original main). Subsequent operations using self.coordinator point at the wrong branch. Fix lands in the next commit: serialize concurrent branch merges via `merge_exclusive: Arc<tokio::sync::Mutex<()>>` held across the entire swap-operate-restore window. Closes the bug class "non-atomic three-step coordinator manipulation" for branch_merge by serializing merges relative to each other; per-(table, branch) queue inside the merge body still lets merges and other writers run concurrently. A deeper "operate on local coord" refactor (the round-1 fix shape for branch_create_from) requires unwinding `branch_merge_on_current_target` and its uses of `self.snapshot()` / `self.ensure_commit_graph_initialized()`; deferred to a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 19:12:03 +02:00

1 2 3 4 5 ...

293 commits