omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Author	SHA1	Message	Date
Andrew Altshuler	da42beec41	policy: chassis fan-out — _as variants on the remaining 6 writers (MR-722) (#103)

PR #102 wired apply_schema_as. This PR completes the chassis-side coverage so every public mutating engine entry point hits the same Omnigraph::enforce(action, scope, actor) gate regardless of transport:

- mutate_as → enforce(Change, Branch(branch), actor)
- load_as → enforce(Change, Branch(branch), actor)
- ingest_as → enforce(Change, Branch(branch), actor); also threads actor through the implicit branch_create_from_as so fresh-branch ingest correctly hits BranchCreate too
- branch_create_as → enforce(BranchCreate, TargetBranch(name), actor)
- branch_create_from_as → enforce(BranchCreate, BranchTransition { source, target }, actor)
- branch_delete_as → enforce(BranchDelete, TargetBranch(name), actor)
- branch_merge_as → enforce(BranchMerge, BranchTransition { source, target }, actor)

Three new _as variants for branch ops (create, create_from, delete) that had no actor surface before; existing actor-less variants delegate with actor=None so the no-policy path is a strict no-op.

HTTP handlers updated to thread the resolved actor into the new _as variants for branch_create and branch_delete (was previously dropped).

14 new SDK chassis tests (one allow + one deny pair per wired writer); the existing 4 apply_schema_as tests stay. All 18 pass.

docs/user/policy.md updated to describe engine-wide enforcement and the coarse-vs-fine layer split (engine = action gate, query layer per-row = MR-725 future). AGENTS.md capability matrix updated to match.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 03:38:18 +03:00
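The writer-to-gate mapping listed in commit da42beec41 can be sketched as a small lookup. This is only an illustration under stated assumptions: the `Writer` enum and `gate_for` function are hypothetical names invented here, the enums are stand-ins for the real `omnigraph-policy` types, and the return shape of `to_branch_pair()` (an optional source branch plus an optional target branch) is a guess at the `(branch, target_branch)` pair the log mentions.

```rust
// Hypothetical stand-ins; the real types live in the omnigraph-policy crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyAction {
    Change,
    BranchCreate,
    BranchDelete,
    BranchMerge,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResourceScope {
    Graph,
    Branch(String),
    TargetBranch(String),
    BranchTransition { source: String, target: String },
}

impl ResourceScope {
    /// Assumed shape: (branch, target_branch), each optional.
    pub fn to_branch_pair(&self) -> (Option<&str>, Option<&str>) {
        match self {
            ResourceScope::Graph => (None, None),
            ResourceScope::Branch(b) => (Some(b.as_str()), None),
            ResourceScope::TargetBranch(t) => (None, Some(t.as_str())),
            ResourceScope::BranchTransition { source, target } => {
                (Some(source.as_str()), Some(target.as_str()))
            }
        }
    }
}

/// The seven writers named in the commit message, as a plain enum
/// (hypothetical; the engine exposes them as methods, not as an enum).
pub enum Writer<'a> {
    Mutate { branch: &'a str },
    Load { branch: &'a str },
    Ingest { branch: &'a str },
    BranchCreate { name: &'a str },
    BranchCreateFrom { source: &'a str, target: &'a str },
    BranchDelete { name: &'a str },
    BranchMerge { source: &'a str, target: &'a str },
}

/// Returns the (action, scope) pair each writer passes to enforce().
pub fn gate_for(writer: &Writer<'_>) -> (PolicyAction, ResourceScope) {
    match writer {
        Writer::Mutate { branch } | Writer::Load { branch } | Writer::Ingest { branch } => {
            (PolicyAction::Change, ResourceScope::Branch(branch.to_string()))
        }
        Writer::BranchCreate { name } => {
            (PolicyAction::BranchCreate, ResourceScope::TargetBranch(name.to_string()))
        }
        Writer::BranchCreateFrom { source, target } => (
            PolicyAction::BranchCreate,
            ResourceScope::BranchTransition {
                source: source.to_string(),
                target: target.to_string(),
            },
        ),
        Writer::BranchDelete { name } => {
            (PolicyAction::BranchDelete, ResourceScope::TargetBranch(name.to_string()))
        }
        Writer::BranchMerge { source, target } => (
            PolicyAction::BranchMerge,
            ResourceScope::BranchTransition {
                source: source.to_string(),
                target: target.to_string(),
            },
        ),
    }
}
```

Keeping the mapping in one place mirrors the commit's stated goal that engine and HTTP evaluate the same decision for the same writer.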
Andrew Altshuler	9973683261	policy: chassis core — omnigraph-policy crate + Omnigraph::enforce() (MR-722) (#102)

PR #2 of the policy chassis series (PR #1 = MR-731, merged in #101). The structural fix that moves Cedar enforcement from HTTP-only to engine-wide. apply_schema is the proof-of-concept writer; PR #3 fans the enforce() call out to the remaining six (mutate_as, load, ingest_as, branch_create_from, branch_delete, branch_merge).

## What lands

### New crate: omnigraph-policy

The 844-line policy.rs moves from `omnigraph-server` into a new `omnigraph-policy` workspace crate so both engine and server can depend on it. Cedar dependency moves with it. The server's policy.rs becomes a re-export shim (`pub use omnigraph_policy::`) so existing `omnigraph_server::PolicyAction` etc. paths keep working — CLI and test consumers don't have to migrate in one go.

### New trait: PolicyChecker

```rust
pub trait PolicyChecker: Send + Sync {
    fn check(&self, action: PolicyAction, scope: &ResourceScope, actor: &str)
        -> Result<(), PolicyError>;
}
```

`PolicyEngine` (Cedar-backed) implements it. `Omnigraph::with_policy()` takes `Arc<dyn PolicyChecker>`. Engine tests mock the trait without spinning up Cedar. MR-725 will extend the trait with `predicate_for()` for query-layer pushdown — additive, no call-site changes.

### New enum: ResourceScope

Four variants — Graph, Branch, TargetBranch, BranchTransition — mapping cleanly to today's `(branch, target_branch)` shape on PolicyRequest via `to_branch_pair()`. Each engine writer picks the variant that matches the existing HTTP-layer convention so engine and HTTP evaluate the same Cedar decision.

Invariant: ResourceScope stays at branch granularity. Per-type and per-row scope are MR-725's territory, not engine-layer's. Adding Type/Row variants here creates two places per-type policy can be evaluated, which can drift. See chassis design refinements comment on MR-722 (2026-05-17).

### Omnigraph::with_policy() + enforce()

New `policy: Option<Arc<dyn PolicyChecker>>` field on Omnigraph, None by default (preserves embedded/dev no-enforcement mode).

* `with_policy(self, checker)` setter — builder-style, consumes self.
* `enforce(action, scope, actor)` — the gate. When policy is None, no-op. When policy is Some AND actor is None, hard error — silent bypass via "I forgot the actor" is exactly the footgun this gate is here to prevent.

### apply_schema_as: first writer wired

* New public method `apply_schema_as(source, options, actor)` that calls `enforce(SchemaApply, TargetBranch("main"), actor)` before acquiring the schema-apply lock or doing any other work.
* Existing `apply_schema(source)` and `apply_schema_with_options(...)` delegate to it with actor=None (no-actor variants).
* HTTP handler `server_schema_apply` updated to call apply_schema_as with the resolved actor. AppState construction injects the PolicyEngine into Omnigraph via `with_policy`.

HTTP-layer authorize_request still fires first; the engine gate is the redundant-but-correct backstop and the only path that protects SDK / embedded callers. PR #3 removes the HTTP redundancy.

### OmniError::Policy

New error variant for engine-layer policy denial / evaluation failure. ApiError::from_omni maps it to 403.

### MR-724 Admin action — Option A reservation

PolicyAction::Admin kept in the enum with a load-bearing doc comment naming its future consumers (hot reload, audit log query, approvals list per MR-726 / MR-732 / MR-734). No enforce(Admin, ...) call site exists yet — the variant is reserved so the action vocabulary is complete from chassis day one. MR-724 closes when the first consumer surface ships.

### New SDK-side integration test

`crates/omnigraph/tests/policy_engine_chassis.rs` — four tests covering:

* Policy denies for unauthorized actor → OmniError::Policy
* Policy permits for authorized actor → apply succeeds
* Policy installed + no actor → hard error (forget-the-actor footgun)
* No policy → no-op (embedded/dev default still works)

These exercise the engine path directly — no HTTP layer involved.

## Test results

- cargo test --workspace --locked --no-fail-fast: 851 passed, 0 failed
  * 45 server tests (existing) pass
  * 14 schema_apply tests (existing) pass
  * 4 new chassis tests pass
  * 60 OpenAPI tests pass (no HTTP API surface changes)
  * No regressions across the workspace

## Architectural decisions baked in

Per MR-722 chassis design refinements comment (2026-05-17):

1. PolicyChecker is a trait, not just a concrete. Engine and server consume the trait. MR-725 adds predicate_for() additively.
2. ResourceScope stays at branch granularity. No Type/Row variants.
3. Coarse-vs-fine framing pinned: engine-layer is action gate; query-layer (MR-725) is predicate gate. Both backed by same Cedar engine; non-overlapping responsibilities.
4. Admin action reserved for policy-management surfaces (MR-724 Option A).

## Pending follow-ups (PR #3+)

- Fan-out enforce() to mutate_as, load, ingest_as, branch_create_from, branch_delete, branch_merge (PR #3).
- Remove HTTP-layer authorize_request redundancy once engine gate covers all writers (PR #3).
- CLI policy injection into Omnigraph for non-`policy validate|test|explain` subcommands (PR #3 or follow-up).
- MR-723 default-deny 3-state matrix (PR #4).
- MR-736 severity warn/deny (PR #5).
- AGENTS.md scope-of-enforcement rewrite once chassis fully lands.
- Coarse-vs-fine framing in docs/user/policy.md.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 00:36:36 +03:00
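The three-way behavior of the gate described in commit 9973683261 (no policy: no-op; policy without actor: hard error; policy with actor: delegate to the checker) can be sketched as below. The `PolicyChecker` trait signature is the one quoted in the commit message; everything else is an assumption made for illustration: the `Omnigraph` struct is a stripped-down stand-in, `AllowOnly` is a hypothetical mock, and the enums carry only the variants this example needs.

```rust
use std::sync::Arc;

// Minimal stand-ins for the real omnigraph-policy types (assumed shapes).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyAction {
    SchemaApply,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResourceScope {
    TargetBranch(String),
}

#[derive(Debug, PartialEq, Eq)]
pub struct PolicyError(pub String);

// Trait signature as quoted in the commit message.
pub trait PolicyChecker: Send + Sync {
    fn check(
        &self,
        action: PolicyAction,
        scope: &ResourceScope,
        actor: &str,
    ) -> Result<(), PolicyError>;
}

/// Hypothetical mock: permits exactly one actor, denies everyone else.
pub struct AllowOnly(pub &'static str);

impl PolicyChecker for AllowOnly {
    fn check(
        &self,
        _action: PolicyAction,
        _scope: &ResourceScope,
        actor: &str,
    ) -> Result<(), PolicyError> {
        if actor == self.0 {
            Ok(())
        } else {
            Err(PolicyError(format!("actor '{actor}' denied")))
        }
    }
}

/// Stand-in for the engine handle; only the policy field is modeled.
#[derive(Default)]
pub struct Omnigraph {
    policy: Option<Arc<dyn PolicyChecker>>,
}

impl Omnigraph {
    /// Builder-style setter that consumes self.
    pub fn with_policy(mut self, checker: Arc<dyn PolicyChecker>) -> Self {
        self.policy = Some(checker);
        self
    }

    /// The gate: no policy is a no-op, a policy without an actor is an error.
    pub fn enforce(
        &self,
        action: PolicyAction,
        scope: &ResourceScope,
        actor: Option<&str>,
    ) -> Result<(), PolicyError> {
        match (&self.policy, actor) {
            (None, _) => Ok(()),
            (Some(_), None) => Err(PolicyError(
                "policy installed but no actor supplied".to_string(),
            )),
            (Some(checker), Some(actor)) => checker.check(action, scope, actor),
        }
    }
}
```

Because the engine depends only on the trait, a test can install a mock like `AllowOnly` without pulling in Cedar, which is the property the commit message highlights.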
Andrew Altshuler	7a86f654d4	policy: codify signed-token-claim-only actor identity (MR-731) (#101 ) Warm-up commit for the policy chassis epic (MR-722). PR #1 of the chassis series — same role as schema-lint v1's commit #1 baseline. Zero behavioral change; establishes the regression test, the load-bearing doc comment, and the user-doc paragraph for an invariant already true in code. Server auth already resolves `actor_id` from the matched bearer token at `omnigraph-server/src/lib.rs:692-694`, overwriting whatever the handler put in the PolicyRequest. The principle is named in docs/dev/invariants.md Hard Invariant 11 ("clients cannot set actor identity directly"). What was missing: a regression test, a load-bearing doc comment at the resolution site, and a user-facing documentation paragraph. This commit adds all three. Why first. The actor-identity invariant is the foundation every other policy decision stands on. If `actor_id` can be spoofed, every chassis primitive (per-row scope, audit log, two-person rule) becomes ungated. Pinning the invariant first means PR #2 (the chassis core) doesn't have to re-prove this assertion. Changes: * crates/omnigraph-server/tests/server.rs — new regression test actor_id_resolves_from_bearer_token_ignoring_client_supplied_headers with three sub-assertions: - spoof-up: bearer for denied actor + X-Actor-Id naming allowed actor → 403 (header doesn't promote) - spoof-down: bearer for allowed actor + X-Actor-Id naming denied actor → 200 (header doesn't demote) - empty-string spoof: empty X-Actor-Id doesn't clear resolved actor Cross-link to MR-777 (auth boundary cases — actor-id collision + malformed bearer) noted in the test docstring. * crates/omnigraph-server/src/lib.rs — expanded doc comment at the actor-resolution site explaining the SECURITY INVARIANT, citing Hard Invariant 11, the Supabase RLS history footgun, and the regression test that pins the contract. 
Reader thinking "I should let clients override actor_id for impersonation" hits this comment first. * docs/user/policy.md — new "Actor identity (signed-claim-only)" section near the existing Server enforcement section. Closes the user-facing doc gap MR-731's "Done when" requires. Architectural decisions for PR #2+ pinned this session (not implemented here, recorded so future implementers don't re-litigate): - PolicyEngine moves to new `omnigraph-policy` workspace crate so both engine and server can depend on it (Q2). - `enforce(action, scope, actor)` will take a new `ResourceScope` enum, leaving room for MR-725's per-type and per-row variants (Q3). - `PolicyAction::Admin` is kept and wired (Option A) — meta-action for policy-management surfaces (hot reload, audit log query, approvals list) as those consumer features land (Q4). Test results: - cargo test -p omnigraph-server --test server: 45 pass (44 existing + 1 new); no regressions - scripts/check-agents-md.sh: passes (34 links / 33 docs OK) Out of scope (PR #2+): - Omnigraph::with_policy() + enforce() method - omnigraph-policy crate creation - ResourceScope enum - CLI policy injection into Omnigraph - HTTP-layer redundant-check removal - MR-724 Admin action wiring (PR #2) - MR-723 default-deny 3-state (PR #4) - MR-736 severity warn/deny (PR #5) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:51:34 +03:00
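The invariant pinned by commit 7a86f654d4 (actor identity comes only from the matched bearer token, never from a client-supplied header) can be illustrated with a small pure function. This is a sketch under assumptions, not the server's code: `Request`, `resolve_actor`, and the token table are hypothetical, and real token handling would involve signature or hash checks that are omitted here.

```rust
use std::collections::HashMap;

/// Hypothetical request view; only the two fields relevant to the invariant.
pub struct Request<'a> {
    pub bearer_token: Option<&'a str>,
    /// Client-supplied header such as X-Actor-Id. Never trusted.
    pub x_actor_id: Option<&'a str>,
}

/// Resolves the actor purely from the bearer token.
/// `tokens` maps token -> actor id (an assumed, simplified token store).
pub fn resolve_actor<'t>(
    tokens: &'t HashMap<String, String>,
    req: &Request<'_>,
) -> Option<&'t str> {
    // Deliberately ignores req.x_actor_id: the header can neither promote,
    // demote, nor clear the identity attached to the token.
    let token = req.bearer_token?;
    tokens.get(token).map(String::as_str)
}
```

The three sub-assertions named in the commit (spoof-up, spoof-down, empty-string spoof) all reduce to the same property here: the result depends on `bearer_token` alone.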
Andrew Altshuler	a6e037547f	schema-lint chassis v1.2: --allow-data-loss flag + Hard mode (MR-694) — completes v1 (#100 ) * schema-lint v1 commit 5: --allow-data-loss flag + Hard mode Final v1 commit. Wires up the --allow-data-loss CLI flag and Hard mode for both DropProperty and DropType. Per docs/dev/schema-lint-v1-plan.md, commit #5 of the schema-lint chassis v1 series (MR-694). CLI (omnigraph-cli/src/main.rs): - New --allow-data-loss flag on both `omnigraph schema plan` and `omnigraph schema apply` subcommands. Off by default (Soft). - HTTP remote schema apply explicitly rejects the flag for now (CLI-only; HTTP parity is a separate small follow-up that adds the field to SchemaApplyRequest + the server handler). Engine (omnigraph.rs + schema_apply.rs): - New SchemaApplyOptions { allow_data_loss: bool } public struct (Default = all false), re-exported via omnigraph::db::SchemaApplyOptions. - New public methods: plan_schema_with_options and apply_schema_with_options. Existing plan_schema/apply_schema are now thin wrappers that pass Default::default(). - promote_drops_to_hard: post-plan walk that promotes every DropMode::Soft step to DropMode::Hard when the flag is set. Keeps the compiler's plan_schema_migration signature unchanged (no breaking change for tests / callers). - Apply path: both Drop arms accept Hard mode; behavior is identical to Soft inside the apply loop. The DIFFERENCE is the new hard_cleanup_targets: Vec<(String, String)> accumulator, populated for every Hard variant with (table_key, full_dataset_uri). - Post-publish cleanup: a new loop after the manifest commit iterates hard_cleanup_targets and calls cleanup_old_versions (before_timestamp = now) on each dataset URI. Best-effort — the apply is already durable; cleanup failure is logged via tracing::warn rather than failing the apply. - New cleanup_dataset_old_versions helper inlines the Lance cleanup_old_versions call against a dataset URI. 
Behavioral details: - DropProperty Hard: stage_overwrite produced a new dataset version without the column. cleanup_old_versions removes the prior version (and reclaims unique fragments). After Hard apply, snapshot_at_version(pre_drop).open(table_key) FAILS because the prior dataset version was reclaimed. - DropType Hard: no per-table write happens (the change is the manifest tombstone). cleanup_old_versions on the orphan dataset is a no-op in the immediate term (no prior versions to clean since the dataset wasn't modified by this apply). The dataset directory persists. Full orphan-cleanup is a documented follow-up — the user-facing contract is "data is unreachable via omnigraph" (manifest entry tombstoned), which is satisfied. Tests (tests/schema_apply.rs): - apply_schema_with_allow_data_loss_promotes_drops_to_hard: default plan emits Soft; with options.allow_data_loss=true, plan emits Hard; apply succeeds. - apply_schema_hard_drops_property_makes_prior_version_unreachable: Hard drop succeeds, current snapshot lacks the column, and snapshot_at_version(pre_drop).open("node:Person") FAILS (Lance prior version reclaimed by cleanup). - apply_schema_hard_drops_node_and_edge_with_flag_succeeds: both Node and Edge DropType variants are promoted to Hard with the flag; apply succeeds; current manifest entries gone. (Orphan dataset directory cleanup deferred.) Test results: - cargo test -p omnigraph-compiler --lib: 239 passed - cargo test -p omnigraph-engine --test schema_apply: 14 passed (3 new Hard tests + 11 existing soft/regression tests) - cargo test -p omnigraph-server --test openapi: 60 passed (no HTTP API surface changes in this commit; OpenAPI parity follow-up noted) v1 status: complete for CLI/embedded use. MR-694 chassis epic + MR-700 DropType/DropProperty ticket can close after this lands. Known follow-ups (separate small PRs): - HTTP parity: extend SchemaApplyRequest with allow_data_loss field, thread through server handler, regenerate openapi.json. 
- Orphan-dataset directory deletion for DropType Hard (currently the dataset directory persists; cleanup_old_versions doesn't remove it because the dataset wasn't modified). - MR-948 substrate alignment: swap DropProperty Soft from stage_overwrite to Dataset::drop_columns (catalog_only vs full_rewrite cost class). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fixup: use bail! from color_eyre::eyre instead of anyhow The remote-rejection branch in SchemaCommand::Apply used anyhow::anyhow! which isn't in scope; the CLI's Result type is color_eyre::eyre::Result and bail! is already imported. Caught by CI Test Workspace job on PR #100. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 22:12:46 +03:00
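The post-plan promotion step described in commit a6e037547f can be sketched roughly as follows. The variant and field names follow the commit messages in this log, but the types are simplified stand-ins: `Other` is a hypothetical placeholder for every non-drop step, and the function signature is an assumption rather than the engine's actual one.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DropMode {
    Soft,
    Hard,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeKind {
    Node,
    Edge,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SchemaMigrationStep {
    DropType { type_kind: TypeKind, name: String, mode: DropMode },
    DropProperty {
        type_kind: TypeKind,
        type_name: String,
        property_name: String,
        mode: DropMode,
    },
    /// Hypothetical placeholder for all other step kinds.
    Other(String),
}

/// Default is all-false, so drops stay Soft unless the flag is set.
#[derive(Debug, Clone, Copy, Default)]
pub struct SchemaApplyOptions {
    pub allow_data_loss: bool,
}

/// Walks an already-computed plan and promotes every drop step to Hard
/// when allow_data_loss is set; leaves the plan untouched otherwise.
pub fn promote_drops_to_hard(steps: &mut [SchemaMigrationStep], options: SchemaApplyOptions) {
    if !options.allow_data_loss {
        return;
    }
    for step in steps.iter_mut() {
        match step {
            SchemaMigrationStep::DropType { mode, .. }
            | SchemaMigrationStep::DropProperty { mode, .. } => *mode = DropMode::Hard,
            SchemaMigrationStep::Other(_) => {}
        }
    }
}
```

Doing the promotion as a separate pass is what lets the planner's own signature stay unchanged, as the commit message notes.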
Andrew Altshuler	58cee158d8	schema-lint v1 commit 4: emit + apply DropType { Soft } (#99 ) Wire the second half of the dormant Drop* family. Per docs/dev/schema-lint-v1-plan.md, commit #4 of the schema-lint chassis v1 series (MR-694). Builds on commit #3 (PR #90, DropProperty Soft). Planner (schema_plan.rs): - plan_nodes leftover loop: emit DropType { Node, name, Soft } instead of UnsupportedChange (OG-DS-102) for node-type removals. - plan_edges leftover loop: emit DropType { Edge, name, Soft } instead of UnsupportedChange (OG-DS-103) for edge-type removals. Apply (schema_apply.rs): - New dropped_tables: BTreeSet<String> accumulator alongside added_tables / renamed_tables / rewritten_tables. - DropType arm in the metadata loop populates dropped_tables for Soft mode. Hard mode errors (lands in commit #5 with --allow-data-loss). - New tombstone-emission loop after the rename sidecar build: for each dropped table, push to sidecar_tombstones AND populate table_tombstones with table_version + 1. The existing manifest publish path converts table_tombstones into ManifestChange::Tombstone operations — no new manifest plumbing needed. - Soft DropType has no Phase B per-table write; the tombstone is the entire change. Lance dataset files are retained — prior __manifest versions still reference them, so time travel + branch-from-snapshot can read the dropped table until cleanup_old_versions runs. - Rides on SidecarKind::SchemaApply per MR-847 (already established by commit #3). Tests: - Planner unit test plan_emits_soft_drop_for_removed_node_and_edge_types asserts both Node and Edge DropType { Soft } emission for the Company + WorksAt combined drop, plus no UnsupportedChange. - Integration test apply_schema_drops_node_and_referencing_edge_softly (replaces apply_schema_rejects_dropping_a_node_type): asserts plan emission, apply success, current manifest entries absent, pre-drop manifest entries present (time-travel reversibility), reopen consistency. 
- Integration test apply_schema_drops_an_edge_type_softly (replaces apply_schema_rejects_dropping_an_edge_type): single edge drop, asserts other tables untouched, time-travel reversibility. Test results: - cargo test -p omnigraph-compiler --lib: 239 passed (1 new + 238) - cargo test -p omnigraph-engine --test schema_apply: 11 passed (2 converted + 9 unchanged) Pending for v1 completion: - Commit #5: --allow-data-loss CLI flag + Hard mode promotion in planner + immediate compact_files + cleanup_old_versions for both DropProperty and DropType. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 20:25:42 +03:00
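The tombstone-emission loop from commit 58cee158d8 (each dropped table is pushed to sidecar_tombstones and recorded in table_tombstones at table_version + 1) might look something like this. It is a sketch under assumptions: the `Tombstones` struct, the `emit_tombstones` function, the `u64` version type, and the behavior of skipping unknown table keys are all hypothetical choices made for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Output of the tombstone-emission loop in this sketch.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Tombstones {
    pub sidecar_tombstones: Vec<String>,
    pub table_tombstones: BTreeMap<String, u64>,
}

/// For every dropped table key, records a tombstone at current version + 1.
/// `table_versions` is an assumed view of the current manifest versions.
pub fn emit_tombstones(
    dropped_tables: &BTreeSet<String>,
    table_versions: &BTreeMap<String, u64>,
) -> Tombstones {
    let mut out = Tombstones::default();
    for table_key in dropped_tables {
        // Skipping keys the manifest does not know is an assumption here.
        if let Some(version) = table_versions.get(table_key) {
            out.sidecar_tombstones.push(table_key.clone());
            out.table_tombstones.insert(table_key.clone(), version + 1);
        }
    }
    out
}
```

No dataset is written in this step: as the commit message puts it, for a Soft DropType the tombstone is the entire change.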
Andrew Altshuler	e98347eb7b	schema-lint chassis v1.0: DropProperty Soft + code-tagged diagnostics (MR-694) (#90 ) * schema-lint chassis v1 (WIP): tier surfacing + plan doc First commit of the chassis v1 branch. Lands a small, foundational slice without behavior change, plus a planning doc that lays out the remaining 7 commits in sequence so the PR can be reviewed incrementally. This commit: - Adds SchemaMigrationStep::diagnostic() returning the full &'static DiagnosticCode (family + tier + severity) for UnsupportedChange steps with codes. Renderers can now reach the tier without re-implementing the code → tier lookup. - CLI `omnigraph schema plan` output now displays tier alongside code: unsupported change on node:Person.age [OG-DS-104, destructive]: removing property 'Person.age' is not supported in schema migration v1 Operators see at-a-glance the kind of risk each rejection represents — not just the rule identifier. - No behavior change. All 11 existing schema_apply tests still pass. Planning doc at docs/schema-lint-v1-plan.md tracks the 7 remaining commits to bring v1 to feature-complete: 1. (this commit) Tier surfacing in plan output. 2. Soft / Hard mode enum on drop steps. 3. Tombstone fields on catalog IR. 4. Planner emits DropProperty { Soft } by default. 5. Apply path implements Soft mode. 6. Convert PR #62 destructive-rejection tests. 7. --allow-data-loss flag + Hard mode. 8. (optional) Tombstone unhide / restore command. Delete the planning doc when v1 lands. Intentionally checked in to the WIP branch so the scope is reviewable; not intended as a permanent doc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * schema-lint v1 commit 2: DropMode + dormant Drop* variants Second commit of the chassis v1 branch. Lands the type-level shape of soft/hard drops without wiring them up. 
Variants are reachable from emitters but the planner doesn't produce them yet; the apply path returns an explicit not-yet-implemented error if one shows up via deserialization. Added: - `DropMode { Soft, Hard }` — orthogonal to `SafetyTier`. Tier classifies the rule's risk class; mode is the operator's intent for data treatment. - `Soft` → catalog tombstone, data retained. Tier: safe. - `Hard` → Lance-level removal. Tier: destructive; will require --allow-data-loss to apply (commit 7). - `SchemaMigrationStep::DropType { type_kind, name, mode }` and `SchemaMigrationStep::DropProperty { type_kind, type_name, property_name, mode }` variants. - Re-export `DropMode` from `omnigraph_compiler::DropMode` so downstream crates don't reach into the catalog submodule. - CLI `render_schema_plan_step` arms for both variants, surfacing the mode in plan output: `drop property 'Person.age' of node 'Person' (soft mode)`. - `apply_schema_with_lock` exhaustive match arm for the two new variants that returns `manifest_internal` with a clear not-yet-implemented message. If a SchemaIR JSON containing Drop{Type,Property} arrives (e.g. from a future tool or hand- written), the apply path fails explicitly rather than silently misclassifying. - Two new in-source tests: - `drop_steps_round_trip_through_serde` — pins the wire shape for all four (variant × mode) combinations. - `drop_mode_serde_uses_snake_case` — pins external-tool- friendly serialization (`"soft"` / `"hard"`). Build: clean, only pre-existing warnings. Tests: - omnigraph-compiler schema_plan: 6/6 (4 existing + 2 new). - omnigraph-engine schema_apply: 11/11 (unchanged — planner still emits UnsupportedChange for removal paths). Next commit (commit 3 per docs/schema-lint-v1-plan.md): add the `tombstoned: bool` fields to NodeIR / EdgeIR / PropertyIR for the catalog representation of soft-mode tombstones. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * plan doc: reframe v1 around Lance native drop_columns After a substrate audit of the Lance data-evolution guide on 2026-05-13, the v1 plan was simplified. Two key findings: 1. Lance's `drop_columns()` is already metadata-only and reversible via time travel until cleanup. No need for a parallel `tombstoned: bool` field in our catalog IR — Lance's version graph IS the tombstone. 2. The full schema_apply substrate migration (add_columns, drop_columns, alter_columns vs. stage_overwrite across all step types) is consolidated in MR-948 as a sibling issue. v1 only uses the relevant slice (drop_columns for OG-DS-1XX). Net plan changes: - Commit 3 (original): tombstone fields on catalog IR → dropped. No catalog IR change needed. The Lance drop_columns commit IS the tombstone. - Commit 5 (original): apply path writes tombstoned: true → replaced with: apply path calls Dataset::drop_columns([name]). - Commit 7 Hard mode: stage_overwrite removing the column → replaced with: drop_columns + compact_files + cleanup_old_versions. Same APIs omnigraph cleanup already uses. - Commit 8 (original): omnigraph schema unhide → dropped. Time travel is the undo (omnigraph snapshot --at <commit>). Net result: 8 commits → 5 commits. ~250 LoC less surface. More substrate-aligned. The chassis types from commit 2 (DropMode enum, DropType / DropProperty variants) are kept exactly as designed; only the implementation strategy changed. The Lance docs quote is included in the doc so future readers see the substrate behavior cited verbatim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * schema-lint v1 commit 3: emit + apply DropProperty { Soft } Wire the dormant DropProperty variant end-to-end for the Soft case. Per docs/schema-lint-v1-plan.md, commit #3 of the schema-lint chassis v1 series (MR-694). 
Planner (schema_plan.rs): - plan_properties: emit DropProperty { type_kind, type_name, property_name, mode: Soft } instead of UnsupportedChange when a property exists in accepted but not in desired. Plan is now supported = true for drop-only changes. Apply (schema_apply.rs): - Route DropProperty { Soft } through rewritten_tables. The existing batch_for_schema_apply_rewrite path already iterates the target schema fields, so a property absent from desired_catalog is naturally projected away. The prior Lance version retains the dropped column for time-travel reversibility (until cleanup runs). - DropType still errors (lands in commit #4 with different mechanics: __manifest entry removal instead of column projection). - DropProperty { Hard } still errors (lands in commit #5 with --allow-data-loss CLI flag + immediate compact_files + cleanup_old_versions). Tests: - Planner unit test plan_emits_soft_drop_for_removed_nullable_property asserts the variant emission + supported = true + no UnsupportedChange. - Integration test apply_schema_drops_a_nullable_property_softly_ preserves_prior_version (replaces the former apply_schema_rejects_dropping_a_property_with_data) asserts: (a) plan contains DropProperty { Soft } (b) apply succeeds + manifest advances + row count unchanged (c) current dataset schema lacks the dropped column (d) snapshot_at_version(pre_drop) still has the dropped column (e) reopen consistency — drop preserved across engine restart Recovery: rides on SidecarKind::SchemaApply per MR-847. No new sidecar kind needed; the entire apply path is already sidecar-wrapped. Substrate alignment: this commit uses the stage_overwrite full-rewrite path (full_rewrite cost class) rather than Lance native drop_columns (catalog_only cost class). MR-948 is the follow-up substrate-alignment refactor that introduces a LanceColumnOp surface and switches the metadata-only case onto drop_columns. Functional outcome is identical; cost-class improvement deferred. 
Test results: - cargo test -p omnigraph-compiler --lib: 238 passed - cargo test -p omnigraph-engine --test schema_apply: 11 passed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: move schema-lint-v1-plan into docs/dev/ + add to index Post-rebase fixup for the docs split (#93). The plan doc was added to docs/ at the top level before main reorganized to docs/{user,dev}/. This moves it into docs/dev/ and adds an entry to docs/dev/index.md under a new "Active Implementation Plans" section so the check-agents-md.sh link check passes. Per the original commit message (`617a77d`), the plan doc is intentionally temporary — it will be deleted when v1 lands. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 16:30:03 +03:00
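The planner rule from commit e98347eb7b (a property present in the accepted schema but absent from the desired one yields a DropProperty step in Soft mode) is essentially a set difference. The sketch below assumes properties are identified by name only; `plan_dropped_properties` and the flat `DropProperty` struct are hypothetical simplifications of the real `SchemaMigrationStep::DropProperty` variant.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DropMode {
    Soft,
    Hard,
}

/// Flattened stand-in for the DropProperty migration step.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DropProperty {
    pub type_name: String,
    pub property_name: String,
    pub mode: DropMode,
}

/// Emits one Soft drop per property that exists in `accepted` but not in
/// `desired`. Properties only in `desired` are additions and are ignored here.
pub fn plan_dropped_properties(
    type_name: &str,
    accepted: &BTreeSet<String>,
    desired: &BTreeSet<String>,
) -> Vec<DropProperty> {
    accepted
        .difference(desired)
        .map(|property| DropProperty {
            type_name: type_name.to_string(),
            property_name: property.clone(),
            mode: DropMode::Soft,
        })
        .collect()
}
```

Emitting Soft by default keeps the plan reversible through time travel; promotion to Hard is a separate, opt-in step.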
Ragnor Comerford	5c889f8e42	Update README.md	2026-05-15 18:06:25 -07:00
Ragnor Comerford	e046c193bc	Update README.md	2026-05-15 18:03:40 -07:00
Ragnor Comerford	6abe59bbaa	Update use cases in README with second brains link	2026-05-15 15:40:30 -07:00
Andrew Altshuler	0de5f69d86	docs: drop npx mdrip; use curl | pandoc for full-page fetches (#97) The previous "fetch the full page" recommendation in AGENTS.md and docs/dev/lance.md pointed at an unknown-author npm CLI that, on consent, wrote agent-targeted content into AGENTS.md and modified .gitignore / tsconfig.json. Source audit was clean of malicious code but the self-perpetuating prompt-injection pattern combined with a single maintainer and ~21 downloads/day made it not worth the risk. Switched to the curl + pandoc command already documented as the no-tool option. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 16:06:24 +03:00
Andrew Altshuler	60eee78465	docs: split user and developer docs (#93)	2026-05-15 03:45:22 +03:00
Andrew Altshuler	e8d49559c4	branch-protection: allow admin bypass on main (#94)

Flip enforce_admins from true to false. Repo admins can now merge their own PRs without waiting for code-owner review, by clicking "Merge without waiting for requirements to be met" once CI is green. The action is recorded in the audit log.

Non-admins still see full enforcement: code-owner review required, 1 approving review, required status checks must pass.

Rationale: as the solo owner of most CODEOWNERS scopes, the author cannot satisfy GitHub's "non-self approver" rule on their own PRs, which made every PR block on a second human. Admin bypass restores the practical workflow while keeping the protection rules as the default for everyone else.	2026-05-15 03:32:12 +03:00
Andrew Altshuler	6bad829ed0	branch-protection: declarative policy + apply script (#89 ) Branch protection on main, declared as code rather than as opaque GitHub UI state. Pairs with the CODEOWNERS chassis (#88): once this PR lands and an admin runs the apply script, every PR to main must satisfy code-owner review and the listed required checks. Components: - .github/branch-protection.json — the policy. Edit this to change required checks, review counts, etc. Includes a _comment field for human readers; the apply script strips it before PUT. - scripts/apply-branch-protection.sh — idempotent apply via `gh api`. Reads back current state for verification. Supports DRY_RUN=1. - docs/branch-protection.md — explains the policy, how to apply, how to change, why declared as code. - AGENTS.md topic-index row. Policy summary: - Required status checks (strict): Classify Changes, Check AGENTS.md Links, Test Workspace, Test omnigraph-server --features aws, CODEOWNERS / drift, CODEOWNERS / noedit. - Required approving reviews: 1, must be a code owner. - Dismiss stale reviews on new commits. - Required linear history (squash or rebase merges only). - No force pushes, no deletions, no admin bypasses. - Required conversation resolution. What's NOT in this PR: - Required signed commits — not yet; maintainers must enroll GPG/SSH signing first or merges will block. - Tag protection for v* tags — separate PR. - Additional required checks (cargo deny, audit, fmt, clippy, CodeQL, schema-lint MR-946) — separate PRs as each lands. - The script is NOT run by CI. Branch-protection changes are admin actions; CI-driven auto-apply would defeat the purpose. Manual invocation is the audit point. How to apply after merge: ./scripts/apply-branch-protection.sh Requires gh-CLI auth with repo-admin permissions. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:38:20 +03:00
Andrew Altshuler	730712b73f	codeowners: yml source of truth + generator + drift CI (#88 ) * codeowners: generator + drift CI + initial roles Source-of-truth approach to CODEOWNERS: yml is hand-edited, CODEOWNERS is generated and CI-enforced. Every role change is a reviewable PR with a permanent in-repo audit trail. No GitHub UI clicks, no shadow state. Initial roles: engineering @aaltshuler owns crates/** + default (.github/, scripts/, Cargo., openapi.json, everything else not docs) docs @aaltshuler @ragnorc owns docs/, README.md, AGENTS.md, CLAUDE.md, SECURITY.md Per GitHub semantics, multiple owners on a CODEOWNERS line means "any one satisfies the review" — for docs, either named member can approve. Strict "N distinct approvers" would need a CI workaround (not wired today; tracked for future hardening). Components: - .github/codeowners-roles.yml — source of truth. Edit this. - .github/scripts/render-codeowners.py — generator (PyYAML; ~100 LoC). - .github/CODEOWNERS — generated. CI rejects hand-edits. - .github/workflows/codeowners.yml — two checks: drift: re-render and assert CODEOWNERS matches. * noedit: reject PRs that edit CODEOWNERS without editing the yml. - docs/codeowners.md — explains the source-of-truth pattern, how to change roles, how to add new roles. - AGENTS.md topic-index row. What's NOT in this PR: - Branch protection on main (separate PR; needs `gh api` call against the org). - Required-reviewer enforcement (depends on branch protection landing). - Required CI status checks (depends on branch protection landing). - Scheduled rotation (the schedule: block in the yml + a weekly workflow). Today's roles are stable; rotation isn't needed yet. - Linear-as-source-of-truth integration (Approach 4 from the design discussion; deferred). Verified: - Generator output is deterministic (idempotent re-runs). - scripts/check-agents-md.sh OK (28 links, 28 docs). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * codeowners: fix catch-all ordering (Devin review #88) Devin caught a real bug: GitHub CODEOWNERS uses "last match wins" semantics, but the generator emitted the catch-all `*` AFTER specific patterns. Net effect: `*` won for every file, silently nullifying the docs role and never routing reviews to @ragnorc. Fix is one-line — emit the default `*` line before iterating the specific paths. Also: - Added a regression assertion in the generator: after rendering, the first non-comment line must start with `*` if a default is configured. Generator exits non-zero otherwise. Catches the same class of mistake in any future refactor. - Rewrote the yml header comment, which incorrectly stated "keep more-specific paths after broader patterns" (correct for GitHub semantics but the generator was doing the opposite — so the comment read as a description of behavior when it was actually a contradicted intention). Verified by re-rendering: `*` is now line 12, `crates/` is line 14, `docs/` is line 15, etc. README.md matches both `*` and `README.md`; `README.md` is later → wins → @aaltshuler + @ragnorc both assigned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:26:06 +03:00
Andrew Altshuler	c142dafdf3	schema-lint chassis v0: code-tagged diagnostics (MR-694) (#87 ) First slice of the schema-lint chassis. Adds stable `OG-XXX-NNN` codes to schema-migration rejections so operators can suppress, look up, and filter on identifiers rather than free-text prose. Atlas-style chassis adapted to omnigraph's typed-IR substrate (no SQL injection vector, no per-engine locks, native edge/vector/embedding types). What's in v0: - New `omnigraph-compiler/src/lint/` module with: - `diagnostic.rs` — Family / SafetyTier / Severity enums covering ten families (DS, MF, CD, BC, NM, OW, NL, VE, ED, LK). Only DS and MF are populated in this PR. - `codes.rs` — 8 DiagnosticCode constants (OG-DS-101..105, OG-MF-103, OG-MF-104, OG-MF-106). Five of the eight are wired to real emission sites; the other three are reserved. - Unit tests for catalog invariants: codes unique, prefix matches family, suffixes are 3-digit, destructive defaults to error, lookup() works, EMITTED_IN_V0 codes exist in ALL_CODES. - `SchemaMigrationStep::UnsupportedChange` gains an optional `code: Option<String>` field. New `unsupported_error_message()` helper prefixes the message with `[code]` when present. - 5 of 17 existing rejection paths now carry codes: - `removing node type` → OG-DS-102 - `removing edge type` → OG-DS-103 - `removing property` → OG-DS-104 - `adding required property without backfill` → OG-MF-103 - `changing property type` → OG-MF-106 Remaining 12 paths carry `code: None` and are tagged as future work. - `schema_apply` surfaces the formatted error (with `[code]` prefix); CLI `omnigraph schema plan` renders the code on the `unsupported change on <entity>` line. - PR #62 destructive-rejection tests in `tests/schema_apply.rs` now assert on the stable code (`msg.contains("OG-DS-104")`) instead of the error-message substring. 11/11 tests pass. - New `docs/schema-lint.md` documents the v0 catalog + the 10 families + Atlas prior art. AGENTS.md index updated. 
What's explicitly NOT in v0 (subsequent PRs): - No severity config in `omnigraph.yaml` (MR-694 §2). - No `@allow(OG-XXX-NNN, "rationale")` suppression directive (§3). - No `--allow-data-loss` flag or destructive-tier enforcement. - No new `SchemaMigrationStep` variants (soft/hard drops, default, widen/narrow). MR-700, MR-697 land those. - No pre-migration checks (MR-941). - No CD / VE / LK / NM family rules (MR-942..945). - No CI integration (MR-946). Tests: 235 compiler tests, 11 schema_apply integration tests, 14 lint module tests, 55 CLI tests — all green. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:08:18 +03:00
Ragnor Comerford	f28f644bf2	Merge pull request #83 from ModernRelay/devin/1778623807-remove-orphan-loader-files Remove orphaned loader/{constraints,embeddings,jsonl}.rs files	2026-05-12 21:03:36 -07:00
Ragnor Comerford	53d41a30b4	Merge pull request #85 from ModernRelay/ragnorc/survey-state engine: pin stable-row-id preservation through stage_overwrite	2026-05-12 17:24:55 -07:00
Ragnor Comerford	3cc5c6a9a2	chore: gitignore the mdrip/ markdown snapshot cache npx mdrip writes fetched-page snapshots under mdrip/. The cache is a local-only working artifact (docs/lance.md is the curated index of upstream Lance pages we fetch on demand). Keep the cache out of the tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:02:14 -07:00
Ragnor Comerford	a30d1cc0dc	engine: stage_overwrite sets enable_stable_row_ids explicitly Defensive — Lance 4.0.0 preserves the source dataset's flag through Operation::Overwrite even when WriteParams omits it (pinned by the prior commit's test), but setting it explicitly matches the public overwrite_dataset path at line 454 and documents the dependency at the call site so a future refactor doesn't accidentally drop it. Setting it on a dataset created without stable row IDs is a no-op per Lance's row-id-lineage spec, so this stays correct for legacy datasets. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:57:05 -07:00
Ragnor Comerford	549060f297	tests: pin stable-row-id preservation across stage_overwrite stage_overwrite is used by schema_apply to rewrite tables when an additive migration touches data. If Lance Operation::Overwrite ever stopped preserving the source dataset's enable_stable_row_ids flag, every schema_apply that triggers a rewrite would silently disable stable row IDs on the affected tables and downstream readers that depend on _rowid stability (change-feed validators, index reconcilers) would observe silent corruption. Empirically Lance 4.0.0 does preserve the flag through Overwrite even when WriteParams omits it — but the preservation isn't documented at the Lance spec level, so pin it here. Any future behaviour change surfaces as a test failure rather than silent corruption. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:56:58 -07:00
Ragnor Comerford	2121d9f6c3	docs: storage stable-row-ids reflects every dataset The L1 capability list claimed the flag was enabled "for the commit-graph and run-registry datasets" — stale. Every Lance dataset OmniGraph creates has enable_stable_row_ids: true; the run-registry datasets are gone since MR-771. Replace with a single paragraph capturing the invariant, the consequences (row-version columns available, CreateIndex × Rewrite not retryable, Lance reader version required), the legacy-dataset constraint (one-way at create, dump-and-reload to migrate), and a pointer to the regression test in staged_writes.rs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:56:51 -07:00
Ragnor Comerford	8427e705dd	Merge pull request #84 from ModernRelay/ragnorc/read-docs docs: lead AGENTS.md first principle with integrated-over-time framing	2026-05-12 16:38:52 -07:00
Ragnor Comerford	24c0558180	docs: lead AGENTS.md first principle with integrated-over-time framing Reframes the first-principle section to lead with Winters' "engineering is programming integrated over time" as the lens, keeping "minimize ongoing liability" as the operative directive and folding in "complexity should be earned." Adds a new Tiebreakers subsection with two rules that the prior section lacked clean appeals for: - correctness > simplicity > performance (lexicographic) - reversibility shapes evidence demand (reversible → prod metrics over napkin math over RFCs; irreversible → RFC up-front) Adds a Hyrum's-Law deny-list entry in both AGENTS.md and docs/invariants.md §IX: shipping observable behavior is shipping a contract, even when undocumented. Net always-on context cost: ~7 lines. No renumbering of §I–VIII invariants; Hyrum's Law lands in the deny-list to avoid breaking back-references. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:27:24 -07:00
Devin AI	327eb821b5	Remove orphaned loader/{constraints,embeddings,jsonl}.rs files These three files in crates/omnigraph/src/loader/ have no `mod` declaration anywhere in the workspace and no `#[path = "…"]` reference. They are not compiled — `touch`-ing them does not trigger `cargo check` to recompile anything. Their imports (`crate::catalog::schema_ir`, `crate::error::NanoError`, `crate::store::manifest::hash_string`, `crate::types::ScalarType`, `super::super::graph::DatasetAccumulator`) reference modules that no longer exist in the engine crate, so they could not even be wired in without further work. They are vestigial code from an earlier monolithic crate layout. The live functionality is independently implemented inside crates/omnigraph/src/loader/mod.rs. These files have been orphaned since the initial public commit. `cargo check --workspace --all-targets` and `cargo test --workspace --no-run` both pass with no new warnings. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-12 22:57:20 +00:00
Claude	a9c4423b82	Strengthen cleanup-then-optimize sequencing test with postconditions Reviewer feedback on PR #62: the original `cleanup_then_optimize_succeed_in_sequence` only unwrapped both calls and asserted nothing, so it didn't validate the claimed sequencing behavior. The concern that motivates the test is that cleanup destroys version history and optimize on a freshly-cleaned table could trip on dropped fragment refs or stale manifests. Rename to `cleanup_then_optimize_preserves_rows_and_table_remains_writable` and add three concrete postconditions: row counts in both Person and Company tables survive the sequence; the head remains readable; and a subsequent merge load still succeeds.	2026-05-12 23:36:01 +03:00
Claude	57a62756c5	Exercise actual type rename in schema-apply rename test The previous version of `apply_schema_renames_node_type_via_rename_from_and_preserves_rows` kept the node name as `Person` (`@rename_from("Person")`) and only renamed a property. The planner only emits a `RenameType` step when the new name differs from the accepted one, so the test name overstated what it covered: a regression in `RenameType` step emission or in the coordinator's table-key remap during type rename could pass while the test still went green. Rename the desired node from `Person` to `Human` (with `@rename_from("Person")`), update the dependent edge endpoints to point at `Human`, and assert both the `RenameType` step and that the manifest table key has moved from `node:Person` to `node:Human`.	2026-05-12 23:36:01 +03:00
Claude	e22d468e27	Add maintenance + destructive-migration test coverage The audit of test coverage flagged three holes: - `omnigraph optimize` and `omnigraph cleanup` had no integration tests (no `maintenance.rs`). Add one covering empty/idempotent edges, the policy-validation contract on `cleanup`, and head preservation under aggressive policies. - `apply_schema` only covered I32 -> I64 type-change rejection. Add the symmetric narrowing case plus rejections for the other destructive shapes (drop property with data, drop node type, drop edge type, add required property without backfill) and assert the manifest version doesn't advance. Add a positive `@rename_from` case to pin the stable-type-id contract preserves rows through a rename. - `docs/testing.md` was missing `validators.rs` and the new `maintenance.rs` from its file table; bump the count and add rows.	2026-05-12 23:36:01 +03:00
devin-ai-integration[bot]	6914e0256e	MR-786: merge-pair truth table with exhaustive op-variant matrix (#81 ) * MR-786: merge-pair truth table with exhaustive op-variant matrix Add crates/omnigraph/tests/merge_truth_table.rs that enumerates every (left_op, right_op) cell from the operation vocabulary named in the ticket — {noop, addNode, removeNode, addEdge, removeEdge, setProperty, dropProperty, addLabel, removeLabel} — and asserts the deterministic outcome of Omnigraph::branch_merge against a structured oracle. The matrix is built in a 9x9 match in build_case, so adding a new OpVariant is a compile-time, fail-on-omission task. Today's mutation grammar only exposes insert | update set | delete (see docs/query-language.md), so the 36 cells over the first six ops are executable and the 45 cells involving dropProperty/addLabel/removeLabel are recorded as Expected::Unsupported with a note. Each executable cell spins up a fresh tempdir, applies one mutation per branch, calls branch_merge, and asserts either: * MergeOutcome (AlreadyUpToDate / FastForward / Merged) plus a GraphAssert on the affected entities, or * an OmniError::MergeConflicts whose entries match the expected table_key + MergeConflictKind (row_id is optional because edge ULIDs are generated at runtime). branch_merge is directional, so the (L, R) and (R, L) cells live in separate entries in the matrix and are run independently — the op-pair symmetry encoded in build_case serves as the commutativity oracle without doubling the runtime. End-to-end the suite runs in ~10s on a fresh build, well under the 30s budget asserted at the bottom of the test. Also adds a row to docs/testing.md so the test-coverage map points future agents at this file. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Use one Omnigraph handle for both branches Self-review caught that the runner was opening two Omnigraph handles on the same temp dataset (one for main, a second via Omnigraph::open for feature). 
tests/branching.rs uses one handle and passes the branch name to mutate_branch — same pattern works here and avoids any cache-coherency surprises between the two handles. Also drops the post-merge reopen, which only existed to give the second handle a fresh snapshot. Runtime drops ~10s -> ~9s. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Assert exact conflict count, not subset inclusion cubic and Devin Review both flagged that check_outcome's Expected::Conflicts arm only enforces want ⊆ got, so a regression that produces a spurious extra conflict (e.g. emitting both OrphanEdge and a stray DivergentInsert) would silently pass the truth-table cell. For a deterministic oracle that's the wrong direction — the cell pins the exact conflict-artifact set, not a lower bound. Add an assert_eq!(got.len(), want.len()) before the existence loop. All 36 executable cells still pass; runtime unchanged. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Subsume 4 conflict tests in branching.rs into truth table The four `branch_merge_reports_*_conflict` tests (DivergentUpdate / DivergentInsert / DeleteVsUpdate / OrphanEdge) were redundant with the deterministic-oracle cells in the new `merge_truth_table.rs` and only added drift risk. To preserve the post-conflict invariant that lived in `branch_merge_reports_divergent_update_conflict` (target unchanged after a failed merge), the truth-table runner now generalizes it: on every `Conflicts` cell, main's state is asserted against `state_after_apply_only(right_op)`. That gives strictly more coverage than the deleted tests carried, since the invariant now applies to all seven conflict cells, not just one. The `UniqueViolation` and `CardinalityViolation` cases stay in `branching.rs` — they're combinatorial (require >1 op per side with a non-default schema) and out of scope for the pair-wise truth table. 
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Fix misleading 'Total edges: 0' comment in (AddEdge, RemoveEdge) cell Devin Review flagged that the comment said 'Total edges: 0' while the parenthetical math evaluates to 1 (matching `GraphAssert::base()`). The assertion is correct; only the leading number in the comment was wrong. Reworded to 'Net edges: … = 1 (matches base)' so the prose agrees with both the math and the assertion. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> --------- Co-authored-by: Ragnor <ragnor@modernrelay.com> Co-authored-by: Ragnor Comerford <ragnor.comerford@gmail.com> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-12 22:36:01 +03:00
Ragnor Comerford	3bd072c917	docs: add docs/transactions.md — branch-as-transaction explainer (#69 ) The architectural rule "no cross-query BEGIN/COMMIT; branches fill that role" lives in docs/invariants.md §VI.23 but is not surfaced anywhere user-facing. New users coming from Postgres/MySQL hit the gap when they realize multiple queries on main are independently atomic, not jointly atomic. This page explains the model with worked examples: * Single-query multi-statement (atomic by default) * Two separate queries on main (NOT atomic — common surprise) * Many queries via a branch (atomic at merge) * Coordinating multiple agents via branch-per-agent Plus a comparison table to BEGIN/COMMIT, failure-mode rundown, and "when to use what" decision matrix. Linked from AGENTS.md "Where to find each topic" between branches-commits.md and runs.md. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:35:57 +03:00
Ragnor Comerford	c9c7c0672e	Update README.md	2026-05-12 08:17:31 -07:00
Ragnor Comerford	c2e3a9e5c3	Add use cases for unified company brain and context graphs	2026-05-12 08:08:08 -07:00
Ragnor Comerford	676c9eab05	Merge pull request #78 from ModernRelay/devin/1778363660-mr-901-blob-branch-merge Fix branch merge with blob columns	2026-05-12 07:31:04 -07:00
Ragnor Comerford	d6d2763609	Merge pull request #80 from ModernRelay/devin/1778524905-mr-923-merge-restore-refresh Fix MR-923: refresh restored coordinator on merge Err path	2026-05-11 15:55:43 -07:00
Devin AI	725d41205e	Drop redundant server-level regression test The matrix cell d:merge×change:into-target already exercises this race: pre-fix it flakes ~20% on shared-CPU hardware (sentinel 409s); post-fix it passes 100% regardless of which side of the racing pair returns first. That flake-to-stable transition is the regression signal. The replacement test (concurrent_merge_clean_409_does_not_poison_next_change_on_target) tried to sharpen this by looping until the clean-409 path fired and then strictly requiring it. On fast CI hardware the race window never opens in 50 iterations, which made the strict variant fail in CI despite passing 10/10 locally. The bug genuinely needs a real concurrent writer to advance on-disk manifest during the swap window — a deterministic failpoint can't substitute because forcing the merge body to Err without a real concurrent writer leaves no cache-vs-disk drift to validate. Reverting to the matrix cell as the sole regression coverage. Updated the comment in merge.rs accordingly. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:57:47 +00:00
Devin AI	a6c7e5fab5	Use if-let shape for refresh outcome handling Switch from match-on-Result to if-let-Err so the refresh outcome and merge_result outcome are checked independently, making the intent clearer: 'attempt refresh; on Ok-merge-with-refresh-error propagate; on Err-merge-with-refresh-error log and surface the original merge error'. No semantic change — both shapes were valid (wildcard patterns don't move the scrutinee) — but the if-let form sidesteps a needs-second-reading question raised in code review. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:50:26 +00:00
Devin AI	7d1a40102c	Address review feedback merge.rs: best-effort refresh on the Err path so a refresh-time storage error doesn't replace the merge body's structured error (typically the manifest_conflict that the HTTP layer maps to a 409 with a structured payload) with a less informative one. Ok-path behavior is unchanged — there a refresh failure is propagated so the caller knows the coord's cache is unsynced. server.rs: bump MAX_ITERATIONS to 50 and assert at the end that the named clean-409 path actually fired at least once. With ~20% per-iter rate on shared-CPU CI (per the original MR-923 repro), P(no hit in 50) is < 0.002%. Without this assertion the test silently degraded to exercising only the 200-merge path — covered already by the matrix cell. Both changes per Devin Review + cubic comments on PR #80. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:35:18 +00:00
Devin AI	b7353e1dc7	Use refresh_coordinator_only to avoid racing branch_merge's sidecar The previous fix used `self.refresh()` to sync the restored coordinator's cache after the swap-restore window. `refresh` runs the `RollForwardOnly` recovery sweep — which, on the merge Err path with a phase-B failure (sidecar written, per-table HEAD advanced, manifest publish skipped), would observe the merge's own in-flight sidecar and close it here. That violates the contract documented on `Omnigraph::refresh`: > Engine-internal callers that already hold an in-flight sidecar > (e.g. `schema_apply` mid-write) MUST use `refresh_coordinator_only` > to avoid the recovery sweep racing their own sidecar. The post-restore step's purpose is to sync the coord cache with disk, not to run recovery, so `refresh_coordinator_only` is the right primitive on both paths. CI surfaced this via `branch_merge_phase_b_failure_recovered_on_next_open` in `crates/omnigraph/tests/failpoints.rs`, which asserts the sidecar persists after the failpoint fires. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:09:44 +00:00
Devin AI	e91d5615c6	Fix MR-923: refresh restored coordinator on merge Err path branch_merge_impl swaps the coordinator for the merge target, runs the merge body, then restores the original coordinator. A concurrent /change on the same target during this window publishes against the swapped coord, advancing on-disk manifest state that the restored coord doesn't see. The post-restore refresh was previously gated on merge_result.is_ok(), so the clean-409 path (merge body's post_queue_snapshot drift check returning a recoverable conflict) left the restored coord's cached snapshot stale relative to disk. The next sequential /change seeded its publisher expected_versions from that stale cache and 409'd with ExpectedVersionMismatch — a non-retryable conflict surfaced to a caller with no concurrent writer of their own. Refresh on both Ok and Err paths so cached state cannot diverge from the manifest across the swap-restore window. Add a focused regression test (concurrent_merge_clean_409_does_not_poison_next_change_on_target) that loops the cell-d scenario until the clean-409 branch fires and asserts the follow-up sentinel succeeds in that branch specifically. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 20:31:18 +00:00
Devin AI	fca2b74dee	Materialize external blob URIs during branch merge Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 12:54:04 +00:00
Devin AI	da89e18e62	Merge main into blob merge fix Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-10 21:55:02 +00:00
Ragnor Comerford	19e9292ec0	Merge pull request #75 from ModernRelay/ragnorc/mr-686-lance Per-table writer queues + per-actor admission + op-kind-aware version check	2026-05-10 23:50:56 +02:00
Devin AI	7a338a8223	agents: keep guide short for context	2026-05-10 14:41:02 +00:00
Devin AI	4eb865b340	docs: expand 0.4.2 release notes	2026-05-10 14:37:58 +00:00
Devin AI	e44a4704eb	docs: fix admission gating description	2026-05-10 14:16:26 +00:00
Devin AI	a42d178119	release: prepare omnigraph 0.4.2	2026-05-10 14:02:28 +00:00
Devin AI	31b8ffe7b5	engine: inline-delete sidecar covers version-mismatch check	2026-05-10 10:37:46 +00:00
Devin AI	01660faa26	Tighten blob descriptor validation Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-10 09:28:44 +00:00
Devin AI	16ac166059	Fix branch merge with blob columns Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-09 22:33:29 +00:00
Devin AI	6a3f0677ae	server: drop unwired try_admit_rewrite / 503 admission surface	2026-05-09 20:58:17 +00:00
Devin AI	4bb7964af9	tests: matrix cell k asserts post-reopen row count	2026-05-09 20:16:44 +00:00

303 commits