omnigraph/docs/releases/v0.4.1.md
Ragnor Comerford 3223b51cf1
MR-794 step 2: address PR #68 review — merge semantics, cardinality, residual
Five fixes from PR #68 review (Cursor Bugbot + Codex + Cubic):

* **scan_with_pending gains merge-shadow semantics** (Codex P1, Cubic P1#1):
  new `key_column: Option<&str>` parameter. When set, committed rows
  whose key value appears in any pending batch are excluded from the
  scan — making `scan_with_pending` correctly merge-semantic for chained
  updates instead of naively unioning. execute_update calls with
  Some("id"). Without this, a chained `update where age > 30` could
  match a row whose pending value already moved out of range.

* **Multi-delete on same table no longer trips ExpectedVersionMismatch**
  (Cursor Bugbot HIGH): open_table_for_mutation routes through
  reopen_for_mutation when staging.inline_committed has the table,
  using the post-inline-commit Lance version captured at record_inline
  time. The legacy open_for_mutation_on_branch fence (Lance HEAD ==
  manifest pinned) is correct cross-writer but wrong intra-query when
  deletes have already advanced HEAD on this table. Branch goes away
  when Lance ships two-phase delete (lance-format/lance#6658).

* **Cardinality validation consolidated** (Cursor LOW + Codex P2 +
  Cubic P1#2 + Cubic P2): new exec/staging::count_src_per_edge +
  enforce_cardinality_bounds shared by mutation and loader paths.
  Restores the missing min-cardinality check on the engine path.
  Loader Merge mode passes Some("id") to dedupe edges being updated
  by id (not double-count committed + pending). Loader Append mode
  and engine path pass None (ULID-generated ids never collide).

* **Dead count_rows_with_pending removed** (Cursor LOW): never called.

* **Misleading concat-helper comment fixed** (Cubic P3): claimed
  schema normalization the helper doesn't implement. Updated to match
  reality.

* **Documentation honesty** (Cubic P1#3): MR-794 narrows but doesn't
  eliminate the "Lance HEAD ahead of __manifest" drift class. Drift is
  unreachable for op-execution failures (the partial_failure test pins
  this), but a residual remains at the finalize→publisher boundary
  because Lance has no multi-dataset commit primitive: per-table
  commit_staged calls run sequentially before manifest commit. Updated
  docs/runs.md, docs/invariants.md §VI.25, docs/releases/v0.4.1.md to
  scope the claim precisely.

* **Failpoint test pinning the residual**: new
  mutation.post_finalize_pre_publisher failpoint + two tests in
  tests/failpoints.rs that confirm the documented residual behavior.
  Catches future regressions that widen the residual.

Test additions on tests/runs.rs:

* chained_updates_with_overlapping_predicate_respects_intermediate_value
* multi_statement_delete_on_same_node_table
* cascade_delete_node_then_explicit_delete_edge_on_same_table
* mutation_insert_edge_enforces_min_cardinality
* load_merge_mode_dedupes_edge_for_cardinality_count

113/113 engine integration tests pass (runs + end_to_end + consistency
+ staged_writes + validators). Failpoints feature build runs in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:47:55 +02:00

7.5 KiB

Omnigraph v0.4.1

Omnigraph v0.4.1 closes the multi-statement-mutation atomicity gap that v0.4.0 documented as a known limitation. Inserts and updates now route through an in-memory MutationStaging accumulator and commit via Lance's two-phase distributed-write API at end-of-query. A failed mid-query op no longer leaves Lance HEAD drifted on the touched table — the next mutation proceeds normally.

Highlights

  • Staged-write rewire (MR-794): mutate_as and load (Append / Merge modes) accumulate insert/update batches into MutationStaging.pending per touched table. No Lance HEAD advance happens during op execution; one stage_* + commit_staged per table runs at end-of-query, then ManifestBatchPublisher::publish commits the manifest atomically. For op-execution failures (validation errors, missing endpoints, parse-time D₂ rejection), Lance HEAD on every staged table is untouched and the next mutation proceeds normally. A narrowed residual remains at the finalize→publisher boundary (multi-table commit_staged is not atomic with the manifest commit) — see docs/runs.md "Finalize → publisher residual" for details.
  • D₂ parse-time rule: a single mutation query is either insert/update-only or delete-only. Mixed → rejected with a clear error directing the caller to split into two queries. Lance 4.0.0 has no public two-phase delete; deletes still inline-commit, and D₂ keeps that path safe.
  • Read-your-writes via DataFusion MemTable: read sites in multi-statement mutations consume TableStore::scan_with_pending, which Lance-scans the committed snapshot at the captured expected_version and unions with a DataFusion MemTable over the pending batches. Replaces the previous "reopen at staged Lance version" pattern.
  • Coordinator swap-restore eliminated from mutate_with_current_actor. Branch is threaded explicitly through the per-op execution path (execute_named_mutation, execute_insert, execute_update, execute_delete*, validate_edge_insert_endpoints, ensure_node_id_exists). The swap_coordinator_for_branch / restore_coordinator API and CoordinatorRestoreGuard are removed from mutation.rs. (merge.rs keeps its own swap pattern; that's a separate workflow tracked in MR-793.)
  • docs/invariants.md §VI.25 flips from aspirational/open to upheld for inserts/updates. The within-query read-your-writes guarantee is now load-bearing for the publisher CAS contract.

Behavior changes

  • A failed multi-statement mutation no longer surfaces ExpectedVersionMismatch on the next mutation against the same table. The next call proceeds normally — Lance HEAD on staged tables is unchanged.
  • Mixed insert/update + delete in one query is rejected at parse time. Existing test queries that mixed both must be split.
  • MutationStaging's shape changed: pending: HashMap<String, PendingTable>
    • inline_committed: HashMap<String, SubTableUpdate> replaces the previous latest: HashMap<String, StagedTable>. This is an internal type; no public API impact.

Residual / out of scope

  • LoadMode::Overwrite keeps the legacy inline-commit path (truncate-then-append doesn't fit the staged shape). A mid-overwrite failure can still drift Lance HEAD on a partially-truncated table; the next overwrite replaces it. Operator-driven, rare.
  • Delete-only multi-statement mutations still inline-commit per op. D₂ keeps inserts/updates from coexisting with deletes, so the inline path remains atomic per op but not per query for delete-only cascades. Closing this requires Lance to expose DeleteJob::execute_uncommitted; tracked in MR-793 / Lance-upstream.
  • schema_apply, branch_merge_internal, ensure_indices still use Lance's inline-commit APIs. The two-phase pattern is in mutate_as and load only; hoisting it to a storage-trait invariant covering all writers is MR-793.

Tests added

  • tests/runs.rs::partial_failure_leaves_target_queryable_and_unblocks_next_mutation (replaces the old partial_failure_observably_rolls_back_but_blocks_next_mutation_on_same_table)
  • tests/runs.rs::mutation_rejects_mixed_insert_and_delete_at_parse_time
  • tests/runs.rs::mixed_insert_and_update_on_same_person_coalesces_to_one_merge
  • tests/runs.rs::multiple_appends_to_same_edge_coalesce_to_one_append
  • tests/runs.rs::multi_statement_inserts_publish_exactly_once
  • tests/runs.rs::load_with_bad_edge_reference_unblocks_next_load
  • tests/runs.rs::load_with_cardinality_violation_unblocks_next_load

Files changed

  • crates/omnigraph/src/exec/staging.rs (NEW) — MutationStaging, PendingTable, PendingMode, StagedTablePath, dedupe_merge_batches_by_id.
  • crates/omnigraph/src/exec/mutation.rs — D₂ check; per-op rewires (execute_insert, execute_update, execute_delete*); branch threading; coordinator-swap removal; helper validate_edge_cardinality_with_pending; helper concat_match_batches_to_schema; apply_assignments updated to copy unassigned blob columns from full-schema scans.
  • crates/omnigraph/src/loader/mod.rsload_jsonl_reader split: staged path for Append/Merge, legacy inline-commit path for Overwrite. Helpers collect_node_ids_with_pending and validate_edge_cardinality_with_pending_loader.
  • crates/omnigraph/src/table_store.rsscan_with_pending, count_rows_with_pending (DataFusion MemTable-backed union with Lance scan).
  • Cargo.toml (workspace) + crates/omnigraph/Cargo.toml — added datafusion = "52" direct dep (transitively pulled by Lance already; required for MemTable).
  • docs/runs.md — removed "Known limitation" section; documented the new accumulator + D₂ + LoadMode::Overwrite residual.
  • docs/invariants.md — §VI.25 status flipped to upheld for inserts/updates.
  • docs/architecture.md — added "Mutation atomicity — in-memory accumulator (MR-794)" subsection; refreshed the engine + state diagrams to drop RunRegistry and add MutationStaging.
  • docs/execution.md — rewrote the mutation flow sequence diagram for the staged-write path; updated the LoadMode table to call out per-mode commit semantics; rewrote load vs ingest.
  • docs/query-language.md — documented the D₂ parse-time rule.
  • docs/errors.md — added the D₂ BadRequest rejection path.
  • docs/storage.md — dropped the live _graph_runs.lance reference (legacy from MR-771) from the layout diagram and prose.
  • docs/branches-commits.md — moved __run__<id> to a legacy note; removed publish_run from the publish-trigger list.
  • docs/audit.md — current _as API list refreshed; legacy RunRecord.actor_id moved to a historical note.
  • docs/constants.md — marked the run registry / branch-prefix rows as legacy.
  • docs/cli.md — replaced the legacy omnigraph run * quickstart block with omnigraph commit list/show.
  • docs/testing.md — extended the runs.rs row to cover the new MR-794 contract tests; added the staged_writes.rs row.
  • AGENTS.md (CLAUDE.md symlink) — updated the atomic-per-query description and the L2 capability matrix row.

Included Changes

  • MR-794 step 2+ — rewire mutate_as and load via in-memory MutationStaging + stage_* / commit_staged per touched table at end-of-query.
  • (MR-794 step 1 shipped in v0.4.0's PR #67 — StagedWrite, stage_append, stage_merge_insert, commit_staged, scan_with_staged, count_rows_with_staged — and is the substrate this release builds on.)