MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup

Refresh user-facing and agent-facing docs for the staged-write rewire
and clean up stale Run-state-machine references that survived MR-771.

MR-794-specific updates:
* docs/runs.md — remove "Known limitation: mid-query partial failure"
  section; document the in-memory accumulator + D₂ rule + the
  LoadMode::Overwrite residual.
* docs/invariants.md §VI.25 — flip from aspirational/open to
  upheld for inserts/updates. Within-query read-your-writes is now
  load-bearing for the publisher CAS contract.
* docs/architecture.md — add "Mutation atomicity — in-memory
  accumulator (MR-794)" subsection with per-op flow; refresh the
  engine + state diagrams to drop RunRegistry and add MutationStaging.
* docs/execution.md — rewrite the mutation flow sequence diagram
  for the staged-write path; updated the LoadMode table to call
  out per-mode commit semantics; rewrote load vs ingest.
* docs/query-language.md — document the D₂ parse-time rule.
* docs/errors.md — add the D₂ BadRequest rejection path.
* docs/testing.md — extend the runs.rs row to cover the new MR-794
  contract tests; add the staged_writes.rs row.
* docs/releases/v0.4.1.md (new) — release note covering the rewire,
  test additions, residuals, and files changed.
* AGENTS.md (CLAUDE.md symlink) — update the atomic-per-query
  description and the L2 capability matrix row.

Stale-reference cleanup (MR-771 leftovers):
* docs/storage.md — drop live _graph_runs.lance / _graph_run_actors.lance
  from the layout diagram and prose; mark legacy.
* docs/branches-commits.md — move __run__<id> to a legacy note;
  remove publish_run from the publish-trigger list.
* docs/audit.md — refresh _as API list (drop begin_run_as / publish_run_as);
  legacy RunRecord.actor_id moved to a historical note.
* docs/constants.md — mark run registry / branch-prefix rows as legacy.
* docs/cli.md — replace the legacy omnigraph run * quickstart block
  with omnigraph commit list/show.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Ragnor Comerford 2026-05-01 10:43:19 +02:00
parent 82350bdc4a
commit a61e82f47a
No known key found for this signature in database
14 changed files with 342 additions and 121 deletions

138
docs/releases/v0.4.1.md Normal file
View file

@ -0,0 +1,138 @@
# Omnigraph v0.4.1
Omnigraph v0.4.1 closes the multi-statement-mutation atomicity gap that
v0.4.0 documented as a known limitation. Inserts and updates now route
through an in-memory `MutationStaging` accumulator and commit via Lance's
two-phase distributed-write API at end-of-query. A failed mid-query op
no longer leaves Lance HEAD drifted on the touched table — the next
mutation proceeds normally.
## Highlights
- **Staged-write rewire (MR-794)**: `mutate_as` and `load` (Append /
Merge modes) accumulate insert/update batches into
`MutationStaging.pending` per touched table. No Lance HEAD advance
happens during op execution; one `stage_*` + `commit_staged` per
table runs at end-of-query, then `ManifestBatchPublisher::publish`
commits the manifest atomically. A mid-query failure leaves Lance
HEAD untouched on staged tables.
- **D₂ parse-time rule**: a single mutation query is either
insert/update-only or delete-only. Mixed → rejected with a clear
error directing the caller to split into two queries. Lance 4.0.0
has no public two-phase delete; deletes still inline-commit, and D₂
keeps that path safe.
- **Read-your-writes via DataFusion `MemTable`**: read sites in
multi-statement mutations consume `TableStore::scan_with_pending`,
which Lance-scans the committed snapshot at the captured
`expected_version` and unions with a DataFusion `MemTable` over the
pending batches. Replaces the previous "reopen at staged Lance
version" pattern.
- **Coordinator swap-restore eliminated** from `mutate_with_current_actor`.
Branch is threaded explicitly through the per-op execution path
(`execute_named_mutation`, `execute_insert`, `execute_update`,
`execute_delete*`, `validate_edge_insert_endpoints`,
`ensure_node_id_exists`). The `swap_coordinator_for_branch` /
`restore_coordinator` API and `CoordinatorRestoreGuard` are removed
from `mutation.rs`. (`merge.rs` keeps its own swap pattern; that's
a separate workflow tracked in MR-793.)
- **`docs/invariants.md` §VI.25** flips from `aspirational/open` to
`upheld for inserts/updates`. The within-query read-your-writes
guarantee is now load-bearing for the publisher CAS contract.
## Behavior changes
- A failed multi-statement mutation no longer surfaces
`ExpectedVersionMismatch` on the *next* mutation against the same
table. The next call proceeds normally — Lance HEAD on staged
tables is unchanged.
- Mixed insert/update + delete in one query is rejected at parse
time. Existing test queries that mixed both must be split.
- `MutationStaging`'s shape changed: `pending: HashMap<String, PendingTable>`
+ `inline_committed: HashMap<String, SubTableUpdate>` replaces the
previous `latest: HashMap<String, StagedTable>`. This is an internal
type; no public API impact.
## Residual / out of scope
- **`LoadMode::Overwrite`** keeps the legacy inline-commit path
(truncate-then-append doesn't fit the staged shape). A mid-overwrite
failure can still drift Lance HEAD on a partially-truncated table;
the next overwrite replaces it. Operator-driven, rare.
- **Delete-only multi-statement mutations** still inline-commit per op.
D₂ keeps inserts/updates from coexisting with deletes, so the
inline path remains atomic per op but not per query for delete-only
cascades. Closing this requires Lance to expose
`DeleteJob::execute_uncommitted`; tracked in MR-793 / Lance-upstream.
- **`schema_apply`, `branch_merge_internal`, `ensure_indices`** still
use Lance's inline-commit APIs. The two-phase pattern is in
`mutate_as` and `load` only; hoisting it to a storage-trait
invariant covering all writers is MR-793.
## Tests added
- `tests/runs.rs::partial_failure_leaves_target_queryable_and_unblocks_next_mutation`
(replaces the old `partial_failure_observably_rolls_back_but_blocks_next_mutation_on_same_table`)
- `tests/runs.rs::mutation_rejects_mixed_insert_and_delete_at_parse_time`
- `tests/runs.rs::mixed_insert_and_update_on_same_person_coalesces_to_one_merge`
- `tests/runs.rs::multiple_appends_to_same_edge_coalesce_to_one_append`
- `tests/runs.rs::multi_statement_inserts_publish_exactly_once`
- `tests/runs.rs::load_with_bad_edge_reference_unblocks_next_load`
- `tests/runs.rs::load_with_cardinality_violation_unblocks_next_load`
## Files changed
- `crates/omnigraph/src/exec/staging.rs` (NEW) — `MutationStaging`,
`PendingTable`, `PendingMode`, `StagedTablePath`,
`dedupe_merge_batches_by_id`.
- `crates/omnigraph/src/exec/mutation.rs` — D₂ check; per-op
rewires (`execute_insert`, `execute_update`, `execute_delete*`);
branch threading; coordinator-swap removal; helper
`validate_edge_cardinality_with_pending`; helper
`concat_match_batches_to_schema`; `apply_assignments` updated to
copy unassigned blob columns from full-schema scans.
- `crates/omnigraph/src/loader/mod.rs``load_jsonl_reader` split:
staged path for Append/Merge, legacy inline-commit path for
Overwrite. Helpers `collect_node_ids_with_pending` and
`validate_edge_cardinality_with_pending_loader`.
- `crates/omnigraph/src/table_store.rs``scan_with_pending`,
`count_rows_with_pending` (DataFusion `MemTable`-backed union with
Lance scan).
- `Cargo.toml` (workspace) + `crates/omnigraph/Cargo.toml` — added
`datafusion = "52"` direct dep (transitively pulled by Lance
already; required for `MemTable`).
- `docs/runs.md` — removed "Known limitation" section; documented
the new accumulator + D₂ + LoadMode::Overwrite residual.
- `docs/invariants.md` — §VI.25 status flipped to `upheld for
inserts/updates`.
- `docs/architecture.md` — added "Mutation atomicity — in-memory
accumulator (MR-794)" subsection; refreshed the engine + state
diagrams to drop `RunRegistry` and add `MutationStaging`.
- `docs/execution.md` — rewrote the mutation flow sequence diagram
for the staged-write path; updated the `LoadMode` table to call
out per-mode commit semantics; rewrote `load` vs `ingest`.
- `docs/query-language.md` — documented the D₂ parse-time rule.
- `docs/errors.md` — added the D₂ `BadRequest` rejection path.
- `docs/storage.md` — dropped the live `_graph_runs.lance` reference
(legacy from MR-771) from the layout diagram and prose.
- `docs/branches-commits.md` — moved `__run__<id>` to a legacy note;
removed `publish_run` from the publish-trigger list.
- `docs/audit.md` — current `_as` API list refreshed; legacy
`RunRecord.actor_id` moved to a historical note.
- `docs/constants.md` — marked the run registry / branch-prefix rows
as legacy.
- `docs/cli.md` — replaced the legacy `omnigraph run *` quickstart
block with `omnigraph commit list/show`.
- `docs/testing.md` — extended the `runs.rs` row to cover the new
MR-794 contract tests; added the `staged_writes.rs` row.
- `AGENTS.md` (CLAUDE.md symlink) — updated the atomic-per-query
description and the L2 capability matrix row.
## Included Changes
- MR-794 step 2+ — rewire `mutate_as` and `load` via in-memory
`MutationStaging` + `stage_*` / `commit_staged` per touched table at
end-of-query.
- (MR-794 step 1 shipped in v0.4.0's PR #67`StagedWrite`,
`stage_append`, `stage_merge_insert`, `commit_staged`,
`scan_with_staged`, `count_rows_with_staged` — and is the substrate
this release builds on.)