mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
MR-794 step 2: address PR #68 review — merge semantics, cardinality, residual
Five fixes from PR #68 review (Cursor Bugbot + Codex + Cubic):

* **scan_with_pending gains merge-shadow semantics** (Codex P1, Cubic P1#1): new `key_column: Option<&str>` parameter. When set, committed rows whose key value appears in any pending batch are excluded from the scan — making `scan_with_pending` correctly merge-semantic for chained updates instead of naively unioning. execute_update calls with Some("id"). Without this, a chained `update where age > 30` could match a row whose pending value already moved out of range.
* **Multi-delete on same table no longer trips ExpectedVersionMismatch** (Cursor Bugbot HIGH): open_table_for_mutation routes through reopen_for_mutation when staging.inline_committed has the table, using the post-inline-commit Lance version captured at record_inline time. The legacy open_for_mutation_on_branch fence (Lance HEAD == manifest pinned) is correct cross-writer but wrong intra-query when deletes have already advanced HEAD on this table. Branch goes away when Lance ships two-phase delete (lance-format/lance#6658).
* **Cardinality validation consolidated** (Cursor LOW + Codex P2 + Cubic P1#2 + Cubic P2): new exec/staging::count_src_per_edge + enforce_cardinality_bounds shared by mutation and loader paths. Restores the missing min-cardinality check on the engine path. Loader Merge mode passes Some("id") to dedupe edges being updated by id (not double-count committed + pending). Loader Append mode and engine path pass None (ULID-generated ids never collide).
* **Dead count_rows_with_pending removed** (Cursor LOW): never called.
* **Misleading concat-helper comment fixed** (Cubic P3): claimed schema normalization the helper doesn't implement. Updated to match reality.
* **Documentation honesty** (Cubic P1#3): MR-794 narrows but doesn't eliminate the "Lance HEAD ahead of __manifest" drift class. Drift is unreachable for op-execution failures (the partial_failure test pins this), but a residual remains at the finalize→publisher boundary because Lance has no multi-dataset commit primitive: per-table commit_staged calls run sequentially before manifest commit. Updated docs/runs.md, docs/invariants.md §VI.25, docs/releases/v0.4.1.md to scope the claim precisely.
* **Failpoint test pinning the residual**: new mutation.post_finalize_pre_publisher failpoint + two tests in tests/failpoints.rs that confirm the documented residual behavior. Catches future regressions that widen the residual.

Test additions on tests/runs.rs:

* chained_updates_with_overlapping_predicate_respects_intermediate_value
* multi_statement_delete_on_same_node_table
* cascade_delete_node_then_explicit_delete_edge_on_same_table
* mutation_insert_edge_enforces_min_cardinality
* load_merge_mode_dedupes_edge_for_cardinality_count

113/113 engine integration tests pass (runs + end_to_end + consistency + staged_writes + validators). Failpoints feature build runs in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
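
The cardinality consolidation described in the commit message can be illustrated with a small, std-only sketch. This is not the code from exec/staging: the `EdgeRow` type, the `dedupe_by_id` flag (standing in for the `Some("id")` / `None` key argument) and both function signatures are assumptions made for illustration. It only shows why Merge mode must dedupe by id before counting, and why the min bound has to be checked alongside the max bound.

```rust
use std::collections::HashMap;

/// Hypothetical edge row: `id` is the edge id, `src` the source node id.
#[derive(Clone, Debug)]
pub struct EdgeRow {
    pub id: String,
    pub src: String,
}

/// Count edges per source node over committed and pending rows.
/// With `dedupe_by_id` set (the Merge-mode case), rows sharing an id are
/// counted once and the latest row wins; otherwise every row is counted.
pub fn count_src_per_edge(
    committed: &[EdgeRow],
    pending: &[EdgeRow],
    dedupe_by_id: bool,
) -> HashMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    if dedupe_by_id {
        // id -> src, pending rows overwrite committed rows with the same id.
        let mut latest: HashMap<&str, &str> = HashMap::new();
        for e in committed.iter().chain(pending) {
            latest.insert(e.id.as_str(), e.src.as_str());
        }
        for src in latest.values() {
            *counts.entry(src.to_string()).or_insert(0) += 1;
        }
    } else {
        for e in committed.iter().chain(pending) {
            *counts.entry(e.src.clone()).or_insert(0) += 1;
        }
    }
    counts
}

/// Check every listed source node against the min and optional max bound.
/// A source with no edges counts as zero, so it can violate `min`.
pub fn enforce_cardinality_bounds(
    counts: &HashMap<String, usize>,
    sources: &[&str],
    min: usize,
    max: Option<usize>,
) -> Result<(), String> {
    for src in sources {
        let n = counts.get(*src).copied().unwrap_or(0);
        if n < min {
            return Err(format!("{src}: {n} edges, below min {min}"));
        }
        if let Some(max) = max {
            if n > max {
                return Err(format!("{src}: {n} edges, above max {max}"));
            }
        }
    }
    Ok(())
}
```

In this sketch, re-loading edge `e1` in Merge mode keeps the count for its source at 1, whereas counting without dedupe reports 2 and would spuriously trip a max bound of 1.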
parent a61e82f47a
commit 3223b51cf1
9 changed files with 828 additions and 199 deletions
@@ -110,7 +110,7 @@ Specific defaults (timeout values, memory caps, TTL windows) are *configuration*
 *Status: aspirational — referential integrity at scale requires SIP-backed cross-table validation; not yet implemented. Cross-batch / cross-version uniqueness tracked in MR-714.*

 25. **Isolation: per-query snapshot; read-your-writes within and across queries in a session.** Each query reads from one consistent manifest version. Within a multi-statement mutation, the read subplan inside each write operator sees the writes from earlier statements. Across queries in a session, reads always resolve the latest manifest version — no reader pinning to older snapshots.
-*Status: upheld for inserts/updates after MR-794 step 2+ — `MutationStaging`'s in-memory accumulator + `TableStore::scan_with_pending` (DataFusion `MemTable` union with the committed Lance scan) implements read-your-writes within a multi-statement mutation. Delete-touching mutations are limited to delete-only by parse-time D₂; closing the within-query RYW gap for deletes requires Lance's two-phase delete API (tracked: MR-793 / Lance-upstream).*
+*Status: upheld for inserts/updates after MR-794 step 2+ — `MutationStaging`'s in-memory accumulator + `TableStore::scan_with_pending` (DataFusion `MemTable` union with the committed Lance scan, with merge-shadow semantics for chained updates) implements read-your-writes within a multi-statement mutation. Delete-touching mutations are limited to delete-only by parse-time D₂; closing the within-query RYW gap for deletes requires Lance's two-phase delete API (tracked: MR-793 / Lance-upstream lance-format/lance#6658). The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures (the partial-failure test pins this), but a narrowed residual remains at the finalize→publisher boundary because Lance has no multi-dataset commit primitive — see [docs/runs.md](runs.md) "Finalize → publisher residual".*

 26. **Durability before acknowledgement.** Commit returns only after the substrate has confirmed durable persistence. No "fast" or "fire-and-forget" durability levels.

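
The merge-shadow behaviour named in the updated status line can be sketched without DataFusion or Lance. The following is a minimal model, not the real `TableStore::scan_with_pending`: the `Row` type, the in-memory slices and the `shadow_by_id` flag (standing in for `key_column: Some("id")` versus `None`) are all assumptions for illustration. It implements only what the commit message states: committed rows whose key appears in any pending batch are dropped from the scan.

```rust
use std::collections::HashSet;

/// Hypothetical row shape; `id` plays the role of the key column.
#[derive(Clone, Debug, PartialEq)]
pub struct Row {
    pub id: String,
    pub age: u32,
}

/// Illustrative stand-in for a scan over committed rows plus pending batches.
/// With `shadow_by_id` set, committed rows whose id appears in any pending
/// batch are excluded, so only the pending version of that row is visible.
/// Without it, the result is a plain union of committed and pending rows.
pub fn scan_with_pending(
    committed: &[Row],
    pending: &[Vec<Row>],
    shadow_by_id: bool,
) -> Vec<Row> {
    let shadowed: HashSet<&str> = if shadow_by_id {
        pending
            .iter()
            .flat_map(|batch| batch.iter().map(|r| r.id.as_str()))
            .collect()
    } else {
        HashSet::new()
    };

    let mut out: Vec<Row> = committed
        .iter()
        .filter(|r| !shadowed.contains(r.id.as_str()))
        .cloned()
        .collect();
    // Pending rows are appended after the surviving committed rows.
    out.extend(pending.iter().flatten().cloned());
    out
}

/// Ids matched by a chained `update where age > 30` over a given scan result.
pub fn ids_with_age_over_30(rows: Vec<Row>) -> Vec<String> {
    rows.into_iter().filter(|r| r.age > 30).map(|r| r.id).collect()
}
```

If an earlier statement in the same mutation moved `p1` from age 40 to age 25, the plain union still exposes the committed age-40 row and the predicate matches `p1`; with shadowing only the pending age-25 row is scanned and `p1` is correctly skipped.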
@@ -14,8 +14,13 @@ mutation proceeds normally.
 `MutationStaging.pending` per touched table. No Lance HEAD advance
 happens during op execution; one `stage_*` + `commit_staged` per
 table runs at end-of-query, then `ManifestBatchPublisher::publish`
-commits the manifest atomically. A mid-query failure leaves Lance
-HEAD untouched on staged tables.
+commits the manifest atomically. **For op-execution failures**
+(validation errors, missing endpoints, parse-time D₂ rejection), Lance
+HEAD on every staged table is untouched and the next mutation
+proceeds normally. A narrowed residual remains at the
+finalize→publisher boundary (multi-table `commit_staged` is not
+atomic with the manifest commit) — see [docs/runs.md](../runs.md)
+"Finalize → publisher residual" for details.
 - **D₂ parse-time rule**: a single mutation query is either
 insert/update-only or delete-only. Mixed → rejected with a clear
 error directing the caller to split into two queries. Lance 4.0.0
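
The D₂ parse-time rule quoted in this hunk lends itself to a short sketch. The enum names, the function name and the error text below are hypothetical and are not taken from the repository; the sketch also assumes that an empty statement list is treated as insert/update-only, which the source does not specify.

```rust
/// Hypothetical statement kinds inside one mutation query.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StatementKind {
    Insert,
    Update,
    Delete,
}

/// The two shapes a mutation query may take under the D2 rule.
#[derive(Debug, PartialEq, Eq)]
pub enum MutationShape {
    InsertUpdateOnly,
    DeleteOnly,
}

/// Classify a mutation at parse time: either insert/update-only or
/// delete-only. A mix of both is rejected before any op executes.
pub fn classify_mutation(stmts: &[StatementKind]) -> Result<MutationShape, String> {
    let has_delete = stmts.iter().any(|s| matches!(s, StatementKind::Delete));
    let has_write = stmts.iter().any(|s| !matches!(s, StatementKind::Delete));
    match (has_write, has_delete) {
        (true, true) => Err(
            "mixed insert/update and delete statements: split into two queries".to_string(),
        ),
        (false, true) => Ok(MutationShape::DeleteOnly),
        // Assumption: an empty query falls through to insert/update-only.
        _ => Ok(MutationShape::InsertUpdateOnly),
    }
}
```

Because the check runs before execution, a rejected mixed query never reaches the staging accumulator, which is consistent with the hunk listing parse-time D₂ rejection among the failures that leave Lance HEAD untouched.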
36
docs/runs.md
@@ -75,6 +75,42 @@ will replace it. Operator-driven (rare in agent workloads); document
 permanently until Lance exposes `Operation::Overwrite { fragments }` as
 a two-phase op.

+### Finalize → publisher residual
+
+The staged-write rewire eliminates one drift class **by construction at
+the writer layer**: an op that fails before pushing to the in-memory
+accumulator (validation errors, missing endpoints, parse-time D₂
+rejection) leaves Lance HEAD untouched on every staged table. This is
+the case the `partial_failure_leaves_target_queryable_and_unblocks_next_mutation`
+test pins.
+
+A second, narrower drift class remains. `MutationStaging::finalize`
+runs `stage_*` + `commit_staged` per touched table sequentially, then
+the publisher commits the manifest. Lance has no multi-dataset atomic
+commit, so the per-table `commit_staged` calls are independent
+operations: if commit_staged on table N+1 fails *after* commit_staged
+on tables 1..N succeeded, or if the publisher's CAS pre-check rejects
+*after* every commit_staged succeeded, tables 1..N are left at
+`Lance HEAD = manifest_pinned + 1`. The next mutation against those
+tables surfaces `ManifestConflictDetails::ExpectedVersionMismatch` —
+the same loud failure mode the rewire was designed to make rare, just
+no longer "unreachable."
+
+Triggers: transient Lance write errors during finalize (object-store
+retry budget exhaustion, disk full); persistent publisher contention
+exceeding `PUBLISHER_RETRY_BUDGET = 5` retries. Closing this requires
+either a Lance multi-dataset atomic-commit primitive (filed upstream
+alongside the two-phase delete request) or a manifest-layer journal
+that replays staged commits on next open. Both are heavyweight; the
+v1 stance is "narrowed window, documented residual, surface the loud
+error when it fires."
+
+The publisher-CAS contract is unchanged: a *concurrent writer* that
+advances any of our touched tables between snapshot capture and
+publisher commit produces exactly one winner. The residual above is
+about *our* abandoned commits in the failure path, not about
+concurrency races.
+
 ## Conflict shape

 Concurrent writers to the same `(table, branch)` produce exactly one