staging: skip revalidation single-table; in-memory snapshot multi-table (PR 2 Step D)

Closes the PR 1b regression (-17% disjoint, -30% same-key) by
eliminating the fresh `db.snapshot_for_branch(branch).await` that
PR 1b's commit_all issued per mutation.

Single-table mutations (`staged.len() + inline_committed.len() == 1`):
skip revalidation entirely. The per-(table, branch) Mutex queue holds
exclusive while we commit; the publisher's CAS catches any drift
that slipped between expected_versions capture and queue acquisition.
Conflict cost: 1 orphan Lance HEAD advance, recovered via the existing
sidecar protocol on the next ReadWrite open. This is the same trade-off
the master plan §"Revalidation perf optimization" prescribes.

Multi-table mutations: replace `db.snapshot_for_branch(branch)` (fresh
manifest read) with `db.snapshot()` (in-memory). Correct under MR-686's
single-process scope because all in-process tenants share one
`Arc<Omnigraph>` -> one coordinator; publishes update the shared
coordinator BEFORE releasing queue guards, so a contending tenant
reads a fresh in-memory view by the time it acquires its queue keys.
The within-mutation race (A captures expected_versions[T2]=V0, B
publishes T2->V1 during A's stage I/O, A then acquires T2's queue)
is caught via the in-memory check. Multi-coordinator deployments
(§VI.27 aspirational) would need force-refresh under the queue —
documented in §VI's "Explicit non-commitments".

Adds a SAFETY comment naming the two load-bearing premises:
(1) per-table queue uses exclusive Mutex (not RwLock), and
(2) single-coordinator invariant (one Omnigraph engine per process).
Migrating either breaks this skip.

Regression sentinel `change_conflict_returns_manifest_conflict_409`
passes. 102 lib + 24 runs + 16 staged_writes pass with the new path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Ragnor Comerford 2026-05-07 16:53:51 +02:00
parent d08c42c369
commit 1b0a2c9310
No known key found for this signature in database

View file

@ -435,26 +435,42 @@ impl StagedMutation {
} }
let guards = db.write_queue().acquire_many(&queue_keys).await; let guards = db.write_queue().acquire_many(&queue_keys).await;
// Revalidate manifest pins. Read fresh per-branch snapshot — // Revalidate manifest pins (PR 2 perf optimization).
// in-memory `db.snapshot()` may be stale if another writer
// committed since our stage. If any pin moved past our
// expected_version, fail-fast before commit_staged moves
// Lance HEAD.
// //
// Both staged and inline-committed tables are revalidated. // Single-table mutations skip revalidation entirely: once the
// Inline-committed tables (delete-only path) had their Lance // per-(table, branch) queue is held, no concurrent writer can
// HEAD advanced before this point, but the *manifest pin* // move our table's pin (queue exclusivity); if revalidation
// shouldn't have moved if no other writer interleaved. If it // were to fail, the publisher's `expected_table_versions` CAS
// has, return manifest_conflict — the sidecar emitted below // catches the same drift. The cost on conflict is one orphan
// captures (expected, post) so the next open's recovery sweep // Lance HEAD advance, recovered via the sidecar protocol on
// can resolve the Lance-HEAD-vs-manifest divergence. // the next ReadWrite open.
// //
// Note: under PR 1b's intermediate state (global server RwLock // Multi-table mutations use the in-memory `db.snapshot()`
// in place), this revalidation is a no-op because no concurrent // (zero I/O) instead of `db.snapshot_for_branch(...)` (fresh
// writer can run. Becomes load-bearing once PR 2 removes the // manifest read). This is correct under MR-686's single-process
// global lock — see `.context/pr-1b-plan.md` Risk 3. // scope: all in-process tenants share one `Arc<Omnigraph>` and
if !staged.is_empty() || !inline_committed.is_empty() { // therefore one coordinator; publishes update the shared
let snapshot = db.snapshot_for_branch(branch).await?; // coordinator BEFORE releasing queue guards (see
// `commit_all` -> caller's publisher -> caller drops guards),
// so any tenant 2 acquiring queue keys after tenant 1 reads a
// fresh in-memory view. The within-mutation race (mutation A
// captures expected_versions[T2]=V0, tenant B publishes T2 to
// V1 during A's stage I/O, A then acquires T2's queue) is
// caught here via the in-memory check (B's publish updated the
// shared coordinator, so snapshot.entry(T2) returns V1 != V0).
//
// Multi-coordinator deployments (§VI.27 aspirational) would
// require force-refresh under the queue here. That trade-off
// is documented in §VI's "Explicit non-commitments" subsection.
//
// SAFETY: relies on (1) the per-(table, branch) WriteQueueManager
// using exclusive `tokio::sync::Mutex<()>` (not `RwLock`), and
// (2) the single-coordinator invariant (one Omnigraph engine
// per process). Migrating either premise breaks this skip; see
// master plan risk #2.
let total_tables = staged.len() + inline_committed.len();
if total_tables > 1 {
let snapshot = db.snapshot().await;
for entry in &staged { for entry in &staged {
let current = snapshot.entry(&entry.table_key).map(|e| e.table_version); let current = snapshot.entry(&entry.table_key).map(|e| e.table_version);
match current { match current {
@ -498,6 +514,9 @@ impl StagedMutation {
} }
} }
} }
// Avoid an unused-variable warning when `branch` is not consumed
// above (single-table fast path skips revalidation).
let _ = branch;
// Sidecar protocol: build the per-table pin list and write the // Sidecar protocol: build the per-table pin list and write the
// sidecar BEFORE any Lance commit_staged runs, so a crash // sidecar BEFORE any Lance commit_staged runs, so a crash