omnigraph/crates
Ragnor Comerford bdd6440c83
staging: re-capture expected_versions under queue (PR 2 Step D fix)
The Step D commit (1b0a2c9) skipped revalidation for single-table
mutations, betting that the publisher's CAS would be a no-op under the
per-(table, branch) queue. The bench falsified this: expected_versions
was captured during stage_all (BEFORE acquire_many), so by the time
the queue acquired and the publisher ran, those captured pins were
stale w.r.t. any in-process concurrent writer that had published in
between. Same-key 8x1 produced ~99% manifest_conflict 409 rejections
because every actor after the first carried stale expected_versions.

Fix: always re-read the in-memory snapshot under the queue and
overwrite expected_versions with the current per-table values.
Single-coordinator invariant (one Arc<Omnigraph> per process) makes
this safe with zero I/O — publishes update the shared coordinator
BEFORE releasing queue guards, so a contending tenant's read sees a
fresh view by the time it acquires its keys. The publisher's CAS
becomes a correct no-op for queued tables. Cross-process drift (the
coordinator is stale because it does not see external publishes) is
still rejected by the publisher CAS as ExpectedVersionMismatch -> 409,
preserving the change_conflict_returns_manifest_conflict_409
regression sentinel.
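
The ordering problem and the fix described above can be sketched roughly as
follows. This is a minimal illustration, not the actual omnigraph code:
Coordinator, PublishError, commit_with_staged_pins and
commit_with_recaptured_pins are hypothetical names, and a single Mutex<()>
stands in for the per-(table, branch) queue.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the shared coordinator: table name -> published version.
#[derive(Default)]
pub struct Coordinator {
    versions: Mutex<HashMap<String, u64>>,
}

#[derive(Debug, PartialEq)]
pub enum PublishError {
    ExpectedVersionMismatch { table: String, expected: u64, actual: u64 },
}

impl Coordinator {
    /// Read the current in-memory version of each table (no I/O).
    pub fn snapshot(&self, tables: &[&str]) -> HashMap<String, u64> {
        let versions = self.versions.lock().unwrap();
        tables
            .iter()
            .map(|t| (t.to_string(), versions.get(*t).copied().unwrap_or(0)))
            .collect()
    }

    /// Compare-and-swap publish: bump every table only if all pins match.
    pub fn publish(&self, expected: &HashMap<String, u64>) -> Result<(), PublishError> {
        let mut versions = self.versions.lock().unwrap();
        for (table, &pin) in expected {
            let actual = versions.get(table).copied().unwrap_or(0);
            if actual != pin {
                return Err(PublishError::ExpectedVersionMismatch {
                    table: table.clone(),
                    expected: pin,
                    actual,
                });
            }
        }
        for table in expected.keys() {
            *versions.entry(table.clone()).or_insert(0) += 1;
        }
        Ok(())
    }
}

/// Pre-fix shape: pins captured at stage time are published as-is,
/// so they may be stale by the time the queue is acquired.
pub fn commit_with_staged_pins(
    coord: &Coordinator,
    queue: &Mutex<()>, // assumption: one lock models the per-key queue
    staged: &HashMap<String, u64>,
) -> Result<(), PublishError> {
    let _guard = queue.lock().unwrap();
    coord.publish(staged)
}

/// Post-fix shape: re-read the in-memory snapshot under the queue and
/// overwrite the staged pins before publishing.
pub fn commit_with_recaptured_pins(
    coord: &Coordinator,
    queue: &Mutex<()>,
    staged: &HashMap<String, u64>,
) -> Result<(), PublishError> {
    let _guard = queue.lock().unwrap();
    let tables: Vec<&str> = staged.keys().map(String::as_str).collect();
    let fresh = coord.snapshot(&tables);
    coord.publish(&fresh)
}
```

In this sketch, two writers that both stage against the same version succeed
in turn when they go through commit_with_recaptured_pins, while the second one
fails with ExpectedVersionMismatch through commit_with_staged_pins. A publish
from outside the process is not modeled here.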

Trade-off documented in the comment: SERIALIZABLE-opt-in writes
(§VI.36 aspirational) will need an additional revalidation step
here; the bench's append/upsert pattern is fine because Lance's
natural rebase handles concurrent writes onto the same dataset.

Bench results captured at .context/bench-results/after-pr2/ +
.context/bench-results/comparison.md:
- single-actor 1x1: 15.0 ops/s vs baseline 12.3 (+22%)
- disjoint 8x8:    7.03 ops/s vs baseline 6.24 (+13%)
- same-key 8x1:    still rejected (76% errors) by the
  ensure_expected_version strict check upstream of commit_all;
  follow-up to address.

Disjoint's +13% is far below the master plan's ≥8× target. The bench
shows the coordinator Mutex is now the dominant serializer; relaxing
it to an RwLock for snapshot/version reads is the next perf step,
tracked as a follow-up in comparison.md.
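
A rough sketch of what the RwLock relaxation could look like, assuming the
version view is a plain map; VersionView and its methods are hypothetical
names, not the existing coordinator API. Readers share the lock, and only a
publish takes it exclusively.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical version view: table name -> published version.
#[derive(Default)]
pub struct VersionView {
    versions: RwLock<HashMap<String, u64>>,
}

impl VersionView {
    /// Snapshot/version read: takes the shared (read) side of the lock,
    /// so concurrent readers do not serialize behind each other.
    pub fn version(&self, table: &str) -> u64 {
        self.versions.read().unwrap().get(table).copied().unwrap_or(0)
    }

    /// Publish: takes the exclusive (write) side and bumps the version.
    pub fn publish(&self, table: &str) -> u64 {
        let mut versions = self.versions.write().unwrap();
        let v = versions.entry(table.to_string()).or_insert(0);
        *v += 1;
        *v
    }
}
```

Whether this helps in practice depends on the read/write mix on the
coordinator, which the sketch does not measure.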

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 19:28:36 +02:00
..
omnigraph staging: re-capture expected_versions under queue (PR 2 Step D fix) 2026-05-07 19:28:36 +02:00
omnigraph-cli mr-686: bundle PR 0/1a/1b foundation + PR 2 catalog/schema_source ArcSwap 2026-05-07 16:22:38 +02:00
omnigraph-compiler release: bump version to 0.4.1 2026-05-02 23:20:50 +02:00
omnigraph-server server: flip AppState to Arc<Omnigraph>, wire admission on /change (PR 2 Step F) 2026-05-07 17:08:26 +02:00