omnigraph/crates
Ragnor Comerford d69b15d975
fix(writes): tolerate benign drift, defer sidecar-covered (OCC fence = fresh manifest pin)
The shared pre-stage precondition rejected any strict write (update / delete
/ schema apply) whenever a table's Lance HEAD differed from the manifest pin
the caller captured. But `HEAD > pin` is ambiguous: it can be benign
content-preserving drift that never published (compaction, a recovery
restore, an old-binary optimize, an external compact_files) — which carries
no recovery sidecar and is safe to write over — or a real in-flight partial
write that a recovery sidecar covers and the open-time sweep will roll back. The
old check failed both with a stale-view 409, so schema apply (and strict
mutation) could not run on any write-active graph that had ever compacted or
recovered, and the original +1-per-retry loop followed from it.

Replace it with `Omnigraph::ensure_writable_or_defer`, which makes the OCC
fence the *current* manifest pin (re-read fresh on the conflict path), not the
caller's possibly-stale snapshot pin:

- HEAD == caller pin: fresh, no drift -> proceed (fast path, no extra read).
- caller pin != current pin: the caller is stale -> ExpectedVersionMismatch
  here, before any staged commit or sidecar, so a stale handle still fails
  loudly and leaves no residue (the prior early-reject behavior is preserved).
- caller pin == current pin, HEAD > pin, no sidecar: benign drift -> proceed;
  the writer's commit + publisher CAS reconcile the manifest.
- caller pin == current pin, HEAD > pin, a (foreign) sidecar pins the table:
  defer with an actionable 'reopen to recover' error.
- HEAD < current pin: manifest leads durable Lance state -> loud invariant.
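The case analysis above can be sketched as a small pure function. This is only an illustration under stated assumptions, not the body of `Omnigraph::ensure_writable_or_defer`: the `PreStage` enum, the `classify` function, and the closure parameters are hypothetical names, versions are modeled as plain `u64`, and the order of the checks follows the order of the list. The current pin and the sidecar lookup are passed as closures so that the fast path visibly performs no extra read.

```rust
/// Outcome of the pre-stage check (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum PreStage {
    /// Safe to stage the write.
    Proceed,
    /// Caller pin differs from the current manifest pin (ExpectedVersionMismatch).
    StaleCaller,
    /// A foreign recovery sidecar pins the table: reopen to recover.
    DeferToRecovery,
    /// Manifest pin is ahead of the durable Lance HEAD: invariant violation.
    ManifestLeadsHead,
}

/// Classify a strict write before anything is staged.
///
/// `read_current_pin` and `foreign_sidecar_pins_table` are hypothetical
/// stand-ins for the fresh manifest read and the sidecar lookup; they are
/// only invoked on the conflict path.
pub fn classify(
    head: u64,
    caller_pin: u64,
    read_current_pin: impl FnOnce() -> u64,
    foreign_sidecar_pins_table: impl FnOnce() -> bool,
) -> PreStage {
    // Fast path: HEAD matches what the caller captured, nothing drifted.
    if head == caller_pin {
        return PreStage::Proceed;
    }

    // Conflict path: re-read the manifest pin so the fence is the fresh pin.
    let current_pin = read_current_pin();
    if caller_pin != current_pin {
        // The caller's snapshot is stale; fail before any commit or sidecar.
        return PreStage::StaleCaller;
    }

    if head < current_pin {
        // The manifest claims a version Lance does not durably have.
        return PreStage::ManifestLeadsHead;
    }

    // Remaining case: caller pin == current pin and HEAD > pin.
    if foreign_sidecar_pins_table() {
        PreStage::DeferToRecovery
    } else {
        // Benign drift: the commit plus publisher CAS reconcile the manifest.
        PreStage::Proceed
    }
}
```

For example, `classify(7, 5, || 5, || false)` models a table compacted from version 5 to 7 with no sidecar and yields `PreStage::Proceed`, while the same call with `|| true` for the sidecar lookup yields `PreStage::DeferToRecovery`.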

Load-bearing details:

- The OCC fence that staging records is now the manifest pin threaded out of
  `open_for_mutation_on_branch` (4th tuple element), not `ds.version()`. With
  benign drift tolerated, capturing the drifted HEAD would relocate the
  rejection into a spurious 409 at the post-queue strict check.
- `sidecar_pins_table` takes `exclude_operation_id`: schema apply writes its
  own recovery sidecar before its per-table head re-checks, so it must skip
  that sidecar or it would defer against itself.
- The delete path's `initial_version` (reopen-for-mutation) stays `ds.version()`
  — it is a HEAD-vs-HEAD self-consistency check, not an OCC fence.
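The self-exclusion described for `sidecar_pins_table` can be illustrated with a minimal sketch. Only the function name and the `exclude_operation_id` parameter come from the description above; the `Sidecar` struct, its fields, the in-memory slice of sidecars, and the exact signature are assumptions made for the example, and the real function presumably reads sidecars from storage.

```rust
/// Hypothetical in-memory view of a recovery sidecar (illustration only).
pub struct Sidecar {
    /// Identifier of the operation that wrote this sidecar.
    pub operation_id: String,
    /// Tables the sidecar covers.
    pub tables: Vec<String>,
}

/// Returns true if any sidecar other than the excluded operation's own
/// covers `table`. The signature is an assumption, not the crate's API.
pub fn sidecar_pins_table(
    sidecars: &[Sidecar],
    table: &str,
    exclude_operation_id: Option<&str>,
) -> bool {
    sidecars
        .iter()
        // Skip the caller's own sidecar so it never defers against itself.
        .filter(|s| Some(s.operation_id.as_str()) != exclude_operation_id)
        // A remaining (foreign) sidecar pins the table if it lists it.
        .any(|s| s.tables.iter().any(|t| t == table))
}
```

In this model, a schema apply that has already written its own sidecar passes its operation id as `exclude_operation_id`, so only a foreign sidecar on the same table causes a defer.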

Only the pre-stage check changes; the post-queue strict checks and the
publisher CAS (manifest-vs-manifest) are unchanged.
2026-06-08 11:54:51 +02:00
omnigraph fix(writes): tolerate benign drift, defer sidecar-covered (OCC fence = fresh manifest pin) 2026-06-08 11:54:51 +02:00
omnigraph-cli fix(optimize): skip blob-bearing tables to avoid Lance compaction crash (#138) 2026-06-02 17:12:00 +02:00
omnigraph-compiler fix(optimize): skip blob-bearing tables to avoid Lance compaction crash (#138) 2026-06-02 17:12:00 +02:00
omnigraph-policy fix(optimize): skip blob-bearing tables to avoid Lance compaction crash (#138) 2026-06-02 17:12:00 +02:00
omnigraph-server fix(optimize): skip blob-bearing tables to avoid Lance compaction crash (#138) 2026-06-02 17:12:00 +02:00