Confirms Lance v4.0.0 has row-level CAS for merge_insert only when the join-key column carries lance-schema:unenforced-primary-key=true. Our __manifest schema does not, so the publisher silently allows duplicate object_id rows under concurrent writers. Note + reproducible scratch crate select the layered (pre-check + row-level CAS) approach for the publisher API ticket.
8.3 KiB
MergeInsertBuilder CAS granularity (Lance 4.0.0)
Status: investigated (2026-04-28)
Consumed by: publisher per-table version API ticket — see "Implication" below.
Companion ticket: zombie-run investigation, Prerequisite section under "The demotion".
Repro: .context/scratch/merge-insert-cas-repro/ — cargo test --release -- --nocapture. Both tests pass against lance = "=4.0.0", demonstrating the answer.
TL;DR
Lance v4.0.0 supports row-level CAS for MergeInsertBuilder — but only when the join-key columns are annotated lance-schema:unenforced-primary-key=true. Without that annotation, two concurrent execute_reader() calls inserting the same key into disjoint fragments both succeed silently, producing duplicate rows under the same key.
Our __manifest schema (crates/omnigraph/src/db/manifest/state.rs:44-60) does not carry that annotation, so today's GraphNamespacePublisher::publish has no row-level CAS protection. The TOCTOU window between load_publish_state and the merge-insert commit is open — two concurrent publishers computing the same (table_key, table_version) row both land it.
Implication for the publisher API ticket
Take both layers, not one or the other:
- Annotate
object_id(or the version-row subset)lance-schema:unenforced-primary-key=trueinmanifest_schema()so Lance enforces row-level CAS at commit time. This closes the silent-duplicate hole that exists today, independent of the new feature. - Add a pre-check phase in
publishthat validatesexpected_table_versionsagainst the manifest snapshot loaded byload_publish_state. The pre-check covers "expected unchanged" assertions for tables the caller is not writing to (Lance's row-level CAS only covers rows we are writing). - Set
merge_builder.conflict_retries(0)and let the publisher's own caller-level retry loop handle the rebase: refresh manifest, re-run pre-check, re-attempt merge-insert. This gives strict atomicity for the per-table expected-version contract; Lance's auto-rebase would otherwise let our commit through against unfamiliar manifest state.
The ticket's "Approach 1 (parameter-only)" by itself is insufficient — Lance row-level CAS only catches collisions on rows we are emitting; it does not enforce "expected unchanged" for untouched tables. The ticket's "Approach 2 (pre-check + CAS)" is correct only if the row-level CAS is also enabled, otherwise the pre-check is TOCTOU-vulnerable and the existing publisher already has the silent-duplicate bug.
How Lance does it (source-quoted)
1. Filter is built only when ON columns match the unenforced primary key
rust/lance/src/dataset/write/merge_insert/exec/write.rs:209-221:
// Check if ON columns match the schema's unenforced primary key
let field_ids: Vec<i32> = params
.on
.iter()
.filter_map(|name| dataset.schema().field(name).map(|f| f.id))
.collect();
let pk_field_ids: Vec<i32> = dataset
.schema()
.unenforced_primary_key()
.iter()
.map(|f| f.id)
.collect();
let is_primary_key = !pk_field_ids.is_empty() && field_ids == pk_field_ids;
The filter is then attached only when is_primary_key:
rust/lance/src/dataset/write/merge_insert/exec/write.rs:903-928:
let inserted_rows_filter = if is_primary_key {
// ... build KeyExistenceFilter (bloom or exact set)
};
// ...
inserted_rows_filter: inserted_rows_filter.clone(),
The PK is sourced from field metadata; from inserted_rows.rs:158:
/// Tracks keys of inserted rows for conflict detection./// Only created when ON columns match the schema's unenforced primary key.
2. Conflict detection compares the bloom filters at commit time
rust/lance/src/io/commit/conflict_resolver.rs::check_update_txn (line 328+):
match (self_inserted_rows_filter, other_inserted_rows_filter) {
(Some(self_keys), Some(other_keys)) => {
// ...
let Ok((has_intersection, _maybe_false_positive)) =
self_keys.intersects(other_keys)
else { /* treat as conflict */ };
if has_intersection {
return Err(self.retryable_conflict_err(other_transaction, other_version));
}
}
(Some(_), None) => {
// Pessimistic: filter present on this side only -> conflict.
return Err(self.retryable_conflict_err(other_transaction, other_version));
}
_ => {}
}
RetryableCommitConflict is consumed by the merge-insert retry loop (rust/lance/src/dataset/write/retry.rs:97); when retries exhaust it surfaces as Error::TooMuchWriteContention.
3. Without the filter — fragment overlap only
If the filter is None on one side (no PK annotation), check_update_txn falls through to fragment-overlap checks (line 382+). For two Operation::Update transactions where each writer's merge-insert emits a new fragment with the new row, neither side's updated_fragments / removed_fragment_ids overlaps the other's modified_fragment_ids. The conflict resolver returns Ok and both commits land.
This is exactly what merge_insert does on a not-matched insert: rust/lance/src/dataset/write/merge_insert.rs:1574 builds Operation::Update with inserted_rows_filter: None (the v1 path) — though the more common v2/FullSchemaMergeInsertExec path also leaves the filter None unless is_primary_key flipped to true at plan time.
4. Today's __manifest schema does NOT have the annotation
crates/omnigraph/src/db/manifest/state.rs:44-60:
pub(super) fn manifest_schema() -> SchemaRef {
Arc::new(Schema::new(vec![
Field::new("object_id", DataType::Utf8, false), // <-- no metadata
// ...
]))
}
crates/omnigraph/src/db/manifest/publisher.rs:286 joins on object_id. With no PK metadata, is_primary_key is false for every commit; inserted_rows_filter is always None; the fallback fragment-overlap check accepts the duplicate.
Empirical confirmation
.context/scratch/merge-insert-cas-repro/src/lib.rs builds two tests against a __manifest-shaped schema, deliberately using MergeInsertBuilder with the same on=["object_id"], both writers based on the same dataset version:
without_pk_annotation_concurrent_inserts_both_succeed— passes. Test output:[without_pk] duplicate rows after both commits: 2.with_pk_annotation_concurrent_inserts_second_fails— passes. Test output:[with_pk] second writer correctly rejected with TooMuchWriteContention.
Run:
cd .context/scratch/merge-insert-cas-repro
cargo test --release -- --nocapture
Recommended publisher implementation outline
- In
manifest_schema()(crates/omnigraph/src/db/manifest/state.rs:44): attachlance-schema:unenforced-primary-key=truemetadata toobject_id. - Extend
ManifestBatchPublisher::publish(crates/omnigraph/src/db/manifest/publisher.rs:42-44) to takeexpected_table_versions: HashMap<String, u64>(empty = back-compat). - In
GraphNamespacePublisher::publish, afterload_publish_state(publisher.rs:77), reduceexisting_versionsto "latest non-tombstoned version per table" (mirrorstate.rs:65-94) and reject any expectation that doesn't match with a typedmanifest_conflict { table_key, expected, actual }. Add the structured variant onManifestError(crates/omnigraph/src/error.rs:5-46). - Switch
merge_builder.conflict_retries(5)to(0)(publisher.rs:290) so a concurrent commit fails fast instead of silently rebasing past our pre-check. - Wrap the publish in a bounded retry loop at the publisher level: on
TooMuchWriteContentionorCommitConflict, refresh manifest, re-run pre-check, re-attempt — bounded retries (5). - Existing tests at
crates/omnigraph/src/db/manifest/tests.rs:680-728are the right place to extend with stale-expected-version cases.
Out of scope here
- Whether to apply the same PK annotation to
_graph_commits.lance/_graph_runs.lance— separate review when those code paths land in scope. - Schema-migration story for existing manifests written before the annotation lands. Adding field metadata is a non-breaking schema change in Lance, but worth confirming with one round of
optimizesemantics before deploying. - Whether the fast-fail (
conflict_retries(0)+ caller-level retry) is preferable to Lance's built-in rebase. The argument here is correctness: Lance's rebase is "transparent merge", which is the wrong semantic for an OCC contract.