omnigraph/vendor/lance-table/README.omnigraph.md

fix(deps): vendor lance-table 7.0.0 + lance#7480 so merge-updated tables survive filtered reads after deletes

iss-merge-rowid-overlap-corrupts-filtered-reads / lance#7444: an update-style merge_insert over a merge-written fragment legally reuses the updated rows' stable row ids (row-id-lineage spec: updates preserve _rowid) while the superseded fragment keeps its full sequence plus a deletion vector. A later delete leaves the overlapping id range sparsely tiled, and lance-table 7.0.0's RowIdIndex::new asserted dense tiling, failing every filtered read that builds the id→address map ("Wrong range" debug assert; "all columns in a record batch must have the same length" or a silently-wrong batch in release).

The upstream fix (lance#7480, merged 2026-07-01) landed hours AFTER v8.0.0 was cut, so no release ≤ 8.0.0 carries it. Consume it now as a vendored pin: vendor/lance-table is the pristine published 7.0.0 source plus ONLY the #7480 rowids/index.rs hunk (drop the false tiling assert; hard-error on the true invariant, one live id claimed by two fragments) and upstream's regression unit test, wired via [patch.crates-io]. The fix is read-side only, so already-written graphs become readable as-is, with no data repair.

Removal condition (see vendor/lance-table/README.omnigraph.md): drop the vendor dir + patch entry at the first Lance bump whose lance-table ships lance#7480 (9.0.0, or a backported 8.0.1). The surface guard filtered_scan_tolerates_merge_update_row_id_overlap keeps that honest in both directions.

Turns the previous commit's red tests green. Full workspace gate passes (cargo test --workspace --locked --no-fail-fast, 68 suites).
2026-07-02 02:17:25 +03:00
# Vendored `lance-table` 7.0.0 + lance#7480 (omnigraph patch pin)

This directory is the **pristine `lance-table` 7.0.0 crates.io source** (unpacked
from the published `.crate`) carrying exactly one upstream fix, cherry-picked
from [lance-format/lance#7480](https://github.com/lance-format/lance/pull/7480)
(merged to Lance main 2026-07-01; not present in any release ≤ 8.0.0):

- `src/rowids/index.rs`: `RowIdIndex::new` no longer asserts that overlapping
  row-id chunks densely tile their range. An update-style `merge_insert`
  legally reuses the updated rows' stable ids in new fragments while the
  superseded fragment keeps its full sequence plus a deletion vector, so a
  later delete leaves the union of live ids short of the span. The real
  invariant is that no live id is claimed by two fragments; violating it is
  now a hard error in `merge_overlapping_chunks`. Upstream's regression unit
  test is included.


Without the fix, any filtered read that builds the row-id index on such a
table fails. Debug builds trip the "Wrong range" debug assert at
`rowids/index.rs:50`; release builds fail with "all columns in a record batch
must have the same length" or return a silently wrong batch. Upstream bug:
[lance#7444](https://github.com/lance-format/lance/issues/7444), tracked as
`iss-merge-rowid-overlap-corrupts-filtered-reads` / `blk-lance-7444` on the
dev graph.

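
The difference between the dropped assert and the real invariant can be illustrated with a small standalone sketch. This is not `lance-table` code: `Chunk`, `densely_tiles`, `build_index`, and `example_chunks` are hypothetical names invented for illustration, and the id ranges are made up. It only models the shape of the problem, assuming each fragment contributes a contiguous id range minus the ids in its deletion vector.

```rust
use std::collections::{HashMap, HashSet};
use std::ops::Range;

/// Hypothetical stand-in for one fragment's row-id chunk: the id range the
/// fragment was written with, plus the ids its deletion vector has removed.
struct Chunk {
    fragment: u32,
    ids: Range<u64>,
    deleted: Vec<u64>,
}

impl Chunk {
    /// Ids in this chunk that are still live (not in the deletion vector).
    fn live_ids(&self) -> impl Iterator<Item = u64> + '_ {
        self.ids.clone().filter(move |id| !self.deleted.contains(id))
    }
}

/// The property the removed assert demanded: every id in the overall span
/// is live in some chunk.
fn densely_tiles(chunks: &[Chunk]) -> bool {
    let start = chunks.iter().map(|c| c.ids.start).min().unwrap_or(0);
    let end = chunks.iter().map(|c| c.ids.end).max().unwrap_or(0);
    let live: HashSet<u64> = chunks.iter().flat_map(|c| c.live_ids()).collect();
    (start..end).all(|id| live.contains(&id))
}

/// The real invariant: each live id belongs to exactly one fragment.
/// Returns the id -> fragment map, or an error naming the doubly-claimed id.
fn build_index(chunks: &[Chunk]) -> Result<HashMap<u64, u32>, String> {
    let mut owner = HashMap::new();
    for chunk in chunks {
        for id in chunk.live_ids() {
            if let Some(prev) = owner.insert(id, chunk.fragment) {
                return Err(format!(
                    "row id {id} is live in fragments {prev} and {}",
                    chunk.fragment
                ));
            }
        }
    }
    Ok(owner)
}

/// Made-up scenario mirroring the description above.
fn example_chunks() -> Vec<Chunk> {
    vec![
        // Fragment 0 was written with ids 0..4; the update superseded ids 1 and 2.
        Chunk { fragment: 0, ids: 0..4, deleted: vec![1, 2] },
        // Fragment 1 holds the updated rows under the same ids; id 2 was deleted later.
        Chunk { fragment: 1, ids: 1..3, deleted: vec![2] },
    ]
}

fn main() {
    let chunks = example_chunks();
    // Live ids are {0, 1, 3}: id 2 is a hole, so the span 0..4 is not densely tiled.
    assert!(!densely_tiles(&chunks));
    // No id is live twice, so the index still builds.
    let index = build_index(&chunks).expect("no live id is claimed twice");
    assert_eq!(index[&1], 1);
    assert_eq!(index[&3], 0);

    // If fragment 0 had not tombstoned id 1, two fragments would claim it.
    let mut broken = example_chunks();
    broken[0].deleted = vec![2];
    assert!(build_index(&broken).is_err());
    println!("sparse overlap accepted, double claim rejected");
}
```

In this model a hole in the span is harmless because lookups for a deleted id simply find nothing, whereas a doubly-claimed live id makes the id-to-address mapping ambiguous, which is why only the latter is worth a hard error.
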

Wired up via `[patch.crates-io] lance-table = { path = "vendor/lance-table" }`
in the workspace root `Cargo.toml`.

## Removal condition

Delete this directory and the `[patch.crates-io]` entry at the **first Lance
bump whose `lance-table` ships lance#7480** — 9.0.0, or a backported 8.0.1 if
upstream cuts one. The runtime guard
`crates/omnigraph/tests/lance_surface_guards.rs::filtered_scan_tolerates_merge_update_row_id_overlap`
pins the fixed behavior: it goes red if the patch is dropped too early or a
future bump regresses the fix.

## Verifying the delta

```bash
# Run from the workspace root. The full diff vs the published crate should be
# ONLY the #7480 hunk + this README:
tar -xzf ~/.cargo/registry/cache/index.crates.io-*/lance-table-7.0.0.crate -C /tmp
diff -ru /tmp/lance-table-7.0.0 vendor/lance-table
```