mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-15 01:55:13 +02:00
build(deps): bump Lance 6.0.1 → 7.0.0 (correct-by-design substrate alignment) (#229)
* build(deps): bump Lance 6.0.1 → 7.0.0 (object_store 0.13.2, roaring 0.11.4) Arrow stays 58 and DataFusion stays 53 (no change). The only transitive bump is object_store 0.12.5 → 0.13.2. 141 upstream commits reviewed; no fixes lost (the 6.0.x release-branch backports are all forward-ported into 7.0.0). - object_store 0.13 moved get/put/head/rename/delete behind a new ObjectStoreExt trait (list/list_with_delimiter/put_opts stay on the core trait). Add `use object_store::ObjectStoreExt` in storage.rs and db/manifest/namespace.rs; no call-site changes. Mirrors Lance's own migration in PR #6672. - roaring pinned to 0.11.4 (cargo update -p roaring --precise 0.11.4). Lance 7.0.0's UpdatedFragmentOffsets newtype (lance#6650) derives Eq over HashMap<u64, RoaringBitmap>, which needs RoaringBitmap: Eq, added in roaring 0.11.4; the loose `roaring = "0.11"` constraint otherwise resolves 0.11.3 and lance itself fails to compile. - lance#6774: merge-insert INSERT rows now stamp _row_created_at_version with the commit version (was a fallback of 1). Flip the lance_version_columns assertion to `== v2` and correct the changes/mod.rs rationale comment. Production change-detection keys on _row_last_updated_at_version + ID membership, so its logic is unaffected. Refs lance#6650, lance#6774, lance#6672. * fix(storage): pin WriteParams::auto_cleanup = None (lance#6755 default flip) lance#6755 flipped the WriteParams::auto_cleanup default from on (a full cleanup pass every 20th commit) to None. On 6.0.1 the on-by-default hook could silently GC versions that __manifest pins for snapshots/time-travel. OmniGraph owns cleanup explicitly (optimize.rs::cleanup_all_tables) and never set auto_cleanup, so it was relying on a default that is both wrong for our snapshot model and now changed upstream. Pin auto_cleanup: None explicitly at all 11 production WriteParams sites (table_store ×6, commit_graph ×2, recovery_audit ×1, manifest/graph ×2 — the __manifest + sub-table Create paths). Removes the dependency on a default-flag value and locks in the snapshot-safe behavior regardless of future upstream re-flips. Refs lance#6755. * test(lance): pin BTREE range-boundary correctness (lance#6796) lance#6796 (issue #6792) fixed a BTREE scalar-index range-query bound inclusiveness bug: `x <= hi AND x > lo` returned the wrong boundary row. Add lance_surface_guards.rs::btree_range_query_boundary_is_correct, which reproduces the exact #6792 shape (5 rows + an explicit BTREE drives the index path even on tiny data) and pins the corrected inclusive-<= / exclusive-> semantics. It turns red if a future Lance regression reintroduces the bug. OmniGraph today builds BTREE only on string @key columns and queries them by equality/IN, so its current patterns do not hit this; the guard protects any future BTREE-range path (BTREE-on-properties, range-on-key). Refs lance#6796. * docs(dev): align Lance docs + invariants to 7.0.0 - docs/dev/lance.md: new 2026-06-14 alignment stanza for the 6.0.1 → 7.0.0 bump (object_store ObjectStoreExt move, roaring 0.11.4, #6774/#6796/#6755 behavior, #6658 shipped → MR-A unblocked but separate, #6666 + blob compaction still open); prior 6.0.1 stanza demoted to historical. - AGENTS.md: storage substrate 6.x → 7.x (line + architecture diagram). - docs/dev/invariants.md: deletes/vector known gap updated — the staged two-phase delete API (lance#6658) now exists and MR-A is unblocked, but delete_where stays inline and D2 stays in place until the migration lands; create_vector_index still gated on lance#6666. * fix(storage): skip Lance auto-cleanup on commit paths for legacy datasets Addresses PR #229 review (Codex P1). `WriteParams::auto_cleanup` is create-time config with no effect on existing datasets (Lance write.rs docs), so the previous `auto_cleanup: None` change alone did NOT protect graphs created before the v7 bump: 6.0.1 defaulted auto_cleanup ON, leaving `lance.auto_cleanup.*` config on those datasets, and Lance's per-commit hook (io/commit.rs: `if !commit_config.skip_auto_cleanup`) fires off that stored config — so omnigraph's own writes would GC versions the __manifest pins for snapshots/time-travel. Skip the hook on every commit path, covering new and legacy datasets alike: - commit_staged: CommitBuilder::with_skip_auto_cleanup(true) — the staged data path. - __manifest publisher: MergeInsertBuilder::skip_auto_cleanup(true). - all 11 WriteParams: skip_auto_cleanup: true (direct Dataset::write/append paths; auto_cleanup: None retained so new datasets store no cleanup config at all). Tests: - lance_surface_guards::skip_auto_cleanup_suppresses_version_gc — substrate: negative control (config GCs v1 without skip) + with-skip survival. - staged_writes::commit_staged_skips_auto_cleanup_so_pinned_versions_survive — omnigraph usage: commit_staged on a legacy-config dataset preserves the pinned create version. Refs lance#6755. * test(lance): assert created_at-preserved + updated_at-bumped on merge_insert UPDATE Addresses PR #229 review follow-up. `lance_merge_insert_update_preserves_created_at_version` documented (in a comment) that a merge_insert UPDATE preserves created_at and bumps updated_at, but only asserted the value change — leaving the change-feed invariant unguarded. Add the two missing assertions: - bob created_at == v1 (preserved across UPDATE; what the test name promises; lance#6774 only changed INSERT-row stamping). - bob updated_at == v2 (bumped to the commit version) — the invariant OmniGraph's insert/update classification relies on (changes/mod.rs keys on _row_last_updated_at_version). A regression here would silently drop updates from the diff/change feed.
This commit is contained in:
parent
7963499995
commit
67baf615d9
16 changed files with 1708 additions and 284 deletions
|
|
@ -132,13 +132,18 @@ them explicit.
|
|||
new writer cannot couple a write with a HEAD advance through the default
|
||||
surface. The dead legacy methods (`append_batch` on the trait,
|
||||
`merge_insert_batch{,es}`, `create_{btree,inverted}_index`) were removed. The
|
||||
remaining residuals are `delete_where` (gated on MR-A — Lance v7.x bump)
|
||||
and `create_vector_index` (gated on Lance #6666); see
|
||||
[lance.md](lance.md) and [writes.md](writes.md). New write paths should use
|
||||
the staged shape unless a documented Lance blocker applies.
|
||||
remaining residuals are `delete_where` and `create_vector_index`. The Lance
|
||||
6.0.1 → 7.0.0 bump landed, so the staged two-phase delete API
|
||||
(`DeleteBuilder::execute_uncommitted`, Lance #6658) is now available and MR-A
|
||||
is unblocked — but the migration itself is still pending, so `delete_where`
|
||||
stays inline for now. `create_vector_index` remains gated on Lance #6666
|
||||
(still open). See [lance.md](lance.md) and [writes.md](writes.md). New write
|
||||
paths should use the staged shape unless a documented Lance blocker applies.
|
||||
- **Deletes and vector indexes:** `delete_where` and vector index creation still
|
||||
advance Lance HEAD inline because the required public Lance APIs are missing.
|
||||
Keep D2 and recovery coverage in place until those residuals are removed.
|
||||
advance Lance HEAD inline. The public delete two-phase API now exists (Lance
|
||||
#6658 shipped in 7.0.0), so the delete residual is unblocked pending the MR-A
|
||||
migration; vector index creation is still blocked (Lance #6666 open). Keep D2
|
||||
and recovery coverage in place until those residuals are removed.
|
||||
- **Blob-column compaction:** Lance `compact_files` mis-decodes blob-v2 columns
|
||||
under its forced `BlobHandling::AllBinary` read ("more fields in the schema
|
||||
than provided column indices"), so `optimize` skips any table with a `Blob`
|
||||
|
|
|
|||
|
|
@ -156,7 +156,22 @@ If a future need pulls one of these into scope, add a row to the matching domain
|
|||
|
||||
When Lance ships a major release that changes any of the above (file format bump, new index type, transaction semantics change, new branching primitive), refresh this index in the same change as the omnigraph upgrade. Stale Lance pointers are worse than no pointers.
|
||||
|
||||
### Last alignment audit: 2026-05-22 (Lance 6.0.1 upstream; omnigraph pinned at 6.0.1)
|
||||
### Last alignment audit: 2026-06-14 (Lance 7.0.0 upstream; omnigraph pinned at 7.0.0)
|
||||
|
||||
Migration from Lance 6.0.1 → 7.0.0 landed in this cycle. **Arrow stayed 58, DataFusion stayed 53** (no change) — the only transitive bump is `object_store` 0.12.5 → 0.13.2. 141 upstream commits reviewed (6.0.1 → 7.0.0); no fixes lost (the 6.0.x release-branch backports are all forward-ported into 7.0.0). Behavior-affecting findings:
|
||||
|
||||
- **object_store 0.13 moved convenience methods behind a new `ObjectStoreExt` trait** (`get`/`put`/`head`/`rename`/`delete`; `list`/`list_with_delimiter`/`put_opts` stay on the core `ObjectStore` trait). Fix = add `use object_store::ObjectStoreExt;` to `storage.rs` and `db/manifest/namespace.rs`; no call-site changes. Mirrors Lance's own migration in PR #6672. The local-FS `PutMode::Update` gap is unchanged (still unimplemented upstream), so `storage.rs::write_text_if_match`'s local content-token emulation stays.
|
||||
- **`roaring` must be pinned to 0.11.4** (`cargo update -p roaring --precise 0.11.4`). Lance 7.0.0's `UpdatedFragmentOffsets` newtype (PR #6650) derives `Eq` over `HashMap<u64, RoaringBitmap>`, which needs `RoaringBitmap: Eq` — added only in roaring 0.11.4 (roaring-rs PR #341). Lance's loose `roaring = "0.11"` constraint otherwise resolves the broken 0.11.3 and **lance itself fails to compile** (`RoaringBitmap: Eq is not satisfied`). roaring is transitive (no direct workspace dep); the pin lives only in `Cargo.lock`.
|
||||
- **`_row_created_at_version` for merge-insert INSERT rows now = the commit version** (PR #6774; was a fallback of 1 / dataset-creation version). Flipped `lance_version_columns.rs::lance_merge_insert_new_row_stamps_created_at_version` to assert `== v2`. Production change-detection keys on `_row_last_updated_at_version` + ID-set membership, so classification logic is unaffected (the `changes/mod.rs` rationale comment was corrected).
|
||||
- **BTREE range-query bound inclusiveness fixed** (PR #6796, issue #6792): `x <= hi AND x > lo` returned the wrong boundary row on 6.0.1. omnigraph today builds BTREE only on string `@key` columns (`id`/`src`/`dst`) and queries them by equality/IN, not range, so its *current* query patterns almost certainly never hit this bug — but the corrected boundary semantics are a contract we rely on the moment a BTREE-range path appears (BTREE-on-properties via the index-type tickets, or a range-on-key query). Pinned by `lance_surface_guards.rs::btree_range_query_boundary_is_correct` (reproduces #6792's 5-row + BTREE shape).
|
||||
- **`WriteParams::auto_cleanup` default flipped from on (every-20-commits) to `None`** (PR #6755). On 6.0.1 the on-by-default hook could GC versions the `__manifest` pins for snapshots/time-travel. omnigraph owns cleanup explicitly (`optimize.rs::cleanup_all_tables`). Two parts to the fix, because `auto_cleanup` is **create-time config only and has no effect on existing datasets** (Lance `write.rs` docs): (1) `auto_cleanup: None` at all 11 `WriteParams` sites so *new* datasets store no cleanup config; (2) — the load-bearing half — `skip_auto_cleanup: true` on every commit path, because graphs created **before** the bump still carry the on-config in their datasets, and Lance's hook fires off the *dataset's stored* config at commit time (`io/commit.rs`: `if !commit_config.skip_auto_cleanup`). So the staged commit path (`commit_staged` → `CommitBuilder::with_skip_auto_cleanup(true)`), the `__manifest` publisher (`MergeInsertBuilder::skip_auto_cleanup(true)`), and the direct `WriteParams` paths all skip the hook. Without this, an upgraded graph would still auto-cleanup and delete `__manifest`-pinned versions. Pinned by `lance_surface_guards.rs::skip_auto_cleanup_suppresses_version_gc` (negative control + with-skip survival).
|
||||
- **Lance #6658 SHIPPED in 7.0.0** (`DeleteBuilder::execute_uncommitted`, exposed via PR #6781) → MR-A (migrate `delete_where` to the staged two-phase API, retire the parse-time D2 rule) is now **unblocked**, tracked separately (dev-graph `iss-950`). The bump itself keeps `delete_where` inline; the `_compile_delete_result_field_shape` guard is left untouched until MR-A.
|
||||
- **Still NOT fixed in 7.0.0:** vector-index two-phase (Lance #6666 open) — `create_vector_index` inline residual retained; blob-column compaction — `compact_files_still_fails_on_blob_columns` guard still red on a fix, `optimize` still skips blob tables behind `LANCE_SUPPORTS_BLOB_COMPACTION`.
|
||||
- **No Lance-API surface omnigraph uses changed 6.0.1 → 7.0.0** (verified by a clean engine build; the only compile break was object_store). `CleanupPolicy`, `WriteParams` (apart from the `auto_cleanup` default), `CompactionOptions`, the namespace models (resolved via `lance-namespace-reqwest-client` 0.7.7, unchanged across the bump), `Operation`, `ManifestLocation`, and `MergeInsertBuilder` shapes are all stable.
|
||||
|
||||
Bump this date stanza on the next alignment pass.
|
||||
|
||||
### Prior alignment audit: 2026-05-22 (Lance 6.0.1 upstream; omnigraph pinned at 6.0.1)
|
||||
|
||||
Migration from Lance 4.0.0 → 6.0.1 landed in this cycle (DataFusion 52 → 53, Arrow 57 → 58, lance-tokenizer 6.0.1 added, tantivy* removed). Direct 4 → 6 jump; v5.x was not used as an intermediate (rationale in `~/.claude/plans/shimmering-percolating-duckling.md`). Behavior-affecting findings:
|
||||
|
||||
|
|
@ -180,5 +195,3 @@ Migration from Lance 4.0.0 → 6.0.1 landed in this cycle (DataFusion 52 → 53,
|
|||
- **Lance blob-v2 `compact_files` bug** (no public issue found as of 2026-06): `compact_files` disables binary-copy for blob datasets and forces `BlobHandling::AllBinary` on the read side; the v2.1+ structural decoder then mis-counts column infos for the blob-v2 struct and fails with `Invalid user input: there were more fields in the schema than provided column indices / infos` (`lance-encoding/src/decoder.rs::ColumnInfoIter::expect_next`). This fails even a pristine uniform-V2_2 multi-fragment blob table; vector/list/scalar/ragged columns and mixed file versions all compact fine. Reads/queries use descriptor handling (`BlobHandling::default()`) and are unaffected. `optimize` skips blob-bearing tables behind `LANCE_SUPPORTS_BLOB_COMPACTION = false` (`db/omnigraph/optimize.rs`), reporting `SkipReason::BlobColumnsUnsupportedByLance`. Pinned by `lance_surface_guards.rs::compact_files_still_fails_on_blob_columns`, which turns red when the bug is fixed → flip the gate, remove the skip branch + the `maintenance.rs::optimize_skips_blob_table_and_reports_skip` skip assertions.
|
||||
|
||||
Surface guards added: `crates/omnigraph/tests/lance_surface_guards.rs` (10 named guards; 5 runtime + 5 compile-only; plus the index-coverage work's `_compile_optimize_indices_signature` and `optimize_indices_extends_fragment_coverage`). Future Lance bumps re-run this file first as the smoke check. Two additional guards from the original plan deferred to follow-up (`manifest_cas_returns_row_level_contention_variant` needs full publisher-race harness; `table_version_metadata_byte_compatible_with_v4` needs `pub(crate)` reach extension).
|
||||
|
||||
Bump this date stanza on the next alignment pass.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue