mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111)
* tests: add lance_surface_guards pre-flight pins for the v6 bump
Land 8 named guards in a new test file that pin Lance API surfaces
OmniGraph relies on. Each guard turns a silent-break risk (variant
rename, struct restructure, async-flip) into a red CI bar instead of
runtime drift.
Guards (mapped to the silent-break inventory from the v6 migration plan):
Runtime (#[tokio::test]):
1. lance_error_too_much_write_contention_variant_exists — pins the
variant referenced by db/manifest/publisher.rs::map_lance_publish_error.
2. manifest_location_field_shape — pins .path/.size/.e_tag/.naming_scheme
types and ManifestLocation accessor returning &Self (the access
pattern at db/manifest/metadata.rs:84-88).
6. write_params_default_does_not_set_storage_version — confirms our
explicit V2_2 pin remains load-bearing (blob v2 requirement).
Compile-only async fns (#[allow(...)] + unimplemented!() placeholders;
never run, but cargo build --tests enforces the API shape):
3. checkout_version + restore chain — pins the recovery rollback hammer
at db/manifest/recovery.rs:505-522.
4. DatasetBuilder::from_namespace().with_branch().with_version().load()
— pins the namespace builder chain at db/manifest/namespace.rs:162-174.
5. MergeInsertBuilder fluent chain — pins the manifest CAS at
db/manifest/publisher.rs:370-391, including the return shape
(Arc<Dataset>, MergeStats).
7. compact_files(&mut ds, CompactionOptions, None) — pins
db/omnigraph/optimize.rs:107.
8. DeleteResult { new_dataset, num_deleted_rows } — pins the inline
delete result shape (MR-A will repurpose this guard to the staged
two-phase variant once Lance #6658 migration lands).
This is commit 1 of the chore/lance-6.0.1 migration. Cargo bump
follows in commit 2 (will trigger the guards under v6 if any surface
drifted).
Per the migration plan at ~/.claude/plans/shimmering-percolating-duckling.md
(written this session). Two guards from the plan deferred to follow-up:
- manifest_cas_returns_row_level_contention_variant (full publisher
race integration test — needs harness scaffolding)
- table_version_metadata_byte_compatible_with_v4 (TableVersionMetadata
is pub(crate); requires test reach extension).
Verified on v4: cargo test -p omnigraph-engine --test lance_surface_guards
passes 3/3 runtime tests; cargo build -p omnigraph-engine --tests
compiles all 5 compile-only guards clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(deps): bump Lance 4.0.0 → 6.0.1, DataFusion 52 → 53, Arrow 57 → 58
The Cargo bump itself. Source is intentionally untouched — this commit
will not compile. The compile errors are the work-list for subsequent
commits on this branch.
Lance updates: lance + 7 sub-crates 4.0.0 → 6.0.1. Transitive churn:
+ lance-tokenizer v6.0.1 (vendored tokenizer per Lance PR #6512)
+ object_store 0.13.x (Lance 6 brings it transitively; our explicit
pin stays at 0.12.5 for now — revisit in stages if diamond bites)
- tantivy* crates (replaced by lance-tokenizer)
Compile error landscape on this commit (11 errors):
• 1× E0432: `lance_index::DatasetIndexExt` import (Lance PR #6280
moved it to lance::index). Sites: table_store.rs:20,
db/manifest.rs:37 (the second site was missed by the pre-flight
inventory).
• 8× E0599: `create_index_builder` / `load_indices` missing on
`lance::Dataset` — all downstream of the DatasetIndexExt move.
Once the import is corrected on table_store.rs and db/manifest.rs,
these resolve automatically.
• 2× E0063: missing field `is_only_declared` in `DescribeTableResponse`
initializer at db/manifest/namespace.rs:221, 364. New Lance
namespace field per the v5 namespace restructure (PR #6186).
Surface guards (lance_surface_guards.rs, commit d571fa8) all still
compile + the 3 runtime ones pass on v6 — none of the silent-break
surfaces drifted. That's the load-bearing observation: the publisher
CAS chain, ManifestLocation field shape, checkout_version/restore,
DatasetBuilder fluent chain, MergeInsertBuilder return shape,
WriteParams::default, compact_files signature, and DeleteResult
fields are all v6-stable.
Next commits address the 11 errors per the migration plan stages
3-8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* imports: move DatasetIndexExt to lance::index (Lance PR #6280)
Lance 5.0 (PR #6280) moved `DatasetIndexExt` out of `lance-index` into
`lance::index`. `is_system_index` and `IndexType` stayed in `lance-index`.
Mechanical update of 6 import sites:
crates/omnigraph/src/table_store.rs:20 — split into two `use` lines
crates/omnigraph-server/tests/server.rs:10 — was traits::DatasetIndexExt
crates/omnigraph/tests/search.rs:6
crates/omnigraph/tests/branching.rs:7
crates/omnigraph/tests/failpoints.rs:467
crates/omnigraph-cli/tests/cli.rs:3 — was traits::DatasetIndexExt
All 9 E0599 cascading errors on .create_index_builder / .load_indices
resolve once the trait is back in scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* namespace: add is_only_declared field to DescribeTableResponse
Lance namespace 6.0.0 added `is_only_declared: Option<bool>` to
`DescribeTableResponse` (lance-namespace-reqwest-client 0.7+ via the
v5.0 namespace API restructure, Lance PR #6186). Set to `Some(false)`
because every table BranchManifestNamespace returns from describe_table
is materialized — the manifest snapshot only includes entries for
tables we've already opened via Dataset::open.
Two sites in db/manifest/namespace.rs (BranchManifestNamespace +
StagedTableNamespace impls of LanceNamespace::describe_table).
Closes the last two compile errors from the v6 bump in the engine lib.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* cargo: add lance to omnigraph-cli + omnigraph-server dev-deps
Stage 3 moved DatasetIndexExt imports from `lance-index` to `lance::index`
in the cli and server test crates. Both crates only had `lance-index`
in their dev-dependencies; add `lance` alongside so the new path
resolves.
This is the last compile-error fix from the v6 bump — `cargo build
--workspace --tests` is now green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: refresh Lance alignment audit for v6.0.1; bump surveyed version
Per CLAUDE.md maintenance rule 2 (same-PR docs):
- docs/dev/lance.md: replace the v4.0.1 alignment audit stanza with
the v6.0.1 audit. Captures every v5/v6 finding from this PR (the
DatasetIndexExt move, DescribeTableResponse.is_only_declared,
MergeInsertBuilder return shape, ManifestLocation field shape,
LanceFileVersion::default flip, file-reader async, tokenizer
vendor, Lance #6658/#6666/#6877 status). Cross-references each
guard in tests/lance_surface_guards.rs.
- AGENTS.md: bump "Storage substrate: Lance 4.x" → "Lance 6.x".
Note: surveyed crate version stays at 0.4.2 — substrate version
bumps are independent of OmniGraph's release version.
- crates/omnigraph/src/storage_layer.rs: update the trait module-level
doc-comment to reflect that Lance #6658 closed 2026-05-14 and
delete_where two-phase migration is MR-A (the next follow-up).
#6666 stays open; create_vector_index inline residual stays.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* tests: silence clippy::diverging_sub_expression on compile-only guards
The five `_compile_*` async fns in lance_surface_guards.rs use
`let ds: Dataset = unimplemented!()` as a placeholder so type inference
can chase the method chain we want to pin, without ever running the
function. Clippy's `diverging_sub_expression` lint flags this pattern
because the RHS diverges; that's the entire point. Added to the
per-fn `#[allow(...)]` list, alongside dead_code / unreachable_code /
unused_variables / unused_mut already there.
No behavior change. cargo test -p omnigraph-engine --test
lance_surface_guards still 3/3 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: correct #6658 status — closed but API ships in Lance v7.x, not v6.0.1
The audit stanza in docs/dev/lance.md and the storage_layer.rs trait
doc-comment both implied the public DeleteBuilder::execute_uncommitted
API shipped with Lance 6.0.1. It did not. Issue #6658 closed
2026-05-14, but binary search across the release stream confirms:
v6.0.1 ❌ no pub async fn execute_uncommitted on DeleteBuilder
v6.1.0-rc.1 ❌
v7.0.0-beta.5 ❌
v7.0.0-beta.10 ✅ first appearance
v7.0.0-rc.1 ✅
So MR-A (delete two-phase migration) is gated on the Lance v7.x bump,
not on this PR. v7.0.0-rc.1 dropped 2026-05-21; GA likely within a
week.
No behavior change. Doc-only correction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci(lib): bump recursion_limit to 256 — Lance 6 trait depth on Linux
Lance 6's heavier trait surface around futures/streams in storage_layer.rs's
staged-write API pushes the rustc trait-resolution recursion limit past
the default 128 on Linux builds. CI on PR #111 surfaced this in both
`Test Workspace` and `Test omnigraph-server --features aws`:
error: queries overflow the depth limit!
= help: consider increasing the recursion limit by adding a
`#![recursion_limit = "256"]` attribute to your crate (`omnigraph`)
= note: query depth increased by 130 when computing layout of
`{async block@crates/omnigraph/src/storage_layer.rs:697:5: 697:10}`
(The async block is `stage_create_btree_index`'s body — its return type
is several layers of `impl Future<Output=Result<StagedHandle>>` deep on
top of Lance's own builder return types.)
Local macOS builds happened to short-circuit before tripping the limit,
which is why this didn't surface during the v6 bump sequence. The fix
rustc itself suggests is one line at the crate root.
No behavior change. Revisit if a future Lance bump stops needing it.
Verified: `cargo build --locked -p omnigraph-server --features aws`
compiles clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
e1b40aee0b
commit
3551e0d40e
16 changed files with 600 additions and 580 deletions
|
|
@ -18,7 +18,7 @@ Tools that support `@`-imports (Claude Code) auto-include all three files via th
|
|||
|
||||
**Version surveyed:** 0.4.2
|
||||
**Workspace crates:** `omnigraph-compiler`, `omnigraph` (engine), `omnigraph-cli`, `omnigraph-server`
|
||||
**Storage substrate:** Lance 4.x (columnar, versioned, branchable)
|
||||
**Storage substrate:** Lance 6.x (columnar, versioned, branchable)
|
||||
**License:** MIT
|
||||
**Toolchain:** Rust stable, edition 2024
|
||||
|
||||
|
|
@ -53,7 +53,7 @@ CLI (omnigraph) HTTP Server (omnigraph-server, Axum)
|
|||
omnigraph (engine) ── ManifestRepo, CommitGraph, RunRegistry, GraphIndex (CSR/CSC), exec
|
||||
│
|
||||
▼
|
||||
Lance 4.x ── columnar Arrow, fragments, per-dataset versions/branches, indexes
|
||||
Lance 6.x ── columnar Arrow, fragments, per-dataset versions/branches, indexes
|
||||
│
|
||||
▼
|
||||
Object store (file / s3 / RustFS / MinIO / S3-compat)
|
||||
|
|
|
|||
819
Cargo.lock
generated
819
Cargo.lock
generated
File diff suppressed because it is too large
Load diff
42
Cargo.toml
42
Cargo.toml
|
|
@ -14,29 +14,29 @@ default-members = [
|
|||
]
|
||||
|
||||
[workspace.dependencies]
|
||||
arrow-array = "57"
|
||||
arrow-ipc = "57"
|
||||
arrow-schema = "57"
|
||||
arrow-select = "57"
|
||||
arrow-cast = { version = "57", features = ["prettyprint"] }
|
||||
arrow-ord = "57"
|
||||
arrow-array = "58"
|
||||
arrow-ipc = "58"
|
||||
arrow-schema = "58"
|
||||
arrow-select = "58"
|
||||
arrow-cast = { version = "58", features = ["prettyprint"] }
|
||||
arrow-ord = "58"
|
||||
|
||||
datafusion = { version = "52", default-features = false }
|
||||
datafusion-physical-plan = "52"
|
||||
datafusion-physical-expr = "52"
|
||||
datafusion-execution = "52"
|
||||
datafusion-common = "52"
|
||||
datafusion-expr = "52"
|
||||
datafusion-functions-aggregate = "52"
|
||||
datafusion = { version = "53", default-features = false }
|
||||
datafusion-physical-plan = "53"
|
||||
datafusion-physical-expr = "53"
|
||||
datafusion-execution = "53"
|
||||
datafusion-common = "53"
|
||||
datafusion-expr = "53"
|
||||
datafusion-functions-aggregate = "53"
|
||||
|
||||
lance = { version = "4.0.0", default-features = false, features = ["aws"] }
|
||||
lance-datafusion = "4.0.0"
|
||||
lance-file = "4.0.0"
|
||||
lance-index = "4.0.0"
|
||||
lance-linalg = "4.0.0"
|
||||
lance-namespace = "4.0.0"
|
||||
lance-namespace-impls = "4.0.0"
|
||||
lance-table = "4.0.0"
|
||||
lance = { version = "6.0.1", default-features = false, features = ["aws"] }
|
||||
lance-datafusion = "6.0.1"
|
||||
lance-file = "6.0.1"
|
||||
lance-index = "6.0.1"
|
||||
lance-linalg = "6.0.1"
|
||||
lance-namespace = "6.0.1"
|
||||
lance-namespace-impls = "6.0.1"
|
||||
lance-table = "6.0.1"
|
||||
|
||||
ulid = "1"
|
||||
futures = "0.3"
|
||||
|
|
|
|||
|
|
@ -30,4 +30,5 @@ assert_cmd = "2"
|
|||
predicates = "3"
|
||||
serde_json = { workspace = true }
|
||||
tempfile = { workspace = true }
|
||||
lance = { workspace = true }
|
||||
lance-index = { workspace = true }
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
use std::fs;
|
||||
|
||||
use lance_index::traits::DatasetIndexExt;
|
||||
use lance::index::DatasetIndexExt;
|
||||
use omnigraph::db::{Omnigraph, ReadTarget};
|
||||
use serde_json::Value;
|
||||
use tempfile::tempdir;
|
||||
|
|
|
|||
|
|
@ -45,4 +45,5 @@ aws-sdk-secretsmanager = { version = "1", optional = true, default-features = fa
|
|||
tempfile = { workspace = true }
|
||||
tower = { workspace = true }
|
||||
serial_test = "3"
|
||||
lance = { workspace = true }
|
||||
lance-index = { workspace = true }
|
||||
|
|
|
|||
|
|
@ -7,7 +7,7 @@ use axum::Router;
|
|||
use axum::body::{Body, to_bytes};
|
||||
use axum::http::header::AUTHORIZATION;
|
||||
use axum::http::{Method, Request, StatusCode};
|
||||
use lance_index::traits::DatasetIndexExt;
|
||||
use lance::index::DatasetIndexExt;
|
||||
use omnigraph::db::{Omnigraph, ReadTarget, SchemaApplyOptions};
|
||||
use omnigraph::error::OmniError;
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
|
|
|
|||
|
|
@ -230,6 +230,11 @@ impl LanceNamespace for BranchManifestNamespace {
|
|||
metadata: None,
|
||||
properties: None,
|
||||
managed_versioning: Some(true),
|
||||
// Every table we return from describe_table is physically
|
||||
// materialized (open_manifest_dataset succeeds), never just
|
||||
// "declared." See lance-namespace 6.0.1 DescribeTableResponse
|
||||
// field docs.
|
||||
is_only_declared: Some(false),
|
||||
})
|
||||
}
|
||||
|
||||
|
|
@ -373,6 +378,11 @@ impl LanceNamespace for StagedTableNamespace {
|
|||
metadata: None,
|
||||
properties: None,
|
||||
managed_versioning: Some(true),
|
||||
// Every table we return from describe_table is physically
|
||||
// materialized (open_manifest_dataset succeeds), never just
|
||||
// "declared." See lance-namespace 6.0.1 DescribeTableResponse
|
||||
// field docs.
|
||||
is_only_declared: Some(false),
|
||||
})
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -1,3 +1,12 @@
|
|||
// Lance 6's trait surface (heavier futures/streams nesting around the
|
||||
// staged-write API in `storage_layer.rs`) pushes us past the default
|
||||
// trait-resolution recursion limit of 128 on Linux builds. Raising to
|
||||
// 256 here is the upstream-suggested fix from rustc itself
|
||||
// ("consider increasing the recursion limit"). macOS happens to short-
|
||||
// circuit before tripping the limit; CI on Linux does not. Revisit if
|
||||
// future Lance bumps stop needing this.
|
||||
#![recursion_limit = "256"]
|
||||
|
||||
pub mod changes;
|
||||
pub mod db;
|
||||
pub mod embedding;
|
||||
|
|
|
|||
|
|
@ -10,11 +10,15 @@
|
|||
//! ## Transitional residuals on the trait
|
||||
//!
|
||||
//! Several inline-commit methods remain on the trait surface as
|
||||
//! documented residuals: `delete_where` (Lance 4.0.0's `DeleteJob` is
|
||||
//! `pub(crate)` — see [#6658](https://github.com/lance-format/lance/issues/6658)),
|
||||
//! documented residuals: `delete_where`
|
||||
//! ([#6658](https://github.com/lance-format/lance/issues/6658) closed
|
||||
//! 2026-05-14, but the public `DeleteBuilder::execute_uncommitted` API
|
||||
//! did not backport to the 6.x release line — it first ships in
|
||||
//! `v7.0.0-beta.10`. Migration to staged two-phase delete is tracked as
|
||||
//! MR-A and is gated on the Lance v7.x bump, not the current v6.0.1 pin),
|
||||
//! `create_vector_index` (segment-commit-path requires
|
||||
//! `build_index_metadata_from_segments` which is `pub(crate)` — see
|
||||
//! [#6666](https://github.com/lance-format/lance/issues/6666)), and the
|
||||
//! [#6666](https://github.com/lance-format/lance/issues/6666), still open), and the
|
||||
//! legacy `append_batch` / `merge_insert_batches` / `overwrite_batch` /
|
||||
//! `create_btree_index` / `create_inverted_index` paths kept while
|
||||
//! engine call sites finish migrating off of them (Phase 1b / Phase 9
|
||||
|
|
|
|||
|
|
@ -14,10 +14,11 @@ use lance::dataset::{
|
|||
WriteParams,
|
||||
};
|
||||
use lance::datatypes::BlobKind;
|
||||
use lance::index::DatasetIndexExt;
|
||||
use lance::index::scalar::IndexDetails;
|
||||
use lance_file::version::LanceFileVersion;
|
||||
use lance_index::scalar::{InvertedIndexParams, ScalarIndexParams};
|
||||
use lance_index::{DatasetIndexExt, IndexType, is_system_index};
|
||||
use lance_index::{IndexType, is_system_index};
|
||||
use lance_linalg::distance::MetricType;
|
||||
use lance_table::format::{Fragment, IndexMetadata, RowIdMeta};
|
||||
use lance_table::rowids::{RowIdSequence, write_row_ids};
|
||||
|
|
|
|||
|
|
@ -4,7 +4,8 @@ use std::fs;
|
|||
|
||||
use arrow_array::{Array, Int32Array, UInt64Array};
|
||||
use futures::TryStreamExt;
|
||||
use lance_index::{DatasetIndexExt, is_system_index};
|
||||
use lance::index::DatasetIndexExt;
|
||||
use lance_index::is_system_index;
|
||||
|
||||
use omnigraph::db::commit_graph::CommitGraph;
|
||||
use omnigraph::db::{MergeOutcome, Omnigraph, ReadTarget};
|
||||
|
|
|
|||
|
|
@ -464,7 +464,7 @@ async fn recovery_rolls_forward_load_on_feature_branch() {
|
|||
|
||||
#[tokio::test]
|
||||
async fn recovery_rolls_forward_ensure_indices_on_feature_branch() {
|
||||
use lance_index::DatasetIndexExt;
|
||||
use lance::index::DatasetIndexExt;
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
use omnigraph::table_store::TableStore;
|
||||
|
||||
|
|
|
|||
244
crates/omnigraph/tests/lance_surface_guards.rs
Normal file
244
crates/omnigraph/tests/lance_surface_guards.rs
Normal file
|
|
@ -0,0 +1,244 @@
|
|||
//! Lance API surface guards.
|
||||
//!
|
||||
//! Each guard pins a Lance API surface that OmniGraph relies on. If a future
|
||||
//! Lance bump silently renames a variant, restructures a public struct, or
|
||||
//! flips a method to async, the corresponding guard either fails to compile
|
||||
//! (compile-time guards) or fails at runtime (runtime guards). The purpose
|
||||
//! is to turn silent-break risks into red CI bars on the *next* Lance bump,
|
||||
//! rather than into wrong-state recovery in production.
|
||||
//!
|
||||
//! Pair this file with `docs/dev/lance.md`'s alignment audit stanza: any
|
||||
//! Lance bump runs `cargo test -p omnigraph-engine --test lance_surface_guards`
|
||||
//! first as the smoke check.
|
||||
//!
|
||||
//! ## Compile-only guards
|
||||
//!
|
||||
//! Functions prefixed with `_compile_` are gated with a broad `#[allow(...)]`
|
||||
//! and never called. They exist to make `cargo build -p omnigraph-engine --tests`
|
||||
//! enforce the API shape. Using `unimplemented!()` as a placeholder lets type
|
||||
//! inference proceed without running anything.
|
||||
//!
|
||||
//! ## Runtime guards
|
||||
//!
|
||||
//! Functions decorated `#[tokio::test]` actually run; they construct real
|
||||
//! values and assert field shapes / types.
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use arrow_array::{Int32Array, RecordBatch, RecordBatchIterator, StringArray};
|
||||
use arrow_schema::{DataType, Field, Schema};
|
||||
use lance::Dataset;
|
||||
use lance::dataset::builder::DatasetBuilder;
|
||||
use lance::dataset::optimize::{CompactionOptions, compact_files};
|
||||
use lance::dataset::write::delete::DeleteResult;
|
||||
use lance::dataset::{MergeInsertBuilder, WhenMatched, WhenNotMatched, WriteMode, WriteParams};
|
||||
use lance_file::version::LanceFileVersion;
|
||||
use lance_namespace::LanceNamespace;
|
||||
use lance_table::io::commit::ManifestNamingScheme;
|
||||
|
||||
/// Helper: build a small fresh dataset in a tempdir. Pinned at V2_2 to match
|
||||
/// production write paths (blob v2 requires V2_2; see `docs/dev/lance.md`).
|
||||
async fn fresh_dataset(uri: &str) -> Dataset {
|
||||
let schema = Arc::new(Schema::new(vec![
|
||||
Field::new("id", DataType::Utf8, false),
|
||||
Field::new("value", DataType::Int32, false),
|
||||
]));
|
||||
let batch = RecordBatch::try_new(
|
||||
schema.clone(),
|
||||
vec![
|
||||
Arc::new(StringArray::from(vec!["alice", "bob"])),
|
||||
Arc::new(Int32Array::from(vec![1, 2])),
|
||||
],
|
||||
)
|
||||
.unwrap();
|
||||
let reader = RecordBatchIterator::new(vec![Ok(batch)], schema);
|
||||
let params = WriteParams {
|
||||
mode: WriteMode::Create,
|
||||
enable_stable_row_ids: true,
|
||||
data_storage_version: Some(LanceFileVersion::V2_2),
|
||||
..Default::default()
|
||||
};
|
||||
Dataset::write(reader, uri, Some(params)).await.unwrap()
|
||||
}
|
||||
|
||||
// --- Guard 1: LanceError::TooMuchWriteContention variant exists ------------
|
||||
//
|
||||
// `db/manifest/publisher.rs::map_lance_publish_error` pattern-matches on this
|
||||
// variant to surface typed `OmniError::ManifestRowLevelCasContention`. If
|
||||
// Lance renames the variant or removes the builder, this guard fails.
|
||||
|
||||
#[tokio::test]
|
||||
async fn lance_error_too_much_write_contention_variant_exists() {
|
||||
let err = lance::Error::too_much_write_contention("guard");
|
||||
assert!(
|
||||
matches!(err, lance::Error::TooMuchWriteContention { .. }),
|
||||
"Lance::Error::TooMuchWriteContention variant missing or renamed; \
|
||||
update db/manifest/publisher.rs::map_lance_publish_error and \
|
||||
this guard, then re-pin docs/dev/lance.md."
|
||||
);
|
||||
}
|
||||
|
||||
// --- Guard 2: ManifestLocation field shape ---------------------------------
|
||||
//
|
||||
// `db/manifest/metadata.rs:84-88` reads `.path`, `.size`, `.e_tag`,
|
||||
// `.naming_scheme` off `dataset.manifest_location()`. If any field renames
|
||||
// or changes type, this guard fails to compile.
|
||||
|
||||
#[tokio::test]
|
||||
async fn manifest_location_field_shape() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().join("guard.lance");
|
||||
let ds = fresh_dataset(uri.to_str().unwrap()).await;
|
||||
|
||||
let loc = ds.manifest_location();
|
||||
// Explicit type bindings — these are the load-bearing assertions. If a
|
||||
// type drifts (e.g. .size: Option<u64> → .size: u64), this fails to
|
||||
// compile.
|
||||
let _path: &object_store::path::Path = &loc.path;
|
||||
let _size: Option<u64> = loc.size;
|
||||
let _e_tag: Option<String> = loc.e_tag.clone();
|
||||
let _scheme: ManifestNamingScheme = loc.naming_scheme;
|
||||
// Runtime sanity — naming_scheme should produce a Debug string we use
|
||||
// verbatim in `TableVersionMetadata::naming_scheme`.
|
||||
assert!(!format!("{:?}", loc.naming_scheme).is_empty());
|
||||
}
|
||||
|
||||
// --- Guard 3: checkout_version + restore async chain -----------------------
|
||||
//
|
||||
// `db/manifest/recovery.rs:505-522` chains `Dataset::open(...).await?
|
||||
// .checkout_version(N).await?.restore().await?` as the recovery rollback
|
||||
// hammer. Compile-only — never runs.
|
||||
|
||||
#[allow(
|
||||
dead_code,
|
||||
unreachable_code,
|
||||
unused_variables,
|
||||
unused_mut,
|
||||
clippy::diverging_sub_expression
|
||||
)]
|
||||
async fn _compile_checkout_version_then_restore_signature() -> lance::Result<()> {
|
||||
let ds: Dataset = unimplemented!();
|
||||
let mut ds: Dataset = ds.checkout_version(1u64).await?;
|
||||
// `restore()` takes `&mut self` and returns `Result<()>`; the dataset
|
||||
// mutates in place. If Lance flips this to return a fresh `Dataset`
|
||||
// (consuming `self`), this guard fails to compile.
|
||||
let _: () = ds.restore().await?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// --- Guard 4: DatasetBuilder::from_namespace fluent chain ------------------
|
||||
//
|
||||
// `db/manifest/namespace.rs:162-174` chains
|
||||
// `DatasetBuilder::from_namespace(ns, vec![id]).await?.with_branch(...).with_version(...).load().await?`.
|
||||
// Compile-only.
|
||||
|
||||
#[allow(
|
||||
dead_code,
|
||||
unreachable_code,
|
||||
unused_variables,
|
||||
unused_mut,
|
||||
clippy::diverging_sub_expression
|
||||
)]
|
||||
async fn _compile_dataset_builder_from_namespace_signature(
|
||||
ns: Arc<dyn LanceNamespace>,
|
||||
) -> lance::Result<()> {
|
||||
let builder: DatasetBuilder =
|
||||
DatasetBuilder::from_namespace(ns, vec!["table".to_string()]).await?;
|
||||
let builder: DatasetBuilder = builder.with_branch("b", None);
|
||||
let builder: DatasetBuilder = builder.with_version(1u64);
|
||||
let _ds: Dataset = builder.load().await?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// --- Guard 5: MergeInsertBuilder fluent chain ------------------------------
|
||||
//
|
||||
// `db/manifest/publisher.rs:370-391` is the manifest CAS. If any method on
|
||||
// the builder renames or changes signature, the publisher silently breaks.
|
||||
// Compile-only.
|
||||
|
||||
#[allow(
|
||||
dead_code,
|
||||
unreachable_code,
|
||||
unused_variables,
|
||||
unused_mut,
|
||||
clippy::diverging_sub_expression
|
||||
)]
|
||||
async fn _compile_merge_insert_builder_method_chain() -> lance::Result<()> {
|
||||
use lance::dataset::MergeStats;
|
||||
|
||||
let ds: Arc<Dataset> = unimplemented!();
|
||||
let job = MergeInsertBuilder::try_new(ds, vec!["object_id".to_string()])?
|
||||
.when_matched(WhenMatched::UpdateAll)
|
||||
.when_not_matched(WhenNotMatched::InsertAll)
|
||||
.conflict_retries(0)
|
||||
.use_index(false)
|
||||
.try_build()?;
|
||||
|
||||
// execute_reader takes `impl StreamingWriteSource` (lance trait), which
|
||||
// RecordBatchIterator implements. Pin the return shape
|
||||
// `(Arc<Dataset>, MergeStats)` — the publisher's CAS loop depends on
|
||||
// both: the new Dataset to advance HEAD, the stats for the audit row.
|
||||
let source: RecordBatchIterator<Vec<Result<RecordBatch, arrow_schema::ArrowError>>> =
|
||||
unimplemented!();
|
||||
let result: (Arc<Dataset>, MergeStats) = job.execute_reader(source).await?;
|
||||
let _ds: Arc<Dataset> = result.0;
|
||||
let _stats: MergeStats = result.1;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// --- Guard 6: WriteParams::default() leaves data_storage_version = None ----
|
||||
//
|
||||
// Our V2_2 pin is load-bearing for blob v2 (verified earlier this session
|
||||
// when V2_1 produced "Blob v2 requires file version >= 2.2" on 13 blob
|
||||
// tests). If Lance changes the default to pin some version itself, audit
|
||||
// every `data_storage_version: Some(LanceFileVersion::V2_2)` site.
|
||||
|
||||
#[test]
|
||||
fn write_params_default_does_not_set_storage_version() {
|
||||
let params = WriteParams::default();
|
||||
assert_eq!(
|
||||
params.data_storage_version, None,
|
||||
"WriteParams::default().data_storage_version is no longer None; \
|
||||
audit every explicit V2_2 pin (see rg 'LanceFileVersion::V2_2')."
|
||||
);
|
||||
}
|
||||
|
||||
// --- Guard 7: compact_files signature --------------------------------------
|
||||
//
|
||||
// `db/omnigraph/optimize.rs:107` calls `compact_files(&mut ds, options, None)`.
|
||||
// Compile-only.
|
||||
|
||||
#[allow(
|
||||
dead_code,
|
||||
unreachable_code,
|
||||
unused_variables,
|
||||
unused_mut,
|
||||
clippy::diverging_sub_expression
|
||||
)]
|
||||
async fn _compile_compact_files_signature() -> lance::Result<()> {
|
||||
let mut ds: Dataset = unimplemented!();
|
||||
let options: CompactionOptions = CompactionOptions::default();
|
||||
let _metrics = compact_files(&mut ds, options, None).await?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// --- Guard 8: Dataset::delete returns DeleteResult { new_dataset, num_deleted_rows } ---
|
||||
//
|
||||
// `table_store.rs::delete_where` consumes both fields. When MR-A migrates
|
||||
// `delete_where` to two-phase via `DeleteBuilder::execute_uncommitted`, this
|
||||
// guard updates to pin the staged path. Compile-only.
|
||||
|
||||
#[allow(
|
||||
dead_code,
|
||||
unreachable_code,
|
||||
unused_variables,
|
||||
unused_mut,
|
||||
clippy::diverging_sub_expression
|
||||
)]
|
||||
async fn _compile_delete_result_field_shape() -> lance::Result<()> {
|
||||
let mut ds: Dataset = unimplemented!();
|
||||
let result: DeleteResult = ds.delete("x = 1").await?;
|
||||
let _new_dataset: Arc<Dataset> = result.new_dataset;
|
||||
let _num_deleted: u64 = result.num_deleted_rows;
|
||||
Ok(())
|
||||
}
|
||||
|
|
@ -3,7 +3,8 @@ mod helpers;
|
|||
use std::env;
|
||||
|
||||
use arrow_array::{Array, StringArray};
|
||||
use lance_index::{DatasetIndexExt, is_system_index};
|
||||
use lance::index::DatasetIndexExt;
|
||||
use lance_index::is_system_index;
|
||||
use serial_test::serial;
|
||||
|
||||
use omnigraph::db::Omnigraph;
|
||||
|
|
|
|||
|
|
@ -156,13 +156,26 @@ If a future need pulls one of these into scope, add a row to the matching domain
|
|||
|
||||
When Lance ships a major release that changes any of the above (file format bump, new index type, transaction semantics change, new branching primitive), refresh this index in the same change as the omnigraph upgrade. Stale Lance pointers are worse than no pointers.
|
||||
|
||||
### Last alignment audit: 2026-05-02 (Lance 4.0.1 upstream; omnigraph pinned at 4.0.0)
|
||||
### Last alignment audit: 2026-05-22 (Lance 6.0.1 upstream; omnigraph pinned at 6.0.1)
|
||||
|
||||
A full read-through of every index page above was performed in the MR-793 cycle. Findings (no code changes required for PR #70):
|
||||
Migration from Lance 4.0.0 → 6.0.1 landed in this cycle (DataFusion 52 → 53, Arrow 57 → 58, lance-tokenizer 6.0.1 added, tantivy* removed). Direct 4 → 6 jump; v5.x was not used as an intermediate (rationale in `~/.claude/plans/shimmering-percolating-duckling.md`). Behavior-affecting findings:
|
||||
|
||||
- The MemWAL "three sub-pages" (Overview / Details / Implementation) turned out to be **anchor sections on the single existing page** at `https://lance.org/format/table/mem_wal/` — not separate URLs. Findings: MemWAL is opt-in (requires an unenforced primary key + explicit shard config; omnigraph doesn't use it), operates intra-table (LSM-tree for streaming writes into one Lance table), and does NOT overlap with MR-847's cross-table manifest-vs-Lance-HEAD recovery problem. MR-847's design is unaffected.
|
||||
- The distributed-indexing guide names Python APIs (`commit_existing_index_segments`, `merge_existing_index_segments`); the Rust analogues exist via `CreateIndexBuilder::execute_uncommitted` for scalar indices but **`build_index_metadata_from_segments` is `pub(crate)`** and blocks vector-index two-phase commits from outside the lance crate. Filed [lance-format/lance#6666](https://github.com/lance-format/lance/issues/6666) as a companion to [#6658](https://github.com/lance-format/lance/issues/6658).
|
||||
- "Stable Row ID for Index" is documented as **experimental** in lance-4.0.x. Our datasets enable stable row IDs at the dataset level (`WriteParams::enable_stable_row_ids = true`); confirming whether our created indices opt into stable-row-id mode is a follow-up worth doing before MR-848 (index reconciler) lands.
|
||||
- Fragment Reuse Index (FRI) is documented as one of three compaction strategies. omnigraph currently uses option 2 (immediate index rewrite at compaction time, via `omnigraph optimize`'s post-compaction rebuild). Adopting FRI is the explicit option for compaction-friendly index updates; relevant to MR-848.
|
||||
- **DatasetIndexExt moved** from `lance-index` to `lance::index` (Lance PR #6280, v5.0). Six import sites updated. `lance-index::IndexType` and `lance-index::is_system_index` stayed in `lance-index`. `omnigraph-cli` and `omnigraph-server` gained `lance = { workspace = true }` in their dev-dependencies.
|
||||
- **`DescribeTableResponse` gained `is_only_declared: Option<bool>`** (lance-namespace 6.0+, v5.0 PR #6186). Set to `Some(false)` in both `BranchManifestNamespace::describe_table` and `StagedTableNamespace::describe_table` — every table we return is physically materialized via `Dataset::open`, never "declared-only."
|
||||
- **`MergeInsertBuilder` execute_reader return shape preserved** `(Arc<Dataset>, MergeStats)`; the publisher CAS chain at `db/manifest/publisher.rs:370-391` works unchanged. Pinned by `tests/lance_surface_guards.rs::_compile_merge_insert_builder_method_chain`.
|
||||
- **`LanceError::TooMuchWriteContention` variant retained** in v6.0.1 (no rename). The typed publisher translation at `db/manifest/publisher.rs:417-430` continues to apply. Pinned by `lance_surface_guards.rs::lance_error_too_much_write_contention_variant_exists`.
|
||||
- **`ManifestLocation` field shape stable**: `.path: object_store::path::Path`, `.size: Option<u64>`, `.e_tag: Option<String>`, `.naming_scheme: ManifestNamingScheme`. Pinned by `lance_surface_guards.rs::manifest_location_field_shape`.
|
||||
- **`LanceFileVersion::default()` flipped V2_0 → V2_1** (v5.0). No effect — every `data_storage_version` callsite explicitly pins `Some(LanceFileVersion::V2_2)` (load-bearing for blob v2: `Blob v2 requires file version >= 2.2` enforced in `lance/src/dataset/write.rs:748`).
|
||||
- **`Dataset::checkout_version(N).await?.restore().await?`**: `restore()` takes `&mut self` and returns `Result<()>` (mutates in place, does not consume + return a new dataset). The recovery rollback hammer at `db/manifest/recovery.rs:505-522` continues to work. Pinned by `lance_surface_guards.rs::_compile_checkout_version_then_restore_signature`.
|
||||
- **`DatasetBuilder::from_namespace(...).with_branch(...).with_version(...).load()`** surface preserved (the namespace builder chain at `db/manifest/namespace.rs:162-174`). Pinned by `lance_surface_guards.rs::_compile_dataset_builder_from_namespace_signature`.
|
||||
- **`compact_files(&mut ds, CompactionOptions::default(), None)`** signature stable. `CompactionOptions` still does not expose `data_storage_version`; `compact_files` builds its own `WriteParams { ..Default::default() }`. Note: `LanceFileVersion::default()` is now V2_1 in v6, so optimize-rewritten fragments come out at V2_1 by default (was V2_0 in v4). Existing explicit V2_2 pins on creates/appends still apply.
|
||||
- **`Dataset::delete(predicate)` returns `DeleteResult { new_dataset: Arc<Dataset>, num_deleted_rows: u64 }`** — unchanged shape. Pinned by `lance_surface_guards.rs::_compile_delete_result_field_shape`. MR-A will repurpose this guard to the staged two-phase variant once `DeleteBuilder::execute_uncommitted` migration lands.
|
||||
- **File reader read methods now async** (Lance PR #6710, v6.0). No effect — omnigraph reaches Lance exclusively through `Dataset::scan` and the staged-write API.
|
||||
- **Tokenizer vendored as `lance-tokenizer`** (Lance PR #6512, v6.0). No effect — no direct tokenizer imports.
|
||||
- **Lance #6658 closed** (2026-05-14) but `DeleteBuilder::execute_uncommitted` did **not** ship in v6.0.1 — binary search across the release stream shows it first appears in `v7.0.0-beta.10` (the closing commits landed on main but didn't backport to the 6.x line). Tracked as MR-A: migrate `delete_where` to staged, retire the parse-time D2 mutation rule, extend recovery sidecar coverage. **Gated on the Lance v7.x bump**, not this PR. v7.0.0-rc.1 dropped 2026-05-21.
|
||||
- **Lance #6666 still open** (`build_index_metadata_from_segments` public): vector-index two-phase blocked; inline `create_vector_index` residual retained.
|
||||
- **Lance #6877 still open** (`MergeInsertBuilder` dup-rowid): PR #109's `SourceDedupeBehavior::FirstSeen` + `check_batch_unique_by_keys` precondition stay load-bearing.
|
||||
|
||||
Surface guards added: `crates/omnigraph/tests/lance_surface_guards.rs` (8 named guards; 3 runtime + 5 compile-only). Future Lance bumps re-run this file first as the smoke check. Two additional guards from the original plan deferred to follow-up (`manifest_cas_returns_row_level_contention_variant` needs full publisher-race harness; `table_version_metadata_byte_compatible_with_v4` needs `pub(crate)` reach extension).
|
||||
|
||||
Bump this date stanza on the next alignment pass.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue