test(engine): pin Lance 7 immutable-PK behavior + sharpen native-namespace alignment notes (#240)

* test(engine): pin Lance 7 immutable-PK behavior + sharpen native-namespace alignment notes

Follow-up polish to the Lance 7.0.0 alignment (the immutable-PK migration fix
and the realigned native-namespace surface test). Two precision nits, no
behavior change:

1. Pin the upstream behavior we now depend on. Lance 7 makes the unenforced PK
   immutable once set (`lance::dataset::transaction`): re-applying the reserved
   `lance-schema:unenforced-primary-key` key — even with the same value — errors
   "cannot be changed once set". That is exactly what broke `migrate_v1_to_v2`'s
   crash-idempotency and forced its field-guard. Add
   `lance_surface_guards.rs::unenforced_primary_key_is_immutable_once_set` so a
   future Lance bump that relaxes immutability turns red, prompting re-evaluation
   of the migration guard. (Matches the "first smoke check on a Lance bump"
   discipline in docs/dev/lance.md.)

2. Clarify that the native `DirectoryNamespace` decoupling is contingent on
   omnigraph's legacy boolean PK key, not an unconditional v7 property: with the
   position key the native namespace would still read the manifest. omnigraph
   keeps the boolean key deliberately — Lance honors it permanently (maps to PK
   position 0) and one uniform on-disk format beats a new-vs-old split, since
   existing graphs can't be re-keyed under the same immutability rule. Updated
   the test comment and the lance.md stanza; also corrected the stale `is_empty()`
   description of the migration guard (it now matches on the specific PK field).

* test(engine): make the immutable-PK guard's red-bar diagnostic fire in every change-shape

Review follow-up: the guard's re-set assertion chained `.unwrap().await.unwrap_err()`,
which only surfaces the actionable "Lance no longer rejects re-setting the unenforced
PK" message when immutability is enforced on the async commit path and still returns an
error. Two other change-shapes would panic generically instead, defeating the guard's
purpose:

- if Lance moves the check to the sync validation stage, the first `.unwrap()` panics
  with a bare "unwrap() on Err";
- if Lance relaxes immutability so the re-set succeeds, `.unwrap_err()` panics with a
  bare "unwrap_err() on Ok".

Normalize the sync `.update()` result and the async `.await` into one `Result` and assert
on it, so the diagnostic fires whichever stage enforces (or relaxes) the rule.
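
A minimal sketch of that normalization, assuming a synchronous stand-in for the async commit so it compiles without Lance or tokio. `Builder`, `update`, `commit`, `normalized`, and `rejects_as_immutable` are hypothetical names for illustration, not Lance APIs:

```rust
// Hypothetical stand-ins for the two stages; none of these names come from Lance.
#[derive(Debug)]
struct Builder {
    commit_ok: bool,
}

// Stage 1: synchronous validation, which may reject the update up front.
fn update(validate_ok: bool, commit_ok: bool) -> Result<Builder, String> {
    if validate_ok {
        Ok(Builder { commit_ok })
    } else {
        Err("cannot be changed once set (validation stage)".to_string())
    }
}

impl Builder {
    // Stage 2: stand-in for awaiting the commit in the real test.
    fn commit(self) -> Result<(), String> {
        if self.commit_ok {
            Ok(())
        } else {
            Err("cannot be changed once set (commit stage)".to_string())
        }
    }
}

// Collapse both stages into one Result so a single assertion covers every shape.
fn normalized(validate_ok: bool, commit_ok: bool) -> Result<(), String> {
    update(validate_ok, commit_ok).and_then(Builder::commit)
}

// The guard's predicate: an error whose text names the immutability rule.
fn rejects_as_immutable(outcome: &Result<(), String>) -> bool {
    matches!(outcome, Err(e) if e.contains("cannot be changed once set"))
}
```

With this shape, `normalized(false, true)` and `normalized(true, false)` both satisfy `rejects_as_immutable`, while `normalized(true, true)` does not, so one assertion with one message covers the sync-stage, commit-stage, and relaxed cases.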
Ragnor Comerford 2026-06-15 11:33:25 +02:00 committed by GitHub
parent 9a607abdec
commit c3d7639377
3 changed files with 92 additions and 8 deletions


@@ -337,12 +337,23 @@ async fn test_directory_namespace_direct_publish_cannot_replace_native_omnigraph
 .unwrap();
 // Lance 7: the native `DirectoryNamespace` no longer recognizes omnigraph's
-// manifest-tracked tables. `check_table_status` reports the table absent
-// (`lance-namespace-impls` dir.rs), so list / describe / create_table_version
-// all return `TableNotFound`. The native path is fully decoupled from
-// omnigraph's manifest authority: it cannot enumerate, inspect, or publish
-// over omnigraph's tables. (Pre-v7 the native namespace could still *see* the
-// published version but never replace the write path; v7 only widens the gap.)
+// manifest-tracked tables, so list / describe / create_table_version all
+// return `TableNotFound`. The mechanism is *contingent on omnigraph's legacy
+// boolean PK key*, not an unconditional v7 property: v7's namespace eagerly
+// rewrites any `__manifest` whose `object_id` lacks the new
+// `lance-schema:unenforced-primary-key:position` key, omnigraph declares the
+// PK with the legacy boolean key, and v7 forbids changing a PK once set — so
+// `ensure_manifest_table_up_to_date` errors, the namespace silently falls
+// back to directory listing (disabled here), and `check_table_status` reports
+// the table absent. omnigraph keeps the boolean key deliberately: Lance
+// honors it permanently (it maps to PK position 0) and one uniform on-disk
+// format beats a new-vs-old split, since existing graphs can't be re-keyed to
+// the position key under that same immutability rule. The decoupling is
+// therefore an accepted, production-irrelevant tradeoff (omnigraph never uses
+// the native namespace — its publisher writes `__manifest` via merge_insert
+// and its reads go through its own `LanceNamespace` impls), and it only
+// strengthens this guard's thesis: native tooling cannot enumerate, inspect,
+// or publish over omnigraph's tables, let alone replace the write path.
 let assert_table_not_found = |what: &str, dbg: String| {
 assert!(
 dbg.contains("TableNotFound") && dbg.contains("node:Person"),


@@ -918,3 +918,76 @@ async fn skip_auto_cleanup_suppresses_version_gc() {
__manifest-pinned versions on upgraded (pre-bump) graphs."
);
}
// --- Guard 19: unenforced primary key is immutable once set (lance v7) ------
//
// Lance 7 (`lance::dataset::transaction`) makes the unenforced PK reserved:
// once `lance-schema:unenforced-primary-key` is set on a field, any later write
// that touches that reserved key — even re-applying the SAME value — errors
// "the unenforced primary key is a reserved key and cannot be changed once set".
//
// This is the upstream behavior that broke
// `db/manifest/migrations.rs::migrate_v1_to_v2`'s crash-idempotency: a
// pre-v0.4.0 graph that crashed after the field-set but before the stamp bump
// re-enters the migration with the PK already present, and on Lance 6 the
// re-apply was a no-op. The migration now guards the set on the manifest's
// unenforced-PK field (`["object_id"]` → no-op, `[]` → set, anything else →
// loud refusal). If Lance ever relaxes immutability (a re-set becomes a no-op
// again), this guard goes red — revisit whether that field-guard is still
// needed, and re-pin docs/dev/lance.md.
#[tokio::test]
async fn unenforced_primary_key_is_immutable_once_set() {
use lance::datatypes::LANCE_UNENFORCED_PRIMARY_KEY;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().join("g19.lance");
let mut ds = fresh_dataset(uri.to_str().unwrap()).await;
// Precondition: no unenforced PK yet (mirrors a genuine pre-v0.4.0 manifest).
assert!(
ds.schema().unenforced_primary_key().is_empty(),
"fresh dataset should carry no unenforced primary key"
);
// First set succeeds — the genuine pre-v0.4.0 migration path. (Discard the
// returned &Schema so the &mut borrow ends before the next call.)
ds.update_field_metadata()
.update(
"id",
[(LANCE_UNENFORCED_PRIMARY_KEY.to_string(), "true".to_string())],
)
.unwrap()
.await
.unwrap();
let pk: Vec<String> = ds
.schema()
.unenforced_primary_key()
.iter()
.map(|field| field.name.clone())
.collect();
assert_eq!(
pk,
["id"],
"first set should install `id` as the unenforced PK"
);
// Re-applying the SAME reserved key must still error. Normalize the sync
// validation stage (`.update()`) and the async commit stage (`.await`) into
// one Result so the actionable diagnostic below fires whichever stage Lance
// enforces immutability at — and even if a future Lance relaxes it to `Ok`.
// Bare `.unwrap()` / `.unwrap_err()` would instead panic with a generic
// message in those cases, defeating the guard's purpose.
let outcome: lance::Result<()> = match ds.update_field_metadata().update(
"id",
[(LANCE_UNENFORCED_PRIMARY_KEY.to_string(), "true".to_string())],
) {
Ok(builder) => builder.await.map(|_| ()),
Err(e) => Err(e),
};
assert!(
matches!(&outcome, Err(e) if e.to_string().contains("cannot be changed once set")),
"Lance no longer rejects re-setting the unenforced PK as immutable \
(got: {outcome:?}); immutability relaxed or moved off the commit path. \
Revisit migrate_v1_to_v2's field-guard and re-pin docs/dev/lance.md."
);
}
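
The Guard 19 comment describes the migration's field-guard as a three-way decision on the manifest's unenforced-PK field. A rough sketch of that decision as a pure function follows; `PkAction` and `decide_pk_action` are hypothetical names for illustration and are not taken from `migrate_v1_to_v2`, whose actual code is not shown in this diff:

```rust
// Hypothetical model of the three-way field-guard; the enum and function
// names are illustrative and do not come from omnigraph's source.
#[derive(Debug, PartialEq)]
enum PkAction {
    /// PK is already `["object_id"]`: an earlier run set it, so skip the set.
    NoOp,
    /// No PK yet: a genuine pre-v0.4.0 manifest, so perform the one-time set.
    Set,
}

// Decide what the migration should do given the current unenforced-PK fields.
fn decide_pk_action(current_pk: &[&str]) -> Result<PkAction, String> {
    match current_pk {
        ["object_id"] => Ok(PkAction::NoOp),
        [] => Ok(PkAction::Set),
        // Anything else is unexpected state: refuse loudly rather than guess.
        other => Err(format!(
            "refusing to migrate: unexpected unenforced PK {other:?}"
        )),
    }
}
```

Under this model a crashed-then-resumed migration sees `["object_id"]` and never re-applies the reserved key, which is what keeps it clear of the immutability error pinned by the test above.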