mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-12 01:45:14 +02:00
Merge remote-tracking branch 'origin/main' into feat/cluster-apply-stage3a
This commit is contained in:
commit
69b63c33ac
33 changed files with 1250 additions and 1249 deletions
|
|
@ -240,7 +240,7 @@ omnigraph policy explain --actor act-alice --action change --branch main
|
|||
| Columnar storage on object store | ✅ Arrow/Lance | URI normalization, S3 env-var plumbing |
|
||||
| Per-dataset versioning + time travel | ✅ | `snapshot_at_version`, `entity_at`, snapshot-pinned reads across many tables |
|
||||
| Per-dataset branches | ✅ | **Graph-level** branches (atomic across all sub-tables), lazy fork, system branch filtering |
|
||||
| Atomic single-dataset commits | ✅ | **Multi-table publish via three layers**, NOT a single Lance primitive: (1) per-table Lance `commit_staged` for the data write, (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering, (3) the open-time recovery sweep for the residual gap between (1) and (2). All three layers ship; the five migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`, `optimize_all_tables`) write a `__recovery/{ulid}.json` sidecar before Phase B and delete it after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the sweep in `db/manifest/recovery.rs`: classify, decide all-or-nothing per sidecar, roll forward via single `ManifestBatchPublisher::publish` or roll back via `Dataset::restore` followed by a manifest publish of the restored version (so both directions converge to `manifest == HEAD` — no residual drift), and record an audit row in `_graph_commit_recoveries.lance` (queryable via `omnigraph commit list --filter actor=omnigraph:recovery`). Continuous in-process recovery (no restart needed between Phase B failure and recovery) is the goal of a future background reconciler. Engine writes route through a sealed `TableStorage` trait exposing `stage_*` + `commit_staged` as the canonical staged-write surface; documented inline-commit residuals (`delete_where`, `create_vector_index`, plus legacy `append_batch` / `merge_insert_batches` / `overwrite_batch` / `create_*_index`) remain on the trait until upstream Lance ships a public two-phase API ([#6658](https://github.com/lance-format/lance/issues/6658), [#6666](https://github.com/lance-format/lance/issues/6666)) and the migration of every call site completes. |
|
||||
| Atomic single-dataset commits | ✅ | **Multi-table publish via three layers**, NOT a single Lance primitive: (1) per-table Lance `commit_staged` for the data write, (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering, (3) the open-time recovery sweep for the residual gap between (1) and (2). All three layers ship; the five migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`, `optimize_all_tables`) write a `__recovery/{ulid}.json` sidecar before Phase B and delete it after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the sweep in `db/manifest/recovery.rs`: classify, decide all-or-nothing per sidecar, roll forward via single `ManifestBatchPublisher::publish` or roll back via `Dataset::restore` followed by a manifest publish of the restored version (so both directions converge to `manifest == HEAD` — no residual drift), and record an audit row in `_graph_commit_recoveries.lance` (queryable via `omnigraph commit list --filter actor=omnigraph:recovery`). Continuous in-process recovery (no restart needed between Phase B failure and recovery) is the goal of a future background reconciler. Engine writes route through a sealed `TableStorage` trait (`db.storage()`) exposing only `stage_*` + `commit_staged` + reads; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so the default surface cannot couple a write with a HEAD advance — §1 holds by construction. `delete_where` and `create_vector_index` stay inline until upstream Lance ships a public two-phase API ([#6658](https://github.com/lance-format/lance/issues/6658), [#6666](https://github.com/lance-format/lance/issues/6666)); `LoadMode::Overwrite` uses Lance `Overwrite` staged transactions. |
|
||||
| Compaction (`compact_files`) | ✅ | `omnigraph optimize` orchestrates over all node/edge tables, bounded concurrency; **publishes each compacted table's new version to `__manifest`** (so the manifest tracks the Lance HEAD — required for reads to observe compaction and for schema apply / strict writes to pass their HEAD-vs-manifest precondition), under the per-`(table, main)` write queue with `SidecarKind::Optimize` recovery coverage; **refuses on an unrecovered graph** (errors if a `__recovery` sidecar is pending); **skips uncovered HEAD > manifest drift** with `DriftNeedsRepair` instead of interpreting it; **skips blob-bearing tables** (reported via `TableOptimizeStats.skipped`, not silent), gated on `LANCE_SUPPORTS_BLOB_COMPACTION` until the upstream blob-v2 compaction-decode bug is fixed (see [docs/dev/invariants.md](docs/dev/invariants.md) Known Gaps) |
|
||||
| Repair uncovered drift | — | `omnigraph repair` explicitly classifies uncovered table `HEAD > manifest` drift: verified maintenance drift (`ReserveFragments`/`Rewrite`) can be published with `--confirm`; suspicious or unverifiable drift requires `--force --confirm`. Sidecar-covered crash residuals still recover automatically on open. |
|
||||
| Cleanup (`cleanup_old_versions`) | ✅ | `omnigraph cleanup` with `--keep` / `--older-than` policy |
|
||||
|
|
|
|||
|
|
@ -7,6 +7,7 @@ use lance::dataset::scanner::ColumnOrdering;
|
|||
use crate::db::SubTableEntry;
|
||||
use crate::db::manifest::Snapshot;
|
||||
use crate::error::Result;
|
||||
use crate::storage_layer::{SnapshotHandle, TableStorage};
|
||||
use crate::table_store::TableStore;
|
||||
|
||||
// ─── Types ──────────────────────────────────────────────────────────────────
|
||||
|
|
@ -229,7 +230,8 @@ async fn diff_table_same_lineage(
|
|||
) -> Result<Vec<EntityChange>> {
|
||||
let vf = from_entry.table_version;
|
||||
let vt = to_entry.table_version;
|
||||
let to_ds = table_store.open_at_entry(to_entry).await?;
|
||||
let storage: &dyn TableStorage = table_store;
|
||||
let to_ds = storage.open_snapshot_at_entry(to_entry).await?;
|
||||
|
||||
let cols: Vec<&str> = if is_edge {
|
||||
vec!["id", "src", "dst", "_row_last_updated_at_version"]
|
||||
|
|
@ -257,12 +259,12 @@ async fn diff_table_same_lineage(
|
|||
"_row_last_updated_at_version > {} AND _row_last_updated_at_version <= {}",
|
||||
vf, vt
|
||||
);
|
||||
let changed_rows = scan_with_filter(table_store, &to_ds, &cols, &filter_sql).await?;
|
||||
let changed_rows = scan_with_filter(storage, &to_ds, &cols, &filter_sql).await?;
|
||||
|
||||
if !changed_rows.is_empty() {
|
||||
// Build the set of IDs that existed at the from version
|
||||
let from_ds = table_store.open_at_entry(from_entry).await?;
|
||||
let from_ids: HashSet<String> = scan_id_set(table_store, &from_ds, &["id"])
|
||||
let from_ds = storage.open_snapshot_at_entry(from_entry).await?;
|
||||
let from_ids: HashSet<String> = scan_id_set(storage, &from_ds, &["id"])
|
||||
.await?
|
||||
.into_iter()
|
||||
.map(|r| r.id)
|
||||
|
|
@ -282,8 +284,8 @@ async fn diff_table_same_lineage(
|
|||
|
||||
// Deletes: ID set-difference
|
||||
if wants_deletes {
|
||||
let from_ds = table_store.open_at_entry(from_entry).await?;
|
||||
let deleted = deleted_ids_by_set_diff(table_store, &from_ds, &to_ds, is_edge).await?;
|
||||
let from_ds = storage.open_snapshot_at_entry(from_entry).await?;
|
||||
let deleted = deleted_ids_by_set_diff(storage, &from_ds, &to_ds, is_edge).await?;
|
||||
changes.extend(deleted);
|
||||
}
|
||||
|
||||
|
|
@ -300,13 +302,14 @@ async fn diff_table_cross_branch(
|
|||
is_edge: bool,
|
||||
filter: &ChangeFilter,
|
||||
) -> Result<Vec<EntityChange>> {
|
||||
let from_ds = table_store
|
||||
.open_snapshot_table(from_snap, table_key)
|
||||
let storage: &dyn TableStorage = table_store;
|
||||
let from_ds = storage
|
||||
.open_snapshot_at_table(from_snap, table_key)
|
||||
.await?;
|
||||
let to_ds = table_store.open_snapshot_table(to_snap, table_key).await?;
|
||||
let to_ds = storage.open_snapshot_at_table(to_snap, table_key).await?;
|
||||
|
||||
let from_rows = scan_all_rows_ordered(table_store, &from_ds, is_edge).await?;
|
||||
let to_rows = scan_all_rows_ordered(table_store, &to_ds, is_edge).await?;
|
||||
let from_rows = scan_all_rows_ordered(storage, &from_ds, is_edge).await?;
|
||||
let to_rows = scan_all_rows_ordered(storage, &to_ds, is_edge).await?;
|
||||
|
||||
let mut changes = Vec::new();
|
||||
let mut fi = 0;
|
||||
|
|
@ -392,8 +395,9 @@ async fn diff_table_added(
|
|||
if !filter.wants_op(ChangeOp::Insert) {
|
||||
return Ok(Vec::new());
|
||||
}
|
||||
let ds = table_store.open_snapshot_table(to_snap, table_key).await?;
|
||||
let rows = scan_all_rows_ordered(table_store, &ds, is_edge).await?;
|
||||
let storage: &dyn TableStorage = table_store;
|
||||
let ds = storage.open_snapshot_at_table(to_snap, table_key).await?;
|
||||
let rows = scan_all_rows_ordered(storage, &ds, is_edge).await?;
|
||||
Ok(rows
|
||||
.into_iter()
|
||||
.map(|r| entity_change_from_row(&r, ChangeOp::Insert, is_edge))
|
||||
|
|
@ -410,10 +414,11 @@ async fn diff_table_removed(
|
|||
if !filter.wants_op(ChangeOp::Delete) {
|
||||
return Ok(Vec::new());
|
||||
}
|
||||
let ds = table_store
|
||||
.open_snapshot_table(from_snap, table_key)
|
||||
let storage: &dyn TableStorage = table_store;
|
||||
let ds = storage
|
||||
.open_snapshot_at_table(from_snap, table_key)
|
||||
.await?;
|
||||
let rows = scan_all_rows_ordered(table_store, &ds, is_edge).await?;
|
||||
let rows = scan_all_rows_ordered(storage, &ds, is_edge).await?;
|
||||
Ok(rows
|
||||
.into_iter()
|
||||
.map(|r| entity_change_from_row(&r, ChangeOp::Delete, is_edge))
|
||||
|
|
@ -424,12 +429,12 @@ async fn diff_table_removed(
|
|||
|
||||
/// Scan with a SQL filter, projecting specific columns.
|
||||
async fn scan_with_filter(
|
||||
table_store: &TableStore,
|
||||
ds: &lance::Dataset,
|
||||
storage: &dyn TableStorage,
|
||||
ds: &SnapshotHandle,
|
||||
cols: &[&str],
|
||||
filter_sql: &str,
|
||||
) -> Result<Vec<ScannedRow>> {
|
||||
let batches = table_store
|
||||
let batches = storage
|
||||
.scan(ds, Some(cols), Some(filter_sql), None)
|
||||
.await?;
|
||||
Ok(extract_rows(&batches))
|
||||
|
|
@ -437,11 +442,11 @@ async fn scan_with_filter(
|
|||
|
||||
/// Scan all rows ordered by id, projecting id (+ src/dst for edges) + all columns for signature.
|
||||
async fn scan_all_rows_ordered(
|
||||
table_store: &TableStore,
|
||||
ds: &lance::Dataset,
|
||||
storage: &dyn TableStorage,
|
||||
ds: &SnapshotHandle,
|
||||
is_edge: bool,
|
||||
) -> Result<Vec<ScannedRow>> {
|
||||
let batches = table_store
|
||||
let batches = storage
|
||||
.scan(
|
||||
ds,
|
||||
None,
|
||||
|
|
@ -454,9 +459,9 @@ async fn scan_all_rows_ordered(
|
|||
|
||||
/// Compute deleted IDs: scan id at from and to, set-difference.
|
||||
async fn deleted_ids_by_set_diff(
|
||||
table_store: &TableStore,
|
||||
from_ds: &lance::Dataset,
|
||||
to_ds: &lance::Dataset,
|
||||
storage: &dyn TableStorage,
|
||||
from_ds: &SnapshotHandle,
|
||||
to_ds: &SnapshotHandle,
|
||||
is_edge: bool,
|
||||
) -> Result<Vec<EntityChange>> {
|
||||
let cols: Vec<&str> = if is_edge {
|
||||
|
|
@ -465,8 +470,8 @@ async fn deleted_ids_by_set_diff(
|
|||
vec!["id"]
|
||||
};
|
||||
|
||||
let from_rows = scan_id_set(table_store, from_ds, &cols).await?;
|
||||
let to_ids: HashSet<String> = scan_id_set(table_store, to_ds, &["id"])
|
||||
let from_rows = scan_id_set(storage, from_ds, &cols).await?;
|
||||
let to_ids: HashSet<String> = scan_id_set(storage, to_ds, &["id"])
|
||||
.await?
|
||||
.into_iter()
|
||||
.map(|r| r.id)
|
||||
|
|
@ -480,11 +485,11 @@ async fn deleted_ids_by_set_diff(
|
|||
}
|
||||
|
||||
async fn scan_id_set(
|
||||
table_store: &TableStore,
|
||||
ds: &lance::Dataset,
|
||||
storage: &dyn TableStorage,
|
||||
ds: &SnapshotHandle,
|
||||
cols: &[&str],
|
||||
) -> Result<Vec<ScannedRow>> {
|
||||
let batches = table_store.scan(ds, Some(cols), None, None).await?;
|
||||
let batches = storage.scan(ds, Some(cols), None, None).await?;
|
||||
Ok(extract_rows(&batches))
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -25,7 +25,7 @@
|
|||
//! version. Pinned by
|
||||
//! `tests/staged_writes.rs::lance_restore_appends_one_commit_with_checked_out_content`.
|
||||
//! - `Dataset::restore` "wins" against concurrent Append/Update/Delete/
|
||||
//! CreateIndex/Merge — see `check_restore_txn` at lance-4.0.0
|
||||
//! CreateIndex/Merge — see `check_restore_txn` at lance-6.0.1
|
||||
//! `src/io/commit/conflict_resolver.rs:986`. The hazard is documented
|
||||
//! by `tests/staged_writes.rs::lance_restore_loses_to_concurrent_append_via_orphaning`.
|
||||
//! This module sidesteps the hazard by running recovery only at
|
||||
|
|
|
|||
|
|
@ -26,6 +26,7 @@ use crate::db::graph_coordinator::{GraphCoordinator, PublishedSnapshot};
|
|||
use crate::error::{OmniError, Result};
|
||||
use crate::runtime_cache::RuntimeCache;
|
||||
use crate::storage::{StorageAdapter, join_uri, normalize_root_uri, storage_for_uri};
|
||||
use crate::storage_layer::SnapshotHandle;
|
||||
use crate::table_store::TableStore;
|
||||
|
||||
mod export;
|
||||
|
|
@ -583,19 +584,30 @@ impl Omnigraph {
|
|||
schema_apply::ensure_schema_apply_not_locked(self, operation).await
|
||||
}
|
||||
|
||||
pub(crate) fn table_store(&self) -> &TableStore {
|
||||
/// Engine-facing trait surface around `TableStore`.
|
||||
///
|
||||
/// This is the **only** accessor for engine code reaching into the
|
||||
/// storage layer. The trait's signatures use opaque `SnapshotHandle`
|
||||
/// / `StagedHandle` instead of leaking `lance::Dataset` /
|
||||
/// `lance::dataset::transaction::Transaction`, so newly-added engine
|
||||
/// call sites cannot drift the staged-write invariant by mistake
|
||||
/// (the trait's `stage_*` + `commit_staged` pair is the only way to
|
||||
/// land a write).
|
||||
pub(crate) fn storage(&self) -> &dyn crate::storage_layer::TableStorage {
|
||||
&self.table_store
|
||||
}
|
||||
|
||||
/// Engine-facing trait surface around `TableStore`.
|
||||
///
|
||||
/// This is the canonical accessor for newly-written engine code. The
|
||||
/// trait's signatures use opaque `SnapshotHandle` / `StagedHandle`
|
||||
/// instead of leaking `lance::Dataset` /
|
||||
/// `lance::dataset::transaction::Transaction`. Existing call sites
|
||||
/// that still use `db.table_store.X(...)` (the inherent struct
|
||||
/// methods) are migrated incrementally.
|
||||
pub(crate) fn storage(&self) -> &dyn crate::storage_layer::TableStorage {
|
||||
/// Inline-commit residual surface (`delete_where`,
|
||||
/// `create_vector_index`) — the writes Lance cannot yet express as a
|
||||
/// stage-then-commit pair. Deliberately separate from [`Self::storage`] so
|
||||
/// the default storage surface is staged-only and a new writer cannot couple
|
||||
/// "write bytes" with "advance HEAD" by reaching for `db.storage()`. Only
|
||||
/// the handful of documented residual call sites (mutation/merge deletes,
|
||||
/// vector-index build) use this accessor. See
|
||||
/// `crate::storage_layer::InlineCommitResidual` for the per-method blocker.
|
||||
pub(crate) fn storage_inline_residual(
|
||||
&self,
|
||||
) -> &dyn crate::storage_layer::InlineCommitResidual {
|
||||
&self.table_store
|
||||
}
|
||||
|
||||
|
|
@ -1055,19 +1067,24 @@ impl Omnigraph {
|
|||
|
||||
let snapshot = self.snapshot().await;
|
||||
let table_key = format!("node:{}", type_name);
|
||||
let ds = snapshot.open(&table_key).await?;
|
||||
let handle = self
|
||||
.storage()
|
||||
.open_snapshot_at_table(&snapshot, &table_key)
|
||||
.await?;
|
||||
|
||||
let filter_sql = format!("id = '{}'", id.replace('\'', "''"));
|
||||
let row_id = self
|
||||
.table_store
|
||||
.first_row_id_for_filter(&ds, &filter_sql)
|
||||
.storage()
|
||||
.first_row_id_for_filter(&handle, &filter_sql)
|
||||
.await?
|
||||
.ok_or_else(|| {
|
||||
OmniError::manifest(format!("no {} with id '{}' found", type_name, id))
|
||||
})?;
|
||||
|
||||
// Use take_blobs to get the BlobFile handle
|
||||
let ds = Arc::new(ds);
|
||||
// `take_blobs` is a Lance-specific blob accessor not surfaced
|
||||
// through the `TableStorage` trait — reach the inner `Arc<Dataset>`
|
||||
// via the `pub(crate)` accessor for this read-only call.
|
||||
let ds = handle.into_arc();
|
||||
let mut blobs = ds
|
||||
.take_blobs(&[row_id], property)
|
||||
.await
|
||||
|
|
@ -1141,10 +1158,14 @@ impl Omnigraph {
|
|||
cleanup_targets.sort_by(|left, right| left.0.cmp(&right.0));
|
||||
|
||||
for (table_key, table_path) in cleanup_targets {
|
||||
let dataset_uri = self.table_store.dataset_uri(&table_path);
|
||||
let dataset_uri = self.storage().dataset_uri(&table_path);
|
||||
let outcome = match crate::failpoints::maybe_fail("branch_delete.before_table_cleanup")
|
||||
{
|
||||
Ok(()) => self.table_store.force_delete_branch(&dataset_uri, branch).await,
|
||||
Ok(()) => {
|
||||
self.storage()
|
||||
.force_delete_branch(&dataset_uri, branch)
|
||||
.await
|
||||
}
|
||||
Err(injected) => Err(injected),
|
||||
};
|
||||
if let Err(err) = outcome {
|
||||
|
|
@ -1370,7 +1391,7 @@ impl Omnigraph {
|
|||
&self,
|
||||
table_key: &str,
|
||||
op_kind: crate::db::MutationOpKind,
|
||||
) -> Result<(Dataset, String, Option<String>)> {
|
||||
) -> Result<(SnapshotHandle, String, Option<String>)> {
|
||||
table_ops::open_for_mutation(self, table_key, op_kind).await
|
||||
}
|
||||
|
||||
|
|
@ -1379,7 +1400,7 @@ impl Omnigraph {
|
|||
branch: Option<&str>,
|
||||
table_key: &str,
|
||||
op_kind: crate::db::MutationOpKind,
|
||||
) -> Result<(Dataset, String, Option<String>)> {
|
||||
) -> Result<(SnapshotHandle, String, Option<String>)> {
|
||||
table_ops::open_for_mutation_on_branch(self, branch, table_key, op_kind).await
|
||||
}
|
||||
|
||||
|
|
@ -1390,7 +1411,7 @@ impl Omnigraph {
|
|||
source_branch: Option<&str>,
|
||||
source_version: u64,
|
||||
active_branch: &str,
|
||||
) -> Result<Dataset> {
|
||||
) -> Result<SnapshotHandle> {
|
||||
table_ops::fork_dataset_from_entry_state(
|
||||
self,
|
||||
table_key,
|
||||
|
|
@ -1409,7 +1430,7 @@ impl Omnigraph {
|
|||
table_branch: Option<&str>,
|
||||
expected_version: u64,
|
||||
op_kind: crate::db::MutationOpKind,
|
||||
) -> Result<Dataset> {
|
||||
) -> Result<SnapshotHandle> {
|
||||
table_ops::reopen_for_mutation(
|
||||
self,
|
||||
table_key,
|
||||
|
|
@ -1426,14 +1447,14 @@ impl Omnigraph {
|
|||
table_path: &str,
|
||||
table_branch: Option<&str>,
|
||||
table_version: u64,
|
||||
) -> Result<Dataset> {
|
||||
) -> Result<SnapshotHandle> {
|
||||
table_ops::open_dataset_at_state(self, table_path, table_branch, table_version).await
|
||||
}
|
||||
|
||||
pub(crate) async fn build_indices_on_dataset(
|
||||
&self,
|
||||
table_key: &str,
|
||||
ds: &mut Dataset,
|
||||
ds: &mut SnapshotHandle,
|
||||
) -> Result<()> {
|
||||
table_ops::build_indices_on_dataset(self, table_key, ds).await
|
||||
}
|
||||
|
|
@ -1442,7 +1463,7 @@ impl Omnigraph {
|
|||
&self,
|
||||
catalog: &Catalog,
|
||||
table_key: &str,
|
||||
ds: &mut Dataset,
|
||||
ds: &mut SnapshotHandle,
|
||||
) -> Result<()> {
|
||||
table_ops::build_indices_on_dataset_for_catalog(self, catalog, table_key, ds).await
|
||||
}
|
||||
|
|
@ -2139,8 +2160,12 @@ edge WorksAt: Person -> Company
|
|||
|
||||
async fn table_rows_json(db: &Omnigraph, table_key: &str) -> Vec<Value> {
|
||||
let snapshot = db.snapshot().await;
|
||||
let ds = snapshot.open(table_key).await.unwrap();
|
||||
let batches = db.table_store().scan_batches(&ds).await.unwrap();
|
||||
let ds = db
|
||||
.storage()
|
||||
.open_snapshot_at_table(&snapshot, table_key)
|
||||
.await
|
||||
.unwrap();
|
||||
let batches = db.storage().scan_batches(&ds).await.unwrap();
|
||||
batches
|
||||
.into_iter()
|
||||
.flat_map(|batch| {
|
||||
|
|
@ -2152,11 +2177,11 @@ edge WorksAt: Person -> Company
|
|||
}
|
||||
|
||||
async fn seed_person_row(db: &mut Omnigraph, name: &str, age: Option<i32>) {
|
||||
let (mut ds, full_path, table_branch) = db
|
||||
let (ds, full_path, table_branch) = db
|
||||
.open_for_mutation("node:Person", crate::db::MutationOpKind::Insert)
|
||||
.await
|
||||
.unwrap();
|
||||
let schema: Arc<Schema> = Arc::new(ds.schema().into());
|
||||
let schema: Arc<Schema> = Arc::new(ds.dataset().schema().into());
|
||||
let columns: Vec<Arc<dyn Array>> = schema
|
||||
.fields()
|
||||
.iter()
|
||||
|
|
@ -2168,9 +2193,11 @@ edge WorksAt: Person -> Company
|
|||
})
|
||||
.collect();
|
||||
let batch = RecordBatch::try_new(Arc::clone(&schema), columns).unwrap();
|
||||
let staged = db.storage().stage_append(&ds, batch, &[]).await.unwrap();
|
||||
let committed = db.storage().commit_staged(ds, staged).await.unwrap();
|
||||
let state = db
|
||||
.table_store()
|
||||
.append_batch(&full_path, &mut ds, batch)
|
||||
.storage()
|
||||
.table_state(&full_path, &committed)
|
||||
.await
|
||||
.unwrap();
|
||||
db.commit_updates(&[crate::db::SubTableUpdate {
|
||||
|
|
@ -2354,8 +2381,12 @@ edge WorksAt: Person -> Company
|
|||
db.apply_schema(&desired).await.unwrap();
|
||||
|
||||
let snapshot = db.snapshot().await;
|
||||
let ds = snapshot.open("node:Person").await.unwrap();
|
||||
assert!(db.table_store().has_fts_index(&ds, "name").await.unwrap());
|
||||
let ds = db
|
||||
.storage()
|
||||
.open_snapshot_at_table(&snapshot, "node:Person")
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(db.storage().has_fts_index(&ds, "name").await.unwrap());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
|
|
@ -2373,9 +2404,13 @@ edge WorksAt: Person -> Company
|
|||
db.apply_schema(&desired).await.unwrap();
|
||||
|
||||
let snapshot = db.snapshot().await;
|
||||
let ds = snapshot.open("node:Person").await.unwrap();
|
||||
assert!(db.table_store().has_btree_index(&ds, "id").await.unwrap());
|
||||
assert!(db.table_store().has_fts_index(&ds, "name").await.unwrap());
|
||||
let ds = db
|
||||
.storage()
|
||||
.open_snapshot_at_table(&snapshot, "node:Person")
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(db.storage().has_btree_index(&ds, "id").await.unwrap());
|
||||
assert!(db.storage().has_fts_index(&ds, "name").await.unwrap());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
|
|
|
|||
|
|
@ -60,12 +60,12 @@ async fn entity_from_snapshot(
|
|||
}
|
||||
|
||||
let ds = db
|
||||
.table_store
|
||||
.open_snapshot_table(snapshot, table_key)
|
||||
.storage()
|
||||
.open_snapshot_at_table(snapshot, table_key)
|
||||
.await?;
|
||||
let filter_sql = format!("id = '{}'", id.replace('\'', "''"));
|
||||
let batches = db
|
||||
.table_store
|
||||
.storage()
|
||||
.scan(&ds, None, Some(&filter_sql), None)
|
||||
.await?;
|
||||
let Some(batch) = batches.iter().find(|batch| batch.num_rows() > 0) else {
|
||||
|
|
@ -143,23 +143,23 @@ async fn export_table_to_writer<W: Write>(
|
|||
writer: &mut W,
|
||||
) -> Result<()> {
|
||||
let ds = db
|
||||
.table_store
|
||||
.open_snapshot_table(snapshot, table_key)
|
||||
.storage()
|
||||
.open_snapshot_at_table(snapshot, table_key)
|
||||
.await?;
|
||||
let ordering = Some(vec![ColumnOrdering::asc_nulls_last("id".to_string())]);
|
||||
let catalog = db.catalog();
|
||||
let blob_properties = blob_properties_for_table_key(&catalog, table_key)?;
|
||||
|
||||
if blob_properties.is_empty() {
|
||||
for batch in db.table_store.scan(&ds, None, None, ordering).await? {
|
||||
for batch in db.storage().scan(&ds, None, None, ordering).await? {
|
||||
write_export_rows_from_batch(db, table_key, &batch, None, writer)?;
|
||||
}
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
let batches = db
|
||||
.table_store
|
||||
.scan_with(&ds, None, None, ordering, true, |_| Ok(()))
|
||||
.storage()
|
||||
.scan_with_row_id(&ds, None, None, ordering, true)
|
||||
.await?;
|
||||
for batch in batches {
|
||||
let row_ids = batch
|
||||
|
|
@ -175,7 +175,13 @@ async fn export_table_to_writer<W: Write>(
|
|||
.iter()
|
||||
.copied()
|
||||
.collect::<Vec<_>>();
|
||||
let blob_values = export_blob_values(&ds, &batch, &row_ids, blob_properties).await?;
|
||||
// Blob materialization reaches through to the inner Lance
|
||||
// `Dataset` because `take_blobs` is a Lance-only API not lifted
|
||||
// onto the `TableStorage` trait surface (the trait covers
|
||||
// staged-write and snapshot-scan primitives; blob descriptor
|
||||
// materialization sits outside that surface).
|
||||
let blob_values =
|
||||
export_blob_values(ds.dataset(), &batch, &row_ids, blob_properties).await?;
|
||||
write_export_rows_from_batch(db, table_key, &batch, Some(&blob_values), writer)?;
|
||||
}
|
||||
Ok(())
|
||||
|
|
|
|||
|
|
@ -317,10 +317,16 @@ async fn optimize_one_table(
|
|||
.acquire_many(&[(table_key.clone(), None)])
|
||||
.await;
|
||||
|
||||
let mut ds = db
|
||||
.table_store
|
||||
// `compact_files` is a Lance-only maintenance API that needs `&mut Dataset`.
|
||||
// The `TableStorage` trait deliberately does not surface it (the staged-write
|
||||
// invariant covers writes; compaction is a separate concern). Unwrap the
|
||||
// opaque `SnapshotHandle` via `into_dataset()` (`pub(crate)`, gated to the
|
||||
// maintenance path).
|
||||
let handle = db
|
||||
.storage()
|
||||
.open_dataset_head_for_write(&table_key, &full_path, None)
|
||||
.await?;
|
||||
let mut ds = handle.into_dataset();
|
||||
|
||||
// CAS baseline: the table's current manifest version, read under the queue
|
||||
// (in-memory coordinator snapshot, no storage I/O — stable for this section).
|
||||
|
|
@ -408,7 +414,10 @@ async fn optimize_one_table(
|
|||
// expected = the version observed under the queue). On failure the sidecar
|
||||
// is intentionally left for the open-time recovery sweep to roll forward.
|
||||
if committed {
|
||||
let state = db.table_store.table_state(&full_path, &ds).await?;
|
||||
// Re-wrap the post-compaction dataset to read its state through the
|
||||
// trait surface (`table_state` is a read; no HEAD advance).
|
||||
let snapshot = crate::storage_layer::SnapshotHandle::new(ds);
|
||||
let state = db.storage().table_state(&full_path, &snapshot).await?;
|
||||
let update = crate::db::SubTableUpdate {
|
||||
table_key: table_key.clone(),
|
||||
table_version: state.version,
|
||||
|
|
@ -493,7 +502,7 @@ pub async fn cleanup_all_tables(
|
|||
}
|
||||
|
||||
let concurrency = maint_concurrency().min(table_tasks.len()).max(1);
|
||||
let table_store = &db.table_store;
|
||||
let storage = db.storage();
|
||||
|
||||
// Fault-isolated per table: a single table's GC failure is recorded on its
|
||||
// stats row (`error: Some`) and logged, never aborting the healthy tables.
|
||||
|
|
@ -503,9 +512,13 @@ pub async fn cleanup_all_tables(
|
|||
.map(|(table_key, full_path)| async move {
|
||||
let outcome: Result<RemovalStats> = async {
|
||||
crate::failpoints::maybe_fail("cleanup.table_gc")?;
|
||||
let ds = table_store
|
||||
// `cleanup_old_versions` is a Lance-only maintenance API not
|
||||
// surfaced through `TableStorage` — see the optimize path
|
||||
// above for the same rationale. Unwrap via `into_dataset()`.
|
||||
let handle = storage
|
||||
.open_dataset_head_for_write(&table_key, &full_path, None)
|
||||
.await?;
|
||||
let ds = handle.into_dataset();
|
||||
let before_version = keep_versions
|
||||
.map(|n| ds.version().version.saturating_sub(n as u64))
|
||||
.filter(|v| *v > 0);
|
||||
|
|
@ -606,8 +619,9 @@ pub async fn reconcile_orphaned_branches(db: &Omnigraph) -> Result<BranchReconci
|
|||
|
||||
// Per-table fault isolation: one table's transient failure is recorded and
|
||||
// logged, never aborting the rest of the sweep.
|
||||
let storage = db.storage();
|
||||
for (table_key, full_path) in table_targets {
|
||||
let listed = match db.table_store.list_branches(&full_path).await {
|
||||
let listed = match storage.list_branches(&full_path).await {
|
||||
Ok(listed) => listed,
|
||||
Err(err) => {
|
||||
tracing::warn!(
|
||||
|
|
@ -622,7 +636,7 @@ pub async fn reconcile_orphaned_branches(db: &Omnigraph) -> Result<BranchReconci
|
|||
};
|
||||
for branch in orphan_branches(listed, &keep) {
|
||||
let outcome = match crate::failpoints::maybe_fail("cleanup.reconcile_fork") {
|
||||
Ok(()) => db.table_store.force_delete_branch(&full_path, &branch).await,
|
||||
Ok(()) => storage.force_delete_branch(&full_path, &branch).await,
|
||||
Err(injected) => Err(injected),
|
||||
};
|
||||
match outcome {
|
||||
|
|
|
|||
|
|
@ -165,10 +165,15 @@ pub async fn repair_all_tables(db: &Omnigraph, options: RepairOptions) -> Result
|
|||
let mut any_forced = false;
|
||||
|
||||
for (table_key, full_path) in table_tasks {
|
||||
// `classify_drift` inspects raw Lance transaction history
|
||||
// (`read_transaction_by_version`), a Lance-only maintenance read the
|
||||
// staged-write trait does not surface. Open via `db.storage()` and
|
||||
// unwrap the opaque handle (mirrors optimize / cleanup).
|
||||
let ds = db
|
||||
.table_store
|
||||
.storage()
|
||||
.open_dataset_head_for_write(&table_key, &full_path, None)
|
||||
.await?;
|
||||
.await?
|
||||
.into_dataset();
|
||||
let manifest_version = snapshot
|
||||
.entry(&table_key)
|
||||
.map(|e| e.table_version)
|
||||
|
|
@ -214,7 +219,10 @@ pub async fn repair_all_tables(db: &Omnigraph, options: RepairOptions) -> Result
|
|||
};
|
||||
|
||||
if matches!(action, RepairAction::Healed | RepairAction::Forced) {
|
||||
let state = db.table_store.table_state(&full_path, &ds).await?;
|
||||
// Re-wrap the opened dataset to read its state through the trait
|
||||
// surface (`table_state` is a read; no HEAD advance).
|
||||
let snapshot = crate::storage_layer::SnapshotHandle::new(ds);
|
||||
let state = db.storage().table_state(&full_path, &snapshot).await?;
|
||||
updates.push(crate::db::SubTableUpdate {
|
||||
table_key: table_key.clone(),
|
||||
table_version: state.version,
|
||||
|
|
|
|||
|
|
@ -355,7 +355,7 @@ where
|
|||
let entry = snapshot.entry(table_key)?;
|
||||
Some(crate::db::manifest::SidecarTablePin {
|
||||
table_key: table_key.clone(),
|
||||
table_path: db.table_store.dataset_uri(&entry.table_path),
|
||||
table_path: db.storage().dataset_uri(&entry.table_path),
|
||||
expected_version: entry.table_version,
|
||||
post_commit_pin: entry.table_version + 1,
|
||||
table_branch: entry.table_branch.clone(),
|
||||
|
|
@ -469,12 +469,13 @@ where
|
|||
|
||||
for table_key in &added_tables {
|
||||
let table_path = table_path_for_table_key(table_key)?;
|
||||
let dataset_uri = db.table_store.dataset_uri(&table_path);
|
||||
let dataset_uri = db.storage().dataset_uri(&table_path);
|
||||
let schema = schema_for_table_key(&desired_catalog, table_key)?;
|
||||
let mut ds = TableStore::create_empty_dataset(&dataset_uri, &schema).await?;
|
||||
let mut ds =
|
||||
SnapshotHandle::new(TableStore::create_empty_dataset(&dataset_uri, &schema).await?);
|
||||
db.build_indices_on_dataset_for_catalog(&desired_catalog, table_key, &mut ds)
|
||||
.await?;
|
||||
let state = db.table_store.table_state(&dataset_uri, &ds).await?;
|
||||
let state = db.storage().table_state(&dataset_uri, &ds).await?;
|
||||
table_registrations.insert(table_key.clone(), table_path);
|
||||
table_updates.insert(
|
||||
table_key.clone(),
|
||||
|
|
@ -496,7 +497,10 @@ where
|
|||
))
|
||||
})?;
|
||||
ensure_snapshot_entry_head_matches(db, source_entry).await?;
|
||||
let source_ds = snapshot.open(source_table_key).await?;
|
||||
let source_ds = db
|
||||
.storage()
|
||||
.open_snapshot_at_table(&snapshot, source_table_key)
|
||||
.await?;
|
||||
let current_catalog = db.catalog();
|
||||
let batch = batch_for_schema_apply_rewrite(
|
||||
db,
|
||||
|
|
@ -509,11 +513,12 @@ where
|
|||
)
|
||||
.await?;
|
||||
let table_path = table_path_for_table_key(target_table_key)?;
|
||||
let dataset_uri = db.table_store.dataset_uri(&table_path);
|
||||
let mut target_ds = TableStore::write_dataset(&dataset_uri, batch).await?;
|
||||
let dataset_uri = db.storage().dataset_uri(&table_path);
|
||||
let mut target_ds =
|
||||
SnapshotHandle::new(TableStore::write_dataset(&dataset_uri, batch).await?);
|
||||
db.build_indices_on_dataset_for_catalog(&desired_catalog, target_table_key, &mut target_ds)
|
||||
.await?;
|
||||
let state = db.table_store.table_state(&dataset_uri, &target_ds).await?;
|
||||
let state = db.storage().table_state(&dataset_uri, &target_ds).await?;
|
||||
table_registrations.insert(target_table_key.clone(), table_path);
|
||||
table_updates.insert(
|
||||
target_table_key.clone(),
|
||||
|
|
@ -542,7 +547,10 @@ where
|
|||
))
|
||||
})?;
|
||||
ensure_snapshot_entry_head_matches(db, entry).await?;
|
||||
let source_ds = snapshot.open(table_key).await?;
|
||||
let source_ds = db
|
||||
.storage()
|
||||
.open_snapshot_at_table(&snapshot, table_key)
|
||||
.await?;
|
||||
let current_catalog = db.catalog();
|
||||
let batch = batch_for_schema_apply_rewrite(
|
||||
db,
|
||||
|
|
@ -554,37 +562,22 @@ where
|
|||
property_renames.get(table_key),
|
||||
)
|
||||
.await?;
|
||||
let dataset_uri = db.table_store.dataset_uri(&entry.table_path);
|
||||
// Route through stage_overwrite + commit_staged for non-empty
|
||||
// batches. Lance's `InsertBuilder::execute_uncommitted`
|
||||
// errors on empty data (lance-4.0.0 `src/dataset/write/insert.rs:144`),
|
||||
// so the empty-rewrite case stays on `overwrite_dataset` (which
|
||||
// accepts empty input). The empty case is rare in schema_apply
|
||||
// — it only fires when the source table itself was already empty
|
||||
// — and schema_apply runs under `__schema_apply_lock__` so the
|
||||
// narrow inline-commit residual is bounded.
|
||||
let mut target_ds = if batch.num_rows() == 0 {
|
||||
TableStore::overwrite_dataset(&dataset_uri, batch).await?
|
||||
} else {
|
||||
// Pass `entry.table_branch.as_deref()` (not `None`) for
|
||||
// consistency with the indexed_tables block below. Schema
|
||||
// apply runs under `__schema_apply_lock__` which today
|
||||
// rejects non-main branches, so `entry.table_branch` is
|
||||
// expected to be `None`. But the defensive passthrough
|
||||
// means a future relaxation of the lock-check can't quietly
|
||||
// open the wrong HEAD here.
|
||||
let existing = db
|
||||
.table_store
|
||||
.open_dataset_head_for_write(table_key, &dataset_uri, entry.table_branch.as_deref())
|
||||
.await?;
|
||||
let staged = db.table_store.stage_overwrite(&existing, batch).await?;
|
||||
db.table_store
|
||||
.commit_staged(Arc::new(existing), staged.transaction)
|
||||
.await?
|
||||
};
|
||||
let dataset_uri = db.storage().dataset_uri(&entry.table_path);
|
||||
// Pass `entry.table_branch.as_deref()` (not `None`) for
|
||||
// consistency with the indexed_tables block below. Schema
|
||||
// apply runs under `__schema_apply_lock__` which today rejects
|
||||
// non-main branches, so `entry.table_branch` is expected to be
|
||||
// `None`. But the defensive passthrough means a future relaxation
|
||||
// of the lock-check can't quietly open the wrong HEAD here.
|
||||
let existing = db
|
||||
.storage()
|
||||
.open_dataset_head_for_write(table_key, &dataset_uri, entry.table_branch.as_deref())
|
||||
.await?;
|
||||
let staged = db.storage().stage_overwrite(&existing, batch).await?;
|
||||
let mut target_ds = db.storage().commit_staged(existing, staged).await?;
|
||||
db.build_indices_on_dataset_for_catalog(&desired_catalog, table_key, &mut target_ds)
|
||||
.await?;
|
||||
let state = db.table_store.table_state(&dataset_uri, &target_ds).await?;
|
||||
let state = db.storage().table_state(&dataset_uri, &target_ds).await?;
|
||||
table_updates.insert(
|
||||
table_key.clone(),
|
||||
crate::db::SubTableUpdate {
|
||||
|
|
@ -611,16 +604,16 @@ where
|
|||
))
|
||||
})?;
|
||||
ensure_snapshot_entry_head_matches(db, entry).await?;
|
||||
let dataset_uri = db.table_store.dataset_uri(&entry.table_path);
|
||||
let dataset_uri = db.storage().dataset_uri(&entry.table_path);
|
||||
let mut ds = db
|
||||
.table_store
|
||||
.storage()
|
||||
.open_dataset_head_for_write(table_key, &dataset_uri, entry.table_branch.as_deref())
|
||||
.await?;
|
||||
db.table_store
|
||||
db.storage()
|
||||
.ensure_expected_version(&ds, table_key, entry.table_version)?;
|
||||
db.build_indices_on_dataset_for_catalog(&desired_catalog, table_key, &mut ds)
|
||||
.await?;
|
||||
let state = db.table_store.table_state(&dataset_uri, &ds).await?;
|
||||
let state = db.storage().table_state(&dataset_uri, &ds).await?;
|
||||
table_updates.insert(
|
||||
table_key.clone(),
|
||||
crate::db::SubTableUpdate {
|
||||
|
|
@ -869,22 +862,22 @@ pub(super) async fn ensure_snapshot_entry_head_matches(
|
|||
db: &Omnigraph,
|
||||
entry: &SubTableEntry,
|
||||
) -> Result<()> {
|
||||
let dataset_uri = db.table_store.dataset_uri(&entry.table_path);
|
||||
let dataset_uri = db.storage().dataset_uri(&entry.table_path);
|
||||
let ds = db
|
||||
.table_store
|
||||
.storage()
|
||||
.open_dataset_head_for_write(
|
||||
&entry.table_key,
|
||||
&dataset_uri,
|
||||
entry.table_branch.as_deref(),
|
||||
)
|
||||
.await?;
|
||||
db.table_store
|
||||
db.storage()
|
||||
.ensure_expected_version(&ds, &entry.table_key, entry.table_version)
|
||||
}
|
||||
|
||||
pub(super) async fn batch_for_schema_apply_rewrite(
|
||||
db: &Omnigraph,
|
||||
source_ds: &Dataset,
|
||||
source_ds: &SnapshotHandle,
|
||||
source_table_key: &str,
|
||||
source_catalog: &Catalog,
|
||||
target_table_key: &str,
|
||||
|
|
@ -896,11 +889,11 @@ pub(super) async fn batch_for_schema_apply_rewrite(
|
|||
let target_blob_properties = blob_properties_for_table_key(target_catalog, target_table_key)?;
|
||||
let needs_row_ids = !source_blob_properties.is_empty() || !target_blob_properties.is_empty();
|
||||
let batches = if needs_row_ids {
|
||||
db.table_store()
|
||||
.scan_with(source_ds, None, None, None, true, |_| Ok(()))
|
||||
db.storage()
|
||||
.scan_with_row_id(source_ds, None, None, None, true)
|
||||
.await?
|
||||
} else {
|
||||
db.table_store().scan_batches(source_ds).await?
|
||||
db.storage().scan_batches(source_ds).await?
|
||||
};
|
||||
if batches.is_empty() {
|
||||
return Ok(RecordBatch::new_empty(target_schema));
|
||||
|
|
@ -970,7 +963,7 @@ pub(super) async fn batch_for_schema_apply_rewrite(
|
|||
|
||||
async fn rebuild_blob_column(
|
||||
_db: &Omnigraph,
|
||||
source_ds: &Dataset,
|
||||
source_ds: &SnapshotHandle,
|
||||
column_name: &str,
|
||||
descriptions: &StructArray,
|
||||
row_ids: &[u64],
|
||||
|
|
@ -990,7 +983,7 @@ async fn rebuild_blob_column(
|
|||
let blob_files = if non_null_row_ids.is_empty() {
|
||||
Vec::new()
|
||||
} else {
|
||||
Arc::new(source_ds.clone())
|
||||
Arc::new(source_ds.dataset().clone())
|
||||
.take_blobs(&non_null_row_ids, column_name)
|
||||
.await
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?
|
||||
|
|
|
|||
|
|
@ -50,10 +50,10 @@ pub(super) async fn failpoint_publish_table_head_without_index_rebuild_for_test(
|
|||
.ok_or_else(|| OmniError::manifest(format!("no manifest entry for {}", table_key)))?;
|
||||
let full_path = format!("{}/{}", db.root_uri, entry.table_path);
|
||||
let ds = db
|
||||
.table_store
|
||||
.storage()
|
||||
.open_dataset_head_for_write(table_key, &full_path, table_branch)
|
||||
.await?;
|
||||
let state = db.table_store.table_state(&full_path, &ds).await?;
|
||||
let state = db.storage().table_state(&full_path, &ds).await?;
|
||||
let update = crate::db::SubTableUpdate {
|
||||
table_key: table_key.to_string(),
|
||||
table_version: state.version,
|
||||
|
|
@ -209,18 +209,18 @@ pub(super) async fn ensure_indices_for_branch(db: &Omnigraph, branch: Option<&st
|
|||
}
|
||||
},
|
||||
None => (
|
||||
db.table_store
|
||||
db.storage()
|
||||
.open_dataset_head_for_write(&table_key, &full_path, None)
|
||||
.await?,
|
||||
None,
|
||||
),
|
||||
};
|
||||
let row_count = db.table_store.count_rows(&ds, None).await.unwrap_or(0);
|
||||
let row_count = db.storage().count_rows(&ds, None).await.unwrap_or(0);
|
||||
if row_count > 0 {
|
||||
build_indices_on_dataset(db, &table_key, &mut ds).await?;
|
||||
}
|
||||
|
||||
let state = db.table_store.table_state(&full_path, &ds).await?;
|
||||
let state = db.storage().table_state(&full_path, &ds).await?;
|
||||
if state.version != entry.table_version
|
||||
|| resolved_branch.as_deref() != entry.table_branch.as_deref()
|
||||
{
|
||||
|
|
@ -257,18 +257,18 @@ pub(super) async fn ensure_indices_for_branch(db: &Omnigraph, branch: Option<&st
|
|||
}
|
||||
},
|
||||
None => (
|
||||
db.table_store
|
||||
db.storage()
|
||||
.open_dataset_head_for_write(&table_key, &full_path, None)
|
||||
.await?,
|
||||
None,
|
||||
),
|
||||
};
|
||||
let row_count = db.table_store.count_rows(&ds, None).await.unwrap_or(0);
|
||||
let row_count = db.storage().count_rows(&ds, None).await.unwrap_or(0);
|
||||
if row_count > 0 {
|
||||
build_indices_on_dataset(db, &table_key, &mut ds).await?;
|
||||
}
|
||||
|
||||
let state = db.table_store.table_state(&full_path, &ds).await?;
|
||||
let state = db.storage().table_state(&full_path, &ds).await?;
|
||||
if state.version != entry.table_version
|
||||
|| resolved_branch.as_deref() != entry.table_branch.as_deref()
|
||||
{
|
||||
|
|
@ -331,7 +331,7 @@ async fn needs_index_work_node(
|
|||
table_branch: Option<&str>,
|
||||
) -> Result<bool> {
|
||||
let ds = db
|
||||
.table_store
|
||||
.storage()
|
||||
.open_dataset_head_for_write(table_key, full_path, table_branch)
|
||||
.await?;
|
||||
// Empty tables are skipped by the ensure_indices loop, so they must
|
||||
|
|
@ -341,10 +341,10 @@ async fn needs_index_work_node(
|
|||
// Errors from count_rows are propagated: silently treating them as
|
||||
// "0 rows" risks skipping a table that is actually about to be
|
||||
// modified.
|
||||
if db.table_store.count_rows(&ds, None).await? == 0 {
|
||||
if db.storage().count_rows(&ds, None).await? == 0 {
|
||||
return Ok(false);
|
||||
}
|
||||
if !db.table_store.has_btree_index(&ds, "id").await? {
|
||||
if !db.storage().has_btree_index(&ds, "id").await? {
|
||||
return Ok(true);
|
||||
}
|
||||
let catalog = db.catalog();
|
||||
|
|
@ -360,11 +360,11 @@ async fn needs_index_work_node(
|
|||
continue;
|
||||
};
|
||||
if matches!(prop_type.scalar, ScalarType::String) && !prop_type.list {
|
||||
if !db.table_store.has_fts_index(&ds, prop_name).await? {
|
||||
if !db.storage().has_fts_index(&ds, prop_name).await? {
|
||||
return Ok(true);
|
||||
}
|
||||
} else if matches!(prop_type.scalar, ScalarType::Vector(_)) && !prop_type.list {
|
||||
if !db.table_store.has_vector_index(&ds, prop_name).await? {
|
||||
if !db.storage().has_vector_index(&ds, prop_name).await? {
|
||||
return Ok(true);
|
||||
}
|
||||
}
|
||||
|
|
@ -389,22 +389,22 @@ async fn needs_index_work_edge(
|
|||
table_branch: Option<&str>,
|
||||
) -> Result<bool> {
|
||||
let ds = db
|
||||
.table_store
|
||||
.storage()
|
||||
.open_dataset_head_for_write(table_key, full_path, table_branch)
|
||||
.await?;
|
||||
if db.table_store.count_rows(&ds, None).await? == 0 {
|
||||
if db.storage().count_rows(&ds, None).await? == 0 {
|
||||
return Ok(false);
|
||||
}
|
||||
Ok(!db.table_store.has_btree_index(&ds, "id").await?
|
||||
|| !db.table_store.has_btree_index(&ds, "src").await?
|
||||
|| !db.table_store.has_btree_index(&ds, "dst").await?)
|
||||
Ok(!db.storage().has_btree_index(&ds, "id").await?
|
||||
|| !db.storage().has_btree_index(&ds, "src").await?
|
||||
|| !db.storage().has_btree_index(&ds, "dst").await?)
|
||||
}
|
||||
|
||||
pub(super) async fn open_for_mutation(
|
||||
db: &Omnigraph,
|
||||
table_key: &str,
|
||||
op_kind: crate::db::MutationOpKind,
|
||||
) -> Result<(Dataset, String, Option<String>)> {
|
||||
) -> Result<(SnapshotHandle, String, Option<String>)> {
|
||||
let current_branch = db
|
||||
.coordinator
|
||||
.read()
|
||||
|
|
@ -425,7 +425,7 @@ pub(super) async fn open_for_mutation_on_branch(
|
|||
branch: Option<&str>,
|
||||
table_key: &str,
|
||||
op_kind: crate::db::MutationOpKind,
|
||||
) -> Result<(Dataset, String, Option<String>)> {
|
||||
) -> Result<(SnapshotHandle, String, Option<String>)> {
|
||||
db.ensure_schema_apply_not_locked("write").await?;
|
||||
let resolved = db.resolved_branch_target(branch).await?;
|
||||
let entry = resolved
|
||||
|
|
@ -436,11 +436,11 @@ pub(super) async fn open_for_mutation_on_branch(
|
|||
match resolved.branch.as_deref() {
|
||||
None => {
|
||||
let ds = db
|
||||
.table_store
|
||||
.storage()
|
||||
.open_dataset_head_for_write(table_key, &full_path, None)
|
||||
.await?;
|
||||
if op_kind.strict_pre_stage_version_check() {
|
||||
db.table_store
|
||||
db.storage()
|
||||
.ensure_expected_version(&ds, table_key, entry.table_version)?;
|
||||
}
|
||||
Ok((ds, full_path, None))
|
||||
|
|
@ -469,15 +469,15 @@ pub(super) async fn open_owned_dataset_for_branch_write(
|
|||
entry_version: u64,
|
||||
active_branch: &str,
|
||||
op_kind: crate::db::MutationOpKind,
|
||||
) -> Result<(Dataset, Option<String>)> {
|
||||
) -> Result<(SnapshotHandle, Option<String>)> {
|
||||
match entry_branch {
|
||||
Some(branch) if branch == active_branch => {
|
||||
let ds = db
|
||||
.table_store
|
||||
.storage()
|
||||
.open_dataset_head_for_write(table_key, full_path, Some(active_branch))
|
||||
.await?;
|
||||
if op_kind.strict_pre_stage_version_check() {
|
||||
db.table_store
|
||||
db.storage()
|
||||
.ensure_expected_version(&ds, table_key, entry_version)?;
|
||||
}
|
||||
Ok((ds, Some(active_branch.to_string())))
|
||||
|
|
@ -509,11 +509,11 @@ pub(super) async fn open_owned_dataset_for_branch_write(
|
|||
)
|
||||
.await?;
|
||||
let ds = db
|
||||
.table_store
|
||||
.storage()
|
||||
.open_dataset_head_for_write(table_key, full_path, Some(active_branch))
|
||||
.await?;
|
||||
if op_kind.strict_pre_stage_version_check() {
|
||||
db.table_store
|
||||
db.storage()
|
||||
.ensure_expected_version(&ds, table_key, entry_version)?;
|
||||
}
|
||||
Ok((ds, Some(active_branch.to_string())))
|
||||
|
|
@ -528,8 +528,8 @@ pub(super) async fn fork_dataset_from_entry_state(
|
|||
source_branch: Option<&str>,
|
||||
source_version: u64,
|
||||
active_branch: &str,
|
||||
) -> Result<Dataset> {
|
||||
db.table_store
|
||||
) -> Result<SnapshotHandle> {
|
||||
db.storage()
|
||||
.fork_branch_from_state(
|
||||
full_path,
|
||||
source_branch,
|
||||
|
|
@ -547,10 +547,10 @@ pub(super) async fn reopen_for_mutation(
|
|||
table_branch: Option<&str>,
|
||||
expected_version: u64,
|
||||
op_kind: crate::db::MutationOpKind,
|
||||
) -> Result<Dataset> {
|
||||
) -> Result<SnapshotHandle> {
|
||||
db.ensure_schema_apply_not_locked("write").await?;
|
||||
if op_kind.strict_pre_stage_version_check() {
|
||||
db.table_store
|
||||
db.storage()
|
||||
.reopen_for_mutation(full_path, table_branch, table_key, expected_version)
|
||||
.await
|
||||
} else {
|
||||
|
|
@ -563,7 +563,7 @@ pub(super) async fn reopen_for_mutation(
|
|||
// genuine cross-process drift as 409. See
|
||||
// [`crate::db::MutationOpKind`] for the policy rationale.
|
||||
let _ = expected_version;
|
||||
db.table_store
|
||||
db.storage()
|
||||
.open_dataset_head_for_write(table_key, full_path, table_branch)
|
||||
.await
|
||||
}
|
||||
|
|
@ -574,8 +574,8 @@ pub(super) async fn open_dataset_at_state(
|
|||
table_path: &str,
|
||||
table_branch: Option<&str>,
|
||||
table_version: u64,
|
||||
) -> Result<Dataset> {
|
||||
db.table_store
|
||||
) -> Result<SnapshotHandle> {
|
||||
db.storage()
|
||||
.open_dataset_at_state(table_path, table_branch, table_version)
|
||||
.await
|
||||
}
|
||||
|
|
@ -583,7 +583,7 @@ pub(super) async fn open_dataset_at_state(
|
|||
pub(super) async fn build_indices_on_dataset(
|
||||
db: &Omnigraph,
|
||||
table_key: &str,
|
||||
ds: &mut Dataset,
|
||||
ds: &mut SnapshotHandle,
|
||||
) -> Result<()> {
|
||||
let catalog = db.catalog();
|
||||
build_indices_on_dataset_for_catalog(db, &catalog, table_key, ds).await
|
||||
|
|
@ -593,10 +593,10 @@ pub(super) async fn build_indices_on_dataset_for_catalog(
|
|||
db: &Omnigraph,
|
||||
catalog: &Catalog,
|
||||
table_key: &str,
|
||||
ds: &mut Dataset,
|
||||
ds: &mut SnapshotHandle,
|
||||
) -> Result<()> {
|
||||
if let Some(type_name) = table_key.strip_prefix("node:") {
|
||||
if !db.table_store.has_btree_index(ds, "id").await? {
|
||||
if !db.storage().has_btree_index(ds, "id").await? {
|
||||
stage_and_commit_btree(db, table_key, ds, &["id"]).await?;
|
||||
}
|
||||
|
||||
|
|
@ -616,19 +616,20 @@ pub(super) async fn build_indices_on_dataset_for_catalog(
|
|||
let prop_name = &index_cols[0];
|
||||
if let Some(prop_type) = node_type.properties.get(prop_name) {
|
||||
if matches!(prop_type.scalar, ScalarType::String) && !prop_type.list {
|
||||
if !db.table_store.has_fts_index(ds, prop_name).await? {
|
||||
if !db.storage().has_fts_index(ds, prop_name).await? {
|
||||
stage_and_commit_inverted(db, table_key, ds, prop_name.as_str())
|
||||
.await?;
|
||||
}
|
||||
} else if matches!(prop_type.scalar, ScalarType::Vector(_)) && !prop_type.list {
|
||||
if !db.table_store.has_vector_index(ds, prop_name).await? {
|
||||
// Inline-commit residual: lance-4.0.0 does not
|
||||
if !db.storage().has_vector_index(ds, prop_name).await? {
|
||||
// Inline-commit residual: lance-6.0.1 does not
|
||||
// expose `build_index_metadata_from_segments` as
|
||||
// `pub`, so vector indices cannot be staged from
|
||||
// outside the lance crate. Document at the call
|
||||
// site; companion ticket to lance-format/lance#6658.
|
||||
db.table_store
|
||||
.create_vector_index(ds, prop_name.as_str())
|
||||
let new_snap = db
|
||||
.storage_inline_residual()
|
||||
.create_vector_index(ds.clone(), prop_name.as_str())
|
||||
.await
|
||||
.map_err(|e| {
|
||||
OmniError::Lance(format!(
|
||||
|
|
@ -636,6 +637,7 @@ pub(super) async fn build_indices_on_dataset_for_catalog(
|
|||
table_key, prop_name, e
|
||||
))
|
||||
})?;
|
||||
*ds = new_snap;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -645,13 +647,13 @@ pub(super) async fn build_indices_on_dataset_for_catalog(
|
|||
}
|
||||
|
||||
if table_key.starts_with("edge:") {
|
||||
if !db.table_store.has_btree_index(ds, "id").await? {
|
||||
if !db.storage().has_btree_index(ds, "id").await? {
|
||||
stage_and_commit_btree(db, table_key, ds, &["id"]).await?;
|
||||
}
|
||||
if !db.table_store.has_btree_index(ds, "src").await? {
|
||||
if !db.storage().has_btree_index(ds, "src").await? {
|
||||
stage_and_commit_btree(db, table_key, ds, &["src"]).await?;
|
||||
}
|
||||
if !db.table_store.has_btree_index(ds, "dst").await? {
|
||||
if !db.storage().has_btree_index(ds, "dst").await? {
|
||||
stage_and_commit_btree(db, table_key, ds, &["dst"]).await?;
|
||||
}
|
||||
return Ok(());
|
||||
|
|
@ -674,11 +676,11 @@ pub(super) async fn build_indices_on_dataset_for_catalog(
|
|||
async fn stage_and_commit_btree(
|
||||
db: &Omnigraph,
|
||||
table_key: &str,
|
||||
ds: &mut Dataset,
|
||||
ds: &mut SnapshotHandle,
|
||||
columns: &[&str],
|
||||
) -> Result<()> {
|
||||
let staged = db
|
||||
.table_store
|
||||
.storage()
|
||||
.stage_create_btree_index(ds, columns)
|
||||
.await
|
||||
.map_err(|e| {
|
||||
|
|
@ -693,8 +695,8 @@ async fn stage_and_commit_btree(
|
|||
// yet called) leaves no Lance-HEAD drift on the touched table.
|
||||
crate::failpoints::maybe_fail("ensure_indices.post_stage_pre_commit_btree")?;
|
||||
let new_ds = db
|
||||
.table_store
|
||||
.commit_staged(Arc::new(ds.clone()), staged.transaction)
|
||||
.storage()
|
||||
.commit_staged(ds.clone(), staged)
|
||||
.await
|
||||
.map_err(|e| {
|
||||
OmniError::Lance(format!(
|
||||
|
|
@ -711,11 +713,11 @@ async fn stage_and_commit_btree(
|
|||
async fn stage_and_commit_inverted(
|
||||
db: &Omnigraph,
|
||||
table_key: &str,
|
||||
ds: &mut Dataset,
|
||||
ds: &mut SnapshotHandle,
|
||||
column: &str,
|
||||
) -> Result<()> {
|
||||
let staged = db
|
||||
.table_store
|
||||
.storage()
|
||||
.stage_create_inverted_index(ds, column)
|
||||
.await
|
||||
.map_err(|e| {
|
||||
|
|
@ -725,8 +727,8 @@ async fn stage_and_commit_inverted(
|
|||
))
|
||||
})?;
|
||||
let new_ds = db
|
||||
.table_store
|
||||
.commit_staged(Arc::new(ds.clone()), staged.transaction)
|
||||
.storage()
|
||||
.commit_staged(ds.clone(), staged)
|
||||
.await
|
||||
.map_err(|e| {
|
||||
OmniError::Lance(format!(
|
||||
|
|
@ -777,7 +779,7 @@ async fn prepare_updates_for_commit(
|
|||
)
|
||||
.await?;
|
||||
build_indices_on_dataset(db, &prepared_update.table_key, &mut ds).await?;
|
||||
let state = db.table_store.table_state(&full_path, &ds).await?;
|
||||
let state = db.storage().table_state(&full_path, &ds).await?;
|
||||
prepared_update.table_version = state.version;
|
||||
prepared_update.row_count = state.row_count;
|
||||
prepared_update.version_metadata = state.version_metadata;
|
||||
|
|
|
|||
|
|
@ -926,7 +926,7 @@ async fn publish_adopted_source_state(
|
|||
target_branch,
|
||||
)
|
||||
.await?;
|
||||
let state = target_db.table_store().table_state(&full_path, &ds).await?;
|
||||
let state = target_db.storage().table_state(&full_path, &ds).await?;
|
||||
Ok(crate::db::SubTableUpdate {
|
||||
table_key: table_key.to_string(),
|
||||
table_version: state.version,
|
||||
|
|
@ -963,9 +963,13 @@ async fn publish_rewritten_merge_table(
|
|||
// commit point, narrowed from the previous "merge_insert + delete +
|
||||
// index" multi-step inline-commit chain.
|
||||
if let Some(delta) = &staged.delta_staged {
|
||||
// The staged delta dataset is a temp-dir Lance dataset used only
|
||||
// to collect the rewrite batches; wrap it in a `SnapshotHandle`
|
||||
// so we can route through the trait's `scan_batches_for_rewrite`.
|
||||
let delta_snapshot = SnapshotHandle::new(delta.dataset.clone());
|
||||
let batches: Vec<RecordBatch> = target_db
|
||||
.table_store()
|
||||
.scan_batches_for_rewrite(&delta.dataset)
|
||||
.storage()
|
||||
.scan_batches_for_rewrite(&delta_snapshot)
|
||||
.await?
|
||||
.into_iter()
|
||||
.filter(|batch| batch.num_rows() > 0)
|
||||
|
|
@ -980,7 +984,7 @@ async fn publish_rewritten_merge_table(
|
|||
.map_err(|e| OmniError::Lance(e.to_string()))?
|
||||
};
|
||||
let staged_merge = target_db
|
||||
.table_store()
|
||||
.storage()
|
||||
.stage_merge_insert(
|
||||
current_ds.clone(),
|
||||
combined,
|
||||
|
|
@ -990,15 +994,15 @@ async fn publish_rewritten_merge_table(
|
|||
)
|
||||
.await?;
|
||||
current_ds = target_db
|
||||
.table_store()
|
||||
.commit_staged(Arc::new(current_ds), staged_merge.transaction)
|
||||
.storage()
|
||||
.commit_staged(current_ds, staged_merge)
|
||||
.await?;
|
||||
}
|
||||
}
|
||||
|
||||
// Phase 2: delete removed rows via deletion vectors.
|
||||
//
|
||||
// INLINE-COMMIT RESIDUAL: lance-4.0.0 does not expose a public
|
||||
// INLINE-COMMIT RESIDUAL: lance-6.0.1 does not expose a public
|
||||
// two-phase delete API (DeleteJob is `pub(crate)` —
|
||||
// lance-format/lance#6658 is open with no PRs). We deliberately do
|
||||
// NOT introduce a `stage_delete` wrapper that would secretly
|
||||
|
|
@ -1012,10 +1016,11 @@ async fn publish_rewritten_merge_table(
|
|||
.map(|id| format!("'{}'", id.replace('\'', "''")))
|
||||
.collect();
|
||||
let filter = format!("id IN ({})", escaped.join(", "));
|
||||
target_db
|
||||
.table_store()
|
||||
.delete_where(&full_path, &mut current_ds, &filter)
|
||||
let (new_ds, _) = target_db
|
||||
.storage_inline_residual()
|
||||
.delete_where(&full_path, current_ds, &filter)
|
||||
.await?;
|
||||
current_ds = new_ds;
|
||||
}
|
||||
|
||||
// Phase 3: rebuild indices.
|
||||
|
|
@ -1024,9 +1029,9 @@ async fn publish_rewritten_merge_table(
|
|||
// `stage_create_inverted_index` + `commit_staged` for scalar
|
||||
// indices. Vector indices remain inline-commit
|
||||
// (`build_index_metadata_from_segments` is `pub(crate)` in lance-
|
||||
// 4.0.0 — companion ticket to lance-format/lance#6658).
|
||||
// 6.0.1 — companion ticket to lance-format/lance#6666).
|
||||
let row_count = target_db
|
||||
.table_store()
|
||||
.storage()
|
||||
.table_state(&full_path, ¤t_ds)
|
||||
.await?
|
||||
.row_count;
|
||||
|
|
@ -1036,7 +1041,7 @@ async fn publish_rewritten_merge_table(
|
|||
.await?;
|
||||
}
|
||||
let final_state = target_db
|
||||
.table_store()
|
||||
.storage()
|
||||
.table_state(&full_path, ¤t_ds)
|
||||
.await?;
|
||||
|
||||
|
|
@ -1362,7 +1367,7 @@ impl Omnigraph {
|
|||
let entry = target_snapshot.entry(table_key)?;
|
||||
Some(crate::db::manifest::SidecarTablePin {
|
||||
table_key: table_key.clone(),
|
||||
table_path: self.table_store().dataset_uri(&entry.table_path),
|
||||
table_path: self.storage().dataset_uri(&entry.table_path),
|
||||
expected_version: entry.table_version,
|
||||
post_commit_pin: entry.table_version + 1,
|
||||
// Use the merge target branch (where commits actually
|
||||
|
|
|
|||
|
|
@ -40,6 +40,7 @@ use crate::db::{ReadTarget, Snapshot};
|
|||
use crate::embedding::EmbeddingClient;
|
||||
use crate::error::{MergeConflict, MergeConflictKind, OmniError, Result};
|
||||
use crate::graph_index::GraphIndex;
|
||||
use crate::storage_layer::SnapshotHandle;
|
||||
use tempfile::{Builder as TempDirBuilder, TempDir};
|
||||
|
||||
mod merge;
|
||||
|
|
|
|||
|
|
@ -428,12 +428,11 @@ async fn ensure_node_id_exists(
|
|||
|
||||
let filter = format!("id = '{}'", id.replace('\'', "''"));
|
||||
let snapshot = db.snapshot_for_branch(branch).await?;
|
||||
let ds = snapshot.open(&table_key).await?;
|
||||
let exists = ds
|
||||
.count_rows(Some(filter))
|
||||
.await
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?
|
||||
> 0;
|
||||
let ds = db
|
||||
.storage()
|
||||
.open_snapshot_at_table(&snapshot, &table_key)
|
||||
.await?;
|
||||
let exists = db.storage().count_rows(&ds, Some(filter)).await? > 0;
|
||||
|
||||
if exists {
|
||||
Ok(())
|
||||
|
|
@ -602,7 +601,7 @@ async fn open_table_for_mutation(
|
|||
branch: Option<&str>,
|
||||
table_key: &str,
|
||||
op_kind: crate::db::MutationOpKind,
|
||||
) -> Result<(Dataset, String, Option<String>)> {
|
||||
) -> Result<(SnapshotHandle, String, Option<String>)> {
|
||||
if let Some(prior) = staging.inline_committed.get(table_key) {
|
||||
let path = staging.paths.get(table_key).ok_or_else(|| {
|
||||
OmniError::manifest_internal(format!(
|
||||
|
|
@ -624,7 +623,7 @@ async fn open_table_for_mutation(
|
|||
let (ds, full_path, table_branch) = db
|
||||
.open_for_mutation_on_branch(branch, table_key, op_kind)
|
||||
.await?;
|
||||
let expected_version = ds.version().version;
|
||||
let expected_version = ds.version();
|
||||
staging.ensure_path(
|
||||
table_key,
|
||||
full_path.clone(),
|
||||
|
|
@ -640,7 +639,7 @@ async fn open_table_for_mutation(
|
|||
///
|
||||
/// Reason: under the staged-write writer, inserts and updates
|
||||
/// accumulate in memory and commit at end-of-query, while deletes still
|
||||
/// inline-commit (Lance lacks a public two-phase delete in 4.0.0).
|
||||
/// inline-commit (Lance lacks a public two-phase delete in 6.0.1).
|
||||
/// Mixing creates ordering hazards (same-row insert→delete becomes a no-op
|
||||
/// because the staged insert isn't visible to delete; cascading deletes
|
||||
/// of just-inserted edges break referential integrity by silent design).
|
||||
|
|
@ -1056,7 +1055,7 @@ impl Omnigraph {
|
|||
// and a chained `update where <pred>` can match a row whose
|
||||
// pending value no longer satisfies <pred>.
|
||||
let batches = self
|
||||
.table_store()
|
||||
.storage()
|
||||
.scan_with_pending(
|
||||
&ds,
|
||||
pending_batches,
|
||||
|
|
@ -1154,13 +1153,13 @@ impl Omnigraph {
|
|||
crate::db::MutationOpKind::Delete,
|
||||
)
|
||||
.await?;
|
||||
let initial_version = ds.version().version;
|
||||
let initial_version = ds.version();
|
||||
|
||||
// Scan matching IDs for cascade. Per D₂ this never overlaps with
|
||||
// staged inserts (mixed insert/delete in one query is rejected at
|
||||
// parse time), so we scan committed only.
|
||||
let batches = self
|
||||
.table_store()
|
||||
.storage()
|
||||
.scan(&ds, Some(&["id"]), Some(&pred_sql), None)
|
||||
.await?;
|
||||
|
||||
|
|
@ -1188,11 +1187,11 @@ impl Omnigraph {
|
|||
let affected_nodes = deleted_ids.len();
|
||||
|
||||
// Delete nodes — still inline-commit (Lance's `Dataset::delete` is
|
||||
// not exposed as a two-phase op in 4.0.0). D₂ keeps inserts and
|
||||
// not exposed as a two-phase op in 6.0.1). D₂ keeps inserts and
|
||||
// deletes from coexisting in one query, so this advance of Lance
|
||||
// HEAD is the only HEAD movement during the query and the
|
||||
// publisher's CAS captures it intact.
|
||||
let mut ds = self
|
||||
let ds = self
|
||||
.reopen_for_mutation(
|
||||
&table_key,
|
||||
&full_path,
|
||||
|
|
@ -1202,9 +1201,9 @@ impl Omnigraph {
|
|||
)
|
||||
.await?;
|
||||
crate::failpoints::maybe_fail("mutation.delete_node_pre_primary_delete")?;
|
||||
let delete_state = self
|
||||
.table_store()
|
||||
.delete_where(&full_path, &mut ds, &pred_sql)
|
||||
let (_new_ds, delete_state) = self
|
||||
.storage_inline_residual()
|
||||
.delete_where(&full_path, ds, &pred_sql)
|
||||
.await?;
|
||||
|
||||
staging.record_inline(crate::db::SubTableUpdate {
|
||||
|
|
@ -1243,7 +1242,7 @@ impl Omnigraph {
|
|||
|
||||
let edge_table_key = format!("edge:{}", edge_name);
|
||||
let cascade_filter = cascade_filters.join(" OR ");
|
||||
let (mut edge_ds, edge_full_path, edge_table_branch) = open_table_for_mutation(
|
||||
let (edge_ds, edge_full_path, edge_table_branch) = open_table_for_mutation(
|
||||
self,
|
||||
staging,
|
||||
branch,
|
||||
|
|
@ -1252,9 +1251,9 @@ impl Omnigraph {
|
|||
)
|
||||
.await?;
|
||||
|
||||
let edge_delete = self
|
||||
.table_store()
|
||||
.delete_where(&edge_full_path, &mut edge_ds, &cascade_filter)
|
||||
let (_new_edge_ds, edge_delete) = self
|
||||
.storage_inline_residual()
|
||||
.delete_where(&edge_full_path, edge_ds, &cascade_filter)
|
||||
.await?;
|
||||
|
||||
affected_edges += edge_delete.deleted_rows;
|
||||
|
|
@ -1291,7 +1290,7 @@ impl Omnigraph {
|
|||
let pred_sql = predicate_to_sql(predicate, params, true)?;
|
||||
|
||||
let table_key = format!("edge:{}", type_name);
|
||||
let (mut ds, full_path, table_branch) = open_table_for_mutation(
|
||||
let (ds, full_path, table_branch) = open_table_for_mutation(
|
||||
self,
|
||||
staging,
|
||||
branch,
|
||||
|
|
@ -1300,9 +1299,9 @@ impl Omnigraph {
|
|||
)
|
||||
.await?;
|
||||
|
||||
let delete_state = self
|
||||
.table_store()
|
||||
.delete_where(&full_path, &mut ds, &pred_sql)
|
||||
let (_new_ds, delete_state) = self
|
||||
.storage_inline_residual()
|
||||
.delete_where(&full_path, ds, &pred_sql)
|
||||
.await?;
|
||||
let affected = delete_state.deleted_rows;
|
||||
|
||||
|
|
@ -1356,7 +1355,7 @@ fn concat_match_batches_to_schema(
|
|||
/// dedup needed (`dedupe_key_column = None`).
|
||||
async fn validate_edge_cardinality_with_pending(
|
||||
db: &Omnigraph,
|
||||
committed_ds: &Dataset,
|
||||
committed_ds: &SnapshotHandle,
|
||||
staging: &MutationStaging,
|
||||
table_key: &str,
|
||||
edge_type: &omnigraph_compiler::catalog::EdgeType,
|
||||
|
|
|
|||
|
|
@ -21,9 +21,10 @@
|
|||
use std::collections::{HashMap, HashSet};
|
||||
use std::sync::Arc;
|
||||
|
||||
use crate::storage_layer::{SnapshotHandle, StagedHandle};
|
||||
use arrow_array::{Array, RecordBatch, StringArray, UInt32Array};
|
||||
use arrow_schema::SchemaRef;
|
||||
use lance::Dataset;
|
||||
use futures::stream::StreamExt;
|
||||
use omnigraph_compiler::catalog::EdgeType;
|
||||
|
||||
use crate::db::manifest::{
|
||||
|
|
@ -32,15 +33,13 @@ use crate::db::manifest::{
|
|||
use crate::db::{MutationOpKind, SubTableUpdate};
|
||||
use crate::error::{OmniError, Result};
|
||||
|
||||
/// Whether the per-table accumulator should commit via `stage_append`
|
||||
/// (no @key inserts, edge inserts) or `stage_merge_insert` (any @key insert
|
||||
/// or update). Once set to `Merge` for a table within a query, subsequent
|
||||
/// inserts on that table are rolled into the same merge — a `WhenNotMatched
|
||||
/// = InsertAll` merge is correct for both cases.
|
||||
/// Whether the per-table accumulator should commit via `stage_append`,
|
||||
/// `stage_merge_insert`, or `stage_overwrite`.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub(crate) enum PendingMode {
|
||||
Append,
|
||||
Merge,
|
||||
Overwrite,
|
||||
}
|
||||
|
||||
/// Per-table accumulator. Each insert/update op pushes a `RecordBatch` into
|
||||
|
|
@ -158,9 +157,9 @@ impl MutationStaging {
|
|||
mode: PendingMode,
|
||||
batch: RecordBatch,
|
||||
) -> Result<()> {
|
||||
if batch.num_rows() == 0 {
|
||||
// No-op — staging is purely additive; an empty batch should not
|
||||
// be appended.
|
||||
if batch.num_rows() == 0 && mode != PendingMode::Overwrite {
|
||||
// No-op for additive modes. For Overwrite, an empty batch is
|
||||
// observable: it means "replace this table with zero rows".
|
||||
return Ok(());
|
||||
}
|
||||
// If we've already accumulated a batch on this table, the new
|
||||
|
|
@ -174,6 +173,14 @@ impl MutationStaging {
|
|||
// caller a clearer point of failure attached to the specific
|
||||
// op that introduced the drift.
|
||||
if let Some(existing) = self.pending.get(table_key) {
|
||||
if existing.mode == PendingMode::Overwrite || mode == PendingMode::Overwrite {
|
||||
if existing.mode != mode {
|
||||
return Err(OmniError::manifest_internal(format!(
|
||||
"table '{}' cannot mix overwrite staging with append/merge staging",
|
||||
table_key
|
||||
)));
|
||||
}
|
||||
}
|
||||
if !schemas_compatible(&existing.schema, &batch.schema()) {
|
||||
return Err(OmniError::manifest(format!(
|
||||
"table '{}' accumulated mutation batches with mismatched schemas: \
|
||||
|
|
@ -194,8 +201,9 @@ impl MutationStaging {
|
|||
.pending
|
||||
.entry(table_key.to_string())
|
||||
.or_insert_with(|| PendingTable::new(schema.clone(), mode));
|
||||
// Upgrade Append -> Merge if any op needs merge semantics.
|
||||
if mode == PendingMode::Merge {
|
||||
// Upgrade Append -> Merge if any op needs merge semantics. Overwrite
|
||||
// is never mixed with additive modes (guarded above).
|
||||
if mode == PendingMode::Merge && entry.mode == PendingMode::Append {
|
||||
entry.mode = PendingMode::Merge;
|
||||
}
|
||||
entry.batches.push(batch);
|
||||
|
|
@ -217,6 +225,11 @@ impl MutationStaging {
|
|||
.unwrap_or(&[])
|
||||
}
|
||||
|
||||
/// Accumulator mode for `table_key`, if this query has touched it.
|
||||
pub(crate) fn pending_mode(&self, table_key: &str) -> Option<PendingMode> {
|
||||
self.pending.get(table_key).map(|p| p.mode)
|
||||
}
|
||||
|
||||
/// Schema of the accumulated batches for `table_key`, or `None` if no
|
||||
/// op has touched the table. Used by `scan_with_pending` to construct
|
||||
/// the in-memory `MemTable`.
|
||||
|
|
@ -249,9 +262,21 @@ impl MutationStaging {
|
|||
/// Lance datasets is a perf follow-up; same loop structure as the
|
||||
/// pre-split `finalize`.
|
||||
pub(crate) async fn stage_all(
|
||||
self,
|
||||
db: &crate::db::Omnigraph,
|
||||
branch: Option<&str>,
|
||||
) -> Result<StagedMutation> {
|
||||
self.stage_all_with_concurrency(db, branch, 1).await
|
||||
}
|
||||
|
||||
/// Loader-facing variant of [`stage_all`] that preserves
|
||||
/// `OMNIGRAPH_LOAD_CONCURRENCY` for the fragment-writing stage while
|
||||
/// still leaving all Lance HEAD movement to [`StagedMutation::commit_all`].
|
||||
pub(crate) async fn stage_all_with_concurrency(
|
||||
self,
|
||||
db: &crate::db::Omnigraph,
|
||||
_branch: Option<&str>,
|
||||
concurrency: usize,
|
||||
) -> Result<StagedMutation> {
|
||||
let MutationStaging {
|
||||
expected_versions,
|
||||
|
|
@ -261,7 +286,8 @@ impl MutationStaging {
|
|||
op_kinds,
|
||||
} = self;
|
||||
|
||||
let mut staged_entries: Vec<StagedTableEntry> = Vec::with_capacity(pending.len());
|
||||
let mut stage_inputs: Vec<(String, PendingTable, StagedTablePath, u64)> =
|
||||
Vec::with_capacity(pending.len());
|
||||
for (table_key, table) in pending {
|
||||
let path = paths.get(&table_key).cloned().ok_or_else(|| {
|
||||
OmniError::manifest_internal(format!(
|
||||
|
|
@ -275,77 +301,22 @@ impl MutationStaging {
|
|||
table_key
|
||||
))
|
||||
})?;
|
||||
|
||||
// Reopen the dataset for staging. The op_kind reflects the
|
||||
// accumulated PendingTable's mode: Append-mode batches are
|
||||
// INSERT-shaped (no key-based dedup at commit_staged); Merge-
|
||||
// mode batches are MERGE-shaped (key-dedup at commit_staged).
|
||||
// Both skip the strict pre-stage version check under the
|
||||
// [`MutationOpKind`] policy: Lance's natural rebase + the
|
||||
// per-(table, branch) queue + the publisher CAS in
|
||||
// `commit_all` handle drift; the strict check would
|
||||
// over-reject in-process concurrent inserts (PR 2 / MR-686
|
||||
// Phase 2).
|
||||
let stage_kind = match table.mode {
|
||||
PendingMode::Append => crate::db::MutationOpKind::Insert,
|
||||
PendingMode::Merge => crate::db::MutationOpKind::Merge,
|
||||
};
|
||||
let ds = db
|
||||
.reopen_for_mutation(
|
||||
&table_key,
|
||||
&path.full_path,
|
||||
path.table_branch.as_deref(),
|
||||
expected,
|
||||
stage_kind,
|
||||
)
|
||||
.await?;
|
||||
|
||||
if table.batches.is_empty() {
|
||||
continue;
|
||||
}
|
||||
|
||||
// For Merge mode, dedupe accumulated batches by `id`, keeping
|
||||
// the LAST occurrence (last-write-wins for the query). This
|
||||
// is required because Lance's `MergeInsertBuilder` produces
|
||||
// arbitrary results on duplicate keys in the source. Append
|
||||
// mode is exempt because no-key node and edge inserts use
|
||||
// ULID-generated ids that are unique within a query.
|
||||
let combined = match table.mode {
|
||||
PendingMode::Merge => dedupe_merge_batches_by_id(&table.schema, table.batches)?,
|
||||
PendingMode::Append => {
|
||||
if table.batches.len() == 1 {
|
||||
table.batches.into_iter().next().unwrap()
|
||||
} else {
|
||||
arrow_select::concat::concat_batches(&table.schema, &table.batches)
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
// Stage produces uncommitted fragments + transaction. No
|
||||
// Lance HEAD advance until `commit_all` runs `commit_staged`.
|
||||
let staged = match table.mode {
|
||||
PendingMode::Append => db.table_store().stage_append(&ds, combined, &[]).await?,
|
||||
PendingMode::Merge => {
|
||||
db.table_store()
|
||||
.stage_merge_insert(
|
||||
ds.clone(),
|
||||
combined,
|
||||
vec!["id".to_string()],
|
||||
lance::dataset::WhenMatched::UpdateAll,
|
||||
lance::dataset::WhenNotMatched::InsertAll,
|
||||
)
|
||||
.await?
|
||||
}
|
||||
};
|
||||
staged_entries.push(StagedTableEntry {
|
||||
table_key,
|
||||
path,
|
||||
expected_version: expected,
|
||||
dataset: ds,
|
||||
staged_write: staged,
|
||||
});
|
||||
stage_inputs.push((table_key, table, path, expected));
|
||||
}
|
||||
let concurrency = concurrency.min(stage_inputs.len()).max(1);
|
||||
let staged_entries = futures::stream::iter(stage_inputs.into_iter().map(
|
||||
|(table_key, table, path, expected)| async move {
|
||||
stage_pending_table(db, table_key, table, path, expected).await
|
||||
},
|
||||
))
|
||||
.buffered(concurrency)
|
||||
.collect::<Vec<Result<Option<StagedTableEntry>>>>()
|
||||
.await
|
||||
.into_iter()
|
||||
.collect::<Result<Vec<_>>>()?
|
||||
.into_iter()
|
||||
.flatten()
|
||||
.collect();
|
||||
|
||||
Ok(StagedMutation {
|
||||
inline_committed,
|
||||
|
|
@ -357,6 +328,73 @@ impl MutationStaging {
|
|||
}
|
||||
}
|
||||
|
||||
async fn stage_pending_table(
|
||||
db: &crate::db::Omnigraph,
|
||||
table_key: String,
|
||||
table: PendingTable,
|
||||
path: StagedTablePath,
|
||||
expected: u64,
|
||||
) -> Result<Option<StagedTableEntry>> {
|
||||
// Reopen the dataset for staging. Append/Merge can be rebased later by
|
||||
// Lance + publisher CAS; Overwrite is a strict replacement and uses the
|
||||
// same SchemaRewrite policy as schema apply.
|
||||
let stage_kind = match table.mode {
|
||||
PendingMode::Append => crate::db::MutationOpKind::Insert,
|
||||
PendingMode::Merge => crate::db::MutationOpKind::Merge,
|
||||
PendingMode::Overwrite => crate::db::MutationOpKind::SchemaRewrite,
|
||||
};
|
||||
let ds = db
|
||||
.reopen_for_mutation(
|
||||
&table_key,
|
||||
&path.full_path,
|
||||
path.table_branch.as_deref(),
|
||||
expected,
|
||||
stage_kind,
|
||||
)
|
||||
.await?;
|
||||
|
||||
if table.batches.is_empty() {
|
||||
return Ok(None);
|
||||
}
|
||||
|
||||
let combined = match table.mode {
|
||||
PendingMode::Merge => dedupe_merge_batches_by_id(&table.schema, table.batches)?,
|
||||
PendingMode::Append | PendingMode::Overwrite => {
|
||||
if table.batches.len() == 1 {
|
||||
table.batches.into_iter().next().unwrap()
|
||||
} else {
|
||||
arrow_select::concat::concat_batches(&table.schema, &table.batches)
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
// Stage produces uncommitted fragments + transaction. No Lance HEAD
|
||||
// advance until `commit_all` runs `commit_staged`.
|
||||
let staged = match table.mode {
|
||||
PendingMode::Append => db.storage().stage_append(&ds, combined, &[]).await?,
|
||||
PendingMode::Merge => {
|
||||
db.storage()
|
||||
.stage_merge_insert(
|
||||
ds.clone(),
|
||||
combined,
|
||||
vec!["id".to_string()],
|
||||
lance::dataset::WhenMatched::UpdateAll,
|
||||
lance::dataset::WhenNotMatched::InsertAll,
|
||||
)
|
||||
.await?
|
||||
}
|
||||
PendingMode::Overwrite => db.storage().stage_overwrite(&ds, combined).await?,
|
||||
};
|
||||
Ok(Some(StagedTableEntry {
|
||||
table_key,
|
||||
path,
|
||||
expected_version: expected,
|
||||
dataset: ds,
|
||||
staged_write: staged,
|
||||
}))
|
||||
}
|
||||
|
||||
/// Output of [`MutationStaging::stage_all`]. Carries the staged Lance
|
||||
/// transactions (Phase A complete; uncommitted fragments written) plus
|
||||
/// the per-table metadata needed to write the recovery sidecar, run
|
||||
|
|
@ -389,15 +427,17 @@ pub(crate) struct StagedMutation {
|
|||
}
|
||||
|
||||
/// Per-table state captured during `stage_all` and consumed by
|
||||
/// `commit_all`. Holds the opened `Dataset` so `commit_staged` doesn't
|
||||
/// re-open, and the `StagedWrite` whose `transaction` `commit_staged`
|
||||
/// will execute.
|
||||
/// `commit_all`. Holds the opened snapshot (so `commit_staged` doesn't
|
||||
/// re-open) plus the staged Lance transaction that `commit_staged`
|
||||
/// will execute. Both held as opaque `TableStorage` handles per MR-793
|
||||
/// §III.9 — the inner `lance::Dataset` / `StagedWrite` are not visible
|
||||
/// to engine code outside the storage layer.
|
||||
struct StagedTableEntry {
|
||||
table_key: String,
|
||||
path: StagedTablePath,
|
||||
expected_version: u64,
|
||||
dataset: lance::Dataset,
|
||||
staged_write: crate::table_store::StagedWrite,
|
||||
dataset: SnapshotHandle,
|
||||
staged_write: StagedHandle,
|
||||
}
|
||||
|
||||
impl StagedMutation {
|
||||
|
|
@ -544,15 +584,14 @@ impl StagedMutation {
|
|||
// raw Lance write or a pre-fix maintenance path moved HEAD without
|
||||
// publishing `__manifest`, this write must not silently fold it.
|
||||
let head = db
|
||||
.table_store()
|
||||
.storage()
|
||||
.open_dataset_head_for_write(
|
||||
&entry.table_key,
|
||||
&entry.path.full_path,
|
||||
entry.path.table_branch.as_deref(),
|
||||
)
|
||||
.await?
|
||||
.version()
|
||||
.version;
|
||||
.version();
|
||||
if head < current {
|
||||
return Err(OmniError::manifest_internal(format!(
|
||||
"table '{}' Lance HEAD version {} is behind manifest version {}",
|
||||
|
|
@ -672,14 +711,8 @@ impl StagedMutation {
|
|||
staged_write,
|
||||
} = entry;
|
||||
|
||||
let new_ds = db
|
||||
.table_store()
|
||||
.commit_staged(Arc::new(dataset), staged_write.transaction)
|
||||
.await?;
|
||||
let state = db
|
||||
.table_store()
|
||||
.table_state(&path.full_path, &new_ds)
|
||||
.await?;
|
||||
let new_ds = db.storage().commit_staged(dataset, staged_write).await?;
|
||||
let state = db.storage().table_state(&path.full_path, &new_ds).await?;
|
||||
updates.push(SubTableUpdate {
|
||||
table_key,
|
||||
table_version: state.version,
|
||||
|
|
@ -813,7 +846,9 @@ fn dedupe_merge_batches_by_id(
|
|||
/// Count edges per `src` value across committed (Lance scan) + pending
|
||||
/// (in-memory). Caller supplies an opened committed dataset so the
|
||||
/// mutation path (which already has one) and the loader path (which
|
||||
/// opens via snapshot) share the same body.
|
||||
/// opens via snapshot) share the same body. For overwrite staging, the
|
||||
/// pending batches are the replacement table image, so committed rows are
|
||||
/// intentionally skipped.
|
||||
///
|
||||
/// `dedupe_key_column` controls whether committed rows are shadowed by
|
||||
/// pending:
|
||||
|
|
@ -828,7 +863,7 @@ fn dedupe_merge_batches_by_id(
|
|||
/// `LoadMode::Merge` double-counts.
|
||||
pub(crate) async fn count_src_per_edge(
|
||||
db: &crate::db::Omnigraph,
|
||||
committed_ds: &Dataset,
|
||||
committed_ds: &SnapshotHandle,
|
||||
table_key: &str,
|
||||
staging: &MutationStaging,
|
||||
dedupe_key_column: Option<&str>,
|
||||
|
|
@ -859,41 +894,44 @@ pub(crate) async fn count_src_per_edge(
|
|||
_ => None,
|
||||
};
|
||||
|
||||
// Committed side: scan `src` plus the dedupe key column when set, so
|
||||
// we can both count and shadow in one pass.
|
||||
let projection: Vec<&str> = match dedupe_key_column {
|
||||
Some(col) if pending_keys.as_ref().is_some_and(|s| !s.is_empty()) => vec!["src", col],
|
||||
_ => vec!["src"],
|
||||
};
|
||||
let committed = db
|
||||
.table_store()
|
||||
.scan(committed_ds, Some(&projection), None, None)
|
||||
.await?;
|
||||
for batch in &committed {
|
||||
let srcs = batch
|
||||
.column_by_name("src")
|
||||
.ok_or_else(|| OmniError::Lance("missing 'src' column on edge table".into()))?
|
||||
.as_any()
|
||||
.downcast_ref::<StringArray>()
|
||||
.ok_or_else(|| OmniError::Lance("'src' column is not Utf8".into()))?;
|
||||
// Optional shadow-key column (only present when dedupe is on).
|
||||
let key_arr = match (&pending_keys, dedupe_key_column) {
|
||||
(Some(set), Some(col)) if !set.is_empty() => batch
|
||||
.column_by_name(col)
|
||||
.and_then(|c| c.as_any().downcast_ref::<StringArray>()),
|
||||
_ => None,
|
||||
let replace_committed = staging.pending_mode(table_key) == Some(PendingMode::Overwrite);
|
||||
if !replace_committed {
|
||||
// Committed side: scan `src` plus the dedupe key column when set, so
|
||||
// we can both count and shadow in one pass.
|
||||
let projection: Vec<&str> = match dedupe_key_column {
|
||||
Some(col) if pending_keys.as_ref().is_some_and(|s| !s.is_empty()) => vec!["src", col],
|
||||
_ => vec!["src"],
|
||||
};
|
||||
for i in 0..srcs.len() {
|
||||
if !srcs.is_valid(i) {
|
||||
continue;
|
||||
}
|
||||
// Shadow this committed row if its key is in pending.
|
||||
if let (Some(arr), Some(set)) = (key_arr, pending_keys.as_ref()) {
|
||||
if arr.is_valid(i) && set.contains(arr.value(i)) {
|
||||
let committed = db
|
||||
.storage()
|
||||
.scan(committed_ds, Some(&projection), None, None)
|
||||
.await?;
|
||||
for batch in &committed {
|
||||
let srcs = batch
|
||||
.column_by_name("src")
|
||||
.ok_or_else(|| OmniError::Lance("missing 'src' column on edge table".into()))?
|
||||
.as_any()
|
||||
.downcast_ref::<StringArray>()
|
||||
.ok_or_else(|| OmniError::Lance("'src' column is not Utf8".into()))?;
|
||||
// Optional shadow-key column (only present when dedupe is on).
|
||||
let key_arr = match (&pending_keys, dedupe_key_column) {
|
||||
(Some(set), Some(col)) if !set.is_empty() => batch
|
||||
.column_by_name(col)
|
||||
.and_then(|c| c.as_any().downcast_ref::<StringArray>()),
|
||||
_ => None,
|
||||
};
|
||||
for i in 0..srcs.len() {
|
||||
if !srcs.is_valid(i) {
|
||||
continue;
|
||||
}
|
||||
// Shadow this committed row if its key is in pending.
|
||||
if let (Some(arr), Some(set)) = (key_arr, pending_keys.as_ref()) {
|
||||
if arr.is_valid(i) && set.contains(arr.value(i)) {
|
||||
continue;
|
||||
}
|
||||
}
|
||||
*counts.entry(srcs.value(i).to_string()).or_insert(0) += 1;
|
||||
}
|
||||
*counts.entry(srcs.value(i).to_string()).or_insert(0) += 1;
|
||||
}
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -305,8 +305,7 @@ async fn load_jsonl_reader<R: BufRead>(
|
|||
if !catalog.node_types.contains_key(type_name) {
|
||||
return Err(OmniError::manifest(format!(
|
||||
"record {}: unknown node type '{}'",
|
||||
record_num,
|
||||
type_name
|
||||
record_num, type_name
|
||||
)));
|
||||
}
|
||||
let data = value
|
||||
|
|
@ -321,8 +320,7 @@ async fn load_jsonl_reader<R: BufRead>(
|
|||
if catalog.lookup_edge_by_name(edge_name).is_none() {
|
||||
return Err(OmniError::manifest(format!(
|
||||
"record {}: unknown edge type '{}'",
|
||||
record_num,
|
||||
edge_name
|
||||
record_num, edge_name
|
||||
)));
|
||||
}
|
||||
let from = value
|
||||
|
|
@ -357,27 +355,23 @@ async fn load_jsonl_reader<R: BufRead>(
|
|||
}
|
||||
|
||||
// Phase 2: Build per-type RecordBatches and accumulate into the
|
||||
// staging pipeline. For Append/Merge, batches go into an in-memory
|
||||
// accumulator and a single `stage_*` + `commit_staged` per touched
|
||||
// table runs at end-of-load — a mid-load failure (RI / cardinality
|
||||
// violation) leaves Lance HEAD untouched. For Overwrite, the legacy
|
||||
// inline-commit path is preserved (truncate+append doesn't fit the
|
||||
// staged shape cleanly, and overwrite has no in-flight read-your-writes
|
||||
// requirement).
|
||||
// staging pipeline. Batches go into an in-memory accumulator and a
|
||||
// single `stage_*` + `commit_staged` per touched table runs at
|
||||
// end-of-load — a mid-load failure (RI / cardinality violation) leaves
|
||||
// Lance HEAD untouched. `LoadMode::Overwrite` uses Lance's staged
|
||||
// `Overwrite` transaction rather than the former truncate-then-append
|
||||
// inline path.
|
||||
|
||||
let mut result = LoadResult::default();
|
||||
let snapshot = db.snapshot_for_branch(branch).await?;
|
||||
let use_staging = !matches!(mode, LoadMode::Overwrite);
|
||||
let mut staging = MutationStaging::default();
|
||||
let mut overwrite_updates: Vec<crate::db::SubTableUpdate> = Vec::new();
|
||||
let mut overwrite_expected: HashMap<String, u64> = HashMap::new();
|
||||
let pending_mode = match mode {
|
||||
LoadMode::Merge => PendingMode::Merge,
|
||||
// Append-mode loads accumulate as Append. Edge tables (no @key)
|
||||
// and no-key node tables stay safe on the stage_append path. The
|
||||
// Merge mode applies dedupe-by-id; Append assumes unique inputs.
|
||||
LoadMode::Append => PendingMode::Append,
|
||||
LoadMode::Overwrite => PendingMode::Append, // unused
|
||||
LoadMode::Overwrite => PendingMode::Overwrite,
|
||||
};
|
||||
// Map LoadMode to MutationOpKind for the version-check policy.
|
||||
// Append/Merge skip the strict pre-stage check (concurrency-safe
|
||||
|
|
@ -405,81 +399,43 @@ async fn load_jsonl_reader<R: BufRead>(
|
|||
}
|
||||
let loaded_count = batch.num_rows();
|
||||
let table_key = format!("node:{}", type_name);
|
||||
let entry = snapshot
|
||||
let _entry = snapshot
|
||||
.entry(&table_key)
|
||||
.ok_or_else(|| OmniError::manifest(format!("no manifest entry for {}", table_key)))?;
|
||||
if !use_staging {
|
||||
overwrite_expected.insert(table_key.clone(), entry.table_version);
|
||||
}
|
||||
prepared_nodes.push((type_name.clone(), table_key, batch, loaded_count));
|
||||
}
|
||||
|
||||
// Phase 2b: write every node type. Append/Merge → in-memory
|
||||
// accumulator. Overwrite → concurrent inline-commit (legacy path).
|
||||
if use_staging {
|
||||
for (type_name, table_key, batch, loaded_count) in prepared_nodes {
|
||||
let (ds, full_path, table_branch) = db
|
||||
.open_for_mutation_on_branch(branch, &table_key, load_op_kind)
|
||||
.await?;
|
||||
let expected_version = ds.version().version;
|
||||
staging.ensure_path(
|
||||
&table_key,
|
||||
full_path,
|
||||
table_branch,
|
||||
expected_version,
|
||||
load_op_kind,
|
||||
);
|
||||
let schema = batch.schema();
|
||||
staging.append_batch(&table_key, schema, pending_mode, batch)?;
|
||||
result.nodes_loaded.insert(type_name, loaded_count);
|
||||
}
|
||||
} else {
|
||||
let node_write_results =
|
||||
write_batches_concurrently(db, branch, mode, prepared_nodes).await?;
|
||||
for (type_name, table_key, loaded_count, state, table_branch) in node_write_results {
|
||||
overwrite_updates.push(crate::db::SubTableUpdate {
|
||||
table_key,
|
||||
table_version: state.version,
|
||||
table_branch,
|
||||
row_count: state.row_count,
|
||||
version_metadata: state.version_metadata,
|
||||
});
|
||||
result.nodes_loaded.insert(type_name, loaded_count);
|
||||
}
|
||||
// Phase 2b: accumulate every node type in memory. Fragment writes are
|
||||
// delayed until after all validation succeeds.
|
||||
for (type_name, table_key, batch, loaded_count) in prepared_nodes {
|
||||
let (ds, full_path, table_branch) = db
|
||||
.open_for_mutation_on_branch(branch, &table_key, load_op_kind)
|
||||
.await?;
|
||||
let expected_version = ds.version();
|
||||
staging.ensure_path(
|
||||
&table_key,
|
||||
full_path,
|
||||
table_branch,
|
||||
expected_version,
|
||||
load_op_kind,
|
||||
);
|
||||
let schema = batch.schema();
|
||||
staging.append_batch(&table_key, schema, pending_mode, batch)?;
|
||||
result.nodes_loaded.insert(type_name, loaded_count);
|
||||
}
|
||||
|
||||
// Phase 2c: Validate edge referential integrity — every src/dst must
|
||||
// reference an existing node ID in the appropriate type. For staged
|
||||
// loads, the lookup unions snapshot-committed IDs with the in-memory
|
||||
// pending batches (which carry the just-staged node inserts).
|
||||
// reference an existing node ID in the appropriate type. For
|
||||
// Append/Merge the lookup unions snapshot-committed IDs with the
|
||||
// in-memory pending batches. For Overwrite, a touched node table's
|
||||
// pending batch is the replacement image, so committed rows are not
|
||||
// included for that table.
|
||||
for (edge_name, rows) in &edge_rows {
|
||||
let edge_type = &catalog.edge_types[edge_name];
|
||||
let from_ids = if use_staging {
|
||||
collect_node_ids_with_pending(db, branch, &edge_type.from_type, &staging).await?
|
||||
} else {
|
||||
collect_node_ids(
|
||||
db,
|
||||
branch,
|
||||
&edge_type.from_type,
|
||||
&node_rows,
|
||||
&catalog,
|
||||
&overwrite_updates,
|
||||
)
|
||||
.await?
|
||||
};
|
||||
let to_ids = if use_staging {
|
||||
collect_node_ids_with_pending(db, branch, &edge_type.to_type, &staging).await?
|
||||
} else {
|
||||
collect_node_ids(
|
||||
db,
|
||||
branch,
|
||||
&edge_type.to_type,
|
||||
&node_rows,
|
||||
&catalog,
|
||||
&overwrite_updates,
|
||||
)
|
||||
.await?
|
||||
};
|
||||
let from_ids =
|
||||
collect_node_ids_with_pending(db, branch, &edge_type.from_type, &staging).await?;
|
||||
let to_ids =
|
||||
collect_node_ids_with_pending(db, branch, &edge_type.to_type, &staging).await?;
|
||||
|
||||
for (i, (src, dst, _)) in rows.iter().enumerate() {
|
||||
if !from_ids.contains(src.as_str()) {
|
||||
|
|
@ -516,118 +472,72 @@ async fn load_jsonl_reader<R: BufRead>(
|
|||
}
|
||||
let loaded_count = batch.num_rows();
|
||||
let table_key = format!("edge:{}", edge_name);
|
||||
let entry = snapshot
|
||||
let _entry = snapshot
|
||||
.entry(&table_key)
|
||||
.ok_or_else(|| OmniError::manifest(format!("no manifest entry for {}", table_key)))?;
|
||||
if !use_staging {
|
||||
overwrite_expected.insert(table_key.clone(), entry.table_version);
|
||||
}
|
||||
prepared_edges.push((edge_name.clone(), table_key, batch, loaded_count));
|
||||
}
|
||||
|
||||
// Phase 2e: write every edge type. Same dispatch as Phase 2b.
|
||||
if use_staging {
|
||||
for (edge_name, table_key, batch, loaded_count) in prepared_edges {
|
||||
let (ds, full_path, table_branch) = db
|
||||
.open_for_mutation_on_branch(branch, &table_key, load_op_kind)
|
||||
.await?;
|
||||
let expected_version = ds.version().version;
|
||||
staging.ensure_path(
|
||||
&table_key,
|
||||
full_path,
|
||||
table_branch,
|
||||
expected_version,
|
||||
load_op_kind,
|
||||
);
|
||||
let schema = batch.schema();
|
||||
staging.append_batch(&table_key, schema, pending_mode, batch)?;
|
||||
result.edges_loaded.insert(edge_name, loaded_count);
|
||||
}
|
||||
} else {
|
||||
let edge_write_results =
|
||||
write_batches_concurrently(db, branch, mode, prepared_edges).await?;
|
||||
for (edge_name, table_key, loaded_count, state, table_branch) in edge_write_results {
|
||||
overwrite_updates.push(crate::db::SubTableUpdate {
|
||||
table_key,
|
||||
table_version: state.version,
|
||||
table_branch,
|
||||
row_count: state.row_count,
|
||||
version_metadata: state.version_metadata,
|
||||
});
|
||||
result.edges_loaded.insert(edge_name, loaded_count);
|
||||
}
|
||||
// Phase 2e: accumulate every edge type. Same dispatch as Phase 2b.
|
||||
for (edge_name, table_key, batch, loaded_count) in prepared_edges {
|
||||
let (ds, full_path, table_branch) = db
|
||||
.open_for_mutation_on_branch(branch, &table_key, load_op_kind)
|
||||
.await?;
|
||||
let expected_version = ds.version();
|
||||
staging.ensure_path(
|
||||
&table_key,
|
||||
full_path,
|
||||
table_branch,
|
||||
expected_version,
|
||||
load_op_kind,
|
||||
);
|
||||
let schema = batch.schema();
|
||||
staging.append_batch(&table_key, schema, pending_mode, batch)?;
|
||||
result.edges_loaded.insert(edge_name, loaded_count);
|
||||
}
|
||||
|
||||
// Phase 3: Validate edge cardinality constraints (before commit —
|
||||
// invalid data must not be committed). Staged path scans committed
|
||||
// edges via Lance + iterates pending edges in-memory. Overwrite path
|
||||
// opens the just-written version (legacy behavior).
|
||||
// invalid data must not be committed). The helper scans committed
|
||||
// edges via Lance + iterates pending edges in-memory; for Overwrite it
|
||||
// treats the pending edge batches as the replacement table image.
|
||||
for (edge_name, _) in &edge_rows {
|
||||
let edge_type = &catalog.edge_types[edge_name];
|
||||
let table_key = format!("edge:{}", edge_name);
|
||||
if use_staging {
|
||||
validate_edge_cardinality_with_pending_loader(
|
||||
db, branch, edge_type, &table_key, &staging, mode,
|
||||
)
|
||||
.await?;
|
||||
} else if let Some(update) = overwrite_updates.iter().find(|u| u.table_key == table_key) {
|
||||
validate_edge_cardinality(
|
||||
db,
|
||||
branch,
|
||||
edge_name,
|
||||
update.table_version,
|
||||
update.table_branch.as_deref(),
|
||||
)
|
||||
.await?;
|
||||
}
|
||||
validate_edge_cardinality_with_pending_loader(
|
||||
db, branch, edge_type, &table_key, &staging, mode,
|
||||
)
|
||||
.await?;
|
||||
}
|
||||
|
||||
// Phase 4: Atomic manifest commit with publisher-level OCC.
|
||||
if use_staging {
|
||||
let staged = staging.stage_all(db, branch).await?;
|
||||
// `_queue_guards` holds per-(table_key, branch) write queues
|
||||
// across the manifest publish below — see exec/mutation.rs for
|
||||
// the rationale (interleaving prevention).
|
||||
let (updates, expected_versions, sidecar_handle, _queue_guards) = staged
|
||||
.commit_all(db, branch, crate::db::manifest::SidecarKind::Load, actor_id)
|
||||
.await?;
|
||||
// Same finalize → publisher residual as mutations: per-table
|
||||
// staged commits have advanced Lance HEAD, but the manifest
|
||||
// publish has not run yet. Reuse the mutation failpoint name so
|
||||
// one failpoint pins the shared `MutationStaging` boundary.
|
||||
crate::failpoints::maybe_fail("mutation.post_finalize_pre_publisher")?;
|
||||
db.commit_updates_on_branch_with_expected(branch, &updates, &expected_versions, actor_id)
|
||||
.await?;
|
||||
// The recovery sidecar protects the per-table commit_staged →
|
||||
// manifest publish window. Phase C succeeded — clean up
|
||||
// best-effort: failing the user here would error out a write
|
||||
// that already landed durably.
|
||||
if let Some(handle) = sidecar_handle {
|
||||
if let Err(err) =
|
||||
crate::db::manifest::delete_sidecar(&handle, db.storage_adapter()).await
|
||||
{
|
||||
tracing::warn!(
|
||||
error = %err,
|
||||
operation_id = handle.operation_id.as_str(),
|
||||
"recovery sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// LoadMode::Overwrite keeps the legacy inline-commit path —
|
||||
// truncate-then-append doesn't fit the staged shape (see
|
||||
// `docs/dev/writes.md` "LoadMode::Overwrite residual"). The recovery
|
||||
// sidecar is not applicable here because the writer doesn't go
|
||||
// through MutationStaging; per-table inline commits + a final
|
||||
// manifest publish handle their own residual via the documented
|
||||
// operator workflow (re-run overwrite to recover).
|
||||
db.commit_updates_on_branch_with_expected(
|
||||
branch,
|
||||
&overwrite_updates,
|
||||
&overwrite_expected,
|
||||
actor_id,
|
||||
)
|
||||
let staged = staging
|
||||
.stage_all_with_concurrency(db, branch, load_write_concurrency())
|
||||
.await?;
|
||||
// `_queue_guards` holds per-(table_key, branch) write queues
|
||||
// across the manifest publish below — see exec/mutation.rs for
|
||||
// the rationale (interleaving prevention).
|
||||
let (updates, expected_versions, sidecar_handle, _queue_guards) = staged
|
||||
.commit_all(db, branch, crate::db::manifest::SidecarKind::Load, actor_id)
|
||||
.await?;
|
||||
// Same finalize → publisher residual as mutations: per-table
|
||||
// staged commits have advanced Lance HEAD, but the manifest
|
||||
// publish has not run yet. Reuse the mutation failpoint name so
|
||||
// one failpoint pins the shared `MutationStaging` boundary.
|
||||
crate::failpoints::maybe_fail("mutation.post_finalize_pre_publisher")?;
|
||||
db.commit_updates_on_branch_with_expected(branch, &updates, &expected_versions, actor_id)
|
||||
.await?;
|
||||
// The recovery sidecar protects the per-table commit_staged →
|
||||
// manifest publish window. Phase C succeeded — clean up
|
||||
// best-effort: failing the user here would error out a write
|
||||
// that already landed durably.
|
||||
if let Some(handle) = sidecar_handle {
|
||||
if let Err(err) = crate::db::manifest::delete_sidecar(&handle, db.storage_adapter()).await {
|
||||
tracing::warn!(
|
||||
error = %err,
|
||||
operation_id = handle.operation_id.as_str(),
|
||||
"recovery sidecar cleanup failed; the next open's recovery sweep will resolve it"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
Ok(result)
|
||||
|
|
@ -1157,89 +1067,6 @@ fn load_write_concurrency() -> usize {
|
|||
.unwrap_or(DEFAULT_LOAD_WRITE_CONCURRENCY)
|
||||
}
|
||||
|
||||
/// Write a set of prepared `(type_name, table_key, batch, row_count)` tuples
|
||||
/// concurrently. Returns results in original iteration order so callers can
|
||||
/// zip them back to per-type metadata.
|
||||
async fn write_batches_concurrently(
|
||||
db: &Omnigraph,
|
||||
branch: Option<&str>,
|
||||
mode: LoadMode,
|
||||
prepared: Vec<(String, String, RecordBatch, usize)>,
|
||||
) -> Result<
|
||||
Vec<(
|
||||
String,
|
||||
String,
|
||||
usize,
|
||||
crate::table_store::TableState,
|
||||
Option<String>,
|
||||
)>,
|
||||
> {
|
||||
use futures::stream::StreamExt;
|
||||
|
||||
if prepared.is_empty() {
|
||||
return Ok(Vec::new());
|
||||
}
|
||||
|
||||
let concurrency = load_write_concurrency().min(prepared.len()).max(1);
|
||||
|
||||
futures::stream::iter(prepared.into_iter().map(
|
||||
|(type_name, table_key, batch, loaded_count)| async move {
|
||||
let (state, table_branch) =
|
||||
write_batch_to_dataset(db, branch, &table_key, batch, mode).await?;
|
||||
Ok::<_, OmniError>((type_name, table_key, loaded_count, state, table_branch))
|
||||
},
|
||||
))
|
||||
.buffered(concurrency)
|
||||
.collect::<Vec<Result<_>>>()
|
||||
.await
|
||||
.into_iter()
|
||||
.collect()
|
||||
}
|
||||
|
||||
async fn write_batch_to_dataset(
|
||||
db: &Omnigraph,
|
||||
branch: Option<&str>,
|
||||
table_key: &str,
|
||||
batch: RecordBatch,
|
||||
mode: LoadMode,
|
||||
) -> Result<(crate::table_store::TableState, Option<String>)> {
|
||||
let op_kind = match mode {
|
||||
LoadMode::Append => crate::db::MutationOpKind::Insert,
|
||||
LoadMode::Merge => crate::db::MutationOpKind::Merge,
|
||||
LoadMode::Overwrite => crate::db::MutationOpKind::SchemaRewrite,
|
||||
};
|
||||
let (mut ds, full_path, table_branch) = db
|
||||
.open_for_mutation_on_branch(branch, table_key, op_kind)
|
||||
.await?;
|
||||
let table_store = db.table_store();
|
||||
|
||||
match mode {
|
||||
LoadMode::Overwrite => {
|
||||
let state = table_store
|
||||
.overwrite_batch(&full_path, &mut ds, batch)
|
||||
.await?;
|
||||
Ok((state, table_branch))
|
||||
}
|
||||
LoadMode::Append => {
|
||||
let state = table_store.append_batch(&full_path, &mut ds, batch).await?;
|
||||
Ok((state, table_branch))
|
||||
}
|
||||
LoadMode::Merge => {
|
||||
let state = table_store
|
||||
.merge_insert_batch(
|
||||
&full_path,
|
||||
ds,
|
||||
batch,
|
||||
vec!["id".to_string()],
|
||||
lance::dataset::WhenMatched::UpdateAll,
|
||||
lance::dataset::WhenNotMatched::InsertAll,
|
||||
)
|
||||
.await?;
|
||||
Ok((state, table_branch))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn generate_id() -> String {
|
||||
ulid::Ulid::new().to_string()
|
||||
}
|
||||
|
|
@ -1672,10 +1499,7 @@ pub(crate) async fn validate_edge_cardinality(
|
|||
.await?;
|
||||
|
||||
// Scan src column, count per source
|
||||
let batches = db
|
||||
.table_store()
|
||||
.scan(&ds, Some(&["src"]), None, None)
|
||||
.await?;
|
||||
let batches = db.storage().scan(&ds, Some(&["src"]), None, None).await?;
|
||||
|
||||
let mut counts: HashMap<String, u32> = HashMap::new();
|
||||
for batch in &batches {
|
||||
|
|
@ -1766,6 +1590,11 @@ async fn validate_edge_cardinality_with_pending_loader(
|
|||
/// - IDs from the staged loader's pending batches (in-memory; just-staged
|
||||
/// inserts of this type)
|
||||
/// - IDs from the committed sub-table at the pre-load snapshot version
|
||||
///
|
||||
/// For `LoadMode::Overwrite`, if the node table is touched then the pending
|
||||
/// batches are the replacement image. In that case committed IDs are not
|
||||
/// included, so edge RI is validated against exactly what the overwrite will
|
||||
/// publish.
|
||||
async fn collect_node_ids_with_pending(
|
||||
db: &Omnigraph,
|
||||
branch: Option<&str>,
|
||||
|
|
@ -1788,6 +1617,10 @@ async fn collect_node_ids_with_pending(
|
|||
}
|
||||
}
|
||||
|
||||
if staging.pending_mode(&table_key) == Some(PendingMode::Overwrite) {
|
||||
return Ok(ids);
|
||||
}
|
||||
|
||||
// From the committed Lance sub-table at the pre-load snapshot version.
|
||||
let snapshot = db.snapshot_for_branch(branch).await?;
|
||||
let Some(entry) = snapshot.entry(&table_key) else {
|
||||
|
|
@ -1801,10 +1634,7 @@ async fn collect_node_ids_with_pending(
|
|||
)
|
||||
.await?;
|
||||
|
||||
let batches = db
|
||||
.table_store()
|
||||
.scan(&ds, Some(&["id"]), None, None)
|
||||
.await?;
|
||||
let batches = db.storage().scan(&ds, Some(&["id"]), None, None).await?;
|
||||
|
||||
for batch in &batches {
|
||||
let id_col = batch
|
||||
|
|
@ -1827,72 +1657,6 @@ async fn collect_node_ids_with_pending(
|
|||
Ok(ids)
|
||||
}
|
||||
|
||||
/// Collect all valid node IDs for a given type. Union of:
|
||||
/// - IDs from the just-loaded batch (in memory, from node_rows)
|
||||
/// - IDs from the sub-table at the just-written version (if it was updated)
|
||||
/// - IDs from the sub-table at the snapshot-pinned version (if it was not updated)
|
||||
async fn collect_node_ids(
|
||||
db: &Omnigraph,
|
||||
branch: Option<&str>,
|
||||
type_name: &str,
|
||||
node_rows: &HashMap<String, Vec<JsonValue>>,
|
||||
catalog: &omnigraph_compiler::catalog::Catalog,
|
||||
updates: &[crate::db::SubTableUpdate],
|
||||
) -> Result<HashSet<String>> {
|
||||
let mut ids = HashSet::new();
|
||||
|
||||
// IDs from the in-memory batch (just loaded in this operation)
|
||||
if let Some(rows) = node_rows.get(type_name) {
|
||||
if let Some(node_type) = catalog.node_types.get(type_name) {
|
||||
if let Some(key_prop) = node_type.key_property() {
|
||||
for row in rows {
|
||||
if let Some(id) = row.get(key_prop).and_then(|v| v.as_str()) {
|
||||
ids.insert(id.to_string());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// IDs from the Lance sub-table
|
||||
let table_key = format!("node:{}", type_name);
|
||||
let snapshot = db.snapshot_for_branch(branch).await?;
|
||||
let Some(entry) = snapshot.entry(&table_key) else {
|
||||
return Ok(ids);
|
||||
};
|
||||
// Use the just-written version if this type was updated, else snapshot version
|
||||
let updated = updates
|
||||
.iter()
|
||||
.find(|u| u.table_key == table_key)
|
||||
.map(|u| (u.table_version, u.table_branch.as_deref()));
|
||||
let (version, branch) = updated.unwrap_or((entry.table_version, entry.table_branch.as_deref()));
|
||||
let ds = db
|
||||
.open_dataset_at_state(&entry.table_path, branch, version)
|
||||
.await?;
|
||||
|
||||
let batches = db
|
||||
.table_store()
|
||||
.scan(&ds, Some(&["id"]), None, None)
|
||||
.await?;
|
||||
|
||||
for batch in &batches {
|
||||
let id_col = batch
|
||||
.column_by_name("id")
|
||||
.unwrap()
|
||||
.as_any()
|
||||
.downcast_ref::<StringArray>()
|
||||
.unwrap();
|
||||
for i in 0..batch.num_rows() {
|
||||
if !id_col.is_valid(i) {
|
||||
continue;
|
||||
}
|
||||
ids.insert(id_col.value(i).to_string());
|
||||
}
|
||||
}
|
||||
|
||||
Ok(ids)
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
|
|
|||
|
|
@ -7,30 +7,32 @@
|
|||
//! way for new engine writers to advance Lance HEAD without coupling
|
||||
//! "write bytes" with "advance HEAD" in one Lance API call.
|
||||
//!
|
||||
//! ## Transitional residuals on the trait
|
||||
//! ## Inline-commit residuals live on a separate trait
|
||||
//!
|
||||
//! Several inline-commit methods remain on the trait surface as
|
||||
//! documented residuals: `delete_where`
|
||||
//! ([#6658](https://github.com/lance-format/lance/issues/6658) closed
|
||||
//! 2026-05-14, but the public `DeleteBuilder::execute_uncommitted` API
|
||||
//! did not backport to the 6.x release line — it first ships in
|
||||
//! `v7.0.0-beta.10`. Migration to staged two-phase delete is tracked as
|
||||
//! MR-A and is gated on the Lance v7.x bump, not the current v6.0.1 pin),
|
||||
//! `create_vector_index` (segment-commit-path requires
|
||||
//! `build_index_metadata_from_segments` which is `pub(crate)` — see
|
||||
//! [#6666](https://github.com/lance-format/lance/issues/6666), still open), and the
|
||||
//! legacy `append_batch` / `merge_insert_batches` / `overwrite_batch` /
|
||||
//! `create_btree_index` / `create_inverted_index` paths kept while
|
||||
//! engine call sites finish migrating off of them (Phase 1b / Phase 9
|
||||
//! of MR-793). These are named honestly at every call site; the
|
||||
//! forbidden-API guard test catches direct lance::* misuse outside the
|
||||
//! storage layer.
|
||||
//! The inline-commit writes that Lance cannot yet express as
|
||||
//! stage-then-commit are NOT on `TableStorage`. They sit on
|
||||
//! [`InlineCommitResidual`], reachable only via
|
||||
//! `Omnigraph::storage_inline_residual()`, so the default `db.storage()`
|
||||
//! surface is staged-only and cannot couple "write bytes" with "advance
|
||||
//! HEAD" — MR-793 acceptance §1 closes by construction. The residuals:
|
||||
//!
|
||||
//! * `delete_where` — Lance #6658 (`DeleteBuilder::execute_uncommitted`)
|
||||
//! did not backport to the 6.x line; it first ships in `v7.0.0-beta.10`.
|
||||
//! Migration to staged two-phase delete is tracked as MR-A, gated on the
|
||||
//! Lance v7.x bump.
|
||||
//! * `create_vector_index` — segment-commit-path needs
|
||||
//! `build_index_metadata_from_segments`, still `pub(crate)` in Lance
|
||||
//! 6.0.1 ([#6666](https://github.com/lance-format/lance/issues/6666),
|
||||
//! open). Scalar indices already stage.
|
||||
//!
|
||||
//! Each is named honestly at its call site; the forbidden-API guard test
|
||||
//! catches direct lance::* misuse outside the storage layer.
|
||||
//!
|
||||
//! ## Sealed
|
||||
//!
|
||||
//! `TableStorage: sealed::Sealed`. Only types in this crate can implement
|
||||
//! the trait, so a downstream crate cannot subvert the contract by
|
||||
//! providing its own impl.
|
||||
//! Both `TableStorage` and `InlineCommitResidual` are `: sealed::Sealed`.
|
||||
//! Only types in this crate can implement them, so a downstream crate
|
||||
//! cannot subvert the contract by providing its own impl.
|
||||
//!
|
||||
//! ## Opaque handles
|
||||
//!
|
||||
|
|
@ -40,15 +42,15 @@
|
|||
//! through. This aligns with the storage-boundary invariant:
|
||||
//! `lance::Dataset` does not appear in trait signatures.
|
||||
//!
|
||||
//! ## Migration status (MR-793 PR #70)
|
||||
//! ## Migration status
|
||||
//!
|
||||
//! Phases 1a / 2 / 4 / 5 / 6 are landed: trait scaffolding, three new
|
||||
//! staged primitives (`stage_overwrite`, scalar index staging), and
|
||||
//! migration of `ensure_indices`, `branch_merge`, `schema_apply` onto
|
||||
//! the staged surface. Phase 1b (call-site conversion to
|
||||
//! `Arc<dyn TableStorage>`), Phase 9 (demote unused inline-commit
|
||||
//! methods to `pub(crate)`), Phase 7 (recovery reconciler — MR-847),
|
||||
//! and Phase 8 (index reconciler — MR-848) are deferred to follow-ups.
|
||||
//! Phases 1a / 2 / 4 / 5 / 6 landed in MR-793 PR #70 (trait scaffolding,
|
||||
//! staged primitives, migration of `ensure_indices` / `branch_merge` /
|
||||
//! `schema_apply` onto the staged surface). Phase 1b (call-site
|
||||
//! conversion) and Phase 9 landed in MR-854, which also split the
|
||||
//! inline-commit residuals onto `InlineCommitResidual` so `db.storage()`
|
||||
//! is staged-only. Phase 7 (recovery reconciler) shipped as MR-847;
|
||||
//! Phase 8 (index reconciler) is tracked as MR-848.
|
||||
|
||||
use std::fmt::Debug;
|
||||
use std::sync::Arc;
|
||||
|
|
@ -105,12 +107,37 @@ impl SnapshotHandle {
|
|||
&self.inner
|
||||
}
|
||||
|
||||
/// Take ownership of the inner `Arc<Dataset>`. Used when committing
|
||||
/// staged writes (the call needs to consume the snapshot).
|
||||
/// Take ownership of the inner `Arc<Dataset>`. Used by the
|
||||
/// `TableStorage` impl when an op needs to mutate the dataset in
|
||||
/// place (commit a staged write, append, overwrite, …).
|
||||
///
|
||||
/// Performance note: callers consume the returned `Arc` via
|
||||
/// `Arc::try_unwrap(...).unwrap_or_else(|arc| (*arc).clone())`. The
|
||||
/// fast path (no clone) only fires when the snapshot is single-ref
|
||||
/// — i.e. the caller dropped every other `SnapshotHandle` clone
|
||||
/// before calling. Holding parallel clones (e.g. across an `await`
|
||||
/// point or stashed in a struct) forces a deep `Dataset` clone on
|
||||
/// every mutating op. Engine callers should pass `SnapshotHandle`
|
||||
/// by value into the mutating method, not keep a side copy.
|
||||
pub(crate) fn into_arc(self) -> Arc<Dataset> {
|
||||
self.inner
|
||||
}
|
||||
|
||||
/// Take ownership of the inner `Dataset` by unwrapping the `Arc`
|
||||
/// (or cloning if the snapshot is shared). `pub(crate)` — used
|
||||
/// only by the maintenance path (`optimize`, `cleanup`) which
|
||||
/// must hand `&mut Dataset` to Lance compaction / cleanup APIs
|
||||
/// that the `TableStorage` trait does not (and should not)
|
||||
/// surface. Engine code that participates in the staged-write
|
||||
/// invariant must stay on the trait methods.
|
||||
///
|
||||
/// Single-ref invariant: same fast-path/clone behavior as
|
||||
/// `into_arc` — see that method's doc. Drop sibling
|
||||
/// `SnapshotHandle` clones before calling.
|
||||
pub(crate) fn into_dataset(self) -> Dataset {
|
||||
Arc::try_unwrap(self.inner).unwrap_or_else(|arc| (*arc).clone())
|
||||
}
|
||||
|
||||
// ── public, lance-free accessors ──
|
||||
|
||||
/// Current Lance manifest version of the snapshot.
|
||||
|
|
@ -208,6 +235,20 @@ pub trait TableStorage: sealed::Sealed + Send + Sync + Debug {
|
|||
|
||||
async fn delete_branch(&self, dataset_uri: &str, branch: &str) -> Result<()>;
|
||||
|
||||
/// Idempotent variant of `delete_branch` used by the best-effort fork
|
||||
/// reclaim under branch delete (`db/omnigraph.rs::cleanup_deleted_branch_tables`)
|
||||
/// and by the orphan-fork reconciler in `optimize`. Tolerates an
|
||||
/// already-absent branch (both Lance's `RefNotFound` and the local-store
|
||||
/// `NotFound` quirk on a missing `tree/{branch}/` dir). A still-referenced
|
||||
/// branch (`RefConflict`) still surfaces as `OmniError::Lance`.
|
||||
async fn force_delete_branch(&self, dataset_uri: &str, branch: &str) -> Result<()>;
|
||||
|
||||
/// List the named Lance branches present on the dataset at `dataset_uri`.
|
||||
/// The `cleanup` orphan reconciler diffs this against the manifest
|
||||
/// branch set to find orphaned per-table forks. `main`/default is not a
|
||||
/// named branch and never appears here.
|
||||
async fn list_branches(&self, dataset_uri: &str) -> Result<Vec<String>>;
|
||||
|
||||
async fn reopen_for_mutation(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
|
|
@ -328,74 +369,19 @@ pub trait TableStorage: sealed::Sealed + Send + Sync + Debug {
|
|||
column: &str,
|
||||
) -> Result<StagedHandle>;
|
||||
|
||||
// ── Inline-commit residuals (named honestly per MR-793 §3.2) ──────
|
||||
// ── Index presence (reads, no HEAD advance) ──────────────────────
|
||||
//
|
||||
// These methods advance Lance HEAD as a side effect of writing.
|
||||
// They stay on the trait until the corresponding upstream Lance API
|
||||
// ships:
|
||||
//
|
||||
// * `delete_where` — Lance #6658 (two-phase delete).
|
||||
// * `create_*_index` — `build_index_metadata_from_segments` is
|
||||
// `pub(crate)` for vector indices in lance-4.0.0; scalar indices
|
||||
// migrate to staged in MR-793 Phase 2.
|
||||
// * `append_batch`, `merge_insert_batches`, `overwrite_batch` —
|
||||
// legacy paths that will be demoted to `pub(crate)` in MR-793
|
||||
// Phase 9 once all engine sites route through the staged
|
||||
// primitives.
|
||||
|
||||
async fn append_batch(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
snapshot: SnapshotHandle,
|
||||
batch: RecordBatch,
|
||||
) -> Result<(SnapshotHandle, TableState)>;
|
||||
|
||||
async fn merge_insert_batches(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
snapshot: SnapshotHandle,
|
||||
batches: Vec<RecordBatch>,
|
||||
key_columns: Vec<String>,
|
||||
when_matched: WhenMatched,
|
||||
when_not_matched: WhenNotMatched,
|
||||
) -> Result<TableState>;
|
||||
|
||||
async fn overwrite_batch(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
snapshot: SnapshotHandle,
|
||||
batch: RecordBatch,
|
||||
) -> Result<(SnapshotHandle, TableState)>;
|
||||
|
||||
async fn delete_where(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
snapshot: SnapshotHandle,
|
||||
filter: &str,
|
||||
) -> Result<(SnapshotHandle, DeleteState)>;
|
||||
// The inline-commit writes (`delete_where`, `create_vector_index`) are
|
||||
// deliberately NOT on this trait. They live on
|
||||
// the separate `InlineCommitResidual` trait, reachable only through
|
||||
// `Omnigraph::storage_inline_residual()`. As a result the default
|
||||
// `db.storage()` surface cannot couple "write bytes" with "advance HEAD"
|
||||
// — closing MR-793 acceptance §1 by construction rather than by review.
|
||||
|
||||
async fn has_btree_index(&self, snapshot: &SnapshotHandle, column: &str) -> Result<bool>;
|
||||
async fn has_fts_index(&self, snapshot: &SnapshotHandle, column: &str) -> Result<bool>;
|
||||
async fn has_vector_index(&self, snapshot: &SnapshotHandle, column: &str) -> Result<bool>;
|
||||
|
||||
async fn create_btree_index(
|
||||
&self,
|
||||
snapshot: SnapshotHandle,
|
||||
columns: &[&str],
|
||||
) -> Result<SnapshotHandle>;
|
||||
|
||||
async fn create_inverted_index(
|
||||
&self,
|
||||
snapshot: SnapshotHandle,
|
||||
column: &str,
|
||||
) -> Result<SnapshotHandle>;
|
||||
|
||||
async fn create_vector_index(
|
||||
&self,
|
||||
snapshot: SnapshotHandle,
|
||||
column: &str,
|
||||
) -> Result<SnapshotHandle>;
|
||||
|
||||
// ── URI helpers ────────────────────────────────────────────────────
|
||||
//
|
||||
// These are pure string formatting; they live on the trait so engine
|
||||
|
|
@ -422,6 +408,38 @@ pub trait TableStorage: sealed::Sealed + Send + Sync + Debug {
|
|||
) -> Result<DatasetRecordBatchStream>;
|
||||
}
|
||||
|
||||
// ─── InlineCommitResidual trait ────────────────────────────────────────────
|
||||
|
||||
/// Inline-commit residual surface: the writes Lance cannot yet express as a
|
||||
/// stage-then-commit pair, so they advance Lance HEAD as a side effect of
|
||||
/// writing. Kept OFF `TableStorage` and reachable only through
|
||||
/// `Omnigraph::storage_inline_residual()`, so the default `db.storage()` path
|
||||
/// is staged-only and a new writer cannot reintroduce the write+commit coupling
|
||||
/// by accident (MR-793 acceptance §1, by construction).
|
||||
///
|
||||
/// Residual reasons (each is named honestly at its call site):
|
||||
/// * `delete_where` — Lance has no public two-phase delete on the 6.x line
|
||||
/// (`DeleteBuilder::execute_uncommitted` first ships in v7.x; MR-A / Lance
|
||||
/// #6658). The D2 parse-time rule + recovery sidecars cover the gap meanwhile.
|
||||
/// * `create_vector_index` — vector-index segment-commit needs
|
||||
/// `build_index_metadata_from_segments`, still `pub(crate)` in Lance 6.0.1
|
||||
/// (Lance #6666). Scalar indices already stage.
|
||||
#[async_trait]
|
||||
pub(crate) trait InlineCommitResidual: sealed::Sealed + Send + Sync + Debug {
|
||||
async fn delete_where(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
snapshot: SnapshotHandle,
|
||||
filter: &str,
|
||||
) -> Result<(SnapshotHandle, DeleteState)>;
|
||||
|
||||
async fn create_vector_index(
|
||||
&self,
|
||||
snapshot: SnapshotHandle,
|
||||
column: &str,
|
||||
) -> Result<SnapshotHandle>;
|
||||
}
|
||||
|
||||
// ─── single impl: TableStore ──────────────────────────────────────────────
|
||||
|
||||
#[async_trait]
|
||||
|
|
@ -496,6 +514,14 @@ impl TableStorage for TableStore {
|
|||
TableStore::delete_branch(self, dataset_uri, branch).await
|
||||
}
|
||||
|
||||
async fn force_delete_branch(&self, dataset_uri: &str, branch: &str) -> Result<()> {
|
||||
TableStore::force_delete_branch(self, dataset_uri, branch).await
|
||||
}
|
||||
|
||||
async fn list_branches(&self, dataset_uri: &str) -> Result<Vec<String>> {
|
||||
TableStore::list_branches(self, dataset_uri).await
|
||||
}
|
||||
|
||||
async fn reopen_for_mutation(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
|
|
@ -689,61 +715,6 @@ impl TableStorage for TableStore {
|
|||
.map(StagedHandle::new)
|
||||
}
|
||||
|
||||
async fn append_batch(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
snapshot: SnapshotHandle,
|
||||
batch: RecordBatch,
|
||||
) -> Result<(SnapshotHandle, TableState)> {
|
||||
let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
|
||||
let state = TableStore::append_batch(self, dataset_uri, &mut ds, batch).await?;
|
||||
Ok((SnapshotHandle::new(ds), state))
|
||||
}
|
||||
|
||||
async fn merge_insert_batches(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
snapshot: SnapshotHandle,
|
||||
batches: Vec<RecordBatch>,
|
||||
key_columns: Vec<String>,
|
||||
when_matched: WhenMatched,
|
||||
when_not_matched: WhenNotMatched,
|
||||
) -> Result<TableState> {
|
||||
let ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
|
||||
TableStore::merge_insert_batches(
|
||||
self,
|
||||
dataset_uri,
|
||||
ds,
|
||||
batches,
|
||||
key_columns,
|
||||
when_matched,
|
||||
when_not_matched,
|
||||
)
|
||||
.await
|
||||
}
|
||||
|
||||
async fn overwrite_batch(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
snapshot: SnapshotHandle,
|
||||
batch: RecordBatch,
|
||||
) -> Result<(SnapshotHandle, TableState)> {
|
||||
let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
|
||||
let state = TableStore::overwrite_batch(self, dataset_uri, &mut ds, batch).await?;
|
||||
Ok((SnapshotHandle::new(ds), state))
|
||||
}
|
||||
|
||||
async fn delete_where(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
snapshot: SnapshotHandle,
|
||||
filter: &str,
|
||||
) -> Result<(SnapshotHandle, DeleteState)> {
|
||||
let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
|
||||
let state = TableStore::delete_where(self, dataset_uri, &mut ds, filter).await?;
|
||||
Ok((SnapshotHandle::new(ds), state))
|
||||
}
|
||||
|
||||
async fn has_btree_index(&self, snapshot: &SnapshotHandle, column: &str) -> Result<bool> {
|
||||
TableStore::has_btree_index(self, snapshot.dataset(), column).await
|
||||
}
|
||||
|
|
@ -756,36 +727,6 @@ impl TableStorage for TableStore {
|
|||
TableStore::has_vector_index(self, snapshot.dataset(), column).await
|
||||
}
|
||||
|
||||
async fn create_btree_index(
|
||||
&self,
|
||||
snapshot: SnapshotHandle,
|
||||
columns: &[&str],
|
||||
) -> Result<SnapshotHandle> {
|
||||
let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
|
||||
TableStore::create_btree_index(self, &mut ds, columns).await?;
|
||||
Ok(SnapshotHandle::new(ds))
|
||||
}
|
||||
|
||||
async fn create_inverted_index(
|
||||
&self,
|
||||
snapshot: SnapshotHandle,
|
||||
column: &str,
|
||||
) -> Result<SnapshotHandle> {
|
||||
let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
|
||||
TableStore::create_inverted_index(self, &mut ds, column).await?;
|
||||
Ok(SnapshotHandle::new(ds))
|
||||
}
|
||||
|
||||
async fn create_vector_index(
|
||||
&self,
|
||||
snapshot: SnapshotHandle,
|
||||
column: &str,
|
||||
) -> Result<SnapshotHandle> {
|
||||
let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
|
||||
TableStore::create_vector_index(self, &mut ds, column).await?;
|
||||
Ok(SnapshotHandle::new(ds))
|
||||
}
|
||||
|
||||
fn root_uri(&self) -> &str {
|
||||
TableStore::root_uri(self)
|
||||
}
|
||||
|
|
@ -815,3 +756,27 @@ impl TableStorage for TableStore {
|
|||
.await
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl InlineCommitResidual for TableStore {
|
||||
async fn delete_where(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
snapshot: SnapshotHandle,
|
||||
filter: &str,
|
||||
) -> Result<(SnapshotHandle, DeleteState)> {
|
||||
let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
|
||||
let state = TableStore::delete_where(self, dataset_uri, &mut ds, filter).await?;
|
||||
Ok((SnapshotHandle::new(ds), state))
|
||||
}
|
||||
|
||||
async fn create_vector_index(
|
||||
&self,
|
||||
snapshot: SnapshotHandle,
|
||||
column: &str,
|
||||
) -> Result<SnapshotHandle> {
|
||||
let mut ds = Arc::try_unwrap(snapshot.into_arc()).unwrap_or_else(|arc| (*arc).clone());
|
||||
TableStore::create_vector_index(self, &mut ds, column).await?;
|
||||
Ok(SnapshotHandle::new(ds))
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -2,7 +2,6 @@ use arrow_array::{
|
|||
Array, ArrayRef, RecordBatch, StringArray, StructArray, UInt8Array, UInt32Array, UInt64Array,
|
||||
};
|
||||
use arrow_schema::SchemaRef;
|
||||
use arrow_select::concat::concat_batches;
|
||||
use futures::TryStreamExt;
|
||||
use lance::Dataset;
|
||||
use lance::blob::BlobArrayBuilder;
|
||||
|
|
@ -13,7 +12,7 @@ use lance::dataset::{
|
|||
CommitBuilder, InsertBuilder, MergeInsertBuilder, WhenMatched, WhenNotMatched, WriteMode,
|
||||
WriteParams,
|
||||
};
|
||||
use lance::datatypes::BlobKind;
|
||||
use lance::datatypes::{BlobKind, Schema as LanceSchema};
|
||||
use lance::index::DatasetIndexExt;
|
||||
use lance::index::scalar::IndexDetails;
|
||||
use lance_file::version::LanceFileVersion;
|
||||
|
|
@ -725,7 +724,14 @@ impl TableStore {
|
|||
})
|
||||
}
|
||||
|
||||
pub async fn append_batch(
|
||||
/// Legacy inline-commit append: writes fragments AND commits in one
|
||||
/// call, advancing Lance HEAD as a side effect. Not on the
|
||||
/// `TableStorage` trait surface — the staged primitive `stage_append`
|
||||
/// + `commit_staged` is the engine write path. This inherent
|
||||
/// `pub(crate)` method survives only for recovery test setup. Do not
|
||||
/// add new engine call sites — they re-introduce the multi-phase
|
||||
/// commit drift the trait surface was designed to eliminate.
|
||||
pub(crate) async fn append_batch(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
ds: &mut Dataset,
|
||||
|
|
@ -780,139 +786,7 @@ impl TableStore {
|
|||
}
|
||||
}
|
||||
|
||||
pub async fn overwrite_batch(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
ds: &mut Dataset,
|
||||
batch: RecordBatch,
|
||||
) -> Result<TableState> {
|
||||
ds.truncate_table()
|
||||
.await
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
self.append_batch(dataset_uri, ds, batch).await
|
||||
}
|
||||
|
||||
pub async fn overwrite_dataset(dataset_uri: &str, batch: RecordBatch) -> Result<Dataset> {
|
||||
let reader = arrow_array::RecordBatchIterator::new(vec![Ok(batch.clone())], batch.schema());
|
||||
let params = WriteParams {
|
||||
mode: WriteMode::Overwrite,
|
||||
enable_stable_row_ids: true,
|
||||
data_storage_version: Some(LanceFileVersion::V2_2),
|
||||
allow_external_blob_outside_bases: true,
|
||||
..Default::default()
|
||||
};
|
||||
Dataset::write(reader, dataset_uri, Some(params))
|
||||
.await
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))
|
||||
}
|
||||
|
||||
pub async fn merge_insert_batch(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
ds: Dataset,
|
||||
batch: RecordBatch,
|
||||
key_columns: Vec<String>,
|
||||
when_matched: WhenMatched,
|
||||
when_not_matched: WhenNotMatched,
|
||||
) -> Result<TableState> {
|
||||
if batch.num_rows() == 0 {
|
||||
return self.table_state(dataset_uri, &ds).await;
|
||||
}
|
||||
|
||||
// Precondition for the FirstSeen workaround below: every caller of
|
||||
// this primitive must hand in a source batch that is unique by
|
||||
// `key_columns`. Without this check, `SourceDedupeBehavior::FirstSeen`
|
||||
// would silently collapse genuine duplicates instead of erroring.
|
||||
check_batch_unique_by_keys(&batch, &key_columns, "merge_insert_batch")?;
|
||||
|
||||
// TODO(lance-upstream): MergeInsertBuilder does not accept WriteParams,
|
||||
// so allow_external_blob_outside_bases cannot be set here. External URI
|
||||
// blobs via merge_insert (LoadMode::Merge, mutations) are unsupported
|
||||
// until Lance exposes WriteParams on MergeInsertBuilder.
|
||||
let ds = Arc::new(ds);
|
||||
let mut builder = MergeInsertBuilder::try_new(ds, key_columns)
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
builder.when_matched(when_matched);
|
||||
builder.when_not_matched(when_not_matched);
|
||||
// Workaround for a Lance 4.0.x bug class where sequential
|
||||
// merge_insert calls against rows previously rewritten by
|
||||
// merge_insert produce a spurious "Ambiguous merge inserts:
|
||||
// multiple source rows match the same target row on (id = ...)"
|
||||
// error. Lance's `processed_row_ids: Mutex<HashSet<u64>>`
|
||||
// (lance-4.0.0 `src/dataset/write/merge_insert.rs:2099`)
|
||||
// double-processes the same source/target match against
|
||||
// datasets previously rewritten by merge_insert, and the default
|
||||
// `SourceDedupeBehavior::Fail` errors on the second insertion.
|
||||
// `FirstSeen` makes Lance skip the duplicate match instead.
|
||||
//
|
||||
// Covers both observed surfaces:
|
||||
// - PR #98 (sequential `load --mode merge` against same keys).
|
||||
// - MR-920 (sequential `update T set {f} where x=y` on same row).
|
||||
//
|
||||
// Correctness-preserving for OmniGraph because every call path
|
||||
// that reaches this primitive either pre-dedupes the source batch
|
||||
// by id, or surfaces a real source dup via the
|
||||
// `check_batch_unique_by_keys` precondition above (which fires
|
||||
// before the FirstSeen setter has a chance to silently collapse
|
||||
// anything):
|
||||
// - Load path: `enforce_unique_constraints_intra_batch`
|
||||
// (`loader/mod.rs:1442`) errors on intra-batch `@key` dups.
|
||||
// - Mutate path: `MutationStaging::finalize` (`exec/staging.rs`)
|
||||
// accumulates and dedupes by `id`.
|
||||
// - Branch-merge path: `compute_source_delta` /
|
||||
// `compute_three_way_delta` (`exec/merge.rs`) walk via
|
||||
// `OrderedTableCursor` and `push_row` each id at most once.
|
||||
// So FirstSeen only suppresses the spurious Lance behavior, never
|
||||
// user data. Pinned by `loader_rejects_intra_batch_duplicate_keys`
|
||||
// in `tests/consistency.rs` plus the
|
||||
// `check_batch_unique_by_keys` precondition.
|
||||
//
|
||||
// Retire when upstream Lance fixes the bug class. Tracked at
|
||||
// MR-957; upstream: lance-format/lance#6877.
|
||||
builder.source_dedupe_behavior(SourceDedupeBehavior::FirstSeen);
|
||||
let job = builder
|
||||
.try_build()
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
|
||||
let schema = batch.schema();
|
||||
let reader = arrow_array::RecordBatchIterator::new(vec![Ok(batch)], schema);
|
||||
let (new_ds, _stats) = job
|
||||
.execute(lance_datafusion::utils::reader_to_stream(Box::new(reader)))
|
||||
.await
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
self.table_state(dataset_uri, &new_ds).await
|
||||
}
|
||||
|
||||
pub async fn merge_insert_batches(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
ds: Dataset,
|
||||
batches: Vec<RecordBatch>,
|
||||
key_columns: Vec<String>,
|
||||
when_matched: WhenMatched,
|
||||
when_not_matched: WhenNotMatched,
|
||||
) -> Result<TableState> {
|
||||
if batches.is_empty() {
|
||||
return self.table_state(dataset_uri, &ds).await;
|
||||
}
|
||||
let batch = if batches.len() == 1 {
|
||||
batches.into_iter().next().unwrap()
|
||||
} else {
|
||||
let schema = batches[0].schema();
|
||||
concat_batches(&schema, &batches).map_err(|e| OmniError::Lance(e.to_string()))?
|
||||
};
|
||||
self.merge_insert_batch(
|
||||
dataset_uri,
|
||||
ds,
|
||||
batch,
|
||||
key_columns,
|
||||
when_matched,
|
||||
when_not_matched,
|
||||
)
|
||||
.await
|
||||
}
|
||||
|
||||
pub async fn delete_where(
|
||||
pub(crate) async fn delete_where(
|
||||
&self,
|
||||
dataset_uri: &str,
|
||||
ds: &mut Dataset,
|
||||
|
|
@ -1011,7 +885,7 @@ impl TableStore {
|
|||
}
|
||||
};
|
||||
// Assign real fragment IDs. Lance's `InsertBuilder::execute_uncommitted`
|
||||
// returns fragments with `id = 0` ("Temporary ID" — see lance-4.0.0
|
||||
// returns fragments with `id = 0` ("Temporary ID" — see lance-6.0.1
|
||||
// `dataset/write.rs:1044/1712`); the real assignment happens during
|
||||
// commit via `Transaction::fragments_with_ids`. Because we expose
|
||||
// these fragments to `scan_with_staged` *before* commit, two staged
|
||||
|
|
@ -1082,11 +956,12 @@ impl TableStore {
|
|||
));
|
||||
}
|
||||
|
||||
// Precondition for FirstSeen below. See the comment on
|
||||
// `merge_insert_batch` for why this check is here, not on the caller:
|
||||
// every call path that reaches stage_merge_insert (load,
|
||||
// MutationStaging::finalize, branch_merge::publish_rewritten_merge_table)
|
||||
// must hand in a source batch that is unique by `key_columns`.
|
||||
// Precondition for the FirstSeen workaround below: every call path that
|
||||
// reaches stage_merge_insert (load, MutationStaging::finalize,
|
||||
// branch_merge::publish_rewritten_merge_table) must hand in a source
|
||||
// batch that is unique by `key_columns`. Without this check,
|
||||
// `SourceDedupeBehavior::FirstSeen` would silently collapse genuine
|
||||
// duplicates instead of erroring.
|
||||
check_batch_unique_by_keys(&batch, &key_columns, "stage_merge_insert")?;
|
||||
|
||||
let ds = Arc::new(ds);
|
||||
|
|
@ -1094,11 +969,21 @@ impl TableStore {
|
|||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
builder.when_matched(when_matched);
|
||||
builder.when_not_matched(when_not_matched);
|
||||
// See `merge_insert_batch` for the FirstSeen rationale. Workaround
|
||||
// for the Lance 4.0.x bug class where sequential merge_insert /
|
||||
// update against rows previously rewritten by merge_insert trips
|
||||
// Lance's `processed_row_ids` HashSet and errors under the default
|
||||
// `SourceDedupeBehavior::Fail`. Retire when upstream Lance is fixed.
|
||||
// Workaround for a Lance bug class where sequential merge_insert calls
|
||||
// against rows previously rewritten by merge_insert produce a spurious
|
||||
// "Ambiguous merge inserts: multiple source rows match the same target
|
||||
// row on (id = ...)" error. Lance's `processed_row_ids:
|
||||
// Mutex<HashSet<u64>>` (lance-6.0.1 `src/dataset/write/merge_insert.rs`)
|
||||
// double-processes the same source/target match against datasets
|
||||
// previously rewritten by merge_insert, and the default
|
||||
// `SourceDedupeBehavior::Fail` errors on the second insertion; FirstSeen
|
||||
// makes Lance skip the duplicate match instead. Correctness-preserving
|
||||
// because every call path pre-dedupes the source batch by id or surfaces
|
||||
// a real source dup via `check_batch_unique_by_keys` above (load:
|
||||
// `enforce_unique_constraints_intra_batch`; mutate:
|
||||
// `MutationStaging::finalize`; branch-merge: the `OrderedTableCursor`
|
||||
// walk in `exec/merge.rs`). Retire when upstream Lance fixes the bug
|
||||
// class. Tracked at MR-957; upstream: lance-format/lance#6877.
|
||||
builder.source_dedupe_behavior(SourceDedupeBehavior::FirstSeen);
|
||||
let job = builder
|
||||
.try_build()
|
||||
|
|
@ -1174,40 +1059,51 @@ impl TableStore {
|
|||
/// MR-793 Phase 2: introduces this for the schema_apply rewrite path.
|
||||
/// Lance API verified in `.context/mr-793-design.md` Appendix A.1.
|
||||
pub async fn stage_overwrite(&self, ds: &Dataset, batch: RecordBatch) -> Result<StagedWrite> {
|
||||
if batch.num_rows() == 0 {
|
||||
return Err(OmniError::manifest_internal(
|
||||
"stage_overwrite called with empty batch".to_string(),
|
||||
));
|
||||
}
|
||||
// `enable_stable_row_ids: true` is defensive — empirically Lance 4.0.0
|
||||
// `enable_stable_row_ids: true` is defensive — empirically Lance 6.0.1
|
||||
// preserves the source dataset's flag through `Operation::Overwrite`
|
||||
// when WriteParams omits it (pinned by
|
||||
// `stage_overwrite_preserves_stable_row_ids` in tests/staged_writes.rs),
|
||||
// but setting it explicitly matches the public `overwrite_dataset`
|
||||
// path and keeps the invariant documented at every Overwrite site
|
||||
// but setting it explicitly keeps the invariant documented at every Overwrite site
|
||||
// (see docs/storage.md "Stable row IDs"). Setting it on an existing
|
||||
// dataset that was created without stable row IDs is a no-op per
|
||||
// Lance's row-id-lineage spec, so this stays correct for legacy
|
||||
// datasets.
|
||||
let params = WriteParams {
|
||||
mode: WriteMode::Overwrite,
|
||||
enable_stable_row_ids: true,
|
||||
allow_external_blob_outside_bases: true,
|
||||
..Default::default()
|
||||
};
|
||||
let transaction = InsertBuilder::new(Arc::new(ds.clone()))
|
||||
.with_params(¶ms)
|
||||
.execute_uncommitted(vec![batch])
|
||||
.await
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
let mut new_fragments = match &transaction.operation {
|
||||
Operation::Overwrite { fragments, .. } => fragments.clone(),
|
||||
other => {
|
||||
return Err(OmniError::manifest_internal(format!(
|
||||
"stage_overwrite: unexpected Lance operation {:?}",
|
||||
std::mem::discriminant(other)
|
||||
)));
|
||||
}
|
||||
let (transaction, mut new_fragments) = if batch.num_rows() == 0 {
|
||||
let schema = LanceSchema::try_from(batch.schema().as_ref())
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
let transaction = TransactionBuilder::new(
|
||||
ds.manifest.version,
|
||||
Operation::Overwrite {
|
||||
fragments: Vec::new(),
|
||||
schema,
|
||||
config_upsert_values: None,
|
||||
initial_bases: None,
|
||||
},
|
||||
)
|
||||
.build();
|
||||
(transaction, Vec::new())
|
||||
} else {
|
||||
let params = WriteParams {
|
||||
mode: WriteMode::Overwrite,
|
||||
enable_stable_row_ids: true,
|
||||
allow_external_blob_outside_bases: true,
|
||||
..Default::default()
|
||||
};
|
||||
let transaction = InsertBuilder::new(Arc::new(ds.clone()))
|
||||
.with_params(¶ms)
|
||||
.execute_uncommitted(vec![batch])
|
||||
.await
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
let new_fragments = match &transaction.operation {
|
||||
Operation::Overwrite { fragments, .. } => fragments.clone(),
|
||||
other => {
|
||||
return Err(OmniError::manifest_internal(format!(
|
||||
"stage_overwrite: unexpected Lance operation {:?}",
|
||||
std::mem::discriminant(other)
|
||||
)));
|
||||
}
|
||||
};
|
||||
(transaction, new_fragments)
|
||||
};
|
||||
// Overwrite REPLACES every committed fragment, and Lance restarts
|
||||
// fragment-ID and row-ID counters at the post-commit version.
|
||||
|
|
@ -1220,7 +1116,7 @@ impl TableStore {
|
|||
// 2) For stable-row-id datasets, assign row_id_meta starting
|
||||
// at 0 (Overwrite is a fresh-start) so `scan_with_staged`
|
||||
// doesn't hit the "Missing row id meta" panic in
|
||||
// lance-4.0.0 dataset/rowids.rs:22.
|
||||
// lance-6.0.1 dataset/rowids.rs:22.
|
||||
assign_fragment_ids(&mut new_fragments, 1);
|
||||
if ds.manifest.uses_stable_row_ids() {
|
||||
assign_row_id_meta(&mut new_fragments, 0)?;
|
||||
|
|
@ -1244,7 +1140,7 @@ impl TableStore {
|
|||
/// `IndexMetadata`; we manually wrap it in `Operation::CreateIndex
|
||||
/// { new_indices, removed_indices }` via the public `TransactionBuilder`,
|
||||
/// replicating the simple (non-segment-commit-path) branch of Lance's
|
||||
/// `CreateIndexBuilder::execute` (lance-4.0.0 `src/index/create.rs:502-512`).
|
||||
/// `CreateIndexBuilder::execute` (lance-6.0.1 `src/index/create.rs:502-512`).
|
||||
///
|
||||
/// `removed_indices` mirrors `execute()` lines 466-476: when the
|
||||
/// build replaces an existing same-named index, those entries are
|
||||
|
|
@ -1253,7 +1149,7 @@ impl TableStore {
|
|||
/// MR-793 Phase 2: scalar index types (BTree, Inverted) are
|
||||
/// stage-able. Vector indices are NOT (segment-commit-path requires
|
||||
/// `build_index_metadata_from_segments` which is `pub(crate)` in
|
||||
/// lance-4.0.0); see `create_vector_index` and Appendix A.3.
|
||||
/// lance-6.0.1); see `create_vector_index` and Appendix A.3.
|
||||
pub async fn stage_create_btree_index(
|
||||
&self,
|
||||
ds: &Dataset,
|
||||
|
|
@ -1348,7 +1244,7 @@ impl TableStore {
|
|||
/// committed fragments carry; Lance's optimizer drops them from the
|
||||
/// filtered scan even when their data would match. Staged-fragment
|
||||
/// rows are silently absent from the result. `scanner.use_stats(false)`
|
||||
/// does not fix this in lance 4.0.0. Callers needing correct filtered
|
||||
/// does not fix this in lance 6.0.1. Callers needing correct filtered
|
||||
/// reads against staged data should use a different strategy — the
|
||||
/// engine's `MutationStaging` accumulator unions in-memory pending
|
||||
/// batches with the committed scan via DataFusion `MemTable` (see
|
||||
|
|
@ -1572,25 +1468,7 @@ impl TableStore {
|
|||
}))
|
||||
}
|
||||
|
||||
pub async fn create_btree_index(&self, ds: &mut Dataset, columns: &[&str]) -> Result<()> {
|
||||
let params = ScalarIndexParams::default();
|
||||
ds.create_index_builder(columns, IndexType::BTree, ¶ms)
|
||||
.replace(true)
|
||||
.await
|
||||
.map(|_| ())
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))
|
||||
}
|
||||
|
||||
pub async fn create_inverted_index(&self, ds: &mut Dataset, column: &str) -> Result<()> {
|
||||
let params = InvertedIndexParams::default();
|
||||
ds.create_index_builder(&[column], IndexType::Inverted, ¶ms)
|
||||
.replace(true)
|
||||
.await
|
||||
.map(|_| ())
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))
|
||||
}
|
||||
|
||||
pub async fn create_vector_index(&self, ds: &mut Dataset, column: &str) -> Result<()> {
|
||||
pub(crate) async fn create_vector_index(&self, ds: &mut Dataset, column: &str) -> Result<()> {
|
||||
let params = lance::index::vector::VectorIndexParams::ivf_flat(1, MetricType::L2);
|
||||
ds.create_index_builder(&[column], IndexType::Vector, ¶ms)
|
||||
.replace(true)
|
||||
|
|
@ -1674,7 +1552,7 @@ fn prior_stages_fragment_count(prior_stages: &[StagedWrite]) -> u64 {
|
|||
}
|
||||
|
||||
/// Assign sequential fragment IDs starting at `start_id`. Mirrors Lance's
|
||||
/// commit-time `Transaction::fragments_with_ids` (lance-4.0.0
|
||||
/// commit-time `Transaction::fragments_with_ids` (lance-6.0.1
|
||||
/// `dataset/transaction.rs:1456`) — fragments produced by
|
||||
/// `InsertBuilder::execute_uncommitted` start with `id = 0` as a temporary
|
||||
/// placeholder; we renumber here so they don't collide with committed
|
||||
|
|
@ -1705,7 +1583,7 @@ fn prior_stages_row_count(prior_stages: &[StagedWrite]) -> Result<u64> {
|
|||
|
||||
/// Assign sequential row IDs to fragments that lack them, starting from
|
||||
/// `start_row_id`. Mirrors the relevant arm of Lance's
|
||||
/// `Transaction::assign_row_ids` (lance-4.0.0 `dataset/transaction.rs:2682`)
|
||||
/// `Transaction::assign_row_ids` (lance-6.0.1 `dataset/transaction.rs:2682`)
|
||||
/// for the `row_id_meta = None` case — fragments produced by
|
||||
/// `InsertBuilder::execute_uncommitted` against a stable-row-id dataset.
|
||||
///
|
||||
|
|
@ -1878,7 +1756,7 @@ fn combine_committed_with_staged(ds: &Dataset, staged: &[StagedWrite]) -> Vec<Fr
|
|||
combined
|
||||
}
|
||||
|
||||
/// Precondition guard for `merge_insert_batch` and `stage_merge_insert`.
|
||||
/// Precondition guard for `stage_merge_insert`.
|
||||
/// Both opt into `SourceDedupeBehavior::FirstSeen` to suppress the Lance
|
||||
/// `processed_row_ids` bug (MR-957). FirstSeen would *also* silently
|
||||
/// collapse genuine duplicate source keys; this check restores fail-fast
|
||||
|
|
|
|||
|
|
@ -126,7 +126,7 @@ async fn load_merge_upserts_existing_and_inserts_new() {
|
|||
/// source batch had one row per key.
|
||||
///
|
||||
/// Triggered by Lance's `processed_row_ids: Mutex<HashSet<u64>>`
|
||||
/// (lance-4.0.0 `src/dataset/write/merge_insert.rs:2099`) double-
|
||||
/// (lance-6.0.1 `src/dataset/write/merge_insert.rs:2099`) double-
|
||||
/// processing the same source/target match against datasets previously
|
||||
/// rewritten by merge_insert. Worked around by opting
|
||||
/// `MergeInsertBuilder` into `SourceDedupeBehavior::FirstSeen` in
|
||||
|
|
|
|||
|
|
@ -907,6 +907,76 @@ async fn recovery_rolls_forward_load_on_feature_branch() {
|
|||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn recovery_rolls_forward_load_overwrite() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
|
||||
let _scenario = FailScenario::setup();
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap().to_string();
|
||||
let operation_id;
|
||||
let parent_commit_id;
|
||||
|
||||
{
|
||||
let mut db = Omnigraph::init(&uri, helpers::TEST_SCHEMA).await.unwrap();
|
||||
load_jsonl(&mut db, helpers::TEST_DATA, LoadMode::Overwrite)
|
||||
.await
|
||||
.unwrap();
|
||||
parent_commit_id = branch_head_commit_id(dir.path(), "main").await.unwrap();
|
||||
|
||||
let _failpoint = ScopedFailPoint::new("mutation.post_finalize_pre_publisher", "return");
|
||||
let err = db
|
||||
.load(
|
||||
"main",
|
||||
r#"{"type":"Person","data":{"name":"OverwriteLoad","age":41}}
|
||||
"#,
|
||||
LoadMode::Overwrite,
|
||||
)
|
||||
.await
|
||||
.unwrap_err();
|
||||
assert!(
|
||||
err.to_string()
|
||||
.contains("injected failpoint triggered: mutation.post_finalize_pre_publisher"),
|
||||
"unexpected error: {err}"
|
||||
);
|
||||
operation_id = single_sidecar_operation_id(dir.path());
|
||||
}
|
||||
|
||||
let db = Omnigraph::open(&uri).await.unwrap();
|
||||
assert_eq!(
|
||||
helpers::count_rows(&db, "node:Person").await,
|
||||
1,
|
||||
"overwrite row must be visible after recovery rolls the load forward"
|
||||
);
|
||||
drop(db);
|
||||
|
||||
assert_post_recovery_invariants(
|
||||
dir.path(),
|
||||
&operation_id,
|
||||
RecoveryExpectation::RolledForward {
|
||||
tables: vec![
|
||||
TableExpectation::main("node:Person")
|
||||
.expected_recovery_parent_commit_id(parent_commit_id)
|
||||
.follow_up_mutation(FollowUpMutation::new(
|
||||
"main",
|
||||
MUTATION_QUERIES,
|
||||
"insert_person",
|
||||
mixed_params(&[("$name", "AfterOverwriteLoad")], &[("$age", 42)]),
|
||||
)),
|
||||
],
|
||||
},
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let db = Omnigraph::open(&uri).await.unwrap();
|
||||
assert_eq!(
|
||||
helpers::count_rows(&db, "node:Person").await,
|
||||
2,
|
||||
"follow-up mutation must succeed after overwrite load recovery"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn recovery_rolls_forward_ensure_indices_on_feature_branch() {
|
||||
use lance::index::DatasetIndexExt;
|
||||
|
|
@ -1132,7 +1202,6 @@ async fn refresh_runs_roll_forward_recovery_in_process() {
|
|||
#[tokio::test]
|
||||
async fn refresh_defers_rollback_eligible_sidecar_to_next_open() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
use omnigraph::table_store::TableStore;
|
||||
|
||||
let _scenario = FailScenario::setup();
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
|
@ -1162,12 +1231,8 @@ async fn refresh_defers_rollback_eligible_sidecar_to_next_open() {
|
|||
// touching the manifest) so the classifier can reach UnexpectedAtP1
|
||||
// / UnexpectedMultistep / RolledPastExpected paths that require
|
||||
// a real restore on rollback.
|
||||
let store = TableStore::new(&uri);
|
||||
let mut ds = lance::Dataset::open(&person_uri).await.unwrap();
|
||||
store
|
||||
.delete_where(&person_uri, &mut ds, "1 = 2")
|
||||
.await
|
||||
.unwrap();
|
||||
helpers::lance_delete_inline(&mut ds, "1 = 2").await;
|
||||
let head_after_drift = ds.version().version;
|
||||
assert_eq!(head_after_drift, manifest_pin + 1);
|
||||
|
||||
|
|
@ -1697,8 +1762,9 @@ async fn optimize_phase_b_failure_recovered_on_next_open() {
|
|||
ScopedFailPoint::new("optimize.post_phase_b_pre_manifest_commit", "return");
|
||||
let err = db.optimize().await.unwrap_err();
|
||||
assert!(
|
||||
err.to_string()
|
||||
.contains("injected failpoint triggered: optimize.post_phase_b_pre_manifest_commit"),
|
||||
err.to_string().contains(
|
||||
"injected failpoint triggered: optimize.post_phase_b_pre_manifest_commit"
|
||||
),
|
||||
"unexpected error: {err}"
|
||||
);
|
||||
|
||||
|
|
|
|||
|
|
@ -29,15 +29,21 @@
|
|||
//! the cross-table manifest commit. Documented exception.
|
||||
//! - `crates/omnigraph/src/storage_layer.rs` — IS the trait module.
|
||||
//!
|
||||
//! ## Transitional allow-list
|
||||
//! ## Allow-list shape
|
||||
//!
|
||||
//! The migration of writers onto staged primitives is incremental.
|
||||
//! Several writers (ensure_indices, branch_merge, schema_apply rewrites)
|
||||
//! already route through the staged primitives; others (bulk loader,
|
||||
//! exec/mutation, exec/query) still use the legacy inherent
|
||||
//! `TableStore` methods — they're not visible at the trait boundary, but
|
||||
//! they DO call lance types. The file-level allow-list below reflects
|
||||
//! this transitional state and tightens as call sites migrate.
|
||||
//! After MR-854, `db.storage()` (`&dyn TableStorage`) exposes only staged
|
||||
//! primitives + reads. The inline-commit writes live on a separate
|
||||
//! `InlineCommitResidual` trait reached via
|
||||
//! `Omnigraph::storage_inline_residual()`, so the default storage surface
|
||||
//! cannot couple "write bytes" with "advance HEAD" — engine code that
|
||||
//! wants an inline residual must name the residual accessor explicitly.
|
||||
//! The only residuals are `delete_where` (Lance #6658 / v7.x) and
|
||||
//! `create_vector_index` (Lance #6666). The dead legacy methods
|
||||
//! (trait `append_batch` / `merge_insert_batches`, inherent
|
||||
//! `merge_insert_batch{,es}`, `create_{btree,inverted}_index`) were
|
||||
//! removed entirely. This guard's scope is unchanged: it catches direct
|
||||
//! `lance::*` inline-commit misuse outside the storage layer. The
|
||||
//! file-level allow-list below matches that boundary.
|
||||
|
||||
use std::path::{Path, PathBuf};
|
||||
|
||||
|
|
|
|||
|
|
@ -195,6 +195,14 @@ pub async fn diff_since_branch(
|
|||
.await
|
||||
}
|
||||
|
||||
/// Advance a Lance dataset HEAD directly from tests without going through
|
||||
/// OmniGraph's storage residual surface. Used to synthesize uncovered drift.
|
||||
pub async fn lance_delete_inline(ds: &mut lance::Dataset, filter: &str) -> usize {
|
||||
let result = ds.delete(filter).await.unwrap();
|
||||
*ds = (*result.new_dataset).clone();
|
||||
result.num_deleted_rows as usize
|
||||
}
|
||||
|
||||
/// Build a ParamMap from string key-value pairs.
|
||||
pub fn params(pairs: &[(&str, &str)]) -> ParamMap {
|
||||
pairs
|
||||
|
|
@ -258,6 +266,27 @@ pub fn vector_and_string_params(
|
|||
map
|
||||
}
|
||||
|
||||
/// Test-only helper: perform a raw `Dataset::append` against Lance,
|
||||
/// advancing Lance HEAD without going through the manifest. Used by
|
||||
/// `recovery::*` and `staged_writes::*` tests that deliberately set up
|
||||
/// HEAD-ahead-of-manifest drift scenarios.
|
||||
///
|
||||
/// This mirrors the body of the engine's inline-commit
|
||||
/// `TableStore::append_batch` (which is `pub(crate)` after MR-854) —
|
||||
/// kept here as a test helper because integration tests need to
|
||||
/// simulate drift without depending on the demoted crate-internal API.
|
||||
pub async fn lance_append_inline(ds: &mut lance::Dataset, batch: RecordBatch) {
|
||||
use lance::dataset::{WriteMode, WriteParams};
|
||||
let schema = batch.schema();
|
||||
let reader = arrow_array::RecordBatchIterator::new(vec![Ok(batch)], schema);
|
||||
let params = WriteParams {
|
||||
mode: WriteMode::Append,
|
||||
allow_external_blob_outside_bases: true,
|
||||
..Default::default()
|
||||
};
|
||||
ds.append(reader, Some(params)).await.unwrap();
|
||||
}
|
||||
|
||||
pub fn s3_test_graph_uri(suite: &str) -> Option<String> {
|
||||
let bucket = std::env::var("OMNIGRAPH_S3_TEST_BUCKET").ok()?;
|
||||
let prefix = std::env::var("OMNIGRAPH_S3_TEST_PREFIX")
|
||||
|
|
|
|||
|
|
@ -175,7 +175,6 @@ async fn read_only_open_skips_recovery_sweep() {
|
|||
#[tokio::test]
|
||||
async fn recovery_rolls_back_synthetic_drift_on_open() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
use omnigraph::table_store::TableStore;
|
||||
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap();
|
||||
|
|
@ -202,13 +201,9 @@ async fn recovery_rolls_back_synthetic_drift_on_open() {
|
|||
// residual the sweep recovers from is the manifest-vs-Lance-HEAD gap;
|
||||
// it's agnostic to *what* op caused the gap.
|
||||
let person_uri = node_table_uri(uri, "Person");
|
||||
let store = TableStore::new(uri);
|
||||
let mut ds = Dataset::open(&person_uri).await.unwrap();
|
||||
let head_before_drift = ds.version().version;
|
||||
let _ = store
|
||||
.delete_where(&person_uri, &mut ds, "1 = 2")
|
||||
.await
|
||||
.unwrap();
|
||||
let _ = helpers::lance_delete_inline(&mut ds, "1 = 2").await;
|
||||
let head_after_drift = ds.version().version;
|
||||
assert_eq!(
|
||||
head_after_drift,
|
||||
|
|
@ -290,7 +285,6 @@ async fn recovery_rolls_back_synthetic_drift_on_open() {
|
|||
async fn recovery_rollback_converges_manifest_so_schema_apply_succeeds() {
|
||||
use omnigraph::db::ReadTarget;
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
use omnigraph::table_store::TableStore;
|
||||
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap();
|
||||
|
|
@ -310,13 +304,9 @@ async fn recovery_rollback_converges_manifest_so_schema_apply_succeeds() {
|
|||
// Forge a Phase-B residual: advance Person's Lance HEAD without publishing to
|
||||
// the manifest (the manifest pin stays at the load's committed version).
|
||||
let person_uri = node_table_uri(uri, "Person");
|
||||
let store = TableStore::new(uri);
|
||||
let mut ds = Dataset::open(&person_uri).await.unwrap();
|
||||
let manifest_pin = ds.version().version;
|
||||
let _ = store
|
||||
.delete_where(&person_uri, &mut ds, "1 = 2")
|
||||
.await
|
||||
.unwrap();
|
||||
let _ = helpers::lance_delete_inline(&mut ds, "1 = 2").await;
|
||||
drop(ds);
|
||||
|
||||
// Roll-back-classified sidecar (post_commit_pin != observed head ⇒
|
||||
|
|
@ -518,7 +508,6 @@ async fn count_recovery_actor_commits(graph_root: &Path) -> usize {
|
|||
#[tokio::test]
|
||||
async fn recovery_rolls_forward_after_phase_b_completes() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
use omnigraph::table_store::TableStore;
|
||||
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap();
|
||||
|
|
@ -535,16 +524,12 @@ async fn recovery_rolls_forward_after_phase_b_completes() {
|
|||
drop(db);
|
||||
|
||||
let person_uri = node_table_uri(uri, "Person");
|
||||
let store = TableStore::new(uri);
|
||||
let mut ds = Dataset::open(&person_uri).await.unwrap();
|
||||
let head_before = ds.version().version;
|
||||
|
||||
// Synthesize a successful Phase B: advance Lance HEAD by one
|
||||
// (delete_where with no-match — no fragment changes, but version bumps).
|
||||
let _ = store
|
||||
.delete_where(&person_uri, &mut ds, "1 = 2")
|
||||
.await
|
||||
.unwrap();
|
||||
let _ = helpers::lance_delete_inline(&mut ds, "1 = 2").await;
|
||||
let head_after = ds.version().version;
|
||||
assert_eq!(head_after, head_before + 1);
|
||||
|
||||
|
|
@ -728,7 +713,6 @@ async fn recovery_records_rolled_forward_for_stale_sidecar_after_successful_roll
|
|||
#[tokio::test]
|
||||
async fn recovery_rolls_back_records_audit_row_with_recovery_actor() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
use omnigraph::table_store::TableStore;
|
||||
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap();
|
||||
|
|
@ -742,13 +726,9 @@ async fn recovery_rolls_back_records_audit_row_with_recovery_actor() {
|
|||
drop(db);
|
||||
|
||||
let person_uri = node_table_uri(uri, "Person");
|
||||
let store = TableStore::new(uri);
|
||||
let mut ds = Dataset::open(&person_uri).await.unwrap();
|
||||
let head_before = ds.version().version;
|
||||
let _ = store
|
||||
.delete_where(&person_uri, &mut ds, "1 = 2")
|
||||
.await
|
||||
.unwrap();
|
||||
let _ = helpers::lance_delete_inline(&mut ds, "1 = 2").await;
|
||||
let head_after = ds.version().version;
|
||||
let _ = head_after;
|
||||
|
||||
|
|
@ -795,7 +775,6 @@ async fn recovery_rolls_back_records_audit_row_with_recovery_actor() {
|
|||
#[tokio::test]
|
||||
async fn recovery_rolls_forward_with_null_actor() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
use omnigraph::table_store::TableStore;
|
||||
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap();
|
||||
|
|
@ -809,13 +788,9 @@ async fn recovery_rolls_forward_with_null_actor() {
|
|||
drop(db);
|
||||
|
||||
let person_uri = node_table_uri(uri, "Person");
|
||||
let store = TableStore::new(uri);
|
||||
let mut ds = Dataset::open(&person_uri).await.unwrap();
|
||||
let head_before = ds.version().version;
|
||||
let _ = store
|
||||
.delete_where(&person_uri, &mut ds, "1 = 2")
|
||||
.await
|
||||
.unwrap();
|
||||
let _ = helpers::lance_delete_inline(&mut ds, "1 = 2").await;
|
||||
let head_after = ds.version().version;
|
||||
|
||||
// Sidecar with no actor_id (CLI-driven mutation; common case).
|
||||
|
|
@ -871,7 +846,6 @@ async fn recovery_rolls_forward_with_null_actor() {
|
|||
#[tokio::test]
|
||||
async fn recovery_processes_multiple_sidecars_with_fresh_snapshot_per_iter() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
use omnigraph::table_store::TableStore;
|
||||
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap();
|
||||
|
|
@ -889,21 +863,14 @@ async fn recovery_processes_multiple_sidecars_with_fresh_snapshot_per_iter() {
|
|||
// Synthesize drift on both tables independently.
|
||||
let person_uri = node_table_uri(uri, "Person");
|
||||
let company_uri = node_table_uri(uri, "Company");
|
||||
let store = TableStore::new(uri);
|
||||
let mut person_ds = Dataset::open(&person_uri).await.unwrap();
|
||||
let person_pre = person_ds.version().version;
|
||||
let _ = store
|
||||
.delete_where(&person_uri, &mut person_ds, "1 = 2")
|
||||
.await
|
||||
.unwrap();
|
||||
let _ = helpers::lance_delete_inline(&mut person_ds, "1 = 2").await;
|
||||
let person_post = person_ds.version().version;
|
||||
|
||||
let mut company_ds = Dataset::open(&company_uri).await.unwrap();
|
||||
let company_pre = company_ds.version().version;
|
||||
let _ = store
|
||||
.delete_where(&company_uri, &mut company_ds, "1 = 2")
|
||||
.await
|
||||
.unwrap();
|
||||
let _ = helpers::lance_delete_inline(&mut company_ds, "1 = 2").await;
|
||||
let company_post = company_ds.version().version;
|
||||
|
||||
// Drop two sidecars; ULID prefix ensures sort order is A then B.
|
||||
|
|
@ -1083,7 +1050,6 @@ async fn recovery_ensure_indices_handles_empty_tables() {
|
|||
#[tokio::test]
|
||||
async fn recovery_multi_sidecar_requires_fresh_snapshot_for_correctness() {
|
||||
use omnigraph::loader::{LoadMode, load_jsonl};
|
||||
use omnigraph::table_store::TableStore;
|
||||
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap();
|
||||
|
|
@ -1102,7 +1068,6 @@ async fn recovery_multi_sidecar_requires_fresh_snapshot_for_correctness() {
|
|||
drop(db);
|
||||
|
||||
let person_uri = node_table_uri(uri, "Person");
|
||||
let store = TableStore::new(uri);
|
||||
let mut ds = Dataset::open(&person_uri).await.unwrap();
|
||||
let v1 = ds.version().version;
|
||||
|
||||
|
|
@ -1116,23 +1081,9 @@ async fn recovery_multi_sidecar_requires_fresh_snapshot_for_correctness() {
|
|||
// Bypassing __manifest is what `delete_where` and `append_batch`
|
||||
// both do (direct on Lance); using append_batch (instead of no-op
|
||||
// deletes) is what makes the fragment-set differ across versions.
|
||||
store
|
||||
.append_batch(
|
||||
&person_uri,
|
||||
&mut ds,
|
||||
person_batch(&[("bob-id", "bob", Some(25))]),
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
helpers::lance_append_inline(&mut ds, person_batch(&[("bob-id", "bob", Some(25))])).await;
|
||||
let v2 = ds.version().version;
|
||||
store
|
||||
.append_batch(
|
||||
&person_uri,
|
||||
&mut ds,
|
||||
person_batch(&[("carol-id", "carol", Some(40))]),
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
helpers::lance_append_inline(&mut ds, person_batch(&[("carol-id", "carol", Some(40))])).await;
|
||||
let v3 = ds.version().version;
|
||||
assert_eq!(v2, v1 + 1);
|
||||
assert_eq!(v3, v2 + 1);
|
||||
|
|
@ -1297,14 +1248,7 @@ async fn recovery_classifies_feature_branch_sidecar_against_feature_branch() {
|
|||
.open_dataset_head(&person_uri, feature_branch_name.as_deref())
|
||||
.await
|
||||
.unwrap();
|
||||
store
|
||||
.append_batch(
|
||||
&person_uri,
|
||||
&mut ds,
|
||||
person_batch(&[("carol-id", "carol", Some(40))]),
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
helpers::lance_append_inline(&mut ds, person_batch(&[("carol-id", "carol", Some(40))])).await;
|
||||
let v_head = ds.version().version;
|
||||
assert_eq!(v_head, v_pin + 1, "append must advance HEAD by 1");
|
||||
|
||||
|
|
@ -1419,14 +1363,7 @@ async fn recovery_rolls_back_feature_branch_sidecar_against_feature_branch() {
|
|||
.open_dataset_head(&person_uri, feature_branch_name.as_deref())
|
||||
.await
|
||||
.unwrap();
|
||||
store
|
||||
.append_batch(
|
||||
&person_uri,
|
||||
&mut ds,
|
||||
person_batch(&[("dave-id", "dave", Some(50))]),
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
helpers::lance_append_inline(&mut ds, person_batch(&[("dave-id", "dave", Some(50))])).await;
|
||||
let v_head = ds.version().version;
|
||||
assert_eq!(v_head, v_pin + 1);
|
||||
|
||||
|
|
|
|||
|
|
@ -23,6 +23,9 @@ use arrow_schema::{DataType, Field, Schema};
|
|||
use futures::TryStreamExt;
|
||||
use lance::Dataset;
|
||||
use lance::dataset::{WhenMatched, WhenNotMatched};
|
||||
use lance::index::DatasetIndexExt;
|
||||
use lance_index::IndexType;
|
||||
use lance_linalg::distance::MetricType;
|
||||
use lance_table::format::Fragment;
|
||||
use omnigraph::table_store::{StagedWrite, TableStore};
|
||||
use std::sync::Arc;
|
||||
|
|
@ -34,6 +37,22 @@ fn person_schema() -> Arc<Schema> {
|
|||
]))
|
||||
}
|
||||
|
||||
/// Test-only helper: raw `Dataset::append` to advance Lance HEAD without
|
||||
/// going through the manifest. Mirrors `TableStore::append_batch`'s body
|
||||
/// (which is `pub(crate)` after MR-854) — kept local so these
|
||||
/// drift-simulation tests don't depend on the demoted crate-internal API.
|
||||
async fn lance_append_inline_local(ds: &mut Dataset, batch: RecordBatch) {
|
||||
use lance::dataset::{WriteMode, WriteParams};
|
||||
let schema = batch.schema();
|
||||
let reader = arrow_array::RecordBatchIterator::new(vec![Ok(batch)], schema);
|
||||
let params = WriteParams {
|
||||
mode: WriteMode::Append,
|
||||
allow_external_blob_outside_bases: true,
|
||||
..Default::default()
|
||||
};
|
||||
ds.append(reader, Some(params)).await.unwrap();
|
||||
}
|
||||
|
||||
fn person_batch(rows: &[(&str, Option<i32>)]) -> RecordBatch {
|
||||
let ids: Vec<&str> = rows.iter().map(|(id, _)| *id).collect();
|
||||
let ages: Vec<Option<i32>> = rows.iter().map(|(_, age)| *age).collect();
|
||||
|
|
@ -351,7 +370,7 @@ async fn stage_merge_insert_then_commit_persists_merged_view() {
|
|||
/// `write_fragments_internal` lack per-column statistics. The result
|
||||
/// contains only matching committed rows; matching staged rows are
|
||||
/// silently absent. `scanner.use_stats(false)` does not bypass this in
|
||||
/// lance 4.0.0.
|
||||
/// lance 6.0.1.
|
||||
///
|
||||
/// This test pins the actual behavior so a future change either
|
||||
/// preserves it (and updates the doc) or fixes it (and rewrites this
|
||||
|
|
@ -616,6 +635,58 @@ async fn stage_overwrite_replaces_all_fragments() {
|
|||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn stage_overwrite_empty_batch_replaces_all_rows() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = format!("{}/people.lance", dir.path().to_str().unwrap());
|
||||
let store = TableStore::new(dir.path().to_str().unwrap());
|
||||
|
||||
let ds = TableStore::write_dataset(
|
||||
&uri,
|
||||
person_batch(&[("alice", Some(30)), ("bob", Some(25))]),
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
let pre_version = ds.version().version;
|
||||
|
||||
let target_schema = Arc::new(Schema::new(vec![
|
||||
Field::new("id", DataType::Utf8, false),
|
||||
Field::new("age", DataType::Int32, true),
|
||||
Field::new("nickname", DataType::Utf8, true),
|
||||
]));
|
||||
let staged = store
|
||||
.stage_overwrite(&ds, RecordBatch::new_empty(target_schema.clone()))
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(
|
||||
staged.new_fragments.is_empty(),
|
||||
"empty overwrite should produce a zero-fragment Lance Overwrite transaction"
|
||||
);
|
||||
assert_eq!(
|
||||
staged.removed_fragment_ids.len(),
|
||||
ds.manifest.fragments.len(),
|
||||
"empty overwrite still removes every committed fragment"
|
||||
);
|
||||
assert_eq!(
|
||||
ds.version().version,
|
||||
pre_version,
|
||||
"staging empty overwrite must not advance HEAD"
|
||||
);
|
||||
|
||||
let new_ds = store
|
||||
.commit_staged(Arc::new(ds.clone()), staged.transaction)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(new_ds.version().version, pre_version + 1);
|
||||
assert_eq!(new_ds.count_rows(None).await.unwrap(), 0);
|
||||
assert!(
|
||||
arrow_schema::Schema::from(new_ds.schema())
|
||||
.field_with_name("nickname")
|
||||
.is_ok(),
|
||||
"empty overwrite must commit the replacement batch schema"
|
||||
);
|
||||
}
|
||||
|
||||
/// `stage_create_btree_index` writes index segments to object storage
|
||||
/// but does NOT advance Lance HEAD until `commit_staged`. After commit,
|
||||
/// the index is queryable.
|
||||
|
|
@ -699,7 +770,7 @@ async fn stage_create_inverted_index_does_not_advance_head_until_commit() {
|
|||
);
|
||||
}
|
||||
|
||||
/// Pin the inline-commit behavior of `delete_where`. Lance 4.0.0 does
|
||||
/// Pin the inline-commit behavior of `delete_where`. Lance 6.0.1 does
|
||||
/// NOT expose a public `DeleteJob::execute_uncommitted`
|
||||
/// (`pub(crate)` — see lance-format/lance#6658). The trait deliberately
|
||||
/// does NOT introduce a `stage_delete` wrapper that would secretly
|
||||
|
|
@ -714,7 +785,6 @@ async fn stage_create_inverted_index_does_not_advance_head_until_commit() {
|
|||
async fn delete_where_advances_head_inline_documents_residual() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = format!("{}/people.lance", dir.path().to_str().unwrap());
|
||||
let store = TableStore::new(dir.path().to_str().unwrap());
|
||||
|
||||
let mut ds = TableStore::write_dataset(
|
||||
&uri,
|
||||
|
|
@ -724,13 +794,11 @@ async fn delete_where_advances_head_inline_documents_residual() {
|
|||
.unwrap();
|
||||
let pre_version = ds.version().version;
|
||||
|
||||
let result = store
|
||||
.delete_where(&uri, &mut ds, "id = 'alice'")
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(result.deleted_rows, 1);
|
||||
let result = ds.delete("id = 'alice'").await.unwrap();
|
||||
ds = (*result.new_dataset).clone();
|
||||
assert_eq!(result.num_deleted_rows, 1);
|
||||
assert!(
|
||||
result.version > pre_version,
|
||||
ds.version().version > pre_version,
|
||||
"delete_where ADVANCES Lance HEAD inline (the residual). When \
|
||||
lance-format/lance#6658 ships and we migrate to stage_delete + \
|
||||
commit_staged, flip this assertion to assert that staging does \
|
||||
|
|
@ -739,9 +807,9 @@ async fn delete_where_advances_head_inline_documents_residual() {
|
|||
}
|
||||
|
||||
/// Companion to `delete_where_*`: pin the inline-commit behavior of
|
||||
/// `create_vector_index`. Lance 4.0.0 vector indices take the
|
||||
/// `create_vector_index`. Lance 6.0.1 vector indices take the
|
||||
/// "segment commit path" which calls `build_index_metadata_from_segments`
|
||||
/// (`pub(crate)` in lance-4.0.0 `src/index.rs:111`). Until upstream
|
||||
/// (`pub(crate)` in lance-6.0.1 `src/index.rs:111`). Until upstream
|
||||
/// exposes that helper (companion ticket to lance-format/lance#6658),
|
||||
/// the trait surface deliberately does NOT include
|
||||
/// `stage_create_vector_index` — same rationale as `stage_delete`'s
|
||||
|
|
@ -780,8 +848,9 @@ async fn create_vector_index_advances_head_inline_documents_residual() {
|
|||
let pre_version = ds.version().version;
|
||||
assert!(!store.has_vector_index(&ds, "embedding").await.unwrap());
|
||||
|
||||
store
|
||||
.create_vector_index(&mut ds, "embedding")
|
||||
let params = lance::index::vector::VectorIndexParams::ivf_flat(1, MetricType::L2);
|
||||
ds.create_index_builder(&["embedding"], IndexType::Vector, ¶ms)
|
||||
.replace(true)
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(
|
||||
|
|
@ -804,7 +873,7 @@ async fn create_vector_index_advances_head_inline_documents_residual() {
|
|||
/// The Lance source confirms this — `restore()` (no args) takes the
|
||||
/// currently-checked-out version's content and applies it via
|
||||
/// `apply_commit` against the latest manifest, advancing HEAD by one.
|
||||
/// See lance-4.0.0 `src/dataset.rs:1106` and the transaction-spec
|
||||
/// See lance-6.0.1 `src/dataset.rs:1106` and the transaction-spec
|
||||
/// example at https://lance.org/format/table/transaction/.
|
||||
///
|
||||
/// If the lance bump (4.0.0 → 4.x) ever changes this delta or the call
|
||||
|
|
@ -815,7 +884,6 @@ async fn create_vector_index_advances_head_inline_documents_residual() {
|
|||
async fn lance_restore_appends_one_commit_with_checked_out_content() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = format!("{}/people.lance", dir.path().to_str().unwrap());
|
||||
let store = TableStore::new(dir.path().to_str().unwrap());
|
||||
|
||||
// Build version history: v1 = {alice}, v2 = {alice, bob}, v3 = {alice, bob, carol}.
|
||||
let mut ds = TableStore::write_dataset(&uri, person_batch(&[("alice", Some(30))]))
|
||||
|
|
@ -823,16 +891,10 @@ async fn lance_restore_appends_one_commit_with_checked_out_content() {
|
|||
.unwrap();
|
||||
assert_eq!(ds.version().version, 1);
|
||||
|
||||
store
|
||||
.append_batch(&uri, &mut ds, person_batch(&[("bob", Some(25))]))
|
||||
.await
|
||||
.unwrap();
|
||||
lance_append_inline_local(&mut ds, person_batch(&[("bob", Some(25))])).await;
|
||||
assert_eq!(ds.version().version, 2);
|
||||
|
||||
store
|
||||
.append_batch(&uri, &mut ds, person_batch(&[("carol", Some(40))]))
|
||||
.await
|
||||
.unwrap();
|
||||
lance_append_inline_local(&mut ds, person_batch(&[("carol", Some(40))])).await;
|
||||
assert_eq!(ds.version().version, 3);
|
||||
|
||||
let head_before = ds.version().version;
|
||||
|
|
@ -878,7 +940,7 @@ async fn lance_restore_appends_one_commit_with_checked_out_content() {
|
|||
/// and any future continuous-recovery reconciler's queue-acquisition
|
||||
/// requirement.
|
||||
///
|
||||
/// `Dataset::restore`'s `check_restore_txn` (lance-4.0.0
|
||||
/// `Dataset::restore`'s `check_restore_txn` (lance-6.0.1
|
||||
/// `src/io/commit/conflict_resolver.rs:986`) returns `Ok(())` against
|
||||
/// almost every other op (Append, Update, Delete, CreateIndex, Merge, …),
|
||||
/// so a Restore commits successfully even with concurrent commits in
|
||||
|
|
@ -908,7 +970,6 @@ async fn lance_restore_appends_one_commit_with_checked_out_content() {
|
|||
async fn lance_restore_loses_to_concurrent_append_via_orphaning() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = format!("{}/people.lance", dir.path().to_str().unwrap());
|
||||
let store = TableStore::new(dir.path().to_str().unwrap());
|
||||
|
||||
// v1: seed with alice.
|
||||
let _ = TableStore::write_dataset(&uri, person_batch(&[("alice", Some(30))]))
|
||||
|
|
@ -925,10 +986,7 @@ async fn lance_restore_loses_to_concurrent_append_via_orphaning() {
|
|||
// This simulates a per-table-queue model where another tenant wrote
|
||||
// between recovery's open and recovery's restore call.
|
||||
let mut writer_handle = Dataset::open(&uri).await.unwrap();
|
||||
store
|
||||
.append_batch(&uri, &mut writer_handle, person_batch(&[("bob", Some(25))]))
|
||||
.await
|
||||
.unwrap();
|
||||
lance_append_inline_local(&mut writer_handle, person_batch(&[("bob", Some(25))])).await;
|
||||
assert_eq!(writer_handle.version().version, 2);
|
||||
|
||||
// Recovery now restores. Because restore's `check_restore_txn` returns
|
||||
|
|
|
|||
|
|
@ -778,6 +778,47 @@ async fn load_with_bad_edge_reference_unblocks_next_load() {
|
|||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn load_overwrite_with_bad_edge_reference_unblocks_next_load() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap();
|
||||
let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
|
||||
load_jsonl(&mut db, TEST_DATA, LoadMode::Overwrite)
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let pre_persons = count_rows(&db, "node:Person").await;
|
||||
let pre_edges = count_rows(&db, "edge:Knows").await;
|
||||
|
||||
let bad = r#"{"type": "Person", "data": {"name": "Mallory", "age": 5}}
|
||||
{"edge": "Knows", "from": "Mallory", "to": "Ghost"}
|
||||
"#;
|
||||
let err = load_jsonl(&mut db, bad, LoadMode::Overwrite)
|
||||
.await
|
||||
.expect_err("RI violation must fail overwrite before commit_staged");
|
||||
let OmniError::Manifest(manifest_err) = err else {
|
||||
panic!("expected Manifest error, got {err:?}");
|
||||
};
|
||||
assert!(
|
||||
manifest_err.message.contains("not found"),
|
||||
"unexpected error: {}",
|
||||
manifest_err.message,
|
||||
);
|
||||
|
||||
assert_eq!(count_rows(&db, "node:Person").await, pre_persons);
|
||||
assert_eq!(count_rows(&db, "edge:Knows").await, pre_edges);
|
||||
|
||||
let good = r#"{"type": "Person", "data": {"name": "Pat", "age": 55}}
|
||||
{"type": "Person", "data": {"name": "Quinn", "age": 56}}
|
||||
{"edge": "Knows", "from": "Pat", "to": "Quinn"}
|
||||
"#;
|
||||
load_jsonl(&mut db, good, LoadMode::Overwrite)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(count_rows(&db, "node:Person").await, 2);
|
||||
assert_eq!(count_rows(&db, "edge:Knows").await, 1);
|
||||
}
|
||||
|
||||
/// Same shape as the RI test above, but driven by a cardinality
|
||||
/// violation (`@card(0..1)` on `WorksAt`). The staged loader's pending
|
||||
/// edge accumulator drives the cardinality scan; a violation aborts
|
||||
|
|
@ -842,6 +883,56 @@ edge WorksAt: Person -> Company @card(0..1)
|
|||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn load_overwrite_with_cardinality_violation_unblocks_next_load() {
|
||||
const CARD_SCHEMA: &str = r#"
|
||||
node Person {
|
||||
name: String @key
|
||||
age: I32?
|
||||
}
|
||||
node Company {
|
||||
name: String @key
|
||||
}
|
||||
edge WorksAt: Person -> Company @card(0..1)
|
||||
"#;
|
||||
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let uri = dir.path().to_str().unwrap();
|
||||
let mut db = Omnigraph::init(uri, CARD_SCHEMA).await.unwrap();
|
||||
|
||||
let seed = r#"{"type": "Person", "data": {"name": "Alice", "age": 30}}
|
||||
{"type": "Company", "data": {"name": "Acme"}}
|
||||
{"type": "Company", "data": {"name": "Bigco"}}
|
||||
"#;
|
||||
load_jsonl(&mut db, seed, LoadMode::Overwrite)
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let pre_works = count_rows(&db, "edge:WorksAt").await;
|
||||
|
||||
let bad = r#"{"edge": "WorksAt", "from": "Alice", "to": "Acme"}
|
||||
{"edge": "WorksAt", "from": "Alice", "to": "Bigco"}
|
||||
"#;
|
||||
let err = load_jsonl(&mut db, bad, LoadMode::Overwrite)
|
||||
.await
|
||||
.expect_err("cardinality violation must fail overwrite before commit_staged");
|
||||
let OmniError::Manifest(manifest_err) = err else {
|
||||
panic!("expected Manifest error, got {err:?}");
|
||||
};
|
||||
assert!(
|
||||
manifest_err.message.contains("@card violation"),
|
||||
"unexpected error: {}",
|
||||
manifest_err.message,
|
||||
);
|
||||
assert_eq!(count_rows(&db, "edge:WorksAt").await, pre_works);
|
||||
|
||||
let good = r#"{"edge": "WorksAt", "from": "Alice", "to": "Acme"}"#;
|
||||
load_jsonl(&mut db, good, LoadMode::Overwrite)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(count_rows(&db, "edge:WorksAt").await, 1);
|
||||
}
|
||||
|
||||
// ─── Chained-mutation correctness — pinned coverage ─────────────────────────
|
||||
|
||||
/// Chained `update` ops in one query must respect each previous op's
|
||||
|
|
|
|||
|
|
@ -186,7 +186,8 @@ op-2 (insert/update) → read committed via Lance + pending via DataFusion
|
|||
op-N → push batch
|
||||
─── end of query ───────────────────────────────────────
|
||||
finalize: per pending table:
|
||||
concat batches → stage_append OR stage_merge_insert → commit_staged
|
||||
concat batches → stage_append OR stage_merge_insert OR stage_overwrite
|
||||
→ commit_staged
|
||||
publisher: ManifestBatchPublisher::publish (one cross-table CAS)
|
||||
```
|
||||
|
||||
|
|
@ -197,9 +198,10 @@ contracts:
|
|||
- `D₂` parse-time rule: a query is either insert/update-only or
|
||||
delete-only. Mixed → reject. Deletes still inline-commit (Lance
|
||||
4.0.0 has no public two-phase delete); D₂ keeps the inline path safe.
|
||||
- `LoadMode::Overwrite` keeps the inline-commit path
|
||||
(truncate-then-append doesn't fit the staged shape; overwrite has no
|
||||
in-flight read-your-writes requirement).
|
||||
- `LoadMode::Overwrite` uses Lance `Operation::Overwrite` through the
|
||||
same staged path. Loader validation runs against the replacement
|
||||
in-memory batches before any `commit_staged`, and the publish window is
|
||||
covered by `SidecarKind::Load` recovery.
|
||||
- Read sites consume `TableStore::scan_with_pending`, which Lance-scans
|
||||
the committed snapshot at the captured `expected_version` and unions
|
||||
with a DataFusion `MemTable` over the pending batches.
|
||||
|
|
|
|||
|
|
@ -24,6 +24,12 @@ consequences that follow from them.
|
|||
> Terraform-style JSON documents plus backend lock/CAS, not Lance control-plane
|
||||
> datasets. Lance remains a possible later backend only if row-level history or
|
||||
> queryability justifies the extra machinery.
|
||||
>
|
||||
> **Revision 2026-06-09 — single ownership during migration.** Axiom **15**
|
||||
> added: while `omnigraph.yaml` and the cluster catalog coexist, every fact has
|
||||
> exactly one owner at a time — coexistence is a **mode switch, never a merge**.
|
||||
> `omnigraph.yaml` does not get replaced; its job description shrinks to the
|
||||
> permanent per-operator layer.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -72,6 +78,8 @@ invoke_query. This axiom is the target control-plane rule, not a statement
|
|||
about today's server catalog. -->
|
||||
**14. Exposure is a policy decision, not a config flag.** Target design: which stored queries (and the tools/dashboards built on them) an actor may **list or invoke** is decided by the policy layer (Cedar: `invoke_query` + catalog visibility), not by a per-query `expose:` boolean. The registry only says a query *exists* (name → file); **policy says who may see and run it**, so the MCP catalog (`GET /queries`) becomes each actor's policy-permitted set. This supersedes the engine's current `mcp.expose` flag only after per-query `invoke_query` scope and Cedar-filtered catalog listing land; until then, proposals must state the compatibility bridge to today's `mcp.expose` + coarse invocation gate.
|
||||
|
||||
**15. Every fact has exactly one owner at a time; coexistence is a mode switch, never a merge.** `cluster.yaml` is not `omnigraph.yaml` v2 — the two documents end with disjoint jobs, and only the *shared-truth* parts of today's `omnigraph.yaml` (the set of graphs, stored-query registry, policy wiring, server boot source) migrate to the cluster catalog. The per-operator parts — connection/cluster selection, the operator's own credential reference, active graph/branch context, CLI ergonomics — are per-operator *by nature* (Sarah's and Bob's differ) and stay in the per-operator layer permanently; plan a **shrinking job description** for `omnigraph.yaml`, not an exit. During the migration window each fact is read from exactly one source at a time: a deployment serves from `omnigraph.yaml` **or** boots from cluster state (an exclusive mode switch), never from a precedence-merge of both. Two readers for one fact is the brittle-backcompat failure mode — it is the deny-list's "state that drifts from what it can be derived from" wearing a compatibility costume. Any compatibility bridge must name its replacement and its removal phase (the `mcp.expose` → policy-owned exposure bridge of axiom 14 is the template); bridges that accumulate without an exit are rejected at review.
|
||||
|
||||
---
|
||||
|
||||
## The one-line compression
|
||||
|
|
@ -82,7 +90,7 @@ about today's server catalog. -->
|
|||
|
||||
## How to use this file
|
||||
|
||||
- **Reviewing a proposal:** walk axioms 0–14; any conflict is the burden of the proposer to justify. The most common tensions:
|
||||
- **Reviewing a proposal:** walk axioms 0–15; any conflict is the burden of the proposer to justify. The most common tensions:
|
||||
- Treating the *running system* as the source of truth for **intent** → axioms 2, 4 (intent lives in config).
|
||||
- Treating state as a throwaway derivation rather than an authoritative, locked, backend-held ledger → axiom 5, 12.
|
||||
- A runtime config-mutation API instead of declarative apply → axiom 3.
|
||||
|
|
@ -94,4 +102,5 @@ about today's server catalog. -->
|
|||
- A secret value (token, embedding key, pipeline source credential) inline in config instead of in the gitignored `.env` file → axiom 10.
|
||||
- A per-query `expose:`/visibility flag in target-state cluster config instead of governing list/invoke in policy; or failing to account for today's `mcp.expose` compatibility bridge → axiom 14.
|
||||
- Shipping `apply` before hermetic `validate` + read-only `plan` tests, or shipping graph/schema-moving apply before recovery tests for the graph/resource-moved-before-cluster-publish gap → axiom 5 and axiom 12.
|
||||
- Reading one fact from both `omnigraph.yaml` and the cluster catalog with precedence rules (a merge instead of a mode switch), migrating per-operator concerns into shared cluster config, or adding a compatibility bridge with no named replacement and removal phase → axiom 15.
|
||||
- **Citing:** reference axioms by number in PRs and review comments so the rationale is stable across renames and refactors.
|
||||
|
|
|
|||
|
|
@ -64,6 +64,21 @@ is trying to create. -->
|
|||
identity. It is not committed into `cluster.yaml`.
|
||||
6. `mcp.expose` remains supported in current `omnigraph.yaml` until the
|
||||
per-query policy replacement ships.
|
||||
7. **Single ownership (axiom 15).** While `omnigraph.yaml` and the cluster
|
||||
catalog coexist, each fact is read from exactly one source at a time.
|
||||
Phase 5 server boot is an exclusive mode switch — boot from cluster state
|
||||
XOR from `omnigraph.yaml` — never a precedence-merge of both. No phase may
|
||||
introduce a surface that reads the same fact (graph set, query registry,
|
||||
policy wiring, bind address) from both sources with tie-break rules.
|
||||
8. **`omnigraph.yaml` shrinks; it does not get deprecated.** Its terminal role
|
||||
is the per-operator layer: connection/cluster selection, the operator's
|
||||
credential reference, active graph/branch context, CLI ergonomics, and
|
||||
purely personal aliases (target home: the operator's global config dir per
|
||||
RFC-002). Shared-truth keys migrate to `cluster.yaml`; per-operator keys
|
||||
never do.
|
||||
9. **Bridges carry sunsets.** Every compatibility bridge names its replacement
|
||||
and the phase that removes it (`mcp.expose` → Phase 6 policy-owned exposure
|
||||
is the template). A bridge without an exit is a review-blocking finding.
|
||||
|
||||
## Terraform-Aligned Schema Validation
|
||||
|
||||
|
|
@ -335,8 +350,6 @@ Target Phase-1 cluster-root layout:
|
|||
<ulid>.json
|
||||
recoveries/
|
||||
<ulid>.json
|
||||
recovery/
|
||||
<ulid>.json
|
||||
resources/
|
||||
query/<graph>/<name>/<digest>.gq
|
||||
policy/<name>/<digest>.yaml
|
||||
|
|
@ -586,7 +599,9 @@ replacement would make every invariant harder to audit. -->
|
|||
|
||||
- Allow server startup from cluster state.
|
||||
- Add status and catalog endpoints as needed.
|
||||
- Keep the current `omnigraph.yaml` startup path as compatibility mode.
|
||||
- Keep the current `omnigraph.yaml` startup path as compatibility mode — an
|
||||
**exclusive mode switch** per deployment (cluster state XOR `omnigraph.yaml`),
|
||||
never a merged read of both (Compatibility Stance #7, axiom 15).
|
||||
- Regenerate OpenAPI for any HTTP surface.
|
||||
|
||||
### Phase 6: Policy-Owned Query Exposure
|
||||
|
|
|
|||
|
|
@ -387,6 +387,65 @@ This proposal:
|
|||
|
||||
The connection/credential/preference layer remains per operator: it points at a cluster, resolves that operator's identity, and holds personal ergonomics. The cluster config stays shared, secret-free, and reviewable; the state ledger stays authoritative and locked.
|
||||
|
||||
### Migration model: single ownership, mode switch, shrinking job description (axiom 15)
|
||||
|
||||
`omnigraph.yaml` is not being replaced; its **job description shrinks**. Only the
|
||||
shared-truth parts of its current role migrate to the cluster catalog (the set of
|
||||
graphs, the stored-query registry, policy wiring, the server boot source). The
|
||||
per-operator parts are per-operator *by nature* — Sarah's and Bob's differ — and
|
||||
keep `omnigraph.yaml`/the per-operator layer as a permanent, well-defined home.
|
||||
|
||||
While both exist, **each fact has exactly one owner at any moment, and
|
||||
coexistence is a mode switch, never a merge**. The brittle version of backward
|
||||
compatibility — the server reading graphs from `omnigraph.yaml` *and* from
|
||||
cluster state with precedence rules gluing them together — is rejected outright:
|
||||
two readers for one truth means every bug becomes "which file won?" and every
|
||||
feature pays the tax twice. The realistic timeline has three windows:
|
||||
|
||||
1. **Now → Phase 4 (no conflict).** Cluster apply writes only to its own catalog
|
||||
(`__cluster/`); `omnigraph.yaml` serves traffic. `Applied` status must
|
||||
visibly mean "recorded in the cluster catalog, not yet serving" so the
|
||||
overlap is loud, not hidden.
|
||||
2. **Phase 5 (the mode switch).** A deployment opts into booting from cluster
|
||||
state; `omnigraph.yaml`'s server-role keys become inert *for that
|
||||
deployment*. Exclusive — boot from cluster state XOR `omnigraph.yaml` — with
|
||||
no key-level aliasing and no merged precedence.
|
||||
3. **Phase 6+ (bridges with sunsets).** Targeted compatibility bridges are
|
||||
allowed only with a named replacement and a removal phase; `mcp.expose` →
|
||||
policy-owned exposure is the template. Bridges that accumulate without an
|
||||
exit are review-rejected.
|
||||
|
||||
Key-by-key compatibility inside one evolving file is the expensive kind of
|
||||
backcompat (the v1 `omnigraph.yaml` reshape's `--target`/legacy-key regressions
|
||||
are the in-repo cautionary tale); resource-ownership seams between two files
|
||||
with a mode switch is the cheap kind. Police the single-owner rule in every
|
||||
Phase 3–6 PR: a proposal that merges the two sources for one fact is the
|
||||
deny-list's "state that drifts from what it can be derived from" wearing a
|
||||
compatibility costume.
|
||||
|
||||
### The per-operator layer: contents and destination
|
||||
|
||||
The per-operator layer must be **complete** — everything an operator needs to
|
||||
work against any cluster from any directory, and nothing that two operators must
|
||||
agree on:
|
||||
|
||||
| Per-operator concern | Today | Target |
|
||||
|---|---|---|
|
||||
| Connection (which cluster/server, named endpoints) | `omnigraph.yaml` `graphs.<name>` URIs / `servers:` refs | global config, per-operator |
|
||||
| Operator credential **reference** (`bearer_token_env`, env-file lookup) | `omnigraph.yaml` + `.env` | global config references; secret values stay in env/`.env`, never in any config |
|
||||
| Active context (current graph/branch selection) | ad-hoc per-command flags / `defaults` | global state layer (e.g. `omnigraph use`), explicitly **not** the cluster state ledger (axiom 5's "state" is the applied-cluster ledger, not a personal selection) |
|
||||
| CLI ergonomics (output format, table layout) | `omnigraph.yaml` `cli:`/`defaults:` | global config, per-operator |
|
||||
| Personal command shortcuts (purely personal aliases) | `omnigraph.yaml` `aliases:` | global config; *shared* aliases (team vocabulary) are cluster config — see the aliases split note above |
|
||||
|
||||
Destination: this layer belongs in the operator's **global config dir**
|
||||
(`~/.omnigraph`, per the RFC-002 global-first layered-config direction —
|
||||
global config + active-context state file), not in a repo-committed file, so it
|
||||
survives `git clone`, works from any directory, and never collides with the
|
||||
shared cluster folder. The RFC-002 layering implementation is currently parked
|
||||
(PRs #139/#162 closed over review findings), but the *boundary* it draws is the
|
||||
one this spec depends on: per-operator → global dir; shared deployment intent →
|
||||
the cluster config folder; deployed reality → the state ledger.
|
||||
|
||||
Implementation gate: the Terraform-style workflow must be testable in order.
|
||||
`cluster validate` must catch bad config before any apply path exists;
|
||||
read-only `cluster plan` must have deterministic structured-plan tests before
|
||||
|
|
|
|||
|
|
@ -84,7 +84,7 @@ Resolves expression values to literals, converts to typed Arrow arrays (`literal
|
|||
- `insert` (no `@key`, edges) → accumulate into `MutationStaging.pending` (Append mode); finalize calls `stage_append` once per touched table.
|
||||
- `insert` (`@key` node) → accumulate into `pending` (Merge mode); finalize calls `stage_merge_insert` once per touched table.
|
||||
- `update` → scan committed via Lance + pending via DataFusion `MemTable` (read-your-writes), apply assignments, accumulate into `pending` (Merge mode).
|
||||
- `delete` → still inline-commits via `delete_where` (Lance 4.0.0 has no public two-phase delete); recorded into `MutationStaging.inline_committed`.
|
||||
- `delete` → still inline-commits via `delete_where` (Lance v6.0.1 has no public two-phase delete; `DeleteBuilder::execute_uncommitted` first ships in v7.0.0-beta.10 — tracked as MR-A in [docs/dev/lance.md](lance.md)); recorded into `MutationStaging.inline_committed`.
|
||||
|
||||
**D₂ parse-time rule.** A single mutation query is either insert/update-only or delete-only. Mixed → reject before any I/O. The check fires in `enforce_no_mixed_destructive_constructive(&ir)` inside `execute_named_mutation`.
|
||||
|
||||
|
|
|
|||
|
|
@ -102,7 +102,7 @@ Use it this way:
|
|||
| Branch delete | Manifest is the single authority, flipped atomically first; per-table forks + commit-graph branch are derived state, reclaimed best-effort (`force_delete_branch`) with the `cleanup` reconciler as the guaranteed backstop. Reusing a name whose reclaim failed before `cleanup` surfaces an actionable error | [branches-commits.md](../user/branches-commits.md), [maintenance.md](../user/maintenance.md) |
|
||||
| Schema validation | Type checks, required fields, defaults, edge endpoint checks, and edge cardinality are enforced on write paths | [schema-language.md](../user/schema-language.md), [execution.md](execution.md) |
|
||||
| Unique constraints | Intra-batch and write-path checks exist; intake and branch-merge derive the composite key through one shared function (`loader::composite_unique_key`, a separator-free `Vec<String>` tuple) and fail loudly on an un-keyable column type rather than silently exempting it; full cross-version uniqueness against already-committed rows is still a gap | [schema-language.md](../user/schema-language.md) |
|
||||
| Storage trait | `TableStorage` exists as the sealed staged-write surface; full call-site migration and capability/stat surfaces are incomplete | [writes.md](writes.md), [architecture.md](architecture.md) |
|
||||
| Storage trait | `TableStorage` (via `db.storage()`) is staged-only; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so §1 holds by construction; capability/stat surfaces are roadmap | [writes.md](writes.md), [architecture.md](architecture.md) |
|
||||
| Index lifecycle | `ensure_indices` is explicit today; reconciler-based convergence is roadmap | [indexes.md](../user/indexes.md), [maintenance.md](../user/maintenance.md) |
|
||||
| Traversal IDs | Runtime still builds `TypeIndex`; Lance stable row-id based graph IDs are roadmap | [architecture.md](architecture.md), [query-language.md](../user/query-language.md) |
|
||||
| Auth | Bearer token hashing and server-side actor resolution are implemented at the HTTP boundary | [server.md](../user/server.md), [policy.md](../user/policy.md) |
|
||||
|
|
@ -124,9 +124,16 @@ them explicit.
|
|||
renames. The current compiler still derives type IDs from `kind:name`; this
|
||||
must be fixed before relying on renamed IDs across accepted schemas.
|
||||
- **Storage abstraction:** `TableStorage` is present, sealed, and canonical for
|
||||
staged writes, but older inherent `TableStore` call sites and inline residuals
|
||||
remain. New write paths should use the staged shape unless a documented Lance
|
||||
blocker applies.
|
||||
staged writes. MR-854 sealed it: `db.storage()` exposes only staged primitives
|
||||
+ reads, and the inline-commit residuals are split onto a separate sealed
|
||||
`InlineCommitResidual` trait reached via `db.storage_inline_residual()`, so a
|
||||
new writer cannot couple a write with a HEAD advance through the default
|
||||
surface. The dead legacy methods (`append_batch` on the trait,
|
||||
`merge_insert_batch{,es}`, `create_{btree,inverted}_index`) were removed. The
|
||||
remaining residuals are `delete_where` (gated on MR-A — Lance v7.x bump)
|
||||
and `create_vector_index` (gated on Lance #6666); see
|
||||
[lance.md](lance.md) and [writes.md](writes.md). New write paths should use
|
||||
the staged shape unless a documented Lance blocker applies.
|
||||
- **Deletes and vector indexes:** `delete_where` and vector index creation still
|
||||
advance Lance HEAD inline because the required public Lance APIs are missing.
|
||||
Keep D2 and recovery coverage in place until those residuals are removed.
|
||||
|
|
|
|||
|
|
@ -55,18 +55,18 @@ Adding/changing index types, fixing coverage, debugging FTS or vector recall, de
|
|||
|
||||
| Topic | URL |
|
||||
|---|---|
|
||||
| Index spec overview | https://lance.org/format/table/index/ |
|
||||
| BTREE scalar index | https://lance.org/format/table/index/scalar/btree/ |
|
||||
| Bitmap scalar index | https://lance.org/format/table/index/scalar/bitmap/ |
|
||||
| Bloom-filter scalar index | https://lance.org/format/table/index/scalar/bloom_filter/ |
|
||||
| Label-list scalar index | https://lance.org/format/table/index/scalar/label_list/ |
|
||||
| Zone-map scalar index | https://lance.org/format/table/index/scalar/zonemap/ |
|
||||
| R-Tree scalar index (spatial) | https://lance.org/format/table/index/scalar/rtree/ |
|
||||
| Full-text search (FTS) index | https://lance.org/format/table/index/scalar/fts/ |
|
||||
| N-gram scalar index | https://lance.org/format/table/index/scalar/ngram/ |
|
||||
| Vector index | https://lance.org/format/table/index/vector/ |
|
||||
| Fragment-reuse system index | https://lance.org/format/table/index/system/frag_reuse/ |
|
||||
| MemWAL system index | https://lance.org/format/table/index/system/mem_wal/ |
|
||||
| Index spec overview | https://lance.org/format/index/ |
|
||||
| BTREE scalar index | https://lance.org/format/index/scalar/btree/ |
|
||||
| Bitmap scalar index | https://lance.org/format/index/scalar/bitmap/ |
|
||||
| Bloom-filter scalar index | https://lance.org/format/index/scalar/bloom_filter/ |
|
||||
| Label-list scalar index | https://lance.org/format/index/scalar/label_list/ |
|
||||
| Zone-map scalar index | https://lance.org/format/index/scalar/zonemap/ |
|
||||
| R-Tree scalar index (spatial) | https://lance.org/format/index/scalar/rtree/ |
|
||||
| Full-text search (FTS) index | https://lance.org/format/index/scalar/fts/ |
|
||||
| N-gram scalar index | https://lance.org/format/index/scalar/ngram/ |
|
||||
| Vector index | https://lance.org/format/index/vector/ |
|
||||
| Fragment-reuse system index | https://lance.org/format/index/system/frag_reuse/ |
|
||||
| MemWAL system index | https://lance.org/format/index/system/mem_wal/ |
|
||||
| HNSW Rust example | https://lance.org/examples/rust/hnsw/ |
|
||||
| Distributed indexing | https://lance.org/guide/distributed_indexing/ |
|
||||
| Tokenizer (FTS, n-gram) | https://lance.org/guide/tokenizer/ |
|
||||
|
|
@ -125,7 +125,7 @@ Touching `omnigraph optimize` / `cleanup`, the underlying `compact_files` / `cle
|
|||
|---|---|
|
||||
| Read-and-write guide (covers `compact_files`, `cleanup_old_versions`) | https://lance.org/guide/read_and_write/ |
|
||||
| Performance (compaction tradeoffs) | https://lance.org/guide/performance/ |
|
||||
| Fragment-reuse index | https://lance.org/format/table/index/system/frag_reuse/ |
|
||||
| Fragment-reuse index | https://lance.org/format/index/system/frag_reuse/ |
|
||||
|
||||
### DataFusion integration
|
||||
|
||||
|
|
|
|||
|
|
@ -48,7 +48,7 @@ shared by both `mutate_as` and the bulk loader:
|
|||
touched sub-tables. Cross-table conflicts surface as
|
||||
`ManifestConflictDetails::ExpectedVersionMismatch`.
|
||||
- **Deletes still inline-commit.** Lance's `Dataset::delete` is not
|
||||
exposed as a two-phase op in 4.0.0; deletes go through `delete_where`
|
||||
exposed as a two-phase op in 6.0.1; deletes go through `delete_where`
|
||||
immediately and record their post-write state in
|
||||
`MutationStaging.inline_committed`. The parse-time D₂ rule (below)
|
||||
prevents inserts/updates from coexisting with deletes in one query,
|
||||
|
|
@ -82,16 +82,14 @@ Three writers have been migrated onto staged primitives:
|
|||
* **`ensure_indices`** (`db/omnigraph/table_ops.rs::build_indices_on_dataset_for_catalog`)
|
||||
— scalar indices (BTree, Inverted) now use `stage_create_*_index` +
|
||||
`commit_staged`. Vector indices stay inline (residual — Lance
|
||||
`build_index_metadata_from_segments` is `pub(crate)` in 4.0.0;
|
||||
`build_index_metadata_from_segments` is `pub(crate)` in 6.0.1;
|
||||
companion ticket to lance-format/lance#6658 needed).
|
||||
* **`branch_merge::publish_rewritten_merge_table`**
|
||||
(`exec/merge.rs`) — merge_insert now uses `stage_merge_insert` +
|
||||
`commit_staged`. Deletes stay inline (Lance #6658 residual).
|
||||
* **`schema_apply` rewritten_tables** (`db/omnigraph/schema_apply.rs`)
|
||||
— non-empty rewrites use `stage_overwrite` + `commit_staged`.
|
||||
Empty-batch rewrites stay inline (Lance `InsertBuilder::execute_uncommitted`
|
||||
rejects empty data; the empty case is rare and bounded by the
|
||||
schema-apply lock branch).
|
||||
— rewrites use `stage_overwrite` + `commit_staged`, including empty-table
|
||||
rewrites via a zero-fragment Lance `Operation::Overwrite`.
|
||||
|
||||
A defense-in-depth integration test (`tests/forbidden_apis.rs`) walks
|
||||
engine source and fails if non-allow-listed code calls Lance's
|
||||
|
|
@ -106,34 +104,32 @@ the same drift class. Closing it requires either upstream Lance
|
|||
multi-dataset commit OR the omnigraph-side recovery-on-open reconciler
|
||||
described in `.context/mr-793-design.md` §15 (deferred to MR-795).
|
||||
|
||||
### Inline-commit method residuals on `TableStorage` (MR-793 acceptance §1 option b)
|
||||
### Inline-commit residuals live on `InlineCommitResidual`, not `db.storage()` (MR-793 acceptance §1, by construction)
|
||||
|
||||
MR-793's acceptance criterion §1 ("`TableStore` public API has no method that performs a manifest commit as a side effect of writing") is met **per-method** by enumerating every inline-commit method that remains on the trait surface, naming why it cannot yet be removed, and keeping the residual comment at every call site:
|
||||
MR-793's acceptance criterion §1 ("`TableStore` (or successor) public API has no method that performs a manifest commit as a side effect of writing") holds **by construction** after MR-854. `db.storage()` (`&dyn TableStorage`) exposes only staged primitives + reads; the inline-commit writes Lance cannot yet stage live on a separate `InlineCommitResidual` trait reached via `Omnigraph::storage_inline_residual()`. A new engine writer cannot couple a write with a Lance HEAD advance through the default surface — it would have to name the residual accessor explicitly. The dead legacy methods (trait `append_batch` / `merge_insert_batches`, inherent `merge_insert_batch{,es}`, `create_{btree,inverted}_index`) were removed; appends/merges and scalar index builds all use the `stage_*` primitives.
|
||||
|
||||
| Method on `TableStore` | Inline-commit reason | Closes when |
|
||||
Two methods remain on `InlineCommitResidual`, each named honestly at its call site:
|
||||
|
||||
| Residual method | Inline-commit reason | Closes when |
|
||||
|---|---|---|
|
||||
| `delete_where` | `DeleteJob` is `pub(crate)` in lance-4.0.0 — no public two-phase delete API | [lance-format/lance#6658](https://github.com/lance-format/lance/issues/6658) lands and `stage_delete` joins the trait |
|
||||
| `create_vector_index` | Vector indices take Lance's "segment commit path"; the helper `build_index_metadata_from_segments` is `pub(crate)` | [lance-format/lance#6666](https://github.com/lance-format/lance/issues/6666) lands and `stage_create_vector_index` joins the trait |
|
||||
| `append_batch` | Legacy inherent method; some engine call sites haven't migrated to `stage_append + commit_staged` yet | MR-793 Phase 1b (call-site conversion) + Phase 9 (demote to `pub(crate)`) |
|
||||
| `merge_insert_batch` / `merge_insert_batches` | Legacy inherent method | Same — Phase 1b + Phase 9 |
|
||||
| `overwrite_batch` | Legacy inherent method | Same — Phase 1b + Phase 9 |
|
||||
| `create_btree_index` (inherent) | Legacy inherent method (the migrated callers use `stage_create_btree_index` + `commit_staged`; the inherent stays for tests / un-migrated paths) | Same — Phase 1b + Phase 9 |
|
||||
| `create_inverted_index` (inherent) | Same | Same — Phase 1b + Phase 9 + index-class split (MR-848) |
|
||||
| `truncate_table` (inherent on `TableStore`) | Used by `overwrite_batch` internally | Phase 9 |
|
||||
| `delete_where` | `DeleteBuilder::execute_uncommitted` is not in Lance v6.0.1 (closed upstream as [#6658](https://github.com/lance-format/lance/issues/6658) but first ships in `v7.0.0-beta.10`); see [docs/dev/lance.md](lance.md) | MR-A: Lance v7.x bump migrates `delete_where` to staged, retires the parse-time D₂ mutation rule, and extends recovery sidecar coverage |
|
||||
| `create_vector_index` | Vector indices take Lance's "segment commit path"; `build_index_metadata_from_segments` is `pub(crate)` (Lance [#6666](https://github.com/lance-format/lance/issues/6666) still open) | Lance #6666 lands and `stage_create_vector_index` joins the staged surface |
|
||||
|
||||
After **lance#6658 + lance#6666 ship + MR-793 Phase 1b + MR-793 Phase 9 all complete**, the trait surface exposes only staged-write primitives + `commit_staged`. Until then this matrix names every residual explicitly, every call site carries a one-line residual comment, and no engine code outside `table_store.rs` is permitted to reach the inline-commit Lance APIs (enforced by the `tests/forbidden_apis.rs` guard).
|
||||
The `tests/forbidden_apis.rs` guard still catches direct `lance::*` inline-commit misuse outside the storage layer; the trait split makes the staged-only default a type-system guarantee on top of it.
|
||||
|
||||
### `LoadMode::Overwrite` residual
|
||||
### `LoadMode::Overwrite` uses staged Lance `Overwrite`
|
||||
|
||||
The bulk loader's Append and Merge modes use the staged-write path
|
||||
described above. `LoadMode::Overwrite` keeps the legacy inline-commit
|
||||
path: truncate-then-append doesn't fit the staged shape cleanly in
|
||||
Lance 4.0.0, and overwrite has no in-flight read-your-writes
|
||||
requirement (the prior data is being wiped). A mid-overwrite failure
|
||||
can leave Lance HEAD on a partially-truncated table; the next overwrite
|
||||
will replace it. Operator-driven (rare in agent workloads); document
|
||||
permanently until Lance exposes `Operation::Overwrite { fragments }` as
|
||||
a two-phase op.
|
||||
The bulk loader's Append, Merge, and Overwrite modes all use the
|
||||
staged-write path described above. `LoadMode::Overwrite` accumulates
|
||||
replacement batches in memory, validates node/edge constraints, referential
|
||||
integrity, and edge cardinality before any Lance HEAD movement, stages
|
||||
each touched table with Lance `Operation::Overwrite`, then runs
|
||||
`commit_staged` under the normal `SidecarKind::Load` recovery sidecar
|
||||
before publishing `__manifest`. `OMNIGRAPH_LOAD_CONCURRENCY` applies to the
|
||||
fragment-writing stage only; the commit and manifest publish still run
|
||||
under the per-table write queues. Empty-table overwrite is represented as
|
||||
a valid zero-fragment Lance `Overwrite` transaction, not as
|
||||
truncate-then-append.
|
||||
|
||||
### Open-time recovery sweep
|
||||
|
||||
|
|
@ -286,7 +282,7 @@ guarantee — the in-memory accumulator evaporates with the dropped task
|
|||
and no Lance write was ever issued.
|
||||
|
||||
For delete-touching mutations the legacy inline-commit shape is
|
||||
preserved (Lance has no public two-phase delete in 4.0.0) — the same
|
||||
preserved (Lance has no public two-phase delete in 6.0.1) — the same
|
||||
narrow window remains. The parse-time D₂ rule prevents inserts/updates
|
||||
from coexisting with deletes in one query, so a pure-delete failure
|
||||
cannot drift any staged-table state. If a delete-only multi-table
|
||||
|
|
|
|||
|
|
@ -27,6 +27,19 @@ omnigraph cluster force-unlock <LOCK_ID> --config ./company-brain --json
|
|||
`--config` points at a directory, not a file. The directory must contain
|
||||
`cluster.yaml`. When omitted, it defaults to the current directory.
|
||||
|
||||
## Relationship to `omnigraph.yaml`
|
||||
|
||||
`cluster.yaml` does not replace `omnigraph.yaml`, and the two never describe
|
||||
the same fact. `omnigraph.yaml` remains how the CLI and server are configured
|
||||
today (graph targets, server bind, CLI defaults, credential env references) and
|
||||
is its long-term home for per-operator settings. `cluster.yaml` is the shared
|
||||
desired state of a whole deployment, read only by the `cluster` commands via
|
||||
`--config`. In the current stage, nothing recorded in the cluster state ledger
|
||||
affects what a server serves or what other CLI commands target — the cluster
|
||||
catalog is inspectable, not live. When server boot from cluster state ships in
|
||||
a later stage, it will be an explicit per-deployment mode switch, not a merge
|
||||
of the two files.
|
||||
|
||||
## Supported `cluster.yaml`
|
||||
|
||||
Stage 2C accepts only the read-only resource subset:
|
||||
|
|
|
|||
|
|
@ -72,7 +72,7 @@ A single mutation query must be **either insert/update-only or delete-only**. Mi
|
|||
|
||||
> `mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).`
|
||||
|
||||
Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance 4.0.0 has no public two-phase delete). Mixing creates ordering hazards (same-row insert→delete becomes a no-op because the staged insert isn't visible to delete; cascading deletes of just-inserted edges break referential integrity by silent design). Until Lance exposes `DeleteJob::execute_uncommitted`, the parse-time rejection keeps both paths atomic and correct. See [docs/dev/writes.md](../dev/writes.md) and [docs/dev/invariants.md](../dev/invariants.md).
|
||||
Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance v6.0.1 has no public two-phase delete). Mixing creates ordering hazards (same-row insert→delete becomes a no-op because the staged insert isn't visible to delete; cascading deletes of just-inserted edges break referential integrity by silent design). Until the MR-A Lance v7 bump migrates `delete_where` to staged (`DeleteBuilder::execute_uncommitted` first ships in `v7.0.0-beta.10`), the parse-time rejection keeps both paths atomic and correct. See [docs/dev/writes.md](../dev/writes.md), [docs/dev/lance.md](../dev/lance.md), and [docs/dev/invariants.md](../dev/invariants.md).
|
||||
|
||||
## IR (Intermediate Representation)
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue