diff --git a/AGENTS.md b/AGENTS.md index ae6e744..90fa6a9 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -241,10 +241,10 @@ omnigraph policy explain --actor act-alice --action change --branch main | Per-dataset versioning + time travel | ✅ | `snapshot_at_version`, `entity_at`, snapshot-pinned reads across many tables | | Per-dataset branches | ✅ | **Graph-level** branches (atomic across all sub-tables), lazy fork, system branch filtering | | Atomic single-dataset commits | ✅ | **Multi-table publish via three layers**, NOT a single Lance primitive: (1) per-table Lance `commit_staged` for the data write, (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering, (3) the open-time recovery sweep for the residual gap between (1) and (2). All three layers ship; the five migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`, `optimize_all_tables`) write a `__recovery/{ulid}.json` sidecar before Phase B and delete it after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the sweep in `db/manifest/recovery.rs`: classify, decide all-or-nothing per sidecar, roll forward via single `ManifestBatchPublisher::publish` or roll back via `Dataset::restore` followed by a manifest publish of the restored version (so both directions converge to `manifest == HEAD` — no residual drift), and record an audit row in `_graph_commit_recoveries.lance` (queryable via `omnigraph commit list --filter actor=omnigraph:recovery`). The write entry points (`load_as`, `mutate_as`, `apply_schema_as`, `branch_merge_as`) and `refresh` additionally run an in-process roll-forward-only heal (serialized against live writers via the per-table write queues), so a long-lived server converges on its next write without restart; only rollback-eligible sidecars still defer to the next read-write open (a future background reconciler's goal). Engine writes route through a sealed `TableStorage` trait (`db.storage()`) exposing only `stage_*` + `commit_staged` + reads; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so the default surface cannot couple a write with a HEAD advance — §1 holds by construction. `delete_where` and `create_vector_index` stay inline until upstream Lance ships a public two-phase API ([#6658](https://github.com/lance-format/lance/issues/6658), [#6666](https://github.com/lance-format/lance/issues/6666)); `LoadMode::Overwrite` uses Lance `Overwrite` staged transactions. | -| Compaction (`compact_files`) | ✅ | `omnigraph optimize` orchestrates over all node/edge tables, bounded concurrency; **publishes each compacted table's new version to `__manifest`** (so the manifest tracks the Lance HEAD — required for reads to observe compaction and for schema apply / strict writes to pass their HEAD-vs-manifest precondition), under the per-`(table, main)` write queue with `SidecarKind::Optimize` recovery coverage; **refuses on an unrecovered graph** (errors if a `__recovery` sidecar is pending); **skips uncovered HEAD > manifest drift** with `DriftNeedsRepair` instead of interpreting it; **skips blob-bearing tables** (reported via `TableOptimizeStats.skipped`, not silent), gated on `LANCE_SUPPORTS_BLOB_COMPACTION` until the upstream blob-v2 compaction-decode bug is fixed (see [docs/dev/invariants.md](docs/dev/invariants.md) Known Gaps) | +| Compaction (`compact_files`) + reindex (`optimize_indices`) | ✅ | `omnigraph optimize` orchestrates over all node/edge tables, bounded concurrency; per table runs `compact_files` **then Lance `optimize_indices`** (folds appended/rewritten fragments back into existing indexes — incremental merge, not retrain) and **publishes the resulting version to `__manifest`** (so the manifest tracks the Lance HEAD — required for reads to observe the work and for schema apply / strict writes to pass their HEAD-vs-manifest precondition), under the per-`(table, main)` write queue with `SidecarKind::Optimize` recovery coverage spanning both ops; **commits even with no compaction work if index coverage is stale**; **refuses on an unrecovered graph**; **skips uncovered HEAD > manifest drift** with `DriftNeedsRepair`; **skips blob-bearing tables** (reported via `TableOptimizeStats.skipped`, not silent; reindex is skipped for them too today), gated on `LANCE_SUPPORTS_BLOB_COMPACTION` until the upstream blob-v2 compaction-decode bug is fixed (see [docs/dev/invariants.md](docs/dev/invariants.md) Known Gaps) | | Repair uncovered drift | — | `omnigraph repair` explicitly classifies uncovered table `HEAD > manifest` drift: verified maintenance drift (`ReserveFragments`/`Rewrite`) can be published with `--confirm`; suspicious or unverifiable drift requires `--force --confirm`. Sidecar-covered crash residuals still recover automatically on open. | | Cleanup (`cleanup_old_versions`) | ✅ | `omnigraph cleanup` with `--keep` / `--older-than` policy | -| BTREE / inverted (FTS) / vector indexes | ✅ | `ensure_indices` builds them on every relevant column; idempotent; lazy across branches | +| BTREE / inverted (FTS) / vector indexes | ✅ | `ensure_indices` builds them per `@index`/`@key` column, dispatched by type via `node_prop_index_kind` (enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector); idempotent; lazy across branches. Coverage of fragments appended after build is restored by `optimize`'s `optimize_indices` pass (see Compaction row). | | `merge_insert` upsert | ✅ | `LoadMode::Merge`, mutation `update`/`insert`/`delete` lowering | | Vector search | ✅ | `nearest()` query op; embedding pipeline (Gemini / OpenAI clients); `@embed` in schema | | Full-text search | ✅ | `search/fuzzy/match_text/bm25` query ops | diff --git a/crates/omnigraph/src/db/omnigraph/optimize.rs b/crates/omnigraph/src/db/omnigraph/optimize.rs index 21629a8..9195256 100644 --- a/crates/omnigraph/src/db/omnigraph/optimize.rs +++ b/crates/omnigraph/src/db/omnigraph/optimize.rs @@ -32,6 +32,8 @@ use lance::dataset::cleanup::{CleanupPolicy, RemovalStats}; use lance::dataset::optimize::{ CompactionMetrics, CompactionOptions, compact_files, plan_compaction, }; +use lance::index::DatasetIndexExt; +use lance_index::optimize::OptimizeOptions; use super::*; @@ -361,16 +363,22 @@ async fn optimize_one_table( } // Precise "will it compact?" check — `plan_compaction` also accounts for - // deletion materialization (which can rewrite even a single fragment). A - // steady-state already-compacted table yields an empty plan and is never - // pinned in a sidecar (a zero-commit pin would classify NoMovement on - // recovery and force an all-or-nothing rollback). Uncovered pre-existing - // drift is skipped above and must go through explicit repair. + // deletion materialization (which can rewrite even a single fragment). let options = CompactionOptions::default(); let plan = plan_compaction(&ds, &options) .await .map_err(|e| OmniError::Lance(e.to_string()))?; - if plan.num_tasks() == 0 { + let will_compact = plan.num_tasks() > 0; + // Even when there is nothing to compact, the table may still have index + // work: rows appended since the index was built (e.g. via `ingest --mode + // merge`) are scanned unindexed until folded in. Either compaction or stale + // index coverage is enough to enter the publish path. If NEITHER, this + // table is a no-op and must NOT be pinned in a sidecar — a zero-commit pin + // classifies NoMovement on recovery and forces an all-or-nothing rollback + // of sibling tables' legitimate work. Uncovered pre-existing manifest/HEAD + // drift is skipped above and must go through explicit repair. + let needs_reindex = TableStore::has_unindexed_fragments(&ds).await?; + if !will_compact && !needs_reindex { return Ok(TableOptimizeStats::compacted( table_key, &CompactionMetrics::default(), @@ -378,8 +386,9 @@ async fn optimize_one_table( )); } - // Phase A: recovery sidecar BEFORE compaction advances the Lance HEAD, so a - // crash before the manifest publish rolls forward on next open. + // Phase A: recovery sidecar BEFORE any HEAD-advancing op (compaction or + // index optimize), so a crash before the manifest publish rolls forward on + // next open. let sidecar = crate::db::manifest::new_sidecar( crate::db::manifest::SidecarKind::Optimize, None, @@ -398,11 +407,26 @@ async fn optimize_one_table( let handle = crate::db::manifest::write_sidecar(db.root_uri(), db.storage_adapter(), &sidecar).await?; - // Phase B: compaction (reserve-fragments + rewrite commits advance HEAD). + // Phase B: compaction (if any) then incremental index optimize — both + // advance Lance HEAD inside the sidecar window. `compact_files` rewrites + // fragments and drops them from existing index segments' coverage; + // `optimize_indices` folds the rewritten and any previously-unindexed + // fragments back in (Lance's incremental merge, not a full retrain). This + // is the same compact -> optimize_indices sequencing LanceDB's `optimize()` + // uses. `optimize_indices` is an inline-commit residual: lance-6.0.1 + // exposes no uncommitted variant, so like `compact_files` it commits + // directly and relies on the sidecar for recovery. let version_before = ds.version().version; - let metrics: CompactionMetrics = compact_files(&mut ds, options, None) + let metrics: CompactionMetrics = if will_compact { + compact_files(&mut ds, options, None) + .await + .map_err(|e| OmniError::Lance(e.to_string()))? + } else { + CompactionMetrics::default() + }; + ds.optimize_indices(&OptimizeOptions::default()) .await - .map_err(|e| OmniError::Lance(e.to_string()))?; + .map_err(|e| OmniError::Lance(format!("optimize_indices on {}: {}", table_key, e)))?; let version_after = ds.version().version; let committed = version_after != version_before; diff --git a/crates/omnigraph/src/table_store.rs b/crates/omnigraph/src/table_store.rs index 65123c0..b6e8c4d 100644 --- a/crates/omnigraph/src/table_store.rs +++ b/crates/omnigraph/src/table_store.rs @@ -705,6 +705,36 @@ impl TableStore { Ok(IndexCoverage::Indexed) } + /// True if any non-system index on `ds` leaves at least one current + /// fragment uncovered, i.e. rows that the index does not yet account for + /// (appended after the index was built, or rewritten by compaction). Such + /// fragments are scanned unindexed until a reindex (`optimize_indices`) + /// folds them in. Returns false when every index covers every fragment, or + /// when the table has no (non-system) indices to optimize. A `None` + /// `fragment_bitmap` means Lance cannot report coverage for that index, so + /// we do not treat it as uncovered (mirrors `key_column_index_coverage`). + /// + /// Used by `optimize` to decide whether an otherwise-already-compacted + /// table still has index work to do. + pub async fn has_unindexed_fragments(ds: &Dataset) -> Result { + let indices = ds + .load_indices() + .await + .map_err(|e| OmniError::Lance(e.to_string()))?; + let frag_ids: Vec = ds.fragments().iter().map(|f| f.id as u32).collect(); + for index in indices.iter() { + if is_system_index(index) { + continue; + } + if let Some(bitmap) = index.fragment_bitmap.as_ref() { + if frag_ids.iter().any(|id| !bitmap.contains(*id)) { + return Ok(true); + } + } + } + Ok(false) + } + pub async fn count_rows(&self, ds: &Dataset, filter: Option) -> Result { ds.count_rows(filter) .await diff --git a/crates/omnigraph/tests/lance_surface_guards.rs b/crates/omnigraph/tests/lance_surface_guards.rs index 370f9e7..0429c1b 100644 --- a/crates/omnigraph/tests/lance_surface_guards.rs +++ b/crates/omnigraph/tests/lance_surface_guards.rs @@ -36,6 +36,7 @@ use lance::dataset::{MergeInsertBuilder, WhenMatched, WhenNotMatched, WriteMode, use lance::index::DatasetIndexExt; use lance_file::version::LanceFileVersion; use lance_index::IndexType; +use lance_index::optimize::OptimizeOptions; use lance_index::scalar::ScalarIndexParams; use lance_namespace::LanceNamespace; use lance_table::io::commit::ManifestNamingScheme; @@ -541,3 +542,108 @@ async fn fragment_deletion_metadata_is_available() { per-fragment deletions and would need to read the deletion vector.", ); } + +// --- Guard 14: Dataset::optimize_indices signature ---------------------------- +// +// `db/omnigraph/optimize.rs::optimize_one_table` calls +// `ds.optimize_indices(&OptimizeOptions::default())` (via `DatasetIndexExt`) to +// fold appended/compacted fragments back into existing indexes. If Lance +// changes the receiver, the options type, or the return shape, this fails to +// compile. Compile-only. + +#[allow( + dead_code, + unreachable_code, + unused_variables, + unused_mut, + clippy::diverging_sub_expression +)] +async fn _compile_optimize_indices_signature() -> lance::Result<()> { + let mut ds: Dataset = unimplemented!(); + let options = OptimizeOptions::default(); + // `&mut self`, `&OptimizeOptions`, returns `Result<()>` (mutates in place + // and commits — there is no uncommitted variant in this Lance, which is why + // optimize treats it as an inline-commit residual under a recovery sidecar). + let _: () = ds.optimize_indices(&options).await?; + Ok(()) +} + +// --- Guard 15: optimize_indices extends fragment coverage ---------------------- +// +// PR3's reindex assumes `optimize_indices` folds fragments appended AFTER an +// index was built into that index (incremental merge, not retrain). This pins +// that Lance behavior at the surface layer so a regression turns red here, the +// first smoke check on a Lance bump, before the slower engine suite. + +#[tokio::test] +async fn optimize_indices_extends_fragment_coverage() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().join("guard_optimize_indices.lance"); + let uri = uri.to_str().unwrap(); + + // Fragment 0: alice, bob. Build a BTREE over `value` covering only it. + let mut ds = fresh_dataset(uri).await; + ds.create_index_builder(&["value"], IndexType::BTree, &ScalarIndexParams::default()) + .replace(true) + .await + .unwrap(); + + // Append a second fragment the existing index does not cover. + let schema = Arc::new(Schema::new(vec![ + Field::new("id", DataType::Utf8, false), + Field::new("value", DataType::Int32, false), + ])); + let batch = RecordBatch::try_new( + schema.clone(), + vec![ + Arc::new(StringArray::from(vec!["carol"])), + Arc::new(Int32Array::from(vec![3])), + ], + ) + .unwrap(); + let reader = RecordBatchIterator::new(vec![Ok(batch)], schema); + let params = WriteParams { + mode: WriteMode::Append, + enable_stable_row_ids: true, + data_storage_version: Some(LanceFileVersion::V2_2), + ..Default::default() + }; + Dataset::write(reader, uri, Some(params)).await.unwrap(); + + let mut ds = Dataset::open(uri).await.unwrap(); + assert!( + value_index_uncovered_count(&ds).await > 0, + "appended fragment should be uncovered by the BTREE before optimize_indices" + ); + + ds.optimize_indices(&OptimizeOptions::default()) + .await + .unwrap(); + + assert_eq!( + value_index_uncovered_count(&ds).await, + 0, + "optimize_indices must fold the appended fragment into the existing index \ + (incremental coverage); if this regresses, PR3's reindex no longer keeps \ + coverage current — revisit db/omnigraph/optimize.rs and docs/dev/lance.md." + ); +} + +/// Count current fragments not covered by the single-column `value` BTREE — +/// mirrors `TableStore::has_unindexed_fragments` (load_indices + +/// `fragment_bitmap.contains`), pinned by Guard 11. +async fn value_index_uncovered_count(ds: &Dataset) -> usize { + let indices = ds.load_indices().await.unwrap(); + let frag_ids: Vec = ds.fragments().iter().map(|f| f.id as u32).collect(); + let value_fid = ds.schema().field("value").unwrap().id; + for index in indices.iter() { + if index.fields.len() == 1 && index.fields[0] == value_fid { + if let Some(bitmap) = index.fragment_bitmap.as_ref() { + return frag_ids.iter().filter(|id| !bitmap.contains(**id)).count(); + } + } + } + // No `value` index found — treat as fully uncovered so a missing index + // is never mistaken for full coverage. + frag_ids.len() +} diff --git a/crates/omnigraph/tests/maintenance.rs b/crates/omnigraph/tests/maintenance.rs index 13c9de7..deb4d2d 100644 --- a/crates/omnigraph/tests/maintenance.rs +++ b/crates/omnigraph/tests/maintenance.rs @@ -14,9 +14,11 @@ use omnigraph::db::{ SkipReason, }; use omnigraph::loader::{LoadMode, load_jsonl}; +use omnigraph::table_store::{IndexCoverage, TableStore}; use helpers::{ MUTATION_QUERIES, TEST_DATA, TEST_SCHEMA, count_rows, init_and_load, mixed_params, mutate_main, + snapshot_main, }; /// Filesystem URI of a node sub-table, mirroring the engine's layout @@ -131,6 +133,72 @@ async fn optimize_after_load_then_again_is_idempotent() { } } +// PR3 (Workstream B): an existing scalar index does not cover fragments +// appended after it was built (build_indices is existence-gated), so those +// rows are scanned unindexed. `optimize` must fold them back in via Lance's +// incremental `optimize_indices`, restoring full coverage. +#[tokio::test] +async fn optimize_reindexes_fragments_appended_after_index_build() { + const SCHEMA: &str = r#" +node Doc { + slug: String @key + rank: I32 @index +} +"#; + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + let mut db = Omnigraph::init(uri, SCHEMA).await.unwrap(); + + // First load builds the id + rank BTREEs over the initial fragment. + load_jsonl( + &mut db, + "{\"type\":\"Doc\",\"data\":{\"slug\":\"d1\",\"rank\":1}}\n\ + {\"type\":\"Doc\",\"data\":{\"slug\":\"d2\",\"rank\":2}}", + LoadMode::Merge, + ) + .await + .unwrap(); + + // A second load with NEW keys appends a fragment the existing BTREEs do not + // cover (the existence gate skips re-building an index that already exists). + load_jsonl( + &mut db, + "{\"type\":\"Doc\",\"data\":{\"slug\":\"d3\",\"rank\":3}}\n\ + {\"type\":\"Doc\",\"data\":{\"slug\":\"d4\",\"rank\":4}}", + LoadMode::Merge, + ) + .await + .unwrap(); + + // Precondition: the appended fragment is unindexed. + { + let snap = snapshot_main(&db).await.unwrap(); + let ds = snap.open("node:Doc").await.unwrap(); + assert!( + TableStore::has_unindexed_fragments(&ds).await.unwrap(), + "appended fragment should be unindexed before optimize" + ); + } + + db.optimize().await.unwrap(); + + // Postcondition: optimize_indices folded the appended fragment in, so every + // index covers every fragment and `rank` reports fully Indexed. + let snap = snapshot_main(&db).await.unwrap(); + let ds = snap.open("node:Doc").await.unwrap(); + assert!( + !TableStore::has_unindexed_fragments(&ds).await.unwrap(), + "optimize must extend index coverage to all fragments" + ); + assert_eq!( + TableStore::key_column_index_coverage(&ds, "rank") + .await + .unwrap(), + IndexCoverage::Indexed, + "rank BTREE must cover all fragments after optimize" + ); +} + // Regression: `optimize` must not crash on a graph that has a `Blob` table. // // Lance `compact_files` forces `BlobHandling::AllBinary`, which mis-decodes diff --git a/docs/dev/invariants.md b/docs/dev/invariants.md index b3bcfaf..2523926 100644 --- a/docs/dev/invariants.md +++ b/docs/dev/invariants.md @@ -105,7 +105,7 @@ Use it this way: | Schema validation | Type checks, required fields, defaults, edge endpoint checks, and edge cardinality are enforced on write paths | [schema-language.md](../user/schema-language.md), [execution.md](execution.md) | | Unique constraints | Intra-batch and write-path checks exist; intake and branch-merge derive the composite key through one shared function (`loader::composite_unique_key`, a separator-free `Vec` tuple) and fail loudly on an un-keyable column type rather than silently exempting it; full cross-version uniqueness against already-committed rows is still a gap | [schema-language.md](../user/schema-language.md) | | Storage trait | `TableStorage` (via `db.storage()`) is staged-only; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so §1 holds by construction; capability/stat surfaces are roadmap | [writes.md](writes.md), [architecture.md](architecture.md) | -| Index lifecycle | `ensure_indices` is explicit today; reconciler-based convergence is roadmap | [indexes.md](../user/indexes.md), [maintenance.md](../user/maintenance.md) | +| Index lifecycle | Index *creation* per `@index`/`@key` property is dispatched by type (enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector) via `node_prop_index_kind`; index *coverage maintenance* exists — `optimize` runs Lance `optimize_indices` after compaction to fold appended/rewritten fragments into existing indexes (still an explicit maintenance call, not yet a background reconciler) | [indexes.md](../user/indexes.md), [maintenance.md](../user/maintenance.md) | | Traversal IDs | Runtime still builds `TypeIndex`; Lance stable row-id based graph IDs are roadmap | [architecture.md](architecture.md), [query-language.md](../user/query-language.md) | | Auth | Bearer token hashing and server-side actor resolution are implemented at the HTTP boundary | [server.md](../user/server.md), [policy.md](../user/policy.md) | | Tests | Tempdir-backed Lance tests are the current substrate; the storage adapter has an in-memory backend for adapter-level contract tests, but Lance datasets bypass it | [testing.md](testing.md) | diff --git a/docs/dev/lance.md b/docs/dev/lance.md index a4e311f..9544e80 100644 --- a/docs/dev/lance.md +++ b/docs/dev/lance.md @@ -169,6 +169,7 @@ Migration from Lance 4.0.0 → 6.0.1 landed in this cycle (DataFusion 52 → 53, - **`Dataset::checkout_version(N).await?.restore().await?`**: `restore()` takes `&mut self` and returns `Result<()>` (mutates in place, does not consume + return a new dataset). The recovery rollback hammer at `db/manifest/recovery.rs:505-522` continues to work. Pinned by `lance_surface_guards.rs::_compile_checkout_version_then_restore_signature`. - **`DatasetBuilder::from_namespace(...).with_branch(...).with_version(...).load()`** surface preserved (the namespace builder chain at `db/manifest/namespace.rs:162-174`). Pinned by `lance_surface_guards.rs::_compile_dataset_builder_from_namespace_signature`. - **`compact_files(&mut ds, CompactionOptions::default(), None)`** signature stable. `CompactionOptions` still does not expose `data_storage_version`; `compact_files` builds its own `WriteParams { ..Default::default() }`. Note: `LanceFileVersion::default()` is now V2_1 in v6, so optimize-rewritten fragments come out at V2_1 by default (was V2_0 in v4). Existing explicit V2_2 pins on creates/appends still apply. +- **`Dataset::optimize_indices(&mut self, &lance_index::optimize::OptimizeOptions)`** (via `DatasetIndexExt`) is a depended-on surface as of the index-coverage work: `db/omnigraph/optimize.rs` calls it after `compact_files` to fold appended/rewritten fragments into existing indexes (incremental merge, not retrain). It is a **committing** call (mutates in place, advances HEAD; no uncommitted variant in v6.0.1), so optimize treats it as an inline-commit residual under the `SidecarKind::Optimize` recovery sidecar. Signature pinned by `lance_surface_guards.rs::_compile_optimize_indices_signature`; the incremental-coverage behavior pinned by `optimize_indices_extends_fragment_coverage` (appended fragment uncovered before, covered after). - **`Dataset::delete(predicate)` returns `DeleteResult { new_dataset: Arc, num_deleted_rows: u64 }`** — unchanged shape. Pinned by `lance_surface_guards.rs::_compile_delete_result_field_shape`. MR-A will repurpose this guard to the staged two-phase variant once `DeleteBuilder::execute_uncommitted` migration lands. - **File reader read methods now async** (Lance PR #6710, v6.0). No effect — omnigraph reaches Lance exclusively through `Dataset::scan` and the staged-write API. - **Tokenizer vendored as `lance-tokenizer`** (Lance PR #6512, v6.0). No effect — no direct tokenizer imports. @@ -178,6 +179,6 @@ Migration from Lance 4.0.0 → 6.0.1 landed in this cycle (DataFusion 52 → 53, - **`Dataset::force_delete_branch`** (`branches().delete(name, force=true)`, dataset.rs:524) tolerates a missing branch-*contents* ref (vs plain `delete_branch`'s `RefNotFound`), but on the local store still errors `NotFound` if the branch `tree/` directory is fully absent (`remove_dir_all`'s NotFound is not caught for Lance's native error variant, refs.rs:526-549). Both variants still refuse a branch with referencing descendants (`RefConflict`). `TableStore::force_delete_branch` wraps this to be fully idempotent (tolerates already-absent). The single-authority branch-delete redesign uses it for orphan reclamation (eager best-effort reclaim + cleanup reconciler). Pinned by `lance_surface_guards.rs::force_delete_branch_semantics`. Branch delete is "flip the ref atomically, then `remove_dir_all(tree/{branch})`"; branch-exclusive data lives under `tree/{branch}/` so a drop reclaims it immediately without touching `main`. - **Lance blob-v2 `compact_files` bug** (no public issue found as of 2026-06): `compact_files` disables binary-copy for blob datasets and forces `BlobHandling::AllBinary` on the read side; the v2.1+ structural decoder then mis-counts column infos for the blob-v2 struct and fails with `Invalid user input: there were more fields in the schema than provided column indices / infos` (`lance-encoding/src/decoder.rs::ColumnInfoIter::expect_next`). This fails even a pristine uniform-V2_2 multi-fragment blob table; vector/list/scalar/ragged columns and mixed file versions all compact fine. Reads/queries use descriptor handling (`BlobHandling::default()`) and are unaffected. `optimize` skips blob-bearing tables behind `LANCE_SUPPORTS_BLOB_COMPACTION = false` (`db/omnigraph/optimize.rs`), reporting `SkipReason::BlobColumnsUnsupportedByLance`. Pinned by `lance_surface_guards.rs::compact_files_still_fails_on_blob_columns`, which turns red when the bug is fixed → flip the gate, remove the skip branch + the `maintenance.rs::optimize_skips_blob_table_and_reports_skip` skip assertions. -Surface guards added: `crates/omnigraph/tests/lance_surface_guards.rs` (10 named guards; 5 runtime + 5 compile-only). Future Lance bumps re-run this file first as the smoke check. Two additional guards from the original plan deferred to follow-up (`manifest_cas_returns_row_level_contention_variant` needs full publisher-race harness; `table_version_metadata_byte_compatible_with_v4` needs `pub(crate)` reach extension). +Surface guards added: `crates/omnigraph/tests/lance_surface_guards.rs` (10 named guards; 5 runtime + 5 compile-only; plus the index-coverage work's `_compile_optimize_indices_signature` and `optimize_indices_extends_fragment_coverage`). Future Lance bumps re-run this file first as the smoke check. Two additional guards from the original plan deferred to follow-up (`manifest_cas_returns_row_level_contention_variant` needs full publisher-race harness; `table_version_metadata_byte_compatible_with_v4` needs `pub(crate)` reach extension). Bump this date stanza on the next alignment pass. diff --git a/docs/dev/writes.md b/docs/dev/writes.md index 82d6ba8..6cbf227 100644 --- a/docs/dev/writes.md +++ b/docs/dev/writes.md @@ -80,10 +80,17 @@ deferred to a follow-up cycle — tracked). Three writers have been migrated onto staged primitives: * **`ensure_indices`** (`db/omnigraph/table_ops.rs::build_indices_on_dataset_for_catalog`) - — scalar indices (BTree, Inverted) now use `stage_create_*_index` + - `commit_staged`. Vector indices stay inline (residual — Lance - `build_index_metadata_from_segments` is `pub(crate)` in 6.0.1; - companion ticket to lance-format/lance#6658 needed). + — scalar indices (BTree, Inverted) use `stage_create_*_index` + + `commit_staged`. Which index a `@index`/`@key` property gets is dispatched by + type via `node_prop_index_kind` (enum + orderable scalar → BTree, free-text + String → Inverted/FTS, Vector → vector). Vector indices stay inline (residual + — Lance `build_index_metadata_from_segments` is `pub(crate)` in 6.0.1; + companion ticket to lance-format/lance#6658 needed). This build is + existence-gated (it creates a *missing* index over current fragments); folding + fragments appended afterward into an *existing* index is `optimize`'s + `optimize_indices` pass — an inline-commit residual, not a staged write (Lance + exposes no uncommitted index-optimize), covered by the optimize recovery + sidecar (see [maintenance.md](../user/maintenance.md)). * **`branch_merge::publish_rewritten_merge_table`** (`exec/merge.rs`) — merge_insert now uses `stage_merge_insert` + `commit_staged`. Deletes stay inline (Lance #6658 residual). diff --git a/docs/user/maintenance.md b/docs/user/maintenance.md index e69bba3..6baa6f0 100644 --- a/docs/user/maintenance.md +++ b/docs/user/maintenance.md @@ -4,14 +4,15 @@ ## `optimize_all_tables(db)` — non-destructive -- Lance `compact_files()` on every node + edge table on `main`, then **publishes the compacted version to the `__manifest`** so the manifest's `table_version` tracks the compacted Lance HEAD. Reads pin the manifest version, so without this publish compaction would be invisible to readers *and* would break the HEAD-vs-manifest precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually compacted. +- Lance `compact_files()` then `optimize_indices()` on every node + edge table on `main`, then **publishes the resulting version to the `__manifest`** so the manifest's `table_version` tracks the compacted-and-reindexed Lance HEAD. Reads pin the manifest version, so without this publish the work would be invisible to readers *and* would break the HEAD-vs-manifest precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually changed. - Rewrites small fragments into fewer large ones; old fragments remain reachable via older manifests until `cleanup` runs. -- Each table's compact→publish runs under its per-`(table, main)` write queue (serializing with concurrent mutations — compaction is a Lance `Rewrite` op that retryable-conflicts with a concurrent merge/update/delete on overlapping fragments). The Lance-HEAD-before-manifest-publish gap is covered by a `SidecarKind::Optimize` recovery sidecar (loose-match): a crash in that window rolls the compacted version forward on the next `Omnigraph::open` (compaction is content-preserving, so roll-forward is always safe). +- **Reindex (index coverage maintenance).** A scalar/FTS/vector index only covers the fragments it was built over. Rows appended after the index was built (e.g. by `ingest --mode merge`, whose commit does not rebuild an already-existing index) are scanned unindexed, and `compact_files` itself rewrites fragments out of an index's coverage. `optimize` runs Lance's incremental `optimize_indices` after compaction to fold those fragments back in (a delta merge, not a full retrain), restoring full coverage so equality/range/traversal predicates stay index-accelerated. This is why a table with **no compaction work but stale index coverage still commits** a new version under `optimize`. Run `optimize` on a cadence at least as frequent as your freshness window so recently-ingested rows do not linger in the unindexed flat-scan tail. `optimize_indices` is an inline-commit residual (Lance exposes no uncommitted variant), so it advances Lance HEAD like compaction and is covered by the same `SidecarKind::Optimize` recovery sidecar. +- Each table's compact→reindex→publish runs under its per-`(table, main)` write queue (serializing with concurrent mutations — compaction is a Lance `Rewrite` op that retryable-conflicts with a concurrent merge/update/delete on overlapping fragments). The Lance-HEAD-before-manifest-publish gap (now spanning both compaction and `optimize_indices`) is covered by a single `SidecarKind::Optimize` recovery sidecar (loose-match): a crash in that window rolls the compacted-and-reindexed version forward on the next `Omnigraph::open` (both ops are content-preserving, so roll-forward is always safe). - **Requires a recovered graph.** `optimize` refuses (errors) when an unresolved recovery sidecar is present under `__recovery` — operating on an unrecovered graph could publish a partial write the open-time recovery sweep would roll back. Reopen the graph to run the recovery sweep, then re-run `optimize`. - **Uncovered drift is skipped, not interpreted.** If a table's Lance HEAD is ahead of the version recorded in `__manifest` and no recovery sidecar covers that movement, `optimize` reports `skipped: Some(DriftNeedsRepair)` with the manifest/head versions and leaves the table untouched. Run `omnigraph repair` to classify and explicitly publish that drift. - Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8). - Returns `[TableOptimizeStats { table_key, fragments_removed, fragments_added, committed, skipped, manifest_version, lance_head_version }]`. -- **Blob tables are skipped.** A table that declares any `Blob` property is not compacted: it is reported with `skipped: Some(BlobColumnsUnsupportedByLance)` (and logged via `tracing::warn`) instead of compacted, and the rest of the sweep proceeds normally. The current Lance `compact_files` mis-decodes blob-v2 columns under its forced `BlobHandling::AllBinary` read; **reads and writes are unaffected** — only compaction is. This is gated by `LANCE_SUPPORTS_BLOB_COMPACTION` (`db/omnigraph/optimize.rs`) and removed when the upstream Lance fix lands (see [docs/dev/lance.md](../dev/lance.md)). Consequence: fragment count and deleted-row space on blob tables are not reclaimed until then; query results are never affected. +- **Blob tables are skipped.** A table that declares any `Blob` property is not compacted: it is reported with `skipped: Some(BlobColumnsUnsupportedByLance)` (and logged via `tracing::warn`) instead of compacted, and the rest of the sweep proceeds normally. The current Lance `compact_files` mis-decodes blob-v2 columns under its forced `BlobHandling::AllBinary` read; **reads and writes are unaffected** — only compaction is. This is gated by `LANCE_SUPPORTS_BLOB_COMPACTION` (`db/omnigraph/optimize.rs`) and removed when the upstream Lance fix lands (see [docs/dev/lance.md](../dev/lance.md)). Consequence: fragment count and deleted-row space on blob tables are not reclaimed until then; query results are never affected. A skipped blob table is also **not reindexed** in the same sweep (the skip happens before the reindex step), so its index coverage on appended rows is not refreshed by `optimize` today — a follow-up may split reindex out of the blob skip since `optimize_indices` does not hit the blob-compaction bug. ## `repair_all_tables(db, options)` — explicit