feat(engine): reindex in optimize to keep index coverage current

A scalar/FTS/vector index only covers the fragments it was built over. Rows
appended after the build (e.g. `ingest --mode merge`, whose commit does not
rebuild an existing index) are scanned unindexed, and `compact_files` rewrites
fragments out of coverage. Nothing folded them back in, so coverage decayed as
the graph grew — even the id/src/dst BTREEs that power traversal.

`optimize_one_table` now runs Lance `optimize_indices` after `compact_files`
(incremental merge, not retrain — the same compact->optimize_indices sequence
LanceDB's `optimize()` uses) and enters the publish path on compaction work OR
stale index coverage (new `TableStore::has_unindexed_fragments`, reusing the
fragment_bitmap logic). `optimize_indices` is a committing call with no
uncommitted variant in lance-6.0.1, so it is an inline-commit residual covered
by the existing `SidecarKind::Optimize` recovery sidecar spanning both ops.
Blob-bearing tables are still skipped (the Lance blob-compaction bug is
compaction-specific; reindex-for-blob deferred as a noted follow-up).

Tests: maintenance.rs asserts an appended fragment is uncovered before and
covered after optimize, and idempotency holds (second pass is a no-op).
lance_surface_guards pins the `optimize_indices` signature and its incremental-
coverage behavior. The existing optimize Phase-B recovery failpoint now also
exercises a crash after reindex. Docs: maintenance.md, writes.md, invariants.md,
lance.md, AGENTS.md.
This commit is contained in:
Ragnor Comerford 2026-06-13 18:47:41 +02:00
parent 481de860b2
commit 0edcf3ec59
No known key found for this signature in database
9 changed files with 259 additions and 22 deletions

View file

@ -241,10 +241,10 @@ omnigraph policy explain --actor act-alice --action change --branch main
| Per-dataset versioning + time travel | ✅ | `snapshot_at_version`, `entity_at`, snapshot-pinned reads across many tables |
| Per-dataset branches | ✅ | **Graph-level** branches (atomic across all sub-tables), lazy fork, system branch filtering |
| Atomic single-dataset commits | ✅ | **Multi-table publish via three layers**, NOT a single Lance primitive: (1) per-table Lance `commit_staged` for the data write, (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering, (3) the open-time recovery sweep for the residual gap between (1) and (2). All three layers ship; the five migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`, `optimize_all_tables`) write a `__recovery/{ulid}.json` sidecar before Phase B and delete it after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the sweep in `db/manifest/recovery.rs`: classify, decide all-or-nothing per sidecar, roll forward via single `ManifestBatchPublisher::publish` or roll back via `Dataset::restore` followed by a manifest publish of the restored version (so both directions converge to `manifest == HEAD` — no residual drift), and record an audit row in `_graph_commit_recoveries.lance` (queryable via `omnigraph commit list --filter actor=omnigraph:recovery`). The write entry points (`load_as`, `mutate_as`, `apply_schema_as`, `branch_merge_as`) and `refresh` additionally run an in-process roll-forward-only heal (serialized against live writers via the per-table write queues), so a long-lived server converges on its next write without restart; only rollback-eligible sidecars still defer to the next read-write open (a future background reconciler's goal). Engine writes route through a sealed `TableStorage` trait (`db.storage()`) exposing only `stage_*` + `commit_staged` + reads; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so the default surface cannot couple a write with a HEAD advance — §1 holds by construction. `delete_where` and `create_vector_index` stay inline until upstream Lance ships a public two-phase API ([#6658](https://github.com/lance-format/lance/issues/6658), [#6666](https://github.com/lance-format/lance/issues/6666)); `LoadMode::Overwrite` uses Lance `Overwrite` staged transactions. |
| Compaction (`compact_files`) | ✅ | `omnigraph optimize` orchestrates over all node/edge tables, bounded concurrency; **publishes each compacted table's new version to `__manifest`** (so the manifest tracks the Lance HEAD — required for reads to observe compaction and for schema apply / strict writes to pass their HEAD-vs-manifest precondition), under the per-`(table, main)` write queue with `SidecarKind::Optimize` recovery coverage; **refuses on an unrecovered graph** (errors if a `__recovery` sidecar is pending); **skips uncovered HEAD > manifest drift** with `DriftNeedsRepair` instead of interpreting it; **skips blob-bearing tables** (reported via `TableOptimizeStats.skipped`, not silent), gated on `LANCE_SUPPORTS_BLOB_COMPACTION` until the upstream blob-v2 compaction-decode bug is fixed (see [docs/dev/invariants.md](docs/dev/invariants.md) Known Gaps) |
| Compaction (`compact_files`) + reindex (`optimize_indices`) | ✅ | `omnigraph optimize` orchestrates over all node/edge tables, bounded concurrency; per table runs `compact_files` **then Lance `optimize_indices`** (folds appended/rewritten fragments back into existing indexes — incremental merge, not retrain) and **publishes the resulting version to `__manifest`** (so the manifest tracks the Lance HEAD — required for reads to observe the work and for schema apply / strict writes to pass their HEAD-vs-manifest precondition), under the per-`(table, main)` write queue with `SidecarKind::Optimize` recovery coverage spanning both ops; **commits even with no compaction work if index coverage is stale**; **refuses on an unrecovered graph**; **skips uncovered HEAD > manifest drift** with `DriftNeedsRepair`; **skips blob-bearing tables** (reported via `TableOptimizeStats.skipped`, not silent; reindex is skipped for them too today), gated on `LANCE_SUPPORTS_BLOB_COMPACTION` until the upstream blob-v2 compaction-decode bug is fixed (see [docs/dev/invariants.md](docs/dev/invariants.md) Known Gaps) |
| Repair uncovered drift | — | `omnigraph repair` explicitly classifies uncovered table `HEAD > manifest` drift: verified maintenance drift (`ReserveFragments`/`Rewrite`) can be published with `--confirm`; suspicious or unverifiable drift requires `--force --confirm`. Sidecar-covered crash residuals still recover automatically on open. |
| Cleanup (`cleanup_old_versions`) | ✅ | `omnigraph cleanup` with `--keep` / `--older-than` policy |
| BTREE / inverted (FTS) / vector indexes | ✅ | `ensure_indices` builds them on every relevant column; idempotent; lazy across branches |
| BTREE / inverted (FTS) / vector indexes | ✅ | `ensure_indices` builds them per `@index`/`@key` column, dispatched by type via `node_prop_index_kind` (enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector); idempotent; lazy across branches. Coverage of fragments appended after build is restored by `optimize`'s `optimize_indices` pass (see Compaction row). |
| `merge_insert` upsert | ✅ | `LoadMode::Merge`, mutation `update`/`insert`/`delete` lowering |
| Vector search | ✅ | `nearest()` query op; embedding pipeline (Gemini / OpenAI clients); `@embed` in schema |
| Full-text search | ✅ | `search/fuzzy/match_text/bm25` query ops |

View file

@ -32,6 +32,8 @@ use lance::dataset::cleanup::{CleanupPolicy, RemovalStats};
use lance::dataset::optimize::{
CompactionMetrics, CompactionOptions, compact_files, plan_compaction,
};
use lance::index::DatasetIndexExt;
use lance_index::optimize::OptimizeOptions;
use super::*;
@ -361,16 +363,22 @@ async fn optimize_one_table(
}
// Precise "will it compact?" check — `plan_compaction` also accounts for
// deletion materialization (which can rewrite even a single fragment). A
// steady-state already-compacted table yields an empty plan and is never
// pinned in a sidecar (a zero-commit pin would classify NoMovement on
// recovery and force an all-or-nothing rollback). Uncovered pre-existing
// drift is skipped above and must go through explicit repair.
// deletion materialization (which can rewrite even a single fragment).
let options = CompactionOptions::default();
let plan = plan_compaction(&ds, &options)
.await
.map_err(|e| OmniError::Lance(e.to_string()))?;
if plan.num_tasks() == 0 {
let will_compact = plan.num_tasks() > 0;
// Even when there is nothing to compact, the table may still have index
// work: rows appended since the index was built (e.g. via `ingest --mode
// merge`) are scanned unindexed until folded in. Either compaction or stale
// index coverage is enough to enter the publish path. If NEITHER, this
// table is a no-op and must NOT be pinned in a sidecar — a zero-commit pin
// classifies NoMovement on recovery and forces an all-or-nothing rollback
// of sibling tables' legitimate work. Uncovered pre-existing manifest/HEAD
// drift is skipped above and must go through explicit repair.
let needs_reindex = TableStore::has_unindexed_fragments(&ds).await?;
if !will_compact && !needs_reindex {
return Ok(TableOptimizeStats::compacted(
table_key,
&CompactionMetrics::default(),
@ -378,8 +386,9 @@ async fn optimize_one_table(
));
}
// Phase A: recovery sidecar BEFORE compaction advances the Lance HEAD, so a
// crash before the manifest publish rolls forward on next open.
// Phase A: recovery sidecar BEFORE any HEAD-advancing op (compaction or
// index optimize), so a crash before the manifest publish rolls forward on
// next open.
let sidecar = crate::db::manifest::new_sidecar(
crate::db::manifest::SidecarKind::Optimize,
None,
@ -398,11 +407,26 @@ async fn optimize_one_table(
let handle =
crate::db::manifest::write_sidecar(db.root_uri(), db.storage_adapter(), &sidecar).await?;
// Phase B: compaction (reserve-fragments + rewrite commits advance HEAD).
// Phase B: compaction (if any) then incremental index optimize — both
// advance Lance HEAD inside the sidecar window. `compact_files` rewrites
// fragments and drops them from existing index segments' coverage;
// `optimize_indices` folds the rewritten and any previously-unindexed
// fragments back in (Lance's incremental merge, not a full retrain). This
// is the same compact -> optimize_indices sequencing LanceDB's `optimize()`
// uses. `optimize_indices` is an inline-commit residual: lance-6.0.1
// exposes no uncommitted variant, so like `compact_files` it commits
// directly and relies on the sidecar for recovery.
let version_before = ds.version().version;
let metrics: CompactionMetrics = compact_files(&mut ds, options, None)
let metrics: CompactionMetrics = if will_compact {
compact_files(&mut ds, options, None)
.await
.map_err(|e| OmniError::Lance(e.to_string()))?
} else {
CompactionMetrics::default()
};
ds.optimize_indices(&OptimizeOptions::default())
.await
.map_err(|e| OmniError::Lance(e.to_string()))?;
.map_err(|e| OmniError::Lance(format!("optimize_indices on {}: {}", table_key, e)))?;
let version_after = ds.version().version;
let committed = version_after != version_before;

View file

@ -705,6 +705,36 @@ impl TableStore {
Ok(IndexCoverage::Indexed)
}
/// True if any non-system index on `ds` leaves at least one current
/// fragment uncovered, i.e. rows that the index does not yet account for
/// (appended after the index was built, or rewritten by compaction). Such
/// fragments are scanned unindexed until a reindex (`optimize_indices`)
/// folds them in. Returns false when every index covers every fragment, or
/// when the table has no (non-system) indices to optimize. A `None`
/// `fragment_bitmap` means Lance cannot report coverage for that index, so
/// we do not treat it as uncovered (mirrors `key_column_index_coverage`).
///
/// Used by `optimize` to decide whether an otherwise-already-compacted
/// table still has index work to do.
pub async fn has_unindexed_fragments(ds: &Dataset) -> Result<bool> {
let indices = ds
.load_indices()
.await
.map_err(|e| OmniError::Lance(e.to_string()))?;
let frag_ids: Vec<u32> = ds.fragments().iter().map(|f| f.id as u32).collect();
for index in indices.iter() {
if is_system_index(index) {
continue;
}
if let Some(bitmap) = index.fragment_bitmap.as_ref() {
if frag_ids.iter().any(|id| !bitmap.contains(*id)) {
return Ok(true);
}
}
}
Ok(false)
}
pub async fn count_rows(&self, ds: &Dataset, filter: Option<String>) -> Result<usize> {
ds.count_rows(filter)
.await

View file

@ -36,6 +36,7 @@ use lance::dataset::{MergeInsertBuilder, WhenMatched, WhenNotMatched, WriteMode,
use lance::index::DatasetIndexExt;
use lance_file::version::LanceFileVersion;
use lance_index::IndexType;
use lance_index::optimize::OptimizeOptions;
use lance_index::scalar::ScalarIndexParams;
use lance_namespace::LanceNamespace;
use lance_table::io::commit::ManifestNamingScheme;
@ -541,3 +542,108 @@ async fn fragment_deletion_metadata_is_available() {
per-fragment deletions and would need to read the deletion vector.",
);
}
// --- Guard 14: Dataset::optimize_indices signature ----------------------------
//
// `db/omnigraph/optimize.rs::optimize_one_table` calls
// `ds.optimize_indices(&OptimizeOptions::default())` (via `DatasetIndexExt`) to
// fold appended/compacted fragments back into existing indexes. If Lance
// changes the receiver, the options type, or the return shape, this fails to
// compile. Compile-only.
#[allow(
dead_code,
unreachable_code,
unused_variables,
unused_mut,
clippy::diverging_sub_expression
)]
async fn _compile_optimize_indices_signature() -> lance::Result<()> {
let mut ds: Dataset = unimplemented!();
let options = OptimizeOptions::default();
// `&mut self`, `&OptimizeOptions`, returns `Result<()>` (mutates in place
// and commits — there is no uncommitted variant in this Lance, which is why
// optimize treats it as an inline-commit residual under a recovery sidecar).
let _: () = ds.optimize_indices(&options).await?;
Ok(())
}
// --- Guard 15: optimize_indices extends fragment coverage ----------------------
//
// PR3's reindex assumes `optimize_indices` folds fragments appended AFTER an
// index was built into that index (incremental merge, not retrain). This pins
// that Lance behavior at the surface layer so a regression turns red here, the
// first smoke check on a Lance bump, before the slower engine suite.
#[tokio::test]
async fn optimize_indices_extends_fragment_coverage() {
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().join("guard_optimize_indices.lance");
let uri = uri.to_str().unwrap();
// Fragment 0: alice, bob. Build a BTREE over `value` covering only it.
let mut ds = fresh_dataset(uri).await;
ds.create_index_builder(&["value"], IndexType::BTree, &ScalarIndexParams::default())
.replace(true)
.await
.unwrap();
// Append a second fragment the existing index does not cover.
let schema = Arc::new(Schema::new(vec![
Field::new("id", DataType::Utf8, false),
Field::new("value", DataType::Int32, false),
]));
let batch = RecordBatch::try_new(
schema.clone(),
vec![
Arc::new(StringArray::from(vec!["carol"])),
Arc::new(Int32Array::from(vec![3])),
],
)
.unwrap();
let reader = RecordBatchIterator::new(vec![Ok(batch)], schema);
let params = WriteParams {
mode: WriteMode::Append,
enable_stable_row_ids: true,
data_storage_version: Some(LanceFileVersion::V2_2),
..Default::default()
};
Dataset::write(reader, uri, Some(params)).await.unwrap();
let mut ds = Dataset::open(uri).await.unwrap();
assert!(
value_index_uncovered_count(&ds).await > 0,
"appended fragment should be uncovered by the BTREE before optimize_indices"
);
ds.optimize_indices(&OptimizeOptions::default())
.await
.unwrap();
assert_eq!(
value_index_uncovered_count(&ds).await,
0,
"optimize_indices must fold the appended fragment into the existing index \
(incremental coverage); if this regresses, PR3's reindex no longer keeps \
coverage current revisit db/omnigraph/optimize.rs and docs/dev/lance.md."
);
}
/// Count current fragments not covered by the single-column `value` BTREE —
/// mirrors `TableStore::has_unindexed_fragments` (load_indices +
/// `fragment_bitmap.contains`), pinned by Guard 11.
async fn value_index_uncovered_count(ds: &Dataset) -> usize {
let indices = ds.load_indices().await.unwrap();
let frag_ids: Vec<u32> = ds.fragments().iter().map(|f| f.id as u32).collect();
let value_fid = ds.schema().field("value").unwrap().id;
for index in indices.iter() {
if index.fields.len() == 1 && index.fields[0] == value_fid {
if let Some(bitmap) = index.fragment_bitmap.as_ref() {
return frag_ids.iter().filter(|id| !bitmap.contains(**id)).count();
}
}
}
// No `value` index found — treat as fully uncovered so a missing index
// is never mistaken for full coverage.
frag_ids.len()
}

View file

@ -14,9 +14,11 @@ use omnigraph::db::{
SkipReason,
};
use omnigraph::loader::{LoadMode, load_jsonl};
use omnigraph::table_store::{IndexCoverage, TableStore};
use helpers::{
MUTATION_QUERIES, TEST_DATA, TEST_SCHEMA, count_rows, init_and_load, mixed_params, mutate_main,
snapshot_main,
};
/// Filesystem URI of a node sub-table, mirroring the engine's layout
@ -131,6 +133,72 @@ async fn optimize_after_load_then_again_is_idempotent() {
}
}
// PR3 (Workstream B): an existing scalar index does not cover fragments
// appended after it was built (build_indices is existence-gated), so those
// rows are scanned unindexed. `optimize` must fold them back in via Lance's
// incremental `optimize_indices`, restoring full coverage.
#[tokio::test]
async fn optimize_reindexes_fragments_appended_after_index_build() {
const SCHEMA: &str = r#"
node Doc {
slug: String @key
rank: I32 @index
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, SCHEMA).await.unwrap();
// First load builds the id + rank BTREEs over the initial fragment.
load_jsonl(
&mut db,
"{\"type\":\"Doc\",\"data\":{\"slug\":\"d1\",\"rank\":1}}\n\
{\"type\":\"Doc\",\"data\":{\"slug\":\"d2\",\"rank\":2}}",
LoadMode::Merge,
)
.await
.unwrap();
// A second load with NEW keys appends a fragment the existing BTREEs do not
// cover (the existence gate skips re-building an index that already exists).
load_jsonl(
&mut db,
"{\"type\":\"Doc\",\"data\":{\"slug\":\"d3\",\"rank\":3}}\n\
{\"type\":\"Doc\",\"data\":{\"slug\":\"d4\",\"rank\":4}}",
LoadMode::Merge,
)
.await
.unwrap();
// Precondition: the appended fragment is unindexed.
{
let snap = snapshot_main(&db).await.unwrap();
let ds = snap.open("node:Doc").await.unwrap();
assert!(
TableStore::has_unindexed_fragments(&ds).await.unwrap(),
"appended fragment should be unindexed before optimize"
);
}
db.optimize().await.unwrap();
// Postcondition: optimize_indices folded the appended fragment in, so every
// index covers every fragment and `rank` reports fully Indexed.
let snap = snapshot_main(&db).await.unwrap();
let ds = snap.open("node:Doc").await.unwrap();
assert!(
!TableStore::has_unindexed_fragments(&ds).await.unwrap(),
"optimize must extend index coverage to all fragments"
);
assert_eq!(
TableStore::key_column_index_coverage(&ds, "rank")
.await
.unwrap(),
IndexCoverage::Indexed,
"rank BTREE must cover all fragments after optimize"
);
}
// Regression: `optimize` must not crash on a graph that has a `Blob` table.
//
// Lance `compact_files` forces `BlobHandling::AllBinary`, which mis-decodes

View file

@ -105,7 +105,7 @@ Use it this way:
| Schema validation | Type checks, required fields, defaults, edge endpoint checks, and edge cardinality are enforced on write paths | [schema-language.md](../user/schema-language.md), [execution.md](execution.md) |
| Unique constraints | Intra-batch and write-path checks exist; intake and branch-merge derive the composite key through one shared function (`loader::composite_unique_key`, a separator-free `Vec<String>` tuple) and fail loudly on an un-keyable column type rather than silently exempting it; full cross-version uniqueness against already-committed rows is still a gap | [schema-language.md](../user/schema-language.md) |
| Storage trait | `TableStorage` (via `db.storage()`) is staged-only; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so §1 holds by construction; capability/stat surfaces are roadmap | [writes.md](writes.md), [architecture.md](architecture.md) |
| Index lifecycle | `ensure_indices` is explicit today; reconciler-based convergence is roadmap | [indexes.md](../user/indexes.md), [maintenance.md](../user/maintenance.md) |
| Index lifecycle | Index *creation* per `@index`/`@key` property is dispatched by type (enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector) via `node_prop_index_kind`; index *coverage maintenance* exists — `optimize` runs Lance `optimize_indices` after compaction to fold appended/rewritten fragments into existing indexes (still an explicit maintenance call, not yet a background reconciler) | [indexes.md](../user/indexes.md), [maintenance.md](../user/maintenance.md) |
| Traversal IDs | Runtime still builds `TypeIndex`; Lance stable row-id based graph IDs are roadmap | [architecture.md](architecture.md), [query-language.md](../user/query-language.md) |
| Auth | Bearer token hashing and server-side actor resolution are implemented at the HTTP boundary | [server.md](../user/server.md), [policy.md](../user/policy.md) |
| Tests | Tempdir-backed Lance tests are the current substrate; the storage adapter has an in-memory backend for adapter-level contract tests, but Lance datasets bypass it | [testing.md](testing.md) |

View file

@ -169,6 +169,7 @@ Migration from Lance 4.0.0 → 6.0.1 landed in this cycle (DataFusion 52 → 53,
- **`Dataset::checkout_version(N).await?.restore().await?`**: `restore()` takes `&mut self` and returns `Result<()>` (mutates in place, does not consume + return a new dataset). The recovery rollback hammer at `db/manifest/recovery.rs:505-522` continues to work. Pinned by `lance_surface_guards.rs::_compile_checkout_version_then_restore_signature`.
- **`DatasetBuilder::from_namespace(...).with_branch(...).with_version(...).load()`** surface preserved (the namespace builder chain at `db/manifest/namespace.rs:162-174`). Pinned by `lance_surface_guards.rs::_compile_dataset_builder_from_namespace_signature`.
- **`compact_files(&mut ds, CompactionOptions::default(), None)`** signature stable. `CompactionOptions` still does not expose `data_storage_version`; `compact_files` builds its own `WriteParams { ..Default::default() }`. Note: `LanceFileVersion::default()` is now V2_1 in v6, so optimize-rewritten fragments come out at V2_1 by default (was V2_0 in v4). Existing explicit V2_2 pins on creates/appends still apply.
- **`Dataset::optimize_indices(&mut self, &lance_index::optimize::OptimizeOptions)`** (via `DatasetIndexExt`) is a depended-on surface as of the index-coverage work: `db/omnigraph/optimize.rs` calls it after `compact_files` to fold appended/rewritten fragments into existing indexes (incremental merge, not retrain). It is a **committing** call (mutates in place, advances HEAD; no uncommitted variant in v6.0.1), so optimize treats it as an inline-commit residual under the `SidecarKind::Optimize` recovery sidecar. Signature pinned by `lance_surface_guards.rs::_compile_optimize_indices_signature`; the incremental-coverage behavior pinned by `optimize_indices_extends_fragment_coverage` (appended fragment uncovered before, covered after).
- **`Dataset::delete(predicate)` returns `DeleteResult { new_dataset: Arc<Dataset>, num_deleted_rows: u64 }`** — unchanged shape. Pinned by `lance_surface_guards.rs::_compile_delete_result_field_shape`. MR-A will repurpose this guard to the staged two-phase variant once `DeleteBuilder::execute_uncommitted` migration lands.
- **File reader read methods now async** (Lance PR #6710, v6.0). No effect — omnigraph reaches Lance exclusively through `Dataset::scan` and the staged-write API.
- **Tokenizer vendored as `lance-tokenizer`** (Lance PR #6512, v6.0). No effect — no direct tokenizer imports.
@ -178,6 +179,6 @@ Migration from Lance 4.0.0 → 6.0.1 landed in this cycle (DataFusion 52 → 53,
- **`Dataset::force_delete_branch`** (`branches().delete(name, force=true)`, dataset.rs:524) tolerates a missing branch-*contents* ref (vs plain `delete_branch`'s `RefNotFound`), but on the local store still errors `NotFound` if the branch `tree/` directory is fully absent (`remove_dir_all`'s NotFound is not caught for Lance's native error variant, refs.rs:526-549). Both variants still refuse a branch with referencing descendants (`RefConflict`). `TableStore::force_delete_branch` wraps this to be fully idempotent (tolerates already-absent). The single-authority branch-delete redesign uses it for orphan reclamation (eager best-effort reclaim + cleanup reconciler). Pinned by `lance_surface_guards.rs::force_delete_branch_semantics`. Branch delete is "flip the ref atomically, then `remove_dir_all(tree/{branch})`"; branch-exclusive data lives under `tree/{branch}/` so a drop reclaims it immediately without touching `main`.
- **Lance blob-v2 `compact_files` bug** (no public issue found as of 2026-06): `compact_files` disables binary-copy for blob datasets and forces `BlobHandling::AllBinary` on the read side; the v2.1+ structural decoder then mis-counts column infos for the blob-v2 struct and fails with `Invalid user input: there were more fields in the schema than provided column indices / infos` (`lance-encoding/src/decoder.rs::ColumnInfoIter::expect_next`). This fails even a pristine uniform-V2_2 multi-fragment blob table; vector/list/scalar/ragged columns and mixed file versions all compact fine. Reads/queries use descriptor handling (`BlobHandling::default()`) and are unaffected. `optimize` skips blob-bearing tables behind `LANCE_SUPPORTS_BLOB_COMPACTION = false` (`db/omnigraph/optimize.rs`), reporting `SkipReason::BlobColumnsUnsupportedByLance`. Pinned by `lance_surface_guards.rs::compact_files_still_fails_on_blob_columns`, which turns red when the bug is fixed → flip the gate, remove the skip branch + the `maintenance.rs::optimize_skips_blob_table_and_reports_skip` skip assertions.
Surface guards added: `crates/omnigraph/tests/lance_surface_guards.rs` (10 named guards; 5 runtime + 5 compile-only). Future Lance bumps re-run this file first as the smoke check. Two additional guards from the original plan deferred to follow-up (`manifest_cas_returns_row_level_contention_variant` needs full publisher-race harness; `table_version_metadata_byte_compatible_with_v4` needs `pub(crate)` reach extension).
Surface guards added: `crates/omnigraph/tests/lance_surface_guards.rs` (10 named guards; 5 runtime + 5 compile-only; plus the index-coverage work's `_compile_optimize_indices_signature` and `optimize_indices_extends_fragment_coverage`). Future Lance bumps re-run this file first as the smoke check. Two additional guards from the original plan deferred to follow-up (`manifest_cas_returns_row_level_contention_variant` needs full publisher-race harness; `table_version_metadata_byte_compatible_with_v4` needs `pub(crate)` reach extension).
Bump this date stanza on the next alignment pass.

View file

@ -80,10 +80,17 @@ deferred to a follow-up cycle — tracked).
Three writers have been migrated onto staged primitives:
* **`ensure_indices`** (`db/omnigraph/table_ops.rs::build_indices_on_dataset_for_catalog`)
— scalar indices (BTree, Inverted) now use `stage_create_*_index` +
`commit_staged`. Vector indices stay inline (residual — Lance
`build_index_metadata_from_segments` is `pub(crate)` in 6.0.1;
companion ticket to lance-format/lance#6658 needed).
— scalar indices (BTree, Inverted) use `stage_create_*_index` +
`commit_staged`. Which index a `@index`/`@key` property gets is dispatched by
type via `node_prop_index_kind` (enum + orderable scalar → BTree, free-text
String → Inverted/FTS, Vector → vector). Vector indices stay inline (residual
— Lance `build_index_metadata_from_segments` is `pub(crate)` in 6.0.1;
companion ticket to lance-format/lance#6658 needed). This build is
existence-gated (it creates a *missing* index over current fragments); folding
fragments appended afterward into an *existing* index is `optimize`'s
`optimize_indices` pass — an inline-commit residual, not a staged write (Lance
exposes no uncommitted index-optimize), covered by the optimize recovery
sidecar (see [maintenance.md](../user/maintenance.md)).
* **`branch_merge::publish_rewritten_merge_table`**
(`exec/merge.rs`) — merge_insert now uses `stage_merge_insert` +
`commit_staged`. Deletes stay inline (Lance #6658 residual).

View file

@ -4,14 +4,15 @@
## `optimize_all_tables(db)` — non-destructive
- Lance `compact_files()` on every node + edge table on `main`, then **publishes the compacted version to the `__manifest`** so the manifest's `table_version` tracks the compacted Lance HEAD. Reads pin the manifest version, so without this publish compaction would be invisible to readers *and* would break the HEAD-vs-manifest precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually compacted.
- Lance `compact_files()` then `optimize_indices()` on every node + edge table on `main`, then **publishes the resulting version to the `__manifest`** so the manifest's `table_version` tracks the compacted-and-reindexed Lance HEAD. Reads pin the manifest version, so without this publish the work would be invisible to readers *and* would break the HEAD-vs-manifest precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually changed.
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older manifests until `cleanup` runs.
- Each table's compact→publish runs under its per-`(table, main)` write queue (serializing with concurrent mutations — compaction is a Lance `Rewrite` op that retryable-conflicts with a concurrent merge/update/delete on overlapping fragments). The Lance-HEAD-before-manifest-publish gap is covered by a `SidecarKind::Optimize` recovery sidecar (loose-match): a crash in that window rolls the compacted version forward on the next `Omnigraph::open` (compaction is content-preserving, so roll-forward is always safe).
- **Reindex (index coverage maintenance).** A scalar/FTS/vector index only covers the fragments it was built over. Rows appended after the index was built (e.g. by `ingest --mode merge`, whose commit does not rebuild an already-existing index) are scanned unindexed, and `compact_files` itself rewrites fragments out of an index's coverage. `optimize` runs Lance's incremental `optimize_indices` after compaction to fold those fragments back in (a delta merge, not a full retrain), restoring full coverage so equality/range/traversal predicates stay index-accelerated. This is why a table with **no compaction work but stale index coverage still commits** a new version under `optimize`. Run `optimize` on a cadence at least as frequent as your freshness window so recently-ingested rows do not linger in the unindexed flat-scan tail. `optimize_indices` is an inline-commit residual (Lance exposes no uncommitted variant), so it advances Lance HEAD like compaction and is covered by the same `SidecarKind::Optimize` recovery sidecar.
- Each table's compact→reindex→publish runs under its per-`(table, main)` write queue (serializing with concurrent mutations — compaction is a Lance `Rewrite` op that retryable-conflicts with a concurrent merge/update/delete on overlapping fragments). The Lance-HEAD-before-manifest-publish gap (now spanning both compaction and `optimize_indices`) is covered by a single `SidecarKind::Optimize` recovery sidecar (loose-match): a crash in that window rolls the compacted-and-reindexed version forward on the next `Omnigraph::open` (both ops are content-preserving, so roll-forward is always safe).
- **Requires a recovered graph.** `optimize` refuses (errors) when an unresolved recovery sidecar is present under `__recovery` — operating on an unrecovered graph could publish a partial write the open-time recovery sweep would roll back. Reopen the graph to run the recovery sweep, then re-run `optimize`.
- **Uncovered drift is skipped, not interpreted.** If a table's Lance HEAD is ahead of the version recorded in `__manifest` and no recovery sidecar covers that movement, `optimize` reports `skipped: Some(DriftNeedsRepair)` with the manifest/head versions and leaves the table untouched. Run `omnigraph repair` to classify and explicitly publish that drift.
- Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8).
- Returns `[TableOptimizeStats { table_key, fragments_removed, fragments_added, committed, skipped, manifest_version, lance_head_version }]`.
- **Blob tables are skipped.** A table that declares any `Blob` property is not compacted: it is reported with `skipped: Some(BlobColumnsUnsupportedByLance)` (and logged via `tracing::warn`) instead of compacted, and the rest of the sweep proceeds normally. The current Lance `compact_files` mis-decodes blob-v2 columns under its forced `BlobHandling::AllBinary` read; **reads and writes are unaffected** — only compaction is. This is gated by `LANCE_SUPPORTS_BLOB_COMPACTION` (`db/omnigraph/optimize.rs`) and removed when the upstream Lance fix lands (see [docs/dev/lance.md](../dev/lance.md)). Consequence: fragment count and deleted-row space on blob tables are not reclaimed until then; query results are never affected.
- **Blob tables are skipped.** A table that declares any `Blob` property is not compacted: it is reported with `skipped: Some(BlobColumnsUnsupportedByLance)` (and logged via `tracing::warn`) instead of compacted, and the rest of the sweep proceeds normally. The current Lance `compact_files` mis-decodes blob-v2 columns under its forced `BlobHandling::AllBinary` read; **reads and writes are unaffected** — only compaction is. This is gated by `LANCE_SUPPORTS_BLOB_COMPACTION` (`db/omnigraph/optimize.rs`) and removed when the upstream Lance fix lands (see [docs/dev/lance.md](../dev/lance.md)). Consequence: fragment count and deleted-row space on blob tables are not reclaimed until then; query results are never affected. A skipped blob table is also **not reindexed** in the same sweep (the skip happens before the reindex step), so its index coverage on appended rows is not refreshed by `optimize` today — a follow-up may split reindex out of the blob skip since `optimize_indices` does not hit the blob-compaction bug.
## `repair_all_tables(db, options)` — explicit