mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-15 01:55:13 +02:00
feat(engine): reindex in optimize to keep index coverage current
A scalar/FTS/vector index only covers the fragments it was built over. Rows appended after the build (e.g. `ingest --mode merge`, whose commit does not rebuild an existing index) are scanned unindexed, and `compact_files` rewrites fragments out of coverage. Nothing folded them back in, so coverage decayed as the graph grew, even for the id/src/dst BTREEs that power traversal.

`optimize_one_table` now runs Lance `optimize_indices` after `compact_files` (incremental merge, not retrain; the same compact->optimize_indices sequence LanceDB's `optimize()` uses) and enters the publish path on compaction work OR stale index coverage (new `TableStore::has_unindexed_fragments`, reusing the fragment_bitmap logic).

`optimize_indices` is a committing call with no uncommitted variant in lance-6.0.1, so it is an inline-commit residual covered by the existing `SidecarKind::Optimize` recovery sidecar spanning both ops.

Blob-bearing tables are still skipped (the Lance blob-compaction bug is compaction-specific; reindex-for-blob deferred as a noted follow-up).

Tests: maintenance.rs asserts an appended fragment is uncovered before and covered after optimize, and idempotency holds (second pass is a no-op). lance_surface_guards pins the `optimize_indices` signature and its incremental-coverage behavior. The existing optimize Phase-B recovery failpoint now also exercises a crash after reindex.

Docs: maintenance.md, writes.md, invariants.md, lance.md, AGENTS.md.
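The coverage decay and the compact-then-reindex fix described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the engine's code: `Table`, `IndexCoverage`, and the set-based "bitmap" are hypothetical stand-ins, and the methods only borrow the names `compact_files`, `optimize_indices`, and `has_unindexed_fragments` from the commit message; they do not call Lance.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an index's fragment bitmap:
/// the ids of the fragments the index was built over.
#[derive(Debug, Clone, Default)]
pub struct IndexCoverage {
    pub covered: BTreeSet<u64>,
}

/// Hypothetical stand-in for one table: live fragment ids plus a single index.
#[derive(Debug, Clone, Default)]
pub struct Table {
    pub fragments: BTreeSet<u64>,
    pub index: IndexCoverage,
}

impl Table {
    /// Fragments a query would have to scan without index help.
    pub fn unindexed_fragments(&self) -> BTreeSet<u64> {
        self.fragments
            .difference(&self.index.covered)
            .copied()
            .collect()
    }

    /// Model of the stale-coverage check: any live fragment outside the bitmap.
    pub fn has_unindexed_fragments(&self) -> bool {
        !self.unindexed_fragments().is_empty()
    }

    /// An append adds a fragment but leaves the existing index untouched.
    pub fn append(&mut self, fragment_id: u64) {
        self.fragments.insert(fragment_id);
    }

    /// Model of compaction: rewrite all fragments into one new fragment.
    /// The new fragment id is not in the index bitmap. Returns true if work was done.
    pub fn compact_files(&mut self, new_fragment_id: u64) -> bool {
        if self.fragments.len() < 2 {
            return false;
        }
        self.fragments.clear();
        self.fragments.insert(new_fragment_id);
        true
    }

    /// Model of the incremental reindex: align coverage with the live fragments.
    pub fn optimize_indices(&mut self) {
        self.index.covered = self.fragments.clone();
    }

    /// Compact, then reindex if coverage is stale.
    /// Returns true when either step did work (i.e. a publish would be needed).
    pub fn optimize(&mut self, new_fragment_id: u64) -> bool {
        let compacted = self.compact_files(new_fragment_id);
        let stale = self.has_unindexed_fragments();
        if stale {
            self.optimize_indices();
        }
        compacted || stale
    }
}
```

In this model, appending a fragment after an index build makes `has_unindexed_fragments` return true, a first `optimize` pass restores full coverage and reports work, and a second pass reports no work, which mirrors the idempotency property the commit's tests assert.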
parent 481de860b2
commit 0edcf3ec59
9 changed files with 259 additions and 22 deletions
@@ -80,10 +80,17 @@ deferred to a follow-up cycle — tracked).
 Three writers have been migrated onto staged primitives:
 
 * **`ensure_indices`** (`db/omnigraph/table_ops.rs::build_indices_on_dataset_for_catalog`)
-— scalar indices (BTree, Inverted) now use `stage_create_*_index` +
-`commit_staged`. Vector indices stay inline (residual — Lance
-`build_index_metadata_from_segments` is `pub(crate)` in 6.0.1;
-companion ticket to lance-format/lance#6658 needed).
+— scalar indices (BTree, Inverted) use `stage_create_*_index` +
+`commit_staged`. Which index a `@index`/`@key` property gets is dispatched by
+type via `node_prop_index_kind` (enum + orderable scalar → BTree, free-text
+String → Inverted/FTS, Vector → vector). Vector indices stay inline (residual
+— Lance `build_index_metadata_from_segments` is `pub(crate)` in 6.0.1;
+companion ticket to lance-format/lance#6658 needed). This build is
+existence-gated (it creates a *missing* index over current fragments); folding
+fragments appended afterward into an *existing* index is `optimize`'s
+`optimize_indices` pass — an inline-commit residual, not a staged write (Lance
+exposes no uncommitted index-optimize), covered by the optimize recovery
+sidecar (see [maintenance.md](../user/maintenance.md)).
 * **`branch_merge::publish_rewritten_merge_table`**
 (`exec/merge.rs`) — merge_insert now uses `stage_merge_insert` +
 `commit_staged`. Deletes stay inline (Lance #6658 residual).
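The type-based dispatch the added doc lines describe (enum and orderable scalar to BTree, free-text String to Inverted/FTS, Vector to a vector index) could look roughly like the sketch below. The `PropType` and `IndexKind` enums, the function names, and the choice of which scalar types count as "orderable" are assumptions made for illustration; the real `node_prop_index_kind` signature is not shown in this diff.

```rust
/// Hypothetical property types; which scalars count as "orderable" is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PropType {
    Enum,
    Int,
    Float,
    Date,
    String,
    Vector,
}

/// The three index families named in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexKind {
    BTree,
    Inverted,
    Vector,
}

/// Illustrative dispatch: pick the index family from the property type.
pub fn index_kind_for(prop: PropType) -> IndexKind {
    match prop {
        // Enum and orderable scalars get a BTree.
        PropType::Enum | PropType::Int | PropType::Float | PropType::Date => IndexKind::BTree,
        // Free-text strings get an inverted (full-text search) index.
        PropType::String => IndexKind::Inverted,
        PropType::Vector => IndexKind::Vector,
    }
}

/// Per the diff, scalar indices (BTree, Inverted) go through the staged
/// write path, while vector indices are still built inline.
pub fn uses_staged_write(kind: IndexKind) -> bool {
    !matches!(kind, IndexKind::Vector)
}
```

Under these assumptions, `index_kind_for(PropType::String)` yields `IndexKind::Inverted`, which `uses_staged_write` reports as staged, while a `PropType::Vector` property maps to the inline vector path.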