From b183db078ffa57daae10c67041ca2dc12b4720ea Mon Sep 17 00:00:00 2001 From: Ragnor Comerford Date: Mon, 15 Jun 2026 18:48:43 +0200 Subject: [PATCH] Index materialization is derived state: defer off the write path, reconcile via optimize (iss-848) (#246) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * test(engine): reproduce empty-table Vector @index aborting schema apply A Vector (IVF) index trains k-means centroids over the column, so Lance cannot build it on 0 vectors ("Creating empty vector indices with train=False is not yet implemented"). schema apply reconciles a table's whole index set whenever any @index on it changes, so adding an unrelated scalar @index materializes the dormant empty vector index and aborts the entire migration (all-or-nothing). This regression test inits a 0-row Doc with a Vector @index, adds a scalar @index, and asserts the apply succeeds (then loads one embedded row and asserts the deferred index materializes). It fails today at the apply step with the vector-index abort; the fix lands in the next commit. Refs dev-graph iss-empty-vector-index-schema-apply, iss-848. * fix(engine): defer Vector @index on an empty table instead of aborting schema apply build_indices_on_dataset_for_catalog materialized a declared Vector @index unconditionally. On a 0-row table Lance cannot train the IVF index ("Creating empty vector indices with train=False is not yet implemented"), so any later migration that touches the table (e.g. adding an unrelated scalar @index, which reconciles the table's whole index set) aborted the entire migration on the dormant vector index — all-or-nothing. Guard the vector arm with a row-count check, matching the guard ensure_indices_for_branch and the branch-merge rebuild already use: an untrainable column becomes a pending index that a later ensure_indices / optimize materializes once the table has rows. Reads stay correct meanwhile (vector search degrades to a brute-force scan). Stop-gap: the residual rows-present-but-vectors-null window and the full decoupling (intent recorded at apply, an idempotent coverage reconciler) are dev-graph iss-848. Turns the green half of the regression test added in the previous commit. Refs dev-graph iss-empty-vector-index-schema-apply, iss-848, iss-687. * docs(invariants): record the logical-contract-over-physical-state principle The bug class behind the empty-table vector-index abort (and the schema-apply vs optimize version drift) is one shape: a physical operation allowed to fail a logical one. Several hard invariants (2, 5, 7, 13) and deny-list items are already instances of this, but the unifying rule was never written down. Add it to docs/dev/invariants.md as a "Governing principle" section above the hard invariants, naming which invariants and deny-list items instantiate it and the smell to watch for (a logical operation gated on a physical fact). Add a one-line always-on rule (7) in AGENTS.md so it stays in working memory, with the qualifier that genuine logical conflicts still fail loudly — the licence to lag covers physical convergence, not correctness. Audience-neutral: no private ticket refs. check-agents-md.sh passes. * test(engine): index build must tolerate rows with null vectors (load-before-embed) Loading rows whose vector column is null into a `Vector @index` table fails today: build_indices (reached via the loader's prepare_updates_for_commit) calls create_vector_index, and Lance's IVF KMeans errors "cannot train 1 centroids with 0 vectors". The same abort hits ensure_indices/optimize/schema apply/merge, since they all funnel through build_indices_on_dataset_for_catalog. This test loads two null-embedding rows and calls ensure_indices; it must not abort (the untrainable vector column is deferred, sibling indexes still build). Fails today at the load step; fixed in the next commit. Refs dev-graph iss-848, iss-empty-vector-index-schema-apply. * fix(engine): defer unbuildable index columns instead of aborting the write path build_indices_on_dataset_for_catalog is the chokepoint every write path funnels through (load/mutate via prepare_updates_for_commit, schema apply, ensure_indices, optimize, branch merge). Its vector arm called create_vector_index unconditionally, so a column with no trainable vectors yet — an empty table, or rows loaded before `embed` populates them — aborted the whole operation with Lance's IVF KMeans error. Fault-isolate the vector build: on failure, record the column as a PendingIndex (table, column, reason), log it, and continue building the sibling indexes; a later ensure_indices/optimize materializes it once the column is trainable, and reads use brute-force meanwhile. Manifest/CAS/IO errors at the publish boundary still propagate. Isolating at the single chokepoint realizes the governing principle (physical index state never fails a logical operation) for every write path, and supersedes the earlier symptomatic count_rows==0 stop-gap (removed) — closing the residual rows-present-but-vectors-null window it left open. Surfacing pending index status rather than failing is the database norm (Postgres indisvalid, LanceDB list_indices). ensure_indices and the build_indices wrappers now return Vec; optimize surfaces it in a later commit. Refs dev-graph iss-848, iss-951 (vector index stays inline-commit until lance#6666). * test(engine): index-only schema apply must not touch table data Adding an @index to an existing column should be a pure metadata change once index materialization moves to the reconciler (iss-848): the apply records the intent in the catalog/IR but builds nothing inline, so the table's manifest version is unchanged. Today the indexed_tables block builds the index inline and bumps the version (4 -> 5). Fixed in the next commit. Refs dev-graph iss-848. * fix(engine): schema apply records index intent only; index-only apply is metadata Schema apply no longer builds indexes inline. The four build_indices calls (added/renamed/rewritten/index-only tables) are removed; the @index/@key intent is already persisted in the catalog/IR the apply writes, and the physical index is materialized off the critical path by ensure_indices/optimize (iss-848). Concretely: - AddConstraint (an @index addition — every other added constraint plans as UnsupportedChange) becomes a pure metadata step alongside the metadata-only steps: it touches no table data, so the table version is unchanged. - added/renamed/rewritten tables still write their data; only the trailing index build is gone. The rewritten table's coverage is restored later by optimize_indices. - recovery_pins drops index-only tables (they no longer advance Lance HEAD) and keeps rewritten tables; their post_commit_pin = expected+1 is now exact (one rewrite commit), strengthening recovery classification. - the now-orphaned Omnigraph::build_indices_on_dataset_for_catalog wrapper is removed. A migration can no longer abort on an index build, for any index type at any cardinality. Turns the green half of index_only_constraint_apply_touches_no_table_data. Refs dev-graph iss-848. * test(engine): optimize must converge a declared-but-unbuilt index After iss-848, adding an @index post-data is a metadata-only apply that defers the physical build, so the column is declared-indexed but unbuilt (reads scan). `optimize` — the operator's cron reconciler — must materialize it. Today optimize only maintains coverage of EXISTING indexes (optimize_indices) and never creates missing ones, so the rank BTREE stays Degraded after optimize. Fixed next commit. Refs dev-graph iss-848. * fix(engine): optimize materializes declared-but-unbuilt indexes (the reconciler) `omnigraph optimize` is the operator's cron reconciler. It already compacts and folds new fragments into EXISTING indexes (optimize_indices); now it also builds declared-but-missing indexes, so the indexes schema apply / load defer (iss-848) converge on the next optimize. Done inside optimize_one_table (not by composing the all-tables ensure_indices, which is drift-blind and would re-publish the uncovered HEAD>manifest drift that optimize deliberately skips): after the per-table drift/blob skips and under the queue + Optimize sidecar already held, a needs_index_create gate (reusing needs_index_work_node/edge — "declared index missing AND row_count > 0", so empty tables stay no-ops) admits index-only work, and Phase B builds the missing index over the just-compacted layout via the build chokepoint. An untrainable vector column fault-isolates into the new TableOptimizeStats.pending_indexes (the list_indices/indisvalid analog operators read), not a failure. committed now reflects index commits, so the existing post-publish cache invalidation covers them. LanceDB's optimize only maintains existing indexes; creating declared-but-missing ones is the L2 behavior omnigraph's declarative @index needs. Turns the green half of optimize_materializes_index_declared_but_unbuilt. Refs dev-graph iss-848. * docs: index materialization is deferred to the reconciler (iss-848) Update the index-lifecycle docs to reflect the new contract: @index/@key declares intent and the physical index is derived state that never fails a logical operation. Schema apply builds nothing (records intent only); load/mutate build inline through one chokepoint that defers an untrainable Vector column as pending; optimize/ensure_indices is the reconciler that creates declared-but-missing indexes and maintains coverage, reporting still-pending columns. Touches: dev/invariants.md (truth-matrix Index-lifecycle row), AGENTS.md (capability matrix), user/search/indexes.md (L2 orchestration), user/operations/ maintenance.md (optimize reconciler bullet), dev/testing.md (new tests). * test(server): schema_apply_route_can_add_index reflects deferred index build iss-848 made schema apply record @index intent without building the physical index inline. The route test asserted the index count increased after apply; on an empty graph it now stays unchanged (the build is deferred to ensure_indices/optimize). Assert the new contract: apply succeeds and the physical index count is unchanged. * fix(engine): precheck vector trainability — don't pin or swallow (PR review) Two issues Cursor Bugbot caught in the chokepoint fault-isolation: 1. (HIGH) Pending vector pins roll back siblings. needs_index_work_node counted a missing vector index as work whenever the table had rows, so a column with no trainable vectors got pinned in the EnsureIndices recovery sidecar — but the build deferred it (zero commit). On a crash before manifest publish the classifier sees NoMovement and the all-or-nothing decision (recovery.rs decide()) rolls back the WHOLE sidecar, undoing a sibling table's committed index work. 2. (MED) Vector build swallowed fatal errors. The match arm converted every create_vector_index error into a deferred PendingIndex, hiding genuine I/O/manifest/Lance failures as "pending". Fix both with one trainability precheck (vector_column_trainable: >=1 non-null vector, the ivf_flat(1) minimum) used identically by needs_index_work_node and the build arm: an untrainable column is never counted as work (so never pinned — no zero-commit pin) and never attempted (so it can't fail); only a trainable column is built, and then any error PROPAGATES (stays fatal). The deferred column is still recorded as a PendingIndex with a clear reason. Refs dev-graph iss-848. * feat(cli): surface pending index column + reason in optimize output (PR review) Codex (P2): pending_indexes was documented as visible in `optimize --json` but the CLI projection never emitted it — operators would lose the only signal that optimize has deferred index work. Greptile (P2): the stat dropped the reason, so operators saw which column was stuck, not why. Carry the reason: TableOptimizeStats.pending_indexes is now Vec (column + reason), and `omnigraph optimize --json` emits {column, reason} per pending index; human output prints a "↳ index pending on '': " line. Refs dev-graph iss-848. * test: align CLI index-add test with deferred build; cover post-rename reconcile - schema_apply_json_adds_index_for_existing_property (cli_schema_config.rs): the CLI analog of the server test — asserted the index count grew after apply; under iss-848 the apply defers the build, so the count is unchanged on an empty graph. Assert the deferred contract. (The only full-suite failure.) - optimize_materializes_index_after_type_rename (maintenance.rs, new): covers the gap Greptile flagged — a RenameType writes the renamed table with rows but no indexes (inline build removed in Commit B); assert the rank index is Degraded post-rename and Indexed after optimize reconciles it. Refs dev-graph iss-848. * test(engine): in-source apply tests reflect deferred index materialization The two db::omnigraph in-source unit tests asserted the old "schema apply builds / preserves indexes inline" behavior (the only remaining full-suite failures): - test_apply_schema_defers_index_then_reconciler_builds_it (was test_apply_schema_adds_index_for_existing_property): apply records the @index intent but builds nothing; assert the BTREE on `age` is absent after apply and present after ensure_indices. (Uses `age`, unindexed in TEST_SCHEMA — `name @key` is already FTS-indexed at seed.) - test_apply_schema_rewrite_defers_index_then_reconciler_restores (was test_apply_schema_rewrite_preserves_existing_indices): an AddProperty rewrite no longer rebuilds indexes inline; assert ensure_indices restores id BTREE + name FTS after the rewrite. Verified by grep that these + the server/CLI tests are the complete set of "apply builds an index" assertions; all other index-presence tests run after load/ensure_indices/primitives, which still build. Refs dev-graph iss-848. * fix(engine): optimize always reports pending indexes, not only on create-work (PR review) Cursor Bugbot (MED): pending_indexes was filled only when needs_index_create was true, but the vector trainability precheck makes needs_index_work_node exclude an untrainable Vector column. So a table whose sole missing index is untrainable, but which optimize still compacts or reindexes, returned an empty pending_indexes — contradicting the documented operator contract for deferred columns. Run the (idempotent) build chokepoint unconditionally once past the no-op gate, rather than gating it on needs_index_create. It skips existing indexes, builds any buildable missing one, and reports an untrainable column as pending whether the table entered for compaction, reindex, or index creation. needs_index_create still gates the no-op decision (so an index-only table still enters the path). Refs dev-graph iss-848. * test(engine): reframe staged-BTREE-failure failpoint onto the reconciler path ensure_indices_stage_btree_failure_leaves_existing_tables_writable fired `ensure_indices.post_stage_pre_commit_btree` and expected `apply_schema` (adding a type) to fail mid-BTREE-build. iss-848 removed apply's inline index build, so that apply now succeeds and the test's unwrap_err panicked — it exercised a removed code path. Reframe onto where BTREE builds happen now: seed Person, add an `@index` on `age` (apply records intent, defers the build), then `ensure_indices` builds the deferred BTREE and the failpoint fires between stage and commit. Person's HEAD is unchanged (no drift) and its EnsureIndices sidecar pins NoMovement; a write to a different, unpinned table (Company) is unaffected (mutations/loads heal roll-forward and proceed, unlike optimize/repair which refuse on a pending sidecar). Preserves the original coverage (staged-index stage failure leaves other tables writable, no drift) in the new architecture. Refs dev-graph iss-848. * feat(server): converge deferred indexes promptly after schema apply (iss-848) Schema apply records @index intent but defers the physical build. On a long-lived server, spawn a detached best-effort ensure_indices after a successful apply so the indexes converge promptly instead of waiting for the operator's next optimize. Fire-and-forget: it never blocks or fails the apply response, and a failure is logged (the index still converges on the next optimize). Guarded on result.applied. The CLI is one-shot, so it has no equivalent; its convergence path is the optimize cadence. handle.engine is already an Arc, so the spawn takes an owned clone. Convergence itself is covered by the engine ensure_indices/optimize tests; the existing empty-graph schema-apply route tests confirm the response is unaffected (the spawn is a read-only no-op on an empty table). Refs dev-graph iss-848. * docs(maintenance): list pending_indexes in optimize per-table stats (consistency) --- AGENTS.md | 3 +- crates/omnigraph-cli/src/main.rs | 7 + .../omnigraph-cli/tests/cli_schema_config.rs | 8 +- crates/omnigraph-server/src/handlers.rs | 19 +++ .../omnigraph-server/tests/schema_routes.rs | 10 +- crates/omnigraph/src/db/mod.rs | 6 +- crates/omnigraph/src/db/omnigraph.rs | 54 +++++-- crates/omnigraph/src/db/omnigraph/optimize.rs | 67 ++++++-- .../src/db/omnigraph/schema_apply.rs | 94 ++++------- .../omnigraph/src/db/omnigraph/table_ops.rs | 142 +++++++++++++---- crates/omnigraph/tests/failpoints.rs | 95 ++++++----- crates/omnigraph/tests/maintenance.rs | 149 ++++++++++++++++++ crates/omnigraph/tests/schema_apply.rs | 105 ++++++++++++ docs/dev/invariants.md | 34 +++- docs/dev/testing.md | 4 +- docs/user/operations/maintenance.md | 3 +- docs/user/search/indexes.md | 3 +- 17 files changed, 622 insertions(+), 181 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 7e42a2a..b4453be 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -144,6 +144,7 @@ These are architectural rules that need to be in scope on every change. They're 4. **Bearer-token plaintext never persists in process memory.** Tokens are hashed at startup; auth uses constant-time comparison; the actor id is server-resolved from the hash match and must not be settable by the client. 5. **Reads always see the current index state for the branch they're reading.** Indexes track the branch head, not historical snapshots. If you change index lifecycle, preserve this guarantee. 6. **Stable type IDs survive renames.** Schema migration relies on identity that's stable across rename — don't mint new IDs on rename. +7. **Logical contract over physical state.** Physical state (index coverage, fragment layout, compaction versions, staged writes) is derived and rebuildable; it must never fail a logical operation. Check preconditions against logical state and let reconciliation converge the physical state idempotently — genuine logical conflicts still fail loudly. This is the rule rules 1–6 instantiate; full statement and applications in [docs/dev/invariants.md](docs/dev/invariants.md). ### Deny-list (fast-pass review filter — full reasoning in [docs/dev/invariants.md](docs/dev/invariants.md)) @@ -250,7 +251,7 @@ omnigraph policy explain --actor act-alice --action change --branch main | Compaction (`compact_files`) + reindex (`optimize_indices`) | ✅ | `omnigraph optimize` orchestrates over all node/edge tables, bounded concurrency; per table runs `compact_files` **then Lance `optimize_indices`** (folds appended/rewritten fragments back into existing indexes — incremental merge, not retrain) and **publishes the resulting version to `__manifest`** (so the manifest tracks the Lance HEAD — required for reads to observe the work and for schema apply / strict writes to pass their HEAD-vs-manifest precondition), under the per-`(table, main)` write queue with `SidecarKind::Optimize` recovery coverage spanning both ops; **commits even with no compaction work if index coverage is stale**; **refuses on an unrecovered graph**; **skips uncovered HEAD > manifest drift** with `DriftNeedsRepair`; **skips blob-bearing tables** (reported via `TableOptimizeStats.skipped`, not silent; reindex is skipped for them too today), gated on `LANCE_SUPPORTS_BLOB_COMPACTION` until the upstream blob-v2 compaction-decode bug is fixed (see [docs/dev/invariants.md](docs/dev/invariants.md) Known Gaps) | | Repair uncovered drift | — | `omnigraph repair` explicitly classifies uncovered table `HEAD > manifest` drift: verified maintenance drift (`ReserveFragments`/`Rewrite`) can be published with `--confirm`; suspicious or unverifiable drift requires `--force --confirm`. Sidecar-covered crash residuals still recover automatically on open. | | Cleanup (`cleanup_old_versions`) | ✅ | `omnigraph cleanup` with `--keep` / `--older-than` policy | -| BTREE / inverted (FTS) / vector indexes | ✅ | `ensure_indices` builds them per `@index`/`@key` column, dispatched by type via `node_prop_index_kind` (enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector); idempotent; lazy across branches. Coverage of fragments appended after build is restored by `optimize`'s `optimize_indices` pass (see Compaction row). | +| BTREE / inverted (FTS) / vector indexes | ✅ | `@index`/`@key` declares intent; the physical index is derived state that never fails a logical op. Built per column through one chokepoint (`build_indices_on_dataset_for_catalog`, type-dispatched by `node_prop_index_kind`: enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector); idempotent; lazy across branches. **Schema apply builds nothing** (records intent only); `load`/`mutate` build inline but **defer an untrainable Vector column** (no trainable vectors yet) as *pending* rather than aborting. `ensure_indices`/`optimize` is the reconciler that materializes declared-but-missing indexes and restores coverage of appended/rewritten fragments (`optimize_indices`), reporting still-pending columns (see Compaction row). | | `merge_insert` upsert | ✅ | `LoadMode::Merge`, mutation `update`/`insert`/`delete` lowering | | Vector search | ✅ | `nearest()` query op; embedding pipeline (Gemini / OpenAI clients); `@embed` in schema | | Full-text search | ✅ | `search/fuzzy/match_text/bm25` query ops | diff --git a/crates/omnigraph-cli/src/main.rs b/crates/omnigraph-cli/src/main.rs index 45e24f7..a02f9aa 100644 --- a/crates/omnigraph-cli/src/main.rs +++ b/crates/omnigraph-cli/src/main.rs @@ -770,6 +770,10 @@ async fn main() -> Result<()> { "skipped": s.skipped.map(|r| r.as_str()), "manifest_version": s.manifest_version, "lance_head_version": s.lance_head_version, + "pending_indexes": s.pending_indexes.iter().map(|p| serde_json::json!({ + "column": p.column, + "reason": p.reason, + })).collect::>(), })).collect::>(), }); print_json(&value)?; @@ -786,6 +790,9 @@ async fn main() -> Result<()> { } else { println!(" {:<40} no-op", s.table_key); } + for p in &s.pending_indexes { + println!(" ↳ index pending on '{}': {}", p.column, p.reason); + } } } } diff --git a/crates/omnigraph-cli/tests/cli_schema_config.rs b/crates/omnigraph-cli/tests/cli_schema_config.rs index 8c5b1f3..b81d6ff 100644 --- a/crates/omnigraph-cli/tests/cli_schema_config.rs +++ b/crates/omnigraph-cli/tests/cli_schema_config.rs @@ -334,7 +334,13 @@ fn schema_apply_json_adds_index_for_existing_property() { let dataset = snapshot.open("node:Person").await.unwrap(); dataset.load_indices().await.unwrap().len() }); - assert!(after_index_count > before_index_count); + // iss-848: `schema apply` records the `@index` intent but defers the physical + // index build (materialized later by ensure_indices/optimize; on this empty + // table nothing builds anyway). So the physical index count is unchanged. + assert_eq!( + after_index_count, before_index_count, + "schema apply records @index intent but defers the physical build (iss-848)" + ); } #[test] diff --git a/crates/omnigraph-server/src/handlers.rs b/crates/omnigraph-server/src/handlers.rs index 26fa33f..8e310fd 100644 --- a/crates/omnigraph-server/src/handlers.rs +++ b/crates/omnigraph-server/src/handlers.rs @@ -1196,6 +1196,25 @@ pub(crate) async fn server_schema_apply( .await .map_err(ApiError::from_omni)? }; + // Prompt index convergence (iss-848): schema apply records `@index` intent + // but defers the physical build. On a long-lived server, materialize it + // promptly rather than waiting for the next `optimize` cron — spawned + // detached so it never blocks or fails the apply response. Best-effort: a + // failure is logged and the index still converges on the next optimize. + // The CLI is one-shot, so it has no equivalent; its convergence path is the + // operator's optimize cadence. + if result.applied { + let engine = Arc::clone(&handle.engine); + tokio::spawn(async move { + if let Err(err) = engine.ensure_indices().await { + tracing::warn!( + target: "omnigraph::server", + error = %err, + "post-apply ensure_indices failed; indexes will converge on the next optimize", + ); + } + }); + } Ok(Json(schema_apply_output(handle.uri.as_str(), result))) } diff --git a/crates/omnigraph-server/tests/schema_routes.rs b/crates/omnigraph-server/tests/schema_routes.rs index d250d8a..65b39a9 100644 --- a/crates/omnigraph-server/tests/schema_routes.rs +++ b/crates/omnigraph-server/tests/schema_routes.rs @@ -294,6 +294,11 @@ async fn schema_apply_route_can_add_index() { assert_eq!(status, StatusCode::OK); assert_eq!(payload["applied"], true); + // iss-848: the /schema/apply route accepts the index-add and applies it as a + // metadata change — it records the `@index` intent in the catalog/IR but does + // NOT build the physical index inline (the build is deferred to + // ensure_indices/optimize; on this empty table nothing would build anyway). + // So the physical index count is unchanged by the apply. let reopened = Omnigraph::open(graph.to_str().unwrap()).await.unwrap(); let snapshot = reopened .snapshot_of(ReadTarget::branch("main")) @@ -301,7 +306,10 @@ async fn schema_apply_route_can_add_index() { .unwrap(); let dataset = snapshot.open("node:Person").await.unwrap(); let after_index_count = dataset.load_indices().await.unwrap().len(); - assert!(after_index_count > before_index_count); + assert_eq!( + after_index_count, before_index_count, + "schema apply records @index intent but defers the physical build (iss-848)" + ); } #[tokio::test] diff --git a/crates/omnigraph/src/db/mod.rs b/crates/omnigraph/src/db/mod.rs index 000602a..f382908 100644 --- a/crates/omnigraph/src/db/mod.rs +++ b/crates/omnigraph/src/db/mod.rs @@ -11,9 +11,9 @@ pub use graph_coordinator::{GraphCoordinator, ReadTarget, ResolvedTarget, Snapsh pub use manifest::{Snapshot, SubTableEntry, SubTableUpdate}; pub(crate) use omnigraph::ensure_public_branch_ref; pub use omnigraph::{ - CleanupPolicyOptions, InitOptions, MergeOutcome, Omnigraph, OpenMode, RepairAction, - RepairClassification, RepairOptions, RepairStats, SchemaApplyOptions, SchemaApplyResult, - SkipReason, TableCleanupStats, TableOptimizeStats, TableRepairStats, + CleanupPolicyOptions, InitOptions, MergeOutcome, Omnigraph, OpenMode, PendingIndex, + RepairAction, RepairClassification, RepairOptions, RepairStats, SchemaApplyOptions, + SchemaApplyResult, SkipReason, TableCleanupStats, TableOptimizeStats, TableRepairStats, }; pub(crate) const SCHEMA_APPLY_LOCK_BRANCH: &str = "__schema_apply_lock__"; diff --git a/crates/omnigraph/src/db/omnigraph.rs b/crates/omnigraph/src/db/omnigraph.rs index 6c80117..6d2ccd7 100644 --- a/crates/omnigraph/src/db/omnigraph.rs +++ b/crates/omnigraph/src/db/omnigraph.rs @@ -40,6 +40,7 @@ pub use repair::{ RepairAction, RepairClassification, RepairOptions, RepairStats, TableRepairStats, }; pub use schema_apply::SchemaApplyOptions; +pub use table_ops::PendingIndex; use super::commit_graph::GraphCommit; use super::manifest::{ @@ -1069,11 +1070,15 @@ impl Omnigraph { /// unbranched subtables keep inheriting `main`, while subtables inherited /// from an ancestor branch are first forked into the active branch before /// their index metadata is updated. - pub async fn ensure_indices(&self) -> Result<()> { + /// Returns the declared indexes that could not be materialized on this + /// pass (today: vector columns with no trainable vectors yet). They are + /// deferred, not errors; a later `ensure_indices`/`optimize` builds them + /// once the column is trainable. Reads stay correct (brute-force) meanwhile. + pub async fn ensure_indices(&self) -> Result> { table_ops::ensure_indices(self).await } - pub async fn ensure_indices_on(&self, branch: &str) -> Result<()> { + pub async fn ensure_indices_on(&self, branch: &str) -> Result> { table_ops::ensure_indices_on(self, branch).await } @@ -1530,19 +1535,10 @@ impl Omnigraph { &self, table_key: &str, ds: &mut SnapshotHandle, - ) -> Result<()> { + ) -> Result> { table_ops::build_indices_on_dataset(self, table_key, ds).await } - pub(crate) async fn build_indices_on_dataset_for_catalog( - &self, - catalog: &Catalog, - table_key: &str, - ds: &mut SnapshotHandle, - ) -> Result<()> { - table_ops::build_indices_on_dataset_for_catalog(self, catalog, table_key, ds).await - } - // Used only by in-tree tests (`#[cfg(test)]`); the runtime path now // uses `commit_updates_on_branch_with_expected` exclusively. #[cfg(test)] @@ -2498,25 +2494,49 @@ edge WorksAt: Person -> Company } #[tokio::test] - async fn test_apply_schema_adds_index_for_existing_property() { + async fn test_apply_schema_defers_index_then_reconciler_builds_it() { + // iss-848: schema apply records the @index intent but builds nothing + // inline; a later ensure_indices materializes it once the table has + // rows. (Use `age`, which is unindexed in TEST_SCHEMA — `name @key` is + // already FTS-indexed at seed, so it can't show the deferral.) let dir = tempfile::tempdir().unwrap(); let uri = dir.path().to_str().unwrap(); let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap(); + seed_person_row(&mut db, "Alice", Some(30)).await; - let desired = TEST_SCHEMA.replace("name: String @key", "name: String @key @index"); + let desired = TEST_SCHEMA.replace("age: I32?", "age: I32? @index"); db.apply_schema(&desired).await.unwrap(); + // Apply built nothing — the BTREE on `age` is deferred. let snapshot = db.snapshot().await; let ds = db .storage() .open_snapshot_at_table(&snapshot, "node:Person") .await .unwrap(); - assert!(db.storage().has_fts_index(&ds, "name").await.unwrap()); + assert!( + !db.storage().has_btree_index(&ds, "age").await.unwrap(), + "apply must not build the index inline (deferred to the reconciler)" + ); + + // The reconciler materializes it (Person has a row). + db.ensure_indices().await.unwrap(); + let snapshot = db.snapshot().await; + let ds = db + .storage() + .open_snapshot_at_table(&snapshot, "node:Person") + .await + .unwrap(); + assert!( + db.storage().has_btree_index(&ds, "age").await.unwrap(), + "ensure_indices must build the deferred index" + ); } #[tokio::test] - async fn test_apply_schema_rewrite_preserves_existing_indices() { + async fn test_apply_schema_rewrite_defers_index_then_reconciler_restores() { + // iss-848: an AddProperty rewrite writes a new dataset version without + // rebuilding indexes inline (deferred); ensure_indices restores them. let dir = tempfile::tempdir().unwrap(); let uri = dir.path().to_str().unwrap(); let initial_schema = TEST_SCHEMA.replace("name: String @key", "name: String @key @index"); @@ -2529,6 +2549,8 @@ edge WorksAt: Person -> Company ); db.apply_schema(&desired).await.unwrap(); + // After the rewrite the reconciler restores index coverage. + db.ensure_indices().await.unwrap(); let snapshot = db.snapshot().await; let ds = db .storage() diff --git a/crates/omnigraph/src/db/omnigraph/optimize.rs b/crates/omnigraph/src/db/omnigraph/optimize.rs index 9195256..00cb872 100644 --- a/crates/omnigraph/src/db/omnigraph/optimize.rs +++ b/crates/omnigraph/src/db/omnigraph/optimize.rs @@ -140,6 +140,12 @@ pub struct TableOptimizeStats { /// Lance HEAD version observed by optimize for drift skips. `None` for /// normal compaction/no-op/blob skips. pub lance_head_version: Option, + /// Declared `@index` columns on this table the reconciler could not build + /// this run, each with the `reason` (today: a vector column with no + /// trainable vectors yet). Empty on the common path. Reported, not fatal — a + /// later `optimize` retries; the `list_indices`/`indisvalid` analog so + /// operators can see which index is pending and why. + pub pending_indexes: Vec, } impl TableOptimizeStats { @@ -153,6 +159,7 @@ impl TableOptimizeStats { skipped: None, manifest_version: None, lance_head_version: None, + pending_indexes: Vec::new(), } } @@ -166,6 +173,7 @@ impl TableOptimizeStats { skipped: Some(reason), manifest_version: None, lance_head_version: None, + pending_indexes: Vec::new(), } } @@ -183,6 +191,7 @@ impl TableOptimizeStats { skipped: Some(SkipReason::DriftNeedsRepair), manifest_version: Some(manifest_version), lance_head_version: Some(lance_head_version), + pending_indexes: Vec::new(), } } } @@ -371,14 +380,26 @@ async fn optimize_one_table( let will_compact = plan.num_tasks() > 0; // Even when there is nothing to compact, the table may still have index // work: rows appended since the index was built (e.g. via `ingest --mode - // merge`) are scanned unindexed until folded in. Either compaction or stale - // index coverage is enough to enter the publish path. If NEITHER, this - // table is a no-op and must NOT be pinned in a sidecar — a zero-commit pin - // classifies NoMovement on recovery and forces an all-or-nothing rollback - // of sibling tables' legitimate work. Uncovered pre-existing manifest/HEAD - // drift is skipped above and must go through explicit repair. + // merge`) are scanned unindexed until folded in (needs_reindex), OR a + // declared `@index` was never built — schema apply records the intent but + // defers the physical build (iss-848), so optimize is the operator-facing + // reconciler that materializes it (needs_index_create). Any of the three is + // enough to enter the publish path. If NONE, this table is a no-op and must + // NOT be pinned in a sidecar — a zero-commit pin classifies NoMovement on + // recovery and forces an all-or-nothing rollback of sibling tables' + // legitimate work. Uncovered pre-existing manifest/HEAD drift is skipped + // above and goes through explicit repair, so this only runs on a healthy + // table under the per-table queue + sidecar. let needs_reindex = TableStore::has_unindexed_fragments(&ds).await?; - if !will_compact && !needs_reindex { + // needs_index_work_* checks "a declared index is missing AND row_count > 0", + // so empty tables stay no-ops (never pinned). It re-reads the head under the + // queue we already hold, so it is consistent with `ds`. + let needs_index_create = if let Some(type_name) = table_key.strip_prefix("node:") { + super::table_ops::needs_index_work_node(db, type_name, &table_key, &full_path, None).await? + } else { + super::table_ops::needs_index_work_edge(db, &table_key, &full_path, None).await? + }; + if !will_compact && !needs_reindex && !needs_index_create { return Ok(TableOptimizeStats::compacted( table_key, &CompactionMetrics::default(), @@ -427,7 +448,30 @@ async fn optimize_one_table( ds.optimize_indices(&OptimizeOptions::default()) .await .map_err(|e| OmniError::Lance(format!("optimize_indices on {}: {}", table_key, e)))?; - let version_after = ds.version().version; + + // Materialize any declared-but-missing index over the just-compacted layout, + // reusing the build chokepoint (idempotent: skips existing indexes; fault- + // isolates an untrainable vector column into `pending` rather than failing). + // Run it UNCONDITIONALLY now that we are past the no-op gate — not only when + // `needs_index_create`. A table can enter this path for compaction or + // reindex while its sole missing index is an untrainable Vector column + // (which `needs_index_work_*` does not count as buildable work); calling the + // build here is what surfaces that column in `pending_indexes`, so optimize + // can't compact a table yet silently drop the deferred-index signal. + // Idempotent + cheap when there is nothing to build. Vector index creation + // is an inline-commit residual; the Optimize sidecar's loose post_commit_pin + // covers the extra commits. + let catalog = db.catalog(); + let mut snapshot = crate::storage_layer::SnapshotHandle::new(ds); + let pending_indexes: Vec = + super::table_ops::build_indices_on_dataset_for_catalog( + db, + &catalog, + &table_key, + &mut snapshot, + ) + .await?; + let version_after = snapshot.dataset().version().version; let committed = version_after != version_before; // Pin the per-writer Phase B → Phase C residual for optimize: Lance HEAD has @@ -438,9 +482,6 @@ async fn optimize_one_table( // expected = the version observed under the queue). On failure the sidecar // is intentionally left for the open-time recovery sweep to roll forward. if committed { - // Re-wrap the post-compaction dataset to read its state through the - // trait surface (`table_state` is a read; no HEAD advance). - let snapshot = crate::storage_layer::SnapshotHandle::new(ds); let state = db.storage().table_state(&full_path, &snapshot).await?; let update = crate::db::SubTableUpdate { table_key: table_key.clone(), @@ -467,7 +508,9 @@ async fn optimize_one_table( ); } - Ok(TableOptimizeStats::compacted(table_key, &metrics, committed)) + let mut stat = TableOptimizeStats::compacted(table_key, &metrics, committed); + stat.pending_indexes = pending_indexes; + Ok(stat) } /// Run Lance `cleanup_old_versions` on every node + edge table on `main`, diff --git a/crates/omnigraph/src/db/omnigraph/schema_apply.rs b/crates/omnigraph/src/db/omnigraph/schema_apply.rs index f965ad4..c054004 100644 --- a/crates/omnigraph/src/db/omnigraph/schema_apply.rs +++ b/crates/omnigraph/src/db/omnigraph/schema_apply.rs @@ -193,7 +193,6 @@ where let mut added_tables = BTreeSet::new(); let mut renamed_tables = HashMap::new(); let mut rewritten_tables = BTreeSet::new(); - let mut indexed_tables = BTreeSet::new(); let mut dropped_tables = BTreeSet::new(); // Hard-drop cleanup targets: (table_key, full_dataset_uri). // Populated for DropProperty { Hard } and DropType { Hard }; the @@ -252,14 +251,14 @@ where .or_default() .insert(to.clone(), from.clone()); } - SchemaMigrationStep::AddConstraint { - type_kind, - type_name, - .. - } => { - indexed_tables.insert(schema_table_key(*type_kind, type_name)); - } - SchemaMigrationStep::UpdateTypeMetadata { .. } + // AddConstraint is only ever an `@index` addition (every other + // added constraint plans as UnsupportedChange). It records intent + // in the desired catalog/IR; the physical index is built off the + // critical path by ensure_indices/optimize (iss-848), so the apply + // does no table work for it — a pure metadata change like the two + // metadata steps below. + SchemaMigrationStep::AddConstraint { .. } + | SchemaMigrationStep::UpdateTypeMetadata { .. } | SchemaMigrationStep::UpdatePropertyMetadata { .. } => {} SchemaMigrationStep::DropProperty { type_kind, @@ -347,18 +346,15 @@ where let mut table_updates = HashMap::::new(); let mut table_tombstones = HashMap::::new(); - // Recovery sidecar: protect the per-table commit_staged loop in - // rewritten_tables + indexed_tables. The post_commit_pin we record - // here is a lower bound (expected + 1); the classifier loose-matches - // for SidecarKind::SchemaApply because the actual N depends on how - // many indices need building. See classify_table's loose-match arm. + // Recovery sidecar: protect the per-table `stage_overwrite` + + // `commit_staged` in rewritten_tables — the only tables that advance Lance + // HEAD inline now that index building is deferred to the reconciler + // (iss-848). Each rewritten table is exactly one commit, so + // `post_commit_pin = expected + 1` is now exact (it was a loose lower bound + // when index builds added extra commits); the classifier's loose-match for + // SidecarKind::SchemaApply still accepts it. let recovery_pins: Vec = rewritten_tables .iter() - .chain(indexed_tables.iter().filter(|t| { - !rewritten_tables.contains(*t) - && !added_tables.contains(*t) - && !renamed_tables.contains_key(*t) - })) .filter_map(|table_key| { let entry = snapshot.entry(table_key)?; Some(crate::db::manifest::SidecarTablePin { @@ -490,10 +486,11 @@ where let table_path = table_path_for_table_key(table_key)?; let dataset_uri = db.storage().dataset_uri(&table_path); let schema = schema_for_table_key(&desired_catalog, table_key)?; - let mut ds = + let ds = SnapshotHandle::new(TableStore::create_empty_dataset(&dataset_uri, &schema).await?); - db.build_indices_on_dataset_for_catalog(&desired_catalog, table_key, &mut ds) - .await?; + // Indexes for the new table are materialized off the critical path by + // ensure_indices/optimize (iss-848); a 0-row table is never trainable + // anyway. The @index intent is recorded in the persisted catalog/IR. let state = db.storage().table_state(&dataset_uri, &ds).await?; table_registrations.insert(table_key.clone(), table_path); table_updates.insert( @@ -533,10 +530,9 @@ where .await?; let table_path = table_path_for_table_key(target_table_key)?; let dataset_uri = db.storage().dataset_uri(&table_path); - let mut target_ds = + let target_ds = SnapshotHandle::new(TableStore::write_dataset(&dataset_uri, batch).await?); - db.build_indices_on_dataset_for_catalog(&desired_catalog, target_table_key, &mut target_ds) - .await?; + // Indexes on the renamed table are reconciled later (iss-848). let state = db.storage().table_state(&dataset_uri, &target_ds).await?; table_registrations.insert(target_table_key.clone(), table_path); table_updates.insert( @@ -593,9 +589,10 @@ where .open_dataset_head_for_write(table_key, &dataset_uri, entry.table_branch.as_deref()) .await?; let staged = db.storage().stage_overwrite(&existing, batch).await?; - let mut target_ds = db.storage().commit_staged(existing, staged).await?; - db.build_indices_on_dataset_for_catalog(&desired_catalog, table_key, &mut target_ds) - .await?; + let target_ds = db.storage().commit_staged(existing, staged).await?; + // The rewrite drops the table's existing index coverage; it is + // restored off the critical path by optimize's optimize_indices / + // ensure_indices (iss-848). Reads scan uncovered fragments meanwhile. let state = db.storage().table_state(&dataset_uri, &target_ds).await?; table_updates.insert( table_key.clone(), @@ -609,41 +606,12 @@ where ); } - for table_key in &indexed_tables { - if added_tables.contains(table_key) - || renamed_tables.contains_key(table_key) - || rewritten_tables.contains(table_key) - { - continue; - } - let entry = snapshot.entry(table_key).ok_or_else(|| { - OmniError::manifest(format!( - "missing table '{}' for schema index apply", - table_key - )) - })?; - ensure_snapshot_entry_head_matches(db, entry).await?; - let dataset_uri = db.storage().dataset_uri(&entry.table_path); - let mut ds = db - .storage() - .open_dataset_head_for_write(table_key, &dataset_uri, entry.table_branch.as_deref()) - .await?; - db.storage() - .ensure_expected_version(&ds, table_key, entry.table_version)?; - db.build_indices_on_dataset_for_catalog(&desired_catalog, table_key, &mut ds) - .await?; - let state = db.storage().table_state(&dataset_uri, &ds).await?; - table_updates.insert( - table_key.clone(), - crate::db::SubTableUpdate { - table_key: table_key.clone(), - table_version: state.version, - table_branch: None, - row_count: state.row_count, - version_metadata: state.version_metadata, - }, - ); - } + // Index-only changes (AddConstraint, i.e. adding an `@index`) are pure + // metadata: the new `@index` intent is recorded in the desired catalog/IR + // persisted below, and the physical index is materialized off the critical + // path by `ensure_indices`/`optimize` (iss-848). Schema apply touches no + // table data for them, so there is no per-table loop here and no recovery + // pin (no Lance HEAD advances). Reads stay correct meanwhile via a scan. let mut manifest_changes = Vec::new(); for (table_key, table_path) in table_registrations { diff --git a/crates/omnigraph/src/db/omnigraph/table_ops.rs b/crates/omnigraph/src/db/omnigraph/table_ops.rs index 3f40c1d..fab0e0c 100644 --- a/crates/omnigraph/src/db/omnigraph/table_ops.rs +++ b/crates/omnigraph/src/db/omnigraph/table_ops.rs @@ -21,7 +21,7 @@ pub(super) async fn graph_index_for_resolved( db.runtime_cache.graph_index(resolved, &catalog).await } -pub(super) async fn ensure_indices(db: &Omnigraph) -> Result<()> { +pub(super) async fn ensure_indices(db: &Omnigraph) -> Result> { let current_branch = db .coordinator .read() @@ -31,7 +31,7 @@ pub(super) async fn ensure_indices(db: &Omnigraph) -> Result<()> { ensure_indices_for_branch(db, current_branch.as_deref()).await } -pub(super) async fn ensure_indices_on(db: &Omnigraph, branch: &str) -> Result<()> { +pub(super) async fn ensure_indices_on(db: &Omnigraph, branch: &str) -> Result> { let branch = normalize_branch_name(branch)?; ensure_indices_for_branch(db, branch.as_deref()).await } @@ -73,12 +73,16 @@ pub(super) async fn failpoint_publish_table_head_without_index_rebuild_for_test( .await } -pub(super) async fn ensure_indices_for_branch(db: &Omnigraph, branch: Option<&str>) -> Result<()> { +pub(super) async fn ensure_indices_for_branch( + db: &Omnigraph, + branch: Option<&str>, +) -> Result> { db.ensure_schema_state_valid().await?; db.ensure_schema_apply_idle("ensure_indices").await?; let resolved = db.resolved_branch_target(branch).await?; let snapshot = resolved.snapshot; let mut updates = Vec::new(); + let mut pending = Vec::new(); let active_branch = resolved.branch; let catalog = db.catalog(); @@ -217,7 +221,7 @@ pub(super) async fn ensure_indices_for_branch(db: &Omnigraph, branch: Option<&st }; let row_count = db.storage().count_rows(&ds, None).await.unwrap_or(0); if row_count > 0 { - build_indices_on_dataset(db, &table_key, &mut ds).await?; + pending.extend(build_indices_on_dataset(db, &table_key, &mut ds).await?); } let state = db.storage().table_state(&full_path, &ds).await?; @@ -265,7 +269,7 @@ pub(super) async fn ensure_indices_for_branch(db: &Omnigraph, branch: Option<&st }; let row_count = db.storage().count_rows(&ds, None).await.unwrap_or(0); if row_count > 0 { - build_indices_on_dataset(db, &table_key, &mut ds).await?; + pending.extend(build_indices_on_dataset(db, &table_key, &mut ds).await?); } let state = db.storage().table_state(&full_path, &ds).await?; @@ -307,7 +311,7 @@ pub(super) async fn ensure_indices_for_branch(db: &Omnigraph, branch: Option<&st } } - Ok(()) + Ok(pending) } /// The single scalar/vector index a node property receives from a one-column @@ -352,6 +356,26 @@ fn node_prop_index_kind(prop_type: &PropType) -> Option { } } +/// Whether a vector column currently has at least one non-null vector — the +/// minimum for Lance IVF k-means to train (the `ivf_flat(1)` index we build +/// needs >=1 vector). Used identically by `needs_index_work_node` (so an +/// untrainable column is not pinned for recovery — avoiding a zero-commit pin +/// that would roll back a sibling's index work) and by the vector build arm (so +/// `create_vector_index` is only attempted when it can succeed, keeping its +/// genuine errors fatal instead of swallowed as pending). If index params +/// become size-aware (dev-graph iss-687), this threshold moves with them. +async fn vector_column_trainable( + db: &Omnigraph, + ds: &SnapshotHandle, + column: &str, +) -> Result { + Ok(db + .storage() + .count_rows(ds, Some(format!("{column} IS NOT NULL"))) + .await? + > 0) +} + /// Returns true if the node table is missing at least one declared /// scalar/vector index that `build_indices_on_dataset_for_catalog` would /// build AND has at least one row (the ensure_indices loop has @@ -366,7 +390,7 @@ fn node_prop_index_kind(prop_type: &PropType) -> Option { /// (DateTime/Date/numeric/Bool), FTS for free-text Strings, or a Vector index. /// Edges get BTree only (id, src, dst). This helper and the builder share /// `node_prop_index_kind` so they cannot drift — see its doc comment. -async fn needs_index_work_node( +pub(super) async fn needs_index_work_node( db: &Omnigraph, type_name: &str, table_key: &str, @@ -409,7 +433,14 @@ async fn needs_index_work_node( } } Some(NodePropIndexKind::Vector) => { - if !db.storage().has_vector_index(&ds, prop_name).await? { + // Only count a missing vector index as buildable *work* when the + // column is trainable (>=1 non-null vector). An untrainable + // column would defer in the build and commit nothing; pinning it + // for recovery would be a zero-commit pin that classifies + // NoMovement and rolls back a sibling table's index work. + if !db.storage().has_vector_index(&ds, prop_name).await? + && vector_column_trainable(db, &ds, prop_name).await? + { return Ok(true); } } @@ -434,7 +465,7 @@ async fn needs_index_work_node( /// /// Empty edge tables are skipped by the ensure_indices loop the same /// way node tables are; see `needs_index_work_node`. -async fn needs_index_work_edge( +pub(super) async fn needs_index_work_edge( db: &Omnigraph, table_key: &str, full_path: &str, @@ -632,11 +663,25 @@ pub(super) async fn open_dataset_at_state( .await } +/// A declared index the builder could not materialize on this pass. Today the +/// only such case is a vector (IVF) column with no trainable vectors yet +/// (KMeans needs >=1 vector), e.g. the load-before-embed window. Reported, not +/// fatal: a later `ensure_indices`/`optimize` retries once the column is +/// buildable, and reads stay correct via brute-force meanwhile. Surfacing +/// pending index *status* rather than failing the operation is the database +/// norm (Postgres `indisvalid`, LanceDB `list_indices`). +#[derive(Debug, Clone)] +pub struct PendingIndex { + pub table_key: String, + pub column: String, + pub reason: String, +} + pub(super) async fn build_indices_on_dataset( db: &Omnigraph, table_key: &str, ds: &mut SnapshotHandle, -) -> Result<()> { +) -> Result> { let catalog = db.catalog(); build_indices_on_dataset_for_catalog(db, &catalog, table_key, ds).await } @@ -646,8 +691,9 @@ pub(super) async fn build_indices_on_dataset_for_catalog( catalog: &Catalog, table_key: &str, ds: &mut SnapshotHandle, -) -> Result<()> { +) -> Result> { if let Some(type_name) = table_key.strip_prefix("node:") { + let mut pending = Vec::new(); if !db.storage().has_btree_index(ds, "id").await? { stage_and_commit_btree(db, table_key, ds, &["id"]).await?; } @@ -676,22 +722,52 @@ pub(super) async fn build_indices_on_dataset_for_catalog( } Some(NodePropIndexKind::Vector) => { if !db.storage().has_vector_index(ds, prop_name).await? { - // Inline-commit residual: lance-6.0.1 does not - // expose `build_index_metadata_from_segments` as - // `pub`, so vector indices cannot be staged from - // outside the lance crate. Document at the call - // site; companion ticket to lance-format/lance#6658. - let new_snap = db - .storage_inline_residual() - .create_vector_index(ds.clone(), prop_name.as_str()) - .await - .map_err(|e| { - OmniError::Lance(format!( - "create Vector index on {}({}): {}", - table_key, prop_name, e - )) - })?; - *ds = new_snap; + // A vector (IVF) index trains k-means over the column, + // so it needs >=1 non-null vector (KMeans errors + // "cannot train N centroids with 0 vectors"). Precheck + // trainability: a column with no vectors yet (e.g. rows + // loaded before `embed`) is recorded as a *pending* + // index and skipped — deferred, not failed. The SAME + // predicate gates `needs_index_work_node`, so an + // untrainable column is never pinned for recovery (no + // zero-commit pin that would roll back a sibling + // table's index work). This function is the chokepoint + // every write path funnels through (load/mutate, schema + // apply, ensure_indices, optimize, merge), realizing + // the governing principle — physical index state never + // fails a logical operation. Only when trainable do we + // attempt the build, and then we PROPAGATE any error: a + // genuine I/O/manifest/Lance failure must stay fatal, + // not be hidden as pending. (Vector creation is an + // inline-commit residual until lance#6666; iss-951.) + if vector_column_trainable(db, ds, prop_name).await? { + let new_snap = db + .storage_inline_residual() + .create_vector_index(ds.clone(), prop_name.as_str()) + .await + .map_err(|e| { + OmniError::Lance(format!( + "create Vector index on {}({}): {}", + table_key, prop_name, e + )) + })?; + *ds = new_snap; + } else { + tracing::info!( + target: "omnigraph::index", + table = %table_key, + column = %prop_name, + "deferring Vector index: column has no \ + trainable vectors yet", + ); + pending.push(PendingIndex { + table_key: table_key.to_string(), + column: prop_name.clone(), + reason: "column has no non-null vectors to \ + train on yet" + .to_string(), + }); + } } } // Enum + orderable scalars (DateTime/Date/numeric/Bool) @@ -709,7 +785,7 @@ pub(super) async fn build_indices_on_dataset_for_catalog( } } } - return Ok(()); + return Ok(pending); } if table_key.starts_with("edge:") { @@ -722,7 +798,9 @@ pub(super) async fn build_indices_on_dataset_for_catalog( if !db.storage().has_btree_index(ds, "dst").await? { stage_and_commit_btree(db, table_key, ds, &["dst"]).await?; } - return Ok(()); + // Edge tables only get BTree (id/src/dst), which build at any + // cardinality; no pending state is possible here. + return Ok(Vec::new()); } Err(OmniError::manifest(format!( @@ -844,7 +922,11 @@ async fn prepare_updates_for_commit( crate::db::MutationOpKind::SchemaRewrite, ) .await?; - build_indices_on_dataset(db, &prepared_update.table_key, &mut ds).await?; + // Any column not yet buildable (e.g. a vector column whose rows + // have null embeddings) is deferred and logged inside + // build_indices; a later ensure_indices/optimize materializes it. + // The load/mutate/merge commit must not fail on it. + let _pending = build_indices_on_dataset(db, &prepared_update.table_key, &mut ds).await?; let state = db.storage().table_state(&full_path, &ds).await?; prepared_update.table_version = state.version; prepared_update.row_count = state.row_count; diff --git a/crates/omnigraph/tests/failpoints.rs b/crates/omnigraph/tests/failpoints.rs index b45cfa0..2a0e9aa 100644 --- a/crates/omnigraph/tests/failpoints.rs +++ b/crates/omnigraph/tests/failpoints.rs @@ -2619,69 +2619,66 @@ async fn finalize_publisher_residual_does_not_drift_untouched_tables() { } /// Acceptance test: a stage-step failure in the staged-index path -/// (`stage_create_btree_index` succeeded; `commit_staged` not yet -/// called) leaves NO Lance-HEAD drift on the existing tables. -/// Subsequent operations against those tables succeed without -/// `ExpectedVersionMismatch`. +/// (`stage_create_btree_index` succeeded; `commit_staged` not yet called) +/// leaves NO Lance-HEAD drift, so other tables stay writable. /// -/// Path: `apply_schema(v1 → v2)` adds a new node type. The -/// `added_tables` loop in `schema_apply` creates the empty dataset and -/// then calls `build_indices_on_dataset_for_catalog` → -/// `stage_and_commit_btree(..., &["id"])`. The failpoint fires -/// between `stage_create_btree_index` and `commit_staged`, so the -/// staged segments are written under `_indices//` but Lance HEAD -/// on the new dataset is unchanged at v=1. The schema-apply lock -/// branch is released by `apply_schema`'s outer match. Existing -/// tables (e.g. `node:Person`) are completely untouched by the new -/// node's added_tables iteration — they're outside the failed apply -/// path entirely — and we assert that mutations against them continue -/// to work. -/// -/// The orphan empty dataset from the failed apply is acceptable -/// residual: it's unreferenced by `__manifest` and will be reclaimed -/// by `cleanup_old_versions` (or removed when a future apply at the -/// same target path resolves the rename). +/// Under iss-848 schema apply no longer builds indexes inline — the build +/// happens in the reconciler (`ensure_indices`/`optimize`) and at load. So this +/// fires the failpoint where it lives now: an `ensure_indices` build of a BTREE +/// that a prior apply declared (`@index`) but deferred. The failpoint fires +/// between `stage_create_btree_index` and `commit_staged`, so the staged +/// segment is written under `_indices//` but `node:Person`'s Lance HEAD is +/// unchanged. `ensure_indices` fails and its EnsureIndices sidecar pins only +/// Person at NoMovement (a clean no-op on the next open). A write to a +/// different, unpinned table (`node:Company`) is unaffected: mutations/loads run +/// a roll-forward-only heal and proceed — they do not refuse on a pending +/// sidecar the way `optimize`/`repair` do — so the write succeeds with no drift. #[tokio::test] async fn ensure_indices_stage_btree_failure_leaves_existing_tables_writable() { let _scenario = FailScenario::setup(); let dir = tempfile::tempdir().unwrap(); let uri = dir.path().to_str().unwrap().to_string(); - - // Init with TEST_SCHEMA which declares Person + Knows. Indices on - // those tables get built during init. let mut db = Omnigraph::init(&uri, helpers::TEST_SCHEMA).await.unwrap(); - // Apply a schema that adds a new node type. The added_tables loop - // will hit the failpoint between stage and commit on the new - // node:Project table's btree-on-id build. (TEST_SCHEMA already - // has Person + Company + Knows + WorksAt — pick a name that isn't - // already declared.) - let extended_schema = format!( - "{}\nnode Project {{ name: String @key }}\n", - helpers::TEST_SCHEMA - ); - - { - let _failpoint = - ScopedFailPoint::new("ensure_indices.post_stage_pre_commit_btree", "return"); - let err = db.apply_schema(&extended_schema).await.unwrap_err(); - assert!( - err.to_string() - .contains("ensure_indices.post_stage_pre_commit_btree"), - "schema apply should fail with the synthetic failpoint error, got: {err}" - ); - } - - // Existing tables stayed at their pre-apply versions; subsequent - // mutations against them succeed (no Lance-HEAD drift). + // Seed a Person row — the load builds Person's id BTREE + name FTS. mutate_main( &mut db, helpers::MUTATION_QUERIES, "insert_person", - &mixed_params(&[("$name", "Eve")], &[("$age", 22)]), + &mixed_params(&[("$name", "Alice")], &[("$age", 30)]), ) .await - .expect("Person mutation must succeed after the failed schema apply — existing tables are not drifted"); + .expect("seed Person"); + + // Add `@index` on `age`: schema apply records the intent but defers the + // physical build (iss-848), so the BTREE on `age` is unbuilt. + let indexed_schema = helpers::TEST_SCHEMA.replace("age: I32?", "age: I32? @index"); + db.apply_schema(&indexed_schema) + .await + .expect("adding an @index is metadata-only and succeeds"); + + { + // ensure_indices builds the deferred `age` BTREE on Person; the failpoint + // fires between stage and commit, so Person's Lance HEAD does not move. + let _failpoint = + ScopedFailPoint::new("ensure_indices.post_stage_pre_commit_btree", "return"); + let err = db.ensure_indices().await.unwrap_err(); + assert!( + err.to_string() + .contains("ensure_indices.post_stage_pre_commit_btree"), + "ensure_indices should fail with the synthetic failpoint error, got: {err}" + ); + } + + // A different, unpinned table is untouched by the failed index build. + use omnigraph::loader::{LoadMode, load_jsonl}; + load_jsonl( + &mut db, + r#"{"type": "Company", "data": {"name": "Acme"}}"#, + LoadMode::Append, + ) + .await + .expect("Company write on a table untouched by the failed ensure_indices should succeed"); } fn assert_no_staging_files(graph: &std::path::Path) { diff --git a/crates/omnigraph/tests/maintenance.rs b/crates/omnigraph/tests/maintenance.rs index deb4d2d..02ee8a7 100644 --- a/crates/omnigraph/tests/maintenance.rs +++ b/crates/omnigraph/tests/maintenance.rs @@ -843,3 +843,152 @@ async fn cleanup_reconciles_orphaned_branch_forks() { .await .unwrap(); } + +// Regression (iss-848): a table with rows but NULL vectors (the load-before- +// embed window) must not abort index building. The vector (IVF) index cannot +// train on 0 vectors, so `create_vector_index` errors with "KMeans cannot +// train 1 centroids with 0 vectors". `build_indices_on_dataset_for_catalog` +// is the chokepoint every caller funnels through (load/mutate via +// prepare_updates_for_commit, ensure_indices, optimize, schema apply, merge), +// so per-index fault isolation there must defer that one column (pending) and +// still build the sibling scalar indexes, instead of propagating the error. +// This exercises both the load path (which builds indices inline) and the +// ensure_indices reconciler. Pre-fix this fails at the load step. +#[tokio::test] +async fn index_build_tolerates_null_vector_rows() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + let schema = "node Doc {\n \ + slug: String @key\n \ + n: I64 @index\n \ + embedding: Vector(8)? @index\n\ + }\n"; + let mut db = Omnigraph::init(uri, schema).await.unwrap(); + // Rows present, embeddings null (loaded but not yet embedded). + load_jsonl( + &mut db, + "{\"type\":\"Doc\",\"data\":{\"slug\":\"d1\",\"n\":1}}\n\ + {\"type\":\"Doc\",\"data\":{\"slug\":\"d2\",\"n\":2}}", + LoadMode::Merge, + ) + .await + .expect("load rows with null embeddings"); + + // Must not abort: the untrainable vector column is deferred, the sibling + // BTREE on `n` still builds. + db.ensure_indices().await.expect( + "ensure_indices must not abort when a vector column has no trainable vectors yet", + ); +} + +// iss-848: `optimize` converges declared-but-unbuilt indexes. After an @index is +// added post-data (a metadata-only apply that defers the physical build), the +// column is unindexed and reads scan. `optimize` — the operator's reconciler, +// run on a cron — must materialize it, by composing the ensure_indices +// reconciler after the compaction sweep. Pre-iss-848 optimize only maintained +// coverage of EXISTING indexes (optimize_indices) and never created missing ones. +#[tokio::test] +async fn optimize_materializes_index_declared_but_unbuilt() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + let v1 = "node Doc {\n slug: String @key\n rank: I32\n}\n"; + let mut db = Omnigraph::init(uri, v1).await.unwrap(); + load_jsonl( + &mut db, + "{\"type\":\"Doc\",\"data\":{\"slug\":\"d1\",\"rank\":1}}\n\ + {\"type\":\"Doc\",\"data\":{\"slug\":\"d2\",\"rank\":2}}", + LoadMode::Merge, + ) + .await + .unwrap(); + + // Add @index on `rank` after data exists: a metadata-only apply that defers + // the physical build (iss-848), so the column is declared-indexed but unbuilt. + let v2 = "node Doc {\n slug: String @key\n rank: I32 @index\n}\n"; + db.apply_schema(v2).await.expect("index-only apply"); + + // Precondition: `rank` is declared @index but unbuilt -> reads degrade. + { + let snap = snapshot_main(&db).await.unwrap(); + let ds = snap.open("node:Doc").await.unwrap(); + assert!( + matches!( + TableStore::key_column_index_coverage(&ds, "rank") + .await + .unwrap(), + IndexCoverage::Degraded { .. } + ), + "rank must be unindexed after the deferred apply" + ); + } + + db.optimize().await.unwrap(); + + // Postcondition: optimize's reconciler materialized the declared index. + let snap = snapshot_main(&db).await.unwrap(); + let ds = snap.open("node:Doc").await.unwrap(); + assert_eq!( + TableStore::key_column_index_coverage(&ds, "rank") + .await + .unwrap(), + IndexCoverage::Indexed, + "optimize must build the declared-but-unbuilt rank index" + ); +} + +// iss-848 (PR review): the rename path also defers index building. A RenameType +// migration writes the renamed table as a new dataset with the existing rows +// but no indexes (its inline build was removed). optimize must then materialize +// the declared index on the renamed table. +#[tokio::test] +async fn optimize_materializes_index_after_type_rename() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + let v1 = "node Doc {\n slug: String @key\n rank: I32 @index\n}\n"; + let mut db = Omnigraph::init(uri, v1).await.unwrap(); + load_jsonl( + &mut db, + "{\"type\":\"Doc\",\"data\":{\"slug\":\"d1\",\"rank\":1}}\n\ + {\"type\":\"Doc\",\"data\":{\"slug\":\"d2\",\"rank\":2}}", + LoadMode::Merge, + ) + .await + .unwrap(); + + // Rename Doc -> Item; rows are preserved on the new table key. + let v2 = "node Item @rename_from(\"Doc\") {\n slug: String @key\n rank: I32 @index\n}\n"; + let result = db.apply_schema(v2).await.expect("rename apply"); + assert!(result.applied); + assert_eq!( + count_rows(&db, "node:Item").await, + 2, + "rename must preserve rows" + ); + + // Post-rename the renamed table's declared rank index is unbuilt (deferred). + { + let snap = snapshot_main(&db).await.unwrap(); + let ds = snap.open("node:Item").await.unwrap(); + assert!( + matches!( + TableStore::key_column_index_coverage(&ds, "rank") + .await + .unwrap(), + IndexCoverage::Degraded { .. } + ), + "rank must be unindexed immediately after the rename" + ); + } + + db.optimize().await.unwrap(); + + let snap = snapshot_main(&db).await.unwrap(); + let ds = snap.open("node:Item").await.unwrap(); + assert_eq!( + TableStore::key_column_index_coverage(&ds, "rank") + .await + .unwrap(), + IndexCoverage::Indexed, + "optimize must build the renamed table's deferred rank index" + ); +} diff --git a/crates/omnigraph/tests/schema_apply.rs b/crates/omnigraph/tests/schema_apply.rs index cc0cae2..508451a 100644 --- a/crates/omnigraph/tests/schema_apply.rs +++ b/crates/omnigraph/tests/schema_apply.rs @@ -736,3 +736,108 @@ edge Knows: Person -> Person { // current contract, the data is *unreachable* via omnigraph // (no manifest entry), which is the user-facing guarantee. } + +// Regression (bug 3 / dev-graph iss-848): a `Vector @index` on a 0-row table +// must not abort an otherwise-valid schema apply. A vector (IVF) index trains +// k-means centroids over the column's vectors, so Lance cannot build it on 0 +// vectors — it errors with "Creating empty vector indices with train=False is +// not yet implemented". When a *later* migration touches that table (here, an +// unrelated scalar `@index` on `body`), schema apply reconciles the table's +// whole index set, which previously tried to materialize the dormant vector +// index and aborted the entire migration (all-or-nothing). The build is now +// deferred (pending) when the column is untrainable, instead of failing the +// migration. The dormant index is materialized by a later `ensure_indices` / +// `optimize` once the table has rows. Full decoupling — intent recorded at +// apply, an async reconciler converges physical coverage — is iss-848. +#[tokio::test] +async fn apply_schema_defers_vector_index_on_empty_table() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + + // init does not build indices, so the declared-but-unbuilt vector index + // sits harmless on the empty table (this is how it survived earlier + // applies that never touched the table). + // `slug` is the user @key; omnigraph injects its own internal `id` column, + // so the key field must not be named `id`. + let v1 = "node Doc {\n \ + slug: String @key\n \ + body: String?\n \ + embedding: Vector(8) @index\n\ + }\n"; + let mut db = Omnigraph::init(uri, v1).await.unwrap(); + + // Add an *unrelated* scalar @index on `body`. This routes Doc through + // schema apply's index reconcile, which must NOT abort on the untrainable + // empty vector index. + let v2 = "node Doc {\n \ + slug: String @key\n \ + body: String? @index\n \ + embedding: Vector(8) @index\n\ + }\n"; + let result = db.apply_schema(v2).await.expect( + "schema apply must succeed: an empty-table vector @index is deferred, not fatal", + ); + assert!(result.applied, "the scalar @index change must apply"); + + // The deferred vector index is not dropped — once the table has a + // trainable vector, `ensure_indices` materializes it without error. (If + // the guard wrongly skipped a non-empty column, this would still be + // unindexed; if it wrongly tried to build on empty, the apply above would + // have failed.) + load_jsonl( + &mut db, + r#"{"type":"Doc","data":{"slug":"d1","body":"hello","embedding":[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8]}}"#, + LoadMode::Merge, + ) + .await + .expect("loading a Doc with an embedding must succeed"); + db.ensure_indices() + .await + .expect("the deferred vector index must build once the table has a trainable vector"); +} + +// iss-848: adding an `@index` to an existing column is a pure metadata change. +// Schema apply records the intent (the catalog/IR now declares the index) but +// must NOT build the index inline, so the table's data and manifest version are +// untouched. The physical index is materialized later by ensure_indices / +// optimize. Pre-iss-848 the indexed_tables block built the index inline and +// bumped the table version. +#[tokio::test] +async fn index_only_constraint_apply_touches_no_table_data() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + let v1 = "node Doc {\n slug: String @key\n n: I64\n}\n"; + let mut db = Omnigraph::init(uri, v1).await.unwrap(); + load_jsonl( + &mut db, + r#"{"type":"Doc","data":{"slug":"d1","n":1}}"#, + LoadMode::Merge, + ) + .await + .expect("load a Doc"); + + let before = db + .snapshot_of(ReadTarget::branch("main")) + .await + .unwrap() + .entry("node:Doc") + .unwrap() + .table_version; + + // Add an @index on the existing `n` column. + let v2 = "node Doc {\n slug: String @key\n n: I64 @index\n}\n"; + let result = db.apply_schema(v2).await.expect("index-only apply must succeed"); + assert!(result.applied, "the @index addition must apply"); + + let after = db + .snapshot_of(ReadTarget::branch("main")) + .await + .unwrap() + .entry("node:Doc") + .unwrap() + .table_version; + assert_eq!( + before, after, + "adding an @index must not bump the table version (no inline index build)" + ); +} diff --git a/docs/dev/invariants.md b/docs/dev/invariants.md index 878adfe..dd802b1 100644 --- a/docs/dev/invariants.md +++ b/docs/dev/invariants.md @@ -15,6 +15,38 @@ Use it this way: - Keep implementation ledgers, roadmap detail, and historical MR notes in the per-area docs. This file is the filter, not the encyclopedia. +## Governing principle: logical contract over physical state + +The hard invariants below are instances of one rule. Keep it in view whenever +a change touches the boundary between what the graph *means* and how it is +physically stored. + +> **Logical state is the contract. Physical state — index coverage, fragment +> layout, compaction versions, staged writes — is derived, rebuildable, and may +> be produced asynchronously. A physical operation must never fail a logical +> one. Preconditions are checked against logical state; physical reconciliation +> is idempotent and may lag or retry. Genuine logical conflicts still fail +> loudly: the licence to lag covers physical convergence, not correctness.** + +Invariants that instantiate it: **2** (manifest-atomic visibility) and **5** +(recovery is part of the commit protocol) — a partially-written physical layer +never changes what a graph commit means; **7** (indexes are derived state) — a +query is correct under partial index coverage, and expensive index work +converges from manifest state instead of gating the write path; **13** (failures +bounded and observable) — the licence to lag is not a licence to drop, so a +physical step that cannot make progress is surfaced, not swallowed. Deny-list +items that enforce it: synchronous inline vector/FTS index rebuilds on the +commit path; state that drifts from Lance or the manifest when it can be +derived; job queues for manifest-derivable state where a reconciler fits. + +The failure shape it rules out: a legitimate background operation on the +physical layer (compaction, an index build, an interrupted staged write) is +allowed to break a logical operation (a query's correctness, a migration's +success, a branch's writability). The smell to watch for is a logical operation +whose precondition is a *physical* fact — a cached file version, an index's +existence, a fragment count. Make the precondition logical and let a reconciler +converge the physical state. + ## Hard Invariants 1. **Respect the substrate.** Lance owns columnar storage, per-dataset @@ -105,7 +137,7 @@ Use it this way: | Schema validation | Type checks, required fields, defaults, edge endpoint checks, and edge cardinality are enforced on write paths | [schema-language.md](../user/schema/index.md), [execution.md](execution.md) | | Unique constraints | Intra-batch and write-path checks exist; intake and branch-merge derive the composite key through one shared function (`loader::composite_unique_key`, a separator-free `Vec` tuple) and fail loudly on an un-keyable column type rather than silently exempting it; full cross-version uniqueness against already-committed rows is still a gap | [schema-language.md](../user/schema/index.md) | | Storage trait | `TableStorage` (via `db.storage()`) is staged-only; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so §1 holds by construction; capability/stat surfaces are roadmap | [writes.md](writes.md), [architecture.md](architecture.md) | -| Index lifecycle | Index *creation* per `@index`/`@key` property is dispatched by type (enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector) via `node_prop_index_kind`; index *coverage maintenance* exists — `optimize` runs Lance `optimize_indices` after compaction to fold appended/rewritten fragments into existing indexes (still an explicit maintenance call, not yet a background reconciler) | [indexes.md](../user/search/indexes.md), [maintenance.md](../user/operations/maintenance.md) | +| Index lifecycle | `@index`/`@key` declares *intent*; the physical index is derived state and never fails a logical op. `schema apply` builds no indexes (records intent only; index-only changes touch no table data). `load`/`mutate` build inline through one chokepoint (`build_indices_on_dataset_for_catalog`, type-dispatched by `node_prop_index_kind`: enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector) that fault-isolates an untrainable Vector column into a *pending* index instead of aborting. `optimize`/`ensure_indices` is the reconciler: it creates declared-but-missing indexes and folds appended/rewritten fragments into existing ones (`optimize_indices`), reporting still-pending columns. Explicit maintenance call, not yet a background loop | [indexes.md](../user/search/indexes.md), [maintenance.md](../user/operations/maintenance.md) | | Traversal IDs | Runtime still builds `TypeIndex`; Lance stable row-id based graph IDs are roadmap | [architecture.md](architecture.md), [query-language.md](../user/queries/index.md) | | Auth | Bearer token hashing and server-side actor resolution are implemented at the HTTP boundary | [server.md](../user/operations/server.md), [policy.md](../user/operations/policy.md) | | Tests | Tempdir-backed Lance tests are the current substrate; the storage adapter has an in-memory backend for adapter-level contract tests, but Lance datasets bypass it | [testing.md](testing.md) | diff --git a/docs/dev/testing.md b/docs/dev/testing.md index 7e181e1..38b81f2 100644 --- a/docs/dev/testing.md +++ b/docs/dev/testing.md @@ -29,7 +29,7 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav | `point_in_time.rs` | Snapshots, time travel (`snapshot_at_version`, `entity_at`) | | `changes.rs` | `diff_between` / `diff_commits` | | `consistency.rs` | Cross-table snapshot isolation, atomic publish | -| `schema_apply.rs` | Migration plan + apply, schema-apply lock | +| `schema_apply.rs` | Migration plan + apply, schema-apply lock; index materialization deferred to the reconciler (iss-848): `apply_schema_defers_vector_index_on_empty_table` (an empty-table Vector `@index` never aborts the apply) and `index_only_constraint_apply_touches_no_table_data` (adding an `@index` is metadata-only — no table-version bump) | | `search.rs` | FTS / vector / hybrid (`bm25`, `nearest`, `rrf`) | | `traversal.rs` | `Expand`, variable-length hops, anti-join (CSR path — `OMNIGRAPH_TRAVERSAL_MODE` unset) | | `traversal_indexed.rs` | BTREE-indexed Expand (`execute_expand_indexed`) forced via `OMNIGRAPH_TRAVERSAL_MODE`, asserted semantically equal to the CSR path; own binary, all `#[serial]` so env writes never race | @@ -42,7 +42,7 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav | `lance_version_columns.rs` | Per-row `_row_last_updated_at_version` behavior | | `validators.rs` | Schema constraint enforcement (enum, range, unique, cardinality) across JSONL, insert, update paths | | `policy_engine_chassis.rs` | Engine-layer Cedar enforcement (MR-722): allow + deny through every `_as` writer via the SDK directly — no HTTP — proving embedded and CLI callers hit the same gate as the server, with action × scope shapes matching `authorize_request` | -| `maintenance.rs` | `optimize` (compaction), `repair` (explicit uncovered-drift publish), and `cleanup` (version GC): empty/idempotent/no-op edges, policy validation, head preservation; `optimize` publishes its own compaction (`optimize_publishes_compaction_to_manifest_so_schema_apply_succeeds`), skips pre-existing uncovered drift (`optimize_skips_preexisting_manifest_head_drift`), and refuses to run while a `__recovery` sidecar is pending (`optimize_defers_when_recovery_sidecar_is_pending`); `repair` previews/heals verified maintenance drift, refuses raw semantic drift without `--force`, and forced repair publishes only by explicit operator choice | +| `maintenance.rs` | `optimize` (compaction), `repair` (explicit uncovered-drift publish), and `cleanup` (version GC): empty/idempotent/no-op edges, policy validation, head preservation; `optimize` publishes its own compaction (`optimize_publishes_compaction_to_manifest_so_schema_apply_succeeds`), skips pre-existing uncovered drift (`optimize_skips_preexisting_manifest_head_drift`), and refuses to run while a `__recovery` sidecar is pending (`optimize_defers_when_recovery_sidecar_is_pending`); `repair` previews/heals verified maintenance drift, refuses raw semantic drift without `--force`, and forced repair publishes only by explicit operator choice; the index reconciler (iss-848): `index_build_tolerates_null_vector_rows` (an untrainable Vector column defers instead of aborting the build, sibling indexes still build) and `optimize_materializes_index_declared_but_unbuilt` (optimize creates a declared-but-deferred index) | | `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the five per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`, `optimize_phase_b_failure_recovered_on_next_open`) and the write-entry in-process heal contract (the four `*_after_finalize_publisher_failure_heals_without_reopen` tests — load, mutation, schema apply, branch merge: a follow-up write on the same handle rolls a sidecar-covered residual forward without reopen/refresh) and the storage-fault matrix for the sidecar lifecycle (`recovery.sidecar_{write,delete,list}` / `recovery.record_audit` failpoints: Phase A put failure aborts with zero drift, Phase D delete failure is swallowed and healed by the next write, list failures are loud at heal and open, audit-append failures are retried to exactly one audit row; plus the bucket-gated `s3_load_recovers_after_publisher_failure_without_reopen`). | | `recovery.rs` | Open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via `ManifestBatchPublisher::publish`, roll-back via `Dataset::restore`, audit row in `_graph_commit_recoveries.lance`, `OpenMode::ReadOnly` skip path | | `composite_flow.rs` | Compositional/narrative end-to-end stories — multi-step flows that compose mechanics covered by other test files. Catches integration regressions where individual operations all pass their unit tests but their composition breaks (sequential merges, post-merge main writes, time-travel through merge DAG, reopen consistency over multi-merge histories, post-optimize and post-cleanup strict writes). | diff --git a/docs/user/operations/maintenance.md b/docs/user/operations/maintenance.md index 87688d6..161e5d6 100644 --- a/docs/user/operations/maintenance.md +++ b/docs/user/operations/maintenance.md @@ -7,11 +7,12 @@ - Compacts every node + edge table on `main`, then reindexes them, then **publishes the resulting version to the `__manifest`** so the manifest's recorded version tracks the compacted-and-reindexed state. Reads pin the manifest version, so without this publish the work would be invisible to readers *and* would break the version precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually changed. - Rewrites small fragments into fewer large ones; old fragments remain reachable via older versions until `cleanup` runs. - **Reindex (index coverage maintenance).** A scalar/FTS/vector index only covers the fragments it was built over. Rows appended after the index was built (e.g. by `load --mode merge`, whose commit does not rebuild an already-existing index) are scanned unindexed, and compaction itself rewrites fragments out of an index's coverage. `optimize` runs Lance's incremental `optimize_indices` after compaction to fold those fragments back in (a delta merge, not a full retrain), restoring full coverage so equality/range/traversal predicates stay index-accelerated. This is why a table with **no compaction work but stale index coverage still commits** a new version under `optimize`. Run `optimize` on a cadence at least as frequent as your freshness window so recently-loaded rows do not linger in the unindexed flat-scan tail. +- **Create declared-but-missing indexes (the index reconciler).** `@index`/`@key` declares intent; `schema apply` records it but builds nothing, and `load`/`mutate` defer a column that cannot be built yet (a `Vector` column with no trainable vectors). `optimize` materializes any such declared-but-unbuilt index over the compacted layout — so it is the convergence path for an `@index` added after data exists, or a vector index whose embeddings arrived via a later `embed`. A column still not buildable (no vectors yet) is reported on the table's stat as `pending_indexes` (visible in `--json`), not treated as a failure; the next `optimize` retries. So `optimize` is the single operator-facing index reconciler: it compacts, restores coverage, **and** builds declared-but-missing indexes. - Each table's compact→reindex→publish serializes with concurrent mutations on the same table. A crash mid-operation is recovered automatically on the next open (both compaction and reindex are content-preserving, so roll-forward is always safe). - **Requires a recovered graph.** `optimize` refuses (errors) when a pending crash-recovery operation is present — operating on an unrecovered graph could publish a partial write that recovery would roll back. Reopen the graph to run recovery, then re-run `optimize`. - **Uncovered drift is skipped, not interpreted.** If a table's underlying version is ahead of the version recorded in `__manifest` and no crash-recovery record covers that movement, `optimize` reports `skipped: DriftNeedsRepair` with the manifest/head versions and leaves the table untouched. Run `omnigraph repair` to classify and explicitly publish that drift. - Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8). -- Returns per-table stats: `table_key, fragments_removed, fragments_added, committed, skipped, manifest_version, lance_head_version`. +- Returns per-table stats: `table_key, fragments_removed, fragments_added, committed, skipped, manifest_version, lance_head_version, pending_indexes` (the last lists any declared `@index` column the reconciler could not build this run, with the reason — e.g. a vector column with no trainable vectors yet). - **Blob tables are skipped.** A table that declares any `Blob` property is not compacted: it is reported with `skipped: BlobColumnsUnsupportedByLance` (and logged) instead of compacted, and the rest of the sweep proceeds normally. **Reads and writes are unaffected** — only compaction is. Consequence: fragment count and deleted-row space on blob tables are not reclaimed; query results are never affected. A skipped blob table is also **not reindexed** in the same sweep (the skip happens before the reindex step), so its index coverage on appended rows is not refreshed by `optimize` today. ## `repair` — explicit diff --git a/docs/user/search/indexes.md b/docs/user/search/indexes.md index ea65a6f..57935cd 100644 --- a/docs/user/search/indexes.md +++ b/docs/user/search/indexes.md @@ -27,7 +27,8 @@ list/`Blob` columns → none. ## L2 — OmniGraph orchestration -- `ensure_indices()` / `ensure_indices_on(branch)` — idempotent build of BTREE + inverted indexes for the current head; safe to re-run. +- **`@index`/`@key` declares intent; the physical index is derived state.** A migration records the declaration in the catalog/IR and never fails on it — `schema apply` builds **no** indexes (adding an `@index` to an existing column is a pure metadata change that touches no table data). `load`/`mutate` build declared indexes inline as part of the write, but a column that can't be built yet (a `Vector` column with no trainable vectors — IVF k-means needs ≥1 vector, e.g. rows loaded before `embed` runs) is left **pending**, not fatal. Reads stay correct meanwhile: a missing/partial index degrades to a scan (vector search to brute-force). A later `ensure_indices`/`optimize` materializes the pending index once it is buildable. This mirrors how LanceDB builds indexes asynchronously and serves unindexed rows by brute-force. +- `ensure_indices()` / `ensure_indices_on(branch)` — idempotent build of BTREE + inverted + vector indexes for the current head; safe to re-run; returns the columns it had to defer as pending. `optimize` runs it after compaction, so the maintenance cron is the convergence path for deferred indexes. - Indexes are built on the *branch head* (not on a snapshot), so reads always see the current index state. - **Lazy branch forking for indexes**: a branch that hasn't mutated a sub-table doesn't need its own index — the main lineage's index is reused until the first write triggers a copy-on-write fork. - Vector index parameters (metric, nlist, nprobe, etc.) are not exposed in the schema; they default at the Lance layer and are picked up automatically when an index is asked for on a Vector column.