mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-15 01:55:13 +02:00
* fix(engine): lower date/datetime filter literals as typed Arrow scalars `literal_to_expr` lowered `Date`/`DateTime` query literals as Utf8 strings, relying on DataFusion implicit casts. Against a physical `Date32`/`Date64` column that can coerce the column side (`CAST(col AS Utf8)`), which defeats a scalar BTREE and degrades the scan to a full filtered read. Lower to typed `Date32`/`Date64` scalars instead (reusing the loader's `parse_date32_literal`/`parse_date64_literal`, already used by the in-memory comparison arm), so the predicate stays a direct column comparison and the index is used. Malformed literals fall back to the Utf8 string so pushdown behavior never regresses. Tests: unit goldens asserting the lowered literal is typed (red before, green after) + inline-binding pushdown equality in literal_filters confirming the epoch conversion selects the right rows. * fix(engine): build scalar BTREE for enum and orderable-scalar @index columns `build_indices_on_dataset_for_catalog` only handled `String` (-> FTS) and `Vector` (-> vector). Enums are physically `String`, so an enum `@index` column (e.g. `status`) got an FTS inverted index, which Lance never consults for `=`; and `DateTime`/`Date`/numeric/`Bool` `@index` columns fell through and built nothing. Both meant equality/range filters degraded to full scans with `indices_loaded=0`. Dispatch index kind by property type via a shared `node_prop_index_kind`: enum + orderable scalar -> BTREE, free-text String -> FTS, Vector -> vector, list/Blob -> none. The helper is shared by the builder and `needs_index_work_node` so they cannot drift — the latter decides recovery- sidecar pinning, and under-reporting would leave a HEAD-advancing index build uncovered (invariant 5). Tests: scalar_indexes.rs asserts enum/DateTime/numeric @index columns report `IndexCoverage::Indexed` while free-text String/un-annotated columns stay `Degraded` (negative control). Docs: docs/user/indexes.md. * feat(engine): reindex in optimize to keep index coverage current A scalar/FTS/vector index only covers the fragments it was built over. Rows appended after the build (e.g. `ingest --mode merge`, whose commit does not rebuild an existing index) are scanned unindexed, and `compact_files` rewrites fragments out of coverage. Nothing folded them back in, so coverage decayed as the graph grew — even the id/src/dst BTREEs that power traversal. `optimize_one_table` now runs Lance `optimize_indices` after `compact_files` (incremental merge, not retrain — the same compact->optimize_indices sequence LanceDB's `optimize()` uses) and enters the publish path on compaction work OR stale index coverage (new `TableStore::has_unindexed_fragments`, reusing the fragment_bitmap logic). `optimize_indices` is a committing call with no uncommitted variant in lance-6.0.1, so it is an inline-commit residual covered by the existing `SidecarKind::Optimize` recovery sidecar spanning both ops. Blob-bearing tables are still skipped (the Lance blob-compaction bug is compaction-specific; reindex-for-blob deferred as a noted follow-up). Tests: maintenance.rs asserts an appended fragment is uncovered before and covered after optimize, and idempotency holds (second pass is a no-op). lance_surface_guards pins the `optimize_indices` signature and its incremental- coverage behavior. The existing optimize Phase-B recovery failpoint now also exercises a crash after reindex. Docs: maintenance.md, writes.md, invariants.md, lance.md, AGENTS.md. * fix(engine): coerce pushdown filter literals to the column type Filter literals were pushed to Lance in their natural Arrow type (every integer Int64, every float Float64). Against a narrower indexed column DataFusion widens to the literal's type and casts the COLUMN (`CAST(n32 AS Int64)`), which defeats the scalar BTREE and degrades to a full filtered read. A physical-plan probe confirms it: an Int32 column filtered by an i32 literal uses `ScalarIndexQuery`; by an i64 literal it does not. Thread the scan's `arrow_schema` through `build_lance_filter_expr` -> `ir_filter_to_expr` and coerce each literal operand to the opposite column's exact Arrow type, reusing `projection::literal_to_array` + `arrow_cast` (the same path the in-memory arm uses, so the two arms agree). Coercion never demotes a filter to None: on failure it falls back to the natural literal, because a node scan has no in-memory fallback for inline filters. Supersedes the date-specific change ine4ef67b(PR1): the probe shows dates were never index-defeated — temporal coercion casts the LITERAL, not the column — so PR1's index-use rationale was wrong though harmless. The generic coercion subsumes it; `literal_to_expr`'s date arms revert to the natural Utf8 fallback, and its unit tests now assert the live coerced path. Tests: surface guard `scalar_index_use_requires_matched_literal_type` pins the substrate behavior (matched -> index, widened -> column-cast full scan); unit tests cover Int32/UInt32/Float32 coercion, range op, reversed operand order, and the natural fallback; `literal_filters` adds an I32 column with equality + range and an F32 pushdown case. * fix(engine): only coerce filter literals when the cast is lossless The literal coercion inf064121narrowed unconditionally. typecheck permits numeric cross-type comparisons (`types_compatible`), so an out-of-domain literal reaches `literal_to_typed_expr` and casts lossily: a fractional float vs an integer column truncates (`{ count: 2.7 }` -> `count = 2`, wrongly matching the count=2 row) and an out-of-range integer overflows to null (`count < 3e9` on I32 -> `count < NULL` -> empty). Both silently change results, and a node scan has no in-memory fallback for inline filters. Add a lossless guard for integer targets: round-trip the cast back to the natural type and, on mismatch, return None so the caller keeps the natural literal (correct via DataFusion coercion; the index is just unused for that out-of-domain predicate). Float targets stay coerced -- narrowing F64 -> F32 is the column's own precision domain, not a value error. Resolves the two valid review findings on PR #216 (Codex float truncation, Greptile out-of-range). Tests: unit cases for fractional/out-of-range fallback vs whole-float/in-range coerce vs F32 exemption; e2e `{ count: 2.7 }` returns no rows.
48 lines
7.3 KiB
Markdown
48 lines
7.3 KiB
Markdown
# Maintenance: Optimize, Repair & Cleanup
|
|
|
|
**Addressing.** `optimize`, `repair`, and `cleanup` are **storage-plane** CLI commands: they run with direct storage access against a positional `URI`, `--target`, or **`--cluster <dir|s3://…> --cluster-graph <id>`** (which resolves the graph's storage URI from the served cluster state, so you needn't know the `<storage>/graphs/<id>.omni` layout). They never run through a server, and reject `--server` / `--graph` or a `--target` that resolves to a remote (`http(s)://`) URL with a declared error. There are no server routes for them by design — to maintain a server-backed graph, run them out-of-band against the graph's storage URI. See the *Command planes* section of [cli-reference.md](../cli/reference.md).
|
|
|
|
## `optimize` — non-destructive
|
|
|
|
- Compacts every node + edge table on `main`, then reindexes them, then **publishes the resulting version to the `__manifest`** so the manifest's recorded version tracks the compacted-and-reindexed state. Reads pin the manifest version, so without this publish the work would be invisible to readers *and* would break the version precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually changed.
|
|
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older versions until `cleanup` runs.
|
|
- **Reindex (index coverage maintenance).** A scalar/FTS/vector index only covers the fragments it was built over. Rows appended after the index was built (e.g. by `load --mode merge`, whose commit does not rebuild an already-existing index) are scanned unindexed, and compaction itself rewrites fragments out of an index's coverage. `optimize` runs Lance's incremental `optimize_indices` after compaction to fold those fragments back in (a delta merge, not a full retrain), restoring full coverage so equality/range/traversal predicates stay index-accelerated. This is why a table with **no compaction work but stale index coverage still commits** a new version under `optimize`. Run `optimize` on a cadence at least as frequent as your freshness window so recently-loaded rows do not linger in the unindexed flat-scan tail.
|
|
- Each table's compact→reindex→publish serializes with concurrent mutations on the same table. A crash mid-operation is recovered automatically on the next open (both compaction and reindex are content-preserving, so roll-forward is always safe).
|
|
- **Requires a recovered graph.** `optimize` refuses (errors) when a pending crash-recovery operation is present — operating on an unrecovered graph could publish a partial write that recovery would roll back. Reopen the graph to run recovery, then re-run `optimize`.
|
|
- **Uncovered drift is skipped, not interpreted.** If a table's underlying version is ahead of the version recorded in `__manifest` and no crash-recovery record covers that movement, `optimize` reports `skipped: DriftNeedsRepair` with the manifest/head versions and leaves the table untouched. Run `omnigraph repair` to classify and explicitly publish that drift.
|
|
- Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8).
|
|
- Returns per-table stats: `table_key, fragments_removed, fragments_added, committed, skipped, manifest_version, lance_head_version`.
|
|
- **Blob tables are skipped.** A table that declares any `Blob` property is not compacted: it is reported with `skipped: BlobColumnsUnsupportedByLance` (and logged) instead of compacted, and the rest of the sweep proceeds normally. **Reads and writes are unaffected** — only compaction is. Consequence: fragment count and deleted-row space on blob tables are not reclaimed; query results are never affected. A skipped blob table is also **not reindexed** in the same sweep (the skip happens before the reindex step), so its index coverage on appended rows is not refreshed by `optimize` today.
|
|
|
|
## `repair` — explicit
|
|
|
|
- Handles **uncovered manifest/head drift**: a table's underlying version is ahead of the manifest pin and no crash-recovery record explains the movement.
|
|
- Preview by default. `omnigraph repair --json <uri>` reports each table's `classification`, `action`, manifest/head versions, underlying operation names, and any classification error. `--confirm` publishes only verified maintenance drift; if any suspicious or unverifiable table is refused, the CLI prints the per-table output and exits non-zero. `--force --confirm` also publishes suspicious or unverifiable drift after operator review.
|
|
- Classifies drift by reading the table's transaction history from `manifest_version + 1` through the current head. Only fragment-reservation and rewrite (compaction) operations are verified maintenance. Semantic operations such as append, delete, update, merge, or missing transaction history are not auto-healed.
|
|
- Publishes repair by advancing `__manifest` to the existing head; it does **not** rewrite data. If the publish succeeds, normal reads and strict writes use the repaired version. If it fails, no new data-side partial state was created.
|
|
- Requires a clean recovery state. A pending crash-recovery operation still belongs to automatic recovery, not manual repair.
|
|
|
|
## `cleanup` — destructive
|
|
|
|
- Garbage-collects old versions per table.
|
|
- Removes versions (and their unique fragments) older than the retention policy.
|
|
- Policy options `keep_versions` and `older_than` — at least one is required.
|
|
- Returns per-table stats: `table_key, bytes_removed, old_versions_removed, error`.
|
|
- **Fault-isolated per table.** A single table's transient failure (version GC or
|
|
orphan reclaim) is recorded on that table's stats row (with an `error`) and logged,
|
|
and never aborts the healthy tables — cleanup is the convergence
|
|
backstop, so it does as much as it can and converges on re-run. The CLI reports
|
|
any failed tables; rerun `cleanup` to retry them.
|
|
- CLI guards with `--confirm`; without it, prints a preview line.
|
|
- **Recovery floor:** `--keep < 3` may garbage-collect versions that crash recovery needs as a rollback target. Default `--keep 10` is safe.
|
|
- **Orphaned-branch reconciliation:** before the version GC, cleanup reclaims any per-table or commit-graph branch absent from the manifest branch list. These orphans arise when a `branch_delete` flips the manifest authority but a downstream best-effort reclaim does not complete (see [branches-commits.md](../branching/index.md)). The reconciler is idempotent (it no-ops once nothing is orphaned), runs regardless of the `keep_versions` / `older_than` values (those gate version GC only), and never reclaims `main` or system-branch forks. Reclaimed forks are logged.
|
|
|
|
## Tombstones
|
|
|
|
Logical sub-table delete markers in `__manifest` that exclude a sub-table version from snapshot reconstruction.
|
|
|
|
## Internal schema migrations
|
|
|
|
Version evolutions of the on-disk `__manifest` shape are reconciled automatically on the first write under a new binary. An on-disk stamp records the shape; the binary migrates it forward before reading state, and reads are side-effect-free. No operator action is required for in-place upgrades. See [storage.md → Internal schema versioning](../concepts/storage.md) for the full mechanism.
|
|
|
|
A binary opening a manifest stamped at a version *higher* than it knows about refuses to publish with a clear "upgrade omnigraph first" error — old binaries cannot clobber a newer schema.
|