mirror of https://github.com/ModernRelay/omnigraph.git (synced 2026-06-21 02:28:07 +02:00)
fix(engine): build scalar BTREE for enum and orderable-scalar @index columns
`build_indices_on_dataset_for_catalog` only handled `String` (-> FTS) and `Vector` (-> vector). Enums are physically `String`, so an enum `@index` column (e.g. `status`) got an FTS inverted index, which Lance never consults for `=`; and `DateTime`/`Date`/numeric/`Bool` `@index` columns fell through and built nothing. Both meant equality/range filters degraded to full scans with `indices_loaded=0`.

Dispatch index kind by property type via a shared `node_prop_index_kind`: enum + orderable scalar -> BTREE, free-text String -> FTS, Vector -> vector, list/Blob -> none. The helper is shared by the builder and `needs_index_work_node` so they cannot drift — the latter decides recovery-sidecar pinning, and under-reporting would leave a HEAD-advancing index build uncovered (invariant 5).

Tests: scalar_indexes.rs asserts enum/DateTime/numeric @index columns report `IndexCoverage::Indexed` while free-text String/un-annotated columns stay `Degraded` (negative control).

Docs: docs/user/indexes.md.
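The dispatch described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the `PropType` and `IndexKind` enums, the function signature, and the `needs_index_work` helper are hypothetical stand-ins. Only the name `node_prop_index_kind` and the type-to-index mapping are taken from the commit message.

```rust
/// Hypothetical property types; the real catalog types are not shown on this page.
#[derive(Debug, Clone, PartialEq)]
enum PropType {
    String,
    Enum(Vec<String>), // physically stored as a string
    DateTime,
    Date,
    I32,
    I64,
    U32,
    U64,
    F32,
    F64,
    Bool,
    Vector(usize), // fixed dimension
    List(Box<PropType>),
    Blob,
}

/// Hypothetical index kinds matching the three index families in the message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexKind {
    BTree,
    Fts,
    Vector,
}

/// Single source of truth for "which index does this property type get?".
/// Returns `None` for types that get no index at all.
fn node_prop_index_kind(ty: &PropType) -> Option<IndexKind> {
    match ty {
        // Enums and orderable scalars: equality and range filters need a BTREE.
        PropType::Enum(_)
        | PropType::DateTime
        | PropType::Date
        | PropType::I32
        | PropType::I64
        | PropType::U32
        | PropType::U64
        | PropType::F32
        | PropType::F64
        | PropType::Bool => Some(IndexKind::BTree),
        // Free-text strings: inverted index for full-text search.
        PropType::String => Some(IndexKind::Fts),
        PropType::Vector(_) => Some(IndexKind::Vector),
        PropType::List(_) | PropType::Blob => None,
    }
}

/// Hypothetical stand-in for the coverage check: because it calls the same
/// helper as the builder would, the two cannot disagree about which columns
/// require index work.
fn needs_index_work(ty: &PropType, has_index_annotation: bool) -> bool {
    has_index_annotation && node_prop_index_kind(ty).is_some()
}
```

Routing both the builder and the coverage check through one function is the point of the design: a type added to one `match` arm changes both callers at once, so the check cannot under-report work the builder is about to do.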
parent e4ef67b0bb
commit 481de860b2
4 changed files with 193 additions and 36 deletions
@@ -4,10 +4,27 @@

 | Index | Use | Notes |
 |---|---|---|
-| **BTREE scalar** | range / equality on any scalar | created on `@key`, `@index(...)`, and on key columns by `ensure_indices()` |
-| **Inverted (FTS)** | `search`, `fuzzy`, `match_text`, `bm25` | created on text columns referenced by FTS queries |
+| **BTREE scalar** | `=` / range / `IN` / `IS NULL` on a scalar | always on the node `id` and edge `src`/`dst`; and on each one-column `@index`/`@key` property that is an **enum** or an **orderable scalar** (`DateTime`/`Date`/`I32`/`I64`/`U32`/`U64`/`F32`/`F64`/`Bool`) |
+| **Inverted (FTS)** | `search`, `fuzzy`, `match_text`, `bm25` | created on **free-text** (non-enum) `String` `@index`/`@key` columns |
 | **Vector** | `nearest()` k-NN | Lance picks IVF_PQ vs HNSW family by configuration; OmniGraph stores as FixedSizeList(Float32, dim) |
+
+The per-property index a column gets is decided by `node_prop_index_kind` (shared
+by the builder and the sidecar-pinning coverage check so they cannot drift):
+enums and orderable scalars → BTREE, free-text Strings → FTS, `Vector` → vector,
+list/`Blob` columns → none.
+
+> **Free-text Strings are not equality-indexed.** A non-enum `String` column
+> (including a `String @key` slug) gets an FTS inverted index, which Lance does
+> **not** consult for `=`/range — only for `search`/`match_text`/`bm25`. So an
+> equality filter on a free-text String falls back to a full scan. If you filter
+> a String identifier by equality on a large table, model it so the value is the
+> node id, or track it as a follow-up to also build a BTREE on such columns.
+
+> **Coverage and cost.** Each indexed column adds index files and build time, and
+> an index only covers the fragments it was built over. Rows appended after the
+> index was built (e.g. by `ingest --mode merge`) are scanned unindexed until a
+> reindex extends coverage; see [maintenance](maintenance.md) → `optimize`.

 ## L2 — OmniGraph orchestration

 - `ensure_indices()` / `ensure_indices_on(branch)` — idempotent build of BTREE + inverted indexes for the current head; safe to re-run.
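
The "Coverage and cost" note in the hunk above says an index only covers the fragments it was built over. A minimal sketch of that bookkeeping, assuming fragments are identified by plain `u64` ids: the function names and the set-based representation are hypothetical and are not OmniGraph or Lance APIs.

```rust
use std::collections::BTreeSet;

/// Fragments present in the table but not covered by the index.
/// Filters over rows in these fragments cannot use the index and are scanned.
fn unindexed_fragments(table: &BTreeSet<u64>, indexed: &BTreeSet<u64>) -> BTreeSet<u64> {
    table.difference(indexed).copied().collect()
}

/// True when at least one fragment was appended after the index was built.
fn needs_reindex(table: &BTreeSet<u64>, indexed: &BTreeSet<u64>) -> bool {
    !unindexed_fragments(table, indexed).is_empty()
}
```

In this model, an append adds a new fragment id to the table set only, so `needs_reindex` turns true until a reindex adds that id to the indexed set as well.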