mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-12 01:45:14 +02:00
Replace the env-only mode switch with an auto policy: Expand uses the BTREE-indexed path when the source frontier is small and the hop count bounded (OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER=1024, OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS=6), else the in-memory CSR. OMNIGRAPH_TRAVERSAL_MODE=indexed|csr still forces a mode. Make the CSR index lazy: thread a GraphIndexHandle (memoizing OnceCell over a Cached/Direct/None builder) through execute_query/execute_pipeline/ execute_rrf_query/execute_anti_join instead of a pre-built Option<&GraphIndex>. A query served entirely by the indexed path with no AntiJoin never pays the O(|E|) CSR build — the perf win of Tier 3. AntiJoin still realizes the index (its negation uses CSR has_neighbors). Net effect: selective traversals (the common case) skip the whole-graph CSR build and resolve neighbors from the persisted, incrementally-maintained src/dst BTREE. Existing traversal/aggregation/end_to_end/search suites now run the indexed path by default and stay green. Docs: constants.md (new env knobs), query-language.md (Expand dual path), indexes.md (graph index is lazy + the indexed alternative).
26 lines
2 KiB
Markdown
26 lines
2 KiB
Markdown
# Indexes
|
|
|
|
## L1 — Lance index types OmniGraph exposes
|
|
|
|
| Index | Use | Notes |
|
|
|---|---|---|
|
|
| **BTREE scalar** | range / equality on any scalar | created on `@key`, `@index(...)`, and on key columns by `ensure_indices()` |
|
|
| **Inverted (FTS)** | `search`, `fuzzy`, `match_text`, `bm25` | created on text columns referenced by FTS queries |
|
|
| **Vector** | `nearest()` k-NN | Lance picks IVF_PQ vs HNSW family by configuration; OmniGraph stores as FixedSizeList(Float32, dim) |
|
|
|
|
## L2 — OmniGraph orchestration
|
|
|
|
- `ensure_indices()` / `ensure_indices_on(branch)` — idempotent build of BTREE + inverted indexes for the current head; safe to re-run.
|
|
- Indexes are built on the *branch head* (not on a snapshot), so reads always see the current index state.
|
|
- **Lazy branch forking for indexes**: a branch that hasn't mutated a sub-table doesn't need its own index — the main lineage's index is reused until the first write triggers a copy-on-write fork.
|
|
- Vector index parameters (metric, nlist, nprobe, etc.) are not exposed in the schema; they default at the Lance layer and are picked up automatically when an index is asked for on a Vector column.
|
|
|
|
## L2 — Graph topology index (`graph_index/mod.rs`)
|
|
|
|
This is OmniGraph-specific (not Lance):
|
|
|
|
- `TypeIndex`: dense `u32 ↔ String id` mapping per node type.
|
|
- `CsrIndex`: Compressed Sparse Row representation of edges per edge type — `offsets[i]..offsets[i+1]` slices into `targets`.
|
|
- `GraphIndex { type_indices, csr (out), csc (in) }` — built on demand from a snapshot's edge tables, **lazily**: only when an `Expand` the planner routes to the CSR path (dense / large frontier) or an `AntiJoin` actually needs it.
|
|
- Cached in `RuntimeCache::graph_indices` (LRU, max 8 entries, keyed by snapshot id + edge table versions).
|
|
- Selective `Expand`s resolve neighbors from the persisted `src`/`dst` BTREE instead (one indexed scan per hop) and never trigger the CSR build; see [query-language](query-language.md) → Expand. Pure scans, and queries served entirely by the indexed traversal path, skip it.
|