omnigraph/docs/user/search/indexes.md
Andrew Altshuler d46e50dd6d
2026-06-14 13:52:14 +03:00

Indexes

L1 — Lance index types OmniGraph exposes

| Index | Use | Notes |
| --- | --- | --- |
| BTREE | scalar range / equality on any scalar | created on @key, @index(...), and on key columns by ensure_indices() |
| Inverted (FTS) | search, fuzzy, match_text, bm25 | created on text columns referenced by FTS queries |
| Vector | nearest() k-NN | Lance picks IVF_PQ vs HNSW family by configuration; OmniGraph stores as FixedSizeList(Float32, dim) |

L2 — OmniGraph orchestration

  • ensure_indices() / ensure_indices_on(branch) — idempotent build of BTREE + inverted indexes for the current head; safe to re-run.
  • Indexes are built on the branch head (not on a snapshot), so reads always see the current index state.
  • Lazy branch forking for indexes: a branch that hasn't mutated a sub-table doesn't need its own index — the main lineage's index is reused until the first write triggers a copy-on-write fork.
  • Vector index parameters (metric, nlist, nprobe, etc.) are not exposed in the schema; Lance's defaults apply automatically whenever an index is requested on a Vector column.
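
The idempotent build and the lazy copy-on-write fork described above can be illustrated with a toy model. This is a minimal sketch, not OmniGraph's implementation: `TableState`, `Branch`, `ensure`, `write`, and the other names are hypothetical, and shared state is modelled with `Rc` rather than Lance datasets.

```rust
use std::collections::{BTreeSet, HashMap};
use std::rc::Rc;

/// Hypothetical stand-in for one sub-table: its rows plus the columns that have an index.
#[derive(Clone, Default)]
pub struct TableState {
    pub rows: Vec<String>,
    pub indexed_columns: BTreeSet<String>,
}

/// Hypothetical branch: sub-table name -> state shared with the parent lineage.
#[derive(Clone, Default)]
pub struct Branch {
    tables: HashMap<String, Rc<TableState>>,
}

impl Branch {
    /// Forking clones only the `Rc` pointers; no table or index data is copied.
    pub fn fork(&self) -> Branch {
        self.clone()
    }

    /// Idempotent build: indexes only the columns that are still missing and
    /// returns how many were newly built (0 on a re-run).
    pub fn ensure(&mut self, table: &str, columns: &[&str]) -> usize {
        let state = self.tables.entry(table.to_string()).or_default();
        let missing: BTreeSet<String> = columns
            .iter()
            .filter(|c| !state.indexed_columns.contains(**c))
            .map(|c| c.to_string())
            .collect();
        if missing.is_empty() {
            return 0; // nothing to do, and the shared state stays shared
        }
        let built = missing.len();
        Rc::make_mut(state).indexed_columns.extend(missing);
        built
    }

    /// First write to a shared sub-table triggers the copy-on-write fork.
    pub fn write(&mut self, table: &str, row: &str) {
        let state = self.tables.entry(table.to_string()).or_default();
        Rc::make_mut(state).rows.push(row.to_string());
    }

    /// True while both branches still point at the same table and index state.
    pub fn shares_table_with(&self, other: &Branch, table: &str) -> bool {
        match (self.tables.get(table), other.tables.get(table)) {
            (Some(a), Some(b)) => Rc::ptr_eq(a, b),
            _ => false,
        }
    }

    pub fn row_count(&self, table: &str) -> usize {
        self.tables.get(table).map_or(0, |s| s.rows.len())
    }
}
```

In this model a fork keeps sharing the parent's state through any number of no-op `ensure` calls, and only the first `write` copies it; the parent branch is left unchanged.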

L2 — Graph topology index (graph_index/mod.rs)

This is OmniGraph-specific (not Lance):

  • TypeIndex: dense u32 ↔ String id mapping per node type.
  • CsrIndex: Compressed Sparse Row representation of edges per edge type — offsets[i]..offsets[i+1] slices into targets.
  • GraphIndex { type_indices, csr (out), csc (in) }: built lazily, on demand, from a snapshot's edge tables. The build runs only when an Expand that the planner routes to the CSR path (dense / large frontier), or an AntiJoin, actually needs it.
  • Cached in RuntimeCache::graph_indices (LRU, max 8 entries, keyed by snapshot id + edge table versions).
  • Selective Expands resolve neighbors from the persisted src/dst BTREE instead (one indexed scan per hop) and never trigger the CSR build; see query-language → Expand. Pure scans, and queries served entirely by the indexed traversal path, also skip the build.
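
To make the two structures concrete, here is a minimal sketch of a dense id mapping and a CSR adjacency in plain Rust. The type names mirror the ones listed above, but the fields, method names (`intern`, `resolve`, `build`, `neighbors`), and the counting-sort construction are assumptions for illustration, not the contents of graph_index/mod.rs.

```rust
use std::collections::HashMap;

/// Dense u32 <-> String id mapping for one node type (illustrative layout).
#[derive(Default)]
pub struct TypeIndex {
    ids: Vec<String>,
    dense: HashMap<String, u32>,
}

impl TypeIndex {
    /// Return the dense id for `id`, assigning the next free one if unseen.
    pub fn intern(&mut self, id: &str) -> u32 {
        if let Some(&d) = self.dense.get(id) {
            return d;
        }
        let d = self.ids.len() as u32;
        self.ids.push(id.to_string());
        self.dense.insert(id.to_string(), d);
        d
    }

    /// Map a dense id back to the original string id.
    pub fn resolve(&self, dense: u32) -> Option<&str> {
        self.ids.get(dense as usize).map(String::as_str)
    }
}

/// Compressed Sparse Row adjacency: the neighbors of node `i` are
/// `targets[offsets[i]..offsets[i + 1]]`.
pub struct CsrIndex {
    offsets: Vec<usize>,
    targets: Vec<u32>,
}

impl CsrIndex {
    /// Build from (src, dst) pairs over `num_nodes` dense ids with a counting sort.
    pub fn build(num_nodes: usize, edges: &[(u32, u32)]) -> Self {
        // Count out-degree per source, shifted by one slot.
        let mut offsets = vec![0usize; num_nodes + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1;
        }
        // Prefix sum turns counts into start offsets.
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i];
        }
        // Scatter each target into its source's slice, preserving input order.
        let mut cursor = offsets.clone();
        let mut targets = vec![0u32; edges.len()];
        for &(src, dst) in edges {
            let slot = &mut cursor[src as usize];
            targets[*slot] = dst;
            *slot += 1;
        }
        CsrIndex { offsets, targets }
    }

    /// Neighbors of `node` as a contiguous slice into `targets`.
    pub fn neighbors(&self, node: u32) -> &[u32] {
        let i = node as usize;
        &self.targets[self.offsets[i]..self.offsets[i + 1]]
    }
}
```

Under the same assumptions, the incoming direction (csc) is the same build applied to the edge list with each (src, dst) pair swapped.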