mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-30 02:49:39 +02:00
# Query Execution, Mutations, and Loading

## Query execution (`exec/query.rs`)

Pipeline:

1. Parse + typecheck via `omnigraph-compiler`.
2. Lower to IR.
3. If `Expand` or `AntiJoin` is present, build (or fetch from `RuntimeCache`) a `GraphIndex` **scoped to the edge types the query actually traverses** (`referenced_edge_types`, recursing through `AntiJoin` inners), not every edge type in the catalog.
   - The CSR build full-scans each covered edge dataset, so scoping is what keeps a single-edge join (`$x identifiesPerson $p`) from scanning the whole graph's edge data.
   - The `RuntimeCache` key is each covered edge table's **physical identity** `(table_key, version, table_branch, e_tag)`, not the resolved snapshot id. A `{Knows}` index and a `{Knows, WorksAt}` index are therefore distinct entries, and a lazy-fork branch whose edge tables physically *are* main's reuses main's built index instead of cold-scanning it.
4. Run `execute_query` against the snapshot.

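The scoping and cache-key rules in step 3 can be sketched as follows. This is a minimal illustration, not the engine's code: the `IrOp` enum, the `TableIdentity` struct, and the `cache_key` helper are hypothetical simplifications, and only the name `referenced_edge_types` and the four key fields come from the text above.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified IR: only the ops that matter for scoping.
enum IrOp {
    NodeScan,
    Expand { edge_type: String },
    AntiJoin { inner: Vec<IrOp> },
}

/// Collect every edge type the pipeline traverses, recursing into AntiJoin inners.
/// The match is exhaustive, so a new op variant forces a decision here.
fn referenced_edge_types(pipeline: &[IrOp]) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    for op in pipeline {
        match op {
            IrOp::NodeScan => {}
            IrOp::Expand { edge_type } => {
                out.insert(edge_type.clone());
            }
            IrOp::AntiJoin { inner } => out.extend(referenced_edge_types(inner)),
        }
    }
    out
}

/// Physical identity of one edge table (field names taken from the text;
/// the concrete types are assumptions).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct TableIdentity {
    table_key: String,
    version: u64,
    table_branch: Option<String>,
    e_tag: Option<String>,
}

/// Cache key: the identities of the covered edge tables, in sorted edge-type order.
/// Returns None if an edge type cannot be resolved in this snapshot.
fn cache_key(
    edge_types: &BTreeSet<String>,
    resolved: &HashMap<String, TableIdentity>,
) -> Option<Vec<TableIdentity>> {
    edge_types.iter().map(|e| resolved.get(e).cloned()).collect()
}

fn main() {
    let pipeline = vec![
        IrOp::NodeScan,
        IrOp::Expand { edge_type: "Knows".to_string() },
        IrOp::AntiJoin {
            inner: vec![IrOp::Expand { edge_type: "WorksAt".to_string() }],
        },
    ];
    let scoped = referenced_edge_types(&pipeline);
    println!("edge types to index: {:?}", scoped);

    let mut snapshot = HashMap::new();
    for name in ["Knows", "WorksAt"] {
        snapshot.insert(
            name.to_string(),
            TableIdentity {
                table_key: format!("edge:{}", name),
                version: 7,
                table_branch: None,
                e_tag: Some("etag-1".to_string()),
            },
        );
    }
    println!("cache key: {:?}", cache_key(&scoped, &snapshot));
}
```

Because the key is derived from the tables' physical identity rather than from the snapshot id, two snapshots that resolve an edge type to the same physical table produce the same key in this sketch, which is the property the lazy-fork reuse relies on.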
### Read flow — sequence

```mermaid
sequenceDiagram
    autonumber
    participant client as Client
    participant og as Omnigraph::query<br/>(query.rs:7)
    participant cmp as omnigraph-compiler
    participant exec as execute_query<br/>(query.rs:347)
    participant gi as GraphIndex<br/>(RuntimeCache)
    participant ts as table_store
    participant lance as Lance scanner

    client->>og: query(target, source, name, params)
    og->>og: ensure_schema_state_valid()<br/>resolve target → snapshot
    og->>cmp: parse + typecheck_query (typecheck.rs:83)
    cmp-->>og: CheckedQuery
    og->>cmp: lower_query (lower.rs:11)
    cmp-->>og: QueryIR (pipeline of IROp)
    og->>exec: extract_search_mode + dispatch (query.rs:110)
    exec->>gi: build / fetch GraphIndex<br/>(if Expand or AntiJoin)
    gi-->>exec: CSR / CSC topology
    loop for each IROp in pipeline
        exec->>ts: scan with predicate / SIP
        ts->>lance: filter · nearest · full_text_search
        lance-->>ts: Stream of RecordBatch
        ts-->>exec: RecordBatch stream
        exec->>exec: factorize · expand · fuse · project
    end
    exec-->>og: QueryResult (RecordBatches)
    og-->>client: serialized result
```

**Code paths:**

- Entry: `Omnigraph::query` at `crates/omnigraph/src/exec/query.rs:7`
- Search-mode extraction: `extract_search_mode` at `crates/omnigraph/src/exec/query.rs:110`
- Pipeline runner: `execute_query` at `crates/omnigraph/src/exec/query.rs:347`
- RRF fan-out: `execute_rrf_query` at `crates/omnigraph/src/exec/query.rs:393`
- Per-source-row BFS: `execute_expand` at `crates/omnigraph/src/exec/query.rs:675`
- Lance scan + pushdown: `execute_node_scan` at `crates/omnigraph/src/exec/query.rs:1027`
- Filter → SQL pushdown: `build_lance_filter` at `crates/omnigraph/src/exec/query.rs:1158`

### Multi-modal search modes (`SearchMode`)

The executor recognizes three modes that may be combined in a single query:

- **`nearest`**: vector ANN (uses the Lance vector index; `LIMIT` required).
- **`bm25`**: BM25 over an inverted index.
- **`rrf`**: Reciprocal Rank Fusion of two rankings, with smoothing constant `k` (default 60).

Hybrid example: `order { rrf(nearest($d.embedding, $q), bm25($d.body, $q_text)) desc } limit 20`.

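For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the standard formula, `score(d) = Σ 1 / (k + rank(d))` summed over the rankings in which `d` appears. It assumes 1-based ranks and a plain in-memory list of ids; `rrf_fuse` is a hypothetical helper and does not reflect how `execute_rrf_query` is actually implemented.

```rust
use std::collections::HashMap;

/// Fuse several rankings with Reciprocal Rank Fusion.
/// Each ranking lists document ids from best to worst; ranks are 1-based (an assumption).
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = i as f64 + 1.0;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    let nearest = vec!["a", "b", "c"]; // vector ANN order
    let bm25 = vec!["b", "d", "a"]; // full-text order
    for (id, score) in rrf_fuse(&[nearest, bm25], 60.0) {
        println!("{} {:.6}", id, score);
    }
}
```

A document that ranks reasonably well in both lists (`b` here) can outrank one that is first in only one list, which is the reason to fuse by rank rather than by raw scores that live on different scales.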
### Joins / set operations

- Joins are implicit: MATCH bindings + traversals are implemented as scans + CSR/CSC lookups.
- `not { … }` lowers to an `AntiJoin` over the inner pipeline.

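To make the CSR/CSC lookups concrete, the sketch below builds a compressed sparse row adjacency from an edge list and reads neighbors by slicing. It is a generic illustration over dense `u32` node ids; the `Csr` type and `without_out_edges` helper are hypothetical, and the real `AntiJoin` runs over an inner pipeline rather than a single hop.

```rust
/// CSR adjacency: the neighbors of node `i` are `targets[offsets[i]..offsets[i + 1]]`.
struct Csr {
    offsets: Vec<usize>,
    targets: Vec<u32>,
}

impl Csr {
    /// Build from (src, dst) pairs with a counting sort over source ids.
    fn build(num_nodes: usize, edges: &[(u32, u32)]) -> Self {
        let mut offsets = vec![0usize; num_nodes + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1; // out-degree per source
        }
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i]; // prefix sum turns degrees into offsets
        }
        let mut cursor = offsets.clone(); // next free slot per source
        let mut targets = vec![0u32; edges.len()];
        for &(src, dst) in edges {
            let slot = &mut cursor[src as usize];
            targets[*slot] = dst;
            *slot += 1;
        }
        Csr { offsets, targets }
    }

    fn neighbors(&self, node: u32) -> &[u32] {
        let i = node as usize;
        &self.targets[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Keep the candidates that have no outgoing edge: a one-hop stand-in for an anti-join.
fn without_out_edges(csr: &Csr, candidates: &[u32]) -> Vec<u32> {
    candidates
        .iter()
        .copied()
        .filter(|&n| csr.neighbors(n).is_empty())
        .collect()
}

fn main() {
    let edges = [(0u32, 1u32), (0, 2), (2, 1)];
    let out = Csr::build(3, &edges);
    // CSC (incoming edges) is the same structure built over reversed pairs.
    let reversed: Vec<(u32, u32)> = edges.iter().map(|&(s, d)| (d, s)).collect();
    let incoming = Csr::build(3, &reversed);
    println!("out(0) = {:?}", out.neighbors(0));
    println!("in(1) = {:?}", incoming.neighbors(1));
    println!("no out edges: {:?}", without_out_edges(&out, &[0, 1, 2]));
}
```

Building the structure requires a full pass over the edge list, which matches the earlier point that the CSR build full-scans each covered edge dataset.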
### Scoped reads

- `query(target, source, name, params)`: runs against any branch or snapshot `target`.
- `run_query_at(version, …)`: direct historical query at a manifest version.

### Concurrency

- Snapshot isolation per query: all reads inside a query use the same `Snapshot`.
- Readers and writers on different branches don't block each other.

## Mutation execution (`exec/mutation.rs`)

Resolves expression values to literals, converts them to typed Arrow arrays (`literal_to_typed_array(lit, DataType, num_rows)`), then writes via Lance's two-phase distributed-write API at end-of-query:

- `insert` (no `@key`, edges) → accumulate into `MutationStaging.pending` (Append mode); finalize calls `stage_append` once per touched table.
- `insert` (`@key` node) → accumulate into `pending` (Merge mode); finalize calls `stage_merge_insert` once per touched table.
- `update` → scan committed rows via Lance and pending rows via a DataFusion `MemTable` (read-your-writes), apply assignments, accumulate into `pending` (Merge mode).
- `delete` → record a predicate into `MutationStaging.delete_predicates`, counting the matching committed rows immediately for `affected_*`; finalize combines a table's predicates into one `stage_delete` (Lance 7.0 `DeleteBuilder::execute_uncommitted`, a deletion-vector transaction) committed via `commit_staged`, with no inline HEAD advance (MR-A).

**D₂ parse-time rule.** A single mutation query is either insert/update-only or delete-only. A mixed query is rejected before any I/O. The check fires in `enforce_no_mixed_destructive_constructive(&ir)` inside `execute_named_mutation`.

Multi-statement mutations are atomic at the publisher commit boundary: every insert/update batch lives in memory until end-of-query, then exactly one `stage_*` + `commit_staged` runs per touched table, then `ManifestBatchPublisher::publish` commits the manifest atomically with per-table `expected_table_versions` CAS.

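The D₂ rule described above amounts to a single pass over the lowered ops before anything touches storage. The sketch below shows that shape under simplifying assumptions: `MutationOp` and `enforce_no_mixed` are hypothetical stand-ins, and the real `enforce_no_mixed_destructive_constructive` takes the mutation IR and returns the engine's own error type.

```rust
/// Hypothetical op kinds; the real IR carries full statements.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MutationOp {
    Insert,
    Update,
    Delete,
}

/// Reject a query that mixes constructive ops (insert/update) with destructive ops (delete).
/// This runs on the IR alone, so a rejection happens before any I/O.
fn enforce_no_mixed(ops: &[MutationOp]) -> Result<(), String> {
    let has_delete = ops.iter().any(|op| *op == MutationOp::Delete);
    let has_constructive = ops
        .iter()
        .any(|op| matches!(op, MutationOp::Insert | MutationOp::Update));
    if has_delete && has_constructive {
        return Err("mutation mixes delete with insert/update".to_string());
    }
    Ok(())
}

fn main() {
    use MutationOp::*;
    println!("{:?}", enforce_no_mixed(&[Insert, Update]));
    println!("{:?}", enforce_no_mixed(&[Delete, Delete]));
    println!("{:?}", enforce_no_mixed(&[Insert, Delete]));
}
```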
### Mutation flow — sequence

```mermaid
sequenceDiagram
    autonumber
    participant client as Client
    participant og as Omnigraph::mutate_as<br/>(mutation.rs)
    participant cmp as omnigraph-compiler
    participant stg as MutationStaging<br/>(exec/staging.rs)
    participant ts as table_store
    participant lance as Lance dataset
    participant pub as ManifestBatchPublisher

    client->>og: mutate_as(branch, source, name, params, actor_id)
    og->>cmp: parse + typecheck + lower_mutation_query
    cmp-->>og: MutationIR
    og->>og: enforce_no_mixed_destructive_constructive (D₂)
    loop for each mutation op
        og->>og: resolve literals + build batch
        alt insert / update (accumulate)
            og->>ts: open dataset @ pre-write version (first touch)
            og->>stg: ensure_path + append_batch (PendingMode)
            opt update — scan committed + pending
                og->>ts: scan_with_pending (Lance + DataFusion MemTable union)
                ts-->>og: matched batches
            end
        else delete (stage, D₂ keeps separate)
            og->>ts: count_rows (committed match → affected_*)
            og->>stg: ensure_path + record_delete (predicate)
        end
    end
    og->>stg: finalize(db, branch)
    loop per pending table
        stg->>ts: stage_append OR stage_merge_insert (one per table)
        ts-->>stg: StagedWrite (transaction + commit metadata + fragments)
        stg->>ts: commit_staged (advances Lance HEAD)
        ts-->>stg: new Dataset
    end
    stg-->>og: (updates: Vec<SubTableUpdate>, expected_versions)
    og->>pub: commit_updates_on_branch_with_expected
    pub->>pub: publisher CAS (cross-table OCC on __manifest)
    pub-->>og: new manifest version
    og-->>client: MutationResult
```

**Code paths:**

- Entry: `Omnigraph::mutate_as` at `crates/omnigraph/src/exec/mutation.rs`
- Per-mutation orchestration: `mutate_with_current_actor` at `crates/omnigraph/src/exec/mutation.rs`
- D₂ check: `enforce_no_mixed_destructive_constructive` (in the same file)
- Per-op execution: `execute_insert`, `execute_update`, `execute_delete_node`, `execute_delete_edge`
- Pending-aware reads: `TableStore::scan_with_pending` / `count_rows_with_pending` at `crates/omnigraph/src/table_store.rs`
- Edge cardinality with pending: `validate_edge_cardinality_with_pending` at `crates/omnigraph/src/exec/mutation.rs`
- Per-query accumulator: `crates/omnigraph/src/exec/staging.rs` (`MutationStaging`, `PendingTable`, `PendingMode`, `finalize`)
- End-of-query Lance commit: `TableStore::stage_append`, `stage_merge_insert`, `commit_staged` at `crates/omnigraph/src/table_store.rs`
- Manifest commit primitive: `commit_updates_on_branch_with_expected` at `crates/omnigraph/src/db/omnigraph/table_ops.rs`

Atomicity guarantee for multi-statement mutations: a mid-query failure leaves Lance HEAD untouched on staged tables (no inline commit happened during op execution), so the next mutation proceeds normally with no `ExpectedVersionMismatch`. The publisher CAS at the very end either succeeds (manifest advances atomically across all touched sub-tables) or fails with a typed `ManifestConflictDetails::ExpectedVersionMismatch` (no partial publish). See [docs/dev/invariants.md](invariants.md) and [docs/dev/writes.md](writes.md).

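The all-or-nothing behaviour of the publisher CAS can be pictured as "check every expected version first, then apply every update". The sketch below models that with an in-memory map; `Manifest`, `Update`, and `PublishError` are hypothetical types for illustration only, and the real publisher performs the check against the `__manifest` table.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum PublishError {
    ExpectedVersionMismatch { table: String, expected: u64, actual: u64 },
}

/// Hypothetical in-memory manifest: one version per sub-table plus a manifest version.
struct Manifest {
    version: u64,
    table_versions: HashMap<String, u64>,
}

struct Update {
    table: String,
    expected: u64,
    new_version: u64,
}

impl Manifest {
    /// Compare-and-swap across all touched tables: validate everything, then apply everything.
    fn publish(&mut self, updates: &[Update]) -> Result<u64, PublishError> {
        for u in updates {
            let actual = self.table_versions.get(&u.table).copied().unwrap_or(0);
            if actual != u.expected {
                // Nothing has been written yet, so a conflict leaves the manifest untouched.
                return Err(PublishError::ExpectedVersionMismatch {
                    table: u.table.clone(),
                    expected: u.expected,
                    actual,
                });
            }
        }
        for u in updates {
            self.table_versions.insert(u.table.clone(), u.new_version);
        }
        self.version += 1;
        Ok(self.version)
    }
}

fn main() {
    let mut manifest = Manifest { version: 10, table_versions: HashMap::new() };
    manifest.table_versions.insert("node:Person".to_string(), 3);
    manifest.table_versions.insert("edge:Knows".to_string(), 5);

    let stale = vec![
        Update { table: "node:Person".to_string(), expected: 3, new_version: 4 },
        Update { table: "edge:Knows".to_string(), expected: 4, new_version: 6 },
    ];
    println!("{:?}", manifest.publish(&stale));
    println!("manifest version after conflict: {}", manifest.version);
}
```

Separating validation from application is what gives the "no partial publish" property in this model: a mismatch on any one table aborts before the first write.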
## Bulk loader (`loader/mod.rs`)

- **JSONL only** in v1, with two record shapes:
  - Node: `{"type":"NodeType", "data":{…}}`
  - Edge: `{"edge":"EdgeType", "from":"src_id", "to":"dst_id", "data":{…}}`
- Lines starting with `//` are treated as comments.
- Schema validation on every row (typecheck, required props, blob base64 decoding).
- Edge endpoint resolution by node `@key`.

## Load modes (`LoadMode`)

| Mode | Semantics | Path (post-MR-794) |
|---|---|---|
| `Overwrite` | Replace all data in the target tables on the branch | Same `MutationStaging` accumulator as `Append`; one `stage_overwrite` + `commit_staged` per touched table at end-of-load (a staged Lance `Operation::Overwrite` transaction; HEAD does not advance until commit; MR-793 Phase 2); publisher CAS. |
| `Append` | Strict insert; duplicates error | In-memory `MutationStaging` accumulator; one `stage_append` + `commit_staged` per touched table at end-of-load; publisher CAS. |
| `Merge` | Upsert by `id` (`merge_insert`) | Same accumulator; one `stage_merge_insert` per touched table at end-of-load (Merge mode dedupes by `id`, last-write-wins); publisher CAS. |

For all three modes, a mid-load failure (RI / cardinality violation, validation error) leaves Lance HEAD untouched on the staged tables, so the next load on the same tables proceeds normally with no `ExpectedVersionMismatch`.

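The "dedupes by `id`, last-write-wins" behaviour of Merge mode can be illustrated on a plain vector of rows. This is only a sketch of the semantics: `Row` and `dedupe_last_write_wins` are hypothetical, and the real accumulator works on Arrow batches before handing them to `stage_merge_insert`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: String,
    name: String,
}

/// Collapse rows that share an `id`, keeping the value of the last occurrence.
/// Output order follows the first appearance of each id (an arbitrary choice for this sketch).
fn dedupe_last_write_wins(rows: Vec<Row>) -> Vec<Row> {
    let mut slot_of: HashMap<String, usize> = HashMap::new();
    let mut out: Vec<Row> = Vec::new();
    for row in rows {
        let existing = slot_of.get(&row.id).copied();
        match existing {
            Some(i) => out[i] = row, // a later write replaces the earlier one
            None => {
                slot_of.insert(row.id.clone(), out.len());
                out.push(row);
            }
        }
    }
    out
}

fn main() {
    let rows = vec![
        Row { id: "p1".to_string(), name: "Ada".to_string() },
        Row { id: "p2".to_string(), name: "Grace".to_string() },
        Row { id: "p1".to_string(), name: "Ada Lovelace".to_string() },
    ];
    for row in dedupe_last_write_wins(rows) {
        println!("{} {}", row.id, row.name);
    }
}
```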
## `load` and the deprecated `ingest` shims

- `load_as(branch, base, data, mode, actor)`: the unified entry (single publisher commit per call). `base: Some(b)` forks a missing `branch` from `b` first (via `branch_create_from_as`, which enforces `BranchCreate`); `base: None` requires the branch to exist. Staging fails on an unknown branch, so a mistyped name can never create one.
- `load(branch, data, mode)`: convenience wrapper with `base: None` and no actor.
- Returns `LoadResult { branch, base_branch, branch_created, nodes_loaded, edges_loaded }`.
- `ingest{,_as,_file,_file_as}` are `#[deprecated]` shims over `load_as` preserving the historical contract (`from: None` forks from `main`; returns `IngestResult`); they are slated for removal. The CLI `ingest` command is a deprecated alias of `load --from <base>`.

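The `base` handling of `load_as` boils down to a small decision table, sketched below over a set of branch names. Everything here is a hypothetical model: `resolve_target` and `Resolved` are illustrative names, and what `base_branch` reports when the branch already exists is an assumption rather than documented behaviour.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
struct Resolved {
    branch: String,
    base_branch: Option<String>,
    branch_created: bool,
}

/// Decide whether a load may proceed, forking the branch only when a base is given.
fn resolve_target(
    branches: &mut BTreeSet<String>,
    branch: &str,
    base: Option<&str>,
) -> Result<Resolved, String> {
    if branches.contains(branch) {
        // Existing branch: nothing to fork, whatever `base` says.
        return Ok(Resolved {
            branch: branch.to_string(),
            base_branch: base.map(|b| b.to_string()), // assumption: echoed back as given
            branch_created: false,
        });
    }
    match base {
        Some(b) if branches.contains(b) => {
            branches.insert(branch.to_string()); // fork the missing branch from `b`
            Ok(Resolved {
                branch: branch.to_string(),
                base_branch: Some(b.to_string()),
                branch_created: true,
            })
        }
        Some(b) => Err(format!("unknown base branch: {}", b)),
        // No base: a mistyped branch name fails instead of silently creating a branch.
        None => Err(format!("unknown branch: {}", branch)),
    }
}

fn main() {
    let mut branches: BTreeSet<String> = BTreeSet::new();
    branches.insert("main".to_string());
    println!("{:?}", resolve_target(&mut branches, "feature", Some("main")));
    println!("{:?}", resolve_target(&mut branches, "fetaure", None));
}
```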
## Embeddings during load

The loader does **not** embed `@embed` properties at load time. `@embed` is a catalog annotation consumed by query typecheck/lint; vectors are supplied directly in the load data, or pre-filled by the offline `omnigraph embed` pipeline. Query-time `nearest($v, "string")` auto-embeds the query string via the provider-independent embedding client. See [embeddings.md](../user/search/embeddings.md). (Ingest-time `@embed` execution is a planned RFC-012 phase.)