mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
77 lines
2.9 KiB
Markdown
# Query Execution, Mutations, and Loading

## Query execution (`exec/query.rs`)

Pipeline:

1. Parse + typecheck via `omnigraph-compiler`.
2. Lower to IR.
3. If `Expand` or `AntiJoin` is present, build (or fetch from `RuntimeCache`) a `GraphIndex`.
4. Run `execute_query` against the snapshot.
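
Step 3 of the pipeline can be sketched as follows. This is a minimal model and not the actual `exec/query.rs` code: the `IrOp` variants other than `Expand` and `AntiJoin`, the fields of `GraphIndex` and `RuntimeCache`, the choice of snapshot version as cache key, and the helper names `needs_graph_index` and `prepare_index` are all assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical IR operators; only `Expand` and `AntiJoin` come from the text.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    Scan,
    Filter,
    Expand,
    AntiJoin,
}

/// Stand-in for the real `GraphIndex` (fields are assumed).
#[derive(Debug)]
struct GraphIndex {
    snapshot_version: u64,
}

/// Stand-in for `RuntimeCache`, keyed by snapshot version (an assumption).
#[derive(Default)]
struct RuntimeCache {
    indexes: HashMap<u64, Arc<GraphIndex>>,
    builds: usize, // how many times an index was actually built
}

impl RuntimeCache {
    /// Return the cached index for this snapshot, building it on a miss.
    fn get_or_build(&mut self, snapshot_version: u64) -> Arc<GraphIndex> {
        if let Some(idx) = self.indexes.get(&snapshot_version) {
            return Arc::clone(idx);
        }
        self.builds += 1;
        let idx = Arc::new(GraphIndex { snapshot_version });
        self.indexes.insert(snapshot_version, Arc::clone(&idx));
        idx
    }
}

/// True when the lowered query contains a traversal or a negation.
fn needs_graph_index(ir: &[IrOp]) -> bool {
    ir.iter().any(|op| matches!(op, IrOp::Expand | IrOp::AntiJoin))
}

/// Only pay for the index when the IR needs it.
fn prepare_index(
    ir: &[IrOp],
    snapshot_version: u64,
    cache: &mut RuntimeCache,
) -> Option<Arc<GraphIndex>> {
    if needs_graph_index(ir) {
        Some(cache.get_or_build(snapshot_version))
    } else {
        None
    }
}
```

In this sketch a pure scan/filter query never triggers an index build, and two traversal queries against the same snapshot share one `Arc<GraphIndex>`.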
### Multi-modal search modes (`SearchMode`)

The executor recognizes three modes that may be combined in a single query:

- **`nearest`**: vector ANN (uses the Lance vector index; `LIMIT` required).
- **`bm25`**: BM25 over an inverted index.
- **`rrf`**: Reciprocal Rank Fusion of two rankings, with constant `k` (default 60).

Hybrid example: `order { rrf(nearest($d.embedding, $q), bm25($d.body, $q_text)) desc } limit 20`.
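
For reference, Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over the rankings it appears in, with 1-based ranks. The sketch below implements that standard formula; the function name `rrf_fuse`, the string document ids, and the tie-breaking rule are assumptions, and the executor's own implementation may differ.

```rust
use std::collections::HashMap;

/// Fuse best-first rankings of document ids with Reciprocal Rank Fusion.
/// Accepts any number of rankings; the hybrid query above would pass two.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            // `i` is 0-based, so the 1-based rank is `i + 1`.
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + (i as f64) + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(doc, score)| (doc.to_string(), score))
        .collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With a vector ranking `["a", "b", "c"]`, a BM25 ranking `["b", "c", "d"]`, and `k = 60`, document `b` comes first because it ranks well in both lists, followed by `c`, `a`, and `d`.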
### Joins / set operations

- Joins are implicit: MATCH bindings and traversals are implemented as scans plus CSR/CSC lookups.
- `not { … }` lowers to an `AntiJoin` over the inner pipeline.
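
To make the two bullets concrete, here is a small sketch of a CSR adjacency lookup and an anti-join built on top of it. The `Csr` layout, dense `u32` node ids, and the helper `anti_join_no_out_edges` are assumptions for illustration, not the engine's `GraphIndex`. A CSC structure is the same layout built over reversed edges, which serves incoming-edge traversals.

```rust
/// Compressed sparse row adjacency: the out-neighbors of node `n` are
/// `targets[offsets[n]..offsets[n + 1]]`.
struct Csr {
    offsets: Vec<usize>,
    targets: Vec<u32>,
}

impl Csr {
    /// Build from `(src, dst)` edges over `num_nodes` dense node ids.
    fn from_edges(num_nodes: usize, edges: &[(u32, u32)]) -> Self {
        // Count out-degrees, shifted by one slot.
        let mut offsets = vec![0usize; num_nodes + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1;
        }
        // Prefix sum turns counts into start offsets.
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i];
        }
        // Scatter targets into their rows, preserving input order per row.
        let mut cursor = offsets.clone();
        let mut targets = vec![0u32; edges.len()];
        for &(src, dst) in edges {
            targets[cursor[src as usize]] = dst;
            cursor[src as usize] += 1;
        }
        Csr { offsets, targets }
    }

    /// Slice lookup: no hashing and no per-edge allocation.
    fn neighbors(&self, node: u32) -> &[u32] {
        let n = node as usize;
        &self.targets[self.offsets[n]..self.offsets[n + 1]]
    }
}

/// Anti-join: keep the bound nodes for which the inner pattern
/// (here simply "has at least one outgoing edge") yields no match.
fn anti_join_no_out_edges(bound: &[u32], csr: &Csr) -> Vec<u32> {
    bound
        .iter()
        .copied()
        .filter(|&n| csr.neighbors(n).is_empty())
        .collect()
}
```

For edges `0→1`, `0→2`, `2→1` over four nodes, the anti-join over all nodes keeps `1` and `3`, the nodes with no outgoing edge.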
### Scoped reads

- `query(target, source, name, params)`: runs at any branch or snapshot.
- `run_query_at(version, …)`: direct historical query at a manifest version.

### Concurrency

- Snapshot isolation per query: all reads inside a query use the same `Snapshot`.
- Readers and writers on different branches don't block each other.
## Mutation execution (`exec/mutation.rs`)

The mutation executor resolves expression values to literals, converts them to typed Arrow arrays (`literal_to_typed_array(lit, DataType, num_rows)`), then writes:

- `insert` → Lance `WriteMode::Append`
- `update` → Lance `merge_insert(WhenMatched::Update)`
- `delete` → Lance `merge_insert(WhenMatched::Delete)` (logical) or filtered overwrite.

Multi-statement mutations are atomic at the manifest commit boundary.
## Bulk loader (`loader/mod.rs`)

- **JSONL only** in v1, with two record shapes:
  - Node: `{"type":"NodeType", "data":{…}}`
  - Edge: `{"edge":"EdgeType", "from":"src_id", "to":"dst_id", "data":{…}}`
- Lines starting with `//` are treated as comments.
- Schema validation on every row (typecheck, required props, blob base64 decoding).
- Edge endpoint resolution by node `@key`.
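
As an illustration of the input format, the sketch below embeds a tiny JSONL sample and filters out comment lines before any parsing would happen. The type names (`Person`, `Knows`), the property names inside `data`, the helper `record_lines`, and the decision to skip blank lines are all assumptions; only the two record shapes and the `//` comment rule come from the list above.

```rust
/// Hypothetical input: one comment, two node records, one edge record.
const SAMPLE: &str = r#"// people and who they know
{"type":"Person", "data":{"id":"alice", "name":"Alice"}}
{"type":"Person", "data":{"id":"bob", "name":"Bob"}}

{"edge":"Knows", "from":"alice", "to":"bob", "data":{"since":2020}}
"#;

/// Return the lines that should be parsed as records.
/// Lines starting with `//` are comments; skipping blank lines is an
/// assumption of this sketch, not documented loader behavior.
fn record_lines(input: &str) -> Vec<&str> {
    input
        .lines()
        .filter(|line| !line.trim().is_empty() && !line.starts_with("//"))
        .collect()
}
```

Calling `record_lines(SAMPLE)` yields the three record lines in file order, ready to be handed to a JSON parser and validated against the schema.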
## Load modes (`LoadMode`)

| Mode | Semantics |
|---|---|
| `Overwrite` | Replace all data in the target tables on the branch |
| `Append` | Strict insert; duplicates error |
| `Merge` | Upsert by id (`merge_insert`) |
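
The three semantics in the table can be modeled on a toy in-memory table. This is a sketch under stated assumptions: the real loader writes Lance tables, and the `Table` alias, the `apply_load` helper, and the validate-before-write behavior of `Append` are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq)]
enum LoadMode {
    Overwrite,
    Append,
    Merge,
}

/// Toy table: row id -> payload.
type Table = BTreeMap<String, String>;

/// Apply `rows` to `table` under the given mode; returns the rows written.
fn apply_load(table: &mut Table, rows: &[(&str, &str)], mode: LoadMode) -> Result<usize, String> {
    match mode {
        LoadMode::Overwrite => {
            // Replace everything that was there before.
            table.clear();
        }
        LoadMode::Append => {
            // Strict insert: reject ids already in the table or repeated in
            // the batch, checking first so a failure leaves the table intact.
            let mut seen = BTreeSet::new();
            for (id, _) in rows {
                if table.contains_key(*id) || !seen.insert(*id) {
                    return Err(format!("duplicate id: {}", id));
                }
            }
        }
        LoadMode::Merge => {
            // Upsert: existing ids are updated, new ids are inserted.
        }
    }
    for (id, value) in rows {
        table.insert(id.to_string(), value.to_string());
    }
    Ok(rows.len())
}
```

After the mode-specific precondition, all three modes share the same write loop; the difference is only what happens to pre-existing rows and whether duplicates are tolerated.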
## `load` vs `ingest`

- `load(branch, data, mode)`: direct load to a branch.
- `ingest(branch, from, data, mode)`: branch-creating, transactional load, in three steps:
  1. If target advanced since the run started, fork a fresh run branch from `from`.
  2. Load into the run branch (Append).
  3. If target hasn't moved, fast-publish; otherwise abort.
- `ingest` returns `IngestResult { branch, base_branch, branch_created, mode, tables[] }`.
- `ingest_as(actor_id)` records the actor on the resulting commit.
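
The publish-or-abort check at the end of `ingest` is an optimistic concurrency pattern. The sketch below models only that part, on a toy in-memory repository: `Repo`, `Run`, `IngestError`, the `run/<target>` branch naming, and the split into `begin_ingest` and `publish` are all hypothetical, and it assumes `from` is the target branch itself so that publishing is a plain fast-forward.

```rust
use std::collections::HashMap;

/// Toy repository: branch name -> (version, rows). Purely illustrative.
#[derive(Default)]
struct Repo {
    branches: HashMap<String, (u64, Vec<String>)>,
}

#[derive(Debug, PartialEq)]
enum IngestError {
    UnknownBranch(String),
    TargetMoved { expected: u64, found: u64 },
}

/// Handle for an in-flight ingest (hypothetical type).
struct Run {
    target: String,
    run_branch: String,
    target_version_at_start: u64,
}

impl Repo {
    fn head(&self, branch: &str) -> Result<u64, IngestError> {
        self.branches
            .get(branch)
            .map(|b| b.0)
            .ok_or_else(|| IngestError::UnknownBranch(branch.to_string()))
    }

    /// Append rows to a branch and bump its version.
    fn commit(&mut self, branch: &str, rows: &[&str]) -> Result<u64, IngestError> {
        let entry = self
            .branches
            .get_mut(branch)
            .ok_or_else(|| IngestError::UnknownBranch(branch.to_string()))?;
        entry.1.extend(rows.iter().map(|r| r.to_string()));
        entry.0 += 1;
        Ok(entry.0)
    }

    /// Remember the target head, fork a run branch from `from`, load into it.
    fn begin_ingest(&mut self, target: &str, from: &str, rows: &[&str]) -> Result<Run, IngestError> {
        let target_version_at_start = self.head(target)?;
        let base = self
            .branches
            .get(from)
            .cloned()
            .ok_or_else(|| IngestError::UnknownBranch(from.to_string()))?;
        let run_branch = format!("run/{}", target);
        self.branches.insert(run_branch.clone(), base);
        self.commit(&run_branch, rows)?;
        Ok(Run { target: target.to_string(), run_branch, target_version_at_start })
    }

    /// Fast-publish if the target has not moved; otherwise abort.
    /// The run branch is dropped in both cases.
    fn publish(&mut self, run: Run) -> Result<u64, IngestError> {
        let found = self.head(&run.target)?;
        let (_, rows) = self
            .branches
            .remove(&run.run_branch)
            .ok_or_else(|| IngestError::UnknownBranch(run.run_branch.clone()))?;
        if found != run.target_version_at_start {
            return Err(IngestError::TargetMoved { expected: run.target_version_at_start, found });
        }
        let entry = self
            .branches
            .get_mut(&run.target)
            .ok_or_else(|| IngestError::UnknownBranch(run.target.clone()))?;
        entry.1 = rows;
        entry.0 += 1;
        Ok(entry.0)
    }
}
```

The design point is that the load itself never touches the target branch: a concurrent writer can only cause the final publish to fail, leaving the target exactly as that writer left it.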
## Embeddings during load

If a node type has `@embed` properties, the loader calls the engine embedding client (Gemini, `RETRIEVAL_DOCUMENT`) per row to populate the vector column. See [embeddings.md](embeddings.md).