mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-15 01:55:13 +02:00
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture, where-to-find index, always-on invariants, capability matrix, maintenance contract) plus 18 new docs/*.md files holding the deep content per topic (storage, schema and query languages, indexes, embeddings, branches/commits, runs, merge, changes, execution, policy, server, CLI reference, audit, errors, CI, constants, v0.3.1 notes). Adds scripts/check-agents-md.sh and a check_agents_md CI job that verifies every docs/ link in AGENTS.md resolves and every doc in the canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
cfea41e942
commit
a335d98854
23 changed files with 1069 additions and 924 deletions
76
docs/execution.md
Normal file
@@ -0,0 +1,76 @@
# Query Execution, Mutations, and Loading

## Query execution (`exec/query.rs`)

Pipeline:

1. Parse and typecheck via `omnigraph-compiler`.
2. Lower to IR.
3. If the IR contains `Expand` or `AntiJoin`, build a `GraphIndex` (or fetch it from `RuntimeCache`).
4. Run `execute_query` against the snapshot.
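
Step 3 of the pipeline can be pictured with a minimal sketch. Only `Expand`, `AntiJoin`, `GraphIndex`, and `RuntimeCache` are names from this document; the `IrOp` enum, the cache keyed by snapshot version, and the `prepare_index` helper are assumptions made for illustration and are not the engine's actual types.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical IR operators; only `Expand` and `AntiJoin` are named in the text.
#[derive(Debug, Clone, PartialEq)]
pub enum IrOp {
    Scan,
    Filter,
    Expand,
    AntiJoin,
}

/// Stand-in for the real graph index (adjacency data omitted).
#[derive(Debug)]
pub struct GraphIndex {
    pub snapshot_version: u64,
}

/// Stand-in for `RuntimeCache`. Keying by snapshot version is an assumption.
#[derive(Default)]
pub struct RuntimeCache {
    indexes: HashMap<u64, Arc<GraphIndex>>,
    /// Counts how many times an index was actually built.
    pub builds: usize,
}

impl RuntimeCache {
    /// Return the cached index for `version`, building it on first use.
    pub fn get_or_build(&mut self, version: u64) -> Arc<GraphIndex> {
        if let Some(idx) = self.indexes.get(&version) {
            return Arc::clone(idx);
        }
        self.builds += 1;
        let idx = Arc::new(GraphIndex { snapshot_version: version });
        self.indexes.insert(version, Arc::clone(&idx));
        idx
    }
}

/// True when the lowered plan contains an operator that needs the graph index.
pub fn needs_graph_index(ops: &[IrOp]) -> bool {
    ops.iter().any(|op| matches!(op, IrOp::Expand | IrOp::AntiJoin))
}

/// Build or fetch the index only for plans that traverse edges.
pub fn prepare_index(
    ops: &[IrOp],
    version: u64,
    cache: &mut RuntimeCache,
) -> Option<Arc<GraphIndex>> {
    if needs_graph_index(ops) {
        Some(cache.get_or_build(version))
    } else {
        None
    }
}
```

In this sketch a scan-only plan never touches the cache, and two traversing queries on the same snapshot version share one index.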

### Multi-modal search modes (`SearchMode`)

The executor recognizes three modes that may be combined in a single query:

- **`nearest`**: vector ANN search (uses the Lance vector index; `LIMIT` required).
- **`bm25`**: BM25 ranking over an inverted index.
- **`rrf`**: Reciprocal Rank Fusion of two rankings, with smoothing constant `k` (default 60).

Hybrid example: `order { rrf(nearest($d.embedding, $q), bm25($d.body, $q_text)) desc } limit 20`.
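
For readers unfamiliar with Reciprocal Rank Fusion, the standard formula scores each document as the sum of `1 / (k + rank)` over the rankings it appears in. The sketch below implements that textbook formula with 1-based ranks; the function name `rrf_fuse`, the string ids, and the tie-breaking rule are assumptions, and the engine's own implementation may differ in detail.

```rust
use std::collections::HashMap;

/// Fuse several rankings (best match first) with Reciprocal Rank Fusion.
/// Each document scores the sum of 1 / (k + rank), with rank starting at 1.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With `k = 60`, a document ranked second by the vector search and first by BM25 scores `1/62 + 1/61`, which beats a document ranked first and third (`1/61 + 1/63`).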

### Joins / set operations

- Joins are implicit: MATCH bindings and traversals are implemented as scans plus CSR/CSC lookups.
- `not { … }` lowers to an `AntiJoin` over the inner pipeline.
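
The two bullets above can be illustrated with a small, self-contained sketch of a CSR (compressed sparse row) adjacency lookup and an anti-join built on top of it. The `Csr` type, dense `u32` node ids, and the `anti_join_no_out_edges` helper are hypothetical and stand in for whatever `GraphIndex` stores internally; a CSC index is the same structure built over reversed edges.

```rust
/// Compressed sparse row adjacency: the neighbors of node `n` live in
/// `targets[offsets[n]..offsets[n + 1]]`.
pub struct Csr {
    offsets: Vec<usize>,
    targets: Vec<u32>,
}

impl Csr {
    /// Build from `(src, dst)` pairs over nodes `0..num_nodes`.
    pub fn from_edges(num_nodes: usize, edges: &[(u32, u32)]) -> Self {
        // Count the out-degree of each source, shifted by one slot.
        let mut offsets = vec![0usize; num_nodes + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1;
        }
        // Prefix-sum the counts into start offsets.
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i];
        }
        // Scatter each destination into its source's slice.
        let mut cursor = offsets.clone();
        let mut targets = vec![0u32; edges.len()];
        for &(src, dst) in edges {
            let slot = cursor[src as usize];
            targets[slot] = dst;
            cursor[src as usize] += 1;
        }
        Csr { offsets, targets }
    }

    /// Outgoing neighbors of `node`, as a slice lookup with no scan.
    pub fn neighbors(&self, node: u32) -> &[u32] {
        let i = node as usize;
        &self.targets[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Anti-join sketch: keep the candidates for which the inner pipeline
/// (here simply "has an outgoing edge") produces no rows.
pub fn anti_join_no_out_edges(candidates: &[u32], csr: &Csr) -> Vec<u32> {
    candidates
        .iter()
        .copied()
        .filter(|&n| csr.neighbors(n).is_empty())
        .collect()
}
```

A traversal is then a slice lookup per bound node, and `not { … }` keeps only the outer rows whose inner lookup comes back empty.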

### Scoped reads

- `query(target, source, name, params)`: query at any branch or snapshot.
- `run_query_at(version, …)`: direct historical query at a manifest version.

### Concurrency

- Snapshot isolation per query: all reads inside a query use the same `Snapshot`.
- Readers and writers on different branches don't block each other.

## Mutation execution (`exec/mutation.rs`)

The executor resolves expression values to literals, converts them to typed Arrow arrays (`literal_to_typed_array(lit, DataType, num_rows)`), then writes:

- `insert` → Lance `WriteMode::Append`
- `update` → Lance `merge_insert(WhenMatched::Update)`
- `delete` → Lance `merge_insert(WhenMatched::Delete)` (logical) or filtered overwrite.

Multi-statement mutations are atomic at the manifest commit boundary.
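
To make the literal-to-array step concrete, here is a dependency-free sketch of broadcasting one literal into a typed column of `num_rows` values. The function name and argument order come from the text above, but the `Literal`, `DataType`, and `Column` enums are simplified stand-ins for the engine's literal type and Arrow's arrays, and the strict no-coercion rule is an assumption.

```rust
/// Simplified stand-in for a resolved expression value.
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int(i64),
    Float(f64),
    Str(String),
    Null,
}

/// Simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
}

/// Simplified stand-in for a typed, nullable Arrow array.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int64(Vec<Option<i64>>),
    Float64(Vec<Option<f64>>),
    Utf8(Vec<Option<String>>),
}

/// Broadcast one literal into a column of `num_rows` values of type `dt`.
/// Mismatched literal/type pairs are rejected rather than coerced.
pub fn literal_to_typed_array(
    lit: &Literal,
    dt: DataType,
    num_rows: usize,
) -> Result<Column, String> {
    match (lit, dt) {
        (Literal::Int(v), DataType::Int64) => Ok(Column::Int64(vec![Some(*v); num_rows])),
        (Literal::Float(v), DataType::Float64) => Ok(Column::Float64(vec![Some(*v); num_rows])),
        (Literal::Str(s), DataType::Utf8) => Ok(Column::Utf8(vec![Some(s.clone()); num_rows])),
        // A null literal becomes an all-null column of the requested type.
        (Literal::Null, DataType::Int64) => Ok(Column::Int64(vec![None; num_rows])),
        (Literal::Null, DataType::Float64) => Ok(Column::Float64(vec![None; num_rows])),
        (Literal::Null, DataType::Utf8) => Ok(Column::Utf8(vec![None; num_rows])),
        (other, dt) => Err(format!("cannot convert {:?} to {:?}", other, dt)),
    }
}
```

An `update` that sets a column to a constant for three matched rows would, in this sketch, produce a three-element column that is then handed to the write path.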

## Bulk loader (`loader/mod.rs`)

- **JSONL only** in v1, with two record shapes:
  - Node: `{"type":"NodeType", "data":{…}}`
  - Edge: `{"edge":"EdgeType", "from":"src_id", "to":"dst_id", "data":{…}}`
- Lines starting with `//` are treated as comments.
- Schema validation on every row (typecheck, required props, blob base64 decoding).
- Edge endpoint resolution by node `@key`.
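
Two of the loader rules above, comment skipping and endpoint resolution, are easy to sketch without a JSON dependency. The helper names, the `EdgeRecord` struct, and the key-to-row map are hypothetical; the sketch assumes records were already parsed, and skipping blank lines is an assumption as well.

```rust
use std::collections::HashMap;

/// Keep only the lines that carry a record: drop `//` comment lines
/// and (as an assumption) blank lines.
pub fn payload_lines(input: &str) -> Vec<&str> {
    input
        .lines()
        .filter(|l| !l.trim().is_empty() && !l.starts_with("//"))
        .collect()
}

/// Hypothetical parsed form of an edge record (its `data` payload is omitted).
pub struct EdgeRecord {
    pub edge: String,
    pub from: String,
    pub to: String,
}

/// Resolve both endpoints of an edge through a map from node key to row id.
/// An unknown key is reported as an error instead of creating a dangling edge.
pub fn resolve_edge(
    edge: &EdgeRecord,
    key_to_row: &HashMap<String, u64>,
) -> Result<(u64, u64), String> {
    let src = key_to_row
        .get(&edge.from)
        .ok_or_else(|| format!("unknown `from` key: {}", edge.from))?;
    let dst = key_to_row
        .get(&edge.to)
        .ok_or_else(|| format!("unknown `to` key: {}", edge.to))?;
    Ok((*src, *dst))
}
```

Resolving by key rather than by internal row id lets an input file refer to nodes by the same identifiers it declares them with.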

## Load modes (`LoadMode`)

| Mode | Semantics |
|---|---|
| `Overwrite` | Replace all data in the target tables on the branch |
| `Append` | Strict insert; duplicates error |
| `Merge` | Upsert by id (`merge_insert`) |
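
The semantics in the table can be modelled on a plain in-memory map from id to row. This is a sketch only: `LoadMode` and its three variants come from the table, while the `Table` alias, `apply_load`, and the choice to reject a failing `Append` before writing anything are assumptions.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LoadMode {
    Overwrite,
    Append,
    Merge,
}

/// Toy table: id -> row payload.
pub type Table = BTreeMap<String, String>;

/// Apply `rows` to `table` according to `mode`.
pub fn apply_load(table: &mut Table, rows: &[(&str, &str)], mode: LoadMode) -> Result<(), String> {
    match mode {
        LoadMode::Overwrite => {
            // Replace everything that was there.
            table.clear();
            for (id, row) in rows {
                table.insert(id.to_string(), row.to_string());
            }
        }
        LoadMode::Append => {
            // Strict insert: check first so a failing load leaves the table untouched.
            for (id, _) in rows {
                if table.contains_key(*id) {
                    return Err(format!("duplicate id: {}", id));
                }
            }
            for (id, row) in rows {
                table.insert(id.to_string(), row.to_string());
            }
        }
        LoadMode::Merge => {
            // Upsert: existing ids are updated, new ids are inserted.
            for (id, row) in rows {
                table.insert(id.to_string(), row.to_string());
            }
        }
    }
    Ok(())
}
```

The practical difference shows up on re-runs: loading the same file twice fails under `Append`, is idempotent under `Merge`, and discards unrelated rows under `Overwrite`.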

## `load` vs `ingest`

- `load(branch, data, mode)`: direct load to a branch.
- `ingest(branch, from, data, mode)`: branch-creating, transactional load:
  1. If target advanced since the run started, fork a fresh run branch from `from`.
  2. Load into the run branch (Append).
  3. If target hasn't moved, fast-publish; otherwise abort.
  - Returns `IngestResult { branch, base_branch, branch_created, mode, tables[] }`.
- `ingest_as(actor_id)` records the actor on the resulting commit.
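
The publish-or-abort decision in step 3 is an optimistic concurrency check, sketched below. Everything here is hypothetical: the `Store` type, the integer versions standing in for manifest versions, and the method names are illustration only, and the sketch uses the same branch as both `from` and target for simplicity.

```rust
use std::collections::HashMap;

/// Toy store: branch name -> head version (stand-in for manifest versions).
#[derive(Default)]
pub struct Store {
    heads: HashMap<String, u64>,
}

#[derive(Debug, PartialEq)]
pub enum PublishError {
    /// The target branch advanced after the run branch was forked.
    TargetMoved { expected: u64, found: u64 },
}

impl Store {
    /// Current head version of a branch (0 if the branch does not exist yet).
    pub fn head(&self, branch: &str) -> u64 {
        *self.heads.get(branch).unwrap_or(&0)
    }

    /// Record a new commit on `branch` and return its new head version.
    pub fn commit(&mut self, branch: &str) -> u64 {
        let v = self.heads.entry(branch.to_string()).or_insert(0);
        *v += 1;
        *v
    }

    /// Fork `run_branch` from `from`; returns the base version it started at.
    pub fn fork(&mut self, from: &str, run_branch: &str) -> u64 {
        let base = self.head(from);
        self.heads.insert(run_branch.to_string(), base);
        base
    }

    /// Fast-publish: move `target` to the run branch head only if `target`
    /// is still at the version the run was forked from; otherwise abort.
    pub fn fast_publish(
        &mut self,
        target: &str,
        run_branch: &str,
        base: u64,
    ) -> Result<u64, PublishError> {
        let found = self.head(target);
        if found != base {
            return Err(PublishError::TargetMoved { expected: base, found });
        }
        let new_head = self.head(run_branch);
        self.heads.insert(target.to_string(), new_head);
        Ok(new_head)
    }
}
```

Because the check compares the base version against the target's current head, a concurrent writer on the target causes the ingest to abort rather than silently overwrite that writer's commit.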

## Embeddings during load

If a node type has `@embed` properties, the loader calls the engine's embedding client (Gemini, task type `RETRIEVAL_DOCUMENT`) once per row to populate the vector column. See [embeddings.md](embeddings.md).