From 64b9d56476133e4936d83d8f17949b855b0ed269 Mon Sep 17 00:00:00 2001 From: Ragnor Comerford Date: Wed, 29 Apr 2026 16:58:56 +0200 Subject: [PATCH] docs: add Mermaid architecture diagrams across architecture / storage / execution MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the single ASCII stack in docs/architecture.md with a hierarchy of Mermaid diagrams that show the system from external context down to the component level. Add an on-disk layout diagram in docs/storage.md and two sequence diagrams (read query, mutation) in docs/execution.md so readers can navigate from "what is OmniGraph" to "how does a query run" without opening source. Static structure (docs/architecture.md): - System context — agents/clients, embedding providers, Cedar, object store. - Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling via classDef, replacing the pre-existing ASCII art. - Component zoom-ins — compiler, engine, storage trait, index lifecycle, server/CLI. Each zoom-in cites file:line entry points. Aspirational shapes (storage trait, full reconciler) are visually marked and pointed at the relevant invariants.md section so readers see the intended seam without thinking it's already implemented. On-disk layout (docs/storage.md): - Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance, _graph_runs.lance, _refs/branches/ down into Lance's per-dataset internals (_versions/, data/, _indices/, _refs/, _transactions/). - Annotated with the actual filenames so readers can `ls` the same paths. - Slots in below the existing __manifest CAS / OCC / migration prose; does not move or rewrite that content. Runtime flows (docs/execution.md): - Read flow sequence: client → Omnigraph::query → typecheck → lower → execute_query → table_store → Lance scanner → RecordBatch stream. - Mutation flow sequence: Omnigraph::mutate → resolve literals → Lance write op (Append / merge_insert) → ManifestRepo::commit → __manifest upsert. - Both diagrams are followed by a "Code paths" block with verified file:line citations so readers can navigate from diagram element to source in one step. Conventions established (this is the first Mermaid in the repo): - L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed. - Diagram size cap ~9 elements; more detail goes in a sub-diagram. - Diagrams paired with prose; code-path citations follow each diagram. - Consistent vocabulary across diagrams: frontend / compiler / engine / storage trait / Lance / object store. No accidental synonyms. Subsequent PRs will add flow diagrams for schema apply, branch + merge, run isolation, index reconcile, and the embedding pipeline in the same conventions. --- docs/architecture.md | 277 +++++++++++++++++++++++++++++++++++++------ docs/execution.md | 89 ++++++++++++++ docs/storage.md | 49 ++++++++ 3 files changed, 376 insertions(+), 39 deletions(-) diff --git a/docs/architecture.md b/docs/architecture.md index 4fc8867..1c47fd0 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -2,49 +2,248 @@ OmniGraph is a typed property-graph engine built as a coordination layer over many Lance datasets, with Git-style branches and commits across the whole graph, multi-modal querying (vector + FTS + BM25 + RRF + graph traversal) in one runtime, an HTTP server with Cedar policy, and a CLI driven by a single `omnigraph.yaml`. -## Stack +## Reading guide +Three views, increasing zoom: + +1. **System context** — what OmniGraph is and what it touches. +2. **Layer view** — the eight-layer stack inside one OmniGraph process. +3. **Component zoom-ins** — what's inside each layer. + +For runtime flows (read query, mutation), see [`docs/execution.md`](execution.md). For the on-disk layout of a repo, see [`docs/storage.md`](storage.md). + +L1 (orange in the diagrams) is what we inherit from Lance; L2 (blue) is what OmniGraph adds. The L1/L2 framing is also called out in prose at the bottom of this doc. + +## System context + +```mermaid +flowchart LR + classDef external fill:#fef3e8,stroke:#c46900,color:#000 + classDef omnigraph fill:#e8f4fd,stroke:#1e6aa8,color:#000 + classDef store fill:#f0f0f0,stroke:#555,color:#000 + + cli[CLI users]:::external + http[HTTP clients
and SDKs]:::external + agents[Agents]:::external + embed[Embedding providers
OpenAI / Gemini]:::external + + og[OmniGraph
kernel]:::omnigraph + + cedar[Cedar policy
engine]:::external + s3[Object store
local FS / S3 / RustFS]:::store + + cli --> og + http --> og + agents --> og + og --> embed + og --> cedar + og --> s3 ``` -┌──────────────────────────────────────────────────────────────────┐ -│ CLI (omnigraph) HTTP Server (omnigraph-server, Axum) │ -│ - 13 cmd families - REST + OpenAPI │ -│ - Aliases, configs - Bearer auth + Cedar policy │ -└──────────────────────────────┬───────────────────────────────────┘ - │ -┌──────────────────────────────▼───────────────────────────────────┐ -│ omnigraph-compiler │ -│ - Pest grammars: schema.pest, query.pest │ -│ - Catalog (Node/Edge/Interface types) │ -│ - IR + lowering (NodeScan / Expand / Filter / AntiJoin) │ -│ - Schema migration planner │ -│ - Embedding client (OpenAI-style for query-time normalization) │ -└──────────────────────────────┬───────────────────────────────────┘ - │ -┌──────────────────────────────▼───────────────────────────────────┐ -│ omnigraph (engine) │ -│ - GraphCoordinator + ManifestRepo (__manifest) │ -│ - CommitGraph (_graph_commits.lance) │ -│ - RunRegistry (_graph_runs.lance, __run__ branches) │ -│ - GraphIndex (CSR/CSC) + RuntimeCache (LRU 8) │ -│ - exec::query / mutation / merge │ -│ - Embedding client (Gemini for runtime ingest) │ -└──────────────────────────────┬───────────────────────────────────┘ - │ -┌──────────────────────────────▼───────────────────────────────────┐ -│ Lance 4.x (per-table dataset) │ -│ - Columnar (Arrow) storage, fragments │ -│ - Manifest versions per dataset │ -│ - Per-dataset branches (copy-on-write) │ -│ - Indexes: BTREE, Inverted (FTS/BM25), IVF/HNSW vector │ -│ - merge_insert (upsert), append, delete │ -│ - compact_files, cleanup_old_versions │ -└──────────────────────────────┬───────────────────────────────────┘ - │ -┌──────────────────────────────▼───────────────────────────────────┐ -│ Object store: local FS, S3, RustFS, MinIO, S3-compatible │ -└──────────────────────────────────────────────────────────────────┘ + +OmniGraph runs as a single process (one binary, multiple crates). External dependencies are the embedding APIs (called during ingest and at query-time normalization), Cedar (called for every privileged action), and an object store (everything OmniGraph persists lands here). + +## Layer view + +Inside the OmniGraph process, work flows through these layers: + +```mermaid +flowchart TB + classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000 + classDef l1 fill:#fef3e8,stroke:#c46900,color:#000 + + subgraph CLIs[CLI and HTTP server] + cli[omnigraph CLI]:::l2 + srv[omnigraph-server
Axum + Cedar]:::l2 + end + + subgraph compiler[omnigraph-compiler] + front[parse → AST → typecheck → catalog → IR]:::l2 + end + + subgraph engine[omnigraph engine] + plan[exec query and mutation]:::l2 + gi[graph index CSR/CSC
RuntimeCache LRU 8]:::l2 + coord[coordinator
ManifestRepo · CommitGraph · RunRegistry]:::l2 + end + + subgraph storage[storage trait — wraps Lance] + ts[table_store · storage.rs
direct lance::Dataset today]:::l2 + end + + subgraph lance_layer[Lance 4.x — substrate] + lance[per-dataset versions, fragments
BTREE · Inverted FTS · IVF/HNSW vector
merge_insert · compact_files · cleanup_old_versions]:::l1 + end + + subgraph object_store[Object store] + os[local FS · S3 · RustFS · MinIO]:::l1 + end + + CLIs -- "string + params" --> compiler + compiler -- IROp --> engine + engine -- "scan / write request" --> storage + storage -- "Stream of RecordBatch" --> engine + storage -- "Lance API calls" --> lance_layer + lance_layer -- bytes --> object_store ``` +The `storage trait` row is partly aspirational. Today the engine calls `lance::Dataset` methods through `table_store`; a capability-bearing `Dataset` trait per [`docs/invariants.md`](invariants.md) §I.4 is on the roadmap (MR-737). The diagram shows the intended seam. + +## Component zoom-ins + +### Compiler — `omnigraph-compiler` + +```mermaid +flowchart LR + classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000 + + src[".gq source"]:::l2 + p[parser Pest
query.pest · schema.pest]:::l2 + ast[AST
QueryDecl · Mutation · Schema]:::l2 + cat[catalog
NodeType · EdgeType · Interface]:::l2 + tc[typecheck
typecheck_query]:::l2 + low[lower
lower_query]:::l2 + ir[IROp pipeline
NodeScan · Expand · Filter · AntiJoin]:::l2 + + src --> p --> ast --> tc + cat --> tc + tc --> low --> ir +``` + +The compiler crate has zero Lance dependency. It owns the schema language, the query language, and the AST → IR lowering. + +Code paths: + +- Parser: `crates/omnigraph-compiler/src/query/parser.rs`, `crates/omnigraph-compiler/src/query/query.pest` +- Typecheck: `crates/omnigraph-compiler/src/query/typecheck.rs:83` (`typecheck_query`) +- Lower: `crates/omnigraph-compiler/src/ir/lower.rs:11` (`lower_query`) +- Catalog: `crates/omnigraph-compiler/src/catalog/` + +### Engine — `omnigraph` crate + +```mermaid +flowchart TB + classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000 + + subgraph exec[exec module] + eq[query · execute_query
query.rs:347]:::l2 + em[mutation · mutate
mutation.rs:511]:::l2 + ld[loader · ingest
loader/mod.rs:74]:::l2 + end + + subgraph state[graph state] + coord[GraphCoordinator]:::l2 + mr[ManifestRepo
db/manifest.rs]:::l2 + cg[CommitGraph
_graph_commits.lance]:::l2 + rr[RunRegistry
_graph_runs.lance]:::l2 + end + + subgraph idx[graph index] + gi[GraphIndex
CSR/CSC built per query]:::l2 + rc[RuntimeCache LRU=8]:::l2 + end + + subgraph io[Lance I/O] + ts[table_store]:::l2 + st[storage adapter
storage.rs]:::l2 + end + + eq --> gi + eq --> ts + em --> ts + ld --> ts + eq --> mr + em --> mr + coord --> mr + coord --> cg + coord --> rr + ts --> st +``` + +The engine binds the compiler IR to Lance. It owns multi-dataset coordination, the graph topology index, the run registry, and the snapshot/manifest read path. + +Code paths: + +- Read entry: `Omnigraph::query` at `crates/omnigraph/src/exec/query.rs:7` +- Mutation entry: `Omnigraph::mutate` at `crates/omnigraph/src/exec/mutation.rs:511` +- Manifest commit: `ManifestRepo::commit` at `crates/omnigraph/src/db/manifest.rs:280` +- Graph index: `crates/omnigraph/src/graph_index/` +- Loader: `Omnigraph::ingest` at `crates/omnigraph/src/loader/mod.rs:74` + +### Storage trait — today vs. roadmap + +```mermaid +flowchart LR + classDef now fill:#e8f4fd,stroke:#1e6aa8,color:#000 + classDef future fill:#fff,stroke:#888,stroke-dasharray:5 5,color:#444 + + subgraph today[Today] + d1[table_store
opens lance::Dataset directly]:::now + d2[storage.rs
S3 / file URI plumbing]:::now + end + + subgraph roadmap[Roadmap — invariants §I.4] + t[trait Dataset
schema · stats · placement
capabilities · scan · write]:::future + impl1[LanceStorage]:::future + impl2[MemStorage for tests]:::future + end + + today -.-> roadmap + t --> impl1 + t --> impl2 +``` + +The storage layer's trait surface is aspirational. Today the engine calls `lance::Dataset` methods directly. The roadmap (per [`docs/invariants.md`](invariants.md) §I.4 and MR-737) is a `Dataset` trait that surfaces capabilities and statistics so the planner can reason about pushdown opportunities. + +### Index lifecycle — today vs. roadmap + +```mermaid +flowchart LR + classDef now fill:#e8f4fd,stroke:#1e6aa8,color:#000 + classDef future fill:#fff,stroke:#888,stroke-dasharray:5 5,color:#444 + + subgraph today[Today] + ei[ensure_indices
omnigraph.rs:445]:::now + manual[called manually
or from optimize]:::now + end + + subgraph roadmap[Roadmap — invariants §VII.35] + rec[Reconciler
observes manifest]:::future + diff[coverage diff
fragments − fragment_bitmap]:::future + wp[worker pool
builds index segments]:::future + end + + manual --> ei + today -.-> roadmap + rec --> diff --> wp +``` + +Today, indexes are built explicitly via `ensure_indices`. Reads degrade gracefully when index coverage is partial — Lance's scanner unions indexed and scan paths automatically. The roadmap reconciler (per [`docs/invariants.md`](invariants.md) §VII.35) observes manifest state and converges coverage in the background. + +### Server / CLI + +```mermaid +flowchart LR + classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000 + + cli[omnigraph CLI
13 cmd families]:::l2 + srv_in[Axum HTTP
REST + OpenAPI]:::l2 + auth[Bearer auth
SHA-256 hashed tokens]:::l2 + pol[Cedar policy gate
per request]:::l2 + eng[engine API]:::l2 + + cli -.-> eng + srv_in --> auth --> pol --> eng +``` + +The server applies Cedar policy at the HTTP boundary today (per [`docs/invariants.md`](invariants.md) §VII.45, the roadmap is to push policy into the planner as predicates). The CLI bypasses the HTTP layer and calls the engine API directly. + +Code paths: + +- Server entry: `crates/omnigraph-server/src/lib.rs` +- Auth: `crates/omnigraph-server/src/auth.rs` +- Policy: `crates/omnigraph-server/src/policy.rs` +- CLI: `crates/omnigraph-cli/src/main.rs` + ## L1 / L2 framing Throughout the docs, capabilities are split into: diff --git a/docs/execution.md b/docs/execution.md index 3b65217..9f0d27c 100644 --- a/docs/execution.md +++ b/docs/execution.md @@ -9,6 +9,49 @@ Pipeline: 3. If `Expand` or `AntiJoin` is present, build (or fetch from `RuntimeCache`) a `GraphIndex`. 4. Run `execute_query` against the snapshot. +### Read flow — sequence + +```mermaid +sequenceDiagram + autonumber + participant client as Client + participant og as Omnigraph::query
(query.rs:7) + participant cmp as omnigraph-compiler + participant exec as execute_query
(query.rs:347) + participant gi as GraphIndex
(RuntimeCache) + participant ts as table_store + participant lance as Lance scanner + + client->>og: query(target, source, name, params) + og->>og: ensure_schema_state_valid()
resolve target → snapshot + og->>cmp: parse + typecheck_query (typecheck.rs:83) + cmp-->>og: CheckedQuery + og->>cmp: lower_query (lower.rs:11) + cmp-->>og: QueryIR (pipeline of IROp) + og->>exec: extract_search_mode + dispatch (query.rs:110) + exec->>gi: build / fetch GraphIndex
(if Expand or AntiJoin) + gi-->>exec: CSR / CSC topology + loop for each IROp in pipeline + exec->>ts: scan with predicate / SIP + ts->>lance: filter · nearest · full_text_search + lance-->>ts: Stream of RecordBatch + ts-->>exec: RecordBatch stream + exec->>exec: factorize · expand · fuse · project + end + exec-->>og: QueryResult (RecordBatches) + og-->>client: serialized result +``` + +**Code paths:** + +- Entry: `Omnigraph::query` at `crates/omnigraph/src/exec/query.rs:7` +- Search-mode extraction: `extract_search_mode` at `crates/omnigraph/src/exec/query.rs:110` +- Pipeline runner: `execute_query` at `crates/omnigraph/src/exec/query.rs:347` +- RRF fan-out: `execute_rrf_query` at `crates/omnigraph/src/exec/query.rs:393` +- Per-source-row BFS: `execute_expand` at `crates/omnigraph/src/exec/query.rs:675` +- Lance scan + pushdown: `execute_node_scan` at `crates/omnigraph/src/exec/query.rs:1027` +- Filter → SQL pushdown: `build_lance_filter` at `crates/omnigraph/src/exec/query.rs:1158` + ### Multi-modal search modes (`SearchMode`) The executor recognizes three modes that may be combined in a single query: @@ -44,6 +87,52 @@ Resolves expression values to literals, converts to typed Arrow arrays (`literal Multi-statement mutations are atomic at the manifest commit boundary. +### Mutation flow — sequence + +```mermaid +sequenceDiagram + autonumber + participant client as Client + participant og as Omnigraph::mutate
(mutation.rs:511) + participant cmp as omnigraph-compiler + participant ts as table_store + participant lance as Lance dataset + participant mr as ManifestRepo
(manifest.rs:280) + participant manifest as __manifest/ + + client->>og: mutate(target, source, name, params) + og->>cmp: parse + typecheck_query + cmp-->>og: CheckedQuery (Mutation IR) + og->>og: resolve expression literals
literal_to_typed_array(lit, type, n) + loop for each mutation statement + alt insert + og->>ts: append RecordBatches + ts->>lance: WriteMode::Append → new fragment(s) + else update + og->>ts: merge_insert keyed by id + ts->>lance: merge_insert(WhenMatched::Update) + else delete + og->>ts: merge_insert with delete predicate + ts->>lance: merge_insert(WhenMatched::Delete) + end + lance-->>ts: new dataset version + ts-->>og: SubTableUpdate (key, version, row_count) + end + og->>mr: commit(updates) + mr->>manifest: append rows
(table_version per sub-table) + manifest-->>mr: new graph-manifest version + mr-->>og: graph version + og-->>client: MutationResult +``` + +**Code paths:** + +- Entry: `Omnigraph::mutate` at `crates/omnigraph/src/exec/mutation.rs:511` +- Actor-attributed variant: `Omnigraph::mutate_as` at `crates/omnigraph/src/exec/mutation.rs:522` +- Manifest commit: `ManifestRepo::commit` at `crates/omnigraph/src/db/manifest.rs:280` + +The whole mutation — every statement, every affected sub-table — publishes through one call to `ManifestRepo::commit`. That single append to `__manifest` is what gives multi-statement mutations their atomicity guarantee (per [`docs/invariants.md`](invariants.md) §VI.26). + ## Bulk loader (`loader/mod.rs`) - **JSONL only** in v1, with two record shapes: diff --git a/docs/storage.md b/docs/storage.md index b7d1b92..153b080 100644 --- a/docs/storage.md +++ b/docs/storage.md @@ -48,6 +48,55 @@ Adding a new on-disk shape change is one constant bump (`INTERNAL_MANIFEST_SCHEM | v1 (implicit, pre-stamp) | `__manifest.object_id` had no PK annotation; publisher had no row-level CAS protection. | | v2 | `__manifest.object_id` carries `lance-schema:unenforced-primary-key=true`; row-level CAS engaged. Stamped as `omnigraph:internal_schema_version=2`. | +## On-disk layout + +A repo on disk is a directory tree of Lance datasets. Each dataset follows the standard Lance layout (`_versions/`, `data/`, `_indices/`, `_refs/`); OmniGraph adds the multi-dataset coordination by keeping `__manifest/` alongside the per-type datasets. + +```mermaid +flowchart TB + classDef l1 fill:#fef3e8,stroke:#c46900,color:#000 + classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000 + + repo["repo URI
file:// or s3://bucket/prefix"]:::l2 + + manifest["__manifest/
L2 catalog of sub-tables"]:::l2 + nodes["nodes/{fnv1a64-hex}/
one dataset per node type"]:::l2 + edges["edges/{fnv1a64-hex}/
one dataset per edge type"]:::l2 + cgraph["_graph_commits.lance/
_graph_commit_actors.lance/"]:::l2 + runs["_graph_runs.lance/
_graph_run_actors.lance/"]:::l2 + refs["_refs/branches/{name}.json
graph-level branches"]:::l2 + + repo --> manifest + repo --> nodes + repo --> edges + repo --> cgraph + repo --> runs + repo --> refs + + subgraph dataset[Inside each Lance dataset — L1] + ds_v["_versions/{n}.manifest
per-dataset versions"]:::l1 + ds_data["data/
fragment files (Arrow IPC)"]:::l1 + ds_idx["_indices/{uuid}/
BTREE · Inverted FTS · IVF/HNSW"]:::l1 + ds_refs["_refs/
per-dataset Lance branches/tags"]:::l1 + ds_tx["_transactions/
commit transaction logs"]:::l1 + end + + nodes -.-> dataset + edges -.-> dataset + manifest -.-> dataset +``` + +**What's where:** + +- **Repo root** is one directory (or S3 prefix). Everything below is part of one OmniGraph repo. +- **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here. +- **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe. +- **`_graph_commits.lance` / `_graph_runs.lance`** are L2 datasets that record the graph-level commit DAG and run registry respectively (each has a paired `*_actors.lance` for the actor map). +- **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads. +- **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags. + +The split — L2 owns the cross-dataset catalog; L1 owns the per-dataset internals — means that schema work (which adds or removes datasets) updates `__manifest`, while data work (which adds fragments) updates `_versions/` inside the affected dataset and then bumps `__manifest`. + ## URI scheme support (`storage.rs`) | Scheme | Backend | Notes |