docs: add Mermaid architecture diagrams across architecture / storage / execution

Replace the single ASCII stack in docs/architecture.md with a hierarchy of Mermaid diagrams that show the system from external context down to the component level. Add an on-disk layout diagram in docs/storage.md and two sequence diagrams (read query, mutation) in docs/execution.md so readers can navigate from "what is OmniGraph" to "how does a query run" without opening source.

Static structure (docs/architecture.md):
- System context: agents/clients, embedding providers, Cedar, object store.
- Layer view: eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins: compiler, engine, storage trait, index lifecycle, server/CLI. Each zoom-in cites file:line entry points. Aspirational shapes (storage trait, full reconciler) are visually marked and pointed at the relevant invariants.md section so readers see the intended seam without thinking it's already implemented.

On-disk layout (docs/storage.md):
- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance, _graph_runs.lance, _refs/branches/ down into Lance's per-dataset internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does not move or rewrite that content.

Runtime flows (docs/execution.md):
- Read flow sequence: client → Omnigraph::query → typecheck → lower → execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals → Lance write op (Append / merge_insert) → ManifestRepo::commit → __manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified file:line citations so readers can navigate from diagram element to source in one step.

Conventions established (this is the first Mermaid in the repo):
- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine / storage trait / Lance / object store. No accidental synonyms.

Subsequent PRs will add flow diagrams for schema apply, branch + merge, run isolation, index reconcile, and the embedding pipeline in the same conventions.
2026-06-09 01:35:18 +02:00 · 2026-04-29 16:58:56 +02:00 · 2026-04-29 16:58:56 +02:00 · 64b9d56476
commit 64b9d56476
parent 4e5374a85e
3 changed files with 376 additions and 39 deletions
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -2,49 +2,248 @@

 OmniGraph is a typed property-graph engine built as a coordination layer over many Lance datasets, with Git-style branches and commits across the whole graph, multi-modal querying (vector + FTS + BM25 + RRF + graph traversal) in one runtime, an HTTP server with Cedar policy, and a CLI driven by a single `omnigraph.yaml`.

-## Stack
+## Reading guide

+Three views, increasing zoom:
+
+1. **System context** — what OmniGraph is and what it touches.
+2. **Layer view** — the eight-layer stack inside one OmniGraph process.
+3. **Component zoom-ins** — what's inside each layer.
+
+For runtime flows (read query, mutation), see [`docs/execution.md`](execution.md). For the on-disk layout of a repo, see [`docs/storage.md`](storage.md).
+
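+L1 (orange in the layer and layout diagrams) is what we inherit from Lance; L2 (blue) is what OmniGraph adds. The system-context diagram reuses the same orange for external actors and dependencies, not for L1. The L1/L2 framing is also described in prose at the bottom of this doc.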
+
+## System context
+
+```mermaid
+flowchart LR
+    classDef external fill:#fef3e8,stroke:#c46900,color:#000
+    classDef omnigraph fill:#e8f4fd,stroke:#1e6aa8,color:#000
+    classDef store fill:#f0f0f0,stroke:#555,color:#000
+
+    cli[CLI users]:::external
+    http[HTTP clients<br/>and SDKs]:::external
+    agents[Agents]:::external
+    embed[Embedding providers<br/>OpenAI / Gemini]:::external
+
+    og[OmniGraph<br/>kernel]:::omnigraph
+
+    cedar[Cedar policy<br/>engine]:::external
+    s3[Object store<br/>local FS / S3 / RustFS]:::store
+
+    cli --> og
+    http --> og
+    agents --> og
+    og --> embed
+    og --> cedar
+    og --> s3
 ```
-┌──────────────────────────────────────────────────────────────────┐
-│  CLI (omnigraph)        HTTP Server (omnigraph-server, Axum)     │
-│  - 13 cmd families      - REST + OpenAPI                         │
-│  - Aliases, configs     - Bearer auth + Cedar policy             │
-└──────────────────────────────┬───────────────────────────────────┘
-                               │
-┌──────────────────────────────▼───────────────────────────────────┐
-│  omnigraph-compiler                                              │
-│  - Pest grammars: schema.pest, query.pest                        │
-│  - Catalog (Node/Edge/Interface types)                           │
-│  - IR + lowering (NodeScan / Expand / Filter / AntiJoin)         │
-│  - Schema migration planner                                      │
-│  - Embedding client (OpenAI-style for query-time normalization)  │
-└──────────────────────────────┬───────────────────────────────────┘
-                               │
-┌──────────────────────────────▼───────────────────────────────────┐
-│  omnigraph (engine)                                              │
-│  - GraphCoordinator + ManifestRepo (__manifest)                  │
-│  - CommitGraph (_graph_commits.lance)                            │
-│  - RunRegistry  (_graph_runs.lance, __run__ branches)            │
-│  - GraphIndex (CSR/CSC) + RuntimeCache (LRU 8)                   │
-│  - exec::query / mutation / merge                                │
-│  - Embedding client (Gemini for runtime ingest)                  │
-└──────────────────────────────┬───────────────────────────────────┘
-                               │
-┌──────────────────────────────▼───────────────────────────────────┐
-│  Lance 4.x  (per-table dataset)                                  │
-│  - Columnar (Arrow) storage, fragments                           │
-│  - Manifest versions per dataset                                 │
-│  - Per-dataset branches (copy-on-write)                          │
-│  - Indexes: BTREE, Inverted (FTS/BM25), IVF/HNSW vector          │
-│  - merge_insert (upsert), append, delete                         │
-│  - compact_files, cleanup_old_versions                           │
-└──────────────────────────────┬───────────────────────────────────┘
-                               │
-┌──────────────────────────────▼───────────────────────────────────┐
-│  Object store: local FS, S3, RustFS, MinIO, S3-compatible        │
-└──────────────────────────────────────────────────────────────────┘
+
+OmniGraph runs as a single process (one binary, multiple crates). External dependencies are the embedding APIs (called during ingest and at query-time normalization), Cedar (called for every privileged action), and an object store (everything OmniGraph persists lands here).
+
+## Layer view
+
+Inside the OmniGraph process, work flows through these layers:
+
+```mermaid
+flowchart TB
+    classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
+    classDef l1 fill:#fef3e8,stroke:#c46900,color:#000
+
+    subgraph CLIs[CLI and HTTP server]
+        cli[omnigraph CLI]:::l2
+        srv[omnigraph-server<br/>Axum + Cedar]:::l2
+    end
+
+    subgraph compiler[omnigraph-compiler]
+        front[parse → AST → typecheck → catalog → IR]:::l2
+    end
+
+    subgraph engine[omnigraph engine]
+        plan[exec query and mutation]:::l2
+        gi[graph index CSR/CSC<br/>RuntimeCache LRU 8]:::l2
+        coord[coordinator<br/>ManifestRepo · CommitGraph · RunRegistry]:::l2
+    end
+
+    subgraph storage[storage trait — wraps Lance]
+        ts[table_store · storage.rs<br/>direct lance::Dataset today]:::l2
+    end
+
+    subgraph lance_layer[Lance 4.x — substrate]
+        lance[per-dataset versions, fragments<br/>BTREE · Inverted FTS · IVF/HNSW vector<br/>merge_insert · compact_files · cleanup_old_versions]:::l1
+    end
+
+    subgraph object_store[Object store]
+        os[local FS · S3 · RustFS · MinIO]:::l1
+    end
+
+    CLIs -- "string + params" --> compiler
+    compiler -- IROp --> engine
+    engine -- "scan / write request" --> storage
+    storage -- "Stream of RecordBatch" --> engine
+    storage -- "Lance API calls" --> lance_layer
+    lance_layer -- bytes --> object_store
 ```

+The `storage trait` row is partly aspirational. Today the engine calls `lance::Dataset` methods through `table_store`; a capability-bearing `Dataset` trait per [`docs/invariants.md`](invariants.md) §I.4 is on the roadmap (MR-737). The diagram shows the intended seam.
+
+## Component zoom-ins
+
+### Compiler — `omnigraph-compiler`
+
+```mermaid
+flowchart LR
+    classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
+
+    src[".gq source"]:::l2
+    p[parser Pest<br/>query.pest · schema.pest]:::l2
+    ast[AST<br/>QueryDecl · Mutation · Schema]:::l2
+    cat[catalog<br/>NodeType · EdgeType · Interface]:::l2
+    tc[typecheck<br/>typecheck_query]:::l2
+    low[lower<br/>lower_query]:::l2
+    ir[IROp pipeline<br/>NodeScan · Expand · Filter · AntiJoin]:::l2
+
+    src --> p --> ast --> tc
+    cat --> tc
+    tc --> low --> ir
+```
+
+The compiler crate has zero Lance dependency. It owns the schema language, the query language, and the AST → IR lowering.
+
+Code paths:
+
+- Parser: `crates/omnigraph-compiler/src/query/parser.rs`, `crates/omnigraph-compiler/src/query/query.pest`
+- Typecheck: `crates/omnigraph-compiler/src/query/typecheck.rs:83` (`typecheck_query`)
+- Lower: `crates/omnigraph-compiler/src/ir/lower.rs:11` (`lower_query`)
+- Catalog: `crates/omnigraph-compiler/src/catalog/`
+
+### Engine — `omnigraph` crate
+
+```mermaid
+flowchart TB
+    classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
+
+    subgraph exec[exec module]
+        eq[query · execute_query<br/>query.rs:347]:::l2
+        em[mutation · mutate<br/>mutation.rs:511]:::l2
+        ld[loader · ingest<br/>loader/mod.rs:74]:::l2
+    end
+
+    subgraph state[graph state]
+        coord[GraphCoordinator]:::l2
+        mr[ManifestRepo<br/>db/manifest.rs]:::l2
+        cg[CommitGraph<br/>_graph_commits.lance]:::l2
+        rr[RunRegistry<br/>_graph_runs.lance]:::l2
+    end
+
+    subgraph idx[graph index]
+        gi[GraphIndex<br/>CSR/CSC built per query]:::l2
+        rc[RuntimeCache LRU=8]:::l2
+    end
+
+    subgraph io[Lance I/O]
+        ts[table_store]:::l2
+        st[storage adapter<br/>storage.rs]:::l2
+    end
+
+    eq --> gi
+    eq --> ts
+    em --> ts
+    ld --> ts
+    eq --> mr
+    em --> mr
+    coord --> mr
+    coord --> cg
+    coord --> rr
+    ts --> st
+```
+
+The engine binds the compiler IR to Lance. It owns multi-dataset coordination, the graph topology index, the run registry, and the snapshot/manifest read path.
+
+Code paths:
+
+- Read entry: `Omnigraph::query` at `crates/omnigraph/src/exec/query.rs:7`
+- Mutation entry: `Omnigraph::mutate` at `crates/omnigraph/src/exec/mutation.rs:511`
+- Manifest commit: `ManifestRepo::commit` at `crates/omnigraph/src/db/manifest.rs:280`
+- Graph index: `crates/omnigraph/src/graph_index/`
+- Loader: `Omnigraph::ingest` at `crates/omnigraph/src/loader/mod.rs:74`
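
To make the `GraphIndex` box concrete, here is a minimal sketch of a compressed sparse row (CSR) adjacency structure, with CSC obtained by building the same structure over reversed edges. It is written in Python for brevity and is not the crate's Rust code: `build_csr` and `neighbors` are hypothetical names, and the sketch assumes node ids are already dense integers in `0..num_nodes`.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays (offsets, targets) from a list of (src, dst) pairs."""
    # Count the out-degree of every source node.
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix-sum so that offsets[i]..offsets[i + 1] bounds node i's neighbors.
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]
    # Fill the target array, tracking the next free slot per source node.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """Return the neighbor slice for one node in O(degree)."""
    return targets[offsets[node]:offsets[node + 1]]


edges = [(0, 1), (0, 2), (2, 1)]
out_offsets, out_targets = build_csr(3, edges)                      # CSR: outgoing
in_offsets, in_targets = build_csr(3, [(d, s) for s, d in edges])   # CSC: incoming

out_of_0 = neighbors(out_offsets, out_targets, 0)  # [1, 2]
into_1 = neighbors(in_offsets, in_targets, 1)      # [0, 2]
```

An `Expand` step over such a structure only needs the two array lookups in `neighbors`, which is presumably why the index is built (or fetched from the cache) before traversal operators run.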
+
+### Storage trait — today vs. roadmap
+
+```mermaid
+flowchart LR
+    classDef now fill:#e8f4fd,stroke:#1e6aa8,color:#000
+    classDef future fill:#fff,stroke:#888,stroke-dasharray:5 5,color:#444
+
+    subgraph today[Today]
+        d1[table_store<br/>opens lance::Dataset directly]:::now
+        d2[storage.rs<br/>S3 / file URI plumbing]:::now
+    end
+
+    subgraph roadmap[Roadmap — invariants §I.4]
+        t[trait Dataset<br/>schema · stats · placement<br/>capabilities · scan · write]:::future
+        impl1[LanceStorage]:::future
+        impl2[MemStorage for tests]:::future
+    end
+
+    today -.-> roadmap
+    t --> impl1
+    t --> impl2
+```
+
+The storage layer's trait surface is aspirational. Today the engine calls `lance::Dataset` methods directly. The roadmap (per [`docs/invariants.md`](invariants.md) §I.4 and MR-737) is a `Dataset` trait that surfaces capabilities and statistics so the planner can reason about pushdown opportunities.
+
+### Index lifecycle — today vs. roadmap
+
+```mermaid
+flowchart LR
+    classDef now fill:#e8f4fd,stroke:#1e6aa8,color:#000
+    classDef future fill:#fff,stroke:#888,stroke-dasharray:5 5,color:#444
+
+    subgraph today[Today]
+        ei[ensure_indices<br/>omnigraph.rs:445]:::now
+        manual[called manually<br/>or from optimize]:::now
+    end
+
+    subgraph roadmap[Roadmap — invariants §VII.35]
+        rec[Reconciler<br/>observes manifest]:::future
+        diff[coverage diff<br/>fragments − fragment_bitmap]:::future
+        wp[worker pool<br/>builds index segments]:::future
+    end
+
+    manual --> ei
+    today -.-> roadmap
+    rec --> diff --> wp
+```
+
+Today, indexes are built explicitly via `ensure_indices`. Reads degrade gracefully when index coverage is partial — Lance's scanner unions indexed and scan paths automatically. The roadmap reconciler (per [`docs/invariants.md`](invariants.md) §VII.35) observes manifest state and converges coverage in the background.
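
The "coverage diff" in the roadmap diagram is a set difference. The following Python sketch models only that bookkeeping; it is not Lance's or OmniGraph's API. `coverage_diff` and `split_read` are hypothetical names, and fragment ids are assumed to be plain integers.

```python
def coverage_diff(fragment_ids, fragment_bitmap):
    """Fragment ids present in the dataset but not yet covered by the index."""
    return sorted(set(fragment_ids) - set(fragment_bitmap))


def split_read(fragment_ids, fragment_bitmap):
    """Partition fragments into index-answerable ones and ones that need a scan."""
    covered = sorted(set(fragment_ids) & set(fragment_bitmap))
    return {"indexed": covered, "scan": coverage_diff(fragment_ids, fragment_bitmap)}


# Four fragments exist; the index was built when only fragments 0 and 1 existed.
plan = split_read([0, 1, 2, 3], {0, 1})
# plan == {"indexed": [0, 1], "scan": [2, 3]}
```

A reconciler in this model would feed the `scan` list to its worker pool until `coverage_diff` returns an empty list.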
+
+### Server / CLI
+
+```mermaid
+flowchart LR
+    classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
+
+    cli[omnigraph CLI<br/>13 cmd families]:::l2
+    srv_in[Axum HTTP<br/>REST + OpenAPI]:::l2
+    auth[Bearer auth<br/>SHA-256 hashed tokens]:::l2
+    pol[Cedar policy gate<br/>per request]:::l2
+    eng[engine API]:::l2
+
+    cli -.-> eng
+    srv_in --> auth --> pol --> eng
+```
+
+The server applies Cedar policy at the HTTP boundary today; per [`docs/invariants.md`](invariants.md) §VII.45, the roadmap is to push policy into the planner as predicates. The CLI bypasses the HTTP layer and calls the engine API directly.
+
+Code paths:
+
+- Server entry: `crates/omnigraph-server/src/lib.rs`
+- Auth: `crates/omnigraph-server/src/auth.rs`
+- Policy: `crates/omnigraph-server/src/policy.rs`
+- CLI: `crates/omnigraph-cli/src/main.rs`
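
As an illustration of the "Bearer auth, SHA-256 hashed tokens" box, here is a minimal Python sketch of checking a bearer token against stored hashes. It is an assumption-laden model, not the logic in `auth.rs`: the function names, the hex-digest storage format, and the header parsing are all hypothetical.

```python
import hashlib
import hmac


def hash_token(token):
    """Hex-encoded SHA-256 of the token (assumed storage format)."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def is_authorized(authorization_header, stored_hashes):
    """Accept only 'Bearer <token>' headers whose token hash is known."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False
    presented = hash_token(token.strip())
    # Constant-time comparison avoids leaking how many characters matched.
    return any(hmac.compare_digest(presented, h) for h in stored_hashes)


stored = {hash_token("example-token")}
ok = is_authorized("Bearer example-token", stored)   # True
denied = is_authorized("Bearer wrong-token", stored)  # False
```

Storing only hashes means a leaked token table does not directly reveal usable tokens; the policy gate then runs after this check succeeds.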
+
 ## L1 / L2 framing

 Throughout the docs, capabilities are split into:
--- a/docs/execution.md
+++ b/docs/execution.md
@@ -9,6 +9,49 @@ Pipeline:
 3. If `Expand` or `AntiJoin` is present, build (or fetch from `RuntimeCache`) a `GraphIndex`.
 4. Run `execute_query` against the snapshot.

+### Read flow — sequence
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant client as Client
+    participant og as Omnigraph::query<br/>(query.rs:7)
+    participant cmp as omnigraph-compiler
+    participant exec as execute_query<br/>(query.rs:347)
+    participant gi as GraphIndex<br/>(RuntimeCache)
+    participant ts as table_store
+    participant lance as Lance scanner
+
+    client->>og: query(target, source, name, params)
+    og->>og: ensure_schema_state_valid()<br/>resolve target → snapshot
+    og->>cmp: parse + typecheck_query (typecheck.rs:83)
+    cmp-->>og: CheckedQuery
+    og->>cmp: lower_query (lower.rs:11)
+    cmp-->>og: QueryIR (pipeline of IROp)
+    og->>exec: extract_search_mode + dispatch (query.rs:110)
+    exec->>gi: build / fetch GraphIndex<br/>(if Expand or AntiJoin)
+    gi-->>exec: CSR / CSC topology
+    loop for each IROp in pipeline
+        exec->>ts: scan with predicate / SIP
+        ts->>lance: filter · nearest · full_text_search
+        lance-->>ts: Stream of RecordBatch
+        ts-->>exec: RecordBatch stream
+        exec->>exec: factorize · expand · fuse · project
+    end
+    exec-->>og: QueryResult (RecordBatches)
+    og-->>client: serialized result
+```
+
+**Code paths:**
+
+- Entry: `Omnigraph::query` at `crates/omnigraph/src/exec/query.rs:7`
+- Search-mode extraction: `extract_search_mode` at `crates/omnigraph/src/exec/query.rs:110`
+- Pipeline runner: `execute_query` at `crates/omnigraph/src/exec/query.rs:347`
+- RRF fan-out: `execute_rrf_query` at `crates/omnigraph/src/exec/query.rs:393`
+- Per-source-row BFS: `execute_expand` at `crates/omnigraph/src/exec/query.rs:675`
+- Lance scan + pushdown: `execute_node_scan` at `crates/omnigraph/src/exec/query.rs:1027`
+- Filter → SQL pushdown: `build_lance_filter` at `crates/omnigraph/src/exec/query.rs:1158`
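
For readers unfamiliar with the RRF fan-out cited above, here is a minimal Python sketch of reciprocal rank fusion over ranked id lists. It shows the general technique only: `rrf_fuse` is a hypothetical name, and `k=60` is the conventional constant from the RRF literature, not a value confirmed for `execute_rrf_query`.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of ids; each list contributes 1 / (k + rank) per id."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]  # e.g. nearest-neighbor order
text_hits = ["b", "c", "d"]    # e.g. full-text order
fused = rrf_fuse([vector_hits, text_hits])
# fused == ["b", "c", "a", "d"]
```

Ids that appear in both lists (`b`, `c`) outrank ids that appear in only one, even when the single-list hit was ranked first, which is the property that makes RRF useful for combining vector and text results without score normalization.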
+
 ### Multi-modal search modes (`SearchMode`)

 The executor recognizes three modes that may be combined in a single query:
@@ -44,6 +87,52 @@ Resolves expression values to literals, converts to typed Arrow arrays (`literal

 Multi-statement mutations are atomic at the manifest commit boundary.

+### Mutation flow — sequence
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant client as Client
+    participant og as Omnigraph::mutate<br/>(mutation.rs:511)
+    participant cmp as omnigraph-compiler
+    participant ts as table_store
+    participant lance as Lance dataset
+    participant mr as ManifestRepo<br/>(manifest.rs:280)
+    participant manifest as __manifest/
+
+    client->>og: mutate(target, source, name, params)
+    og->>cmp: parse + typecheck_query
+    cmp-->>og: CheckedQuery (Mutation IR)
+    og->>og: resolve expression literals<br/>literal_to_typed_array(lit, type, n)
+    loop for each mutation statement
+        alt insert
+            og->>ts: append RecordBatches
+            ts->>lance: WriteMode::Append → new fragment(s)
+        else update
+            og->>ts: merge_insert keyed by id
+            ts->>lance: merge_insert(WhenMatched::Update)
+        else delete
+            og->>ts: merge_insert with delete predicate
+            ts->>lance: merge_insert(WhenMatched::Delete)
+        end
+        lance-->>ts: new dataset version
+        ts-->>og: SubTableUpdate (key, version, row_count)
+    end
+    og->>mr: commit(updates)
+    mr->>manifest: append rows<br/>(table_version per sub-table)
+    manifest-->>mr: new graph-manifest version
+    mr-->>og: graph version
+    og-->>client: MutationResult
+```
+
+**Code paths:**
+
+- Entry: `Omnigraph::mutate` at `crates/omnigraph/src/exec/mutation.rs:511`
+- Actor-attributed variant: `Omnigraph::mutate_as` at `crates/omnigraph/src/exec/mutation.rs:522`
+- Manifest commit: `ManifestRepo::commit` at `crates/omnigraph/src/db/manifest.rs:280`
+
+The whole mutation — every statement, every affected sub-table — publishes through one call to `ManifestRepo::commit`. That single append to `__manifest` is what gives multi-statement mutations their atomicity guarantee (per [`docs/invariants.md`](invariants.md) §VI.26).
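
The atomicity argument above can be modeled in a few lines. This Python sketch is a toy, in-memory stand-in for the manifest, assuming an append-only log of `(graph_version, table_key, table_version)` rows; `ToyManifest` and the table keys are hypothetical, and it ignores concurrency and CAS entirely.

```python
class ToyManifest:
    """Append-only log of (graph_version, table_key, table_version) rows."""

    def __init__(self):
        self.rows = []
        self.graph_version = 0

    def commit(self, updates):
        """Publish every sub-table update under one new graph version."""
        new_version = self.graph_version + 1
        self.rows.extend((new_version, key, ver) for key, ver in updates)
        self.graph_version = new_version
        return new_version

    def snapshot(self, graph_version=None):
        """Resolve table versions visible at a graph version (default: latest)."""
        limit = self.graph_version if graph_version is None else graph_version
        view = {}
        for row_version, key, ver in self.rows:
            if row_version <= limit:
                view[key] = ver
        return view


manifest = ToyManifest()
manifest.commit([("node:Person", 1), ("edge:Knows", 1)])

# Two statements each wrote a new dataset version, but nothing is published yet.
staged = [("node:Person", 2), ("edge:Knows", 2)]
before = manifest.snapshot()   # still sees version 1 of both tables
manifest.commit(staged)        # one commit publishes both updates
after = manifest.snapshot()    # sees version 2 of both tables
```

Readers that resolve table versions through the manifest never observe a state where only one of the two tables has advanced, because both rows become visible with the same graph version.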
+
 ## Bulk loader (`loader/mod.rs`)

 - **JSONL only** in v1, with two record shapes:
--- a/docs/storage.md
+++ b/docs/storage.md
@@ -48,6 +48,55 @@ Adding a new on-disk shape change is one constant bump (`INTERNAL_MANIFEST_SCHEM
 | v1 (implicit, pre-stamp) | `__manifest.object_id` had no PK annotation; publisher had no row-level CAS protection. |
 | v2 | `__manifest.object_id` carries `lance-schema:unenforced-primary-key=true`; row-level CAS engaged. Stamped as `omnigraph:internal_schema_version=2`. |

+## On-disk layout
+
+A repo on disk is a directory tree of Lance datasets. Each dataset follows the standard Lance layout (`_versions/`, `data/`, `_indices/`, `_refs/`, `_transactions/`); OmniGraph adds multi-dataset coordination by keeping `__manifest/` alongside the per-type datasets.
+
+```mermaid
+flowchart TB
+    classDef l1 fill:#fef3e8,stroke:#c46900,color:#000
+    classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
+
+    repo["repo URI<br/>file:// or s3://bucket/prefix"]:::l2
+
+    manifest["__manifest/<br/>L2 catalog of sub-tables"]:::l2
+    nodes["nodes/{fnv1a64-hex}/<br/>one dataset per node type"]:::l2
+    edges["edges/{fnv1a64-hex}/<br/>one dataset per edge type"]:::l2
+    cgraph["_graph_commits.lance/<br/>_graph_commit_actors.lance/"]:::l2
+    runs["_graph_runs.lance/<br/>_graph_run_actors.lance/"]:::l2
+    refs["_refs/branches/{name}.json<br/>graph-level branches"]:::l2
+
+    repo --> manifest
+    repo --> nodes
+    repo --> edges
+    repo --> cgraph
+    repo --> runs
+    repo --> refs
+
+    subgraph dataset[Inside each Lance dataset — L1]
+        ds_v["_versions/{n}.manifest<br/>per-dataset versions"]:::l1
+        ds_data["data/<br/>fragment data files (Lance format)"]:::l1
+        ds_idx["_indices/{uuid}/<br/>BTREE · Inverted FTS · IVF/HNSW"]:::l1
+        ds_refs["_refs/<br/>per-dataset Lance branches/tags"]:::l1
+        ds_tx["_transactions/<br/>commit transaction logs"]:::l1
+    end
+
+    nodes -.-> dataset
+    edges -.-> dataset
+    manifest -.-> dataset
+```
+
+**What's where:**
+
+- **Repo root** is one directory (or S3 prefix). Everything below is part of one OmniGraph repo.
+- **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
+- **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
+- **`_graph_commits.lance` / `_graph_runs.lance`** are L2 datasets that record the graph-level commit DAG and run registry respectively (each has a paired `*_actors.lance` for the actor map).
+- **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.
+- **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the fragment data files; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags; `_transactions/` holds commit transaction logs.
+
+The split — L2 owns the cross-dataset catalog; L1 owns the per-dataset internals — means that schema work (which adds or removes datasets) updates `__manifest`, while data work (which adds fragments) updates `_versions/` inside the affected dataset and then bumps `__manifest`.
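
The `fnv1a64-hex` naming in the list above can be sketched with the standard 64-bit FNV-1a hash. This Python version is illustrative only: the function name is hypothetical, and hashing the UTF-8 bytes of the type name and rendering 16 lowercase hex digits are assumptions about details the docs do not spell out.

```python
FNV1A64_OFFSET_BASIS = 0xCBF29CE484222325
FNV1A64_PRIME = 0x100000001B3


def fnv1a64_hex(type_name):
    """64-bit FNV-1a over the UTF-8 bytes, as 16 lowercase hex digits."""
    h = FNV1A64_OFFSET_BASIS
    for byte in type_name.encode("utf-8"):
        h ^= byte                                    # XOR first (the "1a" variant)
        h = (h * FNV1A64_PRIME) & 0xFFFFFFFFFFFFFFFF  # then multiply, wrap to 64 bits
    return f"{h:016x}"


# Hypothetical node type; the resulting path has the same length for any name.
person_path = f"nodes/{fnv1a64_hex('Person')}/"
```

Because the digest is always 16 hex characters, type names of any length or casing map to fixed-length, case-safe directory names, which matches the stated motivation.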
+
 ## URI scheme support (`storage.rs`)

 | Scheme | Backend | Notes |