docs: add Mermaid architecture diagrams across architecture / storage / execution

Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.

Static structure (docs/architecture.md):

- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
  via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
  server/CLI. Each zoom-in cites file:line entry points.

Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.

On-disk layout (docs/storage.md):

- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
  _graph_runs.lance, _refs/branches/ down into Lance's per-dataset
  internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
  not move or rewrite that content.

Runtime flows (docs/execution.md):

- Read flow sequence: client → Omnigraph::query → typecheck → lower →
  execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
  Lance write op (Append / merge_insert) → ManifestRepo::commit →
  __manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
  file:line citations so readers can navigate from diagram element to
  source in one step.

Conventions established (this is the first Mermaid in the repo):

- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
  storage trait / Lance / object store. No accidental synonyms.

Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
This commit is contained in:
Ragnor Comerford 2026-04-29 16:58:56 +02:00
parent 4e5374a85e
commit 64b9d56476
No known key found for this signature in database
3 changed files with 376 additions and 39 deletions

View file

@ -2,49 +2,248 @@
OmniGraph is a typed property-graph engine built as a coordination layer over many Lance datasets, with Git-style branches and commits across the whole graph, multi-modal querying (vector + FTS + BM25 + RRF + graph traversal) in one runtime, an HTTP server with Cedar policy, and a CLI driven by a single `omnigraph.yaml`.
## Stack
## Reading guide
Three views, increasing zoom:
1. **System context** — what OmniGraph is and what it touches.
2. **Layer view** — the eight-layer stack inside one OmniGraph process.
3. **Component zoom-ins** — what's inside each layer.
For runtime flows (read query, mutation), see [`docs/execution.md`](execution.md). For the on-disk layout of a repo, see [`docs/storage.md`](storage.md).
L1 (orange in the diagrams) is what we inherit from Lance; L2 (blue) is what OmniGraph adds. The L1/L2 framing is also called out in prose at the bottom of this doc.
## System context
```mermaid
flowchart LR
classDef external fill:#fef3e8,stroke:#c46900,color:#000
classDef omnigraph fill:#e8f4fd,stroke:#1e6aa8,color:#000
classDef store fill:#f0f0f0,stroke:#555,color:#000
cli[CLI users]:::external
http[HTTP clients<br/>and SDKs]:::external
agents[Agents]:::external
embed[Embedding providers<br/>OpenAI / Gemini]:::external
og[OmniGraph<br/>kernel]:::omnigraph
cedar[Cedar policy<br/>engine]:::external
s3[Object store<br/>local FS / S3 / RustFS]:::store
cli --> og
http --> og
agents --> og
og --> embed
og --> cedar
og --> s3
```
┌──────────────────────────────────────────────────────────────────┐
│ CLI (omnigraph) HTTP Server (omnigraph-server, Axum) │
│ - 13 cmd families - REST + OpenAPI │
│ - Aliases, configs - Bearer auth + Cedar policy │
└──────────────────────────────┬───────────────────────────────────┘
┌──────────────────────────────▼───────────────────────────────────┐
│ omnigraph-compiler │
│ - Pest grammars: schema.pest, query.pest │
│ - Catalog (Node/Edge/Interface types) │
│ - IR + lowering (NodeScan / Expand / Filter / AntiJoin) │
│ - Schema migration planner │
│ - Embedding client (OpenAI-style for query-time normalization) │
└──────────────────────────────┬───────────────────────────────────┘
┌──────────────────────────────▼───────────────────────────────────┐
│ omnigraph (engine) │
│ - GraphCoordinator + ManifestRepo (__manifest) │
│ - CommitGraph (_graph_commits.lance) │
│ - RunRegistry (_graph_runs.lance, __run__ branches) │
│ - GraphIndex (CSR/CSC) + RuntimeCache (LRU 8) │
│ - exec::query / mutation / merge │
│ - Embedding client (Gemini for runtime ingest) │
└──────────────────────────────┬───────────────────────────────────┘
┌──────────────────────────────▼───────────────────────────────────┐
│ Lance 4.x (per-table dataset) │
│ - Columnar (Arrow) storage, fragments │
│ - Manifest versions per dataset │
│ - Per-dataset branches (copy-on-write) │
│ - Indexes: BTREE, Inverted (FTS/BM25), IVF/HNSW vector │
│ - merge_insert (upsert), append, delete │
│ - compact_files, cleanup_old_versions │
└──────────────────────────────┬───────────────────────────────────┘
┌──────────────────────────────▼───────────────────────────────────┐
│ Object store: local FS, S3, RustFS, MinIO, S3-compatible │
└──────────────────────────────────────────────────────────────────┘
OmniGraph runs as a single process (one binary, multiple crates). External dependencies are the embedding APIs (called during ingest and at query-time normalization), Cedar (called for every privileged action), and an object store (everything OmniGraph persists lands here).
## Layer view
Inside the OmniGraph process, work flows through these layers:
```mermaid
flowchart TB
classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
classDef l1 fill:#fef3e8,stroke:#c46900,color:#000
subgraph CLIs[CLI and HTTP server]
cli[omnigraph CLI]:::l2
srv[omnigraph-server<br/>Axum + Cedar]:::l2
end
subgraph compiler[omnigraph-compiler]
front[parse → AST → typecheck → catalog → IR]:::l2
end
subgraph engine[omnigraph engine]
plan[exec query and mutation]:::l2
gi[graph index CSR/CSC<br/>RuntimeCache LRU 8]:::l2
coord[coordinator<br/>ManifestRepo · CommitGraph · RunRegistry]:::l2
end
subgraph storage[storage trait — wraps Lance]
ts[table_store · storage.rs<br/>direct lance::Dataset today]:::l2
end
subgraph lance_layer[Lance 4.x — substrate]
lance[per-dataset versions, fragments<br/>BTREE · Inverted FTS · IVF/HNSW vector<br/>merge_insert · compact_files · cleanup_old_versions]:::l1
end
subgraph object_store[Object store]
os[local FS · S3 · RustFS · MinIO]:::l1
end
CLIs -- "string + params" --> compiler
compiler -- IROp --> engine
engine -- "scan / write request" --> storage
storage -- "Stream of RecordBatch" --> engine
storage -- "Lance API calls" --> lance_layer
lance_layer -- bytes --> object_store
```
The `storage trait` row is partly aspirational. Today the engine calls `lance::Dataset` methods through `table_store`; a capability-bearing `Dataset` trait per [`docs/invariants.md`](invariants.md) §I.4 is on the roadmap (MR-737). The diagram shows the intended seam.
## Component zoom-ins
### Compiler — `omnigraph-compiler`
```mermaid
flowchart LR
classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
src[".gq source"]:::l2
p[parser Pest<br/>query.pest · schema.pest]:::l2
ast[AST<br/>QueryDecl · Mutation · Schema]:::l2
cat[catalog<br/>NodeType · EdgeType · Interface]:::l2
tc[typecheck<br/>typecheck_query]:::l2
low[lower<br/>lower_query]:::l2
ir[IROp pipeline<br/>NodeScan · Expand · Filter · AntiJoin]:::l2
src --> p --> ast --> tc
cat --> tc
tc --> low --> ir
```
The compiler crate has zero Lance dependency. It owns the schema language, the query language, and the AST → IR lowering.
Code paths:
- Parser: `crates/omnigraph-compiler/src/query/parser.rs`, `crates/omnigraph-compiler/src/query/query.pest`
- Typecheck: `crates/omnigraph-compiler/src/query/typecheck.rs:83` (`typecheck_query`)
- Lower: `crates/omnigraph-compiler/src/ir/lower.rs:11` (`lower_query`)
- Catalog: `crates/omnigraph-compiler/src/catalog/`
### Engine — `omnigraph` crate
```mermaid
flowchart TB
classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
subgraph exec[exec module]
eq[query · execute_query<br/>query.rs:347]:::l2
em[mutation · mutate<br/>mutation.rs:511]:::l2
ld[loader · ingest<br/>loader/mod.rs:74]:::l2
end
subgraph state[graph state]
coord[GraphCoordinator]:::l2
mr[ManifestRepo<br/>db/manifest.rs]:::l2
cg[CommitGraph<br/>_graph_commits.lance]:::l2
rr[RunRegistry<br/>_graph_runs.lance]:::l2
end
subgraph idx[graph index]
gi[GraphIndex<br/>CSR/CSC built per query]:::l2
rc[RuntimeCache LRU=8]:::l2
end
subgraph io[Lance I/O]
ts[table_store]:::l2
st[storage adapter<br/>storage.rs]:::l2
end
eq --> gi
eq --> ts
em --> ts
ld --> ts
eq --> mr
em --> mr
coord --> mr
coord --> cg
coord --> rr
ts --> st
```
The engine binds the compiler IR to Lance. It owns multi-dataset coordination, the graph topology index, the run registry, and the snapshot/manifest read path.
Code paths:
- Read entry: `Omnigraph::query` at `crates/omnigraph/src/exec/query.rs:7`
- Mutation entry: `Omnigraph::mutate` at `crates/omnigraph/src/exec/mutation.rs:511`
- Manifest commit: `ManifestRepo::commit` at `crates/omnigraph/src/db/manifest.rs:280`
- Graph index: `crates/omnigraph/src/graph_index/`
- Loader: `Omnigraph::ingest` at `crates/omnigraph/src/loader/mod.rs:74`
### Storage trait — today vs. roadmap
```mermaid
flowchart LR
classDef now fill:#e8f4fd,stroke:#1e6aa8,color:#000
classDef future fill:#fff,stroke:#888,stroke-dasharray:5 5,color:#444
subgraph today[Today]
d1[table_store<br/>opens lance::Dataset directly]:::now
d2[storage.rs<br/>S3 / file URI plumbing]:::now
end
subgraph roadmap[Roadmap — invariants §I.4]
t[trait Dataset<br/>schema · stats · placement<br/>capabilities · scan · write]:::future
impl1[LanceStorage]:::future
impl2[MemStorage for tests]:::future
end
today -.-> roadmap
t --> impl1
t --> impl2
```
The storage layer's trait surface is aspirational. Today the engine calls `lance::Dataset` methods directly. The roadmap (per [`docs/invariants.md`](invariants.md) §I.4 and MR-737) is a `Dataset` trait that surfaces capabilities and statistics so the planner can reason about pushdown opportunities.
### Index lifecycle — today vs. roadmap
```mermaid
flowchart LR
classDef now fill:#e8f4fd,stroke:#1e6aa8,color:#000
classDef future fill:#fff,stroke:#888,stroke-dasharray:5 5,color:#444
subgraph today[Today]
ei[ensure_indices<br/>omnigraph.rs:445]:::now
manual[called manually<br/>or from optimize]:::now
end
subgraph roadmap[Roadmap — invariants §VII.35]
rec[Reconciler<br/>observes manifest]:::future
diff[coverage diff<br/>fragments fragment_bitmap]:::future
wp[worker pool<br/>builds index segments]:::future
end
manual --> ei
today -.-> roadmap
rec --> diff --> wp
```
Today, indexes are built explicitly via `ensure_indices`. Reads degrade gracefully when index coverage is partial — Lance's scanner unions indexed and scan paths automatically. The roadmap reconciler (per [`docs/invariants.md`](invariants.md) §VII.35) observes manifest state and converges coverage in the background.
### Server / CLI
```mermaid
flowchart LR
classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
cli[omnigraph CLI<br/>13 cmd families]:::l2
srv_in[Axum HTTP<br/>REST + OpenAPI]:::l2
auth[Bearer auth<br/>SHA-256 hashed tokens]:::l2
pol[Cedar policy gate<br/>per request]:::l2
eng[engine API]:::l2
cli -.-> eng
srv_in --> auth --> pol --> eng
```
The server applies Cedar policy at the HTTP boundary today (per [`docs/invariants.md`](invariants.md) §VII.45, the roadmap is to push policy into the planner as predicates). The CLI bypasses the HTTP layer and calls the engine API directly.
Code paths:
- Server entry: `crates/omnigraph-server/src/lib.rs`
- Auth: `crates/omnigraph-server/src/auth.rs`
- Policy: `crates/omnigraph-server/src/policy.rs`
- CLI: `crates/omnigraph-cli/src/main.rs`
## L1 / L2 framing
Throughout the docs, capabilities are split into: