omnigraph/docs/architecture.md
Ragnor Comerford 64b9d56476
docs: add Mermaid architecture diagrams across architecture / storage / execution
Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.

Static structure (docs/architecture.md):

- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
  via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
  server/CLI. Each zoom-in cites file:line entry points.

Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.

On-disk layout (docs/storage.md):

- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
  _graph_runs.lance, _refs/branches/ down into Lance's per-dataset
  internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
  not move or rewrite that content.

Runtime flows (docs/execution.md):

- Read flow sequence: client → Omnigraph::query → typecheck → lower →
  execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
  Lance write op (Append / merge_insert) → ManifestRepo::commit →
  __manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
  file:line citations so readers can navigate from diagram element to
  source in one step.

Conventions established (this is the first Mermaid in the repo):

- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
  storage trait / Lance / object store. No accidental synonyms.

Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
2026-04-29 16:58:56 +02:00

10 KiB
Raw Blame History

Architecture

OmniGraph is a typed property-graph engine built as a coordination layer over many Lance datasets, with Git-style branches and commits across the whole graph, multi-modal querying (vector + FTS + BM25 + RRF + graph traversal) in one runtime, an HTTP server with Cedar policy, and a CLI driven by a single omnigraph.yaml.

Reading guide

Three views, increasing zoom:

  1. System context — what OmniGraph is and what it touches.
  2. Layer view — the eight-layer stack inside one OmniGraph process.
  3. Component zoom-ins — what's inside each layer.

For runtime flows (read query, mutation), see docs/execution.md. For the on-disk layout of a repo, see docs/storage.md.

L1 (orange in the diagrams) is what we inherit from Lance; L2 (blue) is what OmniGraph adds. The L1/L2 framing is also called out in prose at the bottom of this doc.

System context

flowchart LR
    classDef external fill:#fef3e8,stroke:#c46900,color:#000
    classDef omnigraph fill:#e8f4fd,stroke:#1e6aa8,color:#000
    classDef store fill:#f0f0f0,stroke:#555,color:#000

    cli[CLI users]:::external
    http[HTTP clients<br/>and SDKs]:::external
    agents[Agents]:::external
    embed[Embedding providers<br/>OpenAI / Gemini]:::external

    og[OmniGraph<br/>kernel]:::omnigraph

    cedar[Cedar policy<br/>engine]:::external
    s3[Object store<br/>local FS / S3 / RustFS]:::store

    cli --> og
    http --> og
    agents --> og
    og --> embed
    og --> cedar
    og --> s3

OmniGraph runs as a single process (one binary, multiple crates). External dependencies are the embedding APIs (called during ingest and at query-time normalization), Cedar (called for every privileged action), and an object store (everything OmniGraph persists lands here).

Layer view

Inside the OmniGraph process, work flows through these layers:

flowchart TB
    classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
    classDef l1 fill:#fef3e8,stroke:#c46900,color:#000

    subgraph CLIs[CLI and HTTP server]
        cli[omnigraph CLI]:::l2
        srv[omnigraph-server<br/>Axum + Cedar]:::l2
    end

    subgraph compiler[omnigraph-compiler]
        front[parse → AST → typecheck → catalog → IR]:::l2
    end

    subgraph engine[omnigraph engine]
        plan[exec query and mutation]:::l2
        gi[graph index CSR/CSC<br/>RuntimeCache LRU 8]:::l2
        coord[coordinator<br/>ManifestRepo · CommitGraph · RunRegistry]:::l2
    end

    subgraph storage[storage trait — wraps Lance]
        ts[table_store · storage.rs<br/>direct lance::Dataset today]:::l2
    end

    subgraph lance_layer[Lance 4.x — substrate]
        lance[per-dataset versions, fragments<br/>BTREE · Inverted FTS · IVF/HNSW vector<br/>merge_insert · compact_files · cleanup_old_versions]:::l1
    end

    subgraph object_store[Object store]
        os[local FS · S3 · RustFS · MinIO]:::l1
    end

    CLIs -- "string + params" --> compiler
    compiler -- IROp --> engine
    engine -- "scan / write request" --> storage
    storage -- "Stream of RecordBatch" --> engine
    storage -- "Lance API calls" --> lance_layer
    lance_layer -- bytes --> object_store

The storage trait row is partly aspirational. Today the engine calls lance::Dataset methods through table_store; a capability-bearing Dataset trait per docs/invariants.md §I.4 is on the roadmap (MR-737). The diagram shows the intended seam.

Component zoom-ins

Compiler — omnigraph-compiler

flowchart LR
    classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000

    src[".gq source"]:::l2
    p[parser Pest<br/>query.pest · schema.pest]:::l2
    ast[AST<br/>QueryDecl · Mutation · Schema]:::l2
    cat[catalog<br/>NodeType · EdgeType · Interface]:::l2
    tc[typecheck<br/>typecheck_query]:::l2
    low[lower<br/>lower_query]:::l2
    ir[IROp pipeline<br/>NodeScan · Expand · Filter · AntiJoin]:::l2

    src --> p --> ast --> tc
    cat --> tc
    tc --> low --> ir

The compiler crate has zero Lance dependency. It owns the schema language, the query language, and the AST → IR lowering.

Code paths:

  • Parser: crates/omnigraph-compiler/src/query/parser.rs, crates/omnigraph-compiler/src/query/query.pest
  • Typecheck: crates/omnigraph-compiler/src/query/typecheck.rs:83 (typecheck_query)
  • Lower: crates/omnigraph-compiler/src/ir/lower.rs:11 (lower_query)
  • Catalog: crates/omnigraph-compiler/src/catalog/

Engine — omnigraph crate

flowchart TB
    classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000

    subgraph exec[exec module]
        eq[query · execute_query<br/>query.rs:347]:::l2
        em[mutation · mutate<br/>mutation.rs:511]:::l2
        ld[loader · ingest<br/>loader/mod.rs:74]:::l2
    end

    subgraph state[graph state]
        coord[GraphCoordinator]:::l2
        mr[ManifestRepo<br/>db/manifest.rs]:::l2
        cg[CommitGraph<br/>_graph_commits.lance]:::l2
        rr[RunRegistry<br/>_graph_runs.lance]:::l2
    end

    subgraph idx[graph index]
        gi[GraphIndex<br/>CSR/CSC built per query]:::l2
        rc[RuntimeCache LRU=8]:::l2
    end

    subgraph io[Lance I/O]
        ts[table_store]:::l2
        st[storage adapter<br/>storage.rs]:::l2
    end

    eq --> gi
    eq --> ts
    em --> ts
    ld --> ts
    eq --> mr
    em --> mr
    coord --> mr
    coord --> cg
    coord --> rr
    ts --> st

The engine binds the compiler IR to Lance. It owns multi-dataset coordination, the graph topology index, the run registry, and the snapshot/manifest read path.

Code paths:

  • Read entry: Omnigraph::query at crates/omnigraph/src/exec/query.rs:7
  • Mutation entry: Omnigraph::mutate at crates/omnigraph/src/exec/mutation.rs:511
  • Manifest commit: ManifestRepo::commit at crates/omnigraph/src/db/manifest.rs:280
  • Graph index: crates/omnigraph/src/graph_index/
  • Loader: Omnigraph::ingest at crates/omnigraph/src/loader/mod.rs:74

Storage trait — today vs. roadmap

flowchart LR
    classDef now fill:#e8f4fd,stroke:#1e6aa8,color:#000
    classDef future fill:#fff,stroke:#888,stroke-dasharray:5 5,color:#444

    subgraph today[Today]
        d1[table_store<br/>opens lance::Dataset directly]:::now
        d2[storage.rs<br/>S3 / file URI plumbing]:::now
    end

    subgraph roadmap[Roadmap — invariants §I.4]
        t[trait Dataset<br/>schema · stats · placement<br/>capabilities · scan · write]:::future
        impl1[LanceStorage]:::future
        impl2[MemStorage for tests]:::future
    end

    today -.-> roadmap
    t --> impl1
    t --> impl2

The storage layer's trait surface is aspirational. Today the engine calls lance::Dataset methods directly. The roadmap (per docs/invariants.md §I.4 and MR-737) is a Dataset trait that surfaces capabilities and statistics so the planner can reason about pushdown opportunities.

Index lifecycle — today vs. roadmap

flowchart LR
    classDef now fill:#e8f4fd,stroke:#1e6aa8,color:#000
    classDef future fill:#fff,stroke:#888,stroke-dasharray:5 5,color:#444

    subgraph today[Today]
        ei[ensure_indices<br/>omnigraph.rs:445]:::now
        manual[called manually<br/>or from optimize]:::now
    end

    subgraph roadmap[Roadmap — invariants §VII.35]
        rec[Reconciler<br/>observes manifest]:::future
        diff[coverage diff<br/>fragments  fragment_bitmap]:::future
        wp[worker pool<br/>builds index segments]:::future
    end

    manual --> ei
    today -.-> roadmap
    rec --> diff --> wp

Today, indexes are built explicitly via ensure_indices. Reads degrade gracefully when index coverage is partial — Lance's scanner unions indexed and scan paths automatically. The roadmap reconciler (per docs/invariants.md §VII.35) observes manifest state and converges coverage in the background.

Server / CLI

flowchart LR
    classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000

    cli[omnigraph CLI<br/>13 cmd families]:::l2
    srv_in[Axum HTTP<br/>REST + OpenAPI]:::l2
    auth[Bearer auth<br/>SHA-256 hashed tokens]:::l2
    pol[Cedar policy gate<br/>per request]:::l2
    eng[engine API]:::l2

    cli -.-> eng
    srv_in --> auth --> pol --> eng

The server applies Cedar policy at the HTTP boundary today (per docs/invariants.md §VII.45, the roadmap is to push policy into the planner as predicates). The CLI bypasses the HTTP layer and calls the engine API directly.

Code paths:

  • Server entry: crates/omnigraph-server/src/lib.rs
  • Auth: crates/omnigraph-server/src/auth.rs
  • Policy: crates/omnigraph-server/src/policy.rs
  • CLI: crates/omnigraph-cli/src/main.rs

L1 / L2 framing

Throughout the docs, capabilities are split into:

  • L1 — Inherited from Lance: what OmniGraph gets "for free" by sitting on top of the Lance dataset format (columnar Arrow storage, per-dataset versions and branches, index types, merge_insert, compact_files / cleanup_old_versions).
  • L2 — Added by OmniGraph: typing (schema language), graph semantics, multi-dataset coordination via __manifest, graph-level branches and commits, the .gq query language and IR, the topology index, the HTTP server, Cedar policy, the CLI.

Concurrency model

  • MVCC: every Lance write bumps a per-dataset version; the OmniGraph manifest version coordinates which sub-table versions are visible together.
  • Snapshot isolation: a query holds one Snapshot for its lifetime; concurrent writes don't leak in.
  • Cross-branch isolation: copy-on-write means readers and writers on different branches don't block each other.
  • Run isolation: each transactional run lives on its own __run__<id> branch.
  • Schema-apply lock: __schema_apply_lock__ system branch serializes schema migrations.
  • Fail-points (failpoints cargo feature): failpoints::maybe_fail("operation.step")? in branch_create, publish, etc., for deterministic failure injection in tests.

Workspace crates

  • omnigraph-compiler — schema and query grammars, catalog, IR, lowering, type checker, lint, migration planner, OpenAI-style embedding client.
  • omnigraph (engine, published as omnigraph-engine on crates.io since v0.2.2) — the Lance-backed runtime: manifest, commit graph, run registry, snapshot, exec, merge, loader, Gemini embedding client.
  • omnigraph-cli — the omnigraph binary.
  • omnigraph-server — the omnigraph-server binary (Axum HTTP server).