Mirror of https://github.com/ModernRelay/omnigraph.git (synced 2026-06-18 02:24:27 +02:00)
docs: add Mermaid architecture diagrams across architecture / storage / execution

Replace the single ASCII stack in docs/architecture.md with a hierarchy of Mermaid diagrams that show the system from external context down to the component level. Add an on-disk layout diagram in docs/storage.md and two sequence diagrams (read query, mutation) in docs/execution.md so readers can navigate from "what is OmniGraph" to "how does a query run" without opening source.

Static structure (docs/architecture.md):
- System context: agents/clients, embedding providers, Cedar, object store.
- Layer view: eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins: compiler, engine, storage trait, index lifecycle, server/CLI. Each zoom-in cites file:line entry points. Aspirational shapes (storage trait, full reconciler) are visually marked and pointed at the relevant invariants.md section so readers see the intended seam without thinking it's already implemented.

On-disk layout (docs/storage.md):
- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance, _graph_runs.lance, _refs/branches/ down into Lance's per-dataset internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does not move or rewrite that content.

Runtime flows (docs/execution.md):
- Read flow sequence: client → Omnigraph::query → typecheck → lower → execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals → Lance write op (Append / merge_insert) → ManifestRepo::commit → __manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified file:line citations so readers can navigate from diagram element to source in one step.

Conventions established (this is the first Mermaid in the repo):
- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine / storage trait / Lance / object store. No accidental synonyms.

Subsequent PRs will add flow diagrams for schema apply, branch + merge, run isolation, index reconcile, and the embedding pipeline in the same conventions.

Parent: 4e5374a85e
Commit: 64b9d56476
3 changed files with 376 additions and 39 deletions

@@ -9,6 +9,49 @@ Pipeline:

3. If `Expand` or `AntiJoin` is present, build (or fetch from `RuntimeCache`) a `GraphIndex`.
4. Run `execute_query` against the snapshot.

### Read flow — sequence

```mermaid
sequenceDiagram
    autonumber
    participant client as Client
    participant og as Omnigraph::query<br/>(query.rs:7)
    participant cmp as omnigraph-compiler
    participant exec as execute_query<br/>(query.rs:347)
    participant gi as GraphIndex<br/>(RuntimeCache)
    participant ts as table_store
    participant lance as Lance scanner

    client->>og: query(target, source, name, params)
    og->>og: ensure_schema_state_valid()<br/>resolve target → snapshot
    og->>cmp: parse + typecheck_query (typecheck.rs:83)
    cmp-->>og: CheckedQuery
    og->>cmp: lower_query (lower.rs:11)
    cmp-->>og: QueryIR (pipeline of IROp)
    og->>exec: extract_search_mode + dispatch (query.rs:110)
    exec->>gi: build / fetch GraphIndex<br/>(if Expand or AntiJoin)
    gi-->>exec: CSR / CSC topology
    loop for each IROp in pipeline
        exec->>ts: scan with predicate / SIP
        ts->>lance: filter · nearest · full_text_search
        lance-->>ts: Stream of RecordBatch
        ts-->>exec: RecordBatch stream
        exec->>exec: factorize · expand · fuse · project
    end
    exec-->>og: QueryResult (RecordBatches)
    og-->>client: serialized result
```

**Code paths:**

- Entry: `Omnigraph::query` at `crates/omnigraph/src/exec/query.rs:7`
- Search-mode extraction: `extract_search_mode` at `crates/omnigraph/src/exec/query.rs:110`
- Pipeline runner: `execute_query` at `crates/omnigraph/src/exec/query.rs:347`
- RRF fan-out: `execute_rrf_query` at `crates/omnigraph/src/exec/query.rs:393`
- Per-source-row BFS: `execute_expand` at `crates/omnigraph/src/exec/query.rs:675`
- Lance scan + pushdown: `execute_node_scan` at `crates/omnigraph/src/exec/query.rs:1027`
- Filter → SQL pushdown: `build_lance_filter` at `crates/omnigraph/src/exec/query.rs:1158`

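The `CSR / CSC topology` that `GraphIndex` returns to the executor, and the per-source-row BFS attributed to `execute_expand`, can be pictured with a minimal, std-only sketch. Everything in it is an assumption made for illustration: the `Csr` struct, its `from_edges` and `neighbors` methods, and the `expand` function with its `max_hops` parameter are hypothetical names, not OmniGraph's actual types or signatures.

```rust
use std::collections::{HashSet, VecDeque};

/// Compressed sparse row adjacency (hypothetical type): the out-neighbors of
/// node `v` are `targets[offsets[v]..offsets[v + 1]]`.
pub struct Csr {
    offsets: Vec<usize>,
    targets: Vec<u32>,
}

impl Csr {
    /// Build from an edge list over nodes `0..num_nodes` using a counting sort
    /// by source, which keeps each node's neighbors in input order.
    pub fn from_edges(num_nodes: usize, edges: &[(u32, u32)]) -> Self {
        let mut offsets = vec![0usize; num_nodes + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1;
        }
        // Prefix sum turns per-node counts into start offsets.
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i];
        }
        let mut cursor = offsets.clone();
        let mut targets = vec![0u32; edges.len()];
        for &(src, dst) in edges {
            let slot = &mut cursor[src as usize];
            targets[*slot] = dst;
            *slot += 1;
        }
        Csr { offsets, targets }
    }

    pub fn neighbors(&self, v: u32) -> &[u32] {
        let v = v as usize;
        &self.targets[self.offsets[v]..self.offsets[v + 1]]
    }
}

/// Breadth-first expansion from one source row: every node reachable in
/// 1..=max_hops hops, in discovery order, excluding the source itself.
pub fn expand(csr: &Csr, source: u32, max_hops: usize) -> Vec<u32> {
    let mut seen = HashSet::from([source]);
    let mut queue = VecDeque::from([(source, 0usize)]);
    let mut out = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_hops {
            continue; // hop budget exhausted on this path
        }
        for &next in csr.neighbors(node) {
            if seen.insert(next) {
                out.push(next);
                queue.push_back((next, depth + 1));
            }
        }
    }
    out
}
```

For the edge list `[(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)]` over four nodes, `expand(&csr, 0, 1)` yields `[1, 2]` and `expand(&csr, 0, 2)` yields `[1, 2, 3]`. A CSC layout is the same structure built over the reversed edges, which serves inbound traversal.
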
### Multi-modal search modes (`SearchMode`)

The executor recognizes three modes that may be combined in a single query:

@@ -44,6 +87,52 @@ Resolves expression values to literals, converts to typed Arrow arrays (`literal

Multi-statement mutations are atomic at the manifest commit boundary.

### Mutation flow — sequence

```mermaid
sequenceDiagram
    autonumber
    participant client as Client
    participant og as Omnigraph::mutate<br/>(mutation.rs:511)
    participant cmp as omnigraph-compiler
    participant ts as table_store
    participant lance as Lance dataset
    participant mr as ManifestRepo<br/>(manifest.rs:280)
    participant manifest as __manifest/

    client->>og: mutate(target, source, name, params)
    og->>cmp: parse + typecheck_query
    cmp-->>og: CheckedQuery (Mutation IR)
    og->>og: resolve expression literals<br/>literal_to_typed_array(lit, type, n)
    loop for each mutation statement
        alt insert
            og->>ts: append RecordBatches
            ts->>lance: WriteMode::Append → new fragment(s)
        else update
            og->>ts: merge_insert keyed by id
            ts->>lance: merge_insert(WhenMatched::Update)
        else delete
            og->>ts: merge_insert with delete predicate
            ts->>lance: merge_insert(WhenMatched::Delete)
        end
        lance-->>ts: new dataset version
        ts-->>og: SubTableUpdate (key, version, row_count)
    end
    og->>mr: commit(updates)
    mr->>manifest: append rows<br/>(table_version per sub-table)
    manifest-->>mr: new graph-manifest version
    mr-->>og: graph version
    og-->>client: MutationResult
```

**Code paths:**

- Entry: `Omnigraph::mutate` at `crates/omnigraph/src/exec/mutation.rs:511`
- Actor-attributed variant: `Omnigraph::mutate_as` at `crates/omnigraph/src/exec/mutation.rs:522`
- Manifest commit: `ManifestRepo::commit` at `crates/omnigraph/src/db/manifest.rs:280`

The whole mutation (every statement, every affected sub-table) publishes through one call to `ManifestRepo::commit`. That single append to `__manifest` is what gives multi-statement mutations their atomicity guarantee (per [`docs/invariants.md`](invariants.md) §VI.26).

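The stage-then-publish shape described above can be sketched in a few lines of std-only Rust. This is an illustration under stated assumptions, not the repository's code: `Manifest` is a hypothetical in-memory stand-in for `__manifest`, the field types of `SubTableUpdate` are guessed from the `(key, version, row_count)` triple in the diagram, and the free function `mutate` only models the control flow of collecting per-statement updates before a single commit.

```rust
/// Result of one sub-table write, mirroring the (key, version, row_count)
/// triple in the diagram. Field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct SubTableUpdate {
    pub key: String,
    pub version: u64,
    pub row_count: u64,
}

/// Hypothetical in-memory manifest: an append-only row log plus a
/// graph-level version counter.
#[derive(Debug, Default)]
pub struct Manifest {
    /// Each row pairs the graph version it was published under with an update.
    pub rows: Vec<(u64, SubTableUpdate)>,
    pub graph_version: u64,
}

impl Manifest {
    /// Publish every staged update under one new graph version.
    pub fn commit(&mut self, updates: Vec<SubTableUpdate>) -> u64 {
        let next = self.graph_version + 1;
        self.rows.extend(updates.into_iter().map(|u| (next, u)));
        self.graph_version = next;
        next
    }
}

/// Stage the outcome of every statement, then commit exactly once.
/// Collecting into `Result` stops at the first `Err`, so a failed statement
/// returns before the manifest is touched.
pub fn mutate<I>(manifest: &mut Manifest, writes: I) -> Result<u64, String>
where
    I: IntoIterator<Item = Result<SubTableUpdate, String>>,
{
    let staged: Vec<SubTableUpdate> = writes.into_iter().collect::<Result<_, _>>()?;
    Ok(manifest.commit(staged))
}
```

In this sketch, two successful writes become visible together under graph version 1, while a later mutation whose second statement fails leaves `graph_version` and `rows` unchanged. Any sub-table versions already written by earlier statements of the failed mutation would simply never be referenced by the manifest.
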
## Bulk loader (`loader/mod.rs`)

- **JSONL only** in v1, with two record shapes: