mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-12 01:45:14 +02:00

Lakehouse-native graph engine with git-style workflows https://omnigraph.dev

Ragnor Comerford dbfdddc952 feat(engine): indexed graph traversal (#149 ) * perf(engine): route Expand node hydration through the id BTREE via structured filter hydrate_nodes built an `id IN (...)` SQL string applied via Scanner::filter, which DataFusion evaluates with InListEval (O(N×M)) rather than using the id BTREE scalar index — measured at 72× the indexed cost on a 100k-node hop (MR-376). Build the id IN-list as a structured DataFusion Expr, AND it with the pushable destination filters, and apply via Scanner::filter_expr (the same path execute_node_scan already uses); Lance then compiles it to scalar-index-search -> take. Destination-filter pushability is now decided by ir_filter_to_expr (structured) instead of ir_filter_to_sql, so list-contains (array_has) pushes down too. Removes the now-dead string-filter helpers build_lance_filter, ir_filter_to_sql, and ir_expr_to_sql; literal_to_sql stays (still used by the mutation delete path). * feat(engine): add TableStore::scan_edges_by_endpoint for indexed neighbor lookup Static helper returning edge rows that match a set of endpoint keys on src/dst, projected to [key_col, opposite_col], via a structured `key_col IN (keys)` filter_expr. Lance routes it through the persisted BTREE on the endpoint column (index-search -> take), so cost scales with the frontier size rather than \|E\|. Unused until execute_expand's indexed mode lands; isolated in its own commit so the storage-layer primitive is reviewable on its own. * feat(engine): add BTREE-indexed Expand traversal path Split execute_expand into a dispatcher over execute_expand_csr (the existing in-memory CSR BFS, unchanged) and a new execute_expand_indexed that serves each hop by batching the frontier into one scan_edges_by_endpoint call against the persisted src/dst BTREE (index-search -> take), then fans out per source row. 
Both share expand_hydrate_and_align — the destination hydration + alignment + hconcat + in-memory non-pushable filters — which now aligns by string id (a HashMap) instead of a dense row-id vec, so one tail serves both modes. Mode selection is OMNIGRAPH_TRAVERSAL_MODE for now (default csr); the frontier-size auto policy and lazy CSR build follow. AntiJoin stays on CSR. tests/traversal_indexed.rs (its own #[serial] binary, so env writes never race a reader) asserts the indexed path matches CSR for one-hop, multi-hop, cross-type, and no-match cases, and that a freshly-appended unindexed edge is still found (partial index coverage — fast_search=false unindexed-fragment scan). * feat(engine): frontier-size Expand dispatcher + lazy CSR build Replace the env-only mode switch with an auto policy: Expand uses the BTREE-indexed path when the source frontier is small and the hop count bounded (OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER=1024, OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS=6), else the in-memory CSR. OMNIGRAPH_TRAVERSAL_MODE=indexed\|csr still forces a mode. Make the CSR index lazy: thread a GraphIndexHandle (memoizing OnceCell over a Cached/Direct/None builder) through execute_query/execute_pipeline/ execute_rrf_query/execute_anti_join instead of a pre-built Option<&GraphIndex>. A query served entirely by the indexed path with no AntiJoin never pays the O(\|E\|) CSR build — the perf win of Tier 3. AntiJoin still realizes the index (its negation uses CSR has_neighbors). Net effect: selective traversals (the common case) skip the whole-graph CSR build and resolve neighbors from the persisted, incrementally-maintained src/dst BTREE. Existing traversal/aggregation/end_to_end/search suites now run the indexed path by default and stay green. Docs: constants.md (new env knobs), query-language.md (Expand dual path), indexes.md (graph index is lazy + the indexed alternative). 
* test(engine): bench indexed vs CSR selective traversal Add a selective single-source knows{1,2} comparison to bench_expand: per growing \|E\|, time the cold query in csr vs indexed mode (fresh db each, so CSR pays its O(\|E\|) build) and assert both modes return identical rows — a guard against the scalar-index physical_rows silent fallback dropping unindexed-fragment rows. The existing dense hop1/2/3 latency bench is unchanged. * feat(engine): surface silent scalar-index fallback in indexed traversal (C6) Add TableStore::key_column_index_coverage — a metadata-only check (no IO) of whether a `key_col IN (...)` scan will be served by the persisted BTREE or silently fall back to a full filtered scan, mirroring Lance's own decision: no BTREE on the column, or any fragment missing physical_rows (which disables scalar indices for the whole scan, lance dataset/scanner.rs create_filter_plan). execute_expand_indexed calls it once per traversal and tracing::warn!s on Degraded, so the perf cliff is observable instead of hidden behind a bench oracle. Detection-only: results are correct either way (the scan returns all rows). Closes the "no silent failures" gap the traversal best-practice audit flagged as the top deviation, and adds an IndexCoverage value a future cost-based planner can consume. * perf(engine): dense-id BFS on the indexed traversal path (C3) execute_expand_indexed ran its per-source BFS in string space (Vec<HashSet<String>>, HashMap<String,Vec<String>>, ~4 String clones per neighbor occurrence). Intern node ids to u32 once via a per-traversal TypeIndex (no GraphIndex/CSR build — laziness preserved) and run visited/seen/frontier/ neighbor-map in dense u32 space, mirroring the CSR path; de-intern only for the per-hop IN-list and the emitted dst ids handed to the hydrate+align tail. Behavior-preserving — the traversal_indexed CSR-vs-indexed equivalence tests are the guard (results are identical, the key type just changes String -> u32). 
* refactor(engine): thread the opened edge dataset into indexed Expand Hoist the edge-dataset open and the C6 index-coverage warning out of execute_expand_indexed into execute_expand, threading the opened dataset in as a parameter so it is opened exactly once. Extract the endpoint-column mapping (endpoint_columns) and the coverage warning (warn_on_degraded_coverage) as helpers. Behavior-preserving: same dataset, same warning, same dispatch decision. This only relocates the open so the upcoming cost-based chooser can consult index coverage before dispatch without opening the dataset twice. * feat(engine): cost-based Expand dispatch chooser (C5) Replace the fixed frontier<=1024 && hops<=6 dispatch threshold with a pure, IO-free cost model. choose_expand_mode compares the indexed path's frontier-relative work (hops * frontier * fanout, or hops * \|E\| when BTREE coverage is degraded) against the cost of building the whole-graph CSR (BUILD_FACTOR * \|E\|), from cheap manifest row counts. Under good coverage this reduces to a selectivity ratio independent of \|E\|, preserving the flat-in-\|E\| indexed win for selective traversals while routing dense / deep / high-fanout or degraded-and-expensive traversals to CSR. execute_expand decides cardinality-first and only opens the edge dataset to confirm coverage when it leans indexed (no open on a clearly-CSR traversal). The two env knobs become hard ceilings layered on the model; the OMNIGRAPH_TRAVERSAL_MODE override still forces a path; the chosen mode is traced. Results are unchanged across modes — only the path differs. Adds inline crossover unit tests and extends the traversal_indexed both_modes harness with an auto pass asserting the chooser is result-preserving across every traversal shape. Documents the new flag semantics in docs/user/{constants,query-language}.md. 
* test(engine): pin Lance scalar-index coverage + system-column/deletion-metadata surface Add three Lance surface guards de-risking a future persisted-adjacency cache: - a compile-only guard pinning the fragment physical_rows + index-detail surface that key_column_index_coverage mirrors (the C6 fallback); - a runtime probe confirming a scalar BTREE on the system column _row_last_updated_at_version is not buildable via the normal create-index path (the column is not in the user schema), so a version-column range delta is not viable as drafted; - a runtime probe confirming per-fragment deletion metadata (deletion_file.num_deleted_rows) is available as cheap O(fragments) metadata, the primitive a fragment-coverage delete model would rely on. The probes turn the two largest substrate assumptions into green/red CI facts before any cache work begins. * test(engine): regression for cross-type id-collision in indexed traversal A node id is unique only within a type, so a Person and a Company can share an id string. A variable-length traversal over a cross-type edge (WorksAt) must structurally stop after one hop. This test builds a graph where 'shared' is both a Person and a Company id and asserts worksAt{1,2} returns only the one-hop company. It fails today: the indexed path's single string interner de-interns the hop-1 Company id back to the colliding Person id and runs a hop-2 scan that matches that Person's edges, emitting a spurious second-hop company (indexed ["other","shared"] vs csr ["shared"]). * fix(engine): structurally cap cross-type Expand at one hop A cross-type edge cannot chain (e.g. a Company is not a WorksAt source), so a variable-length traversal over one is structurally single-hop. Both traversal paths now enforce this by capping max hops at 1 when from_type != to_type, instead of relying on the hop-2 scan returning empty. 
That reliance was a correctness hole on the indexed path: it interns every endpoint string into one dense id space, so a cross-type id-string collision (a Person and a Company sharing an id) let hop 2 de-intern a destination id back to the colliding source-type id and match its edges, emitting rows the CSR path never produces. With the cap the cross-type second-hop scan never runs, so the shared interner can no longer alias across types. Turns the regression test green (indexed == csr == ["shared"]). * perf(engine): set-oriented filtered anti-join, remove per-row dispatch execute_anti_join's filtered slow path sliced the outer batch to one row at a time and re-ran the inner pipeline per row, so each 1-row inner Expand dispatched to the indexed path — one Lance scan per outer row, while the CSR realized up front sat unused. Replace it with a set-oriented anti-semi-join: tag each outer row with a synthetic index column, run the inner pipeline once over the whole frontier (the tag survives Expand's hconcat and Filter's row-drop), then exclude outer rows whose tag survived. The inner Expand now runs as a single set-at-a-time traversal over the full frontier; config is read once per operator, not per row (the env nit is mooted). A produced-but-untagged inner batch fails loudly rather than silently keeping every row. Results are unchanged (the predicated-negation tests exercise the path over a multi-row outer with dst-filters). * test(engine): drop flaky wall-clock budget from the merge truth table The 30s wall-clock assertion in merge_pair_truth_table flakes under parallel test load: it tripped at ~31s in the full --test-threads=4 gate while passing at ~20s in isolation. A fixed time budget in a correctness test depends on machine and parallelism, not correctness; elapsed is still logged for visibility, and a real merge-perf regression belongs in a bench. The cell-count correctness assertions (81 / 36 / 45) are unchanged. 
* fix(engine): total deterministic ORDER via entity-key tie-break + NULL contract apply_ordering used an unstable lexsort with no tie-break, so rows with equal user-sort keys came out in a run-dependent order (the input order depends on scan parallelism / upstream hashing) — making ORDER ... LIMIT non-deterministic, a latent deny-list violation (no nondeterministic result ordering). Append the bound entities' key columns (<var>.id, unique per row) in canonical name-sorted order as ascending tie-breaks, giving a total, reproducible order (and a deterministic top-N when ties straddle the LIMIT cutoff). NULL placement (nulls_first = !descending) is unchanged and now documented as the contract. New tests/ordering.rs locks descending, multi-key precedence, the deterministic key tie-break (data loaded in a different order than the expected output, so it proves the tie sorts by key not by load order), and NULL placement under ASC/DESC. docs/user/query-language.md documents the total-order + NULL contract. * test(engine): property-based query-correctness invariants over generated graphs Adds a proptest harness (new dev-dep) that generates small graphs whose Person and Company keys are drawn from a shared 5-key alphabet, so cross-type id collisions, cycles, and self-loops arise by search rather than from one hand-built fixture. Three invariants: - prop_expand_indexed_eq_csr: csr == indexed == auto over knows{1,3} (same-type, cycles) and worksAt{1,2} (cross-type, collision-prone) from every start. - prop_results_subset_of_existing_nodes: no phantom rows (catches over-emission even if both modes are wrong identically). - prop_antijoin_partitions_persons: not{worksAt} and its complement are disjoint and cover all persons. Verified the guard bites: neutering the cross-type hop cap makes prop_expand_indexed_eq_csr fail and proptest shrinks it to persons["c","e"] / companies["b","c"] — the cross-type collision class the hand-built fixture only sampled once. 
Tests are sync + #[serial] (per-case runtime; the mode test writes OMNIGRAPH_TRAVERSAL_MODE). * test(engine): cover cycle/self-loop termination + nested anti-join (C5 edge cases) - variable_hops_terminate_and_dedup_on_cycle: a 3-cycle a->b->c->a traversed with knows{1,5} (ceiling above the cycle length) terminates and emits each node once (the c->a back-edge hits the seeded source); both_modes confirms indexed == csr. Uses a bounded range deliberately — unbounded {1,} is a typecheck error, not a runtime path. - variable_hops_handle_self_loop: a->a self-loop does not loop forever and does not re-emit the seeded source. - nested_anti_join_double_negation: not { worksAt; not { name = Acme } } recurses through execute_pipeline, yielding [Alice,Charlie,Diana] (people with no non-Acme employer) — distinct from plain unemployed [Charlie,Diana]. * test(engine): execution goldens for typed-literal filters (C4 gap #4) New literal_filters.rs covers filtering by F64/F32/Bool/Date/DateTime LITERALS across both arms: standalone comparisons ($m.score > 1.5, $m.ratio <= 0.25, $m.active = true, $m.born >= date(...), $m.seen < datetime(...)) exercise the in-memory comparison path, and inline bindings (Metric { active: true }, Metric { score: 3.0 }) exercise Lance filter_expr pushdown. Seeds partition each predicate so a dropped/miscast filter returns all rows. (Param-bound scalars and list-column contains are covered elsewhere.) * test(engine): full rank-order goldens for nearest + bm25 (gap #2) Existing search tests stopped at top-1 (nearest) or non-empty (bm25), so a regression corrupting ranks 2..k or reversing the sort direction passed CI silently. Pin the FULL ordered slug list: nearest([0.1,0.2,0.3,0.4]) -> [ml-intro, nlp-guide, rl-intro] (ml-intro exact at dist 0, rest by ascending L2); bm25(Learning) -> [rl-intro, ml-intro, dl-basics] (descending score). 
nearest/bm25 skip apply_ordering (is_search_ordered) and return Lance native order, so result_slugs row order == rank order; values resolved by running and confirmed stable across runs. * test(engine): search fuzzy/match_text characterization + RRF non-default pairings - match_text_matches_exact_set_excludes_unrelated: match_text(body,'neural') == [dl-basics] exactly (not just contains). - fuzzy_does_not_match_under_default_tokenizer: characterizes that fuzzy() is inert with the default tokenizer here (search/match_text work, fuzzy returns nothing); turns red — to be promoted to a real golden — if fuzzy starts matching. - rrf_fuses_two_fts_fields / rrf_fuses_two_vector_queries: RRF fuses arms other than the default nearest+bm25 (bm25 title+body; two vector queries), proving primary_var resolves and fusion runs. New fixtures/search.gq queries + two_vector_params helper. Orders resolved by running, confirmed stable. * test(engine): anti-join fast-vs-slow path equivalence harness anti_join_fast_and_slow_paths_agree: the CSR has_neighbors fast path (not { $p worksAt $_ }) and the set-oriented inner-pipeline replay (same negation forced slow by an always-true $c.name != "" dst filter) must produce the same result ([Charlie, Diana]). Closes the second real engine fork explicitly. * test(engine): regression for nested slow-path anti-join tag collision A nested not { ... not { ... } } where both levels hit the set-oriented slow path collides on the fixed __antijoin_outer_row correlation column: the inner call appends a duplicate, and column_by_name reads the OUTER tag. Fan-out (p1 works at two companies) makes inner row indices diverge from outer tags, so the bug returns the wrong person set. Fails on current code (left ["p2","p4"] vs right ["p3","p4"]). * fix(engine): collision-free anti-join correlation tag for nested negation The set-oriented anti-join tagged the outer batch with a fixed column name and read it back by name. 
Under a nested slow-path anti-join the enclosing tag rides through the inner pipeline, so the inner call produced a duplicate field; Arrow permits duplicate names and column_by_name returns the first, so the inner negation mis-correlated against the outer row indices. Choose a tag name not already present in the batch (suffix-incremented), so each nesting level reads its own correlation column. Turns the fan-out regression green; the existing nested/fast-vs-slow/proptest anti-join invariants still pass. * fix(engine): cap cross-type hops in the Expand cost model gather_cost_inputs fed the requested max_hops into choose_expand_mode even though execute_expand_indexed runs at most one hop for a cross-type edge. So a cross-type variable-length expand (e.g. worksAt{1,5}) had its indexed cost scaled by 5 while only one hop runs, skewing the chooser toward CSR (an unnecessary whole-graph build) near the crossover. Results were unaffected (modes are equivalent); this is a plan-accuracy fix. Add cost_effective_hops(requested, same_type) — caps to 1 for cross-type — and apply it in gather_cost_inputs so the estimate matches what executes. Unit test covers the cap and the crossover consequence (capped 1 hop stays indexed where the requested 5 would have flipped to CSR). * perf(engine): realize anti-join CSR lazily + reuse a warm CSR in the chooser Two CSR build/reuse fixes flagged on the set-oriented anti-join work (results unchanged — plan/perf accuracy): - execute_anti_join called graph_index.get() (the O(\|E\|) whole-graph CSR build) unconditionally, but only the bulk fast path consumes it; a filtered/nested slow-path anti-join's inner Expand picks its own access path. Gate the build on a pure shape predicate (bulk_anti_join_applies) so a selective anti-join over a large graph no longer pays a build it won't use. 
- gather_cost_inputs hardcoded csr_cached=false, so once an earlier op realized the CSR, later Expands still cost it as a cold build and could pick per-hop indexed scans over reusing the warm in-memory CSR. Add GraphIndexHandle:: is_built() and thread it through so the chooser reuses a materialized CSR. Anti-join, cross-type, proptest-equivalence, and chooser unit tests stay green. * test(engine): RAII traversal-mode guard in proptest equivalence prop_expand_indexed_eq_csr set/cleared OMNIGRAPH_TRAVERSAL_MODE manually; a panic between set and clear (e.g. a query unwrap on a generated case) would leak the forced mode into proptest's shrink/subsequent cases and mask the divergence under test. Replace with a ModeGuard that clears on drop (including on unwind), scoping the forced mode to a single query. * test(engine): regression for multi-hop anti-join hop bounds The bulk anti-join fast path answers via has_neighbors (one-hop existence), so not { $p knows{2,2} $x } wrongly drops a node with a 1-hop neighbor but no 2-hop path. On a->b (sink) and c->d->e, only c has a 2-hop path; the query should keep [a,b,d,e]. Fails on current code (left ["b","e"] — only the sinks). * fix(engine): restrict anti-join bulk fast path to one-hop expands bulk_anti_join_applies accepted any single Expand, but try_bulk_anti_join_mask decides via the CSR has_neighbors one-hop existence check — wrong for multi-hop negations. Require min_hops==1 && max_hops==1 in the predicate; anything else falls to the slow path, whose inner Expand runs the real bounded traversal. Turns the multi-hop regression green; one-hop anti-joins unchanged. * fix(engine): IndexCoverage reports Degraded for uncovered fragments key_column_index_coverage checked BTREE-exists + physical_rows but not that the index actually covers the current fragments. 
Since edge-index creation is skipped once a BTREE exists, fragments appended later stay unindexed while coverage still reported Indexed, so the cost chooser priced a partly-full scan as fully indexed. Compare the BTREE's fragment_bitmap (public on lance_table IndexMetadata) against the dataset's current fragment ids; report Degraded when any are uncovered. A None bitmap means Lance can't report coverage; don't over-degrade. Results are unaffected (the scan returns unindexed-fragment rows either way); this corrects the cost signal. Test: a freshly-loaded edge BTREE is Indexed; after appending an edge the new fragment is uncovered → Degraded. Surface guard pins IndexMetadata.fragment_bitmap. * docs: clarify the Expand frontier ceiling bounds the initial dispatch frontier The cap is applied at dispatch on the initial frontier; per-hop fan-out (union_dense) is not hard-capped. Correct the constants.md and query-language.md claims: the ceilings bound the initial-dispatch frontier/hops, the cost model estimates total indexed work as ~hops * frontier * fanout (pricing dense fan-out toward CSR), and per-hop work is not a hard bound. Drops the overstated 'hard caps bound indexed work' / 'cost ∝ frontier' wording.		2026-06-09 18:09:13 +02:00
.cargo	Raise LANCE_MEM_POOL_SIZE to 1 GB in .cargo/config.toml	2026-04-19 22:27:49 +03:00
.context	Investigate Lance MergeInsertBuilder CAS granularity (MR-766 prereq)	2026-04-28 23:30:17 +00:00
.github	ci(branch-protection): let code owners bypass required PR review (#154)	2026-06-08 22:26:04 +03:00
crates	feat(engine): indexed graph traversal (#149)	2026-06-09 18:09:13 +02:00
docker	feat(server): compose OMNIGRAPH_TARGET_URI with OMNIGRAPH_CONFIG in entrypoint (#129)	2026-05-30 20:17:55 +01:00
docs	feat(engine): indexed graph traversal (#149)	2026-06-09 18:09:13 +02:00
scripts	governance: external contribution model (issues/discussions/RFCs/PRs) (#143)	2026-06-06 23:58:08 +03:00
.dockerignore	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00
.gitignore	release: v0.5.0 (#115)	2026-05-23 13:59:42 +01:00
AGENTS.md	Merge origin/main into cluster-config-docs	2026-06-09 18:11:12 +03:00
Cargo.lock	feat(engine): indexed graph traversal (#149)	2026-06-09 18:09:13 +02:00
Cargo.toml	feat(cluster): add read-only validate and plan	2026-06-08 20:07:39 +03:00
CLAUDE.md	Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it	2026-04-28 23:10:09 +02:00
CODE_OF_CONDUCT.md	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00
CONTRIBUTING.md	governance: external contribution model (issues/discussions/RFCs/PRs) (#143)	2026-06-06 23:58:08 +03:00
Dockerfile	Dockerfile: switch base from Docker Hub to ECR Public	2026-04-20 13:46:23 +03:00
GOVERNANCE.md	governance: external contribution model (issues/discussions/RFCs/PRs) (#143)	2026-06-06 23:58:08 +03:00
LICENSE	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00
og-cheet-sheet.md	feat: inline query strings in CLI and HTTP server (#110)	2026-05-29 13:41:54 +02:00
omnigraph.example.yaml	example config: use graphs / cli.graph, matching the MR-603 rename	2026-04-18 23:40:35 +03:00
openapi.json	release: v0.6.2	2026-06-09 15:59:59 +02:00
README.md	README: link to TypeScript SDK and MCP server (#92)	2026-05-30 14:29:49 +02:00
rust-toolchain.toml	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00
SECURITY.md	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00
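
The latest commit message above describes a cost-based Expand dispatch chooser. As a rough illustration only, the sketch below restates that description in Rust. The names `choose_expand_mode` and `cost_effective_hops` and the 1024 / 6 ceilings come from the commit message; the struct layout, the `BUILD_FACTOR` value, and the average-degree fanout estimate (|E| / |N|) are assumptions made for this example, not the engine's actual code.

```rust
/// Physical path that serves an Expand.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExpandMode {
    Indexed,
    Csr,
}

/// Whether the endpoint BTREE covers every fragment of the edge table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexCoverage {
    Indexed,
    Degraded,
}

/// Cheap, IO-free inputs. The field layout is hypothetical.
#[derive(Debug, Clone, Copy)]
pub struct CostInputs {
    pub frontier: u64,
    pub requested_hops: u64,
    pub same_type: bool,
    pub edge_count: u64,
    pub node_count: u64,
    pub coverage: IndexCoverage,
    pub csr_cached: bool,
}

// ASSUMPTION: the real build factor is not stated in the commit message.
const BUILD_FACTOR: f64 = 2.0;
// Ceilings named in the commit message (defaults of the two env knobs).
const MAX_FRONTIER: u64 = 1024;
const MAX_HOPS: u64 = 6;

/// A cross-type edge cannot chain, so it runs at most one hop.
pub fn cost_effective_hops(requested: u64, same_type: bool) -> u64 {
    if same_type { requested } else { requested.min(1) }
}

/// Compare estimated indexed work against the cost of building the CSR.
pub fn choose_expand_mode(c: &CostInputs) -> ExpandMode {
    let hops = cost_effective_hops(c.requested_hops, c.same_type);
    // Hard ceilings layered on top of the model.
    if c.frontier > MAX_FRONTIER || hops > MAX_HOPS {
        return ExpandMode::Csr;
    }
    let edges = c.edge_count as f64;
    let indexed_cost = match c.coverage {
        IndexCoverage::Indexed => {
            // ASSUMPTION: fanout is estimated as the average degree.
            let fanout = edges / c.node_count.max(1) as f64;
            hops as f64 * c.frontier as f64 * fanout
        }
        // Degraded coverage falls back to a filtered scan per hop.
        IndexCoverage::Degraded => hops as f64 * edges,
    };
    // A CSR that is already materialized costs nothing to reuse.
    let csr_cost = if c.csr_cached { 0.0 } else { BUILD_FACTOR * edges };
    if indexed_cost < csr_cost { ExpandMode::Indexed } else { ExpandMode::Csr }
}
```

Under good coverage the comparison reduces to `hops * frontier / node_count < BUILD_FACTOR`, which does not depend on |E|; this matches the selectivity-ratio behavior the commit message describes.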

README.md

Omnigraph

Lakehouse-native graph engine built for context assembly

Omnigraph acts as an operational state and coordination layer for agents.

Git-style versioning & branching
Multimodal retrieval (graph+vector/fts+filters) optimized for context assembly
Object storage native (S3, RustFS)
Native blob-as-data support (docs, images, videos, etc.)
VPC, on-prem, and hybrid deployment
Lance format as open storage layer

AS CODE	What it means
Schema AS CODE	Typed `.pg` schemas, planned, applied, enforced
Context AS CODE	Linted queries & agentic nudges, versioned and reusable
Security AS CODE	Cedar policies enforced server-side on every mutation
Dashboards AS CODE	Declarative views & controls over the graph (coming)

Core Use Cases

Use case	What it's for
Company brain	Org knowledge unified into one queryable graph
Context graph	Decision traces and codified tribal knowledge
Agentic memory	Durable, versioned memory for long-running agents
Dev graph	Issues & dependency model for coding agents
R&D data layer	Experiments & trials data written into branches
ML workflows	Versioned, branchable graphs for training & eval
Karpathy's LLM wiki	A living, agent-updatable knowledge base

Quick Install

curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | bash

This installs omnigraph and omnigraph-server into ~/.local/bin from published release binaries.

Or install with Homebrew:

brew tap ModernRelay/tap
brew install ModernRelay/tap/omnigraph

For starter graphs and agent skills to bootstrap and operate Omnigraph, see ModernRelay/omnigraph-cookbooks.

One-Command Local RustFS Bootstrap

curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/local-rustfs-bootstrap.sh | bash

That bootstrap:

starts RustFS on 127.0.0.1:9000
creates a bucket and S3-backed graph
loads the checked-in context fixture
launches omnigraph-server on 127.0.0.1:8080

Docker must be installed and running first.

The RustFS bootstrap prefers the rolling edge binaries and only falls back to source builds when release assets are unavailable.

If a previous run left objects under the same graph prefix but did not finish initializing the graph, rerun with RESET_REPO=1 or set PREFIX to a new value.

Common Commands

The same URI works for local paths, s3://…, or http://host:port.

omnigraph init   --schema ./schema.pg ./graph.omni
omnigraph load   --data   ./data.jsonl ./graph.omni
omnigraph read   --query  ./queries.gq --name get_person --params '{"name":"Alice"}' ./graph.omni
omnigraph change --query  ./queries.gq --name insert_person --params '{"name":"Mina"}' ./graph.omni
omnigraph branch create --from main feature-x ./graph.omni
omnigraph branch merge  feature-x --into main ./graph.omni

See docs/user/cli.md for schema apply, snapshots, ingest, commits, and policy commands.
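
To make the "same URI" idea concrete, here is a minimal sketch of how a caller might classify the target argument into the three forms listed above. `GraphTarget` and `classify_target` are hypothetical names invented for this example and are not part of the Omnigraph API; accepting `https://` alongside `http://` is also an assumption.

```rust
/// The three target forms the CLI accepts. Hypothetical type, for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum GraphTarget<'a> {
    /// A path on the local filesystem, e.g. `./graph.omni`.
    Local(&'a str),
    /// An object-store location split into bucket and key prefix.
    S3 { bucket: &'a str, prefix: &'a str },
    /// A running omnigraph-server reachable over HTTP.
    Http(&'a str),
}

/// Classify a target string by its scheme; anything without a known
/// scheme is treated as a local path.
pub fn classify_target(uri: &str) -> GraphTarget<'_> {
    if let Some(rest) = uri.strip_prefix("s3://") {
        // Everything before the first '/' is the bucket; the rest is the prefix.
        let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
        GraphTarget::S3 { bucket, prefix }
    } else if uri.starts_with("http://") || uri.starts_with("https://") {
        GraphTarget::Http(uri)
    } else {
        GraphTarget::Local(uri)
    }
}
```

With this sketch, `classify_target("./graph.omni")` yields `GraphTarget::Local("./graph.omni")`, while an `s3://` URI is split into its bucket and prefix.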

Clients

For programmatic access to a running omnigraph-server:

TypeScript SDK — @modernrelay/omnigraph (source). Instance-per-client, typed errors, camelCase types, async-iterator streaming export.
```
npm install @modernrelay/omnigraph
```
Model Context Protocol server — @modernrelay/omnigraph-mcp (source). Bridges Omnigraph to LLM hosts (Claude Desktop, Claude Code, …) over stdio. Exposes tools and resources for schema, branches, queries, mutations, ingest, and bundles curated best-practices guidance from the cookbook.
```
npm install -g @modernrelay/omnigraph-mcp
```

Both packages are versioned in lockstep with omnigraph-server on major.minor: @modernrelay/omnigraph@X.Y.* targets omnigraph-server@X.Y.*. See ModernRelay/omnigraph-ts for the monorepo.
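
The lockstep rule can be checked mechanically. The sketch below is one possible way to compare a client version against a server version on major.minor; `major_minor` and `is_lockstep_compatible` are hypothetical helpers written for this example and are not shipped by any Omnigraph package.

```rust
/// Parse the leading `major.minor` of a version string such as "0.6.2".
/// A leading 'v' is tolerated. Returns None if either part is missing
/// or not a number.
pub fn major_minor(version: &str) -> Option<(u64, u64)> {
    let mut parts = version.trim_start_matches('v').split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True when both versions parse and share the same major.minor,
/// i.e. client X.Y.* is paired with server X.Y.*.
pub fn is_lockstep_compatible(client: &str, server: &str) -> bool {
    match (major_minor(client), major_minor(server)) {
        (Some(c), Some(s)) => c == s,
        _ => false,
    }
}
```

Patch versions are ignored on purpose, so a 0.6.2 client paired with a 0.6.0 server passes the check, while a 0.6.x client against a 0.5.x server does not.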

Docs

Build And Test

cargo build --workspace
cargo check --workspace
cargo test --workspace

Notes:

Rust stable toolchain, edition 2024
CI runs cargo test --workspace --locked
Full CI and some local test flows require protobuf-compiler
S3 integration tests expect an S3-compatible endpoint such as RustFS

Workspace Crates

crates/omnigraph-compiler: shared schema/query parser, typechecker, catalog, and IR lowering
crates/omnigraph: storage/runtime, branching, merge, change detection, and query execution
crates/omnigraph-cli: CLI for graph lifecycle (init/load/ingest), query/mutate, branch/commit/merge, schema/lint, snapshot/export, policy, and maintenance (optimize/cleanup)
crates/omnigraph-server: Axum HTTP server for remote reads, changes, ingest, export, branches, and commits

Contributing

Please open an issue, spec, or design discussion before sending large code changes. Design feedback and concrete problem statements are the fastest way to collaborate on the roadmap.

Community

Join the Omnigraph Slack community to ask questions, share feedback, and follow development.