* docs(rfc): RFC-013 write-path latency design + index link

* perf(engine): open write-path tables directly, bypassing the namespace builder

  Write opens routed through DatasetBuilder::from_namespace, whose describe_table opened the whole dataset just to return a location and then re-resolved the latest version -- an O(commit-depth) double latest-resolution per table open that missed Lance's O(1) version-hint fast path. On an object store this dominated write latency (~70%, RFC-013 section 2.4).

  TableStore::open_dataset_head_for_write now delegates to the direct opener (open_dataset_head: Dataset::open by URI + checkout_branch, routed through the tracked opener so cost tests can count it; a no-op in production). The manifest already holds every sub-table's location, so the namespace catalog lookup was redundant; ensure_expected_version still validates head == pinned for strict ops.

  This completes PR #268's open-by-location migration on the write side. With both reads (PR #268) and now writes bypassing it, nothing in production routes through the per-table Lance namespace. The dead open chain (load_table_from_namespace, open_table_head_for_write) is deleted and the StagedTableNamespace contract apparatus is gated #[cfg(test)], mirroring the already-test-only read namespace; __manifest commit coordination (GraphNamespacePublisher) is a separate component and is unaffected. See docs/dev/rfc-013-write-path-latency.md sections 2.4 and 9 (step 3a).

* test(engine): write-path cost-budget gate on a shared harness

  Adds tests/helpers/cost.rs, a store-agnostic cost harness (IoCounts/StagedCounts, measure/measure_with_staged, assert_flat, local_graph/s3_graph) that the read-side warm_read_cost.rs, write_cost.rs, and write_cost_s3.rs share, so the IOTracker / task-local plumbing lives in exactly one place instead of duplicated per test.
  write_cost.rs (local, every-PR) gates the internal-table scan term flat in commit-history depth (a RED #[ignore]'d LOCK, the acceptance for bringing the internal tables into compaction) plus green guards: a single insert's data writes are bounded, a per-write read-op ceiling fails the moment a round-trip is added, and a keyed insert routes through stage_merge_insert once with no stage_append or vector-index build.

  write_cost_s3.rs (bucket-gated, rustfs CI) gates the data-table opener term flat across depth -- the object-store-RPC phenomenon local FS cannot reproduce, and the red->green proof of the opener bypass. Wired into the rustfs_integration CI job and its path filter.

  Guards the "hot-path cost is bounded by work, not history" invariant on writes. See docs/dev/rfc-013-write-path-latency.md section 5.1, docs/dev/testing.md.

* docs(rfc): RFC-013 step 3a landed; write-skew coupling; cost-gate test map

  - Section 9: mark step 1 (gate + harness) and step 3a (opener bypass) landed; record the per-table namespace retirement to test-only and the corrected measurement note (the opener win is S3-only; the local data-table growth is the merge-insert/RI fragment scan, a compaction term, not the opener).
  - Sections 7.1/6/11/5.5/10: correct the cross-table write-skew analysis after a prototype proved the scoped expected-set fix is a no-op against the per-object_id manifest (disjoint writers never share a row, so Lance never conflicts, the publisher never retries, and the expected check is a non-atomic pre-check evaluated once against stale state). The fix needs a shared contention row (Phase-7 graph_head / a minimal head row / commit-time re-validation), so it is coupled to that row, not standalone; that contention is load-bearing for correctness, not a drawback. Split the concurrent face (read-set + head) from the sequential face (inbound-RI validation on node removal) -- two different fixes.
  - testing.md: add write_cost.rs / helpers/cost.rs / write_cost_s3.rs to the test map; document the local-vs-S3 backend split; extend the cost-budget checklist item to the write/open path and point at the shared harness.

* test(engine): isolate the opener in the S3 cost gate; fail loud on S3 setup errors

  Addresses two PR review findings on the bucket-gated write_cost_s3 gate:

  - The data-table opener was not isolated: `data_reads` also counts the merge-insert/RI scan, which reads O(fragment-count) and so grows with history for a different reason (compaction's domain, not the opener) -- the same term that made the local data-table count grow. The flat assertion would false-RED or misattribute scan growth to the opener on rustfs. Fix: compact (db.optimize) before each measurement so the table holds ~1 fragment, bounding the scan and leaving the opener's latest-version resolution as the only history-varying term. Compaction preserves version history, so the opener still faces a deep _versions/ chain -- the thing under test.
  - s3_graph used `.ok()?`, so when OMNIGRAPH_S3_TEST_BUCKET was set but the store was down/misconfigured, init/seed failures collapsed to None and the gate skipped + passed vacuously. Fix: skip only when the bucket env var is absent; once it is set, init/seed failures panic (mirrors tests/s3_storage.rs).

* test(engine): isolate the S3 opener with a per-prefix IO probe (correct-by-design)

  Replaces the fixture-bounded isolation (compact-before-measure) from the prior commit with the root fix: a path-classifying ObjectStore wrapper (PrefixCounter) that attributes each data-table read to the opener term (_versions/.manifest) vs the scan term (data/*.lance). IoCounts now exposes data_opener_reads / data_scan_reads, so write_cost_s3 asserts the opener flat *directly* -- no compaction or fixture massaging, and the assertion measures the opener, not the conflated total.
  Closes the "harness conflates two IO terms" class: any cost test (read or write) can now isolate the opener. PrefixCounter implements only the object_store 0.13 core ObjectStore methods; the convenience surface (get/put/head/...) routes through get_opts/put_opts via ObjectStoreExt's blanket impl, so every read/write is still counted.

  Validated locally (every-PR) by write_cost::data_table_reads_split_into_flat_opener_and_growing_scan: opener stays flat (7 -> 3) while scan grows (11 -> 91) and opener + scan == data_total exactly -- proving the classifier and confirming the local data-table growth is the fragment scan, not the opener. warm_read_cost (12 tests) stays green under the shared-harness change.

* refactor(tests): remove cost-harness duplication and namespace cfg(test) noise

  Branch self-review (no behavior change) -- pay down three liabilities the write-path work left:

  - warm_read_cost.rs kept its own probes() (three IOTrackers + a QueryIoProbes + a probe counter) and read raw .stats().read_iops -- duplicating the shared helpers::cost harness this branch introduced. Migrated all 12 tests onto measure()/IoCounts; deleted the local probes(). (This also makes IoCounts' version_probes field used rather than dead.)
  - insert_cost was copy-pasted verbatim into write_cost.rs and write_cost_s3.rs. Hoisted to helpers::cost::measure_insert so the measured write is defined once.
  - The per-table Lance namespace (namespace.rs) became entirely test-only after step 3a, but was gated with ~22 per-item #[cfg(test)] attributes. Collapsed to a single `#[cfg(test)] mod namespace;` and stripped the per-item attributes; merged the import groups the gating had split.

  Verified: lib in-source 162 passed; write_cost 4 + warm_read_cost 12 passed; forbidden_apis passed.
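The per-prefix attribution described in the commit message can be pictured with a small sketch. This is not the Rust PrefixCounter itself; it is a Python illustration that assumes opener reads touch `.manifest` objects under `_versions/` and scan reads touch `.lance` objects under `data/`, as the message describes. The function names, the sample paths, and the reading of "flat" as "does not exceed the first sample" are all hypothetical.

```python
from collections import Counter


def classify_read(path: str) -> str:
    """Attribute one object-store read to a cost term by its path (assumed layout)."""
    parts = path.split("/")
    directories, name = parts[:-1], parts[-1]
    if "_versions" in directories and name.endswith(".manifest"):
        return "opener"  # latest-version resolution
    if "data" in directories and name.endswith(".lance"):
        return "scan"  # fragment scan
    return "other"


def count_reads(paths):
    """Count reads per term; the three terms always sum to len(paths)."""
    counts = Counter(classify_read(p) for p in paths)
    return {term: counts.get(term, 0) for term in ("opener", "scan", "other")}


def assert_flat(samples, term):
    """Fail if a term's count at any later (deeper-history) sample exceeds the first."""
    baseline = samples[0][term]
    for index, sample in enumerate(samples[1:], start=1):
        if sample[term] > baseline:
            raise AssertionError(
                f"{term} grew with history: {baseline} -> {sample[term]} (sample {index})"
            )
```

With the reads split this way, a gate can assert the opener term alone and let the scan term grow without tripping it, which is the separation the commit describes.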
Lakehouse graph database for context assembly & multi-agent coordination
Multimodal retrieval · Git-style branching · object-storage native
Quickstart · Docs · Cookbooks · CLI
Omnigraph is the operational state and coordination layer for fleets of agents.
Run it as a server, declared as code; hundreds of agents operate and enrich the graph on parallel isolated branches, and every change is reviewed and merged safely.
Key capabilities
| Capability | What it gives you |
|---|---|
| Declared as code | A cluster.yaml declares graphs, schemas, stored queries, embedding providers, and policies; cluster apply converges it and omnigraph-server brings every graph online at /graphs/{id}/…. |
| Built for fleets of agents | Hundreds of agents enrich the graph on parallel isolated branches; changes are reviewed and merged safely, Git-style, across the whole graph. |
| Multimodal retrieval | Graph traversal + vector ANN + full-text + Reciprocal Rank Fusion in one query runtime, for context assembly. |
| Security as code | Cedar policy enforced server-side on every mutation, per-graph and server-wide; bearer auth; actor/audit tracking. |
| Runs on your infrastructure | Any S3-compatible object store: on-prem via RustFS / MinIO, or AWS S3 / R2 / GCS. VPC, on-prem, hybrid; your data never leaves your store. |
| Open, versioned storage | Lance columnar format: branchable, time-travelable, with native blob-as-data (docs, images, video). |
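Reciprocal Rank Fusion, named in the retrieval row above, merges several ranked lists by summing `1 / (k + rank)` for each item across the lists. The sketch below is a generic Python illustration of that formula, not Omnigraph's implementation; the constant `k = 60` is the value commonly used in the literature and is an assumption here, as are the function name and the sample document ids.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids; each list contributes 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a full-text search.
vector_hits = ["d2", "d1", "d3"]
fulltext_hits = ["d1", "d4", "d2"]
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
```

Items that appear near the top of more than one list (here `d1` and `d2`) outrank items that appear in only one, without needing the underlying scores to be comparable.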
What you can build
| Use case | What it's for |
|---|---|
| Company brain | Org knowledge unified into one graph every agent can query |
| Agentic memory | Durable, versioned memory: a branch per agent or per task, merged on review |
| Context graph | Decision traces and codified tribal knowledge for retrieval |
| Dev graph | Issues & dependency model that coding agents read and write |
| R&D / ML data layer | Experiments and trials written into branches, versioned for training & eval |
Install
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | bash
This installs omnigraph (CLI) and omnigraph-server into ~/.local/bin from
published release binaries. Or with Homebrew:
brew tap ModernRelay/tap
brew install ModernRelay/tap/omnigraph
Set it up with an AI agent
Omnigraph is built to be run by coding agents. Two ways in:
Teach your agent the playbook. This repo ships the
omnigraph agent skill: the operational playbook
covering cluster mode, the two config surfaces, schema evolution, query linting,
data writes, branches, Cedar policy, and the common gotchas.
npx skills add ModernRelay/omnigraph@omnigraph
Or have an agent set it up from scratch. Paste this into Claude Code, Codex, or any agent that can read a URL and run a shell command:
Help me set up Omnigraph
1. Read the docs at https://github.com/ModernRelay/omnigraph, starting with
docs/user/clusters/index.md, then docs/user/deployment.md.
2. Skim the starter graphs and seed data in the cookbooks:
https://github.com/ModernRelay/omnigraph-cookbooks
3. Ask me what I want to build (company brain, agent memory, dev graph,
research / R&D layer, …). Then stand up a cluster for it, load a little
data, and run a query so I can see it working.
For ready-to-run graphs with real seed data (company brain, VC operating system,
pharma & industry intel),
ModernRelay/omnigraph-cookbooks
is the fastest way to see Omnigraph shaped to a real domain.
Deploy
A deployment is a cluster: a multigraph config directory that declares
its graphs, schemas, stored queries, and policies as code. You manage it
Terraform-style: cluster plan previews the diff, cluster apply converges
it. omnigraph-server then boots from the cluster and brings every graph online
at /graphs/{id}/…, each behind its own policy.
1. Declare the cluster.
company-brain/
├── cluster.yaml
├── people.pg # schema for the "knowledge" graph
├── queries/ # stored queries: the .gq files ARE the declaration
│ └── people.gq
└── base.policy.yaml # a Cedar policy bundle
# cluster.yaml
version: 1
metadata:
  name: company-brain
storage: s3://company/clusters/company-brain   # ledger, catalog, and graph data live here
graphs:
  knowledge:
    schema: people.pg
    queries: queries/          # every `query <name>` in queries/*.gq registers
policies:
  base:
    file: base.policy.yaml
    applies_to: [knowledge]    # graph-bound; use [cluster] for server-level
2. Stand up your object store. On-prem, run RustFS (or MinIO); Omnigraph
writes Lance to it over the standard S3
API. In the cloud, point the same AWS_* env at S3 / R2 / GCS instead.
3. Converge and run. apply creates each graph, applies its schema, and
publishes queries and policies into the content-addressed catalog. It is
idempotent; re-running is always safe.
omnigraph cluster validate # parse + typecheck everything
omnigraph cluster plan # preview what apply would do
omnigraph cluster apply # converge
# Boot the server from the cluster dir; storage resolves through cluster.yaml
omnigraph-server --cluster company-brain --bind 0.0.0.0:8080
See the cluster guide for the day-2 loop
(edit → plan → apply → restart), approval gates for destructive changes, drift
inspection, and recovery; the deployment guide for
containers, AWS/Railway, auth, and the full AWS_* contract.
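One way to see why re-running apply is safe is to model a content-addressed catalog: an artifact's key is a hash of its bytes, so publishing the same bytes twice changes nothing. The sketch below is a Python illustration under that assumption only. The `Catalog` class, the `apply` function, and the choice of SHA-256 are hypothetical and do not describe Omnigraph's actual catalog layout.

```python
import hashlib


class Catalog:
    """Toy content-addressed store: key = SHA-256 of the content (assumed hash)."""

    def __init__(self):
        self.objects = {}

    def publish(self, content: bytes):
        """Store content under its digest; return (digest, True if newly stored)."""
        digest = hashlib.sha256(content).hexdigest()
        created = digest not in self.objects
        if created:
            self.objects[digest] = content
        return digest, created


def apply(catalog: Catalog, declared: dict) -> list:
    """Converge the catalog to the declared artifacts; return names that caused a write."""
    changed = []
    for name, content in sorted(declared.items()):
        _, created = catalog.publish(content)
        if created:
            changed.append(name)
    return changed


# Hypothetical declared artifacts, keyed by file name.
declared = {
    "queries/people.gq": b"query people() { ... }",
    "base.policy.yaml": b"rules: []",
}
catalog = Catalog()
first_run = apply(catalog, declared)   # both artifacts are written
second_run = apply(catalog, declared)  # nothing changes
```

In this model the second run is a no-op because every digest is already present, which is the property that makes plan-then-apply loops repeatable.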
Query and mutate
Set a default server and graph once in ~/.omnigraph/config.yaml, and the
everyday commands stay short. Stored queries and mutations run by name:
omnigraph query search_docs --params '{"q":"AI safety"}'
omnigraph mutate add_person --params '{"name":"Mina"}'
# Branch, review, merge across the whole graph; agents write in isolation
omnigraph branch create --from main agent/ingest-42
omnigraph branch merge agent/ingest-42 --into main
An alias is shorter still: bind a server, graph, and stored query to one
name, then omnigraph alias triage runs it. For an ad-hoc target, any command
still takes --server <name|url> --graph <id> (or --store <uri> for a local
graph). See the CLI reference.
Security & governance
- Engine-wide enforcement: every write path goes through the same Cedar gate, so the HTTP server, the CLI, and the embedded SDK obey identical rules.
- Declared in the cluster: a policy bundle is bound to graphs (or the whole server) via policies: → applies_to.
- Scoped: rules apply per graph, per branch, or server-wide.
- No plaintext tokens: bearer tokens are hashed at startup and compared in constant time.
- Forge-proof identity: the actor is resolved server-side from the token; clients can't set it.
See the policy guide.
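The token-handling bullets above (hash at startup, constant-time comparison, actor resolved server-side) can be sketched with Python's standard library. This is an illustration of the general technique, not Omnigraph's code: the `TokenTable` class, its method names, and the use of SHA-256 are assumptions, since the source does not say which hash is used.

```python
import hashlib
import hmac


def hash_token(token: str) -> bytes:
    """Hash a bearer token (SHA-256 is an assumed choice)."""
    return hashlib.sha256(token.encode("utf-8")).digest()


class TokenTable:
    """Holds only token hashes; the plaintext is dropped after construction."""

    def __init__(self, tokens_by_actor: dict):
        self._hashes = {
            actor: hash_token(token) for actor, token in tokens_by_actor.items()
        }

    def resolve_actor(self, presented: str):
        """Return the actor bound to the presented token, or None if none matches."""
        candidate = hash_token(presented)
        matched = None
        # Check every entry with a constant-time comparison and no early exit.
        for actor, stored in self._hashes.items():
            if hmac.compare_digest(candidate, stored):
                matched = actor
        return matched


# Hypothetical actors and tokens for illustration.
table = TokenTable({"agent-ingest": "tok-123", "agent-review": "tok-456"})
```

Because the actor comes from the table lookup rather than from the request body, a client holding `tok-123` can only ever act as `agent-ingest`.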
Clients & SDKs
| Client | Use it for | Where |
|---|---|---|
| TypeScript SDK | typed access from Node / TS | @modernrelay/omnigraph · source |
| MCP server | bridge Omnigraph to LLM hosts (Claude, Codex, …) | @modernrelay/omnigraph-mcp |
| HTTP / OpenAPI | any language, the wire contract | the server's OpenAPI spec |
| Python SDK | typed access from Python | coming soon |
Both npm packages are versioned in lockstep with omnigraph-server.
Local quick test (no server)
1-min setup to try it: an embedded, local file-backed graph (no server, no object store). For dev and experiments; production is the deployed cluster above.
cat > schema.pg <<'PG'
node Signal { slug: String @key, title: String }
node Pattern { slug: String @key, name: String }
edge Indicates: Signal -> Pattern
PG
printf '%s\n' \
'{"type":"Signal","data":{"slug":"s1","title":"OSS model adoption surging"}}' \
'{"type":"Pattern","data":{"slug":"p1","name":"adoption"}}' \
'{"edge":"Indicates","from":"s1","to":"p1"}' > data.jsonl
omnigraph init --schema schema.pg ./graph.omni
omnigraph load --data data.jsonl --mode overwrite --store ./graph.omni
# "What pattern does signal s1 indicate?"
omnigraph query --store ./graph.omni \
-e 'query indicates() { match { $s: Signal { slug: "s1" } $s indicates $p } return { $p.name } }'
# → adoption
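To make the traversal concrete, the same question can be answered by hand over the three JSONL records. The Python sketch below mirrors what the query asks for (start at Signal `s1`, follow `Indicates` edges, return each Pattern's name). It is an illustration only; the `load` and `indicated_patterns` helpers are hypothetical and say nothing about how Omnigraph stores or executes the query.

```python
import json

# The same three records written to data.jsonl above.
DATA = """\
{"type":"Signal","data":{"slug":"s1","title":"OSS model adoption surging"}}
{"type":"Pattern","data":{"slug":"p1","name":"adoption"}}
{"edge":"Indicates","from":"s1","to":"p1"}
"""


def load(jsonl: str):
    """Split JSONL records into nodes keyed by (type, slug) and a list of edges."""
    nodes, edges = {}, []
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if "edge" in record:
            edges.append((record["edge"], record["from"], record["to"]))
        else:
            nodes[(record["type"], record["data"]["slug"])] = record["data"]
    return nodes, edges


def indicated_patterns(nodes, edges, signal_slug: str):
    """Names of Patterns reached from the given Signal over Indicates edges."""
    if ("Signal", signal_slug) not in nodes:
        return []
    return [
        nodes[("Pattern", dst)]["name"]
        for kind, src, dst in edges
        if kind == "Indicates" and src == signal_slug and ("Pattern", dst) in nodes
    ]


nodes, edges = load(DATA)
print(indicated_patterns(nodes, edges, "s1"))  # prints ['adoption']
```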
Docs
Build And Test
cargo build --workspace
cargo test --workspace
Notes:
- Rust stable toolchain, edition 2024
- CI runs cargo test --workspace --locked
- Full CI and some local test flows require protobuf-compiler
- S3 integration tests expect an S3-compatible endpoint such as RustFS
Workspace Crates
- crates/omnigraph-compiler: shared schema/query parser, typechecker, catalog, and IR lowering (zero Lance dependency)
- crates/omnigraph (package omnigraph-engine): storage/runtime, branching, merge, change detection, query execution, and embeddings
- crates/omnigraph-policy: Cedar policy compilation and enforcement
- crates/omnigraph-api-types: shared HTTP wire DTOs used by both the server and the CLI
- crates/omnigraph-cluster: cluster config validation, planning, and apply (the control plane)
- crates/omnigraph-server: Axum HTTP server, cluster-first, runs N graphs under /graphs/{id}/…
- crates/omnigraph-cli: CLI for graph lifecycle, query/mutate, branch/commit/merge, schema/lint, snapshot/export, cluster control, policy/queries, profiles, and maintenance
Contributing
Please open an issue, spec, or design discussion before sending large code changes. Design feedback and concrete problem statements are the fastest way to collaborate on the roadmap.
Community
Join the Omnigraph Slack community to ask questions, share feedback, and follow development.