omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-30 02:49:39 +02:00

Lakehouse-native graph engine with git-style workflows https://omnigraph.dev


Ragnor Comerford a7d4cba53d perf(engine): halve per-write __manifest scans (#307)	2026-06-27 13:18:04 +02:00

* test(write_cost): served-regime __manifest scan tripwire

Adds `internal_table_scans_grow_without_compaction`, the served-regime twin of `internal_table_scans_are_flat_in_history`. The flat gate `optimize()`s before every measured write, so it only proves the compacted invariant and stays green even when a served graph's per-write `__manifest` scan amplifies without bound. This tripwire measures the uncompacted regime and asserts the scan grows: green today, and it flips RED once the amplification is bounded (write-path warm-reuse + version-GC), at which point it inverts to a permanent `assert_flat` gate. RFC-013.

* perf(engine): halve per-write __manifest scans (RFC-013 PR2)

Cuts a same-branch write from ~4 to ~2 `__manifest` scans (measured 50->25 at depth 10, 410->205 at depth 100) with the OCC contract and snapshot isolation preserved:

- #1a probe-gate the OCC re-capture in `commit_all` via `occ_snapshot_for_branch` (mirrors the read path's `resolve_target_inner`): reuse the warm coordinator when a cheap incarnation probe proves it current, fall through to a cold read on mismatch.
- #1b fold the post-publish `known_state` in-memory from `existing_versions` plus the committed rows instead of an O(fragments) re-scan; extracted the shared `assemble_manifest_state` reduction so the fold is byte-identical to a scan, proven by the new `post_publish_fold_matches_fresh_reopen` test.
- #1c project `read_manifest_scan` to the columns it reads (drop `base_objects` always, `object_id` on the table-state path).

The two remaining publish scans (`load_publish_state` and the `use_index(false)` merge-insert join) stay O(fragments), bounded by compaction/version-GC (RFC-013 PR1, not in this change).

* test(manifest): reproduce owner-branch handoff fold desync

The PR #307 post-publish fold appends pending table_version rows after existing_versions, and assemble_manifest_state keeps the first equal-version entry. A same-version owner-branch handoff updates a table_version row in place at the same Lance version with a new table_branch (merge-insert UpdateAll on the deterministic version_object_id), so the warm coordinator keeps the stale fork while a fresh re-scan reflects the handoff. This test commits a handoff through the coordinator commit path (exercising the fold) and asserts the warm snapshot equals a fresh reopen. It is red against the current fold; the following commit turns it green. Flagged by Cursor Bugbot (High) and ChatGPT Codex (P2) on PR #307.

* fix(engine): fold table_version rows by (table_key, version) identity

fold_inputs now keys version entries by (table_key, table_version), the manifest row identity carried by the deterministic version_object_id that the merge-insert CAS uses. A pending row at the same identity replaces the pre-publish entry, mirroring merge-insert UpdateAll on disk. Previously the fold appended pending rows after existing_versions, so an owner-branch handoff left two equal-version entries and assemble_manifest_state retained the stale one. The fold input now carries the same one-row-per-(table_key, version) uniqueness a fresh scan produces, so both feed assemble_manifest_state equivalent inputs and the warm known_state stays byte-identical to read_manifest_state. This corrects the derivation's identity model structurally and applies to any same-version in-place update. Closes the PR #307 review finding.

* test(cost): enable lance-io test-util for IO request diagnostics

Gives IoStats.requests + assert_io_eq!, used by the cost harness to record the __manifest read log (method + path) for failure diagnostics. Dev-dependency only, so production builds (which exclude dev-deps) never compile it.

* test(cost): rebuild IO harness on GraphIoMeter + incremental_stats

Consolidate the per-op ProbeHandles into OpProbes plus a persistent GraphIoMeter, and read per-op deltas via lance's incremental_stats() (get-and-reset) instead of cumulative stats() -- the upstream per-request idiom (rust/lance/src/dataset/tests/dataset_io.rs). Add cost_harness(body): it installs one __manifest tracker for a whole test body, so the graph opens under it and every coordinator handle (init plus each post-publish reassignment) carries the same tracker. measure reuses that ambient tracker when present, making manifest_reads ground truth (warm probe plus cold scans, handle-age-irrelevant); outside cost_harness it falls back to a fresh per-op tracker (today's behavior). The body future is boxed so wrapping a whole test body does not overflow the test thread's stack. Also stash each op's __manifest read log on the meter for assert_io_eq!-style failure diagnostics (last_manifest_reads). Behavior-preserving: no test wraps its body in cost_harness yet, so measure takes the fallback path and every cost number is unchanged. write_cost and warm_read_cost stay green.

* test(write_cost): ground-truth __manifest counting via cost_harness

Wrap the three __manifest-asserting tests (flat, grow, ceiling) in cost_harness so manifest_reads is ground truth -- the warm-coordinator freshness probe rides a long-lived handle a per-op tracker installed at measure time cannot see. The flat/grow gates are depth-difference assertions, so the constant per-write probe offset cancels and they pass unchanged; the absolute ceiling is retightened from 34 to 24 (~18 measured = ~15 publish-path scans + ~3 probe RPCs) with the read log dumped on a breach. Add manifest_reads_capture_warm_probe: it measures the same warm write fresh-only and under cost_harness and asserts ground truth strictly exceeds fresh-only by the probe's RPCs (11 vs 14). Reverting the ground-truth wiring makes the two equal, so this guards that a write's warm-handle probe (3 object-store RPCs that were counted as a single version_probe) cannot silently escape manifest_reads again.

* test(warm_read_cost): ground-truth __manifest counting via cost_harness

Wrap the warm (== 0) manifest gates in cost_harness so manifest_reads is ground truth. A read's freshness probe is served from Lance's cached manifest at 0 object-store reads (unlike a write's probe, which re-reads after its commit), so the == 0 assertions hold with no re-baseline -- and now also catch any future warm-handle scan a per-op tracker would miss. The stale (> 0) tests are unaffected either way and stay on the fresh fallback.

* docs(testing): document ground-truth cost harness (GraphIoMeter)

The cost harness now reads incremental_stats() deltas and, under cost_harness, installs one __manifest tracker before the graph opens so manifest_reads is ground truth (handle-age-irrelevant). Note that version_probes is the probe call count and that ground truth reveals a write's probe does ~3 object-store RPCs.

* docs(rfc-013): bring write-path handoff current (Thread B + Phase 7 landed)

Prepend a current-state section (§A) for the __manifest scan-amplification / version-chain thread: the problem, what landed on main (step 2a, Phase 7 #299), what is in flight on this branch / PR #307 (PR2 scan-halving, the owner-branch handoff fold fix, the PR2.1 ground-truth cost harness), the accurate measurement (per-write __manifest ops ~50->410 pre-PR2 vs 28->208 ground truth; the hidden 3-RPC freshness probe), the remaining roadmap (PR1a manual cleanup, PR3-scoping, deferred PR1b/PR4), critical files, and gotchas. Staleness fixes: Phase 7 was listed as a future "step 4" but landed as #299, so mark it LANDED in the TL;DR landed list and in the remaining-steps section.

* docs(rfc-013): refresh PR307 handoff state

.cargo	Raise LANCE_MEM_POOL_SIZE to 1 GB in .cargo/config.toml	2026-04-19 22:27:49 +03:00
.context	Investigate Lance MergeInsertBuilder CAS granularity (MR-766 prereq)	2026-04-28 23:30:17 +00:00
.github	write-path cost gate + opener bypass (#288)	2026-06-20 13:31:15 +02:00
assets	docs(readme): drop em-dashes, Cursor→Codex, rename agent section (#274)	2026-06-17 02:36:14 +03:00
crates	perf(engine): halve per-write __manifest scans (#307)	2026-06-27 13:18:04 +02:00
docker	fix(cluster): stop cluster-apply crash-loops from the recovery-sidecar trap (#284)	2026-06-19 03:34:15 +03:00
docs	perf(engine): halve per-write __manifest scans (#307)	2026-06-27 13:18:04 +02:00
scripts	docs: onboarding-first README + in-repo agent skill + drop RustFS script (#257)	2026-06-16 11:48:13 +02:00
skills/omnigraph	docs: onboarding-first README + in-repo agent skill + drop RustFS script (#257)	2026-06-16 11:48:13 +02:00
.dockerignore	feat(docker): cluster-mode entrypoint and the CLI in the image	2026-06-10 22:44:54 +03:00
.gitignore	release: v0.5.0 (#115)	2026-05-23 13:59:42 +01:00
AGENTS.md	feat(engine): graph lineage in __manifest — single-source fold, v3→v4 migration, schema-version floor (#299 )	2026-06-25 13:55:34 +02:00
Cargo.lock	release: v0.7.2 (#301)	2026-06-25 09:08:12 +02:00
Cargo.toml	build(deps): bump Lance 6.0.1 → 7.0.0 (correct-by-design substrate alignment) (#229)	2026-06-14 20:42:24 +02:00
CLAUDE.md	Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it	2026-04-28 23:10:09 +02:00
CODE_OF_CONDUCT.md	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00
CONTRIBUTING.md	chore: remove CODEOWNERS chassis and the code-owner review gate	2026-06-18 02:55:27 +03:00
Dockerfile	feat(docker): cluster-mode entrypoint and the CLI in the image	2026-06-10 22:44:54 +03:00
GOVERNANCE.md	chore: remove CODEOWNERS chassis and the code-owner review gate	2026-06-18 02:55:27 +03:00
LICENSE	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00
og-cheet-sheet.md	feat: inline query strings in CLI and HTTP server (#110)	2026-05-29 13:41:54 +02:00
omnigraph.example.yaml	example config: use graphs / cli.graph, matching the MR-603 rename	2026-04-18 23:40:35 +03:00
openapi.json	release: v0.7.2 (#301)	2026-06-25 09:08:12 +02:00
README.md	docs(readme): drop em-dashes, Cursor→Codex, rename agent section (#274)	2026-06-17 02:36:14 +03:00
rust-toolchain.toml	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00
SECURITY.md	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00

README.md

OMNIGRAPH

Lakehouse graph database for context assembly & multi-agent coordination
Multimodal retrieval · Git-style branching · Object-storage native

Quickstart · Docs · Cookbooks · CLI

Omnigraph is the operational state and coordination layer for fleets of agents.
Run it as a server, declared as code; hundreds of agents operate and enrich the graph on parallel isolated branches, and every change is reviewed and merged safely.

Key capabilities

Capability	What it gives you
Declared as code	A `cluster.yaml` declares graphs, schemas, stored queries, embedding providers, and policies; `cluster apply` converges it and `omnigraph-server` brings every graph online at `/graphs/{id}/…`.
Built for fleets of agents	Hundreds of agents enrich the graph on parallel isolated branches; changes are reviewed and merged safely, Git-style, across the whole graph.
Multimodal retrieval	Graph traversal + vector ANN + full-text + Reciprocal Rank Fusion in one query runtime, for context assembly.
Security as code	Cedar policy enforced server-side on every mutation, per-graph and server-wide; bearer auth; actor/audit tracking.
Runs on your infrastructure	Any S3-compatible object store: on-prem via RustFS / MinIO, or AWS S3 / R2 / GCS. VPC, on-prem, hybrid; your data never leaves your store.
Open, versioned storage	`Lance` columnar format: branchable, time-travelable, with native blob-as-data (docs, images, video).
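
Reciprocal Rank Fusion, named in the Multimodal retrieval row, merges several ranked result lists by summing 1/(k + rank) for each item across the lists. The Python sketch below shows that general technique only. It is not Omnigraph's query runtime, and the function name, the sample ids, and k = 60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each list contributes 1 / (k + rank) to an id's score, with rank
    starting at 1. The value of k is an assumed default, not a documented
    Omnigraph setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a full-text search.
vector_hits = ["d2", "d1", "d3"]
text_hits = ["d1", "d4", "d2"]
fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

An id that ranks well in both lists (here d1 and d2) rises above ids that appear in only one, which is the property that makes the fusion useful for combining vector and full-text results.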

What you can build

Use case	What it's for
Company brain	Org knowledge unified into one graph every agent can query
Agentic memory	Durable, versioned memory: a branch per agent or per task, merged on review
Context graph	Decision traces and codified tribal knowledge for retrieval
Dev graph	Issues & dependency model that coding agents read and write
R&D / ML data layer	Experiments and trials written into branches, versioned for training & eval

Install

curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | bash

This installs omnigraph (CLI) and omnigraph-server into ~/.local/bin from published release binaries. Alternatively, install with Homebrew:

brew tap ModernRelay/tap
brew install ModernRelay/tap/omnigraph

Set it up with an AI agent

Omnigraph is built to be run by coding agents. Two ways in:

Teach your agent the playbook. This repo ships the omnigraph agent skill: the operational playbook covering cluster mode, the two config surfaces, schema evolution, query linting, data writes, branches, Cedar policy, and the common gotchas.

npx skills add ModernRelay/omnigraph@omnigraph

Or have an agent set it up from scratch. Paste this into Claude Code, Codex, or any agent that can read a URL and run a shell command:

Help me set up Omnigraph

1. Read the docs at https://github.com/ModernRelay/omnigraph, starting with
   docs/user/clusters/index.md, then docs/user/deployment.md.
2. Skim the starter graphs and seed data in the cookbooks:
   https://github.com/ModernRelay/omnigraph-cookbooks
3. Ask me what I want to build (company brain, agent memory, dev graph,
   research / R&D layer, …). Then stand up a cluster for it, load a little
   data, and run a query so I can see it working.

For ready-to-run graphs with real seed data (company brain, VC operating system, pharma & industry intel), ModernRelay/omnigraph-cookbooks is the fastest way to see Omnigraph shaped to a real domain.

Deploy

A deployment is a cluster: a multigraph config directory that declares its graphs, schemas, stored queries, and policies as code. You manage it Terraform-style: cluster plan previews the diff, cluster apply converges it. omnigraph-server then boots from the cluster and brings every graph online at /graphs/{id}/…, each behind its own policy.

1. Declare the cluster.

company-brain/
├── cluster.yaml
├── people.pg          # schema for the "knowledge" graph
├── queries/           # stored queries: the .gq files ARE the declaration
│   └── people.gq
└── base.policy.yaml   # a Cedar policy bundle

# cluster.yaml
version: 1
metadata:
  name: company-brain
storage: s3://company/clusters/company-brain   # ledger, catalog, and graph data live here
graphs:
  knowledge:
    schema: people.pg
    queries: queries/                          # every `query <name>` in queries/*.gq registers
policies:
  base:
    file: base.policy.yaml
    applies_to: [knowledge]                    # graph-bound; use [cluster] for server-level

2. Stand up your object store. On-prem, run RustFS (or MinIO); Omnigraph writes Lance to it over the standard S3 API. In the cloud, point the same AWS_* environment variables at S3 / R2 / GCS instead.

3. Converge and run. apply creates each graph, applies its schema, and publishes queries and policies into the content-addressed catalog. It is idempotent; re-running is always safe.

omnigraph cluster validate   # parse + typecheck everything
omnigraph cluster plan       # preview what apply would do
omnigraph cluster apply      # converge

# Boot the server from the cluster dir; storage resolves through cluster.yaml
omnigraph-server --cluster company-brain --bind 0.0.0.0:8080

See the cluster guide for the day-2 loop (edit → plan → apply → restart), approval gates for destructive changes, drift inspection, and recovery; the deployment guide for containers, AWS/Railway, auth, and the full AWS_* contract.
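
The plan and apply loop follows the usual declarative pattern: diff the declared state against the live state, then execute only the difference, so a second run finds nothing to do. The Python sketch below illustrates that pattern under simplifying assumptions. It is not Omnigraph's planner: the resource names, the version strings, and both functions are hypothetical, and it deletes directly, whereas the cluster guide describes approval gates for destructive changes.

```python
def plan(desired, actual):
    """Return the (action, name) steps needed to make actual match desired."""
    steps = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != desired[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


def apply_plan(desired, actual):
    """Execute the plan against a copy of actual and return the new state."""
    new_state = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = desired[name]
    return new_state


# Hypothetical declared and live states, keyed by resource name.
desired = {"graph:knowledge": "schema-v2", "policy:base": "bundle-v1"}
actual = {"graph:knowledge": "schema-v1"}

first = plan(desired, actual)            # what apply would do
converged = apply_plan(desired, actual)  # converge
second = plan(desired, converged)        # empty: re-running is a no-op
print(first, second)
```

Because apply executes exactly what plan computes, re-running it after convergence changes nothing, which is what idempotent means here.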

Query and mutate

Set a default server and graph once in ~/.omnigraph/config.yaml, and the everyday commands stay short. Stored queries and mutations run by name:

omnigraph query  search_docs --params '{"q":"AI safety"}'
omnigraph mutate add_person  --params '{"name":"Mina"}'

# Branch, review, merge across the whole graph; agents write in isolation
omnigraph branch create --from main agent/ingest-42
omnigraph branch merge  agent/ingest-42 --into main

An alias is shorter still: bind a server, graph, and stored query to one name, then omnigraph alias triage runs it. For an ad-hoc target, any command still takes --server <name|url> --graph <id> (or --store <uri> for a local graph). See the CLI reference.
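
When a script or agent drives the CLI, building the `--params` JSON with a serializer avoids hand-quoting mistakes. The Python sketch below assembles an argument list for the two commands shown above. The helper name is hypothetical, and it uses only the subcommands and the `--params` flag that appear in this README.

```python
import json


def stored_query_argv(verb, name, params):
    """Build the argv for running a stored query or mutation by name."""
    if verb not in ("query", "mutate"):
        raise ValueError("verb must be 'query' or 'mutate'")
    # json.dumps handles quoting and escaping inside the parameter values.
    payload = json.dumps(params, separators=(",", ":"))
    return ["omnigraph", verb, name, "--params", payload]


query_argv = stored_query_argv("query", "search_docs", {"q": "AI safety"})
mutate_argv = stored_query_argv("mutate", "add_person", {"name": "Mina"})
print(query_argv)
print(mutate_argv)
```

Passing the list to `subprocess.run(query_argv, check=True)` runs the command without a shell, so the JSON payload needs no extra quoting.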

Security & governance

Engine-wide enforcement: every write path goes through the same Cedar gate, so the HTTP server, the CLI, and the embedded SDK obey identical rules.
Declared in the cluster: a policy bundle is bound to graphs (or the whole server) via policies: → applies_to.
Scoped: rules apply per graph, per branch, or server-wide.
No plaintext tokens: bearer tokens are hashed at startup and compared in constant time.
Forge-proof identity: the actor is resolved server-side from the token; clients can't set it.

See the policy guide.
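
The two token properties above (hashed at startup, compared in constant time, with the actor resolved server-side) can be illustrated with Python's standard library. This is a sketch of the general technique, not the server's code: the class name, the choice of SHA-256, and the sample actor and token are assumptions.

```python
import hashlib
import hmac


def hash_token(token):
    """Return a digest of the token; SHA-256 is an assumed choice."""
    return hashlib.sha256(token.encode("utf-8")).digest()


class TokenTable:
    """Keeps only token digests and maps a presented token to an actor."""

    def __init__(self, tokens_by_actor):
        # Hash once at startup; the plaintext tokens are not retained.
        self._digests = {
            actor: hash_token(token) for actor, token in tokens_by_actor.items()
        }

    def resolve_actor(self, presented_token):
        """Return the actor for a valid token, or None if nothing matches."""
        presented = hash_token(presented_token)
        found = None
        for actor, stored in self._digests.items():
            # compare_digest takes time independent of where the bytes differ.
            if hmac.compare_digest(stored, presented):
                found = actor
        return found


table = TokenTable({"agent-42": "s3cret-token"})
print(table.resolve_actor("s3cret-token"))
print(table.resolve_actor("wrong-token"))
```

The caller never supplies an actor name: identity comes only from which stored digest the presented token matches.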

Clients & SDKs

Client	Use it for	Where
TypeScript SDK	typed access from Node / TS	`@modernrelay/omnigraph` · source
MCP server	bridge Omnigraph to LLM hosts (Claude, Codex, …)	`@modernrelay/omnigraph-mcp`
HTTP / OpenAPI	any language, the wire contract	the server's OpenAPI spec
Python SDK	typed access from Python	coming soon

Both npm packages are versioned in lockstep with omnigraph-server.

Local quick test (no server)

A one-minute setup to try it: an embedded, local, file-backed graph (no server, no object store). It is meant for development and experiments; production is the deployed cluster above.

cat > schema.pg <<'PG'
node Signal  { slug: String @key, title: String }
node Pattern { slug: String @key, name: String }
edge Indicates: Signal -> Pattern
PG
printf '%s\n' \
  '{"type":"Signal","data":{"slug":"s1","title":"OSS model adoption surging"}}' \
  '{"type":"Pattern","data":{"slug":"p1","name":"adoption"}}' \
  '{"edge":"Indicates","from":"s1","to":"p1"}' > data.jsonl

omnigraph init  --schema schema.pg ./graph.omni
omnigraph load  --data data.jsonl --mode overwrite --store ./graph.omni

# "What pattern does signal s1 indicate?"
omnigraph query --store ./graph.omni \
  -e 'query indicates() { match { $s: Signal { slug: "s1" }  $s indicates $p } return { $p.name } }'
# → adoption
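
For larger seed files, the same JSONL rows can be generated rather than typed by hand. The Python sketch below reproduces the three lines from the quick test, following the row shapes shown there (nodes as type plus data, edges as edge, from, to). The helper names are hypothetical, and the row format is inferred from this example only.

```python
import json


def node(type_name, **fields):
    """Build a node row in the shape used by the quick test above."""
    return {"type": type_name, "data": fields}


def edge(edge_name, src, dst):
    """Build an edge row that references node keys."""
    return {"edge": edge_name, "from": src, "to": dst}


def to_jsonl(records):
    """Serialize records as one compact JSON object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


records = [
    node("Signal", slug="s1", title="OSS model adoption surging"),
    node("Pattern", slug="p1", name="adoption"),
    edge("Indicates", "s1", "p1"),
]
jsonl = to_jsonl(records)
print(jsonl, end="")
```

Writing `jsonl` to `data.jsonl` yields a file equivalent to the one the `printf` command creates, ready for `omnigraph load`.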

Docs

Cluster guide · Deployment guide · CLI reference
Schema · Queries · Search · Policy

Build And Test

cargo build --workspace
cargo test  --workspace

Notes:

Rust stable toolchain, edition 2024
CI runs cargo test --workspace --locked
Full CI and some local test flows require protobuf-compiler
S3 integration tests expect an S3-compatible endpoint such as RustFS

Workspace Crates

crates/omnigraph-compiler: shared schema/query parser, typechecker, catalog, and IR lowering (zero Lance dependency)
crates/omnigraph (package omnigraph-engine): storage/runtime, branching, merge, change detection, query execution, and embeddings
crates/omnigraph-policy: Cedar policy compilation and enforcement
crates/omnigraph-api-types: shared HTTP wire DTOs used by both the server and the CLI
crates/omnigraph-cluster: cluster config validation, planning, and apply (the control plane)
crates/omnigraph-server: Axum HTTP server, cluster-first, runs N graphs under /graphs/{id}/…
crates/omnigraph-cli: CLI for graph lifecycle, query/mutate, branch/commit/merge, schema/lint, snapshot/export, cluster control, policy/queries, profiles, and maintenance

Contributing

Please open an issue, spec, or design discussion before sending large code changes. Design feedback and concrete problem statements are the fastest way to collaborate on the roadmap.

Community

Join the Omnigraph Slack community to ask questions, share feedback, and follow development.