|
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
CI / Container Entrypoint (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
Release Edge / Build edge omnigraph-windows-x86_64 (push) Has been cancelled
Release Edge / Smoke Windows installer (push) Has been cancelled
* docs(rfc-013): step-3b handoff + §4.1 corrections (validated)
Add the RFC-013 write-path handoff doc, and correct §4.1's WriteTxn sketch from the
4-subagent validation against current code:
- HandleCache → handle-threading (forward the commit-return handle; a version-keyed
cache misses because HEAD walks N→N+1→N+2 across staging + index-build commits).
- "re-resolution unrepresentable" softened to "pinned base for the pre-commit phase +
named fresh re-reads at the commit/fork boundary" — three reads (commit-time OCC, the
live-HEAD drift probe, fork authority) are irreducible correctness machinery.
- WriteParams DOES carry a session field; the real constraint is "stage off an open
Dataset," so attach the Session by opening read-style then staging off it.
* test(engine): RED step-3b capture-once fitness asserts + open_count probe
Two write-path cost gates, RED today, GREEN after the WriteTxn lands:
- write_validates_schema_contract_once: a write must validate the schema contract
once (3 read_text + 2 exists). Today re-validates at every resolve point —
measured 12 read_text / 9 exists (~4 validations) via CountingStorageAdapter
(zero production change; the write twin of the read-path schema-once test).
- keyed_insert_opens_table_at_most_once: a keyed single-table write must open its
table <=1x. Today measured 10 opens.
Adds an exact open-CALL probe: open_count + record_open() on QueryIoProbes (mirroring
probe_count/record_probe), called at both open chokepoints; surfaced as
IoCounts.open_count. forbidden_apis guarantees every write open routes through them.
* feat(engine): WriteTxn carrier + open_write_txn (3b scaffolding)
The capture-once write transaction (RFC-013 step 3b): WriteTxn{branch, base:
Snapshot, session} + Omnigraph::open_write_txn, which validates the schema contract
once and pins the base snapshot + the shared per-graph Session.
Landed as reviewed scaffolding (gated #[allow(dead_code)]); the next pass threads
Option<&WriteTxn> through open_for_mutation_on_branch / staging on the non-strict
bound-branch path — opening the base once from the pinned entry with the warm session
(a session-aware pinned opener returning a SnapshotHandle) and skipping the per-table
schema re-validation — to turn the two RED cost gates green. Strict ops / fork / the
commit-time OCC re-read keep their fresh reads.
* test(engine): scope write-path open_count to data tables (RFC-013 step 3b)
The keyed_insert_opens_table_at_most_once gate asserted open_count <= 1, but
open_count was a single unclassified counter: record_open() fires in both
open chokepoints, and open_dataset_tracked also opens the internal/system
tables (__manifest via layout.rs, _graph_commits/_graph_commit_actors via
commit_graph.rs). So the count conflated data-table opens with the publisher
CAS + commit-graph append opens — making the gate measure the wrong quantity
and unreachable by threading alone (the manifest publish keeps it >1 regardless).
Scope it by table class, mirroring the read-side counters (which already split
by URI prefix via separate wrappers): record_open(uri) classifies the open's
last path segment and feeds data_open_count vs internal_open_count. IoCounts
exposes both; the gate now asserts data_open_count <= 1.
Re-baselined: a single keyed insert is data_open_count=4 / internal_open_count=6
(sum 10, the old conflated value). The RED target for the WriteTxn threading is
now the real data-table-open count (4 -> 1), with internal opens correctly out
of scope. Pure test-harness/instrumentation; no production behavior change
(classification runs only inside the probe closure, skipped when no probes are
installed).
Also marks #297 (optimize-vs-write race) as landed in the step-3b handoff —
this branch is already stacked on origin/main after it merged.
* feat(engine): validate the schema contract once per write (RFC-013 step 3b)
A single mutate/load re-validated the schema contract ~4 times: at the entry
(ensure_schema_state_valid), per-table in open_for_mutation_on_branch
(resolved_branch_target), at the commit-time OCC re-read (fresh_snapshot_for_branch),
and in the publisher's index-build snapshot (snapshot_for_branch). Each validation
is 3 read_text + 2 exists on the storage adapter — O(touched resolve-points) of
redundant contract I/O on every write.
Thread the already-landed WriteTxn carrier through the write path: capture
`txn = open_write_txn(branch)` once at the mutate/load entry (the single validation),
then source the per-table entry and the commit/publish snapshots from `txn.base`
instead of re-resolving. When `txn` is None (branch merge, schema apply, tests) every
function is byte-identical to before.
- mutate_with_current_actor / load_jsonl_reader capture txn once (replacing the
entry-point ensure_schema_state_valid) and thread Some(&txn) through
execute_*/open_table_for_mutation, commit_all, and
commit_updates_on_branch_with_expected.
- open_for_mutation_on_branch sources (snapshot, branch) from txn.base/txn.branch
when present — skipping resolved_branch_target's re-validation. The OPEN itself is
unchanged (still HEAD via open_dataset_head_for_write), and strict ops keep
ensure_expected_version. Schema-once applies to strict and non-strict alike; the
data-open collapse is a separate change.
- commit_all uses fresh_snapshot_for_branch_unchecked (the OCC manifest re-read minus
the schema re-validation) when txn is present; the drift guard is unchanged.
- prepare_updates_for_commit uses txn.base for the publisher index-build snapshot.
fresh_snapshot_for_branch{,_unchecked} now read the manifest directly via
ManifestCoordinator instead of resolve_target. The OCC re-read consumes only the
Snapshot (per-table location + version), which ManifestCoordinator::open().snapshot()
produces identically — but resolve_target additionally opened the commit graph (a
spurious _graph_commits.lance exists probe the OCC read never consults). Dropping that
load is a pure read-cost reduction for every fresh-snapshot caller (commit_all's None
arm, optimize, repair, fork reclaim); the returned Snapshot is unchanged and the read
is a fresher cold manifest re-read, so the OCC freshness guarantee is preserved.
Greens write_validates_schema_contract_once (3 read_text / 2 exists, was 12/9).
keyed_insert_opens_table_at_most_once stays red (data_open_count=4) — the open
collapse lands next. Full engine suite green otherwise.
* feat(engine): open each data table once per write (RFC-013 step 3b)
A single keyed-node mutate opened its data table 4 times: accumulation (to read
.version()), staging (the real write base), the commit-time drift guard (to read
live HEAD), and the publisher's index build (reopen at the just-committed version).
Collapse three of the four — using the WriteTxn carrier threaded for schema-once —
so a write opens each touched data table at most once.
- #1 accumulation: open_for_mutation_on_branch now returns
(Option<SnapshotHandle>, expected_version, full_path, table_branch). On the txn's
own branch, a non-strict (Insert/Merge) op needs no open — the only thing the
caller reads is .version() (the CAS fence), which is exactly the pinned base
version (entry.table_version). So skip open_dataset_head_for_write and source the
version from txn.base. The node insert path already discarded that handle; the
edge path resolves a pinned read only when non-default cardinality needs it.
STRICT ops and any write that must fork still open live HEAD + ensure_expected_version.
- #3 commit drift guard: commit_all reads live HEAD via
entry.dataset.dataset().latest_version_id() — a cheap manifest-pointer probe off
the already-open staging handle (the same primitive ManifestCoordinator::
probe_latest_version uses) instead of a fresh open_dataset_head_for_write. The
head<current / head>current drift classification is byte-identical.
- #4 index build: commit_all now returns the per-table post-commit_staged
SnapshotHandle map; commit_updates_on_branch_with_expected threads it into
prepare_updates_for_commit, which builds indices on the threaded handle instead of
reopening at the same just-committed version. Absent a handle (other writers,
inline/delete tables) the reopen path is byte-identical.
When txn is None (branch merge, schema apply, tests) every function opens and checks
exactly as before. Greens keyed_insert_opens_table_at_most_once (data_open_count 4->1).
Schema-once gate stays 3/2. Full engine suite + failpoints (recovery sidecar lifecycle)
green.
* refactor(engine): name the write-path open/commit returns (RFC-013 step 3b)
The open collapse left two positional returns that are easy to mis-thread and
carry an unwritten contract: open_for_mutation_on_branch's
(Option<SnapshotHandle>, u64, String, Option<String>) and commit_all's 5-tuple
(updates, expected_versions, sidecar_handle, guards, committed_handles). Replace
both with named structs so each field reads at the call site and the Option's
contract is documented, not folklore.
- OpenedForMutation { handle, expected_version, full_path, table_branch } with a
require_handle(ctx) helper for the callers that must have a handle (strict ops,
the fork path, every no-txn caller — branch merge, the seed test). The handle is
None only on the non-strict-txn open-skip path (collapse #1); require_handle
panics with a named context if that contract is ever broken.
- CommittedMutation { updates, expected_versions, sidecar_handle, guards,
committed_handles } for commit_all; consumers destructure into the same local
bindings they already used, so the publish/sidecar/guard-hold logic is unchanged.
- A debug_assert in open_table_for_mutation pins the skip contract: a missing handle
is legal only on the non-strict txn path, so a future strict arm returning None
trips in debug builds instead of handing None to a require_handle consumer.
Pure refactor — no behavior change. Both cost gates stay green (schema 3/2,
data_open_count=1), full engine suite + lib (162) green.
* refactor(engine): drop the unearned session field from WriteTxn (RFC-013 step 3b)
The open collapse greens data_open_count<=1 by SKIPPING the accumulation open,
PROBING live HEAD with latest_version_id, and REUSING the commit_staged handle —
none of which consume a session. The captured WriteTxn.session was therefore dead
(`#[allow(dead_code)]`): unearned surface a reviewer rightly flags.
Remove it. The carrier is now {branch, base} — exactly what schema-once + the open
collapse use. Step 5 (PublishPlan unification) makes WriteTxn the non-optional
publish carrier and is the right home for session-aware base opens, where the
warm-session benefit on the single remaining open — an object-store (S3) phenomenon,
invisible on local FS — can be earned by its own cost gate rather than carried dead
through this PR.
No behavior change; both cost gates stay green (schema 3/2, data_open_count=1).
* docs(rfc-013): mark step 3b DONE — schema-once + open-collapse shipped, session deferred to step 5
* docs(rfc-013): capture the write-base-staleness convergence (§1d)
Three findings this cycle share one root — the write base is a stale, un-probed,
un-classified pin (the read path probes; the write path returns the warm
coordinator snapshot):
- #298 edge-@card stale-read regression (cursor High / codex P1, VALID): collapse #1
made the cardinality scan read txn.base instead of live HEAD, so a concurrent edge
is uncounted and a max can be exceeded. Fix on #298: restore the live-HEAD read +
deterministic test + correct the single-writer doc comment.
- The structural liability underneath: no unified write-validation read-set —
endpoint/cardinality/uniqueness each pick freshness ad hoc (warm/pinned/live),
the same cardinality check forks mutation-vs-loader, none re-validated at commit.
- The served-strict-write stale-view false-fail (validated on prod + a #[ignore]
repro): a strict update/delete false-fails ExpectedVersionMismatch after an external
optimize advance — the write-side mirror of #297/§6.6. The naive blanket probe is
proven wrong (breaks the cross-process lost-update OCC contract).
All three converge on Design A (step 5): open_txn's warm probe makes the base fresh,
the op-class-aware precondition (derive maintenance vs logical from Lance per-version
transaction metadata — no parallel marker) fast-forwards maintenance and fails logical,
and §7.1's read-set-in-CAS unifies + re-validates the validation read-set. §8 records
the #298 follow-up, the widened §7.1 scope, and the step-5 two-test acceptance contract.
* test(engine): RED — edge @card must scan live HEAD, not stale txn.base (#298)
Regression guard for the cursor-High/codex-P1 finding on #298: 3b's collapse #1
made the non-strict edge-insert cardinality scan read the pinned txn.base instead
of live HEAD (edge_cardinality_read_handle), so a concurrent edge committed after
txn capture is uncounted and a @card max is silently exceeded (invariant 9).
Deterministic two-handle test (no failpoint): handle A commits WorksAt(Alice->Acme)
to the @card(0..1) max; stale handle B (never read since) inserts a second WorksAt
for Alice. B's coordinator is stale by construction (the write path doesn't probe),
so B scans txn.base (Alice has 0) and wrongly commits the 2nd edge. RED: the insert
that must be rejected currently succeeds (panics at unwrap_err). Goes green when the
scan reads live HEAD.
* fix(engine): scan live HEAD for edge @card, not the pinned txn.base (#298)
3b's collapse #1 skips the non-strict edge accumulation open, so edge_cardinality_
read_handle reopened the edge table at the pinned txn.base for the @card scan. Since
cardinality is validated once (never rechecked at commit), a concurrent edge committed
after txn capture was uncounted and a @card max could be silently exceeded (invariant
9) — the cursor-High/codex-P1 regression on #298. Pre-3b the scan read live HEAD (the
mutation's own open_dataset_head_for_write handle).
Restore the live-HEAD read: take the table LOCATION from the pinned entry (stable
across versions) and open the dataset at its current HEAD via open_dataset_head_for_
write. Gate-safe — the data_open_count / merge-insert-only gates are node inserts; the
edge cardinality path (non-default @card only) is untouched by them, and the extra
live-HEAD open is exactly the pre-3b shape. Also drops the dead None-fallback's schema
re-validation (greptile P2, auto-resolved). The residual validate->commit TOCTOU is the
pre-existing §7.1 gap (RFC-013 step 4), recorded in handoff §1d/§8.
Turns cardinality_rejected_for_stale_handle_after_concurrent_edge_commit green;
validators / write_cost / writes / consistency / end_to_end / branching all green.
* docs(dev): link handoff docs from index
* docs(engine): tighten 3b claims to match the code (#298 review)
Review caught several comments/docs overclaiming what the code does (the session
drop + the #298 cardinality fix left stale/too-strong wording). No logic change.
- open_write_txn doc: drop the stale "shared per-graph Session" (WriteTxn no longer
carries one); scope "once" to the table-touch hot path and note edge/load RI
validation still re-resolves (→ step 4 §7.1) + the session-aware open is step 5.
- edge cardinality call-site comment: it said the scan uses a "pinned txn.base" — it
now opens LIVE HEAD (#298); corrected.
- write_cost.rs: "opens the base once (with the shared Session)" → session-aware base
open is deferred to step 5.
- data_open_count completeness (instrumentation.rs + write_cost.rs): forbidden_apis
only keeps engine code OUTSIDE the storage layer on the chokepoints; table_store.rs
is allow-listed and holds direct Dataset::opens for branch-management ops (not the
keyed-write hot path the gate measures). Narrowed the claim accordingly.
- handoff §4: "schema once / open once" is the node hot path (the two gates); edge
endpoint + loader RI/cardinality still re-validate and read warm — #298 un-regresses
cardinality only, it does NOT close write-validation freshness (that's step 4 §1d/§7.1).
build clean; write_cost / validators / forbidden_apis green.
|
||
|---|---|---|
| .cargo | ||
| .context | ||
| .github | ||
| assets | ||
| crates | ||
| docker | ||
| docs | ||
| scripts | ||
| skills/omnigraph | ||
| .dockerignore | ||
| .gitignore | ||
| AGENTS.md | ||
| Cargo.lock | ||
| Cargo.toml | ||
| CLAUDE.md | ||
| CODE_OF_CONDUCT.md | ||
| CONTRIBUTING.md | ||
| Dockerfile | ||
| GOVERNANCE.md | ||
| LICENSE | ||
| og-cheet-sheet.md | ||
| omnigraph.example.yaml | ||
| openapi.json | ||
| README.md | ||
| rust-toolchain.toml | ||
| SECURITY.md | ||
Lakehouse graph database for context assembly & multi-agent coordination
Multimodal retrieval · Git-style branching · object-storage native
Quickstart · Docs · Cookbooks · CLI
Omnigraph is the operational state and coordination layer for fleets of agents.
Run it as a server, declared as code; hundreds of agents operate and enrich the graph on parallel isolated branches, and every change is reviewed and merged safely.
Key capabilities
| Capability | What it gives you |
|---|---|
| Declared as code | A cluster.yaml declares graphs, schemas, stored queries, embedding providers, and policies; cluster apply converges it and omnigraph-server brings every graph online at /graphs/{id}/…. |
| Built for fleets of agents | Hundreds of agents enrich the graph on parallel isolated branches; changes are reviewed and merged safely, Git-style, across the whole graph. |
| Multimodal retrieval | Graph traversal + vector ANN + full-text + Reciprocal Rank Fusion in one query runtime, for context assembly. |
| Security as code | Cedar policy enforced server-side on every mutation, per-graph and server-wide; bearer auth; actor/audit tracking. |
| Runs on your infrastructure | Any S3-compatible object store: on-prem via RustFS / MinIO, or AWS S3 / R2 / GCS. VPC, on-prem, hybrid; your data never leaves your store. |
| Open, versioned storage | Lance columnar format: branchable, time-travelable, with native blob-as-data (docs, images, video). |
What you can build
| Use case | What it's for |
|---|---|
| Company brain | Org knowledge unified into one graph every agent can query |
| Agentic memory | Durable, versioned memory: a branch per agent or per task, merged on review |
| Context graph | Decision traces and codified tribal knowledge for retrieval |
| Dev graph | Issues & dependency model that coding agents read and write |
| R&D / ML data layer | Experiments and trials written into branches, versioned for training & eval |
Install
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | bash
This installs omnigraph (CLI) and omnigraph-server into ~/.local/bin from
published release binaries. Or with Homebrew:
brew tap ModernRelay/tap
brew install ModernRelay/tap/omnigraph
Set it up with an AI agent
Omnigraph is built to be run by coding agents. Two ways in:
Teach your agent the playbook. This repo ships the
omnigraph agent skill: the operational playbook
covering cluster mode, the two config surfaces, schema evolution, query linting,
data writes, branches, Cedar policy, and the common gotchas.
npx skills add ModernRelay/omnigraph@omnigraph
Or have an agent set it up from scratch. Paste this into Claude Code, Codex, or any agent that can read a URL and run a shell command:
Help me set up Omnigraph
1. Read the docs at https://github.com/ModernRelay/omnigraph, starting with
docs/user/clusters/index.md, then docs/user/deployment.md.
2. Skim the starter graphs and seed data in the cookbooks:
https://github.com/ModernRelay/omnigraph-cookbooks
3. Ask me what I want to build (company brain, agent memory, dev graph,
research / R&D layer, …). Then stand up a cluster for it, load a little
data, and run a query so I can see it working.
For ready-to-run graphs with real seed data (company brain, VC operating system,
pharma & industry intel),
ModernRelay/omnigraph-cookbooks
is the fastest way to see Omnigraph shaped to a real domain.
Deploy
A deployment is a cluster: a multigraph config directory that declares
its graphs, schemas, stored queries, and policies as code. You manage it
Terraform-style: cluster plan previews the diff, cluster apply converges
it. omnigraph-server then boots from the cluster and brings every graph online
at /graphs/{id}/…, each behind its own policy.
1. Declare the cluster.
company-brain/
├── cluster.yaml
├── people.pg # schema for the "knowledge" graph
├── queries/ # stored queries: the .gq files ARE the declaration
│ └── people.gq
└── base.policy.yaml # a Cedar policy bundle
# cluster.yaml
version: 1
metadata:
name: company-brain
storage: s3://company/clusters/company-brain # ledger, catalog, and graph data live here
graphs:
knowledge:
schema: people.pg
queries: queries/ # every `query <name>` in queries/*.gq registers
policies:
base:
file: base.policy.yaml
applies_to: [knowledge] # graph-bound; use [cluster] for server-level
2. Stand up your object store. On-prem, run RustFS (or MinIO); Omnigraph
writes Lance to it over the standard S3
API. In the cloud, point the same AWS_* env at S3 / R2 / GCS instead.
3. Converge and run. apply creates each graph, applies its schema, and
publishes queries and policies into the content-addressed catalog. It is
idempotent; re-running is always safe.
omnigraph cluster validate # parse + typecheck everything
omnigraph cluster plan # preview what apply would do
omnigraph cluster apply # converge
# Boot the server from the cluster dir; storage resolves through cluster.yaml
omnigraph-server --cluster company-brain --bind 0.0.0.0:8080
See the cluster guide for the day-2 loop
(edit → plan → apply → restart), approval gates for destructive changes, drift
inspection, and recovery; the deployment guide for
containers, AWS/Railway, auth, and the full AWS_* contract.
Query and mutate
Set a default server and graph once in ~/.omnigraph/config.yaml, and the
everyday commands stay short. Stored queries and mutations run by name:
omnigraph query search_docs --params '{"q":"AI safety"}'
omnigraph mutate add_person --params '{"name":"Mina"}'
# Branch, review, merge across the whole graph; agents write in isolation
omnigraph branch create --from main agent/ingest-42
omnigraph branch merge agent/ingest-42 --into main
An alias is shorter still: bind a server, graph, and stored query to one
name, then omnigraph alias triage runs it. For an ad-hoc target, any command
still takes --server <name|url> --graph <id> (or --store <uri> for a local
graph). See the CLI reference.
Security & governance
- Engine-wide enforcement: every write path goes through the same Cedar gate, so the HTTP server, the CLI, and the embedded SDK obey identical rules.
- Declared in the cluster: a policy bundle is bound to graphs (or the whole server) via
policies:→applies_to. - Scoped: rules apply per graph, per branch, or server-wide.
- No plaintext tokens: bearer tokens are hashed at startup and compared in constant time.
- Forge-proof identity: the actor is resolved server-side from the token; clients can't set it.
See the policy guide.
Clients & SDKs
| Client | Use it for | Where |
|---|---|---|
| TypeScript SDK | typed access from Node / TS | @modernrelay/omnigraph · source |
| MCP server | bridge Omnigraph to LLM hosts (Claude, Codex, …) | @modernrelay/omnigraph-mcp |
| HTTP / OpenAPI | any language, the wire contract | the server's OpenAPI spec |
| Python SDK | typed access from Python | coming soon |
Both npm packages are versioned in lockstep with omnigraph-server.
Local quick test (no server)
1-min setup to try it: an embedded, local file-backed graph (no server, no object store). For dev and experiments; production is the deployed cluster above.
cat > schema.pg <<'PG'
node Signal { slug: String @key, title: String }
node Pattern { slug: String @key, name: String }
edge Indicates: Signal -> Pattern
PG
printf '%s\n' \
'{"type":"Signal","data":{"slug":"s1","title":"OSS model adoption surging"}}' \
'{"type":"Pattern","data":{"slug":"p1","name":"adoption"}}' \
'{"edge":"Indicates","from":"s1","to":"p1"}' > data.jsonl
omnigraph init --schema schema.pg ./graph.omni
omnigraph load --data data.jsonl --mode overwrite --store ./graph.omni
# "What pattern does signal s1 indicate?"
omnigraph query --store ./graph.omni \
-e 'query indicates() { match { $s: Signal { slug: "s1" } $s indicates $p } return { $p.name } }'
# → adoption
Docs
Build And Test
cargo build --workspace
cargo test --workspace
Notes:
- Rust stable toolchain, edition 2024
- CI runs
cargo test --workspace --locked - Full CI and some local test flows require
protobuf-compiler - S3 integration tests expect an S3-compatible endpoint such as RustFS
Workspace Crates
crates/omnigraph-compiler: shared schema/query parser, typechecker, catalog, and IR lowering (zero Lance dependency)crates/omnigraph(packageomnigraph-engine): storage/runtime, branching, merge, change detection, query execution, and embeddingscrates/omnigraph-policy: Cedar policy compilation and enforcementcrates/omnigraph-api-types: shared HTTP wire DTOs used by both the server and the CLIcrates/omnigraph-cluster: cluster config validation, planning, and apply (the control plane)crates/omnigraph-server: Axum HTTP server, cluster-first, runs N graphs under/graphs/{id}/…crates/omnigraph-cli: CLI for graph lifecycle, query/mutate, branch/commit/merge, schema/lint, snapshot/export, cluster control, policy/queries, profiles, and maintenance
Contributing
Please open an issue, spec, or design discussion before sending large code changes. Design feedback and concrete problem statements are the fastest way to collaborate on the roadmap.
Community
Join the Omnigraph Slack community to ask questions, share feedback, and follow development.