Lakehouse-native graph engine with git-style workflows https://omnigraph.dev
Find a file
Andrew Altshuler cb80fa40f1
exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113)
* exec/query: pushdown IR filters via DataFusion Expr (Scanner::filter_expr)

Switches `execute_node_scan` from string-flattened Lance SQL pushdown
(`build_lance_filter` + `scanner.filter(&str)`) to structured DataFusion
Expr pushdown (`build_lance_filter_expr` + `scanner.filter_expr(Expr)`).

## What this enables

1. **`CompOp::Contains` now pushes down.** `ir_filter_to_sql` returned
   `None` for list-contains (the comment said *"Can't pushdown list
   contains"*) because string SQL can't easily express it. With Expr,
   it lowers to DataFusion's `array_has(col, value)` builtin via the
   `nested_expressions` feature, and pushes down to Lance's scan layer
   the same way Eq/Lt/etc. do. Pinned by the new regression test
   `end_to_end::ir_filter_with_list_contains_pushes_down`.

2. **DataFusion 53's optimizer rules now reach our predicates.** Once
   the Expr lands at the Lance scanner, DF's planner runs:
   - `IN`-list vectorized eq kernel (DF #20528)
   - `PhysicalExprSimplifier` (DF #20111)
   - CASE WHEN x THEN y ELSE NULL shortcut (DF #20097)
   - Push limit into hash join (DF #20228)
   None of these were applicable before because the string SQL path
   short-circuited the optimizer.

## Scope

This is one of three string-flattened pushdown sites; the other two
(`hydrate_nodes`/Expand pushdown at query.rs:771-796 and the mutation
delete path in `exec/mutation.rs::predicate_to_sql`) stay on the SQL
string path for now:

- The Expand pushdown still serializes through `hydrate_nodes`'s
  `extra_filter_sql: Option<&str>` parameter. Migrating it changes the
  `TableStorage` trait surface (`scan_stream(filter: Option<&str>)` →
  `Option<Expr>`) and the cascading call sites — out of scope for this
  MR.
- The mutation delete predicate still goes through `Dataset::delete(&str)`
  in Lance 6.0.1. MR-A (delete two-phase via Lance #6658, gated on the
  Lance v7 bump per issue #112) will migrate that path to
  `DeleteBuilder::execute_uncommitted` taking an Expr.

The existing `ir_filter_to_sql` / `ir_expr_to_sql` / `literal_to_sql`
helpers stay in place to serve the remaining string-SQL consumers
(mutation predicates). They get retired when the other call sites
migrate.

## Cargo

Enables the `nested_expressions` feature on the `datafusion` workspace
dep. Lance already pulls in `datafusion-functions-nested` transitively
(it's listed in their feature set), so this just exposes the
`datafusion::functions_nested::expr_fn::array_has` re-export. No
transitive dep change (Cargo.lock unchanged).

## Tests

- New: `ir_filter_with_list_contains_pushes_down` — pins the case that
  was previously impossible (`ir_filter_to_sql` returning `None`).
- 906/906 workspace tests still pass.
- 417/417 engine integration tests pass (was 416 + the new one).
- 19/19 failpoints (recovery canary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: pin rustfs/rustfs to 1.0.0-beta.3 (last known-good before creds-policy break)

The RustFS S3 Integration job started failing 2026-05-23 with all 3
tests panicking on the first PUT:

  HTTP error: error sending request

The "Dump RustFS logs on failure" step revealed the container was
dying at startup:

  [FATAL] Server encountered an error and is shutting down:
  Default root credentials are not allowed on non-loopback listeners;
  set RUSTFS_ACCESS_KEY and RUSTFS_SECRET_KEY to non-default values,
  bind to loopback, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true
  for local development only

`rustfs/rustfs:latest` was updated 2026-05-21 (1.0.0-beta.4) with a
credentials-policy check that rejects `rustfsadmin`/`rustfsadmin` as
"default" values. PR #111 passed yesterday because it ran against
beta.3; today's runs against beta.4 fail at container startup.

This is unrelated to PR #113's Expr-pushdown refactor — the bump
just happened to hit the same week.

Pin to 1.0.0-beta.3 (2026-05-14, last tag before the change). The
right long-term fix is one of:
  - Rotate the CI creds to less-default values (less coupling to
    RustFS's "default" set definition)
  - Set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` per the
    error message
  - Use a workflow service container with controlled lifecycle

Deferred — pinning is the minimal restore. Also incidentally
documents *which* version we tested against, which `:latest` never
did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 12:47:33 +01:00
.cargo Raise LANCE_MEM_POOL_SIZE to 1 GB in .cargo/config.toml 2026-04-19 22:27:49 +03:00
.context Investigate Lance MergeInsertBuilder CAS granularity (MR-766 prereq) 2026-04-28 23:30:17 +00:00
.github exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113) 2026-05-23 12:47:33 +01:00
crates exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113) 2026-05-23 12:47:33 +01:00
docker Initial public Omnigraph repository 2026-04-10 20:49:41 +03:00
docs chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111) 2026-05-23 00:42:29 +01:00
scripts docs: split user and developer docs (#93) 2026-05-15 03:45:22 +03:00
.dockerignore Initial public Omnigraph repository 2026-04-10 20:49:41 +03:00
.gitignore chore: gitignore the mdrip/ markdown snapshot cache 2026-05-12 17:02:14 -07:00
AGENTS.md chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111) 2026-05-23 00:42:29 +01:00
Cargo.lock chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111) 2026-05-23 00:42:29 +01:00
Cargo.toml exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113) 2026-05-23 12:47:33 +01:00
CLAUDE.md Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it 2026-04-28 23:10:09 +02:00
CODE_OF_CONDUCT.md Initial public Omnigraph repository 2026-04-10 20:49:41 +03:00
CONTRIBUTING.md Merge remote-tracking branch 'origin/main' into ragnorc/explore-api 2026-04-18 20:24:39 +02:00
Dockerfile Dockerfile: switch base from Docker Hub to ECR Public 2026-04-20 13:46:23 +03:00
LICENSE Initial public Omnigraph repository 2026-04-10 20:49:41 +03:00
og-cheet-sheet.md Add query lint and check commands 2026-04-13 00:37:44 +03:00
omnigraph.example.yaml example config: use graphs / cli.graph, matching the MR-603 rename 2026-04-18 23:40:35 +03:00
openapi.json schema: HTTP allow_data_loss exposure + e2e drop coverage (MR-694 follow-up) (#107) 2026-05-19 01:56:46 +03:00
README.md Update README.md 2026-05-15 18:06:25 -07:00
rust-toolchain.toml Initial public Omnigraph repository 2026-04-10 20:49:41 +03:00
SECURITY.md Initial public Omnigraph repository 2026-04-10 20:49:41 +03:00

Omnigraph

License: MIT Rust Crates.io CI

Object-storage native graph engine with git-style workflows. Designed for agents as first-class operators.

Branch, commit, and merge typed graph data like source code. Multi-modal, self-hosted, open source.

Built on Rust, Arrow, DataFusion and Lance.

Join the Omnigraph Slack community

Use Cases

  • Company brains / Second brains
  • Context graphs
  • Backbone for multi-agent research
  • Incident response graphs
  • Compliance & audit graphs
  • Enterprise knowledge systems

Capabilities

  • Typed schema, typed queries, and typed mutations
  • Native blob-as-data support (docs, images, videos, etc)
  • Schema-as-code, query validation and linting
  • Git-style graph workflows: branches, commits, merges, and transactional runs
  • Local, on-prem & cloud S3-native storage with snapshot-pinned reads
  • Graph traversal + text, fuzzy, BM25, vector, and RRF search in one runtime
  • Policy-as-code for server-side access control
  • Single CLI for multiple deployments

Quick Install

curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | bash

This installs omnigraph and omnigraph-server into ~/.local/bin from published release binaries.

Or install with Homebrew:

brew tap ModernRelay/tap
brew install ModernRelay/tap/omnigraph

For starter graphs and agent skills to bootstrap and operate Omnigraph, see ModernRelay/omnigraph-cookbooks.

One-Command Local RustFS Bootstrap

curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/local-rustfs-bootstrap.sh | bash

That bootstrap:

  • starts RustFS on 127.0.0.1:9000
  • creates a bucket and S3-backed repo
  • loads the checked-in context fixture
  • launches omnigraph-server on 127.0.0.1:8080

Docker must be installed and running first.

The RustFS bootstrap prefers the rolling edge binaries and only falls back to source builds when release assets are unavailable.

If a previous run left objects under the same repo prefix but did not finish initializing the repo, rerun with RESET_REPO=1 or set PREFIX to a new value.

Common Commands

The same URI works for local paths, s3://…, or http://host:port.

omnigraph init   --schema ./schema.pg ./repo.omni
omnigraph load   --data   ./data.jsonl ./repo.omni
omnigraph read   --query  ./queries.gq --name get_person --params '{"name":"Alice"}' ./repo.omni
omnigraph change --query  ./queries.gq --name insert_person --params '{"name":"Mina"}' ./repo.omni
omnigraph branch create --from main feature-x ./repo.omni
omnigraph branch merge  feature-x --into main ./repo.omni

See docs/user/cli.md for schema apply, snapshots, ingest, runs, and policy commands.

Docs

Build And Test

cargo build --workspace
cargo check --workspace
cargo test --workspace

Notes:

  • Rust stable toolchain, edition 2024
  • CI runs cargo test --workspace --locked
  • Full CI and some local test flows require protobuf-compiler
  • S3 integration tests expect an S3-compatible endpoint such as RustFS

Workspace Crates

  • crates/omnigraph-compiler: shared schema/query parser, typechecker, catalog, and IR lowering
  • crates/omnigraph: storage/runtime, branching, merge, change detection, and query execution
  • crates/omnigraph-cli: CLI for init/load/ingest/read/change/branch/snapshot/export/policy operations
  • crates/omnigraph-server: Axum HTTP server for remote reads, changes, ingest, export, branches, commits, and runs

Contributing

Please open an issue, spec, or design discussion before sending large code changes. Design feedback and concrete problem statements are the fastest way to collaborate on the roadmap.

Community

Join the Omnigraph Slack community to ask questions, share feedback, and follow development.