Lakehouse-native graph engine with git-style workflows https://omnigraph.dev

Omnigraph

Lakehouse-native graph engine built for context assembly

Omnigraph acts as the operational state & coordination layer for agents.

Git-style versioning & branching
Multimodal retrieval (graph + vector/FTS + filters) optimized for context assembly
Object storage native (S3, RustFS)
Native blob-as-data support (docs, images, videos, etc.)
VPC, on-prem, and hybrid deployment
Lance format as open storage layer

| AS CODE | What it means |
|---|---|
| Schema AS CODE | Typed `.pg` schemas, planned, applied, enforced |
| Context AS CODE | Linted queries & agentic nudges, versioned and reusable |
| Security AS CODE | Cedar policies enforced server-side on every mutation |
| Dashboards AS CODE | Declarative views & controls over the graph (coming) |

Core Use Cases

| Use case | What it's for |
|---|---|
| Company brain | Org knowledge unified into one queryable graph |
| Context graph | Decision traces and codified tribal knowledge |
| Agentic memory | Durable, versioned memory for long-running agents |
| Dev graph | Issues & dependency model for coding agents |
| R&D data layer | Experiments & trials data written into branches |
| ML workflows | Versioned, branchable graphs for training & eval |
| Karpathy's LLM wiki | A living, agent-updatable knowledge base |

Quick Install

```
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | bash
```

This installs omnigraph and omnigraph-server into ~/.local/bin from published release binaries.
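Some shells do not put ~/.local/bin on PATH by default, in which case the freshly installed binaries will not be found. A minimal POSIX sh sketch for checking this; the `path_contains` helper is hypothetical and is not something the installer ships:

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $2 is one entry of the PATH-style list $1.
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn (without failing) when the installer's target directory is not on PATH.
if ! path_contains "$PATH" "$HOME/.local/bin"; then
  echo "note: add \$HOME/.local/bin to PATH to run omnigraph and omnigraph-server" >&2
fi
```

If the note is printed, adding `export PATH="$HOME/.local/bin:$PATH"` to the shell profile is the usual fix.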

Or install with Homebrew:

```
brew tap ModernRelay/tap
brew install ModernRelay/tap/omnigraph
```

For starter graphs and agent skills to bootstrap and operate Omnigraph, see ModernRelay/omnigraph-cookbooks.

One-Command Local RustFS Bootstrap

```
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/local-rustfs-bootstrap.sh | bash
```

That bootstrap:

starts RustFS on 127.0.0.1:9000
creates a bucket and S3-backed graph
loads the checked-in context fixture
launches omnigraph-server on 127.0.0.1:8080

Docker must be installed and running first.

The RustFS bootstrap prefers the rolling edge binaries and only falls back to source builds when release assets are unavailable.

If a previous run left objects under the same graph prefix but did not finish initializing the graph, rerun with RESET_REPO=1 or set PREFIX to a new value.
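Because the bootstrap depends on a running Docker daemon, a wrapper script can verify that before piping the bootstrap into bash. A minimal sketch, assuming `docker info` exits non-zero when the daemon is unreachable; `docker_ready` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical preflight: succeed only if the docker CLI exists and its daemon answers.
docker_ready() {
  command -v docker >/dev/null 2>&1 || return 1   # docker CLI not installed
  docker info >/dev/null 2>&1                     # non-zero if the daemon is unreachable
}

if docker_ready; then
  echo "docker is ready; the RustFS bootstrap can run"
else
  echo "start Docker before running local-rustfs-bootstrap.sh" >&2
fi
```

For the rerun case, the variables can be placed on the `bash` side of the pipe, for example `curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/local-rustfs-bootstrap.sh | RESET_REPO=1 bash`, assuming the script reads `RESET_REPO` and `PREFIX` from its environment.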

Common Commands

The graph URI (the final argument) can be a local path, an s3://… URI, or http://host:port for a running omnigraph-server; the commands are otherwise the same.

```
omnigraph init   --schema ./schema.pg ./graph.omni
omnigraph load   --data   ./data.jsonl ./graph.omni
omnigraph read   --query  ./queries.gq --name get_person --params '{"name":"Alice"}' ./graph.omni
omnigraph change --query  ./queries.gq --name insert_person --params '{"name":"Mina"}' ./graph.omni
omnigraph branch create --from main feature-x ./graph.omni
omnigraph branch merge  feature-x --into main ./graph.omni
```

See docs/user/cli.md for schema apply, snapshots, data loading, commits, and policy commands.
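Since every command takes the graph location as its final argument, a wrapper script can keep it in one variable and switch between a local path, an S3 prefix, and a running server. A hypothetical sketch of such a script labeling its target before invoking `omnigraph`; the `graph_target_kind` helper and its labels are illustrative, not part of the CLI, whose own URI handling may accept more forms:

```shell
#!/bin/sh
# Hypothetical helper: label a graph URI as one of the three forms listed above.
graph_target_kind() {
  case "$1" in
    s3://*)   echo "s3" ;;       # S3-compatible object storage (e.g. RustFS)
    http://*) echo "server" ;;   # a running omnigraph-server
    *)        echo "local" ;;    # everything else: assume a local path
  esac
}

# The same variable can then feed every command, e.g.
#   omnigraph read --query ./queries.gq --name get_person --params '{"name":"Alice"}' "$GRAPH"
GRAPH="${GRAPH:-./graph.omni}"
echo "target $GRAPH is $(graph_target_kind "$GRAPH")"
```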

Clients

For programmatic access to a running omnigraph-server:

TypeScript SDK: @modernrelay/omnigraph (source). Instance-per-client, typed errors, camelCase types, async-iterator streaming export.
```
npm install @modernrelay/omnigraph
```
Model Context Protocol server: @modernrelay/omnigraph-mcp (source). Bridges Omnigraph to LLM hosts (Claude Desktop, Claude Code, …) over stdio. It exposes tools and resources for schema, branches, queries, mutations, and ingest, and bundles curated best-practices guidance from the cookbook.
```
npm install -g @modernrelay/omnigraph-mcp
```

Both packages are versioned in lockstep with omnigraph-server on major.minor: @modernrelay/omnigraph@X.Y.* targets omnigraph-server@X.Y.*. See ModernRelay/omnigraph-ts for the monorepo.
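Per the lockstep rule, only major.minor has to match, so patch versions may differ on either side. A minimal sketch of that comparison, assuming plain `X.Y.Z` version strings without pre-release suffixes; `same_major_minor` is a hypothetical helper and the version values are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: succeed if two X.Y.Z versions share the same X.Y.
same_major_minor() {
  # ${v%.*} strips the trailing ".Z", leaving "X.Y".
  [ "${1%.*}" = "${2%.*}" ]
}

sdk_version="0.5.3"      # illustrative @modernrelay/omnigraph version
server_version="0.5.0"   # illustrative omnigraph-server version
if same_major_minor "$sdk_version" "$server_version"; then
  echo "compatible"
else
  echo "mismatch: SDK $sdk_version vs server $server_version" >&2
fi
```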

Docs

Build And Test

```
cargo build --workspace
cargo check --workspace
cargo test --workspace
```

Notes:

Rust stable toolchain, edition 2024
CI runs cargo test --workspace --locked
Full CI and some local test flows require protobuf-compiler
S3 integration tests expect an S3-compatible endpoint such as RustFS
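Before a full local run, it can help to confirm that the tools the notes mention are on PATH. A minimal sketch; `missing_tools` is a hypothetical helper, and it assumes the protobuf-compiler package provides the `protoc` binary (as the Debian/Ubuntu package does):

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools cargo protoc)
if [ -n "$missing" ]; then
  echo "missing: $missing" >&2
else
  # Print the invocation CI uses; --locked errors out rather than rewriting Cargo.lock.
  echo "ready: cargo test --workspace --locked"
fi
```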

Workspace Crates

crates/omnigraph-compiler: shared schema/query parser, typechecker, catalog, and IR lowering
crates/omnigraph: storage/runtime, branching, merge, change detection, and query execution
crates/omnigraph-cli: CLI for graph lifecycle (init/load), query/mutate, branch/commit/merge, schema/lint, snapshot/export, policy, and maintenance (optimize/cleanup)
crates/omnigraph-server: Axum HTTP server for remote reads, changes, ingest, export, branches, and commits

Contributing

Please open an issue, spec, or design discussion before sending large code changes. Design feedback and concrete problem statements are the fastest way to collaborate on the roadmap.

Community

Join the Omnigraph Slack community to ask questions, share feedback, and follow development.