omnigraph/AGENTS.md
Ragnor Comerford 6f25c4f9f8
Address reviewer feedback (Cursor + cubic) on PR #60
All eight comments verified against source and applied:

- AGENTS.md: pull @docs/{invariants,lance,testing}.md imports out of
  the markdown blockquote. Claude Code's @-import parser expects @ at
  column 0; the leading "> " of a blockquote silently broke
  recognition, so the claimed auto-include did nothing. (Cursor,
  Medium severity.)
- docs/cli-reference.md: command-family count 13 → 17. The current
  enum Command in crates/omnigraph-cli/src/main.rs has 17 top-level
  variants. (cubic P2.)
- docs/ci.md: Homebrew tap update is a regular `git push`, not a
  force-push (release.yml:117 is `git push origin HEAD:main`). (cubic
  P2.)
- docs/errors.md: add the Storage variant to the NanoError list — it
  exists at error.rs:88-89 but the doc enumerated only 10 of 11.
  (cubic P2.)
- docs/storage.md: clarify tombstone semantics. There is no
  tombstone_version column; state.rs:180 reads the tombstone version
  from the table_version column on rows where object_type =
  table_tombstone. (cubic P2.)
- docs/branches-commits.md: split the GraphCommit pseudo-struct from
  the underlying storage. actor_id is joined in-memory from
  _graph_commit_actors.lance, not a column on _graph_commits.lance.
  (cubic P2.)
- docs/schema-language.md: rename IR_VERSION to SCHEMA_IR_VERSION to
  match the actual constant name in catalog/schema_ir.rs:11.
  (cubic P3.)
- docs/testing.md: engine integration test count 16 → 15 (matches
  `ls crates/omnigraph/tests/*.rs`). (cubic P3.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:09:06 +02:00


OmniGraph — Agent Guide

This file is the always-on map for AI coding agents (Claude Code, Codex, Cursor, Cline) working in this repo. It is loaded into context on every turn, so it is kept to a map plus the rules and invariants that must be in scope at all times; encyclopedic content lives under docs/. When you need depth, follow a pointer.

Required reading every session, every change:

  1. docs/invariants.md — the architectural invariants and §IX deny-list. Apply to every PR, not only architecture work.
  2. docs/lance.md — the curated index of upstream Lance docs. Consult it before every task to identify which Lance pages are relevant to what you're about to do, then fetch those upstream URLs before grepping our code or guessing. Lance is the substrate; behavior is documented there, not here.
  3. docs/testing.md — the test-coverage map. Always check what already covers your change before writing a new test. Extending an existing test (an assertion, a fixture row, a parameterization) is preferred over a duplicated init_and_load() block. Walk the before-every-task checklist to identify existing coverage, run those tests as a clean baseline, and only add a new test fn or file when no existing one owns the area.

Tools that support @-imports (Claude Code) auto-include all three files via the imports below — note these must sit at column 0 (not inside a blockquote) for the parser to recognize them. Other agents (Codex, Cursor, Cline, …) must open them explicitly at the start of each session.

@docs/invariants.md @docs/lance.md @docs/testing.md

CLAUDE.md is a symlink to this file — there is exactly one source of truth. Edit AGENTS.md.

Version surveyed: 0.3.1
Workspace crates: omnigraph-compiler, omnigraph (engine), omnigraph-cli, omnigraph-server
Storage substrate: Lance 4.x (columnar, versioned, branchable)
License: MIT
Toolchain: Rust stable, edition 2024


Start here — what is this?

OmniGraph is a typed property-graph engine built as a coordination layer over many Lance datasets. Highlights:

  • Storage: a separate Lance dataset per node/edge type; multi-dataset commits are coordinated atomically through a single __manifest table.
  • Languages: a .pg schema language and a .gq query language, both Pest-based, with a typed IR.
  • Multi-modal querying: vector ANN (nearest), full-text (search/fuzzy/match_text/bm25), Reciprocal Rank Fusion (rrf), and graph traversal (Expand, anti-join not { … }) in one runtime.
  • Branches and commits across the whole graph: Git-style — every successful publish appends to a commit DAG; merges are three-way at the row level.
  • Transactional runs: ephemeral __run__<id> branches for isolated mutation, fast-path or merge-path publish.
  • HTTP server: Axum + utoipa OpenAPI, bearer auth (SHA-256 hashed, optional AWS Secrets Manager), Cedar policy gating.
  • CLI driven by a single omnigraph.yaml; multi-format output (json/jsonl/csv/kv/table).
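The row-level three-way merge mentioned above can be pictured with a small sketch. Everything in it is an assumption for illustration: rows are keyed by id in a `BTreeMap<String, String>` whose value stands in for the whole row, and `Conflict` is a hypothetical stand-in for the real `MergeConflictKind` taxonomy described in docs/merge.md. It is not OmniGraph's merge path (`OrderedTableCursor` + `StagedTableWriter`).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical conflict kinds; the real taxonomy is MergeConflictKind (docs/merge.md).
#[derive(Debug, PartialEq)]
enum Conflict {
    /// Both sides changed (or inserted) the row, to different values.
    BothModified(String),
    /// One side deleted the row while the other changed it.
    DeleteModify(String),
}

/// One table's rows keyed by id; the value stands in for the full row.
type Rows = BTreeMap<String, String>;

/// Merge `ours` and `theirs` against their common ancestor `base`, row by row.
fn merge_rows(base: &Rows, ours: &Rows, theirs: &Rows) -> (Rows, Vec<Conflict>) {
    let mut merged = Rows::new();
    let mut conflicts = Vec::new();
    // Every id that exists on any side, visited in sorted order.
    let ids: BTreeSet<&String> = base.keys().chain(ours.keys()).chain(theirs.keys()).collect();
    for id in ids {
        let (b, o, t) = (base.get(id), ours.get(id), theirs.get(id));
        let pick = if o == t {
            o // sides agree (same value, or both deleted)
        } else if o == b {
            t // only theirs changed (possibly a delete)
        } else if t == b {
            o // only ours changed (possibly a delete)
        } else if o.is_none() || t.is_none() {
            conflicts.push(Conflict::DeleteModify(id.clone()));
            continue;
        } else {
            conflicts.push(Conflict::BothModified(id.clone()));
            continue;
        };
        // `None` here means the winning side deleted the row.
        if let Some(row) = pick {
            merged.insert(id.clone(), row.clone());
        }
    }
    (merged, conflicts)
}
```

The per-row rule: if both sides agree, or only one side moved away from the base, take that result; if both moved to different outcomes, report a structured conflict instead of picking a winner.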

Throughout the docs, capabilities are split into L1 (inherited from Lance) and L2 (added by OmniGraph).


Architecture at a glance

CLI (omnigraph)        HTTP Server (omnigraph-server, Axum)
        │                            │
        └─────────────┬──────────────┘
                      ▼
           omnigraph-compiler  ── Pest grammars, catalog, IR, lowering, lint, migration plan
                      │
                      ▼
           omnigraph (engine)  ── ManifestRepo, CommitGraph, RunRegistry, GraphIndex (CSR/CSC), exec
                      │
                      ▼
              Lance 4.x         ── columnar Arrow, fragments, per-dataset versions/branches, indexes
                      │
                      ▼
        Object store (file / s3 / RustFS / MinIO / S3-compat)

Full diagram and concurrency model: docs/architecture.md.


Where to find each topic

| Area | Read |
|---|---|
| Architectural invariants & deny-list (read before any non-trivial proposal or review) | docs/invariants.md |
| Lance docs index: fetch upstream Lance docs by problem domain | docs/lance.md |
| Test coverage map: what's covered, what helpers to reuse, before-every-task checklist | docs/testing.md |
| Architecture, L1/L2 framing, concurrency model | docs/architecture.md |
| Storage layout, __manifest schema, URI schemes, S3 env vars | docs/storage.md |
| .pg schema language, types, constraints, annotations, migration planning | docs/schema-language.md |
| .gq query language, MATCH/RETURN/ORDER, search funcs, mutations, IR ops, lint codes | docs/query-language.md |
| Indexes (BTREE / inverted / vector / graph topology) | docs/indexes.md |
| Embeddings (compiler + engine clients, env vars, @embed) | docs/embeddings.md |
| Branches, commit graph, snapshots, system branches | docs/branches-commits.md |
| Runs (transactional graph mutations, __run__<id>, publish paths) | docs/runs.md |
| Three-way merge and conflict kinds | docs/merge.md |
| Diff / change feed (diff_between, diff_commits) | docs/changes.md |
| Query execution, mutation execution, bulk loader, load vs ingest | docs/execution.md |
| optimize (compaction) and cleanup (version GC) | docs/maintenance.md |
| Cedar policy actions, scopes, CLI | docs/policy.md |
| HTTP server endpoints, auth, error model, body limits | docs/server.md |
| CLI quick-start | docs/cli.md |
| CLI command surface and omnigraph.yaml schema | docs/cli-reference.md |
| Audit / actor tracking | docs/audit.md |
| Error taxonomy and result serialization | docs/errors.md |
| Install (binary / Homebrew / source / channels) | docs/install.md |
| Deployment (binary / container / RustFS bootstrap / auth / build variants) | docs/deployment.md |
| CI / release workflows | docs/ci.md |
| Constants & tunables cheat sheet | docs/constants.md |
| Per-version release notes | docs/releases/ |

Always-on rules (load these into your working memory)

These are architectural rules that need to be in scope on every change. They're framed at the level that survives renames and refactors — the deeper implementation specifics (function names, lock names, branch-prefix conventions, enforcement points) live in the per-area docs and may evolve. The full architectural invariants and deny-list are in docs/invariants.md; §IX (deny-list) is the fastest first-pass when reviewing any change.

  1. Multi-dataset publish is atomic across the whole graph. A graph commit flips every relevant sub-table version visible together, in one manifest write. Don't introduce code paths that publish per sub-table outside the unified publish path — that loses cross-table snapshot isolation.
  2. Snapshot isolation per query. A query holds one snapshot for its lifetime. Don't re-read the current head mid-query.
  3. Mutations are atomic at the commit boundary. Multi-statement change queries publish one commit. Don't commit per-statement.
  4. Bearer-token plaintext never persists in process memory. Tokens are hashed at startup; auth uses constant-time comparison; the actor id is server-resolved from the hash match and must not be settable by the client.
  5. Reads always see the current index state for the branch they're reading. Indexes track the branch head, not historical snapshots. If you change index lifecycle, preserve this guarantee.
  6. Stable type IDs survive renames. Schema migration relies on identity that's stable across rename — don't mint new IDs on rename.
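Rules 1 and 2 can be illustrated together with a deliberately simplified in-memory model. The names below (`Manifest`, `Head`, `publish`, `snapshot`) are hypothetical: the sketch assumes the manifest is a map from sub-table name to dataset version, that a publish installs the complete next map with one swap, and that a query keeps the `Arc` it read at start. The real mechanism is the `__manifest` table written through `ManifestBatchPublisher` (see docs/storage.md and docs/architecture.md), not a process-local lock.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical manifest: sub-table name -> pinned dataset version.
type Manifest = BTreeMap<String, u64>;

/// Stand-in for the graph head; one write replaces every entry at once.
struct Head {
    current: RwLock<Arc<Manifest>>,
}

impl Head {
    fn new(initial: Manifest) -> Self {
        Head { current: RwLock::new(Arc::new(initial)) }
    }

    /// A query calls this once and keeps the Arc for its whole lifetime (rule 2).
    fn snapshot(&self) -> Arc<Manifest> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Publish new versions for several sub-tables in a single swap (rule 1):
    /// a reader sees either none of these updates or all of them.
    fn publish(&self, updates: &[(&str, u64)]) {
        let mut guard = self.current.write().unwrap();
        let mut next: Manifest = (**guard).clone();
        for (table, version) in updates {
            next.insert(table.to_string(), *version);
        }
        *guard = Arc::new(next);
    }
}
```

Copying the map on publish keeps every previously handed-out snapshot immutable, which is what lets a long-running query ignore later publishes without further coordination.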

Deny-list (fast-pass review filter — full reasoning in docs/invariants.md §IX)

If a proposal fits one of these, the burden is on the proposer to justify why this case is the exception:

  • Synchronous-inline index updates for indexes expensive to build (vector ANN, FTS) — use the reconciler pattern.
  • Custom WAL / transaction manager / buffer pool — Lance owns these.
  • Job queue for state derivable from manifest — reconciler pattern instead.
  • Per-feature lowering for shapes that share a structure (interfaces, wildcards, alternation) — use one mechanism.
  • Eager materialization of cross-products in multi-hop — factorize; flatten only when needed.
  • Ad-hoc IN-list filtering when SIP fits.
  • String-flattened SQL filter generation when structured pushdown is available.
  • In-process-only Dataset impls — Send + Sync, remote descriptors.
  • Cost-blind plan choice — lowering-order execution is not a planner.
  • Hidden statistics — if a metric matters for plan choice, it must be exposed through the trait surface.
  • Side-channels for query semantics — search modes, mutations, polymorphism are first-class IR concepts.
  • Discarding rank in retrieval — score and rank propagate as columns.
  • State that drifts from the manifest — derive from observable state.
  • Cloud-only correctness fixes — correctness is always OSS.
  • Forking the codebase for Cloud — trait-extension only.
  • Hand-rolling something Lance already does — check the spec first.
  • Mutating in place state that should be immutable (Lance fragments, index segments) — new segments instead.
  • Silent failures — OOM, timeout, partial result must all be surfaced and bounded.
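Several deny-list entries (no job queue for manifest-derivable state, no state that drifts from the manifest, no synchronous builds of expensive indexes) point at the same reconciler pattern. A minimal sketch under assumed names: desired index state is derived from schema and manifest, observed state is read back from storage, and each pass computes only the difference. `IndexSpec`, `Action`, and `reconcile` are hypothetical and do not name OmniGraph APIs.

```rust
use std::collections::BTreeSet;

/// Hypothetical identity of an index: which table, which column, which kind.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct IndexSpec {
    table: String,
    column: String,
    kind: String,
}

#[derive(Debug, PartialEq)]
enum Action {
    Build(IndexSpec),
    Remove(IndexSpec),
}

/// One reconcile pass: compare what should exist with what does exist.
/// Nothing is queued or remembered between passes; rerunning after a crash
/// recomputes the remaining work from observable state.
fn reconcile(desired: &BTreeSet<IndexSpec>, observed: &BTreeSet<IndexSpec>) -> Vec<Action> {
    let mut actions: Vec<Action> = desired
        .difference(observed)
        .cloned()
        .map(Action::Build)
        .collect();
    actions.extend(observed.difference(desired).cloned().map(Action::Remove));
    actions
}
```

Once the plan has been applied, the next pass returns an empty list, so there is no separate bookkeeping that could fall out of sync with the manifest.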

Quick-reference flows

# Initialize an S3-backed repo
omnigraph init --schema ./schema.pg s3://my-bucket/repo.omni

# Bulk load
omnigraph load --data ./seed.jsonl --mode overwrite s3://my-bucket/repo.omni

# Branch + ingest a review batch
omnigraph branch create --from main review/2026-04-25 s3://my-bucket/repo.omni
omnigraph ingest --branch review/2026-04-25 --data ./batch.jsonl s3://my-bucket/repo.omni

# Run a hybrid (vector + BM25) query
omnigraph read --query ./queries.gq --name find_similar \
  --params '{"q":"trends in AI safety"}' --format table s3://my-bucket/repo.omni

# Plan + apply schema migration
omnigraph schema plan  --schema ./next.pg s3://my-bucket/repo.omni
omnigraph schema apply --schema ./next.pg s3://my-bucket/repo.omni --json

# Merge review branch back
omnigraph branch merge review/2026-04-25 --into main s3://my-bucket/repo.omni

# Compact + GC (preview, then confirm)
omnigraph optimize s3://my-bucket/repo.omni
omnigraph cleanup  --keep 10 --older-than 7d s3://my-bucket/repo.omni
omnigraph cleanup  --keep 10 --older-than 7d --confirm s3://my-bucket/repo.omni

# Stand up the HTTP server (token from env)
OMNIGRAPH_SERVER_BEARER_TOKEN=xxxx \
  omnigraph-server s3://my-bucket/repo.omni --bind 0.0.0.0:8080

# Cedar policy explain
omnigraph policy explain --actor act-alice --action change --branch main

Capability matrix: "Lance by default vs. added by OmniGraph"

Capability L1 (Lance default) L2 (OmniGraph adds)
Columnar storage on object store Arrow/Lance URI normalization, S3 env-var plumbing
Per-dataset versioning + time travel snapshot_at_version, entity_at, snapshot-pinned reads across many tables
Per-dataset branches Graph-level branches (atomic across all sub-tables), lazy fork, system branch filtering
Atomic single-dataset commits Atomic multi-dataset publish via __manifest + ManifestBatchPublisher
Compaction (compact_files) omnigraph optimize orchestrates over all node/edge tables, bounded concurrency
Cleanup (cleanup_old_versions) omnigraph cleanup with --keep / --older-than policy
BTREE / inverted (FTS) / vector indexes ensure_indices builds them on every relevant column; idempotent; lazy across branches
merge_insert upsert LoadMode::Merge, mutation update/insert/delete lowering
Vector search nearest() query op; embedding pipeline (Gemini / OpenAI clients); @embed in schema
Full-text search search/fuzzy/match_text/bm25 query ops
Hybrid ranking rrf(...) Reciprocal Rank Fusion in one runtime
Graph traversal CSR/CSC topology index, Expand IR op, variable-length hops, not { } anti-join
Schema language .pg + Pest grammar + catalog + interfaces + constraints + annotations
Query language .gq + Pest grammar + IR + lowering + linter
Schema migration planning plan_schema_migration + apply_schema step types + __schema_apply_lock__
Commit graph (DAG) across whole repo _graph_commits.lance with linear + merge parents, ULID ids, actor map
Transactional runs _graph_runs.lance, __run__<id> ephemeral branches, fast-path & merge-path publish
Three-way row-level merge OrderedTableCursor + StagedTableWriter, structured MergeConflictKind
Change feeds diff_between / diff_commits with manifest fast path + ID streaming
Cedar policy 10 actions, branch / target_branch / protected scopes, validate/test/explain CLI
HTTP server Axum, OpenAPI via utoipa, bearer auth (SHA-256, AWS Secrets Manager option), policy gating, NDJSON streaming export
CLI with config omnigraph.yaml, aliases, multi-format output (json/jsonl/csv/kv/table)
Audit / actor tracking _as write APIs + actor maps in commit & run datasets
Local RustFS bootstrap scripts/local-rustfs-bootstrap.sh one-shot S3-backed dev environment
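The `rrf(...)` entry above refers to Reciprocal Rank Fusion. As commonly defined, an item's fused score is the sum of 1/(k + rank) over every ranked list it appears in. The sketch below assumes 1-based ranks and k = 60 (the usual default from the original RRF paper); whether OmniGraph's `rrf(...)` uses the same constant, argument shape, or tie-breaking is not stated in this file, so check docs/query-language.md before relying on it. `rrf` here is a hypothetical standalone function, not the query op.

```rust
use std::collections::HashMap;

/// Fuse several ranked result lists (best first) into one list ordered by RRF score.
/// `k` damps the influence of the very top ranks; 60 is the conventional choice.
fn rrf(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            let rank = (i + 1) as f64; // ranks are 1-based
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; tie-break on id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because only ranks enter the formula, vector distances and BM25 scores never have to be put on a common scale, which is why RRF is a common choice for hybrid retrieval.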

Maintenance contract for agents

When you change something user-visible, update the relevant docs/<area>.md in the same change. Pointers from this file to that doc must keep working — CI enforces cross-link integrity via scripts/check-agents-md.sh.

When proposing or reviewing a non-trivial change, walk docs/invariants.md — at minimum the §IX deny-list and §X review checklist. Add to the deny-list when a new anti-pattern surfaces; relaxing an invariant requires the same review process as code.

Rules:

  1. Update in the same PR. New endpoint, query function, CLI flag, env var, constant, schema construct, or invariant: update both the source code and the doc in the same change. Never split documentation drift into a follow-up.
  2. Bump version on release. When you cross a release boundary (e.g. v0.3.1 → v0.3.2), update the version line at the top of this file and add docs/releases/<version>.md describing the user-visible delta. Update docs/architecture.md only if the architecture itself changed.
  3. Don't lie. If a section becomes wrong but you can't rewrite it fully right now, replace the wrong line with *(stale — needs update after <change>)* rather than leaving silently incorrect text. Then fix it ASAP.
  4. Re-verify before recommending. If you cite a flag, env var, endpoint, or constant to the user or in code, grep for it in source first. Memory and docs go stale; the code is authoritative.
  5. Keep AGENTS.md a map, not an encyclopedia. New deep content goes into docs/. Add an entry to "Where to find each topic" instead of pasting prose into this file. The "Always-on rules" section is the exception — it's for invariants that should always be in scope.
  6. Re-read on schema/query/IR changes. Edits to schema.pest, query.pest, ir/lower.rs, query/typecheck.rs, or query/lint.rs should trigger a re-read of docs/schema-language.md, docs/query-language.md, and docs/execution.md to confirm they still describe reality.

CI check: scripts/check-agents-md.sh verifies that every docs/*.md link in this file resolves and that every doc in the canonical set is linked. Run it locally before opening a PR if you've moved or renamed docs.
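The two properties scripts/check-agents-md.sh is described as enforcing can be written as a small pure function. This is a Rust sketch of the logic only, not a port of the shell script: how the real script tokenizes links and where it gets its canonical doc set are assumptions here. The sketch treats any `docs/<name>.md` token as a link, skips placeholders such as `docs/<area>.md` and nested paths, and takes the set of existing files and the canonical set as arguments; `LinkReport`, `doc_links`, and `check` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Outcome of checking AGENTS.md against the docs directory.
#[derive(Debug, PartialEq)]
struct LinkReport {
    /// Referenced as docs/<name>.md but no such file exists.
    broken: Vec<String>,
    /// In the canonical set but never referenced.
    unlinked: Vec<String>,
}

/// Collect every top-level `docs/<name>.md` token in `text`.
fn doc_links(text: &str) -> BTreeSet<String> {
    text.split(|c: char| c.is_whitespace() || "()[]`,;@".contains(c))
        .map(|tok| tok.trim_end_matches('.')) // drop sentence-ending periods
        .filter(|tok| {
            tok.strip_prefix("docs/")
                .and_then(|rest| rest.strip_suffix(".md"))
                .map_or(false, |name| {
                    // Rejects placeholders like <area>, globs like *, and subdirectories.
                    !name.is_empty()
                        && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
                })
        })
        .map(str::to_string)
        .collect()
}

fn check(agents_md: &str, existing: &BTreeSet<String>, canonical: &BTreeSet<String>) -> LinkReport {
    let linked = doc_links(agents_md);
    LinkReport {
        broken: linked.difference(existing).cloned().collect(),
        unlinked: canonical.difference(&linked).cloned().collect(),
    }
}
```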