omnigraph/docs/releases/v0.5.0.md
Andrew Altshuler bb1fe57640
release: v0.5.0 (#115)
* gitignore: exclude docs/internal/ from publication

Mirrors the existing "Local-only working files (not for the public
repo)" pattern. Working notes filed under docs/internal/ stay on the
contributor's machine instead of cluttering the published doc tree
or tripping the AGENTS.md / docs-index cross-link check
(scripts/check-agents-md.sh enumerates every docs/*.md and requires
each one to be linked from an audience index — internal notes don't
have an audience index by definition).

Incidental to the v0.5.0 release; lands separately from the version
bump commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: skip docs/internal/ in agents-md cross-link check

Matches the .gitignore exclusion. Mirrors the existing 'docs/releases/'
exclusion pattern: notes under docs/internal/ aren't part of the
published doc tree and don't need to be linked from an audience index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v0.5.0 — Lance 6 substrate, Cedar policy engine, schema-lint v1

Bumps the workspace from 0.4.2 to 0.5.0. Release notes at
docs/releases/v0.5.0.md.

Three user-visible pillars motivate the minor bump:
  1. Lance 6.0.1 substrate (DataFusion 52→53, Arrow 57→58)
  2. Engine-wide Cedar policy enforcement on every _as writer; server
     defaults to deny-all; signed-token-claim-only actor identity
  3. Schema-lint v1 chassis: OG-XXX-NNN codes, soft drops, and
     `--allow-data-loss` (Hard mode) for destructive migrations

Plus structured DataFusion Expr filter pushdown (unblocks
CompOp::Contains via array_has), HTTP allow_data_loss parity, inline
.gq sources on CLI/HTTP, optional CORS layer, and bug fixes
(merge-insert dup-rowid, branch-merge coordinator restore on error,
blob columns in branch merge).

Sites bumped:
  - 5 crate [package].version lines (omnigraph, omnigraph-cli,
    omnigraph-compiler, omnigraph-policy, omnigraph-server)
  - 10 internal path-dep `version = "..."` constraints across the
    four manifests that depend on sister crates (engine, server, cli,
    plus engine's dev-dep on the compiler)
  - Cargo.lock (regenerated via cargo update --workspace)
  - AGENTS.md "Version surveyed:"
  - openapi.json `info.version` (regenerated via
    OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test
    openapi)

Verification:
  - cargo test --workspace --locked: 907/907 green
  - cargo test -p omnigraph-engine --test failpoints --features
    failpoints: 19/19 green
  - cargo test -p omnigraph-engine --test lance_surface_guards: 3/3
  - scripts/check-agents-md.sh: clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 13:59:42 +01:00


Omnigraph v0.5.0

Omnigraph v0.5.0 is a substrate, security, and migration-safety release. It jumps the storage substrate from Lance 4 to Lance 6.0.1 (DataFusion 52 → 53, Arrow 57 → 58), introduces engine-wide Cedar policy enforcement on every authoring path, and ships a structured schema-lint v1 chassis with code-tagged diagnostics, soft drops, and an explicit --allow-data-loss flag for destructive migrations.

Highlights

  • Lance 6.0.1 substrate: upgrades Lance 4.0.0 → 6.0.1, DataFusion 52 → 53, Arrow 57 → 58. New optimizer rules (vectorized IN-list eq kernel, PhysicalExprSimplifier, push-limit-into-hash-join, CASE-NULL shortcut) now apply to predicates that flow through the engine. lance-tokenizer replaces tantivy internally; full-text search (FTS) behavior is preserved.
  • Cedar policy engine: a new omnigraph-policy crate wires Omnigraph::enforce(action, scope, actor) into every _as writer (mutate_as, load_as, apply_schema_as, branch_create_as, branch_merge_as, branch_delete_as, plus the load and change variants). The HTTP server defaults to deny-all when no Cedar policy is configured; a YAML policy file is required to enable writes. Actor identity comes only from signed token claims — clients cannot set actor identity directly.
  • Schema lint v1 chassis: diagnostics now carry stable codes of the form OG-XXX-NNN instead of free-form messages. omnigraph schema plan and apply understand soft drops on properties and types — destructive drops require the new --allow-data-loss flag (Hard mode) at the CLI and an equivalent JSON flag over HTTP.
  • Structured filter pushdown: query-language predicates lower to DataFusion Expr and push down through Lance's Scanner::filter_expr instead of being flattened to SQL strings. This unlocks CompOp::Contains pushdown (via array_has), which previously fell through to in-memory post-scan filtering, and lets the DataFusion 53 optimizer rules above act on our predicates.
  • HTTP allow_data_loss parity: the destructive-drop guard now exists on both the CLI (--allow-data-loss) and HTTP (allow_data_loss: true in the schema-apply request body).
  • Inline query strings on CLI and HTTP: omnigraph read / omnigraph mutate and the corresponding HTTP endpoints accept inline .gq source, not just a file path. Easier ad-hoc queries, clearer request logs.
  • Browser CORS layer: optional CORS layer on omnigraph-server for browser-based UIs, gated by OMNIGRAPH_CORS_ORIGINS.
  • Merge-insert dup-rowid fix: Lance's MergeInsertBuilder could surface spurious "Ambiguous merge inserts" errors on sequential merges against rows previously rewritten by merge_insert. The engine now opts into SourceDedupeBehavior::FirstSeen with a check_batch_unique_by_keys fail-fast precondition that guarantees source-side dedup happens before Lance sees the batch.
  • Branch-merge error-path recovery: a branch merge that failed mid-flight could leave the in-process coordinator pointing at a stale active branch. The error path now restores the prior coordinator, matching the success path's invariant.
  • Branch merge with blob columns: external blob URIs are now materialized correctly during branch merge instead of being dropped or pointing at the source branch.
  • Lance API surface guards: a new test file (crates/omnigraph/tests/lance_surface_guards.rs) pins eight specific Lance API surfaces (LanceError::TooMuchWriteContention, ManifestLocation fields, MergeInsertBuilder return shape, WriteParams::default, compact_files signature, etc.) so that the next Lance bump fails at compile time or at runtime on any silent drift, rather than producing incorrect recovery state in production.
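
The fail-fast uniqueness precondition mentioned in the merge-insert fix can be sketched in isolation. This is a minimal illustration, assuming merge keys are plain strings; `first_duplicate_key` and `check_unique` are hypothetical names and are not the engine's `check_batch_unique_by_keys`, which operates on Arrow batches.

```rust
use std::collections::HashSet;

/// Returns the first key that appears more than once in `keys`, if any.
/// Hypothetical helper standing in for a source-side uniqueness check.
fn first_duplicate_key<'a>(keys: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::with_capacity(keys.len());
    for &key in keys {
        // `insert` returns false when the key was already present.
        if !seen.insert(key) {
            return Some(key);
        }
    }
    None
}

/// Fails fast so that a batch with duplicate keys never reaches the merge step.
fn check_unique(keys: &[&str]) -> Result<(), String> {
    match first_duplicate_key(keys) {
        Some(dup) => Err(format!("duplicate merge key in source batch: {dup}")),
        None => Ok(()),
    }
}
```

Calling `check_unique(&["a", "b", "a"])` returns an error naming `a`, while a batch with distinct keys returns `Ok(())`. Rejecting the batch up front keeps the failure close to its cause instead of surfacing later as an ambiguous-merge error.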

Behavior changes

  • On-disk format unchanged: existing v0.4.2 datasets open unchanged. The Lance file format pin stays at V2_2 (required by Lance's blob v2 feature).
  • omnigraph-server defaults to deny-all under --policy: starting a server with the policy feature enabled but no Cedar YAML policy configured rejects every write. Operators must supply a policy file to authorize anything.
  • Schema-lint diagnostics carry stable codes: messages now lead with OG-XXX-NNN. CI parsers or tooling that keyed off the v0.4.2 free-form text need to switch to code-based matching.
  • Destructive schema drops require --allow-data-loss: dropping a property or type returns a structured diagnostic by default. omnigraph schema apply --allow-data-loss (CLI) or {"allow_data_loss": true} (HTTP) opts into Hard mode.
  • HashJoinExec null-aware semantics on anti-join: as a side effect of the DataFusion 53 bump, NOT IN over anti-join columns that contain nulls now follows the SQL standard. Queries that relied on the prior behavior were returning incorrect results and may return different rows after the upgrade.
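
The destructive-drop guard described above can be pictured as a filter over a migration plan. The sketch below is an assumption-laden illustration: the `Step` enum and `blocked_steps` function are hypothetical and do not reflect the planner's real types or its soft-drop handling.

```rust
/// Hypothetical migration steps; the real planner's types are not shown in these notes.
#[derive(Debug, Clone, PartialEq)]
enum Step {
    AddProperty(String),
    DropProperty(String),
    DropType(String),
}

impl Step {
    /// A step is destructive when applying it would discard stored data.
    fn is_destructive(&self) -> bool {
        matches!(self, Step::DropProperty(_) | Step::DropType(_))
    }
}

/// Returns the steps that block the plan. An empty result means the plan may proceed.
/// With `allow_data_loss` set (Hard mode in the notes), nothing is blocked.
fn blocked_steps(plan: &[Step], allow_data_loss: bool) -> Vec<&Step> {
    if allow_data_loss {
        return Vec::new();
    }
    plan.iter().filter(|step| step.is_destructive()).collect()
}
```

A plan containing `Step::DropType("Tmp".into())` yields one blocked step by default and none when `allow_data_loss` is `true`, which mirrors the opt-in shape of the CLI flag and the HTTP request field.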

Upgrade Notes

Migration

  • No data migration. v0.4.2 repos open directly on v0.5.0.

Clients

  • HTTP and SDK clients should switch any string-matching schema-lint parsing to code-based matching against the OG-XXX-NNN prefix.
  • Clients exercising destructive schema drops (DropProperty, DropType) must add the allow_data_loss request field (HTTP) or --allow-data-loss flag (CLI). Default is soft-drop-or-reject.
  • Calls to the mutate_as / load_as / apply_schema_as / branch authoring APIs now pass through the policy enforcer. Requests that bypassed authorization on v0.4.2 will be rejected on v0.5.0 once a policy is configured.
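
For clients moving to code-based matching, a small parser along these lines may help. It assumes the code is the first token of the message, that the middle segment is uppercase ASCII letters, and that the last segment is decimal digits; the notes only state the OG-XXX-NNN shape, so those details and the example code `OG-ABC-012` are hypothetical.

```rust
/// Extracts `(category, number)` from a diagnostic that starts with a code of
/// the assumed shape `OG-<LETTERS>-<DIGITS>`. Returns `None` otherwise.
fn parse_code(message: &str) -> Option<(&str, u32)> {
    // The code is taken to be the first token, ending at whitespace or a colon.
    let token = message
        .split(|c: char| c.is_whitespace() || c == ':')
        .next()?;
    let mut parts = token.split('-');
    if parts.next()? != "OG" {
        return None;
    }
    let category = parts.next()?;
    let number = parts.next()?;
    // Reject tokens with extra segments, such as `OG-ABC-012-X`.
    if parts.next().is_some() {
        return None;
    }
    let category_ok = !category.is_empty() && category.chars().all(|c| c.is_ascii_uppercase());
    let number_ok = !number.is_empty() && number.chars().all(|c| c.is_ascii_digit());
    if category_ok && number_ok {
        Some((category, number.parse().ok()?))
    } else {
        None
    }
}
```

With this sketch, `parse_code("OG-ABC-012: dropped property")` yields `Some(("ABC", 12))`, and a free-form message without the prefix yields `None`, so tooling can branch on the code rather than on message text.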

Operators

  • Configure a Cedar policy YAML for production servers before enabling writes; deny-all is the new default. The omnigraph policy validate / test / explain CLI commands are unchanged.
  • Bearer tokens continue to be the actor-identity source; review the signed-token-claim-only invariant in docs/dev/invariants.md if you've built custom authentication.
  • If your local CI uses RustFS for S3-compatible storage testing, note that our CI pins rustfs/rustfs:1.0.0-beta.3 (the last known-good tag before the upstream credentials-policy change). Mirror the pin, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true when running newer image versions.
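
The deny-all default can be summarized as: no policy means no writes. The sketch below illustrates that rule with a hypothetical in-memory allow-list; it is not Cedar and not the omnigraph-policy API, and it assumes the actor string has already been taken from verified token claims.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny,
}

/// Hypothetical policy: a list of (actor, action) pairs that are permitted.
struct Policy {
    allowed: Vec<(String, String)>,
}

/// Decides a request. With no policy configured, every request is denied.
fn decide(policy: Option<&Policy>, actor: &str, action: &str) -> Decision {
    match policy {
        None => Decision::Deny,
        Some(p) => {
            let permitted = p
                .allowed
                .iter()
                .any(|(a, act)| a.as_str() == actor && act.as_str() == action);
            if permitted {
                Decision::Allow
            } else {
                Decision::Deny
            }
        }
    }
}
```

Here `decide(None, "alice", "mutate")` is `Decision::Deny` regardless of actor, and a request is allowed only when a configured policy explicitly lists it. Defaulting to deny means a missing or misplaced policy file fails closed rather than open.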

Tests added or strengthened

  • crates/omnigraph/tests/lance_surface_guards.rs — 8 named guards pinning Lance API surfaces against silent drift on future bumps.
  • crates/omnigraph/tests/policy_engine_chassis.rs — engine-level policy enforcement coverage; complements the existing HTTP policy tests.
  • Policy chassis e2e gap-fills — branch-merge, branch-create, branch-delete policy paths now have explicit end-to-end tests over HTTP and CLI.
  • Merge-pair truth table — exhaustive op-variant matrix for three-way merge across noop, addNode, removeNode, addEdge, removeEdge, setProperty, dropProperty, addLabel, removeLabel; the build fails to compile when a new op variant is added without dispositioning every pairing.
  • Merge-insert: regression for the dup-rowid bug class on the load surface (load_merge_repeated_against_overlapping_keys_succeeds), the update surface (second_sequential_update_on_same_row_succeeds), and the upstream-Lance-gap canary (load_merge_window_2_documents_upstream_lance_gap).
  • Maintenance + destructive-migration coverage — omnigraph optimize / cleanup boundary cases, plus schema-apply soft-drop and Hard-mode paths.
  • Stable-row-id preservation across stage_overwrite — pins the invariant that staged overwrites carry stable row IDs through to the committed fragment set.
  • CompOp::Contains pushdown regression (ir_filter_with_list_contains_pushes_down) — pins the new structured Expr pushdown path that retired the in-memory fallback.
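
The compile-time property of the merge-pair truth table relies on Rust's exhaustive `match`. The sketch below shows the idea on a reduced, hypothetical op set with three variants; the `Outcome` values are illustrative and are not the engine's actual merge dispositions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Noop,
    AddNode,
    RemoveNode,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Same,
    TakeLeft,
    TakeRight,
    Conflict,
}

/// Classifies one (left, right) pairing of a three-way merge.
fn merge_pair(left: Op, right: Op) -> Outcome {
    use Op::*;
    // No wildcard arm: adding a variant to `Op` makes this match
    // non-exhaustive, so the build fails until every new pairing is listed.
    match (left, right) {
        (Noop, Noop) => Outcome::Same,
        (Noop, AddNode) | (Noop, RemoveNode) => Outcome::TakeRight,
        (AddNode, Noop) | (RemoveNode, Noop) => Outcome::TakeLeft,
        (AddNode, AddNode) | (RemoveNode, RemoveNode) => Outcome::Same,
        (AddNode, RemoveNode) | (RemoveNode, AddNode) => Outcome::Conflict,
    }
}
```

The absence of a `_` arm is the design choice that matters: a catch-all would compile silently when a new variant is added, whereas the explicit table forces a decision for each new pairing.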

Included Changes

  • Lance 4 → 6.0.1, DataFusion 52 → 53, Arrow 57 → 58 substrate upgrade.
  • omnigraph-policy crate with engine-wide Cedar enforcement and signed-token-claim-only actor identity.
  • Schema-lint v1 chassis with OG-XXX-NNN codes, soft DropProperty / DropType semantics, and --allow-data-loss for Hard mode.
  • HTTP allow_data_loss request field parity with the CLI flag.
  • Structured DataFusion Expr filter pushdown via Scanner::filter_expr, with CompOp::Contains lowered through array_has.
  • Inline .gq source acceptance on CLI and HTTP read/mutate endpoints.
  • Optional CORS layer on omnigraph-server for browser UIs.
  • Bug fixes: merge-insert dup-rowid (FirstSeen + uniqueness precondition), branch-merge coordinator restore on error, blob-column materialization during branch merge.
  • New Lance API surface-guard test file as the canary for future Lance bumps.
  • Recovery-sidecar coverage extended across the four write paths (MutationStaging::finalize, schema_apply, branch_merge, ensure_indices) with failpoint regression tests.
  • CI: pinned rustfs/rustfs:1.0.0-beta.3 after the upstream :latest introduced a credentials-policy change.
  • Version bump to 0.5.0 across workspace crates, Cargo.lock, openapi.json, and the AGENTS.md surveyed version.