# Omnigraph v0.5.0

Omnigraph v0.5.0 is a substrate, security, and migration-safety release. It jumps the storage substrate from Lance 4 to Lance 6.0.1 (DataFusion 52 → 53, Arrow 57 → 58), introduces engine-wide Cedar policy enforcement on every authoring path, and ships a structured schema-lint v1 chassis with code-tagged diagnostics, soft drops, and an explicit `--allow-data-loss` flag for destructive migrations.

## Highlights

- **Lance 6.0.1 substrate**: bump from Lance 4.0.0 → 6.0.1, DataFusion 52 → 53, Arrow 57 → 58. New optimizer rules (vectorized `IN`-list eq kernel, `PhysicalExprSimplifier`, push-limit-into-hash-join, CASE-NULL shortcut) reach predicates that flow through the engine. `lance-tokenizer` replaces tantivy internally; FTS behavior is preserved.
- **Cedar policy engine**: a new `omnigraph-policy` crate wires `Omnigraph::enforce(action, scope, actor)` into every `_as` writer (`mutate_as`, `load_as`, `apply_schema_as`, `branch_create_as`, `branch_merge_as`, `branch_delete_as`, plus the load and change variants). The HTTP server defaults to deny-all when no Cedar policy is configured; a YAML policy file is required to enable writes. Actor identity comes only from signed token claims; clients cannot set actor identity directly.
- **Schema lint v1 chassis**: diagnostics now carry stable codes of the form `OG-XXX-NNN` instead of free-form messages. `omnigraph schema plan` and `apply` understand soft drops on properties and types. Destructive drops require the new `--allow-data-loss` flag (Hard mode) at the CLI and an equivalent JSON flag over HTTP.
- **Structured filter pushdown**: query-language predicates lower to DataFusion `Expr` and push down through Lance's `Scanner::filter_expr` instead of being flattened to SQL strings. This unlocks `CompOp::Contains` pushdown (via `array_has`), which previously fell through to in-memory post-scan filtering, and lets the DataFusion 53 optimizer rules above act on our predicates.
- **HTTP `allow_data_loss` parity**: the destructive-drop guard now exists on both the CLI (`--allow-data-loss`) and HTTP (`allow_data_loss: true` in the schema-apply request body).
- **Inline query strings on CLI and HTTP**: `omnigraph read` / `omnigraph mutate` and the corresponding HTTP endpoints accept inline `.gq` source, not just a file path. This makes ad-hoc queries easier and request logs clearer.
- **Browser CORS layer**: optional CORS layer on `omnigraph-server` for browser-based UIs, gated by `OMNIGRAPH_CORS_ORIGINS`.
- **Merge-insert dup-rowid fix**: Lance's `MergeInsertBuilder` could surface spurious `"Ambiguous merge inserts"` errors on sequential merges against rows previously rewritten by `merge_insert`. The engine now opts into `SourceDedupeBehavior::FirstSeen`. It pairs this with a `check_batch_unique_by_keys` fail-fast precondition that guarantees source-side dedup happens before Lance sees the batch.
- **Branch-merge error-path recovery**: a branch merge that failed mid-flight could leave the in-process coordinator pointing at a stale active branch. The error path now restores the prior coordinator, matching the success path's invariant.
- **Branch merge with blob columns**: external blob URIs are now materialized correctly during branch merge. Previously they were dropped or left pointing at the source branch.
- **Lance API surface guards**: a new test file (`crates/omnigraph/tests/lance_surface_guards.rs`) pins eight specific Lance API surfaces (`LanceError::TooMuchWriteContention`, `ManifestLocation` fields, `MergeInsertBuilder` return shape, `WriteParams::default`, `compact_files` signature, etc.). On any silent drift, the next Lance bump either fails to compile or fails these tests at runtime. Without the guards, the drift could surface as wrong-state recovery in production.

## Behavior changes

- **On-disk format unchanged**: existing v0.4.2 datasets open without conversion. The Lance file format pin stays at V2_2 (required by Lance's blob v2 feature).
- **`omnigraph-server` defaults to deny-all under `--policy`**: starting a server with the policy feature enabled but no Cedar YAML policy configured rejects every write. Operators must supply a policy file to authorize anything.
- **Schema-lint diagnostics carry stable codes**: messages now lead with `OG-XXX-NNN`. CI parsers or tooling that keyed off the v0.4.2 free-form text need to switch to code-based matching.
- **Destructive schema drops require `--allow-data-loss`**: dropping a property or type returns a structured diagnostic by default. `omnigraph schema apply --allow-data-loss` (CLI) or `{"allow_data_loss": true}` (HTTP) opts into Hard mode.
- **`HashJoinExec` null-aware semantics on anti-join**: as a side effect of the DataFusion 53 bump, `NOT IN` semantics under null-valued anti-join columns now follow the SQL standard. Queries that depended on the prior behavior were returning incorrect results.

## Upgrade Notes

### Migration

- No data migration is required. v0.4.2 repos open directly on v0.5.0.

### Clients

- HTTP and SDK clients should switch any string-matching schema-lint parsing to code-based matching against the `OG-XXX-NNN` prefix.
- Clients exercising destructive schema drops (`DropProperty`, `DropType`) must add the `allow_data_loss` request field (HTTP) or the `--allow-data-loss` flag (CLI). The default is soft-drop-or-reject.
- Calls to `mutate_as` / `load_as` / `apply_schema_as` / the branch authoring APIs now flow through the policy enforcer. Anything that bypassed authorization on v0.4.2 is rejected on v0.5.0 once a policy is configured.

### Operators

- Configure a Cedar policy YAML for production servers before enabling writes; deny-all is the new default. The `omnigraph policy validate` / `test` / `explain` CLI commands are unchanged.
- Bearer tokens continue to be the actor-identity source. If you've built custom authentication, review the signed-token-claim-only invariant in `docs/dev/invariants.md`.
- If your local CI uses RustFS for S3-compatible storage testing, note that our CI pins `rustfs/rustfs:1.0.0-beta.3`. This is the last known-good tag before the upstream credentials-policy change. Mirror the pin, or set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` for newer image versions.

## Tests added or strengthened

- `crates/omnigraph/tests/lance_surface_guards.rs`: eight named guards pinning Lance API surfaces against silent drift on future bumps.
- `crates/omnigraph/tests/policy_engine_chassis.rs`: engine-level policy enforcement coverage that complements the existing HTTP policy tests.
- Policy chassis e2e gap-fills: the branch-merge, branch-create, and branch-delete policy paths now have explicit end-to-end tests over HTTP and CLI.
- Merge-pair truth table: an exhaustive op-variant matrix for three-way merge across `noop`, `addNode`, `removeNode`, `addEdge`, `removeEdge`, `setProperty`, `dropProperty`, `addLabel`, and `removeLabel`. The build fails to compile when a new op variant is added without dispositioning every pairing.
- Merge-insert: regression tests for the dup-rowid bug class, covering three surfaces:
  - the load surface (`load_merge_repeated_against_overlapping_keys_succeeds`);
  - the update surface (`second_sequential_update_on_same_row_succeeds`);
  - the upstream-Lance-gap canary (`load_merge_window_2_documents_upstream_lance_gap`).
- Maintenance and destructive-migration coverage: `omnigraph optimize` / `cleanup` boundary cases, plus schema-apply soft-drop and Hard-mode paths.
- Stable-row-id preservation across `stage_overwrite`: pins the invariant that staged overwrites carry stable row IDs through to the committed fragment set.
- `CompOp::Contains` pushdown regression (`ir_filter_with_list_contains_pushes_down`): pins the new structured `Expr` pushdown path that retired the in-memory fallback.

## Included Changes

- Lance 4 → 6.0.1, DataFusion 52 → 53, Arrow 57 → 58 substrate upgrade.
- `omnigraph-policy` crate with engine-wide Cedar enforcement and signed-token-claim-only actor identity.
- Schema-lint v1 chassis with `OG-XXX-NNN` codes, soft `DropProperty` / `DropType` semantics, and `--allow-data-loss` for Hard mode.
- HTTP `allow_data_loss` request field parity with the CLI flag.
- Structured DataFusion `Expr` filter pushdown via `Scanner::filter_expr`, with `CompOp::Contains` lowered through `array_has`.
- Inline `.gq` source acceptance on CLI and HTTP read/mutate endpoints.
- Optional CORS layer on `omnigraph-server` for browser UIs.
- Bug fixes:
  - merge-insert dup-rowid (FirstSeen plus uniqueness precondition);
  - branch-merge coordinator restore on error;
  - blob-column materialization during branch merge.
- New Lance API surface-guard test file as the canary for future Lance bumps.
- Recovery-sidecar coverage extended across the four write paths (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) with failpoint regression tests.
- CI: pinned `rustfs/rustfs:1.0.0-beta.3` after the upstream `:latest` introduced a credentials-policy change.
- Version bump to `0.5.0` across workspace crates, `Cargo.lock`, `openapi.json`, and the `AGENTS.md` surveyed version.
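The structured filter pushdown in the Highlights replaces string-formatted SQL with an expression tree. The sketch below illustrates that idea using only std stand-ins. `CompOp`, `Expr`, and `lower` are hypothetical simplifications, not Omnigraph's IR or DataFusion's real `Expr` type. It shows `Contains` becoming an `array_has(column, value)` function node rather than an in-memory post-scan filter.

```rust
/// Hypothetical stand-in for the query language's comparison operators.
#[derive(Clone, Copy, Debug)]
pub enum CompOp {
    Eq,
    Contains,
}

/// Minimal stand-in for a structured expression tree (not DataFusion's real `Expr`).
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(String),
    Eq(Box<Expr>, Box<Expr>),
    /// A scalar function call such as `array_has(haystack, needle)`.
    Call { name: &'static str, args: Vec<Expr> },
}

/// Lowers `column <op> value` into a tree that a scanner can inspect,
/// instead of formatting it into a SQL string.
pub fn lower(column: &str, op: CompOp, value: &str) -> Expr {
    let col = Expr::Column(column.to_string());
    let lit = Expr::Literal(value.to_string());
    match op {
        CompOp::Eq => Expr::Eq(Box::new(col), Box::new(lit)),
        // List membership becomes a function node (haystack first, needle second),
        // so it can be pushed down instead of being evaluated after the scan.
        CompOp::Contains => Expr::Call { name: "array_has", args: vec![col, lit] },
    }
}
```

Keeping the predicate as a tree lets a scanner or optimizer pattern-match on it directly. A SQL string would have to be parsed back into a tree first.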
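The merge-insert fix relies on a fail-fast check that rejects duplicate keys in the source batch before Lance sees it. Here is a minimal sketch of that kind of precondition, assuming the key column has already been extracted as strings. `check_unique_keys` and `DuplicateKey` are hypothetical names. This is not the engine's `check_batch_unique_by_keys`, which works on real record batches.

```rust
use std::collections::HashMap;

/// Hypothetical error describing the first repeated key found in a batch.
#[derive(Debug, PartialEq)]
pub struct DuplicateKey {
    pub key: String,
    pub first_index: usize,
    pub duplicate_index: usize,
}

/// Hypothetical fail-fast precondition: returns `Err` at the first key that
/// appears twice, so a batch with duplicate keys never reaches the merge.
pub fn check_unique_keys(keys: &[&str]) -> Result<(), DuplicateKey> {
    // Maps each key to the row index where it was first seen.
    let mut first_seen: HashMap<&str, usize> = HashMap::with_capacity(keys.len());
    for (index, &key) in keys.iter().enumerate() {
        if let Some(&first_index) = first_seen.get(key) {
            return Err(DuplicateKey {
                key: key.to_string(),
                first_index,
                duplicate_index: index,
            });
        }
        first_seen.insert(key, index);
    }
    Ok(())
}
```

Failing on the first repeat keeps the check O(n) and reports both row positions, which makes the offending input easy to find.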
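The branch-merge recovery fix makes the error path restore the prior coordinator, as the success path already did. One way to keep both paths on the same invariant is to snapshot the state before the fallible work and restore it unconditionally. The sketch below assumes the merge temporarily repoints the coordinator at a working branch. `Coordinator` and `run_on_branch` are hypothetical stand-ins, not the engine's types.

```rust
/// Hypothetical stand-in for the in-process coordinator state.
#[derive(Clone, Debug, PartialEq)]
pub struct Coordinator {
    pub active_branch: String,
}

/// Hypothetical helper: runs `work` with the coordinator repointed at `branch`,
/// then restores the prior coordinator whether `work` succeeded or failed.
/// (Panics inside `work` are not covered by this sketch.)
pub fn run_on_branch<T, F>(coord: &mut Coordinator, branch: &str, work: F) -> Result<T, String>
where
    F: FnOnce(&mut Coordinator) -> Result<T, String>,
{
    // Snapshot the current coordinator and swap in the working branch.
    let prior = std::mem::replace(
        coord,
        Coordinator { active_branch: branch.to_string() },
    );
    let result = work(coord);
    // Restore on every return path, so an error cannot leave a stale branch active.
    *coord = prior;
    result
}
```

Restoring after `work` returns, instead of only in a success branch, means no early `Err` can skip the restore.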
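The Lance surface guards are meant to fail at compile time or at test time when an upstream API drifts. The sketch below shows one common way to write the compile-time kind in Rust, applied to local stand-in types. `ManifestLocationStandIn` with its fields and `compact_files_stand_in` with its signature are invented for illustration. They do not reflect Lance's actual definitions.

```rust
/// Invented stand-in for an upstream struct whose exact field set a caller relies on.
pub struct ManifestLocationStandIn {
    pub version: u64,
    pub path: String,
}

/// Invented stand-in for an upstream function whose signature a caller relies on.
pub fn compact_files_stand_in(target_rows_per_fragment: usize) -> Result<u64, String> {
    Ok(target_rows_per_fragment as u64)
}

/// Field guard: destructuring without `..` stops compiling if a field is added,
/// removed, or renamed; the typed return value also pins each field's type.
pub fn guard_manifest_fields(loc: ManifestLocationStandIn) -> (u64, String) {
    let ManifestLocationStandIn { version, path } = loc;
    (version, path)
}

/// Signature guard: coercing the function item to an explicit fn-pointer type
/// stops compiling if its parameter or return types drift.
pub fn guard_compact_signature() -> fn(usize) -> Result<u64, String> {
    compact_files_stand_in
}
```

The type system cannot see everything, such as a default value or what an error variant signals. Runtime guards cover those cases by asserting on behavior in a test.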
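The `NOT IN` behavior change follows SQL three-valued logic. The following std-only sketch models those rules, with `Option` standing in for NULL. `sql_not_in` is a hypothetical helper that illustrates the SQL semantics; it is not DataFusion's `HashJoinExec` code. A `WHERE x NOT IN (...)` filter keeps a row only when the result is `Some(true)`. That is why a single NULL in the list removes every row that does not match.

```rust
/// SQL `x NOT IN (list)` under three-valued logic.
/// `None` models SQL NULL in the inputs; in the result,
/// `Some(true)` = TRUE, `Some(false)` = FALSE, `None` = UNKNOWN.
pub fn sql_not_in(x: Option<i64>, list: &[Option<i64>]) -> Option<bool> {
    if list.is_empty() {
        // NOT IN over an empty set is TRUE, even when x is NULL.
        return Some(true);
    }
    let x = match x {
        Some(v) => v,
        // NULL NOT IN (non-empty set) is UNKNOWN.
        None => return None,
    };
    let mut saw_null = false;
    for item in list {
        match item {
            // A definite match makes NOT IN FALSE.
            Some(v) if *v == x => return Some(false),
            Some(_) => {}
            None => saw_null = true,
        }
    }
    // No match: TRUE if the set had no NULLs, otherwise UNKNOWN.
    if saw_null { None } else { Some(true) }
}
```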
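The client upgrade note asks tooling to match on the `OG-XXX-NNN` prefix instead of the message text. A minimal parser could look like the sketch below. It assumes that `XXX` is three ASCII uppercase letters and `NNN` is three ASCII digits. The placeholder suggests this shape, but these notes do not spell it out. `diagnostic_code` is a hypothetical helper, and `OG-ABC-123` is a made-up code.

```rust
/// Hypothetical helper: returns the leading `OG-XXX-NNN` code of a diagnostic
/// message, assuming XXX = three ASCII uppercase letters, NNN = three ASCII digits.
pub fn diagnostic_code(message: &str) -> Option<&str> {
    // `get` returns None if the message is shorter than 10 bytes
    // or byte 10 is not a char boundary.
    let code = message.get(..10)?;
    let b = code.as_bytes();
    let well_formed = code.starts_with("OG-")
        && b[3..6].iter().all(u8::is_ascii_uppercase)
        && b[6] == b'-'
        && b[7..10].iter().all(u8::is_ascii_digit);
    // Require the code to end at a word boundary, so "OG-ABC-1234" is rejected.
    let at_boundary = message[10..]
        .chars()
        .next()
        .map_or(true, |c| !c.is_ascii_alphanumeric());
    if well_formed && at_boundary { Some(code) } else { None }
}
```

Tooling can then branch on the returned code rather than on the human-readable remainder of the message.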