omnigraph/docs/releases/v0.4.0.md
Ragnor Comerford 35be20cb05
MR-771: demote Run to direct-publish via expected_table_versions CAS
mutate_as and load now write directly to target tables and call the
publisher once at the end with per-table expected versions; the Run
state machine, _graph_runs.lance writers, __run__ staging branches,
and server /runs/* endpoints are removed. Multi-statement mutations
remain atomic at the manifest level via an in-memory MutationStaging
accumulator that gives read-your-writes within a query and a single
publish at the end. Concurrent-writer conflicts surface as
ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the
old DivergentUpdate merge shape. Documents one known limitation in
docs/runs.md: a multi-statement mid-query failure where op-N writes
a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the
manifest until a follow-up introduces per-table Lance branches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 08:52:50 +02:00


Omnigraph v0.4.0

Omnigraph v0.4.0 demotes the Run state machine to commit metadata via the publisher's CAS, fixing the cancellation hole that motivated MR-771 and reducing the engine's surface area.

Highlights

  • Direct-to-target writes (MR-771): mutate_as and load write directly to the target tables and call ManifestBatchPublisher::publish once at the end with expected_table_versions. No more __run__<id> staging branches, no more RunRecord state machine. Cross-table OCC is enforced inside the publisher's row-level CAS on __manifest.
  • Cancellation safety by construction: a dropped mutation future leaves no graph-level state — only orphaned Lance fragments, reclaimed by omnigraph cleanup. The "zombie run" cascade documented in .context/zombie-run-investigation.md is gone.
  • Read-your-writes inside multi-statement mutations: a .gq query that inserts a row and then references it later in the same query now sees its own writes via an in-process MutationStaging cache, even though no manifest commit happens between ops.
  • Structured conflict surface: concurrent writers race through the publisher's CAS; the loser surfaces as ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected, actual }. The HTTP server maps this to 409 Conflict with a structured manifest_conflict body so clients can detect-and-retry without parsing the message.
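
The interaction between staging, the single publish, and the version check can be illustrated with a toy in-memory model. This is a minimal sketch, not the engine's code: `ToyManifest`, `ToyStaging`, the `insert`/`scan` methods, and the `node:Person` table key are hypothetical names chosen for illustration, and the toy keeps staged rows in memory, whereas the real engine writes Lance fragments directly to the target tables. Only the `ExpectedVersionMismatch { table_key, expected, actual }` shape is taken from the notes above.

```rust
use std::collections::HashMap;

/// Toy stand-in for the conflict shape named in the notes above.
#[derive(Debug, PartialEq)]
struct ExpectedVersionMismatch {
    table_key: String,
    expected: u64,
    actual: u64,
}

/// Hypothetical committed state: table key -> (version, rows).
#[derive(Default)]
struct ToyManifest {
    tables: HashMap<String, (u64, Vec<String>)>,
}

/// Hypothetical per-query accumulator, loosely modeled on MutationStaging.
#[derive(Default)]
struct ToyStaging {
    expected: HashMap<String, u64>,        // version observed at first touch
    pending: HashMap<String, Vec<String>>, // rows written by this query
}

impl ToyStaging {
    /// Stage a row and remember the table version seen on first touch.
    fn insert(&mut self, m: &ToyManifest, table: &str, row: &str) {
        let seen = m.tables.get(table).map_or(0, |t| t.0);
        self.expected.entry(table.to_string()).or_insert(seen);
        self.pending.entry(table.to_string()).or_default().push(row.to_string());
    }

    /// Read-your-writes: committed rows plus rows staged by this query.
    fn scan(&self, m: &ToyManifest, table: &str) -> Vec<String> {
        let mut rows = m.tables.get(table).map_or_else(Vec::new, |t| t.1.clone());
        if let Some(p) = self.pending.get(table) {
            rows.extend(p.iter().cloned());
        }
        rows
    }

    /// Single publish: check every expected version first, then apply everything.
    fn publish(self, m: &mut ToyManifest) -> Result<(), ExpectedVersionMismatch> {
        for (table_key, &expected) in &self.expected {
            let actual = m.tables.get(table_key).map_or(0, |t| t.0);
            if actual != expected {
                return Err(ExpectedVersionMismatch { table_key: table_key.clone(), expected, actual });
            }
        }
        for (table_key, rows) in self.pending {
            let entry = m.tables.entry(table_key).or_default();
            entry.0 += 1;
            entry.1.extend(rows);
        }
        Ok(())
    }
}

fn main() {
    let mut manifest = ToyManifest::default();

    // Two queries start from the same committed state.
    let mut a = ToyStaging::default();
    let mut b = ToyStaging::default();
    a.insert(&manifest, "node:Person", "alice");
    b.insert(&manifest, "node:Person", "bob");

    // Query A sees its own uncommitted row; the manifest is still untouched.
    assert_eq!(a.scan(&manifest, "node:Person"), vec!["alice".to_string()]);
    assert!(manifest.tables.is_empty());

    // A publishes first and wins; B's expectation is now stale.
    assert_eq!(a.publish(&mut manifest), Ok(()));
    let err = b.publish(&mut manifest).unwrap_err();
    assert_eq!(
        err,
        ExpectedVersionMismatch { table_key: "node:Person".into(), expected: 0, actual: 1 }
    );
}
```

Checking all expectations before applying any change is what makes the toy publish all-or-nothing across tables; a dropped `ToyStaging` value simply never reaches `publish`, which mirrors the cancellation argument above.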

Removed

This is a breaking release. Pre-0.4.0 / no SLA.

  • omnigraph::db::{RunRecord, RunStatus, RunId} types and the _graph_runs.lance / _graph_run_actors.lance Lance datasets.
  • Engine APIs begin_run, begin_run_as, publish_run, publish_run_as, abort_run, fail_run, terminate_run, list_runs, get_run.
  • HTTP endpoints: GET /runs, GET /runs/{run_id}, POST /runs/{run_id}/publish, POST /runs/{run_id}/abort. The RunListOutput and RunOutput schemas are removed from the OpenAPI document.
  • CLI subcommands: omnigraph run list, omnigraph run show, omnigraph run publish, omnigraph run abort. For audit history, use omnigraph commit list, which reads the commit graph.
  • Cedar policy actions run_publish and run_abort. Existing policy.yaml files referencing these actions will fail validation — remove the rules; the change action covers the equivalent gating.

Behavior changes

  • mutate_as / load are now atomic per query, with a single publish at the end. A failed mutation leaves the target unchanged at the manifest level, with no intermediate manifest commits.
  • The OmniError::manifest_conflict shape produced by concurrent writers is now ExpectedVersionMismatch (was MergeConflict::DivergentUpdate via the run merge path). Clients that match on the conflict body must switch to inspecting manifest_conflict.table_key/expected/actual.
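
A client-side detect-and-retry loop over the new conflict shape might look like the following sketch. It is an illustration under stated assumptions: `retry_on_conflict` is a hypothetical helper, the struct only mirrors the `table_key`/`expected`/`actual` fields named above, and the closure stands in for whatever issues the mutation (an HTTP call to /change, for example). A real client would also need to re-evaluate the mutation against fresh state before retrying, rather than only swapping in the new version.

```rust
/// Mirrors the fields of the manifest_conflict body described above.
#[derive(Debug, Clone, PartialEq)]
struct ExpectedVersionMismatch {
    table_key: String,
    expected: u64,
    actual: u64,
}

/// Hypothetical helper: call `attempt` until it succeeds or `max_attempts`
/// is exhausted. Each attempt receives the version to use as its expectation;
/// after a conflict, the next attempt uses the `actual` version it reported.
fn retry_on_conflict<T>(
    start_version: u64,
    max_attempts: u32,
    mut attempt: impl FnMut(u64) -> Result<T, ExpectedVersionMismatch>,
) -> Result<T, ExpectedVersionMismatch> {
    let mut version = start_version;
    let mut last_conflict = None;
    for _ in 0..max_attempts {
        match attempt(version) {
            Ok(value) => return Ok(value),
            Err(conflict) => {
                version = conflict.actual;
                last_conflict = Some(conflict);
            }
        }
    }
    Err(last_conflict.expect("max_attempts must be at least 1"))
}

fn main() {
    // Simulated server: the table is at version 5, the client believes 3.
    let mut server_version = 5u64;
    let mut calls = 0;

    let result = retry_on_conflict(3, 3, |expected| {
        calls += 1;
        if expected != server_version {
            return Err(ExpectedVersionMismatch {
                table_key: "node:Person".into(), // hypothetical table key
                expected,
                actual: server_version,
            });
        }
        server_version += 1;
        Ok(server_version)
    });

    // The first attempt conflicts, the second succeeds at the fresh version.
    assert_eq!(result, Ok(6));
    assert_eq!(calls, 2);
}
```

Bounding the number of attempts keeps a client from spinning under sustained contention; the last conflict is returned so the caller can still report which table diverged.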

Known limitation

A multi-statement mutation that writes a Lance fragment in op-N and then fails in op-N+1 leaves the touched table with Lance HEAD ahead of the manifest. The next mutation against that table fails with ExpectedVersionMismatch. Most validation runs before any Lance write, so single-statement mutations are unaffected; the narrow path is multi-statement queries with late-op failures. Tracked as a follow-up; see docs/runs.md for the workaround.
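
The failure sequence can be traced with a toy model of one table carrying two counters. This is only a sketch of the scenario as described here: `ToyTable`, `ToyError`, and `mutate` are hypothetical, and the up-front comparison of the manifest version against Lance HEAD is an assumption made for illustration, not a description of where the engine actually performs the check.

```rust
/// Toy table with two independent version counters.
#[derive(Debug, Default)]
struct ToyTable {
    lance_head: u64,       // advanced by every fragment write
    manifest_version: u64, // advanced only by a successful publish
}

#[derive(Debug, PartialEq)]
enum ToyError {
    ExpectedVersionMismatch { expected: u64, actual: u64 },
    OpFailed { index: usize },
}

/// Run a multi-statement mutation; each op is `true` (write succeeds)
/// or `false` (the op fails before writing).
fn mutate(table: &mut ToyTable, ops: &[bool]) -> Result<(), ToyError> {
    // ASSUMPTION: the mutation expects Lance HEAD to match the manifest.
    if table.lance_head != table.manifest_version {
        return Err(ToyError::ExpectedVersionMismatch {
            expected: table.manifest_version,
            actual: table.lance_head,
        });
    }
    for (index, &ok) in ops.iter().enumerate() {
        if !ok {
            // Earlier ops already wrote fragments; nothing rolls them back.
            return Err(ToyError::OpFailed { index });
        }
        table.lance_head += 1; // fragment written directly to the target
    }
    table.manifest_version = table.lance_head; // single publish at the end
    Ok(())
}

fn main() {
    let mut t = ToyTable::default();

    // A clean two-op mutation keeps both counters in step.
    assert_eq!(mutate(&mut t, &[true, true]), Ok(()));
    assert_eq!((t.lance_head, t.manifest_version), (2, 2));

    // Op 0 writes, op 1 fails: Lance HEAD is now ahead of the manifest.
    assert_eq!(mutate(&mut t, &[true, false]), Err(ToyError::OpFailed { index: 1 }));
    assert_eq!((t.lance_head, t.manifest_version), (3, 2));

    // The next mutation against the same table is rejected.
    assert_eq!(
        mutate(&mut t, &[true]),
        Err(ToyError::ExpectedVersionMismatch { expected: 2, actual: 3 })
    );

    // A failure in the first op writes nothing, so the table stays usable.
    let mut u = ToyTable::default();
    assert_eq!(mutate(&mut u, &[false]), Err(ToyError::OpFailed { index: 0 }));
    assert_eq!(mutate(&mut u, &[true]), Ok(()));
}
```

The last case shows why single-statement mutations are unaffected in this model: a failure before the first write leaves both counters equal.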

Upgrade notes

  • Stale __run__* branches and _graph_runs.lance in legacy v0.3.x repos are inert — the engine no longer reads them — but they remain on disk until production cleanup. MR-770 owns the destructive sweep; this release deliberately does not touch legacy bytes.
  • The is_internal_run_branch predicate is kept as a defense-in-depth guard against users naming a branch __run__*. It will be removed in a follow-up alongside MR-770.
  • External scripts hitting /runs/* will now receive 404. Migrate them to /commits for audit history; mutation status is implied by the HTTP response on /change itself.

Included Changes

  • MR-771 — Demote Run: write directly to target via publisher
  • MR-766 — ManifestBatchPublisher::publish accepts per-table expected_table_versions (landed earlier; this release wires it in end-to-end)