2026-05-10 14:37:58 +00:00


# Omnigraph v0.4.2
Omnigraph v0.4.2 is a concurrency, admission-control, and release-hygiene
release. It removes the server-global write lock, lets disjoint writers make
progress concurrently, adds per-actor admission limits, hardens branch and
mutation races with snapshot-isolation fences, and adopts public, open-source
conventions for its release documentation.
## Highlights
- **Unlocked server engine handle**: the HTTP server now holds the engine behind
a shared handle instead of a server-global write lock. Concurrent handlers can
call engine APIs directly while the engine serializes only the resources that
actually conflict.
- **Engine-owned writer queues**: writers that target the same `(table, branch)`
pair are serialized by per-table writer queues inside the engine, while writes
to disjoint tables or branches can run concurrently. This narrows contention
without requiring route handlers to know storage-level ordering rules.
- **Per-actor admission control**: mutating HTTP handlers are gated by a
`WorkloadController` with per-actor in-flight request and estimated-byte
budgets. Rejections use HTTP 429 with `code: too_many_requests` and a
`Retry-After` header, so noisy actors back off without blocking unrelated
actors.
- **Admission coverage for all mutating handlers**: `/change`, `/ingest`,
`/schema/apply`, branch create/delete, and branch merge now flow through the
admission controller. Read-only endpoints are not admission-gated.
- **Op-kind-aware version checks**: mutation commit-time drift checks distinguish
append-like inserts from strict update/delete work. Inserts remain permissive
enough for safe concurrent append patterns; updates and deletes get stricter
stale-view rejection.
- **Read-time drift checks for strict mutations**: staged mutations compare the
manifest pin captured when the query opened against the manifest snapshot
captured under table-queue ownership. If a concurrent writer moved the table
after the query read, the stale writer returns a structured
`manifest_conflict` 409 instead of staging work computed against an old
snapshot.
- **Inline-delete recovery coverage**: delete-only mutations still use Lance's
inline delete path, but their recovery sidecar is now written before the
manifest-version rejection path can return. If a delete moves Lance HEAD and a
concurrent manifest update makes the query stale, the next read-write open can
roll back the residual write rather than leaving the table's HEAD ahead of the
manifest.
- **Branch-operation race hardening**: branch creation and branch merge avoid
coordinator swap-restore races that could expose the wrong active branch to
concurrent work. Concurrent branch merges are serialized by a merge mutex.
- **Branch-merge target revalidation**: merges re-check target table versions
after acquiring target write queues. A stale merge plan returns a structured
conflict instead of overwriting concurrent target-branch changes or adopting a
source table over newly appended target rows.
- **Schema refresh deadlock fix**: recovery refresh releases the write guard
before schema reload, preventing a refresh/schema-apply deadlock.
- **Lean admission API**: removed the unused global rewrite admission pool,
`service_unavailable` error variant, related 503 documentation, and benchmark
flag. The public server surface now reflects only admission behavior that is
wired to handlers.
- **Open-source release hygiene**: this release adds guidance for public-facing
documentation, release notes, and version bumps. Release docs now avoid
private issue tracker references and use stable public descriptions instead.
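A minimal sketch of the shared engine handle combined with per-`(table, branch)` writer serialization, using only `std`. The `Engine` type, its `queue`/`write` methods, and the use of one `Mutex` per pair as a stand-in for a writer queue are hypothetical illustrations, not Omnigraph's actual types. Note that `std::sync::Mutex` does not guarantee FIFO ordering, so this shows only the scope of serialization, not queue fairness.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical key: (table name, branch name).
type QueueKey = (String, String);

// Hypothetical per-pair writer state: a mutex standing in for a writer queue
// (std's Mutex is not FIFO), guarding the rows written so far.
type WriterQueue = Arc<Mutex<Vec<String>>>;

// Hypothetical engine. Handlers share it through an `Arc`; there is no
// server-global write lock, only a short-lived lock around the queue map.
#[derive(Default)]
struct Engine {
    queues: Mutex<HashMap<QueueKey, WriterQueue>>,
}

impl Engine {
    // Returns the writer queue for one (table, branch) pair, creating it on
    // first use. The map lock is held only for the lookup itself.
    fn queue(&self, table: &str, branch: &str) -> WriterQueue {
        let mut map = self.queues.lock().unwrap();
        map.entry((table.to_string(), branch.to_string()))
            .or_default()
            .clone()
    }

    // Writes while holding only this pair's lock, so writers to other
    // (table, branch) pairs are not blocked.
    fn write(&self, table: &str, branch: &str, row: String) {
        let queue = self.queue(table, branch);
        queue.lock().unwrap().push(row);
    }
}

fn main() {
    let engine = Arc::new(Engine::default());
    let handles: Vec<_> = (0..4)
        .map(|i| {
            // Each "handler" clones the shared handle instead of taking a global lock.
            let engine = Arc::clone(&engine);
            thread::spawn(move || {
                // Handlers 0 and 1 contend on ("people", "main"); 2 and 3 do not.
                let (table, branch) = match i {
                    0 | 1 => ("people", "main"),
                    2 => ("orders", "main"),
                    _ => ("people", "feature"),
                };
                engine.write(table, branch, format!("row-{i}"));
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let rows = engine.queue("people", "main").lock().unwrap().len();
    println!("rows on people/main: {rows}");
}
```

The two writers to `("people", "main")` contend with each other, but neither blocks the writes to `("orders", "main")` or `("people", "feature")`.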
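Per-actor admission can be sketched as a usage map behind a mutex plus an RAII permit that returns the actor's budget when the request finishes. `Controller`, `Limits`, `Permit`, `try_admit`, and the one-second retry delay are hypothetical names and values chosen for illustration. They are loosely modeled on the `WorkloadController` behavior described above and are not taken from its API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::Duration;

// Hypothetical per-actor budgets.
#[derive(Clone, Copy)]
struct Limits {
    max_in_flight: usize,
    max_bytes: u64,
}

#[derive(Default)]
struct Usage {
    in_flight: usize,
    bytes: u64,
}

// Hypothetical rejection; an HTTP layer would map it to 429 + Retry-After.
#[derive(Debug, PartialEq)]
struct TooManyRequests {
    retry_after: Duration,
}

struct Controller {
    limits: Limits,
    retry_after: Duration,
    usage: Mutex<HashMap<String, Usage>>,
}

// RAII permit: releases the actor's in-flight slot and bytes when dropped.
struct Permit {
    controller: Arc<Controller>,
    actor: String,
    bytes: u64,
}

impl Drop for Permit {
    fn drop(&mut self) {
        let mut usage = self.controller.usage.lock().unwrap();
        if let Some(u) = usage.get_mut(&self.actor) {
            u.in_flight -= 1;
            u.bytes -= self.bytes;
        }
    }
}

// Admits a request only if this actor stays within both budgets.
fn try_admit(
    controller: &Arc<Controller>,
    actor: &str,
    est_bytes: u64,
) -> Result<Permit, TooManyRequests> {
    let mut usage = controller.usage.lock().unwrap();
    let u = usage.entry(actor.to_string()).or_default();
    let over_count = u.in_flight >= controller.limits.max_in_flight;
    let over_bytes = u.bytes.saturating_add(est_bytes) > controller.limits.max_bytes;
    if over_count || over_bytes {
        return Err(TooManyRequests { retry_after: controller.retry_after });
    }
    u.in_flight += 1;
    u.bytes += est_bytes;
    drop(usage);
    Ok(Permit { controller: Arc::clone(controller), actor: actor.to_string(), bytes: est_bytes })
}

fn main() {
    let controller = Arc::new(Controller {
        limits: Limits { max_in_flight: 2, max_bytes: 1_000 },
        retry_after: Duration::from_secs(1),
        usage: Mutex::new(HashMap::new()),
    });
    let first = try_admit(&controller, "alice", 100).expect("admitted");
    let _second = try_admit(&controller, "alice", 100).expect("admitted");
    // A third concurrent request from alice exceeds her in-flight budget...
    let rejected = try_admit(&controller, "alice", 100).err().expect("rejected");
    println!("429 too_many_requests, Retry-After: {}", rejected.retry_after.as_secs());
    // ...while bob is admitted independently.
    let _bob = try_admit(&controller, "bob", 100).expect("admitted");
    // Finishing one of alice's requests returns her budget.
    drop(first);
    println!("alice admitted again: {}", try_admit(&controller, "alice", 100).is_ok());
}
```

Because usage is tracked per actor, exhausting `alice`'s budget has no effect on `bob`. In this sketch a single request whose estimate alone exceeds `max_bytes` is always rejected, so retrying it can never succeed.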
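The op-kind-aware check, the read-time pin comparison, and the branch-merge target revalidation share one capture-then-recheck pattern: record a version when the work is planned, then compare it with the version seen after taking writer-queue ownership. The sketch below assumes plain integer versions and hypothetical names (`OpKind`, `check_drift`, `revalidate_merge`, `ManifestConflict`). It also simplifies by treating every insert as drift-tolerant, whereas the release only says inserts stay permissive enough for safe concurrent appends.

```rust
use std::collections::BTreeMap;

// Hypothetical mutation kinds.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OpKind {
    Insert,
    Update,
    Delete,
}

// Stand-in for the structured `manifest_conflict` 409.
#[derive(Debug, PartialEq)]
struct ManifestConflict {
    pinned: u64,
    current: u64,
}

// `pinned`: table version captured when the query opened.
// `current`: version seen after taking ownership of the table's writer queue.
fn check_drift(op: OpKind, pinned: u64, current: u64) -> Result<(), ManifestConflict> {
    match op {
        // Simplification: inserts are treated as append-like and drift-tolerant.
        OpKind::Insert => Ok(()),
        // Strict ops were computed against `pinned`; any movement makes them stale.
        OpKind::Update | OpKind::Delete if current != pinned => {
            Err(ManifestConflict { pinned, current })
        }
        OpKind::Update | OpKind::Delete => Ok(()),
    }
}

// Branch-merge revalidation: every target table version recorded in the
// merge plan must be unchanged once the target queues are held.
// A table missing from `current` is reported as version 0.
fn revalidate_merge(
    planned: &BTreeMap<&str, u64>,
    current: &BTreeMap<&str, u64>,
) -> Result<(), ManifestConflict> {
    for (table, &pinned) in planned {
        let now = current.get(table).copied().unwrap_or(0);
        if now != pinned {
            return Err(ManifestConflict { pinned, current: now });
        }
    }
    Ok(())
}

fn main() {
    // A strict update pinned at version 3 finds version 4 under the queue.
    println!("{:?}", check_drift(OpKind::Update, 3, 4));
    println!("{:?}", check_drift(OpKind::Insert, 3, 4));
    let planned: BTreeMap<&str, u64> = BTreeMap::from([("people", 4)]);
    let current: BTreeMap<&str, u64> = BTreeMap::from([("people", 5)]);
    println!("{:?}", revalidate_merge(&planned, &current));
}
```

An `Err(ManifestConflict { .. })` here plays the role of the structured `manifest_conflict` 409 returned instead of staging work computed against an old snapshot.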
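The inline-delete ordering fix is easiest to see in a toy model where versions are plain counters. `Table`, `inline_delete`, `open_read_write`, and the `sidecar` field are hypothetical, and resetting `head` to an earlier number is a simplification that does not reflect how Lance actually restores versions. The only point illustrated is that the sidecar is written before the stale-view rejection can return.

```rust
// Toy model of one table. `head` stands in for the storage HEAD version and
// `manifest` for the version the manifest publishes; both are plain counters.
#[derive(Debug)]
struct Table {
    head: u64,
    manifest: u64,
    // Hypothetical recovery sidecar: the HEAD version to roll back to.
    sidecar: Option<u64>,
}

#[derive(Debug, PartialEq)]
struct StaleView;

impl Table {
    // Another writer commits: HEAD and manifest advance together.
    fn concurrent_commit(&mut self) {
        self.head += 1;
        self.manifest = self.head;
    }

    // Inline delete against the manifest version the query pinned.
    fn inline_delete(&mut self, pinned_manifest: u64) -> Result<(), StaleView> {
        self.sidecar = Some(self.head); // 1. write the sidecar first
        self.head += 1; // 2. the inline delete moves HEAD
        if self.manifest != pinned_manifest {
            return Err(StaleView); // 3. stale view: reject, sidecar survives
        }
        self.manifest = self.head; // 4. publish the new version
        self.sidecar = None; // 5. nothing left to recover
        Ok(())
    }

    // Read-write open: roll back a HEAD that was left ahead of the manifest.
    fn open_read_write(&mut self) {
        if let Some(rollback_to) = self.sidecar.take() {
            if self.head > self.manifest {
                self.head = rollback_to;
            }
        }
    }
}

fn main() {
    let mut table = Table { head: 1, manifest: 1, sidecar: None };
    let pinned = table.manifest; // the delete's query opens at version 1
    table.concurrent_commit(); // a concurrent writer publishes version 2
    let result = table.inline_delete(pinned);
    println!("{result:?}, {table:?}");
    table.open_read_write();
    println!("after reopen: {table:?}");
}
```

If the sidecar write (step 1) happened only after the version check (step 3), the stale delete would return with HEAD already moved and nothing recording where to roll back to.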
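The schema-refresh deadlock fix is a lock-ordering change. This sketch assumes, for illustration only, that schema apply takes a schema lock and then the write guard; a refresh that held the write guard while waiting for the schema lock could then deadlock against it. `EngineLocks`, `refresh`, and `schema_apply` are hypothetical names. The fixed `refresh` drops its write guard before reloading, so it never holds both locks at once and no cycle can form.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical engine with two locks: the write guard and the schema lock.
#[derive(Default)]
struct EngineLocks {
    write: Mutex<u64>,  // counts writes and recovery passes in this sketch
    schema: Mutex<u64>, // stand-in for the loaded schema version
}

impl EngineLocks {
    // Fixed recovery refresh: do the recovery work under the write guard,
    // release it, and only then take the schema lock to reload.
    fn refresh(&self) -> u64 {
        {
            let mut write = self.write.lock().unwrap();
            *write += 1; // recovery work
        } // write guard dropped here, before the schema reload
        let schema = self.schema.lock().unwrap();
        *schema // "reload": read the current schema version
    }

    // Assumed order for schema apply: schema lock first, then the write guard.
    fn schema_apply(&self) {
        let mut schema = self.schema.lock().unwrap();
        let mut write = self.write.lock().unwrap();
        *schema += 1;
        *write += 1;
    }
}

fn main() {
    let engine = Arc::new(EngineLocks::default());
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    if i % 2 == 0 {
                        engine.refresh();
                    } else {
                        engine.schema_apply();
                    }
                }
            })
        })
        .collect();
    // With the fixed refresh, every thread finishes.
    for handle in handles {
        handle.join().unwrap();
    }
    println!("schema version: {}", *engine.schema.lock().unwrap());
}
```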
## Behavior changes
- Disjoint mutating HTTP requests can now make progress concurrently instead of
queueing behind one process-wide engine write lock.
- Mutating handlers may return HTTP 429 when an actor exceeds per-actor in-flight
or estimated-byte budgets. Clients should respect `Retry-After` and retry
later.
- Concurrent update/delete and merge races now return structured
`manifest_conflict` 409 responses in more stale-view cases instead of relying
on later publisher-CAS detection or allowing a stale plan to proceed.
- A branch merge that races a `/change` on the same target branch may either
succeed or return a clean 409 conflict, depending on which operation acquires
the target writer queue first.
- `OMNIGRAPH_GLOBAL_REWRITE_MAX` is no longer recognized. Remove it from
deployment manifests; the per-actor in-flight and estimated-byte budgets are
now the only admission settings wired to server handlers.
## Upgrade notes
- No repository migration is required. Existing v0.4.1 repos can be opened
directly with v0.4.2.
- Clients should treat `manifest_conflict` 409 responses as retryable stale-view
conflicts. This was already the documented contract, but this release uses it
in more concurrent-write paths.
- Clients should handle HTTP 429 from every mutating endpoint, not only
`/change`. Honor the `Retry-After` header.
- Operators should remove stale references to global rewrite admission and 503
rewrite-pool exhaustion from local runbooks.
- If you maintain public docs or release notes, use public identifiers and
user-facing descriptions rather than private tracker IDs.
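On the client side, the guidance above amounts to classifying each response. The helper below is a hypothetical sketch (`Next`, `classify`, and the one-second fallback delay are assumptions, not part of Omnigraph's API): it waits on 429 using `Retry-After` when the header carries a delta-seconds value, re-reads and retries on a 409 `manifest_conflict`, and fails on anything else.

```rust
use std::time::Duration;

// What a client should do next with a response (hypothetical helper).
#[derive(Debug, PartialEq)]
enum Next {
    Done,
    // 429: wait, then resend the same request.
    RetryAfter(Duration),
    // 409 manifest_conflict: re-read current state, rebuild, resend.
    RereadAndRetry,
    // Anything else: surface the error to the caller.
    Fail,
}

fn classify(status: u16, code: Option<&str>, retry_after: Option<&str>) -> Next {
    match (status, code) {
        (200..=299, _) => Next::Done,
        (429, _) => {
            // Only the delta-seconds form of Retry-After is parsed here;
            // HTTP-date values and a missing header fall back to 1 second.
            let secs = retry_after
                .and_then(|v| v.trim().parse::<u64>().ok())
                .unwrap_or(1);
            Next::RetryAfter(Duration::from_secs(secs))
        }
        (409, Some("manifest_conflict")) => Next::RereadAndRetry,
        _ => Next::Fail,
    }
}

fn main() {
    let responses: [(u16, Option<&str>, Option<&str>); 4] = [
        (200, None, None),
        (429, Some("too_many_requests"), Some("2")),
        (409, Some("manifest_conflict"), None),
        (500, Some("internal"), None),
    ];
    for (status, code, retry_after) in responses {
        println!("{status}: {:?}", classify(status, code, retry_after));
    }
}
```

A real client would also cap the number of attempts and add jitter between retries.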
## Tests added or strengthened
- Regression tests for update read-your-writes under in-process concurrency.
- HTTP tests for same-key insert snapshots, disjoint `/change` concurrency, and
`/ingest` admission 429 + `Retry-After`.
- Branch-operation regression tests for branch-create swap-restore races,
concurrent `/change` + branch-merge interleavings, branch-merge swap-restore
races, branch-op matrix coverage, and post-reopen consistency.
- Failpoint-backed regression coverage for inline-delete recovery sidecar
creation before version-mismatch rejection.
- Admission tests use injectable `WorkloadController` state instead of mutating
process environment.
## Included changes
- Shared server engine state and per-actor admission on mutating endpoints.
- Per-(table, branch) writer queues and op-kind-aware manifest drift checks.
- Strict read-time version checks for updates/deletes.
- Branch create/merge race hardening and branch-merge target snapshot
revalidation under queue ownership.
- `Retry-After` support for admission rejections, and OpenAPI updates for the
reachable 429 responses.
- Actor-isolation benchmark harness updates for the current admission controller.
- Removal of the unwired global rewrite admission / 503 server surface.
- Version bump to `0.4.2` across workspace crates, `Cargo.lock`, and
`openapi.json`.
- Public release-note cleanup and new OSS best-practice guidance in `AGENTS.md`.