From 4eb865b34091ee7de2e93b6ac8d83280d867f443 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Sun, 10 May 2026 14:37:58 +0000
Subject: [PATCH] docs: expand 0.4.2 release notes

---
 docs/releases/v0.4.2.md | 102 ++++++++++++++++++++++++++++++----------
 1 file changed, 78 insertions(+), 24 deletions(-)

diff --git a/docs/releases/v0.4.2.md b/docs/releases/v0.4.2.md
index e50167c..bc45716 100644
--- a/docs/releases/v0.4.2.md
+++ b/docs/releases/v0.4.2.md
@@ -1,44 +1,76 @@
 # Omnigraph v0.4.2
 
-Omnigraph v0.4.2 is a correctness and operability release for concurrent
-writes. It closes snapshot-isolation lost-update windows, expands recovery
-sidecar coverage for inline deletes, and removes an unwired admission-control
-surface before it becomes public API.
+Omnigraph v0.4.2 is a concurrency, admission-control, and release-hygiene
+release. It removes the server-global write lock, lets disjoint writers make
+progress concurrently, adds per-actor admission limits, hardens branch and
+mutation races with snapshot-isolation fences, and documents the release in
+public open-source terms.
 
 ## Highlights
 
-- **Read-time drift checks for strict mutations**: staged mutations now compare
-  the manifest pin captured when the query opened against the manifest snapshot
+- **Unlocked server engine handle**: the HTTP server now holds the engine behind
+  a shared handle instead of a server-global write lock. Concurrent handlers can
+  call engine APIs directly while the engine serializes only the resources that
+  actually conflict.
+- **Engine-owned writer queues**: same `(table, branch)` writers are serialized
+  by per-table writer queues inside the engine, while disjoint table/branch
+  writes can run concurrently. This narrows contention without relying on route
+  handlers to know storage-level ordering rules.
+- **Per-actor admission control**: mutating HTTP handlers are gated by a
+  `WorkloadController` with per-actor in-flight request and estimated-byte
+  budgets. Rejections use HTTP 429 with `code: too_many_requests` and a
+  `Retry-After` header, so noisy actors back off without blocking unrelated
+  actors.
+- **Admission coverage for all mutating handlers**: `/change`, `/ingest`,
+  `/schema/apply`, branch create/delete, and branch merge now flow through the
+  admission controller. Read-only endpoints are not admission-gated.
+- **Op-kind-aware version checks**: mutation commit-time drift checks distinguish
+  append-like inserts from strict update/delete work. Inserts remain permissive
+  enough for safe concurrent append patterns; updates and deletes get stricter
+  stale-view rejection.
+- **Read-time drift checks for strict mutations**: staged mutations compare the
+  manifest pin captured when the query opened against the manifest snapshot
   captured under table-queue ownership. If a concurrent writer moved the table
-  after the query read, the stale writer returns a manifest-conflict 409 instead
-  of staging work computed against an old snapshot.
+  after the query read, the stale writer returns a structured
+  `manifest_conflict` 409 instead of staging work computed against an old
+  snapshot.
 - **Inline-delete recovery coverage**: delete-only mutations still use Lance's
   inline delete path, but their recovery sidecar is now written before the
   manifest-version rejection path can return. If a delete moves Lance HEAD and a
   concurrent manifest update makes the query stale, the next read-write open can
   roll the residual back rather than leaving a head-ahead-of-manifest table.
+- **Branch-operation race hardening**: branch creation and branch merge avoid
+  coordinator swap-restore races that could expose the wrong active branch to
+  concurrent work. Concurrent branch merges are serialized by a merge mutex.
 - **Branch-merge target revalidation**: merges re-check target table versions
   after acquiring target write queues. A stale merge plan returns a structured
   conflict instead of overwriting concurrent target-branch changes or adopting a
   source table over newly appended target rows.
+- **Schema refresh deadlock fix**: recovery refresh releases the write guard
+  before schema reload, preventing a refresh/schema-apply deadlock.
 - **Lean admission API**: removed the unused global rewrite admission pool,
   `service_unavailable` error variant, related 503 documentation, and benchmark
-  flag. The server keeps the wired per-actor inflight and byte-budget admission
-  gates.
-- **Regression coverage**: failpoint and server matrix tests now cover the
-  inline-delete sidecar race, merge × change target movement, and post-reopen
-  branch-op state.
+  flag. The public server surface now reflects only admission behavior that is
+  wired to handlers.
+- **Open-source release hygiene**: this release adds guidance for public-facing
+  documentation, release notes, and version bumps. Release docs now avoid
+  private issue tracker references and use stable public descriptions instead.
 
 ## Behavior changes
 
-- Some concurrent mutation and merge races now return `manifest_conflict`
-  instead of relying on later publisher-CAS detection or allowing a stale plan
-  to proceed.
+- Disjoint mutating HTTP requests can now make progress concurrently instead of
+  queueing behind one process-wide engine write lock.
+- Mutating handlers may return HTTP 429 when an actor exceeds per-actor in-flight
+  or estimated-byte budgets. Clients should respect `Retry-After` and retry
+  later.
+- Concurrent update/delete and merge races now return structured
+  `manifest_conflict` 409 responses in more stale-view cases instead of relying
+  on later publisher-CAS detection or allowing a stale plan to proceed.
 - Concurrent branch merge × change on the same target branch may return either
   success or a clean 409 conflict, depending on which operation wins the queue.
 - `OMNIGRAPH_GLOBAL_REWRITE_MAX` is no longer recognized. Remove it from
-  deployment manifests; use the remaining per-actor inflight and byte-budget
-  admission settings for the currently wired server controls.
+  deployment manifests; use the per-actor in-flight and byte-budget admission
+  settings for the currently wired server controls.
 
 ## Upgrade Notes
 
@@ -47,15 +79,37 @@ surface before it becomes public API.
 - Clients should treat `manifest_conflict` 409 responses as retryable stale-view
   conflicts. This was already the documented contract, but this release uses it
   in more concurrent-write paths.
+- Clients should handle HTTP 429 from every mutating endpoint, not only
+  `/change`. Honor the `Retry-After` header.
 - Operators should remove stale references to global rewrite admission and 503
   rewrite-pool exhaustion from local runbooks.
+- If you maintain public docs or release notes, use public identifiers and
+  user-facing descriptions rather than private tracker IDs.
+
+## Tests added or strengthened
+
+- Regression tests for update read-your-writes under in-process concurrency.
+- HTTP tests for same-key insert snapshots, disjoint `/change` concurrency, and
+  `/ingest` admission 429 + `Retry-After`.
+- Branch-operation regression tests for branch-create swap-restore races,
+  concurrent `/change` + branch-merge interleavings, branch-merge swap-restore
+  races, branch-op matrix coverage, and post-reopen consistency.
+- Failpoint-backed regression coverage for inline-delete recovery sidecar
+  creation before version-mismatch rejection.
+- Admission tests use injectable `WorkloadController` state instead of mutating
+  process environment.
 
 ## Included Changes
 
-- Per-table writer queues and read-time version checks for strict mutation
-  publishes.
-- Branch-merge target snapshot revalidation under queue ownership.
-- Inline-delete manifest-conflict recovery-sidecar regression test and fix.
-- Matrix coverage updates for merge × change concurrency and reopen
-  consistency.
+- Shared server engine state and per-actor admission on mutating endpoints.
+- Per-(table, branch) writer queues and op-kind-aware manifest drift checks.
+- Strict read-time version checks for updates/deletes.
+- Branch create/merge race hardening and branch-merge target snapshot
+  revalidation under queue ownership.
+- Retry-after support for admission rejections and OpenAPI updates for reachable
+  429 responses.
+- Actor-isolation benchmark harness updates for the current admission controller.
 - Removal of the unwired global rewrite admission / 503 server surface.
+- Version bump to `0.4.2` across workspace crates, `Cargo.lock`, and
+  `openapi.json`.
+- Public release-note cleanup and new OSS best-practice guidance in `AGENTS.md`.
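
Reviewer note: the upgrade notes in this patch ask clients to honor `Retry-After` on HTTP 429 and to treat `manifest_conflict` 409 responses as retryable. A minimal client-side sketch of that decision follows. It is not part of the patch and not Omnigraph code. The helper names (`parse_retry_after`, `retry_delay`), the backoff constants, and the idea that the caller has already extracted a `code` string from the error body are all assumptions made for illustration; only the status codes, the `Retry-After` header, and the `too_many_requests` / `manifest_conflict` code values come from the release notes.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def parse_retry_after(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds for a Retry-After value, or None if unparseable.

    Retry-After may be either a number of seconds or an HTTP-date.
    """
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    code: Optional[str],
    attempt: int,
    base: float = 0.1,   # hypothetical backoff base, in seconds
    cap: float = 5.0,    # hypothetical backoff ceiling, in seconds
) -> Optional[float]:
    """Seconds to wait before retrying, or None when the response is not retryable."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    backoff = min(cap, base * (2 ** attempt))
    if status == 429:
        hinted = parse_retry_after(lowered.get("retry-after", ""))
        # Prefer the server's hint; fall back to exponential backoff.
        return hinted if hinted is not None else backoff
    if status == 409 and code == "manifest_conflict":
        # Stale-view conflict: the caller should re-read and recompute the
        # mutation before resending, not replay the same staged work.
        return backoff
    return None
```

A caller would sleep for the returned number of seconds and then retry, giving up when the function returns `None` or after a bounded number of attempts.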