From 4ffbf6ec61b8fa054ac2ed2e2368365dfb40e3a3 Mon Sep 17 00:00:00 2001 From: Ragnor Comerford Date: Fri, 8 May 2026 16:48:17 +0200 Subject: [PATCH] engine: drop swap-restore in branch_create_from; operate on local coord MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the swap-restore race in `branch_create_from_impl` by simply not touching `self.coordinator` at all. Open the source-branch coordinator locally, call `branch_create` on it, drop it. The new branch is durable on disk via the manifest write that `GraphCoordinator::branch_create` issues on its own commit graph; subsequent reads of any coord will see it after their normal manifest refresh. Pre-fix: `branch_create_from_impl` ran swap → operate → restore as three separate `coordinator.write().await` acquisitions. Under `&self` concurrency, two callers with distinct source branches could interleave their swaps, leaving each caller's "operate" step running against the other's swapped coordinator and forking the new branch off the wrong HEAD. Pinned by `concurrent_branch_create_from_distinct_parents_does_not_corrupt_coordinator` (previous commit) which deterministically reproduced the race with 8/8 forks landing on the wrong parent. Why correct by design (AGENTS.md rule 9): closing the bug class "non-atomic three-step coordinator manipulation under &self callers" by removing the manipulation entirely. There's no scratch-space race to lose because there's no scratch space. Note: `branch_merge_impl` at `crates/omnigraph/src/exec/merge.rs:1085-1100` keeps the same swap-restore pattern. Its inner `branch_merge_on_current_target` calls `self.snapshot()` and `self.ensure_commit_graph_initialized()` which acquire the coord lock independently, so the simple "operate on local coord" refactor doesn't compose without a deeper interface change. The per-(table, branch) writer queue inside the merge body (`crates/omnigraph/src/exec/merge.rs:1224`) bounds the damage in practice; a deterministic regression for concurrent merges is tracked under Block 3.1 of the plan. `swap_coordinator_for_branch` and `restore_coordinator` remain crate-internal for now (still used by `branch_merge_impl`); a follow-up can remove them if the merge path is similarly refactored. Co-Authored-By: Claude Opus 4.7 (1M context) --- crates/omnigraph/src/db/omnigraph.rs | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/crates/omnigraph/src/db/omnigraph.rs b/crates/omnigraph/src/db/omnigraph.rs index 71d322f..ba0e866 100644 --- a/crates/omnigraph/src/db/omnigraph.rs +++ b/crates/omnigraph/src/db/omnigraph.rs @@ -878,10 +878,24 @@ impl Omnigraph { ensure_public_branch_ref(name, "branch_create_from")?; } let branch = normalize_branch_name(&branch_name)?; - let previous = self.swap_coordinator_for_branch(branch.as_deref()).await?; - let result = self.coordinator.write().await.branch_create(name).await; - self.restore_coordinator(previous).await; - result + // Operate on a freshly-opened source coordinator that's owned locally + // — never touch `self.coordinator`. The pre-fix implementation used + // `swap_coordinator_for_branch` + operate + `restore_coordinator` as + // three separate `coordinator.write().await` acquisitions; under + // `&self` concurrency, a second `branch_create_from` could swap + // self.coordinator between this caller's swap and operate steps, + // making the operate run against the wrong source branch and + // forking off the wrong HEAD. Pinned by + // `concurrent_branch_create_from_distinct_parents_does_not_corrupt_coordinator` + // in `crates/omnigraph-server/tests/server.rs`. + // + // `branch_create` mutates only the local coord's commit-graph cache; + // the manifest write is durable on disk regardless of which + // coord-handle issued it. Discarding `source_coord` after the call + // is the right shape — the new branch is reachable from any + // subsequent open of any coord. + let mut source_coord = self.open_coordinator_for_branch(branch.as_deref()).await?; + source_coord.branch_create(name).await } pub async fn branch_list(&self) -> Result> {