From 20ef0e90a4bed18359fdbb1af38bfe8741c2223d Mon Sep 17 00:00:00 2001
From: Ragnor Comerford
Date: Mon, 8 Jun 2026 16:07:24 +0200
Subject: [PATCH] docs(invariants): note the non-atomic manifest->commit-graph publish gap

Every graph publish commits __manifest then appends _graph_commits as two
separate writes; a crash between them leaves the manifest ahead of the
commit DAG.

Live reads + durability are unaffected (reads resolve via the manifest)
and recovery does not repair it; impact is bounded to commit history /
time-travel by commit id / merge-base completeness. Pre-existing across
all publishes, not the optimize reconcile specifically.

Documented as a Known Gap; the fix is a commit-graph reconcilable from
the manifest, not a recovery sidecar.
---
 docs/dev/invariants.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/dev/invariants.md b/docs/dev/invariants.md
index 5ee4f17..86a5d18 100644
--- a/docs/dev/invariants.md
+++ b/docs/dev/invariants.md
@@ -139,6 +139,20 @@ them explicit.
   Remove the skip when the upstream Lance fix lands — the
   `lance_surface_guards.rs::compact_files_still_fails_on_blob_columns` guard
   turns red on that bump to force it.
+- **Manifest→commit-graph publish atomicity:** a graph commit advances
+  `__manifest` (the visibility authority) and then appends `_graph_commits` as
+  two separate writes (`commit_updates_with_actor_with_expected`, failpoint
+  `graph_publish.before_commit_append`). A crash between them leaves the manifest
+  at version N with no commit-graph row for N. Live reads and durability are
+  unaffected — the live version resolves via the manifest
+  (`GraphCoordinator::version()`), not the commit-graph head — and the open-time
+  recovery sweep does NOT repair it (`lance_head == manifest_pinned` classifies
+  `NoMovement`; a recovery sidecar would not change this). Impact is bounded to
+  commit history: `commit list` misses N, time-travel by commit id to N fails,
+  and merge-base loses a node (a likely-benign off-by-one re-merge). This affects
+  every publish, not the optimize drift-reconcile specifically. Eventual fix:
+  make the commit graph reconcilable from the manifest (or the two writes
+  atomic) — not a recovery-sidecar concern.
 - **Planner capability/stat surfaces:** cost-aware planning, complete
   capability advertisement, and explain-with-cost are roadmap. Do not
   describe them as implemented.
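
The publish gap this patch documents can be illustrated with a small in-memory model. The model is a sketch only, and every name in it is hypothetical rather than part of the project's API: `GraphStore`, `Failpoint`, `Crashed`, `Recovery::Drifted`, `classify_recovery`, and `reconcile_from_manifest`. In the model, `manifest_pinned` stands in for `__manifest` and `commit_rows` stands in for `_graph_commits`. Only the failpoint name, `lance_head == manifest_pinned`, and `NoMovement` come from the patch. The model also makes two assumptions that the patch does not state. First, the table head has already advanced to N before the manifest write. Second, every manifest version from 1 to N should have exactly one commit-graph row.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory model of what a graph publish touches.
#[derive(Debug, Default)]
pub struct GraphStore {
    /// Assumed: the table head, already advanced before the manifest write.
    lance_head: u64,
    /// Stand-in for `__manifest`: the visibility authority.
    manifest_pinned: u64,
    /// Stand-in for `_graph_commits`: one row per published version.
    commit_rows: BTreeSet<u64>,
}

/// Models the `graph_publish.before_commit_append` failpoint as a flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Failpoint {
    Off,
    BeforeCommitAppend,
}

/// Returned when the simulated process dies between the two writes.
#[derive(Debug, PartialEq, Eq)]
pub struct Crashed;

/// Recovery outcomes; `Drifted` is a hypothetical name for "heads differ".
#[derive(Debug, PartialEq, Eq)]
pub enum Recovery {
    NoMovement,
    Drifted,
}

impl GraphStore {
    /// Publish as two separate, non-atomic writes: manifest, then commit row.
    pub fn publish(&mut self, fp: Failpoint) -> Result<u64, Crashed> {
        let next = self.manifest_pinned + 1;
        self.lance_head = next; // assumed data write that precedes the publish
        self.manifest_pinned = next; // write 1: version N becomes visible
        if fp == Failpoint::BeforeCommitAppend {
            return Err(Crashed); // crash window: N has no commit-graph row
        }
        self.commit_rows.insert(next); // write 2: commit-graph row for N
        Ok(next)
    }

    /// Live reads resolve via the manifest, so they still see version N.
    pub fn live_version(&self) -> u64 {
        self.manifest_pinned
    }

    /// Versions reachable through the commit graph (commit list, time travel).
    pub fn commit_list(&self) -> Vec<u64> {
        self.commit_rows.iter().copied().collect()
    }

    /// Open-time sweep: equal heads classify as `NoMovement`, so nothing is
    /// repaired even though a commit-graph row is missing.
    pub fn classify_recovery(&self) -> Recovery {
        if self.lance_head == self.manifest_pinned {
            Recovery::NoMovement
        } else {
            Recovery::Drifted
        }
    }

    /// Sketch of the proposed fix: treat the manifest as authoritative and
    /// backfill every version 1..=N that lacks a commit-graph row.
    /// Returns the backfilled versions; a second call returns an empty Vec.
    pub fn reconcile_from_manifest(&mut self) -> Vec<u64> {
        let missing: Vec<u64> = (1..=self.manifest_pinned)
            .filter(|v| !self.commit_rows.contains(v))
            .collect();
        self.commit_rows.extend(missing.iter().copied());
        missing
    }
}
```

Consider a store that starts from `GraphStore::default()` and then runs three publishes: once normally, once with `Failpoint::BeforeCommitAppend`, and once more normally. After these steps, `commit_list()` returns `[1, 3]` while `live_version()` returns 3, and `classify_recovery()` still reports `NoMovement`. Calling `reconcile_from_manifest()` then returns `[2]`. The model restores only row membership. A real reconcile would also have to rebuild the parent edges that merge-base walks. Making the two writes atomic is the alternative that the patch mentions.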