From 5cfae9acc1bf0d63b37e67d17aa53fe0ddef3ed5 Mon Sep 17 00:00:00 2001 From: Ragnor Comerford Date: Sun, 21 Jun 2026 21:54:59 +0200 Subject: [PATCH] =?UTF-8?q?docs(rfc-013):=20latency=20=3D=20(serial=5Fhops?= =?UTF-8?q?=20+=20ops/concurrency)=C2=B7RTT=20=E2=80=94=20concurrency-cap?= =?UTF-8?q?=20correction=20+=20Lance-metadata=20comparison=20(#292)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat(engine): compact the internal __manifest/_graph_commits tables in optimize `optimize` iterated node/edge catalog tables only, so the two internal system tables (`__manifest`, `_graph_commits`) accumulated one fragment per commit and were never compacted -- making every write's metadata scan O(fragments), which grows forever on a long-lived graph (RFC-013 step 2). `optimize_all_tables` now also compacts both internal tables via a new `compact_internal_table`. They are not catalog-tracked (readers open them at their latest Lance HEAD), so it is a much simpler path than `optimize_one_table`: compact in place, no manifest publish (nothing to publish to), no recovery sidecar (a single atomic Lance commit -- no HEAD-before-publish gap), and no optimize_indices (they carry no Lance index, only object_id's unenforced-PK metadata). No application lock: Lance's compact_files auto-retries its Rewrite against any concurrent writer (the canonical LanceDB pattern; Rewrite vs Append is compatible, vs Update a retryable same-fragment conflict Lance rebases), and a coordinator refresh afterwards makes the warm handle observe the compacted HEAD. Compacts both tables even though Phase 7 (iss-991) will later fold _graph_commits into __manifest -- a one-call throwaway for the full interim win; __manifest compaction is also the prerequisite for Phase 7's graph_head contention. Cleanup (version GC) of the internal tables is deliberately NOT included here: it needs the Q8 cleanup-resurrection watermark first (deferred). maintenance.rs: optimize now returns 6 stats (4 data + 2 internal); adds optimize_compacts_internal_tables (sheds fragments, leaks no recovery sidecar, graph coherent for reads + strict writes after). * test(engine): un-ignore the internal-table scan LOCK (step 2 acceptance) `internal_table_scans_are_flat_in_history` was the RED, #[ignore]'d acceptance gate staged in PR #288. With internal-table compaction landed, a write's __manifest/_graph_commits scan is flat in commit-history depth on a compacted graph (measured __manifest 4->2, _graph_commits 7->3 across depth 10->100, vs the pre-step-2 RED 34->214 / 29->207). The test now compacts at each depth before measuring and runs green every-PR. * docs: RFC-013 step 2 internal-table compaction landed - invariants.md: close the compaction half of the read-path-rederivation known gap (optimize now compacts the internal tables; cleanup half still deferred). - maintenance.md: optimize covers __manifest/_graph_commits (no publish, no sidecar); not yet in cleanup. - rfc-013 §9: split step 2 into 2a (compaction, landed) and 2b (cleanup + Q8 watermark, deferred — debated; MTT-overlap + hot-path liability). - testing.md: the internal-table LOCK is now green every-PR. * fix(engine): guard absent _graph_commits + always compact internal tables Addresses PR #291 review findings: - Greptile (P1): optimize unconditionally opened `_graph_commits` for compaction, but a graph can validly have none (the coordinator opens it as `Option`, gated on `storage.exists`, for graphs predating the commit graph). `Dataset::open` on the absent table errored and failed the whole optimize. Guard the `_graph_commits` compaction with the same `storage_adapter().exists()` check the coordinator uses; `__manifest` always exists so it stays unguarded. Regression test `optimize_tolerates_absent_graph_commits_table` (empty graph so no publish recreates the table before the guard). - Cursor (low): the `table_tasks.is_empty()` early return skipped internal-table compaction for a schema with no node/edge types. Removed it so the internal tables are compacted regardless of the data-table set. - Codex (auto-cleanup, P1): documented — `compact_files` commits with a default `CommitConfig` (no skip_auto_cleanup) and `CompactionOptions` exposes no override, so on a graph storing an *on* auto_cleanup config the commit would fire version GC. Both internal tables are created with `auto_cleanup: None`, so new graphs are safe; the only exposure is pre-fix upgraded graphs, identical to the existing data-table optimize path, with step 2b's watermark as the comprehensive guard. Added a comment in `compact_internal_table` recording this. * docs(rfc-013): serial-hop correction — wall-clock is the ~110-hop backbone, not op count Latency-slope measurement on the deployed edge binary (f6d2cc03, steps 1+3a landed; rustfs + per-op latency proxy, depth 1..85) shows wall-clock is set by a ~110-hop SERIAL backbone that is depth-invariant. Total ops grow +~7/depth but PARALLELIZE (parallelism 1->6), so the depth term adds little wall-clock. - New §0(c): the serial-hop vs total-op finding + branch-op backbones (create ~77, delete ~87, branch-write ~258/1777-ops/21s floor = fork-on-first-write). - §2.4: correct the '1720->198 ops => 258s->30s' op-count->wall-clock conversion. - §5.1: promote serial-hop/num_stages to the PRIMARY latency LOCK; op-count flatness demoted to a cost/compute-floor gate. - §9 step 2: reprioritized as Phase-7 prerequisite + compute-floor/space, NOT the wall-clock fix; step 3b (parallel capture-once WriteTxn) is the headline latency lever; branch-write moved under step 3b + fork seam. - Summary: serial-backbone correction up front. Vindicates the §3/§4.1 design; corrects the op-count latency framing. * docs(rfc-013): concurrency-cap correction + Lance-metadata comparison Fold in two measured findings from the deployed edge binary (f6d2cc03) on rustfs behind a latency+concurrency proxy: - §0(d): concurrency-cap A/B. Under unlimited concurrency the internal-table scan parallelizes (backbone ~110); under an R2-realistic cap (8) it serializes and an UNCOMPACTED graph runs away (per-write ops 1273->3505, wall 6->16s), while #291's internal compaction cuts it ~6x and bounds it (137->1 frag). The latency model is (serial_hops + ops/effective_concurrency)*RTT + compute. - Reframe step 2 across Summary/§2.4/§9: NOT de-ranked — on R2 (capped) it is a primary latency lever + the anti-runaway fix + Phase-7 prereq. The earlier 'step 2 is parallel, irrelevant to latency' was an unlimited-concurrency artifact. Deployed f6d2cc03 optimize is node/edge-only; #291 (undeployed) is the prod win. - §5.1: the cost-gate ThrottledStore must cap concurrency AND inject latency; assert serial_hops flat AND ops flat in history. - §2.3 + §8: Lance/LanceDB comparison from 7.0.0 source — Lance metadata is a single-file per-version manifest read O(1) (latest_version_hint), pruned by default; omnigraph's __manifest-as-Lance-dataset scan is self-inflicted by the cross-table-atomicity choice. Adds explicit defense of Lance-dataset __manifest (MTT seam) vs a flat-file CAS'd manifest (cheaper, off the MTT path). Design (§3/§4.1) unchanged and vindicated; corrections are measurement framing, step sizing, and one design-choice that was implicit. --- docs/dev/rfc-013-write-path-latency.md | 195 +++++++++++++++++++++++-- 1 file changed, 184 insertions(+), 11 deletions(-) diff --git a/docs/dev/rfc-013-write-path-latency.md b/docs/dev/rfc-013-write-path-latency.md index 37e6a8a..d955a9d 100644 --- a/docs/dev/rfc-013-write-path-latency.md +++ b/docs/dev/rfc-013-write-path-latency.md @@ -46,6 +46,24 @@ main/branch/node paths (§2.4). It is shippable as a standalone PR first (§9 st 3a); the rest of the RFC is the constant-factor + correctness + internal-residual work layered on the same seam. +**Correction (2026-06-20/21) — the latency metric is `(serial_hops + ops / +effective_concurrency) · RTT + compute`, measured [M].** Two findings, both from the +deployed edge binary (steps 1+3a landed) on rustfs behind a latency+concurrency proxy: +**(i)** under *unlimited* concurrency, wall-clock is a **~110-hop serial backbone, +depth-invariant** — the depth-driven ops parallelize away (§0(c)); but **(ii)** under +an **R2-realistic concurrency cap (8)**, the internal-table fragment scan can no longer +fan out, so **op count re-enters wall-clock** and an uncompacted graph *runs away* +(per-write ops 1273→3505, wall 6→16s and climbing) — while #291's internal-table +compaction cuts it ~6× and bounds it (§0(d) A/B). So the design is **vindicated and +unchanged** (§3/§4.1: capture-once `WriteTxn` + parallel stages → "~2–3 hops" is the +**serial-backbone** lever, step 3b; bounded history is the **op-count** lever, step 2a) +— what's corrected is the *measurement framing and step sizing*: op count was the wrong +latency proxy **only because the harness had unlimited concurrency**; on a capped store +both `serial_hops` (→ step 3b) and `ops` (→ step 2a) are on the critical path, and +which dominates is set by `effective_concurrency × fragment_count`. The cost gate +(§5.1) is corrected to inject a **concurrency cap *and* latency**, and to assert serial +hops *and* op-count-flat-in-history. + --- ## 0. Validation ledger (read this first) @@ -139,6 +157,72 @@ one unpinned item — see §12. Reads, by contrast, are flat in depth (`warm_read_cost.rs`, PR #268). This is the O(history)-per-write → O(N²)-cumulative behavior the production incident hit. +**(c) Serial-hop measurement [M] — wall-clock is set by the serial backbone, not +the op count.** §0(b) counts *total* object-store ops; wall-clock is set by the ops +on the *critical path*. Measured on the **deployed edge binary `f6d2cc03`** (steps +1+3a landed) via rustfs + a per-op latency proxy, sweeping injected per-op latency `L` +and reading the slope of `wall = compute + serial_hops · L` (the slope **is** the +critical-path hop count; the proxy also reports request overlap → parallelism): + +| depth | total ops | parallelism | **serial backbone (slope)** | `L=0` wall (compute floor) | +|---:|---:|---:|---:|---:| +| ~1 | 107 | 1.0–1.2 | **~109** | 2.15s | +| ~33 | 338 | 3.4–4.0 | **~108** | 2.45s | +| ~85 | 716 | 6.0–7.1 | **~113** | 4.27s | + +The serial backbone is **~110 hops and depth-INVARIANT**, while total ops grow +`+~7/depth` (107→716, the §0(b) term) **and parallelize** (parallelism 1→6, +`max_inflight` up to 65) — so the depth-driven ops add almost nothing to wall-clock. +`wall ≈ 110·RTT + compute`; the prod 35s direct-main write ≈ 110 hops × ~280ms +cross-region RTT. Branch ops measured the same way (4-table graph; prod = 217 tables, +≈50× worse): **branch-create serial ~77, branch-delete ~87** (op counts scale with +table count → §9 step 6), and **branch-WRITE is worst — 1777 ops, serial ~258, 21s +compute floor even at `L=0`** = fork-on-first-write (the path 3a did *not* cover; §9 +step 3b + the fork seam), matching prod's 103–138s. + +**The methodological correction this forces.** *Op count is a cost/space/compute-floor +metric; the serial-hop count (latency slope / `num_stages`) is the wall-clock metric.* +3a's real 90s→35s win (≈2.6×, matching its measured 2.7× op cut) is genuine **because +it removed *serial* hops** (the per-table data opens were on the critical path). But +the wall-clock predictor is not serial-hops *alone* — it is +**`(serial_hops + ops / effective_concurrency) · RTT + compute`**: total op count +re-enters wall-clock whenever the store **caps concurrency**, because the parallel +tail can no longer fan out. + +**(d) The concurrency-cap A/B [M] — proves op count *is* wall-clock on a capped store, +and that step 2a is a primary latency lever (not a parallel afterthought).** §0(c) was +measured on **rustfs with unlimited concurrency** (`max_inflight` reached **129**) — a +poor proxy for R2, which is connection-capped and rate-limited. Re-running the same +write through a proxy capped at **8 concurrent** (R2-realistic), with internal-table +**fragment count as the only variable** (edge binary for writes; the unmerged #291 +binary only to run `optimize`), depth ~130, `__manifest`≈137 fragments: + +| state | per-write ops | wall (cap=8, L=20) | trend | +|---|---:|---:|---| +| **uncompacted** (`__manifest` 137 frags) | 1273 → 1487 → **3505** | 5.9 → 8.4 → **16.4 s** | **runaway** — each write reads all frags **and appends one more** | +| **after #291 `optimize`** (137→1 frag) | 275 → 250 → **197** | 6.2 → 5.4 → **3.8 s** | **bounded** | + +`optimize` collapsed `__manifest` 137→1, `_graph_commits` 140→1 frags → **~6× fewer +ops/write and the runaway stopped.** Under unlimited concurrency this delta vanishes +(the frags fan out); under the cap it is the dominant term. **This is the actual +mechanism of the prod 35s and its degradation over time** (the `O(N²)` of §0/§2.2): +on a capped store, every uncompacted write scans all `__manifest`/`_graph_commits` +fragments *and adds one*, so latency climbs with graph age — exactly what prod shows, +and exactly what step 2a halts. Prod confirms the scale: `__manifest` 1,739 obj / +59 MiB, `_graph_commits` 1,848 obj / 23.5 MiB, read per write, **uncompacted** (the +deployed `f6d2cc03` optimize is node/edge-only — §9 step 2 — so an operator `optimize` +run on prod cannot touch them; only #291 can). + +**Corrected conclusion.** The §2.4 op-count math (`1720→198 ⇒ 258s→30s`) is still +wrong *as stated* (it assumes full serialization), but the opposite over-correction — +"step 2 is parallel, so irrelevant to latency" — is **also wrong**, and an artifact of +the unlimited-concurrency harness. The truth is **concurrency-dependent**: on a capped +store (R2) the internal-scan op count *is* on the critical path and **step 2a is a +primary latency lever and the anti-runaway fix**; the residual after compaction +(~4 s here, mostly compute + the serial backbone) is then **step 3b**'s. Both are +load-bearing; which dominates is set by `effective_concurrency × fragment_count`. So +the cost gate (§5.1) must inject a **concurrency cap**, not just latency. + --- ## 1. Problem & measurements @@ -191,7 +275,11 @@ Branch ops compound it: `branch create` is a per-table sequential fork loop (`fork_branch_from_state`, `table_store.rs:282`); `branch delete` opens a snapshot per *other* branch (`ensure_branch_delete_safe`, `omnigraph.rs:1317`) and force-deletes per forked table sequentially (`cleanup_deleted_branch_tables`, -`omnigraph.rs:1359`) **[S]**. +`omnigraph.rs:1359`) **[S]**. Measured serial backbones (§0(c), edge binary): branch +create **~77 hops**, delete **~87** (op counts scale with table count → §9 step 6); +**branch *write* is the worst — 1777 ops, ~258-hop serial backbone, a 21s compute +floor even at zero RTT** = fork-on-first-write (the path step 3a did not cover; §9 step +3b + the fork seam), which is why prod branch-load (103–138s) ≫ direct-main (35s). --- @@ -267,6 +355,31 @@ cost. The correct replacement is *scheduled* compaction **and** version cleanup (§9 step 2), **not** re-enabling `auto_cleanup`. Without it, version history (and per-write cost) grows forever. +**Why Lance/LanceDB don't have this cost — the internal-table scan is self-inflicted +[U].** Verified in Lance 7.0.0 source (cargo registry): a Lance dataset's metadata is a +**per-version manifest *file*** — one self-contained protobuf +(`format/manifest.rs:35`, `struct Manifest { fragments: Arc>, … }`) — +and the current version is resolved **O(1)** via `latest_version_hint.json` +("O(1)/O(k) latest-version lookup via HEAD", `io/commit.rs:75-79`) or the V2 lexical +name. Reading current state is **one file read, never a scan over accumulated +metadata**; old manifests + `_transactions` files are reclaimed by **timestamp GC** +(`dataset/cleanup.rs`, on by default), and manifest *size* is bounded by data +compaction. **LanceDB** is multi-table but each table is an *independent* Lance +dataset; its catalog is a directory/namespace lookup (or a cloud catalog service), not +a mutable dataset read per write — it does **no cross-table atomic commit**, so it +needs no coordinating meta-table. Omnigraph's `__manifest`/`_graph_commits` are +therefore **not a Lance pattern** — they exist only because omnigraph layers a +**mutable catalog *as a Lance dataset*** over 217 independent tables to get a +cross-table atomic commit (the lance#7264 "Alternative A"). The whole §2.2 internal +term is the price of that choice: omnigraph reads its catalog as an **O(fragments) +dataset scan and appends a fragment per write**, where Lance reads its own metadata +**O(1)** and prunes by default. Step 2a (compact → 1 fragment) ≈ Lance's single-file +manifest read; step 2b (cleanup) ≈ Lance's `cleanup_old_versions`; the design simply +re-derives, on a Lance-dataset catalog, the hygiene Lance treats as table stakes — and +§8/lance#7264 MTT is the path to delete the catalog and inherit Lance's O(1) metadata +outright. *(This also raises a design question — should the catalog be a Lance dataset +at all, vs a single flat CAS'd manifest file? — addressed in §8.)* + ### 2.4 Lance namespace: proper use (why the fix is bypass, not patch) The upstream Lance Namespace is a **catalog / discovery layer** — "table @@ -334,9 +447,18 @@ correctness, not drop-in completeness. **Step 2 also proven [M].** On the step-3-patched binary at depth ~87, compacting the internal tables to 1 fragment each (content-preserving) collapsed their scans: `__manifest` 285 → 32 (8.9×), `_graph_commits` 177 → 11 (16×); the step-3 data term -stayed flat at 4. So **both depth terms are now empirically eliminated** — a depth-87 -single edge drops **~1720 → 198 ops (~8.7×; ≈258 s → ≈30 s at 150 ms/RTT)** with -both fixes. The internal term is **fragment-scan growth** (`read_manifest_scan` / +stayed flat at 4. So **both depth *op-count* terms are now empirically eliminated** — +a depth-87 single edge drops **~1720 → 198 ops (~8.7× in op count)** with both fixes. +**Wall-clock correction (§0(c)/(d)):** the `≈258 s → ≈30 s` figure was wrong (it +multiplied *total* ops by RTT as if serial); but the win is **concurrency-dependent**, +not zero. Under *unlimited* concurrency the depth-driven ops parallelize and this +op-count cut barely moves wall-clock (the backbone is ~110 hops); **under an +R2-realistic concurrency cap the same op-count cut is a primary latency win** — the +§0(d) A/B shows the uncompacted internal scan *runs away* (6→16 s) and #291's +compaction cuts it ~6× and bounds it. So step 2a is a **latency lever on a capped store +(R2) and the anti-runaway fix**, *and* a compute-floor / Phase-7-prerequisite / space +win; step 3b is the lever for the residual serial backbone. The internal term is +**fragment-scan growth** (`read_manifest_scan` / `commit_graph.refresh` read all fragments of the *latest* version), so the fix is **compaction** (merge fragments) — distinct from the data table's version-chain term that step 3 / version-cleanup handle. `optimize`'s `all_table_keys` @@ -513,9 +635,24 @@ path and would pass falsely. The load-bearing rule both Lance and SlateDB mostly miss: **assert the constant is flat across N, not just small at one N.** A shallow fixture cannot catch an -O(history) cost (the §0(b) table is the red baseline). Add a `num_stages` -(sequential-hop) assertion via a `ThrottledStore` wrapper (Lance's -`test_commit_iops` setup) so an O(N) listing also blows a wall-time budget. +O(history) cost (the §0(b) table is the red baseline). + +**Two latency LOCKs, and the `ThrottledStore` must cap concurrency *and* inject +latency (corrected per §0(c)/(d)).** The wall-clock model is +`(serial_hops + ops/effective_concurrency)·RTT + compute`, so the gate needs **both** +terms, and an unlimited-concurrency harness measures neither honestly: +(1) **serial-hop LOCK** (`serial_hops ≤ K`, flat in depth) — read off the +`wall = compute + serial_hops·L` slope (Lance's `test_commit_iops` setup); catches the +~110-hop backbone (step 3b's target). (2) **op-count-flat-in-history LOCK** under a +**capped-concurrency** `ThrottledStore` (e.g. `MAXCONC=8`) — catches the internal-scan +runaway (§0(d)) that step 2a fixes; *without the cap this LOCK is invisible* because +the ops fan out (the §0(d) trap). Both are load-bearing: a build can pass the serial-hop +LOCK and still run away on a capped store if its per-write op count grows with history. +Run the depth sweep through a `ThrottledStore` that **both** throttles per-op latency +**and** bounds in-flight concurrency to an R2-realistic value; assert `serial_hops` flat +*and* `ops` flat in history. (A pure op-count gate under unlimited concurrency would +*fail a correct build* whose parallel scans grow yet cost no wall-clock, and *pass a +slow one* — which is why the cap is the load-bearing addition.) ### 5.2 Tier 2 — wall-clock trend (post-merge / nightly, never a PR gate) @@ -821,6 +958,25 @@ not schedule around MTT landing.** When it ships, `publish`'s *body* swaps (stage→CAS→sidecar → `catalog.transaction()`) while `WriteTxn`/`PublishPlan` and every verb lowering stay. `iss-863`/`iss-864` **[G]** already scope this spike. +**Why keep `__manifest` as a Lance *dataset* (and compact it) rather than a single flat +CAS'd manifest file?** The Lance-source comparison (§2.3) makes this an explicit choice +to defend, not assume. Both reference designs the RFC cites store cross-version metadata +as **one flat file** read O(1): Lance's per-version manifest (`format/manifest.rs`) and +SlateDB's monotonic-ID manifest (§13). A flat `graph_manifest.json` updated by +conditional-PUT would give omnigraph O(1) catalog reads and a natural one-writer CAS +**with no fragment-scan / compaction / cleanup treadmill** — structurally cheaper than +the Lance-dataset `__manifest` whose hygiene §9 step 2 exists to maintain. The reason to +keep the Lance-dataset form is the **MTT seam**: `__manifest` is deliberately shaped so +`publish` swaps to Lance `catalog.transaction()` when lance#7264 lands, at which point +Lance owns the cross-table manifest and omnigraph **deletes `__manifest` entirely** — +inheriting Lance's O(1) metadata rather than maintaining its own. A flat-file rewrite +would be a detour *away* from that seam, replaced again by MTT. So the trade is +**"Lance-dataset catalog (compacted, MTT-aligned) over flat-file manifest (locally +cheaper, off the MTT path)"** — defensible, but it means step 2's compaction/cleanup +work is a *bridge cost*, justified only by the MTT endgame; if MTT slips materially, the +flat-file manifest becomes the better target and step 2 stops being a bridge and starts +being permanent overhead. Worth a revisit checkpoint tied to the lance#7264 timeline. + The MemWAL/LSM ingest tier (`iss-681` **[G]**, `dec-adopt-lance-v7-memwal`) is **complementary, not competing, and not in flight** (the `memwal-benefit-analysis` branch is an empty placeholder; the real analysis is commit `c9a81266`). MemWAL @@ -847,10 +1003,21 @@ to flatten the curve. `storage.ops` span metric (§5.3) and the bucket-gated `write_cost_s3.rs` opener LOCK (step 3a's red→green, S3-only per the §9-3a measurement note). 2. **Bound history — bring the INTERNAL tables into optimize/cleanup.** Split into - a compaction half (the latency win, safe) and a cleanup half (version GC, needs - the Q8 watermark). Validated (Lance docs + source): compaction *preserves* - versions and is the only term needed to flatten the per-write metadata scan; - cleanup is the separate version-deleting op that opens the Q8 hole. + a compaction half (safe) and a cleanup half (version GC, needs the Q8 watermark). + Validated (Lance docs + source): compaction *preserves* versions and flattens the + per-write metadata *op-count* scan; cleanup is the separate version-deleting op that + opens the Q8 hole. **Latency role — concurrency-dependent, MEASURED (§0(d)):** the + internal fragment scan parallelizes only on a store with free concurrency; under an + R2-realistic cap (8) it serializes and an uncompacted graph *runs away* (per-write + ops 1273→3505, wall 6→16 s), which #291's compaction cuts ~6× and bounds. So on R2 + step 2a is **both a primary latency lever and the anti-runaway fix**, *and* the + **hard prerequisite for Phase 7 / step 4** (the `graph_head` CAS retry re-runs + `load_publish_state`, only acceptable once `__manifest` is compacted), *and* a + compute-floor/space win. (On an unlimited-concurrency store the latency component + alone vanishes — the depth ops fan out — but R2 is not that store.) **#291 is merged + to main but undeployed; the deployed `f6d2cc03` optimize is node/edge-only, so an + operator `optimize` on prod cannot compact these tables — deploying #291 + running + optimize is the immediate prod win.** - **2a. Internal-table compaction. ✅ LANDED.** `optimize` now compacts all three internal tables — `__manifest`, `_graph_commits`, **and `_graph_commit_actors`** (the actor table grows one fragment per commit on the @@ -952,6 +1119,12 @@ to flatten the curve. `iss-merge-recovery-partial-rollforward`, `iss-recovery-sweep-live-writer-rollback`, `iss-934`.) 6. **Branch ops.** Lance `Clone` for create (`iss-691`); concurrent delete loops. + Measured backbones (§0(c)): create ~77, delete ~87 — op counts scale with table + count, so `Clone` (O(tables)→O(1)) + `buffer_unordered` delete are the fix. + **Note: branch *write* (1777 ops, ~258-hop backbone, 21s compute floor) is NOT a + step-6 item** — it is fork-on-first-write stacked on the main backbone, owned by + **step 3b + the fork seam** (the path 3a skipped); it is the single worst write + shape and should be a named acceptance case for step 3b. 7. **Freeze** investment in publisher/sidecar/fork internals; pursue the MTT seam (`iss-863`/`iss-864`) as the strategic exit.