Devin AI 88b338b56b MR-925: exp 1.5-1.7 code-dives + 2.x deferral rationale + 3.x reference systems
- exp 1.5 (bitmap-pushdown): DF 52.5 DynamicFilterPhysicalExpr supports bitmap-shaped
  pushdown as-written; no fork needed; Path A (per-batch evaluation) ships v1, Path B
  (Lance RowIdMask) is v2 optimization
- exp 1.6 (txn-branches-cost): Lance per-table branches are +4N S3 PUTs per txn vs
  current lazy-graph-branch model; side-grade not clean win; recommend keeping current
  model for v1
- exp 1.7 (stable-row-id-compaction): stable row IDs already enabled everywhere in
  OmniGraph; Path B (OmniGraph-driven remap via FragReuseIndex public API) ships
  today; Path A (Lance-managed) is v2 follow-up gated on §1.2 plugin registry
- 2.x deferred with rationale: all calibration / risk-quantification work; per ticket
  §0.3, the acceptance criteria do not require 2.x
- 3.1 Kuzu: factorization, semi-mask, dual-level hash index, variable-length expansion
- 3.2 LanceDB: TableProvider patterns, mutation-as-IR gap, no segment-aware planning
  in OSS
- 3.3 lance-graph: pure-SQL lowering trade-offs, 20-hop cap, Cypher AST liftable
- 3.4 Comet/GlareDB/ParadeDB/Spice.ai: capability advertisement, DF API churn budget
- 3.5 DuckDB: factorization calibration point (5-100x slower on multi-hop), DuckDB
  ext API as plugin gold standard
- 3.6 Trino: cost model with 3 components (CPU/mem/network), Connector SPI as
  versioned plugin reference, dynamic filters analog
2026-05-12 17:36:44 +00:00

Experiments 2.1–2.4 — DEFERRED (rationale)

Ticket: MR-925 §2 (lower-priority experiments). Date: 2026-05-12.

The ticket §0.3 acceptance criteria require all seven high-priority experiments (§1.1–§1.7) and at least six reference-system code-dives (§3.1–§3.6) to be complete before posting the rollup comment. §2.1–§2.4 are explicitly framed as "calibrations and incremental validations that don't gate the RFC but are worth doing" and "can run during Phase 0."

This write-up defers all four §2.x experiments with rationale, so that the next agent (or Phase-0 owner) doesn't have to rediscover the deferral decision.


2.1 scan_by_key_set extended benchmark — DEFER

Reason: MR-376 already validated a 72× speedup at hop 1 / 100k nodes on local FS. The extension matrix in §2.1 (cold S3 vs warm local; selectivity sweep; |keys| / |dataset| ratio sweep; BTREE-routed vs direct Dataset::take) is calibration, not capability gating. The §5.3 cost gate in MR-737 can ship without these numbers; we just won't have a tuned threshold for "when to prefer scan-by-key-set over re-scan" until they are collected.
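
To give a sense of the matrix size, here is a minimal Rust sketch that enumerates the four §2.1 axes as a Cartesian product. The type and function names (`Storage`, `AccessPath`, `BenchCase`, `calibration_matrix`) are hypothetical. The sweep values in the usage note are placeholders, not numbers from MR-376 or the ticket.

```rust
/// Storage tier for a calibration run (cold S3 vs warm local, per §2.1).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Storage {
    ColdS3,
    WarmLocal,
}

/// How the key set is resolved against the dataset (per §2.1).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessPath {
    BtreeRouted,
    DirectTake,
}

/// One cell of the calibration matrix (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BenchCase {
    pub storage: Storage,
    pub access: AccessPath,
    pub selectivity: f64,
    pub key_ratio: f64, // |keys| / |dataset|
}

/// Cartesian product of storage tier x access path x selectivity x key ratio.
pub fn calibration_matrix(selectivities: &[f64], key_ratios: &[f64]) -> Vec<BenchCase> {
    let mut cases = Vec::new();
    for storage in [Storage::ColdS3, Storage::WarmLocal] {
        for access in [AccessPath::BtreeRouted, AccessPath::DirectTake] {
            for &selectivity in selectivities {
                for &key_ratio in key_ratios {
                    cases.push(BenchCase { storage, access, selectivity, key_ratio });
                }
            }
        }
    }
    cases
}
```

With four placeholder selectivities and four placeholder key ratios, `calibration_matrix(&[0.001, 0.01, 0.1, 0.5], &[0.001, 0.01, 0.1, 0.5])` yields 2 × 2 × 4 × 4 = 64 runs. Computing this is one way to size the sweep before committing to its granularity.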

When to run: During Phase 0 MR-737 §5.3 implementation, before landing the cost-gate parameter. Estimated 2 days. Owner: same person implementing §5.3.

Risk of deferring: Low. The cost gate has a sensible default (always prefer scan-by-key-set unless |keys| / |dataset| > 0.1); the calibration just tightens it.
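
The default gate described above can be sketched as follows. This minimal sketch assumes the gate compares a row-count ratio against a fixed threshold. `ScanStrategy`, `choose_strategy`, and `DEFAULT_KEY_RATIO_THRESHOLD` are hypothetical names, not the MR-737 §5.3 API.

```rust
/// Default from §2.1: prefer scan-by-key-set unless |keys| / |dataset| > 0.1.
/// Hypothetical constant; the real knob is the MR-737 §5.3 cost-gate parameter.
pub const DEFAULT_KEY_RATIO_THRESHOLD: f64 = 0.1;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScanStrategy {
    ScanByKeySet,
    Rescan,
}

/// Picks a strategy from the key-set size and the dataset row count.
pub fn choose_strategy(num_keys: u64, dataset_rows: u64, threshold: f64) -> ScanStrategy {
    if dataset_rows == 0 {
        // Empty dataset: nothing to re-scan, so the key-set path is trivially fine.
        return ScanStrategy::ScanByKeySet;
    }
    let ratio = num_keys as f64 / dataset_rows as f64;
    // Strictly greater than: a ratio exactly at the threshold still uses the key set.
    if ratio > threshold {
        ScanStrategy::Rescan
    } else {
        ScanStrategy::ScanByKeySet
    }
}
```

Under this shape, the §2.1 calibration would only change the threshold value passed in, not the structure of the gate.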


2.2 Hash([key], N) partitioning elimination — DEFER

Reason: Validates that DataFusion's EXPLAIN output shows RepartitionExec being eliminated for capability-advertised plans. The capability advertising in MR-737 §5.6 is a quality-of-life optimization, not a correctness requirement. We can ship §5.6 without this validation and add it as a follow-up.

When to run: During Phase 0 MR-737 §5.6 implementation, half-day spike.

Risk of deferring: Low-medium. If DataFusion's optimizer does NOT honor our capability advertisements, the perf impact is a redundant RepartitionExec insertion — measurable in EXPLAIN ANALYZE but not a correctness issue. The risk is sub-optimal partitioning, not wrong results.
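
For reference, the decision §2.2 checks can be modeled as a toy. A child advertises `Hash(keys, N)`, and a parent that requires the same keys and partition count can skip the repartition. This Rust sketch is a simplified stand-in, not DataFusion's API. It only loosely mirrors the shape of DataFusion's `Partitioning::Hash`, and DataFusion's real `EnforceDistribution` optimizer pass also reasons about equivalence classes and key ordering, which the toy ignores.

```rust
/// Toy model of a plan node's advertised output partitioning.
/// Uses column names instead of physical expressions (simplification).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Partitioning {
    Hash(Vec<String>, usize),
    Unknown(usize),
}

/// Returns true if a repartition step must be inserted to satisfy a
/// requirement of `Hash(required_keys, required_n)` on this child.
pub fn needs_repartition(advertised: &Partitioning, required_keys: &[&str], required_n: usize) -> bool {
    match advertised {
        Partitioning::Hash(keys, n) => {
            // Same keys in the same order, and the same number of partitions.
            let same_keys = keys.len() == required_keys.len()
                && keys.iter().zip(required_keys).all(|(k, r)| k.as_str() == *r);
            !(same_keys && *n == required_n)
        }
        // No hash advertisement: the optimizer must assume nothing and repartition.
        Partitioning::Unknown(_) => true,
    }
}
```

In this model, a missing or mismatched advertisement only flips the result to `true` (an extra repartition). That matches the risk statement above: sub-optimal partitioning, not wrong results.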


2.3 Extension-rate propagation through StatisticsRegistry — DEFER

Reason: Validates the §5.7 cost-model plumbing for custom column statistics. The cost model itself can ship with default statistics (no custom registry); the registry is an extension point for better cost choices, not a prerequisite for correct ones. Deferring keeps §5.7's v1 narrower without breaking it.

When to run: When MR-737 §5.7 (cost-model surface) is being implemented — likely Phase 1, not Phase 0. Estimated 1 day.

Risk of deferring: Low. Default cost estimates from DataFusion's built-in operators (HashJoinExec, SortMergeJoinExec, etc.) are battle-tested; custom column statistics are incremental.
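
The following minimal sketch shows the fallback behavior that makes the registry optional: if no custom statistic is registered, estimation still proceeds with a default. It assumes "extension rate" means average fan-out per input row for an edge type. The `StatisticsRegistry` struct here is hypothetical, built for illustration, and does not reflect a DataFusion or OmniGraph API.

```rust
use std::collections::HashMap;

/// Hypothetical registry of per-edge-type extension rates
/// (assumed: average neighbors produced per input row).
pub struct StatisticsRegistry {
    extension_rates: HashMap<String, f64>,
    default_rate: f64,
}

impl StatisticsRegistry {
    pub fn new(default_rate: f64) -> Self {
        Self { extension_rates: HashMap::new(), default_rate }
    }

    /// Registers a custom rate for one edge type (the optional extension point).
    pub fn register(&mut self, edge_type: &str, rate: f64) {
        self.extension_rates.insert(edge_type.to_string(), rate);
    }

    /// Registered rate if present, otherwise the default.
    pub fn extension_rate(&self, edge_type: &str) -> f64 {
        self.extension_rates.get(edge_type).copied().unwrap_or(self.default_rate)
    }

    /// Estimated output rows of a single expansion hop.
    pub fn estimate_hop_rows(&self, input_rows: u64, edge_type: &str) -> f64 {
        input_rows as f64 * self.extension_rate(edge_type)
    }
}
```

Removing every `register` call degrades the estimates to the default rate but never stops planning. This is the "better, not correct" distinction the section relies on.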


2.4 DataFusion API churn audit (47 → 53) — DEFER

Reason: Calibrates §11 (Risk) and informs the upgrade-cycle budget, not Phase-0 design. Phase 0 pins to DataFusion 52.5 (the substrate pin we validated in §1.3 and §1.5). Knowing the breakage rate for future upgrades is a maintenance-cost input, not an entry criterion.

When to run: Before any DataFusion minor-version bump after Phase 0 ships. Estimated 1–2 days. Owner: the engineer planning the bump.

Risk of deferring: Zero for Phase 0. The audit is for future planning; Phase 0 doesn't care about DF 47 or DF 53, only DF 52.5.


Summary

All four §2.x experiments are deferred to Phase 0 or later on the grounds that they are calibration / risk-quantification work, not capability gating. The ticket §0.3 acceptance criteria require §1.x (7/7 done) and §3.x (≥ 6 of 6, to do) but do not require §2.x. This deferral preserves the ticket's stated scope.

If the §2.x experiments need to be re-prioritized (e.g. if §1.x findings expose a calibration gap), they can be picked up individually; each is small (½–2 days) and independent.
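
Summing the per-experiment estimates gives the total cost of picking up all four. This small Rust sketch uses the figures from the "When to run" lines: 2 days, ½ day, 1 day, and 1–2 days. It reads §2.4's estimate as a 1–2 day range, consistent with the ½–2 day bound above. `ESTIMATES` and `total_effort_days` are illustrative names.

```rust
/// (experiment, min days, max days), taken from the "When to run" lines.
/// Only §2.4 is a range; the others are point estimates.
pub const ESTIMATES: [(&str, f64, f64); 4] = [
    ("2.1 scan_by_key_set extended benchmark", 2.0, 2.0),
    ("2.2 Hash([key], N) partitioning elimination", 0.5, 0.5),
    ("2.3 Extension-rate propagation", 1.0, 1.0),
    ("2.4 DataFusion API churn audit", 1.0, 2.0),
];

/// Returns (lower bound, upper bound) of total effort in days.
pub fn total_effort_days() -> (f64, f64) {
    ESTIMATES
        .iter()
        .fold((0.0, 0.0), |(lo, hi), &(_, min, max)| (lo + min, hi + max))
}
```

`total_effort_days()` returns `(4.5, 5.5)`, roughly one engineer-week if all four were run back to back.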