Builds and runs the small repro specified in
.context/experiments/stable-row-id-compaction.md §5 ("Small repro plan")
and extends the writeup with the empirical Phase 1 evidence (F7–F11).
Matrix: {BTree, Bitmap, LabelList} × {stable=true, stable=false}; 6 fragments
forced via small max_rows_per_file and target_rows_per_fragment;
with_row_id() probes pre- and post-compaction. All six cases return correct
counts; with enable_stable_row_ids: true the row IDs round-trip unchanged
across compaction; with the flag off the row addresses move (fragment_id <<
32 | local_row), which is the documented contract.
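A minimal sketch of the row-address layout quoted above, showing why an
address cannot survive compaction: it is derived from where the row
physically lives, so rewriting fragments rewrites addresses. The helper
names are hypothetical, not Lance API; only the bit layout comes from the
text.

```rust
/// Pack a row address: upper 32 bits = fragment id, lower 32 bits = the
/// row's offset inside that fragment (fragment_id << 32 | local_row).
fn row_address(fragment_id: u32, local_row: u32) -> u64 {
    ((fragment_id as u64) << 32) | local_row as u64
}

/// Inverse of `row_address`: recover (fragment_id, local_row).
fn split_row_address(addr: u64) -> (u32, u32) {
    ((addr >> 32) as u32, addr as u32)
}
```

For example, row 7 of fragment 3 has address `0x3_0000_0007`. If
compaction (hypothetically) moves that row to offset 2 of a new fragment 6,
its address becomes `0x6_0000_0002`. A stable row ID is the value that
stays fixed across that move.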
Plus a side experiment confirming that Operation::Overwrite (both staged
via InsertBuilder::execute_uncommitted + CommitBuilder::execute and direct
Dataset::write Overwrite) inherits manifest.uses_stable_row_ids from the
existing dataset, even when the WriteParams flag is absent. This resolves
the suspicion about table_store.rs:956 (stage_overwrite path not setting
the flag): the path is correct, not a latent bug.
Conclusion: MR-737 §5.5 substrate caveat ("Stable Row ID for Index is
documented as experimental in lance-4.0.x") is empirically resolved.
Feature works; docs are conservative. RFC shape for MR-927 is a docs PR.
Refs MR-925, MR-927.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
On full re-read of MR-737, the section numbering used in several of the
original MR-925 cross-references no longer lines up:
- §5.10 in current MR-737 is 'First-class scores and rank fusion', NOT
'custom index types' / 'connector SPI'. The custom-index-type / plugin
surface is §5.4 ('Persisted CSR adjacency as Lance index plugin') with
the capability shape in §5.6.
- §5.11 is 'Substrate choice — DataFusion vs. custom executor (A)'
(resolved 2026-05-11), NOT 'per-table txn branches'. Per-table Lance
native txn branches live in §5.12 ('Mutation IR, write planner, and
external sources') per Ragnor 2026-04-29.
- §5.5 is 'Stable row IDs as graph IDs', NOT 'reconciler pattern'. The
reconciler is §5.16.
- §5.8 is 'Tiering via Lance base paths', NOT SIP-related. SIP is §5.3.
- The MR-925 §1.6 cross-reference to 'Open Q5' was to a pre-2026-05-11
numbering; Q5 in current §10 is 'extension rate under filters'.
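A minimal sketch of the kind of §-numbering note each writeup now carries,
written as a lookup table. The struct and function names are hypothetical.
The entries are the four remaps listed above. The Q5 item is left out
because the list does not say where the old Q5 now lives. Lookups are keyed
on (stale ref, topic) rather than the stale ref alone, since e.g. §5.5 is
still the right reference for stable row IDs.

```rust
/// One stale cross-reference and where its topic lives in current MR-737.
struct SectionRemap {
    stale_ref: &'static str,
    topic: &'static str,
    current: &'static str,
}

/// Remaps from the list above (hypothetical encoding of the numbering note).
const REMAPS: &[SectionRemap] = &[
    SectionRemap { stale_ref: "§5.10", topic: "custom index types / connector SPI", current: "§5.4 (capability shape in §5.6)" },
    SectionRemap { stale_ref: "§5.11", topic: "per-table txn branches", current: "§5.12" },
    SectionRemap { stale_ref: "§5.5", topic: "reconciler pattern", current: "§5.16" },
    SectionRemap { stale_ref: "§5.8", topic: "SIP", current: "§5.3" },
];

/// Return the current section for a stale (ref, topic) pair, or None if the
/// pair was never mis-numbered.
fn remap(stale_ref: &str, topic: &str) -> Option<&'static str> {
    REMAPS
        .iter()
        .find(|r| r.stale_ref == stale_ref && r.topic == topic)
        .map(|r| r.current)
}
```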
Each writeup now has a §-numbering note at the top mapping its findings
to the current MR-737 numbering. The findings themselves are unchanged —
this is a numbering-only edit.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
- validation-prototypes/custom-operator/: NeighborExpand toy operator
with paired ExtensionPlanner + custom QueryPlanner via
SessionStateBuilder::with_query_planner
- writeup at .context/experiments/custom-operator.md: 5 probes
(round-trip, EXPLAIN, predicate guard, composition with Filter +
Aggregate, BaselineMetrics): all pass; ~250 LoC integration
footprint; no unsafe; no internal API access
- finding: §5.3 is achievable on DF 52.5 as written; deltas are
doc-shaped (predicate push-down opt-in, statistics requirement,
Partitioning override)
- exclude validation-prototypes/ and merge-insert-cas-repro from the main
workspace so the nested cargo workspace can use its own pin set
- add validation-prototypes/{factorized-batches,custom-lance-index}/
scratch crates (never merged to main; long-lived branch only)
- exp 1.1 — factorized batches through DataFusion ops: writeup at
.context/experiments/factorized-batches.md (5 cells × 8 ops; all
scalar-keyed ops accept List<UInt64> input; UNNEST via CROSS JOIN
fails in DF 52.5)
- exp 1.2 — custom Lance index plugin from outside lance: writeup at
.context/experiments/custom-lance-index.md (5 probes; transaction
surface is open, SCALAR_INDEX_PLUGIN_REGISTRY is closed → hard
blocker for MR-737 §5.4; recommends upstream path or external-index
path)
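The exact semantics of the NeighborExpand toy operator are not spelled out
here. As a hypothetical, std-only illustration of what such an operator
computes over a CSR adjacency (the §5.4 substrate), and of the factorized
vs. flattened output shapes exp 1.1 probes, consider the sketch below. The
`Csr` type and `expand_*` functions are assumptions for illustration, not
the prototype's code. No DataFusion or Arrow types are involved.

```rust
/// Compressed sparse row (CSR) adjacency: the neighbors of node `n` are
/// `targets[offsets[n]..offsets[n + 1]]`.
struct Csr {
    offsets: Vec<usize>,
    targets: Vec<u64>,
}

impl Csr {
    fn neighbors(&self, node: usize) -> &[u64] {
        &self.targets[self.offsets[node]..self.offsets[node + 1]]
    }
}

/// Factorized expansion: one output row per input node, carrying its whole
/// neighbor list (the shape a List<UInt64> column would hold).
fn expand_factorized(csr: &Csr, nodes: &[usize]) -> Vec<(usize, Vec<u64>)> {
    nodes.iter().map(|&n| (n, csr.neighbors(n).to_vec())).collect()
}

/// Flat expansion: one output row per (node, neighbor) edge, i.e. what an
/// UNNEST of the factorized column would produce.
fn expand_flat(csr: &Csr, nodes: &[usize]) -> Vec<(usize, u64)> {
    nodes
        .iter()
        .flat_map(|&n| csr.neighbors(n).iter().map(move |&m| (n, m)))
        .collect()
}
```

With edges 0→{1, 2}, 1→{2}, 2→{} (offsets `[0, 2, 3, 3]`, targets
`[1, 2, 2]`), the factorized form keeps one row per node, including node 2
with an empty list. The flat form emits one row per edge and drops node 2
entirely. That row-count change is why the factorized shape composes with
scalar-keyed operators without an UNNEST.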