Commit graph

4 commits

Author SHA1 Message Date
Devin AI
3b661f35d3 MR-927 Phase 1 — stable-row-id repro across BTree/Bitmap/LabelList + compaction
Builds and runs the small repro specified in
.context/experiments/stable-row-id-compaction.md §5 ("Small repro plan")
and extends the writeup with the empirical Phase 1 evidence (F7–F11).

Matrix {BTree, Bitmap, LabelList} × {stable=true, stable=false}, 6 fragments
forced via small max_rows_per_file and target_rows_per_fragment, with
with_row_id() probes pre- and post-compaction. All six cases return correct
counts; with enable_stable_row_ids: true the row IDs round-trip unchanged
across compaction; with the flag off the row addresses move (fragment_id <<
32 | local_row), which is the documented contract.
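
For illustration, the address layout quoted above (fragment_id << 32 | local_row) can be sketched in plain Rust. This is a minimal sketch, not Lance's own code: the helper names row_address and split_row_address and the sample fragment numbers are hypothetical. It only shows why an address changes when compaction rewrites a row into a different fragment.

```rust
/// Packs a fragment id and a fragment-local row offset into one u64,
/// following the `fragment_id << 32 | local_row` layout quoted above.
/// (Hypothetical helper, not a Lance API.)
fn row_address(fragment_id: u32, local_row: u32) -> u64 {
    ((fragment_id as u64) << 32) | (local_row as u64)
}

/// Splits a row address back into (fragment_id, local_row).
fn split_row_address(addr: u64) -> (u32, u32) {
    ((addr >> 32) as u32, addr as u32)
}

fn main() {
    // Assumed scenario: a row sits at offset 5 of fragment 2 before compaction
    // and is rewritten to offset 1029 of a new fragment 7 afterwards.
    let before = row_address(2, 5);
    let after = row_address(7, 1029);

    // Without stable row IDs the address is the identifier, so it moves.
    assert_ne!(before, after);
    assert_eq!(split_row_address(after), (7, 1029));
    println!("before={before:#018x} after={after:#018x}");
}
```

With stable row IDs enabled, the identifier returned to the caller is decoupled from this physical address, which is why it can round-trip unchanged across compaction.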

Plus a side experiment confirming that Operation::Overwrite (both staged
via InsertBuilder::execute_uncommitted + CommitBuilder::execute and direct
Dataset::write Overwrite) inherits manifest.uses_stable_row_ids from the
existing dataset, even when the WriteParams flag is absent. This resolves
the suspicion about table_store.rs:956 (stage_overwrite path not setting
the flag): the path is correct, not a latent bug.

Conclusion: MR-737 §5.5 substrate caveat ("Stable Row ID for Index is
documented as experimental in lance-4.0.x") is empirically resolved.
Feature works; docs are conservative. RFC shape for MR-927 is a docs-PR.

Refs MR-925, MR-927.

Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
2026-05-12 22:56:51 +00:00
Devin AI
a09f3ff787 MR-925: experiment 1.4 — SIP wire format bench (roaring vs varint vs raw)
- validation-prototypes/sip-format-bench/: 4 sizes × 3 distributions
  × 3 encodings = 36 cells
- writeup at .context/experiments/sip-format-bench.md
- finding: roaring wins decisively for dense Lance row IDs
  (1.05 bits/elem at n=1M dense, 7× faster contains than binary_search);
  loses badly for uniform u64 (176 bits/elem)
- recommendation for §5.6: tagged wire format; tag=0x01 roaring (row
  IDs); tag=0x02 varint-delta (fallback for non-fragment-clustered)
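
As a rough illustration of the tagged wire format recommended above, the sketch below uses only the standard library. The two tag values come from the commit message. Everything else is an assumption made for the example: the byte layout (tag, varint count, then LEB128-style deltas) and the function names. The roaring branch is left as a stub because it would need an external roaring-bitmap decoder.

```rust
// Tag values as named in the commit message; the layout below is assumed.
const TAG_ROARING: u8 = 0x01;
const TAG_VARINT_DELTA: u8 = 0x02;

/// Appends `v` as an LEB128-style varint (7 payload bits per byte).
fn write_varint(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Reads one varint starting at `*pos`; returns None on truncated or overlong input.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let (mut v, mut shift) = (0u64, 0u32);
    loop {
        let b = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        v |= ((b & 0x7f) as u64) << shift;
        if b & 0x80 == 0 {
            return Some(v);
        }
        shift += 7;
    }
}

/// Encodes ascending ids as: tag byte, varint count, varint deltas.
fn encode_varint_delta(sorted_ids: &[u64]) -> Vec<u8> {
    let mut out = vec![TAG_VARINT_DELTA];
    write_varint(sorted_ids.len() as u64, &mut out);
    let mut prev = 0u64;
    for &id in sorted_ids {
        let delta = id.checked_sub(prev).expect("ids must be sorted ascending");
        write_varint(delta, &mut out);
        prev = id;
    }
    out
}

/// Dispatches on the leading tag byte and decodes the payload.
fn decode(buf: &[u8]) -> Result<Vec<u64>, String> {
    match buf.first() {
        Some(&TAG_VARINT_DELTA) => {
            let mut pos = 1;
            let n = read_varint(buf, &mut pos).ok_or("truncated count")?;
            let (mut ids, mut prev) = (Vec::new(), 0u64);
            for _ in 0..n {
                let delta = read_varint(buf, &mut pos).ok_or("truncated payload")?;
                prev = prev.checked_add(delta).ok_or("delta overflow")?;
                ids.push(prev);
            }
            Ok(ids)
        }
        // Stub: a real reader would hand the payload to a roaring decoder here.
        Some(&TAG_ROARING) => Err("roaring payload not handled in this sketch".to_string()),
        Some(t) => Err(format!("unknown tag {t:#04x}")),
        None => Err("empty buffer".to_string()),
    }
}

fn main() {
    let ids = vec![3u64, 4, 5, 1000];
    let wire = encode_varint_delta(&ids);
    assert_eq!(decode(&wire), Ok(ids));
    println!("{} bytes on the wire", wire.len());
}
```

A leading tag byte lets the reader choose a decoder per payload, so the writer can pick whichever encoding suits the ID distribution it is sending.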
2026-05-12 17:25:56 +00:00
Devin AI
8e54526024 MR-925: experiment 1.3 — custom UserDefinedLogicalNode + ExecutionPlan e2e
- validation-prototypes/custom-operator/: NeighborExpand toy operator
  with paired ExtensionPlanner + custom QueryPlanner via
  SessionStateBuilder::with_query_planner
- writeup at .context/experiments/custom-operator.md: 5 probes
  (round-trip, EXPLAIN, predicate guard, composition with Filter +
  Aggregate, BaselineMetrics) — all pass; ~250 LoC integration
  footprint; no unsafe; no internal API access
- finding: §5.3 is achievable on DF 52.5 as written; deltas are
  doc-shaped (predicate push-down opt-in, statistics requirement,
  Partitioning override)
2026-05-12 17:22:02 +00:00
Devin AI
02c4b45c85 MR-925: validation-prototypes scaffolding + exp 1.1 + exp 1.2
- exclude validation-prototypes/ and merge-insert-cas-repro from the main
  workspace so the nested cargo workspace can use its own pin set
- add validation-prototypes/{factorized-batches,custom-lance-index}/
  scratch crates (never merged to main; long-lived branch only)
- exp 1.1 — factorized batches through DataFusion ops: writeup at
  .context/experiments/factorized-batches.md (5 cells × 8 ops; all
  scalar-keyed ops accept List<UInt64> input, UNNEST via CROSS JOIN
  fails in DF 52.5)
- exp 1.2 — custom Lance index plugin from outside lance: writeup at
  .context/experiments/custom-lance-index.md (5 probes; transaction
  surface is open, SCALAR_INDEX_PLUGIN_REGISTRY is closed → hard
  blocker for MR-737 §5.4; recommends upstream path or external-index
  path)
2026-05-12 16:49:33 +00:00