MR-925: experiment 1.4 - SIP wire format bench (roaring vs varint vs raw)

- validation-prototypes/sip-format-bench/: 4 sizes × 3 distributions
  × 3 encodings = 36 cells
- writeup at .context/experiments/sip-format-bench.md
- finding: roaring wins decisively for dense Lance row IDs
  (1.05 bits/elem at n=1M dense, 7× faster contains than binary_search);
  loses badly for uniform u64 (176 bits/elem)
- recommendation for §5.6: tagged wire format; tag=0x01 roaring (row
  IDs); tag=0x02 varint-delta (fallback for non-fragment-clustered)
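
The tagged format recommended above could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in sip-format-bench: the function names, the count-prefixed layout, and the LEB128 varint flavor are all hypothetical, and only the two tag values come from the commit message. The roaring branch is left as a stub because its payload would be produced and parsed by a roaring bitmap library.

```rust
// Hypothetical sketch: names and byte layout are assumptions for illustration.
// Only the tag values (0x01 roaring, 0x02 varint-delta) come from the commit message.
const TAG_ROARING: u8 = 0x01;
const TAG_VARINT_DELTA: u8 = 0x02;

/// Append `v` as a LEB128 varint (7 payload bits per byte, high bit = "more follows").
fn put_varint(out: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        out.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Read one varint starting at `*pos` and advance `*pos` past it.
/// Returns None on truncated or over-long input.
fn get_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut v = 0u64;
    let mut shift = 0u32;
    loop {
        let b = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        v |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            return Some(v);
        }
        shift += 7;
    }
}

/// Encode ascending IDs as: tag byte, element count, then each gap to the previous ID.
/// The input must be sorted ascending; otherwise the subtraction underflows.
fn encode_varint_delta(sorted_ids: &[u64]) -> Vec<u8> {
    let mut out = vec![TAG_VARINT_DELTA];
    put_varint(&mut out, sorted_ids.len() as u64);
    let mut prev = 0u64;
    for &id in sorted_ids {
        put_varint(&mut out, id - prev);
        prev = id;
    }
    out
}

/// Dispatch on the leading tag byte. Only the varint-delta branch is implemented here.
fn decode(buf: &[u8]) -> Option<Vec<u64>> {
    let (&tag, _) = buf.split_first()?;
    match tag {
        TAG_VARINT_DELTA => {
            let mut pos = 1;
            let n = get_varint(buf, &mut pos)?;
            let mut ids = Vec::new();
            let mut prev = 0u64;
            for _ in 0..n {
                prev = prev.checked_add(get_varint(buf, &mut pos)?)?;
                ids.push(prev);
            }
            Some(ids)
        }
        // A roaring payload would be handed to a roaring bitmap library (stubbed out).
        TAG_ROARING => None,
        _ => None,
    }
}
```

Calling `encode_varint_delta` on a sorted slice and passing the result to `decode` should round-trip the IDs. Because each gap is stored rather than each absolute value, clustered IDs cost about one byte per element, while widely spread IDs cost more.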
Devin AI 2026-05-12 17:25:56 +00:00
parent 8e54526024
commit a09f3ff787
5 changed files with 613 additions and 1 deletions

@@ -4,8 +4,8 @@ members = [
 "factorized-batches",
 "custom-lance-index",
 "custom-operator",
+"sip-format-bench",
 # Additional crates added as each experiment is set up:
-# "sip-format-bench", # 1.4
 # "bitmap-pushdown", # 1.5
 # "txn-branches-cost", # 1.6
 # "stable-rowid-index", # 1.7