MR-925: experiment 1.4 - SIP wire format bench (roaring vs varint vs raw)

- validation-prototypes/sip-format-bench/: 4 sizes × 3 distributions
  × 3 encodings = 36 cells
- writeup at .context/experiments/sip-format-bench.md
- finding: roaring wins decisively for dense Lance row IDs
  (1.05 bits/elem at n=1M dense, 7× faster contains than binary_search);
  loses badly for uniform u64 (176 bits/elem)
- recommendation for §5.6: tagged wire format; tag=0x01 roaring (row
  IDs); tag=0x02 varint-delta (fallback for non-fragment-clustered)
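
A minimal sketch of the recommended tagged framing, assuming a one-byte tag followed by the payload. Only the tag values 0x01 and 0x02 come from the message above. The function names, the count-prefixed layout, and the LEB128-style varint are illustrative assumptions, not the benchmark's actual code. The roaring payload is left unimplemented so that the sketch needs only the standard library.

```rust
// Tag values from the recommendation above; everything else is hypothetical.
const TAG_ROARING: u8 = 0x01;
const TAG_VARINT_DELTA: u8 = 0x02;

/// Append `v` as an LEB128-style varint (7 payload bits per byte).
fn put_varint(out: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        out.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Read one varint at `*pos` and advance `*pos`.
/// Returns None on truncated input or a value that overflows u64.
fn get_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut v = 0u64;
    let mut shift = 0u32;
    loop {
        let b = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 || (shift == 63 && (b & 0x7f) > 1) {
            return None;
        }
        v |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            return Some(v);
        }
        shift += 7;
    }
}

/// Encode sorted IDs as: tag, varint count, then varint deltas (first delta is from 0).
/// Returns None if `ids` is not in non-decreasing order.
fn encode_varint_delta(ids: &[u64]) -> Option<Vec<u8>> {
    let mut out = vec![TAG_VARINT_DELTA];
    put_varint(&mut out, ids.len() as u64);
    let mut prev = 0u64;
    for &id in ids {
        put_varint(&mut out, id.checked_sub(prev)?);
        prev = id;
    }
    Some(out)
}

/// Dispatch on the leading tag byte and decode the payload.
fn decode(buf: &[u8]) -> Option<Vec<u64>> {
    let (&tag, body) = buf.split_first()?;
    match tag {
        TAG_VARINT_DELTA => {
            let mut pos = 0;
            let n = get_varint(body, &mut pos)?;
            let mut ids = Vec::new();
            let mut prev = 0u64;
            for _ in 0..n {
                prev = prev.checked_add(get_varint(body, &mut pos)?)?;
                ids.push(prev);
            }
            // Reject trailing bytes after the declared number of elements.
            if pos == body.len() { Some(ids) } else { None }
        }
        // A real decoder would hand `body` to a roaring deserializer here.
        TAG_ROARING => None,
        _ => None,
    }
}
```

In this layout a run of consecutive IDs costs one byte (8 bits) per element, since every delta is 1. That is consistent with using varint-delta as the fallback and roaring for dense row IDs.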
Devin AI 2026-05-12 17:25:56 +00:00
parent 8e54526024
commit a09f3ff787
5 changed files with 613 additions and 1 deletions


@@ -4919,6 +4919,15 @@ version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e3a9fe34e3e7a50316060351f37187a3f546bce95496156754b601a5fa71b76e"
[[package]]
name = "sip-format-bench"
version = "0.0.0"
dependencies = [
"anyhow",
"rand 0.8.6",
"roaring",
]
[[package]]
name = "siphasher"
version = "1.0.3"