omnigraph/research/lance-autoresearch
Claude 7b1b0b5b75
research: fix lance-autoresearch correctness bugs surfaced by code review
A code review pass found a cluster of real bugs in the metrics and the
correctness contract; this commit fixes them before any agent loop runs against the harness.

Critical metric bug:
- harness-common::sysinfo::peak_rss_mb read VmPeak (virtual address space
  high-water-mark, includes mmap'd files / guard pages / untouched
  allocations) instead of VmHWM (resident pages high-water-mark). The
  function name and HARNESS.md contract both promised RSS. Every
  peak_mem_mb row logged under the old code was virtual peak, not RSS.
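A minimal sketch of the corrected readback follows. It assumes the Linux /proc/self/status layout, where both VmHWM and VmPeak are reported in kB. The function names are hypothetical and are not harness-common's actual API. What matters is which field gets read:

```rust
use std::fs;

/// Extracts the resident-set high-water mark (VmHWM) from the text of
/// /proc/<pid>/status and converts kB to MB. VmPeak (the *virtual*
/// address-space peak) is deliberately not consulted.
fn peak_rss_mb_from_status(status: &str) -> Option<f64> {
    status
        .lines()
        .find(|line| line.starts_with("VmHWM:"))
        // Tokens are "VmHWM:", "<n>", "kB"; take the number.
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse::<u64>().ok())
        .map(|kb| kb as f64 / 1024.0)
}

/// Linux-only: reads this process's status file. Returns None on other
/// platforms (the file does not exist) or if the field is missing.
fn read_peak_rss_mb() -> Option<f64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    peak_rss_mb_from_status(&status)
}
```

Parsing is kept separate from file I/O so that field selection can be unit-tested against a captured status string.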

Correctness contract bug:
- reference::topk_consistent's tie-tolerance had a flawed neighbor-scan
  check: when the K-th distance fell in a multi-way tie, agent and
  reference could legally return different K-sized subsets of the tied
  band (heap eviction order vs. sort stability), and the neighbor scan
  required both endpoints to be present, producing false negatives on
  legitimate cases. Simplified to a positional distance-tolerance check; ids at the
  same rank may differ silently because the distance match within tol
  constrains the swap to a 2*tol band. Diagnostic comment explains the
  rationale.
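The simplified check can be sketched as follows. It assumes results are (id, distance) pairs already sorted by ascending distance. `Hit` and `topk_distances_match` are illustrative names, not the crate's real types:

```rust
/// One top-K result: (row id, distance). Illustrative type, not the crate's.
type Hit = (u32, f32);

/// Positional tie-tolerant top-K check: both lists have the same length, and
/// at each rank the agent's distance is within `tol` of the reference's.
/// Ids are not compared, so different members of a tied band at rank K pass.
fn topk_distances_match(agent: &[Hit], reference: &[Hit], tol: f32) -> bool {
    agent.len() == reference.len()
        && agent
            .iter()
            .zip(reference)
            // A NaN distance makes the comparison false, so it fails the check.
            .all(|(&(_, da), &(_, dr))| (da - dr).abs() <= tol)
}
```

Because ids are never compared, two runs that pick different members of a tied band at rank K both pass. That is exactly the legitimate case the old neighbor scan rejected.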

API hygiene:
- Removed dead PqKernel::shape() and ScalarReference::shape(). Both were
  declared in the public API contract (program.md, kernels.rs comment) and
  required to be stable, but were never called by the bench, benches,
  inputs, or reference. The contract now reflects what the bench actually uses.
- Removed dead `anyhow` workspace dependency.

Determinism:
- PRNG seed mixing now uses the SplitMix64 finalizer per part instead of
  raw XOR. Raw XOR is commutative and small-constant collisions are
  reachable; mix_seeds iterates the finalizer once per ingredient so
  distinct (seed, shape, kind) tuples produce distinct streams with
  vanishingly small collision probability.
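The per-ingredient mixing can be sketched as below, using the standard SplitMix64 finalizer constants. How the crate actually folds ingredients together is an assumption here: this sketch adds the SplitMix64 golden-ratio increment, XORs in the part, and then finalizes.

```rust
/// SplitMix64 golden-ratio increment.
const GOLDEN_GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;

/// The SplitMix64 output finalizer: a bijection on u64 with strong avalanche.
fn splitmix64_finalize(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Folds seed ingredients (e.g. seed, shape id, distribution kind) into one
/// stream seed, running the finalizer once per ingredient. Unlike raw XOR,
/// the result depends on ingredient order.
fn mix_seeds(parts: &[u64]) -> u64 {
    parts.iter().fold(0u64, |acc, &part| {
        splitmix64_finalize(acc.wrapping_add(GOLDEN_GAMMA) ^ part)
    })
}
```

Adding the increment before each finalize matters because the finalizer maps 0 to 0. Without it, a leading run of zero ingredients would leave the state stuck at zero.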

License headers:
- kernels.rs SPDX changed from Apache-2.0 to MIT OR Apache-2.0 to match
  the crate's Cargo.toml license field (the rest of the crate is dual-
  licensed). Added matching SPDX headers to reference.rs and inputs.rs.

Doc cleanups:
- design.md: replaced the broken relative link
  `../../docs/research/llm-evolutionary-sampling.md` (which resolved inside
  lance-autoresearch where the note doesn't live) with a path-explained
  reference noting the note lives in the parent OmniGraph repo and won't
  ship on extraction.
- README.md: clarified that the target table mixes a single landed target
  with a candidate roadmap; the candidates have no code yet.
- HARNESS.md: added exit code 1 (internal error) to the exit-code summary;
  was documented in run_experiment.rs but not in the loop contract.
- adding-a-target.md: dropped the misleading "cp -r plus surgical edits"
  framing — the workflow rewrites 7 files; what's inherited is Cargo
  manifest, license headers, workspace registration, and shared utilities.

Verified end-to-end: cargo build / clippy / test all green. The baseline
trial reports `correctness: pass` and exits 0 in ~34s (peak_mem_mb now
reads RSS; the same workload reports 91 MB, plausibly correct given the
temporary fixture-construction buffers).

https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
2026-05-15 00:55:57 +00:00

lance-autoresearch

A multi-target workspace for evolving Lance hot-path kernels via LLM coding agents (Claude Code, Codex, Cursor), in the style of Andrej Karpathy's nanochat-research single-agent autoresearch loop.

Each landed target is an independent Rust crate under crates/. The candidates below are listed as a roadmap — they have no code yet, only the research-note rationale and a docs/targets/<name>.md capsule (when one exists). Spinning up a candidate is the docs/adding-a-target.md workflow.

| Target | Status | Lance source area | What's optimized |
|---|---|---|---|
| crates/pq-l2 | landed | lance-linalg::distance::l2, PQ probe | PQ L2 distance: build LUT, probe codes, top-K |
| crates/pq-cosine | candidate (A1) | lance-linalg::distance::cosine | PQ cosine distance |
| crates/pq-dot | candidate (A1) | lance-linalg::distance::dot | PQ dot-product distance |
| crates/ivf-partition | candidate (A2) | lance-index::vector::ivf partition select | IVF partition selection (centroid scan) |
| crates/fts-bm25 | candidate (A3) | lance-index::scalar::inverted BM25 | FTS BM25 scoring inner loop |
| crates/bitpack | candidate (A4) | lance-encoding::encodings::bitpack | Bitpack integer decode |
| crates/dictionary | candidate (A5) | lance-encoding::encodings::dictionary | Dictionary decode |
| crates/fsst | candidate (A6) | lance-encoding::encodings::fsst | FSST string decode |
| crates/take | candidate (A7) | lance-core::utils::take | Take / gather kernel |
| crates/predicate | candidate (A8) | lance-datafusion filter eval | Predicate evaluation kernels |
| crates/posting-intersect | candidate (A9) | lance-index::scalar::inverted | Posting list intersection (FTS AND) |
| crates/topk-merge | candidate (A10) | scan-merge | Top-K k-way merge |

Where a candidate has a capsule, it lives in docs/targets/; any candidate can be added by following docs/adding-a-target.md. The single landed target (pq-l2) proves the harness shape; the candidates wait for an agent to spin them up.

The contract every target follows

Karpathy's three-file shape, applied per target:

| File (per target crate) | Mutability | Edited by |
|---|---|---|
| src/kernels.rs | mutable | the agent |
| src/reference.rs, src/inputs.rs, src/lib.rs, src/bin/run_experiment.rs, benches/*.rs | immutable | |
| program.md | human-iterated | the human, between runs |
| results.tsv | append-only | the agent, per trial (gitignored) |
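The table does not say how the contract is enforced. One option for a reviewer or CI step is a guard over a trial commit's changed paths. The sketch below uses hypothetical names and assumes paths are given relative to the target crate root:

```rust
/// The one file an agent may edit in a trial, relative to the target crate root.
const MUTABLE_FILE: &str = "src/kernels.rs";

/// Hypothetical guard: true if every path changed by a trial commit is the
/// mutable kernel file. Immutable files (reference.rs, inputs.rs, lib.rs,
/// run_experiment.rs, benches) and the human-owned program.md are rejected.
/// results.tsv is gitignored, so it never shows up in a commit's diff.
fn trial_diff_allowed(changed_paths: &[&str]) -> bool {
    changed_paths.iter().all(|path| *path == MUTABLE_FILE)
}
```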

The shared utilities (deterministic PRNG, geomean, peak-RSS readback, tolerance constants, time budget) live in crates/harness-common and are consumed by every target. There is intentionally no Target trait: decode-kernel and distance-kernel signatures differ enough that a unifying trait would either bloat or force type-erased boxing. Each target keeps its natural shape; the shared crate is plumbing only.
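To illustrate the kind of plumbing harness-common holds, here is a log-space geometric mean over per-case timings. It is a sketch only, and the crate's actual geomean may differ in signature and edge-case handling:

```rust
/// Geometric mean of positive per-case values (e.g. ns/op or speed ratios),
/// computed in log space so long products cannot overflow or underflow.
/// Returns None for an empty slice or any non-positive / non-finite value.
fn geomean(values: &[f64]) -> Option<f64> {
    if values.is_empty() || values.iter().any(|v| !v.is_finite() || *v <= 0.0) {
        return None;
    }
    let mean_ln = values.iter().map(|v| v.ln()).sum::<f64>() / values.len() as f64;
    Some(mean_ln.exp())
}
```

For speed ratios, geomean(&[0.5, 2.0]) evaluates to 1.0 (up to rounding). A 2× win on one case exactly cancels a 2× loss on another, which is why the speed metric pairs the geomean with a worst-case guard.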

The shared loop conventions every target's program.md inherits live in HARNESS.md. Per-target priors and API specifics live in each target's own program.md.

Dataset-independent by design

Most ANN benchmarks ask you to compete on a fixed dataset (SIFT1M, GIST1M, DEEP1B). That conflates two things: kernel correctness (the math) and kernel speed under one specific data distribution. An LLM agent given recall@K as the oracle has an incentive to overfit to the dataset's quirks.

We split them, every target:

  • Correctness = a match to a scalar reference on diverse generated inputs: max_abs_err ≤ 1e-4 for float kernels, bitwise equality for integer/byte kernels. This tests mathematical equivalence, so there is no dataset to overfit. Lossy techniques that exceed the tolerance fail this gate.
  • Speed = geomean ns/operation across multiple shape × distribution combinations, with a worst-case guard. A kernel that wins on one distribution but regresses on another is not kept.
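The two gate flavors can be sketched under the stated tolerances as follows. `MAX_ABS_ERR`, `float_gate`, and `bitwise_gate` are hypothetical names; the real tolerance constants live in harness-common:

```rust
/// Float tolerance from the correctness gate (max_abs_err <= 1e-4).
const MAX_ABS_ERR: f32 = 1e-4;

/// Largest absolute elementwise difference. Returns infinity on a length
/// mismatch or if any difference is NaN, so the gate fails closed.
fn max_abs_err(candidate: &[f32], reference: &[f32]) -> f32 {
    if candidate.len() != reference.len() {
        return f32::INFINITY;
    }
    candidate
        .iter()
        .zip(reference)
        .map(|(c, r)| (c - r).abs())
        .fold(0.0_f32, |worst, d| if d.is_nan() { f32::INFINITY } else { worst.max(d) })
}

/// Float kernels pass if they stay within tolerance of the scalar reference.
fn float_gate(candidate: &[f32], reference: &[f32]) -> bool {
    max_abs_err(candidate, reference) <= MAX_ABS_ERR
}

/// Integer/byte kernels (e.g. decoders) must match bit for bit.
fn bitwise_gate(candidate: &[u8], reference: &[u8]) -> bool {
    candidate == reference
}
```

Mapping length mismatches and NaNs to an infinite error means a kernel cannot pass by emitting values that the comparison silently skips.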

By construction, a kept improvement has to hold across every benchmarked shape and distribution, not just one. There is no wget sift.tar.gz step; every target is fully self-contained.

Why a separate repo (and a workspace, not a single crate)

OmniGraph (the graph engine that motivated this) pins Lance at a released version and consumes its kernels via the public crate API. Improvements live one layer below: in Lance itself. A standalone repo with no OmniGraph dep keeps the optimization target pure (only the kernel changes), keeps the license clean for upstream contribution (dual MIT/Apache-2.0 → Apache-2.0 PRs to Lance), and keeps each agent's working set tiny.

It is a workspace rather than a single crate because per-target dependencies differ (FSST decode will want a different dependency set than the PQ kernels), and the agent's edits to one target's kernels.rs must not collide with another target's library path. Each target is buildable, testable, and runnable in isolation: cd crates/<target> && cargo run --release --bin run_experiment.

Quick start

# Run the landed PQ L2 target's baseline.
cargo run --release --bin run_experiment -p pq-l2

# Or with Claude Code / Codex, working on one target:
cd crates/pq-l2
# Open in your agent of choice and prompt:
#   Hi, have a look at program.md and let's kick off a new experiment.

# Add a new target (see docs/adding-a-target.md):
cp -r crates/pq-l2 crates/pq-cosine
# ... then rewrite the target-specific files as that workflow describes (package name in Cargo.toml, kernels.rs, reference.rs, inputs.rs, program.md, and others)

Repo layout

lance-autoresearch/
├── Cargo.toml                         # workspace root
├── README.md                          # you are here
├── HARNESS.md                         # shared loop contract every target inherits
├── LICENSE-MIT, LICENSE-APACHE        # dual-licensed (Apache compat for Lance PRs)
├── crates/
│   ├── harness-common/                # shared: SplitMix64, geomean, peak RSS, tolerance, time budget
│   │   └── src/{lib,prng,stats,sysinfo,tolerance}.rs
│   └── pq-l2/                         # landed target
│       ├── Cargo.toml
│       ├── program.md                 # this target's agent skill
│       ├── src/
│       │   ├── lib.rs                 # PqShape + module wiring (immutable)
│       │   ├── kernels.rs             # MUTABLE — agent's playground
│       │   ├── reference.rs           # IMMUTABLE — scalar reference, oracle helpers
│       │   ├── inputs.rs              # IMMUTABLE — diverse test-data generators
│       │   └── bin/run_experiment.rs  # IMMUTABLE — per-trial entry point
│       └── benches/pq_l2.rs           # criterion benchmark (immutable)
└── docs/
    ├── design.md                      # rationale for the workspace shape
    ├── adding-a-target.md             # workflow for spinning up a new target
    └── targets/
        └── pq-l2.md                   # capsule: upstream Lance pointers, oracle, status

Upstream contribution path

When a commit on any target clears the keep bar by a meaningful margin (≥10% geomean speedup with worst-case guard intact), the human reviews the diff, ports the technique against lance-format/lance HEAD, runs Lance's own test suite, and opens a PR. Because the workspace is dual MIT/Apache-2.0 licensed and each target's kernel is algorithmically modeled on Lance's existing path, the upstream PR inherits Apache-2.0 cleanly.
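The upstream bar could be checked as in the sketch below. It rests on two assumptions, since the README does not pin them down. First, "≥10% geomean speedup" is read as geomean(baseline / candidate) ≥ 1.10. Second, the worst-case guard is modeled as a per-case regression allowance supplied by the caller. `CaseTiming` and `clears_upstream_bar` are hypothetical:

```rust
/// Timing for one shape x distribution combination, in ns/op.
struct CaseTiming {
    baseline_ns: f64,
    candidate_ns: f64,
}

/// True if the geomean speedup is at least 1.10 AND no single case is slower
/// than baseline by more than `max_case_regression` (0.02 = 2% slower).
fn clears_upstream_bar(cases: &[CaseTiming], max_case_regression: f64) -> bool {
    let valid = |x: f64| x.is_finite() && x > 0.0;
    if cases.is_empty() || !cases.iter().all(|c| valid(c.baseline_ns) && valid(c.candidate_ns)) {
        return false;
    }
    // Geomean of per-case speedups (> 1.0 means the candidate is faster).
    let mean_ln = cases
        .iter()
        .map(|c| (c.baseline_ns / c.candidate_ns).ln())
        .sum::<f64>()
        / cases.len() as f64;
    let geomean_speedup = mean_ln.exp();
    // Worst-case guard: every case stays within the regression allowance.
    let worst_case_ok = cases
        .iter()
        .all(|c| c.candidate_ns <= c.baseline_ns * (1.0 + max_case_regression));
    geomean_speedup >= 1.10 && worst_case_ok
}
```

Consider a lopsided result where one case is 2.5× faster and another is 10% slower. It clears the geomean threshold but fails the guard, which is exactly the regression the guard exists to catch.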

License

Dual-licensed under either of:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.