omnigraph/research/lance-autoresearch
Claude ed376af7d8
research: lance-autoresearch — PQ L2 kernel autoresearch harness
Stand up a standalone Rust project under research/lance-autoresearch/ for
LLM-driven optimization of Lance's PQ L2 distance kernels, following Karpathy's
three-file autoresearch contract:

  - src/kernels.rs (mutable, the agent's playground): scalar baseline PQ L2
    distance + top-K matching Lance 4.x's algorithm shape (16 sub-vectors,
    256 centroids, 8-bit codes, 128-d f32).
  - src/{fixture,reference,bin/run_experiment}.rs (immutable): SIFT1M loader
    (fvecs/ivecs + frozen codebook) with deterministic synthetic fallback,
    brute-force ground truth, fixed-format result block with recall@10 floor
    + time-budget exits.
  - program.md (human-iterated): the skill the agent reads each session —
    setup, what it can / cannot edit, the metric, Lance-PQ-specific priors,
    the keep/revert loop.

Smoke tests pass: baseline build clean, recall@10 = 0.66 on synthetic above
the 0.50 floor (exit 0), broken kernel triggers floor failure (exit 2),
clippy -D warnings clean. Excludes research/ from omnigraph workspace so
the nested project doesn't enter omnigraph's cargo build graph.

Licensed dual MIT / Apache-2.0 to keep the upstream-PR path to lance-format/lance
clean.

https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
2026-05-14 22:38:39 +00:00
benches
scripts
src
.gitignore
Cargo.toml
LICENSE-APACHE
LICENSE-MIT
program.md
README.md
rust-toolchain.toml

lance-autoresearch

An autoresearch-style harness for evolving Lance PQ L2 distance kernels via LLM coding agents (Claude Code, Codex, Cursor).

Modeled on Andrej Karpathy's nanochat-research three-file contract:

  • Immutable bench: src/bin/run_experiment.rs + src/fixture.rs + src/reference.rs. The agent cannot touch these.
  • Mutable kernel: src/kernels.rs. The agent's playground. Starts as a clean scalar PQ L2 implementation matching Lance's algorithm; the agent's job is to beat it.
  • Human-iterated program: program.md. The "skill" the agent reads at the start of every session. The human refines it between runs.

The optimization target is the PQ L2 distance kernel for f32 dense vectors on SIFT1M-shaped data (128-d, 16 sub-vectors × 256 centroids, 8-bit codes, top-10 retrieval). The eval oracle is recall@10 against SIFT1M's published ground truth at fixed kernel shape, with geomean_ns_per_query as the speed metric.
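To make the kernel shape concrete, here is a minimal scalar sketch of asymmetric PQ L2 distance with a top-k scan, using the shape stated above (128-d, 16 sub-vectors, 256 centroids, 8-bit codes). It is an illustration only, not the contents of src/kernels.rs and not Lance's code: the function names, the flattened codebook layout, and the full-sort top-k are all assumptions.

```rust
// Shape constants taken from the README; every name below is hypothetical.
const DIM: usize = 128;
const NUM_SUB: usize = 16;
const NUM_CENTROIDS: usize = 256;
const SUB_DIM: usize = DIM / NUM_SUB; // 8 floats per sub-vector

/// Build the per-query lookup table: table[m * 256 + c] is the squared L2
/// distance between the query's m-th sub-vector and centroid c of sub-space m.
/// Assumed layout: `codebook` is [sub-space][centroid][SUB_DIM], flattened.
fn build_table(query: &[f32], codebook: &[f32]) -> Vec<f32> {
    assert_eq!(query.len(), DIM);
    assert_eq!(codebook.len(), NUM_SUB * NUM_CENTROIDS * SUB_DIM);
    let mut table = vec![0.0f32; NUM_SUB * NUM_CENTROIDS];
    for m in 0..NUM_SUB {
        let q = &query[m * SUB_DIM..(m + 1) * SUB_DIM];
        for c in 0..NUM_CENTROIDS {
            let off = (m * NUM_CENTROIDS + c) * SUB_DIM;
            let cen = &codebook[off..off + SUB_DIM];
            table[m * NUM_CENTROIDS + c] =
                q.iter().zip(cen).map(|(a, b)| (a - b) * (a - b)).sum();
        }
    }
    table
}

/// Approximate squared L2 distance to one encoded vector (16 one-byte codes):
/// one table lookup per sub-space, summed.
fn pq_distance(table: &[f32], codes: &[u8]) -> f32 {
    codes
        .iter()
        .enumerate()
        .map(|(m, &c)| table[m * NUM_CENTROIDS + c as usize])
        .sum()
}

/// Scan all encoded vectors and return the ids of the k nearest, closest first.
/// Ties are broken by id so the result is deterministic.
fn top_k(table: &[f32], all_codes: &[u8], k: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = all_codes
        .chunks_exact(NUM_SUB)
        .enumerate()
        .map(|(id, codes)| (pq_distance(table, codes), id))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}
```

The table is built once per query (16 × 256 entries), so the inner scan costs 16 lookups and adds per base vector. That scan is the part an optimized kernel would most plausibly target, for example by replacing the full sort with a bounded heap.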

Why a separate repo

OmniGraph (the graph engine that motivated this) pins Lance at a released version and consumes its kernels via the public crate API. Improvements live one layer below: in Lance itself. A standalone repo with no OmniGraph dep keeps the optimization target pure (only the kernel changes), keeps the license clean for upstream contribution (dual MIT/Apache-2.0 → Apache-2.0 PRs to Lance), and keeps the agent's working set tiny (~600 lines).

Quick start

# 1. (optional but recommended) Download SIFT1M + train + freeze the PQ codebook.
#    Takes ~5-10 min; ~250 MB on disk. Skipping it falls back to a synthetic
#    deterministic dataset (1024 base / 64 queries) — useful for smoke-testing
#    the harness but not representative of real workloads.
bash scripts/prepare_fixtures.sh

# 2. Run the baseline.
cargo run --release --bin run_experiment

# 3. Or run with Claude Code / Codex:
#    Open the repo in your agent of choice and prompt:
#       Hi, have a look at program.md and let's kick off a new experiment.
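The commit description says the fixture loader reads SIFT1M's fvecs/ivecs files. For readers unfamiliar with that format: each record is a little-endian 32-bit dimension followed by that many little-endian 32-bit values (f32 for fvecs). The sketch below parses an in-memory fvecs buffer. `parse_fvecs` is a hypothetical name and this is not the code in src/fixture.rs.

```rust
use std::convert::TryInto;

/// Parse an in-memory .fvecs buffer into rows of f32.
/// Each record is: i32 dimension (little-endian), then `dim` f32 values.
fn parse_fvecs(bytes: &[u8]) -> Result<Vec<Vec<f32>>, String> {
    let mut rows = Vec::new();
    let mut pos = 0usize;
    while pos < bytes.len() {
        let header = bytes
            .get(pos..pos + 4)
            .ok_or("truncated dimension header")?;
        let dim = i32::from_le_bytes(header.try_into().unwrap());
        if dim <= 0 {
            return Err(format!("invalid dimension {dim}"));
        }
        let dim = dim as usize;
        pos += 4;
        let body = bytes
            .get(pos..pos + dim * 4)
            .ok_or("truncated vector body")?;
        rows.push(
            body.chunks_exact(4)
                .map(|c| f32::from_le_bytes(c.try_into().unwrap()))
                .collect(),
        );
        pos += dim * 4;
    }
    Ok(rows)
}
```

For a real file, the buffer could come from `std::fs::read(path)`. An ivecs reader would have the same structure with `i32::from_le_bytes` for the values.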

File ownership

| File | Mutability | Edited by |
| --- | --- | --- |
| src/kernels.rs | mutable | the agent |
| src/bin/run_experiment.rs | immutable | |
| src/reference.rs | immutable | |
| src/fixture.rs | immutable | |
| benches/pq_l2.rs | immutable | |
| scripts/prepare_fixtures.sh | immutable | |
| program.md | human-iterated | the human, between runs |
| results.tsv | append-only | the agent, per trial (gitignored) |

The metric

run_experiment prints a fixed-format block:

---
source:               sift1m
num_base:             1000000
num_queries:          1000
recall_at_10:         0.9421
geomean_ns_per_query: 184273
peak_mem_mb:          42.1
total_seconds:        21.7
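For reference, the two headline numbers can be computed as sketched below. The README does not spell out the exact formulas, so this uses the common definitions as an assumption: recall@k is the fraction of true top-k ids that were returned, pooled over queries, and the geometric mean is taken over per-query latencies. The function names are hypothetical, and the returned id lists are assumed to contain no duplicates.

```rust
use std::collections::HashSet;

/// Fraction of the true top-k ids that the kernel also returned, pooled over
/// all queries. `found` and `truth` hold one id list per query.
fn recall_at_k(found: &[Vec<u32>], truth: &[Vec<u32>], k: usize) -> f64 {
    assert_eq!(found.len(), truth.len());
    let mut hits = 0usize;
    let mut total = 0usize;
    for (f, t) in found.iter().zip(truth) {
        let want: HashSet<u32> = t.iter().take(k).copied().collect();
        hits += f.iter().take(k).filter(|id| want.contains(*id)).count();
        total += want.len();
    }
    hits as f64 / total as f64
}

/// Geometric mean of per-query latencies: exp(mean(ln(t_i))).
/// Less sensitive to a few slow outlier queries than the arithmetic mean.
fn geomean_ns(samples_ns: &[u64]) -> f64 {
    let log_sum: f64 = samples_ns.iter().map(|&t| (t as f64).ln()).sum();
    (log_sum / samples_ns.len() as f64).exp()
}
```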

A kernel is "kept" iff:

  • recall_at_10 is within 0.005 of the seeded scalar baseline (and ≥ 0.50 hard floor)
  • geomean_ns_per_query is strictly better than the previous best-kept kernel
  • total_seconds ≤ 600
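The three conditions above can be written as a single predicate. This is a sketch, not the harness's code: the `Trial` struct and `is_kept` are hypothetical names, and "within 0.005" is read here as an absolute difference from the baseline recall, which is an assumption (program.md holds the authoritative spec).

```rust
/// Fields from the result block that the keep rule needs (hypothetical type).
struct Trial {
    recall_at_10: f64,
    geomean_ns_per_query: f64,
    total_seconds: f64,
}

const RECALL_BAND: f64 = 0.005; // allowed distance from the baseline recall
const RECALL_FLOOR: f64 = 0.50; // hard floor
const TIME_BUDGET_S: f64 = 600.0; // wall-clock budget in seconds

/// True when a trial satisfies all three keep conditions.
/// `baseline_recall` is the seeded scalar baseline's recall@10;
/// `best_ns` is the geomean of the previous best-kept kernel.
fn is_kept(trial: &Trial, baseline_recall: f64, best_ns: f64) -> bool {
    // Assumption: "within 0.005" means |recall - baseline| <= 0.005.
    let recall_ok = (trial.recall_at_10 - baseline_recall).abs() <= RECALL_BAND
        && trial.recall_at_10 >= RECALL_FLOOR;
    // Strictly better: an equal time does not count as an improvement.
    let faster = trial.geomean_ns_per_query < best_ns;
    let in_budget = trial.total_seconds <= TIME_BUDGET_S;
    recall_ok && faster && in_budget
}
```

With the sample block above as the previous best (recall 0.9421, 184273 ns), a trial at recall 0.9400 and 180000 ns would be kept, while one at recall 0.9300 would be rejected no matter how fast it ran.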

See program.md for the full loop spec.

Upstream contribution path

When a commit clears the keep bar by a meaningful margin (≥10% speedup with recall in-band), the human reviews the diff, ports the technique against lance-format/lance HEAD, runs Lance's own test suite, and opens a PR. Because src/kernels.rs is dual MIT/Apache-2.0 licensed and algorithmically modeled on Lance's existing path, the upstream PR inherits Apache-2.0 cleanly.

License

Dual-licensed under either of:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.