The original lance-autoresearch was one Cargo crate optimizing one Lance
kernel (PQ L2 distance). With 9+ candidate targets enumerated in the research
note, a single-crate shape doesn't scale: per-target deps will collide, the
agent's edits to one target's kernels.rs would conflict with another's lib
path, and build/test isolation is lost. Restructure into a Cargo workspace.
Layout:
research/lance-autoresearch/
├── Cargo.toml              (workspace root)
├── README.md               (target table, contract overview, repo layout)
├── HARNESS.md              (universal loop contract every target inherits)
├── crates/
│   ├── harness-common/     (shared: SplitMix64, geomean, peak RSS,
│   │                        MAX_ABS_ERR, TOPK_DIST_TOL, TIME_BUDGET_SECS)
│   └── pq-l2/              (the landed target; was the previous single crate)
└── docs/
    ├── design.md           (rationale for workspace shape, no Target trait)
    ├── adding-a-target.md  (step-by-step workflow for new targets)
    └── targets/pq-l2.md    (per-target capsule)
Decisions documented in docs/design.md:
- Workspace, not single crate: per-target Cargo.toml so deps don't collide;
per-target src tree so agent edits don't conflict; per-target build/test
isolation for faster agent iteration.
- harness-common as a plumbing-only crate (PRNG, geomean, peak RSS, tolerance
constants, time budget). Intentionally NO Target trait - decode kernel
signatures and distance kernel signatures differ enough that a unifying
trait would either bloat or require erased boxing. Each target is its own
natural shape.
- Per-target program.md + shared HARNESS.md: the loop contract is universal,
the priors and API spec are per-target. Two files instead of one because
copy-pasting the universal loop into every program.md would drift.
pq-l2 refactor:
- src/* moved into crates/pq-l2/src/* via git mv (preserves history)
- crate renamed lance-autoresearch -> pq-l2
- SplitMix64, geomean, peak_rss_mb, MAX_ABS_ERR, TOPK_DIST_TOL,
TIME_BUDGET_SECS now imported from harness-common (drops ~70 lines of
duplication that would have been copy-pasted into every new target)
- program.md trimmed: setup/loop/hygiene moved to HARNESS.md; only the
PQ-L2-specific API contract and SIMD priors remain
- Cargo.toml depends on harness-common via path; workspace.dependencies
pins criterion uniformly across targets
The candidate targets from the research note (A1 cosine/dot/hamming, A2
IVF partition select, A3 FTS BM25, A4 bitpack decode, A5 dictionary decode,
A6 FSST decode, A7 take/gather, A8 predicate eval, A9 posting list intersect,
A10 top-K merge) are listed in README.md's target table as "candidate"; each
gets a docs/targets/<name>.md capsule when it's spun up. docs/adding-a-target.md
documents the cp -r + edit-Cargo.toml + rewrite-three-files workflow.
Verified end-to-end:
- cargo build --release: clean, both crates compile
- cargo clippy --release --workspace --all-targets -- -D warnings: clean
- cargo test --release --workspace: 6/6 pass (4 harness-common + 2 pq-l2)
- cargo run --release --bin run_experiment -p pq-l2: correctness pass,
geomean ~880k ns, exit 0, ~30s wall-clock
- omnigraph parent workspace unchanged (research/ excluded as before)
https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
lance-autoresearch
A multi-target workspace for evolving Lance hot-path kernels via LLM coding
agents (Claude Code, Codex, Cursor), in the style of Andrej Karpathy's
nanochat-research single-agent autoresearch loop.
Each target is an independent Rust crate under crates/:
| Target | Status | Lance source area | What's optimized |
|---|---|---|---|
| crates/pq-l2 | landed | lance-linalg::distance::l2, PQ probe | PQ L2 distance: build LUT, probe codes, top-K |
| crates/pq-cosine | candidate (A1) | lance-linalg::distance::cosine | PQ cosine distance |
| crates/pq-dot | candidate (A1) | lance-linalg::distance::dot | PQ dot-product distance |
| crates/ivf-partition | candidate (A2) | lance-index::vector::ivf partition select | IVF partition selection (centroid scan) |
| crates/fts-bm25 | candidate (A3) | lance-index::scalar::inverted BM25 | FTS BM25 scoring inner loop |
| crates/bitpack | candidate (A4) | lance-encoding::encodings::bitpack | Bitpack integer decode |
| crates/dictionary | candidate (A5) | lance-encoding::encodings::dictionary | Dictionary decode |
| crates/fsst | candidate (A6) | lance-encoding::encodings::fsst | FSST string decode |
| crates/take | candidate (A7) | lance-core::utils::take | Take / gather kernel |
| crates/predicate | candidate (A8) | lance-datafusion filter eval | Predicate evaluation kernels |
| crates/posting-intersect | candidate (A9) | lance-index::scalar::inverted | Posting list intersection (FTS AND) |
| crates/topk-merge | candidate (A10) | scan-merge | Top-K k-way merge |
Each candidate target gets a capsule in docs/targets/ when it is spun up, and
can be added by following docs/adding-a-target.md. The single landed target
(pq-l2) proves the harness shape; the candidates wait for an agent to spin
them up.
The contract every target follows
Karpathy's three-file shape, applied per target:
| File (per target crate) | Mutability | Edited by |
|---|---|---|
| src/kernels.rs | mutable | the agent |
| src/reference.rs, src/inputs.rs, src/lib.rs, src/bin/run_experiment.rs, benches/*.rs | immutable | nobody |
| program.md | human-iterated | the human, between runs |
| results.tsv | append-only | the agent, per trial (gitignored) |
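As an illustration of the append-only results.tsv row in the table above, here is a minimal sketch of a per-trial append. The `Trial` struct, its three columns, and the `append_trial` name are hypothetical; this README does not specify the real column set, which presumably lives in each target's run_experiment.rs.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical per-trial record. The real results.tsv columns are not
/// given in this README; these three are placeholders for illustration.
pub struct Trial<'a> {
    pub label: &'a str,
    pub geomean_ns: f64,
    pub correct: bool,
}

/// Append one tab-separated row. The file is opened in append mode, so
/// earlier rows are never rewritten or truncated.
pub fn append_trial(path: &Path, trial: &Trial) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // Tabs or newlines inside the label would corrupt the row layout.
    let label = trial.label.replace(|c: char| c == '\t' || c == '\n', " ");
    writeln!(file, "{}\t{:.1}\t{}", label, trial.geomean_ns, trial.correct)
}
```

Opening with `append(true)` rather than `write(true)` is what makes the log append-only at the file level: every trial adds a row and no earlier row is touched.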
The shared utilities — deterministic PRNG, geomean, peak-RSS readback,
tolerance constants, time-budget — live in crates/harness-common
and are consumed by every target. There is intentionally no Target trait:
decode-kernel signatures and distance-kernel signatures are different enough
that a unifying trait would either bloat or require erased boxing. Each target
is its own natural shape; the shared crate is plumbing only.
The shared loop conventions every target's program.md inherits live in
HARNESS.md. Per-target priors and API specifics live in each
target's own program.md.
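To make "plumbing only" concrete, here is a minimal sketch of two of the shared utilities named above: a deterministic PRNG and a geometric mean. The SplitMix64 constants are the standard published ones; the method names (`next_u64`, `next_f32`) and the `Option` return type of `geomean` are assumptions, not the actual harness-common API.

```rust
/// SplitMix64: a small deterministic 64-bit PRNG (standard constants).
/// The same seed always yields the same sequence, so generated inputs
/// are reproducible across trials.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1), built from the top 24 bits (assumed helper).
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// Geometric mean via the mean of logarithms. Returns None for empty
/// input or any non-positive / NaN value, where it is undefined.
pub fn geomean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() || xs.iter().any(|&x| x.is_nan() || x <= 0.0) {
        return None;
    }
    let log_sum: f64 = xs.iter().map(|x| x.ln()).sum();
    Some((log_sum / xs.len() as f64).exp())
}
```

Neither function knows anything about a kernel's signature, which is the point of keeping the shared crate free of a Target trait.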
Dataset-independent by design
Every other ANN benchmark you've seen is "compete on this fixed dataset" (SIFT1M, GIST1M, DEEP1B). That conflates two things: kernel correctness (the math) and kernel speed under one specific data distribution. An LLM agent given recall@K as the oracle has incentive to overfit to the dataset's quirks.
We split them, for every target:
- Correctness = equivalence to a scalar reference (max_abs_err ≤ 1e-4 for floats; bitwise for integer/byte kernels), on diverse generated inputs. Mathematical equivalence; no dataset to overfit. Lossy techniques fail this gate.
- Speed = geomean ns/operation across multiple shape × distribution combinations, with worst-case guard. A kernel that wins on one distribution and regresses on another is not kept.
By construction, an "improvement" generalizes across distributions and shapes.
There is no wget sift.tar.gz step; every target is fully self-contained.
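The correctness gate described above can be sketched as follows. The 1e-4 tolerance and the float/bitwise split come from this README; the squared-L2 kernel pair and all function names are hypothetical stand-ins, not the code in reference.rs or kernels.rs.

```rust
/// Float tolerance from the README's correctness gate.
pub const MAX_ABS_ERR: f32 = 1e-4;

/// Stand-in scalar reference: squared L2 distance, one element at a time.
pub fn l2_reference(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Stand-in candidate: same math with four independent accumulators,
/// the kind of rewrite an agent might try in kernels.rs.
pub fn l2_candidate(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    let chunked = n / 4 * 4;
    let mut acc = [0.0f32; 4];
    for i in (0..chunked).step_by(4) {
        for lane in 0..4 {
            let d = a[i + lane] - b[i + lane];
            acc[lane] += d * d;
        }
    }
    let mut tail = 0.0f32;
    for i in chunked..n {
        let d = a[i] - b[i];
        tail += d * d;
    }
    acc.iter().sum::<f32>() + tail
}

/// Largest absolute difference between two result vectors.
/// A NaN anywhere counts as an infinite error so it can never pass.
pub fn max_abs_err(got: &[f32], want: &[f32]) -> f32 {
    assert_eq!(got.len(), want.len(), "result lengths must match");
    let mut worst = 0.0f32;
    for (g, w) in got.iter().zip(want) {
        let err = (g - w).abs();
        if err.is_nan() {
            return f32::INFINITY;
        }
        worst = worst.max(err);
    }
    worst
}

/// Float kernels: pass if every output is within MAX_ABS_ERR.
pub fn passes_float_gate(got: &[f32], want: &[f32]) -> bool {
    max_abs_err(got, want) <= MAX_ABS_ERR
}

/// Integer/byte kernels: pass only on an exact match.
pub fn passes_bitwise_gate(got: &[u8], want: &[u8]) -> bool {
    got == want
}
```

Because the oracle is a reference implementation rather than a recall score, a lossy shortcut shows up as an error above the tolerance regardless of which input distribution it was run on.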
Why a separate repo (and a workspace, not a single crate)
OmniGraph (the graph engine that motivated this) pins Lance at a released version and consumes its kernels via the public crate API. Improvements live one layer below: in Lance itself. A standalone repo with no OmniGraph dep keeps the optimization target pure (only the kernel changes), keeps the license clean for upstream contribution (dual MIT/Apache-2.0 → Apache-2.0 PRs to Lance), and keeps each agent's working set tiny.
It is a workspace rather than a single crate because per-target deps differ
(FSST decode will want a different dependency set than PQ kernels) and the
agent's edits to one target's kernels.rs must not collide with another's lib
path. Each target is buildable, testable, and runnable in isolation:
cd crates/<target> && cargo run --release --bin run_experiment.
Quick start
```sh
# Run the landed PQ L2 target's baseline.
cargo run --release --bin run_experiment -p pq-l2

# Or with Claude Code / Codex, working on one target:
cd crates/pq-l2
# Open in your agent of choice and prompt:
#   Hi, have a look at program.md and let's kick off a new experiment.

# Add a new target (see docs/adding-a-target.md):
cp -r crates/pq-l2 crates/pq-cosine
# ... edit Cargo.toml name, kernels.rs / reference.rs / inputs.rs / program.md
```
Repo layout
```
lance-autoresearch/
├── Cargo.toml                        # workspace root
├── README.md                         # you are here
├── HARNESS.md                        # shared loop contract every target inherits
├── LICENSE-MIT, LICENSE-APACHE       # dual-licensed (Apache compat for Lance PRs)
├── crates/
│   ├── harness-common/               # shared: SplitMix64, geomean, peak RSS, tolerance, time budget
│   │   └── src/{lib,prng,stats,sysinfo,tolerance}.rs
│   └── pq-l2/                        # landed target
│       ├── Cargo.toml
│       ├── program.md                # this target's agent skill
│       ├── src/
│       │   ├── lib.rs                # PqShape + module wiring (immutable)
│       │   ├── kernels.rs            # MUTABLE: agent's playground
│       │   ├── reference.rs          # IMMUTABLE: scalar reference, oracle helpers
│       │   ├── inputs.rs             # IMMUTABLE: diverse test-data generators
│       │   └── bin/run_experiment.rs # IMMUTABLE: per-trial entry point
│       └── benches/pq_l2.rs          # criterion benchmark (immutable)
└── docs/
    ├── design.md                     # rationale for the workspace shape
    ├── adding-a-target.md            # workflow for spinning up a new target
    └── targets/
        └── pq-l2.md                  # capsule: upstream Lance pointers, oracle, status
```
Upstream contribution path
When a commit on any target clears the keep bar by a meaningful margin
(≥10% geomean speedup with worst-case guard intact), the human reviews the
diff, ports the technique against lance-format/lance HEAD, runs Lance's own
test suite, and opens a PR. Because the workspace is dual MIT/Apache-2.0
licensed and each target's kernel is algorithmically modeled on Lance's
existing path, the upstream PR inherits Apache-2.0 cleanly.
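The "≥10% geomean speedup with worst-case guard intact" bar can be read as a two-part predicate over per-case timings. The sketch below is one possible reading, not the harness's actual rule: `KeepPolicy`, `should_keep`, and in particular the choice to define the guard as a minimum per-case speedup are assumptions, since this README does not spell out how the guard is computed.

```rust
/// Hypothetical thresholds. 1.10 mirrors the README's "≥10%" margin;
/// the per-case floor is an assumed definition of the worst-case guard.
pub struct KeepPolicy {
    pub min_geomean_speedup: f64,
    pub min_case_speedup: f64,
}

/// Geometric mean of strictly positive values via the mean of logs.
fn geomean(xs: &[f64]) -> f64 {
    (xs.iter().map(|x| x.ln()).sum::<f64>() / xs.len() as f64).exp()
}

/// Decide whether a candidate clears the bar. Inputs are ns/operation
/// for the same shape × distribution cases, in the same order.
pub fn should_keep(baseline_ns: &[f64], candidate_ns: &[f64], policy: &KeepPolicy) -> bool {
    if baseline_ns.is_empty() || baseline_ns.len() != candidate_ns.len() {
        return false;
    }
    let valid = |x: &f64| x.is_finite() && *x > 0.0;
    if !baseline_ns.iter().all(valid) || !candidate_ns.iter().all(valid) {
        return false;
    }
    // Per-case speedup: above 1.0 means the candidate is faster on that case.
    let speedups: Vec<f64> = baseline_ns
        .iter()
        .zip(candidate_ns)
        .map(|(b, c)| b / c)
        .collect();
    let overall = geomean(&speedups);
    let worst = speedups.iter().cloned().fold(f64::INFINITY, f64::min);
    overall >= policy.min_geomean_speedup && worst >= policy.min_case_speedup
}
```

With `min_case_speedup` set to 1.0, a candidate that doubles throughput on two cases but slows a third is rejected even though its geomean looks strong, which matches the stated intent that a win on one distribution must not hide a regression on another.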
License
Dual-licensed under either of:
- MIT license (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
at your option.