mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-21 02:28:07 +02:00

Claude 0d72cc69fb research: restructure lance-autoresearch as multi-target workspace

The original lance-autoresearch was one Cargo crate optimizing one Lance kernel (PQ L2 distance). With 9+ candidate targets enumerated in the research note, a single-crate shape doesn't scale: per-target deps will collide, the agent's edits to one target's kernels.rs would conflict with another's lib path, and build/test isolation is lost. Restructure into a Cargo workspace.

Layout:

research/lance-autoresearch/
├── Cargo.toml (workspace root)
├── README.md (target table, contract overview, repo layout)
├── HARNESS.md (universal loop contract every target inherits)
├── crates/
│   ├── harness-common/ (shared: SplitMix64, geomean, peak RSS,
│   │                    MAX_ABS_ERR, TOPK_DIST_TOL, TIME_BUDGET_SECS)
│   └── pq-l2/ (the landed target; was the previous single crate)
└── docs/
    ├── design.md (rationale for workspace shape, no Target trait)
    ├── adding-a-target.md (step-by-step workflow for new targets)
    └── targets/pq-l2.md (per-target capsule)

Decisions documented in docs/design.md:
- Workspace, not single crate: per-target Cargo.toml so deps don't collide; per-target src tree so agent edits don't conflict; per-target build/test isolation for faster agent iteration.
- harness-common as a plumbing-only crate (PRNG, geomean, peak RSS, tolerance constants, time budget). Intentionally NO Target trait - decode kernel signatures and distance kernel signatures differ enough that a unifying trait would either bloat or require erased boxing. Each target is its own natural shape.
- Per-target program.md + shared HARNESS.md: the loop contract is universal, the priors and API spec are per-target. Two files instead of one because copy-pasting the universal loop into every program.md would drift.

pq-l2 refactor:
- src/* moved into crates/pq-l2/src/* via git mv (preserves history)
- crate renamed lance-autoresearch -> pq-l2
- SplitMix64, geomean, peak_rss_mb, MAX_ABS_ERR, TOPK_DIST_TOL, TIME_BUDGET_SECS now imported from harness-common (drops ~70 lines of duplication that would have been copy-pasted into every new target)
- program.md trimmed: setup/loop/hygiene moved to HARNESS.md; only the PQ-L2-specific API contract and SIMD priors remain
- Cargo.toml depends on harness-common via path; workspace.dependencies pins criterion uniformly across targets

The 9 candidate targets from the research note (A1 cosine/dot/hamming, A2 IVF partition select, A3 FTS BM25, A4 bitpack decode, A5 dictionary decode, A6 FSST decode, A7 take/gather, A8 predicate eval, A9 posting list intersect, A10 top-K merge) are listed in README.md's target table as "candidate"; each gets a docs/targets/<name>.md capsule when it's spun up. docs/adding-a-target.md documents the cp -r + edit-Cargo.toml + rewrite-three-files workflow.

Verified end-to-end:
- cargo build --release: clean, both crates compile
- cargo clippy --release --workspace --all-targets -- -D warnings: clean
- cargo test --release --workspace: 6/6 pass (4 harness-common + 2 pq-l2)
- cargo run --release --bin run_experiment -p pq-l2: correctness pass, geomean ~880k ns, exit 0, ~30s wall-clock
- omnigraph parent workspace unchanged (research/ excluded as before)

https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5

2026-05-15 00:15:02 +00:00
crates	research: restructure lance-autoresearch as multi-target workspace	2026-05-15 00:15:02 +00:00
docs	research: restructure lance-autoresearch as multi-target workspace	2026-05-15 00:15:02 +00:00
.gitignore	research: lance-autoresearch — PQ L2 kernel autoresearch harness	2026-05-14 22:38:39 +00:00
Cargo.toml	research: restructure lance-autoresearch as multi-target workspace	2026-05-15 00:15:02 +00:00
HARNESS.md	research: restructure lance-autoresearch as multi-target workspace	2026-05-15 00:15:02 +00:00
LICENSE-APACHE	research: lance-autoresearch — PQ L2 kernel autoresearch harness	2026-05-14 22:38:39 +00:00
LICENSE-MIT	research: lance-autoresearch — PQ L2 kernel autoresearch harness	2026-05-14 22:38:39 +00:00
README.md	research: restructure lance-autoresearch as multi-target workspace	2026-05-15 00:15:02 +00:00
rust-toolchain.toml	research: lance-autoresearch — PQ L2 kernel autoresearch harness	2026-05-14 22:38:39 +00:00

README.md

lance-autoresearch

A multi-target workspace for evolving Lance hot-path kernels via LLM coding agents (Claude Code, Codex, Cursor), in the style of Andrej Karpathy's nanochat-research single-agent autoresearch loop.

Each target is an independent Rust crate under crates/:

Target	Status	Lance source area	What's optimized
`crates/pq-l2`	landed	`lance-linalg::distance::l2`, PQ probe	PQ L2 distance: build LUT, probe codes, top-K
`crates/pq-cosine`	candidate (A1)	`lance-linalg::distance::cosine`	PQ cosine distance
`crates/pq-dot`	candidate (A1)	`lance-linalg::distance::dot`	PQ dot-product distance
`crates/ivf-partition`	candidate (A2)	`lance-index::vector::ivf` partition select	IVF partition selection (centroid scan)
`crates/fts-bm25`	candidate (A3)	`lance-index::scalar::inverted` BM25	FTS BM25 scoring inner loop
`crates/bitpack`	candidate (A4)	`lance-encoding::encodings::bitpack`	Bitpack integer decode
`crates/dictionary`	candidate (A5)	`lance-encoding::encodings::dictionary`	Dictionary decode
`crates/fsst`	candidate (A6)	`lance-encoding::encodings::fsst`	FSST string decode
`crates/take`	candidate (A7)	`lance-core::utils::take`	Take / gather kernel
`crates/predicate`	candidate (A8)	`lance-datafusion` filter eval	Predicate evaluation kernels
`crates/posting-intersect`	candidate (A9)	`lance-index::scalar::inverted`	Posting list intersection (FTS AND)
`crates/topk-merge`	candidate (A10)	scan-merge	Top-K k-way merge

The candidate targets are listed in the table above and are not yet implemented; each gets a docs/targets/<name>.md capsule when it is spun up via the workflow in docs/adding-a-target.md. The single landed target (pq-l2) proves the harness shape.
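
For readers unfamiliar with the landed target, the "build LUT, probe codes" path named in the table can be sketched in scalar form. This is only an illustration of the technique, not the contents of crates/pq-l2: the `Shape` struct, the function names, and the row-major codebook and code layouts are assumptions made for this example, and the real `PqShape` and reference functions may differ.

```rust
/// Hypothetical shape descriptor (the real `PqShape` in src/lib.rs may differ).
#[derive(Clone, Copy)]
pub struct Shape {
    pub num_sub: usize,   // sub-vectors per vector
    pub sub_dim: usize,   // dimensions per sub-vector
    pub num_codes: usize, // centroids per sub-space (at most 256 for u8 codes)
}

/// Build the lookup table: lut[m * num_codes + c] is the squared L2 distance
/// between query sub-vector m and centroid c of sub-space m.
/// Assumed codebook layout: [num_sub][num_codes][sub_dim], row-major.
pub fn build_lut(shape: Shape, query: &[f32], codebook: &[f32]) -> Vec<f32> {
    assert_eq!(query.len(), shape.num_sub * shape.sub_dim);
    assert_eq!(codebook.len(), shape.num_sub * shape.num_codes * shape.sub_dim);
    let mut lut = vec![0.0f32; shape.num_sub * shape.num_codes];
    for m in 0..shape.num_sub {
        let q = &query[m * shape.sub_dim..(m + 1) * shape.sub_dim];
        for c in 0..shape.num_codes {
            let base = (m * shape.num_codes + c) * shape.sub_dim;
            let centroid = &codebook[base..base + shape.sub_dim];
            lut[m * shape.num_codes + c] = q
                .iter()
                .zip(centroid)
                .map(|(a, b)| (a - b) * (a - b))
                .sum();
        }
    }
    lut
}

/// Probe: the approximate distance of each encoded vector is the sum of the
/// LUT entries selected by its codes.
/// Assumed code layout: one row of `num_sub` u8 codes per vector.
pub fn probe(shape: Shape, lut: &[f32], codes: &[u8]) -> Vec<f32> {
    assert!(shape.num_codes <= 256);
    codes
        .chunks_exact(shape.num_sub)
        .map(|row| {
            row.iter()
                .enumerate()
                .map(|(m, &c)| lut[m * shape.num_codes + c as usize])
                .sum()
        })
        .collect()
}
```

A scalar version like this is the kind of code that would serve as the oracle; an agent-edited kernel would be free to restructure the loops (for example with SIMD) as long as its output stays within tolerance of the scalar result.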

The contract every target follows

Karpathy's three-file shape, applied per target, plus an append-only results log:

File (per target crate)	Mutability	Edited by
`src/kernels.rs`	mutable	the agent
`src/reference.rs`, `src/inputs.rs`, `src/lib.rs`, `src/bin/run_experiment.rs`, `benches/*.rs`	immutable	—
`program.md`	human-iterated	the human, between runs
`results.tsv`	append-only	the agent, per trial (gitignored)

The shared utilities — deterministic PRNG, geomean, peak-RSS readback, tolerance constants, time-budget — live in crates/harness-common and are consumed by every target. There is intentionally no Target trait: decode-kernel signatures and distance-kernel signatures are different enough that a unifying trait would either bloat or require erased boxing. Each target is its own natural shape; the shared crate is plumbing only.
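
As a rough picture of what "deterministic PRNG" means here, the sketch below implements the standard SplitMix64 algorithm and uses it to generate reproducible float inputs. Only the algorithm name comes from this repo; the `next_f32` helper and the `uniform_vec` generator are hypothetical names chosen for the example and are not claimed to match the API of crates/harness-common.

```rust
/// SplitMix64: a small, fast, deterministic 64-bit generator.
/// The constants are those of the standard published algorithm.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Hypothetical helper: uniform f32 in [0, 1) built from the top 24 bits,
    /// which an f32 mantissa can represent exactly.
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// Hypothetical input generator: the same seed always yields the same data,
/// so every trial is measured against identical inputs.
pub fn uniform_vec(seed: u64, len: usize) -> Vec<f32> {
    let mut rng = SplitMix64::new(seed);
    (0..len).map(|_| rng.next_f32()).collect()
}
```

Seeding by a fixed integer rather than by system entropy is what lets two runs on different machines compare a candidate kernel against the same generated inputs.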

The shared loop conventions every target's program.md inherits live in HARNESS.md. Per-target priors and API specifics live in each target's own program.md.

Dataset-independent by design

Most ANN benchmarks take the form "compete on this fixed dataset" (SIFT1M, GIST1M, DEEP1B). That conflates two things: kernel correctness (the math) and kernel speed under one specific data distribution. An LLM agent given recall@K as the oracle has an incentive to overfit to the dataset's quirks.

We split them, for every target:

Correctness = agreement with a scalar reference on diverse generated inputs: within max_abs_err ≤ 1e-4 for float kernels, bitwise-identical for integer/byte kernels. This is mathematical equivalence, so there is no dataset to overfit. Lossy techniques fail this gate.
Speed = geomean ns/operation across multiple shape × distribution combinations, with a worst-case guard. A kernel that wins on one distribution and regresses on another is not kept.

By construction, an "improvement" generalizes across distributions and shapes. There is no wget sift.tar.gz step; every target is fully self-contained.
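
The two gates can be sketched as follows. The 1e-4 float tolerance comes from the text above; everything else is an assumption made for illustration: the function names, the treatment of NaN as a failure, and in particular the `max_regression` parameter, since this README does not state the exact threshold used by the worst-case guard.

```rust
/// Float tolerance stated in the correctness gate above.
pub const MAX_ABS_ERR: f32 = 1e-4;

/// Largest absolute element-wise difference between candidate and reference.
/// A NaN difference is mapped to infinity so that it fails the gate
/// (an assumption of this sketch).
pub fn max_abs_err(candidate: &[f32], reference: &[f32]) -> f32 {
    assert_eq!(candidate.len(), reference.len());
    candidate
        .iter()
        .zip(reference)
        .map(|(c, r)| {
            let d = (c - r).abs();
            if d.is_nan() { f32::INFINITY } else { d }
        })
        .fold(0.0, f32::max)
}

pub fn passes_correctness(candidate: &[f32], reference: &[f32]) -> bool {
    max_abs_err(candidate, reference) <= MAX_ABS_ERR
}

/// Geometric mean of strictly positive timings, computed in log space.
pub fn geomean(xs: &[f64]) -> f64 {
    assert!(!xs.is_empty() && xs.iter().all(|&x| x > 0.0));
    (xs.iter().map(|x| x.ln()).sum::<f64>() / xs.len() as f64).exp()
}

/// Hypothetical keep rule: the geomean must improve, and no single
/// shape × distribution case may slow down by more than `max_regression`
/// (for example 1.05 for a 5% allowance; the real guard may differ).
pub fn should_keep(baseline_ns: &[f64], candidate_ns: &[f64], max_regression: f64) -> bool {
    assert_eq!(baseline_ns.len(), candidate_ns.len());
    let faster = geomean(candidate_ns) < geomean(baseline_ns);
    let worst_ratio = baseline_ns
        .iter()
        .zip(candidate_ns)
        .map(|(b, c)| c / b)
        .fold(0.0, f64::max);
    faster && worst_ratio <= max_regression
}
```

With this rule, a candidate that halves the time on one case but is 50% slower on another is rejected even though its geomean improves, which is the behaviour the speed gate describes.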

Why a separate repo (and a workspace, not a single crate)

OmniGraph (the graph engine that motivated this) pins Lance at a released version and consumes its kernels via the public crate API. Improvements live one layer below: in Lance itself. A standalone repo with no OmniGraph dep keeps the optimization target pure (only the kernel changes), keeps the license clean for upstream contribution (dual MIT/Apache-2.0 → Apache-2.0 PRs to Lance), and keeps each agent's working set tiny.

It is a workspace rather than a single crate because per-target deps differ (FSST decode will want a different dependency set than PQ kernels) and the agent's edits to one target's kernels.rs must not collide with another's lib path. Each target is buildable, testable, and runnable in isolation: cd crates/<target> && cargo run --release --bin run_experiment.

Quick start

# Run the landed PQ L2 target's baseline.
cargo run --release --bin run_experiment -p pq-l2

# Or with Claude Code / Codex, working on one target:
cd crates/pq-l2
# Open in your agent of choice and prompt:
#   Hi, have a look at program.md and let's kick off a new experiment.

# Add a new target (see docs/adding-a-target.md):
cp -r crates/pq-l2 crates/pq-cosine
# ... edit Cargo.toml name, kernels.rs / reference.rs / inputs.rs / program.md

Repo layout

lance-autoresearch/
├── Cargo.toml                         # workspace root
├── README.md                          # you are here
├── HARNESS.md                         # shared loop contract every target inherits
├── LICENSE-MIT, LICENSE-APACHE        # dual-licensed (Apache compat for Lance PRs)
├── crates/
│   ├── harness-common/                # shared: SplitMix64, geomean, peak RSS, tolerance, time budget
│   │   └── src/{lib,prng,stats,sysinfo,tolerance}.rs
│   └── pq-l2/                         # landed target
│       ├── Cargo.toml
│       ├── program.md                 # this target's agent skill
│       ├── src/
│       │   ├── lib.rs                 # PqShape + module wiring (immutable)
│       │   ├── kernels.rs             # MUTABLE — agent's playground
│       │   ├── reference.rs           # IMMUTABLE — scalar reference, oracle helpers
│       │   ├── inputs.rs              # IMMUTABLE — diverse test-data generators
│       │   └── bin/run_experiment.rs  # IMMUTABLE — per-trial entry point
│       └── benches/pq_l2.rs           # criterion benchmark (immutable)
└── docs/
    ├── design.md                      # rationale for the workspace shape
    ├── adding-a-target.md             # workflow for spinning up a new target
    └── targets/
        └── pq-l2.md                   # capsule: upstream Lance pointers, oracle, status

Upstream contribution path

When a commit on any target clears the keep bar by a meaningful margin (≥10% geomean speedup with worst-case guard intact), the human reviews the diff, ports the technique against lance-format/lance HEAD, runs Lance's own test suite, and opens a PR. Because the workspace is dual MIT/Apache-2.0 licensed and each target's kernel is algorithmically modeled on Lance's existing path, the upstream PR inherits Apache-2.0 cleanly.

License

Dual-licensed under either of:

MIT license (LICENSE-MIT)
Apache License, Version 2.0 (LICENSE-APACHE)

at your option.
