Commit graph

3 commits

Author SHA1 Message Date
Claude
0d72cc69fb
research: restructure lance-autoresearch as multi-target workspace
The original lance-autoresearch was one Cargo crate optimizing one Lance
kernel (PQ L2 distance). With 9+ candidate targets enumerated in the research
note, a single-crate shape doesn't scale: per-target deps will collide, the
agent's edits to one target's kernels.rs would conflict with another's lib
path, and build/test isolation is lost. Restructure into a Cargo workspace.

Layout:

  research/lance-autoresearch/
  ├── Cargo.toml          (workspace root)
  ├── README.md           (target table, contract overview, repo layout)
  ├── HARNESS.md          (universal loop contract every target inherits)
  ├── crates/
  │   ├── harness-common/ (shared: SplitMix64, geomean, peak RSS,
  │   │                    MAX_ABS_ERR, TOPK_DIST_TOL, TIME_BUDGET_SECS)
  │   └── pq-l2/          (the landed target; was the previous single crate)
  └── docs/
      ├── design.md           (rationale for workspace shape, no Target trait)
      ├── adding-a-target.md  (step-by-step workflow for new targets)
      └── targets/pq-l2.md    (per-target capsule)

Decisions documented in docs/design.md:

- Workspace, not single crate: per-target Cargo.toml so deps don't collide;
  per-target src tree so agent edits don't conflict; per-target build/test
  isolation for faster agent iteration.
- harness-common as a plumbing-only crate (PRNG, geomean, peak RSS, tolerance
  constants, time budget). Intentionally NO Target trait - decode kernel
  signatures and distance kernel signatures differ enough that a unifying
  trait would either bloat or require erased boxing. Each target is its own
  natural shape.
- Per-target program.md + shared HARNESS.md: the loop contract is universal,
  the priors and API spec are per-target. Two files instead of one because
  copy-pasting the universal loop into every program.md would drift.
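
A minimal sketch of the kind of PRNG plumbing harness-common is described as
holding. The SplitMix64 mixing constants are the standard published ones; the
struct layout and the method names (`new`, `next_u64`, `next_f32`) are
assumptions for illustration, not the crate's confirmed API.

```rust
/// Hypothetical SplitMix64 generator: small, seedable, and fully deterministic,
/// which is what makes fixed-seed trials reproducible.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state by the golden-ratio increment, then mix it.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1), built from the top 24 bits so the
    /// integer-to-float conversion is exact.
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}
```

Two generators built from the same seed yield the same sequence, so any
input battery derived from one can be regenerated instead of stored.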

pq-l2 refactor:
- src/* moved into crates/pq-l2/src/* via git mv (preserves history)
- crate renamed lance-autoresearch -> pq-l2
- SplitMix64, geomean, peak_rss_mb, MAX_ABS_ERR, TOPK_DIST_TOL,
  TIME_BUDGET_SECS now imported from harness-common (drops ~70 lines of
  duplication that would have been copy-pasted into every new target)
- program.md trimmed: setup/loop/hygiene moved to HARNESS.md; only the
  PQ-L2-specific API contract and SIMD priors remain
- Cargo.toml depends on harness-common via path; workspace.dependencies
  pins criterion uniformly across targets

The 10 candidate targets from the research note (A1 cosine/dot/hamming, A2
IVF partition select, A3 FTS BM25, A4 bitpack decode, A5 dictionary decode,
A6 FSST decode, A7 take/gather, A8 predicate eval, A9 posting list intersect,
A10 top-K merge) are listed in README.md's target table as "candidate"; each
gets a docs/targets/<name>.md capsule when it's spun up. docs/adding-a-target.md
documents the cp -r + edit-Cargo.toml + rewrite-three-files workflow.

Verified end-to-end:
- cargo build --release: clean, both crates compile
- cargo clippy --release --workspace --all-targets -- -D warnings: clean
- cargo test --release --workspace: 6/6 pass (4 harness-common + 2 pq-l2)
- cargo run --release --bin run_experiment -p pq-l2: correctness pass,
  geomean ~880k ns, exit 0, ~30s wall-clock
- omnigraph parent workspace unchanged (research/ excluded as before)

https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
2026-05-15 00:15:02 +00:00
Claude
272b70bfb4
research: redesign lance-autoresearch oracle to be dataset-independent
Original harness used recall@K vs. SIFT1M as the correctness oracle, which gives
the agent incentive to overfit to one data distribution: a kernel that hits
recall@10 on SIFT-shaped clusters could regress on other distributions and
still pass the gate. This commit replaces both halves of the oracle.

Correctness phase (was: recall@K floor):
  - Tolerance-bounded (max_abs_err <= 1e-4) match against an immutable scalar
    reference kernel, on a 5-distribution input battery (Gaussian, uniform,
    sparse, large-dynamic-range, mostly-zero) crossed with all evaluated PQ
    shapes. Top-K compared with tie-tolerant equivalence (TOPK_DIST_TOL=1e-4).
    Lossy techniques (LUT u8/u16 quantization, etc.) fail this gate by
    construction.
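
The correctness gate above could be sketched roughly as follows. The two
tolerance values come from the commit message; the `f32` types, the function
names, and the exact tie rule (compare distances rank by rank, and verify
each returned id against the reference distances) are assumptions, not the
harness's confirmed logic.

```rust
pub const MAX_ABS_ERR: f32 = 1e-4;
pub const TOPK_DIST_TOL: f32 = 1e-4;

/// Largest element-wise absolute difference between candidate and reference.
/// A length mismatch or any NaN is reported as infinity so it always fails.
pub fn max_abs_err(candidate: &[f32], reference: &[f32]) -> f32 {
    if candidate.len() != reference.len() {
        return f32::INFINITY;
    }
    let mut worst = 0.0f32;
    for (c, r) in candidate.iter().zip(reference) {
        let d = (c - r).abs();
        if d.is_nan() {
            return f32::INFINITY;
        }
        worst = worst.max(d);
    }
    worst
}

/// Tie-tolerant top-K check (hypothetical rule). Rows at nearly the same
/// distance may swap ranks, but every returned id must be unique, must really
/// sit at its claimed distance, and must match the reference distance at
/// the same rank.
pub fn topk_equivalent(
    candidate: &[(usize, f32)],
    reference: &[(usize, f32)],
    ref_dists: &[f32],
) -> bool {
    if candidate.len() != reference.len() {
        return false;
    }
    for (rank, &(id, dist)) in candidate.iter().enumerate() {
        let claimed_ok = ref_dists
            .get(id)
            .is_some_and(|&d| (d - dist).abs() <= TOPK_DIST_TOL);
        let rank_ok = (dist - reference[rank].1).abs() <= TOPK_DIST_TOL;
        let unique = candidate[..rank].iter().all(|&(other, _)| other != id);
        if !(claimed_ok && rank_ok && unique) {
            return false;
        }
    }
    true
}
```

A kernel that quantizes its lookup table would typically push `max_abs_err`
past `MAX_ABS_ERR`, which is why lossy techniques cannot pass this kind of gate.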

Speed phase (was: geomean ns over one synthetic dataset):
  - Geomean ns/query measured across 3 PQ shapes x 3 data distributions:
      (128, 16, 256) - SIFT-like
      (256, 16, 256) - sub_vector_dim=16
      (768, 96, 256) - BERT-like
    crossed with clustered / uniform / sparse data. Fixed seed across trials
    for reproducibility; per-combo timings reported alongside the global
    geomean / worst / best so a kernel that wins on one combo and regresses
    on another fails the worst-case guard.
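
One way to picture the reporting described above: a sketch that reduces
per-combo timings to geomean / worst / best and applies a guard. The commit
message does not spell out the guard's rule, so `passes_guard` and its
`max_slowdown` parameter are hypothetical, as are the struct and field names.

```rust
/// Summary of per-combo timings in nanoseconds per query.
pub struct SpeedSummary {
    pub geomean_ns: f64,
    pub worst_ns: f64,
    pub best_ns: f64,
}

/// Geomean is computed as exp(mean(ln t)) to avoid overflow in the product.
pub fn summarize(per_combo_ns: &[f64]) -> SpeedSummary {
    assert!(!per_combo_ns.is_empty(), "need at least one timing");
    let log_sum: f64 = per_combo_ns.iter().map(|t| t.ln()).sum();
    SpeedSummary {
        geomean_ns: (log_sum / per_combo_ns.len() as f64).exp(),
        worst_ns: per_combo_ns.iter().copied().fold(f64::MIN, f64::max),
        best_ns: per_combo_ns.iter().copied().fold(f64::MAX, f64::min),
    }
}

/// Hypothetical guard: the candidate must improve the global geomean AND no
/// single combo may slow down by more than `max_slowdown` (e.g. 1.05 = 5%).
pub fn passes_guard(baseline_ns: &[f64], candidate_ns: &[f64], max_slowdown: f64) -> bool {
    assert_eq!(baseline_ns.len(), candidate_ns.len());
    let faster_overall =
        summarize(candidate_ns).geomean_ns < summarize(baseline_ns).geomean_ns;
    let no_regression = baseline_ns
        .iter()
        .zip(candidate_ns)
        .all(|(b, c)| c / b <= max_slowdown);
    faster_overall && no_regression
}
```

Under this rule a candidate that is much faster on one combo but 50% slower
on another is rejected even though its geomean improves.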

Kernel API (was: const-DIM scalar functions):
  - Generic over (dim, num_sub_vectors, num_centroids) via PqShape.
  - PqKernel::new(shape, codebook) lets the agent pre-process the codebook
    once (transpose, cache c.c, pack LUT, etc.) and amortize across queries.
    Build cost is excluded from per-query timing - the bench measures
    distance_table + probe_top_k only.
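
A scalar sketch of what such an API might look like. Only the names
`PqShape`, `PqKernel::new`, `distance_table`, and `probe_top_k` come from the
commit message; the field names, argument types, return types, and the
codebook layout ([sub_vector][centroid][component], row-major) are
assumptions, and u8 codes assume num_centroids <= 256.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PqShape {
    pub dim: usize,
    pub num_sub_vectors: usize,
    pub num_centroids: usize,
}

impl PqShape {
    pub fn sub_vector_dim(&self) -> usize {
        self.dim / self.num_sub_vectors
    }
}

pub struct PqKernel {
    shape: PqShape,
    codebook: Vec<f32>,
}

impl PqKernel {
    /// One-time build step. This baseline only copies the codebook; an
    /// optimized kernel could transpose or precompute here instead.
    pub fn new(shape: PqShape, codebook: &[f32]) -> Self {
        assert_eq!(shape.dim % shape.num_sub_vectors, 0);
        assert_eq!(codebook.len(), shape.num_centroids * shape.dim);
        Self { shape, codebook: codebook.to_vec() }
    }

    /// table[m * num_centroids + c] = squared L2 distance between the
    /// query's m-th sub-vector and centroid c of sub-quantizer m.
    pub fn distance_table(&self, query: &[f32]) -> Vec<f32> {
        let s = self.shape;
        let d = s.sub_vector_dim();
        assert_eq!(query.len(), s.dim);
        let mut table: Vec<f32> = Vec::with_capacity(s.num_sub_vectors * s.num_centroids);
        for m in 0..s.num_sub_vectors {
            let q = &query[m * d..(m + 1) * d];
            for c in 0..s.num_centroids {
                let start = (m * s.num_centroids + c) * d;
                let centroid = &self.codebook[start..start + d];
                let dist: f32 = q.iter().zip(centroid).map(|(a, b)| (a - b) * (a - b)).sum();
                table.push(dist);
            }
        }
        table
    }

    /// `codes` holds one u8 per sub-vector per row, row-major. Returns the
    /// k nearest (row, distance) pairs in ascending order, ties broken by row.
    pub fn probe_top_k(&self, table: &[f32], codes: &[u8], k: usize) -> Vec<(usize, f32)> {
        let s = self.shape;
        let mut scored: Vec<(usize, f32)> = codes
            .chunks_exact(s.num_sub_vectors)
            .enumerate()
            .map(|(row, code)| {
                let dist: f32 = code
                    .iter()
                    .enumerate()
                    .map(|(m, &c)| table[m * s.num_centroids + c as usize])
                    .sum();
                (row, dist)
            })
            .collect();
        scored.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
        scored.truncate(k);
        scored
    }
}
```

Keeping the build step separate from the two per-query calls is what allows
the bench to exclude construction cost from the timed region.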

Other consequences:
  - SIFT1M loader (src/fixture.rs), prepare_fixtures.sh, and the
    cache-directory plumbing are all deleted - the harness is now fully
    self-contained, no external download.
  - src/inputs.rs replaces src/fixture.rs; deterministic per-trial
    test-data + workload generation, no frozen artifacts.
  - Cargo.toml gains an empty [workspace] block so cargo doesn't walk up to
    the omnigraph parent workspace from inside research/.

Verified end-to-end:
  - cargo build --release: clean
  - cargo clippy --release --all-targets -- -D warnings: clean
  - cargo run --release --bin run_experiment: correctness pass, geomean
    1.22M ns, worst 4.82M ns ((768,96,256), sparse), best 596k ns, exit 0,
    total wall-clock ~39s
  - smoke test: kernel returning 0 distance -> correctness fail with
    diagnostic, exit 2
  - cargo test --release --lib: 2/2 unit tests pass
    (correctness_battery_is_deterministic, speed_workloads_match_shapes)

https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
2026-05-14 23:03:45 +00:00
Claude
ed376af7d8
research: lance-autoresearch — PQ L2 kernel autoresearch harness
Stand up a standalone Rust project under research/lance-autoresearch/ for
LLM-driven optimization of Lance's PQ L2 distance kernels, following Karpathy's
three-file autoresearch contract:

  - src/kernels.rs (mutable, the agent's playground): scalar baseline PQ L2
    distance + top-K matching Lance 4.x's algorithm shape (16 sub-vectors,
    256 centroids, 8-bit codes, 128-d f32).
  - src/{fixture,reference,bin/run_experiment}.rs (immutable): SIFT1M loader
    (fvecs/ivecs + frozen codebook) with deterministic synthetic fallback,
    brute-force ground truth, fixed-format result block with recall@10 floor
    + time-budget exits.
  - program.md (human-iterated): the skill the agent reads each session —
    setup, what it can / cannot edit, the metric, Lance-PQ-specific priors,
    the keep/revert loop.
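
For illustration, the recall@10 floor in the immutable runner might reduce to
something like the sketch below. The 0.50 floor and the exit codes 0 and 2
are taken from this commit message; the function names, the `RECALL_FLOOR`
constant name, and the id type are hypothetical, and the time-budget exit is
left out because its code is not stated.

```rust
/// Floor value from the commit message; the constant name is hypothetical.
pub const RECALL_FLOOR: f64 = 0.50;

/// Fraction of ground-truth top-k ids that the kernel also returned,
/// pooled over all queries. Assumes ids within one result are unique.
pub fn recall_at_k(predicted: &[Vec<u32>], truth: &[Vec<u32>], k: usize) -> f64 {
    assert_eq!(predicted.len(), truth.len());
    let mut hits = 0usize;
    let mut total = 0usize;
    for (p, t) in predicted.iter().zip(truth) {
        let truth_k = &t[..k.min(t.len())];
        hits += p.iter().take(k).filter(|&id| truth_k.contains(id)).count();
        total += truth_k.len();
    }
    if total == 0 {
        0.0
    } else {
        hits as f64 / total as f64
    }
}

/// 0 when the floor is met, 2 on a floor failure.
pub fn exit_code(recall: f64) -> i32 {
    if recall >= RECALL_FLOOR {
        0
    } else {
        2
    }
}
```

With these definitions the reported baseline recall of 0.66 maps to exit 0,
while a broken kernel that drops below 0.50 maps to exit 2.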

Smoke tests pass: baseline build is clean, recall@10 = 0.66 on the synthetic
fallback clears the 0.50 floor (exit 0), a broken kernel triggers a floor
failure (exit 2), and clippy -D warnings is clean. Excludes research/ from the
omnigraph workspace so the nested project doesn't enter omnigraph's cargo
build graph.

Dual-licensed under MIT / Apache-2.0 to keep the upstream-PR path to
lance-format/lance clean.

https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
2026-05-14 22:38:39 +00:00