Mirror of https://github.com/ModernRelay/omnigraph.git, synced 2026-06-21 02:28:07 +02:00
The original harness used recall@K against SIFT1M as the correctness oracle,
which gave the agent an incentive to overfit to one data distribution: a kernel
that hit the recall@10 floor on SIFT-shaped clusters could regress on other
distributions and still pass the gate. This commit replaces both halves of the
oracle.
Correctness phase (was: recall@K floor):
- Near-exact match (max_abs_err <= 1e-4) against an immutable scalar
  reference kernel. The check runs on a 5-distribution input battery
  (Gaussian, uniform, sparse, large-dynamic-range, mostly-zero) crossed with
  all evaluated PQ shapes. Top-K results are compared with tie-tolerant
  equivalence (TOPK_DIST_TOL=1e-4). Lossy techniques (u8/u16 LUT
  quantization, etc.) fail this gate by construction.
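The correctness gate above reduces to two checks, sketched here under stated assumptions. This is not the harness's `reference::topk_consistent`. The names `within_tol` and `topk_equivalent` are hypothetical. "Tie-tolerant" is taken to mean three things: distances agree rank by rank within the tolerance, every returned id is distinct, and every returned id really has the distance the kernel reports. The comparisons are written as `<= tol` so that a NaN from the kernel fails the check instead of slipping through.

```rust
use std::collections::HashSet;

/// Hypothetical: true if every candidate distance is within `tol` of the
/// reference. Because NaN compares false, any NaN makes this return false.
fn within_tol(candidate: &[f32], reference: &[f32], tol: f32) -> bool {
    candidate.len() == reference.len()
        && candidate
            .iter()
            .zip(reference)
            .all(|(a, b)| (a - b).abs() <= tol)
}

/// Hypothetical tie-tolerant top-K check. `cand` and `refr` are (id, distance)
/// lists sorted ascending by distance; `all_dists[id]` is the reference
/// distance of every database row.
fn topk_equivalent(
    cand: &[(usize, f32)],
    refr: &[(usize, f32)],
    all_dists: &[f32],
    tol: f32,
) -> bool {
    if cand.len() != refr.len() {
        return false;
    }
    // Rank by rank, the distances must agree; which of several tied ids
    // fills a slot does not matter.
    let dists_agree = cand
        .iter()
        .zip(refr)
        .all(|(&(_, dc), &(_, dr))| (dc - dr).abs() <= tol);
    // Every returned id must be distinct and carry its true distance, so a
    // kernel cannot pad the list with duplicates or mislabeled rows.
    let mut seen = HashSet::new();
    let ids_honest = cand.iter().all(|&(id, d)| {
        seen.insert(id) && all_dists.get(id).map_or(false, |&r| (d - r).abs() <= tol)
    });
    dists_agree && ids_honest
}
```

With this definition, a kernel may return row 2 instead of row 1 when both sit at the same distance. It may not return a row whose true distance differs from what it claims.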
Speed phase (was: geomean ns over one synthetic dataset):
- Geomean ns/query is measured across 3 PQ shapes
  (dim, num_sub_vectors, num_centroids):
    (128, 16, 256) - SIFT-like
    (256, 16, 256) - sub_vector_dim=16
    (768, 96, 256) - BERT-like
  crossed with 3 data distributions (clustered / uniform / sparse). The seed
  is fixed across trials for reproducibility. Per-combo timings are reported
  alongside the global geomean / worst / best, so a kernel that wins on one
  combo but regresses on another fails the worst-case guard.
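Here is a sketch of the aggregation, assuming timings arrive as one ns/query value per (shape, distribution) combo. `Summary`, `summarize`, and `passes_worst_case_guard` are hypothetical names. The guard's exact rule is also an assumption: here, no combo may slow down by more than a fixed ratio against a baseline. The commit only states that a per-combo regression fails the worst-case guard.

```rust
/// Aggregate of per-combo ns/query timings (hypothetical names).
#[derive(Debug)]
struct Summary {
    geomean_ns: f64,
    worst_ns: f64,
    best_ns: f64,
}

/// Geometric mean computed as exp(mean(ln x)), avoiding overflow of the raw
/// product. Returns None for an empty input or any non-positive timing.
fn summarize(ns_per_combo: &[f64]) -> Option<Summary> {
    if ns_per_combo.is_empty() || ns_per_combo.iter().any(|&x| !(x > 0.0)) {
        return None;
    }
    let n = ns_per_combo.len() as f64;
    let mean_ln = ns_per_combo.iter().map(|x| x.ln()).sum::<f64>() / n;
    Some(Summary {
        geomean_ns: mean_ln.exp(),
        worst_ns: ns_per_combo.iter().copied().fold(f64::MIN, f64::max),
        best_ns: ns_per_combo.iter().copied().fold(f64::MAX, f64::min),
    })
}

/// Hypothetical worst-case guard: reject a candidate that is slower than the
/// baseline by more than `max_ratio` on ANY combo, even if its geomean improved.
fn passes_worst_case_guard(baseline: &[f64], candidate: &[f64], max_ratio: f64) -> bool {
    baseline.len() == candidate.len()
        && baseline.iter().zip(candidate).all(|(b, c)| c / b <= max_ratio)
}
```

The geometric mean weights relative changes equally: a 2x speedup counts the same on a 600k ns combo as on a 4.8M ns one. That is also why a separate per-combo guard is needed. A large win on one combo can mask a regression on another in the geomean alone.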
Kernel API (was: const-DIM scalar functions):
- Generic over (dim, num_sub_vectors, num_centroids) via PqShape.
- PqKernel::new(shape, codebook) lets the agent pre-process the codebook
  once (transpose it, cache c.c centroid norms, pack the LUT, etc.) and
  amortize that work across queries. Build cost is excluded from per-query
  timing: the bench measures distance_table + probe_top_k only.
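A scalar kernel can illustrate the build/query split. This is a sketch, not Lance's or the harness's implementation. Several details are assumptions:
- the struct name `ScalarPqKernel` and its method signatures;
- a codebook laid out as `[sub_vector][centroid][sub_vector_dim]`;
- squared-L2 distance;
- one `u8` code per sub-vector per row.

`new` is where a faster kernel would do its one-time work. `distance_table` and `probe_top_k` are the only parts a per-query timer would see.

```rust
/// Minimal copy of the harness's shape type so this sketch compiles alone.
#[derive(Clone, Copy, Debug)]
struct PqShape {
    dim: usize,
    num_sub_vectors: usize,
    num_centroids: usize,
}

impl PqShape {
    fn sub_vector_dim(&self) -> usize {
        self.dim / self.num_sub_vectors
    }
}

/// Hypothetical scalar kernel; assumed codebook layout is
/// [sub_vector][centroid][sub_vector_dim], row-major.
struct ScalarPqKernel {
    shape: PqShape,
    codebook: Vec<f32>,
}

impl ScalarPqKernel {
    /// One-time build step (excluded from per-query timing). A faster kernel
    /// might transpose the codebook or precompute centroid norms here.
    fn new(shape: PqShape, codebook: &[f32]) -> Self {
        let expected = shape.num_sub_vectors * shape.num_centroids * shape.sub_vector_dim();
        assert_eq!(codebook.len(), expected, "codebook length does not match shape");
        Self { shape, codebook: codebook.to_vec() }
    }

    /// table[m * num_centroids + c] = squared L2 distance between query
    /// sub-vector m and centroid c of sub-quantizer m.
    fn distance_table(&self, query: &[f32]) -> Vec<f32> {
        let s = self.shape;
        let d = s.sub_vector_dim();
        let mut table: Vec<f32> = Vec::with_capacity(s.num_sub_vectors * s.num_centroids);
        for m in 0..s.num_sub_vectors {
            let q = &query[m * d..(m + 1) * d];
            for c in 0..s.num_centroids {
                let start = (m * s.num_centroids + c) * d;
                let centroid = &self.codebook[start..start + d];
                table.push(q.iter().zip(centroid).map(|(a, b)| (a - b) * (a - b)).sum::<f32>());
            }
        }
        table
    }

    /// Each row's distance is the sum of one table entry per sub-vector
    /// (one u8 code each, since nbits = 8); keep the k smallest.
    fn probe_top_k(&self, table: &[f32], codes: &[u8], k: usize) -> Vec<(usize, f32)> {
        let s = self.shape;
        let mut scored: Vec<(usize, f32)> = codes
            .chunks_exact(s.num_sub_vectors)
            .enumerate()
            .map(|(row, code)| {
                let dist: f32 = code
                    .iter()
                    .enumerate()
                    .map(|(m, &c)| table[m * s.num_centroids + c as usize])
                    .sum();
                (row, dist)
            })
            .collect();
        // Ascending by distance; ties broken by row id for determinism.
        scored.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
        scored.truncate(k);
        scored
    }
}
```

Sorting every score costs O(n log n). A bounded max-heap of size k is the usual alternative when the number of rows is large.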
Other consequences:
- The SIFT1M loader (src/fixture.rs), prepare_fixtures.sh, and the
  cache-directory plumbing are all deleted. The harness is now fully
  self-contained, with no external download.
- src/inputs.rs replaces src/fixture.rs. It provides deterministic per-trial
  test-data and workload generation, with no frozen artifacts.
- Cargo.toml gains an empty [workspace] block so cargo doesn't walk up to
  the omnigraph parent workspace from inside research/.
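Deterministic generation without frozen artifacts needs only a seeded PRNG. The sketch below uses SplitMix64, so it needs no external crates. The `Distribution` enum mirrors the five battery distributions named above. Its concrete parameters are assumptions, not necessarily what src/inputs.rs does:
- 10% density for sparse;
- 99% zeros for mostly-zero;
- magnitudes from 1e-3 to 1e3 for large-dynamic-range.

```rust
/// SplitMix64: a small, well-known 64-bit PRNG (used so the sketch needs no crates).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform in [0, 1) from the top 24 bits, which f32 represents exactly.
    fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// Hypothetical mirror of the five-distribution correctness battery.
#[derive(Clone, Copy, Debug)]
enum Distribution {
    Gaussian,
    Uniform,
    Sparse,
    LargeDynamicRange,
    MostlyZero,
}

/// Same (dist, len, seed) always yields the same vector: no files on disk.
fn generate(dist: Distribution, len: usize, seed: u64) -> Vec<f32> {
    let mut rng = SplitMix64(seed);
    (0..len)
        .map(|_| match dist {
            Distribution::Uniform => rng.next_f32() * 2.0 - 1.0,
            Distribution::Gaussian => {
                // Box-Muller; 1 - u keeps the log argument in (0, 1].
                let u1 = 1.0 - rng.next_f32();
                let u2 = rng.next_f32();
                (-2.0 * u1.ln()).sqrt() * (2.0 * std::f32::consts::PI * u2).cos()
            }
            Distribution::Sparse => {
                if rng.next_f32() < 0.1 { rng.next_f32() } else { 0.0 }
            }
            Distribution::LargeDynamicRange => {
                // Exponent in [-3, 3): magnitudes from 1e-3 up to 1e3.
                10f32.powf(rng.next_f32() * 6.0 - 3.0)
            }
            Distribution::MostlyZero => {
                if rng.next_f32() < 0.99 { 0.0 } else { 1.0 }
            }
        })
        .collect()
}
```

Deriving a different seed per trial (for example from a fixed base seed and the trial index) keeps every run reproducible while still varying the data.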
Verified end-to-end:
- cargo build --release: clean
- cargo clippy --release --all-targets -- -D warnings: clean
- cargo run --release --bin run_experiment: correctness pass, geomean
  1.22M ns, worst 4.82M ns ((768,96,256), sparse), best 596k ns, exit 0,
  total wall-clock ~39s
- smoke test: kernel returning 0 distance -> correctness fail with
  diagnostic, exit 2
- cargo test --release --lib: 2/2 unit tests pass
  (correctness_battery_is_deterministic, speed_workloads_match_shapes)
https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
56 lines
2.1 KiB
Rust
//! Lance autoresearch harness: public API for the bench binary, benchmarks, and tests.
//!
//! Contract (Karpathy-style three files):
//!
//! - `kernels`: the AGENT'S PLAYGROUND. Modify freely.
//! - `reference`: IMMUTABLE. Scalar reference kernel. Defines the math.
//! - `inputs`: IMMUTABLE. Diverse test-data + workload generators,
//!   deterministic per fixed seed, varied across the input battery.
//!
//! The optimization target is dataset-independent: the agent's kernel must match
//! the scalar reference within `MAX_ABS_ERR` on every input the bench generates,
//! and minimize geomean ns/query across multiple PQ shapes and data
//! distributions. There is no fixed dataset, so an "improvement" generalizes
//! across distributions and shapes by construction.

pub mod inputs;
pub mod kernels;
pub mod reference;

/// Geometry of a PQ index: vector dimension, number of sub-quantizers, and
/// centroids per sub-quantizer. We pin nbits=8 (256 centroids), the dominant
/// Lance code path. `dim` must be divisible by `num_sub_vectors`.
#[derive(Clone, Copy, Debug, Eq, PartialEq, Hash)]
pub struct PqShape {
    pub dim: usize,
    pub num_sub_vectors: usize,
    pub num_centroids: usize,
}

impl PqShape {
    pub const fn new(dim: usize, num_sub_vectors: usize, num_centroids: usize) -> Self {
        Self {
            dim,
            num_sub_vectors,
            num_centroids,
        }
    }
    pub const fn sub_vector_dim(&self) -> usize {
        self.dim / self.num_sub_vectors
    }
    pub const fn distance_table_len(&self) -> usize {
        self.num_sub_vectors * self.num_centroids
    }
    pub const fn codebook_len(&self) -> usize {
        self.num_sub_vectors * self.num_centroids * self.sub_vector_dim()
    }
}

/// Tolerance for the agent kernel's distance values vs. the scalar reference.
/// Loose enough to permit legal SIMD-accumulator reordering; tight enough to
/// catch real arithmetic bugs.
pub const MAX_ABS_ERR: f32 = 1e-4;

/// Tolerance for top-K *distances* (id sets are compared with tie tolerance;
/// see `reference::topk_consistent`).
pub const TOPK_DIST_TOL: f32 = 1e-4;
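The doc comment on `PqShape` states two invariants: `dim` must be divisible by `num_sub_vectors`, and nbits is pinned to 8. `new` checks neither. A hypothetical `try_new` (not part of the file above) could enforce both at construction time. The sketch repeats `PqShape` so it compiles on its own. It also lists the three speed-phase shapes from the commit message as a hypothetical `SPEED_SHAPES` constant.

```rust
#[derive(Clone, Copy, Debug, Eq, PartialEq, Hash)]
pub struct PqShape {
    pub dim: usize,
    pub num_sub_vectors: usize,
    pub num_centroids: usize,
}

impl PqShape {
    pub const fn new(dim: usize, num_sub_vectors: usize, num_centroids: usize) -> Self {
        Self { dim, num_sub_vectors, num_centroids }
    }
    pub const fn sub_vector_dim(&self) -> usize {
        self.dim / self.num_sub_vectors
    }
    pub const fn distance_table_len(&self) -> usize {
        self.num_sub_vectors * self.num_centroids
    }
    pub const fn codebook_len(&self) -> usize {
        self.num_sub_vectors * self.num_centroids * self.sub_vector_dim()
    }
    /// Hypothetical checked constructor: rejects a zero sub-quantizer count,
    /// a `dim` that does not split evenly, and any nbits other than 8.
    pub const fn try_new(dim: usize, num_sub_vectors: usize, num_centroids: usize) -> Option<Self> {
        if num_sub_vectors == 0 || dim % num_sub_vectors != 0 || num_centroids != 256 {
            return None;
        }
        Some(Self::new(dim, num_sub_vectors, num_centroids))
    }
}

/// Hypothetical constant holding the three speed-phase shapes.
pub const SPEED_SHAPES: [PqShape; 3] = [
    PqShape::new(128, 16, 256), // SIFT-like
    PqShape::new(256, 16, 256), // sub_vector_dim = 16
    PqShape::new(768, 96, 256), // BERT-like
];
```

For all three shapes, `codebook_len()` equals `num_centroids * dim`, because the sub-vectors partition the full dimension. The (768, 96, 256) shape has a 24,576-entry distance table with 8-wide sub-vectors.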