omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Claude 0de7fb3057 research: reframe LLM evolutionary sampling note around Lance directly	2026-05-14 21:38:12 +00:00
llm-evolutionary-sampling.md	research: reframe LLM evolutionary sampling note around Lance directly	2026-05-14 21:38:12 +00:00

research: reframe LLM evolutionary sampling note around Lance directly

User clarified the target: optimize Lance directly rather than OmniGraph's
IR layer. Rewrites the note with Lance as the primary target.

Key reframe: Lance is parameter-heavy (not just plan-shape-heavy). The
biggest wins come from configuration tuples (IvfPq num_partitions /
num_sub_vectors / quantizer choice, nprobes / refine_factor / prefilter,
batch_size / io_buffer_size / thread pools, AIMD throttle, scalar-index
choice per column, compaction policy). None of these need a Lance fork:
Lance accepts them as config and emits the metrics. That makes
parameter-search a no-fork, substrate-respecting application of the
BauplanLabs JSON-Patch-on-DAG mechanic (patches over config objects
instead of plan trees).
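
As a rough illustration of "patches over config objects", the sketch below
applies JSON-Patch-style `replace` operations to a nested config dict. The
config layout, the starting values, and the `apply_patch` helper are all
hypothetical and are not Lance's or Bauplan's API; only the parameter names
come from the list above.

```python
import copy


def apply_patch(config, patch):
    """Apply JSON-Patch-style 'replace' ops to a nested dict; returns a new dict."""
    out = copy.deepcopy(config)  # never mutate the parent candidate
    for op in patch:
        if op["op"] != "replace":
            raise ValueError(f"unsupported op: {op['op']}")
        keys = [k for k in op["path"].split("/") if k]
        node = out
        for key in keys[:-1]:
            node = node[key]
        if keys[-1] not in node:
            # 'replace' must target an existing key, so typos fail loudly
            raise KeyError(op["path"])
        node[keys[-1]] = op["value"]
    return out


# Hypothetical config layout and values, for illustration only.
base_config = {
    "index": {"num_partitions": 256, "num_sub_vectors": 16},
    "query": {"nprobes": 20, "refine_factor": 1},
}

# A candidate proposed by the sampler is just a small patch over the parent.
patch = [
    {"op": "replace", "path": "/index/num_partitions", "value": 512},
    {"op": "replace", "path": "/query/nprobes", "value": 40},
]

candidate = apply_patch(base_config, patch)
```

Keeping each candidate as a patch rather than a full config makes the diff
between parent and child explicit, which is what a sampler would be asked to
produce.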

The plan-patching angle (LanceTableProvider → DataFusion ExecutionPlan,
HashJoinExec swap, multi-join reorder) is parked as the long-term play
behind an upstream-contribution step: serializing/round-tripping
ExecutionPlan as JSON is the prerequisite Bauplan added in their fork,
and the right move is to contribute it upstream rather than maintain a
fork.

Ranks six surfaces by value/difficulty, proposes a minimal experiment on
surface 1 (workload-conditioned IvfPq tuning on SIFT1M or LAION-sample
with recall@10 / p95-latency fitness, bol_evol with n_steps=3,
n_samples=4), and treats OmniGraph-IR work as a complementary footnote
since it composes cleanly with a Lance-tuner output.
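
A minimal sketch of how a recall@10 / p95-latency fitness could be scored
for one candidate. The way the two metrics are combined here (a recall floor,
then lower latency wins) is an assumption for illustration; this message does
not state how the note combines them, and `min_recall` is a hypothetical
parameter.

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k neighbours found in the retrieved top-k."""
    hits = len(set(retrieved[:k]) & set(ground_truth[:k]))
    return hits / k


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def fitness(mean_recall, p95_ms, min_recall=0.9):
    """Assumed scalarization: reject candidates below the recall floor,
    otherwise prefer lower p95 latency (higher return value is better)."""
    if mean_recall < min_recall:
        return float("-inf")
    return -p95_ms
```

With a scalar fitness like this, each of the n_steps rounds can rank its
n_samples candidates and carry the best one forward as the next parent.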

2026-05-14 21:38:12 +00:00
