Three related fixes from the code-review pass that make the per-query
timing measure kernel work and only kernel work:
1. distance_table API now takes `&mut [f32]` output buffer
- Old: `fn distance_table(&self, query: &[f32]) -> Vec<f32>` — every
call allocated a fresh Vec inside the timed region. An agent that
reduced allocator pressure (e.g., via interior-mutability hacks with
RefCell + thread-local scratch) would have shown up as a "kernel win"
when it was actually just dodging the allocator.
- New: `fn distance_table(&self, query: &[f32], out: &mut [f32])`.
run_experiment pre-allocates one buffer per workload and reuses it
across queries. Same for the criterion bench (one scratch buffer per
bench_function closure). Timing now reflects only the kernel work.
2. Warmup query per workload
- The first query of each (shape × distribution) combo paid cold-cache
cost on the codes array (1.9 MB for the (768,96,256) shape, which exceeds
L2 on many laptops) and on the codebook (786 KB at that shape). With
SPEED_NUM_QUERIES=32 that's a ~3% first-query bias on the geomean.
- run_experiment now does one untimed distance_table + probe_top_k call
per workload before the timing loop. Black-boxed so it can't be DCE'd.
3. std::hint::black_box on probe_top_k result in the trial loop
- The criterion bench already did this; the trial harness (which is the
load-bearing measurement) did not. Under LTO + opt-level=3, since `_hits`
was never read after the call, the optimizer could in principle DCE the
heap maintenance work. black_box makes the result observably live.
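The three timing fixes above fit into one per-workload loop. The sketch below assumes a few things:

- `time_workload` is a hypothetical name.
- The two closures stand in for `kernel.distance_table(query, out)` and `probe_top_k`.
- `table_len` is num_sub_vectors × num_centroids.

It is not the actual run_experiment code.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical per-workload timing loop. `distance_table` and `probe_top_k`
/// are closures standing in for the kernel calls (assumption, not the real API).
pub fn time_workload<R>(
    queries: &[Vec<f32>],
    table_len: usize,
    mut distance_table: impl FnMut(&[f32], &mut [f32]),
    mut probe_top_k: impl FnMut(&[f32]) -> R,
) -> Vec<Duration> {
    // Fix 1: one scratch buffer per workload, allocated outside the timed region.
    let mut table = vec![0.0f32; table_len];

    // Fix 2: one untimed warmup query pulls codes and codebook into cache.
    // Its result is black-boxed so the warmup itself can't be DCE'd.
    if let Some(q) = queries.first() {
        distance_table(q.as_slice(), table.as_mut_slice());
        black_box(probe_top_k(&table[..]));
    }

    let mut timings = Vec::with_capacity(queries.len());
    for q in queries {
        let start = Instant::now();
        distance_table(q.as_slice(), table.as_mut_slice()); // reuses the buffer
        let hits = probe_top_k(&table[..]);
        // Fix 3: keep the top-k result observably live inside the timed region.
        black_box(hits);
        timings.push(start.elapsed());
    }
    timings
}
```

With N queries, `probe_top_k` runs N + 1 times and only N timings come back. The extra call is the warmup, which is excluded from the measurement.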
Doc updates:
- crates/pq-l2/program.md: API contract reflects the new signature; the
obsolete "avoid the Vec alloc in distance_table" prior is replaced with
a note about reducing probe_top_k's Vec<(u32, f32)> allocation
(single small alloc per query, real concern once the kernel SIMDs).
- docs/targets/pq-l2.md: API description updated.
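The program.md note points at probe_top_k's per-query `Vec<(u32, f32)>` allocation. One way to remove it mirrors the distance_table change: the caller owns the output Vec. The sketch below is hypothetical:

- `probe_top_k_into` and its signature are illustrative.
- Codes are assumed to be one `u8` centroid index per sub-vector, which holds for the 256-centroid shapes.
- It keeps the top k in a sorted Vec with insertion. This is a swapped-in technique (O(k) per insert, reasonable for small k) rather than the heap the current code maintains.

```rust
/// Hypothetical allocation-free variant of probe_top_k: scores each code row
/// by summing its per-sub-vector table entries and keeps the k smallest
/// (id, distance) pairs, sorted ascending, in the caller-owned `out`.
pub fn probe_top_k_into(
    table: &[f32],
    codes: &[u8],
    num_sub_vectors: usize,
    num_centroids: usize,
    k: usize,
    out: &mut Vec<(u32, f32)>,
) {
    out.clear(); // drops old hits but keeps the capacity
    out.reserve(k + 1); // a no-op once the buffer has been used for this k
    if k == 0 {
        return;
    }
    for (id, row) in codes.chunks_exact(num_sub_vectors).enumerate() {
        // Asymmetric distance: one table lookup per sub-vector.
        let dist: f32 = row
            .iter()
            .enumerate()
            .map(|(m, &c)| table[m * num_centroids + c as usize])
            .sum();
        // A full list only admits rows strictly better than its current worst.
        if out.len() == k && dist >= out[k - 1].1 {
            continue;
        }
        // Insert after equal distances, so ties keep the earlier id.
        let pos = out.partition_point(|&(_, d)| d <= dist);
        out.insert(pos, (id as u32, dist));
        out.truncate(k);
    }
}
```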
Verified:
- cargo build / clippy / test: clean
- baseline trial: correctness pass, exit 0, ~40s wall-clock
- baseline numbers are now slower than before (geomean 1.35M ns vs prior
880k ns; (768,96,256) 5.2M ns vs prior 4.3M ns) because the prior numbers
were artificially low: allocator pressure improvements masqueraded as
kernel improvements, and LTO could in principle DCE heap maintenance.
The new numbers measure actual kernel work.
https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
A code review pass found a cluster of real bugs in metrics and contract;
fixing them before any agent loop runs against this harness.
Critical metric bug:
- harness-common::sysinfo::peak_rss_mb read VmPeak (virtual address space
high-water-mark, includes mmap'd files / guard pages / untouched
allocations) instead of VmHWM (resident pages high-water-mark). The
function name and HARNESS.md contract both promised RSS. Every
peak_mem_mb row logged under the old code was virtual peak, not RSS.
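A minimal sketch of the VmHWM fix follows. It assumes Linux `/proc/self/status`, where both fields are reported in kB, and treats 1 MB as 1024 kB. The function names are illustrative, not the harness-common source. The parser is split out so it can be exercised on a fixed string.

```rust
/// Extracts VmHWM (peak resident set size, reported in kB) from the text of
/// /proc/self/status and converts it to MB (1 MB = 1024 kB here). VmPeak is
/// ignored on purpose: it is the peak of virtual address space, not RSS.
pub fn parse_vm_hwm_mb(status: &str) -> Option<f64> {
    let line = status.lines().find(|l| l.starts_with("VmHWM:"))?;
    let kb: f64 = line
        .trim_start_matches("VmHWM:")
        .split_whitespace()
        .next()?
        .parse()
        .ok()?;
    Some(kb / 1024.0)
}

/// Linux-only sketch: returns None if /proc/self/status can't be read.
pub fn peak_rss_mb() -> Option<f64> {
    let status = std::fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_hwm_mb(&status)
}
```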
Correctness contract bug:
- reference::topk_consistent's tie-tolerance had a flawed neighbor-scan
check: when the K-th distance fell in a multi-way tie, agent and
reference could legally return different K-sized subsets of the tied
band (heap eviction order vs. sort stability), and the neighbor scan
required both endpoints to be present, false-negativing legitimate
cases. Simplified to a positional distance-tolerance check; ids at the
same rank may differ silently because the distance match within tol
constrains the swap to a 2*tol band. Diagnostic comment explains the
rationale.
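The positional check can be sketched as follows. The sketch assumes both lists are `(id, distance)` pairs sorted by ascending distance. The signature and error format are hypothetical, not the `reference::topk_consistent` source. Ids at the same rank may differ, so two legitimately different members of a tied band both pass.

```rust
/// Positional, tie-tolerant top-K comparison (sketch). `got` and `want` are
/// (id, distance) lists sorted by ascending distance. Ids at the same rank
/// may differ; only the distances must agree within `tol`.
pub fn topk_consistent(
    got: &[(u32, f32)],
    want: &[(u32, f32)],
    tol: f32,
) -> Result<(), String> {
    if got.len() != want.len() {
        return Err(format!("length mismatch: got {}, want {}", got.len(), want.len()));
    }
    for (rank, ((gid, gd), (wid, wd))) in got.iter().zip(want).enumerate() {
        if (gd - wd).abs() > tol {
            return Err(format!(
                "rank {rank}: got id {gid} at {gd}, want id {wid} at {wd} (tol {tol})"
            ));
        }
    }
    Ok(())
}
```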
API hygiene:
- Removed dead PqKernel::shape() and ScalarReference::shape(): both were
declared in the public API contract (program.md, kernels.rs comment) and
required to be stable, but never called by the bench / benches / inputs /
reference. Now the contract reflects what the bench actually uses.
- Removed dead `anyhow` workspace dependency.
Determinism:
- PRNG seed mixing now uses the SplitMix64 finalizer per part instead of
raw XOR. Raw XOR is commutative and small-constant collisions are
reachable; mix_seeds iterates the finalizer once per ingredient so
distinct (seed, shape, kind) tuples produce distinct streams with
vanishingly small collision probability.
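The seed-mixing difference can be shown with a sketch of the standard splitmix64 finalizer. It uses the usual constants, with shifts 30/27/31 and the 0x9E3779B97F4A7C15 increment. How the real `mix_seeds` folds parts in (add the increment, XOR the part, finalize) is an assumption here. Because each step is a nonlinear bijection, order matters and equal parts no longer cancel.

```rust
/// Output finalizer used by the standard splitmix64 generator. It is a
/// bijection on u64 (xor-shifts and odd multipliers are invertible).
pub fn splitmix64_finalize(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

const GOLDEN_GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;

/// Sketch of mix_seeds: one finalizer pass per ingredient, so the result
/// depends on the order of (seed, shape, kind) and equal parts don't cancel.
pub fn mix_seeds(parts: &[u64]) -> u64 {
    parts
        .iter()
        .fold(0u64, |h, &p| splitmix64_finalize(h.wrapping_add(GOLDEN_GAMMA) ^ p))
}

/// The old approach, for comparison: commutative and self-cancelling.
pub fn raw_xor(parts: &[u64]) -> u64 {
    parts.iter().fold(0, |h, &p| h ^ p)
}
```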
License headers:
- kernels.rs SPDX changed from Apache-2.0 to MIT OR Apache-2.0 to match
the crate's Cargo.toml license field (the rest of the crate is dual-
licensed). Added matching SPDX headers to reference.rs and inputs.rs.
Doc cleanups:
- design.md: replaced the broken relative link
`../../docs/research/llm-evolutionary-sampling.md` (which resolved inside
lance-autoresearch where the note doesn't live) with a path-explained
reference noting the note lives in the parent OmniGraph repo and won't
ship on extraction.
- README.md: clarified that the target table mixes a single landed target
with a candidate roadmap; the candidates have no code yet.
- HARNESS.md: added exit code 1 (internal error) to the exit-code summary;
was documented in run_experiment.rs but not in the loop contract.
- adding-a-target.md: dropped the misleading "cp -r plus surgical edits"
framing — the workflow rewrites 7 files; what's inherited is Cargo
manifest, license headers, workspace registration, and shared utilities.
Verified end-to-end: cargo build / clippy / test all green. Baseline
trial runs `correctness: pass` exit 0 in ~34s (peak_mem_mb now reads
RSS — same workload reports 91 MB, plausibly correct given the temporary
fixture-construction buffers).
https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
The original lance-autoresearch was one Cargo crate optimizing one Lance
kernel (PQ L2 distance). With 9+ candidate targets enumerated in the research
note, a single-crate shape doesn't scale: per-target deps will collide, the
agent's edits to one target's kernels.rs would conflict with another's lib
path, and build/test isolation is lost. Restructure into a Cargo workspace.
Layout:
research/lance-autoresearch/
├── Cargo.toml             (workspace root)
├── README.md              (target table, contract overview, repo layout)
├── HARNESS.md             (universal loop contract every target inherits)
├── crates/
│   ├── harness-common/    (shared: SplitMix64, geomean, peak RSS,
│   │                       MAX_ABS_ERR, TOPK_DIST_TOL, TIME_BUDGET_SECS)
│   └── pq-l2/             (the landed target; was the previous single crate)
└── docs/
    ├── design.md          (rationale for workspace shape, no Target trait)
    ├── adding-a-target.md (step-by-step workflow for new targets)
    └── targets/pq-l2.md   (per-target capsule)
Decisions documented in docs/design.md:
- Workspace, not single crate: per-target Cargo.toml so deps don't collide;
per-target src tree so agent edits don't conflict; per-target build/test
isolation for faster agent iteration.
- harness-common as a plumbing-only crate (PRNG, geomean, peak RSS, tolerance
constants, time budget). Intentionally NO Target trait: decode kernel
signatures and distance kernel signatures differ enough that a unifying
trait would either bloat or require type-erased boxing. Each target keeps
its own natural shape.
- Per-target program.md + shared HARNESS.md: the loop contract is universal,
the priors and API spec are per-target. Two files instead of one because
copy-pasting the universal loop into every program.md would drift.
pq-l2 refactor:
- src/* moved into crates/pq-l2/src/* via git mv (preserves history)
- crate renamed lance-autoresearch -> pq-l2
- SplitMix64, geomean, peak_rss_mb, MAX_ABS_ERR, TOPK_DIST_TOL,
TIME_BUDGET_SECS now imported from harness-common (drops ~70 lines of
duplication that would have been copy-pasted into every new target)
- program.md trimmed: setup/loop/hygiene moved to HARNESS.md; only the
PQ-L2-specific API contract and SIMD priors remain
- Cargo.toml depends on harness-common via path; workspace.dependencies
pins criterion uniformly across targets
The 10 candidate targets from the research note (A1 cosine/dot/hamming, A2
IVF partition select, A3 FTS BM25, A4 bitpack decode, A5 dictionary decode,
A6 FSST decode, A7 take/gather, A8 predicate eval, A9 posting list intersect,
A10 top-K merge) are listed in README.md's target table as "candidate"; each
gets a docs/targets/<name>.md capsule when it's spun up. docs/adding-a-target.md
documents the cp -r + edit-Cargo.toml + rewrite-three-files workflow.
Verified end-to-end:
- cargo build --release: clean, both crates compile
- cargo clippy --release --workspace --all-targets -- -D warnings: clean
- cargo test --release --workspace: 6/6 pass (4 harness-common + 2 pq-l2)
- cargo run --release --bin run_experiment -p pq-l2: correctness pass,
geomean ~880k ns, exit 0, ~30s wall-clock
- omnigraph parent workspace unchanged (research/ excluded as before)
https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
Cluster A previously listed only distance-kernel candidates (cosine, IVF
partition selection, BM25 scoring), which understated the autoresearch
opportunity in Lance. The single largest hot-cycle pile for analytical reads
is the decode path in lance-encoding, not lance-linalg.
Restructure Cluster A into three sub-groups, all sharing the autoresearch loop
shape (single-agent, bit-exact oracle, seconds-scale eval, self-contained code)
but differing in fixture shape:
Distance kernels (lance-linalg):
A1. Adjacent distance kernels (cosine, dot, hamming)
A2. IVF partition-selection kernel
A3. FTS BM25 scoring kernel
Decode kernels (lance-encoding) - highest hot-cycle pile:
A4. Bitpack integer decode (billions of values per analytical query;
documented SIMD literature BP128 / simdcomp / Lemire bitpacking)
A5. Dictionary decode (SIMD gather + prefetch wins on low-cardinality
string columns)
A6. FSST string decode (Tableau's 2x SIMD opportunity)
Scan / merge kernels:
A7. Take / gather (random-access reads; hot for ANN post-fetch)
A8. Predicate / filter evaluation (per-type comparison kernels)
A9. Posting list intersection (FTS AND queries; Lemire 2-5x SIMD wins)
A10. Top-K k-way merge (every LIMIT / ANN query)
Each new candidate notes why it's high-leverage, the documented SIMD
opportunity if any, and the bit-exact oracle availability. Updates the
cross-cluster prioritization to add a "largest absolute speedup on a real
workload -> run A4" branch alongside the existing branches; notes that A1
and A4 can run in parallel by separate agents since they share loop shape but
not scaffolding.
scripts/check-agents-md.sh still passes (30/30 links).
https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
Add a "Next experiment candidates" section grouped by control-loop shape (the
unit of harness reuse is the loop, not the target):
Cluster A - reuses lance-autoresearch as-is:
A1. Adjacent distance kernels (cosine, dot, hamming)
A2. IVF partition-selection kernel (centroid scan)
A3. FTS BM25 scoring kernel
Cluster B - needs a new harness (BauplanLabs tournament loop):
B1. IVF_PQ index-build parameter tuning (original "surface 1")
B2. Auto-index-type selection (categorical + B1 inner)
Cluster C - highest ceiling, hardest harness:
C1. Physical-plan JSON patching for Lance-backed DataFusion
(literal BauplanLabs replication)
Each candidate notes the surface, oracle, harness reuse vs. new, and the
expected payoff. A cross-cluster section frames the three "if your goal is X,
run candidate Y next" branches: shortest path to upstream PR (A1), most
user-facing impact (B1), paper-publishable replication (C1). Includes the
go/no-go logic if A1 and B1 split.
scripts/check-agents-md.sh still passes (30/30 links).
https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
The note proposed surface 1 (index-build tuning) with recall@K oracle and
BauplanLabs evolutionary tournament as the "smallest experiment that would
produce signal." What landed at research/lance-autoresearch/ is a different
shape: PQ kernel optimization with bit-exact correctness oracle and Karpathy
single-agent autoresearch loop. Add a "First implementation landed" section
that records the divergence and the reasoning (seconds-scale eval favors the
autoresearch shape; kernel work has a more direct upstream PR path; the
bit-exact oracle removes dataset-overfitting incentive). Bumps the note to
revision 3.
scripts/check-agents-md.sh still passes (30/30 links).
https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
Original harness used recall@K vs. SIFT1M as the correctness oracle, which gives
the agent incentive to overfit to one data distribution: a kernel that hits
recall@10 on SIFT-shaped clusters could regress on other distributions and
still pass the gate. This commit replaces both halves of the oracle.
Correctness phase (was: recall@K floor):
- Bit-equivalent (max_abs_err <= 1e-4) match against an immutable scalar
reference kernel, on a 5-distribution input battery (Gaussian, uniform,
sparse, large-dynamic-range, mostly-zero) crossed with all evaluated PQ
shapes. Top-K compared with tie-tolerant equivalence (TOPK_DIST_TOL=1e-4).
Lossy techniques (LUT u8/u16 quantization, etc.) fail this gate by
construction.
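The max_abs_err gate below is a sketch. The 1e-4 threshold comes from the text; the function and constant definitions are illustrative. One detail matters for a gate like this: `f32::max` ignores NaN. A naive fold would therefore let a kernel that emits NaN pass, so the sketch maps NaN differences to infinity.

```rust
pub const MAX_ABS_ERR: f32 = 1e-4;

/// Largest element-wise |agent - reference|. A NaN difference is mapped to
/// +infinity so it always fails the gate instead of being skipped by max().
pub fn max_abs_err(agent: &[f32], reference: &[f32]) -> f32 {
    assert_eq!(agent.len(), reference.len(), "table length mismatch");
    agent
        .iter()
        .zip(reference)
        .map(|(a, r)| {
            let d = (a - r).abs();
            if d.is_nan() { f32::INFINITY } else { d }
        })
        .fold(0.0, f32::max)
}
```

A kernel that returns all-zero distances trips the gate immediately, which is the same failure the smoke test exercises.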
Speed phase (was: geomean ns over one synthetic dataset):
- Geomean ns/query measured across 3 PQ shapes x 3 data distributions:
(128, 16, 256) - SIFT-like
(256, 16, 256) - sub_vector_dim=16
(768, 96, 256) - BERT-like
crossed with clustered / uniform / sparse data. Fixed seed across trials
for reproducibility; per-combo timings reported alongside the global
geomean / worst / best so a kernel that wins on one combo and regresses
on another fails the worst-case guard.
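The global summary over the 9 combos can be sketched as follows. The sketch assumes each combo contributes one representative ns/query figure. `summarize` and `Summary` are hypothetical names, not the harness-common API. Computing the geomean in log space avoids overflow when multiplying nine multi-million-ns values.

```rust
/// Geometric mean of positive values, computed as exp(mean(ln x)).
pub fn geomean(xs: &[f64]) -> f64 {
    assert!(!xs.is_empty() && xs.iter().all(|&x| x > 0.0), "geomean needs positive inputs");
    let mean_ln = xs.iter().map(|x| x.ln()).sum::<f64>() / xs.len() as f64;
    mean_ln.exp()
}

pub struct Summary {
    pub geomean: f64,
    pub worst: f64,
    pub best: f64,
}

/// Reports the global geomean alongside worst and best per-combo ns/query.
pub fn summarize(per_combo_ns: &[f64]) -> Summary {
    Summary {
        geomean: geomean(per_combo_ns),
        worst: per_combo_ns.iter().copied().fold(f64::NEG_INFINITY, f64::max),
        best: per_combo_ns.iter().copied().fold(f64::INFINITY, f64::min),
    }
}
```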
Kernel API (was: const-DIM scalar functions):
- Generic over (dim, num_sub_vectors, num_centroids) via PqShape.
- PqKernel::new(shape, codebook) lets the agent pre-process the codebook
once (transpose, cache c.c, pack LUT, etc.) and amortize across queries.
Build cost is excluded from per-query timing - the bench measures
distance_table + probe_top_k only.
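The "cache c.c" preprocessing can be sketched as follows. `new()` computes each centroid's squared norm once. `distance_table` then uses ‖q − c‖² = ‖q‖² − 2 q·c + ‖c‖². The `PqShape` field names follow the (dim, num_sub_vectors, num_centroids) triple above. The codebook layout ([sub_vector][centroid][sub_dim]), the struct name and the caller-provided `out` buffer are assumptions.

This expansion is not bit-identical to direct subtraction. It can lose precision to cancellation, which the 1e-4 gate on the large-dynamic-range inputs would be expected to surface.

```rust
pub struct PqShape {
    pub dim: usize,
    pub num_sub_vectors: usize,
    pub num_centroids: usize,
}

/// Hypothetical kernel that amortizes centroid norms across queries.
pub struct NormCachingKernel {
    shape: PqShape,
    codebook: Vec<f32>,       // [num_sub_vectors][num_centroids][sub_dim]
    centroid_norms: Vec<f32>, // ||c||^2 per (sub_vector, centroid)
}

impl NormCachingKernel {
    /// Build cost (excluded from per-query timing): one pass over the codebook.
    pub fn new(shape: PqShape, codebook: Vec<f32>) -> Self {
        let sub_dim = shape.dim / shape.num_sub_vectors;
        assert_eq!(codebook.len(), shape.num_centroids * shape.dim);
        let centroid_norms = codebook
            .chunks_exact(sub_dim)
            .map(|c| c.iter().map(|x| x * x).sum::<f32>())
            .collect::<Vec<f32>>();
        Self { shape, codebook, centroid_norms }
    }

    /// Fills `out[m * num_centroids + c]` with the squared L2 distance between
    /// query sub-vector m and centroid c.
    pub fn distance_table(&self, query: &[f32], out: &mut [f32]) {
        let s = &self.shape;
        let sub_dim = s.dim / s.num_sub_vectors;
        assert_eq!(out.len(), s.num_sub_vectors * s.num_centroids);
        for m in 0..s.num_sub_vectors {
            let q = &query[m * sub_dim..(m + 1) * sub_dim];
            let q_norm: f32 = q.iter().map(|x| x * x).sum();
            for c in 0..s.num_centroids {
                let idx = m * s.num_centroids + c;
                let centroid = &self.codebook[idx * sub_dim..(idx + 1) * sub_dim];
                let dot: f32 = q.iter().zip(centroid).map(|(a, b)| a * b).sum();
                out[idx] = q_norm - 2.0 * dot + self.centroid_norms[idx];
            }
        }
    }
}
```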
Other consequences:
- SIFT1M loader (src/fixture.rs), prepare_fixtures.sh, and the
cache-directory plumbing are all deleted: the harness is now fully
self-contained, with no external download.
- src/inputs.rs replaces src/fixture.rs; deterministic per-trial
test-data + workload generation, no frozen artifacts.
- Cargo.toml gains an empty [workspace] block so cargo doesn't walk up to
the omnigraph parent workspace from inside research/.
Verified end-to-end:
- cargo build --release: clean
- cargo clippy --release --all-targets -- -D warnings: clean
- cargo run --release --bin run_experiment: correctness pass, geomean
1.22M ns, worst 4.82M ns ((768,96,256), sparse), best 596k ns, exit 0,
total wall-clock ~39s
- smoke test: kernel returning 0 distance -> correctness fail with
diagnostic, exit 2
- cargo test --release --lib: 2/2 unit tests pass
(correctness_battery_is_deterministic, speed_workloads_match_shapes)
https://claude.ai/code/session_01Aq8kBUcjmEPobcEufnWbW5
User clarified the target: optimize Lance directly rather than OmniGraph's
IR layer. Rewrites the note with Lance as the primary target.
Key reframe: Lance is parameter-heavy (not just plan-shape-heavy). The
biggest wins come from configuration tuples (IvfPq num_partitions /
num_sub_vectors / quantizer choice, nprobes / refine_factor / prefilter,
batch_size / io_buffer_size / thread pools, AIMD throttle, scalar-index
choice per column, compaction policy). None of these need a Lance fork —
Lance accepts them as config and emits the metrics. That makes
parameter-search a no-fork, substrate-respecting application of the
BauplanLabs JSON-Patch-on-DAG mechanic (patches over config objects
instead of plan trees).
The plan-patching angle (LanceTableProvider → DataFusion ExecutionPlan,
HashJoinExec swap, multi-join reorder) is parked as the long-term play
behind an upstream-contribution step: serializing/round-tripping
ExecutionPlan as JSON is the prerequisite Bauplan added in their fork,
and the right move is to contribute it upstream rather than maintain a
fork.
Ranks six surfaces by value/difficulty, proposes the smallest experiment on
surface 1 (workload-conditioned IvfPq tuning on SIFT1M or LAION-sample
with recall@10 / p95-latency fitness, bol_evol with n_steps=3,
n_samples=4), and treats OmniGraph-IR work as a complementary footnote
since it composes cleanly with a Lance-tuner output.
Note on Erol et al. (arXiv 2602.10387) — DBPlanBench's evolutionary search
over DataFusion physical plans — and where the mechanic does and does not
port to OmniGraph. The direct port (fork DataFusion, patch physical plans)
is the wrong target since we touch DataFusion only as a MemTable in
table_store::scan_pending_batches; the adapted form (JSON-Patch search over
QueryIR, especially multi-hop Expand ordering / direction) fits cleanly
above the substrate without violating §I substrate respect.
Lists application surfaces by value/difficulty (multi-hop Expand reorder,
RRF hybrid-retrieval k-tuning, filter-pushdown shape, vector index params,
compaction policy) and proposes the smallest experiment that would produce
signal — bol_evol on a ~30-query .gq corpus with bit-identical result
validation. Calls out the Hyrum's Law / determinism discipline (search
offline, freeze plans for serving) and the corpus bootstrap problem.
Filed under docs/research/ as exploratory; not a committed plan.
Branch protection on main, declared as code rather than as opaque
GitHub UI state. Pairs with the CODEOWNERS chassis (#88): once this
PR lands and an admin runs the apply script, every PR to main must
satisfy code-owner review and the listed required checks.
Components:
- .github/branch-protection.json — the policy. Edit this to change
required checks, review counts, etc. Includes a _comment field for
human readers; the apply script strips it before PUT.
- scripts/apply-branch-protection.sh — idempotent apply via `gh api`.
Reads back current state for verification. Supports DRY_RUN=1.
- docs/branch-protection.md — explains the policy, how to apply, how
to change, why declared as code.
- AGENTS.md topic-index row.
Policy summary:
- Required status checks (strict): Classify Changes, Check AGENTS.md
Links, Test Workspace, Test omnigraph-server --features aws,
CODEOWNERS / drift, CODEOWNERS / noedit.
- Required approving reviews: 1, must be a code owner.
- Dismiss stale reviews on new commits.
- Required linear history (squash or rebase merges only).
- No force pushes, no deletions, no admin bypasses.
- Required conversation resolution.
What's NOT in this PR:
- Required signed commits — not yet; maintainers must enroll GPG/SSH
signing first or merges will block.
- Tag protection for v* tags — separate PR.
- Additional required checks (cargo deny, audit, fmt, clippy, CodeQL,
schema-lint MR-946) — separate PRs as each lands.
- The script is NOT run by CI. Branch-protection changes are admin
actions; CI-driven auto-apply would defeat the purpose. Manual
invocation is the audit point.
How to apply after merge:
./scripts/apply-branch-protection.sh
Requires `gh` CLI auth with repo-admin permissions.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* codeowners: generator + drift CI + initial roles
Source-of-truth approach to CODEOWNERS: yml is hand-edited, CODEOWNERS
is generated and CI-enforced. Every role change is a reviewable PR
with a permanent in-repo audit trail. No GitHub UI clicks, no shadow
state.
Initial roles:
engineering  @aaltshuler           owns crates/** + default (.github/,
                                        scripts/, Cargo.*, openapi.json,
                                        everything else not docs)
docs         @aaltshuler @ragnorc  owns docs/**, README.md, AGENTS.md,
                                        CLAUDE.md, SECURITY.md
Per GitHub semantics, multiple owners on a CODEOWNERS line means "any
one satisfies the review" — for docs, either named member can approve.
Strict "N distinct approvers" would need a CI workaround (not wired
today; tracked for future hardening).
Components:
- .github/codeowners-roles.yml — source of truth. Edit this.
- .github/scripts/render-codeowners.py — generator (PyYAML; ~100 LoC).
- .github/CODEOWNERS — generated. CI rejects hand-edits.
- .github/workflows/codeowners.yml — two checks:
* drift: re-render and assert CODEOWNERS matches.
* noedit: reject PRs that edit CODEOWNERS without editing the yml.
- docs/codeowners.md — explains the source-of-truth pattern, how to
change roles, how to add new roles.
- AGENTS.md topic-index row.
What's NOT in this PR:
- Branch protection on main (separate PR; needs `gh api` call against
the org).
- Required-reviewer enforcement (depends on branch protection landing).
- Required CI status checks (depends on branch protection landing).
- Scheduled rotation (the schedule: block in the yml + a weekly
workflow). Today's roles are stable; rotation isn't needed yet.
- Linear-as-source-of-truth integration (Approach 4 from the design
discussion; deferred).
Verified:
- Generator output is deterministic (idempotent re-runs).
- scripts/check-agents-md.sh OK (28 links, 28 docs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* codeowners: fix catch-all ordering (Devin review #88)
Devin caught a real bug: GitHub CODEOWNERS uses "last match wins"
semantics, but the generator emitted the catch-all `*` AFTER specific
patterns. Net effect: `*` won for every file, silently nullifying the
docs role and never routing reviews to @ragnorc.
Fix is one-line — emit the default `*` line before iterating the
specific paths. Also:
- Added a regression assertion in the generator: after rendering, the
first non-comment line must start with `*` if a default is
configured. Generator exits non-zero otherwise. Catches the same
class of mistake in any future refactor.
- Rewrote the yml header comment. It said "keep more-specific paths
after broader patterns", which is correct for GitHub semantics, but
the generator was doing the opposite, so the comment read as a
description of behavior when it was actually an intention the code
contradicted.
Verified by re-rendering: `*` is now line 12, `crates/**` is line 14,
`docs/**` is line 15, etc. README.md matches both `*` and `README.md`;
`README.md` is later → wins → @aaltshuler + @ragnorc both assigned.
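The ordering bug and the regression assertion can be sketched in Rust, although the real generator is Python. The matcher is deliberately simplified and supports only three pattern forms:

- `*`
- `dir/**`
- exact paths

Real CODEOWNERS uses gitignore-style patterns. `owners_for` and `default_rule_first` are hypothetical names. The sketch only demonstrates "last match wins" and the first-rule guard.

```rust
/// Simplified matcher: `*` matches everything, `dir/**` matches anything
/// under dir/, anything else must equal the path exactly.
fn matches(pattern: &str, path: &str) -> bool {
    if pattern == "*" {
        return true;
    }
    if let Some(dir) = pattern.strip_suffix("/**") {
        return path.starts_with(dir) && path[dir.len()..].starts_with('/');
    }
    pattern == path
}

/// Owners come from the LAST matching rule, as in GitHub's precedence.
pub fn owners_for<'a>(rules: &[(&str, &'a str)], path: &str) -> Option<&'a str> {
    rules.iter().rev().find(|rule| matches(rule.0, path)).map(|rule| rule.1)
}

/// Regression guard: the first non-comment, non-blank line must be the `*` default.
pub fn default_rule_first(rendered: &str) -> bool {
    rendered
        .lines()
        .map(str::trim)
        .find(|l| !l.is_empty() && !l.starts_with('#'))
        .map_or(false, |l| l.split_whitespace().next() == Some("*"))
}
```

With the catch-all emitted last, `*` shadows every specific rule. Emitted first, it only covers paths no later rule claims.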
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of the schema-lint chassis. Adds stable `OG-XXX-NNN`
codes to schema-migration rejections so operators can suppress, look
up, and filter on identifiers rather than free-text prose. Atlas-style
chassis adapted to omnigraph's typed-IR substrate (no SQL injection
vector, no per-engine locks, native edge/vector/embedding types).
What's in v0:
- New `omnigraph-compiler/src/lint/` module with:
- `diagnostic.rs` — Family / SafetyTier / Severity enums covering ten
families (DS, MF, CD, BC, NM, OW, NL, VE, ED, LK). Only DS and MF
are populated in this PR.
- `codes.rs` — 8 DiagnosticCode constants (OG-DS-101..105,
OG-MF-103, OG-MF-104, OG-MF-106). Five of the eight are wired to
real emission sites; the other three are reserved.
- Unit tests for catalog invariants: codes unique, prefix matches
family, suffixes are 3-digit, destructive defaults to error,
lookup() works, EMITTED_IN_V0 codes exist in ALL_CODES.
- `SchemaMigrationStep::UnsupportedChange` gains an optional
`code: Option<String>` field. New `unsupported_error_message()`
helper prefixes the message with `[code]` when present.
- 5 of 17 existing rejection paths now carry codes:
- `removing node type` → OG-DS-102
- `removing edge type` → OG-DS-103
- `removing property` → OG-DS-104
- `adding required property without backfill` → OG-MF-103
- `changing property type` → OG-MF-106
Remaining 12 paths carry `code: None` and are tagged as future work.
- `schema_apply` surfaces the formatted error (with `[code]` prefix);
CLI `omnigraph schema plan` renders the code on the
`unsupported change on <entity>` line.
- PR #62 destructive-rejection tests in `tests/schema_apply.rs` now
assert on the stable code (`msg.contains("OG-DS-104")`) instead of
the error-message substring. 11/11 tests pass.
- New `docs/schema-lint.md` documents the v0 catalog + the 10 families
+ Atlas prior art. AGENTS.md index updated.
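Here is a sketch of the catalog invariants and the `[code]` prefixing. It assumes a flat struct holding the family and the full code string. The struct layout and the free-function form of `unsupported_error_message` are hypothetical; the PR's helper hangs off `SchemaMigrationStep`.

```rust
use std::collections::HashSet;

/// Hypothetical catalog entry shape: `code` looks like OG-<FAMILY>-<NNN>.
pub struct DiagnosticCode {
    pub family: &'static str,
    pub code: &'static str,
}

/// Invariants: prefix matches the family, suffix is exactly 3 digits.
pub fn is_well_formed(d: &DiagnosticCode) -> bool {
    match d.code.strip_prefix("OG-").and_then(|rest| rest.split_once('-')) {
        Some((fam, num)) => {
            fam == d.family && num.len() == 3 && num.bytes().all(|b| b.is_ascii_digit())
        }
        None => false,
    }
}

/// Invariant: no two catalog entries share a code.
pub fn codes_unique(codes: &[DiagnosticCode]) -> bool {
    let mut seen = HashSet::new();
    codes.iter().all(|d| seen.insert(d.code))
}

/// Prefixes the message with `[code]` when a code is attached.
pub fn unsupported_error_message(code: Option<&str>, message: &str) -> String {
    match code {
        Some(c) => format!("[{c}] {message}"),
        None => message.to_string(),
    }
}
```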
What's explicitly NOT in v0 (subsequent PRs):
- No severity config in `omnigraph.yaml` (MR-694 §2).
- No `@allow(OG-XXX-NNN, "rationale")` suppression directive (§3).
- No `--allow-data-loss` flag or destructive-tier enforcement.
- No new `SchemaMigrationStep` variants (soft/hard drops, default,
widen/narrow). MR-700, MR-697 land those.
- No pre-migration checks (MR-941).
- No CD / VE / LK / NM family rules (MR-942..945).
- No CI integration (MR-946).
Tests: 235 compiler tests, 11 schema_apply integration tests, 14
lint module tests, 55 CLI tests — all green.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
npx mdrip writes fetched-page snapshots under mdrip/. The cache is a
local-only working artifact (docs/lance.md is the curated index of
upstream Lance pages we fetch on demand). Keep the cache out of the
tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defensive — Lance 4.0.0 preserves the source dataset's flag through
Operation::Overwrite even when WriteParams omits it (pinned by the
prior commit's test), but setting it explicitly matches the public
overwrite_dataset path at line 454 and documents the dependency at
the call site so a future refactor doesn't accidentally drop it.
Setting it on a dataset created without stable row IDs is a no-op
per Lance's row-id-lineage spec, so this stays correct for legacy
datasets.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
stage_overwrite is used by schema_apply to rewrite tables when an
additive migration touches data. If Lance Operation::Overwrite ever
stopped preserving the source dataset's enable_stable_row_ids flag,
every schema_apply that triggers a rewrite would silently disable
stable row IDs on the affected tables and downstream readers that
depend on _rowid stability (change-feed validators, index
reconcilers) would observe silent corruption.
Empirically Lance 4.0.0 does preserve the flag through Overwrite
even when WriteParams omits it — but the preservation isn't
documented at the Lance spec level, so pin it here. Any future
behaviour change surfaces as a test failure rather than silent
corruption.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The L1 capability list claimed the flag was enabled "for the
commit-graph and run-registry datasets" — stale. Every Lance
dataset OmniGraph creates has enable_stable_row_ids: true; the
run-registry datasets are gone since MR-771. Replace with a single
paragraph capturing the invariant, the consequences (row-version
columns available, CreateIndex × Rewrite not retryable, Lance reader
version required), the legacy-dataset constraint (one-way at create,
dump-and-reload to migrate), and a pointer to the regression test in
staged_writes.rs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reframes the first-principle section to lead with Winters' "engineering
is programming integrated over time" as the lens, keeping "minimize
ongoing liability" as the operative directive and folding in "complexity
should be earned." Adds a new Tiebreakers subsection with two rules
that the prior section lacked clean appeals for:
- correctness > simplicity > performance (lexicographic)
- reversibility shapes evidence demand (reversible → prod metrics over
napkin math over RFCs; irreversible → RFC up-front)
Adds a Hyrum's-Law deny-list entry in both AGENTS.md and
docs/invariants.md §IX: shipping observable behavior is shipping a
contract, even when undocumented.
Net always-on context cost: ~7 lines. No renumbering of §I–VIII
invariants; Hyrum's Law lands in the deny-list to avoid breaking
back-references.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These three files in crates/omnigraph/src/loader/ have no `mod`
declaration anywhere in the workspace and no `#[path = "…"]`
reference. They are not compiled — `touch`-ing them does not trigger
`cargo check` to recompile anything.
Their imports (`crate::catalog::schema_ir`, `crate::error::NanoError`,
`crate::store::manifest::hash_string`, `crate::types::ScalarType`,
`super::super::graph::DatasetAccumulator`) reference modules that no
longer exist in the engine crate, so they could not even be wired in
without further work. They are vestigial code from an earlier
monolithic crate layout. The live functionality is independently
implemented inside crates/omnigraph/src/loader/mod.rs.
These files have been orphaned since the initial public commit.
`cargo check --workspace --all-targets` and
`cargo test --workspace --no-run` both pass with no new warnings.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
Reviewer feedback on PR #62: the original
`cleanup_then_optimize_succeed_in_sequence` only unwrapped both calls
and asserted nothing, so it didn't validate the claimed sequencing
behavior. The concern that motivates the test is that cleanup destroys
version history and optimize on a freshly-cleaned table could trip on
dropped fragment refs or stale manifests.
Rename to `cleanup_then_optimize_preserves_rows_and_table_remains_writable`
and add three concrete postconditions: row counts in both Person and
Company tables survive the sequence; the head remains readable; and a
subsequent merge load still succeeds.
The previous version of `apply_schema_renames_node_type_via_rename_from_and_preserves_rows`
kept the node name as `Person` (`@rename_from("Person")`) and only renamed
a property. The planner only emits a `RenameType` step when the new name
differs from the accepted one, so the test name overstated what it
covered: a regression in `RenameType` step emission or in the
coordinator's table-key remap during type rename could pass while the
test still went green.
Rename the desired node from `Person` to `Human` (with
`@rename_from("Person")`), update the dependent edge endpoints to point
at `Human`, and assert both the `RenameType` step and that the manifest
table key has moved from `node:Person` to `node:Human`.
The audit of test coverage flagged three holes:
- `omnigraph optimize` and `omnigraph cleanup` had no integration tests
(no `maintenance.rs`). Add one covering empty/idempotent edges, the
policy-validation contract on `cleanup`, and head preservation under
aggressive policies.
- `apply_schema` only covered I32 -> I64 type-change rejection. Add the
symmetric narrowing case plus rejections for the other destructive
shapes (drop property with data, drop node type, drop edge type, add
required property without backfill) and assert the manifest version
doesn't advance. Add a positive `@rename_from` case to pin that the
stable-type-id contract preserves rows through a rename.
- `docs/testing.md` was missing `validators.rs` and the new
`maintenance.rs` from its file table; bump the count and add rows.
* MR-786: merge-pair truth table with exhaustive op-variant matrix
Add crates/omnigraph/tests/merge_truth_table.rs that enumerates every
(left_op, right_op) cell from the operation vocabulary named in the
ticket — {noop, addNode, removeNode, addEdge, removeEdge, setProperty,
dropProperty, addLabel, removeLabel} — and asserts the deterministic
outcome of Omnigraph::branch_merge against a structured oracle.
The matrix is built in a 9x9 match in build_case, so adding a new
OpVariant is a compile-time, fail-on-omission task. Today's mutation
grammar only exposes insert | update set | delete (see
docs/query-language.md), so the 36 cells over the first six ops are
executable and the 45 cells involving dropProperty/addLabel/removeLabel
are recorded as Expected::Unsupported with a note. Each executable cell
spins up a fresh tempdir, applies one mutation per branch, calls
branch_merge, and asserts either:
* MergeOutcome (AlreadyUpToDate / FastForward / Merged) plus a
GraphAssert on the affected entities, or
* an OmniError::MergeConflicts whose entries match the expected
table_key + MergeConflictKind (row_id is optional because edge
ULIDs are generated at runtime).
branch_merge is directional, so the (L, R) and (R, L) cells live in
separate entries in the matrix and are run independently — the
op-pair symmetry encoded in build_case serves as the commutativity
oracle without doubling the runtime. End-to-end the suite runs in
~10s on a fresh build, well under the 30s budget asserted at the
bottom of the test.
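A minimal sketch of how the 81 ordered cells split into 36 executable and 45 unsupported ones. `OpVariant`'s variant spellings, `CellKind`, `is_executable`, `classify`, and `matrix_counts` are hypothetical. The real runner enumerates cells through a 9x9 match in `build_case`; this sketch simplifies that to a per-op exhaustive match, which keeps the same fail-on-omission property.

```rust
/// Hypothetical mirror of the nine ops named in the MR-786 ticket.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OpVariant {
    Noop,
    AddNode,
    RemoveNode,
    AddEdge,
    RemoveEdge,
    SetProperty,
    DropProperty,
    AddLabel,
    RemoveLabel,
}

impl OpVariant {
    pub const ALL: [OpVariant; 9] = [
        OpVariant::Noop,
        OpVariant::AddNode,
        OpVariant::RemoveNode,
        OpVariant::AddEdge,
        OpVariant::RemoveEdge,
        OpVariant::SetProperty,
        OpVariant::DropProperty,
        OpVariant::AddLabel,
        OpVariant::RemoveLabel,
    ];

    /// Whether today's grammar (insert | update set | delete) can express this op.
    /// The match is exhaustive, so a new variant must be classified or the build fails.
    pub fn is_executable(self) -> bool {
        match self {
            Self::Noop
            | Self::AddNode
            | Self::RemoveNode
            | Self::AddEdge
            | Self::RemoveEdge
            | Self::SetProperty => true,
            Self::DropProperty | Self::AddLabel | Self::RemoveLabel => false,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum CellKind {
    Executable,
    Unsupported,
}

/// A cell is executable only if both sides' ops are expressible today.
pub fn classify(left: OpVariant, right: OpVariant) -> CellKind {
    if left.is_executable() && right.is_executable() {
        CellKind::Executable
    } else {
        CellKind::Unsupported
    }
}

/// Counts (executable, unsupported) over all ordered (left, right) pairs;
/// ordered because branch_merge is directional.
pub fn matrix_counts() -> (usize, usize) {
    let (mut executable, mut unsupported) = (0, 0);
    for left in OpVariant::ALL {
        for right in OpVariant::ALL {
            match classify(left, right) {
                CellKind::Executable => executable += 1,
                CellKind::Unsupported => unsupported += 1,
            }
        }
    }
    (executable, unsupported)
}
```

The arithmetic matches the description: 6 x 6 = 36 executable cells, and 81 - 36 = 45 cells that touch at least one of dropProperty/addLabel/removeLabel.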
Also adds a row to docs/testing.md so the test-coverage map points
future agents at this file.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* Use one Omnigraph handle for both branches
Self-review caught that the runner was opening two Omnigraph handles
on the same temp dataset (one for main, a second via Omnigraph::open
for feature). tests/branching.rs uses one handle and passes the branch
name to mutate_branch — same pattern works here and avoids any
cache-coherency surprises between the two handles. Also drops the
post-merge reopen, which only existed to give the second handle a
fresh snapshot.
Runtime drops ~10s -> ~9s.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* Assert exact conflict count, not subset inclusion
cubic and Devin Review both flagged that check_outcome's
Expected::Conflicts arm only enforces want ⊆ got, so a regression that
produces a spurious extra conflict (e.g. emitting both OrphanEdge and
a stray DivergentInsert) would silently pass the truth-table cell.
For a deterministic oracle that's the wrong direction — the cell pins
the exact conflict-artifact set, not a lower bound. Add an
assert_eq!(got.len(), want.len()) before the existence loop. All 36
executable cells still pass; runtime unchanged.
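To illustrate why the length check matters, here is a sketch of both checks over a hypothetical `Conflict` shape. The field names `table_key`, `kind`, and `row_id` follow the description of the oracle; the helper names and the table keys used in practice are made up. A spurious extra conflict passes the subset check but fails the exact one. Length equality plus the existence loop is not full multiset equality (two identical `want` entries could match the same `got` entry), but it closes the extra-conflict hole flagged in review.

```rust
/// Simplified stand-in for MergeConflictKind (variants from the subsumed tests).
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum MergeConflictKind {
    DivergentUpdate,
    DivergentInsert,
    DeleteVsUpdate,
    OrphanEdge,
}

/// Hypothetical conflict record; row_id is optional because edge ULIDs are
/// generated at runtime.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Conflict {
    pub table_key: String,
    pub kind: MergeConflictKind,
    pub row_id: Option<String>,
}

/// A wanted entry matches a reported one when table_key and kind agree and,
/// if the wanted entry pins a row_id, that agrees too.
fn entry_matches(want: &Conflict, got: &Conflict) -> bool {
    want.table_key == got.table_key
        && want.kind == got.kind
        && want.row_id.as_ref().map_or(true, |id| got.row_id.as_ref() == Some(id))
}

/// Old check: every wanted conflict appears somewhere in `got` (want ⊆ got).
pub fn subset_check(want: &[Conflict], got: &[Conflict]) -> bool {
    want.iter().all(|w| got.iter().any(|g| entry_matches(w, g)))
}

/// New check: equal counts first, then the same existence loop.
pub fn exact_check(want: &[Conflict], got: &[Conflict]) -> bool {
    got.len() == want.len() && subset_check(want, got)
}
```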
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* Subsume 4 conflict tests in branching.rs into truth table
The four `branch_merge_reports_*_conflict` tests
(DivergentUpdate / DivergentInsert / DeleteVsUpdate / OrphanEdge)
were redundant with the deterministic-oracle cells in the new
`merge_truth_table.rs` and only added drift risk.
To preserve the post-conflict invariant that lived in
`branch_merge_reports_divergent_update_conflict` (target unchanged
after a failed merge), the truth-table runner now generalizes it:
on every `Conflicts` cell, main's state is asserted against
`state_after_apply_only(right_op)`. That gives strictly more
coverage than the deleted tests carried, since the invariant now
applies to *all* seven conflict cells, not just one.
The `UniqueViolation` and `CardinalityViolation` cases stay in
`branching.rs` — they're combinatorial (require >1 op per side
with a non-default schema) and out of scope for the pair-wise
truth table.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* Fix misleading 'Total edges: 0' comment in (AddEdge, RemoveEdge) cell
Devin Review flagged that the comment said 'Total edges: 0' while the
parenthetical math evaluates to 1 (matching `GraphAssert::base()`).
The assertion is correct; only the leading number in the comment was
wrong. Reworded to 'Net edges: … = 1 (matches base)' so the prose
agrees with both the math and the assertion.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
---------
Co-authored-by: Ragnor <ragnor@modernrelay.com>
Co-authored-by: Ragnor Comerford <ragnor.comerford@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The architectural rule "no cross-query BEGIN/COMMIT; branches fill that
role" lives in docs/invariants.md §VI.23 but is not surfaced anywhere
user-facing. New users coming from Postgres/MySQL hit the gap when they
realize multiple queries on main are independently atomic, not jointly
atomic.
This page explains the model with worked examples:
* Single-query multi-statement (atomic by default)
* Two separate queries on main (NOT atomic — common surprise)
* Many queries via a branch (atomic at merge)
* Coordinating multiple agents via branch-per-agent
Plus a comparison table to BEGIN/COMMIT, failure-mode rundown, and
"when to use what" decision matrix.
Linked from AGENTS.md "Where to find each topic" between
branches-commits.md and runs.md.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The matrix cell d:merge×change:into-target already exercises this
race: pre-fix it flakes ~20% on shared-CPU hardware (sentinel 409s);
post-fix it passes 100% regardless of which side of the racing pair
returns first. That flake-to-stable transition is the regression
signal.
The replacement test
(concurrent_merge_clean_409_does_not_poison_next_change_on_target)
tried to sharpen this by looping until the clean-409 path fired and
then strictly requiring it. On fast CI hardware the race window never
opens in 50 iterations, which made the strict variant fail in CI
despite passing 10/10 locally. The bug genuinely needs a real
concurrent writer to advance the on-disk manifest during the swap
window; a deterministic failpoint can't substitute, because forcing
the merge body to Err without a real concurrent writer leaves no
cache-vs-disk drift to validate.
Reverting to the matrix cell as the sole regression coverage. Updated
the comment in merge.rs accordingly.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
Switch from match-on-Result to if-let-Err so the refresh outcome and
merge_result outcome are checked independently, making the intent
clearer: 'attempt refresh; on Ok-merge-with-refresh-error propagate;
on Err-merge-with-refresh-error log and surface the original merge
error'. No semantic change — both shapes were valid (wildcard patterns
don't move the scrutinee) — but the if-let form sidesteps a
needs-second-reading question raised in code review.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
merge.rs: best-effort refresh on the Err path so a refresh-time
storage error doesn't replace the merge body's structured error
(typically the manifest_conflict that the HTTP layer maps to a 409
with a structured payload) with a less informative one. Ok-path
behavior is unchanged — there a refresh failure is propagated so the
caller knows the coord's cache is unsynced.
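A sketch of the error-precedence rule described in the merge.rs paragraph, written in the if-let-Err shape. `finish_merge` and this two-variant `OmniError` are hypothetical simplifications; the `refresh` closure stands in for whichever coordinator refresh primitive the engine calls (`refresh_coordinator_only`, per the later fix in this log). The refresh always runs, and its error only wins when the merge itself succeeded.

```rust
/// Simplified error type covering only the two shapes this path cares about.
#[derive(Debug, PartialEq, Eq)]
pub enum OmniError {
    /// e.g. the manifest conflict the HTTP layer maps to a structured 409
    ManifestConflict(String),
    /// e.g. a storage error raised while refreshing
    Storage(String),
}

/// Post-restore step: always attempt the refresh, then decide which error wins.
pub fn finish_merge<T>(
    merge_result: Result<T, OmniError>,
    refresh: impl FnOnce() -> Result<(), OmniError>,
) -> Result<T, OmniError> {
    if let Err(refresh_err) = refresh() {
        if merge_result.is_ok() {
            // Ok path: the coord's cache is unsynced, so the caller must see it.
            return Err(refresh_err);
        }
        // Err path: best effort. Log and keep the more informative merge error.
        eprintln!("post-merge refresh failed: {:?}", refresh_err);
    }
    merge_result
}
```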
server.rs: bump MAX_ITERATIONS to 50 and assert at the end that the
named clean-409 path actually fired at least once. With ~20% per-iter
rate on shared-CPU CI (per the original MR-923 repro), P(no hit in
50) is < 0.002%. Without this assertion the test could silently degrade
to exercising only the 200-merge path, which the matrix cell already
covers.
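The < 0.002% figure is (1 - p)^n with p = 0.2 and n = 50, assuming iterations are independent and the ~20% rate actually holds on the CI machine (the later revert of this test found that on fast CI hardware the race never opened in 50 iterations). `p_no_hit` is a hypothetical helper; 0.8^50 is about 1.43e-5, roughly 0.0014%.

```rust
/// Probability that an event with per-iteration probability `p` never fires
/// across `n` independent iterations: (1 - p)^n.
pub fn p_no_hit(p: f64, n: i32) -> f64 {
    (1.0 - p).powi(n)
}
```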
Both changes per Devin Review + cubic comments on PR #80.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
The previous fix used `self.refresh()` to sync the restored
coordinator's cache after the swap-restore window. `refresh` runs the
`RollForwardOnly` recovery sweep — which, on the merge Err path with a
phase-B failure (sidecar written, per-table HEAD advanced, manifest
publish skipped), would observe the merge's own in-flight sidecar and
close it here.
That violates the contract documented on `Omnigraph::refresh`:
> Engine-internal callers that already hold an in-flight sidecar
> (e.g. `schema_apply` mid-write) MUST use `refresh_coordinator_only`
> to avoid the recovery sweep racing their own sidecar.
The post-restore step's purpose is to sync the coord cache with disk,
not to run recovery, so `refresh_coordinator_only` is the right
primitive on both paths. CI surfaced this via
`branch_merge_phase_b_failure_recovered_on_next_open` in
`crates/omnigraph/tests/failpoints.rs`, which asserts the sidecar
persists after the failpoint fires.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
branch_merge_impl swaps the coordinator for the merge target, runs the
merge body, then restores the original coordinator. A concurrent /change
on the same target during this window publishes against the swapped
coord, advancing on-disk manifest state that the restored coord doesn't
see.
The post-restore refresh was previously gated on merge_result.is_ok(),
so the clean-409 path (merge body's post_queue_snapshot drift check
returning a recoverable conflict) left the restored coord's cached
snapshot stale relative to disk. The next sequential /change seeded its
publisher expected_versions from that stale cache and 409'd with
ExpectedVersionMismatch — a non-retryable conflict surfaced to a caller
with no concurrent writer of their own.
Refresh on both Ok and Err paths so cached state cannot diverge from
the manifest across the swap-restore window.
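A toy model of the stale-cache failure. `Disk`, `Coord`, and `next_change_after_merge` are hypothetical stand-ins for the on-disk manifest, the coordinator cache, and the swap-restore sequence; the sketch models the swap as a cloned coordinator and the publisher's expected_versions as a single version number. With the old Ok-only refresh, the clean-409 path leaves the restored coord one version behind and the next sequential change fails; refreshing on both paths fixes it.

```rust
/// Toy on-disk manifest: just a version counter.
pub struct Disk {
    pub version: u64,
}

/// Toy coordinator that caches the manifest version it last saw.
#[derive(Clone)]
pub struct Coord {
    pub cached_version: u64,
}

impl Coord {
    /// Sync the cache with disk (no recovery work).
    pub fn refresh(&mut self, disk: &Disk) {
        self.cached_version = disk.version;
    }

    /// Publish one change, seeding the expected version from the cache.
    pub fn publish(&mut self, disk: &mut Disk) -> Result<u64, String> {
        if disk.version != self.cached_version {
            return Err(format!(
                "ExpectedVersionMismatch: expected {}, found {}",
                self.cached_version, disk.version
            ));
        }
        disk.version += 1;
        self.cached_version = disk.version;
        Ok(disk.version)
    }
}

/// Replays the swap-restore window and returns the outcome of the next
/// sequential change. `merge_ok` picks the merge body's outcome;
/// `refresh_on_both_paths` selects the fixed (true) or old (false) gating.
pub fn next_change_after_merge(merge_ok: bool, refresh_on_both_paths: bool) -> Result<u64, String> {
    let mut disk = Disk { version: 1 };
    let mut original = Coord { cached_version: 1 };

    // Swap: the merge body runs against a separate coordinator for the target.
    let mut swapped = original.clone();
    // A concurrent change publishes through the swapped coord during the window.
    swapped
        .publish(&mut disk)
        .expect("concurrent change publishes cleanly");

    // Restore the original coord. The old code refreshed only when merge_ok.
    if merge_ok || refresh_on_both_paths {
        original.refresh(&disk);
    }

    // Next sequential change from a caller with no concurrent writer of its own.
    original.publish(&mut disk)
}
```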
Add a focused regression test
(concurrent_merge_clean_409_does_not_poison_next_change_on_target) that
loops the cell-d scenario until the clean-409 branch fires and asserts
the follow-up sentinel succeeds in that branch specifically.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>