omnigraph/docs/dev/testing.md
Ragnor Comerford 5243c048aa
perf(engine): remove the per-query metadata re-derivation tax on warm reads (#268)
* test(engine): add read-path IO instrumentation seam for warm-read cost tests

Prerequisite seam for the query-latency fixes. Adds
crates/omnigraph/src/instrumentation.rs:

- CountingStorageAdapter: a StorageAdapter decorator counting per-method
  reads (read_text/exists/read_text_versioned/list_dir), for the
  schema-contract reads on the query path.
- A per-query task-local (QueryIoProbes) carrying Lance WrappingObjectStore
  wrappers per open category plus a probe counter, delivered via
  with_query_io_probes. open_dataset_tracked attaches the wrapper so the
  open itself is counted (ObjectStoreParams.object_store_wrapper).

Wires the wrappers into the manifest open (open_manifest_dataset) and the
commit-graph opens (CommitGraph::open/open_at_branch). Production leaves
the task-local unset, so nothing attaches.

Makes Omnigraph::open_with_storage public so tests can inject the counting
adapter. lance-io is a dev-dependency (IOTracker named only in tests). No
runtime behavior change.

* test(engine): warm same-branch read should reuse the coordinator (red)

Cost-budget test using Lance IOTracker at the object-store boundary (the
LanceDB IO-counted-test pattern). On a 20-commit-deep graph, a warm
same-branch query re-opens a fresh coordinator, which opens both the commit
graph and __manifest. Asserts the read opens the commit graph zero times
and performs exactly one cheap version probe; today it does neither (it
scans the commit graph on re-open and never probes). The freshness guard
already passes. Adds the commit_many helper for history-depth fixtures.

Red half of the Fix 1 red->green pair; turns green with the next commit.

* perf(engine): same-branch reads reuse the warm coordinator (Fix 1)

query()/resolved_target re-opened a fresh GraphCoordinator from storage on
every read (full __manifest scan + two commit-graph scans), so a warm
read's cost grew with commit history (invariant 15) though the data was
unchanged.

resolved_target now serves same-branch reads from the warm in-memory
coordinator, gated by a cheap version probe (latest_version_id, one
object-store op) instead of a full re-open:
- fresh (probe == cached version): return the in-memory snapshot under the
  read lock, with a synthetic (branch, version) id and no commit-graph
  access (reads pin the snapshot by manifest version, not the commit DAG;
  invariant 2).
- stale: take the write lock, re-probe (double-checked; tokio RwLock has no
  read->write upgrade), then refresh_manifest_only (no commit-graph scan),
  preserving strong consistency for external writers (invariant 6).
Cross-branch and snapshot targets keep the existing cold-resolve path.

Adds ManifestCoordinator/GraphCoordinator::probe_latest_version and
GraphCoordinator::refresh_manifest_only. Nothing on the read path needs a
real commit ULID (only RuntimeCache keys on the id, where synthetic is
consistent), per a caller audit.

A warm same-branch read on a 20-commit graph now does zero commit-graph
opens and exactly one probe (down from a deep commit-graph scan) and still
observes external commits. The residual per-table __manifest scans are
removed later by Fix 2.

* test(engine): warm query should validate the schema contract once (red)

ensure_schema_state_valid runs twice per query (query()/run_query_at AND
resolved_target/snapshot_at_version), each reading 3 contract files + 2
existence probes. A warm query thus does 6 read_text + 4 exists where one
validation (3 + 2) suffices, measured via CountingStorageAdapter. Adds a
drift guard (schema_source_drift_is_caught_on_read) that already passes.

Red half of the finding-A red->green pair.

* perf(engine): validate the schema contract once per query (finding A)

ensure_schema_state_valid ran on every query AND again inside
resolved_target / snapshot_at_version, so each query validated the schema
contract twice (~10 storage ops). Removes the redundant query()/
run_query_at() calls; the validation inside resolved_target /
snapshot_at_version still runs, so drift is detected exactly as before.

A source-only fast path was rejected: a long-lived handle must detect
external drift of the schema source, IR, OR state on its next operation
(lifecycle::long_lived_handle_rejects_schema_*), which a source-only
compare would miss. So the only safe latency win is not validating twice.

A warm query now does one validation (3 read_text + 2 exists) instead of
two (6 + 4).

* test(engine): warm + multi-table reads should do zero manifest scans (red)

After Fix 1 a warm same-branch read still scans __manifest ~44 times at
20-commit depth: not from resolution (Fix 1 removed that) but from the
per-table open path, which routes through the Lance namespace and full-scans
__manifest twice per touched table (describe_table + describe_table_version).
Tightens the warm test to assert manifest read_iops == 0 and adds a
multi-table (traversal) test asserting the same, pinning the "2 tables = 2x"
tax. Red half of the Fix 2 red->green pair.

* perf(engine): open touched tables by location+version, not via the namespace (Fix 2)

SubTableEntry::open routed every read-path table open through
DatasetBuilder::from_namespace(BranchManifestNamespace), whose describe_table
full-scans __manifest and, with managed_versioning, makes Lance scan again
(describe_table_version) -- two full __manifest scans per touched table. That
was the residual that made warm-read manifest IO grow with history and the
'2 tables = 2x' multi-table tax.

The resolved Snapshot already holds each table's path/version/branch, so open
directly: from_uri(table_uri_for_path(root, path, branch)).with_version(v).
The branch-qualified location is the dataset that physically holds the version
(main: {path}; branch: {path}/tree/{branch}, Lance native-branch storage), and
with_version resolves it within THAT dataset's _versions. 0 namespace calls +
1 HEAD via the native ConditionalPutCommitHandler.

The read namespace (BranchManifestNamespace) is now unused in production
(writes use StagedTableNamespace), so it, its constructor, and the helpers only
it used (to_namespace_version, publish_requests, their imports) are gated
#[cfg(test)] -- retained to validate the namespace contract in unit tests.
Removes the dead open_table_at_version_from_manifest.

Warm same-branch + multi-table reads now scan __manifest zero times; branch +
time-travel reads stay correct (branching.rs, point_in_time.rs, 2 lib
regression tests); production-lib warnings unchanged (baseline).

* test(engine): cost-budget coverage for branch-warm and stale-refresh reads (matrix)

Extends the read-path cost-budget tests across more of the morphological matrix:
- warm_branch_read_does_no_manifest_scans: a warm read on a non-main branch
  (handle synced to it) scans __manifest zero times, exercising Fix 2's
  branch-owned-table open (tree/{branch} + with_version) on Fix 1's warm path --
  the cell that regressed when the open used with_branch against the base.
- stale_read_refreshes_manifest_only: an external commit makes the next read
  take the stale path, which re-reads the manifest (read_iops > 0) but never
  scans the commit graph (refresh_manifest_only), pinning Fix 1's manifest-only
  refresh.

Cold paths (cross-branch, time-travel) stay behavior-covered (branching.rs,
point_in_time.rs) and are cold by design (Fix 1 warm-paths only same-branch), so
there is no manifest==0 contract to assert there.

* test(engine): same-branch write after external commit must not fork the commit DAG (red)

* fix(engine): refresh commit-graph head before append to prevent same-branch DAG fork

A same-branch write that follows an external commit committed a fresh manifest
version (commit_all rebases the pin from a fresh coordinator) but appended off
the coordinator's stale in-memory commit-graph head, forking the commit DAG (the
new commit and the external commit shared a parent). Pre-existing for non-strict
inserts; widened to strict ops by Fix 1's refresh_manifest_only freshening the
read-time pin. record_graph_commit now refreshes the commit-graph head from
storage before append_commit, so the parent is the true current head.
record_merge_commit is unaffected (it passes explicit parents).

* perf(engine): hold open Dataset handles + share one Session per graph (Fix 3)

A warm same-branch read still re-opened every touched table per query (the
"never warms up" residual after Fix 1+2). A per-graph held-handle cache keyed by
(table_path, branch, version) now serves repeat reads with zero table opens, and
one shared lance::Session per graph warms metadata/index caches across opens.

Validated against LanceDB upstream (rust/lancedb/src/table/dataset.rs
DatasetConsistencyWrapper): hold an Arc<Dataset> and reuse it for 0-IO warm
reads; one Session per connection threaded into opens; writers never serve from
the read cache; time-travel bypasses. One adaptation: omnigraph keys by version
(snapshot-pins-version model) where LanceDB keys per-table+HEAD, reusing the
in-repo GraphIndexCache LRU template.

- ReadCaches (session + TableHandleCache) injected onto live-Branch-read
  snapshots in resolved_target; Snapshot::open serves from the cache or opens
  once with the session on a miss (via the instrumented open_table_dataset).
- Writes (resolved_branch_target -> open HEAD) and time-travel / Snapshot-id
  reads bypass the cache. Version-in-key makes a write a new key (old handle ages
  out via LRU); invalidate_all at branch-switch/refresh is hygiene only.
- Cost tests: a 2nd identical warm read does 0 table opens; a write re-opens only
  the changed table at its new version.

Full engine suite green.

* test(engine): forbid raw data opens in the read/exec layer (P2 guard)

Extend the forbidden-API guard with Dataset::open / DatasetBuilder::from_uri /
from_namespace so the read/exec layer (exec/, loader/, changes/, db/omnigraph/)
cannot bypass Snapshot::open and the held-handle cache (Fix 3). The instrumented
opener (instrumentation.rs) is allow-listed; two legitimate non-read opens (a
test editing __manifest, Hard-drop version GC) carry sentinels. The
storage/manifest layers stay allow-listed.

Lean P2 scope, per LanceDB-upstream + minimize-liability: the data-read boundary
already exists (SubTableEntry::open); this guard pins it so a future read cannot
open around the cache. Centralizing all internal opens behind one opener is
deferred.

* docs(dev): invariant 15 (one source of truth, cheaply derived) + cost-budget testing

Records the principle behind the query-latency work: Lance and the manifest are
the source of truth, everything else a derived view held warm and refreshed by a
cheap probe; the two failure modes (a drifting parallel copy, and cold
re-derivation whose cost grows with history) are deny-listed. Adds the
cost-budget testing discipline (assert a warm read's open/IO count is flat at
commit-history depth, the LanceDB IO-counted pattern) and the warm_read_cost.rs
row. Updates the read-path-re-derivation known gap to reflect what Fix 1/2/3 +
finding A close, and adds the commit-graph-parent-under-concurrency gap.

* fix(engine): branch-incarnation identity + unified invalidation + shared LruMap (PR #268 review)

Phase 6 A-D, correct-by-design responses to the Codex/Greptile P2 review comments. A: warm-read freshness and the table-handle cache key use the manifest incarnation (e_tag, manifest-timestamp fallback, then version), so a deleted+recreated non-main branch reusing a version number cannot be served stale; main stays version-cheap, non-main loads latest_manifest; a detected stale refresh also invalidates read caches; two regression tests force the version collision. B: unify the two cache invalidations into Omnigraph::invalidate_read_caches() at the four sites. C: assert the stale path's probe count. D: shared LruMap behind both caches with unconditional eviction, plus a unit test. Full engine suite green; multi-process lineage fork and O(history) write refresh remain known gaps for Phase 6E/7.
2026-06-17 13:25:20 +02:00

21 KiB
Raw Blame History

Testing

This file is the always-on map of the test surface. Consult it before every task so you know what tests already cover the area you're about to change, what helpers to reuse, and where a new test belongs. The architectural invariant for boundary-matched tests lives in docs/dev/invariants.md.

Where tests live, per crate

Crate Path Style
omnigraph (engine) crates/omnigraph/tests/ Integration tests (28 files), fixture-driven, share tests/helpers/mod.rs
omnigraph-cli crates/omnigraph-cli/tests/ Per-area suites (post-modularization): cli_cluster.rs (cluster command surface + operator-actor cascade), cli_cluster_e2e.rs (spawned-binary lifecycle compositions — lost-state re-import recovery, out-of-band drift, graph-root destruction, multi-graph mixed-disposition convergence), cli_data.rs (load/read/change/branch/commit/export/snapshot/policy/embed/maintenance + operator format cascade), cli_schema_config.rs (init/config, schema plan/apply), cli_queries.rs, parity_matrix.rs (RFC-009 Phase 1: the embedded-vs-remote referee — every forked verb run against both arms with matched Cedar policy and the same actor, scrubbed-JSON + exit-code equality; divergences are pinned in its KNOWN_DIVERGENCES ledger, never silently repaired), system_local.rs (full-cycle cluster lifecycle with a spawned --cluster server, applied-policy enforcement over HTTP, keyed-credential auth, operator aliases), system_remote.rs; share tests/support/mod.rs (hermetic OMNIGRAPH_HOME by default)
omnigraph-cluster mostly in-source #[cfg(test)] mod tests; tests/failpoints.rs (feature-gated); tests/s3_cluster.rs (bucket-gated full lifecycle on object storage) Cluster config parser, local JSON state diff, state CAS/lock handling/recovery, read-only validate/plan/status plus explicit refresh/import graph observations, config-only apply (content-addressed payload publish, disposition gating, composite-digest convergence, idempotent re-apply), catalog payload verification (status read-only, refresh drift + self-heal), failpoint crash-mid-apply / CAS-race coverage, Stage 4A graph creation (create executor, recovery sidecars + sweep rows, create crash windows), Stage 4B schema apply (migration previews in plan, schema executor, schema-apply sweep classification, schema crash windows), Stage 4C gated deletes (digest-bound approvals, delete executor + tombstones, delete sweep rows, delete crash windows), and 5A policy binding metadata (applies_to in the applied revision, binding-change diffing + convergence, pre-5A backfill), and the 5B serving-snapshot read API (converged read, refusal rows)
omnigraph-server crates/omnigraph-server/tests/ Per-area suites (post-modularization): auth_policy.rs, data_routes.rs, schema_routes.rs, stored_queries.rs, multi_graph.rs (cluster-mode boot — converged serving, policy binding wiring, boot refusals — + the concurrent branch-ops matrix), boot_settings.rs (mode inference, PolicySource), s3.rs (bucket-gated: single-graph serving + config-free --cluster s3:// boot), openapi.rs (OpenAPI drift / regeneration); share tests/support/mod.rs
omnigraph-compiler mostly in-source #[cfg(test)] mod tests Parser, type-checker, IR lowering, lint

The engine's tests/ is the principal coverage surface; most graph-shaped behavior is exercised there.

Engine integration tests (crates/omnigraph/tests/)

File Covers
end_to_end.rs Full init → load → query/mutate flow
branching.rs Branch create / list / delete, lazy fork
merge_truth_table.rs Merge-pair truth table (MR-786): all 9×9 (left_op, right_op) cells from {noop, addNode, removeNode, addEdge, removeEdge, setProperty, dropProperty, addLabel, removeLabel}. Adding a new op to OpVariant forces a compile error in build_case until the new row + column are dispositioned. 36 executable cells run through real branch_merge with a structured oracle (MergeOutcome / MergeConflictKind + graph-state assert); 45 cells involving dropProperty/addLabel/removeLabel are recorded as Unsupported until the mutation grammar grows.
writes.rs Direct-publish writes: cancellation, non-strict insert/merge rebase under the per-table queue, strict stale-write conflicts, multi-statement atomicity, MR-794 staged-write rewire (D₂ rejection, insert+update coalesce, multi-append coalesce, partial-failure recovery, load RI/cardinality recovery)
staged_writes.rs TableStore staged-write primitives (stage_append, stage_merge_insert, commit_staged, scan_with_staged, count_rows_with_staged) — primitive-level only; engine code uses the in-memory MutationStaging accumulator instead
forbidden_apis.rs Defense-in-depth source-walk guard: engine code (exec/, db/omnigraph/, loader/, changes/) must not reach around the sealed storage trait to Lance inline-commit APIs, nor open datasets directly (Dataset::open / DatasetBuilder::from_uri/from_namespace) — reads route through Snapshot::open and the held-handle cache; // forbidden-api-allow: <reason> sentinel exempts reviewed lines
lance_surface_guards.rs Pins the Lance API surfaces omnigraph depends on (named runtime + compile-only guards; see lance.md) — the first smoke check on any Lance version bump; e.g. compact_files_still_fails_on_blob_columns turns red when the upstream blob-compaction fix lands
warm_read_cost.rs Cost-budget tests for the warm read path (query-latency work), measured at the object-store boundary with Lance IOTracker (the LanceDB IO-counted pattern): a warm same-branch read does 0 manifest opens, 0 commit-graph opens, 1 version probe, validates the schema once (Fix 1 / finding A / Fix 2 at commit-history depth); stale same-branch reads perform exactly 2 probes and refresh manifest-only; recreated non-main branches with the same Lance version refresh by incarnation; recreated branch-owned table handles are distinguished by table e_tag or refresh-time cache clearing; recreated traversal topology is protected by synthetic snapshot-id incarnation or refresh-time cache clearing; a warm repeat read does 0 table opens via the held-handle cache and a write re-opens only the changed table at its new version/e_tag (Fix 3/6A). See "Cost-budget tests" below
lifecycle.rs Graph lifecycle, schema state
point_in_time.rs Snapshots, time travel (snapshot_at_version, entity_at)
changes.rs diff_between / diff_commits
consistency.rs Cross-table snapshot isolation, atomic publish
schema_apply.rs Migration plan + apply, schema-apply lock; index materialization deferred to the reconciler (iss-848): apply_schema_defers_vector_index_on_empty_table (an empty-table Vector @index never aborts the apply) and index_only_constraint_apply_touches_no_table_data (adding an @index is metadata-only — no table-version bump)
search.rs FTS / vector / hybrid (bm25, nearest, rrf)
traversal.rs Expand, variable-length hops, anti-join (CSR path — OMNIGRAPH_TRAVERSAL_MODE unset)
traversal_indexed.rs BTREE-indexed Expand (execute_expand_indexed) forced via OMNIGRAPH_TRAVERSAL_MODE, asserted semantically equal to the CSR path; own binary, all #[serial] so env writes never race
proptest_equivalence.rs Property-based query-correctness invariants over generated graphs (shared key alphabet forces cross-type id collisions, cycles, self-loops) — pins Expand-mode equivalence so a future fork divergence fails loudly instead of silently; #[serial]
ordering.rs ORDER BY contract: descending, multi-key precedence, deterministic key-column tie-break (total order, so ORDER … LIMIT is deterministic), NULL placement (nulls_first = !descending)
literal_filters.rs Execution goldens for non-string/non-integer scalar literal filters (F64/F32/Bool/Date/DateTime) across both the in-memory comparison arm and the Lance-pushdown arm
aggregation.rs count, sum, avg, min, max
export.rs NDJSON streaming export filters
s3_storage.rs S3-backed graph (skipped unless OMNIGRAPH_S3_TEST_BUCKET is set)
lance_version_columns.rs Per-row _row_last_updated_at_version behavior
validators.rs Schema constraint enforcement (enum, range, unique, cardinality) across JSONL, insert, update paths
policy_engine_chassis.rs Engine-layer Cedar enforcement (MR-722): allow + deny through every _as writer via the SDK directly — no HTTP — proving embedded and CLI callers hit the same gate as the server, with action × scope shapes matching authorize_request
maintenance.rs optimize (compaction), repair (explicit uncovered-drift publish), and cleanup (version GC): empty/idempotent/no-op edges, policy validation, head preservation; optimize publishes its own compaction (optimize_publishes_compaction_to_manifest_so_schema_apply_succeeds), skips pre-existing uncovered drift (optimize_skips_preexisting_manifest_head_drift), and refuses to run while a __recovery sidecar is pending (optimize_defers_when_recovery_sidecar_is_pending); repair previews/heals verified maintenance drift, refuses raw semantic drift without --force, and forced repair publishes only by explicit operator choice; the index reconciler (iss-848): index_build_tolerates_null_vector_rows (an untrainable Vector column defers instead of aborting the build, sibling indexes still build) and optimize_materializes_index_declared_but_unbuilt (optimize creates a declared-but-deferred index)
failpoints.rs Failure-injection coverage (gated on failpoints feature). Includes the five per-writer Phase B → recovery integration tests (recovery_rolls_forward_after_finalize_publisher_failure, schema_apply_phase_b_failure_recovered_on_next_open, branch_merge_phase_b_failure_recovered_on_next_open, ensure_indices_phase_b_failure_recovered_on_next_open, optimize_phase_b_failure_recovered_on_next_open) and the write-entry in-process heal contract (the four *_after_finalize_publisher_failure_heals_without_reopen tests — load, mutation, schema apply, branch merge: a follow-up write on the same handle rolls a sidecar-covered residual forward without reopen/refresh) and the storage-fault matrix for the sidecar lifecycle (recovery.sidecar_{write,delete,list} / recovery.record_audit failpoints: Phase A put failure aborts with zero drift, Phase D delete failure is swallowed and healed by the next write, list failures are loud at heal and open, audit-append failures are retried to exactly one audit row; plus the bucket-gated s3_load_recovers_after_publisher_failure_without_reopen).
recovery.rs Open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via ManifestBatchPublisher::publish, roll-back via Dataset::restore, audit row in _graph_commit_recoveries.lance, OpenMode::ReadOnly skip path
composite_flow.rs Compositional/narrative end-to-end stories — multi-step flows that compose mechanics covered by other test files. Catches integration regressions where individual operations all pass their unit tests but their composition breaks (sequential merges, post-merge main writes, time-travel through merge DAG, reopen consistency over multi-merge histories, post-optimize and post-cleanup strict writes).

Fixtures

crates/omnigraph/tests/fixtures/ holds the canonical schema (.pg), seed data (.jsonl), and queries (.gq) shared across tests. Reuse these before inventing new ones — the helpers harness already knows how to load them.

Test helpers

  • Enginecrates/omnigraph/tests/helpers/mod.rs: init_and_load() (bootstrap a temp graph + load standard fixture), snapshot_main(), snapshot_branch(), query/mutation runners, row collection and counting. Use these instead of hand-rolling.
  • CLIcrates/omnigraph-cli/tests/support/mod.rs: Command-style wrapper for invoking omnigraph, server-process spawning, fixture resolution, output assertion helpers.
  • Server — no shared helpers; server tests call the Omnigraph engine API directly and exercise endpoints over the wire.

Note: the storage adapter has an in-memory backend (ObjectStorageAdapter::in_memory(), full contract including true conditional updates) used by the adapter contract tests in storage.rs. It covers only the text-object layer (sidecars, schema staging, cluster state) — Lance datasets bypass the adapter, so engine integration tests still use tempfile::tempdir(). An in-memory Lance substrate remains an architectural ask — keep it explicit in docs/dev/invariants.md under known gaps.

Failpoints (fault injection)

  • Cargo feature: failpoints = ["dep:fail", "fail/failpoints"] (in crates/omnigraph/Cargo.toml and crates/omnigraph-cluster/Cargo.toml; the cluster feature does not enable the engine's).
  • Wrappers: crates/omnigraph/src/failpoints.rs and crates/omnigraph-cluster/src/failpoints.rs expose maybe_fail("name") and ScopedFailPoint for tests.
  • Call sites are inserted at sensitive transaction boundaries (branch create, graph publish commit, cluster apply's payload→state-write window, etc.).
  • Activated tests: crates/omnigraph/tests/failpoints.rs and crates/omnigraph-cluster/tests/failpoints.rs (crash-mid-apply + state CAS race via fail::cfg_callback; integration binaries, never in-source — the fail registry is process-global). Run with cargo test -p omnigraph-engine --features failpoints --test failpoints / cargo test -p omnigraph-cluster --features failpoints --test failpoints.

RustFS / S3 integration

CI runs three S3-backed tests against a containerized RustFS server (.github/workflows/ci.ymlrustfs_integration job):

  • cargo test -p omnigraph-engine --test s3_storage
  • cargo test -p omnigraph-server --test s3 (single-graph serving + config-free --cluster s3:// boot)
  • cargo test -p omnigraph-cluster --test s3_cluster (full control-plane lifecycle on the bucket)
  • cargo test -p omnigraph-cli --test system_local local_cli_s3_end_to_end_init_load_read_flow
  • cargo test -p omnigraph-engine --features failpoints --test failpoints s3_ (recovery-sidecar lifecycle on a real bucket)

Locally, set OMNIGRAPH_S3_TEST_BUCKET (and the usual AWS_* vars including AWS_ENDPOINT_URL_S3 for non-AWS) before running. Without those, S3 tests skip gracefully.

System e2e requirements and suppression

The CLI system tests (system_local.rs) spawn the workspace-built omnigraph and omnigraph-server binaries (cargo provides paths via CARGO_BIN_EXE_*), bind ephemeral localhost ports, and use local-FS temp dirs — no external services, no env vars required; they run in the default cargo test --workspace. The comprehensive cluster lifecycle e2es (multi-server-restart flows) honor an opt-out for constrained sandboxes: set OMNIGRAPH_SKIP_SYSTEM_E2E=1 to skip them with a logged message (the same graceful-skip pattern as the S3 gate). Cargo-native filtering also works: cargo test --test system_local -- --skip local_cluster.

OpenAPI drift

crates/omnigraph-server/tests/openapi.rs regenerates openapi.json and diffs against the checked-in copy. CI auto-commits the regeneration on same-repository PRs and otherwise runs in strict-check mode (env: OMNIGRAPH_UPDATE_OPENAPI).

Examples & benches

  • crates/omnigraph/examples/bench_expand.rs — runnable example (not part of CI).
  • No benches/ directories. Add benches/ per crate when you ship a perf-driven change, and include the motivating workload with the optimization.

Coverage tooling — what's missing

There is no coverage tooling in the repository today: no tarpaulin.toml, no codecov.yml, no coverage CI step. If you want to know whether your change is covered, the answer comes from reading and running the relevant integration tests, not from a tool.

If introducing coverage tooling is in scope for your task, the natural first step is cargo-llvm-cov wired into a separate CI job, and a per-crate threshold rather than a global one.

First principle: check what already covers it

Before writing any new test, check whether an existing test already covers the case. The cost of duplicating coverage is high: more code to read, more places to keep in sync when behavior changes, and more drift when one copy lags. The cost of extending an existing test is usually one extra assertion or one extra fixture row.

How to check:

  1. Map the change to an area — use the engine integration-test table above (branching.rs, writes.rs, search.rs, etc.). The filename usually names the area.
  2. Open the file and skim every test fn name. Test fn names are the index — read them all, not just the first few.
  3. Grep for the symbol or path you're changing. rg <FunctionName> or rg <enum_variant> across all tests/ directories surfaces existing coverage you might miss.
  4. Decide one of three outcomes, in this order of preference:
    • Existing test already asserts the new behavior → no new test needed; this PR is a refactor or no-op behaviorally. Confirm by running the existing test against the change.
    • Existing test covers the area but not your caseadd an assertion or a fixture row to the existing test, don't write a new function with init_and_load() again.
    • No existing coverage in any test file → only then write a new test; put it in the file that owns the area, or open a new file only if the area itself is new.

Three duplicated init_and_load() → run_query → assert_eq blocks where one parameterized test would do is the most common form of test rot in this repository. Don't add to it.

Before-every-task checklist

When you pick up any change, walk through this:

  1. Find existing coverage (per the principle above). Don't just look at the first test file by name — grep for the symbol you're touching across every crate's tests/.
  2. Run those tests locally before editing. cargo test --workspace --locked for the broad pass; -p <crate> --test <file> for a focused loop. Confirm a clean baseline.
  3. Decide extend-vs-new explicitly. If you can extend an existing test (assertion, fixture row, parameterization), do that. Only add a new test fn or new file if no existing one owns the area.
  4. Reuse the helpers. init_and_load(), fixture files, the CLI support harness — re-use them. Don't bootstrap a fresh graph by hand if a helper exists.
  5. Mind the boundary. Per docs/dev/invariants.md, test at the layer the change lives at — planner-level changes deserve planner-level tests, not just end-to-end.
  6. For substrate-touching changes (Lance behavior), reach for failpoints or fixture-driven scenarios, not stubbed-out mocks.
  7. For server / API changes, confirm the OpenAPI regeneration happens in openapi.rs and that the diff lands in openapi.json.
  8. Verify your change makes an existing test fail before it makes the new one pass. If you can break the code without breaking a test, your coverage gap is the problem to fix first.
  9. Bound hot-path cost at history depth. If the change touches a read or open path, add or extend a test that asserts a bounded cost (e.g. a warm same-branch read performs zero Dataset::open, or a fixed object-op count) against a fixture with realistic commit-history depth, not just realistic row counts. Cost that scales with history is invisible on a shallow fixture and only bites in production. See "Cost-budget tests" below.

Cost-budget tests: bound hot-path cost at history depth

Correctness bugs fail loudly in tests; cost-scaling bugs pass every test and degrade silently in production. The engine read path historically had no cost assertion, and fixtures carry shallow commit history, so an O(commits)-per-query cost stayed green in CI and only surfaced on a long-lived graph (read snapshot resolution re-scanned the internal manifest and commit-graph tables on every query, and those tables were never compacted). Guard against the class:

  • Assert a cost budget, not just a result. For a read/open path, assert the number of Dataset::open calls (or object-store ops) a warm query performs, and that it does not grow with commit count. The reference is LanceDB's IO-counted tests, which assert a cached read costs 0-1 IO and carry a named regression test against "a list call on every subsequent query."
  • Test at history depth. Build a fixture with many commits (not many rows) and assert warm-read cost is flat across depths. A shallow fixture cannot catch an O(commits) cost.
  • This is the testing companion to invariant 15 in docs/dev/invariants.md (hot-path cost is bounded by work, not history).

When in doubt, re-read docs/dev/invariants.md — quality gates apply to every change.