mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-18 02:24:27 +02:00
* test(engine): add read-path IO instrumentation seam for warm-read cost tests
Prerequisite seam for the query-latency fixes. Adds
crates/omnigraph/src/instrumentation.rs:
- CountingStorageAdapter: a StorageAdapter decorator counting per-method
reads (read_text/exists/read_text_versioned/list_dir), for the
schema-contract reads on the query path.
- A per-query task-local (QueryIoProbes) carrying Lance WrappingObjectStore
wrappers per open category plus a probe counter, delivered via
with_query_io_probes. open_dataset_tracked attaches the wrapper so the
open itself is counted (ObjectStoreParams.object_store_wrapper).
Wires the wrappers into the manifest open (open_manifest_dataset) and the
commit-graph opens (CommitGraph::open/open_at_branch). Production leaves
the task-local unset, so nothing attaches.
Makes Omnigraph::open_with_storage public so tests can inject the counting
adapter. lance-io is a dev-dependency (IOTracker named only in tests). No
runtime behavior change.
* test(engine): warm same-branch read should reuse the coordinator (red)
Cost-budget test using Lance IOTracker at the object-store boundary (the
LanceDB IO-counted-test pattern). On a 20-commit-deep graph, a warm
same-branch query re-opens a fresh coordinator, which opens both the commit
graph and __manifest. Asserts the read opens the commit graph zero times
and performs exactly one cheap version probe; today it does neither (it
scans the commit graph on re-open and never probes). The freshness guard
already passes. Adds the commit_many helper for history-depth fixtures.
Red half of the Fix 1 red->green pair; turns green with the next commit.
* perf(engine): same-branch reads reuse the warm coordinator (Fix 1)
query()/resolved_target re-opened a fresh GraphCoordinator from storage on
every read (full __manifest scan + two commit-graph scans), so a warm
read's cost grew with commit history (invariant 15) though the data was
unchanged.
resolved_target now serves same-branch reads from the warm in-memory
coordinator, gated by a cheap version probe (latest_version_id, one
object-store op) instead of a full re-open:
- fresh (probe == cached version): return the in-memory snapshot under the
read lock, with a synthetic (branch, version) id and no commit-graph
access (reads pin the snapshot by manifest version, not the commit DAG;
invariant 2).
- stale: take the write lock, re-probe (double-checked; tokio RwLock has no
read->write upgrade), then refresh_manifest_only (no commit-graph scan),
preserving strong consistency for external writers (invariant 6).
Cross-branch and snapshot targets keep the existing cold-resolve path.
Adds ManifestCoordinator/GraphCoordinator::probe_latest_version and
GraphCoordinator::refresh_manifest_only. Nothing on the read path needs a
real commit ULID (only RuntimeCache keys on the id, where synthetic is
consistent), per a caller audit.
A warm same-branch read on a 20-commit graph now does zero commit-graph
opens and exactly one probe (down from a deep commit-graph scan) and still
observes external commits. The residual per-table __manifest scans are
removed later by Fix 2.
* test(engine): warm query should validate the schema contract once (red)
ensure_schema_state_valid runs twice per query (query()/run_query_at AND
resolved_target/snapshot_at_version), each reading 3 contract files + 2
existence probes. A warm query thus does 6 read_text + 4 exists where one
validation (3 + 2) suffices, measured via CountingStorageAdapter. Adds a
drift guard (schema_source_drift_is_caught_on_read) that already passes.
Red half of the finding-A red->green pair.
* perf(engine): validate the schema contract once per query (finding A)
ensure_schema_state_valid ran on every query AND again inside
resolved_target / snapshot_at_version, so each query validated the schema
contract twice (~10 storage ops). Removes the redundant query()/
run_query_at() calls; the validation inside resolved_target /
snapshot_at_version still runs, so drift is detected exactly as before.
A source-only fast path was rejected: a long-lived handle must detect
external drift of the schema source, IR, OR state on its next operation
(lifecycle::long_lived_handle_rejects_schema_*), which a source-only
compare would miss. So the only safe latency win is not validating twice.
A warm query now does one validation (3 read_text + 2 exists) instead of
two (6 + 4).
* test(engine): warm + multi-table reads should do zero manifest scans (red)
After Fix 1 a warm same-branch read still scans __manifest ~44 times at
20-commit depth: not from resolution (Fix 1 removed that) but from the
per-table open path, which routes through the Lance namespace and full-scans
__manifest twice per touched table (describe_table + describe_table_version).
Tightens the warm test to assert manifest read_iops == 0 and adds a
multi-table (traversal) test asserting the same, pinning the "2 tables = 2x"
tax. Red half of the Fix 2 red->green pair.
* perf(engine): open touched tables by location+version, not via the namespace (Fix 2)
SubTableEntry::open routed every read-path table open through
DatasetBuilder::from_namespace(BranchManifestNamespace), whose describe_table
full-scans __manifest and, with managed_versioning, makes Lance scan again
(describe_table_version) -- two full __manifest scans per touched table. That
was the residual that made warm-read manifest IO grow with history and the
'2 tables = 2x' multi-table tax.
The resolved Snapshot already holds each table's path/version/branch, so open
directly: from_uri(table_uri_for_path(root, path, branch)).with_version(v).
The branch-qualified location is the dataset that physically holds the version
(main: {path}; branch: {path}/tree/{branch}, Lance native-branch storage), and
with_version resolves it within THAT dataset's _versions. 0 namespace calls +
1 HEAD via the native ConditionalPutCommitHandler.
The read namespace (BranchManifestNamespace) is now unused in production
(writes use StagedTableNamespace), so it, its constructor, and the helpers only
it used (to_namespace_version, publish_requests, their imports) are gated
#[cfg(test)] -- retained to validate the namespace contract in unit tests.
Removes the dead open_table_at_version_from_manifest.
Warm same-branch + multi-table reads now scan __manifest zero times; branch +
time-travel reads stay correct (branching.rs, point_in_time.rs, 2 lib
regression tests); production-lib warnings unchanged (baseline).
* test(engine): cost-budget coverage for branch-warm and stale-refresh reads (matrix)
Extends the read-path cost-budget tests across more of the morphological matrix:
- warm_branch_read_does_no_manifest_scans: a warm read on a non-main branch
(handle synced to it) scans __manifest zero times, exercising Fix 2's
branch-owned-table open (tree/{branch} + with_version) on Fix 1's warm path --
the cell that regressed when the open used with_branch against the base.
- stale_read_refreshes_manifest_only: an external commit makes the next read
take the stale path, which re-reads the manifest (read_iops > 0) but never
scans the commit graph (refresh_manifest_only), pinning Fix 1's manifest-only
refresh.
Cold paths (cross-branch, time-travel) stay behavior-covered (branching.rs,
point_in_time.rs) and are cold by design (Fix 1 warm-paths only same-branch), so
there is no manifest==0 contract to assert there.
* test(engine): same-branch write after external commit must not fork the commit DAG (red)
* fix(engine): refresh commit-graph head before append to prevent same-branch DAG fork
A same-branch write that follows an external commit committed a fresh manifest
version (commit_all rebases the pin from a fresh coordinator) but appended off
the coordinator's stale in-memory commit-graph head, forking the commit DAG (the
new commit and the external commit shared a parent). Pre-existing for non-strict
inserts; widened to strict ops by Fix 1's refresh_manifest_only freshening the
read-time pin. record_graph_commit now refreshes the commit-graph head from
storage before append_commit, so the parent is the true current head.
record_merge_commit is unaffected (it passes explicit parents).
* perf(engine): hold open Dataset handles + share one Session per graph (Fix 3)
A warm same-branch read still re-opened every touched table per query (the
"never warms up" residual after Fix 1+2). A per-graph held-handle cache keyed by
(table_path, branch, version) now serves repeat reads with zero table opens, and
one shared lance::Session per graph warms metadata/index caches across opens.
Validated against LanceDB upstream (rust/lancedb/src/table/dataset.rs
DatasetConsistencyWrapper): hold an Arc<Dataset> and reuse it for 0-IO warm
reads; one Session per connection threaded into opens; writers never serve from
the read cache; time-travel bypasses. One adaptation: omnigraph keys by version
(snapshot-pins-version model) where LanceDB keys per-table+HEAD, reusing the
in-repo GraphIndexCache LRU template.
- ReadCaches (session + TableHandleCache) injected onto live-Branch-read
snapshots in resolved_target; Snapshot::open serves from the cache or opens
once with the session on a miss (via the instrumented open_table_dataset).
- Writes (resolved_branch_target -> open HEAD) and time-travel / Snapshot-id
reads bypass the cache. Version-in-key makes a write a new key (old handle ages
out via LRU); invalidate_all at branch-switch/refresh is hygiene only.
- Cost tests: a 2nd identical warm read does 0 table opens; a write re-opens only
the changed table at its new version.
Full engine suite green.
* test(engine): forbid raw data opens in the read/exec layer (P2 guard)
Extend the forbidden-API guard with Dataset::open / DatasetBuilder::from_uri /
from_namespace so the read/exec layer (exec/, loader/, changes/, db/omnigraph/)
cannot bypass Snapshot::open and the held-handle cache (Fix 3). The instrumented
opener (instrumentation.rs) is allow-listed; two legitimate non-read opens (a
test editing __manifest, Hard-drop version GC) carry sentinels. The
storage/manifest layers stay allow-listed.
Lean P2 scope, per LanceDB-upstream + minimize-liability: the data-read boundary
already exists (SubTableEntry::open); this guard pins it so a future read cannot
open around the cache. Centralizing all internal opens behind one opener is
deferred.
* docs(dev): invariant 15 (one source of truth, cheaply derived) + cost-budget testing
Records the principle behind the query-latency work: Lance and the manifest are
the source of truth, everything else a derived view held warm and refreshed by a
cheap probe; the two failure modes (a drifting parallel copy, and cold
re-derivation whose cost grows with history) are deny-listed. Adds the
cost-budget testing discipline (assert a warm read's open/IO count is flat at
commit-history depth, the LanceDB IO-counted pattern) and the warm_read_cost.rs
row. Updates the read-path-re-derivation known gap to reflect what Fix 1/2/3 +
finding A close, and adds the commit-graph-parent-under-concurrency gap.
* fix(engine): branch-incarnation identity + unified invalidation + shared LruMap (PR #268 review)
Phase 6 A-D, correct-by-design responses to the Codex/Greptile P2 review comments. A: warm-read freshness and the table-handle cache key use the manifest incarnation (e_tag, manifest-timestamp fallback, then version), so a deleted+recreated non-main branch reusing a version number cannot be served stale; main stays version-cheap, non-main loads latest_manifest; a detected stale refresh also invalidates read caches; two regression tests force the version collision. B: unify the two cache invalidations into Omnigraph::invalidate_read_caches() at the four sites. C: assert the stale path's probe count. D: shared LruMap behind both caches with unconditional eviction, plus a unit test. Full engine suite green; multi-process lineage fork and O(history) write refresh remain known gaps for Phase 6E/7.
139 lines
21 KiB
Markdown
139 lines
21 KiB
Markdown
# Testing
|
||
|
||
This file is the always-on map of the test surface. **Consult it before every task** so you know what tests already cover the area you're about to change, what helpers to reuse, and where a new test belongs. The architectural invariant for boundary-matched tests lives in [docs/dev/invariants.md](invariants.md).
|
||
|
||
## Where tests live, per crate
|
||
|
||
| Crate | Path | Style |
|
||
|---|---|---|
|
||
| `omnigraph` (engine) | `crates/omnigraph/tests/` | Integration tests (28 files), fixture-driven, share `tests/helpers/mod.rs` |
|
||
| `omnigraph-cli` | `crates/omnigraph-cli/tests/` | Per-area suites (post-modularization): `cli_cluster.rs` (cluster command surface + operator-actor cascade), `cli_cluster_e2e.rs` (spawned-binary lifecycle compositions — lost-state re-import recovery, out-of-band drift, graph-root destruction, multi-graph mixed-disposition convergence), `cli_data.rs` (load/read/change/branch/commit/export/snapshot/policy/embed/maintenance + operator format cascade), `cli_schema_config.rs` (init/config, schema plan/apply), `cli_queries.rs`, `parity_matrix.rs` (RFC-009 Phase 1: the embedded-vs-remote referee — every forked verb run against both arms with matched Cedar policy and the same actor, scrubbed-JSON + exit-code equality; divergences are pinned in its `KNOWN_DIVERGENCES` ledger, never silently repaired), `system_local.rs` (full-cycle cluster lifecycle with a spawned `--cluster` server, applied-policy enforcement over HTTP, keyed-credential auth, operator aliases), `system_remote.rs`; share `tests/support/mod.rs` (hermetic `OMNIGRAPH_HOME` by default) |
|
||
| `omnigraph-cluster` | mostly in-source `#[cfg(test)] mod tests`; `tests/failpoints.rs` (feature-gated); `tests/s3_cluster.rs` (bucket-gated full lifecycle on object storage) | Cluster config parser, local JSON state diff, state CAS/lock handling/recovery, read-only validate/plan/status plus explicit refresh/import graph observations, config-only apply (content-addressed payload publish, disposition gating, composite-digest convergence, idempotent re-apply), catalog payload verification (status read-only, refresh drift + self-heal), failpoint crash-mid-apply / CAS-race coverage, Stage 4A graph creation (create executor, recovery sidecars + sweep rows, create crash windows), Stage 4B schema apply (migration previews in plan, schema executor, schema-apply sweep classification, schema crash windows), Stage 4C gated deletes (digest-bound approvals, delete executor + tombstones, delete sweep rows, delete crash windows), and 5A policy binding metadata (applies_to in the applied revision, binding-change diffing + convergence, pre-5A backfill), and the 5B serving-snapshot read API (converged read, refusal rows) |
|
||
| `omnigraph-server` | `crates/omnigraph-server/tests/` | Per-area suites (post-modularization): `auth_policy.rs`, `data_routes.rs`, `schema_routes.rs`, `stored_queries.rs`, `multi_graph.rs` (cluster-mode boot — converged serving, policy binding wiring, boot refusals — + the concurrent branch-ops matrix), `boot_settings.rs` (mode inference, PolicySource), `s3.rs` (bucket-gated: single-graph serving + config-free `--cluster s3://` boot), `openapi.rs` (OpenAPI drift / regeneration); share `tests/support/mod.rs` |
|
||
| `omnigraph-compiler` | mostly in-source `#[cfg(test)] mod tests` | Parser, type-checker, IR lowering, lint |
|
||
|
||
The engine's `tests/` is the principal coverage surface; most graph-shaped behavior is exercised there.
|
||
|
||
## Engine integration tests (`crates/omnigraph/tests/`)
|
||
|
||
| File | Covers |
|
||
|---|---|
|
||
| `end_to_end.rs` | Full init → load → query/mutate flow |
|
||
| `branching.rs` | Branch create / list / delete, lazy fork |
|
||
| `merge_truth_table.rs` | Merge-pair truth table (MR-786): all 9×9 `(left_op, right_op)` cells from `{noop, addNode, removeNode, addEdge, removeEdge, setProperty, dropProperty, addLabel, removeLabel}`. Adding a new op to `OpVariant` forces a compile error in `build_case` until the new row + column are dispositioned. 36 executable cells run through real `branch_merge` with a structured oracle (`MergeOutcome` / `MergeConflictKind` + graph-state assert); 45 cells involving `dropProperty`/`addLabel`/`removeLabel` are recorded as `Unsupported` until the mutation grammar grows. |
|
||
| `writes.rs` | Direct-publish writes: cancellation, non-strict insert/merge rebase under the per-table queue, strict stale-write conflicts, multi-statement atomicity, MR-794 staged-write rewire (D₂ rejection, insert+update coalesce, multi-append coalesce, partial-failure recovery, load RI/cardinality recovery) |
|
||
| `staged_writes.rs` | TableStore staged-write primitives (`stage_append`, `stage_merge_insert`, `commit_staged`, `scan_with_staged`, `count_rows_with_staged`) — primitive-level only; engine code uses the in-memory `MutationStaging` accumulator instead |
|
||
| `forbidden_apis.rs` | Defense-in-depth source-walk guard: engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) must not reach around the sealed storage trait to Lance inline-commit APIs, nor open datasets directly (`Dataset::open` / `DatasetBuilder::from_uri`/`from_namespace`) — reads route through `Snapshot::open` and the held-handle cache; `// forbidden-api-allow: <reason>` sentinel exempts reviewed lines |
|
||
| `lance_surface_guards.rs` | Pins the Lance API surfaces omnigraph depends on (named runtime + compile-only guards; see [lance.md](lance.md)) — the first smoke check on any Lance version bump; e.g. `compact_files_still_fails_on_blob_columns` turns red when the upstream blob-compaction fix lands |
|
||
| `warm_read_cost.rs` | Cost-budget tests for the warm read path (query-latency work), measured at the object-store boundary with Lance `IOTracker` (the LanceDB IO-counted pattern): a warm same-branch read does 0 manifest opens, 0 commit-graph opens, 1 version probe, validates the schema once (Fix 1 / finding A / Fix 2 at commit-history depth); stale same-branch reads perform exactly 2 probes and refresh manifest-only; recreated non-main branches with the same Lance version refresh by incarnation; recreated branch-owned table handles are distinguished by table e_tag or refresh-time cache clearing; recreated traversal topology is protected by synthetic snapshot-id incarnation or refresh-time cache clearing; a warm *repeat* read does 0 table opens via the held-handle cache and a write re-opens only the changed table at its new version/e_tag (Fix 3/6A). See "Cost-budget tests" below |
|
||
| `lifecycle.rs` | Graph lifecycle, schema state |
|
||
| `point_in_time.rs` | Snapshots, time travel (`snapshot_at_version`, `entity_at`) |
|
||
| `changes.rs` | `diff_between` / `diff_commits` |
|
||
| `consistency.rs` | Cross-table snapshot isolation, atomic publish |
|
||
| `schema_apply.rs` | Migration plan + apply, schema-apply lock; index materialization deferred to the reconciler (iss-848): `apply_schema_defers_vector_index_on_empty_table` (an empty-table Vector `@index` never aborts the apply) and `index_only_constraint_apply_touches_no_table_data` (adding an `@index` is metadata-only — no table-version bump) |
|
||
| `search.rs` | FTS / vector / hybrid (`bm25`, `nearest`, `rrf`) |
|
||
| `traversal.rs` | `Expand`, variable-length hops, anti-join (CSR path — `OMNIGRAPH_TRAVERSAL_MODE` unset) |
|
||
| `traversal_indexed.rs` | BTREE-indexed Expand (`execute_expand_indexed`) forced via `OMNIGRAPH_TRAVERSAL_MODE`, asserted semantically equal to the CSR path; own binary, all `#[serial]` so env writes never race |
|
||
| `proptest_equivalence.rs` | Property-based query-correctness invariants over generated graphs (shared key alphabet forces cross-type id collisions, cycles, self-loops) — pins Expand-mode equivalence so a future fork divergence fails loudly instead of silently; `#[serial]` |
|
||
| `ordering.rs` | ORDER BY contract: descending, multi-key precedence, deterministic key-column tie-break (total order, so `ORDER … LIMIT` is deterministic), NULL placement (`nulls_first = !descending`) |
|
||
| `literal_filters.rs` | Execution goldens for non-string/non-integer scalar literal filters (F64/F32/Bool/Date/DateTime) across both the in-memory comparison arm and the Lance-pushdown arm |
|
||
| `aggregation.rs` | `count`, `sum`, `avg`, `min`, `max` |
|
||
| `export.rs` | NDJSON streaming export filters |
|
||
| `s3_storage.rs` | S3-backed graph (skipped unless `OMNIGRAPH_S3_TEST_BUCKET` is set) |
|
||
| `lance_version_columns.rs` | Per-row `_row_last_updated_at_version` behavior |
|
||
| `validators.rs` | Schema constraint enforcement (enum, range, unique, cardinality) across JSONL, insert, update paths |
|
||
| `policy_engine_chassis.rs` | Engine-layer Cedar enforcement (MR-722): allow + deny through every `_as` writer via the SDK directly — no HTTP — proving embedded and CLI callers hit the same gate as the server, with action × scope shapes matching `authorize_request` |
|
||
| `maintenance.rs` | `optimize` (compaction), `repair` (explicit uncovered-drift publish), and `cleanup` (version GC): empty/idempotent/no-op edges, policy validation, head preservation; `optimize` publishes its own compaction (`optimize_publishes_compaction_to_manifest_so_schema_apply_succeeds`), skips pre-existing uncovered drift (`optimize_skips_preexisting_manifest_head_drift`), and refuses to run while a `__recovery` sidecar is pending (`optimize_defers_when_recovery_sidecar_is_pending`); `repair` previews/heals verified maintenance drift, refuses raw semantic drift without `--force`, and forced repair publishes only by explicit operator choice; the index reconciler (iss-848): `index_build_tolerates_null_vector_rows` (an untrainable Vector column defers instead of aborting the build, sibling indexes still build) and `optimize_materializes_index_declared_but_unbuilt` (optimize creates a declared-but-deferred index) |
|
||
| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the five per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`, `optimize_phase_b_failure_recovered_on_next_open`) and the write-entry in-process heal contract (the four `*_after_finalize_publisher_failure_heals_without_reopen` tests — load, mutation, schema apply, branch merge: a follow-up write on the same handle rolls a sidecar-covered residual forward without reopen/refresh) and the storage-fault matrix for the sidecar lifecycle (`recovery.sidecar_{write,delete,list}` / `recovery.record_audit` failpoints: Phase A put failure aborts with zero drift, Phase D delete failure is swallowed and healed by the next write, list failures are loud at heal and open, audit-append failures are retried to exactly one audit row; plus the bucket-gated `s3_load_recovers_after_publisher_failure_without_reopen`). |
|
||
| `recovery.rs` | Open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via `ManifestBatchPublisher::publish`, roll-back via `Dataset::restore`, audit row in `_graph_commit_recoveries.lance`, `OpenMode::ReadOnly` skip path |
|
||
| `composite_flow.rs` | Compositional/narrative end-to-end stories — multi-step flows that compose mechanics covered by other test files. Catches integration regressions where individual operations all pass their unit tests but their composition breaks (sequential merges, post-merge main writes, time-travel through merge DAG, reopen consistency over multi-merge histories, post-optimize and post-cleanup strict writes). |
|
||
|
||
## Fixtures
|
||
|
||
`crates/omnigraph/tests/fixtures/` holds the canonical schema (`.pg`), seed data (`.jsonl`), and queries (`.gq`) shared across tests. Reuse these before inventing new ones — the helpers harness already knows how to load them.
|
||
|
||
## Test helpers
|
||
|
||
- **Engine** — `crates/omnigraph/tests/helpers/mod.rs`: `init_and_load()` (bootstrap a temp graph + load standard fixture), `snapshot_main()`, `snapshot_branch()`, query/mutation runners, row collection and counting. Use these instead of hand-rolling.
|
||
- **CLI** — `crates/omnigraph-cli/tests/support/mod.rs`: `Command`-style wrapper for invoking `omnigraph`, server-process spawning, fixture resolution, output assertion helpers.
|
||
- **Server** — no shared helpers; server tests call the `Omnigraph` engine API directly and exercise endpoints over the wire.
|
||
|
||
> Note: the storage adapter has an in-memory backend (`ObjectStorageAdapter::in_memory()`, full contract including true conditional updates) used by the adapter contract tests in `storage.rs`. It covers only the text-object layer (sidecars, schema staging, cluster state) — **Lance datasets bypass the adapter**, so engine integration tests still use `tempfile::tempdir()`. An in-memory Lance substrate remains an architectural ask — keep it explicit in [docs/dev/invariants.md](invariants.md) under known gaps.
|
||
|
||
## Failpoints (fault injection)
|
||
|
||
- Cargo feature: `failpoints = ["dep:fail", "fail/failpoints"]` (in `crates/omnigraph/Cargo.toml` **and** `crates/omnigraph-cluster/Cargo.toml`; the cluster feature does not enable the engine's).
|
||
- Wrappers: `crates/omnigraph/src/failpoints.rs` and `crates/omnigraph-cluster/src/failpoints.rs` expose `maybe_fail("name")` and `ScopedFailPoint` for tests.
|
||
- Call sites are inserted at sensitive transaction boundaries (branch create, graph publish commit, cluster apply's payload→state-write window, etc.).
|
||
- Activated tests: `crates/omnigraph/tests/failpoints.rs` and `crates/omnigraph-cluster/tests/failpoints.rs` (crash-mid-apply + state CAS race via `fail::cfg_callback`; integration binaries, never in-source — the fail registry is process-global). Run with `cargo test -p omnigraph-engine --features failpoints --test failpoints` / `cargo test -p omnigraph-cluster --features failpoints --test failpoints`.
|
||
|
||
## RustFS / S3 integration
|
||
|
||
CI runs three S3-backed tests against a containerized RustFS server (`.github/workflows/ci.yml` → `rustfs_integration` job):
|
||
|
||
- `cargo test -p omnigraph-engine --test s3_storage`
|
||
- `cargo test -p omnigraph-server --test s3` (single-graph serving + config-free `--cluster s3://` boot)
|
||
- `cargo test -p omnigraph-cluster --test s3_cluster` (full control-plane lifecycle on the bucket)
|
||
- `cargo test -p omnigraph-cli --test system_local local_cli_s3_end_to_end_init_load_read_flow`
|
||
- `cargo test -p omnigraph-engine --features failpoints --test failpoints s3_` (recovery-sidecar lifecycle on a real bucket)
|
||
|
||
Locally, set `OMNIGRAPH_S3_TEST_BUCKET` (and the usual `AWS_*` vars including `AWS_ENDPOINT_URL_S3` for non-AWS) before running. Without those, S3 tests skip gracefully.
|
||
|
||
## System e2e requirements and suppression
|
||
|
||
The CLI system tests (`system_local.rs`) spawn the workspace-built `omnigraph` and `omnigraph-server` binaries (cargo provides paths via `CARGO_BIN_EXE_*`), bind ephemeral localhost ports, and use local-FS temp dirs — no external services, no env vars required; they run in the default `cargo test --workspace`. The comprehensive cluster lifecycle e2es (multi-server-restart flows) honor an opt-out for constrained sandboxes: set `OMNIGRAPH_SKIP_SYSTEM_E2E=1` to skip them with a logged message (the same graceful-skip pattern as the S3 gate). Cargo-native filtering also works: `cargo test --test system_local -- --skip local_cluster`.
|
||
|
||
## OpenAPI drift
|
||
|
||
`crates/omnigraph-server/tests/openapi.rs` regenerates `openapi.json` and diffs against the checked-in copy. CI auto-commits the regeneration on same-repository PRs and otherwise runs in strict-check mode (env: `OMNIGRAPH_UPDATE_OPENAPI`).
|
||
|
||
## Examples & benches
|
||
|
||
- `crates/omnigraph/examples/bench_expand.rs` — runnable example (not part of CI).
|
||
- No `benches/` directories. Add `benches/` per crate when you ship a perf-driven change, and include the motivating workload with the optimization.
|
||
|
||
## Coverage tooling — what's missing
|
||
|
||
There is **no** coverage tooling in the repository today: no `tarpaulin.toml`, no `codecov.yml`, no coverage CI step. If you want to know whether your change is covered, the answer comes from reading and running the relevant integration tests, not from a tool.
|
||
|
||
If introducing coverage tooling is in scope for your task, the natural first step is `cargo-llvm-cov` wired into a separate CI job, and a per-crate threshold rather than a global one.
|
||
|
||
## First principle: check what already covers it
|
||
|
||
**Before writing any new test, check whether an existing test already covers the case.** The cost of duplicating coverage is high: more code to read, more places to keep in sync when behavior changes, and more drift when one copy lags. The cost of *extending* an existing test is usually one extra assertion or one extra fixture row.
|
||
|
||
How to check:
|
||
|
||
1. **Map the change to an area** — use the engine integration-test table above (`branching.rs`, `writes.rs`, `search.rs`, etc.). The filename usually names the area.
|
||
2. **Open the file and skim every test fn name.** Test fn names are the index — read them all, not just the first few.
|
||
3. **Grep for the symbol or path you're changing.** `rg <FunctionName>` or `rg <enum_variant>` across all `tests/` directories surfaces existing coverage you might miss.
|
||
4. **Decide one of three outcomes**, in this order of preference:
|
||
- *Existing test already asserts the new behavior* → no new test needed; this PR is a refactor or no-op behaviorally. Confirm by running the existing test against the change.
|
||
- *Existing test covers the area but not your case* → **add an assertion or a fixture row to the existing test**, don't write a new function with `init_and_load()` again.
|
||
- *No existing coverage in any test file* → only then write a new test; put it in the file that owns the area, or open a new file only if the area itself is new.
|
||
|
||
Three duplicated `init_and_load() → run_query → assert_eq` blocks where one parameterized test would do is the most common form of test rot in this repository. Don't add to it.
|
||
|
||
## Before-every-task checklist
|
||
|
||
When you pick up any change, walk through this:
|
||
|
||
1. **Find existing coverage** (per the principle above). Don't just look at the first test file by name — grep for the symbol you're touching across every crate's `tests/`.
|
||
2. **Run those tests locally before editing.** `cargo test --workspace --locked` for the broad pass; `-p <crate> --test <file>` for a focused loop. Confirm a clean baseline.
|
||
3. **Decide extend-vs-new** explicitly. If you can extend an existing test (assertion, fixture row, parameterization), do that. Only add a new test fn or new file if no existing one owns the area.
|
||
4. **Reuse the helpers.** `init_and_load()`, fixture files, the CLI `support` harness — re-use them. Don't bootstrap a fresh graph by hand if a helper exists.
|
||
5. **Mind the boundary.** Per [docs/dev/invariants.md](invariants.md), test at the layer the change lives at — planner-level changes deserve planner-level tests, not just end-to-end.
|
||
6. **For substrate-touching changes** (Lance behavior), reach for `failpoints` or fixture-driven scenarios, not stubbed-out mocks.
|
||
7. **For server / API changes**, confirm the OpenAPI regeneration happens in `openapi.rs` and that the diff lands in `openapi.json`.
|
||
8. **Verify your change makes an existing test fail before it makes the new one pass.** If you can break the code without breaking a test, your coverage gap is the problem to fix first.
|
||
9. **Bound hot-path cost at history depth.** If the change touches a read or open path, add or extend a test that asserts a *bounded* cost (e.g. a warm same-branch read performs zero `Dataset::open`, or a fixed object-op count) against a fixture with realistic *commit-history depth*, not just realistic row counts. Cost that scales with history is invisible on a shallow fixture and only bites in production. See "Cost-budget tests" below.
|
||
|
||
## Cost-budget tests: bound hot-path cost at history depth
|
||
|
||
Correctness bugs fail loudly in tests; cost-scaling bugs pass every test and degrade silently in production. The engine read path historically had no cost assertion, and fixtures carry shallow commit history, so an O(commits)-per-query cost stayed green in CI and only surfaced on a long-lived graph (read snapshot resolution re-scanned the internal manifest and commit-graph tables on every query, and those tables were never compacted). Guard against the class:
|
||
|
||
- **Assert a cost budget, not just a result.** For a read/open path, assert the number of `Dataset::open` calls (or object-store ops) a warm query performs, and that it does not grow with commit count. The reference is LanceDB's IO-counted tests, which assert a cached read costs 0-1 IO and carry a named regression test against "a list call on every subsequent query."
|
||
- **Test at history depth.** Build a fixture with many *commits* (not many rows) and assert warm-read cost is flat across depths. A shallow fixture cannot catch an O(commits) cost.
|
||
- This is the testing companion to invariant 15 in [docs/dev/invariants.md](invariants.md) (hot-path cost is bounded by work, not history).
|
||
|
||
When in doubt, re-read [docs/dev/invariants.md](invariants.md) — quality gates apply to every change.
|