omnigraph/.github
Andrew Altshuler 98530a0e8a
ci: shard the RustFS S3 integration job across parallel runners (#321)
* ci: shard the RustFS S3 integration job across parallel runners

The RustFS S3 Integration job chronically hit its 75-minute timeout (e.g. on
the v0.8.0 release run) and got cancelled. Root cause is compile time, not test
time: the S3 tests each run in seconds (the write_cost_s3 step took 0.2m once the
engine was built), but the job ran six serial `cargo test` steps across four
crates plus a `--features failpoints` rebuild, and on a cold cache (any Cargo.lock
change, e.g. a release version bump) every suite must recompile the omnigraph-engine
+ Lance/DataFusion tree, summing to ~75m.

Split the suites into a `strategy.matrix.shard` (engine / server / cluster / cli /
failpoints), one suite per shard on its own runner with a per-shard rust-cache key
and `fail-fast: false`. Wall-clock becomes the slowest single shard (~40m cold,
~25m warm) instead of the sum. Bundling suites would not help — each crate adds its
own unique-dep compile on top of the shared substrate — so each gets its own shard;
the failpoints shard is isolated because its distinct feature set recompiles the
engine tree. Timeout lowered from 75 to 50 minutes (headroom over the worst cold shard).
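
A minimal sketch of what the sharded job described above might look like. The shard names, `fail-fast: false`, the 50-minute timeout, and the job name come from this message. The job id, runner label, choice of `Swatinem/rust-cache` as the cache action, cache key format, and the `run-s3-suite.sh` script are assumptions for illustration, not the actual workflow; RustFS service setup is omitted.

```yaml
# Illustrative only: names marked "hypothetical" are not taken from the repo.
jobs:
  rustfs-s3-integration:            # hypothetical job id
    name: RustFS S3 Integration (${{ matrix.shard }})
    runs-on: ubuntu-latest          # assumed runner label
    timeout-minutes: 50             # headroom over the worst cold shard
    strategy:
      fail-fast: false              # one failing shard does not cancel the others
      matrix:
        shard: [engine, server, cluster, cli, failpoints]
    steps:
      - uses: actions/checkout@v4
      - uses: Swatinem/rust-cache@v2   # assumed cache action
        with:
          # Per-shard key so each shard restores its own compiled artifacts.
          key: rustfs-s3-${{ matrix.shard }}
      - name: Run the suite for this shard
        # Hypothetical wrapper that maps a shard name to its `cargo test` call.
        run: ./scripts/run-s3-suite.sh "${{ matrix.shard }}"
```

With this layout each shard compiles and tests independently, so the job's wall-clock time is bounded by the slowest shard instead of the sum of all suites.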

The job is renamed `RustFS S3 Integration (<shard>)`; it is not a required check,
so branch protection is unaffected. Docs updated in docs/dev/ci.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: drop the write_cost_s3 cost gate from the correctness job

The RustFS integration job is a correctness gate. write_cost_s3 is a
deterministic IO-count COST gate (RFC-013 step-3a data-table opener, flat
across commit depth) — a performance contract, not a correctness test.
Cost/perf contracts belong on a dedicated harness with a stable runner and
their own cadence, not on the every-merge correctness path. Remove the step
from the engine shard; a comment + testing.md record how to run it on demand
and note it's pending a dedicated cost harness. The local write_cost.rs
opener/scan-split guard still runs on every PR, so the split stays covered; only
the S3 acceptance of the opener term moves off the correctness path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-07-02 01:15:28 +03:00
| Name | Last commit | Date |
| --- | --- | --- |
| DISCUSSION_TEMPLATE | governance: external contribution model (issues/discussions/RFCs/PRs) (#143) | 2026-06-06 23:58:08 +03:00 |
| ISSUE_TEMPLATE | governance: external contribution model (issues/discussions/RFCs/PRs) (#143) | 2026-06-06 23:58:08 +03:00 |
| workflows | ci: shard the RustFS S3 integration job across parallel runners (#321) | 2026-07-02 01:15:28 +03:00 |
| branch-protection.json | chore: remove CODEOWNERS chassis and the code-owner review gate | 2026-06-18 02:55:27 +03:00 |
| PULL_REQUEST_TEMPLATE.md | governance: external contribution model (issues/discussions/RFCs/PRs) (#143) | 2026-06-06 23:58:08 +03:00 |