omnigraph/docs/dev
Andrew Altshuler 98530a0e8a
ci: shard the RustFS S3 integration job across parallel runners (#321)
* ci: shard the RustFS S3 integration job across parallel runners

The RustFS S3 Integration job chronically hit its 75-minute timeout (e.g. on
the v0.8.0 release run) and was cancelled. The root cause is compile time, not
test time. The S3 tests each run in seconds (the write_cost_s3 step took 0.2m
once the engine was built), but the job ran six serial `cargo test` steps across
four crates plus a `--features failpoints` rebuild. On a cold cache (any
Cargo.lock change, e.g. a release version bump) every suite must recompile the
omnigraph-engine + Lance/DataFusion tree, which sums to ~75m.

Split the suites into a `strategy.matrix.shard` (engine / server / cluster / cli /
failpoints), one suite per shard on its own runner, with a per-shard rust-cache
key and `fail-fast: false`. Wall-clock time becomes that of the slowest single
shard (~40m cold, ~25m warm) instead of the sum. Bundling suites would not help,
because each crate adds its own unique-dep compile on top of the shared
substrate, so each gets its own shard. The failpoints shard is isolated because
its distinct feature set recompiles the engine tree. The timeout is lowered from
75 to 50 minutes (headroom over the worst cold shard).
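
A minimal sketch of what such a sharded job could look like as a GitHub Actions
fragment. This is not the repository's actual workflow. The job id, the runner
label, the `cargo_args` matrix key, the crate names other than omnigraph-engine,
the package chosen for the failpoints shard, and the use of
Swatinem/rust-cache are all assumptions made for illustration, and the steps
that start RustFS are omitted.

```yaml
jobs:
  rustfs-s3-integration:                      # hypothetical job id
    name: RustFS S3 Integration (${{ matrix.shard }})
    runs-on: ubuntu-latest                    # assumed runner label
    timeout-minutes: 50                       # lowered from 75
    strategy:
      fail-fast: false                        # one failing shard does not cancel the others
      matrix:
        include:
          - shard: engine
            cargo_args: "-p omnigraph-engine"
          - shard: server
            cargo_args: "-p omnigraph-server"    # hypothetical crate name
          - shard: cluster
            cargo_args: "-p omnigraph-cluster"   # hypothetical crate name
          - shard: cli
            cargo_args: "-p omnigraph-cli"       # hypothetical crate name
          - shard: failpoints
            # Assumed package; the distinct feature set forces its own engine rebuild.
            cargo_args: "-p omnigraph-engine --features failpoints"
    steps:
      - uses: actions/checkout@v4
      - uses: Swatinem/rust-cache@v2          # assumed cache action
        with:
          key: rustfs-s3-${{ matrix.shard }}  # per-shard cache key (hypothetical prefix)
      # RustFS startup steps omitted.
      - run: cargo test ${{ matrix.cargo_args }}
```

With this shape each shard compiles and tests one suite, so the job's wall-clock
time is bounded by the slowest shard rather than by the sum of all suites.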

The job is renamed `RustFS S3 Integration (<shard>)`; it is not a required check,
so branch protection is unaffected. Docs updated in docs/dev/ci.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: drop the write_cost_s3 cost gate from the correctness job

The RustFS integration job is a correctness gate. write_cost_s3 is a
deterministic IO-count COST gate (RFC-013 step-3a data-table opener, flat
across commit depth): a performance contract, not a correctness test.
Cost/perf contracts belong on a dedicated harness with a stable runner and
their own cadence, not on the every-merge correctness path. Remove the step
from the engine shard; a comment and testing.md record how to run it on demand
and note that it is pending a dedicated cost harness. The local write_cost.rs
opener/scan-split guard still runs on every PR, so the split stays covered; only
the S3 acceptance of the opener term moves off the correctness path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-02 01:15:28 +03:00
architecture.md perf(engine): scope CSR topology index to traversed edges, reuse it cross-branch (#312) 2026-06-28 20:03:06 +02:00
branch-protection.md chore: remove CODEOWNERS chassis and the code-owner review gate 2026-06-18 02:55:27 +03:00
bug-case-fix.md fix(engine): preserve identifier case in filter pushdown (#283) (#285) 2026-06-19 18:42:56 +03:00
ci.md ci: shard the RustFS S3 integration job across parallel runners (#321) 2026-07-02 01:15:28 +03:00
cluster-axioms.md docs(cluster): axiom 15 — single ownership, mode-switch migration, per-operator layer (#164) 2026-06-10 00:44:51 +03:00
cluster-config-implementation-spec.md docs(cluster): RFC-005 — server boots from cluster state (Phase 5 design) (#174) 2026-06-10 15:22:12 +03:00
cluster-config-specs.md docs(user): restructure user docs into topic sections (Phase 1) (#223) 2026-06-14 13:52:14 +03:00
docs-issues.md docs(dev): update coherence ledger — cookbooks drift resolved, omnigraph-ts mechanism (#294) 2026-06-21 00:11:48 +03:00
execution.md perf(engine): scope CSR topology index to traversed edges, reuse it cross-branch (#312) 2026-06-28 20:03:06 +02:00
handoff-rfc-013-write-path.md feat(engine): retire commit-graph tables (#311) 2026-06-28 16:49:49 +02:00
index.md feat(engine): retire commit-graph tables (#311) 2026-06-28 16:49:49 +02:00
invariants.md feat(engine): unify constraint validation across all write surfaces (#314) 2026-06-30 14:06:49 +02:00
lance.md feat(engine): retire commit-graph tables (#311) 2026-06-28 16:49:49 +02:00
merge.md docs: split user and developer docs (#93) 2026-05-15 03:45:22 +03:00
rfc-001-queries-envelope-mcp.md docs(user): restructure user docs into topic sections (Phase 1) (#223) 2026-06-14 13:52:14 +03:00
rfc-002-config-cli-architecture.md docs(rfc): RFC-009 — unify CLI access paths; align the RFC corpus 2026-06-12 17:33:11 +03:00
rfc-003-mcp-server-surface.md Stored-query registry foundation + config/CLI RFC-002 (#128) 2026-06-01 22:50:31 +02:00
rfc-004-cluster-graph-schema-apply.md docs(cluster): document Stage 4C — Phase 4 complete 2026-06-10 14:44:12 +03:00
rfc-005-server-cluster-boot.md fix(cluster): stop cluster-apply crash-loops from the recovery-sidecar trap (#284) 2026-06-19 03:34:15 +03:00
rfc-007-operator-config.md docs(rfc): RFC-009 — unify CLI access paths; align the RFC corpus 2026-06-12 17:33:11 +03:00
rfc-008-deprecate-omnigraph-yaml.md docs(rfc): RFC-009 — unify CLI access paths; align the RFC corpus 2026-06-12 17:33:11 +03:00
rfc-009-unify-access-paths.md feat: canonical POST /load, deprecate /ingest (RFC-009 Phase 5) (#222) 2026-06-14 03:32:16 +03:00
rfc-010-cli-planes-restructure.md docs(rfc): RFC-010 — apply verification-comment current-state fixups (#215) 2026-06-13 22:24:09 +03:00
rfc-011-cli-refactoring.md feat(cli): add read-only profile list / profile show (RFC-011 D8) (#255) 2026-06-15 23:33:01 +03:00
rfc-012-embedding-provider-config.md Wire cluster embedding providers 2026-06-16 04:02:08 +03:00
rfc-013-write-path-latency.md feat(engine): retire commit-graph tables (#311) 2026-06-28 16:49:49 +02:00
schema-lint-v1-plan.md schema-lint chassis v1.0: DropProperty Soft + code-tagged diagnostics (MR-694) (#90) 2026-05-16 16:30:03 +03:00
testing.md ci: shard the RustFS S3 integration job across parallel runners (#321) 2026-07-02 01:15:28 +03:00
versioning.md feat(engine): retire commit-graph tables (#311) 2026-06-28 16:49:49 +02:00
write-latency-roadmap.md feat(engine): retire commit-graph tables (#311) 2026-06-28 16:49:49 +02:00
writes.md feat(engine): retire commit-graph tables (#311) 2026-06-28 16:49:49 +02:00