Mirror of https://github.com/ModernRelay/omnigraph.git (synced 2026-07-03 02:51:04 +02:00)
* ci: shard the RustFS S3 integration job across parallel runners

  The RustFS S3 Integration job chronically hit its 75-minute timeout (e.g. on the v0.8.0 release run) and got cancelled. The root cause is compile time, not test time. The S3 tests each run in seconds (the write_cost_s3 step took 0.2 min once the engine was built), but the job ran six serial `cargo test` steps across four crates plus a `--features failpoints` rebuild. On a cold cache (any Cargo.lock change, e.g. a release version bump), every suite must recompile the omnigraph-engine + Lance/DataFusion tree, and the steps summed to ~75 min.

  Split the suites into a `strategy.matrix.shard` (engine / server / cluster / cli / failpoints), one suite per shard on its own runner, with a per-shard rust-cache key and `fail-fast: false`. Wall-clock time becomes that of the slowest single shard (~40 min cold, ~25 min warm) instead of the sum. Bundling suites would not help, because each crate adds its own unique-dep compile on top of the shared substrate, so each suite gets its own shard. The failpoints shard is isolated because its distinct feature set recompiles the engine tree.

  Timeout lowered from 75 to 50 min (headroom over the worst cold shard). The job is renamed `RustFS S3 Integration (<shard>)`; it is not a required check, so branch protection is unaffected. Docs updated in docs/dev/ci.md.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: drop the write_cost_s3 cost gate from the correctness job

  The RustFS integration job is a correctness gate. write_cost_s3 is a deterministic IO-count COST gate (RFC-013 step-3a data-table opener, flat across commit depth): a performance contract, not a correctness test. Cost/perf contracts belong on a dedicated harness with a stable runner and their own cadence, not on the every-merge correctness path.

  Remove the step from the engine shard; a comment and testing.md record how to run it on demand and note that it is pending a dedicated cost harness. The local write_cost.rs opener/scan-split guard still runs on every PR, so the split stays covered; only the S3 acceptance of the opener term moves off the correctness path.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
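A minimal Python sketch of the wall-clock argument in the sharding commit above: one runner executing every suite back to back pays the sum of the steps, while one runner per matrix shard pays only for the slowest shard, and the per-job timeout only has to cover that shard. The function names are hypothetical, and the model deliberately ignores runner queueing, checkout, and cache-restore time.

```python
from typing import Mapping


def serial_wall_clock(step_minutes: Mapping[str, float]) -> float:
    """One runner runs every suite back to back, so wall-clock is the sum."""
    return sum(step_minutes.values())


def sharded_wall_clock(step_minutes: Mapping[str, float]) -> float:
    """One runner per shard, all started together, so wall-clock is the slowest shard."""
    return max(step_minutes.values())


def timeout_headroom(step_minutes: Mapping[str, float], timeout_minutes: float) -> float:
    """Minutes left before a per-job timeout trips on the slowest shard."""
    return timeout_minutes - sharded_wall_clock(step_minutes)
```

Plugging in the figures quoted in the commit, a worst cold shard of ~40 min against the new 50 min timeout leaves roughly 10 minutes of headroom, whereas the old serial job's ~75 min sum had no margin under its 75 min limit.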

| File |
|---|
| architecture.md |
| branch-protection.md |
| bug-case-fix.md |
| ci.md |
| cluster-axioms.md |
| cluster-config-implementation-spec.md |
| cluster-config-specs.md |
| docs-issues.md |
| execution.md |
| handoff-rfc-013-write-path.md |
| index.md |
| invariants.md |
| lance.md |
| merge.md |
| rfc-001-queries-envelope-mcp.md |
| rfc-002-config-cli-architecture.md |
| rfc-003-mcp-server-surface.md |
| rfc-004-cluster-graph-schema-apply.md |
| rfc-005-server-cluster-boot.md |
| rfc-007-operator-config.md |
| rfc-008-deprecate-omnigraph-yaml.md |
| rfc-009-unify-access-paths.md |
| rfc-010-cli-planes-restructure.md |
| rfc-011-cli-refactoring.md |
| rfc-012-embedding-provider-config.md |
| rfc-013-write-path-latency.md |
| schema-lint-v1-plan.md |
| testing.md |
| versioning.md |
| write-latency-roadmap.md |
| writes.md |