From 81b66f9427953f4fc736cead9d4986421aad91c5 Mon Sep 17 00:00:00 2001 From: Andrew Altshuler Date: Sat, 13 Jun 2026 19:23:41 +0300 Subject: [PATCH] ci: run Test Workspace only on main, not on pull requests (#212) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The full workspace + failpoints suite was the slowest PR gate (~15min warm, up to the 75min cold ceiling) and dominated PR turnaround. Gate the `test` job with `if: github.event_name != 'pull_request'` so it runs only on push to `main` (post-merge), on `v*` tags, and on manual `workflow_dispatch`. `RustFS S3 Integration` needs `test`, so it becomes push-/dispatch-only by the same cascade. Drop `Test Workspace` from the required-check list in branch-protection.json: a required context that never reports on PRs (the job no longer runs there) would leave every PR permanently pending — the job-never-reports trap the policy already documents. Trade-off accepted deliberately (chosen by the maintainer): a regression the suite would catch now lands on `main` and reddens the post-merge run instead of being blocked pre-merge, so `main` can briefly break. Mitigations documented in ci.md: run `cargo test --workspace --locked` locally before merging non-trivial changes (or trigger the workflow on your branch via workflow_dispatch), and regenerate openapi.json locally for server/API changes (the auto-regen step lived in the now-PR-skipped test job). The fast PR gates remain: Classify Changes, Check AGENTS.md Links, the AWS-feature build/test, and the two CODEOWNERS checks. NOTE: an admin must run ./scripts/apply-branch-protection.sh after this merges, or GitHub keeps requiring the now-unreported Test Workspace context. Co-authored-by: Claude Opus 4.8 --- .github/branch-protection.json | 1 - .github/workflows/ci.yml | 20 ++++++++++++++++++++ docs/dev/branch-protection.md | 2 +- docs/dev/ci.md | 3 +++ 4 files changed, 24 insertions(+), 2 deletions(-) diff --git a/.github/branch-protection.json b/.github/branch-protection.json index c039e32..aa1ab19 100644 --- a/.github/branch-protection.json +++ b/.github/branch-protection.json @@ -5,7 +5,6 @@ "contexts": [ "Classify Changes", "Check AGENTS.md Links", - "Test Workspace", "Test omnigraph-server --features aws", "CODEOWNERS matches source", "CODEOWNERS not hand-edited" diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 56ef3e3..fca08da 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -128,6 +128,23 @@ jobs: test: name: Test Workspace needs: classify_changes + # PR latency: the full workspace + failpoints build/test is the slowest + # gate (~15min warm, up to the 75min ceiling cold) and dominated PR + # turnaround. It now runs only on push to `main` (post-merge), on tags, + # and on manual `workflow_dispatch` — NOT on pull_request. Trade-off + # accepted deliberately: a regression is caught on the `main` run after + # merge rather than before it, so `main` can briefly go red. Mitigations: + # (1) `Test Workspace` is removed from required PR checks in + # `.github/branch-protection.json` (a required check that never + # reports would leave every PR permanently pending); + # (2) run the full suite locally before merging risky changes + # (`cargo test --workspace --locked`), or trigger this workflow via + # the Actions "Run workflow" button (workflow_dispatch) on your branch; + # (3) openapi.json is no longer auto-regenerated on PRs (that step lived + # here) — regenerate it locally for server/API changes + # (`OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test openapi`) + # or the strict drift check fails the post-merge `main` run. + if: github.event_name != 'pull_request' runs-on: ubuntu-latest # 75, not 45: a cold rust-cache (every Cargo.lock change) costs a full # workspace + failpoints-feature build on a 2-core runner, which now @@ -274,6 +291,9 @@ jobs: rustfs_integration: name: RustFS S3 Integration + # `needs: test` means this is push-/dispatch-only too: on pull_request the + # `test` job is skipped, so this dependent is skipped with it. S3 + # integration runs post-merge on `main`, alongside the workspace suite. needs: - classify_changes - test diff --git a/docs/dev/branch-protection.md b/docs/dev/branch-protection.md index 2b6cc37..1d1c094 100644 --- a/docs/dev/branch-protection.md +++ b/docs/dev/branch-protection.md @@ -8,7 +8,7 @@ This page explains what the policy says and how to change it. | Setting | Value | Why | |---|---|---| -| **Required status checks (strict)** | `Classify Changes`, `Check AGENTS.md Links`, `Test Workspace`, `Test omnigraph-server --features aws`, `CODEOWNERS matches source`, `CODEOWNERS not hand-edited` | Every PR must pass workspace tests, AGENTS.md link integrity, and the CODEOWNERS hygiene checks. The two CODEOWNERS contexts must equal the job `name:` values in `.github/workflows/codeowners.yml` **verbatim** — a context naming a job that never reports (the old `CODEOWNERS / drift` used the job *id*, and the job was path-filtered) leaves every PR permanently pending and forces admin overrides. `strict: true` requires the branch to be up-to-date with `main` before merge. | +| **Required status checks (strict)** | `Classify Changes`, `Check AGENTS.md Links`, `Test omnigraph-server --features aws`, `CODEOWNERS matches source`, `CODEOWNERS not hand-edited` | Every PR must pass the AWS-feature build/test, AGENTS.md link integrity, and the CODEOWNERS hygiene checks. **`Test Workspace` is deliberately NOT required** — it runs only on push to `main` (post-merge), tags, and manual `workflow_dispatch`, to keep PR turnaround fast (it was the ~15min+ slow gate). It is therefore *not* listed here: a required check that never reports on PRs (the `test` job is `if: github.event_name != 'pull_request'`) would leave every PR permanently pending — the same job-never-reports trap the CODEOWNERS contexts call out below. The trade-off (a regression lands on `main` and is caught by the post-merge run, so `main` can briefly go red) and its mitigations are documented in [ci.md](ci.md). The two CODEOWNERS contexts must equal the job `name:` values in `.github/workflows/codeowners.yml` **verbatim** — a context naming a job that never reports (the old `CODEOWNERS / drift` used the job *id*, and the job was path-filtered) leaves every PR permanently pending and forces admin overrides. `strict: true` requires the branch to be up-to-date with `main` before merge. | | **Required approving reviews** | `1` | At least one reviewer. With a 2-person team, going higher would block all merges when one person is unavailable. | | **Require code-owner reviews** | `true` | The reviewer must be a code owner per `.github/CODEOWNERS`. This is what makes the codeowners chassis enforced. | | **Dismiss stale reviews on new commits** | `true` | A push after approval invalidates the prior review. Prevents the "approve, then sneak in unreviewed changes" pattern. | diff --git a/docs/dev/ci.md b/docs/dev/ci.md index 1124cb4..2e80f40 100644 --- a/docs/dev/ci.md +++ b/docs/dev/ci.md @@ -3,6 +3,9 @@ `.github/workflows/`: - **ci.yml**: text-only changes skip; otherwise `cargo test --workspace --locked` on ubuntu-latest with protobuf compiler. OpenAPI-drift check that auto-commits the regenerated `openapi.json` for same-repository PRs. Also runs the AGENTS.md cross-link integrity check (`scripts/check-agents-md.sh`). + - **`Test Workspace` does not run on pull requests.** The job is gated `if: github.event_name != 'pull_request'`, so the full workspace + failpoints suite runs only on push to `main` (post-merge), on `v*` tags, and on manual `workflow_dispatch`. This was a deliberate PR-latency trade-off — it was the slowest gate (~15min warm, up to the 75min cold ceiling). `RustFS S3 Integration` `needs: test`, so it is push-/dispatch-only for the same reason. The fast PR gates remain: `Classify Changes`, `Check AGENTS.md Links`, `Test omnigraph-server --features aws`, and the two CODEOWNERS checks. `Test Workspace` is correspondingly **not** in the required-check list (`.github/branch-protection.json`); see [branch-protection.md](branch-protection.md). + - **Consequences to internalize:** (1) a regression that the suite would catch now lands on `main` and turns the post-merge run red, rather than being blocked pre-merge — `main` can briefly break, so run `cargo test --workspace --locked` locally before merging anything non-trivial, or trigger this workflow on your branch via the Actions "Run workflow" button. (2) `openapi.json` is no longer auto-regenerated on PRs (that step is inside the `test` job); for server/API changes, regenerate it locally with `OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test openapi` and commit it, or the strict drift check fails the post-merge `main` run. + - **Applying this policy:** removing `Test Workspace` from the JSON is inert until an admin runs `./scripts/apply-branch-protection.sh`. **Run it immediately after this change merges** — until then GitHub still requires a `Test Workspace` context that no longer reports on PRs, which leaves every open PR permanently pending (the job-never-reports trap). - **AWS feature build job**: `cargo build/test -p omnigraph-server --features aws` on ubuntu-latest. - **Windows binary build job**: `cargo build --release --locked -p omnigraph-cli -p omnigraph-server` on windows-latest with smoke checks for `omnigraph.exe version`, `omnigraph-server.exe --help`, and PowerShell installer syntax. - **RustFS S3 integration**: spins up RustFS in Docker, runs `s3_storage`, `server_opens_s3_graph_directly_and_serves_snapshot_and_read`, and `local_cli_s3_end_to_end_init_load_read_flow`.