# Debugging and Managing Flaky Tests

## Table of Contents

1. [Understanding Flakiness Types](#understanding-flakiness-types)
2. [Detection and Reproduction](#detection-and-reproduction)
3. [Root Cause Analysis](#root-cause-analysis)
4. [Fixing Strategies by Type](#fixing-strategies-by-type)
5. [CI-Specific Flakiness](#ci-specific-flakiness)
6. [Quarantine and Management](#quarantine-and-management)
7. [Prevention Strategies](#prevention-strategies)
8. [Anti-Patterns to Avoid](#anti-patterns-to-avoid)
9. [Related References](#related-references)

## Understanding Flakiness Types

### Categories of Flakiness

Most flaky tests fall into one of four categories, each needing a different fix:

| Category                    | Symptoms                        | Common Causes                                          |
| --------------------------- | ------------------------------- | ------------------------------------------------------ |
| **UI-driven**               | Element not found, click missed | Missing waits, animations, dynamic rendering           |
| **Environment-driven**      | CI-only failures                | Slower CPU, memory limits, cold browser starts         |
| **Data/parallelism-driven** | Fails with multiple workers     | Shared backend data, reused accounts, state collisions |
| **Test-suite-driven**       | Fails when run with other tests | Leaked state, shared fixtures, order dependencies      |

### Flakiness Decision Tree

```
Test fails intermittently
├─ Fails locally too?
│  ├─ YES → Timing/async issue → Check waits and assertions
│  └─ NO → CI-specific → Check environment differences
│
├─ Fails only with multiple workers?
│  └─ YES → Parallelism issue → Check data isolation
│
├─ Fails only when run after specific tests?
│  └─ YES → State leak → Check fixtures and cleanup
│
└─ Fails randomly regardless of conditions?
   └─ External dependency → Check network/API stability
```

## Detection and Reproduction

### Confirming Flakiness

```bash
# Run test multiple times to confirm instability
npx playwright test tests/checkout.spec.ts --repeat-each=20

# Run with single worker to isolate parallelism issues
npx playwright test --workers=1

# Run in CI-like conditions locally
CI=true npx playwright test --repeat-each=10
```

### Reproduction Strategies

```typescript
// playwright.config.ts - Enable artifacts for flaky test investigation
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: "on-first-retry", // Capture trace on retry
    video: "retain-on-failure",
    screenshot: "only-on-failure",
  },
});
```

### Identify Flaky Tests Programmatically

```typescript
import { test } from "@playwright/test";

// Track test results across runs
test.afterEach(async ({}, testInfo) => {
  // A test that passes only on a retry failed at least once: treat it as flaky
  if (testInfo.retry > 0 && testInfo.status === "passed") {
    console.warn(`FLAKY: ${testInfo.title} passed on retry ${testInfo.retry}`);
    // Log to your tracking system
  }
});
```

## Root Cause Analysis

### Event Logging for Race Conditions

Log console output, page errors, and failed requests to expose timing issues:

```typescript
import { test } from "@playwright/test";

test.beforeEach(async ({ page }) => {
  page.on("console", (msg) =>
    console.log(`CONSOLE [${msg.type()}]:`, msg.text()),
  );
  page.on("pageerror", (err) => console.error("PAGE ERROR:", err.message));
  page.on("requestfailed", (req) =>
    console.error(`REQUEST FAILED: ${req.url()}`),
  );
});
```

> **For comprehensive console error handling** (fail on errors, allowed patterns, fixtures), see [console-errors.md](console-errors.md).
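
The handlers above print each event as it arrives, which can be hard to read when CI logs from several workers interleave. One possible refinement, sketched here, is to record every event with its offset from the start of the test and print or attach the whole timeline in one piece. `createEventTimeline` and its methods are hypothetical names, not part of Playwright, and the injectable `now` clock and sample values exist only to make the demo deterministic.

```typescript
// Hypothetical helper (not a Playwright API): an ordered, timestamped event log.
type TimelineEvent = { atMs: number; kind: string; detail: string };

function createEventTimeline(now: () => number = () => Date.now()) {
  const startedAt = now();
  const events: TimelineEvent[] = [];
  return {
    // Record one event with its offset from timeline creation.
    record(kind: string, detail: string): void {
      events.push({ atMs: now() - startedAt, kind, detail });
    },
    // Render events in recording order, one per line.
    format(): string {
      return events
        .map((e) => `+${e.atMs}ms [${e.kind}] ${e.detail}`)
        .join("\n");
    },
    // Return only the events of one kind, e.g. all failed requests.
    ofKind(kind: string): TimelineEvent[] {
      return events.filter((e) => e.kind === kind);
    },
  };
}

// Possible wiring inside a Playwright test file (assumed, shown as comments so
// this sketch runs without a browser):
//   const timeline = createEventTimeline();
//   page.on("console", (msg) => timeline.record("console", msg.text()));
//   page.on("requestfailed", (req) => timeline.record("requestfailed", req.url()));
//   await testInfo.attach("timeline", {
//     body: timeline.format(),
//     contentType: "text/plain",
//   });

// Deterministic demo with a fake clock (all values are made up).
let fakeNow = 1000;
const timeline = createEventTimeline(() => fakeNow);
fakeNow = 1040;
timeline.record("console", "fetching /api/data");
fakeNow = 1900;
timeline.record("requestfailed", "https://example.test/api/data");
console.log(timeline.format());
```

Keeping the offsets relative to test start makes it easier to see, for example, that a request failed well after the UI logged that it was fetching data, which is the kind of ordering a race condition usually hides.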

### Network Timing Analysis

```typescript
// Report slow or failed requests
test.beforeEach(async ({ page }) => {
  page.on("requestfinished", (request) => {
    // Timing values are in milliseconds, relative to the request start time
    const timing = request.timing();
    const duration = timing.responseEnd - timing.requestStart;
    if (duration > 2000) {
      console.warn(`Slow: ${request.url()} took ${Math.round(duration)}ms`);
    }
  });

  page.on("requestfailed", (request) => {
    console.error(`Failed: ${request.url()} - ${request.failure()?.errorText}`);
  });
});
```

### Trace Analysis

```bash
# View trace from failed CI run
npx playwright show-trace path/to/trace.zip

# Generate trace for specific test
npx playwright test tests/flaky.spec.ts --trace on
```

## Fixing Strategies by Type

### UI-Driven Flakiness

**Problem: Element not ready when action executes**

```typescript
// ❌ BAD: No wait for element state
await page.click("#submit");
await page.fill("#username", "test"); // Element may not be ready

// ✅ GOOD: Actions + assertions pattern (auto-waiting built-in)
await page.getByRole("button", { name: "Submit" }).click();
await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
```

**Problem: Animations or transitions interfere**

```typescript
// ❌ BAD: Click during animation
await page.click(".menu-item");

// ✅ GOOD: Act through a locator, then assert on the resulting state
await page.getByRole("menuitem", { name: "Settings" }).click();
await expect(page.getByRole("dialog")).toBeVisible();

// Or request reduced motion (only helps if the app honors prefers-reduced-motion)
await page.emulateMedia({ reducedMotion: "reduce" });
```

**Problem: Brittle selectors**

```typescript
// ❌ BAD: Fragile CSS chain
await page.click("div.container > div:nth-child(2) > button.btn-primary");

// ✅ GOOD: Semantic selectors
await page.getByRole("button", { name: "Continue" }).click();
await page.getByTestId("checkout-button").click();
await page.getByLabel("Email address").fill("test@example.com");
```

### Async/Timing Flakiness

**Problem: Race between test and application**

```typescript
// ❌ BAD: Arbitrary sleep
await page.click("#load-data");
await page.waitForTimeout(3000); // Hope data loads in 3s

// ✅ GOOD: Wait for specific condition
await page.click("#load-data");
await expect(page.locator(".data-row")).toHaveCount(10, { timeout: 10000 });

// ✅ BETTER: Wait for network response, then assert
// Start waiting before the click so the response cannot be missed
const responsePromise = page.waitForResponse(
  (r) =>
    r.url().includes("/api/data") &&
    r.request().method() === "GET" &&
    r.ok(),
);
await page.click("#load-data");
await responsePromise;
await expect(page.locator(".data-row")).toHaveCount(10);
```

> **For comprehensive waiting strategies** (navigation, element state, network, polling with `toPass()`), see [assertions-waiting.md](assertions-waiting.md#waiting-strategies).

**Problem: Complex async state**

```typescript
// Custom wait for application-specific conditions
await page.waitForFunction(() => {
  const app = (window as any).__APP_STATE__;
  return app?.isReady && !app?.isLoading;
});

// Wait for multiple conditions
await Promise.all([
  page.waitForResponse("**/api/user"),
  page.waitForResponse("**/api/settings"),
  page.getByRole("button", { name: "Load" }).click(),
]);
```

### Data/Parallelism-Driven Flakiness

**Problem: Tests share backend data**

```typescript
// ❌ BAD: All workers use same user
const testUser = { email: "test@example.com", password: "pass123" };

// ✅ GOOD: Unique data per worker
import { test as base } from "@playwright/test";

export const test = base.extend<
  {},
  { testUser: { email: string; id: string } }
>({
  testUser: [
    async ({}, use, workerInfo) => {
      const email = `test-${workerInfo.workerIndex}-${Date.now()}@example.com`;
      // createTestUser / deleteTestUser are your own backend helpers
      const user = await createTestUser(email);
      await use(user);
      await deleteTestUser(user.id);
    },
    { scope: "worker" },
  ],
});
```

**Problem: Shared storageState across workers**

```typescript
// ❌ BAD: All workers share same auth state
use: {
  storageState: '.auth/user.json',
}

// ✅ GOOD: Per-worker auth state
import fs from "fs";
import { test as base } from "@playwright/test";

export const test = base.extend<{}, { workerStorageState: string }>({
  // Make every test in this worker use the worker's own auth state
  storageState: ({ workerStorageState }, use) => use(workerStorageState),

  workerStorageState: [
    async ({ browser }, use, workerInfo) => {
      // parallelIndex stays within 0..workers-1 even when a worker restarts
      // after a failure, so the cached file and account are reused
      const id = workerInfo.parallelIndex;
      const fileName = `.auth/user-${id}.json`;

      if (!fs.existsSync(fileName)) {
        const page = await browser.newPage({ storageState: undefined });
        // authenticateUser is your own login helper
        await authenticateUser(page, `worker${id}@test.com`);
        await page.context().storageState({ path: fileName });
        await page.close();
      }

      await use(fileName);
    },
    { scope: "worker" },
  ],
});
```

### Test-Suite-Driven Flakiness (State Leaks)

**Problem: Tests affect each other**

```typescript
// ❌ BAD: Module-level state persists across tests
let sharedPage: Page;

test.beforeAll(async ({ browser }) => {
  sharedPage = await browser.newPage(); // Shared across tests!
});

// ✅ GOOD: Use Playwright's default isolation (fresh context per test)
test("first test", async ({ page }) => {
  // Fresh page for this test
});

test("second test", async ({ page }) => {
  // Fresh page for this test
});
```

**Problem: Fixture cleanup not happening**

```typescript
import fs from "fs";
import { test as base } from "@playwright/test";

// ✅ GOOD: Proper fixture with cleanup
export const test = base.extend<{ tempFile: string }>({
  tempFile: async ({}, use) => {
    const file = `/tmp/test-${Date.now()}.json`;
    fs.writeFileSync(file, "{}");

    await use(file);

    // Code after use() is teardown: it runs even when the test fails
    if (fs.existsSync(file)) {
      fs.unlinkSync(file);
    }
  },
});
```

## CI-Specific Flakiness

### Why Tests Fail Only in CI

| CI Condition       | Impact                                | Solution                                             |
| ------------------ | ------------------------------------- | ---------------------------------------------------- |
| Slower CPU         | Actions complete later than expected  | Use auto-waiting, not timeouts                       |
| Cold browser start | No cached assets, slower initial load | Add explicit waits for first navigation              |
| Headless mode      | Different rendering behavior          | Test locally in headless mode                        |
| Shared runners     | Resource contention                   | Reduce parallelism or use dedicated runners          |
| Network latency    | API calls slower                      | Mock external APIs, increase timeouts for real calls |

### Simulating CI Locally

```bash
# Run headless with CI environment variable
CI=true npx playwright test

# Limit CPU (Linux/macOS)
cpulimit -l 50 -- npx playwright test

# Run in Docker matching CI environment
# (use the image tag that matches your installed Playwright version)
docker run -it --rm \
  -v "$(pwd)":/work \
  -w /work \
  mcr.microsoft.com/playwright:v1.40.0-jammy \
  npx playwright test
```

### Consistent Viewport and Scale

```typescript
// playwright.config.ts - Match CI rendering exactly
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    viewport: { width: 1280, height: 720 },
    deviceScaleFactor: 1,
  },
});
```

### Network Stubbing for External APIs

```typescript
// Eliminate external API flakiness
test.beforeEach(async ({ page }) => {
  // Stub unstable third-party APIs
  await page.route("**/api.analytics.com/**", (route) =>
    route.fulfill({ body: "" }),
  );

  await page.route("**/api.payment-provider.com/**", (route) =>
    route.fulfill({ json: { status: "ok" } }),
  );
});

// Test-specific stub
test("checkout with payment", async ({ page }) => {
  await page.route("**/api/payment", (route) =>
    route.fulfill({ json: { success: true, transactionId: "test-123" } }),
  );
  // Test proceeds with deterministic response
});
```

## Quarantine and Management

### Quarantine Pattern

```typescript
// playwright.config.ts - Separate flaky tests
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      name: "stable",
      testIgnore: ["**/*.flaky.spec.ts"],
    },
    {
      name: "quarantine",
      testMatch: ["**/*.flaky.spec.ts"],
      retries: 3,
    },
  ],
});
```

### Annotation-Based Quarantine

```typescript
// Mark flaky tests with annotations
test("intermittent checkout issue", async ({ page }, testInfo) => {
  testInfo.annotations.push({
    type: "flaky",
    description: "Investigating payment API timing - JIRA-1234",
  });
  // Test implementation
});

// Skip flaky test conditionally
test("known CI flaky", async ({ page }) => {
  test.skip(!!process.env.CI, "Flaky in CI - investigating JIRA-5678");
  // Test implementation
});
```

## Prevention Strategies

### Test Burn-In

```bash
# Run new tests many times before merging
npx playwright test tests/new-feature.spec.ts --repeat-each=50

# Run in parallel to expose race conditions
npx playwright test tests/new-feature.spec.ts --repeat-each=20 --workers=4
```

### Isolation Checklist

```typescript
// ✅ Each test should be self-contained
test.describe("User profile", () => {
  test("can update name", async ({ page, testUser }) => {
    // Uses unique testUser fixture
    // No dependency on other tests
    // Cleanup handled by fixture
  });

  test("can update email", async ({ page, testUser }) => {
    // Independent of "can update name"
    // Own testUser, own state
  });
});
```

### Defensive Assertions

```typescript
// ❌ BAD: Single point of failure
await expect(page.locator(".items")).toHaveCount(5);

// ✅ GOOD: Progressive assertions that help diagnose
await expect(page.locator(".items-container")).toBeVisible();
await expect(page.locator(".loading")).not.toBeVisible();
await expect(page.locator(".items")).toHaveCount(5);
```

### Retry Budget

```typescript
// playwright.config.ts - Limit retries to avoid masking issues
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0, // Only retry in CI
  expect: {
    timeout: 10000, // Reasonable assertion timeout
  },
  timeout: 60000, // Test timeout
});
```

## Anti-Patterns to Avoid

| Anti-Pattern                              | Problem                             | Solution                                       |
| ----------------------------------------- | ----------------------------------- | ---------------------------------------------- |
| `waitForTimeout()` as primary wait        | Arbitrary, hides real timing issues | Use auto-waiting assertions                    |
| Increasing global timeout to "fix" flakes | Masks root cause, slows all tests   | Find and fix actual timing issue               |
| Retrying until pass                       | Hides systemic problems             | Fix root cause, use retries for diagnosis only |
| Shared test data across workers           | Race conditions, collisions         | Isolate data per worker                        |
| Testing real external APIs                | Network variability                 | Mock external dependencies                     |
| Module-level mutable state                | Leaks between tests                 | Use fixtures with proper cleanup               |
| Ignoring flaky tests                      | Problem compounds over time         | Quarantine and track for fixing                |

## Related References

- **Debugging**: See [debugging.md](debugging.md) for trace viewer and inspector
- **Fixtures**: See [fixtures-hooks.md](../core/fixtures-hooks.md) for worker-scoped isolation
- **Performance**: See [performance.md](../infrastructure-ci-cd/performance.md) for parallel execution patterns
- **Assertions**: See [assertions-waiting.md](../core/assertions-waiting.md) for auto-waiting patterns
- **Global Setup**: See [global-setup.md](../core/global-setup.md) for setup vs fixtures decision