SurfSense/.cursor/skills/playwright-testing/debugging/flaky-tests.md
2026-05-10 04:19:55 +05:30

Debugging and Managing Flaky Tests

Table of Contents

  1. Understanding Flakiness Types
  2. Detection and Reproduction
  3. Root Cause Analysis
  4. Fixing Strategies by Type
  5. CI-Specific Flakiness
  6. Quarantine and Management
  7. Prevention Strategies

Understanding Flakiness Types

Categories of Flakiness

Most flaky tests fall into distinct categories requiring different remediation:

| Category | Symptoms | Common Causes |
| --- | --- | --- |
| UI-driven | Element not found, click missed | Missing waits, animations, dynamic rendering |
| Environment-driven | CI-only failures | Slower CPU, memory limits, cold browser starts |
| Data/parallelism-driven | Fails with multiple workers | Shared backend data, reused accounts, state collisions |
| Test-suite-driven | Fails when run with other tests | Leaked state, shared fixtures, order dependencies |

Flakiness Decision Tree

Test fails intermittently
├─ Fails locally too?
│  ├─ YES → Timing/async issue → Check waits and assertions
│  └─ NO → CI-specific → Check environment differences
│
├─ Fails only with multiple workers?
│  └─ YES → Parallelism issue → Check data isolation
│
├─ Fails only when run after specific tests?
│  └─ YES → State leak → Check fixtures and cleanup
│
└─ Fails randomly regardless of conditions?
   └─ External dependency → Check network/API stability

Detection and Reproduction

Confirming Flakiness

# Run test multiple times to confirm instability
npx playwright test tests/checkout.spec.ts --repeat-each=20

# Run with single worker to isolate parallelism issues
npx playwright test --workers=1

# Run in CI-like conditions locally
CI=true npx playwright test --repeat-each=10

Reproduction Strategies

// playwright.config.ts - Enable artifacts for flaky test investigation
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: "on-first-retry", // Capture trace on retry
    video: "retain-on-failure",
    screenshot: "only-on-failure",
  },
});

Identify Flaky Tests Programmatically

// Track test results across runs
test.afterEach(async ({}, testInfo) => {
  if (testInfo.retry > 0 && testInfo.status === "passed") {
    console.warn(`FLAKY: ${testInfo.title} passed on retry ${testInfo.retry}`);
    // Log to your tracking system
  }
});
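
Once results are logged across several runs, a small aggregation step can rank tests by how often they fail. The following is a minimal sketch, assuming you store one record per test per run; the `RunResult` shape and the `findFlakyTests` name are hypothetical and not part of Playwright.

```typescript
// Hypothetical record shape: one entry per test per run, e.g. parsed from your own logs.
interface RunResult {
  title: string;
  passed: boolean;
}

interface FlakeStats {
  title: string;
  runs: number;
  failures: number;
  failureRate: number;
}

// A test is treated as flaky when it has at least one passing and one failing run.
// Tests that always pass or always fail are excluded.
function findFlakyTests(results: RunResult[]): FlakeStats[] {
  const byTitle = new Map<string, { runs: number; failures: number }>();

  for (const r of results) {
    const entry = byTitle.get(r.title) ?? { runs: 0, failures: 0 };
    entry.runs += 1;
    if (!r.passed) entry.failures += 1;
    byTitle.set(r.title, entry);
  }

  const flaky: FlakeStats[] = [];
  byTitle.forEach(({ runs, failures }, title) => {
    if (failures > 0 && failures < runs) {
      flaky.push({ title, runs, failures, failureRate: failures / runs });
    }
  });

  // Highest failure rate first, so the worst offenders surface at the top.
  return flaky.sort((a, b) => b.failureRate - a.failureRate);
}
```

Sorting by failure rate gives a rough priority order for investigation; a test that fails on every run is a plain failure rather than a flake, so it is deliberately left out.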

Root Cause Analysis

Event Logging for Race Conditions

Add comprehensive event logging to expose timing issues:

test.beforeEach(async ({ page }) => {
  page.on("console", (msg) =>
    console.log(`CONSOLE [${msg.type()}]:`, msg.text()),
  );
  page.on("pageerror", (err) => console.error("PAGE ERROR:", err.message));
  page.on("requestfailed", (req) =>
    console.error(`REQUEST FAILED: ${req.url()}`),
  );
});

For comprehensive console error handling (fail on errors, allowed patterns, fixtures), see console-errors.md.

Network Timing Analysis

// Capture slow or failed requests
test.beforeEach(async ({ page }) => {
  page.on("requestfinished", (request) => {
    const timing = request.timing();
    const duration = timing.responseEnd - timing.requestStart;
    if (duration > 2000) {
      // Log immediately so the slow request shows up next to the failing step
      console.warn(`SLOW: ${request.url()} took ${Math.round(duration)}ms`);
    }
  });

  page.on("requestfailed", (request) => {
    console.error(`Failed: ${request.url()} - ${request.failure()?.errorText}`);
  });
});
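
Playwright reports the fields of `request.timing()` relative to the request start time and uses `-1` when a value is unavailable, so a plain subtraction can yield a misleading duration. A minimal sketch of a guard follows; `TimingLike`, `requestDurationMs`, and `isSlowRequest` are illustrative names, not Playwright APIs.

```typescript
// Mirrors only the two request.timing() fields used here.
interface TimingLike {
  requestStart: number;
  responseEnd: number;
}

// Returns the request duration in milliseconds, or null when either
// timestamp is unavailable (reported as -1).
function requestDurationMs(timing: TimingLike): number | null {
  if (timing.requestStart < 0 || timing.responseEnd < 0) {
    return null;
  }
  return timing.responseEnd - timing.requestStart;
}

// True only when the duration is known and exceeds the threshold.
function isSlowRequest(timing: TimingLike, thresholdMs = 2000): boolean {
  const duration = requestDurationMs(timing);
  return duration !== null && duration > thresholdMs;
}
```

Inside a `requestfinished` handler this could be called as `isSlowRequest(request.timing())` before logging the URL.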

Trace Analysis

# View trace from failed CI run
npx playwright show-trace path/to/trace.zip

# Generate trace for specific test
npx playwright test tests/flaky.spec.ts --trace on

Fixing Strategies by Type

UI-Driven Flakiness

Problem: Element not ready when action executes

// ❌ BAD: No assertion that the previous action took effect
await page.click("#submit");
await page.fill("#username", "test"); // Page may still be transitioning

// ✅ GOOD: Actions + assertions pattern (auto-waiting built-in)
await page.getByRole("button", { name: "Submit" }).click();
await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();

Problem: Animations or transitions interfere

// ❌ BAD: Click during animation
await page.click(".menu-item");

// ✅ GOOD: Assert the end state after the click
await page.getByRole("menuitem", { name: "Settings" }).click();
await expect(page.getByRole("dialog")).toBeVisible();
// Or request reduced motion before interacting
// (effective only if the app honors prefers-reduced-motion)
await page.emulateMedia({ reducedMotion: "reduce" });

Problem: Brittle selectors

// ❌ BAD: Fragile CSS chain
await page.click("div.container > div:nth-child(2) > button.btn-primary");

// ✅ GOOD: Semantic selectors
await page.getByRole("button", { name: "Continue" }).click();
await page.getByTestId("checkout-button").click();
await page.getByLabel("Email address").fill("test@example.com");

Async/Timing Flakiness

Problem: Race between test and application

// ❌ BAD: Arbitrary sleep
await page.click("#load-data");
await page.waitForTimeout(3000); // Hope data loads in 3s

// ✅ GOOD: Wait for specific condition
await page.click("#load-data");
await expect(page.locator(".data-row")).toHaveCount(10, { timeout: 10000 });

// ✅ BETTER: Wait for network response, then assert
const responsePromise = page.waitForResponse(
  (r) =>
    r.url().includes("/api/data") &&
    r.request().method() === "GET" &&
    r.ok(),
);
await page.click("#load-data");
await responsePromise;
await expect(page.locator(".data-row")).toHaveCount(10);

For comprehensive waiting strategies (navigation, element state, network, polling with toPass()), see assertions-waiting.md.

Problem: Complex async state

// Custom wait for application-specific conditions
await page.waitForFunction(() => {
  const app = (window as any).__APP_STATE__;
  return app?.isReady && !app?.isLoading;
});

// Wait for multiple conditions
await Promise.all([
  page.waitForResponse("**/api/user"),
  page.waitForResponse("**/api/settings"),
  page.getByRole("button", { name: "Load" }).click(),
]);

Data/Parallelism-Driven Flakiness

Problem: Tests share backend data

// ❌ BAD: All workers use same user
const testUser = { email: "test@example.com", password: "pass123" };

// ✅ GOOD: Unique data per worker
import { test as base } from "@playwright/test";

export const test = base.extend<
  {},
  { testUser: { email: string; id: string } }
>({
  testUser: [
    async ({}, use, workerInfo) => {
      const email = `test-${workerInfo.workerIndex}-${Date.now()}@example.com`;
      const user = await createTestUser(email);
      await use(user);
      await deleteTestUser(user.id);
    },
    { scope: "worker" },
  ],
});

Problem: Shared storageState across workers

// ❌ BAD: All workers share same auth state
use: {
  storageState: '.auth/user.json',
}

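// ✅ GOOD: Per-worker auth state
import fs from "fs";
import { test as base } from "@playwright/test";

export const test = base.extend<{}, { workerStorageState: string }>({
  // Make every test in this worker use the worker's own auth file
  storageState: ({ workerStorageState }, use) => use(workerStorageState),

  workerStorageState: [
    async ({ browser }, use, workerInfo) => {
      // parallelIndex stays the same when a worker restarts; workerIndex does not
      const id = workerInfo.parallelIndex;
      const fileName = `.auth/user-${id}.json`;

      if (!fs.existsSync(fileName)) {
        const page = await browser.newPage({ storageState: undefined });
        // authenticateUser is a project-specific login helper
        await authenticateUser(page, `worker${id}@test.com`);
        await page.context().storageState({ path: fileName });
        await page.close();
      }

      await use(fileName);
    },
    { scope: "worker" },
  ],
});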

Test-Suite-Driven Flakiness (State Leaks)

Problem: Tests affect each other

// ❌ BAD: Module-level state persists across tests
let sharedPage: Page;

test.beforeAll(async ({ browser }) => {
  sharedPage = await browser.newPage(); // Shared across tests!
});

// ✅ GOOD: Use Playwright's default isolation (fresh context per test)
test("first test", async ({ page }) => {
  // Fresh page for this test
});

test("second test", async ({ page }) => {
  // Fresh page for this test
});

Problem: Fixture cleanup not happening

// ✅ GOOD: Proper fixture with cleanup
export const test = base.extend<{ tempFile: string }>({
  tempFile: async ({}, use) => {
    const file = `/tmp/test-${Date.now()}.json`;
    fs.writeFileSync(file, "{}");

    await use(file);

    // Cleanup always runs, even on failure
    if (fs.existsSync(file)) {
      fs.unlinkSync(file);
    }
  },
});

CI-Specific Flakiness

Why Tests Fail Only in CI

| CI Condition | Impact | Solution |
| --- | --- | --- |
| Slower CPU | Actions complete later than expected | Use auto-waiting, not timeouts |
| Cold browser start | No cached assets, slower initial load | Add explicit waits for first navigation |
| Headless mode | Different rendering behavior | Test locally in headless mode |
| Shared runners | Resource contention | Reduce parallelism or use dedicated runners |
| Network latency | API calls slower | Mock external APIs, increase timeouts for real calls |

Simulating CI Locally

# Run headless with CI environment variable
CI=true npx playwright test

# Limit CPU (Linux/Mac)
cpulimit -l 50 -- npx playwright test

# Run in Docker matching CI environment
# (the image tag should match your installed Playwright version)
docker run -it --rm --ipc=host \
  -v "$(pwd)":/work \
  -w /work \
  mcr.microsoft.com/playwright:v1.40.0-jammy \
  npx playwright test

Consistent Viewport and Scale

// playwright.config.ts - Use the same viewport and scale locally and in CI
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    viewport: { width: 1280, height: 720 },
    deviceScaleFactor: 1,
  },
});

Network Stubbing for External APIs

// Eliminate external API flakiness
test.beforeEach(async ({ page }) => {
  // Stub unstable third-party APIs
  await page.route("**/api.analytics.com/**", (route) =>
    route.fulfill({ body: "" }),
  );
  await page.route("**/api.payment-provider.com/**", (route) =>
    route.fulfill({ json: { status: "ok" } }),
  );
});

// Test-specific stub
test("checkout with payment", async ({ page }) => {
  await page.route("**/api/payment", (route) =>
    route.fulfill({ json: { success: true, transactionId: "test-123" } }),
  );
  // Test proceeds with deterministic response
});

Quarantine and Management

Quarantine Pattern

// playwright.config.ts - Separate flaky tests
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      name: "stable",
      testIgnore: ["**/*.flaky.spec.ts"],
    },
    {
      name: "quarantine",
      testMatch: ["**/*.flaky.spec.ts"],
      retries: 3,
    },
  ],
});

Annotation-Based Quarantine

// Mark flaky tests with annotations
test("intermittent checkout issue", async ({ page }, testInfo) => {
  testInfo.annotations.push({
    type: "flaky",
    description: "Investigating payment API timing - JIRA-1234",
  });

  // Test implementation
});

// Skip flaky test conditionally
test("known CI flaky", async ({ page }) => {
  test.skip(!!process.env.CI, "Flaky in CI - investigating JIRA-5678");
  // Test implementation
});

Prevention Strategies

Test Burn-In

# Run new tests many times before merging
npx playwright test tests/new-feature.spec.ts --repeat-each=50

# Run in parallel to expose race conditions
npx playwright test tests/new-feature.spec.ts --repeat-each=20 --workers=4
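
How many repeats are enough depends on how rare the flake is. As a rough guide, if each run fails independently with probability p, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The sketch below works through that arithmetic; the function names are illustrative, and the independence assumption is a simplification, since real flakes often cluster under load.

```typescript
// Probability of observing at least one failure in `repeats` runs,
// assuming each run fails independently with probability `failureRate`.
function detectionProbability(failureRate: number, repeats: number): number {
  return 1 - Math.pow(1 - failureRate, repeats);
}

// Smallest number of repeats needed to observe at least one failure
// with the requested confidence, under the same independence assumption.
function repeatsNeeded(failureRate: number, confidence: number): number {
  if (!(failureRate > 0 && failureRate < 1)) {
    throw new RangeError("failureRate must be between 0 and 1 (exclusive)");
  }
  if (!(confidence > 0 && confidence < 1)) {
    throw new RangeError("confidence must be between 0 and 1 (exclusive)");
  }
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - failureRate));
}
```

Under these assumptions, a test that fails 5% of the time is caught by `--repeat-each=20` in roughly 64% of burn-ins and by `--repeat-each=50` in roughly 92%, which is why rarer flakes call for larger repeat counts.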

Isolation Checklist

// ✅ Each test should be self-contained
test.describe("User profile", () => {
  test("can update name", async ({ page, testUser }) => {
    // Uses unique testUser fixture
    // No dependency on other tests
    // Cleanup handled by fixture
  });

  test("can update email", async ({ page, testUser }) => {
    // Independent of "can update name"
    // Own testUser, own state
  });
});

Defensive Assertions

// ❌ BAD: Single point of failure
await expect(page.locator(".items")).toHaveCount(5);

// ✅ GOOD: Progressive assertions that help diagnose
await expect(page.locator(".items-container")).toBeVisible();
await expect(page.locator(".loading")).not.toBeVisible();
await expect(page.locator(".items")).toHaveCount(5);

Retry Budget

// playwright.config.ts - Limit retries to avoid masking issues
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0, // Only retry in CI
  expect: {
    timeout: 10000, // Reasonable assertion timeout
  },
  timeout: 60000, // Test timeout
});
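
To see why retries can mask a problem, consider a test that fails independently with probability p on each attempt. With r retries it ends green unless all r + 1 attempts fail, so the chance of a green result is 1 - p^(r + 1). A minimal sketch of that calculation, with an illustrative function name and the same independence assumption:

```typescript
// Probability that a test is reported as passing when it is allowed
// `retries` extra attempts and each attempt fails independently
// with probability `failureRate`.
function passWithRetriesProbability(failureRate: number, retries: number): number {
  return 1 - Math.pow(failureRate, retries + 1);
}
```

For example, a test that fails 30% of the time still ends green in about 97% of runs with `retries: 2`. Playwright marks tests that pass on a retry as flaky in its report, so that count is worth tracking rather than the pass rate alone.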

Anti-Patterns to Avoid

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| waitForTimeout() as primary wait | Arbitrary, hides real timing issues | Use auto-waiting assertions |
| Increasing global timeout to "fix" flakes | Masks root cause, slows all tests | Find and fix actual timing issue |
| Retrying until pass | Hides systemic problems | Fix root cause, use retries for diagnosis only |
| Shared test data across workers | Race conditions, collisions | Isolate data per worker |
| Testing real external APIs | Network variability | Mock external dependencies |
| Module-level mutable state | Leaks between tests | Use fixtures with proper cleanup |
| Ignoring flaky tests | Problem compounds over time | Quarantine and track for fixing |