2026-05-10 04:19:55 +05:30
# Debugging and Managing Flaky Tests
## Table of Contents
1. [Understanding Flakiness Types](#understanding-flakiness-types)
2. [Detection and Reproduction](#detection-and-reproduction)
3. [Root Cause Analysis](#root-cause-analysis)
4. [Fixing Strategies by Type](#fixing-strategies-by-type)
5. [CI-Specific Flakiness](#ci-specific-flakiness)
6. [Quarantine and Management](#quarantine-and-management)
7. [Prevention Strategies](#prevention-strategies)
8. [Anti-Patterns to Avoid](#anti-patterns-to-avoid)
## Understanding Flakiness Types
### Categories of Flakiness
Most flaky tests fall into distinct categories requiring different remediation:
| Category | Symptoms | Common Causes |
| --------------------------- | ------------------------------- | ------------------------------------------------------ |
| **UI-driven** | Element not found, click missed | Missing waits, animations, dynamic rendering |
| **Environment-driven** | CI-only failures | Slower CPU, memory limits, cold browser starts |
| **Data/parallelism-driven** | Fails with multiple workers | Shared backend data, reused accounts, state collisions |
| **Test-suite-driven** | Fails when run with other tests | Leaked state, shared fixtures, order dependencies |
### Flakiness Decision Tree
```
Test fails intermittently
├─ Fails locally too?
│ ├─ YES → Timing/async issue → Check waits and assertions
│ └─ NO → CI-specific → Check environment differences
├─ Fails only with multiple workers?
│ └─ YES → Parallelism issue → Check data isolation
├─ Fails only when run after specific tests?
│ └─ YES → State leak → Check fixtures and cleanup
└─ Fails randomly regardless of conditions?
└─ External dependency → Check network/API stability
```
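Read top to bottom, the tree works as an ordered checklist. The sketch below encodes one possible ordering as a plain function, checking the most specific condition first. The `FlakeObservations` fields, the category names, and the `callsExternalServices` flag are illustrative assumptions. Playwright does not report any of them. The `callsExternalServices` flag stands in for the "fails randomly regardless of conditions" branch.

```typescript
// Illustrative observations gathered while reproducing a flaky test (assumed shape)
interface FlakeObservations {
  failsLocally: boolean;
  failsOnlyWithMultipleWorkers: boolean;
  failsOnlyAfterSpecificTests: boolean;
  callsExternalServices: boolean; // assumption: proxy for "fails randomly regardless of conditions"
}

type FlakeCategory =
  | "parallelism"
  | "state-leak"
  | "ci-environment"
  | "external-dependency"
  | "timing-async";

// Check the most specific conditions first, then fall back to broader causes
function classifyFlake(o: FlakeObservations): FlakeCategory {
  if (o.failsOnlyWithMultipleWorkers) return "parallelism"; // check data isolation
  if (o.failsOnlyAfterSpecificTests) return "state-leak"; // check fixtures and cleanup
  if (!o.failsLocally) return "ci-environment"; // check environment differences
  if (o.callsExternalServices) return "external-dependency"; // check network/API stability
  return "timing-async"; // check waits and assertions
}
```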
## Detection and Reproduction
### Confirming Flakiness
```bash
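# Run the test many times to confirm instability (--retries=0 so retries cannot hide failures)
npx playwright test tests/checkout.spec.ts --repeat-each=20 --retries=0
# Run with a single worker; if the failures disappear, suspect a parallelism issue
npx playwright test --workers=1
# Apply CI-only config branches locally (e.g. process.env.CI checks in playwright.config.ts)
CI=true npx playwright test --repeat-each=10 --retries=0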
```
### Reproduction Strategies
```typescript
// playwright.config.ts - Enable artifacts for flaky test investigation
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: "on-first-retry", // Capture trace on the first retry
    video: "retain-on-failure",
    screenshot: "only-on-failure",
  },
});
```
### Identify Flaky Tests Programmatically
```typescript
import { test } from "@playwright/test";

// Flag tests that failed at least once and then passed on a retry
test.afterEach(async ({}, testInfo) => {
  if (testInfo.retry > 0 && testInfo.status === "passed") {
    console.warn(`FLAKY: ${testInfo.title} passed on retry ${testInfo.retry}`);
    // Log to your tracking system
  }
});
```
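An `afterEach` hook sees only one test at a time. To list every flaky test after a whole run, one option is to post-process the output of the built-in JSON reporter (`npx playwright test --reporter=json > results.json`). In that output, each test entry carries a final `status` of `expected`, `unexpected`, `flaky`, or `skipped`. The interfaces below model only the fields this sketch reads. Treat them as an assumption and check them against the report your Playwright version produces. `findFlakyTests` is a hypothetical helper.

```typescript
// Minimal, assumed subset of the JSON reporter's output shape
interface JsonTest {
  projectName: string;
  status: "expected" | "unexpected" | "flaky" | "skipped";
}
interface JsonSpec {
  title: string;
  file: string;
  tests: JsonTest[];
}
interface JsonSuite {
  title: string;
  specs: JsonSpec[];
  suites?: JsonSuite[];
}
interface JsonReport {
  suites: JsonSuite[];
}

// Walk nested suites and collect "file › title [project]" for every flaky result
function findFlakyTests(report: JsonReport): string[] {
  const flaky: string[] = [];
  const visit = (suite: JsonSuite): void => {
    for (const spec of suite.specs) {
      for (const t of spec.tests) {
        if (t.status === "flaky") {
          flaky.push(`${spec.file} › ${spec.title} [${t.projectName}]`);
        }
      }
    }
    for (const child of suite.suites ?? []) visit(child);
  };
  report.suites.forEach(visit);
  return flaky;
}
```

Reading `results.json` and passing `JSON.parse(contents)` to `findFlakyTests` yields a list that can feed the tracking system mentioned in the hook above.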
## Root Cause Analysis
### Event Logging for Race Conditions
Add comprehensive event logging to expose timing issues:
```typescript
test.beforeEach(async ({ page }) => {
  page.on("console", (msg) =>
    console.log(`CONSOLE [${msg.type()}]:`, msg.text()),
  );
  page.on("pageerror", (err) => console.error("PAGE ERROR:", err.message));
  page.on("requestfailed", (req) =>
    console.error(`REQUEST FAILED: ${req.url()}`),
  );
});
```
> **For comprehensive console error handling** (fail on errors, allowed patterns, fixtures), see [console-errors.md](console-errors.md).
### Network Timing Analysis
```typescript
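import { test as base } from "@playwright/test";
// Auto fixture: capture slow or failed requests and attach them to the test report
export const test = base.extend<{ networkMonitor: void }>({
  networkMonitor: [
    async ({ page }, use, testInfo) => {
      const slowRequests: string[] = [];
      page.on("requestfinished", (request) => {
        // Timing values are ms relative to timing.startTime; -1 means unavailable
        const timing = request.timing();
        const duration = timing.responseEnd - timing.requestStart;
        if (timing.requestStart >= 0 && duration > 2000) {
          slowRequests.push(`${request.url()} took ${Math.round(duration)}ms`);
        }
      });
      page.on("requestfailed", (request) => {
        console.error(`Failed: ${request.url()} - ${request.failure()?.errorText}`);
      });
      await use();
      if (slowRequests.length > 0) {
        const body = slowRequests.join("\n");
        await testInfo.attach("slow-requests", { body, contentType: "text/plain" });
      }
    },
    { auto: true },
  ],
});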
```
### Trace Analysis
```bash
# View trace from failed CI run
npx playwright show-trace path/to/trace.zip
# Generate trace for specific test
npx playwright test tests/flaky.spec.ts --trace on
```
## Fixing Strategies by Type
### UI-Driven Flakiness
**Problem: Element not ready when action executes**
```typescript
// ❌ BAD: ID selectors, and nothing confirms the click had an effect
await page.click("#submit");
await page.fill("#username", "test"); // May run before the UI has settled after the click
// ✅ GOOD: Actions + assertions pattern (auto-waiting built-in)
await page.getByRole("button", { name: "Submit" }).click();
await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
```
**Problem: Animations or transitions interfere**
```typescript
// ❌ BAD: Click during animation
await page.click(".menu-item");
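// ✅ GOOD: Locator actions wait for the element to be stable (not animating), then assert the result
await page.getByRole("menuitem", { name: "Settings" }).click();
await expect(page.getByRole("dialog")).toBeVisible();
// Or reduce animations (only effective if the app's CSS honors prefers-reduced-motion)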
await page.emulateMedia({ reducedMotion: "reduce" });
```
**Problem: Brittle selectors**
```typescript
// ❌ BAD: Fragile CSS chain
await page.click("div.container > div:nth-child(2) > button.btn-primary");
// ✅ GOOD: Semantic selectors
await page.getByRole("button", { name: "Continue" }).click();
await page.getByTestId("checkout-button").click();
await page.getByLabel("Email address").fill("test@example.com");
```
### Async/Timing Flakiness
**Problem: Race between test and application**
```typescript
// ❌ BAD: Arbitrary sleep
await page.click("#load-data");
await page.waitForTimeout(3000); // Hope data loads in 3s
// ✅ GOOD: Wait for specific condition
await page.click("#load-data");
await expect(page.locator(".data-row")).toHaveCount(10, { timeout: 10000 });
// ✅ BETTER: Wait for network response, then assert
const responsePromise = page.waitForResponse(
  (r) =>
    r.url().includes("/api/data") &&
    r.request().method() === "GET" &&
    r.ok(),
);
await page.click("#load-data");
await responsePromise;
await expect(page.locator(".data-row")).toHaveCount(10);
```
> **For comprehensive waiting strategies** (navigation, element state, network, polling with `toPass()`), see [assertions-waiting.md](../core/assertions-waiting.md#waiting-strategies).
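A fixed sleep waits the same amount of time whether the data arrived in 50 ms or never arrived at all. A condition wait returns as soon as the condition holds and fails with a clear message at a deadline. The framework-free sketch below shows that polling loop. `waitForCondition` is a hypothetical helper, not Playwright's implementation. Inside Playwright tests, use the built-in auto-retrying assertions or `expect(...).toPass()` for this.

```typescript
// Poll a condition until it holds or a deadline passes, instead of sleeping a fixed time.
// Resolves with the elapsed milliseconds; rejects once `timeout` is exceeded.
async function waitForCondition(
  check: () => boolean | Promise<boolean>,
  { timeout = 5000, interval = 100 }: { timeout?: number; interval?: number } = {},
): Promise<number> {
  const start = Date.now();
  while (true) {
    if (await check()) return Date.now() - start;
    if (Date.now() - start >= timeout) {
      throw new Error(`Condition not met within ${timeout}ms`);
    }
    // A short pause between checks keeps the loop from spinning the CPU
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}
```

Compared with `waitForTimeout(3000)`, this returns as soon as the data is there and fails explicitly when it never arrives. Playwright's assertions behave the same way and also produce better error output.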
**Problem: Complex async state**
```typescript
// Custom wait for application-specific conditions
// (__APP_STATE__ stands for whatever readiness flag your app exposes)
await page.waitForFunction(() => {
  const app = (window as any).__APP_STATE__;
  return app?.isReady && !app?.isLoading;
});
// Wait for multiple conditions: start listening before the click that triggers the requests
await Promise.all([
  page.waitForResponse("**/api/user"),
  page.waitForResponse("**/api/settings"),
  page.getByRole("button", { name: "Load" }).click(),
]);
```
### Data/Parallelism-Driven Flakiness
**Problem: Tests share backend data**
```typescript
// ❌ BAD: All workers use same user
const testUser = { email: "test@example.com", password: "pass123" };
// ✅ GOOD: Unique data per worker
import { test as base } from "@playwright/test";

export const test = base.extend<
  {},
  { testUser: { email: string; id: string } }
>({
  testUser: [
    async ({}, use, workerInfo) => {
      const email = `test-${workerInfo.workerIndex}-${Date.now()}@example.com`;
      // createTestUser/deleteTestUser: your app's own API helpers (hypothetical)
      const user = await createTestUser(email);
      await use(user);
      await deleteTestUser(user.id);
    },
    { scope: "worker" },
  ],
});
```
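`workerIndex` is unique only within a single run. When several CI shards or pipelines share one backend, worker 0 on each of them can start in the same millisecond. A more collision-resistant naming scheme adds a random suffix. `uniqueTestEmail` is a hypothetical helper. The `now` and `random` parameters are injectable only so the output can be checked deterministically.

```typescript
// Build an email that stays unique across workers, runs, and concurrent CI shards
function uniqueTestEmail(
  workerIndex: number,
  now: number = Date.now(),
  random: () => number = Math.random,
): string {
  const timePart = now.toString(36);
  // Six base-36 characters of randomness guard against same-millisecond starts
  const randomPart = Math.floor(random() * 36 ** 6)
    .toString(36)
    .padStart(6, "0");
  return `test-${workerIndex}-${timePart}-${randomPart}@example.com`;
}
```

Inside the fixture above, `uniqueTestEmail(workerInfo.workerIndex)` could replace the template-string email.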
**Problem: Shared storageState across workers**
```typescript
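// ❌ BAD: All workers share same auth state
// (acceptable for read-only tests; risky when tests modify server-side data for that account)
use: {
  storageState: '.auth/user.json',
}
// ✅ GOOD: Per-worker auth state
import fs from "node:fs";
import { test as base } from "@playwright/test";

export const test = base.extend<{}, { workerStorageState: string }>({
  // Make every test in this worker use the worker's own auth file
  storageState: ({ workerStorageState }, use) => use(workerStorageState),
  workerStorageState: [
    async ({ browser }, use, workerInfo) => {
      // parallelIndex stays within [0, workers - 1], even after a worker restarts
      const id = workerInfo.parallelIndex;
      const fileName = `.auth/user-${id}.json`;
      if (!fs.existsSync(fileName)) {
        // Log in from a clean context, ignoring any configured storageState
        const page = await browser.newPage({ storageState: undefined });
        // authenticateUser: your app's own login helper (hypothetical)
        await authenticateUser(page, `worker${id}@test.com`);
        await page.context().storageState({ path: fileName });
        await page.close();
      }
      await use(fileName);
    },
    { scope: "worker" },
  ],
});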
```
### Test-Suite-Driven Flakiness (State Leaks)
**Problem: Tests affect each other**
```typescript
// ❌ BAD: Module-level state persists across tests
let sharedPage: Page;
test.beforeAll(async ({ browser }) => {
  sharedPage = await browser.newPage(); // Shared across tests!
});
// ✅ GOOD: Use Playwright's default isolation (fresh context per test)
test("first test", async ({ page }) => {
  // Fresh page for this test
});
test("second test", async ({ page }) => {
  // Fresh page for this test
});
```
**Problem: Fixture cleanup not happening**
```typescript
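import fs from "node:fs";
import { test as base } from "@playwright/test";

// ✅ GOOD: Proper fixture with cleanup
export const test = base.extend<{ tempFile: string }>({
  tempFile: async ({}, use, testInfo) => {
    // outputPath() points into this test's own output directory,
    // so parallel tests never write to the same file
    const file = testInfo.outputPath("temp.json");
    fs.writeFileSync(file, "{}");
    await use(file);
    // Teardown after use() runs even when the test fails
    fs.rmSync(file, { force: true });
  },
});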
```
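The fixture relies on one guarantee: code after `await use()` still runs when the test fails. Outside fixtures, for example in a helper shared by several tests, `try`/`finally` provides the same setup/use/teardown shape. This is a framework-free sketch. `withTempResource` is a hypothetical name.

```typescript
// Setup / use / teardown in plain TypeScript: destroy() runs even if body() throws,
// mirroring a fixture's post-use() teardown for a failing test
async function withTempResource<T, R>(
  create: () => T | Promise<T>,
  body: (resource: T) => R | Promise<R>,
  destroy: (resource: T) => void | Promise<void>,
): Promise<R> {
  const resource = await create();
  try {
    return await body(resource);
  } finally {
    await destroy(resource); // runs on success and on failure
  }
}
```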
## CI-Specific Flakiness
### Why Tests Fail Only in CI
| CI Condition       | Impact                                | Solution                                                   |
| ------------------ | ------------------------------------- | ---------------------------------------------------------- |
| Slower CPU         | Actions complete later than expected  | Use auto-waiting assertions, not fixed sleeps              |
| Cold browser start | No cached assets, slower initial load | Assert on a page-ready signal after the first navigation   |
| Headless mode      | Different rendering behavior          | Reproduce locally in headless mode (Playwright's default)  |
| Shared runners     | Resource contention                   | Reduce parallelism or use dedicated runners                |
| Network latency    | API calls slower                      | Mock external APIs; raise timeouts only for real API calls |
### Simulating CI Locally
```bash
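# Run headless with the CI environment variable set
CI=true npx playwright test
# Limit CPU (Linux/macOS); by default cpulimit may throttle only the process it launches, not the browsers it spawns
cpulimit -l 50 -- npx playwright test
# Run in Docker matching the CI environment, with capped CPU
# (keep the image tag in sync with your installed @playwright/test version)
docker run -it --rm --ipc=host --cpus=2 \
  -e CI=true \
  -v "$(pwd)":/work \
  -w /work \
  mcr.microsoft.com/playwright:v1.40.0-jammy \
  npx playwright test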
```
### Consistent Viewport and Scale
```typescript
// playwright.config.ts - Pin viewport and scale factor so local and CI runs render at the same size
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    viewport: { width: 1280, height: 720 },
    deviceScaleFactor: 1,
  },
});
```
### Network Stubbing for External APIs
```typescript
// Eliminate external API flakiness
test.beforeEach(async ({ page }) => {
  // Stub unstable third-party APIs
  await page.route("**/api.analytics.com/**", (route) =>
    route.fulfill({ body: "" }),
  );
  await page.route("**/api.payment-provider.com/**", (route) =>
    route.fulfill({ json: { status: "ok" } }),
  );
});
// Test-specific stub
test("checkout with payment", async ({ page }) => {
  await page.route("**/api/payment", (route) =>
    route.fulfill({ json: { success: true, transactionId: "test-123" } }),
  );
  // Test proceeds with deterministic response
});
```
## Quarantine and Management
### Quarantine Pattern
```typescript
// playwright.config.ts - Separate flaky tests
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      name: "stable",
      testIgnore: ["**/*.flaky.spec.ts"],
    },
    {
      name: "quarantine",
      testMatch: ["**/*.flaky.spec.ts"],
      retries: 3,
    },
  ],
});
```
### Annotation-Based Quarantine
```typescript
// Mark flaky tests with annotations
test("intermittent checkout issue", async ({ page }, testInfo) => {
  testInfo.annotations.push({
    type: "flaky",
    description: "Investigating payment API timing - JIRA-1234",
  });
  // Test implementation
});
// Skip flaky test conditionally
test("known CI flaky", async ({ page }) => {
  test.skip(!!process.env.CI, "Flaky in CI - investigating JIRA-5678");
  // Test implementation
});
```
## Prevention Strategies
### Test Burn-In
```bash
# Run new tests many times before merging (--retries=0 so config retries cannot mask a failure)
npx playwright test tests/new-feature.spec.ts --repeat-each=50 --retries=0
# Run in parallel to expose race conditions
npx playwright test tests/new-feature.spec.ts --repeat-each=20 --workers=4 --retries=0
```
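The repeat count trades run time against how reliably a rare flake shows up. Assume each run fails independently with probability p. This is an idealization, since real flakes often cluster under load. Under that assumption, the chance that n runs surface at least one failure is 1 - (1 - p)^n. The helper names below are hypothetical.

```typescript
// Chance that `runs` independent repetitions hit at least one failure,
// for a test that fails with probability p on each run: 1 - (1 - p)^runs
function detectionProbability(p: number, runs: number): number {
  return 1 - Math.pow(1 - p, runs);
}

// Smallest number of runs whose detection probability reaches `confidence`
function runsNeeded(p: number, confidence: number): number {
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}
```

Under that assumption, `--repeat-each=50` catches a test that fails 5% of the time in about 92% of burn-ins. It catches a 1% flake only about 39% of the time. Reaching 95% confidence at a 1% failure rate takes about 299 runs.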
### Isolation Checklist
```typescript
// ✅ Each test should be self-contained
test.describe("User profile", () => {
  test("can update name", async ({ page, testUser }) => {
    // Uses unique testUser fixture
    // No dependency on other tests
    // Cleanup handled by fixture
  });
  test("can update email", async ({ page, testUser }) => {
    // Independent of "can update name"
    // Own testUser, own state
  });
});
```
### Defensive Assertions
```typescript
// ❌ BAD: A lone count assertion says little about why it failed
await expect(page.locator(".items")).toHaveCount(5);
// ✅ GOOD: Progressive assertions that show which stage failed
await expect(page.locator(".items-container")).toBeVisible();
await expect(page.locator(".loading")).not.toBeVisible();
await expect(page.locator(".items")).toHaveCount(5);
```
### Retry Budget
```typescript
// playwright.config.ts - Limit retries to avoid masking issues
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0, // Only retry in CI
  expect: {
    timeout: 10000, // Assertion timeout (default is 5000 ms)
  },
  timeout: 60000, // Per-test timeout (default is 30000 ms)
});
```
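Retries keep CI green, but they can quietly mask a growing problem. A budget on how many tests needed a retry guards against that. The sketch below assumes outcome strings matching the JSON reporter's per-test `status` values and a threshold your team chooses. `withinFlakeBudget` and `maxFlakyRatio` are hypothetical names and not Playwright settings.

```typescript
// Final per-test outcomes, e.g. the JSON reporter's `status` values (assumed)
type Outcome = "expected" | "unexpected" | "flaky" | "skipped";

// True when the share of executed tests that only passed on retry stays within budget.
// maxFlakyRatio is a team policy choice, not a Playwright setting.
function withinFlakeBudget(outcomes: Outcome[], maxFlakyRatio: number): boolean {
  const executed = outcomes.filter((o) => o !== "skipped");
  if (executed.length === 0) return true;
  const flaky = executed.filter((o) => o === "flaky").length;
  return flaky / executed.length <= maxFlakyRatio;
}
```

You can feed this function the statuses read from the JSON reporter's output. A `false` result can then fail a CI step even though every test eventually passed.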
## Anti-Patterns to Avoid
| Anti-Pattern | Problem | Solution |
| ----------------------------------------- | ----------------------------------- | ---------------------------------------------- |
| `waitForTimeout()` as primary wait | Arbitrary, hides real timing issues | Use auto-waiting assertions |
| Increasing global timeout to "fix" flakes | Masks root cause, slows all tests | Find and fix actual timing issue |
| Retrying until pass | Hides systemic problems | Fix root cause, use retries for diagnosis only |
| Shared test data across workers | Race conditions, collisions | Isolate data per worker |
| Testing real external APIs | Network variability | Mock external dependencies |
| Module-level mutable state | Leaks between tests | Use fixtures with proper cleanup |
| Ignoring flaky tests | Problem compounds over time | Quarantine and track for fixing |
## Related References
- **Debugging**: See [debugging.md](debugging.md) for trace viewer and inspector
- **Fixtures**: See [fixtures-hooks.md](../core/fixtures-hooks.md) for worker-scoped isolation
- **Performance**: See [performance.md](../infrastructure-ci-cd/performance.md) for parallel execution patterns
- **Assertions**: See [assertions-waiting.md](../core/assertions-waiting.md) for auto-waiting patterns
- **Global Setup**: See [global-setup.md](../core/global-setup.md) for setup vs fixtures decision