mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-05-12 17:22:38 +02:00
860 lines
30 KiB
Markdown
Executable file
860 lines
30 KiB
Markdown
Executable file
# Flaky Tests
|
|
|
|
> **When to use**: A test passes sometimes and fails other times. You need to diagnose the root cause, fix it, and prevent it from happening again.
|
|
> **Prerequisites**: [core/assertions-and-waiting.md](assertions-and-waiting.md), [core/fixtures-and-hooks.md](fixtures-and-hooks.md)
|
|
|
|
## Quick Reference
|
|
|
|
```bash
|
|
# Burn-in test — run 10 times to expose flakiness
|
|
npx playwright test tests/checkout.spec.ts --repeat-each=10
|
|
|
|
# Run with retries to catch intermittent failures
|
|
npx playwright test --retries=3
|
|
|
|
# Run single test in isolation to rule out state leaks
|
|
npx playwright test tests/checkout.spec.ts --grep "adds item" --workers=1
|
|
|
|
# Best Playwright 1.59+ trace mode for flaky tests
|
|
npx playwright test --retries=3 --trace=retain-on-failure-and-retries
|
|
|
|
# Run with tracing on every attempt only when you need maximum detail
|
|
npx playwright test --retries=3 --trace=on
|
|
|
|
# Run in fully parallel mode to expose isolation issues
|
|
npx playwright test --fully-parallel --workers=4
|
|
|
|
# List flaky tests (tests that failed then passed on retry)
|
|
npx playwright test --retries=2 --reporter=json | jq '.suites[].specs[] | select(.ok == true and (.tests[].results | length > 1))'
|
|
```
|
|
|
|
## Patterns
|
|
|
|
### Flakiness Taxonomy
|
|
|
|
Every flaky test falls into one of four categories. Identify the category first, then apply the matching fix.
|
|
|
|
| Category | Symptom | Typical Root Cause | Diagnosis Method |
|
|
|---|---|---|---|
|
|
| **Timing / Async** | Fails intermittently everywhere | Race conditions, missing `await`, arbitrary waits | Fails locally with `--repeat-each=20` |
|
|
| **Test Isolation** | Fails only when run with other tests, passes alone | Shared mutable state, data collisions, test ordering dependency | Passes with `--workers=1 --grep "this test"`, fails in full suite |
|
|
| **Environment** | Fails only in CI, passes locally | Different OS, viewport, fonts, network latency, missing dependencies | Compare CI screenshots/traces with local; run in Docker locally |
|
|
| **Infrastructure** | Random failures unrelated to test logic | Browser crash, OOM, DNS resolution, file system race | Failures have no pattern; error messages reference browser internals |
|
|
|
|
### Diagnosis Flowchart
|
|
|
|
Follow this decision tree to identify which category your flaky test belongs to.
|
|
|
|
```
|
|
Test is flaky
|
|
|
|
|
+-- Does it fail locally with --repeat-each=20?
|
|
| |
|
|
| +-- YES --> TIMING / ASYNC issue
|
|
| | - Missing await
|
|
| | - Using waitForTimeout instead of assertions
|
|
| | - Race condition between action and assertion
|
|
| | - Not waiting for network response before asserting
|
|
| |
|
|
| +-- NO --> Does it fail only in CI?
|
|
| |
|
|
| +-- YES --> ENVIRONMENT issue
|
|
| | - Different viewport/screen size
|
|
| | - Missing fonts causing layout shift
|
|
| | - Slower CI machines hitting timeouts
|
|
| | - External services unavailable
|
|
| |
|
|
| +-- NO --> Does it fail only when run with other tests?
|
|
| |
|
|
| +-- YES --> ISOLATION issue
|
|
| | - Shared mutable state (module-level variables)
|
|
| | - Database/API state from previous test
|
|
| | - localStorage/cookies leaking between tests
|
|
| | - Parallel tests colliding on unique constraints
|
|
| |
|
|
| +-- NO --> INFRASTRUCTURE issue
|
|
| - Browser process crash
|
|
| - Out of memory
|
|
| - File system or network instability
|
|
| - Flaky third-party service
|
|
```
|
|
|
|
### Playwright 1.59 Trace Retention Strategy
|
|
|
|
For flaky tests on Playwright 1.59+, prefer `trace: 'retain-on-failure-and-retries'` over `trace: 'on'` when you are comparing failed attempts with passing retries. It keeps the runs that matter without storing every passing trace in the suite.
|
|
|
|
```typescript
|
|
import { defineConfig } from '@playwright/test';
|
|
|
|
export default defineConfig({
|
|
retries: process.env.CI ? 2 : 0,
|
|
use: {
|
|
trace: process.env.CI
|
|
? 'retain-on-failure-and-retries'
|
|
: 'on-first-retry',
|
|
},
|
|
});
|
|
```
|
|
|
|
This is especially effective when a test fails once, passes on retry, and you need to diff those attempts in Trace Viewer.
|
|
|
|
### Use UI Mode and Trace Viewer Filters
|
|
|
|
When debugging a noisy trace or a long UI Mode run, use the newer filtering options to focus on the failing test, relevant actions, or a specific assertion sequence instead of scanning the full event stream manually.
|
|
|
|
### Fix: Timing and Async Issues
|
|
|
|
**Use when**: The test fails locally with `--repeat-each=20`, or you see `waitForTimeout`, missing `await`, or race conditions.
|
|
|
|
The most common source of flakiness. The fix is always the same: replace arbitrary waits and manual checks with Playwright's auto-retrying mechanisms.
|
|
|
|
**TypeScript**
|
|
```typescript
|
|
import { test, expect } from '@playwright/test';
|
|
|
|
// ---- FIX 1: Replace waitForTimeout with assertions ----
|
|
|
|
// BAD — arbitrary delay, fails on slow machines, wastes time on fast ones
|
|
test('bad: uses arbitrary wait', async ({ page }) => {
|
|
await page.goto('/dashboard');
|
|
await page.getByRole('button', { name: 'Refresh' }).click();
|
|
await page.waitForTimeout(3000); // hoping data loads in 3s
|
|
await expect(page.getByTestId('data-table')).toBeVisible();
|
|
});
|
|
|
|
// GOOD — auto-retrying assertion waits exactly as long as needed
|
|
test('good: uses auto-retrying assertion', async ({ page }) => {
|
|
await page.goto('/dashboard');
|
|
await page.getByRole('button', { name: 'Refresh' }).click();
|
|
await expect(page.getByTestId('data-table')).toBeVisible();
|
|
});
|
|
|
|
// ---- FIX 2: Wait for network responses before asserting ----
|
|
|
|
// BAD — clicks button and immediately asserts, but data comes from API
|
|
test('bad: does not wait for API response', async ({ page }) => {
|
|
await page.goto('/users');
|
|
await page.getByRole('button', { name: 'Load More' }).click();
|
|
// Flaky: API response may not have arrived yet
|
|
await expect(page.getByRole('listitem')).toHaveCount(20);
|
|
});
|
|
|
|
// GOOD — waits for the specific API response that populates the data
|
|
test('good: waits for API response', async ({ page }) => {
|
|
await page.goto('/users');
|
|
|
|
const responsePromise = page.waitForResponse(
|
|
(resp) => resp.url().includes('/api/users') && resp.status() === 200
|
|
);
|
|
await page.getByRole('button', { name: 'Load More' }).click();
|
|
await responsePromise;
|
|
|
|
await expect(page.getByRole('listitem')).toHaveCount(20);
|
|
});
|
|
|
|
// ---- FIX 3: Handle animations and transitions ----
|
|
|
|
// BAD — element exists but is mid-animation, click lands on wrong target
|
|
test('bad: clicks during animation', async ({ page }) => {
|
|
await page.goto('/modal-demo');
|
|
await page.getByRole('button', { name: 'Open' }).click();
|
|
// Modal is animating in — click may miss the button inside it
|
|
await page.getByRole('button', { name: 'Confirm' }).click();
|
|
});
|
|
|
|
// GOOD — wait for the modal to be fully stable before interacting
|
|
test('good: waits for stable state', async ({ page }) => {
|
|
await page.goto('/modal-demo');
|
|
await page.getByRole('button', { name: 'Open' }).click();
|
|
// toBeVisible auto-waits for stability (no animation in progress)
|
|
await expect(page.getByRole('dialog')).toBeVisible();
|
|
await page.getByRole('button', { name: 'Confirm' }).click();
|
|
});
|
|
|
|
// ---- FIX 4: Use toPass() for multi-step assertions that must succeed together ----
|
|
|
|
test('good: retry entire assertion block', async ({ page }) => {
|
|
await page.goto('/search');
|
|
|
|
await expect(async () => {
|
|
await page.getByLabel('Search').fill('playwright');
|
|
await page.getByRole('button', { name: 'Search' }).click();
|
|
await expect(page.getByTestId('result-count')).toHaveText('10 results');
|
|
}).toPass({
|
|
timeout: 15_000,
|
|
intervals: [1_000, 2_000, 5_000],
|
|
});
|
|
});
|
|
```
|
|
|
|
**JavaScript**
|
|
```javascript
|
|
const { test, expect } = require('@playwright/test');
|
|
|
|
// FIX 1: Replace waitForTimeout with assertions
|
|
test('good: uses auto-retrying assertion', async ({ page }) => {
|
|
await page.goto('/dashboard');
|
|
await page.getByRole('button', { name: 'Refresh' }).click();
|
|
await expect(page.getByTestId('data-table')).toBeVisible();
|
|
});
|
|
|
|
// FIX 2: Wait for network responses before asserting
|
|
test('good: waits for API response', async ({ page }) => {
|
|
await page.goto('/users');
|
|
|
|
const responsePromise = page.waitForResponse(
|
|
(resp) => resp.url().includes('/api/users') && resp.status() === 200
|
|
);
|
|
await page.getByRole('button', { name: 'Load More' }).click();
|
|
await responsePromise;
|
|
|
|
await expect(page.getByRole('listitem')).toHaveCount(20);
|
|
});
|
|
|
|
// FIX 3: Handle animations and transitions
|
|
test('good: waits for stable state', async ({ page }) => {
|
|
await page.goto('/modal-demo');
|
|
await page.getByRole('button', { name: 'Open' }).click();
|
|
await expect(page.getByRole('dialog')).toBeVisible();
|
|
await page.getByRole('button', { name: 'Confirm' }).click();
|
|
});
|
|
|
|
// FIX 4: Use toPass() for multi-step assertions
|
|
test('good: retry entire assertion block', async ({ page }) => {
|
|
await page.goto('/search');
|
|
|
|
await expect(async () => {
|
|
await page.getByLabel('Search').fill('playwright');
|
|
await page.getByRole('button', { name: 'Search' }).click();
|
|
await expect(page.getByTestId('result-count')).toHaveText('10 results');
|
|
}).toPass({
|
|
timeout: 15_000,
|
|
intervals: [1_000, 2_000, 5_000],
|
|
});
|
|
});
|
|
```
|
|
|
|
### Fix: Test Isolation Issues
|
|
|
|
**Use when**: The test passes when run alone (`--grep "test name"`) but fails when run with other tests, or fails only in parallel mode.
|
|
|
|
Isolation issues come from shared state: module-level variables, database rows, localStorage, cookies, or file system artifacts.
|
|
|
|
**TypeScript**
|
|
```typescript
|
|
import { test as base, expect } from '@playwright/test';
|
|
|
|
// ---- FIX 1: Unique test data per test ----
|
|
|
|
// BAD — all parallel tests use the same email, causing unique constraint violations
|
|
test('bad: hardcoded data', async ({ page }) => {
|
|
await page.goto('/register');
|
|
await page.getByLabel('Email').fill('test@example.com');
|
|
await page.getByRole('button', { name: 'Register' }).click();
|
|
await expect(page.getByText('Welcome')).toBeVisible();
|
|
});
|
|
|
|
// GOOD — unique email per test run
|
|
test('good: unique data per test', async ({ page }) => {
|
|
const email = `test-${Date.now()}-${Math.random().toString(36).slice(2)}@example.com`;
|
|
|
|
await page.goto('/register');
|
|
await page.getByLabel('Email').fill(email);
|
|
await page.getByRole('button', { name: 'Register' }).click();
|
|
await expect(page.getByText('Welcome')).toBeVisible();
|
|
});
|
|
|
|
// ---- FIX 2: Worker-scoped fixtures for shared expensive resources ----
|
|
|
|
type WorkerFixtures = {
|
|
workerAccount: { email: string; id: string };
|
|
};
|
|
|
|
export const test = base.extend<{}, WorkerFixtures>({
|
|
workerAccount: [async ({ request }, use) => {
|
|
const email = `worker-${Date.now()}-${Math.random().toString(36).slice(2)}@example.com`;
|
|
const response = await request.post('/api/users', {
|
|
data: { email, password: 'TestP@ss123!' },
|
|
});
|
|
const account = await response.json();
|
|
|
|
await use({ email, id: account.id });
|
|
|
|
// Cleanup after all tests in this worker are done
|
|
await request.delete(`/api/users/${account.id}`);
|
|
}, { scope: 'worker' }],
|
|
});
|
|
|
|
// ---- FIX 3: Clean up state in fixture teardown ----
|
|
|
|
export const testWithCleanup = base.extend({
|
|
cleanPage: async ({ page }, use) => {
|
|
await use(page);
|
|
|
|
// Teardown: clear all client-side state
|
|
await page.evaluate(() => {
|
|
localStorage.clear();
|
|
sessionStorage.clear();
|
|
});
|
|
await page.context().clearCookies();
|
|
},
|
|
});
|
|
|
|
// ---- FIX 4: Isolate tests that cannot run in parallel ----
|
|
|
|
import { test } from '@playwright/test';
|
|
|
|
// Use serial mode ONLY for tests that genuinely depend on shared state
|
|
// (e.g., a multi-step wizard where each test is one step)
|
|
test.describe.serial('checkout wizard', () => {
|
|
test('step 1: add items', async ({ page }) => {
|
|
await page.goto('/shop');
|
|
await page.getByRole('button', { name: 'Add Widget' }).click();
|
|
await expect(page.getByTestId('cart-count')).toHaveText('1');
|
|
});
|
|
|
|
test('step 2: enter shipping', async ({ page }) => {
|
|
await page.goto('/checkout/shipping');
|
|
await page.getByLabel('Address').fill('123 Test St');
|
|
await page.getByRole('button', { name: 'Continue' }).click();
|
|
});
|
|
});
|
|
```
|
|
|
|
**JavaScript**
|
|
```javascript
|
|
const { test: base, expect } = require('@playwright/test');
|
|
|
|
// FIX 1: Unique test data per test
|
|
test('good: unique data per test', async ({ page }) => {
|
|
const email = `test-${Date.now()}-${Math.random().toString(36).slice(2)}@example.com`;
|
|
|
|
await page.goto('/register');
|
|
await page.getByLabel('Email').fill(email);
|
|
await page.getByRole('button', { name: 'Register' }).click();
|
|
await expect(page.getByText('Welcome')).toBeVisible();
|
|
});
|
|
|
|
// FIX 2: Worker-scoped fixtures for shared expensive resources
|
|
const test = base.extend({
|
|
workerAccount: [async ({ request }, use) => {
|
|
const email = `worker-${Date.now()}-${Math.random().toString(36).slice(2)}@example.com`;
|
|
const response = await request.post('/api/users', {
|
|
data: { email, password: 'TestP@ss123!' },
|
|
});
|
|
const account = await response.json();
|
|
|
|
await use({ email, id: account.id });
|
|
|
|
await request.delete(`/api/users/${account.id}`);
|
|
}, { scope: 'worker' }],
|
|
});
|
|
|
|
module.exports = { test, expect };
|
|
```
|
|
|
|
### Fix: Environment Issues
|
|
|
|
**Use when**: The test passes locally but fails in CI, or fails on certain operating systems, viewports, or machines.
|
|
|
|
Environment flakiness stems from differences in rendering, timing, available resources, or external service availability between your local machine and CI.
|
|
|
|
**TypeScript**
|
|
```typescript
|
|
// playwright.config.ts — environment-consistent configuration
|
|
import { defineConfig, devices } from '@playwright/test';
|
|
|
|
export default defineConfig({
|
|
// ---- FIX 1: Disable animations for deterministic behavior ----
|
|
use: {
|
|
// Disable CSS animations and transitions
|
|
contextOptions: {
|
|
reducedMotion: 'reduce',
|
|
},
|
|
},
|
|
|
|
// ---- FIX 2: Consistent viewport across environments ----
|
|
projects: [
|
|
{
|
|
name: 'chromium',
|
|
use: {
|
|
...devices['Desktop Chrome'],
|
|
// Explicit viewport prevents layout differences between local and CI
|
|
viewport: { width: 1280, height: 720 },
|
|
},
|
|
},
|
|
],
|
|
|
|
// ---- FIX 3: Use webServer to start app in CI ----
|
|
webServer: {
|
|
command: 'npm run start',
|
|
url: 'http://localhost:3000',
|
|
reuseExistingServer: !process.env.CI,
|
|
timeout: 120_000,
|
|
},
|
|
|
|
// ---- FIX 4: Higher timeouts for slower CI machines ----
|
|
timeout: process.env.CI ? 60_000 : 30_000,
|
|
expect: {
|
|
timeout: process.env.CI ? 10_000 : 5_000,
|
|
},
|
|
});
|
|
```
|
|
|
|
```typescript
|
|
// tests/fixtures/stub-externals.ts — stub external services
|
|
import { test as base, expect } from '@playwright/test';
|
|
|
|
export const test = base.extend({
|
|
// Auto fixture: block all external services in every test
|
|
stubExternals: [async ({ page }, use) => {
|
|
// Block third-party scripts that vary between environments
|
|
await page.route(/google-analytics|segment|hotjar|intercom/, (route) =>
|
|
route.abort()
|
|
);
|
|
|
|
// Stub flaky external API with consistent response
|
|
await page.route('**/api.external-service.com/**', (route) =>
|
|
route.fulfill({
|
|
status: 200,
|
|
contentType: 'application/json',
|
|
body: JSON.stringify({ status: 'ok', data: [] }),
|
|
})
|
|
);
|
|
|
|
await use();
|
|
}, { auto: true }],
|
|
});
|
|
|
|
export { expect };
|
|
```
|
|
|
|
**JavaScript**
|
|
```javascript
|
|
// playwright.config.js
|
|
const { defineConfig, devices } = require('@playwright/test');
|
|
|
|
module.exports = defineConfig({
|
|
use: {
|
|
contextOptions: {
|
|
reducedMotion: 'reduce',
|
|
},
|
|
},
|
|
projects: [
|
|
{
|
|
name: 'chromium',
|
|
use: {
|
|
...devices['Desktop Chrome'],
|
|
viewport: { width: 1280, height: 720 },
|
|
},
|
|
},
|
|
],
|
|
webServer: {
|
|
command: 'npm run start',
|
|
url: 'http://localhost:3000',
|
|
reuseExistingServer: !process.env.CI,
|
|
timeout: 120_000,
|
|
},
|
|
timeout: process.env.CI ? 60_000 : 30_000,
|
|
expect: {
|
|
timeout: process.env.CI ? 10_000 : 5_000,
|
|
},
|
|
});
|
|
```
|
|
|
|
```javascript
|
|
// tests/fixtures/stub-externals.js
|
|
const { test: base, expect } = require('@playwright/test');
|
|
|
|
const test = base.extend({
|
|
stubExternals: [async ({ page }, use) => {
|
|
await page.route(/google-analytics|segment|hotjar|intercom/, (route) =>
|
|
route.abort()
|
|
);
|
|
|
|
await page.route('**/api.external-service.com/**', (route) =>
|
|
route.fulfill({
|
|
status: 200,
|
|
contentType: 'application/json',
|
|
body: JSON.stringify({ status: 'ok', data: [] }),
|
|
})
|
|
);
|
|
|
|
await use();
|
|
}, { auto: true }],
|
|
});
|
|
|
|
module.exports = { test, expect };
|
|
```
|
|
|
|
### Detection Strategies
|
|
|
|
**Use when**: You suspect flakiness but the test does not fail consistently, or you want to validate a fix actually eliminated the flakiness.
|
|
|
|
**TypeScript**
|
|
```typescript
|
|
// ---- Strategy 1: Burn-in testing with --repeat-each ----
|
|
// Run a test 20 times. If it fails even once, it has a flakiness bug.
|
|
// npx playwright test tests/checkout.spec.ts --repeat-each=20
|
|
|
|
// ---- Strategy 2: Retry configuration to catch intermittent failures ----
|
|
// playwright.config.ts
|
|
import { defineConfig } from '@playwright/test';
|
|
|
|
export default defineConfig({
|
|
// Retries in CI surface flaky tests in the report
|
|
retries: process.env.CI ? 2 : 0,
|
|
|
|
// Reporter shows which tests needed retries
|
|
reporter: process.env.CI
|
|
? [['html', { open: 'never' }], ['json', { outputFile: 'results.json' }]]
|
|
: [['html', { open: 'on-failure' }]],
|
|
});
|
|
|
|
// ---- Strategy 3: Custom reporter to track flaky test metrics ----
|
|
// flaky-reporter.ts
|
|
import type { Reporter, TestCase, TestResult } from '@playwright/test/reporter';
|
|
|
|
class FlakyReporter implements Reporter {
|
|
private flakyTests: { name: string; file: string; retries: number }[] = [];
|
|
|
|
onTestEnd(test: TestCase, result: TestResult) {
|
|
if (result.retry > 0 && result.status === 'passed') {
|
|
this.flakyTests.push({
|
|
name: test.title,
|
|
file: test.location.file,
|
|
retries: result.retry,
|
|
});
|
|
}
|
|
}
|
|
|
|
onEnd() {
|
|
if (this.flakyTests.length > 0) {
|
|
console.log('\n--- FLAKY TESTS ---');
|
|
for (const t of this.flakyTests) {
|
|
console.log(` ${t.file} > "${t.name}" (needed ${t.retries} retries)`);
|
|
}
|
|
console.log(`Total flaky: ${this.flakyTests.length}`);
|
|
}
|
|
}
|
|
}
|
|
|
|
export default FlakyReporter;
|
|
```
|
|
|
|
```typescript
|
|
// playwright.config.ts — register custom flaky reporter
|
|
import { defineConfig } from '@playwright/test';
|
|
|
|
export default defineConfig({
|
|
retries: process.env.CI ? 2 : 0,
|
|
reporter: [
|
|
['html'],
|
|
['./flaky-reporter.ts'],
|
|
],
|
|
});
|
|
```
|
|
|
|
**JavaScript**
|
|
```javascript
|
|
// flaky-reporter.js
|
|
class FlakyReporter {
|
|
constructor() {
|
|
this.flakyTests = [];
|
|
}
|
|
|
|
onTestEnd(test, result) {
|
|
if (result.retry > 0 && result.status === 'passed') {
|
|
this.flakyTests.push({
|
|
name: test.title,
|
|
file: test.location.file,
|
|
retries: result.retry,
|
|
});
|
|
}
|
|
}
|
|
|
|
onEnd() {
|
|
if (this.flakyTests.length > 0) {
|
|
console.log('\n--- FLAKY TESTS ---');
|
|
for (const t of this.flakyTests) {
|
|
console.log(` ${t.file} > "${t.name}" (needed ${t.retries} retries)`);
|
|
}
|
|
console.log(`Total flaky: ${this.flakyTests.length}`);
|
|
}
|
|
}
|
|
}
|
|
|
|
module.exports = FlakyReporter;
|
|
```
|
|
|
|
### Quarantine Strategy
|
|
|
|
**Use when**: A test is known-flaky and you cannot fix it immediately. Quarantine it so it does not block CI, but track it so it does not rot.
|
|
|
|
**TypeScript**
|
|
```typescript
|
|
import { test, expect } from '@playwright/test';
|
|
|
|
// ---- Option 1: test.fixme() — skips the test with a reason ----
|
|
test.fixme('checkout with promo code applies discount', async ({ page }) => {
|
|
// TODO(JIRA-1234): Flaky due to race condition in promo service
|
|
// Fails ~10% of runs. Root cause: /api/promo responds after rendering
|
|
await page.goto('/checkout');
|
|
await page.getByLabel('Promo code').fill('SAVE20');
|
|
await page.getByRole('button', { name: 'Apply' }).click();
|
|
await expect(page.getByTestId('discount')).toHaveText('-$20.00');
|
|
});
|
|
|
|
// ---- Option 2: test.fail() — inverts: test passes only if it fails ----
|
|
// Use this when you KNOW the test fails and want CI to alert you when it starts passing
|
|
test.fail('known broken: export to PDF', async ({ page }) => {
|
|
// When this test starts passing, the .fail() annotation will make it fail,
|
|
// reminding you to remove the annotation
|
|
await page.goto('/reports');
|
|
await page.getByRole('button', { name: 'Export PDF' }).click();
|
|
await expect(page.getByText('PDF ready')).toBeVisible({ timeout: 10_000 });
|
|
});
|
|
|
|
// ---- Option 3: Skip by tag — quarantine with a grep filter ----
|
|
test('@flaky checkout race condition', async ({ page }) => {
|
|
// In CI, exclude flaky-tagged tests: npx playwright test --grep-invert @flaky
|
|
// Run ONLY flaky tests nightly: npx playwright test --grep @flaky --retries=5
|
|
await page.goto('/checkout');
|
|
await page.getByRole('button', { name: 'Place Order' }).click();
|
|
await expect(page.getByText('Order confirmed')).toBeVisible();
|
|
});
|
|
```
|
|
|
|
**JavaScript**
|
|
```javascript
|
|
const { test, expect } = require('@playwright/test');
|
|
|
|
// Option 1: test.fixme() — skips the test with a reason
|
|
test.fixme('checkout with promo code applies discount', async ({ page }) => {
|
|
// TODO(JIRA-1234): Flaky due to race condition in promo service
|
|
await page.goto('/checkout');
|
|
await page.getByLabel('Promo code').fill('SAVE20');
|
|
await page.getByRole('button', { name: 'Apply' }).click();
|
|
await expect(page.getByTestId('discount')).toHaveText('-$20.00');
|
|
});
|
|
|
|
// Option 2: test.fail() — inverts: test passes only if it fails
|
|
test.fail('known broken: export to PDF', async ({ page }) => {
|
|
await page.goto('/reports');
|
|
await page.getByRole('button', { name: 'Export PDF' }).click();
|
|
await expect(page.getByText('PDF ready')).toBeVisible({ timeout: 10_000 });
|
|
});
|
|
|
|
// Option 3: Skip by tag
|
|
test('@flaky checkout race condition', async ({ page }) => {
|
|
await page.goto('/checkout');
|
|
await page.getByRole('button', { name: 'Place Order' }).click();
|
|
await expect(page.getByText('Order confirmed')).toBeVisible();
|
|
});
|
|
```
|
|
|
|
**CI configuration for quarantine:**
|
|
|
|
```yaml
|
|
# .github/workflows/tests.yml
|
|
jobs:
|
|
e2e-tests:
|
|
steps:
|
|
- name: Run stable tests
|
|
run: npx playwright test --grep-invert @flaky
|
|
|
|
flaky-monitoring:
|
|
# Runs nightly, not on every PR
|
|
schedule:
|
|
- cron: '0 3 * * *'
|
|
steps:
|
|
- name: Run flaky tests with retries
|
|
run: npx playwright test --grep @flaky --retries=5 --reporter=json
|
|
- name: Report flaky test results
|
|
run: node scripts/report-flaky-metrics.js
|
|
```
|
|
|
|
### Prevention Checklist
|
|
|
|
Apply these rules from the start to prevent flakiness from entering your test suite.
|
|
|
|
**TypeScript**
|
|
```typescript
|
|
// playwright.config.ts — flake-resistant configuration
|
|
import { defineConfig, devices } from '@playwright/test';
|
|
|
|
export default defineConfig({
|
|
// RULE 1: Run tests fully parallel to expose isolation issues early
|
|
fullyParallel: true,
|
|
|
|
// RULE 2: Fail CI if test.only() is left in code
|
|
forbidOnly: !!process.env.CI,
|
|
|
|
// RULE 3: Use retries in CI to surface (not hide) flaky tests
|
|
retries: process.env.CI ? 2 : 0,
|
|
|
|
// RULE 4: Reasonable timeouts — not too high, not too low
|
|
timeout: 30_000,
|
|
expect: { timeout: 5_000 },
|
|
|
|
use: {
|
|
// RULE 5: Always capture traces on retry for debugging
|
|
trace: 'on-first-retry',
|
|
|
|
// RULE 6: Use baseURL — never hardcode full URLs in tests
|
|
baseURL: process.env.BASE_URL || 'http://localhost:3000',
|
|
|
|
// RULE 7: Disable animations for deterministic behavior
|
|
contextOptions: {
|
|
reducedMotion: 'reduce',
|
|
},
|
|
|
|
// RULE 8: Explicit viewport — same locally and in CI
|
|
viewport: { width: 1280, height: 720 },
|
|
},
|
|
|
|
// RULE 9: Start the app automatically in CI
|
|
webServer: {
|
|
command: 'npm run start',
|
|
url: 'http://localhost:3000',
|
|
reuseExistingServer: !process.env.CI,
|
|
},
|
|
});
|
|
```
|
|
|
|
```typescript
|
|
// tests/example-stable-test.spec.ts — applying all rules in a test
|
|
import { test, expect } from '@playwright/test';
|
|
|
|
test.describe('user profile', () => {
|
|
test('updates display name', async ({ page }) => {
|
|
// RULE 10: Unique data per test
|
|
const newName = `User-${Date.now()}`;
|
|
|
|
// RULE 11: Use baseURL — relative paths only
|
|
await page.goto('/profile');
|
|
|
|
// RULE 12: Role-based locators — resilient to implementation changes
|
|
await page.getByRole('textbox', { name: 'Display name' }).fill(newName);
|
|
await page.getByRole('button', { name: 'Save' }).click();
|
|
|
|
// RULE 13: Auto-retrying assertions — never manual waits
|
|
await expect(page.getByRole('alert')).toHaveText('Profile updated');
|
|
|
|
// RULE 14: Assert on the result, not intermediate states
|
|
await expect(page.getByRole('textbox', { name: 'Display name' })).toHaveValue(newName);
|
|
});
|
|
});
|
|
```
|
|
|
|
**JavaScript**
|
|
```javascript
|
|
// playwright.config.js
|
|
const { defineConfig, devices } = require('@playwright/test');
|
|
|
|
module.exports = defineConfig({
|
|
fullyParallel: true,
|
|
forbidOnly: !!process.env.CI,
|
|
retries: process.env.CI ? 2 : 0,
|
|
timeout: 30_000,
|
|
expect: { timeout: 5_000 },
|
|
use: {
|
|
trace: 'on-first-retry',
|
|
baseURL: process.env.BASE_URL || 'http://localhost:3000',
|
|
contextOptions: {
|
|
reducedMotion: 'reduce',
|
|
},
|
|
viewport: { width: 1280, height: 720 },
|
|
},
|
|
webServer: {
|
|
command: 'npm run start',
|
|
url: 'http://localhost:3000',
|
|
reuseExistingServer: !process.env.CI,
|
|
},
|
|
});
|
|
```
|
|
|
|
```javascript
|
|
// tests/example-stable-test.spec.js
|
|
const { test, expect } = require('@playwright/test');
|
|
|
|
test.describe('user profile', () => {
|
|
test('updates display name', async ({ page }) => {
|
|
const newName = `User-${Date.now()}`;
|
|
|
|
await page.goto('/profile');
|
|
await page.getByRole('textbox', { name: 'Display name' }).fill(newName);
|
|
await page.getByRole('button', { name: 'Save' }).click();
|
|
|
|
await expect(page.getByRole('alert')).toHaveText('Profile updated');
|
|
await expect(page.getByRole('textbox', { name: 'Display name' })).toHaveValue(newName);
|
|
});
|
|
});
|
|
```
|
|
|
|
## Decision Guide
|
|
|
|
```
|
|
My test is flaky. What do I do?
|
|
|
|
|
+-- Step 1: Reproduce locally
|
|
| |
|
|
| +-- npx playwright test <file> --repeat-each=20
|
|
| +-- Fails? --> TIMING issue. Fix with auto-retrying assertions.
|
|
| +-- Does not fail? --> Continue to Step 2.
|
|
|
|
|
+-- Step 2: Isolate from other tests
|
|
| |
|
|
| +-- npx playwright test --grep "exact test name" --workers=1
|
|
| +-- Passes alone? --> ISOLATION issue. Fix with unique data + fixtures.
|
|
| +-- Fails alone? --> Continue to Step 3.
|
|
|
|
|
+-- Step 3: Compare environments
|
|
| |
|
|
| +-- Download CI trace, compare with local trace
|
|
| +-- Different? --> ENVIRONMENT issue. Fix with explicit viewport,
|
|
| | reducedMotion, webServer config, stub externals.
|
|
| +-- Same? --> Continue to Step 4.
|
|
|
|
|
+-- Step 4: Check infrastructure
|
|
| |
|
|
| +-- Error mentions browser crash, OOM, DNS, ECONNREFUSED?
|
|
| +-- YES --> INFRASTRUCTURE issue. Fix with Docker, retry config,
|
|
| | health checks before test run.
|
|
| +-- NO --> Re-examine. Enable trace: 'on' and retries to collect
|
|
| more data. Compare passing and failing traces side by side.
|
|
```
|
|
|
|
## Anti-Patterns
|
|
|
|
| Don't Do This | Problem | Do This Instead |
|
|
|---|---|---|
|
|
| Increase timeout to 120s to "fix" flakiness | Masks the real issue. Tests become unbearably slow when they fail. Slows the entire CI pipeline. | Diagnose the root cause. Fix the race condition, not the timeout. |
|
|
| Use `page.waitForTimeout(N)` | Arbitrary delays are too slow on fast machines and too fast on slow ones. The #1 cause of flakiness. | Use `expect(locator).toBeVisible()`, `page.waitForResponse()`, or `expect.poll()`. |
|
|
| Ignore flaky tests ("it works if you run it again") | Flaky tests erode trust in the entire suite. People stop reading failures. Real bugs slip through. | Diagnose immediately. If you cannot fix now, quarantine with `test.fixme()` and a tracking ticket. |
|
|
| Add `--retries=3` and call it fixed | Retries do not fix flakiness, they hide it. A test that needs retries is a test with a bug. | Use retries to **detect** flakiness (check the retry count in reports), not to paper over it. |
|
|
| Use `test.describe.serial()` to fix ordering-dependent tests | Serial mode forces all tests in the block to run sequentially. It hides isolation bugs and slows the suite. | Fix the isolation issue. Each test should pass regardless of execution order. |
|
|
| Mock everything to prevent environment differences | Over-mocking removes confidence that the real system works. Tests pass but the app is broken. | Mock only external/third-party services. Test your own API for real. |
|
|
| Run `--repeat-each=100` in CI on every commit | Multiplies CI time by 100x. Wastes resources. | Run burn-in locally or in a nightly job, not on every PR. |
|
|
|
|
## Troubleshooting
|
|
|
|
| Symptom | Category | Fix |
|
|
|---|---|---|
|
|
| Test fails with "Timeout 5000ms" intermittently | Timing | Increase `expect.timeout` to 10s, or add `page.waitForResponse()` before the assertion |
|
|
| Test passes alone, fails in full suite run | Isolation | Check for module-level `let` variables, shared database rows, or localStorage leaks |
|
|
| Test passes locally, fails in CI | Environment | Compare traces. Check viewport, fonts, `reducedMotion`, and external service availability |
|
|
| Test fails with "Target closed" or "Browser closed" | Infrastructure | Check CI memory limits. Add `--workers=50%` to reduce parallel load. Add health check in `beforeAll` |
|
|
| Test fails differently every time | Timing + Isolation | Enable `trace: 'on'` and compare multiple failing traces. The inconsistency itself is a clue |
|
|
| Flaky test passes 99/100 times | Timing (rare race) | Use `--repeat-each=200` locally. Add `page.waitForResponse()` or `expect.poll()` for the specific race |
|
|
| Visual comparison test is flaky | Environment | Use `maxDiffPixelRatio` threshold. Set explicit fonts with `@font-face`. Use Docker for consistent rendering |
|
|
| Tests flake only on WebKit | Environment | WebKit has different timing behavior. Add WebKit-specific assertions or increase timeouts per project |
|
|
|
|
## Related
|
|
|
|
- [core/assertions-and-waiting.md](assertions-and-waiting.md) -- auto-retrying assertions and explicit waits
|
|
- [core/fixtures-and-hooks.md](fixtures-and-hooks.md) -- fixture teardown for test isolation
|
|
- [core/test-data-management.md](test-data-management.md) -- unique data per test, factory functions
|
|
- [core/configuration.md](configuration.md) -- retry, timeout, and trace configuration
|
|
- [core/debugging.md](debugging.md) -- trace viewer, UI mode, and Inspector for diagnosing failures
|
|
- [core/common-pitfalls.md](common-pitfalls.md) -- common mistakes that cause flakiness
|