mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-05-13 09:42:40 +02:00
# Debugging and Managing Flaky Tests

## Table of Contents

1. [Understanding Flakiness Types](#understanding-flakiness-types)
2. [Detection and Reproduction](#detection-and-reproduction)
3. [Root Cause Analysis](#root-cause-analysis)
4. [Fixing Strategies by Type](#fixing-strategies-by-type)
5. [CI-Specific Flakiness](#ci-specific-flakiness)
6. [Quarantine and Management](#quarantine-and-management)
7. [Prevention Strategies](#prevention-strategies)
8. [Anti-Patterns to Avoid](#anti-patterns-to-avoid)
9. [Related References](#related-references)

## Understanding Flakiness Types

### Categories of Flakiness

Most flaky tests fall into distinct categories requiring different remediation:

| Category                    | Symptoms                        | Common Causes                                          |
| --------------------------- | ------------------------------- | ------------------------------------------------------ |
| **UI-driven**               | Element not found, click missed | Missing waits, animations, dynamic rendering           |
| **Environment-driven**      | CI-only failures                | Slower CPU, memory limits, cold browser starts         |
| **Data/parallelism-driven** | Fails with multiple workers     | Shared backend data, reused accounts, state collisions |
| **Test-suite-driven**       | Fails when run with other tests | Leaked state, shared fixtures, order dependencies      |

### Flakiness Decision Tree

```
Test fails intermittently
├─ Fails locally too?
│   ├─ YES → Timing/async issue → Check waits and assertions
│   └─ NO → CI-specific → Check environment differences
│
├─ Fails only with multiple workers?
│   └─ YES → Parallelism issue → Check data isolation
│
├─ Fails only when run after specific tests?
│   └─ YES → State leak → Check fixtures and cleanup
│
└─ Fails randomly regardless of conditions?
    └─ External dependency → Check network/API stability
```
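
The tree can also be written down as a small triage helper, which is handy when the same questions get asked in every flaky-test ticket. This is only one possible reading of the tree: the `FlakeObservations` shape and the `suggestChecks` function are hypothetical names, and the ordering (most specific pattern first) is an assumption rather than something the tree prescribes.

```typescript
// Hypothetical observation flags, filled in from the reproduction runs.
interface FlakeObservations {
  failsLocally: boolean;
  failsOnlyWithMultipleWorkers: boolean;
  failsOnlyAfterSpecificTests: boolean;
}

// Returns the checks suggested by the decision tree, most specific first.
function suggestChecks(obs: FlakeObservations): string[] {
  const checks: string[] = [];
  if (obs.failsOnlyWithMultipleWorkers) {
    checks.push("Parallelism issue: check data isolation");
  }
  if (obs.failsOnlyAfterSpecificTests) {
    checks.push("State leak: check fixtures and cleanup");
  }
  checks.push(
    obs.failsLocally
      ? "Timing/async issue: check waits and assertions"
      : "CI-specific: check environment differences",
  );
  // No worker or ordering pattern: treat as random and suspect external dependencies.
  if (!obs.failsOnlyWithMultipleWorkers && !obs.failsOnlyAfterSpecificTests) {
    checks.push("External dependency: check network/API stability");
  }
  return checks;
}
```

A test that fails only in CI and only with several workers would get the parallelism check first, followed by the environment check.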

## Detection and Reproduction

### Confirming Flakiness

```bash
# Run test multiple times to confirm instability
npx playwright test tests/checkout.spec.ts --repeat-each=20

# Run with single worker to isolate parallelism issues
npx playwright test --workers=1

# Run in CI-like conditions locally
CI=true npx playwright test --repeat-each=10
```

### Reproduction Strategies

```typescript
// playwright.config.ts - Enable artifacts for flaky test investigation
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: "on-first-retry", // Capture trace on retry
    video: "retain-on-failure",
    screenshot: "only-on-failure",
  },
});
```

### Identify Flaky Tests Programmatically

```typescript
import { test } from "@playwright/test";

// Track test results across runs
test.afterEach(async ({}, testInfo) => {
  // A test that passes only on a retry is flaky by definition
  if (testInfo.retry > 0 && testInfo.status === "passed") {
    console.warn(`FLAKY: ${testInfo.title} passed on retry ${testInfo.retry}`);
    // Log to your tracking system
  }
});
```
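
Once those warnings are collected somewhere, they can be aggregated into a per-test flake rate so the worst offenders are fixed first. The sketch below assumes a hypothetical `TestRunRecord` shape with one entry per test per run; how the records are stored and loaded is left to your tracking system.

```typescript
// Hypothetical record shape: one entry per test per CI run.
interface TestRunRecord {
  title: string;
  status: "passed" | "failed";
  retry: number; // attempt on which the final status was reached (0 = first attempt)
}

interface FlakeStats {
  title: string;
  runs: number;
  flakyRuns: number;
  flakeRate: number; // flakyRuns / runs
}

// Groups records by test title and returns tests with at least one flaky run,
// sorted by flake rate in descending order.
function summarizeFlakes(records: TestRunRecord[]): FlakeStats[] {
  const byTitle = new Map<string, { runs: number; flakyRuns: number }>();
  for (const r of records) {
    const entry = byTitle.get(r.title) ?? { runs: 0, flakyRuns: 0 };
    entry.runs += 1;
    // Same definition as the hook above: passed, but only after a retry
    if (r.status === "passed" && r.retry > 0) entry.flakyRuns += 1;
    byTitle.set(r.title, entry);
  }
  return Array.from(byTitle.entries())
    .map(([title, { runs, flakyRuns }]) => ({
      title,
      runs,
      flakyRuns,
      flakeRate: flakyRuns / runs,
    }))
    .filter((s) => s.flakyRuns > 0)
    .sort((a, b) => b.flakeRate - a.flakeRate);
}
```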

## Root Cause Analysis

### Event Logging for Race Conditions

Add comprehensive event logging to expose timing issues:

```typescript
import { test } from "@playwright/test";

test.beforeEach(async ({ page }) => {
  // Mirror browser console output, uncaught page errors, and failed requests
  // into the test log so their order relative to test steps is visible
  page.on("console", (msg) =>
    console.log(`CONSOLE [${msg.type()}]:`, msg.text()),
  );
  page.on("pageerror", (err) => console.error("PAGE ERROR:", err.message));
  page.on("requestfailed", (req) =>
    console.error(`REQUEST FAILED: ${req.url()}`),
  );
});
```

> **For comprehensive console error handling** (fail on errors, allowed patterns, fixtures), see [console-errors.md](console-errors.md).

### Network Timing Analysis

```typescript
import { test } from "@playwright/test";

// Log slow or failed requests
test.beforeEach(async ({ page }) => {
  page.on("requestfinished", (request) => {
    const timing = request.timing();
    // Timing values are -1 when a phase was not recorded; skip those requests
    if (timing.requestStart < 0 || timing.responseEnd < 0) return;
    const duration = timing.responseEnd - timing.requestStart;
    if (duration > 2000) {
      console.warn(`SLOW: ${request.url()} took ${Math.round(duration)}ms`);
    }
  });

  page.on("requestfailed", (request) => {
    console.error(`Failed: ${request.url()} - ${request.failure()?.errorText}`);
  });
});
```
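
When a request is slow, it helps to know whether the time went into waiting for the server or into downloading the body. The helper below is a sketch that works on a subset of the object returned by `request.timing()`; the `ResourceTimingLike` interface and the `requestPhases` name are introduced here for illustration. It relies on Playwright reporting each timing value in milliseconds relative to the request's start time, with `-1` meaning the value is unavailable.

```typescript
// Subset of the fields returned by Playwright's request.timing().
// Values are milliseconds relative to startTime; -1 means "not available".
interface ResourceTimingLike {
  requestStart: number;
  responseStart: number;
  responseEnd: number;
}

// Splits a finished request into server wait time and download time.
// Returns null if any of the required phases was not recorded.
function requestPhases(
  t: ResourceTimingLike,
): { waitingMs: number; downloadMs: number } | null {
  if (t.requestStart < 0 || t.responseStart < 0 || t.responseEnd < 0) {
    return null;
  }
  return {
    waitingMs: t.responseStart - t.requestStart,
    downloadMs: t.responseEnd - t.responseStart,
  };
}
```

A large `waitingMs` points at a slow backend or a cold cache, while a large `downloadMs` points at payload size or bandwidth limits on the runner.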

### Trace Analysis

```bash
# View trace from failed CI run
npx playwright show-trace path/to/trace.zip

# Generate trace for specific test
npx playwright test tests/flaky.spec.ts --trace on
```

## Fixing Strategies by Type

### UI-Driven Flakiness

**Problem: Element not ready when action executes**

```typescript
// ❌ BAD: No wait for element state
await page.click("#submit");
await page.fill("#username", "test"); // Element may not be ready

// ✅ GOOD: Actions + assertions pattern (auto-waiting built-in)
await page.getByRole("button", { name: "Submit" }).click();
await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
```

**Problem: Animations or transitions interfere**

```typescript
// ❌ BAD: Click during animation
await page.click(".menu-item");

// ✅ GOOD: Wait for animation to complete
await page.getByRole("menuitem", { name: "Settings" }).click();
await expect(page.getByRole("dialog")).toBeVisible();
// Or disable animations entirely
await page.emulateMedia({ reducedMotion: "reduce" });
```

**Problem: Brittle selectors**

```typescript
// ❌ BAD: Fragile CSS chain
await page.click("div.container > div:nth-child(2) > button.btn-primary");

// ✅ GOOD: Semantic selectors
await page.getByRole("button", { name: "Continue" }).click();
await page.getByTestId("checkout-button").click();
await page.getByLabel("Email address").fill("test@example.com");
```

### Async/Timing Flakiness

**Problem: Race between test and application**

```typescript
// ❌ BAD: Arbitrary sleep
await page.click("#load-data");
await page.waitForTimeout(3000); // Hope data loads in 3s

// ✅ GOOD: Wait for specific condition
await page.click("#load-data");
await expect(page.locator(".data-row")).toHaveCount(10, { timeout: 10000 });

// ✅ BETTER: Wait for network response, then assert
const responsePromise = page.waitForResponse(
  (r) =>
    r.url().includes("/api/data") &&
    r.request().method() === "GET" &&
    r.ok(),
);
await page.click("#load-data");
await responsePromise;
await expect(page.locator(".data-row")).toHaveCount(10);
```

> **For comprehensive waiting strategies** (navigation, element state, network, polling with `toPass()`), see [assertions-waiting.md](assertions-waiting.md#waiting-strategies).

**Problem: Complex async state**

```typescript
// Custom wait for application-specific conditions
await page.waitForFunction(() => {
  const app = (window as any).__APP_STATE__;
  return app?.isReady && !app?.isLoading;
});

// Wait for multiple conditions
// (the waits are registered before the click, so no response is missed)
await Promise.all([
  page.waitForResponse("**/api/user"),
  page.waitForResponse("**/api/settings"),
  page.getByRole("button", { name: "Load" }).click(),
]);
```

### Data/Parallelism-Driven Flakiness

**Problem: Tests share backend data**

```typescript
// ❌ BAD: All workers use same user
const testUser = { email: "test@example.com", password: "pass123" };

// ✅ GOOD: Unique data per worker
import { test as base } from "@playwright/test";

// createTestUser / deleteTestUser are app-specific helpers (not shown)
export const test = base.extend<
  {},
  { testUser: { email: string; id: string } }
>({
  testUser: [
    async ({}, use, workerInfo) => {
      const email = `test-${workerInfo.workerIndex}-${Date.now()}@example.com`;
      const user = await createTestUser(email);
      await use(user);
      // Teardown: runs once the worker is done with the fixture
      await deleteTestUser(user.id);
    },
    { scope: "worker" },
  ],
});
```

**Problem: Shared storageState across workers**

```typescript
// ❌ BAD: All workers share same auth state (playwright.config.ts)
use: {
  storageState: '.auth/user.json',
}

// ✅ GOOD: Per-worker auth state
import fs from "fs";
import { test as base } from "@playwright/test";

export const test = base.extend<{}, { workerStorageState: string }>({
  // Point every test in this worker at the worker's own auth file
  storageState: ({ workerStorageState }, use) => use(workerStorageState),

  workerStorageState: [
    async ({ browser }, use, workerInfo) => {
      const id = workerInfo.workerIndex;
      const fileName = `.auth/user-${id}.json`;

      // Authenticate once per worker and reuse the saved state afterwards
      if (!fs.existsSync(fileName)) {
        const page = await browser.newPage({ storageState: undefined });
        // authenticateUser is an app-specific login helper (not shown)
        await authenticateUser(page, `worker${id}@test.com`);
        await page.context().storageState({ path: fileName });
        await page.close();
      }

      await use(fileName);
    },
    { scope: "worker" },
  ],
});
```

### Test-Suite-Driven Flakiness (State Leaks)

**Problem: Tests affect each other**

```typescript
import { test, type Page } from "@playwright/test";

// ❌ BAD: Module-level state persists across tests
let sharedPage: Page;

test.beforeAll(async ({ browser }) => {
  sharedPage = await browser.newPage(); // Shared across tests!
});

// ✅ GOOD: Use Playwright's default isolation (fresh context per test)
test("first test", async ({ page }) => {
  // Fresh page for this test
});

test("second test", async ({ page }) => {
  // Fresh page for this test
});
```

**Problem: Fixture cleanup not happening**

```typescript
import fs from "fs";
import { test as base } from "@playwright/test";

// ✅ GOOD: Proper fixture with cleanup
export const test = base.extend<{ tempFile: string }>({
  tempFile: async ({}, use) => {
    const file = `/tmp/test-${Date.now()}.json`;
    fs.writeFileSync(file, "{}");

    await use(file);

    // Cleanup always runs, even on failure
    if (fs.existsSync(file)) {
      fs.unlinkSync(file);
    }
  },
});
```

## CI-Specific Flakiness

### Why Tests Fail Only in CI

| CI Condition       | Impact                                | Solution                                             |
| ------------------ | ------------------------------------- | ---------------------------------------------------- |
| Slower CPU         | Actions complete later than expected  | Use auto-waiting, not timeouts                       |
| Cold browser start | No cached assets, slower initial load | Add explicit waits for first navigation              |
| Headless mode      | Different rendering behavior          | Test locally in headless mode                        |
| Shared runners     | Resource contention                   | Reduce parallelism or use dedicated runners          |
| Network latency    | API calls slower                      | Mock external APIs, increase timeouts for real calls |

### Simulating CI Locally

```bash
# Run headless with CI environment variable
CI=true npx playwright test

# Limit CPU (Linux/Mac)
cpulimit -l 50 -- npx playwright test

# Run in Docker matching CI environment
# (the image tag should match your installed @playwright/test version)
docker run -it --rm \
  -v "$(pwd)":/work \
  -w /work \
  mcr.microsoft.com/playwright:v1.40.0-jammy \
  npx playwright test
```

### Consistent Viewport and Scale

```typescript
// playwright.config.ts - Match CI rendering exactly
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    viewport: { width: 1280, height: 720 },
    deviceScaleFactor: 1,
  },
});
```

### Network Stubbing for External APIs

```typescript
import { test } from "@playwright/test";

// Eliminate external API flakiness
test.beforeEach(async ({ page }) => {
  // Stub unstable third-party APIs
  await page.route("**/api.analytics.com/**", (route) =>
    route.fulfill({ body: "" }),
  );
  await page.route("**/api.payment-provider.com/**", (route) =>
    route.fulfill({ json: { status: "ok" } }),
  );
});

// Test-specific stub
test("checkout with payment", async ({ page }) => {
  await page.route("**/api/payment", (route) =>
    route.fulfill({ json: { success: true, transactionId: "test-123" } }),
  );
  // Test proceeds with deterministic response
});
```

## Quarantine and Management

### Quarantine Pattern

```typescript
// playwright.config.ts - Separate flaky tests
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      name: "stable",
      testIgnore: ["**/*.flaky.spec.ts"],
    },
    {
      name: "quarantine",
      testMatch: ["**/*.flaky.spec.ts"],
      retries: 3,
    },
  ],
});
```

### Annotation-Based Quarantine

```typescript
// Mark flaky tests with annotations
test("intermittent checkout issue", async ({ page }, testInfo) => {
  testInfo.annotations.push({
    type: "flaky",
    description: "Investigating payment API timing - JIRA-1234",
  });

  // Test implementation
});

// Skip flaky test conditionally
test("known CI flaky", async ({ page }) => {
  test.skip(!!process.env.CI, "Flaky in CI - investigating JIRA-5678");
  // Test implementation
});
```
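
Quarantine only helps if every quarantined test stays tied to a ticket. As a rough sketch, a small check can flag tests annotated `flaky` whose description carries no ticket reference. The `AnnotatedTest` shape, the `untrackedFlakyTests` name, and the ticket pattern (an uppercase project key, a hyphen, and a number, as in `JIRA-1234`) are all assumptions for illustration; the annotation objects mirror the `{ type, description }` pairs pushed above.

```typescript
// Mirrors the { type, description } pairs pushed to testInfo.annotations.
interface Annotation {
  type: string;
  description?: string;
}

// Hypothetical shape for a test collected from a report.
interface AnnotatedTest {
  title: string;
  annotations: Annotation[];
}

// Assumed ticket format: uppercase project key, hyphen, number (e.g. JIRA-1234).
const TICKET_PATTERN = /\b[A-Z][A-Z0-9]+-\d+\b/;

// Returns titles of tests marked "flaky" without a ticket in the description.
function untrackedFlakyTests(tests: AnnotatedTest[]): string[] {
  return tests
    .filter((t) =>
      t.annotations.some(
        (a) => a.type === "flaky" && !TICKET_PATTERN.test(a.description ?? ""),
      ),
    )
    .map((t) => t.title);
}
```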

## Prevention Strategies

### Test Burn-In

```bash
# Run new tests many times before merging
npx playwright test tests/new-feature.spec.ts --repeat-each=50

# Run in parallel to expose race conditions
npx playwright test tests/new-feature.spec.ts --repeat-each=20 --workers=4
```
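
How many repeats are enough depends on how rare the flake is. Assuming each run fails independently with probability `p`, the chance of seeing at least one failure in `n` runs is `1 - (1 - p)^n`. The sketch below turns that into two helpers (the function names are illustrative); independence is an idealization, since real flakes often cluster under load.

```typescript
// Probability of observing at least one failure in `repeats` independent runs.
function detectionProbability(failureRate: number, repeats: number): number {
  return 1 - Math.pow(1 - failureRate, repeats);
}

// Smallest number of repeats that reaches the requested detection confidence.
function repeatsNeeded(failureRate: number, confidence: number): number {
  if (failureRate <= 0 || failureRate >= 1) {
    throw new RangeError("failureRate must be between 0 and 1 (exclusive)");
  }
  if (confidence <= 0 || confidence >= 1) {
    throw new RangeError("confidence must be between 0 and 1 (exclusive)");
  }
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - failureRate));
}
```

Under this model, a test that fails 5% of the time needs 59 repeats for a 95% chance of failing at least once, and `--repeat-each=50` would catch it roughly 92% of the time.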

### Isolation Checklist

```typescript
// ✅ Each test should be self-contained
// (`test` here is the extended test that provides the testUser fixture)
test.describe("User profile", () => {
  test("can update name", async ({ page, testUser }) => {
    // Uses the per-worker testUser fixture
    // No dependency on other tests
    // Cleanup handled by fixture
  });

  test("can update email", async ({ page, testUser }) => {
    // Independent of "can update name"
    // Same worker-scoped testUser, but no reliance on state left by other tests
  });
});
```

### Defensive Assertions

```typescript
// ❌ BAD: Single point of failure
await expect(page.locator(".items")).toHaveCount(5);

// ✅ GOOD: Progressive assertions that help diagnose
await expect(page.locator(".items-container")).toBeVisible();
await expect(page.locator(".loading")).not.toBeVisible();
await expect(page.locator(".items")).toHaveCount(5);
```

### Retry Budget

```typescript
// playwright.config.ts - Limit retries to avoid masking issues
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0, // Only retry in CI
  expect: {
    timeout: 10000, // Reasonable assertion timeout
  },
  timeout: 60000, // Test timeout
});
```
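
Capping retries per test limits how long a flake can hide, but it does not limit how many tests lean on retries. One way to extend the idea, sketched here under assumptions, is a suite-level gate that fails the build when too large a share of tests passed only after retrying. The `SuiteResult` shape and the `checkRetryBudget` function are hypothetical; the input would come from whatever reporter output you already collect.

```typescript
// Hypothetical per-test summary taken from a finished run.
interface SuiteResult {
  title: string;
  passed: boolean;
  retriesUsed: number; // 0 when the test passed or failed on the first attempt
}

interface BudgetReport {
  flakyTitles: string[];
  flakyFraction: number;
  withinBudget: boolean;
}

// Compares the share of tests that passed only after a retry against a budget.
function checkRetryBudget(
  results: SuiteResult[],
  maxFlakyFraction: number,
): BudgetReport {
  const flaky = results.filter((r) => r.passed && r.retriesUsed > 0);
  const flakyFraction = results.length === 0 ? 0 : flaky.length / results.length;
  return {
    flakyTitles: flaky.map((r) => r.title),
    flakyFraction,
    withinBudget: flakyFraction <= maxFlakyFraction,
  };
}
```

A CI step could call this after the run and exit non-zero when `withinBudget` is false, which keeps retries available for diagnosis without letting them absorb a growing number of flakes.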

## Anti-Patterns to Avoid

| Anti-Pattern                              | Problem                             | Solution                                       |
| ----------------------------------------- | ----------------------------------- | ---------------------------------------------- |
| `waitForTimeout()` as primary wait        | Arbitrary, hides real timing issues | Use auto-waiting assertions                    |
| Increasing global timeout to "fix" flakes | Masks root cause, slows all tests   | Find and fix actual timing issue               |
| Retrying until pass                       | Hides systemic problems             | Fix root cause, use retries for diagnosis only |
| Shared test data across workers           | Race conditions, collisions         | Isolate data per worker                        |
| Testing real external APIs                | Network variability                 | Mock external dependencies                     |
| Module-level mutable state                | Leaks between tests                 | Use fixtures with proper cleanup               |
| Ignoring flaky tests                      | Problem compounds over time         | Quarantine and track for fixing                |

## Related References

- **Debugging**: See [debugging.md](debugging.md) for trace viewer and inspector
- **Fixtures**: See [fixtures-hooks.md](../core/fixtures-hooks.md) for worker-scoped isolation
- **Performance**: See [performance.md](../infrastructure-ci-cd/performance.md) for parallel execution patterns
- **Assertions**: See [assertions-waiting.md](../core/assertions-waiting.md) for auto-waiting patterns
- **Global Setup**: See [global-setup.md](../core/global-setup.md) for setup vs fixtures decision