Debugging and Managing Flaky Tests
Table of Contents
- Understanding Flakiness Types
- Detection and Reproduction
- Root Cause Analysis
- Fixing Strategies by Type
- CI-Specific Flakiness
- Quarantine and Management
- Prevention Strategies
Understanding Flakiness Types
Categories of Flakiness
Most flaky tests fall into distinct categories requiring different remediation:
| Category | Symptoms | Common Causes |
|---|---|---|
| UI-driven | Element not found, click missed | Missing waits, animations, dynamic rendering |
| Environment-driven | CI-only failures | Slower CPU, memory limits, cold browser starts |
| Data/parallelism-driven | Fails with multiple workers | Shared backend data, reused accounts, state collisions |
| Test-suite-driven | Fails when run with other tests | Leaked state, shared fixtures, order dependencies |
Flakiness Decision Tree
```text
Test fails intermittently
├─ Fails locally too?
│  ├─ YES → Timing/async issue → Check waits and assertions
│  └─ NO  → CI-specific → Check environment differences
│
├─ Fails only with multiple workers?
│  └─ YES → Parallelism issue → Check data isolation
│
├─ Fails only when run after specific tests?
│  └─ YES → State leak → Check fixtures and cleanup
│
└─ Fails randomly regardless of conditions?
   └─ External dependency → Check network/API stability
```
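The same triage can be encoded as a small helper, for example in a script that labels flaky-test tickets. This is a sketch, not part of Playwright. The `FlakeObservations` fields and category strings are hypothetical names. The helper checks the more specific signals (order dependence, worker count, CI-only) before the generic "fails locally" branch, which is one reasonable reading of the tree. Its fallback means "no consistent trigger found".

```typescript
// Hypothetical triage helper mirroring the flakiness decision tree
interface FlakeObservations {
  failsOnlyAfterSpecificTests: boolean; // depends on test order
  failsOnlyWithMultipleWorkers: boolean; // passes with --workers=1
  failsOnlyInCI: boolean; // never reproduces locally
  failsLocally: boolean; // reproduces on a developer machine
}

type FlakeCategory =
  | "state-leak"
  | "parallelism"
  | "ci-environment"
  | "timing"
  | "external-dependency";

function classifyFlake(obs: FlakeObservations): FlakeCategory {
  // Most specific signals first
  if (obs.failsOnlyAfterSpecificTests) return "state-leak";
  if (obs.failsOnlyWithMultipleWorkers) return "parallelism";
  if (obs.failsOnlyInCI) return "ci-environment";
  if (obs.failsLocally) return "timing";
  // No trigger identified: suspect network/API stability
  return "external-dependency";
}
```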
Detection and Reproduction
Confirming Flakiness
```bash
# Run the test many times to confirm instability
npx playwright test tests/checkout.spec.ts --repeat-each=20

# Run with a single worker to isolate parallelism issues
npx playwright test --workers=1

# Run under CI-like conditions locally
CI=true npx playwright test --repeat-each=10
```
Reproduction Strategies
```typescript
// playwright.config.ts - Enable artifacts for flaky test investigation
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: "on-first-retry", // Capture a trace on the first retry
    video: "retain-on-failure",
    screenshot: "only-on-failure",
  },
});
```
Identify Flaky Tests Programmatically
```typescript
import { test } from "@playwright/test";

// Track test results across runs: a test that passes only on a retry is flaky
test.afterEach(async ({}, testInfo) => {
  if (testInfo.retry > 0 && testInfo.status === "passed") {
    console.warn(`FLAKY: ${testInfo.title} passed on retry ${testInfo.retry}`);
    // Log to your tracking system
  }
});
```
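Playwright's built-in JSON reporter (`--reporter=json`) already marks tests that failed and then passed on retry as `flaky`. So another option is to scan the report after the run. The sketch below assumes a minimal subset of the report shape: nested `suites`, each with `specs` whose `tests` carry a `status`. The interfaces are hand-written here rather than imported from Playwright, so verify the field names against the report your version produces.

```typescript
// Minimal, hand-written subset of the JSON reporter's output (assumed shape)
interface ReportTest {
  projectName: string;
  status: "expected" | "unexpected" | "flaky" | "skipped";
}
interface ReportSpec {
  title: string;
  file: string;
  tests: ReportTest[];
}
interface ReportSuite {
  title: string;
  specs: ReportSpec[];
  suites?: ReportSuite[];
}
interface Report {
  suites: ReportSuite[];
}

// Walk nested suites and collect "file > title [project]" for every flaky test
function findFlakyTests(report: Report): string[] {
  const flaky: string[] = [];
  const visit = (suite: ReportSuite): void => {
    for (const spec of suite.specs) {
      for (const t of spec.tests) {
        if (t.status === "flaky") {
          flaky.push(`${spec.file} > ${spec.title} [${t.projectName}]`);
        }
      }
    }
    suite.suites?.forEach(visit);
  };
  report.suites.forEach(visit);
  return flaky;
}
```

The JSON reporter writes to stdout by default, so `npx playwright test --reporter=json > report.json` followed by `findFlakyTests(JSON.parse(fs.readFileSync("report.json", "utf8")))` is one way to wire this up.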
Root Cause Analysis
Event Logging for Race Conditions
Add comprehensive event logging to expose timing issues:
```typescript
import { test } from "@playwright/test";

test.beforeEach(async ({ page }) => {
  page.on("console", (msg) =>
    console.log(`CONSOLE [${msg.type()}]:`, msg.text()),
  );
  page.on("pageerror", (err) => console.error("PAGE ERROR:", err.message));
  page.on("requestfailed", (req) =>
    console.error(`REQUEST FAILED: ${req.url()}`),
  );
});
```
For comprehensive console error handling (fail on errors, allowed patterns, fixtures), see console-errors.md.
Network Timing Analysis
```typescript
import { test } from "@playwright/test";

// Log slow or failed requests
test.beforeEach(async ({ page }) => {
  page.on("requestfinished", (request) => {
    const timing = request.timing();
    // Timing values are in ms relative to timing.startTime; -1 means unavailable
    if (timing.requestStart < 0 || timing.responseEnd < 0) return;
    const duration = timing.responseEnd - timing.requestStart;
    if (duration > 2000) {
      console.warn(`SLOW: ${request.url()} took ${Math.round(duration)}ms`);
    }
  });
  page.on("requestfailed", (request) => {
    console.error(`Failed: ${request.url()} - ${request.failure()?.errorText}`);
  });
});
```
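To review slow requests after a run rather than one log line at a time, you can collect the timings and rank them. This pure helper takes entries built from the fields read in the handler: `url`, plus `requestStart` and `responseEnd` in milliseconds, where -1 means unavailable, as in Playwright's `request.timing()`. The `TimedRequest` type and the `slowestRequests` name are illustrative.

```typescript
// Illustrative record: url plus the two timing fields read from request.timing()
interface TimedRequest {
  url: string;
  requestStart: number; // ms relative to startTime, -1 if unavailable
  responseEnd: number; // ms relative to startTime, -1 if unavailable
}

// Return the requests slower than thresholdMs, slowest first
function slowestRequests(
  entries: TimedRequest[],
  thresholdMs: number,
): { url: string; durationMs: number }[] {
  return entries
    .filter((e) => e.requestStart >= 0 && e.responseEnd >= 0) // skip missing timings
    .map((e) => ({ url: e.url, durationMs: e.responseEnd - e.requestStart }))
    .filter((e) => e.durationMs > thresholdMs)
    .sort((a, b) => b.durationMs - a.durationMs);
}
```

In a test, the `requestfinished` handler could push `{ url: request.url(), requestStart: t.requestStart, responseEnd: t.responseEnd }` into an array, and an `afterEach` hook could report `slowestRequests(entries, 2000)`.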
Trace Analysis
```bash
# View the trace from a failed CI run
npx playwright show-trace path/to/trace.zip

# Generate a trace for a specific test
npx playwright test tests/flaky.spec.ts --trace on
```
Fixing Strategies by Type
UI-Driven Flakiness
Problem: Element not ready when action executes
```typescript
// ❌ BAD: No wait for element state
await page.click("#submit");
await page.fill("#username", "test"); // Element may not be ready

// ✅ GOOD: Actions + assertions pattern (auto-waiting built-in)
await page.getByRole("button", { name: "Submit" }).click();
await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
```
Problem: Animations or transitions interfere
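```typescript
// ❌ BAD: Click during animation
await page.click(".menu-item");

// ✅ GOOD: Locator actions wait until the element is stable (not animating),
// then the assertion confirms the result of the click
await page.getByRole("menuitem", { name: "Settings" }).click();
await expect(page.getByRole("dialog")).toBeVisible();

// Or reduce animations entirely (only effective if the app honors prefers-reduced-motion)
await page.emulateMedia({ reducedMotion: "reduce" });
```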
Problem: Brittle selectors
```typescript
// ❌ BAD: Fragile CSS chain
await page.click("div.container > div:nth-child(2) > button.btn-primary");

// ✅ GOOD: Semantic selectors
await page.getByRole("button", { name: "Continue" }).click();
await page.getByTestId("checkout-button").click();
await page.getByLabel("Email address").fill("test@example.com");
```
Async/Timing Flakiness
Problem: Race between test and application
```typescript
// ❌ BAD: Arbitrary sleep
await page.click("#load-data");
await page.waitForTimeout(3000); // Hope data loads in 3s

// ✅ GOOD: Wait for a specific condition
await page.click("#load-data");
await expect(page.locator(".data-row")).toHaveCount(10, { timeout: 10000 });

// ✅ BETTER: Wait for the network response, then assert
const responsePromise = page.waitForResponse(
  (r) =>
    r.url().includes("/api/data") &&
    r.request().method() === "GET" &&
    r.ok(),
);
await page.click("#load-data");
await responsePromise;
await expect(page.locator(".data-row")).toHaveCount(10);
```
For comprehensive waiting strategies (navigation, element state, network, polling with toPass()), see assertions-waiting.md.
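Condition-based waiting, including `expect(...).toPass()`, re-checks a condition at increasing intervals until it holds or a deadline passes, rather than sleeping for a fixed time. Here is a minimal, framework-free sketch of that loop. `pollUntil` and its options are hypothetical names, not a Playwright API; inside Playwright tests, prefer the built-in web-first assertions and `toPass()`.

```typescript
// Hypothetical helper: re-check `check` until it returns a value or the deadline passes
async function pollUntil<T>(
  check: () => T | undefined | Promise<T | undefined>,
  { timeoutMs = 5000, intervalsMs = [100, 250, 500, 1000] } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (let attempt = 0; ; attempt++) {
    const value = await check();
    if (value !== undefined) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    // Back off: take the next interval, repeating the last one once exhausted
    const delay = intervalsMs[Math.min(attempt, intervalsMs.length - 1)];
    await new Promise<void>((resolve) => setTimeout(resolve, delay));
  }
}
```

Unlike `waitForTimeout(3000)`, this returns as soon as the condition holds and fails with a clear message when it never does.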
Problem: Complex async state
```typescript
// Custom wait for application-specific conditions
// (__APP_STATE__ stands for a global your app exposes for testing)
await page.waitForFunction(() => {
  const app = (window as any).__APP_STATE__;
  return app?.isReady && !app?.isLoading;
});

// Wait for multiple conditions; the response waiters start before the click
await Promise.all([
  page.waitForResponse("**/api/user"),
  page.waitForResponse("**/api/settings"),
  page.getByRole("button", { name: "Load" }).click(),
]);
```
Data/Parallelism-Driven Flakiness
Problem: Tests share backend data
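```typescript
// ❌ BAD: All workers use the same user
const testUser = { email: "test@example.com", password: "pass123" };

// ✅ GOOD: Unique data per worker
import { test as base } from "@playwright/test";

// Placeholders for your own backend helpers (not part of Playwright)
declare function createTestUser(email: string): Promise<{ email: string; id: string }>;
declare function deleteTestUser(id: string): Promise<void>;

export const test = base.extend<
  {},
  { testUser: { email: string; id: string } }
>({
  testUser: [
    async ({}, use, workerInfo) => {
      const email = `test-${workerInfo.workerIndex}-${Date.now()}@example.com`;
      const user = await createTestUser(email);
      await use(user);
      await deleteTestUser(user.id); // Teardown after the worker's last test
    },
    { scope: "worker" },
  ],
});
```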
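`workerIndex` plus `Date.now()` is unique within one run. Sharded CI jobs on different machines, however, each start their worker indices at 0, and their timestamps can coincide. One option, sketched here under that assumption, is to append a random suffix. `uniqueTestEmail` and the `example.com` domain are illustrative.

```typescript
import { randomUUID } from "node:crypto";

// Illustrative: build an email that is unique per worker, per run, and per shard
function uniqueTestEmail(workerIndex: number, prefix = "test"): string {
  const suffix = randomUUID().slice(0, 8); // random part guards against shard collisions
  return `${prefix}-${workerIndex}-${Date.now()}-${suffix}@example.com`;
}
```

In the fixture above, the email line would become `const email = uniqueTestEmail(workerInfo.workerIndex);`.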
Problem: Shared storageState across workers
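```typescript
// ❌ BAD: All workers share the same auth state (playwright.config.ts fragment)
use: {
  storageState: ".auth/user.json",
}

// ✅ GOOD: Per-worker auth state
import fs from "fs";
import { test as base, type Page } from "@playwright/test";

// Placeholder for your own login helper (not part of Playwright)
declare function authenticateUser(page: Page, email: string): Promise<void>;

export const test = base.extend<{}, { workerStorageState: string }>({
  // Make every test in this worker use the worker's own auth file
  storageState: ({ workerStorageState }, use) => use(workerStorageState),
  workerStorageState: [
    async ({ browser }, use, workerInfo) => {
      // parallelIndex stays in 0..workers-1, even when a worker restarts
      const id = workerInfo.parallelIndex;
      const fileName = `.auth/user-${id}.json`;
      if (!fs.existsSync(fileName)) {
        // Log in from a clean, unauthenticated context
        const page = await browser.newPage({ storageState: undefined });
        await authenticateUser(page, `worker${id}@test.com`);
        await page.context().storageState({ path: fileName });
        await page.close();
      }
      await use(fileName);
    },
    { scope: "worker" },
  ],
});
```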
Test-Suite-Driven Flakiness (State Leaks)
Problem: Tests affect each other
```typescript
import { test, type Page } from "@playwright/test";

// ❌ BAD: Module-level state persists across tests
let sharedPage: Page;
test.beforeAll(async ({ browser }) => {
  sharedPage = await browser.newPage(); // Shared across tests!
});

// ✅ GOOD: Use Playwright's default isolation (fresh context per test)
test("first test", async ({ page }) => {
  // Fresh page for this test
});
test("second test", async ({ page }) => {
  // Fresh page for this test
});
```
Problem: Fixture cleanup not happening
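```typescript
import fs from "fs";
import os from "os";
import path from "path";
import { test as base } from "@playwright/test";

// ✅ GOOD: Proper fixture with cleanup
export const test = base.extend<{ tempFile: string }>({
  tempFile: async ({}, use, testInfo) => {
    // Include the worker index so parallel workers never share a file
    const file = path.join(os.tmpdir(), `test-${testInfo.workerIndex}-${Date.now()}.json`);
    fs.writeFileSync(file, "{}");
    await use(file);
    // Teardown runs after the test finishes, even when it fails
    fs.rmSync(file, { force: true });
  },
});
```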
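Outside fixtures, for example in a plain helper shared by several tests, the same guarantee comes from `try`/`finally`. Below is a minimal synchronous sketch; `withTempFile` is a hypothetical name, and it uses a fresh `mkdtemp` directory so concurrent callers cannot collide.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: create a temp file, run `fn`, and always delete the file
function withTempFile<T>(fn: (file: string) => T): T {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "flaky-"));
  const file = path.join(dir, "data.json");
  fs.writeFileSync(file, "{}");
  try {
    return fn(file);
  } finally {
    // Runs whether fn returned or threw
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```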
CI-Specific Flakiness
Why Tests Fail Only in CI
| CI Condition | Impact | Solution |
|---|---|---|
| Slower CPU | Actions complete later than expected | Use auto-waiting, not timeouts |
| Cold browser start | No cached assets, slower initial load | Add explicit waits for first navigation |
| Headless mode | Different rendering behavior | Test locally in headless mode |
| Shared runners | Resource contention | Reduce parallelism or use dedicated runners |
| Network latency | API calls slower | Mock external APIs, increase timeouts for real calls |
Simulating CI Locally
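```bash
# Run headless with the CI environment variable set
CI=true npx playwright test

# Limit CPU (Linux/macOS; cpulimit must be installed separately)
cpulimit -l 50 -- npx playwright test

# Run in Docker matching the CI environment
# (use the image tag that matches your installed Playwright version;
# --ipc=host is recommended for Chromium)
docker run -it --rm --ipc=host \
  -v "$(pwd)":/work \
  -w /work \
  mcr.microsoft.com/playwright:v1.40.0-jammy \
  npx playwright test
```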
Consistent Viewport and Scale
```typescript
// playwright.config.ts - Match CI rendering exactly
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    viewport: { width: 1280, height: 720 },
    deviceScaleFactor: 1,
  },
});
```
Network Stubbing for External APIs
```typescript
import { test } from "@playwright/test";

// Eliminate external API flakiness
test.beforeEach(async ({ page }) => {
  // Stub unstable third-party APIs
  await page.route("**/api.analytics.com/**", (route) =>
    route.fulfill({ body: "" }),
  );
  await page.route("**/api.payment-provider.com/**", (route) =>
    route.fulfill({ json: { status: "ok" } }),
  );
});

// Test-specific stub
test("checkout with payment", async ({ page }) => {
  await page.route("**/api/payment", (route) =>
    route.fulfill({ json: { success: true, transactionId: "test-123" } }),
  );
  // Test proceeds with a deterministic response
});
```
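Instead of stubbing third parties one pattern at a time, a test can let first-party requests through and block everything else. The decision itself is a pure function. `FIRST_PARTY_HOSTS` and `isFirstParty` are hypothetical names, and the Playwright wiring in the trailing comment is only a sketch of how the helper could be used.

```typescript
// Hypothetical allowlist: hosts the app under test owns
const FIRST_PARTY_HOSTS = ["localhost", "app.example.com"];

// True when the URL's host is an allowed host or one of its subdomains
function isFirstParty(url: string, hosts: string[] = FIRST_PARTY_HOSTS): boolean {
  const { hostname } = new URL(url);
  return hosts.some((h) => hostname === h || hostname.endsWith(`.${h}`));
}

// Possible wiring in a beforeEach (sketch):
// await page.route("**/*", (route) =>
//   isFirstParty(route.request().url()) ? route.continue() : route.abort(),
// );
```

Aborting is blunt: a third-party script the page needs in order to render should be stubbed with `route.fulfill` instead, as in the examples above.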
Quarantine and Management
Quarantine Pattern
```typescript
// playwright.config.ts - Separate flaky tests
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      name: "stable",
      testIgnore: ["**/*.flaky.spec.ts"],
    },
    {
      name: "quarantine",
      testMatch: ["**/*.flaky.spec.ts"],
      retries: 3,
    },
  ],
});
```
Annotation-Based Quarantine
```typescript
import { test } from "@playwright/test";

// Mark flaky tests with annotations
test("intermittent checkout issue", async ({ page }, testInfo) => {
  testInfo.annotations.push({
    type: "flaky",
    description: "Investigating payment API timing - JIRA-1234",
  });
  // Test implementation
});

// Skip a flaky test conditionally
test("known CI flaky", async ({ page }) => {
  test.skip(!!process.env.CI, "Flaky in CI - investigating JIRA-5678");
  // Test implementation
});
```
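Quarantined tests carry a `flaky` annotation with a ticket reference, so a small script can turn those annotations into a tracking list. This sketch assumes the `{ type, description }` annotation shape used above and ticket IDs of the form `JIRA-1234`. `collectFlakyTickets`, the `AnnotatedTest` type and the ID regex are illustrative.

```typescript
interface Annotation {
  type: string;
  description?: string;
}
interface AnnotatedTest {
  title: string;
  annotations: Annotation[];
}

// Map each ticket ID found in a "flaky" annotation to the tests that reference it
function collectFlakyTickets(tests: AnnotatedTest[]): Map<string, string[]> {
  const tickets = new Map<string, string[]>();
  for (const t of tests) {
    for (const a of t.annotations) {
      if (a.type !== "flaky" || !a.description) continue;
      // Assumed ticket format: uppercase project key, dash, number
      for (const id of a.description.match(/[A-Z]+-\d+/g) ?? []) {
        tickets.set(id, [...(tickets.get(id) ?? []), t.title]);
      }
    }
  }
  return tickets;
}
```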
Prevention Strategies
Test Burn-In
```bash
# Run new tests many times before merging
npx playwright test tests/new-feature.spec.ts --repeat-each=50

# Run in parallel to expose race conditions
npx playwright test tests/new-feature.spec.ts --repeat-each=20 --workers=4
```
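How many repetitions are enough depends on how often the test fails. Assume each run fails independently with probability p. This is an idealization, since real flakes often correlate with load. Under that assumption, n runs show at least one failure with probability 1 - (1 - p)^n. Reaching confidence c therefore needs n >= ln(1 - c) / ln(1 - p). A small calculator for that bound:

```typescript
// Probability that n independent runs show at least one failure,
// given a per-run failure probability p
function detectionProbability(p: number, n: number): number {
  return 1 - Math.pow(1 - p, n);
}

// Smallest n with detectionProbability(p, n) >= confidence
function repeatsNeeded(p: number, confidence: number): number {
  if (p <= 0 || p >= 1) throw new RangeError("p must be in (0, 1)");
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

console.log(detectionProbability(0.05, 20).toFixed(2)); // 0.64
console.log(repeatsNeeded(0.05, 0.95)); // 59
console.log(repeatsNeeded(0.01, 0.95)); // 299
```

So `--repeat-each=20` catches a 5% flake only about 64% of the time. A 1% flake needs roughly 300 runs for 95% confidence.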
Isolation Checklist
```typescript
// ✅ Each test should be self-contained
test.describe("User profile", () => {
  test("can update name", async ({ page, testUser }) => {
    // Uses unique testUser fixture
    // No dependency on other tests
    // Cleanup handled by fixture
  });
  test("can update email", async ({ page, testUser }) => {
    // Independent of "can update name"
    // Own testUser, own state
  });
});
```
Defensive Assertions
```typescript
// ❌ BAD: Single point of failure
await expect(page.locator(".items")).toHaveCount(5);

// ✅ GOOD: Progressive assertions that help diagnose
await expect(page.locator(".items-container")).toBeVisible();
await expect(page.locator(".loading")).not.toBeVisible();
await expect(page.locator(".items")).toHaveCount(5);
```
Retry Budget
```typescript
// playwright.config.ts - Limit retries to avoid masking issues
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0, // Only retry in CI
  expect: {
    timeout: 10000, // Reasonable assertion timeout
  },
  timeout: 60000, // Test timeout
});
```
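A retry budget is easier to enforce when it is measured. Playwright's JSON reporter includes run totals in a `stats` object, with counts such as `expected`, `unexpected`, `flaky` and `skipped`. Assuming that shape (verify it against your version), a CI step could fail the build when the share of flaky tests exceeds a threshold. The 2% default and the `checkFlakyBudget` name are illustrative policy choices, not Playwright settings.

```typescript
// Assumed subset of the JSON reporter's `stats` object
interface RunStats {
  expected: number; // passed as expected
  unexpected: number; // failed
  flaky: number; // failed, then passed on retry
  skipped: number;
}

// Hypothetical policy: flaky tests may be at most `maxFlakyRatio` of executed tests
function checkFlakyBudget(stats: RunStats, maxFlakyRatio = 0.02): { ratio: number; withinBudget: boolean } {
  const executed = stats.expected + stats.unexpected + stats.flaky;
  const ratio = executed === 0 ? 0 : stats.flaky / executed;
  return { ratio, withinBudget: ratio <= maxFlakyRatio };
}
```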
Anti-Patterns to Avoid
| Anti-Pattern | Problem | Solution |
|---|---|---|
| waitForTimeout() as primary wait | Arbitrary, hides real timing issues | Use auto-waiting assertions |
| Increasing global timeout to "fix" flakes | Masks root cause, slows all tests | Find and fix actual timing issue |
| Retrying until pass | Hides systemic problems | Fix root cause, use retries for diagnosis only |
| Shared test data across workers | Race conditions, collisions | Isolate data per worker |
| Testing real external APIs | Network variability | Mock external dependencies |
| Module-level mutable state | Leaks between tests | Use fixtures with proper cleanup |
| Ignoring flaky tests | Problem compounds over time | Quarantine and track for fixing |
Related References
- Debugging: See debugging.md for trace viewer and inspector
- Fixtures: See fixtures-hooks.md for worker-scoped isolation
- Performance: See performance.md for parallel execution patterns
- Assertions: See assertions-waiting.md for auto-waiting patterns
- Global Setup: See global-setup.md for setup vs fixtures decision