2026-05-18 18:56:00 +02:00

Cloudflare Diagnostics

Use this checklist when a page works in the browser but fails from a scraper, returns a challenge page, or produces empty extracted content.

1. Save the Raw Response

webclaw https://protected.example.com --raw-html > raw.html

Inspect raw.html for challenge-page text, blocked-request messages, an empty application shell, or application HTML that needs JavaScript rendering.
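The inspection can be partly automated. The sketch below is a rough heuristic, not part of webclaw: the function name classify_raw_html, the marker strings, and the 200-character threshold are all assumptions chosen for illustration, and inline script text counts toward the threshold, so treat the result as a hint only.

```shell
# Heuristic triage of a saved raw response (hypothetical helper, not a webclaw feature).
classify_raw_html() {
  file="$1"
  # Marker strings are assumptions based on commonly seen challenge pages.
  if grep -qiE 'just a moment|cf-chl|challenge-platform|attention required' "$file"; then
    echo "challenge"
    return 0
  fi
  # Strip tags and whitespace, then count the characters that remain.
  text_len=$(tr '\n' ' ' < "$file" | sed 's/<[^>]*>//g' | tr -d '[:space:]' | wc -c | tr -d ' ')
  # The 200-character cutoff is an arbitrary illustrative threshold.
  if [ "$text_len" -lt 200 ]; then
    echo "empty-shell"
  else
    echo "content"
  fi
}
```

Usage: classify_raw_html raw.html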

2. Compare Extracted Formats

webclaw https://protected.example.com --format markdown > page.md
webclaw https://protected.example.com --format json > page.json
webclaw https://protected.example.com --format llm > page.txt

If raw HTML has content but markdown is empty, tune extraction with selectors:

webclaw https://protected.example.com \
  --include "main, article, [role=main]" \
  --exclude "nav, footer, aside, .cookie-banner" \
  --format markdown
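To see at a glance which saved output came back empty, a small helper can list file sizes side by side. This is a sketch under the assumption that the files were written by the commands above; report_sizes is a hypothetical name, not a webclaw command.

```shell
# Print the byte size of each saved output and flag empty or missing ones.
report_sizes() {
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      # -s is false for both zero-length and nonexistent files.
      echo "$f: EMPTY OR MISSING"
    else
      echo "$f: $(wc -c < "$f" | tr -d ' ') bytes"
    fi
  done
}
```

Usage: report_sizes raw.html page.md page.json page.txt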

3. Try Another Browser Fingerprint

webclaw https://protected.example.com --browser firefox --format markdown
webclaw https://protected.example.com --browser random --format markdown
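The two commands above can be wrapped in a loop that saves one file per fingerprint so the results are easy to compare. This is a minimal sketch: try_browsers and the WEBCLAW_BIN override are hypothetical conveniences, and only the firefox and random values shown above are used.

```shell
# Fetch the same URL with each browser fingerprint and save one file per attempt.
try_browsers() {
  url="$1"
  for browser in firefox random; do
    out="page.$browser.md"
    # WEBCLAW_BIN is a hypothetical override; it defaults to the webclaw binary on PATH.
    "${WEBCLAW_BIN:-webclaw}" "$url" --browser "$browser" --format markdown > "$out"
    echo "$browser: $(wc -c < "$out" | tr -d ' ') bytes"
  done
}
```

Usage: try_browsers https://protected.example.com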

4. Use Cloud Fallback

export WEBCLAW_API_KEY=wc_your_key

webclaw https://protected.example.com --cloud --format markdown

Cloud mode can use hosted routing, JS rendering, and protected-site handling that are not part of the fully local open-source path.
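Before running in cloud mode from a script, it can help to confirm that the key is actually exported. The guard below is a sketch; require_api_key is a hypothetical helper name, and it only checks that WEBCLAW_API_KEY is non-empty, not that the key is valid.

```shell
# Fail early if the API key is not set in the environment.
require_api_key() {
  if [ -z "${WEBCLAW_API_KEY:-}" ]; then
    echo "WEBCLAW_API_KEY is not set" >&2
    return 1
  fi
  return 0
}
```

Usage: require_api_key && webclaw https://protected.example.com --cloud --format markdown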

5. Keep a Reproducible Report

When reporting a problem, include:

  • target URL
  • command used
  • selected format
  • whether --raw-html returned a challenge or normal page HTML
  • whether --browser firefox changed the result
  • whether cloud mode changed the result

Remove cookies, tokens, customer data, and private URLs before sharing logs.
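A first redaction pass can be scripted, though it does not replace a manual read-through. The filter below is a sketch with assumed patterns: it blanks Cookie, Set-Cookie, and Authorization header lines and masks strings that look like the wc_ key format used above. redact_log is a hypothetical name, and customer data and private URLs still need manual review.

```shell
# Read a log on stdin and write a redacted copy to stdout.
redact_log() {
  sed -E \
    -e 's/^([Ss]et-)?([Cc]ookie|[Aa]uthorization):.*/\1\2: [REDACTED]/' \
    -e 's/wc_[A-Za-z0-9_]+/wc_[REDACTED]/g'
}
```

Usage: redact_log < debug.log > debug.redacted.log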