mirror of
https://github.com/0xMassi/webclaw.git
synced 2026-06-06 22:05:13 +02:00
Cloudflare Diagnostics
Use this checklist when a page works in the browser but fails from a scraper, returns a challenge page, or produces empty extracted content.
1. Save the Raw Response
webclaw https://protected.example.com --raw-html > raw.html
Inspect raw.html for challenge-page text, blocked-request messages, empty shells, or application HTML that needs JavaScript rendering.
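As a rough first pass, the inspection can be scripted. The sketch below is not part of webclaw: classify_raw_html is a hypothetical helper, and both the marker strings and the 2048-byte threshold are assumptions based on commonly seen challenge pages. Tune them for the site you are testing.

```shell
# Hypothetical helper, not a webclaw feature.
# Marker strings and the size threshold are assumptions; adjust them per site.
classify_raw_html() {
  file="$1"
  # Strings that often appear in Cloudflare challenge or block pages.
  if grep -Eiq 'just a moment|cf-chl|challenge-platform|attention required' "$file"; then
    echo "challenge"
    return 0
  fi
  # Very small documents are usually JavaScript shells with no server-rendered content.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -lt 2048 ]; then
    echo "empty-shell"
  else
    echo "content"
  fi
}

# Usage, after saving the raw response with --raw-html:
#   classify_raw_html raw.html
```

A "content" result only means no known marker matched, so a manual look at raw.html is still worthwhile.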
2. Compare Extracted Formats
webclaw https://protected.example.com --format markdown > page.md
webclaw https://protected.example.com --format json > page.json
webclaw https://protected.example.com --format llm > page.txt
If raw HTML has content but markdown is empty, tune extraction with selectors:
webclaw https://protected.example.com \
--include "main, article, [role=main]" \
--exclude "nav, footer, aside, .cookie-banner" \
--format markdown
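To decide whether selector tuning is worth trying, it can help to compare the size of the raw response with the size of an extracted file. The following is a minimal sketch: extraction_gap is a hypothetical helper, and the 2048-byte and 200-byte cutoffs are assumptions, not values defined by webclaw.

```shell
# Hypothetical helper: compares a saved raw response with an extracted file.
# Both byte cutoffs are assumptions chosen for illustration.
extraction_gap() {
  raw="$1"
  extracted="$2"
  raw_size=$(wc -c < "$raw" | tr -d ' ')
  out_size=$(wc -c < "$extracted" | tr -d ' ')
  if [ "$raw_size" -lt 2048 ]; then
    # The raw response itself is tiny, so selectors are unlikely to help.
    echo "no-raw-content"
  elif [ "$out_size" -lt 200 ]; then
    # Raw HTML has content but extraction is nearly empty.
    echo "tune-selectors"
  else
    echo "ok"
  fi
}

# Usage:
#   extraction_gap raw.html page.md
```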
3. Try Another Browser Fingerprint
webclaw https://protected.example.com --browser firefox --format markdown
webclaw https://protected.example.com --browser random --format markdown
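Running both fingerprints side by side makes differences easier to spot. The sketch below loops over only the two --browser values shown in this checklist. The function name compare_fingerprints and the output file names are hypothetical, and it assumes webclaw exits non-zero when a fetch fails.

```shell
# Sketch: fetch the same URL with each --browser value from this checklist
# and report how many bytes of markdown each one produced.
# Assumes webclaw exits non-zero on failure; output file names are illustrative.
compare_fingerprints() {
  url="$1"
  for browser in firefox random; do
    out="page.$browser.md"
    if webclaw "$url" --browser "$browser" --format markdown > "$out"; then
      size=$(wc -c < "$out" | tr -d ' ')
      echo "$browser: $size bytes"
    else
      echo "$browser: command failed"
    fi
  done
}

# Usage:
#   compare_fingerprints https://protected.example.com
```

A large size difference between the two runs suggests the fingerprint affects how the site responds.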
4. Use Cloud Fallback
export WEBCLAW_API_KEY=wc_your_key
webclaw https://protected.example.com --cloud --format markdown
Cloud mode can use hosted routing, JS rendering, and protected-site handling that are not part of the fully local open-source path.
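One way to combine the local and cloud paths is to try locally first and retry with --cloud only when nothing was extracted. This is a sketch under assumptions: fetch_with_fallback is a hypothetical wrapper, and treating an empty output file as failure is a heuristic rather than documented webclaw behavior.

```shell
# Hypothetical wrapper: local attempt first, then --cloud if the output is empty.
# Treating an empty file as failure is an assumption.
fetch_with_fallback() {
  url="$1"
  out="$2"
  webclaw "$url" --format markdown > "$out" || true
  if [ -s "$out" ]; then
    echo "local"
    return 0
  fi
  # Cloud mode needs the API key exported as shown above.
  if [ -z "${WEBCLAW_API_KEY:-}" ]; then
    echo "WEBCLAW_API_KEY is not set; skipping cloud fallback" >&2
    return 1
  fi
  webclaw "$url" --cloud --format markdown > "$out" || true
  if [ -s "$out" ]; then
    echo "cloud"
  else
    echo "cloud returned no content" >&2
    return 1
  fi
}

# Usage:
#   fetch_with_fallback https://protected.example.com page.md
```

The printed word (local or cloud) records which path produced the content, which is useful in a report.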
5. Keep a Reproducible Report
When reporting a problem, include:
- target URL
- command used
- selected format
- whether --raw-html returned a challenge or normal page HTML
- whether --browser firefox changed the result
- whether cloud mode changed the result
Remove cookies, tokens, customer data, and private URLs before sharing logs.
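A simple filter can catch the most obvious secrets before a log is shared. The sketch below is illustrative only: redact_log is a hypothetical helper, the header names are common HTTP ones, and the wc_ pattern is inferred from the example key format above. It does not detect customer data or private URLs, so review the output by hand as well.

```shell
# Hypothetical helper: masks cookie and authorization headers and wc_-style keys.
# The patterns are assumptions; this is not a complete scrubber.
redact_log() {
  sed -E \
    -e 's/^([Cc]ookie|[Ss]et-[Cc]ookie|[Aa]uthorization):.*/\1: [REDACTED]/' \
    -e 's/wc_[A-Za-z0-9_]+/wc_[REDACTED]/g' \
    "$@"
}

# Usage (debug.log is an illustrative file name):
#   redact_log debug.log > debug.redacted.log
```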