Mirror of https://github.com/0xMassi/webclaw.git (synced 2026-06-06 22:05:13 +02:00)
Proxy-Backed Crawling
Use proxies when you need to spread a crawl across multiple IP addresses. webclaw accepts either a single proxy (--proxy) or a file listing a pool of proxies (--proxy-file).
Single Proxy
webclaw https://example.com \
--proxy http://user:pass@proxy.example.com:8080 \
--format markdown
SOCKS5 is supported too:
webclaw https://example.com \
--proxy socks5://proxy.example.com:1080 \
--format markdown
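To keep credentials out of shell history and scripts, the proxy URL can be assembled from environment variables before it is passed to --proxy. The sketch below is only an illustration: the function name build_proxy_url and the variables PROXY_USER, PROXY_PASS, PROXY_HOST, and PROXY_PORT are hypothetical and not part of webclaw. It also assumes the username and password contain no characters that need percent-encoding in a URL.

```shell
# Hypothetical helper (not part of webclaw): build a proxy URL from
# environment variables. PROXY_USER, PROXY_PASS, PROXY_HOST and PROXY_PORT
# are assumed names chosen for this example.
build_proxy_url() {
  scheme="${1:-http}"   # "http" or "socks5", as in the examples above
  if [ -n "${PROXY_USER:-}" ]; then
    # Authenticated form: scheme://user:pass@host:port
    printf '%s://%s:%s@%s:%s\n' \
      "$scheme" "$PROXY_USER" "${PROXY_PASS:-}" "$PROXY_HOST" "$PROXY_PORT"
  else
    # Unauthenticated form: scheme://host:port
    printf '%s://%s:%s\n' "$scheme" "$PROXY_HOST" "$PROXY_PORT"
  fi
}

# Example use with the flags shown above:
#   webclaw https://example.com --proxy "$(build_proxy_url http)" --format markdown
```

Quoting the command substitution keeps the URL as a single argument even if a field contains unusual characters.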
Proxy Pool
Create proxies.txt with one proxy per line:
http://user:pass@proxy-1.example.com:8080
http://user:pass@proxy-2.example.com:8080
http://user:pass@proxy-3.example.com:8080
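A malformed entry in the pool is easier to catch before a long crawl starts. The following is a minimal sketch of a pre-flight check, assuming each line has the shape scheme://[user:pass@]host:port and that the file accepts the same http and socks5 schemes shown for --proxy (the source does not confirm the latter). The function name check_proxy_file is hypothetical, and the check is purely syntactic: it does not test whether a proxy is reachable.

```shell
# Hypothetical helper (not part of webclaw): report lines in a proxy file
# that do not look like scheme://[user:pass@]host:port.
# Returns 0 when every non-blank line matches, 1 otherwise.
check_proxy_file() {
  file="$1"
  # Assumed shape; only the http and socks5 schemes from the examples are allowed.
  pattern='^(http|socks5)://([^:@/ ]+:[^@/ ]*@)?[^:@/ ]+:[0-9]+$'
  # First grep lists non-matching lines with their line numbers;
  # second grep discards lines that are merely blank.
  bad=$(grep -n -v -E "$pattern" "$file" | grep -v -E '^[0-9]+:[[:space:]]*$' || true)
  if [ -n "$bad" ]; then
    printf 'malformed proxy entries (line:content):\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}

# Example use:
#   check_proxy_file proxies.txt && echo "proxies.txt looks well-formed"
```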
Run a crawl with controlled concurrency:
webclaw https://docs.example.com \
--crawl \
--depth 2 \
--max-pages 100 \
--concurrency 10 \
--delay 200 \
--proxy-file proxies.txt \
--format markdown
Batch URLs
webclaw --urls-file urls.txt \
--proxy-file proxies.txt \
--concurrency 10 \
--format json
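Duplicate or blank entries in the URL list waste requests across the pool. The sketch below tidies a list before a batch run. It assumes urls.txt holds one URL per line, mirroring proxies.txt; the source does not state the file format, so treat that as an assumption. The function name clean_url_list is hypothetical.

```shell
# Hypothetical helper (not part of webclaw): trim whitespace, drop blank
# lines, and remove duplicate URLs while keeping first-seen order.
# Assumes one URL per line, which the source does not confirm.
clean_url_list() {
  src="$1"
  dst="$2"
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$src" \
    | awk 'NF && !seen[$0]++' > "$dst"
}

# Example use:
#   clean_url_list raw-urls.txt urls.txt
```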
Proxy rotation helps with throughput and IP reputation. It does not replace request fingerprinting, JavaScript rendering, or challenge handling on heavily protected sites. For those, use hosted cloud mode with WEBCLAW_API_KEY.