Mirror of https://github.com/0xMassi/webclaw.git, synced 2026-06-10 22:45:13 +02:00
map now falls back to a bounded same-origin crawl when a site has no sitemap or only a thin one, harvesting links from each fetched page (the richer source of URLs). The change also adds gzip (.xml.gz) sitemap support, deeper sitemap-index recursion, and more fallback paths. Results are uncapped by default, with an optional --map-limit / --map-pages, and crawler logs are routed to stderr so that --map -f json stays machine-parseable.

| Name |
|---|
| examples |
| src |
| Cargo.toml |
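
The fallback described in the commit message (a bounded, same-origin crawl that harvests links from each fetched page) can be illustrated with a minimal sketch. This is not webclaw's actual code: `origin`, `fallback_crawl`, `sample_site`, and the `max_pages` parameter are hypothetical names chosen for illustration, and the `fetch_links` callback stands in for downloading a page and extracting its links, so the example runs without network access.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns the `scheme://host[:port]` prefix of an absolute URL, if present.
/// (Hypothetical helper; a real crawler would use a proper URL parser.)
fn origin(url: &str) -> Option<&str> {
    let scheme_end = url.find("://")? + 3;
    let rest = &url[scheme_end..];
    let host_len = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    if host_len == 0 {
        return None;
    }
    Some(&url[..scheme_end + host_len])
}

/// Breadth-first crawl starting at `start`. Only links on the same origin
/// as `start` are kept, and at most `max_pages` pages are fetched.
/// Returns every same-origin URL discovered, in discovery order.
fn fallback_crawl<F>(start: &str, max_pages: usize, fetch_links: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
{
    let Some(site) = origin(start) else {
        return Vec::new();
    };
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut found = vec![start.to_string()];
    seen.insert(start.to_string());
    queue.push_back(start.to_string());

    let mut fetched = 0;
    while let Some(url) = queue.pop_front() {
        if fetched >= max_pages {
            break; // page budget exhausted: this is the "bounded" part
        }
        fetched += 1;
        for link in fetch_links(&url) {
            // Keep same-origin links only, and each URL only once.
            if origin(&link) == Some(site) && seen.insert(link.clone()) {
                found.push(link.clone());
                queue.push_back(link);
            }
        }
    }
    found
}

/// A tiny in-memory "site" used in place of real HTTP fetches (assumption).
fn sample_site() -> HashMap<String, Vec<String>> {
    let pages = vec![
        (
            "https://example.com/",
            vec![
                "https://example.com/docs",
                "https://other.example/x",
                "https://example.com/blog",
            ],
        ),
        (
            "https://example.com/docs",
            vec!["https://example.com/docs/api", "https://example.com/"],
        ),
        ("https://example.com/blog", vec!["https://example.com/docs"]),
    ];
    pages
        .into_iter()
        .map(|(url, links)| {
            (url.to_string(), links.into_iter().map(String::from).collect())
        })
        .collect()
}
```

Calling `fallback_crawl("https://example.com/", 10, |u| sample_site().get(u).cloned().unwrap_or_default())` walks the in-memory site and skips the link to `other.example`. Lowering `max_pages` limits how many pages are fetched, but links already harvested from fetched pages still appear in the result.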