- The fastest web scraper for AI agents.
- 67% fewer tokens. Sub-millisecond extraction. Zero browser overhead.
+ Turn websites into clean markdown, JSON, and LLM-ready context.
+ CLI, MCP server, REST API, and SDKs for AI agents and RAG pipelines.
---
-Your AI agent calls `fetch()` and gets a 403. Or 142KB of raw HTML that burns through your token budget. **webclaw fixes both.**
+Most web scraping tools give your agent one of two bad outputs:
-It extracts clean, structured content from any URL using Chrome-level TLS fingerprinting — no headless browser, no Selenium, no Puppeteer. Output is optimized for LLMs: **67% fewer tokens** than raw HTML, with metadata, links, and images preserved.
+- a blocked page, login wall, or empty app shell
+- raw HTML full of nav, scripts, styling, ads, and duplicated boilerplate
+
+[webclaw.io](https://webclaw.io) is the hosted web extraction API for webclaw. This repo contains the open-source CLI, MCP server, extraction engine, and self-hostable server.
+
+webclaw turns a URL into clean content your tools can actually use.
+
+```bash
+webclaw https://example.com --format markdown
```
- Raw HTML webclaw
-┌──────────────────────────────────┐ ┌──────────────────────────────────┐
-│