From af96628dc9c3ca3ba7f428967c49f0f668eda8e8 Mon Sep 17 00:00:00 2001
From: Valerio <88933932+0xMassi@users.noreply.github.com>
Date: Sun, 10 May 2026 22:44:57 +0200
Subject: [PATCH] Revise README for clarity and updated content

Updated the README to reflect changes in the project description, banner image size, and various content sections. Enhanced clarity on features and usage.
---
 README.md | 584 +++++++++++++++++++++++++-----------------------------
 1 file changed, 275 insertions(+), 309 deletions(-)

diff --git a/README.md b/README.md
index 7d936c6..a663511 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,14 @@

- webclaw
+ webclaw

webclaw

- The fastest web scraper for AI agents.
- 67% fewer tokens. Sub-millisecond extraction. Zero browser overhead.
+ Turn websites into clean markdown, JSON, and LLM-ready context.
+ CLI, MCP server, REST API, and SDKs for AI agents and RAG pipelines.

@@ -17,64 +17,58 @@ License npm installs

+

Discord X / Twitter
- Website
+ Hosted webclaw
Docs

----
-

- Claude Code: web_fetch gets 403, webclaw extracts successfully
-
- Claude Code's built-in web_fetch → 403 Forbidden. webclaw → clean markdown.
+ webclaw extracting clean markdown from a page

 ---
-Your AI agent calls `fetch()` and gets a 403. Or 142KB of raw HTML that burns through your token budget. **webclaw fixes both.**
+Most web scraping tools give your agent one of two bad outputs:
-It extracts clean, structured content from any URL using Chrome-level TLS fingerprinting — no headless browser, no Selenium, no Puppeteer. Output is optimized for LLMs: **67% fewer tokens** than raw HTML, with metadata, links, and images preserved.
+- a blocked page, login wall, or empty app shell
+- raw HTML full of nav, scripts, styling, ads, and duplicated boilerplate
+[webclaw.io](https://webclaw.io) is the hosted web extraction API for webclaw. This repo contains the open-source CLI, MCP server, extraction engine, and self-hostable server.
+
+webclaw turns a URL into clean content your tools can actually use.
+
+```bash
+webclaw https://example.com --format markdown
 ```
- Raw HTML webclaw
-┌──────────────────────────────────┐ ┌──────────────────────────────────┐
-│
│ │ # Breaking: AI Breakthrough │ -│