webclaw

mirror of https://github.com/0xMassi/webclaw.git synced 2026-07-15 06:02:11 +02:00

Valerio 0daa2fec1a Some checks are pending CI / Test (push) Waiting to run CI / Lint (push) Waiting to run CI / Docs (push) Waiting to run feat(cli+mcp): vertical extractor support (28 extractors discoverable + callable) Wires the vertical extractor catalog into both the CLI and the MCP server so users don't have to hit the HTTP API to invoke them. Same semantics as `/v1/scrape/{vertical}` + `/v1/extractors`. CLI (webclaw-cli): - New subcommand `webclaw extractors` lists all 28 extractors with name, label, and sample URL. `--json` flag emits the full catalog as machine-readable JSON. - New subcommand `webclaw vertical <name> <url>` runs a specific extractor and prints typed JSON. Pretty-printed by default; `--raw` for single-line. Exits 1 with a clear "URL does not match" error on mismatch. - FetchClient built with Firefox profile + cloud fallback attached when WEBCLAW_API_KEY is set, so antibot-gated verticals escalate. MCP (webclaw-mcp): - New tool `list_extractors` (no args) returns the catalog as pretty-printed JSON for in-session discovery. - New tool `vertical_scrape` takes `{name, url}` and returns typed JSON. Reuses the long-lived self.fetch_client. - Tool count goes from 10 to 12. Server-info instruction string updated accordingly. Tests: 215 passing, clippy clean. Manual surface-tested end-to-end: CLI prints real Reddit/github/pypi data; MCP JSON-RPC session returns 28-entry catalog + typed responses for pypi/requests + rust-lang/rust in 200-400ms. Version bumped to 0.5.2 (minor for API additions, backwards compatible).		2026-04-22 21:41:15 +02:00
..
webclaw-cli	feat(cli+mcp): vertical extractor support (28 extractors discoverable + callable)	2026-04-22 21:41:15 +02:00
webclaw-core	style: cargo fmt	2026-04-17 12:03:22 +02:00
webclaw-fetch	feat(fetch): Fetcher trait so vertical extractors work under any HTTP backend	2026-04-22 21:17:50 +02:00
webclaw-llm	feat(fetch,llm): DoS hardening + glob validation + cleanup (P2) (#22 )	2026-04-16 19:44:08 +02:00
webclaw-mcp	feat(cli+mcp): vertical extractor support (28 extractors discoverable + callable)	2026-04-22 21:41:15 +02:00
webclaw-pdf	Initial release: webclaw v0.1.0 — web content extraction for LLMs	2026-03-23 18:31:11 +01:00
webclaw-server	feat(extractors): wave 5 \u2014 Amazon, eBay, Trustpilot via cloud fallback	2026-04-22 16:16:11 +02:00