webclaw/crates/webclaw-cli
Valerio 0daa2fec1a
Some checks are pending
CI / Test (push) Waiting to run
CI / Lint (push) Waiting to run
CI / Docs (push) Waiting to run
feat(cli+mcp): vertical extractor support (28 extractors discoverable + callable)
Wires the vertical extractor catalog into both the CLI and the MCP
server so users don't have to hit the HTTP API to invoke them. Same
semantics as `/v1/scrape/{vertical}` + `/v1/extractors`.

CLI (webclaw-cli):
- New subcommand `webclaw extractors` lists all 28 extractors with
  name, label, and sample URL. `--json` flag emits the full catalog
  as machine-readable JSON.
- New subcommand `webclaw vertical <name> <url>` runs a specific
  extractor and prints typed JSON. Pretty-printed by default; `--raw`
  for single-line. Exits 1 with a clear "URL does not match" error
  on mismatch.
- FetchClient built with Firefox profile + cloud fallback attached
  when WEBCLAW_API_KEY is set, so antibot-gated verticals escalate.

MCP (webclaw-mcp):
- New tool `list_extractors` (no args) returns the catalog as
  pretty-printed JSON for in-session discovery.
- New tool `vertical_scrape` takes `{name, url}` and returns typed
  JSON. Reuses the long-lived self.fetch_client.
- Tool count goes from 10 to 12. Server-info instruction string
  updated accordingly.

Tests: 215 passing, clippy clean. Manual surface-tested end-to-end:
CLI prints real Reddit/github/pypi data; MCP JSON-RPC session returns
28-entry catalog + typed responses for pypi/requests + rust-lang/rust
in 200-400ms.

Version bumped to 0.5.2 (minor for API additions, backwards compatible).
2026-04-22 21:41:15 +02:00
..
src feat(cli+mcp): vertical extractor support (28 extractors discoverable + callable) 2026-04-22 21:41:15 +02:00
Cargo.toml fix(cli): close --on-change command injection via sh -c (P0) (#20) 2026-04-16 18:37:02 +02:00