Follow-up to #58/#59, which fixed numeric params but left the booleans.
MCP clients (e.g. Claude Desktop) send `true` as the JSON string `"true"`,
which serde's default bool deserializer rejects with
`invalid type: string "true", expected a boolean`, failing the call.
Adds a `deser_opt_bool_or_str` helper (same untagged pattern as the #59
numeric helpers) that accepts a JSON boolean OR "true"/"false"
(case-insensitive, trimmed) and rejects anything else with a clear error.
Numeric-looking strings like "1" are intentionally NOT coerced to bool.
Applied to every Option<bool> tool param:
- scrape -> only_main_content
- crawl -> use_sitemap
- research -> deep
- search -> scrape (added by the standalone-search slice, #63)
16 unit tests (bool / "true"-string / absent->None / garbage->error per
field). No new dependencies.
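
A minimal, serde-free sketch of the coercion rule described above, for illustration only. `BoolOrStr` and `coerce_opt_bool` are hypothetical names standing in for the untagged enum and the `deser_opt_bool_or_str` helper; the real helper is a serde `deserialize_with` function and its exact error text may differ.

```rust
/// Hypothetical stand-in for what an MCP client may send for a bool param:
/// either a real JSON boolean or a JSON string.
#[derive(Debug, Clone, PartialEq)]
pub enum BoolOrStr {
    Bool(bool),
    Str(String),
}

/// Coerce an optional bool-or-string argument into `Option<bool>`.
/// Absent stays `None`; strings are trimmed and compared case-insensitively;
/// anything else (including "1" / "0") is rejected.
pub fn coerce_opt_bool(arg: Option<BoolOrStr>) -> Result<Option<bool>, String> {
    match arg {
        None => Ok(None),
        Some(BoolOrStr::Bool(b)) => Ok(Some(b)),
        Some(BoolOrStr::Str(s)) => {
            let t = s.trim();
            if t.eq_ignore_ascii_case("true") {
                Ok(Some(true))
            } else if t.eq_ignore_ascii_case("false") {
                Ok(Some(false))
            } else {
                // Assumed wording; the source only promises "a clear error".
                Err(format!(
                    "expected a boolean or \"true\"/\"false\", got {:?}",
                    s
                ))
            }
        }
    }
}
```

With this shape, `" TRUE "` coerces to `Some(true)`, an absent field stays `None`, and `"1"` is an error rather than `true`.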
Fixes #62.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rescued from the stale perf/audit-fixes branch and ported cleanly onto
current main. OSS surfaces can now search without the hosted webclaw API
when the caller supplies their own Serper.dev key (free at serper.dev).
- webclaw-fetch::search() — calls Serper.dev directly (plain wreq client;
a JSON API needs no fingerprinting) and, with scrape=true, fetches +
extracts the top result pages concurrently (bounded) via the caller's
FetchClient. parse_serper_organic() is pure and unit-tested.
- MCP `search` tool: local-first — uses SERPER_API_KEY when set, else
falls back to the hosted webclaw API. Adds country/lang/scrape params.
- OSS REST server: POST /v1/search, gated on SERPER_API_KEY (501 when
unset, with a setup hint). Adds ApiError::NotImplemented.
- CLI: `webclaw search <query> [--serper-key|SERPER_API_KEY] [--num]
[--country] [--lang] [--scrape] [--format]`.
No new dependencies (reuses futures-util already in the tree). Original
work by the prior author on perf/audit-fixes; this re-applies only the
search slice onto main.
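
A rough sketch of the backend selection described in the bullets above, under stated assumptions. `SearchBackend`, `mcp_backend`, `rest_backend`, and the `status` method are hypothetical names; `ApiError::NotImplemented` is named in this change, but its payload and the hint text here are guesses. The key is passed in rather than read from the environment, and treating a blank key as unset is also an assumption.

```rust
/// Hypothetical: which search backend a surface ends up using.
#[derive(Debug, PartialEq)]
pub enum SearchBackend {
    Serper { api_key: String },
    Hosted,
}

/// Only the variant relevant here; the String payload is an assumption.
#[derive(Debug, PartialEq)]
pub enum ApiError {
    NotImplemented(String),
}

impl ApiError {
    /// HTTP status the REST server would answer with.
    pub fn status(&self) -> u16 {
        match self {
            ApiError::NotImplemented(_) => 501,
        }
    }
}

/// MCP `search`: local-first. Use Serper when a key is present,
/// otherwise fall back to the hosted API.
pub fn mcp_backend(serper_key: Option<&str>) -> SearchBackend {
    match serper_key.map(str::trim).filter(|k| !k.is_empty()) {
        Some(k) => SearchBackend::Serper { api_key: k.to_string() },
        None => SearchBackend::Hosted,
    }
}

/// OSS REST `POST /v1/search`: no hosted fallback, so a missing key
/// becomes a 501 carrying a setup hint.
pub fn rest_backend(serper_key: Option<&str>) -> Result<SearchBackend, ApiError> {
    match mcp_backend(serper_key) {
        SearchBackend::Hosted => Err(ApiError::NotImplemented(
            "set SERPER_API_KEY to enable POST /v1/search".to_string(),
        )),
        backend => Ok(backend),
    }
}
```

A caller would pass `std::env::var("SERPER_API_KEY").ok().as_deref()` as the argument.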
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reformat the string-or-number deserialize helpers and tests to satisfy
`cargo fmt --check` (style_edition 2024), which the lint CI job enforces.
Formatting only — no behavior change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MCP clients (Claude Desktop, VS Code Copilot, etc.) serialize numeric
tool arguments as JSON strings ("3" instead of 3). serde's built-in
u32/usize deserialisers reject these with:
    invalid type: string "N", expected u32
Add two private coercion helpers — `deser_opt_u32_or_str` and
`deser_opt_usize_or_str` — that accept both JSON number and JSON string
representations, falling back to `str::parse` for the string form and
returning a clear custom error for non-numeric strings.
Annotate the six affected optional fields:
    CrawlParams:     depth (u32), max_pages (usize), concurrency (usize)
    BatchParams:     concurrency (usize)
    SearchParams:    num_results (u32)
    SummarizeParams: max_sentences (usize)
Add 24 unit tests (4 per field: numeric string → value, native number
→ value, absent → None, non-numeric string → Err) verified green via
an isolated serde-only crate.
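
A serde-free sketch of the number-or-string coercion, mirroring the four test cases per field. `NumOrStr` and `coerce_opt_u32` are hypothetical names; the real `deser_opt_u32_or_str` is a serde helper, and the error messages below are assumed wording.

```rust
/// Hypothetical stand-in for a numeric tool argument as it arrives:
/// either a JSON number or a JSON string.
#[derive(Debug, Clone, PartialEq)]
pub enum NumOrStr {
    Num(u64),
    Str(String),
}

/// Coerce an optional number-or-string argument into `Option<u32>`.
pub fn coerce_opt_u32(arg: Option<NumOrStr>) -> Result<Option<u32>, String> {
    match arg {
        // Absent field stays None.
        None => Ok(None),
        // Native number: accept if it fits in u32.
        Some(NumOrStr::Num(n)) => {
            if n <= u32::MAX as u64 {
                Ok(Some(n as u32))
            } else {
                Err(format!("{} does not fit in u32", n))
            }
        }
        // String form: fall back to str::parse, with a custom error
        // for non-numeric input.
        Some(NumOrStr::Str(s)) => s
            .parse::<u32>()
            .map(Some)
            .map_err(|_| format!("expected a number or numeric string, got {:?}", s)),
    }
}
```

The `usize` variant would follow the same pattern with `parse::<usize>()`.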
Fixes #58
Wires the vertical extractor catalog into both the CLI and the MCP
server so users don't have to hit the HTTP API to invoke them. Same
semantics as `/v1/scrape/{vertical}` + `/v1/extractors`.
CLI (webclaw-cli):
- New subcommand `webclaw extractors` lists all 28 extractors with
name, label, and sample URL. `--json` flag emits the full catalog
as machine-readable JSON.
- New subcommand `webclaw vertical <name> <url>` runs a specific
extractor and prints typed JSON. Pretty-printed by default; `--raw`
for single-line. Exits 1 with a clear "URL does not match" error
on mismatch.
- FetchClient built with Firefox profile + cloud fallback attached
when WEBCLAW_API_KEY is set, so antibot-gated verticals escalate.
MCP (webclaw-mcp):
- New tool `list_extractors` (no args) returns the catalog as
pretty-printed JSON for in-session discovery.
- New tool `vertical_scrape` takes `{name, url}` and returns typed
JSON. Reuses the long-lived self.fetch_client.
- Tool count goes from 10 to 12. Server-info instruction string
updated accordingly.
Tests: 215 passing, clippy clean. Manual surface-tested end-to-end:
CLI prints real Reddit/GitHub/PyPI data; MCP JSON-RPC session returns
28-entry catalog + typed responses for pypi/requests + rust-lang/rust
in 200-400ms.
Version bumped to 0.5.2 (minor for API additions, backwards compatible).
- --cookie-file reads Chrome extension format ([{name, value, domain, ...}])
- Works with EditThisCookie, Cookie-Editor, and similar browser extensions
- Merges with --cookie when both provided
- MCP scrape tool now accepts cookies parameter
- Closes #7
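
One way the `--cookie-file` / `--cookie` merge could look, as a sketch only. `FileCookie` and `merge_cookie_header` are hypothetical names, JSON parsing of the cookie file is omitted, only the three fields named above are modeled, and the ordering (file cookies first, then `--cookie` pairs) is an assumption rather than the documented behavior. Domain filtering is not attempted.

```rust
/// Hypothetical subset of one entry from a browser-extension cookie export.
pub struct FileCookie {
    pub name: String,
    pub value: String,
    pub domain: String,
}

/// Build a single `Cookie` header value from file cookies plus an optional
/// raw `--cookie` string such as "a=1; b=2". Returns None when both are empty.
pub fn merge_cookie_header(
    file_cookies: &[FileCookie],
    cli_cookie: Option<&str>,
) -> Option<String> {
    // Start with name=value pairs from the cookie file.
    let mut parts: Vec<String> = file_cookies
        .iter()
        .map(|c| format!("{}={}", c.name, c.value))
        .collect();

    // Append pairs from --cookie, skipping empty fragments.
    if let Some(raw) = cli_cookie {
        for piece in raw.split(';') {
            let piece = piece.trim();
            if !piece.is_empty() {
                parts.push(piece.to_string());
            }
        }
    }

    if parts.is_empty() {
        None
    } else {
        Some(parts.join("; "))
    }
}
```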
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>