webclaw/crates/webclaw-fetch/src
Valerio 06f151c560 feat(search): standalone web search via Serper.dev (bring-your-own-key)
Rescued from the stale perf/audit-fixes branch and ported cleanly onto
current main. OSS surfaces can now search without the hosted webclaw API
when the caller supplies their own Serper.dev key (free at serper.dev).

- webclaw-fetch::search() — calls Serper.dev directly (plain wreq client;
  a JSON API needs no fingerprinting) and, with scrape=true, fetches +
  extracts the top result pages concurrently (bounded) via the caller's
  FetchClient. parse_serper_organic() is pure and unit-tested.
- MCP `search` tool: local-first — uses SERPER_API_KEY when set, else
  falls back to the hosted webclaw API. Adds country/lang/scrape params.
- OSS REST server: POST /v1/search, gated on SERPER_API_KEY (501 when
  unset, with a setup hint). Adds ApiError::NotImplemented.
- CLI: `webclaw search <query> [--serper-key|SERPER_API_KEY] [--num]
  [--country] [--lang] [--scrape] [--format]`.
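
The key gating described in the MCP and REST bullets can be sketched roughly as below. This is a minimal illustration, not the code from this commit: `SearchBackend`, `pick_backend`, `rest_status`, and `backend_from_env` are hypothetical names, the hint text is made up, and treating a blank key the same as an unset one is an assumption. Only the `SERPER_API_KEY` variable, the hosted fallback, and the 501 status come from the message above.

```rust
use std::env;

/// Which backend a search request should use (hypothetical type).
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum SearchBackend {
    /// Call Serper.dev directly with the caller's own key.
    Serper { api_key: String },
    /// No usable key: MCP falls back to the hosted API, REST refuses.
    Hosted,
}

/// Local-first selection: prefer the caller's key when one is present.
/// Assumption: a blank or whitespace-only key counts as unset.
#[allow(dead_code)]
fn pick_backend(key: Option<&str>) -> SearchBackend {
    match key.map(str::trim) {
        Some(k) if !k.is_empty() => SearchBackend::Serper {
            api_key: k.to_string(),
        },
        _ => SearchBackend::Hosted,
    }
}

/// REST-style gating: 501 with a setup hint when no key is configured.
/// The hint wording here is illustrative only.
#[allow(dead_code)]
fn rest_status(key: Option<&str>) -> (u16, &'static str) {
    match pick_backend(key) {
        SearchBackend::Serper { .. } => (200, "ok"),
        SearchBackend::Hosted => (501, "set SERPER_API_KEY to enable /v1/search"),
    }
}

/// Reads the key from the environment, as the MCP tool and server would.
#[allow(dead_code)]
fn backend_from_env() -> SearchBackend {
    pick_backend(env::var("SERPER_API_KEY").ok().as_deref())
}
```

Keeping the decision in a pure function that takes `Option<&str>` means the same logic can serve the CLI flag, the environment variable, and the unit tests without touching process state.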

No new dependencies (reuses futures-util already in the tree). Original
work by the prior author on perf/audit-fixes; this re-applies only the
search slice onto main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 15:10:58 +02:00
extractors feat(reddit): parse old.reddit.com HTML instead of the dead .json API 2026-06-04 17:36:02 +02:00
browser.rs Bump to 0.5.4: SafariIos profile + Chrome fingerprint alignment + locale helper 2026-04-23 12:58:24 +02:00
client.rs fix: harden LLM providers, UTF-8 handling, and webhook/batch reliability 2026-06-09 21:10:15 +02:00
cloud.rs feat(core): endpoints module for API surface extraction from HTML and JS (#47) 2026-05-19 19:05:16 +02:00
crawler.rs polish(fetch,mcp): robots parser + firefox client cache + Acquire ordering (P3) (#23) 2026-04-16 20:21:32 +02:00
document.rs feat: replace custom TLS stack with wreq (BoringSSL), bump v0.3.3 2026-04-01 18:04:55 +02:00
error.rs feat: replace custom TLS stack with wreq (BoringSSL), bump v0.3.3 2026-04-01 18:04:55 +02:00
fetcher.rs feat(fetch): Fetcher trait so vertical extractors work under any HTTP backend 2026-04-22 21:17:50 +02:00
lib.rs feat(search): standalone web search via Serper.dev (bring-your-own-key) 2026-06-17 15:10:58 +02:00
linkedin.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
locale.rs Bump to 0.5.4: SafariIos profile + Chrome fingerprint alignment + locale helper 2026-04-23 12:58:24 +02:00
progress.rs style: apply rustfmt to salvaged #49 commits 2026-06-09 11:24:13 +02:00
proxy.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
reddit.rs feat(reddit): parse old.reddit.com HTML instead of the dead .json API 2026-06-04 17:36:02 +02:00
search.rs feat(search): standalone web search via Serper.dev (bring-your-own-key) 2026-06-17 15:10:58 +02:00
sitemap.rs polish(fetch,mcp): robots parser + firefox client cache + Acquire ordering (P3) (#23) 2026-04-16 20:21:32 +02:00
tls.rs chore(deps): bump wreq 6.0.0-rc.29, wreq-util 3.0.0-rc.12 2026-06-09 12:38:03 +02:00
url_security.rs fix(security): harden local fetch surfaces 2026-05-12 12:00:25 +02:00