webclaw/crates/webclaw-fetch/src
Valerio 499345046c fix: harden LLM providers, UTF-8 handling, and webhook/batch reliability
- webclaw-llm: add explicit request + connect timeouts to the reqwest
  client in every provider (anthropic, openai, ollama) with a shorter
  timeout on the ollama health check, so a stalled provider fails fast.
- webclaw-llm: fix a panic when truncating a provider error body that
  contains multibyte characters near the 500-char cut (char-safe take).
- webclaw-core: snap the endpoint-scan budget cut to a UTF-8 char
  boundary so oversized scripts with non-ASCII content no longer panic.
- webclaw-core: rewrite js_literal_to_json to copy raw bytes instead of
  `byte as char`, preserving multibyte UTF-8 in SvelteKit string values
  rather than producing Latin-1 mojibake.
- webclaw-cli: have fire_webhook return its JoinHandle and await it at
  the crawl/batch/batch-llm call sites, removing the fixed 500ms sleeps.
- webclaw-mcp: drop the up-front DNS pre-validation loop in batch that
  aborted the whole request on one bad URL; the fetch layer already
  applies the same SSRF guard per URL and reports per-URL errors.
- webclaw-fetch: include the port in the warmup homepage URL so hosts
  on a non-default port are warmed correctly.

Adds regression tests for the UTF-8 endpoint-scan and SvelteKit cases.
2026-06-09 21:10:15 +02:00
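The webclaw-llm truncation fix ("char-safe take" at the 500-char cut) is about a classic Rust pitfall. Slicing a `&str` by byte index (`&body[..500]`) panics when the index lands inside a multibyte character. `chars().take(n)` counts whole Unicode scalar values instead, so it can never split one. The sketch below is illustrative only: the helper name `truncate_error_body` and its signature are hypothetical, not the crate's actual code.

```rust
/// Hypothetical helper (not webclaw's actual function): keep at most
/// `max_chars` Unicode scalar values of a provider error body.
fn truncate_error_body(body: &str, max_chars: usize) -> String {
    // `chars().take(n)` walks whole characters, so it never cuts a
    // multibyte UTF-8 sequence in half the way `&body[..n]` can.
    body.chars().take(max_chars).collect()
}

fn main() {
    // 499 ASCII bytes followed by "é" (2 bytes) puts byte 500 inside "é".
    let body = format!("{}é tail", "x".repeat(499));
    // `&body[..500]` would panic here: byte 500 is not a char boundary.
    assert!(!body.is_char_boundary(500));

    let short = truncate_error_body(&body, 500);
    assert_eq!(short.chars().count(), 500);
    assert!(short.ends_with('é'));
    println!("{} chars kept", short.chars().count());
}
```

Note that the limit is in characters, not bytes. If a byte cap is required, use the boundary-snapping approach described for webclaw-core.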
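The webclaw-core endpoint-scan fix works in bytes rather than characters: the budget is a byte count, and the cut is moved back to the nearest UTF-8 character boundary before slicing. A minimal sketch, assuming the scan only needs a prefix of the script. `snap_to_char_boundary` and `budgeted_prefix` are hypothetical names, and the example is not the crate's implementation.

```rust
/// Hypothetical helper: move a byte budget `cut` backwards to the nearest
/// UTF-8 character boundary so that `&s[..result]` cannot panic.
fn snap_to_char_boundary(s: &str, cut: usize) -> usize {
    if cut >= s.len() {
        return s.len();
    }
    let mut i = cut;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Hypothetical scan entry point: only examine the first `budget` bytes.
fn budgeted_prefix(script: &str, budget: usize) -> &str {
    &script[..snap_to_char_boundary(script, budget)]
}

fn main() {
    // `const label = "` is 15 bytes; each of 日, 本, 語 is 3 bytes.
    let script = "const label = \"日本語\";";
    // Byte 16 falls inside 日, so the cut snaps back to byte 15.
    assert_eq!(budgeted_prefix(script, 16), "const label = \"");
    // Byte 18 is the start of 本, which is already a boundary.
    assert_eq!(budgeted_prefix(script, 18), "const label = \"日");
    // A budget larger than the script returns the whole script.
    assert_eq!(budgeted_prefix(script, 1_000), script);
    println!("{}", budgeted_prefix(script, 16));
}
```

Snapping backwards means the budget is never exceeded. The trade-off is that up to three trailing bytes of the budget can go unused.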
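The js_literal_to_json change comes down to what `byte as char` does. It maps each byte to the code point U+0000 to U+00FF, which is a Latin-1 reading. The two UTF-8 bytes of "é" (0xC3 0xA9) therefore become "Ã©". Copying bytes into a `Vec<u8>` and decoding once at the end keeps multibyte characters intact. The sketch below converts a single-quoted JS literal into a JSON string. Everything in it is hypothetical and much narrower than webclaw's real converter: the function names, the single-quote-only input, and the assumption that every escape other than `\'` is already valid JSON.

```rust
/// Buggy pattern, shown for contrast: each UTF-8 byte becomes its own char.
fn latin1_mojibake(s: &str) -> String {
    s.bytes().map(|b| b as char).collect()
}

/// Hypothetical sketch, not webclaw's js_literal_to_json: convert a
/// single-quoted JS string literal into a JSON string by copying raw bytes.
fn single_quoted_to_json(lit: &str) -> Option<String> {
    let bytes = lit.as_bytes();
    let n = bytes.len();
    if n < 2 || bytes[0] != b'\'' || bytes[n - 1] != b'\'' {
        return None;
    }
    let end = n - 1; // index of the closing quote
    let mut out: Vec<u8> = Vec::with_capacity(n + 2);
    out.push(b'"');
    let mut i = 1;
    while i < end {
        match bytes[i] {
            b'\\' => {
                if i + 1 >= end {
                    return None; // dangling backslash before the closing quote
                }
                if bytes[i + 1] == b'\'' {
                    out.push(b'\''); // \' needs no escaping inside a JSON string
                } else {
                    // Assumption: other escapes (\n, \\, \uXXXX, ...) are
                    // already valid JSON and are copied through unchanged.
                    out.extend_from_slice(&bytes[i..i + 2]);
                }
                i += 2;
            }
            b'"' => {
                out.extend_from_slice(b"\\\""); // JSON needs \" here
                i += 1;
            }
            b => {
                // Copy the raw byte so multibyte UTF-8 sequences stay intact.
                out.push(b);
                i += 1;
            }
        }
    }
    out.push(b'"');
    // Decode once at the end; valid UTF-8 input stays valid UTF-8.
    String::from_utf8(out).ok()
}

fn main() {
    assert_eq!(latin1_mojibake("é"), "Ã©");
    let json = single_quoted_to_json("'café'").unwrap();
    assert_eq!(json, "\"café\"");
    println!("{json}");
}
```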
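The webclaw-fetch warmup fix means that the homepage request for a host such as `example.com:8443` must target that port, not the scheme's default. The sketch below shows one way to build such a URL. It assumes the scheme, host, and optional port are already split out and that IPv6 hosts already include their brackets. The `warmup_homepage` name and signature are hypothetical, not the crate's code.

```rust
/// Hypothetical sketch: build the homepage URL used to warm up a host,
/// keeping an explicit port whenever it differs from the scheme default.
fn warmup_homepage(scheme: &str, host: &str, port: Option<u16>) -> String {
    let default_port = match scheme {
        "http" => Some(80),
        "https" => Some(443),
        _ => None,
    };
    match port {
        // Only a non-default port has to appear in the URL.
        Some(p) if Some(p) != default_port => format!("{scheme}://{host}:{p}/"),
        _ => format!("{scheme}://{host}/"),
    }
}

fn main() {
    println!("{}", warmup_homepage("https", "example.com", Some(8443)));
}
```

When the code already holds a parsed URL from the `url` crate, `Url::port()` returns `None` for a scheme's default port. An explicit port can therefore be appended whenever that call returns `Some`.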
extractors feat(reddit): parse old.reddit.com HTML instead of the dead .json API 2026-06-04 17:36:02 +02:00
browser.rs Bump to 0.5.4: SafariIos profile + Chrome fingerprint alignment + locale helper 2026-04-23 12:58:24 +02:00
client.rs fix: harden LLM providers, UTF-8 handling, and webhook/batch reliability 2026-06-09 21:10:15 +02:00
cloud.rs feat(core): endpoints module for API surface extraction from HTML and JS (#47) 2026-05-19 19:05:16 +02:00
crawler.rs polish(fetch,mcp): robots parser + firefox client cache + Acquire ordering (P3) (#23) 2026-04-16 20:21:32 +02:00
document.rs feat: replace custom TLS stack with wreq (BoringSSL), bump v0.3.3 2026-04-01 18:04:55 +02:00
error.rs feat: replace custom TLS stack with wreq (BoringSSL), bump v0.3.3 2026-04-01 18:04:55 +02:00
fetcher.rs feat(fetch): Fetcher trait so vertical extractors work under any HTTP backend 2026-04-22 21:17:50 +02:00
lib.rs style: apply rustfmt to salvaged #49 commits 2026-06-09 11:24:13 +02:00
linkedin.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
locale.rs Bump to 0.5.4: SafariIos profile + Chrome fingerprint alignment + locale helper 2026-04-23 12:58:24 +02:00
progress.rs style: apply rustfmt to salvaged #49 commits 2026-06-09 11:24:13 +02:00
proxy.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
reddit.rs feat(reddit): parse old.reddit.com HTML instead of the dead .json API 2026-06-04 17:36:02 +02:00
sitemap.rs polish(fetch,mcp): robots parser + firefox client cache + Acquire ordering (P3) (#23) 2026-04-16 20:21:32 +02:00
tls.rs chore(deps): bump wreq 6.0.0-rc.29, wreq-util 3.0.0-rc.12 2026-06-09 12:38:03 +02:00
url_security.rs fix(security): harden local fetch surfaces 2026-05-12 12:00:25 +02:00