fix: harden LLM providers, UTF-8 handling, and webhook/batch reliability

- webclaw-llm: add explicit request + connect timeouts to the reqwest
  client in every provider (anthropic, openai, ollama) with a shorter
  timeout on the ollama health check, so a stalled provider fails fast.
- webclaw-llm: fix a panic when truncating a provider error body that
  contains multibyte characters near the 500-char cut (char-safe take).
- webclaw-core: snap the endpoint-scan budget cut to a UTF-8 char
  boundary so oversized scripts with non-ASCII content no longer panic.
- webclaw-core: rewrite js_literal_to_json to copy raw bytes instead of
  `byte as char`, preserving multibyte UTF-8 in SvelteKit string values
  rather than producing Latin-1 mojibake.
- webclaw-cli: have fire_webhook return its JoinHandle and await it at
  the crawl/batch/batch-llm call sites, removing the fixed 500ms sleeps.
- webclaw-mcp: drop the up-front DNS pre-validation loop in batch that
  aborted the whole request on one bad URL; the fetch layer already
  applies the same SSRF guard per URL and reports per-URL errors.
- webclaw-fetch: include the port in the warmup homepage URL so hosts on
  a non-default port are warmed correctly.

Adds regression tests for the UTF-8 endpoint-scan and SvelteKit cases.
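
For reviewers, the three UTF-8 fixes above can be illustrated with a minimal std-only sketch. This is not the code from the commit: the function names (truncate_chars, snap_to_char_boundary, copy_bytes, cast_bytes) are hypothetical and only show the general technique each bullet describes.

```rust
/// Char-safe truncation: take at most `max_chars` characters, so the cut
/// can never land inside a multibyte code point. (Hypothetical helper.)
fn truncate_chars(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect()
}

/// Snap a byte budget down to the nearest UTF-8 char boundary before
/// slicing. Slicing `&s[..budget]` directly panics when `budget` falls
/// inside a multibyte character. (Hypothetical helper.)
fn snap_to_char_boundary(s: &str, budget: usize) -> &str {
    if budget >= s.len() {
        return s;
    }
    let mut cut = budget;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(cut) {
        cut -= 1;
    }
    &s[..cut]
}

/// Copy raw bytes and validate once at the end: multibyte sequences
/// survive unchanged. (Hypothetical helper.)
fn copy_bytes(input: &str) -> String {
    let mut out: Vec<u8> = Vec::with_capacity(input.len());
    for &b in input.as_bytes() {
        out.push(b);
    }
    String::from_utf8(out).expect("bytes copied from a &str are valid UTF-8")
}

/// The failure mode: `byte as char` maps each byte to the code point of
/// the same number, which reinterprets UTF-8 bytes as Latin-1.
fn cast_bytes(input: &str) -> String {
    input.bytes().map(|b| b as char).collect()
}
```

With the input "é" (bytes 0xC3 0xA9), copy_bytes returns "é" while cast_bytes returns the two characters U+00C3 and U+00A9, which is the Latin-1 mojibake the commit message refers to.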
2026-07-04 04:21:00 +02:00 · 2026-06-09 21:10:15 +02:00
commit 499345046c
parent d0d7b835f2
9 changed files with 117 additions and 51 deletions
--- a/crates/webclaw-mcp/src/server.rs
+++ b/crates/webclaw-mcp/src/server.rs
@@ -323,9 +323,10 @@ impl WebclawMcp {
        if params.urls.len() > 100 {
            return Err("batch is limited to 100 URLs per request".into());
        }
-        for u in &params.urls {
-            validate_url(u).await?;
-        }
+        // No up-front DNS pre-validation: it aborted the whole batch on a
+        // single unresolvable URL. The fetch layer applies the same SSRF
+        // guard (validate_public_http_url) per URL, so bad entries surface
+        // as individual per-URL errors below instead of failing the batch.

        let format = params.format.as_deref().unwrap_or("markdown");
        let concurrency = params.concurrency.unwrap_or(5);