webclaw/crates/webclaw-core/src
Valerio 499345046c fix: harden LLM providers, UTF-8 handling, and webhook/batch reliability
- webclaw-llm: add explicit request + connect timeouts to the reqwest
  client in every provider (anthropic, openai, ollama) with a shorter
  timeout on the ollama health check, so a stalled provider fails fast.
- webclaw-llm: fix a panic when truncating a provider error body that
  contains multibyte characters near the 500-char cut (char-safe take).
- webclaw-core: snap the endpoint-scan budget cut to a UTF-8 char
  boundary so oversized scripts with non-ASCII content no longer panic.
- webclaw-core: rewrite js_literal_to_json to copy raw bytes instead of
  `byte as char`, preserving multibyte UTF-8 in SvelteKit string values
  rather than producing Latin-1 mojibake.
- webclaw-cli: have fire_webhook return its JoinHandle and await it at
  the crawl/batch/batch-llm call sites, removing the fixed 500ms sleeps.
- webclaw-mcp: drop the up-front DNS pre-validation loop in batch that
  aborted the whole request on one bad URL; the fetch layer already
  applies the same SSRF guard per URL and reports per-URL errors.
- webclaw-fetch: include the port in the warmup homepage URL so hosts
  on a non-default port are warmed correctly.
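
The two UTF-8 truncation fixes above can be sketched with the standard library alone. This is a minimal illustration, not the code from this commit: `truncate_at_char_boundary` and `take_chars` are hypothetical helper names, and the actual budget and cut sizes in webclaw may differ.

```rust
/// Hypothetical helper: cut `s` to at most `max_bytes` bytes, backing up to
/// the nearest UTF-8 char boundary so the slice can never panic.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Hypothetical helper: keep at most `max_chars` characters (not bytes),
/// which is the "char-safe take" approach for error bodies.
fn take_chars(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect()
}
```

Slicing with `&s[..max_bytes]` directly panics when `max_bytes` lands inside a multibyte character; both helpers avoid that, the first by moving the cut back and the second by counting characters instead of bytes.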

Adds regression tests for the UTF-8 endpoint-scan and SvelteKit cases.
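
The `byte as char` problem those tests guard against can be reproduced in isolation. The sketch below is an assumption-laden illustration rather than the real `js_literal_to_json`: both function names are hypothetical, and the real function also rewrites JS literal syntax, which is omitted here.

```rust
/// Lossy copy: casts each UTF-8 byte to its own `char`, so a multibyte
/// character turns into several Latin-1 characters (mojibake).
fn copy_byte_as_char(input: &str) -> String {
    input.bytes().map(|b| b as char).collect()
}

/// Lossless copy: pushes the raw bytes unchanged and validates once at the end.
fn copy_raw_bytes(input: &str) -> String {
    let mut out: Vec<u8> = Vec::with_capacity(input.len());
    for &b in input.as_bytes() {
        out.push(b);
    }
    // Bytes copied verbatim from a `&str` are still valid UTF-8.
    String::from_utf8(out).expect("input was valid UTF-8")
}
```

With the lossy version, `"é"` (bytes `0xC3 0xA9`) becomes `"Ã©"`; the raw-byte version returns the input unchanged.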
2026-06-09 21:10:15 +02:00
llm fix: harden resource limits, path safety, and WASM build (#46) 2026-05-19 17:03:52 +02:00
brand.rs fix: improve brand extraction signals 2026-05-04 21:25:07 +02:00
data_island.rs feat: SvelteKit data extraction + license change to AGPL-3.0 2026-04-01 20:37:56 +02:00
diff.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
domain.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
endpoints.rs fix: harden LLM providers, UTF-8 handling, and webhook/batch reliability 2026-06-09 21:10:15 +02:00
error.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
extractor.rs style: cargo fmt 2026-04-17 12:03:22 +02:00
js_eval.rs fix(security): harden local fetch surfaces 2026-05-12 12:00:25 +02:00
lib.rs feat(reddit): parse old.reddit.com HTML instead of the dead .json API 2026-06-04 17:36:02 +02:00
markdown.rs Improve --format llm output quality (#37) 2026-05-10 15:11:12 +02:00
metadata.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
noise.rs chore: bump to 0.3.9, fix formatting from #14 2026-04-04 15:24:17 +02:00
reddit.rs style(reddit): use Option::zip to satisfy clippy 2026-06-04 17:48:17 +02:00
structured_data.rs fix: harden LLM providers, UTF-8 handling, and webhook/batch reliability 2026-06-09 21:10:15 +02:00
types.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
youtube.rs feat: v0.1.2 — TLS fallback, Safari default, Reddit fix, YouTube transcript infra 2026-03-25 18:50:07 +01:00