webclaw/crates/webclaw-core/src
Valerio 5e4970f105 fix: harden resource limits, path safety, and WASM build
Security audit follow-up across the workspace:

- webclaw-core: keep the crate WASM-safe. quickjs/rquickjs is now a
  cfg(not(wasm32)) target dependency and the extraction entry point uses
  a direct call on wasm instead of spawning a thread, so it builds and
  runs on wasm32 with or without default features.
- webclaw-core: bound the structured-data scrubber recursion (depth cap)
  so deeply nested attacker JSON-LD / __NEXT_DATA__ cannot exhaust the
  stack.
- webclaw-fetch: stream the response body with a running ceiling so a
  small highly compressed payload cannot inflate to gigabytes in memory;
  redact user:pass@ from proxy URLs before they reach error strings.
- webclaw-cli: contain output filenames inside the chosen directory
  (reject .. / absolute, drop traversal path segments), run --webhook
  URLs through the public-URL SSRF guard, clamp --watch-interval to >=1s,
  and make research slug truncation char-safe.
- webclaw-mcp: char-safe slug truncation (no multibyte slice panic).
- setup.sh / deploy/hetzner.sh: replace eval on read input with
  printf -v, and mask auth key / API token in console output.
- CI: enforce the wasm32 build invariant for webclaw-core.
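
The "char-safe slug truncation" items above refer to a common Rust pitfall: slicing a `&str` at a byte offset that falls inside a multibyte character panics. A minimal sketch of one way to truncate by characters, assuming a hypothetical helper named `truncate_chars` (the name and signature are illustrative and not taken from the webclaw source):

```rust
/// Truncate `s` to at most `max_chars` characters without splitting a
/// multibyte UTF-8 sequence. Hypothetical helper, not the crate's actual code.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    // `char_indices` yields the byte offset of each character, so the offset
    // of the character at position `max_chars` is always a valid boundary.
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // already short enough
    }
}

fn main() {
    let slug = "café-société";
    // `&slug[..4]` would panic here: byte 4 is in the middle of 'é'.
    println!("{}", truncate_chars(slug, 4));
}
```

Cutting on a character boundary keeps the slice valid UTF-8; it does not account for grapheme clusters, which may or may not matter for slugs.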

Tests added for every behavioral change. Bump to 0.6.3 + CHANGELOG.
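
For illustration, the output-filename containment described for webclaw-cli could look roughly like the sketch below. It assumes a hypothetical `contained_path` helper and takes the stricter option of rejecting any name that contains `..`, a root, or a drive prefix, rather than dropping those segments; the real implementation may differ.

```rust
use std::path::{Component, Path, PathBuf};

/// Join `name` under `dir`, accepting only plain path segments.
/// Hypothetical helper, not the crate's actual code.
fn contained_path(dir: &Path, name: &str) -> Option<PathBuf> {
    let mut out = dir.to_path_buf();
    let mut pushed = false;
    for comp in Path::new(name).components() {
        match comp {
            Component::Normal(seg) => {
                out.push(seg);
                pushed = true;
            }
            // A leading `./` is harmless and simply skipped.
            Component::CurDir => {}
            // `..`, a root, or a Windows prefix could escape `dir`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    // An empty name (or only `.`) does not identify a file.
    if pushed { Some(out) } else { None }
}

fn main() {
    let dir = Path::new("out");
    println!("{:?}", contained_path(dir, "report/page.md"));
    println!("{:?}", contained_path(dir, "../etc/passwd"));
}
```

This check is purely lexical: it never touches the filesystem, so it does not follow symlinks that already exist inside the output directory.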
2026-05-19 12:14:45 +02:00
llm fix: harden resource limits, path safety, and WASM build 2026-05-19 12:14:45 +02:00
brand.rs fix: improve brand extraction signals 2026-05-04 21:25:07 +02:00
data_island.rs feat: SvelteKit data extraction + license change to AGPL-3.0 2026-04-01 20:37:56 +02:00
diff.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
domain.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
error.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
extractor.rs style: cargo fmt 2026-04-17 12:03:22 +02:00
js_eval.rs fix(security): harden local fetch surfaces 2026-05-12 12:00:25 +02:00
lib.rs fix: harden resource limits, path safety, and WASM build 2026-05-19 12:14:45 +02:00
markdown.rs Improve --format llm output quality (#37) 2026-05-10 15:11:12 +02:00
metadata.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
noise.rs chore: bump to 0.3.9, fix formatting from #14 2026-04-04 15:24:17 +02:00
structured_data.rs fix: handle raw newlines in JSON-LD strings 2026-04-16 11:40:25 +02:00
types.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
youtube.rs feat: v0.1.2 — TLS fallback, Safari default, Reddit fix, YouTube transcript infra 2026-03-25 18:50:07 +01:00