webclaw

mirror of https://github.com/0xMassi/webclaw.git synced 2026-07-24 07:31:01 +02:00

Valerio 86182ef28a fix(server): switch default browser profile to Firefox	2026-04-22 14:11:55 +02:00

Reddit blocks wreq's Chrome 145 BoringSSL fingerprint at the JA3/JA4 TLS layer even though our HTTP headers correctly impersonate Chrome. Curl from the same machine with the same Chrome User-Agent string returns 200 from Reddit's .json endpoint; webclaw with the Chrome profile returns 403. The detector clearly fingerprints below the header layer.

Tested all six vertical extractors with the Firefox profile: reddit, hackernews, github_repo, pypi, npm, huggingface_model all return correct typed JSON. Firefox is a strict improvement on the Chrome default for sites with active TLS-level bot detection, with no regressions on the API-flavored sites that were already working.

Real fix is per-extractor preferred profile, but the structural change to allow per-call profile selection in FetchClient is a larger refactor. Flipping the global default is a one-line change that ships the unblock now and lets users hit the new /v1/scrape/{vertical} routes against Reddit immediately.
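The commit message contrasts two approaches: flipping a global default profile now, and a later per-extractor (or per-call) profile preference. The following is a minimal sketch of how such a fallback chain could look. Every name here (`BrowserProfile`, `DEFAULT_PROFILE`, `preferred_profile`, `resolve_profile`) is hypothetical and chosen for illustration; none is taken from the webclaw source, and the actual FetchClient API may differ.

```rust
// Hypothetical sketch: these types and functions are assumptions,
// not webclaw's real API.

/// Browser impersonation profiles mentioned in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BrowserProfile {
    Chrome,
    Firefox,
}

/// Global default. The commit describes flipping this from Chrome to
/// Firefox as a one-line change.
pub const DEFAULT_PROFILE: BrowserProfile = BrowserProfile::Firefox;

/// Per-extractor preference (the "real fix" the commit defers).
/// `None` means the extractor has no opinion and the default applies.
pub fn preferred_profile(vertical: &str) -> Option<BrowserProfile> {
    match vertical {
        // Assumed entry: Reddit is the site reported to reject the
        // Chrome TLS fingerprint.
        "reddit" => Some(BrowserProfile::Firefox),
        _ => None,
    }
}

/// Resolution order: explicit per-call override, then the extractor's
/// preference, then the global default.
pub fn resolve_profile(
    vertical: &str,
    per_call: Option<BrowserProfile>,
) -> BrowserProfile {
    per_call
        .or_else(|| preferred_profile(vertical))
        .unwrap_or(DEFAULT_PROFILE)
}
```

With this shape, changing `DEFAULT_PROFILE` affects only extractors that state no preference, which is why a global flip is the smaller change while per-call selection requires threading an extra argument through the fetch path.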
webclaw-cli	feat(cli): add webclaw bench <url> subcommand (closes #26)	2026-04-22 12:25:29 +02:00
webclaw-core	style: cargo fmt	2026-04-17 12:03:22 +02:00
webclaw-fetch	feat(extractors): add vertical extractors module + first 6 verticals	2026-04-22 14:11:43 +02:00
webclaw-llm	feat(fetch,llm): DoS hardening + glob validation + cleanup (P2) (#22)	2026-04-16 19:44:08 +02:00
webclaw-mcp	fix(mcp): silence dead-code warning on tool_router field (closes #30)	2026-04-22 12:25:39 +02:00
webclaw-pdf	Initial release: webclaw v0.1.0 — web content extraction for LLMs	2026-03-23 18:31:11 +01:00
webclaw-server	fix(server): switch default browser profile to Firefox	2026-04-22 14:11:55 +02:00