Commit graph

3 commits

Author SHA1 Message Date
Valerio
afe4d3077d feat: v0.1.2 — TLS fallback, Safari default, Reddit fix, YouTube transcript infra
- Switch default profile to Safari26/Mac (best CF pass rate)
- Auto-fallback to plain client on connection error or 403
- Fixes: ycombinator.com, producthunt.com, and similar CF-strict sites
- Reddit .json endpoint uses plain client (TLS fingerprint was blocked)
- YouTube caption track extraction + timed text parser (core, not yet wired)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 18:50:07 +01:00
Valerio
ea9c783bc5 fix: v0.1.1 — MCP identity, timeouts, exit codes, URL validation
Critical:
- MCP server identifies as "webclaw-mcp" instead of "rmcp"
- Research tool poll loop capped at 200 iterations (~10 min)

CLI:
- Non-zero exit codes on errors
- Text format strips markdown table syntax

MCP server:
- URL validation on all tools
- 60s cloud API timeout, 30s local fetch timeout
- Diff cloud fallback computes actual diff
- Batch capped at 100 URLs, crawl at 500 pages
- Graceful startup failure instead of panic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 17:25:05 +01:00
Valerio
c99ec684fa Initial release: webclaw v0.1.0 — web content extraction for LLMs
CLI + MCP server for extracting clean, structured content from any URL.
6 Rust crates, 10 MCP tools, TLS fingerprinting, 5 output formats.

MIT Licensed | https://webclaw.io
2026-03-23 18:31:11 +01:00