webclaw

mirror of https://github.com/0xMassi/webclaw.git synced 2026-05-13 17:02:36 +02:00

Author	SHA1	Message	Date
Valerio	1d2018c98e	fix: MCP research saves to file, returns compact response Research results saved to ~/.webclaw/research/ (report.md + full.json). MCP returns file paths + findings instead of the full report, preventing "exceeds maximum allowed tokens" errors in Claude/Cursor. Same query returns cached result instantly without spending credits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 16:05:45 +02:00
Valerio	f7cc0cc5cf	feat: CLI --research flag + MCP cloud fallback + structured research output - --research "query": deep research via cloud API, saves JSON file with report + sources + findings, prints report to stdout - --deep: longer, more thorough research mode - MCP extract/summarize: cloud fallback when no local LLM available - MCP research: returns structured JSON instead of raw text - Bump to v0.3.7 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 14:04:04 +02:00
Valerio	da1d76c97a	feat: add --cookie-file support for JSON cookie files - --cookie-file reads Chrome extension format ([{name, value, domain, ...}]) - Works with EditThisCookie, Cookie-Editor, and similar browser extensions - Merges with --cookie when both provided - MCP scrape tool now accepts cookies parameter - Closes #7 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 10:54:53 +02:00
Valerio	199dab6dfa	fix: adapt to webclaw-tls v0.1.1 HeaderMap API change Response.headers() now returns &http::HeaderMap instead of &HashMap<String, String>. Updated FetchResult, is_pdf_content_type, is_document_content_type, is_bot_protected, and all related tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 12:09:50 +02:00
Valerio	0c91c6d5a9	feat: v0.1.3 — crawl streaming, resume/cancel, MCP proxy support Crawl: - Real-time progress on stderr as pages complete - --crawl-state saves progress on Ctrl+C, resumes from saved state - Visited set + remaining frontier persisted for accurate resume MCP server: - Reads WEBCLAW_PROXY and WEBCLAW_PROXY_FILE env vars - Falls back to proxies.txt in CWD (existing behavior) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 21:38:28 +01:00
Valerio	dff458d2f5	fix: collapse nested if to satisfy clippy Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 17:28:57 +01:00
Valerio	b92c0ed186	style: fix cargo fmt formatting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 17:27:15 +01:00
Valerio	ea9c783bc5	fix: v0.1.1 — MCP identity, timeouts, exit codes, URL validation Critical: - MCP server identifies as "webclaw-mcp" instead of "rmcp" - Research tool poll loop capped at 200 iterations (~10 min) CLI: - Non-zero exit codes on errors - Text format strips markdown table syntax MCP server: - URL validation on all tools - 60s cloud API timeout, 30s local fetch timeout - Diff cloud fallback computes actual diff - Batch capped at 100 URLs, crawl at 500 pages - Graceful startup failure instead of panic Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 17:25:05 +01:00
Valerio	c99ec684fa	Initial release: webclaw v0.1.0 — web content extraction for LLMs CLI + MCP server for extracting clean, structured content from any URL. 6 Rust crates, 10 MCP tools, TLS fingerprinting, 5 output formats. MIT Licensed \| https://webclaw.io	2026-03-23 18:31:11 +01:00

9 commits