webclaw

mirror of https://github.com/0xMassi/webclaw.git synced 2026-06-08 22:25:12 +02:00

Valerio ea14848772 feat: v0.2.0 — DOCX/XLSX/CSV extraction, HTML format, multi-URL watch, batch LLM Document extraction: - DOCX: auto-detected, outputs markdown with headings (via zip + quick-xml) - XLSX/XLS: markdown tables with multi-sheet support (via calamine) - CSV: quoted field handling, markdown table output - All auto-detected by Content-Type header or URL extension New features: - -f html output format (sanitized HTML) - Multi-URL watch: --urls-file + --watch monitors all URLs in parallel - Batch + LLM: --extract-prompt/--extract-json works with multiple URLs - Mixed batch: HTML pages + DOCX + XLSX + CSV in one command Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>		2026-03-26 15:28:23 +01:00
..
webclaw-cli	feat: v0.2.0 — DOCX/XLSX/CSV extraction, HTML format, multi-URL watch, batch LLM	2026-03-26 15:28:23 +01:00
webclaw-core	feat: v0.1.4 — QuickJS integration for inline JavaScript data extraction	2026-03-26 10:28:16 +01:00
webclaw-fetch	feat: v0.2.0 — DOCX/XLSX/CSV extraction, HTML format, multi-URL watch, batch LLM	2026-03-26 15:28:23 +01:00
webclaw-llm	Initial release: webclaw v0.1.0 — web content extraction for LLMs	2026-03-23 18:31:11 +01:00
webclaw-mcp	feat: v0.1.3 — crawl streaming, resume/cancel, MCP proxy support	2026-03-25 21:38:28 +01:00
webclaw-pdf	Initial release: webclaw v0.1.0 — web content extraction for LLMs	2026-03-23 18:31:11 +01:00