webclaw

mirror of https://github.com/0xMassi/webclaw.git synced 2026-06-08 22:25:12 +02:00

devnen 74bac87435 fix: prevent stack overflow on deeply nested HTML pages	2026-04-03 23:45:19 +02:00

Pages like Express.co.uk live blogs nest 200+ DOM levels deep, overflowing the default 1 MB main-thread stack on Windows during recursive markdown conversion.

Two-layer fix:
1. markdown.rs: add depth parameter to node_to_md/children_to_md/inline_text with MAX_DOM_DEPTH=200 guard; falls back to plain text collection at limit
2. lib.rs: wrap extract_with_options in a worker thread with 8 MB stack so html5ever parsing and extraction both have room on deeply nested pages

Tested with Express.co.uk live blog (previously crashed, now extracts 2000+ lines of clean markdown) and drudgereport.com (still works correctly).
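
The two-layer fix described in the commit message above can be illustrated with a minimal sketch. This is not the actual webclaw-core code: the `Node` type, `collect_text`, `extract_on_worker`, and `nested_chain` are hypothetical stand-ins, and the real functions operate on an html5ever tree and emit markdown rather than plain concatenated text. Only `MAX_DOM_DEPTH = 200`, the `node_to_md` name, and the 8 MB worker stack are taken from the commit message.

```rust
use std::thread;

// Hypothetical stand-in for a DOM node (assumption, not webclaw's type).
struct Node {
    text: String,
    children: Vec<Node>,
}

// Depth limit named in the commit message.
const MAX_DOM_DEPTH: usize = 200;
// 8 MB worker stack, as described in the commit message.
const WORKER_STACK_BYTES: usize = 8 * 1024 * 1024;

// Plain-text fallback (hypothetical helper): walks the subtree with an
// explicit stack, so it uses no call-stack recursion at all.
fn collect_text(node: &Node, out: &mut String) {
    let mut pending = vec![node];
    while let Some(n) = pending.pop() {
        out.push_str(&n.text);
        // Push children in reverse so they are popped in document order.
        for child in n.children.iter().rev() {
            pending.push(child);
        }
    }
}

// Layer 1: recursive conversion with a depth guard. Once the limit is
// reached, the rest of the subtree is handled by the non-recursive fallback.
fn node_to_md(node: &Node, depth: usize, out: &mut String) {
    if depth >= MAX_DOM_DEPTH {
        collect_text(node, out);
        return;
    }
    out.push_str(&node.text);
    for child in &node.children {
        node_to_md(child, depth + 1, out);
    }
}

// Layer 2: run the whole extraction on a worker thread with a larger stack
// instead of relying on the platform's default main-thread stack.
fn extract_on_worker(root: Node) -> String {
    thread::Builder::new()
        .stack_size(WORKER_STACK_BYTES)
        .spawn(move || {
            let mut out = String::new();
            node_to_md(&root, 0, &mut out);
            out
        })
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}

// Builds a single chain of `levels + 1` nodes, each nested inside the
// previous one, to imitate a pathologically deep page.
fn nested_chain(levels: usize) -> Node {
    let mut node = Node { text: "x".to_string(), children: Vec::new() };
    for _ in 0..levels {
        node = Node { text: "x".to_string(), children: vec![node] };
    }
    node
}
```

Calling `extract_on_worker(nested_chain(500))` exercises both layers: the first 200 levels go through the recursive path, and everything deeper is collected by the iterative fallback. The two layers cover different failure points. The depth guard bounds recursion in the converter itself, while the larger stack gives headroom to code outside the guard, which per the commit message includes html5ever parsing.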
webclaw-cli	feat: CLI --research flag + MCP cloud fallback + structured research output	2026-04-03 14:04:04 +02:00
webclaw-core	fix: prevent stack overflow on deeply nested HTML pages	2026-04-03 23:45:19 +02:00
webclaw-fetch	style: cargo fmt	2026-04-01 18:25:40 +02:00
webclaw-llm	Initial release: webclaw v0.1.0 — web content extraction for LLMs	2026-03-23 18:31:11 +01:00
webclaw-mcp	fix: MCP research saves to file, returns compact response	2026-04-03 16:05:45 +02:00
webclaw-pdf	Initial release: webclaw v0.1.0 — web content extraction for LLMs	2026-03-23 18:31:11 +01:00