webclaw/crates
devnen 339f41bb7c feat(cli): add --max-output-bytes and --mode summary,toc for output-size control
Three additive CLI options (one new flag plus two new --mode values)
addressing the 50KB persisted-output cap that trips Claude Code's
per-tool-result harness on aggregator front pages (apnews.com,
cnbc.com/markets/, b92.net all >50KB by default):

--max-output-bytes N: truncates final output at N bytes with a clear
'[truncated: M more bytes ...]' footer. N=0 means unlimited (default).
UTF-8 codepoint-boundary safe; also wraps JSON output so truncated
output stays parseable.
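
A minimal sketch of codepoint-safe truncation as described above. The
names truncate_utf8 and cap_output are hypothetical, not webclaw's actual
functions, and the sketch assumes the footer is appended after the cut
(the source does not say whether the footer counts toward N). The JSON
wrapping is not covered here.

```rust
/// Hypothetical helper: returns the longest prefix of `s` that fits in
/// `max_bytes` without splitting a UTF-8 codepoint, plus the number of
/// bytes dropped. `max_bytes == 0` means unlimited.
fn truncate_utf8(s: &str, max_bytes: usize) -> (&str, usize) {
    if max_bytes == 0 || s.len() <= max_bytes {
        return (s, 0);
    }
    // Walk back from the byte limit to the nearest char boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    (&s[..end], s.len() - end)
}

/// Hypothetical wrapper: appends a footer only when something was cut.
/// The footer text mirrors the commit message, elision included.
fn cap_output(s: &str, max_bytes: usize) -> String {
    let (kept, dropped) = truncate_utf8(s, max_bytes);
    if dropped == 0 {
        kept.to_string()
    } else {
        format!("{kept}\n[truncated: {dropped} more bytes ...]")
    }
}
```

With the input "héllo" (6 bytes) and a limit of 2, the cut falls inside
the two-byte "é", so the sketch backs up to byte 1 and reports 5 dropped
bytes.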

--mode summary: returns only the extracted link list (titles + URLs),
no body text. Intended for aggregator front pages, where the LLM is
going to drill into the individual articles next anyway.

--mode toc: returns H1/H2 outline + first paragraph after each H2.
For long single-article pages.
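
A rough sketch of the toc selection rule, assuming the page has already
been parsed into a flat block list. The Block enum and the toc function
are hypothetical illustrations, not webclaw-core types, and keeping the
"first paragraph" search alive across H3+ headings is an assumption the
commit message does not settle.

```rust
/// Hypothetical, simplified view of an extracted page.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Heading { level: u8, text: String },
    Paragraph(String),
}

/// Keeps H1/H2 headings and the first paragraph that follows each H2.
fn toc(blocks: &[Block]) -> Vec<String> {
    let mut out = Vec::new();
    // True while we still owe the current H2 its first paragraph.
    let mut want_para = false;
    for block in blocks {
        match block {
            Block::Heading { level: 1, text } => {
                out.push(format!("# {text}"));
                want_para = false;
            }
            Block::Heading { level: 2, text } => {
                out.push(format!("## {text}"));
                want_para = true;
            }
            // Deeper headings are dropped from the outline (assumption).
            Block::Heading { .. } => {}
            Block::Paragraph(p) if want_para => {
                out.push(p.clone());
                want_para = false;
            }
            // Any other paragraph is body text and is skipped.
            Block::Paragraph(_) => {}
        }
    }
    out
}
```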

New flags are orthogonal to -f (json/llm/text). 9 new unit tests in
webclaw-core, total goes 308 -> 317 passing. Smoke-tested (output size
in bytes, default vs. each option):
  apnews.com:    51713 default, 27404 summary, 6269 toc, 8193 capped
  pitchfork.com: 42049 default, 379 summary
  cnbc.com:      56682 default, 16385 capped
2026-05-23 18:17:42 +02:00
webclaw-cli feat(cli): add --max-output-bytes and --mode summary,toc for output-size control 2026-05-23 18:17:42 +02:00
webclaw-core feat(cli): add --max-output-bytes and --mode summary,toc for output-size control 2026-05-23 18:17:42 +02:00
webclaw-fetch feat(core): endpoints module for API surface extraction from HTML and JS (#47) 2026-05-19 19:05:16 +02:00
webclaw-llm fix: support LLM provider compatibility options 2026-05-06 11:36:53 +02:00
webclaw-mcp fix: harden resource limits, path safety, and WASM build (#46) 2026-05-19 17:03:52 +02:00
webclaw-pdf Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
webclaw-server fix: validate self-host route URLs consistently 2026-05-04 14:30:06 +02:00