feat(core): word-count breakdown in header — article vs chrome split

The current Word count: N header is a single number that conflates the article body with the surrounding chrome (nav, ads, footer), so callers could not tell from the header alone whether to drill in or move on.

New: -f llm/text output prints Word count: N (article: M, chrome: K). -f json adds word_count_article and word_count_chrome fields alongside the existing word_count.

The article count M is sourced from JSON-LD articleBody when M4's parser found one (NewsArticle, or Review.reviewBody); otherwise it is computed by llm::body_word_count (the M2-style heuristic: words outside markdown link patterns, using the same body::process_body output that hub_detect uses).

--mode summary / toc / sections fall back to the simple Word count: N form, because those modes do not extract body content and the breakdown would be meaningless. Suppression piggybacks on the existing include_status toggle in build_metadata_header_with_opts.

Tests: 9 new in webclaw-core (4 in lib.rs::tests for the population logic; 5 in llm/metadata.rs::m12_tests for the header formatter). Workspace 701 -> 710.
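
For readers skimming the message, the header behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: format_word_count, strip_markdown_links and body_word_count_sketch are hypothetical names, the link-stripping scanner is a simplified stand-in for llm::body_word_count, and deriving chrome as total minus article is an assumption the message does not state.

```rust
/// Hypothetical header formatter (not webclaw's actual API).
/// `article` is None for modes that do not extract body content.
fn format_word_count(total: usize, article: Option<usize>) -> String {
    match article {
        Some(m) => {
            // Assumption: chrome is whatever is left after the article body.
            // saturating_sub guards against an article count above the total.
            let chrome = total.saturating_sub(m);
            format!("Word count: {total} (article: {m}, chrome: {chrome})")
        }
        // Fallback: the simple single-number form.
        None => format!("Word count: {total}"),
    }
}

/// Replaces inline markdown links of the form [text](url) with a space.
/// Simplified scanner: no nested brackets, no reference-style links.
fn strip_markdown_links(markdown: &str) -> String {
    let mut out = String::with_capacity(markdown.len());
    let mut rest = markdown;
    while let Some(open) = rest.find('[') {
        let after_open = &rest[open + 1..];
        // A link needs a ']' immediately followed by '(' and a closing ')'.
        let link_end = after_open.find(']').and_then(|rb| {
            let tail = &after_open[rb + 1..];
            if tail.starts_with('(') {
                tail.find(')').map(|rp| open + 1 + rb + 1 + rp + 1)
            } else {
                None
            }
        });
        match link_end {
            Some(end) => {
                // Keep the text before the link, drop the link itself.
                out.push_str(&rest[..open]);
                out.push(' ');
                rest = &rest[end..];
            }
            None => {
                // Not a link: keep the '[' and continue scanning after it.
                out.push_str(&rest[..open + 1]);
                rest = &rest[open + 1..];
            }
        }
    }
    out.push_str(rest);
    out
}

/// Counts whitespace-separated words that sit outside markdown links.
fn body_word_count_sketch(markdown: &str) -> usize {
    strip_markdown_links(markdown).split_whitespace().count()
}
```

Under these assumptions, format_word_count(120, Some(80)) yields the breakdown form and format_word_count(120, None) yields the plain form used by the summary, toc and sections modes.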
2026-06-14 23:25:12 +02:00 · 2026-05-23 23:56:14 +02:00 · 2026-05-23 23:56:14 +02:00 · d5a3aa4bf9
commit d5a3aa4bf9
parent ade2a5143c
17 changed files with 519 additions and 7 deletions
--- a/crates/webclaw-cli/src/main.rs
+++ b/crates/webclaw-cli/src/main.rs
@@ -3032,6 +3032,8 @@ mod tests {
                image: None,
                favicon: None,
                word_count: markdown.split_whitespace().count(),
+                word_count_article: 0,
+                word_count_chrome: 0,
                http_status: None,
            },
            content: Content {