webclaw/crates/webclaw-core
devnen ade2a5143c feat(core): --mode sections for nav-URL discovery
Section-URL ambiguity is recurring friction — callers have to guess
whether to hit infobae.com root (LATAM frontpage) or /economia/ (AR-
specific live FX dashboard), or decrypt.co root (ticker ribbon) vs
/news/ (article list), or bbc.com/news/world vs /news/world/europe/.
Each guess costs a round-trip.

New `--mode sections` returns the discoverable section URLs parsed
from the page's nav in a single round-trip. Subsumes issue #16
(non-English nav is harder for an LLM to parse): sections come back
as data, not prose.

Detection is a multi-signal heuristic over the existing link
extraction: URL-pattern match (short /<category>/ style paths),
repetition (section links appear in both header and footer), and DOM
position when available. Fallback when zero sections are detected:
emit the top-N links with a "(none detected; first N shown)" note.
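
A rough sketch of how the signals above might combine, for
illustration only. The `Link` struct, the function names, the score
weights, and the segment/length thresholds are all assumptions made
for this example and are not taken from the webclaw-core source.

```rust
use std::collections::{HashMap, HashSet};

/// A link as it might come out of link extraction (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Link {
    pub label: String,
    /// URL path only, e.g. "/economia/".
    pub path: String,
    /// DOM-position signal; `None` when position is unavailable.
    pub in_nav: Option<bool>,
}

/// Signal 1: short "/<category>/" style path.
/// Assumed thresholds: at most 3 segments, each at most 24 chars, no '.'.
pub fn looks_like_section(path: &str) -> bool {
    let segs: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    (1..=3).contains(&segs.len())
        && segs.iter().all(|s| s.len() <= 24 && !s.contains('.'))
}

/// Returns the detected sections plus a flag that is `true` when nothing
/// was detected and the first `top_n` links were returned instead.
pub fn detect_sections(links: &[Link], top_n: usize) -> (Vec<Link>, bool) {
    // Signal 2: repetition (the same path linked more than once,
    // e.g. in header and footer).
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for l in links {
        *counts.entry(l.path.as_str()).or_insert(0) += 1;
    }

    let mut seen: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    for l in links {
        // Score each distinct path once, at its first occurrence.
        if !seen.insert(l.path.as_str()) {
            continue;
        }
        let mut score = 0;
        if looks_like_section(&l.path) {
            score += 2;
        }
        if counts[l.path.as_str()] >= 2 {
            score += 1;
        }
        // Signal 3: DOM position, counted only when it is known.
        if l.in_nav == Some(true) {
            score += 1;
        }
        // Assumed cutoff: the URL pattern plus at least one other signal.
        if score >= 3 {
            out.push(l.clone());
        }
    }

    if out.is_empty() {
        // Fallback: nothing detected, so hand back the first N links.
        return (links.iter().take(top_n).cloned().collect(), true);
    }
    (out, false)
}
```

With these assumed weights, a URL-pattern match alone is not enough;
a link also needs to repeat or sit in a nav region to count as a
section.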

Format: -f llm/text emits `Sections:` followed by `- [Label](url)`
list. -f json emits `{"sections": [{"label": "...", "url": "..."}]}`.
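
A minimal sketch of the two output shapes described above, using only
the standard library. The `Section` struct and the `render_text` /
`render_json` / `json_escape` names are hypothetical, and the exact
whitespace in the JSON is a guess based on the example in this
message. The real crate may well serialize differently.

```rust
/// One detected section. Field names follow the JSON shape above;
/// the struct itself is hypothetical.
pub struct Section {
    pub label: String,
    pub url: String,
}

/// Text form: a `Sections:` header, then one `- [Label](url)` line each.
pub fn render_text(sections: &[Section]) -> String {
    let mut out = String::from("Sections:\n");
    for s in sections {
        out.push_str(&format!("- [{}]({})\n", s.label, s.url));
    }
    out
}

/// Minimal JSON string escaping: quotes, backslashes, control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// JSON form: `{"sections": [{"label": "...", "url": "..."}]}`.
pub fn render_json(sections: &[Section]) -> String {
    let items: Vec<String> = sections
        .iter()
        .map(|s| {
            format!(
                "{{\"label\": \"{}\", \"url\": \"{}\"}}",
                json_escape(&s.label),
                json_escape(&s.url)
            )
        })
        .collect();
    format!("{{\"sections\": [{}]}}", items.join(", "))
}
```

Non-ASCII labels pass through unescaped, so a Spanish or other
non-English nav label reaches the caller unchanged as structured
data.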

13 new tests in webclaw-core (688 -> 701).
2026-05-23 23:14:40 +02:00
src feat(core): --mode sections for nav-URL discovery 2026-05-23 23:14:40 +02:00
testdata fix: prevent stack overflow on deeply nested HTML pages 2026-04-03 23:45:19 +02:00
Cargo.toml fix: harden resource limits, path safety, and WASM build (#46) 2026-05-19 17:03:52 +02:00