webclaw/crates
devnen ade2a5143c feat(core): --mode sections for nav-URL discovery
Section-URL ambiguity is recurring friction: callers have to guess
whether to hit the infobae.com root (LATAM frontpage) or /economia/
(AR-specific live FX dashboard), the decrypt.co root (ticker ribbon)
or /news/ (article list), or bbc.com/news/world vs
/news/world/europe/. Each wrong guess costs a round-trip.

New `--mode sections` returns the discoverable section URLs parsed
from the page's nav, in one round-trip. Subsumes issue #16 (non-English
nav is harder for an LLM to parse): sections come back as data, not
prose.

Detection is a multi-signal heuristic over the existing link
extraction: URL-pattern match (short /<category>/-style paths),
repetition (section links appear in both header and footer), and DOM
position when available. Fallback when zero sections are detected:
emit the top-N links with a "(none detected; first N shown)" note.
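
The heuristic described above can be sketched roughly as follows. This is
an illustration only, not the code in webclaw-core: the `Link` and
`Sections` types, the `detect_sections` function, the three-segment path
limit, the 24-character segment limit, and the rule "URL pattern plus at
least one other signal" are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical shape of a link produced by an existing extraction pass.
#[derive(Debug, Clone, PartialEq)]
pub struct Link {
    pub label: String,
    pub url: String,
    /// Some(true) if found inside nav/header/footer, None if position is unknown.
    pub in_nav: Option<bool>,
}

/// Result of detection; `fallback` is true when no section was detected.
#[derive(Debug, PartialEq)]
pub struct Sections {
    pub links: Vec<Link>,
    pub fallback: bool,
}

/// Return the path part of a URL, without scheme, host, query, or fragment.
fn path_of(url: &str) -> &str {
    let rest = match url.find("://") {
        Some(i) => &url[i + 3..],
        None => url,
    };
    let path = match rest.find('/') {
        Some(i) => &rest[i..],
        None => "/",
    };
    let end = path.find(&['?', '#'][..]).unwrap_or(path.len());
    &path[..end]
}

/// URL-pattern signal: one to three short, slug-like path segments.
fn looks_like_section(path: &str) -> bool {
    let segs: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    if segs.is_empty() || segs.len() > 3 {
        return false;
    }
    segs.iter().all(|s| {
        s.len() <= 24
            && s.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
            && !s.chars().all(|c| c.is_ascii_digit())
    })
}

/// Combine the signals; fall back to the first `top_n` links if nothing qualifies.
pub fn detect_sections(links: &[Link], top_n: usize) -> Sections {
    // Repetition signal: count how often each URL occurs on the page.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for l in links {
        *counts.entry(l.url.as_str()).or_insert(0) += 1;
    }

    let mut seen: Vec<&str> = Vec::new();
    let mut out: Vec<Link> = Vec::new();
    for l in links {
        let url = l.url.as_str();
        if seen.contains(&url) {
            continue; // keep only the first occurrence of each URL
        }
        seen.push(url);
        let pattern = looks_like_section(path_of(url));
        let repeated = counts.get(url).copied().unwrap_or(0) >= 2;
        // DOM-position signal: any occurrence of this URL sits in a nav region.
        let in_nav = links.iter().any(|o| o.url == l.url && o.in_nav == Some(true));
        if pattern && (repeated || in_nav) {
            out.push(l.clone());
        }
    }

    if out.is_empty() {
        let first: Vec<Link> = links.iter().take(top_n).cloned().collect();
        return Sections { links: first, fallback: true };
    }
    Sections { links: out, fallback: false }
}
```

In this sketch a caller would check `fallback` and, when it is set, attach
the "(none detected; first N shown)" note to the output. Requiring a second
signal on top of the URL pattern is one way to keep short but unrelated
links (for example a one-off /login/ link in the page body) out of the
result; the real weighting may differ.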

Format: -f llm/text emits `Sections:` followed by a `- [Label](url)`
list. -f json emits `{"sections": [{"label": "...", "url": "..."}]}`.
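
A minimal sketch of the two output shapes, to make the formats concrete.
The `Section` struct and the `render_text` / `render_json` / `json_escape`
helpers are hypothetical names, and the exact whitespace of the JSON is an
assumption copied from the example string above; the real crate may well
serialize through a JSON library instead of by hand.

```rust
/// Hypothetical section record: a human-readable label plus its URL.
pub struct Section {
    pub label: String,
    pub url: String,
}

/// Render the llm/text shape: a `Sections:` header, then one
/// `- [Label](url)` line per section.
pub fn render_text(sections: &[Section]) -> String {
    let mut out = String::from("Sections:\n");
    for s in sections {
        out.push_str(&format!("- [{}]({})\n", s.label, s.url));
    }
    out
}

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render the json shape: {"sections": [{"label": "...", "url": "..."}]}.
pub fn render_json(sections: &[Section]) -> String {
    let items: Vec<String> = sections
        .iter()
        .map(|s| {
            format!(
                "{{\"label\": \"{}\", \"url\": \"{}\"}}",
                json_escape(&s.label),
                json_escape(&s.url)
            )
        })
        .collect();
    format!("{{\"sections\": [{}]}}", items.join(", "))
}
```

Because labels are carried as plain strings in both shapes, a non-English
nav label passes through unchanged, which is the point of returning
sections as data rather than prose.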

13 new tests in webclaw-core (688 -> 701).
2026-05-23 23:14:40 +02:00
webclaw-cli feat(core): --mode sections for nav-URL discovery 2026-05-23 23:14:40 +02:00
webclaw-core feat(core): --mode sections for nav-URL discovery 2026-05-23 23:14:40 +02:00
webclaw-fetch feat(core): HTTP status header line in -f llm/text/json output 2026-05-23 21:29:26 +02:00
webclaw-llm fix: support LLM provider compatibility options 2026-05-06 11:36:53 +02:00
webclaw-mcp feat(core): HTTP status header line in -f llm/text/json output 2026-05-23 21:29:26 +02:00
webclaw-pdf Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
webclaw-server feat(core): HTTP status header line in -f llm/text/json output 2026-05-23 21:29:26 +02:00