webclaw/crates/webclaw-core/src
Valerio 8d29382b25 feat: extract __NEXT_DATA__ into structured_data
Next.js pages embed server-rendered data in <script id="__NEXT_DATA__">.
Now extracted as structured JSON (pageProps) in the structured_data field.

Tested on 45 sites — 13 return rich structured data including prices,
product info, and page state not visible in the DOM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:04:51 +02:00
..
llm Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
brand.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
data_island.rs feat: SvelteKit data extraction + license change to AGPL-3.0 2026-04-01 20:37:56 +02:00
diff.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
domain.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
error.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
extractor.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
js_eval.rs feat: v0.1.4 — QuickJS integration for inline JavaScript data extraction 2026-03-26 10:28:16 +01:00
lib.rs feat: extract __NEXT_DATA__ into structured_data 2026-04-02 16:04:51 +02:00
markdown.rs fix: v0.1.1 — MCP identity, timeouts, exit codes, URL validation 2026-03-24 17:25:05 +01:00
metadata.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
noise.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
structured_data.rs feat: extract __NEXT_DATA__ into structured_data 2026-04-02 16:04:51 +02:00
types.rs Initial release: webclaw v0.1.0 — web content extraction for LLMs 2026-03-23 18:31:11 +01:00
youtube.rs feat: v0.1.2 — TLS fallback, Safari default, Reddit fix, YouTube transcript infra 2026-03-25 18:50:07 +01:00