Mirror of https://github.com/0xMassi/webclaw.git (synced 2026-05-15 18:25:24 +02:00)
Embeds QuickJS (rquickjs) to execute inline `<script>` tags and extract data hidden in JavaScript variable assignments. Captures `window.__*` objects like `__preloadedData` (NYTimes), `__PRELOADED_STATE__` (Wired), and `self.__next_f` (Next.js RSC flight data).

Results:
- NYTimes: 1,552 → 4,162 words (+168%)
- Wired: 1,459 → 9,937 words (+580%)
- No noticeable performance overhead (<15 ms per page)
- Feature-gated: disable with `--no-default-features` for WASM

Smart text filtering rejects CSS, base64, file paths, and code strings. Only readable prose is appended under "## Additional Content".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

| Name |
|---|
| llm |
| brand.rs |
| data_island.rs |
| diff.rs |
| domain.rs |
| error.rs |
| extractor.rs |
| js_eval.rs |
| lib.rs |
| markdown.rs |
| metadata.rs |
| noise.rs |
| structured_data.rs |
| types.rs |
| youtube.rs |
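
The commit message above says that strings pulled out of JavaScript are filtered so that CSS, base64, file paths, and code strings are rejected and only readable prose is appended under "## Additional Content". The following is a minimal, std-only sketch of what such a filter could look like. It is not the code in `js_eval.rs`: the function names `looks_like_prose` and `append_additional_content`, and every threshold, are assumptions chosen for illustration.

```rust
/// Heuristic: returns true when `s` looks like readable prose rather than
/// CSS, base64, a file path, or a code string.
/// All thresholds here are illustrative assumptions, not webclaw's values.
pub fn looks_like_prose(s: &str) -> bool {
    let s = s.trim();
    let words: Vec<&str> = s.split_whitespace().collect();

    // Prose has several words; base64 blobs and paths are usually one token.
    // (Limitation: scripts written without spaces would be rejected here.)
    if words.len() < 5 {
        return false;
    }

    // Reject any very long unbroken token (base64, minified code, long URLs).
    if words.iter().any(|w| w.chars().count() > 40) {
        return false;
    }

    // Reject strings that are dense in code/CSS punctuation (more than 5%).
    let total = s.chars().count();
    let symbols = s
        .chars()
        .filter(|&c| {
            matches!(
                c,
                '{' | '}' | ';' | '=' | '<' | '>' | '(' | ')' | '[' | ']' | '\\' | '/' | '_' | '$' | '|'
            )
        })
        .count();
    if symbols * 20 > total {
        return false;
    }

    // Require that at least 80% of characters are letters or whitespace.
    let alpha = s
        .chars()
        .filter(|c| c.is_alphabetic() || c.is_whitespace())
        .count();
    alpha * 10 >= total * 8
}

/// Appends the candidates that pass the filter under a
/// "## Additional Content" heading; leaves `markdown` untouched otherwise.
pub fn append_additional_content(markdown: &mut String, candidates: &[&str]) {
    let kept: Vec<&str> = candidates
        .iter()
        .copied()
        .filter(|c| looks_like_prose(c))
        .collect();
    if kept.is_empty() {
        return;
    }
    markdown.push_str("\n\n## Additional Content\n");
    for paragraph in kept {
        markdown.push('\n');
        markdown.push_str(paragraph.trim());
        markdown.push('\n');
    }
}
```

Calling `append_additional_content(&mut md, &candidates)` with a mix of article sentences and CSS rules would keep only the sentences. Cheap character-count checks like these keep the filter fast, in keeping with the low per-page overhead the commit reports, though a real implementation may use different or additional signals.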