feat: v0.1.4 — QuickJS integration for inline JavaScript data extraction

Embeds QuickJS (rquickjs) to execute inline <script> tags and extract
data hidden in JavaScript variable assignments. Captures window.__*
objects like __preloadedData (NYTimes), __PRELOADED_STATE__ (Wired),
and self.__next_f (Next.js RSC flight data).

Results:
- NYTimes: 1,552 → 4,162 words (+168%)
- Wired: 1,459 → 9,937 words (+580%)
- Zero measurable performance overhead (<15ms per page)
- Feature-gated: disable with --no-default-features for WASM

Smart text filtering rejects CSS, base64, file paths, code strings.
Only readable prose is appended under "## Additional Content".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Valerio 2026-03-26 10:28:16 +01:00
parent 0c91c6d5a9
commit 32c035c543
6 changed files with 665 additions and 7 deletions

View file

@ -3,7 +3,7 @@ resolver = "2"
members = ["crates/*"]
[workspace.package]
version = "0.1.3"
version = "0.1.4"
edition = "2024"
license = "MIT"
repository = "https://github.com/0xMassi/webclaw"