Mirror of https://github.com/0xMassi/webclaw.git (synced 2026-04-25 00:06:21 +02:00)
feat: v0.1.4 — QuickJS integration for inline JavaScript data extraction

Embeds QuickJS (rquickjs) to execute inline <script> tags and extract data hidden in JavaScript variable assignments. Captures window.__* objects like __preloadedData (NYTimes), __PRELOADED_STATE__ (Wired), and self.__next_f (Next.js RSC flight data).

Results:
- NYTimes: 1,552 → 4,162 words (+168%)
- Wired: 1,459 → 9,937 words (+580%)
- Negligible performance overhead (<15ms per page)
- Feature-gated: disable with --no-default-features for WASM

Smart text filtering rejects CSS, base64, file paths, and code strings. Only readable prose is appended under "## Additional Content".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
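The filtering step described in the commit message can be illustrated with a small heuristic. The sketch below is a minimal, std-only Rust approximation that assumes candidate strings have already been pulled out of the JavaScript runtime. `looks_like_prose`, `append_additional_content`, and every threshold are hypothetical and are not taken from webclaw's source.

```rust
/// Hypothetical heuristic (not webclaw's code): returns true when a string
/// extracted from inline JavaScript reads like article prose.
/// Every threshold below is an illustrative assumption.
fn looks_like_prose(text: &str) -> bool {
    let t = text.trim();
    let words: Vec<&str> = t.split_whitespace().collect();

    // Prose has several words; base64 blobs, CSS rules and bare paths are
    // usually a single token.
    if words.len() < 4 {
        return false;
    }
    // A very long unbroken token suggests base64 or a data URI.
    if words.iter().any(|w| w.len() > 40) {
        return false;
    }
    // Braces are typical of CSS rules and code, rare in article text.
    if t.contains('{') || t.contains('}') {
        return false;
    }
    // Reject strings where at least half the tokens look like file paths.
    let path_like = words
        .iter()
        .filter(|w| w.starts_with('/') || w.starts_with("./") || w.contains('\\'))
        .count();
    if path_like * 2 >= words.len() {
        return false;
    }
    // Require at least 80% letters or whitespace (weeds out short code).
    let total = t.chars().count();
    let readable = t
        .chars()
        .filter(|c| c.is_alphabetic() || c.is_whitespace())
        .count();
    readable * 10 >= total * 8
}

/// Appends the candidates that pass the filter under an
/// "## Additional Content" heading; adds nothing if none pass.
fn append_additional_content(markdown: &mut String, candidates: &[&str]) {
    let kept: Vec<&str> = candidates
        .iter()
        .map(|s| s.trim())
        .filter(|s| looks_like_prose(s))
        .collect();
    if kept.is_empty() {
        return;
    }
    markdown.push_str("\n\n## Additional Content\n\n");
    markdown.push_str(&kept.join("\n\n"));
    markdown.push('\n');
}
```

In this sketch the single-token check already removes most base64 blobs, CSS rules, and asset paths; the letter-ratio test is what catches short code fragments such as `var a = 1; var b = 2; return a + b;`.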

Parent: 0c91c6d5a9
Commit: 32c035c543
6 changed files with 665 additions and 7 deletions
CHANGELOG.md: 11 lines changed
@@ -3,6 +3,17 @@

All notable changes to webclaw are documented here.
Format follows [Keep a Changelog](https://keepachangelog.com/).

## [0.1.4] — 2026-03-26

### Added
- QuickJS integration for extracting data from inline JavaScript (NYTimes +168%, Wired +580% more content)
- Executes inline `<script>` tags in a sandboxed runtime to capture `window.__*` data blobs
- Parses Next.js RSC flight data (`self.__next_f`) for App Router sites
- Smart text filtering rejects CSS, base64, file paths, and code — only keeps readable prose
- Feature-gated with `quickjs` feature flag (enabled by default, disable for WASM builds)
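webclaw reads `self.__next_f` by executing the page's scripts in QuickJS. For comparison, the same flight chunks can often be recovered by static scanning instead. The sketch below assumes the common `self.__next_f.push([1,"..."])` shape with a JSON-style escaped string; `next_f_chunks` and `decode_js_string` are hypothetical names, and this is a swapped-in technique, not the approach the commit takes.

```rust
/// Hypothetical static scanner (not webclaw's implementation): collects the
/// string payloads of `self.__next_f.push([1,"..."])` calls from raw script text.
fn next_f_chunks(script: &str) -> Vec<String> {
    const MARKER: &str = "self.__next_f.push([1,\"";
    let mut chunks = Vec::new();
    let mut rest = script;
    while let Some(pos) = rest.find(MARKER) {
        let body = &rest[pos + MARKER.len()..];
        match decode_js_string(body) {
            Some((text, consumed)) => {
                chunks.push(text);
                rest = &body[consumed..];
            }
            // Unterminated literal: stop rather than guess.
            None => break,
        }
    }
    chunks
}

/// Decodes a double-quoted, JSON-style string body whose opening quote has
/// already been consumed. Returns the decoded text and the number of bytes
/// consumed, including the closing quote.
fn decode_js_string(s: &str) -> Option<(String, usize)> {
    let mut out = String::new();
    let mut chars = s.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            '"' => return Some((out, i + 1)),
            '\\' => {
                let (_, esc) = chars.next()?;
                match esc {
                    'n' => out.push('\n'),
                    't' => out.push('\t'),
                    'r' => out.push('\r'),
                    'b' => out.push('\u{8}'),
                    'f' => out.push('\u{c}'),
                    'u' => {
                        // Read exactly four hex digits.
                        let mut code = 0u32;
                        for _ in 0..4 {
                            let (_, h) = chars.next()?;
                            code = code * 16 + h.to_digit(16)?;
                        }
                        // Simplification: surrogate halves are not paired,
                        // they become U+FFFD.
                        out.push(char::from_u32(code).unwrap_or('\u{FFFD}'));
                    }
                    // \" \\ \/ map to themselves; other escapes pass through.
                    other => out.push(other),
                }
            }
            _ => out.push(c),
        }
    }
    None
}
```

Because a single flight row can be split across several `push` calls, the decoded chunks would typically be concatenated before any further parsing.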

---

## [0.1.3] — 2026-03-25

### Added