Mirror of https://github.com/0xMassi/webclaw.git (synced 2026-06-08 22:25:12 +02:00)
Adds etsy_listing and hardens two existing extractors with HTML fallbacks
so transient API failures still return useful data.
New:
- etsy_listing: /listing/{id}(/slug) with Schema.org Product JSON-LD +
OG fallback. Antibot-gated, routes through cloud::smart_fetch_html
like amazon_product and ebay_listing. Auto-dispatched (etsy host is
unique).
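
A minimal sketch of how the `/listing/{id}(/slug)` path shape could be matched, using only the standard library. The function name `etsy_listing_id` and the strict all-digits rule are assumptions for illustration, not the extractor's actual code; locale-prefixed paths are not handled here.

```rust
/// Hypothetical helper (not from the webclaw source): pull the numeric
/// listing id out of a path shaped like /listing/{id} or /listing/{id}/{slug}.
fn etsy_listing_id(path: &str) -> Option<u64> {
    // Split on '/' and skip empty segments so leading/trailing slashes are tolerated.
    let mut segments = path.split('/').filter(|s| !s.is_empty());

    // First segment must be the literal "listing".
    if segments.next()? != "listing" {
        return None;
    }

    // Second segment is the id; require ASCII digits only
    // (u64::from_str alone would also accept a leading '+').
    let id = segments.next()?;
    if !id.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }

    // An optional slug may follow, but nothing after it.
    let _slug = segments.next();
    if segments.next().is_some() {
        return None;
    }

    id.parse::<u64>().ok()
}
```

With this sketch, `/listing/123456789` and `/listing/123456789/handmade-mug` both yield the id, while `/shop/SomeShop` or a non-numeric id yields `None`.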
Hardened:
- substack_post: when /api/v1/posts/{slug} returns non-200 (rate limit,
403 on hardened custom domains, 5xx), fall back to HTML fetch and
parse OG tags + Article JSON-LD. Response shape is stable across
both paths, with a `data_source` field of "api" or "html_fallback".
- youtube_video: when ytInitialPlayerResponse is missing (EU-consent
interstitial, age-gated, some live pre-shows), fall back to OG tags
for title/description/thumbnail. `data_source` is now "player_response"
or "og_fallback".
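
The fallback pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the extractors' real code: `extract_title` and `og_content` are hypothetical names, the API result is modelled as `Result<String, u16>` (title or HTTP status), and the OG lookup is a naive string search that expects `property` before `content` inside the same tag rather than a real HTML parser.

```rust
/// Naive, illustrative Open Graph lookup: finds `property="<name>"` and reads
/// the `content="..."` attribute that follows it within the same tag.
fn og_content(html: &str, property: &str) -> Option<String> {
    let marker = format!("property=\"{}\"", property);
    let start = html.find(&marker)?;
    let rest = &html[start + marker.len()..];

    // Restrict the search to the remainder of the current tag.
    let tag_end = rest.find('>')?;
    let tag = &rest[..tag_end];

    let key = "content=\"";
    let value_start = tag.find(key)? + key.len();
    let value_len = tag[value_start..].find('"')?;
    Some(tag[value_start..value_start + value_len].to_string())
}

/// Hypothetical fallback wrapper: use the primary (API) result when it
/// succeeded, otherwise fetch HTML lazily and read the OG title.
/// Returns the title together with a `data_source` label.
fn extract_title<F>(api_result: Result<String, u16>, fetch_html: F) -> Option<(String, &'static str)>
where
    F: FnOnce() -> Option<String>,
{
    match api_result {
        Ok(title) => Some((title, "api")),
        // Any non-200 status (rate limit, 403, 5xx) takes the HTML path.
        Err(_status) => {
            let html = fetch_html()?;
            og_content(&html, "og:title").map(|t| (t, "html_fallback"))
        }
    }
}
```

Passing the HTML fetch as a closure means the second request is only made when the primary path has already failed, and the caller sees the same return shape either way.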
Tests: 91 passing in webclaw-fetch (9 new), clippy clean.
| Name |
|---|
| src |
| tests |
| Cargo.toml |