feat(proxy): integrate Scrapling for enhanced web scraping capabilities

- Replaced Playwright with Scrapling's fetchers in the web crawling and YouTube processing modules for improved performance and flexibility.
- Updated proxy configuration to support dynamic proxy selection via environment variables.
- Enhanced logging to track performance metrics during web scraping operations.
- Refactored related modules to utilize the new proxy utilities and streamline the scraping process.
This commit is contained in:
DESKTOP-RTLN3BA\$punk 2026-06-09 00:15:10 -07:00
parent 41a93ca8fb
commit 640ef5f15d
16 changed files with 5770 additions and 4886 deletions

View file

@ -277,9 +277,13 @@ TURNSTILE_ENABLED=FALSE
TURNSTILE_SECRET_KEY=
# Proxy provider selection. Selects a ProxyProvider implementation registered in
# app/utils/proxy/registry.py. Default: "anonymous_proxies". Add new vendors there.
# PROXY_PROVIDER=anonymous_proxies
# Residential Proxy Configuration (anonymous-proxies.net)
# Used for web crawling, link previews, and YouTube transcript fetching to avoid IP bans.
# Leave commented out to disable proxying.
# Consumed by the "anonymous_proxies" provider. Leave commented out to disable proxying.
# RESIDENTIAL_PROXY_USERNAME=your_proxy_username
# RESIDENTIAL_PROXY_PASSWORD=your_proxy_password
# RESIDENTIAL_PROXY_HOSTNAME=rotating.dnsproxifier.com:31230