webclaw/crates/webclaw-fetch/src/lib.rs
devnen fe6e9b5d28 feat(fetch): periodic progress stderr line on slow fetches
Webclaw's default -t timeout is 30s; slow sites previously sat
silently with no feedback. Now during a fetch, every 10s of elapsed
time webclaw writes one line to stderr:

  # webclaw: still fetching <URL> (Ns)

Fetches completing in under 10s emit nothing (the timer never fires).
Stdout output is untouched - pure feedback signal on stderr.

No timeout change. No new flags. Default behavior is augmented at
stderr only.

Implemented via tokio::select! between the fetch future and a
tokio::time::interval. Latency cost: a single tokio task spawn
and a 10s tick - microseconds on the fast path.

10 new tests in webclaw-fetch::progress::tests (none ignored; the
slow-future test uses a 50ms test interval to keep cargo test fast).
Workspace total 710 -> 720.

(cherry picked from commit 06f065cb08)
2026-06-09 10:47:52 +02:00

31 lines
1,010 B
Rust

//! webclaw-fetch: HTTP client layer with browser TLS fingerprint impersonation.
//! Uses wreq (BoringSSL) for browser-grade TLS + HTTP/2 fingerprinting.
//! Automatically detects PDF responses and delegates to webclaw-pdf.
pub mod browser;
pub mod client;
pub mod cloud;
pub mod crawler;
pub mod document;
pub mod error;
pub mod extractors;
pub mod fetcher;
pub mod linkedin;
pub mod locale;
pub mod progress;
pub mod proxy;
pub mod reddit;
pub mod sitemap;
pub mod tls;
pub mod url_security;
pub use browser::BrowserProfile;
pub use client::{BatchExtractResult, BatchResult, FetchClient, FetchConfig, FetchResult};
pub use crawler::{CrawlConfig, CrawlResult, CrawlState, Crawler, PageResult};
pub use error::FetchError;
pub use fetcher::Fetcher;
pub use http::HeaderMap;
pub use locale::{accept_language_for_tld, accept_language_for_url};
pub use progress::{with_progress, PROGRESS_INTERVAL};
pub use proxy::{parse_proxy_file, parse_proxy_line};
pub use sitemap::SitemapEntry;
pub use webclaw_pdf::PdfMode;