mirror of
https://github.com/0xMassi/webclaw.git
synced 2026-04-25 00:06:21 +02:00
Adds `webclaw_fetch::Fetcher` trait. All 28 vertical extractors now
take `client: &dyn Fetcher` instead of `client: &FetchClient` directly.
Backwards-compatible: FetchClient implements Fetcher, blanket impls
cover `&T` and `Arc<T>`, so existing CLI / MCP / self-hosted-server
callers keep working unchanged.
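
The backwards-compatibility claim above can be sketched as follows. This is a simplified illustration, not the crate's code: the trait here is synchronous and has a single hypothetical `fetch` method with a `String` error type, whereas the real `Fetcher` is async and its method set is not shown in this message. The bodies of `FetchClient` and `TlsSidecarFetcher` are stand-ins.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for the real trait (assumption:
// one synchronous method; the real trait is async).
trait Fetcher {
    fn fetch(&self, url: &str) -> Result<String, String>;
}

// Blanket impl: a shared reference to any Fetcher is itself a Fetcher.
impl<T: Fetcher + ?Sized> Fetcher for &T {
    fn fetch(&self, url: &str) -> Result<String, String> {
        (**self).fetch(url)
    }
}

// Blanket impl: an Arc around any Fetcher is itself a Fetcher.
impl<T: Fetcher + ?Sized> Fetcher for Arc<T> {
    fn fetch(&self, url: &str) -> Result<String, String> {
        (**self).fetch(url)
    }
}

// Stand-in for the in-process client (no real HTTP here).
struct FetchClient;

impl Fetcher for FetchClient {
    fn fetch(&self, url: &str) -> Result<String, String> {
        Ok(format!("direct:{url}"))
    }
}

// Stand-in for a production implementation that would delegate to a sidecar.
struct TlsSidecarFetcher;

impl Fetcher for TlsSidecarFetcher {
    fn fetch(&self, url: &str) -> Result<String, String> {
        Ok(format!("sidecar:{url}"))
    }
}

// An extractor-style function: it only sees the trait object.
fn extract(client: &dyn Fetcher, url: &str) -> Result<String, String> {
    client.fetch(url)
}

// Exercises the three call shapes: &T, &Arc<T>, and a different implementor.
fn demo() -> Vec<Result<String, String>> {
    let client = FetchClient;
    let shared = Arc::new(FetchClient);
    vec![
        extract(&client, "u"),
        extract(&shared, "u"),
        extract(&TlsSidecarFetcher, "u"),
    ]
}
```

With this shape, a caller holding a `FetchClient` or an `Arc<FetchClient>` can pass `&client` or `&shared` without any change at the call site, which is the property the blanket impls are meant to provide.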
Motivation: the production API server (api.webclaw.io) must not do
in-process TLS fingerprinting; it delegates all HTTP to the Go
tls-sidecar. Before this trait, exposing /v1/scrape/{vertical} on
production would have required importing wreq into the server's
dep graph, violating the CLAUDE.md rule. Now production can provide
its own TlsSidecarFetcher implementation and pass it to the same
dispatcher the OSS server uses.
Changes:
- New `crates/webclaw-fetch/src/fetcher.rs` defining the trait plus
  blanket impls for `&T` and `Arc<T>`.
- `FetchClient` gains a tiny impl block in client.rs that forwards to
  its existing public methods.
- All 28 extractor signatures migrated from `&FetchClient` to
  `&dyn Fetcher` (sed-driven bulk rewrite, no semantic change).
- `cloud::smart_fetch` and `cloud::smart_fetch_html` take `&dyn Fetcher`.
- `extractors::dispatch_by_url` and `extractors::dispatch_by_name`
  take `&dyn Fetcher`.
- `async-trait 0.1` added to webclaw-fetch deps (Rust 1.75+ has
  native async-fn-in-trait but dyn dispatch still needs async_trait).
- Version bumped to 0.5.1, CHANGELOG updated.
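
The `async-trait` note above can be illustrated with a std-only sketch. A trait that declares a native `async fn` cannot currently be used as `dyn Trait`, so the method has to return a boxed future instead; writing that by hand looks roughly like what `async_trait` generates. The `fetch` method, its `String` error type, `StubFetcher`, and the toy `block_on` are all assumptions made for illustration, not part of webclaw.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A heap-allocated, type-erased future; this is what makes the trait usable as `dyn`.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// Hypothetical single-method version of the trait, written without the macro.
trait Fetcher: Send + Sync {
    fn fetch<'a>(&'a self, url: &'a str) -> BoxFuture<'a, Result<String, String>>;
}

// Hypothetical implementor that performs no I/O.
struct StubFetcher;

impl Fetcher for StubFetcher {
    fn fetch<'a>(&'a self, url: &'a str) -> BoxFuture<'a, Result<String, String>> {
        Box::pin(async move { Ok::<String, String>(format!("stub:{url}")) })
    }
}

// Extractor-style async function taking the trait object.
async fn extract(client: &dyn Fetcher, url: &str) -> Result<String, String> {
    client.fetch(url).await
}

// Waker that does nothing; enough for futures that complete without pending.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Toy executor: polls in a loop. Only suitable for this demo, where the
// future is ready on the first poll; a real program would use a runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The boxed return type costs one allocation per call, which is usually negligible next to a network request.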
Tests: 215 passing in webclaw-fetch (no new tests needed: the existing
extractor tests exercise the trait methods transparently).
Clippy: clean workspace-wide.
26 lines
832 B
Rust
//! webclaw-fetch: HTTP client layer with browser TLS fingerprint impersonation.
//! Uses wreq (BoringSSL) for browser-grade TLS + HTTP/2 fingerprinting.
//! Automatically detects PDF responses and delegates to webclaw-pdf.
pub mod browser;
pub mod client;
pub mod cloud;
pub mod crawler;
pub mod document;
pub mod error;
pub mod extractors;
pub mod fetcher;
pub mod linkedin;
pub mod proxy;
pub mod reddit;
pub mod sitemap;
pub mod tls;

pub use browser::BrowserProfile;
pub use client::{BatchExtractResult, BatchResult, FetchClient, FetchConfig, FetchResult};
pub use crawler::{CrawlConfig, CrawlResult, CrawlState, Crawler, PageResult};
pub use error::FetchError;
pub use fetcher::Fetcher;
pub use http::HeaderMap;
pub use proxy::{parse_proxy_file, parse_proxy_line};
pub use sitemap::SitemapEntry;
pub use webclaw_pdf::PdfMode;