mirror of
https://github.com/0xMassi/webclaw.git
synced 2026-06-19 01:58:06 +02:00
Extraction ~22% faster on the corpus benchmark with byte-identical output:
- hoist recompiled CSS selectors in the markdown noise path
- single-pass shared og() meta parsing across vertical extractors
- output-safe QuickJS gating (skip the JS VM when no candidate data) + reuse the already-parsed document instead of re-parsing
- wreq connect_timeout + connection-pool tuning; dedup the retry loop

Reliability + correctness:
- char-boundary-safe truncation of LLM error bodies (shared helper)
- HTTP connect/read timeouts on all LLM provider clients
- isolate pdf-extract behind catch_unwind + spawn_blocking
- OSS server: crawl inherits the shared fetch profile; ProviderChain built once in AppState; request TimeoutLayer

API / safety / docs:
- #[non_exhaustive] on public enums + result structs (+ builders)
- #![forbid(unsafe_code)] on pure crates, deny on llm
- //! crate docs + doctests; scrub bypass/vendor/target specifics from public crate docs and comments

Tooling: [profile.release] lto/codegen-units/strip, MSRV pin, deny.toml + cargo-deny CI, macOS test matrix. CLI main.rs split into focused modules.
42 lines
1.3 KiB
Rust
//! webclaw-llm: LLM integration with a local-first hybrid architecture.
//!
//! The provider chain tries Ollama (local) first, then falls back to OpenAI, then Anthropic.
//! Provides schema-based extraction, prompt extraction, and summarization
//! on top of webclaw-core's content pipeline.
//!
//! ```no_run
//! use webclaw_llm::{ProviderChain, LlmProvider, CompletionRequest, Message};
//!
//! # async fn run() -> Result<(), webclaw_llm::LlmError> {
//! // Builds Ollama -> OpenAI -> Anthropic, including only configured providers.
//! let chain = ProviderChain::default().await;
//!
//! let request = CompletionRequest {
//!     model: String::new(), // empty = each provider's default model
//!     messages: vec![Message { role: "user".into(), content: "Hello".into() }],
//!     temperature: None,
//!     max_tokens: None,
//!     json_mode: false,
//! };
//!
//! let answer = chain.complete(&request).await?;
//! println!("{answer}");
//! # Ok(())
//! # }
//! ```
#![deny(unsafe_code)]

pub mod chain;
pub mod clean;
pub mod error;
pub mod extract;
pub mod provider;
pub mod providers;
pub mod summarize;
#[cfg(test)]
pub(crate) mod testing;

pub use chain::ProviderChain;
pub use clean::strip_thinking_tags;
pub use error::LlmError;
pub use provider::{CompletionRequest, LlmProvider, Message};