webclaw/crates/webclaw-server/Cargo.toml
webclaw 02302e7a1d perf(core): hot-path extraction speedups + senior-grade hardening
Extraction ~22% faster on the corpus benchmark with byte-identical output:
- hoist recompiled CSS selectors in the markdown noise path
- single-pass shared og() meta parsing across vertical extractors
- output-safe QuickJS gating (skip the JS VM when no candidate data) +
  reuse the already-parsed document instead of re-parsing
- wreq connect_timeout + connection-pool tuning; dedup the retry loop

Reliability + correctness:
- char-boundary-safe truncation of LLM error bodies (shared helper)
- HTTP connect/read timeouts on all LLM provider clients
- isolate pdf-extract behind catch_unwind + spawn_blocking
- OSS server: crawl inherits the shared fetch profile; ProviderChain built
  once in AppState; request TimeoutLayer

API / safety / docs:
- #[non_exhaustive] on public enums + result structs (+ builders)
- #![forbid(unsafe_code)] on pure crates, deny on llm
- //! crate docs + doctests; scrub bypass/vendor/target specifics from
  public crate docs and comments

Tooling: [profile.release] lto/codegen-units/strip, MSRV pin, deny.toml +
cargo-deny CI, macOS test matrix. CLI main.rs split into focused modules.
2026-06-04 20:22:00 +02:00

39 lines
1.4 KiB
TOML

[package]
name = "webclaw-server"
version.workspace = true
edition.workspace = true
rust-version.workspace = true
license.workspace = true
repository.workspace = true
description = "Minimal REST API server for self-hosting webclaw extraction. Wraps the OSS extraction crates with HTTP endpoints. NOT the production hosted API at api.webclaw.io — this is a stateless, single-binary reference server for local + self-hosted deployments."
[lints]
workspace = true
[[bin]]
name = "webclaw-server"
path = "src/main.rs"
[dependencies]
webclaw-core = { workspace = true }
webclaw-fetch = { workspace = true }
webclaw-llm = { workspace = true }
webclaw-pdf = { workspace = true }
axum = { version = "0.8", features = ["macros"] }
tokio = { workspace = true }
tower-http = { version = "0.6", features = ["trace", "cors", "timeout"] }
clap = { workspace = true, features = ["derive", "env"] }
serde = { workspace = true }
serde_json = { workspace = true }
tracing = { workspace = true }
tracing-subscriber = { workspace = true, features = ["env-filter"] }
anyhow = "1"
thiserror = { workspace = true }
subtle = "2.6"
[dev-dependencies]
# `ServiceExt::oneshot` drives the router in-process for hermetic handler
# tests (no TCP listener, no network).
tower = { version = "0.5", features = ["util"] }
http-body-util = "0.1"