mirror of https://github.com/0xMassi/webclaw.git
synced 2026-06-29 03:39:37 +02:00
docs(benchmarks): reproducible 3-way comparison vs trafilatura + firecrawl (#25)
Replaces the previous benchmarks/README.md, which claimed specific numbers (94.2% accuracy, 0.8ms extraction, 97% Cloudflare bypass, etc.) with no reproducing code committed to the repo. The `webclaw-bench` crate and the `benchmarks/fixtures` and `benchmarks/ground-truth` directories it referenced never existed. This is what #18 was calling out.

New benchmarks/ is fully reproducible. Every number ships with the script that produced it, and `./benchmarks/run.sh` regenerates everything.

Results (18 sites, 90 hand-curated facts, median of 3 runs, webclaw 0.3.18, cl100k_base tokenizer):

    tool         reduction_mean  fidelity       latency_mean
    webclaw      92.5%           76/90 (84.4%)  0.41s
    firecrawl    92.4%           70/90 (77.8%)  0.99s
    trafilatura  97.8%           45/90 (50.0%)  0.21s

webclaw matches or beats both competitors on fidelity on all 18 sites while running 2.4x faster than Firecrawl's hosted API.

Includes:

- README.md: headline table + per-site breakdown
- methodology.md: tokenizer, fact selection, run rationale
- sites.txt: 18 canonical URLs
- facts.json: 90 curated facts (PRs welcome to add sites)
- scripts/bench.py: the runner
- results/2026-04-17.json: today's raw data, median of 3 runs
- run.sh: one-command reproduction

Closes #18

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
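The headline columns reduce to a few simple ratios. The sketch below illustrates them and is not the code in scripts/bench.py. It rests on several assumptions:

- Reduction is taken to be 1 - (extracted tokens / raw page tokens).
- A fact counts as found when it appears as a case-insensitive substring of the extracted text.
- Each per-site number is the median of the repeated runs.

All function names are hypothetical. A whitespace tokenizer stands in for tiktoken's cl100k_base so the example runs without extra dependencies.

```python
from __future__ import annotations

from statistics import median
from typing import Callable, Iterable, Sequence


def whitespace_tokens(text: str) -> int:
    """Stand-in token counter (assumption): the benchmark itself uses
    tiktoken's cl100k_base encoding; splitting on whitespace keeps this
    sketch dependency-free."""
    return len(text.split())


def token_reduction(
    raw_html: str,
    extracted: str,
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> float:
    """Fraction of the raw page's tokens removed by extraction
    (hypothetical definition: 1 - extracted / raw)."""
    raw = count_tokens(raw_html)
    if raw == 0:
        return 0.0  # nothing to reduce; avoid dividing by zero
    return 1.0 - count_tokens(extracted) / raw


def facts_found(extracted: str, facts: Iterable[str]) -> int:
    """Count curated facts present in the output using a case-insensitive
    substring match (an assumption; the real rule is whatever
    scripts/bench.py implements)."""
    haystack = extracted.lower()
    return sum(1 for fact in facts if fact.lower() in haystack)


def median_of_runs(samples: Sequence[float]) -> float:
    """Collapse repeated runs (3 per site in the benchmark) into one number."""
    return median(samples)


def fidelity_pct(found: int, total: int) -> float:
    """Facts found as a percentage, rounded to one decimal like the table."""
    return round(100 * found / total, 1)


if __name__ == "__main__":
    # Re-derive the headline ratios from the published raw counts.
    print(fidelity_pct(76, 90))   # 84.4 (webclaw)
    print(fidelity_pct(70, 90))   # 77.8 (firecrawl)
    print(fidelity_pct(45, 90))   # 50.0 (trafilatura)
    print(round(0.99 / 0.41, 1))  # 2.4 (Firecrawl latency / webclaw latency)
```

To count real cl100k_base tokens, pass a counter built from tiktoken as `count_tokens`, for example `enc = tiktoken.get_encoding("cl100k_base")` followed by `lambda s: len(enc.encode(s))`. Keeping the tokenizer pluggable lets the same reduction formula be checked against any encoding.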
parent 0463b5e263
commit e27ee1f86f
7 changed files with 934 additions and 118 deletions
benchmarks/run.sh (new executable file, +27)
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
# Reproduce the webclaw benchmark.
# Requires: python3, tiktoken, trafilatura. Optional: firecrawl-py + FIRECRAWL_API_KEY.

set -euo pipefail
cd "$(dirname "$0")"

# Build webclaw if not present
if [ ! -x "../target/release/webclaw" ]; then
    echo "→ building webclaw..."
    (cd .. && cargo build --release)
fi

# Install python deps if missing
missing=""
python3 -c "import tiktoken" 2>/dev/null || missing+=" tiktoken"
python3 -c "import trafilatura" 2>/dev/null || missing+=" trafilatura"
if [ -n "${FIRECRAWL_API_KEY:-}" ]; then
    python3 -c "import firecrawl" 2>/dev/null || missing+=" firecrawl-py"
fi
if [ -n "$missing" ]; then
    echo "→ installing python deps:$missing"
    python3 -m pip install --quiet $missing
fi

# Run
python3 scripts/bench.py
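The dependency step in run.sh probes each module by trying to import it. It maps import names to pip package names (`firecrawl` is installed as `firecrawl-py`) and checks Firecrawl only when `FIRECRAWL_API_KEY` is non-empty. A rough Python equivalent of that logic might look like the following. It is offered as an illustration, not as part of this commit, and `REQUIRED`, `OPTIONAL_FIRECRAWL` and `missing_packages` are hypothetical names.

```python
from __future__ import annotations

import importlib.util
import os
from typing import Mapping

# Import name -> pip package name (hypothetical constants mirroring run.sh).
# The two differ for Firecrawl: the module is `firecrawl`, the package is
# `firecrawl-py`.
REQUIRED = {"tiktoken": "tiktoken", "trafilatura": "trafilatura"}
OPTIONAL_FIRECRAWL = {"firecrawl": "firecrawl-py"}


def is_importable(module: str) -> bool:
    """True if Python can locate the top-level module (it is not executed)."""
    return importlib.util.find_spec(module) is not None


def missing_packages(env: Mapping[str, str] = os.environ) -> list[str]:
    """pip package names to install, in the same order run.sh checks them.

    Like `[ -n "${FIRECRAWL_API_KEY:-}" ]`, an unset or empty key skips
    the Firecrawl check entirely."""
    wanted = dict(REQUIRED)
    if env.get("FIRECRAWL_API_KEY"):
        wanted.update(OPTIONAL_FIRECRAWL)
    return [pkg for module, pkg in wanted.items() if not is_importable(module)]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", " ".join(missing) if missing else "(none)")
```

One design difference is worth noting. `find_spec` only locates a module and does not run it. A package that is installed but fails at import time therefore passes this check, whereas the `python3 -c "import X"` probe in run.sh would report it as missing.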