mirror of
https://github.com/0xMassi/webclaw.git
synced 2026-04-25 00:06:21 +02:00
Replaces the previous benchmarks/README.md, which claimed specific numbers (94.2% accuracy, 0.8ms extraction, 97% Cloudflare bypass, etc.) with no reproducing code committed to the repo. The `webclaw-bench` crate and `benchmarks/fixtures`, `benchmarks/ground-truth` directories it referenced never existed. This is what #18 was calling out.

New benchmarks/ is fully reproducible. Every number ships with the script that produced it. `./benchmarks/run.sh` regenerates everything.

Results (18 sites, 90 hand-curated facts, median of 3 runs, webclaw 0.3.18, cl100k_base tokenizer):

  tool          reduction_mean   fidelity         latency_mean
  webclaw       92.5%            76/90 (84.4%)    0.41s
  firecrawl     92.4%            70/90 (77.8%)    0.99s
  trafilatura   97.8%            45/90 (50.0%)    0.21s

webclaw matches or beats both competitors on fidelity on all 18 sites while running 2.4x faster than Firecrawl's hosted API.

Includes:
- README.md: headline table + per-site breakdown
- methodology.md: tokenizer, fact selection, run rationale
- sites.txt: 18 canonical URLs
- facts.json: 90 curated facts (PRs welcome to add sites)
- scripts/bench.py: the runner
- results/2026-04-17.json: today's raw data, median of 3 runs
- run.sh: one-command reproduction

Closes #18

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
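The three columns in the results table can be illustrated with a minimal sketch. This is not the code in scripts/bench.py: the function names are hypothetical, and the case-insensitive substring match used for fidelity is an assumption, since the commit message does not say how a fact is judged present. Token counts are passed in as plain integers so the sketch does not depend on a tokenizer library.

```python
from statistics import median


def fidelity(extracted: str, facts: list[str]) -> int:
    """Count the facts that appear in the extracted text.

    Assumption: a fact counts as present if it occurs as a
    case-insensitive substring. The real runner may match differently.
    """
    haystack = extracted.lower()
    return sum(1 for fact in facts if fact.lower() in haystack)


def reduction(raw_tokens: int, extracted_tokens: int) -> float:
    """Percentage of tokens removed relative to the raw page."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return 100.0 * (1 - extracted_tokens / raw_tokens)


def median_latency(runs: list[float]) -> float:
    """Median of the per-run latencies, in seconds."""
    return median(runs)


if __name__ == "__main__":
    # Made-up sample text and numbers, for illustration only.
    text = "Stripe powers Hertz and Instacart with 99.999% uptime"
    stripe_facts = ["Hertz", "URBN", "Instacart", "99.999", "1.9"]
    print(fidelity(text, stripe_facts), "of", len(stripe_facts), "facts found")
    print(f"{reduction(1000, 75):.1f}% token reduction")
    print(f"{median_latency([0.44, 0.41, 0.39]):.2f}s median latency")
```

Taking the median rather than the mean of the three runs keeps a single slow network round trip from skewing the latency figure.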
23 lines
2.4 KiB
JSON
{
  "_comment": "Hand-curated 'visible facts' per site. Inspected from live pages on 2026-04-17. PRs welcome to add sites or adjust facts — keep facts specific (customer names, headline stats, product names), not generic words.",
  "facts": {
    "https://openai.com": ["ChatGPT", "Sora", "API", "Enterprise", "research"],
    "https://vercel.com": ["Next.js", "Hobby", "Pro", "Enterprise", "deploy"],
    "https://anthropic.com": ["Opus", "Claude", "Glasswing", "Perseverance", "NASA"],
    "https://www.notion.com": ["agents", "Forbes", "Figma", "Ramp", "Cursor"],
    "https://stripe.com": ["Hertz", "URBN", "Instacart", "99.999", "1.9"],
    "https://tavily.com": ["search", "extract", "crawl", "research", "developers"],
    "https://www.shopify.com": ["Plus", "merchants", "retail", "brands", "checkout"],
    "https://docs.python.org/3/": ["tutorial", "library", "reference", "setup", "distribution"],
    "https://react.dev": ["Components", "JSX", "Hooks", "Learn", "Reference"],
    "https://tailwindcss.com/docs/installation": ["Vite", "PostCSS", "CLI", "install", "Next.js"],
    "https://nextjs.org/docs": ["App Router", "Pages Router", "getting-started", "deploying", "Server"],
    "https://github.com": ["Copilot", "Actions", "millions", "developers", "Enterprise"],
    "https://en.wikipedia.org/wiki/Rust_(programming_language)": ["Graydon", "Mozilla", "borrow", "Cargo", "2015"],
    "https://simonwillison.net/2026/Mar/15/latent-reasoning/": ["latent", "reasoning", "Willison", "model", "Simon"],
    "https://paulgraham.com/essays.html": ["Graham", "essay", "startup", "Lisp", "founders"],
    "https://techcrunch.com": ["TechCrunch", "startup", "news", "events", "latest"],
    "https://www.databricks.com": ["Lakehouse", "platform", "data", "MLflow", "AI"],
    "https://www.hashicorp.com": ["Terraform", "Vault", "Consul", "infrastructure", "enterprise"]
  }
}
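Since the file invites PRs that add sites, a small structural check may be useful before merging one. The sketch below is hypothetical and not part of the repo. It assumes the layout shown in this file (a top-level "facts" object mapping each URL to a list of strings) and treats "five facts per site" as a convention inferred from the data (18 sites x 5 = 90), not a documented rule.

```python
import json


def count_facts(doc: dict, per_site: int = 5) -> int:
    """Validate the facts mapping and return the total number of facts.

    Assumption: every site carries exactly `per_site` distinct facts.
    """
    total = 0
    for url, items in doc["facts"].items():
        if not url.startswith("https://"):
            raise ValueError(f"not an https URL: {url}")
        if len(items) != per_site:
            raise ValueError(f"{url}: expected {per_site} facts, got {len(items)}")
        if len(set(items)) != len(items):
            raise ValueError(f"{url}: duplicate facts")
        total += len(items)
    return total


# Two entries copied from the file, embedded so the sketch runs on its own.
SAMPLE = json.loads("""
{
  "facts": {
    "https://stripe.com": ["Hertz", "URBN", "Instacart", "99.999", "1.9"],
    "https://react.dev": ["Components", "JSX", "Hooks", "Learn", "Reference"]
  }
}
""")

if __name__ == "__main__":
    print(count_facts(SAMPLE))
```

Run against the full file, the same function would be expected to return 90 if every site keeps five distinct facts.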