
# Benchmarks
Reproducible benchmarks comparing `webclaw` against open-source and commercial
web extraction tools. Every number here ships with the script that produced it.
Run `./run.sh` to regenerate.
## Headline
**webclaw preserves more page content than any other tool tested, at 2.4× the
speed of the closest competitor.**
Measured across 18 production sites (SPAs, documentation, long-form articles,
news, and enterprise marketing), 3 runs per site; token counts use OpenAI's
`cl100k_base` tokenizer. Last run: 2026-04-17, webclaw v0.3.18.
| Tool | Fidelity (facts preserved) | Token reduction vs raw HTML | Mean latency |
|---|---:|---:|---:|
| **webclaw `--format llm`** | **76 / 90 (84.4 %)** | 92.5 % | **0.41 s** |
| Firecrawl API (v2, hosted) | 70 / 90 (77.8 %) | 92.4 % | 0.99 s |
| Trafilatura 2.0 | 45 / 90 (50.0 %) | 97.8 % (by dropping content) | 0.21 s |
**webclaw matches or beats both competitors on fidelity on all 18 sites.**
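The headline percentages reduce to simple arithmetic over the per-site table
below. A minimal sketch (function names are illustrative, not taken from
`run.sh`):

```python
def fidelity(preserved: int, total: int) -> float:
    """Share of hand-curated facts a tool kept, as a percentage."""
    return 100.0 * preserved / total

def token_reduction(tool_tokens: int, raw_tokens: int) -> float:
    """How much smaller the tool's output is than the raw fetched HTML."""
    return 100.0 * (1 - tool_tokens / raw_tokens)

print(round(fidelity(76, 90), 1))  # → 84.4, the webclaw headline fidelity
```

Note that token reduction alone is easy to game: dropping content pushes it
toward 100 %, which is why fidelity is reported alongside it.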
## Why webclaw wins
- **Speed.** 2.4× faster than Firecrawl's hosted API. Firecrawl defaults to
  browser rendering for everything; webclaw's in-process TLS-fingerprinted
  fetch plus deterministic extractor delivers comparable or better content
  without that overhead.
- **Fidelity.** Trafilatura's higher token reduction comes from dropping
content. On the 18 sites tested it missed 45 of 90 key facts — entire
customer-story sections, release dates, product names. webclaw keeps them.
- **Deterministic.** Same URL → same output. No LLM post-processing, no
paraphrasing, no hallucination risk.
## Per-site results
Numbers are median of 3 runs. `raw` = raw fetched HTML token count.
`facts` = hand-curated visible facts preserved out of 5 per site.
| Site | raw HTML | webclaw | Firecrawl | Trafilatura | wc facts | fc facts | tr facts |
|---|---:|---:|---:|---:|:---:|:---:|:---:|
| openai.com | 170 K | 1,238 | 3,139 | 0 | **3/5** | 2/5 | 0/5 |
| vercel.com | 380 K | 1,076 | 4,029 | 585 | **3/5** | 3/5 | 3/5 |
| anthropic.com | 103 K | 672 | 560 | 96 | **5/5** | 5/5 | 4/5 |
| notion.com | 109 K | 13,416 | 5,261 | 91 | **5/5** | 5/5 | 2/5 |
| stripe.com | 243 K | 81,974 | 8,922 | 2,418 | **5/5** | 5/5 | 0/5 |
| tavily.com | 30 K | 1,361 | 1,969 | 182 | **5/5** | 4/5 | 3/5 |
| shopify.com | 184 K | 1,939 | 5,384 | 595 | **3/5** | 3/5 | 3/5 |
| docs.python.org | 5 K | 689 | 1,623 | 347 | **4/5** | 4/5 | 4/5 |
| react.dev | 107 K | 3,332 | 4,959 | 763 | **5/5** | 5/5 | 3/5 |
| tailwindcss.com/docs/installation | 113 K | 779 | 813 | 430 | **4/5** | 4/5 | 2/5 |
| nextjs.org/docs | 228 K | 968 | 885 | 631 | **4/5** | 4/5 | 4/5 |
| github.com | 234 K | 1,438 | 3,058 | 486 | **5/5** | 4/5 | 3/5 |
| en.wikipedia.org/wiki/Rust | 189 K | 47,823 | 59,326 | 37,427 | **5/5** | 5/5 | 5/5 |
| simonwillison.net/…/latent-reasoning | 3 K | 724 | 525 | 0 | **4/5** | 2/5 | 0/5 |
| paulgraham.com/essays.html | 2 K | 169 | 295 | 0 | **2/5** | 1/5 | 0/5 |
| techcrunch.com | 143 K | 7,265 | 11,408 | 397 | **5/5** | 5/5 | 5/5 |
| databricks.com | 274 K | 2,001 | 5,471 | 311 | **4/5** | 4/5 | 4/5 |
| hashicorp.com | 109 K | 1,501 | 4,289 | 0 | **5/5** | 5/5 | 0/5 |
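Each cell above is the median of 3 runs. With three samples the median is
simply the middle value, so one cold-cache or slow-network run cannot move the
reported figure. A sketch with hypothetical latencies:

```python
from statistics import median

# Three runs of one site; the last hit a cold cache (hypothetical numbers).
latencies = [0.39, 0.41, 1.73]
print(median(latencies))  # → 0.41 — the outlier is discarded, not averaged in
```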
## Reproducing this benchmark
```bash
cd benchmarks/
./run.sh
```
Requirements:
- Python 3.9+
- `pip install tiktoken trafilatura firecrawl-py`
- `webclaw` release binary at `../target/release/webclaw` (or set `$WEBCLAW`)
- Firecrawl API key (free tier: 500 credits/month, enough for many runs) —
export as `FIRECRAWL_API_KEY`. If omitted, the benchmark runs with webclaw
and Trafilatura only.
One run of the full suite burns roughly 60 Firecrawl credits (18 sites × 3
runs = 54 scrapes, at 1 credit per Firecrawl scrape).
## Methodology
See [methodology.md](methodology.md) for:
- Tokenizer rationale (`cl100k_base` → covers GPT-4 / GPT-3.5 /
`text-embedding-3-*`)
- Fact selection procedure and how to propose additions
- Why median of 3 runs (CDN / cache / network noise)
- Raw data schema (`results/*.json`)
- Notes on site churn (news aggregators, release pages)
## Raw data
Per-run results are committed as JSON at `results/YYYY-MM-DD.json` so the
history of measurements is auditable. Diff two runs to see regressions or
improvements across webclaw versions.
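One way to diff two committed runs in a few lines — the dict shape here is a
guessed-at minimal site → token-count mapping for illustration, not the actual
`results/*.json` schema (see [methodology.md](methodology.md) for that):

```python
import json
from pathlib import Path

def load_tokens(path: str) -> dict[str, int]:
    """Assumed minimal shape: {"openai.com": 1238, ...}."""
    return json.loads(Path(path).read_text())

def diff_runs(old: dict[str, int], new: dict[str, int]) -> dict[str, int]:
    """Token-count delta per site present in both runs; negative = output shrank."""
    return {site: new[site] - old[site] for site in old.keys() & new.keys()}

# Inline data standing in for two loaded result files:
old = {"openai.com": 1238, "vercel.com": 1076}
new = {"openai.com": 1180, "vercel.com": 1300}
print(diff_runs(old, new))
```

A large positive delta usually means an extractor started keeping boilerplate;
a large negative one can mean either a better extractor or dropped content, so
check the facts column before celebrating.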