"""IAI-MCP benchmark harness.

Phase-1 benchmarks:
- bench.tokens   -- (steady <=3000) + (fresh <=8000)
- bench.verbatim -- (verbatim recall >=99% on pinned records)

Both runners are invokable as CLIs (`python -m bench.tokens`, `python -m bench.verbatim`)
and exit non-zero on failure. They fall back to a heuristic token count when
ANTHROPIC_API_KEY is absent so CI (and first-time users) can run the suite offline.
"""