demo/README.md: the complete self-serve demo artifact — one-command run, the
seeded scenario explained, a "build your own scenario" section, the honest
boundary (won't invent a cause; can't reach a cause that was never recorded),
the Nature citation + the "field admits this is unsolved" sources, and the
recording playbook + paste-ready caption.
Writing/testing the README surfaced a real inconsistency, now fixed:
- The CLI's failure-finder used a hardcoded content-only marker subset and
ignored tags, so a "Checkout latency spiked" memory (regression tag, no crash
word in content) was never picked as the failure. The CLI now calls the SAME
public `looks_like_failure` (content + tags, full list) the backfill tool uses
— one definition, no drift.
- Extended FAILURE_MARKERS with performance/degradation failures (spiked,
latency, degraded, slow, hang, throttled, oom, 502/503/504, flaky, ...) so the
feature backfills from perf regressions, not just hard crashes.
clippy clean; 527 core + 453 mcp tests; both the main demo and the README's
custom scenario verified end-to-end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>