A separate, investor-grade script (vs the viral clip): leads with the thesis
"the entire AI-memory industry is trapped in a category error — a root cause
never looks like the bug it creates", proves it live with the similarity-vs-
Postdict contrast, then closes on the moat (faithful Nature port + incumbents'
architecture IS the category error) and the market (every agent that touches
production).
Verified the exact on-screen commands end-to-end: the 3-memory scenario (cause
+ billing-500 lookalike + crash) makes the contrast devastating — similarity
search returns the billing lookalike as its #1, Postdict reaches back 3 days to
the real env-var cause. (Without the lookalike distractor the contrast collapses
— similarity would also surface the cause — so the script plants all three.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
demo/README.md: the complete self-serve demo artifact — one-command run, the
seeded scenario explained, a "build your own scenario" section, the honest
boundary (won't invent a cause; can't reach a cause that was never recorded),
the Nature citation + the "field admits this is unsolved" sources, and the
recording playbook + paste-ready caption.
Writing/testing the README surfaced a real inconsistency, now fixed:
- The CLI's failure-finder used a hardcoded content-only marker subset and
ignored tags, so a "Checkout latency spiked" memory (regression tag, no crash
word in content) was never picked as the failure. The CLI now calls the SAME
public `looks_like_failure` (content + tags, full list) the backfill tool uses
— one definition, no drift.
- Extended FAILURE_MARKERS with performance/degradation failures (spiked,
latency, degraded, slow, hang, throttled, oom, 502/503/504, flaky, ...) so the
feature backfills from perf regressions, not just hard crashes.
clippy clean; 527 core + 453 mcp tests; both the main demo and the README's
custom scenario verified end-to-end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
3-model rotation audit (DeepSeek V4-Pro / Kimi K2.7 / MiniMax M3, max thinking,
each model × each of 3 sections). Claude verified every finding against code.
CONFIRMED + FIXED:
- [FTS, consensus DeepSeek+MiniMax] sanitize_fts5_or_query split on
!is_alphanumeric()+'_', but the index uses tokenize='porter ascii' which
splits on '_' and non-ASCII. So "API_TIMEOUT"/"café" became single phrases that
could NEVER match. Now splits on !is_ascii_alphanumeric() + lowercases to mirror
the tokenizer; caps token count (64) and length (64) for DoS hardening. Also
fixes the pre-existing storage.search bug (multi-word queries silently returned
nothing). 5 new tests pin it.
- [Demo honesty, consensus Kimi+DeepSeek] the contrast labeled keyword search as
"SIMILARITY SEARCH" and asserted "NONE of these is the cause" universally. Now
prints the REAL engine ("keyword (BM25)" vs "semantic (vector + BM25 hybrid)")
and claims only what's true ("ranked by RESEMBLANCE; its top hit is a lookalike").
De-hardcoded the "Service crashed:" munging to a generic label-strip.
VERIFIED FALSE POSITIVE (not changed): MiniMax "fts.id non-existent column" —
the FTS5 table is declared `fts5(id, content, tags, ...)`, the JOIN is valid.
No injection found by any model (quote-doubling + operator-stripping confirmed safe).
clippy clean; 527 core + 453 mcp tests pass; demo verified.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>