docs(demo): full run-it-yourself README + unify failure detection

demo/README.md: the complete self-serve demo artifact — one-command run, the seeded scenario explained, a "build your own scenario" section, the honest boundary (won't invent a cause; can't reach a cause that was never recorded), the Nature citation + the "field admits this is unsolved" sources, and the recording playbook + paste-ready caption. Writing/testing the README surfaced a real inconsistency, now fixed: - The CLI's failure-finder used a hardcoded content-only marker subset and ignored tags, so a "Checkout latency spiked" memory (regression tag, no crash word in content) was never picked as the failure. The CLI now calls the SAME public `looks_like_failure` (content + tags, full list) the backfill tool uses — one definition, no drift. - Extended FAILURE_MARKERS with performance/degradation failures (spiked, latency, degraded, slow, hang, throttled, oom, 502/503/504, flaky, ...) so the feature backfills from perf regressions, not just hard crashes. clippy clean; 527 core + 453 mcp tests; both the main demo and the README's custom scenario verified end-to-end. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-02 22:01:01 +02:00 · 2026-06-27 18:12:09 -05:00 · 2026-06-27 18:12:09 -05:00 · 561b2301db
commit 561b2301db
parent 988a31c207
4 changed files with 171 additions and 8 deletions
--- a/crates/vestige-core/src/advanced/retroactive_backfill.rs
+++ b/crates/vestige-core/src/advanced/retroactive_backfill.rs
@ -53,6 +53,9 @@ pub const FAILURE_MARKERS: &[&str] = &[
    "error", "bug", "crash", "crashed", "regression", "broke", "broken",
    "failure", "failed", "panic", "exception", "fault", "outage", "incident",
    "500", "timeout", "deadlock", "leak", "corrupt", "stack overflow",
+    // performance/degradation failures (an agent should backfill from these too)
+    "spiked", "latency", "degraded", "slow", "hang", "hung", "throttled",
+    "oom", "502", "503", "504", "rejected", "denied", "flaky",
 ];

 /// How strongly to promote the backfilled cause: multiply its stability by this
--- a/crates/vestige-mcp/src/bin/cli.rs
+++ b/crates/vestige-mcp/src/bin/cli.rs
@ -2610,18 +2610,17 @@ fn run_backfill(
    }

    // Resolve the failure text up front (used by the contrast baseline).
+    // Use the SAME failure detector the backfill tool uses (content + tags, full
+    // marker list) so the CLI's pick and the tool's pick never diverge.
    let failure_text: Option<String> = match &failure_id {
        Some(id) => storage.get_node(id).ok().flatten().map(|n| n.content),
        None => storage
            .get_all_nodes(500, 0)
            .ok()
            .and_then(|nodes| {
-                nodes.into_iter().find(|n| {
-                    let hay = n.content.to_lowercase();
-                    ["error", "crash", "500", "failed", "panic", "regression", "bug"]
-                        .iter()
-                        .any(|m| hay.contains(m))
-                })
+                nodes
+                    .into_iter()
+                    .find(vestige_mcp::tools::backfill::looks_like_failure)
            })
            .map(|n| n.content),
    };
--- a/crates/vestige-mcp/src/tools/backfill.rs
+++ b/crates/vestige-mcp/src/tools/backfill.rs
@ -91,8 +91,10 @@ fn extract_entities(node: &KnowledgeNode) -> Vec<String> {
    set.into_iter().collect()
 }

-/// Heuristic: does this memory read like a failure/"aversive event"?
-fn looks_like_failure(node: &KnowledgeNode) -> bool {
+/// Heuristic: does this memory read like a failure/"aversive event"? Checks both
+/// content AND tags against the full FAILURE_MARKERS list. Public so the CLI and
+/// any caller share ONE failure-detection definition (no drifting subsets).
+pub fn looks_like_failure(node: &KnowledgeNode) -> bool {
    let hay = node.content.to_lowercase();
    vestige_core::advanced::retroactive_backfill::FAILURE_MARKERS
        .iter()
--- a/demo/README.md
+++ b/demo/README.md
@ -0,0 +1,159 @@
+# Postdict — memory with hindsight
+
+> Every other AI memory finds what your bug **looks like**.
+> This one finds what **caused** it — even when the cause was days ago
+> and looks nothing like the crash.
+
+When your agent hits a failure **today**, Postdict reaches **backward in time**
+and surfaces the quiet earlier change that caused it — the root cause a vector /
+semantic search **structurally cannot** find, because it isn't *similar* to the
+error, only *causally upstream* of it.
+
+It's a faithful port of a 2024 *Nature* neuroscience result: the brain links a
+later salient event to an earlier quiet memory **backward in time** ("fear links
+retrospectively, not prospectively"). Root cause is always in the past — so the
+backward-only direction isn't a metaphor, it's the correct behavior.
+
+---
+
+## ▶️ Run the demo (30 seconds, one command)
+
+```sh
+# from the repo root
+cargo build -p vestige-mcp --bin vestige --release   # first time only (~1 min)
+./demo/postdict-demo.sh
+```
+
+That's it. It uses a **fresh throwaway database** (a temp dir, deleted on exit) —
+it touches nothing else on your machine. No API keys, no network, fully local.
+
+**Pacing:** the script pauses ~1.4s between beats for a clean screen-recording.
+- `PAUSE=0 ./demo/postdict-demo.sh` — instant (no pauses)
+- `PAUSE=2 ./demo/postdict-demo.sh` — slower, more dramatic
+
+---
+
+## What you'll see
+
+The demo plants three memories into an empty store, then asks one question:
+
+| When | Memory | Note |
+|---|---|---|
+| **3 days ago** | `Set API_TIMEOUT=2 in the deploy env to speed up cold starts` | the quiet cause. boring. forgotten. |
+| **20 days ago** | `A 500 Internal Server Error happened in the billing service` | a lookalike — *resembles* today's crash |
+| **today** 💥 | `Service crashed: 500 Internal Server Error on the auth endpoint` | the failure |
+
+Then it runs the same question through both engines, side by side:
+
+```
+── 1. SIMILARITY SEARCH · keyword (BM25) ──
+   1. A 500 Internal Server Error happened in the billing service   ← top match
+   → ranked by RESEMBLANCE. its top hit is a lookalike, not the cause.
+
+── 2. POSTDICT (reach backward for the CAUSE) ──
+   #1 Set API_TIMEOUT=2 in the deploy env to speed up cold starts
+      ↩ reached back 3.0 days before the failure
+      🔗 causal join: api_timeout
+      ✅ promoted — it will resurface next time
+```
+
+**Similarity search confidently returns the lookalike.** It's wrong.
+**Postdict reaches back 3 days and finds the real cause** — by the shared
+`API_TIMEOUT` entity, backward in time. Then it promotes that memory so it
+stops decaying and surfaces next time.
+
+> The label says exactly which engine ran (`keyword (BM25)` here; it becomes
+> `semantic (vector + BM25 hybrid)` once embeddings are generated). No
+> sleight of hand — it's the real search every other memory tool does.
+
+---
+
+## Try your own scenario (the "it's not staged" proof)
+
+Nothing here is hardcoded. Build any history and ask:
+
+```sh
+DB=$(mktemp -d)/db
+BIN=./target/release/vestige
+
+# plant a quiet cause N days ago (--ago-days backdates it)
+$BIN --data-dir "$DB" ingest "Disabled the checkout cache while debugging" \
+    --tags checkout --node-type decision --ago-days 4
+
+# record a failure today that shares an entity (here: 'checkout')
+$BIN --data-dir "$DB" ingest "Checkout latency spiked to 9s after deploy" \
+    --tags checkout,latency,regression --node-type event
+
+# reach backward for the cause, with the similarity contrast
+$BIN --data-dir "$DB" backfill --contrast
+```
+
+The cause and the failure share the `checkout` entity but are **not textually
+similar** — so semantic search misses it and Postdict finds it.
+
+`vestige backfill --help` for all flags (`--manual`, `--lookback-days`,
+`--failure-id`, `--no-promote`).
+
+---
+
+## The honest boundary (read this — it's the point)
+
+- **If the upstream change was never recorded, nothing can reach it.** Postdict
+  reaches back through *memory*, not magic. No memory of the cause → no backfill.
+- **It links by shared entities** (same file / env var / service / symbol),
+  backward in time — *not* by semantic similarity. That's deliberate: similarity
+  is exactly the blind spot every other memory already has.
+- **It won't invent a cause.** No shared entity between the failure and an
+  earlier memory → no link. It would rather say nothing than fabricate.
+- The "promote" step boosts the cause's retention so it resurfaces; it never
+  deletes or rewrites anything (bi-temporal — old memories stay queryable).
+
+---
+
+## How it works (for the skeptics)
+
+1. **Trigger.** A memory lands that reads like a failure (high surprise +
+   failure markers like `error`/`crash`/`500`), or you mark one manually.
+2. **Backward reach.** Postdict scans memories *older* than the failure that
+   share an entity with it (the causal join), within a lookback window.
+3. **Rank by cause, not resemblance.** Candidates are scored by shared-entity
+   strength and *dissimilarity* — the less a candidate resembles the failure,
+   the more valuable it is, because that's precisely what a vector search can't
+   surface.
+4. **Promote.** The surfaced cause's FSRS retention is boosted so it stops
+   decaying and is there next time.
+
+The mechanism, tests, and the *Nature* citation live in
+[`crates/vestige-core/src/advanced/retroactive_backfill.rs`](../crates/vestige-core/src/advanced/retroactive_backfill.rs).
+The field itself admits this is unsolved: causal + temporal retrieval is
+"largely unexplored" (mem0, *State of AI Agent Memory 2026*), and frontier
+models fail at cloud root-cause analysis ([arXiv:2602.09937](https://arxiv.org/abs/2602.09937)).
+This is the first memory that does it.
+
+---
+
+## Recording the demo (for a clean clip)
+
+1. Make your terminal large, dark theme, ~16pt mono font. Clear scrollback.
+2. macOS: `Cmd-Shift-5` → record a tight region around the terminal (or the
+   whole window). QuickTime works too.
+3. Run `./demo/postdict-demo.sh` (default pacing) — or `PAUSE=2` for a slower,
+   more cinematic take.
+4. The single "hold here" frame: when **POSTDICT** resolves the `#1` cause with
+   `↩ reached back 3.0 days` — that's the money shot. Let it sit.
+5. Trim to ~30s. Muted + captions plays best in feeds.
+
+**Caption / hook (ready to paste):**
+
+> Your crash is today. The cause was an env-var edit 3 days ago — it looks
+> *nothing* like the error, so vector search will never surface it. Same query,
+> split screen: similarity search returns the lookalike, Postdict reaches back
+> and finds the cause. Seed's in the repo — run it yourself.
+
+**Pinned honest-boundary reply:** *"Where it doesn't work: if the upstream change
+was never recorded, nothing can reach it. Everything in the clip is in the
+seeded repo."*
+
+---
+
+*Local-first. No API keys. No data leaves your machine.*