# Postdict: memory with hindsight

> Every other AI memory finds what your bug **looks like**.
> This one finds what **caused** it, even when the cause was days ago
> and looks nothing like the crash.

When your agent hits a failure **today**, Postdict reaches **backward in time**
and surfaces the quiet earlier change that caused it: the root cause a vector /
semantic search **structurally cannot** find, because it isn't *similar* to the
error, only *causally upstream* of it.

It's a faithful port of a 2024 *Nature* neuroscience result: the brain links a
later salient event to an earlier quiet memory **backward in time** ("fear links
retrospectively, not prospectively"). Root cause is always in the past, so the
backward-only direction isn't a metaphor; it's the correct behavior.

---

## ▶️ Run the demo (30 seconds, one command)

```sh
# from the repo root
cargo build -p vestige-mcp --bin vestige --release   # first time only (~1 min)
./demo/postdict-demo.sh
```

That's it. It uses a **fresh throwaway database** (a temp dir, deleted on exit), so
it touches nothing else on your machine. No API keys, no network, fully local.

**Pacing:** the script pauses ~1.4s between beats for a clean screen recording.
- `PAUSE=0 ./demo/postdict-demo.sh`: instant (no pauses)
- `PAUSE=2 ./demo/postdict-demo.sh`: slower, more dramatic

---

## What you'll see

The demo plants three memories into an empty store:

| When | Memory | Note |
|---|---|---|
| **3 days ago** | `Set API_TIMEOUT=2 in the deploy env to speed up cold starts` | the quiet cause. boring. forgotten. |
| **20 days ago** | `A 500 Internal Server Error happened in the billing service` | a lookalike that *resembles* today's crash |
| **today** 💥 | `Service crashed: 500 Internal Server Error on the auth endpoint` | the failure |

Then it asks one question and runs it through both engines, side by side:

```
── 1. SIMILARITY SEARCH · keyword (BM25) ──
1. A 500 Internal Server Error happened in the billing service ← top match
→ ranked by RESEMBLANCE. its top hit is a lookalike, not the cause.

── 2. POSTDICT (reach backward for the CAUSE) ──
#1 Set API_TIMEOUT=2 in the deploy env to speed up cold starts
↩ reached back 3.0 days before the failure
🔗 causal join: api_timeout
✅ promoted — it will resurface next time
```

**Similarity search confidently returns the lookalike.** It's wrong.
**Postdict reaches back 3 days and finds the real cause**, via the shared
`API_TIMEOUT` entity, backward in time. Then it promotes that memory so it
stops decaying and surfaces next time.

> The label says exactly which engine ran (`keyword (BM25)` here; it becomes
> `semantic (vector + BM25 hybrid)` once embeddings are generated). No
> sleight of hand: it's the real search every other memory tool does.

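
To see why a resemblance-based ranker prefers the lookalike, here is a minimal sketch that scores the three demo memories by plain token overlap (Jaccard). This is a stand-in chosen for brevity, not the BM25 or hybrid search the demo actually runs, and the `tokens` and `resemblance` helpers are hypothetical names that do not exist in the crate. Under this toy measure the billing lookalike overlaps the failure on six tokens while the `API_TIMEOUT` change overlaps on one, so any ranking by resemblance puts the lookalike first.

```rust
use std::collections::HashSet;

/// Lowercase a memory and split it into a set of word tokens.
/// Underscores are kept so `API_TIMEOUT` stays a single token.
/// (Toy tokenizer for illustration only.)
fn tokens(text: &str) -> HashSet<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Jaccard overlap between two memories: |A ∩ B| / |A ∪ B|.
/// Returns 0.0 when both texts are empty.
fn resemblance(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokens(a), tokens(b));
    let shared = ta.intersection(&tb).count() as f64;
    let total = ta.union(&tb).count() as f64;
    if total == 0.0 {
        0.0
    } else {
        shared / total
    }
}
```
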
---

## Try your own scenario (the "it's not staged" proof)

Nothing here is hardcoded. Build any history and ask:

```sh
DB="$(mktemp -d)/db"
BIN=./target/release/vestige

# plant a quiet cause N days ago (--ago-days backdates it)
"$BIN" --data-dir "$DB" ingest "Disabled the checkout cache while debugging" \
  --tags checkout --node-type decision --ago-days 4

# record a failure today that shares an entity (here: 'checkout')
"$BIN" --data-dir "$DB" ingest "Checkout latency spiked to 9s after deploy" \
  --tags checkout,latency,regression --node-type event

# reach backward for the cause, with the similarity contrast
"$BIN" --data-dir "$DB" backfill --contrast
```

The cause and the failure share the `checkout` entity but are **not textually
similar**, so semantic search misses the cause and Postdict finds it.

Run `vestige backfill --help` for all flags (`--manual`, `--lookback-days`,
`--failure-id`, `--no-promote`).

---

## The honest boundary (read this; it's the point)

- **If the upstream change was never recorded, nothing can reach it.** Postdict
  reaches back through *memory*, not magic. No memory of the cause → no backfill.
- **It links by shared entities** (same file / env var / service / symbol),
  backward in time, *not* by semantic similarity. That's deliberate: similarity
  is exactly the blind spot every other memory already has.
- **It won't invent a cause.** No shared entity between the failure and an
  earlier memory → no link. It would rather say nothing than fabricate.
- The "promote" step boosts the cause's retention so it resurfaces; it never
  deletes or rewrites anything (bi-temporal: old memories stay queryable).

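
The boundary above can be sketched as a filter that simply returns nothing when no earlier memory shares an entity with the failure. This is an illustration under assumptions, not the crate's code: the `Memory` struct, the `memory` constructor, and `backward_candidates` are hypothetical names, time is reduced to a plain day number, and entities are assumed to be already extracted into a set. A memory that is newer than the failure, outside the lookback window, or entity-disjoint is never returned, so an empty result is the expected answer when the cause was not recorded.

```rust
use std::collections::HashSet;

/// Hypothetical memory record for this sketch (not the crate's real type).
struct Memory {
    text: String,
    /// Absolute day number; larger means more recent.
    day: f64,
    /// Entities attached to the memory (file, env var, service, symbol).
    entities: HashSet<String>,
}

/// Convenience constructor for the sketch.
fn memory(text: &str, day: f64, entities: &[&str]) -> Memory {
    Memory {
        text: text.to_string(),
        day,
        entities: entities.iter().map(|e| e.to_string()).collect(),
    }
}

/// Earlier memories that share at least one entity with the failure,
/// inside the lookback window. Empty when nothing qualifies.
fn backward_candidates<'a>(
    failure: &Memory,
    store: &'a [Memory],
    lookback_days: f64,
) -> Vec<&'a Memory> {
    store
        .iter()
        .filter(|m| m.day < failure.day) // backward in time only
        .filter(|m| failure.day - m.day <= lookback_days) // inside the window
        .filter(|m| !m.entities.is_disjoint(&failure.entities)) // shared entity
        .collect()
}
```
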
---

## How it works (for the skeptics)

1. **Trigger.** A memory lands that reads like a failure (high surprise +
   failure markers like `error`/`crash`/`500`), or you mark one manually.
2. **Backward reach.** Postdict scans memories *older* than the failure that
   share an entity with it (the causal join), within a lookback window.
3. **Rank by cause, not resemblance.** Candidates are scored by shared-entity
   strength and *dissimilarity*: the less a candidate resembles the failure,
   the more valuable it is, because that's precisely what a vector search can't
   surface.
4. **Promote.** The surfaced cause's FSRS retention is boosted so it stops
   decaying and is there next time.

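
Steps 1, 3, and 4 above can be sketched in a few lines. Everything here is an assumption made for illustration: the marker list is a short sample rather than the crate's full list, the surprise signal from step 1 is left out, the score is taken to be shared-entity count times dissimilarity (the text only says both factors are used, not how they combine), and "boosting FSRS retention" is modeled as multiplying a stability value in days. The names `MARKERS`, `reads_like_failure`, `cause_score`, and `promote` are hypothetical.

```rust
/// Sample failure markers (illustrative subset, not the crate's list).
const MARKERS: &[&str] = &["error", "crash", "500", "regression", "latency"];

/// Step 1: does a memory read like a failure?
/// Checks both the content and the tags, case-insensitively.
fn reads_like_failure(content: &str, tags: &[&str]) -> bool {
    let content = content.to_lowercase();
    MARKERS.iter().any(|m| {
        content.contains(*m) || tags.iter().any(|t| t.to_lowercase().contains(*m))
    })
}

/// Step 3: rank by cause, not resemblance.
/// More shared entities raise the score; more resemblance lowers it.
/// `similarity` is expected in [0, 1] and is clamped to that range.
fn cause_score(shared_entities: usize, similarity: f64) -> f64 {
    shared_entities as f64 * (1.0 - similarity.clamp(0.0, 1.0))
}

/// Step 4: promote the surfaced cause by boosting its stability (in days).
/// A boost below 1.0 is ignored, so promotion never weakens a memory.
fn promote(stability_days: f64, boost: f64) -> f64 {
    stability_days * boost.max(1.0)
}
```

With equal entity overlap, a candidate that barely resembles the failure (similarity 0.05) scores higher than one that closely resembles it (similarity 0.43), which is the inversion of a similarity ranking that step 3 describes.
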
The mechanism, tests, and the *Nature* citation live in
[`crates/vestige-core/src/advanced/retroactive_backfill.rs`](../crates/vestige-core/src/advanced/retroactive_backfill.rs).
The field itself admits this is unsolved: causal + temporal retrieval is
"largely unexplored" (mem0, *State of AI Agent Memory 2026*), and frontier
models fail at cloud root-cause analysis ([arXiv:2602.09937](https://arxiv.org/abs/2602.09937)).
To our knowledge, Postdict is the first agent memory that does it.

---

## Recording the demo (for a clean clip)

1. Use a large terminal window, a dark theme, and a ~16pt monospace font. Clear scrollback.
2. macOS: `Cmd-Shift-5` → record a tight region around the terminal (or the
   whole window). QuickTime works too.
3. Run `./demo/postdict-demo.sh` (default pacing), or prefix it with `PAUSE=2`
   for a slower, more cinematic take.
4. The single "hold here" frame: when **POSTDICT** resolves the `#1` cause with
   `↩ reached back 3.0 days`. That's the money shot. Let it sit.
5. Trim to ~30s. Muted with captions plays best in feeds.

**Caption / hook (ready to paste):**

> Your crash is today. The cause was an env-var edit 3 days ago. It looks
> *nothing* like the error, so vector search will never surface it. Same query,
> split screen: similarity search returns the lookalike, Postdict reaches back
> and finds the cause. Seed's in the repo, so run it yourself.

**Pinned honest-boundary reply:** *"Where it doesn't work: if the upstream change
was never recorded, nothing can reach it. Everything in the clip is in the
seeded repo."*

---

*Local-first. No API keys. No data leaves your machine.*