Vestige
Your bug was born days before it crashed — you just can't remember where.
Vestige is a local-first memory for AI agents that reaches backward through time to find the quiet change that caused today's failure — the cause that looks nothing like the bug. One 23MB Rust binary. No cloud. Your data never leaves your machine.
⚡ Quick Start · 🧠 The Idea · 🔬 The Science · 🛠 13 Tools · 📊 Dashboard
👋 Why I built this
Hi — I'm Sam. I built Vestige from a tiny apartment in Chicago because I kept losing days to the same thing, and I bet you have too.
Production breaks. You start hunting. And the cause is almost never near the error — it's some quiet change you made days ago that looks nothing like the crash it eventually caused. A flipped env var. A swapped service. A config tweak you'd already forgotten.
Here's the part that took me a while to see: almost every AI memory tool is built on vector search, and vector search hunts for what looks like your problem. But a root cause never looks like the bug it creates. In soccer terms, they all stare at the goal line, while the real failure was a quiet midfield turnover fifteen minutes earlier.
I wanted a memory that traces the match backward.
So that's what Vestige is. Everyone else built a memory that remembers. I tried to build the first one that realizes — it gates what's worth keeping, lets the noise fade like your own memory does, and when a failure hits, it reaches back through time to the change that actually caused it.
It's one Rust binary. It runs entirely on your machine. It never phones home. And there's a 60-second start right below.
🎙️ The 60-second version of this whole story, the one I give in person, lives in demo/PITCH-v2-causebench.md. If you've got a minute, read that first. It's the clearest way to get why this matters.
⚡ Get it running in 60 seconds
```bash
npm install -g vestige-mcp-server@latest   # one binary: no Docker, no API key, no signup
claude mcp add vestige vestige-mcp -s user # connect it to Claude Code
```
That's the whole install. Now talk to your agent like it has a memory — because now it does:
```text
You: "Remember: we always disable SimSIMD on release builds, it breaks old x86 CPUs."

...days later, fresh session, zero context...

You: "Should I enable SimSIMD for the release?"
AI:  ⚠️ Hold on: this contradicts a decision you stored. You chose to DISABLE it
     because it breaks old x86 CPUs.
```
That last line isn't me being cute — it's a real status the engine returns, called claim_contradicts_memory. Most memory tools would have happily handed you the wrong answer. Vestige tells you when you're about to walk back into a mistake you already learned from.
(Works with Codex, Cursor, VS Code, Claude Desktop, Windsurf, JetBrains, Zed — anything that speaks MCP. Full setup is here ↓.)
🧠 It's not RAG with a nicer haircut
RAG is a bucket: throw everything in, hope nearest-neighbor finds it later. Vestige behaves more like an actual memory — it decides what's worth keeping, forgets what isn't, and reasons across what's left.
| | 🪣 RAG / Vector Store | 🧠 Vestige |
|---|---|---|
| What it stores | Everything you hand it | Only what's surprising or new — the rest gets merged or skipped |
| What it forgets | Nothing — it just bloats | Unused memories fade on a real forgetting curve, so your context stays lean |
| Finding a root cause | Can't — the cause isn't similar to the bug | Reaches backward in time to the change that caused it (the whole point ↓) |
| Catching contradictions | Silent — serves the stale answer with a straight face | Tells you: "this contradicts what you decided" |
| Duplicates | You clean them up by hand | Self-heals — "likes dark mode" + "prefers dark themes" quietly become one |
| Forgetting on demand | DELETE and it's gone | suppress — gently inhibits a memory (and its neighbors), reversible for 24h |
| Where it lives | Usually someone else's cloud | Your machine. One binary. No telemetry. |
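The "What it stores" row describes prediction-error gating: a new memory is compared with what is already stored, and only genuinely new information becomes a new entry. Below is a minimal sketch of that decision, assuming the gate compares a new memory's embedding with its nearest stored neighbor by cosine similarity. The thresholds (0.95 and 0.75), the caller-supplied `contradicts` flag, and the `GateDecision` names are hypothetical illustrations, not Vestige's real values or API, and the "skip" path is left out.

```rust
/// Outcomes of a hypothetical prediction-error gate (names are illustrative).
#[derive(Debug, PartialEq)]
pub enum GateDecision {
    Create,    // novel: store as a new memory
    Update,    // redundant: merge into the existing neighbor
    Supersede, // same topic, conflicting content: replace the neighbor
}

/// Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Decide what to do with `new` given its nearest stored neighbor, if any.
/// `contradicts` stands in for whatever contradiction check the engine runs;
/// in this sketch it is simply passed in by the caller (assumption).
pub fn gate(new: &[f32], nearest: Option<&[f32]>, contradicts: bool) -> GateDecision {
    const DUPLICATE: f32 = 0.95; // hypothetical: "says the same thing"
    const RELATED: f32 = 0.75; // hypothetical: "is about the same thing"
    match nearest {
        None => GateDecision::Create,
        Some(old) => {
            let sim = cosine(new, old);
            if sim >= RELATED && contradicts {
                GateDecision::Supersede
            } else if sim >= DUPLICATE {
                GateDecision::Update
            } else {
                GateDecision::Create
            }
        }
    }
}
```

The point of the gate is that storage is decided at write time, so the store does not bloat the way an append-only vector index does.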
🔥 The thing nothing else does: memory with hindsight
This is the part I'm proudest of, and it's worth one honest paragraph.
A bug shows up today. The cause was a quiet decision from three weeks ago: a changed env var, a swapped service. That cause shares no words with the error it created. A vector search will never connect them, because it only knows how to find things that look alike, and here the cause and the symptom look nothing alike. This isn't a tuning problem: Google DeepMind researchers published a proof (arXiv:2508.21038, ICLR 2026) that single-vector retrieval is mathematically incapable of bridging gaps like this.
So Vestige doesn't do it with similarity. Its Retroactive Salience Backfill, adapted from Zaki/Cai et al., 2024, Nature 637:145–155 (DOI 10.1038/s41586-024-08168-4), on how the brain links a shock to the quiet memory that caused it, reaches backward through time and promotes the dormant memory that's causally upstream: one that shares an entity (the same file, env var, or service), not the same words.
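The backward reach can be pictured as a filter-then-rank pass over older memories. This is a minimal illustration, not Vestige's Retroactive Salience Backfill: it assumes each memory carries a timestamp and a set of extracted entities (file paths, env vars, service names), keeps only memories that predate the failure and share at least one entity with it, and ranks them by how many entities they share, breaking ties toward the most recent change. The `Memory` struct, the scoring rule, and the tie-break are all hypothetical.

```rust
use std::collections::HashSet;

/// A stored memory, reduced to what this sketch needs (hypothetical shape).
pub struct Memory {
    pub id: u32,
    pub timestamp: u64,            // seconds since epoch
    pub text: String,              // never consulted below: words don't matter here
    pub entities: HashSet<String>, // e.g. "DATABASE_URL", "src/auth.rs"
}

/// Ids of memories that predate `failure` and share at least one entity
/// with it, best candidates first. Text similarity is never used.
pub fn backward_reach(failure: &Memory, history: &[Memory]) -> Vec<u32> {
    let mut candidates: Vec<(usize, u64, u32)> = history
        .iter()
        .filter(|m| m.timestamp < failure.timestamp)
        .map(|m| (m.entities.intersection(&failure.entities).count(), m.timestamp, m.id))
        .filter(|&(shared, _, _)| shared > 0)
        .collect();
    // More shared entities first; among ties, the most recent change first.
    candidates.sort_by(|a, b| b.0.cmp(&a.0).then(b.1.cmp(&a.1)));
    candidates.into_iter().map(|(_, _, id)| id).collect()
}
```

In this toy version a note like "swapped auth-service to the v2 cluster" surfaces for a later "500 on /login" failure purely because both mention `auth-service`, even though the two texts share no other words.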
I also built a benchmark, CauseBench, to keep myself honest about it. Every pure vector retriever scored 0% recall@1 on the causal-gap task; Vestige scored 60%. (To be precise: the impossibility is DeepMind's theorem; the 0%-vs-60% is my measurement. Those are two different claims, and I keep them separate.)
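For readers new to the metric: recall@1 is the fraction of queries whose single top-ranked result is the correct answer (here, the true root cause). A minimal sketch of the computation; the function name and data layout are illustrative and are not taken from CauseBench's code.

```rust
/// recall@1: share of queries whose top-ranked result is the gold answer.
/// `rankings[i]` is the ranked list of ids returned for query i;
/// `gold[i]` is the id of that query's true root cause.
pub fn recall_at_1(rankings: &[Vec<u32>], gold: &[u32]) -> f64 {
    assert_eq!(rankings.len(), gold.len(), "one gold id per query");
    if gold.is_empty() {
        return 0.0;
    }
    let hits = rankings
        .iter()
        .zip(gold)
        // A query counts only if the very first result is the gold id.
        .filter(|(ranked, g)| ranked.first() == Some(*g))
        .count();
    hits as f64 / gold.len() as f64
}
```

Because only the first slot counts, a retriever that ranks the true cause second still scores zero on that query.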
```bash
vestige backfill --contrast   # show the root cause a vector search would have missed
```
The nice part: it compounds. Every failure your agent records makes the next session diagnose faster — run two is smarter than run one — and it happens automatically during consolidation, so you don't have to babysit it.
All of this shipped in v2.2.0, along with a 34→13 tool consolidation and a rebuilt retrieval engine. Full release notes →
🔬 This is real neuroscience, not a metaphor
I get skeptical when projects wave the word "neuroscience" around, so here's my receipt: every mechanism below is a real, cited paper, implemented in Rust, running locally on your machine. None of it phones a model in the cloud to sound smart.
| Mechanism | What it does for you | Grounded in |
|---|---|---|
| Prediction-Error Gating | Redundant info gets merged, contradictory gets superseded, only the novel gets stored | The hippocampal novelty signal |
| FSRS-6 Spaced Repetition | 21 parameters of the mathematics of forgetting — used memories stay, unused fade | Modern spaced-repetition research |
| Retroactive Salience Backfill | Backward causal reach to the root cause of a failure | Zaki/Cai et al. 2024, Nature 637:145–155 |
| Synaptic Tagging | A memory that looked trivial this morning can be tagged critical tonight | Frey & Morris 1997 |
| Spreading Activation | Search "auth bug," surface last week's JWT update — memory is a graph, not a list | Collins & Loftus 1975 |
| Dual-Strength Model | Storage strength vs. retrieval strength — deeply stored ≠ instantly recalled, just like you | Bjork & Bjork 1992 |
| Memory Dreaming | Sleep-like consolidation: replays, connects, and synthesizes insights into the graph | Active-dreaming consolidation |
| Active Forgetting (suppress) | Top-down inhibition that compounds and cascades to neighbors, reversible for 24h | Anderson 2025 · Davis 2020 |
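The FSRS-6 row compresses a lot into one line. In the FSRS-6 scheduler, as we read its public documentation, the probability of recalling a memory t days after its last review, given a stability of S days, is R(t, S) = (1 + f·t/S)^(-w20), with f = 0.9^(-1/w20) - 1 chosen so that R = 0.9 exactly when t = S; w20 is the 21st trainable parameter and controls how quickly the curve flattens. A minimal sketch of that curve alone; the decay values in the checks are arbitrary, and how Vestige turns low retrievability into fading is not shown.

```rust
/// FSRS-6 style retrievability: probability of recall after `t_days`
/// for a memory whose stability is `stability_days`.
/// `decay` plays the role of the trainable w20 parameter and must be > 0.
pub fn retrievability(t_days: f64, stability_days: f64, decay: f64) -> f64 {
    assert!(stability_days > 0.0 && decay > 0.0);
    // Chosen so that retrievability(S, S, decay) == 0.9 for any decay.
    let factor = 0.9_f64.powf(-1.0 / decay) - 1.0;
    (1.0 + factor * t_days / stability_days).powf(-decay)
}
```

Stability is what a successful recall raises: the same elapsed time costs a high-stability memory far less retrievability than a low-stability one, which is how "used memories stay, unused fade" falls out of the math.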
Read the full science doc → — every feature, every paper.
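The Spreading Activation row (Collins & Loftus 1975) boils down to this: activate the nodes that matched the query, then let a decaying share of that activation flow along graph edges, so a search for "auth bug" can light up a JWT note it shares no words with. A minimal sketch over an adjacency list; the decay factor, the hop limit, the max-combining rule, and the edge weights are assumptions for illustration, not Vestige's parameters.

```rust
use std::collections::HashMap;

/// Hypothetical spreading-activation pass (all parameters are illustrative).
/// Each round, every node on the frontier passes `activation * weight * decay`
/// to its neighbors; a node keeps the strongest activation it has received
/// (max, not sum) and only spreads again when that value goes up.
pub fn spread(
    graph: &HashMap<u32, Vec<(u32, f64)>>, // node -> [(neighbor, edge weight in 0..=1)]
    seeds: &[(u32, f64)],                  // nodes that matched the query directly
    decay: f64,                            // share of activation that survives one hop
    hops: usize,                           // maximum number of hops to spread
) -> HashMap<u32, f64> {
    let mut activation: HashMap<u32, f64> = seeds.iter().copied().collect();
    let mut frontier = activation.clone();
    for _ in 0..hops {
        let mut next: HashMap<u32, f64> = HashMap::new();
        for (&node, &a) in &frontier {
            for &(nbr, w) in graph.get(&node).map(Vec::as_slice).unwrap_or(&[]) {
                let passed = a * w * decay;
                if passed > activation.get(&nbr).copied().unwrap_or(0.0) {
                    activation.insert(nbr, passed);
                    next.insert(nbr, passed);
                }
            }
        }
        if next.is_empty() {
            break; // nothing got stronger, so further hops change nothing
        }
        frontier = next;
    }
    activation
}
```

Taking the max rather than the sum keeps a densely connected hub from drowning out a single strong causal link; summing is an equally common choice in the literature.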
🛠 13 tools, one brain
v2.2.0 consolidated a sprawling 34-tool surface into 13 sharp ones your agent actually reaches for. Old names still work as hidden aliases — nothing breaks.
| Tool | What it does |
|---|---|
| 🔍 recall | The retrieval engine: folds search + deep reasoning + contradiction detection into one call. F32 embeddings, Reciprocal Rank Fusion, claim-vs-memory checks. |
| 🧠 backfill | Memory with hindsight: backward causal reach to a failure's root cause (Cai 2024). |
| 💾 smart_ingest | Stores with CREATE / UPDATE / SUPERSEDE via Prediction-Error Gating. Batch session-end saves. |
| 🗂 memory | Get, edit, promote 👍, demote 👎, check state, purge content + embeddings. |
| 🧩 graph | Reasoning chains, associations, bridges, predictions, force-directed export. |
| 🌙 maintain | Consolidate, dream, GC, importance-score, backup, export, restore: one maintenance verb. |
| 🧹 dedup | Self-healing duplicate detection + merge (8 old tools → 1). |
| 🚫 suppress | Top-down active forgetting: compounds, cascades, reversible for 24h. The memory is inhibited, not erased. |
| 📟 memory_status | Health + stats + trends + recommendations in one packet. |
| 🧬 codebase · intention · source_sync · session_start | Per-project code memory · "remind me when X" · external-source connectors · one-call session init. |
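The recall row mentions Reciprocal Rank Fusion, a standard way to merge ranked lists from different retrievers (for example, keyword hits and vector hits): each document scores the sum of 1/(k + rank) over every list it appears in, with ranks starting at 1. A minimal sketch; k = 60 is the commonly used default and is an assumption here, not Vestige's setting, and which lists Vestige actually fuses is not shown.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: merge several ranked id lists into one ranking.
/// score(d) = sum over lists containing d of 1 / (k + rank), rank starting at 1.
pub fn rrf(lists: &[Vec<u32>], k: f64) -> Vec<(u32, f64)> {
    let mut scores: HashMap<u32, f64> = HashMap::new();
    for list in lists {
        for (i, &id) in list.iter().enumerate() {
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u32, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

RRF only looks at ranks, never raw scores, so it can combine retrievers whose scores live on incompatible scales (BM25-style keyword scores versus cosine similarities) without any calibration.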
📊 Watch your AI think in 3D
```bash
vestige dashboard   # → http://localhost:3927/dashboard
```
Every memory is a glowing node in a real-time, force-directed 3D graph. Connections form as you work. Nodes pulse when accessed, burst on creation, fade on decay. Kick off a consolidation and the whole graph slides into purple dream mode, replaying memories that light up in sequence.
Built with SvelteKit 2 · Svelte 5 · Three.js · WebGL bloom · live WebSocket events. 1000+ nodes at 60fps. Installable as a PWA.
🧩 Works in every editor you use
Vestige speaks MCP, so any client that can register a stdio MCP server can use it.
| Editor | One-liner |
|---|---|
| Claude Code | claude mcp add vestige vestige-mcp -s user |
| Codex | codex mcp add vestige -- vestige-mcp |
| Cursor / VS Code / Windsurf / JetBrains / Xcode / OpenCode | Integration guides → |
| Claude Desktop | 2-minute setup → |
Other install methods (Intel Mac, Windows, build-from-source)
Update an existing install:
```bash
vestige update                        # binaries only
vestige update --sandwich-companion   # also refresh optional Claude Code companion files
```
macOS (Intel): Microsoft is dropping x86_64 macOS ONNX Runtime prebuilts after v1.23.0, so the Intel Mac build links dynamically against a Homebrew ONNX Runtime:
```bash
brew install onnxruntime
npm install -g vestige-mcp-server@latest
echo 'export ORT_DYLIB_PATH="'"$(brew --prefix onnxruntime)"'/lib/libonnxruntime.dylib"' >> ~/.zshrc && source ~/.zshrc
claude mcp add vestige vestige-mcp -s user
```
Full guide: docs/INSTALL-INTEL-MAC.md.
Windows + Claude Desktop: quit Claude Desktop from the tray, then in PowerShell:
```powershell
npm install -g vestige-mcp-server@latest
vestige-mcp --version
```
Point %APPDATA%\Claude\claude_desktop_config.json at it:
```json
{ "mcpServers": { "vestige": { "command": "vestige-mcp" } } }
```
If it can't find the command, run where vestige-mcp and use the exact .cmd path.
Build from source (Rust 1.91+):
```bash
git clone https://github.com/samvallad33/vestige && cd vestige
cargo build --release -p vestige-mcp
# Apple Silicon GPU: --features metal · NVIDIA: --features qwen3-embeddings,cuda
```
🚀 Make your AI use memory automatically
Registering the server exposes the tools; a short instruction tells the agent when to call them. Drop in the protocol and your agent saves and recalls on its own:
| You say | Vestige does |
|---|---|
| "Remember this" | Saves immediately |
| "I always..." / "I prefer..." | Saves as a durable preference |
| "Remind me when..." | Creates a future trigger (intention) |
| "This is important" | Saves and promotes it |
Agent memory protocol → · Claude Code template →
🏗 Under the hood
```text
┌──────────────────────────────────────────────────────────┐
│ SvelteKit Dashboard · Three.js 3D graph · WebGL bloom    │
├──────────────────────────────────────────────────────────┤
│ Axum HTTP + WebSocket (:3927) · REST + live event stream │
├──────────────────────────────────────────────────────────┤
│ MCP Server (stdio JSON-RPC) · 13 tools · 30 modules      │
├──────────────────────────────────────────────────────────┤
│ Cognitive Engine                                         │
│ FSRS-6 · Spreading Activation · Prediction-Error Gating  │
│ Retroactive Salience Backfill · Synaptic Tagging         │
│ Memory Dreamer · Hippocampal Index · Active Forgetting   │
├──────────────────────────────────────────────────────────┤
│ Storage: SQLite + FTS5 · USearch HNSW · Nomic Embed v1.5 │
│ Optional: Qwen3 reranker · SQLCipher · Metal/CUDA        │
└──────────────────────────────────────────────────────────┘
```
| Spec | Details |
|---|---|
| Language | Rust 2024 (MSRV 1.91), 86,000+ lines |
| Binary | ~23MB, single file |
| Embeddings | Nomic Embed Text v1.5 (768d→256d Matryoshka, 8192 ctx); Qwen3 optional |
| Vector search | USearch HNSW (≈20× faster than FAISS) |
| Storage | SQLite + FTS5, optional SQLCipher encryption |
| Tests | 1,550 passing · clippy -D warnings clean |
| First run | Downloads ~130MB embedding model once, then fully offline forever |
| Platforms | macOS (ARM + Intel) · Linux x86_64 · Windows x86_64 — all prebuilt |
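The Embeddings row says 768-dimensional Nomic vectors are cut down to 256 dimensions via Matryoshka representation learning, where the model is trained so that a prefix of the vector is itself a usable embedding. A minimal sketch of the truncation step, following the recipe on Nomic's model card as we understand it (layer-normalize the full vector, keep the first 256 components, then L2-normalize); whether Vestige applies the layer-norm step in exactly this way is an assumption.

```rust
/// Shrink a Matryoshka embedding to its first `dim` components.
/// Order follows the Nomic model-card recipe (assumed, not Vestige's code):
/// layer norm over the full vector (zero mean, unit variance, no learned
/// scale or shift), then truncate, then L2-normalize for cosine search.
pub fn truncate_matryoshka(full: &[f32], dim: usize) -> Vec<f32> {
    assert!(dim > 0 && dim <= full.len(), "dim must be in 1..=full.len()");
    let n = full.len() as f32;
    let mean = full.iter().sum::<f32>() / n;
    let var = full.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
    let eps = 1e-5_f32; // same default epsilon as PyTorch's layer_norm
    let inv_std = 1.0 / (var + eps).sqrt();
    // Only the first `dim` normalized components are kept.
    let mut out: Vec<f32> = full[..dim].iter().map(|x| (x - mean) * inv_std).collect();
    let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut out {
            *x /= norm;
        }
    }
    out
}
```

A 256-dimensional vector takes a third of the space of the full 768 and makes every HNSW distance computation correspondingly cheaper, at a modest cost in retrieval quality.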
📚 Go deeper
| Doc | What's inside |
|---|---|
| FAQ | 30+ real questions answered |
| The Science | Every feature, every paper |
| Storage Modes | Global · per-project · multi-instance |
| Configuration | CLI, env vars, every knob |
| Changelog | The full story, version by version |
If your agent should remember what you taught it yesterday — star it. ⭐
86,000+ lines of Rust · 13 tools · 30 cognitive modules · 130 years of memory research · one 23MB binary that never phones home.
Built by @samvallad33 · AGPL-3.0 · 100% local, 100% yours