mirror of
https://github.com/samvallad33/vestige.git
synced 2026-04-25 00:36:22 +02:00
docs: add launch materials — Show HN post, demo script, and blog post
Includes HN post with prepared FAQ, Reddit cross-posts (r/rust, r/ClaudeAI, r/LocalLLaMA), MCP Dev Summit demo scripts (30s/3min/10min versions), and technical blog post covering FSRS-6, PE Gating, HyDE, STC, and dreaming. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
31bbee0c7f
commit
fc70964534
3 changed files with 1428 additions and 0 deletions
395
docs/launch/blog-post.md
Normal file
# Building a Cognitive Memory System with FSRS-6 and Three.js -- What 130 Years of Neuroscience Taught Us About AI Memory

Your AI assistant does not remember anything.

Every conversation starts from zero. You explain your project structure, your preferences, the bug you fixed last Tuesday, the architectural decision you made last month. The context window is a goldfish bowl -- 200K tokens of short-term memory, then nothing. RAG systems bolt on a vector database and call it "memory," but what they actually build is a search engine. Search is not memory. Memory is a living system that decays, strengthens, connects, and dreams.

Vestige is an open-source Rust MCP server that gives AI agents persistent memory modeled on real neuroscience. Not metaphorical neuroscience. Actual published algorithms from Ebbinghaus (1885), Collins & Loftus (1975), Bjork & Bjork (1992), Frey & Morris (1997), and the FSRS-6 spaced repetition scheduler trained on 700 million Anki reviews.

77,840+ lines of Rust. 29 cognitive modules. 734 tests. Single binary deployment with an embedded SvelteKit dashboard. AGPL-3.0 licensed.

Here is how we built it.

---

## The Problem: Session Boundaries are Amnesia

Every AI conversation today has the same failure mode. The model is stateless. Context windows are large but finite, and they reset between sessions. The industry's answer has been Retrieval-Augmented Generation -- embed documents, stuff them into the prompt, let the model figure it out.

RAG works for document Q&A. It does not work for memory, and here is why:

1. **No forgetting curve.** Every chunk in a vector database has equal weight forever. A configuration snippet from six months ago has the same retrieval priority as the bug fix from yesterday.
2. **No consolidation.** Memories are never merged, connected, or synthesized. You get back isolated chunks, not understanding.
3. **No retroactive importance.** If you flag something as important today, RAG cannot go back and strengthen the memories from last week that suddenly matter.
4. **No surprise detection.** Every insert is treated the same. Duplicate information bloats the database. Contradictions pile up silently.

We wanted something different. We wanted memory that behaves like a brain -- where memories compete, strengthen through use, decay through neglect, and form connections during idle periods.

---

## The Solution: Treat Memory Like a Brain, Not a Database

Vestige implements a cognitive architecture with three core principles:

1. **Memories have a lifecycle.** They are born, they strengthen through retrieval, they decay over time, they can be revived, and they eventually fade below the retrieval threshold. This is FSRS-6.
2. **Storage is gated by novelty.** Not everything deserves to be remembered. Prediction Error Gating compares new information against existing memories and decides whether to create, update, merge, or supersede. This is the hippocampal bouncer.
3. **Retrieval changes memory.** Every search strengthens the memories it finds (the Testing Effect) and weakens competitors (retrieval-induced forgetting). Memory is not a read-only operation.

### Architecture Overview

```
┌─────────────────────────────────────────────────────┐
│  AI Agent (Claude, GPT, etc.)                       │
│     ↕ JSON-RPC over stdio (MCP protocol)            │
├─────────────────────────────────────────────────────┤
│  vestige-mcp                19 MCP tools            │
│  ├── Axum HTTP server (dashboard + WebSocket)       │
│  ├── CognitiveEngine (29 stateful modules)          │
│  └── Tool handlers (one file per tool)              │
├─────────────────────────────────────────────────────┤
│  vestige-core               cognitive algorithms    │
│  ├── fsrs/          FSRS-6 spaced repetition        │
│  ├── neuroscience/  10 modules (STC, spreading      │
│  │                  activation, hippocampal         │
│  │                  index, importance signals...)   │
│  ├── search/        hybrid, HyDE, reranker,         │
│  │                  keyword, vector, temporal       │
│  ├── advanced/      11 modules (dreams, PE gate,    │
│  │                  chains, compression,            │
│  │                  cross-project learning...)      │
│  ├── embeddings/    fastembed (Nomic v1.5, 768d)    │
│  └── storage/       SQLite + FTS5 + USearch HNSW    │
└─────────────────────────────────────────────────────┘
```

The entire system compiles to a single binary. The SvelteKit dashboard is embedded at compile time using Rust's `include_dir!` macro. No external services. No cloud dependencies. Your memories live on your machine.

---

## Deep Dive: FSRS-6 -- A Power Law Forgetting Curve

The Free Spaced Repetition Scheduler (FSRS-6) is the mathematical backbone of Vestige. Where traditional systems store memories with static priority scores, every memory in Vestige has two dynamic properties: **stability** (how deeply encoded it is) and **difficulty** (how hard it is to retain). These evolve over time according to a 21-parameter model trained on 700 million real-world Anki reviews.

The core formula is the power forgetting curve:

```
R(t, S) = (1 + factor * t / S) ^ (-w20)

where factor = 0.9 ^ (-1 / w20) - 1
```

- `R` is retrievability -- the probability you can recall this memory right now
- `t` is elapsed time since last access (in days)
- `S` is stability -- the number of days for R to drop to 90%
- `w20` is the personalizable decay parameter (default 0.1542)

Why power law instead of exponential? Because Ebbinghaus was right in 1885, and the data from 700 million reviews confirms it: human forgetting follows a power curve, not an exponential one. Power law decay has a heavier tail -- memories hang around longer than exponential models predict, which matches real behavior.

Here is the actual Rust implementation:

```rust
pub fn retrievability_with_decay(stability: f64, elapsed_days: f64, w20: f64) -> f64 {
    if stability <= 0.0 { return 0.0; }
    if elapsed_days <= 0.0 { return 1.0; }

    let factor = 0.9_f64.powf(-1.0 / w20) - 1.0;
    let r = (1.0 + factor * elapsed_days / stability).powf(-w20);
    r.clamp(0.0, 1.0)
}
```
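
A quick sanity check on the curve's defining property: the formula is built so that retrievability is exactly 0.9 when elapsed time equals stability, for any `w20`. This standalone sketch restates the same formula (the `retrievability` helper here is illustrative, not Vestige's API):

```rust
fn retrievability(stability: f64, elapsed_days: f64, w20: f64) -> f64 {
    // Same power-law curve as above: R = (1 + factor * t / S)^(-w20)
    let factor = 0.9_f64.powf(-1.0 / w20) - 1.0;
    (1.0 + factor * elapsed_days / stability).powf(-w20).clamp(0.0, 1.0)
}

fn main() {
    let w20 = 0.1542; // default decay parameter
    // By construction, R(t = S) = 0.9 regardless of stability.
    assert!((retrievability(30.0, 30.0, w20) - 0.9).abs() < 1e-9);
    // The heavy power-law tail: at 10x the stability, recall is still substantial.
    let r10 = retrievability(30.0, 300.0, w20);
    assert!(r10 > 0.5 && r10 < 0.9);
}
```

With the default `w20`, recall at t = 10S works out to roughly 0.69, where an exponential curve calibrated to the same 90%-at-S point would give 0.9^10, about 0.35.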

FSRS-6 also introduced three new parameters (w17, w18, w19) for **same-day reviews** -- a gap in earlier versions that caused instability when memories were accessed multiple times within 24 hours. In an AI agent context where the same memory might be retrieved dozens of times in one session, this matters enormously.

Each memory exists in one of four states based on its accessibility score:

| State | Accessibility | Behavior |
|-------|---------------|----------|
| **Active** | >= 70% | Immediately retrievable, surfaces in searches |
| **Dormant** | 40-70% | Retrievable with effort, lower search priority |
| **Silent** | 10-40% | Rarely surfaces, needs direct access to revive |
| **Unavailable** | < 10% | Below retrieval threshold, candidate for GC |
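
A minimal sketch of how the table's thresholds could map an accessibility score to a state (the enum and function are illustrative stand-ins, not Vestige's actual types):

```rust
#[derive(Debug, PartialEq)]
enum MemoryState { Active, Dormant, Silent, Unavailable }

// Thresholds taken from the table above.
fn classify(accessibility: f64) -> MemoryState {
    if accessibility >= 0.70 {
        MemoryState::Active
    } else if accessibility >= 0.40 {
        MemoryState::Dormant
    } else if accessibility >= 0.10 {
        MemoryState::Silent
    } else {
        MemoryState::Unavailable
    }
}

fn main() {
    assert_eq!(classify(0.85), MemoryState::Active);
    assert_eq!(classify(0.55), MemoryState::Dormant);
    assert_eq!(classify(0.25), MemoryState::Silent);
    assert_eq!(classify(0.05), MemoryState::Unavailable);
}
```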

Memories are never hard-deleted. They fade. And any access -- even a search that returns them as a secondary result -- strengthens them back toward Active.

---

## Deep Dive: Prediction Error Gating -- The Hippocampal Bouncer

The brain does not store everything it perceives. The hippocampus acts as a novelty filter, comparing incoming information against existing memories and only consolidating what is genuinely new. This is prediction error: the gap between what you expected and what you got.

Vestige implements this as `PredictionErrorGate`. When `smart_ingest` is called with new content:

1. The content is embedded into a 768-dimensional vector (Nomic Embed v1.5)
2. Existing memories are searched for candidates above a similarity threshold
3. Cosine similarity determines the prediction error: `PE = 1.0 - similarity`
4. A decision is made:

```rust
pub enum GateDecision {
    Create { reason, prediction_error, related_memory_ids },
    Update { target_id, similarity, update_type, prediction_error },
    Supersede { old_memory_id, similarity, supersede_reason, prediction_error },
    Merge { memory_ids, avg_similarity, strategy },
}
```

The thresholds:

| Similarity | Decision | Rationale |
|------------|----------|-----------|
| > 0.92 | **Reinforce** | Near-identical content. Strengthen existing memory. |
| > 0.75 | **Update/Merge** | Related content. Merge information into existing memory. |
| 0.70-0.75 + contradiction detected | **Supersede** | Correction. New content replaces outdated memory. |
| < 0.75 | **Create** | Novel content. Store as new memory. |

Contradiction detection uses heuristic NLP -- looking for negation patterns ("don't" vs. "do", "avoid" vs. "use") and correction phrases ("actually", "the right way", "should be"). This catches the common case where a user corrects earlier advice.

The result: you can call `smart_ingest` aggressively without worrying about duplicates. The gate handles deduplication, merging, and conflict resolution automatically. The cost of a false positive (saving something redundant) is near zero because the gate will catch it. The cost of a false negative (losing knowledge) is permanent.
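
Putting the thresholds and the contradiction heuristic together, the gate's core branch can be sketched as follows. This is a simplification under stated assumptions -- the real `PredictionErrorGate` also weighs merge candidates and forced intents:

```rust
#[derive(Debug, PartialEq)]
enum Decision { Reinforce, Update, Supersede, Create }

// Simplified sketch of the gate's threshold logic from the table above.
fn gate(best_similarity: f64, contradiction: bool) -> Decision {
    if best_similarity > 0.92 {
        Decision::Reinforce // near-duplicate: strengthen, don't store again
    } else if best_similarity > 0.75 {
        Decision::Update // related: fold into the existing memory
    } else if best_similarity >= 0.70 && contradiction {
        Decision::Supersede // correction: replace the outdated memory
    } else {
        Decision::Create // high prediction error (1.0 - similarity): store new
    }
}

fn main() {
    assert_eq!(gate(0.95, false), Decision::Reinforce);
    assert_eq!(gate(0.80, false), Decision::Update);
    assert_eq!(gate(0.72, true), Decision::Supersede);
    assert_eq!(gate(0.50, false), Decision::Create);
}
```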
---
## Deep Dive: HyDE Search -- Query Expansion Without an LLM

Hypothetical Document Embeddings (HyDE) is a technique from Gao et al. (2022) where you use an LLM to generate a hypothetical answer to a query, embed that hypothetical answer, and use it for vector search. The intuition: a hypothetical answer is closer in embedding space to the real answer than the raw question is.

Full HyDE requires an LLM call at search time. That is too slow for a local-first system with sub-50ms search targets. Vestige implements a zero-latency approximation:

1. **Intent classification.** The raw query is classified into one of six intents: Definition, HowTo, Reasoning, Temporal, Lookup, or Technical.

2. **Template expansion.** Based on the intent, 3-5 variant queries are generated:

```rust
QueryIntent::Definition => {
    variants.push(format!("{clean} is a concept that involves"));
    variants.push(format!("The definition of {clean} in the context of"));
    variants.push(format!("{clean} refers to a type of"));
}
```

3. **Centroid embedding.** All variants are embedded, and the centroid (average) of the embedding vectors is computed and L2-normalized.

4. **Broadened search.** The centroid embedding captures a wider semantic space than any single query, improving recall for conceptual and question-style queries.

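The centroid step is just vector averaging followed by L2 normalization; a sketch with plain `Vec<f32>` standing in for the 768-dimensional embeddings:

```rust
// Average the variant embeddings, then L2-normalize the result.
fn centroid(embeddings: &[Vec<f32>]) -> Vec<f32> {
    let dim = embeddings[0].len();
    let n = embeddings.len() as f32;
    let mut c = vec![0.0f32; dim];
    for e in embeddings {
        for (ci, ei) in c.iter_mut().zip(e) {
            *ci += *ei / n;
        }
    }
    let norm = c.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for ci in c.iter_mut() {
            *ci /= norm;
        }
    }
    c
}

fn main() {
    // Two orthogonal unit vectors average to the diagonal direction.
    let c = centroid(&[vec![1.0, 0.0], vec![0.0, 1.0]]);
    assert!((c[0] - 0.70710677).abs() < 1e-4);
    assert!((c[0] - c[1]).abs() < 1e-6);
}
```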
This gives approximately 60% of full HyDE's quality improvement with zero latency overhead. The embedding model (Nomic v1.5 running locally via fastembed) generates all variant embeddings in a single batch.

The search pipeline then runs seven stages:

1. **Overfetch** -- Pull 3x results from hybrid search (BM25 keyword + semantic vector)
2. **Rerank** -- Re-score by relevance using a cross-encoder-style reranker
3. **Temporal boost** -- Recent memories get a recency bonus
4. **Accessibility filter** -- FSRS-6 retention threshold gates results (Ebbinghaus curve)
5. **Context match** -- Tulving's encoding specificity (1973): match current context to encoding context
6. **Competition** -- Anderson's retrieval-induced forgetting (1994): winners strengthen, competitors weaken
7. **Spreading activation** -- Collins & Loftus (1975): activate related memories as a side effect

That last stage is the critical differentiator. Every search does not just return results -- it reshapes the memory landscape.

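Spreading activation itself is a BFS over a weighted graph with per-hop decay and a cutoff threshold. A sketch under that reading (names illustrative, not Vestige's API):

```rust
use std::collections::{HashMap, VecDeque};

// Collins & Loftus-style spreading activation: activation attenuates by
// edge weight and a per-hop decay factor; propagation stops below a threshold.
fn spread(
    graph: &HashMap<u32, Vec<(u32, f64)>>, // node -> (neighbor, edge weight)
    source: u32,
    decay: f64,
    threshold: f64,
) -> HashMap<u32, f64> {
    let mut activation = HashMap::from([(source, 1.0)]);
    let mut queue = VecDeque::from([source]);
    while let Some(node) = queue.pop_front() {
        let a = activation[&node];
        for &(next, weight) in graph.get(&node).into_iter().flatten() {
            let incoming = a * weight * decay;
            if incoming < threshold {
                continue; // too faint to propagate further
            }
            let entry = activation.entry(next).or_insert(0.0);
            if incoming > *entry {
                *entry = incoming;
                queue.push_back(next);
            }
        }
    }
    activation
}

fn main() {
    let graph = HashMap::from([(1u32, vec![(2u32, 0.8f64)]), (2, vec![(3, 0.8)])]);
    let act = spread(&graph, 1, 0.5, 0.1);
    assert!((act[&2] - 0.40).abs() < 1e-12); // one hop: 1.0 * 0.8 * 0.5
    assert!((act[&3] - 0.16).abs() < 1e-12); // two hops: 0.4 * 0.8 * 0.5
}
```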
---
## Deep Dive: Synaptic Tagging and Capture -- Retroactive Importance

This is the feature that no other AI memory system has.

In 1997, Frey and Morris published a landmark paper in *Nature* describing Synaptic Tagging and Capture (STC). The finding: weak stimulation creates a temporary "synaptic tag" at a synapse. If a strong stimulation occurs within a temporal window (up to 9 hours), Plasticity-Related Products (PRPs) are produced that can be "captured" by the tagged synapses, consolidating them into long-term storage.

Translation for AI: **memories can become important retroactively.**

You have a conversation with a coworker about their vacation plans. Trivial. Three hours later, you learn they are leaving the company. Suddenly that vacation conversation is important context. In a traditional memory system, the vacation memory has already been classified as low-priority and buried. With STC, the "leaving the company" event triggers a backward sweep that captures and promotes the vacation conversation.

Vestige implements this with a 9-hour backward window and a 2-hour forward window:

```rust
const DEFAULT_BACKWARD_HOURS: f64 = 9.0;
const DEFAULT_FORWARD_HOURS: f64 = 2.0;
```
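
A sketch of the capture check using the exponential decay option -- an assumption, since the exact decay shape and window scaling in Vestige may differ. Negative hours mean the tagged memory preceded the importance event:

```rust
const DEFAULT_BACKWARD_HOURS: f64 = 9.0;
const DEFAULT_FORWARD_HOURS: f64 = 2.0;

// Capture probability falls off with temporal distance from the
// importance event and is zero outside the asymmetric window.
fn capture_probability(hours_from_event: f64, base_strength: f64, radius: f64) -> f64 {
    let window = if hours_from_event < 0.0 {
        DEFAULT_BACKWARD_HOURS * radius // memory came before the event
    } else {
        DEFAULT_FORWARD_HOURS * radius // memory came after the event
    };
    let dist = hours_from_event.abs();
    if dist > window {
        return 0.0;
    }
    base_strength * (-dist / window).exp() // exponential decay option
}

fn main() {
    // A UserFlag event (base strength 1.0, radius 1.0x):
    assert_eq!(capture_probability(0.0, 1.0, 1.0), 1.0);
    assert_eq!(capture_probability(-10.0, 1.0, 1.0), 0.0); // outside 9h window
    let p = capture_probability(-4.5, 1.0, 1.0); // halfway back
    assert!(p > 0.60 && p < 0.61); // e^(-0.5) ~ 0.6065
}
```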

When an importance event occurs (user explicitly flags something, a novelty spike is detected, or repeated access patterns emerge), the STC system sweeps for tagged memories within the capture window. Capture probability decays with temporal distance using one of four configurable decay functions (exponential, linear, power law, or logarithmic).

Different event types have different capture characteristics:

| Event Type | Base Strength | Capture Radius | Use Case |
|------------|---------------|----------------|----------|
| UserFlag | 1.0 | 1.0x | "Remember this" |
| NoveltySpike | 0.9 | 0.7x (narrow) | High prediction error |
| EmotionalContent | 0.8 | 1.5x (wide) | Sentiment detection |
| RepeatedAccess | 0.75 | 1.2x | Pattern of retrieval |

Captured memories are grouped into **importance clusters** -- temporal neighborhoods of memories that collectively provide context around a significant moment. This models how biological memory works: you do not remember isolated facts, you remember episodes.

---

## Deep Dive: Memory Dreaming -- Offline Consolidation

During sleep, the hippocampus replays recent experiences and transfers consolidated memories to the neocortex. This process discovers hidden connections between memories, strengthens important patterns, and prunes weak connections.

Vestige simulates this with a 5-stage dream cycle:

```
Stage 1 - Replay:          Replay recent memories in chronological order
Stage 2 - Cross-reference: Compare all memory pairs for hidden connections
Stage 3 - Strengthen:      Reinforce connections that co-activate
Stage 4 - Prune:           Decay weak connections, remove below threshold
Stage 5 - Transfer:        Identify memories ready for semantic storage
```

The dreaming system maintains a `ConnectionGraph` -- a weighted bidirectional graph where edges represent discovered relationships between memories. Edges have strength (0.0 to 2.0) and decay over time (factor 0.95 per consolidation cycle). Connections below 0.1 strength are pruned.

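The per-cycle edge maintenance is then a decay pass followed by a prune pass; a sketch with a stand-in `Edge` struct:

```rust
// Edge strength lives in 0.0..=2.0; `Edge` is illustrative, not Vestige's type.
struct Edge { strength: f64 }

fn consolidate(edges: &mut Vec<Edge>) {
    for e in edges.iter_mut() {
        e.strength *= 0.95; // decay factor per consolidation cycle
    }
    edges.retain(|e| e.strength >= 0.1); // prune weak connections
}

fn main() {
    let mut edges = vec![Edge { strength: 0.104 }, Edge { strength: 1.0 }];
    consolidate(&mut edges);
    // 0.104 decays to ~0.0988 and is pruned; 1.0 decays to 0.95 and survives.
    assert_eq!(edges.len(), 1);
    assert!((edges[0].strength - 0.95).abs() < 1e-12);
}
```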
During Phase 2, the system evaluates memory pairs and discovers connections via multiple signals:
```rust
pub enum DiscoveredConnectionType {
    Semantic,       // High embedding similarity (> 0.8)
    SharedConcept,  // 2+ shared tags
    Temporal,       // Created within 24 hours + similarity > 0.6
    Complementary,  // Moderate similarity, different angles
    CausalChain,    // Cause-effect relationship detected
}
```

Phase 3 generates synthesized insights -- new knowledge that emerges from combining existing memories:

```rust
pub enum InsightType {
    HiddenConnection, // "X and Y are related in ways you didn't notice"
    RecurringPattern, // "You keep encountering this theme"
    Generalization,   // "These specific cases suggest a general rule"
    Contradiction,    // "These two memories conflict"
    KnowledgeGap,     // "You know X and Z but not Y"
    TemporalTrend,    // "This topic has evolved over the past month"
    Synthesis,        // "Combining A + B + C yields new understanding"
}
```

Dreams are triggered automatically: at session start if more than 24 hours have passed since the last dream, or after every 50 memory saves. The consolidation scheduler also monitors activity patterns and runs during detected idle periods (30+ minutes of inactivity).

---

## The Dashboard: Three.js Makes Memory Visible

Memory is invisible by default. You cannot debug what you cannot see. Vestige includes an embedded dashboard that renders the memory graph as a 3D force-directed visualization powered by Three.js with WebGL bloom post-processing.

Every memory is a glowing sphere. Size maps to retention strength. Color maps to node type (fact, concept, decision, pattern, event). Opacity fades as memories decay. Edges represent discovered connections, with opacity proportional to connection weight.

The visualization is event-driven via WebSocket. The Axum HTTP server runs alongside the MCP stdio transport, broadcasting `VestigeEvent` variants to all connected dashboard clients:

```rust
pub enum VestigeEvent {
    MemoryCreated { id, content_preview, node_type, tags, timestamp },
    SearchPerformed { query, result_count, result_ids, duration_ms, timestamp },
    DreamStarted { memory_count, timestamp },
    DreamProgress { phase, memory_id, progress_pct, timestamp },
    ConnectionDiscovered { source_id, target_id, connection_type, weight, timestamp },
    RetentionDecayed { id, old_retention, new_retention, timestamp },
    Heartbeat { uptime_secs, memory_count, avg_retention, timestamp },
    // ... 13 event types total
}
```

On the frontend, each event type triggers a distinct visual effect:

- **MemoryCreated:** Particle spawn burst (60 particles expanding outward) + expanding shockwave ring
- **SearchPerformed:** Blue pulse ripple across all nodes
- **DreamStarted:** Purple wash, bloom intensity increases to 1.5, rotation slows
- **DreamProgress:** Individual memories light up as they are "replayed"
- **ConnectionDiscovered:** Golden flash line between two nodes
- **RetentionDecayed:** Red pulse on the decaying node
- **ConsolidationCompleted:** Golden shimmer across all nodes

The force-directed layout uses a Fibonacci sphere distribution for initial positions with repulsion-attraction dynamics: nodes repel each other (Coulomb's law), edges attract connected nodes (spring force), and a centering force prevents drift. The simulation runs for 300 frames then settles.
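
The golden-angle (Fibonacci) placement can be sketched as follows -- a sketch of the technique in this post's Rust, not the dashboard's actual TypeScript:

```rust
// Place the i-th of n nodes by golden-angle rotation around the y-axis,
// giving near-uniform coverage of a sphere of the given radius.
fn fibonacci_sphere(n: usize, radius: f64) -> Vec<(f64, f64, f64)> {
    let golden_angle = std::f64::consts::PI * (3.0 - 5.0_f64.sqrt());
    (0..n)
        .map(|i| {
            // y runs from +1 to -1; each point sits on its own latitude ring
            let y = 1.0 - 2.0 * (i as f64 + 0.5) / n as f64;
            let ring = (1.0 - y * y).sqrt();
            let theta = golden_angle * i as f64;
            (radius * ring * theta.cos(), radius * y, radius * ring * theta.sin())
        })
        .collect()
}

fn main() {
    let pts = fibonacci_sphere(100, 10.0);
    assert_eq!(pts.len(), 100);
    // Every point lies exactly on the sphere's surface.
    for (x, y, z) in &pts {
        assert!(((x * x + y * y + z * z).sqrt() - 10.0).abs() < 1e-9);
    }
}
```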
The entire dashboard ships inside the Vestige binary. No separate frontend deployment. No CDN. `include_dir!` embeds the SvelteKit build output at compile time, and Axum serves it with proper MIME types and cache headers:
```rust
static DASHBOARD_DIR: Dir<'_> =
    include_dir!("$CARGO_MANIFEST_DIR/../../apps/dashboard/build");
```

---

## Architecture: Why Rust, SQLite, and Local Embeddings

### Rust

Memory is infrastructure. It runs on every interaction, on every search, on every save. Latency matters. We need sub-50ms search over thousands of memories, with embedding generation, FSRS calculations, and seven-stage pipeline execution. Rust gives us zero-cost abstractions, fearless concurrency (the `CognitiveEngine` is `Arc<Mutex<CognitiveEngine>>` shared across async handlers), and compile-time guarantees that the 29 stateful cognitive modules do not have data races.

### SQLite + FTS5 + USearch

SQLite is the most deployed database in the world for a reason. WAL mode gives us concurrent reads alongside writes. FTS5 gives us BM25 keyword search with zero operational overhead. USearch provides a Rust-native HNSW index for approximate nearest-neighbor vector search. The entire memory store is a single file at `~/.vestige/vestige.db`.

### fastembed (Nomic Embed v1.5)

All embeddings run locally. The Nomic Embed v1.5 model produces 768-dimensional vectors, runs via ONNX Runtime, and is competitive with OpenAI's ada-002. The model is cached at `~/.cache/huggingface/` after first download (~130MB). No API keys. No network calls during operation. Your memories never leave your machine.

### Performance
| Memories | Search Time | Memory Usage |
|----------|-------------|--------------|
| 100 | < 10ms | ~50MB |
| 1,000 | < 50ms | ~100MB |
| 10,000 | < 200ms | ~300MB |

---

## Results: 734 Tests and What They Cover

Vestige has 734 tests across the workspace (313 in vestige-core, 338 in vestige-mcp, plus e2e tests). Every cognitive module has dedicated test coverage:

- **FSRS-6 algorithm:** Retrievability monotonic decay, round-trip interval calculation, sentiment boost, same-day review stability, difficulty mean reversion, fuzzing determinism
- **Prediction Error Gating:** Empty candidates, near-identical reinforcement, demoted memory supersession, orthogonal content creation, contradiction detection, force-create/force-update intents
- **Synaptic Tagging:** Tag creation and capture, PRP triggering, weak event rejection, clustering, tag decay and cleanup, batch operations, capture window probability
- **Spreading Activation:** Network creation, edge addition, BFS propagation with decay, activation thresholds, edge reinforcement
- **Memory Dreaming:** Full dream cycle, tag similarity, connection graph CRUD, consolidation scheduling, activity tracking

### Comparison with Existing Approaches
| Feature | RAG (Pinecone/Chroma) | mem0 | Vestige |
|---------|-----------------------|------|---------|
| Forgetting curve | No | No | FSRS-6 (21-param power law) |
| Duplicate detection | Manual | Basic | Prediction Error Gating |
| Retroactive importance | No | No | Synaptic Tagging & Capture |
| Retrieval strengthening | No | No | Testing Effect + spreading activation |
| Dream consolidation | No | No | 5-stage sleep model |
| Query expansion | No | No | HyDE (template-based) |
| 3D visualization | No | No | Three.js + WebSocket |
| Local embeddings | Optional | Cloud | Always local (Nomic v1.5) |
| Single binary | No | No | `include_dir!` embedded dashboard |
| License | Proprietary/OSS | OSS | AGPL-3.0 |

---

## What We Learned

**Neuroscience is an engineering goldmine.** The literature on human memory is vast, detailed, and largely untapped by the AI systems community. Papers from the 1970s through the 2000s describe algorithms that translate directly into code -- Collins & Loftus's spreading activation is literally a BFS with weighted edges and decay. FSRS-6 is a parameterized forgetting curve. STC is a temporal window query with capture probability.

**The Testing Effect changes everything.** Making search a write operation (not just a read) transforms the memory dynamics. Frequently accessed memories get stronger. Competitors get weaker. The system self-organizes toward surfacing what matters most.

**Prediction Error Gating eliminates the "save or not" problem.** The single hardest UX question in AI memory is: what should be saved? The answer from neuroscience is: whatever is surprising. PE Gating compares against existing knowledge and only stores what is genuinely novel. This eliminates both the "save everything" bloat and the "save nothing" amnesia.

**Dreams are not a gimmick.** Offline consolidation consistently discovers connections that real-time search misses. When you replay 50 memories and compare all pairs, patterns emerge that individual searches would never find. The insight generation is simple (tag overlap + temporal proximity + embedding similarity), but the results are surprisingly useful.

---

## What's Next

Vestige v1.9 is targeting autonomic features: a retention-target system that automatically adjusts consolidation frequency, adaptive embedding model selection based on content type, and a proactive suggestion engine that surfaces relevant memories before you search for them.

Further out: emotional memory tagging via sentiment analysis (the amygdala module), multi-agent memory sharing (let your coding agent share memories with your research agent), and a training loop that personalizes the FSRS-6 weights to your individual forgetting curve.

Memory is the missing layer between context windows and persistent knowledge. We think treating it as a cognitive system -- not a database -- is the right approach.

---

Vestige is open source under AGPL-3.0 at [github.com/samvallad33/vestige](https://github.com/samvallad33/vestige).

### References

- Ebbinghaus, H. (1885). *Über das Gedächtnis*. Duncker & Humblot.
- Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing. *Psychological Review*, 82(6), 407-428.
- Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In *From learning processes to cognitive processes: Essays in honor of William K. Estes* (Vol. 2, pp. 35-67).
- Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. *Psychological Review*, 80(5), 352-373.
- Frey, U., & Morris, R. G. M. (1997). Synaptic tagging and long-term potentiation. *Nature*, 385, 533-536.
- Roediger, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. *Psychological Science*, 17(3), 249-255.
- Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause forgetting: Retrieval dynamics in long-term memory. *Journal of Experimental Psychology: Learning, Memory, and Cognition*, 20(5), 1063-1087.
- Redondo, R. L., & Morris, R. G. M. (2011). Making memories last: the synaptic tagging and capture hypothesis. *Nature Reviews Neuroscience*, 12(1), 17-30.
- Gao, L., et al. (2022). Precise Zero-Shot Dense Retrieval without Relevance Labels. *arXiv:2212.10496*.
- Ye, J., et al. (2024). FSRS-6: A spaced repetition algorithm based on free recall. *github.com/open-spaced-repetition*.

485
docs/launch/demo-script.md
Normal file
# Vestige v2.0 "Cognitive Leap" — MCP Dev Summit NYC Demo Script

**Event:** MCP Dev Summit NYC, April 1-3, 2026
**Presenter:** Sam Valladares
**Project:** Vestige — The cognitive engine that gives AI a brain.
**Tagline:** 130 years of memory research. One Rust binary. Zero cloud.

---

## Pre-Demo Checklist (Do This Before the Conference)

### Hardware
- [ ] MacBook charged, charger packed
- [ ] USB-C to HDMI adapter tested with venue projector (ask AV team for resolution)
- [ ] Phone hotspot configured as backup (embedding model already cached = no network needed)

### Software
- [ ] Vestige v2.0 binary installed: `vestige-mcp --version` shows `2.0.0`
- [ ] Claude Code installed and authenticated
- [ ] Terminal font size: 18pt minimum (audience readability)
- [ ] Browser zoom: 150% for dashboard views

### Pre-Load for Offline Demo
```bash
# 1. Ensure embedding model is cached (130MB, downloads on first use)
# Run any search to trigger download:
vestige health

# 2. Verify cache exists:
ls ~/.fastembed_cache/
# Should show: nomic-ai--nomic-embed-text-v1.5/

# 3. Pre-load ~20 diverse memories so the graph looks alive:
vestige ingest "TypeScript is my preferred language for frontend development" --tags preference,typescript
vestige ingest "React Server Components eliminate client-side data fetching waterfalls" --tags pattern,react
vestige ingest "Decided to use Axum over Actix-web for Vestige's HTTP layer because of tower middleware ecosystem" --tags decision,architecture
vestige ingest "BUG FIX: SQLite WAL mode requires separate reader/writer connections for concurrent access" --tags bug-fix,sqlite
vestige ingest "FSRS-6 uses 21 parameters trained on 700M+ Anki reviews, achieving 30% better efficiency than SM-2" --tags fsrs,science
vestige ingest "Prediction Error Gating: similarity > 0.92 = reinforce, > 0.75 = update, < 0.75 = create new" --tags pe-gating,science
vestige ingest "Synaptic Tagging (Frey & Morris 1997): memories become important retroactively when a significant event occurs within 9 hours" --tags science,synaptic-tagging
vestige ingest "Three.js InstancedMesh renders 1000+ nodes at 60fps using GPU instancing" --tags pattern,threejs
vestige ingest "MCP protocol uses JSON-RPC 2.0 over stdio — no HTTP overhead, native tool integration" --tags mcp,architecture
vestige ingest "Bjork dual-strength model: storage strength never decreases, retrieval strength decays with time" --tags science,bjork
vestige ingest "HyDE query expansion classifies intent into 6 types and generates 3-5 hypothetical document variants" --tags hyde,search
vestige ingest "Ebbinghaus forgetting curve: R = e^(-t/S) where R=retrievability, t=time, S=stability" --tags science,ebbinghaus
vestige ingest "USearch HNSW index is 20x faster than FAISS for nearest neighbor search" --tags performance,search
vestige ingest "Reconsolidation (Nader 2000): retrieved memories enter a labile state for 24-48 hours where they can be modified" --tags science,reconsolidation
vestige ingest "Anderson 1994 retrieval-induced forgetting: retrieving one memory suppresses competing memories" --tags science,competition
vestige ingest "Einstein & McDaniel 1990 prospective memory: remember to do X when Y happens, with time/context/event triggers" --tags science,prospective-memory
vestige ingest "Vestige search pipeline: overfetch -> rerank -> temporal boost -> accessibility filter -> context match -> competition -> spreading activation" --tags architecture,search-pipeline
vestige ingest "Vestige has 734 tests, 77,840 lines of Rust, 29 cognitive modules, and ships as a 22MB binary" --tags vestige,stats
vestige ingest "The difference between Vestige and every other AI memory tool: we implemented the actual neuroscience, not just a vector database with timestamps" --tags vestige,philosophy
vestige ingest "Spreading activation: when you search for one thing, related memories light up automatically, like how thinking of 'doctor' primes 'nurse'" --tags science,spreading-activation

# 4. Run a dream cycle to create connections between memories:
# (Do this through Claude Code so the CognitiveEngine processes it)
# In Claude Code, say: "Dream about my recent memories"

# 5. Verify dashboard is running:
open http://localhost:3927/dashboard
# Should see 3D graph with ~20 nodes, connections from the dream

# 6. Test the full demo flow once (time it):
# - Open terminal, open browser side by side
# - Run through the 3-minute version
# - Target: under 3 minutes with natural pacing
```

### Browser Tabs to Pre-Open

1. `http://localhost:3927/dashboard` (3D Graph view)
2. `http://localhost:3927/dashboard/feed` (Real-time event feed)
3. `http://localhost:3927/dashboard/stats` (Stats with retention histogram)
4. GitHub repo: `https://github.com/samvallad33/vestige`

### Terminal Setup

- Split terminal: left pane for Claude Code, right pane for commands
- Dark background, high contrast
- Cursor blink off (less distracting on projector)
---

## VERSION 1: 30-Second Elevator Pitch

**Use this:** Hallway conversations, after-party, meeting someone at the coffee line, anyone who asks "what are you working on?"

### The Script

> Your AI forgets everything between sessions. Every conversation starts from zero.
>
> I built Vestige — a single Rust binary that gives AI persistent memory based on real cognitive science. Not a vector database with a timestamp column. Actual FSRS-6 spaced repetition trained on 700 million reviews. Prediction error gating. Synaptic tagging. Memory dreaming. The same algorithms your brain uses.
>
> It runs as an MCP server. One command to install, one command to connect. Your AI remembers your preferences, your decisions, your bug fixes. And memories decay on the Ebbinghaus curve unless they're used — just like yours do.
>
> Twenty-two megabyte binary. Seven hundred thirty-four tests. Zero cloud dependencies. I'm at the summit if you want to see the 3D brain visualization.

### Key Points to Hit

- "Real neuroscience, not just embeddings" — this is the differentiator
- "Single Rust binary" — simplicity, performance
- "MCP server" — relevant to this audience specifically
- "Zero cloud" — privacy, local-first resonates
- End with an invitation to see the dashboard — creates a follow-up

### If They Ask One Follow-Up Question

**"How is this different from Mem0?"**

> Mem0 is a cloud memory API. Great product, well-funded. But it's fundamentally a vector store with categories. Vestige implements the actual cognitive science — memories decay on the Ebbinghaus curve, get strengthened by retrieval, get consolidated in dream cycles, compete for activation. It's the difference between a filing cabinet and a brain.

**"What's the MCP integration like?"**

> One command: `claude mcp add vestige vestige-mcp -s user`. That's it. Twenty-one tools, but they're organized into five subsystems that Claude uses automatically. You don't even think about it — your AI just starts remembering.

**"Is it open source?"**

> AGPL-3.0. Fully open. The neuroscience is the moat, not the code.
---

## VERSION 2: 3-Minute Demo — "Watch Me Think"

**Use this:** Lightning talk slot, booth demo, small group gathered around your laptop.

**Tone:** Fast. Visual. Punchy. Let the 3D graph do the talking.

**Setup:** Terminal on left half of screen. Browser with dashboard on right half. Dashboard open to the Graph page.

### [0:00-0:20] The Hook

> *[Point at 3D graph on screen, nodes floating and pulsing]*
>
> This is a brain. Not a metaphor — an actual implementation of how memory works. Every node is a memory. Every connection was discovered by a dream cycle. That glow you see? That's retention decaying on the Ebbinghaus curve in real time.
>
> This is Vestige. Let me show you what happens when I talk to Claude.

### [0:20-0:50] The Live Memory

*[Switch to terminal with Claude Code]*

```
You: Remember that I'm presenting at MCP Dev Summit NYC and my talk is about cognitive memory systems
```

*[While Claude processes, switch to browser — point at the Feed tab]*

> Watch the feed. See that? "MemoryCreated" event just fired over WebSocket. And if you look at the graph...
>
> *[Switch to Graph tab — a new node appears with a burst animation]*
>
> There. A new neuron, literally being born in real time.
### [0:50-1:30] The Search

*[Back to terminal]*

```
You: What do you know about how I'm using memory science in my work?
```

> Now watch what happens during a search.
>
> *[Switch to Feed — show SearchPerformed event]*
>
> Seven-stage pipeline just fired. It overfetched 3x results, ran them through a cross-encoder reranker, applied temporal boost, checked FSRS retention, matched context using Tulving's encoding specificity principle, ran competition — Anderson's retrieval-induced forgetting — and spread activation to related memories.
>
> *[Point at Claude's response showing recalled context]*
>
> And it just pulled back exactly the right context. Not because of keyword matching — because of cognitive science.
### [1:30-2:15] The Dream

*[Back to terminal]*

```
You: Dream about my recent memories
```

> Now the wild part. This triggers a dream cycle — inspired by how your hippocampus replays memories during sleep to find hidden connections.
>
> *[Switch to Graph — show dream mode: purple ambient, nodes pulsing sequentially]*
>
> Watch the nodes light up one by one. It's replaying memories, looking for patterns it missed. See those golden lines appearing? Those are connections it just discovered between memories that were stored at completely different times.
>
> *[Switch to Stats tab, point at retention histogram]*
>
> And here's the retention distribution. FSRS-6 — trained on 700 million Anki reviews. Every memory has a predicted decay curve. Memories that aren't accessed fade. Memories that get used get stronger. Exactly like your brain.
### [2:15-2:50] The Punchline

> *[Switch to terminal, run:]*

```bash
vestige-mcp --version
# → vestige-mcp 2.0.0
```

```bash
wc -l $(find /path/to/vestige/crates -name "*.rs") | tail -1
# → 77,840 total
```

> Seventy-eight thousand lines of Rust. Seven hundred thirty-four tests. Twenty-two megabyte binary. Ships with the dashboard embedded. Install is one curl command:

```bash
curl -L https://github.com/samvallad33/vestige/releases/latest/download/vestige-mcp-aarch64-apple-darwin.tar.gz | tar -xz
claude mcp add vestige vestige-mcp -s user
```

> Two lines. Your AI now has a brain.

### [2:50-3:00] Close

> Vestige v2.0, "Cognitive Leap." Open source, AGPL-3.0. The repo is `samvallad33/vestige`. Come talk to me if you want to see the neuroscience under the hood.
---

## VERSION 3: 10-Minute Deep Dive — Full Walkthrough

**Use this:** Breakout session, workshop demo, recorded talk, anyone who gives you a stage.

**Tone:** Authoritative. Technical depth but accessible. Build from simple to mind-blowing.

**Setup:** Terminal fullscreen. Browser in separate space (swipe to switch). Have the GitHub repo open as a backup.

---

### ACT 1: The Problem [0:00-1:30]

> I want to start with a question. How many of you use Claude, or GPT, or Cursor every day?
>
> *[Hands go up]*
>
> And how many of you have had this experience: you spent two hours debugging a problem with your AI, you finally fixed it, and a week later you hit the exact same bug and your AI has absolutely no memory of the solution?
>
> *[Nods]*
>
> That's because every major AI assistant has the memory of a goldfish. Every session starts from absolute zero. Claude literally tells you this: "I don't have the ability to remember previous conversations."
>
> Now here's what's strange. We've known how to build memory systems for over a century. Hermann Ebbinghaus published the forgetting curve in 1885. Bjork and Bjork formalized the dual-strength model in 1992. FSRS-6 — the state of the art in spaced repetition — was trained on 700 million reviews and published in 2024. The science exists.
>
> Nobody implemented it for AI. So I did.
### ACT 2: Install and First Memory [1:30-3:30]

> Let me show you how fast this is. I'm starting from scratch.

```bash
# Install (macOS Apple Silicon)
curl -L https://github.com/samvallad33/vestige/releases/latest/download/vestige-mcp-aarch64-apple-darwin.tar.gz | tar -xz
sudo mv vestige-mcp vestige vestige-restore /usr/local/bin/
```

> Three binaries. The MCP server, the CLI admin tool, and a restore utility. Twenty-two megabytes total. No Docker. No Python. No node_modules. No cloud API key.

```bash
# Connect to Claude Code
claude mcp add vestige vestige-mcp -s user
```

> That's it. One command connects Vestige to Claude as a user-scoped MCP server. Let me restart Claude Code and show you what happens.

*[Open Claude Code]*

```
You: Remember that I prefer Rust over Go for systems programming, and TypeScript for frontend work.
```

> Watch what Claude does. It calls `smart_ingest` — Vestige's primary storage tool. But this isn't just shoving text into a database. Let me walk you through what just happened under the hood:
>
> 1. **Prediction Error Gating** checked if this memory already exists. It compared the new content against all stored memories using embedding similarity. Since it's novel — similarity below 0.75 — it creates a new memory.
> 2. **Importance scoring** ran four channels: novelty (is this new?), arousal (is this emotionally significant?), reward (will this be useful?), attention (is the user focused?). User preferences score high on reward.
> 3. **Intent detection** classified this as a "preference" statement.
> 4. **Synaptic tagging** marked this memory for potential retroactive strengthening if something related happens in the next 9 hours.
> 5. **Hippocampal indexing** created a fast-lookup pointer.
>
> All of that in under 50 milliseconds. In a single Rust binary. No API calls.
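
Pocket example for Q&A: the four-channel importance score in step 2 reduces to a combination of the channel signals. A minimal Rust sketch, where the equal weighting and the sample values are illustrative assumptions, not Vestige's actual tuning:

```rust
// Four-channel importance scoring sketch. The equal-weight average below is
// an illustrative assumption; Vestige's real weighting may differ.
struct Signals {
    novelty: f64,   // is this new?
    arousal: f64,   // is this emotionally significant?
    reward: f64,    // will this be useful?
    attention: f64, // is the user focused?
}

fn importance(s: &Signals) -> f64 {
    // Average the four channels, clamped to [0, 1].
    let raw = (s.novelty + s.arousal + s.reward + s.attention) / 4.0;
    raw.clamp(0.0, 1.0)
}

fn main() {
    // A user-preference statement: high reward, moderate novelty.
    let pref = Signals { novelty: 0.6, arousal: 0.2, reward: 0.9, attention: 0.7 };
    println!("importance = {:.2}", importance(&pref)); // prints "importance = 0.60"
}
```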

*[New Claude Code session]*

```
You: What programming languages do I prefer?
```

> New session. Clean context. But Vestige remembers.
>
> *[Show Claude responding with the saved preference]*
>
> And this search just ran a 7-stage cognitive pipeline. Let me show you what's happening visually.

### ACT 3: The Dashboard [3:30-5:30]

> *[Switch to browser: `localhost:3927/dashboard`]*
>
> This is the Vestige dashboard. SvelteKit 2, Three.js, WebSocket connection to the running MCP server.

*[Graph page — nodes floating in 3D space with bloom post-processing]*

> Every node is a memory. The brightness represents retention strength — how accessible this memory is right now. That's driven by FSRS-6, the same algorithm that powers modern Anki. Brighter means recently accessed or frequently used. Dimmer means it's fading.
>
> The connections between nodes? Those were discovered during dream cycles — offline consolidation where Vestige replays memories and finds hidden relationships, just like your hippocampus does during sleep.
>
> I can click any node to see its details — content, type, tags, retention strength, when it was created, when it was last accessed, its predicted decay curve.

*[Click a node, show detail panel with retention curve]*

> See this curve? That's `R(t) = (1 + FACTOR * t / S)^(-w20)` — the FSRS-6 forgetting curve. Right now this memory has 94% retention. In 7 days without access, it drops to 71%. In 30 days, 43%. But every time it's retrieved, the stability increases and the curve flattens. The more you use a memory, the harder it is to forget. Exactly like your brain.
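
Pocket example for Q&A: the curve itself fits in a few lines of Rust. `FACTOR = 19/81` is the classic FSRS constant, chosen so that retention is exactly 0.9 when `t` equals the stability `S`; the `0.5` decay is a placeholder for the learnable `w20`, so the exact percentages on a real memory depend on the trained parameters:

```rust
// FSRS-style power-law forgetting curve: R(t) = (1 + FACTOR * t / S)^(-w20).
// FACTOR = 19/81 makes R(S) = 0.9; W20 = 0.5 is a stand-in for the
// learnable FSRS-6 decay parameter.
const FACTOR: f64 = 19.0 / 81.0;
const W20: f64 = 0.5;

fn retention(t_days: f64, stability: f64) -> f64 {
    (1.0 + FACTOR * t_days / stability).powf(-W20)
}

fn main() {
    let s = 10.0; // stability in days
    for t in [0.0, 7.0, 30.0] {
        println!("day {t:>4}: R = {:.2}", retention(t, s));
    }
    // Retrieval raises S, which flattens the curve for the same elapsed time.
    println!("after review (S = 40): R(30) = {:.2}", retention(30.0, 40.0));
}
```

The talking point this makes concrete: forgetting is power-law, not exponential, and the only lever retrieval pulls is stability `S`.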

*[Switch to Feed page]*

> This is the real-time event feed. Every cognitive operation — memory creation, search, dreaming, consolidation — fires a WebSocket event. Let me show you.

*[Switch back to terminal, run a search in Claude Code]*

```
You: How does FSRS-6 work?
```

*[Switch to Feed — show SearchPerformed event with details]*

> There — `SearchPerformed`. Query: "How does FSRS-6 work?" Results: 3 memories returned. And on the graph, watch the nodes involved pulse.
>
> *[Switch to Graph — show nodes pulsing from the search]*
>
> The blue pulse you see is spreading activation. When you search for FSRS, it doesn't just find memories about FSRS — it activates related memories about spaced repetition, Ebbinghaus, Bjork. Like how thinking about "doctor" primes "nurse" in your mind.
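
Pocket example for Q&A: one hop of spreading activation over a toy association graph. The graph, the link weights, and the 0.5 spread factor are illustrative assumptions:

```rust
use std::collections::HashMap;

// One-hop spreading activation (Collins & Loftus, 1975): activating a node
// passes a fraction of its activation to semantically linked neighbors.
fn spread(
    edges: &HashMap<&str, Vec<(&str, f64)>>, // node -> (neighbor, link strength)
    source: &str,
    factor: f64, // fraction of activation passed along each link
) -> HashMap<String, f64> {
    let mut activation = HashMap::new();
    activation.insert(source.to_string(), 1.0);
    if let Some(neighbors) = edges.get(source) {
        for (n, w) in neighbors {
            *activation.entry(n.to_string()).or_insert(0.0) += factor * w;
        }
    }
    activation
}

fn main() {
    let mut edges = HashMap::new();
    edges.insert("doctor", vec![("nurse", 0.9), ("hospital", 0.8)]);
    let act = spread(&edges, "doctor", 0.5);
    // "doctor" primes "nurse" without "nurse" appearing in the query.
    println!("nurse activation: {:.2}", act["nurse"]); // prints "nurse activation: 0.45"
}
```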

### ACT 4: The Dream [5:30-7:30]

> Now let me show you the feature I'm most proud of. Memory dreaming.
>
> Your brain doesn't just store memories and retrieve them. During sleep, your hippocampus replays the day's experiences, compresses them, finds connections you missed while awake, and consolidates the important ones into long-term storage. This is the science of memory consolidation — Diekelmann and Born, 2010.
>
> Vestige does the same thing.

*[In Claude Code:]*

```
You: Dream about my recent memories
```

*[Switch to Dashboard Graph — show dream mode activate]*

> Watch this. Purple ambient wash — we're entering dream mode. The graph slows down. And now watch the nodes light up one at a time.
>
> *[Point as nodes pulse sequentially]*
>
> It's replaying each memory, computing similarity to every other memory, looking for connections that weren't obvious at ingest time. See that golden line that just appeared? It just discovered that two memories stored days apart are semantically related.

*[Switch to Feed — show DreamStarted, then DreamCompleted events]*

> The dream cycle produces insights — natural language descriptions of the connections it found. And here's the critical part: the memories that get replayed during dreaming are strengthened. Their FSRS stability increases. Memories that aren't replayed continue to decay. Over time, the system naturally retains what matters and forgets what doesn't.
>
> This is not a cron job. This is not garbage collection. This is the actual computational equivalent of memory consolidation during sleep.
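
Pocket example for Q&A: the replay-and-connect idea reduces to pairwise cosine similarity over stored embeddings. The 0.8 threshold and the two-dimensional toy vectors are illustrative assumptions:

```rust
// Dream-cycle sketch: replay stored embeddings pairwise and propose a
// connection when cosine similarity clears a threshold.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (na * nb)
}

fn dream(memories: &[(&str, Vec<f64>)], threshold: f64) -> Vec<(String, String)> {
    let mut links = Vec::new();
    for i in 0..memories.len() {
        for j in (i + 1)..memories.len() {
            if cosine(&memories[i].1, &memories[j].1) > threshold {
                links.push((memories[i].0.to_string(), memories[j].0.to_string()));
            }
        }
    }
    links
}

fn main() {
    // Toy 2-D embeddings standing in for real 256-dimensional vectors.
    let memories = vec![
        ("sqlite-wal", vec![1.0, 0.0]),
        ("sqlite-readers", vec![0.9, 0.1]),
        ("threejs", vec![0.0, 1.0]),
    ];
    for (a, b) in dream(&memories, 0.8) {
        println!("new connection: {a} <-> {b}"); // the golden line on the graph
    }
}
```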

### ACT 5: HyDE and the Search Pipeline [7:30-9:00]

> One more thing I want to show you. The search isn't just keyword matching plus cosine similarity.
>
> Vestige v2.0 added HyDE — Hypothetical Document Embeddings. When you search for something conceptual, the system first classifies your intent: are you asking a definition question? A how-to? Reasoning about a problem? Looking up a specific fact?
>
> Based on that classification, it generates 3 to 5 hypothetical documents — what a perfect answer might look like — and creates a centroid embedding from all of them. That centroid is what gets compared against your stored memories.
>
> The result: dramatically better recall on conceptual queries. If you search "how does memory decay work," you don't just get memories that contain those words. You get memories about Ebbinghaus, FSRS, retention curves, Bjork's dual-strength model — because the hypothetical documents capture the concept, not just the keywords.
>
> This runs on top of the 7-stage pipeline:
>
> 1. Overfetch 3x results from hybrid search (BM25 keyword + semantic embedding)
> 2. Cross-encoder reranker re-scores by deep relevance
> 3. Temporal boost for recent memories
> 4. FSRS-6 retention filter — memories below threshold are inaccessible
> 5. Context matching — Tulving 1973, encoding specificity
> 6. Competition — Anderson 1994, retrieval-induced forgetting
> 7. Spreading activation — activate related memories, update predictive model
>
> And critically: searching strengthens the memories you find. This is called the Testing Effect — retrieval practice is the single most effective way to consolidate memory. Every search is a workout for the memories involved.
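
Pocket example for Q&A: the HyDE centroid step is a plain element-wise average of the hypothetical-document embeddings. A minimal sketch, where the toy vectors stand in for output from the local embedding model:

```rust
// HyDE centroid sketch: embed several hypothetical answers to the query,
// then average them into one search vector.
fn centroid(embeddings: &[Vec<f64>]) -> Vec<f64> {
    let dim = embeddings[0].len();
    let mut c = vec![0.0; dim];
    for e in embeddings {
        for (ci, ei) in c.iter_mut().zip(e) {
            *ci += ei;
        }
    }
    let n = embeddings.len() as f64;
    c.iter().map(|x| x / n).collect()
}

fn main() {
    // Two hypothetical-answer embeddings (toy 2-D stand-ins).
    let hypotheticals = [vec![1.0, 0.0], vec![0.0, 1.0]];
    println!("{:?}", centroid(&hypotheticals)); // prints "[0.5, 0.5]"
}
```

The averaged vector sits between the variants, so it matches the concept shared by all the hypothetical answers rather than the wording of any single one.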

### ACT 6: The Numbers and the Close [9:00-10:00]

> Let me leave you with the numbers.

*[Terminal:]*

```bash
vestige-mcp --version
# → vestige-mcp 2.0.0

# Stats
# 77,840 lines of Rust
# 734 tests, zero failures
# 29 cognitive modules
# 22MB release binary with embedded dashboard
# 21 MCP tools across 5 subsystems
# 12 published neuroscience principles implemented
# <50ms typical ingest latency
# <300ns cosine similarity (benchmarked with Criterion)
# Zero cloud dependencies
# Zero API keys required
# One curl command to install
```

> This is what I've been building for the past three months. I'm one person, I'm twenty-one years old, and I believe this is how AI memory should work — grounded in real science, running locally, open source.
>
> Vestige v2.0, "Cognitive Leap." The repo is `github.com/samvallad33/vestige`. The dashboard is running at `localhost:3927`. I'll be around all three days — come find me if you want to talk about FSRS, or synaptic tagging, or why I think every AI assistant on the planet should have a forgetting curve.
>
> Thank you.

---
## Anticipated Audience Questions and Answers

### Technical Questions

**Q: How does this compare to RAG? Isn't this just RAG with extra steps?**

> RAG is retrieval-augmented generation — you search a corpus and inject results into the prompt. Vestige does that, but with a cognitive layer on top. RAG doesn't have retention decay. RAG doesn't have memory consolidation. RAG doesn't have prediction error gating to prevent duplicates. RAG doesn't suppress competing memories on retrieval. Vestige is to RAG what human memory is to a filing cabinet — the retrieval mechanism is similar, but the memory lifecycle is completely different.

**Q: What embedding model do you use?**

> Nomic Embed Text v1.5 by default — 768 dimensions truncated to 256 via Matryoshka representation learning. All local via ONNX through fastembed. v2.0 also supports Nomic v2 MoE (475M params, 8 experts) as an opt-in feature. The reranker is Jina v1 Turbo, with Qwen3-Reranker-0.6B available as opt-in.

**Q: What's the storage backend?**

> SQLite with WAL mode. FTS5 for keyword search with Porter stemming. USearch HNSW for vector search — 20x faster than FAISS. Separate reader/writer connections for concurrent access. Single file database. I8 vector quantization for 2x storage savings with under 1% recall loss.

**Q: How does FSRS-6 actually work? What are the 21 parameters?**

> FSRS models memory as a power-law forgetting curve: `R(t) = (1 + FACTOR * t / S)^(-w20)` where S is stability and w20 is the decay parameter. The 21 parameters were trained on 700 million Anki reviews using machine learning. They encode how difficulty changes with repeated reviews, how stability grows based on review quality (Again/Hard/Good/Easy), and how same-day reviews affect long-term retention. It's 30% more efficient than SM-2, which is what Anki has used for decades.

**Q: Does it work with Claude Desktop? Other AI clients?**

> Yes. It speaks MCP — the Model Context Protocol. One config change and it works with Claude Desktop, Cursor, VS Code Copilot, JetBrains, Windsurf, Xcode 26.3. Anything that speaks MCP.

**Q: What about multi-user or team memory?**

> That's the v3.0 roadmap — "Hivemind." Ed25519 identity, CRDT-based sync, transactive directory (Wegner's "who knows what" routing), federated retrieval with differential privacy. The open source version is single-user, local-first. Team and cloud features will be proprietary.

**Q: How does Prediction Error Gating prevent duplicate memories?**

> When you ingest a new memory, it computes embedding similarity against all existing memories. If similarity is above 0.92, it reinforces the existing memory (bumps FSRS stability). Between 0.75 and 0.92, it updates/merges. Below 0.75, it creates a new memory. The thresholds come from computational neuroscience research on prediction error signals — the brain stores what's surprising, reinforces what's familiar, and updates what's partially known. Same principle.
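
For a whiteboard-ready follow-up, the gating decision is a threshold ladder over the best similarity score. A minimal sketch using the thresholds from the answer above (the enum names are illustrative):

```rust
// Prediction Error Gating sketch: route an incoming memory based on its
// maximum embedding similarity to anything already stored.
// Thresholds: > 0.92 reinforce, > 0.75 update/merge, otherwise create.
#[derive(Debug, PartialEq)]
enum Gate {
    Reinforce, // near-duplicate: bump FSRS stability of the existing memory
    Update,    // partially known: merge into the existing memory
    Create,    // novel (surprising): store as a new memory
}

fn gate(max_similarity: f64) -> Gate {
    if max_similarity > 0.92 {
        Gate::Reinforce
    } else if max_similarity > 0.75 {
        Gate::Update
    } else {
        Gate::Create
    }
}

fn main() {
    for s in [0.95, 0.80, 0.50] {
        println!("similarity {s:.2} -> {:?}", gate(s));
    }
}
```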

**Q: What's the performance like at scale? How many memories can it handle?**

> I'm currently running with 700+ memories and everything is under 50ms. HNSW scales logarithmically, so 10x the memories doesn't mean 10x the latency. SQLite handles millions of rows. The practical limit is probably in the hundreds of thousands before you'd want to shard — but for personal AI memory, that's years of heavy use.

### Business / Community Questions

**Q: Why AGPL and not MIT?**

> Because I don't want AWS or Google hosting Vestige as a service without contributing back. AGPL means if you serve it over a network, you must open-source your modifications. Local use is completely free and unrestricted. Cloud and team features are proprietary — the MongoDB/HashiCorp playbook.

**Q: How do I contribute?**

> GitHub: `samvallad33/vestige`. The codebase is well-tested — 734 tests, zero warnings. Good first issues are labeled. The science is documented in `docs/SCIENCE.md`. PRs welcome, especially for new cognitive modules.

**Q: Are you funded? Is this a company?**

> Not yet. This is a solo project. I built it because I believe AI memory should work like human memory — not because a VC told me to. If the right opportunity comes along, I'm open to it. But the open source core isn't going anywhere.

### Skeptical Questions

**Q: Isn't this over-engineered? Do you really need 29 cognitive modules?**

> Fair question. You could build a memory system with just embeddings and timestamps, and it would work for simple cases. But it would miss things. It wouldn't know that a memory stored last week is related to one stored last month unless you searched for both. It wouldn't automatically forget outdated information. It wouldn't retroactively strengthen a memory when something important happens later. Each module solves a specific problem that real users hit. The 29 modules aren't bloat — they're the necessary completeness of a system that actually models cognition.

**Q: How do I know the neuroscience is real and not just marketing?**

> Every principle I've implemented has a citation. Ebbinghaus 1885. Bjork and Bjork 1992. Frey and Morris 1997. Anderson 1994. Tulving 1973. Einstein and McDaniel 1990. Nader 2000. Diekelmann and Born 2010. FSRS was published with full methodology and trained on public Anki data. I didn't make any of this up — I translated peer-reviewed cognitive science into Rust.

**Q: Claude Code just added built-in memory. Doesn't that make Vestige obsolete?**

> Claude's built-in memory is a flat text file. No spaced repetition, no decay, no dreaming, no cognitive search pipeline, no visualization. It's a good first step — it validates that persistent memory matters. But it's the difference between a notepad and a brain. Vestige doesn't compete with Claude's memory — it replaces and extends it.
---

## Recovery Plans

### If the Dashboard Doesn't Load

- Vestige-mcp must be running (it auto-starts the dashboard on port 3927)
- Fall back to terminal-only demo: `vestige health` and `vestige stats` show system state
- Say: "The dashboard is SvelteKit embedded in the binary — it starts automatically. Let me show you the data from the terminal instead."

### If the Embedding Model Downloads Mid-Demo

- This only happens on first-ever use. The pre-demo checklist prevents this.
- If it happens: "The embedding model is 130MB and downloads once on first use. After that, Vestige is fully offline. Let me switch to the dashboard while it downloads."

### If Claude Code Takes Too Long to Respond

- Have a second terminal tab with pre-typed commands ready
- Switch to showing the dashboard and explaining the architecture
- Say: "While Claude processes that, let me show you what's happening in the visualization."

### If Someone Asks Something You Don't Know

- "That's a great question and I honestly don't have a good answer for it yet. Find me after the talk and let's dig into it."
- Never bluff. The audience is technical. Authenticity wins.

---

## Stage Presence Notes

- **Start from the dashboard.** The 3D graph is the hook. It's visual, it's unusual, it makes people lean in.
- **Don't rush the dream sequence.** The purple wash and sequential node pulses are the most visually impressive moment. Let it breathe for 3-4 seconds.
- **Say the scientists' names.** "Ebbinghaus," "Bjork," "Frey and Morris" — this signals that you've done the reading. The MCP Dev Summit audience respects depth.
- **Make eye contact during the punchline.** "One curl command. Your AI now has a brain." Look at the audience, not the screen.
- **Own your age.** Twenty-one, solo developer, zero funding. This is an asset, not a liability. You built something that the well-funded competitors haven't.
- **The dashboard is your co-presenter.** Every time Claude does something, the dashboard should be showing the corresponding event. Practice the terminal-to-browser switch until it's seamless.
- **Don't apologize.** Not for bugs, not for the AGPL, not for being solo. Confident but not arrogant. The work speaks.
548
docs/launch/show-hn.md
Normal file
# Vestige v2.0 Launch — Show HN + Cross-Posts

---

## 1. Hacker News — Show HN

### Title (77 chars)

```
Show HN: Vestige – FSRS-6 spaced repetition as long-term memory for AI agents
```

### Body (first comment)

```
Hi HN,
|
||||
|
||||
I built Vestige because every AI conversation starts from zero. Your AI has no
|
||||
memory of yesterday. I wanted to fix that with real science, not just a vector
|
||||
database with a wrapper.
|
||||
|
||||
**What it is:** A memory server for AI agents (MCP protocol). It sits between
|
||||
you and your AI — Claude, Cursor, VS Code Copilot, etc. — and gives it genuine
|
||||
long-term memory with cognitive-science-backed forgetting, strengthening, and
|
||||
retrieval. Written in Rust, 100% local, single 22MB binary.
|
||||
|
||||
**The neuroscience stack:**
|
||||
|
||||
- **FSRS-6 spaced repetition** — the same 21-parameter power-law forgetting
|
||||
model behind Anki's best algorithm, trained on 700M+ reviews. Memories decay
|
||||
along empirically-validated curves instead of living forever with equal weight.
|
||||
|
||||
- **Prediction Error Gating** — on ingest, new content is compared against all
|
||||
existing memories. If similarity >92%, it reinforces. 75-92%, it merges.
|
||||
<75%, it creates. This is inspired by how the brain decides what's worth
|
||||
encoding vs. what's redundant.
|
||||
|
||||
- **Dual-strength model** (Bjork & Bjork, 1992) — each memory tracks storage
|
||||
strength (how well-encoded, only increases) and retrieval strength (how
|
||||
accessible right now, decays over time). A memory can be well-stored but hard
|
||||
to retrieve, like a name on the tip of your tongue.
|
||||
|
||||
- **Testing Effect** — every search automatically strengthens matching memories.
|
||||
Using memory makes it stronger. This is one of the most robust findings in
cognitive psychology (Roediger & Karpicke, 2006).

- **Synaptic Tagging** (Frey & Morris, 1997) — when something important happens,
it retroactively strengthens memories from the surrounding time window (default:
9 hours back, 2 hours forward). This models how the brain consolidates memories
during waking hours.

- **Spreading Activation** (Collins & Loftus, 1975) — searching for "React hooks"
surfaces "useEffect" memories through semantic similarity, even without keyword
overlap.

- **Memory Dreaming** — offline consolidation that replays recent memories to
discover hidden connections and synthesize insights. Inspired by hippocampal
replay during sleep.

**v2.0 adds:**

- 3D neural visualization dashboard (SvelteKit + Three.js) — watch memories
pulse when accessed, burst particles on creation, golden flash lines when
connections form. GPU instanced rendering handles 1000+ nodes at 60fps.

- WebSocket event bus — every cognitive operation (search, dream, consolidation,
decay) broadcasts real-time events to the dashboard.

- HyDE query expansion — template-based Hypothetical Document Embeddings that
classify query intent into 6 types, expand into 3-5 variants, and average
the embedding vectors. Dramatically improves conceptual search.

- Everything compiles into a single 22MB binary. The SvelteKit dashboard is
embedded via Rust's `include_dir!` macro. No Docker, no Node runtime, no
external services.

**Numbers:** 77,840 lines of Rust, 734 tests, 29 cognitive modules, 21 MCP
tools, search under 50ms for 1,000 memories (SQLite FTS5 + USearch HNSW).

**What it is NOT:** This is not RAG. RAG treats memory as a static database —
chunk everything, embed it, top-k retrieve. Vestige treats memory as a dynamic
cognitive system. Memories decay. Using them makes them stronger. Important
events retroactively strengthen recent memories. Irrelevant memories fade. The
system evolves.

The embedding model (Nomic Embed Text v1.5) runs locally via ONNX. After the
first-run model download (~130MB), there are zero network requests. No
telemetry, no analytics, no phoning home.

I've been using this daily for two months, and the experience is genuinely
different. Claude remembers my coding patterns, my architectural decisions,
and my preferences. New sessions start with context instead of a blank slate.

Source: https://github.com/samvallad33/vestige

Happy to answer any questions about the cognitive science, the Rust
architecture, or MCP in general.
```
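
The HyDE expansion step described in the post (several query variants, one averaged embedding) comes down to mean-pooling vectors. A minimal sketch, assuming 32-bit float embeddings of equal dimension; the function name is illustrative, not Vestige's actual API:

```rust
/// Mean-pool several query-variant embeddings into a single search vector.
/// Assumes all variants share the same dimensionality (panics otherwise).
fn average_embeddings(variants: &[Vec<f32>]) -> Vec<f32> {
    let dim = variants[0].len();
    assert!(variants.iter().all(|v| v.len() == dim));
    let mut avg = vec![0.0f32; dim];
    for v in variants {
        for (a, x) in avg.iter_mut().zip(v) {
            *a += x;
        }
    }
    let n = variants.len() as f32;
    for a in &mut avg {
        *a /= n;
    }
    avg
}
```

Searching with the pooled vector biases retrieval toward what the variants have in common, which is why it helps conceptual queries.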

---

## 2. Prepared FAQ for HN Comments

### Q: "How is this different from just shoving everything into a vector database?"

```
The core difference: a vector database gives everything equal weight forever.
Vestige applies forgetting.

In a vector DB, a note from 6 months ago you never referenced sits alongside
critical context from yesterday, both equally retrievable. Over time your
retrieval quality degrades because the signal-to-noise ratio gets worse.

Vestige uses FSRS-6 (the same algorithm as Anki's best spaced repetition mode)
to model forgetting curves. Memories you use get stronger (Testing Effect).
Memories you ignore fade (power-law decay). Important events retroactively
strengthen nearby memories (Synaptic Tagging).

The result is a system where retrieval quality improves over time instead of
degrading. The AI surfaces what's actually relevant, not just what's
semantically closest.

It also deduplicates on ingest (Prediction Error Gating) — if you try to store
something 92%+ similar to an existing memory, it reinforces that memory instead
of creating a duplicate. This keeps the knowledge base clean without manual
maintenance.
```
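
The 92% gating rule in the answer above reduces to a similarity threshold on the nearest existing memory. A minimal sketch under that assumption; the type and function names are hypothetical, not Vestige's real API, and the full system also has merge and update outcomes not shown here:

```rust
/// Outcome of ingest gating. The real system has more outcomes (merge, update);
/// this sketch shows only the two described in the FAQ answer.
#[derive(Debug, PartialEq)]
enum IngestDecision {
    Reinforce, // near-duplicate: strengthen the existing memory
    Create,    // novel content: store a new memory
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Compare the new content's embedding against its nearest existing neighbor.
fn gate(new_emb: &[f32], nearest_existing: &[f32]) -> IngestDecision {
    if cosine_similarity(new_emb, nearest_existing) >= 0.92 {
        IngestDecision::Reinforce
    } else {
        IngestDecision::Create
    }
}
```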

### Q: "Why not just use Claude's built-in memory / ChatGPT memory?"

```
Two reasons: control and science.

Control: Native AI memory is a black box on someone else's servers. You can't
see what was stored, how it decays, or export it. Vestige stores everything in
a local SQLite database you own. You can query it directly, back it up, export
it as JSON, or delete it entirely.

Science: Native memory implementations are proprietary. We have no idea what
algorithm they use for retention or retrieval. Vestige uses published research
— FSRS-6 (power-law forgetting, 21 parameters, trained on 700M Anki reviews),
the dual-strength model (Bjork & Bjork, 1992), encoding specificity (Tulving,
1973). These are well-studied, empirically validated models.

They also work simultaneously — Claude's native memory handles general context,
Vestige handles structured knowledge with explicit cognitive science.
```

### Q: "77K lines of Rust seems like a lot for a memory system. What's in there?"

```
Fair question. Roughly:

- ~22K: fastembed (vendored fork of the embedding library, ONNX inference)
- ~15K: 29 cognitive modules (FSRS-6, prediction error gating, synaptic
  tagging, spreading activation, dreaming, hippocampal index, etc.)
- ~12K: MCP server + 21 tool implementations
- ~8K: Storage layer (SQLite, FTS5, HNSW vector index, migrations)
- ~7K: SvelteKit dashboard (TypeScript/Svelte, embedded in binary)
- ~6K: Tests (734 tests across core + mcp + e2e + doctests)
- ~5K: Search pipeline (hybrid BM25+semantic, RRF fusion, HyDE, reranker)
- ~3K: Dashboard backend (Axum, WebSocket, REST API, event system)

Is it over-engineered? Maybe. But each cognitive module implements a specific
finding from memory research. The complexity comes from faithfully modeling how
memory actually works, not from unnecessary abstraction.
```

### Q: "Does FSRS-6 actually make a difference, or is it just a gimmick?"

```
FSRS-6 is the state of the art in spaced repetition. It was developed by the
open-spaced-repetition group, trained on 700M+ Anki reviews, and benchmarks
30% more efficient than SM-2 (the algorithm most SRS apps use, which dates
to 1987).

The key insight is the forgetting model. SM-2 uses exponential decay, which
doesn't match empirical data. FSRS-6 uses a power-law curve:

R(t, S) = (1 + factor * t / S)^(-w20)

Power-law forgetting has been consistently demonstrated in memory research
since Wixted & Ebbesen (1991). The difference matters practically — exponential
decay predicts memories fall off a cliff, while power-law decay predicts a long
tail where old memories can still be retrieved.

For an AI memory system, this means old but important memories don't vanish.
They fade slowly and can be revived by accessing them (the Testing Effect).
```
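
The power-law curve quoted in that answer can be computed directly. A sketch, assuming the FSRS convention that `factor` is chosen so retrievability is exactly 0.9 when elapsed time equals stability (the answer above doesn't pin down `factor`, so that choice is an assumption here); `w20` is the learnable decay parameter:

```rust
/// FSRS-6 style retrievability: R(t, S) = (1 + factor * t / S)^(-w20).
/// `factor` is derived so that R(S, S) = 0.9, i.e. after `stability` days
/// there is a 90% chance of recall.
fn retrievability(t_days: f64, stability: f64, w20: f64) -> f64 {
    let factor = 0.9_f64.powf(-1.0 / w20) - 1.0;
    (1.0 + factor * t_days / stability).powf(-w20)
}
```

The long tail is visible numerically: with exponential decay at the same half-life, recall probability at 10x the stability would be vanishingly small, while the power-law form keeps it comfortably above zero.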

### Q: "Does this work with models other than Claude?"

```
Yes. Vestige speaks MCP (Model Context Protocol), which is supported by Claude,
Cursor, VS Code Copilot, JetBrains, Windsurf, and others. Any MCP-compatible
client can use it.

The CLAUDE.md configuration in the repo tells the AI when and how to use the
memory tools, but the underlying server is model-agnostic. You could write
equivalent instructions for any model that supports MCP tool calling.
```

### Q: "Why AGPL-3.0?"

```
To prevent cloud providers from hosting Vestige as a competing service without
contributing back. AGPL requires that if you serve the software over a network,
you must open-source your modifications.

The core memory system is fully open source and always will be. If you run it
locally (which is the intended use case), AGPL is functionally identical to GPL.
```

### Q: "What's the performance like? SQLite seems like it would be slow for vectors."

```
SQLite handles the keyword search (FTS5 — very fast). Vector search uses USearch
HNSW with int8 quantization, which is separate from SQLite.

Benchmarks on an M1 MacBook Pro:

100 memories: <10ms search
1,000 memories: <50ms search
10,000 memories: <200ms search
100,000 memories: <1s search

cosine_similarity: 296ns
RRF fusion: 17µs
Embedding generation: ~100ms per memory (only on ingest)

For personal use (hundreds to a few thousand memories), search is essentially
instant. The bottleneck is embedding generation, which only happens when storing
new memories.
```
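
The RRF fusion step benchmarked above merges the FTS5 keyword ranking and the HNSW vector ranking into one score per document. A minimal sketch of standard Reciprocal Rank Fusion; the signature is an assumption, and `k = 60` is the conventional constant from the RRF literature, not necessarily what Vestige uses:

```rust
use std::collections::HashMap;

/// Fuse several rankings (each a list of doc ids, best first) into one
/// scored list: score(d) = sum over rankings of 1 / (k + rank(d)).
fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc_id) in ranking.iter().enumerate() {
            // rank is 0-based; RRF uses 1-based ranks.
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity, which is why it's a popular choice for hybrid search.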

### Q: "How does the 3D dashboard work? Is it practical or just eye candy?"

```
Both, honestly. The 3D force-directed graph is genuinely useful for seeing
clusters of related knowledge, discovering memories you forgot about, and
watching the dreaming process in real time (memories pulse purple as they're
replayed).

But the primary UX for interacting with memories is through your AI. The
dashboard is a monitoring and exploration tool, not the main interface. You
talk to Claude/Cursor/etc normally, and Vestige handles memory in the
background via MCP tool calls.

The dashboard is embedded in the binary via include_dir! — no separate server.
It's served on localhost:3927/dashboard alongside the MCP stdio server.
```

### Q: "Is the 'dreaming' feature actually doing anything useful?"

```
It does three things:

1. Replays recent memories and computes pairwise semantic similarity to
   discover connections you didn't explicitly create.

2. Identifies memories that should be linked based on content overlap,
   creating an associative network.

3. Synthesizes short insights from clusters of related memories.

Is it as sophisticated as hippocampal replay during NREM sleep? No. It's an
engineering approximation that runs the same kind of "offline processing" —
reviewing, connecting, consolidating — that biological memory consolidation
does during sleep. The connections it discovers are sometimes genuinely
surprising and useful.
```

---

## 3. Reddit Cross-Posts

### r/rust

**Title:** `Vestige v2.0 — 77K LOC Rust memory system with FSRS-6, HNSW, Axum WebSockets, and an embedded SvelteKit dashboard in a 22MB binary`

**Body:**

````markdown
I've been building Vestige for the past few months and just shipped v2.0. It's
a cognitive memory system for AI agents that implements neuroscience-backed
memory algorithms in pure Rust.

**Rust-specific highlights:**

- **Single binary deployment**: SvelteKit dashboard compiled to static files,
  then embedded into the Rust binary via `include_dir!`. The entire system —
  MCP server, HTTP dashboard, WebSocket event bus, embedding inference — ships
  as one 22MB binary.

- **fastembed vendored fork**: We vendor a fork of the fastembed-rs crate for
  ONNX embedding inference (Nomic Embed Text v1.5, 768-dim vectors truncated
  to 256 via Matryoshka). Feature flags for Nomic v2 MoE and Metal GPU
  acceleration.

- **Axum 0.8 + tokio broadcast**: Dashboard runs on Axum with WebSocket
  upgrade at `/ws`. A single `tokio::broadcast::channel(1024)` propagates
  events from the MCP stdio server to all connected dashboard clients.

- **USearch HNSW**: Vector similarity search via USearch with int8
  quantization (M=16, efConstruction=128, efSearch=64). Criterion benchmarks:
  cosine_similarity at 296ns, RRF fusion at 17µs.

- **SQLite + FTS5**: rusqlite 0.38 with WAL mode, reader/writer connection
  split, FTS5 porter tokenizer for keyword search. Interior mutability via
  `Mutex<Connection>` — all Storage methods take `&self`.

- **Rust 2024 edition**: Using `use<'_>` captures in RPITIT and the latest
  edition features. MSRV 1.85.

- **Release profile**: `lto = true`, `codegen-units = 1`, `opt-level = "z"`,
  `strip = true` gets the binary down to 22MB including embedded assets.

- **734 tests**: 352 core + 378 mcp + 4 doctests. Zero warnings.

**Architecture:**

```
MCP Client (Claude/Cursor/etc)
        | stdio JSON-RPC
        v
McpServer (rmcp 0.14)
        |
        +---> Arc<Storage> (SQLite + HNSW)
        |
        +---> Arc<Mutex<CognitiveEngine>> (29 modules)
        |
        +---> broadcast::Sender<VestigeEvent>
                  |
                  v
          Axum Dashboard (port 3927)
                  |
                  +---> /ws (WebSocket)
                  +---> /api/* (REST)
                  +---> /dashboard/* (SvelteKit static)
```

The cognitive engine implements FSRS-6 spaced repetition, prediction error
gating, synaptic tagging, spreading activation, and memory dreaming. Each
module is a stateful struct initialized once at startup and shared via Arc.

**What I'd do differently:** The fastembed vendoring is the ugliest part of the
codebase. ONNX Runtime bindings are notoriously painful in Rust, and I spent
more time fighting `ort` than any other dependency. If I started over, I might
explore `candle` as the primary backend instead of ORT.

Source: https://github.com/samvallad33/vestige
License: AGPL-3.0

Happy to discuss any of the Rust architecture decisions.
````
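
The fan-out pattern in the architecture above (one event stream, many dashboard clients) is what `tokio::sync::broadcast` provides. So this sketch runs without pulling in tokio, it uses std `mpsc` channels as a stand-in, one sender clone per subscriber; the event variants and type names are illustrative, not Vestige's actual types:

```rust
use std::sync::mpsc;

/// Illustrative event type; the real VestigeEvent has many more variants.
#[derive(Clone, Debug, PartialEq)]
enum VestigeEvent {
    MemoryCreated { id: u64 },
    SearchPerformed { query: String },
}

/// Dependency-free stand-in for tokio::sync::broadcast: each subscriber gets
/// its own channel, and broadcast clones the event to every one of them.
struct EventBus {
    subscribers: Vec<mpsc::Sender<VestigeEvent>>,
}

impl EventBus {
    fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    fn subscribe(&mut self) -> mpsc::Receiver<VestigeEvent> {
        let (tx, rx) = mpsc::channel();
        self.subscribers.push(tx);
        rx
    }

    fn broadcast(&self, event: VestigeEvent) {
        for tx in &self.subscribers {
            // Ignore send errors from disconnected clients.
            let _ = tx.send(event.clone());
        }
    }
}
```

The tokio version additionally bounds the channel (the `1024` capacity in the post) and drops the oldest events for slow receivers, which matters when a dashboard tab falls behind a burst of cognitive events.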

---

### r/ClaudeAI

**Title:** `Vestige v2.0 "Cognitive Leap" — give Claude real long-term memory with neuroscience-backed forgetting, a 3D dashboard, and 21 MCP tools`

**Body:**

````markdown
Vestige gives Claude persistent memory across sessions using real cognitive
science instead of just dumping everything into a database.

**The problem it solves:** Every Claude conversation starts from zero. Even with
native memory, you have no control over what's stored, how it decays, or where
it lives. Vestige gives Claude a proper long-term memory system that runs 100%
locally on your machine.

**What makes it different from other MCP memory servers:**

- **Memories decay like real memories.** FSRS-6 spaced repetition (the same
  algorithm Anki uses) models forgetting curves. Memories you use get stronger.
  Memories you ignore fade. Important events retroactively strengthen recent
  memories.

- **Smart deduplication.** When Claude tries to save something similar to what
  it already knows, Prediction Error Gating decides whether to create, merge,
  or just reinforce the existing memory. No manual cleanup needed.

- **29 cognitive modules** implementing findings from memory research: dual-
  strength model, testing effect, synaptic tagging, spreading activation,
  context-dependent retrieval, memory dreaming.

**v2.0 new features:**

- **3D Memory Dashboard** at localhost:3927/dashboard — watch Claude's mind in
  real time. Memories pulse when accessed, burst particles on creation, golden
  lines when connections form. SvelteKit + Three.js with bloom post-processing.

- **Real-time event bus** — every cognitive operation (search, dream,
  consolidation) broadcasts WebSocket events to the dashboard.

- **HyDE query expansion** — dramatically better search for conceptual queries.

- **Single 22MB binary** — everything embedded, no Docker, no Node, no cloud.

**Setup (2 minutes):**

```bash
curl -L https://github.com/samvallad33/vestige/releases/latest/download/vestige-mcp-aarch64-apple-darwin.tar.gz | tar -xz
sudo mv vestige-mcp vestige vestige-restore /usr/local/bin/
claude mcp add vestige vestige-mcp -s user
```

Then add the CLAUDE.md instructions from the repo to tell Claude how to use
memory tools automatically.

**What it's like in practice:** After 2 months of daily use, Claude remembers
my coding patterns, my architectural decisions, my preferences across every
project. New sessions start with context instead of a blank slate. It knows I
prefer Rust over Go, that I use Tailwind, and that my last debugging session
on Project X ended with a tricky race condition in the WebSocket handler.

It's the difference between talking to someone with amnesia vs. someone who
actually knows you.

21 MCP tools. 77,840 lines of Rust. 734 tests. Works with Claude Code, Claude
Desktop, Cursor, VS Code Copilot, JetBrains, Windsurf, and Xcode.

Source: https://github.com/samvallad33/vestige

Happy to answer questions or help with setup.
````

---

### r/LocalLLaMA

**Title:** `Vestige v2.0 — local-first AI memory server with FSRS-6 spaced repetition, ONNX embeddings, and zero cloud dependency (77K LOC Rust, 22MB binary)`

**Body:**

```markdown
Vestige is a memory system for AI agents that runs entirely on your machine.
No cloud, no API keys, no telemetry. After the first-run embedding model
download (~130MB), it makes zero network requests.

**Why this matters for local LLM setups:**

Most memory/RAG solutions assume cloud embeddings (OpenAI, Cohere, etc.).
Vestige embeds locally via ONNX (Nomic Embed Text v1.5, 768-dim vectors) and
stores everything in a local SQLite database. If you're already running local
models, your memory system should be local too.

**The cognitive science angle:**

This isn't just another vector database wrapper. It implements real memory
algorithms:

- **FSRS-6**: The state-of-the-art spaced repetition algorithm (power-law
  forgetting, 21 parameters, trained on 700M+ Anki reviews). Memories decay
  naturally instead of living forever.

- **Prediction Error Gating**: On ingest, compares new content against existing
  memories. Creates/merges/reinforces based on novelty. Prevents bloat.

- **Testing Effect**: Searching for a memory automatically strengthens it.
  Memory improves through use.

- **Dual-strength model**: Storage strength (how well-encoded, never decreases)
  vs. retrieval strength (how accessible now, decays over time). Mimics the
  tip-of-the-tongue phenomenon.

- **Memory Dreaming**: Offline consolidation that replays memories to discover
  connections. Inspired by hippocampal replay during sleep.

**Technical stack:**

- Rust, single 22MB binary
- Nomic Embed Text v1.5 via ONNX (local, ~130MB model)
- Optional: Nomic v2 MoE (475M params, feature flag), Metal GPU (Apple Silicon)
- SQLite FTS5 for keyword search + USearch HNSW for vector search
- MCP protocol (works with Claude, Cursor, VS Code Copilot, etc.)
- Axum HTTP + WebSocket for dashboard
- SvelteKit + Three.js 3D visualization (embedded in binary)

**v2.0 highlights:**

- 3D force-directed memory graph with real-time WebSocket events
- HyDE query expansion (template-based hypothetical document embeddings)
- FSRS decay visualization with retention curves
- 734 tests, 29 cognitive modules, 21 tools
- fastembed 5.11 with feature flags for Nomic v2 MoE + Qwen3 reranker

**Performance:**

- Search: <50ms for 1K memories, <200ms for 10K
- Embedding: ~100ms per memory (ingest only)
- cosine_similarity: 296ns (Criterion benchmark)
- Memory: ~100MB for 1K memories, ~300MB for 10K

**For local model users specifically:** Vestige speaks MCP (Model Context
Protocol). If your local model setup supports MCP tool calling, it can use
Vestige directly. The CLAUDE.md instructions in the repo tell the model when
and how to use memory tools — you can adapt these for any model.

The embedding model downloads from Hugging Face on first run and caches
locally. After that, fully air-gapped.

Source: https://github.com/samvallad33/vestige
License: AGPL-3.0 (use freely for local/personal use, cloud service requires
source disclosure)

This is a solo project — feedback, issues, and contributions are very welcome.
```
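
The dual-strength model in the list above separates how well a memory is encoded from how accessible it is right now. A minimal sketch of that idea; the decay shape and constants here are illustrative stand-ins, not Vestige's actual parameters:

```rust
/// Bjork & Bjork style dual-strength trace: storage strength only grows,
/// retrieval strength decays with time and is restored on access.
struct MemoryTrace {
    storage_strength: f64,   // how well encoded; never decreases
    retrieval_strength: f64, // how accessible right now
}

impl MemoryTrace {
    /// Power-law style decay of accessibility, slowed by higher storage
    /// strength (well-encoded memories fade more gently).
    fn decay(&mut self, days: f64) {
        self.retrieval_strength /=
            (1.0 + days / self.storage_strength).powf(0.5);
    }

    /// The Testing Effect: retrieving a memory strengthens both components.
    fn access(&mut self) {
        self.storage_strength += 1.0;
        self.retrieval_strength = 1.0;
    }
}
```

The tip-of-the-tongue case is a trace with high `storage_strength` but low `retrieval_strength`: well known, momentarily inaccessible, and fully restored by a single successful access.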

---

## 4. Posting Strategy

### Timing

- **Show HN**: Tuesday or Wednesday, 8-10 AM EST (peak engagement window)
- **r/rust**: Same day, 1-2 hours after the HN post goes up
- **r/ClaudeAI**: Same day, staggered by 1 hour
- **r/LocalLLaMA**: Same day or next morning

### Rules to Follow

- **HN**: Title is the post. First comment is the body above. Respond to every
  comment within 30 minutes for the first 3 hours. Be humble, technical,
  transparent about limitations. Never say "AI-powered" or "game-changer."
- **Reddit**: Each subreddit gets a tailored post emphasizing what that
  community cares about. No cross-linking between posts. Engage authentically.
- **General**: Lead with the science, not the product. Let the tech speak.
  Acknowledge competitors honestly. Never disparage alternatives.

### Key Messaging Points

1. This is a solo project built on published research, not a startup pitch
2. The neuroscience is real but honestly described (some modules are faithful
   implementations, some are engineering heuristics inspired by research)
3. 100% local, zero cloud — this is a feature, not a limitation
4. The 3D dashboard is a genuine exploration tool, not just eye candy
5. FSRS-6 is the differentiator — no other AI memory system uses real spaced
   repetition

### What NOT to Say

- "Revolutionary" / "game-changing" / "paradigm shift"
- "AI-powered" (it IS AI infrastructure, don't label it that way)
- Anything negative about Mem0, Cognee, or other competitors
- Claims about being "the best" at anything
- Marketing language of any kind