feat: v2.0.4 "Deep Reference" — cognitive reasoning engine + 10 bug fixes

New features: - deep_reference tool (#22): 8-stage cognitive reasoning pipeline with FSRS-6 trust scoring, intent classification (FactCheck/Timeline/RootCause/Comparison/ Synthesis), spreading activation expansion, temporal supersession, trust-weighted contradiction analysis, relation assessment, dream insight integration, and algorithmic reasoning chain generation — all without calling an LLM - cross_reference (#23): backward-compatible alias for deep_reference - retrieval_mode parameter on search (precise/balanced/exhaustive) - get_batch action on memory tool (up to 20 IDs per call) - Token budget raised from 10K to 100K on search + session_context - Dates (createdAt/updatedAt) on all search results and session_context lines Bug fixes (GitHub Issue #25 — all 10 resolved): - state_transitions empty: wired record_memory_access into strengthen_batch - chain/bridges no storage fallback: added with edge deduplication - knowledge_edges dead schema: documented as deprecated - insights not persisted from dream: wired save_insight after generation - find_duplicates threshold dropped: serde alias fix - search min_retention ignored: serde aliases for snake_case params - intention time triggers null: removed dead trigger_at embedding - changelog missing dreams: added get_dream_history + event integration - phantom Related IDs: clarified message text - fsrs_cards empty: documented as harmless dead schema Security hardening: - HTTP transport CORS: permissive() → localhost-only - Auth token panic guard: &token[..8] → safe min(8) slice - UTF-8 boundary fix: floor_char_boundary on content truncation - All unwrap() removed from HTTP transport (unwrap_or_else fallback) - Dream memory_count capped at 500 (prevents O(N²) hang) - Dormant state threshold aligned (0.3 → 0.4) Stats: 23 tools, 758 tests, 0 failures, 0 warnings, 0 unwraps in production Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-22 21:28:08 +02:00 · 2026-04-09 16:15:01 -05:00 · 2026-04-09 16:15:01 -05:00 · 04781a95e2
commit 04781a95e2
parent 61091e06b9
28 changed files with 1797 additions and 102 deletions
--- a/docs/integrations/windsurf.md
+++ b/docs/integrations/windsurf.md
@ -115,7 +115,7 @@ It remembers.

 ## Important: Tool Limit

-Windsurf has a **hard cap of 100 tools** across all MCP servers. Vestige uses 19 tools, leaving plenty of room for other servers.
+Windsurf has a **hard cap of 100 tools** across all MCP servers. Vestige uses 23 tools, leaving plenty of room for other servers.

 ---

--- a/docs/integrations/xcode.md
+++ b/docs/integrations/xcode.md
@ -51,7 +51,7 @@ Quit Xcode completely (Cmd+Q) and reopen your project.

 ### 4. Verify

-Type `/context` in the Agent panel. You should see `vestige` listed with 19 tools.
+Type `/context` in the Agent panel. You should see `vestige` listed with 23 tools.

 ---

--- a/docs/launch/reddit-cross-reference.md
+++ b/docs/launch/reddit-cross-reference.md
@ -0,0 +1,233 @@
+# Reddit Launch Posts — cross_reference Tool
+
+## Post 1: r/ClaudeAI (Primary)
+
+**Title:** `I built a tool that catches when Claude's memories contradict each other before it answers. It's been wrong 0 times since.`
+
+---
+
+I've been building Vestige — an MCP memory server that gives Claude persistent memory across sessions. FSRS-6 spaced repetition (the Anki algorithm), 29 neuroscience-inspired modules, single Rust binary. 1,111 memories stored. It works.
+
+But last week it almost cost me hours of debugging.
+
+Claude confidently told me my AIMO3 competition notebook should use `--enable-prefix-caching` with vLLM. I trusted it. The notebook crashed. Scored 0/50. Burned a daily submission.
+
+The problem? I had TWO memories:
+- **January**: "prefix caching crashes with our vLLM build"  
+- **March**: "prefix caching works with the new animsamuelk wheels"
+
+Claude found both. Picked the wrong one. Gave me a confident wrong answer based on the January memory. The March memory was correct — but Claude had no way to know they conflicted.
+
+So I built `cross_reference`.
+
+Now before answering any factual question, Claude calls:
+
+```
+cross_reference({ query: "should I use --enable-prefix-caching with vLLM?" })
+```
+
+And gets back:
+
+```json
+{
+  "status": "contradictions_found",
+  "confidence": 0.35,
+  "guidance": "WARNING: 1 contradiction detected across 12 memories. 
+               The newer memory is likely more accurate.",
+  "contradictions": [
+    {
+      "newer": {
+        "date": "2026-03-18",
+        "preview": "Switched to animsamuelk wheels which support --enable-prefix-caching..."
+      },
+      "older": {
+        "date": "2026-01-15", 
+        "preview": "prefix caching crashed with our samvalladares vLLM build..."
+      },
+      "recommendation": "Trust the newer memory. Consider demoting the older one."
+    }
+  ],
+  "timeline": [ /* 12 memories sorted newest-first */ ]
+}
+```
+
+Claude sees the contradiction BEFORE answering. Trusts the March memory. Gets it right.
+
+**I've been using it for 2 days. It's caught contradictions I didn't even know I had.** Old project decisions that got reversed. Config values that changed. Library versions that got upgraded. All sitting in memory, waiting to give me wrong answers.
+
+### How it works under the hood:
+
+1. **Broad retrieval** — pulls up to 50 memories related to the query
+2. **Cross-encoder reranking** — filters for quality (Jina Reranker v1 Turbo)
+3. **Pairwise contradiction detection** — every memory pair checked for negation patterns ("don't" vs "do", "deprecated" vs "recommended") and correction signals ("now uses", "switched to", "replaced by")
+4. **Topic gating** — only compares memories that share significant words (prevents false positives between unrelated memories)
+5. **Confidence scoring** — agreements boost confidence, contradictions tank it, recency helps
+6. **Timeline** — everything sorted newest-first so Claude sees the evolution
+7. **Superseded list** — explicitly identifies which old memories should be demoted
+
+### The bigger picture:
+
+Context windows are now 1M tokens (Claude Opus 4.6, GPT-5.4). But bigger context doesn't fix this problem. The "Lost in the Middle" research shows accuracy drops from 92% to 78% at 1M tokens. More context = more chances for contradictions to slip through.
+
+Memory systems need to be SMARTER, not just bigger. That's what Vestige does — 29 cognitive modules implementing real neuroscience:
+
+- **FSRS-6** — memories naturally decay when unused, strengthen when accessed (21 parameters trained on 700M+ reviews)
+- **Prediction Error Gating** — only stores what's genuinely new (no duplicates)
+- **Dream consolidation** — replays memories to discover hidden connections (yes, like sleep)
+- **Spreading activation** — search for "auth bug" and find the related JWT update from last week
+- **cross_reference** — the new tool that catches contradictions before they become wrong answers
+
+### Stats:
+- 22 MCP tools
+- 746 tests, 0 failures
+- Zero `unsafe` code
+- Clean security audit (0 findings — AgentAudit verified)
+- Single 22MB Rust binary — no Docker, no PostgreSQL, no cloud
+- Works with Claude Code, Cursor, VS Code, Xcode 26.3, JetBrains, Windsurf
+
+### Install (30 seconds):
+```bash
+# macOS Apple Silicon
+curl -L https://github.com/samvallad33/vestige/releases/latest/download/vestige-mcp-aarch64-apple-darwin.tar.gz | tar -xz
+sudo mv vestige-mcp /usr/local/bin/
+claude mcp add vestige vestige-mcp -s user
+```
+
+Then add to your CLAUDE.md:
+```
+Before answering factual questions, call cross_reference({ query: "the topic" })
+to verify your memories don't contradict each other.
+```
+
+That's it. Your AI now fact-checks itself.
+
+**GitHub:** https://github.com/samvallad33/vestige
+
+I searched every memory MCP server out there — Mem0 (47K stars), Hindsight, Zep, Letta, OMEGA, Solitaire, MuninnDB. Some detect contradictions at ingest time (when you save). Nobody else gives the AI an on-demand tool to verify its own memories before answering, with confidence scoring and guidance on which memory to trust.
+
+Happy to answer questions about the neuroscience, the architecture, or how to set it up.
+
+---
+
+## Post 2: r/LocalLLaMA (Cross-post, 2h later)
+
+**Title:** `Your AI has 1000 memories. Some contradict each other. It doesn't know. I built the fix — 100% local, single Rust binary, zero cloud.`
+
+---
+
+The problem nobody talks about with AI memory:
+
+You use a memory system. Over months, you accumulate 1000+ memories. Your project config changed 3 times. A library got deprecated. A decision got reversed. All those old memories are still there.
+
+When your AI searches memory, it finds 5 results. Two of them disagree. The AI picks one — maybe the wrong one — and gives you a confident answer based on outdated information.
+
+I built `cross_reference` to fix this. It's tool #22 in Vestige, my cognitive memory MCP server.
+
+```
+cross_reference({ query: "what database does the project use?" })
+```
+
+Returns:
+```json
+{
+  "status": "contradictions_found",
+  "confidence": 0.35,
+  "contradictions": [{
+    "newer": { "date": "2026-03", "preview": "Migrated to PostgreSQL..." },
+    "older": { "date": "2026-01", "preview": "Using SQLite for the backend..." }
+  }],
+  "guidance": "WARNING: Trust the newer memory."
+}
+```
+
+The AI sees the conflict. Picks the right one. Every time.
+
+**What makes this different from RAG/vector search:**
+
+| | RAG | Vestige |
+|---|---|---|
+| Storage | Store everything | Prediction Error Gating — only stores what's new |
+| Retrieval | Nearest neighbor | 7-stage cognitive pipeline with reranking |
+| Contradictions | Returns both, hopes for the best | **Detects and flags before answering** |
+| Decay | Nothing expires | FSRS-6 — memories fade naturally |
+| Duplicates | Manual dedup | Self-healing via semantic comparison |
+
+**The stack:**
+- Rust 2024 edition. Single 22MB binary. No Python, no Docker, no cloud.
+- FSRS-6 spaced repetition (21 parameters, 700M+ reviews)
+- 29 cognitive modules (spreading activation, synaptic tagging, dream consolidation, prediction error gating)
+- 3D neural dashboard at localhost:3927 (Three.js, real-time WebSocket)
+- MCP server — works with any AI that speaks MCP
+
+**100% local. Your data never leaves your machine.**
+
+```bash
+curl -L https://github.com/samvallad33/vestige/releases/latest/download/vestige-mcp-aarch64-apple-darwin.tar.gz | tar -xz
+sudo mv vestige-mcp /usr/local/bin/
+claude mcp add vestige vestige-mcp -s user
+```
+
+746 tests. Zero unsafe code. Clean security audit. AGPL-3.0.
+
+GitHub: https://github.com/samvallad33/vestige
+
+---
+
+## Post 3: r/rust (Optional, technical audience)
+
+**Title:** `I built a 22MB Rust binary that gives AI agents a brain — FSRS-6, 29 cognitive modules, 3D dashboard, and a new contradiction detection tool. 746 tests, zero unsafe.`
+
+---
+
+Vestige is a cognitive memory engine for AI agents. MCP server (stdio JSON-RPC + HTTP transport), SQLite WAL backend, USearch HNSW vector search, Nomic Embed v1.5 via fastembed (local ONNX, no API).
+
+The latest addition: `cross_reference` — pairwise contradiction detection across memories at retrieval time. The AI calls it before answering factual questions to verify its memories don't disagree.
+
+**Why Rust:**
+- Single static binary (22MB with embedded SvelteKit dashboard via `include_dir!`)
+- No runtime, no GC pauses during real-time search
+- `tokio::sync::Mutex` for the cognitive engine, `std::sync::Mutex` for SQLite reader/writer split
+- Zero `unsafe` blocks in the entire codebase
+- `cargo test` runs 746 tests in 11 seconds
+
+**Architecture:**
+```
+Axum 0.8 (dashboard + HTTP transport)
+  ↕ WebSocket event bus (tokio::broadcast, 1024 capacity)
+MCP Server (stdio JSON-RPC)
+  → 22 tools dispatched via match on tool name
+  → Arc<Storage> + Arc<Mutex<CognitiveEngine>>
+SQLite WAL + FTS5 + USearch HNSW
+  → fastembed 5.11 (Nomic Embed v1.5, 768D → 256D Matryoshka)
+  → Jina Reranker v1 Turbo (cross-encoder, 38M params)
+```
+
+**Key crates:** rusqlite 0.38, axum 0.8, tokio 1, fastembed 5.11, usearch 2, chrono 0.4, serde 1, uuid 1, include_dir 0.7
+
+**What I learned building it:**
+- `OnceLock<Result<Mutex<TextEmbedding>, String>>` for lazy model initialization — the embedding model downloads ~130MB on first run, `OnceLock` ensures it only happens once and caches the error if it fails
+- `floor_char_boundary()` saved me from a UTF-8 panic (content truncation with multi-byte chars)
+- SQLite `PRAGMA journal_mode = WAL` + `synchronous = NORMAL` + `mmap_size = 268435456` gives surprisingly good concurrent read performance
+- `try_lock()` pattern for cognitive features in search — if the lock is held (by dream consolidation), search falls back to simpler scoring instead of blocking
+
+Clean security audit. Parameterized SQL everywhere. CSP headers on the dashboard. Constant-time auth comparison (`subtle::ConstantTimeEq`). File permissions 0o600/0o700.
+
+GitHub: https://github.com/samvallad33/vestige  
+AGPL-3.0 | 746 tests | 79K+ LOC
+
+---
+
+## Posting Strategy
+
+| Subreddit | When | Expected Engagement |
+|-----------|------|-------------------|
+| r/ClaudeAI | First — 12-14 UTC (US morning) | High — direct audience for MCP tools |
+| r/LocalLLaMA | 2h after r/ClaudeAI | High — local-first angle resonates here |
+| r/rust | Same day, evening UTC | Medium — technical deep dive audience |
+| r/MachineLearning | Next day if first two do well | Lower but prestigious |
+
+**Title formula that works on Reddit:** Personal story + specific problem + "nobody else has this"
+
+**DO NOT do:** Product-name-first titles, marketing speak, "introducing X", "check out my project"
+
+**DO:** Lead with the PAIN ("my AI gave me wrong answers"), show the FIX (the JSON output), then reveal the tool.