mirror of
https://github.com/samvallad33/vestige.git
synced 2026-05-09 15:52:37 +02:00
The Stage 8 `recommended` selector and the evidence sort both rank by
FSRS-6 trust only, discarding the `combined_score` signal that the
upstream hybrid_search + cross-encoder reranker just computed. Confidence
is then derived from `recommended.trust + evidence_count`, neither of
which moves with the query — so any query against the same corpus
returns the same primary memory and the same confidence score.
Empirical reproduction (15 deep_reference probes against an 11-memory
corpus, 9 with a unique correct answer + 6 with no relevant memories):
- Distinct primary memories returned : 1 / 15
- Confidence values returned : 1 distinct (0.82 for all)
- Ground-truth accuracy on specific queries : 1 / 9 (11.1%)
The single hit is coincidental: the always-returned memory happened to
be the correct answer for one query. Picking uniformly at random from
the 11-memory corpus would score about 9% (1/11), so the tool is
performing at chance level.
Fix
---
Replace trust-only ranking at three sites with a 50/50 composite of
combined_score (query relevance) and FSRS-6 trust:
    let composite = |s: &ScoredMemory| s.combined_score as f64 * 0.5 + s.trust * 0.5;
Used in:
- cross_reference.rs:573 — `recommended` max_by
- cross_reference.rs:589 — `non_superseded` evidence sort_by
- cross_reference.rs:622 — `base_confidence` formula
The 50/50 weighting is a design choice — see PR body for the knob to
tweak if a different blend is preferred. The pre-existing updated_at
tiebreaker is preserved.
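
A minimal sketch of the composite ranking described above. The
`ScoredMemory` struct here is a hypothetical stand-in that models only
the fields named in this message, and `updated_at` is assumed to be an
integer timestamp. The real type in cross_reference.rs may differ.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the crate's scored-memory type.
#[derive(Debug, Clone)]
struct ScoredMemory {
    id: &'static str,
    combined_score: f32, // query relevance from hybrid search + reranker
    trust: f64,          // FSRS-6 trust, independent of the query
    updated_at: i64,     // assumed: timestamp used as the tiebreaker
}

/// 50/50 blend of query relevance and trust.
fn composite(s: &ScoredMemory) -> f64 {
    s.combined_score as f64 * 0.5 + s.trust * 0.5
}

/// Order by composite score, then by recency on ties.
fn by_composite(a: &ScoredMemory, b: &ScoredMemory) -> Ordering {
    composite(a)
        .total_cmp(&composite(b))
        .then_with(|| a.updated_at.cmp(&b.updated_at))
}

/// Highest composite wins; the newer memory wins a tie.
fn recommended(memories: &[ScoredMemory]) -> Option<&ScoredMemory> {
    memories.iter().max_by(|a, b| by_composite(a, b))
}

/// Sort evidence best-first with the same ordering.
fn sort_evidence(memories: &mut [ScoredMemory]) {
    memories.sort_by(|a, b| by_composite(b, a));
}
```

Because `combined_score` changes with the query, both the selected
memory and anything derived from `composite` now vary per query.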
Tests
-----
Two regression tests, both verified to FAIL on `main` and PASS with the
fix via negative control (temporarily set the composite weights to
1.0 trust + 0.0 relevance and confirmed both tests fail again):
- test_recommended_uses_query_relevance_not_just_trust
  Two-memory corpus, ingested in order so the off-topic memory wins
  the trust tiebreaker. Query targets the on-topic memory. The fix
  ensures `recommended` is the on-topic one.
- test_confidence_varies_with_query_relevance
  Single-memory corpus. Identical execute() calls with a relevant
  query and an irrelevant query. The fix ensures the relevant
  query produces higher confidence.
Full crate suite: 410 / 410 passing (was 408 + 2 new).
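
The negative control can be illustrated in isolation by making the
weights parameters. This is only a sketch: `Candidate`, `pick`, and
`ingest_order` are hypothetical names, and the real tests go through
execute() rather than a helper like this.

```rust
/// Hypothetical candidate with equal trust across the corpus, so that
/// trust-only ranking falls through to the tiebreaker.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    label: &'static str,
    relevance: f64,
    trust: f64,
    ingest_order: u32, // later ingestion wins ties
}

/// Pick the best candidate under a given relevance/trust weighting.
fn pick(candidates: &[Candidate], w_relevance: f64, w_trust: f64) -> Option<&Candidate> {
    candidates.iter().max_by(|a, b| {
        let score_a = a.relevance * w_relevance + a.trust * w_trust;
        let score_b = b.relevance * w_relevance + b.trust * w_trust;
        score_a
            .total_cmp(&score_b)
            .then_with(|| a.ingest_order.cmp(&b.ingest_order))
    })
}
```

With weights (0.5, 0.5) the on-topic candidate should win; with
(0.0, 1.0) the scores tie and the later-ingested off-topic candidate
wins, which is the failure the regression tests guard against.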
Out of scope
------------
While running the live MCP probes I observed two further inconsistencies
in `cross_reference.rs` that I cannot reproduce in cargo test (the
synthetic test environment with mock embeddings does not trigger the
required combined_score > 0.2 floor condition):
- The `effective_sim` floor at line 551 fabricates contradictions
  between memories with no real topical overlap when one contains a
  CORRECTION_SIGNALS keyword.
- The Stage 5 `contradictions` field (strict) and the Stage 7
  `pair_relations` feeding the reasoning text (loose, post-floor)
  disagree, producing responses where `reasoning` claims N
  contradictions while `contradictions` is empty and `status` is
  "resolved".
I have empirical data for both from live MCP usage but no reproducible
cargo test, so they are intentionally not addressed in this PR. Happy to
file them as a separate issue with the raw probe data if useful.
Vestige MCP Server
A bleeding-edge Rust MCP (Model Context Protocol) server for Vestige that gives Claude and other AI assistants long-term memory capabilities.
Features
- FSRS-6 Algorithm: State-of-the-art spaced repetition (21 parameters, personalized decay)
- Dual-Strength Memory Model: Based on Bjork & Bjork 1992 cognitive science research
- Local Semantic Embeddings: nomic-embed-text-v1.5 (768d) via fastembed v5 (no external API)
- HNSW Vector Search: USearch-based, 20x faster than FAISS
- Hybrid Search: BM25 + semantic with RRF fusion
- Codebase Memory: Remember patterns, decisions, and context
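
As an illustration of the RRF fusion step named in the Hybrid Search feature, here is a minimal sketch of Reciprocal Rank Fusion over two ranked ID lists. The function name `rrf_fuse` and the constant `k = 60` are assumptions for the example (60 is a commonly used value); Vestige's own implementation and parameters may differ.

```rust
use std::collections::HashMap;

/// Fuse ranked lists of document IDs with Reciprocal Rank Fusion.
/// Each list is ordered best-first. A document's fused score is the sum
/// of 1 / (k + rank) over every list it appears in (rank starts at 1).
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; break ties by ID for a stable order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because RRF only uses ranks, it can combine BM25 scores and embedding similarities without normalizing them onto a common scale.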
Installation
    cd /path/to/vestige/crates/vestige-mcp
    cargo build --release

The binary will be at `target/release/vestige-mcp`.
Claude Desktop Configuration
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
    {
      "mcpServers": {
        "vestige": {
          "command": "/path/to/vestige-mcp"
        }
      }
    }
Available Tools
Core Memory
| Tool | Description |
|---|---|
| `ingest` | Add new knowledge to memory |
| `recall` | Search and retrieve memories |
| `semantic_search` | Find conceptually similar content |
| `hybrid_search` | Combined keyword + semantic search |
| `get_knowledge` | Retrieve a specific memory by ID |
| `delete_knowledge` | Delete a memory |
| `mark_reviewed` | Review with FSRS rating (1-4) |
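
The 1-4 rating accepted by `mark_reviewed` follows the usual FSRS grade scale (Again, Hard, Good, Easy). A small sketch of validating such a rating on the caller's side is shown below; the `Rating` type is hypothetical and is not taken from the Vestige source.

```rust
use std::convert::TryFrom;

/// The four FSRS review grades and their numeric values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Rating {
    Again = 1,
    Hard = 2,
    Good = 3,
    Easy = 4,
}

impl TryFrom<u8> for Rating {
    type Error = String;

    /// Reject anything outside 1-4 instead of silently clamping it.
    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            1 => Ok(Rating::Again),
            2 => Ok(Rating::Hard),
            3 => Ok(Rating::Good),
            4 => Ok(Rating::Easy),
            other => Err(format!("rating must be 1-4, got {}", other)),
        }
    }
}
```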
Statistics & Maintenance
| Tool | Description |
|---|---|
| `get_stats` | Memory system statistics |
| `health_check` | System health status |
| `run_consolidation` | Apply decay, generate embeddings |
Codebase Tools
| Tool | Description |
|---|---|
| `remember_pattern` | Remember code patterns |
| `remember_decision` | Remember architectural decisions |
| `get_codebase_context` | Get patterns and decisions |
Available Resources
Memory Resources
| URI | Description |
|---|---|
| `memory://stats` | Current statistics |
| `memory://recent?n=10` | Recent memories |
| `memory://decaying` | Low retention memories |
| `memory://due` | Memories due for review |
Codebase Resources
| URI | Description |
|---|---|
| `codebase://structure` | Known codebases |
| `codebase://patterns` | Remembered patterns |
| `codebase://decisions` | Architectural decisions |
Example Usage (with Claude)
User: Remember that we decided to use FSRS-6 instead of SM-2 because it's 20-30% more efficient.
Claude: [calls remember_decision]
I've recorded that architectural decision.
User: What decisions have we made about algorithms?
Claude: [calls get_codebase_context]
I found 1 decision:
- We decided to use FSRS-6 instead of SM-2 because it's 20-30% more efficient.
Data Storage
- Database: `~/Library/Application Support/com.vestige.mcp/vestige-mcp.db` (macOS)
- Uses SQLite with FTS5 for full-text search
- Vector embeddings stored in separate table
Protocol
- JSON-RPC 2.0 over stdio
- MCP Protocol Version: 2024-11-05
- Logging to stderr (stdout reserved for JSON-RPC)
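
To make the stdout/stderr split concrete, here is a rough sketch of how a client might frame one JSON-RPC 2.0 request as a single newline-terminated line while keeping diagnostics on a separate stream. The helper names `frame_request` and `send` are made up for this example, the `params` payload is passed in as pre-serialized JSON with no escaping, and the empty `arguments` object in the usage note is a placeholder because the tool schemas are not listed here.

```rust
use std::io::{self, Write};

/// Build a JSON-RPC 2.0 request as one line of JSON.
/// `params_json` must already be valid JSON without embedded newlines.
fn frame_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        id, method, params_json
    )
}

/// Write the message to the protocol stream and a note to the log stream.
/// With a real server, `out` is the child's stdin and `log` is stderr.
fn send<W: Write, L: Write>(out: &mut W, log: &mut L, message: &str) -> io::Result<()> {
    writeln!(log, "sending {} bytes", message.len())?;
    writeln!(out, "{}", message)?;
    out.flush()
}
```

For example, `frame_request(1, "tools/call", r#"{"name":"ingest","arguments":{}}"#)` produces a request that invokes the `ingest` tool. Anything written to the protocol stream other than JSON-RPC messages would corrupt it, which is why logs go to stderr.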
License
MIT