The Stage 8 `recommended` selector and the evidence sort both rank by
FSRS-6 trust only, discarding the `combined_score` signal that the
upstream hybrid_search + cross-encoder reranker just computed. Confidence
is then derived from `recommended.trust + evidence_count`, neither of
which moves with the query — so any query against the same corpus
returns the same primary memory and the same confidence score.
Empirical reproduction (15 deep_reference probes against an 11-memory
corpus, 9 with a unique correct answer + 6 with no relevant memories):
- Distinct primary memories returned : 1 / 15
- Confidence values returned : 1 distinct (0.82 for all)
- Ground-truth accuracy on specific queries : 1 / 9 (11.1%)
The single hit is coincidental: the always-returned memory happened to
be the correct answer for one query. Random guessing across the 11-memory
corpus would be ~9% baseline, so the tool is performing at random.
Fix
---
Replace trust-only ranking at three sites with a 50/50 composite of
combined_score (query relevance) and FSRS-6 trust:
let composite = |s: &ScoredMemory| s.combined_score as f64 * 0.5 + s.trust * 0.5;
Used in:
- cross_reference.rs:573 — `recommended` max_by
- cross_reference.rs:589 — `non_superseded` evidence sort_by
- cross_reference.rs:622 — `base_confidence` formula
The 50/50 weighting is a design choice — see PR body for the knob to
tweak if a different blend is preferred. The pre-existing updated_at
tiebreaker is preserved.
Tests
-----
Two regression tests, both verified to FAIL on `main` and PASS with the
fix via negative control (temporarily set the composite weights to
1.0 trust + 0.0 relevance and confirmed both tests fail again):
- test_recommended_uses_query_relevance_not_just_trust
Two-memory corpus, ingested in order so the off-topic memory wins
the trust tiebreaker. Query targets the on-topic memory. The fix
ensures `recommended` is the on-topic one.
- test_confidence_varies_with_query_relevance
Single-memory corpus. Identical execute() calls with a relevant
query and an irrelevant query. The fix ensures the relevant
query produces higher confidence.
Full crate suite: 410 / 410 passing (was 408 + 2 new).
Out of scope
------------
While running the live MCP probes I observed two further inconsistencies
in `cross_reference.rs` that I cannot reproduce in cargo test (the
synthetic test environment with mock embeddings does not trigger the
required combined_score > 0.2 floor condition):
- The `effective_sim` floor at line 551 fabricates contradictions
between memories with no real topical overlap when one contains a
CORRECTION_SIGNALS keyword.
- The Stage 5 `contradictions` field (strict) and the Stage 7
`pair_relations` feeding the reasoning text (loose, post-floor)
disagree, producing responses where `reasoning` claims N
contradictions while `contradictions` is empty and `status` is
"resolved".
I have empirical data for both from live MCP usage but no reproducible
cargo test, so they are intentionally not addressed in this PR. Happy to
file them as a separate issue with the raw probe data if useful.
Collapsed nested if statements into single conditions using
let-chains (if a && let Ok(b) = ...). Fixes CI clippy failures
on both macOS and Ubuntu.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The public JSON schema in schema() declares `in_minutes` and `file_pattern`
in snake_case, but TriggerSpec uses `#[serde(rename_all = "camelCase")]`
which makes serde expect `inMinutes` / `filePattern`. Snake_case inputs are
silently dropped to None, so time-based intentions with `in_minutes` never
fire (triggerAt becomes null) and file_pattern-only context intentions
never match.
Added `#[serde(alias = ...)]` so both naming conventions deserialize
correctly — purely additive, existing camelCase callers unaffected.
Two regression tests added, verified to FAIL without the aliases
(negative control confirmed the snake_case duration test sees
`triggerAt: null` and the file_pattern test sees an empty `triggered`
array). Both pass with the fix. Full crate suite: 408/408 passing.
Related to #25 (Bug #8 was half-fixed — check-side re-derivation works,
but the set-side was still dropping the value before it could be persisted).
- Removed vestige-agent and vestige-agent-py from workspace members
(ARC-AGI-3 code, not part of Vestige release — caused CI failure)
- Improved deep_reference reasoning chain: fuller output with arrows on
supersession reasoning, longer primary finding preview, fallback message
when no relations found, boosted relation detection for search results
with high combined_score
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
explore_connections and memory_graph returned empty results because
in-memory cognitive modules were never loaded from the database.
Connections were persisting to SQLite correctly (795 in production)
but the query path only checked empty ActivationNetwork.
- Add CognitiveEngine::hydrate() to load connections at startup
- Add storage fallback in explore_connections associations
- Hydrate live engine after dream persists new connections
- Add error logging for save_connection failures
- Add 7 integration tests for the full round-trip
Closes#14
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3 new MCP tools (16 → 19 total):
- importance_score: 4-channel neuroscience importance scoring (novelty/arousal/reward/attention)
- session_checkpoint: batch smart_ingest up to 20 items with PE Gating
- find_duplicates: cosine similarity clustering with union-find for dedup
CLI: vestige ingest command for memory ingestion via command line
Core: made get_node_embedding public, added get_all_embeddings for dedup scanning
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add web dashboard (axum) on port 3927 with memory browser, search, and
system stats. New MCP tools: memory_timeline, memory_changelog,
health_check, consolidate, stats, backup, export, gc. Search now supports
detail_level (brief/summary/full) to control token usage. Add backup_to()
and get_recent_state_transitions() to storage layer. Bump to v1.2.0.
- Route ingest tool through smart_ingest (Prediction Error Gating) to
prevent duplicate memories when content is similar to existing entries
- Fix Intel Mac release build: use macos-13 runner for x86_64-apple-darwin
(macos-latest is now ARM64, causing silent cross-compile failures)
- Sync npm package version to 1.1.2 (was 1.0.0 in package.json, 1.1.0
in postinstall.js BINARY_VERSION)
- Add vestige-restore to npm makeExecutable list
- Remove abandoned packages/core/ TypeScript package (pre-Rust implementation
referencing FSRS-5, chromadb, ollama — 32K lines of dead code)
- Sync workspace Cargo.toml version to 1.1.2
Closes#5
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add #![allow(dead_code)] to deprecated tool modules (kept for
backwards compatibility but not exposed in MCP tool list)
- Mark unused functions with #[allow(dead_code)] annotations
- Fix unused variable warnings (prefix with _)
- Apply clippy auto-fixes for redundant closures and derives
- Fix test to account for protocol version negotiation
- Reorganize tools/mod.rs to clarify active vs deprecated tools
Security review: LOW RISK - no critical vulnerabilities found
Dead code review: deprecated tools properly annotated
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
FSRS-6 spaced repetition, spreading activation, synaptic tagging,
hippocampal indexing, and 130 years of memory research.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>