vestige

mirror of https://github.com/samvallad33/vestige.git synced 2026-07-22 23:31:02 +02:00

NoahToKnow 9c022a0f54 fix(deep_reference): incorporate query relevance into recommended/confidence

The Stage 8 `recommended` selector and the evidence sort both rank by FSRS-6 trust only, discarding the `combined_score` signal that the upstream hybrid_search + cross-encoder reranker just computed. Confidence is then derived from `recommended.trust + evidence_count`, neither of which moves with the query, so any query against the same corpus returns the same primary memory and the same confidence score.

Empirical reproduction (15 deep_reference probes against an 11-memory corpus, 9 with a unique correct answer + 6 with no relevant memories):

- Distinct primary memories returned         : 1 / 15
- Confidence values returned                 : 1 distinct (0.82 for all)
- Ground-truth accuracy on specific queries  : 1 / 9 (11.1%)

The single hit is coincidental: the always-returned memory happened to be the correct answer for one query. Random guessing across the 11-memory corpus would be ~9% baseline, so the tool is performing at random.

Fix
---

Replace trust-only ranking at three sites with a 50/50 composite of combined_score (query relevance) and FSRS-6 trust:

    let composite = |s: &ScoredMemory| s.combined_score as f64 * 0.5 + s.trust * 0.5;

Used in:

- cross_reference.rs:573: `recommended` max_by
- cross_reference.rs:589: `non_superseded` evidence sort_by
- cross_reference.rs:622: `base_confidence` formula

The 50/50 weighting is a design choice; see PR body for the knob to tweak if a different blend is preferred. The pre-existing updated_at tiebreaker is preserved.

Tests
-----

Two regression tests, both verified to FAIL on `main` and PASS with the fix via negative control (temporarily set the composite weights to 1.0 trust + 0.0 relevance and confirmed both tests fail again):

- test_recommended_uses_query_relevance_not_just_trust
  Two-memory corpus, ingested in order so the off-topic memory wins the trust tiebreaker. Query targets the on-topic memory. The fix ensures `recommended` is the on-topic one.
- test_confidence_varies_with_query_relevance
  Single-memory corpus. Identical execute() calls with a relevant query and an irrelevant query. The fix ensures the relevant query produces higher confidence.

Full crate suite: 410 / 410 passing (was 408 + 2 new).

Out of scope
------------

While running the live MCP probes I observed two further inconsistencies in `cross_reference.rs` that I cannot reproduce in cargo test (the synthetic test environment with mock embeddings does not trigger the required combined_score > 0.2 floor condition):

- The `effective_sim` floor at line 551 fabricates contradictions between memories with no real topical overlap when one contains a CORRECTION_SIGNALS keyword.
- The Stage 5 `contradictions` field (strict) and the Stage 7 `pair_relations` feeding the reasoning text (loose, post-floor) disagree, producing responses where `reasoning` claims N contradictions while `contradictions` is empty and `status` is "resolved".

I have empirical data for both from live MCP usage but no reproducible cargo test, so they are intentionally not addressed in this PR. Happy to file them as a separate issue with the raw probe data if useful.

2026-04-09 20:09:56 -06:00
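
The commit message above describes replacing trust-only ranking with a 50/50 composite of query relevance and FSRS-6 trust. The following is a minimal, self-contained sketch of that idea, not the code in `cross_reference.rs`. The `ScoredMemory` struct here is a hypothetical stand-in that models only the fields named in the commit message; the `id` field, the integer `updated_at` type, the helper names (`pick_recommended`, `pick_by_trust`, `sample_corpus`), and the sample values are all assumptions made for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the crate's `ScoredMemory`; only the fields
/// mentioned in the commit message are modelled, and their types are assumed.
#[derive(Debug, Clone)]
struct ScoredMemory {
    id: &'static str,    // illustrative label, not a real field name
    combined_score: f32, // query relevance (hybrid search + reranker)
    trust: f64,          // FSRS-6 trust
    updated_at: i64,     // assumed to be a Unix timestamp
}

/// 50/50 blend of query relevance and trust, as quoted in the commit message.
fn composite(s: &ScoredMemory) -> f64 {
    s.combined_score as f64 * 0.5 + s.trust * 0.5
}

/// Pick the memory with the highest composite score; ties go to the
/// most recently updated memory.
fn pick_recommended(memories: &[ScoredMemory]) -> Option<&ScoredMemory> {
    memories.iter().max_by(|a, b| {
        composite(a)
            .partial_cmp(&composite(b))
            .unwrap_or(Ordering::Equal)
            .then_with(|| a.updated_at.cmp(&b.updated_at))
    })
}

/// Trust-only ranking, shown for comparison with the behaviour the
/// commit message describes as the bug.
fn pick_by_trust(memories: &[ScoredMemory]) -> Option<&ScoredMemory> {
    memories.iter().max_by(|a, b| {
        a.trust
            .partial_cmp(&b.trust)
            .unwrap_or(Ordering::Equal)
            .then_with(|| a.updated_at.cmp(&b.updated_at))
    })
}

/// Two made-up memories: one highly trusted but off-topic for the query,
/// one slightly less trusted but highly relevant.
fn sample_corpus() -> Vec<ScoredMemory> {
    vec![
        ScoredMemory { id: "off_topic", combined_score: 0.10, trust: 0.90, updated_at: 2 },
        ScoredMemory { id: "on_topic", combined_score: 0.95, trust: 0.80, updated_at: 1 },
    ]
}
```

With these sample values, `pick_by_trust(&sample_corpus())` selects the off-topic memory regardless of the query, while `pick_recommended` selects the on-topic one, because its composite (about 0.875) exceeds the off-topic memory's (about 0.50). Changing the two `0.5` weights is the knob the commit message refers to.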
..
changelog.rs	feat: v2.0.4 "Deep Reference" — cognitive reasoning engine + 10 bug fixes	2026-04-09 16:15:26 -05:00
checkpoint.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
codebase.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
codebase_unified.rs	fix: v2.0.1 release — fix broken installs, CI, security, and docs	2026-03-01 20:20:14 -06:00
consolidate.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
context.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
cross_reference.rs	fix(deep_reference): incorporate query relevance into recommended/confidence	2026-04-09 20:09:56 -06:00
dedup.rs	feat: v2.0.4 "Deep Reference" — cognitive reasoning engine + 10 bug fixes	2026-04-09 16:15:26 -05:00
dream.rs	feat: v2.0.4 "Deep Reference" — cognitive reasoning engine + 10 bug fixes	2026-04-09 16:15:26 -05:00
explore.rs	fix: resolve clippy collapsible-if errors in explore.rs	2026-04-09 17:37:41 -05:00
feedback.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
graph.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
health.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
importance.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
ingest.rs	fix: v2.0.1 release — fix broken installs, CI, security, and docs	2026-03-01 20:20:14 -06:00
intention_unified.rs	fix(intention): accept snake_case in_minutes / file_pattern on TriggerSpec	2026-04-09 16:24:17 -06:00
intentions.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
knowledge.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
maintenance.rs	fix: v2.0.1 release — fix broken installs, CI, security, and docs	2026-03-01 20:20:14 -06:00
memory_states.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
memory_unified.rs	feat: v2.0.4 "Deep Reference" — cognitive reasoning engine + 10 bug fixes	2026-04-09 16:15:26 -05:00
mod.rs	feat: v2.0.4 "Deep Reference" — cognitive reasoning engine + 10 bug fixes	2026-04-09 16:15:26 -05:00
predict.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
recall.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
restore.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
review.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
search.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
search_unified.rs	feat: v2.0.4 "Deep Reference" — cognitive reasoning engine + 10 bug fixes	2026-04-09 16:15:26 -05:00
session_context.rs	feat: v2.0.4 "Deep Reference" — cognitive reasoning engine + 10 bug fixes	2026-04-09 16:15:26 -05:00
smart_ingest.rs	fix: v2.0.1 release — fix broken installs, CI, security, and docs	2026-03-01 20:20:14 -06:00
stats.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
tagging.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00
timeline.rs	feat: Vestige v1.9.1 AUTONOMIC — self-regulating memory with graph visualization	2026-02-21 02:02:06 -06:00