feat(merge-supersede): Phase 3 — diff-previewed, reversible merge/supersede controls (v2.1.25) (#75)

Adds opt-in, preview-first combine/dedupe/supersede on a never-delete
(bitemporal) store. The default is review, never silent mutation. Every applied
operation is recorded as a reversible, auditable event with provenance — a git
reflog for your agent's memory.

Core (vestige-core):
- advanced::merge_supersede — pure Fellegi-Sunter two-threshold scoring
  (embedding + tag + token Jaccard), match/possible/non_match classification,
  plan/diff and operation-log types, merge-composition helpers. Unit-tested.
- storage: merge_candidates, plan_merge, plan_supersede, apply_plan, merge_undo,
  protect/pin, and per-project merge_policy (persisted in fsrs_config, env
  overridable). Supersede invalidates bitemporally (valid_until + superseded_by,
  Graphiti-style "invalidate, don't delete") and keeps the old node queryable.
- Migration V14: merge_plans + merge_operations tables, knowledge_nodes.protected
  and .superseded_by columns + indexes. Idempotent on replay (duplicate-column
  guarded ADD COLUMNs).

MCP (vestige-mcp):
- Seven new tools registered + dispatched: merge_candidates, plan_merge,
  plan_supersede, apply_plan, merge_undo, protect, merge_policy.
- apply_plan requires confirm=true for possible/non_match plans; match plans
  auto-apply only when policy.auto_apply is set (default off).

Tests: candidate-threshold classification, plan-preview makes no mutation,
apply+undo reversibility, supersede bitemporal invalidation preserves old-node
queryability, protect blocks merge-away, low-confidence requires confirm, policy
roundtrip, migration V14 + idempotent replay. All 796 scoped tests pass; clippy
-D warnings clean on touched crates.

Docs: docs/MERGE_SUPERSEDE.md + CHANGELOG entry. Version bump 2.1.23 -> 2.1.25.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sam Valladares 2026-06-12 12:55:31 -05:00 committed by GitHub
parent b01269db22
commit c23d7a309c
GPG key ID: B5690EEEBB952194
19 changed files with 2704 additions and 19 deletions


@ -7,6 +7,64 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
## [2.1.25] - 2026-06-12 — "Merge / Supersede Controls"
v2.1.25 ships Phase 3: diff-previewed, confidence-gated, reversible,
self-explaining combine/dedupe/supersede on a never-delete (bitemporal) store.
The default is always preview/review — these tools never silently mutate memory.
The differentiator is the reversible operation log: every merge/supersede/undo is
an auditable, reversible event with provenance ("why did these combine?") — a git
reflog for your agent's memory.
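
A minimal sketch of the "invalidate, don't delete" idea with an undo log, assuming a toy in-memory store. `Node`, `Operation`, `Store`, and the integer timestamp are hypothetical names for illustration only; the real persistence lives in `vestige-core` storage and is not shown in this excerpt.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory node. Field names echo the changelog, not the real schema.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    content: String,
    valid_until: Option<u64>,      // None means the node is still valid
    superseded_by: Option<String>, // lineage pointer to the replacing node
}

/// One log entry. `prior` is the undo payload: the node as it was before the op.
#[derive(Debug, Clone)]
struct Operation {
    old_id: String,
    prior: Node,
    reverted: bool,
}

#[derive(Default)]
struct Store {
    nodes: HashMap<String, Node>,
    log: Vec<Operation>,
}

impl Store {
    fn insert(&mut self, id: &str, content: &str) {
        let node = Node { content: content.to_string(), valid_until: None, superseded_by: None };
        self.nodes.insert(id.to_string(), node);
    }

    /// Supersede `old_id` with `new_id` at time `now`. The old node is stamped,
    /// never removed. Returns the index of the recorded operation.
    fn supersede(&mut self, old_id: &str, new_id: &str, now: u64) -> Option<usize> {
        if !self.nodes.contains_key(new_id) {
            return None;
        }
        let old = self.nodes.get_mut(old_id)?;
        let prior = old.clone();
        old.valid_until = Some(now);
        old.superseded_by = Some(new_id.to_string());
        self.log.push(Operation { old_id: old_id.to_string(), prior, reverted: false });
        Some(self.log.len() - 1)
    }

    /// Reverse a recorded operation by restoring the snapshot in its undo payload.
    /// Returns false if the operation is unknown or was already reverted.
    fn undo(&mut self, op_index: usize) -> bool {
        let Some(op) = self.log.get_mut(op_index) else {
            return false;
        };
        if op.reverted {
            return false;
        }
        op.reverted = true;
        self.nodes.insert(op.old_id.clone(), op.prior.clone());
        true
    }
}
```

In this sketch the superseded node stays in `nodes` with its content intact, so it remains queryable, and `undo` clears the invalidation by restoring the snapshot rather than by recomputing state.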
### Added
- **Seven new MCP tools** for merge/supersede control:
- `merge_candidates` — surface likely duplicate/overlapping clusters with
confidence scores and the signals behind each (Fellegi-Sunter
match/possible/non-match). Read-only.
- `plan_merge` — produce a previewable merge PLAN (a diff of combined
content/tags/provenance) without applying it.
- `plan_supersede` — preview superseding A with B (bitemporal invalidation,
audit-preserving) without applying.
- `apply_plan` — execute a previously-generated plan id; recorded as a
reversible operation.
- `merge_undo` — reverse a prior merge/supersede operation, or list the
reversible operation log (the "memory reflog").
- `protect` — pin a memory so it can never be auto-merged, superseded, or
garbage-collected.
- `merge_policy` — get/set the per-project Fellegi-Sunter two thresholds
(`match_threshold`, `possible_threshold`) and `auto_apply`.
- **Bitemporal "invalidate, don't delete" supersede** (Graphiti-style): a
superseded memory is kept and stays queryable for audit. It is stamped with
`valid_until = now` and a new `superseded_by` lineage pointer, instead of being
deleted or merely demoted.
- **Reversible operation log** (`merge_operations` table) — every applied
merge/supersede records an undo payload and provenance signals so any operation
can be reversed, including restoring survivor content/tags and clearing the
bitemporal invalidation.
- **Fellegi-Sunter two-threshold scoring** for dedup/merge candidates, combining
embedding cosine similarity with tag and content-token overlap. Borderline
"possible" matches are surfaced for review instead of force-merged.
- **Memory protection / pinning**: `protected` column on `knowledge_nodes`;
protected memories are excluded from auto-merge/supersede/GC paths.
- **Migration V14** adding the `merge_plans` and `merge_operations` tables, the
`protected` and `superseded_by` columns on `knowledge_nodes`, and their
indexes. Idempotent on replay.
- **Docs**: `docs/MERGE_SUPERSEDE.md` describing the design, the bitemporal
model, the two-threshold policy, the reversible operation log, and the tool
surface.
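
The two-threshold scoring described above can be sketched as follows. The weights (0.70 / 0.15 / 0.15) and default thresholds (0.86 / 0.72) are taken from the `merge_supersede` module in this commit; the function names `combined_score` and `classify` are simplified stand-ins for illustration, not the crate's public API.

```rust
use std::collections::HashSet;

/// Jaccard overlap of two sets; defined as 0.0 when both are empty.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f32 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f32 / union as f32
}

/// Lowercased alphanumeric tokens longer than two characters.
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| t.len() > 2)
        .map(str::to_lowercase)
        .collect()
}

/// Weighted sum of embedding cosine, tag overlap, and token overlap.
fn combined_score(cosine: f32, a_tags: &[&str], b_tags: &[&str], a: &str, b: &str) -> f32 {
    let tag_set = |ts: &[&str]| ts.iter().map(|t| t.to_lowercase()).collect::<HashSet<String>>();
    let score = 0.70 * cosine.clamp(0.0, 1.0)
        + 0.15 * jaccard(&tag_set(a_tags), &tag_set(b_tags))
        + 0.15 * jaccard(&tokens(a), &tokens(b));
    score.clamp(0.0, 1.0)
}

/// Three-way classification; each threshold is inclusive at its lower edge.
fn classify(score: f32, match_threshold: f32, possible_threshold: f32) -> &'static str {
    if score >= match_threshold {
        "match"
    } else if score >= possible_threshold {
        "possible"
    } else {
        "non_match"
    }
}
```

As a worked case: with cosine 0.9, identical tag sets, and the contents "use tokio for async rust" versus "prefer tokio for async rust" (4 shared tokens out of 6), the score is 0.70 * 0.9 + 0.15 * 1.0 + 0.15 * (4/6) = 0.88, which clears the default match threshold. Dropping the cosine to 0.85 gives 0.845, which lands in the "possible" zone and would be surfaced for review.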
### Notes
- All merge/supersede operations are **opt-in and preview-first**. `apply_plan`
requires `confirm=true` for `possible`/`non_match` plans, and only applies
`match` plans without confirmation when `merge_policy.auto_apply` is enabled
(default off). This deliberately avoids the silent-merge / auto-delete /
audit-trail-loss anti-patterns reported against other memory systems.
- The merge policy persists per-project and is also overridable via
`VESTIGE_MERGE_MATCH_THRESHOLD`, `VESTIGE_MERGE_POSSIBLE_THRESHOLD`, and
`VESTIGE_MERGE_AUTO_APPLY` environment variables.
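
One way to read the two notes above, as a hedged sketch: `may_apply` encodes the confirm / `auto_apply` gate, and `with_overrides` layers the three environment variables over a persisted policy. Both function names, the `Policy` struct, and the accepted boolean spellings (`1`, `true`) are assumptions for illustration; the actual parsing rules are not shown in this excerpt.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum MatchClass {
    Match,
    Possible,
    NonMatch,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Policy {
    match_threshold: f32,
    possible_threshold: f32,
    auto_apply: bool,
}

/// An explicit confirm always allows the apply. Without confirm, only a
/// `Match`-class plan passes, and only when `auto_apply` is enabled.
fn may_apply(class: MatchClass, confirm: bool, policy: &Policy) -> bool {
    confirm || (class == MatchClass::Match && policy.auto_apply)
}

/// Overlay environment overrides on a base policy. `get` abstracts the lookup
/// so the function is testable; in a real process it could be
/// `|k| std::env::var(k).ok()`.
fn with_overrides(base: Policy, get: impl Fn(&str) -> Option<String>) -> Policy {
    // Parse a float override, ignoring unparsable or non-finite values.
    let float = |key: &str, default: f32| {
        get(key)
            .and_then(|v| v.trim().parse::<f32>().ok())
            .filter(|x| x.is_finite())
            .unwrap_or(default)
    };
    let match_threshold = float("VESTIGE_MERGE_MATCH_THRESHOLD", base.match_threshold).clamp(0.0, 1.0);
    // Keep possible_threshold <= match_threshold, as MergePolicy::new does.
    let possible_threshold =
        float("VESTIGE_MERGE_POSSIBLE_THRESHOLD", base.possible_threshold).clamp(0.0, match_threshold);
    let auto_apply = get("VESTIGE_MERGE_AUTO_APPLY")
        .map(|v| matches!(v.trim(), "1" | "true"))
        .unwrap_or(base.auto_apply);
    Policy { match_threshold, possible_threshold, auto_apply }
}
```

Filtering non-finite values before clamping matters here because `f32::clamp` panics when a bound is NaN, so a stray `NaN` in the environment would otherwise crash the second clamp.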
## [2.1.23] - 2026-05-27 — "Receipt Lock Hardening"
v2.1.23 hardens the Sanhedrin launch path so Receipt Lock is portable,

4
Cargo.lock generated

@ -4629,7 +4629,7 @@ checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
[[package]]
name = "vestige-core"
-version = "2.1.23"
+version = "2.1.25"
dependencies = [
"candle-core",
"chrono",
@ -4665,7 +4665,7 @@ dependencies = [
[[package]]
name = "vestige-mcp"
-version = "2.1.23"
+version = "2.1.25"
dependencies = [
"anyhow",
"axum",


@ -10,7 +10,7 @@ exclude = [
]
[workspace.package]
-version = "2.1.23"
+version = "2.1.25"
edition = "2024"
license = "AGPL-3.0-only"
repository = "https://github.com/samvallad33/vestige"


@ -1,6 +1,6 @@
{
"name": "@vestige/dashboard",
-"version": "2.1.23",
+"version": "2.1.25",
"private": true,
"type": "module",
"scripts": {


@ -1,6 +1,6 @@
[package]
name = "vestige-core"
-version = "2.1.23"
+version = "2.1.25"
edition = "2024"
rust-version = "1.91"
authors = ["Vestige Team"]


@ -0,0 +1,447 @@
//! # Merge / Supersede Controls (Phase 3)
//!
//! Diff-previewed, confidence-gated, reversible, self-explaining combine /
//! dedupe / supersede operations on a never-delete (bitemporal) store.
//!
//! This module holds the **pure** logic: candidate scoring, two-threshold
//! classification, and the plan / operation data model. The actual persistence
//! (writing plans, applying them, recording the reversible operation log, and
//! bitemporally invalidating superseded nodes) lives in
//! [`crate::storage`]. Keeping the math here makes it unit-testable without a
//! database.
//!
//! ## Design north star
//!
//! Every combine/dedupe/supersede operation is:
//!
//! - **diff-previewed** — `plan_merge` / `plan_supersede` produce a [`MergePlan`]
//! you can inspect before anything mutates,
//! - **confidence-gated** — a Fellegi-Sunter two-threshold score classifies each
//! candidate as match / possible-match / non-match,
//! - **reversible** — every applied plan records a [`MergeOperation`] with an
//! undo payload (the "git reflog for your agent's memory"),
//! - **self-explaining** — each candidate carries the [`MatchSignals`] that
//! explain *why* the memories combined,
//! - **opt-in, never silent** — the default is preview/review, never auto-mutate,
//! - **audit-preserving** — superseding stamps `valid_until` and keeps the old
//! node queryable (Graphiti-style "invalidate, don't delete").
//!
//! ## Why Fellegi-Sunter
//!
//! Pure hashing under-merges (misses paraphrases); aggressive LLM merging
//! over-merges and destroys the audit trail. Fellegi-Sunter record linkage uses
//! **two** thresholds to carve the score space into three zones, so the
//! borderline "possible match" cases are surfaced for review instead of being
//! force-decided. We reuse the embedding cosine similarity already in the store
//! plus cheap lexical signals (tag overlap, token Jaccard) as the match weight.
use serde::{Deserialize, Serialize};
// ============================================================================
// CONSTANTS — the two Fellegi-Sunter thresholds
// ============================================================================
/// At or above this combined score → auto-apply-eligible "match".
pub const DEFAULT_MATCH_THRESHOLD: f32 = 0.86;
/// Between the two thresholds → "possible match", surfaced for review.
/// Below this → "non-match" (never offered).
pub const DEFAULT_POSSIBLE_THRESHOLD: f32 = 0.72;
/// Weight of embedding cosine similarity in the combined score.
const W_EMBEDDING: f32 = 0.70;
/// Weight of tag overlap (Jaccard) in the combined score.
const W_TAGS: f32 = 0.15;
/// Weight of content token overlap (Jaccard) in the combined score.
const W_TOKENS: f32 = 0.15;
// ============================================================================
// CLASSIFICATION
// ============================================================================
/// Fellegi-Sunter three-way classification of a candidate pair/cluster.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum MatchClass {
/// Score ≥ match threshold — strong duplicate, auto-merge eligible.
Match,
/// Between thresholds — surfaced for human/agent review, never auto-applied.
Possible,
/// Below the possible threshold — not offered as a candidate.
NonMatch,
}
impl MatchClass {
/// String label used in tool output and the `classification` column.
pub fn as_str(&self) -> &'static str {
match self {
MatchClass::Match => "match",
MatchClass::Possible => "possible",
MatchClass::NonMatch => "non_match",
}
}
}
/// Per-merge-policy thresholds. Wired to `vestige.toml` when present, else the
/// defaults above. `auto_apply` gates whether `Match`-class candidates may be
/// applied without an explicit preview step (default: false — never silent).
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct MergePolicy {
/// Score ≥ this → `Match`.
pub match_threshold: f32,
/// Score in `[possible_threshold, match_threshold)` → `Possible`.
pub possible_threshold: f32,
/// If true, `Match`-class candidates may be auto-applied. Default false:
/// the product promise is review/preview, not silent mutation.
pub auto_apply: bool,
}
impl Default for MergePolicy {
fn default() -> Self {
Self {
match_threshold: DEFAULT_MATCH_THRESHOLD,
possible_threshold: DEFAULT_POSSIBLE_THRESHOLD,
auto_apply: false,
}
}
}
impl MergePolicy {
/// Build a policy, clamping thresholds into `[0,1]` and ensuring
/// `possible_threshold <= match_threshold`.
pub fn new(match_threshold: f32, possible_threshold: f32, auto_apply: bool) -> Self {
let match_threshold = match_threshold.clamp(0.0, 1.0);
let possible_threshold = possible_threshold.clamp(0.0, match_threshold);
Self {
match_threshold,
possible_threshold,
auto_apply,
}
}
/// Classify a combined match score.
pub fn classify(&self, score: f32) -> MatchClass {
if score >= self.match_threshold {
MatchClass::Match
} else if score >= self.possible_threshold {
MatchClass::Possible
} else {
MatchClass::NonMatch
}
}
}
// ============================================================================
// SIGNALS — the self-explaining "why did these combine?"
// ============================================================================
/// The individual signals behind a candidate's score. Surfaced verbatim so a
/// user can see *why* two memories were judged duplicates.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MatchSignals {
/// Cosine similarity of the two embeddings (0-1).
pub embedding_similarity: f32,
/// Jaccard overlap of the two tag sets (0-1).
pub tag_overlap: f32,
/// Jaccard overlap of content tokens (0-1).
pub token_overlap: f32,
/// Combined weighted score that was classified.
pub combined_score: f32,
}
/// Compute the combined match score and its signal breakdown for a pair.
pub fn score_pair(
embedding_similarity: f32,
a_tags: &[String],
b_tags: &[String],
a_content: &str,
b_content: &str,
) -> MatchSignals {
let tag_overlap = jaccard(&tag_set(a_tags), &tag_set(b_tags));
let token_overlap = jaccard(&token_set(a_content), &token_set(b_content));
let combined_score = (W_EMBEDDING * embedding_similarity.clamp(0.0, 1.0)
+ W_TAGS * tag_overlap
+ W_TOKENS * token_overlap)
.clamp(0.0, 1.0);
MatchSignals {
embedding_similarity: embedding_similarity.clamp(0.0, 1.0),
tag_overlap,
token_overlap,
combined_score,
}
}
fn tag_set(tags: &[String]) -> std::collections::HashSet<String> {
tags.iter().map(|t| t.to_lowercase()).collect()
}
fn token_set(content: &str) -> std::collections::HashSet<String> {
content
.split(|c: char| !c.is_alphanumeric())
.filter(|t| t.len() > 2)
.map(|t| t.to_lowercase())
.collect()
}
fn jaccard(a: &std::collections::HashSet<String>, b: &std::collections::HashSet<String>) -> f32 {
if a.is_empty() && b.is_empty() {
return 0.0;
}
let inter = a.intersection(b).count() as f32;
let union = a.union(b).count() as f32;
if union == 0.0 { 0.0 } else { inter / union }
}
// ============================================================================
// CANDIDATE
// ============================================================================
/// A surfaced merge candidate: a cluster of likely-duplicate memories with the
/// signals and classification that justify offering it.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MergeCandidate {
/// Node ids in the cluster. The first is the suggested survivor (highest
/// retention).
pub member_ids: Vec<String>,
/// Short content previews, parallel to `member_ids`.
pub previews: Vec<String>,
/// Suggested survivor id (kept after a merge).
pub survivor_id: String,
/// Combined match score for the cluster (min pairwise within the cluster —
/// the weakest link, so a cluster is only as confident as its loosest pair).
pub confidence: f32,
/// Three-way classification under the active policy.
pub classification: MatchClass,
/// Signals for the survivor↔closest-member pair (the explanation).
pub signals: MatchSignals,
/// True if any member is protected (pinned) — blocks auto-merge.
pub has_protected_member: bool,
}
// ============================================================================
// PLAN — the previewable diff
// ============================================================================
/// What kind of plan this is.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum PlanKind {
/// Combine N memories into one survivor.
Merge,
/// Invalidate A in favour of B (bitemporal, audit-preserving).
Supersede,
}
impl PlanKind {
pub fn as_str(&self) -> &'static str {
match self {
PlanKind::Merge => "merge",
PlanKind::Supersede => "supersede",
}
}
}
/// A previewable plan: exactly what *would* change, without changing anything.
/// Persisted to `merge_plans`; consumed by `apply_plan` via its `id`.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MergePlan {
/// Plan id (UUID).
pub id: String,
/// merge | supersede.
pub kind: PlanKind,
/// Node kept after the operation.
pub survivor_id: String,
/// All node ids involved.
pub member_ids: Vec<String>,
/// Resulting content of the survivor after applying.
pub result_content: String,
/// Resulting tag set of the survivor after applying.
pub result_tags: Vec<String>,
/// Resulting provenance / source string after applying.
pub result_source: Option<String>,
/// For supersede: ids that get bitemporally invalidated (their
/// `valid_until` stamped, kept queryable). For merge: the absorbed ids.
pub invalidated_ids: Vec<String>,
/// Match confidence (0-1) for the plan.
pub confidence: f32,
/// Three-way classification.
pub classification: MatchClass,
/// Signals explaining the plan.
pub signals: MatchSignals,
/// Human-readable explanation of what this plan does.
pub explanation: String,
}
// ============================================================================
// OPERATION LOG — the reversible "memory reflog"
// ============================================================================
/// A recorded, reversible operation. One row in `merge_operations`.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MergeOperation {
/// Operation id (UUID).
pub id: String,
/// Plan id this came from (if any).
pub plan_id: Option<String>,
/// merge | supersede | undo.
pub op_type: String,
/// applied | reverted.
pub status: String,
/// When recorded (RFC3339).
pub created_at: String,
/// When reverted (RFC3339), if reverted.
pub reverted_at: Option<String>,
/// For undo ops: the op id being reversed.
pub reverts_op_id: Option<String>,
/// Survivor node id.
pub survivor_id: Option<String>,
/// Node ids touched by the op.
pub affected_ids: Vec<String>,
/// Match confidence.
pub confidence: Option<f32>,
/// Human-readable reason.
pub reason: Option<String>,
}
// ============================================================================
// MERGE COMPOSITION — pure helpers used by the storage apply path
// ============================================================================
/// Compose merged content from an ordered list of (id, content) members.
/// Survivor content leads; each absorbed member is appended with provenance so
/// nothing is silently dropped (anti-pattern: Mem0 #4896 double-store /
/// contradiction loss).
pub fn compose_merged_content(members: &[(String, String)]) -> String {
if members.is_empty() {
return String::new();
}
let mut out = members[0].1.trim().to_string();
for (id, content) in &members[1..] {
let c = content.trim();
if c.is_empty() || out.contains(c) {
continue;
}
out.push_str("\n\n[merged from ");
out.push_str(id);
out.push_str("]\n");
out.push_str(c);
}
out
}
/// Union the tag sets of all members, preserving first-seen order.
pub fn compose_merged_tags(member_tags: &[Vec<String>]) -> Vec<String> {
let mut seen = std::collections::HashSet::new();
let mut out = Vec::new();
for tags in member_tags {
for t in tags {
if seen.insert(t.to_lowercase()) {
out.push(t.clone());
}
}
}
out
}
// ============================================================================
// TESTS
// ============================================================================
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn classify_three_zones() {
let policy = MergePolicy::default();
assert_eq!(policy.classify(0.95), MatchClass::Match);
assert_eq!(policy.classify(0.80), MatchClass::Possible);
assert_eq!(policy.classify(0.50), MatchClass::NonMatch);
// boundaries are inclusive at the lower edge of each higher zone
assert_eq!(policy.classify(DEFAULT_MATCH_THRESHOLD), MatchClass::Match);
assert_eq!(
policy.classify(DEFAULT_POSSIBLE_THRESHOLD),
MatchClass::Possible
);
}
#[test]
fn policy_clamps_and_orders() {
// possible above match gets clamped down to match
let p = MergePolicy::new(0.8, 0.95, true);
assert!(p.possible_threshold <= p.match_threshold);
// out-of-range clamps to [0,1]
let p2 = MergePolicy::new(2.0, -1.0, false);
assert_eq!(p2.match_threshold, 1.0);
assert_eq!(p2.possible_threshold, 0.0);
}
#[test]
fn score_pair_combines_signals() {
let s = score_pair(
1.0,
&["rust".into(), "async".into()],
&["rust".into(), "async".into()],
"use tokio for async rust",
"use tokio for async rust",
);
assert!((s.embedding_similarity - 1.0).abs() < 1e-6);
assert!((s.tag_overlap - 1.0).abs() < 1e-6);
assert!(s.token_overlap > 0.9);
assert!(s.combined_score > 0.95);
}
#[test]
fn score_pair_disjoint_is_low() {
let s = score_pair(
0.1,
&["a".into()],
&["b".into()],
"completely different topic alpha",
"totally unrelated subject beta",
);
assert!(s.combined_score < 0.3);
assert_eq!(MergePolicy::default().classify(s.combined_score), MatchClass::NonMatch);
}
#[test]
fn jaccard_basics() {
let a: std::collections::HashSet<String> = ["x".into(), "y".into()].into_iter().collect();
let b: std::collections::HashSet<String> = ["y".into(), "z".into()].into_iter().collect();
assert!((jaccard(&a, &b) - (1.0 / 3.0)).abs() < 1e-6);
let empty: std::collections::HashSet<String> = Default::default();
assert_eq!(jaccard(&empty, &empty), 0.0);
}
#[test]
fn compose_merged_content_dedups_and_attributes() {
let members = vec![
("a".into(), "Keep this.".into()),
("b".into(), "Extra detail.".into()),
("c".into(), "Keep this.".into()), // duplicate of survivor → skipped
];
let merged = compose_merged_content(&members);
assert!(merged.starts_with("Keep this."));
assert!(merged.contains("[merged from b]"));
assert!(merged.contains("Extra detail."));
// duplicate content not appended twice
assert_eq!(merged.matches("Keep this.").count(), 1);
}
#[test]
fn compose_merged_tags_unions_in_order() {
let tags = vec![
vec!["rust".into(), "async".into()],
vec!["async".into(), "tokio".into()],
];
let merged = compose_merged_tags(&tags);
assert_eq!(merged, vec!["rust", "async", "tokio"]);
}
#[test]
fn match_class_labels() {
assert_eq!(MatchClass::Match.as_str(), "match");
assert_eq!(MatchClass::Possible.as_str(), "possible");
assert_eq!(MatchClass::NonMatch.as_str(), "non_match");
}
}


@ -23,6 +23,7 @@ pub mod cross_project;
pub mod dreams;
pub mod importance;
pub mod intent;
pub mod merge_supersede;
pub mod prediction_error;
pub mod reconsolidation;
pub mod speculative;
@ -61,6 +62,11 @@ pub use dreams::{
};
pub use importance::{ImportanceDecayConfig, ImportanceScore, ImportanceTracker, UsageEvent};
pub use intent::{ActionType, DetectedIntent, IntentDetector, MaintenanceType, UserAction};
pub use merge_supersede::{
DEFAULT_MATCH_THRESHOLD, DEFAULT_POSSIBLE_THRESHOLD, MatchClass, MatchSignals, MergeCandidate,
MergeOperation, MergePlan, MergePolicy, PlanKind, compose_merged_content, compose_merged_tags,
score_pair,
};
pub use prediction_error::{
CandidateMemory, CreateReason, EvaluationIntent, GateDecision, GateStats, MergeStrategy,
PredictionErrorConfig, PredictionErrorGate, SimilarityResult, SupersedeReason, UpdateType,


@ -225,8 +225,16 @@ pub use advanced::{
MemoryPath,
MemoryReplay,
MemorySnapshot,
// Merge / Supersede controls (Phase 3)
MatchClass,
MatchSignals,
MergeCandidate,
MergeOperation,
MergePlan,
MergePolicy,
MergeStrategy,
Modification,
PlanKind,
Pattern,
PatternType,
PredictedMemory,


@ -69,6 +69,11 @@ pub const MIGRATIONS: &[Migration] = &[
description: "v2.1.2 Honest Memory: non-content purge tombstones",
up: MIGRATION_V13_UP,
},
Migration {
version: 14,
description: "v2.1.25 Merge/Supersede: reversible operation log, merge plans, bitemporal lineage, protected pins",
up: MIGRATION_V14_UP,
},
];
/// A database migration
@ -735,6 +740,79 @@ ON deletion_tombstones(deleted_at);
UPDATE schema_version SET version = 13, applied_at = datetime('now');
"#;
/// V14: Merge / Supersede controls (Phase 3).
///
/// Adds the four pieces the merge/supersede feature needs on a never-delete
/// (bitemporal) store:
///
/// 1. `merge_plans` — previewable, not-yet-applied plans. `plan_merge` and
/// `plan_supersede` write a plan row containing a JSON diff; `apply_plan`
/// consumes it by id. Plans are append-only; status moves
/// pending -> applied / cancelled.
/// 2. `merge_operations` — the reversible operation log (the "memory reflog").
/// Every applied merge/supersede records one row with a JSON `undo_payload`
/// capturing exactly what changed, so `merge_undo` can reverse it. The
/// `signals` column records WHY the memories combined (provenance), which is
/// the self-explaining differentiator.
/// 3. `knowledge_nodes.protected` — pin flag. A protected memory can never be
/// auto-merged, superseded, or forgotten.
/// 4. `knowledge_nodes.superseded_by` — bitemporal lineage pointer. Superseding
/// A with B does NOT delete A: it stamps A.valid_until = B.valid_from and
/// sets A.superseded_by = B.id, leaving A fully queryable for audit
/// (Graphiti-style invalidate-don't-delete).
// The two `protected` / `superseded_by` ADD COLUMNs are applied separately in
// `apply_migrations` BEFORE this batch runs, guarded against "duplicate
// column" on replay, since SQLite has no `ADD COLUMN IF NOT EXISTS`. Their
// indexes are created in this batch, which is why the columns must exist
// first. Everything in the batch is idempotent (CREATE ... IF NOT EXISTS).
const MIGRATION_V14_UP: &str = r#"
CREATE INDEX IF NOT EXISTS idx_nodes_protected ON knowledge_nodes(protected);
CREATE INDEX IF NOT EXISTS idx_nodes_superseded_by ON knowledge_nodes(superseded_by);
-- Previewable plans (a diff) produced by plan_merge / plan_supersede.
-- `kind` is 'merge' | 'supersede'. `payload` is the full JSON plan/diff.
CREATE TABLE IF NOT EXISTS merge_plans (
id TEXT PRIMARY KEY,
kind TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending', -- pending | applied | cancelled
created_at TEXT NOT NULL,
applied_at TEXT,
survivor_id TEXT, -- node kept after the op
member_ids TEXT NOT NULL DEFAULT '[]', -- JSON array of all involved node ids
confidence REAL, -- Fellegi-Sunter match score (0-1)
classification TEXT, -- match | possible | non_match
payload TEXT NOT NULL -- full JSON plan/diff
);
CREATE INDEX IF NOT EXISTS idx_merge_plans_status ON merge_plans(status);
CREATE INDEX IF NOT EXISTS idx_merge_plans_created_at ON merge_plans(created_at);
-- Reversible operation log: the "git reflog for your agent's memory".
-- One row per applied merge/supersede; `undo_payload` carries everything
-- needed to reverse it, `signals` records why the memories combined.
CREATE TABLE IF NOT EXISTS merge_operations (
id TEXT PRIMARY KEY,
plan_id TEXT, -- merge_plans.id this came from
op_type TEXT NOT NULL, -- merge | supersede | undo
status TEXT NOT NULL DEFAULT 'applied', -- applied | reverted
created_at TEXT NOT NULL,
reverted_at TEXT,
reverts_op_id TEXT, -- set when op_type = 'undo'
survivor_id TEXT, -- node kept
affected_ids TEXT NOT NULL DEFAULT '[]', -- JSON array of node ids touched
confidence REAL,
signals TEXT, -- JSON: why they combined (provenance)
reason TEXT, -- human-readable explanation
undo_payload TEXT NOT NULL -- JSON snapshot to reverse the op
);
CREATE INDEX IF NOT EXISTS idx_merge_operations_status ON merge_operations(status);
CREATE INDEX IF NOT EXISTS idx_merge_operations_created_at ON merge_operations(created_at);
CREATE INDEX IF NOT EXISTS idx_merge_operations_survivor ON merge_operations(survivor_id);
UPDATE schema_version SET version = 14, applied_at = datetime('now');
"#;
/// Get current schema version from database
pub fn get_current_version(conn: &rusqlite::Connection) -> rusqlite::Result<u32> {
conn.query_row(
@ -745,6 +823,19 @@ pub fn get_current_version(conn: &rusqlite::Connection) -> rusqlite::Result<u32>
.or(Ok(0))
}
/// Run an `ALTER TABLE ... ADD COLUMN` statement, treating a "duplicate column
/// name" failure as success so migration replay stays idempotent (SQLite has no
/// `ADD COLUMN IF NOT EXISTS`).
fn add_column_if_missing(conn: &rusqlite::Connection, sql: &str) -> rusqlite::Result<()> {
match conn.execute(sql, []) {
Ok(_) => Ok(()),
Err(rusqlite::Error::SqliteFailure(_, Some(msg))) if msg.contains("duplicate column name") => {
Ok(())
}
Err(e) => Err(e),
}
}
/// Apply pending migrations
pub fn apply_migrations(conn: &rusqlite::Connection) -> rusqlite::Result<u32> {
let current_version = get_current_version(conn)?;
@ -758,6 +849,21 @@ pub fn apply_migrations(conn: &rusqlite::Connection) -> rusqlite::Result<u32> {
migration.description
);
// V14: add the two bitemporal/protect columns BEFORE the batch (the
// batch's indexes reference them). SQLite lacks
// `ADD COLUMN IF NOT EXISTS`, so swallow the "duplicate column"
// error to stay idempotent on replay.
if migration.version == 14 {
add_column_if_missing(
conn,
"ALTER TABLE knowledge_nodes ADD COLUMN protected INTEGER NOT NULL DEFAULT 0",
)?;
add_column_if_missing(
conn,
"ALTER TABLE knowledge_nodes ADD COLUMN superseded_by TEXT",
)?;
}
// Use execute_batch to handle multi-statement SQL including triggers
conn.execute_batch(migration.up)?;
@ -784,17 +890,17 @@ mod tests {
/// version after `apply_migrations` runs all migrations end-to-end, and
/// neither of the dead tables V11 drops must exist afterwards.
#[test]
-fn test_apply_migrations_advances_to_v13_and_drops_dead_tables() {
+fn test_apply_migrations_advances_to_v14_and_drops_dead_tables() {
let conn = rusqlite::Connection::open_in_memory().expect("open in-memory");
// Pre-requisite: schema_version must be bootstrapped by V1.
apply_migrations(&conn).expect("apply_migrations succeeds");
-// 1. schema_version advanced to V13
+// 1. schema_version advanced to V14
let version = get_current_version(&conn).expect("read schema_version");
assert_eq!(
-version, 13,
-"schema_version must be 13 after all migrations"
+version, 14,
+"schema_version must be 14 after all migrations"
);
// 2. knowledge_edges is gone (V11 drops it)
@ -848,6 +954,37 @@ mod tests {
deletion_tombstone_rows, 1,
"deletion_tombstones table must be created by V13"
);
// 6. merge_plans + merge_operations exist (V14 creates them)
for table in ["merge_plans", "merge_operations"] {
let rows: i64 = conn
.query_row(
"SELECT COUNT(*) FROM sqlite_master WHERE type='table' AND name=?1",
[table],
|row| row.get(0),
)
.expect("query sqlite_master");
assert_eq!(rows, 1, "{table} table must be created by V14");
}
// 7. knowledge_nodes gains `protected` + `superseded_by` (V14)
let node_cols: Vec<String> = {
let mut stmt = conn
.prepare("PRAGMA table_info(knowledge_nodes)")
.expect("prepare table_info");
stmt.query_map([], |row| row.get::<_, String>(1))
.expect("query table_info")
.filter_map(|r| r.ok())
.collect()
};
assert!(
node_cols.iter().any(|c| c == "protected"),
"knowledge_nodes must have `protected` column after V14"
);
assert!(
node_cols.iter().any(|c| c == "superseded_by"),
"knowledge_nodes must have `superseded_by` column after V14"
);
}
/// V11 must be idempotent on replay — if the tables were already dropped
@ -869,6 +1006,6 @@ mod tests {
apply_migrations(&conn).expect("V11 replay must be idempotent");
let version = get_current_version(&conn).expect("read schema_version");
-assert_eq!(version, 13, "schema_version back at 13 after replay");
+assert_eq!(version, 14, "schema_version back at 14 after replay");
}
}

File diff suppressed because it is too large


@ -1,6 +1,6 @@
[package]
name = "vestige-mcp"
-version = "2.1.23"
+version = "2.1.25"
edition = "2024"
description = "Cognitive memory MCP server for AI agents - FSRS-6, spreading activation, synaptic tagging, 3D dashboard, and 130 years of memory research"
authors = ["samvallad33"]
@ -51,7 +51,7 @@ path = "src/bin/cli.rs"
# Only `bundled-sqlite` is always on. `embeddings` and `vector-search` are
# toggled via vestige-mcp's own feature flags below so `--no-default-features`
# actually works (previously hardcoded here, which silently defeated the flag).
-vestige-core = { version = "2.1.23", path = "../vestige-core", default-features = false, features = ["bundled-sqlite"] }
+vestige-core = { version = "2.1.25", path = "../vestige-core", default-features = false, features = ["bundled-sqlite"] }
# ============================================================================
# MCP Server Dependencies


@ -328,6 +328,52 @@ impl McpServer {
..Default::default()
},
// ================================================================
// MERGE / SUPERSEDE CONTROLS (v2.1.25 — Phase 3)
// Diff-previewed, confidence-gated, reversible, never silent.
// ================================================================
ToolDescription {
name: "merge_candidates".to_string(),
description: Some("Surface likely duplicate/overlapping memory clusters with confidence scores and the signals behind each (Fellegi-Sunter match/possible/non-match). Read-only — nothing is changed.".to_string()),
input_schema: tools::merge::merge_candidates_schema(),
..Default::default()
},
ToolDescription {
name: "plan_merge".to_string(),
description: Some("Produce a previewable MERGE plan (a diff: combined content/tags/provenance) for 2+ memories WITHOUT applying it. Returns a plan_id for apply_plan. Protected members block the merge.".to_string()),
input_schema: tools::merge::plan_merge_schema(),
..Default::default()
},
ToolDescription {
name: "plan_supersede".to_string(),
description: Some("Preview superseding memory A with B — bitemporal invalidation (stamps valid_until, keeps A queryable for audit) WITHOUT applying. Returns a plan_id for apply_plan.".to_string()),
input_schema: tools::merge::plan_supersede_schema(),
..Default::default()
},
ToolDescription {
name: "apply_plan".to_string(),
description: Some("Execute a previously-generated merge/supersede plan by id. Recorded as a reversible operation. Old memories are invalidated (never deleted). 'possible'/'non_match' plans require confirm=true.".to_string()),
input_schema: tools::merge::apply_plan_schema(),
..Default::default()
},
ToolDescription {
name: "merge_undo".to_string(),
description: Some("Reverse a prior merge/supersede operation (the 'git reflog for your agent's memory'). With no operation_id, lists the reversible operation log so you can pick one.".to_string()),
input_schema: tools::merge::merge_undo_schema(),
..Default::default()
},
ToolDescription {
name: "protect".to_string(),
description: Some("Pin a memory so it can never be auto-merged, superseded, or garbage-collected. Pass protected=false to unpin.".to_string()),
input_schema: tools::merge::protect_schema(),
..Default::default()
},
ToolDescription {
name: "merge_policy".to_string(),
description: Some("Get or set the per-project merge policy: the two Fellegi-Sunter thresholds (match_threshold, possible_threshold) and auto_apply. No args returns the current policy.".to_string()),
input_schema: tools::merge::merge_policy_schema(),
..Default::default()
},
// ================================================================
// COGNITIVE TOOLS (v1.5+)
// ================================================================
ToolDescription {
@ -887,6 +933,14 @@ impl McpServer {
}
"find_duplicates" => tools::dedup::execute(&self.storage, request.arguments).await,
// ================================================================
// MERGE / SUPERSEDE CONTROLS (v2.1.25 — Phase 3)
// ================================================================
"merge_candidates" | "plan_merge" | "plan_supersede" | "apply_plan" | "merge_undo"
| "protect" | "merge_policy" => {
tools::merge::execute(&self.storage, request.name.as_str(), request.arguments).await
}
// ================================================================
// COGNITIVE TOOLS (v1.5+)
// ================================================================
@ -1686,8 +1740,10 @@ mod tests {
let result = response.result.unwrap();
let tools = result["tools"].as_array().unwrap();
-// v2.1.21: 25 tools (includes first-class contradictions surface)
-assert_eq!(tools.len(), 25, "Expected exactly 25 tools in v2.1.21");
+// v2.1.25: 32 tools (25 from v2.1.21 + 7 Phase 3 merge/supersede tools:
+// merge_candidates, plan_merge, plan_supersede, apply_plan, merge_undo,
+// protect, merge_policy)
+assert_eq!(tools.len(), 32, "Expected exactly 32 tools in v2.1.25");
let tool_names: Vec<&str> = tools.iter().map(|t| t["name"].as_str().unwrap()).collect();
@ -1741,6 +1797,15 @@ mod tests {
assert!(tool_names.contains(&"importance_score"));
assert!(tool_names.contains(&"find_duplicates"));
// Merge / Supersede controls (v2.1.25 — Phase 3)
assert!(tool_names.contains(&"merge_candidates"));
assert!(tool_names.contains(&"plan_merge"));
assert!(tool_names.contains(&"plan_supersede"));
assert!(tool_names.contains(&"apply_plan"));
assert!(tool_names.contains(&"merge_undo"));
assert!(tool_names.contains(&"protect"));
assert!(tool_names.contains(&"merge_policy"));
// Cognitive tools (v1.5)
assert!(tool_names.contains(&"dream"));
assert!(tool_names.contains(&"explore_connections"));


@ -0,0 +1,530 @@
//! Merge / Supersede control tools (Phase 3 — v2.1.25)
//!
//! Diff-previewed, confidence-gated, reversible, self-explaining
//! combine/dedupe/supersede on a never-delete (bitemporal) store. The default
//! is always preview/review — these tools never silently mutate memory.
//!
//! Tool surface (each registered as its own MCP tool name, all routed here):
//!
//! - `merge_candidates` — surface likely duplicate clusters with confidence +
//! the signals behind each (Fellegi-Sunter match / possible / non-match).
//! - `plan_merge` — previewable merge PLAN (a diff) without applying it.
//! - `plan_supersede` — preview superseding A with B (bitemporal invalidation,
//! audit-preserving) without applying.
//! - `apply_plan` — execute a previously-generated plan id; recorded as a
//! reversible operation.
//! - `merge_undo` — reverse a prior merge/supersede operation (the reflog).
//! - `protect` — pin a memory so it can never be auto-merged/superseded/forgotten.
//! - `merge_policy` — get/set the two confidence thresholds + auto_apply.
//!
//! The actual logic lives in `vestige_core` (`storage::Storage` +
//! `advanced::merge_supersede`); this layer only validates arguments and shapes
//! JSON.
use serde_json::{Value, json};
use std::sync::Arc;
use vestige_core::Storage;
// ============================================================================
// SCHEMAS
// ============================================================================
/// `merge_candidates` input schema.
pub fn merge_candidates_schema() -> Value {
json!({
"type": "object",
"properties": {
"limit": {
"type": "integer",
"description": "Max candidate clusters to return (default 20).",
"default": 20, "minimum": 1, "maximum": 100
},
"tags": {
"type": "array",
"items": { "type": "string" },
"description": "Optional: only consider memories with these tags (ANY match)."
}
}
})
}
/// `plan_merge` input schema.
pub fn plan_merge_schema() -> Value {
json!({
"type": "object",
"properties": {
"member_ids": {
"type": "array",
"items": { "type": "string" },
"minItems": 2,
"description": "IDs of the memories to merge (>= 2). The survivor is kept; the rest are bitemporally invalidated (kept for audit)."
},
"survivor_id": {
"type": "string",
"description": "Optional: which member to keep. Defaults to the highest-retention member."
}
},
"required": ["member_ids"]
})
}
/// `plan_supersede` input schema.
pub fn plan_supersede_schema() -> Value {
json!({
"type": "object",
"properties": {
"old_id": { "type": "string", "description": "Memory being superseded (kept, marked invalid)." },
"new_id": { "type": "string", "description": "Memory that supersedes the old one." }
},
"required": ["old_id", "new_id"]
})
}
/// `apply_plan` input schema.
pub fn apply_plan_schema() -> Value {
json!({
"type": "object",
"properties": {
"plan_id": { "type": "string", "description": "ID of a plan produced by plan_merge / plan_supersede." },
"confirm": {
"type": "boolean",
"description": "Required true for 'possible'/'non_match' plans. 'match' plans apply only if the policy has auto_apply=true, else confirm is required too.",
"default": false
}
},
"required": ["plan_id"]
})
}
/// `merge_undo` input schema.
pub fn merge_undo_schema() -> Value {
json!({
"type": "object",
"properties": {
"operation_id": {
"type": "string",
"description": "ID of the merge/supersede operation to reverse. Omit to list recent operations (the reflog)."
}
}
})
}
/// `protect` input schema.
pub fn protect_schema() -> Value {
json!({
"type": "object",
"properties": {
"id": { "type": "string", "description": "Memory id to protect/unprotect." },
"protected": {
"type": "boolean",
"description": "true to pin (block auto-merge/supersede/forget), false to unpin. Default true.",
"default": true
}
},
"required": ["id"]
})
}
/// `merge_policy` input schema.
pub fn merge_policy_schema() -> Value {
json!({
"type": "object",
"properties": {
"match_threshold": {
"type": "number",
"description": "Score >= this => 'match' (auto-merge eligible). 0-1.",
"minimum": 0.0, "maximum": 1.0
},
"possible_threshold": {
"type": "number",
"description": "Score in [possible, match) => 'possible' (review). Below => not offered. 0-1.",
"minimum": 0.0, "maximum": 1.0
},
"auto_apply": {
"type": "boolean",
"description": "Allow 'match'-class plans to apply without confirm. Default false (review-first)."
}
}
})
}
// ============================================================================
// DISPATCH
// ============================================================================
/// Route a merge/supersede tool call by tool name.
pub async fn execute(storage: &Arc<Storage>, tool: &str, args: Option<Value>) -> Result<Value, String> {
match tool {
"merge_candidates" => merge_candidates(storage, args),
"plan_merge" => plan_merge(storage, args),
"plan_supersede" => plan_supersede(storage, args),
"apply_plan" => apply_plan(storage, args),
"merge_undo" => merge_undo(storage, args),
"protect" => protect(storage, args),
"merge_policy" => merge_policy(storage, args),
other => Err(format!("unknown merge tool: {other}")),
}
}
fn obj(args: &Option<Value>) -> serde_json::Map<String, Value> {
args.as_ref()
.and_then(|v| v.as_object().cloned())
.unwrap_or_default()
}
// ============================================================================
// merge_candidates
// ============================================================================
fn merge_candidates(storage: &Arc<Storage>, args: Option<Value>) -> Result<Value, String> {
#[cfg(all(feature = "embeddings", feature = "vector-search"))]
{
let a = obj(&args);
// Enforce the schema's 1..=100 bounds here; clients are not guaranteed to validate.
let limit = a.get("limit").and_then(|v| v.as_u64()).unwrap_or(20).clamp(1, 100) as usize;
let tags: Vec<String> = a
.get("tags")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|t| t.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
let policy = storage.get_merge_policy().map_err(|e| e.to_string())?;
let candidates = storage
.merge_candidates(policy, limit, &tags)
.map_err(|e| e.to_string())?;
let out: Vec<Value> = candidates
.iter()
.map(|c| {
json!({
"memberIds": c.member_ids,
"previews": c.previews,
"survivorId": c.survivor_id,
"confidence": format!("{:.3}", c.confidence),
"classification": c.classification.as_str(),
"hasProtectedMember": c.has_protected_member,
"signals": {
"embeddingSimilarity": format!("{:.3}", c.signals.embedding_similarity),
"tagOverlap": format!("{:.3}", c.signals.tag_overlap),
"tokenOverlap": format!("{:.3}", c.signals.token_overlap),
"combinedScore": format!("{:.3}", c.signals.combined_score)
},
"nextStep": if c.has_protected_member {
"A member is protected — unprotect it or pick it as survivor before plan_merge."
} else {
"Call plan_merge with these memberIds to preview the combined result."
}
})
})
.collect();
Ok(json!({
"candidates": out,
"totalCandidates": out.len(),
"policy": {
"matchThreshold": policy.match_threshold,
"possibleThreshold": policy.possible_threshold,
"autoApply": policy.auto_apply
},
"note": "Nothing was changed. These are review candidates only."
}))
}
#[cfg(not(all(feature = "embeddings", feature = "vector-search")))]
{
let _ = (storage, args);
Ok(json!({ "error": "The `embeddings` and `vector-search` features must both be enabled.", "candidates": [] }))
}
}
// ============================================================================
// plan_merge
// ============================================================================
fn plan_merge(storage: &Arc<Storage>, args: Option<Value>) -> Result<Value, String> {
#[cfg(all(feature = "embeddings", feature = "vector-search"))]
{
let a = obj(&args);
let member_ids: Vec<String> = a
.get("member_ids")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|t| t.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
if member_ids.len() < 2 {
return Err("member_ids must contain at least 2 ids".into());
}
let survivor = a.get("survivor_id").and_then(|v| v.as_str());
let policy = storage.get_merge_policy().map_err(|e| e.to_string())?;
let plan = storage
.plan_merge(&member_ids, survivor, policy)
.map_err(|e| e.to_string())?;
Ok(plan_to_json(&plan, &policy))
}
#[cfg(not(all(feature = "embeddings", feature = "vector-search")))]
{
let _ = (storage, args);
Err("The `embeddings` and `vector-search` features must both be enabled.".into())
}
}
// ============================================================================
// plan_supersede
// ============================================================================
fn plan_supersede(storage: &Arc<Storage>, args: Option<Value>) -> Result<Value, String> {
#[cfg(all(feature = "embeddings", feature = "vector-search"))]
{
let a = obj(&args);
let old_id = a
.get("old_id")
.and_then(|v| v.as_str())
.ok_or("old_id is required")?;
let new_id = a
.get("new_id")
.and_then(|v| v.as_str())
.ok_or("new_id is required")?;
let policy = storage.get_merge_policy().map_err(|e| e.to_string())?;
let plan = storage
.plan_supersede(old_id, new_id, policy)
.map_err(|e| e.to_string())?;
Ok(plan_to_json(&plan, &policy))
}
#[cfg(not(all(feature = "embeddings", feature = "vector-search")))]
{
let _ = (storage, args);
Err("The `embeddings` and `vector-search` features must both be enabled.".into())
}
}
#[cfg(all(feature = "embeddings", feature = "vector-search"))]
fn plan_to_json(plan: &vestige_core::MergePlan, policy: &vestige_core::MergePolicy) -> Value {
let requires_confirm = plan.classification != vestige_core::MatchClass::Match || !policy.auto_apply;
json!({
"planId": plan.id,
"kind": plan.kind.as_str(),
"survivorId": plan.survivor_id,
"memberIds": plan.member_ids,
"diff": {
"resultContent": plan.result_content,
"resultTags": plan.result_tags,
"resultSource": plan.result_source,
"invalidatedIds": plan.invalidated_ids
},
"confidence": format!("{:.3}", plan.confidence),
"classification": plan.classification.as_str(),
"signals": {
"embeddingSimilarity": format!("{:.3}", plan.signals.embedding_similarity),
"tagOverlap": format!("{:.3}", plan.signals.tag_overlap),
"tokenOverlap": format!("{:.3}", plan.signals.token_overlap),
"combinedScore": format!("{:.3}", plan.signals.combined_score)
},
"explanation": plan.explanation,
"requiresConfirm": requires_confirm,
"nextStep": format!(
"Review the diff. To execute: apply_plan with plan_id='{}'{}.",
plan.id,
if requires_confirm { " and confirm=true" } else { "" }
),
"note": "Nothing was changed. This is a preview plan — apply_plan applies it; merge_undo reverses it."
})
}
// ============================================================================
// apply_plan
// ============================================================================
fn apply_plan(storage: &Arc<Storage>, args: Option<Value>) -> Result<Value, String> {
#[cfg(all(feature = "embeddings", feature = "vector-search"))]
{
let a = obj(&args);
let plan_id = a
.get("plan_id")
.and_then(|v| v.as_str())
.ok_or("plan_id is required")?;
let confirm = a.get("confirm").and_then(|v| v.as_bool()).unwrap_or(false);
let op = storage
.apply_plan(plan_id, confirm)
.map_err(|e| e.to_string())?;
Ok(json!({
"operationId": op.id,
"opType": op.op_type,
"status": op.status,
"survivorId": op.survivor_id,
"affectedIds": op.affected_ids,
"reason": op.reason,
"appliedAt": op.created_at,
"reversible": true,
"nextStep": format!("To reverse this, call merge_undo with operation_id='{}'.", op.id),
"note": "Old memories were bitemporally invalidated (valid_until stamped), NOT deleted. They remain queryable for audit."
}))
}
#[cfg(not(all(feature = "embeddings", feature = "vector-search")))]
{
let _ = (storage, args);
Err("The `embeddings` and `vector-search` features must both be enabled.".into())
}
}
// ============================================================================
// merge_undo (also lists the reflog when no id given)
// ============================================================================
fn merge_undo(storage: &Arc<Storage>, args: Option<Value>) -> Result<Value, String> {
#[cfg(all(feature = "embeddings", feature = "vector-search"))]
{
let a = obj(&args);
match a.get("operation_id").and_then(|v| v.as_str()) {
Some(op_id) => {
let op = storage.merge_undo(op_id).map_err(|e| e.to_string())?;
Ok(json!({
"undoOperationId": op.id,
"revertedOperationId": op.reverts_op_id,
"status": "reverted",
"affectedIds": op.affected_ids,
"reason": op.reason,
"note": "The original operation was reversed: survivor content/tags restored and invalidation cleared. The plan is re-openable."
}))
}
None => {
// No id => return the reflog so the caller can pick one.
let ops = storage.list_merge_operations(20).map_err(|e| e.to_string())?;
let log: Vec<Value> = ops
.iter()
.map(|op| {
json!({
"operationId": op.id,
"opType": op.op_type,
"status": op.status,
"survivorId": op.survivor_id,
"affectedIds": op.affected_ids,
"confidence": op.confidence.map(|c| format!("{:.3}", c)),
"reason": op.reason,
"createdAt": op.created_at,
"revertedAt": op.reverted_at
})
})
.collect();
Ok(json!({
"operations": log,
"totalOperations": log.len(),
"note": "This is the reversible operation log (the memory reflog). Pass operation_id to reverse one."
}))
}
}
}
#[cfg(not(all(feature = "embeddings", feature = "vector-search")))]
{
let _ = (storage, args);
Err("The `embeddings` and `vector-search` features must both be enabled.".into())
}
}
// ============================================================================
// protect
// ============================================================================
fn protect(storage: &Arc<Storage>, args: Option<Value>) -> Result<Value, String> {
let a = obj(&args);
let id = a
.get("id")
.and_then(|v| v.as_str())
.ok_or("id is required")?;
let protected = a.get("protected").and_then(|v| v.as_bool()).unwrap_or(true);
storage
.set_protected(id, protected)
.map_err(|e| e.to_string())?;
Ok(json!({
"id": id,
"protected": protected,
"note": if protected {
"Memory pinned. It can never be auto-merged, superseded, or garbage-collected until unprotected."
} else {
"Memory unprotected. It is now eligible for merge/supersede/forget again."
}
}))
}
// ============================================================================
// merge_policy (get when no args, set otherwise)
// ============================================================================
fn merge_policy(storage: &Arc<Storage>, args: Option<Value>) -> Result<Value, String> {
let a = obj(&args);
let current = storage.get_merge_policy().map_err(|e| e.to_string())?;
let has_update = a.contains_key("match_threshold")
|| a.contains_key("possible_threshold")
|| a.contains_key("auto_apply");
if has_update {
let match_t = a
.get("match_threshold")
.and_then(|v| v.as_f64())
.map(|v| v as f32)
.unwrap_or(current.match_threshold);
let possible_t = a
.get("possible_threshold")
.and_then(|v| v.as_f64())
.map(|v| v as f32)
.unwrap_or(current.possible_threshold);
let auto = a
.get("auto_apply")
.and_then(|v| v.as_bool())
.unwrap_or(current.auto_apply);
let policy = vestige_core::MergePolicy::new(match_t, possible_t, auto);
storage.set_merge_policy(policy).map_err(|e| e.to_string())?;
Ok(json!({
"updated": true,
"matchThreshold": policy.match_threshold,
"possibleThreshold": policy.possible_threshold,
"autoApply": policy.auto_apply,
"note": "Policy saved. Fellegi-Sunter: score>=match => auto-merge eligible; [possible,match) => review; below => not offered."
}))
} else {
Ok(json!({
"matchThreshold": current.match_threshold,
"possibleThreshold": current.possible_threshold,
"autoApply": current.auto_apply,
"note": "Two-threshold merge policy. Pass match_threshold / possible_threshold / auto_apply to change it."
}))
}
}
// ============================================================================
// TESTS — see tests/merge_supersede_test.rs for full integration coverage.
// ============================================================================
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn schemas_are_objects() {
for s in [
merge_candidates_schema(),
plan_merge_schema(),
plan_supersede_schema(),
apply_plan_schema(),
merge_undo_schema(),
protect_schema(),
merge_policy_schema(),
] {
assert_eq!(s["type"], "object");
}
}
#[test]
fn plan_merge_schema_requires_member_ids() {
assert!(plan_merge_schema()["required"]
.as_array()
.unwrap()
.iter()
.any(|v| v == "member_ids"));
}
}


@ -24,6 +24,9 @@ pub mod maintenance;
pub mod dedup;
pub mod importance;
// v2.1.25: Merge / Supersede controls (Phase 3)
pub mod merge;
// v1.5: Cognitive tools
pub mod dream;
pub mod explore;

docs/MERGE_SUPERSEDE.md (new file, 152 lines)

@ -0,0 +1,152 @@
# Merge / Supersede Controls (Phase 3)
> Diff-previewed, confidence-gated, reversible, self-explaining
> combine/dedupe/supersede on a never-delete (bitemporal) store.
Memory systems accumulate duplicates, near-duplicates, and outdated facts. The
naive fixes are all bad: dumb hashing under-merges (misses paraphrases),
aggressive LLM merging over-merges and destroys the audit trail, and
auto-deleting on contradiction silently loses information. Vestige's Phase 3
takes the opposite stance:
- **Opt-in, never silent.** The default is preview/review. Nothing mutates your
memory unless you explicitly apply a plan.
- **Diff-previewed.** `plan_merge` / `plan_supersede` show exactly what *would*
change before anything does.
- **Confidence-gated.** A Fellegi-Sunter two-threshold score classifies each
candidate as `match` / `possible` / `non_match`.
- **Reversible.** Every applied operation is recorded with an undo payload — a
*git reflog for your agent's memory*.
- **Self-explaining.** Each candidate carries the signals that explain *why* two
memories were judged duplicates.
- **Audit-preserving.** Superseding does not delete: it stamps `valid_until` and
keeps the old memory queryable (Graphiti-style "invalidate, don't delete").
## The bitemporal model: invalidate, don't delete
Superseding memory A with memory B does **not** erase A. Instead:
- `A.valid_until` is stamped with the supersede time.
- `A.superseded_by` is set to `B.id` (a lineage pointer).
- A remains fully queryable for audit. Searches and timelines can still surface
it; it is simply marked as no longer the current truth.
This reuses the existing `valid_from` / `valid_until` columns on
`knowledge_nodes` (migration V2) plus a new `superseded_by` column (migration
V14). Merges work the same way: the survivor absorbs the others' content, and
each absorbed node is bitemporally invalidated rather than deleted.
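
As a rough illustration of the invalidation step, the sketch below models a node with only the two fields discussed above. The `Node` struct, the `supersede` function, and the use of Unix seconds for timestamps are assumptions made for this example; they are not the actual `vestige_core` types.

```rust
/// Hypothetical, minimal stand-in for a row in `knowledge_nodes`.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub id: String,
    /// Unix seconds (an assumption); `None` means the node is still current.
    pub valid_until: Option<u64>,
    /// Lineage pointer to the node that replaced this one.
    pub superseded_by: Option<String>,
}

impl Node {
    /// A node is current at `now` if it has not been invalidated at or before `now`.
    pub fn is_current(&self, now: u64) -> bool {
        self.valid_until.map_or(true, |t| t > now)
    }
}

/// Invalidate `old` in favour of `new_id`. Nothing is removed: the node
/// keeps its id and content and only gains an end-of-validity stamp.
pub fn supersede(old: &mut Node, new_id: &str, at: u64) {
    old.valid_until = Some(at);
    old.superseded_by = Some(new_id.to_string());
}
```

After `supersede`, the old node is still present and readable; queries that want only current truth would filter on `is_current`, while audit queries would ignore that filter.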
## Fellegi-Sunter two-threshold scoring
Candidate scoring combines three signals into a weighted score in `[0, 1]`:
| Signal | Weight | Source |
| ----------------------- | -----: | ------------------------------------------ |
| Embedding cosine sim | 0.70 | stored embeddings (`node_embeddings`) |
| Tag overlap (Jaccard) | 0.15 | `knowledge_nodes.tags` |
| Content token overlap | 0.15 | Jaccard over content tokens (len > 2) |
The combined score is then classified against **two** thresholds:
```
score >= match_threshold                       => "match"     (auto-merge eligible)
possible_threshold <= score < match_threshold  => "possible"  (surfaced for review)
score < possible_threshold                     => "non_match" (never offered as a candidate)
```
Defaults: `match_threshold = 0.86`, `possible_threshold = 0.72`. The two-band
design means borderline cases are surfaced for review instead of being
force-decided in either direction.
A cluster's confidence is the **weakest** pairwise score within it (the loosest
link), so a cluster is only as confident as its least-similar member.
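
The scoring and classification rules above can be sketched as follows. This is a simplified illustration, not the code in `advanced::merge_supersede`: the function names, the plain linear combination, and the final clamp are assumptions, while the weights and the two-threshold rule come from this section.

```rust
// Weights taken from the table above.
const W_EMBEDDING: f32 = 0.70;
const W_TAGS: f32 = 0.15;
const W_TOKENS: f32 = 0.15;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchClass {
    Match,
    Possible,
    NonMatch,
}

/// Weighted sum of the three signals, clamped into [0, 1]
/// (cosine similarity can be negative, so the clamp matters).
pub fn combined_score(embedding_sim: f32, tag_overlap: f32, token_overlap: f32) -> f32 {
    let score = W_EMBEDDING * embedding_sim + W_TAGS * tag_overlap + W_TOKENS * token_overlap;
    score.clamp(0.0, 1.0)
}

/// Two-threshold classification: match / possible / non_match.
pub fn classify(score: f32, match_threshold: f32, possible_threshold: f32) -> MatchClass {
    if score >= match_threshold {
        MatchClass::Match
    } else if score >= possible_threshold {
        MatchClass::Possible
    } else {
        MatchClass::NonMatch
    }
}

/// Cluster confidence is the weakest pairwise score; `None` for an empty slice.
pub fn cluster_confidence(pairwise_scores: &[f32]) -> Option<f32> {
    pairwise_scores.iter().copied().reduce(f32::min)
}
```

With the default thresholds, a pair with embedding similarity 0.9 and overlaps of 0.5 each scores about 0.78 and lands in the `possible` band, so it would be surfaced for review rather than auto-merged.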
## The reversible operation log (the "memory reflog")
Every applied merge/supersede writes one row to `merge_operations`:
- `op_type`: `merge` | `supersede` | `undo`
- `status`: `applied` | `reverted`
- `survivor_id`, `affected_ids`: what was touched
- `confidence`, `signals`: the score and *why* the memories combined
- `reason`: a human-readable explanation
- `undo_payload`: a JSON snapshot capturing everything needed to reverse it
`merge_undo` consumes the undo payload to restore the survivor's prior
content/tags and clear the bitemporal invalidation on every affected node, then
records a compensating `undo` operation. Calling `merge_undo` with no
`operation_id` returns the operation log so you can pick one.
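
To make the apply/undo round trip concrete, here is a minimal in-memory sketch. Everything in it (`Memory`, `UndoPayload`, `apply_merge`, `undo`, the `HashMap` store) is hypothetical and stands in for the SQLite-backed implementation. It also assumes the absorbed nodes were not already invalidated before the merge, so undo can simply clear the stamps.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Memory {
    pub content: String,
    pub tags: Vec<String>,
    pub valid_until: Option<u64>,
    pub superseded_by: Option<String>,
}

/// Snapshot of what a merge changed, sufficient to reverse it.
#[derive(Debug, Clone, PartialEq)]
pub struct UndoPayload {
    pub survivor_id: String,
    pub prior_content: String,
    pub prior_tags: Vec<String>,
    pub invalidated_ids: Vec<String>,
}

/// Apply a merge: rewrite the survivor, invalidate the absorbed nodes,
/// and return the payload needed to undo. Returns `None` (and changes
/// nothing) if any id is unknown.
pub fn apply_merge(
    store: &mut HashMap<String, Memory>,
    survivor_id: &str,
    absorbed_ids: &[&str],
    merged_content: &str,
    merged_tags: &[&str],
    at: u64,
) -> Option<UndoPayload> {
    if absorbed_ids.iter().any(|id| !store.contains_key(*id)) {
        return None;
    }
    let survivor = store.get_mut(survivor_id)?;
    let payload = UndoPayload {
        survivor_id: survivor_id.to_string(),
        prior_content: survivor.content.clone(),
        prior_tags: survivor.tags.clone(),
        invalidated_ids: absorbed_ids.iter().map(|id| id.to_string()).collect(),
    };
    survivor.content = merged_content.to_string();
    survivor.tags = merged_tags.iter().map(|t| t.to_string()).collect();
    for id in absorbed_ids {
        if let Some(m) = store.get_mut(*id) {
            m.valid_until = Some(at);
            m.superseded_by = Some(survivor_id.to_string());
        }
    }
    Some(payload)
}

/// Reverse a merge: restore the survivor and clear the invalidation stamps.
pub fn undo(store: &mut HashMap<String, Memory>, payload: &UndoPayload) -> bool {
    let Some(survivor) = store.get_mut(&payload.survivor_id) else {
        return false;
    };
    survivor.content = payload.prior_content.clone();
    survivor.tags = payload.prior_tags.clone();
    for id in &payload.invalidated_ids {
        if let Some(m) = store.get_mut(id) {
            m.valid_until = None;
            m.superseded_by = None;
        }
    }
    true
}
```

The property worth checking in a real test suite is the same one this sketch satisfies: applying a merge and then undoing it leaves the store equal to its state before the merge.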
## Memory protection (pinning)
`protect` sets the `protected` flag on a memory. A protected memory:
- is never offered for auto-merge (it is flagged in `merge_candidates`),
- cannot be merged *away* (it may only be the survivor of a merge),
- cannot be superseded,
- is excluded from garbage collection.
Pass `protected: false` to unpin.
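
The "may only be the survivor" rule can be expressed as a small pre-flight check. The sketch below is illustrative only; `Member` and `check_merge` are hypothetical names, and the real check lives in the storage layer.

```rust
/// Hypothetical view of a merge member: just its id and protection flag.
pub struct Member<'a> {
    pub id: &'a str,
    pub protected: bool,
}

/// Reject a merge that would absorb (merge away) a protected memory.
/// A protected memory is acceptable only as the survivor.
pub fn check_merge(members: &[Member<'_>], survivor_id: &str) -> Result<(), String> {
    if members.len() < 2 {
        return Err("a merge needs at least 2 members".to_string());
    }
    if !members.iter().any(|m| m.id == survivor_id) {
        return Err(format!("survivor {survivor_id} is not one of the members"));
    }
    for m in members {
        if m.protected && m.id != survivor_id {
            return Err(format!("memory {} is protected and cannot be merged away", m.id));
        }
    }
    Ok(())
}
```

For example, with a protected `a` and an unprotected `b`, merging into `a` passes the check, while merging into `b` is rejected because it would absorb `a`.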
## Tool surface
| Tool | Mutates? | Purpose |
| ------------------ | :------: | ------------------------------------------------------------------------- |
| `merge_candidates` | No | Surface likely duplicate clusters with confidence + signals. |
| `plan_merge` | No | Preview a merge of 2+ memories (a diff). Returns a `plan_id`. |
| `plan_supersede` | No | Preview superseding A with B (bitemporal). Returns a `plan_id`. |
| `apply_plan` | **Yes** | Execute a plan by id; recorded as a reversible operation. |
| `merge_undo` | **Yes** | Reverse an operation, or list the operation log when given no id. |
| `protect` | **Yes** | Pin / unpin a memory so it can never be auto-merged/superseded/forgotten. |
| `merge_policy` | **Yes** | Get/set the two thresholds + `auto_apply`. |
### Typical flow
```text
1. merge_candidates -> review clusters + confidence + signals
2. plan_merge { member_ids: [...] } -> inspect the diff, get plan_id
3. apply_plan { plan_id, confirm } -> apply; get operation_id (reversible)
4. merge_undo { operation_id } -> reverse if it was wrong
```
`apply_plan` requires `confirm: true` for `possible` / `non_match` plans. A
`match` plan applies without `confirm` only when the policy has
`auto_apply: true` (default `false`).
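
The confirmation gate reduces to one boolean expression. The sketch below restates it in isolation; `requires_confirm` mirrors the expression used when a plan is rendered to JSON, while `may_apply` and the local `MatchClass` enum are hypothetical helpers added for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchClass {
    Match,
    Possible,
    NonMatch,
}

/// Confirmation is skipped only for a `match` plan under `auto_apply = true`.
pub fn requires_confirm(class: MatchClass, auto_apply: bool) -> bool {
    class != MatchClass::Match || !auto_apply
}

/// Gate an apply request on the caller's `confirm` flag.
pub fn may_apply(class: MatchClass, auto_apply: bool, confirm: bool) -> Result<(), String> {
    if requires_confirm(class, auto_apply) && !confirm {
        Err("this plan requires confirm=true".to_string())
    } else {
        Ok(())
    }
}
```

Under the default policy (`auto_apply = false`), every plan needs `confirm: true`, including `match` plans.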
## Configuration
The merge policy persists per project (stored in `fsrs_config`). Environment
variables can supply values when no policy has been persisted:
| Variable | Meaning |
| ----------------------------------- | ------------------------------------ |
| `VESTIGE_MERGE_MATCH_THRESHOLD` | Score ≥ this ⇒ `match`. |
| `VESTIGE_MERGE_POSSIBLE_THRESHOLD` | Score ≥ this ⇒ at least `possible`. |
| `VESTIGE_MERGE_AUTO_APPLY` | `1`/`true` to allow auto-apply. |
A persisted policy (set via `merge_policy`) takes precedence over the
environment, which takes precedence over the built-in defaults. When
`vestige.toml` configuration lands, the policy will read from there as well.
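
The precedence described above (persisted policy, then environment, then built-in default) could be resolved per field along these lines. This is a sketch under assumptions: the helper names are invented, and treating unparsable or out-of-range environment values as absent is a choice made for this example, not documented behaviour.

```rust
/// Parse a threshold from raw text, accepting only finite values in [0, 1].
fn parse_threshold(raw: &str) -> Option<f32> {
    raw.trim()
        .parse::<f32>()
        .ok()
        .filter(|v| (0.0..=1.0).contains(v))
}

/// Persisted value wins, then a valid environment value, then the default.
pub fn resolve_threshold(persisted: Option<f32>, env_raw: Option<&str>, default: f32) -> f32 {
    persisted
        .or_else(|| env_raw.and_then(parse_threshold))
        .unwrap_or(default)
}

/// Example wiring for the match threshold, using the documented variable
/// name and default.
pub fn match_threshold(persisted: Option<f32>) -> f32 {
    let raw = std::env::var("VESTIGE_MERGE_MATCH_THRESHOLD").ok();
    resolve_threshold(persisted, raw.as_deref(), 0.86)
}
```

Keeping `resolve_threshold` free of direct environment access makes the precedence rule easy to unit-test without mutating process state.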
## Schema (migration V14)
- `knowledge_nodes.protected INTEGER NOT NULL DEFAULT 0`
- `knowledge_nodes.superseded_by TEXT`
- `merge_plans(id, kind, status, created_at, applied_at, survivor_id,
member_ids, confidence, classification, payload)`
- `merge_operations(id, plan_id, op_type, status, created_at, reverted_at,
reverts_op_id, survivor_id, affected_ids, confidence, signals, reason,
undo_payload)`
The two `ALTER TABLE ... ADD COLUMN` statements are applied with duplicate-column
guards so the migration is idempotent on replay; the rest of V14 uses
`CREATE ... IF NOT EXISTS`.
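
One common way to build such a guard is to attempt the `ALTER TABLE` and treat SQLite's "duplicate column name" error as success. The sketch below shows that pattern with the SQL executor abstracted as a closure so it stays dependency-free; `add_column_idempotent` is a hypothetical helper, and the actual V14 migration may implement its guard differently (for example by inspecting the table's columns first).

```rust
/// Run `ALTER TABLE <table> ADD COLUMN <column_def>` through `exec`.
/// Returns Ok(true) if the column was added, Ok(false) if it already
/// existed, and Err for any other failure.
pub fn add_column_idempotent<F>(mut exec: F, table: &str, column_def: &str) -> Result<bool, String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let sql = format!("ALTER TABLE {table} ADD COLUMN {column_def}");
    match exec(&sql) {
        Ok(()) => Ok(true),
        // SQLite reports an existing column as "duplicate column name: <name>".
        Err(msg) if msg.contains("duplicate column name") => Ok(false),
        Err(msg) => Err(msg),
    }
}
```

In a real migration, `exec` would wrap the database connection's execute call and map its error to a string; replaying the migration then yields `Ok(false)` for both columns instead of failing.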
## Anti-patterns this design avoids
- **Silently double-storing contradictions.** Merge composition attributes and
de-duplicates content instead of blindly concatenating or dropping it.
- **Auto-deleting on contradiction.** Supersede invalidates bitemporally; the
old memory is retained and queryable.
- **Trading away the audit trail for auto-merge convenience.** Every operation is
logged and reversible, with provenance for why memories combined.


@ -1,6 +1,6 @@
{
"name": "vestige",
-"version": "2.1.23",
+"version": "2.1.25",
"private": true,
"description": "Cognitive memory for AI - MCP server with FSRS-6 spaced repetition",
"author": "Sam Valladares",


@ -1,6 +1,6 @@
{
"name": "@vestige/init",
-"version": "2.1.23",
+"version": "2.1.25",
"description": "Configure Vestige local memory for MCP-compatible AI agents",
"bin": {
"vestige-init": "bin/init.js"


@ -1,6 +1,6 @@
{
"name": "vestige-mcp-server",
-"version": "2.1.23",
+"version": "2.1.25",
"mcpName": "io.github.samvallad33/vestige",
"description": "Vestige MCP Server — local cognitive memory for MCP-compatible AI agents",
"bin": {


@ -7,12 +7,12 @@
"url": "https://github.com/samvallad33/vestige",
"source": "github"
},
-"version": "2.1.23",
+"version": "2.1.25",
"packages": [
{
"registryType": "npm",
"identifier": "vestige-mcp-server",
-"version": "2.1.23",
+"version": "2.1.25",
"transport": {
"type": "stdio"
}