From ec8bf272553802ac9ede7c1d64a9f2e10a8e92d2 Mon Sep 17 00:00:00 2001
From: Sam Valladares
Date: Sun, 28 Jun 2026 18:12:16 -0500
Subject: [PATCH] docs(mcp): add reconciled two-layer tool-consolidation plan; refresh stale comments
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds docs/launch/tool-consolidation-v2.2.0.md — the single sequenced plan that
reconciles the two prior planning notes:

- Layer 1 (this PR): 34 → 12 advertised tools, safe commit order, alias policy,
  preserved invariants, and the test that proves each.
- Layer 2 (follow-up): tiny always-on default surface + SessionStart/Stop hooks.

Also refreshes stale in-code comments to match the consolidated surface:

- server.rs handle_tools_list header (was "v2.1.21: 25 tools") and the
  size-annotation rationale (now lists recall/memory_status/dedup/graph).
- tools/mod.rs module doc (the facade vs. granular-handler relationship).

No behavior change. Gates: cargo test --workspace, cargo clippy -D warnings,
pnpm dashboard check + build — all green.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 crates/vestige-mcp/src/server.rs         |  16 +--
 crates/vestige-mcp/src/tools/mod.rs      |  11 +-
 docs/launch/tool-consolidation-v2.2.0.md | 135 +++++++++++++++++++++++
 3 files changed, 152 insertions(+), 10 deletions(-)
 create mode 100644 docs/launch/tool-consolidation-v2.2.0.md

diff --git a/crates/vestige-mcp/src/server.rs b/crates/vestige-mcp/src/server.rs
index f63a781..671f179 100644
--- a/crates/vestige-mcp/src/server.rs
+++ b/crates/vestige-mcp/src/server.rs
@@ -240,9 +240,10 @@ impl McpServer {
 
     /// Handle tools/list request
     async fn handle_tools_list(&self) -> Result {
-        // v2.1.21: 25 tools (verified by the `tools.len() == 25` assertion in the
-        // handle_tools_list test below — the `suppress` tool landed in v2.0.5).
-        // Deprecated tools still work via redirects in handle_tools_call.
+        // v2.2: 12 advertised tools after Layer-1 Tool Consolidation
+        // (verified by `tools.len() == 12` in test_tools_list_returns_all_tools).
+        // 22 deprecated/folded names still work as hidden redirects in
+        // handle_tools_call. See docs/launch/tool-consolidation-v2.2.0.md.
         let mut tools = vec![
             // ================================================================
             // RECALL — unified retrieval tool (v2.2). HOT PATH.
@@ -387,10 +388,11 @@ impl McpServer {
         // chunk-read them.
         //
         // Per-tool caps below are sized at ~2× observed peak with growth
-        // headroom; max permitted by Anthropic is 500_000. Only the four
-        // empirically-measured high-payload tools carry the annotation today;
-        // the remaining 21 tools deliberately do NOT (cargo-cult prevention —
-        // annotating a small-payload tool dilutes the signal).
+        // headroom; max permitted by Anthropic is 500_000. Only the
+        // high-payload tools carry the annotation (recall, memory_status,
+        // memory, codebase, dedup, graph); the remaining advertised tools
+        // deliberately do NOT (cargo-cult prevention — annotating a
+        // small-payload tool dilutes the signal).
         //
         // Other tools that COULD plausibly grow into the annotated set with
         // future workload (`deep_reference`, `cross_reference`, `memory_graph`,
diff --git a/crates/vestige-mcp/src/tools/mod.rs b/crates/vestige-mcp/src/tools/mod.rs
index 4e356e0..e69d59a 100644
--- a/crates/vestige-mcp/src/tools/mod.rs
+++ b/crates/vestige-mcp/src/tools/mod.rs
@@ -2,9 +2,14 @@
 //!
 //! Tool implementations for the Vestige MCP server.
 //!
-//! The unified tools (codebase_unified, intention_unified, memory_unified, search_unified)
-//! are the primary API. The granular tools below are kept for backwards compatibility
-//! but are not exposed in the MCP tool list.
+//! v2.2 Tool Consolidation (Layer 1): the advertised surface is 12 tools —
+//! recall, memory, codebase, intention, smart_ingest, source_sync,
+//! memory_status, dedup, graph, maintain, session_start, suppress. The unified
+//! facade modules (recall, dedup, memory_status, graph_unified, maintain, plus
+//! the earlier *_unified) dispatch on an action/mode/view discriminator and
+//! delegate to the granular handler modules below, which stay in the crate as
+//! the implementation layer and as hidden back-compat aliases (see the redirect
+//! arms in server.rs). See docs/launch/tool-consolidation-v2.2.0.md.
 
 // Active unified tools
 pub mod codebase_unified;
diff --git a/docs/launch/tool-consolidation-v2.2.0.md b/docs/launch/tool-consolidation-v2.2.0.md
new file mode 100644
index 0000000..3e7190c
--- /dev/null
+++ b/docs/launch/tool-consolidation-v2.2.0.md
@@ -0,0 +1,135 @@
+# Tool Consolidation v2.2.0
+
+> Reduce the Vestige MCP tool surface so an agent can reliably pick the right
+> tool, then make the few always-on tools deterministic. Two layers: Layer 1
+> (this release) collapses 34 advertised tools to 12; Layer 2 (follow-up) shrinks
+> the *default* surface and enforces the memory loop with hooks.
+
+## Why (frontier evidence)
+
+More advertised tools actively degrade tool selection — the 30 tools an agent
+ignores make the 5 it uses harder to choose:
+
+- **RAG-MCP** (arXiv 2505.03275): selection accuracy collapses 43% → 14% when the
+  full tool catalog is dumped into context; stays >90% under ~30 tools.
+- **Anthropic tool-deferral**: deferring tool schemas moved Opus 4 from 49% → 74%
+  on a tool-heavy benchmark.
+- **GitHub Copilot**: 40 → 13 tools gave +2–5pp accuracy and −400ms latency.
+- **OpenAI** guidance: aim for <20 functions visible at the start of a turn.
+- **RoTBench** (2401.08326): tool *names* are load-bearing — renaming drops GPT-4
+  80 → 58. So renames are deliberate and every old name keeps working.
+
+Vestige had **34** advertised tools. This is the correction.
+
+## Layer 1 — Count reduction (THIS RELEASE): 34 → 12 advertised
+
+Principle: **one consolidation per commit, one change per submission.** Each
+consolidation is its own commit, landed in a safe order with the hot retrieval
+path touched last. Every old tool name remains a hidden `warn!` + redirect alias
+for at least one minor release (so existing `.mcp.json` configs, hooks, and agent
+habits keep working) and is removed in **v2.3.0**.
+
+### Safe order (as committed)
+
+| # | Commit | Folds | Into | Count |
+|---|--------|-------|------|------:|
+| 1 | `dedup` | find_duplicates + merge_candidates + plan_merge + plan_supersede + apply_plan + merge_undo + protect + merge_policy (8) | `dedup` | 34 → 27 |
+| 2 | `session_start` | session_context (rename) | `session_start` | 27 |
+| 3a | `memory_status` | system_status + memory_health + memory_timeline + memory_changelog (4) | `memory_status` | 27 → 24 |
+| 3b | `graph` | explore_connections + predict + memory_graph + composed_graph (4) | `graph` | 24 → 21 |
+| 4 | `maintain` | consolidate + dream + gc + importance_score + backup + export + restore (7) | `maintain` | 21 → 15 |
+| 5 | `recall` | search + deep_reference + cross_reference + contradictions (4) | `recall` | 15 → 12 |
+
+`recall` is committed **last** because it is the hot path.
+
+### Final advertised surface (12)
+
+| Standalone (6) | Consolidated (6) |
+|---|---|
+| `smart_ingest` | `recall` |
+| `memory` | `dedup` |
+| `codebase` | `memory_status` |
+| `intention` | `graph` |
+| `source_sync` | `maintain` |
+| `suppress` | `session_start` |
+
+### Action / mode / view maps
+
+- **`recall`** — `mode`: `lookup` (default) · `reason` · `contradictions`
+- **`dedup`** — `action`: `scan` (default) · `plan_merge` · `plan_supersede` · `apply` · `undo` · `protect` · `policy`
+- **`memory_status`** — `view`: `health` (default) · `retention` · `timeline` · `changelog`
+- **`graph`** — `action`: `chain` · `associations` · `bridges` · `predict` · `memory_graph` · `recent` · `get` · `memory` · `neighbors` · `never_composed` · `bounty_mode` · `label`
+- **`maintain`** — `action`: `consolidate` · `dream` · `gc` · `importance_score` · `backup` · `export` · `restore`
+
+### Resolved design decisions
+
+- **`search` is folded, not kept standalone.** `recall` with no `mode` (the
+  default) *is* search — a zero-overhead pass-through to `search_unified`. Keeping
+  both `search` and `recall` advertised would be the exact RAG-MCP anti-pattern.
+  Final count is a clean **12**, leaving 2 slots of headroom toward a future
+  always-on `save` surface rather than spending them on a redundant verb.
+- **`graph` actions are flat peers, not nested.** `explore`'s `chain` /
+  `associations` / `bridges` sit alongside `predict` / `memory_graph` /
+  `composed_graph` actions in a single `action` enum — matching the existing
+  `memory` / `codebase` flat-action convention and avoiding a translation layer.
+
+### Invariants preserved (with the test that proves each)
+
+- **bitemporal-never-delete** (`dedup`): plan → apply → undo, confirm-gating, and
+  invalidation-not-deletion delegate to `merge::execute` verbatim.
+- **`system_status` response shape** (`memory_status` view=`health`): byte-for-byte
+  — `test_default_view_is_health`.
+- **`gc` dry-run default** + **`restore` path-confinement** (`maintain`):
+  `test_maintain_actions_and_safety`.
+- **`recall` lookup = search, no reasoning cost** (hot path):
+  `test_recall_lookup_matches_search_shape`.
+- **Dashboard events** (consolidate/dream/importance_score Started + Completed,
+  SearchPerformed): preserved by re-emitting in the new dispatch arms and by
+  `emit_tool_event` normalizing the unified tool name to its effective sub-action.
+
+### Result-size annotations (moved with their tools)
+
+`memory_timeline` (200k) → `memory_status`; `search` (300k) → `recall`; new
+`dedup` 150k and `graph` 250k. Kept in sync across the annotation loop, the
+`expected_max_result_size` helper, and both annotation guard tests.
+
+### Deprecation timeline
+
+Aliases `warn!` in v2.2.x and are hard-removed in **v2.3.0**. Full alias list (31
+names) lives in the dispatch redirects in `crates/vestige-mcp/src/server.rs`.
+
+## Layer 2 — Default-surface + hooks (FOLLOW-UP, NOT in v2.2.0)
+
+Count reduction is necessary but not sufficient: what matters most is how few
+tools are visible *at the start of a turn*, plus making the memory loop fire
+deterministically instead of hoping the model remembers.
+
+- **Tiny always-on surface (~3)**: `recall` @ session start, `save` (=`smart_ingest`)
+  @ session end, `recall` on-demand for facts. Everything else (`dedup`, `graph`,
+  `maintain`, `memory_status`, …) deferred off the default surface, loaded on
+  demand.
+- **Deterministic hooks**: a `SessionStart` hook fires `recall`; a `Stop` hook
+  fires `save` (async, fire-and-forget — synchronous heavy work in `Stop` causes
+  loops + per-turn lag). "If the model fails to save, it's gone" — move save out
+  of the model hot loop.
+- This is what turns 12-advertised into ~3-default. Status: **design guidance
+  only; no code in v2.2.0.**
+
+## Verification
+
+Per-commit gates (all green for every commit):
+
+```sh
+cargo test --workspace --no-fail-fast
+cargo clippy --workspace -- -D warnings
+```
+
+Release gates before tagging v2.2.0:
+
+```sh
+pnpm --filter @vestige/dashboard check
+pnpm --filter @vestige/dashboard build
+```
+
+Plus a `tools/list` smoke check asserting exactly **12** advertised names
+(`test_tools_list_returns_all_tools`).
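
Reviewer sketch (not part of the patch): the alias policy and the 12-name smoke check described in the plan could look roughly like the following. This is a minimal illustration under stated assumptions, not the code in `server.rs`. The `Redirect` struct and the `redirect_for` and `resolve` functions are hypothetical names, only a handful of the folded names are shown, the exact old-name to discriminator pairing is inferred from the plan's tables, and `eprintln!` stands in for the `warn!` macro so the block has no dependencies.

```rust
/// The 12 advertised names, copied from the "Final advertised surface" table.
static ADVERTISED: [&str; 12] = [
    "smart_ingest", "memory", "codebase", "intention", "source_sync", "suppress",
    "recall", "dedup", "memory_status", "graph", "maintain", "session_start",
];

/// Hypothetical redirect target: the unified tool plus an optional
/// (discriminator key, value) pair to inject into the call arguments.
#[derive(Debug, PartialEq)]
struct Redirect {
    tool: &'static str,
    discriminator: Option<(&'static str, &'static str)>,
}

/// Hypothetical alias table covering a few folded names only.
/// The pairings are assumptions inferred from the plan, not read from server.rs.
fn redirect_for(old: &str) -> Option<Redirect> {
    let (tool, discriminator) = match old {
        "search" => ("recall", None), // recall's default mode is lookup
        "contradictions" => ("recall", Some(("mode", "contradictions"))),
        "system_status" => ("memory_status", Some(("view", "health"))),
        "session_context" => ("session_start", None), // pure rename
        "gc" => ("maintain", Some(("action", "gc"))),
        "predict" => ("graph", Some(("action", "predict"))),
        _ => return None,
    };
    Some(Redirect { tool, discriminator })
}

/// Resolve a requested tool name: advertised names pass through unchanged,
/// deprecated names log a warning and redirect, unknown names yield None.
fn resolve(name: &str) -> Option<Redirect> {
    if let Some(tool) = ADVERTISED.iter().copied().find(|t| *t == name) {
        return Some(Redirect { tool, discriminator: None });
    }
    let redirect = redirect_for(name)?;
    // Stand-in for `warn!`: the alias keeps working but announces its deprecation.
    eprintln!("warn: tool `{}` is deprecated; use `{}` instead", name, redirect.tool);
    Some(redirect)
}

fn main() {
    for name in ["recall", "search", "system_status", "no_such_tool"] {
        println!("{} -> {:?}", name, resolve(name));
    }
}
```

Keeping the alias table separate from the advertised list makes the smoke check cheap: the list length is asserted directly, and a second assertion can confirm that no advertised name doubles as an alias.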