Two deeper review findings (both blockers) + doc de-staling.
C2-deep: my earlier C2 made purge/delete TRACE as memory.write, but gate_writes
did `get_node(id) -> skip on None`, and purge had already DELETEd the row — so a
destructive removal still never opened a Memory PR (it was silently skipped).
The most security-critical write type couldn't be reviewed. Fix: a missing node
is now gateable for destructive decisions — gate_writes builds the WriteContext
from the decision itself (marks `forgets`, which classify_write gates), and the
PR records the removal with node.deleted=true. Proven live: purging a node opens
a PR (kind node_decayed, deleted true); test
gate_opens_pr_for_destructive_write_after_node_deleted_c2.
PRIV: gate_writes copied the FULL node.content into the PR diff + title, so a
real secret in a gated memory would leak into the memory_prs table, the
dashboard, and any exported proof bundle — defeating the point of gating
sensitive writes. Fix: the PR now stores a truncated content PREVIEW + an FNV
content HASH, and sensitive-topic/sensitive-node-type writes are fully REDACTED
("[redacted — sensitive content; review via risk signals]"). The reviewer still
sees the risk signals (why it opened) and a hash (to correlate), never the
secret. Tests gate_redacts_sensitive_content_in_pr_priv,
content_preview_redacts_sensitive_and_truncates, content_hash_is_stable. The
committed memory_pr.json + the whole proof bundle were re-captured and contain
no secret (verified by scan); the re-shot memory-prs.png shows the redaction.
DOC: REVIEW.md commit list is now git-log-based (no stale hashes); C2-deep + PRIV
added to the findings table; PROOF.md write/PR rows updated; test count -> 1007.
Gates: 1007 lib tests pass (+7 new regressions), clippy -D warnings clean,
dashboard check + build clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5.7 KiB
Vestige Agent Black Box — Proof Pack (2026-06-22)
Public claim (frozen): Vestige records real MCP memory activity into a replayable local trace, with receipts and reviewable risky writes.
We do not claim Sanhedrin vetoes or dream patches are live by default. Those producers are optional and off by default — the UI says so explicitly.
This pack is captured from a live Vestige build on branch
feat/agent-black-box — a real vestige-mcp process with the dashboard
enabled, driven by real MCP tools/call traffic. Nothing here is mocked.
The receipt chain — one runId, every hop
The money guarantee: a single runId (run_proof) crosses every layer,
byte-identical. Verified two ways — by the files in this folder, and by the
deterministic regression test test_full_spine_one_runid_crosses_every_hop
(crates/vestige-mcp/src/server.rs).
| Hop | Layer | Evidence in this pack |
|---|---|---|
| 1 | MCP tool output (runId + traceUri) |
every tool result; see test HOP 1 |
| 2 | SQLite agent_traces rows |
trace.json (runId: run_proof, 10 events) |
| 3 | WebSocket broadcast | websocket-events.jsonl (6 TraceEvent lines, each with run_id) |
| 4 | /api/traces/:runId response |
trace.json is the export of that endpoint |
| 5 | dashboard render | screenshots (Black Box timeline = the 10 events) |
| 6 | vestige://trace/{runId} MCP resource |
test HOP 5 resolves the same id |
Files
| File | What it proves |
|---|---|
status.json |
the live server health at capture time |
trace.json |
the full .vestige-trace.json export — 10 real events in order |
receipt.json |
a real retrieval receipt (r_2026_06_22_runproof, 5 retrieved, decay medium) |
memory_pr.json |
the risky auth write → Memory PR, promoted through UI→API→SQLite, signal sensitive_topic |
websocket-events.jsonl |
the live WS stream: TraceEvent×6, MemoryPrOpened, MemoryPrDecided, MemoryCreated, MemoryUpdated |
screenshots/ |
Graph, Black Box, Receipts (in PR), Memory PRs — see screenshots/README.md |
Per-feature honesty: real / caveat / stub
| Feature | Status | Notes |
|---|---|---|
mcp.call trace |
REAL | every tools/call records one; args hashed, never stored raw |
memory.write trace |
REAL | fires on smart_ingest/ingest, memory promote/demote/edit, codebase remember_*, AND destructive purge/delete |
memory.retrieve trace |
REAL | fires on deep_reference/search, with per-id activation |
memory.suppress trace |
REAL | recorded path; fires when retrieval suppresses |
contradiction.detected trace |
REAL | fires when deep_reference surfaces a contradiction pair; UI says "no contradiction in this run" when none |
| Memory Receipts | REAL | built from real scored memories + trust, persisted, attached to output |
| Risk-gated Memory PRs | REAL | quarantine review: commit-then-suppress, audit preserved, influence suspended. Promote verified end-to-end (releases the memory, even past the 24h window). Destructive purge/delete also open a PR. PR content is redacted for sensitive writes (preview + hash, never the raw secret) |
| Fast / Risk-Gated / Paranoid modes | REAL | persisted to <data_dir>/review_mode.json; Risk-Gated is the default |
| WebSocket broadcast | REAL | proven by websocket-events.jsonl + a unit test |
vestige://trace/{runId} resource |
REAL | proven by the full-spine test |
sanhedrin.veto trace |
CAVEAT | extraction code is real + unit-tested, but the Sanhedrin verifier is an optional hook, off by default — no producer is connected, and the UI says exactly that |
dream.patch trace |
REAL (proven 2026-06-22) | a real dream run over 6 memories produced one dream.patch event under run_dream_proof — see dream-trace.json (last event), dream-websocket-events.jsonl, and screenshots/dream-producers.png where the row flips to "fired this run". The UI still shows "No dream run in this trace" for runs where no dream executed. |
| Graph-pulse "Open receipt in Cinema" | REAL (deep-link) | navigates the graph centered on the receipt's primary memory; MemoryCinema itself is unchanged |
No feature is stubbed. The two CAVEATs are real plumbing whose upstream producer is intentionally off by default — surfaced as explicit UI states, not empty mystery.
dream.patch — proven with a real dream run (2026-06-22)
Bounded follow-up: a single real dream consolidation flipped the dream.patch
producer from "quiet" to a recorded live event, same runId, every hop.
- 6 related memories seeded under
run_dream_proof, then onedreamcall. - The dream produced one consolidation insight → one
dream.patchevent:dream:RecurringPattern:5d941c7f+a41aca72+b029fe53+6167f2c3+1117dd4e+e0782442(the real insight type + the six source memories it bridged). - SQLite:
dream-trace.json(14 events, last isdream.patch). - API:
/api/traces/run_dream_proof/export→dream-trace.json. - WebSocket:
dream-websocket-events.jsonl(thedream.patchTraceEvent). - Dashboard:
screenshots/black-box-dream.png+screenshots/dream-producers.png(the producers row shows dream.patch · fired this run).
dream.patch is real but not live-by-default: it fires only when a dream
actually runs. The UI says so for runs where it didn't.
Reproduce
VESTIGE_DATA_DIR=<tmp> VESTIGE_DASHBOARD_ENABLED=true vestige-mcp(stdio).initialize, then drivesmart_ingest/deep_referencecalls with arunIdargument.- A sensitive-topic write (auth/security/money/identity/…) opens a Memory PR.
curl /api/traces/<runId>/export→ the.vestige-trace.json.cargo test -p vestige-mcp test_full_spine_one_runid_crosses_every_hop.