mirror of https://github.com/samvallad33/vestige.git synced 2026-07-02 22:01:01 +02:00

Sam Valladares e08182675b fix(blackbox): C2-deep gate destructive writes post-delete + redact PR content

Two deeper review findings (both blockers) + doc de-staling.

C2-deep: my earlier C2 made purge/delete TRACE as memory.write, but gate_writes
did `get_node(id) -> skip on None`, and purge had already DELETEd the row — so a
destructive removal still never opened a Memory PR (it was silently skipped).
The most security-critical write type couldn't be reviewed. Fix: a missing node
is now gateable for destructive decisions — gate_writes builds the WriteContext
from the decision itself (marks `forgets`, which classify_write gates), and the
PR records the removal with node.deleted=true. Proven live: purging a node opens
a PR (kind node_decayed, deleted true); test
gate_opens_pr_for_destructive_write_after_node_deleted_c2.

PRIV: gate_writes copied the FULL node.content into the PR diff + title, so a
real secret in a gated memory would leak into the memory_prs table, the
dashboard, and any exported proof bundle — defeating the point of gating
sensitive writes. Fix: the PR now stores a truncated content PREVIEW + an FNV
content HASH, and sensitive-topic/sensitive-node-type writes are fully REDACTED
("[redacted — sensitive content; review via risk signals]"). The reviewer still
sees the risk signals (why it opened) and a hash (to correlate), never the
secret. Tests gate_redacts_sensitive_content_in_pr_priv,
content_preview_redacts_sensitive_and_truncates, content_hash_is_stable. The
committed memory_pr.json + the whole proof bundle were re-captured and contain
no secret (verified by scan); the re-shot memory-prs.png shows the redaction.

DOC: REVIEW.md commit list is now git-log-based (no stale hashes); C2-deep + PRIV
added to the findings table; PROOF.md write/PR rows updated; test count -> 1007.

Gates: 1007 lib tests pass (+7 new regressions), clippy -D warnings clean,
dashboard check + build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-22 19:50:57 -05:00

12 KiB

Raw Blame History

Agent Black Box — Review Bundle

Branch: feat/agent-black-box (head = branch tip) Base (review against): 9e92a5999ada37bed9b4820bb25b7748b417411c (the feat/dashboard-bleeding-edge tip this branched from) Status: feature work frozen. No quarantine-constellation work has started.

Start here, then read PROOF.md (the per-feature real/caveat/stub ledger) and open the screenshots.

Update — review findings addressed. A full multi-agent review found 7 real issues (4 blockers). All are fixed and tested; this bundle was re-captured from a single run (run_proof) so trace.json, websocket-events.jsonl, and memory_pr.json now all carry the same runId. See "Review findings addressed" below.

Frozen public claim

Vestige records real MCP memory activity into a replayable local trace, with receipts and reviewable risky writes.

We do not claim Sanhedrin vetoes are live by default. dream.patch is real but fires only when a dream actually runs (proven below; the UI says so when no dream ran).

What's in this bundle

File	What it is
`REVIEW.md`	this file — the entry point
`PROOF.md`	per-feature real/caveat/stub ledger + reproduce steps
`status.json`	live server `/api/health` at capture time
`trace.json`	`.vestige-trace.json` export of `run_proof` (10 real events)
`receipt.json`	a real retrieval receipt (`r_2026_06_22_runproof`)
`memory_pr.json`	the risky auth write → Memory PR, promoted UI→API→SQLite
`websocket-events.jsonl`	live WS stream: `TraceEvent`×6, `MemoryPrOpened`, `MemoryPrDecided`, …
`dream-trace.json`	`run_dream_proof` export — 14 events, last is `dream.patch`
`dream-websocket-events.jsonl`	live WS stream containing the `dream.patch` `TraceEvent`
`screenshots/black-box.png`	Black Box tab (spine header, scrubber, producers, log)
`screenshots/receipts.png`	the `ReceiptCard` with real data + "Open receipt in Cinema"
`screenshots/memory-prs.png`	Memory PRs: diff, "Why this opened", `Decided: promote`
`screenshots/graph.png`	live graph constellation + Memory Cinema (unchanged)
`screenshots/black-box-dream.png`	Black Box on the dream run
`screenshots/dream-producers.png`	producers panel — `dream.patch · fired this run`

Caveat ledger (the honest part)

Producer	Status	Why
`mcp.call`, `memory.write`, `memory.retrieve`, `memory.suppress`	REAL	fire on real tool traffic; args hashed, never stored raw
`contradiction.detected`	REAL	fires when deep_reference surfaces a contradiction; UI shows "no contradiction in this run" otherwise
Memory Receipts	REAL	built from real scored memories + trust, persisted, attached to output
Risk-gated Memory PRs (quarantine review)	REAL	commit-then-suppress; audit preserved, influence suspended; Promote verified end-to-end
WebSocket broadcast	REAL	`websocket-events.jsonl` + unit test
`vestige://trace/{runId}` resource	REAL	full-spine test hop 5
`dream.patch`	REAL (not live-by-default)	proven by `run_dream_proof`; fires only when a dream runs; UI says so otherwise
`sanhedrin.veto`	CAVEAT	extraction is real + unit-tested, but the Sanhedrin verifier is an optional hook, off by default — no producer connected; UI says exactly that

No feature is stubbed. The one remaining caveat is surfaced as an explicit UI state, not an empty space.

Command receipts (run live at 2026-06-22T22:57:59Z)

Toolchain: rustc 1.95.0 · cargo 1.95.0 · node v24.12.0 · pnpm 10.33.0.

$ cargo test --workspace --lib
test result: ok. 529 passed; 0 failed; 0 ignored; 0 measured   (vestige-core)
test result: ok.  33 passed; 0 failed; 0 ignored; 0 measured   (tests/e2e unit)
test result: ok. 428 passed; 0 failed; 0 ignored; 0 measured   (vestige-mcp)
# 990 lib tests, 0 failures

$ cargo clippy --workspace -- -D warnings
Finished `dev` profile ... (EXIT 0, zero warnings)

$ pnpm --filter @vestige/dashboard check
COMPLETED 905 FILES 0 ERRORS 0 WARNINGS 0 FILES_WITH_PROBLEMS

$ pnpm --filter @vestige/dashboard build
✓ built in 4.15s ... ✔ done

$ cargo test -p vestige-mcp test_full_spine_one_runid_crosses_every_hop
test server::tests::test_full_spine_one_runid_crosses_every_hop ... ok

$ cargo test -p vestige-mcp trace_recorder::tests::extract_dream
test ...extract_dream_proposals_empty_when_not_dream_tool ... ok
test ...extract_dream_proposals_from_real_insights_shape ... ok

$ cargo test -p vestige-core trace
test result: ok. 27 passed; 0 failed   (trace schema, receipt, review)

Only statuses with a receipt above are credited. Nothing is claimed from memory.

Review surface (what changed)

The Black Box work sits in a series of commits on top of the base. To see the exact, current diff (build artifacts + this proof bundle excluded):

$ git diff --stat 9e92a59..HEAD -- ':!apps/dashboard/build' ':!blackbox-proof-2026-06-22'
# ~27 source files (Rust + SvelteKit). Run this against the branch tip for the
# live count — it grows as review fixes land.

Commits (oldest first) — run git log --oneline 9e92a59..HEAD for the live, authoritative list; the series so far:

80c823a feat: Agent Black Box + Receipts + risk-gated Memory PRs
b89beee proof: Proof Lock — full-spine test, honest UI states, proof pack
140b15f proof: dream.patch proven live with a real dream run
cadffb4 docs: package the review bundle — REVIEW.md entry point
8f7bed0 fix: review blockers B1–B7 + re-capture proof bundle
6a0173d fix: C1 unconditional quarantine release + C2 trace destructive writes
… fix: C2-deep (gate destructive writes post-delete) + PRIV (redact PR content)

The hashes above are point-in-time; the branch tip is the source of truth.

Key files to review:

Core (pure logic): crates/vestige-core/src/trace/{mod,receipt,review}.rs
Persistence: crates/vestige-core/src/storage/trace_store.rs + migrations.rs (V18)
MCP wiring: crates/vestige-mcp/src/trace_recorder.rs, server.rs (dispatch), resources/trace.rs
Dashboard API: crates/vestige-mcp/src/dashboard/{handlers,events,mod}.rs
UI: apps/dashboard/src/routes/(app)/{blackbox,memory-prs}/+page.svelte, lib/components/{ReceiptCard.svelte,blackbox-helpers.ts}

Suggested review checklist

Spine integrity: read test_full_spine_one_runid_crosses_every_hop (crates/vestige-mcp/src/server.rs) — does it actually assert the runId is byte-identical at all five hops?
Privacy: confirm mcp.call stores only a hash of args (trace_recorder::hash_args), never raw args.
Risk taxonomy: review classify_write + WriteContext (crates/vestige-core/src/trace/review.rs) — is the sensitive-topic / contradiction / supersede gating correct and not over-broad?
Quarantine semantics: confirm risky writes are committed-then-suppressed (audit preserved), not silently dropped, and the copy says so.
Migration safety: V18 is additive; MIGRATIONS.last().version is used by the migration tests (no hard-coded version literals left).
Local-first defaults: Risk-Gated is default; Sanhedrin/dream producers stay optional and off by default; nothing forces heavy models.
No protected code touched: MemoryCinema.svelte and graph/cinema/* are unchanged; the graph page only gained an additive ?center= param.

Review findings addressed (2026-06-22)

A full read-only review (multiple parallel agents, both Rust and dashboard) found 7 real issues — 4 blockers. All fixed and tested:

#	Severity	Finding	Fix	Proof
B1	blocker	Promoting a Memory PR didn't unsuppress the quarantined memory — UI said "promoted" while the memory stayed out of retrieval	`act_on_memory_pr` now calls `reverse_suppression(subject_id)` on accept actions (promote/merge/supersede); `MemoryPrAction::releases_memory()` encodes the rule	live: PR response `subjectReleased: true`, SQLite `suppression_count: 0`; tests `promote_releases_a_quarantined_memory_end_to_end`, `only_accept_actions_release_the_memory`
B2	blocker	memory promote/demote (returns `action`, not `decision`) and `codebase` writes bypassed the write-trace + gate	`extract_writes` reads `action` too, filtered by `is_write_decision`; `is_write_tool` includes `codebase`	tests `extract_writes_recognizes_action_shape_b2`, `extract_writes_ignores_read_actions_b2`, `write_tool_set_includes_codebase_b2`
B3	blocker	Receipt ids collided within a run (`r_<date>_<runId>` + `INSERT OR REPLACE`) — later receipt overwrote earlier	id is now `r_<date>_<runId8>_<unique6>`	live: two receipts in `run_proof` have distinct ids; test `receipt_ids_unique_within_a_run_b3`
B4	blocker	Proof bundle mis-assembled: `trace.json`=`run_proof` but `websocket-events.jsonl`=`run_proof2`	re-captured the whole bundle from one run	all artifacts now carry `run_proof` (verified)
B5	P2	Black Box receipts panel showed global latest, not the selected run's	`list_receipts_for_run` + `/api/receipts?run=` + page uses `listForRun(runId)`	live: `?run=run_proof` returns only that run; test `receipts_are_listable_per_run_b5`
B6	P2	`SENSITIVE_TOPICS` substring match false-fired ("tokenizer"→token, "author"→auth)	word-boundary matching	tests `sensitive_topic_word_boundary_no_false_positives_b6`, `..._still_catches_real_b6`
B7	P3	`set_review_mode` non-atomic write; export filename used raw `run_id`	`write_atomic` (temp+rename); filename sanitized; static routes declared before dynamic	covered by build + the atomic-write helper's existing use
C1	blocker	B1's release used `reverse_suppression`, which refuses past the 24h labile window — a PR promoted late stayed suppressed	new `release_quarantine(id)`: unconditional release (no time limit), used by the PR handler instead	test `release_quarantine_works_past_the_labile_window_c1` (proves reverse_suppression refuses but release_quarantine succeeds at +100h)
C2	blocker	`memory` `purge`/`delete` (destructive removal) bypassed the write-trace + gate	added purge/purged/delete/deleted/forget/forgotten to `is_write_decision`	test `extract_writes_recognizes_destructive_actions_c2`
C2-deep	blocker	C2 made purge trace, but `gate_writes` did `get_node→skip` on the (already-deleted) row, so a destructive write still never opened a PR	gate now treats a missing node as gateable for destructive decisions (builds the context from the decision, marks `forgets`); the PR records the removal with `deleted:true`	test `gate_opens_pr_for_destructive_write_after_node_deleted_c2`; live: purging a node opened a PR (`kind: node_decayed`, `deleted: true`)
PRIV	blocker	`gate_writes` copied full `node.content` into the PR `diff` + `title` — a real secret would leak into the `memory_prs` table and any exported proof bundle	PR now stores a truncated preview + a content hash; sensitive-topic-gated writes are fully redacted (`[redacted — sensitive content…]`); the committed `memory_pr.json` was re-captured and contains no secret	tests `gate_redacts_sensitive_content_in_pr_priv`, `content_preview_redacts_sensitive_and_truncates`; live + bundle scan: no secret string anywhere

One earlier (self-)review claim was withdrawn: the /api/memory-prs/mode vs /{id} route order is not a functional bug — axum 0.8 / matchit gives static segments priority. Reordered for clarity only.

Net after fixes (B1–B7 + C1/C2 + C2-deep + PRIV): 1007 lib tests pass, clippy -D warnings clean, dashboard check + build clean.

Reproduce (any reviewer, locally)

# 1. run a real server with the dashboard on
VESTIGE_DATA_DIR=$(mktemp -d) VESTIGE_DASHBOARD_ENABLED=true vestige-mcp   # stdio JSON-RPC
# 2. initialize, then drive tools/call with a runId arg (smart_ingest, deep_reference)
# 3. a sensitive-topic write opens a Memory PR; promote it via the dashboard
# 4. export the trace:
curl -s http://127.0.0.1:3927/api/traces/<runId>/export
# 5. for dream.patch: seed >=5 memories, then call the `dream` tool with the same runId
# 6. run the regression: cargo test -p vestige-mcp test_full_spine_one_runid_crosses_every_hop

12 KiB Raw Blame History Unescape Escape