Merge remote-tracking branch 'origin/main' into codex/opencode-sigill-salvage

2026-06-20 21:18:08 +02:00 · 2026-06-18 19:59:25 -05:00 · 2026-06-18 19:59:25 -05:00 · ea5ed28081
commit ea5ed28081
parent 6c7d56b4cf c60233961d
26 changed files with 6997 additions and 91 deletions
--- a/docs/COMPOSED_GRAPH.md
+++ b/docs/COMPOSED_GRAPH.md
@@ -0,0 +1,159 @@
+# ComposedGraph
+
+ComposedGraph records memory combinations as durable reasoning events.
+
+Most memory systems store facts, entities, or relationships. ComposedGraph stores a
+different object: which memories were used together, why they were used, and what
+happened afterward.
+
+## Model
+
+`composition_events` stores the reasoning envelope:
+
+- tool and mode, such as `deep_reference` or `bounty`
+- query and query hash
+- confidence, status, and output preview
+- metadata for intent, analyzed memory count, activation expansion, and reasoning preview
+
+`composition_members` stores the participating memories:
+
+- memory id
+- role, such as `primary`, `supporting`, `contradicting`, or `superseded`
+- rank, trust, relevance score, preview, and metadata
+
+`composition_outcomes` stores later labels:
+
+- `helpful`
+- `dead_end`
+- `submitted`
+- `accepted`
+- `rejected`
+- `duplicate_risk`
+- `needs_poc`
+- `bad_severity`
+- `user_promoted`
+- `user_demoted`
+- `closed_by_scope`
+- `closed_by_duplicate`
+- `closed_by_false_assumption`
+- `closed_by_user`
+- `expired_lane`
+
+Member memory ids are intentionally historical references, not foreign keys into
+`knowledge_nodes`. Purging or superseding a memory should not erase the fact that
+it once participated in a reasoning path.
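+
+The three tables can be pictured as plain records. The sketch below is
+illustrative only: the class names, the reduced field set, and the in-memory
+`knowledge_nodes` dictionary are assumptions, not the actual Vestige schema. It
+shows the one property the model depends on, namely that a member keeps its
+memory id as a plain value, so purging the memory does not touch the event.
+
+```python
+from dataclasses import dataclass, field
+
+# Outcome labels listed above.
+OUTCOME_TYPES = frozenset({
+    "helpful", "dead_end", "submitted", "accepted", "rejected",
+    "duplicate_risk", "needs_poc", "bad_severity", "user_promoted",
+    "user_demoted", "closed_by_scope", "closed_by_duplicate",
+    "closed_by_false_assumption", "closed_by_user", "expired_lane",
+})
+
+
+@dataclass
+class CompositionMember:
+    memory_id: str  # plain string: a historical reference, not a foreign key
+    role: str       # e.g. "primary", "supporting", "contradicting", "superseded"
+    rank: int = 0
+
+
+@dataclass
+class CompositionEvent:
+    event_id: str
+    tool: str       # e.g. "deep_reference"
+    query: str
+    members: list[CompositionMember] = field(default_factory=list)
+    outcomes: list[str] = field(default_factory=list)
+
+    def label(self, outcome_type: str) -> None:
+        """Attach a later outcome label, rejecting unknown label names."""
+        if outcome_type not in OUTCOME_TYPES:
+            raise ValueError(f"unknown outcome_type: {outcome_type}")
+        self.outcomes.append(outcome_type)
+
+
+# Hypothetical stand-in for the live memory store.
+knowledge_nodes = {"mem-1": "...", "mem-2": "..."}
+event = CompositionEvent(
+    "evt-1", "deep_reference", "example query",
+    [CompositionMember("mem-1", "primary"),
+     CompositionMember("mem-2", "supporting", rank=1)],
+)
+del knowledge_nodes["mem-2"]  # purging the memory leaves the event untouched
+event.label("helpful")
+```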
+
+## MCP Tool
+
+Use `composed_graph` for read/write access to the composition ledger.
+
+```json
+{ "action": "recent", "limit": 10 }
+```
+
+```json
+{ "action": "get", "event_id": "<composition-event-id>" }
+```
+
+```json
+{ "action": "memory", "memory_id": "<memory-id>", "limit": 10 }
+```
+
+```json
+{ "action": "neighbors", "memory_id": "<memory-id>", "limit": 10 }
+```
+
+```json
+{ "action": "never_composed", "tags": ["project:vestige"], "limit": 10 }
+```
+
+```json
+{
+  "action": "label",
+  "event_id": "<composition-event-id>",
+  "outcome_type": "helpful",
+  "notes": "This combination led to the accepted fix."
+}
+```
+
+## Never-Composed Frontier
+
+`never_composed` returns pairs of memories that have not yet appeared together
+in any composition event.
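+
+The candidate set itself can be sketched as a set difference: every possible
+pair, minus every pair that already co-occurs in some event. The function name
+and input shapes below are assumptions for illustration, and the ranking
+signals are left out entirely.
+
+```python
+from itertools import combinations
+
+
+def never_composed_pairs(memory_ids, events):
+    """Return sorted memory-id pairs that never co-occur in any event.
+
+    `events` is an iterable of member-id collections, one per composition event.
+    """
+    composed = set()
+    for members in events:
+        # Every unordered pair inside one event counts as already composed.
+        composed.update(combinations(sorted(set(members)), 2))
+    candidates = combinations(sorted(set(memory_ids)), 2)
+    return [pair for pair in candidates if pair not in composed]
+
+
+pairs = never_composed_pairs(
+    ["a", "b", "c", "d"],
+    [["a", "b"], ["b", "c", "d"]],
+)
+print(pairs)  # [('a', 'c'), ('a', 'd')]
+```
+
+Enumerating all pairs is quadratic in the number of memories, so a real
+implementation would presumably restrict candidates first, for example by the
+`tags` filter shown in the MCP request above.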
+
+The ranking is intentionally not just shared-tag matching. It combines:
+
+- exact shared tags
+- shared meaningful content terms
+- boundary tags such as `boundary-*`, `oracle`, `queue`, `settlement`, `upgrade`,
+  `pause`, `accounting`, or `scope`
+- node-type diversity
+- FSRS retention strength
+- composition novelty, so memories that have not already been heavily composed
+  still get surfaced
+- prior composition outcomes from either member, so previously accepted,
+  duplicate-risk, or dead-end lanes shape the frontier without hiding it
+
+Each candidate includes:
+
+- `score`
+- `noveltyScore`
+- `bridgeScore`
+- `trustScore`
+- `outcomeScoreAdjustment`
+- `sharedTags`
+- `boundaryTags`
+- `sharedTerms`
+- `priorOutcomes`
+- `outcomeSignal`, such as `clean`, `prior_success`, `prior_duplicate_risk`,
+  `prior_closed_door`, or `mixed_prior_outcomes`
+- node types
+- previews
+- a short reason
+- a `compositionQuestion` that an agent can answer before taking action
+
+The output is a frontier queue, not a finding. A never-composed pair means
+"worth investigating," not "true," "novel," or "reportable."
+Prior outcomes are also guardrails, not verdicts: a duplicate-risk signal should
+make the agent check duplicate families first, while a success signal should make
+it inspect why the older composition worked.
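+
+One way to derive `outcomeSignal` from `priorOutcomes` is to bucket the labels
+and report `mixed_prior_outcomes` when more than one bucket is hit. The
+bucketing below is an assumption, not the shipped rule set. Only the
+duplicate-risk and success hints come from this document; the other hint
+strings are hypothetical wording.
+
+```python
+# Assumed grouping of outcome labels into signal buckets.
+SUCCESS = {"helpful", "accepted", "user_promoted"}
+DUPLICATE = {"duplicate_risk"}
+CLOSED = {"dead_end", "closed_by_scope", "closed_by_duplicate",
+          "closed_by_false_assumption", "closed_by_user", "expired_lane"}
+
+NEXT_STEP = {
+    "clean": "investigate the pair directly",                     # hypothetical
+    "prior_success": "inspect why the older composition worked",
+    "prior_duplicate_risk": "check duplicate families first",
+    "prior_closed_door": "read the closing reason first",         # hypothetical
+    "mixed_prior_outcomes": "review each prior outcome first",    # hypothetical
+}
+
+
+def outcome_signal(prior_outcomes):
+    """Collapse prior outcome labels into a single guardrail signal."""
+    kinds = set()
+    for outcome in prior_outcomes:
+        if outcome in SUCCESS:
+            kinds.add("prior_success")
+        elif outcome in DUPLICATE:
+            kinds.add("prior_duplicate_risk")
+        elif outcome in CLOSED:
+            kinds.add("prior_closed_door")
+        # Labels outside the three buckets do not affect the signal here.
+    if not kinds:
+        return "clean"
+    if len(kinds) > 1:
+        return "mixed_prior_outcomes"
+    return next(iter(kinds))
+```
+
+The signal only selects a next step from `NEXT_STEP`. It never removes a
+candidate from the frontier.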
+
+Closed-door labels should be specific when possible. Prefer `closed_by_scope`,
+`closed_by_duplicate`, `closed_by_false_assumption`, `closed_by_user`, or
+`expired_lane` over a generic `dead_end` when the reason is known.
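+
+A small helper can encode that preference when building a `label` request. The
+helper name and the short reason keys (`scope`, `duplicate`, and so on) are
+hypothetical. The payload fields match the `label` example in the MCP Tool
+section.
+
+```python
+from typing import Optional
+
+# Hypothetical short reason keys mapped to the specific closed-door labels.
+CLOSED_DOOR_LABELS = {
+    "scope": "closed_by_scope",
+    "duplicate": "closed_by_duplicate",
+    "false_assumption": "closed_by_false_assumption",
+    "user": "closed_by_user",
+    "expired": "expired_lane",
+}
+
+
+def closed_door_label_request(event_id: str, reason: Optional[str],
+                              notes: str = "") -> dict:
+    """Build a `label` payload, using `dead_end` only when no reason is known."""
+    outcome_type = CLOSED_DOOR_LABELS.get(reason or "", "dead_end")
+    return {
+        "action": "label",
+        "event_id": event_id,
+        "outcome_type": outcome_type,
+        "notes": notes,
+    }
+```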
+
+## Bounty / Research Mode
+
+`bounty_mode` is a higher-level read shape for investigative workflows. It returns:
+
+- recent already-composed lanes
+- never-composed lanes
+- closed doors
+- duplicate-risk lanes
+- lanes that need proof-of-concept work
+- top weird combinations
+
+This is useful for security research, bug triage, architecture work, and product
+strategy because failed or duplicate compositions are preserved instead of being
+rediscovered repeatedly.
+
+## Deep Reference Integration
+
+`deep_reference` persists composition events automatically when it has evidence
+members. Empty evidence does not create a ledger event.
+
+The response includes:
+
+- `composition_event_id` when persisted
+- `compositionWriteStatus`, usually `persisted` or `skipped_empty`
+
+## Design Direction
+
+The next useful upgrades are:
+
+- triple or n-ary candidate mining, not only pairs
+- structural-fit scoring for analogies, separate from surface similarity
+- trust-zone scoring so a composition is limited by its weakest provenance
+- temporal replay: "what combinations were available when this decision was made?"
+- evaluation tasks where success requires combining memories that were never
+  previously co-composed
--- a/docs/SANHEDRIN_RECEIPTS.md
+++ b/docs/SANHEDRIN_RECEIPTS.md
@@ -12,6 +12,8 @@ instead of opaque. The current schema is `vestige.sanhedrin.receipt.v1`.
 - Appeals: `~/.vestige/sanhedrin/appeals.jsonl`
 - Fail-open events: `~/.vestige/sanhedrin/fail-open.jsonl`

+Optional companion schema: [`SANHEDRIN_TEST_INTEGRITY_DELTAS.md`](SANHEDRIN_TEST_INTEGRITY_DELTAS.md) describes mechanical deltas for cases where a verifier command passed but the test artifact changed after implementation.
+
 ## v1 JSON Shape

 ```json
--- a/docs/SANHEDRIN_TEST_INTEGRITY_DELTAS.md
+++ b/docs/SANHEDRIN_TEST_INTEGRITY_DELTAS.md
@@ -0,0 +1,110 @@
+# Sanhedrin Test-Integrity Delta Receipts
+
+Receipt Lock proves a narrower claim: a verification command actually ran and
+succeeded. Test-integrity deltas are an optional companion receipt for the
+stronger claim that the tests still mean what the draft says they mean.
+
+This receipt is intentionally mechanical. It is not a broad correctness oracle
+and it does not ask a second model to decide whether the implementation is good.
+It records whether the verification artifact changed in ways that should leave
+the verification claim accepted, downgrade it, or send it to human review.
+
+## Boundary
+
+Keep these claims separate:
+
+1. **Command receipt:** `cargo test`, `npm test`, `pytest`, or another verifier
+   command ran after the relevant edit and exited successfully.
+2. **Test-integrity delta:** the tests/specs behind that verifier were not
+   removed, skipped, weakened, or replaced after implementation in a way that
+   makes the green result less admissible.
+
+A run can have a valid command receipt and still receive a downgraded
+integrity decision.
+
+## Optional JSON Shape
+
+```json
+{
+  "schema": "vestige.sanhedrin.test_integrity_delta.v1",
+  "id": "tid_<stable hash>",
+  "commandReceiptId": "receipt_<stable hash>",
+  "verificationClaim": "All tests passed.",
+  "specSource": {
+    "contextId": "spec_ctx_04",
+    "testFiles": [
+      {
+        "path": "tests/cart.test.ts",
+        "hashBeforeImplementation": "sha256:...",
+        "hashAfterVerification": "sha256:..."
+      }
+    ]
+  },
+  "implementationContext": "impl_ctx_09",
+  "verifierContext": "verify_ctx_02",
+  "delta": {
+    "testFilesChangedAfterImplementation": true,
+    "removedOrDisabledTests": [
+      {
+        "kind": "skip_or_only",
+        "path": "tests/cart.test.ts",
+        "line": 42
+      }
+    ],
+    "removedAssertions": 2,
+    "weakenedExpectations": [
+      {
+        "path": "tests/cart.test.ts",
+        "from": "throws InvalidCouponError",
+        "to": "does not throw"
+      }
+    ],
+    "snapshotChurnWithoutSourceChange": false,
+    "coverageDelta": -3.8,
+    "mocksReplacingRealBoundary": [
+      {
+        "module": "PaymentGateway",
+        "before": "integration-ish fake",
+        "after": "empty stub"
+      }
+    ]
+  },
+  "freshVerifier": {
+    "commandReceiptId": "receipt_<stable hash>",
+    "exitCode": 0,
+    "checkedAfterLastRelevantEdit": true
+  },
+  "decision": "downgraded",
+  "reason": "tests passed, but the tests were weakened after implementation"
+}
+```
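+
+The `testFiles` hashes and the `testFilesChangedAfterImplementation` flag are
+cheap to compute. The sketch below assumes the `sha256:<hex>` string format
+suggested by the example above. The function names are hypothetical.
+
+```python
+import hashlib
+from pathlib import Path
+
+
+def file_digest(path) -> str:
+    """Hash a file's bytes in the `sha256:<hex>` form used by the receipt."""
+    return "sha256:" + hashlib.sha256(Path(path).read_bytes()).hexdigest()
+
+
+def build_test_file_entry(path, hash_before: str) -> dict:
+    """Pair a digest captured before implementation with the current digest."""
+    return {
+        "path": str(path),
+        "hashBeforeImplementation": hash_before,
+        "hashAfterVerification": file_digest(path),
+    }
+
+
+def changed_after_implementation(entries) -> bool:
+    """True when any tracked test file differs from its earlier digest."""
+    return any(
+        e["hashBeforeImplementation"] != e["hashAfterVerification"]
+        for e in entries
+    )
+```
+
+A hash mismatch says only that the file changed, not whether the change
+weakened anything. That classification is what the rest of `delta` is for.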
+
+## Decisions
+
+- `accepted` — a verifier command succeeded after the last relevant edit and no
+  integrity downgrade was detected.
+- `downgraded` — the command succeeded, but the tests/specs changed in a way
+  that makes the verification claim weaker than stated.
+- `needs_human_review` — the delta may be legitimate, but a local mechanical
+  check cannot safely classify it. Snapshot updates are a common example.
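+
+A minimal classifier over the JSON shape above might look like the following.
+The precedence of the checks, the choice to ignore `coverageDelta`, and the
+choice to raise when no fresh successful verifier exists are all assumptions
+made for this sketch rather than documented policy.
+
+```python
+def classify_delta(delta: dict, fresh_verifier: dict) -> str:
+    """Map a test-integrity delta to one of the three decisions."""
+    # Assumption: a delta receipt only makes sense next to a fresh green run.
+    if (fresh_verifier.get("exitCode") != 0
+            or not fresh_verifier.get("checkedAfterLastRelevantEdit")):
+        raise ValueError("no fresh successful command receipt")
+
+    weakened = (
+        bool(delta.get("removedOrDisabledTests"))
+        or delta.get("removedAssertions", 0) > 0
+        or bool(delta.get("weakenedExpectations"))
+        or bool(delta.get("mocksReplacingRealBoundary"))
+    )
+    if weakened:
+        return "downgraded"
+
+    # Test files changed, but no mechanical weakening signal was found.
+    if (delta.get("snapshotChurnWithoutSourceChange")
+            or delta.get("testFilesChangedAfterImplementation")):
+        return "needs_human_review"
+
+    return "accepted"
+```
+
+Fed the example receipt above, this returns `downgraded`, because the skipped
+test, the removed assertions, and the weakened expectation each trigger the
+first branch on their own.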
+
+## Minimal Fixture Suite
+
+These cases are small enough to live as fixtures without turning Sanhedrin into
+a correctness judge.
+
+| Case | Input pattern | Expected decision | Why |
+| --- | --- | --- | --- |
+| unchanged-good | implementation changes source; tests unchanged; fresh verifier succeeds | `accepted` | Green tests are supported by a fresh command receipt and unchanged test artifact. |
+| skipped-test | implementation adds `.skip`, `.only`, `#[ignore]`, or equivalent before verifier succeeds | `downgraded` | The command ran, but the claim no longer represents the original test obligation. |
+| weakened-assertion | expectation is relaxed after implementation, e.g. `throws InvalidCouponError` -> `does not throw` | `downgraded` | The verifier passed against a weaker assertion than the one available before implementation. |
+| justified-snapshot | snapshot changes alongside an intentional source/UI change | `needs_human_review` or `accepted` by policy | Snapshot churn can be valid, but the receipt should make the policy decision explicit. |
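+
+The `skipped-test` fixture needs only a line scan. The sketch below is
+deliberately crude: the regular expression and function name are assumptions,
+it will also match unrelated `.only` or `.skip` property accesses, and a real
+check would scan only the lines added after implementation.
+
+```python
+import re
+
+# Matches `.skip`, `.only`, and Rust's `#[ignore]` / `#[ignore = "..."]`.
+SKIP_MARKERS = re.compile(r"\.(?:skip|only)\b|#\[ignore\b")
+
+
+def find_disabled_tests(path: str, text: str) -> list:
+    """Return `removedOrDisabledTests` entries for lines with a skip marker."""
+    hits = []
+    for lineno, line in enumerate(text.splitlines(), start=1):
+        if SKIP_MARKERS.search(line):
+            hits.append({"kind": "skip_or_only", "path": path, "line": lineno})
+    return hits
+```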
+
+## Non-goals
+
+- Do not infer whether the implementation is correct in the world.
+- Do not require full semantic diffing before Receipt Lock can operate.
+- Do not treat staged evidence or a model explanation as equivalent to a fresh
+  command receipt.
+- Do not block every test edit. The goal is to keep the verification claim
+  honest when the test artifact changed after implementation.