From e1f37965236fc801847fad71cc1a2b78e0750a76 Mon Sep 17 00:00:00 2001 From: Caio Ribeiro Date: Thu, 18 Jun 2026 23:02:32 +0000 Subject: [PATCH] docs: add test integrity delta receipt sketch --- docs/SANHEDRIN_RECEIPTS.md | 2 + docs/SANHEDRIN_TEST_INTEGRITY_DELTAS.md | 110 ++++++++++++++++++++++++ 2 files changed, 112 insertions(+) create mode 100644 docs/SANHEDRIN_TEST_INTEGRITY_DELTAS.md diff --git a/docs/SANHEDRIN_RECEIPTS.md b/docs/SANHEDRIN_RECEIPTS.md index ac0bd4d..2213c58 100644 --- a/docs/SANHEDRIN_RECEIPTS.md +++ b/docs/SANHEDRIN_RECEIPTS.md @@ -12,6 +12,8 @@ instead of opaque. The current schema is `vestige.sanhedrin.receipt.v1`. - Appeals: `~/.vestige/sanhedrin/appeals.jsonl` - Fail-open events: `~/.vestige/sanhedrin/fail-open.jsonl` +Optional companion schema: [`SANHEDRIN_TEST_INTEGRITY_DELTAS.md`](SANHEDRIN_TEST_INTEGRITY_DELTAS.md) describes mechanical deltas for cases where a verifier command passed but the test artifact changed after implementation. + ## v1 JSON Shape ```json diff --git a/docs/SANHEDRIN_TEST_INTEGRITY_DELTAS.md b/docs/SANHEDRIN_TEST_INTEGRITY_DELTAS.md new file mode 100644 index 0000000..c9d12dc --- /dev/null +++ b/docs/SANHEDRIN_TEST_INTEGRITY_DELTAS.md @@ -0,0 +1,110 @@ +# Sanhedrin Test-Integrity Delta Receipts + +Receipt Lock proves a narrower claim: a verification command actually ran and +succeeded. Test-integrity deltas are an optional companion receipt for the +stronger claim that the tests still mean what the draft says they mean. + +This receipt is intentionally mechanical. It is not a broad correctness oracle +and it does not ask a second model to decide whether the implementation is good. +It records whether the verification artifact changed in ways that should +upgrade, downgrade, or send the verification claim to human review. + +## Boundary + +Keep these claims separate: + +1. **Command receipt:** `cargo test`, `npm test`, `pytest`, or another verifier + command ran after the relevant edit and exited successfully. +2. **Test-integrity delta:** the tests/specs behind that verifier were not + removed, skipped, weakened, or replaced after implementation in a way that + makes the green result less admissible. + +A run can have a valid command receipt and still receive a downgraded +integrity decision. + +## Optional JSON Shape + +```json +{ + "schema": "vestige.sanhedrin.test_integrity_delta.v1", + "id": "tid_", + "commandReceiptId": "receipt_", + "verificationClaim": "All tests passed.", + "specSource": { + "contextId": "spec_ctx_04", + "testFiles": [ + { + "path": "tests/cart.test.ts", + "hashBeforeImplementation": "sha256:...", + "hashAfterVerification": "sha256:..." + } + ] + }, + "implementationContext": "impl_ctx_09", + "verifierContext": "verify_ctx_02", + "delta": { + "testFilesChangedAfterImplementation": true, + "removedOrDisabledTests": [ + { + "kind": "skip_or_only", + "path": "tests/cart.test.ts", + "line": 42 + } + ], + "removedAssertions": 2, + "weakenedExpectations": [ + { + "path": "tests/cart.test.ts", + "from": "throws InvalidCouponError", + "to": "does not throw" + } + ], + "snapshotChurnWithoutSourceChange": false, + "coverageDelta": -3.8, + "mocksReplacingRealBoundary": [ + { + "module": "PaymentGateway", + "before": "integration-ish fake", + "after": "empty stub" + } + ] + }, + "freshVerifier": { + "commandReceiptId": "receipt_", + "exitCode": 0, + "checkedAfterLastRelevantEdit": true + }, + "decision": "downgraded", + "reason": "tests passed, but the tests were weakened after implementation" +} +``` + +## Decisions + +- `accepted` — a verifier command succeeded after the last relevant edit and no + integrity downgrade was detected. +- `downgraded` — the command succeeded, but the tests/specs changed in a way + that makes the verification claim weaker than stated. +- `needs_human_review` — the delta may be legitimate, but a local mechanical + check cannot safely classify it. Snapshot updates are a common example. + +## Minimal Fixture Suite + +These cases are small enough to live as fixtures without turning Sanhedrin into +a correctness judge. + +| Case | Input pattern | Expected decision | Why | +| --- | --- | --- | --- | +| unchanged-good | implementation changes source; tests unchanged; fresh verifier succeeds | `accepted` | Green tests are supported by a fresh command receipt and unchanged test artifact. | +| skipped-test | implementation adds `.skip`, `.only`, `#[ignore]`, or equivalent before verifier succeeds | `downgraded` | The command ran, but the claim no longer represents the original test obligation. | +| weakened-assertion | expectation is relaxed after implementation, e.g. `throws InvalidCouponError` -> `does not throw` | `downgraded` | The verifier passed against a weaker assertion than the one available before implementation. | +| justified-snapshot | snapshot changes alongside an intentional source/UI change | `needs_human_review` or `accepted` by policy | Snapshot churn can be valid, but the receipt should make the policy decision explicit. | + +## Non-goals + +- Do not infer whether the implementation is correct in the world. +- Do not require full semantic diffing before Receipt Lock can operate. +- Do not treat staged evidence or a model explanation as equivalent to a fresh + command receipt. +- Do not block every test edit. The goal is to keep the verification claim + honest when the test artifact changed after implementation.