[pitboss] phase 27: Track H.1 + H.2 — Telemetry schema versioning + sampling

pitboss 2026-05-15 18:16:14 -05:00
parent ea722dc9ca
commit 3ed3a9e518
8 changed files with 799 additions and 41 deletions


@@ -91,6 +91,79 @@ If scan time is unacceptable for a given workflow (e.g. IDE integration, quick
pre-commit check), use `--no-verify` for that workflow and rely on the full scan
in CI.
## Event schema
The dynamic layer writes one JSON record per verdict to
`~/.cache/nyx/dynamic/events.jsonl`. Every record begins with a fixed envelope
so older readers fail loudly instead of silently mixing incompatible shapes:
```json
{
"schema_version": 1,
"nyx_version": "0.7.0",
"corpus_version": "4",
"kind": "verdict",
"ts": "2026-05-15T18:42:09Z",
"finding_id": "a3b1...",
"spec_hash": "9f4e...",
"lang": "python",
"cap": "SQL_QUERY",
"status": "Confirmed",
"toolchain_id": "python-3.11",
"toolchain_match": "exact",
"duration_ms": 312,
"build_attempts": 1
}
```
| Field | Type | Meaning |
| --- | --- | --- |
| `schema_version` | integer | Bumped on any breaking change. Readers reject mismatches. |
| `nyx_version` | string | `CARGO_PKG_VERSION` of the writing binary. |
| `corpus_version` | string | Payload-corpus version the verdict was scored against. |
| `kind` | string | `"verdict"` (per-finding) or `"rank_delta"` (rank-score shift). |
| `ts` | RFC-3339 string | Wall-clock at write time. |
| `finding_id` | string | Stable finding identifier. |
| `spec_hash` | string | Hash of the `HarnessSpec` that drove the run. |
| `lang` | string | Language slug; `"unknown"` when spec derivation failed. |
| `cap` | string | Sink capability (e.g. `SQL_QUERY`, `CODE_EXEC`). |
| `status` | string | `Confirmed`, `NotConfirmed`, `Inconclusive`, or `Unsupported`. |
| `inconclusive_reason` | string | Present iff `status == Inconclusive`. |
A `rank_delta` record carries the envelope plus `finding_id`, `status`, and a
signed `delta` applied to the rank score.
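The `status` values and the "present iff `status == Inconclusive`" rule from the table can be checked mechanically. The following is a minimal sketch using only the standard library; the `Status` enum and `reason_is_consistent` helper are hypothetical names for illustration and are not taken from the nyx codebase.

```rust
use std::str::FromStr;

/// The four `status` strings listed in the field table.
/// (Hypothetical type; the real crate may model this differently.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Confirmed,
    NotConfirmed,
    Inconclusive,
    Unsupported,
}

impl FromStr for Status {
    type Err = String;

    /// Parse the exact, case-sensitive spelling used in the log.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Confirmed" => Ok(Status::Confirmed),
            "NotConfirmed" => Ok(Status::NotConfirmed),
            "Inconclusive" => Ok(Status::Inconclusive),
            "Unsupported" => Ok(Status::Unsupported),
            other => Err(format!("unknown status: {other}")),
        }
    }
}

/// True when `inconclusive_reason` is present exactly when the
/// status is `Inconclusive`, as the field table requires.
fn reason_is_consistent(status: Status, inconclusive_reason: Option<&str>) -> bool {
    (status == Status::Inconclusive) == inconclusive_reason.is_some()
}
```

A reader could call `reason_is_consistent` on each parsed record and treat a `false` result as a malformed line, though how the real reader handles that case is not specified here.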
### Schema-version mismatch
`scripts/m7_ship_gate.sh` Gate 2 walks every line of the log, requires
`schema_version == EXPECTED_SCHEMA_VERSION`, and exits 3 if any record fails
the check. Programmatic readers use
`crate::dynamic::telemetry::read_events(path)`, which surfaces the same
condition as `TelemetryReadError::SchemaMismatch { expected, found, .. }`.
When a schema bump lands, the canonical migration is to roll the log over (move
or delete `events.jsonl`) so that old and new records never coexist in one file.
The gate never skips a mismatched record silently.
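The fail-on-first-mismatch walk described above might look roughly like the sketch below. It is not the implementation behind `read_events` or the ship gate: `SchemaMismatch`, `schema_version_of`, and `check_log` are hypothetical names, the key lookup is a deliberately naive string search standing in for a real JSON parser, and skipping blank lines is an assumption.

```rust
/// Hypothetical error carrying the same `expected` / `found` idea as
/// `TelemetryReadError::SchemaMismatch`; the real variant has more fields.
#[derive(Debug, PartialEq, Eq)]
struct SchemaMismatch {
    line: usize,        // 1-based line number in the log
    expected: u64,
    found: Option<u64>, // None when the field is missing or not an integer
}

/// Naive extraction of the integer after `"schema_version":`.
/// A production reader would use a JSON parser instead.
fn schema_version_of(line: &str) -> Option<u64> {
    let key = "\"schema_version\"";
    let rest = &line[line.find(key)? + key.len()..];
    let rest = rest.trim_start().strip_prefix(':')?.trim_start();
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Walk every non-blank line and stop at the first record whose
/// version differs from `expected`.
fn check_log(contents: &str, expected: u64) -> Result<(), SchemaMismatch> {
    for (idx, line) in contents.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // assumption: blank lines are tolerated
        }
        let found = schema_version_of(line);
        if found != Some(expected) {
            return Err(SchemaMismatch { line: idx + 1, expected, found });
        }
    }
    Ok(())
}
```

A caller that wants to mirror the gate's behavior could map an `Err` from `check_log` to exit code 3.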
### Sampling
`[telemetry]` in `nyx.toml` controls the on-disk sampling policy:
```toml
[telemetry]
keep_all_confirmed = true # default: retain every Confirmed verdict
keep_all_inconclusive = true # default: retain every Inconclusive verdict
sample_rate_other = 1.0 # 0.0-1.0 for NotConfirmed / Unsupported
```
`sample_rate_other < 1.0` downsamples NotConfirmed and Unsupported verdicts
deterministically: the decision is seeded by the finding's `spec_hash`, so a
given finding makes the same keep-or-drop call across reruns. Confirmed and
Inconclusive verdicts ignore the rate and are always retained, because they
gate the false-Confirmed budget and drive the spec-derivation roadmap.
`NYX_NO_TELEMETRY=1` disables every write regardless of the policy.
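One way to get a keep-or-drop decision that is stable across reruns is to hash `spec_hash` into the unit interval and compare it with the rate. The sketch below illustrates that idea only. The choice of FNV-1a, the `TelemetryPolicy` struct, and the fallback to the rate when a `keep_all_*` flag is `false` are all assumptions, not details confirmed by the nyx source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Confirmed,
    NotConfirmed,
    Inconclusive,
    Unsupported,
}

/// Mirrors the three `[telemetry]` keys (hypothetical struct).
#[derive(Debug, Clone, Copy)]
struct TelemetryPolicy {
    keep_all_confirmed: bool,
    keep_all_inconclusive: bool,
    sample_rate_other: f64,
}

/// 64-bit FNV-1a; chosen here only because it is simple and stable.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Map a spec hash to a value in [0, 1) using the top 53 bits.
fn unit_interval(spec_hash: &str) -> f64 {
    (fnv1a_64(spec_hash.as_bytes()) >> 11) as f64 / (1u64 << 53) as f64
}

/// Keep-or-drop decision; the same inputs always give the same answer.
fn should_keep(policy: &TelemetryPolicy, status: Status, spec_hash: &str) -> bool {
    match status {
        Status::Confirmed if policy.keep_all_confirmed => true,
        Status::Inconclusive if policy.keep_all_inconclusive => true,
        // Assumption: everything else is subject to the sample rate.
        _ => unit_interval(spec_hash) < policy.sample_rate_other,
    }
}
```

Because the decision depends only on the policy, the status, and `spec_hash`, no random-number state has to be persisted between runs. A rate of `1.0` keeps every record and `0.0` drops every record that is subject to sampling.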
## Opting in to feedback
False positives (nyx says `Confirmed` but you disagree) can be recorded: