nyx/docs/dynamic.md

213 lines
8.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Dynamic verification
Nyx verifies every `Confidence >= Medium` finding by default: it builds
a minimal harness, runs your code's entry point against a curated payload corpus
inside a sandbox, and records the verdict in each finding's evidence block.
## Headline metrics
The dynamic-verification overhaul ships with four published acceptance targets,
gated end-to-end by `scripts/m7_ship_gate.sh` (Phase 31) against the eval
corpus (OWASP Benchmark v1.2 + NIST SARD subset + the in-house curated set
from `tests/benchmark/corpus`):
| Metric | Target | Gate | Source |
| --- | --- | --- | --- |
| Unsupported% per `(cap, lang)` cell | < 20% | M7 Gate 1 | `tests/eval_corpus/budget.toml` `[default].unsupported_rate` |
| False-Confirmed% per cap | < 2% | M7 Gate 2 | `~/.cache/nyx/dynamic/events.jsonl` (`kind: feedback`, `wrong: true`) |
| Repro stability | 95% | M7 Gate 5 | `~/.cache/nyx/dynamic/repro/*/reproduce.sh` exit 0 |
| Wall-clock cost | 2× static-only | M7 Gate 3 | `benches/fixtures/` (default vs `--no-verify`) |
The corresponding orchestrator is `tests/eval_corpus/run_full.sh`; it bundles
the three corpus sets, writes a canonical `tests/eval_corpus/results.json`,
and propagates the per-cell budget through `tabulate.py` and `report.py`.
A non-zero exit from `m7_ship_gate.sh` is a hard merge blocker for the
default-on flip. Failures map back to the engine follow-ups recorded in
`.pitboss/play/deferred.md` (per-language probe-shim splicing, composite
chain reverifier wiring, telemetry-stability stamping, et al.).
## Default-on semantics
```
nyx scan # verifies Medium+ findings (default)
nyx scan --no-verify # static analysis only, no harness execution
nyx scan --verify # same as default; explicit for clarity in scripts
```
`--no-verify` is the escape hatch. It overrides the config default for a single
run without changing `nyx.toml`.
### What "verified" means
A finding with `dynamic_verdict.status: Confirmed` was successfully triggered
by at least one payload in nyx's corpus. The corpus covers common patterns for
each vulnerability class (SQL injection, XSS, command injection, SSRF, etc.) per
language.
A finding with `dynamic_verdict.status: NotConfirmed` was attempted but no
payload fired. This is not a false-positive signal. It means the corpus did not
have a payload that matched the specific sink variant, or the execution path was
not reachable in the test harness.
A finding with `dynamic_verdict.status: Unsupported` could not be attempted.
Common reasons: confidence below threshold, no flow steps, language or sink type
not yet supported by the harness layer.
### Confidence gate
Only `Confidence >= Medium` findings are verified by default 5.1). To also
verify low-confidence findings (for corpus building or backfill), pass
`--verify-all-confidence`:
```
nyx scan --verify-all-confidence
```
This is not recommended for production scans because low-confidence findings have
a higher false-positive rate and the harness may produce unreliable verdicts.
## nyx.toml opt-out
If you want static-only scans permanently, set `verify = false` in `nyx.toml`:
```toml
[scanner]
verify = false
```
This survives upgrades. The M7 default flip only changes the inherited default
for projects that have not explicitly set the field.
## Sandbox backends
nyx uses docker when available, then falls back to an in-process runner:
```
nyx scan --backend docker # require docker; fail if unavailable
nyx scan --backend process # in-process runner (no container; less isolation)
nyx scan --unsafe-sandbox # alias for --backend process
```
The docker backend mounts only the entry file's directory and blocks all
outbound network by default. When out-of-band detection is enabled (`oob_listener`
in config), the container gets `--network bridge` with a host-gateway route.
## Repro artifacts
When a finding is `Confirmed`, nyx writes a repro artifact to
`~/.cache/nyx/repro/<stable_hash>/`. The artifact contains the harness spec and
the triggering payload. You can regenerate the verdict with:
```
nyx scan --verify <path> # re-scans and re-verifies
```
See `docs/output.md` for the `dynamic_verdict` field schema.
## Wall-clock cost
Verification adds harness build + sandbox startup time per finding. On typical
codebases with 1050 Medium+ findings, end-to-end overhead is 25× static-only.
If scan time is unacceptable for a given workflow (e.g. IDE integration, quick
pre-commit check), use `--no-verify` for that workflow and rely on the full scan
in CI.
## Event schema
The dynamic layer writes one JSON record per verdict to
`~/.cache/nyx/dynamic/events.jsonl`. Every record begins with a fixed envelope
so older readers fail loudly instead of silently mixing incompatible shapes:
```json
{
"schema_version": 1,
"nyx_version": "0.7.0",
"corpus_version": "4",
"kind": "verdict",
"ts": "2026-05-15T18:42:09Z",
"finding_id": "a3b1...",
"spec_hash": "9f4e...",
"lang": "python",
"cap": "SQL_QUERY",
"status": "Confirmed",
"toolchain_id": "python-3.11",
"toolchain_match": "exact",
"duration_ms": 312,
"build_attempts": 1
}
```
| Field | Type | Meaning |
| --- | --- | --- |
| `schema_version` | integer | Bumped on any breaking change. Readers reject mismatches. |
| `nyx_version` | string | `CARGO_PKG_VERSION` of the writing binary. |
| `corpus_version` | string | Payload-corpus version the verdict was scored against. |
| `kind` | string | `"verdict"` (per-finding) or `"rank_delta"` (rank-score shift). |
| `ts` | RFC-3339 string | Wall-clock at write time. |
| `finding_id` | string | Stable finding identifier. |
| `spec_hash` | string | Hash of the `HarnessSpec` that drove the run. |
| `lang` | string | Language slug; `"unknown"` when spec derivation failed. |
| `cap` | string | Sink capability (e.g. `SQL_QUERY`, `CODE_EXEC`). |
| `status` | string | `Confirmed`, `NotConfirmed`, `Inconclusive`, or `Unsupported`. |
| `inconclusive_reason` | string | Present iff `status == Inconclusive`. |
A `rank_delta` record carries the envelope plus `finding_id`, `status`, and a
signed `delta` applied to the rank score.
### Schema-version mismatch
`scripts/m7_ship_gate.sh` Gate 2 walks every line of the log, requires
`schema_version == EXPECTED_SCHEMA_VERSION`, and exits 3 if any record fails
the check. Programmatic readers use
`crate::dynamic::telemetry::read_events(path)`, which surfaces the same
condition as `TelemetryReadError::SchemaMismatch { expected, found, .. }`.
When schema bumps land, the canonical migration is to roll the log over (move
or delete `events.jsonl`) so new and old records never coexist in a file. The
gate refuses to skip silently on mismatch.
### Sampling
`[telemetry]` in `nyx.toml` controls the on-disk sampling policy:
```toml
[telemetry]
keep_all_confirmed = true # default: retain every Confirmed verdict
keep_all_inconclusive = true # default: retain every Inconclusive verdict
sample_rate_other = 1.0 # 0.01.0 for NotConfirmed / Unsupported
```
`sample_rate_other < 1.0` downsamples NotConfirmed and Unsupported verdicts
deterministically. The decision is seeded by the finding's `spec_hash`, so a
given finding makes the same keep-or-drop call across reruns. Confirmed and
Inconclusive verdicts ignore the rate and are always retained (they gate the
false-Confirmed budget and drive the spec-derivation roadmap).
Rank-delta records (emitted by `emit_rank_delta` when a verdict shifts a
finding's position in the ranked output) are also retained unconditionally and
do **not** consult `sample_rate_other`. They are calibration-critical and small
in volume, so the carve-out is intentional; setting `sample_rate_other = 0.0`
to throttle log growth will still produce rank-delta lines.
`NYX_NO_TELEMETRY=1` disables every write regardless of the policy.
## Opting in to feedback
False positives (nyx says `Confirmed` but you disagree) can be recorded:
```
nyx verify-feedback <finding_id> --wrong "reason"
```
This writes to the local telemetry log (`~/.cache/nyx/dynamic/events.jsonl`)
and contributes to precision monitoring. Feedback is never uploaded automatically.
## nyx serve integration
The browser UI shows `dynamic_verdict` in each finding's detail panel and
uses the verdict in ranking (Confirmed findings surface first). The scan compare
page has a **Verdict Diff** tab that shows which findings changed verification
status between two scans.