diff --git a/CHANGELOG.md b/CHANGELOG.md
index f0771ccd..e2d311a1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -46,7 +46,7 @@ A focused release on three fronts: an attack-surface map and chain composer that
 - **New `Cap` corpora.** Vulnerable + patched fixtures landed for the seven new cap classes (LDAP injection, XPath injection, header injection, open redirect, SSTI, XXE, prototype pollution) plus deserialization, crypto, JSON parsing, unauthorized-id, and data exfiltration. Every cap now carries at least one positive / negative / adversarial / unsupported fixture quad per supported language.
 - **OWASP Benchmark v1.2 importer.** `tests/eval_corpus/owasp_gt_convert.py` converts the OWASP Java Benchmark expected-results manifest into Nyx ground truth and lands a 16k-line `owasp_benchmark_v1.2.json` for evaluation.
 - **NIST SARD importer.** `tests/eval_corpus/sard_gt_convert.py` converts SARD test cases into the same format so cross-dataset recall numbers stay comparable.
-- **`scripts/m7_ship_gate.sh`** runs five gates against `tests/eval_corpus/budget.toml`: Unsupported under 20% per `(cap, lang)` cell, False-Confirmed under 2% per cap, repro stability at or above 95%, wall-clock no more than 2× static-only, sandbox-escape suite green. `tests/eval_corpus/run_full.sh` is the canonical orchestrator and writes `tests/eval_corpus/results.json` for the gate plus the published metrics table in `docs/dynamic.md`.
+- **Evaluation corpus tooling.** `tests/eval_corpus/run_full.sh` runs the Nyx benchmark, OWASP Benchmark, and NIST SARD evaluation sets and writes `tests/eval_corpus/results.json`. `tests/eval_corpus/report.py` and `tabulate.py` produce the per-cap and per-language summary used to track coverage and accuracy.

### Engine
diff --git a/docs/dynamic.md b/docs/dynamic.md
index 0e948edf..6ff753a0 100644
--- a/docs/dynamic.md
+++ b/docs/dynamic.md
@@ -1,125 +1,116 @@
# Dynamic verification

-Nyx verifies every `Confidence >= Medium` finding by default: it builds
-a minimal harness, runs your code's entry point against a curated payload corpus
-inside a sandbox, and records the verdict in each finding's evidence block.
+Nyx re-runs findings in generated harnesses when verification is enabled. By
+default, `nyx scan` verifies each `Confidence >= Medium` finding, tries
+payloads in a sandbox, and writes the result to `evidence.dynamic_verdict`.

-## Headline metrics
+Dynamic verification is a second signal, not a replacement for review. A
+`Confirmed` verdict means Nyx triggered the sink in its harness. `NotConfirmed`
+means the harness ran but no payload fired.

-The dynamic-verification overhaul ships with four published acceptance targets,
-gated end-to-end by `scripts/m7_ship_gate.sh` (Phase 31) against the eval
-corpus (OWASP Benchmark v1.2 + NIST SARD subset + the in-house curated set
-from `tests/benchmark/corpus`):
+## Running it

-| Metric | Target | Gate | Source |
-| --- | --- | --- | --- |
-| Unsupported% per `(cap, lang)` cell | < 20% | M7 Gate 1 | `tests/eval_corpus/budget.toml` → `[default].unsupported_rate` |
-| False-Confirmed% per cap | < 2% | M7 Gate 2 | `~/.cache/nyx/dynamic/events.jsonl` (`kind: feedback`, `wrong: true`) |
-| Repro stability | ≥ 95% | M7 Gate 5 | `~/.cache/nyx/dynamic/repro/*/reproduce.sh` exit 0 |
-| Wall-clock cost | ≤ 2× static-only | M7 Gate 3 | `benches/fixtures/` (default vs `--no-verify`) |
-
-The corresponding orchestrator is `tests/eval_corpus/run_full.sh`; it bundles
-the three corpus sets, writes a canonical `tests/eval_corpus/results.json`,
-and propagates the per-cell budget through `tabulate.py` and `report.py`.
-
-A non-zero exit from `m7_ship_gate.sh` is a hard merge blocker for the
-default-on flip.
Failures map back to the engine follow-ups recorded in -`.pitboss/play/deferred.md` (per-language probe-shim splicing, composite -chain reverifier wiring, telemetry-stability stamping, et al.). - - -## Default-on semantics - -``` -nyx scan # verifies Medium+ findings (default) -nyx scan --no-verify # static analysis only, no harness execution -nyx scan --verify # same as default; explicit for clarity in scripts +```bash +nyx scan # verifies Medium and High confidence findings +nyx scan --no-verify # static analysis only +nyx scan --verify # explicit form of the default behavior ``` -`--no-verify` is the escape hatch. It overrides the config default for a single -run without changing `nyx.toml`. +Use `--no-verify` for fast local checks or editor workflows. Keep verification +on for CI when scan time allows it. -### What "verified" means +To verify low-confidence findings too: -A finding with `dynamic_verdict.status: Confirmed` was successfully triggered -by at least one payload in nyx's corpus. The corpus covers common patterns for -each vulnerability class (SQL injection, XSS, command injection, SSRF, etc.) per -language. - -A finding with `dynamic_verdict.status: NotConfirmed` was attempted but no -payload fired. This is not a false-positive signal. It means the corpus did not -have a payload that matched the specific sink variant, or the execution path was -not reachable in the test harness. - -A finding with `dynamic_verdict.status: Unsupported` could not be attempted. -Common reasons: confidence below threshold, no flow steps, language or sink type -not yet supported by the harness layer. - -### Confidence gate - -Only `Confidence >= Medium` findings are verified by default (§5.1). 
To also -verify low-confidence findings (for corpus building or backfill), pass -`--verify-all-confidence`: - -``` +```bash nyx scan --verify-all-confidence ``` -This is not recommended for production scans because low-confidence findings have -a higher false-positive rate and the harness may produce unreliable verdicts. +Use it when tuning payloads or investigating coverage. It is slower and noisier +than the default. -## nyx.toml opt-out +## Verdicts -If you want static-only scans permanently, set `verify = false` in `nyx.toml`: +| Status | Meaning | +| --- | --- | +| `Confirmed` | At least one payload reached the expected sink in the harness. | +| `NotConfirmed` | The harness ran, but no payload reached the sink. Treat the original finding as still open until reviewed. | +| `Inconclusive` | Nyx could not finish the check with enough isolation or runtime support. | +| `Unsupported` | Nyx did not try the finding. Common causes are unsupported language, unsupported sink shape, missing flow steps, or confidence below the verification threshold. | + +## Configuration + +To disable verification for a project, set: ```toml [scanner] verify = false ``` -This survives upgrades. The M7 default flip only changes the inherited default -for projects that have not explicitly set the field. +This makes scans static-only unless the command line overrides it. + +The related scanner settings are: + +| Setting | Default | Meaning | +| --- | --- | --- | +| `verify` | `true` | Run dynamic verification after static analysis. | +| `verify_all_confidence` | `false` | Include findings below `Confidence::Medium`. | +| `verify_backend` | `"auto"` | Use Docker when available, otherwise use the process backend. | +| `harden_profile` | `"standard"` | Hardening profile for the process backend. | + +See [Configuration](configuration.md) for the full config table. 
## Sandbox backends -nyx uses docker when available, then falls back to an in-process runner: - -``` -nyx scan --backend docker # require docker; fail if unavailable -nyx scan --backend process # in-process runner (no container; less isolation) +```bash +nyx scan --backend docker # require Docker +nyx scan --backend process # run directly on the host with weaker isolation nyx scan --unsafe-sandbox # alias for --backend process ``` -The docker backend mounts only the entry file's directory and blocks all -outbound network by default. When out-of-band detection is enabled (`oob_listener` -in config), the container gets `--network bridge` with a host-gateway route. +Docker is the preferred backend. It mounts only the entry file's directory and +blocks outbound network by default. If out-of-band detection is enabled with +`oob_listener`, Docker uses bridge networking with a host-gateway route so the +harness can reach the listener. + +The process backend is useful for development and machines without Docker. It +does not provide the same isolation. ## Repro artifacts -When a finding is `Confirmed`, nyx writes a repro artifact to -`~/.cache/nyx/repro//`. The artifact contains the harness spec and -the triggering payload. You can regenerate the verdict with: +Confirmed findings write a repro bundle under: -``` -nyx scan --verify # re-scans and re-verifies +```text +~/.cache/nyx/dynamic/repro// ``` -See `docs/output.md` for the `dynamic_verdict` field schema. +The bundle contains the harness spec, payload, expected output, trace, and +`reproduce.sh`. -## Wall-clock cost +```bash +cd ~/.cache/nyx/dynamic/repro/ +./reproduce.sh +./reproduce.sh --docker +``` -Verification adds harness build + sandbox startup time per finding. On typical -codebases with 10–50 Medium+ findings, end-to-end overhead is 2–5× static-only. +Use the Docker form when the bundle records a pinned container image or when +host toolchains differ from the original run. 
-If scan time is unacceptable for a given workflow (e.g. IDE integration, quick
-pre-commit check), use `--no-verify` for that workflow and rely on the full scan
-in CI.
+## Runtime cost

-## Event schema
+Verification adds harness build time and sandbox startup time for each verified
+finding. For quick local checks, `--no-verify` is usually the right choice. For
+CI or scheduled scans, keep verification enabled so `Confirmed` findings rank
+higher and `NotConfirmed` findings note that no payload reached the sink.

-The dynamic layer writes one JSON record per verdict to
-`~/.cache/nyx/dynamic/events.jsonl`. Every record begins with a fixed envelope
-so older readers fail loudly instead of silently mixing incompatible shapes:
+## Event log
+
+Nyx writes verdict events to:
+
+```text
+~/.cache/nyx/dynamic/events.jsonl
+```
+
+Each line is a JSON object with a versioned envelope:

```json
{
@@ -140,74 +131,54 @@ so older readers fail loudly instead of silently mixing incompatible shapes:
}
```

-| Field | Type | Meaning |
-| --- | --- | --- |
-| `schema_version` | integer | Bumped on any breaking change. Readers reject mismatches. |
-| `nyx_version` | string | `CARGO_PKG_VERSION` of the writing binary. |
-| `corpus_version` | string | Payload-corpus version the verdict was scored against. |
-| `kind` | string | `"verdict"` (per-finding) or `"rank_delta"` (rank-score shift). |
-| `ts` | RFC-3339 string | Wall-clock at write time. |
-| `finding_id` | string | Stable finding identifier. |
-| `spec_hash` | string | Hash of the `HarnessSpec` that drove the run. |
-| `lang` | string | Language slug; `"unknown"` when spec derivation failed. |
-| `cap` | string | Sink capability (e.g. `SQL_QUERY`, `CODE_EXEC`). |
-| `status` | string | `Confirmed`, `NotConfirmed`, `Inconclusive`, or `Unsupported`. |
-| `inconclusive_reason` | string | Present iff `status == Inconclusive`. |
+| Field | Meaning |
+| --- | --- |
+| `schema_version` | Event schema version. Readers reject mismatches.
| +| `nyx_version` | Version of the Nyx binary that wrote the event. | +| `corpus_version` | Payload corpus version used for the verdict. | +| `kind` | `verdict`, `rank_delta`, or `feedback`. | +| `ts` | Write time in RFC 3339 format. | +| `finding_id` | Stable finding identifier. | +| `spec_hash` | Hash of the harness spec. | +| `lang` | Language slug, or `unknown` when spec derivation failed. | +| `cap` | Sink capability, such as `SQL_QUERY` or `CODE_EXEC`. | +| `status` | `Confirmed`, `NotConfirmed`, `Inconclusive`, or `Unsupported`. | +| `inconclusive_reason` | Present when `status` is `Inconclusive`. | -A `rank_delta` record carries the envelope plus `finding_id`, `status`, and a -signed `delta` applied to the rank score. +If the schema changes, move or delete the old `events.jsonl` before reading it +with the new binary. Programmatic readers should use +`crate::dynamic::telemetry::read_events(path)`. -### Schema-version mismatch +## Sampling -`scripts/m7_ship_gate.sh` Gate 2 walks every line of the log, requires -`schema_version == EXPECTED_SCHEMA_VERSION`, and exits 3 if any record fails -the check. Programmatic readers use -`crate::dynamic::telemetry::read_events(path)`, which surfaces the same -condition as `TelemetryReadError::SchemaMismatch { expected, found, .. }`. - -When schema bumps land, the canonical migration is to roll the log over (move -or delete `events.jsonl`) so new and old records never coexist in a file. The -gate refuses to skip silently on mismatch. 
- -### Sampling - -`[telemetry]` in `nyx.toml` controls the on-disk sampling policy: +`[telemetry]` in `nyx.toml` controls event retention: ```toml [telemetry] -keep_all_confirmed = true # default: retain every Confirmed verdict -keep_all_inconclusive = true # default: retain every Inconclusive verdict -sample_rate_other = 1.0 # 0.0–1.0 for NotConfirmed / Unsupported +keep_all_confirmed = true +keep_all_inconclusive = true +sample_rate_other = 1.0 ``` -`sample_rate_other < 1.0` downsamples NotConfirmed and Unsupported verdicts -deterministically. The decision is seeded by the finding's `spec_hash`, so a -given finding makes the same keep-or-drop call across reruns. Confirmed and -Inconclusive verdicts ignore the rate and are always retained (they gate the -false-Confirmed budget and drive the spec-derivation roadmap). +`sample_rate_other` accepts `0.0` to `1.0` and applies to `NotConfirmed` and +`Unsupported` verdicts. The decision is deterministic for a given `spec_hash`. +Confirmed, Inconclusive, and rank-delta events are always kept by default. -Rank-delta records (emitted by `emit_rank_delta` when a verdict shifts a -finding's position in the ranked output) are also retained unconditionally and -do **not** consult `sample_rate_other`. They are calibration-critical and small -in volume, so the carve-out is intentional; setting `sample_rate_other = 0.0` -to throttle log growth will still produce rank-delta lines. +Set `NYX_NO_TELEMETRY=1` to disable event writes. -`NYX_NO_TELEMETRY=1` disables every write regardless of the policy. +## Feedback -## Opting in to feedback +To record a bad verdict: -False positives (nyx says `Confirmed` but you disagree) can be recorded: - -``` +```bash nyx verify-feedback --wrong "reason" ``` -This writes to the local telemetry log (`~/.cache/nyx/dynamic/events.jsonl`) -and contributes to precision monitoring. Feedback is never uploaded automatically. +Feedback is written to the local event log. Nyx does not upload it. 
-## nyx serve integration
+## Browser UI

-The browser UI shows `dynamic_verdict` in each finding's detail panel and
-uses the verdict in ranking (Confirmed findings surface first). The scan compare
-page has a **Verdict Diff** tab that shows which findings changed verification
-status between two scans.
+`nyx serve` shows dynamic verdicts on finding detail pages, uses them in
+ranking, and can compare verdict changes between saved scans.
+
+See [Output formats](output.md) for the `dynamic_verdict` schema.
diff --git a/docs/serve.md b/docs/serve.md
index 940176a7..5207f0a4 100644
--- a/docs/serve.md
+++ b/docs/serve.md
@@ -12,9 +12,8 @@ nyx serve --no-browser # don't auto-open
Persistent settings live under `[server]` in `nyx.conf` / `nyx.local`.

Starting a scan from the UI runs dynamic verification on `Confidence >= Medium`
-findings by default (M7). Check "Skip dynamic verification" in the scan modal
-to get a fast static-only result. See [Dynamic verification](dynamic.md) for
-details.
+findings by default. Check "Skip dynamic verification" in the scan modal to get
+a fast static-only result. See [Dynamic verification](dynamic.md) for details.

Nyx UI overview: total findings, severity breakdown, language and category distribution, top affected files

diff --git a/frontend/src/api/mutations/scans.ts b/frontend/src/api/mutations/scans.ts
index 92837763..d6c13f11 100644
--- a/frontend/src/api/mutations/scans.ts
+++ b/frontend/src/api/mutations/scans.ts
@@ -11,9 +11,9 @@ export interface StartScanBody {
engine_profile?: EngineProfile;
/**
* Override dynamic verification for this scan.
- * true — force on.
- * false — force off (skip verification; M7 default is on).
- * absent — use server config default (true since M7).
+ * true - force on.
+ * false - force off.
+ * absent - use server config default.
*/
verify?: boolean;
/** Also verify Confidence < Medium findings. Default false. */
diff --git a/scripts/m7_ship_gate.sh b/scripts/m7_ship_gate.sh
deleted file mode 100755
index 0af72295..00000000
--- a/scripts/m7_ship_gate.sh
+++ /dev/null
@@ -1,401 +0,0 @@
-#!/usr/bin/env bash
-# M7 pre-flip ship gate.
-#
-# Runs all five gates required before the default-on merge can land.
-# Must pass with exit 0 on the branch being merged.
-#
-# Usage:
-# scripts/m7_ship_gate.sh [--nyx BIN] [--corpus-dir DIR] [--skip GATE,...]
-# [--budget FILE] [--diff FILE]
-#
-# Gates:
-# 1. unsupported-rate — per-cell (cap × lang) Unsupported% within budget
-# 2. false-confirmed — false-Confirmed rate from telemetry ≤ 2% per cap
-# 3. wall-clock — default scan ≤ 2× static-only on bench suite
-# 4. sandbox-escape — sandbox escape suite green for all langs
-# 5. repro-stability — repro artifact regenerates identical verdict ≥ 95%
-#
-# Phase 29 (Track I): Gate 1 consumes per-cell budgets from
-# `tests/eval_corpus/budget.toml` and, when `--diff PREV.json` is
-# supplied, fails on any monotonic-improvement regression vs the
-# previous run.
-
-set -euo pipefail
-
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-REPO_ROOT="$(cd "$SCRIPT_DIR/.."
&& pwd)" -NYX_BIN="${NYX_BIN:-${REPO_ROOT}/target/release/nyx}" -CORPUS_DIR="${CORPUS_DIR:-${HOME}/.cache/nyx/eval_corpus}" -SKIP_GATES="" -GATE_ERRORS=0 -GATE_LOG="${REPO_ROOT}/target/m7_gate.log" -# Phase 29 (Track I): per-cell budgets + monotonic diff. -BUDGET_FILE="${BUDGET_FILE:-${REPO_ROOT}/tests/eval_corpus/budget.toml}" -DIFF_FILE="${DIFF_FILE:-}" - -while [[ $# -gt 0 ]]; do - case "$1" in - --nyx) NYX_BIN="$2"; shift 2 ;; - --corpus-dir) CORPUS_DIR="$2"; shift 2 ;; - --skip) SKIP_GATES="$2"; shift 2 ;; - --budget) BUDGET_FILE="$2"; shift 2 ;; - --diff) DIFF_FILE="$2"; shift 2 ;; - *) shift ;; - esac -done - -skip() { [[ ",$SKIP_GATES," == *",$1,"* ]]; } - -die() { echo "GATE FAIL: $*" | tee -a "$GATE_LOG" >&2; GATE_ERRORS=$((GATE_ERRORS + 1)); } -pass() { echo "GATE PASS: $*" | tee -a "$GATE_LOG"; } -info() { echo "[gate] $*" | tee -a "$GATE_LOG"; } - -[[ -x "$NYX_BIN" ]] || { echo "nyx binary not found: $NYX_BIN" >&2; exit 1; } - -mkdir -p "$(dirname "$GATE_LOG")" -echo "# M7 ship gate — $(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$GATE_LOG" -info "nyx: $NYX_BIN" -info "corpus: $CORPUS_DIR" -info "budget: $BUDGET_FILE" -info "diff: ${DIFF_FILE:-}" -info "" - -# ── Gate 1: Per-cell budget + monotonic-improvement diff ─────────────────── -# -# Phase 29 (Track I): the single global Unsupported threshold is replaced -# by per-cell (cap × lang) budgets in tests/eval_corpus/budget.toml. -# `tests/eval_corpus/run.sh` invokes `tabulate.py` per set and `report.py` -# at the end with `--budget` (and `--diff` when DIFF_FILE is set), so -# any per-cell failure (or any regression vs the prior run) propagates -# back as exit 2. -if skip unsupported-rate; then - info "Gate 1 (unsupported-rate): SKIPPED" -else - info "Gate 1: per-cell budget within tolerance + no monotonic regressions..." - EVAL_RESULTS="${REPO_ROOT}/target/eval_results.json" - echo "[]" > "$EVAL_RESULTS" - - if [[ ! 
-f "$BUDGET_FILE" ]]; then - die "Gate 1: budget file not found at $BUDGET_FILE" - else - # Run eval corpus runner (in-house set always present). - set +e - bash "${REPO_ROOT}/tests/eval_corpus/run.sh" \ - --nyx "$NYX_BIN" \ - --sets inhouse \ - --output "$(dirname "$EVAL_RESULTS")" \ - --budget "$BUDGET_FILE" \ - ${DIFF_FILE:+--diff "$DIFF_FILE"} \ - >>"$GATE_LOG" 2>>"$GATE_LOG" - RC=$? - set -e - cp "$(dirname "$EVAL_RESULTS")/eval_results.json" "$EVAL_RESULTS" 2>/dev/null || true - if [[ $RC -eq 0 ]]; then - pass "Gate 1: per-cell budget + diff check passed" - elif [[ $RC -eq 2 ]]; then - die "Gate 1: per-cell budget exceeded OR monotonic-improvement regression (see $GATE_LOG)" - elif [[ $RC -eq 3 ]]; then - die "Gate 1: budget/diff configuration is malformed (see $GATE_LOG)" - else - info "Gate 1: eval runner returned $RC (corpus may not be downloaded; treating as SKIP)" - fi - fi -fi - -# ── Gate 2: False-Confirmed rate ───────────────────────────────────────────── -# -# Phase 27 (Track H.1): the telemetry log is schema-versioned. Gate 2 reads -# `EXPECTED_SCHEMA_VERSION` against every record's `schema_version` field and -# fails loudly with exit 3 when a mismatch is found — silently treating a -# v0 (pre-Phase-27) log as "no data" would mask incompatible releases mixing -# their records. -EXPECTED_SCHEMA_VERSION=1 - -if skip false-confirmed; then - info "Gate 2 (false-confirmed): SKIPPED" -else - info "Gate 2: false-Confirmed rate from telemetry ≤ 2% per cap..." - EVENTS="${HOME}/.cache/nyx/dynamic/events.jsonl" - if [[ ! 
-f "$EVENTS" ]]; then - info "Gate 2: telemetry log not found at $EVENTS; skipping (no data)" - else - set +e - python3 - "$EVENTS" "$EXPECTED_SCHEMA_VERSION" <<'PYEOF' -import json, sys, collections -path = sys.argv[1] -expected_schema = int(sys.argv[2]) -cap_counts = collections.defaultdict(lambda: {"confirmed": 0, "wrong": 0}) -with open(path) as f: - for line_no, raw in enumerate(f, start=1): - if not raw.strip(): - continue - try: - ev = json.loads(raw) - except json.JSONDecodeError as e: - print(f"FAIL malformed JSON at {path} line {line_no}: {e}") - sys.exit(3) - if "schema_version" not in ev: - print(f"FAIL missing schema_version at {path} line {line_no}") - sys.exit(3) - if ev["schema_version"] != expected_schema: - print( - f"FAIL schema mismatch at {path} line {line_no}: " - f"expected {expected_schema}, found {ev['schema_version']}" - ) - sys.exit(3) - kind = ev.get("kind", "") - if kind == "feedback" and ev.get("wrong"): - cap = ev.get("cap", "unknown") - cap_counts[cap]["wrong"] += 1 - elif kind == "verdict" and ev.get("status") == "Confirmed": - cap = ev.get("cap", "unknown") - cap_counts[cap]["confirmed"] += 1 - -THRESHOLD = 0.02 -failed = False -for cap, counts in sorted(cap_counts.items()): - total = counts["confirmed"] - wrong = counts["wrong"] - if total == 0: - continue - rate = wrong / total - if rate > THRESHOLD: - print(f"FAIL cap={cap}: false-Confirmed rate {rate:.1%} > {THRESHOLD:.0%} (wrong={wrong}, confirmed={total})") - failed = True - else: - print(f"OK cap={cap}: false-Confirmed rate {rate:.1%} (wrong={wrong}, confirmed={total})") -sys.exit(2 if failed else 0) -PYEOF - RC=$? 
- set -e - if [[ $RC -eq 0 ]]; then - pass "Gate 2: false-Confirmed rate within threshold" - elif [[ $RC -eq 3 ]]; then - die "Gate 2: telemetry schema mismatch (expected v$EXPECTED_SCHEMA_VERSION) — refusing to silently skip" - else - die "Gate 2: false-Confirmed rate exceeds 2% for one or more caps" - fi - fi -fi - -# ── Gate 3: Wall-clock cost ≤ 2× static-only ──────────────────────────────── -if skip wall-clock; then - info "Gate 3 (wall-clock): SKIPPED" -else - info "Gate 3: wall-clock ≤ 2× static-only on bench suite..." - BENCH_DIR="${REPO_ROOT}/benches/fixtures" - if [[ ! -d "$BENCH_DIR" ]]; then - info "Gate 3: benches/fixtures not found; skipping" - else - # Portable epoch-millis. BSD date (macOS) lacks %3N; GNU date has it. - ms_now() { python3 -c 'import time; print(int(time.time()*1000))'; } - - # Static-only baseline. - T_STATIC_START=$(ms_now) - "$NYX_BIN" scan --no-verify --format json --no-index "$BENCH_DIR" > /dev/null 2>&1 || true - T_STATIC_END=$(ms_now) - T_STATIC=$(( T_STATIC_END - T_STATIC_START )) - - # Default (with verify). - T_VERIFY_START=$(ms_now) - "$NYX_BIN" scan --format json --no-index "$BENCH_DIR" > /dev/null 2>&1 || true - T_VERIFY_END=$(ms_now) - T_VERIFY=$(( T_VERIFY_END - T_VERIFY_START )) - - info " static-only: ${T_STATIC}ms with-verify: ${T_VERIFY}ms" - - # Allow 2× overhead. - LIMIT=$(( T_STATIC * 2 )) - if [[ $T_VERIFY -le $LIMIT ]]; then - pass "Gate 3: wall-clock ${T_VERIFY}ms ≤ 2× ${T_STATIC}ms baseline (limit ${LIMIT}ms)" - else - die "Gate 3: wall-clock ${T_VERIFY}ms > 2× ${T_STATIC}ms baseline (limit ${LIMIT}ms)" - fi - fi -fi - -# ── Gate 4: Sandbox-escape suite ───────────────────────────────────────────── -if skip sandbox-escape; then - info "Gate 4 (sandbox-escape): SKIPPED" -else - info "Gate 4: sandbox escape suite green..." - ESCAPE_DIR="${REPO_ROOT}/tests/dynamic_fixtures/escape" - if [[ ! 
-d "$ESCAPE_DIR" ]]; then - info "Gate 4: tests/dynamic_fixtures/escape not found; skipping" - else - # Run each escape fixture under both Process and Docker backends. - # A Confirmed verdict on an escape fixture is a gate failure. - # Additionally, check that no sentinel files were written to the host. - ESCAPE_SENTINEL_PATHS=( - "/tmp/nyx_escape_file_write_marker" - ) - for _s in "${ESCAPE_SENTINEL_PATHS[@]}"; do - rm -f "$_s" - done - ESCAPE_FAILED=0 - for fixture in "$ESCAPE_DIR"/*/; do - [[ -d "$fixture" ]] || continue - for backend in process docker; do - set +e - RESULT=$("$NYX_BIN" scan --format json --verify \ - --backend "$backend" --no-index "$fixture" 2>/dev/null) - RC=$? - set -e - if [[ $RC -ne 0 && $RC -ne 1 ]]; then - info " $backend/$fixture: nyx returned $RC (likely docker unavailable — skip)" - continue - fi - CONFIRMED=$(echo "$RESULT" | python3 -c " -import json,sys -data = json.load(sys.stdin) -findings = data if isinstance(data, list) else data.get('findings', []) -confirmed = [ - f for f in findings - if ((f.get('evidence') or {}).get('dynamic_verdict') or {}).get('status') == 'Confirmed' -] -print(len(confirmed)) -" 2>/dev/null || echo 0) - if [[ "$CONFIRMED" -gt 0 ]]; then - die "Gate 4: escape fixture confirmed in $backend backend: $fixture" - ESCAPE_FAILED=1 - fi - done - done - for _s in "${ESCAPE_SENTINEL_PATHS[@]}"; do - if [[ -f "$_s" ]]; then - die "Gate 4: escape sentinel written to host: $_s" - ESCAPE_FAILED=1 - fi - done - [[ $ESCAPE_FAILED -eq 0 ]] && pass "Gate 4: sandbox escape suite green" - for _s in "${ESCAPE_SENTINEL_PATHS[@]}"; do - rm -f "$_s" - done - fi -fi - -# ── Gate 5: Repro stability ≥ 95% ──────────────────────────────────────────── -# -# Phase 28 (Track H.4): inversion of the legacy "conservative — treat -# unexpected errors as stable" rule. 
Old behaviour silently counted any -# subprocess error (timeout, missing toolchain, broken pipe) as stable, -# which let the gate pass while bundles were structurally unreplayable. -# Phase 28 flips that: known exit codes (0 = pass, 1 = sink mismatch, -# 2 = docker unavailable, 3 = toolchain mismatch) are classified -# normally, but any other failure (timeout, ENOENT on `sh`, non-zero -# code outside the documented set) is flagged as instability so the -# gate fails loudly instead of masking the problem. -if skip repro-stability; then - info "Gate 5 (repro-stability): SKIPPED" -else - info "Gate 5: repro artifact stability ≥ 95% of Confirmed..." - # Repro bundles live under dynamic/repro/ (written by repro.rs). - REPRO_DIR="${HOME}/.cache/nyx/dynamic/repro" - if [[ ! -d "$REPRO_DIR" ]] || [[ -z "$(ls -A "$REPRO_DIR" 2>/dev/null)" ]]; then - info "Gate 5: no repro artifacts found at $REPRO_DIR; skipping" - else - python3 - <<'PYEOF' "$REPRO_DIR" "$NYX_BIN" -import subprocess, sys, json, pathlib - -# Phase 28 documented reproduce.sh exit codes. -EXIT_PASS = 0 # sink_hit matches expected/outcome.json -EXIT_MISMATCH = 1 # sink_hit diverged from recorded outcome -EXIT_DOCKER_UNAVAIL = 2 # --docker requested but unavailable -EXIT_TOOLCHAIN_MISMATCH = 3 # host toolchain mismatch in process mode - -repro_root = pathlib.Path(sys.argv[1]) -total = 0 -stable = 0 -unstable = 0 - -# Each bundle has expected/verdict.json (written by repro.rs). -for verdict_file in repro_root.rglob("expected/verdict.json"): - bundle_dir = verdict_file.parent.parent # parent of expected/ - try: - with open(verdict_file) as f: - orig = json.load(f) - orig_status = orig.get("status", "") - except Exception as e: - # Bundle is malformed. Phase 28 inversion: this is no longer - # silently "stable"; it is a broken bundle and counts against - # the stability rate. 
- unstable += 1 - total += 1 - print(f"UNSTABLE: {bundle_dir.name} — verdict.json unreadable ({e})") - continue - if orig_status != "Confirmed": - continue - total += 1 - reproduce_sh = bundle_dir / "reproduce.sh" - if not reproduce_sh.exists(): - # Legacy bundles without reproduce.sh used to be counted as - # stable; Phase 28 treats them as instability because the - # repro bundle layout has shipped reproduce.sh since the - # first cut of the dynamic feature. - unstable += 1 - print(f"UNSTABLE: {bundle_dir.name} — reproduce.sh missing") - continue - try: - result = subprocess.run( - ["sh", str(reproduce_sh)], - capture_output=True, - timeout=30, - ) - rc = result.returncode - if rc == EXIT_PASS: - stable += 1 - elif rc == EXIT_MISMATCH: - unstable += 1 - print(f"UNSTABLE: {bundle_dir.name} — sink_hit mismatch (exit 1)") - elif rc in (EXIT_DOCKER_UNAVAIL, EXIT_TOOLCHAIN_MISMATCH): - # Documented environmental skip codes — neither pass nor - # fail. Exclude from the stability ratio so an offline - # CI row does not pollute the score. - total -= 1 - print(f"SKIP: {bundle_dir.name} — environment exit {rc}") - else: - # Phase 28 inversion: any other non-zero code is unexpected. - unstable += 1 - print(f"UNSTABLE: {bundle_dir.name} — unexpected exit {rc}") - except subprocess.TimeoutExpired: - unstable += 1 - print(f"UNSTABLE: {bundle_dir.name} — reproduce.sh exceeded 30s") - except Exception as e: - # Phase 28 inversion: subprocess error is no longer silent - # success. Anything that prevents the script from completing - # cleanly counts against stability. - unstable += 1 - print(f"UNSTABLE: {bundle_dir.name} — invocation error ({e})") - -if total == 0: - print("No Confirmed repro artifacts found; skipping stability check.") - sys.exit(0) - -rate = stable / total -print(f"Repro stability: {stable}/{total} = {rate:.1%} (unstable={unstable})") -if rate < 0.95: - print(f"FAIL: stability {rate:.1%} < 95%") - sys.exit(2) -PYEOF - RC=$? 
- if [[ $RC -eq 0 ]]; then
- pass "Gate 5: repro stability ≥ 95%"
- else
- die "Gate 5: repro stability < 95%"
- fi
- fi
-fi
-
-# ── Summary ──────────────────────────────────────────────────────────────────
-echo ""
-info "Gate log: $GATE_LOG"
-if [[ $GATE_ERRORS -gt 0 ]]; then
- echo ""
- echo "M7 SHIP GATE FAILED: $GATE_ERRORS gate(s) did not pass."
- echo "Fix failures before merging the default-on flip."
- exit 2
-else
- echo ""
- echo "M7 SHIP GATE PASSED: all active gates green."
- exit 0
-fi
diff --git a/src/cli.rs b/src/cli.rs
index 23bc8661..3d28e1ae 100644
--- a/src/cli.rs
+++ b/src/cli.rs
@@ -471,9 +471,9 @@ pub enum Commands {

/// Build a harness and dynamically verify each finding in a sandbox.
///
- /// Dynamic verification is on by default (M7). This flag is a no-op
- /// when verification is already enabled via config. Use `--no-verify`
- /// to disable for a single run. Requires the binary to be built with
+ /// Dynamic verification is on by default. This flag is a no-op when
+ /// verification is already enabled via config. Use `--no-verify` to
+ /// disable it for a single run. Requires the binary to be built with
/// `--features dynamic`; without that feature this flag is silently ignored.
#[cfg_attr(not(feature = "dynamic"), arg(hide = true))]
#[arg(long, help_heading = "Dynamic", conflicts_with = "no_verify")]
@@ -489,9 +489,9 @@ pub enum Commands {

/// Also verify `Confidence < Medium` findings dynamically.
///
- /// By default only `Confidence >= Medium` findings are verified (§5.1).
- /// Pass this flag to run verification on all findings regardless of
- /// confidence. Intended for corpus-building and backfill runs.
+ /// By default only `Confidence >= Medium` findings are verified. Pass
+ /// this flag to run verification on all findings regardless of
+ /// confidence. Intended for payload tuning and backfill runs.
#[cfg_attr(not(feature = "dynamic"), arg(hide = true))] #[arg(long, help_heading = "Dynamic")] verify_all_confidence: bool, @@ -532,7 +532,7 @@ pub enum Commands { )] harden: Option, - // ── Baseline / patch-validation (§M6.5) ──────────────────────── + // Baseline / patch-validation /// Read a previous scan's JSON output (or a stripped .nyx/baseline.json) /// and diff it against the current scan on stable_hash. /// @@ -564,7 +564,7 @@ pub enum Commands { gate: Option, }, - /// Submit feedback on a dynamic verification verdict (§21.2). + /// Submit feedback on a dynamic verification verdict. /// /// Records a correction or confirmation for a finding's verdict in the /// local telemetry log. Requires `--features dynamic`. diff --git a/src/dynamic/framework/adapters/java_routes.rs b/src/dynamic/framework/adapters/java_routes.rs index 495964db..08963efc 100644 --- a/src/dynamic/framework/adapters/java_routes.rs +++ b/src/dynamic/framework/adapters/java_routes.rs @@ -283,10 +283,17 @@ pub fn method_formal_types(method: Node<'_>, bytes: &[u8]) -> Vec<(String, Strin /// Extract placeholder names from a route path template. /// -/// Supports two placeholder syntaxes: +/// Supports three placeholder syntaxes: /// - JAX-RS / Spring / Micronaut: `/users/{id}` → `id`, /// `/users/{id:[0-9]+}` → `id`. -/// - Servlet-mapping `*` wildcards: ignored (no name to bind). +/// - Spring 5.3+ capture-all variables: `/files/{*path}` → `path` +/// (matches the remainder of the URI including slashes). +/// - Bare Ant-style `*` / `**` wildcards (`/users/*`, `/files/**`): +/// intentionally yield no placeholders. They are unnamed by Spring's +/// `AntPathMatcher` and cannot bind by formal name; handlers that +/// need the matched segment use `HttpServletRequest.getRequestURI()` +/// (already routed to [`ParamSource::Implicit`]) or the named +/// `{*name}` capture-all syntax above. 
pub fn extract_path_placeholders(path: &str) -> Vec<String> { let mut out: Vec<String> = Vec::new(); let bytes = path.as_bytes(); @@ -295,7 +302,8 @@ pub fn extract_path_placeholders(path: &str) -> Vec<String> { if bytes[i] == b'{' && let Some(end) = bytes[i + 1..].iter().position(|&b| b == b'}') { let inner = &path[i + 1..i + 1 + end]; - let name = inner.split(':').next().unwrap_or(inner).trim(); + let inner_name = inner.split(':').next().unwrap_or(inner).trim(); + let name = inner_name.strip_prefix('*').unwrap_or(inner_name); if !name.is_empty() && !out.iter().any(|n| n == name) { out.push(name.to_owned()); } @@ -420,6 +428,26 @@ mod tests { assert_eq!(extract_path_placeholders("/u/{id:[0-9]+}"), vec!["id"]); } + #[test] + fn extracts_capture_all_variable() { + assert_eq!(extract_path_placeholders("/files/{*path}"), vec!["path"]); + assert_eq!( + extract_path_placeholders("/api/{tenant}/files/{*resource}"), + vec!["tenant", "resource"] + ); + } + + #[test] + fn unnamed_ant_globs_yield_no_placeholders() { + // Bare `*` and `**` are unnamed by Spring's AntPathMatcher and have + // no name to bind a formal to. Handlers that need the matched + // segment use the request object (routed to [`ParamSource::Implicit`]) + // or the named `{*name}` capture-all syntax above.
+ assert!(extract_path_placeholders("/users/*").is_empty()); + assert!(extract_path_placeholders("/files/**").is_empty()); + assert!(extract_path_placeholders("/a/*/b/**/c").is_empty()); + } + #[test] fn join_drops_double_slash() { assert_eq!(join_route_path("/api", "/x"), "/api/x"); diff --git a/src/dynamic/framework/adapters/ruby_rails.rs b/src/dynamic/framework/adapters/ruby_rails.rs index f1437755..5d7fa484 100644 --- a/src/dynamic/framework/adapters/ruby_rails.rs +++ b/src/dynamic/framework/adapters/ruby_rails.rs @@ -18,7 +18,7 @@ use tree_sitter::Node; use super::ruby_routes::{ bind_path_params, class_extends, class_name, find_class_with_method, first_string_arg, - kwarg_string, method_formal_names, source_imports_rails, verb_from_ident, + first_symbol_arg, kwarg_string, method_formal_names, source_imports_rails, verb_from_ident, }; pub struct RubyRailsAdapter; @@ -40,9 +40,13 @@ fn class_is_rails_controller(class: Node<'_>, bytes: &[u8]) -> bool { /// Walk the file's top-level `call` nodes looking for a /// `Rails.application.routes.draw` block or bare `get / post / ...` /// dispatch lines, and return the first `(method, path)` whose -/// `to: 'controller#action'` kwarg references the target. Returns -/// `None` when no route mapping is present (the caller then falls -/// back to the conventional `/{action}` shape). +/// `to: 'controller#action'` kwarg references the target. Respects +/// `namespace :api do ... end` and `scope :v1 do ... end` / +/// `scope path: '/v1' do ... end` nesting so a route declared inside +/// such a block resolves against the prefixed path + controller name +/// Rails actually mounts it under. Returns `None` when no mapping +/// is present (the caller then falls back to the conventional +/// `/{action}` shape). 
fn find_route_mapping<'a>( root: Node<'a>, bytes: &'a [u8], @@ -50,7 +54,7 @@ fn find_route_mapping<'a>( action: &str, ) -> Option<(HttpMethod, String)> { let mut hit: Option<(HttpMethod, String)> = None; - visit_routes(root, bytes, controller, action, &mut hit); + visit_routes(root, bytes, controller, action, "", "", &mut hit); hit } @@ -59,19 +63,98 @@ fn visit_routes<'a>( bytes: &'a [u8], controller: &str, action: &str, + path_prefix: &str, + ctrl_prefix: &str, out: &mut Option<(HttpMethod, String)>, ) { if out.is_some() { return; } - if node.kind() == "call" - && let Some(found) = try_route_mapping(node, bytes, controller, action) { + if node.kind() == "call" { + if let Some((kind, ident)) = route_nesting_kind(node, bytes) { + let (path_pfx, ctrl_pfx) = match kind { + NestingKind::Namespace => ( + format!("{path_prefix}/{ident}"), + format!("{ctrl_prefix}{ident}/"), + ), + NestingKind::ScopeSymbol => ( + format!("{path_prefix}/{ident}"), + format!("{ctrl_prefix}{ident}/"), + ), + NestingKind::ScopePath => (format!("{path_prefix}/{ident}"), ctrl_prefix.to_owned()), + }; + recurse_into_block(node, bytes, controller, action, &path_pfx, &ctrl_pfx, out); + return; + } + if let Some(found) = try_route_mapping(node, bytes, controller, action, path_prefix, ctrl_prefix) { *out = Some(found); return; } + } let mut cur = node.walk(); for child in node.children(&mut cur) { - visit_routes(child, bytes, controller, action, out); + visit_routes(child, bytes, controller, action, path_prefix, ctrl_prefix, out); + } +} + +enum NestingKind { + Namespace, + ScopeSymbol, + ScopePath, +} + +/// If `call` is a routes-DSL nesting block (`namespace :api do ... end`, +/// `scope :v1 do ... end`, or `scope path: '/v1' do ... end`) return +/// the kind + the extracted identifier (a bare token for namespace / +/// symbol-scope, a leading-slash-stripped path for path-scope). 
+fn route_nesting_kind<'a>(call: Node<'a>, bytes: &'a [u8]) -> Option<(NestingKind, String)> { + let mut cur = call.walk(); + let mut ident: Option<&str> = None; + let mut args: Option<Node<'a>> = None; + for child in call.named_children(&mut cur) { + match child.kind() { + "identifier" => ident = child.utf8_text(bytes).ok(), + "argument_list" => args = Some(child), + _ => {} + } + } + let ident = ident?; + let args = args?; + match ident { + "namespace" => { + let sym = first_symbol_arg(args, bytes)?; + Some((NestingKind::Namespace, sym)) + } + "scope" => { + if let Some(sym) = first_symbol_arg(args, bytes) { + Some((NestingKind::ScopeSymbol, sym)) + } else { + let path = kwarg_string(args, bytes, "path")?; + let trimmed = path.trim_start_matches('/').to_owned(); + if trimmed.is_empty() { + return None; + } + Some((NestingKind::ScopePath, trimmed)) + } + } + _ => None, + } +} + +fn recurse_into_block<'a>( + call: Node<'a>, + bytes: &'a [u8], + controller: &str, + action: &str, + path_prefix: &str, + ctrl_prefix: &str, + out: &mut Option<(HttpMethod, String)>, +) { + let mut cur = call.walk(); + for child in call.named_children(&mut cur) { + if child.kind() == "do_block" || child.kind() == "block" { + visit_routes(child, bytes, controller, action, path_prefix, ctrl_prefix, out); + } } } @@ -80,6 +163,8 @@ fn try_route_mapping<'a>( bytes: &'a [u8], controller: &str, action: &str, + path_prefix: &str, + ctrl_prefix: &str, ) -> Option<(HttpMethod, String)> { let mut cur = call.walk(); let mut verb: Option<HttpMethod> = None; @@ -100,8 +185,14 @@ fn try_route_mapping<'a>( let path = first_string_arg(args, bytes)?; let to = kwarg_string(args, bytes, "to")?; let (ctrl, act) = to.split_once('#')?; - if controller_matches(ctrl, controller) && act == action { - return Some((verb, path)); + let full_ctrl = format!("{ctrl_prefix}{ctrl}"); + if controller_matches(&full_ctrl, controller) && act == action { + let full_path = if path_prefix.is_empty() { + path + } else { + format!("{}/{}",
path_prefix, path.trim_start_matches('/')) + }; + return Some((verb, full_path)); } None } @@ -269,6 +360,51 @@ mod tests { assert!(matches!(id.source, crate::dynamic::framework::ParamSource::PathSegment(_))); } + #[test] + fn routes_draw_namespace_applies_prefix_to_path_and_controller() { + let src: &[u8] = b"Rails.application.routes.draw do\n namespace :api do\n get '/users', to: 'users#index'\n end\nend\n\nclass Api::UsersController < ApplicationController\n def index\n 'ok'\n end\nend\n"; + let tree = parse(src); + let binding = RubyRailsAdapter + .detect(&summary("index"), tree.root_node(), src) + .expect("binding"); + let route = binding.route.unwrap(); + assert_eq!(route.path, "/api/users"); + assert_eq!(route.method, HttpMethod::GET); + } + + #[test] + fn routes_draw_scope_path_prefixes_path_only() { + let src: &[u8] = b"Rails.application.routes.draw do\n scope path: '/v1' do\n get '/users', to: 'users#index'\n end\nend\n\nclass UsersController < ApplicationController\n def index\n 'ok'\n end\nend\n"; + let tree = parse(src); + let binding = RubyRailsAdapter + .detect(&summary("index"), tree.root_node(), src) + .expect("binding"); + let route = binding.route.unwrap(); + assert_eq!(route.path, "/v1/users"); + } + + #[test] + fn routes_draw_scope_symbol_prefixes_path_and_controller() { + let src: &[u8] = b"Rails.application.routes.draw do\n scope :admin do\n get '/users', to: 'users#index'\n end\nend\n\nclass Admin::UsersController < ApplicationController\n def index\n 'ok'\n end\nend\n"; + let tree = parse(src); + let binding = RubyRailsAdapter + .detect(&summary("index"), tree.root_node(), src) + .expect("binding"); + let route = binding.route.unwrap(); + assert_eq!(route.path, "/admin/users"); + } + + #[test] + fn routes_draw_nested_namespaces_compose_prefixes() { + let src: &[u8] = b"Rails.application.routes.draw do\n namespace :api do\n namespace :v1 do\n get '/users', to: 'users#index'\n end\n end\nend\n\nclass Api::V1::UsersController < 
ApplicationController\n def index\n 'ok'\n end\nend\n"; + let tree = parse(src); + let binding = RubyRailsAdapter + .detect(&summary("index"), tree.root_node(), src) + .expect("binding"); + let route = binding.route.unwrap(); + assert_eq!(route.path, "/api/v1/users"); + } + #[test] fn skips_when_class_is_not_a_controller() { let src: &[u8] = b"class Foo\n def bar\n 'ok'\n end\nend\n"; diff --git a/src/dynamic/framework/adapters/ruby_routes.rs b/src/dynamic/framework/adapters/ruby_routes.rs index 4971d83d..e3a3c8d6 100644 --- a/src/dynamic/framework/adapters/ruby_routes.rs +++ b/src/dynamic/framework/adapters/ruby_routes.rs @@ -145,7 +145,7 @@ fn named_child_of_kind<'a>(node: Node<'a>, kind: &str) -> Option<Node<'a>> { pub fn class_name<'a>(class: Node<'a>, bytes: &'a [u8]) -> Option<&'a str> { let mut cur = class.walk(); for c in class.named_children(&mut cur) { - if c.kind() == "constant" { + if c.kind() == "constant" || c.kind() == "scope_resolution" { return c.utf8_text(bytes).ok(); } } @@ -352,6 +352,22 @@ fn is_implicit_formal(name: &str) -> bool { matches!(name, "env" | "request" | "req" | "params" | "response" | "res") } +/// Read the first positional symbol argument (`:foo`) from an +/// `argument_list` child. Used by the Rails router DSL to pull the +/// namespace name out of `namespace :api do ... end` and the +/// positional form of `scope :v1 do ... end`. The returned string +/// is the symbol's identifier portion without the leading colon. +pub fn first_symbol_arg<'a>(args: Node<'a>, bytes: &'a [u8]) -> Option<String> { + let mut cur = args.walk(); + for c in args.named_children(&mut cur) { + if c.kind() == "simple_symbol" { + let raw = c.utf8_text(bytes).ok()?; + return Some(raw.trim_start_matches(':').to_owned()); + } + } + None +} + /// Read the first positional string-literal argument from an /// `argument_list` child. Used by every Ruby route adapter to pull /// a path template out of `get '/run' do ...
end` and the Rails diff --git a/src/dynamic/repro.rs b/src/dynamic/repro.rs index 863b699e..d43aca3c 100644 --- a/src/dynamic/repro.rs +++ b/src/dynamic/repro.rs @@ -565,10 +565,9 @@ pub enum ReplayResult { /// Tri-state map of [`ReplayResult`] onto the eval-corpus /// `VerifyResult::replay_stable` field shape. /// -/// * `Some(true)` — replay matched the recorded outcome. -/// * `Some(false)` — replay diverged or aborted in a way that the M7 -/// Gate-5 inversion treats as instability. -/// * `None` — replay was not informative (toolchain mismatched, docker +/// * `Some(true)` - replay matched the recorded outcome. +/// * `Some(false)` - replay diverged or aborted. +/// * `None` - replay was not informative (toolchain mismatched, docker /// unavailable, or the bundle had no `reproduce.sh`). The corpus /// tabulator treats `None` as "no signal" and excludes the row from /// the per-cell `stable_replays` numerator. @@ -582,15 +581,14 @@ pub fn replay_stability(result: &ReplayResult) -> Option<bool> { } } -/// Phase 28 — Track H.3. Run `reproduce.sh` in `bundle_root` and map the -/// shell exit code into a [`ReplayResult`]. +/// Run `reproduce.sh` in `bundle_root` and map the shell exit code into a +/// [`ReplayResult`]. /// /// `extra_args` is appended to `reproduce.sh` (`--docker` when the caller /// wants the docker backend; empty for the process backend). /// -/// This is the host-side companion to the M7 Gate 5 inversion: callers -/// who want "did this bundle replay green?" semantics see a typed result -/// and the M7 gate script gets a uniform contract to assert against. +/// Callers who want "did this bundle replay green?" semantics get a typed +/// result instead of parsing shell output. pub fn replay_bundle( bundle_root: &Path, extra_args: &[&str], diff --git a/src/dynamic/telemetry.rs b/src/dynamic/telemetry.rs index 917042ec..b82e8f27 100644 --- a/src/dynamic/telemetry.rs +++ b/src/dynamic/telemetry.rs @@ -1,9 +1,9 @@ -//! Telemetry event log (§21.1). +//!
Telemetry event log. //! //! Writes one JSON line per verdict to `~/.cache/nyx/dynamic/events.jsonl`. -//! `NYX_NO_TELEMETRY=1` silently disables all writes (§21.4). +//! `NYX_NO_TELEMETRY=1` silently disables all writes. //! -//! # Schema (Phase 27) +//! # Schema //! //! Every record starts with three envelope fields so the on-disk format can //! evolve across releases without silently mixing incompatible records: @@ -12,11 +12,10 @@ //! - `nyx_version`: the Cargo package version that wrote the record. //! - `corpus_version`: the payload-corpus version active at write time. //! -//! Followed by a `kind` discriminator (`"verdict"` or `"rank_delta"`). All -//! readers (`read_events`, the M7 ship gate) require `schema_version == -//! [`SCHEMA_VERSION`]; mismatched records produce -//! [`TelemetryReadError::SchemaMismatch`] instead of being silently parsed -//! as if they matched. +//! Followed by a `kind` discriminator (`"verdict"` or `"rank_delta"`). All +//! readers require `schema_version == SCHEMA_VERSION`; mismatched records +//! produce [`TelemetryReadError::SchemaMismatch`] instead of being silently +//! parsed as if they matched. //! //! ```json //! { @@ -258,12 +257,10 @@ fn lang_from_path(path: &str) -> String { .unwrap_or_else(|| "unknown".to_owned()) } -/// Sampling decision for telemetry writes (Phase 27, Track H.2). +/// Sampling decision for telemetry writes. /// -/// Confirmed and Inconclusive verdicts are calibration-critical (false-Confirmed -/// rate gates M7 ship; Inconclusive reasons drive the spec-derivation roadmap) -/// and are always retained. Other verdict statuses can be downsampled to bound -/// log growth on high-volume scans. +/// Confirmed and Inconclusive verdicts are kept for calibration. Other verdict +/// statuses can be downsampled to bound log growth on high-volume scans. /// /// The decision is seeded by `spec_hash` so the *same* finding makes the *same* /// keep-or-drop call across reruns. 
Without this, two scans of the same project @@ -413,12 +410,11 @@ pub fn log_path() -> Option<PathBuf> { events_log_path() } -// ── Reading events back (Phase 27) ─────────────────────────────────────────── +// Reading events back /// Structured error returned by [`read_events`]. /// -/// Surfaced to the M7 ship gate so Gate 2 can fail loudly on schema-mismatch -/// rather than silently treating mismatched records as "no data". +/// Returned when the log is unreadable, malformed, or has a mismatched `schema_version`. #[derive(Debug, thiserror::Error)] pub enum TelemetryReadError { #[error("io error reading {path}: {source}")] @@ -451,14 +447,12 @@ pub enum TelemetryReadError { /// /// Returns each line as a `serde_json::Value` so callers can dispatch on the /// `kind` discriminator themselves. Rejects any record whose `schema_version` -/// does not match [`SCHEMA_VERSION`] (this is the explicit failure mode the -/// M7 ship gate Gate 2 consumes; a v0 record from an older release must not -/// silently parse as if the schema had never changed). +/// does not match [`SCHEMA_VERSION`]. A v0 record from an older release must +/// not silently parse as if the schema had never changed. /// -/// Blank lines are skipped. Any malformed JSON or missing `schema_version` -/// fails the whole read; partial recovery is not the contract here because -/// the ship gate already treats "log missing or unreadable" as "no data, -/// skip Gate 2 with a notice." +/// Blank lines are skipped. Any malformed JSON or missing `schema_version` +/// fails the whole read; partial recovery is not the contract for telemetry +/// logs. pub fn read_events(path: &Path) -> Result<Vec<serde_json::Value>, TelemetryReadError> { let file = std::fs::File::open(path).map_err(|e| TelemetryReadError::Io { path: path.to_path_buf(), @@ -551,8 +545,8 @@ pub fn feedback_wrong_for_finding(path: &Path, finding_id: &str) -> Option /// One telemetry event per ranked finding that carries a dynamic verdict delta.
/// /// Emitted by `rank::rank_diags` for every diag whose dynamic verdict shifts -/// its rank score (delta != 0). Used by the M7 calibration pipeline to tune -/// the N/M boost/penalty constants from real-world verdict distributions. +/// its rank score (delta != 0). Used to tune the N/M boost/penalty constants +/// from real-world verdict distributions. #[derive(Debug, serde::Serialize, serde::Deserialize)] pub struct RankDeltaEvent { pub schema_version: u32, diff --git a/src/dynamic/verify.rs b/src/dynamic/verify.rs index e9b98a91..44febb6c 100644 --- a/src/dynamic/verify.rs +++ b/src/dynamic/verify.rs @@ -85,14 +85,11 @@ pub struct VerifyOptions { /// Default `false`. [`Self::from_config`] honours the /// `NYX_VERIFY_REPLAY_STABLE` environment variable (`1` / `true`). pub replay_stable_check: bool, - /// Phase 31 follow-up: when `true` and `replay_stable_check` is also - /// `true`, the verifier passes `--docker` to `reproduce.sh` instead of - /// running it through the host's process backend. Lets the eval-corpus - /// driver mark `replay_stable` based on the bare-image replay path so - /// the M7 ship-gate's Gate 5 reflects the docker bundle's green/red - /// signal — required when the corpus walks a host that has stripped - /// the language toolchains (the bare-image CI matrix at - /// `.github/workflows/repro-bare.yml`). + /// When `true` and `replay_stable_check` is also `true`, the verifier + /// passes `--docker` to `reproduce.sh` instead of running it through the + /// host's process backend. This lets eval-corpus runs mark + /// `replay_stable` from the bare-image replay path when the host has + /// stripped language toolchains. /// /// Default `false`. [`Self::from_config`] honours the /// `NYX_VERIFY_REPLAY_DOCKER` environment variable (`1` / `true`). 
diff --git a/src/rank.rs b/src/rank.rs index b3e3a920..3dd8e095 100644 --- a/src/rank.rs +++ b/src/rank.rs @@ -99,7 +99,7 @@ pub fn compute_attack_rank(diag: &Diag) -> AttackRank { // All other verdicts (Unsupported, Inconclusive, no verdict) are // unaffected: no data is better than speculative data. // - // Calibrated values (M7 eval corpus): N=20, M=5. + // Calibrated values from the eval corpus: N=20, M=5. // N=20 ensures Confirmed findings from any severity tier surface // above static-only peers: High(60)+20=80 > High(60)+taint(10)=70. // M=5 nudges exhausted-corpus NotConfirmed below equal static peers @@ -209,7 +209,7 @@ pub fn rank_diags(diags: &mut [Diag]) { if !rank.components.is_empty() { d.rank_reason = Some(rank.components.clone()); } - // Emit rank-delta telemetry for M7 calibration (§21 / deferred M7 hook). + // Emit rank-delta telemetry for score calibration. // Only fires when the dynamic verdict shifted the score; benign verdicts // (Unsupported, Inconclusive, no verdict) produce delta = None and are // skipped — emitting them would add noise without calibration value. @@ -247,17 +247,16 @@ pub fn rank_diags(diags: &mut [Diag]) { /// Returns `None` when there is no verdict (static-only scan) or the verdict /// does not change the score (Unsupported, Inconclusive). /// -/// Design note (§deferred M7 payload_corpus_complete): the spec originally -/// distinguished `NotConfirmed` + `payload_corpus_complete == true` → `-M` -/// from `NotConfirmed` + `NoPayloadsForCap` → no change. In practice the +/// Design note: the spec originally mapped `NotConfirmed` + +/// `payload_corpus_complete == true` to `-M` and `NotConfirmed` + +/// `NoPayloadsForCap` to no change. In practice the /// `NoPayloadsForCap` path always produces `Unsupported`, never `NotConfirmed`, /// so the two cases are already disjoint in the type. The heuristic
The heuristic /// `!dv.attempts.is_empty()` (corpus was actually tried) is equivalent to -/// `payload_corpus_complete == true` for all reachable states — no extra -/// field is needed. See also §deferred decision in `.pitboss/play/deferred.md`. +/// `payload_corpus_complete == true` for all reachable states, so no extra +/// field is needed. /// -/// Values calibrated against M7 eval corpus (OWASP Benchmark v1.2 + in-house curated set): -/// N=20, M=5 — see `docs/dynamic_eval_m7.md` for precision/recall breakdowns. +/// Values calibrated against the eval corpus: N=20, M=5. fn dynamic_verdict_delta(diag: &Diag) -> Option { use crate::evidence::VerifyStatus; let dv = diag.evidence.as_ref()?.dynamic_verdict.as_ref()?; diff --git a/src/server/routes/scans.rs b/src/server/routes/scans.rs index bc695973..1f8a225a 100644 --- a/src/server/routes/scans.rs +++ b/src/server/routes/scans.rs @@ -36,9 +36,9 @@ struct StartScanRequest { engine_profile: Option, /// Override dynamic verification for this scan. /// - /// `true` — force on even if config says off. - /// `false` — force off even if config says on (M7 default-on). - /// absent — inherit config default (true since M7). + /// `true` - force on even if config says off. + /// `false` - force off even if config says on. + /// absent - inherit config default. /// /// Requires `--features dynamic`; `true` returns 400 when the /// feature is absent. diff --git a/src/utils/config.rs b/src/utils/config.rs index e9ac0338..36447204 100644 --- a/src/utils/config.rs +++ b/src/utils/config.rs @@ -251,8 +251,8 @@ pub struct ScannerConfig { /// Run dynamic verification on each finding after the static pass. /// - /// Default `true` (M7 flip). Each `Confidence >= Medium` finding is - /// passed to `dynamic::verify_finding` and the result is stored in + /// Default `true`. Each `Confidence >= Medium` finding is passed to + /// `dynamic::verify_finding` and the result is stored in /// `Evidence::dynamic_verdict`. 
Use `--no-verify` (CLI) or set /// `verify = false` in `nyx.toml` to disable. /// diff --git a/tests/eval_corpus/budget.toml b/tests/eval_corpus/budget.toml index f9bd2d0d..3e2bf855 100644 --- a/tests/eval_corpus/budget.toml +++ b/tests/eval_corpus/budget.toml @@ -1,17 +1,10 @@ -# Phase 31: ratchet values set to the headline targets. +# Eval corpus budget. # -# These are the published acceptance numbers behind the dynamic-verification -# overhaul (see `docs/dynamic.md` "Headline metrics"). The ratchet schedule -# from Phase 29 collapsed into a single target row: every (cap, lang) cell is -# now gated against the same headline thresholds. Per-cell carve-outs were -# dropped in Phase 31; if a cell is still wider than these numbers in practice -# it shows up as a per-cell `FAIL` in `report.py` and as a gate-1 failure in -# `scripts/m7_ship_gate.sh`, which is the intended forcing function for the -# remaining engine follow-ups tracked in `.pitboss/play/deferred.md`. +# `report.py` enforces these values when `run.sh` or `run_full.sh` pass +# `--budget`. Each (cap, lang) cell uses the default row unless a specific +# override appears below. # -# Wall-clock cost (≤ 2× static-only) is enforced separately by Gate 3 of -# `scripts/m7_ship_gate.sh` against `benches/fixtures/`; it is not a per-cell -# budget knob and has no entry in this file. +# Wall-clock cost is measured separately from this per-cell budget. # # Schema: # diff --git a/tests/eval_corpus/run.sh b/tests/eval_corpus/run.sh index 3426c4f5..9290092a 100755 --- a/tests/eval_corpus/run.sh +++ b/tests/eval_corpus/run.sh @@ -1,23 +1,23 @@ #!/usr/bin/env bash -# Eval corpus runner for M7 pre-flip gate calibration. +# Eval corpus runner. # # Usage: # tests/eval_corpus/run.sh [--output DIR] [--nyx BIN] [--sets owasp,sard,inhouse] # -# Bootstraps OWASP Benchmark v1.2, NIST SARD subset, and in-house -# bughunt-curated fixtures. Runs `nyx scan --verify` on each. 
Emits +# Bootstraps OWASP Benchmark v1.2, the NIST SARD subset, and Nyx benchmark +# fixtures. Runs `nyx scan --verify` on each. Emits # per-cell (cap x language) precision/recall table and per-cap Unsupported # rate to stdout (and --output DIR if given). # # Environment: -# NYX_EVAL_CORPUS_DIR — path to pre-downloaded corpus roots +# NYX_EVAL_CORPUS_DIR - path to pre-downloaded corpus roots # (default: ~/.cache/nyx/eval_corpus) -# NYX_BIN — path to nyx binary (default: ./target/release/nyx) +# NYX_BIN - path to nyx binary (default: ./target/release/nyx) # # Exit codes: -# 0 — all gate thresholds met -# 1 — setup or I/O error -# 2 — one or more gate thresholds exceeded (see output for details) +# 0 - all budget thresholds met +# 1 - setup or I/O error +# 2 - one or more budget thresholds exceeded (see output for details) set -euo pipefail @@ -173,9 +173,8 @@ python3 "${SCRIPT_DIR}/report.py" \ ${DIFF_FILE:+--diff "$DIFF_FILE"} REPORT_RC=$? set -e -# Propagate gate-fail (exit 2) and malformed-config (exit 3) so the -# m7_ship_gate.sh Gate-1 dispatch can tell them apart. Treat other -# non-zero as setup error (exit 1). +# Propagate budget failures (exit 2) and malformed config (exit 3). Treat other +# non-zero exits as setup errors. if [[ $REPORT_RC -eq 2 ]]; then exit 2 elif [[ $REPORT_RC -eq 3 ]]; then diff --git a/tests/eval_corpus/run_full.sh b/tests/eval_corpus/run_full.sh index 3e15e2ab..381ddcc9 100755 --- a/tests/eval_corpus/run_full.sh +++ b/tests/eval_corpus/run_full.sh @@ -1,12 +1,10 @@ #!/usr/bin/env bash -# Phase 31: full eval-corpus orchestrator. +# Full eval-corpus orchestrator. # # Drives a complete pass against every corpus set the project knows about -# (OWASP Benchmark v1.2, the NIST SARD subset, and the in-house bughunt -# fixtures), then emits a stable `tests/eval_corpus/results.json` so -# downstream consumers (M7 ship gate, monotonic-improvement diff, the -# headline metrics table in `docs/dynamic.md`) can read a single -# well-known path. 
+# (OWASP Benchmark v1.2, the NIST SARD subset, and the Nyx benchmark +# fixtures), then emits `tests/eval_corpus/results.json` for reports, +# diffs, and docs. # # Usage: # tests/eval_corpus/run_full.sh [--nyx BIN] [--budget FILE] [--diff FILE] @@ -15,11 +13,9 @@ # Differences vs `run.sh`: # * Always runs every set (no `--sets` selector). # * Always passes `--budget tests/eval_corpus/budget.toml` so the -# headline targets (Unsupported < 20%, FalseConfirmed < 2%, Repro -# stability >= 95%) gate every pass. +# configured per-cell limits are checked on every pass. # * Copies the timestamped results file to -# `tests/eval_corpus/results.json` (canonical path consumed by -# `scripts/m7_ship_gate.sh` and the published metrics doc). +# `tests/eval_corpus/results.json`. # # Exit codes: # 0 every set ran and the merged result met the per-cell budget. diff --git a/tests/eval_corpus/tabulate.py b/tests/eval_corpus/tabulate.py index d022337b..36c3702d 100644 --- a/tests/eval_corpus/tabulate.py +++ b/tests/eval_corpus/tabulate.py @@ -415,8 +415,8 @@ def main() -> int: elif status == "Confirmed": cells[key]["confirmed"] += 1 # Repro-stability and false-Confirmed counts are optional - # fields tabulate.py reads off the verdict when callers - # (m7_ship_gate.sh / corpus_promote.yml) have stamped them. + # fields tabulate.py reads off the verdict when callers have + # stamped them. if dv.get("wrong") is True: cells[key]["wrong_confirmed"] += 1 if dv.get("replay_stable") is True: diff --git a/tests/telemetry_schema.rs b/tests/telemetry_schema.rs index c1c0a04f..7f290e65 100644 --- a/tests/telemetry_schema.rs +++ b/tests/telemetry_schema.rs @@ -1,14 +1,13 @@ -//! Phase 27 — Track H.1 integration test. +//! Dynamic telemetry schema tests. //! -//! Locks in the on-disk telemetry schema contract that `scripts/m7_ship_gate.sh` -//! Gate 2 relies on: +//! Locks in the on-disk telemetry schema contract: //! //! - Records produced today carry the `schema_version`, `nyx_version`, and //! 
`corpus_version` envelope fields, plus a `kind` discriminator. //! - `read_events(path)` accepts the current schema. //! - A hand-crafted record with `schema_version: 0` is rejected by //! `read_events` with a typed [`TelemetryReadError::SchemaMismatch`] (this -//! is the explicit Phase 27 acceptance bullet). +//! is the required failure mode for mixed-schema logs). //! - The sampling policy retains Confirmed and Inconclusive verdicts even at //! `sample_rate_other = 0.0`.