mirror of
https://github.com/elicpeter/nyx.git
synced 2026-06-09 19:45:13 +02:00
[pitboss/grind] deferred session-0002 (20260521T143544Z-f898)
parent be4021d8c0
commit b3766311fb
20 changed files with 388 additions and 664 deletions
@@ -46,7 +46,7 @@ A focused release on three fronts: an attack-surface map and chain composer that
|
|||
- **New `Cap` corpora.** Vulnerable + patched fixtures landed for the seven new cap classes (LDAP injection, XPath injection, header injection, open redirect, SSTI, XXE, prototype pollution) plus deserialization, crypto, JSON parsing, unauthorized-id, and data exfiltration. Every cap now carries at least one positive / negative / adversarial / unsupported fixture quad per supported language.
|
||||
- **OWASP Benchmark v1.2 importer.** `tests/eval_corpus/owasp_gt_convert.py` converts the OWASP Java Benchmark expected-results manifest into Nyx ground truth and lands a 16k-line `owasp_benchmark_v1.2.json` for evaluation.
|
||||
- **NIST SARD importer.** `tests/eval_corpus/sard_gt_convert.py` converts SARD test cases into the same format so cross-dataset recall numbers stay comparable.
|
||||
- **`scripts/m7_ship_gate.sh`** runs five gates against `tests/eval_corpus/budget.toml`: Unsupported under 20% per `(cap, lang)` cell, False-Confirmed under 2% per cap, repro stability at or above 95%, wall-clock no more than 2× static-only, sandbox-escape suite green. `tests/eval_corpus/run_full.sh` is the canonical orchestrator and writes `tests/eval_corpus/results.json` for the gate plus the published metrics table in `docs/dynamic.md`.
|
||||
- **Evaluation corpus tooling.** `tests/eval_corpus/run_full.sh` runs the Nyx benchmark, OWASP Benchmark, and NIST SARD evaluation sets and writes `tests/eval_corpus/results.json`. `tests/eval_corpus/report.py` and `tabulate.py` produce the per-cap and per-language summary used to track coverage and accuracy.
|
||||
|
||||
### Engine
249
docs/dynamic.md
|
|
@@ -1,125 +1,116 @@
|
|||
# Dynamic verification
|
||||
|
||||
Nyx verifies every `Confidence >= Medium` finding by default: it builds
|
||||
a minimal harness, runs your code's entry point against a curated payload corpus
|
||||
inside a sandbox, and records the verdict in each finding's evidence block.
|
||||
Nyx re-runs findings in generated harnesses when verification is enabled. By
|
||||
default, `nyx scan` verifies each `Confidence >= Medium` finding, tries
|
||||
payloads in a sandbox, and writes the result to `evidence.dynamic_verdict`.
|
||||
|
||||
## Headline metrics
|
||||
Dynamic verification is a second signal, not a replacement for review. A
|
||||
confirmed verdict means Nyx triggered the sink in its harness. `NotConfirmed`
|
||||
means the harness ran but no payload fired.
|
||||
|
||||
The dynamic-verification overhaul ships with four published acceptance targets,
|
||||
gated end-to-end by `scripts/m7_ship_gate.sh` (Phase 31) against the eval
|
||||
corpus (OWASP Benchmark v1.2 + NIST SARD subset + the in-house curated set
|
||||
from `tests/benchmark/corpus`):
|
||||
## Running it
|
||||
|
||||
| Metric | Target | Gate | Source |
| --- | --- | --- | --- |
| Unsupported% per `(cap, lang)` cell | < 20% | M7 Gate 1 | `tests/eval_corpus/budget.toml` → `[default].unsupported_rate` |
| False-Confirmed% per cap | < 2% | M7 Gate 2 | `~/.cache/nyx/dynamic/events.jsonl` (`kind: feedback`, `wrong: true`) |
| Repro stability | ≥ 95% | M7 Gate 5 | `~/.cache/nyx/dynamic/repro/*/reproduce.sh` exit 0 |
| Wall-clock cost | ≤ 2× static-only | M7 Gate 3 | `benches/fixtures/` (default vs `--no-verify`) |
|
||||
|
||||
The corresponding orchestrator is `tests/eval_corpus/run_full.sh`; it bundles
|
||||
the three corpus sets, writes a canonical `tests/eval_corpus/results.json`,
|
||||
and propagates the per-cell budget through `tabulate.py` and `report.py`.
|
||||
|
||||
A non-zero exit from `m7_ship_gate.sh` is a hard merge blocker for the
|
||||
default-on flip. Failures map back to the engine follow-ups recorded in
|
||||
`.pitboss/play/deferred.md` (per-language probe-shim splicing, composite
|
||||
chain reverifier wiring, telemetry-stability stamping, et al.).
|
||||
|
||||
|
||||
## Default-on semantics
|
||||
|
||||
```
|
||||
nyx scan # verifies Medium+ findings (default)
|
||||
nyx scan --no-verify # static analysis only, no harness execution
|
||||
nyx scan --verify # same as default; explicit for clarity in scripts
|
||||
```bash
|
||||
nyx scan # verifies Medium and High confidence findings
|
||||
nyx scan --no-verify # static analysis only
|
||||
nyx scan --verify # explicit form of the default behavior
|
||||
```
|
||||
|
||||
`--no-verify` is the escape hatch. It overrides the config default for a single
|
||||
run without changing `nyx.toml`.
|
||||
Use `--no-verify` for fast local checks or editor workflows. Keep verification
|
||||
on for CI when scan time allows it.
|
||||
|
||||
### What "verified" means
|
||||
To verify low-confidence findings too:
|
||||
|
||||
A finding with `dynamic_verdict.status: Confirmed` was successfully triggered
|
||||
by at least one payload in nyx's corpus. The corpus covers common patterns for
|
||||
each vulnerability class (SQL injection, XSS, command injection, SSRF, etc.) per
|
||||
language.
|
||||
|
||||
A finding with `dynamic_verdict.status: NotConfirmed` was attempted but no
|
||||
payload fired. This is not a false-positive signal. It means the corpus did not
|
||||
have a payload that matched the specific sink variant, or the execution path was
|
||||
not reachable in the test harness.
|
||||
|
||||
A finding with `dynamic_verdict.status: Unsupported` could not be attempted.
|
||||
Common reasons: confidence below threshold, no flow steps, language or sink type
|
||||
not yet supported by the harness layer.
|
||||
|
||||
### Confidence gate
|
||||
|
||||
Only `Confidence >= Medium` findings are verified by default (§5.1). To also
|
||||
verify low-confidence findings (for corpus building or backfill), pass
|
||||
`--verify-all-confidence`:
|
||||
|
||||
```
|
||||
```bash
|
||||
nyx scan --verify-all-confidence
|
||||
```
|
||||
|
||||
This is not recommended for production scans because low-confidence findings have
|
||||
a higher false-positive rate and the harness may produce unreliable verdicts.
|
||||
Use it when tuning payloads or investigating coverage. It is slower and noisier
|
||||
than the default.
|
||||
|
||||
## nyx.toml opt-out
|
||||
## Verdicts
|
||||
|
||||
If you want static-only scans permanently, set `verify = false` in `nyx.toml`:
|
||||
| Status | Meaning |
| --- | --- |
| `Confirmed` | At least one payload reached the expected sink in the harness. |
| `NotConfirmed` | The harness ran, but no payload reached the sink. Treat the original finding as still open until reviewed. |
| `Inconclusive` | Nyx could not finish the check with enough isolation or runtime support. |
| `Unsupported` | Nyx did not try the finding. Common causes are unsupported language, unsupported sink shape, missing flow steps, or confidence below the verification threshold. |
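
As a rough illustration of how these statuses might be consumed downstream, the sketch below tallies `dynamic_verdict.status` across a JSON report. It assumes the report is either a top-level list of findings or an object with a `findings` key, and that each verdict sits at `evidence.dynamic_verdict.status`. The helper name `tally_verdicts` and the `NoVerdict` bucket are hypothetical and not part of Nyx.

```python
import json
from collections import Counter


def tally_verdicts(report):
    """Count findings by dynamic verdict status.

    Assumed shape: a list of findings, or a dict with a "findings" list.
    Findings without a verdict land in the hypothetical "NoVerdict" bucket.
    """
    findings = report if isinstance(report, list) else report.get("findings", [])
    counts = Counter()
    for finding in findings:
        # Either level may be missing or null, so fall back to empty dicts.
        verdict = (finding.get("evidence") or {}).get("dynamic_verdict") or {}
        counts[verdict.get("status", "NoVerdict")] += 1
    return counts


if __name__ == "__main__":
    sample = json.loads(
        '{"findings": ['
        '{"evidence": {"dynamic_verdict": {"status": "Confirmed"}}},'
        '{"evidence": {"dynamic_verdict": {"status": "NotConfirmed"}}},'
        '{"evidence": null}'
        "]}"
    )
    for status, count in sorted(tally_verdicts(sample).items()):
        print(status, count)
```

A tally like this can feed a CI summary; a `NotConfirmed` count is not a count of false positives, since those findings stay open until reviewed.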
|
||||
|
||||
## Configuration
|
||||
|
||||
To disable verification for a project, set:
|
||||
|
||||
```toml
[scanner]
verify = false
```
|
||||
|
||||
This survives upgrades. The M7 default flip only changes the inherited default
|
||||
for projects that have not explicitly set the field.
|
||||
This makes scans static-only unless the command line overrides it.
|
||||
|
||||
The related scanner settings are:
|
||||
|
||||
| Setting | Default | Meaning |
| --- | --- | --- |
| `verify` | `true` | Run dynamic verification after static analysis. |
| `verify_all_confidence` | `false` | Include findings below `Confidence::Medium`. |
| `verify_backend` | `"auto"` | Use Docker when available, otherwise use the process backend. |
| `harden_profile` | `"standard"` | Hardening profile for the process backend. |
|
||||
|
||||
See [Configuration](configuration.md) for the full config table.
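
The precedence described here (a command-line flag wins over `nyx.toml`, which wins over the built-in default of `true`) can be sketched as a small resolver. This is only an illustration under that reading of the text: `resolve_verify` is a hypothetical name, and the `[scanner]` table is passed in as an already-parsed dict rather than read from disk.

```python
from typing import Optional


def resolve_verify(cli_override: Optional[bool], scanner_config: dict) -> bool:
    """Decide whether dynamic verification runs for one scan.

    cli_override: True for --verify, False for --no-verify, None if neither
    flag was passed (assumed mapping).
    scanner_config: parsed [scanner] table from nyx.toml, possibly empty.
    """
    if cli_override is not None:
        # An explicit flag applies to this run only and leaves nyx.toml alone.
        return cli_override
    # Fall back to the config value, then to the documented default (true).
    return bool(scanner_config.get("verify", True))


if __name__ == "__main__":
    print(resolve_verify(None, {}))                  # no flag, no config entry
    print(resolve_verify(None, {"verify": False}))   # project opted out
    print(resolve_verify(True, {"verify": False}))   # --verify on an opted-out project
```

Keeping the three states distinct (on, off, absent) matches the `verify?: boolean` field in the scan request body, where an absent value defers to the server config.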
|
||||
|
||||
## Sandbox backends
|
||||
|
||||
Nyx uses Docker when available and otherwise falls back to an in-process runner:
|
||||
|
||||
```
|
||||
nyx scan --backend docker # require docker; fail if unavailable
|
||||
nyx scan --backend process # in-process runner (no container; less isolation)
|
||||
```bash
|
||||
nyx scan --backend docker # require Docker
|
||||
nyx scan --backend process # run directly on the host with weaker isolation
|
||||
nyx scan --unsafe-sandbox # alias for --backend process
|
||||
```
|
||||
|
||||
The docker backend mounts only the entry file's directory and blocks all
|
||||
outbound network by default. When out-of-band detection is enabled (`oob_listener`
|
||||
in config), the container gets `--network bridge` with a host-gateway route.
|
||||
Docker is the preferred backend. It mounts only the entry file's directory and
|
||||
blocks outbound network by default. If out-of-band detection is enabled with
|
||||
`oob_listener`, Docker uses bridge networking with a host-gateway route so the
|
||||
harness can reach the listener.
|
||||
|
||||
The process backend is useful for development and machines without Docker. It
|
||||
does not provide the same isolation.
|
||||
|
||||
## Repro artifacts
|
||||
|
||||
When a finding is `Confirmed`, nyx writes a repro artifact to
|
||||
`~/.cache/nyx/repro/<stable_hash>/`. The artifact contains the harness spec and
|
||||
the triggering payload. You can regenerate the verdict with:
|
||||
Confirmed findings write a repro bundle under:
|
||||
|
||||
```
|
||||
nyx scan --verify <path> # re-scans and re-verifies
|
||||
```text
|
||||
~/.cache/nyx/dynamic/repro/<spec_hash>/
|
||||
```
|
||||
|
||||
See `docs/output.md` for the `dynamic_verdict` field schema.
|
||||
The bundle contains the harness spec, payload, expected output, trace, and
|
||||
`reproduce.sh`.
|
||||
|
||||
## Wall-clock cost
|
||||
```bash
cd ~/.cache/nyx/dynamic/repro/<spec_hash>
./reproduce.sh
./reproduce.sh --docker
```
|
||||
|
||||
Verification adds harness build + sandbox startup time per finding. On typical
|
||||
codebases with 10–50 Medium+ findings, end-to-end overhead is 2–5× static-only.
|
||||
Use the Docker form when the bundle records a pinned container image or when
|
||||
host toolchains differ from the original run.
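
When replaying many bundles, the exit code of `reproduce.sh` has to be interpreted. The sketch below assumes the exit codes documented in the comments of `scripts/m7_ship_gate.sh` (0 pass, 1 sink mismatch, 2 Docker unavailable, 3 toolchain mismatch) and computes a stability rate that leaves environmental skips out of the ratio. The function name and the choice to return `None` when nothing was counted are illustrative assumptions.

```python
# Exit codes as documented in the gate script comments (assumed stable).
EXIT_PASS = 0
EXIT_MISMATCH = 1
EXIT_DOCKER_UNAVAILABLE = 2
EXIT_TOOLCHAIN_MISMATCH = 3

ENVIRONMENT_SKIPS = (EXIT_DOCKER_UNAVAILABLE, EXIT_TOOLCHAIN_MISMATCH)


def stability_rate(exit_codes):
    """Return stable / (stable + unstable), or None if nothing was counted.

    Environmental skips are excluded from both numerator and denominator.
    Any other non-zero code, documented or not, counts as unstable.
    """
    stable = 0
    unstable = 0
    for rc in exit_codes:
        if rc == EXIT_PASS:
            stable += 1
        elif rc in ENVIRONMENT_SKIPS:
            continue
        else:
            unstable += 1
    counted = stable + unstable
    return None if counted == 0 else stable / counted


if __name__ == "__main__":
    rate = stability_rate([0, 0, 0, 1, 2])
    print(f"stability: {rate:.1%}")
```

Excluding the skip codes keeps a machine without Docker from dragging the rate down, while still counting unexpected exit codes against it.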
|
||||
|
||||
If scan time is unacceptable for a given workflow (e.g. IDE integration, quick
|
||||
pre-commit check), use `--no-verify` for that workflow and rely on the full scan
|
||||
in CI.
|
||||
## Runtime cost
|
||||
|
||||
## Event schema
|
||||
Verification adds harness build time and sandbox startup time for each verified
|
||||
finding. For quick local checks, `--no-verify` is usually the right choice. For
|
||||
CI or scheduled scans, keep verification enabled so confirmed findings rank
|
||||
higher and not-confirmed findings carry the extra context.
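
To judge the overhead on a given project, one option is to time a static-only run against a default run and compare the ratio. The sketch below is a minimal, hypothetical helper: `wall_clock_ms` and `within_budget` are not part of Nyx, and the factor of 2 is taken from the wall-clock target quoted for the ship gate.

```python
import subprocess
import sys
import time


def wall_clock_ms(argv):
    """Run a command and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    # Output is discarded and the exit status is ignored; only time matters.
    subprocess.run(argv, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return (time.perf_counter() - start) * 1000.0


def within_budget(static_ms, verify_ms, factor=2.0):
    """True when the verified run costs at most `factor` times the baseline."""
    return verify_ms <= factor * static_ms


if __name__ == "__main__":
    # Stand-in commands; in practice these would be
    # ["nyx", "scan", "--no-verify", path] and ["nyx", "scan", path].
    baseline = wall_clock_ms([sys.executable, "-c", "pass"])
    verified = wall_clock_ms([sys.executable, "-c", "pass"])
    print(f"baseline {baseline:.0f}ms, verified {verified:.0f}ms")
    print("within budget:", within_budget(baseline, verified))
```

A single pair of runs is noisy, so repeating the measurement and comparing medians gives a steadier picture.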
|
||||
|
||||
The dynamic layer writes one JSON record per verdict to
|
||||
`~/.cache/nyx/dynamic/events.jsonl`. Every record begins with a fixed envelope
|
||||
so older readers fail loudly instead of silently mixing incompatible shapes:
|
||||
## Event log
|
||||
|
||||
Nyx writes verdict events to:
|
||||
|
||||
```text
~/.cache/nyx/dynamic/events.jsonl
```
|
||||
|
||||
Each line is a JSON object with a versioned envelope:
|
||||
|
||||
```json
|
||||
{
|
||||
|
|
@@ -140,74 +131,54 @@ so older readers fail loudly instead of silently mixing incompatible shapes:
|
|||
}
|
||||
```
|
||||
|
||||
| Field | Type | Meaning |
| --- | --- | --- |
| `schema_version` | integer | Bumped on any breaking change. Readers reject mismatches. |
| `nyx_version` | string | `CARGO_PKG_VERSION` of the writing binary. |
| `corpus_version` | string | Payload-corpus version the verdict was scored against. |
| `kind` | string | `"verdict"` (per-finding) or `"rank_delta"` (rank-score shift). |
| `ts` | RFC-3339 string | Wall-clock at write time. |
| `finding_id` | string | Stable finding identifier. |
| `spec_hash` | string | Hash of the `HarnessSpec` that drove the run. |
| `lang` | string | Language slug; `"unknown"` when spec derivation failed. |
| `cap` | string | Sink capability (e.g. `SQL_QUERY`, `CODE_EXEC`). |
| `status` | string | `Confirmed`, `NotConfirmed`, `Inconclusive`, or `Unsupported`. |
| `inconclusive_reason` | string | Present iff `status == Inconclusive`. |
|
||||
| Field | Meaning |
| --- | --- |
| `schema_version` | Event schema version. Readers reject mismatches. |
| `nyx_version` | Version of the Nyx binary that wrote the event. |
| `corpus_version` | Payload corpus version used for the verdict. |
| `kind` | `verdict`, `rank_delta`, or `feedback`. |
| `ts` | Write time in RFC 3339 format. |
| `finding_id` | Stable finding identifier. |
| `spec_hash` | Hash of the harness spec. |
| `lang` | Language slug, or `unknown` when spec derivation failed. |
| `cap` | Sink capability, such as `SQL_QUERY` or `CODE_EXEC`. |
| `status` | `Confirmed`, `NotConfirmed`, `Inconclusive`, or `Unsupported`. |
| `inconclusive_reason` | Present when `status` is `Inconclusive`. |
|
||||
|
||||
A `rank_delta` record carries the envelope plus `finding_id`, `status`, and a
|
||||
signed `delta` applied to the rank score.
|
||||
If the schema changes, move or delete the old `events.jsonl` before reading it
|
||||
with the new binary. Programmatic readers should use
|
||||
`crate::dynamic::telemetry::read_events(path)`.
|
||||
|
||||
### Schema-version mismatch
|
||||
## Sampling
|
||||
|
||||
`scripts/m7_ship_gate.sh` Gate 2 walks every line of the log, requires
|
||||
`schema_version == EXPECTED_SCHEMA_VERSION`, and exits 3 if any record fails
|
||||
the check. Programmatic readers use
|
||||
`crate::dynamic::telemetry::read_events(path)`, which surfaces the same
|
||||
condition as `TelemetryReadError::SchemaMismatch { expected, found, .. }`.
|
||||
|
||||
When schema bumps land, the canonical migration is to roll the log over (move
|
||||
or delete `events.jsonl`) so new and old records never coexist in a file. The
|
||||
gate refuses to skip silently on mismatch.
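
For readers outside the Rust crate, the same reject-on-mismatch behavior can be approximated in a few lines. The sketch below is not the `read_events` implementation; `read_event_lines` and `SchemaMismatch` are hypothetical names, and the expected version of `1` is the value the gate script uses, which may differ in other releases.

```python
import json

# Value used by the gate script; treat it as an assumption for other releases.
EXPECTED_SCHEMA_VERSION = 1


class SchemaMismatch(Exception):
    """Raised when a record carries a missing or unexpected schema_version."""


def read_event_lines(lines, expected=EXPECTED_SCHEMA_VERSION):
    """Parse JSONL event lines, failing on the first schema mismatch."""
    events = []
    for line_no, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # tolerate blank lines
        event = json.loads(raw)
        found = event.get("schema_version")
        if found != expected:
            # Fail loudly rather than mixing incompatible record shapes.
            raise SchemaMismatch(
                f"line {line_no}: expected {expected}, found {found}"
            )
        events.append(event)
    return events


if __name__ == "__main__":
    log = [
        '{"schema_version": 1, "kind": "verdict", "status": "Confirmed"}',
        '{"schema_version": 1, "kind": "feedback", "wrong": true}',
    ]
    print(len(read_event_lines(log)), "events read")
```

Passing an open file handle works as well, since the function only iterates over lines.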
|
||||
|
||||
### Sampling
|
||||
|
||||
`[telemetry]` in `nyx.toml` controls the on-disk sampling policy:
|
||||
`[telemetry]` in `nyx.toml` controls event retention:
|
||||
|
||||
```toml
|
||||
[telemetry]
|
||||
keep_all_confirmed = true # default: retain every Confirmed verdict
|
||||
keep_all_inconclusive = true # default: retain every Inconclusive verdict
|
||||
sample_rate_other = 1.0 # 0.0–1.0 for NotConfirmed / Unsupported
|
||||
keep_all_confirmed = true
|
||||
keep_all_inconclusive = true
|
||||
sample_rate_other = 1.0
|
||||
```
|
||||
|
||||
`sample_rate_other < 1.0` downsamples NotConfirmed and Unsupported verdicts
|
||||
deterministically. The decision is seeded by the finding's `spec_hash`, so a
|
||||
given finding makes the same keep-or-drop call across reruns. Confirmed and
|
||||
Inconclusive verdicts ignore the rate and are always retained (they gate the
|
||||
false-Confirmed budget and drive the spec-derivation roadmap).
|
||||
`sample_rate_other` accepts `0.0` to `1.0` and applies to `NotConfirmed` and
|
||||
`Unsupported` verdicts. The decision is deterministic for a given `spec_hash`.
|
||||
`Confirmed` and `Inconclusive` events are kept by default, and rank-delta events are always kept.
|
||||
|
||||
Rank-delta records (emitted by `emit_rank_delta` when a verdict shifts a
|
||||
finding's position in the ranked output) are also retained unconditionally and
|
||||
do **not** consult `sample_rate_other`. They are calibration-critical and small
|
||||
in volume, so the carve-out is intentional; setting `sample_rate_other = 0.0`
|
||||
to throttle log growth will still produce rank-delta lines.
|
||||
Set `NYX_NO_TELEMETRY=1` to disable event writes.
|
||||
|
||||
`NYX_NO_TELEMETRY=1` disables every write regardless of the policy.
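
One way to get a keep-or-drop decision that is stable across reruns is to derive it from a hash of `spec_hash`. The sketch below illustrates that idea only. The actual hashing scheme Nyx uses is not documented here, so SHA-256 and the 64-bit bucket are assumptions, as is the function name `keep_event`. Rank-delta records are not modelled because they bypass sampling.

```python
import hashlib


def keep_event(status, spec_hash, sample_rate_other=1.0,
               keep_all_confirmed=True, keep_all_inconclusive=True):
    """Decide whether a verdict event is written to the log.

    Confirmed and Inconclusive verdicts bypass sampling when their keep_all
    flag is set. What happens when a flag is false is a guess: here such
    events fall through to the same sampling rule as everything else.
    """
    if status == "Confirmed" and keep_all_confirmed:
        return True
    if status == "Inconclusive" and keep_all_inconclusive:
        return True
    # Map spec_hash to a fixed 64-bit bucket so the decision never changes
    # between runs for the same finding.
    digest = hashlib.sha256(spec_hash.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    threshold = int(sample_rate_other * 2**64)
    return bucket < threshold


if __name__ == "__main__":
    for spec in ("a1b2c3", "d4e5f6", "0718293a"):
        print(spec, keep_event("NotConfirmed", spec, sample_rate_other=0.5))
```

Comparing integers instead of floats keeps the boundary cases exact: a rate of `1.0` keeps every event and a rate of `0.0` drops every sampled event.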
|
||||
## Feedback
|
||||
|
||||
## Opting in to feedback
|
||||
To record a bad verdict:
|
||||
|
||||
False positives (nyx says `Confirmed` but you disagree) can be recorded:
|
||||
|
||||
```
|
||||
```bash
|
||||
nyx verify-feedback <finding_id> --wrong "reason"
|
||||
```
|
||||
|
||||
This writes to the local telemetry log (`~/.cache/nyx/dynamic/events.jsonl`)
|
||||
and contributes to precision monitoring. Feedback is never uploaded automatically.
|
||||
Feedback is written to the local event log. Nyx does not upload it.
|
||||
|
||||
## nyx serve integration
|
||||
## Browser UI
|
||||
|
||||
The browser UI shows `dynamic_verdict` in each finding's detail panel and
|
||||
uses the verdict in ranking (Confirmed findings surface first). The scan compare
|
||||
page has a **Verdict Diff** tab that shows which findings changed verification
|
||||
status between two scans.
|
||||
`nyx serve` shows dynamic verdicts on finding detail pages, uses them in
|
||||
ranking, and can compare verdict changes between saved scans.
|
||||
|
||||
See [Output formats](output.md) for the `dynamic_verdict` schema.
|
||||
@@ -12,9 +12,8 @@ nyx serve --no-browser # don't auto-open
|
|||
Persistent settings live under `[server]` in `nyx.conf` / `nyx.local`.
|
||||
|
||||
Starting a scan from the UI runs dynamic verification on `Confidence >= Medium`
|
||||
findings by default (M7). Check "Skip dynamic verification" in the scan modal
|
||||
to get a fast static-only result. See [Dynamic verification](dynamic.md) for
|
||||
details.
|
||||
findings by default. Check "Skip dynamic verification" in the scan modal to get
|
||||
a fast static-only result. See [Dynamic verification](dynamic.md) for details.
|
||||
|
||||
<p align="center"><img src="assets/screenshots/docs/serve-overview.png" alt="Nyx UI overview: total findings, severity breakdown, language and category distribution, top affected files" width="900"/></p>
|
||||
|
||||
@@ -11,9 +11,9 @@ export interface StartScanBody {
|
|||
engine_profile?: EngineProfile;
|
||||
/**
|
||||
* Override dynamic verification for this scan.
|
||||
* true — force on.
|
||||
* false — force off (skip verification; M7 default is on).
|
||||
* absent — use server config default (true since M7).
|
||||
* true - force on.
|
||||
* false - force off.
|
||||
* absent - use server config default.
|
||||
*/
|
||||
verify?: boolean;
|
||||
/** Also verify Confidence < Medium findings. Default false. */
|
||||
@@ -1,401 +0,0 @@
|
|||
#!/usr/bin/env bash
|
||||
# M7 pre-flip ship gate.
|
||||
#
|
||||
# Runs all five gates required before the default-on merge can land.
|
||||
# Must pass with exit 0 on the branch being merged.
|
||||
#
|
||||
# Usage:
|
||||
# scripts/m7_ship_gate.sh [--nyx BIN] [--corpus-dir DIR] [--skip GATE,...]
|
||||
# [--budget FILE] [--diff FILE]
|
||||
#
|
||||
# Gates:
|
||||
# 1. unsupported-rate — per-cell (cap × lang) Unsupported% within budget
|
||||
# 2. false-confirmed — false-Confirmed rate from telemetry ≤ 2% per cap
|
||||
# 3. wall-clock — default scan ≤ 2× static-only on bench suite
|
||||
# 4. sandbox-escape — sandbox escape suite green for all langs
|
||||
# 5. repro-stability — repro artifact regenerates identical verdict ≥ 95%
|
||||
#
|
||||
# Phase 29 (Track I): Gate 1 consumes per-cell budgets from
|
||||
# `tests/eval_corpus/budget.toml` and, when `--diff PREV.json` is
|
||||
# supplied, fails on any monotonic-improvement regression vs the
|
||||
# previous run.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
|
||||
NYX_BIN="${NYX_BIN:-${REPO_ROOT}/target/release/nyx}"
|
||||
CORPUS_DIR="${CORPUS_DIR:-${HOME}/.cache/nyx/eval_corpus}"
|
||||
SKIP_GATES=""
|
||||
GATE_ERRORS=0
|
||||
GATE_LOG="${REPO_ROOT}/target/m7_gate.log"
|
||||
# Phase 29 (Track I): per-cell budgets + monotonic diff.
|
||||
BUDGET_FILE="${BUDGET_FILE:-${REPO_ROOT}/tests/eval_corpus/budget.toml}"
|
||||
DIFF_FILE="${DIFF_FILE:-}"
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--nyx) NYX_BIN="$2"; shift 2 ;;
|
||||
--corpus-dir) CORPUS_DIR="$2"; shift 2 ;;
|
||||
--skip) SKIP_GATES="$2"; shift 2 ;;
|
||||
--budget) BUDGET_FILE="$2"; shift 2 ;;
|
||||
--diff) DIFF_FILE="$2"; shift 2 ;;
|
||||
*) shift ;;
|
||||
esac
|
||||
done
|
||||
|
||||
skip() { [[ ",$SKIP_GATES," == *",$1,"* ]]; }
|
||||
|
||||
die() { echo "GATE FAIL: $*" | tee -a "$GATE_LOG" >&2; GATE_ERRORS=$((GATE_ERRORS + 1)); }
|
||||
pass() { echo "GATE PASS: $*" | tee -a "$GATE_LOG"; }
|
||||
info() { echo "[gate] $*" | tee -a "$GATE_LOG"; }
|
||||
|
||||
[[ -x "$NYX_BIN" ]] || { echo "nyx binary not found: $NYX_BIN" >&2; exit 1; }
|
||||
|
||||
mkdir -p "$(dirname "$GATE_LOG")"
|
||||
echo "# M7 ship gate — $(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$GATE_LOG"
|
||||
info "nyx: $NYX_BIN"
|
||||
info "corpus: $CORPUS_DIR"
|
||||
info "budget: $BUDGET_FILE"
|
||||
info "diff: ${DIFF_FILE:-<none>}"
|
||||
info ""
|
||||
|
||||
# ── Gate 1: Per-cell budget + monotonic-improvement diff ───────────────────
|
||||
#
|
||||
# Phase 29 (Track I): the single global Unsupported threshold is replaced
|
||||
# by per-cell (cap × lang) budgets in tests/eval_corpus/budget.toml.
|
||||
# `tests/eval_corpus/run.sh` invokes `tabulate.py` per set and `report.py`
|
||||
# at the end with `--budget` (and `--diff` when DIFF_FILE is set), so
|
||||
# any per-cell failure (or any regression vs the prior run) propagates
|
||||
# back as exit 2.
|
||||
if skip unsupported-rate; then
|
||||
info "Gate 1 (unsupported-rate): SKIPPED"
|
||||
else
|
||||
info "Gate 1: per-cell budget within tolerance + no monotonic regressions..."
|
||||
EVAL_RESULTS="${REPO_ROOT}/target/eval_results.json"
|
||||
echo "[]" > "$EVAL_RESULTS"
|
||||
|
||||
if [[ ! -f "$BUDGET_FILE" ]]; then
|
||||
die "Gate 1: budget file not found at $BUDGET_FILE"
|
||||
else
|
||||
# Run eval corpus runner (in-house set always present).
|
||||
set +e
|
||||
bash "${REPO_ROOT}/tests/eval_corpus/run.sh" \
|
||||
--nyx "$NYX_BIN" \
|
||||
--sets inhouse \
|
||||
--output "$(dirname "$EVAL_RESULTS")" \
|
||||
--budget "$BUDGET_FILE" \
|
||||
${DIFF_FILE:+--diff "$DIFF_FILE"} \
|
||||
>>"$GATE_LOG" 2>>"$GATE_LOG"
|
||||
RC=$?
|
||||
set -e
|
||||
cp "$(dirname "$EVAL_RESULTS")/eval_results.json" "$EVAL_RESULTS" 2>/dev/null || true
|
||||
if [[ $RC -eq 0 ]]; then
|
||||
pass "Gate 1: per-cell budget + diff check passed"
|
||||
elif [[ $RC -eq 2 ]]; then
|
||||
die "Gate 1: per-cell budget exceeded OR monotonic-improvement regression (see $GATE_LOG)"
|
||||
elif [[ $RC -eq 3 ]]; then
|
||||
die "Gate 1: budget/diff configuration is malformed (see $GATE_LOG)"
|
||||
else
|
||||
info "Gate 1: eval runner returned $RC (corpus may not be downloaded; treating as SKIP)"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# ── Gate 2: False-Confirmed rate ─────────────────────────────────────────────
|
||||
#
|
||||
# Phase 27 (Track H.1): the telemetry log is schema-versioned. Gate 2 checks
# every record's `schema_version` field against `EXPECTED_SCHEMA_VERSION` and
# fails loudly with exit 3 on any mismatch. Silently treating a v0
# (pre-Phase-27) log as "no data" would hide records from incompatible
# releases mixed into one file.
|
||||
EXPECTED_SCHEMA_VERSION=1
|
||||
|
||||
if skip false-confirmed; then
|
||||
info "Gate 2 (false-confirmed): SKIPPED"
|
||||
else
|
||||
info "Gate 2: false-Confirmed rate from telemetry ≤ 2% per cap..."
|
||||
EVENTS="${HOME}/.cache/nyx/dynamic/events.jsonl"
|
||||
if [[ ! -f "$EVENTS" ]]; then
|
||||
info "Gate 2: telemetry log not found at $EVENTS; skipping (no data)"
|
||||
else
|
||||
set +e
|
||||
python3 - "$EVENTS" "$EXPECTED_SCHEMA_VERSION" <<'PYEOF'
import json, sys, collections
path = sys.argv[1]
expected_schema = int(sys.argv[2])
cap_counts = collections.defaultdict(lambda: {"confirmed": 0, "wrong": 0})
with open(path) as f:
    for line_no, raw in enumerate(f, start=1):
        if not raw.strip():
            continue
        try:
            ev = json.loads(raw)
        except json.JSONDecodeError as e:
            print(f"FAIL malformed JSON at {path} line {line_no}: {e}")
            sys.exit(3)
        if "schema_version" not in ev:
            print(f"FAIL missing schema_version at {path} line {line_no}")
            sys.exit(3)
        if ev["schema_version"] != expected_schema:
            print(
                f"FAIL schema mismatch at {path} line {line_no}: "
                f"expected {expected_schema}, found {ev['schema_version']}"
            )
            sys.exit(3)
        kind = ev.get("kind", "")
        if kind == "feedback" and ev.get("wrong"):
            cap = ev.get("cap", "unknown")
            cap_counts[cap]["wrong"] += 1
        elif kind == "verdict" and ev.get("status") == "Confirmed":
            cap = ev.get("cap", "unknown")
            cap_counts[cap]["confirmed"] += 1

THRESHOLD = 0.02
failed = False
for cap, counts in sorted(cap_counts.items()):
    total = counts["confirmed"]
    wrong = counts["wrong"]
    if total == 0:
        continue
    rate = wrong / total
    if rate > THRESHOLD:
        print(f"FAIL cap={cap}: false-Confirmed rate {rate:.1%} > {THRESHOLD:.0%} (wrong={wrong}, confirmed={total})")
        failed = True
    else:
        print(f"OK cap={cap}: false-Confirmed rate {rate:.1%} (wrong={wrong}, confirmed={total})")
sys.exit(2 if failed else 0)
PYEOF
|
||||
RC=$?
|
||||
set -e
|
||||
if [[ $RC -eq 0 ]]; then
|
||||
pass "Gate 2: false-Confirmed rate within threshold"
|
||||
elif [[ $RC -eq 3 ]]; then
|
||||
die "Gate 2: telemetry schema mismatch (expected v$EXPECTED_SCHEMA_VERSION) — refusing to silently skip"
|
||||
else
|
||||
die "Gate 2: false-Confirmed rate exceeds 2% for one or more caps"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# ── Gate 3: Wall-clock cost ≤ 2× static-only ────────────────────────────────
|
||||
if skip wall-clock; then
|
||||
info "Gate 3 (wall-clock): SKIPPED"
|
||||
else
|
||||
info "Gate 3: wall-clock ≤ 2× static-only on bench suite..."
|
||||
BENCH_DIR="${REPO_ROOT}/benches/fixtures"
|
||||
if [[ ! -d "$BENCH_DIR" ]]; then
|
||||
info "Gate 3: benches/fixtures not found; skipping"
|
||||
else
|
||||
# Portable epoch-millis. BSD date (macOS) lacks %3N; GNU date has it.
|
||||
ms_now() { python3 -c 'import time; print(int(time.time()*1000))'; }
|
||||
|
||||
# Static-only baseline.
|
||||
T_STATIC_START=$(ms_now)
|
||||
"$NYX_BIN" scan --no-verify --format json --no-index "$BENCH_DIR" > /dev/null 2>&1 || true
|
||||
T_STATIC_END=$(ms_now)
|
||||
T_STATIC=$(( T_STATIC_END - T_STATIC_START ))
|
||||
|
||||
# Default (with verify).
|
||||
T_VERIFY_START=$(ms_now)
|
||||
"$NYX_BIN" scan --format json --no-index "$BENCH_DIR" > /dev/null 2>&1 || true
|
||||
T_VERIFY_END=$(ms_now)
|
||||
T_VERIFY=$(( T_VERIFY_END - T_VERIFY_START ))
|
||||
|
||||
info " static-only: ${T_STATIC}ms with-verify: ${T_VERIFY}ms"
|
||||
|
||||
# Allow 2× overhead.
|
||||
LIMIT=$(( T_STATIC * 2 ))
|
||||
if [[ $T_VERIFY -le $LIMIT ]]; then
|
||||
pass "Gate 3: wall-clock ${T_VERIFY}ms ≤ 2× ${T_STATIC}ms baseline (limit ${LIMIT}ms)"
|
||||
else
|
||||
die "Gate 3: wall-clock ${T_VERIFY}ms > 2× ${T_STATIC}ms baseline (limit ${LIMIT}ms)"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# ── Gate 4: Sandbox-escape suite ─────────────────────────────────────────────
|
||||
if skip sandbox-escape; then
|
||||
info "Gate 4 (sandbox-escape): SKIPPED"
|
||||
else
|
||||
info "Gate 4: sandbox escape suite green..."
|
||||
ESCAPE_DIR="${REPO_ROOT}/tests/dynamic_fixtures/escape"
|
||||
if [[ ! -d "$ESCAPE_DIR" ]]; then
|
||||
info "Gate 4: tests/dynamic_fixtures/escape not found; skipping"
|
||||
else
|
||||
# Run each escape fixture under both Process and Docker backends.
|
||||
# A Confirmed verdict on an escape fixture is a gate failure.
|
||||
# Additionally, check that no sentinel files were written to the host.
|
||||
ESCAPE_SENTINEL_PATHS=(
|
||||
"/tmp/nyx_escape_file_write_marker"
|
||||
)
|
||||
for _s in "${ESCAPE_SENTINEL_PATHS[@]}"; do
|
||||
rm -f "$_s"
|
||||
done
|
||||
ESCAPE_FAILED=0
|
||||
for fixture in "$ESCAPE_DIR"/*/; do
|
||||
[[ -d "$fixture" ]] || continue
|
||||
for backend in process docker; do
|
||||
set +e
|
||||
RESULT=$("$NYX_BIN" scan --format json --verify \
|
||||
--backend "$backend" --no-index "$fixture" 2>/dev/null)
|
||||
RC=$?
|
||||
set -e
|
||||
if [[ $RC -ne 0 && $RC -ne 1 ]]; then
|
||||
info " $backend/$fixture: nyx returned $RC (likely docker unavailable — skip)"
|
||||
continue
|
||||
fi
|
||||
CONFIRMED=$(echo "$RESULT" | python3 -c "
|
||||
import json,sys
|
||||
data = json.load(sys.stdin)
|
||||
findings = data if isinstance(data, list) else data.get('findings', [])
|
||||
confirmed = [
|
||||
f for f in findings
|
||||
if ((f.get('evidence') or {}).get('dynamic_verdict') or {}).get('status') == 'Confirmed'
|
||||
]
|
||||
print(len(confirmed))
|
||||
" 2>/dev/null || echo 0)
|
||||
if [[ "$CONFIRMED" -gt 0 ]]; then
|
||||
die "Gate 4: escape fixture confirmed in $backend backend: $fixture"
|
||||
ESCAPE_FAILED=1
|
||||
fi
|
||||
done
|
||||
done
|
||||
for _s in "${ESCAPE_SENTINEL_PATHS[@]}"; do
|
||||
if [[ -f "$_s" ]]; then
|
||||
die "Gate 4: escape sentinel written to host: $_s"
|
||||
ESCAPE_FAILED=1
|
||||
fi
|
||||
done
|
||||
[[ $ESCAPE_FAILED -eq 0 ]] && pass "Gate 4: sandbox escape suite green"
|
||||
for _s in "${ESCAPE_SENTINEL_PATHS[@]}"; do
|
||||
rm -f "$_s"
|
||||
done
|
||||
fi
|
||||
fi
|
||||
|
||||
# ── Gate 5: Repro stability ≥ 95% ────────────────────────────────────────────
|
||||
#
|
||||
# Phase 28 (Track H.4): inversion of the legacy "conservative — treat
|
||||
# unexpected errors as stable" rule. Old behaviour silently counted any
|
||||
# subprocess error (timeout, missing toolchain, broken pipe) as stable,
|
||||
# which let the gate pass while bundles were structurally unreplayable.
|
||||
# Phase 28 flips that: known exit codes (0 = pass, 1 = sink mismatch,
|
||||
# 2 = docker unavailable, 3 = toolchain mismatch) are classified
|
||||
# normally, but any other failure (timeout, ENOENT on `sh`, non-zero
|
||||
# code outside the documented set) is flagged as instability so the
|
||||
# gate fails loudly instead of masking the problem.
|
||||
if skip repro-stability; then
|
||||
info "Gate 5 (repro-stability): SKIPPED"
|
||||
else
|
||||
info "Gate 5: repro artifact stability ≥ 95% of Confirmed..."
|
||||
# Repro bundles live under dynamic/repro/ (written by repro.rs).
|
||||
REPRO_DIR="${HOME}/.cache/nyx/dynamic/repro"
|
||||
if [[ ! -d "$REPRO_DIR" ]] || [[ -z "$(ls -A "$REPRO_DIR" 2>/dev/null)" ]]; then
|
||||
info "Gate 5: no repro artifacts found at $REPRO_DIR; skipping"
|
||||
else
|
||||
python3 - <<'PYEOF' "$REPRO_DIR" "$NYX_BIN"
|
||||
import subprocess, sys, json, pathlib

# Phase 28 documented reproduce.sh exit codes.
EXIT_PASS = 0                # sink_hit matches expected/outcome.json
EXIT_MISMATCH = 1            # sink_hit diverged from recorded outcome
EXIT_DOCKER_UNAVAIL = 2      # --docker requested but unavailable
EXIT_TOOLCHAIN_MISMATCH = 3  # host toolchain mismatch in process mode

repro_root = pathlib.Path(sys.argv[1])
total = 0
stable = 0
unstable = 0

# Each bundle has expected/verdict.json (written by repro.rs).
for verdict_file in repro_root.rglob("expected/verdict.json"):
    bundle_dir = verdict_file.parent.parent  # parent of expected/
    try:
        with open(verdict_file) as f:
            orig = json.load(f)
        orig_status = orig.get("status", "")
    except Exception as e:
        # Bundle is malformed. Phase 28 inversion: this is no longer
        # silently "stable"; it is a broken bundle and counts against
        # the stability rate.
        unstable += 1
        total += 1
        print(f"UNSTABLE: {bundle_dir.name} — verdict.json unreadable ({e})")
        continue
    if orig_status != "Confirmed":
        continue
    total += 1
    reproduce_sh = bundle_dir / "reproduce.sh"
    if not reproduce_sh.exists():
        # Legacy bundles without reproduce.sh used to be counted as
        # stable; Phase 28 treats them as instability because the
        # repro bundle layout has shipped reproduce.sh since the
        # first cut of the dynamic feature.
        unstable += 1
        print(f"UNSTABLE: {bundle_dir.name} — reproduce.sh missing")
        continue
    try:
        result = subprocess.run(
            ["sh", str(reproduce_sh)],
            capture_output=True,
            timeout=30,
        )
        rc = result.returncode
        if rc == EXIT_PASS:
            stable += 1
        elif rc == EXIT_MISMATCH:
            unstable += 1
            print(f"UNSTABLE: {bundle_dir.name} — sink_hit mismatch (exit 1)")
        elif rc in (EXIT_DOCKER_UNAVAIL, EXIT_TOOLCHAIN_MISMATCH):
            # Documented environmental skip codes — neither pass nor
            # fail. Exclude from the stability ratio so an offline
            # CI row does not pollute the score.
            total -= 1
            print(f"SKIP: {bundle_dir.name} — environment exit {rc}")
        else:
            # Phase 28 inversion: any other non-zero code is unexpected.
            unstable += 1
            print(f"UNSTABLE: {bundle_dir.name} — unexpected exit {rc}")
    except subprocess.TimeoutExpired:
        unstable += 1
        print(f"UNSTABLE: {bundle_dir.name} — reproduce.sh exceeded 30s")
    except Exception as e:
        # Phase 28 inversion: subprocess error is no longer silent
        # success. Anything that prevents the script from completing
        # cleanly counts against stability.
        unstable += 1
        print(f"UNSTABLE: {bundle_dir.name} — invocation error ({e})")
|
||||
|
||||
if total == 0:
|
||||
print("No Confirmed repro artifacts found; skipping stability check.")
|
||||
sys.exit(0)
|
||||
|
||||
rate = stable / total
|
||||
print(f"Repro stability: {stable}/{total} = {rate:.1%} (unstable={unstable})")
|
||||
if rate < 0.95:
|
||||
print(f"FAIL: stability {rate:.1%} < 95%")
|
||||
sys.exit(2)
|
||||
PYEOF
|
||||
RC=$?
|
||||
if [[ $RC -eq 0 ]]; then
|
||||
pass "Gate 5: repro stability ≥ 95%"
|
||||
else
|
||||
die "Gate 5: repro stability < 95%"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# ── Summary ──────────────────────────────────────────────────────────────────
|
||||
echo ""
|
||||
info "Gate log: $GATE_LOG"
|
||||
if [[ $GATE_ERRORS -gt 0 ]]; then
|
||||
echo ""
|
||||
echo "M7 SHIP GATE FAILED: $GATE_ERRORS gate(s) did not pass."
|
||||
echo "Fix failures before merging the default-on flip."
|
||||
exit 2
|
||||
else
|
||||
echo ""
|
||||
echo "M7 SHIP GATE PASSED: all active gates green."
|
||||
exit 0
|
||||
fi
|
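The exit-code classification that Gate 5 applies can be sketched in Rust, the project's main language. This is an illustrative sketch only: `Replay`, `classify`, and `stability_rate` are hypothetical names that do not appear in the repository, and `None` is used here to stand in for a timeout or a spawn failure.

```rust
/// Outcome classes for one `reproduce.sh` run (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Replay {
    Stable,
    Unstable,
    Skipped,
}

/// Classify an exit code using the documented set:
/// 0 = pass, 1 = sink mismatch, 2 = docker unavailable, 3 = toolchain mismatch.
/// `None` models a timeout or a failure to launch the script at all.
pub fn classify(exit: Option<i32>) -> Replay {
    match exit {
        Some(0) => Replay::Stable,
        // Environmental skips are neither pass nor fail.
        Some(2) | Some(3) => Replay::Skipped,
        // Exit 1, any undocumented code, and timeouts all count as instability.
        _ => Replay::Unstable,
    }
}

/// Stable fraction over the runs that were informative.
/// Returns `None` when every run was skipped (or there were no runs).
pub fn stability_rate(exits: &[Option<i32>]) -> Option<f64> {
    let mut stable = 0usize;
    let mut total = 0usize;
    for &exit in exits {
        match classify(exit) {
            Replay::Stable => {
                stable += 1;
                total += 1;
            }
            Replay::Unstable => total += 1,
            Replay::Skipped => {}
        }
    }
    if total == 0 {
        None
    } else {
        Some(stable as f64 / total as f64)
    }
}
```

A caller would compare the returned rate against the 0.95 threshold; skipped runs shrink the denominator rather than counting for or against the score, which matches the `total -= 1` branch in the script.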
16
src/cli.rs
@ -471,9 +471,9 @@ pub enum Commands {

        /// Build a harness and dynamically verify each finding in a sandbox.
        ///
        /// Dynamic verification is on by default (M7). This flag is a no-op
        /// when verification is already enabled via config. Use `--no-verify`
        /// to disable for a single run. Requires the binary to be built with
        /// Dynamic verification is on by default. This flag is a no-op when
        /// verification is already enabled via config. Use `--no-verify` to
        /// disable it for a single run. Requires the binary to be built with
        /// `--features dynamic`; without that feature this flag is silently ignored.
        #[cfg_attr(not(feature = "dynamic"), arg(hide = true))]
        #[arg(long, help_heading = "Dynamic", conflicts_with = "no_verify")]
@ -489,9 +489,9 @@ pub enum Commands {

        /// Also verify `Confidence < Medium` findings dynamically.
        ///
        /// By default only `Confidence >= Medium` findings are verified (§5.1).
        /// Pass this flag to run verification on all findings regardless of
        /// confidence. Intended for corpus-building and backfill runs.
        /// By default only `Confidence >= Medium` findings are verified. Pass
        /// this flag to run verification on all findings regardless of
        /// confidence. Intended for payload tuning and backfill runs.
        #[cfg_attr(not(feature = "dynamic"), arg(hide = true))]
        #[arg(long, help_heading = "Dynamic")]
        verify_all_confidence: bool,
@ -532,7 +532,7 @@ pub enum Commands {
        )]
        harden: Option<String>,

        // ── Baseline / patch-validation (§M6.5) ────────────────────────
        // Baseline / patch-validation
        /// Read a previous scan's JSON output (or a stripped .nyx/baseline.json)
        /// and diff it against the current scan on stable_hash.
        ///
@ -564,7 +564,7 @@ pub enum Commands {
        gate: Option<String>,
    },

    /// Submit feedback on a dynamic verification verdict (§21.2).
    /// Submit feedback on a dynamic verification verdict.
    ///
    /// Records a correction or confirmation for a finding's verdict in the
    /// local telemetry log. Requires `--features dynamic`.

@ -283,10 +283,17 @@ pub fn method_formal_types(method: Node<'_>, bytes: &[u8]) -> Vec<(String, Strin

/// Extract placeholder names from a route path template.
///
/// Supports two placeholder syntaxes:
/// Supports three placeholder syntaxes:
/// - JAX-RS / Spring / Micronaut: `/users/{id}` → `id`,
/// `/users/{id:[0-9]+}` → `id`.
/// - Servlet-mapping `*` wildcards: ignored (no name to bind).
/// - Spring 5.3+ capture-all variables: `/files/{*path}` → `path`
/// (matches the remainder of the URI including slashes).
/// - Bare Ant-style `*` / `**` wildcards (`/users/*`, `/files/**`):
/// intentionally yield no placeholders. They are unnamed by Spring's
/// `AntPathMatcher` and cannot bind by formal name; handlers that
/// need the matched segment use `HttpServletRequest.getRequestURI()`
/// (already routed to [`ParamSource::Implicit`]) or the named
/// `{*name}` capture-all syntax above.
pub fn extract_path_placeholders(path: &str) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    let bytes = path.as_bytes();
@ -295,7 +302,8 @@ pub fn extract_path_placeholders(path: &str) -> Vec<String> {
        if bytes[i] == b'{'
            && let Some(end) = bytes[i + 1..].iter().position(|&b| b == b'}') {
                let inner = &path[i + 1..i + 1 + end];
                let name = inner.split(':').next().unwrap_or(inner).trim();
                let inner_name = inner.split(':').next().unwrap_or(inner).trim();
                let name = inner_name.strip_prefix('*').unwrap_or(inner_name);
                if !name.is_empty() && !out.iter().any(|n| n == name) {
                    out.push(name.to_owned());
                }
@ -420,6 +428,26 @@ mod tests {
        assert_eq!(extract_path_placeholders("/u/{id:[0-9]+}"), vec!["id"]);
    }

    #[test]
    fn extracts_capture_all_variable() {
        assert_eq!(extract_path_placeholders("/files/{*path}"), vec!["path"]);
        assert_eq!(
            extract_path_placeholders("/api/{tenant}/files/{*resource}"),
            vec!["tenant", "resource"]
        );
    }

    #[test]
    fn unnamed_ant_globs_yield_no_placeholders() {
        // Bare `*` and `**` are unnamed by Spring's AntPathMatcher and have
        // no name to bind a formal to. Handlers that need the matched
        // segment use the request object (routed to [`ParamSource::Implicit`])
        // or the named `{*name}` capture-all syntax above.
        assert!(extract_path_placeholders("/users/*").is_empty());
        assert!(extract_path_placeholders("/files/**").is_empty());
        assert!(extract_path_placeholders("/a/*/b/**/c").is_empty());
    }

    #[test]
    fn join_drops_double_slash() {
        assert_eq!(join_route_path("/api", "/x"), "/api/x");
|
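The placeholder rules described in the doc comment above can be tried in isolation with a standalone sketch. It is not the repository's function: it walks the string with `find` instead of a byte index and avoids let-chains so it builds on older toolchains, but it is meant to follow the same rules (strip a `:regex` suffix, strip a leading `*`, skip empty and duplicate names).

```rust
/// Standalone sketch of placeholder extraction from a route template.
/// Mirrors the rules in the diff; the loop shape is this sketch's own.
pub fn extract_path_placeholders(path: &str) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    let mut rest = path;
    while let Some(open) = rest.find('{') {
        let after = &rest[open + 1..];
        // An unclosed `{` ends the scan; nothing further can be a placeholder.
        let Some(close) = after.find('}') else { break };
        let inner = &after[..close];
        // `{id:[0-9]+}` keeps only the part before the first `:`.
        let inner_name = inner.split(':').next().unwrap_or(inner).trim();
        // `{*path}` is a named capture-all; drop the leading `*`.
        let name = inner_name.strip_prefix('*').unwrap_or(inner_name);
        if !name.is_empty() && !out.iter().any(|n| n == name) {
            out.push(name.to_owned());
        }
        rest = &after[close + 1..];
    }
    out
}
```

Like the original, this stops at the first `}` after each `{`, so a regex constraint that itself contains braces (for example a `{3}` quantifier) would be cut short. Bare `*` and `**` segments never enter the loop body because they contain no `{`.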

@ -18,7 +18,7 @@ use tree_sitter::Node;

use super::ruby_routes::{
    bind_path_params, class_extends, class_name, find_class_with_method, first_string_arg,
    kwarg_string, method_formal_names, source_imports_rails, verb_from_ident,
    first_symbol_arg, kwarg_string, method_formal_names, source_imports_rails, verb_from_ident,
};

pub struct RubyRailsAdapter;
@ -40,9 +40,13 @@ fn class_is_rails_controller(class: Node<'_>, bytes: &[u8]) -> bool {
/// Walk the file's top-level `call` nodes looking for a
/// `Rails.application.routes.draw` block or bare `get / post / ...`
/// dispatch lines, and return the first `(method, path)` whose
/// `to: 'controller#action'` kwarg references the target. Returns
/// `None` when no route mapping is present (the caller then falls
/// back to the conventional `/{action}` shape).
/// `to: 'controller#action'` kwarg references the target. Respects
/// `namespace :api do ... end` and `scope :v1 do ... end` /
/// `scope path: '/v1' do ... end` nesting so a route declared inside
/// such a block resolves against the prefixed path + controller name
/// Rails actually mounts it under. Returns `None` when no mapping
/// is present (the caller then falls back to the conventional
/// `/{action}` shape).
fn find_route_mapping<'a>(
    root: Node<'a>,
    bytes: &'a [u8],
@ -50,7 +54,7 @@ fn find_route_mapping<'a>(
    action: &str,
) -> Option<(HttpMethod, String)> {
    let mut hit: Option<(HttpMethod, String)> = None;
    visit_routes(root, bytes, controller, action, &mut hit);
    visit_routes(root, bytes, controller, action, "", "", &mut hit);
    hit
}

@ -59,19 +63,98 @@ fn visit_routes<'a>(
    bytes: &'a [u8],
    controller: &str,
    action: &str,
    path_prefix: &str,
    ctrl_prefix: &str,
    out: &mut Option<(HttpMethod, String)>,
) {
    if out.is_some() {
        return;
    }
    if node.kind() == "call"
        && let Some(found) = try_route_mapping(node, bytes, controller, action) {
    if node.kind() == "call" {
        if let Some((kind, ident)) = route_nesting_kind(node, bytes) {
            let (path_pfx, ctrl_pfx) = match kind {
                NestingKind::Namespace => (
                    format!("{path_prefix}/{ident}"),
                    format!("{ctrl_prefix}{ident}/"),
                ),
                NestingKind::ScopeSymbol => (
                    format!("{path_prefix}/{ident}"),
                    format!("{ctrl_prefix}{ident}/"),
                ),
                NestingKind::ScopePath => (format!("{path_prefix}/{ident}"), ctrl_prefix.to_owned()),
            };
            recurse_into_block(node, bytes, controller, action, &path_pfx, &ctrl_pfx, out);
            return;
        }
        if let Some(found) = try_route_mapping(node, bytes, controller, action, path_prefix, ctrl_prefix) {
            *out = Some(found);
            return;
        }
    }
    let mut cur = node.walk();
    for child in node.children(&mut cur) {
        visit_routes(child, bytes, controller, action, out);
        visit_routes(child, bytes, controller, action, path_prefix, ctrl_prefix, out);
    }
}

enum NestingKind {
    Namespace,
    ScopeSymbol,
    ScopePath,
}

/// If `call` is a routes-DSL nesting block (`namespace :api do ... end`,
/// `scope :v1 do ... end`, or `scope path: '/v1' do ... end`) return
/// the kind + the extracted identifier (a bare token for namespace /
/// symbol-scope, a leading-slash-stripped path for path-scope).
fn route_nesting_kind<'a>(call: Node<'a>, bytes: &'a [u8]) -> Option<(NestingKind, String)> {
    let mut cur = call.walk();
    let mut ident: Option<&str> = None;
    let mut args: Option<Node<'a>> = None;
    for child in call.named_children(&mut cur) {
        match child.kind() {
            "identifier" => ident = child.utf8_text(bytes).ok(),
            "argument_list" => args = Some(child),
            _ => {}
        }
    }
    let ident = ident?;
    let args = args?;
    match ident {
        "namespace" => {
            let sym = first_symbol_arg(args, bytes)?;
            Some((NestingKind::Namespace, sym))
        }
        "scope" => {
            if let Some(sym) = first_symbol_arg(args, bytes) {
                Some((NestingKind::ScopeSymbol, sym))
            } else {
                let path = kwarg_string(args, bytes, "path")?;
                let trimmed = path.trim_start_matches('/').to_owned();
                if trimmed.is_empty() {
                    return None;
                }
                Some((NestingKind::ScopePath, trimmed))
            }
        }
        _ => None,
    }
}

fn recurse_into_block<'a>(
    call: Node<'a>,
    bytes: &'a [u8],
    controller: &str,
    action: &str,
    path_prefix: &str,
    ctrl_prefix: &str,
    out: &mut Option<(HttpMethod, String)>,
) {
    let mut cur = call.walk();
    for child in call.named_children(&mut cur) {
        if child.kind() == "do_block" || child.kind() == "block" {
            visit_routes(child, bytes, controller, action, path_prefix, ctrl_prefix, out);
        }
    }
}

@ -80,6 +163,8 @@ fn try_route_mapping<'a>(
    bytes: &'a [u8],
    controller: &str,
    action: &str,
    path_prefix: &str,
    ctrl_prefix: &str,
) -> Option<(HttpMethod, String)> {
    let mut cur = call.walk();
    let mut verb: Option<HttpMethod> = None;
@ -100,8 +185,14 @@ fn try_route_mapping<'a>(
    let path = first_string_arg(args, bytes)?;
    let to = kwarg_string(args, bytes, "to")?;
    let (ctrl, act) = to.split_once('#')?;
    if controller_matches(ctrl, controller) && act == action {
        return Some((verb, path));
    let full_ctrl = format!("{ctrl_prefix}{ctrl}");
    if controller_matches(&full_ctrl, controller) && act == action {
        let full_path = if path_prefix.is_empty() {
            path
        } else {
            format!("{}/{}", path_prefix, path.trim_start_matches('/'))
        };
        return Some((verb, full_path));
    }
    None
}
@ -269,6 +360,51 @@ mod tests {
        assert!(matches!(id.source, crate::dynamic::framework::ParamSource::PathSegment(_)));
    }

    #[test]
    fn routes_draw_namespace_applies_prefix_to_path_and_controller() {
        let src: &[u8] = b"Rails.application.routes.draw do\n namespace :api do\n get '/users', to: 'users#index'\n end\nend\n\nclass Api::UsersController < ApplicationController\n def index\n 'ok'\n end\nend\n";
        let tree = parse(src);
        let binding = RubyRailsAdapter
            .detect(&summary("index"), tree.root_node(), src)
            .expect("binding");
        let route = binding.route.unwrap();
        assert_eq!(route.path, "/api/users");
        assert_eq!(route.method, HttpMethod::GET);
    }

    #[test]
    fn routes_draw_scope_path_prefixes_path_only() {
        let src: &[u8] = b"Rails.application.routes.draw do\n scope path: '/v1' do\n get '/users', to: 'users#index'\n end\nend\n\nclass UsersController < ApplicationController\n def index\n 'ok'\n end\nend\n";
        let tree = parse(src);
        let binding = RubyRailsAdapter
            .detect(&summary("index"), tree.root_node(), src)
            .expect("binding");
        let route = binding.route.unwrap();
        assert_eq!(route.path, "/v1/users");
    }

    #[test]
    fn routes_draw_scope_symbol_prefixes_path_and_controller() {
        let src: &[u8] = b"Rails.application.routes.draw do\n scope :admin do\n get '/users', to: 'users#index'\n end\nend\n\nclass Admin::UsersController < ApplicationController\n def index\n 'ok'\n end\nend\n";
        let tree = parse(src);
        let binding = RubyRailsAdapter
            .detect(&summary("index"), tree.root_node(), src)
            .expect("binding");
        let route = binding.route.unwrap();
        assert_eq!(route.path, "/admin/users");
    }

    #[test]
    fn routes_draw_nested_namespaces_compose_prefixes() {
        let src: &[u8] = b"Rails.application.routes.draw do\n namespace :api do\n namespace :v1 do\n get '/users', to: 'users#index'\n end\n end\nend\n\nclass Api::V1::UsersController < ApplicationController\n def index\n 'ok'\n end\nend\n";
        let tree = parse(src);
        let binding = RubyRailsAdapter
            .detect(&summary("index"), tree.root_node(), src)
            .expect("binding");
        let route = binding.route.unwrap();
        assert_eq!(route.path, "/api/v1/users");
    }

    #[test]
    fn skips_when_class_is_not_a_controller() {
        let src: &[u8] = b"class Foo\n def bar\n 'ok'\n end\nend\n";
|
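The prefix bookkeeping in `visit_routes` and `try_route_mapping` can be illustrated without tree-sitter. The sketch below is a simplification under stated assumptions: `Nesting`, `push_nesting`, and `resolve` are hypothetical names, the nesting blocks are passed in already parsed, and the mapping (symbol scopes prefix both path and controller, path scopes prefix the path only) is copied from the diff as-is rather than checked against Rails itself.

```rust
/// One level of routes-DSL nesting (hypothetical stand-in for `NestingKind` + ident).
pub enum Nesting<'a> {
    Namespace(&'a str),
    ScopeSymbol(&'a str),
    ScopePath(&'a str),
}

/// Extend the running `(path_prefix, ctrl_prefix)` pair by one nesting level.
pub fn push_nesting(path_prefix: &str, ctrl_prefix: &str, n: &Nesting<'_>) -> (String, String) {
    match n {
        // Namespace and symbol-scope prefix both the URL path and the controller.
        Nesting::Namespace(id) | Nesting::ScopeSymbol(id) => (
            format!("{path_prefix}/{id}"),
            format!("{ctrl_prefix}{id}/"),
        ),
        // `scope path: '/v1'` prefixes the URL path only.
        Nesting::ScopePath(p) => (
            format!("{path_prefix}/{}", p.trim_start_matches('/')),
            ctrl_prefix.to_owned(),
        ),
    }
}

/// Combine the accumulated prefixes with a leaf route's path and controller.
pub fn resolve(path_prefix: &str, ctrl_prefix: &str, path: &str, ctrl: &str) -> (String, String) {
    let full_path = if path_prefix.is_empty() {
        path.to_owned()
    } else {
        format!("{}/{}", path_prefix, path.trim_start_matches('/'))
    };
    (full_path, format!("{ctrl_prefix}{ctrl}"))
}
```

Starting from two empty prefixes and applying `Namespace("api")` then `Namespace("v1")` before resolving `'/users'` with `users` gives the `/api/v1/users` path that the nested-namespace test expects. How the composed controller string is compared against a class name such as `Api::V1::UsersController` is left to `controller_matches`, which this chunk does not show.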

@ -145,7 +145,7 @@ fn named_child_of_kind<'a>(node: Node<'a>, kind: &str) -> Option<Node<'a>> {
pub fn class_name<'a>(class: Node<'a>, bytes: &'a [u8]) -> Option<&'a str> {
    let mut cur = class.walk();
    for c in class.named_children(&mut cur) {
        if c.kind() == "constant" {
        if c.kind() == "constant" || c.kind() == "scope_resolution" {
            return c.utf8_text(bytes).ok();
        }
    }
@ -352,6 +352,22 @@ fn is_implicit_formal(name: &str) -> bool {
    matches!(name, "env" | "request" | "req" | "params" | "response" | "res")
}

/// Read the first positional symbol argument (`:foo`) from an
/// `argument_list` child. Used by the Rails router DSL to pull the
/// namespace name out of `namespace :api do ... end` and the
/// positional form of `scope :v1 do ... end`. The returned string
/// is the symbol's identifier portion without the leading colon.
pub fn first_symbol_arg<'a>(args: Node<'a>, bytes: &'a [u8]) -> Option<String> {
    let mut cur = args.walk();
    for c in args.named_children(&mut cur) {
        if c.kind() == "simple_symbol" {
            let raw = c.utf8_text(bytes).ok()?;
            return Some(raw.trim_start_matches(':').to_owned());
        }
    }
    None
}

/// Read the first positional string-literal argument from an
/// `argument_list` child. Used by every Ruby route adapter to pull
/// a path template out of `get '/run' do ... end` and the Rails

@ -565,10 +565,9 @@ pub enum ReplayResult {
/// Tri-state map of [`ReplayResult`] onto the eval-corpus
/// `VerifyResult::replay_stable` field shape.
///
/// * `Some(true)` — replay matched the recorded outcome.
/// * `Some(false)` — replay diverged or aborted in a way that the M7
/// Gate-5 inversion treats as instability.
/// * `None` — replay was not informative (toolchain mismatched, docker
/// * `Some(true)` - replay matched the recorded outcome.
/// * `Some(false)` - replay diverged or aborted.
/// * `None` - replay was not informative (toolchain mismatched, docker
/// unavailable, or the bundle had no `reproduce.sh`). The corpus
/// tabulator treats `None` as "no signal" and excludes the row from
/// the per-cell `stable_replays` numerator.
@ -582,15 +581,14 @@ pub fn replay_stability(result: &ReplayResult) -> Option<bool> {
    }
}

/// Phase 28 — Track H.3. Run `reproduce.sh` in `bundle_root` and map the
/// shell exit code into a [`ReplayResult`].
/// Run `reproduce.sh` in `bundle_root` and map the shell exit code into a
/// [`ReplayResult`].
///
/// `extra_args` is appended to `reproduce.sh` (`--docker` when the caller
/// wants the docker backend; empty for the process backend).
///
/// This is the host-side companion to the M7 Gate 5 inversion: callers
/// who want "did this bundle replay green?" semantics see a typed result
/// and the M7 gate script gets a uniform contract to assert against.
/// Callers who want "did this bundle replay green?" semantics get a typed
/// result instead of parsing shell output.
pub fn replay_bundle(
    bundle_root: &Path,
    extra_args: &[&str],

@ -1,9 +1,9 @@
//! Telemetry event log (§21.1).
//! Telemetry event log.
//!
//! Writes one JSON line per verdict to `~/.cache/nyx/dynamic/events.jsonl`.
//! `NYX_NO_TELEMETRY=1` silently disables all writes (§21.4).
//! `NYX_NO_TELEMETRY=1` silently disables all writes.
//!
//! # Schema (Phase 27)
//! # Schema
//!
//! Every record starts with three envelope fields so the on-disk format can
//! evolve across releases without silently mixing incompatible records:
@ -12,11 +12,10 @@
//! - `nyx_version`: the Cargo package version that wrote the record.
//! - `corpus_version`: the payload-corpus version active at write time.
//!
//! Followed by a `kind` discriminator (`"verdict"` or `"rank_delta"`). All
//! readers (`read_events`, the M7 ship gate) require `schema_version ==
//! [`SCHEMA_VERSION`]; mismatched records produce
//! [`TelemetryReadError::SchemaMismatch`] instead of being silently parsed
//! as if they matched.
//! Followed by a `kind` discriminator (`"verdict"` or `"rank_delta"`). All
//! readers require `schema_version == SCHEMA_VERSION`; mismatched records
//! produce [`TelemetryReadError::SchemaMismatch`] instead of being silently
//! parsed as if they matched.
//!
//! ```json
//! {
@ -258,12 +257,10 @@ fn lang_from_path(path: &str) -> String {
        .unwrap_or_else(|| "unknown".to_owned())
}

/// Sampling decision for telemetry writes (Phase 27, Track H.2).
/// Sampling decision for telemetry writes.
///
/// Confirmed and Inconclusive verdicts are calibration-critical (false-Confirmed
/// rate gates M7 ship; Inconclusive reasons drive the spec-derivation roadmap)
/// and are always retained. Other verdict statuses can be downsampled to bound
/// log growth on high-volume scans.
/// Confirmed and Inconclusive verdicts are kept for calibration. Other verdict
/// statuses can be downsampled to bound log growth on high-volume scans.
///
/// The decision is seeded by `spec_hash` so the *same* finding makes the *same*
/// keep-or-drop call across reruns. Without this, two scans of the same project
|
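The sampling rule described in the hunk above (always keep Confirmed and Inconclusive, downsample the rest, and seed the decision by `spec_hash` so reruns agree) could look roughly like the sketch below. Everything concrete here is an assumption: the function names, the `keep_one_in` parameter, the string-typed status, and the choice of FNV-1a as the hash are illustrative and are not taken from the repository.

```rust
/// FNV-1a 64-bit hash. Chosen for this sketch only because its output is
/// fixed by the algorithm, so the same input hashes the same on every run.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Hypothetical keep-or-drop decision for one telemetry record.
///
/// * `status`      - verdict status name, e.g. "Confirmed".
/// * `spec_hash`   - stable identifier of the finding's spec.
/// * `keep_one_in` - downsampling factor for non-critical statuses
///                   (values of 0 or 1 keep everything).
pub fn keep_event(status: &str, spec_hash: &str, keep_one_in: u64) -> bool {
    // Calibration-critical statuses are never dropped.
    if matches!(status, "Confirmed" | "Inconclusive") {
        return true;
    }
    if keep_one_in <= 1 {
        return true;
    }
    // Deterministic: depends only on the spec hash, not on time or RNG state.
    fnv1a(spec_hash.as_bytes()) % keep_one_in == 0
}
```

Because the decision depends only on `spec_hash`, two scans of the same project keep or drop the same findings, so logs from separate runs stay comparable.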
@ -413,12 +410,11 @@ pub fn log_path() -> Option<std::path::PathBuf> {
    events_log_path()
}

// ── Reading events back (Phase 27) ───────────────────────────────────────────
// Reading events back

/// Structured error returned by [`read_events`].
///
/// Surfaced to the M7 ship gate so Gate 2 can fail loudly on schema-mismatch
/// rather than silently treating mismatched records as "no data".
/// Returned when a log mixes records from incompatible schema versions.
#[derive(Debug, thiserror::Error)]
pub enum TelemetryReadError {
    #[error("io error reading {path}: {source}")]
@ -451,14 +447,12 @@ pub enum TelemetryReadError {
///
/// Returns each line as a `serde_json::Value` so callers can dispatch on the
/// `kind` discriminator themselves. Rejects any record whose `schema_version`
/// does not match [`SCHEMA_VERSION`] (this is the explicit failure mode the
/// M7 ship gate Gate 2 consumes; a v0 record from an older release must not
/// silently parse as if the schema had never changed).
/// does not match [`SCHEMA_VERSION`]. A v0 record from an older release must
/// not silently parse as if the schema had never changed.
///
/// Blank lines are skipped. Any malformed JSON or missing `schema_version`
/// fails the whole read; partial recovery is not the contract here because
/// the ship gate already treats "log missing or unreadable" as "no data,
/// skip Gate 2 with a notice."
/// Blank lines are skipped. Any malformed JSON or missing `schema_version`
/// fails the whole read; partial recovery is not the contract for telemetry
/// logs.
pub fn read_events(path: &Path) -> Result<Vec<serde_json::Value>, TelemetryReadError> {
    let file = std::fs::File::open(path).map_err(|e| TelemetryReadError::Io {
        path: path.to_path_buf(),
@ -551,8 +545,8 @@ pub fn feedback_wrong_for_finding(path: &Path, finding_id: &str) -> Option<bool>
/// One telemetry event per ranked finding that carries a dynamic verdict delta.
///
/// Emitted by `rank::rank_diags` for every diag whose dynamic verdict shifts
/// its rank score (delta != 0). Used by the M7 calibration pipeline to tune
/// the N/M boost/penalty constants from real-world verdict distributions.
/// its rank score (delta != 0). Used to tune the N/M boost/penalty constants
/// from real-world verdict distributions.
#[derive(Debug, serde::Serialize, serde::Deserialize)]
pub struct RankDeltaEvent {
    pub schema_version: u32,

@ -85,14 +85,11 @@ pub struct VerifyOptions {
    /// Default `false`. [`Self::from_config`] honours the
    /// `NYX_VERIFY_REPLAY_STABLE` environment variable (`1` / `true`).
    pub replay_stable_check: bool,
    /// Phase 31 follow-up: when `true` and `replay_stable_check` is also
    /// `true`, the verifier passes `--docker` to `reproduce.sh` instead of
    /// running it through the host's process backend. Lets the eval-corpus
    /// driver mark `replay_stable` based on the bare-image replay path so
    /// the M7 ship-gate's Gate 5 reflects the docker bundle's green/red
    /// signal — required when the corpus walks a host that has stripped
    /// the language toolchains (the bare-image CI matrix at
    /// `.github/workflows/repro-bare.yml`).
    /// When `true` and `replay_stable_check` is also `true`, the verifier
    /// passes `--docker` to `reproduce.sh` instead of running it through the
    /// host's process backend. This lets eval-corpus runs mark
    /// `replay_stable` from the bare-image replay path when the host has
    /// stripped language toolchains.
    ///
    /// Default `false`. [`Self::from_config`] honours the
    /// `NYX_VERIFY_REPLAY_DOCKER` environment variable (`1` / `true`).

17
src/rank.rs
@ -99,7 +99,7 @@ pub fn compute_attack_rank(diag: &Diag) -> AttackRank {
    // All other verdicts (Unsupported, Inconclusive, no verdict) are
    // unaffected: no data is better than speculative data.
    //
    // Calibrated values (M7 eval corpus): N=20, M=5.
    // Calibrated values from the eval corpus: N=20, M=5.
    // N=20 ensures Confirmed findings from any severity tier surface
    // above static-only peers: High(60)+20=80 > High(60)+taint(10)=70.
    // M=5 nudges exhausted-corpus NotConfirmed below equal static peers
@ -209,7 +209,7 @@ pub fn rank_diags(diags: &mut [Diag]) {
        if !rank.components.is_empty() {
            d.rank_reason = Some(rank.components.clone());
        }
        // Emit rank-delta telemetry for M7 calibration (§21 / deferred M7 hook).
        // Emit rank-delta telemetry for score calibration.
        // Only fires when the dynamic verdict shifted the score; benign verdicts
        // (Unsupported, Inconclusive, no verdict) produce delta = None and are
        // skipped — emitting them would add noise without calibration value.
@ -247,17 +247,16 @@ pub fn rank_diags(diags: &mut [Diag]) {
/// Returns `None` when there is no verdict (static-only scan) or the verdict
/// does not change the score (Unsupported, Inconclusive).
///
/// Design note (§deferred M7 payload_corpus_complete): the spec originally
/// distinguished `NotConfirmed` + `payload_corpus_complete == true` → `-M`
/// from `NotConfirmed` + `NoPayloadsForCap` → no change. In practice the
/// Design note: the spec originally distinguished `NotConfirmed` +
/// `payload_corpus_complete == true` from `NotConfirmed` +
/// `NoPayloadsForCap`. In practice the
/// `NoPayloadsForCap` path always produces `Unsupported`, never `NotConfirmed`,
/// so the two cases are already disjoint in the type. The heuristic
/// `!dv.attempts.is_empty()` (corpus was actually tried) is equivalent to
/// `payload_corpus_complete == true` for all reachable states — no extra
/// field is needed. See also §deferred decision in `.pitboss/play/deferred.md`.
/// `payload_corpus_complete == true` for all reachable states, so no extra
/// field is needed.
///
/// Values calibrated against M7 eval corpus (OWASP Benchmark v1.2 + in-house curated set):
/// N=20, M=5 — see `docs/dynamic_eval_m7.md` for precision/recall breakdowns.
/// Values calibrated against the eval corpus: N=20, M=5.
fn dynamic_verdict_delta(diag: &Diag) -> Option<f64> {
    use crate::evidence::VerifyStatus;
    let dv = diag.evidence.as_ref()?.dynamic_verdict.as_ref()?;
|
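The scoring rule in the comments above (boost Confirmed by N=20, penalise NotConfirmed by M=5 only when payloads were actually attempted, leave everything else alone) can be written out as a small sketch. The body of `dynamic_verdict_delta` is not visible in this hunk, so the version below is a reconstruction from the comments: the flattened signature, the local `VerifyStatus` enum, and the constant names are assumptions, not the repository's definitions.

```rust
/// Local stand-in for `crate::evidence::VerifyStatus` (variants taken from the comments).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VerifyStatus {
    Confirmed,
    NotConfirmed,
    Unsupported,
    Inconclusive,
}

/// Calibrated boost for a Confirmed verdict (N in the comments).
pub const N_BOOST: f64 = 20.0;
/// Calibrated penalty for an exhausted-corpus NotConfirmed verdict (M in the comments).
pub const M_PENALTY: f64 = 5.0;

/// Score delta for a finding, or `None` when the verdict does not move the score.
///
/// `status` is `None` for a static-only scan; `attempts` is the number of
/// payloads that were actually tried.
pub fn dynamic_verdict_delta(status: Option<VerifyStatus>, attempts: usize) -> Option<f64> {
    match status? {
        VerifyStatus::Confirmed => Some(N_BOOST),
        // Only penalise when the corpus was really exercised.
        VerifyStatus::NotConfirmed if attempts > 0 => Some(-M_PENALTY),
        // Unsupported, Inconclusive, and untried NotConfirmed leave the score alone.
        _ => None,
    }
}
```

With these values a Confirmed High finding scores 60 + 20 = 80, above a static-only High finding with the taint bonus at 60 + 10 = 70, which is the ordering the comment in `compute_attack_rank` calls out.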

@ -36,9 +36,9 @@ struct StartScanRequest {
    engine_profile: Option<String>,
    /// Override dynamic verification for this scan.
    ///
    /// `true` — force on even if config says off.
    /// `false` — force off even if config says on (M7 default-on).
    /// absent — inherit config default (true since M7).
    /// `true` - force on even if config says off.
    /// `false` - force off even if config says on.
    /// absent - inherit config default.
    ///
    /// Requires `--features dynamic`; `true` returns 400 when the
    /// feature is absent.

@ -251,8 +251,8 @@ pub struct ScannerConfig {

    /// Run dynamic verification on each finding after the static pass.
    ///
    /// Default `true` (M7 flip). Each `Confidence >= Medium` finding is
    /// passed to `dynamic::verify_finding` and the result is stored in
    /// Default `true`. Each `Confidence >= Medium` finding is passed to
    /// `dynamic::verify_finding` and the result is stored in
    /// `Evidence::dynamic_verdict`. Use `--no-verify` (CLI) or set
    /// `verify = false` in `nyx.toml` to disable.
    ///

@ -1,17 +1,10 @@
# Phase 31: ratchet values set to the headline targets.
# Eval corpus budget.
#
# These are the published acceptance numbers behind the dynamic-verification
# overhaul (see `docs/dynamic.md` "Headline metrics"). The ratchet schedule
# from Phase 29 collapsed into a single target row: every (cap, lang) cell is
# now gated against the same headline thresholds. Per-cell carve-outs were
# dropped in Phase 31; if a cell is still wider than these numbers in practice
# it shows up as a per-cell `FAIL` in `report.py` and as a gate-1 failure in
# `scripts/m7_ship_gate.sh`, which is the intended forcing function for the
# remaining engine follow-ups tracked in `.pitboss/play/deferred.md`.
# `report.py` enforces these values when `run.sh` or `run_full.sh` pass
# `--budget`. Each (cap, lang) cell uses the default row unless a specific
# override appears below.
#
# Wall-clock cost (≤ 2× static-only) is enforced separately by Gate 3 of
# `scripts/m7_ship_gate.sh` against `benches/fixtures/`; it is not a per-cell
# budget knob and has no entry in this file.
# Wall-clock cost is measured separately from this per-cell budget.
#
# Schema:
#

@ -1,23 +1,23 @@
|
|||
#!/usr/bin/env bash
|
||||
# Eval corpus runner for M7 pre-flip gate calibration.
|
||||
# Eval corpus runner.
|
||||
#
|
||||
# Usage:
|
||||
# tests/eval_corpus/run.sh [--output DIR] [--nyx BIN] [--sets owasp,sard,inhouse]
|
||||
#
|
||||
# Bootstraps OWASP Benchmark v1.2, NIST SARD subset, and in-house
|
||||
# bughunt-curated fixtures. Runs `nyx scan --verify` on each. Emits
|
||||
# Bootstraps OWASP Benchmark v1.2, the NIST SARD subset, and Nyx benchmark
|
||||
# fixtures. Runs `nyx scan --verify` on each. Emits
|
||||
# per-cell (cap x language) precision/recall table and per-cap Unsupported
|
||||
# rate to stdout (and --output DIR if given).
|
||||
#
|
||||
# Environment:
|
||||
# NYX_EVAL_CORPUS_DIR — path to pre-downloaded corpus roots
|
||||
# NYX_EVAL_CORPUS_DIR - path to pre-downloaded corpus roots
|
||||
# (default: ~/.cache/nyx/eval_corpus)
|
||||
# NYX_BIN — path to nyx binary (default: ./target/release/nyx)
|
||||
# NYX_BIN - path to nyx binary (default: ./target/release/nyx)
|
||||
#
|
||||
# Exit codes:
|
||||
# 0 — all gate thresholds met
|
||||
# 1 — setup or I/O error
|
||||
# 2 — one or more gate thresholds exceeded (see output for details)
|
||||
# 0 - all budget thresholds met
|
||||
# 1 - setup or I/O error
|
||||
# 2 - one or more budget thresholds exceeded (see output for details)
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
|
|
@ -173,9 +173,8 @@ python3 "${SCRIPT_DIR}/report.py" \
|
|||
${DIFF_FILE:+--diff "$DIFF_FILE"}
|
||||
REPORT_RC=$?
|
||||
set -e
|
||||
# Propagate gate-fail (exit 2) and malformed-config (exit 3) so the
|
||||
# m7_ship_gate.sh Gate-1 dispatch can tell them apart. Treat other
|
||||
# non-zero as setup error (exit 1).
|
||||
# Propagate budget failures (exit 2) and malformed config (exit 3). Treat other
|
||||
# non-zero exits as setup errors.
|
||||
if [[ $REPORT_RC -eq 2 ]]; then
|
||||
exit 2
|
||||
elif [[ $REPORT_RC -eq 3 ]]; then
|
||||
|
|
|
|||
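The exit-code handling in the `run.sh` hunk above can be summarized as a small mapping. The following is a minimal Python sketch, not the script's actual code: the function name `runner_exit_code` is hypothetical, and the `exit 3` branch is inferred from the comment because the hunk is cut off before that line.

```python
def runner_exit_code(report_rc: int) -> int:
    """Map the return code of report.py onto the runner's exit code.

    Hypothetical helper that mirrors the comment in run.sh:
      0 -> budget thresholds met
      2 -> one or more budget thresholds exceeded (propagated as-is)
      3 -> malformed config (propagated as-is; assumed from the comment)
      anything else non-zero -> setup or I/O error (exit 1)
    """
    if report_rc == 0:
        return 0
    if report_rc in (2, 3):
        # Budget failure and malformed config stay distinguishable
        # for whatever calls the runner.
        return report_rc
    # Every other non-zero code collapses into the generic setup error.
    return 1


if __name__ == "__main__":
    for rc in (0, 1, 2, 3, 127):
        print(rc, "->", runner_exit_code(rc))
```

Keeping 2 and 3 distinct lets a caller tell "the corpus ran but missed its budget" apart from "the budget file could not be parsed" without scraping the output.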
|
|
@ -1,12 +1,10 @@
|
|||
#!/usr/bin/env bash
|
||||
# Phase 31: full eval-corpus orchestrator.
|
||||
# Full eval-corpus orchestrator.
|
||||
#
|
||||
# Drives a complete pass against every corpus set the project knows about
|
||||
# (OWASP Benchmark v1.2, the NIST SARD subset, and the in-house bughunt
|
||||
# fixtures), then emits a stable `tests/eval_corpus/results.json` so
|
||||
# downstream consumers (M7 ship gate, monotonic-improvement diff, the
|
||||
# headline metrics table in `docs/dynamic.md`) can read a single
|
||||
# well-known path.
|
||||
# (OWASP Benchmark v1.2, the NIST SARD subset, and the Nyx benchmark
|
||||
# fixtures), then emits `tests/eval_corpus/results.json` for reports,
|
||||
# diffs, and docs.
|
||||
#
|
||||
# Usage:
|
||||
# tests/eval_corpus/run_full.sh [--nyx BIN] [--budget FILE] [--diff FILE]
|
||||
|
|
@ -15,11 +13,9 @@
|
|||
# Differences vs `run.sh`:
|
||||
# * Always runs every set (no `--sets` selector).
|
||||
# * Always passes `--budget tests/eval_corpus/budget.toml` so the
|
||||
# headline targets (Unsupported < 20%, FalseConfirmed < 2%, Repro
|
||||
# stability >= 95%) gate every pass.
|
||||
# configured per-cell limits are checked on every pass.
|
||||
# * Copies the timestamped results file to
|
||||
# `tests/eval_corpus/results.json` (canonical path consumed by
|
||||
# `scripts/m7_ship_gate.sh` and the published metrics doc).
|
||||
# `tests/eval_corpus/results.json`.
|
||||
#
|
||||
# Exit codes:
|
||||
# 0 every set ran and the merged result met the per-cell budget.
|
||||
|
|
|
|||
|
|
@ -415,8 +415,8 @@ def main() -> int:
|
|||
elif status == "Confirmed":
|
||||
cells[key]["confirmed"] += 1
|
||||
# Repro-stability and false-Confirmed counts are optional
|
||||
# fields tabulate.py reads off the verdict when callers
|
||||
# (m7_ship_gate.sh / corpus_promote.yml) have stamped them.
|
||||
# fields tabulate.py reads off the verdict when callers have
|
||||
# stamped them.
|
||||
if dv.get("wrong") is True:
|
||||
cells[key]["wrong_confirmed"] += 1
|
||||
if dv.get("replay_stable") is True:
|
||||
|
|
|
|||
|
|
@ -1,14 +1,13 @@
|
|||
//! Phase 27 — Track H.1 integration test.
|
||||
//! Dynamic telemetry schema tests.
|
||||
//!
|
||||
//! Locks in the on-disk telemetry schema contract that `scripts/m7_ship_gate.sh`
|
||||
//! Gate 2 relies on:
|
||||
//! Locks in the on-disk telemetry schema contract:
|
||||
//!
|
||||
//! - Records produced today carry the `schema_version`, `nyx_version`, and
|
||||
//! `corpus_version` envelope fields, plus a `kind` discriminator.
|
||||
//! - `read_events(path)` accepts the current schema.
|
||||
//! - A hand-crafted record with `schema_version: 0` is rejected by
|
||||
//! `read_events` with a typed [`TelemetryReadError::SchemaMismatch`] (this
|
||||
//! is the explicit Phase 27 acceptance bullet).
|
||||
//! is the required failure mode for mixed-schema logs).
|
||||
//! - The sampling policy retains Confirmed and Inconclusive verdicts even at
|
||||
//! `sample_rate_other = 0.0`.
|
||||
|
||||
|
|
|
|||
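The telemetry contract listed in the doc comment above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the Rust implementation under test: the JSON-lines encoding, the `CURRENT_SCHEMA_VERSION` value of 1, the `should_retain` helper, and the exception class are all hypothetical. Only the envelope field names, the rejection of `schema_version: 0`, and the retention rule at `sample_rate_other = 0.0` come from the source. The real `read_events` takes a path; this sketch takes an iterable of lines to stay self-contained.

```python
import json
import random

# Assumed value; the source only says that version 0 must be rejected.
CURRENT_SCHEMA_VERSION = 1

# Envelope fields named in the doc comment.
ENVELOPE_FIELDS = ("schema_version", "nyx_version", "corpus_version", "kind")


class SchemaMismatch(Exception):
    """Hypothetical stand-in for TelemetryReadError::SchemaMismatch."""

    def __init__(self, found, expected):
        super().__init__(f"schema_version {found!r}, expected {expected!r}")
        self.found = found
        self.expected = expected


def read_events(lines):
    """Parse JSON-lines records, rejecting any record from another schema."""
    events = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        missing = [f for f in ENVELOPE_FIELDS if f not in record]
        if missing:
            raise ValueError(f"record is missing envelope fields: {missing}")
        if record["schema_version"] != CURRENT_SCHEMA_VERSION:
            # Fail loudly instead of silently mixing schemas in one log.
            raise SchemaMismatch(record["schema_version"], CURRENT_SCHEMA_VERSION)
        events.append(record)
    return events


def should_retain(status, sample_rate_other, rng=random.random):
    """Always keep Confirmed and Inconclusive; sample every other status."""
    if status in ("Confirmed", "Inconclusive"):
        return True
    # rng() is in [0.0, 1.0), so a rate of 0.0 never retains other statuses.
    return rng() < sample_rate_other
```

Raising a typed error on the first mismatched record, rather than skipping it, matches the failure mode the test describes for mixed-schema logs.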