diff --git a/CHANGELOG.md b/CHANGELOG.md index 90c26fcb..3b79a6ac 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,9 +4,11 @@ All notable changes to Nyx are documented here. The format is based on [Keep a C ## [Unreleased] -Three fronts this release: an attack-surface map, a sandboxed dynamic verifier, and a framework adapter registry that grounds both. +## [0.8.0] - 2026-06-01 -The attack-surface map and chain composer turn the flat finding list into a route-to-sink graph. The dynamic verifier re-runs every Medium-or-higher finding against a payload corpus and stamps a Confirmed / NotConfirmed / Inconclusive / Unsupported verdict on each. The adapter registry (116 entries across 8 languages) covers HTTP, message-broker, scheduled-job, GraphQL, WebSocket, middleware, and migration entry points. +The dynamic-verification release: an attack-surface map, a sandboxed dynamic verifier, a framework adapter registry that grounds both, the per-language build infrastructure that makes per-finding verification affordable at corpus scale, and the first real-corpus acceptance gates. + +The attack-surface map and chain composer turn the flat finding list into a route-to-sink graph. The dynamic verifier re-runs every Medium-or-higher finding against a payload corpus and stamps a Confirmed / PartiallyConfirmed / NotConfirmed / Inconclusive / Unsupported verdict on each. The adapter registry (130+ entries across 8 languages) covers HTTP, message-broker, scheduled-job, GraphQL, WebSocket, middleware, and migration entry points. Per-language build pools and copy-on-write workdirs hold the with-verify wall-clock to within 1.5x of a static-only scan on `benches/fixtures/`. 
### Attack-surface map @@ -36,6 +38,15 @@ The attack-surface map and chain composer turn the flat finding list into a rout - **Guard-aware verdicts.** When a known input-validation or output-sanitization middleware sits in front of a Confirmed sink (Spring `@PreAuthorize`, Express `helmet`, Nest `@UseGuards`, Django `@permission_classes`, and the per-language registry in `src/dynamic/framework/auth_markers.rs`), the verdict demotes to `ConfirmedWithKnownGuard` and the guard names land on `differential.known_guards`. Authentication-only filters do not trigger the demotion since they do not mitigate injection. - **Repro bundles.** Every verified finding writes a hermetic bundle to `~/.cache/nyx/dynamic/repro//` with `reproduce.sh`, `expected/{verdict.json,outcome.json,trace.jsonl}`, and a `docker_pull.sh` when the toolchain is pinned in `tools/image-builder/images.toml`. `--verbose` flushes the per-step `VerifyTrace` to stderr for live triage. - **Real-engine harness paths.** LDAP injection routes through an embedded LDAPv3 BER server, exercised from Java via JNDI `InitialDirContext` and from Python and PHP via pure-stdlib BER clients. XPath injection runs against the live parser in each language: Java `javax.xml.xpath`, PHP `DOMXPath`, JS `xpath` npm, Python `lxml`. `Cap::CRYPTO` lands a `WeakKey` probe across Python, Go, Java, PHP, and Rust that flags sub-2^16 keys produced by non-CSPRNG sources. A new `HeaderSmuggledInWire` oracle predicate catches CRLF smuggling on hand-rolled raw-socket HTTP servers (Python `http.server`, Node `net`, Rust `std::net::TcpListener`) where framework-level CRLF strip cannot intervene. +- **Differential rule v2 and partial confirmations.** A finding confirms when *any* vulnerable payload in the set fires and *every* paired benign control stays clean, replacing the strict pair-wise rule so a single missing control no longer downgrades a confirmable finding. 
A new `PartiallyConfirmed` verdict marks findings where the sink is reached but the exploit chain does not complete (no marker written, no callback observed), so engine work can ratchet without the tool overstating what it proved. +- **Spec derivation v2.** Every derivation strategy now runs and is scored on flow-step depth, framework binding, cross-file source resolution via `GlobalSummaries`, and payload availability; the highest-scoring candidate wins and the runner-up ranking lands in the trace so engine gaps stay visible. Cross-file seeding walks the call graph (max depth 5) until a `Source` step or framework binding is found. New `EntryKind` adapters auto-recover the entry surface from framework decorators and annotations. + +### Performance + +- **Per-language build pools.** A warm `javac` daemon compiles batched harness sources in one long-lived JVM (Track O headline, Phase 22); Node, PHP, Ruby, Go, Rust, C, and C++ reuse shared module / package / object caches; Python layers a read-only venv per `requirements_hash` with a warmed bytecode cache. Target per-finding harness build: P50 ≤ 200ms hot, ≤ 1.5s cold. Pools self-skip when a toolchain is absent so toolchain-less CI rows stay green. +- **Copy-on-write workdirs.** Per-finding workdir setup uses `clonefile` on macOS and `reflink` / `copy_file_range` on Linux instead of copying every harness file, cutting setup cost to single-digit milliseconds. +- **Cap-routed concurrency lanes.** The verifier worker pool splits into per-cap lanes (`SSRF: 8`, `DESERIALIZE: 2`, `CRYPTO: 1`, and so on) so a slow harness for one cap cannot head-of-line block fast ones. +- **Ship-gate budgets.** Gate 3 holds the with-verify / static-only wall-clock ratio at ≤ 1.5x on `benches/fixtures/`; Gate 6 holds the Java OWASP Benchmark `--verify` run at ≤ 15 min on CI / ≤ 10 min on the dev reference machine. 
### Determinism, policy, telemetry @@ -51,6 +62,8 @@ The attack-surface map and chain composer turn the flat finding list into a rout - **OWASP Benchmark v1.2 importer.** `tests/eval_corpus/owasp_gt_convert.py` converts the OWASP Java Benchmark expected-results manifest into Nyx ground truth and lands a 16k-line `owasp_benchmark_v1.2.json` for evaluation. - **NIST SARD importer.** `tests/eval_corpus/sard_gt_convert.py` converts SARD test cases into the same format so cross-dataset recall numbers stay comparable. - **Evaluation corpus tooling.** `tests/eval_corpus/run_full.sh` runs the Nyx benchmark, OWASP Benchmark, and NIST SARD evaluation sets and writes `tests/eval_corpus/results.json`. `tests/eval_corpus/report.py` and `tabulate.py` produce the per-cap and per-language summary used to track coverage and accuracy. +- **Real-corpus acceptance gates.** `scripts/m7_ship_gate.sh` adds Gate 6 (Java OWASP Benchmark v1.2), Gate 7 (NodeGoat + Juice Shop), and Gate 8 (RailsGoat, DVWA, DVPWA, gosec, RustSec). Each row enforces the per-`(cap, lang)` budget in `tests/eval_corpus/budget.toml` and publishes per-cap precision / recall / confirmed-rate against a committed ground truth. The corpora are not vendored; each row self-skips unless its `NYX__CORPUS` points at a checkout. +- **Per-spec cryptographic canary.** Every oracle marker is now derived from `BLAKE3(spec_hash || run_nonce)` rather than a fixed literal, so markers are unique per finding, collision-resistant against ambient harness output, and never leak to the host. A compile-time audit rejects any new ad-hoc canary. 
### Engine diff --git a/Cargo.lock b/Cargo.lock index e3c2346e..a51740b0 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -1161,7 +1161,7 @@ dependencies = [ [[package]] name = "nyx-scanner" -version = "0.7.0" +version = "0.8.0" dependencies = [ "assert_cmd", "axum", diff --git a/Cargo.toml b/Cargo.toml index b6a8105d..5920ac73 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "nyx-scanner" -version = "0.7.0" +version = "0.8.0" edition = "2024" rust-version = "1.88" description = "A multi-language static analysis tool for detecting security vulnerabilities" diff --git a/README.md b/README.md index 54141fb1..000d9d7f 100644 --- a/README.md +++ b/README.md @@ -76,7 +76,7 @@ Forward cross-file taint runs in every profile. Symex and the demand-driven back ### GitHub Action ```yaml -- uses: elicpeter/nyx@v0.7.0 +- uses: elicpeter/nyx@v0.8.0 with: format: sarif fail-on: MEDIUM @@ -202,6 +202,28 @@ Detector families: taint (cross-file source→sink, with cap-specific rule class --- +## Verify findings dynamically + +Static analysis says a sink is reachable. Dynamic verification tries to prove it. With `--verify` (on by default), Nyx builds a small harness around each Medium-or-higher finding, runs it in a sandbox against a curated payload corpus, and stamps a verdict onto the finding. + +```bash +nyx scan --verify # build + run a harness per finding (default) +nyx scan --no-verify # static analysis only, for fast local loops +``` + +A finding is **Confirmed** only when an attacker-controlled payload fires the sink *and* every paired benign control stays clean. That differential rule, plus behavioral oracles (a template that renders `49`, a deserializer that resolves a gadget class, a redirect that leaves the origin), keeps the verifier from confirming on an echoed string. Sinks behind a recognized guard demote to `ConfirmedWithKnownGuard`; sinks reached without a completed exploit chain land as `PartiallyConfirmed`. 
+ +Coverage spans 18 capability classes and all ten languages, with 130+ framework adapters across eight of them (Flask, Django, Express, NestJS, Spring, Rails, Laravel, Gin, Axum, and more; C and C++ verify through `argv` and `stdin` entry points only) and per-language build pools and copy-on-write workdirs to keep the per-finding cost low. Confirmed findings write a hermetic repro bundle with a `reproduce.sh`. Runs are deterministic: every payload is seeded from the spec hash. + +```bash +# CI: fail the build if a new Confirmed finding appears vs. a baseline +nyx scan --baseline .nyx/baseline.json --gate no-new-confirmed +``` + +Backends: Docker (preferred, network-blocked by default) or a host-process backend hardened with `--harden {standard,strict}`. Full matrix, oracle list, and limitations: [Dynamic verification](https://nyxscan.dev/docs/dynamic.html). + +--- + ## Configuration Config merges `nyx.conf` (defaults) and `nyx.local` (your overrides) from the platform config directory (`~/.config/nyx/` on Linux, `~/Library/Application Support/nyx/` on macOS, `%APPDATA%\elicpeter\nyx\config\` on Windows). @@ -247,7 +269,7 @@ Limitations: Browse the full docs site at **[nyxscan.dev/docs](https://nyxscan.dev/docs/)**. 
- [Quick Start](https://nyxscan.dev/docs/quickstart.html) · [CLI Reference](https://nyxscan.dev/docs/cli.html) · [Installation](https://nyxscan.dev/docs/installation.html) -- [`nyx serve`](https://nyxscan.dev/docs/serve.html) · [Output Formats](https://nyxscan.dev/docs/output.html) · [Configuration](https://nyxscan.dev/docs/configuration.html) +- [`nyx serve`](https://nyxscan.dev/docs/serve.html) · [Output Formats](https://nyxscan.dev/docs/output.html) · [Configuration](https://nyxscan.dev/docs/configuration.html) · [Dynamic verification](https://nyxscan.dev/docs/dynamic.html) - [How it works](https://nyxscan.dev/docs/how-it-works.html) · [Detectors](https://nyxscan.dev/docs/detectors.html) ([Taint](https://nyxscan.dev/docs/detectors/taint.html), [CFG](https://nyxscan.dev/docs/detectors/cfg.html), [State](https://nyxscan.dev/docs/detectors/state.html), [AST Patterns](https://nyxscan.dev/docs/detectors/patterns.html)) - [Rule Reference](https://nyxscan.dev/docs/rules.html) · [Language Maturity](https://nyxscan.dev/docs/language-maturity.html) · [Advanced Analysis](https://nyxscan.dev/docs/advanced-analysis.html) · [Auth Analysis](https://nyxscan.dev/docs/auth.html) diff --git a/docs/dynamic.md b/docs/dynamic.md index dea6174d..d1633275 100644 --- a/docs/dynamic.md +++ b/docs/dynamic.md @@ -1,73 +1,182 @@ # Dynamic verification -Nyx re-runs findings in generated harnesses when verification is enabled. By -default, `nyx scan` verifies each `Confidence >= Medium` finding, tries -payloads in a sandbox, and writes the result to `evidence.dynamic_verdict`. -Default Nyx builds include the `dynamic` feature; custom +Static analysis tells you a sink is reachable from a source. Dynamic +verification tries to prove it. When verification is on, Nyx builds a small +harness around each finding, runs it in a sandbox against a curated payload +set, and stamps the result onto `evidence.dynamic_verdict`. + +It is a second signal, not a replacement for review. 
A `Confirmed` verdict +means Nyx triggered the sink in its harness with an attacker-controlled +payload and proved the benign control stayed clean. `NotConfirmed` means the +harness ran but nothing fired. Neither verdict closes a finding on its own. + +Default Nyx builds include the `dynamic` feature. Custom `--no-default-features` builds run static-only unless rebuilt with `--features dynamic`. -Dynamic verification is a second signal, not a replacement for review. A -confirmed verdict means Nyx triggered the sink in its harness. `NotConfirmed` -means the harness ran but no payload fired. +## How confirmation works + +Every cap that can be verified ships a curated corpus of payload pairs: at +least one vulnerable payload and one benign control. The verifier runs both +through the same harness and compares. + +- The vulnerable payload must fire the sink. A payload "fires" when an + oracle predicate matches the observed behavior, not when a string appears + in the output. +- The benign control must stay clean. It exercises the same code path with a + value that a correct implementation handles safely. + +A finding is `Confirmed` only when at least one vulnerable payload fires and +every paired benign control stays clean. This differential rule is what keeps +the verifier from confirming a finding just because the harness echoed an +input. 
+ +Oracles are behavioral, scoped to the cap: + +| Cap | Oracle | What it observes | +| --- | --- | --- | +| Command/code injection | stub event | the harness's exec boundary saw the injected command | +| SQL injection | stub event | the SQL boundary saw the injected clause | +| SSRF, data exfil | outbound host | the request left for a host outside the allowlist | +| Path traversal | stub event | the filesystem boundary opened a path outside the root | +| Template injection | template eval | `{{7*7}}` rendered as `49`, not echoed as text | +| Deserialization | gadget marker | a non-allowlisted class was resolved during decode | +| XXE | entity expansion | an external entity was expanded by the parser | +| LDAP / XPath injection | result count | the malicious filter returned more rows than the benign one | +| Header / CRLF | header split | an injected `\r\n` split or added a response header | +| Open redirect | redirect host | the `Location` header pointed off-origin | +| Prototype pollution | canary touch | a property write reached `Object.prototype` | +| Weak crypto | key entropy | the produced key fit inside a 16-bit search space | +| JSON parse abuse | parse depth | the parser accepted a depth past its limit | +| IDOR | ownership cross | the read crossed from the caller's id to another owner's | + +Every canary is derived per-run from `BLAKE3(spec_hash || run_nonce)`, so it is +unique per finding, collision-resistant against ambient harness output, and +never appears on the host. ## Running it ```bash -nyx scan # verifies Medium and High confidence findings -nyx scan --no-verify # static analysis only -nyx scan --verify # explicit form of the default behavior +nyx scan # verifies Medium and High confidence findings +nyx scan --no-verify # static analysis only +nyx scan --verify # explicit form of the default behavior +nyx scan --verify-all-confidence # also verify Low-confidence findings ``` -Use `--no-verify` for fast local checks or editor workflows. 
Keep verification -on for CI when scan time allows it. - -To verify low-confidence findings too: - -```bash -nyx scan --verify-all-confidence -``` - -Use it when tuning payloads or investigating coverage. It is slower and noisier -than the default. +Use `--no-verify` for fast local checks or editor workflows. Keep +verification on for CI when scan time allows it. `--verify-all-confidence` is +slower and noisier; reach for it when tuning payloads or chasing coverage. ## Verdicts | Status | Meaning | | --- | --- | -| `Confirmed` | At least one payload reached the expected sink in the harness. | -| `NotConfirmed` | The harness ran, but no payload reached the sink. Treat the original finding as still open until reviewed. | -| `Inconclusive` | Nyx could not finish the check with enough isolation or runtime support. | -| `Unsupported` | Nyx did not try the finding. Common causes are unsupported language, unsupported sink shape, missing flow steps, or confidence below the verification threshold. | +| `Confirmed` | A vulnerable payload fired the sink and every benign control stayed clean. | +| `PartiallyConfirmed` | The sink was reached but no oracle marker was observed. The exploit chain did not complete. Treat as a strong lead, not a proof. | +| `NotConfirmed` | The harness ran but no payload fired. The path is likely infeasible or the corpus does not cover this shape. The original finding stays open until reviewed. | +| `Inconclusive` | Nyx could not finish the check. Carries a typed reason (build failed, spec derivation failed, sandbox error, policy denied, and others). | +| `Unsupported` | Nyx did not attempt the finding. Carries a typed reason (language unsupported, entry kind unsupported, no payloads for cap, confidence below threshold, no sound oracle). 
| -## Configuration +When a `Confirmed` sink sits behind a recognized input-validation or +output-sanitization guard (Spring `@PreAuthorize`, Express `helmet`, Nest +`@UseGuards`, Django `@permission_classes`), the verdict demotes to +`ConfirmedWithKnownGuard` and the guard names land on +`differential.known_guards`. Authentication-only filters do not trigger the +demotion, since they do not mitigate injection. -To disable verification for a project, set: +`PartiallyConfirmed` is deliberate. It marks the cases where engine work can +ratchet without the tool overstating what it proved. -```toml -[scanner] -verify = false -``` +## Capability coverage -This makes scans static-only unless the command line overrides it. +Caps split into two groups. Data-style injection (SQL, command, path, +SSRF, XSS) uses language-neutral payload bytes (`' OR 1=1--`, `../../etc/passwd`, +a callback URL), so the harness emitter for any language can carry them. The +remaining caps have language-specific payloads (a Java gadget chain is not a +Python pickle), so each language is curated on its own. -The related scanner settings are: +A ✓ means a tuned per-language payload set ships for that cell. Cells marked +`union` in the data-style rows still run, falling back to the +language-neutral payload union. 
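The selection rule behind the matrix can be sketched as a lookup with a fallback: use the tuned `(cap, lang)` set when one ships, use the language-neutral union for data-style caps, and return nothing for a language-specific cap with no tuned set. `Catalog`, `select`, and the three-variant `Cap` enum are hypothetical names for illustration. What happens downstream of `None` is not modeled here.

```rust
use std::collections::HashMap;

/// A small subset of caps, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Cap {
    SqlQuery,
    PathTraversal,
    Deserialize,
}

impl Cap {
    /// Data-style caps carry language-neutral payload bytes.
    fn is_data_style(self) -> bool {
        matches!(self, Cap::SqlQuery | Cap::PathTraversal)
    }
}

/// Hypothetical payload catalog: tuned per-(cap, lang) sets plus a
/// language-neutral union for the data-style caps.
struct Catalog {
    tuned: HashMap<(Cap, String), Vec<String>>,
    union: HashMap<Cap, Vec<String>>,
}

impl Catalog {
    /// Prefer the tuned set for this cell. Data-style caps fall back to the
    /// union. Language-specific caps without a tuned set get nothing.
    fn select(&self, cap: Cap, lang: &str) -> Option<&[String]> {
        if let Some(set) = self.tuned.get(&(cap, lang.to_string())) {
            return Some(set.as_slice());
        }
        if cap.is_data_style() {
            return self.union.get(&cap).map(|v| v.as_slice());
        }
        None
    }
}

fn main() {
    let mut tuned = HashMap::new();
    // Placeholder content; real tuned sets are curated per language.
    tuned.insert((Cap::SqlQuery, "rust".to_string()), vec!["placeholder: rust-tuned".to_string()]);
    let mut union = HashMap::new();
    union.insert(Cap::SqlQuery, vec!["' OR 1=1--".to_string()]);
    union.insert(Cap::PathTraversal, vec!["../../etc/passwd".to_string()]);
    let catalog = Catalog { tuned, union };

    // Rust has a tuned SQL set; Python falls back to the union.
    println!("{:?}", catalog.select(Cap::SqlQuery, "rust"));
    println!("{:?}", catalog.select(Cap::SqlQuery, "python"));
    // No tuned deserialization set for Go and no union fallback.
    println!("{:?}", catalog.select(Cap::Deserialize, "go"));
}
```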
-| Setting | Default | Meaning | +| Cap | Py | JS | TS | Java | PHP | Ruby | Go | Rust | C | C++ | +| --- | -- | -- | -- | ---- | --- | ---- | -- | ---- | - | --- | +| Command / code injection | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | +| SQL injection | union | union | union | union | union | union | union | ✓ | union | union | +| Path traversal | union | union | union | union | union | union | union | ✓ | union | union | +| SSRF | union | union | union | union | union | union | union | ✓ | union | union | +| XSS | union | union | union | union | union | union | union | ✓ | union | union | +| Format string | | | | | | | | | ✓ | | +| Deserialization | ✓ | | | ✓ | ✓ | ✓ | | | | | +| Template injection | ✓ | ✓ | | ✓ | ✓ | ✓ | | | | | +| XXE | ✓ | | | ✓ | ✓ | ✓ | ✓ | | | | +| LDAP injection | ✓ | | | ✓ | ✓ | | | | | | +| XPath injection | ✓ | ✓ | | ✓ | ✓ | | | | | | +| Header / CRLF | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | | | +| Open redirect | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | | | +| Prototype pollution | | ✓ | ✓ | | | | | | | | +| Weak crypto | ✓ | | | ✓ | ✓ | | ✓ | ✓ | | | +| JSON parse abuse | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | | | +| IDOR | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | | | +| Data exfiltration | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | | | + +`ENV_VAR`, `SHELL_ESCAPE`, and `URL_ENCODE` are source and sanitizer caps with +no externally observable sink behavior. They route to +`Unsupported(SoundOracleUnavailable)` rather than counting as a missing-payload +gap. + +## Framework adapters + +Adapters bind a function to its external entry surface so the harness can +drive the real entry point (an HTTP request through the framework, a published +message, a scheduled fire) instead of calling the function in isolation. +Middleware and request validation participate in the verdict that way. + +| Language | HTTP routers | Other surfaces | | --- | --- | --- | -| `verify` | `true` | Run dynamic verification after static analysis. 
| -| `verify_all_confidence` | `false` | Include findings below `Confidence::Medium`. | -| `verify_backend` | `"auto"` | Use Docker when available, otherwise use the process backend. | -| `harden_profile` | `"standard"` | Hardening profile for the process backend. | +| Python | Flask, Django, FastAPI, Starlette | Jinja2, pickle, LDAP, Celery, Kafka, SQS, Pub/Sub, RabbitMQ, Django Channels, Socket.IO, Django middleware, Django + Flask migrations | +| JavaScript | Express, Koa, NestJS, Fastify | Handlebars, Apollo + Relay GraphQL, lodash.merge + JSON deep-assign, Socket.IO, SQS, Express middleware, Knex + Prisma + Sequelize migrations | +| TypeScript | NestJS | Object.assign + lodash.merge + JSON deep-assign | +| Java | Spring, Quarkus, Micronaut, Jakarta Servlet | Thymeleaf, ObjectInputStream, Spring LDAP, Kafka, SQS, RabbitMQ, Quartz, Spring middleware, Flyway + Liquibase migrations | +| PHP | Laravel, Symfony, CodeIgniter | Twig, unserialize, LDAP, Laravel middleware, Laravel migrations | +| Ruby | Rails, Sinatra, Hanami | ERB, Marshal, Sidekiq, ActionCable, Rails middleware, Rails migrations | +| Go | Gin, Echo, Fiber, Chi | gqlgen GraphQL, NATS, Pub/Sub, go-migrate migrations | +| Rust | Axum, Actix, Rocket, Warp | Juniper GraphQL, Refinery + SQLx migrations | +| C / C++ | none | argv / stdin entry only | -See [Configuration](configuration.md) for the full config table. +Adapters are sanitizer-aware. An XXE, header-injection, open-redirect, SSTI, +LDAP, XPath, deserialization, crypto, or data-exfil adapter declines the +binding when the surrounding source visibly hardens the call: a parser set to +`disallow-doctype-decl` or `resolve_entities=False`, a value routed through +`LdapEncoder.filterEncode` or `escape_filter_chars`, a weak primitive swapped +for `secrets.token_bytes` or `crypto.randomBytes` or `SecureRandom`, or a +redirect host checked against an allowlist. That cuts adapter false positives +without losing the genuinely dangerous calls. 
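The decline check can be sketched as a per-cap scan of the code around the call for the hardening markers this section names. This is a simplification under stated assumptions: `AdapterCap`, `hardening_markers`, and `declining_marker` are hypothetical, plain substring matching stands in for whatever analysis the real adapters perform, and the redirect-allowlist case is not modeled.

```rust
/// A subset of sanitizer-aware adapter caps (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AdapterCap {
    Xxe,
    Ldap,
    Crypto,
}

/// Hardening markers quoted in the adapter description, grouped by cap so an
/// XXE marker never declines a crypto binding.
fn hardening_markers(cap: AdapterCap) -> &'static [&'static str] {
    match cap {
        // Parser rejects DOCTYPEs, or lxml entity resolution is switched off.
        AdapterCap::Xxe => &["disallow-doctype-decl", "resolve_entities=False"],
        // The filter value is escaped before it reaches the LDAP query.
        AdapterCap::Ldap => &["LdapEncoder.filterEncode", "escape_filter_chars"],
        // The weak primitive was swapped for a CSPRNG.
        AdapterCap::Crypto => &["secrets.token_bytes", "crypto.randomBytes", "SecureRandom"],
    }
}

/// Returns the first hardening marker found in the surrounding source, meaning
/// the adapter should decline the binding, or `None` to keep it.
fn declining_marker(cap: AdapterCap, surrounding_source: &str) -> Option<&'static str> {
    hardening_markers(cap)
        .iter()
        .copied()
        .find(|marker| surrounding_source.contains(*marker))
}

fn main() {
    let hardened = "parser = etree.XMLParser(resolve_entities=False)\ndoc = etree.fromstring(body, parser)";
    let plain = "doc = etree.fromstring(body)";

    println!("{:?}", declining_marker(AdapterCap::Xxe, hardened)); // adapter declines
    println!("{:?}", declining_marker(AdapterCap::Xxe, plain)); // adapter binds
}
```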
+ +## Entry points + +The verifier knows how to stand up these entry shapes: + +`Function`, `HttpRoute`, `CliSubcommand`, `LibraryApi`, `ClassMethod`, +`MessageHandler`, `ScheduledJob`, `GraphQLResolver`, `WebSocket`, +`Middleware`, `Migration`. + +`ClassMethod` walks constructor parameters and builds the receiver, preferring +a default constructor and otherwise stubbing dependencies (`MockHttpClient`, +`MockDatabaseConnection`, `MockLogger`) up to a bounded depth. `MessageHandler` +boots an in-sandbox broker stub on loopback and publishes the payload. +`Migration` runs under a database-in-test-mode profile with no real +connection. An entry kind a language emitter does not yet support produces +`Inconclusive(EntryKindUnsupported)` with a hint, never a silent skip. ## Sandbox backends ```bash -nyx scan --backend docker # require Docker -nyx scan --backend process # run directly on the host with weaker isolation +nyx scan --backend auto # docker when available, else process (default) +nyx scan --backend docker # require docker +nyx scan --backend process # run on the host with weaker isolation nyx scan --unsafe-sandbox # alias for --backend process +nyx scan --harden strict # full process-backend lockdown ``` Docker is the preferred backend. It mounts only the entry file's directory and @@ -76,19 +185,54 @@ start for callback-style payloads (SSRF, blind SSTI). When the bind succeeds, Docker switches to bridge networking with a host-gateway route so the harness can reach the listener; OOB payloads are skipped if the bind fails. -The process backend is useful for development and machines without Docker. It -does not provide the same isolation. +The process backend runs on the host. It is useful for development and +machines without Docker, and it does not provide the same isolation. Hardening +profiles apply to it: + +- `standard` (default): no-new-privs plus a memory rlimit on Linux. No + `sandbox-exec` wrap on macOS. 
+- `strict`: namespace unshare, chroot to the workdir, and a default-deny + seccomp filter on Linux; `sandbox-exec -f .sb` on macOS. Opt-in, + because interpreted Linux harnesses can die with SIGSYS until the + per-language seccomp allowlists are widened. + +Every sink under test passes through the policy deny rules in +`src/dynamic/policy.rs` before the harness builds. Network egress, writes +outside the sandbox root, and process spawns can be denied per rule, and the +deny decision lands in the trace. + +## Performance + +Verification adds a harness build and a sandbox run per finding. Three pieces of +infrastructure keep that affordable at corpus scale. + +Per-language build pools reuse a warm toolchain across findings instead of +cold-starting one each time. Java runs a long-lived `javac` daemon; Node, PHP, +Ruby, Go, Rust, C, and C++ reuse shared module, package, and object caches; +Python layers a read-only venv with a warmed bytecode cache. The target is a +P50 harness build at or under 200ms hot and 1.5s cold, with the Java OWASP +Benchmark `--verify` run finishing within 10 minutes on the dev reference +machine (15 minutes on CI). + +Copy-on-write workdirs (`clonefile` on macOS, `reflink` or `copy_file_range` +on Linux) replace per-finding file copies, and the worker pool routes findings +into per-cap concurrency lanes so a slow `DESERIALIZE` harness does not block +fast `SSRF` ones. + +The CI ship gate holds the with-verify to static-only wall-clock ratio at or +under 1.5x on `benches/fixtures/`. If a change pushes it past that, the gate +fails. ## Repro artifacts -Confirmed findings write a repro bundle under: +Confirmed findings write a hermetic bundle: ```text ~/.cache/nyx/dynamic/repro// ``` -The bundle contains the harness spec, payload, expected output, trace, and -`reproduce.sh`. +The bundle carries the harness spec, payload, expected output, trace, and a +`reproduce.sh`. When the toolchain is pinned in `tools/image-builder/images.toml` +it also writes a `docker_pull.sh`. 
```bash cd ~/.cache/nyx/dynamic/repro/ @@ -96,15 +240,21 @@ cd ~/.cache/nyx/dynamic/repro/ ./reproduce.sh --docker ``` -Use the Docker form when the bundle records a pinned container image or when -host toolchains differ from the original run. +Use the Docker form when the bundle records a pinned image or when host +toolchains differ from the original run. -## Runtime cost +## Configuration -Verification adds harness build time and sandbox startup time for each verified -finding. For quick local checks, `--no-verify` is usually the right choice. For -CI or scheduled scans, keep verification enabled so confirmed findings rank -higher and not-confirmed findings carry the extra context. +```toml +[scanner] +verify = true # run dynamic verification after static analysis +verify_all_confidence = false # include findings below Confidence::Medium +verify_backend = "auto" # auto | docker | process | firecracker +harden_profile = "standard" # standard | strict +``` + +Set `verify = false` to make scans static-only unless the command line +overrides it. See [Configuration](configuration.md) for the full table. ## Event log @@ -119,10 +269,10 @@ Each line is a JSON object with a versioned envelope: ```json { "schema_version": 1, - "nyx_version": "0.7.0", + "nyx_version": "0.8.0", "corpus_version": "15", "kind": "verdict", - "ts": "2026-05-15T18:42:09Z", + "ts": "2026-06-01T18:42:09Z", "finding_id": "a3b1...", "spec_hash": "9f4e...", "lang": "python", @@ -135,35 +285,37 @@ Each line is a JSON object with a versioned envelope: } ``` -The literal `nyx_version` and `corpus_version` values shift between releases; see `crate::dynamic::telemetry::CORPUS_VERSION` for the active payload-corpus version your binary writes. +The literal `nyx_version` and `corpus_version` values shift between releases; +see `crate::dynamic::telemetry::CORPUS_VERSION` for the active payload-corpus +version your binary writes. | Field | Meaning | | --- | --- | | `schema_version` | Event schema version. 
Readers reject mismatches. | | `nyx_version` | Version of the Nyx binary that wrote the event. | | `corpus_version` | Payload corpus version used for the verdict. | -| `kind` | `verdict` or `rank_delta`. Feedback rows use an `event: "verify_feedback"` field instead and may pre-date the schema envelope. | +| `kind` | `verdict` or `rank_delta`. Feedback rows use an `event: "verify_feedback"` field instead. | | `ts` | Write time in RFC 3339 format. | | `finding_id` | Stable finding identifier. | | `spec_hash` | Hash of the harness spec. | | `lang` | Language slug, or `unknown` when spec derivation failed. | | `cap` | Sink capability, such as `SQL_QUERY` or `CODE_EXEC`. | -| `status` | `Confirmed`, `NotConfirmed`, `Inconclusive`, or `Unsupported`. | +| `status` | `Confirmed`, `PartiallyConfirmed`, `NotConfirmed`, `Inconclusive`, or `Unsupported`. | | `inconclusive_reason` | Present when `status` is `Inconclusive`. | If the schema changes, move or delete the old `events.jsonl` before reading it with the new binary. Programmatic readers should use `crate::dynamic::telemetry::read_events(path)`. -## Sampling +### Sampling `[telemetry]` in `nyx.toml` controls event retention: ```toml [telemetry] -keep_all_confirmed = true +keep_all_confirmed = true keep_all_inconclusive = true -sample_rate_other = 1.0 +sample_rate_other = 1.0 ``` `sample_rate_other` accepts `0.0` to `1.0` and applies to `NotConfirmed` and @@ -182,9 +334,44 @@ nyx verify-feedback --wrong "reason" Feedback is written to the local event log. Nyx does not upload it. +## Determinism + +Every random source is seeded from the spec hash, so two runs of the same spec +produce identical payloads and identical verdicts. `scripts/check_no_unseeded_rand.sh` +audits the tree for unseeded `rand` usage on every CI run. + +## Limitations + +- The harness drives the sink, not always the enclosing function. 
When a + finding's safety comes from a guard in the code around the sink (a merge + target built with `Object.create(null)`, an `ObjectInputStream` subclass + whose `resolveClass` enforces an allowlist, a const-name check before + `Marshal.load`), the synthesized harness can exercise the sink directly and + miss that guard, which over-confirms. Read `Confirmed` as "this sink is + reachable and fires on attacker input," not "this exact code path has no + in-line mitigation." Framework-level guards (auth middleware, helmet) are + recognized and demote to `ConfirmedWithKnownGuard`; custom in-function guards + are not yet captured. +- Per-language payload curation is uneven. Command and code injection ship for + all ten languages; the classic data-style injection caps (SQL, path + traversal, SSRF, XSS) ship a tuned set for Rust and fall back to a + language-neutral payload union elsewhere; the framework-specific caps are + curated for the languages where they occur. The matrix above is the precise + state. +- A `NotConfirmed` verdict is not a clean bill. It means the harness did not + fire, which can be an infeasible path or a corpus that does not cover the + shape. Keep reviewing `NotConfirmed` findings. +- The process backend is weaker isolation than Docker. Use `--backend docker` + or `--harden strict` for untrusted code, and never `--unsafe-sandbox` in CI. +- Real-corpus acceptance rows (OWASP Benchmark, NodeGoat, Juice Shop, and the + polyglot set) self-skip in CI unless the corresponding `NYX_*_CORPUS` + environment variable points at a checkout. They are not vendored into the + repo. +- C and C++ have no framework adapters. Findings in those languages verify + through `argv` and `stdin` entry points only. + ## Browser UI `nyx serve` shows dynamic verdicts on finding detail pages, uses them in -ranking, and can compare verdict changes between saved scans. - -See [Output formats](output.md) for the `dynamic_verdict` schema. 
+ranking, and can compare verdict changes between saved scans. See +[Output formats](output.md) for the `dynamic_verdict` schema. diff --git a/src/dynamic/telemetry.rs b/src/dynamic/telemetry.rs index 21cfccfc..0e9ba086 100644 --- a/src/dynamic/telemetry.rs +++ b/src/dynamic/telemetry.rs @@ -20,8 +20,8 @@ //! ```json //! { //! "schema_version": 1, -//! "nyx_version": "0.7.0", -//! "corpus_version": "4", +//! "nyx_version": "0.8.0", +//! "corpus_version": "15", //! "kind": "verdict", //! "ts": "", //! "finding_id": "...",