nyx/docs/dynamic.md

397 lines
18 KiB
Markdown
Raw Normal View History

2026-06-05 10:16:30 -05:00
# Dynamic verification
Static analysis tells you a sink is reachable from a source. Dynamic
verification tries to prove it. When verification is on, Nyx builds a small
harness around each finding, runs it in a sandbox against a curated payload
set, and stamps the result onto `evidence.dynamic_verdict`.
It is a second signal, not a replacement for review. A `Confirmed` verdict
means Nyx triggered the sink in its harness with an attacker-controlled
payload and proved the benign control stayed clean. `NotConfirmed` means the
harness ran but nothing fired. Neither verdict closes a finding on its own.
Default Nyx builds include the `dynamic` feature. Custom
`--no-default-features` builds run static-only unless rebuilt with
`--features dynamic`.
## How confirmation works
Every cap that can be verified ships a curated corpus of payload pairs: at
least one vulnerable payload and one benign control. The verifier runs both
through the same harness and compares.
- The vulnerable payload must fire the sink. A payload "fires" when an
oracle predicate matches the observed behavior, not when a string appears
in the output.
- The benign control must stay clean. It exercises the same code path with a
value that a correct implementation handles safely.
A finding is `Confirmed` only when at least one vulnerable payload fires and
every paired benign control stays clean. This differential rule is what keeps
the verifier from confirming a finding just because the harness echoed an
input.
Oracles are behavioral, scoped to the cap:
| Cap | Oracle | What it observes |
| --- | --- | --- |
| Command/code injection | stub event | the harness's exec boundary saw the injected command |
| SQL injection | stub event | the SQL boundary saw the injected clause |
| SSRF, data exfil | outbound host | the request left for a host outside the allowlist |
| Path traversal | stub event | the filesystem boundary opened a path outside the root |
| Template injection | template eval | `{{7*7}}` rendered as `49`, not echoed as text |
| Deserialization | gadget marker | a non-allowlisted class was resolved during decode |
| XXE | entity expansion | an external entity was expanded by the parser |
| LDAP / XPath injection | result count | the malicious filter returned more rows than the benign one |
| Header / CRLF | header split | an injected `\r\n` split or added a response header |
| Open redirect | redirect host | the `Location` header pointed off-origin |
| Prototype pollution | canary touch | a property write reached `Object.prototype` |
| Weak crypto | key entropy | the produced key fit inside a 16-bit search space |
| JSON parse abuse | parse depth | the parser accepted a depth past its limit |
| IDOR | ownership cross | the read crossed from the caller's id to another owner's |
Every canary is derived per-run from `BLAKE3(spec_hash || run_nonce)`, so it is
unique per finding, collision-resistant against ambient harness output, and
never appears on the host.
## Running it
```bash
nyx scan # verifies Medium and High confidence findings
nyx scan --no-verify # static analysis only
nyx scan --verify # explicit form of the default behavior
nyx scan --verify-all-confidence # also verify Low-confidence findings
```
Use `--no-verify` for fast local checks or editor workflows. Keep
verification on for CI when scan time allows it. `--verify-all-confidence` is
slower and noisier; reach for it when tuning payloads or chasing coverage.
## Verdicts
| Status | Meaning |
| --- | --- |
| `Confirmed` | A vulnerable payload fired the sink and every benign control stayed clean. |
| `PartiallyConfirmed` | The sink was reached but no oracle marker was observed. The exploit chain did not complete. Treat as a strong lead, not a proof. |
| `NotConfirmed` | The harness ran but no payload fired. The path is likely infeasible or the corpus does not cover this shape. The original finding stays open until reviewed. |
| `Inconclusive` | Nyx could not finish the check. Carries a typed reason (build failed, spec derivation failed, sandbox error, policy denied, and others). |
| `Unsupported` | Nyx did not attempt the finding. Carries a typed reason (language unsupported, entry kind unsupported, no payloads for cap, confidence below threshold, no sound oracle). |
When a `Confirmed` sink sits behind a recognized input-validation or
output-sanitization guard (Spring `@PreAuthorize`, Express `helmet`, Nest
`@UseGuards`, Django `@permission_classes`), the verdict demotes to
`ConfirmedWithKnownGuard` and the guard names land on
`differential.known_guards`. Authentication-only filters do not trigger the
demotion, since they do not mitigate injection.
`PartiallyConfirmed` is deliberate. It marks the cases where engine work can
ratchet without the tool overstating what it proved.
## Capability coverage
Caps split into two groups. Data-style injection (SQL, command, path,
SSRF, XSS) uses language-neutral payload bytes (`' OR 1=1--`, `../../etc/passwd`,
a callback URL), so the harness emitter for any language can carry them. The
caps below have language-specific payloads (a Java gadget chain is not a
Python pickle), so each language is curated on its own.
A checkmark means a tuned per-language payload set ships for that cell. Cells
without a checkmark in the data-style rows still run, falling back to the
language-neutral payload union.
| Cap | Py | JS | TS | Java | PHP | Ruby | Go | Rust | C | C++ |
| --- | -- | -- | -- | ---- | --- | ---- | -- | ---- | - | --- |
| Command / code injection | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SQL injection | union | union | union | union | union | union | union | ✓ | union | union |
| Path traversal | union | union | union | union | union | union | union | ✓ | union | union |
| SSRF | union | union | union | union | union | union | union | ✓ | union | union |
| XSS | union | union | union | union | union | union | union | ✓ | union | union |
| Format string | | | | | | | | | ✓ | |
| Deserialization | ✓ | | | ✓ | ✓ | ✓ | | | | |
| Template injection | ✓ | ✓ | | ✓ | ✓ | ✓ | | | | |
| XXE | ✓ | | | ✓ | ✓ | ✓ | ✓ | | | |
| LDAP injection | ✓ | | | ✓ | ✓ | | | | | |
| XPath injection | ✓ | ✓ | | ✓ | ✓ | | | | | |
| Header / CRLF | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Open redirect | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Prototype pollution | | ✓ | ✓ | | | | | | | |
| Weak crypto | ✓ | | | ✓ | ✓ | | ✓ | ✓ | | |
| JSON parse abuse | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| IDOR | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Data exfiltration | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | | |
`ENV_VAR`, `SHELL_ESCAPE`, and `URL_ENCODE` are source and sanitizer caps with
no externally observable sink behavior. They route to
`Unsupported(SoundOracleUnavailable)` rather than counting as a missing-payload
gap.
## Framework adapters
Adapters bind a function to its external entry surface so the harness can
drive the real entry point (an HTTP request through the framework, a published
message, a scheduled fire) instead of calling the function in isolation.
Middleware and request validation participate in the verdict that way.
| Language | HTTP routers | Other surfaces |
| --- | --- | --- |
| Python | Flask, Django, FastAPI, Starlette | Jinja2, pickle, LDAP, Celery, Kafka, SQS, Pub/Sub, RabbitMQ, Django Channels, Socket.IO, Django middleware, Django + Flask migrations |
| JavaScript | Express, Koa, NestJS, Fastify | Handlebars, Apollo + Relay GraphQL, lodash.merge + JSON deep-assign, Socket.IO, SQS, Express middleware, Knex + Prisma + Sequelize migrations |
| TypeScript | NestJS | Object.assign + lodash.merge + JSON deep-assign |
| Java | Spring, Quarkus, Micronaut, Jakarta Servlet | Thymeleaf, ObjectInputStream, Spring LDAP, Kafka, SQS, RabbitMQ, Quartz, Spring middleware, Flyway + Liquibase migrations |
| PHP | Laravel, Symfony, CodeIgniter | Twig, unserialize, LDAP, Laravel middleware, Laravel migrations |
| Ruby | Rails, Sinatra, Hanami | ERB, Marshal, Sidekiq, ActionCable, Rails middleware, Rails migrations |
| Go | Gin, Echo, Fiber, Chi | gqlgen GraphQL, NATS, Pub/Sub, go-migrate migrations |
| Rust | Axum, Actix, Rocket, Warp | Juniper GraphQL, Refinery + SQLx migrations |
| C / C++ | none | argv / stdin entry only |
Adapters are sanitizer-aware. An XXE, header-injection, open-redirect, SSTI,
LDAP, XPath, deserialization, crypto, or data-exfil adapter declines the
binding when the surrounding source visibly hardens the call: a parser set to
`disallow-doctype-decl` or `resolve_entities=False`, a value routed through
`LdapEncoder.filterEncode` or `escape_filter_chars`, a weak primitive swapped
for `secrets.token_bytes` or `crypto.randomBytes` or `SecureRandom`, or a
redirect host checked against an allowlist. That cuts adapter false positives
without losing the genuinely dangerous calls.
## Entry points
The verifier knows how to stand up these entry shapes:
`Function`, `HttpRoute`, `CliSubcommand`, `LibraryApi`, `ClassMethod`,
`MessageHandler`, `ScheduledJob`, `GraphQLResolver`, `WebSocket`,
`Middleware`, `Migration`.
`ClassMethod` walks constructor parameters and builds the receiver, preferring
a default constructor and otherwise stubbing dependencies (`MockHttpClient`,
`MockDatabaseConnection`, `MockLogger`) up to a bounded depth. `MessageHandler`
boots an in-sandbox broker stub on loopback and publishes the payload.
`Migration` runs under a database-in-test-mode profile with no real
connection. An entry kind a language emitter does not yet support produces
`Inconclusive(EntryKindUnsupported)` with a hint, never a silent skip.
## Sandbox backends
```bash
nyx scan --backend auto # docker when available, else process (default)
nyx scan --backend docker # require docker
nyx scan --backend process # run on the host with weaker isolation
nyx scan --unsafe-sandbox # alias for --backend process
nyx scan --harden strict # full process-backend lockdown
```
Docker is the preferred backend. It mounts only the entry file's directory and
blocks outbound network by default. Nyx binds a loopback OOB listener at scan
start for callback-style payloads (SSRF, blind SSTI). When the bind succeeds,
Docker switches to bridge networking with a host-gateway route so the harness
can reach the listener; OOB payloads are skipped if the bind fails.
The process backend runs on the host. It is useful for development and
machines without Docker, and it does not provide the same isolation. Hardening
profiles apply to it:
- `standard` (default): no-new-privs plus a memory rlimit on Linux. No
`sandbox-exec` wrap on macOS.
- `strict`: namespace unshare, chroot to the workdir, and a default-deny
seccomp filter on Linux; `sandbox-exec -f <cap>.sb` on macOS. Opt-in,
because interpreted Linux harnesses can SIGSYS until the per-language seccomp
allowlists are widened.
Every sink under test passes through the policy deny rules in
`src/dynamic/policy.rs` before the harness builds. Network egress, writes
outside the sandbox root, and process spawns can be denied per rule, and the
deny decision lands in the trace.
## Performance
Verification adds a harness build and a sandbox run per finding. Two pieces of
infrastructure keep that affordable at corpus scale.
Per-language build pools reuse a warm toolchain across findings instead of
cold-starting one each time. Java runs a long-lived `javac` daemon; Node, PHP,
Ruby, Go, Rust, C, and C++ reuse shared module, package, and object caches;
Python layers a read-only venv with a warmed bytecode cache. The target is a
P50 harness build at or under 200ms hot and 1.5s cold, with an OWASP-scale run
finishing in 10 minutes on the dev reference machine.
Copy-on-write workdirs (`clonefile` on macOS, `reflink` or `copy_file_range`
on Linux) replace per-finding file copies, and the worker pool routes findings
into per-cap concurrency lanes so a slow `DESERIALIZE` harness does not block
fast `SSRF` ones.
The CI ship gate holds the with-verify to static-only wall-clock ratio at or
under 1.5x on `benches/fixtures/`. If a change pushes it past that, the gate
fails.
## Repro artifacts
2026-06-05 13:10:58 -05:00
Confirmed findings write a hermetic bundle under Nyx's platform cache
directory:
2026-06-05 10:16:30 -05:00
```text
2026-06-05 13:10:58 -05:00
<cache-dir>/nyx/dynamic/repro/<spec_hash>/
2026-06-05 10:16:30 -05:00
```
2026-06-05 13:10:58 -05:00
On Linux this is usually `~/.cache/nyx/dynamic/repro/<spec_hash>/`; on macOS
it is usually `~/Library/Caches/nyx/dynamic/repro/<spec_hash>/`.
2026-06-05 10:16:30 -05:00
The bundle carries the harness spec, payload, expected output, trace, and a
`reproduce.sh`. When the toolchain is pinned in `tools/image-builder/images.toml`
it also writes a `docker_pull.sh`.
2026-06-05 13:10:58 -05:00
The easiest replay path starts from the finding id shown in scan output or the
browser UI:
```bash
nyx repro --finding <finding_id>
nyx repro --finding <finding_id> --docker
```
You can also replay an exact bundle by spec hash, or inspect the shell script
directly:
2026-06-05 10:16:30 -05:00
```bash
2026-06-05 13:10:58 -05:00
nyx repro --spec-hash <spec_hash>
cd <cache-dir>/nyx/dynamic/repro/<spec_hash>
2026-06-05 10:16:30 -05:00
./reproduce.sh
./reproduce.sh --docker
```
Use the Docker form when the bundle records a pinned image or when host
toolchains differ from the original run.
## Configuration
```toml
[scanner]
verify = true # run dynamic verification after static analysis
verify_all_confidence = false # include findings below Confidence::Medium
verify_backend = "auto" # auto | docker | process | firecracker
harden_profile = "standard" # standard | strict
```
Set `verify = false` to make scans static-only unless the command line
overrides it. See [Configuration](configuration.md) for the full table.
## Event log
Nyx writes verdict events to:
```text
~/.cache/nyx/dynamic/events.jsonl
```
Each line is a JSON object with a versioned envelope:
```json
{
"schema_version": 1,
"nyx_version": "0.8.0",
"corpus_version": "15",
"kind": "verdict",
"ts": "2026-06-01T18:42:09Z",
"finding_id": "a3b1...",
"spec_hash": "9f4e...",
"lang": "python",
"cap": "SQL_QUERY",
"status": "Confirmed",
"toolchain_id": "python-3.11",
"toolchain_match": "exact",
"duration_ms": 312,
"build_attempts": 1
}
```
The literal `nyx_version` and `corpus_version` values shift between releases;
see `crate::dynamic::telemetry::CORPUS_VERSION` for the active payload-corpus
version your binary writes.
| Field | Meaning |
| --- | --- |
| `schema_version` | Event schema version. Readers reject mismatches. |
| `nyx_version` | Version of the Nyx binary that wrote the event. |
| `corpus_version` | Payload corpus version used for the verdict. |
| `kind` | `verdict` or `rank_delta`. Feedback rows use an `event: "verify_feedback"` field instead. |
| `ts` | Write time in RFC 3339 format. |
| `finding_id` | Stable finding identifier. |
| `spec_hash` | Hash of the harness spec. |
| `lang` | Language slug, or `unknown` when spec derivation failed. |
| `cap` | Sink capability, such as `SQL_QUERY` or `CODE_EXEC`. |
| `status` | `Confirmed`, `PartiallyConfirmed`, `NotConfirmed`, `Inconclusive`, or `Unsupported`. |
| `inconclusive_reason` | Present when `status` is `Inconclusive`. |
If the schema changes, move or delete the old `events.jsonl` before reading it
with the new binary. Programmatic readers should use
`crate::dynamic::telemetry::read_events(path)`.
### Sampling
`[telemetry]` in `nyx.toml` controls event retention:
```toml
[telemetry]
keep_all_confirmed = true
keep_all_inconclusive = true
sample_rate_other = 1.0
```
`sample_rate_other` accepts `0.0` to `1.0` and applies to `NotConfirmed` and
`Unsupported` verdicts. The decision is deterministic for a given `spec_hash`.
Confirmed, Inconclusive, and rank-delta events are always kept by default.
Set `NYX_NO_TELEMETRY=1` to disable event writes.
## Feedback
To record a bad verdict:
```bash
nyx verify-feedback <finding_id> --wrong "reason"
```
Feedback is written to the local event log. Nyx does not upload it.
## Determinism
Every random source is seeded from the spec hash, so two runs of the same spec
produce identical payloads and identical verdicts. `scripts/check_no_unseeded_rand.sh`
audits the tree for unseeded `rand` usage on every CI run.
## Limitations
- The harness drives the finding's enclosing entry function when one is
derivable, routing the payload to the tainted parameter, so a guard in the
code around the sink (a merge target built with `Object.create(null)`, an
`ObjectInputStream` subclass whose `resolveClass` enforces an allowlist, a
const-name check before `Marshal.load`) runs first and participates in the
verdict. The build-time choice is recorded on the verify trace as
`entry_invocation` (`mode=entry_function` or `mode=direct_sink`). When no
enclosing entry can be derived the harness falls back to driving the sink
directly, and that fallback can over-confirm a guard it never executes. Read
a `direct_sink` `Confirmed` as "this sink is reachable and fires on attacker
input," not "this exact code path has no in-line mitigation." Framework-level
guards (auth middleware, helmet) are also recognized and demote to
`ConfirmedWithKnownGuard`.
- Per-language payload curation is uneven. Command and code injection ship for
all ten languages; the classic data-style injection caps (SQL, path
traversal, SSRF, XSS) ship a tuned set for Rust and fall back to a
language-neutral payload union elsewhere; the framework-specific caps are
curated for the languages where they occur. The matrix above is the precise
state.
- A `NotConfirmed` verdict is not a clean bill. It means the harness did not
fire, which can be an infeasible path or a corpus that does not cover the
shape. Keep reviewing `NotConfirmed` findings.
- The process backend is weaker isolation than Docker. Use `--backend docker`
or `--harden strict` for untrusted code, and never `--unsafe-sandbox` in CI.
- Real-corpus acceptance rows (OWASP Benchmark, NodeGoat, Juice Shop, and the
polyglot set) self-skip in CI unless the corresponding `NYX_*_CORPUS`
environment variable points at a checkout. They are not vendored into the
repo.
- C and C++ have no framework adapters. Findings in those languages verify
through `argv` and `stdin` entry points only.
## Browser UI
`nyx serve` shows dynamic verdicts on finding detail pages, uses them in
ranking, and can compare verdict changes between saved scans. See
[Output formats](output.md) for the `dynamic_verdict` schema.