Improved path traversal detection and enhanced sink classification logic

This commit is contained in:
Eli Peter 2026-05-02 03:36:14 -04:00 committed by GitHub
parent 58f1794a4e
commit 3c89bddbf2
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
56 changed files with 3989 additions and 345 deletions

View file

@ -31,6 +31,22 @@ SQL sink as an injection risk; an SSRF sink whose URL prefix is locked to a
trusted host stays quiet. This turns a large class of FPs on numeric and
locked-prefix paths into true negatives.
**Path traversal.** The path domain accepts canonicalised-and-rooted
shapes via `PathFact::is_path_traversal_safe`: a path that is
dotdot-free and either non-absolute or carries a verified prefix-lock has
its `Cap::FILE_IO` cleared. When the lock argument is a string literal
the lock prefix is recorded directly; when it is a method call, field
access, or configured root, an `OPAQUE_PREFIX_LOCK` marker captures the
structural invariant ("rooted under SOME prefix") instead. This closes
the Ruby `File.expand_path + start_with?(root)`, Python
`os.path.realpath + .startswith(root)`, and JS
`path.resolve + .startsWith(root)` shapes. `classify_path_assertion`
recognises JS `.startsWith(...)`, Python `.startswith(...)`, Ruby
`.start_with?(...)` (paren and paren-less), and Go `strings.HasPrefix(...)`.
Branch narrowing flips lock attachment under condition negation
(`if !target.startsWith(ROOT) { return; }` attaches the lock to the
surviving block, not the rejection arm).
**How to turn it off.**
| Surface | Value |
@ -111,9 +127,8 @@ identity independent of the parent value.
|---|---|
| Env var | `NYX_POINTER_ANALYSIS=0` |
The pass is **on by default** as of 2026-04-26. The env-var override is
kept for one release so you can compare against the pre-pointer baseline,
then will be removed.
The pass is **on by default**. The env-var override exists so you can
compare against the pre-pointer baseline.
**Limitations.** This is not a general escape analysis. Function pointers
and arbitrary indirect calls still resolve to no callee, and deep alias

View file

@ -86,6 +86,6 @@ Auth findings render alongside taint findings in the [browser UI](serve.md). The
<p align="center"><img src="../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: numbered source → call → sink walk with a How to fix panel and an inline evidence object" width="900"/></p>
## Where the work was done
## Benchmark corpus
The remediation work is documented release-by-release in `tests/benchmark/RESULTS.md` under the Rust auth row. Phases A1 through B5 (precision and structural improvements) and Phase C (taint-based variant) all landed on the 0.5.0 release branch. The benchmark corpus at [`tests/benchmark/corpus/rust/auth/`](https://github.com/elicpeter/nyx/tree/master/tests/benchmark/corpus/rust/auth/) is 10 fixtures covering the five FP patterns plus a true-positive control.
The Rust auth corpus at [`tests/benchmark/corpus/rust/auth/`](https://github.com/elicpeter/nyx/tree/master/tests/benchmark/corpus/rust/auth/) is 10 fixtures covering the five FP patterns plus a true-positive control. Per-row metrics live under the Rust auth row in `tests/benchmark/RESULTS.md`.

View file

@ -394,7 +394,7 @@ engine. No flag is needed; CI pipelines keep working across upgrades.
The rebuild is logged at `info` level:
```
engine version changed (0.4.0 → 0.5.0), rebuilding index
engine version changed (<old><new>), rebuilding index
```
If you see this once per upgrade it is working as intended. If you see it on

View file

@ -56,8 +56,10 @@ Full list: [rules.md](../rules.md).
| `eval("hardcoded literal")` | Pattern matches structure | Run `--mode cfg` to drop AST patterns and rely on taint |
| `unsafe` block with sound justification | Every `unsafe` matches `rs.quality.unsafe_block` | Filter `>=MEDIUM` (it's Medium) or accept the noise |
| `.unwrap()` in tests | Acceptable in test code | Default non-prod severity downgrade reduces it |
| `md5` for non-cryptographic checksums | Pattern can't see intent | Suppress with `--severity ">=MEDIUM"` or per-line `nyx:ignore` |
| `md5` for non-cryptographic checksums | Pattern can't see intent in most languages | PHP recognises non-crypto consuming context structurally (cache keys, ETag, dedup, `getCacheKey()` returns) and suppresses. Other languages: `--severity ">=MEDIUM"` or per-line `nyx:ignore` |
| SQL concat with trusted data (Tier B) | Heuristic can't verify the source | Taint is more precise; or convert to a parameterized query |
| C++ `reinterpret_cast<T>(...)` for byte-pointer / void* / `sockaddr` | Pattern fires on every cast regardless of target type | Suppressed when the target is well-defined by C++ aliasing rules: `char*`, `unsigned char*`, `signed char*`, `wchar_t*`, `uint8_t*`, `int8_t*`, `std::byte*`, `byte*`, `void*`, `uintptr_t` / `intptr_t` (and `std::` variants), and the BSD socket address family. User-defined struct or class pointer targets keep firing. |
| JS / TS `secrets.fallback_secret` on `process.env.X \|\| ""` | Empty-string fallback satisfies non-undefined string types without committing a secret | Empty-string fallbacks are excluded from the rule. Non-empty literal fallbacks still fire. |
## Confidence levels

View file

@ -130,7 +130,7 @@ Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer onl
| `shell_escape` | | `shlex.quote`, `shell_escape::escape` | `system`, `Command::new`, `eval` |
| `url_encode` | | `encodeURIComponent` | `location.href`, HTTP client URL arg |
| `json_parse` | | `JSON.parse` | |
| `file_io` | | `os.path.realpath`, `filepath.Clean` | `open`, `fs::read_to_string`, `send_file` |
| `file_io` | | `os.path.realpath`, `filepath.Clean`, canonicalise + `starts_with`-rooted guard | `open`, `fs::read_to_string`, `send_file` |
| `fmt_string` | | | `printf(var)` |
| `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation |
| `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` |

View file

@ -16,7 +16,7 @@ Two extra layers tune precision around calls. **Context-sensitive inlining** (k=
When a method call has a receiver typed as a super-class, trait, or interface, **hierarchy fan-out** widens the resolved callee set to every concrete implementer the engine has seen. A class diagram extracted in pass 1 (Java extends/implements, Rust impl-for, TS/JS extends, Python bases, Ruby includes, PHP extends/implements, C++ inheritance) feeds an index that the call resolver consults during pass 2. The fan-out is capped at 8 implementers per call site; over-fanning is a precision tax, not a soundness issue.
A separate **field-sensitive points-to** pass tracks abstract locations down to the field level, so `c.mu.Lock()` is a lock on `Field(c, mu)` rather than on `c` as a whole. That distinction is what lets the resource-lifecycle and taint passes tell `obj.field = tainted; sink(obj.other_field)` apart from the conservative whole-variable approximation. Subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic `__index_get__` / `__index_set__` calls so the same container model handles them. Set `NYX_POINTER_ANALYSIS=0` to fall back to the pre-pointer-pass behaviour for one release if you need to compare baselines.
A separate **field-sensitive points-to** pass tracks abstract locations down to the field level, so `c.mu.Lock()` is a lock on `Field(c, mu)` rather than on `c` as a whole. That distinction is what lets the resource-lifecycle and taint passes tell `obj.field = tainted; sink(obj.other_field)` apart from the conservative whole-variable approximation. Subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic `__index_get__` / `__index_set__` calls so the same container model handles them. Set `NYX_POINTER_ANALYSIS=0` to fall back to the pre-pointer-pass behaviour for baseline comparison.
## Optional analyses on top

View file

@ -61,7 +61,7 @@ Optional features:
Nyx stores its scanner version in the project's index database. When the binary's version differs from the stored version, the index is wiped on the next scan and rebuilt against the new engine. You'll see one info-level log line:
```
engine version changed (0.4.0 → 0.5.0), rebuilding index
engine version changed (<old><new>), rebuilding index
```
No flag needed. If you see this on *every* scan, the metadata row isn't being persisted; file an issue.

View file

@ -9,24 +9,22 @@ The classifications here are grounded in three concrete signals:
1. **Rule depth**: how many distinct source / sanitizer / sink matchers exist
for the language in `src/labels/<lang>.rs`, and how many vulnerability
classes (Cap bits) those matchers cover.
2. **Benchmark results**: rule-level precision / recall / F1 on the 433-case
2. **Benchmark results**: rule-level precision / recall / F1 on the 492-case
corpus in
[`tests/benchmark/RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md),
last measured 2026-04-29 with scanner version 0.5.0.
[`tests/benchmark/RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md).
3. **Known weak spots**: FPs and FNs the maintainers have deliberately left
in the benchmark rather than suppressed, plus structural engine
limitations the corpus does not stress, documented release-by-release in
limitations the corpus does not stress, documented in
[`RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md).
As of 2026-04-29 the synthetic corpus has effectively saturated: every
real-CVE fixture fires and rule-level recall is 100%. Nine of ten
languages report rule-level F1 = 100.0%; Go reports 98.0% on the back of
a single safe-fixture FP. Aggregate rule-level P=0.995, R=1.000, F1=0.998.
That means F1 alone no longer differentiates tiers, so the differentiators
are **rule depth**, **gated-sink coverage**, and **structural idioms the
corpus does not fully stress** (deep pointer aliasing in C/C++,
framework-specific context). All parser integrations use tree-sitter and
are stable; parsing is not a differentiator.
The synthetic corpus has effectively saturated: every
real-CVE fixture fires and rule-level precision and recall are both 100%.
All ten languages report rule-level F1 = 100.0%. Aggregate rule-level
P=1.000, R=1.000, F1=1.000. That means F1 alone no longer differentiates
tiers, so the differentiators are **rule depth**, **gated-sink coverage**,
and **structural idioms the corpus does not fully stress** (deep pointer
aliasing in C/C++, framework-specific context). All parser integrations
use tree-sitter and are stable; parsing is not a differentiator.
---
@ -35,7 +33,7 @@ are stable; parsing is not a differentiator.
| Tier | Languages | F1 | What to expect |
|------|-----------|----|----------------|
| **Stable** | Python, JavaScript, TypeScript | 100% | Deep rule sets, gated sinks (argument-role-aware), framework detection, extensive fixtures, and the bulk of advanced-analysis (SSA two-level solve, context-sensitivity, symbolic execution, abstract interpretation) coverage. Safe to depend on in CI gates. |
| **Beta** | Go, Java, PHP, Ruby, Rust | 98.0% to 100% | Solid mid-depth rule sets with narrower cap coverage and **no gated sinks**. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation, match-arm guards) are partially modeled. Usable in CI; review FP/FN lists before tightening gates. |
| **Beta** | Go, Java, PHP, Ruby, Rust | 100% | Solid mid-depth rule sets with narrower cap coverage and **no gated sinks**. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation, match-arm guards) are partially modeled. Usable in CI; review FP/FN lists before tightening gates. |
| **Preview** | C, C++ | 100% on synthetic corpus | Recent work taught the engine to follow taint through `std::vector` / `std::string` / map containers (including `c_str()`), through fluent builder chains like `Socket::builder().host(h).connect()`, and through inline class member functions. Function pointers and deeper pointer aliasing through `*p` / `p->field` are still not tracked. Rule-level scores against a corpus of obvious unsafe-API uses look perfect, but that is not the same as a clean audit on a real codebase. Pair with clang-tidy, Clang Static Analyzer, or Infer. |
---
@ -90,13 +88,15 @@ are stable; parsing is not a differentiator.
### Beta tier
#### Go: 96.2% P / 100.0% R / 98.0% F1 *(53-case corpus, 1 FP, 0 FNs)*
#### Go: 100% P / 100% R / 100% F1 *(56-case corpus)*
- **Rule depth**: 4 source families, 4 sanitizer families, 9 sink matchers
covering HTML, URL, Shell, SQL, SSRF, Crypto, and File I/O.
- **Framework context**: Gin, Echo source matchers.
- **Open weak spots**: one safe Go fixture (`go-safe-009`) draws a spurious
CMDi finding.
- **Recent fix**: `strings.ReplaceAll` is now recognised as a CMDi sanitiser
in chain-wrapper / call-site-replace shapes, clearing the last open
Go safe-fixture FP (`go-safe-009`, `validate(s string)` wrapping a
`strings.ReplaceAll` over `;`).
- **Known gaps**: no gated sinks, no deserialization class. `fmt.Sprintf`
is deliberately not a sink. Cap coverage is narrower than the Stable
tier and argument-role-aware sink modeling is not yet implemented for Go,
@ -126,21 +126,25 @@ are stable; parsing is not a differentiator.
#### Ruby: 100% P / 100% R / 100% F1 *(39-case corpus)*
- **Rule depth**: 3 source families, 7 sanitizer families, 15 sink matchers
covering HTML, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Rule depth**: 3 source families, 7 sanitizer families, 16 sink matchers
covering HTML, Shell, SQL, Code, SSRF, File I/O, and Deserialization. SSRF
coverage includes `URI.open` and the low-level `OpenURI.open_uri` it
delegates to (the canonical CarrierWave CVE-2021-21288 sink).
Statement-level chained-call wrappers
(`YAML.safe_load(File.read(filename))`, `Marshal.load(File.read(p))`,
`String.new(File.read(x))`) classify the inner sink for cross-function
summary extraction so the outer call does not strip the sink classification
on the helper.
- **Framework context**: Rails helpers (`sanitize_sql`, `permit`, `require`).
- **Known gaps**: string interpolation inside shell and SQL strings is
recognized structurally but not modeled as a distinct operator.
`begin/rescue/ensure` exception-edge wiring is documented as deferred
(structurally incompatible with `build_try()`). The previous open
`rb-interproc-001` FN closed in the 2026-04-28 baseline after the
Ruby `Kernel#open` CMDI sink and exact-match sigil work landed.
(structurally incompatible with `build_try()`).
#### Rust: 100% P / 100% R / 100% F1 *(70-case adversarial corpus)*
Rust holds the largest per-language adversarial corpus and was promoted
from Experimental to Beta in the 2026-04-25 measurement after the PathFact
landings closed every previously-open `rs-safe-*` regression.
Rust holds the largest per-language adversarial corpus. PathFact-driven
path-domain narrowing covers the `rs-safe-*` regression set.
- **Rule depth**: 6 source families, **2** sanitizer families (prefix and
type-coercion), 11 sink matchers covering HTML, Shell, SQL, SSRF,
@ -149,20 +153,18 @@ landings closed every previously-open `rs-safe-*` regression.
narrow sanitizer count is the primary reason Rust is not in the Stable
tier. Engine-side path/typed sanitizer recognition (PathFact) compensates,
but the ruleset itself is shallow.
- **Recent additions**: SQL class (`rusqlite`, `sqlx`, `diesel`,
`postgres`), Deserialization class (`serde_yaml`, `bincode`,
`rmp_serde`, `ciborium`, `ron`, `toml`), expanded file I/O
(`fs::remove_file/dir/rename/copy`), `reqwest` SSRF builder chain.
- **Closed by recent PathFact landings**
(`src/abstract_interp/path_domain.rs` + per-return-path PathFact entries
on `SsaFuncSummary`): `rs-safe-007` (`.replace("..","")` sanitiser),
`rs-safe-008` (negative-validation return), `rs-safe-009` (match-arm
guards via condition lifting), `rs-safe-010` (static-map lookup),
`rs-safe-012` (`.contains("..")` + `.starts_with('/')` rejection),
`rs-safe-014` (Option-returning user sanitiser), `rs-safe-015`
(`Path::new(p).is_absolute()` typed rejection), `rs-safe-016`
(cross-function `.contains("..")` rejection), and CVE patches
`CVE-2018-20997`, `CVE-2022-36113`, `CVE-2024-24576`.
- **Coverage**: SQL class (`rusqlite`, `sqlx`, `diesel`, `postgres`),
Deserialization class (`serde_yaml`, `bincode`, `rmp_serde`, `ciborium`,
`ron`, `toml`), file I/O (`fs::remove_file/dir/rename/copy`), and the
`reqwest` SSRF builder chain.
- **PathFact-narrowed shapes** (`src/abstract_interp/path_domain.rs` plus
per-return-path PathFact entries on `SsaFuncSummary`) cover
`.replace("..","")` sanitisers, negative-validation returns, match-arm
guards via condition lifting, static-map lookups,
`.contains("..")` + `.starts_with('/')` rejection, Option-returning
user sanitisers, `Path::new(p).is_absolute()` typed rejection,
cross-function `.contains("..")` rejection, and the
`CVE-2018-20997` / `CVE-2022-36113` / `CVE-2024-24576` patch shapes.
- **Not yet covered**: unsafe FFI / `std::mem::transmute` (no rules), Tokio
`process::Command` async variants (not distinguished from sync),
`hyper` / `surf` / `ureq` SSRF clients (reqwest family only).
@ -170,17 +172,16 @@ landings closed every previously-open `rs-safe-*` regression.
### Preview tier
C and C++ remain **Preview** despite reporting 100% rule-level F1 on the
synthetic corpus. A run of additions in late April taught the engine to
follow taint through several constructs that used to be hard cutoffs (STL
containers, builder chains, inline member functions, the wider `std::sto*`
family), so the gap between "passes the synthetic corpus" and "would catch
the same flow on a real codebase" is narrower than it used to be. It is not
zero. The biggest remaining gaps are deep pointer aliasing and function
synthetic corpus. The engine follows taint through STL containers, builder
chains, inline member functions, and the wider `std::sto*` family, so the
gap between "passes the synthetic corpus" and "would catch the same flow
on a real codebase" is narrower than the synthetic numbers suggest. It is
not zero. The biggest remaining gaps are deep pointer aliasing and function
pointers, both of which are pervasive in real C/C++ code. Treat a clean
report as a starting point, not an audit. Pair Nyx with clang-tidy, the
Clang Static Analyzer, or Infer for production use.
**What now works** (added in late April):
**What works:**
- STL container flow. `vec.push_back(tainted)` followed by
`vec.front().c_str()` carries taint into a downstream `system()` sink.
@ -216,8 +217,8 @@ Clang Static Analyzer, or Infer for production use.
`void (*fn)(char *)` resolves to no callee, so cross-pointer flows are
invisible.
- Array-element taint by index. Writes to `buf[i]` do not always propagate
taint to `buf` as a whole; the recent subscript-handling work helps the
general case but doesn't make `buf` an alias for every element.
taint to `buf` as a whole; subscript-handling helps the general case but
doesn't make `buf` an alias for every element.
- Nested classes beyond one level (C++ only).
#### C: 100% P / 100% R / 100% F1 *(30-case corpus)*
@ -269,9 +270,8 @@ have moved out of the blind-spot list. Synthetic-corpus F1 is not a
reliable signal for Preview-tier languages: a clean report can coexist
with structural gaps.
(The previous **Experimental** tier was retired in the 2026-04-25
measurement when Rust's adversarial corpus reached 100% F1; no language
currently sits in that tier.)
(No language currently sits in the **Experimental** tier; it is reserved
for future additions whose corpus has not yet stabilised.)
---

View file

@ -1,6 +1,6 @@
# Rule reference
Every finding Nyx emits has a rule ID. This page enumerates the IDs that ship with scanner 0.5.0, grouped by family.
Every finding Nyx emits has a rule ID. This page enumerates the IDs that ship with the scanner, grouped by family.
> This page is written by hand and drifts against the code. Authoritative sources: [`src/patterns/<lang>.rs`](https://github.com/elicpeter/nyx/tree/master/src/patterns) for AST patterns, [`src/labels/<lang>.rs`](https://github.com/elicpeter/nyx/tree/master/src/labels) for taint matchers, and [`src/auth_analysis/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/config.rs) for auth rules. If a rule fires that isn't listed here, the source file is right and this page is wrong.

View file

@ -82,7 +82,7 @@ Modifiers in the ±5 range nudge the result for trend (only after the second sca
It's a Nyx-finding-pressure metric, not a security audit. Score 100 means Nyx didn't find anything under its current rules and language coverage; it doesn't certify the absence of vulnerabilities. The score doesn't see runtime config, IAM, secret stores, dependency CVEs, or anything outside the source tree being scanned. A repo of mostly Kotlin (where Nyx coverage is thin) will score artificially well because most of the code never gets evaluated.
The current ceilings are calibrated for v0.5 scanner false-positive rates. As symex coverage and rule precision improve, the ceilings tighten. Calibration data and the rationale behind each tunable lives in [health-score-audit.md](health-score-audit.md).
Ceilings are calibrated for the current scanner false-positive rates. As symex coverage and rule precision improve, the ceilings tighten. Calibration data and the rationale behind each tunable lives in [health-score-audit.md](health-score-audit.md).
### Findings and Finding detail