Improved path traversal detection and enhanced sink classification logic

2026-06-27 20:29:39 +02:00 · 2026-05-02 03:36:14 -04:00 · 2026-05-02 03:36:14 -04:00 · 3c89bddbf2
commit 3c89bddbf2
parent 58f1794a4e
56 changed files with 3989 additions and 345 deletions
--- a/docs/advanced-analysis.md
+++ b/docs/advanced-analysis.md
@ -31,6 +31,22 @@ SQL sink as an injection risk; an SSRF sink whose URL prefix is locked to a
 trusted host stays quiet. This turns a large class of FPs on numeric and
 locked-prefix paths into true negatives.

+**Path traversal.** The path domain accepts canonicalised-and-rooted
+shapes via `PathFact::is_path_traversal_safe`: a path that is
+dotdot-free and either non-absolute or carries a verified prefix-lock has
+its `Cap::FILE_IO` cleared. When the lock argument is a string literal
+the lock prefix is recorded directly; when it is a method call, field
+access, or configured root, an `OPAQUE_PREFIX_LOCK` marker captures the
+structural invariant ("rooted under SOME prefix") instead. This closes
+the Ruby `File.expand_path + start_with?(root)`, Python
+`os.path.realpath + .startswith(root)`, and JS
+`path.resolve + .startsWith(root)` shapes. `classify_path_assertion`
+recognises JS `.startsWith(...)`, Python `.startswith(...)`, Ruby
+`.start_with?(...)` (paren and paren-less), and Go `strings.HasPrefix(...)`.
+Branch narrowing flips lock attachment under condition negation
+(`if !target.startsWith(ROOT) { return; }` attaches the lock to the
+surviving block, not the rejection arm).
+
 **How to turn it off.**

 | Surface | Value |
@ -111,9 +127,8 @@ identity independent of the parent value.
 |---|---|
 | Env var | `NYX_POINTER_ANALYSIS=0` |

-The pass is **on by default** as of 2026-04-26. The env-var override is
-kept for one release so you can compare against the pre-pointer baseline,
-then will be removed.
+The pass is **on by default**. The env-var override exists so you can
+compare against the pre-pointer baseline.

 **Limitations.** This is not a general escape analysis. Function pointers
 and arbitrary indirect calls still resolve to no callee, and deep alias
--- a/docs/auth.md
+++ b/docs/auth.md
@ -86,6 +86,6 @@ Auth findings render alongside taint findings in the [browser UI](serve.md). The

 <p align="center"><img src="../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: numbered source → call → sink walk with a How to fix panel and an inline evidence object" width="900"/></p>

-## Where the work was done
+## Benchmark corpus

-The remediation work is documented release-by-release in `tests/benchmark/RESULTS.md` under the Rust auth row. Phases A1 through B5 (precision and structural improvements) and Phase C (taint-based variant) all landed on the 0.5.0 release branch. The benchmark corpus at [`tests/benchmark/corpus/rust/auth/`](https://github.com/elicpeter/nyx/tree/master/tests/benchmark/corpus/rust/auth/) is 10 fixtures covering the five FP patterns plus a true-positive control.
+The Rust auth corpus at [`tests/benchmark/corpus/rust/auth/`](https://github.com/elicpeter/nyx/tree/master/tests/benchmark/corpus/rust/auth/) is 10 fixtures covering the five FP patterns plus a true-positive control. Per-row metrics live under the Rust auth row in `tests/benchmark/RESULTS.md`.
--- a/docs/configuration.md
+++ b/docs/configuration.md
@ -394,7 +394,7 @@ engine. No flag is needed; CI pipelines keep working across upgrades.
 The rebuild is logged at `info` level:

 ```
-engine version changed (0.4.0 → 0.5.0), rebuilding index
+engine version changed (<old> → <new>), rebuilding index
 ```

 If you see this once per upgrade it is working as intended. If you see it on
--- a/docs/detectors/patterns.md
+++ b/docs/detectors/patterns.md
@ -56,8 +56,10 @@ Full list: [rules.md](../rules.md).
 | `eval("hardcoded literal")` | Pattern matches structure | Run `--mode cfg` to drop AST patterns and rely on taint |
 | `unsafe` block with sound justification | Every `unsafe` matches `rs.quality.unsafe_block` | Filter `>=MEDIUM` (it's Medium) or accept the noise |
 | `.unwrap()` in tests | Acceptable in test code | Default non-prod severity downgrade reduces it |
-| `md5` for non-cryptographic checksums | Pattern can't see intent | Suppress with `--severity ">=MEDIUM"` or per-line `nyx:ignore` |
+| `md5` for non-cryptographic checksums | Pattern can't see intent in most languages | PHP recognises non-crypto consuming context structurally (cache keys, ETag, dedup, `getCacheKey()` returns) and suppresses. Other languages: `--severity ">=MEDIUM"` or per-line `nyx:ignore` |
 | SQL concat with trusted data (Tier B) | Heuristic can't verify the source | Taint is more precise; or convert to a parameterized query |
+| C++ `reinterpret_cast<T>(...)` for byte-pointer / void* / `sockaddr` | Pattern fires on every cast regardless of target type | Suppressed when the target is well-defined by C++ aliasing rules: `char*`, `unsigned char*`, `signed char*`, `wchar_t*`, `uint8_t*`, `int8_t*`, `std::byte*`, `byte*`, `void*`, `uintptr_t` / `intptr_t` (and `std::` variants), and the BSD socket address family. User-defined struct or class pointer targets keep firing. |
+| JS / TS `secrets.fallback_secret` on `process.env.X \|\| ""` | Empty-string fallback satisfies non-undefined string types without committing a secret | Empty-string fallbacks are excluded from the rule. Non-empty literal fallbacks still fire. |

 ## Confidence levels

--- a/docs/detectors/taint.md
+++ b/docs/detectors/taint.md
@ -130,7 +130,7 @@ Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer onl
 | `shell_escape` | | `shlex.quote`, `shell_escape::escape` | `system`, `Command::new`, `eval` |
 | `url_encode` | | `encodeURIComponent` | `location.href`, HTTP client URL arg |
 | `json_parse` | | `JSON.parse` | |
-| `file_io` | | `os.path.realpath`, `filepath.Clean` | `open`, `fs::read_to_string`, `send_file` |
+| `file_io` | | `os.path.realpath`, `filepath.Clean`, canonicalise + `starts_with`-rooted guard | `open`, `fs::read_to_string`, `send_file` |
 | `fmt_string` | | | `printf(var)` |
 | `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation |
 | `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` |
--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@ -16,7 +16,7 @@ Two extra layers tune precision around calls. **Context-sensitive inlining** (k=

 When a method call has a receiver typed as a super-class, trait, or interface, **hierarchy fan-out** widens the resolved callee set to every concrete implementer the engine has seen. A class diagram extracted in pass 1 (Java extends/implements, Rust impl-for, TS/JS extends, Python bases, Ruby includes, PHP extends/implements, C++ inheritance) feeds an index that the call resolver consults during pass 2. The fan-out is capped at 8 implementers per call site; over-fanning is a precision tax, not a soundness issue.

-A separate **field-sensitive points-to** pass tracks abstract locations down to the field level, so `c.mu.Lock()` is a lock on `Field(c, mu)` rather than on `c` as a whole. That distinction is what lets the resource-lifecycle and taint passes tell `obj.field = tainted; sink(obj.other_field)` apart from the conservative whole-variable approximation. Subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic `__index_get__` / `__index_set__` calls so the same container model handles them. Set `NYX_POINTER_ANALYSIS=0` to fall back to the pre-pointer-pass behaviour for one release if you need to compare baselines.
+A separate **field-sensitive points-to** pass tracks abstract locations down to the field level, so `c.mu.Lock()` is a lock on `Field(c, mu)` rather than on `c` as a whole. That distinction is what lets the resource-lifecycle and taint passes tell `obj.field = tainted; sink(obj.other_field)` apart from the conservative whole-variable approximation. Subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic `__index_get__` / `__index_set__` calls so the same container model handles them. Set `NYX_POINTER_ANALYSIS=0` to fall back to the pre-pointer-pass behaviour for baseline comparison.

 ## Optional analyses on top

--- a/docs/installation.md
+++ b/docs/installation.md
@ -61,7 +61,7 @@ Optional features:
 Nyx stores its scanner version in the project's index database. When the binary's version differs from the stored version, the index is wiped on the next scan and rebuilt against the new engine. You'll see one info-level log line:

 ```
-engine version changed (0.4.0 → 0.5.0), rebuilding index
+engine version changed (<old> → <new>), rebuilding index
 ```

 No flag needed. If you see this on *every* scan, the metadata row isn't being persisted; file an issue.
--- a/docs/language-maturity.md
+++ b/docs/language-maturity.md
@ -9,24 +9,22 @@ The classifications here are grounded in three concrete signals:
 1. **Rule depth**: how many distinct source / sanitizer / sink matchers exist
   for the language in `src/labels/<lang>.rs`, and how many vulnerability
   classes (Cap bits) those matchers cover.
-2. **Benchmark results**: rule-level precision / recall / F1 on the 433-case
+2. **Benchmark results**: rule-level precision / recall / F1 on the 492-case
   corpus in
-   [`tests/benchmark/RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md),
-   last measured 2026-04-29 with scanner version 0.5.0.
+   [`tests/benchmark/RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md).
 3. **Known weak spots**: FPs and FNs the maintainers have deliberately left
   in the benchmark rather than suppressed, plus structural engine
-   limitations the corpus does not stress, documented release-by-release in
+   limitations the corpus does not stress, documented in
   [`RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md).

-As of 2026-04-29 the synthetic corpus has effectively saturated: every
-real-CVE fixture fires and rule-level recall is 100%. Nine of ten
-languages report rule-level F1 = 100.0%; Go reports 98.0% on the back of
-a single safe-fixture FP. Aggregate rule-level P=0.995, R=1.000, F1=0.998.
-That means F1 alone no longer differentiates tiers, so the differentiators
-are **rule depth**, **gated-sink coverage**, and **structural idioms the
-corpus does not fully stress** (deep pointer aliasing in C/C++,
-framework-specific context). All parser integrations use tree-sitter and
-are stable; parsing is not a differentiator.
+The synthetic corpus has effectively saturated: every
+real-CVE fixture fires and rule-level precision and recall are both 100%.
+All ten languages report rule-level F1 = 100.0%. Aggregate rule-level
+P=1.000, R=1.000, F1=1.000. That means F1 alone no longer differentiates
+tiers, so the differentiators are **rule depth**, **gated-sink coverage**,
+and **structural idioms the corpus does not fully stress** (deep pointer
+aliasing in C/C++, framework-specific context). All parser integrations
+use tree-sitter and are stable; parsing is not a differentiator.

 ---

@ -35,7 +33,7 @@ are stable; parsing is not a differentiator.
 | Tier | Languages | F1 | What to expect |
 |------|-----------|----|----------------|
 | **Stable** | Python, JavaScript, TypeScript | 100% | Deep rule sets, gated sinks (argument-role-aware), framework detection, extensive fixtures, and the bulk of advanced-analysis (SSA two-level solve, context-sensitivity, symbolic execution, abstract interpretation) coverage. Safe to depend on in CI gates. |
-| **Beta** | Go, Java, PHP, Ruby, Rust | 98.0% to 100% | Solid mid-depth rule sets with narrower cap coverage and **no gated sinks**. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation, match-arm guards) are partially modeled. Usable in CI; review FP/FN lists before tightening gates. |
+| **Beta** | Go, Java, PHP, Ruby, Rust | 100% | Solid mid-depth rule sets with narrower cap coverage and **no gated sinks**. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation, match-arm guards) are partially modeled. Usable in CI; review FP/FN lists before tightening gates. |
 | **Preview** | C, C++ | 100% on synthetic corpus | Recent work taught the engine to follow taint through `std::vector` / `std::string` / map containers (including `c_str()`), through fluent builder chains like `Socket::builder().host(h).connect()`, and through inline class member functions. Function pointers and deeper pointer aliasing through `*p` / `p->field` are still not tracked. Rule-level scores against a corpus of obvious unsafe-API uses look perfect, but that is not the same as a clean audit on a real codebase. Pair with clang-tidy, Clang Static Analyzer, or Infer. |

 ---
@ -90,13 +88,15 @@ are stable; parsing is not a differentiator.

 ### Beta tier

-#### Go: 96.2% P / 100.0% R / 98.0% F1 *(53-case corpus, 1 FP, 0 FNs)*
+#### Go: 100% P / 100% R / 100% F1 *(56-case corpus)*

 - **Rule depth**: 4 source families, 4 sanitizer families, 9 sink matchers
  covering HTML, URL, Shell, SQL, SSRF, Crypto, and File I/O.
 - **Framework context**: Gin, Echo source matchers.
- **Open weak spots**: one safe Go fixture (`go-safe-009`) draws a spurious
-  CMDi finding.
+- **Recent fix**: `strings.ReplaceAll` is now recognised as a CMDi sanitiser
+  in chain-wrapper / call-site-replace shapes, clearing the last open
+  Go safe-fixture FP (`go-safe-009`, `validate(s string)` wrapping a
+  `strings.ReplaceAll` over `;`).
 - **Known gaps**: no gated sinks, no deserialization class. `fmt.Sprintf`
  is deliberately not a sink. Cap coverage is narrower than the Stable
  tier and argument-role-aware sink modeling is not yet implemented for Go,
@ -126,21 +126,25 @@ are stable; parsing is not a differentiator.

 #### Ruby: 100% P / 100% R / 100% F1 *(39-case corpus)*

- **Rule depth**: 3 source families, 7 sanitizer families, 15 sink matchers
-  covering HTML, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
+- **Rule depth**: 3 source families, 7 sanitizer families, 16 sink matchers
+  covering HTML, Shell, SQL, Code, SSRF, File I/O, and Deserialization. SSRF
+  coverage includes `URI.open` and the low-level `OpenURI.open_uri` it
+  delegates to (the canonical CarrierWave CVE-2021-21288 sink).
+  Statement-level chained-call wrappers
+  (`YAML.safe_load(File.read(filename))`, `Marshal.load(File.read(p))`,
+  `String.new(File.read(x))`) classify the inner sink for cross-function
+  summary extraction so the outer call does not strip the sink classification
+  on the helper.
 - **Framework context**: Rails helpers (`sanitize_sql`, `permit`, `require`).
 - **Known gaps**: string interpolation inside shell and SQL strings is
  recognized structurally but not modeled as a distinct operator.
  `begin/rescue/ensure` exception-edge wiring is documented as deferred
-  (structurally incompatible with `build_try()`). The previous open
-  `rb-interproc-001` FN closed in the 2026-04-28 baseline after the
-  Ruby `Kernel#open` CMDI sink and exact-match sigil work landed.
+  (structurally incompatible with `build_try()`).

 #### Rust: 100% P / 100% R / 100% F1 *(70-case adversarial corpus)*

-Rust holds the largest per-language adversarial corpus and was promoted
-from Experimental to Beta in the 2026-04-25 measurement after the PathFact
-landings closed every previously-open `rs-safe-*` regression.
+Rust holds the largest per-language adversarial corpus. PathFact-driven
+path-domain narrowing covers the `rs-safe-*` regression set.

 - **Rule depth**: 6 source families, **2** sanitizer families (prefix and
  type-coercion), 11 sink matchers covering HTML, Shell, SQL, SSRF,
@ -149,20 +153,18 @@ landings closed every previously-open `rs-safe-*` regression.
  narrow sanitizer count is the primary reason Rust is not in the Stable
  tier. Engine-side path/typed sanitizer recognition (PathFact) compensates,
  but the ruleset itself is shallow.
- **Recent additions**: SQL class (`rusqlite`, `sqlx`, `diesel`,
-  `postgres`), Deserialization class (`serde_yaml`, `bincode`,
-  `rmp_serde`, `ciborium`, `ron`, `toml`), expanded file I/O
-  (`fs::remove_file/dir/rename/copy`), `reqwest` SSRF builder chain.
- **Closed by recent PathFact landings**
-  (`src/abstract_interp/path_domain.rs` + per-return-path PathFact entries
-  on `SsaFuncSummary`): `rs-safe-007` (`.replace("..","")` sanitiser),
-  `rs-safe-008` (negative-validation return), `rs-safe-009` (match-arm
-  guards via condition lifting), `rs-safe-010` (static-map lookup),
-  `rs-safe-012` (`.contains("..")` + `.starts_with('/')` rejection),
-  `rs-safe-014` (Option-returning user sanitiser), `rs-safe-015`
-  (`Path::new(p).is_absolute()` typed rejection), `rs-safe-016`
-  (cross-function `.contains("..")` rejection), and CVE patches
-  `CVE-2018-20997`, `CVE-2022-36113`, `CVE-2024-24576`.
+- **Coverage**: SQL class (`rusqlite`, `sqlx`, `diesel`, `postgres`),
+  Deserialization class (`serde_yaml`, `bincode`, `rmp_serde`, `ciborium`,
+  `ron`, `toml`), file I/O (`fs::remove_file/dir/rename/copy`), and the
+  `reqwest` SSRF builder chain.
+- **PathFact-narrowed shapes** (`src/abstract_interp/path_domain.rs` plus
+  per-return-path PathFact entries on `SsaFuncSummary`) cover
+  `.replace("..","")` sanitisers, negative-validation returns, match-arm
+  guards via condition lifting, static-map lookups,
+  `.contains("..")` + `.starts_with('/')` rejection, Option-returning
+  user sanitisers, `Path::new(p).is_absolute()` typed rejection,
+  cross-function `.contains("..")` rejection, and the
+  `CVE-2018-20997` / `CVE-2022-36113` / `CVE-2024-24576` patch shapes.
 - **Not yet covered**: unsafe FFI / `std::mem::transmute` (no rules), Tokio
  `process::Command` async variants (not distinguished from sync),
  `hyper` / `surf` / `ureq` SSRF clients (reqwest family only).
@ -170,17 +172,16 @@ landings closed every previously-open `rs-safe-*` regression.
 ### Preview tier

 C and C++ remain **Preview** despite reporting 100% rule-level F1 on the
-synthetic corpus. A run of additions in late April taught the engine to
-follow taint through several constructs that used to be hard cutoffs (STL
-containers, builder chains, inline member functions, the wider `std::sto*`
-family), so the gap between "passes the synthetic corpus" and "would catch
-the same flow on a real codebase" is narrower than it used to be. It is not
-zero. The biggest remaining gaps are deep pointer aliasing and function
+synthetic corpus. The engine follows taint through STL containers, builder
+chains, inline member functions, and the wider `std::sto*` family, so the
+gap between "passes the synthetic corpus" and "would catch the same flow
+on a real codebase" is narrower than the synthetic numbers suggest. It is
+not zero. The biggest remaining gaps are deep pointer aliasing and function
 pointers, both of which are pervasive in real C/C++ code. Treat a clean
 report as a starting point, not an audit. Pair Nyx with clang-tidy, the
 Clang Static Analyzer, or Infer for production use.

-**What now works** (added in late April):
+**What works:**

 - STL container flow. `vec.push_back(tainted)` followed by
  `vec.front().c_str()` carries taint into a downstream `system()` sink.
@ -216,8 +217,8 @@ Clang Static Analyzer, or Infer for production use.
  `void (*fn)(char *)` resolves to no callee, so cross-pointer flows are
  invisible.
 - Array-element taint by index. Writes to `buf[i]` do not always propagate
-  taint to `buf` as a whole; the recent subscript-handling work helps the
-  general case but doesn't make `buf` an alias for every element.
+  taint to `buf` as a whole; subscript-handling helps the general case but
+  doesn't make `buf` an alias for every element.
 - Nested classes beyond one level (C++ only).

 #### C: 100% P / 100% R / 100% F1 *(30-case corpus)*
@ -269,9 +270,8 @@ have moved out of the blind-spot list. Synthetic-corpus F1 is not a
 reliable signal for Preview-tier languages: a clean report can coexist
 with structural gaps.

-(The previous **Experimental** tier was retired in the 2026-04-25
-measurement when Rust's adversarial corpus reached 100% F1; no language
-currently sits in that tier.)
+(No language currently sits in the **Experimental** tier; it is reserved
+for future additions whose corpus has not yet stabilised.)

 ---

--- a/docs/rules.md
+++ b/docs/rules.md
@ -1,6 +1,6 @@
 # Rule reference

-Every finding Nyx emits has a rule ID. This page enumerates the IDs that ship with scanner 0.5.0, grouped by family.
+Every finding Nyx emits has a rule ID. This page enumerates the IDs that ship with the scanner, grouped by family.

 > This page is written by hand and drifts against the code. Authoritative sources: [`src/patterns/<lang>.rs`](https://github.com/elicpeter/nyx/tree/master/src/patterns) for AST patterns, [`src/labels/<lang>.rs`](https://github.com/elicpeter/nyx/tree/master/src/labels) for taint matchers, and [`src/auth_analysis/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/config.rs) for auth rules. If a rule fires that isn't listed here, the source file is right and this page is wrong.

--- a/docs/serve.md
+++ b/docs/serve.md
@ -82,7 +82,7 @@ Modifiers in the ±5 range nudge the result for trend (only after the second sca

 It's a Nyx-finding-pressure metric, not a security audit. Score 100 means Nyx didn't find anything under its current rules and language coverage; it doesn't certify the absence of vulnerabilities. The score doesn't see runtime config, IAM, secret stores, dependency CVEs, or anything outside the source tree being scanned. A repo of mostly Kotlin (where Nyx coverage is thin) will score artificially well because most of the code never gets evaluated.

-The current ceilings are calibrated for v0.5 scanner false-positive rates. As symex coverage and rule precision improve, the ceilings tighten. Calibration data and the rationale behind each tunable lives in [health-score-audit.md](health-score-audit.md).
+Ceilings are calibrated for the current scanner false-positive rates. As symex coverage and rule precision improve, the ceilings tighten. Calibration data and the rationale behind each tunable lives in [health-score-audit.md](health-score-audit.md).

 ### Findings and Finding detail