Prerelease cleanup (#46)

* feat: Add const_bound_vars tracking to prevent false positives in ownership checks

* feat: Introduce field interner and typed bounded vars for enhanced type tracking

* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking

* feat: Centralize method name extraction with bare_method_name helper

* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch

* feat: Enhance C++ taint tracking with additional container operations and inline method resolution

* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking

* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis

* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations

* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details

* test: Add comprehensive tests for lattice algebra laws and SSA edge cases

* feat: Add destructured session user handling and safe user ID access patterns

* feat: Implement row-population reverse-walk for enhanced authorization checks

* feat: Enhance authorization checks with local alias chain for self-actor types

* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction

* feat: Implement chained method call inner-gate rebinding for SSRF prevention

* feat: Add observability and error modules, enhance debug functionality, and implement theme context

* feat: Remove Auth Analysis page and update navigation to redirect to Explorer

* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor

* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity

* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build

The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while
the unit test at the bottom of the file was gated only `#[cfg(test)]`.
Since `cfg(test)` is set in release builds with `--tests` but
`cfg(debug_assertions)` is not, `cargo build --release --tests` failed
with E0425. Removing the gate fixes the build; the body is `debug_assert!`
only, so the helper is free in release. Also drop the gate at the call
site to avoid a `dead_code` warning when the lib is built without
`--tests`.
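
A minimal sketch of the gating mismatch, for illustration only. The names
`check_ordering` and `lower` and the ordering predicate are hypothetical
stand-ins; the real helper body in src/ssa/lower.rs is not reproduced here.

```rust
// Broken shape (shown as a comment so this sketch compiles): the helper is
// gated on debug_assertions, but a `#[cfg(test)]` unit test still calls it.
// Under `cargo build --release --tests` the test is compiled while the helper
// is not, so the name cannot be resolved (E0425).
//
//     #[cfg(debug_assertions)]
//     fn check_ordering(order: &[usize]) { /* ... */ }
//
// Fixed shape: no gate on the helper. The body is `debug_assert!` only, which
// is checked solely when debug assertions are enabled.
fn check_ordering(order: &[usize]) {
    debug_assert!(
        order.windows(2).all(|w| w[0] <= w[1]),
        "blocks are not in non-decreasing order"
    );
}

/// Hypothetical call site, compiled in every profile so the helper is never
/// reported as dead code when the library is built without `--tests`.
fn lower(order: &[usize]) -> usize {
    check_ordering(order);
    order.len()
}
```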

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(closure-capture): flip JS/TS fixtures to required-finding

The JS and TS closure-capture fixtures pinned the old broken behaviour
via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now
correctly traces taint through the closure boundary (env source captured
by an arrow function, sunk via `child_process.exec` inside the body), so
the formerly-forbidden finding is a true positive.

Match the Python sibling's shape — `required_findings` with
`id_prefix` + `min_count` plus a small `noise_budget` — and rewrite the
companion READMEs and the phase8_fragility_tests doc-comments from
"known gap" to "regression guard".

Verified:
- cargo test --release --test phase8_fragility_tests → 8/8 pass
- cargo test --release --lib bfs_assertion → pass
- corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0) — unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis

* feat: Introduce health module and enhance health score computation with calibration tests

* feat: Add expectations configuration and cleanup .gitignore for log files

* feat: Implement theme selection and enhance settings panel for triage sync

* feat: Suppress false positives for strcpy calls with literal sources in AST

* feat: Update analyse_function_ssa to return body CFG for accurate analysis

* feat: Add bug report and feature request templates for improved issue tracking

* feat: removed dev scripts

* feat: update README.md for clarity and consistency in fixture descriptions

* feat: removed dev docs

* feat: clean up error handling and UI elements for improved user experience

* feat: adjust button sizes in HeaderBar for better UI consistency

* feat: enhance taint analysis with additional context for sanitizer and taint findings

* cargo fmt

* prettier

* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts

* feat: add script to frame PNG screenshots with brand gradient

* feat: add fuzzing support with new targets and CI workflows

* refactor: streamline match expressions and improve formatting in CLI and output handling

* feat: enhance configuration display with detailed output options

* feat: stage demo configuration for improved CLI screenshot output

* feat: expose merge_configs function for user-configurable settings

* refactor: simplify code structure and improve readability in config handling

* refactor: improve descriptions for vulnerability patterns in various languages

* feat: update MIT License section with additional usage details and copyright information

* feat: update screenshots

* refactor: update build process and paths for frontend assets

* feat: add cross-file taint fuzzing target and supporting dictionary

* refactor: clean up formatting and comments in fuzz configuration and example files

* refactor: remove outdated comments and clean up CI configuration files

* chore: update changelog dates and improve formatting in documentation

* refactor: update Cargo.toml and CI configuration for improved packaging and build process

* refactor: enhance quote-stripping logic to prevent panics and add regression tests

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eli Peter 2026-04-29 00:58:38 -04:00 committed by GitHub
parent 79c29b394d
commit 82f18184b1
348 changed files with 48731 additions and 2925 deletions

View file

@ -1,11 +1,15 @@
# Advanced Analysis
Nyx ships four optional analysis passes that layer on top of the core SSA
taint engine. Each pass is independently switchable via config
(`[analysis.engine]` in `nyx.conf` / `nyx.local`), a matching CLI flag pair,
or; as a legacy last-resort override for library users with no CLI entry
point; a `NYX_*` environment variable. All four are **on by default**: turning
them off trades precision for speed.
Nyx layers several analysis passes on top of the core SSA taint engine.
Most are switchable via config (`[analysis.engine]` in `nyx.conf` /
`nyx.local`), a matching CLI flag pair, or, as a last-resort override for
library users with no CLI entry point, a `NYX_*` environment variable. The
five precision-tuning passes (abstract interpretation, context sensitivity,
symbolic execution, constraint solving, field-sensitive points-to) are
**on by default** because the benchmark numbers in
[language-maturity.md](language-maturity.md) are measured with them on.
The demand-driven backwards walk and hierarchy fan-out sit alongside but
are not user-toggleable in the same way.
See [`Configuration`](configuration.md#analysisengine) for the full config
surface and CLI flag table. This page explains what each pass does, why it
@ -81,6 +85,77 @@ origin-attribution.
---
## Field-sensitive points-to
**What it does.** Runs a Steensgaard-style alias analysis that interns field
accesses as their own abstract locations. `c.mu` becomes `Field(c, mu)`,
distinct from `c` itself; a write to `obj.cache` and a read from
`obj.cache` in different methods both land on the same abstract location;
subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic
`__index_get__` / `__index_set__` calls so the engine can model them
through the same container store/load primitives used for STL containers,
Python lists, JS arrays, and similar.
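The field-interning idea can be sketched as follows. This is an illustrative model, not the code in `src/pointer/`: the `Loc`, `LocInterner`, `var`, and `field` names are assumptions made up for the example, and the unification step of a Steensgaard-style analysis is left out.

```rust
use std::collections::HashMap;

/// An abstract location: a whole variable, or a named field of another location.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Loc {
    Var(String),
    Field(Box<Loc>, String),
}

/// Interns locations so that structurally equal locations share one id.
#[derive(Default)]
struct LocInterner {
    ids: HashMap<Loc, u32>,
}

impl LocInterner {
    /// Returns the existing id for `loc`, or assigns the next free id.
    fn intern(&mut self, loc: Loc) -> u32 {
        let next = self.ids.len() as u32;
        *self.ids.entry(loc).or_insert(next)
    }
}

/// Convenience constructor for a whole-variable location.
fn var(name: &str) -> Loc {
    Loc::Var(name.to_string())
}

/// Convenience constructor for a field location such as `Field(c, mu)`.
fn field(base: Loc, name: &str) -> Loc {
    Loc::Field(Box::new(base), name.to_string())
}
```

With this shape, interning `field(var("c"), "mu")` yields an id different from `var("c")`, while a write to `obj.cache` in one method and a read in another both intern `field(var("obj"), "cache")` and therefore receive the same id.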
**Why it helps.** It eliminates a class of false positives that the
whole-variable taint model produced. Before this pass, `obj.field =
tainted; sink(obj.other_field)` would taint `obj` as a whole and fire on
the safe field; the receiver-type / sub-field distinction is also what
lets the resource-lifecycle pass attribute a `c.mu.Lock()` to the lock
field rather than to its container. Cross-method field flow (writer in
one method, reader in another) shows up only when fields have stable
identity independent of the parent value.
**How to turn it off.**
| Surface | Value |
|---|---|
| Env var | `NYX_POINTER_ANALYSIS=0` |
The pass is **on by default** as of 2026-04-26. The env-var override is
kept for one release so you can compare against the pre-pointer baseline;
after that it will be removed.
**Limitations.** This is not a general escape analysis. Function pointers
and arbitrary indirect calls still resolve to no callee, and deep alias
chains through `*p` / `p->field` in C/C++ are not tracked beyond the
direct field case. The points-to set per value is capped at
`--max-pointsto` (default 32); when truncation happens, an engine note
records the precision loss.
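A capped points-to set might look roughly like the sketch below. The `BoundedPointsTo` type is hypothetical, and the policy of dropping late arrivals once the cap is reached is an assumption; the text above only states that truncation is recorded.

```rust
use std::collections::BTreeSet;

/// Hypothetical bounded points-to set; `cap` plays the role of `--max-pointsto`.
struct BoundedPointsTo {
    locs: BTreeSet<u32>,
    cap: usize,
    /// Set once any location had to be dropped, so a note can be emitted.
    truncated: bool,
}

impl BoundedPointsTo {
    fn new(cap: usize) -> Self {
        BoundedPointsTo { locs: BTreeSet::new(), cap, truncated: false }
    }

    /// Adds a location id. Once the set is full, new ids are dropped
    /// (assumed policy) and the precision loss is remembered.
    fn insert(&mut self, loc: u32) {
        if self.locs.contains(&loc) {
            return;
        }
        if self.locs.len() >= self.cap {
            self.truncated = true;
            return;
        }
        self.locs.insert(loc);
    }
}
```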
**Source**: [`src/pointer/`](https://github.com/elicpeter/nyx/tree/master/src/pointer/).
---
## Hierarchy fan-out for virtual dispatch
**What it does.** Builds a per-language type-hierarchy index in pass 1
(extends, implements, impl-for, includes; the exact construct depends on
the language) and uses it in pass 2 to widen method-call resolution. When
a call's receiver is statically typed as a super-class, trait, or
interface, the resolver returns every concrete implementer it has seen
in the codebase rather than just the first match.
**Why it helps.** Without it, a call like `repository.findById(id)` where
`repository` is typed as the interface gets resolved against whatever the
single-result resolver finds first; if the matching implementer is in
another file the call effectively goes opaque. With the hierarchy, the
taint engine sees the union of every implementer's transform and the
flow shows up regardless of which file holds the concrete class.
**Limitations.** Fan-out is capped at 8 implementers per call site; over
that, the tail is silently dropped (a debug log records the cap hit) and
the call is treated as a non-deterministic union of the kept
implementers. Languages that use structural / implicit interface
satisfaction (Go) are deliberately skipped because per-file extraction
is intractable; those calls fall back to the single-result resolver. The
extractor covers Java, Rust, TS/JS/TSX, Python, Ruby, PHP, and C++.
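As a rough sketch of the widening and the cap described above: the `HierarchySketch` type and its methods are hypothetical and do not mirror the signatures of `TypeHierarchyIndex` or `resolve_callee_widened`. Only direct implementers are modelled, and the fallback for unknown receiver types is an assumption.

```rust
use std::collections::HashMap;

/// Cap on implementers per call site, as stated in the text.
const MAX_FANOUT: usize = 8;

/// Hypothetical index from a super-type name to its known implementers.
#[derive(Default)]
struct HierarchySketch {
    implementers: HashMap<String, Vec<String>>,
}

impl HierarchySketch {
    /// Records that `sub` extends or implements `sup` (pass 1).
    fn add_edge(&mut self, sub: &str, sup: &str) {
        self.implementers
            .entry(sup.to_string())
            .or_default()
            .push(sub.to_string());
    }

    /// Resolves `receiver_ty.method` to candidate callees (pass 2).
    /// Returns the kept targets and whether the cap dropped any.
    fn resolve_widened(&self, receiver_ty: &str, method: &str) -> (Vec<String>, bool) {
        match self.implementers.get(receiver_ty) {
            Some(subs) => {
                let cap_hit = subs.len() > MAX_FANOUT;
                let targets = subs
                    .iter()
                    .take(MAX_FANOUT)
                    .map(|s| format!("{}::{}", s, method))
                    .collect();
                (targets, cap_hit)
            }
            // No known implementers: fall back to the static receiver type.
            None => (vec![format!("{}::{}", receiver_ty, method)], false),
        }
    }
}
```

A caller would take the union of the returned targets' summaries and, when the second element is `true`, log that the tail was dropped.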
**Source**: [`src/cfg/hierarchy.rs`](https://github.com/elicpeter/nyx/blob/master/src/cfg/hierarchy.rs)
and [`src/summary/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/summary/mod.rs)
(`TypeHierarchyIndex`, `resolve_callee_widened`).
---
## Symbolic execution
**What it does.** Builds a symbolic expression tree per tainted SSA value,

View file

@ -25,8 +25,7 @@ One rule ID, parameterized by the source location. Suppressions can target eithe
## What it can't detect
- **Library calls without summaries.** If a callee has no summary (no source, binary-only dependency), Nyx treats it as neither propagating nor sanitizing. This is conservative for sanitization but lossy for propagation.
- **Taint through struct fields and containers.** Taint attaches to whole variables. `obj.field = tainted; sink(obj.other_field)` can produce a false positive because `obj` itself is tainted.
- **Aliasing.** `let y = &x; sink(*y)` tracks `y` separately from `x`. Can cause FNs.
- **Deep pointer aliasing.** `let y = &x; sink(*y)` works through one level, but arbitrary chains of pointer arithmetic and aliased writes (`*p`, `p->field` in C/C++) are not tracked end-to-end. Function pointers and indirect calls resolve to no callee.
- **Implicit flows.** Taint follows explicit data, not branching signal. `if (secret) x = 1 else x = 0` does not taint `x`.
- **Globals and statics across functions.** Not tracked across function boundaries.
@ -35,7 +34,7 @@ One rule ID, parameterized by the source location. Suppressions can target eithe
| Scenario | Why | Mitigation |
|---|---|---|
| Custom sanitizer not recognised | Only built-in + configured sanitizers match | Add a custom sanitizer rule in config |
| Taint through struct fields | Variable-level tracking, not field-level | No fix yet; field-sensitivity is planned |
| Container holds mixed-typed items the engine cannot tell apart | A `vector<int>` of port numbers and a `vector<string>` of user input share the same store/load model | Sanitize the values on the way in (numeric parse / explicit validator) so the values themselves carry no cap, not just the container |
| Dead branches | Path-insensitive within a function | Constraint solving catches trivially infeasible combos; path-validated findings are scored lower |
| Library wrapper re-introduces taint | Wrapper opaque, or summary marks it as propagating | Summarize the wrapper explicitly or add it as a sanitizer |

View file

@ -14,6 +14,10 @@ A scan runs in two passes over the file tree, with an optional SQLite index that
Two extra layers tune precision around calls. **Context-sensitive inlining** (k=1) re-runs intra-file callees with the actual argument taint at the call site, so a helper called once with tainted input and once with sanitized input produces the right result for each call. **SCC fixed-point**: when a group of mutually-recursive functions forms a strongly-connected component in the call graph, the engine iterates summaries to a joint fixed-point (capped at 64 iterations). SCCs that span files are also handled.
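The SCC fixed-point can be pictured with the minimal sketch below. It assumes a hypothetical summary that is just a bit set joined with bitwise OR; the real summaries are richer, and `solve_scc` is an illustrative name rather than an engine function. Only the 64-iteration cap comes from the text above.

```rust
/// Hypothetical summary: a bit set of taint capabilities a function may return.
type Summary = u32;

/// Iteration cap for one SCC, as stated in the text.
const MAX_SCC_ITERS: usize = 64;

/// Iterates the summaries of one SCC to a joint fixed point.
/// `local[i]` is function i's own contribution; `calls[i]` lists the indices
/// of the functions it calls inside the same SCC.
/// Returns the summaries and whether they converged within the cap.
fn solve_scc(local: &[Summary], calls: &[Vec<usize>]) -> (Vec<Summary>, bool) {
    let mut sums = local.to_vec();
    for _ in 0..MAX_SCC_ITERS {
        let mut changed = false;
        for f in 0..sums.len() {
            // Join the function's summary with those of its callees.
            let mut next = sums[f];
            for &callee in &calls[f] {
                next |= sums[callee];
            }
            if next != sums[f] {
                sums[f] = next;
                changed = true;
            }
        }
        if !changed {
            return (sums, true);
        }
    }
    (sums, false)
}
```

Because the join only ever adds bits, each function's summary grows monotonically, so the loop terminates on its own for finite bit sets; the cap is a safety net rather than the normal exit.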
When a method call has a receiver typed as a super-class, trait, or interface, **hierarchy fan-out** widens the resolved callee set to every concrete implementer the engine has seen. A class diagram extracted in pass 1 (Java extends/implements, Rust impl-for, TS/JS extends, Python bases, Ruby includes, PHP extends/implements, C++ inheritance) feeds an index that the call resolver consults during pass 2. The fan-out is capped at 8 implementers per call site; over-fanning is a precision tax, not a soundness issue.
A separate **field-sensitive points-to** pass tracks abstract locations down to the field level, so `c.mu.Lock()` is a lock on `Field(c, mu)` rather than on `c` as a whole. That distinction is what lets the resource-lifecycle and taint passes tell `obj.field = tainted; sink(obj.other_field)` apart from the conservative whole-variable approximation. Subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic `__index_get__` / `__index_set__` calls so the same container model handles them. Set `NYX_POINTER_ANALYSIS=0` to fall back to the pre-pointer-pass behaviour for one release if you need to compare baselines.
## Optional analyses on top
These run on top of the forward taint pass. They're independently switchable via `[analysis.engine]` config or matching CLI flags. See [advanced-analysis.md](advanced-analysis.md) for the full description and tradeoffs.
@ -22,6 +26,8 @@ These run on top of the forward taint pass. They're independently switchable via
|---|---|---|
| Abstract interpretation | Carries interval and string prefix/suffix bounds alongside taint. Suppresses findings on proven-bounded integers and locked-prefix URLs | on |
| Context sensitivity | k=1 inlining for intra-file callees | on |
| Field-sensitive points-to | Distinguishes `obj.field` from `obj` itself, so a tainted write to one field does not poison reads from another. Also gives the resource-lifecycle pass per-field locks | on |
| Hierarchy fan-out | When a method call's receiver is typed as a super-class, trait, or interface, widens callee resolution to every concrete implementer the engine has seen | on |
| Constraint solving | Drops paths whose accumulated branch predicates are unsatisfiable. Optional Z3 backend with `--features smt` | on |
| Symbolic execution | Builds an expression tree per tainted value. Produces a witness string at the sink. Detects sanitization patterns the taint engine alone would miss | on |
| Backwards analysis | After the forward pass, walks backwards from each sink to confirm or invalidate the flow. Annotates findings as `backwards-confirmed`, `backwards-infeasible`, or `backwards-budget-exhausted` | off |

View file

@ -9,28 +9,34 @@ The classifications here are grounded in three concrete signals:
1. **Rule depth**: how many distinct source / sanitizer / sink matchers exist
for the language in `src/labels/<lang>.rs`, and how many vulnerability
classes (Cap bits) those matchers cover.
2. **Benchmark results**: rule-level precision / recall / F1 on the 305-case
corpus (267 synthetic + 14 real-CVE pairs + 10 auth fixtures) in
2. **Benchmark results**: rule-level precision / recall / F1 on the 433-case
corpus in
[`tests/benchmark/RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md),
last measured 2026-04-23 with scanner version 0.5.0.
last measured 2026-04-29 with scanner version 0.5.0.
3. **Known weak spots**: FPs and FNs the maintainers have deliberately left
in the benchmark rather than suppressed, documented release-by-release in
in the benchmark rather than suppressed, plus structural engine
limitations the corpus does not stress, documented release-by-release in
[`RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md).
All parser integrations use tree-sitter and are stable; parsing is not a
differentiator between tiers. The differentiators are rule depth, cross-file
confidence, and modeled idioms.
As of 2026-04-29 the synthetic corpus has effectively saturated: nine of ten
languages report rule-level F1 = 100.0% and Go reports 94.1% (two FPs and
one FN on a real-CVE SSRF case, `cve-go-2023-3188-vulnerable`). Aggregate
rule-level P=0.991, R=0.995, F1=0.993. That means F1 alone no longer
differentiates tiers, so the differentiators are **rule depth**,
**gated-sink coverage**, and **structural idioms the corpus does not fully
stress** (deep pointer aliasing in C/C++, framework-specific context). All
parser integrations use tree-sitter and are stable; parsing is not a
differentiator.
---
## Tier Summary
| Tier | Languages | What to expect |
|------|-----------|----------------|
| **Stable** | Python, JavaScript, TypeScript | Deep rule sets, gated sinks (argument-role-aware), framework detection, extensive fixtures, and the bulk of advanced-analysis (SSA, context-sensitivity, symbolic execution) coverage. Safe to depend on in CI gates. |
| **Beta** | Go, Java, Ruby, PHP | Solid mid-depth rule sets with known narrower class coverage. No gated sinks yet. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation) are incomplete. Usable in CI, but review FP/FN lists before tightening gates. |
| **Preview** | C, C++ | Pattern-only coverage. Pointer aliasing, function pointers, array-element taint, and STL container flows are not modeled. Suitable for finding obvious unsafe API uses; do not use as a sole SAST gate. Pair with clang-tidy / Clang Static Analyzer / Infer. |
| **Experimental** | Rust | Full source coverage relative to the framework ecosystem, but several FPs persist on adversarial safe cases pending engine work (match-arm guards, structural sinks with type facts). Appropriate for spot-checks and contribution but not yet recommended as a sole SAST dependency. |
| Tier | Languages | F1 | What to expect |
|------|-----------|----|----------------|
| **Stable** | Python, JavaScript, TypeScript | 100% | Deep rule sets, gated sinks (argument-role-aware), framework detection, extensive fixtures, and the bulk of advanced-analysis (SSA two-level solve, context-sensitivity, symbolic execution, abstract interpretation) coverage. Safe to depend on in CI gates. |
| **Beta** | Go, Java, PHP, Ruby, Rust | 94.1% to 100% | Solid mid-depth rule sets with narrower cap coverage and **no gated sinks**. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation, match-arm guards) are partially modeled. Usable in CI; review FP/FN lists before tightening gates. |
| **Preview** | C, C++ | 100% on synthetic corpus | Recent work taught the engine to follow taint through `std::vector` / `std::string` / map containers (including `c_str()`), through fluent builder chains like `Socket::builder().host(h).connect()`, and through inline class member functions. Function pointers and deeper pointer aliasing through `*p` / `p->field` are still not tracked. Rule-level scores against a corpus of obvious unsafe-API uses look perfect, but that is not the same as a clean audit on a real codebase. Pair with clang-tidy, Clang Static Analyzer, or Infer. |
---
@ -38,7 +44,7 @@ confidence, and modeled idioms.
### Stable tier
#### Python: 100% P / 100% R / 100% F1 *(29-case corpus)*
#### Python: 100% P / 100% R / 100% F1 *(46-case corpus)*
- **Rule depth**: 5 source families, 7 sanitizer families, 21 sink matchers
spanning HTML, URL, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
@ -47,52 +53,59 @@ confidence, and modeled idioms.
- **Advanced analysis**: gated sinks (`Popen`, `subprocess.run/call` with
activation-arg awareness), most SSA-equivalence and symbolic-execution
fixtures target Python.
- **Fixtures**: 125 under `tests/fixtures/` plus 30 benchmark cases.
- **Fixtures**: 125 under `tests/fixtures/` plus 42 benchmark cases.
- **Blind spots**: f-string interpolation is not explicitly modeled as a
distinct taint-producing construct; string-formatting flows are caught by
the general concatenation path.
#### JavaScript: 93.8% P / 100% R / 96.8% F1 *(27-case corpus)*
#### JavaScript: 100% P / 100% R / 100% F1 *(42-case corpus)*
- **Rule depth**: 3 source families, 10 sanitizer families, 24 sink matchers
spanning HTML, URL, JSON, Shell, SQL, Code, SSRF, and File I/O.
- **Advanced analysis**: gated sinks (`setAttribute`, `parseFromString`),
two-level SSA solve for top-level + per-function scopes (`analyse_ssa_js_two_level`),
prefix-locked SSRF suppression via StringFact.
two-level SSA solve for top-level + per-function scopes
(`analyse_ssa_js_two_level`), prefix-locked SSRF suppression via
StringFact, abstract-interpretation interval tracking.
- **Framework context**: Express, Koa, Fastify (via in-file import scan when
`package.json` is absent).
- **Fixtures**: 238 under `tests/fixtures/`; the largest corpus of any
- **Fixtures**: 238 under `tests/fixtures/`; the largest fixture set of any
language.
- **Blind spots**: template literals are lowered through concatenation rather
than modeled as a first-class taint operator; dynamic property access
(`obj[user]`) is conservatively treated.
#### TypeScript: 100% P / 100% R / 100% F1 *(35-case corpus, most recent measurement)*
#### TypeScript: 100% P / 100% R / 100% F1 *(47-case corpus)*
- **Rule depth**: Shares the JS ruleset (3 sources, 10 sanitizers, 24 sinks)
plus TS-specific grammar handling.
- **Advanced analysis**: TSX and JSX grammars wired as of 2026-04-20;
- **Advanced analysis**: TSX and JSX grammars wired;
discriminated-union narrowing, generic erasure, decorator flow, and
interface dispatch are all validated against adversarial type-system
stressors.
- **Framework context**: Fastify detection via `detect_in_file_frameworks`
(import-driven, no `package.json` required).
- **Fixtures**: 39 test fixtures plus 35 benchmark cases.
- **Blind spots**: 0 known open weak spots as of 2026-04-20. `as any` casts
and `any`-typed flows are handled conservatively (treated as tainted).
- **Fixtures**: 39 test fixtures plus 42 benchmark cases.
- **Blind spots**: `as any` casts and `any`-typed flows are handled
conservatively (treated as tainted).
### Beta tier
#### Go: 94.1% P / 100% R / 97.0% F1 *(28-case corpus)*
#### Go: 92.3% P / 96.0% R / 94.1% F1 *(53-case corpus, 2 FPs, 1 FN)*
- **Rule depth**: 4 source families, 4 sanitizer families, 9 sink matchers
covering HTML, URL, Shell, SQL, SSRF, Crypto, and File I/O.
- **Framework context**: Gin, Echo source matchers.
- **Known gaps**: no gated sinks, no deserialization class, allowlist
early-return patterns in path-pruning benchmark cases still produce FPs
(`go-pathprune-safe-001`). `fmt.Sprintf` is deliberately not a sink.
- **Open weak spots**: `cve-go-2023-3188-vulnerable` (owncast SSRF) goes
undetected, and two safe Go fixtures (`go-safe-007`, `go-safe-009`) draw
spurious SQLi and CMDi findings respectively. These are the only
imperfect language scores in the current corpus.
- **Known gaps**: no gated sinks, no deserialization class. `fmt.Sprintf`
is deliberately not a sink. Cap coverage is narrower than the Stable
tier and argument-role-aware sink modeling is not yet implemented for Go,
so production CI gates may surface additional FPs the corpus does not
exercise.
#### Java: 92.9% P / 100% R / 96.3% F1 *(23-case corpus)*
#### Java: 100% P / 100% R / 100% F1 *(35-case corpus)*
- **Rule depth**: 3 source families, 8 sanitizer families, 10 sink matchers
covering HTML, URL, Shell, SQL, Code, SSRF, and Deserialization.
@ -101,10 +114,19 @@ confidence, and modeled idioms.
- **Known gaps**: no gated sinks. Variable-receiver method calls
(`client.send(...)` vs `HttpClient.send(...)`) rely on type-qualified
resolution from receiver-type inference; flows where the receiver type
cannot be inferred are missed (`java-ssrf-002` historically persisted as
FN; closed via type facts but fragile on unusual builder chains).
cannot be inferred are conservatively over-tainted on unusual builder
chains.
#### Ruby: 100% P / 92.3% R / 96.0% F1 *(24-case corpus)*
#### PHP: 100% P / 100% R / 100% F1 *(37-case corpus)*
- **Rule depth**: 3 source families (`$_GET`, `$_POST`, `$_REQUEST`
superglobals), 7 sanitizer families, 10 sink matchers covering HTML, URL,
Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Known gaps**: no gated sinks. Limited framework context (Laravel raw
methods only). `echo` language-construct detection is wired but its
inner-argument propagation is narrower than function-call sinks.
#### Ruby: 100% P / 100% R / 100% F1 *(39-case corpus)*
- **Rule depth**: 3 source families, 7 sanitizer families, 15 sink matchers
covering HTML, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
@ -112,154 +134,168 @@ confidence, and modeled idioms.
- **Known gaps**: string interpolation inside shell and SQL strings is
recognized structurally but not modeled as a distinct operator.
`begin/rescue/ensure` exception-edge wiring is documented as deferred
(structurally incompatible with `build_try()`). One FN persists on an
interprocedural taint propagation case due to rule-ID mismatch, not a
missed flow (`rb-interproc-001`).
(structurally incompatible with `build_try()`). The previously open
`rb-interproc-001` FN was closed in the 2026-04-28 baseline after the
Ruby `Kernel#open` CMDI sink and exact-match sigil work landed.
#### PHP: 86.7% P / 100% R / 92.9% F1 *(24-case corpus)*
#### Rust: 100% P / 100% R / 100% F1 *(70-case adversarial corpus)*
- **Rule depth**: 3 source families (`$_GET`, `$_POST`, `$_REQUEST`
superglobals), 7 sanitizer families, 10 sink matchers covering HTML, URL,
Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Known gaps**: no gated sinks. Limited framework context (Laravel raw
methods only). Interprocedural sanitizer-wrapping case
(`php-interproc-safe-001`) persists as FP. `echo` language-construct
detection is wired but its inner-argument propagation is narrower than
function-call sinks.
### Preview tier
C and C++ are labeled **Preview** (not Experimental) to convey a specific
shape of limitation: the parser and existing rules produce useful findings
on obvious unsafe-API uses, but the engine **structurally cannot model**
several pervasive C/C++ constructs. Running Nyx on a C/C++ codebase and
seeing a clean report should not be read as a clean audit. Pair Nyx with
clang-tidy, the Clang Static Analyzer, or Infer for production use.
**Not modeled** (common to both C and C++):
- Pointer aliasing. Taint through `*p`, `p->field`, arbitrary pointer
arithmetic, and aliased writes are not tracked.
- Function pointers and callback dispatch. Indirect calls through
`void (*fn)(char *)` resolve to no callee.
- Array-element taint. Writes to `buf[i]` do not propagate taint to `buf`
in the general case; structural taint chains involving `fgets` → array →
`system` have rule-ID matching issues (`c-cmdi-004`).
- STL container operations (C++ only). `std::vector`, `std::map`,
`std::string` methods are not taint-aware; `c_str()` breaks taint chains
(`cpp-cmdi-003`).
- Lambdas and nested classes (C++ only). Not modeled.
- Complex socket setup (C++ only). E.g. `connect()` chains are not detected
(`cpp-ssrf-002`).
#### C: 85.7% P / 100% R / 92.3% F1 *(20-case corpus)*
- **Rule depth**: 3 source families, **2** sanitizer families (prefix-based
only), 5 sink matchers spanning Shell, File, SSRF, and Format-String.
- **Known gaps**: no framework rules, no gated sinks. Path-validation via
`strstr()` is not recognized as a guard (`c-safe-006`). Forward-declared
sanitizers are not tracked (`c-safe-008`).
#### C++: 80.0% P / 100% R / 88.9% F1 *(20-case corpus)*
- **Rule depth**: Clones the C ruleset (3 sources, 2 sanitizers, 5 sinks) and
adds `std::cin` / `std::getline` sources.
- **Known gaps**: same sanitizer-recognition gaps as C. See the "Not
modeled" list above for structural gaps (STL containers, `c_str()`,
`connect()`, lambdas, nested classes).
### Experimental tier
#### Rust: 76.0% P / 100% R / 86.4% F1 *(31-case adversarial corpus)*
Rust holds the largest per-language adversarial corpus and was promoted
from Experimental to Beta in the 2026-04-25 measurement after the PathFact
landings closed every previously-open `rs-safe-*` regression.
- **Rule depth**: 6 source families, **2** sanitizer families (prefix and
type-coercion), 11 sink matchers covering HTML, Shell, SQL, SSRF,
Deserialization, and File I/O. Extensive framework source coverage
(Axum, Actix, Rocket); the most of any language on the source side.
- **Recent additions (2026-04-20)**: new SQL class (`rusqlite`, `sqlx`,
`diesel`, `postgres`), new Deserialization class (`serde_yaml`,
`bincode`, `rmp_serde`, `ciborium`, `ron`, `toml`), expanded file I/O
(Axum, Actix, Rocket); the most of any language on the source side. The
narrow sanitizer count is the primary reason Rust is not in the Stable
tier. Engine-side path/typed sanitizer recognition (PathFact) compensates,
but the ruleset itself is shallow.
- **Recent additions**: SQL class (`rusqlite`, `sqlx`, `diesel`,
`postgres`), Deserialization class (`serde_yaml`, `bincode`,
`rmp_serde`, `ciborium`, `ron`, `toml`), expanded file I/O
(`fs::remove_file/dir/rename/copy`), `reqwest` SSRF builder chain.
- **Known gaps**:
- `rs-safe-003`: structural `cfg-unguarded-sink` fires when a tainted
variable is *declared* in scope but not used in the sink; intentional
for high-risk sinks.
- `rs-safe-009`: match-arm guards don't surface as `StmtKind::If`, so
`classify_condition` never sees the character-class validation.
- `safe_direct_sanitizer.rs`: still FP because the SSA lowering for
an OR-chain rejection (`if a || b || c { return X }`) joins both
return paths into a single block, losing the early-return
semantics. Distinct from the merged-return-block defect closed in
2026-04-24; tracked separately.
- **Closed by the 2026-04-23 PathFact domain**
(`src/abstract_interp/path_domain.rs`): `rs-safe-007` (`.replace("..",
"")` sanitiser), `rs-safe-008` (negative-validation return pattern),
`rs-safe-010` (static-map lookup; still handled by the dedicated
static-map analysis, but PathFact does not interfere), new `rs-safe-012`
(`.contains("..")` + `.starts_with('/')` intraprocedural rejection),
new `rs-safe-015` (`Path::new(p).is_absolute()` typed rejection), plus a
new `rs-path-006` negative-guard to prevent over-suppression.
- **Closed by the 2026-04-24 per-return-path PathFact landing**
(`PathFactReturnEntry` on `SsaFuncSummary` + structural
variant-wrapper transparency + non-data-return skipping +
path-fact-proven leaf detection in
`trace_tainted_leaf_values`):
`rs-safe-014` (Option-returning user sanitiser),
new `rs-safe-016` (cross-function `.contains("..")` rejection),
`CVE-2018-20997` patched (tar-rs zip-slip),
`CVE-2022-36113` patched (cargo `.cargo-ok` symlink),
`CVE-2024-24576` patched (BatBadBut argv injection).
- **Closed by recent PathFact landings**
(`src/abstract_interp/path_domain.rs` + per-return-path PathFact entries
on `SsaFuncSummary`): `rs-safe-007` (`.replace("..","")` sanitiser),
`rs-safe-008` (negative-validation return), `rs-safe-009` (match-arm
guards via condition lifting), `rs-safe-010` (static-map lookup),
`rs-safe-012` (`.contains("..")` + `.starts_with('/')` rejection),
`rs-safe-014` (Option-returning user sanitiser), `rs-safe-015`
(`Path::new(p).is_absolute()` typed rejection), `rs-safe-016`
(cross-function `.contains("..")` rejection), and CVE patches
`CVE-2018-20997`, `CVE-2022-36113`, `CVE-2024-24576`.
- **Not yet covered**: unsafe FFI / `std::mem::transmute` (no rules), Tokio
`process::Command` async variants (not distinguished from sync),
`hyper` / `surf` / `ureq` SSRF clients (reqwest family only), and Rocket /
Actix positive cases (rules exist but no benchmark fixtures yet).
`hyper` / `surf` / `ureq` SSRF clients (reqwest family only).
### Preview tier
C and C++ remain **Preview** despite reporting 100% rule-level F1 on the
synthetic corpus. A run of additions in late April taught the engine to
follow taint through several constructs that used to be hard cutoffs (STL
containers, builder chains, inline member functions, the wider `std::sto*`
family), so the gap between "passes the synthetic corpus" and "would catch
the same flow on a real codebase" is narrower than it used to be. It is not
zero. The biggest remaining gaps are deep pointer aliasing and function
pointers, both of which are pervasive in real C/C++ code. Treat a clean
report as a starting point, not an audit. Pair Nyx with clang-tidy, the
Clang Static Analyzer, or Infer for production use.
**What now works** (added in late April):
- STL container flow. `vec.push_back(tainted)` followed by
`vec.front().c_str()` carries taint into a downstream `system()` sink.
`std::map::insert_or_assign`, `find`, `count`, `at`, and `data` all
participate in the container store/load model.
- Inline class member functions. `class C { void run(...) { ... } };`
bodies are now extracted as their own functions, so an intra-file call
like `inner.run(input)` resolves to the body summary. Same fix covers
`struct_specifier`, `union_specifier`, `enum_specifier`,
`template_declaration`, and `extern "C"` blocks.
- Lambda passthrough. `auto echo = [](const char* s) { return s; };` carries
argument taint into the result via the engine's default call-argument
propagation.
- Builder chains. `Socket::builder().host(user).port(8080).connect()`
resolves the chained returns and fires on `.connect()` when `user` is
tainted; the safe variant with a hardcoded host stays quiet.
- Wider numeric sanitizer family. The full `std::sto*` set (including
`stoll`, `stoull`, `stold`) and the C-stdlib forms (`atoi`, `atof`,
`strtol`, etc.) clear all caps when they're called.
- More header / source extensions. `.cc`, `.cxx`, `.hpp`, `.hxx`, `.hh`,
and `.h++` are recognized as C++ on top of `.cpp` and `.c++`. `.h` is
intentionally still routed to C since it's ambiguous without a build
system.
**Still not modeled** (common to both C and C++):
- Deep pointer aliasing. Taint through `*p`, `p->field`, and pointer
arithmetic is not tracked across arbitrary aliased writes.
Field-sensitive points-to (see [Advanced analysis](advanced-analysis.md))
handles the "lock on a sub-field" case but is not a general escape
analysis.
- Function pointers and callback dispatch. An indirect call through
`void (*fn)(char *)` resolves to no callee, so cross-pointer flows are
invisible.
- Array-element taint by index. Writes to `buf[i]` do not always propagate
taint to `buf` as a whole; the recent subscript-handling work helps the
general case but doesn't make `buf` an alias for every element.
- Nested classes beyond one level (C++ only).
#### C: 100% P / 100% R / 100% F1 *(30-case corpus)*
- **Rule depth**: 3 source families, **2** sanitizer families (the
`sanitize_*` prefix and numeric-parse functions), 5 sink matchers spanning
Shell, File, SSRF, and Format-String.
- **Known gaps**: no framework rules, no gated sinks. The structural
limitations listed above are the dominant concern; rule additions alone
will not lift this language out of the Preview tier.
#### C++: 100% P / 100% R / 100% F1 *(33-case corpus, plus 6 new fixtures for STL / builder / inline-method flows)*
- **Rule depth**: Builds on the C ruleset with `std::cin` / `std::getline`
sources and a wider numeric-sanitizer set covering the full `std::sto*`
family (3 sources, 3 sanitizer families, 5 sinks).
- **Known gaps**: still no framework rules and no gated sinks. The
structural blind spots are now narrower than they were a release ago
(see "What now works" above), but function pointers and the harder
pointer-aliasing patterns still produce false negatives.
---
## How the tiers were assigned
Because rule-level F1 has saturated for nine of ten languages, the tier
boundaries are drawn primarily on **rule depth** and **engine coverage of
real-world idioms** rather than on benchmark scores alone.
A language lands in **Stable** when all three hold:
- Rule set covers ≥ 8 vulnerability classes with both source and sink
matchers, and at least one class has argument-role-aware gating.
matchers, and at least one class has argument-role-aware **gated-sink**
modeling (e.g. `setAttribute("href", url)` only flags href-like attrs).
- Benchmark F1 ≥ 95% on a corpus of ≥ 25 cases.
- Advanced analysis (SSA lowering, context-sensitivity, symbolic-execution)
is exercised by fixtures for the language.
- Advanced analysis (SSA lowering, context-sensitivity, symbolic execution,
abstract interpretation) is exercised by fixtures for the language.
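
The Stable checklist above can be read as a simple predicate. The sketch below is illustrative only: `LanguageStats`, its field names, and `meets_stable_criteria` are hypothetical and are not part of Nyx; the thresholds are the ones stated in the bullets.

```python
from dataclasses import dataclass


@dataclass
class LanguageStats:
    """Hypothetical per-language summary; field names are illustrative."""
    classes_with_source_and_sink: int   # vulnerability classes with both matchers
    has_gated_sink: bool                # at least one argument-role-aware gated sink
    benchmark_f1: float                 # percent, e.g. 97.5
    corpus_cases: int                   # number of benchmark cases
    advanced_analysis_fixtures: bool    # fixtures exercise the advanced analyses


def meets_stable_criteria(stats: LanguageStats) -> bool:
    """True only when all of the Stable criteria listed above hold."""
    rule_depth_ok = stats.classes_with_source_and_sink >= 8 and stats.has_gated_sink
    benchmark_ok = stats.benchmark_f1 >= 95.0 and stats.corpus_cases >= 25
    return rule_depth_ok and benchmark_ok and stats.advanced_analysis_fixtures
```

A language that scores 100% F1 but has no gated sinks fails the first criterion under this reading, so the predicate returns `False`.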
A language lands in **Beta** when benchmark F1 ≥ 90% but at least one of the
Stable criteria fails; usually narrower cap coverage or absence of gated
sinks.
A language lands in **Beta** when benchmark F1 is in the mid-90s or higher
on a meaningful corpus but at least one Stable criterion fails. Typical
gaps: absence of gated sinks, or sanitizer rule depth narrow enough that
the engine compensates structurally rather than via the ruleset.
A language lands in **Preview** when the engine structurally cannot model
constructs that are pervasive in typical codebases for that language
(pointer aliasing, function pointers, array-element taint, STL containers
for C/C++). Pattern-only coverage is useful but not sufficient as a sole
SAST gate.
A language lands in **Preview** when the engine has documented structural
blind spots for constructs that are pervasive in typical codebases for that
language. For C and C++ that means deep pointer aliasing, function
pointers, and array-element taint; STL container flow and builder chains
have moved out of the blind-spot list. Synthetic-corpus F1 is not a
reliable signal for Preview-tier languages: a clean report can coexist
with structural gaps.
A language lands in **Experimental** when rule depth is clearly narrower
(≤ 5 sinks and ≤ 2 sanitizers), or benchmark F1 < 90%, or documented weak
spots require engine changes rather than rule additions to close, but the
engine does not have the pervasive structural blind spots of the Preview
tier.
(The previous **Experimental** tier was retired in the 2026-04-25
measurement when Rust's adversarial corpus reached 100% F1; no language
currently sits in that tier.)
---
## What this means for you
- **CI gates**: safe to set strict `--fail-on HIGH` gates on Stable-tier
languages. On Beta-tier, expect occasional FP triage; the weak-spot lists
above tell you exactly what to skim for. On Preview- and Experimental-tier,
treat Nyx findings as a starting point for manual review rather than
authoritative; Preview-tier languages in particular have structural
blind spots that a clean report will not disclose.
languages. On Beta-tier, expect occasional FP triage on production code
(the synthetic corpus does not cover every framework idiom); the
weak-spot lists above tell you what to skim for. On Preview-tier, treat
Nyx findings as a starting point for manual review rather than
authoritative. STL container flow and builder chains are tracked now,
but deep pointer aliasing and function pointers are not, so a clean
report does not tell you what the engine could not see.
- **Rule contributions**: the shortest path to raising a language's tier is
contributing sink matchers and gated-sink registrations. Label files live
at `src/labels/<lang>.rs`; benchmark cases live at
`tests/benchmark/corpus/<lang>/`.
- **Scope planning**: if your primary stack is C, C++, or Rust, Nyx will
surface real findings, but you should budget for review time and consider
combining Nyx with a language-specific tool (e.g. `cargo-audit`,
`clang-tidy`) until those tiers mature.
- **Scope planning**: if your primary stack is C or C++, Nyx will surface
real findings on obvious unsafe-API uses, but budget for review time and
combine Nyx with `clang-tidy` or the Clang Static Analyzer. Rust is now
Beta-tier and suitable as a CI gate; pair with `cargo-audit` for
dependency CVEs.
The benchmark thresholds in `tests/benchmark_test.rs` are deliberately set
~5 pp below current baselines so any drop in a language's F1 fails CI. Tier

@ -10,7 +10,7 @@ First run builds a SQLite index under `.nyx/`; later runs skip files whose conte
## What a finding looks like
<p align="center"><img src="../assets/screenshots/docs/cli-scan-quickstart.png" alt="nyx scan output: two HIGH taint flows (Python os.system, JavaScript document.write) framed by the brand purple gradient" width="900"/></p>
<p align="center"><img src="../assets/screenshots/cli-scan.png" alt="nyx scan output: HIGH taint flows from req.params.user, req.query.url, and req.query.path into exec/fetch/fs.readFileSync, framed by the brand purple gradient" width="900"/></p>
The same scan in console form:
@ -23,7 +23,7 @@ The same scan in console form:
Sink: os.system
6:5 ✖ [HIGH] py.cmdi.os_system (Score: 64, Confidence: High)
Os.system() — shell command execution
os.system() runs a shell command
/tmp/demo/xss_document_write.js
5:5 ✖ [HIGH] taint-unsanitised-flow (source 3:18) (Score: 81, Confidence: High)
@ -33,7 +33,7 @@ The same scan in console form:
Sink: document.write
5:5 ⚠ [MEDIUM] js.xss.document_write (Score: 34, Confidence: High)
Document.write() — XSS sink
document.write() is an XSS sink
warning 'demo' generated 10 issues.
Finished in 0.054s.

@ -46,6 +46,44 @@ If you forward the port over SSH or expose it through a reverse proxy, the host-
The numeric `:id` for finding URLs is the position index in the current scan, not a stable fingerprint. Bookmarks across scans aren't reliable; rely on file path + line.
### Overview and Health Score
The overview is the landing page after a scan. It shows severity counts, top affected files, OWASP coverage, and a 0 to 100 Health Score with a letter grade.
#### How the Health Score is calculated
Two things drive the score: the density of risk in the codebase, and hard guardrails that decide what the grade can mean.
Each finding contributes weight = `severity_base × confidence_factor × verdict_factor × context_factor`:
- Severity base: HIGH 10, MEDIUM 3, LOW (security) 0.5
- Confidence: High 1.0, Medium 0.6, Low 0.3
- Symex verdict: Confirmed 1.2, NotAttempted 1.0, Inconclusive 0.7, Infeasible 0.1
- Context: cross-file taint flow 1.15, intra-file flow 1.0, AST-only or no flow 0.75, test path 0.3
Quality lints (rule IDs containing `.quality.`) skip the per-finding weight and instead apply a saturating drag, capped at 15 points (so 1000 unwrap lints don't grade worse than 300 do). Total weight gets divided by `sqrt(files / 100)`, clamped between 1 and roughly 22, so a 100-file repo and a 50000-file repo see different denominators but a monorepo can't dilute its way out of a real HIGH.
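
The per-finding weight and the size normalization described above can be sketched in Python. This is an illustration, not Nyx's implementation: the function names and dictionary keys are hypothetical, the upper clamp of `22.0` is an assumption (the text only says "roughly 22"), and each finding is assumed to carry exactly one context factor. The quality-lint drag and the log curve are left out because their formulas are not given here.

```python
import math

# Factor tables copied from the list above; the key spellings are illustrative.
SEVERITY_BASE = {"HIGH": 10.0, "MEDIUM": 3.0, "LOW": 0.5}
CONFIDENCE = {"High": 1.0, "Medium": 0.6, "Low": 0.3}
VERDICT = {"Confirmed": 1.2, "NotAttempted": 1.0, "Inconclusive": 0.7, "Infeasible": 0.1}
CONTEXT = {"cross_file": 1.15, "intra_file": 1.0, "ast_only": 0.75, "test_path": 0.3}


def finding_weight(severity, confidence, verdict, context):
    """Weight of one non-quality finding: the product of the four factors."""
    return (SEVERITY_BASE[severity] * CONFIDENCE[confidence]
            * VERDICT[verdict] * CONTEXT[context])


def size_divisor(files, upper=22.0):
    """sqrt(files / 100), clamped to [1, upper]; `upper` is an assumed value."""
    return min(max(math.sqrt(files / 100), 1.0), upper)


def normalized_weight(findings, files):
    """Sum the per-finding weights and divide by the size divisor."""
    total = sum(finding_weight(*f) for f in findings)
    return total / size_divisor(files)
```

Under these assumptions, a Confirmed cross-file HIGH at High confidence weighs 10 × 1.0 × 1.2 × 1.15 = 13.8, and a 400-file repo divides its total weight by 2.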
The result feeds a log curve into a 0 to 100 base, minus the quality drag. Then HIGH guardrails apply, keyed on the *credibility-adjusted* HIGH count rather than the raw count:
| effective HIGH | ceiling |
|---|---|
| 0 | 100 |
| 1 | 85 |
| 2 | 78 |
| 3 to 5 | 68 |
| 6 to 10 | 58 |
| 11+ | 45 |
A repo with zero effective HIGHs never grades below C 70. That floor is the structural promise that the score isn't an automated F-machine for projects that have lots of LOW noise but no critical issues.
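
The ceiling table and the zero-HIGH floor can be expressed as a small lookup. This is a sketch under stated assumptions, not the scanner's code: `high_ceiling` and `apply_guardrails` are hypothetical names, the effective HIGH count is assumed to arrive as an already-rounded integer (how the credibility adjustment rounds is not stated here), and the ±5 modifiers are ignored.

```python
def high_ceiling(effective_high):
    """Score ceiling for a given effective HIGH count, per the table above."""
    if effective_high <= 0:
        return 100
    if effective_high == 1:
        return 85
    if effective_high == 2:
        return 78
    if effective_high <= 5:
        return 68
    if effective_high <= 10:
        return 58
    return 45


def apply_guardrails(base_score, effective_high):
    """Cap the score at the ceiling; apply the 70 floor when no effective HIGHs exist."""
    score = min(base_score, high_ceiling(effective_high))
    if effective_high == 0:
        # Assumed ordering: the floor is applied after the ceiling.
        score = max(score, 70)
    return score
```

With this reading, a base score of 95 with one effective HIGH is capped at 85, while a base score of 40 with zero effective HIGHs is lifted to 70.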
Modifiers in the ±5 range nudge the result for trend (only after the second scan), triage coverage (only when total findings ≥ 20), reintroduced findings, and stale HIGHs more than 30 days old.
#### What the score doesn't measure
It's a Nyx-finding-pressure metric, not a security audit. Score 100 means Nyx didn't find anything under its current rules and language coverage; it doesn't certify the absence of vulnerabilities. The score doesn't see runtime config, IAM, secret stores, dependency CVEs, or anything outside the source tree being scanned. A repo of mostly Kotlin (where Nyx coverage is thin) will score artificially well because most of the code never gets evaluated.
The current ceilings are calibrated for v0.5 scanner false-positive rates. As symex coverage and rule precision improve, the ceilings tighten. Calibration data and the rationale behind each tunable live in [health-score-audit.md](health-score-audit.md).
### Findings and Finding detail
The findings list is filterable by severity, confidence, category, language, rule ID, and triage state.