apunkt/nyx

mirror of https://github.com/elicpeter/nyx.git synced 2026-06-06 19:35:13 +02:00

* chore: Exclude CLAUDE.md from Cargo.toml

* feat: add callgraph module and integrate into main analysis flow

* feat: enhance CLI with new severity filtering and analysis modes

* feat: update CHANGELOG with recent enhancements and fixes to severity filtering and output handling

* feat: implement state-model dataflow analysis for resource lifecycle and auth state

* feat: enhance diagnostic output formatting and add evidence structure

* feat: implement attack surface ranking for diagnostics with scoring and sorting

* feat: add comprehensive documentation for installation, usage, and rules reference

* feat: add multiple language support for command execution and evaluation endpoints

* feat: implement inline suppression for findings using `nyx:ignore` comments

* feat: add confidence levels to AST patterns and update output structure

* feat: implement low-noise prioritization system with category filtering, rollup grouping, and configurable budgets

* feat: bump version to 0.4.0 and update changelog with new features and improvements

* feat: add dead code allowances to various functions in mod.rs and real_world_tests.rs

2026-02-25 21:16:36 -05:00

8.7 KiB

Raw Blame History

Taint Analysis

Summary

Nyx's taint analysis tracks the flow of untrusted data from sources (where data enters the program) through assignments and function calls to sinks (where dangerous operations happen). If the data reaches a sink without passing through a sanitizer with matching capabilities, a finding is emitted.

The engine uses a monotone forward dataflow analysis over a finite lattice with guaranteed termination. Analysis is intra-procedural with cross-file function summaries — it does not follow calls into other functions but uses pre-computed summaries of their behavior.

Rule ID

taint-unsanitised-flow (source <line>:<col>)

One rule ID covers all taint findings. The parenthetical identifies the specific source location.

What It Detects

Environment variables flowing to shell execution (env::var → Command::new)
User input flowing to code evaluation (req.body → eval())
File contents flowing to SQL queries (fs::read_to_string → db.execute())
Request parameters flowing to HTML output (req.query → innerHTML)
Any source-to-sink flow where the sink's required capability is not stripped by a sanitizer

What It Cannot Detect

Inter-procedural flows without summaries: If a function isn't summarized (e.g. from a third-party library without source), the taint engine cannot track data through it. It conservatively treats unknown callees as neither propagating nor sanitizing.
Flows through data structures: Taint is tracked per-variable, not per-field. obj.field = tainted; sink(obj.other_field) may produce a false positive because taint attaches to obj as a whole.
Aliasing: let y = &x; sink(*y) — the engine tracks y as a fresh variable, not an alias of x. This can cause false negatives.
Complex control flow: The analysis is flow-sensitive (respects control flow within a function) but does not track taint through arbitrary loops with complex exit conditions.
Implicit flows: Taint only follows explicit data flow, not information flow through branching (e.g. if (secret) { x = 1 } else { x = 0 } does not taint x).

Common False Positives

Scenario	Why it happens	Mitigation
Custom sanitizer not recognized	Nyx only knows built-in and configured sanitizers	Add a custom sanitizer rule in config
Taint through struct fields	Variable-level (not field-level) tracking	No current mitigation; field sensitivity is planned
Dead code paths	The engine is path-insensitive within a function (it considers all paths)	Contradiction pruning catches some cases; path-validated findings score lower
Library wrappers	A wrapper around a dangerous function may re-introduce taint that was sanitized by the wrapper	Summarize the wrapper function or add it as a sanitizer

Common False Negatives

Scenario	Why it's missed
Third-party library calls	No summary available; callee treated as opaque
Taint through global/static variables	Not tracked across function boundaries
Taint through closures/callbacks in some languages	Closure capture analysis is limited (JS/TS/Ruby/Go anonymous functions ARE analyzed)
Flows spanning more than two files	Summary approximation loses precision at depth

Confidence Signals

These signals in the output indicate higher-confidence findings:

Signal	What it means
Evidence: Source + Sink	Both endpoints identified with specific function names and locations
Source kind = user input	Source is directly controllable by an attacker (req.body, argv, etc.)
path_validated = false	No validation guard on the path — higher exploitability
No guard_kind	No dominating predicate check (null check, error check, etc.)
High rank_score	Multiple confidence signals combined

Lower-confidence:

Signal	What it means
path_validated = true	A validation predicate guards the path — may not be exploitable
guard_kind = "ValidationCall"	An explicit validation function was called before the sink
Source kind = database	Data from DB — may already be validated at insertion time

Tuning and Noise Controls

Add custom sanitizers

If your codebase has a custom sanitizer that Nyx doesn't recognize:

# nyx.local
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml", "sanitizeInput"]
kind = "sanitizer"
cap = "html_escape"

Or via CLI:

nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape

Filter by severity

nyx scan . --severity HIGH          # Only high-severity taint findings
nyx scan . --severity ">=MEDIUM"    # Skip low-severity

Skip non-production code

By default, findings in tests/, vendor/, build/ paths are downgraded one severity tier. To exclude them entirely, add to config:

[scanner]
excluded_directories = ["tests", "vendor", "build", "examples"]

Disable taint (AST-only mode)

nyx scan . --mode ast

Example

Vulnerable code (Rust):

use std::env;
use std::process::Command;

fn main() {
    let cmd = env::var("USER_CMD").unwrap();          // line 5: source
    Command::new("sh").arg("-c").arg(&cmd).output();   // line 6: sink
}

Finding:

[HIGH]   taint-unsanitised-flow (source 5:15)  src/main.rs:6:5
         Source: env::var("USER_CMD") at 5:15
         Sink: Command::new("sh").arg("-c")
         Score: 76

Safe alternative:

use std::env;
use std::process::Command;

fn main() {
    let cmd = env::var("USER_CMD").unwrap();
    // Use the value as a direct argument, not a shell command
    Command::new(&cmd).output();
    // Or validate against an allowlist
}

Technical Details

Capability System

Taint uses a bitflag capability system to match sources with appropriate sanitizers and sinks:

Capability	Bit	Sources	Sanitizers	Sinks
`ENV_VAR`	0x01	`env::var`, `getenv`	—	—
`HTML_ESCAPE`	0x02	—	`html_escape`, `DOMPurify.sanitize`	`innerHTML`, `document.write`
`SHELL_ESCAPE`	0x04	—	`shell_escape`	`Command::new`, `system()`, `eval()`
`URL_ENCODE`	0x08	—	`encodeURIComponent`	`location.href`
`JSON_PARSE`	0x10	—	`JSON.parse`	—
`FILE_IO`	0x20	—	`filepath.Clean`, `basename`, `os.path.realpath`	`fopen`, `open`, `send_file`, `fs::read_to_string`
`FMT_STRING`	0x40	—	—	`printf(var)`

Sources typically use Cap::all() to match any sink. A sanitizer strips specific capability bits. A finding fires when a tainted variable reaches a sink and the taint still has the matching capability bit set.

Nested Function Analysis

The CFG builder recursively discovers function expressions nested inside call arguments:

JavaScript/TypeScript: function_expression, arrow_function inside call arguments (e.g., Express route handlers)
Ruby: do_block and block nodes (e.g., Sinatra get '/path' do...end)
Go: func_literal (anonymous function literals)

Each nested function is walked as a separate scope and receives a unique identifier (<anon@{byte_offset}>) to prevent collisions when multiple anonymous functions exist in the same file.

Chained Call Classification

Method chains like r.URL.Query().Get("host") are normalized by stripping internal () segments between . separators. The classifier matches against both the original text and the normalized form, enabling rules like r.URL to match within r.URL.Query.Get.

Nested Call Fallback

When the outermost call in an expression doesn't classify as a source/sink, the engine tries all nested inner calls. This handles patterns like str(eval(expr)) where str is not a sink but the inner eval is.

Rust `if let` / `while let` Pattern Bindings

The CFG builder recognizes Rust let_condition nodes inside if and while expressions. The value expression is classified for source/sink labels, and the pattern binding is extracted as a variable definition:

if let Ok(cmd) = env::var("CMD") {
    // cmd is tainted — env::var is a source, cmd is the binding
    Command::new("sh").arg("-c").arg(&cmd).output();  // taint-unsanitised-flow
}

This also works for while let patterns.

JS/TS Two-Level Solve

For JavaScript and TypeScript, taint analysis uses a two-level approach:

Level 1: Solve top-level code (module scope)
Level 2: Solve each function seeded with the converged top-level state

This prevents false positives from cross-function taint leakage while preserving global-to-function flows.

8.7 KiB Raw Blame History