apunkt/nyx

mirror of https://github.com/elicpeter/nyx.git synced 2026-06-06 19:35:13 +02:00

2026-02-25 21:16:36 -05:00


AST Pattern Matching

Summary

AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.

AST patterns run in all analysis modes, including --mode ast (where they are the only active detector).

Rule IDs

Pattern rule IDs follow the format <lang>.<category>.<specific>:

rs.memory.transmute
js.code_exec.eval
py.deser.pickle_loads
c.memory.gets
java.sqli.execute_concat

See the Rule Reference for a complete listing per language.
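As a minimal sketch of the ID format above, the following splits a rule ID into its three dot-separated parts. The `RuleId` type and `parse_rule_id` helper are hypothetical names used for illustration; they are not part of Nyx.

```python
from typing import NamedTuple


class RuleId(NamedTuple):
    """Hypothetical container for the three parts of a pattern rule ID."""
    lang: str
    category: str
    specific: str


def parse_rule_id(rule_id: str) -> RuleId:
    """Split '<lang>.<category>.<specific>' into its three parts."""
    parts = rule_id.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <lang>.<category>.<specific>, got {rule_id!r}")
    return RuleId(*parts)


rule = parse_rule_id("java.sqli.execute_concat")
print(rule.lang, rule.category, rule.specific)
```

Splitting on the first two dots only means any further dots would stay inside the `specific` part; whether Nyx permits that is not stated here.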

Pattern Tiers

| Tier | Meaning | Examples |
| --- | --- | --- |
| A | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
| B | Query includes a heuristic guard | SQL `execute` with concatenated arg, `printf(var)` with non-literal format |

Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, java.sqli.execute_concat only fires when executeQuery() receives a binary_expression (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
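The idea of a heuristic guard can be illustrated outside tree-sitter. The sketch below uses Python's standard `ast` module as a stand-in: it reports a call to any `.execute(...)` method only when the first argument is a `+` expression. This is an analogy under stated assumptions, not Nyx's actual query; `concat_execute_calls` and `SAMPLE` are hypothetical names.

```python
import ast


def concat_execute_calls(source: str) -> list[int]:
    """Return line numbers of `<obj>.execute(<a> + <b>, ...)` calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            # The guard: only a binary `+` expression as first argument counts.
            and isinstance(node.args[0], ast.BinOp)
            and isinstance(node.args[0].op, ast.Add)
        ):
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''\
cur.execute("SELECT * FROM users WHERE id=" + user_id)
cur.execute("SELECT * FROM users WHERE id=?", (user_id,))
cur.execute(query)
'''

print(concat_execute_calls(SAMPLE))  # [1]
```

Only the first call is reported. The third call passes a plain variable, so the guard stays silent even though `query` could have been built by concatenation elsewhere, which is the trade-off a structural guard makes.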

What It Detects

By category

| Category | What it matches | Example languages |
| --- | --- | --- |
| CommandExec | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
| CodeExec | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
| Deserialization | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
| SqlInjection | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
| PathTraversal | File inclusion with variable path | PHP (`include $var`) |
| Xss | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
| Crypto | Weak cryptographic algorithms and non-cryptographic randomness | All languages (`md5`, `sha1`, `Math.random()`) |
| Secrets | Hardcoded credentials | Go (variable name matching) |
| InsecureTransport | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
| Reflection | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
| MemorySafety | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
| Prototype | Prototype pollution | JS/TS (`__proto__` assignment) |
| CodeQuality | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |

What It Cannot Detect

Dataflow: Patterns don't track whether the dangerous function receives tainted input. eval("hello") (safe) and eval(userInput) (dangerous) both match js.code_exec.eval.
Context: Patterns don't understand whether the code is reachable, guarded, or inside a test.
Semantics: strcpy(dst, src) always matches — it cannot determine buffer sizes.
Indirect calls: Function pointers, dynamic dispatch, and aliased references are invisible.
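The dataflow limitation listed above can be made concrete with a small sketch. It uses Python's standard `ast` module as a stand-in for a structural matcher (an assumption for illustration; Nyx itself uses tree-sitter queries). The hypothetical `eval_calls` helper reports every direct `eval(...)` call and also records the syntactic kind of the argument, which is exactly the information a purely structural rule does not use.

```python
import ast


def eval_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, argument node kind) for each direct call to `eval`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        ):
            kind = type(node.args[0]).__name__ if node.args else "none"
            hits.append((node.lineno, kind))
    return sorted(hits)


SAMPLE = '''\
eval("1 + 1")
eval(user_input)
'''

for line, kind in eval_calls(SAMPLE):
    print(f"line {line}: eval called with a {kind} argument")
```

Both calls are reported. Telling the literal apart from the variable is easy syntactically, but deciding whether `user_input` is attacker-controlled requires dataflow, which a pattern does not have.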

Common False Positives

| Scenario | Why it fires | Mitigation |
| --- | --- | --- |
| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis does not flag a literal argument; use `--mode cfg` for fewer false positives |
| `unsafe` block in Rust with sound justification | All unsafe blocks match | `unsafe_block` is Medium, so `--severity ">=MEDIUM"` still reports it; filter with `--severity HIGH` to hide it |
| `.unwrap()` in test code | Pattern fires wherever the call appears, although `.unwrap()` is acceptable in tests | Default non-prod downgrade reduces severity |
| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter Low severity or add to exclusions |
| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |

Common False Negatives

| Scenario | Why it's missed |
| --- | --- |
| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
| SQL injection via ORM query builder | No pattern for ORM-specific query building |
| Imported function under different name | `from os import system as s; s(cmd)`: pattern looks for `system` |
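The aliased-import case can be reproduced with a name-based matcher. The sketch below again uses Python's standard `ast` module as a stand-in for a structural pattern (an assumption for illustration, not Nyx's implementation); `system_calls_by_name` is a hypothetical helper that looks only at the called name.

```python
import ast


def system_calls_by_name(source: str) -> list[int]:
    """Return line numbers of calls whose callee is literally named `system`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute):
            name = func.attr          # e.g. os.system(...)
        elif isinstance(func, ast.Name):
            name = func.id            # e.g. system(...)
        else:
            name = None
        if name == "system":
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''\
import os
from os import system as s
os.system(cmd)
s(cmd)
'''

print(system_calls_by_name(SAMPLE))  # [3]
```

The call on line 4 runs the same function but is not reported, because the matcher never resolves `s` back to `os.system`.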

Confidence Signals

| Signal | Meaning |
| --- | --- |
| Tier A | High confidence: the function itself is dangerous |
| Tier B | Moderate confidence: heuristic guard reduces false positives |
| High severity | Critical vulnerability class (command exec, deserialization) |
| Low severity | Informational (weak crypto, code quality) |
| Non-prod path | Finding in test/vendor code, downgraded by default |

Tuning and Noise Controls

Severity filtering

# Skip code-quality and weak-crypto findings
nyx scan . --severity ">=MEDIUM"

# Only critical findings
nyx scan . --severity HIGH
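
One way to read the two filter expressions above is sketched below. It assumes severities are ordered Low < Medium < High, that a `>=` prefix means "this level or higher", and that a bare level means an exact match. These semantics and the `keeps` helper are assumptions for illustration, not confirmed Nyx behavior.

```python
# Assumed ordering of severity levels (hypothetical, for illustration).
SEVERITY_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def keeps(finding_severity: str, expr: str) -> bool:
    """Return True if a finding of the given severity passes the filter."""
    level = SEVERITY_ORDER[finding_severity.strip().upper()]
    expr = expr.strip().upper()
    if expr.startswith(">="):
        # Threshold form: keep this level and everything above it.
        return level >= SEVERITY_ORDER[expr[2:].strip()]
    # Bare level: keep exact matches only.
    return level == SEVERITY_ORDER[expr]


for severity in ("Low", "Medium", "High"):
    print(severity, keeps(severity, ">=MEDIUM"), keeps(severity, "HIGH"))
```

Under these assumptions a Medium finding passes `">=MEDIUM"` but not `HIGH`, so the threshold form drops only Low findings.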

Use taint for precision

# Taint-only mode: only report findings with confirmed dataflow
nyx scan . --mode cfg

Exclude directories

[scanner]
excluded_directories = ["node_modules", "vendor", "generated"]

Examples

Tier A — structural presence

C: Banned function

char buf[64];
gets(buf);  // c.memory.gets — always dangerous, no safe usage

Python: Unsafe deserialization

import pickle
data = pickle.loads(user_input)  # py.deser.pickle_loads

Tier B — heuristic-guarded

Java: SQL concatenation

// Fires: concatenated argument
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
// java.sqli.execute_concat

// Does NOT fire: the argument is a plain identifier, not a concatenation
stmt.executeQuery(preparedSql);

C: Format string

// Fires: variable as first argument
printf(user_input);  // c.memory.printf_no_fmt

// Does NOT fire: literal format string
printf("%s", user_input);
