nyx/docs/detectors/patterns.md
Eli Peter 1bbe4b1cfb
2026-02-25 21:16:36 -05:00


AST Pattern Matching

Summary

AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.

AST patterns run in all analysis modes, including --mode ast (where they are the only active detector).

Rule IDs

Pattern rule IDs follow the format <lang>.<category>.<specific>:

rs.memory.transmute
js.code_exec.eval
py.deser.pickle_loads
c.memory.gets
java.sqli.execute_concat

See the Rule Reference for a complete listing per language.
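
The three-part format makes rule IDs easy to group or filter by language or category in downstream tooling. The following is a minimal sketch of splitting an ID into its parts; `RuleId` and `parse_rule_id` are hypothetical helper names for illustration and are not part of Nyx.

```python
from typing import NamedTuple


class RuleId(NamedTuple):
    lang: str
    category: str
    specific: str


def parse_rule_id(rule_id: str) -> RuleId:
    """Split '<lang>.<category>.<specific>' into its three parts."""
    # Split on the first two dots only, so the last part is kept whole
    # (an assumption: the documented IDs contain exactly two dots).
    parts = rule_id.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a <lang>.<category>.<specific> rule ID: {rule_id!r}")
    return RuleId(*parts)


print(parse_rule_id("java.sqli.execute_concat"))
```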

Pattern Tiers

| Tier | Meaning | Examples |
|------|---------|----------|
| A | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
| B | Query includes a heuristic guard | SQL execute with concatenated arg, `printf(var)` with non-literal format |

Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, java.sqli.execute_concat only fires when executeQuery() receives a binary_expression (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
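
The difference between the two tiers can be sketched outside of tree-sitter. The example below is an illustration only: it uses Python's standard `ast` module as a stand-in for tree-sitter queries, and the function names `tier_a_pickle_loads` and `tier_b_execute_concat` are hypothetical, not Nyx's actual queries or predicates.

```python
import ast


def tier_a_pickle_loads(tree: ast.AST) -> list[int]:
    """Tier A: any call to pickle.loads matches, whatever its argument."""
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and ast.unparse(node.func) == "pickle.loads"
    )


def tier_b_execute_concat(tree: ast.AST) -> list[int]:
    """Tier B: .execute(...) matches only when the guard holds, i.e. the
    first argument is a binary expression such as "SELECT ..." + user_id."""
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        is_execute = isinstance(node.func, ast.Attribute) and node.func.attr == "execute"
        if is_execute and node.args and isinstance(node.args[0], ast.BinOp):
            hits.append(node.lineno)
    return sorted(hits)


SOURCE = '''
cur.execute("SELECT * FROM users WHERE id=" + user_id)
cur.execute("SELECT * FROM users WHERE id=%s", (user_id,))
data = pickle.loads(blob)
'''

tree = ast.parse(SOURCE)  # parsing only; SOURCE is never executed
print("Tier A hits:", tier_a_pickle_loads(tree))
print("Tier B hits:", tier_b_execute_concat(tree))
```

In this sketch the guard is purely structural: it checks the shape of the argument and never checks where the concatenated value comes from.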

What It Detects

By category

| Category | What it matches | Example languages |
|----------|-----------------|-------------------|
| CommandExec | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
| CodeExec | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
| Deserialization | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
| SqlInjection | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
| PathTraversal | File inclusion with variable path | PHP (`include $var`) |
| Xss | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
| Crypto | Weak cryptographic algorithms | All languages (`md5`, `sha1`, `Math.random()`) |
| Secrets | Hardcoded credentials | Go (variable name matching) |
| InsecureTransport | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
| Reflection | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
| MemorySafety | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
| Prototype | Prototype pollution | JS/TS (`__proto__` assignment) |
| CodeQuality | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |

What It Cannot Detect

  • Dataflow: Patterns don't track whether the dangerous function receives tainted input. eval("hello") (safe) and eval(userInput) (dangerous) both match js.code_exec.eval.
  • Context: Patterns don't understand whether the code is reachable, guarded, or inside a test.
  • Semantics: strcpy(dst, src) always matches; the pattern cannot determine buffer sizes.
  • Indirect calls: Function pointers, dynamic dispatch, and aliased references are invisible.
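
The dataflow limitation can be made concrete with a small sketch. It uses Python's standard `ast` module as a stand-in for a tree-sitter query, and `eval_call_lines` is a hypothetical helper, not Nyx code. A matcher that only looks at the callee reports the literal call and the variable call alike.

```python
import ast


def eval_call_lines(source: str) -> list[int]:
    """Line numbers of every call whose callee is the bare identifier `eval`."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))  # parsed, never executed
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    )


SOURCE = '''
eval("1 + 1")
eval(user_input)
'''

# Both calls are reported: the matcher never inspects where the argument comes from.
print(eval_call_lines(SOURCE))
```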

Common False Positives

| Scenario | Why it fires | Mitigation |
|----------|--------------|------------|
| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis won't flag this; use `--mode cfg` for fewer false positives |
| `unsafe` block in Rust with sound justification | All `unsafe` blocks match | Filter with `--severity HIGH` (unsafe_block is Medium, so `">=MEDIUM"` still reports it) |
| `.unwrap()` in test code | Acceptable in tests | Default non-prod downgrade reduces severity |
| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter out Low severity or add to exclusions |
| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |

Common False Negatives

| Scenario | Why it's missed |
|----------|-----------------|
| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
| SQL injection via ORM query builder | No pattern for ORM-specific query building |
| Imported function under different name | `from os import system as s; s(cmd)`: pattern looks for `system` |
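
The renamed-import case can be reproduced with a name-based matcher. This is a sketch under the same assumption as before: Python's `ast` module stands in for tree-sitter, and `system_call_lines` is a hypothetical helper. Because it compares identifiers and does not resolve imports, the aliased call goes unreported.

```python
import ast


def system_call_lines(source: str) -> list[int]:
    """Lines calling `system(...)` or `<obj>.system(...)`, matched by name only."""
    hits = []
    for node in ast.walk(ast.parse(source)):  # parsed, never executed
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Bare name (system) or attribute access (os.system); no import resolution.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name == "system":
            hits.append(node.lineno)
    return sorted(hits)


DIRECT = "import os\nos.system(cmd)\n"
ALIASED = "from os import system as s\ns(cmd)\n"

print("direct: ", system_call_lines(DIRECT))   # the call on line 2 is found
print("aliased:", system_call_lines(ALIASED))  # nothing is found
```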

Confidence Signals

| Signal | Meaning |
|--------|---------|
| Tier A | High confidence: the function itself is dangerous |
| Tier B | Moderate confidence: heuristic guard reduces false positives |
| High severity | Critical vulnerability class (command exec, deserialization) |
| Low severity | Informational (weak crypto, code quality) |
| Non-prod path | Finding in test/vendor code, downgraded by default |

Tuning and Noise Controls

Severity filtering

# Skip code-quality and weak-crypto findings
nyx scan . --severity ">=MEDIUM"

# Only critical findings
nyx scan . --severity HIGH
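
The effect of the two filters above can be modelled in a few lines. This is a sketch under stated assumptions, not Nyx's implementation: it assumes three ordered levels, that `>=LEVEL` is a threshold, and that a bare `LEVEL` is an exact match. The `passes` and `kept` helpers and the sample findings are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def passes(filter_expr: str, severity: Severity) -> bool:
    """Assumed semantics: '>=LEVEL' is a threshold, a bare 'LEVEL' is exact."""
    if filter_expr.startswith(">="):
        return severity >= Severity[filter_expr[2:].upper()]
    return severity == Severity[filter_expr.upper()]


# Hypothetical findings, with severities taken from the tables in this document.
FINDINGS = [
    ("weak crypto", Severity.LOW),
    ("unsafe_block", Severity.MEDIUM),
    ("deserialization", Severity.HIGH),
]


def kept(filter_expr: str) -> list[str]:
    """Names of the sample findings that survive the given filter."""
    return [name for name, sev in FINDINGS if passes(filter_expr, sev)]


print(kept(">=MEDIUM"))
print(kept("HIGH"))
```

Under these assumptions a `>=MEDIUM` filter drops only Low findings, so Medium findings such as unsafe_block are still reported.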

Use taint for precision

# Taint-only mode: only report findings with confirmed dataflow
nyx scan . --mode cfg

Exclude directories

[scanner]
excluded_directories = ["node_modules", "vendor", "generated"]

Examples

Tier A — structural presence

C: Banned function

char buf[64];
gets(buf);  // c.memory.gets — always dangerous, no safe usage

Python: Unsafe deserialization

import pickle
data = pickle.loads(user_input)  # py.deser.pickle_loads

Tier B — heuristic-guarded

Java: SQL concatenation

// Fires: concatenated argument
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
// java.sqli.execute_concat

// Does NOT fire: parameterized query
stmt.executeQuery(preparedSql);

C: Format string

// Fires: variable as first argument
printf(user_input);  // c.memory.printf_no_fmt

// Does NOT fire: literal format string
printf("%s", user_input);