mirror of
https://github.com/elicpeter/nyx.git
synced 2026-06-21 20:18:06 +02:00
Phase 1 (#33)
* chore: Exclude CLAUDE.md from Cargo.toml * feat: add callgraph module and integrate into main analysis flow * feat: enhance CLI with new severity filtering and analysis modes * feat: update CHANGELOG with recent enhancements and fixes to severity filtering and output handling * feat: implement state-model dataflow analysis for resource lifecycle and auth state * feat: enhance diagnostic output formatting and add evidence structure * feat: implement attack surface ranking for diagnostics with scoring and sorting * feat: add comprehensive documentation for installation, usage, and rules reference * feat: add multiple language support for command execution and evaluation endpoints * feat: implement inline suppression for findings using `nyx:ignore` comments * feat: add confidence levels to AST patterns and update output structure * feat: implement low-noise prioritization system with category filtering, rollup grouping, and configurable budgets * feat: bump version to 0.4.0 and update changelog with new features and improvements * feat: add dead code allowances to various functions in mod.rs and real_world_tests.rs
This commit is contained in:
parent
19b578c5c4
commit
1bbe4b1cfb
456 changed files with 25628 additions and 1228 deletions
149
docs/detectors/patterns.md
Normal file
149
docs/detectors/patterns.md
Normal file
|
|
@ -0,0 +1,149 @@
|
|||
# AST Pattern Matching
|
||||
|
||||
## Summary
|
||||
|
||||
AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.
|
||||
|
||||
AST patterns run in all analysis modes, including `--mode ast` (where they are the only active detector).
|
||||
|
||||
## Rule IDs
|
||||
|
||||
Pattern rule IDs follow the format `<lang>.<category>.<specific>`:
|
||||
|
||||
```
|
||||
rs.memory.transmute
|
||||
js.code_exec.eval
|
||||
py.deser.pickle_loads
|
||||
c.memory.gets
|
||||
java.sqli.execute_concat
|
||||
```
|
||||
|
||||
See the [Rule Reference](../rules/index.md) for a complete listing per language.
|
||||
|
||||
## Pattern Tiers
|
||||
|
||||
| Tier | Meaning | Examples |
|
||||
|------|---------|---------|
|
||||
| **A** | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
|
||||
| **B** | Query includes a heuristic guard | SQL `execute` with concatenated arg, `printf(var)` with non-literal format |
|
||||
|
||||
Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, `java.sqli.execute_concat` only fires when `executeQuery()` receives a `binary_expression` (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
|
||||
|
||||
## What It Detects
|
||||
|
||||
### By category
|
||||
|
||||
| Category | What it matches | Example languages |
|
||||
|----------|----------------|-------------------|
|
||||
| **CommandExec** | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
|
||||
| **CodeExec** | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
|
||||
| **Deserialization** | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
|
||||
| **SqlInjection** | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
|
||||
| **PathTraversal** | File inclusion with variable path | PHP (`include $var`) |
|
||||
| **Xss** | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
|
||||
| **Crypto** | Weak cryptographic algorithms | All languages (`md5`, `sha1`, `Math.random()`) |
|
||||
| **Secrets** | Hardcoded credentials | Go (variable name matching) |
|
||||
| **InsecureTransport** | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
|
||||
| **Reflection** | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
|
||||
| **MemorySafety** | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
|
||||
| **Prototype** | Prototype pollution | JS/TS (`__proto__` assignment) |
|
||||
| **CodeQuality** | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |
|
||||
|
||||
## What It Cannot Detect
|
||||
|
||||
- **Dataflow**: Patterns don't track whether the dangerous function receives tainted input. `eval("hello")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`.
|
||||
- **Context**: Patterns don't understand whether the code is reachable, guarded, or inside a test.
|
||||
- **Semantics**: `strcpy(dst, src)` always matches — it cannot determine buffer sizes.
|
||||
- **Indirect calls**: Function pointers, dynamic dispatch, and aliased references are invisible.
|
||||
|
||||
## Common False Positives
|
||||
|
||||
| Scenario | Why it fires | Mitigation |
|
||||
|----------|-------------|------------|
|
||||
| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis won't flag this — use `--mode cfg` for fewer false positives |
|
||||
| `unsafe` block in Rust with sound justification | All unsafe blocks match | Filter with `--severity ">=MEDIUM"` (unsafe_block is Medium) |
|
||||
| `.unwrap()` in test code | Acceptable in tests | Default non-prod downgrade reduces severity |
|
||||
| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter Low severity or add to exclusions |
|
||||
| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |
|
||||
|
||||
## Common False Negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
|
||||
| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
|
||||
| SQL injection via ORM query builder | No pattern for ORM-specific query building |
|
||||
| Imported function under different name | `from os import system as s; s(cmd)` — pattern looks for `system` |
|
||||
|
||||
## Confidence Signals
|
||||
|
||||
| Signal | Meaning |
|
||||
|--------|---------|
|
||||
| **Tier A** | High confidence — the function itself is dangerous |
|
||||
| **Tier B** | Moderate confidence — heuristic guard reduces false positives |
|
||||
| **High severity** | Critical vulnerability class (command exec, deserialization) |
|
||||
| **Low severity** | Informational (weak crypto, code quality) |
|
||||
| **Non-prod path** | Finding in test/vendor code — downgraded by default |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Severity filtering
|
||||
|
||||
```bash
|
||||
# Skip code-quality and weak-crypto findings
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
|
||||
# Only critical findings
|
||||
nyx scan . --severity HIGH
|
||||
```
|
||||
|
||||
### Use taint for precision
|
||||
|
||||
```bash
|
||||
# Taint-only mode: only report findings with confirmed dataflow
|
||||
nyx scan . --mode cfg
|
||||
```
|
||||
|
||||
### Exclude directories
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
excluded_directories = ["node_modules", "vendor", "generated"]
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Tier A — structural presence
|
||||
|
||||
**C: Banned function**
|
||||
```c
|
||||
char buf[64];
|
||||
gets(buf); // c.memory.gets — always dangerous, no safe usage
|
||||
```
|
||||
|
||||
**Python: Unsafe deserialization**
|
||||
```python
|
||||
import pickle
|
||||
data = pickle.loads(user_input) # py.deser.pickle_loads
|
||||
```
|
||||
|
||||
### Tier B — heuristic-guarded
|
||||
|
||||
**Java: SQL concatenation**
|
||||
```java
|
||||
// Fires: concatenated argument
|
||||
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
|
||||
// java.sqli.execute_concat
|
||||
|
||||
// Does NOT fire: parameterized query
|
||||
stmt.executeQuery(preparedSql);
|
||||
```
|
||||
|
||||
**C: Format string**
|
||||
```c
|
||||
// Fires: variable as first argument
|
||||
printf(user_input); // c.memory.printf_no_fmt
|
||||
|
||||
// Does NOT fire: literal format string
|
||||
printf("%s", user_input);
|
||||
```
|
||||
Loading…
Add table
Add a link
Reference in a new issue