mirror of
https://github.com/elicpeter/nyx.git
synced 2026-06-21 20:18:06 +02:00
Phase 1 (#33)
* chore: Exclude CLAUDE.md from Cargo.toml
* feat: add callgraph module and integrate into main analysis flow
* feat: enhance CLI with new severity filtering and analysis modes
* feat: update CHANGELOG with recent enhancements and fixes to severity filtering and output handling
* feat: implement state-model dataflow analysis for resource lifecycle and auth state
* feat: enhance diagnostic output formatting and add evidence structure
* feat: implement attack surface ranking for diagnostics with scoring and sorting
* feat: add comprehensive documentation for installation, usage, and rules reference
* feat: add multiple language support for command execution and evaluation endpoints
* feat: implement inline suppression for findings using `nyx:ignore` comments
* feat: add confidence levels to AST patterns and update output structure
* feat: implement low-noise prioritization system with category filtering, rollup grouping, and configurable budgets
* feat: bump version to 0.4.0 and update changelog with new features and improvements
* feat: add dead code allowances to various functions in mod.rs and real_world_tests.rs
This commit is contained in:
parent
19b578c5c4
commit
1bbe4b1cfb
456 changed files with 25628 additions and 1228 deletions
161
docs/detectors/cfg.md
Normal file
|
|
@ -0,0 +1,161 @@
|
|||
# CFG Structural Analysis
|
||||
|
||||
## Summary
|
||||
|
||||
Nyx builds an intra-procedural control-flow graph (CFG) for each function and analyzes structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error-handling code terminates properly.
|
||||
|
||||
These detectors use **dominator analysis**: a guard node *dominates* a sink node when every path from the function entry to the sink passes through the guard, so the guard must execute before the sink.
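
The dominance check described above can be sketched as follows. This is a minimal illustration, not Nyx's actual implementation: it assumes a non-empty CFG given as predecessor lists with node 0 as the entry, and the names `dominators` and `guard_dominates` are hypothetical.

```rust
use std::collections::BTreeSet;

/// Compute the dominator set of every node with the classic iterative
/// algorithm. `preds[n]` lists the predecessors of node `n`; node 0 is
/// the entry. Assumes at least one node.
fn dominators(preds: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let n = preds.len();
    let all: BTreeSet<usize> = (0..n).collect();
    // The entry is dominated only by itself; every other node starts at
    // "all nodes" and shrinks until a fixed point is reached.
    let mut dom: Vec<BTreeSet<usize>> = vec![all; n];
    dom[0] = BTreeSet::from([0]);

    let mut changed = true;
    while changed {
        changed = false;
        for node in 1..n {
            // Intersect the dominator sets of all predecessors...
            let mut acc: Option<BTreeSet<usize>> = None;
            for &p in &preds[node] {
                acc = Some(match acc {
                    None => dom[p].clone(),
                    Some(set) => set.intersection(&dom[p]).cloned().collect(),
                });
            }
            // ...then add the node itself.
            let mut new_set = acc.unwrap_or_default();
            new_set.insert(node);
            if new_set != dom[node] {
                dom[node] = new_set;
                changed = true;
            }
        }
    }
    dom
}

/// True if every path from the entry to `sink` passes through `guard`.
fn guard_dominates(preds: &[Vec<usize>], guard: usize, sink: usize) -> bool {
    dominators(preds)[sink].contains(&guard)
}
```

In a diamond-shaped CFG where the guard sits on only one branch, `guard_dominates` returns `false` for the merge node, which is the situation `cfg-unguarded-sink` describes.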
|
||||
|
||||
## Rule IDs
|
||||
|
||||
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `cfg-unguarded-sink` | High/Medium | Sink reachable without a dominating guard or sanitizer |
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth check |
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
| `cfg-unreachable-source` | Low | Source in unreachable code |
| `cfg-error-fallthrough` | High/Medium | Error check doesn't terminate; dangerous code follows |
| `cfg-resource-leak` | Medium | Resource acquired but not released on all exit paths |
| `cfg-lock-not-released` | Medium | Lock acquired but not released on all exit paths |
|
||||
|
||||
## What It Detects
|
||||
|
||||
### Unguarded sinks (`cfg-unguarded-sink`)
|
||||
A sink call (e.g. `system()`, `eval()`, `Command::new()`) is reachable from the function entry without passing through a guard or sanitizer that matches the sink's capability.
|
||||
|
||||
### Auth gaps (`cfg-auth-gap`)
|
||||
A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`) reaches a privileged sink (shell execution, file I/O) without a prior call to an authentication function (`is_authenticated`, `require_auth`, `check_permission`, etc.).
|
||||
|
||||
### Unreachable security code (`cfg-unreachable-*`)
|
||||
Sinks, sanitizers, or sources in dead code branches. This often indicates a refactoring error where security-critical code was accidentally made unreachable.
|
||||
|
||||
### Error fallthrough (`cfg-error-fallthrough`)
|
||||
An error check (null check, error return check) does not terminate the function or loop back. Execution continues to a dangerous operation on the error path.
|
||||
|
||||
### Resource leaks (`cfg-resource-leak`, `cfg-lock-not-released`)
|
||||
A resource acquisition call (e.g. `File::open`, `fopen`, `socket`, `Lock`) is not matched by a release call (e.g. `close`, `fclose`, `unlock`) on all exit paths from the function.
|
||||
|
||||
## What It Cannot Detect
|
||||
|
||||
- **Inter-procedural guards**: If authentication is checked in a middleware function that calls this handler, the CFG detector cannot see it. It only analyzes one function at a time.
|
||||
- **Dynamic dispatch**: Virtual method calls, function pointers, and closures are opaque to the CFG.
|
||||
- **Complex guard patterns**: Only recognized guard function names are checked. Custom validation logic (e.g. `if password == expected`) is not recognized as a guard.
|
||||
- **Correct sanitization**: The detector checks that *some* guard dominates the sink, not that the guard is *correct*. A guard that always passes would suppress the finding.
|
||||
- **Cross-function resource flows**: If a file handle is opened in one function and closed in another, the detector will report a leak in the first function.
|
||||
|
||||
## Common False Positives
|
||||
|
||||
| Scenario | Why it fires | Mitigation |
|
||||
|----------|-------------|------------|
|
||||
| Framework-level auth middleware | Handler doesn't call auth directly | Document as expected; suppress with severity filter |
|
||||
| Resource closed via RAII/defer | Implicit cleanup not visible to CFG | Currently not detected; known limitation |
|
||||
| Custom guard function name | Function not in the recognized guard list | Add the function name as a sanitizer in config |
|
||||
| Test handlers | Intentionally skip auth in tests | Default non-prod downgrade reduces severity; or exclude test dirs |
|
||||
|
||||
## Common False Negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| Auth in called function | Cross-function guards not tracked |
|
||||
| Guard via type system | Type-level guarantees (e.g. Rust's `AuthenticatedUser` wrapper) not analyzed |
|
||||
| Resource closed in finally/defer | Some cleanup patterns not recognized |
|
||||
|
||||
## Confidence Signals
|
||||
|
||||
| Signal | Meaning |
|
||||
|--------|---------|
|
||||
| **Evidence lists guard nodes** | Shows which guards were checked and found missing |
|
||||
| **Sink has high capability** | Shell execution or file I/O sinks are higher risk |
|
||||
| **Handler detection matched** | Web handler identification is based on conventional parameter names |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Add custom guards/sanitizers
|
||||
|
||||
```toml
|
||||
[[analysis.languages.python.rules]]
|
||||
matchers = ["validate_request", "check_csrf"]
|
||||
kind = "sanitizer"
|
||||
cap = "all"
|
||||
```
|
||||
|
||||
### Add auth rules
|
||||
|
||||
Auth checks are recognized by function name. If your codebase uses non-standard names:
|
||||
|
||||
```toml
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["ensureLoggedIn", "requirePermission"]
|
||||
kind = "sanitizer"
|
||||
cap = "all"
|
||||
```
|
||||
|
||||
### Filter results
|
||||
|
||||
```bash
|
||||
# Skip low-severity unreachable findings
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
```
|
||||
|
||||
### Disable CFG analysis
|
||||
|
||||
```bash
|
||||
nyx scan . --mode ast # AST patterns only
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Unguarded sink
|
||||
|
||||
```go
|
||||
func handler(w http.ResponseWriter, r *http.Request) {
|
||||
cmd := r.URL.Query().Get("cmd")
|
||||
exec.Command("sh", "-c", cmd).Run() // cfg-unguarded-sink: no guard dominates
|
||||
}
|
||||
```
|
||||
|
||||
### Auth gap
|
||||
|
||||
```javascript
|
||||
app.get('/admin/delete', (req, res) => {
|
||||
// No is_authenticated() call
|
||||
db.execute("DELETE FROM users WHERE id = " + req.params.id);
|
||||
// cfg-auth-gap: web handler reaches privileged sink without auth
|
||||
});
|
||||
```
|
||||
|
||||
### Resource leak
|
||||
|
||||
```c
#include <stdio.h>

void process(void) {
    FILE *f = fopen("data.txt", "r");  // acquire
    if (f == NULL) {
        return;                        // open failed: nothing to release
    }
    if (fgetc(f) == EOF) {
        return;                        // cfg-resource-leak: f not closed on this path
    }
    fclose(f);
}
```
|
||||
|
||||
## Guard Rules
|
||||
|
||||
Nyx recognizes these function name patterns as guards:
|
||||
|
||||
| Pattern | Applies to |
|
||||
|---------|-----------|
|
||||
| `validate*`, `sanitize*` | All sinks |
|
||||
| `check_*`, `verify_*`, `assert_*` | All sinks |
|
||||
| `shell_escape` | Shell execution sinks |
|
||||
| `html_escape` | HTML/XSS sinks |
|
||||
| `url_encode` | URL sinks |
|
||||
| `which` | Shell execution (binary lookup) |
|
||||
|
||||
### Auth rules
|
||||
|
||||
| Pattern | Category |
|
||||
|---------|----------|
|
||||
| `is_authenticated`, `require_auth`, `check_permission` | Common |
|
||||
| `authorize`, `authenticate`, `require_login` | Common |
|
||||
| `check_auth`, `verify_token`, `validate_token` | Common |
|
||||
| `middleware.auth`, `auth.required` | Go |
|
||||
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |
|
||||
149
docs/detectors/patterns.md
Normal file
|
|
@ -0,0 +1,149 @@
|
|||
# AST Pattern Matching
|
||||
|
||||
## Summary
|
||||
|
||||
AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.
|
||||
|
||||
AST patterns run in all analysis modes, including `--mode ast` (where they are the only active detector).
|
||||
|
||||
## Rule IDs
|
||||
|
||||
Pattern rule IDs follow the format `<lang>.<category>.<specific>`:
|
||||
|
||||
```
|
||||
rs.memory.transmute
|
||||
js.code_exec.eval
|
||||
py.deser.pickle_loads
|
||||
c.memory.gets
|
||||
java.sqli.execute_concat
|
||||
```
|
||||
|
||||
See the [Rule Reference](../rules/index.md) for a complete listing per language.
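
When post-processing findings (for example, to group them by language or category), the ID format above can be split mechanically. The following is a small sketch under that assumption; `PatternRuleId` and `parse_rule_id` are hypothetical names, not part of Nyx.

```rust
/// Parts of a pattern rule ID such as `js.code_exec.eval`.
#[derive(Debug, PartialEq)]
struct PatternRuleId<'a> {
    lang: &'a str,
    category: &'a str,
    specific: &'a str,
}

/// Split `<lang>.<category>.<specific>` at the first two dots.
/// Returns `None` if any part is missing or empty, which is the case
/// for non-pattern IDs such as `cfg-auth-gap`.
fn parse_rule_id(id: &str) -> Option<PatternRuleId<'_>> {
    let mut parts = id.splitn(3, '.');
    let lang = parts.next()?;
    let category = parts.next()?;
    let specific = parts.next()?;
    if lang.is_empty() || category.is_empty() || specific.is_empty() {
        return None;
    }
    Some(PatternRuleId { lang, category, specific })
}
```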
|
||||
|
||||
## Pattern Tiers
|
||||
|
||||
| Tier | Meaning | Examples |
|
||||
|------|---------|---------|
|
||||
| **A** | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
|
||||
| **B** | Query includes a heuristic guard | SQL `execute` with concatenated arg, `printf(var)` with non-literal format |
|
||||
|
||||
Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, `java.sqli.execute_concat` only fires when `executeQuery()` receives a `binary_expression` (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
|
||||
|
||||
## What It Detects
|
||||
|
||||
### By category
|
||||
|
||||
| Category | What it matches | Example languages |
|----------|----------------|-------------------|
| **CommandExec** | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
| **CodeExec** | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
| **Deserialization** | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
| **SqlInjection** | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
| **PathTraversal** | File inclusion with variable path | PHP (`include $var`) |
| **Xss** | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
| **Crypto** | Weak cryptographic algorithms | All languages (`md5`, `sha1`, `Math.random()`) |
| **Secrets** | Hardcoded credentials | Go (variable name matching) |
| **InsecureTransport** | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
| **Reflection** | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
| **MemorySafety** | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
| **Prototype** | Prototype pollution | JS/TS (`__proto__` assignment) |
| **CodeQuality** | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |
|
||||
|
||||
## What It Cannot Detect
|
||||
|
||||
- **Dataflow**: Patterns don't track whether the dangerous function receives tainted input. `eval("hello")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`.
|
||||
- **Context**: Patterns don't understand whether the code is reachable, guarded, or inside a test.
|
||||
- **Semantics**: `strcpy(dst, src)` always matches — it cannot determine buffer sizes.
|
||||
- **Indirect calls**: Function pointers, dynamic dispatch, and aliased references are invisible.
|
||||
|
||||
## Common False Positives
|
||||
|
||||
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis won't flag this; use `--mode cfg` for fewer false positives |
| `unsafe` block in Rust with sound justification | All unsafe blocks match | Filter with `--severity HIGH`; `--severity ">=MEDIUM"` still reports it because unsafe_block is Medium |
| `.unwrap()` in test code | Acceptable in tests | Default non-prod downgrade reduces severity |
| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter Low severity or add to exclusions |
| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |
|
||||
|
||||
## Common False Negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
|
||||
| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
|
||||
| SQL injection via ORM query builder | No pattern for ORM-specific query building |
|
||||
| Imported function under different name | `from os import system as s; s(cmd)` — pattern looks for `system` |
|
||||
|
||||
## Confidence Signals
|
||||
|
||||
| Signal | Meaning |
|
||||
|--------|---------|
|
||||
| **Tier A** | High confidence — the function itself is dangerous |
|
||||
| **Tier B** | Moderate confidence — heuristic guard reduces false positives |
|
||||
| **High severity** | Critical vulnerability class (command exec, deserialization) |
|
||||
| **Low severity** | Informational (weak crypto, code quality) |
|
||||
| **Non-prod path** | Finding in test/vendor code — downgraded by default |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Severity filtering
|
||||
|
||||
```bash
|
||||
# Skip code-quality and weak-crypto findings
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
|
||||
# Only critical findings
|
||||
nyx scan . --severity HIGH
|
||||
```
|
||||
|
||||
### Use taint for precision
|
||||
|
||||
```bash
|
||||
# Taint-only mode: only report findings with confirmed dataflow
|
||||
nyx scan . --mode cfg
|
||||
```
|
||||
|
||||
### Exclude directories
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
excluded_directories = ["node_modules", "vendor", "generated"]
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Tier A — structural presence
|
||||
|
||||
**C: Banned function**
|
||||
```c
|
||||
char buf[64];
|
||||
gets(buf); // c.memory.gets — always dangerous, no safe usage
|
||||
```
|
||||
|
||||
**Python: Unsafe deserialization**
|
||||
```python
|
||||
import pickle
|
||||
data = pickle.loads(user_input) # py.deser.pickle_loads
|
||||
```
|
||||
|
||||
### Tier B — heuristic-guarded
|
||||
|
||||
**Java: SQL concatenation**
|
||||
```java
|
||||
// Fires: concatenated argument
|
||||
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
|
||||
// java.sqli.execute_concat
|
||||
|
||||
// Does NOT fire: parameterized query
|
||||
stmt.executeQuery(preparedSql);
|
||||
```
|
||||
|
||||
**C: Format string**
|
||||
```c
|
||||
// Fires: variable as first argument
|
||||
printf(user_input); // c.memory.printf_no_fmt
|
||||
|
||||
// Does NOT fire: literal format string
|
||||
printf("%s", user_input);
|
||||
```
|
||||
204
docs/detectors/state.md
Normal file
|
|
@ -0,0 +1,204 @@
|
|||
# State Model Analysis
|
||||
|
||||
## Summary
|
||||
|
||||
Nyx's state model analysis tracks **resource lifecycle** and **authentication state** through a function using monotone dataflow over bounded lattices. It detects use-after-close bugs, double-close bugs, resource leaks, and unauthenticated access to privileged operations.
|
||||
|
||||
State analysis is **opt-in** — enable it with `scanner.enable_state_analysis = true` in config. It requires `mode = "full"` or `mode = "cfg"`.
|
||||
|
||||
## Rule IDs
|
||||
|
||||
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `state-use-after-close` | High | Variable used after being closed/released |
| `state-double-close` | Medium | Resource closed twice |
| `state-resource-leak` | Medium | Resource opened but never closed (definite) |
| `state-resource-leak-possible` | Low | Resource may not be closed on all paths |
| `state-unauthed-access` | High | Privileged operation reached without authentication |
|
||||
|
||||
## What It Detects
|
||||
|
||||
### Use-after-close (`state-use-after-close`)
|
||||
|
||||
A resource transitions to the CLOSED state (via `close()`, `fclose()`, `disconnect()`, etc.), then a use operation (`read`, `write`, `send`, `recv`, `query`, etc.) is performed on it.
|
||||
|
||||
```c
|
||||
FILE *f = fopen("data.txt", "r");
|
||||
fclose(f);
|
||||
fread(buf, 1, 100, f); // state-use-after-close
|
||||
```
|
||||
|
||||
### Double-close (`state-double-close`)
|
||||
|
||||
A resource is closed twice. This can cause crashes or undefined behavior.
|
||||
|
||||
```python
|
||||
f = open("data.txt")
|
||||
f.close()
|
||||
f.close() # state-double-close
|
||||
```
|
||||
|
||||
### Resource leak (`state-resource-leak`)
|
||||
|
||||
A resource is opened but never closed on any path through the function. This is a definite leak.
|
||||
|
||||
```java
|
||||
FileInputStream fis = new FileInputStream("data.txt");
|
||||
process(fis);
|
||||
// function exits without fis.close() — state-resource-leak
|
||||
```
|
||||
|
||||
### Possible resource leak (`state-resource-leak-possible`)
|
||||
|
||||
A resource is closed on some paths but not others.
|
||||
|
||||
```go
|
||||
f, err := os.Open("data.txt")
|
||||
if err != nil {
|
||||
return // f not closed here
|
||||
}
|
||||
f.Close() // closed here
|
||||
// state-resource-leak-possible on the error path
|
||||
```
|
||||
|
||||
### Unauthenticated access (`state-unauthed-access`)
|
||||
|
||||
A function identified as a web handler reaches a privileged sink (shell execution, file I/O) without any authentication check on the path.
|
||||
|
||||
A function is identified as a web handler if:
|
||||
1. Its name starts with `handle_`, `route_`, or `api_` (strong match — sufficient on its own), OR
|
||||
2. Its name starts with `serve_` or `process_` AND any function in the file has web-like parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, etc., varying by language).
|
||||
|
||||
The function name `main` is explicitly excluded.
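
The naming heuristic above can be sketched as a single predicate. This is an illustration of the stated rules only, not Nyx's code; `is_web_handler` is a hypothetical name, and `file_has_web_params` stands in for "some function in the file has web-like parameter names", which is assumed to be computed elsewhere.

```rust
/// Sketch of the web-handler heuristic: strong prefixes match on their
/// own, weak prefixes need web-like parameter names somewhere in the
/// file, and `main` never counts as a handler.
fn is_web_handler(name: &str, file_has_web_params: bool) -> bool {
    if name == "main" {
        return false;
    }
    const STRONG: [&str; 3] = ["handle_", "route_", "api_"];
    const WEAK: [&str; 2] = ["serve_", "process_"];
    if STRONG.iter().any(|p| name.starts_with(*p)) {
        return true;
    }
    WEAK.iter().any(|p| name.starts_with(*p)) && file_has_web_params
}
```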
|
||||
|
||||
```javascript
|
||||
app.post('/admin/exec', (req, res) => {
|
||||
// No auth check
|
||||
exec(req.body.command); // state-unauthed-access
|
||||
});
|
||||
```
|
||||
|
||||
## What It Cannot Detect
|
||||
|
||||
- **Cross-function resource management**: Resources opened in one function and closed in another are not tracked. This is the most common source of false positives for leak detection.
|
||||
- **RAII / defer / try-with-resources**: Implicit cleanup via language-level constructs (Rust's `Drop`, Go's `defer`, Java's try-with-resources, Python's `with`) is not recognized. These patterns will produce false-positive leak findings.
|
||||
- **Dynamic dispatch**: If `close()` is called through a trait object or interface, it may not be recognized.
|
||||
- **Authentication via type system**: Rust's type-state pattern (e.g. `AuthenticatedRequest<T>`) is not recognized as an auth check.
|
||||
- **Complex authorization logic**: Only recognized function name patterns are checked.
|
||||
|
||||
## Common False Positives
|
||||
|
||||
| Scenario | Why it fires | Mitigation |
|
||||
|----------|-------------|------------|
|
||||
| RAII / Drop / defer cleanup | Implicit cleanup not visible | Known limitation; filter by severity |
|
||||
| Resource returned to caller | Ownership transferred, not leaked | Known limitation |
|
||||
| Framework-managed resources | Web framework manages connection lifecycle | Exclude framework-generated handlers |
|
||||
| Try-with-resources (Java) | Language construct not parsed | Known limitation |
|
||||
| Context manager (Python `with`) | Block construct not tracked | Known limitation |
|
||||
|
||||
## Common False Negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| Resource closed in helper function | Cross-function tracking not implemented |
|
||||
| Auth in middleware | Auth check happens before handler is called |
|
||||
| Double-close via aliased reference | Alias analysis not performed |
|
||||
|
||||
## Confidence Signals
|
||||
|
||||
| Signal | Meaning |
|--------|---------|
| **Definite leak (state-resource-leak)** | Resource is never closed on any path; high confidence |
| **Use-after-close** | Read/write operation after explicit close; high confidence |
| **Web handler detected** | Entry point matched by the function-name prefix and parameter-name heuristics |
| **Possible leak (state-resource-leak-possible)** | Resource closed on some but not all paths; lower confidence |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Enable state analysis
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
enable_state_analysis = true
|
||||
```
|
||||
|
||||
### Severity filtering
|
||||
|
||||
```bash
|
||||
# Skip possible-leak findings (Low severity)
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
```
|
||||
|
||||
### Exclude test files
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
excluded_directories = ["tests", "test", "spec"]
|
||||
```
|
||||
|
||||
## Resource Pairs
|
||||
|
||||
The state engine recognizes these acquire/release pairs per language:
|
||||
|
||||
### C/C++
|
||||
| Acquire | Release | Resource |
|
||||
|---------|---------|----------|
|
||||
| `fopen` | `fclose` | File handle |
|
||||
| `open` | `close` | File descriptor |
|
||||
| `socket` | `close` | Socket |
|
||||
| `malloc`, `calloc`, `realloc` | `free` | Heap memory |
|
||||
| `pthread_mutex_lock` | `pthread_mutex_unlock` | Mutex |
|
||||
|
||||
### Rust
|
||||
| Acquire | Release | Resource |
|
||||
|---------|---------|----------|
|
||||
| `File::open`, `File::create` | `drop`, `close` | File handle |
|
||||
| `TcpStream::connect` | `shutdown` | TCP connection |
|
||||
| `lock`, `read`, `write` (on Mutex/RwLock) | `drop` | Lock guard |
|
||||
|
||||
### Java
|
||||
| Acquire | Release | Resource |
|
||||
|---------|---------|----------|
|
||||
| `new FileInputStream` | `close` | File stream |
|
||||
| `getConnection` | `close` | DB connection |
|
||||
| `new Socket` | `close` | Socket |
|
||||
|
||||
### Go, Python, JavaScript, Ruby, PHP
|
||||
Similar patterns with language-specific function names.
|
||||
|
||||
## Use Patterns (Trigger use-after-close)
|
||||
|
||||
The following operations on a closed resource trigger `state-use-after-close`:
|
||||
|
||||
```
|
||||
read, write, send, recv, fread, fwrite, fgets, fputs, fprintf, fscanf,
|
||||
fflush, fseek, ftell, rewind, feof, ferror, fgetc, fputc, getc, putc,
|
||||
ungetc, query, execute, fetch, sendto, recvfrom, ioctl, fcntl,
|
||||
strcpy, strncpy, strcat, strncat, memcpy, memmove, memset, memcmp,
|
||||
strcmp, strncmp, strlen, sprintf, snprintf
|
||||
```
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Resource Lifecycle Lattice
|
||||
|
||||
```
|
||||
UNINIT → OPEN → CLOSED
|
||||
→ MOVED
|
||||
```
|
||||
|
||||
States are tracked as bitflags, allowing the lattice to represent uncertainty (e.g. OPEN|CLOSED means the resource is open on some paths and closed on others).
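
A minimal sketch of how a bitflag lattice expresses this uncertainty. The bit values and the helper names `join`, `may_be`, and `must_be` are assumptions made for illustration; only the state names come from the diagram above.

```rust
// Resource lifecycle states as bit flags (values are illustrative).
const UNINIT: u8 = 0b0001;
const OPEN: u8 = 0b0010;
const CLOSED: u8 = 0b0100;
const MOVED: u8 = 0b1000;

/// Join at a control-flow merge: the union of states seen on either path.
fn join(a: u8, b: u8) -> u8 {
    a | b
}

/// True if the resource may be in `state` on at least one path.
fn may_be(set: u8, state: u8) -> bool {
    (set & state) != 0
}

/// True if the resource is in exactly `state` on every path.
fn must_be(set: u8, state: u8) -> bool {
    set == state
}
```

Because the join is a bitwise OR over a fixed set of bits, states can only grow at merges, which is what makes the dataflow monotone and bounded.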
|
||||
|
||||
### Leak Detection Scope
|
||||
|
||||
Resource leaks are checked at the file-level exit node and the **synthesized** function exit node (a single Return node that all early returns feed into). Early-return nodes are **not** checked individually — only the merged state at the function's synthesized exit is inspected. This prevents duplicate findings where an early-return path reports a definite leak while the merged exit correctly reports a possible leak.
|
||||
|
||||
This per-function exit inspection ensures that a variable leaked inside one function is not masked by a same-named variable that is properly closed in a subsequent function.
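
The distinction between a definite and a possible leak at the merged exit can be sketched like this. It is an assumption-laden illustration: the bit values, `LeakVerdict`, and `classify_at_exit` are hypothetical, and states other than OPEN and CLOSED are ignored here.

```rust
// Illustrative bit values for the two states that matter at exit.
const OPEN: u8 = 0b0010;
const CLOSED: u8 = 0b0100;

#[derive(Debug, PartialEq)]
enum LeakVerdict {
    NoLeak,
    Possible, // would map to state-resource-leak-possible
    Definite, // would map to state-resource-leak
}

/// Classify one variable from its merged state at the synthesized exit.
fn classify_at_exit(exit_state: u8) -> LeakVerdict {
    let open = (exit_state & OPEN) != 0;
    let closed = (exit_state & CLOSED) != 0;
    match (open, closed) {
        (true, false) => LeakVerdict::Definite, // open on every path
        (true, true) => LeakVerdict::Possible,  // open on some paths only
        _ => LeakVerdict::NoLeak,
    }
}
```

With an early return that leaves the resource open and a normal path that closes it, the merged state is `OPEN | CLOSED`, so only one "possible" finding is produced rather than a definite one per early return.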
|
||||
|
||||
### Auth Level Lattice
|
||||
|
||||
```
|
||||
Unauthed < Authed < Admin
|
||||
```
|
||||
|
||||
Join semantics: take the minimum (conservative). If any path is unauthenticated, the result is unauthenticated.
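
The "take the minimum" rule can be written down directly with a derived ordering. This is a sketch; the enum and function names are hypothetical, and only the three levels and their order come from the lattice above.

```rust
/// Auth levels in ascending order; the derived `Ord` follows the
/// declaration order, so Unauthed < Authed < Admin.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum AuthLevel {
    Unauthed,
    Authed,
    Admin,
}

/// Merge the auth levels arriving from two paths by keeping the lower one.
fn join(a: AuthLevel, b: AuthLevel) -> AuthLevel {
    a.min(b)
}
```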
|
||||
202
docs/detectors/taint.md
Normal file
|
|
@ -0,0 +1,202 @@
|
|||
# Taint Analysis
|
||||
|
||||
## Summary
|
||||
|
||||
Nyx's taint analysis tracks the flow of untrusted data from **sources** (where data enters the program) through **assignments and function calls** to **sinks** (where dangerous operations happen). If the data reaches a sink without passing through a **sanitizer** with matching capabilities, a finding is emitted.
|
||||
|
||||
The engine uses a monotone forward dataflow analysis over a finite lattice with guaranteed termination. Analysis is **intra-procedural with cross-file function summaries** — it does not follow calls into other functions but uses pre-computed summaries of their behavior.
|
||||
|
||||
## Rule ID
|
||||
|
||||
```
|
||||
taint-unsanitised-flow (source <line>:<col>)
|
||||
```
|
||||
|
||||
One rule ID covers all taint findings. The parenthetical identifies the specific source location.
|
||||
|
||||
## What It Detects
|
||||
|
||||
- Environment variables flowing to shell execution (`env::var` → `Command::new`)
|
||||
- User input flowing to code evaluation (`req.body` → `eval()`)
|
||||
- File contents flowing to SQL queries (`fs::read_to_string` → `db.execute()`)
|
||||
- Request parameters flowing to HTML output (`req.query` → `innerHTML`)
|
||||
- Any source-to-sink flow where the sink's required capability is not stripped by a sanitizer
|
||||
|
||||
## What It Cannot Detect
|
||||
|
||||
- **Inter-procedural flows without summaries**: If a function isn't summarized (e.g. from a third-party library without source), the taint engine cannot track data through it. It conservatively treats unknown callees as neither propagating nor sanitizing.
|
||||
- **Flows through data structures**: Taint is tracked per-variable, not per-field. `obj.field = tainted; sink(obj.other_field)` may produce a false positive because taint attaches to `obj` as a whole.
|
||||
- **Aliasing**: `let y = &x; sink(*y)` — the engine tracks `y` as a fresh variable, not an alias of `x`. This can cause false negatives.
|
||||
- **Complex control flow**: The analysis is flow-sensitive (respects control flow within a function) but does not track taint through arbitrary loops with complex exit conditions.
|
||||
- **Implicit flows**: Taint only follows explicit data flow, not information flow through branching (e.g. `if (secret) { x = 1 } else { x = 0 }` does not taint `x`).
|
||||
|
||||
## Common False Positives
|
||||
|
||||
| Scenario | Why it happens | Mitigation |
|----------|---------------|------------|
| Custom sanitizer not recognized | Nyx only knows built-in and configured sanitizers | Add a custom sanitizer rule in config |
| Taint through struct fields | Variable-level (not field-level) tracking | No current mitigation; field sensitivity is planned |
| Dead code paths | The engine is path-insensitive within a function (it considers all paths) | Contradiction pruning catches some cases; path-validated findings score lower |
| Library wrappers | A wrapper that sanitizes its input before calling a dangerous function is still reported when Nyx has no summary or rule recording that sanitization | Summarize the wrapper function or add it as a sanitizer |
|
||||
|
||||
## Common False Negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| Third-party library calls | No summary available; callee treated as opaque |
|
||||
| Taint through global/static variables | Not tracked across function boundaries |
|
||||
| Taint through closures/callbacks in some languages | Closure capture analysis is limited (JS/TS/Ruby/Go anonymous functions ARE analyzed) |
|
||||
| Flows spanning more than two files | Summary approximation loses precision at depth |
|
||||
|
||||
## Confidence Signals
|
||||
|
||||
These signals in the output indicate higher-confidence findings:
|
||||
|
||||
| Signal | What it means |
|
||||
|--------|--------------|
|
||||
| **Evidence: Source + Sink** | Both endpoints identified with specific function names and locations |
|
||||
| **Source kind = user input** | Source is directly controllable by an attacker (req.body, argv, etc.) |
|
||||
| **path_validated = false** | No validation guard on the path — higher exploitability |
|
||||
| **No guard_kind** | No dominating predicate check (null check, error check, etc.) |
|
||||
| **High rank_score** | Multiple confidence signals combined |
|
||||
|
||||
Lower-confidence:
|
||||
|
||||
| Signal | What it means |
|
||||
|--------|--------------|
|
||||
| **path_validated = true** | A validation predicate guards the path — may not be exploitable |
|
||||
| **guard_kind = "ValidationCall"** | An explicit validation function was called before the sink |
|
||||
| **Source kind = database** | Data from DB — may already be validated at insertion time |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Add custom sanitizers
|
||||
|
||||
If your codebase has a custom sanitizer that Nyx doesn't recognize:
|
||||
|
||||
```toml
|
||||
# nyx.local
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["escapeHtml", "sanitizeInput"]
|
||||
kind = "sanitizer"
|
||||
cap = "html_escape"
|
||||
```
|
||||
|
||||
Or via CLI:
|
||||
```bash
|
||||
nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
|
||||
```
|
||||
|
||||
### Filter by severity
|
||||
|
||||
```bash
|
||||
nyx scan . --severity HIGH # Only high-severity taint findings
|
||||
nyx scan . --severity ">=MEDIUM" # Skip low-severity
|
||||
```
|
||||
|
||||
### Skip non-production code
|
||||
|
||||
By default, findings in `tests/`, `vendor/`, `build/` paths are downgraded one severity tier. To exclude them entirely, add to config:
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
excluded_directories = ["tests", "vendor", "build", "examples"]
|
||||
```
|
||||
|
||||
### Disable taint (AST-only mode)
|
||||
|
||||
```bash
|
||||
nyx scan . --mode ast
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
**Vulnerable code** (Rust):
|
||||
```rust
use std::env;
use std::process::Command;

fn main() {
    let cmd = env::var("USER_CMD").unwrap();  // line 5: source
    Command::new("sh").arg("-c").arg(&cmd).output();  // line 6: sink
}
```
|
||||
|
||||
**Finding**:
|
||||
```
|
||||
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
|
||||
Source: env::var("USER_CMD") at 5:15
|
||||
Sink: Command::new("sh").arg("-c")
|
||||
Score: 76
|
||||
```
|
||||
|
||||
**Safe alternative**:
|
||||
```rust
use std::env;
use std::process::Command;

fn main() {
    let cmd = env::var("USER_CMD").unwrap();
    // Run the value directly as the program, with no shell to interpret it
    Command::new(&cmd).output();
    // Or validate against an allowlist
}
```
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Capability System
|
||||
|
||||
Taint uses a bitflag capability system to match sources with appropriate sanitizers and sinks:
|
||||
|
||||
| Capability | Bit | Sources | Sanitizers | Sinks |
|-----------|-----|---------|------------|-------|
| `ENV_VAR` | 0x01 | `env::var`, `getenv` | - | - |
| `HTML_ESCAPE` | 0x02 | - | `html_escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
| `SHELL_ESCAPE` | 0x04 | - | `shell_escape` | `Command::new`, `system()`, `eval()` |
| `URL_ENCODE` | 0x08 | - | `encodeURIComponent` | `location.href` |
| `JSON_PARSE` | 0x10 | - | `JSON.parse` | - |
| `FILE_IO` | 0x20 | - | `filepath.Clean`, `basename`, `os.path.realpath` | `fopen`, `open`, `send_file`, `fs::read_to_string` |
| `FMT_STRING` | 0x40 | - | - | `printf(var)` |
|
||||
|
||||
Sources typically use `Cap::all()` to match any sink. A sanitizer strips specific capability bits. A finding fires when a tainted variable reaches a sink and the taint still has the matching capability bit set.
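
The strip-and-check behavior can be sketched with plain bit operations. The two bit values are taken from the table above; `ALL` (standing in for `Cap::all()`), `sanitize`, and `sink_fires` are hypothetical names used only for this illustration.

```rust
// Bit values copied from the capability table.
const HTML_ESCAPE: u8 = 0x02;
const SHELL_ESCAPE: u8 = 0x04;
// All seven capability bits set; an assumed stand-in for Cap::all().
const ALL: u8 = 0x7f;

/// A sanitizer clears the capability bits it neutralizes.
fn sanitize(taint: u8, sanitizer_cap: u8) -> u8 {
    taint & !sanitizer_cap
}

/// A sink fires while the taint still carries the sink's capability bit.
fn sink_fires(taint: u8, sink_cap: u8) -> bool {
    (taint & sink_cap) != 0
}
```

Under this model, HTML-escaping a value silences an `innerHTML` sink but not a shell sink, because only the `HTML_ESCAPE` bit was cleared.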
|
||||
|
||||
### Nested Function Analysis
|
||||
|
||||
The CFG builder recursively discovers function expressions nested inside call arguments:
|
||||
|
||||
- **JavaScript/TypeScript**: `function_expression`, `arrow_function` inside call arguments (e.g., Express route handlers)
|
||||
- **Ruby**: `do_block` and `block` nodes (e.g., Sinatra `get '/path' do...end`)
|
||||
- **Go**: `func_literal` (anonymous function literals)
|
||||
|
||||
Each nested function is walked as a separate scope and receives a unique identifier (`<anon@{byte_offset}>`) to prevent collisions when multiple anonymous functions exist in the same file.
|
||||
|
||||
### Chained Call Classification
|
||||
|
||||
Method chains like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments between `.` separators. The classifier matches against both the original text and the normalized form, enabling rules like `r.URL` to match within `r.URL.Query.Get`.
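
A rough sketch of this normalization, assuming it amounts to dropping every balanced `( ... )` group from the call text. Nyx's real rules may differ (for example, around string literals that contain parentheses); `normalize_chain` and `rule_matches` are hypothetical names.

```rust
/// Drop every balanced `( ... )` group from a call chain, so that
/// `r.URL.Query().Get("host")` becomes `r.URL.Query.Get`.
/// Parentheses inside string literals are not handled specially.
fn normalize_chain(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut depth = 0usize;
    for ch in text.chars() {
        match ch {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            // Keep characters only while outside any argument list.
            _ if depth == 0 => out.push(ch),
            _ => {}
        }
    }
    out
}

/// A rule matches if it occurs in the original text or the normalized form.
fn rule_matches(rule: &str, call_text: &str) -> bool {
    call_text.contains(rule) || normalize_chain(call_text).contains(rule)
}
```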
|
||||
|
||||
### Nested Call Fallback
|
||||
|
||||
When the outermost call in an expression doesn't classify as a source/sink, the engine tries all nested inner calls. This handles patterns like `str(eval(expr))` where `str` is not a sink but the inner `eval` is.
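
The fallback can be pictured as a depth-first search over nested calls. The sketch below uses a toy expression tree and a made-up sink list; `Call`, `is_sink`, and `find_sink` are hypothetical and only illustrate the order of classification.

```rust
/// Toy expression tree: a call with a callee name and nested call arguments.
struct Call {
    callee: &'static str,
    args: Vec<Call>,
}

/// Hypothetical sink list for the sketch.
fn is_sink(name: &str) -> bool {
    matches!(name, "eval" | "system" | "exec")
}

/// Classify the outermost call first; if it is not a sink, fall back to
/// the nested calls, depth-first. Returns the first sink callee found.
fn find_sink(call: &Call) -> Option<&'static str> {
    if is_sink(call.callee) {
        return Some(call.callee);
    }
    call.args.iter().find_map(find_sink)
}
```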
|
||||
|
||||
### Rust `if let` / `while let` Pattern Bindings
|
||||
|
||||
The CFG builder recognizes Rust `let_condition` nodes inside `if` and `while` expressions. The value expression is classified for source/sink labels, and the pattern binding is extracted as a variable definition:
|
||||
|
||||
```rust
|
||||
if let Ok(cmd) = env::var("CMD") {
|
||||
// cmd is tainted — env::var is a source, cmd is the binding
|
||||
Command::new("sh").arg("-c").arg(&cmd).output(); // taint-unsanitised-flow
|
||||
}
|
||||
```
|
||||
|
||||
This also works for `while let` patterns.
|
||||
|
||||
### JS/TS Two-Level Solve
|
||||
|
||||
For JavaScript and TypeScript, taint analysis uses a two-level approach:
|
||||
|
||||
1. **Level 1**: Solve top-level code (module scope)
|
||||
2. **Level 2**: Solve each function seeded with the converged top-level state
|
||||
|
||||
This prevents false positives from cross-function taint leakage while preserving global-to-function flows.
|
||||