nyx/docs/detectors/state.md
Eli Peter 1bbe4b1cfb
Phase 1 (#33)
* chore: Exclude CLAUDE.md from Cargo.toml

* feat: add callgraph module and integrate into main analysis flow

* feat: enhance CLI with new severity filtering and analysis modes

* feat: update CHANGELOG with recent enhancements and fixes to severity filtering and output handling

* feat: implement state-model dataflow analysis for resource lifecycle and auth state

* feat: enhance diagnostic output formatting and add evidence structure

* feat: implement attack surface ranking for diagnostics with scoring and sorting

* feat: add comprehensive documentation for installation, usage, and rules reference

* feat: add multiple language support for command execution and evaluation endpoints

* feat: implement inline suppression for findings using `nyx:ignore` comments

* feat: add confidence levels to AST patterns and update output structure

* feat: implement low-noise prioritization system with category filtering, rollup grouping, and configurable budgets

* feat: bump version to 0.4.0 and update changelog with new features and improvements

* feat: add dead code allowances to various functions in mod.rs and real_world_tests.rs
2026-02-25 21:16:36 -05:00

204 lines
7.5 KiB
Markdown

# State Model Analysis
## Summary
Nyx's state model analysis tracks **resource lifecycle** and **authentication state** through a function using monotone dataflow over bounded lattices. It detects use-after-close bugs, double-close bugs, resource leaks, and unauthenticated access to privileged operations.
State analysis is **opt-in** — enable it with `scanner.enable_state_analysis = true` in config. It requires `mode = "full"` or `mode = "cfg"`.
## Rule IDs
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `state-use-after-close` | High | Variable used after being closed/released |
| `state-double-close` | Medium | Resource closed twice |
| `state-resource-leak` | Medium | Resource opened but never closed (definite) |
| `state-resource-leak-possible` | Low | Resource may not be closed on all paths |
| `state-unauthed-access` | High | Privileged operation reached without authentication |
## What It Detects
### Use-after-close (`state-use-after-close`)
A resource transitions to the CLOSED state (via `close()`, `fclose()`, `disconnect()`, etc.), then a use operation (`read`, `write`, `send`, `recv`, `query`, etc.) is performed on it.
```c
FILE *f = fopen("data.txt", "r");
fclose(f);
fread(buf, 1, 100, f); // state-use-after-close
```
### Double-close (`state-double-close`)
A resource is closed twice. This can cause crashes or undefined behavior.
```python
f = open("data.txt")
f.close()
f.close() # state-double-close
```
### Resource leak (`state-resource-leak`)
A resource is opened but never closed on any path through the function. This is a definite leak.
```java
FileInputStream fis = new FileInputStream("data.txt");
process(fis);
// function exits without fis.close() — state-resource-leak
```
### Possible resource leak (`state-resource-leak-possible`)
A resource is closed on some paths but not others.
```go
f, err := os.Open("data.txt")
if err != nil {
return // f not closed here
}
f.Close() // closed here
// state-resource-leak-possible on the error path
```
### Unauthenticated access (`state-unauthed-access`)
A function identified as a web handler reaches a privileged sink (shell execution, file I/O) without any authentication check on the path.
A function is identified as a web handler if:
1. Its name starts with `handle_`, `route_`, or `api_` (strong match — sufficient on its own), OR
2. Its name starts with `serve_` or `process_` AND any function in the file has web-like parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, etc., varying by language).
The function name `main` is explicitly excluded.
```javascript
app.post('/admin/exec', (req, res) => {
// No auth check
exec(req.body.command); // state-unauthed-access
});
```
## What It Cannot Detect
- **Cross-function resource management**: Resources opened in one function and closed in another are not tracked. This is the most common source of false positives for leak detection.
- **RAII / defer / try-with-resources**: Implicit cleanup via language-level constructs (Rust's `Drop`, Go's `defer`, Java's try-with-resources, Python's `with`) is not recognized. These patterns will produce false-positive leak findings.
- **Dynamic dispatch**: If `close()` is called through a trait object or interface, it may not be recognized.
- **Authentication via type system**: Rust's type-state pattern (e.g. `AuthenticatedRequest<T>`) is not recognized as an auth check.
- **Complex authorization logic**: Only recognized function name patterns are checked.
## Common False Positives
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| RAII / Drop / defer cleanup | Implicit cleanup not visible | Known limitation; filter by severity |
| Resource returned to caller | Ownership transferred, not leaked | Known limitation |
| Framework-managed resources | Web framework manages connection lifecycle | Exclude framework-generated handlers |
| Try-with-resources (Java) | Language construct not parsed | Known limitation |
| Context manager (Python `with`) | Block construct not tracked | Known limitation |
## Common False Negatives
| Scenario | Why it's missed |
|----------|----------------|
| Resource closed in helper function | Cross-function tracking not implemented |
| Auth in middleware | Auth check happens before handler is called |
| Double-close via aliased reference | Alias analysis not performed |
## Confidence Signals
| Signal | Meaning |
|--------|---------|
| **Definite leak (state-resource-leak)** | Resource is never closed on any path — high confidence |
| **Use-after-close** | Read/write operation after explicit close — high confidence |
| **Web handler detected** | Entry point matched by parameter naming convention |
| **Possible leak (state-resource-leak-possible)** | Resource closed on some but not all paths — lower confidence |
## Tuning and Noise Controls
### Enable state analysis
```toml
[scanner]
enable_state_analysis = true
```
### Severity filtering
```bash
# Skip possible-leak findings (Low severity)
nyx scan . --severity ">=MEDIUM"
```
### Exclude test files
```toml
[scanner]
excluded_directories = ["tests", "test", "spec"]
```
## Resource Pairs
The state engine recognizes these acquire/release pairs per language:
### C/C++
| Acquire | Release | Resource |
|---------|---------|----------|
| `fopen` | `fclose` | File handle |
| `open` | `close` | File descriptor |
| `socket` | `close` | Socket |
| `malloc`, `calloc`, `realloc` | `free` | Heap memory |
| `pthread_mutex_lock` | `pthread_mutex_unlock` | Mutex |
### Rust
| Acquire | Release | Resource |
|---------|---------|----------|
| `File::open`, `File::create` | `drop`, `close` | File handle |
| `TcpStream::connect` | `shutdown` | TCP connection |
| `lock`, `read`, `write` (on Mutex/RwLock) | `drop` | Lock guard |
### Java
| Acquire | Release | Resource |
|---------|---------|----------|
| `new FileInputStream` | `close` | File stream |
| `getConnection` | `close` | DB connection |
| `new Socket` | `close` | Socket |
### Go, Python, JavaScript, Ruby, PHP
Similar patterns with language-specific function names.
## Use Patterns (Trigger use-after-close)
The following operations on a closed resource trigger `state-use-after-close`:
```
read, write, send, recv, fread, fwrite, fgets, fputs, fprintf, fscanf,
fflush, fseek, ftell, rewind, feof, ferror, fgetc, fputc, getc, putc,
ungetc, query, execute, fetch, sendto, recvfrom, ioctl, fcntl,
strcpy, strncpy, strcat, strncat, memcpy, memmove, memset, memcmp,
strcmp, strncmp, strlen, sprintf, snprintf
```
## Technical Details
### Resource Lifecycle Lattice
```
UNINIT → OPEN → CLOSED
→ MOVED
```
States are tracked as bitflags, allowing the lattice to represent uncertainty (e.g. OPEN|CLOSED means the resource is open on some paths and closed on others).
### Leak Detection Scope
Resource leaks are checked at the file-level exit node and the **synthesized** function exit node (a single Return node that all early returns feed into). Early-return nodes are **not** checked individually — only the merged state at the function's synthesized exit is inspected. This prevents duplicate findings where an early-return path reports a definite leak while the merged exit correctly reports a possible leak.
This per-function exit inspection ensures that a variable leaked inside one function is not masked by a same-named variable that is properly closed in a subsequent function.
### Auth Level Lattice
```
Unauthed < Authed < Admin
```
Join semantics: take the minimum (conservative). If any path is unauthenticated, the result is unauthenticated.