nyx/docs/rules/python.md

# Python Rules

Nyx detects Python vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, SQL injection, and weak crypto.

## Taint Labels

Python has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/python.rs`.

### Sources

| Matcher | Cap |
|---------|-----|
| `os.getenv`, `os.environ` | all |
| `request.args`, `request.form`, `request.json`, `request.headers`, `request.cookies`, `input` | all |
| `sys.argv` | all |
| `argparse.parse_args`, `urllib.request.urlopen`, `requests.get`, `requests.post` | all |

### Sanitizers

| Matcher | Cap |
|---------|-----|
| `html.escape` | HTML_ESCAPE |
| `shlex.quote` | SHELL_ESCAPE |
| `os.path.realpath` | FILE_IO |

### Sinks

| Matcher | Cap |
|---------|-----|
| `eval`, `exec` | SHELL_ESCAPE |
| `os.system`, `os.popen`, `subprocess.call`, `subprocess.run`, `subprocess.Popen`, `subprocess.check_output`, `subprocess.check_call` | SHELL_ESCAPE |
| `cursor.execute`, `cursor.executemany` | SHELL_ESCAPE |
| `send_file`, `send_from_directory` | FILE_IO |
| `open` | FILE_IO |

---

## AST Pattern Rules

### Code Execution

| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `py.code_exec.exec` | High | A | `exec()` — dynamic code execution |
| `py.code_exec.compile` | Medium | A | `compile()` with exec/eval mode |

### Command Execution

| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.cmdi.os_system` | High | A | `os.system()` — shell command execution |
| `py.cmdi.os_popen` | High | A | `os.popen()` — shell command execution |
| `py.cmdi.subprocess_shell` | High | B | `subprocess.*` with `shell=True` |

### Deserialization

| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.deser.pickle_loads` | High | A | `pickle.loads()` / `pickle.load()` — arbitrary object deserialization |
| `py.deser.yaml_load` | High | A | `yaml.load()` without SafeLoader |
| `py.deser.shelve_open` | Medium | A | `shelve.open()` — pickle-backed deserialization |

### SQL Injection

| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.sqli.execute_format` | Medium | B | `cursor.execute()` with string concatenation |

### Weak Crypto

| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.crypto.md5` | Low | A | `hashlib.md5()` — weak hash algorithm |
| `py.crypto.sha1` | Low | A | `hashlib.sha1()` — weak hash algorithm |

### Template Injection

| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.xss.jinja_from_string` | Medium | A | `jinja2.Template.from_string()` — template injection |

---

## Examples

### `py.deser.pickle_loads` — Unsafe deserialization

**Vulnerable:**
```python
import pickle
data = pickle.loads(request.body)  # Arbitrary code execution
```

**Safe alternative:**
```python
import json
data = json.loads(request.body)  # JSON is safe
```

### `py.cmdi.subprocess_shell` — Shell execution

**Vulnerable:**
```python
import subprocess
subprocess.call(user_input, shell=True)  # Command injection
```

**Safe alternative:**
```python
import subprocess
import shlex
subprocess.call(shlex.split(user_input), shell=False)
# Or better: use an explicit command list
subprocess.call(["ls", "-la", user_dir])
```

### `py.deser.yaml_load` — Unsafe YAML

**Vulnerable:**
```python
import yaml
config = yaml.load(user_data)  # Can instantiate arbitrary objects
```

**Safe alternative:**
```python
import yaml
config = yaml.safe_load(user_data)  # Only basic Python types
```

### `py.sqli.execute_format` — SQL concatenation

**Vulnerable:**
```python
cursor.execute("SELECT * FROM users WHERE id=" + user_id)
```

**Safe alternative:**
```python
cursor.execute("SELECT * FROM users WHERE id=?", (user_id,))
```
Phase 1 (#33) * chore: Exclude CLAUDE.md from Cargo.toml * feat: add callgraph module and integrate into main analysis flow * feat: enhance CLI with new severity filtering and analysis modes * feat: update CHANGELOG with recent enhancements and fixes to severity filtering and output handling * feat: implement state-model dataflow analysis for resource lifecycle and auth state * feat: enhance diagnostic output formatting and add evidence structure * feat: implement attack surface ranking for diagnostics with scoring and sorting * feat: add comprehensive documentation for installation, usage, and rules reference * feat: add multiple language support for command execution and evaluation endpoints * feat: implement inline suppression for findings using `nyx:ignore` comments * feat: add confidence levels to AST patterns and update output structure * feat: implement low-noise prioritization system with category filtering, rollup grouping, and configurable budgets * feat: bump version to 0.4.0 and update changelog with new features and improvements * feat: add dead code allowances to various functions in mod.rs and real_world_tests.rs 2026-02-25 21:16:36 -05:00			`# Python Rules`

			`Nyx detects Python vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, SQL injection, and weak crypto.`

			`## Taint Labels`

			Python has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/python.rs`.

			`### Sources`

			`\| Matcher \| Cap \|`
			`\|---------\|-----\|`
			\| `os.getenv`, `os.environ` \| all \|
			\| `request.args`, `request.form`, `request.json`, `request.headers`, `request.cookies`, `input` \| all \|
			\| `sys.argv` \| all \|
			\| `argparse.parse_args`, `urllib.request.urlopen`, `requests.get`, `requests.post` \| all \|

			`### Sanitizers`

			`\| Matcher \| Cap \|`
			`\|---------\|-----\|`
			\| `html.escape` \| HTML_ESCAPE \|
			\| `shlex.quote` \| SHELL_ESCAPE \|
			\| `os.path.realpath` \| FILE_IO \|

			`### Sinks`

			`\| Matcher \| Cap \|`
			`\|---------\|-----\|`
			\| `eval`, `exec` \| SHELL_ESCAPE \|
			\| `os.system`, `os.popen`, `subprocess.call`, `subprocess.run`, `subprocess.Popen`, `subprocess.check_output`, `subprocess.check_call` \| SHELL_ESCAPE \|
			\| `cursor.execute`, `cursor.executemany` \| SHELL_ESCAPE \|
			\| `send_file`, `send_from_directory` \| FILE_IO \|
			\| `open` \| FILE_IO \|

			`---`

			`## AST Pattern Rules`

			`### Code Execution`

			`\| Rule ID \| Severity \| Tier \| Description \|`
			`\|---------\|----------\|------\|-------------\|`
			\| `py.code_exec.eval` \| High \| A \| `eval()` — dynamic code execution \|
			\| `py.code_exec.exec` \| High \| A \| `exec()` — dynamic code execution \|
			\| `py.code_exec.compile` \| Medium \| A \| `compile()` with exec/eval mode \|

			`### Command Execution`

			`\| Rule ID \| Severity \| Tier \| Description \|`
			`\|---------\|----------\|------\|-------------\|`
			\| `py.cmdi.os_system` \| High \| A \| `os.system()` — shell command execution \|
			\| `py.cmdi.os_popen` \| High \| A \| `os.popen()` — shell command execution \|
			\| `py.cmdi.subprocess_shell` \| High \| B \| `subprocess.*` with `shell=True` \|

			`### Deserialization`

			`\| Rule ID \| Severity \| Tier \| Description \|`
			`\|---------\|----------\|------\|-------------\|`
			\| `py.deser.pickle_loads` \| High \| A \| `pickle.loads()` / `pickle.load()` — arbitrary object deserialization \|
			\| `py.deser.yaml_load` \| High \| A \| `yaml.load()` without SafeLoader \|
			\| `py.deser.shelve_open` \| Medium \| A \| `shelve.open()` — pickle-backed deserialization \|

			`### SQL Injection`

			`\| Rule ID \| Severity \| Tier \| Description \|`
			`\|---------\|----------\|------\|-------------\|`
			\| `py.sqli.execute_format` \| Medium \| B \| `cursor.execute()` with string concatenation \|

			`### Weak Crypto`

			`\| Rule ID \| Severity \| Tier \| Description \|`
			`\|---------\|----------\|------\|-------------\|`
			\| `py.crypto.md5` \| Low \| A \| `hashlib.md5()` — weak hash algorithm \|
			\| `py.crypto.sha1` \| Low \| A \| `hashlib.sha1()` — weak hash algorithm \|

			`### Template Injection`

			`\| Rule ID \| Severity \| Tier \| Description \|`
			`\|---------\|----------\|------\|-------------\|`
			\| `py.xss.jinja_from_string` \| Medium \| A \| `jinja2.Template.from_string()` — template injection \|

			`---`

			`## Examples`

			### `py.deser.pickle_loads` — Unsafe deserialization

			`Vulnerable:`
			```python
			`import pickle`
			`data = pickle.loads(request.body) # Arbitrary code execution`
			```

			`Safe alternative:`
			```python
			`import json`
			`data = json.loads(request.body) # JSON is safe`
			```

			### `py.cmdi.subprocess_shell` — Shell execution

			`Vulnerable:`
			```python
			`import subprocess`
			`subprocess.call(user_input, shell=True) # Command injection`
			```

			`Safe alternative:`
			```python
			`import subprocess`
			`import shlex`
			`subprocess.call(shlex.split(user_input), shell=False)`
			`# Or better: use an explicit command list`
			`subprocess.call(["ls", "-la", user_dir])`
			```

			### `py.deser.yaml_load` — Unsafe YAML

			`Vulnerable:`
			```python
			`import yaml`
			`config = yaml.load(user_data) # Can instantiate arbitrary objects`
			```

			`Safe alternative:`
			```python
			`import yaml`
			`config = yaml.safe_load(user_data) # Only basic Python types`
			```

			### `py.sqli.execute_format` — SQL concatenation

			`Vulnerable:`
			```python
			`cursor.execute("SELECT * FROM users WHERE id=" + user_id)`
			```

			`Safe alternative:`
			```python
			`cursor.execute("SELECT * FROM users WHERE id=?", (user_id,))`
			```