* chore: Exclude CLAUDE.md from Cargo.toml

* feat: add callgraph module and integrate into main analysis flow

* feat: enhance CLI with new severity filtering and analysis modes

* feat: update CHANGELOG with recent enhancements and fixes to severity filtering and output handling

* feat: implement state-model dataflow analysis for resource lifecycle and auth state

* feat: enhance diagnostic output formatting and add evidence structure

* feat: implement attack surface ranking for diagnostics with scoring and sorting

* feat: add comprehensive documentation for installation, usage, and rules reference

* feat: add multiple language support for command execution and evaluation endpoints

* feat: implement inline suppression for findings using `nyx:ignore` comments

* feat: add confidence levels to AST patterns and update output structure

* feat: implement low-noise prioritization system with category filtering, rollup grouping, and configurable budgets

* feat: bump version to 0.4.0 and update changelog with new features and improvements

* feat: add dead code allowances to various functions in mod.rs and real_world_tests.rs
Eli Peter 2026-02-25 21:16:36 -05:00 committed by GitHub
parent 19b578c5c4
commit 1bbe4b1cfb
GPG key ID: B5690EEEBB952194
456 changed files with 25628 additions and 1228 deletions

234
docs/cli.md Normal file

@ -0,0 +1,234 @@
# CLI Reference
## Global
```
nyx [COMMAND]
nyx --version
nyx --help
```
---
## `nyx scan`
Run a security scan on a directory.
```
nyx scan [PATH] [OPTIONS]
```
**PATH** defaults to `.` (current directory).
### Analysis Mode
| Flag | Default | Description |
|------|---------|-------------|
| `--mode <MODE>` | `full` | Analysis mode: `full`, `ast`, `cfg`, or `taint` |

| Mode | What runs |
|------|-----------|
| `full` | AST patterns + CFG structural analysis + taint analysis |
| `ast` | AST patterns only (fastest, no CFG or taint) |
| `cfg` / `taint` | CFG + taint analysis only (no AST patterns) |
**Deprecated aliases**: `--ast-only` (use `--mode ast`), `--cfg-only` (use `--mode cfg`), `--all-targets` (use `--mode full`).
### Index Control
| Flag | Default | Description |
|------|---------|-------------|
| `--index <MODE>` | `auto` | Index behavior: `auto`, `off`, or `rebuild` |

| Index Mode | Behavior |
|------------|----------|
| `auto` | Use existing index if available; build if missing |
| `off` | Skip indexing, scan filesystem directly |
| `rebuild` | Force rebuild index before scanning |
**Deprecated aliases**: `--no-index` (use `--index off`), `--rebuild-index` (use `--index rebuild`).
### Output
| Flag | Default | Description |
|------|---------|-------------|
| `-f, --format <FMT>` | `console` | Output format: `console`, `json`, or `sarif` |
| `--quiet` | off | Suppress status messages (stderr); stdout stays clean |
| `--no-rank` | off | Disable attack-surface ranking |
### Filtering
| Flag | Default | Description |
|------|---------|-------------|
| `--severity <EXPR>` | *(none)* | Filter findings by severity |
| `--min-score <N>` | *(none)* | Drop findings with rank score below N |
| `--min-confidence <LEVEL>` | *(none)* | Drop findings below this confidence level (`low`, `medium`, `high`) |
| `--fail-on <SEV>` | *(none)* | Exit with code 1 if any finding is at or above this severity |
| `--show-suppressed` | off | Show inline-suppressed findings (dimmed, tagged `[SUPPRESSED]`) |
| `--keep-nonprod-severity` | off | Don't downgrade severity for test/vendor paths |
| `--all` | off | Disable category filtering, rollups, and LOW budgets — show everything |
| `--include-quality` | off | Include Quality-category findings (hidden by default) |
| `--max-low <N>` | `20` | Maximum total LOW findings to show |
| `--max-low-per-file <N>` | `1` | Maximum LOW findings per file |
| `--max-low-per-rule <N>` | `10` | Maximum LOW findings per rule |
| `--rollup-examples <N>` | `5` | Number of example locations in rollup findings |
| `--show-instances <RULE>` | *(none)* | Expand all instances of a specific rule (bypass rollup) |
**Severity expression formats**:
```bash
--severity HIGH # Only high
--severity "HIGH,MEDIUM" # High or medium
--severity ">=MEDIUM" # Medium and above (high + medium)
--severity ">= low" # All severities (case-insensitive)
```
**Deprecated aliases**: `--high-only` (use `--severity HIGH`), `--include-nonprod` (use `--keep-nonprod-severity`).
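
The severity expression forms listed above can be read as a small grammar. The following Python sketch shows one way to interpret them. `parse_severity_expr` is a hypothetical helper, not Nyx's actual parser, and it assumes only the three levels `LOW < MEDIUM < HIGH` and the `>=` operator shown in the examples.

```python
LEVELS = ["LOW", "MEDIUM", "HIGH"]  # ascending order of severity


def parse_severity_expr(expr):
    """Return the set of severity names selected by an expression.

    Hypothetical helper: it mirrors only the documented forms
    (single level, comma-separated list, and ">=" threshold).
    """
    text = expr.strip().upper()  # expressions are case-insensitive
    if text.startswith(">="):
        floor = text[2:].strip()
        if floor not in LEVELS:
            raise ValueError("unknown severity: %r" % floor)
        return set(LEVELS[LEVELS.index(floor):])
    selected = {part.strip() for part in text.split(",") if part.strip()}
    if not selected or selected - set(LEVELS):
        raise ValueError("unknown severity in %r" % expr)
    return selected
```

Under these assumptions, `">= low"` selects all three levels and `"HIGH,MEDIUM"` selects exactly two.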
### Examples
```bash
# Basic scan
nyx scan
# Scan specific path, JSON output
nyx scan ./server --format json
# CI gate: fail on medium+, SARIF output
nyx scan . --format sarif --fail-on medium > results.sarif
# Fast AST-only scan, no index
nyx scan . --mode ast --index off
# High-severity only, quiet mode
nyx scan . --severity HIGH --quiet
# Only findings scoring 50 or above
nyx scan . --min-score 50
# Only medium+ confidence findings
nyx scan . --min-confidence medium
# Show everything (no filtering, no rollups)
nyx scan . --all
# Include quality findings but keep rollups and budgets
nyx scan . --include-quality
# See all unwrap findings expanded
nyx scan . --include-quality --show-instances rs.quality.unwrap
# Allow more LOW findings
nyx scan . --max-low 50 --max-low-per-file 5
```
---
## `nyx index`
Manage the SQLite file index.
### `nyx index build`
```
nyx index build [PATH] [--force]
```
Build or update the index for the given path (default: `.`).
| Flag | Description |
|------|-------------|
| `-f, --force` | Force full rebuild, ignoring cached file hashes |
### `nyx index status`
```
nyx index status [PATH]
```
Display index statistics (file count, size, last modified) for the given path.
---
## `nyx list`
```
nyx list [-v]
```
List all indexed projects.
| Flag | Description |
|------|-------------|
| `-v, --verbose` | Show detailed information per project |
---
## `nyx clean`
```
nyx clean [PROJECT] [--all]
```
Remove index data.
| Argument/Flag | Description |
|---------------|-------------|
| `PROJECT` | Project name or path to clean |
| `--all` | Clean all indexed projects |
---
## `nyx config`
Manage configuration.
### `nyx config show`
Print the effective merged configuration as TOML.
### `nyx config path`
Print the configuration directory path.
### `nyx config add-rule`
```
nyx config add-rule --lang <LANG> --matcher <MATCHER> --kind <KIND> --cap <CAP>
```
Add a custom taint rule. Written to `nyx.local`.
| Flag | Values |
|------|--------|
| `--lang` | `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby` |
| `--matcher` | Function or property name to match |
| `--kind` | `source`, `sanitizer`, `sink` |
| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `all` |
### `nyx config add-terminator`
```
nyx config add-terminator --lang <LANG> --name <NAME>
```
Add a terminator function (e.g. `process.exit`). Written to `nyx.local`.
---
## Exit Codes
| Code | Meaning |
|------|---------|
| `0` | Scan completed; no findings matched `--fail-on` threshold (or no `--fail-on` specified) |
| `1` | Scan completed but at least one finding met or exceeded the `--fail-on` severity |
| Non-zero | Error during scan (I/O error, config parse error, database error, etc.) |
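
A CI job usually needs to tell "findings met the threshold" apart from "the scan itself failed". The Python sketch below maps the exit codes in the table to three outcomes. It assumes that scan errors use a code other than `1`, which the table does not state explicitly, and the names `classify_exit` and `run_gate` are illustrative only.

```python
import subprocess


def classify_exit(code):
    """Map a `nyx scan` exit code to a CI outcome, following the table above."""
    if code == 0:
        return "pass"
    if code == 1:
        return "findings"  # at least one finding met the --fail-on severity
    return "error"  # assumption: scan errors use a code other than 1


def run_gate(path=".", fail_on="medium"):
    """Run the scan and classify the result (requires `nyx` on PATH)."""
    proc = subprocess.run(
        ["nyx", "scan", path, "--format", "sarif", "--fail-on", fail_on],
        capture_output=True,
        text=True,
    )
    return classify_exit(proc.returncode)
```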
---
## Environment Variables
| Variable | Description |
|----------|-------------|
| `RUST_LOG` | Set tracing verbosity (e.g. `RUST_LOG=debug nyx scan .`) |
| `NO_COLOR` | Disable ANSI color output |

183
docs/configuration.md Normal file

@ -0,0 +1,183 @@
# Configuration
Nyx uses TOML configuration files. A default config is auto-generated on first run.
## File Locations
| Platform | Directory |
|----------|-----------|
| Linux | `~/.config/nyx/` |
| macOS | `~/Library/Application Support/nyx/` |
| Windows | `%APPDATA%\elicpeter\nyx\config\` |
Run `nyx config path` to see the exact directory on your system.
## File Precedence
1. **`nyx.conf`** — Default config (auto-created from built-in template on first run)
2. **`nyx.local`** — User overrides (loaded on top of defaults)
Both files are optional. `nyx.local` is applied on top of `nyx.conf` using the merge rules below, and CLI flags take precedence over both files.
## Merge Strategy
| Type | Behavior |
|------|----------|
| Scalars (`mode`, `min_severity`, booleans) | User value wins |
| Arrays (`excluded_extensions`, `excluded_directories`) | Union + deduplicate |
| Analysis rules | Per-language union with deduplication |
Example:
```toml
# nyx.conf (default):
excluded_extensions = ["jpg", "png", "exe"]
# nyx.local (user):
excluded_extensions = ["foo", "jpg"]
# Effective result:
# ["exe", "foo", "jpg", "png"] — sorted, deduped union
```
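
The merge rules can be sketched in a few lines of Python. This is an illustration of the documented behavior for scalars and string arrays, not Nyx's implementation. `merge_config` is a hypothetical name, and the sketch does not cover per-language rule objects, which would need a structural comparison instead of a set union.

```python
def merge_config(default, user):
    """Merge two config tables: scalars are overridden, lists are unioned."""
    merged = dict(default)
    for key, user_value in user.items():
        base_value = merged.get(key)
        if isinstance(base_value, dict) and isinstance(user_value, dict):
            merged[key] = merge_config(base_value, user_value)  # recurse into tables
        elif isinstance(base_value, list) and isinstance(user_value, list):
            merged[key] = sorted(set(base_value) | set(user_value))  # union + dedupe
        else:
            merged[key] = user_value  # user value wins
    return merged


default = {"scanner": {"mode": "full", "excluded_extensions": ["jpg", "png", "exe"]}}
user = {"scanner": {"mode": "ast", "excluded_extensions": ["foo", "jpg"]}}
effective = merge_config(default, user)
print(effective["scanner"]["excluded_extensions"])  # ['exe', 'foo', 'jpg', 'png']
```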
---
## Full Schema
### `[scanner]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `mode` | `"full"` \| `"ast"` \| `"cfg"` \| `"taint"` | `"full"` | Analysis mode |
| `min_severity` | `"Low"` \| `"Medium"` \| `"High"` | `"Low"` | Minimum severity to report |
| `max_file_size_mb` | int \| null | null | Max file size in MiB; null = unlimited |
| `excluded_extensions` | [string] | `["jpg", "png", "gif", "mp4", ...]` | File extensions to skip |
| `excluded_directories` | [string] | `["node_modules", ".git", "target", ...]` | Directories to skip |
| `excluded_files` | [string] | `[]` | Specific files to skip |
| `read_global_ignore` | bool | `false` | Honor global ignore file |
| `read_vcsignore` | bool | `true` | Honor `.gitignore` / `.hgignore` |
| `require_git_to_read_vcsignore` | bool | `true` | Require `.git` dir to apply gitignore |
| `one_file_system` | bool | `false` | Don't cross filesystem boundaries |
| `follow_symlinks` | bool | `false` | Follow symbolic links |
| `scan_hidden_files` | bool | `false` | Scan dot-files |
| `include_nonprod` | bool | `false` | Keep original severity for test/vendor paths |
| `enable_state_analysis` | bool | `false` | Enable resource lifecycle + auth state analysis. Detects use-after-close, double-close, resource leaks (per-function scope), and unauthenticated access. Requires `mode = "full"` or `mode = "cfg"`. |
### `[database]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `path` | string | `""` | Custom SQLite DB path; empty = platform default |
### `[output]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `default_format` | `"console"` \| `"json"` \| `"sarif"` | `"console"` | Default output format |
| `quiet` | bool | `false` | Suppress status messages |
| `max_results` | int \| null | null | Cap number of findings; null = unlimited |
| `attack_surface_ranking` | bool | `true` | Enable attack-surface ranking |
| `min_score` | int \| null | null | Minimum rank score to include; null = no minimum |
| `min_confidence` | string \| null | null | Minimum confidence level (`"low"`, `"medium"`, `"high"`); null = no minimum |
| `include_quality` | bool | `false` | Include Quality-category findings (hidden by default) |
| `show_all` | bool | `false` | Disable category filtering, rollups, and LOW budgets |
| `max_low` | int | `20` | Maximum total LOW findings to show (rollups count as 1) |
| `max_low_per_file` | int | `1` | Maximum LOW findings per file (rollups count as 1) |
| `max_low_per_rule` | int | `10` | Maximum LOW findings per rule (rollups count as 1) |
| `rollup_examples` | int | `5` | Number of example locations stored in rollup findings |
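
The three LOW budgets (`max_low`, `max_low_per_file`, `max_low_per_rule`) interact. The Python sketch below shows one plausible reading: findings are visited in rank order, non-LOW findings always pass, and a LOW finding is kept only while all three budgets still have room. The visiting order, the finding shape, and the name `apply_low_budgets` are assumptions, and rollups are ignored here.

```python
from collections import Counter


def apply_low_budgets(findings, max_low=20, max_low_per_file=1, max_low_per_rule=10):
    """Keep every non-LOW finding; keep a LOW finding only while all budgets have room.

    Each finding is a dict with "severity", "file", and "rule" keys (hypothetical shape).
    """
    kept = []
    total = 0
    per_file = Counter()
    per_rule = Counter()
    for finding in findings:
        if finding["severity"] != "LOW":
            kept.append(finding)
            continue
        if (
            total >= max_low
            or per_file[finding["file"]] >= max_low_per_file
            or per_rule[finding["rule"]] >= max_low_per_rule
        ):
            continue  # some budget is exhausted, drop this LOW finding
        total += 1
        per_file[finding["file"]] += 1
        per_rule[finding["rule"]] += 1
        kept.append(finding)
    return kept
```

With the default `max_low_per_file = 1`, a second LOW finding in the same file is dropped even when the global budget is far from exhausted.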
### `[performance]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `worker_threads` | int \| null | null | Worker thread count; null/0 = auto-detect |
| `batch_size` | int | `100` | Files per index batch |
| `channel_multiplier` | int | `4` | Channel capacity = threads × multiplier |
| `rayon_thread_stack_size` | int | `8388608` | Rayon thread stack size in bytes (8 MiB) |
| `prune` | bool | `false` | Stop traversing into matching directories |
### `[analysis.languages.<slug>]`
Per-language custom rules. `<slug>` is one of: `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby`.
| Field | Type | Description |
|-------|------|-------------|
| `rules` | array of rule objects | Custom label rules |
| `terminators` | [string] | Functions that terminate execution |
| `event_handlers` | [string] | Event handler function names |
**Rule object**:
```toml
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml"]
kind = "sanitizer" # "source" | "sanitizer" | "sink"
cap = "html_escape" # "env_var" | "html_escape" | "shell_escape" |
# "url_encode" | "json_parse" | "file_io" | "all"
```
---
## Example Configurations
### Minimal override (`nyx.local`)
```toml
[scanner]
min_severity = "Medium"
[output]
default_format = "json"
max_results = 100
```
### CI-optimized
```toml
[scanner]
mode = "full"
min_severity = "Medium"
excluded_directories = ["node_modules", ".git", "target", "vendor", "dist"]
[output]
quiet = true
default_format = "sarif"
[performance]
worker_threads = 4
```
### Custom rules for a Node.js project
```toml
[analysis.languages.javascript]
terminators = ["process.exit", "abort"]
event_handlers = ["addEventListener"]
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml", "sanitizeInput"]
kind = "sanitizer"
cap = "html_escape"
[[analysis.languages.javascript.rules]]
matchers = ["dangerouslySetInnerHTML"]
kind = "sink"
cap = "html_escape"
[[analysis.languages.javascript.rules]]
matchers = ["getRequestBody", "readUserInput"]
kind = "source"
cap = "all"
```
### Adding rules via CLI
```bash
# Add a sanitizer
nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
# Add a terminator
nyx config add-terminator --lang javascript --name process.exit
# Verify
nyx config show
```

81
docs/detectors.md Normal file

@ -0,0 +1,81 @@
# Detector Overview
Nyx uses four independent detector families. Each targets different vulnerability classes and operates at a different level of analysis depth. Findings from all active detectors are merged, deduplicated, ranked, and presented in a single result set.
## The Four Detector Families
| Family | Rule prefix | Analysis depth | What it finds |
|--------|------------|----------------|---------------|
| [**Taint Analysis**](detectors/taint.md) | `taint-*` | Cross-file dataflow | Unsanitized data flowing from sources to sinks |
| [**CFG Structural**](detectors/cfg.md) | `cfg-*` | Intra-procedural CFG | Auth gaps, unguarded sinks, resource leaks, error fallthrough |
| [**State Model**](detectors/state.md) | `state-*` | Intra-procedural lattice | Use-after-close, double-close, resource leaks, unauthenticated access |
| [**AST Patterns**](detectors/patterns.md) | `<lang>.*.*` | Structural (no flow) | Dangerous function calls, banned APIs, weak crypto |
## How They Combine
In `--mode full` (default), all four families run; the state-model family additionally requires `scanner.enable_state_analysis = true`. Findings are then merged and deduplicated:
1. **Taint outranks AST**: If a taint finding and an AST pattern both fire at the same location (e.g. both flag `eval(userInput)`), both are kept with distinct rule IDs. The taint finding ranks higher because of its analysis-kind bonus.
2. **State supersedes CFG**: If a state-model finding (e.g. `state-resource-leak`) fires at the same location as a CFG finding (e.g. `cfg-resource-leak`), the CFG finding is suppressed.
3. **Location-level dedup**: Exact duplicates (same line, column, rule ID, severity) are removed.
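
The combination rules above can be sketched as a two-step filter. This Python sketch is a simplification and not Nyx's code. It assumes a finding is a dict with `file`, `line`, `col`, `rule`, and `severity` keys, and it suppresses any `cfg-*` finding that shares a location with a `state-*` finding; how Nyx pairs state rules with CFG rules is not specified here.

```python
def dedupe(findings):
    """Drop exact duplicates, then let state findings suppress co-located CFG findings."""
    seen = set()
    unique = []
    for f in findings:
        key = (f["file"], f["line"], f["col"], f["rule"], f["severity"])
        if key not in seen:
            seen.add(key)
            unique.append(f)

    # Locations where a state-model finding fired.
    state_locations = {
        (f["file"], f["line"], f["col"])
        for f in unique
        if f["rule"].startswith("state-")
    }
    return [
        f
        for f in unique
        if not (
            f["rule"].startswith("cfg-")
            and (f["file"], f["line"], f["col"]) in state_locations
        )
    ]
```

A taint finding and an AST pattern at the same location have different rule IDs, so both survive this filter, which matches rule 1.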
## Analysis Modes
| Mode | CLI flag | Active detectors |
|------|----------|-----------------|
| Full | `--mode full` | All four |
| AST-only | `--mode ast` | AST patterns only |
| CFG/Taint | `--mode cfg` or `--mode taint` | Taint + CFG + State |
## Attack-Surface Ranking
Every finding receives a deterministic **attack-surface score** estimating exploitability. Findings are sorted by descending score.
### Scoring Formula
```
score = severity_base + analysis_kind + evidence_strength + state_bonus - validation_penalty
```
| Component | Values | Purpose |
|-----------|--------|---------|
| **Severity base** | High=60, Medium=30, Low=10 | Primary signal |
| **Analysis kind** | taint=+10, state=+8, cfg(with evidence)=+5, cfg(no evidence)=+3, ast=+0 | Confidence of analysis |
| **Evidence strength** | +1 per evidence item (max 4), +2-6 for source kind | Specificity of finding |
| **State bonus** | use-after-close/unauthed=+6, double-close=+3, must-leak=+2, may-leak=+1 | State rule severity |
| **Validation penalty** | -5 if path-validated | Guard reduces exploitability |
### Source-kind priority
| Source type | Bonus | Examples |
|-------------|-------|---------|
| User input | +6 | `req.body`, `argv`, `stdin`, `form`, `query`, `params` |
| Environment | +5 | `env::var`, `getenv`, `process.env` |
| Unknown | +4 | Conservative default |
| File system | +3 | `fs::read_to_string`, `fgets` |
| Database | +2 | Query results |
### Score ranges (approximate)
| Finding type | Score range |
|-------------|------------|
| High taint + user input | ~76-80 |
| High state (use-after-close) | ~74 |
| High CFG structural | ~63-68 |
| Medium taint + env source | ~45-50 |
| Medium state (resource leak) | ~40 |
| Low AST-only pattern | ~10 |
Ranking is enabled by default. Disable with `--no-rank` or `output.attack_surface_ranking = false`.
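
The scoring formula and the tables above can be checked with a short Python sketch. The numeric values come from the tables, while the dictionary keys and the function name `attack_surface_score` are hypothetical labels chosen for this example and not Nyx identifiers.

```python
SEVERITY_BASE = {"high": 60, "medium": 30, "low": 10}
ANALYSIS_KIND = {"taint": 10, "state": 8, "cfg_evidence": 5, "cfg": 3, "ast": 0}
SOURCE_KIND = {"user_input": 6, "environment": 5, "unknown": 4, "file_system": 3, "database": 2}
STATE_BONUS = {"use_after_close": 6, "unauthed": 6, "double_close": 3, "must_leak": 2, "may_leak": 1}


def attack_surface_score(severity, kind, evidence_items=0, source=None,
                         state_rule=None, path_validated=False):
    """score = severity_base + analysis_kind + evidence_strength + state_bonus - validation_penalty"""
    score = SEVERITY_BASE[severity] + ANALYSIS_KIND[kind]
    score += min(evidence_items, 4)  # +1 per evidence item, capped at 4
    if source is not None:
        score += SOURCE_KIND[source]  # source-kind part of evidence strength
    if state_rule is not None:
        score += STATE_BONUS[state_rule]
    if path_validated:
        score -= 5  # validation penalty
    return score
```

Under these assumptions, a High taint finding with a user-input source scores 60 + 10 + 6 = 76 with no extra evidence items and 80 with four, which reproduces the ~76-80 range listed above.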
## Two-Pass Architecture
Nyx's taint analysis requires cross-file context, achieved via two passes:
1. **Pass 1 — Summary extraction**: Each file is parsed, a CFG is built, and a `FuncSummary` is extracted per function. Summaries capture source/sanitizer/sink capabilities (bitflags), taint propagation behavior, and callee lists. Summaries are persisted to SQLite.
2. **Pass 2 — Analysis**: All summaries are merged into a global map. Files are re-parsed and analyzed with full cross-file context. The taint engine resolves callees against local summaries (more precise) first, then falls back to global summaries.
With indexing enabled, Pass 1 skips files whose content hash hasn't changed since the last scan.
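
The two passes and the hash-based skip can be sketched as follows. This Python sketch is a simplified model and not Nyx's implementation: a plain dict stands in for the SQLite index, SHA-256 is an assumed hash function, and `extract_summaries` is a caller-supplied placeholder for the real parser and CFG builder.

```python
import hashlib


def content_hash(text):
    """Hash file contents (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pass_one(files, index, extract_summaries):
    """Refresh per-file summaries, skipping files whose content hash is unchanged.

    `files` maps path -> source text; `index` maps path -> (hash, summaries).
    Returns the paths that were re-extracted.
    """
    refreshed = []
    for path, text in files.items():
        digest = content_hash(text)
        cached = index.get(path)
        if cached is not None and cached[0] == digest:
            continue  # unchanged since the last scan
        index[path] = (digest, extract_summaries(text))
        refreshed.append(path)
    return refreshed


def resolve_callee(name, local_summaries, global_summaries):
    """Pass 2 lookup order: local summaries first, then the global map."""
    if name in local_summaries:
        return local_summaries[name]
    return global_summaries.get(name)
```

Running `pass_one` twice over unchanged files returns an empty list the second time, which is the behavior the index is meant to provide.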

161
docs/detectors/cfg.md Normal file

@ -0,0 +1,161 @@
# CFG Structural Analysis
## Summary
Nyx builds an intra-procedural control-flow graph (CFG) for each function and analyzes structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error-handling code terminates properly.
These detectors use **dominator analysis** — they check whether a guard node dominates (must execute before) a sink node on the CFG.
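
A node `g` dominates a node `s` when every path from the function entry to `s` passes through `g`. The Python sketch below computes dominator sets with the classic iterative algorithm and uses them to test whether a guard dominates a sink. It is an illustration of the technique, not Nyx's implementation, and the CFG encoding (a dict of successor lists) and function names are assumptions.

```python
def dominators(cfg, entry):
    """Compute dominator sets for a CFG given as {node: [successors]}."""
    # Restrict the analysis to nodes reachable from the entry.
    reachable, stack = set(), [entry]
    while stack:
        node = stack.pop()
        if node not in reachable:
            reachable.add(node)
            stack.extend(cfg.get(node, []))

    preds = {n: set() for n in reachable}
    for node in reachable:
        for succ in cfg.get(node, []):
            preds[succ].add(node)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(reachable) for n in reachable}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in reachable - {entry}:
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def is_guarded(cfg, entry, guard, sink):
    """True if every path from entry to a reachable sink passes through guard."""
    return guard in dominators(cfg, entry)[sink]
```

If the sink has any incoming path that bypasses the guard, the guard drops out of the sink's dominator set and `is_guarded` returns `False`.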
## Rule IDs
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `cfg-unguarded-sink` | High/Medium | Sink reachable without a dominating guard or sanitizer |
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth check |
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
| `cfg-unreachable-source` | Low | Source in unreachable code |
| `cfg-error-fallthrough` | High/Medium | Error check doesn't terminate; dangerous code follows |
| `cfg-resource-leak` | Medium | Resource acquired but not released on all exit paths |
| `cfg-lock-not-released` | Medium | Lock acquired but not released on all exit paths |
## What It Detects
### Unguarded sinks (`cfg-unguarded-sink`)
A sink call (e.g. `system()`, `eval()`, `Command::new()`) is reachable from the function entry without passing through a guard or sanitizer that matches the sink's capability.
### Auth gaps (`cfg-auth-gap`)
A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`) reaches a privileged sink (shell execution, file I/O) without a prior call to an authentication function (`is_authenticated`, `require_auth`, `check_permission`, etc.).
### Unreachable security code (`cfg-unreachable-*`)
Sinks, sanitizers, or sources in dead code branches. This often indicates a refactoring error where security-critical code was accidentally made unreachable.
### Error fallthrough (`cfg-error-fallthrough`)
An error check (null check, error return check) does not terminate the function or loop back. Execution continues to a dangerous operation on the error path.
### Resource leaks (`cfg-resource-leak`, `cfg-lock-not-released`)
A resource acquisition call (e.g. `File::open`, `fopen`, `socket`, `Lock`) is not matched by a release call (e.g. `close`, `fclose`, `unlock`) on all exit paths from the function.
## What It Cannot Detect
- **Inter-procedural guards**: If authentication is checked in a middleware function that calls this handler, the CFG detector cannot see it. It only analyzes one function at a time.
- **Dynamic dispatch**: Virtual method calls, function pointers, and closures are opaque to the CFG.
- **Complex guard patterns**: Only recognized guard function names are checked. Custom validation logic (e.g. `if password == expected`) is not recognized as a guard.
- **Correct sanitization**: The detector checks that *some* guard dominates the sink, not that the guard is *correct*. A guard that always passes would suppress the finding.
- **Cross-function resource flows**: If a file handle is opened in one function and closed in another, the detector will report a leak in the first function.
## Common False Positives
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| Framework-level auth middleware | Handler doesn't call auth directly | Document as expected, or suppress the finding inline with a `nyx:ignore` comment |
| Resource closed via RAII/defer | Implicit cleanup not visible to CFG | Currently not detected; known limitation |
| Custom guard function name | Function not in the recognized guard list | Add the function name as a sanitizer in config |
| Test handlers | Intentionally skip auth in tests | Default non-prod downgrade reduces severity; or exclude test dirs |
## Common False Negatives
| Scenario | Why it's missed |
|----------|----------------|
| Auth in called function | Cross-function guards not tracked |
| Guard via type system | Type-level guarantees (e.g. Rust's `AuthenticatedUser` wrapper) not analyzed |
| Resource closed in finally/defer | Some cleanup patterns not recognized |
## Confidence Signals
| Signal | Meaning |
|--------|---------|
| **Evidence lists guard nodes** | Shows which guards were checked and found missing |
| **Sink has high capability** | Shell execution or file I/O sinks are higher risk |
| **Handler detection matched** | Web handler identification is based on conventional parameter names |
## Tuning and Noise Controls
### Add custom guards/sanitizers
```toml
[[analysis.languages.python.rules]]
matchers = ["validate_request", "check_csrf"]
kind = "sanitizer"
cap = "all"
```
### Add auth rules
Auth checks are recognized by function name. If your codebase uses non-standard names:
```toml
[[analysis.languages.javascript.rules]]
matchers = ["ensureLoggedIn", "requirePermission"]
kind = "sanitizer"
cap = "all"
```
### Filter results
```bash
# Skip low-severity unreachable findings
nyx scan . --severity ">=MEDIUM"
```
### Disable CFG analysis
```bash
nyx scan . --mode ast # AST patterns only
```
## Examples
### Unguarded sink
```go
func handler(w http.ResponseWriter, r *http.Request) {
	cmd := r.URL.Query().Get("cmd")
	exec.Command("sh", "-c", cmd).Run() // cfg-unguarded-sink: no guard dominates
}
```
### Auth gap
```javascript
app.get('/admin/delete', (req, res) => {
  // No is_authenticated() call
  db.execute("DELETE FROM users WHERE id = " + req.params.id);
  // cfg-auth-gap: web handler reaches privileged sink without auth
});
```
### Resource leak
```c
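#include <stdio.h>

// `error` is an illustrative caller-supplied flag.
void process(int error) {
    FILE *f = fopen("data.txt", "r");  // acquire
    if (f == NULL) {
        return;  // nothing was acquired
    }
    if (error) {
        return;  // cfg-resource-leak: f not closed on this path
    }
    fclose(f);
}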
```
## Guard Rules
Nyx recognizes these function name patterns as guards:
| Pattern | Applies to |
|---------|-----------|
| `validate*`, `sanitize*` | All sinks |
| `check_*`, `verify_*`, `assert_*` | All sinks |
| `shell_escape` | Shell execution sinks |
| `html_escape` | HTML/XSS sinks |
| `url_encode` | URL sinks |
| `which` | Shell execution (binary lookup) |
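
The guard table can be read as a list of name patterns with a capability attached to each. The Python sketch below treats `*` as a shell-style wildcard through `fnmatch`, which is an assumption about the matching semantics. The capability labels `shell`, `html`, and `url` are hypothetical names for the sink classes in the table.

```python
from fnmatch import fnmatchcase

# (pattern, capability) pairs transcribed from the table; "all" covers every sink.
GUARD_PATTERNS = [
    ("validate*", "all"),
    ("sanitize*", "all"),
    ("check_*", "all"),
    ("verify_*", "all"),
    ("assert_*", "all"),
    ("shell_escape", "shell"),
    ("html_escape", "html"),
    ("url_encode", "url"),
    ("which", "shell"),
]


def guards_sink(function_name, sink_capability):
    """True if the function name matches a guard pattern that covers the sink."""
    return any(
        fnmatchcase(function_name, pattern) and capability in ("all", sink_capability)
        for pattern, capability in GUARD_PATTERNS
    )
```

Under this reading, `html_escape` guards an HTML sink but not a shell sink, and `checkout` does not match `check_*` because the underscore is part of the pattern.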
### Auth rules
| Pattern | Category |
|---------|----------|
| `is_authenticated`, `require_auth`, `check_permission` | Common |
| `authorize`, `authenticate`, `require_login` | Common |
| `check_auth`, `verify_token`, `validate_token` | Common |
| `middleware.auth`, `auth.required` | Go |
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |

149
docs/detectors/patterns.md Normal file

@ -0,0 +1,149 @@
# AST Pattern Matching
## Summary
AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.
AST patterns run in all analysis modes, including `--mode ast` (where they are the only active detector).
## Rule IDs
Pattern rule IDs follow the format `<lang>.<category>.<specific>`:
```
rs.memory.transmute
js.code_exec.eval
py.deser.pickle_loads
c.memory.gets
java.sqli.execute_concat
```
See the [Rule Reference](../rules/index.md) for a complete listing per language.
## Pattern Tiers
| Tier | Meaning | Examples |
|------|---------|---------|
| **A** | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
| **B** | Query includes a heuristic guard | SQL `execute` with concatenated arg, `printf(var)` with non-literal format |
Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, `java.sqli.execute_concat` only fires when `executeQuery()` receives a `binary_expression` (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
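
The Tier B idea (match a call only when its argument has a particular syntactic shape) can be illustrated without tree-sitter. The Python sketch below is an analogy built on the standard-library `ast` module and applied to Python source. It is not the `java.sqli.execute_concat` query, and `find_concat_execute_calls` is a hypothetical name.

```python
import ast


def find_concat_execute_calls(source):
    """Return line numbers of `<obj>.execute(<a> + <b>)` calls in Python source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            and isinstance(node.args[0], ast.BinOp)  # the heuristic guard
            and isinstance(node.args[0].op, ast.Add)
        ):
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = (
    'cur.execute("SELECT * FROM users WHERE id=" + user_id)\n'
    'cur.execute("SELECT * FROM users WHERE id=%s", (user_id,))\n'
    "cur.execute(query)\n"
)
print(find_concat_execute_calls(SAMPLE))  # [1]
```

Only the first call is reported: the second passes a literal with a placeholder and the third passes a plain identifier, so neither has a concatenation as its first argument.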
## What It Detects
### By category
| Category | What it matches | Example languages |
|----------|----------------|-------------------|
| **CommandExec** | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
| **CodeExec** | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
| **Deserialization** | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
| **SqlInjection** | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
| **PathTraversal** | File inclusion with variable path | PHP (`include $var`) |
| **Xss** | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
| **Crypto** | Weak cryptographic algorithms | All languages (`md5`, `sha1`, `Math.random()`) |
| **Secrets** | Hardcoded credentials | Go (variable name matching) |
| **InsecureTransport** | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
| **Reflection** | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
| **MemorySafety** | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
| **Prototype** | Prototype pollution | JS/TS (`__proto__` assignment) |
| **CodeQuality** | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |
## What It Cannot Detect
- **Dataflow**: Patterns don't track whether the dangerous function receives tainted input. `eval("hello")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`.
- **Context**: Patterns don't understand whether the code is reachable, guarded, or inside a test.
- **Semantics**: `strcpy(dst, src)` always matches — it cannot determine buffer sizes.
- **Indirect calls**: Function pointers, dynamic dispatch, and aliased references are invisible.
## Common False Positives
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis won't flag this — use `--mode cfg` for fewer false positives |
| `unsafe` block in Rust with sound justification | All unsafe blocks match | Filter with `--severity HIGH` (unsafe_block is Medium, so `">=MEDIUM"` still includes it) |
| `.unwrap()` in test code | Acceptable in tests | Default non-prod downgrade reduces severity |
| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter Low severity or add to exclusions |
| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |
## Common False Negatives
| Scenario | Why it's missed |
|----------|----------------|
| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
| SQL injection via ORM query builder | No pattern for ORM-specific query building |
| Imported function under different name | `from os import system as s; s(cmd)` — pattern looks for `system` |
## Confidence Signals
| Signal | Meaning |
|--------|---------|
| **Tier A** | High confidence — the function itself is dangerous |
| **Tier B** | Moderate confidence — heuristic guard reduces false positives |
| **High severity** | Critical vulnerability class (command exec, deserialization) |
| **Low severity** | Informational (weak crypto, code quality) |
| **Non-prod path** | Finding in test/vendor code — downgraded by default |
## Tuning and Noise Controls
### Severity filtering
```bash
# Skip code-quality and weak-crypto findings
nyx scan . --severity ">=MEDIUM"
# Only critical findings
nyx scan . --severity HIGH
```
### Use taint for precision
```bash
# CFG/taint mode: skip AST patterns; report only taint, CFG, and state findings
nyx scan . --mode cfg
```
### Exclude directories
```toml
[scanner]
excluded_directories = ["node_modules", "vendor", "generated"]
```
## Examples
### Tier A — structural presence
**C: Banned function**
```c
char buf[64];
gets(buf); // c.memory.gets — always dangerous, no safe usage
```
**Python: Unsafe deserialization**
```python
import pickle
data = pickle.loads(user_input) # py.deser.pickle_loads
```
### Tier B — heuristic-guarded
**Java: SQL concatenation**
```java
// Fires: concatenated argument
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
// java.sqli.execute_concat
// Does NOT fire: parameterized query
stmt.executeQuery(preparedSql);
```
**C: Format string**
```c
// Fires: variable as first argument
printf(user_input); // c.memory.printf_no_fmt
// Does NOT fire: literal format string
printf("%s", user_input);
```

204
docs/detectors/state.md Normal file

@ -0,0 +1,204 @@
# State Model Analysis
## Summary
Nyx's state model analysis tracks **resource lifecycle** and **authentication state** through a function using monotone dataflow over bounded lattices. It detects use-after-close bugs, double-close bugs, resource leaks, and unauthenticated access to privileged operations.
State analysis is **opt-in** — enable it with `scanner.enable_state_analysis = true` in config. It requires `mode = "full"` or `mode = "cfg"`.
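
To make "monotone dataflow over bounded lattices" concrete, the Python sketch below runs a small worklist analysis over a powerset lattice of resource states. It is a simplified model and not Nyx's implementation. The node encoding, the `analyze` function, and the choice to report on any possibly-closed state are assumptions made for illustration.

```python
OPEN, CLOSED = "open", "closed"


def join(a, b):
    """Pointwise union: the least upper bound in the powerset lattice."""
    return {v: a.get(v, frozenset()) | b.get(v, frozenset()) for v in set(a) | set(b)}


def transfer(op, var, state):
    """Apply one node's effect; "use" and "nop" leave the state unchanged."""
    out = dict(state)
    if op == "open":
        out[var] = frozenset({OPEN})
    elif op == "close":
        out[var] = frozenset({CLOSED})
    return out


def analyze(nodes, edges, entry, exits):
    """nodes: {id: (op, var)}; edges: {id: [successor ids]}. Returns sorted findings."""
    in_state = {n: {} for n in nodes}
    out_state = {n: {} for n in nodes}
    visited, worklist = set(), [entry]
    while worklist:  # iterate to a fixpoint; the lattice is finite
        n = worklist.pop()
        new_out = transfer(nodes[n][0], nodes[n][1], in_state[n])
        if n in visited and new_out == out_state[n]:
            continue
        visited.add(n)
        out_state[n] = new_out
        for succ in edges.get(n, []):
            in_state[succ] = join(in_state[succ], new_out)
            worklist.append(succ)

    findings = set()
    for n in visited:
        op, var = nodes[n]
        before = in_state[n].get(var, frozenset())
        if op == "use" and CLOSED in before:
            findings.add(("state-use-after-close", var))
        if op == "close" and CLOSED in before:
            findings.add(("state-double-close", var))
    exit_state = {}
    for n in exits:
        if n in visited:
            exit_state = join(exit_state, out_state[n])
    for var, states in exit_state.items():
        if states == {OPEN}:
            findings.add(("state-resource-leak", var))  # open on every exit path
        elif OPEN in states:
            findings.add(("state-resource-leak-possible", var))  # open on some paths
    return sorted(findings)
```

In this model a resource that is `{open}` at every exit is a definite leak, while `{open, closed}` at the joined exit state is a possible leak, which corresponds to the two leak rule IDs.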
## Rule IDs
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `state-use-after-close` | High | Variable used after being closed/released |
| `state-double-close` | Medium | Resource closed twice |
| `state-resource-leak` | Medium | Resource opened but never closed (definite) |
| `state-resource-leak-possible` | Low | Resource may not be closed on all paths |
| `state-unauthed-access` | High | Privileged operation reached without authentication |
## What It Detects
### Use-after-close (`state-use-after-close`)
A resource transitions to the CLOSED state (via `close()`, `fclose()`, `disconnect()`, etc.), then a use operation (`read`, `write`, `send`, `recv`, `query`, etc.) is performed on it.
```c
FILE *f = fopen("data.txt", "r");
fclose(f);
fread(buf, 1, 100, f); // state-use-after-close
```
### Double-close (`state-double-close`)
A resource is closed twice. This can cause crashes or undefined behavior.
```python
f = open("data.txt")
f.close()
f.close() # state-double-close
```
### Resource leak (`state-resource-leak`)
A resource is opened but never closed on any path through the function. This is a definite leak.
```java
FileInputStream fis = new FileInputStream("data.txt");
process(fis);
// function exits without fis.close() — state-resource-leak
```
### Possible resource leak (`state-resource-leak-possible`)
A resource is closed on some paths but not others.
```go
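f, err := os.Open("data.txt")
if err != nil {
    return err // open failed, nothing to close
}
if skip { // hypothetical early-exit condition
    return nil // f not closed here
}
f.Close() // closed here
// state-resource-leak-possible on the early-return path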
```
### Unauthenticated access (`state-unauthed-access`)
A function identified as a web handler reaches a privileged sink (shell execution, file I/O) without any authentication check on the path.
A function is identified as a web handler if:
1. Its name starts with `handle_`, `route_`, or `api_` (strong match — sufficient on its own), OR
2. Its name starts with `serve_` or `process_` AND any function in the file has web-like parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, etc., varying by language).
The function name `main` is explicitly excluded.
```javascript
app.post('/admin/exec', (req, res) => {
  // No auth check
  exec(req.body.command); // state-unauthed-access
});
```
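
The handler-identification rules above can be sketched as a small predicate. This is an illustrative Python sketch, not Nyx's implementation (Nyx is written in Rust): the function name `looks_like_web_handler` and the exact parameter-name set are assumptions drawn from the prefixes and names listed above.

```python
# Illustrative sketch only; all names here are hypothetical.
STRONG_PREFIXES = ("handle_", "route_", "api_")
WEAK_PREFIXES = ("serve_", "process_")
# Subset of the web-like parameter names listed above; the real set varies by language.
WEB_PARAM_NAMES = {"request", "req", "ctx", "res", "response", "w", "writer"}


def looks_like_web_handler(func_name, params_in_file):
    """Return True if func_name counts as a web handler under the stated rules.

    params_in_file: parameter names of every function in the same file.
    """
    if func_name == "main":
        return False  # explicitly excluded
    if func_name.startswith(STRONG_PREFIXES):
        return True  # strong match, sufficient on its own
    if func_name.startswith(WEAK_PREFIXES):
        # Weak match: also needs web-like parameter names somewhere in the file.
        return any(p in WEB_PARAM_NAMES for p in params_in_file)
    return False
```

Under these assumptions, `process_order` is only treated as a handler when some function in the same file takes a parameter such as `req` or `res`.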
## What It Cannot Detect
- **Cross-function resource management**: Resources opened in one function and closed in another are not tracked. This is the most common source of false positives for leak detection.
- **RAII / defer / try-with-resources**: Implicit cleanup via language-level constructs (Rust's `Drop`, Go's `defer`, Java's try-with-resources, Python's `with`) is not recognized. These patterns will produce false-positive leak findings.
- **Dynamic dispatch**: If `close()` is called through a trait object or interface, it may not be recognized.
- **Authentication via type system**: Rust's type-state pattern (e.g. `AuthenticatedRequest<T>`) is not recognized as an auth check.
- **Complex authorization logic**: Only recognized function name patterns are checked.
## Common False Positives
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| RAII / Drop / defer cleanup | Implicit cleanup not visible | Known limitation; filter by severity |
| Resource returned to caller | Ownership transferred, not leaked | Known limitation |
| Framework-managed resources | Web framework manages connection lifecycle | Exclude framework-generated handlers |
| Try-with-resources (Java) | Language construct not parsed | Known limitation |
| Context manager (Python `with`) | Block construct not tracked | Known limitation |
## Common False Negatives
| Scenario | Why it's missed |
|----------|----------------|
| Resource closed in helper function | Cross-function tracking not implemented |
| Auth in middleware | Auth check happens before handler is called |
| Double-close via aliased reference | Alias analysis not performed |
## Confidence Signals
| Signal | Meaning |
|--------|---------|
| **Definite leak (state-resource-leak)** | Resource is never closed on any path — high confidence |
| **Use-after-close** | Read/write operation after explicit close — high confidence |
| **Web handler detected** | Entry point matched by function-name prefix, plus web-like parameter names for the weaker `serve_`/`process_` prefixes |
| **Possible leak (state-resource-leak-possible)** | Resource closed on some but not all paths — lower confidence |
## Tuning and Noise Controls
### Enable state analysis
```toml
[scanner]
enable_state_analysis = true
```
### Severity filtering
```bash
# Skip possible-leak findings (Low severity)
nyx scan . --severity ">=MEDIUM"
```
### Exclude test files
```toml
[scanner]
excluded_directories = ["tests", "test", "spec"]
```
## Resource Pairs
The state engine recognizes these acquire/release pairs per language:
### C/C++
| Acquire | Release | Resource |
|---------|---------|----------|
| `fopen` | `fclose` | File handle |
| `open` | `close` | File descriptor |
| `socket` | `close` | Socket |
| `malloc`, `calloc`, `realloc` | `free` | Heap memory |
| `pthread_mutex_lock` | `pthread_mutex_unlock` | Mutex |
### Rust
| Acquire | Release | Resource |
|---------|---------|----------|
| `File::open`, `File::create` | `drop`, `close` | File handle |
| `TcpStream::connect` | `shutdown` | TCP connection |
| `lock`, `read`, `write` (on Mutex/RwLock) | `drop` | Lock guard |
### Java
| Acquire | Release | Resource |
|---------|---------|----------|
| `new FileInputStream` | `close` | File stream |
| `getConnection` | `close` | DB connection |
| `new Socket` | `close` | Socket |
### Go, Python, JavaScript, Ruby, PHP
Similar patterns with language-specific function names.
## Use Patterns (Trigger use-after-close)
The following operations on a closed resource trigger `state-use-after-close`:
```
read, write, send, recv, fread, fwrite, fgets, fputs, fprintf, fscanf,
fflush, fseek, ftell, rewind, feof, ferror, fgetc, fputc, getc, putc,
ungetc, query, execute, fetch, sendto, recvfrom, ioctl, fcntl,
strcpy, strncpy, strcat, strncat, memcpy, memmove, memset, memcmp,
strcmp, strncmp, strlen, sprintf, snprintf
```
## Technical Details
### Resource Lifecycle Lattice
```
UNINIT → OPEN → CLOSED
              → MOVED
```
States are tracked as bitflags, allowing the lattice to represent uncertainty (e.g. OPEN|CLOSED means the resource is open on some paths and closed on others).
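
A minimal sketch of how bitflag states can express this uncertainty, assuming the join at a control-flow merge is a bitwise OR. The bit values and the names `ResState`, `join`, and `classify_at_exit` are hypothetical; only the state names and rule IDs come from this document.

```python
from enum import IntFlag


class ResState(IntFlag):
    # Bit values are assumptions chosen for illustration.
    UNINIT = 1
    OPEN = 2
    CLOSED = 4
    MOVED = 8


def join(a, b):
    """Merge the states reaching a node from two paths (union of possibilities)."""
    return a | b


def classify_at_exit(state):
    """Map the merged state at a function exit to a leak rule ID, if any."""
    if state == ResState.OPEN:
        return "state-resource-leak"  # open on every path
    if ResState.OPEN in state:
        return "state-resource-leak-possible"  # open on some paths only
    return None
```

With these definitions, `join(ResState.OPEN, ResState.CLOSED)` yields `OPEN|CLOSED`, which classifies as a possible leak rather than a definite one.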
### Leak Detection Scope
Resource leaks are checked at the file-level exit node and the **synthesized** function exit node (a single Return node that all early returns feed into). Early-return nodes are **not** checked individually — only the merged state at the function's synthesized exit is inspected. This prevents duplicate findings where an early-return path reports a definite leak while the merged exit correctly reports a possible leak.
This per-function exit inspection ensures that a variable leaked inside one function is not masked by a same-named variable that is properly closed in a subsequent function.
### Auth Level Lattice
```
Unauthed < Authed < Admin
```
Join semantics: take the minimum (conservative). If any path is unauthenticated, the result is unauthenticated.
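
The minimum-based join can be illustrated with an ordered enum. This is a sketch under the ordering shown above; the names `AuthLevel` and `join_auth` are hypothetical and not taken from Nyx's source.

```python
from enum import IntEnum


class AuthLevel(IntEnum):
    UNAUTHED = 0
    AUTHED = 1
    ADMIN = 2


def join_auth(*levels):
    """Conservative join: the weakest level on any incoming path wins."""
    return min(levels)
```

For example, merging an `ADMIN` path with an `UNAUTHED` path yields `UNAUTHED`, so a privileged sink after the merge is still treated as reachable without authentication.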

202
docs/detectors/taint.md Normal file

@ -0,0 +1,202 @@
# Taint Analysis
## Summary
Nyx's taint analysis tracks the flow of untrusted data from **sources** (where data enters the program) through **assignments and function calls** to **sinks** (where dangerous operations happen). If the data reaches a sink without passing through a **sanitizer** with matching capabilities, a finding is emitted.
The engine uses a monotone forward dataflow analysis over a finite lattice with guaranteed termination. Analysis is **intra-procedural with cross-file function summaries** — it does not follow calls into other functions but uses pre-computed summaries of their behavior.
## Rule ID
```
taint-unsanitised-flow (source <line>:<col>)
```
One rule ID covers all taint findings. The parenthetical identifies the specific source location.
## What It Detects
- Environment variables flowing to shell execution (`env::var` → `Command::new`)
- User input flowing to code evaluation (`req.body` → `eval()`)
- File contents flowing to SQL queries (`fs::read_to_string` → `db.execute()`)
- Request parameters flowing to HTML output (`req.query` → `innerHTML`)
- Any source-to-sink flow where the sink's required capability is not stripped by a sanitizer
## What It Cannot Detect
- **Inter-procedural flows without summaries**: If a function isn't summarized (e.g. from a third-party library without source), the taint engine cannot track data through it. Unknown callees are treated as opaque, neither propagating nor sanitizing taint, so flows through them are missed rather than over-reported.
- **Flows through data structures**: Taint is tracked per-variable, not per-field. `obj.field = tainted; sink(obj.other_field)` may produce a false positive because taint attaches to `obj` as a whole.
- **Aliasing**: `let y = &x; sink(*y)` — the engine tracks `y` as a fresh variable, not an alias of `x`. This can cause false negatives.
- **Complex control flow**: The analysis is flow-sensitive (respects control flow within a function) but does not track taint through arbitrary loops with complex exit conditions.
- **Implicit flows**: Taint only follows explicit data flow, not information flow through branching (e.g. `if (secret) { x = 1 } else { x = 0 }` does not taint `x`).
## Common False Positives
| Scenario | Why it happens | Mitigation |
|----------|---------------|------------|
| Custom sanitizer not recognized | Nyx only knows built-in and configured sanitizers | Add a custom sanitizer rule in config |
| Taint through struct fields | Variable-level (not field-level) tracking | No current mitigation; field sensitivity is planned |
| Dead code paths | The engine is path-insensitive within a function (it considers all paths) | Contradiction pruning catches some cases; path-validated findings score lower |
| Library wrappers | A wrapper that sanitizes its input before calling a dangerous function is not recognized as a sanitizer, so the flow is still reported | Summarize the wrapper function or add it as a sanitizer |
## Common False Negatives
| Scenario | Why it's missed |
|----------|----------------|
| Third-party library calls | No summary available; callee treated as opaque |
| Taint through global/static variables | Not tracked across function boundaries |
| Taint through closures/callbacks in some languages | Closure capture analysis is limited (JS/TS/Ruby/Go anonymous functions ARE analyzed) |
| Flows spanning more than two files | Summary approximation loses precision at depth |
## Confidence Signals
These signals in the output indicate higher-confidence findings:
| Signal | What it means |
|--------|--------------|
| **Evidence: Source + Sink** | Both endpoints identified with specific function names and locations |
| **Source kind = user input** | Source is directly controllable by an attacker (req.body, argv, etc.) |
| **path_validated = false** | No validation guard on the path — higher exploitability |
| **No guard_kind** | No dominating predicate check (null check, error check, etc.) |
| **High rank_score** | Multiple confidence signals combined |
Lower-confidence:
| Signal | What it means |
|--------|--------------|
| **path_validated = true** | A validation predicate guards the path — may not be exploitable |
| **guard_kind = "ValidationCall"** | An explicit validation function was called before the sink |
| **Source kind = database** | Data from DB — may already be validated at insertion time |
## Tuning and Noise Controls
### Add custom sanitizers
If your codebase has a custom sanitizer that Nyx doesn't recognize:
```toml
# nyx.local
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml", "sanitizeInput"]
kind = "sanitizer"
cap = "html_escape"
```
Or via CLI:
```bash
nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
```
### Filter by severity
```bash
nyx scan . --severity HIGH # Only high-severity taint findings
nyx scan . --severity ">=MEDIUM" # Skip low-severity
```
### Skip non-production code
By default, findings in `tests/`, `vendor/`, `build/` paths are downgraded one severity tier. To exclude them entirely, add to config:
```toml
[scanner]
excluded_directories = ["tests", "vendor", "build", "examples"]
```
### Disable taint (AST-only mode)
```bash
nyx scan . --mode ast
```
## Example
**Vulnerable code** (Rust):
```rust
use std::env;
use std::process::Command;

fn main() {
    let cmd = env::var("USER_CMD").unwrap();         // line 5: source
    Command::new("sh").arg("-c").arg(&cmd).output(); // line 6: sink
}
```
**Finding**:
```
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
Source: env::var("USER_CMD") at 5:15
Sink: Command::new("sh").arg("-c")
Score: 76
```
**Safe alternative**:
```rust
use std::env;
use std::process::Command;

fn main() {
    let cmd = env::var("USER_CMD").unwrap();
    // Run the value directly as the program, without a shell, so shell
    // metacharacters are never interpreted
    Command::new(&cmd).output();
    // The program name is still externally controlled: validate it against an allowlist
}
```
## Technical Details
### Capability System
Taint uses a bitflag capability system to match sources with appropriate sanitizers and sinks:
| Capability | Bit | Sources | Sanitizers | Sinks |
|-----------|-----|---------|------------|-------|
| `ENV_VAR` | 0x01 | `env::var`, `getenv` | — | — |
| `HTML_ESCAPE` | 0x02 | — | `html_escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
| `SHELL_ESCAPE` | 0x04 | — | `shell_escape` | `Command::new`, `system()`, `eval()` |
| `URL_ENCODE` | 0x08 | — | `encodeURIComponent` | `location.href` |
| `JSON_PARSE` | 0x10 | — | `JSON.parse` | — |
| `FILE_IO` | 0x20 | — | `filepath.Clean`, `basename`, `os.path.realpath` | `fopen`, `open`, `send_file`, `fs::read_to_string` |
| `FMT_STRING` | 0x40 | — | — | `printf(var)` |
Sources typically use `Cap::all()` to match any sink. A sanitizer strips specific capability bits. A finding fires when a tainted variable reaches a sink and the taint still has the matching capability bit set.
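
The interaction between sources, sanitizers, and sinks can be sketched with plain bit operations. The bit values below are copied from the table above; the helper names `sanitize` and `sink_fires`, and the use of Python's `IntFlag`, are assumptions for illustration rather than Nyx's Rust API.

```python
from enum import IntFlag


class Cap(IntFlag):
    ENV_VAR = 0x01
    HTML_ESCAPE = 0x02
    SHELL_ESCAPE = 0x04
    URL_ENCODE = 0x08
    JSON_PARSE = 0x10
    FILE_IO = 0x20
    FMT_STRING = 0x40


ALL_CAPS = Cap(0x7F)  # stand-in for Cap::all(): every bit set


def sanitize(taint, stripped):
    """A sanitizer clears only the capability bits it covers."""
    return taint & ~stripped


def sink_fires(taint, required):
    """A sink reports when the taint still carries its required bit."""
    return bool(taint & required)
```

Under this model, data passed through an HTML escaper no longer triggers an `innerHTML` sink but still triggers a shell sink, because the `SHELL_ESCAPE` bit was never cleared.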
### Nested Function Analysis
The CFG builder recursively discovers function expressions nested inside call arguments:
- **JavaScript/TypeScript**: `function_expression`, `arrow_function` inside call arguments (e.g., Express route handlers)
- **Ruby**: `do_block` and `block` nodes (e.g., Sinatra `get '/path' do...end`)
- **Go**: `func_literal` (anonymous function literals)
Each nested function is walked as a separate scope and receives a unique identifier (`<anon@{byte_offset}>`) to prevent collisions when multiple anonymous functions exist in the same file.
### Chained Call Classification
Method chains like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments between `.` separators. The classifier matches against both the original text and the normalized form, enabling rules like `r.URL` to match within `r.URL.Query.Get`.
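
A rough sketch of this normalization, assuming a regex that removes any non-nested `(...)` segment followed by another `.member`. The function names and the substring-based matching are hypothetical simplifications, and nested parentheses inside an internal call are not handled.

```python
import re


def normalize_chain(text):
    """Strip internal call segments, e.g. 'Query()' -> 'Query', keeping the final call."""
    return re.sub(r"\([^()]*\)(?=\.)", "", text)


def rule_matches(rule, call_text):
    """Match a rule against both the original and the normalized call text."""
    return rule in call_text or rule in normalize_chain(call_text)
```

A hypothetical rule such as `r.URL.Query.Get` does not appear in the original text `r.URL.Query().Get("host")`, but it does appear in the normalized form.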
### Nested Call Fallback
When the outermost call in an expression doesn't classify as a source/sink, the engine tries all nested inner calls. This handles patterns like `str(eval(expr))` where `str` is not a sink but the inner `eval` is.
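
The fallback can be illustrated with Python's own `ast` module standing in for the tree-sitter parse that Nyx uses. The `SINKS` table and the function names are hypothetical; the sketch only shows the "outermost first, then inner calls" order.

```python
import ast

SINKS = {"eval", "exec"}  # hypothetical label table


def call_name(node):
    """Return the simple name of a call's callee, if it has one."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def classify_expression(source):
    """Return the first sink name found, trying the outermost call first."""
    outer = ast.parse(source, mode="eval").body
    if isinstance(outer, ast.Call) and call_name(outer) in SINKS:
        return call_name(outer)
    # Fallback: the outermost call did not classify, so try the nested calls.
    for node in ast.walk(outer):
        if isinstance(node, ast.Call) and node is not outer and call_name(node) in SINKS:
            return call_name(node)
    return None
```

For `str(eval(expr))`, the outer `str` call is not a sink, so the walk reaches the inner `eval` and reports it.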
### Rust `if let` / `while let` Pattern Bindings
The CFG builder recognizes Rust `let_condition` nodes inside `if` and `while` expressions. The value expression is classified for source/sink labels, and the pattern binding is extracted as a variable definition:
```rust
if let Ok(cmd) = env::var("CMD") {
    // cmd is tainted: env::var is a source, cmd is the binding
    Command::new("sh").arg("-c").arg(&cmd).output(); // taint-unsanitised-flow
}
```
This also works for `while let` patterns.
### JS/TS Two-Level Solve
For JavaScript and TypeScript, taint analysis uses a two-level approach:
1. **Level 1**: Solve top-level code (module scope)
2. **Level 2**: Solve each function seeded with the converged top-level state
This prevents false positives from cross-function taint leakage while preserving global-to-function flows.

32
docs/index.md Normal file

@ -0,0 +1,32 @@
# Nyx Documentation
Welcome to the Nyx documentation. Nyx is a multi-language static vulnerability scanner built in Rust.
## User Guide
- [Installation](installation.md) — Install via cargo, prebuilt binaries, or from source
- [Quick Start](quickstart.md) — Your first scan in 60 seconds
- [CLI Reference](cli.md) — Every flag, subcommand, and option
- [Configuration](configuration.md) — Config file schema, precedence, custom rules
- [Output Formats](output.md) — Console, JSON, SARIF; exit codes; evidence fields
## Detector Reference
- [Detector Overview](detectors.md) — How the four detector families work together
- [Taint Analysis](detectors/taint.md) — Cross-file source-to-sink dataflow tracking
- [CFG Structural Analysis](detectors/cfg.md) — Auth gaps, unguarded sinks, resource leaks
- [State Model Analysis](detectors/state.md) — Resource lifecycle and authentication state
- [AST Patterns](detectors/patterns.md) — Tree-sitter structural pattern matching
## Rule Reference
- [Rule Index](rules/index.md) — How rules are organized
- [Rust](rules/rust.md) | [C](rules/c.md) | [C++](rules/cpp.md) | [Java](rules/java.md) | [Go](rules/go.md)
- [JavaScript](rules/javascript.md) | [TypeScript](rules/typescript.md) | [Python](rules/python.md)
- [PHP](rules/php.md) | [Ruby](rules/ruby.md)
## Contributing
- [Contributing Guide](../CONTRIBUTING.md) — Development setup, adding rules, PR guidelines
- [Security Policy](../SECURITY.md) — Responsible disclosure
- [Code of Conduct](../CODE_OF_CONDUCT.md)

76
docs/installation.md Normal file

@ -0,0 +1,76 @@
# Installation
## Install from crates.io
```bash
cargo install nyx-scanner
```
This installs the `nyx` binary into `~/.cargo/bin/`.
## Install from GitHub releases
1. Go to the [Releases](https://github.com/elicpeter/nyx/releases) page.
2. Download the binary for your platform:
| Platform | Archive |
|----------|---------|
| Linux x86_64 | `nyx-x86_64-unknown-linux-gnu.zip` |
| macOS Intel | `nyx-x86_64-apple-darwin.zip` |
| macOS Apple Silicon | `nyx-aarch64-apple-darwin.zip` |
| Windows x86_64 | `nyx-x86_64-pc-windows-msvc.zip` |
3. Extract and install:
```bash
# Linux / macOS
unzip nyx-*.zip
chmod +x nyx
sudo mv nyx /usr/local/bin/
# Windows (PowerShell)
Expand-Archive -Path nyx-*.zip -DestinationPath .
Move-Item -Path .\nyx.exe -Destination "C:\Program Files\Nyx\"
```
4. Verify:
```bash
nyx --version
```
## Build from source
```bash
git clone https://github.com/elicpeter/nyx.git
cd nyx
cargo build --release
cargo install --path .
```
Requires **Rust 1.85+** (edition 2024).
## CI Integration
### GitHub Actions
```yaml
- name: Install Nyx
  run: cargo install nyx-scanner
- name: Run security scan
  run: nyx scan . --format sarif --fail-on medium > results.sarif
- name: Upload SARIF
  if: always()  # still upload when --fail-on makes the scan step exit 1
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```
### Generic CI
```bash
# Fail the build if any High or Medium finding is detected
nyx scan . --severity ">=MEDIUM" --fail-on medium --quiet --format json
```
The `--fail-on` flag causes Nyx to exit with code **1** if any finding meets or exceeds the given severity. Exit code **0** means no findings matched.

315
docs/output.md Normal file

@ -0,0 +1,315 @@
# Output Formats
Nyx supports three output formats, selected with `--format` or `output.default_format` in config.
## Console (default)
Human-readable, color-coded output to stdout. Status messages go to stderr.
```
[HIGH] taint-unsanitised-flow (source 5:11) src/handler.rs:12:5 (Score: 76, Confidence: High)
Source: env::var("CMD") → Command::new("sh").arg("-c")
[MEDIUM] cfg-unguarded-sink src/handler.rs:12:5 (Score: 35, Confidence: Medium)
[LOW] rs.quality.unwrap src/lib.rs:88:5 (Score: 10, Confidence: High)
```
### Severity indicators
| Tag | Color | Meaning |
|-----|-------|---------|
| `[HIGH]` | Red, bold | Critical — likely exploitable |
| `[MEDIUM]` | Orange, bold | Important — may be exploitable |
| `[LOW]` | Muted blue-gray | Informational — code quality or weak signal |
### Evidence fields
Taint and state findings include structured evidence:
| Label | Meaning |
|-------|---------|
| **Source** | Where tainted data originated (function name + location) |
| **Sink** | Where the dangerous operation happens |
| **Path guard** | Type of validation predicate protecting the path |
### Score
When attack-surface ranking is enabled (default), each finding shows a `Score` value. Higher scores indicate greater exploitability. See [Detector Overview](detectors.md) for the scoring formula.
### Rollup findings
High-frequency LOW Quality findings (e.g. `rs.quality.unwrap`) are grouped into rollup findings by `(file, rule)`:
```
21:10 ● [LOW] rs.quality.unwrap
rs.quality.unwrap (38 occurrences)
Examples: 21:10, 50:10, 79:10, 105:10, 134:10
Run: nyx scan --show-instances rs.quality.unwrap
```
Rollups count as **one finding** for LOW budget enforcement. Use `--show-instances <RULE>` to expand a specific rule or `--all` to disable rollups entirely.
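
Grouping by `(file, rule)` can be sketched as follows. This is an assumption-laden illustration: the function name `rollup` and the `max_examples` parameter are hypothetical, the output shape mirrors the `rollup` object in the JSON format, and the sketch groups every finding it is given rather than only high-frequency LOW Quality ones.

```python
from collections import defaultdict


def rollup(findings, max_examples=5):
    """Group findings by (path, rule ID) and keep a few example locations."""
    groups = defaultdict(list)
    for f in findings:
        groups[(f["path"], f["id"])].append(f)

    rolled = []
    for (path, rule), items in groups.items():
        items.sort(key=lambda f: (f["line"], f["col"]))
        first = items[0]
        rolled.append({
            "path": path,
            "id": rule,
            "line": first["line"],  # report at the first occurrence
            "col": first["col"],
            "rollup": {
                "count": len(items),
                "occurrences": [
                    {"line": f["line"], "col": f["col"]} for f in items[:max_examples]
                ],
            },
        })
    return rolled
```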
### Suppression footer
When findings are suppressed by the prioritization pipeline, a footer is shown:
```
Suppressed 195 LOW/Quality findings.
Active filters:
include_quality = false
max_low = 20
max_low_per_file = 1
max_low_per_rule = 10
Use --include-quality, --max-low, or --all to adjust.
```
---
## JSON
Machine-readable JSON array. Each finding is an object:
```json
[
  {
    "path": "src/handler.rs",
    "line": 12,
    "col": 5,
    "severity": "High",
    "id": "taint-unsanitised-flow (source 5:11)",
    "category": "Security",
    "labels": [
      ["Source", "env::var(\"CMD\") at 5:11"],
      ["Sink", "Command::new(\"sh\").arg(\"-c\")"]
    ],
    "confidence": "High",
    "evidence": {
      "source": {
        "path": "src/handler.rs",
        "line": 5,
        "col": 11,
        "kind": "source",
        "snippet": "env::var(\"CMD\")"
      },
      "sink": {
        "path": "src/handler.rs",
        "line": 12,
        "col": 5,
        "kind": "sink",
        "snippet": "Command::new(\"sh\")"
      },
      "notes": ["source_kind:EnvironmentConfig"]
    },
    "rank_score": 76.0,
    "rank_reason": [
      ["severity_base", "60"],
      ["analysis_kind", "10"],
      ["source_kind", "5"],
      ["evidence_count", "1"]
    ]
  }
]
```
### Field descriptions
| Field | Type | Always present | Description |
|-------|------|----------------|-------------|
| `path` | string | yes | File path relative to scan root |
| `line` | int | yes | 1-indexed line number |
| `col` | int | yes | 1-indexed column number |
| `severity` | string | yes | `"High"`, `"Medium"`, or `"Low"` |
| `id` | string | yes | Rule ID |
| `category` | string | yes | Finding category: `"Security"`, `"Reliability"`, or `"Quality"` |
| `path_validated` | bool | no | True if guarded by validation predicate |
| `guard_kind` | string | no | Predicate type (e.g. `"NullCheck"`, `"ValidationCall"`) |
| `message` | string | no | Human-readable context (state analysis findings) |
| `labels` | array | no | Array of `[label, value]` pairs for console display |
| `confidence` | string | no | Confidence level: `"Low"`, `"Medium"`, or `"High"` |
| `evidence` | object | no | Structured evidence (source/sink spans, state, notes) |
| `rank_score` | float | no | Attack-surface score (omitted when ranking disabled) |
| `rank_reason` | array | no | Score breakdown (omitted when ranking disabled) |
| `rollup` | object | no | Rollup data when findings are grouped (see below) |
Fields marked "no" are omitted when empty/null/false to keep output compact.
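
Because optional fields may be absent, consumers should read them with a default. The following is a minimal sketch of a consumer script, assuming the field names from the table above; the function name `summarize` is hypothetical.

```python
import json
from collections import Counter


def summarize(findings_json):
    """Count findings per severity and list those with no validation guard."""
    findings = json.loads(findings_json)
    by_severity = Counter(f["severity"] for f in findings)  # always present
    # path_validated is optional: a missing key means "not validated".
    unguarded = [f for f in findings if not f.get("path_validated", False)]
    return by_severity, unguarded
```

The input would typically be the contents of a file produced with `nyx scan . --format json > findings.json`.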
### Confidence levels
| Level | Meaning |
|-------|---------|
| `High` | Strong signal — taint-confirmed flow, definite state violation |
| `Medium` | Moderate signal — resource leak, path-validated taint, CFG structural |
| `Low` | Weak signal — AST pattern match, possible resource leak, degraded analysis |
### Evidence object
The `evidence` field provides structured provenance data:
| Field | Type | Description |
|-------|------|-------------|
| `source` | object | Source span (path, line, col, kind, snippet) |
| `sink` | object | Sink span (path, line, col, kind, snippet) |
| `guards` | array | Validation guard spans |
| `sanitizers` | array | Sanitizer spans |
| `state` | object | State-machine evidence (machine, subject, from_state, to_state) |
| `notes` | array | Free-form notes (e.g. `"source_kind:UserInput"`, `"path_validated"`) |
All fields are omitted when empty/null.
### Rollup object
When a finding is a rollup (grouped from multiple occurrences), the `rollup` field is present:
```json
{
  "rollup": {
    "count": 38,
    "occurrences": [
      { "line": 21, "col": 10 },
      { "line": 50, "col": 10 },
      { "line": 79, "col": 10 }
    ]
  }
}
```
| Field | Type | Description |
|-------|------|-------------|
| `count` | int | Total number of occurrences |
| `occurrences` | array | First N example locations (controlled by `rollup_examples`) |
---
## SARIF (Static Analysis Results Interchange Format)
SARIF 2.1.0 JSON, suitable for GitHub Code Scanning and other SARIF-compatible tools.
```bash
nyx scan . --format sarif > results.sarif
```
The SARIF output includes:
- **Tool metadata** — Nyx name and version
- **Rules** — Rule ID, description, severity mapping
- **Results** — One result per finding with location, message, and properties
- **Properties** — Each result includes `category` and optionally `confidence` and `rollup.count`
- **Related locations** — Rollup findings include example locations in `relatedLocations`
- **Artifacts** — File paths referenced by findings
### GitHub Code Scanning integration
```yaml
- name: Run Nyx
  run: nyx scan . --format sarif > results.sarif
- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```
---
## Exit Codes
| Code | Meaning |
|------|---------|
| `0` | Scan completed successfully; no findings matched `--fail-on` threshold |
| `1` | `--fail-on` threshold breached (at least one finding meets or exceeds the specified severity) |
| Other non-zero | Error (I/O, config, database, parse error) |
Without `--fail-on`, Nyx always exits `0` on a successful scan regardless of findings count.
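
The threshold behavior for a successful scan can be summarized in a few lines. This is a sketch of the documented semantics, not Nyx's code: the function name `exit_code` and the severity ordering table are assumptions, and error exits are out of scope.

```python
SEVERITY_ORDER = {"Low": 0, "Medium": 1, "High": 2}


def exit_code(findings, fail_on=None):
    """Return 1 if any unsuppressed finding meets the --fail-on threshold, else 0."""
    if fail_on is None:
        return 0  # without --fail-on, a successful scan always exits 0
    threshold = SEVERITY_ORDER[fail_on.capitalize()]
    for f in findings:
        if f.get("suppressed"):
            continue  # suppressed findings never trigger --fail-on
        if SEVERITY_ORDER[f["severity"]] >= threshold:
            return 1
    return 0
```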
---
## Severity Levels
| Level | Description | Typical rules |
|-------|-------------|---------------|
| **High** | Critical vulnerabilities — likely exploitable | Command injection, unsafe deserialization, banned C functions, taint-confirmed flows with user input sources |
| **Medium** | Important issues — may be exploitable with additional context | SQL concatenation, XSS sinks, reflection, unguarded sinks, resource leaks |
| **Low** | Informational — code quality or weak signals | Weak crypto algorithms, insecure randomness, `unwrap()`/`panic!()`, type-safety escapes |
### Non-production severity downgrade
By default, findings in paths matching common non-production patterns (`tests/`, `test/`, `vendor/`, `build/`, `examples/`, `benchmarks/`) are downgraded by one tier:
- High → Medium
- Medium → Low
- Low → Low (unchanged)
Use `--keep-nonprod-severity` to disable this behavior.
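
The downgrade rule amounts to a one-step lookup. The sketch below assumes a path counts as non-production when any of its directory components matches one of the listed names; the actual path matching in Nyx may differ, and the function name `effective_severity` is hypothetical.

```python
NONPROD_DIRS = {"tests", "test", "vendor", "build", "examples", "benchmarks"}
DOWNGRADE = {"High": "Medium", "Medium": "Low", "Low": "Low"}


def effective_severity(path, severity, keep_nonprod_severity=False):
    """Apply the one-tier downgrade to findings under non-production directories."""
    parts = path.replace("\\", "/").split("/")
    in_nonprod = any(part in NONPROD_DIRS for part in parts[:-1])  # directories only
    if in_nonprod and not keep_nonprod_severity:
        return DOWNGRADE[severity]
    return severity
```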
---
## Inline Suppressions
Suppress specific findings directly in source code using `nyx:ignore` comments. Suppressed findings are excluded from output, severity counts, and `--fail-on` checks by default.
### Comment syntax
| Language | Comment styles |
|----------|---------------|
| Rust, C, C++, Java, Go, JS, TS | `// nyx:ignore ...` or `/* nyx:ignore ... */` |
| Python, Ruby | `# nyx:ignore ...` |
| PHP | `// nyx:ignore ...`, `# nyx:ignore ...`, or `/* nyx:ignore ... */` |
### Directive forms
```python
# Same-line form: suppresses the finding on this line
x = dangerous()  # nyx:ignore taint-unsanitised-flow

# Next-line form: suppresses the finding on the line after the directive
# nyx:ignore-next-line taint-unsanitised-flow
x = dangerous()
```
- `nyx:ignore <RULE_ID>` — suppresses findings on the **same line** as the comment.
- `nyx:ignore-next-line <RULE_ID>` — suppresses findings on the **next line**.
- For taint findings, the primary line is the **sink line** (the `line` field in output).
### Rule ID matching
- **Case-sensitive**, exact match after canonicalization.
- Comma-separated: `nyx:ignore rule-a, rule-b`
- Wildcard suffix: `nyx:ignore rs.quality.*` matches any ID starting with `rs.quality.`
- Taint IDs are canonicalized: `nyx:ignore taint-unsanitised-flow` matches `taint-unsanitised-flow (source 5:1)` (parenthetical suffix stripped).
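
The matching rules above can be sketched as follows. This is an illustrative approximation: the function names and the regex used to strip the parenthetical suffix are assumptions, not Nyx's actual parser.

```python
import re


def canonical_id(rule_id):
    """Strip a trailing parenthetical such as ' (source 5:1)'."""
    return re.sub(r"\s*\(.*\)\s*$", "", rule_id)


def directive_matches(directive_rules, finding_id):
    """Check a comma-separated directive rule list against a finding's rule ID."""
    target = canonical_id(finding_id)
    for pattern in (p.strip() for p in directive_rules.split(",")):
        if pattern.endswith("*"):
            if target.startswith(pattern[:-1]):  # wildcard suffix
                return True
        elif pattern == target:  # case-sensitive exact match
            return True
    return False
```

Under these assumptions, `rs.quality.*` matches `rs.quality.unwrap`, while `RS.QUALITY.*` and the bare prefix `rs.quality` do not.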
### Console behavior
- **Default**: suppressed findings are hidden entirely.
- **`--show-suppressed`**: suppressed findings appear dimmed with a `[SUPPRESSED]` tag. The summary shows `"N issues (M suppressed)"`.
### JSON / SARIF behavior
- **Default**: suppressed findings are excluded from JSON/SARIF output.
- **`--show-suppressed`**: suppressed findings are included with additional fields:
```json
{
  "suppressed": true,
  "suppression": {
    "kind": "SameLine",
    "matched_pattern": "taint-unsanitised-flow",
    "directive_line": 42
  }
}
```
### Exit code
Suppressed findings do **not** trigger `--fail-on`. A scan with only suppressed findings exits `0`.
---
## Rule ID Format
| Prefix | Detector | Example |
|--------|----------|---------|
| `taint-*` | Taint analysis | `taint-unsanitised-flow (source 5:11)` |
| `cfg-*` | CFG structural | `cfg-unguarded-sink`, `cfg-auth-gap` |
| `state-*` | State model | `state-use-after-close`, `state-resource-leak` |
| `<lang>.*.*` | AST patterns | `rs.memory.transmute`, `js.code_exec.eval` |
See the [Rule Reference](rules/index.md) for a complete listing.

103
docs/quickstart.md Normal file

@ -0,0 +1,103 @@
# Quick Start
## Your first scan
```bash
# Scan the current directory
nyx scan
# Scan a specific path
nyx scan ./my-project
```
Nyx automatically creates an SQLite index on first run. Subsequent scans skip unchanged files.
## Understanding the output
A typical console output looks like:
```
[HIGH] taint-unsanitised-flow (source 5:11) src/handler.rs:12:5
Source: env::var("CMD") at 5:11
Sink: Command::new("sh").arg("-c")
Score: 76
[MEDIUM] cfg-unguarded-sink src/handler.rs:12:5
Score: 35
[MEDIUM] rs.quality.unsafe_block src/lib.rs:44:5
Score: 30
```
Each finding shows:
| Field | Meaning |
|-------|---------|
| **Severity tag** | `[HIGH]`, `[MEDIUM]`, or `[LOW]` |
| **Rule ID** | Identifies the detector and specific rule |
| **Location** | `file:line:col` |
| **Evidence** | Source, Sink, and guard details (taint findings only) |
| **Score** | Attack-surface ranking score (higher = more exploitable) |
## Common workflows
### CI gate — fail on high-severity findings
```bash
nyx scan . --fail-on high --quiet
# Exit code 1 if any HIGH finding exists, 0 otherwise
```
### Export for tooling
```bash
# JSON for scripting
nyx scan . --format json > findings.json
# SARIF for GitHub Code Scanning
nyx scan . --format sarif > results.sarif
```
### Fast structural scan (no dataflow)
```bash
nyx scan . --mode ast
```
AST-only mode runs tree-sitter pattern queries without building CFGs or running taint analysis. Much faster, but misses dataflow vulnerabilities.
### Filter by severity
```bash
# Only high-severity
nyx scan . --severity HIGH
# High and medium
nyx scan . --severity ">=MEDIUM"
# Specific set
nyx scan . --severity "HIGH,MEDIUM"
```
### Skip the index
```bash
nyx scan . --index off
```
Useful for one-off scans or when you don't want to write to disk.
### Keep original severity in non-production paths
By default, findings in test/vendor/build paths are downgraded one severity tier. To keep original severity:
```bash
nyx scan . --keep-nonprod-severity
```
## Next steps
- [CLI Reference](cli.md) — All flags and options
- [Configuration](configuration.md) — Customize rules, exclusions, and behavior
- [Detector Overview](detectors.md) — How the analysis engines work
- [Rule Reference](rules/index.md) — Browse all rules by language

89
docs/rules/c.md Normal file

@ -0,0 +1,89 @@
# C Rules
Nyx detects C vulnerabilities through AST patterns (banned functions, format strings) and taint analysis (user input → shell execution, buffer overflow sinks).
## Taint Sources
| Function | Capability | Source Kind |
|----------|-----------|-------------|
| `getenv` | `all` | EnvironmentConfig |
| `fgets`, `scanf`, `fscanf`, `gets`, `read` | `all` | UserInput |
## Taint Sinks
| Function | Required Capability |
|----------|-------------------|
| `system`, `popen`, `exec*` family | `SHELL_ESCAPE` |
| `sprintf`, `strcpy`, `strcat` | `HTML_ESCAPE` |
| `printf`, `fprintf` | `FMT_STRING` |
| `fopen`, `open` | `FILE_IO` |
---
## AST Pattern Rules
### Memory Safety (Banned Functions)
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `c.memory.gets` | High | A | `gets()` — no bounds checking, always exploitable |
| `c.memory.strcpy` | High | A | `strcpy()` — no bounds checking on destination buffer |
| `c.memory.strcat` | High | A | `strcat()` — no bounds checking on destination buffer |
| `c.memory.sprintf` | High | A | `sprintf()` — no length limit on output buffer |
| `c.memory.scanf_percent_s` | High | A | `scanf("%s")` — unbounded string read |
| `c.memory.printf_no_fmt` | High | B | `printf(var)` — format-string vulnerability (non-literal first arg) |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `c.cmdi.system` | High | A | `system()` — shell command execution |
| `c.cmdi.popen` | Medium | A | `popen()` — shell command execution with pipe |
---
## Examples
### `c.memory.gets` — Banned function
**Vulnerable:**
```c
char buf[64];
gets(buf); // No bounds checking — buffer overflow
```
**Safe alternative:**
```c
char buf[64];
fgets(buf, sizeof(buf), stdin);
```
### `c.memory.printf_no_fmt` — Format string
**Vulnerable:**
```c
char *user_input = get_input();
printf(user_input); // Format string vulnerability
```
**Safe alternative:**
```c
char *user_input = get_input();
printf("%s", user_input);
```
### `c.cmdi.system` — Shell execution
**Vulnerable:**
```c
char cmd[256];
snprintf(cmd, sizeof(cmd), "ls %s", user_dir);
system(cmd); // Command injection if user_dir contains shell metacharacters
```
**Safe alternative:**
```c
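// Use execvp with an explicit argument array (no shell is involved).
// "--" stops ls from treating user_dir as an option.
char *args[] = {"ls", "--", user_dir, NULL};
execvp("ls", args);  // Replaces the current process on success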
```

66
docs/rules/cpp.md Normal file

@ -0,0 +1,66 @@
# C++ Rules
C++ rules inherit C banned-function concerns and add C++-specific patterns like dangerous casts.
## Taint Labels
C++ shares taint labels with C. See [C Rules](c.md) for the full source/sink/sanitizer listing.
---
## AST Pattern Rules
### Memory Safety
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `cpp.memory.gets` | High | A | `gets()` — no bounds checking, always exploitable |
| `cpp.memory.strcpy` | High | A | `strcpy()` — no bounds checking on destination |
| `cpp.memory.strcat` | High | A | `strcat()` — no bounds checking on destination |
| `cpp.memory.sprintf` | High | A | `sprintf()` — no length limit on output |
| `cpp.memory.reinterpret_cast` | Medium | A | `reinterpret_cast` — type-punning cast |
| `cpp.memory.const_cast` | Medium | A | `const_cast` — removes const/volatile qualifier |
| `cpp.memory.printf_no_fmt` | High | B | `printf(var)` — format-string vulnerability |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `cpp.cmdi.system` | High | A | `system()` — shell command execution |
| `cpp.cmdi.popen` | High | A | `popen()` — shell command execution |
---
## Examples
### `cpp.memory.reinterpret_cast` — Type-punning cast
**Flagged:**
```cpp
int x = 42;
float* fp = reinterpret_cast<float*>(&x); // Type-punning, may violate strict aliasing
```
**Safe alternative:**
```cpp
#include <cstring>  // std::memcpy

int x = 42;
float f;
std::memcpy(&f, &x, sizeof(f)); // Well-defined type punning (assumes sizeof(float) == sizeof(int))
```
### `cpp.memory.const_cast` — Removing const
**Flagged:**
```cpp
void process(const std::string& s) {
    char* p = const_cast<char*>(s.c_str()); // Removes const
    p[0] = 'X'; // Undefined behavior
}
```
**Safe alternative:**
```cpp
void process(std::string s) { // Take by value
    s[0] = 'X';
}
```

148
docs/rules/go.md Normal file

@ -0,0 +1,148 @@
# Go Rules
Nyx detects Go vulnerabilities through AST patterns and taint analysis, covering command execution, unsafe pointer usage, TLS misconfiguration, weak crypto, SQL injection, hardcoded secrets, and deserialization.
## Taint Labels
Go has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/go.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `os.Getenv` | all |
| `http.Request`, `r.FormValue`, `r.URL`, `r.Body`, `r.Header` | all |
| `r.URL.Query`, `r.URL.Query.Get`, `Request.FormValue`, `Request.URL` | all |
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `html.EscapeString`, `template.HTMLEscapeString` | HTML_ESCAPE |
| `url.QueryEscape`, `url.PathEscape` | URL_ENCODE |
| `filepath.Clean`, `filepath.Base` | FILE_IO |
### Sinks
| Matcher | Cap |
|---------|-----|
| `exec.Command` | SHELL_ESCAPE |
| `db.Query`, `db.Exec`, `db.QueryRow`, `db.Prepare` | SHELL_ESCAPE |
| `fmt.Fprintf`, `fmt.Sprintf`, `fmt.Printf` | FMT_STRING |
| `os.Open`, `os.OpenFile`, `os.Create`, `ioutil.ReadFile`, `os.ReadFile` | FILE_IO |
| `template.HTML` | HTML_ESCAPE |
> **Note:** Chained calls like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments before matching, so `r.URL.Query.Get` matches the source rule.
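
The normalization described in the note can be sketched as follows. This is a rough illustration rather than the code in `src/labels/go.rs`: `normalize_callee` and `is_source` are hypothetical names, the matcher set is a small subset of the table above, and the sketch does not handle parentheses inside string literals.

```python
# A subset of the Go source matchers listed above, for illustration only.
SOURCE_MATCHERS = {"os.Getenv", "r.FormValue", "r.URL.Query.Get"}


def normalize_callee(expr):
    """Drop every parenthesised argument list, keeping the dotted name."""
    kept = []
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        elif depth == 0:
            # Only characters outside any (...) group survive.
            kept.append(ch)
    return "".join(kept)


def is_source(expr):
    """True if the normalized call text equals a known source matcher."""
    return normalize_callee(expr) in SOURCE_MATCHERS
```

With this sketch, `r.URL.Query().Get("host")` normalizes to `r.URL.Query.Get` and therefore matches the source rule.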
---
## AST Pattern Rules
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.cmdi.exec_command` | High | A | `exec.Command()` — arbitrary process execution |
### Memory Safety
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.memory.unsafe_pointer` | Medium | A | `unsafe.Pointer` — bypasses Go type system |
### Insecure Transport
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.transport.insecure_skip_verify` | High | A | `InsecureSkipVerify: true` — disables TLS certificate validation |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.crypto.md5` | Low | A | `md5.New()` / `md5.Sum()` — weak hash algorithm |
| `go.crypto.sha1` | Low | A | `sha1.New()` / `sha1.Sum()` — weak hash algorithm |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.sqli.query_concat` | Medium | B | `db.Query`/`Exec`/`QueryRow` with concatenated string |
### Secrets
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.secrets.hardcoded_key` | Medium | A | Variable with secret-like name assigned a string literal |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.deser.gob_decode` | Medium | A | `gob.NewDecoder` — Go binary deserialization |
---
## Examples
### `go.transport.insecure_skip_verify` — TLS misconfiguration
**Vulnerable:**
```go
tr := &http.Transport{
	TLSClientConfig: &tls.Config{
		InsecureSkipVerify: true, // Disables certificate verification
	},
}
```
**Safe alternative:**
```go
tr := &http.Transport{
	TLSClientConfig: &tls.Config{
		// Use proper CA certificates
		RootCAs: certPool,
	},
}
```
### `go.sqli.query_concat` — SQL concatenation
**Vulnerable:**
```go
rows, err := db.Query("SELECT * FROM users WHERE id=" + userID)
```
**Safe alternative:**
```go
rows, err := db.Query("SELECT * FROM users WHERE id=$1", userID)
```
### `go.secrets.hardcoded_key` — Hardcoded secret
**Flagged:**
```go
apiKey := "sk-1234567890abcdef"
password := "hunter2"
```
**Safe alternative:**
```go
apiKey := os.Getenv("API_KEY")
password := os.Getenv("DB_PASSWORD")
```
### `go.cmdi.exec_command` — Command execution
**Vulnerable:**
```go
cmd := exec.Command("sh", "-c", userInput)
cmd.Run()
```
**Safe alternative:**
```go
// Use explicit command and arguments, not shell
cmd := exec.Command("ls", "-la", safeDir)
cmd.Run()
```

79
docs/rules/index.md Normal file

@ -0,0 +1,79 @@
# Rule Reference
This section lists every detection rule in Nyx, organized by language.
## Rule ID Format
| Prefix | Detector Family | Example |
|--------|----------------|---------|
| `taint-*` | [Taint analysis](../detectors/taint.md) | `taint-unsanitised-flow (source 5:11)` |
| `cfg-*` | [CFG structural](../detectors/cfg.md) | `cfg-unguarded-sink`, `cfg-auth-gap` |
| `state-*` | [State model](../detectors/state.md) | `state-use-after-close`, `state-resource-leak` |
| `<lang>.*.*` | [AST patterns](../detectors/patterns.md) | `rs.memory.transmute`, `js.code_exec.eval` |
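
As an illustration of the prefix scheme in the table above, the sketch below maps a rule ID to its detector family. `detector_family` is a hypothetical helper, not part of Nyx, and the family names it returns are labels chosen for this example.

```python
def detector_family(rule_id):
    """Classify a rule ID by the prefix conventions in the table above."""
    for prefix, family in (("taint-", "taint"), ("cfg-", "cfg"), ("state-", "state")):
        if rule_id.startswith(prefix):
            return family
    # AST pattern IDs look like <lang>.<category>.<name>; ignore any
    # trailing annotation after the first space.
    head = rule_id.split(" ", 1)[0]
    if len(head.split(".")) >= 3:
        return "ast-pattern"
    return "unknown"
```

A caller could use a helper like this to group findings by detector before rendering them.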
## Cross-Language Rules
These rules apply to all supported languages:
### Taint Rules
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `taint-unsanitised-flow (source L:C)` | Varies by source kind | Unsanitized data flows from source to sink |
### CFG Structural Rules
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `cfg-unguarded-sink` | High/Medium | Sink without dominating guard |
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth |
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
| `cfg-unreachable-source` | Low | Source in unreachable code |
| `cfg-error-fallthrough` | High/Medium | Error path doesn't terminate before dangerous code |
| `cfg-resource-leak` | Medium | Resource not released on all exit paths |
| `cfg-lock-not-released` | Medium | Lock not released on all exit paths |
### State Model Rules
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `state-use-after-close` | High | Variable used after being closed |
| `state-double-close` | Medium | Resource closed twice |
| `state-resource-leak` | Medium | Resource never closed (definite) |
| `state-resource-leak-possible` | Low | Resource may not close on all paths |
| `state-unauthed-access` | High | Privileged operation without authentication |
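
To give a feel for what the resource-lifecycle rules report, here is a minimal sketch over a straight-line trace of events. It is an assumption-laden toy, not the Nyx state model: it ignores branching (so it cannot produce `state-resource-leak-possible`), and the event format and `check_lifecycle` name are invented for this example.

```python
def check_lifecycle(events):
    """Check a straight-line trace of (action, variable) events.

    Actions are "open", "use", and "close". Returns a list of
    (rule_id, variable) findings in the order they are detected.
    """
    state = {}      # variable -> "open" or "closed"
    findings = []
    for action, var in events:
        current = state.get(var)
        if action == "open":
            state[var] = "open"
        elif action == "use" and current == "closed":
            findings.append(("state-use-after-close", var))
        elif action == "close":
            if current == "closed":
                findings.append(("state-double-close", var))
            state[var] = "closed"
    # Anything still open at the end of the trace was never closed.
    for var, current in state.items():
        if current == "open":
            findings.append(("state-resource-leak", var))
    return findings
```

A real analysis would track these states per control-flow path, which is where the distinction between definite and possible leaks comes from.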
## Per-Language AST Pattern Rules
Each language page lists all AST pattern rules with examples:
- [Rust](rust.md): 13 rules (memory safety, code quality)
- [C](c.md): 8 rules (banned functions, command execution, format strings)
- [C++](cpp.md): 9 rules (banned functions, dangerous casts, command execution)
- [Java](java.md): 8 rules (deserialization, command execution, reflection, SQL, crypto, XSS)
- [Go](go.md): 8 rules (command execution, unsafe pointer, TLS, crypto, SQL, secrets, deserialization)
- [JavaScript](javascript.md): 13 rules (code execution, XSS, prototype pollution, crypto, transport)
- [TypeScript](typescript.md): 10 rules (mirrors JS + type-safety escapes)
- [Python](python.md): 13 rules (code execution, command execution, deserialization, SQL, crypto, template injection)
- [PHP](php.md): 11 rules (code execution, command execution, deserialization, SQL, path traversal, crypto)
- [Ruby](ruby.md): 11 rules (code execution, command execution, deserialization, reflection, SSRF, crypto)
## Taint Label Coverage
Taint analysis uses language-specific source/sink/sanitizer labels. Coverage varies by language:
| Language | Sources | Sinks | Sanitizers | Coverage |
|----------|---------|-------|------------|----------|
| Rust | Complete | Complete | Complete | Full |
| JavaScript | Complete | Complete | Partial | Full |
| TypeScript | Partial | Partial | Partial | Moderate |
| Python | Partial | Complete | Partial | Moderate |
| C | Partial | Complete | Minimal | Moderate |
| C++ | Partial | Complete | Minimal | Moderate |
| Java | Partial | Partial | Partial | Moderate |
| Go | Complete | Complete | Partial | Full |
| PHP | Complete | Complete | Partial | Full |
| Ruby | Partial | Partial | Partial | Moderate |

"Moderate" coverage means basic rules exist but many common library functions are not yet labeled. Contributions are welcome.

135
docs/rules/java.md Normal file

@ -0,0 +1,135 @@
# Java Rules
Nyx detects Java vulnerabilities through AST patterns and taint analysis, covering deserialization, command execution, reflection, SQL injection, weak crypto, and XSS.
## Taint Labels
Java has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/java.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `System.getenv` | all |
| `getParameter`, `getInputStream`, `getHeader`, `getCookies`, `getReader`, `getQueryString`, `getPathInfo` | all |
| `readObject`, `readLine` | all |
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `HtmlUtils.htmlEscape`, `StringEscapeUtils.escapeHtml4` | HTML_ESCAPE |
### Sinks
| Matcher | Cap |
|---------|-----|
| `Runtime.exec`, `ProcessBuilder` | SHELL_ESCAPE |
| `executeQuery`, `executeUpdate`, `prepareStatement` | SHELL_ESCAPE |
| `Class.forName` | SHELL_ESCAPE |
| `println`, `print`, `write` | HTML_ESCAPE |
---
## AST Pattern Rules
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.deser.readobject` | High | A | `ObjectInputStream.readObject()` — unsafe deserialization |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.cmdi.runtime_exec` | High | A | `Runtime.getRuntime().exec()` — shell command execution |
### Reflection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.reflection.class_forname` | Medium | A | `Class.forName()` — dynamic class loading |
| `java.reflection.method_invoke` | Medium | A | `Method.invoke()` — reflective method invocation |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.sqli.execute_concat` | Medium | B | SQL `execute*()` with concatenated string argument |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.crypto.insecure_random` | Low | A | `new Random()` (`java.util.Random` is not cryptographically secure) |
| `java.crypto.weak_digest` | Low | A | `MessageDigest.getInstance("MD5"/"SHA1")` |
### XSS
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.xss.getwriter_print` | Medium | A | `response.getWriter().print/println/write` — direct output |
---
## Examples
### `java.deser.readobject` — Unsafe deserialization
**Vulnerable:**
```java
ObjectInputStream ois = new ObjectInputStream(request.getInputStream());
Object obj = ois.readObject(); // Arbitrary object instantiation
```
**Safe alternative:**
```java
// Use a safe format like JSON
ObjectMapper mapper = new ObjectMapper();
MyType obj = mapper.readValue(request.getInputStream(), MyType.class);
```
### `java.sqli.execute_concat` — SQL concatenation
**Vulnerable:**
```java
String query = "SELECT * FROM users WHERE id=" + userId;
stmt.executeQuery(query); // SQL injection
```
**Safe alternative:**
```java
PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE id=?");
ps.setString(1, userId);
ResultSet rs = ps.executeQuery();
```
### `java.cmdi.runtime_exec` — Command execution
**Vulnerable:**
```java
Runtime.getRuntime().exec("cmd /c " + userCommand);
```
**Safe alternative:**
```java
ProcessBuilder pb = new ProcessBuilder("cmd", "/c", "dir");
// Use explicit argument list, never concatenate user input
```
### `java.reflection.class_forname` — Dynamic class loading
**Flagged:**
```java
Class<?> cls = Class.forName(className);
Object obj = cls.getDeclaredConstructor().newInstance();
```
**Safe alternative:**
```java
// Use an allowlist of permitted class names
Map<String, Class<?>> allowed = Map.of("User", User.class, "Order", Order.class);
Class<?> cls = allowed.get(className);
if (cls != null) { /* ... */ }
```

138
docs/rules/javascript.md Normal file

@ -0,0 +1,138 @@
# JavaScript Rules
JavaScript has the most complete taint label coverage alongside Rust. Nyx detects code execution, XSS, prototype pollution, command injection, and weak crypto.
## Taint Sources
| Function | Capability | Source Kind |
|----------|-----------|-------------|
| `document.location`, `window.location` | `all` | UserInput |
| `req.body`, `req.query`, `req.params` | `all` | UserInput |
| `req.headers`, `req.cookies` | `all` | UserInput |
| `process.env` | `all` | EnvironmentConfig |
## Taint Sinks
| Function | Required Capability |
|----------|-------------------|
| `eval` | `SHELL_ESCAPE` |
| `innerHTML` | `HTML_ESCAPE` |
| `location.href`, `window.location.href` | `URL_ENCODE` |
| `child_process.exec`, `child_process.execSync` | `SHELL_ESCAPE` |
| `child_process.spawn` | `SHELL_ESCAPE` |
## Taint Sanitizers
| Function | Strips Capability |
|----------|------------------|
| `JSON.parse` | `JSON_PARSE` |
| `encodeURIComponent`, `encodeURI` | `URL_ENCODE` |
| `DOMPurify.sanitize` | `HTML_ESCAPE` |
> **Note:** Anonymous function expressions and arrow functions passed as callback arguments (e.g., Express `app.get('/path', function(req, res) { ... })`) are automatically walked as separate function scopes for taint analysis. Each anonymous function gets a unique scope identifier to prevent cross-function taint leakage.
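
One way to read the three tables above is as a capability bitmask: a source taints a value with every capability, each sanitizer strips one, and a sink reports when the capability it requires is still present. The sketch below follows that reading. It is an interpretation of the table headers, not Nyx's implementation, and `apply_sanitizer` and `sink_fires` are hypothetical names.

```python
from enum import Flag, auto


class Cap(Flag):
    HTML_ESCAPE = auto()
    SHELL_ESCAPE = auto()
    URL_ENCODE = auto()
    JSON_PARSE = auto()


# Sources in the table carry the capability `all`.
ALL_CAPS = Cap(0)
for cap in Cap:
    ALL_CAPS |= cap

# Subsets of the sanitizer and sink tables above.
SANITIZERS = {
    "JSON.parse": Cap.JSON_PARSE,
    "encodeURIComponent": Cap.URL_ENCODE,
    "DOMPurify.sanitize": Cap.HTML_ESCAPE,
}
SINKS = {
    "eval": Cap.SHELL_ESCAPE,
    "innerHTML": Cap.HTML_ESCAPE,
    "location.href": Cap.URL_ENCODE,
}


def apply_sanitizer(taint, function):
    """Strip the capability that `function` is labeled as removing."""
    return taint & ~SANITIZERS.get(function, Cap(0))


def sink_fires(taint, sink):
    """A sink reports if the taint still carries its required capability."""
    return bool(taint & SINKS[sink])
```

Under this reading, passing `req.body` through `DOMPurify.sanitize` would silence an `innerHTML` finding but not an `eval` finding, because `SHELL_ESCAPE` is still set.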
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `js.code_exec.new_function` | High | A | `new Function()` — eval equivalent |
| `js.code_exec.settimeout_string` | Medium | A | `setTimeout`/`setInterval` with string argument |
### XSS Sinks
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.xss.document_write` | Medium | A | `document.write()` / `document.writeln()` |
| `js.xss.outer_html` | Medium | A | Assignment to `.outerHTML` |
| `js.xss.insert_adjacent_html` | Medium | A | `insertAdjacentHTML()` |
| `js.xss.location_assign` | Medium | A | Assignment to `location`/`location.href` — open redirect |
| `js.xss.cookie_write` | Medium | A | Write to `document.cookie` |
### Prototype Pollution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.prototype.proto_assignment` | Medium | A | Assignment to `__proto__` |
| `js.prototype.extend_object` | Medium | A | Assignment to `Object.prototype.*` |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.crypto.weak_hash` | Low | A | `crypto.createHash("md5"/"sha1")` |
| `js.crypto.math_random` | Low | A | `Math.random()` — not cryptographically secure |
### Insecure Transport
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.transport.fetch_http` | Low | A | `fetch("http://...")` — plaintext HTTP |
---
## Examples
### `js.code_exec.eval` — Dynamic code execution
**Vulnerable:**
```javascript
const code = req.query.code;
eval(code); // Remote code execution
```
**Safe alternative:**
```javascript
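// Avoid eval entirely: dispatch through an explicit allowlist.
// A Map avoids inherited keys such as "constructor" or "toString".
const allowed = new Map([["add", (a, b) => a + b]]);
const op = allowed.get(req.query.operation);
const result = op ? op(Number(req.query.a), Number(req.query.b)) : undefined;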
```
### `js.xss.document_write` — XSS sink
**Vulnerable:**
```javascript
document.write("<h1>" + userName + "</h1>");
```
**Safe alternative:**
```javascript
const el = document.createElement("h1");
el.textContent = userName;
document.body.appendChild(el);
```
### `js.prototype.proto_assignment` — Prototype pollution
**Vulnerable:**
```javascript
function merge(target, source) {
  for (let key in source) {
    target[key] = source[key]; // If key is "__proto__", pollutes prototype
  }
}
```
**Safe alternative:**
```javascript
function merge(target, source) {
  for (let key in source) {
    if (key === "__proto__" || key === "constructor") continue;
    target[key] = source[key];
  }
}
```
### Taint: `req.body` → `eval()`
**Finding:**
```
[HIGH] taint-unsanitised-flow (source 2:18) src/handler.js:3:5
Source: req.body at 2:18
Sink: eval()
Score: 78
```

138
docs/rules/php.md Normal file

@ -0,0 +1,138 @@
# PHP Rules
Nyx detects PHP vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, SQL injection, path traversal, and weak crypto.
## Taint Labels
PHP has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/php.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `$_GET` / `_GET`, `$_POST` / `_POST`, `$_REQUEST` / `_REQUEST`, `$_COOKIE` / `_COOKIE`, `$_FILES` / `_FILES`, `$_SERVER` / `_SERVER`, `$_ENV` / `_ENV` | all |
| `file_get_contents`, `fread` | all |
> **Note:** PHP superglobal names are matched both with and without the `$` prefix because the CFG's `collect_idents` strips the leading `$` from variable names. Subscript access like `$_GET['cmd']` is handled via `element_reference` / `subscript_expression` node detection.
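
The matching behaviour in the note can be sketched like this. It is only an illustration of the idea (reduce an expression to its base identifier, drop the `$`, then compare), not the logic in `collect_idents`; `base_identifier` and `is_superglobal_source` are hypothetical names.

```python
SUPERGLOBALS = {"_GET", "_POST", "_REQUEST", "_COOKIE", "_FILES", "_SERVER", "_ENV"}


def base_identifier(expr):
    """Return the variable name without subscript or leading `$`."""
    name = expr.split("[", 1)[0].strip()
    return name[1:] if name.startswith("$") else name


def is_superglobal_source(expr):
    """True for `$_GET`, `_GET`, `$_GET['cmd']`, and similar forms."""
    return base_identifier(expr) in SUPERGLOBALS
```

Because the `$` is removed before comparison, one matcher entry covers both spellings listed in the table.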
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `htmlspecialchars`, `htmlentities` | HTML_ESCAPE |
| `escapeshellarg`, `escapeshellcmd` | SHELL_ESCAPE |
| `basename` | FILE_IO |
### Sinks
| Matcher | Cap |
|---------|-----|
| `system`, `exec`, `passthru`, `shell_exec`, `proc_open`, `popen` | SHELL_ESCAPE |
| `eval`, `assert` | SHELL_ESCAPE |
| `include`, `include_once`, `require`, `require_once` | FILE_IO |
| `unserialize` | SHELL_ESCAPE |
| `move_uploaded_file`, `copy`, `file_put_contents`, `fwrite` | FILE_IO |
| `echo`, `print` | HTML_ESCAPE |
| `mysqli_query`, `pg_query`, `query` | SHELL_ESCAPE |
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `php.code_exec.create_function` | High | A | `create_function()` — deprecated eval-like constructor |
| `php.code_exec.preg_replace_e` | High | A | `preg_replace` with `/e` modifier — code execution via regex |
| `php.code_exec.assert_string` | High | A | `assert()` with string argument — evaluates PHP code |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.cmdi.system` | High | A | `system`/`shell_exec`/`exec`/`passthru`/`proc_open`/`popen` |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.deser.unserialize` | High | A | `unserialize()` — PHP object injection |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.sqli.query_concat` | Medium | B | `mysql_query`/`mysqli_query` with concatenated SQL |
### Path Traversal
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.path.include_variable` | High | B | `include`/`require` with variable path — file inclusion |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.crypto.md5` | Low | A | `md5()` — weak hash function |
| `php.crypto.sha1` | Low | A | `sha1()` — weak hash function |
| `php.crypto.rand` | Low | A | `rand()`/`mt_rand()` — not cryptographically secure |
---
## Examples
### `php.code_exec.eval` — Dynamic code execution
**Vulnerable:**
```php
eval($_GET['code']);
```
**Safe alternative:**
```php
// Never use eval with user input
// Use a template engine or allowlisted operations
```
### `php.deser.unserialize` — Object injection
**Vulnerable:**
```php
$obj = unserialize($_COOKIE['data']);
```
**Safe alternative:**
```php
$data = json_decode($_COOKIE['data'], true);
```
### `php.path.include_variable` — File inclusion
**Vulnerable:**
```php
include($_GET['page']); // Local/remote file inclusion
```
**Safe alternative:**
```php
$allowed = ['home', 'about', 'contact'];
$requested = $_GET['page'] ?? '';
$page = in_array($requested, $allowed, true) ? $requested : 'home';
include("pages/{$page}.php");
```
### `php.sqli.query_concat` — SQL concatenation
**Vulnerable:**
```php
mysqli_query($conn, "SELECT * FROM users WHERE id=" . $_GET['id']);
```
**Safe alternative:**
```php
$stmt = $conn->prepare("SELECT * FROM users WHERE id=?");
$stmt->bind_param("i", $_GET['id']);
$stmt->execute();
```

142
docs/rules/python.md Normal file

@ -0,0 +1,142 @@
# Python Rules
Nyx detects Python vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, SQL injection, weak crypto, and template injection.
## Taint Labels
Python has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/python.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `os.getenv`, `os.environ` | all |
| `request.args`, `request.form`, `request.json`, `request.headers`, `request.cookies`, `input` | all |
| `sys.argv` | all |
| `argparse.parse_args`, `urllib.request.urlopen`, `requests.get`, `requests.post` | all |
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `html.escape` | HTML_ESCAPE |
| `shlex.quote` | SHELL_ESCAPE |
| `os.path.realpath` | FILE_IO |
### Sinks
| Matcher | Cap |
|---------|-----|
| `eval`, `exec` | SHELL_ESCAPE |
| `os.system`, `os.popen`, `subprocess.call`, `subprocess.run`, `subprocess.Popen`, `subprocess.check_output`, `subprocess.check_call` | SHELL_ESCAPE |
| `cursor.execute`, `cursor.executemany` | SHELL_ESCAPE |
| `send_file`, `send_from_directory` | FILE_IO |
| `open` | FILE_IO |
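
As a small illustration of how a sanitizer from the table sits between a source and a sink, the sketch below quotes an untrusted value with `shlex.quote` before it becomes part of a shell command string. `build_listing_command` is a hypothetical helper, and whether a given flow is reported depends on Nyx's own analysis, not on this example.

```python
import shlex


def build_listing_command(user_dir):
    """Build a shell command string with the untrusted part quoted.

    `user_dir` stands in for a value from a source such as
    `request.args`; the returned string is what would reach a sink
    such as `os.system`.
    """
    return "ls -la " + shlex.quote(user_dir)
```

Passing an explicit argument list to `subprocess.run` without a shell remains the simpler option when it is available, since no quoting is needed at all.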
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `py.code_exec.exec` | High | A | `exec()` — dynamic code execution |
| `py.code_exec.compile` | Medium | A | `compile()` with exec/eval mode |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.cmdi.os_system` | High | A | `os.system()` — shell command execution |
| `py.cmdi.os_popen` | High | A | `os.popen()` — shell command execution |
| `py.cmdi.subprocess_shell` | High | B | `subprocess.*` with `shell=True` |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.deser.pickle_loads` | High | A | `pickle.loads()` / `pickle.load()` — arbitrary object deserialization |
| `py.deser.yaml_load` | High | A | `yaml.load()` without SafeLoader |
| `py.deser.shelve_open` | Medium | A | `shelve.open()` — pickle-backed deserialization |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.sqli.execute_format` | Medium | B | `cursor.execute()` with string concatenation |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.crypto.md5` | Low | A | `hashlib.md5()` — weak hash algorithm |
| `py.crypto.sha1` | Low | A | `hashlib.sha1()` — weak hash algorithm |
### Template Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.xss.jinja_from_string` | Medium | A | `jinja2.Template.from_string()` — template injection |
---
## Examples
### `py.deser.pickle_loads` — Unsafe deserialization
**Vulnerable:**
```python
import pickle
data = pickle.loads(request.body) # Arbitrary code execution
```
**Safe alternative:**
```python
import json
data = json.loads(request.body) # JSON is safe
```
### `py.cmdi.subprocess_shell` — Shell execution
**Vulnerable:**
```python
import subprocess
subprocess.call(user_input, shell=True) # Command injection
```
**Safe alternative:**
```python
import subprocess
import shlex
subprocess.call(shlex.split(user_input), shell=False)
# Or better: use an explicit command list
subprocess.call(["ls", "-la", user_dir])
```
### `py.deser.yaml_load` — Unsafe YAML
**Vulnerable:**
```python
import yaml
config = yaml.load(user_data, Loader=yaml.Loader)  # Can instantiate arbitrary objects
```
**Safe alternative:**
```python
import yaml
config = yaml.safe_load(user_data) # Only basic Python types
```
### `py.sqli.execute_format` — SQL concatenation
**Vulnerable:**
```python
cursor.execute("SELECT * FROM users WHERE id=" + user_id)
```
**Safe alternative:**
```python
cursor.execute("SELECT * FROM users WHERE id=?", (user_id,))
```

132
docs/rules/ruby.md Normal file

@ -0,0 +1,132 @@
# Ruby Rules
Nyx detects Ruby vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, reflection, SSRF, and weak crypto.
## Taint Labels
Ruby has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/ruby.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `ENV`, `gets` | all |
| `params` | all |
> **Note:** Ruby's `params[:cmd]` subscript access is detected via `element_reference` node handling in the CFG. Sinatra/Rails `do...end` blocks are walked as function scopes.
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `CGI.escapeHTML`, `ERB::Util.html_escape` | HTML_ESCAPE |
| `Shellwords.escape`, `Shellwords.shellescape` | SHELL_ESCAPE |
### Sinks
| Matcher | Cap |
|---------|-----|
| `system`, `exec` | SHELL_ESCAPE |
| `eval` | SHELL_ESCAPE |
| `puts`, `print` | HTML_ESCAPE |
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.code_exec.eval` | High | A | `Kernel#eval` — dynamic code execution |
| `rb.code_exec.instance_eval` | High | A | `instance_eval` — evaluates string in object context |
| `rb.code_exec.class_eval` | High | A | `class_eval` / `module_eval` — evaluates string in class context |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.cmdi.backtick` | High | A | Backtick shell execution (`` `cmd` ``) |
| `rb.cmdi.system_interp` | High | A | `system`/`exec` call — command execution risk |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.deser.yaml_load` | High | A | `YAML.load` — arbitrary object deserialization |
| `rb.deser.marshal_load` | High | A | `Marshal.load` — arbitrary Ruby object deserialization |
### Reflection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.reflection.send_dynamic` | Medium | B | `send()` with non-symbol argument — arbitrary method dispatch |
| `rb.reflection.constantize` | Medium | A | `constantize` / `safe_constantize` — dynamic class resolution |
### SSRF
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.ssrf.open_uri` | Medium | A | `Kernel#open` with HTTP URL — SSRF via open-uri |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.crypto.md5` | Low | A | `Digest::MD5` — weak hash algorithm |
---
## Examples
### `rb.deser.yaml_load` — Unsafe YAML deserialization
**Vulnerable:**
```ruby
data = YAML.load(params[:config]) # Arbitrary object instantiation
```
**Safe alternative:**
```ruby
data = YAML.safe_load(params[:config]) # Only basic Ruby types
```
### `rb.cmdi.backtick` — Backtick shell execution
**Vulnerable:**
```ruby
output = `ls #{user_dir}` # Command injection via interpolation
```
**Safe alternative:**
```ruby
require 'open3'
output, status = Open3.capture2('ls', user_dir)
```
### `rb.reflection.send_dynamic` — Dynamic method dispatch
**Vulnerable:**
```ruby
obj.send(params[:method], params[:arg]) # Arbitrary method invocation
```
**Safe alternative:**
```ruby
allowed = %w[name email phone]
if allowed.include?(params[:method])
  obj.send(params[:method])
end
```
### `rb.deser.marshal_load` — Marshal deserialization
**Vulnerable:**
```ruby
obj = Marshal.load(request.body.read)
```
**Safe alternative:**
```ruby
data = JSON.parse(request.body.read)
```

105
docs/rules/rust.md Normal file

@ -0,0 +1,105 @@
# Rust Rules
Nyx detects Rust vulnerabilities through AST patterns (memory safety, code quality) and taint analysis (command injection via `env::var` → `Command::new`).
## Taint Sources
| Function | Capability | Source Kind |
|----------|-----------|-------------|
| `std::env::var`, `env::var` | `all` | EnvironmentConfig |
## Taint Sinks
| Function | Required Capability |
|----------|-------------------|
| `Command::new`, `Command::arg`, `Command::args` | `SHELL_ESCAPE` |
| `Command::status`, `Command::output` | `SHELL_ESCAPE` |
| `fs::read_to_string`, `fs::write`, `fs::read`, `File::open`, `File::create` | `FILE_IO` |
## Taint Sanitizers
| Function | Strips Capability |
|----------|------------------|
| `html_escape::encode_safe`, `sanitize_html` | `HTML_ESCAPE` |
| `shell_escape::unix::escape`, `sanitize_shell` | `SHELL_ESCAPE` |
> **Note:** `fs::read_to_string` was moved from taint sources to sinks to support path traversal detection (`env::var` → `fs::read_to_string`).
---
## AST Pattern Rules
### Memory Safety
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rs.memory.transmute` | High | A | `std::mem::transmute` — unchecked type reinterpretation |
| `rs.memory.copy_nonoverlapping` | High | A | `ptr::copy_nonoverlapping` — raw pointer memcpy |
| `rs.memory.get_unchecked` | High | A | `get_unchecked` / `get_unchecked_mut` — unchecked indexing |
| `rs.memory.mem_zeroed` | High | A | `std::mem::zeroed` — may be UB for non-POD types |
| `rs.memory.ptr_read` | High | A | `ptr::read` / `ptr::read_volatile` — raw pointer dereference |
| `rs.memory.narrow_cast` | Low | A | `as u8`/`i8`/`u16`/`i16` — possible truncation |
| `rs.memory.mem_forget` | Low | A | `std::mem::forget` — may leak resources |
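
As an illustration of why `rs.memory.narrow_cast` is worth reviewing even at Low severity, the sketch below contrasts an `as` cast with `u8::try_from`. The function names are hypothetical and chosen only for this example.

```rust
use std::convert::TryFrom;

/// Lossy: `as` silently keeps only the low 8 bits.
fn narrow_lossy(n: u32) -> u8 {
    n as u8
}

/// Checked: `u8::try_from` fails when the value does not fit in a `u8`.
fn narrow_checked(n: u32) -> Option<u8> {
    u8::try_from(n).ok()
}

fn main() {
    assert_eq!(narrow_lossy(300), 44); // 300 mod 256
    assert_eq!(narrow_checked(300), None);
    assert_eq!(narrow_checked(200), Some(200));
}
```

The checked form makes the out-of-range case explicit, so the caller has to decide what to do with it.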
### Code Quality
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rs.quality.unsafe_block` | Medium | A | `unsafe { }` block — manual memory safety obligation |
| `rs.quality.unsafe_fn` | Medium | A | `unsafe fn` declaration |
| `rs.quality.unwrap` | Low | A | `.unwrap()` — panics on `None`/`Err` |
| `rs.quality.expect` | Low | A | `.expect()` — panics on `None`/`Err` |
| `rs.quality.panic_macro` | Low | A | `panic!()` macro invocation |
| `rs.quality.todo` | Low | A | `todo!()` / `unimplemented!()` placeholder |
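
For the `rs.quality.unwrap` and `rs.quality.expect` rules, a common remedy is to return a `Result` and propagate the error with `?`. The sketch below is a minimal example under that assumption; `parse_port` and `parse_port_panicking` are hypothetical helpers.

```rust
use std::num::ParseIntError;

/// Flagged style: panics on malformed input.
fn parse_port_panicking(s: &str) -> u16 {
    s.parse().unwrap()
}

/// Propagates the parse error to the caller instead of panicking.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port = s.trim().parse::<u16>()?;
    Ok(port)
}

fn main() {
    assert_eq!(parse_port_panicking("8080"), 8080);
    assert_eq!(parse_port("8080"), Ok(8080));
    assert!(parse_port("not-a-port").is_err());
    // When a fallback is acceptable, a default avoids the panic path entirely.
    assert_eq!(parse_port("oops").unwrap_or(80), 80);
}
```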
---
## Examples
### `rs.memory.transmute` — Unchecked type reinterpretation
**Vulnerable:**
```rust
let x: u32 = 42;
let y: f32 = unsafe { std::mem::transmute(x) };
```
**Safe alternative:**
```rust
let x: u32 = 42;
let y: f32 = f32::from_bits(x);
```
### `rs.quality.unsafe_block` — Unsafe block
**Flagged:**
```rust
let x: i32 = 42;
unsafe {
    let ptr = &x as *const i32;
    println!("{}", *ptr); // raw pointer dereference requires `unsafe`
}
```
**Safe alternative:**
```rust
// Use safe abstractions when possible
println!("{}", x);
```
### Taint: `env::var` → `Command::new`
**Vulnerable:**
```rust
let cmd = std::env::var("USER_CMD").unwrap();
Command::new("sh").arg("-c").arg(&cmd).output()?;
```
**Safe alternative:**
```rust
let cmd = std::env::var("USER_CMD").unwrap();
// Validate against allowlist
let allowed = ["ls", "whoami", "date"];
if allowed.contains(&cmd.as_str()) {
    Command::new(&cmd).output()?;
}
```
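
The same source can also reach a `FILE_IO` sink such as `fs::read_to_string`, which is the path traversal case mentioned in the sinks note. The sketch below shows one possible validation step before the read. It assumes the value names a file under a fixed base directory. `join_within` is a hypothetical helper, and whether Nyx treats such a check as a sanitizer is not stated in this document.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `user_path` onto `base`, rejecting absolute paths and `..` components.
/// This is a lexical check only: it does not resolve symlinks.
fn join_within(base: &Path, user_path: &str) -> Option<PathBuf> {
    let candidate = Path::new(user_path);
    let only_normal = candidate
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir));
    if only_normal && !user_path.is_empty() {
        Some(base.join(candidate))
    } else {
        None
    }
}

fn main() {
    let base = Path::new("/srv/app/config");
    assert_eq!(
        join_within(base, "site.toml"),
        Some(PathBuf::from("/srv/app/config/site.toml"))
    );
    assert_eq!(join_within(base, "../../etc/passwd"), None);
    assert_eq!(join_within(base, "/etc/passwd"), None);
}
```

In real code the `user_path` argument would come from `std::env::var`, and only a `Some` result would be passed on to `fs::read_to_string`.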

81
docs/rules/typescript.md Normal file

@ -0,0 +1,81 @@
# TypeScript Rules
TypeScript rules mirror the JavaScript patterns and add TypeScript-specific detectors for type-safety escape hatches. Taint labels are shared with JavaScript (see [JavaScript Rules](javascript.md)).
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `ts.code_exec.new_function` | High | A | `new Function()` — eval equivalent |
| `ts.code_exec.settimeout_string` | Medium | A | `setTimeout`/`setInterval` with string argument |
### XSS Sinks
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.xss.document_write` | Medium | A | `document.write()` / `document.writeln()` |
| `ts.xss.outer_html` | Medium | A | Assignment to `.outerHTML` |
| `ts.xss.insert_adjacent_html` | Medium | A | `insertAdjacentHTML()` |
| `ts.xss.location_assign` | Medium | A | Assignment to `location`/`location.href` |
| `ts.xss.cookie_write` | Low | A | Write to `document.cookie` |
### Prototype Pollution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.prototype.proto_assignment` | Medium | A | Assignment to `__proto__` |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.crypto.math_random` | Low | A | `Math.random()` — not cryptographically secure |
### Code Quality (TypeScript-specific)
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.quality.any_annotation` | Low | A | Type annotation of `any` — disables type checking |
| `ts.quality.as_any` | Low | A | Type assertion `as any` — type-safety escape hatch |
---
## Examples
### `ts.quality.any_annotation` — `any` type
**Flagged:**
```typescript
function process(data: any) { // ts.quality.any_annotation
  data.whatever(); // No type checking
}
```
**Safe alternative:**
```typescript
interface UserData { name: string; email: string; }
function process(data: UserData) {
  console.log(data.name);
}
```
### `ts.quality.as_any` — Type assertion escape
**Flagged:**
```typescript
const result = someValue as any; // ts.quality.as_any
result.nonexistentMethod();
```
**Safe alternative:**
```typescript
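// Assumes isValidType is a user-defined type guard:
//   function isValidType(v: unknown): v is KnownType
if (isValidType(someValue)) {
  // someValue is narrowed to KnownType here, so no assertion is needed
  someValue.knownMethod();
}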
```