Phase 1 (#33)

* chore: Exclude CLAUDE.md from Cargo.toml
* feat: add callgraph module and integrate into main analysis flow
* feat: enhance CLI with new severity filtering and analysis modes
* feat: update CHANGELOG with recent enhancements and fixes to severity filtering and output handling
* feat: implement state-model dataflow analysis for resource lifecycle and auth state
* feat: enhance diagnostic output formatting and add evidence structure
* feat: implement attack surface ranking for diagnostics with scoring and sorting
* feat: add comprehensive documentation for installation, usage, and rules reference
* feat: add multiple language support for command execution and evaluation endpoints
* feat: implement inline suppression for findings using `nyx:ignore` comments
* feat: add confidence levels to AST patterns and update output structure
* feat: implement low-noise prioritization system with category filtering, rollup grouping, and configurable budgets
* feat: bump version to 0.4.0 and update changelog with new features and improvements
* feat: add dead code allowances to various functions in mod.rs and real_world_tests.rs
2026-06-24 20:28:06 +02:00 · 2026-02-25 21:16:36 -05:00 · 2026-02-25 21:16:36 -05:00 · 1bbe4b1cfb
commit 1bbe4b1cfb
parent 19b578c5c4
456 changed files with 25628 additions and 1228 deletions
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@
 [![crates.io](https://img.shields.io/crates/v/nyx-scanner.svg)](https://crates.io/crates/nyx-scanner)
 [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
 [![Rust 1.85+](https://img.shields.io/badge/rust-1.85%2B-orange)](https://www.rust-lang.org)
-[![CI](https://img.shields.io/github/actions/workflow/status/ecpeter23/nyx/ci.yml?branch=master)](https://github.com/ecpeter23/nyx/actions)
+[![CI](https://img.shields.io/github/actions/workflow/status/elicpeter/nyx/ci.yml?branch=master)](https://github.com/elicpeter/nyx/actions)
 </div>

 ---
@@ -24,7 +24,7 @@
 | Multi-language support | Rust, C, C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript |
 | AST-level pattern matching | Language-specific queries written against precise parse trees |
 | Control-flow graph analysis | Auth gaps, unguarded sinks, unreachable security code, resource leaks, error fallthrough |
-| Cross-file taint tracking | BFS taint propagation from sources through sanitizers to sinks with function summaries |
+| Cross-file taint tracking | Monotone forward dataflow taint analysis from sources through sanitizers to sinks with function summaries |
 | Cross-language interop | Taint flows across language boundaries via explicit interop edges |
 | Two-pass architecture | Pass 1 extracts function summaries; Pass 2 runs taint with full cross-file context |
 | Incremental indexing | SQLite database stores file hashes, summaries, and findings to skip unchanged files |
@@ -42,7 +42,7 @@
 |---|---|
 | **Pure-Rust, single binary** | No JVM, Python, or server to install; drop the `nyx` executable into your `$PATH` and go. |
 | **Massively parallel** | Uses Rayon and a thread-pool walker; scales to all CPU cores. Scanning the entire **rust-lang/rust** codebase (~53,000 files) on an M2 MacBook Pro takes **~1 s**. |
-| **Deep analysis** | Real CFG construction and taint propagation, not just regex matching. Cross-file function summaries, capability-based sanitizer tracking, and scored findings. |
+| **Deep analysis** | Real CFG construction and monotone dataflow taint analysis with guaranteed termination, not just regex matching. Cross-file function summaries, capability-based sanitizer tracking, and scored findings. |
 | **Index-aware** | An optional SQLite index stores file hashes and findings; subsequent scans touch *only* changed files, slashing CI times. |
 | **Offline & privacy-friendly** | Requires no login, cloud account, or telemetry. Perfect for air-gapped environments and strict compliance policies. |
 | **Tree-sitter precision** | Parses real language grammars, not regexes, giving far fewer false positives than line-based scanners. |
@@ -58,7 +58,7 @@ $ cargo install nyx-scanner
 ```

 ### Install Github release
-1. Navigate to the [Releases](https://github.com/ecpeter23/nyx/releases) page of the repository.
+1. Navigate to the [Releases](https://github.com/elicpeter/nyx/releases) page of the repository.
 2. Download the appropriate binary for your system:

    ```nyx-x86_64-unknown-linux-gnu.zip``` for Linux
@@ -87,7 +87,7 @@ $ cargo install nyx-scanner
 ### Build from source

 ```bash
-$ git clone https://github.com/ecpeter23/nyx.git
+$ git clone https://github.com/elicpeter/nyx.git
 $ cd nyx
 $ cargo build --release
 # optional – copy the binary into PATH
@@ -111,20 +111,29 @@ $ nyx scan ./server --format json
 $ nyx scan --format sarif > results.sarif

 # Perform an ad-hoc scan without touching the index
-$ nyx scan --no-index
+$ nyx scan --index off

 # Restrict results to high-severity findings
-$ nyx scan --high-only
+$ nyx scan --severity HIGH
+
+# Filter by severity expression (high and medium)
+$ nyx scan --severity ">=MEDIUM"

 # AST pattern matching only (fastest, no CFG/taint)
-$ nyx scan --ast-only
+$ nyx scan --mode ast

 # CFG + taint analysis only (skip AST pattern rules)
-$ nyx scan --cfg-only
+$ nyx scan --mode cfg
+
+# CI gate: fail on medium+, SARIF output
+$ nyx scan --format sarif --fail-on MEDIUM > results.sarif
+
+# Suppress status messages (for CI/scripting)
+$ nyx scan --quiet --format json

 # Include test/vendor/benchmark paths at original severity
 # (by default these are downgraded one tier)
-$ nyx scan --include-nonprod
+$ nyx scan --keep-nonprod-severity
 ```

 ### Index Management
@@ -164,13 +173,14 @@ $ nyx config add-terminator --lang javascript --name process.exit

 ## Analysis Modes

-Nyx supports three analysis modes, selectable via the `scanner.mode` config option or CLI flags:
+Nyx supports four analysis modes, selectable via `--mode` or the `scanner.mode` config option:

 | Mode | CLI flag | What runs |
 |---|---|---|
-| **Full** (default) | — | AST pattern matching + CFG construction + taint analysis |
-| **AST-only** | `--ast-only` | AST pattern matching only; skips CFG and taint entirely |
-| **Taint-only** | `--cfg-only` | CFG + taint analysis only; filters out AST pattern findings |
+| **Full** (default) | `--mode full` | AST pattern matching + CFG construction + taint analysis |
+| **AST-only** | `--mode ast` | AST pattern matching only; skips CFG and taint entirely |
+| **CFG** | `--mode cfg` | CFG + taint analysis only; filters out AST pattern findings |
+| **Taint** | `--mode taint` | Alias for `cfg` (CFG + taint analysis) |

 ### What the CFG + taint engine detects

@@ -182,8 +192,40 @@ Nyx supports three analysis modes, selectable via the `scanner.mode` config opti
 | Unreachable security code | `cfg-unreachable-*` | Sanitizers, guards, or sinks in dead code branches |
 | Error fallthrough | `cfg-error-fallthrough` | Error-handling branches that don't terminate, allowing execution to fall through to dangerous operations |
 | Resource leak | `cfg-resource-leak` | Resources acquired but not released on all exit paths (malloc/free, fopen/fclose, Lock/Unlock) |
+| Use-after-close | `state-use-after-close` | Variable read/written after its resource handle was closed |
+| Double-close | `state-double-close` | Resource handle closed more than once |
+| Must-leak | `state-resource-leak` | Resource acquired but never closed on any exit path |
+| May-leak | `state-resource-leak-possible` | Resource open on some but not all exit paths |
+| Unauthenticated access | `state-unauthed-access` | Sensitive sink reached without a preceding auth/admin check |

-Findings are scored and ranked by severity, proximity to entry point, path complexity, and taint confirmation.
+### Attack Surface Ranking
+
+Every finding is assigned a deterministic **attack-surface score** that estimates exploitability using only information already in memory — no extra source passes are needed. Findings are sorted by descending score before truncation, so `max_results` always keeps the most important results.
+
+The score is the sum of five components:
+
+| Component | Weight | Description |
+|---|---|---|
+| **Severity base** | High = 60, Medium = 30, Low = 10 | Primary ordering signal. Severity reflects source-kind exploitability and rule confidence. |
+| **Analysis kind** | taint = +10, state = +8, cfg = +3/+5, ast = 0 | Taint-confirmed flows are the strongest signal; AST-only pattern matches rank lowest at equal severity. CFG findings with evidence get +5, without get +3. |
+| **Evidence strength** | +1 per evidence item (max 4), +2–6 for source kind | More evidence increases confidence. Source-kind priority: user input (+6) > env/config (+5) > unknown (+4) > file system (+3) > database (+2). |
+| **State rule type** | +1 to +6 | Use-after-close and unauthenticated access (+6) rank above double-close (+3), must-leak (+2), and may-leak (+1). |
+| **Path validation** | −5 | Findings on paths guarded by a validation predicate receive a small exploitability penalty — the guard may prevent triggering. |
+
+**Score ranges** (approximate):
+
+| Finding type | Score |
+|---|---|
+| High taint + user input | ~78 |
+| High state (use-after-close) | ~74 |
+| High CFG structural | ~63 |
+| Medium taint + env source | ~47 |
+| Medium state (resource leak) | ~40 |
+| Low AST-only pattern | ~10 |
+
+Tie-breaking is deterministic: severity → rule ID → file path → line → column → message hash. The same set of findings always produces the same ordering regardless of parallelism or input order.
+
+Ranking is enabled by default. Disable it with `--no-rank` or `output.attack_surface_ranking = false` in config. When disabled, `rank_score` is omitted from JSON/SARIF output.

 ---

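The attack-surface scoring rules added in this hunk are concrete enough to sketch. The Rust below is a minimal, hypothetical illustration rather than Nyx's code: `Finding`, `Kind`, `score`, `message_hash`, `rank_and_truncate`, and the `finding` helper are invented names. It assumes the source-kind bonus applies only to taint findings, that severity ties sort high-first, and that 64-bit FNV-1a can stand in for the unspecified message hash. Under those assumptions it reproduces the approximate score ranges; for example, a High taint finding from user input with two evidence items scores 60 + 10 + 2 + 6 = 78.

```rust
// Hypothetical sketch of the attack-surface score; names are illustrative,
// not taken from the Nyx source.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Severity { Low, Medium, High }

#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum SourceKind { UserInput, EnvConfig, Unknown, FileSystem, Database }

#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum StateRule { UseAfterClose, UnauthedAccess, DoubleClose, MustLeak, MayLeak }

#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Kind { Taint(SourceKind), State(StateRule), Cfg, Ast }

#[derive(Clone, Debug)]
struct Finding {
    severity: Severity,
    kind: Kind,
    evidence_items: u32,
    path_validated: bool,
    rule_id: String,
    file: String,
    line: u32,
    col: u32,
    message: String,
}

/// Sum of the five components in the README table.
fn score(f: &Finding) -> i32 {
    let severity = match f.severity {
        Severity::High => 60,
        Severity::Medium => 30,
        Severity::Low => 10,
    };
    let kind = match f.kind {
        Kind::Taint(_) => 10,
        Kind::State(_) => 8,
        Kind::Cfg if f.evidence_items > 0 => 5,
        Kind::Cfg => 3,
        Kind::Ast => 0,
    };
    // +1 per evidence item (max 4); taint findings also get a source-kind bonus.
    let source_bonus = match f.kind {
        Kind::Taint(SourceKind::UserInput) => 6,
        Kind::Taint(SourceKind::EnvConfig) => 5,
        Kind::Taint(SourceKind::Unknown) => 4,
        Kind::Taint(SourceKind::FileSystem) => 3,
        Kind::Taint(SourceKind::Database) => 2,
        _ => 0,
    };
    let evidence = f.evidence_items.min(4) as i32 + source_bonus;
    let state = match f.kind {
        Kind::State(StateRule::UseAfterClose | StateRule::UnauthedAccess) => 6,
        Kind::State(StateRule::DoubleClose) => 3,
        Kind::State(StateRule::MustLeak) => 2,
        Kind::State(StateRule::MayLeak) => 1,
        _ => 0,
    };
    // Validated paths get a small exploitability penalty.
    let validation = if f.path_validated { -5 } else { 0 };
    severity + kind + evidence + state + validation
}

/// 64-bit FNV-1a, standing in for the unspecified stable message hash.
fn message_hash(s: &str) -> u64 {
    s.bytes()
        .fold(0xcbf2_9ce4_8422_2325, |h: u64, b| (h ^ b as u64).wrapping_mul(0x100_0000_01b3))
}

/// Sort by descending score, break ties deterministically, then truncate.
fn rank_and_truncate(mut findings: Vec<Finding>, max_results: usize) -> Vec<Finding> {
    findings.sort_by(|a, b| {
        score(b).cmp(&score(a))
            .then_with(|| b.severity.cmp(&a.severity)) // assumed: higher severity first
            .then_with(|| a.rule_id.cmp(&b.rule_id))
            .then_with(|| a.file.cmp(&b.file))
            .then_with(|| a.line.cmp(&b.line))
            .then_with(|| a.col.cmp(&b.col))
            .then_with(|| message_hash(&a.message).cmp(&message_hash(&b.message)))
    });
    findings.truncate(max_results); // truncation happens only after sorting
    findings
}

/// Hypothetical constructor used by the examples.
fn finding(severity: Severity, kind: Kind, evidence_items: u32, rule_id: &str) -> Finding {
    Finding {
        severity, kind, evidence_items, path_validated: false,
        rule_id: rule_id.to_string(), file: "src/lib.rs".to_string(),
        line: 1, col: 1, message: String::new(),
    }
}
```

Sorting before `truncate` is what lets `max_results` keep the most important results, and because the comparator never falls back to input order, the output is identical however the findings were collected.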
@@ -213,8 +255,8 @@ Nyx merges a default configuration file (`nyx.conf`) with user overrides (`nyx.l
 | Platform | Directory |
 |---|---|
 | Linux | `~/.config/nyx/` |
-| macOS | `~/Library/Application Support/dev.ecpeter23.nyx/` |
-| Windows | `%APPDATA%\ecpeter23\nyx\config\` |
+| macOS | `~/Library/Application Support/nyx/` |
+| Windows | `%APPDATA%\elicpeter\nyx\config\` |

 Minimal example (`nyx.local`):

@@ -270,7 +312,7 @@ Nyx uses a **two-pass architecture** to enable cross-file analysis without sacri
 1. **File enumeration** -- A parallel walker (Rayon + `ignore` crate) applies gitignore rules, size limits, and user exclusions.
 2. **Pass 1 -- Summary extraction** -- Each file is parsed via tree-sitter, an intra-procedural CFG is built (petgraph), and a `FuncSummary` is exported per function capturing source/sanitizer/sink capabilities (bitflags), taint propagation behavior, and callee lists. Summaries are persisted to SQLite.
 3. **Summary merge** -- All per-file summaries are merged into a `GlobalSummaries` map with conservative conflict resolution (union caps, OR booleans).
-4. **Pass 2 -- Analysis** -- Files are re-parsed and analyzed with the full cross-file context: BFS taint propagation resolves callees against local and global summaries, CFG analysis checks for auth gaps, unguarded sinks, resource leaks, and more.
+4. **Pass 2 -- Analysis** -- Files are re-parsed and analyzed with the full cross-file context: a monotone forward dataflow engine resolves callees against local and global summaries and propagates taint through a bounded lattice with guaranteed convergence. CFG analysis checks for auth gaps, unguarded sinks, resource leaks, and more.
 5. **Reporting** -- Findings are scored, ranked, deduplicated, and emitted to the console or serialized as JSON.

 With indexing enabled, Pass 1 skips files whose blake3 content hash is unchanged, and cached findings are served directly for AST-only results.
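Step 4 of the two-pass pipeline describes taint propagation as a monotone forward dataflow problem with guaranteed convergence. The following Rust is a hypothetical, heavily simplified sketch of that idea, not the Nyx engine: `Stmt`, `transfer`, `solve`, and `tainted_sinks` are invented names, variables are limited to 64 and tracked as a `u64` bitset, union is the join, and the multi-origin tracking and validated-must/may sets mentioned in the roadmap are omitted. Termination follows from two facts: every transfer function is monotone, and each node's state lives in a finite lattice, so it can only grow a bounded number of times.

```rust
use std::collections::VecDeque;

/// Hypothetical statement kinds; variables are indices 0..64.
#[derive(Clone, Copy, Debug)]
enum Stmt {
    Source(u32),                   // var := user input
    Sanitize(u32),                 // var := sanitize(var)
    Assign { dst: u32, src: u32 }, // dst := src
    Sink(u32),                     // dangerous_call(var)
    Nop,
}

/// Bit i set means "variable i may be tainted" (a finite lattice of height 64).
type TaintSet = u64;

/// Monotone transfer: a larger input never yields a smaller output.
fn transfer(stmt: Stmt, input: TaintSet) -> TaintSet {
    let bit = |v: u32| 1u64 << v;
    match stmt {
        Stmt::Source(v) => input | bit(v),
        Stmt::Sanitize(v) => input & !bit(v),
        Stmt::Assign { dst, src } => {
            let cleared = input & !bit(dst);
            if input & bit(src) != 0 { cleared | bit(dst) } else { cleared }
        }
        Stmt::Sink(_) | Stmt::Nop => input,
    }
}

/// Forward worklist solver; join is union (a "may be tainted" analysis).
/// Returns the OUT state of every node.
fn solve(stmts: &[Stmt], succs: &[Vec<usize>]) -> Vec<TaintSet> {
    let n = stmts.len();
    let mut preds = vec![Vec::new(); n];
    for (u, targets) in succs.iter().enumerate() {
        for &v in targets {
            preds[v].push(u);
        }
    }
    let mut out = vec![0u64; n]; // start at bottom: nothing tainted
    let mut queued = vec![true; n];
    let mut work: VecDeque<usize> = (0..n).collect();
    while let Some(node) = work.pop_front() {
        queued[node] = false;
        let input = preds[node].iter().fold(0u64, |acc, &p| acc | out[p]);
        let new_out = transfer(stmts[node], input);
        // OUT only ever grows, so each node changes at most 64 times.
        if new_out != out[node] {
            out[node] = new_out;
            for &s in &succs[node] {
                if !queued[s] {
                    queued[s] = true;
                    work.push_back(s);
                }
            }
        }
    }
    out
}

/// Sinks whose argument may be tainted at the fixpoint.
fn tainted_sinks(stmts: &[Stmt], succs: &[Vec<usize>]) -> Vec<usize> {
    let out = solve(stmts, succs);
    stmts.iter().enumerate()
        .filter_map(|(i, s)| match *s {
            // Sink is an identity transfer, so OUT equals IN here.
            Stmt::Sink(v) if out[i] & (1u64 << v) != 0 => Some(i),
            _ => None,
        })
        .collect()
}
```

In a program shaped like `x = input(); loop { y = x; exec(y); y = escape(y); exec(y) }`, only the first `exec(y)` is reported: the back edge re-taints `y` on each iteration, and the solver stops once no node's state changes.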
@@ -279,14 +321,19 @@ With indexing enabled, Pass 1 skips files whose blake3 content hash is unchanged

 ## Roadmap

-### Phase 1 -- Deep Static Engine
+### Phase 1 -- Deep Static Engine (Complete)

-| Feature | Description |
-|---|---|
-| Interprocedural call graph | Precise symbol resolution via `FuncKey`, language-scoped namespaces, cross-module linking. No name-collision merging -- full call graph with topological analysis. |
-| Path-sensitive analysis | Track path predicates and conditional constraints. Detect infeasible paths and validation-only-in-one-branch patterns. Dramatically reduces false positives. |
-| Dataflow & state modeling | Resource state machines (init -> use -> close), auth state transitions, privilege level tracking. Semantic analysis beyond pattern matching. |
-| Attack surface ranking | Score entry points by distance-to-sink, guard strength, path complexity, and privilege escalation potential. Deterministic attack surface scoring. |
+| Feature | Status | Description |
+|---|--------|---|
+| Interprocedural call graph | Done | Precise symbol resolution via `FuncKey`, language-scoped namespaces, cross-module linking. Full call graph with SCC and topological analysis. |
+| Path-sensitive analysis | Done | Track path predicates and conditional constraints. Detect infeasible paths and validation-only-in-one-branch patterns. Monotone predicate summaries with contradiction pruning. |
+| Dataflow & state modeling | Done | Resource state machines (init -> use -> close), auth state transitions, privilege level tracking. Generic `Transfer` trait over bounded lattices with guaranteed convergence. |
+| Monotone taint analysis | Done | Replaced BFS taint engine with a forward worklist dataflow analysis over a finite `TaintState` lattice. Multi-origin tracking, dual validated-must/may sets, JS/TS two-level solve. Guaranteed termination via lattice finiteness. |
+| Attack surface ranking | Done | Deterministic post-analysis scoring of findings by severity, analysis kind, evidence strength, source-kind exploitability, and validation state. Findings sorted by score before truncation so `max_results` keeps the most important results. |
+| Inline suppressions | Done | `nyx:ignore` and `nyx:ignore-next-line` comments with wildcard matching, all 10 languages supported. `--show-suppressed` flag for visibility. |
+| Low-noise prioritization | Done | Category filtering, rollup grouping for high-frequency rules, configurable LOW budgets. Quality-category findings hidden by default. |
+| Pattern-level confidence | Done | Explicit High/Medium/Low confidence on every AST pattern. Confidence flows into output alongside severity and rank score. |
+| AST pattern overhaul | Done | 30+ new patterns across all languages, 11 broken query fixes, namespaced IDs, severity recalibration. |

 ### Phase 2 -- Dynamic Capability

@@ -312,7 +359,25 @@ With indexing enabled, Pass 1 skips files whose blake3 content hash is unchanged
 | Rule updates | Remote rule feed with signature verification |
 | UX | Smart file-watch re-scan |

-Community feedback shapes priorities -- please [open an issue](https://github.com/ecpeter23/nyx/issues) to discuss proposed changes.
+Community feedback shapes priorities -- please [open an issue](https://github.com/elicpeter/nyx/issues) to discuss proposed changes.
+
+---
+
+## Documentation
+
+Full documentation is available in the [`docs/`](docs/index.md) directory:
+
+- [Installation](docs/installation.md) — cargo, binaries, CI tips
+- [Quick Start](docs/quickstart.md) — Your first scan in 60 seconds
+- [CLI Reference](docs/cli.md) — Every flag and subcommand
+- [Configuration](docs/configuration.md) — Config file schema, custom rules
+- [Output Formats](docs/output.md) — Console, JSON, SARIF; exit codes
+- [Detector Overview](docs/detectors.md) — How the four detector families work
+  - [Taint Analysis](docs/detectors/taint.md) — Cross-file source-to-sink dataflow
+  - [CFG Structural](docs/detectors/cfg.md) — Auth gaps, unguarded sinks, resource leaks
+  - [State Model](docs/detectors/state.md) — Resource lifecycle, authentication state
+  - [AST Patterns](docs/detectors/patterns.md) — Tree-sitter structural matching
+- [Rule Reference](docs/rules/index.md) — Per-language rule listings with examples

 ---

@@ -327,7 +392,7 @@ Pull requests are welcome. To contribute:

 Please open an issue for any crash, panic, or suspicious result -- attach the minimal code snippet and mention the Nyx version.

-See `CONTRIBUTING.md` for full guidelines.
+See [`CONTRIBUTING.md`](CONTRIBUTING.md) for full guidelines, including how to add new rules and support new languages.

 ---