mirror of
https://github.com/elicpeter/nyx.git
synced 2026-06-06 19:35:13 +02:00
Phase 1 (#33)
* chore: Exclude CLAUDE.md from Cargo.toml * feat: add callgraph module and integrate into main analysis flow * feat: enhance CLI with new severity filtering and analysis modes * feat: update CHANGELOG with recent enhancements and fixes to severity filtering and output handling * feat: implement state-model dataflow analysis for resource lifecycle and auth state * feat: enhance diagnostic output formatting and add evidence structure * feat: implement attack surface ranking for diagnostics with scoring and sorting * feat: add comprehensive documentation for installation, usage, and rules reference * feat: add multiple language support for command execution and evaluation endpoints * feat: implement inline suppression for findings using `nyx:ignore` comments * feat: add confidence levels to AST patterns and update output structure * feat: implement low-noise prioritization system with category filtering, rollup grouping, and configurable budgets * feat: bump version to 0.4.0 and update changelog with new features and improvements * feat: add dead code allowances to various functions in mod.rs and real_world_tests.rs
This commit is contained in:
parent
19b578c5c4
commit
1bbe4b1cfb
456 changed files with 25628 additions and 1228 deletions
179
CHANGELOG.md
179
CHANGELOG.md
|
|
@ -5,6 +5,185 @@ All notable changes to this project will be documented in this file.
|
|||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
|
||||
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
## [0.4.0] - 2025-02-25
|
||||
|
||||
### Added
|
||||
- **Low-noise prioritization system** — post-analysis pipeline that reduces noise from high-frequency LOW/Quality findings without hiding security signal. Three-stage process: category filtering, rollup grouping, and LOW budgets.
|
||||
- **`FindingCategory` enum** (`Security`, `Reliability`, `Quality`) — every `Diag` now carries a `category` field. AST pattern findings derive their category from `PatternCategory` metadata (`CodeQuality` → `Quality`, all others → `Security`). Taint, CFG, and state findings are always `Security`.
|
||||
- **Category filtering** — Quality-category findings (e.g. `rs.quality.unwrap`, `rs.quality.expect`) are excluded by default. Use `--include-quality` to include them.
|
||||
- **Rollup grouping** — eligible HIGH-frequency rules (`rs.quality.unwrap`, `rs.quality.expect`, `rs.quality.panic_macro`) are grouped by `(file, rule)` into a single rollup finding with occurrence count and example locations. Canonical location is the first sorted occurrence. Example count controlled by `--rollup-examples` (default 5).
|
||||
- **LOW budgets** — three configurable limits enforce noise caps: `--max-low` (default 20, total), `--max-low-per-file` (default 1), `--max-low-per-rule` (default 10). Rollups count as one finding for all budgets. High/Medium findings are never dropped.
|
||||
- **`--all` CLI flag** — disables all prioritization (no category filtering, no rollups, no budgets).
|
||||
- **`--show-instances <RULE>`** — bypasses rollup for a specific rule, expanding all individual occurrences.
|
||||
- **Console suppression footer** — when findings are suppressed, a footer displays the count and active filter values with adjustment hints.
|
||||
- **`rollup` field on `Diag`** — optional `RollupData` with `count` and `occurrences` (example `Location`s). Serializes to JSON automatically; omitted when not a rollup.
|
||||
- **SARIF rollup support** — `category` in result properties, rollup count in `properties.rollup.count`, example locations in `relatedLocations`.
|
||||
- **`max_results` severity stability** — when `max_results` truncation is needed, High findings are kept first, then Medium, then Low. Low findings never displace higher-severity ones.
|
||||
- New config fields in `[output]`: `include_quality`, `show_all`, `max_low`, `max_low_per_file`, `max_low_per_rule`, `rollup_examples`.
|
||||
- 14 new unit tests covering category filtering, rollup grouping/examples/canonical, LOW budgets (per-file/per-rule/total), High/Medium immunity, rollup-counts-as-one, show_instances bypass, JSON serialization, and determinism.
|
||||
- **Pattern-level confidence for AST rules** — each AST pattern in `src/patterns/` now carries an explicit `confidence: Confidence` field (High, Medium, or Low). Confidence is set at the pattern definition site and flows directly into emitted `Diag`s, replacing the old heuristic that inferred AST confidence from severity alone. `compute_confidence()` is retained as a fallback for detectors that don't set confidence (taint, state, legacy).
|
||||
- Tier A patterns with High/Medium severity → `Confidence::High` (deterministic structural match).
|
||||
- Tier A patterns with Low severity → `Confidence::Medium` (quality/crypto signals).
|
||||
- Tier B patterns (heuristic-guarded) → `Confidence::Medium`.
|
||||
- Example: `rs.quality.expect` now produces `Confidence: High` regardless of its Low severity.
|
||||
- **Inline per-finding suppressions** — suppress specific findings directly in source code using `nyx:ignore` comments. Two directive forms: `nyx:ignore <RULE_ID>` (same line) and `nyx:ignore-next-line <RULE_ID>` (next line). Supports comma-separated IDs, wildcard suffixes (`rs.quality.*`), and automatic canonicalization of taint rule IDs (parenthetical suffixes stripped). Comment detection covers all 10 languages with string/raw-string/template-literal guards to avoid false positives.
|
||||
- **`--show-suppressed` CLI flag** — reveal suppressed findings in output, dimmed with `[SUPPRESSED]` tag. Summary shows `"N issues (M suppressed)"`. In JSON/SARIF mode, suppressed findings include `"suppressed": true` and `"suppression": {...}` metadata fields.
|
||||
- **`suppressed` and `suppression` fields on `Diag`** — conditionally serialized; JSON output is unchanged when no suppressions are active.
|
||||
- Suppressed findings are excluded from `--fail-on` exit-code checks and severity counts.
|
||||
- New module `src/suppress/mod.rs` with 22 unit tests covering all comment styles, string guards, wildcard matching, canonicalization, CRLF, and edge cases.
|
||||
- **`--min-score <N>` CLI flag and `output.min_score` config option** — filter out findings whose attack-surface rank score falls below the given threshold. Applied after ranking and severity filtering, before `max_results` truncation. Has no effect when `--no-rank` is used. CLI value overrides config.
|
||||
- **Attack surface ranking** — deterministic post-analysis scoring layer that prioritizes findings by exploitability. Each `Diag` receives an `f64` score computed from five components: severity base (High=60, Medium=30, Low=10), analysis kind bonus (taint +10 > state +8 > cfg +3/5 > ast 0), evidence strength (+1 per item, +2–6 for source-kind priority), state rule type bonus (+1–6), and a path-validation penalty (−5 for guarded paths). Findings are sorted by descending score before truncation so `max_results` keeps the most important results. Tie-breaking is deterministic by severity, rule ID, file path, line, column, and message hash.
|
||||
- **`rank_score` and `rank_reason` fields on `Diag`** — optional fields with `#[serde(skip_serializing_if = "Option::is_none")]`; JSON output is unchanged when ranking is disabled.
|
||||
- **`--no-rank` CLI flag** — disables attack-surface ranking (enabled by default).
|
||||
- **`output.attack_surface_ranking` config key** — boolean (default `true`) to control ranking via config file.
|
||||
- **Console score display** — dim `Score: N` appended to each finding's header line when ranking is enabled.
|
||||
- **New module `src/rank.rs`** — `compute_attack_rank()`, `rank_diags()`, and `sort_key()` functions. Scoring uses only in-memory data; no extra file I/O or graph recomputation.
|
||||
- 10 new unit tests: ordering correctness (high taint > medium file-io, must-leak > may-leak, taint > cfg-only, state rules, AST lowest at same severity), determinism (input-order-independent), path-validation penalty, and JSON serialization (rank fields omitted when None, present when set).
|
||||
- **State-model dataflow analysis** — new `src/state/` module implementing a forward worklist dataflow engine over the existing CFG. Tracks per-variable resource lifecycle (`UNINIT`, `OPEN`, `CLOSED`, `MOVED`) via bitset lattice and per-path authentication level (`Unauthed`, `Authed`, `Admin`) as a composable product domain. Detects:
|
||||
- **Use-after-close** (`state-use-after-close`, High) — variable read/written after its resource handle was closed.
|
||||
- **Double-close** (`state-double-close`, Medium) — resource handle closed more than once.
|
||||
- **Must-leak** (`state-resource-leak`, High) — resource acquired but never closed on any exit path.
|
||||
- **May-leak** (`state-resource-leak-possible`, Medium) — resource open on some but not all exit paths (branch-aware via lattice join).
|
||||
- **Unauthenticated access** (`state-unauthed-access`, High) — sensitive sink reached without a preceding auth/admin check.
|
||||
- **State analysis architecture** — six-module design:
|
||||
- `lattice.rs` — `Lattice` trait (`bot`, `join`, `leq`) for generic fixed-point computation.
|
||||
- `domain.rs` — `ResourceLifecycle` (bitflag), `ResourceDomainState`, `AuthLevel`, `AuthDomainState`, `ProductState` with lattice impls.
|
||||
- `symbol.rs` — `SymbolInterner` that builds a string-interning table from CFG node defines/uses; `SymbolId` newtype.
|
||||
- `transfer.rs` — `DefaultTransfer` function: maps CFG node kinds (Call, Assignment, If, Return) to state transitions using the existing `ResourcePair` definitions from `cfg_analysis::rules`. Emits `TransferEvent` for illegal transitions.
|
||||
- `engine.rs` — two-phase forward worklist solver: Phase 1 iterates to a fixed point (no events collected to avoid spurious reports from intermediate states); Phase 2 re-applies transfer once over converged states to collect events. Bounded by `MAX_TRACKED_VARS` (64) with guarded degradation.
|
||||
- `facts.rs` — post-analysis pass: extracts `StateFinding`s from transfer events (use-after-close, double-close) and exit-node state inspection (must-leak, may-leak, unauthed access).
|
||||
- **`scanner.enable_state_analysis` config option** — opt-in boolean (default `false`) in `ScannerConfig` and `default-nyx.conf`. Requires CFG mode (`full` or `taint`).
|
||||
- **`Diag.message` field** — optional human-readable message on diagnostic output. State findings carry variable-specific context (e.g. "variable `f` used after close"). Surfaced in console output (dimmed line below the finding), JSON, and SARIF (`message.text` prefers per-finding message over generic rule description).
|
||||
- **State finding dedup** — when state analysis produces findings on a line, overlapping `cfg-resource-leak` and `cfg-auth-gap` findings on the same line are suppressed (state analysis is more precise).
|
||||
- **SARIF rule descriptions** for all five state rule IDs.
|
||||
- 21 integration tests (`tests/state_tests.rs`) with 19 C fixture files covering: use-after-close, double-close, resource leak, clean usage, opt-in gating, may-leak vs must-leak branch semantics, early return, nested branches, both-branches-close, loop convergence, loop use-after-close, handle overwrite, reopen-after-close, multiple handles, conservative join masking, chain operations, malloc/free pairs, straight-line double-close, and message field population.
|
||||
- 30+ unit tests across state modules: lattice properties, lifecycle join/leq, domain merging, auth-level join, product state composition, may/must leak semantics, symbol interning, and transfer event generation.
|
||||
- **`--severity <EXPR>` filter** — replaces `--high-only` with a flexible severity expression supporting single levels (`HIGH`), comma lists (`HIGH,MEDIUM`), and thresholds (`>=MEDIUM`). Parsing is case-insensitive with whitespace tolerance. `SeverityFilter` type with `parse()` and `matches()` in `patterns/mod.rs`.
|
||||
- **`--mode <full|ast|cfg|taint>`** — replaces `--ast-only` and `--cfg-only` with a single canonical analysis mode flag. Enforces mutual exclusivity via clap `ValueEnum`.
|
||||
- **`--index <auto|off|rebuild>`** — replaces `--no-index` and `--rebuild-index` with a single flag (default `auto`).
|
||||
- **`--fail-on <SEVERITY>`** — CI ergonomics: exit code 1 if any emitted finding meets or exceeds the threshold severity. Example: `--fail-on HIGH`.
|
||||
- **`--quiet`** — CLI flag to suppress all human-readable status output (equivalent to `output.quiet = true` in config).
|
||||
- **`--keep-nonprod-severity`** — renamed from `--include-nonprod` for clarity; old name kept as hidden alias.
|
||||
- **`OutputFormat` enum** — `--format` now uses clap `ValueEnum` with typed `Console`, `Json`, `Sarif` variants (default `Console`). No more empty-string default.
|
||||
- 10 new unit tests: `SeverityFilter` parsing (single, comma list, threshold, case-insensitive, whitespace, empty rejection, invalid level rejection), `Severity::from_str` rejection of unknown values, and `severity_filter_applied_at_output_stage` integration test verifying that downgraded findings are correctly filtered.
|
||||
- **AST pattern overhaul** -- all 10 language pattern files (`src/patterns/*.rs`) rewritten with consistent conventions, structured metadata, and validated tree-sitter queries.
|
||||
- **Pattern schema extensions** -- `PatternTier` (A = structural, B = heuristic-guarded), `PatternCategory` (13 vulnerability classes), and `Hash` on `Severity`. Module-level docs explain conventions and how to add new patterns.
|
||||
- **Namespaced IDs** -- all pattern IDs follow `<lang>.<category>.<specific>` format (e.g. `java.deser.readobject`, `py.cmdi.os_system`, `js.xss.document_write`).
|
||||
- **New vulnerability coverage** -- 30+ new patterns across languages: Python deserialization (`pickle.loads`, `yaml.load`, `shelve.open`), Python command injection (`os.system`, `os.popen`), Python weak crypto (`hashlib.md5/sha1`), Java reflection (`Method.invoke`), Java weak digest (`MessageDigest.getInstance("MD5")`), Java XSS (`getWriter().println`), Go TLS misconfiguration (`InsecureSkipVerify: true`), Go SQL concat, Go hardcoded secrets, Go gob deserialization, PHP `assert()` code exec, PHP `include $var` path traversal, PHP weak crypto (`md5`/`sha1`/`rand`), C/C++ `popen()`, C/C++ format-string with variable first arg, C++ `const_cast`, Ruby `Digest::MD5`.
|
||||
- **Query fixes** -- fixed 11 broken tree-sitter queries: Java `object_creation_expression` used wrong type node (`identifier` → `type_identifier`), C++ `reinterpret_cast`/`const_cast` used non-existent node types (→ `template_function` match), Ruby backtick used `shell_command` (→ `subshell`), Python SQL used `binary_expression` (→ `binary_operator`), TypeScript `as any` used inaccessible field (→ positional child), PHP patterns missing `argument` wrapper nodes, Rust `unsafe fn` regex used unsupported `\b`.
|
||||
- **No-duplicate rule** -- patterns that overlap with taint sinks use distinct ID namespaces and are documented; dedup in `ast.rs` prevents duplicate findings at the same location.
|
||||
- **Severity recalibration** -- `unwrap`/`expect`/`panic!`/`todo!` moved to Low (filtered by default `min_severity`). Security patterns remain High/Medium.
|
||||
- **Pattern test suite** (`tests/pattern_tests.rs`, 26 tests) -- sanity checks (unique IDs, query compilation, non-empty descriptions, naming convention, severity distribution), positive fixture tests (10 languages), and negative fixture tests (10 languages verifying no false positives on safe code).
|
||||
- **Pattern test fixtures** -- positive and negative fixture files for all 10 languages under `tests/fixtures/patterns/<lang>/`.
|
||||
- **Real world test suite** — comprehensive fixture-based test suite (`tests/real_world_tests.rs`) with ~180 test fixtures across all 10 supported languages (C, C++, Go, Java, JavaScript, PHP, Python, Ruby, Rust, TypeScript). Each fixture has an `.expect.json` file declaring expected findings (with `must_match` for hard requirements and soft expectations for aspirational coverage). Fixtures are organized by analysis type (`taint/`, `state/`, `cfg/`, `mixed/`) under `tests/fixtures/real_world/<lang>/`. A single parameterized test runner validates all fixtures in both `full` and `ast` modes, with verbose output via `NYX_TEST_VERBOSE=1`.
|
||||
|
||||
|
||||
### Changed
|
||||
- **Console header line now includes confidence** — the finding header shows score and confidence together as a parenthesized suffix: `(Score: 36, Confidence: Medium)`. The previous standalone `Confidence: ...` body line is removed. All four combinations are handled (both, score-only, confidence-only, neither).
|
||||
- **Confidence display uses Title Case** — `Confidence::Display` now renders as `Low`, `Medium`, `High` (previously lowercase).
|
||||
- **Breaking**: Config and data directory changed from `dev.ecpeter23.nyx` to `nyx` (e.g. `~/Library/Application Support/nyx/` on macOS). Existing config files (`nyx.conf`, `nyx.local`) and SQLite indexes at the old path will not be picked up automatically — copy them to the new location or re-run `nyx scan` to regenerate.
|
||||
- **Improved diagnostic output formatting** — overhauled console renderer for a professional, security-tool-grade look:
|
||||
- Severity is now the strongest visual anchor: HIGH (bold red with ✖), MEDIUM (bold orange ⚠), LOW (muted blue-gray ●). Fewer colors, clearer hierarchy.
|
||||
- File paths rendered dim blue (never brighter than severity).
|
||||
- Taint flow messages now use `→` arrow between shortened source/sink instead of backtick-wrapped text.
|
||||
- Evidence values (Source, Sink) no longer wrapped in backticks — cleaner rendering with no risk of broken backtick spans across wrapped lines.
|
||||
- **Fixed taint expression rendering** — multi-line sink/source call chains are now normalised before display:
|
||||
- Whitespace collapsed (`foo() .bar()` → `foo().bar()`).
|
||||
- Newlines joined into single-line canonical form.
|
||||
- Spacing artefacts between `)` and `.` in method chains cleaned up.
|
||||
- Long chains truncated with `…` ellipsis.
|
||||
- Added `terminal_size` dependency for terminal-width-aware line wrapping.
|
||||
- **Monotone forward dataflow taint analysis** — replaced the BFS taint engine in `taint/mod.rs` with a proper worklist-based forward dataflow analysis where termination is guaranteed by lattice finiteness. The generic `Transfer<S: Lattice>` trait in `state/engine.rs` now powers both the resource lifecycle/auth analysis and taint analysis.
|
||||
- **`TaintState` lattice** (`taint/domain.rs`) — bounded abstract state with per-variable `VarTaint` (Cap bitflags + multi-origin tracking via `SmallVec<[TaintOrigin; 2]>`), dual validation bitsets (`validated_must` for intersection/all-paths, `validated_may` for union/any-path), and monotone `PredicateSummary` for contradiction pruning. Variables stored in sorted `SmallVec` keyed by `SymbolId` for O(n) merge-join. Lattice height bounded at ~8700 (7-bit Cap × 64 vars + validation bits + predicate bits).
|
||||
- **`TaintTransfer`** (`taint/transfer.rs`) — implements `Transfer<TaintState>` with identical taint logic to the old BFS (source → propagation → sanitization → sink check). Callee resolution unchanged (local → global same-lang → interop edges). Emits `TaintEvent::SinkReached` events during Phase 2 of the engine.
|
||||
- **JS/TS two-level solve** — prevents cross-function taint leakage (the main source of state explosion in the old BFS) while preserving global-to-function flows. Level 1 solves top-level code; Level 2 solves each function seeded with read-only top-level taint via `global_seed`.
|
||||
- **Monotone predicate tracking** — path-sensitivity predicates moved from per-BFS-item `PathState` (which duplicated state exponentially) to monotone `PredicateSummary` in the lattice. Contradiction pruning uses `known_true & known_false` bit intersection (NullCheck/EmptyCheck/ErrorCheck only), which is both more precise and guaranteed monotone.
|
||||
- **Multi-origin tracking** — each tainted variable tracks up to 4 `TaintOrigin` (node + `SourceKind`), enabling multiple findings when distinct sources flow to the same sink.
|
||||
- **Guaranteed termination** — no more `MAX_BFS_ITERATIONS`/`MAX_SEEN_STATES` safety nets needed (though a 100K worklist iteration budget remains as defense-in-depth). Convergence follows from finite lattice height × finite CFG edges.
|
||||
- **`analyse_file()` signature unchanged** — `Finding` struct, `Diag` conversion, and all callers are unaffected.
|
||||
- **Generic dataflow engine** (`state/engine.rs`) — `run_forward()` and `DataflowResult` are now generic over any `S: Lattice` + `T: Transfer<S>`. `DefaultTransfer` (resource lifecycle) implements `Transfer<ProductState>`; `TaintTransfer` implements `Transfer<TaintState>`. Per-domain iteration budget and `on_budget_exceeded` hooks added.
|
||||
- **`path_state.rs` simplified** — removed `PathState`, `Predicate`, `MAX_PATH_PREDICATES`, `state_hash()`, `priority()` structs/methods. Kept `PredicateKind` enum and `classify_condition()` function (used by the new transfer for predicate classification).
|
||||
- **Removed BFS infrastructure** — `taint_hash()`, BFS `Item` struct, `pred` predecessor map, two-tier seen-state map, and all bail-out constants (`MAX_BFS_ITERATIONS=200K`, `MAX_SEEN_STATES=100K`, `PATH_SENSITIVITY_NODE_LIMIT=500`, `PATH_SENSITIVITY_QUEUE_LIMIT=10K`, `MAX_PATH_VARIANTS_PER_KEY=4`) are no longer needed and have been removed.
|
||||
- **Severity filtering applied at output stage** — `--severity` (and legacy `--high-only`) filtering is now applied ONCE in `scan::handle()` after all severity normalization (nonprod downgrades, dedup, truncation). Previously `--high-only` only filtered AST patterns during analysis; taint and CFG findings bypassed the filter entirely.
|
||||
- **`--format` default is `console`** — previously defaulted to empty string, requiring fallback logic.
|
||||
- **All status/progress output goes to stderr** — "Checking...", "Finished in...", config notes, and progress bars now use `eprintln!`/stderr exclusively. JSON and SARIF output is stdout-only.
|
||||
- **`Severity::from_str` returns `Err` for unknown values** — previously returned `Ok(Severity::Low)` for any unrecognized input.
|
||||
- **Deprecated CLI flags preserved as hidden aliases** — `--high-only`, `--no-index`, `--rebuild-index`, `--ast-only`, `--cfg-only`, and `--include-nonprod` are hidden from help but still functional, mapping to their canonical replacements.
|
||||
- **Path-sensitive taint analysis** -- the BFS taint engine now carries a `PathState` (bounded set of branch predicates) alongside the taint map. When the BFS traverses a True or False edge from an `If` node, it records a `Predicate` with the condition's variables, kind, and polarity. This enables two new capabilities:
|
||||
- **Infeasible path pruning** -- paths with contradictory predicates (e.g. `if x.is_none() { return; } if x.is_none() { sink }`) are detected and pruned, eliminating false positives on code guarded by redundant null/empty/error checks. Contradiction detection is conservative: only whitelisted kinds (`NullCheck`, `EmptyCheck`, `ErrorCheck`) with single-variable predicates are pruned.
|
||||
- **Validation guard annotation** -- when all tainted variables reaching a sink are guarded by a `ValidationCall` predicate (e.g. `if validate(&x) { sink }` or `if !validate(&x) { return; } sink`), the finding is annotated with `path_validated: true` and `guard_kind: ValidationCall`. This metadata is surfaced in JSON and console output without changing severity.
|
||||
- **Condition metadata on CFG nodes** -- `NodeInfo` now carries `condition_text`, `condition_vars`, and `condition_negated` for `If` nodes, extracted during CFG construction. Negation detection handles `!expr`, `not expr`, and Ruby `unless`. Classification of condition text into `PredicateKind` (NullCheck, EmptyCheck, ErrorCheck, ValidationCall, SanitizerCall, Comparison, Unknown) is conservative: call-based kinds require `(` in the text and a matching callee token.
|
||||
- **`path_validated` and `guard_kind` fields on `Diag`** -- taint findings carry path-sensitivity metadata in JSON output (fields omitted when not set) and console output (suffix line `Path guard: ValidationCall` when present). Finding IDs are unchanged for dedup stability.
|
||||
- **`smallvec` dependency** -- used for inline-allocated predicate storage in `PathState` (avoids heap allocation for the common case of ≤4 predicates per path).
|
||||
- **Interprocedural call graph** -- a whole-program `CallGraph` (`petgraph::DiGraph<FuncKey, CallEdge>`) is now built between Pass 1 and Pass 2 of every taint-enabled scan. Each function definition is a node; resolved callee relationships are edges. The graph is constructed from the merged `GlobalSummaries` and is available in both the filesystem and indexed scan paths.
|
||||
- **Three-valued callee resolution** -- `CalleeResolution` enum distinguishes `Resolved(FuncKey)`, `NotFound`, and `Ambiguous(Vec<FuncKey>)`. Ambiguous callees (same name in multiple namespaces, caller in a third namespace) are tracked separately from missing callees for diagnostics.
|
||||
- **Shared resolution helper** -- `GlobalSummaries::resolve_callee_key()` centralizes same-language callee resolution with arity-aware filtering and namespace disambiguation. Both the call graph builder and the taint engine now use the same resolution logic.
|
||||
- **Callee-name normalization** -- `normalize_callee_name()` extracts the last segment from qualified callee text (`"env::var"` → `"var"`, `"obj.method"` → `"method"`) before resolution. The raw call-site text is preserved on graph edges for diagnostics.
|
||||
- **SCC / topological analysis** -- `CallGraphAnalysis` computes strongly connected components via Tarjan's algorithm and exposes a callee-first (leaves-first) topological ordering of SCC indices, ready for future bottom-up taint propagation.
|
||||
- **Call graph tracing** -- `tracing::info!` log with node count, edge count, unresolved-not-found count, unresolved-ambiguous count, and SCC count is emitted after every call graph build.
|
||||
- 8 new path-sensitivity integration tests: early-return validation guard, failed-validation branch, contradictory null-check pruning, if/else validation annotation, sanitize-one-branch regression, path-state budget graceful degradation, unknown-predicate non-pruning, multi-var non-pruning.
|
||||
- 35 new unit tests in `taint::path_state`: classify_condition variants, PathState push/truncation, contradiction detection (whitelisted kinds, single-var only), has_validation_for semantics, state_hash determinism, priority ordering.
|
||||
- 11 new unit tests: callee normalization, same-name-different-namespaces resolution, cross-language isolation, arity separation, recursive SCC detection, not-found vs ambiguous diagnostics, diamond topo ordering, interop edge resolution, namespace normalization consistency, and raw call-site preservation.
|
||||
- **Edge-aware taint traversal** -- `analyse_file()` now uses `cfg.edges(node)` instead of `cfg.neighbors(node)`, inspecting `EdgeKind` on each edge. This is required for predicate recording but also makes the taint engine aware of the CFG's branch structure for the first time.
|
||||
- **Two-tier seen-state deduplication** -- the BFS seen-state map changed from `HashSet<(NodeIndex, u64)>` to a `HashMap` keyed by `(NodeIndex, taint_hash)` mapping to a bounded list of `(path_hash, priority)` pairs. At most `MAX_PATH_VARIANTS_PER_KEY` (4) path variants are tracked per taint state, with deterministic eviction preferring non-truncated states with fewer predicates.
|
||||
- **Finding deduplication** -- taint findings are now deduplicated by `(sink, source)` pair after analysis, preferring findings with `path_validated = true` (most informative metadata).
|
||||
- **`taint::Finding` struct** -- added `path_validated: bool` and `guard_kind: Option<PredicateKind>` fields. Code that constructs `Finding` directly must include these fields.
|
||||
- **`Diag` struct** -- added `path_validated: bool` and `guard_kind: Option<String>` fields. Both use `#[serde(skip_serializing_if)]` to omit from JSON when not set.
|
||||
- **`taint::resolve_callee()` refactored** -- the global resolution step now delegates to `GlobalSummaries::resolve_callee_key()` and applies `normalize_callee_name()` before lookup, unifying resolution logic with the call graph builder.
|
||||
- **Label rules expanded across 8 languages:**
|
||||
- **Go** — added `r.URL.Query`, `r.URL.Query.Get`, `Request.FormValue`, `Request.URL` sources; `filepath.Clean`/`filepath.Base` sanitizers; `fmt.Fprintf`/`fmt.Sprintf`/`fmt.Printf` format-string sinks; `os.Open`/`os.OpenFile`/`os.Create`/`ioutil.ReadFile`/`os.ReadFile` FILE_IO sinks; `template.HTML` HTML sink; `db.QueryRow`/`db.Prepare` SQL sinks.
|
||||
- **PHP** — sources now match both `$_GET` and `_GET` (without `$` prefix, matching collect_idents stripping); added `$_FILES`/`_FILES`, `$_SERVER`/`_SERVER`, `$_ENV`/`_ENV` sources; `eval`/`assert` shell sinks; `include`/`include_once`/`require`/`require_once` FILE_IO sinks; `unserialize` sink; `move_uploaded_file`/`copy`/`file_put_contents`/`fwrite` FILE_IO sinks; `basename` FILE_IO sanitizer; `query` SQL sink.
|
||||
- **Java** — added `readObject`/`readLine` sources; `ProcessBuilder` shell sink; `Class.forName` reflection sink; `println`/`print`/`write` HTML sinks.
|
||||
- **Python** — added `send_file`/`send_from_directory` FILE_IO sinks; `os.path.realpath` FILE_IO sanitizer; `open` changed from source to FILE_IO sink (fixes source/sink conflict for path traversal detection).
|
||||
- **Ruby** — `params` source detection now works via subscript handling.
|
||||
- **Rust** — added `fs::read_to_string`/`fs::write`/`fs::read`/`File::open`/`File::create` as FILE_IO sinks; `fs::read_to_string` removed from sources (was source/sink conflict).
|
||||
- **C/C++** — added `fopen`/`open` as FILE_IO sinks.
|
||||
- **Ruby `rb.cmdi.system_interp` pattern broadened** — no longer requires string interpolation in arguments; now matches any `system`/`exec` call, promoted from Tier B to Tier A.
|
||||
- **C++ `cpp.cmdi.popen` pattern added** — `popen()` command execution detection for C++, using the language-namespaced ID (the C pattern retains `c.cmdi.popen`).
|
||||
- **Test config enables state analysis** — `test_config()` now sets `enable_state_analysis = true`.
|
||||
|
||||
|
||||
### Fixed
|
||||
- **Taint source kind misclassified as "unknown" for non-call sources** — source-bearing nodes with `CallWrapper` or `Assignment` kind (e.g. `userInput = req.query.data`) had their `callee` field set to `None` because the CFG builder only populated `callee` for `StmtKind::Call` nodes. This caused `infer_source_kind()` to receive an empty string, failing to match any keyword pattern and defaulting to `SourceKind::Unknown`. Fixed by also setting `callee` when a label (Source/Sink/Sanitizer) is detected, so the extracted member text (e.g. "req.query") flows through to source kind inference. Affects severity classification and diagnostic output for property-access sources across all languages.
|
||||
- **Full KINDS map audit across all 10 languages** — 89 missing tree-sitter node types added to KINDS maps so the CFG builder no longer silently drops code inside switch/case, try/catch/finally, class bodies, closures/lambdas, and other container nodes. Previously, any node not in a language's KINDS map hit the `build_sub` fallback which created a terminal Seq node without recursing into children, effectively making all wrapped code invisible to analysis.
|
||||
- **C** (+3): `switch_statement`, `case_statement`, `labeled_statement`
|
||||
- **C++** (+7, 1 fix): `switch_statement`, `case_statement`, `labeled_statement`, `throw_statement` (Return), `try_statement`, `catch_clause`, `lambda_expression`; **critical fix**: `namespace_definition` changed from `Trivia` to `Block` (all function definitions inside namespaces were silently dropped)
|
||||
- **Java** (+11): `do_statement` (While), `throw_statement` (Return), `switch_expression`, `switch_block`, `switch_block_statement_group`, `try_statement`, `catch_clause`, `finally_clause`, `lambda_expression`, `constructor_body`, `static_initializer`
|
||||
- **JavaScript** (+11): `switch_statement`, `switch_body`, `switch_case`, `switch_default`, `try_statement`, `catch_clause`, `finally_clause`, `class_declaration`, `class` (expression), `class_body`, `export_statement`
|
||||
- **TypeScript** (+13): all JS switch/try/class entries plus `abstract_class_declaration`, `export_statement`, `enum_declaration` (Trivia)
|
||||
- **PHP** (+11): `do_statement` (While), `throw_expression` (Return), `switch_statement`, `switch_block`, `case_statement`, `default_statement`, `try_statement`, `catch_clause`, `finally_clause`, `colon_block`, `class_declaration`
|
||||
- **Python** (+7): `try_statement`, `except_clause`, `finally_clause`, `class_definition`, `decorated_definition`, `match_statement`, `case_clause`
|
||||
- **Ruby** (+11): `until` (While), `begin`, `rescue`, `ensure`, `case`, `when`, `class`, `module`, `singleton_method` (Function), `do`, `block`
|
||||
- **Go** (+10): `expression_switch_statement`, `type_switch_statement`, `expression_case`, `type_case`, `default_case`, `select_statement`, `communication_case`, `go_statement`, `defer_statement`, `func_literal` (Function)
|
||||
- **Rust** (+5, 1 removal): `closure_expression`, `async_block`, `impl_item`, `trait_item`, `declaration_list`; removed dead `loop_statement` entry (node doesn't exist in tree-sitter-rust 0.24.0)
|
||||
- Removed unused `Kind::LoopBody` enum variant from `labels/mod.rs` (no arm in `build_sub`, last reference was the dead Rust `loop_statement` entry)
|
||||
- **CFG: `else_clause` not recursed into for C/C++** — tree-sitter's C and C++ grammars wrap else bodies in an `else_clause` node. This node was missing from both languages' `KINDS` maps, so the CFG builder's fallback arm treated it as a terminal `Seq` node without descending into children. All statements inside else blocks (e.g. `fclose(f)`) were silently dropped from the CFG, causing false-positive resource leak and incorrect branch analysis. Fixed by mapping `"else_clause" => Kind::Block` in `src/labels/c.rs` and `src/labels/cpp.rs`.
|
||||
- **CFG: `else_clause` missing from Rust, JavaScript, TypeScript, Python, PHP KINDS maps** — same bug class as C/C++: tree-sitter wraps else bodies in an `else_clause` node that was not in KINDS, silently dropping all code inside else blocks from the CFG. Fixed by mapping `"else_clause" => Kind::Block` in all five languages. Also added `"elif_clause" => Kind::Block` (Python), `"else_if_clause" => Kind::Block` (PHP), and `"elsif" => Kind::If` (Ruby) to handle chained elif/elsif nodes.
|
||||
- **Rust KINDS using wrong tree-sitter node names** — tree-sitter-rust uses `_expression` suffixes (not `_statement`) for `while`, `for`, and `return` nodes. The existing `while_statement`, `for_statement`, and `return_statement` entries were dead code (0 grammar matches). Added `while_expression`, `for_expression`, and `return_expression` mappings.
|
||||
- **Rust `match_expression`, `match_block`, `match_arm`, `unsafe_block` missing from KINDS** — these wrapper nodes were not mapped, causing all code inside match arms and unsafe blocks to be silently dropped from the CFG. Mapped to `Kind::Block` for sequential traversal.
|
||||
- **TypeScript missing `throw_statement` and `do_statement`** — `throw` was mapped in JavaScript but not TypeScript; `do_statement` (do-while loops) was missing from both JS and TS. Added `"throw_statement" => Kind::Return` and `"do_statement" => Kind::While` to both languages.
|
||||
- **Python `raise_statement` and `with_statement` missing from KINDS** — `raise` terminates the current path (mapped to `Kind::Return`); `with` wraps code in a context manager (mapped to `Kind::Block`). Both were silently dropping enclosed code.
|
||||
- **Dead KINDS entries removed** — `"for_of_statement"` in TypeScript (0 grammar matches; TS inherits `for_in_statement` from JS) and `"method_call"` in Ruby (0 grammar matches; Ruby only has `call`).
|
||||
- **`--high-only` emitting Low/Medium taint and CFG findings** — severity filter was only applied to AST pattern queries during analysis. Taint findings (whose severity derives from `SourceKind`) and CFG structural findings passed through unfiltered. The filter is now applied at the final output stage after all severity normalization, ensuring `--severity HIGH` never emits downgraded Medium/Low findings.
|
||||
- **JSON/SARIF output contaminated with status messages on stdout** — status messages ("Checking...", "Finished in...") used `println!` and appeared in stdout alongside machine output. Now all status goes to stderr.
|
||||
- **CFG: False edge to then-block exits in no-else if statements** -- previously, `if (cond) { body }` without an else block created a `False` edge from the condition node directly to the then-block's exit nodes. This made the false path appear to traverse the then-block, causing incorrect predicate polarity in path-sensitive analysis and duplicate taint findings with contradictory metadata. The CFG now creates a synthetic pass-through `Seq` node for the false path with an explicit `False` edge from the condition, correctly modeling "skip the then-block." This also fixes the frontier: previously, the no-else non-terminating case duplicated `then_exits` in the frontier (`then_exits ++ then_exits.clone()`); it now correctly produces `then_exits ∪ [pass_through]`.
|
||||
- **Taint BFS non-termination on large JS files** — the BFS taint engine in `taint/mod.rs` had no global iteration bound. The seen-state deduplication keyed on `(node, taint_hash)`, so every distinct taint map at a CFG node was treated as a novel state. In files with loops and many tainted variables (e.g. a 2,200-line JS file with 18+ top-level variables tainted via `window.location.search`), each loop iteration produced a slightly different taint map, causing the BFS to revisit loop bodies indefinitely. Both `--no-index` and `--rebuild-index` scans hung near completion (progress showed e.g. 87/88 files). Fixed by adding two hard bounds: `MAX_BFS_ITERATIONS` (200,000 queue pops) and `MAX_SEEN_STATES` (100,000 unique `(node, taint_hash)` entries in the seen-state map). When either limit is reached the analysis bails out gracefully and returns all findings collected so far. A `tracing::warn!` is emitted on iteration-limit bail-out. Normal files are unaffected (typical BFS uses <1,000 iterations).
|
||||
- **Rust `if let` / `while let` taint propagation** — the CFG builder now extracts pattern bindings from `let_condition` nodes as variable definitions in `def_use()`, and classifies the value expression (e.g. `env::var("CMD")`) for source/sink labels in `push_node()`. Previously, `if let Ok(cmd) = env::var("CMD") { Command::new("sh").arg(&cmd) }` produced no taint finding because `cmd` was never recognized as a tainted definition. Now correctly detects taint flow through `if let` and `while let` bindings.
|
||||
- **C++ `popen` pattern ID collision** — renamed `c.cmdi.popen` to `cpp.cmdi.popen` in C++ patterns to fix a cross-language duplicate ID that caused `all_pattern_ids_are_globally_unique` test failure.
|
||||
- **State analysis early-return leak duplication** — `extract_findings` in `state/facts.rs` now skips early-return nodes when checking for resource leaks, only inspecting the synthesized function exit node. Previously, early-return nodes with path-specific state (OPEN only) emitted `state-resource-leak` alongside the correct `state-resource-leak-possible` from the merged exit state.
|
||||
- **Severity filter bug** — `min_severity` comparison in `ast.rs` was inverted (`<=` instead of `>`), causing all AST patterns at the minimum severity level to be silently dropped. With the default `min_severity = Low`, all Low-severity patterns (`.unwrap()`, `.expect()`, `panic!`, `todo!`, `mem::forget`, Go crypto patterns, narrow casts) were never reported. Fixed 29 test cases.
|
||||
- **Nested function analysis** — CFG builder now recurses into function expressions passed as call arguments (e.g., Express `app.get('/path', function(req, res) { ... })`, Sinatra `get '/path' do...end`). Added `collect_nested_function_nodes()` to discover `Kind::Function` nodes inside `CallWrapper`/`CallFn` AST subtrees. Also added `function_expression` to JS/TS KINDS maps, and `do_block`/`block` as `Kind::Function` in Ruby for Sinatra/Rails blocks. Anonymous functions now get unique names (`<anon@{offset}>`) to prevent scope collisions in JS two-level taint solve.
|
||||
- **Chained method call classification** — `classify()` now normalizes chained calls like `r.URL.Query().Get` by stripping internal `()` between `.` segments, producing `r.URL.Query.Get`. Suffix matching is attempted against both the original head and the normalized form, fixing Go HTTP handler source detection and similar patterns.
|
||||
- **Subscript access source detection** — `first_member_label` and `first_member_text` now handle `subscript_expression`, `subscript`, and `element_reference` nodes, enabling source classification for PHP `$_GET['cmd']`, Ruby `params[:cmd]`, and Python `os.environ['KEY']`.
|
||||
- **Return-statement call extraction** — `Kind::Return` added to the node types that extract inner call identifiers via `first_call_ident`, fixing cases like `return send_file(path)` where the sink was not classified.
|
||||
- **Nested call classification** — new `find_classifiable_inner_call()` tries all nested calls when the outermost one doesn't classify, fixing `str(eval(expr))` where `eval` is a sink wrapped in a non-sink call.
|
||||
- **Java `new` expression text extraction** — added `type` field fallback in `push_node` and `first_call_ident` for `CallFn` nodes, fixing `new ProcessBuilder(...)` not matching as a sink.
|
||||
- **Function body lookup for anonymous functions** — `Kind::Function` handler now falls back to finding a `Kind::Block` child when `child_by_field_name("body")` returns None, supporting JS/TS anonymous function expressions and Ruby blocks.
|
||||
- **Function-level resource leak detection** — `extract_findings` in `state/facts.rs` now inspects per-function Return nodes for leaked resources, not just the file-level Exit node. Previously, variables from one function could be overwritten by same-named variables in subsequent functions, masking leaks.
|
||||
- **Use-after-free for memory functions** — added `strcpy`, `strncpy`, `memcpy`, `memmove`, `memset`, `memcmp`, `strcmp`, `strncmp`, `strlen`, `sprintf`, `snprintf` to `RESOURCE_USE_PATTERNS` in state analysis, enabling use-after-free detection for common C/C++ string and memory functions.
|
||||
|
||||
## [0.3.0] - 2026-02-25
|
||||
|
||||
### Added
|
||||
|
|
|
|||
|
|
@ -61,7 +61,7 @@ representative at an online or offline event.
|
|||
|
||||
Instances of abusive, harassing, or otherwise unacceptable behavior may be
|
||||
reported to the community leaders responsible for enforcement at
|
||||
**opening a private issue** at [https://github.com/ecpeter23/nyx/issues/new/choose]().
|
||||
**opening a private issue** at [https://github.com/elicpeter/nyx/issues/new/choose]().
|
||||
All complaints will be reviewed and investigated promptly and fairly.
|
||||
|
||||
All community leaders are obligated to respect the privacy and security of the
|
||||
|
|
|
|||
377
CONTRIBUTING.md
377
CONTRIBUTING.md
|
|
@ -1,142 +1,351 @@
|
|||
# Contributing to Nyx
|
||||
|
||||
First off, **thank you for taking the time to contribute!** By participating in this project, you agree to abide by the community values and expectations described in our [Code of Conduct](CODE_OF_CONDUCT.md).
|
||||
Thank you for your interest in improving Nyx. This guide covers everything you need to contribute effectively.
|
||||
|
||||
Nyx is dual‑licensed under **MIT** and **Apache‑2.0**. By submitting code, documentation, or any other material, you agree to license your contribution under these same terms.
|
||||
Please read our [Code of Conduct](CODE_OF_CONDUCT.md) before participating.
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Getting Started](#getting-started)
|
||||
2. [How to Contribute](#how-to-contribute)
|
||||
|
||||
* [Bug Reports](#bug-reports)
|
||||
* [Feature Requests](#feature-requests)
|
||||
* [Pull Requests](#pull-requests)
|
||||
3. [Development Workflow](#development-workflow)
|
||||
4. [Commit & Branching Conventions](#commit--branching-conventions)
|
||||
5. [Style Guide](#style-guide)
|
||||
6. [Security Policy](#security-policy)
|
||||
7. [Community Standards](#community-standards)
|
||||
1. [Development Setup](#development-setup)
|
||||
2. [Project Layout](#project-layout)
|
||||
3. [How to Add a New AST Pattern](#how-to-add-a-new-ast-pattern)
|
||||
4. [How to Add a New Taint Rule](#how-to-add-a-new-taint-rule)
|
||||
5. [How to Add a New Language](#how-to-add-a-new-language)
|
||||
6. [Testing](#testing)
|
||||
7. [Pull Request Guidelines](#pull-request-guidelines)
|
||||
8. [Bug Reports](#bug-reports)
|
||||
9. [Feature Requests](#feature-requests)
|
||||
10. [Release Process](#release-process)
|
||||
|
||||
---
|
||||
|
||||
## Getting Started
|
||||
## Development Setup
|
||||
|
||||
Clone the repository and build Nyx in release mode:
|
||||
### Prerequisites
|
||||
|
||||
- **Rust 1.85+** (edition 2024)
|
||||
- Git
|
||||
|
||||
### Building
|
||||
|
||||
```bash
|
||||
git clone https://github.com/<your‑org>/nyx.git
|
||||
git clone https://github.com/elicpeter/nyx.git
|
||||
cd nyx
|
||||
cargo build --release
|
||||
|
||||
cargo build # Debug build
|
||||
cargo build --release # Release build
|
||||
cargo install --path . # Install as `nyx` binary
|
||||
```
|
||||
|
||||
Run the test‑suite:
|
||||
### Running Quality Checks
|
||||
|
||||
```bash
|
||||
cargo test
|
||||
cargo test --bin nyx # Unit tests (inline in modules)
|
||||
cargo clippy --all -- -D warnings # Lint — treats warnings as errors
|
||||
cargo fmt # Format code
|
||||
cargo fmt -- --check # Check formatting without modifying
|
||||
```
|
||||
|
||||
> **Tip**: The first build downloads and compiles several `tree‑sitter` grammars. Later builds will be faster.
|
||||
> **Note**: The first build downloads and compiles tree-sitter grammars for all 10 languages. Subsequent builds are faster.
|
||||
|
||||
### Benchmarks
|
||||
|
||||
```bash
|
||||
cargo bench --bench scan_bench
|
||||
```
|
||||
|
||||
Benchmark fixtures live in `benches/fixtures/`. Criterion produces HTML reports in `target/criterion/`.
|
||||
|
||||
---
|
||||
|
||||
## How to Contribute
|
||||
## Project Layout
|
||||
|
||||
### Bug Reports
|
||||
|
||||
* Search existing [issues](https://github.com/<your‑org>/nyx/issues) to ensure the bug has not already been reported.
|
||||
* Include **steps to reproduce**, expected vs. actual behaviour, and your environment details (`nyx --version`, `rustc --version`).
|
||||
* Attach a minimal code sample if possible.
|
||||
|
||||
### Feature Requests
|
||||
|
||||
We welcome well‑motivated feature proposals. Please describe:
|
||||
|
||||
1. **Problem statement** – what pain point does this solve?
|
||||
2. **Proposed solution** – high‑level description, optionally with pseudo‑code.
|
||||
3. **Alternatives considered** – why existing functionality is not enough.
|
||||
|
||||
### Pull Requests
|
||||
|
||||
Every PR should:
|
||||
|
||||
1. Target the `main` branch.
|
||||
2. Contain a single, focused change (small orthogonal fixes are okay).
|
||||
3. Pass `cargo test`, `cargo fmt --check`, and `cargo clippy -- -D warnings`.
|
||||
4. Update documentation and, when relevant, add tests.
|
||||
5. Reference related issue numbers in the description (`Fixes #123`).
|
||||
|
||||
A reviewer will provide feedback within **3 business days**. Squash‑merge is the default strategy; maintainers may edit commit messages for clarity.
|
||||
```
|
||||
src/
|
||||
main.rs CLI entry point
|
||||
lib.rs Library re-exports (benchmarks, integration tests)
|
||||
cli.rs Clap command definitions
|
||||
commands/
|
||||
mod.rs Command dispatch
|
||||
scan.rs Two-pass scan orchestration, Diag struct
|
||||
ast.rs Entry points for both passes; tree-sitter parsing
|
||||
cfg.rs CFG construction from AST
|
||||
cfg_analysis/ CFG structural detectors
|
||||
guards.rs Unguarded sink detection (dominator analysis)
|
||||
auth.rs Auth gap detection
|
||||
resources.rs Resource leak detection
|
||||
error_handling.rs Error fallthrough detection
|
||||
unreachable.rs Unreachable security code detection
|
||||
rules.rs Guard rules, auth rules, resource pairs
|
||||
taint/
|
||||
mod.rs Taint analysis facade + JS two-level solve
|
||||
domain.rs TaintState lattice (VarTaint, Cap, TaintOrigin)
|
||||
transfer.rs TaintTransfer function (source/sanitizer/sink/call)
|
||||
path_state.rs Predicate tracking and contradiction pruning
|
||||
state/
|
||||
engine.rs Generic monotone dataflow engine (Transfer<S: Lattice>)
|
||||
transfer.rs DefaultTransfer — resource lifecycle + auth state
|
||||
summary.rs FuncSummary, GlobalSummaries, conservative merge
|
||||
labels/ Per-language label rules
|
||||
mod.rs classify() dispatch, Cap bitflags, DataLabel, LabelRule
|
||||
rust.rs Rust sources, sinks, sanitizers
|
||||
javascript.rs JS sources, sinks, sanitizers
|
||||
... (one file per language)
|
||||
patterns/ Per-language AST pattern queries
|
||||
mod.rs Pattern struct, Severity, SeverityFilter, registry
|
||||
rust.rs Rust patterns
|
||||
javascript.rs JS patterns
|
||||
... (one file per language)
|
||||
callgraph.rs Call graph construction (petgraph), SCC, topo sort
|
||||
database.rs SQLite indexing via r2d2 pool
|
||||
rank.rs Attack-surface ranking
|
||||
fmt.rs Output formatting and evidence normalization
|
||||
output.rs SARIF 2.1 builder
|
||||
walk.rs Parallel file walker (ignore crate, respects .gitignore)
|
||||
symbol.rs Symbol interning (SymbolId)
|
||||
interop.rs Cross-language interop edges
|
||||
errors.rs NyxError, NyxResult types
|
||||
utils/
|
||||
config.rs TOML config loading, merging, Config struct
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Development Workflow
|
||||
## How to Add a New AST Pattern
|
||||
|
||||
1. **Fork** the repo and create your feature branch:
|
||||
AST patterns are the simplest detector to add. Each pattern is a tree-sitter query that matches a structural code construct.
|
||||
|
||||
```bash
|
||||
git checkout -b feature/my‑feature
|
||||
### Step-by-step
|
||||
|
||||
1. **Pick the language file** under `src/patterns/<lang>.rs`.
|
||||
|
||||
2. **Choose the metadata**:
|
||||
|
||||
| Field | Options | Guidelines |
|
||||
|-------|---------|------------|
|
||||
| **ID** | `<lang>.<category>.<specific>` | e.g. `py.cmdi.os_popen` |
|
||||
| **Tier** | `A` or `B` | `A` = presence alone is high-signal; `B` = query includes a heuristic guard |
|
||||
| **Severity** | `High`, `Medium`, `Low` | High: command exec, deser, banned functions. Medium: SQL concat, reflection, XSS. Low: weak crypto, code quality. |
|
||||
| **Category** | See `PatternCategory` enum | `CommandExec`, `CodeExec`, `Deserialization`, `SqlInjection`, `PathTraversal`, `Xss`, `Crypto`, `Secrets`, `InsecureTransport`, `Reflection`, `MemorySafety`, `Prototype`, `CodeQuality` |
|
||||
|
||||
3. **Write the tree-sitter query**:
|
||||
|
||||
```rust
|
||||
Pattern {
|
||||
id: "py.cmdi.os_popen",
|
||||
description: "os.popen() — shell command execution",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
object: (identifier) @pkg (#eq? @pkg "os")
|
||||
attribute: (identifier) @fn (#eq? @fn "popen")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
},
|
||||
```
|
||||
|
||||
2. Make your changes, then run:
|
||||
The query **must** capture a `@vuln` node. That node's span determines the reported location.
|
||||
|
||||
4. **Test it**:
|
||||
|
||||
```bash
|
||||
cargo fmt
|
||||
cargo clippy --all-targets --all-features -- -D warnings
|
||||
cargo test
|
||||
cargo test --bin nyx
|
||||
```
|
||||
|
||||
3. **Sign‑off** your commits if your employer requires a Developer Certificate of Origin (DCO):
|
||||
5. **Update docs**: Add the new rule to `docs/rules/<lang>.md`.
|
||||
|
||||
### Tips
|
||||
|
||||
- Use the [tree-sitter playground](https://tree-sitter.github.io/tree-sitter/playground) to develop and test queries.
|
||||
- Avoid duplicating taint coverage. If the same function is already a labeled sink in `src/labels/<lang>.rs`, the AST pattern is still useful for `--mode ast`, but use a distinct ID namespace. The dedup pass prevents exact-duplicate findings at the same location.
|
||||
- Test with real-world code to check false positive rates before choosing a tier.
|
||||
|
||||
---
|
||||
|
||||
## How to Add a New Taint Rule
|
||||
|
||||
Taint rules define sources (where untrusted data enters), sinks (where dangerous operations happen), and sanitizers (where data is made safe).
|
||||
|
||||
### Step-by-step
|
||||
|
||||
1. **Open the language file** in `src/labels/<lang>.rs`.
|
||||
|
||||
2. **Add an entry** to the `RULES` slice:
|
||||
|
||||
```rust
|
||||
LabelRule {
|
||||
matchers: &["dangerouslySetInnerHTML"],
|
||||
label: DataLabel::Sink(Cap::HTML_ESCAPE),
|
||||
},
|
||||
```
|
||||
|
||||
3. **Choose the right label type**:
|
||||
|
||||
| Type | Purpose | Example |
|
||||
|------|---------|---------|
|
||||
| `DataLabel::Source(cap)` | Introduces tainted data | `env::var`, `req.body` |
|
||||
| `DataLabel::Sanitizer(cap)` | Strips matching capability bits | `html_escape`, `encodeURIComponent` |
|
||||
| `DataLabel::Sink(cap)` | Dangerous operation requiring sanitization | `eval`, `innerHTML`, `Command::new` |
|
||||
|
||||
4. **Choose capabilities**:
|
||||
|
||||
| Capability | When to use |
|
||||
|-----------|-------------|
|
||||
| `Cap::all()` | Sources that produce universally dangerous data |
|
||||
| `Cap::SHELL_ESCAPE` | Shell command injection sinks/sanitizers |
|
||||
| `Cap::HTML_ESCAPE` | XSS sinks/sanitizers |
|
||||
| `Cap::URL_ENCODE` | URL injection sinks/sanitizers |
|
||||
| `Cap::JSON_PARSE` | JSON parsing sanitizers |
|
||||
| `Cap::FILE_IO` | File I/O sinks |
|
||||
| `Cap::FMT_STRING` | Format string sinks |
|
||||
| `Cap::ENV_VAR` | Environment/config data sources |
|
||||
|
||||
5. **Matcher semantics**:
|
||||
- Case-insensitive suffix matching by default.
|
||||
- If a matcher ends with `_`, it acts as a prefix match.
|
||||
- Multiple matchers in one rule are alternatives (any match triggers the rule).
|
||||
|
||||
### User-defined rules (no code change needed)
|
||||
|
||||
Users can add taint rules via config:
|
||||
|
||||
```toml
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["dangerouslySetInnerHTML"]
|
||||
kind = "sink"
|
||||
cap = "html_escape"
|
||||
```
|
||||
|
||||
Or via CLI:
|
||||
|
||||
```bash
|
||||
nyx config add-rule --lang javascript --matcher dangerouslySetInnerHTML --kind sink --cap html_escape
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## How to Add a New Language
|
||||
|
||||
Adding a new language requires changes across several modules. Use an existing language (e.g. Go or Python) as a template.
|
||||
|
||||
### Checklist
|
||||
|
||||
1. **Tree-sitter parser**: Add `tree-sitter-<lang>` to `Cargo.toml`.
|
||||
|
||||
2. **Language registration**: Register the parser in `ast.rs` (language detection from file extension, parser initialization).
|
||||
|
||||
3. **CFG node kinds**: Create `src/labels/<lang>.rs` with a `KINDS` map that maps tree-sitter node types to the internal `Kind` enum (`Block`, `If`, `While`, `For`, `Return`, `CallFn`, `CallMethod`, `Assignment`, etc.).
|
||||
|
||||
4. **Parameter extraction**: Add a `PARAM_CONFIG` constant specifying how to extract function parameters from the AST (field name for parameter list, node type for individual parameters, extraction field for parameter names).
|
||||
|
||||
5. **Label rules**: Add `RULES` (sources, sinks, sanitizers) and `TERMINATORS` to the labels file.
|
||||
|
||||
6. **AST patterns**: Create `src/patterns/<lang>.rs` with a `PATTERNS` constant.
|
||||
|
||||
7. **Registry updates**:
|
||||
- `src/patterns/mod.rs` — add to the `REGISTRY` HashMap
|
||||
- `src/labels/mod.rs` — add to the `classify()` dispatch
|
||||
|
||||
8. **File extension mapping**: Add the extension in `ast.rs`.
|
||||
|
||||
9. **Tests**: Write unit tests and add test fixtures.
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
### Unit Tests
|
||||
|
||||
All tests are inline `#[test]` blocks inside source modules. Run them with:
|
||||
|
||||
```bash
|
||||
cargo test --bin nyx
|
||||
```
|
||||
|
||||
### What to Test
|
||||
|
||||
- **New AST patterns**: Ensure the tree-sitter query matches the intended construct and does not match safe alternatives.
|
||||
- **New taint rules**: Verify that source-to-sink flows are detected and that sanitizers properly neutralize findings.
|
||||
- **New CFG rules**: Test that guard dominance logic correctly suppresses findings when guards are present.
|
||||
- **Edge cases**: Empty files, files with syntax errors (tree-sitter is error-tolerant), deeply nested structures.
|
||||
|
||||
### Linting
|
||||
|
||||
CI runs Clippy with strict settings. Before submitting:
|
||||
|
||||
```bash
|
||||
cargo clippy --all -- -D warnings
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Pull Request Guidelines
|
||||
|
||||
1. **Branch from `master`**. Use descriptive branch names: `feat/add-kotlin-support`, `fix/false-positive-sql-concat`, `docs/update-rule-reference`.
|
||||
|
||||
2. **Keep PRs focused**. One logical change per PR.
|
||||
|
||||
3. **Ensure CI passes**:
|
||||
```bash
|
||||
git commit -s -m "feat: add XYZ"
|
||||
cargo test --bin nyx
|
||||
cargo clippy --all -- -D warnings
|
||||
cargo fmt -- --check
|
||||
```
|
||||
|
||||
4. Push the branch and open a PR against `main`.
|
||||
4. **Commit style**: Use [Conventional Commits](https://www.conventionalcommits.org/).
|
||||
```
|
||||
feat(patterns): add Python subprocess.Popen pattern
|
||||
fix(taint): prevent false positive on sanitized innerHTML
|
||||
docs(rules): update JavaScript rule reference
|
||||
```
|
||||
|
||||
5. **Document new rules**. If you add patterns or taint rules, update the corresponding `docs/rules/<lang>.md` page.
|
||||
|
||||
6. **Include test cases** for any new detection rules.
|
||||
|
||||
---
|
||||
|
||||
## Commit & Branching Conventions
|
||||
## Bug Reports
|
||||
|
||||
* **Branch names**: `feature/<slug>`, `fix/<slug>`, `docs/<slug>`
|
||||
* **Commit style** – Conventional Commits (simplified):
|
||||
Please [open an issue](https://github.com/elicpeter/nyx/issues) for:
|
||||
|
||||
```text
|
||||
type(scope): subject
|
||||
|
||||
body (optional)
|
||||
```
|
||||
|
||||
| Type | Use for |
|
||||
|------------|--------------------------------------|
|
||||
| `feat` | New functionality |
|
||||
| `fix` | Bug fixes |
|
||||
| `docs` | Documentation only |
|
||||
| `refactor` | Code change without behaviour change |
|
||||
| `test` | Adding or changing tests |
|
||||
| `chore` | Build process, tooling |
|
||||
- **Crashes or panics** — include the backtrace (`RUST_BACKTRACE=1 nyx scan .`)
|
||||
- **False positives** — include the minimal code snippet, rule ID, and Nyx version
|
||||
- **False negatives** — describe what you expected Nyx to find and why
|
||||
- **Documentation errors** — point to the specific page and what's wrong
|
||||
|
||||
---
|
||||
|
||||
## Style Guide
|
||||
## Feature Requests
|
||||
|
||||
* **Formatting**: run `cargo fmt` before committing.
|
||||
* **Linting**: CI runs Clippy with `-D warnings`; keep the tree warning‑free.
|
||||
* **Unsafe Rust**: prohibited unless absolutely necessary. Justify with in‑code comments.
|
||||
* **Public API stability**: avoid breaking changes on exported types and functions without prior discussion.
|
||||
We welcome well-motivated feature proposals. Please describe:
|
||||
|
||||
1. **Problem statement** — what pain point does this solve?
|
||||
2. **Proposed solution** — high-level description, optionally with pseudo-code.
|
||||
3. **Alternatives considered** — why existing functionality is not enough.
|
||||
|
||||
---
|
||||
|
||||
## Security Policy
|
||||
## Release Process
|
||||
|
||||
Please do **not** open public issues for security‑sensitive bugs. Instead, email the maintainers at `<security@example.com>` with the details and a proof of concept. We aim to acknowledge reports within **48 hours**.
|
||||
1. Update version in `Cargo.toml`.
|
||||
2. Update `CHANGELOG.md` with the new version section.
|
||||
3. Run full test suite: `cargo test --bin nyx && cargo clippy --all -- -D warnings`.
|
||||
4. Create a git tag: `git tag v0.x.y`.
|
||||
5. Push tag: `git push origin v0.x.y`.
|
||||
6. CI builds release binaries and publishes to crates.io.
|
||||
|
||||
---
|
||||
|
||||
## Community Standards
|
||||
## Security Issues
|
||||
|
||||
We strive to maintain a welcoming and inclusive community. Harassment, discrimination, or other forms of unacceptable behavior will be addressed per the [Code of Conduct](CODE_OF_CONDUCT.md).
|
||||
Please do **not** open public issues for security-sensitive bugs. See [SECURITY.md](SECURITY.md) for our responsible disclosure process.
|
||||
|
||||
Thank you for helping to make Nyx better!
|
||||
---
|
||||
|
||||
## License
|
||||
|
||||
By contributing to Nyx, you agree that your contributions will be licensed under the [GPL-3.0](./LICENSE).
|
||||
|
|
|
|||
110
Cargo.lock
generated
110
Cargo.lock
generated
|
|
@ -71,7 +71,7 @@ version = "1.1.5"
|
|||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "40c48f72fd53cd289104fc64099abca73db4166ad86ea0b4341abe65af83dadc"
|
||||
dependencies = [
|
||||
"windows-sys",
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
@ -82,7 +82,7 @@ checksum = "291e6a250ff86cd4a820112fb8898808a366d8f9f58ce16d1f538353ad55747d"
|
|||
dependencies = [
|
||||
"anstyle",
|
||||
"once_cell_polyfill",
|
||||
"windows-sys",
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
@ -283,7 +283,7 @@ dependencies = [
|
|||
"libc",
|
||||
"once_cell",
|
||||
"unicode-width",
|
||||
"windows-sys",
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
@ -429,7 +429,7 @@ dependencies = [
|
|||
"libc",
|
||||
"option-ext",
|
||||
"redox_users",
|
||||
"windows-sys",
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
@ -457,7 +457,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
|
|||
checksum = "39cab71617ae0d63f51a36d69f866391735b51691dbda63cf6f96d042b63efeb"
|
||||
dependencies = [
|
||||
"libc",
|
||||
"windows-sys",
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
@ -805,7 +805,7 @@ version = "0.50.3"
|
|||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "7957b9740744892f114936ab4a57b3f487491bbeafaf8083688b16841a4240e5"
|
||||
dependencies = [
|
||||
"windows-sys",
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
@ -835,7 +835,7 @@ dependencies = [
|
|||
|
||||
[[package]]
|
||||
name = "nyx-scanner"
|
||||
version = "0.3.0"
|
||||
version = "0.4.0"
|
||||
dependencies = [
|
||||
"assert_cmd",
|
||||
"bitflags",
|
||||
|
|
@ -862,7 +862,9 @@ dependencies = [
|
|||
"rusqlite",
|
||||
"serde",
|
||||
"serde_json",
|
||||
"smallvec",
|
||||
"tempfile",
|
||||
"terminal_size",
|
||||
"thiserror",
|
||||
"toml",
|
||||
"tracing",
|
||||
|
|
@ -1272,7 +1274,7 @@ dependencies = [
|
|||
"errno",
|
||||
"libc",
|
||||
"linux-raw-sys",
|
||||
"windows-sys",
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
@ -1436,7 +1438,17 @@ dependencies = [
|
|||
"getrandom 0.4.1",
|
||||
"once_cell",
|
||||
"rustix",
|
||||
"windows-sys",
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "terminal_size"
|
||||
version = "0.4.3"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "60b8cb979cb11c32ce1603f8137b22262a9d131aaa5c37b5678025f22b8becd0"
|
||||
dependencies = [
|
||||
"rustix",
|
||||
"windows-sys 0.60.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
@ -1631,9 +1643,9 @@ dependencies = [
|
|||
|
||||
[[package]]
|
||||
name = "tree-sitter"
|
||||
version = "0.26.5"
|
||||
version = "0.26.6"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "12987371f54efc9b9306a20dc87ed5aaee9f320c8a8b115e28515c412b2efe39"
|
||||
checksum = "13f456d2108c3fef07342ba4689a8503ec1fb5beed245e2b9be93096ef394848"
|
||||
dependencies = [
|
||||
"cc",
|
||||
"regex",
|
||||
|
|
@ -1967,7 +1979,7 @@ version = "0.1.11"
|
|||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "c2a7b1c03c876122aa43f3020e6c3c3ee5c05081c9a00739faf7503aeba10d22"
|
||||
dependencies = [
|
||||
"windows-sys",
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
@ -2035,6 +2047,15 @@ dependencies = [
|
|||
"windows-link",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "windows-sys"
|
||||
version = "0.60.2"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "f2f500e4d28234f72040990ec9d39e3a6b950f9f22d3dba18416c35882612bcb"
|
||||
dependencies = [
|
||||
"windows-targets",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "windows-sys"
|
||||
version = "0.61.2"
|
||||
|
|
@ -2044,6 +2065,71 @@ dependencies = [
|
|||
"windows-link",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "windows-targets"
|
||||
version = "0.53.5"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "4945f9f551b88e0d65f3db0bc25c33b8acea4d9e41163edf90dcd0b19f9069f3"
|
||||
dependencies = [
|
||||
"windows-link",
|
||||
"windows_aarch64_gnullvm",
|
||||
"windows_aarch64_msvc",
|
||||
"windows_i686_gnu",
|
||||
"windows_i686_gnullvm",
|
||||
"windows_i686_msvc",
|
||||
"windows_x86_64_gnu",
|
||||
"windows_x86_64_gnullvm",
|
||||
"windows_x86_64_msvc",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "windows_aarch64_gnullvm"
|
||||
version = "0.53.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "a9d8416fa8b42f5c947f8482c43e7d89e73a173cead56d044f6a56104a6d1b53"
|
||||
|
||||
[[package]]
|
||||
name = "windows_aarch64_msvc"
|
||||
version = "0.53.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "b9d782e804c2f632e395708e99a94275910eb9100b2114651e04744e9b125006"
|
||||
|
||||
[[package]]
|
||||
name = "windows_i686_gnu"
|
||||
version = "0.53.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "960e6da069d81e09becb0ca57a65220ddff016ff2d6af6a223cf372a506593a3"
|
||||
|
||||
[[package]]
|
||||
name = "windows_i686_gnullvm"
|
||||
version = "0.53.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "fa7359d10048f68ab8b09fa71c3daccfb0e9b559aed648a8f95469c27057180c"
|
||||
|
||||
[[package]]
|
||||
name = "windows_i686_msvc"
|
||||
version = "0.53.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "1e7ac75179f18232fe9c285163565a57ef8d3c89254a30685b57d83a38d326c2"
|
||||
|
||||
[[package]]
|
||||
name = "windows_x86_64_gnu"
|
||||
version = "0.53.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "9c3842cdd74a865a8066ab39c8a7a473c0778a3f29370b5fd6b4b9aa7df4a499"
|
||||
|
||||
[[package]]
|
||||
name = "windows_x86_64_gnullvm"
|
||||
version = "0.53.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "0ffa179e2d07eee8ad8f57493436566c7cc30ac536a3379fdf008f47f6bb7ae1"
|
||||
|
||||
[[package]]
|
||||
name = "windows_x86_64_msvc"
|
||||
version = "0.53.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "d6bbff5f0aada427a1e5a6da5f1f98158182f26556f345ac9e04d36d0ebed650"
|
||||
|
||||
[[package]]
|
||||
name = "winnow"
|
||||
version = "0.7.14"
|
||||
|
|
|
|||
|
|
@ -1,13 +1,13 @@
|
|||
[package]
|
||||
name = "nyx-scanner"
|
||||
version = "0.3.0"
|
||||
version = "0.4.0"
|
||||
edition = "2024"
|
||||
description = "A CLI security scanner for automating vulnerability checks"
|
||||
license = "GPL-3.0"
|
||||
authors = ["Eli Peter <elicpeter@exmaple.com>"]
|
||||
homepage = "https://github.com/elicpeter/nyx"
|
||||
repository = "https://github.com/elicpeter/nyx"
|
||||
documentation = "https://github.com/elicpeter/nyx#readme"
|
||||
documentation = "https://github.com/elicpeter/nyx/tree/master/docs"
|
||||
keywords = ["security", "vulnerability", "scanner", "static-analysis", "cli"]
|
||||
categories = ["security", "command-line-utilities", "development-tools", "parser-implementations", "text-processing"]
|
||||
readme = "README.md"
|
||||
|
|
@ -56,7 +56,7 @@ num_cpus = "1.17.0"
|
|||
rusqlite = { version = "0.38.0", features = ["bundled"] }
|
||||
r2d2_sqlite = { version = "0.32.0", features = ["bundled"] }
|
||||
ignore = "0.4.25"
|
||||
tree-sitter = "0.26.5"
|
||||
tree-sitter = "0.26.6"
|
||||
tree-sitter-rust = "0.24.0"
|
||||
tree-sitter-c = "0.24.1"
|
||||
tree-sitter-cpp = "0.23.4"
|
||||
|
|
@ -71,6 +71,7 @@ crossbeam-channel = "0.5.15"
|
|||
blake3 = "1.8.3"
|
||||
once_cell = "1.21.3"
|
||||
console = "0.16.2"
|
||||
terminal_size = "0.4"
|
||||
rayon = "1.11.0"
|
||||
r2d2 = "0.8.10"
|
||||
bytesize = "2.3.1"
|
||||
|
|
@ -81,3 +82,4 @@ petgraph = "0.8.3"
|
|||
bitflags = "2.11.0"
|
||||
phf = { version = "0.13.1", features = ["macros"] }
|
||||
indicatif = "0.18.4"
|
||||
smallvec = "1.15"
|
||||
|
|
|
|||
119
README.md
119
README.md
|
|
@ -6,7 +6,7 @@
|
|||
[](https://crates.io/crates/nyx-scanner)
|
||||
[](https://www.gnu.org/licenses/gpl-3.0)
|
||||
[](https://www.rust-lang.org)
|
||||
[](https://github.com/ecpeter23/nyx/actions)
|
||||
[](https://github.com/elicpeter/nyx/actions)
|
||||
</div>
|
||||
|
||||
---
|
||||
|
|
@ -24,7 +24,7 @@
|
|||
| Multi-language support | Rust, C, C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript |
|
||||
| AST-level pattern matching | Language-specific queries written against precise parse trees |
|
||||
| Control-flow graph analysis | Auth gaps, unguarded sinks, unreachable security code, resource leaks, error fallthrough |
|
||||
| Cross-file taint tracking | BFS taint propagation from sources through sanitizers to sinks with function summaries |
|
||||
| Cross-file taint tracking | Monotone forward dataflow taint analysis from sources through sanitizers to sinks with function summaries |
|
||||
| Cross-language interop | Taint flows across language boundaries via explicit interop edges |
|
||||
| Two-pass architecture | Pass 1 extracts function summaries; Pass 2 runs taint with full cross-file context |
|
||||
| Incremental indexing | SQLite database stores file hashes, summaries, and findings to skip unchanged files |
|
||||
|
|
@ -42,7 +42,7 @@
|
|||
|---|---|
|
||||
| **Pure-Rust, single binary** | No JVM, Python, or server to install; drop the `nyx` executable into your `$PATH` and go. |
|
||||
| **Massively parallel** | Uses Rayon and a thread-pool walker; scales to all CPU cores. Scanning the entire **rust-lang/rust** codebase (~53,000 files) on an M2 MacBook Pro takes **~1 s**. |
|
||||
| **Deep analysis** | Real CFG construction and taint propagation, not just regex matching. Cross-file function summaries, capability-based sanitizer tracking, and scored findings. |
|
||||
| **Deep analysis** | Real CFG construction and monotone dataflow taint analysis with guaranteed termination, not just regex matching. Cross-file function summaries, capability-based sanitizer tracking, and scored findings. |
|
||||
| **Index-aware** | An optional SQLite index stores file hashes and findings; subsequent scans touch *only* changed files, slashing CI times. |
|
||||
| **Offline & privacy-friendly** | Requires no login, cloud account, or telemetry. Perfect for air-gapped environments and strict compliance policies. |
|
||||
| **Tree-sitter precision** | Parses real language grammars, not regexes, giving far fewer false positives than line-based scanners. |
|
||||
|
|
@ -58,7 +58,7 @@ $ cargo install nyx-scanner
|
|||
```
|
||||
|
||||
### Install Github release
|
||||
1. Navigate to the [Releases](https://github.com/ecpeter23/nyx/releases) page of the repository.
|
||||
1. Navigate to the [Releases](https://github.com/elicpeter/nyx/releases) page of the repository.
|
||||
2. Download the appropriate binary for your system:
|
||||
|
||||
```nyx-x86_64-unknown-linux-gnu.zip``` for Linux
|
||||
|
|
@ -87,7 +87,7 @@ $ cargo install nyx-scanner
|
|||
### Build from source
|
||||
|
||||
```bash
|
||||
$ git clone https://github.com/ecpeter23/nyx.git
|
||||
$ git clone https://github.com/elicpeter/nyx.git
|
||||
$ cd nyx
|
||||
$ cargo build --release
|
||||
# optional – copy the binary into PATH
|
||||
|
|
@ -111,20 +111,29 @@ $ nyx scan ./server --format json
|
|||
$ nyx scan --format sarif > results.sarif
|
||||
|
||||
# Perform an ad-hoc scan without touching the index
|
||||
$ nyx scan --no-index
|
||||
$ nyx scan --index off
|
||||
|
||||
# Restrict results to high-severity findings
|
||||
$ nyx scan --high-only
|
||||
$ nyx scan --severity HIGH
|
||||
|
||||
# Filter by severity expression (high and medium)
|
||||
$ nyx scan --severity ">=MEDIUM"
|
||||
|
||||
# AST pattern matching only (fastest, no CFG/taint)
|
||||
$ nyx scan --ast-only
|
||||
$ nyx scan --mode ast
|
||||
|
||||
# CFG + taint analysis only (skip AST pattern rules)
|
||||
$ nyx scan --cfg-only
|
||||
$ nyx scan --mode cfg
|
||||
|
||||
# CI gate: fail on medium+, SARIF output
|
||||
$ nyx scan --format sarif --fail-on MEDIUM > results.sarif
|
||||
|
||||
# Suppress status messages (for CI/scripting)
|
||||
$ nyx scan --quiet --format json
|
||||
|
||||
# Include test/vendor/benchmark paths at original severity
|
||||
# (by default these are downgraded one tier)
|
||||
$ nyx scan --include-nonprod
|
||||
$ nyx scan --keep-nonprod-severity
|
||||
```
|
||||
|
||||
### Index Management
|
||||
|
|
@ -164,13 +173,14 @@ $ nyx config add-terminator --lang javascript --name process.exit
|
|||
|
||||
## Analysis Modes
|
||||
|
||||
Nyx supports three analysis modes, selectable via the `scanner.mode` config option or CLI flags:
|
||||
Nyx supports four analysis modes, selectable via `--mode` or the `scanner.mode` config option:
|
||||
|
||||
| Mode | CLI flag | What runs |
|
||||
|---|---|---|
|
||||
| **Full** (default) | — | AST pattern matching + CFG construction + taint analysis |
|
||||
| **AST-only** | `--ast-only` | AST pattern matching only; skips CFG and taint entirely |
|
||||
| **Taint-only** | `--cfg-only` | CFG + taint analysis only; filters out AST pattern findings |
|
||||
| **Full** (default) | `--mode full` | AST pattern matching + CFG construction + taint analysis |
|
||||
| **AST-only** | `--mode ast` | AST pattern matching only; skips CFG and taint entirely |
|
||||
| **CFG** | `--mode cfg` | CFG + taint analysis only; filters out AST pattern findings |
|
||||
| **Taint** | `--mode taint` | Alias for `cfg` (CFG + taint analysis) |
|
||||
|
||||
### What the CFG + taint engine detects
|
||||
|
||||
|
|
@ -182,8 +192,40 @@ Nyx supports three analysis modes, selectable via the `scanner.mode` config opti
|
|||
| Unreachable security code | `cfg-unreachable-*` | Sanitizers, guards, or sinks in dead code branches |
|
||||
| Error fallthrough | `cfg-error-fallthrough` | Error-handling branches that don't terminate, allowing execution to fall through to dangerous operations |
|
||||
| Resource leak | `cfg-resource-leak` | Resources acquired but not released on all exit paths (malloc/free, fopen/fclose, Lock/Unlock) |
|
||||
| Use-after-close | `state-use-after-close` | Variable read/written after its resource handle was closed |
|
||||
| Double-close | `state-double-close` | Resource handle closed more than once |
|
||||
| Must-leak | `state-resource-leak` | Resource acquired but never closed on any exit path |
|
||||
| May-leak | `state-resource-leak-possible` | Resource open on some but not all exit paths |
|
||||
| Unauthenticated access | `state-unauthed-access` | Sensitive sink reached without a preceding auth/admin check |
|
||||
|
||||
Findings are scored and ranked by severity, proximity to entry point, path complexity, and taint confirmation.
|
||||
### Attack Surface Ranking
|
||||
|
||||
Every finding is assigned a deterministic **attack-surface score** that estimates exploitability using only information already in memory — no extra source passes are needed. Findings are sorted by descending score before truncation, so `max_results` always keeps the most important results.
|
||||
|
||||
The score is the sum of five components:
|
||||
|
||||
| Component | Weight | Description |
|
||||
|---|---|---|
|
||||
| **Severity base** | High = 60, Medium = 30, Low = 10 | Primary ordering signal. Severity reflects source-kind exploitability and rule confidence. |
|
||||
| **Analysis kind** | taint = +10, state = +8, cfg = +3/+5, ast = 0 | Taint-confirmed flows are the strongest signal; AST-only pattern matches rank lowest at equal severity. CFG findings with evidence get +5, without get +3. |
|
||||
| **Evidence strength** | +1 per evidence item (max 4), +2–6 for source kind | More evidence increases confidence. Source-kind priority: user input (+6) > env/config (+5) > unknown (+4) > file system (+3) > database (+2). |
|
||||
| **State rule type** | +1 to +6 | Use-after-close and unauthenticated access (+6) rank above double-close (+3), must-leak (+2), and may-leak (+1). |
|
||||
| **Path validation** | −5 | Findings on paths guarded by a validation predicate receive a small exploitability penalty — the guard may prevent triggering. |
|
||||
|
||||
**Score ranges** (approximate):
|
||||
|
||||
| Finding type | Score |
|
||||
|---|---|
|
||||
| High taint + user input | ~78 |
|
||||
| High state (use-after-close) | ~74 |
|
||||
| High CFG structural | ~63 |
|
||||
| Medium taint + env source | ~47 |
|
||||
| Medium state (resource leak) | ~40 |
|
||||
| Low AST-only pattern | ~10 |
|
||||
|
||||
Tie-breaking is deterministic: severity → rule ID → file path → line → column → message hash. The same set of findings always produces the same ordering regardless of parallelism or input order.
|
||||
|
||||
Ranking is enabled by default. Disable it with `--no-rank` or `output.attack_surface_ranking = false` in config. When disabled, `rank_score` is omitted from JSON/SARIF output.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -213,8 +255,8 @@ Nyx merges a default configuration file (`nyx.conf`) with user overrides (`nyx.l
|
|||
| Platform | Directory |
|
||||
|---|---|
|
||||
| Linux | `~/.config/nyx/` |
|
||||
| macOS | `~/Library/Application Support/dev.ecpeter23.nyx/` |
|
||||
| Windows | `%APPDATA%\ecpeter23\nyx\config\` |
|
||||
| macOS | `~/Library/Application Support/nyx/` |
|
||||
| Windows | `%APPDATA%\elicpeter\nyx\config\` |
|
||||
|
||||
Minimal example (`nyx.local`):
|
||||
|
||||
|
|
@ -270,7 +312,7 @@ Nyx uses a **two-pass architecture** to enable cross-file analysis without sacri
|
|||
1. **File enumeration** -- A parallel walker (Rayon + `ignore` crate) applies gitignore rules, size limits, and user exclusions.
|
||||
2. **Pass 1 -- Summary extraction** -- Each file is parsed via tree-sitter, an intra-procedural CFG is built (petgraph), and a `FuncSummary` is exported per function capturing source/sanitizer/sink capabilities (bitflags), taint propagation behavior, and callee lists. Summaries are persisted to SQLite.
|
||||
3. **Summary merge** -- All per-file summaries are merged into a `GlobalSummaries` map with conservative conflict resolution (union caps, OR booleans).
|
||||
4. **Pass 2 -- Analysis** -- Files are re-parsed and analyzed with the full cross-file context: BFS taint propagation resolves callees against local and global summaries, CFG analysis checks for auth gaps, unguarded sinks, resource leaks, and more.
|
||||
4. **Pass 2 -- Analysis** -- Files are re-parsed and analyzed with the full cross-file context: a monotone forward dataflow engine resolves callees against local and global summaries and propagates taint through a bounded lattice with guaranteed convergence. CFG analysis checks for auth gaps, unguarded sinks, resource leaks, and more.
|
||||
5. **Reporting** -- Findings are scored, ranked, deduplicated, and emitted to the console or serialized as JSON.
|
||||
|
||||
With indexing enabled, Pass 1 skips files whose blake3 content hash is unchanged, and cached findings are served directly for AST-only results.
|
||||
|
|
@ -279,14 +321,19 @@ With indexing enabled, Pass 1 skips files whose blake3 content hash is unchanged
|
|||
|
||||
## Roadmap
|
||||
|
||||
### Phase 1 -- Deep Static Engine
|
||||
### Phase 1 -- Deep Static Engine (Complete)
|
||||
|
||||
| Feature | Description |
|
||||
|---|---|
|
||||
| Interprocedural call graph | Precise symbol resolution via `FuncKey`, language-scoped namespaces, cross-module linking. No name-collision merging -- full call graph with topological analysis. |
|
||||
| Path-sensitive analysis | Track path predicates and conditional constraints. Detect infeasible paths and validation-only-in-one-branch patterns. Dramatically reduces false positives. |
|
||||
| Dataflow & state modeling | Resource state machines (init -> use -> close), auth state transitions, privilege level tracking. Semantic analysis beyond pattern matching. |
|
||||
| Attack surface ranking | Score entry points by distance-to-sink, guard strength, path complexity, and privilege escalation potential. Deterministic attack surface scoring. |
|
||||
| Feature | Status | Description |
|
||||
|---|--------|---|
|
||||
| Interprocedural call graph | Done | Precise symbol resolution via `FuncKey`, language-scoped namespaces, cross-module linking. Full call graph with SCC and topological analysis. |
|
||||
| Path-sensitive analysis | Done | Track path predicates and conditional constraints. Detect infeasible paths and validation-only-in-one-branch patterns. Monotone predicate summaries with contradiction pruning. |
|
||||
| Dataflow & state modeling | Done | Resource state machines (init -> use -> close), auth state transitions, privilege level tracking. Generic `Transfer` trait over bounded lattices with guaranteed convergence. |
|
||||
| Monotone taint analysis | Done | Replaced BFS taint engine with a forward worklist dataflow analysis over a finite `TaintState` lattice. Multi-origin tracking, dual validated-must/may sets, JS/TS two-level solve. Guaranteed termination via lattice finiteness. |
|
||||
| Attack surface ranking | Done | Deterministic post-analysis scoring of findings by severity, analysis kind, evidence strength, source-kind exploitability, and validation state. Findings sorted by score before truncation so `max_results` keeps the most important results. |
|
||||
| Inline suppressions | Done | `nyx:ignore` and `nyx:ignore-next-line` comments with wildcard matching, all 10 languages supported. `--show-suppressed` flag for visibility. |
|
||||
| Low-noise prioritization | Done | Category filtering, rollup grouping for high-frequency rules, configurable LOW budgets. Quality-category findings hidden by default. |
|
||||
| Pattern-level confidence | Done | Explicit High/Medium/Low confidence on every AST pattern. Confidence flows into output alongside severity and rank score. |
|
||||
| AST pattern overhaul | Done | 30+ new patterns across all languages, 11 broken query fixes, namespaced IDs, severity recalibration. |
|
||||
|
||||
### Phase 2 -- Dynamic Capability
|
||||
|
||||
|
|
@ -312,7 +359,25 @@ With indexing enabled, Pass 1 skips files whose blake3 content hash is unchanged
|
|||
| Rule updates | Remote rule feed with signature verification |
|
||||
| UX | Smart file-watch re-scan |
|
||||
|
||||
Community feedback shapes priorities -- please [open an issue](https://github.com/ecpeter23/nyx/issues) to discuss proposed changes.
|
||||
Community feedback shapes priorities -- please [open an issue](https://github.com/elicpeter/nyx/issues) to discuss proposed changes.
|
||||
|
||||
---
|
||||
|
||||
## Documentation
|
||||
|
||||
Full documentation is available in the [`docs/`](docs/index.md) directory:
|
||||
|
||||
- [Installation](docs/installation.md) — cargo, binaries, CI tips
|
||||
- [Quick Start](docs/quickstart.md) — Your first scan in 60 seconds
|
||||
- [CLI Reference](docs/cli.md) — Every flag and subcommand
|
||||
- [Configuration](docs/configuration.md) — Config file schema, custom rules
|
||||
- [Output Formats](docs/output.md) — Console, JSON, SARIF; exit codes
|
||||
- [Detector Overview](docs/detectors.md) — How the four detector families work
|
||||
- [Taint Analysis](docs/detectors/taint.md) — Cross-file source-to-sink dataflow
|
||||
- [CFG Structural](docs/detectors/cfg.md) — Auth gaps, unguarded sinks, resource leaks
|
||||
- [State Model](docs/detectors/state.md) — Resource lifecycle, authentication state
|
||||
- [AST Patterns](docs/detectors/patterns.md) — Tree-sitter structural matching
|
||||
- [Rule Reference](docs/rules/index.md) — Per-language rule listings with examples
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -327,7 +392,7 @@ Pull requests are welcome. To contribute:
|
|||
|
||||
Please open an issue for any crash, panic, or suspicious result -- attach the minimal code snippet and mention the Nyx version.
|
||||
|
||||
See `CONTRIBUTING.md` for full guidelines.
|
||||
See [`CONTRIBUTING.md`](CONTRIBUTING.md) for full guidelines, including how to add new rules and support new languages.
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||
|
|
@ -4,9 +4,9 @@
|
|||
|
||||
| Version | Supported | Notes |
|
||||
|---------|-----------|----------------------|
|
||||
| 0.3.x | ✅ | Latest stable line |
|
||||
| 0.2.x | ✅ | Critical fixes only |
|
||||
| < 0.2 | ❌ | End-of-life |
|
||||
| 0.4.x | ✅ | Latest stable line |
|
||||
| 0.3.x | ✅ | Critical fixes only |
|
||||
| < 0.3 | ❌ | End-of-life |
|
||||
|
||||
We follow [Semantic Versioning] as soon as we hit **1.0.0**.
|
||||
Before that, breaking changes may land in any minor release.
|
||||
|
|
|
|||
|
|
@ -52,6 +52,11 @@ follow_symlinks = false
|
|||
## Scan hidden files (dot-files)
|
||||
scan_hidden_files = false
|
||||
|
||||
## Enable state-model dataflow analysis (resource lifecycle + auth state).
|
||||
## Detects use-after-close, double-close, resource leaks, and unauthed access.
|
||||
## Requires mode = "full" or "taint" (needs CFG). Default: off.
|
||||
enable_state_analysis = false
|
||||
|
||||
|
||||
[database]
|
||||
|
||||
|
|
@ -70,15 +75,48 @@ vacuum_on_startup = false
|
|||
|
||||
[output]
|
||||
|
||||
## Output format — only "console" exists for now
|
||||
## Output format: console | json | sarif
|
||||
default_format = "console"
|
||||
|
||||
## Suppress all console output (UNIMPLEMENTED)
|
||||
## Suppress all human-readable status output (stderr)
|
||||
quiet = false
|
||||
|
||||
## Enable attack-surface ranking (sort findings by exploitability score)
|
||||
attack_surface_ranking = true
|
||||
|
||||
## Cap the number of issues shown; null = unlimited
|
||||
max_results = null
|
||||
|
||||
## Minimum attack-surface score to include; null = no minimum
|
||||
## Findings below this threshold are dropped after ranking.
|
||||
## Requires attack_surface_ranking to be enabled.
|
||||
min_score = null
|
||||
|
||||
## Minimum confidence level to include in output; null = no minimum
|
||||
## Values: "low", "medium", "high"
|
||||
# min_confidence = "medium"
|
||||
|
||||
## Include Quality-category findings (excluded by default).
|
||||
## Quality findings (e.g. unwrap, expect, panic) are noise-heavy and hidden
|
||||
## unless this is set to true or --include-quality is passed.
|
||||
include_quality = false
|
||||
|
||||
## Show all findings: disables category filtering, rollups, and LOW budgets.
|
||||
## Equivalent to --all on the command line.
|
||||
show_all = false
|
||||
|
||||
## Maximum total LOW findings to show (rollups count as 1).
|
||||
max_low = 20
|
||||
|
||||
## Maximum LOW findings per file (rollups count as 1).
|
||||
max_low_per_file = 1
|
||||
|
||||
## Maximum LOW findings per rule (rollups count as 1).
|
||||
max_low_per_rule = 10
|
||||
|
||||
## Number of example locations stored in rollup findings.
|
||||
rollup_examples = 5
|
||||
|
||||
|
||||
[performance]
|
||||
|
||||
|
|
|
|||
234
docs/cli.md
Normal file
234
docs/cli.md
Normal file
|
|
@ -0,0 +1,234 @@
|
|||
# CLI Reference
|
||||
|
||||
## Global
|
||||
|
||||
```
|
||||
nyx [COMMAND]
|
||||
nyx --version
|
||||
nyx --help
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## `nyx scan`
|
||||
|
||||
Run a security scan on a directory.
|
||||
|
||||
```
|
||||
nyx scan [PATH] [OPTIONS]
|
||||
```
|
||||
|
||||
**PATH** defaults to `.` (current directory).
|
||||
|
||||
### Analysis Mode
|
||||
|
||||
| Flag | Default | Description |
|
||||
|------|---------|-------------|
|
||||
| `--mode <MODE>` | `full` | Analysis mode: `full`, `ast`, `cfg`, or `taint` |
|
||||
|
||||
| Mode | What runs |
|
||||
|------|-----------|
|
||||
| `full` | AST patterns + CFG structural analysis + taint analysis |
|
||||
| `ast` | AST patterns only (fastest, no CFG or taint) |
|
||||
| `cfg` / `taint` | CFG + taint analysis only (no AST patterns) |
|
||||
|
||||
**Deprecated aliases**: `--ast-only` (use `--mode ast`), `--cfg-only` (use `--mode cfg`), `--all-targets` (use `--mode full`).
|
||||
|
||||
### Index Control
|
||||
|
||||
| Flag | Default | Description |
|
||||
|------|---------|-------------|
|
||||
| `--index <MODE>` | `auto` | Index behavior: `auto`, `off`, or `rebuild` |
|
||||
|
||||
| Index Mode | Behavior |
|
||||
|------------|----------|
|
||||
| `auto` | Use existing index if available; build if missing |
|
||||
| `off` | Skip indexing, scan filesystem directly |
|
||||
| `rebuild` | Force rebuild index before scanning |
|
||||
|
||||
**Deprecated aliases**: `--no-index` (use `--index off`), `--rebuild-index` (use `--index rebuild`).
|
||||
|
||||
### Output
|
||||
|
||||
| Flag | Default | Description |
|
||||
|------|---------|-------------|
|
||||
| `-f, --format <FMT>` | `console` | Output format: `console`, `json`, or `sarif` |
|
||||
| `--quiet` | off | Suppress status messages (stderr); stdout stays clean |
|
||||
| `--no-rank` | off | Disable attack-surface ranking |
|
||||
|
||||
### Filtering
|
||||
|
||||
| Flag | Default | Description |
|
||||
|------|---------|-------------|
|
||||
| `--severity <EXPR>` | *(none)* | Filter findings by severity |
|
||||
| `--min-score <N>` | *(none)* | Drop findings with rank score below N |
|
||||
| `--min-confidence <LEVEL>` | *(none)* | Drop findings below this confidence level (`low`, `medium`, `high`) |
|
||||
| `--fail-on <SEV>` | *(none)* | Exit code 1 if any finding >= this severity |
|
||||
| `--show-suppressed` | off | Show inline-suppressed findings (dimmed, tagged `[SUPPRESSED]`) |
|
||||
| `--keep-nonprod-severity` | off | Don't downgrade severity for test/vendor paths |
|
||||
| `--all` | off | Disable category filtering, rollups, and LOW budgets — show everything |
|
||||
| `--include-quality` | off | Include Quality-category findings (hidden by default) |
|
||||
| `--max-low <N>` | `20` | Maximum total LOW findings to show |
|
||||
| `--max-low-per-file <N>` | `1` | Maximum LOW findings per file |
|
||||
| `--max-low-per-rule <N>` | `10` | Maximum LOW findings per rule |
|
||||
| `--rollup-examples <N>` | `5` | Number of example locations in rollup findings |
|
||||
| `--show-instances <RULE>` | *(none)* | Expand all instances of a specific rule (bypass rollup) |
|
||||
|
||||
**Severity expression formats**:
|
||||
|
||||
```bash
|
||||
--severity HIGH # Only high
|
||||
--severity "HIGH,MEDIUM" # High or medium
|
||||
--severity ">=MEDIUM" # Medium and above (high + medium)
|
||||
--severity ">= low" # All severities (case-insensitive)
|
||||
```
|
||||
|
||||
**Deprecated aliases**: `--high-only` (use `--severity HIGH`), `--include-nonprod` (use `--keep-nonprod-severity`).
|
||||
|
||||
### Examples
|
||||
|
||||
```bash
|
||||
# Basic scan
|
||||
nyx scan
|
||||
|
||||
# Scan specific path, JSON output
|
||||
nyx scan ./server --format json
|
||||
|
||||
# CI gate: fail on medium+, SARIF output
|
||||
nyx scan . --format sarif --fail-on medium > results.sarif
|
||||
|
||||
# Fast AST-only scan, no index
|
||||
nyx scan . --mode ast --index off
|
||||
|
||||
# High-severity only, quiet mode
|
||||
nyx scan . --severity HIGH --quiet
|
||||
|
||||
# Only findings scoring 50 or above
|
||||
nyx scan . --min-score 50
|
||||
|
||||
# Only medium+ confidence findings
|
||||
nyx scan . --min-confidence medium
|
||||
|
||||
# Show everything (no filtering, no rollups)
|
||||
nyx scan . --all
|
||||
|
||||
# Include quality findings but keep rollups and budgets
|
||||
nyx scan . --include-quality
|
||||
|
||||
# See all unwrap findings expanded
|
||||
nyx scan . --include-quality --show-instances rs.quality.unwrap
|
||||
|
||||
# Allow more LOW findings
|
||||
nyx scan . --max-low 50 --max-low-per-file 5
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## `nyx index`
|
||||
|
||||
Manage the SQLite file index.
|
||||
|
||||
### `nyx index build`
|
||||
|
||||
```
|
||||
nyx index build [PATH] [--force]
|
||||
```
|
||||
|
||||
Build or update the index for the given path (default: `.`).
|
||||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `-f, --force` | Force full rebuild, ignoring cached file hashes |
|
||||
|
||||
### `nyx index status`
|
||||
|
||||
```
|
||||
nyx index status [PATH]
|
||||
```
|
||||
|
||||
Display index statistics (file count, size, last modified) for the given path.
|
||||
|
||||
---
|
||||
|
||||
## `nyx list`
|
||||
|
||||
```
|
||||
nyx list [-v]
|
||||
```
|
||||
|
||||
List all indexed projects.
|
||||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `-v, --verbose` | Show detailed information per project |
|
||||
|
||||
---
|
||||
|
||||
## `nyx clean`
|
||||
|
||||
```
|
||||
nyx clean [PROJECT] [--all]
|
||||
```
|
||||
|
||||
Remove index data.
|
||||
|
||||
| Argument/Flag | Description |
|
||||
|---------------|-------------|
|
||||
| `PROJECT` | Project name or path to clean |
|
||||
| `--all` | Clean all indexed projects |
|
||||
|
||||
---
|
||||
|
||||
## `nyx config`
|
||||
|
||||
Manage configuration.
|
||||
|
||||
### `nyx config show`
|
||||
|
||||
Print the effective merged configuration as TOML.
|
||||
|
||||
### `nyx config path`
|
||||
|
||||
Print the configuration directory path.
|
||||
|
||||
### `nyx config add-rule`
|
||||
|
||||
```
|
||||
nyx config add-rule --lang <LANG> --matcher <MATCHER> --kind <KIND> --cap <CAP>
|
||||
```
|
||||
|
||||
Add a custom taint rule. Written to `nyx.local`.
|
||||
|
||||
| Flag | Values |
|
||||
|------|--------|
|
||||
| `--lang` | `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby` |
|
||||
| `--matcher` | Function or property name to match |
|
||||
| `--kind` | `source`, `sanitizer`, `sink` |
|
||||
| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `all` |
|
||||
|
||||
### `nyx config add-terminator`
|
||||
|
||||
```
|
||||
nyx config add-terminator --lang <LANG> --name <NAME>
|
||||
```
|
||||
|
||||
Add a terminator function (e.g. `process.exit`). Written to `nyx.local`.
|
||||
|
||||
---
|
||||
|
||||
## Exit Codes
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| `0` | Scan completed; no findings matched `--fail-on` threshold (or no `--fail-on` specified) |
|
||||
| `1` | Scan completed but at least one finding met or exceeded the `--fail-on` severity |
|
||||
| Non-zero | Error during scan (I/O error, config parse error, database error, etc.) |
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables
|
||||
|
||||
| Variable | Description |
|
||||
|----------|-------------|
|
||||
| `RUST_LOG` | Set tracing verbosity (e.g. `RUST_LOG=debug nyx scan .`) |
|
||||
| `NO_COLOR` | Disable ANSI color output |
|
||||
183
docs/configuration.md
Normal file
183
docs/configuration.md
Normal file
|
|
@ -0,0 +1,183 @@
|
|||
# Configuration
|
||||
|
||||
Nyx uses TOML configuration files. A default config is auto-generated on first run.
|
||||
|
||||
## File Locations
|
||||
|
||||
| Platform | Directory |
|
||||
|----------|-----------|
|
||||
| Linux | `~/.config/nyx/` |
|
||||
| macOS | `~/Library/Application Support/nyx/` |
|
||||
| Windows | `%APPDATA%\elicpeter\nyx\config\` |
|
||||
|
||||
Run `nyx config path` to see the exact directory on your system.
|
||||
|
||||
## File Precedence
|
||||
|
||||
1. **`nyx.conf`** — Default config (auto-created from built-in template on first run)
|
||||
2. **`nyx.local`** — User overrides (loaded on top of defaults)
|
||||
|
||||
Both files are optional. CLI flags take precedence over both.
|
||||
|
||||
## Merge Strategy
|
||||
|
||||
| Type | Behavior |
|
||||
|------|----------|
|
||||
| Scalars (`mode`, `min_severity`, booleans) | User value wins |
|
||||
| Arrays (`excluded_extensions`, `excluded_directories`) | Union + deduplicate |
|
||||
| Analysis rules | Per-language union with deduplication |
|
||||
|
||||
Example:
|
||||
```toml
|
||||
# nyx.conf (default):
|
||||
excluded_extensions = ["jpg", "png", "exe"]
|
||||
|
||||
# nyx.local (user):
|
||||
excluded_extensions = ["foo", "jpg"]
|
||||
|
||||
# Effective result:
|
||||
# ["exe", "foo", "jpg", "png"] — sorted, deduped union
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Full Schema
|
||||
|
||||
### `[scanner]`
|
||||
|
||||
| Field | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `mode` | `"full"` \| `"ast"` \| `"cfg"` \| `"taint"` | `"full"` | Analysis mode |
|
||||
| `min_severity` | `"Low"` \| `"Medium"` \| `"High"` | `"Low"` | Minimum severity to report |
|
||||
| `max_file_size_mb` | int \| null | null | Max file size in MiB; null = unlimited |
|
||||
| `excluded_extensions` | [string] | `["jpg", "png", "gif", "mp4", ...]` | File extensions to skip |
|
||||
| `excluded_directories` | [string] | `["node_modules", ".git", "target", ...]` | Directories to skip |
|
||||
| `excluded_files` | [string] | `[]` | Specific files to skip |
|
||||
| `read_global_ignore` | bool | `false` | Honor global ignore file |
|
||||
| `read_vcsignore` | bool | `true` | Honor `.gitignore` / `.hgignore` |
|
||||
| `require_git_to_read_vcsignore` | bool | `true` | Require `.git` dir to apply gitignore |
|
||||
| `one_file_system` | bool | `false` | Don't cross filesystem boundaries |
|
||||
| `follow_symlinks` | bool | `false` | Follow symbolic links |
|
||||
| `scan_hidden_files` | bool | `false` | Scan dot-files |
|
||||
| `include_nonprod` | bool | `false` | Keep original severity for test/vendor paths |
|
||||
| `enable_state_analysis` | bool | `false` | Enable resource lifecycle + auth state analysis. Detects use-after-close, double-close, resource leaks (per-function scope), and unauthenticated access. Requires `mode = "full"` or `mode = "cfg"`. |
|
||||
|
||||
### `[database]`
|
||||
|
||||
| Field | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `path` | string | `""` | Custom SQLite DB path; empty = platform default |
|
||||
|
||||
### `[output]`
|
||||
|
||||
| Field | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `default_format` | `"console"` \| `"json"` \| `"sarif"` | `"console"` | Default output format |
|
||||
| `quiet` | bool | `false` | Suppress status messages |
|
||||
| `max_results` | int \| null | null | Cap number of findings; null = unlimited |
|
||||
| `attack_surface_ranking` | bool | `true` | Enable attack-surface ranking |
|
||||
| `min_score` | int \| null | null | Minimum rank score to include; null = no minimum |
|
||||
| `min_confidence` | string \| null | null | Minimum confidence level (`"low"`, `"medium"`, `"high"`); null = no minimum |
|
||||
| `include_quality` | bool | `false` | Include Quality-category findings (hidden by default) |
|
||||
| `show_all` | bool | `false` | Disable category filtering, rollups, and LOW budgets |
|
||||
| `max_low` | int | `20` | Maximum total LOW findings to show (rollups count as 1) |
|
||||
| `max_low_per_file` | int | `1` | Maximum LOW findings per file (rollups count as 1) |
|
||||
| `max_low_per_rule` | int | `10` | Maximum LOW findings per rule (rollups count as 1) |
|
||||
| `rollup_examples` | int | `5` | Number of example locations stored in rollup findings |
|
||||
|
||||
### `[performance]`
|
||||
|
||||
| Field | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `worker_threads` | int \| null | null | Worker thread count; null/0 = auto-detect |
|
||||
| `batch_size` | int | `100` | Files per index batch |
|
||||
| `channel_multiplier` | int | `4` | Channel capacity = threads x multiplier |
|
||||
| `rayon_thread_stack_size` | int | `8388608` | Rayon thread stack size in bytes (8 MiB) |
|
||||
| `prune` | bool | `false` | Stop traversing into matching directories |
|
||||
|
||||
### `[analysis.languages.<slug>]`
|
||||
|
||||
Per-language custom rules. `<slug>` is one of: `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby`.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `rules` | array of rule objects | Custom label rules |
|
||||
| `terminators` | [string] | Functions that terminate execution |
|
||||
| `event_handlers` | [string] | Event handler function names |
|
||||
|
||||
**Rule object**:
|
||||
|
||||
```toml
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["escapeHtml"]
|
||||
kind = "sanitizer" # "source" | "sanitizer" | "sink"
|
||||
cap = "html_escape" # "env_var" | "html_escape" | "shell_escape" |
|
||||
# "url_encode" | "json_parse" | "file_io" | "all"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Example Configurations
|
||||
|
||||
### Minimal override (`nyx.local`)
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
min_severity = "Medium"
|
||||
|
||||
[output]
|
||||
default_format = "json"
|
||||
max_results = 100
|
||||
```
|
||||
|
||||
### CI-optimized
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
mode = "full"
|
||||
min_severity = "Medium"
|
||||
excluded_directories = ["node_modules", ".git", "target", "vendor", "dist"]
|
||||
|
||||
[output]
|
||||
quiet = true
|
||||
default_format = "sarif"
|
||||
|
||||
[performance]
|
||||
worker_threads = 4
|
||||
```
|
||||
|
||||
### Custom rules for a Node.js project
|
||||
|
||||
```toml
|
||||
[analysis.languages.javascript]
|
||||
terminators = ["process.exit", "abort"]
|
||||
event_handlers = ["addEventListener"]
|
||||
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["escapeHtml", "sanitizeInput"]
|
||||
kind = "sanitizer"
|
||||
cap = "html_escape"
|
||||
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["dangerouslySetInnerHTML"]
|
||||
kind = "sink"
|
||||
cap = "html_escape"
|
||||
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["getRequestBody", "readUserInput"]
|
||||
kind = "source"
|
||||
cap = "all"
|
||||
```
|
||||
|
||||
### Adding rules via CLI
|
||||
|
||||
```bash
|
||||
# Add a sanitizer
|
||||
nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
|
||||
|
||||
# Add a terminator
|
||||
nyx config add-terminator --lang javascript --name process.exit
|
||||
|
||||
# Verify
|
||||
nyx config show
|
||||
```
|
||||
81
docs/detectors.md
Normal file
81
docs/detectors.md
Normal file
|
|
@ -0,0 +1,81 @@
|
|||
# Detector Overview
|
||||
|
||||
Nyx uses four independent detector families. Each targets different vulnerability classes and operates at a different level of analysis depth. Findings from all active detectors are merged, deduplicated, ranked, and presented in a single result set.
|
||||
|
||||
## The Four Detector Families
|
||||
|
||||
| Family | Rule prefix | Analysis depth | What it finds |
|
||||
|--------|------------|----------------|---------------|
|
||||
| [**Taint Analysis**](detectors/taint.md) | `taint-*` | Cross-file dataflow | Unsanitized data flowing from sources to sinks |
|
||||
| [**CFG Structural**](detectors/cfg.md) | `cfg-*` | Intra-procedural CFG | Auth gaps, unguarded sinks, resource leaks, error fallthrough |
|
||||
| [**State Model**](detectors/state.md) | `state-*` | Intra-procedural lattice | Use-after-close, double-close, resource leaks, unauthenticated access |
|
||||
| [**AST Patterns**](detectors/patterns.md) | `<lang>.*.*` | Structural (no flow) | Dangerous function calls, banned APIs, weak crypto |
|
||||
|
||||
## How They Combine
|
||||
|
||||
In `--mode full` (default), all four families run. Findings are deduplicated:
|
||||
|
||||
1. **Taint supersedes AST**: If a taint finding and an AST pattern both fire at the same location (e.g. both flag `eval(userInput)`), both are kept with distinct rule IDs. The taint finding ranks higher due to the analysis-kind bonus.
|
||||
|
||||
2. **State supersedes CFG**: If a state-model finding (e.g. `state-resource-leak`) fires at the same location as a CFG finding (e.g. `cfg-resource-leak`), the CFG finding is suppressed.
|
||||
|
||||
3. **Location-level dedup**: Exact duplicates (same line, column, rule ID, severity) are removed.
|
||||
|
||||
## Analysis Modes
|
||||
|
||||
| Mode | CLI flag | Active detectors |
|
||||
|------|----------|-----------------|
|
||||
| Full | `--mode full` | All four |
|
||||
| AST-only | `--mode ast` | AST patterns only |
|
||||
| CFG/Taint | `--mode cfg` | Taint + CFG + State |
|
||||
|
||||
## Attack-Surface Ranking
|
||||
|
||||
Every finding receives a deterministic **attack-surface score** estimating exploitability. Findings are sorted by descending score.
|
||||
|
||||
### Scoring Formula
|
||||
|
||||
```
|
||||
score = severity_base + analysis_kind + evidence_strength + state_bonus - validation_penalty
|
||||
```
|
||||
|
||||
| Component | Values | Purpose |
|
||||
|-----------|--------|---------|
|
||||
| **Severity base** | High=60, Medium=30, Low=10 | Primary signal |
|
||||
| **Analysis kind** | taint=+10, state=+8, cfg(with evidence)=+5, cfg(no evidence)=+3, ast=+0 | Confidence of analysis |
|
||||
| **Evidence strength** | +1 per evidence item (max 4), +2-6 for source kind | Specificity of finding |
|
||||
| **State bonus** | use-after-close/unauthed=+6, double-close=+3, must-leak=+2, may-leak=+1 | State rule severity |
|
||||
| **Validation penalty** | -5 if path-validated | Guard reduces exploitability |
|
||||
|
||||
### Source-kind priority
|
||||
|
||||
| Source type | Bonus | Examples |
|
||||
|-------------|-------|---------|
|
||||
| User input | +6 | `req.body`, `argv`, `stdin`, `form`, `query`, `params` |
|
||||
| Environment | +5 | `env::var`, `getenv`, `process.env` |
|
||||
| Unknown | +4 | Conservative default |
|
||||
| File system | +3 | `fs::read_to_string`, `fgets` |
|
||||
| Database | +2 | Query results |
|
||||
|
||||
### Score ranges (approximate)
|
||||
|
||||
| Finding type | Score range |
|
||||
|-------------|------------|
|
||||
| High taint + user input | ~76-80 |
|
||||
| High state (use-after-close) | ~74 |
|
||||
| High CFG structural | ~63-68 |
|
||||
| Medium taint + env source | ~45-50 |
|
||||
| Medium state (resource leak) | ~40 |
|
||||
| Low AST-only pattern | ~10 |
|
||||
|
||||
Ranking is enabled by default. Disable with `--no-rank` or `output.attack_surface_ranking = false`.
|
||||
|
||||
## Two-Pass Architecture
|
||||
|
||||
Nyx's taint analysis requires cross-file context, achieved via two passes:
|
||||
|
||||
1. **Pass 1 — Summary extraction**: Each file is parsed, a CFG is built, and a `FuncSummary` is extracted per function. Summaries capture source/sanitizer/sink capabilities (bitflags), taint propagation behavior, and callee lists. Summaries are persisted to SQLite.
|
||||
|
||||
2. **Pass 2 — Analysis**: All summaries are merged into a global map. Files are re-parsed and analyzed with full cross-file context. The taint engine resolves callees against local summaries (more precise) first, then falls back to global summaries.
|
||||
|
||||
With indexing enabled, Pass 1 skips files whose content hash hasn't changed since the last scan.
|
||||
161
docs/detectors/cfg.md
Normal file
161
docs/detectors/cfg.md
Normal file
|
|
@ -0,0 +1,161 @@
|
|||
# CFG Structural Analysis
|
||||
|
||||
## Summary
|
||||
|
||||
Nyx builds an intra-procedural control-flow graph (CFG) for each function and analyzes structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error-handling code terminates properly.
|
||||
|
||||
These detectors use **dominator analysis** — they check whether a guard node dominates (must execute before) a sink node on the CFG.
|
||||
|
||||
## Rule IDs
|
||||
|
||||
| Rule ID | Severity | Description |
|
||||
|---------|----------|-------------|
|
||||
| `cfg-unguarded-sink` | High/Medium | Sink reachable without a dominating guard or sanitizer |
|
||||
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth check |
|
||||
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
|
||||
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
|
||||
| `cfg-unreachable-source` | Low | Source in unreachable code |
|
||||
| `cfg-error-fallthrough` | High/Medium | Error check doesn't terminate; dangerous code follows |
|
||||
| `cfg-resource-leak` | Medium | Resource acquired but not released on all exit paths |
|
||||
| `cfg-lock-not-released` | Medium | Lock acquired but not released on all exit paths |
|
||||
|
||||
## What It Detects
|
||||
|
||||
### Unguarded sinks (`cfg-unguarded-sink`)
|
||||
A sink call (e.g. `system()`, `eval()`, `Command::new()`) is reachable from the function entry without passing through a guard or sanitizer that matches the sink's capability.
|
||||
|
||||
### Auth gaps (`cfg-auth-gap`)
|
||||
A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`) reaches a privileged sink (shell execution, file I/O) without a prior call to an authentication function (`is_authenticated`, `require_auth`, `check_permission`, etc.).
|
||||
|
||||
### Unreachable security code (`cfg-unreachable-*`)
|
||||
Sinks, sanitizers, or sources in dead code branches. This often indicates a refactoring error where security-critical code was accidentally made unreachable.
|
||||
|
||||
### Error fallthrough (`cfg-error-fallthrough`)
|
||||
An error check (null check, error return check) does not terminate the function or loop back. Execution continues to a dangerous operation on the error path.
|
||||
|
||||
### Resource leaks (`cfg-resource-leak`, `cfg-lock-not-released`)
|
||||
A resource acquisition call (e.g. `File::open`, `fopen`, `socket`, `Lock`) is not matched by a release call (e.g. `close`, `fclose`, `unlock`) on all exit paths from the function.
|
||||
|
||||
## What It Cannot Detect
|
||||
|
||||
- **Inter-procedural guards**: If authentication is checked in a middleware function that calls this handler, the CFG detector cannot see it. It only analyzes one function at a time.
|
||||
- **Dynamic dispatch**: Virtual method calls, function pointers, and closures are opaque to the CFG.
|
||||
- **Complex guard patterns**: Only recognized guard function names are checked. Custom validation logic (e.g. `if password == expected`) is not recognized as a guard.
|
||||
- **Correct sanitization**: The detector checks that *some* guard dominates the sink, not that the guard is *correct*. A guard that always passes would suppress the finding.
|
||||
- **Cross-function resource flows**: If a file handle is opened in one function and closed in another, the detector will report a leak in the first function.
|
||||
|
||||
## Common False Positives
|
||||
|
||||
| Scenario | Why it fires | Mitigation |
|
||||
|----------|-------------|------------|
|
||||
| Framework-level auth middleware | Handler doesn't call auth directly | Document as expected; suppress with severity filter |
|
||||
| Resource closed via RAII/defer | Implicit cleanup not visible to CFG | Currently not detected; known limitation |
|
||||
| Custom guard function name | Function not in the recognized guard list | Add the function name as a sanitizer in config |
|
||||
| Test handlers | Intentionally skip auth in tests | Default non-prod downgrade reduces severity; or exclude test dirs |
|
||||
|
||||
## Common False Negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| Auth in called function | Cross-function guards not tracked |
|
||||
| Guard via type system | Type-level guarantees (e.g. Rust's `AuthenticatedUser` wrapper) not analyzed |
|
||||
| Resource closed in finally/defer | Some cleanup patterns not recognized |
|
||||
|
||||
## Confidence Signals
|
||||
|
||||
| Signal | Meaning |
|
||||
|--------|---------|
|
||||
| **Evidence lists guard nodes** | Shows which guards were checked and found missing |
|
||||
| **Sink has high capability** | Shell execution or file I/O sinks are higher risk |
|
||||
| **Handler detection matched** | Web handler identification is based on conventional parameter names |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Add custom guards/sanitizers
|
||||
|
||||
```toml
|
||||
[[analysis.languages.python.rules]]
|
||||
matchers = ["validate_request", "check_csrf"]
|
||||
kind = "sanitizer"
|
||||
cap = "all"
|
||||
```
|
||||
|
||||
### Add auth rules
|
||||
|
||||
Auth checks are recognized by function name. If your codebase uses non-standard names:
|
||||
|
||||
```toml
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["ensureLoggedIn", "requirePermission"]
|
||||
kind = "sanitizer"
|
||||
cap = "all"
|
||||
```
|
||||
|
||||
### Filter results
|
||||
|
||||
```bash
|
||||
# Skip low-severity unreachable findings
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
```
|
||||
|
||||
### Disable CFG analysis
|
||||
|
||||
```bash
|
||||
nyx scan . --mode ast # AST patterns only
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Unguarded sink
|
||||
|
||||
```go
|
||||
func handler(w http.ResponseWriter, r *http.Request) {
|
||||
cmd := r.URL.Query().Get("cmd")
|
||||
exec.Command("sh", "-c", cmd).Run() // cfg-unguarded-sink: no guard dominates
|
||||
}
|
||||
```
|
||||
|
||||
### Auth gap
|
||||
|
||||
```javascript
|
||||
app.get('/admin/delete', (req, res) => {
|
||||
// No is_authenticated() call
|
||||
db.execute("DELETE FROM users WHERE id = " + req.params.id);
|
||||
// cfg-auth-gap: web handler reaches privileged sink without auth
|
||||
});
|
||||
```
|
||||
|
||||
### Resource leak
|
||||
|
||||
```c
|
||||
void process() {
|
||||
FILE *f = fopen("data.txt", "r"); // acquire
|
||||
if (error) {
|
||||
return; // cfg-resource-leak: f not closed on this path
|
||||
}
|
||||
fclose(f);
|
||||
}
|
||||
```
|
||||
|
||||
## Guard Rules
|
||||
|
||||
Nyx recognizes these function name patterns as guards:
|
||||
|
||||
| Pattern | Applies to |
|
||||
|---------|-----------|
|
||||
| `validate*`, `sanitize*` | All sinks |
|
||||
| `check_*`, `verify_*`, `assert_*` | All sinks |
|
||||
| `shell_escape` | Shell execution sinks |
|
||||
| `html_escape` | HTML/XSS sinks |
|
||||
| `url_encode` | URL sinks |
|
||||
| `which` | Shell execution (binary lookup) |
|
||||
|
||||
### Auth rules
|
||||
|
||||
| Pattern | Category |
|
||||
|---------|----------|
|
||||
| `is_authenticated`, `require_auth`, `check_permission` | Common |
|
||||
| `authorize`, `authenticate`, `require_login` | Common |
|
||||
| `check_auth`, `verify_token`, `validate_token` | Common |
|
||||
| `middleware.auth`, `auth.required` | Go |
|
||||
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |
|
||||
149
docs/detectors/patterns.md
Normal file
149
docs/detectors/patterns.md
Normal file
|
|
@ -0,0 +1,149 @@
|
|||
# AST Pattern Matching
|
||||
|
||||
## Summary
|
||||
|
||||
AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.
|
||||
|
||||
AST patterns run in all analysis modes, including `--mode ast` (where they are the only active detector).
|
||||
|
||||
## Rule IDs
|
||||
|
||||
Pattern rule IDs follow the format `<lang>.<category>.<specific>`:
|
||||
|
||||
```
|
||||
rs.memory.transmute
|
||||
js.code_exec.eval
|
||||
py.deser.pickle_loads
|
||||
c.memory.gets
|
||||
java.sqli.execute_concat
|
||||
```
|
||||
|
||||
See the [Rule Reference](../rules/index.md) for a complete listing per language.
|
||||
|
||||
## Pattern Tiers
|
||||
|
||||
| Tier | Meaning | Examples |
|
||||
|------|---------|---------|
|
||||
| **A** | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
|
||||
| **B** | Query includes a heuristic guard | SQL `execute` with concatenated arg, `printf(var)` with non-literal format |
|
||||
|
||||
Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, `java.sqli.execute_concat` only fires when `executeQuery()` receives a `binary_expression` (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
|
||||
|
||||
## What It Detects
|
||||
|
||||
### By category
|
||||
|
||||
| Category | What it matches | Example languages |
|
||||
|----------|----------------|-------------------|
|
||||
| **CommandExec** | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
|
||||
| **CodeExec** | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
|
||||
| **Deserialization** | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
|
||||
| **SqlInjection** | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
|
||||
| **PathTraversal** | File inclusion with variable path | PHP (`include $var`) |
|
||||
| **Xss** | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
|
||||
| **Crypto** | Weak cryptographic algorithms | All languages (`md5`, `sha1`, `Math.random()`) |
|
||||
| **Secrets** | Hardcoded credentials | Go (variable name matching) |
|
||||
| **InsecureTransport** | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
|
||||
| **Reflection** | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
|
||||
| **MemorySafety** | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
|
||||
| **Prototype** | Prototype pollution | JS/TS (`__proto__` assignment) |
|
||||
| **CodeQuality** | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |
|
||||
|
||||
## What It Cannot Detect
|
||||
|
||||
- **Dataflow**: Patterns don't track whether the dangerous function receives tainted input. `eval("hello")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`.
|
||||
- **Context**: Patterns don't understand whether the code is reachable, guarded, or inside a test.
|
||||
- **Semantics**: `strcpy(dst, src)` always matches — it cannot determine buffer sizes.
|
||||
- **Indirect calls**: Function pointers, dynamic dispatch, and aliased references are invisible.
|
||||
|
||||
## Common False Positives
|
||||
|
||||
| Scenario | Why it fires | Mitigation |
|
||||
|----------|-------------|------------|
|
||||
| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis won't flag this — use `--mode cfg` for fewer false positives |
|
||||
| `unsafe` block in Rust with sound justification | All unsafe blocks match | Filter with `--severity ">=MEDIUM"` (unsafe_block is Medium) |
|
||||
| `.unwrap()` in test code | Acceptable in tests | Default non-prod downgrade reduces severity |
|
||||
| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter Low severity or add to exclusions |
|
||||
| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |
|
||||
|
||||
## Common False Negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
|
||||
| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
|
||||
| SQL injection via ORM query builder | No pattern for ORM-specific query building |
|
||||
| Imported function under different name | `from os import system as s; s(cmd)` — pattern looks for `system` |
|
||||
|
||||
## Confidence Signals
|
||||
|
||||
| Signal | Meaning |
|
||||
|--------|---------|
|
||||
| **Tier A** | High confidence — the function itself is dangerous |
|
||||
| **Tier B** | Moderate confidence — heuristic guard reduces false positives |
|
||||
| **High severity** | Critical vulnerability class (command exec, deserialization) |
|
||||
| **Low severity** | Informational (weak crypto, code quality) |
|
||||
| **Non-prod path** | Finding in test/vendor code — downgraded by default |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Severity filtering
|
||||
|
||||
```bash
|
||||
# Skip code-quality and weak-crypto findings
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
|
||||
# Only critical findings
|
||||
nyx scan . --severity HIGH
|
||||
```
|
||||
|
||||
### Use taint for precision
|
||||
|
||||
```bash
|
||||
# Taint-only mode: only report findings with confirmed dataflow
|
||||
nyx scan . --mode cfg
|
||||
```
|
||||
|
||||
### Exclude directories
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
excluded_directories = ["node_modules", "vendor", "generated"]
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Tier A — structural presence
|
||||
|
||||
**C: Banned function**
|
||||
```c
|
||||
char buf[64];
|
||||
gets(buf); // c.memory.gets — always dangerous, no safe usage
|
||||
```
|
||||
|
||||
**Python: Unsafe deserialization**
|
||||
```python
|
||||
import pickle
|
||||
data = pickle.loads(user_input) # py.deser.pickle_loads
|
||||
```
|
||||
|
||||
### Tier B — heuristic-guarded
|
||||
|
||||
**Java: SQL concatenation**
|
||||
```java
|
||||
// Fires: concatenated argument
|
||||
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
|
||||
// java.sqli.execute_concat
|
||||
|
||||
// Does NOT fire: parameterized query
|
||||
stmt.executeQuery(preparedSql);
|
||||
```
|
||||
|
||||
**C: Format string**
|
||||
```c
|
||||
// Fires: variable as first argument
|
||||
printf(user_input); // c.memory.printf_no_fmt
|
||||
|
||||
// Does NOT fire: literal format string
|
||||
printf("%s", user_input);
|
||||
```
|
||||
204
docs/detectors/state.md
Normal file
204
docs/detectors/state.md
Normal file
|
|
@ -0,0 +1,204 @@
|
|||
# State Model Analysis
|
||||
|
||||
## Summary
|
||||
|
||||
Nyx's state model analysis tracks **resource lifecycle** and **authentication state** through a function using monotone dataflow over bounded lattices. It detects use-after-close bugs, double-close bugs, resource leaks, and unauthenticated access to privileged operations.
|
||||
|
||||
State analysis is **opt-in** — enable it with `scanner.enable_state_analysis = true` in config. It requires `mode = "full"` or `mode = "cfg"`.
|
||||
|
||||
## Rule IDs
|
||||
|
||||
| Rule ID | Severity | Description |
|
||||
|---------|----------|-------------|
|
||||
| `state-use-after-close` | High | Variable used after being closed/released |
|
||||
| `state-double-close` | Medium | Resource closed twice |
|
||||
| `state-resource-leak` | Medium | Resource opened but never closed (definite) |
|
||||
| `state-resource-leak-possible` | Low | Resource may not be closed on all paths |
|
||||
| `state-unauthed-access` | High | Privileged operation reached without authentication |
|
||||
|
||||
## What It Detects
|
||||
|
||||
### Use-after-close (`state-use-after-close`)
|
||||
|
||||
A resource transitions to the CLOSED state (via `close()`, `fclose()`, `disconnect()`, etc.), then a use operation (`read`, `write`, `send`, `recv`, `query`, etc.) is performed on it.
|
||||
|
||||
```c
|
||||
FILE *f = fopen("data.txt", "r");
|
||||
fclose(f);
|
||||
fread(buf, 1, 100, f); // state-use-after-close
|
||||
```
|
||||
|
||||
### Double-close (`state-double-close`)
|
||||
|
||||
A resource is closed twice. This can cause crashes or undefined behavior.
|
||||
|
||||
```python
|
||||
f = open("data.txt")
|
||||
f.close()
|
||||
f.close() # state-double-close
|
||||
```
|
||||
|
||||
### Resource leak (`state-resource-leak`)
|
||||
|
||||
A resource is opened but never closed on any path through the function. This is a definite leak.
|
||||
|
||||
```java
|
||||
FileInputStream fis = new FileInputStream("data.txt");
|
||||
process(fis);
|
||||
// function exits without fis.close() — state-resource-leak
|
||||
```
|
||||
|
||||
### Possible resource leak (`state-resource-leak-possible`)
|
||||
|
||||
A resource is closed on some paths but not others.
|
||||
|
||||
```go
|
||||
f, err := os.Open("data.txt")
|
||||
if err != nil {
|
||||
return // f not closed here
|
||||
}
|
||||
f.Close() // closed here
|
||||
// state-resource-leak-possible on the error path
|
||||
```
|
||||
|
||||
### Unauthenticated access (`state-unauthed-access`)
|
||||
|
||||
A function identified as a web handler reaches a privileged sink (shell execution, file I/O) without any authentication check on the path.
|
||||
|
||||
A function is identified as a web handler if:
|
||||
1. Its name starts with `handle_`, `route_`, or `api_` (strong match — sufficient on its own), OR
|
||||
2. Its name starts with `serve_` or `process_` AND any function in the file has web-like parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, etc., varying by language).
|
||||
|
||||
The function name `main` is explicitly excluded.
|
||||
|
||||
```javascript
|
||||
app.post('/admin/exec', (req, res) => {
|
||||
// No auth check
|
||||
exec(req.body.command); // state-unauthed-access
|
||||
});
|
||||
```
|
||||
|
||||
## What It Cannot Detect
|
||||
|
||||
- **Cross-function resource management**: Resources opened in one function and closed in another are not tracked. This is the most common source of false positives for leak detection.
|
||||
- **RAII / defer / try-with-resources**: Implicit cleanup via language-level constructs (Rust's `Drop`, Go's `defer`, Java's try-with-resources, Python's `with`) is not recognized. These patterns will produce false-positive leak findings.
|
||||
- **Dynamic dispatch**: If `close()` is called through a trait object or interface, it may not be recognized.
|
||||
- **Authentication via type system**: Rust's type-state pattern (e.g. `AuthenticatedRequest<T>`) is not recognized as an auth check.
|
||||
- **Complex authorization logic**: Only recognized function name patterns are checked.
|
||||
|
||||
## Common False Positives
|
||||
|
||||
| Scenario | Why it fires | Mitigation |
|
||||
|----------|-------------|------------|
|
||||
| RAII / Drop / defer cleanup | Implicit cleanup not visible | Known limitation; filter by severity |
|
||||
| Resource returned to caller | Ownership transferred, not leaked | Known limitation |
|
||||
| Framework-managed resources | Web framework manages connection lifecycle | Exclude framework-generated handlers |
|
||||
| Try-with-resources (Java) | Language construct not parsed | Known limitation |
|
||||
| Context manager (Python `with`) | Block construct not tracked | Known limitation |
|
||||
|
||||
## Common False Negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| Resource closed in helper function | Cross-function tracking not implemented |
|
||||
| Auth in middleware | Auth check happens before handler is called |
|
||||
| Double-close via aliased reference | Alias analysis not performed |
|
||||
|
||||
## Confidence Signals
|
||||
|
||||
| Signal | Meaning |
|
||||
|--------|---------|
|
||||
| **Definite leak (state-resource-leak)** | Resource is never closed on any path — high confidence |
|
||||
| **Use-after-close** | Read/write operation after explicit close — high confidence |
|
||||
| **Web handler detected** | Entry point matched by parameter naming convention |
|
||||
| **Possible leak (state-resource-leak-possible)** | Resource closed on some but not all paths — lower confidence |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Enable state analysis
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
enable_state_analysis = true
|
||||
```
|
||||
|
||||
### Severity filtering
|
||||
|
||||
```bash
|
||||
# Skip possible-leak findings (Low severity)
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
```
|
||||
|
||||
### Exclude test files
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
excluded_directories = ["tests", "test", "spec"]
|
||||
```
|
||||
|
||||
## Resource Pairs
|
||||
|
||||
The state engine recognizes these acquire/release pairs per language:
|
||||
|
||||
### C/C++
|
||||
| Acquire | Release | Resource |
|
||||
|---------|---------|----------|
|
||||
| `fopen` | `fclose` | File handle |
|
||||
| `open` | `close` | File descriptor |
|
||||
| `socket` | `close` | Socket |
|
||||
| `malloc`, `calloc`, `realloc` | `free` | Heap memory |
|
||||
| `pthread_mutex_lock` | `pthread_mutex_unlock` | Mutex |
|
||||
|
||||
### Rust
|
||||
| Acquire | Release | Resource |
|
||||
|---------|---------|----------|
|
||||
| `File::open`, `File::create` | `drop`, `close` | File handle |
|
||||
| `TcpStream::connect` | `shutdown` | TCP connection |
|
||||
| `lock`, `read`, `write` (on Mutex/RwLock) | `drop` | Lock guard |
|
||||
|
||||
### Java
|
||||
| Acquire | Release | Resource |
|
||||
|---------|---------|----------|
|
||||
| `new FileInputStream` | `close` | File stream |
|
||||
| `getConnection` | `close` | DB connection |
|
||||
| `new Socket` | `close` | Socket |
|
||||
|
||||
### Go, Python, JavaScript, Ruby, PHP
|
||||
Similar patterns with language-specific function names.
|
||||
|
||||
## Use Patterns (Trigger use-after-close)
|
||||
|
||||
The following operations on a closed resource trigger `state-use-after-close`:
|
||||
|
||||
```
|
||||
read, write, send, recv, fread, fwrite, fgets, fputs, fprintf, fscanf,
|
||||
fflush, fseek, ftell, rewind, feof, ferror, fgetc, fputc, getc, putc,
|
||||
ungetc, query, execute, fetch, sendto, recvfrom, ioctl, fcntl,
|
||||
strcpy, strncpy, strcat, strncat, memcpy, memmove, memset, memcmp,
|
||||
strcmp, strncmp, strlen, sprintf, snprintf
|
||||
```
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Resource Lifecycle Lattice
|
||||
|
||||
```
|
||||
UNINIT → OPEN → CLOSED
|
||||
→ MOVED
|
||||
```
|
||||
|
||||
States are tracked as bitflags, allowing the lattice to represent uncertainty (e.g. OPEN|CLOSED means the resource is open on some paths and closed on others).
|
||||
|
||||
### Leak Detection Scope
|
||||
|
||||
Resource leaks are checked at the file-level exit node and the **synthesized** function exit node (a single Return node that all early returns feed into). Early-return nodes are **not** checked individually — only the merged state at the function's synthesized exit is inspected. This prevents duplicate findings where an early-return path reports a definite leak while the merged exit correctly reports a possible leak.
|
||||
|
||||
This per-function exit inspection ensures that a variable leaked inside one function is not masked by a same-named variable that is properly closed in a subsequent function.
|
||||
|
||||
### Auth Level Lattice
|
||||
|
||||
```
|
||||
Unauthed < Authed < Admin
|
||||
```
|
||||
|
||||
Join semantics: take the minimum (conservative). If any path is unauthenticated, the result is unauthenticated.
|
||||
202
docs/detectors/taint.md
Normal file
202
docs/detectors/taint.md
Normal file
|
|
@ -0,0 +1,202 @@
|
|||
# Taint Analysis
|
||||
|
||||
## Summary
|
||||
|
||||
Nyx's taint analysis tracks the flow of untrusted data from **sources** (where data enters the program) through **assignments and function calls** to **sinks** (where dangerous operations happen). If the data reaches a sink without passing through a **sanitizer** with matching capabilities, a finding is emitted.
|
||||
|
||||
The engine uses a monotone forward dataflow analysis over a finite lattice with guaranteed termination. Analysis is **intra-procedural with cross-file function summaries** — it does not follow calls into other functions but uses pre-computed summaries of their behavior.
|
||||
|
||||
## Rule ID
|
||||
|
||||
```
|
||||
taint-unsanitised-flow (source <line>:<col>)
|
||||
```
|
||||
|
||||
One rule ID covers all taint findings. The parenthetical identifies the specific source location.
|
||||
|
||||
## What It Detects
|
||||
|
||||
- Environment variables flowing to shell execution (`env::var` → `Command::new`)
|
||||
- User input flowing to code evaluation (`req.body` → `eval()`)
|
||||
- File contents flowing to SQL queries (`fs::read_to_string` → `db.execute()`)
|
||||
- Request parameters flowing to HTML output (`req.query` → `innerHTML`)
|
||||
- Any source-to-sink flow where the sink's required capability is not stripped by a sanitizer
|
||||
|
||||
## What It Cannot Detect
|
||||
|
||||
- **Inter-procedural flows without summaries**: If a function isn't summarized (e.g. from a third-party library without source), the taint engine cannot track data through it. It conservatively treats unknown callees as neither propagating nor sanitizing.
|
||||
- **Flows through data structures**: Taint is tracked per-variable, not per-field. `obj.field = tainted; sink(obj.other_field)` may produce a false positive because taint attaches to `obj` as a whole.
|
||||
- **Aliasing**: `let y = &x; sink(*y)` — the engine tracks `y` as a fresh variable, not an alias of `x`. This can cause false negatives.
|
||||
- **Complex control flow**: The analysis is flow-sensitive (respects control flow within a function) but does not track taint through arbitrary loops with complex exit conditions.
|
||||
- **Implicit flows**: Taint only follows explicit data flow, not information flow through branching (e.g. `if (secret) { x = 1 } else { x = 0 }` does not taint `x`).
|
||||
|
||||
## Common False Positives
|
||||
|
||||
| Scenario | Why it happens | Mitigation |
|
||||
|----------|---------------|------------|
|
||||
| Custom sanitizer not recognized | Nyx only knows built-in and configured sanitizers | Add a custom sanitizer rule in config |
|
||||
| Taint through struct fields | Variable-level (not field-level) tracking | No current mitigation; field sensitivity is planned |
|
||||
| Dead code paths | The engine is path-insensitive within a function (it considers all paths) | Contradiction pruning catches some cases; path-validated findings score lower |
|
||||
| Library wrappers | A wrapper around a dangerous function may re-introduce taint that was sanitized by the wrapper | Summarize the wrapper function or add it as a sanitizer |
|
||||
|
||||
## Common False Negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| Third-party library calls | No summary available; callee treated as opaque |
|
||||
| Taint through global/static variables | Not tracked across function boundaries |
|
||||
| Taint through closures/callbacks in some languages | Closure capture analysis is limited (JS/TS/Ruby/Go anonymous functions ARE analyzed) |
|
||||
| Flows spanning more than two files | Summary approximation loses precision at depth |
|
||||
|
||||
## Confidence Signals
|
||||
|
||||
These signals in the output indicate higher-confidence findings:
|
||||
|
||||
| Signal | What it means |
|
||||
|--------|--------------|
|
||||
| **Evidence: Source + Sink** | Both endpoints identified with specific function names and locations |
|
||||
| **Source kind = user input** | Source is directly controllable by an attacker (req.body, argv, etc.) |
|
||||
| **path_validated = false** | No validation guard on the path — higher exploitability |
|
||||
| **No guard_kind** | No dominating predicate check (null check, error check, etc.) |
|
||||
| **High rank_score** | Multiple confidence signals combined |
|
||||
|
||||
Lower-confidence:
|
||||
|
||||
| Signal | What it means |
|
||||
|--------|--------------|
|
||||
| **path_validated = true** | A validation predicate guards the path — may not be exploitable |
|
||||
| **guard_kind = "ValidationCall"** | An explicit validation function was called before the sink |
|
||||
| **Source kind = database** | Data from DB — may already be validated at insertion time |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Add custom sanitizers
|
||||
|
||||
If your codebase has a custom sanitizer that Nyx doesn't recognize:
|
||||
|
||||
```toml
|
||||
# nyx.local
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["escapeHtml", "sanitizeInput"]
|
||||
kind = "sanitizer"
|
||||
cap = "html_escape"
|
||||
```
|
||||
|
||||
Or via CLI:
|
||||
```bash
|
||||
nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
|
||||
```
|
||||
|
||||
### Filter by severity
|
||||
|
||||
```bash
|
||||
nyx scan . --severity HIGH # Only high-severity taint findings
|
||||
nyx scan . --severity ">=MEDIUM" # Skip low-severity
|
||||
```
|
||||
|
||||
### Skip non-production code
|
||||
|
||||
By default, findings in `tests/`, `vendor/`, `build/` paths are downgraded one severity tier. To exclude them entirely, add to config:
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
excluded_directories = ["tests", "vendor", "build", "examples"]
|
||||
```
|
||||
|
||||
### Disable taint (AST-only mode)
|
||||
|
||||
```bash
|
||||
nyx scan . --mode ast
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
**Vulnerable code** (Rust):
|
||||
```rust
|
||||
use std::env;
|
||||
use std::process::Command;
|
||||
|
||||
fn main() {
|
||||
let cmd = env::var("USER_CMD").unwrap(); // line 5: source
|
||||
Command::new("sh").arg("-c").arg(&cmd).output(); // line 6: sink
|
||||
}
|
||||
```
|
||||
|
||||
**Finding**:
|
||||
```
|
||||
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
|
||||
Source: env::var("USER_CMD") at 5:15
|
||||
Sink: Command::new("sh").arg("-c")
|
||||
Score: 76
|
||||
```
|
||||
|
||||
**Safe alternative**:
|
||||
```rust
|
||||
use std::env;
|
||||
use std::process::Command;
|
||||
|
||||
fn main() {
|
||||
let cmd = env::var("USER_CMD").unwrap();
|
||||
// Use the value as a direct argument, not a shell command
|
||||
Command::new(&cmd).output();
|
||||
// Or validate against an allowlist
|
||||
}
|
||||
```
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Capability System
|
||||
|
||||
Taint uses a bitflag capability system to match sources with appropriate sanitizers and sinks:
|
||||
|
||||
| Capability | Bit | Sources | Sanitizers | Sinks |
|
||||
|-----------|-----|---------|------------|-------|
|
||||
| `ENV_VAR` | 0x01 | `env::var`, `getenv` | — | — |
|
||||
| `HTML_ESCAPE` | 0x02 | — | `html_escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
|
||||
| `SHELL_ESCAPE` | 0x04 | — | `shell_escape` | `Command::new`, `system()`, `eval()` |
|
||||
| `URL_ENCODE` | 0x08 | — | `encodeURIComponent` | `location.href` |
|
||||
| `JSON_PARSE` | 0x10 | — | `JSON.parse` | — |
|
||||
| `FILE_IO` | 0x20 | — | `filepath.Clean`, `basename`, `os.path.realpath` | `fopen`, `open`, `send_file`, `fs::read_to_string` |
|
||||
| `FMT_STRING` | 0x40 | — | — | `printf(var)` |
|
||||
|
||||
Sources typically use `Cap::all()` to match any sink. A sanitizer strips specific capability bits. A finding fires when a tainted variable reaches a sink and the taint still has the matching capability bit set.
|
||||
|
||||
### Nested Function Analysis
|
||||
|
||||
The CFG builder recursively discovers function expressions nested inside call arguments:
|
||||
|
||||
- **JavaScript/TypeScript**: `function_expression`, `arrow_function` inside call arguments (e.g., Express route handlers)
|
||||
- **Ruby**: `do_block` and `block` nodes (e.g., Sinatra `get '/path' do...end`)
|
||||
- **Go**: `func_literal` (anonymous function literals)
|
||||
|
||||
Each nested function is walked as a separate scope and receives a unique identifier (`<anon@{byte_offset}>`) to prevent collisions when multiple anonymous functions exist in the same file.
|
||||
|
||||
### Chained Call Classification
|
||||
|
||||
Method chains like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments between `.` separators. The classifier matches against both the original text and the normalized form, enabling rules like `r.URL` to match within `r.URL.Query.Get`.
|
||||
|
||||
### Nested Call Fallback
|
||||
|
||||
When the outermost call in an expression doesn't classify as a source/sink, the engine tries all nested inner calls. This handles patterns like `str(eval(expr))` where `str` is not a sink but the inner `eval` is.
|
||||
|
||||
### Rust `if let` / `while let` Pattern Bindings
|
||||
|
||||
The CFG builder recognizes Rust `let_condition` nodes inside `if` and `while` expressions. The value expression is classified for source/sink labels, and the pattern binding is extracted as a variable definition:
|
||||
|
||||
```rust
|
||||
if let Ok(cmd) = env::var("CMD") {
|
||||
// cmd is tainted — env::var is a source, cmd is the binding
|
||||
Command::new("sh").arg("-c").arg(&cmd).output(); // taint-unsanitised-flow
|
||||
}
|
||||
```
|
||||
|
||||
This also works for `while let` patterns.
|
||||
|
||||
### JS/TS Two-Level Solve
|
||||
|
||||
For JavaScript and TypeScript, taint analysis uses a two-level approach:
|
||||
|
||||
1. **Level 1**: Solve top-level code (module scope)
|
||||
2. **Level 2**: Solve each function seeded with the converged top-level state
|
||||
|
||||
This prevents false positives from cross-function taint leakage while preserving global-to-function flows.
|
||||
32
docs/index.md
Normal file
32
docs/index.md
Normal file
|
|
@ -0,0 +1,32 @@
|
|||
# Nyx Documentation
|
||||
|
||||
Welcome to the Nyx documentation. Nyx is a multi-language static vulnerability scanner built in Rust.
|
||||
|
||||
## User Guide
|
||||
|
||||
- [Installation](installation.md) — Install via cargo, prebuilt binaries, or from source
|
||||
- [Quick Start](quickstart.md) — Your first scan in 60 seconds
|
||||
- [CLI Reference](cli.md) — Every flag, subcommand, and option
|
||||
- [Configuration](configuration.md) — Config file schema, precedence, custom rules
|
||||
- [Output Formats](output.md) — Console, JSON, SARIF; exit codes; evidence fields
|
||||
|
||||
## Detector Reference
|
||||
|
||||
- [Detector Overview](detectors.md) — How the four detector families work together
|
||||
- [Taint Analysis](detectors/taint.md) — Cross-file source-to-sink dataflow tracking
|
||||
- [CFG Structural Analysis](detectors/cfg.md) — Auth gaps, unguarded sinks, resource leaks
|
||||
- [State Model Analysis](detectors/state.md) — Resource lifecycle and authentication state
|
||||
- [AST Patterns](detectors/patterns.md) — Tree-sitter structural pattern matching
|
||||
|
||||
## Rule Reference
|
||||
|
||||
- [Rule Index](rules/index.md) — How rules are organized
|
||||
- [Rust](rules/rust.md) | [C](rules/c.md) | [C++](rules/cpp.md) | [Java](rules/java.md) | [Go](rules/go.md)
|
||||
- [JavaScript](rules/javascript.md) | [TypeScript](rules/typescript.md) | [Python](rules/python.md)
|
||||
- [PHP](rules/php.md) | [Ruby](rules/ruby.md)
|
||||
|
||||
## Contributing
|
||||
|
||||
- [Contributing Guide](../CONTRIBUTING.md) — Development setup, adding rules, PR guidelines
|
||||
- [Security Policy](../SECURITY.md) — Responsible disclosure
|
||||
- [Code of Conduct](../CODE_OF_CONDUCT.md)
|
||||
76
docs/installation.md
Normal file
76
docs/installation.md
Normal file
|
|
@ -0,0 +1,76 @@
|
|||
# Installation
|
||||
|
||||
## Install from crates.io
|
||||
|
||||
```bash
|
||||
cargo install nyx-scanner
|
||||
```
|
||||
|
||||
This installs the `nyx` binary into `~/.cargo/bin/`.
|
||||
|
||||
## Install from GitHub releases
|
||||
|
||||
1. Go to the [Releases](https://github.com/elicpeter/nyx/releases) page.
|
||||
2. Download the binary for your platform:
|
||||
|
||||
| Platform | Archive |
|
||||
|----------|---------|
|
||||
| Linux x86_64 | `nyx-x86_64-unknown-linux-gnu.zip` |
|
||||
| macOS Intel | `nyx-x86_64-apple-darwin.zip` |
|
||||
| macOS Apple Silicon | `nyx-aarch64-apple-darwin.zip` |
|
||||
| Windows x86_64 | `nyx-x86_64-pc-windows-msvc.zip` |
|
||||
|
||||
3. Extract and install:
|
||||
|
||||
```bash
|
||||
# Linux / macOS
|
||||
unzip nyx-*.zip
|
||||
chmod +x nyx
|
||||
sudo mv nyx /usr/local/bin/
|
||||
|
||||
# Windows (PowerShell)
|
||||
Expand-Archive -Path nyx-*.zip -DestinationPath .
|
||||
Move-Item -Path .\nyx.exe -Destination "C:\Program Files\Nyx\"
|
||||
```
|
||||
|
||||
4. Verify:
|
||||
```bash
|
||||
nyx --version
|
||||
```
|
||||
|
||||
## Build from source
|
||||
|
||||
```bash
|
||||
git clone https://github.com/elicpeter/nyx.git
|
||||
cd nyx
|
||||
cargo build --release
|
||||
cargo install --path .
|
||||
```
|
||||
|
||||
Requires **Rust 1.85+** (edition 2024).
|
||||
|
||||
## CI Integration
|
||||
|
||||
### GitHub Actions
|
||||
|
||||
```yaml
|
||||
- name: Install Nyx
|
||||
run: cargo install nyx-scanner
|
||||
|
||||
- name: Run security scan
|
||||
run: nyx scan . --format sarif --fail-on medium > results.sarif
|
||||
|
||||
- name: Upload SARIF
|
||||
uses: github/codeql-action/upload-sarif@v3
|
||||
with:
|
||||
sarif_file: results.sarif
|
||||
```
|
||||
|
||||
### Generic CI
|
||||
|
||||
```bash
|
||||
# Fail the build if any High or Medium finding is detected
|
||||
nyx scan . --severity ">=MEDIUM" --fail-on medium --quiet --format json
|
||||
```
|
||||
|
||||
The `--fail-on` flag causes Nyx to exit with code **1** if any finding meets or exceeds the given severity. Exit code **0** means no findings matched.
|
||||
315
docs/output.md
Normal file
315
docs/output.md
Normal file
|
|
@ -0,0 +1,315 @@
|
|||
# Output Formats
|
||||
|
||||
Nyx supports three output formats, selected with `--format` or `output.default_format` in config.
|
||||
|
||||
## Console (default)
|
||||
|
||||
Human-readable, color-coded output to stdout. Status messages go to stderr.
|
||||
|
||||
```
|
||||
[HIGH] taint-unsanitised-flow (source 5:11) src/handler.rs:12:5 (Score: 76, Confidence: High)
|
||||
Source: env::var("CMD") → Command::new("sh").arg("-c")
|
||||
|
||||
[MEDIUM] cfg-unguarded-sink src/handler.rs:12:5 (Score: 35, Confidence: Medium)
|
||||
|
||||
[LOW] rs.quality.unwrap src/lib.rs:88:5 (Score: 10, Confidence: High)
|
||||
```
|
||||
|
||||
### Severity indicators
|
||||
|
||||
| Tag | Color | Meaning |
|
||||
|-----|-------|---------|
|
||||
| `[HIGH]` | Red, bold | Critical — likely exploitable |
|
||||
| `[MEDIUM]` | Orange, bold | Important — may be exploitable |
|
||||
| `[LOW]` | Muted blue-gray | Informational — code quality or weak signal |
|
||||
|
||||
### Evidence fields
|
||||
|
||||
Taint and state findings include structured evidence:
|
||||
|
||||
| Label | Meaning |
|
||||
|-------|---------|
|
||||
| **Source** | Where tainted data originated (function name + location) |
|
||||
| **Sink** | Where the dangerous operation happens |
|
||||
| **Path guard** | Type of validation predicate protecting the path |
|
||||
|
||||
### Score
|
||||
|
||||
When attack-surface ranking is enabled (default), each finding shows a `Score` value. Higher scores indicate greater exploitability. See [Detector Overview](detectors.md) for the scoring formula.
|
||||
|
||||
### Rollup findings
|
||||
|
||||
High-frequency LOW Quality findings (e.g. `rs.quality.unwrap`) are grouped into rollup findings by `(file, rule)`:
|
||||
|
||||
```
|
||||
21:10 ● [LOW] rs.quality.unwrap
|
||||
rs.quality.unwrap (38 occurrences)
|
||||
Examples: 21:10, 50:10, 79:10, 105:10, 134:10
|
||||
Run: nyx scan --show-instances rs.quality.unwrap
|
||||
```
|
||||
|
||||
Rollups count as **one finding** for LOW budget enforcement. Use `--show-instances <RULE>` to expand a specific rule or `--all` to disable rollups entirely.
|
||||
|
||||
### Suppression footer
|
||||
|
||||
When findings are suppressed by the prioritization pipeline, a footer is shown:
|
||||
|
||||
```
|
||||
Suppressed 195 LOW/Quality findings.
|
||||
Active filters:
|
||||
include_quality = false
|
||||
max_low = 20
|
||||
max_low_per_file = 1
|
||||
max_low_per_rule = 10
|
||||
|
||||
Use --include-quality, --max-low, or --all to adjust.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## JSON
|
||||
|
||||
Machine-readable JSON array. Each finding is an object:
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"path": "src/handler.rs",
|
||||
"line": 12,
|
||||
"col": 5,
|
||||
"severity": "High",
|
||||
"id": "taint-unsanitised-flow (source 5:11)",
|
||||
"path_validated": false,
|
||||
"labels": [
|
||||
["Source", "env::var(\"CMD\") at 5:11"],
|
||||
["Sink", "Command::new(\"sh\").arg(\"-c\")"]
|
||||
],
|
||||
"confidence": "High",
|
||||
"evidence": {
|
||||
"source": {
|
||||
"path": "src/handler.rs",
|
||||
"line": 5,
|
||||
"col": 11,
|
||||
"kind": "source",
|
||||
"snippet": "env::var(\"CMD\")"
|
||||
},
|
||||
"sink": {
|
||||
"path": "src/handler.rs",
|
||||
"line": 12,
|
||||
"col": 5,
|
||||
"kind": "sink",
|
||||
"snippet": "Command::new(\"sh\")"
|
||||
},
|
||||
"notes": ["source_kind:EnvironmentConfig"]
|
||||
},
|
||||
"rank_score": 76.0,
|
||||
"rank_reason": [
|
||||
["severity_base", "60"],
|
||||
["analysis_kind", "10"],
|
||||
["source_kind", "5"],
|
||||
["evidence_count", "1"]
|
||||
]
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
### Field descriptions
|
||||
|
||||
| Field | Type | Always present | Description |
|
||||
|-------|------|----------------|-------------|
|
||||
| `path` | string | yes | File path relative to scan root |
|
||||
| `line` | int | yes | 1-indexed line number |
|
||||
| `col` | int | yes | 1-indexed column number |
|
||||
| `severity` | string | yes | `"High"`, `"Medium"`, or `"Low"` |
|
||||
| `id` | string | yes | Rule ID |
|
||||
| `category` | string | yes | Finding category: `"Security"`, `"Reliability"`, or `"Quality"` |
|
||||
| `path_validated` | bool | no | True if guarded by validation predicate |
|
||||
| `guard_kind` | string | no | Predicate type (e.g. `"NullCheck"`, `"ValidationCall"`) |
|
||||
| `message` | string | no | Human-readable context (state analysis findings) |
|
||||
| `labels` | array | no | Array of `[label, value]` pairs for console display |
|
||||
| `confidence` | string | no | Confidence level: `"Low"`, `"Medium"`, or `"High"` |
|
||||
| `evidence` | object | no | Structured evidence (source/sink spans, state, notes) |
|
||||
| `rank_score` | float | no | Attack-surface score (omitted when ranking disabled) |
|
||||
| `rank_reason` | array | no | Score breakdown (omitted when ranking disabled) |
|
||||
| `rollup` | object | no | Rollup data when findings are grouped (see below) |
|
||||
|
||||
Fields marked "no" are omitted when empty/null/false to keep output compact.
|
||||
|
||||
### Confidence levels
|
||||
|
||||
| Level | Meaning |
|
||||
|-------|---------|
|
||||
| `High` | Strong signal — taint-confirmed flow, definite state violation |
|
||||
| `Medium` | Moderate signal — resource leak, path-validated taint, CFG structural |
|
||||
| `Low` | Weak signal — AST pattern match, possible resource leak, degraded analysis |
|
||||
|
||||
### Evidence object
|
||||
|
||||
The `evidence` field provides structured provenance data:
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `source` | object | Source span (path, line, col, kind, snippet) |
|
||||
| `sink` | object | Sink span (path, line, col, kind, snippet) |
|
||||
| `guards` | array | Validation guard spans |
|
||||
| `sanitizers` | array | Sanitizer spans |
|
||||
| `state` | object | State-machine evidence (machine, subject, from_state, to_state) |
|
||||
| `notes` | array | Free-form notes (e.g. `"source_kind:UserInput"`, `"path_validated"`) |
|
||||
|
||||
All fields are omitted when empty/null.
|
||||
|
||||
### Rollup object
|
||||
|
||||
When a finding is a rollup (grouped from multiple occurrences), the `rollup` field is present:
|
||||
|
||||
```json
|
||||
{
|
||||
"rollup": {
|
||||
"count": 38,
|
||||
"occurrences": [
|
||||
{ "line": 21, "col": 10 },
|
||||
{ "line": 50, "col": 10 },
|
||||
{ "line": 79, "col": 10 }
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `count` | int | Total number of occurrences |
|
||||
| `occurrences` | array | First N example locations (controlled by `rollup_examples`) |
|
||||
|
||||
---
|
||||
|
||||
## SARIF (Static Analysis Results Interchange Format)
|
||||
|
||||
SARIF 2.1.0 JSON, suitable for GitHub Code Scanning and other SARIF-compatible tools.
|
||||
|
||||
```bash
|
||||
nyx scan . --format sarif > results.sarif
|
||||
```
|
||||
|
||||
The SARIF output includes:
|
||||
|
||||
- **Tool metadata** — Nyx name and version
|
||||
- **Rules** — Rule ID, description, severity mapping
|
||||
- **Results** — One result per finding with location, message, and properties
|
||||
- **Properties** — Each result includes `category` and optionally `confidence` and `rollup.count`
|
||||
- **Related locations** — Rollup findings include example locations in `relatedLocations`
|
||||
- **Artifacts** — File paths referenced by findings
|
||||
|
||||
### GitHub Code Scanning integration
|
||||
|
||||
```yaml
|
||||
- name: Run Nyx
|
||||
run: nyx scan . --format sarif > results.sarif
|
||||
|
||||
- name: Upload SARIF
|
||||
uses: github/codeql-action/upload-sarif@v3
|
||||
with:
|
||||
sarif_file: results.sarif
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Exit Codes
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| `0` | Scan completed successfully; no findings matched `--fail-on` threshold |
|
||||
| `1` | `--fail-on` threshold breached (at least one finding meets or exceeds the specified severity) |
|
||||
| Non-zero | Error (I/O, config, database, parse error) |
|
||||
|
||||
Without `--fail-on`, Nyx always exits `0` on a successful scan regardless of findings count.
|
||||
|
||||
---
|
||||
|
||||
## Severity Levels
|
||||
|
||||
| Level | Description | Typical rules |
|
||||
|-------|-------------|---------------|
|
||||
| **High** | Critical vulnerabilities — likely exploitable | Command injection, unsafe deserialization, banned C functions, taint-confirmed flows with user input sources |
|
||||
| **Medium** | Important issues — may be exploitable with additional context | SQL concatenation, XSS sinks, reflection, unguarded sinks, resource leaks |
|
||||
| **Low** | Informational — code quality or weak signals | Weak crypto algorithms, insecure randomness, `unwrap()`/`panic!()`, type-safety escapes |
|
||||
|
||||
### Non-production severity downgrade
|
||||
|
||||
By default, findings in paths matching common non-production patterns (`tests/`, `test/`, `vendor/`, `build/`, `examples/`, `benchmarks/`) are downgraded by one tier:
|
||||
|
||||
- High → Medium
|
||||
- Medium → Low
|
||||
- Low → Low (unchanged)
|
||||
|
||||
Use `--keep-nonprod-severity` to disable this behavior.
|
||||
|
||||
---
|
||||
|
||||
## Inline Suppressions
|
||||
|
||||
Suppress specific findings directly in source code using `nyx:ignore` comments. Suppressed findings are excluded from output, severity counts, and `--fail-on` checks by default.
|
||||
|
||||
### Comment syntax
|
||||
|
||||
| Language | Comment styles |
|
||||
|----------|---------------|
|
||||
| Rust, C, C++, Java, Go, JS, TS | `// nyx:ignore ...` or `/* nyx:ignore ... */` |
|
||||
| Python, Ruby | `# nyx:ignore ...` |
|
||||
| PHP | `// nyx:ignore ...`, `# nyx:ignore ...`, or `/* nyx:ignore ... */` |
|
||||
|
||||
### Directive forms
|
||||
|
||||
```python
|
||||
x = dangerous() # nyx:ignore taint-unsanitised-flow ← suppresses this line
|
||||
# nyx:ignore-next-line taint-unsanitised-flow
|
||||
x = dangerous() ← suppresses this line
|
||||
```
|
||||
|
||||
- `nyx:ignore <RULE_ID>` — suppresses findings on the **same line** as the comment.
|
||||
- `nyx:ignore-next-line <RULE_ID>` — suppresses findings on the **next line**.
|
||||
- For taint findings, the primary line is the **sink line** (the `line` field in output).
|
||||
|
||||
### Rule ID matching
|
||||
|
||||
- **Case-sensitive**, exact match after canonicalization.
|
||||
- Comma-separated: `nyx:ignore rule-a, rule-b`
|
||||
- Wildcard suffix: `nyx:ignore rs.quality.*` matches any ID starting with `rs.quality.`
|
||||
- Taint IDs are canonicalized: `nyx:ignore taint-unsanitised-flow` matches `taint-unsanitised-flow (source 5:1)` (parenthetical suffix stripped).
|
||||
|
||||
### Console behavior
|
||||
|
||||
- **Default**: suppressed findings are hidden entirely.
|
||||
- **`--show-suppressed`**: suppressed findings appear dimmed with `[SUPPRESSED]` tag. Summary shows `"N issues (M suppressed)"`.
|
||||
|
||||
### JSON / SARIF behavior
|
||||
|
||||
- **Default**: suppressed findings are excluded from JSON/SARIF output.
|
||||
- **`--show-suppressed`**: suppressed findings are included with additional fields:
|
||||
|
||||
```json
|
||||
{
|
||||
"suppressed": true,
|
||||
"suppression": {
|
||||
"kind": "SameLine",
|
||||
"matched_pattern": "taint-unsanitised-flow",
|
||||
"directive_line": 42
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Exit code
|
||||
|
||||
Suppressed findings do **not** trigger `--fail-on`. A scan with only suppressed findings exits `0`.
|
||||
|
||||
---
|
||||
|
||||
## Rule ID Format
|
||||
|
||||
| Prefix | Detector | Example |
|
||||
|--------|----------|---------|
|
||||
| `taint-*` | Taint analysis | `taint-unsanitised-flow (source 5:11)` |
|
||||
| `cfg-*` | CFG structural | `cfg-unguarded-sink`, `cfg-auth-gap` |
|
||||
| `state-*` | State model | `state-use-after-close`, `state-resource-leak` |
|
||||
| `<lang>.*.*` | AST patterns | `rs.memory.transmute`, `js.code_exec.eval` |
|
||||
|
||||
See the [Rule Reference](rules/index.md) for a complete listing.
|
||||
103
docs/quickstart.md
Normal file
103
docs/quickstart.md
Normal file
|
|
@ -0,0 +1,103 @@
|
|||
# Quick Start
|
||||
|
||||
## Your first scan
|
||||
|
||||
```bash
|
||||
# Scan the current directory
|
||||
nyx scan
|
||||
|
||||
# Scan a specific path
|
||||
nyx scan ./my-project
|
||||
```
|
||||
|
||||
Nyx automatically creates an SQLite index on first run. Subsequent scans skip unchanged files.
|
||||
|
||||
## Understanding the output
|
||||
|
||||
A typical console output looks like:
|
||||
|
||||
```
|
||||
[HIGH] taint-unsanitised-flow (source 5:11) src/handler.rs:12:5
|
||||
Source: env::var("CMD") at 5:11
|
||||
Sink: Command::new("sh").arg("-c")
|
||||
Score: 76
|
||||
|
||||
[MEDIUM] cfg-unguarded-sink src/handler.rs:12:5
|
||||
Score: 35
|
||||
|
||||
[MEDIUM] rs.quality.unsafe_block src/lib.rs:44:5
|
||||
Score: 30
|
||||
```
|
||||
|
||||
Each finding shows:
|
||||
|
||||
| Field | Meaning |
|
||||
|-------|---------|
|
||||
| **Severity tag** | `[HIGH]`, `[MEDIUM]`, or `[LOW]` |
|
||||
| **Rule ID** | Identifies the detector and specific rule |
|
||||
| **Location** | `file:line:col` |
|
||||
| **Evidence** | Source, Sink, and guard details (taint findings only) |
|
||||
| **Score** | Attack-surface ranking score (higher = more exploitable) |
|
||||
|
||||
## Common workflows
|
||||
|
||||
### CI gate — fail on high-severity findings
|
||||
|
||||
```bash
|
||||
nyx scan . --fail-on high --quiet
|
||||
# Exit code 1 if any HIGH finding exists, 0 otherwise
|
||||
```
|
||||
|
||||
### Export for tooling
|
||||
|
||||
```bash
|
||||
# JSON for scripting
|
||||
nyx scan . --format json > findings.json
|
||||
|
||||
# SARIF for GitHub Code Scanning
|
||||
nyx scan . --format sarif > results.sarif
|
||||
```
|
||||
|
||||
### Fast structural scan (no dataflow)
|
||||
|
||||
```bash
|
||||
nyx scan . --mode ast
|
||||
```
|
||||
|
||||
AST-only mode runs tree-sitter pattern queries without building CFGs or running taint analysis. Much faster, but misses dataflow vulnerabilities.
|
||||
|
||||
### Filter by severity
|
||||
|
||||
```bash
|
||||
# Only high-severity
|
||||
nyx scan . --severity HIGH
|
||||
|
||||
# High and medium
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
|
||||
# Specific set
|
||||
nyx scan . --severity "HIGH,MEDIUM"
|
||||
```
|
||||
|
||||
### Skip the index
|
||||
|
||||
```bash
|
||||
nyx scan . --index off
|
||||
```
|
||||
|
||||
Useful for one-off scans or when you don't want to write to disk.
|
||||
|
||||
### Scan without non-production noise
|
||||
|
||||
By default, findings in test/vendor/build paths are downgraded one severity tier. To keep original severity:
|
||||
|
||||
```bash
|
||||
nyx scan . --keep-nonprod-severity
|
||||
```
|
||||
|
||||
## Next steps
|
||||
|
||||
- [CLI Reference](cli.md) — All flags and options
|
||||
- [Configuration](configuration.md) — Customize rules, exclusions, and behavior
|
||||
- [Detector Overview](detectors.md) — How the analysis engines work
|
||||
- [Rule Reference](rules/index.md) — Browse all rules by language
|
||||
89
docs/rules/c.md
Normal file
89
docs/rules/c.md
Normal file
|
|
@ -0,0 +1,89 @@
|
|||
# C Rules
|
||||
|
||||
Nyx detects C vulnerabilities through AST patterns (banned functions, format strings) and taint analysis (user input → shell execution, buffer overflow sinks).
|
||||
|
||||
## Taint Sources
|
||||
|
||||
| Function | Capability | Source Kind |
|
||||
|----------|-----------|-------------|
|
||||
| `getenv` | `all` | EnvironmentConfig |
|
||||
| `fgets`, `scanf`, `fscanf`, `gets`, `read` | `all` | UserInput |
|
||||
|
||||
## Taint Sinks
|
||||
|
||||
| Function | Required Capability |
|
||||
|----------|-------------------|
|
||||
| `system`, `popen`, `exec*` family | `SHELL_ESCAPE` |
|
||||
| `sprintf`, `strcpy`, `strcat` | `HTML_ESCAPE` |
|
||||
| `printf`, `fprintf` | `FMT_STRING` |
|
||||
| `fopen`, `open` | `FILE_IO` |
|
||||
|
||||
---
|
||||
|
||||
## AST Pattern Rules
|
||||
|
||||
### Memory Safety (Banned Functions)
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `c.memory.gets` | High | A | `gets()` — no bounds checking, always exploitable |
|
||||
| `c.memory.strcpy` | High | A | `strcpy()` — no bounds checking on destination buffer |
|
||||
| `c.memory.strcat` | High | A | `strcat()` — no bounds checking on destination buffer |
|
||||
| `c.memory.sprintf` | High | A | `sprintf()` — no length limit on output buffer |
|
||||
| `c.memory.scanf_percent_s` | High | A | `scanf("%s")` — unbounded string read |
|
||||
| `c.memory.printf_no_fmt` | High | B | `printf(var)` — format-string vulnerability (non-literal first arg) |
|
||||
|
||||
### Command Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `c.cmdi.system` | High | A | `system()` — shell command execution |
|
||||
| `c.cmdi.popen` | Medium | A | `popen()` — shell command execution with pipe |
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### `c.memory.gets` — Banned function
|
||||
|
||||
**Vulnerable:**
|
||||
```c
|
||||
char buf[64];
|
||||
gets(buf); // No bounds checking — buffer overflow
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```c
|
||||
char buf[64];
|
||||
fgets(buf, sizeof(buf), stdin);
|
||||
```
|
||||
|
||||
### `c.memory.printf_no_fmt` — Format string
|
||||
|
||||
**Vulnerable:**
|
||||
```c
|
||||
char *user_input = get_input();
|
||||
printf(user_input); // Format string vulnerability
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```c
|
||||
char *user_input = get_input();
|
||||
printf("%s", user_input);
|
||||
```
|
||||
|
||||
### `c.cmdi.system` — Shell execution
|
||||
|
||||
**Vulnerable:**
|
||||
```c
|
||||
char cmd[256];
|
||||
snprintf(cmd, sizeof(cmd), "ls %s", user_dir);
|
||||
system(cmd); // Command injection if user_dir contains shell metacharacters
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```c
|
||||
// Use execvp with explicit argument array
|
||||
char *args[] = {"ls", user_dir, NULL};
|
||||
execvp("ls", args);
|
||||
```
|
||||
66
docs/rules/cpp.md
Normal file
66
docs/rules/cpp.md
Normal file
|
|
@ -0,0 +1,66 @@
|
|||
# C++ Rules
|
||||
|
||||
C++ rules inherit C banned-function concerns and add C++-specific patterns like dangerous casts.
|
||||
|
||||
## Taint Labels
|
||||
|
||||
C++ shares taint labels with C. See [C Rules](c.md) for the full source/sink/sanitizer listing.
|
||||
|
||||
---
|
||||
|
||||
## AST Pattern Rules
|
||||
|
||||
### Memory Safety
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `cpp.memory.gets` | High | A | `gets()` — no bounds checking, always exploitable |
|
||||
| `cpp.memory.strcpy` | High | A | `strcpy()` — no bounds checking on destination |
|
||||
| `cpp.memory.strcat` | High | A | `strcat()` — no bounds checking on destination |
|
||||
| `cpp.memory.sprintf` | High | A | `sprintf()` — no length limit on output |
|
||||
| `cpp.memory.reinterpret_cast` | Medium | A | `reinterpret_cast` — type-punning cast |
|
||||
| `cpp.memory.const_cast` | Medium | A | `const_cast` — removes const/volatile qualifier |
|
||||
| `cpp.memory.printf_no_fmt` | High | B | `printf(var)` — format-string vulnerability |
|
||||
|
||||
### Command Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `cpp.cmdi.system` | High | A | `system()` — shell command execution |
|
||||
| `cpp.cmdi.popen` | High | A | `popen()` — shell command execution |
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### `cpp.memory.reinterpret_cast` — Type-punning cast
|
||||
|
||||
**Flagged:**
|
||||
```cpp
|
||||
int x = 42;
|
||||
float* fp = reinterpret_cast<float*>(&x); // Type-punning, may violate strict aliasing
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```cpp
|
||||
int x = 42;
|
||||
float f;
|
||||
std::memcpy(&f, &x, sizeof(f)); // Well-defined type punning
|
||||
```
|
||||
|
||||
### `cpp.memory.const_cast` — Removing const
|
||||
|
||||
**Flagged:**
|
||||
```cpp
|
||||
void process(const std::string& s) {
|
||||
char* p = const_cast<char*>(s.c_str()); // Removes const
|
||||
p[0] = 'X'; // Undefined behavior
|
||||
}
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```cpp
|
||||
void process(std::string s) { // Take by value
|
||||
s[0] = 'X';
|
||||
}
|
||||
```
|
||||
148
docs/rules/go.md
Normal file
148
docs/rules/go.md
Normal file
|
|
@ -0,0 +1,148 @@
|
|||
# Go Rules
|
||||
|
||||
Nyx detects Go vulnerabilities through AST patterns and taint analysis, covering command execution, unsafe pointer usage, TLS misconfiguration, weak crypto, SQL injection, hardcoded secrets, and deserialization.
|
||||
|
||||
## Taint Labels
|
||||
|
||||
Go has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/go.rs`.
|
||||
|
||||
### Sources
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `os.Getenv` | all |
|
||||
| `http.Request`, `r.FormValue`, `r.URL`, `r.Body`, `r.Header` | all |
|
||||
| `r.URL.Query`, `r.URL.Query.Get`, `Request.FormValue`, `Request.URL` | all |
|
||||
|
||||
### Sanitizers
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `html.EscapeString`, `template.HTMLEscapeString` | HTML_ESCAPE |
|
||||
| `url.QueryEscape`, `url.PathEscape` | URL_ENCODE |
|
||||
| `filepath.Clean`, `filepath.Base` | FILE_IO |
|
||||
|
||||
### Sinks
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `exec.Command` | SHELL_ESCAPE |
|
||||
| `db.Query`, `db.Exec`, `db.QueryRow`, `db.Prepare` | SHELL_ESCAPE |
|
||||
| `fmt.Fprintf`, `fmt.Sprintf`, `fmt.Printf` | FMT_STRING |
|
||||
| `os.Open`, `os.OpenFile`, `os.Create`, `ioutil.ReadFile`, `os.ReadFile` | FILE_IO |
|
||||
| `template.HTML` | HTML_ESCAPE |
|
||||
|
||||
> **Note:** Chained calls like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments before matching, so `r.URL.Query.Get` matches the source rule.
|
||||
|
||||
---
|
||||
|
||||
## AST Pattern Rules
|
||||
|
||||
### Command Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `go.cmdi.exec_command` | High | A | `exec.Command()` — arbitrary process execution |
|
||||
|
||||
### Memory Safety
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `go.memory.unsafe_pointer` | Medium | A | `unsafe.Pointer` — bypasses Go type system |
|
||||
|
||||
### Insecure Transport
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `go.transport.insecure_skip_verify` | High | A | `InsecureSkipVerify: true` — disables TLS certificate validation |
|
||||
|
||||
### Weak Crypto
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `go.crypto.md5` | Low | A | `md5.New()` / `md5.Sum()` — weak hash algorithm |
|
||||
| `go.crypto.sha1` | Low | A | `sha1.New()` / `sha1.Sum()` — weak hash algorithm |
|
||||
|
||||
### SQL Injection
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `go.sqli.query_concat` | Medium | B | `db.Query`/`Exec`/`QueryRow` with concatenated string |
|
||||
|
||||
### Secrets
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `go.secrets.hardcoded_key` | Medium | A | Variable with secret-like name assigned a string literal |
|
||||
|
||||
### Deserialization
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `go.deser.gob_decode` | Medium | A | `gob.NewDecoder` — Go binary deserialization |
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### `go.transport.insecure_skip_verify` — TLS misconfiguration
|
||||
|
||||
**Vulnerable:**
|
||||
```go
|
||||
tr := &http.Transport{
|
||||
TLSClientConfig: &tls.Config{
|
||||
InsecureSkipVerify: true, // Disables certificate verification
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```go
|
||||
tr := &http.Transport{
|
||||
TLSClientConfig: &tls.Config{
|
||||
// Use proper CA certificates
|
||||
RootCAs: certPool,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
### `go.sqli.query_concat` — SQL concatenation
|
||||
|
||||
**Vulnerable:**
|
||||
```go
|
||||
rows, err := db.Query("SELECT * FROM users WHERE id=" + userID)
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```go
|
||||
rows, err := db.Query("SELECT * FROM users WHERE id=$1", userID)
|
||||
```
|
||||
|
||||
### `go.secrets.hardcoded_key` — Hardcoded secret
|
||||
|
||||
**Flagged:**
|
||||
```go
|
||||
apiKey := "sk-1234567890abcdef"
|
||||
password := "hunter2"
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```go
|
||||
apiKey := os.Getenv("API_KEY")
|
||||
password := os.Getenv("DB_PASSWORD")
|
||||
```
|
||||
|
||||
### `go.cmdi.exec_command` — Command execution
|
||||
|
||||
**Vulnerable:**
|
||||
```go
|
||||
cmd := exec.Command("sh", "-c", userInput)
|
||||
cmd.Run()
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```go
|
||||
// Use explicit command and arguments, not shell
|
||||
cmd := exec.Command("ls", "-la", safeDir)
|
||||
cmd.Run()
|
||||
```
|
||||
79
docs/rules/index.md
Normal file
79
docs/rules/index.md
Normal file
|
|
@ -0,0 +1,79 @@
|
|||
# Rule Reference
|
||||
|
||||
This section lists every detection rule in Nyx, organized by language.
|
||||
|
||||
## Rule ID Format
|
||||
|
||||
| Prefix | Detector Family | Example |
|
||||
|--------|----------------|---------|
|
||||
| `taint-*` | [Taint analysis](../detectors/taint.md) | `taint-unsanitised-flow (source 5:11)` |
|
||||
| `cfg-*` | [CFG structural](../detectors/cfg.md) | `cfg-unguarded-sink`, `cfg-auth-gap` |
|
||||
| `state-*` | [State model](../detectors/state.md) | `state-use-after-close`, `state-resource-leak` |
|
||||
| `<lang>.*.*` | [AST patterns](../detectors/patterns.md) | `rs.memory.transmute`, `js.code_exec.eval` |
|
||||
|
||||
## Cross-Language Rules
|
||||
|
||||
These rules apply to all supported languages:
|
||||
|
||||
### Taint Rules
|
||||
|
||||
| Rule ID | Severity | Description |
|
||||
|---------|----------|-------------|
|
||||
| `taint-unsanitised-flow (source L:C)` | Varies by source kind | Unsanitized data flows from source to sink |
|
||||
|
||||
### CFG Structural Rules
|
||||
|
||||
| Rule ID | Severity | Description |
|
||||
|---------|----------|-------------|
|
||||
| `cfg-unguarded-sink` | High/Medium | Sink without dominating guard |
|
||||
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth |
|
||||
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
|
||||
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
|
||||
| `cfg-unreachable-source` | Low | Source in unreachable code |
|
||||
| `cfg-error-fallthrough` | High/Medium | Error path doesn't terminate before dangerous code |
|
||||
| `cfg-resource-leak` | Medium | Resource not released on all exit paths |
|
||||
| `cfg-lock-not-released` | Medium | Lock not released on all exit paths |
|
||||
|
||||
### State Model Rules
|
||||
|
||||
| Rule ID | Severity | Description |
|
||||
|---------|----------|-------------|
|
||||
| `state-use-after-close` | High | Variable used after being closed |
|
||||
| `state-double-close` | Medium | Resource closed twice |
|
||||
| `state-resource-leak` | Medium | Resource never closed (definite) |
|
||||
| `state-resource-leak-possible` | Low | Resource may not close on all paths |
|
||||
| `state-unauthed-access` | High | Privileged operation without authentication |
|
||||
|
||||
## Per-Language AST Pattern Rules
|
||||
|
||||
Each language page lists all AST pattern rules with examples:
|
||||
|
||||
- [Rust](rust.md) — 12 rules (memory safety, code quality)
|
||||
- [C](c.md) — 8 rules (banned functions, command execution, format strings)
|
||||
- [C++](cpp.md) — 9 rules (banned functions, dangerous casts, command execution)
|
||||
- [Java](java.md) — 8 rules (deserialization, command execution, reflection, SQL, crypto, XSS)
|
||||
- [Go](go.md) — 8 rules (command execution, unsafe pointer, TLS, crypto, SQL, secrets, deserialization)
|
||||
- [JavaScript](javascript.md) — 12 rules (code execution, XSS, prototype pollution, crypto, transport)
|
||||
- [TypeScript](typescript.md) — 10 rules (mirrors JS + type-safety escapes)
|
||||
- [Python](python.md) — 12 rules (code execution, command execution, deserialization, SQL, crypto, XSS)
|
||||
- [PHP](php.md) — 11 rules (code execution, command execution, deserialization, SQL, path traversal, crypto)
|
||||
- [Ruby](ruby.md) — 10 rules (code execution, command execution, deserialization, reflection, SSRF, crypto)
|
||||
|
||||
## Taint Label Coverage
|
||||
|
||||
Taint analysis uses language-specific source/sink/sanitizer labels. Coverage varies by language:
|
||||
|
||||
| Language | Sources | Sinks | Sanitizers | Coverage |
|
||||
|----------|---------|-------|------------|----------|
|
||||
| Rust | Complete | Complete | Complete | Full |
|
||||
| JavaScript | Complete | Complete | Partial | Full |
|
||||
| TypeScript | Partial | Partial | Partial | Moderate |
|
||||
| Python | Partial | Complete | Partial | Moderate |
|
||||
| C | Partial | Complete | Minimal | Moderate |
|
||||
| C++ | Partial | Complete | Minimal | Moderate |
|
||||
| Java | Partial | Partial | Partial | Moderate |
|
||||
| Go | Complete | Complete | Partial | Full |
|
||||
| PHP | Complete | Complete | Partial | Full |
|
||||
| Ruby | Partial | Partial | Partial | Moderate |
|
||||
|
||||
"Starter" coverage means basic rules exist but many common library functions are not yet labeled. Contributions welcome.
|
||||
135
docs/rules/java.md
Normal file
135
docs/rules/java.md
Normal file
|
|
@ -0,0 +1,135 @@
|
|||
# Java Rules
|
||||
|
||||
Nyx detects Java vulnerabilities through AST patterns and taint analysis, covering deserialization, command execution, reflection, SQL injection, weak crypto, and XSS.
|
||||
|
||||
## Taint Labels
|
||||
|
||||
Java has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/java.rs`.
|
||||
|
||||
### Sources
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `System.getenv` | all |
|
||||
| `getParameter`, `getInputStream`, `getHeader`, `getCookies`, `getReader`, `getQueryString`, `getPathInfo` | all |
|
||||
| `readObject`, `readLine` | all |
|
||||
|
||||
### Sanitizers
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `HtmlUtils.htmlEscape`, `StringEscapeUtils.escapeHtml4` | HTML_ESCAPE |
|
||||
|
||||
### Sinks
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `Runtime.exec`, `ProcessBuilder` | SHELL_ESCAPE |
|
||||
| `executeQuery`, `executeUpdate`, `prepareStatement` | SHELL_ESCAPE |
|
||||
| `Class.forName` | SHELL_ESCAPE |
|
||||
| `println`, `print`, `write` | HTML_ESCAPE |
|
||||
|
||||
---
|
||||
|
||||
## AST Pattern Rules
|
||||
|
||||
### Deserialization
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `java.deser.readobject` | High | A | `ObjectInputStream.readObject()` — unsafe deserialization |
|
||||
|
||||
### Command Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `java.cmdi.runtime_exec` | High | A | `Runtime.getRuntime().exec()` — shell command execution |
|
||||
|
||||
### Reflection
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `java.reflection.class_forname` | Medium | A | `Class.forName()` — dynamic class loading |
|
||||
| `java.reflection.method_invoke` | Medium | A | `Method.invoke()` — reflective method invocation |
|
||||
|
||||
### SQL Injection
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `java.sqli.execute_concat` | Medium | B | SQL `execute*()` with concatenated string argument |
|
||||
|
||||
### Weak Crypto
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `java.crypto.insecure_random` | Low | A | `new Random()` — `java.util.Random` is not cryptographically secure |
|
||||
| `java.crypto.weak_digest` | Low | A | `MessageDigest.getInstance("MD5"/"SHA1")` |
|
||||
|
||||
### XSS
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `java.xss.getwriter_print` | Medium | A | `response.getWriter().print/println/write` — direct output |
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### `java.deser.readobject` — Unsafe deserialization
|
||||
|
||||
**Vulnerable:**
|
||||
```java
|
||||
ObjectInputStream ois = new ObjectInputStream(request.getInputStream());
|
||||
Object obj = ois.readObject(); // Arbitrary object instantiation
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```java
|
||||
// Use a safe format like JSON
|
||||
ObjectMapper mapper = new ObjectMapper();
|
||||
MyType obj = mapper.readValue(request.getInputStream(), MyType.class);
|
||||
```
|
||||
|
||||
### `java.sqli.execute_concat` — SQL concatenation
|
||||
|
||||
**Vulnerable:**
|
||||
```java
|
||||
String query = "SELECT * FROM users WHERE id=" + userId;
|
||||
stmt.executeQuery(query); // SQL injection
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```java
|
||||
PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE id=?");
|
||||
ps.setString(1, userId);
|
||||
ResultSet rs = ps.executeQuery();
|
||||
```
|
||||
|
||||
### `java.cmdi.runtime_exec` — Command execution
|
||||
|
||||
**Vulnerable:**
|
||||
```java
|
||||
Runtime.getRuntime().exec("cmd /c " + userCommand);
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```java
|
||||
ProcessBuilder pb = new ProcessBuilder("cmd", "/c", "dir");
|
||||
// Use explicit argument list, never concatenate user input
|
||||
```
|
||||
|
||||
### `java.reflection.class_forname` — Dynamic class loading
|
||||
|
||||
**Flagged:**
|
||||
```java
|
||||
Class<?> cls = Class.forName(className);
|
||||
Object obj = cls.getDeclaredConstructor().newInstance();
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```java
|
||||
// Use an allowlist of permitted class names
|
||||
Map<String, Class<?>> allowed = Map.of("User", User.class, "Order", Order.class);
|
||||
Class<?> cls = allowed.get(className);
|
||||
if (cls != null) { /* ... */ }
|
||||
```
|
||||
138
docs/rules/javascript.md
Normal file
138
docs/rules/javascript.md
Normal file
|
|
@ -0,0 +1,138 @@
|
|||
# JavaScript Rules
|
||||
|
||||
JavaScript has the most complete taint label coverage alongside Rust. Nyx detects code execution, XSS, prototype pollution, command injection, and weak crypto.
|
||||
|
||||
## Taint Sources
|
||||
|
||||
| Function | Capability | Source Kind |
|
||||
|----------|-----------|-------------|
|
||||
| `document.location`, `window.location` | `all` | UserInput |
|
||||
| `req.body`, `req.query`, `req.params` | `all` | UserInput |
|
||||
| `req.headers`, `req.cookies` | `all` | UserInput |
|
||||
| `process.env` | `all` | EnvironmentConfig |
|
||||
|
||||
## Taint Sinks
|
||||
|
||||
| Function | Required Capability |
|
||||
|----------|-------------------|
|
||||
| `eval` | `SHELL_ESCAPE` |
|
||||
| `innerHTML` | `HTML_ESCAPE` |
|
||||
| `location.href`, `window.location.href` | `URL_ENCODE` |
|
||||
| `child_process.exec`, `child_process.execSync` | `SHELL_ESCAPE` |
|
||||
| `child_process.spawn` | `SHELL_ESCAPE` |
|
||||
|
||||
## Taint Sanitizers
|
||||
|
||||
| Function | Strips Capability |
|
||||
|----------|------------------|
|
||||
| `JSON.parse` | `JSON_PARSE` |
|
||||
| `encodeURIComponent`, `encodeURI` | `URL_ENCODE` |
|
||||
| `DOMPurify.sanitize` | `HTML_ESCAPE` |
|
||||
|
||||
> **Note:** Anonymous function expressions and arrow functions passed as callback arguments (e.g., Express `app.get('/path', function(req, res) { ... })`) are automatically walked as separate function scopes for taint analysis. Each anonymous function gets a unique scope identifier to prevent cross-function taint leakage.
|
||||
|
||||
---
|
||||
|
||||
## AST Pattern Rules
|
||||
|
||||
### Code Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `js.code_exec.eval` | High | A | `eval()` — dynamic code execution |
|
||||
| `js.code_exec.new_function` | High | A | `new Function()` — eval equivalent |
|
||||
| `js.code_exec.settimeout_string` | Medium | A | `setTimeout`/`setInterval` with string argument |
|
||||
|
||||
### XSS Sinks
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `js.xss.document_write` | Medium | A | `document.write()` / `document.writeln()` |
|
||||
| `js.xss.outer_html` | Medium | A | Assignment to `.outerHTML` |
|
||||
| `js.xss.insert_adjacent_html` | Medium | A | `insertAdjacentHTML()` |
|
||||
| `js.xss.location_assign` | Medium | A | Assignment to `location`/`location.href` — open redirect |
|
||||
| `js.xss.cookie_write` | Medium | A | Write to `document.cookie` |
|
||||
|
||||
### Prototype Pollution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `js.prototype.proto_assignment` | Medium | A | Assignment to `__proto__` |
|
||||
| `js.prototype.extend_object` | Medium | A | Assignment to `Object.prototype.*` |
|
||||
|
||||
### Weak Crypto
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `js.crypto.weak_hash` | Low | A | `crypto.createHash("md5"/"sha1")` |
|
||||
| `js.crypto.math_random` | Low | A | `Math.random()` — not cryptographically secure |
|
||||
|
||||
### Insecure Transport
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `js.transport.fetch_http` | Low | A | `fetch("http://...")` — plaintext HTTP |
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### `js.code_exec.eval` — Dynamic code execution
|
||||
|
||||
**Vulnerable:**
|
||||
```javascript
|
||||
const code = req.query.code;
|
||||
eval(code); // Remote code execution
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```javascript
|
||||
// Use a sandboxed interpreter or avoid eval entirely
|
||||
const allowed = { add: (a, b) => a + b };
|
||||
const result = allowed[req.query.operation]?.(req.query.a, req.query.b);
|
||||
```
|
||||
|
||||
### `js.xss.document_write` — XSS sink
|
||||
|
||||
**Vulnerable:**
|
||||
```javascript
|
||||
document.write("<h1>" + userName + "</h1>");
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```javascript
|
||||
const el = document.createElement("h1");
|
||||
el.textContent = userName;
|
||||
document.body.appendChild(el);
|
||||
```
|
||||
|
||||
### `js.prototype.proto_assignment` — Prototype pollution
|
||||
|
||||
**Vulnerable:**
|
||||
```javascript
|
||||
function merge(target, source) {
|
||||
for (let key in source) {
|
||||
target[key] = source[key]; // If key is "__proto__", pollutes prototype
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```javascript
|
||||
function merge(target, source) {
|
||||
for (let key in source) {
|
||||
if (key === "__proto__" || key === "constructor") continue;
|
||||
target[key] = source[key];
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Taint: `req.body` → `eval()`
|
||||
|
||||
**Finding:**
|
||||
```
|
||||
[HIGH] taint-unsanitised-flow (source 2:18) src/handler.js:3:5
|
||||
Source: req.body at 2:18
|
||||
Sink: eval()
|
||||
Score: 78
|
||||
```
|
||||
138
docs/rules/php.md
Normal file
138
docs/rules/php.md
Normal file
|
|
@ -0,0 +1,138 @@
|
|||
# PHP Rules
|
||||
|
||||
Nyx detects PHP vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, SQL injection, path traversal, and weak crypto.
|
||||
|
||||
## Taint Labels
|
||||
|
||||
PHP has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/php.rs`.
|
||||
|
||||
### Sources
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `$_GET` / `_GET`, `$_POST` / `_POST`, `$_REQUEST` / `_REQUEST`, `$_COOKIE` / `_COOKIE`, `$_FILES` / `_FILES`, `$_SERVER` / `_SERVER`, `$_ENV` / `_ENV` | all |
|
||||
| `file_get_contents`, `fread` | all |
|
||||
|
||||
> **Note:** PHP superglobal names are matched both with and without the `$` prefix because the CFG's `collect_idents` strips the leading `$` from variable names. Subscript access like `$_GET['cmd']` is handled via `element_reference` / `subscript_expression` node detection.
|
||||
|
||||
### Sanitizers
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `htmlspecialchars`, `htmlentities` | HTML_ESCAPE |
|
||||
| `escapeshellarg`, `escapeshellcmd` | SHELL_ESCAPE |
|
||||
| `basename` | FILE_IO |
|
||||
|
||||
### Sinks
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `system`, `exec`, `passthru`, `shell_exec`, `proc_open`, `popen` | SHELL_ESCAPE |
|
||||
| `eval`, `assert` | SHELL_ESCAPE |
|
||||
| `include`, `include_once`, `require`, `require_once` | FILE_IO |
|
||||
| `unserialize` | SHELL_ESCAPE |
|
||||
| `move_uploaded_file`, `copy`, `file_put_contents`, `fwrite` | FILE_IO |
|
||||
| `echo`, `print` | HTML_ESCAPE |
|
||||
| `mysqli_query`, `pg_query`, `query` | SHELL_ESCAPE |
|
||||
|
||||
---
|
||||
|
||||
## AST Pattern Rules
|
||||
|
||||
### Code Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `php.code_exec.eval` | High | A | `eval()` — dynamic code execution |
|
||||
| `php.code_exec.create_function` | High | A | `create_function()` — deprecated eval-like constructor |
|
||||
| `php.code_exec.preg_replace_e` | High | A | `preg_replace` with `/e` modifier — code execution via regex |
|
||||
| `php.code_exec.assert_string` | High | A | `assert()` with string argument — evaluates PHP code |
|
||||
|
||||
### Command Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `php.cmdi.system` | High | A | `system`/`shell_exec`/`exec`/`passthru`/`proc_open`/`popen` |
|
||||
|
||||
### Deserialization
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `php.deser.unserialize` | High | A | `unserialize()` — PHP object injection |
|
||||
|
||||
### SQL Injection
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `php.sqli.query_concat` | Medium | B | `mysql_query`/`mysqli_query` with concatenated SQL |
|
||||
|
||||
### Path Traversal
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `php.path.include_variable` | High | B | `include`/`require` with variable path — file inclusion |
|
||||
|
||||
### Weak Crypto
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `php.crypto.md5` | Low | A | `md5()` — weak hash function |
|
||||
| `php.crypto.sha1` | Low | A | `sha1()` — weak hash function |
|
||||
| `php.crypto.rand` | Low | A | `rand()`/`mt_rand()` — not cryptographically secure |
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### `php.code_exec.eval` — Dynamic code execution
|
||||
|
||||
**Vulnerable:**
|
||||
```php
|
||||
eval($_GET['code']);
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```php
|
||||
// Never use eval with user input
|
||||
// Use a template engine or allowlisted operations
|
||||
```
|
||||
|
||||
### `php.deser.unserialize` — Object injection
|
||||
|
||||
**Vulnerable:**
|
||||
```php
|
||||
$obj = unserialize($_COOKIE['data']);
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```php
|
||||
$data = json_decode($_COOKIE['data'], true);
|
||||
```
|
||||
|
||||
### `php.path.include_variable` — File inclusion
|
||||
|
||||
**Vulnerable:**
|
||||
```php
|
||||
include($_GET['page']); // Local/remote file inclusion
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```php
|
||||
$allowed = ['home', 'about', 'contact'];
|
||||
$page = in_array($_GET['page'], $allowed) ? $_GET['page'] : 'home';
|
||||
include("pages/{$page}.php");
|
||||
```
|
||||
|
||||
### `php.sqli.query_concat` — SQL concatenation
|
||||
|
||||
**Vulnerable:**
|
||||
```php
|
||||
mysqli_query($conn, "SELECT * FROM users WHERE id=" . $_GET['id']);
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```php
|
||||
$stmt = $conn->prepare("SELECT * FROM users WHERE id=?");
|
||||
$stmt->bind_param("i", $_GET['id']);
|
||||
$stmt->execute();
|
||||
```
|
||||
142
docs/rules/python.md
Normal file
142
docs/rules/python.md
Normal file
|
|
@ -0,0 +1,142 @@
|
|||
# Python Rules
|
||||
|
||||
Nyx detects Python vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, SQL injection, and weak crypto.
|
||||
|
||||
## Taint Labels
|
||||
|
||||
Python has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/python.rs`.
|
||||
|
||||
### Sources
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `os.getenv`, `os.environ` | all |
|
||||
| `request.args`, `request.form`, `request.json`, `request.headers`, `request.cookies`, `input` | all |
|
||||
| `sys.argv` | all |
|
||||
| `argparse.parse_args`, `urllib.request.urlopen`, `requests.get`, `requests.post` | all |
|
||||
|
||||
### Sanitizers
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `html.escape` | HTML_ESCAPE |
|
||||
| `shlex.quote` | SHELL_ESCAPE |
|
||||
| `os.path.realpath` | FILE_IO |
|
||||
|
||||
### Sinks
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `eval`, `exec` | SHELL_ESCAPE |
|
||||
| `os.system`, `os.popen`, `subprocess.call`, `subprocess.run`, `subprocess.Popen`, `subprocess.check_output`, `subprocess.check_call` | SHELL_ESCAPE |
|
||||
| `cursor.execute`, `cursor.executemany` | SHELL_ESCAPE |
|
||||
| `send_file`, `send_from_directory` | FILE_IO |
|
||||
| `open` | FILE_IO |
|
||||
|
||||
---
|
||||
|
||||
## AST Pattern Rules
|
||||
|
||||
### Code Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `py.code_exec.eval` | High | A | `eval()` — dynamic code execution |
|
||||
| `py.code_exec.exec` | High | A | `exec()` — dynamic code execution |
|
||||
| `py.code_exec.compile` | Medium | A | `compile()` with exec/eval mode |
|
||||
|
||||
### Command Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `py.cmdi.os_system` | High | A | `os.system()` — shell command execution |
|
||||
| `py.cmdi.os_popen` | High | A | `os.popen()` — shell command execution |
|
||||
| `py.cmdi.subprocess_shell` | High | B | `subprocess.*` with `shell=True` |
|
||||
|
||||
### Deserialization
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `py.deser.pickle_loads` | High | A | `pickle.loads()` / `pickle.load()` — arbitrary object deserialization |
|
||||
| `py.deser.yaml_load` | High | A | `yaml.load()` without SafeLoader |
|
||||
| `py.deser.shelve_open` | Medium | A | `shelve.open()` — pickle-backed deserialization |
|
||||
|
||||
### SQL Injection
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `py.sqli.execute_format` | Medium | B | `cursor.execute()` with string concatenation |
|
||||
|
||||
### Weak Crypto
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `py.crypto.md5` | Low | A | `hashlib.md5()` — weak hash algorithm |
|
||||
| `py.crypto.sha1` | Low | A | `hashlib.sha1()` — weak hash algorithm |
|
||||
|
||||
### Template Injection
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `py.xss.jinja_from_string` | Medium | A | `jinja2.Template.from_string()` — template injection |
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### `py.deser.pickle_loads` — Unsafe deserialization
|
||||
|
||||
**Vulnerable:**
|
||||
```python
|
||||
import pickle
|
||||
data = pickle.loads(request.body) # Arbitrary code execution
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```python
|
||||
import json
|
||||
data = json.loads(request.body) # JSON is safe
|
||||
```
|
||||
|
||||
### `py.cmdi.subprocess_shell` — Shell execution
|
||||
|
||||
**Vulnerable:**
|
||||
```python
|
||||
import subprocess
|
||||
subprocess.call(user_input, shell=True) # Command injection
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```python
|
||||
import subprocess
|
||||
import shlex
|
||||
subprocess.call(shlex.split(user_input), shell=False)
|
||||
# Or better: use an explicit command list
|
||||
subprocess.call(["ls", "-la", user_dir])
|
||||
```
|
||||
|
||||
### `py.deser.yaml_load` — Unsafe YAML
|
||||
|
||||
**Vulnerable:**
|
||||
```python
|
||||
import yaml
|
||||
config = yaml.load(user_data) # Can instantiate arbitrary objects
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```python
|
||||
import yaml
|
||||
config = yaml.safe_load(user_data) # Only basic Python types
|
||||
```
|
||||
|
||||
### `py.sqli.execute_format` — SQL concatenation
|
||||
|
||||
**Vulnerable:**
|
||||
```python
|
||||
cursor.execute("SELECT * FROM users WHERE id=" + user_id)
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```python
|
||||
cursor.execute("SELECT * FROM users WHERE id=?", (user_id,))
|
||||
```
|
||||
132
docs/rules/ruby.md
Normal file
132
docs/rules/ruby.md
Normal file
|
|
@ -0,0 +1,132 @@
|
|||
# Ruby Rules
|
||||
|
||||
Nyx detects Ruby vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, reflection, SSRF, and weak crypto.
|
||||
|
||||
## Taint Labels
|
||||
|
||||
Ruby has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/ruby.rs`.
|
||||
|
||||
### Sources
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `ENV`, `gets` | all |
|
||||
| `params` | all |
|
||||
|
||||
> **Note:** Ruby's `params[:cmd]` subscript access is detected via `element_reference` node handling in the CFG. Sinatra/Rails `do...end` blocks are walked as function scopes.
|
||||
|
||||
### Sanitizers
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `CGI.escapeHTML`, `ERB::Util.html_escape` | HTML_ESCAPE |
|
||||
| `Shellwords.escape`, `Shellwords.shellescape` | SHELL_ESCAPE |
|
||||
|
||||
### Sinks
|
||||
|
||||
| Matcher | Cap |
|
||||
|---------|-----|
|
||||
| `system`, `exec` | SHELL_ESCAPE |
|
||||
| `eval` | SHELL_ESCAPE |
|
||||
| `puts`, `print` | HTML_ESCAPE |
|
||||
|
||||
---
|
||||
|
||||
## AST Pattern Rules
|
||||
|
||||
### Code Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `rb.code_exec.eval` | High | A | `Kernel#eval` — dynamic code execution |
|
||||
| `rb.code_exec.instance_eval` | High | A | `instance_eval` — evaluates string in object context |
|
||||
| `rb.code_exec.class_eval` | High | A | `class_eval` / `module_eval` — evaluates string in class context |
|
||||
|
||||
### Command Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `rb.cmdi.backtick` | High | A | Backtick shell execution (`` `cmd` ``) |
|
||||
| `rb.cmdi.system_interp` | High | A | `system`/`exec` call — command execution risk |
|
||||
|
||||
### Deserialization
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `rb.deser.yaml_load` | High | A | `YAML.load` — arbitrary object deserialization |
|
||||
| `rb.deser.marshal_load` | High | A | `Marshal.load` — arbitrary Ruby object deserialization |
|
||||
|
||||
### Reflection
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `rb.reflection.send_dynamic` | Medium | B | `send()` with non-symbol argument — arbitrary method dispatch |
|
||||
| `rb.reflection.constantize` | Medium | A | `constantize` / `safe_constantize` — dynamic class resolution |
|
||||
|
||||
### SSRF
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `rb.ssrf.open_uri` | Medium | A | `Kernel#open` with HTTP URL — SSRF via open-uri |
|
||||
|
||||
### Weak Crypto
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `rb.crypto.md5` | Low | A | `Digest::MD5` — weak hash algorithm |
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### `rb.deser.yaml_load` — Unsafe YAML deserialization
|
||||
|
||||
**Vulnerable:**
|
||||
```ruby
|
||||
data = YAML.load(params[:config]) # Arbitrary object instantiation
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```ruby
|
||||
data = YAML.safe_load(params[:config]) # Only basic Ruby types
|
||||
```
|
||||
|
||||
### `rb.cmdi.backtick` — Backtick shell execution
|
||||
|
||||
**Vulnerable:**
|
||||
```ruby
|
||||
output = `ls #{user_dir}` # Command injection via interpolation
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```ruby
|
||||
require 'open3'
|
||||
output, status = Open3.capture2('ls', user_dir)
|
||||
```
|
||||
|
||||
### `rb.reflection.send_dynamic` — Dynamic method dispatch
|
||||
|
||||
**Vulnerable:**
|
||||
```ruby
|
||||
obj.send(params[:method], params[:arg]) # Arbitrary method invocation
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```ruby
|
||||
allowed = %w[name email phone]
|
||||
if allowed.include?(params[:method])
|
||||
obj.send(params[:method])
|
||||
end
|
||||
```
|
||||
|
||||
### `rb.deser.marshal_load` — Marshal deserialization
|
||||
|
||||
**Vulnerable:**
|
||||
```ruby
|
||||
obj = Marshal.load(request.body.read)
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```ruby
|
||||
data = JSON.parse(request.body.read)
|
||||
```
|
||||
105
docs/rules/rust.md
Normal file
105
docs/rules/rust.md
Normal file
|
|
@ -0,0 +1,105 @@
|
|||
# Rust Rules
|
||||
|
||||
Nyx detects Rust vulnerabilities through AST patterns (memory safety, code quality) and taint analysis (command injection via `env::var` → `Command::new`).
|
||||
|
||||
## Taint Sources
|
||||
|
||||
| Function | Capability | Source Kind |
|
||||
|----------|-----------|-------------|
|
||||
| `std::env::var`, `env::var` | `all` | EnvironmentConfig |
|
||||
|
||||
## Taint Sinks
|
||||
|
||||
| Function | Required Capability |
|
||||
|----------|-------------------|
|
||||
| `Command::new`, `Command::arg`, `Command::args` | `SHELL_ESCAPE` |
|
||||
| `Command::status`, `Command::output` | `SHELL_ESCAPE` |
|
||||
| `fs::read_to_string`, `fs::write`, `fs::read`, `File::open`, `File::create` | `FILE_IO` |
|
||||
|
||||
## Taint Sanitizers
|
||||
|
||||
| Function | Strips Capability |
|
||||
|----------|------------------|
|
||||
| `html_escape::encode_safe`, `sanitize_html` | `HTML_ESCAPE` |
|
||||
| `shell_escape::unix::escape`, `sanitize_shell` | `SHELL_ESCAPE` |
|
||||
|
||||
> **Note:** `fs::read_to_string` was moved from taint sources to sinks to support path traversal detection (`env::var` → `fs::read_to_string`).
|
||||
|
||||
---
|
||||
|
||||
## AST Pattern Rules
|
||||
|
||||
### Memory Safety
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `rs.memory.transmute` | High | A | `std::mem::transmute` — unchecked type reinterpretation |
|
||||
| `rs.memory.copy_nonoverlapping` | High | A | `ptr::copy_nonoverlapping` — raw pointer memcpy |
|
||||
| `rs.memory.get_unchecked` | High | A | `get_unchecked` / `get_unchecked_mut` — unchecked indexing |
|
||||
| `rs.memory.mem_zeroed` | High | A | `std::mem::zeroed` — may be UB for non-POD types |
|
||||
| `rs.memory.ptr_read` | High | A | `ptr::read` / `ptr::read_volatile` — raw pointer dereference |
|
||||
| `rs.memory.narrow_cast` | Low | A | `as u8`/`i8`/`u16`/`i16` — possible truncation |
|
||||
| `rs.memory.mem_forget` | Low | A | `std::mem::forget` — may leak resources |
|
||||
|
||||
### Code Quality
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `rs.quality.unsafe_block` | Medium | A | `unsafe { }` block — manual memory safety obligation |
|
||||
| `rs.quality.unsafe_fn` | Medium | A | `unsafe fn` declaration |
|
||||
| `rs.quality.unwrap` | Low | A | `.unwrap()` — panics on `None`/`Err` |
|
||||
| `rs.quality.expect` | Low | A | `.expect()` — panics on `None`/`Err` |
|
||||
| `rs.quality.panic_macro` | Low | A | `panic!()` macro invocation |
|
||||
| `rs.quality.todo` | Low | A | `todo!()` / `unimplemented!()` placeholder |
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### `rs.memory.transmute` — Unchecked type reinterpretation
|
||||
|
||||
**Vulnerable:**
|
||||
```rust
|
||||
let x: u32 = 42;
|
||||
let y: f32 = unsafe { std::mem::transmute(x) };
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```rust
|
||||
let x: u32 = 42;
|
||||
let y: f32 = f32::from_bits(x);
|
||||
```
|
||||
|
||||
### `rs.quality.unsafe_block` — Unsafe block
|
||||
|
||||
**Flagged:**
|
||||
```rust
|
||||
unsafe {
|
||||
let ptr = &x as *const i32;
|
||||
println!("{}", *ptr);
|
||||
}
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```rust
|
||||
// Use safe abstractions when possible
|
||||
println!("{}", x);
|
||||
```
|
||||
|
||||
### Taint: `env::var` → `Command::new`
|
||||
|
||||
**Vulnerable:**
|
||||
```rust
|
||||
let cmd = std::env::var("USER_CMD").unwrap();
|
||||
Command::new("sh").arg("-c").arg(&cmd).output()?;
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```rust
|
||||
let cmd = std::env::var("USER_CMD").unwrap();
|
||||
// Validate against allowlist
|
||||
let allowed = ["ls", "whoami", "date"];
|
||||
if allowed.contains(&cmd.as_str()) {
|
||||
Command::new(&cmd).output()?;
|
||||
}
|
||||
```
|
||||
81
docs/rules/typescript.md
Normal file
81
docs/rules/typescript.md
Normal file
|
|
@ -0,0 +1,81 @@
|
|||
# TypeScript Rules
|
||||
|
||||
TypeScript rules mirror JavaScript patterns plus TypeScript-specific type-safety escape detectors. Taint labels are shared with JavaScript (see [JavaScript Rules](javascript.md)).
|
||||
|
||||
---
|
||||
|
||||
## AST Pattern Rules
|
||||
|
||||
### Code Execution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `ts.code_exec.eval` | High | A | `eval()` — dynamic code execution |
|
||||
| `ts.code_exec.new_function` | High | A | `new Function()` — eval equivalent |
|
||||
| `ts.code_exec.settimeout_string` | Medium | A | `setTimeout`/`setInterval` with string argument |
|
||||
|
||||
### XSS Sinks
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `ts.xss.document_write` | Medium | A | `document.write()` / `document.writeln()` |
|
||||
| `ts.xss.outer_html` | Medium | A | Assignment to `.outerHTML` |
|
||||
| `ts.xss.insert_adjacent_html` | Medium | A | `insertAdjacentHTML()` |
|
||||
| `ts.xss.location_assign` | Medium | A | Assignment to `location`/`location.href` |
|
||||
| `ts.xss.cookie_write` | Low | A | Write to `document.cookie` |
|
||||
|
||||
### Prototype Pollution
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `ts.prototype.proto_assignment` | Medium | A | Assignment to `__proto__` |
|
||||
|
||||
### Weak Crypto
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `ts.crypto.math_random` | Low | A | `Math.random()` — not cryptographically secure |
|
||||
|
||||
### Code Quality (TypeScript-specific)
|
||||
|
||||
| Rule ID | Severity | Tier | Description |
|
||||
|---------|----------|------|-------------|
|
||||
| `ts.quality.any_annotation` | Low | A | Type annotation of `any` — disables type checking |
|
||||
| `ts.quality.as_any` | Low | A | Type assertion `as any` — type-safety escape hatch |
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### `ts.quality.any_annotation` — `any` type
|
||||
|
||||
**Flagged:**
|
||||
```typescript
|
||||
function process(data: any) { // ts.quality.any_annotation
|
||||
data.whatever(); // No type checking
|
||||
}
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```typescript
|
||||
interface UserData { name: string; email: string; }
|
||||
function process(data: UserData) {
|
||||
console.log(data.name);
|
||||
}
|
||||
```
|
||||
|
||||
### `ts.quality.as_any` — Type assertion escape
|
||||
|
||||
**Flagged:**
|
||||
```typescript
|
||||
const result = someValue as any; // ts.quality.as_any
|
||||
result.nonexistentMethod();
|
||||
```
|
||||
|
||||
**Safe alternative:**
|
||||
```typescript
|
||||
if (isValidType(someValue)) {
|
||||
const result = someValue as KnownType;
|
||||
result.knownMethod();
|
||||
}
|
||||
```
|
||||
432
src/ast.rs
432
src/ast.rs
|
|
@ -2,8 +2,10 @@ use crate::cfg::{build_cfg, export_summaries};
|
|||
use crate::cfg_analysis;
|
||||
use crate::commands::scan::Diag;
|
||||
use crate::errors::{NyxError, NyxResult};
|
||||
use crate::evidence::{Evidence, SpanEvidence, StateEvidence};
|
||||
use crate::labels::{build_lang_rules, severity_for_source_kind};
|
||||
use crate::patterns::Severity;
|
||||
use crate::patterns::{FindingCategory, Severity};
|
||||
use crate::state;
|
||||
use crate::summary::{FuncSummary, GlobalSummaries};
|
||||
use crate::symbol::{Lang, normalize_namespace};
|
||||
use crate::taint::analyse_file;
|
||||
|
|
@ -92,6 +94,23 @@ fn is_nonprod_path(path: &Path) -> bool {
|
|||
false
|
||||
}
|
||||
|
||||
/// Normalize a callee description for display.
|
||||
fn sanitize_desc(s: &str) -> String {
|
||||
crate::fmt::normalize_snippet(s)
|
||||
}
|
||||
|
||||
/// Human-readable label for a `SourceKind`.
|
||||
fn source_kind_label(sk: crate::labels::SourceKind) -> &'static str {
|
||||
use crate::labels::SourceKind;
|
||||
match sk {
|
||||
SourceKind::UserInput => "user input",
|
||||
SourceKind::EnvironmentConfig => "environment config",
|
||||
SourceKind::FileSystem => "file system data",
|
||||
SourceKind::Database => "database result",
|
||||
SourceKind::Unknown => "tainted data",
|
||||
}
|
||||
}
|
||||
|
||||
/// Downgrade severity by one tier: High→Medium, Medium→Low, Low→Low.
|
||||
fn downgrade_severity(s: Severity) -> Severity {
|
||||
match s {
|
||||
|
|
@ -239,8 +258,45 @@ pub fn run_rules_on_bytes(
|
|||
let source_byte = cfg_graph[finding.source].span.0;
|
||||
let source_point = byte_offset_to_point(&_tree, source_byte);
|
||||
|
||||
let source_callee = cfg_graph[finding.source]
|
||||
.callee
|
||||
.as_deref()
|
||||
.map(sanitize_desc)
|
||||
.unwrap_or_else(|| "(unknown)".into());
|
||||
let sink_callee = cfg_graph[finding.sink]
|
||||
.callee
|
||||
.as_deref()
|
||||
.map(sanitize_desc)
|
||||
.unwrap_or_else(|| "(unknown)".into());
|
||||
let kind_label = source_kind_label(finding.source_kind);
|
||||
|
||||
let short_source = crate::fmt::shorten_callee(&source_callee);
|
||||
let short_sink = crate::fmt::shorten_callee(&sink_callee);
|
||||
|
||||
let mut labels = vec![
|
||||
(
|
||||
"Source".into(),
|
||||
format!(
|
||||
"{source_callee} ({}:{})",
|
||||
source_point.row + 1,
|
||||
source_point.column + 1
|
||||
),
|
||||
),
|
||||
("Sink".into(), sink_callee.to_string()),
|
||||
];
|
||||
if let Some(guard) = finding.guard_kind {
|
||||
labels.push(("Path guard".into(), format!("{guard:?}")));
|
||||
}
|
||||
|
||||
let file_path_owned = path.to_string_lossy().into_owned();
|
||||
let mut evidence_notes = Vec::new();
|
||||
if finding.path_validated {
|
||||
evidence_notes.push("path_validated".into());
|
||||
}
|
||||
evidence_notes.push(format!("source_kind:{:?}", finding.source_kind));
|
||||
|
||||
out.push(Diag {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
path: file_path_owned.clone(),
|
||||
line: sink_point.row + 1,
|
||||
col: sink_point.column + 1,
|
||||
severity: severity_for_source_kind(finding.source_kind),
|
||||
|
|
@ -249,6 +305,50 @@ pub fn run_rules_on_bytes(
|
|||
source_point.row + 1,
|
||||
source_point.column + 1
|
||||
),
|
||||
category: FindingCategory::Security,
|
||||
path_validated: finding.path_validated,
|
||||
guard_kind: finding.guard_kind.map(|k| format!("{k:?}")),
|
||||
message: Some(format!(
|
||||
"unsanitised {kind_label} flows from {short_source} \u{2192} {short_sink}"
|
||||
)),
|
||||
labels,
|
||||
confidence: None,
|
||||
evidence: Some(Evidence {
|
||||
source: Some(SpanEvidence {
|
||||
path: file_path_owned.clone(),
|
||||
line: (source_point.row + 1) as u32,
|
||||
col: (source_point.column + 1) as u32,
|
||||
kind: "source".into(),
|
||||
snippet: Some(short_source.clone()),
|
||||
}),
|
||||
sink: Some(SpanEvidence {
|
||||
path: file_path_owned,
|
||||
line: (sink_point.row + 1) as u32,
|
||||
col: (sink_point.column + 1) as u32,
|
||||
kind: "sink".into(),
|
||||
snippet: Some(short_sink.clone()),
|
||||
}),
|
||||
guards: finding
|
||||
.guard_kind
|
||||
.map(|g| {
|
||||
vec![SpanEvidence {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: (sink_point.row + 1) as u32,
|
||||
col: 0,
|
||||
kind: "guard".into(),
|
||||
snippet: Some(format!("{g:?}")),
|
||||
}]
|
||||
})
|
||||
.unwrap_or_default(),
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: evidence_notes,
|
||||
}),
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
});
|
||||
}
|
||||
|
||||
|
|
@ -268,14 +368,111 @@ pub fn run_rules_on_bytes(
|
|||
};
|
||||
for cf in cfg_analysis::run_all(&cfg_ctx) {
|
||||
let point = byte_offset_to_point(&_tree, cf.span.0);
|
||||
let cfg_confidence = Some(match cf.confidence {
|
||||
cfg_analysis::Confidence::High => crate::evidence::Confidence::High,
|
||||
cfg_analysis::Confidence::Medium => crate::evidence::Confidence::Medium,
|
||||
cfg_analysis::Confidence::Low => crate::evidence::Confidence::Low,
|
||||
});
|
||||
out.push(Diag {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: point.row + 1,
|
||||
col: point.column + 1,
|
||||
severity: cf.severity,
|
||||
id: cf.rule_id,
|
||||
category: FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some(cf.message),
|
||||
labels: vec![],
|
||||
confidence: cfg_confidence,
|
||||
evidence: Some(Evidence {
|
||||
source: None,
|
||||
sink: Some(SpanEvidence {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: (point.row + 1) as u32,
|
||||
col: (point.column + 1) as u32,
|
||||
kind: "sink".into(),
|
||||
snippet: None,
|
||||
}),
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec![],
|
||||
}),
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
});
|
||||
}
|
||||
|
||||
// ── State-model dataflow analysis ────────────────────────────────
|
||||
if cfg.scanner.enable_state_analysis {
|
||||
let state_findings = state::run_state_analysis(
|
||||
&cfg_graph,
|
||||
entry,
|
||||
caller_lang,
|
||||
bytes,
|
||||
&summaries,
|
||||
global_summaries,
|
||||
);
|
||||
// Collect state finding lines to dedup overlapping CFG findings.
|
||||
let state_lines: std::collections::HashSet<usize> = state_findings
|
||||
.iter()
|
||||
.map(|sf| byte_offset_to_point(&_tree, sf.span.0).row + 1)
|
||||
.collect();
|
||||
|
||||
for sf in &state_findings {
|
||||
let point = byte_offset_to_point(&_tree, sf.span.0);
|
||||
out.push(Diag {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: point.row + 1,
|
||||
col: point.column + 1,
|
||||
severity: sf.severity,
|
||||
id: sf.rule_id.clone(),
|
||||
category: FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some(sf.message.clone()),
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: Some(Evidence {
|
||||
source: None,
|
||||
sink: Some(SpanEvidence {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: (point.row + 1) as u32,
|
||||
col: (point.column + 1) as u32,
|
||||
kind: "sink".into(),
|
||||
snippet: None,
|
||||
}),
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: Some(StateEvidence {
|
||||
machine: sf.machine.into(),
|
||||
subject: sf.subject.clone(),
|
||||
from_state: sf.from_state.into(),
|
||||
to_state: sf.to_state.into(),
|
||||
}),
|
||||
notes: vec![],
|
||||
}),
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
});
|
||||
}
|
||||
|
||||
// Suppress cfg-resource-leak / cfg-auth-gap when state analysis
|
||||
// already covers the same line (state analysis is more precise).
|
||||
if !state_findings.is_empty() {
|
||||
out.retain(|d| {
|
||||
!((d.id == "cfg-resource-leak" || d.id == "cfg-auth-gap")
|
||||
&& state_lines.contains(&d.line))
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if cfg.scanner.mode == AnalysisMode::Full || cfg.scanner.mode == AnalysisMode::Ast {
|
||||
|
|
@ -285,7 +482,7 @@ pub fn run_rules_on_bytes(
|
|||
let mut cursor = QueryCursor::new();
|
||||
|
||||
for cq in compiled.iter() {
|
||||
if cfg.scanner.min_severity <= cq.meta.severity {
|
||||
if cq.meta.severity > cfg.scanner.min_severity {
|
||||
continue;
|
||||
}
|
||||
let mut matches = cursor.matches(&cq.query, root, bytes);
|
||||
|
|
@ -298,6 +495,31 @@ pub fn run_rules_on_bytes(
|
|||
col: point.column + 1,
|
||||
severity: cq.meta.severity,
|
||||
id: cq.meta.id.to_owned(),
|
||||
category: cq.meta.category.finding_category(),
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some(cq.meta.description.to_owned()),
|
||||
labels: vec![],
|
||||
confidence: Some(cq.meta.confidence),
|
||||
evidence: Some(Evidence {
|
||||
source: None,
|
||||
sink: Some(SpanEvidence {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: (point.row + 1) as u32,
|
||||
col: (point.column + 1) as u32,
|
||||
kind: "sink".into(),
|
||||
snippet: None,
|
||||
}),
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec![],
|
||||
}),
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
|
@ -427,8 +649,45 @@ pub fn analyse_file_fused(
|
|||
let source_byte = cfg_graph[finding.source].span.0;
|
||||
let source_point = byte_offset_to_point(&tree, source_byte);
|
||||
|
||||
let source_callee = cfg_graph[finding.source]
|
||||
.callee
|
||||
.as_deref()
|
||||
.map(sanitize_desc)
|
||||
.unwrap_or_else(|| "(unknown)".into());
|
||||
let sink_callee = cfg_graph[finding.sink]
|
||||
.callee
|
||||
.as_deref()
|
||||
.map(sanitize_desc)
|
||||
.unwrap_or_else(|| "(unknown)".into());
|
||||
let kind_label = source_kind_label(finding.source_kind);
|
||||
|
||||
let short_source = crate::fmt::shorten_callee(&source_callee);
|
||||
let short_sink = crate::fmt::shorten_callee(&sink_callee);
|
||||
|
||||
let mut labels = vec![
|
||||
(
|
||||
"Source".into(),
|
||||
format!(
|
||||
"{source_callee} ({}:{})",
|
||||
source_point.row + 1,
|
||||
source_point.column + 1
|
||||
),
|
||||
),
|
||||
("Sink".into(), sink_callee.to_string()),
|
||||
];
|
||||
if let Some(guard) = finding.guard_kind {
|
||||
labels.push(("Path guard".into(), format!("{guard:?}")));
|
||||
}
|
||||
|
||||
let fused_file_path = path.to_string_lossy().into_owned();
|
||||
let mut fused_evidence_notes = Vec::new();
|
||||
if finding.path_validated {
|
||||
fused_evidence_notes.push("path_validated".into());
|
||||
}
|
||||
fused_evidence_notes.push(format!("source_kind:{:?}", finding.source_kind));
|
||||
|
||||
out.push(Diag {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
path: fused_file_path.clone(),
|
||||
line: sink_point.row + 1,
|
||||
col: sink_point.column + 1,
|
||||
severity: severity_for_source_kind(finding.source_kind),
|
||||
|
|
@ -437,6 +696,50 @@ pub fn analyse_file_fused(
|
|||
source_point.row + 1,
|
||||
source_point.column + 1
|
||||
),
|
||||
category: FindingCategory::Security,
|
||||
path_validated: finding.path_validated,
|
||||
guard_kind: finding.guard_kind.map(|k| format!("{k:?}")),
|
||||
message: Some(format!(
|
||||
"unsanitised {kind_label} flows from {short_source} \u{2192} {short_sink}"
|
||||
)),
|
||||
labels,
|
||||
confidence: None,
|
||||
evidence: Some(Evidence {
|
||||
source: Some(SpanEvidence {
|
||||
path: fused_file_path.clone(),
|
||||
line: (source_point.row + 1) as u32,
|
||||
col: (source_point.column + 1) as u32,
|
||||
kind: "source".into(),
|
||||
snippet: Some(short_source.clone()),
|
||||
}),
|
||||
sink: Some(SpanEvidence {
|
||||
path: fused_file_path.clone(),
|
||||
line: (sink_point.row + 1) as u32,
|
||||
col: (sink_point.column + 1) as u32,
|
||||
kind: "sink".into(),
|
||||
snippet: Some(short_sink.clone()),
|
||||
}),
|
||||
guards: finding
|
||||
.guard_kind
|
||||
.map(|g| {
|
||||
vec![SpanEvidence {
|
||||
path: fused_file_path,
|
||||
line: (sink_point.row + 1) as u32,
|
||||
col: 0,
|
||||
kind: "guard".into(),
|
||||
snippet: Some(format!("{g:?}")),
|
||||
}]
|
||||
})
|
||||
.unwrap_or_default(),
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: fused_evidence_notes,
|
||||
}),
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
});
|
||||
}
|
||||
|
||||
|
|
@ -455,14 +758,108 @@ pub fn analyse_file_fused(
|
|||
};
|
||||
for cf in cfg_analysis::run_all(&cfg_ctx) {
|
||||
let point = byte_offset_to_point(&tree, cf.span.0);
|
||||
let fused_cfg_confidence = Some(match cf.confidence {
|
||||
cfg_analysis::Confidence::High => crate::evidence::Confidence::High,
|
||||
cfg_analysis::Confidence::Medium => crate::evidence::Confidence::Medium,
|
||||
cfg_analysis::Confidence::Low => crate::evidence::Confidence::Low,
|
||||
});
|
||||
out.push(Diag {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: point.row + 1,
|
||||
col: point.column + 1,
|
||||
severity: cf.severity,
|
||||
id: cf.rule_id,
|
||||
category: FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some(cf.message),
|
||||
labels: vec![],
|
||||
confidence: fused_cfg_confidence,
|
||||
evidence: Some(Evidence {
|
||||
source: None,
|
||||
sink: Some(SpanEvidence {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: (point.row + 1) as u32,
|
||||
col: (point.column + 1) as u32,
|
||||
kind: "sink".into(),
|
||||
snippet: None,
|
||||
}),
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec![],
|
||||
}),
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
});
|
||||
}
|
||||
|
||||
// ── State-model dataflow analysis ────────────────────────────────
|
||||
if cfg.scanner.enable_state_analysis {
|
||||
let state_findings = state::run_state_analysis(
|
||||
&cfg_graph,
|
||||
entry,
|
||||
caller_lang,
|
||||
bytes,
|
||||
&local_summaries,
|
||||
global_summaries,
|
||||
);
|
||||
let state_lines: std::collections::HashSet<usize> = state_findings
|
||||
.iter()
|
||||
.map(|sf| byte_offset_to_point(&tree, sf.span.0).row + 1)
|
||||
.collect();
|
||||
|
||||
for sf in &state_findings {
|
||||
let point = byte_offset_to_point(&tree, sf.span.0);
|
||||
out.push(Diag {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: point.row + 1,
|
||||
col: point.column + 1,
|
||||
severity: sf.severity,
|
||||
id: sf.rule_id.clone(),
|
||||
category: FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some(sf.message.clone()),
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: Some(Evidence {
|
||||
source: None,
|
||||
sink: Some(SpanEvidence {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: (point.row + 1) as u32,
|
||||
col: (point.column + 1) as u32,
|
||||
kind: "sink".into(),
|
||||
snippet: None,
|
||||
}),
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: Some(StateEvidence {
|
||||
machine: sf.machine.into(),
|
||||
subject: sf.subject.clone(),
|
||||
from_state: sf.from_state.into(),
|
||||
to_state: sf.to_state.into(),
|
||||
}),
|
||||
notes: vec![],
|
||||
}),
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
});
|
||||
}
|
||||
|
||||
if !state_findings.is_empty() {
|
||||
out.retain(|d| {
|
||||
!((d.id == "cfg-resource-leak" || d.id == "cfg-auth-gap")
|
||||
&& state_lines.contains(&d.line))
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// AST pattern queries
|
||||
|
|
@ -472,7 +869,7 @@ pub fn analyse_file_fused(
|
|||
let mut cursor = QueryCursor::new();
|
||||
|
||||
for cq in compiled.iter() {
|
||||
if cfg.scanner.min_severity <= cq.meta.severity {
|
||||
if cq.meta.severity > cfg.scanner.min_severity {
|
||||
continue;
|
||||
}
|
||||
let mut matches = cursor.matches(&cq.query, root, bytes);
|
||||
|
|
@ -485,6 +882,31 @@ pub fn analyse_file_fused(
|
|||
col: point.column + 1,
|
||||
severity: cq.meta.severity,
|
||||
id: cq.meta.id.to_owned(),
|
||||
category: cq.meta.category.finding_category(),
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some(cq.meta.description.to_owned()),
|
||||
labels: vec![],
|
||||
confidence: Some(cq.meta.confidence),
|
||||
evidence: Some(Evidence {
|
||||
source: None,
|
||||
sink: Some(SpanEvidence {
|
||||
path: path.to_string_lossy().into_owned(),
|
||||
line: (point.row + 1) as u32,
|
||||
col: (point.column + 1) as u32,
|
||||
kind: "sink".into(),
|
||||
snippet: None,
|
||||
}),
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec![],
|
||||
}),
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
|
|
|||
599
src/callgraph.rs
Normal file
599
src/callgraph.rs
Normal file
|
|
@ -0,0 +1,599 @@
|
|||
use crate::interop::InteropEdge;
|
||||
use crate::summary::{CalleeResolution, GlobalSummaries};
|
||||
use crate::symbol::FuncKey;
|
||||
use petgraph::graph::NodeIndex;
|
||||
use petgraph::prelude::*;
|
||||
use std::collections::HashMap;
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Types
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Metadata attached to each call-graph edge.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct CallEdge {
|
||||
/// The raw callee string as it appeared in source (e.g. `"env::var"`).
|
||||
/// Preserved for diagnostics — **not** the normalized form used for resolution.
|
||||
#[allow(dead_code)] // used for future diagnostics and path display
|
||||
pub call_site: String,
|
||||
}
|
||||
|
||||
/// A callee that could not be resolved to any known function definition.
|
||||
#[derive(Debug, Clone)]
|
||||
#[allow(dead_code)] // fields used for future diagnostics reporting
|
||||
pub struct UnresolvedCallee {
|
||||
pub caller: FuncKey,
|
||||
pub callee_name: String,
|
||||
}
|
||||
|
||||
/// A callee that matched multiple function definitions — ambiguous.
|
||||
#[derive(Debug, Clone)]
|
||||
#[allow(dead_code)] // fields used for future diagnostics reporting
|
||||
pub struct AmbiguousCallee {
|
||||
pub caller: FuncKey,
|
||||
pub callee_name: String,
|
||||
pub candidates: Vec<FuncKey>,
|
||||
}
|
||||
|
||||
/// The whole-program call graph.
|
||||
///
|
||||
/// Nodes are [`FuncKey`]s (one per function definition across all files).
|
||||
/// Edges represent call-site relationships resolved after pass 1.
|
||||
pub struct CallGraph {
|
||||
pub graph: DiGraph<FuncKey, CallEdge>,
|
||||
/// `FuncKey → NodeIndex` for quick lookup.
|
||||
#[allow(dead_code)] // used for future topo-ordered analysis and call-graph queries
|
||||
pub index: HashMap<FuncKey, NodeIndex>,
|
||||
/// Callee strings that could not be resolved to any [`FuncKey`].
|
||||
pub unresolved_not_found: Vec<UnresolvedCallee>,
|
||||
/// Callee strings that matched multiple candidates.
|
||||
pub unresolved_ambiguous: Vec<AmbiguousCallee>,
|
||||
}
|
||||
|
||||
/// Result of SCC / topological analysis on the call graph.
|
||||
pub struct CallGraphAnalysis {
|
||||
/// Strongly connected components.
|
||||
pub sccs: Vec<Vec<NodeIndex>>,
|
||||
/// Maps each `NodeIndex` to its SCC index in [`sccs`].
|
||||
#[allow(dead_code)] // used for future topo-ordered taint propagation
|
||||
pub node_to_scc: HashMap<NodeIndex, usize>,
|
||||
/// SCC indices in **callee-first** (leaves-first) order.
|
||||
///
|
||||
/// Functions with no callees appear first; callers appear later.
|
||||
/// Suitable for bottom-up taint propagation.
|
||||
#[allow(dead_code)] // used for future topo-ordered taint propagation
|
||||
pub topo_scc_callee_first: Vec<usize>,
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Callee-name normalization
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Extract the last segment of a qualified callee name for resolution.
|
||||
///
|
||||
/// ```text
|
||||
/// "env::var" → "var"
|
||||
/// "std::process::Command" → "Command"
|
||||
/// "obj.method" → "method"
|
||||
/// "pkg.mod.func" → "func"
|
||||
/// "foo" → "foo" (unchanged)
|
||||
/// "" → "" (edge case)
|
||||
/// ```
|
||||
///
|
||||
/// The original raw text is preserved on [`CallEdge::call_site`] for
|
||||
/// diagnostics; this function only produces the lookup key.
|
||||
pub(crate) fn normalize_callee_name(raw: &str) -> &str {
|
||||
// Split on "::" first (Rust-style qualification), take last segment.
|
||||
let after_colons = raw.rsplit("::").next().unwrap_or(raw);
|
||||
// Then split on "." (method calls, Python/JS dotted paths), take last segment.
|
||||
after_colons.rsplit('.').next().unwrap_or(after_colons)
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Call-graph construction
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Build the whole-program call graph from merged summaries.
|
||||
///
|
||||
/// Resolution mirrors `GlobalSummaries::resolve_callee_key`:
|
||||
/// 1. Normalize callee name (last segment after `::` or `.`)
|
||||
/// 2. Same-language, arity-filtered, namespace-disambiguated lookup
|
||||
/// 3. Interop edges (explicit cross-language bridges)
|
||||
///
|
||||
/// Unresolved and ambiguous callees are recorded for diagnostics but
|
||||
/// do **not** create edges.
|
||||
pub fn build_call_graph(summaries: &GlobalSummaries, interop_edges: &[InteropEdge]) -> CallGraph {
|
||||
let mut graph = DiGraph::new();
|
||||
let mut index = HashMap::new();
|
||||
|
||||
// 1. Create one node per FuncKey.
|
||||
for (key, _) in summaries.iter() {
|
||||
let idx = graph.add_node(key.clone());
|
||||
index.insert(key.clone(), idx);
|
||||
}
|
||||
|
||||
let mut unresolved_not_found = Vec::new();
|
||||
let mut unresolved_ambiguous = Vec::new();
|
||||
|
||||
// 2. Resolve callees and add edges.
|
||||
for (caller_key, summary) in summaries.iter() {
|
||||
let caller_node = index[caller_key];
|
||||
|
||||
for raw_callee in &summary.callees {
|
||||
let normalized = normalize_callee_name(raw_callee);
|
||||
|
||||
match summaries.resolve_callee_key(
|
||||
normalized,
|
||||
caller_key.lang,
|
||||
&caller_key.namespace,
|
||||
None,
|
||||
) {
|
||||
CalleeResolution::Resolved(target_key) => {
|
||||
if let Some(&target_node) = index.get(&target_key) {
|
||||
graph.add_edge(
|
||||
caller_node,
|
||||
target_node,
|
||||
CallEdge {
|
||||
call_site: raw_callee.clone(),
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
CalleeResolution::NotFound => {
|
||||
// Try interop edges before recording as not-found.
|
||||
if let Some(target_key) =
|
||||
resolve_via_interop(raw_callee, caller_key, interop_edges)
|
||||
&& let Some(&target_node) = index.get(&target_key)
|
||||
{
|
||||
graph.add_edge(
|
||||
caller_node,
|
||||
target_node,
|
||||
CallEdge {
|
||||
call_site: raw_callee.clone(),
|
||||
},
|
||||
);
|
||||
continue;
|
||||
}
|
||||
unresolved_not_found.push(UnresolvedCallee {
|
||||
caller: caller_key.clone(),
|
||||
callee_name: raw_callee.clone(),
|
||||
});
|
||||
}
|
||||
CalleeResolution::Ambiguous(candidates) => {
|
||||
unresolved_ambiguous.push(AmbiguousCallee {
|
||||
caller: caller_key.clone(),
|
||||
callee_name: raw_callee.clone(),
|
||||
candidates,
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
CallGraph {
|
||||
graph,
|
||||
index,
|
||||
unresolved_not_found,
|
||||
unresolved_ambiguous,
|
||||
}
|
||||
}
|
||||
|
||||
/// Check interop edges for a matching cross-language bridge.
|
||||
fn resolve_via_interop(
|
||||
raw_callee: &str,
|
||||
caller_key: &FuncKey,
|
||||
interop_edges: &[InteropEdge],
|
||||
) -> Option<FuncKey> {
|
||||
for edge in interop_edges {
|
||||
if edge.from.caller_lang == caller_key.lang
|
||||
&& edge.from.caller_namespace == caller_key.namespace
|
||||
&& edge.from.callee_symbol == raw_callee
|
||||
&& (edge.from.caller_func.is_empty() || edge.from.caller_func == caller_key.name)
|
||||
{
|
||||
return Some(edge.to.clone());
|
||||
}
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// SCC / topological analysis
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Compute SCC decomposition and topological ordering of the call graph.
|
||||
///
|
||||
/// `petgraph::algo::tarjan_scc` returns SCCs in *reverse* topological order
|
||||
/// of the condensation DAG — i.e. leaf SCCs (no outgoing cross-SCC edges)
|
||||
/// come **first**. That is exactly the **callee-first** order suitable for
|
||||
/// bottom-up taint propagation.
|
||||
pub fn analyse(cg: &CallGraph) -> CallGraphAnalysis {
|
||||
let sccs = petgraph::algo::tarjan_scc(&cg.graph);
|
||||
|
||||
let mut node_to_scc = HashMap::with_capacity(cg.graph.node_count());
|
||||
for (scc_idx, scc) in sccs.iter().enumerate() {
|
||||
for &node in scc {
|
||||
node_to_scc.insert(node, scc_idx);
|
||||
}
|
||||
}
|
||||
|
||||
// tarjan_scc already gives callee-first ordering.
|
||||
let topo_scc_callee_first: Vec<usize> = (0..sccs.len()).collect();
|
||||
|
||||
CallGraphAnalysis {
|
||||
sccs,
|
||||
node_to_scc,
|
||||
topo_scc_callee_first,
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Tests
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::interop::CallSiteKey;
|
||||
use crate::summary::{FuncSummary, merge_summaries};
|
||||
use crate::symbol::Lang;
|
||||
|
||||
/// Helper to create a minimal FuncSummary.
|
||||
fn make_summary(
|
||||
name: &str,
|
||||
file_path: &str,
|
||||
lang: &str,
|
||||
param_count: usize,
|
||||
callees: Vec<&str>,
|
||||
) -> FuncSummary {
|
||||
FuncSummary {
|
||||
name: name.into(),
|
||||
file_path: file_path.into(),
|
||||
lang: lang.into(),
|
||||
param_count,
|
||||
param_names: vec![],
|
||||
source_caps: 0,
|
||||
sanitizer_caps: 0,
|
||||
sink_caps: 0,
|
||||
propagates_taint: false,
|
||||
tainted_sink_params: vec![],
|
||||
callees: callees.into_iter().map(String::from).collect(),
|
||||
}
|
||||
}
|
||||
|
||||
// ── normalize_callee_name ────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn normalize_callee_basic() {
|
||||
assert_eq!(normalize_callee_name("env::var"), "var");
|
||||
assert_eq!(normalize_callee_name("std::process::Command"), "Command");
|
||||
assert_eq!(normalize_callee_name("obj.method"), "method");
|
||||
assert_eq!(normalize_callee_name("pkg.mod.func"), "func");
|
||||
assert_eq!(normalize_callee_name("foo"), "foo");
|
||||
assert_eq!(normalize_callee_name(""), "");
|
||||
}
|
||||
|
||||
// ── same name, different Rust modules ────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn same_name_different_rust_modules() {
|
||||
let helper_a = make_summary("helper", "src/a.rs", "rust", 0, vec![]);
|
||||
let helper_b = make_summary("helper", "src/b.rs", "rust", 0, vec![]);
|
||||
let caller = make_summary("caller", "src/a.rs", "rust", 0, vec!["helper"]);
|
||||
|
||||
let gs = merge_summaries(vec![helper_a, helper_b, caller], None);
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
|
||||
// Two helper nodes + one caller node = 3 nodes
|
||||
assert_eq!(cg.graph.node_count(), 3);
|
||||
|
||||
// Caller is in src/a.rs, so "helper" resolves to src/a.rs::helper
|
||||
let caller_key = FuncKey {
|
||||
lang: Lang::Rust,
|
||||
namespace: "src/a.rs".into(),
|
||||
name: "caller".into(),
|
||||
arity: Some(0),
|
||||
};
|
||||
let helper_a_key = FuncKey {
|
||||
lang: Lang::Rust,
|
||||
namespace: "src/a.rs".into(),
|
||||
name: "helper".into(),
|
||||
arity: Some(0),
|
||||
};
|
||||
|
||||
let caller_node = cg.index[&caller_key];
|
||||
let helper_a_node = cg.index[&helper_a_key];
|
||||
|
||||
// Exactly one edge: caller → helper_a
|
||||
let edges: Vec<_> = cg
|
||||
.graph
|
||||
.edges(caller_node)
|
||||
.filter(|e| e.target() == helper_a_node)
|
||||
.collect();
|
||||
assert_eq!(edges.len(), 1);
|
||||
assert!(cg.unresolved_not_found.is_empty());
|
||||
assert!(cg.unresolved_ambiguous.is_empty());
|
||||
}
|
||||
|
||||
// ── same name, Python vs Rust ────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn same_name_python_and_rust() {
|
||||
let py_foo = make_summary("foo", "handler.py", "python", 0, vec![]);
|
||||
let rs_foo = make_summary("foo", "handler.rs", "rust", 0, vec![]);
|
||||
// Python caller calls "foo" — should only see the Python one
|
||||
let py_caller = make_summary("main", "app.py", "python", 0, vec!["foo"]);
|
||||
|
||||
let gs = merge_summaries(vec![py_foo, rs_foo, py_caller], None);
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
|
||||
assert_eq!(cg.graph.node_count(), 3);
|
||||
|
||||
let py_foo_key = FuncKey {
|
||||
lang: Lang::Python,
|
||||
namespace: "handler.py".into(),
|
||||
name: "foo".into(),
|
||||
arity: Some(0),
|
||||
};
|
||||
let caller_key = FuncKey {
|
||||
lang: Lang::Python,
|
||||
namespace: "app.py".into(),
|
||||
name: "main".into(),
|
||||
arity: Some(0),
|
||||
};
|
||||
|
||||
let caller_node = cg.index[&caller_key];
|
||||
let py_foo_node = cg.index[&py_foo_key];
|
||||
|
||||
// Edge goes to Python foo, not Rust foo
|
||||
let edges: Vec<_> = cg.graph.edges(caller_node).collect();
|
||||
assert_eq!(edges.len(), 1);
|
||||
assert_eq!(edges[0].target(), py_foo_node);
|
||||
}
|
||||
|
||||
// ── arity differences → separate nodes ───────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn arity_differences_separate_nodes() {
|
||||
let helper1 = make_summary("helper", "lib.rs", "rust", 1, vec![]);
|
||||
let helper2 = make_summary("helper", "lib.rs", "rust", 2, vec![]);
|
||||
|
||||
let gs = merge_summaries(vec![helper1, helper2], None);
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
|
||||
// Two separate nodes (different arity → different FuncKey)
|
||||
assert_eq!(cg.graph.node_count(), 2);
|
||||
|
||||
let key1 = FuncKey {
|
||||
lang: Lang::Rust,
|
||||
namespace: "lib.rs".into(),
|
||||
name: "helper".into(),
|
||||
arity: Some(1),
|
||||
};
|
||||
let key2 = FuncKey {
|
||||
lang: Lang::Rust,
|
||||
namespace: "lib.rs".into(),
|
||||
name: "helper".into(),
|
||||
arity: Some(2),
|
||||
};
|
||||
assert!(cg.index.contains_key(&key1));
|
||||
assert!(cg.index.contains_key(&key2));
|
||||
}
|
||||
|
||||
// ── recursive SCC detection ──────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn recursive_scc_detection() {
|
||||
let a = make_summary("a", "lib.rs", "rust", 0, vec!["b"]);
|
||||
let b = make_summary("b", "lib.rs", "rust", 0, vec!["a"]);
|
||||
|
||||
let gs = merge_summaries(vec![a, b], None);
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
|
||||
assert_eq!(cg.graph.edge_count(), 2); // a→b and b→a
|
||||
|
||||
let analysis = analyse(&cg);
|
||||
|
||||
// Both nodes should be in the same SCC
|
||||
let key_a = FuncKey {
|
||||
lang: Lang::Rust,
|
||||
namespace: "lib.rs".into(),
|
||||
name: "a".into(),
|
||||
arity: Some(0),
|
||||
};
|
||||
let key_b = FuncKey {
|
||||
lang: Lang::Rust,
|
||||
namespace: "lib.rs".into(),
|
||||
name: "b".into(),
|
||||
arity: Some(0),
|
||||
};
|
||||
|
||||
let scc_a = analysis.node_to_scc[&cg.index[&key_a]];
|
||||
let scc_b = analysis.node_to_scc[&cg.index[&key_b]];
|
||||
assert_eq!(scc_a, scc_b);
|
||||
assert_eq!(analysis.sccs[scc_a].len(), 2);
|
||||
}
|
||||
|
||||
// ── unresolved callee → recorded as not found ────────────────────────
|
||||
|
||||
#[test]
|
||||
fn unresolved_callee_recorded_as_not_found() {
|
||||
let caller = make_summary("caller", "lib.rs", "rust", 0, vec!["nonexistent"]);
|
||||
|
||||
let gs = merge_summaries(vec![caller], None);
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
|
||||
assert_eq!(cg.graph.edge_count(), 0);
|
||||
assert_eq!(cg.unresolved_not_found.len(), 1);
|
||||
assert_eq!(cg.unresolved_not_found[0].callee_name, "nonexistent");
|
||||
assert!(cg.unresolved_ambiguous.is_empty());
|
||||
}
|
||||
|
||||
// ── ambiguous callee → recorded as ambiguous ─────────────────────────
|
||||
|
||||
#[test]
|
||||
fn ambiguous_callee_recorded() {
|
||||
// Two "helper" functions in different namespaces.
|
||||
let helper_a = make_summary("helper", "a.rs", "rust", 0, vec![]);
|
||||
let helper_b = make_summary("helper", "b.rs", "rust", 0, vec![]);
|
||||
// Caller is in a THIRD namespace, so neither is preferred.
|
||||
let caller = make_summary("caller", "c.rs", "rust", 0, vec!["helper"]);
|
||||
|
||||
let gs = merge_summaries(vec![helper_a, helper_b, caller], None);
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
|
||||
assert_eq!(cg.graph.edge_count(), 0); // no edge — ambiguous
|
||||
assert!(cg.unresolved_not_found.is_empty());
|
||||
assert_eq!(cg.unresolved_ambiguous.len(), 1);
|
||||
assert_eq!(cg.unresolved_ambiguous[0].callee_name, "helper");
|
||||
assert_eq!(cg.unresolved_ambiguous[0].candidates.len(), 2);
|
||||
}
|
||||
|
||||
// ── diamond topo order (callee-first) ────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn diamond_topo_callee_first() {
|
||||
// A → B, A → C, B → D, C → D
|
||||
let d = make_summary("d", "lib.rs", "rust", 0, vec![]);
|
||||
let b = make_summary("b", "lib.rs", "rust", 0, vec!["d"]);
|
||||
let c = make_summary("c", "lib.rs", "rust", 0, vec!["d"]);
|
||||
let a = make_summary("a", "lib.rs", "rust", 0, vec!["b", "c"]);
|
||||
|
||||
let gs = merge_summaries(vec![a, b, c, d], None);
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
|
||||
assert_eq!(cg.graph.node_count(), 4);
|
||||
|
||||
let analysis = analyse(&cg);
|
||||
|
||||
let key = |name: &str| FuncKey {
|
||||
lang: Lang::Rust,
|
||||
namespace: "lib.rs".into(),
|
||||
name: name.into(),
|
||||
arity: Some(0),
|
||||
};
|
||||
|
||||
let scc_of = |name: &str| analysis.node_to_scc[&cg.index[&key(name)]];
|
||||
let topo_pos = |name: &str| {
|
||||
analysis
|
||||
.topo_scc_callee_first
|
||||
.iter()
|
||||
.position(|&s| s == scc_of(name))
|
||||
.unwrap()
|
||||
};
|
||||
|
||||
// D (leaf) must come before B and C, which must come before A (root).
|
||||
assert!(topo_pos("d") < topo_pos("b"));
|
||||
assert!(topo_pos("d") < topo_pos("c"));
|
||||
assert!(topo_pos("b") < topo_pos("a"));
|
||||
assert!(topo_pos("c") < topo_pos("a"));
|
||||
}
|
||||
|
||||
// ── interop edge resolution ──────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn interop_edge_resolution() {
|
||||
let py_caller = make_summary("process", "handler.py", "python", 0, vec!["js_func"]);
|
||||
let js_target = make_summary("js_func", "util.js", "javascript", 1, vec![]);
|
||||
|
||||
let gs = merge_summaries(vec![py_caller, js_target], None);
|
||||
|
||||
let interop = vec![InteropEdge {
|
||||
from: CallSiteKey {
|
||||
caller_lang: Lang::Python,
|
||||
caller_namespace: "handler.py".into(),
|
||||
caller_func: String::new(), // wildcard
|
||||
callee_symbol: "js_func".into(),
|
||||
ordinal: 0,
|
||||
},
|
||||
to: FuncKey {
|
||||
lang: Lang::JavaScript,
|
||||
namespace: "util.js".into(),
|
||||
name: "js_func".into(),
|
||||
arity: Some(1),
|
||||
},
|
||||
arg_map: vec![],
|
||||
ret_taints: false,
|
||||
}];
|
||||
|
||||
let cg = build_call_graph(&gs, &interop);
|
||||
|
||||
let caller_key = FuncKey {
|
||||
lang: Lang::Python,
|
||||
namespace: "handler.py".into(),
|
||||
name: "process".into(),
|
||||
arity: Some(0),
|
||||
};
|
||||
let target_key = FuncKey {
|
||||
lang: Lang::JavaScript,
|
||||
namespace: "util.js".into(),
|
||||
name: "js_func".into(),
|
||||
arity: Some(1),
|
||||
};
|
||||
|
||||
let caller_node = cg.index[&caller_key];
|
||||
let target_node = cg.index[&target_key];
|
||||
|
||||
let edges: Vec<_> = cg
|
||||
.graph
|
||||
.edges(caller_node)
|
||||
.filter(|e| e.target() == target_node)
|
||||
.collect();
|
||||
assert_eq!(edges.len(), 1);
|
||||
assert!(cg.unresolved_not_found.is_empty());
|
||||
}
|
||||
|
||||
// ── namespace normalization consistency ───────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn namespace_normalization_consistency() {
|
||||
// FuncSummary::func_key with a scan root produces the same namespace
|
||||
// string that would be used as caller_namespace in resolution.
|
||||
let summary = FuncSummary {
|
||||
name: "my_func".into(),
|
||||
file_path: "/home/user/proj/src/lib.rs".into(),
|
||||
lang: "rust".into(),
|
||||
param_count: 0,
|
||||
param_names: vec![],
|
||||
source_caps: 0,
|
||||
sanitizer_caps: 0,
|
||||
sink_caps: 0,
|
||||
propagates_taint: false,
|
||||
tainted_sink_params: vec![],
|
||||
callees: vec![],
|
||||
};
|
||||
|
||||
let root = "/home/user/proj";
|
||||
let key = summary.func_key(Some(root));
|
||||
|
||||
// The namespace in the key must be the same as what normalize_namespace produces
|
||||
let expected_ns = crate::symbol::normalize_namespace(&summary.file_path, Some(root));
|
||||
assert_eq!(key.namespace, expected_ns);
|
||||
assert_eq!(key.namespace, "src/lib.rs");
|
||||
}
|
||||
|
||||
// ── raw call_site preserved on edge ──────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn raw_call_site_preserved_on_edge() {
|
||||
// Callee "env::var" normalizes to "var" for resolution, but
|
||||
// the edge should retain the original raw text.
|
||||
let source = make_summary("var", "util.rs", "rust", 0, vec![]);
|
||||
let caller = make_summary("main", "util.rs", "rust", 0, vec!["env::var"]);
|
||||
|
||||
let gs = merge_summaries(vec![source, caller], None);
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
|
||||
let caller_key = FuncKey {
|
||||
lang: Lang::Rust,
|
||||
namespace: "util.rs".into(),
|
||||
name: "main".into(),
|
||||
arity: Some(0),
|
||||
};
|
||||
let caller_node = cg.index[&caller_key];
|
||||
|
||||
let edges: Vec<_> = cg.graph.edges(caller_node).collect();
|
||||
assert_eq!(edges.len(), 1);
|
||||
// Raw call_site preserved, not the normalized "var"
|
||||
assert_eq!(edges[0].weight().call_site, "env::var");
|
||||
}
|
||||
}
|
||||
417
src/cfg.rs
417
src/cfg.rs
|
|
@ -32,6 +32,9 @@ pub enum EdgeKind {
|
|||
Back, // back‑edge that closes a loop
|
||||
}
|
||||
|
||||
/// Maximum number of identifiers to store from a condition expression.
|
||||
const MAX_COND_VARS: usize = 8;
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct NodeInfo {
|
||||
pub kind: StmtKind,
|
||||
|
|
@ -44,6 +47,12 @@ pub struct NodeInfo {
|
|||
pub enclosing_func: Option<String>,
|
||||
/// Per-function call ordinal (0-based, only meaningful for Call nodes).
|
||||
pub call_ordinal: u32,
|
||||
/// For If nodes: raw condition text (truncated to 128 chars). None for non-If nodes.
|
||||
pub condition_text: Option<String>,
|
||||
/// For If nodes: identifiers referenced in the condition (sorted, deduped, max 8).
|
||||
pub condition_vars: Vec<String>,
|
||||
/// For If nodes: whether the condition has a leading negation (`!` / `not`).
|
||||
pub condition_negated: bool,
|
||||
}
|
||||
|
||||
/// Intra‑file function summary with graph‑local node indices.
|
||||
|
|
@ -122,6 +131,7 @@ fn first_call_ident<'a>(n: Node<'a>, lang: &str, code: &'a [u8]) -> Option<Strin
|
|||
.child_by_field_name("function")
|
||||
.or_else(|| c.child_by_field_name("method"))
|
||||
.or_else(|| c.child_by_field_name("name"))
|
||||
.or_else(|| c.child_by_field_name("type"))
|
||||
.and_then(|f| text_of(f, code)),
|
||||
Kind::CallMethod => {
|
||||
let func = c
|
||||
|
|
@ -155,6 +165,65 @@ fn first_call_ident<'a>(n: Node<'a>, lang: &str, code: &'a [u8]) -> Option<Strin
|
|||
None
|
||||
}
|
||||
|
||||
/// Search recursively for any nested call whose identifier classifies as a label.
|
||||
/// Used for cases like `str(eval(expr))` where `str` doesn't match but `eval` does.
|
||||
fn find_classifiable_inner_call<'a>(
|
||||
n: Node<'a>,
|
||||
lang: &str,
|
||||
code: &'a [u8],
|
||||
extra: Option<&[crate::labels::RuntimeLabelRule]>,
|
||||
) -> Option<(String, DataLabel)> {
|
||||
let mut cursor = n.walk();
|
||||
for c in n.children(&mut cursor) {
|
||||
match lookup(lang, c.kind()) {
|
||||
Kind::CallFn | Kind::CallMethod | Kind::CallMacro => {
|
||||
let ident = match lookup(lang, c.kind()) {
|
||||
Kind::CallFn => c
|
||||
.child_by_field_name("function")
|
||||
.or_else(|| c.child_by_field_name("method"))
|
||||
.or_else(|| c.child_by_field_name("name"))
|
||||
.or_else(|| c.child_by_field_name("type"))
|
||||
.and_then(|f| text_of(f, code)),
|
||||
Kind::CallMethod => {
|
||||
let func = c
|
||||
.child_by_field_name("method")
|
||||
.or_else(|| c.child_by_field_name("name"))
|
||||
.and_then(|f| text_of(f, code));
|
||||
let recv = c
|
||||
.child_by_field_name("object")
|
||||
.or_else(|| c.child_by_field_name("receiver"))
|
||||
.and_then(|f| root_receiver_text(f, lang, code));
|
||||
match (recv, func) {
|
||||
(Some(r), Some(f)) => Some(format!("{r}.{f}")),
|
||||
(_, Some(f)) => Some(f),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
Kind::CallMacro => c
|
||||
.child_by_field_name("macro")
|
||||
.and_then(|f| text_of(f, code)),
|
||||
_ => None,
|
||||
};
|
||||
if let Some(ref id) = ident
|
||||
&& let Some(lbl) = classify(lang, id, extra)
|
||||
{
|
||||
return Some((id.clone(), lbl));
|
||||
}
|
||||
// Recurse into arguments of this call
|
||||
if let Some(found) = find_classifiable_inner_call(c, lang, code, extra) {
|
||||
return Some(found);
|
||||
}
|
||||
}
|
||||
_ => {
|
||||
if let Some(found) = find_classifiable_inner_call(c, lang, code, extra) {
|
||||
return Some(found);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
/// Build the dot-joined text of a member_expression / attribute / selector_expression.
|
||||
/// E.g. for `process.env.CMD` this returns `"process.env.CMD"`.
|
||||
fn member_expr_text(n: Node, code: &[u8]) -> Option<String> {
|
||||
|
|
@ -209,6 +278,25 @@ fn first_member_label(
|
|||
}
|
||||
}
|
||||
}
|
||||
// PHP/Python/Ruby subscript access: `$_GET['cmd']`, `os.environ['KEY']`, `params[:cmd]`
|
||||
// Try to classify the object (before the `[`) as a source.
|
||||
"subscript_expression" | "subscript" | "element_reference" => {
|
||||
if let Some(obj) = n
|
||||
.child_by_field_name("object")
|
||||
.or_else(|| n.child_by_field_name("value"))
|
||||
.or_else(|| n.child(0))
|
||||
{
|
||||
if let Some(txt) = text_of(obj, code)
|
||||
&& let Some(lbl) = classify(lang, &txt, extra_labels)
|
||||
{
|
||||
return Some(lbl);
|
||||
}
|
||||
// Recurse into the object for nested member accesses
|
||||
if let Some(lbl) = first_member_label(obj, lang, code, extra_labels) {
|
||||
return Some(lbl);
|
||||
}
|
||||
}
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
let mut cursor = n.walk();
|
||||
|
|
@ -224,6 +312,11 @@ fn first_member_label(
|
|||
fn first_member_text(n: Node, code: &[u8]) -> Option<String> {
|
||||
match n.kind() {
|
||||
"member_expression" | "attribute" | "selector_expression" => member_expr_text(n, code),
|
||||
"subscript_expression" | "subscript" | "element_reference" => n
|
||||
.child_by_field_name("object")
|
||||
.or_else(|| n.child_by_field_name("value"))
|
||||
.or_else(|| n.child(0))
|
||||
.and_then(|obj| text_of(obj, code)),
|
||||
_ => {
|
||||
let mut cursor = n.walk();
|
||||
for child in n.children(&mut cursor) {
|
||||
|
|
@ -237,6 +330,42 @@ fn first_member_text(n: Node, code: &[u8]) -> Option<String> {
|
|||
}
|
||||
|
||||
/// Check whether any descendant of `n` is a call expression.
|
||||
/// Collect function-expression nodes nested inside a call's arguments.
|
||||
///
|
||||
/// This finds anonymous functions / arrow functions / closures that are
|
||||
/// passed as arguments to a call and should be analysed as separate
|
||||
/// function scopes. Only direct function-argument children are collected
|
||||
/// (not functions nested inside other functions — those get handled when
|
||||
/// the outer function is recursed into).
|
||||
fn collect_nested_function_nodes<'a>(n: Node<'a>, lang: &str) -> Vec<Node<'a>> {
|
||||
let mut funcs = Vec::new();
|
||||
collect_nested_functions_rec(n, lang, &mut funcs, false);
|
||||
funcs
|
||||
}
|
||||
|
||||
fn collect_nested_functions_rec<'a>(
|
||||
n: Node<'a>,
|
||||
lang: &str,
|
||||
out: &mut Vec<Node<'a>>,
|
||||
inside_function: bool,
|
||||
) {
|
||||
let kind = lookup(lang, n.kind());
|
||||
// Only treat as a function if it's a real function node (has children),
|
||||
// not a keyword token like `function` in JS which shares the same kind name.
|
||||
if kind == Kind::Function && n.child_count() > 0 {
|
||||
if inside_function {
|
||||
// Don't recurse into nested functions of nested functions
|
||||
return;
|
||||
}
|
||||
out.push(n);
|
||||
return;
|
||||
}
|
||||
let mut cursor = n.walk();
|
||||
for c in n.children(&mut cursor) {
|
||||
collect_nested_functions_rec(c, lang, out, inside_function);
|
||||
}
|
||||
}
|
||||
|
||||
fn has_call_descendant(n: Node, lang: &str) -> bool {
|
||||
let mut cursor = n.walk();
|
||||
for c in n.children(&mut cursor) {
|
||||
|
|
@ -361,6 +490,36 @@ fn def_use(ast: Node, lang: &str, code: &[u8]) -> (Option<String>, Vec<String>)
|
|||
(defs, uses)
|
||||
}
|
||||
|
||||
// if‑let / while‑let — the `let_condition` binds a variable from
|
||||
// the value expression. E.g. `if let Ok(cmd) = env::var("CMD")`
|
||||
// defines `cmd` and uses `env`, `var`, `CMD`.
|
||||
Kind::If | Kind::While => {
|
||||
let cond = ast.child_by_field_name("condition");
|
||||
if let Some(c) = cond
|
||||
&& c.kind() == "let_condition"
|
||||
{
|
||||
let mut defs = None;
|
||||
let mut uses = Vec::new();
|
||||
|
||||
if let Some(pat) = c.child_by_field_name("pattern") {
|
||||
let mut tmp = Vec::<String>::new();
|
||||
collect_idents(pat, code, &mut tmp);
|
||||
// The first plain identifier in the pattern is the binding.
|
||||
// Skip type identifiers (e.g. "Ok" in Ok(cmd)) — take the
|
||||
// last ident which is the inner binding name.
|
||||
defs = tmp.into_iter().last();
|
||||
}
|
||||
if let Some(val) = c.child_by_field_name("value") {
|
||||
collect_idents(val, code, &mut uses);
|
||||
}
|
||||
return (defs, uses);
|
||||
}
|
||||
|
||||
let mut uses = Vec::new();
|
||||
collect_idents(ast, code, &mut uses);
|
||||
(None, uses)
|
||||
}
|
||||
|
||||
// everything else – no definition, but may read vars
|
||||
_ => {
|
||||
let mut uses = Vec::new();
|
||||
|
|
@ -370,6 +529,109 @@ fn def_use(ast: Node, lang: &str, code: &[u8]) -> (Option<String>, Vec<String>)
|
|||
}
|
||||
}
|
||||
|
||||
/// Extract raw condition metadata from an If AST node.
|
||||
///
|
||||
/// Returns `(condition_text, condition_vars, condition_negated)`.
|
||||
/// The condition subtree is located via `child_by_field_name("condition")`
|
||||
/// for most languages, with a positional fallback for Rust `if_expression`.
|
||||
///
|
||||
/// Negation is detected by checking for a leading unary `!` operator or
|
||||
/// `not` keyword. Variables are sorted, deduped, and capped at
|
||||
/// [`MAX_COND_VARS`].
|
||||
fn extract_condition_raw<'a>(
|
||||
ast: Node<'a>,
|
||||
lang: &str,
|
||||
code: &'a [u8],
|
||||
) -> (Option<String>, Vec<String>, bool) {
|
||||
// 1. Find the condition subtree.
|
||||
let cond_node = ast.child_by_field_name("condition").or_else(|| {
|
||||
// Rust `if_expression` uses positional children: the condition is
|
||||
// the first child that is not a keyword, block, or `let` pattern.
|
||||
let mut cursor = ast.walk();
|
||||
ast.children(&mut cursor).find(|c| {
|
||||
let k = c.kind();
|
||||
!matches!(lookup(lang, k), Kind::Block | Kind::Trivia)
|
||||
&& k != "if"
|
||||
&& k != "else"
|
||||
&& k != "let"
|
||||
&& k != "{"
|
||||
&& k != "}"
|
||||
&& k != "("
|
||||
&& k != ")"
|
||||
})
|
||||
});
|
||||
|
||||
let Some(cond) = cond_node else {
|
||||
return (None, Vec::new(), false);
|
||||
};
|
||||
|
||||
// 2. Detect leading negation (`!expr`, `not expr`, Ruby `unless`).
|
||||
let (inner, negated) = detect_negation(cond, ast, lang);
|
||||
|
||||
// 3. Collect identifiers from the (inner) condition subtree.
|
||||
let mut vars = Vec::new();
|
||||
collect_idents(inner, code, &mut vars);
|
||||
vars.sort();
|
||||
vars.dedup();
|
||||
vars.truncate(MAX_COND_VARS);
|
||||
|
||||
// 4. Extract text, truncated.
|
||||
let text = text_of(cond, code).map(|t| {
|
||||
if t.len() > 128 {
|
||||
t[..128].to_string()
|
||||
} else {
|
||||
t
|
||||
}
|
||||
});
|
||||
|
||||
(text, vars, negated)
|
||||
}
|
||||
|
||||
/// Detect leading negation and return the inner expression.
|
||||
///
|
||||
/// Handles:
|
||||
/// - `!expr` (unary_expression / prefix_unary_expression with `!` operator)
|
||||
/// - `not expr` (Python `not_operator`, Ruby)
|
||||
/// - Ruby `unless` (the whole If node kind is `unless`)
|
||||
fn detect_negation<'a>(cond: Node<'a>, if_ast: Node<'a>, _lang: &str) -> (Node<'a>, bool) {
|
||||
// Ruby `unless` is mapped to Kind::If but is semantically negated.
|
||||
if if_ast.kind() == "unless" {
|
||||
return (cond, true);
|
||||
}
|
||||
|
||||
// `!expr` appears as unary_expression, not_operator, or prefix_unary_expression
|
||||
// with a `!` or `not` operator child.
|
||||
let is_negation_wrapper = matches!(
|
||||
cond.kind(),
|
||||
"unary_expression" | "not_operator" | "prefix_unary_expression" | "unary_not"
|
||||
);
|
||||
|
||||
if is_negation_wrapper {
|
||||
// Check if the first child is a `!` or `not` operator.
|
||||
let has_not = cond
|
||||
.child(0)
|
||||
.is_some_and(|c| c.kind() == "!" || c.kind() == "not");
|
||||
|
||||
if has_not {
|
||||
// Return the operand (inner expression after the `!` / `not`).
|
||||
let inner = cond
|
||||
.child_by_field_name("argument")
|
||||
.or_else(|| cond.child_by_field_name("operand"))
|
||||
.or_else(|| {
|
||||
// Last non-operator child.
|
||||
let mut cursor = cond.walk();
|
||||
cond.children(&mut cursor)
|
||||
.filter(|c| c.kind() != "!" && c.kind() != "not")
|
||||
.last()
|
||||
})
|
||||
.unwrap_or(cond);
|
||||
return (inner, true);
|
||||
}
|
||||
}
|
||||
|
||||
(cond, false)
|
||||
}
|
||||
|
||||
/// Create a node in one short borrow and optionally attach a taint label.
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
fn push_node<'a>(
|
||||
|
|
@ -391,6 +653,7 @@ fn push_node<'a>(
|
|||
.child_by_field_name("function")
|
||||
.or_else(|| ast.child_by_field_name("method"))
|
||||
.or_else(|| ast.child_by_field_name("name"))
|
||||
.or_else(|| ast.child_by_field_name("type"))
|
||||
.and_then(|n| text_of(n, code))
|
||||
.unwrap_or_default(),
|
||||
|
||||
|
|
@ -426,7 +689,7 @@ fn push_node<'a>(
|
|||
// the whole line.
|
||||
if matches!(
|
||||
lookup(lang, ast.kind()),
|
||||
Kind::CallWrapper | Kind::Assignment
|
||||
Kind::CallWrapper | Kind::Assignment | Kind::Return
|
||||
) && let Some(inner) = first_call_ident(ast, lang, code)
|
||||
{
|
||||
text = inner;
|
||||
|
|
@ -437,6 +700,20 @@ fn push_node<'a>(
|
|||
let extra = analysis_rules.map(|r| r.extra_labels.as_slice());
|
||||
let mut label = classify(lang, &text, extra);
|
||||
|
||||
// If the outermost call didn't classify, try inner/nested calls.
|
||||
// E.g. `str(eval(expr))` — `str` is not a sink, but `eval` is.
|
||||
if label.is_none()
|
||||
&& matches!(
|
||||
lookup(lang, ast.kind()),
|
||||
Kind::CallWrapper | Kind::Assignment | Kind::Return
|
||||
)
|
||||
&& let Some((inner_text, inner_label)) =
|
||||
find_classifiable_inner_call(ast, lang, code, extra)
|
||||
{
|
||||
label = Some(inner_label);
|
||||
text = inner_text;
|
||||
}
|
||||
|
||||
// For assignments like `element.innerHTML = value`, the inner-call heuristic
|
||||
// above may have overridden `text` with a call on the RHS (e.g. getElementById).
|
||||
// If that didn't produce a label, check the LHS property name — it may be a
|
||||
|
|
@ -493,18 +770,49 @@ fn push_node<'a>(
|
|||
}
|
||||
}
|
||||
|
||||
// For `if let` / `while let` patterns: try to classify the value expression
|
||||
// in the let-condition as a source/sink. E.g. `if let Ok(cmd) = env::var("CMD")`
|
||||
// should recognise `env::var` as a taint source and label this node accordingly.
|
||||
if label.is_none()
|
||||
&& matches!(lookup(lang, ast.kind()), Kind::If | Kind::While)
|
||||
&& let Some(cond) = ast.child_by_field_name("condition")
|
||||
&& cond.kind() == "let_condition"
|
||||
&& let Some(val) = cond.child_by_field_name("value")
|
||||
{
|
||||
if let Some(ident) = first_call_ident(val, lang, code)
|
||||
&& let Some(l) = classify(lang, &ident, extra)
|
||||
{
|
||||
label = Some(l);
|
||||
text = ident;
|
||||
}
|
||||
if label.is_none()
|
||||
&& let Some(ident_text) = text_of(val, code)
|
||||
&& let Some(l) = classify(lang, &ident_text, extra)
|
||||
{
|
||||
label = Some(l);
|
||||
text = ident_text;
|
||||
}
|
||||
}
|
||||
|
||||
let span = (ast.start_byte(), ast.end_byte());
|
||||
|
||||
/* ── 3. GRAPH INSERTION + DEBUG ──────────────────────────────────── */
|
||||
|
||||
let (defines, uses) = def_use(ast, lang, code);
|
||||
|
||||
let callee = if kind == StmtKind::Call {
|
||||
let callee = if kind == StmtKind::Call || label.is_some() {
|
||||
Some(text.clone())
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
// Extract condition metadata for If nodes.
|
||||
let (condition_text, condition_vars, condition_negated) = if kind == StmtKind::If {
|
||||
extract_condition_raw(ast, lang, code)
|
||||
} else {
|
||||
(None, Vec::new(), false)
|
||||
};
|
||||
|
||||
let idx = g.add_node(NodeInfo {
|
||||
kind,
|
||||
span,
|
||||
|
|
@ -514,6 +822,9 @@ fn push_node<'a>(
|
|||
callee,
|
||||
enclosing_func: enclosing_func.map(|s| s.to_string()),
|
||||
call_ordinal,
|
||||
condition_text,
|
||||
condition_vars,
|
||||
condition_negated,
|
||||
});
|
||||
|
||||
debug!(
|
||||
|
|
@ -717,19 +1028,27 @@ fn build_sub<'a>(
|
|||
}
|
||||
exits
|
||||
} else {
|
||||
// No explicit else → if the then-branch falls through
|
||||
// (non-empty exits), the false branch merges with those exits.
|
||||
// If the then-branch terminates (break/return/continue →
|
||||
// empty exits), the false branch flows from the condition
|
||||
// to whatever comes next.
|
||||
if then_exits.is_empty() {
|
||||
vec![cond]
|
||||
} else {
|
||||
if let Some(&first) = then_exits.first() {
|
||||
connect_all(g, &[cond], first, EdgeKind::False);
|
||||
}
|
||||
then_exits.clone()
|
||||
}
|
||||
// No explicit else → create a synthetic pass-through node
|
||||
// for the false path. This avoids routing the False edge
|
||||
// to a then-block exit (which would make it appear that the
|
||||
// false path goes *through* the then-block) and gives
|
||||
// path-sensitive analysis an explicit False edge to record
|
||||
// predicates on.
|
||||
let pass = g.add_node(NodeInfo {
|
||||
kind: StmtKind::Seq,
|
||||
span: (ast.end_byte(), ast.end_byte()),
|
||||
label: None,
|
||||
defines: None,
|
||||
uses: Vec::new(),
|
||||
callee: None,
|
||||
enclosing_func: enclosing_func.map(|s| s.to_string()),
|
||||
call_ordinal: 0,
|
||||
condition_text: None,
|
||||
condition_vars: Vec::new(),
|
||||
condition_negated: false,
|
||||
});
|
||||
connect_all(g, &[cond], pass, EdgeKind::False);
|
||||
vec![pass]
|
||||
};
|
||||
|
||||
// Frontier = union of both branches
|
||||
|
|
@ -995,7 +1314,7 @@ fn build_sub<'a>(
|
|||
collect_idents(n, code, &mut tmp);
|
||||
tmp.into_iter().next()
|
||||
})
|
||||
.unwrap_or_else(|| "<anon>".to_string());
|
||||
.unwrap_or_else(|| format!("<anon@{}>", ast.start_byte()));
|
||||
let entry_idx = push_node(
|
||||
g,
|
||||
StmtKind::Seq,
|
||||
|
|
@ -1016,7 +1335,20 @@ fn build_sub<'a>(
|
|||
// Snapshot the current node count so we can iterate only over nodes
|
||||
// created within this function (avoids O(N²) scan of the full graph).
|
||||
let fn_first_node: NodeIndex = NodeIndex::new(g.node_count());
|
||||
let body = ast.child_by_field_name("body").expect("fn w/o body");
|
||||
let body = ast.child_by_field_name("body").unwrap_or_else(|| {
|
||||
// Some function expressions (e.g. JS anonymous `function(…) { … }`)
|
||||
// don't have a named "body" field — find the first block child.
|
||||
let mut c = ast.walk();
|
||||
ast.children(&mut c)
|
||||
.find(|n| matches!(lookup(lang, n.kind()), Kind::Block | Kind::SourceFile))
|
||||
.unwrap_or_else(|| {
|
||||
panic!(
|
||||
"fn w/o body: kind={} text='{}'",
|
||||
ast.kind(),
|
||||
text_of(ast, code).unwrap_or_default()
|
||||
)
|
||||
})
|
||||
});
|
||||
let mut fn_call_ordinal: u32 = 0;
|
||||
let mut fn_breaks = Vec::new();
|
||||
let mut fn_continues = Vec::new();
|
||||
|
|
@ -1191,6 +1523,9 @@ fn build_sub<'a>(
|
|||
callee: None,
|
||||
enclosing_func: Some(fn_name.clone()),
|
||||
call_ordinal: 0,
|
||||
condition_text: None,
|
||||
condition_vars: Vec::new(),
|
||||
condition_negated: false,
|
||||
});
|
||||
// Wire body exits (fall-through) to the exit node.
|
||||
for &b in &body_exits {
|
||||
|
|
@ -1300,6 +1635,28 @@ fn build_sub<'a>(
|
|||
{
|
||||
return Vec::new();
|
||||
}
|
||||
|
||||
// Recurse into any function expressions nested in arguments
|
||||
// (e.g. `app.get('/path', function(req, res) { ... })`)
|
||||
// so that they get proper function summaries.
|
||||
let nested = collect_nested_function_nodes(ast, lang);
|
||||
for func_node in nested {
|
||||
build_sub(
|
||||
func_node,
|
||||
&[node],
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
summaries,
|
||||
file_path,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
break_targets,
|
||||
continue_targets,
|
||||
);
|
||||
}
|
||||
|
||||
vec![node]
|
||||
}
|
||||
|
||||
|
|
@ -1326,6 +1683,26 @@ fn build_sub<'a>(
|
|||
{
|
||||
return Vec::new();
|
||||
}
|
||||
|
||||
// Recurse into any function expressions nested in arguments
|
||||
let nested = collect_nested_function_nodes(ast, lang);
|
||||
for func_node in nested {
|
||||
build_sub(
|
||||
func_node,
|
||||
&[n],
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
summaries,
|
||||
file_path,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
break_targets,
|
||||
continue_targets,
|
||||
);
|
||||
}
|
||||
|
||||
vec![n]
|
||||
}
|
||||
|
||||
|
|
@ -1412,6 +1789,9 @@ pub(crate) fn build_cfg<'a>(
|
|||
callee: None,
|
||||
enclosing_func: None,
|
||||
call_ordinal: 0,
|
||||
condition_text: None,
|
||||
condition_vars: Vec::new(),
|
||||
condition_negated: false,
|
||||
});
|
||||
let exit = g.add_node(NodeInfo {
|
||||
kind: StmtKind::Exit,
|
||||
|
|
@ -1422,6 +1802,9 @@ pub(crate) fn build_cfg<'a>(
|
|||
callee: None,
|
||||
enclosing_func: None,
|
||||
call_ordinal: 0,
|
||||
condition_text: None,
|
||||
condition_vars: Vec::new(),
|
||||
condition_negated: false,
|
||||
});
|
||||
|
||||
// Build the body below the synthetic ENTRY.
|
||||
|
|
|
|||
|
|
@ -33,7 +33,6 @@ pub struct CfgFinding {
|
|||
pub severity: Severity,
|
||||
pub confidence: Confidence,
|
||||
pub span: (usize, usize),
|
||||
#[allow(dead_code)]
|
||||
pub message: String,
|
||||
pub evidence: Vec<NodeIndex>,
|
||||
pub score: Option<f64>,
|
||||
|
|
|
|||
|
|
@ -681,6 +681,8 @@ fn taint_and_unguarded_sink_deduped() {
|
|||
source: entry,
|
||||
path: vec![entry, sink_node],
|
||||
source_kind: crate::labels::SourceKind::UserInput,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
}];
|
||||
|
||||
let findings = parse_and_run_all_with_taint(
|
||||
|
|
|
|||
174
src/cli.rs
174
src/cli.rs
|
|
@ -1,4 +1,4 @@
|
|||
use clap::{Parser, Subcommand};
|
||||
use clap::{Parser, Subcommand, ValueEnum};
|
||||
|
||||
#[derive(Parser)]
|
||||
#[command(name = "nyx")]
|
||||
|
|
@ -13,10 +13,55 @@ impl Commands {
|
|||
/// Whether this command produces structured (machine-readable) output on
|
||||
/// stdout, meaning human status messages must be suppressed entirely.
|
||||
pub fn is_structured_output(&self) -> bool {
|
||||
matches!(self, Commands::Scan { format, .. } if format == "json" || format == "sarif")
|
||||
matches!(self, Commands::Scan { format, .. } if *format == OutputFormat::Json || *format == OutputFormat::Sarif)
|
||||
}
|
||||
}
|
||||
|
||||
/// Output format for scan results.
|
||||
#[derive(Debug, Copy, Clone, PartialEq, Eq, ValueEnum, Default)]
|
||||
pub enum OutputFormat {
|
||||
#[default]
|
||||
Console,
|
||||
Json,
|
||||
Sarif,
|
||||
}
|
||||
|
||||
impl std::fmt::Display for OutputFormat {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
match self {
|
||||
OutputFormat::Console => write!(f, "console"),
|
||||
OutputFormat::Json => write!(f, "json"),
|
||||
OutputFormat::Sarif => write!(f, "sarif"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Index mode for scan operations.
|
||||
#[derive(Debug, Copy, Clone, PartialEq, Eq, ValueEnum, Default)]
|
||||
pub enum IndexMode {
|
||||
/// Use index if available, build if missing (default)
|
||||
#[default]
|
||||
Auto,
|
||||
/// Skip indexing entirely, scan filesystem directly
|
||||
Off,
|
||||
/// Force rebuild index before scanning
|
||||
Rebuild,
|
||||
}
|
||||
|
||||
/// Analysis mode for scan operations.
|
||||
#[derive(Debug, Copy, Clone, PartialEq, Eq, ValueEnum, Default)]
|
||||
pub enum ScanMode {
|
||||
/// Run all analyses: AST patterns + CFG + taint (default)
|
||||
#[default]
|
||||
Full,
|
||||
/// Run AST pattern queries only (no CFG/taint)
|
||||
Ast,
|
||||
/// Run CFG structural analyses + taint only (no AST patterns)
|
||||
Cfg,
|
||||
/// Alias for cfg (CFG + taint analysis)
|
||||
Taint,
|
||||
}
|
||||
|
||||
#[derive(Subcommand)]
|
||||
pub enum Commands {
|
||||
/// Scan project for vulnerabilities
|
||||
|
|
@ -25,35 +70,118 @@ pub enum Commands {
|
|||
#[arg(default_value = ".")]
|
||||
path: String,
|
||||
|
||||
/// Skip using/building index, scan directly
|
||||
#[arg(long)]
|
||||
no_index: bool,
|
||||
/// Index mode: auto (default), off (no index), rebuild (force rebuild)
|
||||
#[arg(long, value_enum, default_value_t = IndexMode::Auto)]
|
||||
index: IndexMode,
|
||||
|
||||
/// Force rebuild index before scanning
|
||||
#[arg(long)]
|
||||
rebuild_index: bool,
|
||||
/// Output format
|
||||
#[arg(short, long, value_enum, default_value_t = OutputFormat::Console)]
|
||||
format: OutputFormat,
|
||||
|
||||
/// Output format (console, json, sarif)
|
||||
#[arg(short, long, default_value = "")]
|
||||
format: String,
|
||||
|
||||
/// Show only high severity issues
|
||||
/// Severity filter expression: HIGH, HIGH,MEDIUM, or >=MEDIUM
|
||||
///
|
||||
/// Filters findings AFTER all severity normalization (e.g. nonprod
|
||||
/// downgrades). Only findings matching the expression are emitted.
|
||||
/// Case-insensitive. Shell-quote expressions containing ">".
|
||||
#[arg(long)]
|
||||
high_only: bool,
|
||||
severity: Option<String>,
|
||||
|
||||
#[arg(long)]
|
||||
ast_only: bool,
|
||||
/// Analysis mode: full (default), ast, cfg, taint
|
||||
#[arg(long, value_enum, default_value_t = ScanMode::Full)]
|
||||
mode: ScanMode,
|
||||
|
||||
#[arg(long)]
|
||||
cfg_only: bool,
|
||||
|
||||
#[arg(long)]
|
||||
/// Scan all targets (alias for --mode full)
|
||||
#[arg(long, hide = true)]
|
||||
all_targets: bool,
|
||||
|
||||
/// Include findings from test/vendor/build paths at original severity
|
||||
/// (by default these are downgraded)
|
||||
/// Preserve original severity for test/vendor/build paths
|
||||
///
|
||||
/// By default, findings in non-production paths are downgraded by one
|
||||
/// severity tier. This flag preserves original severity.
|
||||
#[arg(long, alias = "include-nonprod")]
|
||||
keep_nonprod_severity: bool,
|
||||
|
||||
/// Suppress all human-readable status output
|
||||
#[arg(long)]
|
||||
include_nonprod: bool,
|
||||
quiet: bool,
|
||||
|
||||
/// Exit with code 1 if any finding meets or exceeds this severity
|
||||
///
|
||||
/// Useful for CI gating. Example: --fail-on HIGH
|
||||
#[arg(long)]
|
||||
fail_on: Option<String>,
|
||||
|
||||
/// Disable attack-surface ranking (findings are sorted by exploitability by default)
|
||||
#[arg(long)]
|
||||
no_rank: bool,
|
||||
|
||||
/// Show inline-suppressed findings (dimmed, tagged [SUPPRESSED])
|
||||
#[arg(long)]
|
||||
show_suppressed: bool,
|
||||
|
||||
/// Show all findings: disables category filtering, rollups, and LOW budgets
|
||||
#[arg(long = "all")]
|
||||
show_all: bool,
|
||||
|
||||
/// Include Quality findings (excluded by default)
|
||||
#[arg(long)]
|
||||
include_quality: bool,
|
||||
|
||||
/// Maximum total LOW findings to show
|
||||
#[arg(long, default_value_t = 20)]
|
||||
max_low: u32,
|
||||
|
||||
/// Maximum LOW findings per file
|
||||
#[arg(long, default_value_t = 1)]
|
||||
max_low_per_file: u32,
|
||||
|
||||
/// Maximum LOW findings per rule
|
||||
#[arg(long, default_value_t = 10)]
|
||||
max_low_per_rule: u32,
|
||||
|
||||
/// Number of example locations in rollup findings
|
||||
#[arg(long, default_value_t = 5)]
|
||||
rollup_examples: u32,
|
||||
|
||||
/// Show all instances for a specific rule (bypasses rollup for that rule)
|
||||
#[arg(long)]
|
||||
show_instances: Option<String>,
|
||||
|
||||
/// Minimum attack-surface score to include in output
|
||||
///
|
||||
/// Findings with a rank score below this threshold are suppressed.
|
||||
/// Requires ranking to be enabled (has no effect with --no-rank).
|
||||
/// Example: --min-score 50
|
||||
#[arg(long)]
|
||||
min_score: Option<u32>,
|
||||
|
||||
/// Minimum confidence level to include in output
|
||||
///
|
||||
/// Values: low, medium, high. Findings below this level are dropped.
|
||||
/// JSON/SARIF include all unless filtered.
|
||||
#[arg(long)]
|
||||
min_confidence: Option<String>,
|
||||
|
||||
// ── Deprecated aliases (hidden) ─────────────────────────────────
|
||||
/// Deprecated: use --index off
|
||||
#[arg(long, hide = true)]
|
||||
no_index: bool,
|
||||
|
||||
/// Deprecated: use --index rebuild
|
||||
#[arg(long, hide = true)]
|
||||
rebuild_index: bool,
|
||||
|
||||
/// Deprecated: use --severity HIGH
|
||||
#[arg(long, hide = true)]
|
||||
high_only: bool,
|
||||
|
||||
/// Deprecated: use --mode ast
|
||||
#[arg(long, hide = true)]
|
||||
ast_only: bool,
|
||||
|
||||
/// Deprecated: use --mode cfg
|
||||
#[arg(long, hide = true)]
|
||||
cfg_only: bool,
|
||||
},
|
||||
|
||||
/// Manage project indexes
|
||||
|
|
|
|||
|
|
@ -4,9 +4,9 @@ pub mod index;
|
|||
pub mod list;
|
||||
pub mod scan;
|
||||
|
||||
use crate::cli::Commands;
|
||||
use crate::cli::{Commands, IndexMode, ScanMode};
|
||||
use crate::errors::NyxResult;
|
||||
use crate::patterns::Severity;
|
||||
use crate::patterns::{Severity, SeverityFilter};
|
||||
use crate::utils::config::{AnalysisMode, Config};
|
||||
use std::path::Path;
|
||||
|
||||
|
|
@ -19,36 +19,130 @@ pub fn handle_command(
|
|||
match command {
|
||||
Commands::Scan {
|
||||
path,
|
||||
index,
|
||||
format,
|
||||
severity,
|
||||
mode,
|
||||
all_targets,
|
||||
keep_nonprod_severity,
|
||||
quiet,
|
||||
fail_on,
|
||||
no_rank,
|
||||
show_suppressed,
|
||||
show_all,
|
||||
include_quality,
|
||||
max_low,
|
||||
max_low_per_file,
|
||||
max_low_per_rule,
|
||||
rollup_examples,
|
||||
show_instances,
|
||||
min_score,
|
||||
min_confidence,
|
||||
// Deprecated aliases
|
||||
no_index,
|
||||
rebuild_index,
|
||||
format,
|
||||
high_only,
|
||||
ast_only,
|
||||
cfg_only,
|
||||
all_targets,
|
||||
include_nonprod,
|
||||
} => {
|
||||
if high_only {
|
||||
config.scanner.min_severity = Severity::High
|
||||
// ── Resolve deprecated aliases ──────────────────────────────
|
||||
|
||||
// Index mode: explicit --index wins, then deprecated flags
|
||||
let effective_index = if no_index {
|
||||
IndexMode::Off
|
||||
} else if rebuild_index {
|
||||
IndexMode::Rebuild
|
||||
} else {
|
||||
index
|
||||
};
|
||||
|
||||
if ast_only {
|
||||
config.scanner.mode = AnalysisMode::Ast
|
||||
// Analysis mode: explicit --mode wins, then deprecated flags
|
||||
let effective_mode = if ast_only {
|
||||
ScanMode::Ast
|
||||
} else if cfg_only {
|
||||
ScanMode::Cfg
|
||||
} else if all_targets {
|
||||
ScanMode::Full
|
||||
} else {
|
||||
mode
|
||||
};
|
||||
|
||||
if cfg_only {
|
||||
config.scanner.mode = AnalysisMode::Taint
|
||||
// Severity filter: explicit --severity wins, then --high-only
|
||||
let severity_filter = if let Some(ref expr) = severity {
|
||||
Some(SeverityFilter::parse(expr).map_err(|e| {
|
||||
crate::errors::NyxError::Msg(format!("invalid --severity expression: {e}"))
|
||||
})?)
|
||||
} else if high_only {
|
||||
Some(SeverityFilter::parse("HIGH").unwrap())
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
if all_targets {
|
||||
config.scanner.mode = AnalysisMode::Full
|
||||
// Fail-on threshold
|
||||
let fail_on_sev = if let Some(ref expr) = fail_on {
|
||||
Some(expr.trim().parse::<Severity>().map_err(|e| {
|
||||
crate::errors::NyxError::Msg(format!("invalid --fail-on value: {e}"))
|
||||
})?)
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
if include_nonprod {
|
||||
config.scanner.include_nonprod = true
|
||||
};
|
||||
// ── Apply to config ─────────────────────────────────────────
|
||||
|
||||
scan::handle(&path, no_index, rebuild_index, format, database_dir, config)?;
|
||||
match effective_mode {
|
||||
ScanMode::Full => config.scanner.mode = AnalysisMode::Full,
|
||||
ScanMode::Ast => config.scanner.mode = AnalysisMode::Ast,
|
||||
ScanMode::Cfg | ScanMode::Taint => config.scanner.mode = AnalysisMode::Taint,
|
||||
}
|
||||
|
||||
if keep_nonprod_severity {
|
||||
config.scanner.include_nonprod = true;
|
||||
}
|
||||
|
||||
if quiet {
|
||||
config.output.quiet = true;
|
||||
}
|
||||
|
||||
if no_rank {
|
||||
config.output.attack_surface_ranking = false;
|
||||
}
|
||||
|
||||
// Min-score: CLI wins, then config
|
||||
if let Some(s) = min_score {
|
||||
config.output.min_score = Some(s);
|
||||
}
|
||||
|
||||
// Min-confidence: CLI wins, then config
|
||||
if let Some(ref expr) = min_confidence {
|
||||
config.output.min_confidence =
|
||||
Some(expr.parse::<crate::evidence::Confidence>().map_err(|e| {
|
||||
crate::errors::NyxError::Msg(format!("invalid --min-confidence value: {e}"))
|
||||
})?);
|
||||
}
|
||||
|
||||
if show_all {
|
||||
config.output.show_all = true;
|
||||
}
|
||||
if include_quality {
|
||||
config.output.include_quality = true;
|
||||
}
|
||||
// CLI values override config defaults (clap provides defaults)
|
||||
config.output.max_low = max_low;
|
||||
config.output.max_low_per_file = max_low_per_file;
|
||||
config.output.max_low_per_rule = max_low_per_rule;
|
||||
config.output.rollup_examples = rollup_examples;
|
||||
|
||||
scan::handle(
|
||||
&path,
|
||||
effective_index,
|
||||
format,
|
||||
severity_filter,
|
||||
fail_on_sev,
|
||||
show_suppressed,
|
||||
show_instances.as_deref(),
|
||||
database_dir,
|
||||
config,
|
||||
)?;
|
||||
}
|
||||
Commands::Index { action } => {
|
||||
index::handle(action, database_dir, config)?;
|
||||
|
|
|
|||
1116
src/commands/scan.rs
1116
src/commands/scan.rs
File diff suppressed because it is too large
Load diff
|
|
@ -272,6 +272,18 @@ pub mod index {
|
|||
line: row.get::<_, i64>(2)? as usize,
|
||||
col: row.get::<_, i64>(3)? as usize,
|
||||
severity: Severity::from_str(&sev_str).unwrap(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
})
|
||||
})?;
|
||||
|
||||
|
|
|
|||
396
src/evidence.rs
Normal file
396
src/evidence.rs
Normal file
|
|
@ -0,0 +1,396 @@
|
|||
//! Structured evidence and confidence types for scan diagnostics.
|
||||
//!
|
||||
//! These types capture the provenance of findings (source locations,
|
||||
//! sanitizer/guard info, state-machine transitions) in a structured form
|
||||
//! that can be serialized to JSON and consumed by ranking, filtering,
|
||||
//! and downstream tooling.
|
||||
|
||||
use crate::commands::scan::Diag;
|
||||
use crate::patterns::Severity;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use std::fmt;
|
||||
use std::str::FromStr;
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Confidence
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Confidence level for a diagnostic finding.
|
||||
///
|
||||
/// Ordered Low < Medium < High so that `>=` comparisons work naturally
|
||||
/// for filtering (e.g. `--min-confidence medium` keeps Medium and High).
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize)]
|
||||
pub enum Confidence {
|
||||
Low,
|
||||
Medium,
|
||||
High,
|
||||
}
|
||||
|
||||
impl fmt::Display for Confidence {
|
||||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
match self {
|
||||
Self::Low => write!(f, "Low"),
|
||||
Self::Medium => write!(f, "Medium"),
|
||||
Self::High => write!(f, "High"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl FromStr for Confidence {
|
||||
type Err = String;
|
||||
|
||||
fn from_str(s: &str) -> Result<Self, Self::Err> {
|
||||
match s.to_ascii_lowercase().as_str() {
|
||||
"low" => Ok(Self::Low),
|
||||
"medium" | "med" => Ok(Self::Medium),
|
||||
"high" => Ok(Self::High),
|
||||
_ => Err(format!(
|
||||
"unknown confidence level: {s:?} (expected low, medium, high)"
|
||||
)),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Evidence
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Structured evidence for a diagnostic finding.
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct Evidence {
|
||||
/// Where tainted data originated.
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub source: Option<SpanEvidence>,
|
||||
|
||||
/// Where the dangerous operation happens.
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub sink: Option<SpanEvidence>,
|
||||
|
||||
/// Validation guards protecting this path.
|
||||
#[serde(skip_serializing_if = "Vec::is_empty")]
|
||||
pub guards: Vec<SpanEvidence>,
|
||||
|
||||
/// Sanitizers applied to this path.
|
||||
#[serde(skip_serializing_if = "Vec::is_empty")]
|
||||
pub sanitizers: Vec<SpanEvidence>,
|
||||
|
||||
/// State-machine evidence (resource lifecycle / auth).
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub state: Option<StateEvidence>,
|
||||
|
||||
/// Free-form notes for ranking and display.
|
||||
#[serde(skip_serializing_if = "Vec::is_empty")]
|
||||
pub notes: Vec<String>,
|
||||
}
|
||||
|
||||
impl Evidence {
|
||||
/// Returns `true` if the evidence contains no useful data.
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.source.is_none()
|
||||
&& self.sink.is_none()
|
||||
&& self.guards.is_empty()
|
||||
&& self.sanitizers.is_empty()
|
||||
&& self.state.is_none()
|
||||
&& self.notes.is_empty()
|
||||
}
|
||||
}
|
||||
|
||||
/// A source-location evidence span.
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct SpanEvidence {
|
||||
pub path: String,
|
||||
pub line: u32,
|
||||
pub col: u32,
|
||||
/// One of: `"source"`, `"sink"`, `"guard"`, `"sanitizer"`.
|
||||
pub kind: String,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub snippet: Option<String>,
|
||||
}
|
||||
|
||||
/// Evidence from a state-machine analysis (resource lifecycle / auth).
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct StateEvidence {
|
||||
/// The state machine: `"resource"` or `"auth"`.
|
||||
pub machine: String,
|
||||
/// Variable name if available.
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub subject: Option<String>,
|
||||
/// State before the event.
|
||||
pub from_state: String,
|
||||
/// State after the event.
|
||||
pub to_state: String,
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// compute_confidence
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Derive a confidence level for `diag` based on its rule ID, severity,
|
||||
/// evidence, and analysis kind.
|
||||
///
|
||||
/// This is called as a post-pass after all findings are collected; findings
|
||||
/// that already have a confidence set (e.g. from CFG analysis) are preserved.
|
||||
pub fn compute_confidence(diag: &Diag) -> Confidence {
|
||||
// Degraded analysis caps confidence
|
||||
if let Some(ev) = &diag.evidence
|
||||
&& ev.notes.iter().any(|n| n.starts_with("degraded:"))
|
||||
{
|
||||
return Confidence::Low;
|
||||
}
|
||||
|
||||
let id = &diag.id;
|
||||
|
||||
if id.starts_with("taint-") {
|
||||
if let Some(ev) = &diag.evidence
|
||||
&& ev.notes.iter().any(|n| n == "path_validated")
|
||||
{
|
||||
return Confidence::Medium;
|
||||
}
|
||||
// source+sink present = High
|
||||
if let Some(ev) = &diag.evidence
|
||||
&& ev.source.is_some()
|
||||
&& ev.sink.is_some()
|
||||
{
|
||||
return Confidence::High;
|
||||
}
|
||||
return Confidence::High; // default for taint
|
||||
}
|
||||
|
||||
if id.starts_with("state-") {
|
||||
return match id.as_str() {
|
||||
"state-use-after-close" => Confidence::High,
|
||||
"state-double-close" => Confidence::High,
|
||||
"state-unauthed-access" => Confidence::High,
|
||||
"state-resource-leak" => Confidence::Medium,
|
||||
"state-resource-leak-possible" => Confidence::Low,
|
||||
_ => Confidence::Medium,
|
||||
};
|
||||
}
|
||||
|
||||
if id.starts_with("cfg-") {
|
||||
// If CFG conversion already set confidence, preserve it
|
||||
return diag.confidence.unwrap_or(Confidence::Medium);
|
||||
}
|
||||
|
||||
// AST patterns: High severity → Medium confidence, else Low
|
||||
if diag.severity == Severity::High {
|
||||
Confidence::Medium
|
||||
} else {
|
||||
Confidence::Low
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Tests
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn make_diag(id: &str, severity: Severity) -> Diag {
|
||||
Diag {
|
||||
path: "test.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
severity,
|
||||
id: id.into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn compute_confidence_taint_high() {
|
||||
let mut d = make_diag("taint-unsanitised-flow (source 1:1)", Severity::High);
|
||||
d.evidence = Some(Evidence {
|
||||
source: Some(SpanEvidence {
|
||||
path: "test.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
kind: "source".into(),
|
||||
snippet: Some("env::var(\"X\")".into()),
|
||||
}),
|
||||
sink: Some(SpanEvidence {
|
||||
path: "test.rs".into(),
|
||||
line: 10,
|
||||
col: 5,
|
||||
kind: "sink".into(),
|
||||
snippet: Some("exec()".into()),
|
||||
}),
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec![],
|
||||
});
|
||||
assert_eq!(compute_confidence(&d), Confidence::High);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn compute_confidence_taint_validated() {
|
||||
let mut d = make_diag("taint-unsanitised-flow (source 1:1)", Severity::High);
|
||||
d.evidence = Some(Evidence {
|
||||
source: Some(SpanEvidence {
|
||||
path: "test.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
kind: "source".into(),
|
||||
snippet: None,
|
||||
}),
|
||||
sink: Some(SpanEvidence {
|
||||
path: "test.rs".into(),
|
||||
line: 10,
|
||||
col: 5,
|
||||
kind: "sink".into(),
|
||||
snippet: None,
|
||||
}),
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec!["path_validated".into()],
|
||||
});
|
||||
assert_eq!(compute_confidence(&d), Confidence::Medium);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn compute_confidence_degraded_caps_to_low() {
|
||||
let mut d = make_diag("taint-unsanitised-flow (source 1:1)", Severity::High);
|
||||
d.evidence = Some(Evidence {
|
||||
source: None,
|
||||
sink: None,
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec!["degraded:budget_exceeded".into()],
|
||||
});
|
||||
assert_eq!(compute_confidence(&d), Confidence::Low);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn compute_confidence_state_rules() {
|
||||
assert_eq!(
|
||||
compute_confidence(&make_diag("state-use-after-close", Severity::High)),
|
||||
Confidence::High,
|
||||
);
|
||||
assert_eq!(
|
||||
compute_confidence(&make_diag("state-double-close", Severity::Medium)),
|
||||
Confidence::High,
|
||||
);
|
||||
assert_eq!(
|
||||
compute_confidence(&make_diag("state-unauthed-access", Severity::High)),
|
||||
Confidence::High,
|
||||
);
|
||||
assert_eq!(
|
||||
compute_confidence(&make_diag("state-resource-leak", Severity::Medium)),
|
||||
Confidence::Medium,
|
||||
);
|
||||
assert_eq!(
|
||||
compute_confidence(&make_diag("state-resource-leak-possible", Severity::Low)),
|
||||
Confidence::Low,
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn compute_confidence_cfg_preserves_existing() {
|
||||
let mut d = make_diag("cfg-unguarded-sink", Severity::High);
|
||||
d.confidence = Some(Confidence::Low);
|
||||
assert_eq!(compute_confidence(&d), Confidence::Low);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn compute_confidence_ast_low() {
|
||||
let d = make_diag("rs.code_exec.eval", Severity::Medium);
|
||||
assert_eq!(compute_confidence(&d), Confidence::Low);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn compute_confidence_ast_high_severity_medium() {
|
||||
let d = make_diag("rs.code_exec.eval", Severity::High);
|
||||
assert_eq!(compute_confidence(&d), Confidence::Medium);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn evidence_is_empty() {
|
||||
let ev = Evidence {
|
||||
source: None,
|
||||
sink: None,
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec![],
|
||||
};
|
||||
assert!(ev.is_empty());
|
||||
|
||||
let ev2 = Evidence {
|
||||
source: Some(SpanEvidence {
|
||||
path: "x.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
kind: "source".into(),
|
||||
snippet: None,
|
||||
}),
|
||||
sink: None,
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec![],
|
||||
};
|
||||
assert!(!ev2.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn confidence_ord() {
|
||||
assert!(Confidence::Low < Confidence::Medium);
|
||||
assert!(Confidence::Medium < Confidence::High);
|
||||
assert!(Confidence::Low < Confidence::High);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn confidence_display_and_parse() {
|
||||
assert_eq!(Confidence::Low.to_string(), "Low");
|
||||
assert_eq!(Confidence::Medium.to_string(), "Medium");
|
||||
assert_eq!(Confidence::High.to_string(), "High");
|
||||
|
||||
assert_eq!("low".parse::<Confidence>().unwrap(), Confidence::Low);
|
||||
assert_eq!("MEDIUM".parse::<Confidence>().unwrap(), Confidence::Medium);
|
||||
assert_eq!("High".parse::<Confidence>().unwrap(), Confidence::High);
|
||||
assert!("invalid".parse::<Confidence>().is_err());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn compute_confidence_does_not_override_preset() {
|
||||
// AST patterns set confidence directly; compute_confidence must not overwrite.
|
||||
let mut d = make_diag("rs.quality.expect", Severity::Low);
|
||||
d.confidence = Some(Confidence::High);
|
||||
// The post-pass only runs when confidence is None, but verify compute_confidence
|
||||
// itself would return something different (Low for AST + Low severity), proving
|
||||
// the guard in scan.rs is necessary.
|
||||
assert_eq!(compute_confidence(&d), Confidence::Low);
|
||||
// The actual guard: confidence is already Some, so scan.rs skips compute_confidence.
|
||||
assert_eq!(d.confidence, Some(Confidence::High));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_omits_none_fields() {
|
||||
let ev = Evidence {
|
||||
source: None,
|
||||
sink: None,
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec![],
|
||||
};
|
||||
let json = serde_json::to_string(&ev).unwrap();
|
||||
assert_eq!(json, "{}");
|
||||
}
|
||||
}
|
||||
984
src/fmt.rs
Normal file
984
src/fmt.rs
Normal file
|
|
@ -0,0 +1,984 @@
|
|||
//! Console output formatting for scan diagnostics.
|
||||
//!
|
||||
//! Produces professional, security-tool-grade aligned output with a clear
|
||||
//! severity hierarchy, normalised taint flow rendering, and stable wrapping.
|
||||
|
||||
use crate::commands::scan::{Diag, SuppressionStats};
|
||||
use crate::patterns::Severity;
|
||||
use console::style;
|
||||
use std::collections::BTreeMap;
|
||||
|
||||
/// Default maximum line width when terminal size is unknown.
|
||||
const DEFAULT_WIDTH: usize = 100;
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Public API
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Render all diagnostics as grouped, formatted console output with a summary.
|
||||
pub fn render_console(
|
||||
diags: &[Diag],
|
||||
project_name: &str,
|
||||
suppression_stats: Option<&SuppressionStats>,
|
||||
) -> String {
|
||||
let width = terminal_width();
|
||||
let mut out = String::new();
|
||||
|
||||
let mut grouped: BTreeMap<&str, Vec<&Diag>> = BTreeMap::new();
|
||||
for d in diags {
|
||||
grouped.entry(&d.path).or_default().push(d);
|
||||
}
|
||||
|
||||
for (path, issues) in &grouped {
|
||||
// File path header — dim blue, never brighter than severity.
|
||||
out.push_str(&format!("{}\n", style(path).blue().dim().underlined()));
|
||||
for d in issues {
|
||||
out.push_str(&render_diag(d, width));
|
||||
out.push('\n'); // blank line between findings
|
||||
}
|
||||
}
|
||||
|
||||
let suppressed_count = diags.iter().filter(|d| d.suppressed).count();
|
||||
let active_count = diags.len() - suppressed_count;
|
||||
|
||||
if suppressed_count > 0 {
|
||||
out.push_str(&format!(
|
||||
"{} '{}' generated {} {} ({} suppressed).\n\n",
|
||||
style("warning").yellow().bold(),
|
||||
style(project_name).white().bold(),
|
||||
style(active_count).bold(),
|
||||
if active_count == 1 { "issue" } else { "issues" },
|
||||
suppressed_count,
|
||||
));
|
||||
} else {
|
||||
out.push_str(&format!(
|
||||
"{} '{}' generated {} {}.\n\n",
|
||||
style("warning").yellow().bold(),
|
||||
style(project_name).white().bold(),
|
||||
style(diags.len()).bold(),
|
||||
if diags.len() == 1 { "issue" } else { "issues" },
|
||||
));
|
||||
}
|
||||
|
||||
// ── Suppression footer ─────────────────────────────────────────────
|
||||
if let Some(stats) = suppression_stats {
|
||||
let total = stats.total_suppressed();
|
||||
if total > 0 {
|
||||
out.push_str(&format!(
|
||||
"{}\n",
|
||||
style(format!("Suppressed {total} LOW/Quality findings.")).dim()
|
||||
));
|
||||
out.push_str(&format!("{}\n", style("Active filters:").dim()));
|
||||
if !stats.include_quality {
|
||||
out.push_str(&format!(
|
||||
" {} {}\n",
|
||||
style("include_quality =").dim(),
|
||||
style("false").dim()
|
||||
));
|
||||
}
|
||||
out.push_str(&format!(
|
||||
" {} {}\n",
|
||||
style("max_low =").dim(),
|
||||
style(stats.max_low).dim()
|
||||
));
|
||||
out.push_str(&format!(
|
||||
" {} {}\n",
|
||||
style("max_low_per_file =").dim(),
|
||||
style(stats.max_low_per_file).dim()
|
||||
));
|
||||
out.push_str(&format!(
|
||||
" {} {}\n",
|
||||
style("max_low_per_rule =").dim(),
|
||||
style(stats.max_low_per_rule).dim()
|
||||
));
|
||||
out.push_str(&format!(
|
||||
"\n{}\n",
|
||||
style("Use --include-quality, --max-low, or --all to adjust.").dim()
|
||||
));
|
||||
}
|
||||
}
|
||||
|
||||
out
|
||||
}
|
||||
|
||||
/// Normalise a code snippet for display: collapse whitespace, join lines,
|
||||
/// clean up method-chain spacing, trim, and truncate.
|
||||
pub fn normalize_snippet(s: &str) -> String {
|
||||
// Strip newlines/carriage returns with no replacement, then collapse
|
||||
// runs of spaces into a single space.
|
||||
let no_newlines: String = s.chars().filter(|c| *c != '\n' && *c != '\r').collect();
|
||||
let collapsed: String = no_newlines.split_whitespace().collect::<Vec<_>>().join(" ");
|
||||
// Clean up `) .foo(` → `).foo(` and similar spacing around dots in chains.
|
||||
let cleaned = collapse_chain_spacing(&collapsed);
|
||||
let trimmed = cleaned.trim();
|
||||
if trimmed.len() > 120 {
|
||||
format!("{}…", &trimmed[..120])
|
||||
} else {
|
||||
trimmed.to_string()
|
||||
}
|
||||
}
|
||||
|
||||
/// Truncate method chains: keep constructor + first balanced `(...)`, then `…`.
|
||||
///
|
||||
/// E.g. `Command::new("sh").arg("-c").arg(&cmd)` → `Command::new("sh")…`
|
||||
#[allow(dead_code)] // public API, used by consumers
|
||||
pub fn shorten_callee(s: &str) -> String {
|
||||
let s = s.trim();
|
||||
if s.is_empty() {
|
||||
return String::new();
|
||||
}
|
||||
|
||||
let Some(open) = s.find('(') else {
|
||||
return s.to_string();
|
||||
};
|
||||
|
||||
let mut depth = 0u32;
|
||||
let mut close = None;
|
||||
for (i, ch) in s[open..].char_indices() {
|
||||
match ch {
|
||||
'(' => depth += 1,
|
||||
')' => {
|
||||
depth -= 1;
|
||||
if depth == 0 {
|
||||
close = Some(open + i);
|
||||
break;
|
||||
}
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
|
||||
let Some(close_idx) = close else {
|
||||
return s.to_string();
|
||||
};
|
||||
|
||||
let end = close_idx + 1;
|
||||
if end < s.len() {
|
||||
format!("{}…", &s[..end])
|
||||
} else {
|
||||
s.to_string()
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Internal rendering
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Indentation for body/evidence lines (spaces).
|
||||
const BODY_INDENT: usize = 6;
|
||||
|
||||
/// Render a single diagnostic block.
|
||||
fn render_diag(d: &Diag, width: usize) -> String {
|
||||
let mut out = String::new();
|
||||
|
||||
// ── Header line ──────────────────────────────────────────────────────
|
||||
// Format: ` 98:5 ⚠ [MEDIUM] taint-unsanitised-flow (Score: 87, Confidence: Medium)`
|
||||
let loc = format!("{}:{}", d.line, d.col);
|
||||
let sev = if d.suppressed {
|
||||
format!("{} {}", style("○").dim(), style("[SUPPRESSED]").dim(),)
|
||||
} else {
|
||||
severity_tag(d.severity)
|
||||
};
|
||||
let meta_suffix = match (d.rank_score, d.confidence) {
|
||||
(Some(s), Some(c)) => format!(
|
||||
" {}",
|
||||
style(format!("(Score: {}, Confidence: {c})", s as u32)).dim()
|
||||
),
|
||||
(Some(s), None) => format!(" {}", style(format!("(Score: {})", s as u32)).dim()),
|
||||
(None, Some(c)) => format!(" {}", style(format!("(Confidence: {c})")).dim()),
|
||||
(None, None) => String::new(),
|
||||
};
|
||||
out.push_str(&format!(
|
||||
" {} {} {}{}\n",
|
||||
style(&loc).dim(),
|
||||
sev,
|
||||
style(&d.id).dim(),
|
||||
meta_suffix,
|
||||
));
|
||||
|
||||
// ── Rollup body ─────────────────────────────────────────────────────
|
||||
let indent_str = " ".repeat(BODY_INDENT);
|
||||
if let Some(ref rollup) = d.rollup {
|
||||
out.push_str(&format!(
|
||||
"{indent_str}{} ({} occurrences)\n",
|
||||
style(&d.id).dim(),
|
||||
rollup.count
|
||||
));
|
||||
if !rollup.occurrences.is_empty() {
|
||||
let examples: Vec<String> = rollup
|
||||
.occurrences
|
||||
.iter()
|
||||
.map(|loc| format!("{}:{}", loc.line, loc.col))
|
||||
.collect();
|
||||
out.push_str(&format!(
|
||||
"{indent_str}{} {}\n",
|
||||
style("Examples:").dim(),
|
||||
style(examples.join(", ")).dim()
|
||||
));
|
||||
}
|
||||
out.push_str(&format!(
|
||||
"{indent_str}{}\n",
|
||||
style(format!("Run: nyx scan --show-instances {}", d.id)).dim()
|
||||
));
|
||||
return out;
|
||||
}
|
||||
|
||||
// ── Message body ─────────────────────────────────────────────────────
|
||||
if let Some(msg) = &d.message {
|
||||
let capitalized = capitalize_first(msg);
|
||||
let wrapped = wrap_text(&capitalized, width, BODY_INDENT);
|
||||
out.push_str(&format!("{indent_str}{wrapped}\n"));
|
||||
}
|
||||
|
||||
// ── Evidence labels (Source, Sink, Path guard) ───────────────────────
|
||||
if !d.labels.is_empty() {
|
||||
out.push('\n');
|
||||
let max_label = d.labels.iter().map(|(k, _)| k.len()).max().unwrap_or(0);
|
||||
let key_width = max_label + 1; // +1 for ':'
|
||||
for (label, value) in &d.labels {
|
||||
let key_str = format!("{label}:");
|
||||
let value_indent = BODY_INDENT + key_width + 1; // key + space
|
||||
let wrapped_val = wrap_text(value, width, value_indent);
|
||||
if label == "Path guard" {
|
||||
out.push_str(&format!(
|
||||
"{indent_str}{:<kw$} {}\n",
|
||||
style(&key_str).dim(),
|
||||
style(&wrapped_val).cyan(),
|
||||
kw = key_width,
|
||||
));
|
||||
} else {
|
||||
out.push_str(&format!(
|
||||
"{indent_str}{:<kw$} {}\n",
|
||||
style(&key_str).dim(),
|
||||
wrapped_val,
|
||||
kw = key_width,
|
||||
));
|
||||
}
|
||||
}
|
||||
} else if let Some(guard) = &d.guard_kind {
|
||||
out.push_str(&format!(
|
||||
"{indent_str}{} {}\n",
|
||||
style("Path guard:").dim(),
|
||||
style(guard).cyan(),
|
||||
));
|
||||
}
|
||||
|
||||
out
|
||||
}
|
||||
|
||||
/// Colored severity tag with icon. The tag is the visual anchor of each finding.
|
||||
///
|
||||
/// - HIGH: bold red
|
||||
/// - MEDIUM: bold 208 (orange) — distinct from yellow
|
||||
/// - LOW: dim 67 (muted blue-gray)
|
||||
fn severity_tag(sev: Severity) -> String {
|
||||
match sev {
|
||||
Severity::High => format!(
|
||||
"{} [{}]",
|
||||
style("✖").red().bold(),
|
||||
style("HIGH").red().bold(),
|
||||
),
|
||||
Severity::Medium => format!(
|
||||
"{} [{}]",
|
||||
style("⚠").color256(208).bold(),
|
||||
style("MEDIUM").color256(208).bold(),
|
||||
),
|
||||
Severity::Low => format!(
|
||||
"{} [{}]",
|
||||
style("●").color256(67),
|
||||
style("LOW").color256(67),
|
||||
),
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Text utilities
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Collapse spacing artefacts in method chains.
|
||||
///
|
||||
/// - `") .foo("` → `").foo("` (space between `)` and `.`)
|
||||
/// - Multiple spaces → single space
|
||||
fn collapse_chain_spacing(s: &str) -> String {
|
||||
let mut out = String::with_capacity(s.len());
|
||||
let chars: Vec<char> = s.chars().collect();
|
||||
let len = chars.len();
|
||||
let mut i = 0;
|
||||
|
||||
while i < len {
|
||||
// Pattern: `)` followed by whitespace then `.`
|
||||
if chars[i] == ')' {
|
||||
out.push(')');
|
||||
i += 1;
|
||||
// Skip whitespace between `)` and `.`
|
||||
let ws_start = i;
|
||||
while i < len && chars[i] == ' ' {
|
||||
i += 1;
|
||||
}
|
||||
if i < len && chars[i] == '.' {
|
||||
// Collapse: emit `.` directly after `)`
|
||||
continue;
|
||||
} else {
|
||||
// Not a chain continuation — emit the whitespace we skipped
|
||||
for c in &chars[ws_start..i] {
|
||||
out.push(*c);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
out.push(chars[i]);
|
||||
i += 1;
|
||||
}
|
||||
}
|
||||
out
|
||||
}
|
||||
|
||||
/// Word-wrap text to fit within `max_width`, with continuation lines indented
|
||||
/// to `indent` spaces. The first line is NOT indented (caller handles that).
|
||||
fn wrap_text(text: &str, max_width: usize, indent: usize) -> String {
|
||||
let available_first = max_width.saturating_sub(indent);
|
||||
let available_cont = max_width.saturating_sub(indent);
|
||||
if available_first == 0 || text.len() <= available_first {
|
||||
return text.to_string();
|
||||
}
|
||||
|
||||
let indent_str = " ".repeat(indent);
|
||||
let mut result = String::new();
|
||||
let mut line_len = 0usize;
|
||||
let mut first_line = true;
|
||||
|
||||
for word in text.split_whitespace() {
|
||||
let wlen = word.len();
|
||||
let avail = if first_line {
|
||||
available_first
|
||||
} else {
|
||||
available_cont
|
||||
};
|
||||
|
||||
if line_len == 0 {
|
||||
result.push_str(word);
|
||||
line_len = wlen;
|
||||
} else if line_len + 1 + wlen > avail {
|
||||
result.push('\n');
|
||||
result.push_str(&indent_str);
|
||||
result.push_str(word);
|
||||
line_len = wlen;
|
||||
first_line = false;
|
||||
} else {
|
||||
result.push(' ');
|
||||
result.push_str(word);
|
||||
line_len += 1 + wlen;
|
||||
}
|
||||
}
|
||||
|
||||
result
|
||||
}
|
||||
|
||||
/// Get terminal width, falling back to DEFAULT_WIDTH.
|
||||
fn terminal_width() -> usize {
|
||||
terminal_size::terminal_size()
|
||||
.map(|(w, _)| w.0 as usize)
|
||||
.unwrap_or(DEFAULT_WIDTH)
|
||||
}
|
||||
|
||||
/// Capitalise the first character of a string.
|
||||
fn capitalize_first(s: &str) -> String {
|
||||
let mut chars = s.chars();
|
||||
match chars.next() {
|
||||
None => String::new(),
|
||||
Some(c) => {
|
||||
let mut out = String::with_capacity(s.len());
|
||||
for upper in c.to_uppercase() {
|
||||
out.push(upper);
|
||||
}
|
||||
out.push_str(chars.as_str());
|
||||
out
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Tests
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
// ── Helpers ──────────────────────────────────────────────────────────
|
||||
|
||||
/// Strip ANSI escape codes for testing visible content.
|
||||
fn strip_ansi(s: &str) -> String {
|
||||
let mut result = String::new();
|
||||
let mut in_escape = false;
|
||||
for ch in s.chars() {
|
||||
if ch == '\x1b' {
|
||||
in_escape = true;
|
||||
} else if in_escape {
|
||||
if ch == 'm' {
|
||||
in_escape = false;
|
||||
}
|
||||
} else {
|
||||
result.push(ch);
|
||||
}
|
||||
}
|
||||
result
|
||||
}
|
||||
|
||||
// ── normalize_snippet ────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn normalize_snippet_strips_newlines_no_space() {
|
||||
// Newlines are removed with no whitespace inserted in their place.
|
||||
assert_eq!(normalize_snippet("foo\nbar\rbaz"), "foobarbaz");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn normalize_snippet_collapses_whitespace() {
|
||||
assert_eq!(
|
||||
normalize_snippet("Command::new(\"tar\") .arg(\"-czf\")"),
|
||||
"Command::new(\"tar\").arg(\"-czf\")"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn normalize_snippet_trims() {
|
||||
assert_eq!(normalize_snippet(" hello "), "hello");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn normalize_snippet_truncates_at_120() {
|
||||
let long = "a".repeat(200);
|
||||
let result = normalize_snippet(&long);
|
||||
// 120 chars + '…' (3 bytes UTF-8)
|
||||
assert!(result.len() > 120);
|
||||
assert!(result.ends_with('…'));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn normalize_snippet_short_unchanged() {
|
||||
assert_eq!(normalize_snippet("short"), "short");
|
||||
}
|
||||
|
||||
// ── collapse_chain_spacing ───────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn collapse_chain_removes_space_before_dot() {
|
||||
assert_eq!(
|
||||
collapse_chain_spacing("foo() .bar() .baz()"),
|
||||
"foo().bar().baz()"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn collapse_chain_preserves_non_chain_spacing() {
|
||||
assert_eq!(collapse_chain_spacing("foo() + bar()"), "foo() + bar()");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn collapse_chain_multiple_spaces() {
|
||||
assert_eq!(
|
||||
collapse_chain_spacing("cmd() .arg(\"-c\")"),
|
||||
"cmd().arg(\"-c\")"
|
||||
);
|
||||
}
|
||||
|
||||
// ── shorten_callee ───────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn shorten_callee_truncates_chain() {
|
||||
assert_eq!(
|
||||
shorten_callee("Command::new(\"sh\").arg(\"-c\").arg(&cmd)"),
|
||||
"Command::new(\"sh\")…"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn shorten_callee_no_chain_unchanged() {
|
||||
assert_eq!(shorten_callee("env::var(\"HOME\")"), "env::var(\"HOME\")");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn shorten_callee_nested_parens() {
|
||||
assert_eq!(shorten_callee("foo(bar(1, 2)).baz()"), "foo(bar(1, 2))…");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn shorten_callee_no_parens() {
|
||||
assert_eq!(shorten_callee("simple_name"), "simple_name");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn shorten_callee_empty() {
|
||||
assert_eq!(shorten_callee(""), "");
|
||||
}
|
||||
|
||||
// ── wrap_text ────────────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn wrap_short_text_unchanged() {
|
||||
assert_eq!(wrap_text("short text", 80, 4), "short text");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn wrap_breaks_at_boundary() {
|
||||
let text = "word1 word2 word3 word4 word5";
|
||||
let result = wrap_text(text, 20, 4);
|
||||
assert!(result.contains('\n'));
|
||||
for line in result.lines().skip(1) {
|
||||
assert!(line.starts_with(" "));
|
||||
}
|
||||
}
|
||||
|
||||
// ── severity_tag ─────────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn severity_tags_contain_level_name() {
|
||||
let h = strip_ansi(&severity_tag(Severity::High));
|
||||
let m = strip_ansi(&severity_tag(Severity::Medium));
|
||||
let l = strip_ansi(&severity_tag(Severity::Low));
|
||||
assert!(h.contains("HIGH"), "got: {h}");
|
||||
assert!(m.contains("MEDIUM"), "got: {m}");
|
||||
assert!(l.contains("LOW"), "got: {l}");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn severity_tags_have_icons() {
|
||||
let h = strip_ansi(&severity_tag(Severity::High));
|
||||
let m = strip_ansi(&severity_tag(Severity::Medium));
|
||||
let l = strip_ansi(&severity_tag(Severity::Low));
|
||||
assert!(h.contains('✖'), "HIGH should have ✖");
|
||||
assert!(m.contains('⚠'), "MEDIUM should have ⚠");
|
||||
assert!(l.contains('●'), "LOW should have ●");
|
||||
}
|
||||
|
||||
// ── render_console ───────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn render_console_groups_by_file() {
|
||||
let diags = vec![
|
||||
Diag {
|
||||
path: "src/a.rs".into(),
|
||||
line: 10,
|
||||
col: 5,
|
||||
severity: Severity::High,
|
||||
id: "test-rule".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some("test message".into()),
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
},
|
||||
Diag {
|
||||
path: "src/b.rs".into(),
|
||||
line: 20,
|
||||
col: 1,
|
||||
severity: Severity::Low,
|
||||
id: "another-rule".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
},
|
||||
];
|
||||
let output = render_console(&diags, "test-project", None);
|
||||
let stripped = strip_ansi(&output);
|
||||
assert!(stripped.contains("src/a.rs"));
|
||||
assert!(stripped.contains("src/b.rs"));
|
||||
assert!(stripped.contains("2 issues"));
|
||||
assert!(stripped.contains("test-project"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn render_console_evidence_displayed() {
|
||||
let diags = vec![Diag {
|
||||
path: "src/main.rs".into(),
|
||||
line: 42,
|
||||
col: 5,
|
||||
severity: Severity::High,
|
||||
id: "taint-unsanitised-flow (source 12:3)".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some("unsanitised input".into()),
|
||||
labels: vec![
|
||||
("Source".into(), "env::var(\"HOME\") at 12:3".into()),
|
||||
("Sink".into(), "Command::new(\"sh\")".into()),
|
||||
],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
}];
|
||||
let output = render_console(&diags, "proj", None);
|
||||
let stripped = strip_ansi(&output);
|
||||
assert!(stripped.contains("Source:"), "should contain Source label");
|
||||
assert!(stripped.contains("Sink:"), "should contain Sink label");
|
||||
// No backticks in output
|
||||
assert!(
|
||||
!stripped.contains('`'),
|
||||
"should not contain backticks in evidence"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn render_console_blank_line_between_findings() {
|
||||
let diags = vec![
|
||||
Diag {
|
||||
path: "src/a.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
severity: Severity::High,
|
||||
id: "rule-a".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some("first".into()),
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
},
|
||||
Diag {
|
||||
path: "src/a.rs".into(),
|
||||
line: 10,
|
||||
col: 1,
|
||||
severity: Severity::Medium,
|
||||
id: "rule-b".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some("second".into()),
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
},
|
||||
];
|
||||
let output = render_console(&diags, "proj", None);
|
||||
let stripped = strip_ansi(&output);
|
||||
// There should be a blank line between the two findings
|
||||
assert!(
|
||||
stripped.contains("First\n\n"),
|
||||
"blank line between findings: {stripped}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_omits_empty_labels() {
|
||||
let d = Diag {
|
||||
path: "x.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
severity: Severity::Low,
|
||||
id: "test".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
};
|
||||
let json = serde_json::to_string(&d).unwrap();
|
||||
assert!(
|
||||
!json.contains("labels"),
|
||||
"empty labels should be omitted from JSON"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_omits_rank_fields_when_none() {
|
||||
let d = Diag {
|
||||
path: "x.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
severity: Severity::Low,
|
||||
id: "test".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
};
|
||||
let json = serde_json::to_string(&d).unwrap();
|
||||
assert!(
|
||||
!json.contains("rank_score"),
|
||||
"rank_score should be omitted when None"
|
||||
);
|
||||
assert!(
|
||||
!json.contains("rank_reason"),
|
||||
"rank_reason should be omitted when None"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_includes_rank_score_when_set() {
|
||||
let d = Diag {
|
||||
path: "x.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
severity: Severity::High,
|
||||
id: "taint-unsanitised-flow".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: Some(120.0),
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
};
|
||||
let json = serde_json::to_string(&d).unwrap();
|
||||
assert!(
|
||||
json.contains("rank_score"),
|
||||
"rank_score should be present when set"
|
||||
);
|
||||
assert!(json.contains("120"), "rank_score value should appear");
|
||||
}
|
||||
|
||||
// ── capitalize_first ─────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn capitalize_first_works() {
|
||||
assert_eq!(capitalize_first("hello"), "Hello");
|
||||
assert_eq!(capitalize_first(""), "");
|
||||
assert_eq!(capitalize_first("A"), "A");
|
||||
assert_eq!(capitalize_first("unsanitised"), "Unsanitised");
|
||||
}
|
||||
|
||||
// ── taint flow rendering (integration-style) ─────────────────────────
|
||||
|
||||
#[test]
|
||||
fn taint_flow_no_broken_backticks_or_weird_spacing() {
|
||||
let raw_sink = "Command::new(\"tar\") .arg(\"-czf\") .arg(\"/backups/nightly.tar.gz\") .arg(\"/var/data\") .output()";
|
||||
let normalised = normalize_snippet(raw_sink);
|
||||
// Chain spacing should be collapsed
|
||||
assert!(
|
||||
!normalised.contains(") ."),
|
||||
"chain spacing should be collapsed: {normalised}"
|
||||
);
|
||||
assert!(!normalised.contains(" "), "no double-spaces: {normalised}");
|
||||
// Should not contain backticks
|
||||
assert!(!normalised.contains('`'), "no backticks: {normalised}");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn multiline_sink_joined_and_normalised() {
|
||||
let raw = "Command::new(\"tar\")\n .arg(\"-czf\")\n .arg(\"/backups/nightly.tar.gz\")\n .arg(\"/var/data\")\n .output()";
|
||||
let normalised = normalize_snippet(raw);
|
||||
assert_eq!(
|
||||
normalised,
|
||||
"Command::new(\"tar\").arg(\"-czf\").arg(\"/backups/nightly.tar.gz\").arg(\"/var/data\").output()"
|
||||
);
|
||||
}
|
||||
|
||||
// ── confidence display ──────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn confidence_after_score_on_header_line() {
|
||||
let d = Diag {
|
||||
path: "src/a.rs".into(),
|
||||
line: 510,
|
||||
col: 5,
|
||||
severity: Severity::Medium,
|
||||
id: "cfg-unguarded-sink".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some("dangerous sink".into()),
|
||||
labels: vec![],
|
||||
confidence: Some(crate::evidence::Confidence::Medium),
|
||||
evidence: None,
|
||||
rank_score: Some(36.0),
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
};
|
||||
let output = render_diag(&d, 120);
|
||||
let stripped = strip_ansi(&output);
|
||||
// Header line should contain score and confidence together
|
||||
let header = stripped.lines().next().unwrap();
|
||||
assert!(
|
||||
header.contains("(Score: 36, Confidence: Medium)"),
|
||||
"header should contain '(Score: 36, Confidence: Medium)': {header}"
|
||||
);
|
||||
// No standalone Confidence line
|
||||
let non_header_lines: Vec<&str> = stripped.lines().skip(1).collect();
|
||||
assert!(
|
||||
!non_header_lines
|
||||
.iter()
|
||||
.any(|l| l.trim().starts_with("Confidence:")),
|
||||
"should not have standalone Confidence line"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn confidence_title_case() {
|
||||
for (conf, expected) in [
|
||||
(crate::evidence::Confidence::Low, "Confidence: Low"),
|
||||
(crate::evidence::Confidence::Medium, "Confidence: Medium"),
|
||||
(crate::evidence::Confidence::High, "Confidence: High"),
|
||||
] {
|
||||
let d = Diag {
|
||||
path: "x.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
severity: Severity::Low,
|
||||
id: "test".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
labels: vec![],
|
||||
confidence: Some(conf),
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
};
|
||||
let output = render_diag(&d, 100);
|
||||
let stripped = strip_ansi(&output);
|
||||
assert!(
|
||||
stripped.contains(expected),
|
||||
"expected '{expected}' in: {stripped}"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn confidence_none_only_score() {
|
||||
let d = Diag {
|
||||
path: "src/a.rs".into(),
|
||||
line: 10,
|
||||
col: 5,
|
||||
severity: Severity::High,
|
||||
id: "test-rule".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: Some("test message".into()),
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: Some(42.0),
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
};
|
||||
let output = render_diag(&d, 100);
|
||||
let stripped = strip_ansi(&output);
|
||||
let header = stripped.lines().next().unwrap();
|
||||
assert!(
|
||||
header.contains("(Score: 42)"),
|
||||
"should show score without confidence: {header}"
|
||||
);
|
||||
assert!(
|
||||
!header.contains("Confidence"),
|
||||
"should not mention confidence when None: {header}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn confidence_only_no_score() {
|
||||
let d = Diag {
|
||||
path: "src/a.rs".into(),
|
||||
line: 10,
|
||||
col: 5,
|
||||
severity: Severity::High,
|
||||
id: "test-rule".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
labels: vec![],
|
||||
confidence: Some(crate::evidence::Confidence::High),
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
};
|
||||
let output = render_diag(&d, 100);
|
||||
let stripped = strip_ansi(&output);
|
||||
let header = stripped.lines().next().unwrap();
|
||||
assert!(
|
||||
header.contains("(Confidence: High)"),
|
||||
"should show confidence without score: {header}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_omits_confidence_when_none() {
|
||||
let d = Diag {
|
||||
path: "x.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
severity: Severity::Low,
|
||||
id: "test".into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
labels: vec![],
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
};
|
||||
let json = serde_json::to_string(&d).unwrap();
|
||||
assert!(
|
||||
!json.contains("confidence"),
|
||||
"confidence should be omitted when None: {json}"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
@ -31,6 +31,10 @@ pub static RULES: &[LabelRule] = &[
|
|||
matchers: &["printf", "fprintf"],
|
||||
label: DataLabel::Sink(Cap::FMT_STRING),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["fopen", "open"],
|
||||
label: DataLabel::Sink(Cap::FILE_IO),
|
||||
},
|
||||
];
|
||||
|
||||
pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
||||
|
|
@ -39,6 +43,9 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"while_statement" => Kind::While,
|
||||
"for_statement" => Kind::For,
|
||||
"do_statement" => Kind::While,
|
||||
"switch_statement" => Kind::Block,
|
||||
"case_statement" => Kind::Block,
|
||||
"labeled_statement" => Kind::Block,
|
||||
|
||||
"return_statement" => Kind::Return,
|
||||
"break_statement" => Kind::Break,
|
||||
|
|
@ -47,6 +54,7 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
// structure
|
||||
"translation_unit" => Kind::SourceFile,
|
||||
"compound_statement" => Kind::Block,
|
||||
"else_clause" => Kind::Block,
|
||||
"function_definition" => Kind::Function,
|
||||
|
||||
// data-flow
|
||||
|
|
|
|||
|
|
@ -29,6 +29,10 @@ pub static RULES: &[LabelRule] = &[
|
|||
matchers: &["printf", "fprintf"],
|
||||
label: DataLabel::Sink(Cap::FMT_STRING),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["fopen", "open"],
|
||||
label: DataLabel::Sink(Cap::FILE_IO),
|
||||
},
|
||||
];
|
||||
|
||||
pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
||||
|
|
@ -38,15 +42,23 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"for_statement" => Kind::For,
|
||||
"for_range_loop" => Kind::For,
|
||||
"do_statement" => Kind::While,
|
||||
"switch_statement" => Kind::Block,
|
||||
"case_statement" => Kind::Block,
|
||||
"labeled_statement" => Kind::Block,
|
||||
|
||||
"return_statement" => Kind::Return,
|
||||
"throw_statement" => Kind::Return,
|
||||
"break_statement" => Kind::Break,
|
||||
"continue_statement" => Kind::Continue,
|
||||
|
||||
// structure
|
||||
"translation_unit" => Kind::SourceFile,
|
||||
"compound_statement" => Kind::Block,
|
||||
"else_clause" => Kind::Block,
|
||||
"function_definition" => Kind::Function,
|
||||
"try_statement" => Kind::Block,
|
||||
"catch_clause" => Kind::Block,
|
||||
"lambda_expression" => Kind::Block,
|
||||
|
||||
// data-flow
|
||||
"call_expression" => Kind::CallFn,
|
||||
|
|
@ -63,7 +75,7 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"preproc_include" => Kind::Trivia,
|
||||
"preproc_def" => Kind::Trivia,
|
||||
"using_declaration" => Kind::Trivia,
|
||||
"namespace_definition" => Kind::Trivia,
|
||||
"namespace_definition" => Kind::Block,
|
||||
};
|
||||
|
||||
pub static PARAM_CONFIG: ParamConfig = ParamConfig {
|
||||
|
|
|
|||
|
|
@ -8,7 +8,17 @@ pub static RULES: &[LabelRule] = &[
|
|||
label: DataLabel::Source(Cap::all()),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["http.Request", "r.FormValue", "r.URL"],
|
||||
matchers: &[
|
||||
"http.Request",
|
||||
"r.FormValue",
|
||||
"r.URL",
|
||||
"r.Body",
|
||||
"r.Header",
|
||||
"r.URL.Query",
|
||||
"r.URL.Query.Get",
|
||||
"Request.FormValue",
|
||||
"Request.URL",
|
||||
],
|
||||
label: DataLabel::Source(Cap::all()),
|
||||
},
|
||||
// ───────── Sanitizers ──────────
|
||||
|
|
@ -17,18 +27,40 @@ pub static RULES: &[LabelRule] = &[
|
|||
label: DataLabel::Sanitizer(Cap::HTML_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["url.QueryEscape"],
|
||||
matchers: &["url.QueryEscape", "url.PathEscape"],
|
||||
label: DataLabel::Sanitizer(Cap::URL_ENCODE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["filepath.Clean", "filepath.Base"],
|
||||
label: DataLabel::Sanitizer(Cap::FILE_IO),
|
||||
},
|
||||
// ─────────── Sinks ─────────────
|
||||
LabelRule {
|
||||
matchers: &["exec.Command"],
|
||||
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["db.Query", "db.Exec"],
|
||||
matchers: &["db.Query", "db.Exec", "db.QueryRow", "db.Prepare"],
|
||||
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["fmt.Fprintf", "fmt.Sprintf", "fmt.Printf"],
|
||||
label: DataLabel::Sink(Cap::FMT_STRING),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &[
|
||||
"os.Open",
|
||||
"os.OpenFile",
|
||||
"os.Create",
|
||||
"ioutil.ReadFile",
|
||||
"os.ReadFile",
|
||||
],
|
||||
label: DataLabel::Sink(Cap::FILE_IO),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["template.HTML"],
|
||||
label: DataLabel::Sink(Cap::HTML_ESCAPE),
|
||||
},
|
||||
];
|
||||
|
||||
pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
||||
|
|
@ -46,6 +78,16 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"statement_list" => Kind::Block,
|
||||
"function_declaration" => Kind::Function,
|
||||
"method_declaration" => Kind::Function,
|
||||
"func_literal" => Kind::Function,
|
||||
"expression_switch_statement" => Kind::Block,
|
||||
"type_switch_statement" => Kind::Block,
|
||||
"expression_case" => Kind::Block,
|
||||
"type_case" => Kind::Block,
|
||||
"default_case" => Kind::Block,
|
||||
"select_statement" => Kind::Block,
|
||||
"communication_case" => Kind::Block,
|
||||
"go_statement" => Kind::Block,
|
||||
"defer_statement" => Kind::Block,
|
||||
|
||||
// data-flow
|
||||
"call_expression" => Kind::CallFn,
|
||||
|
|
|
|||
|
|
@ -8,7 +8,19 @@ pub static RULES: &[LabelRule] = &[
|
|||
label: DataLabel::Source(Cap::all()),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["getParameter", "getInputStream", "getHeader", "getCookies"],
|
||||
matchers: &[
|
||||
"getParameter",
|
||||
"getInputStream",
|
||||
"getHeader",
|
||||
"getCookies",
|
||||
"getReader",
|
||||
"getQueryString",
|
||||
"getPathInfo",
|
||||
],
|
||||
label: DataLabel::Source(Cap::all()),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["readObject", "readLine"],
|
||||
label: DataLabel::Source(Cap::all()),
|
||||
},
|
||||
// ───────── Sanitizers ──────────
|
||||
|
|
@ -18,13 +30,21 @@ pub static RULES: &[LabelRule] = &[
|
|||
},
|
||||
// ─────────── Sinks ─────────────
|
||||
LabelRule {
|
||||
matchers: &["Runtime.exec"],
|
||||
matchers: &["Runtime.exec", "ProcessBuilder"],
|
||||
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["executeQuery", "executeUpdate", "prepareStatement"],
|
||||
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["Class.forName"],
|
||||
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["println", "print", "write"],
|
||||
label: DataLabel::Sink(Cap::HTML_ESCAPE),
|
||||
},
|
||||
];
|
||||
|
||||
pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
||||
|
|
@ -33,8 +53,10 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"while_statement" => Kind::While,
|
||||
"for_statement" => Kind::For,
|
||||
"enhanced_for_statement" => Kind::For,
|
||||
"do_statement" => Kind::While,
|
||||
|
||||
"return_statement" => Kind::Return,
|
||||
"throw_statement" => Kind::Return,
|
||||
"break_statement" => Kind::Break,
|
||||
"continue_statement" => Kind::Continue,
|
||||
|
||||
|
|
@ -46,6 +68,15 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"interface_body" => Kind::Block,
|
||||
"method_declaration" => Kind::Function,
|
||||
"constructor_declaration" => Kind::Function,
|
||||
"switch_expression" => Kind::Block,
|
||||
"switch_block" => Kind::Block,
|
||||
"switch_block_statement_group" => Kind::Block,
|
||||
"try_statement" => Kind::Block,
|
||||
"catch_clause" => Kind::Block,
|
||||
"finally_clause" => Kind::Block,
|
||||
"lambda_expression" => Kind::Block,
|
||||
"constructor_body" => Kind::Block,
|
||||
"static_initializer" => Kind::Block,
|
||||
|
||||
// data-flow
|
||||
"method_invocation" => Kind::CallMethod,
|
||||
|
|
|
|||
|
|
@ -62,6 +62,7 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"while_statement" => Kind::While,
|
||||
"for_statement" => Kind::For,
|
||||
"for_in_statement" => Kind::For,
|
||||
"do_statement" => Kind::While,
|
||||
|
||||
"return_statement" => Kind::Return,
|
||||
"throw_statement" => Kind::Return,
|
||||
|
|
@ -71,9 +72,24 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
// structure
|
||||
"program" => Kind::SourceFile,
|
||||
"statement_block" => Kind::Block,
|
||||
"else_clause" => Kind::Block,
|
||||
"function_declaration" => Kind::Function,
|
||||
"function_expression" => Kind::Function,
|
||||
"arrow_function" => Kind::Function,
|
||||
"method_definition" => Kind::Function,
|
||||
"generator_function_declaration" => Kind::Function,
|
||||
"generator_function" => Kind::Function,
|
||||
"switch_statement" => Kind::Block,
|
||||
"switch_body" => Kind::Block,
|
||||
"switch_case" => Kind::Block,
|
||||
"switch_default" => Kind::Block,
|
||||
"try_statement" => Kind::Block,
|
||||
"catch_clause" => Kind::Block,
|
||||
"finally_clause" => Kind::Block,
|
||||
"class_declaration" => Kind::Block,
|
||||
"class" => Kind::Block,
|
||||
"class_body" => Kind::Block,
|
||||
"export_statement" => Kind::Block,
|
||||
|
||||
// data-flow
|
||||
"call_expression" => Kind::CallFn,
|
||||
|
|
|
|||
|
|
@ -41,7 +41,6 @@ pub enum Kind {
|
|||
InfiniteLoop,
|
||||
While,
|
||||
For,
|
||||
LoopBody,
|
||||
CallFn,
|
||||
CallMethod,
|
||||
CallMacro,
|
||||
|
|
@ -196,7 +195,7 @@ pub fn lookup(lang: &str, raw: &str) -> Kind {
|
|||
}
|
||||
|
||||
/// The kind of taint source, used to refine finding severity.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
|
||||
pub enum SourceKind {
|
||||
/// Direct user input (request params, argv, stdin, form data)
|
||||
UserInput,
|
||||
|
|
@ -375,6 +374,11 @@ pub fn classify(lang: &str, text: &str, extra: Option<&[RuntimeLabelRule]>) -> O
|
|||
let head = text.split(['(', '<']).next().unwrap_or("");
|
||||
let trimmed = head.trim().as_bytes();
|
||||
|
||||
// For chained calls like `r.URL.Query().Get`, also strip internal
|
||||
// `().` segments to produce a normalized form like `r.URL.Query.Get`.
|
||||
let full_normalized = normalize_chained_call(text);
|
||||
let full_norm_bytes = full_normalized.as_bytes();
|
||||
|
||||
// ── Check runtime (config) rules first — they take priority ──────
|
||||
if let Some(extras) = extra {
|
||||
// Pass 1: exact / suffix
|
||||
|
|
@ -384,12 +388,8 @@ pub fn classify(lang: &str, text: &str, extra: Option<&[RuntimeLabelRule]>) -> O
|
|||
if m.last() == Some(&b'_') {
|
||||
continue;
|
||||
}
|
||||
if ends_with_ignore_case(trimmed, m) {
|
||||
let start = trimmed.len() - m.len();
|
||||
let ok = start == 0 || matches!(trimmed[start - 1], b'.' | b':');
|
||||
if ok {
|
||||
return Some(rule.label);
|
||||
}
|
||||
if match_suffix(trimmed, m) || match_suffix(full_norm_bytes, m) {
|
||||
return Some(rule.label);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -397,7 +397,10 @@ pub fn classify(lang: &str, text: &str, extra: Option<&[RuntimeLabelRule]>) -> O
|
|||
for rule in extras {
|
||||
for raw in &rule.matchers {
|
||||
let m = raw.as_bytes();
|
||||
if m.last() == Some(&b'_') && starts_with_ignore_case(trimmed, m) {
|
||||
if m.last() == Some(&b'_')
|
||||
&& (starts_with_ignore_case(trimmed, m)
|
||||
|| starts_with_ignore_case(full_norm_bytes, m))
|
||||
{
|
||||
return Some(rule.label);
|
||||
}
|
||||
}
|
||||
|
|
@ -417,12 +420,8 @@ pub fn classify(lang: &str, text: &str, extra: Option<&[RuntimeLabelRule]>) -> O
|
|||
if m.last() == Some(&b'_') {
|
||||
continue;
|
||||
}
|
||||
if ends_with_ignore_case(trimmed, m) {
|
||||
let start = trimmed.len() - m.len();
|
||||
let ok = start == 0 || matches!(trimmed[start - 1], b'.' | b':');
|
||||
if ok {
|
||||
return Some(rule.label);
|
||||
}
|
||||
if match_suffix(trimmed, m) || match_suffix(full_norm_bytes, m) {
|
||||
return Some(rule.label);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -431,7 +430,10 @@ pub fn classify(lang: &str, text: &str, extra: Option<&[RuntimeLabelRule]>) -> O
|
|||
for rule in *rules {
|
||||
for raw in rule.matchers {
|
||||
let m = raw.as_bytes();
|
||||
if m.last() == Some(&b'_') && starts_with_ignore_case(trimmed, m) {
|
||||
if m.last() == Some(&b'_')
|
||||
&& (starts_with_ignore_case(trimmed, m)
|
||||
|| starts_with_ignore_case(full_norm_bytes, m))
|
||||
{
|
||||
return Some(rule.label);
|
||||
}
|
||||
}
|
||||
|
|
@ -440,6 +442,58 @@ pub fn classify(lang: &str, text: &str, extra: Option<&[RuntimeLabelRule]>) -> O
|
|||
None
|
||||
}
|
||||
|
||||
/// Check if `text` ends with `matcher` at a word boundary (`.` or `:`).
|
||||
#[inline]
|
||||
fn match_suffix(text: &[u8], matcher: &[u8]) -> bool {
|
||||
if ends_with_ignore_case(text, matcher) {
|
||||
let start = text.len() - matcher.len();
|
||||
start == 0 || matches!(text[start - 1], b'.' | b':')
|
||||
} else {
|
||||
false
|
||||
}
|
||||
}
|
||||
|
||||
/// Normalize a chained method call: strip `()` between `.` segments.
|
||||
/// e.g. `r.URL.Query().Get` → `r.URL.Query.Get`
|
||||
/// e.g. `r.URL.Query().Get("host")` → `r.URL.Query.Get`
|
||||
fn normalize_chained_call(text: &str) -> String {
|
||||
let mut result = String::with_capacity(text.len());
|
||||
let bytes = text.as_bytes();
|
||||
let mut i = 0;
|
||||
while i < bytes.len() {
|
||||
match bytes[i] {
|
||||
b'(' => {
|
||||
// Skip from `(` to matching `)`, but only if followed by `.`
|
||||
// This handles `Query().Get` → `Query.Get`
|
||||
let mut depth = 1u32;
|
||||
let mut j = i + 1;
|
||||
while j < bytes.len() && depth > 0 {
|
||||
if bytes[j] == b'(' {
|
||||
depth += 1;
|
||||
} else if bytes[j] == b')' {
|
||||
depth -= 1;
|
||||
}
|
||||
j += 1;
|
||||
}
|
||||
// If we're at end or next char is `.`, skip the parens
|
||||
if j >= bytes.len() || bytes[j] == b'.' {
|
||||
i = j;
|
||||
} else {
|
||||
// Keep the paren content (unusual case)
|
||||
result.push('(');
|
||||
i += 1;
|
||||
}
|
||||
}
|
||||
b'<' => break, // Stop at generic args
|
||||
_ => {
|
||||
result.push(bytes[i] as char);
|
||||
i += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
result
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
|
|
|||
|
|
@ -3,8 +3,24 @@ use phf::{Map, phf_map};
|
|||
|
||||
pub static RULES: &[LabelRule] = &[
|
||||
// ─────────── Sources ───────────
|
||||
// Note: PHP `$` prefix is stripped by collect_idents, so match without `$`.
|
||||
LabelRule {
|
||||
matchers: &["$_GET", "$_POST", "$_REQUEST", "$_COOKIE"],
|
||||
matchers: &[
|
||||
"$_GET",
|
||||
"_GET",
|
||||
"$_POST",
|
||||
"_POST",
|
||||
"$_REQUEST",
|
||||
"_REQUEST",
|
||||
"$_COOKIE",
|
||||
"_COOKIE",
|
||||
"$_FILES",
|
||||
"_FILES",
|
||||
"$_SERVER",
|
||||
"_SERVER",
|
||||
"$_ENV",
|
||||
"_ENV",
|
||||
],
|
||||
label: DataLabel::Source(Cap::all()),
|
||||
},
|
||||
LabelRule {
|
||||
|
|
@ -20,17 +36,44 @@ pub static RULES: &[LabelRule] = &[
|
|||
matchers: &["escapeshellarg", "escapeshellcmd"],
|
||||
label: DataLabel::Sanitizer(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["basename"],
|
||||
label: DataLabel::Sanitizer(Cap::FILE_IO),
|
||||
},
|
||||
// ─────────── Sinks ─────────────
|
||||
LabelRule {
|
||||
matchers: &["system", "exec", "passthru", "shell_exec"],
|
||||
matchers: &[
|
||||
"system",
|
||||
"exec",
|
||||
"passthru",
|
||||
"shell_exec",
|
||||
"proc_open",
|
||||
"popen",
|
||||
],
|
||||
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["eval", "assert"],
|
||||
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["include", "include_once", "require", "require_once"],
|
||||
label: DataLabel::Sink(Cap::FILE_IO),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["unserialize"],
|
||||
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["move_uploaded_file", "copy", "file_put_contents", "fwrite"],
|
||||
label: DataLabel::Sink(Cap::FILE_IO),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["echo", "print"],
|
||||
label: DataLabel::Sink(Cap::HTML_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["mysqli_query", "pg_query"],
|
||||
matchers: &["mysqli_query", "pg_query", "query"],
|
||||
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
];
|
||||
|
|
@ -41,16 +84,29 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"while_statement" => Kind::While,
|
||||
"for_statement" => Kind::For,
|
||||
"foreach_statement" => Kind::For,
|
||||
"do_statement" => Kind::While,
|
||||
|
||||
"return_statement" => Kind::Return,
|
||||
"throw_expression" => Kind::Return,
|
||||
"break_statement" => Kind::Break,
|
||||
"continue_statement" => Kind::Continue,
|
||||
|
||||
// structure
|
||||
"program" => Kind::SourceFile,
|
||||
"compound_statement" => Kind::Block,
|
||||
"else_clause" => Kind::Block,
|
||||
"else_if_clause" => Kind::Block,
|
||||
"function_definition" => Kind::Function,
|
||||
"method_declaration" => Kind::Function,
|
||||
"switch_statement" => Kind::Block,
|
||||
"switch_block" => Kind::Block,
|
||||
"case_statement" => Kind::Block,
|
||||
"default_statement" => Kind::Block,
|
||||
"try_statement" => Kind::Block,
|
||||
"catch_clause" => Kind::Block,
|
||||
"finally_clause" => Kind::Block,
|
||||
"colon_block" => Kind::Block,
|
||||
"class_declaration" => Kind::Block,
|
||||
|
||||
// data-flow
|
||||
"function_call_expression" => Kind::CallFn,
|
||||
|
|
|
|||
|
|
@ -24,7 +24,7 @@ pub static RULES: &[LabelRule] = &[
|
|||
},
|
||||
LabelRule {
|
||||
matchers: &["open"],
|
||||
label: DataLabel::Source(Cap::all()),
|
||||
label: DataLabel::Sink(Cap::FILE_IO),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &[
|
||||
|
|
@ -65,6 +65,14 @@ pub static RULES: &[LabelRule] = &[
|
|||
matchers: &["cursor.execute", "cursor.executemany"],
|
||||
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["send_file", "send_from_directory"],
|
||||
label: DataLabel::Sink(Cap::FILE_IO),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["os.path.realpath"],
|
||||
label: DataLabel::Sanitizer(Cap::FILE_IO),
|
||||
},
|
||||
];
|
||||
|
||||
pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
||||
|
|
@ -74,13 +82,24 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"for_statement" => Kind::For,
|
||||
|
||||
"return_statement" => Kind::Return,
|
||||
"raise_statement" => Kind::Return,
|
||||
"break_statement" => Kind::Break,
|
||||
"continue_statement" => Kind::Continue,
|
||||
|
||||
// structure
|
||||
"module" => Kind::SourceFile,
|
||||
"block" => Kind::Block,
|
||||
"else_clause" => Kind::Block,
|
||||
"elif_clause" => Kind::Block,
|
||||
"with_statement" => Kind::Block,
|
||||
"function_definition" => Kind::Function,
|
||||
"try_statement" => Kind::Block,
|
||||
"except_clause" => Kind::Block,
|
||||
"finally_clause" => Kind::Block,
|
||||
"class_definition" => Kind::Block,
|
||||
"decorated_definition" => Kind::Block,
|
||||
"match_statement" => Kind::Block,
|
||||
"case_clause" => Kind::Block,
|
||||
|
||||
// data-flow
|
||||
"call" => Kind::CallFn,
|
||||
|
|
|
|||
|
|
@ -40,6 +40,7 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"if" => Kind::If,
|
||||
"unless" => Kind::If,
|
||||
"while" => Kind::While,
|
||||
"until" => Kind::While,
|
||||
"for" => Kind::For,
|
||||
|
||||
"return" => Kind::Return,
|
||||
|
|
@ -49,15 +50,26 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
// structure
|
||||
"program" => Kind::SourceFile,
|
||||
"body_statement" => Kind::Block,
|
||||
"do_block" => Kind::Block,
|
||||
"do_block" => Kind::Function,
|
||||
"then" => Kind::Block,
|
||||
"else" => Kind::Block,
|
||||
"elsif" => Kind::If,
|
||||
|
||||
"begin" => Kind::Block,
|
||||
"rescue" => Kind::Block,
|
||||
"ensure" => Kind::Block,
|
||||
"case" => Kind::Block,
|
||||
"when" => Kind::Block,
|
||||
"class" => Kind::Block,
|
||||
"module" => Kind::Block,
|
||||
"do" => Kind::Block,
|
||||
"block" => Kind::Function,
|
||||
|
||||
// data-flow
|
||||
"call" => Kind::CallFn,
|
||||
"method_call" => Kind::CallFn,
|
||||
"assignment" => Kind::Assignment,
|
||||
"method" => Kind::Function,
|
||||
"singleton_method" => Kind::Function,
|
||||
|
||||
// trivia
|
||||
"comment" => Kind::Trivia,
|
||||
|
|
|
|||
|
|
@ -8,7 +8,7 @@ pub static RULES: &[LabelRule] = &[
|
|||
label: DataLabel::Source(Cap::all()),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["fs::read_to_string", "source_file"],
|
||||
matchers: &["source_file"],
|
||||
label: DataLabel::Source(Cap::all()),
|
||||
},
|
||||
// ───────── Sanitizers ──────────
|
||||
|
|
@ -36,17 +36,29 @@ pub static RULES: &[LabelRule] = &[
|
|||
matchers: &["sink_html"],
|
||||
label: DataLabel::Sink(Cap::HTML_ESCAPE),
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &[
|
||||
"fs::read_to_string",
|
||||
"fs::write",
|
||||
"fs::read",
|
||||
"File::open",
|
||||
"File::create",
|
||||
],
|
||||
label: DataLabel::Sink(Cap::FILE_IO),
|
||||
},
|
||||
];
|
||||
|
||||
pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
||||
// control-flow
|
||||
"if_expression" => Kind::If,
|
||||
"loop_expression" => Kind::InfiniteLoop,
|
||||
"loop_statement" => Kind::LoopBody,
|
||||
"while_statement" => Kind::While,
|
||||
"while_expression" => Kind::While,
|
||||
"for_statement" => Kind::For,
|
||||
"for_expression" => Kind::For,
|
||||
|
||||
"return_statement" => Kind::Return,
|
||||
"return_expression" => Kind::Return,
|
||||
"break_expression" => Kind::Break,
|
||||
"break_statement" => Kind::Break,
|
||||
"continue_expression" => Kind::Continue,
|
||||
|
|
@ -55,7 +67,17 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
// structure
|
||||
"source_file" => Kind::SourceFile,
|
||||
"block" => Kind::Block,
|
||||
"else_clause" => Kind::Block,
|
||||
"match_expression" => Kind::Block,
|
||||
"match_block" => Kind::Block,
|
||||
"match_arm" => Kind::Block,
|
||||
"unsafe_block" => Kind::Block,
|
||||
"function_item" => Kind::Function,
|
||||
"closure_expression" => Kind::Block,
|
||||
"async_block" => Kind::Block,
|
||||
"impl_item" => Kind::Block,
|
||||
"trait_item" => Kind::Block,
|
||||
"declaration_list" => Kind::Block,
|
||||
|
||||
// data-flow
|
||||
"call_expression" => Kind::CallFn,
|
||||
|
|
|
|||
|
|
@ -50,18 +50,36 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"while_statement" => Kind::While,
|
||||
"for_statement" => Kind::For,
|
||||
"for_in_statement" => Kind::For,
|
||||
"for_of_statement" => Kind::For,
|
||||
"do_statement" => Kind::While,
|
||||
|
||||
"return_statement" => Kind::Return,
|
||||
"throw_statement" => Kind::Return,
|
||||
"break_statement" => Kind::Break,
|
||||
"continue_statement" => Kind::Continue,
|
||||
|
||||
// structure
|
||||
"program" => Kind::SourceFile,
|
||||
"statement_block" => Kind::Block,
|
||||
"else_clause" => Kind::Block,
|
||||
"function_declaration" => Kind::Function,
|
||||
"function_expression" => Kind::Function,
|
||||
"arrow_function" => Kind::Function,
|
||||
"method_definition" => Kind::Function,
|
||||
"generator_function_declaration" => Kind::Function,
|
||||
"generator_function" => Kind::Function,
|
||||
"switch_statement" => Kind::Block,
|
||||
"switch_body" => Kind::Block,
|
||||
"switch_case" => Kind::Block,
|
||||
"switch_default" => Kind::Block,
|
||||
"try_statement" => Kind::Block,
|
||||
"catch_clause" => Kind::Block,
|
||||
"finally_clause" => Kind::Block,
|
||||
"class_declaration" => Kind::Block,
|
||||
"class" => Kind::Block,
|
||||
"class_body" => Kind::Block,
|
||||
"abstract_class_declaration" => Kind::Block,
|
||||
"export_statement" => Kind::Block,
|
||||
"enum_declaration" => Kind::Trivia,
|
||||
|
||||
// data-flow
|
||||
"call_expression" => Kind::CallFn,
|
||||
|
|
|
|||
49
src/lib.rs
49
src/lib.rs
|
|
@ -1,19 +1,62 @@
|
|||
// Re-exports for benchmarks and integration tests.
|
||||
// The binary crate (main.rs) is the primary entry point; this lib target
|
||||
// exposes internals for criterion and other tooling.
|
||||
//! # Nyx Scanner
|
||||
//!
|
||||
//! A multi-language static vulnerability scanner. Nyx parses source files with
|
||||
//! [tree-sitter](https://tree-sitter.github.io/), builds intra-procedural
|
||||
//! control-flow graphs ([petgraph](https://docs.rs/petgraph)), and runs
|
||||
//! cross-file taint analysis with a capability-based sanitizer system.
|
||||
//!
|
||||
//! ## Architecture
|
||||
//!
|
||||
//! Nyx uses a **two-pass architecture**:
|
||||
//!
|
||||
//! 1. **Pass 1 — Summary extraction**: Parse each file, build a CFG per function,
|
||||
//! and export a [`summary::FuncSummary`] capturing source/sanitizer/sink capabilities,
|
||||
//! taint propagation behavior, and callee lists. Summaries are persisted to SQLite.
|
||||
//!
|
||||
//! 2. **Pass 2 — Analysis**: Load all summaries into a [`summary::GlobalSummaries`] map,
|
||||
//! re-parse files, and run taint analysis with cross-file callee resolution. CFG
|
||||
//! structural analysis checks for auth gaps, unguarded sinks, and resource leaks.
|
||||
//!
|
||||
//! ## Four Detector Families
|
||||
//!
|
||||
//! - **Taint** ([`taint`]) — Monotone forward dataflow tracking source-to-sink flows
|
||||
//! - **CFG Structural** ([`cfg_analysis`]) — Dominator-based guard and auth-gap detection
|
||||
//! - **State Model** ([`state`]) — Resource lifecycle and authentication state lattices
|
||||
//! - **AST Patterns** ([`patterns`]) — Tree-sitter structural queries per language
|
||||
//!
|
||||
//! ## Supported Languages
|
||||
//!
|
||||
//! Rust, C, C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript.
|
||||
//!
|
||||
//! ## Entry Points
|
||||
//!
|
||||
//! - [`scan_no_index`] — Run a two-pass scan without indexing (for tests)
|
||||
//! - [`commands::scan::scan_filesystem`] — Filesystem scan with optional indexing
|
||||
//! - [`commands::scan::scan_with_index_parallel`] — Index-backed parallel scan
|
||||
//!
|
||||
//! ## Documentation
|
||||
//!
|
||||
//! See the [`docs/`](https://github.com/elicpeter/nyx/tree/master/docs) directory
|
||||
//! for user and contributor documentation.
|
||||
|
||||
pub mod ast;
|
||||
pub mod callgraph;
|
||||
pub mod cfg;
|
||||
pub mod cfg_analysis;
|
||||
pub(crate) mod cli;
|
||||
pub mod commands;
|
||||
pub mod database;
|
||||
pub mod errors;
|
||||
pub mod evidence;
|
||||
pub mod fmt;
|
||||
pub mod interop;
|
||||
pub mod labels;
|
||||
pub mod output;
|
||||
pub mod patterns;
|
||||
pub mod rank;
|
||||
pub mod state;
|
||||
pub mod summary;
|
||||
pub mod suppress;
|
||||
pub mod symbol;
|
||||
pub mod taint;
|
||||
pub mod utils;
|
||||
|
|
|
|||
16
src/main.rs
16
src/main.rs
|
|
@ -1,15 +1,21 @@
|
|||
mod ast;
|
||||
mod callgraph;
|
||||
mod cfg;
|
||||
mod cfg_analysis;
|
||||
mod cli;
|
||||
mod commands;
|
||||
mod database;
|
||||
mod errors;
|
||||
mod evidence;
|
||||
mod fmt;
|
||||
mod interop;
|
||||
mod labels;
|
||||
mod output;
|
||||
mod patterns;
|
||||
mod rank;
|
||||
mod state;
|
||||
mod summary;
|
||||
mod suppress;
|
||||
mod symbol;
|
||||
mod taint;
|
||||
mod utils;
|
||||
|
|
@ -25,7 +31,7 @@ use std::fs;
|
|||
use std::time::Instant;
|
||||
use tracing_subscriber::fmt::time;
|
||||
use tracing_subscriber::prelude::*;
|
||||
use tracing_subscriber::{EnvFilter, Registry, fmt};
|
||||
use tracing_subscriber::{EnvFilter, Registry, fmt as tracing_fmt};
|
||||
// use tracing_appender::rolling::{RollingFileAppender, Rotation};
|
||||
// use tracing_appender::non_blocking;
|
||||
|
||||
|
|
@ -33,7 +39,7 @@ fn init_tracing() {
|
|||
// let file_appender = RollingFileAppender::new(Rotation::HOURLY, "logs", "nyx-scanner.log");
|
||||
// let (file_writer, guard) = non_blocking(file_appender);
|
||||
|
||||
let fmt_layer = fmt::layer()
|
||||
let fmt_layer = tracing_fmt::layer()
|
||||
.pretty()
|
||||
.with_thread_ids(true)
|
||||
.with_timer(time::UtcTime::rfc_3339());
|
||||
|
|
@ -56,8 +62,8 @@ fn main() -> NyxResult<()> {
|
|||
tracing::debug!("CLI starting up");
|
||||
let cli = Cli::parse();
|
||||
|
||||
let proj_dirs = ProjectDirs::from("dev", "ecpeter23", "nyx")
|
||||
.ok_or("Unable to determine project directories")?;
|
||||
let proj_dirs =
|
||||
ProjectDirs::from("", "", "nyx").ok_or("Unable to determine project directories")?;
|
||||
|
||||
// todo: check if we want to actually build a config file, maybe some environments will not want to have anything written
|
||||
let config_dir = proj_dirs.config_dir();
|
||||
|
|
@ -83,7 +89,7 @@ fn main() -> NyxResult<()> {
|
|||
commands::handle_command(cli.command, database_dir, config_dir, &mut config)?;
|
||||
|
||||
if !quiet {
|
||||
println!(
|
||||
eprintln!(
|
||||
"{} in {:.3}s.",
|
||||
style("Finished").green().bold(),
|
||||
now.elapsed().as_secs_f32()
|
||||
|
|
|
|||
|
|
@ -38,6 +38,11 @@ fn cfg_rule_description(id: &str) -> Option<&'static str> {
|
|||
}
|
||||
"cfg-resource-leak" => Some("Resource acquired but not released on all exit paths"),
|
||||
"cfg-lock-not-released" => Some("Lock acquired but not released on all exit paths"),
|
||||
"state-use-after-close" => Some("Variable used after its resource handle was closed"),
|
||||
"state-double-close" => Some("Resource handle closed more than once"),
|
||||
"state-resource-leak" => Some("Resource acquired but never closed"),
|
||||
"state-resource-leak-possible" => Some("Resource may not be closed on all paths"),
|
||||
"state-unauthed-access" => Some("Sensitive operation reached without authentication"),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
|
@ -116,11 +121,17 @@ pub fn build_sarif(diags: &[Diag], scan_root: &Path) -> Value {
|
|||
.map(|p| p.to_string_lossy().to_string())
|
||||
.unwrap_or_else(|_| d.path.clone());
|
||||
|
||||
json!({
|
||||
// Prefer the per-finding message (e.g. from state analysis) over the generic rule description.
|
||||
let msg_text = d
|
||||
.message
|
||||
.as_deref()
|
||||
.unwrap_or_else(|| rule_description(base));
|
||||
|
||||
let mut result = json!({
|
||||
"ruleId": base,
|
||||
"ruleIndex": rule_index,
|
||||
"level": severity_to_level(d.severity),
|
||||
"message": { "text": rule_description(base) },
|
||||
"message": { "text": msg_text },
|
||||
"locations": [{
|
||||
"physicalLocation": {
|
||||
"artifactLocation": { "uri": uri },
|
||||
|
|
@ -130,7 +141,50 @@ pub fn build_sarif(diags: &[Diag], scan_root: &Path) -> Value {
|
|||
}
|
||||
}
|
||||
}]
|
||||
})
|
||||
});
|
||||
|
||||
// Build properties object
|
||||
let mut props = serde_json::Map::new();
|
||||
props.insert("category".into(), json!(d.category.to_string()));
|
||||
if let Some(conf) = d.confidence {
|
||||
props.insert("confidence".into(), json!(conf.to_string()));
|
||||
}
|
||||
|
||||
// Add rollup data if present
|
||||
if let Some(ref rollup) = d.rollup {
|
||||
props.insert(
|
||||
"rollup".into(),
|
||||
json!({
|
||||
"count": rollup.count,
|
||||
}),
|
||||
);
|
||||
|
||||
// Add rollup occurrences as relatedLocations
|
||||
let related: Vec<Value> = rollup
|
||||
.occurrences
|
||||
.iter()
|
||||
.enumerate()
|
||||
.map(|(idx, loc)| {
|
||||
json!({
|
||||
"id": idx,
|
||||
"physicalLocation": {
|
||||
"artifactLocation": { "uri": &uri },
|
||||
"region": {
|
||||
"startLine": loc.line,
|
||||
"startColumn": loc.col
|
||||
}
|
||||
}
|
||||
})
|
||||
})
|
||||
.collect();
|
||||
if !related.is_empty() {
|
||||
result["relatedLocations"] = json!(related);
|
||||
}
|
||||
}
|
||||
|
||||
result["properties"] = Value::Object(props);
|
||||
|
||||
result
|
||||
})
|
||||
.collect();
|
||||
|
||||
|
|
|
|||
|
|
@ -1,40 +1,95 @@
|
|||
use crate::patterns::{Pattern, Severity};
|
||||
use crate::evidence::Confidence;
|
||||
use crate::patterns::{Pattern, PatternCategory, PatternTier, Severity};
|
||||
|
||||
/// C AST patterns.
|
||||
///
|
||||
/// Taint rules cover `system`/`popen`/`exec*` (command injection),
|
||||
/// `sprintf`/`strcpy`/`strcat` (buffer overflow sinks), and `printf`/`fprintf`
|
||||
/// (format-string sinks). AST patterns here focus on **banned-by-default
|
||||
/// functions** (`gets`, `scanf %s`) and **format-string** variants not covered
|
||||
/// by taint, since these are dangerous regardless of data origin.
|
||||
pub const PATTERNS: &[Pattern] = &[
|
||||
// ── Tier A: Banned functions (always dangerous) ────────────────────
|
||||
Pattern {
|
||||
id: "strcpy_call",
|
||||
description: "strcpy() usage",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"strcpy\")) @vuln",
|
||||
id: "c.memory.gets",
|
||||
description: "gets() — no bounds checking, always exploitable",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "gets")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "strcat_call",
|
||||
description: "strcat() usage",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"strcat\")) @vuln",
|
||||
id: "c.memory.strcpy",
|
||||
description: "strcpy() — no bounds checking on destination buffer",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "strcpy")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "sprintf_call",
|
||||
description: "sprintf() (no length limit)",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"sprintf\")) @vuln",
|
||||
id: "c.memory.strcat",
|
||||
description: "strcat() — no bounds checking on destination buffer",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "strcat")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "gets_call",
|
||||
description: "gets() usage",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"gets\")) @vuln",
|
||||
id: "c.memory.sprintf",
|
||||
description: "sprintf() — no length limit on output buffer",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "sprintf")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "scanf_with_percent_s",
|
||||
description: "scanf(\"%s\") without length specifier",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"scanf\") arguments: (argument_list (string_literal) @fmt (#match? @fmt \".*%s.*\"))) @vuln",
|
||||
id: "c.memory.scanf_percent_s",
|
||||
description: "scanf(\"%s\") — unbounded string read",
|
||||
query: r#"(call_expression
|
||||
function: (identifier) @id (#eq? @id "scanf")
|
||||
arguments: (argument_list
|
||||
(string_literal) @fmt (#match? @fmt "%s")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Command execution ──────────────────────────────────────
|
||||
Pattern {
|
||||
id: "c.cmdi.system",
|
||||
description: "system() — shell command execution",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "system")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "system_call",
|
||||
description: "system() shell execution",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"system\")) @vuln",
|
||||
id: "c.cmdi.popen",
|
||||
description: "popen() — shell command execution with pipe",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "popen")) @vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Format-string ──────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "c.memory.printf_no_fmt",
|
||||
description: "printf(var) — format-string vulnerability when first arg is not literal",
|
||||
query: r#"(call_expression
|
||||
function: (identifier) @id (#eq? @id "printf")
|
||||
arguments: (argument_list
|
||||
. (identifier) @arg))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::B,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
];
|
||||
|
|
|
|||
|
|
@ -1,40 +1,106 @@
|
|||
use crate::patterns::{Pattern, Severity};
|
||||
use crate::evidence::Confidence;
|
||||
use crate::patterns::{Pattern, PatternCategory, PatternTier, Severity};
|
||||
|
||||
/// C++ AST patterns.
|
||||
///
|
||||
/// Inherits C banned-function concerns plus C++-specific patterns like
|
||||
/// `reinterpret_cast` and `const_cast`. Taint rules overlap with C rules
|
||||
/// for `system`/`sprintf`/`strcpy`/`strcat`.
|
||||
pub const PATTERNS: &[Pattern] = &[
|
||||
// ── Tier A: Banned C functions (inherited) ─────────────────────────
|
||||
Pattern {
|
||||
id: "strcpy_call",
|
||||
description: "strcpy() usage",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"strcpy\")) @vuln",
|
||||
id: "cpp.memory.gets",
|
||||
description: "gets() — no bounds checking, always exploitable",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "gets")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "strcat_call",
|
||||
description: "strcat() usage",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"strcat\")) @vuln",
|
||||
id: "cpp.memory.strcpy",
|
||||
description: "strcpy() — no bounds checking on destination buffer",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "strcpy")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "sprintf_call",
|
||||
description: "sprintf() (no length limit)",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"sprintf\")) @vuln",
|
||||
id: "cpp.memory.strcat",
|
||||
description: "strcat() — no bounds checking on destination buffer",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "strcat")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "gets_call",
|
||||
description: "gets() usage",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"gets\")) @vuln",
|
||||
id: "cpp.memory.sprintf",
|
||||
description: "sprintf() — no length limit on output buffer",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "sprintf")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Command execution ──────────────────────────────────────
|
||||
Pattern {
|
||||
id: "cpp.cmdi.system",
|
||||
description: "system() — shell command execution",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "system")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "system_call",
|
||||
description: "system() shell execution",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"system\")) @vuln",
|
||||
id: "cpp.cmdi.popen",
|
||||
description: "popen() — shell command execution",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "popen")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Dangerous casts ────────────────────────────────────────
|
||||
// C++ casts are parsed as call_expression with template_function
|
||||
Pattern {
|
||||
id: "cpp.memory.reinterpret_cast",
|
||||
description: "reinterpret_cast — type-punning cast",
|
||||
query: r#"(call_expression
|
||||
function: (template_function
|
||||
name: (identifier) @n (#eq? @n "reinterpret_cast")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "reinterpret_cast",
|
||||
description: "reinterpret_cast usage",
|
||||
query: "(reinterpret_cast_expression) @vuln",
|
||||
id: "cpp.memory.const_cast",
|
||||
description: "const_cast — removes const/volatile qualifier",
|
||||
query: r#"(call_expression
|
||||
function: (template_function
|
||||
name: (identifier) @n (#eq? @n "const_cast")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier B: Format-string (variable first arg) ─────────────────────
|
||||
Pattern {
|
||||
id: "cpp.memory.printf_no_fmt",
|
||||
description: "printf(var) — format-string vulnerability when first arg is not literal",
|
||||
query: r#"(call_expression
|
||||
function: (identifier) @id (#eq? @id "printf")
|
||||
arguments: (argument_list
|
||||
. (identifier) @arg))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::B,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
];
|
||||
|
|
|
|||
|
|
@ -1,34 +1,120 @@
|
|||
use crate::patterns::{Pattern, Severity};
|
||||
use crate::evidence::Confidence;
|
||||
use crate::patterns::{Pattern, PatternCategory, PatternTier, Severity};
|
||||
|
||||
/// Go AST patterns.
|
||||
///
|
||||
/// Taint rules cover `exec.Command` (command injection), `db.Query`/`db.Exec`
|
||||
/// (SQL sinks). AST patterns here focus on **TLS misconfiguration**,
|
||||
/// **weak crypto**, **unsafe.Pointer**, and **hardcoded secrets**.
|
||||
pub const PATTERNS: &[Pattern] = &[
|
||||
// ── Tier A: Command execution ──────────────────────────────────────
|
||||
Pattern {
|
||||
id: "exec_command",
|
||||
description: "os/exec Command construction",
|
||||
query: "(call_expression function: (selector_expression field: (field_identifier) @f (#eq? @f \"Command\"))) @vuln",
|
||||
severity: Severity::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "http_insecure_tls",
|
||||
description: "&http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}}",
|
||||
query: "(composite_literal type: (selector_expression field: (field_identifier) @t (#eq? @t \"Transport\")) body: (literal_value (keyed_element key: (identifier) @k (#eq? @k \"TLSClientConfig\") value: (composite_literal body: (literal_value (keyed_element key: (identifier) @ik (#eq? @ik \"InsecureSkipVerify\") value: (true)))))) @vuln",
|
||||
id: "go.cmdi.exec_command",
|
||||
description: "exec.Command() — arbitrary process execution",
|
||||
query: r#"(call_expression
|
||||
function: (selector_expression
|
||||
field: (field_identifier) @f (#eq? @f "Command")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Unsafe pointer ─────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "unsafe_pointer",
|
||||
description: "Use of unsafe.Pointer",
|
||||
query: "(qualified_type type: (selector_expression field: (field_identifier) @f (#eq? @f \"Pointer\"))) @vuln",
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "md5_sha1",
|
||||
description: "crypto/md5 or crypto/sha1 usage",
|
||||
query: "(call_expression function: (selector_expression object: (identifier) @pkg (#match? @pkg \"md5|sha1\"))) @vuln",
|
||||
id: "go.memory.unsafe_pointer",
|
||||
description: "unsafe.Pointer — bypasses Go type system",
|
||||
query: r#"(call_expression
|
||||
function: (selector_expression
|
||||
operand: (identifier) @pkg (#eq? @pkg "unsafe")
|
||||
field: (field_identifier) @f (#eq? @f "Pointer")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: TLS misconfiguration ───────────────────────────────────
|
||||
Pattern {
|
||||
id: "hardcoded_secret",
|
||||
description: "Hard-coded string that looks like an API key/token",
|
||||
query: "(interpreted_string_literal) @s (#match? @s \"(?i)(api|secret|token|password)[=:]?[ \\t]*[A-Za-z0-9_\\-]{8,}\")",
|
||||
id: "go.transport.insecure_skip_verify",
|
||||
description: "InsecureSkipVerify: true — disables TLS certificate validation",
|
||||
query: r#"(keyed_element
|
||||
(literal_element
|
||||
(identifier) @k (#eq? @k "InsecureSkipVerify"))
|
||||
(literal_element (true)))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::InsecureTransport,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Weak crypto ────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "go.crypto.md5",
|
||||
description: "md5.New() / md5.Sum() — weak hash algorithm",
|
||||
query: r#"(call_expression
|
||||
function: (selector_expression
|
||||
operand: (identifier) @pkg (#eq? @pkg "md5")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "go.crypto.sha1",
|
||||
description: "sha1.New() / sha1.Sum() — weak hash algorithm",
|
||||
query: r#"(call_expression
|
||||
function: (selector_expression
|
||||
operand: (identifier) @pkg (#eq? @pkg "sha1")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier B: SQL injection (concatenation heuristic) ────────────────
|
||||
Pattern {
|
||||
id: "go.sqli.query_concat",
|
||||
description: "db.Query/Exec with concatenated string argument",
|
||||
query: r#"(call_expression
|
||||
function: (selector_expression
|
||||
field: (field_identifier) @f (#match? @f "^(Query|Exec|QueryRow)$"))
|
||||
arguments: (argument_list
|
||||
(binary_expression) @concat))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::B,
|
||||
category: PatternCategory::SqlInjection,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: Hardcoded secrets ──────────────────────────────────────
|
||||
Pattern {
|
||||
id: "go.secrets.hardcoded_key",
|
||||
description: "Variable with secret-like name assigned a string literal",
|
||||
query: r#"(short_var_declaration
|
||||
left: (expression_list
|
||||
(identifier) @name (#match? @name "(?i)(password|secret|api_?key|token|private_?key)"))
|
||||
right: (expression_list
|
||||
(interpreted_string_literal) @val))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Secrets,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Deserialization ────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "go.deser.gob_decode",
|
||||
description: "gob.NewDecoder — Go binary deserialization",
|
||||
query: r#"(call_expression
|
||||
function: (selector_expression
|
||||
operand: (identifier) @pkg (#eq? @pkg "gob")
|
||||
field: (field_identifier) @f (#eq? @f "NewDecoder")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Deserialization,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
];
|
||||
|
|
|
|||
|
|
@ -1,40 +1,116 @@
|
|||
use crate::patterns::{Pattern, Severity};
|
||||
use crate::evidence::Confidence;
|
||||
use crate::patterns::{Pattern, PatternCategory, PatternTier, Severity};
|
||||
|
||||
/// Java AST patterns.
|
||||
///
|
||||
/// Taint rules cover `Runtime.exec` (command injection) and
|
||||
/// `executeQuery`/`executeUpdate`/`prepareStatement` (SQL sinks).
|
||||
/// AST patterns here focus on **deserialization**, **reflection**,
|
||||
/// **SQL with concatenation** (Tier B heuristic), and **weak crypto**.
|
||||
pub const PATTERNS: &[Pattern] = &[
|
||||
// ── Tier A: Deserialization ────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "runtime_exec",
|
||||
description: "Runtime.getRuntime().exec(...) – arbitrary-command execution",
|
||||
query: "(method_invocation object: (method_invocation name: (identifier) @n (#eq? @n \"getRuntime\")) name: (identifier) @id (#eq? @id \"exec\")) @vuln",
|
||||
id: "java.deser.readobject",
|
||||
description: "ObjectInputStream.readObject() — unsafe deserialization",
|
||||
// Match any .readObject() call — the method name is specific enough.
|
||||
query: r#"(method_invocation
|
||||
name: (identifier) @id (#eq? @id "readObject"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Deserialization,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Command execution ──────────────────────────────────────
|
||||
Pattern {
|
||||
id: "class_for_name",
|
||||
description: "Dynamic reflection via Class.forName(...)",
|
||||
query: "(method_invocation object: (identifier) @c (#eq? @c \"Class\") name: (identifier) @id (#eq? @id \"forName\")) @vuln",
|
||||
severity: Severity::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "object_deserialization",
|
||||
description: "java.io.ObjectInputStream#readObject() deserialization",
|
||||
query: "(method_invocation object: (identifier) @o (#eq? @o \"ObjectInputStream\") name: (identifier) @id (#eq? @id \"readObject\")) @vuln",
|
||||
id: "java.cmdi.runtime_exec",
|
||||
description: "Runtime.getRuntime().exec() — shell command execution",
|
||||
query: r#"(method_invocation
|
||||
object: (method_invocation
|
||||
name: (identifier) @n (#eq? @n "getRuntime"))
|
||||
name: (identifier) @id (#eq? @id "exec"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Reflection ─────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "insecure_random",
|
||||
description: "java.util.Random used where SecureRandom is expected",
|
||||
query: "(object_creation_expression type: (identifier) @t (#eq? @t \"Random\")) @vuln",
|
||||
id: "java.reflection.class_forname",
|
||||
description: "Class.forName() — dynamic class loading",
|
||||
query: r#"(method_invocation
|
||||
object: (identifier) @c (#eq? @c "Class")
|
||||
name: (identifier) @id (#eq? @id "forName"))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Reflection,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "thread_stop",
|
||||
description: "Deprecated Thread.stop() invocation",
|
||||
query: "(method_invocation name: (identifier) @id (#eq? @id \"stop\") object: (identifier) @obj (#eq? @obj \"Thread\")) @vuln",
|
||||
id: "java.reflection.method_invoke",
|
||||
description: "Method.invoke() — reflective method invocation",
|
||||
query: r#"(method_invocation
|
||||
name: (identifier) @id (#eq? @id "invoke"))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Reflection,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier B: SQL injection (concatenation heuristic) ────────────────
|
||||
Pattern {
|
||||
id: "java.sqli.execute_concat",
|
||||
description: "SQL execute with concatenated string argument",
|
||||
query: r#"(method_invocation
|
||||
name: (identifier) @id (#match? @id "^execute(Query|Update)?$")
|
||||
arguments: (argument_list
|
||||
(binary_expression) @concat))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::B,
|
||||
category: PatternCategory::SqlInjection,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: Weak crypto ────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "java.crypto.insecure_random",
|
||||
description: "new Random() — java.util.Random is not cryptographically secure",
|
||||
query: r#"(object_creation_expression
|
||||
type: (type_identifier) @t (#eq? @t "Random"))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "sql_concat",
|
||||
description: "SQL built with string concatenation",
|
||||
query: "(method_invocation name: (identifier) @id (#match? @id \"execute(Query|Update)?\") arguments: (argument_list (binary_expression) @concat)) @vuln",
|
||||
id: "java.crypto.weak_digest",
|
||||
description: "MessageDigest.getInstance(\"MD5\"/\"SHA1\") — weak hash algorithm",
|
||||
query: r#"(method_invocation
|
||||
object: (identifier) @c (#eq? @c "MessageDigest")
|
||||
name: (identifier) @id (#eq? @id "getInstance")
|
||||
arguments: (argument_list
|
||||
(string_literal) @alg (#match? @alg "(?i)(md5|sha-?1)")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: XSS (servlet) ──────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "java.xss.getwriter_print",
|
||||
description: "response.getWriter().print/println — direct output without encoding",
|
||||
query: r#"(method_invocation
|
||||
object: (method_invocation
|
||||
name: (identifier) @gw (#eq? @gw "getWriter"))
|
||||
name: (identifier) @id (#match? @id "^(print|println|write)$"))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
];
|
||||
|
|
|
|||
|
|
@ -1,117 +1,182 @@
|
|||
use crate::patterns::{Pattern, Severity};
|
||||
use crate::evidence::Confidence;
|
||||
use crate::patterns::{Pattern, PatternCategory, PatternTier, Severity};
|
||||
|
||||
/// JavaScript AST patterns.
|
||||
///
|
||||
/// Taint rules cover `eval` (code injection), `innerHTML` (XSS),
|
||||
/// `location.href` (open redirect), and `child_process.exec/spawn` (command
|
||||
/// injection). AST patterns here add **new Function()**, **document.write**,
|
||||
/// **setTimeout with string**, **deserialization**, **prototype pollution**,
|
||||
/// **XSS sinks** not covered by taint, and **weak crypto**.
|
||||
pub const PATTERNS: &[Pattern] = &[
|
||||
// ── Tier A: Code execution ─────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "eval_call",
|
||||
description: "Use of eval()",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"eval\")) @vuln",
|
||||
id: "js.code_exec.eval",
|
||||
description: "eval() — dynamic code execution",
|
||||
query: r#"(call_expression
|
||||
function: (identifier) @id (#eq? @id "eval"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "new_function",
|
||||
description: "new Function() constructor",
|
||||
query: "(new_expression constructor: (identifier) @id (#eq? @id \"Function\")) @vuln",
|
||||
id: "js.code_exec.new_function",
|
||||
description: "new Function() constructor — eval equivalent",
|
||||
query: r#"(new_expression
|
||||
constructor: (identifier) @id (#eq? @id "Function"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "document_write",
|
||||
description: "document.write() call",
|
||||
query: "(call_expression function: (member_expression object: (identifier) @obj (#eq? @obj \"document\") property: (property_identifier) @prop (#eq? @prop \"write\"))) @vuln",
|
||||
id: "js.code_exec.settimeout_string",
|
||||
description: "setTimeout/setInterval with string argument — implicit eval",
|
||||
query: r#"(call_expression
|
||||
function: (identifier) @id (#match? @id "^(setTimeout|setInterval)$")
|
||||
arguments: (arguments (string) @code))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: XSS sinks ──────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "settimeout_string",
|
||||
description: "setTimeout / setInterval with a string argument",
|
||||
query: "(call_expression function: (identifier) @id (#match? @id \"setTimeout|setInterval\") arguments: (arguments (string) @code . _)) @vuln",
|
||||
id: "js.xss.document_write",
|
||||
description: "document.write() — XSS sink",
|
||||
query: r#"(call_expression
|
||||
function: (member_expression
|
||||
object: (identifier) @obj (#eq? @obj "document")
|
||||
property: (property_identifier) @prop (#match? @prop "^(write|writeln)$")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "json_parse",
|
||||
description: "JSON.parse on dynamic string",
|
||||
query: "(call_expression function: (member_expression object: (identifier) @obj (#eq? @obj \"JSON\") property: (property_identifier) @prop (#eq? @prop \"parse\"))) @vuln",
|
||||
id: "js.xss.outer_html",
|
||||
description: "Assignment to .outerHTML — XSS sink",
|
||||
query: r#"(assignment_expression
|
||||
left: (member_expression
|
||||
property: (property_identifier) @prop (#eq? @prop "outerHTML")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "js.xss.insert_adjacent_html",
|
||||
description: "insertAdjacentHTML() — XSS sink",
|
||||
query: r#"(call_expression
|
||||
function: (member_expression
|
||||
property: (property_identifier) @prop (#eq? @prop "insertAdjacentHTML")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Prototype pollution ────────────────────────────────────
|
||||
Pattern {
|
||||
id: "js.prototype.proto_assignment",
|
||||
description: "Assignment to __proto__ — prototype pollution",
|
||||
query: r#"(assignment_expression
|
||||
left: (member_expression
|
||||
property: (property_identifier) @prop (#eq? @prop "__proto__")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Prototype,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "js.prototype.extend_object",
|
||||
description: "Assignment to Object.prototype — prototype mutation",
|
||||
query: r#"(assignment_expression
|
||||
left: (member_expression
|
||||
object: (member_expression
|
||||
object: (identifier) @obj (#eq? @obj "Object")
|
||||
property: (property_identifier) @mid (#eq? @mid "prototype"))))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Prototype,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Weak crypto ────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "js.crypto.weak_hash",
|
||||
description: "crypto.createHash with weak algorithm (md5/sha1)",
|
||||
query: r#"(call_expression
|
||||
function: (member_expression
|
||||
property: (property_identifier) @prop (#eq? @prop "createHash"))
|
||||
arguments: (arguments
|
||||
(string) @alg (#match? @alg "\"(md5|sha1)\"")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "outer_html_assignment",
|
||||
description: "Assignment to element.outerHTML",
|
||||
query: "(assignment_expression
|
||||
left: (member_expression
|
||||
property: (property_identifier) @prop
|
||||
(#eq? @prop \"outerHTML\"))) @vuln",
|
||||
id: "js.crypto.math_random",
|
||||
description: "Math.random() — not cryptographically secure",
|
||||
query: r#"(call_expression
|
||||
function: (member_expression
|
||||
object: (identifier) @obj (#eq? @obj "Math")
|
||||
property: (property_identifier) @prop (#eq? @prop "random")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: Open redirect ──────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "js.xss.location_assign",
|
||||
description: "Assignment to location/location.href — open redirect",
|
||||
query: r#"(assignment_expression
|
||||
left: (member_expression
|
||||
object: (identifier) @obj (#match? @obj "^(window|location|document)$")
|
||||
property: (property_identifier) @prop (#match? @prop "^(location|href)$")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Insecure transport ─────────────────────────────────────
|
||||
Pattern {
|
||||
id: "insert_adjacent_html",
|
||||
description: "insertAdjacentHTML() call",
|
||||
query: "(call_expression
|
||||
function: (member_expression
|
||||
property: (property_identifier) @prop
|
||||
(#eq? @prop \"insertAdjacentHTML\"))) @vuln",
|
||||
severity: Severity::Medium,
|
||||
id: "js.transport.fetch_http",
|
||||
description: "fetch() over plain HTTP",
|
||||
query: r#"(call_expression
|
||||
function: (identifier) @id (#eq? @id "fetch")
|
||||
arguments: (arguments
|
||||
(string) @url (#match? @url "^\"http://")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::InsecureTransport,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: Cookie manipulation ────────────────────────────────────
|
||||
Pattern {
|
||||
id: "location_href_assignment",
|
||||
description: "Assignment to window.location / location.href",
|
||||
query: "(assignment_expression
|
||||
left: (member_expression
|
||||
object: (identifier) @obj
|
||||
(#match? @obj \"^(window|location|document|self|top|parent|frames)$\")
|
||||
property: (property_identifier) @prop
|
||||
(#match? @prop \"^(location|href)$\"))) @vuln",
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "cookie_assignment",
|
||||
id: "js.xss.cookie_write",
|
||||
description: "Write to document.cookie",
|
||||
query: "(assignment_expression
|
||||
left: (member_expression
|
||||
object: (identifier) @obj
|
||||
(#eq? @obj \"document\")
|
||||
property: (property_identifier) @prop
|
||||
(#eq? @prop \"cookie\"))) @vuln",
|
||||
severity: Severity::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "proto_pollution",
|
||||
description: "Assignment to __proto__ (prototype pollution)",
|
||||
query: "(assignment_expression
|
||||
left: (member_expression
|
||||
property: (property_identifier) @prop
|
||||
(#eq? @prop \"__proto__\"))) @vuln",
|
||||
severity: Severity::Low,
|
||||
},
|
||||
Pattern {
|
||||
id: "weak_hash_md5",
|
||||
description: "crypto.createHash(\"md5\")",
|
||||
query: "(call_expression
|
||||
function: (member_expression
|
||||
object: (identifier) @obj
|
||||
(#eq? @obj \"crypto\")
|
||||
property: (property_identifier) @prop
|
||||
(#eq? @prop \"createHash\"))
|
||||
arguments: (arguments
|
||||
(string) @alg
|
||||
(#eq? @alg \"md5\"))) @vuln",
|
||||
severity: Severity::Low,
|
||||
},
|
||||
Pattern {
|
||||
id: "regexp_constructor_string",
|
||||
description: "new RegExp() with a dynamic string",
|
||||
query: "(new_expression
|
||||
constructor: (identifier) @id
|
||||
(#eq? @id \"RegExp\")
|
||||
arguments: (arguments (string) @pattern)) @vuln",
|
||||
severity: Severity::Low,
|
||||
},
|
||||
Pattern {
|
||||
id: "dangerous_extend_builtin",
|
||||
description: "Extending Object.prototype (may lead to collisions/pollution)",
|
||||
query: "(assignment_expression
|
||||
left: (member_expression
|
||||
object: (identifier) @obj
|
||||
(#eq? @obj \"Object\")
|
||||
property: (property_identifier) @prop
|
||||
(#eq? @prop \"prototype\"))) @vuln",
|
||||
query: r#"(assignment_expression
|
||||
left: (member_expression
|
||||
object: (identifier) @obj (#eq? @obj "document")
|
||||
property: (property_identifier) @prop (#eq? @prop "cookie")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
];
|
||||
|
|
|
|||
|
|
@ -1,3 +1,43 @@
|
|||
//! # AST Pattern Conventions
|
||||
//!
|
||||
//! Each language file exports a `PATTERNS` slice of [`Pattern`] structs.
|
||||
//!
|
||||
//! ## ID format
|
||||
//!
|
||||
//! `<lang>.<category>.<specific>` — e.g. `java.deser.readobject`, `py.cmdi.os_system`.
|
||||
//!
|
||||
//! Language prefixes: `rs`, `java`, `py`, `js`, `ts`, `c`, `cpp`, `go`, `php`, `rb`.
|
||||
//!
|
||||
//! ## Tiers
|
||||
//!
|
||||
//! * **Tier A** — structural presence is high-signal (e.g. `gets()`, `eval()`).
|
||||
//! * **Tier B** — requires a heuristic guard in the query (e.g. SQL with concatenated
|
||||
//! arg, format-string with variable first arg).
|
||||
//!
|
||||
//! ## Severity
|
||||
//!
|
||||
//! * **High** — command exec, deserialization, banned C functions.
|
||||
//! * **Medium** — SQL concat, reflection, XSS sinks, casts.
|
||||
//! * **Low** — weak crypto, insecure randomness, code-quality (`unwrap`/`expect`/`panic`).
|
||||
//!
|
||||
//! Note: the default `min_severity` filter skips Low patterns; they only appear when
|
||||
//! the user explicitly lowers the threshold.
|
||||
//!
|
||||
//! ## No-duplicate rule
|
||||
//!
|
||||
//! If a vulnerability class is already detected by taint analysis (e.g. `eval` as a
|
||||
//! sink, `system` as a sink), the AST pattern is still kept for `--ast-only` mode but
|
||||
//! uses a distinct ID namespace (`js.code_exec.eval` vs `taint-unsanitised-flow`).
|
||||
//! The dedup pass in `ast.rs` prevents exact-duplicate findings at the same location.
|
||||
//!
|
||||
//! ## Adding a new pattern
|
||||
//!
|
||||
//! 1. Pick the language file under `src/patterns/<lang>.rs`.
|
||||
//! 2. Choose tier, category, severity per the rules above.
|
||||
//! 3. Write the tree-sitter query — test with `cargo test --test pattern_tests`.
|
||||
//! 4. Add a snippet to `tests/fixtures/patterns/<lang>/positive.<ext>`.
|
||||
//! 5. Add the ID to the positive test assertion in `tests/pattern_tests.rs`.
|
||||
|
||||
pub mod c;
|
||||
pub mod cpp;
|
||||
mod go;
|
||||
|
|
@ -9,6 +49,7 @@ mod ruby;
|
|||
pub mod rust;
|
||||
pub mod typescript;
|
||||
|
||||
use crate::evidence::Confidence;
|
||||
use console::style;
|
||||
use once_cell::sync::Lazy;
|
||||
use serde::{Deserialize, Serialize};
|
||||
|
|
@ -16,7 +57,7 @@ use std::collections::HashMap;
|
|||
use std::fmt;
|
||||
use std::str::FromStr;
|
||||
|
||||
#[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Serialize, Deserialize)]
|
||||
#[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash, Serialize, Deserialize)]
|
||||
pub enum Severity {
|
||||
High,
|
||||
Medium,
|
||||
|
|
@ -28,13 +69,14 @@ impl Severity {
|
|||
///
|
||||
/// Returns e.g. `"[HIGH] "` or `"[MEDIUM]"` — always 8 visible characters
|
||||
/// so the column after the tag lines up regardless of severity.
|
||||
#[allow(dead_code)] // public API for lib consumers
|
||||
pub fn colored_tag(self) -> String {
|
||||
// Visible widths: "[HIGH]" = 6, "[MEDIUM]" = 8, "[LOW]" = 5.
|
||||
// Pad the *whole* tag to 8 visible chars (the longest, "[MEDIUM]").
|
||||
let (label, styled_fn): (&str, fn(&str) -> String) = match self {
|
||||
Severity::High => ("HIGH", |s| style(s).red().bold().to_string()),
|
||||
Severity::Medium => ("MEDIUM", |s| style(s).yellow().bold().to_string()),
|
||||
Severity::Low => ("LOW", |s| style(s).cyan().bold().to_string()),
|
||||
Severity::Medium => ("MEDIUM", |s| style(s).color256(208).bold().to_string()),
|
||||
Severity::Low => ("LOW", |s| style(s).color256(67).to_string()),
|
||||
};
|
||||
let bracket_len = label.len() + 2; // "[" + label + "]"
|
||||
let pad = 8usize.saturating_sub(bracket_len);
|
||||
|
|
@ -46,8 +88,8 @@ impl fmt::Display for Severity {
|
|||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
let styled = match *self {
|
||||
Severity::High => style("HIGH").red().bold().to_string(),
|
||||
Severity::Medium => style("MEDIUM").yellow().bold().to_string(),
|
||||
Severity::Low => style("LOW").cyan().bold().to_string(),
|
||||
Severity::Medium => style("MEDIUM").color256(208).bold().to_string(),
|
||||
Severity::Low => style("LOW").color256(67).to_string(),
|
||||
};
|
||||
f.write_str(&styled)
|
||||
}
|
||||
|
|
@ -65,14 +107,132 @@ impl Severity {
|
|||
}
|
||||
|
||||
impl FromStr for Severity {
|
||||
// TODO: FIX
|
||||
type Err = ();
|
||||
type Err = String;
|
||||
|
||||
fn from_str(input: &str) -> Result<Self, Self::Err> {
|
||||
match input.to_lowercase().as_str() {
|
||||
"medium" => Ok(Severity::Medium),
|
||||
"high" => Ok(Severity::High),
|
||||
_ => Ok(Severity::Low),
|
||||
match input.trim().to_ascii_uppercase().as_str() {
|
||||
"HIGH" => Ok(Severity::High),
|
||||
"MEDIUM" | "MED" => Ok(Severity::Medium),
|
||||
"LOW" => Ok(Severity::Low),
|
||||
other => Err(format!("unknown severity: '{other}'")),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// A parsed severity filter expression.
|
||||
///
|
||||
/// Supports three forms:
|
||||
/// - Single level: `"HIGH"` — matches only that level
|
||||
/// - Comma list: `"HIGH,MEDIUM"` — matches any listed level
|
||||
/// - Threshold: `">=MEDIUM"` — matches that level and above
|
||||
///
|
||||
/// Parsing is case-insensitive and tolerates whitespace around tokens.
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub enum SeverityFilter {
|
||||
/// Match findings at or above this level (High >= Medium >= Low).
|
||||
AtLeast(Severity),
|
||||
/// Match findings whose severity is in this exact set.
|
||||
AnyOf(Vec<Severity>),
|
||||
}
|
||||
|
||||
impl SeverityFilter {
|
||||
/// Parse a severity filter expression.
|
||||
///
|
||||
/// Examples: `"HIGH"`, `"high,medium"`, `">=MEDIUM"`, `">= low"`.
|
||||
pub fn parse(expr: &str) -> Result<Self, String> {
|
||||
let trimmed = expr.trim();
|
||||
if trimmed.is_empty() {
|
||||
return Err("empty severity expression".into());
|
||||
}
|
||||
|
||||
// Threshold form: >=LEVEL
|
||||
if let Some(rest) = trimmed.strip_prefix(">=") {
|
||||
let level: Severity = rest.parse()?;
|
||||
return Ok(SeverityFilter::AtLeast(level));
|
||||
}
|
||||
|
||||
// Comma-separated list (also handles single value)
|
||||
let levels: Result<Vec<Severity>, String> = trimmed
|
||||
.split(',')
|
||||
.map(|tok| tok.trim().parse::<Severity>())
|
||||
.collect();
|
||||
let levels = levels?;
|
||||
if levels.is_empty() {
|
||||
return Err("empty severity expression".into());
|
||||
}
|
||||
// Optimise single-value list
|
||||
if levels.len() == 1 {
|
||||
return Ok(SeverityFilter::AnyOf(levels));
|
||||
}
|
||||
Ok(SeverityFilter::AnyOf(levels))
|
||||
}
|
||||
|
||||
/// Returns `true` if the given severity passes this filter.
|
||||
pub fn matches(&self, sev: Severity) -> bool {
|
||||
match self {
|
||||
SeverityFilter::AtLeast(threshold) => {
|
||||
// Severity ordering: High < Medium < Low (derived Ord).
|
||||
// "at least Medium" means sev <= Medium in Ord terms.
|
||||
sev <= *threshold
|
||||
}
|
||||
SeverityFilter::AnyOf(set) => set.contains(&sev),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Pattern confidence tier.
|
||||
///
|
||||
/// * **A** – Structural presence alone is high-signal (e.g. `gets()`, `eval()`).
|
||||
/// * **B** – Requires a simple heuristic guard in the query (e.g. SQL with
|
||||
/// concatenated arg, file-open with non-literal path).
|
||||
#[derive(Debug, Copy, Clone, Eq, PartialEq, Serialize, Deserialize)]
|
||||
pub enum PatternTier {
|
||||
A,
|
||||
B,
|
||||
}
|
||||
|
||||
/// High-level finding category for noise reduction and prioritization.
|
||||
#[derive(Debug, Copy, Clone, Eq, PartialEq, Hash, Serialize, Deserialize)]
|
||||
pub enum FindingCategory {
|
||||
Security,
|
||||
Reliability,
|
||||
Quality,
|
||||
}
|
||||
|
||||
impl std::fmt::Display for FindingCategory {
|
||||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
match self {
|
||||
FindingCategory::Security => write!(f, "Security"),
|
||||
FindingCategory::Reliability => write!(f, "Reliability"),
|
||||
FindingCategory::Quality => write!(f, "Quality"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Vulnerability class that a pattern detects.
|
||||
#[derive(Debug, Copy, Clone, Eq, PartialEq, Serialize, Deserialize)]
|
||||
pub enum PatternCategory {
|
||||
CommandExec,
|
||||
CodeExec,
|
||||
Deserialization,
|
||||
SqlInjection,
|
||||
PathTraversal,
|
||||
Xss,
|
||||
Crypto,
|
||||
Secrets,
|
||||
InsecureTransport,
|
||||
Reflection,
|
||||
MemorySafety,
|
||||
Prototype,
|
||||
CodeQuality,
|
||||
}
|
||||
|
||||
impl PatternCategory {
|
||||
/// Map this vulnerability class to a high-level finding category.
|
||||
pub fn finding_category(self) -> FindingCategory {
|
||||
match self {
|
||||
PatternCategory::CodeQuality => FindingCategory::Quality,
|
||||
_ => FindingCategory::Security,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -80,7 +240,7 @@ impl FromStr for Severity {
|
|||
/// One AST pattern with a tree-sitter query and meta-data.
|
||||
#[derive(Debug, Clone, Serialize, PartialEq)]
|
||||
pub struct Pattern {
|
||||
/// Unique identifier (snake-case preferred).
|
||||
/// Unique identifier — `<lang>.<category>.<specific>` preferred.
|
||||
pub id: &'static str,
|
||||
/// Human-readable explanation.
|
||||
pub description: &'static str,
|
||||
|
|
@ -88,6 +248,12 @@ pub struct Pattern {
|
|||
pub query: &'static str,
|
||||
/// Rough severity bucket.
|
||||
pub severity: Severity,
|
||||
/// Confidence tier (A = structural, B = heuristic-guarded).
|
||||
pub tier: PatternTier,
|
||||
/// Vulnerability class.
|
||||
pub category: PatternCategory,
|
||||
/// Confidence level for findings produced by this pattern.
|
||||
pub confidence: Confidence,
|
||||
}
|
||||
|
||||
/// Global, lazily-initialised registry: lang-name → pattern slice
|
||||
|
|
@ -164,3 +330,66 @@ fn load_returns_correct_pattern_slices() {
|
|||
|
||||
assert!(load("brainfuck").is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn severity_from_str_rejects_unknown() {
|
||||
assert!("garbage".parse::<Severity>().is_err());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn severity_filter_single() {
|
||||
let f = SeverityFilter::parse("HIGH").unwrap();
|
||||
assert!(f.matches(Severity::High));
|
||||
assert!(!f.matches(Severity::Medium));
|
||||
assert!(!f.matches(Severity::Low));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn severity_filter_comma_list() {
|
||||
let f = SeverityFilter::parse("HIGH,MEDIUM").unwrap();
|
||||
assert!(f.matches(Severity::High));
|
||||
assert!(f.matches(Severity::Medium));
|
||||
assert!(!f.matches(Severity::Low));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn severity_filter_threshold() {
|
||||
let f = SeverityFilter::parse(">=MEDIUM").unwrap();
|
||||
assert!(f.matches(Severity::High));
|
||||
assert!(f.matches(Severity::Medium));
|
||||
assert!(!f.matches(Severity::Low));
|
||||
|
||||
let f2 = SeverityFilter::parse(">=LOW").unwrap();
|
||||
assert!(f2.matches(Severity::High));
|
||||
assert!(f2.matches(Severity::Medium));
|
||||
assert!(f2.matches(Severity::Low));
|
||||
|
||||
let f3 = SeverityFilter::parse(">=HIGH").unwrap();
|
||||
assert!(f3.matches(Severity::High));
|
||||
assert!(!f3.matches(Severity::Medium));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn severity_filter_case_insensitive_and_whitespace() {
|
||||
let f = SeverityFilter::parse(" high , medium ").unwrap();
|
||||
assert!(f.matches(Severity::High));
|
||||
assert!(f.matches(Severity::Medium));
|
||||
assert!(!f.matches(Severity::Low));
|
||||
|
||||
let f2 = SeverityFilter::parse(">= medium").unwrap();
|
||||
assert!(f2.matches(Severity::High));
|
||||
assert!(f2.matches(Severity::Medium));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn severity_filter_rejects_empty() {
|
||||
assert!(SeverityFilter::parse("").is_err());
|
||||
assert!(SeverityFilter::parse(" ").is_err());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn severity_filter_rejects_invalid_level() {
|
||||
assert!(SeverityFilter::parse("CRITICAL").is_err());
|
||||
assert!(SeverityFilter::parse("HIGH,CRITICAL").is_err());
|
||||
assert!(SeverityFilter::parse(">=BOGUS").is_err());
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1,40 +1,144 @@
|
|||
use crate::patterns::{Pattern, Severity};
|
||||
use crate::evidence::Confidence;
|
||||
use crate::patterns::{Pattern, PatternCategory, PatternTier, Severity};
|
||||
|
||||
/// PHP AST patterns.
|
||||
///
|
||||
/// Taint rules cover `system`/`exec`/`passthru`/`shell_exec` (command
|
||||
/// injection), `echo`/`print` (XSS sinks), and `mysqli_query`/`pg_query`
|
||||
/// (SQL sinks). AST patterns here focus on **eval**, **deserialization**,
|
||||
/// **deprecated dangerous functions**, **include with variable**, and
|
||||
/// **SQL concatenation** (Tier B).
|
||||
pub const PATTERNS: &[Pattern] = &[
|
||||
// ── Tier A: Code execution ─────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "eval_call",
|
||||
description: "eval($code) execution",
|
||||
query: "(function_call_expression function: (name) @n (#eq? @n \"eval\")) @vuln",
|
||||
id: "php.code_exec.eval",
|
||||
description: "eval() — dynamic code execution",
|
||||
query: r#"(function_call_expression
|
||||
function: (name) @n (#eq? @n "eval"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "preg_replace_e",
|
||||
description: "preg_replace with deprecated /e modifier",
|
||||
query: "(function_call_expression function: (name) @n (#eq? @n \"preg_replace\") arguments: (arguments (string) @pat (#match? @pat \"/.*e.*$/\"))) @vuln",
|
||||
id: "php.code_exec.create_function",
|
||||
description: "create_function() — deprecated eval-like constructor",
|
||||
query: r#"(function_call_expression
|
||||
function: (name) @n (#eq? @n "create_function"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "create_function",
|
||||
description: "create_function(...) anonymous eval-like",
|
||||
query: "(function_call_expression function: (name) @n (#eq? @n \"create_function\")) @vuln",
|
||||
severity: Severity::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "unserialize_call",
|
||||
description: "unserialize(...) on user input",
|
||||
query: "(function_call_expression function: (name) @n (#eq? @n \"unserialize\")) @vuln",
|
||||
id: "php.code_exec.preg_replace_e",
|
||||
description: "preg_replace with /e modifier — code execution via regex",
|
||||
query: r#"(function_call_expression
|
||||
function: (name) @n (#eq? @n "preg_replace")
|
||||
arguments: (arguments
|
||||
(argument
|
||||
(string) @pat (#match? @pat "/[^/]*/[a-zA-Z]*e"))))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "mysql_query_concat",
|
||||
description: "mysql_query with concatenated SQL",
|
||||
query: "(function_call_expression function: (name) @n (#eq? @n \"mysql_query\") arguments: (arguments (binary_expression) @concat)) @vuln",
|
||||
id: "php.code_exec.assert_string",
|
||||
description: "assert() with string argument — evaluates PHP code",
|
||||
query: r#"(function_call_expression
|
||||
function: (name) @n (#eq? @n "assert")
|
||||
arguments: (arguments
|
||||
(argument (string) @code)))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Command execution ──────────────────────────────────────
|
||||
Pattern {
|
||||
id: "php.cmdi.system",
|
||||
description: "system/shell_exec/exec/passthru — shell command execution",
|
||||
query: r#"(function_call_expression
|
||||
function: (name) @n (#match? @n "^(system|shell_exec|exec|passthru|proc_open|popen)$"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Deserialization ────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "php.deser.unserialize",
|
||||
description: "unserialize() — PHP object injection",
|
||||
query: r#"(function_call_expression
|
||||
function: (name) @n (#eq? @n "unserialize"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Deserialization,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier B: SQL injection (concatenation heuristic) ────────────────
|
||||
Pattern {
|
||||
id: "php.sqli.query_concat",
|
||||
description: "mysql_query/mysqli_query with concatenated SQL string",
|
||||
query: r#"(function_call_expression
|
||||
function: (name) @n (#match? @n "^(mysql_query|mysqli_query)$")
|
||||
arguments: (arguments
|
||||
(argument (binary_expression) @concat)))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::B,
|
||||
category: PatternCategory::SqlInjection,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier B: Path traversal (include with variable) ─────────────────
|
||||
Pattern {
|
||||
id: "php.path.include_variable",
|
||||
description: "include/require with variable path — file inclusion vulnerability",
|
||||
query: r#"(include_expression (variable_name)) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::B,
|
||||
category: PatternCategory::PathTraversal,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: Crypto ─────────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "php.crypto.md5",
|
||||
description: "md5() — weak hash function",
|
||||
query: r#"(function_call_expression
|
||||
function: (name) @n (#eq? @n "md5"))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "system_call",
|
||||
description: "system()/shell_exec()/exec() command execution",
|
||||
query: "(function_call_expression function: (name) @n (#match? @n \"system|shell_exec|exec|passthru\")) @vuln",
|
||||
severity: Severity::Medium,
|
||||
id: "php.crypto.sha1",
|
||||
description: "sha1() — weak hash function",
|
||||
query: r#"(function_call_expression
|
||||
function: (name) @n (#eq? @n "sha1"))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "php.crypto.rand",
|
||||
description: "rand()/mt_rand() — not cryptographically secure",
|
||||
query: r#"(function_call_expression
|
||||
function: (name) @n (#match? @n "^(rand|mt_rand)$"))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
];
|
||||
|
|
|
|||
|
|
@ -1,22 +1,178 @@
|
|||
use crate::patterns::{Pattern, Severity};
|
||||
use crate::evidence::Confidence;
|
||||
use crate::patterns::{Pattern, PatternCategory, PatternTier, Severity};
|
||||
|
||||
/// Python AST patterns.
|
||||
///
|
||||
/// Taint rules cover `eval`/`exec`, `os.system`/`os.popen`/`subprocess.*`,
|
||||
/// and `cursor.execute`. AST patterns here add coverage for **deserialization**,
|
||||
/// **subprocess shell=True** (Tier B — taint doesn't check keyword args), and
|
||||
/// **code execution** sinks that taint cannot structurally verify.
|
||||
pub const PATTERNS: &[Pattern] = &[
|
||||
// ── Tier A: Code execution ─────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "eval_call",
|
||||
description: "eval() on dynamic input",
|
||||
query: "(call function: (identifier) @id (#eq? @id \"eval\")) @vuln",
|
||||
id: "py.code_exec.eval",
|
||||
description: "eval() — dynamic code execution",
|
||||
query: r#"(call function: (identifier) @id (#eq? @id "eval")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "exec_call",
|
||||
description: "exec(...) execution of dynamic code",
|
||||
query: "(call function: (identifier) @id (#eq? @id \"exec\")) @vuln",
|
||||
id: "py.code_exec.exec",
|
||||
description: "exec() — dynamic code execution",
|
||||
query: r#"(call function: (identifier) @id (#eq? @id "exec")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "subprocess_shell_true",
|
||||
description: "subprocess.* with shell=True",
|
||||
query: "(call function: (attribute object: (identifier) @pkg (#eq? @pkg \"subprocess\")) arguments: (argument_list . (keyword_argument name: (identifier) @k (#eq? @k \"shell\")) (true) @val)) @vuln",
|
||||
id: "py.code_exec.compile",
|
||||
description: "compile() with exec/eval mode — code compilation from string",
|
||||
query: r#"(call function: (identifier) @id (#eq? @id "compile")) @vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Command execution ──────────────────────────────────────
|
||||
Pattern {
|
||||
id: "py.cmdi.os_system",
|
||||
description: "os.system() — shell command execution",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
object: (identifier) @pkg (#eq? @pkg "os")
|
||||
attribute: (identifier) @fn (#eq? @fn "system")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "py.cmdi.os_popen",
|
||||
description: "os.popen() — shell command execution",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
object: (identifier) @pkg (#eq? @pkg "os")
|
||||
attribute: (identifier) @fn (#eq? @fn "popen")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier B: subprocess with shell=True ─────────────────────────────
|
||||
Pattern {
|
||||
id: "py.cmdi.subprocess_shell",
|
||||
description: "subprocess call with shell=True",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
object: (identifier) @pkg (#eq? @pkg "subprocess"))
|
||||
arguments: (argument_list
|
||||
(keyword_argument
|
||||
name: (identifier) @k (#eq? @k "shell")
|
||||
value: (true))))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::B,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: Deserialization ────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "py.deser.pickle_loads",
|
||||
description: "pickle.loads/load — arbitrary object deserialization",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
object: (identifier) @pkg (#eq? @pkg "pickle")
|
||||
attribute: (identifier) @fn (#match? @fn "^loads?$")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Deserialization,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "py.deser.yaml_load",
|
||||
description: "yaml.load() without SafeLoader — arbitrary object instantiation",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
object: (identifier) @pkg (#eq? @pkg "yaml")
|
||||
attribute: (identifier) @fn (#eq? @fn "load")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Deserialization,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "py.deser.shelve_open",
|
||||
description: "shelve.open() — pickle-backed deserialization",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
object: (identifier) @pkg (#eq? @pkg "shelve")
|
||||
attribute: (identifier) @fn (#eq? @fn "open")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Deserialization,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier B: SQL injection (format/concat heuristic) ────────────────
|
||||
Pattern {
|
||||
id: "py.sqli.execute_format",
|
||||
description: "cursor.execute with string concatenation — SQL injection risk",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
attribute: (identifier) @fn (#eq? @fn "execute"))
|
||||
arguments: (argument_list
|
||||
(binary_operator) @arg))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::B,
|
||||
category: PatternCategory::SqlInjection,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: Weak crypto ────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "py.crypto.md5",
|
||||
description: "hashlib.md5() — weak hash algorithm",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
object: (identifier) @pkg (#eq? @pkg "hashlib")
|
||||
attribute: (identifier) @fn (#eq? @fn "md5")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "py.crypto.sha1",
|
||||
description: "hashlib.sha1() — weak hash algorithm",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
object: (identifier) @pkg (#eq? @pkg "hashlib")
|
||||
attribute: (identifier) @fn (#eq? @fn "sha1")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: Template injection ─────────────────────────────────────
|
||||
Pattern {
|
||||
id: "py.xss.jinja_from_string",
|
||||
description: "jinja2.Template from string — potential template injection",
|
||||
query: r#"(call
|
||||
function: (attribute
|
||||
attribute: (identifier) @fn (#eq? @fn "from_string")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
];
|
||||
|
|
|
|||
|
|
@ -1,133 +1,141 @@
|
|||
use crate::patterns::{Pattern, Severity};
|
||||
use crate::evidence::Confidence;
|
||||
use crate::patterns::{Pattern, PatternCategory, PatternTier, Severity};
|
||||
|
||||
/// Ruby AST patterns.
|
||||
///
|
||||
/// Taint rules cover `system`/`exec` (command injection), `eval` (code
|
||||
/// execution), and `puts`/`print` (output sinks). AST patterns here focus on
|
||||
/// **deserialization** (YAML.load, Marshal.load), **instance_eval/class_eval**,
|
||||
/// **backtick shell**, **send with dynamic arg**, and **constantize**.
|
||||
pub const PATTERNS: &[Pattern] = &[
|
||||
// ---------- Runtime code-execution primitives ----------
|
||||
// ── Tier A: Code execution ─────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "eval_call",
|
||||
description: "Kernel#eval usage",
|
||||
query: r#"
|
||||
(call
|
||||
(identifier) @id
|
||||
(#eq? @id "eval")
|
||||
) @vuln
|
||||
"#,
|
||||
id: "rb.code_exec.eval",
|
||||
description: "Kernel#eval — dynamic code execution",
|
||||
query: r#"(call (identifier) @id (#eq? @id "eval")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "instance_eval_call",
|
||||
description: "Object#instance_eval usage",
|
||||
query: r#"
|
||||
(call
|
||||
(identifier) @id
|
||||
(#eq? @id "instance_eval")
|
||||
) @vuln
|
||||
"#,
|
||||
id: "rb.code_exec.instance_eval",
|
||||
description: "instance_eval — evaluates string in object context",
|
||||
query: r#"(call
|
||||
method: (identifier) @id (#eq? @id "instance_eval"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "class_eval_call",
|
||||
description: "Module#class_eval / module_eval usage",
|
||||
query: r#"
|
||||
(call
|
||||
(identifier) @id
|
||||
(#match? @id "^(class_eval|module_eval)$")
|
||||
) @vuln
|
||||
"#,
|
||||
id: "rb.code_exec.class_eval",
|
||||
description: "class_eval / module_eval — evaluates string in class context",
|
||||
query: r#"(call
|
||||
method: (identifier) @id (#match? @id "^(class_eval|module_eval)$"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ---------- Shell execution ----------
|
||||
// ── Tier A: Command execution ──────────────────────────────────────
|
||||
Pattern {
|
||||
id: "system_exec_interp",
|
||||
description: "system/exec with string interpolation",
|
||||
query: r#"
|
||||
(call
|
||||
method: (identifier) @m
|
||||
(#match? @m "^(system|exec)$")
|
||||
arguments: (argument_list
|
||||
(string
|
||||
(interpolation)+ @vuln
|
||||
)
|
||||
)
|
||||
)
|
||||
"#,
|
||||
id: "rb.cmdi.backtick",
|
||||
description: "Backtick shell execution",
|
||||
query: r#"(subshell) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Shell execution ─────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "rb.cmdi.system_interp",
|
||||
description: "system/exec call — command execution risk",
|
||||
query: r#"(call
|
||||
method: (identifier) @m (#match? @m "^(system|exec)$"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CommandExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Deserialization ────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "rb.deser.yaml_load",
|
||||
description: "YAML.load — arbitrary object deserialization (use safe_load instead)",
|
||||
query: r#"(call
|
||||
receiver: (constant) @recv (#match? @recv "^(YAML|Psych)$")
|
||||
method: (identifier) @m (#eq? @m "load"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Deserialization,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "backtick_command",
|
||||
description: "Back-tick shell execution",
|
||||
// `uname -a`
|
||||
query: r#"(shell_command) @vuln"#,
|
||||
id: "rb.deser.marshal_load",
|
||||
description: "Marshal.load — arbitrary Ruby object deserialization",
|
||||
query: r#"(call
|
||||
receiver: (constant) @recv (#eq? @recv "Marshal")
|
||||
method: (identifier) @m (#eq? @m "load"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Deserialization,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ---------- Dangerous deserialisation ----------
|
||||
// ── Tier A: Reflection ─────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "yaml_load",
|
||||
description: "YAML.load / Psych.load (arbitrary object deserialisation)",
|
||||
query: r#"
|
||||
(call
|
||||
receiver: (constant) @recv
|
||||
(#match? @recv "^(YAML|Psych)$")
|
||||
method: (identifier) @m
|
||||
(#eq? @m "load")
|
||||
) @vuln
|
||||
"#,
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "marshal_load",
|
||||
description: "Marshal.load usage",
|
||||
query: r#"
|
||||
(call
|
||||
receiver: (constant) @recv
|
||||
(#eq? @recv "Marshal")
|
||||
method: (identifier) @m
|
||||
(#eq? @m "load")
|
||||
) @vuln
|
||||
"#,
|
||||
severity: Severity::High,
|
||||
},
|
||||
// ---------- Reflection / meta-programming ----------
|
||||
Pattern {
|
||||
id: "send_dynamic",
|
||||
description: "send() with dynamic first argument (not a literal symbol)",
|
||||
query: r#"
|
||||
(call
|
||||
method: (identifier) @m
|
||||
(#eq? @m "send")
|
||||
arguments: (argument_list
|
||||
[
|
||||
(identifier) ; send(method_name_var, …)
|
||||
(string (interpolation)+) ; send("user_#{role}", …)
|
||||
] @vuln
|
||||
)
|
||||
)
|
||||
id: "rb.reflection.send_dynamic",
|
||||
description: "send() with non-symbol argument — arbitrary method dispatch",
|
||||
query: r#"(call
|
||||
method: (identifier) @m (#eq? @m "send")
|
||||
arguments: (argument_list
|
||||
[(identifier) (string (interpolation)+)] @vuln))
|
||||
"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::B,
|
||||
category: PatternCategory::Reflection,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "constantize_call",
|
||||
description: "ActiveSupport constantize / safe_constantize on tainted data",
|
||||
query: r#"
|
||||
(call
|
||||
method: (identifier) @m
|
||||
(#match? @m "^(constantize|safe_constantize)$")
|
||||
) @vuln
|
||||
"#,
|
||||
id: "rb.reflection.constantize",
|
||||
description: "constantize / safe_constantize — dynamic class resolution",
|
||||
query: r#"(call
|
||||
method: (identifier) @m (#match? @m "^(constantize|safe_constantize)$"))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Reflection,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ---------- Insecure resource access ----------
|
||||
// ── Tier A: SSRF ───────────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "open_uri_http",
|
||||
description: "Kernel#open with HTTP(S) URL (open-uri auto-follow)",
|
||||
query: r#"
|
||||
(call
|
||||
method: (identifier) @m
|
||||
(#eq? @m "open")
|
||||
arguments: (argument_list
|
||||
(string) @url
|
||||
(#match? @url "^\"https?://")
|
||||
)
|
||||
) @vuln
|
||||
"#,
|
||||
id: "rb.ssrf.open_uri",
|
||||
description: "Kernel#open with HTTP URL — SSRF via open-uri",
|
||||
query: r#"(call
|
||||
method: (identifier) @m (#eq? @m "open")
|
||||
arguments: (argument_list
|
||||
(string) @url (#match? @url "^\"https?://")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::InsecureTransport,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Crypto ─────────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "rb.crypto.md5",
|
||||
description: "Digest::MD5 — weak hash algorithm",
|
||||
query: r#"(scope_resolution
|
||||
name: (constant) @c (#eq? @c "MD5"))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
];
|
||||
|
|
|
|||
|
|
@ -1,118 +1,170 @@
|
|||
use crate::patterns::{Pattern, Severity};
|
||||
use crate::evidence::Confidence;
|
||||
use crate::patterns::{Pattern, PatternCategory, PatternTier, Severity};
|
||||
|
||||
/// Rust AST patterns.
|
||||
///
|
||||
/// Rust taint rules already cover `Command::new`/`arg`/`status`/`output` sinks
|
||||
/// and `env::var` / `fs::read_to_string` sources, so we do NOT duplicate those.
|
||||
/// Patterns here focus on **unsafe memory**, **panicking APIs**, and structural
|
||||
/// code-quality signals specific to Rust.
|
||||
pub const PATTERNS: &[Pattern] = &[
|
||||
// ── Tier A: Memory Safety (unsafe) ─────────────────────────────────
|
||||
Pattern {
|
||||
id: "unsafe_block",
|
||||
description: "Use of an `unsafe` block",
|
||||
id: "rs.memory.transmute",
|
||||
description: "std::mem::transmute — unchecked type reinterpretation",
|
||||
query: r#"(call_expression
|
||||
function: (scoped_identifier
|
||||
path: (identifier) @p (#eq? @p "mem")
|
||||
name: (identifier) @f (#eq? @f "transmute")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "rs.memory.copy_nonoverlapping",
|
||||
description: "ptr::copy_nonoverlapping — raw pointer memcpy",
|
||||
query: r#"(call_expression
|
||||
function: (scoped_identifier
|
||||
path: (identifier) @p (#eq? @p "ptr")
|
||||
name: (identifier) @f (#eq? @f "copy_nonoverlapping")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "rs.memory.get_unchecked",
|
||||
description: "get_unchecked / get_unchecked_mut — unchecked indexing",
|
||||
query: r#"(call_expression
|
||||
function: (field_expression
|
||||
field: (field_identifier) @m
|
||||
(#match? @m "^get_unchecked(_mut)?$")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "rs.memory.mem_zeroed",
|
||||
description: "std::mem::zeroed — zero-initialised memory may be UB for non-POD types",
|
||||
query: r#"(call_expression
|
||||
function: (scoped_identifier
|
||||
path: (identifier) @p (#eq? @p "mem")
|
||||
name: (identifier) @n (#eq? @n "zeroed")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "rs.memory.ptr_read",
|
||||
description: "ptr::read / ptr::read_volatile — raw pointer dereference",
|
||||
query: r#"(call_expression
|
||||
function: (scoped_identifier
|
||||
path: (identifier) @p (#eq? @p "ptr")
|
||||
name: (identifier) @n (#match? @n "^read(_volatile)?$")))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Code quality / robustness ──────────────────────────────
|
||||
Pattern {
|
||||
id: "rs.quality.unsafe_block",
|
||||
description: "unsafe block — manual memory safety obligation",
|
||||
query: "(unsafe_block) @vuln",
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "unsafe_fn",
|
||||
description: "`unsafe fn` declaration",
|
||||
query: "(function_item
|
||||
(function_modifiers) @mods
|
||||
(#match? @mods \"^unsafe\\b\")) @vuln",
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "transmute_call",
|
||||
description: "`std::mem::transmute` call",
|
||||
query: "(call_expression
|
||||
function: (scoped_identifier
|
||||
path: (identifier) @p (#eq? @p \"mem\")
|
||||
name: (identifier) @f (#eq? @f \"transmute\")))
|
||||
@vuln",
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "copy_nonoverlapping",
|
||||
description: "Raw pointer `copy_nonoverlapping`",
|
||||
query: "(call_expression
|
||||
function: (scoped_identifier
|
||||
path: (identifier) @p (#eq? @p \"ptr\")
|
||||
name: (identifier) @f (#eq? @f \"copy_nonoverlapping\")))
|
||||
@vuln",
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "get_unchecked",
|
||||
description: "`get_unchecked` / `get_unchecked_mut` slice access",
|
||||
query: "(call_expression
|
||||
function: (field_expression
|
||||
field: (field_identifier) @m
|
||||
(#match? @m \"get_unchecked(_mut)?\"))) @vuln",
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "unwrap_call",
|
||||
description: "`.unwrap()` call (may panic)",
|
||||
query: "(call_expression
|
||||
function: (field_expression
|
||||
field: (field_identifier) @name
|
||||
(#eq? @name \"unwrap\"))) ; exact match
|
||||
@vuln",
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "expect_call",
|
||||
description: "`.expect()` call (may panic)",
|
||||
query: "(call_expression
|
||||
function: (field_expression
|
||||
field: (field_identifier) @name
|
||||
(#eq? @name \"expect\"))) @vuln",
|
||||
id: "rs.quality.unsafe_fn",
|
||||
description: "unsafe fn declaration",
|
||||
query: r#"(function_item
|
||||
(function_modifiers) @mods
|
||||
(#match? @mods "^unsafe"))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "panic_macro",
|
||||
description: "`panic!` macro invocation",
|
||||
query: "(macro_invocation (identifier) @id (#eq? @id \"panic\")) @vuln",
|
||||
severity: Severity::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "todo_or_unimplemented",
|
||||
description: "`todo!()` / `unimplemented!()` placeholder",
|
||||
query: "(macro_invocation
|
||||
(identifier) @id
|
||||
(#match? @id \"todo|unimplemented\")) @vuln",
|
||||
id: "rs.quality.unwrap",
|
||||
description: ".unwrap() — panics on None/Err",
|
||||
query: r#"(call_expression
|
||||
function: (field_expression
|
||||
field: (field_identifier) @name (#eq? @name "unwrap")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeQuality,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "narrow_cast_with_as",
|
||||
description: "`as` cast to an 8-/16-bit integer (possible truncation)",
|
||||
query: "(type_cast_expression
|
||||
type: (primitive_type) @to
|
||||
(#match? @to \"^u?i(8|16)$\")) @vuln",
|
||||
id: "rs.quality.expect",
|
||||
description: ".expect() — panics on None/Err",
|
||||
query: r#"(call_expression
|
||||
function: (field_expression
|
||||
field: (field_identifier) @name (#eq? @name "expect")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeQuality,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "mem_zeroed",
|
||||
description: "`std::mem::zeroed()`",
|
||||
query: "(call_expression function:(scoped_identifier path:(identifier)@p (#eq? @p \"mem\") name:(identifier)@n (#eq? @n \"zeroed\")))@vuln",
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "mem_forget",
|
||||
description: "`std::mem::forget()`",
|
||||
query: "(call_expression function:(scoped_identifier path:(identifier)@p (#eq? @p \"mem\") name:(identifier)@n (#eq? @n \"forget\")))@vuln",
|
||||
severity: Severity::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "ptr_read",
|
||||
description: "`ptr::read_*` raw-ptr read",
|
||||
query: "(call_expression function:(scoped_identifier path:(identifier)@p (#eq? @p \"ptr\") name:(identifier)@n (#match? @n \"read(_volatile)?\")))@vuln",
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "arc_unwrap",
|
||||
description: "`Arc::unwrap_or_else_unchecked`",
|
||||
query: "(call_expression function:(scoped_identifier name:(identifier)@n (#eq? @n \"unwrap_or_else_unchecked\")))@vuln",
|
||||
severity: Severity::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "dbg_macro",
|
||||
description: "`dbg!()` left in code",
|
||||
query: "(macro_invocation (identifier)@id (#eq? @id \"dbg\"))@vuln",
|
||||
id: "rs.quality.panic_macro",
|
||||
description: "panic! macro invocation",
|
||||
query: r#"(macro_invocation (identifier) @id (#eq? @id "panic")) @vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeQuality,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "rs.quality.todo",
|
||||
description: "todo!() / unimplemented!() placeholder left in code",
|
||||
query: r#"(macro_invocation
|
||||
(identifier) @id
|
||||
(#match? @id "^(todo|unimplemented)$"))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeQuality,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Narrowing cast ─────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "rs.memory.narrow_cast",
|
||||
description: "`as` cast to 8/16-bit integer — possible truncation",
|
||||
query: r#"(type_cast_expression
|
||||
type: (primitive_type) @to
|
||||
(#match? @to "^(u8|i8|u16|i16)$"))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "rs.memory.mem_forget",
|
||||
description: "std::mem::forget — may leak resources",
|
||||
query: r#"(call_expression
|
||||
function: (scoped_identifier
|
||||
path: (identifier) @p (#eq? @p "mem")
|
||||
name: (identifier) @n (#eq? @n "forget")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::MemorySafety,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
];
|
||||
|
|
|
|||
|
|
@ -1,100 +1,157 @@
|
|||
use crate::patterns::{Pattern, Severity};
|
||||
use crate::evidence::Confidence;
|
||||
use crate::patterns::{Pattern, PatternCategory, PatternTier, Severity};
|
||||
|
||||
/// TypeScript AST patterns.
|
||||
///
|
||||
/// TypeScript shares most patterns with JavaScript. Taint rules cover `eval`,
|
||||
/// `innerHTML`, and `child_process.*` sinks. AST patterns here mirror JS
|
||||
/// patterns plus TS-specific `any` type-safety escapes.
|
||||
pub const PATTERNS: &[Pattern] = &[
|
||||
// ── Tier A: Code execution ─────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "eval_call",
|
||||
description: "Use of eval()",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"eval\")) @vuln",
|
||||
id: "ts.code_exec.eval",
|
||||
description: "eval() — dynamic code execution",
|
||||
query: r#"(call_expression
|
||||
function: (identifier) @id (#eq? @id "eval"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "new_function",
|
||||
description: "new Function() constructor",
|
||||
query: "(new_expression constructor: (identifier) @id (#eq? @id \"Function\")) @vuln",
|
||||
id: "ts.code_exec.new_function",
|
||||
description: "new Function() constructor — eval equivalent",
|
||||
query: r#"(new_expression
|
||||
constructor: (identifier) @id (#eq? @id "Function"))
|
||||
@vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "document_write",
|
||||
description: "document.write() call",
|
||||
query: "(call_expression function: (member_expression object: (identifier) @obj (#eq? @obj \"document\") property: (property_identifier) @prop (#eq? @prop \"write\"))) @vuln",
|
||||
id: "ts.code_exec.settimeout_string",
|
||||
description: "setTimeout/setInterval with string argument — implicit eval",
|
||||
query: r#"(call_expression
|
||||
function: (identifier) @id (#match? @id "^(setTimeout|setInterval)$")
|
||||
arguments: (arguments (string) @code))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeExec,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: XSS sinks ──────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "settimeout_string",
|
||||
description: "setTimeout / setInterval with a string argument",
|
||||
query: "(call_expression function: (identifier) @id (#match? @id \"setTimeout|setInterval\") arguments: (arguments (string) @code . _)) @vuln",
|
||||
id: "ts.xss.document_write",
|
||||
description: "document.write() — XSS sink",
|
||||
query: r#"(call_expression
|
||||
function: (member_expression
|
||||
object: (identifier) @obj (#eq? @obj "document")
|
||||
property: (property_identifier) @prop (#match? @prop "^(write|writeln)$")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "any_type",
|
||||
description: "Type annotation of `any`",
|
||||
query: "(type_annotation (predefined_type) @t (#eq? @t \"any\")) @vuln",
|
||||
severity: Severity::Low,
|
||||
},
|
||||
Pattern {
|
||||
id: "json_parse",
|
||||
description: "JSON.parse on dynamic string",
|
||||
query: "(call_expression function: (member_expression object: (identifier) @obj (#eq? @obj \"JSON\") property: (property_identifier) @prop (#eq? @prop \"parse\"))) @vuln",
|
||||
severity: Severity::Low,
|
||||
},
|
||||
Pattern {
|
||||
id: "as_any_assertion",
|
||||
description: "Type assertion to `any` using `as any`",
|
||||
query: "(as_expression type: (predefined_type) @t (#eq? @t \"any\")) @vuln",
|
||||
severity: Severity::Low,
|
||||
},
|
||||
Pattern {
|
||||
id: "type_assertion_any",
|
||||
description: "Type assertion to `any` using `<any>` syntax",
|
||||
query: "(type_assertion type: (predefined_type) @t (#eq? @t \"any\")) @vuln",
|
||||
severity: Severity::Low,
|
||||
},
|
||||
Pattern {
|
||||
id: "outer_html_assignment",
|
||||
description: "Assignment to element.outerHTML",
|
||||
query: "(assignment_expression left: (member_expression property: (property_identifier) @prop (#eq? @prop \"outerHTML\"))) @vuln",
|
||||
id: "ts.xss.outer_html",
|
||||
description: "Assignment to .outerHTML — XSS sink",
|
||||
query: r#"(assignment_expression
|
||||
left: (member_expression
|
||||
property: (property_identifier) @prop (#eq? @prop "outerHTML")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
Pattern {
|
||||
id: "insert_adjacent_html",
|
||||
description: "insertAdjacentHTML() call",
|
||||
query: "(call_expression function: (member_expression property: (property_identifier) @prop (#eq? @prop \"insertAdjacentHTML\"))) @vuln",
|
||||
id: "ts.xss.insert_adjacent_html",
|
||||
description: "insertAdjacentHTML() — XSS sink",
|
||||
query: r#"(call_expression
|
||||
function: (member_expression
|
||||
property: (property_identifier) @prop (#eq? @prop "insertAdjacentHTML")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Weak crypto ────────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "ts.crypto.math_random",
|
||||
description: "Math.random() — not cryptographically secure",
|
||||
query: r#"(call_expression
|
||||
function: (member_expression
|
||||
object: (identifier) @obj (#eq? @obj "Math")
|
||||
property: (property_identifier) @prop (#eq? @prop "random")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Crypto,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: TypeScript-specific type-safety escapes ────────────────
|
||||
Pattern {
|
||||
id: "ts.quality.any_annotation",
|
||||
description: "Type annotation of `any` — disables type checking",
|
||||
query: r#"(type_annotation (predefined_type) @t (#eq? @t "any")) @vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeQuality,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "document_cookie_write",
|
||||
id: "ts.quality.as_any",
|
||||
description: "Type assertion `as any` — type-safety escape hatch",
|
||||
query: r#"(as_expression (predefined_type) @t (#eq? @t "any")) @vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::CodeQuality,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
// ── Tier A: Prototype pollution ────────────────────────────────────
|
||||
Pattern {
|
||||
id: "ts.prototype.proto_assignment",
|
||||
description: "Assignment to __proto__ — prototype pollution",
|
||||
query: r#"(assignment_expression
|
||||
left: (member_expression
|
||||
property: (property_identifier) @prop (#eq? @prop "__proto__")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Prototype,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Open redirect ──────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "ts.xss.location_assign",
|
||||
description: "Assignment to location/location.href — open redirect",
|
||||
query: r#"(assignment_expression
|
||||
left: (member_expression
|
||||
object: (identifier) @obj (#match? @obj "^(window|location|document)$")
|
||||
property: (property_identifier) @prop (#match? @prop "^(location|href)$")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::High,
|
||||
},
|
||||
// ── Tier A: Cookie manipulation ────────────────────────────────────
|
||||
Pattern {
|
||||
id: "ts.xss.cookie_write",
|
||||
description: "Write to document.cookie",
|
||||
query: "(assignment_expression left: (member_expression object: (identifier) @obj (#eq? @obj \"document\") property: (property_identifier) @prop (#eq? @prop \"cookie\"))) @vuln",
|
||||
query: r#"(assignment_expression
|
||||
left: (member_expression
|
||||
object: (identifier) @obj (#eq? @obj "document")
|
||||
property: (property_identifier) @prop (#eq? @prop "cookie")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
},
|
||||
Pattern {
|
||||
id: "onclick_setattribute",
|
||||
description: "Element.setAttribute('onclick', …)",
|
||||
query: "(call_expression function: (member_expression property: (property_identifier) @prop (#eq? @prop \"setAttribute\")) arguments: (arguments (string) @name (#eq? @name \"\\\"onclick\\\"\") . (string) @handler)) @vuln",
|
||||
severity: Severity::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "math_random_call",
|
||||
description: "Use of Math.random() for security-sensitive randomness",
|
||||
query: "(call_expression function: (member_expression object: (identifier) @obj (#eq? @obj \"Math\") property: (property_identifier) @prop (#eq? @prop \"random\"))) @vuln",
|
||||
severity: Severity::Low,
|
||||
},
|
||||
Pattern {
|
||||
id: "crypto_createhash_md5",
|
||||
description: "Insecure hash algorithm: crypto.createHash('md5')",
|
||||
query: "(call_expression function: (member_expression object: (identifier) @obj (#eq? @obj \"crypto\") property: (property_identifier) @prop (#eq? @prop \"createHash\")) arguments: (arguments (string) @alg (#match? @alg \"(?i)\\\"md5\\\"\"))) @vuln",
|
||||
severity: Severity::Medium,
|
||||
},
|
||||
Pattern {
|
||||
id: "fetch_http_url",
|
||||
description: "fetch() over plain HTTP",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"fetch\") arguments: (arguments (string) @url (#match? @url \"^\\\"http://\"))) @vuln",
|
||||
severity: Severity::Low,
|
||||
},
|
||||
Pattern {
|
||||
id: "xhr_eval_response",
|
||||
description: "eval() of XMLHttpRequest.responseText",
|
||||
query: "(call_expression function: (identifier) @id (#eq? @id \"eval\") arguments: (arguments (member_expression property: (property_identifier) @prop (#eq? @prop \"responseText\")))) @vuln",
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
category: PatternCategory::Xss,
|
||||
confidence: Confidence::Medium,
|
||||
},
|
||||
];
|
||||
|
|
|
|||
646
src/rank.rs
Normal file
646
src/rank.rs
Normal file
|
|
@ -0,0 +1,646 @@
|
|||
//! Attack surface ranking for scan diagnostics.
|
||||
//!
|
||||
//! Computes a deterministic score for each [`Diag`] using only in-memory
|
||||
//! information (severity, evidence, source kind, rule ID, validation state).
|
||||
//! The score is used to sort findings so that truncation keeps the most
|
||||
//! exploitable / important results.
|
||||
|
||||
use crate::commands::scan::Diag;
|
||||
use crate::evidence::Evidence;
|
||||
use crate::patterns::Severity;
|
||||
use std::hash::{DefaultHasher, Hash, Hasher};
|
||||
|
||||
/// Computed attack-surface ranking for a single diagnostic.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct AttackRank {
|
||||
pub score: f64,
|
||||
/// Breakdown of score components (for debug/display purposes).
|
||||
#[allow(dead_code)]
|
||||
pub components: Vec<(String, String)>,
|
||||
}
|
||||
|
||||
/// Compute an attack-surface score for `diag`.
|
||||
///
|
||||
/// The score is a positive `f64`; higher means more exploitable / important.
|
||||
/// Components are returned for optional debug/display.
|
||||
pub fn compute_attack_rank(diag: &Diag) -> AttackRank {
|
||||
let mut score = 0.0_f64;
|
||||
let mut components: Vec<(String, String)> = Vec::new();
|
||||
|
||||
// ── 1. Severity base ────────────────────────────────────────────────
|
||||
let sev_score = match diag.severity {
|
||||
Severity::High => 60.0,
|
||||
Severity::Medium => 30.0,
|
||||
Severity::Low => 10.0,
|
||||
};
|
||||
score += sev_score;
|
||||
components.push(("severity".into(), format!("{sev_score}")));
|
||||
|
||||
// ── 2. Analysis kind bonus ──────────────────────────────────────────
|
||||
//
|
||||
// Taint-confirmed findings are the strongest signal. State findings
|
||||
// (resource lifecycle / auth) are next. CFG-structural findings
|
||||
// without taint evidence rank lower. AST-only pattern matches are
|
||||
// the weakest.
|
||||
let kind_bonus = analysis_kind_bonus(&diag.id, diag.evidence.as_ref());
|
||||
score += kind_bonus;
|
||||
if kind_bonus != 0.0 {
|
||||
components.push(("analysis_kind".into(), format!("{kind_bonus}")));
|
||||
}
|
||||
|
||||
// ── 3. Evidence strength / source-kind priority ─────────────────────
|
||||
let evidence_bonus = evidence_strength(diag);
|
||||
score += evidence_bonus;
|
||||
if evidence_bonus != 0.0 {
|
||||
components.push(("evidence".into(), format!("{evidence_bonus}")));
|
||||
}
|
||||
|
||||
// ── 4. State finding sub-ranking ────────────────────────────────────
|
||||
let state_bonus = state_finding_bonus(&diag.id);
|
||||
score += state_bonus;
|
||||
if state_bonus != 0.0 {
|
||||
components.push(("state_rule".into(), format!("{state_bonus}")));
|
||||
}
|
||||
|
||||
// ── 5. Path validation penalty ──────────────────────────────────────
|
||||
//
|
||||
// If a taint path is guarded by a validation predicate, the finding
|
||||
// has higher informational value but lower exploitability because the
|
||||
// guard may prevent the vulnerability from being triggered. Apply a
|
||||
// small penalty (–5) to push validated paths below otherwise-equal
|
||||
// unvalidated ones without changing the overall ranking tier.
|
||||
let path_validated = diag.evidence.as_ref().map_or(diag.path_validated, |ev| {
|
||||
ev.notes.iter().any(|n| n == "path_validated")
|
||||
});
|
||||
if path_validated {
|
||||
score -= 5.0;
|
||||
components.push(("path_validated_penalty".into(), "-5".into()));
|
||||
}
|
||||
|
||||
AttackRank { score, components }
|
||||
}
|
||||
|
||||
/// Deterministic sort key for a diagnostic.
|
||||
///
|
||||
/// Two diags with identical scores are tie-broken by:
|
||||
/// severity (High < Medium < Low in the `Ord` impl, so we negate)
|
||||
/// → rule ID → file path → line → col → message hash
|
||||
///
|
||||
/// Returns a tuple suitable for `sort_by`.
|
||||
pub fn sort_key(diag: &Diag) -> impl Ord {
|
||||
let sev_ord: u8 = match diag.severity {
|
||||
Severity::High => 0,
|
||||
Severity::Medium => 1,
|
||||
Severity::Low => 2,
|
||||
};
|
||||
let msg_hash = {
|
||||
let mut h = DefaultHasher::new();
|
||||
diag.message.hash(&mut h);
|
||||
h.finish()
|
||||
};
|
||||
(
|
||||
sev_ord,
|
||||
diag.id.clone(),
|
||||
diag.path.clone(),
|
||||
diag.line,
|
||||
diag.col,
|
||||
msg_hash,
|
||||
)
|
||||
}
|
||||
|
||||
/// Sort diagnostics in-place by descending attack-surface score, then by
|
||||
/// deterministic tie-breaker. Populates `rank_score` on each `Diag`.
|
||||
pub fn rank_diags(diags: &mut [Diag]) {
|
||||
// Compute scores
|
||||
let scores: Vec<f64> = diags.iter().map(|d| compute_attack_rank(d).score).collect();
|
||||
|
||||
// Attach scores to diags
|
||||
for (d, s) in diags.iter_mut().zip(scores.iter()) {
|
||||
d.rank_score = Some(*s);
|
||||
}
|
||||
|
||||
// Sort descending by score, then ascending by tie-breaker
|
||||
diags.sort_by(|a, b| {
|
||||
let sa = a.rank_score.unwrap_or(0.0);
|
||||
let sb = b.rank_score.unwrap_or(0.0);
|
||||
// Descending score (higher first)
|
||||
sb.partial_cmp(&sa)
|
||||
.unwrap_or(std::cmp::Ordering::Equal)
|
||||
.then_with(|| sort_key(a).cmp(&sort_key(b)))
|
||||
});
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Scoring helpers
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Bonus based on analysis kind inferred from rule ID + evidence.
|
||||
fn analysis_kind_bonus(rule_id: &str, evidence: Option<&Evidence>) -> f64 {
|
||||
if rule_id.starts_with("taint-") {
|
||||
// Taint-confirmed flow is the strongest signal
|
||||
10.0
|
||||
} else if rule_id.starts_with("state-") {
|
||||
// State-model findings (resource / auth) are strong
|
||||
8.0
|
||||
} else if rule_id.starts_with("cfg-") {
|
||||
// CFG-structural findings: boost if evidence exists
|
||||
if evidence.is_some_and(|e| !e.is_empty()) {
|
||||
5.0
|
||||
} else {
|
||||
3.0
|
||||
}
|
||||
} else {
|
||||
// AST-only pattern match
|
||||
0.0
|
||||
}
|
||||
}
|
||||
|
||||
/// Bonus from evidence strength: number of evidence items and source-kind
|
||||
/// priority.
|
||||
fn evidence_strength(diag: &Diag) -> f64 {
|
||||
let mut bonus = 0.0;
|
||||
|
||||
if let Some(ev) = &diag.evidence {
|
||||
// Count structured evidence items (capped at 4)
|
||||
let item_count = ev.source.is_some() as usize
|
||||
+ ev.sink.is_some() as usize
|
||||
+ (ev.guards.len() + ev.sanitizers.len()).min(2);
|
||||
bonus += item_count.min(4) as f64;
|
||||
|
||||
// Source-kind priority from evidence notes
|
||||
for note in &ev.notes {
|
||||
if let Some(kind) = note.strip_prefix("source_kind:") {
|
||||
bonus += source_kind_priority(kind);
|
||||
break;
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Fallback for DB-cached diags without structured evidence
|
||||
bonus += (diag.labels.len() as f64).min(4.0);
|
||||
for (label, value) in &diag.labels {
|
||||
if label == "Source" {
|
||||
bonus += source_kind_priority(value);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
bonus
|
||||
}
|
||||
|
||||
/// Priority bonus based on the source kind string found in evidence.
|
||||
///
|
||||
/// UserInput / EnvironmentConfig / Unknown are most exploitable.
|
||||
/// FileSystem / Database are lower because the attacker needs a more
|
||||
/// indirect vector.
|
||||
fn source_kind_priority(source_value: &str) -> f64 {
|
||||
// Structured SourceKind enum values (from evidence.notes "source_kind:X")
|
||||
match source_value {
|
||||
"UserInput" => return 6.0,
|
||||
"EnvironmentConfig" => return 5.0,
|
||||
"FileSystem" => return 3.0,
|
||||
"Database" => return 2.0,
|
||||
"Unknown" => return 4.0,
|
||||
_ => {}
|
||||
}
|
||||
|
||||
// Fallback: substring matching for legacy labels
|
||||
let lower = source_value.to_ascii_lowercase();
|
||||
if lower.contains("stdin")
|
||||
|| lower.contains("argv")
|
||||
|| lower.contains("request")
|
||||
|| lower.contains("form")
|
||||
|| lower.contains("query")
|
||||
|| lower.contains("param")
|
||||
|| lower.contains("header")
|
||||
|| lower.contains("body")
|
||||
|| lower.contains("read_line")
|
||||
{
|
||||
// Strong user-input signals
|
||||
6.0
|
||||
} else if lower.contains("env") || lower.contains("var(") || lower.contains("getenv") {
|
||||
// Environment / config — still attacker-controllable in many deployments
|
||||
5.0
|
||||
} else if lower.contains("read") || lower.contains("file") || lower.contains("open") {
|
||||
// File system — needs indirect vector
|
||||
3.0
|
||||
} else if lower.contains("query") || lower.contains("fetch") || lower.contains("select") {
|
||||
// Database — needs prior injection
|
||||
2.0
|
||||
} else {
|
||||
// Unknown / unrecognised — treat as moderately exploitable
|
||||
4.0
|
||||
}
|
||||
}
|
||||
|
||||
/// Bonus for specific state-analysis rule IDs.
|
||||
fn state_finding_bonus(rule_id: &str) -> f64 {
|
||||
match rule_id {
|
||||
"state-use-after-close" => 6.0,
|
||||
"state-unauthed-access" => 6.0,
|
||||
"state-double-close" => 3.0,
|
||||
"state-resource-leak" => 2.0, // must-leak
|
||||
"state-resource-leak-possible" => 1.0, // may-leak
|
||||
_ => 0.0,
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Tests
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn make_diag(
|
||||
severity: Severity,
|
||||
id: &str,
|
||||
path: &str,
|
||||
line: usize,
|
||||
labels: Vec<(String, String)>,
|
||||
path_validated: bool,
|
||||
) -> Diag {
|
||||
Diag {
|
||||
path: path.into(),
|
||||
line,
|
||||
col: 1,
|
||||
severity,
|
||||
id: id.into(),
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
path_validated,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
labels,
|
||||
confidence: None,
|
||||
evidence: None,
|
||||
rank_score: None,
|
||||
rank_reason: None,
|
||||
suppressed: false,
|
||||
suppression: None,
|
||||
rollup: None,
|
||||
}
|
||||
}
|
||||
|
||||
// ── Ordering tests ──────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn high_taint_user_input_ranks_above_medium_file_io() {
|
||||
let high_taint = make_diag(
|
||||
Severity::High,
|
||||
"taint-unsanitised-flow (source 1:1)",
|
||||
"src/main.rs",
|
||||
10,
|
||||
vec![
|
||||
("Source".into(), "read_line() at 1:1".into()),
|
||||
("Sink".into(), "exec()".into()),
|
||||
],
|
||||
false,
|
||||
);
|
||||
let med_file = make_diag(
|
||||
Severity::Medium,
|
||||
"taint-unsanitised-flow (source 5:1)",
|
||||
"src/lib.rs",
|
||||
20,
|
||||
vec![
|
||||
("Source".into(), "File::open() at 5:1".into()),
|
||||
("Sink".into(), "write()".into()),
|
||||
],
|
||||
false,
|
||||
);
|
||||
|
||||
let score_high = compute_attack_rank(&high_taint).score;
|
||||
let score_med = compute_attack_rank(&med_file).score;
|
||||
assert!(
|
||||
score_high > score_med,
|
||||
"high taint user-input ({score_high}) should rank above medium file-io ({score_med})"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn must_leak_ranks_above_may_leak() {
|
||||
let must = make_diag(
|
||||
Severity::Medium,
|
||||
"state-resource-leak",
|
||||
"src/db.rs",
|
||||
30,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
let may = make_diag(
|
||||
Severity::Low,
|
||||
"state-resource-leak-possible",
|
||||
"src/db.rs",
|
||||
35,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
|
||||
let score_must = compute_attack_rank(&must).score;
|
||||
let score_may = compute_attack_rank(&may).score;
|
||||
assert!(
|
||||
score_must > score_may,
|
||||
"must-leak ({score_must}) should rank above may-leak ({score_may})"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cfg_without_evidence_ranks_below_taint_confirmed() {
|
||||
let taint = make_diag(
|
||||
Severity::High,
|
||||
"taint-unsanitised-flow (source 1:1)",
|
||||
"src/main.rs",
|
||||
10,
|
||||
vec![
|
||||
("Source".into(), "env::var(\"CMD\") at 1:1".into()),
|
||||
("Sink".into(), "exec()".into()),
|
||||
],
|
||||
false,
|
||||
);
|
||||
let cfg_only = make_diag(
|
||||
Severity::High,
|
||||
"cfg-unguarded-sink",
|
||||
"src/main.rs",
|
||||
10,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
|
||||
let score_taint = compute_attack_rank(&taint).score;
|
||||
let score_cfg = compute_attack_rank(&cfg_only).score;
|
||||
assert!(
|
||||
score_taint > score_cfg,
|
||||
"taint-confirmed ({score_taint}) should rank above cfg-only ({score_cfg})"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn determinism_input_order_independent() {
|
||||
let d1 = make_diag(
|
||||
Severity::High,
|
||||
"taint-unsanitised-flow (source 1:1)",
|
||||
"a.rs",
|
||||
1,
|
||||
vec![("Source".into(), "stdin at 1:1".into())],
|
||||
false,
|
||||
);
|
||||
let d2 = make_diag(
|
||||
Severity::Medium,
|
||||
"cfg-unguarded-sink",
|
||||
"b.rs",
|
||||
2,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
let d3 = make_diag(Severity::Low, "rs.code_exec.eval", "c.rs", 3, vec![], false);
|
||||
|
||||
let mut order_a = vec![d1.clone(), d2.clone(), d3.clone()];
|
||||
let mut order_b = vec![d3, d1, d2];
|
||||
|
||||
rank_diags(&mut order_a);
|
||||
rank_diags(&mut order_b);
|
||||
|
||||
let ids_a: Vec<_> = order_a.iter().map(|d| (&d.id, d.line)).collect();
|
||||
let ids_b: Vec<_> = order_b.iter().map(|d| (&d.id, d.line)).collect();
|
||||
assert_eq!(
|
||||
ids_a, ids_b,
|
||||
"ranking must be deterministic regardless of input order"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn path_validated_penalty_applied() {
|
||||
let unvalidated = make_diag(
|
||||
Severity::High,
|
||||
"taint-unsanitised-flow (source 1:1)",
|
||||
"src/main.rs",
|
||||
10,
|
||||
vec![("Source".into(), "env::var(\"X\") at 1:1".into())],
|
||||
false,
|
||||
);
|
||||
let validated = make_diag(
|
||||
Severity::High,
|
||||
"taint-unsanitised-flow (source 1:1)",
|
||||
"src/main.rs",
|
||||
10,
|
||||
vec![("Source".into(), "env::var(\"X\") at 1:1".into())],
|
||||
true,
|
||||
);
|
||||
|
||||
let score_unval = compute_attack_rank(&unvalidated).score;
|
||||
let score_val = compute_attack_rank(&validated).score;
|
||||
assert!(
|
||||
score_unval > score_val,
|
||||
"unvalidated ({score_unval}) should rank above validated ({score_val})"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn state_use_after_close_ranks_above_may_leak() {
|
||||
let uac = make_diag(
|
||||
Severity::High,
|
||||
"state-use-after-close",
|
||||
"x.rs",
|
||||
1,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
let may = make_diag(
|
||||
Severity::Low,
|
||||
"state-resource-leak-possible",
|
||||
"x.rs",
|
||||
2,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
|
||||
let score_uac = compute_attack_rank(&uac).score;
|
||||
let score_may = compute_attack_rank(&may).score;
|
||||
assert!(score_uac > score_may);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unauthed_access_ranks_above_resource_leak() {
|
||||
let unauth = make_diag(
|
||||
Severity::High,
|
||||
"state-unauthed-access",
|
||||
"x.rs",
|
||||
1,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
let leak = make_diag(
|
||||
Severity::Medium,
|
||||
"state-resource-leak",
|
||||
"x.rs",
|
||||
2,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
|
||||
let score_ua = compute_attack_rank(&unauth).score;
|
||||
let score_lk = compute_attack_rank(&leak).score;
|
||||
assert!(score_ua > score_lk);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn ast_only_ranks_below_all_others_at_same_severity() {
|
||||
let ast = make_diag(
|
||||
Severity::High,
|
||||
"rs.code_exec.eval",
|
||||
"x.rs",
|
||||
1,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
let cfg = make_diag(
|
||||
Severity::High,
|
||||
"cfg-unguarded-sink",
|
||||
"x.rs",
|
||||
2,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
let taint = make_diag(
|
||||
Severity::High,
|
||||
"taint-unsanitised-flow (source 1:1)",
|
||||
"x.rs",
|
||||
3,
|
||||
vec![("Source".into(), "env::var(\"X\") at 1:1".into())],
|
||||
false,
|
||||
);
|
||||
let state = make_diag(
|
||||
Severity::High,
|
||||
"state-use-after-close",
|
||||
"x.rs",
|
||||
4,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
|
||||
let s_ast = compute_attack_rank(&ast).score;
|
||||
let s_cfg = compute_attack_rank(&cfg).score;
|
||||
let s_taint = compute_attack_rank(&taint).score;
|
||||
let s_state = compute_attack_rank(&state).score;
|
||||
|
||||
assert!(s_ast < s_cfg, "AST ({s_ast}) < CFG ({s_cfg})");
|
||||
assert!(s_ast < s_taint, "AST ({s_ast}) < taint ({s_taint})");
|
||||
assert!(s_ast < s_state, "AST ({s_ast}) < state ({s_state})");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn structured_evidence_source_kind_matches_legacy() {
|
||||
// Structured evidence with source_kind:UserInput note should give
|
||||
// the same source-kind bonus as a legacy "Source" label with user input.
|
||||
let mut structured = make_diag(
|
||||
Severity::High,
|
||||
"taint-unsanitised-flow (source 1:1)",
|
||||
"src/main.rs",
|
||||
10,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
structured.evidence = Some(crate::evidence::Evidence {
|
||||
source: Some(crate::evidence::SpanEvidence {
|
||||
path: "src/main.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
kind: "source".into(),
|
||||
snippet: Some("read_line()".into()),
|
||||
}),
|
||||
sink: Some(crate::evidence::SpanEvidence {
|
||||
path: "src/main.rs".into(),
|
||||
line: 10,
|
||||
col: 5,
|
||||
kind: "sink".into(),
|
||||
snippet: Some("exec()".into()),
|
||||
}),
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec!["source_kind:UserInput".into()],
|
||||
});
|
||||
|
||||
let legacy = make_diag(
|
||||
Severity::High,
|
||||
"taint-unsanitised-flow (source 1:1)",
|
||||
"src/main.rs",
|
||||
10,
|
||||
vec![
|
||||
("Source".into(), "read_line() at 1:1".into()),
|
||||
("Sink".into(), "exec()".into()),
|
||||
],
|
||||
false,
|
||||
);
|
||||
|
||||
let score_structured = compute_attack_rank(&structured).score;
|
||||
let score_legacy = compute_attack_rank(&legacy).score;
|
||||
assert_eq!(
|
||||
score_structured, score_legacy,
|
||||
"structured ({score_structured}) should equal legacy ({score_legacy})"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn evidence_item_count_capped_at_4() {
|
||||
let mut d = make_diag(
|
||||
Severity::High,
|
||||
"taint-unsanitised-flow (source 1:1)",
|
||||
"src/main.rs",
|
||||
10,
|
||||
vec![],
|
||||
false,
|
||||
);
|
||||
let span = || crate::evidence::SpanEvidence {
|
||||
path: "x.rs".into(),
|
||||
line: 1,
|
||||
col: 1,
|
||||
kind: "guard".into(),
|
||||
snippet: None,
|
||||
};
|
||||
d.evidence = Some(crate::evidence::Evidence {
|
||||
source: Some(span()),
|
||||
sink: Some(span()),
|
||||
guards: vec![span(), span(), span()], // 3 guards
|
||||
sanitizers: vec![span()], // 1 sanitizer
|
||||
state: None,
|
||||
notes: vec![],
|
||||
});
|
||||
|
||||
// item_count = 1 (source) + 1 (sink) + min(2, 3+1) = 4
|
||||
// evidence bonus should be exactly 4.0 (from items) + 4.0 (unknown source kind) = 8.0
|
||||
// ... but no source_kind note, so no source priority bonus
|
||||
let score = evidence_strength(&d);
|
||||
assert!(
|
||||
(score - 4.0).abs() < f64::EPSILON,
|
||||
"evidence item count should be capped at 4, got {score}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn path_validated_from_evidence_notes() {
|
||||
let mut d = make_diag(
|
||||
Severity::High,
|
||||
"taint-unsanitised-flow (source 1:1)",
|
||||
"src/main.rs",
|
||||
10,
|
||||
vec![],
|
||||
false, // path_validated is false on Diag
|
||||
);
|
||||
d.evidence = Some(crate::evidence::Evidence {
|
||||
source: None,
|
||||
sink: None,
|
||||
guards: vec![],
|
||||
sanitizers: vec![],
|
||||
state: None,
|
||||
notes: vec!["path_validated".into()],
|
||||
});
|
||||
|
||||
let rank = compute_attack_rank(&d);
|
||||
assert!(
|
||||
rank.components
|
||||
.iter()
|
||||
.any(|(k, _)| k == "path_validated_penalty"),
|
||||
"path_validated note in evidence should trigger penalty"
|
||||
);
|
||||
}
|
||||
}
|
||||
313
src/state/domain.rs
Normal file
313
src/state/domain.rs
Normal file
|
|
@ -0,0 +1,313 @@
|
|||
use super::lattice::Lattice;
|
||||
use super::symbol::SymbolId;
|
||||
use bitflags::bitflags;
|
||||
use std::collections::{HashMap, HashSet};
|
||||
|
||||
// ── ResourceLifecycle ────────────────────────────────────────────────────
|
||||
|
||||
bitflags! {
|
||||
/// Bitset of possible lifecycle states for a single resource handle.
|
||||
///
|
||||
/// Join = bitwise OR (a variable may be in multiple states across paths).
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
|
||||
pub struct ResourceLifecycle: u8 {
|
||||
const UNINIT = 0b0001;
|
||||
const OPEN = 0b0010;
|
||||
const CLOSED = 0b0100;
|
||||
const MOVED = 0b1000;
|
||||
}
|
||||
}
|
||||
|
||||
impl Lattice for ResourceLifecycle {
|
||||
fn bot() -> Self {
|
||||
ResourceLifecycle::empty()
|
||||
}
|
||||
|
||||
fn join(&self, other: &Self) -> Self {
|
||||
*self | *other
|
||||
}
|
||||
|
||||
fn leq(&self, other: &Self) -> bool {
|
||||
self.intersection(*other) == *self
|
||||
}
|
||||
}
|
||||
|
||||
// ── ResourceDomainState ──────────────────────────────────────────────────
|
||||
|
||||
/// Maps interned variable IDs to their lifecycle bitsets.
|
||||
#[derive(Clone, Debug, Default, PartialEq, Eq)]
|
||||
pub struct ResourceDomainState {
|
||||
pub vars: HashMap<SymbolId, ResourceLifecycle>,
|
||||
}
|
||||
|
||||
impl ResourceDomainState {
|
||||
pub fn new() -> Self {
|
||||
Self::default()
|
||||
}
|
||||
|
||||
pub fn get(&self, sym: SymbolId) -> ResourceLifecycle {
|
||||
self.vars
|
||||
.get(&sym)
|
||||
.copied()
|
||||
.unwrap_or(ResourceLifecycle::empty())
|
||||
}
|
||||
|
||||
pub fn set(&mut self, sym: SymbolId, state: ResourceLifecycle) {
|
||||
self.vars.insert(sym, state);
|
||||
}
|
||||
}
|
||||
|
||||
impl Lattice for ResourceDomainState {
|
||||
fn bot() -> Self {
|
||||
Self::new()
|
||||
}
|
||||
|
||||
fn join(&self, other: &Self) -> Self {
|
||||
let mut merged = self.clone();
|
||||
for (&sym, &other_lc) in &other.vars {
|
||||
let entry = merged.vars.entry(sym).or_insert(ResourceLifecycle::empty());
|
||||
*entry = entry.join(&other_lc);
|
||||
}
|
||||
merged
|
||||
}
|
||||
|
||||
fn leq(&self, other: &Self) -> bool {
|
||||
for (&sym, &self_lc) in &self.vars {
|
||||
let other_lc = other.get(sym);
|
||||
if !self_lc.leq(&other_lc) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
true
|
||||
}
|
||||
}
|
||||
|
||||
// ── AuthLevel ────────────────────────────────────────────────────────────
|
||||
|
||||
/// Simple ordered lattice for path authentication state.
|
||||
///
|
||||
/// Bot = `Unauthed`. Join = `min` (conservative: if any path is unauthed,
|
||||
/// the joined state is unauthed).
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
|
||||
pub enum AuthLevel {
|
||||
Unauthed,
|
||||
Authed,
|
||||
Admin,
|
||||
}
|
||||
|
||||
impl Lattice for AuthLevel {
|
||||
fn bot() -> Self {
|
||||
AuthLevel::Unauthed
|
||||
}
|
||||
|
||||
fn join(&self, other: &Self) -> Self {
|
||||
// Conservative: take the minimum (least privileged)
|
||||
(*self).min(*other)
|
||||
}
|
||||
|
||||
fn leq(&self, other: &Self) -> bool {
|
||||
// Higher auth subsumes lower: Unauthed ⊑ Authed ⊑ Admin
|
||||
// In our lattice, join = min, so leq means self >= other
|
||||
*self >= *other
|
||||
}
|
||||
}
|
||||
|
||||
// ── AuthDomainState ──────────────────────────────────────────────────────
|
||||
|
||||
/// Path auth level + per-variable validation bit.
|
||||
#[derive(Clone, Debug, PartialEq, Eq)]
|
||||
pub struct AuthDomainState {
|
||||
pub auth_level: AuthLevel,
|
||||
pub validated: HashSet<SymbolId>,
|
||||
}
|
||||
|
||||
impl Default for AuthDomainState {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
auth_level: AuthLevel::Unauthed,
|
||||
validated: HashSet::new(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl AuthDomainState {
|
||||
pub fn new() -> Self {
|
||||
Self::default()
|
||||
}
|
||||
}
|
||||
|
||||
impl Lattice for AuthDomainState {
|
||||
fn bot() -> Self {
|
||||
Self::new()
|
||||
}
|
||||
|
||||
fn join(&self, other: &Self) -> Self {
|
||||
Self {
|
||||
auth_level: self.auth_level.join(&other.auth_level),
|
||||
// Only validated on ALL paths counts
|
||||
validated: self
|
||||
.validated
|
||||
.intersection(&other.validated)
|
||||
.copied()
|
||||
.collect(),
|
||||
}
|
||||
}
|
||||
|
||||
fn leq(&self, other: &Self) -> bool {
|
||||
self.auth_level.leq(&other.auth_level) && self.validated.is_superset(&other.validated)
|
||||
}
|
||||
}
|
||||
|
||||
// ── ProductState ─────────────────────────────────────────────────────────
|
||||
|
||||
/// Composable product of resource and auth domains.
|
||||
#[derive(Clone, Debug, PartialEq, Eq)]
|
||||
pub struct ProductState {
|
||||
pub resource: ResourceDomainState,
|
||||
pub auth: AuthDomainState,
|
||||
}
|
||||
|
||||
impl ProductState {
|
||||
pub fn initial() -> Self {
|
||||
Self {
|
||||
resource: ResourceDomainState::new(),
|
||||
auth: AuthDomainState::new(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Lattice for ProductState {
|
||||
fn bot() -> Self {
|
||||
Self {
|
||||
resource: ResourceDomainState::bot(),
|
||||
auth: AuthDomainState::bot(),
|
||||
}
|
||||
}
|
||||
|
||||
fn join(&self, other: &Self) -> Self {
|
||||
Self {
|
||||
resource: self.resource.join(&other.resource),
|
||||
auth: self.auth.join(&other.auth),
|
||||
}
|
||||
}
|
||||
|
||||
fn leq(&self, other: &Self) -> bool {
|
||||
self.resource.leq(&other.resource) && self.auth.leq(&other.auth)
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn resource_lifecycle_join_is_or() {
|
||||
let a = ResourceLifecycle::OPEN;
|
||||
let b = ResourceLifecycle::CLOSED;
|
||||
assert_eq!(
|
||||
a.join(&b),
|
||||
ResourceLifecycle::OPEN | ResourceLifecycle::CLOSED
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn resource_lifecycle_bot_identity() {
|
||||
let a = ResourceLifecycle::OPEN;
|
||||
assert_eq!(a.join(&ResourceLifecycle::bot()), a);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn resource_lifecycle_leq() {
|
||||
let a = ResourceLifecycle::OPEN;
|
||||
let b = ResourceLifecycle::OPEN | ResourceLifecycle::CLOSED;
|
||||
assert!(a.leq(&b));
|
||||
assert!(!b.leq(&a));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn resource_domain_join_merges_keys() {
|
||||
let mut a = ResourceDomainState::new();
|
||||
let mut b = ResourceDomainState::new();
|
||||
let sym_x = SymbolId(0);
|
||||
let sym_y = SymbolId(1);
|
||||
|
||||
a.set(sym_x, ResourceLifecycle::OPEN);
|
||||
b.set(sym_x, ResourceLifecycle::CLOSED);
|
||||
b.set(sym_y, ResourceLifecycle::OPEN);
|
||||
|
||||
let joined = a.join(&b);
|
||||
assert_eq!(
|
||||
joined.get(sym_x),
|
||||
ResourceLifecycle::OPEN | ResourceLifecycle::CLOSED
|
||||
);
|
||||
assert_eq!(joined.get(sym_y), ResourceLifecycle::OPEN);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn auth_level_join_is_min() {
|
||||
assert_eq!(
|
||||
AuthLevel::Admin.join(&AuthLevel::Unauthed),
|
||||
AuthLevel::Unauthed
|
||||
);
|
||||
assert_eq!(AuthLevel::Authed.join(&AuthLevel::Admin), AuthLevel::Authed);
|
||||
assert_eq!(
|
||||
AuthLevel::Authed.join(&AuthLevel::Authed),
|
||||
AuthLevel::Authed
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn auth_domain_join_intersects_validated() {
|
||||
let sym_a = SymbolId(0);
|
||||
let sym_b = SymbolId(1);
|
||||
let sym_c = SymbolId(2);
|
||||
|
||||
let a = AuthDomainState {
|
||||
auth_level: AuthLevel::Authed,
|
||||
validated: [sym_a, sym_b].into_iter().collect(),
|
||||
};
|
||||
let b = AuthDomainState {
|
||||
auth_level: AuthLevel::Admin,
|
||||
validated: [sym_b, sym_c].into_iter().collect(),
|
||||
};
|
||||
|
||||
let joined = a.join(&b);
|
||||
assert_eq!(joined.auth_level, AuthLevel::Authed);
|
||||
assert_eq!(joined.validated, [sym_b].into_iter().collect());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn product_state_join() {
|
||||
let a = ProductState::initial();
|
||||
let b = ProductState::initial();
|
||||
let joined = a.join(&b);
|
||||
assert_eq!(joined, ProductState::initial());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn may_must_leak_semantics() {
|
||||
// Must-leak: OPEN only
|
||||
let must_leak = ResourceLifecycle::OPEN;
|
||||
assert!(must_leak.contains(ResourceLifecycle::OPEN));
|
||||
assert!(!must_leak.contains(ResourceLifecycle::CLOSED));
|
||||
assert!(!must_leak.contains(ResourceLifecycle::MOVED));
|
||||
|
||||
// May-leak: OPEN | CLOSED (some paths close, some don't)
|
||||
let may_leak = ResourceLifecycle::OPEN | ResourceLifecycle::CLOSED;
|
||||
assert!(may_leak.contains(ResourceLifecycle::OPEN));
|
||||
assert!(may_leak.contains(ResourceLifecycle::CLOSED));
|
||||
|
||||
// No leak: CLOSED only
|
||||
let no_leak = ResourceLifecycle::CLOSED;
|
||||
assert!(!no_leak.contains(ResourceLifecycle::OPEN));
|
||||
assert!(no_leak.contains(ResourceLifecycle::CLOSED));
|
||||
}
|
||||
|
||||
// SymbolId is a newtype used in domain tests; ensure it's Copy
|
||||
#[test]
|
||||
fn symbol_id_is_copy() {
|
||||
let s = SymbolId(0);
|
||||
let s2 = s;
|
||||
assert_eq!(s, s2);
|
||||
}
|
||||
}
|
||||
288
src/state/engine.rs
Normal file
288
src/state/engine.rs
Normal file
|
|
@ -0,0 +1,288 @@
|
|||
use super::lattice::Lattice;
|
||||
use crate::cfg::{Cfg, EdgeKind, NodeInfo};
|
||||
use petgraph::graph::NodeIndex;
|
||||
use petgraph::visit::EdgeRef;
|
||||
use std::collections::{HashMap, VecDeque};
|
||||
|
||||
/// Maximum tracked variables per function (guarded degradation).
|
||||
pub const MAX_TRACKED_VARS: usize = 64;
|
||||
|
||||
/// Default worklist iteration budget.
|
||||
pub const MAX_WORKLIST_ITERATIONS: usize = 100_000;
|
||||
|
||||
/// Generic transfer function trait for forward dataflow analysis.
|
||||
///
|
||||
/// Domains implement this to define how abstract state flows through
|
||||
/// CFG nodes and what events (findings) are emitted.
|
||||
pub trait Transfer<S: Lattice> {
|
||||
/// Side-channel events emitted during transfer (e.g., findings, violations).
|
||||
type Event: Clone;
|
||||
|
||||
/// Apply the transfer function to a node, returning the output state
|
||||
/// and any events.
|
||||
fn apply(
|
||||
&self,
|
||||
node: NodeIndex,
|
||||
info: &NodeInfo,
|
||||
edge: Option<EdgeKind>,
|
||||
state: S,
|
||||
) -> (S, Vec<Self::Event>);
|
||||
|
||||
/// Per-domain iteration budget. Defaults to [`MAX_WORKLIST_ITERATIONS`].
|
||||
fn iteration_budget(&self) -> usize {
|
||||
MAX_WORKLIST_ITERATIONS
|
||||
}
|
||||
|
||||
/// Called when the budget is exhausted. Returns true if the engine
|
||||
/// should continue with the current (non-converged) state, false to bail.
|
||||
fn on_budget_exceeded(&self) -> bool {
|
||||
false
|
||||
}
|
||||
}
|
||||
|
||||
/// Result of running the forward dataflow engine.
|
||||
pub struct DataflowResult<S, E> {
|
||||
/// Converged state at the entry of each node.
|
||||
pub states: HashMap<NodeIndex, S>,
|
||||
/// Events emitted during Phase 2 transfer over converged states.
|
||||
pub events: Vec<E>,
|
||||
/// Whether the analysis converged (false if budget was hit).
|
||||
#[allow(dead_code)]
|
||||
pub converged: bool,
|
||||
}
|
||||
|
||||
/// Run a forward worklist dataflow analysis over the CFG.
|
||||
///
|
||||
/// Two-phase design:
|
||||
/// - Phase 1: fixed-point iteration to converge states (no event collection).
|
||||
/// - Phase 2: single pass over converged states to collect events.
|
||||
///
|
||||
/// Termination is guaranteed by lattice finiteness + iteration budget.
|
||||
pub fn run_forward<S: Lattice, T: Transfer<S>>(
|
||||
cfg: &Cfg,
|
||||
entry: NodeIndex,
|
||||
transfer: &T,
|
||||
initial: S,
|
||||
) -> DataflowResult<S, T::Event> {
|
||||
let mut states: HashMap<NodeIndex, S> = HashMap::new();
|
||||
let budget = transfer.iteration_budget();
|
||||
|
||||
// Initialize entry node
|
||||
states.insert(entry, initial);
|
||||
|
||||
// ── Phase 1: fixed-point iteration (compute converged states) ─────
|
||||
let mut worklist: VecDeque<NodeIndex> = VecDeque::new();
|
||||
worklist.push_back(entry);
|
||||
|
||||
let mut iterations: usize = 0;
|
||||
let mut converged = true;
|
||||
|
||||
while let Some(node) = worklist.pop_front() {
|
||||
iterations += 1;
|
||||
if iterations > budget {
|
||||
converged = !transfer.on_budget_exceeded();
|
||||
if !converged {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
let node_state = match states.get(&node) {
|
||||
Some(s) => s.clone(),
|
||||
None => continue,
|
||||
};
|
||||
|
||||
let edges: Vec<_> = cfg.edges(node).map(|e| (*e.weight(), e.target())).collect();
|
||||
|
||||
// No outgoing edges — nothing to propagate (exit/dead end).
|
||||
if edges.is_empty() {
|
||||
continue;
|
||||
}
|
||||
|
||||
for (edge_kind, target) in edges {
|
||||
let info = &cfg[node];
|
||||
let (out_state, _events) =
|
||||
transfer.apply(node, info, Some(edge_kind), node_state.clone());
|
||||
|
||||
// Join into target's state
|
||||
let target_state = states.get(&target);
|
||||
let new_target = match target_state {
|
||||
Some(existing) => existing.join(&out_state),
|
||||
None => out_state,
|
||||
};
|
||||
|
||||
let changed = target_state.is_none_or(|existing| *existing != new_target);
|
||||
if changed {
|
||||
states.insert(target, new_target);
|
||||
if !worklist.contains(&target) {
|
||||
worklist.push_back(target);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── Phase 2: single pass over converged states to collect events ──
|
||||
let mut events: Vec<T::Event> = Vec::new();
|
||||
let mut seen_edges: std::collections::HashSet<(NodeIndex, NodeIndex)> =
|
||||
std::collections::HashSet::new();
|
||||
|
||||
for node in states.keys().copied().collect::<Vec<_>>() {
|
||||
let node_state = match states.get(&node) {
|
||||
Some(s) => s.clone(),
|
||||
None => continue,
|
||||
};
|
||||
|
||||
let edges: Vec<_> = cfg.edges(node).map(|e| (*e.weight(), e.target())).collect();
|
||||
|
||||
if edges.is_empty() {
|
||||
// Exit / dead end — apply transfer for event collection.
|
||||
let info = &cfg[node];
|
||||
let (_out_state, new_events) = transfer.apply(node, info, None, node_state);
|
||||
events.extend(new_events);
|
||||
continue;
|
||||
}
|
||||
|
||||
for (edge_kind, target) in edges {
|
||||
if !seen_edges.insert((node, target)) {
|
||||
continue;
|
||||
}
|
||||
let info = &cfg[node];
|
||||
let (_out_state, new_events) =
|
||||
transfer.apply(node, info, Some(edge_kind), node_state.clone());
|
||||
events.extend(new_events);
|
||||
}
|
||||
}
|
||||
|
||||
DataflowResult {
|
||||
states,
|
||||
events,
|
||||
converged,
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::cfg::{EdgeKind, NodeInfo, StmtKind};
|
||||
use crate::cfg_analysis::rules;
|
||||
use crate::state::domain::ResourceLifecycle;
|
||||
use crate::state::symbol::SymbolInterner;
|
||||
use crate::state::transfer::DefaultTransfer;
|
||||
use crate::symbol::Lang;
|
||||
use petgraph::Graph;
|
||||
|
||||
fn make_node(kind: StmtKind) -> NodeInfo {
|
||||
NodeInfo {
|
||||
kind,
|
||||
span: (0, 0),
|
||||
label: None,
|
||||
defines: None,
|
||||
uses: vec![],
|
||||
callee: None,
|
||||
enclosing_func: None,
|
||||
call_ordinal: 0,
|
||||
condition_text: None,
|
||||
condition_vars: vec![],
|
||||
condition_negated: false,
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn linear_cfg_converges() {
|
||||
use crate::state::domain::ProductState;
|
||||
|
||||
// Entry → fopen(f) → fclose(f) → Exit
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
let open_node = cfg.add_node(NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
defines: Some("f".into()),
|
||||
callee: Some("fopen".into()),
|
||||
..make_node(StmtKind::Call)
|
||||
});
|
||||
let close_node = cfg.add_node(NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
uses: vec!["f".into()],
|
||||
callee: Some("fclose".into()),
|
||||
..make_node(StmtKind::Call)
|
||||
});
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
|
||||
cfg.add_edge(entry, open_node, EdgeKind::Seq);
|
||||
cfg.add_edge(open_node, close_node, EdgeKind::Seq);
|
||||
cfg.add_edge(close_node, exit, EdgeKind::Seq);
|
||||
|
||||
let interner = SymbolInterner::from_cfg(&cfg);
|
||||
let transfer = DefaultTransfer {
|
||||
lang: Lang::C,
|
||||
resource_pairs: rules::resource_pairs(Lang::C),
|
||||
interner: &interner,
|
||||
};
|
||||
|
||||
let result = run_forward(&cfg, entry, &transfer, ProductState::initial());
|
||||
|
||||
// No events (clean open→close)
|
||||
assert!(result.events.is_empty());
|
||||
assert!(result.converged);
|
||||
|
||||
// At exit, f should be CLOSED
|
||||
let sym_f = interner.get("f").unwrap();
|
||||
let exit_state = result.states.get(&exit).unwrap();
|
||||
assert_eq!(exit_state.resource.get(sym_f), ResourceLifecycle::CLOSED);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn diamond_cfg_joins_states() {
|
||||
use crate::state::domain::ProductState;
|
||||
|
||||
// Entry
|
||||
// |
|
||||
// fopen(f)
|
||||
// |
|
||||
// If
|
||||
// / \
|
||||
// fclose(f) (no close)
|
||||
// \ /
|
||||
// Exit
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
let open_node = cfg.add_node(NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
defines: Some("f".into()),
|
||||
callee: Some("fopen".into()),
|
||||
..make_node(StmtKind::Call)
|
||||
});
|
||||
let if_node = cfg.add_node(make_node(StmtKind::If));
|
||||
let close_node = cfg.add_node(NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
uses: vec!["f".into()],
|
||||
callee: Some("fclose".into()),
|
||||
..make_node(StmtKind::Call)
|
||||
});
|
||||
let no_close = cfg.add_node(make_node(StmtKind::Seq));
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
|
||||
cfg.add_edge(entry, open_node, EdgeKind::Seq);
|
||||
cfg.add_edge(open_node, if_node, EdgeKind::Seq);
|
||||
cfg.add_edge(if_node, close_node, EdgeKind::True);
|
||||
cfg.add_edge(if_node, no_close, EdgeKind::False);
|
||||
cfg.add_edge(close_node, exit, EdgeKind::Seq);
|
||||
cfg.add_edge(no_close, exit, EdgeKind::Seq);
|
||||
|
||||
let interner = SymbolInterner::from_cfg(&cfg);
|
||||
let transfer = DefaultTransfer {
|
||||
lang: Lang::C,
|
||||
resource_pairs: rules::resource_pairs(Lang::C),
|
||||
interner: &interner,
|
||||
};
|
||||
|
||||
let result = run_forward(&cfg, entry, &transfer, ProductState::initial());
|
||||
|
||||
// At exit, f should be OPEN | CLOSED (may-leak)
|
||||
let sym_f = interner.get("f").unwrap();
|
||||
let exit_state = result.states.get(&exit).unwrap();
|
||||
assert_eq!(
|
||||
exit_state.resource.get(sym_f),
|
||||
ResourceLifecycle::OPEN | ResourceLifecycle::CLOSED
|
||||
);
|
||||
}
|
||||
}
|
||||
355
src/state/facts.rs
Normal file
355
src/state/facts.rs
Normal file
|
|
@ -0,0 +1,355 @@
|
|||
use super::domain::{AuthLevel, ProductState, ResourceLifecycle};
|
||||
use super::engine::DataflowResult;
|
||||
use super::symbol::SymbolInterner;
|
||||
use super::transfer::{TransferEvent, TransferEventKind};
|
||||
use crate::cfg::{Cfg, StmtKind};
|
||||
use crate::labels::{Cap, DataLabel};
|
||||
use crate::patterns::Severity;
|
||||
use crate::symbol::Lang;
|
||||
use petgraph::visit::IntoNodeReferences;
|
||||
|
||||
/// Normalize a callee description for display.
|
||||
fn sanitize_desc(s: &str) -> String {
|
||||
crate::fmt::normalize_snippet(s)
|
||||
}
|
||||
|
||||
/// A finding produced by state analysis.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct StateFinding {
|
||||
pub rule_id: String,
|
||||
pub severity: Severity,
|
||||
pub span: (usize, usize),
|
||||
pub message: String,
|
||||
/// State machine that produced this finding: `"resource"` or `"auth"`.
|
||||
pub machine: &'static str,
|
||||
/// Variable name involved, if available.
|
||||
pub subject: Option<String>,
|
||||
/// State before the event (e.g. `"closed"`, `"open"`, `"unauthed"`).
|
||||
pub from_state: &'static str,
|
||||
/// State after the event (e.g. `"used"`, `"closed"`, `"leaked"`, `"access"`).
|
||||
pub to_state: &'static str,
|
||||
}
|
||||
|
||||
/// Extract findings from converged dataflow state + transfer events.
|
||||
pub fn extract_findings(
|
||||
result: &DataflowResult<ProductState, TransferEvent>,
|
||||
cfg: &Cfg,
|
||||
interner: &SymbolInterner,
|
||||
lang: Lang,
|
||||
func_summaries: &crate::cfg::FuncSummaries,
|
||||
) -> Vec<StateFinding> {
|
||||
let mut findings = Vec::new();
|
||||
|
||||
// ── 1. Use-after-close from transfer events ──────────────────────────
|
||||
for event in &result.events {
|
||||
let info = &cfg[event.node];
|
||||
let var_name = interner.resolve(event.var);
|
||||
match event.kind {
|
||||
TransferEventKind::UseAfterClose => {
|
||||
findings.push(StateFinding {
|
||||
rule_id: "state-use-after-close".into(),
|
||||
severity: Severity::High,
|
||||
span: info.span,
|
||||
message: format!("variable `{var_name}` used after close"),
|
||||
machine: "resource",
|
||||
subject: Some(var_name.to_string()),
|
||||
from_state: "closed",
|
||||
to_state: "used",
|
||||
});
|
||||
}
|
||||
TransferEventKind::DoubleClose => {
|
||||
findings.push(StateFinding {
|
||||
rule_id: "state-double-close".into(),
|
||||
severity: Severity::Medium,
|
||||
span: info.span,
|
||||
message: format!("variable `{var_name}` closed twice"),
|
||||
machine: "resource",
|
||||
subject: Some(var_name.to_string()),
|
||||
from_state: "closed",
|
||||
to_state: "closed",
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── 2. Resource leaks at Exit and function-Return nodes ──────────────
|
||||
for (idx, info) in cfg.node_references() {
|
||||
// Check both the file-level Exit node and the *synthesised* function
|
||||
// exit node (a Return node). Skip early-return nodes — they flow
|
||||
// into the synthesised exit and carry only path-specific state.
|
||||
// The synthesised exit is the one Return node that does NOT have an
|
||||
// outgoing edge to another Return in the same function.
|
||||
let is_exit = info.kind == StmtKind::Exit;
|
||||
let is_func_exit = info.kind == StmtKind::Return && info.enclosing_func.is_some();
|
||||
if !is_exit && !is_func_exit {
|
||||
continue;
|
||||
}
|
||||
if is_func_exit {
|
||||
use petgraph::Direction;
|
||||
let is_early_return = cfg
|
||||
.neighbors_directed(idx, Direction::Outgoing)
|
||||
.any(|succ| {
|
||||
let s = &cfg[succ];
|
||||
s.kind == StmtKind::Return && s.enclosing_func == info.enclosing_func
|
||||
});
|
||||
if is_early_return {
|
||||
continue;
|
||||
}
|
||||
}
|
||||
let Some(state) = result.states.get(&idx) else {
|
||||
continue;
|
||||
};
|
||||
|
||||
for (&sym, &lifecycle) in &state.resource.vars {
|
||||
if !lifecycle.contains(ResourceLifecycle::OPEN) {
|
||||
continue;
|
||||
}
|
||||
let var_name = interner.resolve(sym);
|
||||
|
||||
if !lifecycle.contains(ResourceLifecycle::CLOSED)
|
||||
&& !lifecycle.contains(ResourceLifecycle::MOVED)
|
||||
{
|
||||
// Definite leak: open on all paths, never closed
|
||||
// Find the acquire span by scanning backwards for this variable's define
|
||||
let acquire_span = find_acquire_span(cfg, sym, interner);
|
||||
findings.push(StateFinding {
|
||||
rule_id: "state-resource-leak".into(),
|
||||
severity: Severity::Medium,
|
||||
span: acquire_span.unwrap_or(info.span),
|
||||
message: format!("resource `{var_name}` is never closed"),
|
||||
machine: "resource",
|
||||
subject: Some(var_name.to_string()),
|
||||
from_state: "open",
|
||||
to_state: "leaked",
|
||||
});
|
||||
} else if lifecycle.contains(ResourceLifecycle::CLOSED) {
|
||||
// May-leak: open on some paths, closed on others
|
||||
let acquire_span = find_acquire_span(cfg, sym, interner);
|
||||
findings.push(StateFinding {
|
||||
rule_id: "state-resource-leak-possible".into(),
|
||||
severity: Severity::Low,
|
||||
span: acquire_span.unwrap_or(info.span),
|
||||
message: format!("resource `{var_name}` may not be closed on all paths"),
|
||||
machine: "resource",
|
||||
subject: Some(var_name.to_string()),
|
||||
from_state: "open",
|
||||
to_state: "possibly_leaked",
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── 3. Auth-required sinks ───────────────────────────────────────────
|
||||
// Check if any function is a web entrypoint
|
||||
let has_web_entrypoint = cfg.node_references().any(|(_, info)| {
|
||||
if let Some(ref func_name) = info.enclosing_func {
|
||||
is_web_entrypoint_simple(func_name, lang, func_summaries, cfg)
|
||||
} else {
|
||||
false
|
||||
}
|
||||
});
|
||||
|
||||
if has_web_entrypoint {
|
||||
for (idx, info) in cfg.node_references() {
|
||||
if !is_privileged_sink(info) {
|
||||
continue;
|
||||
}
|
||||
let Some(state) = result.states.get(&idx) else {
|
||||
continue;
|
||||
};
|
||||
if state.auth.auth_level == AuthLevel::Unauthed {
|
||||
let callee_desc = sanitize_desc(info.callee.as_deref().unwrap_or("(sensitive op)"));
|
||||
findings.push(StateFinding {
|
||||
rule_id: "state-unauthed-access".into(),
|
||||
severity: Severity::High,
|
||||
span: info.span,
|
||||
message: format!(
|
||||
"sensitive operation `{callee_desc}` reached without authentication"
|
||||
),
|
||||
machine: "auth",
|
||||
subject: None,
|
||||
from_state: "unauthed",
|
||||
to_state: "access",
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Dedup
|
||||
findings.sort_by(|a, b| a.span.cmp(&b.span).then_with(|| a.rule_id.cmp(&b.rule_id)));
|
||||
findings.dedup_by(|a, b| a.span == b.span && a.rule_id == b.rule_id);
|
||||
|
||||
findings
|
||||
}
|
||||
|
||||
/// Find the span where a variable was acquired (defined via Call node).
|
||||
fn find_acquire_span(
|
||||
cfg: &Cfg,
|
||||
sym: super::symbol::SymbolId,
|
||||
interner: &SymbolInterner,
|
||||
) -> Option<(usize, usize)> {
|
||||
let var_name = interner.resolve(sym);
|
||||
for (_idx, info) in cfg.node_references() {
|
||||
if info.kind == StmtKind::Call
|
||||
&& let Some(ref def) = info.defines
|
||||
&& def == var_name
|
||||
{
|
||||
return Some(info.span);
|
||||
}
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
/// Check if a node is a privileged sink (shell execution or file I/O).
|
||||
fn is_privileged_sink(info: &crate::cfg::NodeInfo) -> bool {
|
||||
match info.label {
|
||||
Some(DataLabel::Sink(caps)) => caps.intersects(Cap::SHELL_ESCAPE | Cap::FILE_IO),
|
||||
_ => false,
|
||||
}
|
||||
}
|
||||
|
||||
/// Simplified web entrypoint check (avoids AnalysisContext dependency).
|
||||
fn is_web_entrypoint_simple(
|
||||
func_name: &str,
|
||||
lang: Lang,
|
||||
func_summaries: &crate::cfg::FuncSummaries,
|
||||
_cfg: &Cfg,
|
||||
) -> bool {
|
||||
let name_lower = func_name.to_ascii_lowercase();
|
||||
|
||||
// Skip bare "main" — it's typically a CLI entry
|
||||
if name_lower == "main" {
|
||||
return false;
|
||||
}
|
||||
|
||||
let is_handler_name = name_lower.starts_with("handle_")
|
||||
|| name_lower.starts_with("route_")
|
||||
|| name_lower.starts_with("api_")
|
||||
|| name_lower.starts_with("serve_")
|
||||
|| name_lower.starts_with("process_")
|
||||
|| name_lower == "handler";
|
||||
|
||||
if !is_handler_name {
|
||||
return false;
|
||||
}
|
||||
|
||||
// Check for web-like parameters
|
||||
let web_params: &[&str] = match lang {
|
||||
Lang::Rust => &["request", "req", "json", "query", "form", "payload", "body"],
|
||||
Lang::JavaScript | Lang::TypeScript => &["req", "request", "ctx", "res", "response"],
|
||||
Lang::Python => &["request", "req"],
|
||||
Lang::Go => &["w", "writer", "r", "req", "request"],
|
||||
Lang::Java => &["request", "req"],
|
||||
_ => &["request", "req"],
|
||||
};
|
||||
|
||||
let has_web_params = func_summaries.values().any(|s| {
|
||||
s.param_names
|
||||
.iter()
|
||||
.any(|p| web_params.contains(&p.to_ascii_lowercase().as_str()))
|
||||
});
|
||||
|
||||
// Strong handler names are enough even without web params
|
||||
let strong_name = name_lower.starts_with("handle_")
|
||||
|| name_lower.starts_with("route_")
|
||||
|| name_lower.starts_with("api_");
|
||||
|
||||
has_web_params || strong_name
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::cfg::{EdgeKind, NodeInfo};
|
||||
use crate::cfg_analysis::rules;
|
||||
use crate::state::domain::ProductState;
|
||||
use crate::state::engine;
|
||||
use crate::state::symbol::SymbolInterner;
|
||||
use crate::state::transfer::DefaultTransfer;
|
||||
use petgraph::Graph;
|
||||
use std::collections::HashMap;
|
||||
|
||||
fn make_node(kind: StmtKind) -> NodeInfo {
|
||||
NodeInfo {
|
||||
kind,
|
||||
span: (0, 0),
|
||||
label: None,
|
||||
defines: None,
|
||||
uses: vec![],
|
||||
callee: None,
|
||||
enclosing_func: None,
|
||||
call_ordinal: 0,
|
||||
condition_text: None,
|
||||
condition_vars: vec![],
|
||||
condition_negated: false,
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn detects_resource_leak() {
|
||||
// Entry → fopen(f) → Exit (no close)
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
let open_node = cfg.add_node(NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
span: (10, 20),
|
||||
defines: Some("f".into()),
|
||||
callee: Some("fopen".into()),
|
||||
..make_node(StmtKind::Call)
|
||||
});
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
|
||||
cfg.add_edge(entry, open_node, EdgeKind::Seq);
|
||||
cfg.add_edge(open_node, exit, EdgeKind::Seq);
|
||||
|
||||
let interner = SymbolInterner::from_cfg(&cfg);
|
||||
let transfer = DefaultTransfer {
|
||||
lang: Lang::C,
|
||||
resource_pairs: rules::resource_pairs(Lang::C),
|
||||
interner: &interner,
|
||||
};
|
||||
|
||||
let result = engine::run_forward(&cfg, entry, &transfer, ProductState::initial());
|
||||
let findings = extract_findings(&result, &cfg, &interner, Lang::C, &HashMap::new());
|
||||
|
||||
assert_eq!(findings.len(), 1);
|
||||
assert_eq!(findings[0].rule_id, "state-resource-leak");
|
||||
assert!(findings[0].message.contains("f"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn clean_open_close_no_findings() {
|
||||
// Entry → fopen(f) → fclose(f) → Exit
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
let open_node = cfg.add_node(NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
defines: Some("f".into()),
|
||||
callee: Some("fopen".into()),
|
||||
..make_node(StmtKind::Call)
|
||||
});
|
||||
let close_node = cfg.add_node(NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
uses: vec!["f".into()],
|
||||
callee: Some("fclose".into()),
|
||||
..make_node(StmtKind::Call)
|
||||
});
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
|
||||
cfg.add_edge(entry, open_node, EdgeKind::Seq);
|
||||
cfg.add_edge(open_node, close_node, EdgeKind::Seq);
|
||||
cfg.add_edge(close_node, exit, EdgeKind::Seq);
|
||||
|
||||
let interner = SymbolInterner::from_cfg(&cfg);
|
||||
let transfer = DefaultTransfer {
|
||||
lang: Lang::C,
|
||||
resource_pairs: rules::resource_pairs(Lang::C),
|
||||
interner: &interner,
|
||||
};
|
||||
|
||||
let result = engine::run_forward(&cfg, entry, &transfer, ProductState::initial());
|
||||
let findings = extract_findings(&result, &cfg, &interner, Lang::C, &HashMap::new());
|
||||
|
||||
assert!(findings.is_empty());
|
||||
}
|
||||
}
|
||||
91
src/state/lattice.rs
Normal file
91
src/state/lattice.rs
Normal file
|
|
@ -0,0 +1,91 @@
|
|||
/// A bounded semi-lattice with bottom element and monotone join.
|
||||
///
|
||||
/// Implementations must satisfy:
|
||||
/// - `join` is commutative, associative, and idempotent
|
||||
/// - `bot()` is the identity for `join`
|
||||
/// - `leq(a, b)` iff `join(a, b) == b`
|
||||
#[allow(dead_code)]
|
||||
pub trait Lattice: Clone + Eq + Sized {
|
||||
/// Bottom element (least information / unreachable).
|
||||
fn bot() -> Self;
|
||||
|
||||
/// Least upper bound: merge two abstract values.
|
||||
fn join(&self, other: &Self) -> Self;
|
||||
|
||||
/// Partial order: `self ⊑ other`.
|
||||
fn leq(&self, other: &Self) -> bool;
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
/// A trivial 3-element lattice for testing the trait contract.
|
||||
#[derive(Clone, Debug, PartialEq, Eq)]
|
||||
struct Three(u8); // 0=bot, 1, 2=top-ish
|
||||
|
||||
impl Lattice for Three {
|
||||
fn bot() -> Self {
|
||||
Three(0)
|
||||
}
|
||||
fn join(&self, other: &Self) -> Self {
|
||||
Three(self.0.max(other.0))
|
||||
}
|
||||
fn leq(&self, other: &Self) -> bool {
|
||||
self.0 <= other.0
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn bot_identity() {
|
||||
let a = Three(1);
|
||||
assert_eq!(a.join(&Three::bot()), a);
|
||||
assert_eq!(Three::bot().join(&a), a);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn join_commutative() {
|
||||
let a = Three(1);
|
||||
let b = Three(2);
|
||||
assert_eq!(a.join(&b), b.join(&a));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn join_associative() {
|
||||
let a = Three(0);
|
||||
let b = Three(1);
|
||||
let c = Three(2);
|
||||
assert_eq!(a.join(&b).join(&c), a.join(&b.join(&c)));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn join_idempotent() {
|
||||
let a = Three(1);
|
||||
assert_eq!(a.join(&a), a);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn leq_reflexive() {
|
||||
let a = Three(1);
|
||||
assert!(a.leq(&a));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn leq_transitive() {
|
||||
let a = Three(0);
|
||||
let b = Three(1);
|
||||
let c = Three(2);
|
||||
assert!(a.leq(&b));
|
||||
assert!(b.leq(&c));
|
||||
assert!(a.leq(&c));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn leq_consistent_with_join() {
|
||||
let a = Three(1);
|
||||
let b = Three(2);
|
||||
// a ⊑ b iff join(a, b) == b
|
||||
assert!(a.leq(&b));
|
||||
assert_eq!(a.join(&b), b);
|
||||
}
|
||||
}
|
||||
62
src/state/mod.rs
Normal file
62
src/state/mod.rs
Normal file
|
|
@ -0,0 +1,62 @@
|
|||
pub mod domain;
|
||||
pub mod engine;
|
||||
pub mod facts;
|
||||
pub mod lattice;
|
||||
pub mod symbol;
|
||||
pub mod transfer;
|
||||
|
||||
use crate::cfg::{Cfg, FuncSummaries};
|
||||
use crate::cfg_analysis::rules;
|
||||
use crate::summary::GlobalSummaries;
|
||||
use crate::symbol::Lang;
|
||||
use domain::ProductState;
|
||||
use engine::MAX_TRACKED_VARS;
|
||||
use facts::StateFinding;
|
||||
use petgraph::graph::NodeIndex;
|
||||
use symbol::SymbolInterner;
|
||||
use transfer::DefaultTransfer;
|
||||
|
||||
/// Run state-model dataflow analysis on a single function's CFG.
|
||||
///
|
||||
/// Returns findings for use-after-close, double-close, resource leaks,
|
||||
/// and unauthenticated access to sensitive sinks.
|
||||
pub fn run_state_analysis(
|
||||
cfg: &Cfg,
|
||||
entry: NodeIndex,
|
||||
lang: Lang,
|
||||
_source_bytes: &[u8],
|
||||
func_summaries: &FuncSummaries,
|
||||
_global_summaries: Option<&GlobalSummaries>,
|
||||
) -> Vec<StateFinding> {
|
||||
let _span = tracing::debug_span!("run_state_analysis").entered();
|
||||
|
||||
// 1. Build symbol interner from CFG
|
||||
let interner = SymbolInterner::from_cfg(cfg);
|
||||
|
||||
// Guarded degradation: cap tracked variables
|
||||
if interner.len() > MAX_TRACKED_VARS {
|
||||
tracing::warn!(
|
||||
symbols = interner.len(),
|
||||
max = MAX_TRACKED_VARS,
|
||||
"state analysis: too many variables, capping tracking"
|
||||
);
|
||||
// Still run — the interner has all symbols, but transfer will only
|
||||
// track the first MAX_TRACKED_VARS due to HashMap insertion order.
|
||||
// This is conservative but safe.
|
||||
}
|
||||
|
||||
// 2. Construct transfer function
|
||||
let resource_pairs = rules::resource_pairs(lang);
|
||||
let transfer = DefaultTransfer {
|
||||
lang,
|
||||
resource_pairs,
|
||||
interner: &interner,
|
||||
};
|
||||
|
||||
// 3. Run forward dataflow engine
|
||||
let initial = ProductState::initial();
|
||||
let result = engine::run_forward(cfg, entry, &transfer, initial);
|
||||
|
||||
// 4. Extract findings
|
||||
facts::extract_findings(&result, cfg, &interner, lang, func_summaries)
|
||||
}
|
||||
101
src/state/symbol.rs
Normal file
101
src/state/symbol.rs
Normal file
|
|
@ -0,0 +1,101 @@
|
|||
use crate::cfg::Cfg;
|
||||
use petgraph::visit::IntoNodeReferences;
|
||||
use std::collections::HashMap;
|
||||
|
||||
/// Cheap `Copy` handle into a [`SymbolInterner`].
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
|
||||
pub struct SymbolId(pub(crate) u32);
|
||||
|
||||
/// Per-function interner: maps `String` ↔ [`SymbolId`].
|
||||
///
|
||||
/// Built once from CFG node `defines`/`uses`, reused throughout analysis.
|
||||
#[derive(Default)]
|
||||
pub struct SymbolInterner {
|
||||
to_id: HashMap<String, SymbolId>,
|
||||
to_str: Vec<String>,
|
||||
}
|
||||
|
||||
impl SymbolInterner {
|
||||
pub fn new() -> Self {
|
||||
Self::default()
|
||||
}
|
||||
|
||||
/// Intern a name, returning its stable [`SymbolId`].
|
||||
pub fn intern(&mut self, name: &str) -> SymbolId {
|
||||
if let Some(&id) = self.to_id.get(name) {
|
||||
return id;
|
||||
}
|
||||
let id = SymbolId(self.to_str.len() as u32);
|
||||
self.to_str.push(name.to_owned());
|
||||
self.to_id.insert(name.to_owned(), id);
|
||||
id
|
||||
}
|
||||
|
||||
/// Look up a name without interning it.
|
||||
pub fn get(&self, name: &str) -> Option<SymbolId> {
|
||||
self.to_id.get(name).copied()
|
||||
}
|
||||
|
||||
/// Resolve an id back to its string.
|
||||
pub fn resolve(&self, id: SymbolId) -> &str {
|
||||
&self.to_str[id.0 as usize]
|
||||
}
|
||||
|
||||
/// Number of interned symbols.
|
||||
pub fn len(&self) -> usize {
|
||||
self.to_str.len()
|
||||
}
|
||||
|
||||
/// Whether the interner is empty.
|
||||
#[allow(dead_code)]
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.to_str.is_empty()
|
||||
}
|
||||
|
||||
/// Build from a CFG: walk all nodes, intern every `defines`/`uses` string.
|
||||
pub fn from_cfg(cfg: &Cfg) -> Self {
|
||||
let mut interner = Self::new();
|
||||
for (_idx, info) in cfg.node_references() {
|
||||
if let Some(ref d) = info.defines {
|
||||
interner.intern(d);
|
||||
}
|
||||
for u in &info.uses {
|
||||
interner.intern(u);
|
||||
}
|
||||
}
|
||||
interner
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn intern_resolve_roundtrip() {
|
||||
let mut interner = SymbolInterner::new();
|
||||
let a = interner.intern("foo");
|
||||
let b = interner.intern("bar");
|
||||
let a2 = interner.intern("foo");
|
||||
|
||||
assert_eq!(a, a2);
|
||||
assert_ne!(a, b);
|
||||
assert_eq!(interner.resolve(a), "foo");
|
||||
assert_eq!(interner.resolve(b), "bar");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn get_returns_none_for_unknown() {
|
||||
let interner = SymbolInterner::new();
|
||||
assert!(interner.get("missing").is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn len_tracks_unique_symbols() {
|
||||
let mut interner = SymbolInterner::new();
|
||||
interner.intern("a");
|
||||
interner.intern("b");
|
||||
interner.intern("a"); // duplicate
|
||||
assert_eq!(interner.len(), 2);
|
||||
}
|
||||
}
|
||||
426
src/state/transfer.rs
Normal file
426
src/state/transfer.rs
Normal file
|
|
@ -0,0 +1,426 @@
|
|||
use super::domain::{AuthLevel, ProductState, ResourceLifecycle};
|
||||
use super::engine::Transfer;
|
||||
use super::symbol::{SymbolId, SymbolInterner};
|
||||
use crate::cfg::{EdgeKind, NodeInfo, StmtKind};
|
||||
use crate::cfg_analysis::rules::{self, ResourcePair};
|
||||
use crate::symbol::Lang;
|
||||
use petgraph::graph::NodeIndex;
|
||||
|
||||
/// Events emitted during transfer for illegal state transitions.
|
||||
/// These are NOT lattice values — they become findings in `facts.rs`.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct TransferEvent {
|
||||
pub kind: TransferEventKind,
|
||||
pub node: NodeIndex,
|
||||
pub var: SymbolId,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub enum TransferEventKind {
|
||||
UseAfterClose,
|
||||
DoubleClose,
|
||||
}
|
||||
|
||||
/// Resource-use patterns: callees that read/write/operate on a resource handle
|
||||
/// (triggering use-after-close if the handle is closed).
|
||||
static RESOURCE_USE_PATTERNS: &[&str] = &[
|
||||
"read", "write", "send", "recv", "fread", "fwrite", "fgets", "fputs", "fprintf", "fscanf",
|
||||
"fflush", "fseek", "ftell", "rewind", "feof", "ferror", "fgetc", "fputc", "getc", "putc",
|
||||
"ungetc", "query", "execute", "fetch", "sendto", "recvfrom", "ioctl", "fcntl",
|
||||
// Memory access functions (for malloc/free use-after-free detection)
|
||||
"strcpy", "strncpy", "strcat", "strncat", "memcpy", "memmove", "memset", "memcmp", "strcmp",
|
||||
"strncmp", "strlen", "sprintf", "snprintf",
|
||||
];
|
||||
|
||||
/// Auth-call matchers for admin-level privilege.
|
||||
static ADMIN_PATTERNS: &[&str] = &[
|
||||
"is_admin",
|
||||
"hasrole",
|
||||
"has_role",
|
||||
"check_admin",
|
||||
"require_admin",
|
||||
];
|
||||
|
||||
pub struct DefaultTransfer<'a> {
|
||||
pub lang: Lang,
|
||||
pub resource_pairs: &'a [ResourcePair],
|
||||
pub interner: &'a SymbolInterner,
|
||||
}
|
||||
|
||||
impl Transfer<ProductState> for DefaultTransfer<'_> {
|
||||
type Event = TransferEvent;
|
||||
|
||||
fn apply(
|
||||
&self,
|
||||
node_idx: NodeIndex,
|
||||
info: &NodeInfo,
|
||||
edge: Option<EdgeKind>,
|
||||
mut state: ProductState,
|
||||
) -> (ProductState, Vec<TransferEvent>) {
|
||||
let mut events = Vec::new();
|
||||
|
||||
match info.kind {
|
||||
StmtKind::Call => {
|
||||
self.apply_call(node_idx, info, &mut state, &mut events);
|
||||
}
|
||||
StmtKind::If => {
|
||||
self.apply_if(info, edge, &mut state);
|
||||
}
|
||||
StmtKind::Seq => {
|
||||
self.apply_assignment(node_idx, info, &mut state);
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
|
||||
(state, events)
|
||||
}
|
||||
}
|
||||
|
||||
impl DefaultTransfer<'_> {
|
||||
fn apply_call(
|
||||
&self,
|
||||
node_idx: NodeIndex,
|
||||
info: &NodeInfo,
|
||||
state: &mut ProductState,
|
||||
events: &mut Vec<TransferEvent>,
|
||||
) {
|
||||
let callee = match &info.callee {
|
||||
Some(c) => c.to_ascii_lowercase(),
|
||||
None => return,
|
||||
};
|
||||
|
||||
// ── Resource acquire ─────────────────────────────────────────────
|
||||
for pair in self.resource_pairs {
|
||||
let is_acquire = pair.acquire.iter().any(|a| callee_matches(&callee, a));
|
||||
let is_excluded = pair
|
||||
.exclude_acquire
|
||||
.iter()
|
||||
.any(|e| callee_matches(&callee, e));
|
||||
|
||||
if is_acquire
|
||||
&& !is_excluded
|
||||
&& let Some(ref def) = info.defines
|
||||
&& let Some(sym) = self.interner.get(def)
|
||||
{
|
||||
state.resource.set(sym, ResourceLifecycle::OPEN);
|
||||
}
|
||||
}
|
||||
|
||||
// ── Resource release ─────────────────────────────────────────────
|
||||
// Track which variables have already been released to avoid double-
|
||||
// matching across multiple resource pair definitions.
|
||||
let mut released: smallvec::SmallVec<[SymbolId; 4]> = smallvec::SmallVec::new();
|
||||
for pair in self.resource_pairs {
|
||||
let is_release = pair.release.iter().any(|r| callee_matches(&callee, r));
|
||||
if is_release {
|
||||
for used in &info.uses {
|
||||
if let Some(sym) = self.interner.get(used) {
|
||||
if released.contains(&sym) {
|
||||
continue;
|
||||
}
|
||||
let current = state.resource.get(sym);
|
||||
if current == ResourceLifecycle::CLOSED {
|
||||
// Double close
|
||||
events.push(TransferEvent {
|
||||
kind: TransferEventKind::DoubleClose,
|
||||
node: node_idx,
|
||||
var: sym,
|
||||
});
|
||||
} else if current.contains(ResourceLifecycle::OPEN) {
|
||||
state.resource.set(sym, ResourceLifecycle::CLOSED);
|
||||
}
|
||||
released.push(sym);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── Resource use (read/write/etc.) ───────────────────────────────
|
||||
let is_use = RESOURCE_USE_PATTERNS
|
||||
.iter()
|
||||
.any(|p| callee_matches(&callee, p));
|
||||
if is_use {
|
||||
for used in &info.uses {
|
||||
if let Some(sym) = self.interner.get(used) {
|
||||
let current = state.resource.get(sym);
|
||||
if current == ResourceLifecycle::CLOSED {
|
||||
events.push(TransferEvent {
|
||||
kind: TransferEventKind::UseAfterClose,
|
||||
node: node_idx,
|
||||
var: sym,
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── Auth call ────────────────────────────────────────────────────
|
||||
let auth_rules = rules::auth_rules(self.lang);
|
||||
let is_auth = auth_rules.iter().any(|rule| {
|
||||
rule.matchers
|
||||
.iter()
|
||||
.any(|m| callee_matches(&callee, &m.to_ascii_lowercase()))
|
||||
});
|
||||
if is_auth {
|
||||
let is_admin = ADMIN_PATTERNS.iter().any(|p| callee_matches(&callee, p));
|
||||
let new_level = if is_admin {
|
||||
AuthLevel::Admin
|
||||
} else {
|
||||
AuthLevel::Authed
|
||||
};
|
||||
if new_level > state.auth.auth_level {
|
||||
state.auth.auth_level = new_level;
|
||||
}
|
||||
}
|
||||
|
||||
// ── Validation call (guard) ──────────────────────────────────────
|
||||
if is_guard_like(&callee) {
|
||||
for used in &info.uses {
|
||||
if let Some(sym) = self.interner.get(used) {
|
||||
state.auth.validated.insert(sym);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn apply_if(&self, info: &NodeInfo, edge: Option<EdgeKind>, state: &mut ProductState) {
|
||||
// On the True edge of an If node whose condition is an auth check,
|
||||
// refine auth level.
|
||||
let is_true_edge = matches!(edge, Some(EdgeKind::True));
|
||||
if !is_true_edge {
|
||||
return;
|
||||
}
|
||||
|
||||
if let Some(ref cond) = info.condition_text {
|
||||
let cond_lower = cond.to_ascii_lowercase();
|
||||
|
||||
// Auth-related condition
|
||||
let auth_rules = rules::auth_rules(self.lang);
|
||||
let is_auth_cond = auth_rules.iter().any(|rule| {
|
||||
rule.matchers
|
||||
.iter()
|
||||
.any(|m| cond_lower.contains(&m.to_ascii_lowercase()))
|
||||
});
|
||||
if is_auth_cond && !info.condition_negated {
|
||||
let is_admin = ADMIN_PATTERNS.iter().any(|p| cond_lower.contains(p));
|
||||
let new_level = if is_admin {
|
||||
AuthLevel::Admin
|
||||
} else {
|
||||
AuthLevel::Authed
|
||||
};
|
||||
if new_level > state.auth.auth_level {
|
||||
state.auth.auth_level = new_level;
|
||||
}
|
||||
}
|
||||
|
||||
// Validation-related condition
|
||||
if is_guard_like(&cond_lower) && !info.condition_negated {
|
||||
for var in &info.condition_vars {
|
||||
if let Some(sym) = self.interner.get(var) {
|
||||
state.auth.validated.insert(sym);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn apply_assignment(&self, _node_idx: NodeIndex, info: &NodeInfo, state: &mut ProductState) {
|
||||
// Ownership transfer: if `defines` reassigns a tracked resource
|
||||
// variable from a `uses` variable, transfer the lifecycle.
|
||||
if let Some(ref def) = info.defines
|
||||
&& let Some(def_sym) = self.interner.get(def)
|
||||
{
|
||||
// If the RHS is a tracked resource, transfer its state
|
||||
for used in &info.uses {
|
||||
if let Some(use_sym) = self.interner.get(used) {
|
||||
let lc = state.resource.get(use_sym);
|
||||
if lc.contains(ResourceLifecycle::OPEN) {
|
||||
state.resource.set(def_sym, lc);
|
||||
state.resource.set(use_sym, ResourceLifecycle::MOVED);
|
||||
return;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Check if a callee matches a pattern.
|
||||
/// Supports suffix matching (e.g., "fclose" matches callee "my_fclose")
|
||||
/// and dot-prefix matching (e.g., ".close" matches "file.close").
|
||||
fn callee_matches(callee: &str, pattern: &str) -> bool {
|
||||
let pattern_lower = pattern.to_ascii_lowercase();
|
||||
if pattern_lower.starts_with('.') {
|
||||
// Method pattern: ".close" matches "x.close", "file.close", etc.
|
||||
callee.ends_with(&pattern_lower)
|
||||
} else {
|
||||
// Exact or suffix match
|
||||
callee == pattern_lower || callee.ends_with(&pattern_lower)
|
||||
}
|
||||
}
|
||||
|
||||
/// Check if a callee looks like a guard/validation function.
|
||||
fn is_guard_like(callee: &str) -> bool {
|
||||
static GUARD_PREFIXES: &[&str] = &["validate", "sanitize", "check_", "verify_", "assert_"];
|
||||
GUARD_PREFIXES.iter().any(|p| callee.starts_with(p))
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
#[test]
|
||||
fn callee_matches_exact() {
|
||||
assert!(callee_matches("fopen", "fopen"));
|
||||
assert!(!callee_matches("fopen", "fclose"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn callee_matches_suffix() {
|
||||
assert!(callee_matches("curlx_fclose", "fclose"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn callee_matches_dot_prefix() {
|
||||
assert!(callee_matches("file.close", ".close"));
|
||||
assert!(!callee_matches("file.close", ".open"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn acquire_sets_open() {
|
||||
let mut interner = SymbolInterner::new();
|
||||
let sym_f = interner.intern("f");
|
||||
|
||||
let transfer = DefaultTransfer {
|
||||
lang: Lang::C,
|
||||
resource_pairs: rules::resource_pairs(Lang::C),
|
||||
interner: &interner,
|
||||
};
|
||||
|
||||
let info = NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
span: (0, 10),
|
||||
label: None,
|
||||
defines: Some("f".into()),
|
||||
uses: vec![],
|
||||
callee: Some("fopen".into()),
|
||||
enclosing_func: None,
|
||||
call_ordinal: 0,
|
||||
condition_text: None,
|
||||
condition_vars: vec![],
|
||||
condition_negated: false,
|
||||
};
|
||||
|
||||
let (state, events) =
|
||||
transfer.apply(NodeIndex::new(0), &info, None, ProductState::initial());
|
||||
assert!(events.is_empty());
|
||||
assert_eq!(state.resource.get(sym_f), ResourceLifecycle::OPEN);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn close_after_open_sets_closed() {
|
||||
let mut interner = SymbolInterner::new();
|
||||
let sym_f = interner.intern("f");
|
||||
|
||||
let transfer = DefaultTransfer {
|
||||
lang: Lang::C,
|
||||
resource_pairs: rules::resource_pairs(Lang::C),
|
||||
interner: &interner,
|
||||
};
|
||||
|
||||
let mut state = ProductState::initial();
|
||||
state.resource.set(sym_f, ResourceLifecycle::OPEN);
|
||||
|
||||
let info = NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
span: (10, 20),
|
||||
label: None,
|
||||
defines: None,
|
||||
uses: vec!["f".into()],
|
||||
callee: Some("fclose".into()),
|
||||
enclosing_func: None,
|
||||
call_ordinal: 0,
|
||||
condition_text: None,
|
||||
condition_vars: vec![],
|
||||
condition_negated: false,
|
||||
};
|
||||
|
||||
let (state, events) = transfer.apply(NodeIndex::new(1), &info, None, state);
|
||||
assert!(events.is_empty());
|
||||
assert_eq!(state.resource.get(sym_f), ResourceLifecycle::CLOSED);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn double_close_emits_event() {
|
||||
let mut interner = SymbolInterner::new();
|
||||
let sym_f = interner.intern("f");
|
||||
|
||||
let transfer = DefaultTransfer {
|
||||
lang: Lang::C,
|
||||
resource_pairs: rules::resource_pairs(Lang::C),
|
||||
interner: &interner,
|
||||
};
|
||||
|
||||
let mut state = ProductState::initial();
|
||||
state.resource.set(sym_f, ResourceLifecycle::CLOSED);
|
||||
|
||||
let info = NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
span: (20, 30),
|
||||
label: None,
|
||||
defines: None,
|
||||
uses: vec!["f".into()],
|
||||
callee: Some("fclose".into()),
|
||||
enclosing_func: None,
|
||||
call_ordinal: 0,
|
||||
condition_text: None,
|
||||
condition_vars: vec![],
|
||||
condition_negated: false,
|
||||
};
|
||||
|
||||
let (_state, events) = transfer.apply(NodeIndex::new(2), &info, None, state);
|
||||
assert_eq!(events.len(), 1);
|
||||
assert_eq!(events[0].kind, TransferEventKind::DoubleClose);
|
||||
assert_eq!(events[0].var, sym_f);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn use_after_close_emits_event() {
|
||||
let mut interner = SymbolInterner::new();
|
||||
let sym_f = interner.intern("f");
|
||||
|
||||
let transfer = DefaultTransfer {
|
||||
lang: Lang::C,
|
||||
resource_pairs: rules::resource_pairs(Lang::C),
|
||||
interner: &interner,
|
||||
};
|
||||
|
||||
let mut state = ProductState::initial();
|
||||
state.resource.set(sym_f, ResourceLifecycle::CLOSED);
|
||||
|
||||
let info = NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
span: (30, 40),
|
||||
label: None,
|
||||
defines: None,
|
||||
uses: vec!["f".into()],
|
||||
callee: Some("fread".into()),
|
||||
enclosing_func: None,
|
||||
call_ordinal: 0,
|
||||
condition_text: None,
|
||||
condition_vars: vec![],
|
||||
condition_negated: false,
|
||||
};
|
||||
|
||||
let (_state, events) = transfer.apply(NodeIndex::new(3), &info, None, state);
|
||||
assert_eq!(events.len(), 1);
|
||||
assert_eq!(events[0].kind, TransferEventKind::UseAfterClose);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn is_guard_like_check() {
|
||||
assert!(is_guard_like("validate_input"));
|
||||
assert!(is_guard_like("sanitize_html"));
|
||||
assert!(is_guard_like("check_permission"));
|
||||
assert!(!is_guard_like("open_file"));
|
||||
}
|
||||
}
|
||||
|
|
@ -139,6 +139,22 @@ impl FuncSummary {
|
|||
}
|
||||
}
|
||||
|
||||
// ── Callee resolution ────────────────────────────────────────────────────
|
||||
|
||||
/// Result of resolving a bare callee name to a [`FuncKey`].
|
||||
///
|
||||
/// Three-valued: the call graph builder and taint engine need to distinguish
|
||||
/// "no candidates at all" from "multiple candidates, can't pick one".
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub enum CalleeResolution {
|
||||
/// Exactly one candidate matched.
|
||||
Resolved(FuncKey),
|
||||
/// No candidates found at all.
|
||||
NotFound,
|
||||
/// Multiple candidates — ambiguous, cannot pick one.
|
||||
Ambiguous(Vec<FuncKey>),
|
||||
}
|
||||
|
||||
// ── Lookup map used by the taint engine ─────────────────────────────────
|
||||
|
||||
/// A merged view of all function summaries keyed by qualified [`FuncKey`].
|
||||
|
|
@ -216,16 +232,66 @@ impl GlobalSummaries {
|
|||
}
|
||||
}
|
||||
|
||||
#[allow(dead_code)]
|
||||
#[allow(dead_code)] // used by tests and future call-graph consumers
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.by_key.is_empty()
|
||||
}
|
||||
|
||||
/// Iterate over all (key, summary) pairs.
|
||||
#[allow(dead_code)]
|
||||
pub fn iter(&self) -> impl Iterator<Item = (&FuncKey, &FuncSummary)> {
|
||||
self.by_key.iter()
|
||||
}
|
||||
|
||||
/// Resolve a bare (already-normalized) callee name to a [`FuncKey`].
|
||||
///
|
||||
/// Resolution order:
|
||||
/// 1. Collect all same-language candidates matching the name.
|
||||
/// 2. If `arity_hint` is `Some`, filter candidates by matching arity.
|
||||
/// 3. If exactly one candidate → [`CalleeResolution::Resolved`].
|
||||
/// 4. If multiple, filter by `caller_namespace`; if exactly one → `Resolved`.
|
||||
/// 5. If still multiple → [`CalleeResolution::Ambiguous`].
|
||||
/// 6. If zero candidates → [`CalleeResolution::NotFound`].
|
||||
pub fn resolve_callee_key(
|
||||
&self,
|
||||
callee: &str,
|
||||
caller_lang: Lang,
|
||||
caller_namespace: &str,
|
||||
arity_hint: Option<usize>,
|
||||
) -> CalleeResolution {
|
||||
let candidates = self.lookup_same_lang(caller_lang, callee);
|
||||
if candidates.is_empty() {
|
||||
return CalleeResolution::NotFound;
|
||||
}
|
||||
|
||||
// Apply arity filter if hint provided.
|
||||
let filtered: Vec<&FuncKey> = if let Some(arity) = arity_hint {
|
||||
candidates
|
||||
.iter()
|
||||
.filter(|(k, _)| k.arity == Some(arity))
|
||||
.map(|(k, _)| *k)
|
||||
.collect()
|
||||
} else {
|
||||
candidates.iter().map(|(k, _)| *k).collect()
|
||||
};
|
||||
|
||||
match filtered.len() {
|
||||
0 => CalleeResolution::NotFound,
|
||||
1 => CalleeResolution::Resolved(filtered[0].clone()),
|
||||
_ => {
|
||||
// Namespace disambiguation: prefer same-namespace match.
|
||||
let same_ns: Vec<&FuncKey> = filtered
|
||||
.iter()
|
||||
.filter(|k| k.namespace == caller_namespace)
|
||||
.copied()
|
||||
.collect();
|
||||
match same_ns.len() {
|
||||
1 => CalleeResolution::Resolved(same_ns[0].clone()),
|
||||
0 => CalleeResolution::Ambiguous(filtered.into_iter().cloned().collect()),
|
||||
_ => CalleeResolution::Ambiguous(same_ns.into_iter().cloned().collect()),
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl std::fmt::Debug for GlobalSummaries {
|
||||
|
|
|
|||
715
src/suppress/mod.rs
Normal file
715
src/suppress/mod.rs
Normal file
|
|
@ -0,0 +1,715 @@
|
|||
//! Inline per-finding suppression via source-code comments.
|
||||
//!
|
||||
//! Supports two directive forms:
|
||||
//! - `nyx:ignore <RULE_ID>[, <RULE_ID>…]` — suppress findings on the same line
|
||||
//! - `nyx:ignore-next-line <RULE_ID>[, …]` — suppress findings on the next line
|
||||
//!
|
||||
//! Comments are detected for all supported languages without tree-sitter,
|
||||
//! using a lightweight string/comment state machine.
|
||||
|
||||
use std::collections::HashMap;
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Public types
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Whether the directive suppresses on its own line or the next line.
|
||||
#[derive(Debug, Clone, serde::Serialize)]
|
||||
pub enum SuppressionKind {
|
||||
SameLine,
|
||||
NextLine,
|
||||
}
|
||||
|
||||
/// Metadata attached to a suppressed finding.
|
||||
#[derive(Debug, Clone, serde::Serialize)]
|
||||
pub struct SuppressionMeta {
|
||||
pub kind: SuppressionKind,
|
||||
/// The pattern that matched the finding's rule ID.
|
||||
pub matched_pattern: String,
|
||||
/// 1-indexed line where the suppression directive appears.
|
||||
pub directive_line: usize,
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Internal types
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// A single rule matcher — either exact or wildcard-suffix (`foo.*`).
|
||||
#[derive(Debug)]
|
||||
enum RuleMatcher {
|
||||
Exact(String),
|
||||
/// `prefix` stores everything before the trailing `.*`.
|
||||
WildcardSuffix(String),
|
||||
}
|
||||
|
||||
impl RuleMatcher {
|
||||
fn matches(&self, rule_id: &str) -> bool {
|
||||
match self {
|
||||
RuleMatcher::Exact(s) => s == rule_id,
|
||||
RuleMatcher::WildcardSuffix(prefix) => {
|
||||
rule_id.starts_with(prefix.as_str())
|
||||
&& rule_id.len() > prefix.len()
|
||||
&& rule_id.as_bytes()[prefix.len()] == b'.'
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// A parsed directive from a single comment.
|
||||
#[derive(Debug)]
|
||||
struct LineDirective {
|
||||
kind: SuppressionKind,
|
||||
/// 1-indexed line where the directive comment appears.
|
||||
directive_line: usize,
|
||||
matchers: Vec<RuleMatcher>,
|
||||
}
|
||||
|
||||
/// Pre-built index of suppression directives keyed by **target line** (the
|
||||
/// line whose findings should be suppressed, 1-indexed).
|
||||
pub struct SuppressionIndex {
|
||||
directives: HashMap<usize, Vec<LineDirective>>,
|
||||
}
|
||||
|
||||
impl SuppressionIndex {
|
||||
/// Check whether a finding at `line` (1-indexed) with `rule_id` is suppressed.
|
||||
pub fn check(&self, line: usize, rule_id: &str) -> Option<SuppressionMeta> {
|
||||
let canon = canonical_rule_id(rule_id);
|
||||
let dirs = self.directives.get(&line)?;
|
||||
for dir in dirs {
|
||||
for m in &dir.matchers {
|
||||
if m.matches(canon) {
|
||||
let display_pattern = match m {
|
||||
RuleMatcher::Exact(s) => s.clone(),
|
||||
RuleMatcher::WildcardSuffix(s) => format!("{s}.*"),
|
||||
};
|
||||
return Some(SuppressionMeta {
|
||||
kind: dir.kind.clone(),
|
||||
matched_pattern: display_pattern,
|
||||
directive_line: dir.directive_line,
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
/// Returns `true` if no directives were found.
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.directives.is_empty()
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Canonical rule ID
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Strip parenthetical suffix from a rule ID:
|
||||
/// `"taint-unsanitised-flow (source 5:1)"` → `"taint-unsanitised-flow"`.
|
||||
pub fn canonical_rule_id(id: &str) -> &str {
|
||||
let trimmed = id.trim();
|
||||
if let Some(idx) = trimmed.find(" (") {
|
||||
trimmed[..idx].trim_end()
|
||||
} else {
|
||||
trimmed
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Comment style per language
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
#[derive(Clone, Copy)]
|
||||
enum CommentStyle {
|
||||
/// `//` and `/* */` — Rust, C, C++, Java, Go, JS, TS
|
||||
CStyle,
|
||||
/// `#` only — Python, Ruby
|
||||
Hash,
|
||||
/// `//`, `#`, and `/* */` — PHP
|
||||
PhpStyle,
|
||||
}
|
||||
|
||||
/// Map a file extension to the comment style for that language.
|
||||
fn comment_style_for_ext(ext: &str) -> Option<CommentStyle> {
|
||||
match ext {
|
||||
"rs" | "c" | "cpp" | "java" | "go" | "ts" | "js" => Some(CommentStyle::CStyle),
|
||||
"py" | "rb" => Some(CommentStyle::Hash),
|
||||
"php" => Some(CommentStyle::PhpStyle),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Map a file path to its comment style by inspecting the extension.
|
||||
fn comment_style_for_path(path: &std::path::Path) -> Option<CommentStyle> {
|
||||
let ext = path.extension().and_then(|s| s.to_str())?;
|
||||
// Normalise common variant extensions
|
||||
let norm = match ext {
|
||||
"RS" => "rs",
|
||||
"c++" => "cpp",
|
||||
"PY" => "py",
|
||||
"TSX" | "tsx" => "ts",
|
||||
other => other,
|
||||
};
|
||||
comment_style_for_ext(norm)
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Parser
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Parse inline suppression directives from `source`, using comment syntax
|
||||
/// appropriate for the given file path.
|
||||
///
|
||||
/// Returns an empty index if the source doesn't contain `nyx:ignore` or the
|
||||
/// language is unsupported.
|
||||
pub fn parse_inline_suppressions(path: &std::path::Path, source: &str) -> SuppressionIndex {
|
||||
// Fast path: no directives possible.
|
||||
if !source.as_bytes().windows(10).any(|w| w == b"nyx:ignore") {
|
||||
return SuppressionIndex {
|
||||
directives: HashMap::new(),
|
||||
};
|
||||
}
|
||||
|
||||
let Some(style) = comment_style_for_path(path) else {
|
||||
return SuppressionIndex {
|
||||
directives: HashMap::new(),
|
||||
};
|
||||
};
|
||||
|
||||
let mut index: HashMap<usize, Vec<LineDirective>> = HashMap::new();
|
||||
let total_lines = source.lines().count();
|
||||
|
||||
// State machine for string/comment tracking.
|
||||
let mut in_block_comment = false;
|
||||
let mut block_comment_start_line: usize = 0;
|
||||
|
||||
for (line_idx, raw_line) in source.lines().enumerate() {
|
||||
let line_num = line_idx + 1; // 1-indexed
|
||||
let line = raw_line.trim_end_matches('\r');
|
||||
|
||||
if in_block_comment {
|
||||
// Check for block comment end.
|
||||
if let Some(end_pos) = line.find("*/") {
|
||||
// Extract text before `*/` — may contain a directive.
|
||||
let block_text = &line[..end_pos];
|
||||
if let Some(dir) = try_parse_directive(block_text, line_num) {
|
||||
let target = target_line(&dir, line_num, total_lines);
|
||||
if let Some(t) = target {
|
||||
index.entry(t).or_default().push(dir);
|
||||
}
|
||||
}
|
||||
in_block_comment = false;
|
||||
// After the block comment ends, check the rest of the line
|
||||
// for a line comment.
|
||||
let rest = &line[end_pos + 2..];
|
||||
if let Some(dir) = extract_from_line_rest(rest, line_num, style) {
|
||||
let target = target_line(&dir, line_num, total_lines);
|
||||
if let Some(t) = target {
|
||||
index.entry(t).or_default().push(dir);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Still inside block comment — check for directive.
|
||||
if let Some(dir) = try_parse_directive(line, line_num) {
|
||||
let target = target_line(&dir, line_num, total_lines);
|
||||
if let Some(t) = target {
|
||||
index.entry(t).or_default().push(dir);
|
||||
}
|
||||
}
|
||||
}
|
||||
let _ = block_comment_start_line; // suppress unused warning
|
||||
continue;
|
||||
}
|
||||
|
||||
// Not in a block comment — scan the line character by character
|
||||
// tracking string state.
|
||||
if let Some(dir) = scan_line_for_directive(line, line_num, style, &mut in_block_comment) {
|
||||
let target = target_line(&dir, line_num, total_lines);
|
||||
if let Some(t) = target {
|
||||
index.entry(t).or_default().push(dir);
|
||||
}
|
||||
}
|
||||
if in_block_comment {
|
||||
block_comment_start_line = line_num;
|
||||
}
|
||||
}
|
||||
|
||||
SuppressionIndex { directives: index }
|
||||
}
|
||||
|
||||
/// Compute the target line for a directive. Returns `None` if the directive
|
||||
/// is `NextLine` but on the last line (EOF — no-op).
|
||||
fn target_line(dir: &LineDirective, line_num: usize, total_lines: usize) -> Option<usize> {
|
||||
match dir.kind {
|
||||
SuppressionKind::SameLine => Some(line_num),
|
||||
SuppressionKind::NextLine => {
|
||||
if line_num < total_lines {
|
||||
Some(line_num + 1)
|
||||
} else {
|
||||
None // EOF — no next line
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Scan a single line (not inside a block comment) for a suppression directive.
|
||||
/// Tracks string literals to avoid false positives.
|
||||
///
|
||||
/// Sets `in_block_comment` to `true` if the line opens a `/* */` block that
|
||||
/// doesn't close on the same line.
|
||||
fn scan_line_for_directive(
|
||||
line: &str,
|
||||
line_num: usize,
|
||||
style: CommentStyle,
|
||||
in_block_comment: &mut bool,
|
||||
) -> Option<LineDirective> {
|
||||
let bytes = line.as_bytes();
|
||||
let len = bytes.len();
|
||||
let mut i = 0;
|
||||
|
||||
// String state
|
||||
let mut in_string: Option<u8> = None; // quote char: b'"', b'\'', b'`'
|
||||
|
||||
while i < len {
|
||||
let ch = bytes[i];
|
||||
|
||||
// ── Inside a string literal ─────────────────────────────────────
|
||||
if let Some(quote) = in_string {
|
||||
if ch == b'\\' {
|
||||
i += 2; // skip escaped char
|
||||
continue;
|
||||
}
|
||||
// Python triple quotes
|
||||
if (quote == b'"' || quote == b'\'')
|
||||
&& i + 2 < len
|
||||
&& bytes[i] == quote
|
||||
&& bytes[i + 1] == quote
|
||||
&& bytes[i + 2] == quote
|
||||
{
|
||||
// Check if this is a triple-quote close
|
||||
// (we entered via triple-quote open, but we track single quote char)
|
||||
in_string = None;
|
||||
i += 3;
|
||||
continue;
|
||||
}
|
||||
if ch == quote {
|
||||
in_string = None;
|
||||
}
|
||||
i += 1;
|
||||
continue;
|
||||
}
|
||||
|
||||
// ── Not in a string ─────────────────────────────────────────────
|
||||
|
||||
// Rust raw strings: r"..." or r#"..."#
|
||||
if ch == b'r' && i + 1 < len {
|
||||
let next = bytes[i + 1];
|
||||
if next == b'"' {
|
||||
// r"..." — skip to closing "
|
||||
i += 2;
|
||||
while i < len && bytes[i] != b'"' {
|
||||
i += 1;
|
||||
}
|
||||
i += 1; // skip closing "
|
||||
continue;
|
||||
}
|
||||
if next == b'#' {
|
||||
// Count hashes
|
||||
let hash_start = i + 1;
|
||||
let mut j = i + 1;
|
||||
while j < len && bytes[j] == b'#' {
|
||||
j += 1;
|
||||
}
|
||||
let hash_count = j - hash_start;
|
||||
if j < len && bytes[j] == b'"' {
|
||||
// Skip to closing "###
|
||||
let close_pat_len = 1 + hash_count; // " + hashes
|
||||
i = j + 1;
|
||||
'raw: while i < len {
|
||||
if bytes[i] == b'"' {
|
||||
// Check for matching hashes
|
||||
let mut k = 1;
|
||||
while k <= hash_count && i + k < len && bytes[i + k] == b'#' {
|
||||
k += 1;
|
||||
}
|
||||
if k > hash_count {
|
||||
i += close_pat_len;
|
||||
break 'raw;
|
||||
}
|
||||
}
|
||||
i += 1;
|
||||
}
|
||||
continue;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Python triple quotes: """ or '''
|
||||
if (ch == b'"' || ch == b'\'') && i + 2 < len && bytes[i + 1] == ch && bytes[i + 2] == ch {
|
||||
in_string = Some(ch);
|
||||
i += 3;
|
||||
continue;
|
||||
}
|
||||
|
||||
// Regular string literals
|
||||
if ch == b'"' || ch == b'\'' || ch == b'`' {
|
||||
in_string = Some(ch);
|
||||
i += 1;
|
||||
continue;
|
||||
}
|
||||
|
||||
// ── Comment detection ───────────────────────────────────────────
|
||||
|
||||
// C-style line comment: //
|
||||
let has_slash_slash = matches!(style, CommentStyle::CStyle | CommentStyle::PhpStyle);
|
||||
if has_slash_slash && ch == b'/' && i + 1 < len && bytes[i + 1] == b'/' {
|
||||
let comment_body = &line[i + 2..];
|
||||
return try_parse_directive(comment_body, line_num);
|
||||
}
|
||||
|
||||
// Block comment: /*
|
||||
let has_block = matches!(style, CommentStyle::CStyle | CommentStyle::PhpStyle);
|
||||
if has_block && ch == b'/' && i + 1 < len && bytes[i + 1] == b'*' {
|
||||
// Look for closing */ on the same line
|
||||
let rest = &line[i + 2..];
|
||||
if let Some(end) = rest.find("*/") {
|
||||
let block_body = &rest[..end];
|
||||
// Check directive in block body
|
||||
if let Some(dir) = try_parse_directive(block_body, line_num) {
|
||||
return Some(dir);
|
||||
}
|
||||
// Continue scanning after the block
|
||||
i = i + 2 + end + 2;
|
||||
continue;
|
||||
} else {
|
||||
// Block comment extends to next line(s)
|
||||
*in_block_comment = true;
|
||||
let block_body = rest;
|
||||
return try_parse_directive(block_body, line_num);
|
||||
}
|
||||
}
|
||||
|
||||
// Hash comment: #
|
||||
let has_hash = matches!(style, CommentStyle::Hash | CommentStyle::PhpStyle);
|
||||
if has_hash && ch == b'#' {
|
||||
let comment_body = &line[i + 1..];
|
||||
return try_parse_directive(comment_body, line_num);
|
||||
}
|
||||
|
||||
i += 1;
|
||||
}
|
||||
|
||||
None
|
||||
}
|
||||
|
||||
/// Try to extract a directive from a line rest (after a block comment closes).
|
||||
fn extract_from_line_rest(
|
||||
rest: &str,
|
||||
line_num: usize,
|
||||
style: CommentStyle,
|
||||
) -> Option<LineDirective> {
|
||||
let mut in_block = false;
|
||||
scan_line_for_directive(rest, line_num, style, &mut in_block)
|
||||
}
|
||||
|
||||
/// Try to parse a `nyx:ignore` or `nyx:ignore-next-line` directive from
|
||||
/// comment body text. Returns `None` if no directive is found.
|
||||
fn try_parse_directive(text: &str, line_num: usize) -> Option<LineDirective> {
|
||||
let trimmed = text.trim();
|
||||
// Strip leading `*` or `* ` common in block comments (e.g. ` * nyx:ignore ...`).
|
||||
let trimmed = trimmed
|
||||
.strip_prefix("* ")
|
||||
.or(trimmed.strip_prefix('*'))
|
||||
.unwrap_or(trimmed)
|
||||
.trim();
|
||||
|
||||
// Check for `nyx:ignore-next-line` first (longer prefix wins).
|
||||
if let Some(rest) = strip_directive_prefix(trimmed, "nyx:ignore-next-line") {
|
||||
let matchers = parse_rule_ids(rest);
|
||||
if matchers.is_empty() {
|
||||
return None;
|
||||
}
|
||||
return Some(LineDirective {
|
||||
kind: SuppressionKind::NextLine,
|
||||
directive_line: line_num,
|
||||
matchers,
|
||||
});
|
||||
}
|
||||
|
||||
if let Some(rest) = strip_directive_prefix(trimmed, "nyx:ignore") {
|
||||
let matchers = parse_rule_ids(rest);
|
||||
if matchers.is_empty() {
|
||||
return None;
|
||||
}
|
||||
return Some(LineDirective {
|
||||
kind: SuppressionKind::SameLine,
|
||||
directive_line: line_num,
|
||||
matchers,
|
||||
});
|
||||
}
|
||||
|
||||
None
|
||||
}
|
||||
|
||||
/// Strip a directive prefix, allowing optional whitespace or the rest of the
|
||||
/// line to follow.
|
||||
fn strip_directive_prefix<'a>(text: &'a str, prefix: &str) -> Option<&'a str> {
|
||||
let rest = text.strip_prefix(prefix)?;
|
||||
// Must be followed by whitespace, end of string, or nothing.
|
||||
// If prefix is "nyx:ignore" and rest starts with "-next-line", don't match
|
||||
// (handled by checking the longer prefix first).
|
||||
if rest.is_empty() || rest.starts_with(char::is_whitespace) {
|
||||
Some(rest)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
/// Parse comma-separated rule IDs into matchers.
|
||||
fn parse_rule_ids(text: &str) -> Vec<RuleMatcher> {
|
||||
text.split(',')
|
||||
.map(|s| s.trim())
|
||||
.filter(|s| !s.is_empty())
|
||||
.map(|s| {
|
||||
if let Some(prefix) = s.strip_suffix(".*") {
|
||||
RuleMatcher::WildcardSuffix(prefix.to_string())
|
||||
} else {
|
||||
RuleMatcher::Exact(s.to_string())
|
||||
}
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
// Tests
|
||||
// ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use std::path::Path;
|
||||
|
||||
fn rust_path() -> &'static Path {
|
||||
Path::new("test.rs")
|
||||
}
|
||||
fn py_path() -> &'static Path {
|
||||
Path::new("test.py")
|
||||
}
|
||||
fn rb_path() -> &'static Path {
|
||||
Path::new("test.rb")
|
||||
}
|
||||
fn php_path() -> &'static Path {
|
||||
Path::new("test.php")
|
||||
}
|
||||
fn js_path() -> &'static Path {
|
||||
Path::new("test.js")
|
||||
}
|
||||
|
||||
// 1. `//` comment parsing
|
||||
#[test]
|
||||
fn slash_slash_comment_suppresses() {
|
||||
let src = "let x = 1; // nyx:ignore rule.a\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_some());
|
||||
assert!(idx.check(1, "rule.b").is_none());
|
||||
}
|
||||
|
||||
// 2. `#` comment parsing
|
||||
#[test]
|
||||
fn hash_comment_suppresses() {
|
||||
let src = "x = 1 # nyx:ignore rule.a\n";
|
||||
let idx = parse_inline_suppressions(py_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_some());
|
||||
}
|
||||
|
||||
// 3. `/* */` block comment
|
||||
#[test]
|
||||
fn block_comment_suppresses() {
|
||||
let src = "let x = 1; /* nyx:ignore rule.a */\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_some());
|
||||
}
|
||||
|
||||
// 4. Same-line semantics
|
||||
#[test]
|
||||
fn same_line_only_suppresses_own_line() {
|
||||
let src = "line1\nlet x = 1; // nyx:ignore rule.a\nline3\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_none());
|
||||
assert!(idx.check(2, "rule.a").is_some());
|
||||
assert!(idx.check(3, "rule.a").is_none());
|
||||
}
|
||||
|
||||
// 5. Next-line semantics
|
||||
#[test]
|
||||
fn next_line_suppresses_following_line() {
|
||||
let src = "// nyx:ignore-next-line rule.a\nlet x = dangerous();\nline3\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_none());
|
||||
assert!(idx.check(2, "rule.a").is_some());
|
||||
assert!(idx.check(3, "rule.a").is_none());
|
||||
}
|
||||
|
||||
// 6. Multiple rule IDs
|
||||
#[test]
|
||||
fn multiple_rule_ids() {
|
||||
let src = "let x = 1; // nyx:ignore a.b.c, x.y.z\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "a.b.c").is_some());
|
||||
assert!(idx.check(1, "x.y.z").is_some());
|
||||
assert!(idx.check(1, "other").is_none());
|
||||
}
|
||||
|
||||
// 7. Wildcard suffix
|
||||
#[test]
|
||||
fn wildcard_suffix_matching() {
|
||||
let src = "let x = 1; // nyx:ignore rs.quality.*\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "rs.quality.foo").is_some());
|
||||
assert!(idx.check(1, "rs.quality.bar").is_some());
|
||||
assert!(idx.check(1, "rs.other.foo").is_none());
|
||||
// Exact match of prefix without the dot should not match
|
||||
assert!(idx.check(1, "rs.quality").is_none());
|
||||
}
|
||||
|
||||
// 8. String literal guard
|
||||
#[test]
|
||||
fn string_literal_not_suppressed() {
|
||||
let src = "let x = \"// nyx:ignore rule.a\";\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_none());
|
||||
}
|
||||
|
||||
// 9. Rust raw string guard
|
||||
#[test]
|
||||
fn rust_raw_string_not_suppressed() {
|
||||
let src = "let x = r#\"// nyx:ignore rule.a\"#;\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_none());
|
||||
}
|
||||
|
||||
// 10. Rule ID mismatch
|
||||
#[test]
|
||||
fn rule_id_mismatch() {
|
||||
let src = "let x = 1; // nyx:ignore rule-a\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "rule-a").is_some());
|
||||
assert!(idx.check(1, "rule-b").is_none());
|
||||
}
|
||||
|
||||
// 11. Taint rule ID canonicalization
|
||||
#[test]
|
||||
fn taint_rule_id_canonicalization() {
|
||||
let src = "let x = 1; // nyx:ignore taint-unsanitised-flow\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(
|
||||
idx.check(1, "taint-unsanitised-flow (source 5:1)")
|
||||
.is_some()
|
||||
);
|
||||
assert!(idx.check(1, "taint-unsanitised-flow").is_some());
|
||||
}
|
||||
|
||||
// 12. Multiple directives targeting the same line
|
||||
#[test]
|
||||
fn multiple_directives_same_target() {
|
||||
let src = "// nyx:ignore-next-line rule-a\n// nyx:ignore-next-line rule-b\nlet x = dangerous();\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
// First ignore-next-line targets line 2, second targets line 3
|
||||
assert!(idx.check(2, "rule-a").is_some());
|
||||
assert!(idx.check(3, "rule-b").is_some());
|
||||
}
|
||||
|
||||
// 13. Block comment with ignore-next-line
|
||||
#[test]
|
||||
fn block_comment_next_line() {
|
||||
let src = "/* nyx:ignore-next-line rule.a */\nlet x = dangerous();\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(2, "rule.a").is_some());
|
||||
}
|
||||
|
||||
// 14. EOF ignore-next-line is a no-op
|
||||
#[test]
|
||||
fn eof_next_line_no_panic() {
|
||||
let src = "// nyx:ignore-next-line rule.a";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
// Line 1 is the last line, so ignore-next-line targets line 2 which doesn't exist
|
||||
assert!(idx.check(1, "rule.a").is_none());
|
||||
assert!(idx.check(2, "rule.a").is_none());
|
||||
}
|
||||
|
||||
// 15. CRLF input
|
||||
#[test]
|
||||
fn crlf_line_endings() {
|
||||
let src = "let x = 1; // nyx:ignore rule.a\r\nlet y = 2;\r\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_some());
|
||||
assert!(idx.check(2, "rule.a").is_none());
|
||||
}
|
||||
|
||||
// 16. Whitespace tolerance
|
||||
#[test]
|
||||
fn whitespace_tolerance() {
|
||||
let src = "let x = 1; // nyx:ignore rule.a, rule.b \n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_some());
|
||||
assert!(idx.check(1, "rule.b").is_some());
|
||||
}
|
||||
|
||||
// 17. PHP multi-style comments
|
||||
#[test]
|
||||
fn php_multi_style() {
|
||||
let src_hash = "<?php\n$x = 1; # nyx:ignore rule.a\n";
|
||||
let src_slash = "<?php\n$x = 1; // nyx:ignore rule.b\n";
|
||||
let idx_hash = parse_inline_suppressions(php_path(), src_hash);
|
||||
let idx_slash = parse_inline_suppressions(php_path(), src_slash);
|
||||
assert!(idx_hash.check(2, "rule.a").is_some());
|
||||
assert!(idx_slash.check(2, "rule.b").is_some());
|
||||
}
|
||||
|
||||
// ── canonical_rule_id tests ─────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn canonical_strips_parenthetical() {
|
||||
assert_eq!(
|
||||
canonical_rule_id("taint-unsanitised-flow (source 5:1)"),
|
||||
"taint-unsanitised-flow"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn canonical_no_parenthetical_unchanged() {
|
||||
assert_eq!(canonical_rule_id("rs.quality.unwrap"), "rs.quality.unwrap");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn canonical_trims_whitespace() {
|
||||
assert_eq!(canonical_rule_id(" rule.a "), "rule.a");
|
||||
}
|
||||
|
||||
// ── Ruby hash comment ───────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn ruby_hash_comment() {
|
||||
let src = "x = dangerous # nyx:ignore rule.a\n";
|
||||
let idx = parse_inline_suppressions(rb_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_some());
|
||||
}
|
||||
|
||||
// ── JS template literal guard ───────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn js_template_literal_not_suppressed() {
|
||||
let src = "let x = `// nyx:ignore rule.a`;\n";
|
||||
let idx = parse_inline_suppressions(js_path(), src);
|
||||
assert!(idx.check(1, "rule.a").is_none());
|
||||
}
|
||||
|
||||
// ── Multiline block comment ─────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn multiline_block_comment() {
|
||||
let src = "/*\n * nyx:ignore rule.a\n */\nlet x = dangerous;\n";
|
||||
let idx = parse_inline_suppressions(rust_path(), src);
|
||||
// The directive is on line 2, same-line → targets line 2
|
||||
assert!(idx.check(2, "rule.a").is_some());
|
||||
}
|
||||
}
|
||||
620
src/taint/domain.rs
Normal file
620
src/taint/domain.rs
Normal file
|
|
@ -0,0 +1,620 @@
|
|||
use crate::labels::{Cap, SourceKind};
|
||||
use crate::state::lattice::Lattice;
|
||||
use crate::state::symbol::SymbolId;
|
||||
use crate::taint::path_state::PredicateKind;
|
||||
use petgraph::graph::NodeIndex;
|
||||
use smallvec::SmallVec;
|
||||
|
||||
/// Maximum origins tracked per variable (bounded to prevent growth).
|
||||
const MAX_ORIGINS_PER_VAR: usize = 4;
|
||||
|
||||
/// Per-variable taint information.
|
||||
#[derive(Clone, Debug, PartialEq, Eq)]
|
||||
pub struct VarTaint {
|
||||
pub caps: Cap,
|
||||
/// Up to N origins that contributed taint (bounded).
|
||||
pub origins: SmallVec<[TaintOrigin; 2]>,
|
||||
}
|
||||
|
||||
/// A single taint origin — the node and classification of where taint came from.
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
|
||||
pub struct TaintOrigin {
|
||||
pub node: NodeIndex,
|
||||
pub source_kind: SourceKind,
|
||||
}
|
||||
|
||||
/// Compact bitset for up to 64 variables (indexed by SymbolId ordinal).
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
||||
pub struct SmallBitSet(u64);
|
||||
|
||||
impl SmallBitSet {
|
||||
pub fn empty() -> Self {
|
||||
Self(0)
|
||||
}
|
||||
|
||||
pub fn insert(&mut self, id: SymbolId) {
|
||||
let idx = id.0;
|
||||
if idx < 64 {
|
||||
self.0 |= 1u64 << idx;
|
||||
}
|
||||
}
|
||||
|
||||
pub fn contains(&self, id: SymbolId) -> bool {
|
||||
let idx = id.0;
|
||||
if idx < 64 {
|
||||
self.0 & (1u64 << idx) != 0
|
||||
} else {
|
||||
false
|
||||
}
|
||||
}
|
||||
|
||||
/// Union: self | other
|
||||
pub fn union(self, other: Self) -> Self {
|
||||
Self(self.0 | other.0)
|
||||
}
|
||||
|
||||
/// Intersection: self & other
|
||||
pub fn intersection(self, other: Self) -> Self {
|
||||
Self(self.0 & other.0)
|
||||
}
|
||||
|
||||
#[allow(dead_code)]
|
||||
pub fn is_empty(self) -> bool {
|
||||
self.0 == 0
|
||||
}
|
||||
|
||||
/// Whether self is a subset of other.
|
||||
#[allow(dead_code)] // used by Lattice::leq
|
||||
pub fn is_subset_of(self, other: Self) -> bool {
|
||||
self.0 & other.0 == self.0
|
||||
}
|
||||
|
||||
/// Whether self is a superset of other.
|
||||
#[allow(dead_code)] // used by Lattice::leq
|
||||
pub fn is_superset_of(self, other: Self) -> bool {
|
||||
other.is_subset_of(self)
|
||||
}
|
||||
}
|
||||
|
||||
/// Monotone predicate summary per variable.
|
||||
///
|
||||
/// Tracks which whitelisted predicate kinds are known true/false on ALL paths.
|
||||
/// join = intersection of bits (must-hold semantics).
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
||||
pub struct PredicateSummary {
|
||||
/// Bitmask: bit 0=NullCheck, 1=EmptyCheck, 2=ErrorCheck
|
||||
pub known_true: u8,
|
||||
pub known_false: u8,
|
||||
}
|
||||
|
||||
impl PredicateSummary {
|
||||
pub fn empty() -> Self {
|
||||
Self {
|
||||
known_true: 0,
|
||||
known_false: 0,
|
||||
}
|
||||
}
|
||||
|
||||
/// Join = intersection (only predicates true on ALL paths).
|
||||
pub fn join(self, other: Self) -> Self {
|
||||
Self {
|
||||
known_true: self.known_true & other.known_true,
|
||||
known_false: self.known_false & other.known_false,
|
||||
}
|
||||
}
|
||||
|
||||
/// Check for contradiction: same kind known both true and false.
|
||||
pub fn has_contradiction(self) -> bool {
|
||||
self.known_true & self.known_false != 0
|
||||
}
|
||||
|
||||
pub fn is_empty(self) -> bool {
|
||||
self.known_true == 0 && self.known_false == 0
|
||||
}
|
||||
}
|
||||
|
||||
/// Map a whitelisted PredicateKind to its bit index (0-2).
|
||||
/// Returns None for non-whitelisted kinds.
|
||||
pub fn predicate_kind_bit(kind: PredicateKind) -> Option<u8> {
|
||||
match kind {
|
||||
PredicateKind::NullCheck => Some(0),
|
||||
PredicateKind::EmptyCheck => Some(1),
|
||||
PredicateKind::ErrorCheck => Some(2),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
/// The abstract taint state at a program point.
|
||||
///
|
||||
/// Uses sorted SmallVec keyed by SymbolId for O(n) merge-join.
|
||||
/// Variables beyond the interner's capacity are naturally excluded.
|
||||
#[derive(Clone, Debug, PartialEq, Eq)]
|
||||
pub struct TaintState {
|
||||
/// Per-variable taint, sorted by SymbolId.
|
||||
pub vars: SmallVec<[(SymbolId, VarTaint); 16]>,
|
||||
|
||||
/// Variables validated on ALL paths (intersection on join).
|
||||
pub validated_must: SmallBitSet,
|
||||
|
||||
/// Variables validated on ANY path (union on join).
|
||||
pub validated_may: SmallBitSet,
|
||||
|
||||
/// Per-variable predicate summary (sorted by SymbolId).
|
||||
pub predicates: SmallVec<[(SymbolId, PredicateSummary); 4]>,
|
||||
}
|
||||
|
||||
impl TaintState {
|
||||
/// Create the initial state (no taint, no validation, no predicates).
|
||||
pub fn initial() -> Self {
|
||||
Self {
|
||||
vars: SmallVec::new(),
|
||||
validated_must: SmallBitSet::empty(),
|
||||
validated_may: SmallBitSet::empty(),
|
||||
predicates: SmallVec::new(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Look up taint for a variable.
|
||||
pub fn get(&self, sym: SymbolId) -> Option<&VarTaint> {
|
||||
self.vars
|
||||
.binary_search_by_key(&sym, |(id, _)| *id)
|
||||
.ok()
|
||||
.map(|idx| &self.vars[idx].1)
|
||||
}
|
||||
|
||||
/// Insert or update taint for a variable.
|
||||
pub fn set(&mut self, sym: SymbolId, taint: VarTaint) {
|
||||
match self.vars.binary_search_by_key(&sym, |(id, _)| *id) {
|
||||
Ok(idx) => self.vars[idx].1 = taint,
|
||||
Err(idx) => self.vars.insert(idx, (sym, taint)),
|
||||
}
|
||||
}
|
||||
|
||||
/// Remove taint for a variable.
|
||||
pub fn remove(&mut self, sym: SymbolId) {
|
||||
if let Ok(idx) = self.vars.binary_search_by_key(&sym, |(id, _)| *id) {
|
||||
self.vars.remove(idx);
|
||||
}
|
||||
}
|
||||
|
||||
/// Set a predicate summary for a variable.
|
||||
pub fn set_predicate(&mut self, sym: SymbolId, summary: PredicateSummary) {
|
||||
match self.predicates.binary_search_by_key(&sym, |(id, _)| *id) {
|
||||
Ok(idx) => self.predicates[idx].1 = summary,
|
||||
Err(idx) => self.predicates.insert(idx, (sym, summary)),
|
||||
}
|
||||
}
|
||||
|
||||
/// Get predicate summary for a variable.
|
||||
pub fn get_predicate(&self, sym: SymbolId) -> PredicateSummary {
|
||||
self.predicates
|
||||
.binary_search_by_key(&sym, |(id, _)| *id)
|
||||
.ok()
|
||||
.map(|idx| self.predicates[idx].1)
|
||||
.unwrap_or_else(PredicateSummary::empty)
|
||||
}
|
||||
|
||||
/// Check if any variable has contradictory predicates.
|
||||
pub fn has_contradiction(&self) -> bool {
|
||||
self.predicates.iter().any(|(_, s)| s.has_contradiction())
|
||||
}
|
||||
}
|
||||
|
||||
impl Lattice for TaintState {
|
||||
fn bot() -> Self {
|
||||
Self::initial()
|
||||
}
|
||||
|
||||
fn join(&self, other: &Self) -> Self {
|
||||
// Merge-join vars (sorted by SymbolId)
|
||||
let vars = merge_join_vars(&self.vars, &other.vars);
|
||||
|
||||
// validated_must = intersection (must hold on ALL paths)
|
||||
let validated_must = self.validated_must.intersection(other.validated_must);
|
||||
|
||||
// validated_may = union (holds on ANY path)
|
||||
let validated_may = self.validated_may.union(other.validated_may);
|
||||
|
||||
// predicates = per-key intersection of known_true/known_false bits
|
||||
let predicates = merge_join_predicates(&self.predicates, &other.predicates);
|
||||
|
||||
TaintState {
|
||||
vars,
|
||||
validated_must,
|
||||
validated_may,
|
||||
predicates,
|
||||
}
|
||||
}
|
||||
|
||||
fn leq(&self, other: &Self) -> bool {
|
||||
// Per-key Cap subset + origins subset
|
||||
if !vars_leq(&self.vars, &other.vars) {
|
||||
return false;
|
||||
}
|
||||
|
||||
// validated_must: self ⊇ other (superset = less info = lower)
|
||||
if !self.validated_must.is_superset_of(other.validated_must) {
|
||||
return false;
|
||||
}
|
||||
|
||||
// validated_may: self ⊆ other
|
||||
if !self.validated_may.is_subset_of(other.validated_may) {
|
||||
return false;
|
||||
}
|
||||
|
||||
// predicates: self.known_true ⊇ other.known_true (more precise = lower)
|
||||
predicates_leq(&self.predicates, &other.predicates)
|
||||
}
|
||||
}
|
||||
|
||||
/// Merge-join two sorted var lists: per-key Cap OR + origins merge (bounded).
|
||||
fn merge_join_vars(
|
||||
a: &[(SymbolId, VarTaint)],
|
||||
b: &[(SymbolId, VarTaint)],
|
||||
) -> SmallVec<[(SymbolId, VarTaint); 16]> {
|
||||
let mut result = SmallVec::with_capacity(a.len().max(b.len()));
|
||||
let (mut i, mut j) = (0, 0);
|
||||
|
||||
while i < a.len() && j < b.len() {
|
||||
match a[i].0.cmp(&b[j].0) {
|
||||
std::cmp::Ordering::Less => {
|
||||
result.push(a[i].clone());
|
||||
i += 1;
|
||||
}
|
||||
std::cmp::Ordering::Greater => {
|
||||
result.push(b[j].clone());
|
||||
j += 1;
|
||||
}
|
||||
std::cmp::Ordering::Equal => {
|
||||
let caps = a[i].1.caps | b[j].1.caps;
|
||||
let origins = merge_origins(&a[i].1.origins, &b[j].1.origins);
|
||||
result.push((a[i].0, VarTaint { caps, origins }));
|
||||
i += 1;
|
||||
j += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Remaining from either side
|
||||
while i < a.len() {
|
||||
result.push(a[i].clone());
|
||||
i += 1;
|
||||
}
|
||||
while j < b.len() {
|
||||
result.push(b[j].clone());
|
||||
j += 1;
|
||||
}
|
||||
|
||||
result
|
||||
}
|
||||
|
||||
/// Merge two origin lists, deduplicating by node and bounding at MAX_ORIGINS_PER_VAR.
|
||||
fn merge_origins(
|
||||
a: &SmallVec<[TaintOrigin; 2]>,
|
||||
b: &SmallVec<[TaintOrigin; 2]>,
|
||||
) -> SmallVec<[TaintOrigin; 2]> {
|
||||
let mut merged = a.clone();
|
||||
for origin in b {
|
||||
if merged.len() >= MAX_ORIGINS_PER_VAR {
|
||||
break;
|
||||
}
|
||||
if !merged.iter().any(|o| o.node == origin.node) {
|
||||
merged.push(*origin);
|
||||
}
|
||||
}
|
||||
merged
|
||||
}
|
||||
|
||||
/// Check if a.vars ⊑ b.vars (per-key Cap subset + origins subset).
|
||||
#[allow(dead_code)] // called by Lattice::leq
|
||||
fn vars_leq(a: &[(SymbolId, VarTaint)], b: &[(SymbolId, VarTaint)]) -> bool {
|
||||
let (mut i, mut j) = (0, 0);
|
||||
|
||||
while i < a.len() {
|
||||
if j >= b.len() {
|
||||
return false; // a has keys not in b → not ⊑
|
||||
}
|
||||
match a[i].0.cmp(&b[j].0) {
|
||||
std::cmp::Ordering::Less => return false, // key in a but not b
|
||||
std::cmp::Ordering::Greater => {
|
||||
j += 1; // key only in b, skip
|
||||
}
|
||||
std::cmp::Ordering::Equal => {
|
||||
// Cap subset check
|
||||
if a[i].1.caps & b[j].1.caps != a[i].1.caps {
|
||||
return false;
|
||||
}
|
||||
// Origins subset check (by node)
|
||||
for orig in &a[i].1.origins {
|
||||
if !b[j].1.origins.iter().any(|o| o.node == orig.node) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
i += 1;
|
||||
j += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
true
|
||||
}
|
||||
|
||||
/// Merge-join predicate summaries with intersection semantics.
|
||||
fn merge_join_predicates(
|
||||
a: &[(SymbolId, PredicateSummary)],
|
||||
b: &[(SymbolId, PredicateSummary)],
|
||||
) -> SmallVec<[(SymbolId, PredicateSummary); 4]> {
|
||||
let mut result = SmallVec::new();
|
||||
let (mut i, mut j) = (0, 0);
|
||||
|
||||
while i < a.len() && j < b.len() {
|
||||
match a[i].0.cmp(&b[j].0) {
|
||||
std::cmp::Ordering::Less => {
|
||||
// Key only in a — intersection with empty = empty → drop
|
||||
i += 1;
|
||||
}
|
||||
std::cmp::Ordering::Greater => {
|
||||
j += 1;
|
||||
}
|
||||
std::cmp::Ordering::Equal => {
|
||||
let joined = a[i].1.join(b[j].1);
|
||||
if !joined.is_empty() {
|
||||
result.push((a[i].0, joined));
|
||||
}
|
||||
i += 1;
|
||||
j += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
// Keys only in one side → intersection with empty = drop
|
||||
|
||||
result
|
||||
}
|
||||
|
||||
/// Check if a.predicates ⊑ b.predicates.
|
||||
/// More precise (more known_true bits) = lower in the lattice.
|
||||
/// So a ⊑ b means a.known_true ⊇ b.known_true for each key.
|
||||
#[allow(dead_code)] // called by Lattice::leq
|
||||
fn predicates_leq(a: &[(SymbolId, PredicateSummary)], b: &[(SymbolId, PredicateSummary)]) -> bool {
|
||||
let (mut i, mut j) = (0, 0);
|
||||
|
||||
// For each key in b, a must have at least as many bits
|
||||
while j < b.len() {
|
||||
if i >= a.len() {
|
||||
// b has keys that a doesn't — a is missing info = not lower
|
||||
return false;
|
||||
}
|
||||
match a[i].0.cmp(&b[j].0) {
|
||||
std::cmp::Ordering::Less => {
|
||||
// a has extra keys (more info) — OK for leq
|
||||
i += 1;
|
||||
}
|
||||
std::cmp::Ordering::Greater => {
|
||||
// b has a key that a doesn't → a has fewer bits → not ⊑
|
||||
return false;
|
||||
}
|
||||
std::cmp::Ordering::Equal => {
|
||||
// a.known_true must be a superset of b.known_true
|
||||
if a[i].1.known_true & b[j].1.known_true != b[j].1.known_true {
|
||||
return false;
|
||||
}
|
||||
if a[i].1.known_false & b[j].1.known_false != b[j].1.known_false {
|
||||
return false;
|
||||
}
|
||||
i += 1;
|
||||
j += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
true
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn make_taint(sym: u32, caps: Cap) -> (SymbolId, VarTaint) {
|
||||
(
|
||||
SymbolId(sym),
|
||||
VarTaint {
|
||||
caps,
|
||||
origins: SmallVec::new(),
|
||||
},
|
||||
)
|
||||
}
|
||||
|
||||
fn make_taint_with_origin(sym: u32, caps: Cap, node: usize) -> (SymbolId, VarTaint) {
|
||||
(
|
||||
SymbolId(sym),
|
||||
VarTaint {
|
||||
caps,
|
||||
origins: smallvec::smallvec![TaintOrigin {
|
||||
node: NodeIndex::new(node),
|
||||
source_kind: SourceKind::Unknown,
|
||||
}],
|
||||
},
|
||||
)
|
||||
}
|
||||
|
||||
fn state_with_vars(vars: Vec<(SymbolId, VarTaint)>) -> TaintState {
|
||||
let mut s = TaintState::initial();
|
||||
s.vars = SmallVec::from_vec(vars);
|
||||
s
|
||||
}
|
||||
|
||||
// ── Lattice property tests ──────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn bot_identity() {
|
||||
let a = state_with_vars(vec![make_taint(0, Cap::ENV_VAR)]);
|
||||
assert_eq!(a.join(&TaintState::bot()), a);
|
||||
assert_eq!(TaintState::bot().join(&a), a);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn join_commutativity() {
|
||||
let a = state_with_vars(vec![make_taint(0, Cap::ENV_VAR)]);
|
||||
let b = state_with_vars(vec![make_taint(1, Cap::SHELL_ESCAPE)]);
|
||||
assert_eq!(a.join(&b), b.join(&a));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn join_associativity() {
|
||||
let a = state_with_vars(vec![make_taint(0, Cap::ENV_VAR)]);
|
||||
let b = state_with_vars(vec![make_taint(0, Cap::SHELL_ESCAPE)]);
|
||||
let c = state_with_vars(vec![make_taint(1, Cap::HTML_ESCAPE)]);
|
||||
assert_eq!(a.join(&b).join(&c), a.join(&b.join(&c)));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn join_idempotency() {
|
||||
let a = state_with_vars(vec![make_taint(0, Cap::ENV_VAR | Cap::SHELL_ESCAPE)]);
|
||||
assert_eq!(a.join(&a), a);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn leq_reflexive() {
|
||||
let a = state_with_vars(vec![make_taint(0, Cap::ENV_VAR)]);
|
||||
assert!(a.leq(&a));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn leq_consistent_with_join() {
|
||||
let a = state_with_vars(vec![make_taint(0, Cap::ENV_VAR)]);
|
||||
let b = state_with_vars(vec![make_taint(0, Cap::ENV_VAR | Cap::SHELL_ESCAPE)]);
|
||||
assert!(a.leq(&b));
|
||||
assert_eq!(a.join(&b), b);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn join_merges_caps() {
|
||||
let a = state_with_vars(vec![make_taint(0, Cap::ENV_VAR)]);
|
||||
let b = state_with_vars(vec![make_taint(0, Cap::SHELL_ESCAPE)]);
|
||||
let joined = a.join(&b);
|
||||
assert_eq!(
|
||||
joined.get(SymbolId(0)).unwrap().caps,
|
||||
Cap::ENV_VAR | Cap::SHELL_ESCAPE
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn join_merges_origins() {
|
||||
let a = state_with_vars(vec![make_taint_with_origin(0, Cap::ENV_VAR, 1)]);
|
||||
let b = state_with_vars(vec![make_taint_with_origin(0, Cap::ENV_VAR, 2)]);
|
||||
let joined = a.join(&b);
|
||||
assert_eq!(joined.get(SymbolId(0)).unwrap().origins.len(), 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn validated_must_intersection() {
|
||||
let mut a = TaintState::initial();
|
||||
a.validated_must.insert(SymbolId(0));
|
||||
a.validated_must.insert(SymbolId(1));
|
||||
|
||||
let mut b = TaintState::initial();
|
||||
b.validated_must.insert(SymbolId(1));
|
||||
b.validated_must.insert(SymbolId(2));
|
||||
|
||||
let joined = a.join(&b);
|
||||
assert!(!joined.validated_must.contains(SymbolId(0)));
|
||||
assert!(joined.validated_must.contains(SymbolId(1)));
|
||||
assert!(!joined.validated_must.contains(SymbolId(2)));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn validated_may_union() {
|
||||
let mut a = TaintState::initial();
|
||||
a.validated_may.insert(SymbolId(0));
|
||||
|
||||
let mut b = TaintState::initial();
|
||||
b.validated_may.insert(SymbolId(1));
|
||||
|
||||
let joined = a.join(&b);
|
||||
assert!(joined.validated_may.contains(SymbolId(0)));
|
||||
assert!(joined.validated_may.contains(SymbolId(1)));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn predicate_contradiction() {
|
||||
let mut state = TaintState::initial();
|
||||
state.set_predicate(
|
||||
SymbolId(0),
|
||||
PredicateSummary {
|
||||
known_true: 1, // NullCheck true
|
||||
known_false: 1, // NullCheck false
|
||||
},
|
||||
);
|
||||
assert!(state.has_contradiction());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn predicate_no_contradiction() {
|
||||
let mut state = TaintState::initial();
|
||||
state.set_predicate(
|
||||
SymbolId(0),
|
||||
PredicateSummary {
|
||||
known_true: 1, // NullCheck true
|
||||
known_false: 2, // EmptyCheck false (different kind)
|
||||
},
|
||||
);
|
||||
assert!(!state.has_contradiction());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn predicate_join_intersection() {
|
||||
let mut a = TaintState::initial();
|
||||
a.set_predicate(
|
||||
SymbolId(0),
|
||||
PredicateSummary {
|
||||
known_true: 0b011, // NullCheck + EmptyCheck
|
||||
known_false: 0,
|
||||
},
|
||||
);
|
||||
|
||||
let mut b = TaintState::initial();
|
||||
b.set_predicate(
|
||||
SymbolId(0),
|
||||
PredicateSummary {
|
||||
known_true: 0b010, // EmptyCheck only
|
||||
known_false: 0,
|
||||
},
|
||||
);
|
||||
|
||||
let joined = a.join(&b);
|
||||
let pred = joined.get_predicate(SymbolId(0));
|
||||
assert_eq!(pred.known_true, 0b010); // only EmptyCheck on both paths
|
||||
}
|
||||
|
||||
// ── SmallBitSet tests ───────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn small_bitset_basic() {
|
||||
let mut bs = SmallBitSet::empty();
|
||||
assert!(bs.is_empty());
|
||||
|
||||
bs.insert(SymbolId(0));
|
||||
assert!(bs.contains(SymbolId(0)));
|
||||
assert!(!bs.contains(SymbolId(1)));
|
||||
assert!(!bs.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn small_bitset_union_intersection() {
|
||||
let mut a = SmallBitSet::empty();
|
||||
a.insert(SymbolId(0));
|
||||
a.insert(SymbolId(2));
|
||||
|
||||
let mut b = SmallBitSet::empty();
|
||||
b.insert(SymbolId(1));
|
||||
b.insert(SymbolId(2));
|
||||
|
||||
let u = a.union(b);
|
||||
assert!(u.contains(SymbolId(0)));
|
||||
assert!(u.contains(SymbolId(1)));
|
||||
assert!(u.contains(SymbolId(2)));
|
||||
|
||||
let i = a.intersection(b);
|
||||
assert!(!i.contains(SymbolId(0)));
|
||||
assert!(!i.contains(SymbolId(1)));
|
||||
assert!(i.contains(SymbolId(2)));
|
||||
}
|
||||
}
|
||||
563
src/taint/mod.rs
563
src/taint/mod.rs
|
|
@ -1,11 +1,21 @@
|
|||
use crate::cfg::{Cfg, FuncSummaries, NodeInfo, StmtKind};
|
||||
pub mod domain;
|
||||
pub mod path_state;
|
||||
pub mod transfer;
|
||||
|
||||
use crate::cfg::{Cfg, FuncSummaries};
|
||||
use crate::interop::InteropEdge;
|
||||
use crate::labels::{Cap, DataLabel, SourceKind};
|
||||
use crate::labels::SourceKind;
|
||||
use crate::state::engine::{self, MAX_TRACKED_VARS};
|
||||
use crate::state::lattice::Lattice;
|
||||
use crate::state::symbol::SymbolInterner;
|
||||
use crate::summary::GlobalSummaries;
|
||||
use crate::symbol::Lang;
|
||||
use domain::TaintState;
|
||||
use path_state::PredicateKind;
|
||||
use petgraph::graph::NodeIndex;
|
||||
use std::collections::HashMap;
|
||||
use tracing::debug;
|
||||
use petgraph::visit::IntoNodeReferences;
|
||||
use std::collections::HashSet;
|
||||
use transfer::{TaintEvent, TaintTransfer};
|
||||
|
||||
/// A detected taint finding with both source and sink locations.
|
||||
#[derive(Debug, Clone)]
|
||||
|
|
@ -20,269 +30,23 @@ pub struct Finding {
|
|||
pub path: Vec<NodeIndex>,
|
||||
/// The kind of source that originated the taint.
|
||||
pub source_kind: SourceKind,
|
||||
}
|
||||
|
||||
/// Order-independent hash of a taint map.
|
||||
///
|
||||
/// Uses XOR of per-entry hashes so the result is the same regardless of
|
||||
/// iteration order — no allocation or sorting required.
|
||||
fn taint_hash(taint: &HashMap<String, Cap>) -> u64 {
|
||||
let mut h: u64 = 0;
|
||||
for (k, bits) in taint {
|
||||
// Per-entry hash: FNV-1a-style mixing of key bytes + cap bits.
|
||||
let mut entry_h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
|
||||
for b in k.as_bytes() {
|
||||
entry_h ^= *b as u64;
|
||||
entry_h = entry_h.wrapping_mul(0x0100_0000_01b3); // FNV prime
|
||||
}
|
||||
entry_h ^= bits.bits() as u64;
|
||||
entry_h = entry_h.wrapping_mul(0x0100_0000_01b3);
|
||||
h ^= entry_h;
|
||||
}
|
||||
h
|
||||
}
|
||||
|
||||
/// Resolved summary for a callee — a uniform view regardless of whether the
|
||||
/// summary came from a local (same‑file) or global (cross‑file) source.
|
||||
struct ResolvedSummary {
|
||||
source_caps: Cap,
|
||||
sanitizer_caps: Cap,
|
||||
sink_caps: Cap,
|
||||
propagates_taint: bool,
|
||||
}
|
||||
|
||||
/// Try to resolve a callee name using conservative same-language resolution.
|
||||
///
|
||||
/// Resolution order:
|
||||
/// 1. Local (same-file): exact name + same lang + same namespace
|
||||
/// 2. Global same-language: via `lookup_same_lang`; must be unambiguous
|
||||
/// 3. Interop edges: explicit cross-language bridges
|
||||
/// 4. No cross-language fallback
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
fn resolve_callee(
|
||||
callee: &str,
|
||||
caller_lang: Lang,
|
||||
caller_namespace: &str,
|
||||
caller_func: &str,
|
||||
call_ordinal: u32,
|
||||
local: &FuncSummaries,
|
||||
global: Option<&GlobalSummaries>,
|
||||
interop_edges: &[InteropEdge],
|
||||
) -> Option<ResolvedSummary> {
|
||||
// 1) Local (same-file): scan local summaries for matching name + lang + namespace
|
||||
let local_matches: Vec<_> = local
|
||||
.iter()
|
||||
.filter(|(k, _)| {
|
||||
k.name == callee && k.lang == caller_lang && k.namespace == caller_namespace
|
||||
})
|
||||
.collect();
|
||||
|
||||
if local_matches.len() == 1 {
|
||||
let (_, ls) = local_matches[0];
|
||||
return Some(ResolvedSummary {
|
||||
source_caps: ls.source_caps,
|
||||
sanitizer_caps: ls.sanitizer_caps,
|
||||
sink_caps: ls.sink_caps,
|
||||
propagates_taint: ls.propagates_taint,
|
||||
});
|
||||
}
|
||||
|
||||
// Multiple local matches — try arity disambiguation (future), for now return None
|
||||
if local_matches.len() > 1 {
|
||||
return None;
|
||||
}
|
||||
|
||||
// 2) Global same-language
|
||||
if let Some(gs) = global {
|
||||
let matches = gs.lookup_same_lang(caller_lang, callee);
|
||||
if matches.len() == 1 {
|
||||
let (_, fs) = matches[0];
|
||||
return Some(ResolvedSummary {
|
||||
source_caps: fs.source_caps(),
|
||||
sanitizer_caps: fs.sanitizer_caps(),
|
||||
sink_caps: fs.sink_caps(),
|
||||
propagates_taint: fs.propagates_taint,
|
||||
});
|
||||
}
|
||||
// Multiple matches — try namespace match first
|
||||
if matches.len() > 1 {
|
||||
let same_ns: Vec<_> = matches
|
||||
.iter()
|
||||
.filter(|(k, _)| k.namespace == caller_namespace)
|
||||
.collect();
|
||||
if same_ns.len() == 1 {
|
||||
let (_, fs) = same_ns[0];
|
||||
return Some(ResolvedSummary {
|
||||
source_caps: fs.source_caps(),
|
||||
sanitizer_caps: fs.sanitizer_caps(),
|
||||
sink_caps: fs.sink_caps(),
|
||||
propagates_taint: fs.propagates_taint,
|
||||
});
|
||||
}
|
||||
// Still ambiguous — return None (conservative)
|
||||
return None;
|
||||
}
|
||||
}
|
||||
|
||||
// 3) Interop edges: explicit cross-language bridges
|
||||
for edge in interop_edges {
|
||||
if edge.from.caller_lang == caller_lang
|
||||
&& edge.from.caller_namespace == caller_namespace
|
||||
&& edge.from.callee_symbol == callee
|
||||
&& (edge.from.caller_func.is_empty() || edge.from.caller_func == caller_func)
|
||||
&& (edge.from.ordinal == 0 || edge.from.ordinal == call_ordinal)
|
||||
{
|
||||
// Look up the target in global summaries by exact FuncKey
|
||||
if let Some(gs) = global
|
||||
&& let Some(fs) = gs.get(&edge.to)
|
||||
{
|
||||
return Some(ResolvedSummary {
|
||||
source_caps: fs.source_caps(),
|
||||
sanitizer_caps: fs.sanitizer_caps(),
|
||||
sink_caps: fs.sink_caps(),
|
||||
propagates_taint: fs.propagates_taint,
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// 4) No cross-language fallback
|
||||
None
|
||||
}
|
||||
|
||||
/// Apply taint transfer for a single node, mutating `out` in place.
|
||||
///
|
||||
/// Callers should clone the taint map before calling if they need
|
||||
/// the original state preserved.
|
||||
fn apply_taint(
|
||||
node: &NodeInfo,
|
||||
out: &mut HashMap<String, Cap>,
|
||||
local_summaries: &FuncSummaries,
|
||||
global_summaries: Option<&GlobalSummaries>,
|
||||
caller_lang: Lang,
|
||||
caller_namespace: &str,
|
||||
interop_edges: &[InteropEdge],
|
||||
) {
|
||||
debug!(target: "taint", "Applying taint to node: {:?}", node);
|
||||
debug!(target: "taint", "Taint: {:?}", out);
|
||||
|
||||
let caller_func = node.enclosing_func.as_deref().unwrap_or("");
|
||||
|
||||
match node.label {
|
||||
// A new untrusted value enters the program
|
||||
Some(DataLabel::Source(bits)) => {
|
||||
if let Some(v) = &node.defines {
|
||||
out.insert(v.clone(), bits);
|
||||
}
|
||||
}
|
||||
// Sanitizer: propagate input taint through the assignment FIRST,
|
||||
// then strip the sanitizer's capability bits. This ensures that
|
||||
// `let y = sanitize_html(&x)` gives y the taint of x minus the
|
||||
// HTML_ESCAPE bit — rather than leaving y completely clean (which
|
||||
// would hide "wrong sanitiser for this sink" bugs).
|
||||
Some(DataLabel::Sanitizer(bits)) => {
|
||||
if let Some(v) = &node.defines {
|
||||
// 1. Propagate: union taint from all read variables
|
||||
let mut combined = Cap::empty();
|
||||
for u in &node.uses {
|
||||
if let Some(b) = out.get(u) {
|
||||
combined |= *b;
|
||||
}
|
||||
}
|
||||
// 2. Strip the sanitiser's bits
|
||||
let new = combined & !bits;
|
||||
if new.is_empty() {
|
||||
out.remove(v);
|
||||
} else {
|
||||
out.insert(v.clone(), new);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// A function call — resolve against local + global summaries
|
||||
_ if node.kind == StmtKind::Call => {
|
||||
if let Some(callee) = &node.callee
|
||||
&& let Some(resolved) = resolve_callee(
|
||||
callee,
|
||||
caller_lang,
|
||||
caller_namespace,
|
||||
caller_func,
|
||||
node.call_ordinal,
|
||||
local_summaries,
|
||||
global_summaries,
|
||||
interop_edges,
|
||||
)
|
||||
{
|
||||
// Build the return value's taint bits in stages, then
|
||||
// write once at the end. Order matters:
|
||||
//
|
||||
// 1. Start with fresh source taint (if the callee is a source)
|
||||
// 2. Union with propagated arg taint (if the callee propagates)
|
||||
// 3. Strip sanitizer bits last (so sanitization always wins)
|
||||
|
||||
let mut return_bits = Cap::empty();
|
||||
|
||||
// ── 1. Source behaviour ──
|
||||
return_bits |= resolved.source_caps;
|
||||
|
||||
// ── 2. Propagation ──
|
||||
if resolved.propagates_taint {
|
||||
for u in &node.uses {
|
||||
if let Some(bits) = out.get(u) {
|
||||
return_bits |= *bits;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── 3. Sanitizer behaviour (applied last so it always wins) ──
|
||||
return_bits &= !resolved.sanitizer_caps;
|
||||
|
||||
// ── Write the result ──
|
||||
if let Some(v) = &node.defines {
|
||||
if return_bits.is_empty() {
|
||||
out.remove(v);
|
||||
} else {
|
||||
out.insert(v.clone(), return_bits);
|
||||
}
|
||||
}
|
||||
|
||||
// ── Sink behaviour: handled in the main analysis loop
|
||||
// (checked via node.label or resolved summary) ──
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
// Unresolved call — fall through to default gen/kill below
|
||||
}
|
||||
|
||||
// All other statements: classic gen/kill for assignments
|
||||
_ => {}
|
||||
}
|
||||
|
||||
// Default gen/kill: propagate taint through variable assignments
|
||||
if !matches!(
|
||||
node.label,
|
||||
Some(DataLabel::Source(_)) | Some(DataLabel::Sanitizer(_))
|
||||
) && let Some(d) = &node.defines
|
||||
{
|
||||
let mut combined = Cap::empty();
|
||||
for u in &node.uses {
|
||||
if let Some(bits) = out.get(u) {
|
||||
combined |= *bits;
|
||||
}
|
||||
}
|
||||
if combined.is_empty() {
|
||||
out.remove(d);
|
||||
} else {
|
||||
out.insert(d.clone(), combined);
|
||||
}
|
||||
}
|
||||
/// Whether all tainted sink variables are guarded by a validation
|
||||
/// predicate on this path (metadata only — does not change severity).
|
||||
#[allow(dead_code)] // surfaced in Diag output (task 4)
|
||||
pub path_validated: bool,
|
||||
/// The kind of validation guard protecting this path, if any.
|
||||
#[allow(dead_code)] // surfaced in Diag output (task 4)
|
||||
pub guard_kind: Option<PredicateKind>,
|
||||
}
|
||||
|
||||
/// Run taint analysis on a single file's CFG.
|
||||
///
|
||||
/// `global_summaries` is `None` for pass‑1 / single‑file mode and
|
||||
/// `Some(&map)` for pass‑2 cross‑file analysis.
|
||||
/// Uses a monotone forward dataflow analysis via `state::engine::run_forward`
|
||||
/// with the `TaintTransfer` function. Termination is guaranteed by lattice
|
||||
/// finiteness (bounded `Cap` bits × bounded variable count).
|
||||
///
|
||||
/// For JS/TS files: uses a two-level solve to prevent cross-function taint
|
||||
/// leakage while preserving global-to-function flows.
|
||||
pub fn analyse_file(
|
||||
cfg: &Cfg,
|
||||
entry: NodeIndex,
|
||||
|
|
@ -292,162 +56,155 @@ pub fn analyse_file(
|
|||
caller_namespace: &str,
|
||||
interop_edges: &[InteropEdge],
|
||||
) -> Vec<Finding> {
|
||||
use std::collections::{HashMap, HashSet, VecDeque};
|
||||
let _span = tracing::debug_span!("taint_analyse_file").entered();
|
||||
|
||||
/// Queue item: current CFG node + taint map that holds here
|
||||
#[derive(Clone)]
|
||||
struct Item {
|
||||
node: NodeIndex,
|
||||
taint: HashMap<String, Cap>,
|
||||
// 1. Build symbol interner from CFG
|
||||
let interner = SymbolInterner::from_cfg(cfg);
|
||||
|
||||
if interner.len() > MAX_TRACKED_VARS {
|
||||
tracing::warn!(
|
||||
symbols = interner.len(),
|
||||
max = MAX_TRACKED_VARS,
|
||||
"taint analysis: too many variables, some will be ignored"
|
||||
);
|
||||
}
|
||||
|
||||
// (node, taint_hash) → predecessor key (for path rebuild)
|
||||
type Key = (NodeIndex, u64);
|
||||
let mut pred: HashMap<Key, Key> = HashMap::new();
|
||||
// 2. Build base transfer function
|
||||
let base_transfer = TaintTransfer {
|
||||
lang: caller_lang,
|
||||
namespace: caller_namespace,
|
||||
interner: &interner, // also used for events_to_findings below
|
||||
local_summaries,
|
||||
global_summaries,
|
||||
interop_edges,
|
||||
global_seed: None,
|
||||
scope_filter: None,
|
||||
};
|
||||
|
||||
// Seen states so we do not revisit them infinitely
|
||||
let mut seen: HashSet<Key> = HashSet::new();
|
||||
// 3. Run analysis (two-level for JS/TS, single-pass otherwise)
|
||||
let events = if matches!(caller_lang, Lang::JavaScript | Lang::TypeScript) {
|
||||
analyse_js_two_level(cfg, entry, &interner, &base_transfer)
|
||||
} else {
|
||||
let result = engine::run_forward(cfg, entry, &base_transfer, TaintState::initial());
|
||||
result.events
|
||||
};
|
||||
|
||||
// Resulting findings: (sink_node, source_node, full_path)
|
||||
let mut findings: Vec<Finding> = Vec::new();
|
||||
// 4. Convert events to findings
|
||||
let mut findings = events_to_findings(&events, &interner);
|
||||
|
||||
let mut q = VecDeque::new();
|
||||
q.push_back(Item {
|
||||
node: entry,
|
||||
taint: HashMap::new(),
|
||||
});
|
||||
seen.insert((entry, 0));
|
||||
// 5. Deduplicate findings by (sink, source), prefer path_validated=true
|
||||
findings.sort_by_key(|f| (f.sink.index(), f.source.index(), !f.path_validated));
|
||||
findings.dedup_by_key(|f| (f.sink, f.source));
|
||||
|
||||
while let Some(Item { node, taint }) = q.pop_front() {
|
||||
let caller_func = cfg[node].enclosing_func.as_deref().unwrap_or("");
|
||||
let mut out = taint.clone();
|
||||
apply_taint(
|
||||
&cfg[node],
|
||||
&mut out,
|
||||
local_summaries,
|
||||
global_summaries,
|
||||
caller_lang,
|
||||
caller_namespace,
|
||||
interop_edges,
|
||||
);
|
||||
findings
|
||||
}
|
||||
|
||||
// ── Sink check ──────────────────────────────────────────────────
|
||||
// Two ways a node can be a sink:
|
||||
// 1. Its AST label says Sink (existing inline labels)
|
||||
// 2. Its callee resolves to a function with sink_caps (cross-file)
|
||||
let sink_caps = match cfg[node].label {
|
||||
Some(DataLabel::Sink(caps)) => caps,
|
||||
_ => {
|
||||
// check if callee resolves to a sink
|
||||
cfg[node]
|
||||
.callee
|
||||
.as_ref()
|
||||
.and_then(|c| {
|
||||
resolve_callee(
|
||||
c,
|
||||
caller_lang,
|
||||
caller_namespace,
|
||||
caller_func,
|
||||
cfg[node].call_ordinal,
|
||||
local_summaries,
|
||||
global_summaries,
|
||||
interop_edges,
|
||||
)
|
||||
})
|
||||
.filter(|r| !r.sink_caps.is_empty())
|
||||
.map(|r| r.sink_caps)
|
||||
.unwrap_or(Cap::empty())
|
||||
}
|
||||
/// JS/TS two-level solve to prevent cross-function taint leakage.
|
||||
///
|
||||
/// Level 1: Solve top-level code (nodes where `enclosing_func.is_none()`).
|
||||
/// Level 2: For each function, solve seeded with top-level taint.
|
||||
fn analyse_js_two_level(
|
||||
cfg: &Cfg,
|
||||
entry: NodeIndex,
|
||||
_interner: &SymbolInterner,
|
||||
base_transfer: &TaintTransfer,
|
||||
) -> Vec<TaintEvent> {
|
||||
// Level 1: solve top-level only
|
||||
let toplevel_transfer = TaintTransfer {
|
||||
lang: base_transfer.lang,
|
||||
namespace: base_transfer.namespace,
|
||||
interner: base_transfer.interner,
|
||||
local_summaries: base_transfer.local_summaries,
|
||||
global_summaries: base_transfer.global_summaries,
|
||||
interop_edges: base_transfer.interop_edges,
|
||||
global_seed: None,
|
||||
scope_filter: Some(None), // top-level only (enclosing_func == None)
|
||||
};
|
||||
|
||||
let toplevel_result =
|
||||
engine::run_forward(cfg, entry, &toplevel_transfer, TaintState::initial());
|
||||
|
||||
// Extract top-level taint state at the last converged point
|
||||
let toplevel_state = extract_exit_state(&toplevel_result.states);
|
||||
|
||||
// Level 2: solve each function seeded with top-level state
|
||||
let mut all_events = toplevel_result.events;
|
||||
|
||||
let func_entries = find_function_entries(cfg);
|
||||
for (func_name, func_entry) in &func_entries {
|
||||
let func_transfer = TaintTransfer {
|
||||
lang: base_transfer.lang,
|
||||
namespace: base_transfer.namespace,
|
||||
interner: base_transfer.interner,
|
||||
local_summaries: base_transfer.local_summaries,
|
||||
global_summaries: base_transfer.global_summaries,
|
||||
interop_edges: base_transfer.interop_edges,
|
||||
global_seed: Some(&toplevel_state),
|
||||
scope_filter: Some(Some(func_name.as_str())),
|
||||
};
|
||||
|
||||
if !sink_caps.is_empty() {
|
||||
let bad = cfg[node]
|
||||
.uses
|
||||
.iter()
|
||||
.any(|u| out.get(u).is_some_and(|b| (*b & sink_caps) != Cap::empty()));
|
||||
if bad {
|
||||
// Reconstruct path backwards from sink to source.
|
||||
//
|
||||
// A node is considered a "source" if:
|
||||
// 1. It has an inline DataLabel::Source (same-file), OR
|
||||
// 2. It is a Call whose callee resolves to a source via
|
||||
// local or global summaries (cross-file).
|
||||
let sink_node = node;
|
||||
let mut path = vec![node];
|
||||
let mut source_node = node; // fallback: sink itself
|
||||
let mut key = (node, taint_hash(&taint));
|
||||
let func_result =
|
||||
engine::run_forward(cfg, *func_entry, &func_transfer, TaintState::initial());
|
||||
all_events.extend(func_result.events);
|
||||
}
|
||||
|
||||
while let Some(&(prev, prev_hash)) = pred.get(&key) {
|
||||
path.push(prev);
|
||||
all_events
|
||||
}
|
||||
|
||||
// Check inline source label
|
||||
if matches!(cfg[prev].label, Some(DataLabel::Source(_))) {
|
||||
source_node = prev;
|
||||
break;
|
||||
}
|
||||
/// Extract the "best" taint state from converged states (join all exit/reachable states).
|
||||
fn extract_exit_state(states: &std::collections::HashMap<NodeIndex, TaintState>) -> TaintState {
|
||||
let mut result = TaintState::initial();
|
||||
for state in states.values() {
|
||||
result = result.join(state);
|
||||
}
|
||||
result
|
||||
}
|
||||
|
||||
// Check cross-file source via resolved callee summary
|
||||
let prev_caller_func = cfg[prev].enclosing_func.as_deref().unwrap_or("");
|
||||
if cfg[prev].kind == StmtKind::Call
|
||||
&& let Some(callee) = &cfg[prev].callee
|
||||
&& let Some(resolved) = resolve_callee(
|
||||
callee,
|
||||
caller_lang,
|
||||
caller_namespace,
|
||||
prev_caller_func,
|
||||
cfg[prev].call_ordinal,
|
||||
local_summaries,
|
||||
global_summaries,
|
||||
interop_edges,
|
||||
)
|
||||
&& !resolved.source_caps.is_empty()
|
||||
{
|
||||
source_node = prev;
|
||||
break;
|
||||
}
|
||||
/// Find function entry nodes: (func_name, entry_node) pairs.
|
||||
///
|
||||
/// A function entry is the first node with a given `enclosing_func` value.
|
||||
fn find_function_entries(cfg: &Cfg) -> Vec<(String, NodeIndex)> {
|
||||
let mut seen = HashSet::new();
|
||||
let mut entries = Vec::new();
|
||||
|
||||
key = (prev, prev_hash);
|
||||
}
|
||||
|
||||
path.reverse();
|
||||
|
||||
// Infer the source kind from the source node's label and callee
|
||||
let source_kind = match cfg[source_node].label {
|
||||
Some(DataLabel::Source(caps)) => {
|
||||
let callee = cfg[source_node].callee.as_deref().unwrap_or("");
|
||||
crate::labels::infer_source_kind(caps, callee)
|
||||
}
|
||||
_ => SourceKind::Unknown,
|
||||
};
|
||||
|
||||
findings.push(Finding {
|
||||
sink: sink_node,
|
||||
source: source_node,
|
||||
path,
|
||||
source_kind,
|
||||
});
|
||||
}
|
||||
for (idx, info) in cfg.node_references() {
|
||||
if let Some(ref func_name) = info.enclosing_func
|
||||
&& seen.insert(func_name.clone())
|
||||
{
|
||||
entries.push((func_name.clone(), idx));
|
||||
}
|
||||
}
|
||||
|
||||
// enqueue successors — cache hashes to avoid recomputation
|
||||
let out_h = taint_hash(&out);
|
||||
let in_h = taint_hash(&taint);
|
||||
let succs: Vec<_> = cfg.neighbors(node).collect();
|
||||
for (i, succ) in succs.iter().enumerate() {
|
||||
let key = (*succ, out_h);
|
||||
if !seen.contains(&key) {
|
||||
seen.insert(key);
|
||||
pred.insert(key, (node, in_h));
|
||||
// Move the map into the last successor to avoid a clone
|
||||
let taint_for_succ = if i + 1 == succs.len() {
|
||||
std::mem::take(&mut out)
|
||||
} else {
|
||||
out.clone()
|
||||
};
|
||||
q.push_back(Item {
|
||||
node: *succ,
|
||||
taint: taint_for_succ,
|
||||
});
|
||||
entries
|
||||
}
|
||||
|
||||
/// Convert TaintEvents into Findings.
|
||||
fn events_to_findings(events: &[TaintEvent], _interner: &SymbolInterner) -> Vec<Finding> {
|
||||
let mut findings = Vec::new();
|
||||
|
||||
for event in events {
|
||||
let TaintEvent::SinkReached {
|
||||
sink_node,
|
||||
tainted_vars,
|
||||
all_validated,
|
||||
guard_kind,
|
||||
..
|
||||
} = event;
|
||||
|
||||
// Collect unique origins across all tainted vars at this sink
|
||||
let mut seen_origins: HashSet<(usize, usize)> = HashSet::new();
|
||||
for (_sym, _caps, origins) in tainted_vars {
|
||||
for origin in origins {
|
||||
if seen_origins.insert((origin.node.index(), sink_node.index())) {
|
||||
findings.push(Finding {
|
||||
sink: *sink_node,
|
||||
source: origin.node,
|
||||
path: vec![origin.node, *sink_node],
|
||||
source_kind: origin.source_kind,
|
||||
path_validated: *all_validated,
|
||||
guard_kind: *guard_kind,
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
|||
234
src/taint/path_state.rs
Normal file
234
src/taint/path_state.rs
Normal file
|
|
@ -0,0 +1,234 @@
|
|||
// ─── PredicateKind ───────────────────────────────────────────────────────────
|
||||
|
||||
/// Classification of what an if-condition tests.
|
||||
///
|
||||
/// Determined by heuristic analysis of the raw condition text.
|
||||
/// Classification is conservative: prefer [`Unknown`](PredicateKind::Unknown)
|
||||
/// over a wrong guess.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
|
||||
pub enum PredicateKind {
|
||||
/// `x.is_none()`, `x == null`, `x == nil`, `x is None`
|
||||
NullCheck,
|
||||
/// `x.is_empty()`, `x.len() == 0`, `x == ""`
|
||||
EmptyCheck,
|
||||
/// `x.is_err()`, `x.is_ok()`, `err != nil`
|
||||
ErrorCheck,
|
||||
/// Call to a validation/guard function: `validate(x)`, `is_safe(x)`
|
||||
ValidationCall,
|
||||
/// Call to a sanitizer function: `sanitize(x)`, `escape(x)`
|
||||
SanitizerCall,
|
||||
/// Comparison operators: `x == 5`, `x > threshold`
|
||||
Comparison,
|
||||
/// Generic boolean test — cannot classify further.
|
||||
Unknown,
|
||||
}
|
||||
|
||||
/// Classify a raw condition text into a [`PredicateKind`].
|
||||
///
|
||||
/// # Rules
|
||||
///
|
||||
/// - Empty/None text → [`Unknown`](PredicateKind::Unknown).
|
||||
/// - `ValidationCall` / `SanitizerCall` require a `(` in the text **and** a
|
||||
/// matching callee token. This avoids misclassifying comparisons like
|
||||
/// `x_valid == true`.
|
||||
/// - Prefers [`Unknown`](PredicateKind::Unknown) over false positives.
|
||||
pub fn classify_condition(text: &str) -> PredicateKind {
|
||||
if text.is_empty() {
|
||||
return PredicateKind::Unknown;
|
||||
}
|
||||
|
||||
let lower = text.to_ascii_lowercase();
|
||||
|
||||
// ── Error checks (before null checks: `err != nil` is an error check,
|
||||
// not a null check, even though it contains `!= nil`) ──────────────
|
||||
if lower.contains("is_err")
|
||||
|| lower.contains("is_ok")
|
||||
|| lower.contains("err != nil")
|
||||
|| lower.contains("err == nil")
|
||||
|| lower.contains("error != nil")
|
||||
|| lower.contains("error == nil")
|
||||
{
|
||||
return PredicateKind::ErrorCheck;
|
||||
}
|
||||
|
||||
// ── Null checks ──────────────────────────────────────────────────────
|
||||
if lower.contains("is_none")
|
||||
|| lower.contains("is_some")
|
||||
|| lower.contains("== none")
|
||||
|| lower.contains("!= none")
|
||||
|| lower.contains("is none")
|
||||
|| lower.contains("is not none")
|
||||
|| lower.contains("== null")
|
||||
|| lower.contains("!= null")
|
||||
|| lower.contains("=== null")
|
||||
|| lower.contains("!== null")
|
||||
|| lower.contains("== nil")
|
||||
|| lower.contains("!= nil")
|
||||
{
|
||||
return PredicateKind::NullCheck;
|
||||
}
|
||||
|
||||
// ── Empty checks ─────────────────────────────────────────────────────
|
||||
if lower.contains("is_empty")
|
||||
|| lower.contains(".len() == 0")
|
||||
|| lower.contains(".len() != 0")
|
||||
|| lower.contains(".length == 0")
|
||||
|| lower.contains(".length === 0")
|
||||
|| lower.contains(".length != 0")
|
||||
|| lower.contains(".length !== 0")
|
||||
|| lower.contains("== \"\"")
|
||||
|| lower.contains("== ''")
|
||||
{
|
||||
return PredicateKind::EmptyCheck;
|
||||
}
|
||||
|
||||
// ── Call-based kinds (require `(` to be present) ─────────────────────
|
||||
if lower.contains('(') {
|
||||
// Extract a rough callee token: everything before the first `(`
|
||||
// that looks like an identifier (letters, digits, underscores, dots).
|
||||
let callee_part = lower.split('(').next().unwrap_or("");
|
||||
// Take the last segment (after `.` or `::`) as the bare name.
|
||||
let bare = callee_part
|
||||
.rsplit(['.', ':'])
|
||||
.next()
|
||||
.unwrap_or(callee_part)
|
||||
.trim();
|
||||
|
||||
// Validation
|
||||
if bare.contains("valid")
|
||||
|| bare.contains("check")
|
||||
|| bare.contains("verify")
|
||||
|| bare.starts_with("is_safe")
|
||||
|| bare.starts_with("is_authorized")
|
||||
|| bare.starts_with("is_authenticated")
|
||||
{
|
||||
return PredicateKind::ValidationCall;
|
||||
}
|
||||
|
||||
// Sanitizer
|
||||
if bare.contains("sanitiz") || bare.contains("escape") || bare.contains("encode") {
|
||||
return PredicateKind::SanitizerCall;
|
||||
}
|
||||
}
|
||||
|
||||
// ── Comparison operators ─────────────────────────────────────────────
|
||||
if lower.contains("==")
|
||||
|| lower.contains("!=")
|
||||
|| lower.contains(">=")
|
||||
|| lower.contains("<=")
|
||||
|| lower.contains(" > ")
|
||||
|| lower.contains(" < ")
|
||||
{
|
||||
return PredicateKind::Comparison;
|
||||
}
|
||||
|
||||
PredicateKind::Unknown
|
||||
}
|
||||
|
||||
// ─── Tests ───────────────────────────────────────────────────────────────────
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
// ── classify_condition ────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn classify_empty_is_unknown() {
|
||||
assert_eq!(classify_condition(""), PredicateKind::Unknown);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_null_checks() {
|
||||
assert_eq!(classify_condition("x.is_none()"), PredicateKind::NullCheck);
|
||||
assert_eq!(classify_condition("x == null"), PredicateKind::NullCheck);
|
||||
assert_eq!(classify_condition("x != nil"), PredicateKind::NullCheck);
|
||||
assert_eq!(classify_condition("x is None"), PredicateKind::NullCheck);
|
||||
assert_eq!(classify_condition("x === null"), PredicateKind::NullCheck);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_error_checks() {
|
||||
assert_eq!(classify_condition("x.is_err()"), PredicateKind::ErrorCheck);
|
||||
assert_eq!(classify_condition("err != nil"), PredicateKind::ErrorCheck);
|
||||
assert_eq!(classify_condition("x.is_ok()"), PredicateKind::ErrorCheck);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_empty_checks() {
|
||||
assert_eq!(
|
||||
classify_condition("x.is_empty()"),
|
||||
PredicateKind::EmptyCheck
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("x.len() == 0"),
|
||||
PredicateKind::EmptyCheck
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("x.length === 0"),
|
||||
PredicateKind::EmptyCheck
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_validation_call() {
|
||||
assert_eq!(
|
||||
classify_condition("validate(x)"),
|
||||
PredicateKind::ValidationCall
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("is_safe(input)"),
|
||||
PredicateKind::ValidationCall
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("check_auth(req)"),
|
||||
PredicateKind::ValidationCall
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("input.verify(sig)"),
|
||||
PredicateKind::ValidationCall
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_validation_requires_paren() {
|
||||
// `x_valid == true` should NOT be ValidationCall — no `(` call syntax.
|
||||
assert_eq!(
|
||||
classify_condition("x_valid == true"),
|
||||
PredicateKind::Comparison
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("is_valid && ready"),
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_sanitizer_call() {
|
||||
assert_eq!(
|
||||
classify_condition("sanitize(x)"),
|
||||
PredicateKind::SanitizerCall
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("html_escape(s)"),
|
||||
PredicateKind::SanitizerCall
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("url_encode(path)"),
|
||||
PredicateKind::SanitizerCall
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_comparison() {
|
||||
assert_eq!(classify_condition("x == 5"), PredicateKind::Comparison);
|
||||
assert_eq!(classify_condition("x != y"), PredicateKind::Comparison);
|
||||
assert_eq!(classify_condition("a >= b"), PredicateKind::Comparison);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_unknown_fallback() {
|
||||
assert_eq!(classify_condition("flag"), PredicateKind::Unknown);
|
||||
assert_eq!(classify_condition("a && b"), PredicateKind::Unknown);
|
||||
}
|
||||
}
|
||||
|
|
@ -1,6 +1,7 @@
|
|||
use super::*;
|
||||
use crate::cfg::FuncSummaries;
|
||||
use crate::interop::InteropEdge;
|
||||
use crate::labels::Cap;
|
||||
use crate::symbol::FuncKey;
|
||||
|
||||
#[test]
|
||||
|
|
@ -52,8 +53,10 @@ fn taint_through_if_else() {
|
|||
let (cfg, entry, summaries) = build_cfg(&tree, src, "rust", "test.rs", None);
|
||||
let findings = analyse_file(&cfg, entry, &summaries, None, Lang::Rust, "test.rs", &[]);
|
||||
|
||||
// exactly one path (via the True branch) should be flagged
|
||||
assert_eq!(findings.len(), 1);
|
||||
// Both branches have findings: the true branch uses unsanitized `x`,
|
||||
// the else branch uses `safe` which was sanitized with HTML_ESCAPE
|
||||
// but the sink requires SHELL_ESCAPE (wrong sanitizer → still tainted).
|
||||
assert_eq!(findings.len(), 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
|
|
@ -2218,3 +2221,318 @@ fn return_call_recognized_as_source() {
|
|||
"foo() should have source_caps set because env::var is called inside return"
|
||||
);
|
||||
}
|
||||
|
||||
// ─── Path-sensitive analysis tests ───────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn validate_and_early_return() {
|
||||
use crate::cfg::build_cfg;
|
||||
use tree_sitter::Language;
|
||||
|
||||
// Validate before use: if validation fails, early return.
|
||||
// The sink after the guard is on the "validated" path.
|
||||
//
|
||||
// The CFG creates a synthetic pass-through node for the false path
|
||||
// with an explicit False edge from the If node. BFS reaches the
|
||||
// sink via: cond → (False) → pass-through → (Seq) → sink.
|
||||
// The predicate on the False edge records that `!validate(&x)` was
|
||||
// false (i.e. validation passed), so the sink is path-guarded.
|
||||
let src = br#"
|
||||
use std::env; use std::process::Command;
|
||||
fn main() {
|
||||
let x = env::var("INPUT").unwrap();
|
||||
if !validate(&x) { return; }
|
||||
Command::new("sh").arg(x).status().unwrap();
|
||||
}"#;
|
||||
|
||||
let mut parser = tree_sitter::Parser::new();
|
||||
parser
|
||||
.set_language(&Language::from(tree_sitter_rust::LANGUAGE))
|
||||
.unwrap();
|
||||
let tree = parser.parse(src as &[u8], None).unwrap();
|
||||
|
||||
let (cfg, entry, summaries) = build_cfg(&tree, src, "rust", "test.rs", None);
|
||||
let findings = analyse_file(&cfg, entry, &summaries, None, Lang::Rust, "test.rs", &[]);
|
||||
|
||||
// Taint still flows (validate doesn't kill taint), but the finding
|
||||
// should be annotated as path_validated because the false path
|
||||
// (validation passed) has a ValidationCall predicate with polarity=true.
|
||||
assert_eq!(findings.len(), 1, "should still detect the taint flow");
|
||||
assert!(
|
||||
findings[0].path_validated,
|
||||
"finding should be marked as path_validated (early-return guard detected)"
|
||||
);
|
||||
assert_eq!(
|
||||
findings[0].guard_kind,
|
||||
Some(PredicateKind::ValidationCall),
|
||||
"guard_kind should be ValidationCall"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn validate_in_if_else_path_validated() {
|
||||
use crate::cfg::build_cfg;
|
||||
use tree_sitter::Language;
|
||||
|
||||
// If/else where the True branch (validation passed) contains the sink.
|
||||
// This IS detectable because the If node has genuine True/False branches.
|
||||
let src = br#"
|
||||
use std::env; use std::process::Command;
|
||||
fn main() {
|
||||
let x = env::var("INPUT").unwrap();
|
||||
if validate(&x) {
|
||||
Command::new("sh").arg(&x).status().unwrap();
|
||||
} else {
|
||||
println!("invalid input");
|
||||
}
|
||||
}"#;
|
||||
|
||||
let mut parser = tree_sitter::Parser::new();
|
||||
parser
|
||||
.set_language(&Language::from(tree_sitter_rust::LANGUAGE))
|
||||
.unwrap();
|
||||
let tree = parser.parse(src as &[u8], None).unwrap();
|
||||
|
||||
let (cfg, entry, summaries) = build_cfg(&tree, src, "rust", "test.rs", None);
|
||||
let findings = analyse_file(&cfg, entry, &summaries, None, Lang::Rust, "test.rs", &[]);
|
||||
|
||||
assert_eq!(findings.len(), 1, "should detect the taint flow");
|
||||
assert!(
|
||||
findings[0].path_validated,
|
||||
"finding should be path_validated (sink in validated branch)"
|
||||
);
|
||||
assert_eq!(
|
||||
findings[0].guard_kind,
|
||||
Some(PredicateKind::ValidationCall),
|
||||
"guard_kind should be ValidationCall"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn sink_on_failed_validation_branch() {
|
||||
use crate::cfg::build_cfg;
|
||||
use tree_sitter::Language;
|
||||
|
||||
// Sink is in the failed-validation branch (negated condition, false edge).
|
||||
let src = br#"
|
||||
use std::env; use std::process::Command;
|
||||
fn main() {
|
||||
let x = env::var("INPUT").unwrap();
|
||||
if !validate(&x) {
|
||||
Command::new("sh").arg(&x).status().unwrap();
|
||||
}
|
||||
}"#;
|
||||
|
||||
let mut parser = tree_sitter::Parser::new();
|
||||
parser
|
||||
.set_language(&Language::from(tree_sitter_rust::LANGUAGE))
|
||||
.unwrap();
|
||||
let tree = parser.parse(src as &[u8], None).unwrap();
|
||||
|
||||
let (cfg, entry, summaries) = build_cfg(&tree, src, "rust", "test.rs", None);
|
||||
let findings = analyse_file(&cfg, entry, &summaries, None, Lang::Rust, "test.rs", &[]);
|
||||
|
||||
assert_eq!(findings.len(), 1, "should detect taint flow to sink");
|
||||
assert!(
|
||||
!findings[0].path_validated,
|
||||
"finding should NOT be path_validated (sink is in failed-validation branch)"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn contradictory_null_check_pruned() {
|
||||
use crate::cfg::build_cfg;
|
||||
use tree_sitter::Language;
|
||||
|
||||
// Inner branch is infeasible: if x.is_none() then x cannot also be is_none().
|
||||
// After early return on is_none(), the fall-through path has polarity=false
|
||||
// for NullCheck. The inner `if x.is_none()` True branch has polarity=true —
|
||||
// contradiction.
|
||||
let src = br#"
|
||||
use std::env; use std::process::Command;
|
||||
fn main() {
|
||||
let x = env::var("INPUT").ok();
|
||||
if x.is_none() { return; }
|
||||
if x.is_none() {
|
||||
Command::new("sh").arg("dangerous").status().unwrap();
|
||||
}
|
||||
}"#;
|
||||
|
||||
let mut parser = tree_sitter::Parser::new();
|
||||
parser
|
||||
.set_language(&Language::from(tree_sitter_rust::LANGUAGE))
|
||||
.unwrap();
|
||||
let tree = parser.parse(src as &[u8], None).unwrap();
|
||||
|
||||
let (cfg, entry, summaries) = build_cfg(&tree, src, "rust", "test.rs", None);
|
||||
let findings = analyse_file(&cfg, entry, &summaries, None, Lang::Rust, "test.rs", &[]);
|
||||
|
||||
// The inner branch is infeasible, and the arg "dangerous" is a string
|
||||
// literal (not tainted), so there should be no findings.
|
||||
assert!(
|
||||
findings.is_empty(),
|
||||
"inner branch is infeasible — should produce no findings (got {})",
|
||||
findings.len()
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn sanitize_one_branch_no_regression() {
|
||||
use crate::cfg::build_cfg;
|
||||
use tree_sitter::Language;
|
||||
|
||||
// Same as existing taint_through_if_else: sanitized in one branch, not in the other.
|
||||
// Verify the finding count stays at 1 (no regression from path sensitivity).
|
||||
let src = br#"
|
||||
use std::env; use std::process::Command;
|
||||
fn main() {
|
||||
let x = env::var("DANGEROUS").unwrap();
|
||||
let safe = html_escape::encode_safe(&x);
|
||||
|
||||
if x.len() > 5 {
|
||||
Command::new("sh").arg(&x).status().unwrap(); // UNSAFE
|
||||
} else {
|
||||
Command::new("sh").arg(&safe).status().unwrap(); // SAFE
|
||||
}
|
||||
}"#;
|
||||
|
||||
let mut parser = tree_sitter::Parser::new();
|
||||
parser
|
||||
.set_language(&Language::from(tree_sitter_rust::LANGUAGE))
|
||||
.unwrap();
|
||||
let tree = parser.parse(src as &[u8], None).unwrap();
|
||||
|
||||
let (cfg, entry, summaries) = build_cfg(&tree, src, "rust", "test.rs", None);
|
||||
let findings = analyse_file(&cfg, entry, &summaries, None, Lang::Rust, "test.rs", &[]);
|
||||
|
||||
// Both branches produce findings: the true branch uses unsanitized `x`,
|
||||
// the else branch uses `safe` (HTML_ESCAPE sanitizer vs SHELL_ESCAPE sink).
|
||||
// Previously only 1 finding because else_clause was silently dropped from CFG.
|
||||
assert_eq!(
|
||||
findings.len(),
|
||||
2,
|
||||
"two findings expected (both branches reach sink with wrong/no sanitizer)"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn path_state_budget_graceful() {
|
||||
use crate::cfg::build_cfg;
|
||||
use tree_sitter::Language;
|
||||
|
||||
// Deeply nested ifs with a sink at the innermost level.
|
||||
// PathState should truncate gracefully after MAX_PATH_PREDICATES.
|
||||
let src = br#"
|
||||
use std::env; use std::process::Command;
|
||||
fn main() {
|
||||
let x = env::var("INPUT").unwrap();
|
||||
if x.len() > 1 {
|
||||
if x.len() > 2 {
|
||||
if x.len() > 3 {
|
||||
if x.len() > 4 {
|
||||
if x.len() > 5 {
|
||||
if x.len() > 6 {
|
||||
if x.len() > 7 {
|
||||
if x.len() > 8 {
|
||||
if x.len() > 9 {
|
||||
Command::new("sh").arg(&x).status().unwrap();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}"#;
|
||||
|
||||
let mut parser = tree_sitter::Parser::new();
|
||||
parser
|
||||
.set_language(&Language::from(tree_sitter_rust::LANGUAGE))
|
||||
.unwrap();
|
||||
let tree = parser.parse(src as &[u8], None).unwrap();
|
||||
|
||||
let (cfg, entry, summaries) = build_cfg(&tree, src, "rust", "test.rs", None);
|
||||
let findings = analyse_file(&cfg, entry, &summaries, None, Lang::Rust, "test.rs", &[]);
|
||||
|
||||
// Should still detect the flow — truncation shouldn't cause false negatives.
|
||||
assert_eq!(
|
||||
findings.len(),
|
||||
1,
|
||||
"should detect taint flow even with truncated PathState"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unknown_predicate_not_pruned() {
|
||||
use crate::cfg::build_cfg;
|
||||
use tree_sitter::Language;
|
||||
|
||||
// Comparison predicates are NOT in the contradiction whitelist, so even
|
||||
// seemingly contradictory comparisons should not be pruned.
|
||||
let src = br#"
|
||||
use std::env; use std::process::Command;
|
||||
fn main() {
|
||||
let x = env::var("INPUT").unwrap();
|
||||
if x.len() > 5 { return; }
|
||||
if x.len() > 5 {
|
||||
Command::new("sh").arg(&x).status().unwrap();
|
||||
}
|
||||
}"#;
|
||||
|
||||
let mut parser = tree_sitter::Parser::new();
|
||||
parser
|
||||
.set_language(&Language::from(tree_sitter_rust::LANGUAGE))
|
||||
.unwrap();
|
||||
let tree = parser.parse(src as &[u8], None).unwrap();
|
||||
|
||||
let (cfg, entry, summaries) = build_cfg(&tree, src, "rust", "test.rs", None);
|
||||
let findings = analyse_file(&cfg, entry, &summaries, None, Lang::Rust, "test.rs", &[]);
|
||||
|
||||
// Comparison is not in the whitelist — the path should NOT be pruned.
|
||||
assert_eq!(
|
||||
findings.len(),
|
||||
1,
|
||||
"Comparison predicate should not cause contradiction pruning"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn multi_var_predicate_not_pruned() {
|
||||
use crate::cfg::build_cfg;
|
||||
use tree_sitter::Language;
|
||||
|
||||
// Multi-variable conditions should never be pruned for contradiction,
|
||||
// even if the kind is in the whitelist.
|
||||
let src = br#"
|
||||
use std::env; use std::process::Command;
|
||||
fn main() {
|
||||
let x = env::var("INPUT").unwrap();
|
||||
let y = env::var("OTHER").ok();
|
||||
if y.is_none() { return; }
|
||||
if y.is_none() {
|
||||
Command::new("sh").arg(&x).status().unwrap();
|
||||
}
|
||||
}"#;
|
||||
|
||||
let mut parser = tree_sitter::Parser::new();
|
||||
parser
|
||||
.set_language(&Language::from(tree_sitter_rust::LANGUAGE))
|
||||
.unwrap();
|
||||
let tree = parser.parse(src as &[u8], None).unwrap();
|
||||
|
||||
let (cfg, entry, summaries) = build_cfg(&tree, src, "rust", "test.rs", None);
|
||||
let findings = analyse_file(&cfg, entry, &summaries, None, Lang::Rust, "test.rs", &[]);
|
||||
|
||||
// Note: y.is_none() condition references `y` and `is_none` — two idents.
|
||||
// Wait, `is_none` is a method — collect_idents finds `y` and `is_none` as
|
||||
// separate identifiers. That makes it multi-var, so contradiction should
|
||||
// NOT fire. However, the actual behavior depends on how many idents
|
||||
// collect_idents extracts from `y.is_none()`. If it returns ["y", "is_none"],
|
||||
// then the predicate has 2 vars → multi-var → not pruned → finding exists.
|
||||
assert!(
|
||||
!findings.is_empty(),
|
||||
"multi-var predicate should not be pruned; flow should be detected"
|
||||
);
|
||||
}
|
||||
|
|
|
|||
458
src/taint/transfer.rs
Normal file
458
src/taint/transfer.rs
Normal file
|
|
@ -0,0 +1,458 @@
|
|||
use crate::callgraph::normalize_callee_name;
|
||||
use crate::cfg::{EdgeKind, FuncSummaries, NodeInfo, StmtKind};
|
||||
use crate::interop::InteropEdge;
|
||||
use crate::labels::{Cap, DataLabel};
|
||||
use crate::state::engine::Transfer;
|
||||
use crate::state::lattice::Lattice;
|
||||
use crate::state::symbol::{SymbolId, SymbolInterner};
|
||||
use crate::summary::{CalleeResolution, GlobalSummaries};
|
||||
use crate::symbol::Lang;
|
||||
use crate::taint::domain::{TaintOrigin, TaintState, VarTaint, predicate_kind_bit};
|
||||
use crate::taint::path_state::{PredicateKind, classify_condition};
|
||||
use petgraph::graph::NodeIndex;
|
||||
use smallvec::SmallVec;
|
||||
|
||||
/// Events emitted by the taint transfer function during Phase 2.
|
||||
#[derive(Clone, Debug)]
|
||||
pub enum TaintEvent {
|
||||
SinkReached {
|
||||
sink_node: NodeIndex,
|
||||
tainted_vars: Vec<(SymbolId, Cap, SmallVec<[TaintOrigin; 2]>)>,
|
||||
#[allow(dead_code)]
|
||||
sink_caps: Cap,
|
||||
all_validated: bool,
|
||||
guard_kind: Option<PredicateKind>,
|
||||
},
|
||||
}
|
||||
|
||||
/// Taint transfer function for forward dataflow analysis.
|
||||
pub struct TaintTransfer<'a> {
|
||||
pub lang: Lang,
|
||||
pub namespace: &'a str,
|
||||
pub interner: &'a SymbolInterner,
|
||||
pub local_summaries: &'a FuncSummaries,
|
||||
pub global_summaries: Option<&'a GlobalSummaries>,
|
||||
pub interop_edges: &'a [InteropEdge],
|
||||
/// For JS two-level solve: top-level taint state seeded into function solves.
|
||||
pub global_seed: Option<&'a TaintState>,
|
||||
/// Optional scope filter: if set, only process nodes whose enclosing_func matches.
|
||||
/// None = process all nodes. Some(None) = top-level only. Some(Some(name)) = function only.
|
||||
pub scope_filter: Option<Option<&'a str>>,
|
||||
}
|
||||
|
||||
impl Transfer<TaintState> for TaintTransfer<'_> {
|
||||
type Event = TaintEvent;
|
||||
|
||||
fn apply(
|
||||
&self,
|
||||
node: NodeIndex,
|
||||
info: &NodeInfo,
|
||||
edge: Option<EdgeKind>,
|
||||
mut state: TaintState,
|
||||
) -> (TaintState, Vec<TaintEvent>) {
|
||||
let mut events = Vec::new();
|
||||
|
||||
// Scope filter: skip nodes outside our scope (return state unchanged)
|
||||
if let Some(ref filter) = self.scope_filter {
|
||||
let node_func = info.enclosing_func.as_deref();
|
||||
if node_func != *filter {
|
||||
return (state, events);
|
||||
}
|
||||
}
|
||||
|
||||
let caller_func = info.enclosing_func.as_deref().unwrap_or("");
|
||||
|
||||
// ── Apply taint transfer ────────────────────────────────────────
|
||||
match info.label {
|
||||
Some(DataLabel::Source(bits)) => {
|
||||
self.apply_source(node, info, bits, &mut state);
|
||||
}
|
||||
Some(DataLabel::Sanitizer(bits)) => {
|
||||
self.apply_sanitizer(info, bits, &mut state);
|
||||
}
|
||||
_ if info.kind == StmtKind::Call => {
|
||||
self.apply_call(node, info, caller_func, &mut state);
|
||||
}
|
||||
_ => {
|
||||
self.apply_assignment(info, &mut state);
|
||||
}
|
||||
}
|
||||
|
||||
// ── If-node predicate handling (edge-aware) ─────────────────────
|
||||
if info.kind == StmtKind::If
|
||||
&& !info.condition_vars.is_empty()
|
||||
&& matches!(edge, Some(EdgeKind::True) | Some(EdgeKind::False))
|
||||
{
|
||||
let cond_text = info.condition_text.as_deref().unwrap_or("");
|
||||
let kind = classify_condition(cond_text);
|
||||
let polarity = matches!(edge, Some(EdgeKind::True)) ^ info.condition_negated;
|
||||
|
||||
// ValidationCall handling
|
||||
if kind == PredicateKind::ValidationCall && polarity {
|
||||
for var in &info.condition_vars {
|
||||
if let Some(sym) = self.interner.get(var) {
|
||||
state.validated_may.insert(sym);
|
||||
state.validated_must.insert(sym);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Predicate summary for whitelisted kinds (contradiction pruning)
|
||||
if let Some(bit_idx) = predicate_kind_bit(kind) {
|
||||
for var in &info.condition_vars {
|
||||
if let Some(sym) = self.interner.get(var) {
|
||||
let mut summary = state.get_predicate(sym);
|
||||
if polarity {
|
||||
summary.known_true |= 1 << bit_idx;
|
||||
} else {
|
||||
summary.known_false |= 1 << bit_idx;
|
||||
}
|
||||
state.set_predicate(sym, summary);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Contradiction pruning: if any variable has contradictory predicates,
|
||||
// this is an infeasible path → return bot (monotonically kills branch).
|
||||
if state.has_contradiction() {
|
||||
return (TaintState::bot(), events);
|
||||
}
|
||||
}
|
||||
|
||||
// ── Sink check ──────────────────────────────────────────────────
|
||||
let sink_caps = self.resolve_sink_caps(info, caller_func);
|
||||
if !sink_caps.is_empty() {
|
||||
let tainted_vars = self.collect_tainted_sink_vars(info, &state, sink_caps);
|
||||
if !tainted_vars.is_empty() {
|
||||
let all_validated = tainted_vars
|
||||
.iter()
|
||||
.all(|(sym, _, _)| state.validated_may.contains(*sym));
|
||||
|
||||
let guard_kind = if all_validated {
|
||||
Some(PredicateKind::ValidationCall)
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
events.push(TaintEvent::SinkReached {
|
||||
sink_node: node,
|
||||
tainted_vars,
|
||||
sink_caps,
|
||||
all_validated,
|
||||
guard_kind,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
(state, events)
|
||||
}
|
||||
|
||||
fn iteration_budget(&self) -> usize {
|
||||
100_000
|
||||
}
|
||||
|
||||
fn on_budget_exceeded(&self) -> bool {
|
||||
tracing::warn!("taint analysis: worklist budget exceeded, returning partial results");
|
||||
false
|
||||
}
|
||||
}
|
||||
|
||||
impl TaintTransfer<'_> {
|
||||
/// Apply a Source label: insert taint for the defined variable.
|
||||
fn apply_source(&self, node: NodeIndex, info: &NodeInfo, bits: Cap, state: &mut TaintState) {
|
||||
if let Some(ref v) = info.defines
|
||||
&& let Some(sym) = self.interner.get(v)
|
||||
{
|
||||
let callee = info.callee.as_deref().unwrap_or("");
|
||||
let source_kind = crate::labels::infer_source_kind(bits, callee);
|
||||
let origin = TaintOrigin { node, source_kind };
|
||||
|
||||
match state.get(sym) {
|
||||
Some(existing) => {
|
||||
let mut new_taint = existing.clone();
|
||||
new_taint.caps |= bits;
|
||||
if new_taint.origins.len() < 4
|
||||
&& !new_taint.origins.iter().any(|o| o.node == node)
|
||||
{
|
||||
new_taint.origins.push(origin);
|
||||
}
|
||||
state.set(sym, new_taint);
|
||||
}
|
||||
None => {
|
||||
state.set(
|
||||
sym,
|
||||
VarTaint {
|
||||
caps: bits,
|
||||
origins: SmallVec::from_elem(origin, 1),
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Apply a Sanitizer label: propagate input taint, then strip sanitizer bits.
|
||||
fn apply_sanitizer(&self, info: &NodeInfo, bits: Cap, state: &mut TaintState) {
|
||||
if let Some(ref v) = info.defines
|
||||
&& let Some(sym) = self.interner.get(v)
|
||||
{
|
||||
let (combined_caps, combined_origins) = self.collect_uses_taint(info, state);
|
||||
let new_caps = combined_caps & !bits;
|
||||
if new_caps.is_empty() {
|
||||
state.remove(sym);
|
||||
} else {
|
||||
state.set(
|
||||
sym,
|
||||
VarTaint {
|
||||
caps: new_caps,
|
||||
origins: combined_origins,
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Apply a function call: resolve callee and compute return taint.
|
||||
fn apply_call(
|
||||
&self,
|
||||
node: NodeIndex,
|
||||
info: &NodeInfo,
|
||||
caller_func: &str,
|
||||
state: &mut TaintState,
|
||||
) {
|
||||
if let Some(ref callee) = info.callee
|
||||
&& let Some(resolved) = self.resolve_callee(callee, caller_func, info.call_ordinal)
|
||||
{
|
||||
let mut return_bits = Cap::empty();
|
||||
let mut return_origins: SmallVec<[TaintOrigin; 2]> = SmallVec::new();
|
||||
|
||||
// 1. Source behaviour
|
||||
if !resolved.source_caps.is_empty() {
|
||||
return_bits |= resolved.source_caps;
|
||||
let callee_str = info.callee.as_deref().unwrap_or("");
|
||||
let source_kind =
|
||||
crate::labels::infer_source_kind(resolved.source_caps, callee_str);
|
||||
let origin = TaintOrigin { node, source_kind };
|
||||
if !return_origins.iter().any(|o| o.node == node) {
|
||||
return_origins.push(origin);
|
||||
}
|
||||
}
|
||||
|
||||
// 2. Propagation
|
||||
if resolved.propagates_taint {
|
||||
let (use_caps, use_origins) = self.collect_uses_taint(info, state);
|
||||
return_bits |= use_caps;
|
||||
for orig in &use_origins {
|
||||
if return_origins.len() < 4
|
||||
&& !return_origins.iter().any(|o| o.node == orig.node)
|
||||
{
|
||||
return_origins.push(*orig);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// 3. Sanitizer behaviour (applied last so it always wins)
|
||||
return_bits &= !resolved.sanitizer_caps;
|
||||
|
||||
// Write result
|
||||
if let Some(ref v) = info.defines
|
||||
&& let Some(sym) = self.interner.get(v)
|
||||
{
|
||||
if return_bits.is_empty() {
|
||||
state.remove(sym);
|
||||
} else {
|
||||
state.set(
|
||||
sym,
|
||||
VarTaint {
|
||||
caps: return_bits,
|
||||
origins: return_origins,
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
// Unresolved call — fall through to default gen/kill
|
||||
self.apply_assignment(info, state);
|
||||
}
|
||||
|
||||
/// Default gen/kill: propagate taint through variable assignments.
|
||||
fn apply_assignment(&self, info: &NodeInfo, state: &mut TaintState) {
|
||||
if matches!(
|
||||
info.label,
|
||||
Some(DataLabel::Source(_)) | Some(DataLabel::Sanitizer(_))
|
||||
) {
|
||||
return;
|
||||
}
|
||||
|
||||
if let Some(ref d) = info.defines
|
||||
&& let Some(sym) = self.interner.get(d)
|
||||
{
|
||||
let (combined_caps, combined_origins) = self.collect_uses_taint(info, state);
|
||||
if combined_caps.is_empty() {
|
||||
state.remove(sym);
|
||||
} else {
|
||||
state.set(
|
||||
sym,
|
||||
VarTaint {
|
||||
caps: combined_caps,
|
||||
origins: combined_origins,
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Collect taint from all `uses` variables (union of caps + merge origins).
|
||||
fn collect_uses_taint(
|
||||
&self,
|
||||
info: &NodeInfo,
|
||||
state: &TaintState,
|
||||
) -> (Cap, SmallVec<[TaintOrigin; 2]>) {
|
||||
let mut combined_caps = Cap::empty();
|
||||
let mut combined_origins: SmallVec<[TaintOrigin; 2]> = SmallVec::new();
|
||||
|
||||
for u in &info.uses {
|
||||
let taint = self.lookup_var(u, state);
|
||||
if let Some(t) = taint {
|
||||
combined_caps |= t.caps;
|
||||
for orig in &t.origins {
|
||||
if combined_origins.len() < 4
|
||||
&& !combined_origins.iter().any(|o| o.node == orig.node)
|
||||
{
|
||||
combined_origins.push(*orig);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
(combined_caps, combined_origins)
|
||||
}
|
||||
|
||||
/// Look up a variable's taint, falling back to global_seed for JS two-level solve.
|
||||
fn lookup_var<'a>(&'a self, name: &str, state: &'a TaintState) -> Option<&'a VarTaint> {
|
||||
if let Some(sym) = self.interner.get(name) {
|
||||
if let Some(taint) = state.get(sym) {
|
||||
return Some(taint);
|
||||
}
|
||||
// Fall back to global seed (JS two-level solve)
|
||||
if let Some(seed) = self.global_seed {
|
||||
return seed.get(sym);
|
||||
}
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
/// Resolve sink caps from label or callee summary.
|
||||
fn resolve_sink_caps(&self, info: &NodeInfo, caller_func: &str) -> Cap {
|
||||
match info.label {
|
||||
Some(DataLabel::Sink(caps)) => caps,
|
||||
_ => info
|
||||
.callee
|
||||
.as_ref()
|
||||
.and_then(|c| self.resolve_callee(c, caller_func, info.call_ordinal))
|
||||
.filter(|r| !r.sink_caps.is_empty())
|
||||
.map(|r| r.sink_caps)
|
||||
.unwrap_or(Cap::empty()),
|
||||
}
|
||||
}
|
||||
|
||||
/// Collect tainted variables at a sink node.
|
||||
fn collect_tainted_sink_vars(
|
||||
&self,
|
||||
info: &NodeInfo,
|
||||
state: &TaintState,
|
||||
sink_caps: Cap,
|
||||
) -> Vec<(SymbolId, Cap, SmallVec<[TaintOrigin; 2]>)> {
|
||||
let mut result = Vec::new();
|
||||
for u in &info.uses {
|
||||
if let Some(taint) = self.lookup_var(u, state)
|
||||
&& (taint.caps & sink_caps) != Cap::empty()
|
||||
&& let Some(sym) = self.interner.get(u)
|
||||
{
|
||||
result.push((sym, taint.caps, taint.origins.clone()));
|
||||
}
|
||||
}
|
||||
result
|
||||
}
|
||||
|
||||
/// Resolve a callee name to its summary (local → global → interop).
|
||||
fn resolve_callee(
|
||||
&self,
|
||||
callee: &str,
|
||||
caller_func: &str,
|
||||
call_ordinal: u32,
|
||||
) -> Option<ResolvedSummary> {
|
||||
let normalized = normalize_callee_name(callee);
|
||||
|
||||
// 1) Local (same-file)
|
||||
let local_matches: Vec<_> = self
|
||||
.local_summaries
|
||||
.iter()
|
||||
.filter(|(k, _)| {
|
||||
k.name == normalized && k.lang == self.lang && k.namespace == self.namespace
|
||||
})
|
||||
.collect();
|
||||
|
||||
if local_matches.len() == 1 {
|
||||
let (_, ls) = local_matches[0];
|
||||
return Some(ResolvedSummary {
|
||||
source_caps: ls.source_caps,
|
||||
sanitizer_caps: ls.sanitizer_caps,
|
||||
sink_caps: ls.sink_caps,
|
||||
propagates_taint: ls.propagates_taint,
|
||||
});
|
||||
}
|
||||
if local_matches.len() > 1 {
|
||||
return None;
|
||||
}
|
||||
|
||||
// 2) Global same-language
|
||||
if let Some(gs) = self.global_summaries {
|
||||
match gs.resolve_callee_key(normalized, self.lang, self.namespace, None) {
|
||||
CalleeResolution::Resolved(target_key) => {
|
||||
if let Some(fs) = gs.get(&target_key) {
|
||||
return Some(ResolvedSummary {
|
||||
source_caps: fs.source_caps(),
|
||||
sanitizer_caps: fs.sanitizer_caps(),
|
||||
sink_caps: fs.sink_caps(),
|
||||
propagates_taint: fs.propagates_taint,
|
||||
});
|
||||
}
|
||||
}
|
||||
CalleeResolution::NotFound | CalleeResolution::Ambiguous(_) => {}
|
||||
}
|
||||
}
|
||||
|
||||
// 3) Interop edges
|
||||
for edge in self.interop_edges {
|
||||
if edge.from.caller_lang == self.lang
|
||||
&& edge.from.caller_namespace == self.namespace
|
||||
&& edge.from.callee_symbol == callee
|
||||
&& (edge.from.caller_func.is_empty() || edge.from.caller_func == caller_func)
|
||||
&& (edge.from.ordinal == 0 || edge.from.ordinal == call_ordinal)
|
||||
&& let Some(gs) = self.global_summaries
|
||||
&& let Some(fs) = gs.get(&edge.to)
|
||||
{
|
||||
return Some(ResolvedSummary {
|
||||
source_caps: fs.source_caps(),
|
||||
sanitizer_caps: fs.sanitizer_caps(),
|
||||
sink_caps: fs.sink_caps(),
|
||||
propagates_taint: fs.propagates_taint,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
/// Resolved summary for a callee.
|
||||
struct ResolvedSummary {
|
||||
source_caps: Cap,
|
||||
sanitizer_caps: Cap,
|
||||
sink_caps: Cap,
|
||||
propagates_taint: bool,
|
||||
}
|
||||
|
|
@ -61,6 +61,10 @@ pub struct ScannerConfig {
|
|||
/// benchmarks, etc.) at their original severity. When false (default),
|
||||
/// findings in these paths are downgraded by one severity tier.
|
||||
pub include_nonprod: bool,
|
||||
|
||||
/// Enable the state-model dataflow engine for resource lifecycle and
|
||||
/// auth-state analysis. Default: false (opt-in).
|
||||
pub enable_state_analysis: bool,
|
||||
}
|
||||
impl Default for ScannerConfig {
|
||||
fn default() -> Self {
|
||||
|
|
@ -94,6 +98,7 @@ impl Default for ScannerConfig {
|
|||
follow_symlinks: false,
|
||||
scan_hidden_files: false,
|
||||
include_nonprod: false,
|
||||
enable_state_analysis: false,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -135,6 +140,60 @@ pub struct OutputConfig {
|
|||
|
||||
/// The maximum number of results to show.
|
||||
pub max_results: Option<u32>,
|
||||
|
||||
/// Enable attack-surface ranking to sort findings by exploitability.
|
||||
pub attack_surface_ranking: bool,
|
||||
|
||||
/// Minimum attack-surface score to include in output.
|
||||
/// Findings below this threshold are dropped after ranking.
|
||||
/// `None` means no minimum (all findings shown).
|
||||
pub min_score: Option<u32>,
|
||||
|
||||
/// Minimum confidence level to include in output.
|
||||
/// `None` means no minimum (all findings shown).
|
||||
#[serde(
|
||||
default,
|
||||
skip_serializing_if = "Option::is_none",
|
||||
deserialize_with = "deserialize_confidence_opt"
|
||||
)]
|
||||
pub min_confidence: Option<crate::evidence::Confidence>,
|
||||
|
||||
/// Include Quality-category findings (excluded by default).
|
||||
#[serde(default)]
|
||||
pub include_quality: bool,
|
||||
|
||||
/// Show all findings: disables category filtering, rollups, and LOW budgets.
|
||||
#[serde(default)]
|
||||
pub show_all: bool,
|
||||
|
||||
/// Maximum total LOW findings to show.
|
||||
#[serde(default = "default_max_low")]
|
||||
pub max_low: u32,
|
||||
|
||||
/// Maximum LOW findings per file.
|
||||
#[serde(default = "default_max_low_per_file")]
|
||||
pub max_low_per_file: u32,
|
||||
|
||||
/// Maximum LOW findings per rule.
|
||||
#[serde(default = "default_max_low_per_rule")]
|
||||
pub max_low_per_rule: u32,
|
||||
|
||||
/// Number of example locations to store in rollup findings.
|
||||
#[serde(default = "default_rollup_examples")]
|
||||
pub rollup_examples: u32,
|
||||
}
|
||||
|
||||
fn default_max_low() -> u32 {
|
||||
20
|
||||
}
|
||||
fn default_max_low_per_file() -> u32 {
|
||||
1
|
||||
}
|
||||
fn default_max_low_per_rule() -> u32 {
|
||||
10
|
||||
}
|
||||
fn default_rollup_examples() -> u32 {
|
||||
5
|
||||
}
|
||||
|
||||
impl Default for OutputConfig {
|
||||
|
|
@ -143,10 +202,36 @@ impl Default for OutputConfig {
|
|||
default_format: "console".into(),
|
||||
quiet: false,
|
||||
max_results: None,
|
||||
attack_surface_ranking: true,
|
||||
min_score: None,
|
||||
min_confidence: None,
|
||||
include_quality: false,
|
||||
show_all: false,
|
||||
max_low: 20,
|
||||
max_low_per_file: 1,
|
||||
max_low_per_rule: 10,
|
||||
rollup_examples: 5,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Deserialize an optional Confidence from a TOML string.
|
||||
fn deserialize_confidence_opt<'de, D>(
|
||||
deserializer: D,
|
||||
) -> Result<Option<crate::evidence::Confidence>, D::Error>
|
||||
where
|
||||
D: serde::Deserializer<'de>,
|
||||
{
|
||||
let opt: Option<String> = Option::deserialize(deserializer)?;
|
||||
match opt {
|
||||
None => Ok(None),
|
||||
Some(s) => s
|
||||
.parse::<crate::evidence::Confidence>()
|
||||
.map(Some)
|
||||
.map_err(serde::de::Error::custom),
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize, Deserialize, Clone)]
|
||||
#[serde(default)]
|
||||
pub struct PerformanceConfig {
|
||||
|
|
@ -303,6 +388,7 @@ fn merge_configs(mut default: Config, user: Config) -> Config {
|
|||
default.scanner.follow_symlinks = user.scanner.follow_symlinks;
|
||||
default.scanner.scan_hidden_files = user.scanner.scan_hidden_files;
|
||||
default.scanner.include_nonprod = user.scanner.include_nonprod;
|
||||
default.scanner.enable_state_analysis = user.scanner.enable_state_analysis;
|
||||
|
||||
// Merge exclusion lists (default ⊔ user), then sort & dedupe
|
||||
default
|
||||
|
|
@ -328,6 +414,15 @@ fn merge_configs(mut default: Config, user: Config) -> Config {
|
|||
default.output.default_format = user.output.default_format;
|
||||
default.output.quiet = user.output.quiet;
|
||||
default.output.max_results = user.output.max_results;
|
||||
default.output.attack_surface_ranking = user.output.attack_surface_ranking;
|
||||
default.output.min_score = user.output.min_score;
|
||||
default.output.min_confidence = user.output.min_confidence;
|
||||
default.output.include_quality = user.output.include_quality;
|
||||
default.output.show_all = user.output.show_all;
|
||||
default.output.max_low = user.output.max_low;
|
||||
default.output.max_low_per_file = user.output.max_low_per_file;
|
||||
default.output.max_low_per_rule = user.output.max_low_per_rule;
|
||||
default.output.rollup_examples = user.output.rollup_examples;
|
||||
|
||||
// --- PerformanceConfig ---
|
||||
default.performance.max_depth = user.performance.max_depth;
|
||||
|
|
|
|||
|
|
@ -147,8 +147,8 @@ pub fn spawn_file_walker(root: &Path, cfg: &Config) -> (Receiver<Paths>, JoinHan
|
|||
#[test]
|
||||
fn walker_respects_excluded_extensions() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
std::fs::write(tmp.path().join("keep.rs"), "fn main(){}").unwrap();
|
||||
std::fs::write(tmp.path().join("skip.txt"), "ignored").unwrap();
|
||||
std::fs::write(tmp.path().join("keep.rs"), "fn main(){}").unwrap(); // nyx:ignore cfg-unguarded-sink
|
||||
std::fs::write(tmp.path().join("skip.txt"), "ignored").unwrap(); // nyx:ignore cfg-unguarded-sink
|
||||
|
||||
let mut cfg = Config::default();
|
||||
cfg.scanner.excluded_extensions = vec!["txt".into()];
|
||||
|
|
|
|||
|
|
@ -7,11 +7,13 @@ use std::path::Path;
|
|||
|
||||
// ── Deterministic test config ──────────────────────────────────────────────
|
||||
|
||||
#[allow(dead_code)]
|
||||
pub fn test_config(mode: AnalysisMode) -> Config {
|
||||
let mut cfg = Config::default();
|
||||
cfg.scanner.mode = mode;
|
||||
cfg.scanner.read_vcsignore = false;
|
||||
cfg.scanner.require_git_to_read_vcsignore = false;
|
||||
cfg.scanner.enable_state_analysis = true;
|
||||
cfg.performance.worker_threads = Some(1);
|
||||
cfg.performance.batch_size = 64;
|
||||
cfg.performance.channel_multiplier = 1;
|
||||
|
|
@ -21,6 +23,7 @@ pub fn test_config(mode: AnalysisMode) -> Config {
|
|||
// ── Scan helpers ───────────────────────────────────────────────────────────
|
||||
|
||||
/// Full two-pass scan of a directory (filesystem only, no index).
|
||||
#[allow(dead_code)]
|
||||
pub fn scan_fixture_dir(path: &Path, mode: AnalysisMode) -> Vec<Diag> {
|
||||
let cfg = test_config(mode);
|
||||
nyx_scanner::scan_no_index(path, &cfg).expect("scan_no_index should succeed")
|
||||
|
|
@ -28,10 +31,12 @@ pub fn scan_fixture_dir(path: &Path, mode: AnalysisMode) -> Vec<Diag> {
|
|||
|
||||
// ── Counting / assertion helpers ───────────────────────────────────────────
|
||||
|
||||
#[allow(dead_code)]
|
||||
pub fn count_by_prefix(diags: &[Diag], prefix: &str) -> usize {
|
||||
diags.iter().filter(|d| d.id.starts_with(prefix)).count()
|
||||
}
|
||||
|
||||
#[allow(dead_code)]
|
||||
pub fn assert_min_findings(diags: &[Diag], prefix: &str, min: usize) {
|
||||
let count = count_by_prefix(diags, prefix);
|
||||
assert!(
|
||||
|
|
@ -52,6 +57,7 @@ pub fn assert_min_findings(diags: &[Diag], prefix: &str, min: usize) {
|
|||
);
|
||||
}
|
||||
|
||||
#[allow(dead_code)]
|
||||
pub fn assert_no_findings(diags: &[Diag], prefix: &str) {
|
||||
let matching: Vec<_> = diags.iter().filter(|d| d.id.starts_with(prefix)).collect();
|
||||
assert!(
|
||||
|
|
@ -65,6 +71,7 @@ pub fn assert_no_findings(diags: &[Diag], prefix: &str) {
|
|||
);
|
||||
}
|
||||
|
||||
#[allow(dead_code)]
|
||||
pub fn assert_max_findings(diags: &[Diag], max_total: usize, max_high: usize) {
|
||||
let high_count = diags
|
||||
.iter()
|
||||
|
|
@ -130,6 +137,7 @@ pub struct PerformanceExpectations {
|
|||
}
|
||||
|
||||
/// Load and parse `expectations.json` from a fixture directory.
|
||||
#[allow(dead_code)]
|
||||
pub fn load_expectations(fixture_dir: &Path) -> Expectations {
|
||||
let path = fixture_dir.join("expectations.json");
|
||||
let content = std::fs::read_to_string(&path)
|
||||
|
|
@ -139,6 +147,7 @@ pub fn load_expectations(fixture_dir: &Path) -> Expectations {
|
|||
}
|
||||
|
||||
/// Validate a set of diagnostics against a fixture's expectations.json.
|
||||
#[allow(dead_code)]
|
||||
pub fn validate_expectations(diags: &[Diag], fixture_dir: &Path) {
|
||||
let exp = load_expectations(fixture_dir);
|
||||
|
||||
|
|
|
|||
12
tests/fixtures/c_utils/expectations.json
vendored
12
tests/fixtures/c_utils/expectations.json
vendored
|
|
@ -1,12 +1,12 @@
|
|||
{
|
||||
"required_findings": [
|
||||
{ "id_prefix": "taint-unsanitised-flow", "min_count": 4 },
|
||||
{ "id_prefix": "strcpy_call", "min_count": 1 },
|
||||
{ "id_prefix": "strcat_call", "min_count": 1 },
|
||||
{ "id_prefix": "sprintf_call", "min_count": 4 },
|
||||
{ "id_prefix": "gets_call", "min_count": 1 },
|
||||
{ "id_prefix": "scanf_with_percent_s", "min_count": 1 },
|
||||
{ "id_prefix": "system_call", "min_count": 3 },
|
||||
{ "id_prefix": "c.memory.strcpy", "min_count": 1 },
|
||||
{ "id_prefix": "c.memory.strcat", "min_count": 1 },
|
||||
{ "id_prefix": "c.memory.sprintf", "min_count": 4 },
|
||||
{ "id_prefix": "c.memory.gets", "min_count": 1 },
|
||||
{ "id_prefix": "c.memory.scanf_percent_s", "min_count": 1 },
|
||||
{ "id_prefix": "c.cmdi.system", "min_count": 3 },
|
||||
{ "id_prefix": "cfg-unguarded-sink", "min_count": 5 }
|
||||
],
|
||||
"forbidden_findings": [],
|
||||
|
|
|
|||
8
tests/fixtures/express_app/expectations.json
vendored
8
tests/fixtures/express_app/expectations.json
vendored
|
|
@ -1,10 +1,10 @@
|
|||
{
|
||||
"required_findings": [
|
||||
{ "id_prefix": "taint-unsanitised-flow", "min_count": 6 },
|
||||
{ "id_prefix": "eval_call", "min_count": 1 },
|
||||
{ "id_prefix": "document_write", "min_count": 1 },
|
||||
{ "id_prefix": "settimeout_string", "min_count": 1 },
|
||||
{ "id_prefix": "cookie_assignment", "min_count": 1 }
|
||||
{ "id_prefix": "js.code_exec.eval", "min_count": 1 },
|
||||
{ "id_prefix": "js.xss.document_write", "min_count": 1 },
|
||||
{ "id_prefix": "js.code_exec.settimeout_string", "min_count": 1 },
|
||||
{ "id_prefix": "js.xss.cookie_write", "min_count": 1 }
|
||||
],
|
||||
"forbidden_findings": [],
|
||||
"noise_budget": {
|
||||
|
|
|
|||
8
tests/fixtures/flask_app/expectations.json
vendored
8
tests/fixtures/flask_app/expectations.json
vendored
|
|
@ -1,13 +1,13 @@
|
|||
{
|
||||
"required_findings": [
|
||||
{ "id_prefix": "taint-unsanitised-flow", "min_count": 8 },
|
||||
{ "id_prefix": "eval_call", "min_count": 1 },
|
||||
{ "id_prefix": "exec_call", "min_count": 2 },
|
||||
{ "id_prefix": "cfg-auth-gap", "min_count": 5 }
|
||||
{ "id_prefix": "py.code_exec.eval", "min_count": 1 },
|
||||
{ "id_prefix": "py.code_exec.exec", "min_count": 2 },
|
||||
{ "id_prefix": "state-unauthed-access", "min_count": 5 }
|
||||
],
|
||||
"forbidden_findings": [],
|
||||
"noise_budget": {
|
||||
"max_total_findings": 35,
|
||||
"max_total_findings": 50,
|
||||
"max_high_findings": 25
|
||||
},
|
||||
"performance_expectations": {
|
||||
|
|
|
|||
2
tests/fixtures/go_server/expectations.json
vendored
2
tests/fixtures/go_server/expectations.json
vendored
|
|
@ -1,7 +1,7 @@
|
|||
{
|
||||
"required_findings": [
|
||||
{ "id_prefix": "taint-unsanitised-flow", "min_count": 4 },
|
||||
{ "id_prefix": "exec_command", "min_count": 3 },
|
||||
{ "id_prefix": "go.cmdi.exec_command", "min_count": 3 },
|
||||
{ "id_prefix": "cfg-unguarded-sink", "min_count": 1 }
|
||||
],
|
||||
"forbidden_findings": [],
|
||||
|
|
|
|||
10
tests/fixtures/java_service/expectations.json
vendored
10
tests/fixtures/java_service/expectations.json
vendored
|
|
@ -1,14 +1,14 @@
|
|||
{
|
||||
"required_findings": [
|
||||
{ "id_prefix": "taint-unsanitised-flow", "min_count": 2 },
|
||||
{ "id_prefix": "runtime_exec", "min_count": 2 },
|
||||
{ "id_prefix": "class_for_name", "min_count": 1 },
|
||||
{ "id_prefix": "cfg-unguarded-sink", "min_count": 2 }
|
||||
{ "id_prefix": "java.cmdi.runtime_exec", "min_count": 2 },
|
||||
{ "id_prefix": "java.reflection.class_forname", "min_count": 1 },
|
||||
{ "id_prefix": "cfg-unguarded-sink", "min_count": 1 }
|
||||
],
|
||||
"forbidden_findings": [],
|
||||
"noise_budget": {
|
||||
"max_total_findings": 15,
|
||||
"max_high_findings": 8
|
||||
"max_total_findings": 20,
|
||||
"max_high_findings": 12
|
||||
},
|
||||
"performance_expectations": {
|
||||
"max_ms_no_index": 1000,
|
||||
|
|
|
|||
|
|
@ -1,10 +1,7 @@
|
|||
{
|
||||
"required_findings": [
|
||||
{ "id_prefix": "taint-unsanitised-flow", "min_count": 10 },
|
||||
{ "id_prefix": "eval_call", "min_count": 2 },
|
||||
{ "id_prefix": "unwrap_call", "min_count": 3 },
|
||||
{ "id_prefix": "expect_call", "min_count": 1 },
|
||||
{ "id_prefix": "panic_macro", "min_count": 1 },
|
||||
{ "id_prefix": "js.code_exec.eval", "min_count": 1 },
|
||||
{ "id_prefix": "cfg-unguarded-sink", "min_count": 2 }
|
||||
],
|
||||
"forbidden_findings": [],
|
||||
|
|
|
|||
24
tests/fixtures/patterns/c/negative.c
vendored
Normal file
24
tests/fixtures/patterns/c/negative.c
vendored
Normal file
|
|
@ -0,0 +1,24 @@
|
|||
/* Negative fixture: none of these should trigger security patterns. */
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
void safe_snprintf(const char *name) {
|
||||
char buf[128];
|
||||
snprintf(buf, sizeof(buf), "Hello %s", name);
|
||||
}
|
||||
|
||||
void safe_strncpy(const char *src) {
|
||||
char dst[32];
|
||||
strncpy(dst, src, sizeof(dst) - 1);
|
||||
dst[sizeof(dst) - 1] = '\0';
|
||||
}
|
||||
|
||||
void safe_fgets() {
|
||||
char buf[64];
|
||||
fgets(buf, sizeof(buf), stdin);
|
||||
}
|
||||
|
||||
void safe_printf_literal() {
|
||||
printf("Hello %s\n", "world");
|
||||
}
|
||||
50
tests/fixtures/patterns/c/positive.c
vendored
Normal file
50
tests/fixtures/patterns/c/positive.c
vendored
Normal file
|
|
@ -0,0 +1,50 @@
|
|||
/* Positive fixture: each snippet should trigger the named pattern. */
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
/* c.memory.gets */
|
||||
void trigger_gets() {
|
||||
char buf[64];
|
||||
gets(buf);
|
||||
}
|
||||
|
||||
/* c.memory.strcpy */
|
||||
void trigger_strcpy(char *src) {
|
||||
char dst[32];
|
||||
strcpy(dst, src);
|
||||
}
|
||||
|
||||
/* c.memory.strcat */
|
||||
void trigger_strcat(char *extra) {
|
||||
char buf[64] = "prefix";
|
||||
strcat(buf, extra);
|
||||
}
|
||||
|
||||
/* c.memory.sprintf */
|
||||
void trigger_sprintf(const char *name) {
|
||||
char buf[128];
|
||||
sprintf(buf, "Hello %s", name);
|
||||
}
|
||||
|
||||
/* c.memory.scanf_percent_s */
|
||||
void trigger_scanf() {
|
||||
char name[32];
|
||||
scanf("%s", name);
|
||||
}
|
||||
|
||||
/* c.cmdi.system */
|
||||
void trigger_system(const char *cmd) {
|
||||
system(cmd);
|
||||
}
|
||||
|
||||
/* c.cmdi.popen */
|
||||
void trigger_popen(const char *cmd) {
|
||||
FILE *f = popen(cmd, "r");
|
||||
pclose(f);
|
||||
}
|
||||
|
||||
/* c.memory.printf_no_fmt */
|
||||
void trigger_printf_no_fmt(char *user_data) {
|
||||
printf(user_data);
|
||||
}
|
||||
24
tests/fixtures/patterns/cpp/negative.cpp
vendored
Normal file
24
tests/fixtures/patterns/cpp/negative.cpp
vendored
Normal file
|
|
@ -0,0 +1,24 @@
|
|||
// Negative fixture: none of these should trigger security patterns.
|
||||
#include <cstdio>
|
||||
#include <cstring>
|
||||
#include <string>
|
||||
|
||||
void safe_string_ops() {
|
||||
std::string s = "hello";
|
||||
std::string copy = s;
|
||||
auto len = s.length();
|
||||
}
|
||||
|
||||
void safe_cast() {
|
||||
double d = 3.14;
|
||||
int i = static_cast<int>(d);
|
||||
}
|
||||
|
||||
void safe_snprintf(const char *name) {
|
||||
char buf[128];
|
||||
snprintf(buf, sizeof(buf), "Hello %s", name);
|
||||
}
|
||||
|
||||
void safe_printf_literal() {
|
||||
printf("Hello %s\n", "world");
|
||||
}
|
||||
49
tests/fixtures/patterns/cpp/positive.cpp
vendored
Normal file
49
tests/fixtures/patterns/cpp/positive.cpp
vendored
Normal file
|
|
@ -0,0 +1,49 @@
|
|||
// Positive fixture: each snippet should trigger the named pattern.
|
||||
#include <cstdlib>
|
||||
#include <cstring>
|
||||
#include <cstdio>
|
||||
|
||||
// cpp.memory.gets
|
||||
void trigger_gets() {
|
||||
char buf[64];
|
||||
gets(buf);
|
||||
}
|
||||
|
||||
// cpp.memory.strcpy
|
||||
void trigger_strcpy(const char *src) {
|
||||
char dst[32];
|
||||
strcpy(dst, src);
|
||||
}
|
||||
|
||||
// cpp.memory.strcat
|
||||
void trigger_strcat(const char *extra) {
|
||||
char buf[64] = "prefix";
|
||||
strcat(buf, extra);
|
||||
}
|
||||
|
||||
// cpp.memory.sprintf
|
||||
void trigger_sprintf(const char *name) {
|
||||
char buf[128];
|
||||
sprintf(buf, "Hello %s", name);
|
||||
}
|
||||
|
||||
// cpp.cmdi.system
|
||||
void trigger_system(const char *cmd) {
|
||||
system(cmd);
|
||||
}
|
||||
|
||||
// cpp.memory.reinterpret_cast
|
||||
void trigger_reinterpret_cast() {
|
||||
int x = 42;
|
||||
float *fp = reinterpret_cast<float*>(&x);
|
||||
}
|
||||
|
||||
// cpp.memory.const_cast
|
||||
void trigger_const_cast(const int *p) {
|
||||
int *q = const_cast<int*>(p);
|
||||
}
|
||||
|
||||
// cpp.memory.printf_no_fmt
|
||||
void trigger_printf_no_fmt(char *user_data) {
|
||||
printf(user_data);
|
||||
}
|
||||
23
tests/fixtures/patterns/go/negative.go
vendored
Normal file
23
tests/fixtures/patterns/go/negative.go
vendored
Normal file
|
|
@ -0,0 +1,23 @@
|
|||
package main
|
||||
|
||||
import (
|
||||
"crypto/sha256"
|
||||
"database/sql"
|
||||
)
|
||||
|
||||
func safeHash(data []byte) {
|
||||
sha256.Sum256(data)
|
||||
}
|
||||
|
||||
func safeParamQuery(db *sql.DB, user string) {
|
||||
db.Query("SELECT * FROM users WHERE name = $1", user)
|
||||
}
|
||||
|
||||
func safeLiteralQuery(db *sql.DB) {
|
||||
db.Query("SELECT COUNT(*) FROM users")
|
||||
}
|
||||
|
||||
func safeStringOps() {
|
||||
x := "hello"
|
||||
_ = len(x)
|
||||
}
|
||||
55
tests/fixtures/patterns/go/positive.go
vendored
Normal file
55
tests/fixtures/patterns/go/positive.go
vendored
Normal file
|
|
@ -0,0 +1,55 @@
|
|||
package main
|
||||
|
||||
import (
|
||||
"crypto/md5"
|
||||
"crypto/sha1"
|
||||
"database/sql"
|
||||
"encoding/gob"
|
||||
"os"
|
||||
"os/exec"
|
||||
"unsafe"
|
||||
)
|
||||
|
||||
// go.cmdi.exec_command
|
||||
func triggerExecCommand(cmd string) {
|
||||
exec.Command("bash", "-c", cmd)
|
||||
}
|
||||
|
||||
// go.memory.unsafe_pointer
|
||||
func triggerUnsafePointer() {
|
||||
x := 42
|
||||
p := unsafe.Pointer(&x)
|
||||
_ = p
|
||||
}
|
||||
|
||||
// go.transport.insecure_skip_verify
|
||||
func triggerInsecureSkipVerify() {
|
||||
_ = struct{ InsecureSkipVerify bool }{InsecureSkipVerify: true}
|
||||
}
|
||||
|
||||
// go.crypto.md5
|
||||
func triggerMD5(data []byte) {
|
||||
md5.Sum(data)
|
||||
}
|
||||
|
||||
// go.crypto.sha1
|
||||
func triggerSHA1(data []byte) {
|
||||
sha1.Sum(data)
|
||||
}
|
||||
|
||||
// go.sqli.query_concat
|
||||
func triggerSQLConcat(db *sql.DB, user string) {
|
||||
db.Query("SELECT * FROM users WHERE name = '" + user + "'")
|
||||
}
|
||||
|
||||
// go.secrets.hardcoded_key
|
||||
func triggerHardcodedSecret() {
|
||||
password := "super_secret_password_12345"
|
||||
_ = password
|
||||
}
|
||||
|
||||
// go.deser.gob_decode
|
||||
func triggerGobDecode(f *os.File) {
|
||||
dec := gob.NewDecoder(f)
|
||||
_ = dec
|
||||
}
|
||||
22
tests/fixtures/patterns/java/negative.java
vendored
Normal file
22
tests/fixtures/patterns/java/negative.java
vendored
Normal file
|
|
@ -0,0 +1,22 @@
|
|||
import java.sql.*;
|
||||
import java.security.SecureRandom;
|
||||
|
||||
class Negative {
|
||||
// Safe: parameterized query
|
||||
void safeQuery(Connection conn, String user) throws Exception {
|
||||
PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
|
||||
ps.setString(1, user);
|
||||
ResultSet rs = ps.executeQuery();
|
||||
}
|
||||
|
||||
// Safe: SecureRandom instead of Random
|
||||
void safeRandom() {
|
||||
SecureRandom sr = new SecureRandom();
|
||||
int token = sr.nextInt();
|
||||
}
|
||||
|
||||
// Safe: no concatenation in SQL
|
||||
void safeLiteralQuery(Statement stmt) throws Exception {
|
||||
stmt.executeQuery("SELECT COUNT(*) FROM users");
|
||||
}
|
||||
}
|
||||
48
tests/fixtures/patterns/java/positive.java
vendored
Normal file
48
tests/fixtures/patterns/java/positive.java
vendored
Normal file
|
|
@ -0,0 +1,48 @@
|
|||
import java.io.*;
|
||||
import java.util.Random;
|
||||
import java.security.MessageDigest;
|
||||
|
||||
class Positive {
|
||||
// java.deser.readobject
|
||||
void triggerDeser(InputStream is) throws Exception {
|
||||
ObjectInputStream ois = new ObjectInputStream(is);
|
||||
Object obj = ois.readObject();
|
||||
}
|
||||
|
||||
// java.cmdi.runtime_exec
|
||||
void triggerRuntimeExec(String cmd) throws Exception {
|
||||
Runtime.getRuntime().exec(cmd);
|
||||
}
|
||||
|
||||
// java.reflection.class_forname
|
||||
void triggerClassForName(String name) throws Exception {
|
||||
Class.forName(name);
|
||||
}
|
||||
|
||||
// java.reflection.method_invoke
|
||||
void triggerMethodInvoke(Object target) throws Exception {
|
||||
java.lang.reflect.Method m = target.getClass().getMethod("run");
|
||||
m.invoke(target);
|
||||
}
|
||||
|
||||
// java.sqli.execute_concat
|
||||
void triggerSqlConcat(java.sql.Statement stmt, String user) throws Exception {
|
||||
stmt.executeQuery("SELECT * FROM users WHERE name = '" + user + "'");
|
||||
}
|
||||
|
||||
// java.crypto.insecure_random
|
||||
void triggerInsecureRandom() {
|
||||
Random r = new Random();
|
||||
int token = r.nextInt();
|
||||
}
|
||||
|
||||
// java.crypto.weak_digest
|
||||
void triggerWeakDigest() throws Exception {
|
||||
MessageDigest md = MessageDigest.getInstance("MD5");
|
||||
}
|
||||
|
||||
// java.xss.getwriter_print
|
||||
void triggerGetWriterPrint(javax.servlet.http.HttpServletResponse resp) throws Exception {
|
||||
resp.getWriter().println("<html>" + "data" + "</html>");
|
||||
}
|
||||
}
|
||||
25
tests/fixtures/patterns/javascript/negative.js
vendored
Normal file
25
tests/fixtures/patterns/javascript/negative.js
vendored
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
// Negative fixture: none of these should trigger security patterns.
|
||||
|
||||
function safeStringOps() {
|
||||
var x = "hello";
|
||||
var y = x.toUpperCase();
|
||||
var z = JSON.stringify({ key: "value" });
|
||||
}
|
||||
|
||||
function safeTimeout(fn) {
|
||||
// Function reference, not string
|
||||
setTimeout(fn, 1000);
|
||||
}
|
||||
|
||||
function safeDomManipulation(el) {
|
||||
el.textContent = "safe text";
|
||||
el.setAttribute("class", "active");
|
||||
}
|
||||
|
||||
function safeRandomness() {
|
||||
var buf = crypto.getRandomValues(new Uint8Array(16));
|
||||
}
|
||||
|
||||
function safeCopy(src) {
|
||||
var copy = Object.assign({}, src);
|
||||
}
|
||||
51
tests/fixtures/patterns/javascript/positive.js
vendored
Normal file
51
tests/fixtures/patterns/javascript/positive.js
vendored
Normal file
|
|
@ -0,0 +1,51 @@
|
|||
// Positive fixture: each snippet should trigger the named pattern.
|
||||
|
||||
// js.code_exec.eval
|
||||
function triggerEval(code) {
|
||||
eval(code);
|
||||
}
|
||||
|
||||
// js.code_exec.new_function
|
||||
function triggerNewFunction(body) {
|
||||
var fn = new Function(body);
|
||||
}
|
||||
|
||||
// js.code_exec.settimeout_string
|
||||
function triggerSetTimeout() {
|
||||
setTimeout("alert(1)", 1000);
|
||||
}
|
||||
|
||||
// js.xss.document_write
|
||||
function triggerDocumentWrite(data) {
|
||||
document.write(data);
|
||||
}
|
||||
|
||||
// js.xss.outer_html
|
||||
function triggerOuterHtml(el, data) {
|
||||
el.outerHTML = data;
|
||||
}
|
||||
|
||||
// js.xss.insert_adjacent_html
|
||||
function triggerInsertAdjacentHtml(el, data) {
|
||||
el.insertAdjacentHTML("beforeend", data);
|
||||
}
|
||||
|
||||
// js.prototype.proto_assignment
|
||||
function triggerProtoAssignment(obj) {
|
||||
obj.__proto__ = { malicious: true };
|
||||
}
|
||||
|
||||
// js.xss.location_assign
|
||||
function triggerLocationAssign(url) {
|
||||
window.location = url;
|
||||
}
|
||||
|
||||
// js.xss.cookie_write
|
||||
function triggerCookieWrite(sid) {
|
||||
document.cookie = "session=" + sid;
|
||||
}
|
||||
|
||||
// js.crypto.math_random
|
||||
function triggerMathRandom() {
|
||||
var token = Math.random();
|
||||
}
|
||||
Some files were not shown because too many files have changed in this diff Show more
Loading…
Add table
Add a link
Reference in a new issue