mirror of
https://github.com/elicpeter/nyx.git
synced 2026-07-03 20:41:00 +02:00
Release/0.5.0 (#35)
* feat: Introduce function-scoped variable interning for state analysis with new tests and fixtures * feat: Add Phase 26 symbolic execution enhancements with bitwise operator support, abstract interpretation refinements, and new taint analysis tests * feat: Refine state analysis to handle factory-pattern resource returns with mixed-path tests and leak detection enhancements * feat: Add Phase 27 debug views with symbolic execution, abstract interpretation, SSA, and call graph viewers; integrate with debug layout and styles * feat: Add Phase 31 type-qualified symbolic resolution with receiver-based callee disambiguation and testing * feat: Extend symbolic execution with state iteration, enhanced debug views, and debounced input handling * feat: Add Phase 13 resource and auth pattern extensions with new tests and fixtures * feat: Introduce CFG debug graph renderer with compact mode, toolbar, and DAG layout integration * feat: Add Phase 28 encoding and decoding transform modeling with structural symex enhancements and new taint analysis tests * feat: Extend abstract interpretation with type facts and constant value tracking in debug views and server logic * feat: Add linear path handling and witness extraction to symbolic execution with Phase 28 transform mismatch detection * feat: Refine Go auth and sanitizer handling with enhanced rules, state updates, and benchmark improvements * feat: Enable auth-state analysis by default and update relevant tests in benchmark config * test: Update state_tests to reflect default enablement of auth-state analysis and add auth suppression test * docs: update CHANGELOG.md * feat: Introduce per-index taint tracking in `HeapState` with `HeapSlot`, overflow handling, and revised SSA transfers * feat: Introduce C/C++ language labels and refine heap state tracking in SSA transfers * feat: Implement per-index array slot tracking in symbolic heap with overflow collapse * feat: Add implicit definition handling for uninitialized declarations 
in SSA value allocation * feat: Refactor function parameters and constants for improved clarity and maintainability * refactor: Reorder module imports and improve formatting for consistency * refactor: Fix formatting erorrs * refactor: Fix clippy warnings * refactor: Fix fmt warnings (again) * chore: Update dependencies and improve feature configuration * Add comprehensive tests for undertested modules (#36) (COPILOT) * Add comprehensive tests for undertested modules Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083 * Add comprehensive tests for ext, project, walk, and errors modules Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083 --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * chore: Update dependencies and improve feature configuration * fix: formatting errors in new tests * chore: Update license list in about.toml * chore: made functions input inline * chore: updated cfg graph to take up the full page * chore: add Prettier configuration and update code formatting * Add frontend test suite with Vitest (111 tests) (#37) * Add Vitest test suite for frontend - 111 tests across utils, components, hooks, and graph utilities Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/7cf0dba2-ecff-4740-ba4d-92717e74a0b7 * ci: add frontend test step to CI workflow Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/5bc0ac9f-0a32-4d03-9cb7-7a15aea53fca --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter 
<54954007+elicpeter@users.noreply.github.com> * chore: simplify array initialization in test files for consistency * ran typecheck * feat: add AnalysisWorkspace component and integrate it into CfgViewerPage * feat: update routing in AppLayout and improve empty state message in ExplorerPage * feat: enhance scan progress tracking with additional metrics and stages * feat: update license information and add license check script * feat: implement cross-file symbolic execution with callee body persistence * feat: replace dagre graphs with Graphology + ELK + Sigma for more advanced call stack and cfg rendering * feat: ensure CFG function view is scoped to the selected function, preventing bleed into sibling functions * feat: enhance resource tracking with proxy method summaries and improve finding extraction * feat: add terminal function exit detection for accurate resource leak analysis * feat: add warnings for loops and functions without bodies to improve error recovery * feat: update lambda expression handling to ensure proper function classification and control flow * feat: remove bounded formatting/string ops and add JSON.parse sanitizer for improved data handling * feat: add inline return taint analysis and regression tests for improved security checks * feat: add engine version management and migration handling for database schema updates * feat: enhance first_call_ident to skip nested function bodies and add regression tests * feat: enhance callee name resolution with two-segment normalization and disambiguation * feat: add cross-file context flags and debug assertions for taint analysis * feat: refactor taint analysis structure to unify context handling and improve clarity * feat: enhance dead code elimination to preserve Sink, Source, and Sanitizer labels with new tests * docs: updated CHANGELOG.md * fmt: formatting fixes * fix: fixed frontend formatting and lint warnings * fix: optimized ci * fix: optimized ci * Add comprehensive multi-file test coverage to 
Nyx (#38) * Initial checklist for multi-file test suite expansion Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * Add 12 new multi-file test fixtures with TP/TN/near-miss coverage Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * deleted root repo * rebuilt to test for regressions --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Co-authored-by: elipeter <elicpeter@gmail.com> * feat: enhance import alias resolution and taint tracking * feat: implement security hardening with CSRF protection and path validation * feat: add support for import alias bindings in Python, PHP, and Rust * feat: enhance CFG analysis modes and improve code readability * feat: add detection for parameterized SQL queries to enhance security * feat: add safe internal redirect handling and enhance session destroy validation * feat: implement security improvements by addressing vulnerabilities in execAsync, session management, and file downloads * feat: enhance taint detection by adding support for inline source member expressions in call arguments * feat: implement pre-emission of Source nodes for inline source member expressions in call arguments * feat: add support for Throw statement in control flow and error handling * feat: add debug and echo endpoints with potential information leakage * feat: implement internal redirect suppression and enhance taint detection * feat: implement module alias tracking for dynamic dispatch in JS/TS * feat: add authorization analysis module with Express support * feat: add authorization analysis module with Express support * feat: add tests for admin guard requirements and clean checks in authorization 
analysis * feat: integrate Koa and Fastify frameworks into authorization analysis * feat: add Flask and Django support to authorization analysis module * feat: add support for Rails and Sinatra frameworks in authorization analysis * feat: add support for Axum, ActixWeb, and Rocket frameworks in authorization analysis * feat: add support for ActixWeb, Axum, and Rocket frameworks in authorization analysis * feat: add support for Rails and Sinatra in authorization analysis * chore: add .DS_Store to .gitignore * refactor: simplify conditional checks and improve readability in multiple files * refactor: update usage of Option methods for improved clarity and consistency * refactor: improve code readability by simplifying conditional checks and formatting * refactor: improve code formatting and readability by simplifying conditional checks * refactor: simplify conditional checks and improve readability in multiple files * refactor: simplify conditional checks in axum.rs for improved readability * feat: add CodeQL analysis configuration for enhanced security scanning * test: add comprehensive tests for `src/output.rs` SARIF builder (#39) * chore: start test coverage improvement work Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * test: add comprehensive tests for src/output.rs SARIF builder Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * refactor: improve code formatting and readability in output.rs --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Co-authored-by: elipeter <elicpeter@gmail.com> * refactor: improve code formatting and readability in output.rs * Potential fix for code scanning alert no. 
210: Uncontrolled data used in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * Potential fix for code scanning alert no. 211: Uncontrolled data used in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * refactor: enhance triage file path handling with improved error management and validation * refactor: updated func summaries for richer detail * refactor: update SSA summary extraction to use canonical FuncKey for distinct entries * refactor: enhance callee metadata structure to support arity, receiver, and qualifier for better overload resolution * refactor: add support for keyword arguments in function calls and enhance receiver extraction for method-style calls * refactor: implement new Flask routes for safe and unsafe shell command execution * refactor: separate receiver handling in SSA operations and enhance taint propagation * refactor: improve arity handling by using arg_uses for positional argument count and enhance witness scoring for tainted arguments * refactor: implement auth decorator extraction and classification for multiple languages * refactor: enhance Rust module path resolution and use map handling for cross-file disambiguation * refactor: introduce CalleeQuery struct for structured callee resolution and enhance resolver logic * refactor: implement same-file identity collision handling for `runTask` to ensure correct resolver behavior * refactor: standardize default struct initialization across multiple files * feat: add scripts for formatting checks and auto-fixes with test summaries * refactor: simplify character splitting and enhance namespace qualifier handling * refactor: improve documentation clarity and enhance code readability in resolver logic * refactor: replace default struct initialization with explicit field assignments for clarity * feat: enhance anonymous function naming by 
deriving context-based bindings * refactor: streamline match expressions for improved readability and performance * refactor: streamline match expressions for improved readability and performance * refactor: replace loop with while let for improved clarity and performance * feat: add SSA constant propagation support to analysis context for improved accuracy * feat: add SSA constant propagation support to analysis context for improved accuracy * feat: implement shell metacharacter validation and bounded-length checks in Rust analysis * feat: add static map analysis for command injection suppression and type safety * refactor: simplify match statements and reduce line breaks for improved readability * feat(summary): phase 1/5 SinkSite data model for primary sink-location attribution Introduce SinkSite (file_rel, line, col, snippet, cap) carrying the primary sink source-location through function summaries. Swap SsaFuncSummary.param_to_sink and FuncSummary.param_to_sink from a coarse Cap map to a deduped SmallVec<[SinkSite; 1]> per parameter, with a backward-compatible cap_sites() helper and serde defaults so pre-phase-1 on-disk rows continue to deserialise cleanly. Extraction: SinkSiteLocator bundles the tree/bytes/file_rel needed by extract_ssa_func_summary; ParsedFile::extract_ssa_artifacts wires the locator in for the persisted pass-1 path, while pass-2 intra-file transient summaries fall back to cap-only sites (behavior unchanged). Merge: GlobalSummaries::insert now unions sink sites with (file_rel, line, col, cap) dedup via shared union_param_sink_sites helper. Database: JSON-serialised summary columns carry the new shape automatically; no schema change needed. Phase 2 will consume SinkSite in build_taint_diag() to overwrite the caller-site Finding.line with the callee's sink line when resolved via summary. Phase 1 keeps behavior unchanged: scanning tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs still produces the same (wrong) line 10 finding. 
Adds round-trip tests covering SinkSite solo, SsaFuncSummary with sink sites, legacy-JSON default handling for both summary types, and merge dedup. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(taint): phase 2/5 thread SinkSite into SsaTaintEvent and Finding Plumb Phase 1's SinkSite through the event pipeline into Findings, no output change yet. SsaTaintEvent gains `primary_sink_site: Option<SinkSite>`; when the main or callback sink-emission path has non-empty `param_to_sink_sites`, filter to sites whose `(line != 0) && (cap ∩ sink_caps != ∅)` and emit one event per distinct site — the multi-primary collapse keeps each downstream Finding single-primary. Resolution: ResolvedSummary and SinkInfo gain mirror `param_to_sink_sites` fields, populated from `SsaFuncSummary.param_to_sink` (SSA + callback paths) and `FuncSummary.param_to_sink` (global paths). Label, local-summary, and interop resolution paths leave the field empty — they only ever had cap-level info to begin with. Finding: new `primary_location: Option<SinkLocation>` with `file_rel/line/col`. `ssa_events_to_findings` maps `event.primary_sink_site` → `Finding.primary_location`, filtering cap-only sites (`line == 0`) to `None` so the (0,0) sentinel never leaks to formatters. Dedup key extended with the primary location so multi-site events aren't collapsed back together. Invariants (debug_assert!): * every SinkSite reaching emission has `line != 0 && cap ∩ sink_caps != ∅` — enforced by the pick_primary_sink_sites* filters; * every populated Finding.primary_location has `line != 0` AND non-empty `file_rel` — the cap-only → None translation upstream guarantees this. Deliberately independent of `uses_summary`: that flag tracks whether the *taint chain* used a summary, whereas primary attribution requires only that the *sink* itself was summary-resolved. 
A local source reaching a cross-file sink produces `uses_summary=false` alongside a populated primary_location — documented on Finding.primary_location, covered by `cross_file_sink_finding_carries_primary_location`. build_taint_diag, SARIF/JSON/explanation formatters, and the benchmark scorer remain untouched: finding.line still comes from `cfg_graph[finding.sink]`, so cmdi_indirect.rs still reports line 10 and the benchmark's rs-cmdi-003 row still shows FN in the LOC column. Tests: `cross_file_sink_finding_carries_primary_location` (proves plumbing via a synthetic FuncSummary carrying a SinkSite at 42:5) and `cross_file_sink_cap_only_site_leaves_primary_location_none` (regression guard against cap-only sites surfacing). All 1566 lib tests + integration tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(output): phase 3/5 consume primary sink location in diag + SARIF When a finding's primary_location (populated in phase 2 from a callee summary's SinkSite) names the dangerous instruction inside a callee body, attribute the diagnostic line to that location instead of the caller's call site. The call site is demoted to a Call step in flow_steps, and a synthetic Sink step at the primary location is appended so analysts still see the full trace. Changes: - Add scan_root parameter to build_taint_diag so file_rel can be resolved back to an absolute path via a shared resolve_file_rel helper. Empty file_rel (single-file scans where namespace == "") resolves to the file under analysis. - Extend SinkLocation with snippet, carried from the upstream SinkSite so the formatter needs no second file read. - Relax the ssa_events_to_findings debug_assert to allow empty file_rel, which is valid when scan root equals the file itself. - SARIF: emit data-flow as codeFlows[0].threadFlows[0].locations[]; locations[0] already reflects the primary sink position via the updated diag line/col. 
Acceptance: scan on tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs now reports line 5 (Command::new) as the primary sink, with the call site at line 10 visible in flow_steps. Two expect.json fixtures updated (must_match line_range widened): - javascript/taint/context_sensitive_call: 12-14 -> 7-14 (line 8 is the real sink inside run()). - rust/cfg/closure_async: 10-10 -> 10-11 (line 11 is Command::new inside the closure). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(bench): phase 4/5 validate primary sink attribution across corpus Extend the benchmark scorer and ground truth to lock in phase 3's primary-location behavior, and add fixtures that exercise the new capability end-to-end. Scorer (tests/benchmark_test.rs): - Add optional `expected_call_site_lines: Option<Vec<[usize; 2]>>` on Case. When present, score_location_level additionally requires at least one flow_step in the finding's evidence trace to fall within ±2 of the call-site range. When absent, the check is skipped — fully forward-compatible with existing fixtures. - Retain ±2 tolerance on expected_sink_lines (compared against the now-primary Diag.line post-phase-3). Ground truth edits: - rs-cmdi-cross-001: expected_sink_lines [8,8] -> [9,9]. Line 8 is the transform::wrap call site (a cross-file propagator, not a sink); line 9 is Command::new, the real sink. The ±2 tolerance happened to mask this stale attribution but it was semantically wrong — phase 4 is the right time to correct it. Also adds expected_call_site_lines [8,8] so the new field is exercised on an existing cross-file case. - rs-cmdi-003: adds expected_call_site_lines [10,10] (run_cmd call). This fixture's sink (Command::new inside run_cmd at line 5) was the motivating case for phases 1-3; adding the call-site assertion guards against regression to caller-line attribution. 
New fixtures: - rust/cmdi/cmdi_indirect_multisink.rs (rs-cmdi-009): helper run_both takes two tainted params and invokes two Command sinks on consecutive lines. Locks in that primary line lands inside the helper (lines 5-6), not at the caller (line 12). Notes document that SinkSite is currently one-per-callee so both findings today collapse onto the first sink; expected_sink_lines=[5,6] and expected_call_site_lines=[12,12] stay valid either way. - python/cmdi/cross_indirect_sink/{app.py,helper.py} (py-cmdi-cross- 004): sink os.system lives in helper.py (cross-file), caller in app.py reads env source and calls run_cmd. Verifies phase 3's cross-file primary attribution: Diag.path = helper.py, Diag.line = 5, with app.py:7 recorded in flow_steps as a Call step. Acceptance: - `cargo test --test benchmark_test -- --ignored --nocapture` passes. - rs-cmdi-003 is TP/TP/TP (the target flip FN->TP at LOC). All pre-existing TP/TP/TP fixtures remain TP/TP/TP; 2 new fixtures are TP/TP/TP. - Aggregate rule-level: TP=158 FP=10 FN=1 TN=97, P=0.940 R=0.994 F1=0.966 on the 266-case corpus (was TP=156 FP=10 FN=1 TN=97 on 264 pre-phase-4, delta is the +2 new cases both resolving TP). - Full `cargo test` green (1566 lib tests + all integration tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(taint): phase 5/5 lock Finding.primary_location contract via regression test Add a regression test in src/taint/ssa_transfer.rs that wires up a synthetic SsaFuncSummary with a SinkSite at other.rs:42:10 and drives the three emission stages (pick_primary_sink_sites → emit_ssa_taint_events → ssa_events_to_findings) against a minimal caller SSA body. Asserts the resulting Finding.primary_location is exactly that triple. The existing integration tests in src/taint/tests.rs cover the coarse FuncSummary path end-to-end through analyse_file. 
This test locks in the lower-level SSA-side plumbing so a future refactor that silently drops the site between pick → emit → findings fails here rather than only at the benchmark layer. Also refreshes tests/benchmark/results/latest.json (timestamp only; rs-cmdi-003 remains TP/TP/TP and the aggregate P/R/F1 are unchanged from phase 4). Closes the primary sink-location attribution feature (phases 1-5/5): * Phase 1 — SinkSite data model on summaries. * Phase 2 — SinkSite threaded into SsaTaintEvent and Finding. * Phase 3 — diag + SARIF consume primary_location. * Phase 4 — benchmark validates primary_call_site_lines across corpus. * Phase 5 — regression test locks the event→finding contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: clean up formatting and improve readability in multiple files * refactor: simplify type definition for deduplication key in findings * test(harness): add must_not_match expectation for FP regression guards Extends ExpectedFinding with must_not_match field that asserts a diagnostic must NOT fire — presence is a hard failure. Non-consuming scan so it coexists with must_match entries on the same rule_id. Adds forbidden_violations accumulator and updates summary line. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(regression): update expectations to ensure must_not_match for various taint and resource leak rules * feat: implement auto-seeding for JS/TS handler parameters to enhance taint tracking * feat: update switch statement handling to improve control flow analysis * feat: implement promisify alias handling for JS/TS to enhance taint tracking * feat: enhance taint tracking by refining expectation handling and adding mode filtering * feat: refine SQL handling in stream processing and enhance auto-seeding for handler parameters * feat: update taint tracking rules to enforce full mode matching and improve flow analysis * feat: enhance Ruby subshell handling to improve taint tracking and flow analysis * feat: update xss_response expectations to refine taint flow analysis and enhance regression guarding * feat: refine framework detection and update expectation handling for Echo and Sinatra * feat: implement max_count for taint tracking expectations and deduplicate findings * feat: add strict_unexpected handling for taint-unsanitised-flow in expectation files * feat: enhance deduplication of taint-unsanitised-flow findings by collapsing based on line and severity * feat: add strict_unexpected handling for taint-unsanitised-flow in multiple expectation files * feat: add structural invariant checks for SSA bodies * feat: ensure deterministic phi emission order using BTreeSet * feat: enhance handling of terminators to ensure authoritative flow through successor edges * feat: enhance Goto terminator handling to ensure all successors are marked executable * feat: refactor code for improved readability and organization * feat: simplify predicate checks and enhance readability in SSA handling * feat: implement per-file parse timeout and enhance file size handling * feat: migrate analysis engine toggles from environment variables to configuration file * feat: remove unnecessary whitespace in 
hostile_input_tests.rs * feat: remove unnecessary whitespace in hostile_input_tests.rs * feat: update dependencies and enhance documentation on language maturity * feat: enhance security headers and improve request body limits * feat: implement sink capability bits for deduplication and enhance evidence tagging * feat: implement dynamic activation handling for gated sinks and enhance validation logic * feat: enhance configuration documentation and clarify inline analysis cache behavior * feat: implement panic recovery during analysis to continue scans past errors * feat: add expectations configuration for taint analysis and performance metrics * feat: enhance error handling and logging during file reading and mutex locking * feat: add cross-file body loading tests and plumbing for CF-1 phase * feat: implement cross-file k=1 context-sensitive inline taint analysis with new tests and fixtures * feat: implement indexed-scan parity in cross-file inline analysis with new dropdown and copy functionality * feat: enhance classification span handling in CFG and AST for improved source attribution * feat: add new Express routes for handling user input and telemetry data * feat: implement ternary expression handling in CFG with diamond structure for JS/TS * feat: implement Phase CF-3 abstract-domain transfer channels in summaries * feat: add support for string-prefix transfer in cross-file calls and update tests * docs: reduce RESULTS.md doc size * feat: implement Phase CF-4 per-return-path summary decomposition with tests * feat: update parameter handling in pass1 and refactor SsaFuncSummary initialization * feat: implement Phase CF-5 for cross-file SCC joint fixed-point convergence with new flags and tests * feat: implement Phase CF-6 with parameter-granularity points-to summaries and associated tests * refactor: update comments and documentation for clarity and consistency * style: format code for consistency and readability * refactor: simplify verdict handling and 
improve edge checking logic * refactor: optimize path and identifier collection by avoiding unnecessary cloning * chore: update Cargo.toml for Rust version 1.85 and add ignored files; modify CHANGELOG and README for clarity on state analysis defaults * refactor: update documentation and improve clarity in configuration files * refactor: update documentation and improve clarity in configuration files * feat: add JS/TS pass-2 convergence tests and expectations configuration * feat: add Phase 5 regression tests for inline cache origin attribution and update related logic * feat: implement Phase 7 deduplication and alternative path linking for taint findings * feat: implement structural DFS index for anonymous functions and update naming conventions * feat: add Phase 8 regression tests for container-element taint in JS and Python * feat: add engine-depth profiles and explain-engine option for CLI * feat: update expectations and add new README fixtures for multi-file scan regression * feat: implement Phase 11 callback-alias and factory patterns with regression tests * feat: implement Terminator::Switch for multi-way dispatch and add regression tests * feat: add real-CVE benchmark fixtures for CVE-2023-48022, CVE-2019-14939, and CVE-2023-26159 with corresponding patched variants * refactor: extract cfg and ssa_transfer to submodules * refactor: cargo fmt * refactor: remove unnecessary blank line in cfg_tests.rs * refactor: remove unnecessary planning file * chore: update Rust version to 1.88 and bump dependencies in Cargo files * feat: enhance triage UI with new layout and controls, update README for clarity * feat: enhance triage UI with new layout and controls, update README for clarity * chore: remove outdated section from README for version 0.5.0 * docs: improve clarity and consistency in README content * chore: add "GPL-3.0-or-later" to license options in about.toml * chore: update license handling in about.toml and check-licenses.mjs * style: format code for 
improved readability in TriagePage component * style: format code for improved readability in TriagePage component * chore: enhance license handling and improve body_id scoping in seed lookup * feat: introduce owner and parent body IDs for enhanced seed scoping * feat: implement direction-aware engine provenance with new CLI flag for strict CI gating * feat: add Undef SSA operation for improved control-flow handling * style: improve code formatting for consistency and readability in multiple files * feat: add 16-function chain SCC across multiple files for enhanced analysis * style: simplify code formatting for improved readability in multiple files * fix: update CapHitReason default implementation and improve README clarity * docs: enhance README with detailed explanations of taint analysis and limitations * docs: refine README for clarity and consistency in taint analysis section * style: improve code formatting for better readability in NewScanModal and scans * fix: update cargo-about command to use --offline for deterministic license generation * fix: update cargo-about command to use --offline for deterministic license generation * ci: add step to prime cargo registry cache for deterministic license generation * feat: add support for non-sink collections in authorization analysis * feat: enhance authorization checks with row-level ownership equality and binding tracking * feat: implement self-scoped user handling and enhance ownership checks * refactor: simplify assertions and formatting in authorization analysis tests * fix: normalize line endings in THIRDPARTY-LICENSES.html generation and update README with AI disclosure * docs: update AI disclosure section for clarity and conciseness * feat: add AI Contribution Policy and update contributing guidelines for AI assistance disclosure * feat: enhance authorization analysis with SSA-derived variable type classification * feat: implement auth_finding_to_diag function for enhanced security diagnostics * feat: add 
args_value_refs to CallSite struct for enhanced argument tracking * feat: add args_value_refs to CallSite struct for enhanced argument tracking * feat: add direction-aware engine provenance with LossDirection classification and new CLI flag * feat: simplify strip_cap_from_call_args call by removing unnecessary line breaks * feat: enhance error message handling in cli_validation_tests for better Windows compatibility * feat: optimize release profile settings in Cargo.toml and update CodeQL configuration * feat: enhance release build process with SBOM generation and SLSA provenance * feat: update actions/checkout and actions/setup-node to v6, enhance CLI options, and improve auth-check summaries * feat: introduce PathFact handling for path safety checks and rejection logic * feat: introduce PathFact handling for path safety checks and rejection logic * feat: update benchmark data and enhance path sanitization logic with new safety checks * feat: document AI assistance in frontend UI development and human review process * feat: add return path facts for enhanced path safety checks and update documentation * chore: update release date for version 0.5.0 in CHANGELOG.md * chore: clean up ci.yml by removing outdated comments and clarifying steps * feat: implement cross-language path sanitizers and validators for enhanced security * feat: enhance SSA value usage tracking by including block terminators and improve path safety checks * feat: enhance switch statement handling by adding per-case path constraints and support for exclusive cases * refactor: simplify conditional formatting and improve code readability in executor and lower modules * feat: add vulnerable examples for various languages demonstrating authentication and sanitization issues * feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers * feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers * 
feat: add transform classifiers for Java, Go, and Ruby with corresponding tests * refactor: clarify comments on reassign-to-constant idiom and sink behavior in guards.rs --------- Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Parent: c4ce08b452
Commit: 41128177d2
2144 changed files with 201812 additions and 8927 deletions
|
|
@ -1,161 +1,130 @@
|
|||
# CFG Structural Analysis
|
||||
# CFG structural analysis
|
||||
|
||||
## Summary
|
||||
Nyx builds an intra-procedural control-flow graph per function and checks structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error paths terminate before reaching dangerous code.
|
||||
|
||||
Nyx builds an intra-procedural control-flow graph (CFG) for each function and analyzes structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error-handling code terminates properly.
|
||||
|
||||
These detectors use **dominator analysis** — they check whether a guard node dominates (must execute before) a sink node on the CFG.
|
||||
These detectors use dominator analysis. A guard dominates a sink when the guard must execute before the sink on every path from entry.
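
As an illustration of the dominance check described above, here is a minimal sketch in Python. It is not Nyx's implementation: the graph encoding (a dict of successor lists), the node names, and the helper names `dominators` and `is_guarded` are all assumptions made for the example, and it assumes every node is reachable from the entry.

```python
def dominators(cfg, entry):
    """Iteratively compute, for every node, the set of nodes that dominate it."""
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)
    # Build predecessor sets from the successor lists.
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def is_guarded(cfg, entry, guard, sink):
    """True when `guard` lies on every path from `entry` to `sink`."""
    return guard in dominators(cfg, entry)[sink]


# Hypothetical toy graphs: the guard is unavoidable in the first one,
# but the second has an edge from entry straight to the sink.
guarded = {"entry": ["validate"], "validate": ["sink"], "sink": ["exit"], "exit": []}
unguarded = {"entry": ["validate", "sink"], "validate": ["sink"], "sink": ["exit"], "exit": []}
```

In the second graph the guard exists in the function but does not dominate the sink, which is the shape `cfg-unguarded-sink` is described as reporting.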
|
||||
|
||||
## Rule IDs
|
||||
|
||||
| Rule ID | Severity | Description |
|
||||
|---------|----------|-------------|
|
||||
| `cfg-unguarded-sink` | High/Medium | Sink reachable without a dominating guard or sanitizer |
|
||||
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth check |
|
||||
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
|
||||
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
|
||||
| `cfg-unreachable-source` | Low | Source in unreachable code |
|
||||
| `cfg-error-fallthrough` | High/Medium | Error check doesn't terminate; dangerous code follows |
|
||||
| `cfg-resource-leak` | Medium | Resource acquired but not released on all exit paths |
|
||||
| `cfg-lock-not-released` | Medium | Lock acquired but not released on all exit paths |
|
||||
| Rule ID | Severity |
|
||||
|---|---|
|
||||
| `cfg-unguarded-sink` | High/Medium |
|
||||
| `cfg-auth-gap` | High |
|
||||
| `cfg-unreachable-sink` | Medium |
|
||||
| `cfg-unreachable-sanitizer` | Low |
|
||||
| `cfg-unreachable-source` | Low |
|
||||
| `cfg-error-fallthrough` | High/Medium |
|
||||
| `cfg-resource-leak` | Medium |
|
||||
| `cfg-lock-not-released` | Medium |
|
||||
|
||||
## What It Detects
|
||||
## What it detects
|
||||
|
||||
### Unguarded sinks (`cfg-unguarded-sink`)
|
||||
A sink call (e.g. `system()`, `eval()`, `Command::new()`) is reachable from the function entry without passing through a guard or sanitizer that matches the sink's capability.
|
||||
**`cfg-unguarded-sink`**: A sink call (`system`, `eval`, `Command::new`, `db.execute`, etc.) is reachable from function entry without passing through any guard or sanitizer that matches the sink's capability.
|
||||
|
||||
### Auth gaps (`cfg-auth-gap`)
|
||||
A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`) reaches a privileged sink (shell execution, file I/O) without a prior call to an authentication function (`is_authenticated`, `require_auth`, `check_permission`, etc.).
|
||||
**`cfg-auth-gap`**: A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`, language-dependent) reaches a privileged sink (shell execution, file I/O) without a preceding authentication call.
|
||||
|
||||
### Unreachable security code (`cfg-unreachable-*`)
|
||||
Sinks, sanitizers, or sources in dead code branches. This often indicates a refactoring error where security-critical code was accidentally made unreachable.
|
||||
**`cfg-unreachable-*`**: Sinks, sanitizers, or sources in dead code. Usually signals a refactoring error that silently disabled security-relevant logic.
|
||||
|
||||
### Error fallthrough (`cfg-error-fallthrough`)
|
||||
An error check (null check, error return check) does not terminate the function or loop back. Execution continues to a dangerous operation on the error path.
|
||||
**`cfg-error-fallthrough`**: An error-handling branch (null check, error-return check) does not terminate. Execution falls through to a dangerous operation on the error path.
|
||||
|
||||
### Resource leaks (`cfg-resource-leak`, `cfg-lock-not-released`)
|
||||
A resource acquisition call (e.g. `File::open`, `fopen`, `socket`, `Lock`) is not matched by a release call (e.g. `close`, `fclose`, `unlock`) on all exit paths from the function.
|
||||
**`cfg-resource-leak`, `cfg-lock-not-released`**: A resource acquisition (`File::open`, `fopen`, `socket`, `Lock`) is not matched by a release on every exit path from the function.
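
The "released on every exit path" condition can be sketched as a reachability question: is there an exit reachable from the acquisition without crossing a release? The Python below is an illustrative sketch only. The graph encoding, the node names, and the function `leaks_on_some_path` are assumptions for the example, not Nyx's code.

```python
def leaks_on_some_path(cfg, acquire, releases, exits):
    """Return True if an exit node is reachable from `acquire` without passing a release node."""
    stack, seen = [acquire], set()
    while stack:
        node = stack.pop()
        # Paths that hit a release are safe from that point on, so stop exploring them.
        if node in seen or node in releases:
            continue
        seen.add(node)
        if node in exits:
            return True
        stack.extend(cfg.get(node, []))
    return False


# Hypothetical CFG for a function that returns early on error without closing.
leaky = {
    "fopen": ["if_error"],
    "if_error": ["early_return", "fclose"],
    "fclose": ["end"],
    "early_return": [],
    "end": [],
}

# Same function with a release added on the error branch.
fixed = {
    "fopen": ["if_error"],
    "if_error": ["fclose_err", "fclose"],
    "fclose_err": ["early_return"],
    "fclose": ["end"],
    "early_return": [],
    "end": [],
}
```

Because the search is confined to one function's graph, a release performed by a caller never appears in it.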
|
||||
|
||||
## What It Cannot Detect
|
||||
## What it can't detect
|
||||
|
||||
- **Inter-procedural guards**: If authentication is checked in a middleware function that calls this handler, the CFG detector cannot see it. It only analyzes one function at a time.
|
||||
- **Dynamic dispatch**: Virtual method calls, function pointers, and closures are opaque to the CFG.
|
||||
- **Complex guard patterns**: Only recognized guard function names are checked. Custom validation logic (e.g. `if password == expected`) is not recognized as a guard.
|
||||
- **Correct sanitization**: The detector checks that *some* guard dominates the sink, not that the guard is *correct*. A guard that always passes would suppress the finding.
|
||||
- **Cross-function resource flows**: If a file handle is opened in one function and closed in another, the detector will report a leak in the first function.
|
||||
- **Inter-procedural guards.** Middleware-level auth, helper functions that internally call auth, and cleanup performed in a caller are invisible.
|
||||
- **Dynamic dispatch.** Virtual calls, function pointers, and closures resolve to no specific callee.
|
||||
- **Correctness of guards.** The detector checks that *a* guard dominates the sink. It cannot check that the guard is correct. A recognised guard that never rejects anything still suppresses the finding.
|
||||
- **Custom validation logic.** Only recognised guard names are checked. `if password == expected` is not a recognised guard.
|
||||
- **Cross-function resource flows.** If a file handle is opened in one function and closed in another, the opener is flagged as a leak. This is the largest source of false positives on factory-pattern code.
|
||||
|
||||
## Common False Positives
|
||||
## Common false positives
|
||||
|
||||
| Scenario | Why it fires | Mitigation |
|
||||
|----------|-------------|------------|
|
||||
| Framework-level auth middleware | Handler doesn't call auth directly | Document as expected; suppress with severity filter |
|
||||
| Resource closed via RAII/defer | Implicit cleanup not visible to CFG | Currently not detected; known limitation |
|
||||
| Custom guard function name | Function not in the recognized guard list | Add the function name as a sanitizer in config |
|
||||
| Test handlers | Intentionally skip auth in tests | Default non-prod downgrade reduces severity; or exclude test dirs |
|
||||
| Scenario | Why | Mitigation |
|
||||
|---|---|---|
|
||||
| Framework middleware auth | Handler doesn't call auth directly | Expected; suppress with severity filter or exclude handlers |
|
||||
| RAII / defer cleanup | Implicit release not visible to CFG (partially handled for Rust Drop and Go defer) | Known limitation |
|
||||
| Custom guard name | Function not in the recognised guard list | Add it as a sanitizer rule in config |
|
||||
| Test handlers | Intentional lack of auth | Default non-prod downgrade reduces severity; or exclude test dirs |
|
||||
|
||||
## Common False Negatives
|
||||
## Common false negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| Auth in called function | Cross-function guards not tracked |
|
||||
| Guard via type system | Type-level guarantees (e.g. Rust's `AuthenticatedUser` wrapper) not analyzed |
|
||||
| Resource closed in finally/defer | Some cleanup patterns not recognized |
|
||||
| Scenario | Why |
|
||||
|---|---|
|
||||
| Auth in a called helper | Cross-function guards not tracked |
|
||||
| Type-system guards | Rust `AuthenticatedUser<T>` wrappers, typestate patterns not analysed |
|
||||
| Cleanup in `finally`/`ensure`/`defer` in callers | Cross-function cleanup not tracked |
|
||||
|
||||
## Confidence Signals
|
||||
## Tuning
|
||||
|
||||
| Signal | Meaning |
|
||||
|--------|---------|
|
||||
| **Evidence lists guard nodes** | Shows which guards were checked and found missing |
|
||||
| **Sink has high capability** | Shell execution or file I/O sinks are higher risk |
|
||||
| **Handler detection matched** | Web handler identification is based on conventional parameter names |
|
||||
### Recognised guard names
|
||||
|
||||
## Tuning and Noise Controls
|
||||
Nyx accepts these patterns as dominating guards:
|
||||
|
||||
### Add custom guards/sanitizers
|
||||
| Pattern | Applies to |
|
||||
|---|---|
|
||||
| `validate*`, `sanitize*` | All sinks |
|
||||
| `check_*`, `verify_*`, `assert_*` | All sinks |
|
||||
| `shell_escape` | Shell sinks |
|
||||
| `html_escape` | HTML/XSS sinks |
|
||||
| `url_encode` | URL sinks |
|
||||
| `which` | Shell execution (binary lookup) |
|
||||
|
||||
### Recognised auth names
|
||||
|
||||
| Pattern | Language |
|
||||
|---|---|
|
||||
| `is_authenticated`, `require_auth`, `check_permission`, `authorize`, `authenticate`, `require_login`, `check_auth`, `verify_token`, `validate_token` | Cross-language |
|
||||
| `middleware.auth`, `auth.required` | Go |
|
||||
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |
|
||||
|
||||
For Rust auth checks (`require_*`, ownership equality, row-level checks), see [auth.md](../auth.md).
|
||||
|
||||
### Custom guards
|
||||
|
||||
```toml
|
||||
[[analysis.languages.python.rules]]
|
||||
matchers = ["validate_request", "check_csrf"]
|
||||
kind = "sanitizer"
|
||||
cap = "all"
|
||||
cap = "all"
|
||||
```
|
||||
|
||||
### Add auth rules
|
||||
|
||||
Auth checks are recognized by function name. If your codebase uses non-standard names:
|
||||
### Custom auth functions
|
||||
|
||||
```toml
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["ensureLoggedIn", "requirePermission"]
|
||||
kind = "sanitizer"
|
||||
cap = "all"
|
||||
```
|
||||
|
||||
### Filter results
|
||||
|
||||
```bash
|
||||
# Skip low-severity unreachable findings
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
```
|
||||
|
||||
### Disable CFG analysis
|
||||
|
||||
```bash
|
||||
nyx scan . --mode ast # AST patterns only
|
||||
cap = "all"
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Unguarded sink
|
||||
Unguarded sink:
|
||||
|
||||
```go
|
||||
func handler(w http.ResponseWriter, r *http.Request) {
|
||||
cmd := r.URL.Query().Get("cmd")
|
||||
exec.Command("sh", "-c", cmd).Run() // cfg-unguarded-sink: no guard dominates
|
||||
exec.Command("sh", "-c", cmd).Run() // cfg-unguarded-sink
|
||||
}
|
||||
```
|
||||
|
||||
### Auth gap
|
||||
Auth gap:
|
||||
|
||||
```javascript
|
||||
app.get('/admin/delete', (req, res) => {
|
||||
// No is_authenticated() call
|
||||
db.execute("DELETE FROM users WHERE id = " + req.params.id);
|
||||
// cfg-auth-gap: web handler reaches privileged sink without auth
|
||||
// No auth call
|
||||
db.execute("DELETE FROM users WHERE id = " + req.params.id); // cfg-auth-gap
|
||||
});
|
||||
```
|
||||
|
||||
### Resource leak
|
||||
Resource leak:
|
||||
|
||||
```c
|
||||
void process() {
|
||||
FILE *f = fopen("data.txt", "r"); // acquire
|
||||
FILE *f = fopen("data.txt", "r");
|
||||
if (error) {
|
||||
return; // cfg-resource-leak: f not closed on this path
|
||||
return; // cfg-resource-leak: f not closed on this path
|
||||
}
|
||||
fclose(f);
|
||||
}
|
||||
```
|
||||
|
||||
## Guard Rules
|
||||
|
||||
Nyx recognizes these function name patterns as guards:
|
||||
|
||||
| Pattern | Applies to |
|
||||
|---------|-----------|
|
||||
| `validate*`, `sanitize*` | All sinks |
|
||||
| `check_*`, `verify_*`, `assert_*` | All sinks |
|
||||
| `shell_escape` | Shell execution sinks |
|
||||
| `html_escape` | HTML/XSS sinks |
|
||||
| `url_encode` | URL sinks |
|
||||
| `which` | Shell execution (binary lookup) |
|
||||
|
||||
### Auth rules
|
||||
|
||||
| Pattern | Category |
|
||||
|---------|----------|
|
||||
| `is_authenticated`, `require_auth`, `check_permission` | Common |
|
||||
| `authorize`, `authenticate`, `require_login` | Common |
|
||||
| `check_auth`, `verify_token`, `validate_token` | Common |
|
||||
| `middleware.auth`, `auth.required` | Go |
|
||||
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |
|
||||
|
|
|
|||
|
|
@ -1,111 +1,84 @@
|
|||
# AST Pattern Matching
|
||||
# AST patterns
|
||||
|
||||
## Summary
|
||||
AST patterns are tree-sitter queries that match dangerous structural shapes in source. No dataflow, no CFG. A match means the construct is present; it's not proof the construct is exploitable.
|
||||
|
||||
AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.
|
||||
|
||||
AST patterns run in all analysis modes, including `--mode ast` (where they are the only active detector).
|
||||
Patterns run in every analysis mode. In `--mode ast` they're the only active detector.
|
||||
|
||||
## Rule IDs
|
||||
|
||||
Pattern rule IDs follow the format `<lang>.<category>.<specific>`:
|
||||
|
||||
```
|
||||
rs.memory.transmute
|
||||
js.code_exec.eval
|
||||
py.deser.pickle_loads
|
||||
c.memory.gets
|
||||
java.sqli.execute_concat
|
||||
<lang>.<category>.<name>
|
||||
```
|
||||
|
||||
See the [Rule Reference](../rules/index.md) for a complete listing per language.
|
||||
Examples: `js.code_exec.eval`, `py.deser.pickle_loads`, `c.memory.gets`, `java.sqli.execute_concat`.
|
||||
|
||||
## Pattern Tiers
|
||||
Full list: [rules.md](../rules.md).
|
||||
|
||||
| Tier | Meaning | Examples |
|
||||
|------|---------|---------|
|
||||
| **A** | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
|
||||
| **B** | Query includes a heuristic guard | SQL `execute` with concatenated arg, `printf(var)` with non-literal format |
|
||||
## Tiers
|
||||
|
||||
Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, `java.sqli.execute_concat` only fires when `executeQuery()` receives a `binary_expression` (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
|
||||
| Tier | Meaning |
|
||||
|---|---|
|
||||
| **A** | Structural presence alone is high-signal. `gets`, `eval`, `pickle.loads`, `mem::transmute` |
|
||||
| **B** | Pattern includes a tree-sitter heuristic guard. Example: `java.sqli.execute_concat` only fires when `executeQuery` receives a `binary_expression` (string concatenation), not a literal or a parameterized statement |
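
To make the Tier B idea concrete, here is a rough Python analogue of a heuristic-guarded pattern. It uses Python's standard `ast` module rather than tree-sitter, and it is not one of Nyx's queries. The function name `concat_execute_calls` and the sample snippet are assumptions for illustration. The matcher fires only when the first argument of an `.execute(...)` call is a `+` expression.

```python
import ast


def concat_execute_calls(source):
    """Return line numbers of `.execute(...)` calls whose first argument is a `+` expression."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            and isinstance(node.args[0], ast.BinOp)
            and isinstance(node.args[0].op, ast.Add)
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical input: only the first call concatenates into the query string.
SAMPLE = '''
cur.execute("SELECT * FROM users WHERE id=" + user_id)
cur.execute("SELECT * FROM users WHERE id=?", (user_id,))
cur.execute(prepared_sql)
'''
```

The third call shows the limit of the heuristic: a string that was concatenated earlier and stored in a variable looks the same as a safe one at the call site.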
|
||||
|
||||
## What It Detects
|
||||
## Categories
|
||||
|
||||
### By category
|
||||
| Category | Examples |
|
||||
|---|---|
|
||||
| CommandExec | `system`, `os.system`, `Runtime.exec`, backticks |
|
||||
| CodeExec | `eval`, `Function`, PHP `assert("string")`, `class_eval`, `instance_eval` |
|
||||
| Deserialization | `pickle.loads`, `yaml.load`, `Marshal.load`, `readObject`, `unserialize` |
|
||||
| SqlInjection | `executeQuery`/`Query`/`execute` with concatenated argument (Tier B) |
|
||||
| PathTraversal | PHP `include $var` |
|
||||
| Xss | `document.write`, `outerHTML`, `insertAdjacentHTML`, `getWriter().print` |
|
||||
| Crypto | `md5`, `sha1`, `Math.random`, `java.util.Random` for security use |
|
||||
| Secrets | hardcoded API keys (Go, JS, TS) |
|
||||
| InsecureTransport | `InsecureSkipVerify`, `fetch("http://...")` |
|
||||
| Reflection | `Class.forName`, `Method.invoke`, `send`, `constantize` |
|
||||
| MemorySafety | `transmute`, `unsafe`, `gets`, `strcpy`, `sprintf` |
|
||||
| Prototype | `__proto__` assignment, `Object.prototype.*` |
|
||||
| Config | CORS dynamic origin, `rejectUnauthorized: false`, insecure session settings |
|
||||
| CodeQuality | `unwrap`, `panic!`, `as any` |
|
||||
|
||||
| Category | What it matches | Example languages |
|
||||
|----------|----------------|-------------------|
|
||||
| **CommandExec** | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
|
||||
| **CodeExec** | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
|
||||
| **Deserialization** | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
|
||||
| **SqlInjection** | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
|
||||
| **PathTraversal** | File inclusion with variable path | PHP (`include $var`) |
|
||||
| **Xss** | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
|
||||
| **Crypto** | Weak cryptographic algorithms | All languages (`md5`, `sha1`, `Math.random()`) |
|
||||
| **Secrets** | Hardcoded credentials | Go (variable name matching) |
|
||||
| **InsecureTransport** | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
|
||||
| **Reflection** | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
|
||||
| **MemorySafety** | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
|
||||
| **Prototype** | Prototype pollution | JS/TS (`__proto__` assignment) |
|
||||
| **CodeQuality** | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |
|
||||
## What patterns can't tell you
|
||||
|
||||
## What It Cannot Detect
|
||||
- **Dataflow.** `eval("1+1")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`. The taint detector is the one that distinguishes them.
|
||||
- **Reachability.** A pattern in dead code matches identically.
|
||||
- **Semantics.** `strcpy(dst, src)` always matches, regardless of buffer sizes.
|
||||
- **Indirect calls.** `let e = eval; e(input)` doesn't match `eval`.
|
||||
- **Aliased imports.** `from os import system as s; s(cmd)` won't match `system`.
|
||||
- **Macro expansions.** Tree-sitter parses the macro call site, not the expansion.
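
The aliased-import limitation is easy to reproduce with any purely name-based matcher. The sketch below uses Python's standard `ast` module as a stand-in for a structural query. The helper `calls_named` is hypothetical and not part of Nyx.

```python
import ast


def calls_named(source, name):
    """Return line numbers of direct calls whose callee is the bare identifier `name`."""
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    ]


DIRECT = "from os import system\nsystem(cmd)\n"
ALIASED = "from os import system as s\ns(cmd)\n"
```

The matcher sees `system(cmd)` in the first snippet, but in the second the callee is the identifier `s`, so nothing matches even though the same function is called.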
|
||||
|
||||
- **Dataflow**: Patterns don't track whether the dangerous function receives tainted input. `eval("hello")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`.
|
||||
- **Context**: Patterns don't understand whether the code is reachable, guarded, or inside a test.
|
||||
- **Semantics**: `strcpy(dst, src)` always matches — it cannot determine buffer sizes.
|
||||
- **Indirect calls**: Function pointers, dynamic dispatch, and aliased references are invisible.
|
||||
## Common false positives
|
||||
|
||||
## Common False Positives
|
||||
| Scenario | Why | Mitigation |
|
||||
|---|---|---|
|
||||
| `eval("hardcoded literal")` | Pattern matches structure | Run `--mode cfg` to drop AST patterns and rely on taint |
|
||||
| `unsafe` block with sound justification | Every `unsafe` matches `rs.quality.unsafe_block` | Filter with `--severity HIGH` (the rule is Medium, so `>=MEDIUM` keeps it) or accept the noise |
|
||||
| `.unwrap()` in tests | Acceptable in test code | Default non-prod severity downgrade reduces it |
|
||||
| `md5` for non-cryptographic checksums | Pattern can't see intent | Suppress with `--severity ">=MEDIUM"` or per-line `nyx:ignore` |
|
||||
| SQL concat with trusted data (Tier B) | Heuristic can't verify the source | Taint is more precise; or convert to a parameterized query |
|
||||
|
||||
| Scenario | Why it fires | Mitigation |
|
||||
|----------|-------------|------------|
|
||||
| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis won't flag this — use `--mode cfg` for fewer false positives |
|
||||
| `unsafe` block in Rust with sound justification | All unsafe blocks match | Filter with `--severity ">=MEDIUM"` (unsafe_block is Medium) |
|
||||
| `.unwrap()` in test code | Acceptable in tests | Default non-prod downgrade reduces severity |
|
||||
| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter Low severity or add to exclusions |
|
||||
| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |
|
||||
## Confidence levels
|
||||
|
||||
## Common False Negatives
|
||||
Every AST pattern carries an explicit confidence:
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
|
||||
| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
|
||||
| SQL injection via ORM query builder | No pattern for ORM-specific query building |
|
||||
| Imported function under different name | `from os import system as s; s(cmd)` — pattern looks for `system` |
|
||||
| Confidence | Use |
|
||||
|---|---|
|
||||
| High | Inherently dangerous construct with no safe usage. `gets`, `pickle.loads`, `eval` with no guard |
|
||||
| Medium | Likely issue, context may change the call. SQL concatenation (Tier B), `unsafe` blocks, `exec` |
|
||||
| Low | Heuristic. Often appears in safe code. Weak crypto for checksums, `unwrap` outside tests, `Math.random` |
|
||||
|
||||
## Confidence Signals
|
||||
`--min-confidence medium` (or `output.min_confidence = "medium"`) drops Low-confidence matches.
|
||||
|
||||
| Signal | Meaning |
|
||||
|--------|---------|
|
||||
| **Tier A** | High confidence — the function itself is dangerous |
|
||||
| **Tier B** | Moderate confidence — heuristic guard reduces false positives |
|
||||
| **High severity** | Critical vulnerability class (command exec, deserialization) |
|
||||
| **Low severity** | Informational (weak crypto, code quality) |
|
||||
| **Non-prod path** | Finding in test/vendor code — downgraded by default |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Severity filtering
|
||||
## Tuning
|
||||
|
||||
```bash
|
||||
# Skip code-quality and weak-crypto findings
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
|
||||
# Only critical findings
|
||||
nyx scan . --severity HIGH
|
||||
nyx scan . --severity ">=MEDIUM" # drop Low-tier patterns
|
||||
nyx scan . --severity HIGH # banned APIs and code-exec only
|
||||
nyx scan . --mode cfg # drop AST patterns; keep taint + state + cfg
|
||||
```
|
||||
|
||||
### Use taint for precision
|
||||
|
||||
```bash
|
||||
# Taint-only mode: only report findings with confirmed dataflow
|
||||
nyx scan . --mode cfg
|
||||
```
|
||||
|
||||
### Exclude directories
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
excluded_directories = ["node_modules", "vendor", "generated"]
|
||||
|
|
@ -113,37 +86,29 @@ excluded_directories = ["node_modules", "vendor", "generated"]
|
|||
|
||||
## Examples
|
||||
|
||||
### Tier A — structural presence
|
||||
Tier A, structural presence:
|
||||
|
||||
**C: Banned function**
|
||||
```c
|
||||
char buf[64];
|
||||
gets(buf); // c.memory.gets — always dangerous, no safe usage
|
||||
gets(buf); // c.memory.gets
|
||||
```
|
||||
|
||||
**Python: Unsafe deserialization**
|
||||
```python
|
||||
import pickle
|
||||
data = pickle.loads(user_input) # py.deser.pickle_loads
|
||||
data = pickle.loads(user_input)  # py.deser.pickle_loads
|
||||
```
|
||||
|
||||
### Tier B — heuristic-guarded
|
||||
Tier B, heuristic guard:
|
||||
|
||||
**Java: SQL concatenation**
|
||||
```java
|
||||
// Fires: concatenated argument
|
||||
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
|
||||
// java.sqli.execute_concat
|
||||
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId); // java.sqli.execute_concat
|
||||
|
||||
// Does NOT fire: parameterized query
|
||||
// Does not fire: parameterized
|
||||
stmt.executeQuery(preparedSql);
|
||||
```
|
||||
|
||||
**C: Format string**
|
||||
```c
|
||||
// Fires: variable as first argument
|
||||
printf(user_input); // c.memory.printf_no_fmt
|
||||
|
||||
// Does NOT fire: literal format string
|
||||
printf("%s", user_input);
|
||||
printf(user_input); // c.memory.printf_no_fmt: fires (variable as fmt)
|
||||
printf("%s", user_input); // does not fire (literal fmt)
|
||||
```
|
||||
|
|
|
|||
|
|
@ -1,26 +1,22 @@
|
|||
# State Model Analysis
|
||||
# State model analysis
|
||||
|
||||
## Summary
|
||||
Tracks resource lifecycle and authentication state through a function. Detects use-after-close, double-close, leaks, and unauthenticated access to privileged operations.
|
||||
|
||||
Nyx's state model analysis tracks **resource lifecycle** and **authentication state** through a function using monotone dataflow over bounded lattices. It detects use-after-close bugs, double-close bugs, resource leaks, and unauthenticated access to privileged operations.
|
||||
|
||||
State analysis is **opt-in** — enable it with `scanner.enable_state_analysis = true` in config. It requires `mode = "full"` or `mode = "cfg"`.
|
||||
State analysis is on by default. Disable with `scanner.enable_state_analysis = false`. It runs in `--mode full` and `--mode taint`; AST-only mode skips it.
|
||||
|
||||
## Rule IDs
|
||||
|
||||
| Rule ID | Severity | Description |
|
||||
|---------|----------|-------------|
|
||||
| `state-use-after-close` | High | Variable used after being closed/released |
|
||||
| `state-double-close` | Medium | Resource closed twice |
|
||||
| `state-resource-leak` | Medium | Resource opened but never closed (definite) |
|
||||
| `state-resource-leak-possible` | Low | Resource may not be closed on all paths |
|
||||
| `state-unauthed-access` | High | Privileged operation reached without authentication |
|
||||
| Rule ID | Severity |
|
||||
|---|---|
|
||||
| `state-use-after-close` | High |
|
||||
| `state-double-close` | Medium |
|
||||
| `state-resource-leak` | Medium |
|
||||
| `state-resource-leak-possible` | Low |
|
||||
| `state-unauthed-access` | High |
|
||||
|
||||
## What It Detects
|
||||
## What it detects
|
||||
|
||||
### Use-after-close (`state-use-after-close`)
|
||||
|
||||
A resource transitions to the CLOSED state (via `close()`, `fclose()`, `disconnect()`, etc.), then a use operation (`read`, `write`, `send`, `recv`, `query`, etc.) is performed on it.
|
||||
**`state-use-after-close`**: Resource transitions to CLOSED (via `close`, `fclose`, `disconnect`, …), then a use operation happens on it.
|
||||
|
||||
```c
|
||||
FILE *f = fopen("data.txt", "r");
|
||||
|
|
@ -28,147 +24,108 @@ fclose(f);
|
|||
fread(buf, 1, 100, f); // state-use-after-close
|
||||
```
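
The lifecycle rules can be pictured as a small state machine per variable. The following Python sketch handles straight-line event sequences only (no branches or joins) and is an illustration under that assumption, not Nyx's analysis. The event encoding and the function `check_lifecycle` are made up for the example, and the rule IDs are reused from the table above as labels.

```python
def check_lifecycle(events):
    """Walk (op, var) events, op in {"open", "use", "close"}, and return (rule_id, var) findings."""
    state, findings = {}, []
    for op, var in events:
        current = state.get(var)
        if op == "open":
            state[var] = "OPEN"
        elif op == "close":
            if current == "CLOSED":
                findings.append(("state-double-close", var))
            state[var] = "CLOSED"
        elif op == "use" and current == "CLOSED":
            findings.append(("state-use-after-close", var))
    # Anything still OPEN when the sequence ends was never released.
    findings += [("state-resource-leak", v) for v, s in state.items() if s == "OPEN"]
    return findings
```

For example, `check_lifecycle([("open", "f"), ("close", "f"), ("use", "f")])` models the C snippet above. A real analysis also has to merge states where branches join, which is where a "possible" leak would come from.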
|
||||
|
||||
### Double-close (`state-double-close`)
|
||||
**`state-double-close`**: A resource is closed twice, which can cause crashes or undefined behaviour.
|
||||
|
||||
A resource is closed twice. This can cause crashes or undefined behavior.
|
||||
**`state-resource-leak`**: Resource opened but never closed on any path through the function. Definite leak.
|
||||
|
||||
```python
|
||||
f = open("data.txt")
|
||||
f.close()
|
||||
f.close() # state-double-close
|
||||
```
|
||||
**`state-resource-leak-possible`**: Resource closed on some paths but not others. Lower confidence; often an early-return error path.
|
||||
|
||||
### Resource leak (`state-resource-leak`)
|
||||
**`state-unauthed-access`**: A function recognised as a web handler reaches a privileged sink without an auth call on the path.
|
||||
|
||||
A resource is opened but never closed on any path through the function. This is a definite leak.
|
||||
A function counts as a web handler if its name starts with `handle_`, `route_`, or `api_` (sufficient on its own), or starts with `serve_`/`process_` and the file uses web-shaped parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, language-dependent). `main` is excluded.
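
Read literally, the handler heuristic above amounts to the following. This is a sketch of the stated rule in Python, not the actual implementation. The function name `is_web_handler` and the way parameter names are passed in are assumptions, and the parameter set is the one listed in the text, which the text says varies by language.

```python
STRONG_PREFIXES = ("handle_", "route_", "api_")
WEAK_PREFIXES = ("serve_", "process_")
WEB_PARAM_NAMES = {"request", "req", "ctx", "res", "response", "w", "writer"}


def is_web_handler(func_name, param_names_in_file):
    """Apply the naming heuristic: strong prefixes suffice, weak prefixes need web-shaped params."""
    if func_name == "main":
        return False
    if func_name.startswith(STRONG_PREFIXES):
        return True
    if func_name.startswith(WEAK_PREFIXES):
        return any(p in WEB_PARAM_NAMES for p in param_names_in_file)
    return False
```

Under this reading, `process_order` in a file with no web-shaped parameters is not treated as a handler, while the same name in a file that uses `req` or `res` is.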
|
||||
|
||||
```java
|
||||
FileInputStream fis = new FileInputStream("data.txt");
|
||||
process(fis);
|
||||
// function exits without fis.close() — state-resource-leak
|
||||
```
|
||||
## Managed-resource suppression
|
||||
|
||||
### Possible resource leak (`state-resource-leak-possible`)
|
||||
Several language-specific cleanup patterns suppress leak findings:
|
||||
|
||||
A resource is closed on some paths but not others.
|
||||
| Pattern | Languages | Effect |
|
||||
|---|---|---|
|
||||
| RAII / Drop | Rust | All leak findings suppressed except `alloc`/`dealloc` |
|
||||
| Smart pointers | C++ | `make_unique`/`make_shared` treated as managed; raw `new`/`malloc` still tracked |
|
||||
| `defer` | Go | `defer f.Close()` suppresses leak at exit |
|
||||
| `with` context manager | Python | `with open(f) as f:` suppresses leak for the bound name |
|
||||
| try-with-resources | Java | TWR-bound resources suppressed |
|
||||
|
||||
```go
|
||||
f, err := os.Open("data.txt")
|
||||
if err != nil {
|
||||
return // f not closed here
|
||||
}
|
||||
f.Close() // closed here
|
||||
// state-resource-leak-possible on the error path
|
||||
```
|
||||
## What it can't detect
|
||||
|
||||
### Unauthenticated access (`state-unauthed-access`)
|
||||
- **Cross-function resource ownership.** If a resource is opened in one function and closed in another, the leak is reported in the opener. This is the most common source of false positives for leak detection.
|
||||
- **Factory / builder functions** that return a resource for the caller to manage.
|
||||
- **Variable shadowing across scopes.** Same name in inner and outer scope shares one symbol; an inner close masks an outer leak.
|
||||
- **Resources stored in collections.** Handles stored in arrays, maps, or channels and cleaned up via iteration are not tracked.
|
||||
- **Dynamic dispatch.** Close called via trait object or interface may not be recognised.
|
||||
- **Type-state authentication.** `AuthenticatedRequest<T>` and similar Rust patterns are not recognised as auth.
|
||||
|
||||
A function identified as a web handler reaches a privileged sink (shell execution, file I/O) without any authentication check on the path.
|
||||
## Common false positives
|
||||
|
||||
A function is identified as a web handler if:
|
||||
1. Its name starts with `handle_`, `route_`, or `api_` (strong match — sufficient on its own), OR
|
||||
2. Its name starts with `serve_` or `process_` AND any function in the file has web-like parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, etc., varying by language).
|
||||
| Scenario | Why | Mitigation |
|
||||
|---|---|---|
|
||||
| Factory returns a resource | Caller owns it | Known limitation |
|
||||
| Framework-managed handles | Connection pool, request scope | Exclude framework code or downgrade |
|
||||
| Variable name shadowing | Same name reused | Known limitation |
|
||||
|
||||
The function name `main` is explicitly excluded.
|
||||
## Per-language detection
|
||||
|
||||
```javascript
|
||||
app.post('/admin/exec', (req, res) => {
|
||||
// No auth check
|
||||
exec(req.body.command); // state-unauthed-access
|
||||
});
|
||||
```
|
||||
| Language | Leak | Double-close | Use-after-close | Notes |
|
||||
|---|---|---|---|---|
|
||||
| C | yes | yes | yes | `fopen`/`fclose`, `malloc`/`free`, `pthread_mutex_*` |
|
||||
| C++ | yes | yes | yes | C pairs plus `new`/`delete`; smart pointers suppressed |
|
||||
| Python | yes | yes | yes | `with` suppressed; `open`, `socket`, `connect` |
|
||||
| Go | yes | yes | yes | `defer` suppressed; `os.Open` / `.Close` |
|
||||
| Rust | unsafe only | n/a | n/a | RAII suppresses everything except `alloc`/`dealloc` |
|
||||
| JavaScript | yes | yes | partial | `fs.openSync`/`closeSync` |
|
||||
| TypeScript | yes | yes | partial | Same as JS |
|
||||
| PHP | yes | yes | partial | `fopen`/`fclose`, `curl_init`/`curl_close`, `mysqli_*` |
|
||||
| Ruby | partial | partial | partial | `File.open`/`close`, `TCPSocket` |
|
||||
| Java | limited | limited | limited | Constructor-callee matching is incomplete |
|
||||
|
||||
## What It Cannot Detect
|
||||
|
||||
- **Cross-function resource management**: Resources opened in one function and closed in another are not tracked. This is the most common source of false positives for leak detection.
|
||||
- **RAII / defer / try-with-resources**: Implicit cleanup via language-level constructs (Rust's `Drop`, Go's `defer`, Java's try-with-resources, Python's `with`) is not recognized. These patterns will produce false-positive leak findings.
|
||||
- **Dynamic dispatch**: If `close()` is called through a trait object or interface, it may not be recognized.
|
||||
- **Authentication via type system**: Rust's type-state pattern (e.g. `AuthenticatedRequest<T>`) is not recognized as an auth check.
|
||||
- **Complex authorization logic**: Only recognized function name patterns are checked.
|
||||
|
||||
## Common False Positives
|
||||
|
||||
| Scenario | Why it fires | Mitigation |
|
||||
|----------|-------------|------------|
|
||||
| RAII / Drop / defer cleanup | Implicit cleanup not visible | Known limitation; filter by severity |
|
||||
| Resource returned to caller | Ownership transferred, not leaked | Known limitation |
|
||||
| Framework-managed resources | Web framework manages connection lifecycle | Exclude framework-generated handlers |
|
||||
| Try-with-resources (Java) | Language construct not parsed | Known limitation |
|
||||
| Context manager (Python `with`) | Block construct not tracked | Known limitation |
|
||||
|
||||
## Common False Negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| Resource closed in helper function | Cross-function tracking not implemented |
|
||||
| Auth in middleware | Auth check happens before handler is called |
|
||||
| Double-close via aliased reference | Alias analysis not performed |
|
||||
|
||||
## Confidence Signals
|
||||
|
||||
| Signal | Meaning |
|
||||
|--------|---------|
|
||||
| **Definite leak (state-resource-leak)** | Resource is never closed on any path — high confidence |
|
||||
| **Use-after-close** | Read/write operation after explicit close — high confidence |
|
||||
| **Web handler detected** | Entry point matched by parameter naming convention |
|
||||
| **Possible leak (state-resource-leak-possible)** | Resource closed on some but not all paths — lower confidence |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Enable state analysis
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
enable_state_analysis = true
|
||||
```
|
||||
|
||||
### Severity filtering
|
||||
## Tuning
|
||||
|
||||
```bash
|
||||
# Skip possible-leak findings (Low severity)
|
||||
nyx scan . --severity ">=MEDIUM"
|
||||
nyx scan . --severity ">=MEDIUM" # Skip "possible" leaks (Low)
|
||||
```
|
||||
|
||||
### Exclude test files
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
excluded_directories = ["tests", "test", "spec"]
|
||||
enable_state_analysis = true # default
|
||||
excluded_directories = ["tests", "test", "spec"]
|
||||
```
|
||||
|
||||
## Resource Pairs
|
||||
## Recognised pairs
|
||||
|
||||
The state engine recognizes these acquire/release pairs per language:
|
||||
The state engine ships these acquire/release pairs. Custom pairs are not yet configurable; file an issue if you need one.
|
||||
|
||||
### C/C++
|
||||
| Acquire | Release | Resource |
|
||||
|---------|---------|----------|
|
||||
| `fopen` | `fclose` | File handle |
|
||||
| `open` | `close` | File descriptor |
|
||||
| `socket` | `close` | Socket |
|
||||
| `malloc`, `calloc`, `realloc` | `free` | Heap memory |
|
||||
| `pthread_mutex_lock` | `pthread_mutex_unlock` | Mutex |
|
||||
**C / C++**
|
||||
|
||||
### Rust
|
||||
| Acquire | Release | Resource |
|
||||
|---------|---------|----------|
|
||||
| `File::open`, `File::create` | `drop`, `close` | File handle |
|
||||
| `TcpStream::connect` | `shutdown` | TCP connection |
|
||||
| `lock`, `read`, `write` (on Mutex/RwLock) | `drop` | Lock guard |
|
||||
| Acquire | Release |
|
||||
|---|---|
|
||||
| `fopen` | `fclose` |
|
||||
| `open` | `close` |
|
||||
| `socket` | `close` |
|
||||
| `malloc`, `calloc`, `realloc` | `free` |
|
||||
| `pthread_mutex_lock` | `pthread_mutex_unlock` |
|
||||
| `new`, `new[]` *(C++)* | `delete`, `delete[]` |
|
||||
|
||||
### Java
|
||||
| Acquire | Release | Resource |
|
||||
|---------|---------|----------|
|
||||
| `new FileInputStream` | `close` | File stream |
|
||||
| `getConnection` | `close` | DB connection |
|
||||
| `new Socket` | `close` | Socket |
|
||||
**Rust**
|
||||
|
||||
### Go, Python, JavaScript, Ruby, PHP
|
||||
Similar patterns with language-specific function names.
|
||||
| Acquire | Release |
|
||||
|---|---|
|
||||
| `File::open`, `File::create` | `drop`, `close` |
|
||||
| `TcpStream::connect` | `shutdown` |
|
||||
| `lock`, `read`, `write` (Mutex/RwLock) | `drop` |
|
||||
|
||||
## Use Patterns (Trigger use-after-close)
|
||||
**Java**
|
||||
|
||||
The following operations on a closed resource trigger `state-use-after-close`:
|
||||
| Acquire | Release |
|
||||
|---|---|
|
||||
| `new FileInputStream` (and friends) | `close` |
|
||||
| `getConnection` | `close` |
|
||||
| `new Socket` | `close` |
|
||||
|
||||
Go, Python, JavaScript, Ruby, PHP follow language-idiomatic equivalents.
|
||||
|
||||
## Use-after-close triggers
|
||||
|
||||
These operations on a closed resource fire `state-use-after-close`:
|
||||
|
||||
```
|
||||
read, write, send, recv, fread, fwrite, fgets, fputs, fprintf, fscanf,
|
||||
|
|
@ -177,28 +134,3 @@ ungetc, query, execute, fetch, sendto, recvfrom, ioctl, fcntl,
|
|||
strcpy, strncpy, strcat, strncat, memcpy, memmove, memset, memcmp,
|
||||
strcmp, strncmp, strlen, sprintf, snprintf
|
||||
```
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Resource Lifecycle Lattice
|
||||
|
||||
```
|
||||
UNINIT → OPEN → CLOSED
|
||||
→ MOVED
|
||||
```
|
||||
|
||||
States are tracked as bitflags, allowing the lattice to represent uncertainty (e.g. OPEN|CLOSED means the resource is open on some paths and closed on others).
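A minimal sketch of how a bitflag lattice can represent that uncertainty. The bit values and the helper names `join`, `may_be`, and `must_be` are assumptions chosen for illustration; they are not taken from the Nyx source.

```rust
// Lifecycle states as bit flags, so one value can hold several states at once.
// The concrete bit assignments are assumed for this sketch.
const UNINIT: u8 = 0b0001;
const OPEN: u8 = 0b0010;
const CLOSED: u8 = 0b0100;
const MOVED: u8 = 0b1000;

/// Join at a control-flow merge: the union of the states seen on either path.
fn join(a: u8, b: u8) -> u8 {
    a | b
}

/// True if the resource is in `state` on at least one incoming path.
fn may_be(set: u8, state: u8) -> bool {
    (set & state) != 0
}

/// True if the resource is in exactly `state` on every incoming path.
fn must_be(set: u8, state: u8) -> bool {
    set == state
}
```

Joining `OPEN` with `CLOSED` yields `OPEN | CLOSED`: `may_be` holds for both states, and `must_be` holds for neither.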
|
||||
|
||||
### Leak Detection Scope
|
||||
|
||||
Resource leaks are checked at the file-level exit node and the **synthesized** function exit node (a single Return node that all early returns feed into). Early-return nodes are **not** checked individually — only the merged state at the function's synthesized exit is inspected. This prevents duplicate findings where an early-return path reports a definite leak while the merged exit correctly reports a possible leak.
|
||||
|
||||
This per-function exit inspection ensures that a variable leaked inside one function is not masked by a same-named variable that is properly closed in a subsequent function.
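The effect of checking only the merged exit state can be sketched as follows. This is a simplified model, not the engine's implementation: the bit values, the `LeakFinding` enum, and `classify_at_exit` are hypothetical names introduced for the example.

```rust
// Assumed bit assignments for the two states that matter here.
const OPEN: u8 = 0b0010;
const CLOSED: u8 = 0b0100;

#[derive(Debug, PartialEq)]
enum LeakFinding {
    NoLeak,
    Definite, // corresponds to state-resource-leak
    Possible, // corresponds to state-resource-leak-possible
}

/// Merge the state of every return path first, then classify once.
/// Classifying each early return separately would report a definite leak
/// on one path and a possible leak at the merged exit for the same resource.
fn classify_at_exit(return_states: &[u8]) -> LeakFinding {
    let merged = return_states.iter().fold(0u8, |acc, s| acc | s);
    let open = (merged & OPEN) != 0;
    let closed = (merged & CLOSED) != 0;
    match (open, closed) {
        (true, false) => LeakFinding::Definite, // never closed on any path
        (true, true) => LeakFinding::Possible,  // closed on some paths only
        _ => LeakFinding::NoLeak,
    }
}
```

With one early return that leaves the resource open and a normal return that closes it, `classify_at_exit(&[OPEN, CLOSED])` yields a single `Possible` finding.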
|
||||
|
||||
### Auth Level Lattice
|
||||
|
||||
```
|
||||
Unauthed < Authed < Admin
|
||||
```
|
||||
|
||||
Join semantics: take the minimum (conservative). If any path is unauthenticated, the result is unauthenticated.
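As a sketch, the ordering and the conservative join can be modelled with a derived `Ord`. The enum and function names are assumptions for illustration; only the ordering and the take-the-minimum rule come from the text above.

```rust
// Variant order defines the lattice order: Unauthed < Authed < Admin.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum AuthLevel {
    Unauthed,
    Authed,
    Admin,
}

/// Conservative join at a control-flow merge: keep the weaker level.
fn join(a: AuthLevel, b: AuthLevel) -> AuthLevel {
    a.min(b)
}
```

If one branch reaches the merge point as `Admin` and another as `Unauthed`, the joined level is `Unauthed`, so a privileged sink after the merge is still treated as unauthenticated.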
|
||||
|
|
|
|||
|
|
@ -1,10 +1,8 @@
|
|||
# Taint Analysis
|
||||
# Taint analysis
|
||||
|
||||
## Summary
|
||||
Nyx tracks untrusted data from **sources** (where it enters the program) through assignments and function calls to **sinks** (where it's used dangerously). If the flow reaches a sink without passing a matching **sanitizer**, a finding fires.
|
||||
|
||||
Nyx's taint analysis tracks the flow of untrusted data from **sources** (where data enters the program) through **assignments and function calls** to **sinks** (where dangerous operations happen). If the data reaches a sink without passing through a **sanitizer** with matching capabilities, a finding is emitted.
|
||||
|
||||
The engine uses a monotone forward dataflow analysis over a finite lattice with guaranteed termination. Analysis is **intra-procedural with cross-file function summaries** — it does not follow calls into other functions but uses pre-computed summaries of their behavior.
|
||||
The engine is a monotone forward dataflow analysis over a finite lattice, so termination is guaranteed. It is flow-sensitive inside a function and interprocedural across files via persisted per-function summaries.
|
||||
|
||||
## Rule ID
|
||||
|
||||
|
|
@ -12,191 +10,135 @@ The engine uses a monotone forward dataflow analysis over a finite lattice with
|
|||
taint-unsanitised-flow (source <line>:<col>)
|
||||
```
|
||||
|
||||
One rule ID covers all taint findings. The parenthetical identifies the specific source location.
|
||||
One rule ID, parameterized by the source location. Suppressions can target either the base ID or the full string.
|
||||
|
||||
## What It Detects
|
||||
## What it detects
|
||||
|
||||
- Environment variables flowing to shell execution (`env::var` → `Command::new`)
|
||||
- User input flowing to code evaluation (`req.body` → `eval()`)
|
||||
- File contents flowing to SQL queries (`fs::read_to_string` → `db.execute()`)
|
||||
- Request parameters flowing to HTML output (`req.query` → `innerHTML`)
|
||||
- Any source-to-sink flow where the sink's required capability is not stripped by a sanitizer
|
||||
- User input flowing to shell execution: `req.body.cmd` → `child_process.exec`
|
||||
- User input flowing to code evaluation: `req.query.code` → `eval`
|
||||
- User input flowing to SQL: `request.args.get('id')` → `cursor.execute(f"... {id}")`
|
||||
- Environment variables flowing to shell: `env::var("CMD")` → `Command::new("sh").arg("-c")`
|
||||
- Request parameters flowing to HTML: `req.query.name` → `innerHTML`
|
||||
- File contents flowing to privileged sinks: `fs::read_to_string` → `db.execute`
|
||||
- Any other source-to-sink flow where the sink's required capability is not stripped along the way
|
||||
|
||||
## What It Cannot Detect
|
||||
## What it can't detect
|
||||
|
||||
- **Inter-procedural flows without summaries**: If a function isn't summarized (e.g. from a third-party library without source), the taint engine cannot track data through it. It conservatively treats unknown callees as neither propagating nor sanitizing.
|
||||
- **Flows through data structures**: Taint is tracked per-variable, not per-field. `obj.field = tainted; sink(obj.other_field)` may produce a false positive because taint attaches to `obj` as a whole.
|
||||
- **Aliasing**: `let y = &x; sink(*y)` — the engine tracks `y` as a fresh variable, not an alias of `x`. This can cause false negatives.
|
||||
- **Complex control flow**: The analysis is flow-sensitive (respects control flow within a function) but does not track taint through arbitrary loops with complex exit conditions.
|
||||
- **Implicit flows**: Taint only follows explicit data flow, not information flow through branching (e.g. `if (secret) { x = 1 } else { x = 0 }` does not taint `x`).
|
||||
- **Library calls without summaries.** If a callee has no summary (no source, binary-only dependency), Nyx treats it as neither propagating nor sanitizing. This is conservative for sanitization but lossy for propagation.
|
||||
- **Taint through struct fields and containers.** Taint attaches to whole variables. `obj.field = tainted; sink(obj.other_field)` can produce a false positive because `obj` itself is tainted.
|
||||
- **Aliasing.** `let y = &x; sink(*y)` tracks `y` separately from `x`, which can cause false negatives.
|
||||
- **Implicit flows.** Taint follows explicit data, not branching signal. `if (secret) x = 1 else x = 0` does not taint `x`.
|
||||
- **Globals and statics across functions.** Not tracked across function boundaries.
|
||||
|
||||
## Common False Positives
|
||||
## Common false positives
|
||||
|
||||
| Scenario | Why it happens | Mitigation |
|
||||
|----------|---------------|------------|
|
||||
| Custom sanitizer not recognized | Nyx only knows built-in and configured sanitizers | Add a custom sanitizer rule in config |
|
||||
| Taint through struct fields | Variable-level (not field-level) tracking | No current mitigation; field sensitivity is planned |
|
||||
| Dead code paths | The engine is path-insensitive within a function (it considers all paths) | Contradiction pruning catches some cases; path-validated findings score lower |
|
||||
| Library wrappers | A wrapper around a dangerous function may re-introduce taint that was sanitized by the wrapper | Summarize the wrapper function or add it as a sanitizer |
|
||||
| Scenario | Why | Mitigation |
|
||||
|---|---|---|
|
||||
| Custom sanitizer not recognised | Only built-in + configured sanitizers match | Add a custom sanitizer rule in config |
|
||||
| Taint through struct fields | Variable-level tracking, not field-level | No fix yet; field-sensitivity is planned |
|
||||
| Dead branches | Path-insensitive within a function | Constraint solving catches trivially infeasible combos; path-validated findings are scored lower |
|
||||
| Library wrapper re-introduces taint | Wrapper opaque, or summary marks it as propagating | Summarize the wrapper explicitly or add it as a sanitizer |
|
||||
|
||||
## Common False Negatives
|
||||
## Common false negatives
|
||||
|
||||
| Scenario | Why it's missed |
|
||||
|----------|----------------|
|
||||
| Third-party library calls | No summary available; callee treated as opaque |
|
||||
| Taint through global/static variables | Not tracked across function boundaries |
|
||||
| Taint through closures/callbacks in some languages | Closure capture analysis is limited (JS/TS/Ruby/Go anonymous functions ARE analyzed) |
|
||||
| Flows spanning more than two files | Summary approximation loses precision at depth |
|
||||
| Scenario | Why |
|
||||
|---|---|
|
||||
| Third-party library on the path | No summary available, callee treated opaquely |
|
||||
| Globals / statics across function boundaries | Not tracked |
|
||||
| Some closure captures | Closure analysis is limited. JS/TS/Ruby/Go anonymous functions passed as callbacks *are* analyzed as separate scopes |
|
||||
| Very deep cross-file chains | Summary approximation loses precision at depth |
|
||||
|
||||
## Confidence Signals
|
||||
## Confidence signals
|
||||
|
||||
These signals in the output indicate higher-confidence findings:
|
||||
Higher confidence:
|
||||
- Source + Sink both present in evidence with specific call locations.
|
||||
- `source_kind: user_input` (direct attacker control).
|
||||
- `path_validated: false`.
|
||||
- No dominating guard on the path.
|
||||
- Symex produced a witness string (rendered sink value visible in JSON/SARIF `evidence.symbolic.witness`).
|
||||
|
||||
| Signal | What it means |
|
||||
|--------|--------------|
|
||||
| **Evidence: Source + Sink** | Both endpoints identified with specific function names and locations |
|
||||
| **Source kind = user input** | Source is directly controllable by an attacker (req.body, argv, etc.) |
|
||||
| **path_validated = false** | No validation guard on the path — higher exploitability |
|
||||
| **No guard_kind** | No dominating predicate check (null check, error check, etc.) |
|
||||
| **High rank_score** | Multiple confidence signals combined |
|
||||
Lower confidence:
|
||||
- Path-validated taint (`path_validated: true`).
|
||||
- Source is a database read or internal file (such data is often already validated at insertion time).
|
||||
- Engine note `ForwardBailed` / `PathWidened`. Use `--require-converged` to drop these in strict gates.
|
||||
|
||||
Lower-confidence:
|
||||
## Tuning
|
||||
|
||||
| Signal | What it means |
|
||||
|--------|--------------|
|
||||
| **path_validated = true** | A validation predicate guards the path — may not be exploitable |
|
||||
| **guard_kind = "ValidationCall"** | An explicit validation function was called before the sink |
|
||||
| **Source kind = database** | Data from DB — may already be validated at insertion time |
|
||||
|
||||
## Tuning and Noise Controls
|
||||
|
||||
### Add custom sanitizers
|
||||
|
||||
If your codebase has a custom sanitizer that Nyx doesn't recognize:
|
||||
### Custom sanitizer
|
||||
|
||||
```toml
|
||||
# nyx.local
|
||||
[[analysis.languages.javascript.rules]]
|
||||
matchers = ["escapeHtml", "sanitizeInput"]
|
||||
kind = "sanitizer"
|
||||
cap = "html_escape"
|
||||
kind = "sanitizer"
|
||||
cap = "html_escape"
|
||||
```
|
||||
|
||||
Or via CLI:
|
||||
```bash
|
||||
nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
|
||||
```
|
||||
Or: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`.
|
||||
|
||||
### Filter by severity
|
||||
### Filter by severity or confidence
|
||||
|
||||
```bash
|
||||
nyx scan . --severity HIGH # Only high-severity taint findings
|
||||
nyx scan . --severity ">=MEDIUM" # Skip low-severity
|
||||
nyx scan . --severity HIGH
|
||||
nyx scan . --min-confidence medium
|
||||
```
|
||||
|
||||
### Skip non-production code
|
||||
|
||||
By default, findings in `tests/`, `vendor/`, `build/` paths are downgraded one severity tier. To exclude them entirely, add to config:
|
||||
|
||||
```toml
|
||||
[scanner]
|
||||
excluded_directories = ["tests", "vendor", "build", "examples"]
|
||||
```
|
||||
|
||||
### Disable taint (AST-only mode)
|
||||
### Skip dataflow entirely
|
||||
|
||||
```bash
|
||||
nyx scan . --mode ast
|
||||
```
|
||||
|
||||
AST-only mode gives you structural pattern matches without taint.
|
||||
|
||||
In the browser UI, taint findings render as a numbered flow walk so you can see each hop the engine took:
|
||||
|
||||
<p align="center"><img src="../../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: HIGH taint-unsanitised-flow with numbered source → call → sink steps and How to fix guidance" width="900"/></p>
|
||||
|
||||
## Example
|
||||
|
||||
**Vulnerable code** (Rust):
|
||||
Rust:
|
||||
|
||||
```rust
|
||||
use std::env;
|
||||
use std::process::Command;
|
||||
|
||||
fn main() {
|
||||
let cmd = env::var("USER_CMD").unwrap(); // line 5: source
|
||||
Command::new("sh").arg("-c").arg(&cmd).output(); // line 6: sink
|
||||
let cmd = env::var("USER_CMD").unwrap(); // source
|
||||
Command::new("sh").arg("-c").arg(&cmd).output(); // sink
|
||||
}
|
||||
```
|
||||
|
||||
**Finding**:
|
||||
Finding:
|
||||
|
||||
```
|
||||
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
|
||||
Source: env::var("USER_CMD") at 5:15
|
||||
Sink: Command::new("sh").arg("-c")
|
||||
Score: 76
|
||||
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
|
||||
Unsanitised user input flows from env::var → Command::new
|
||||
Source: env::var (5:15)
|
||||
Sink: Command::new
|
||||
```
|
||||
|
||||
**Safe alternative**:
|
||||
```rust
|
||||
use std::env;
|
||||
use std::process::Command;
|
||||
Safe rewrite: drop the shell and execute the value directly, so no shell interprets it (`Command::new(&cmd).output()`), or validate it against an allowlist before passing it to the shell.
|
||||
|
||||
fn main() {
|
||||
let cmd = env::var("USER_CMD").unwrap();
|
||||
// Use the value as a direct argument, not a shell command
|
||||
Command::new(&cmd).output();
|
||||
// Or validate against an allowlist
|
||||
}
|
||||
```
|
||||
## Capabilities
|
||||
|
||||
## Technical Details
|
||||
Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer only clears taint for the cap it declares. A sink only fires when the remaining taint still carries its required cap.
|
||||
|
||||
### Capability System
|
||||
| Capability | Typical source | Typical sanitizer | Typical sink |
|
||||
|---|---|---|---|
|
||||
| `env_var` | `env::var`, `getenv`, `process.env` | | |
|
||||
| `html_escape` | | `html.escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
|
||||
| `shell_escape` | | `shlex.quote`, `shell_escape::escape` | `system`, `Command::new`, `eval` |
|
||||
| `url_encode` | | `encodeURIComponent` | `location.href`, HTTP client URL arg |
|
||||
| `json_parse` | | `JSON.parse` | |
|
||||
| `file_io` | | `os.path.realpath`, `filepath.Clean` | `open`, `fs::read_to_string`, `send_file` |
|
||||
| `fmt_string` | | | `printf(var)` |
|
||||
| `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation |
|
||||
| `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` |
|
||||
| `ssrf` | | URL-prefix locks | `requests.get`, `fetch`, `HttpClient.send` |
|
||||
| `code_exec` | | | `eval`, `exec`, `Function` |
|
||||
| `crypto` | | | weak-algorithm constructors |
|
||||
| `unauthorized_id` | request-bound scoped IDs (Rust auth analysis) | ownership check | row-level write |
|
||||
| `all` | Sources typically use `all` so they match any sink | | |
|
||||
|
||||
Taint uses a bitflag capability system to match sources with appropriate sanitizers and sinks:
|
||||
|
||||
| Capability | Bit | Sources | Sanitizers | Sinks |
|
||||
|-----------|-----|---------|------------|-------|
|
||||
| `ENV_VAR` | 0x01 | `env::var`, `getenv` | — | — |
|
||||
| `HTML_ESCAPE` | 0x02 | — | `html_escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
|
||||
| `SHELL_ESCAPE` | 0x04 | — | `shell_escape` | `Command::new`, `system()`, `eval()` |
|
||||
| `URL_ENCODE` | 0x08 | — | `encodeURIComponent` | `location.href` |
|
||||
| `JSON_PARSE` | 0x10 | — | `JSON.parse` | — |
|
||||
| `FILE_IO` | 0x20 | — | `filepath.Clean`, `basename`, `os.path.realpath` | `fopen`, `open`, `send_file`, `fs::read_to_string` |
|
||||
| `FMT_STRING` | 0x40 | — | — | `printf(var)` |
|
||||
|
||||
Sources typically use `Cap::all()` to match any sink. A sanitizer strips specific capability bits. A finding fires when a tainted variable reaches a sink and the taint still has the matching capability bit set.
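The strip-and-match rule can be sketched with plain bit operations. The bit values are the ones listed in the table above; the helper names `sanitize` and `sink_fires` and the `ALL` constant (the union of the seven listed bits) are assumptions for this example, not Nyx's API.

```rust
// Capability bits as listed in the table above.
const HTML_ESCAPE: u8 = 0x02;
const SHELL_ESCAPE: u8 = 0x04;
// Assumed stand-in for `Cap::all()`: every listed bit (0x01 through 0x40) set.
const ALL: u8 = 0x7f;

/// A sanitizer clears only the capability bits it declares.
fn sanitize(taint: u8, cap: u8) -> u8 {
    taint & !cap
}

/// A sink fires when the taint still carries the bit the sink requires.
fn sink_fires(taint: u8, required: u8) -> bool {
    (taint & required) != 0
}
```

A value tainted with `ALL` and passed through an HTML-escaping sanitizer no longer triggers an `HTML_ESCAPE` sink, but still triggers a `SHELL_ESCAPE` sink, which is why escaping for the wrong context does not suppress a finding.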
|
||||
|
||||
### Nested Function Analysis
|
||||
|
||||
The CFG builder recursively discovers function expressions nested inside call arguments:
|
||||
|
||||
- **JavaScript/TypeScript**: `function_expression`, `arrow_function` inside call arguments (e.g., Express route handlers)
|
||||
- **Ruby**: `do_block` and `block` nodes (e.g., Sinatra `get '/path' do...end`)
|
||||
- **Go**: `func_literal` (anonymous function literals)
|
||||
|
||||
Each nested function is walked as a separate scope and receives a unique identifier (`<anon@{byte_offset}>`) to prevent collisions when multiple anonymous functions exist in the same file.
|
||||
|
||||
### Chained Call Classification
|
||||
|
||||
Method chains like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments between `.` separators. The classifier matches against both the original text and the normalized form, enabling rules like `r.URL` to match within `r.URL.Query.Get`.
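A rough sketch of that normalization, assuming it amounts to dropping every parenthesized argument list from the chain. `normalize_chain` and `matches_rule` are hypothetical names, substring matching is an assumption about how rules are compared, and the sketch ignores parentheses inside string literals.

```rust
/// Remove every parenthesized argument list from a call chain.
fn normalize_chain(expr: &str) -> String {
    let mut out = String::with_capacity(expr.len());
    let mut depth = 0usize;
    for ch in expr.chars() {
        match ch {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            // Keep characters only while outside any argument list.
            _ if depth == 0 => out.push(ch),
            _ => {}
        }
    }
    out
}

/// Try the rule against both the original text and the normalized form.
fn matches_rule(expr: &str, rule: &str) -> bool {
    expr.contains(rule) || normalize_chain(expr).contains(rule)
}
```

Here `normalize_chain("r.URL.Query().Get(\"host\")")` produces `r.URL.Query.Get`, so a rule written as `r.URL.Query.Get` matches even though that exact text never appears in the original expression.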
|
||||
|
||||
### Nested Call Fallback
|
||||
|
||||
When the outermost call in an expression doesn't classify as a source/sink, the engine tries all nested inner calls. This handles patterns like `str(eval(expr))` where `str` is not a sink but the inner `eval` is.
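The fallback can be illustrated with a toy classifier that works on expression text. Everything here is an assumption made for the example: the real engine works on syntax-tree nodes rather than strings, and the `Label` enum, the two-entry rule table in `classify_name`, and the helper names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Label {
    Source,
    Sink,
}

/// Toy rule table: only these two names carry a label in this sketch.
fn classify_name(name: &str) -> Option<Label> {
    match name {
        "eval" => Some(Label::Sink),
        "input" => Some(Label::Source),
        _ => None,
    }
}

/// Collect callee names in textual order, so the outermost call comes first.
fn callees(expr: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut ident = String::new();
    for ch in expr.chars() {
        if ch.is_alphanumeric() || ch == '_' || ch == '.' {
            ident.push(ch);
        } else {
            // An identifier immediately followed by `(` is a callee.
            if ch == '(' && !ident.is_empty() {
                names.push(ident.clone());
            }
            ident.clear();
        }
    }
    names
}

/// Try the outermost call first, then fall back to each nested call.
fn classify_expr(expr: &str) -> Option<Label> {
    callees(expr).iter().find_map(|name| classify_name(name))
}
```

For `str(eval(expr))`, the outer `str` carries no label, so the sketch falls through to the inner `eval` and classifies the expression as a sink.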
|
||||
|
||||
### Rust `if let` / `while let` Pattern Bindings
|
||||
|
||||
The CFG builder recognizes Rust `let_condition` nodes inside `if` and `while` expressions. The value expression is classified for source/sink labels, and the pattern binding is extracted as a variable definition:
|
||||
|
||||
```rust
|
||||
if let Ok(cmd) = env::var("CMD") {
|
||||
// cmd is tainted — env::var is a source, cmd is the binding
|
||||
Command::new("sh").arg("-c").arg(&cmd).output(); // taint-unsanitised-flow
|
||||
}
|
||||
```
|
||||
|
||||
This also works for `while let` patterns.
|
||||
|
||||
### JS/TS Two-Level Solve
|
||||
|
||||
For JavaScript and TypeScript, taint analysis uses a two-level approach:
|
||||
|
||||
1. **Level 1**: Solve top-level code (module scope)
|
||||
2. **Level 2**: Solve each function seeded with the converged top-level state
|
||||
|
||||
This prevents false positives from cross-function taint leakage while preserving global-to-function flows.
|
||||
Sources typically use `cap = "all"` so they match every sink. Sinks declare the specific cap they need. Sanitizers only clear the cap they name.
|
||||
|
|
|
|||