Release/0.5.0 (#35)

* feat: Introduce function-scoped variable interning for state analysis with new tests and fixtures

* feat: Add Phase 26 symbolic execution enhancements with bitwise operator support, abstract interpretation refinements, and new taint analysis tests

* feat: Refine state analysis to handle factory-pattern resource returns with mixed-path tests and leak detection enhancements

* feat: Add Phase 27 debug views with symbolic execution, abstract interpretation, SSA, and call graph viewers; integrate with debug layout and styles

* feat: Add Phase 31 type-qualified symbolic resolution with receiver-based callee disambiguation and testing

* feat: Extend symbolic execution with state iteration, enhanced debug views, and debounced input handling

* feat: Add Phase 13 resource and auth pattern extensions with new tests and fixtures

* feat: Introduce CFG debug graph renderer with compact mode, toolbar, and DAG layout integration

* feat: Add Phase 28 encoding and decoding transform modeling with structural symex enhancements and new taint analysis tests

* feat: Extend abstract interpretation with type facts and constant value tracking in debug views and server logic

* feat: Add linear path handling and witness extraction to symbolic execution with Phase 28 transform mismatch detection

* feat: Refine Go auth and sanitizer handling with enhanced rules, state updates, and benchmark improvements

* feat: Enable auth-state analysis by default and update relevant tests in benchmark config

* test: Update state_tests to reflect default enablement of auth-state analysis and add auth suppression test

* docs: update CHANGELOG.md

* feat: Introduce per-index taint tracking in `HeapState` with `HeapSlot`, overflow handling, and revised SSA transfers

* feat: Introduce C/C++ language labels and refine heap state tracking in SSA transfers

* feat: Implement per-index array slot tracking in symbolic heap with overflow collapse

* feat: Add implicit definition handling for uninitialized declarations in SSA value allocation

* feat: Refactor function parameters and constants for improved clarity and maintainability

* refactor: Reorder module imports and improve formatting for consistency

* refactor: Fix formatting erorrs

* refactor: Fix clippy warnings

* refactor: Fix fmt warnings (again)

* chore: Update dependencies and improve feature configuration

* Add comprehensive tests for undertested modules (#36) (COPILOT)

* Add comprehensive tests for undertested modules

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083

* Add comprehensive tests for ext, project, walk, and errors modules

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* chore: Update dependencies and improve feature configuration

* fix: formatting errors in new tests

* chore: Update license list in about.toml

* chore: made functions input inline

* chore: updated cfg graph to take up the full page

* chore: add Prettier configuration and update code formatting

* Add frontend test suite with Vitest (111 tests) (#37)

* Add Vitest test suite for frontend - 111 tests across utils, components, hooks, and graph utilities

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/7cf0dba2-ecff-4740-ba4d-92717e74a0b7

* ci: add frontend test step to CI workflow

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/5bc0ac9f-0a32-4d03-9cb7-7a15aea53fca

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* chore: simplify array initialization in test files for consistency

* ran typecheck

* feat: add AnalysisWorkspace component and integrate it into CfgViewerPage

* feat: update routing in AppLayout and improve empty state message in ExplorerPage

* feat: enhance scan progress tracking with additional metrics and stages

* feat: update license information and add license check script

* feat: implement cross-file symbolic execution with callee body persistence

* feat: replace dagre graphs with Graphology + ELK + Sigma for more advanced call stack and cfg rendering

* feat: ensure CFG function view is scoped to the selected function, preventing bleed into sibling functions

* feat: enhance resource tracking with proxy method summaries and improve finding extraction

* feat: add terminal function exit detection for accurate resource leak analysis

* feat: add warnings for loops and functions without bodies to improve error recovery

* feat: update lambda expression handling to ensure proper function classification and control flow

* feat: remove bounded formatting/string ops and add JSON.parse sanitizer for improved data handling

* feat: add inline return taint analysis and regression tests for improved security checks

* feat: add engine version management and migration handling for database schema updates

* feat: enhance first_call_ident to skip nested function bodies and add regression tests

* feat: enhance callee name resolution with two-segment normalization and disambiguation

* feat: add cross-file context flags and debug assertions for taint analysis

* feat: refactor taint analysis structure to unify context handling and improve clarity

* feat: enhance dead code elimination to preserve Sink, Source, and Sanitizer labels with new tests

* docs: updated CHANGELOG.md

* fmt: formatting fixes

* fix: fixed frontend formatting and lint warnings

* fix: optimized ci

* fix: optimized ci

* Add comprehensive multi-file test coverage to Nyx (#38)

* Initial checklist for multi-file test suite expansion

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* Add 12 new multi-file test fixtures with TP/TN/near-miss coverage

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* deleted root repo

* rebuilt to test for regressions

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Co-authored-by: elipeter <elicpeter@gmail.com>

* feat: enhance import alias resolution and taint tracking

* feat: implement security hardening with CSRF protection and path validation

* feat: add support for import alias bindings in Python, PHP, and Rust

* feat: enhance CFG analysis modes and improve code readability

* feat: add detection for parameterized SQL queries to enhance security

* feat: add safe internal redirect handling and enhance session destroy validation

* feat: implement security improvements by addressing vulnerabilities in execAsync, session management, and file downloads

* feat: enhance taint detection by adding support for inline source member expressions in call arguments

* feat: implement pre-emission of Source nodes for inline source member expressions in call arguments

* feat: add support for Throw statement in control flow and error handling

* feat: add debug and echo endpoints with potential information leakage

* feat: implement internal redirect suppression and enhance taint detection

* feat: implement module alias tracking for dynamic dispatch in JS/TS

* feat: add authorization analysis module with Express support

* feat: add authorization analysis module with Express support

* feat: add tests for admin guard requirements and clean checks in authorization analysis

* feat: integrate Koa and Fastify frameworks into authorization analysis

* feat: add Flask and Django support to authorization analysis module

* feat: add support for Rails and Sinatra frameworks in authorization analysis

* feat: add support for Axum, ActixWeb, and Rocket frameworks in authorization analysis

* feat: add support for ActixWeb, Axum, and Rocket frameworks in authorization analysis

* feat: add support for Rails and Sinatra in authorization analysis

* chore: add .DS_Store to .gitignore

* refactor: simplify conditional checks and improve readability in multiple files

* refactor: update usage of Option methods for improved clarity and consistency

* refactor: improve code readability by simplifying conditional checks and formatting

* refactor: improve code formatting and readability by simplifying conditional checks

* refactor: simplify conditional checks and improve readability in multiple files

* refactor: simplify conditional checks in axum.rs for improved readability

* feat: add CodeQL analysis configuration for enhanced security scanning

* test: add comprehensive tests for `src/output.rs` SARIF builder (#39)

* chore: start test coverage improvement work

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* test: add comprehensive tests for src/output.rs SARIF builder

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* refactor: improve code formatting and readability in output.rs

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Co-authored-by: elipeter <elicpeter@gmail.com>

* refactor: improve code formatting and readability in output.rs

* Potential fix for code scanning alert no. 210: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 211: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* refactor: enhance triage file path handling with improved error management and validation

* refactor: updated func summaries for richer detail

* refactor: update SSA summary extraction to use canonical FuncKey for distinct entries

* refactor: enhance callee metadata structure to support arity, receiver, and qualifier for better overload resolution

* refactor: add support for keyword arguments in function calls and enhance receiver extraction for method-style calls

* refactor: implement new Flask routes for safe and unsafe shell command execution

* refactor: separate receiver handling in SSA operations and enhance taint propagation

* refactor: improve arity handling by using arg_uses for positional argument count and enhance witness scoring for tainted arguments

* refactor: implement auth decorator extraction and classification for multiple languages

* refactor: enhance Rust module path resolution and use map handling for cross-file disambiguation

* refactor: introduce CalleeQuery struct for structured callee resolution and enhance resolver logic

* refactor: implement same-file identity collision handling for `runTask` to ensure correct resolver behavior

* refactor: standardize default struct initialization across multiple files

* feat: add scripts for formatting checks and auto-fixes with test summaries

* refactor: simplify character splitting and enhance namespace qualifier handling

* refactor: improve documentation clarity and enhance code readability in resolver logic

* refactor: replace default struct initialization with explicit field assignments for clarity

* feat: enhance anonymous function naming by deriving context-based bindings

* refactor: streamline match expressions for improved readability and performance

* refactor: streamline match expressions for improved readability and performance

* refactor: replace loop with while let for improved clarity and performance

* feat: add SSA constant propagation support to analysis context for improved accuracy

* feat: add SSA constant propagation support to analysis context for improved accuracy

* feat: implement shell metacharacter validation and bounded-length checks in Rust analysis

* feat: add static map analysis for command injection suppression and type safety

* refactor: simplify match statements and reduce line breaks for improved readability

* feat(summary): phase 1/5 SinkSite data model for primary sink-location attribution

Introduce SinkSite (file_rel, line, col, snippet, cap) carrying the
primary sink source-location through function summaries. Swap
SsaFuncSummary.param_to_sink and FuncSummary.param_to_sink from a coarse
Cap map to a deduped SmallVec<[SinkSite; 1]> per parameter, with a
backward-compatible cap_sites() helper and serde defaults so pre-phase-1
on-disk rows continue to deserialise cleanly.

Extraction: SinkSiteLocator bundles the tree/bytes/file_rel needed by
extract_ssa_func_summary; ParsedFile::extract_ssa_artifacts wires the
locator in for the persisted pass-1 path, while pass-2 intra-file
transient summaries fall back to cap-only sites (behavior unchanged).
Merge: GlobalSummaries::insert now unions sink sites with
(file_rel, line, col, cap) dedup via shared union_param_sink_sites
helper.

Database: JSON-serialised summary columns carry the new shape
automatically; no schema change needed.

Phase 2 will consume SinkSite in build_taint_diag() to overwrite the
caller-site Finding.line with the callee's sink line when resolved via
summary. Phase 1 keeps behavior unchanged: scanning
tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs still produces the
same (wrong) line 10 finding.

Adds round-trip tests covering SinkSite solo, SsaFuncSummary with sink
sites, legacy-JSON default handling for both summary types, and merge
dedup.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(taint): phase 2/5 thread SinkSite into SsaTaintEvent and Finding

Plumb Phase 1's SinkSite through the event pipeline into Findings,
no output change yet.  SsaTaintEvent gains `primary_sink_site:
Option<SinkSite>`; when the main or callback sink-emission path has
non-empty `param_to_sink_sites`, filter to sites whose
`(line != 0) && (cap ∩ sink_caps != ∅)` and emit one event per
distinct site — the multi-primary collapse keeps each downstream
Finding single-primary.

Resolution: ResolvedSummary and SinkInfo gain mirror
`param_to_sink_sites` fields, populated from `SsaFuncSummary.param_to_sink`
(SSA + callback paths) and `FuncSummary.param_to_sink` (global paths).
Label, local-summary, and interop resolution paths leave the field
empty — they only ever had cap-level info to begin with.

Finding: new `primary_location: Option<SinkLocation>` with
`file_rel/line/col`.  `ssa_events_to_findings` maps
`event.primary_sink_site` → `Finding.primary_location`, filtering
cap-only sites (`line == 0`) to `None` so the (0,0) sentinel never
leaks to formatters.  Dedup key extended with the primary location
so multi-site events aren't collapsed back together.

Invariants (debug_assert!):
* every SinkSite reaching emission has `line != 0 && cap ∩ sink_caps
  != ∅` — enforced by the pick_primary_sink_sites* filters;
* every populated Finding.primary_location has `line != 0` AND
  non-empty `file_rel` — the cap-only → None translation upstream
  guarantees this.

Deliberately independent of `uses_summary`: that flag tracks whether
the *taint chain* used a summary, whereas primary attribution
requires only that the *sink* itself was summary-resolved.  A local
source reaching a cross-file sink produces `uses_summary=false`
alongside a populated primary_location — documented on
Finding.primary_location, covered by
`cross_file_sink_finding_carries_primary_location`.

build_taint_diag, SARIF/JSON/explanation formatters, and the
benchmark scorer remain untouched: finding.line still comes from
`cfg_graph[finding.sink]`, so cmdi_indirect.rs still reports line 10
and the benchmark's rs-cmdi-003 row still shows FN in the LOC column.

Tests: `cross_file_sink_finding_carries_primary_location` (proves
plumbing via a synthetic FuncSummary carrying a SinkSite at 42:5) and
`cross_file_sink_cap_only_site_leaves_primary_location_none`
(regression guard against cap-only sites surfacing).  All 1566 lib
tests + integration tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(output): phase 3/5 consume primary sink location in diag + SARIF

When a finding's primary_location (populated in phase 2 from a callee
summary's SinkSite) names the dangerous instruction inside a callee
body, attribute the diagnostic line to that location instead of the
caller's call site. The call site is demoted to a Call step in
flow_steps, and a synthetic Sink step at the primary location is
appended so analysts still see the full trace.

Changes:
- Add scan_root parameter to build_taint_diag so file_rel can be
  resolved back to an absolute path via a shared resolve_file_rel
  helper. Empty file_rel (single-file scans where namespace == "")
  resolves to the file under analysis.
- Extend SinkLocation with snippet, carried from the upstream
  SinkSite so the formatter needs no second file read.
- Relax the ssa_events_to_findings debug_assert to allow empty
  file_rel, which is valid when scan root equals the file itself.
- SARIF: emit data-flow as codeFlows[0].threadFlows[0].locations[];
  locations[0] already reflects the primary sink position via the
  updated diag line/col.

Acceptance: scan on tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs
now reports line 5 (Command::new) as the primary sink, with the call
site at line 10 visible in flow_steps.

Two expect.json fixtures updated (must_match line_range widened):
- javascript/taint/context_sensitive_call: 12-14 -> 7-14 (line 8 is
  the real sink inside run()).
- rust/cfg/closure_async: 10-10 -> 10-11 (line 11 is Command::new
  inside the closure).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(bench): phase 4/5 validate primary sink attribution across corpus

Extend the benchmark scorer and ground truth to lock in phase 3's
primary-location behavior, and add fixtures that exercise the new
capability end-to-end.

Scorer (tests/benchmark_test.rs):
- Add optional `expected_call_site_lines: Option<Vec<[usize; 2]>>` on
  Case. When present, score_location_level additionally requires at
  least one flow_step in the finding's evidence trace to fall within
  ±2 of the call-site range. When absent, the check is skipped —
  fully forward-compatible with existing fixtures.
- Retain ±2 tolerance on expected_sink_lines (compared against the
  now-primary Diag.line post-phase-3).

Ground truth edits:
- rs-cmdi-cross-001: expected_sink_lines [8,8] -> [9,9]. Line 8 is the
  transform::wrap call site (a cross-file propagator, not a sink);
  line 9 is Command::new, the real sink. The ±2 tolerance happened to
  mask this stale attribution but it was semantically wrong — phase 4
  is the right time to correct it. Also adds expected_call_site_lines
  [8,8] so the new field is exercised on an existing cross-file case.
- rs-cmdi-003: adds expected_call_site_lines [10,10] (run_cmd call).
  This fixture's sink (Command::new inside run_cmd at line 5) was the
  motivating case for phases 1-3; adding the call-site assertion
  guards against regression to caller-line attribution.

New fixtures:
- rust/cmdi/cmdi_indirect_multisink.rs (rs-cmdi-009): helper run_both
  takes two tainted params and invokes two Command sinks on
  consecutive lines. Locks in that primary line lands inside the
  helper (lines 5-6), not at the caller (line 12). Notes document
  that SinkSite is currently one-per-callee so both findings today
  collapse onto the first sink; expected_sink_lines=[5,6] and
  expected_call_site_lines=[12,12] stay valid either way.
- python/cmdi/cross_indirect_sink/{app.py,helper.py} (py-cmdi-cross-
  004): sink os.system lives in helper.py (cross-file), caller in
  app.py reads env source and calls run_cmd. Verifies phase 3's
  cross-file primary attribution: Diag.path = helper.py, Diag.line =
  5, with app.py:7 recorded in flow_steps as a Call step.

Acceptance:
- `cargo test --test benchmark_test -- --ignored --nocapture` passes.
- rs-cmdi-003 is TP/TP/TP (the target flip FN->TP at LOC). All
  pre-existing TP/TP/TP fixtures remain TP/TP/TP; 2 new fixtures are
  TP/TP/TP.
- Aggregate rule-level: TP=158 FP=10 FN=1 TN=97, P=0.940 R=0.994
  F1=0.966 on the 266-case corpus (was TP=156 FP=10 FN=1 TN=97 on
  264 pre-phase-4, delta is the +2 new cases both resolving TP).
- Full `cargo test` green (1566 lib tests + all integration tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(taint): phase 5/5 lock Finding.primary_location contract via regression test

Add a regression test in src/taint/ssa_transfer.rs that wires up a synthetic
SsaFuncSummary with a SinkSite at other.rs:42:10 and drives the three
emission stages (pick_primary_sink_sites → emit_ssa_taint_events →
ssa_events_to_findings) against a minimal caller SSA body.  Asserts the
resulting Finding.primary_location is exactly that triple.

The existing integration tests in src/taint/tests.rs cover the coarse
FuncSummary path end-to-end through analyse_file.  This test locks in the
lower-level SSA-side plumbing so a future refactor that silently drops the
site between pick → emit → findings fails here rather than only at the
benchmark layer.

Also refreshes tests/benchmark/results/latest.json (timestamp only; rs-cmdi-003
remains TP/TP/TP and the aggregate P/R/F1 are unchanged from phase 4).

Closes the primary sink-location attribution feature (phases 1-5/5):
* Phase 1 — SinkSite data model on summaries.
* Phase 2 — SinkSite threaded into SsaTaintEvent and Finding.
* Phase 3 — diag + SARIF consume primary_location.
* Phase 4 — benchmark validates primary_call_site_lines across corpus.
* Phase 5 — regression test locks the event→finding contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: clean up formatting and improve readability in multiple files

* refactor: simplify type definition for deduplication key in findings

* test(harness): add must_not_match expectation for FP regression guards

Extends ExpectedFinding with must_not_match field that asserts a
diagnostic must NOT fire — presence is a hard failure. Non-consuming
scan so it coexists with must_match entries on the same rule_id.
Adds forbidden_violations accumulator and updates summary line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(regression): update expectations to ensure must_not_match for various taint and resource leak rules

* feat: implement auto-seeding for JS/TS handler parameters to enhance taint tracking

* feat: update switch statement handling to improve control flow analysis

* feat: implement promisify alias handling for JS/TS to enhance taint tracking

* feat: enhance taint tracking by refining expectation handling and adding mode filtering

* feat: refine SQL handling in stream processing and enhance auto-seeding for handler parameters

* feat: update taint tracking rules to enforce full mode matching and improve flow analysis

* feat: enhance Ruby subshell handling to improve taint tracking and flow analysis

* feat: update xss_response expectations to refine taint flow analysis and enhance regression guarding

* feat: refine framework detection and update expectation handling for Echo and Sinatra

* feat: implement max_count for taint tracking expectations and deduplicate findings

* feat: add strict_unexpected handling for taint-unsanitised-flow in expectation files

* feat: enhance deduplication of taint-unsanitised-flow findings by collapsing based on line and severity

* feat: add strict_unexpected handling for taint-unsanitised-flow in multiple expectation files

* feat: add structural invariant checks for SSA bodies

* feat: ensure deterministic phi emission order using BTreeSet

* feat: enhance handling of terminators to ensure authoritative flow through successor edges

* feat: enhance Goto terminator handling to ensure all successors are marked executable

* feat: refactor code for improved readability and organization

* feat: simplify predicate checks and enhance readability in SSA handling

* feat: implement per-file parse timeout and enhance file size handling

* feat: migrate analysis engine toggles from environment variables to configuration file

* feat: remove unnecessary whitespace in hostile_input_tests.rs

* feat: remove unnecessary whitespace in hostile_input_tests.rs

* feat: update dependencies and enhance documentation on language maturity

* feat: enhance security headers and improve request body limits

* feat: implement sink capability bits for deduplication and enhance evidence tagging

* feat: implement dynamic activation handling for gated sinks and enhance validation logic

* feat: enhance configuration documentation and clarify inline analysis cache behavior

* feat: implement panic recovery during analysis to continue scans past errors

* feat: add expectations configuration for taint analysis and performance metrics

* feat: enhance error handling and logging during file reading and mutex locking

* feat: add cross-file body loading tests and plumbing for CF-1 phase

* feat: implement cross-file k=1 context-sensitive inline taint analysis with new tests and fixtures

* feat: implement indexed-scan parity in cross-file inline analysis with new dropdown and copy functionality

* feat: enhance classification span handling in CFG and AST for improved source attribution

* feat: add new Express routes for handling user input and telemetry data

* feat: implement ternary expression handling in CFG with diamond structure for JS/TS

* feat: implement Phase CF-3 abstract-domain transfer channels in summaries

* feat: add support for string-prefix transfer in cross-file calls and update tests

* docs: reduce RESULTS.md doc size

* feat: implement Phase CF-4 per-return-path summary decomposition with tests

* feat: update parameter handling in pass1 and refactor SsaFuncSummary initialization

* feat: implement Phase CF-5 for cross-file SCC joint fixed-point convergence with new flags and tests

* feat: implement Phase CF-6 with parameter-granularity points-to summaries and associated tests

* refactor: update comments and documentation for clarity and consistency

* style: format code for consistency and readability

* refactor: simplify verdict handling and improve edge checking logic

* refactor: optimize path and identifier collection by avoiding unnecessary cloning

* chore: update Cargo.toml for Rust version 1.85 and add ignored files; modify CHANGELOG and README for clarity on state analysis defaults

* refactor: update documentation and improve clarity in configuration files

* refactor: update documentation and improve clarity in configuration files

* feat: add JS/TS pass-2 convergence tests and expectations configuration

* feat: add Phase 5 regression tests for inline cache origin attribution and update related logic

* feat: implement Phase 7 deduplication and alternative path linking for taint findings

* feat: implement structural DFS index for anonymous functions and update naming conventions

* feat: add Phase 8 regression tests for container-element taint in JS and Python

* feat: add engine-depth profiles and explain-engine option for CLI

* feat: update expectations and add new README fixtures for multi-file scan regression

* feat: implement Phase 11 callback-alias and factory patterns with regression tests

* feat: implement Terminator::Switch for multi-way dispatch and add regression tests

* feat: add real-CVE benchmark fixtures for CVE-2023-48022, CVE-2019-14939, and CVE-2023-26159 with corresponding patched variants

* refactor: extract cfg and ssa_transfer to submodules

* refactor: cargo fmt

* refactor: remove unnecessary blank line in cfg_tests.rs

* refactor: remove unnecessary planning file

* chore: update Rust version to 1.88 and bump dependencies in Cargo files

* feat: enhance triage UI with new layout and controls, update README for clarity

* feat: enhance triage UI with new layout and controls, update README for clarity

* chore: remove outdated section from README for version 0.5.0

* docs: improve clarity and consistency in README content

* chore: add "GPL-3.0-or-later" to license options in about.toml

* chore: update license handling in about.toml and check-licenses.mjs

* style: format code for improved readability in TriagePage component

* style: format code for improved readability in TriagePage component

* chore: enhance license handling and improve body_id scoping in seed lookup

* feat: introduce owner and parent body IDs for enhanced seed scoping

* feat: implement direction-aware engine provenance with new CLI flag for strict CI gating

* feat: add Undef SSA operation for improved control-flow handling

* style: improve code formatting for consistency and readability in multiple files

* feat: add 16-function chain SCC across multiple files for enhanced analysis

* style: simplify code formatting for improved readability in multiple files

* fix: update CapHitReason default implementation and improve README clarity

* docs: enhance README with detailed explanations of taint analysis and limitations

* docs: refine README for clarity and consistency in taint analysis section

* style: improve code formatting for better readability in NewScanModal and scans

* fix: update cargo-about command to use --offline for deterministic license generation

* fix: update cargo-about command to use --offline for deterministic license generation

* ci: add step to prime cargo registry cache for deterministic license generation

* feat: add support for non-sink collections in authorization analysis

* feat: enhance authorization checks with row-level ownership equality and binding tracking

* feat: implement self-scoped user handling and enhance ownership checks

* refactor: simplify assertions and formatting in authorization analysis tests

* fix: normalize line endings in THIRDPARTY-LICENSES.html generation and update README with AI disclosure

* docs: update AI disclosure section for clarity and conciseness

* feat: add AI Contribution Policy and update contributing guidelines for AI assistance disclosure

* feat: enhance authorization analysis with SSA-derived variable type classification

* feat: implement auth_finding_to_diag function for enhanced security diagnostics

* feat: add args_value_refs to CallSite struct for enhanced argument tracking

* feat: add args_value_refs to CallSite struct for enhanced argument tracking

* feat: add direction-aware engine provenance with LossDirection classification and new CLI flag

* feat: simplify strip_cap_from_call_args call by removing unnecessary line breaks

* feat: enhance error message handling in cli_validation_tests for better Windows compatibility

* feat: optimize release profile settings in Cargo.toml and update CodeQL configuration

* feat: enhance release build process with SBOM generation and SLSA provenance

* feat: update actions/checkout and actions/setup-node to v6, enhance CLI options, and improve auth-check summaries

* feat: introduce PathFact handling for path safety checks and rejection logic

* feat: introduce PathFact handling for path safety checks and rejection logic

* feat: update benchmark data and enhance path sanitization logic with new safety checks

* feat: document AI assistance in frontend UI development and human review process

* feat: add return path facts for enhanced path safety checks and update documentation

* chore: update release date for version 0.5.0 in CHANGELOG.md

* chore: clean up ci.yml by removing outdated comments and clarifying steps

* feat: implement cross-language path sanitizers and validators for enhanced security

* feat: enhance SSA value usage tracking by including block terminators and improve path safety checks

* feat: enhance switch statement handling by adding per-case path constraints and support for exclusive cases

* refactor: simplify conditional formatting and improve code readability in executor and lower modules

* feat: add vulnerable examples for various languages demonstrating authentication and sanitization issues

* feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers

* feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers

* feat: add transform classifiers for Java, Go, and Ruby with corresponding tests

* refactor: clarify comments on reassign-to-constant idiom and sink behavior in guards.rs

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Eli Peter 2026-04-25 17:59:11 -04:00 committed by GitHub
parent c4ce08b452
commit 41128177d2
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2144 changed files with 201812 additions and 8927 deletions

View file

@ -1,161 +1,130 @@
# CFG Structural Analysis
# CFG structural analysis
## Summary
Nyx builds an intra-procedural control-flow graph per function and checks structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error paths terminate before reaching dangerous code.
Nyx builds an intra-procedural control-flow graph (CFG) for each function and analyzes structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error-handling code terminates properly.
These detectors use **dominator analysis** — they check whether a guard node dominates (must execute before) a sink node on the CFG.
These detectors use dominator analysis. A guard dominates a sink when the guard must execute before the sink on every path from entry.
## Rule IDs
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `cfg-unguarded-sink` | High/Medium | Sink reachable without a dominating guard or sanitizer |
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth check |
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
| `cfg-unreachable-source` | Low | Source in unreachable code |
| `cfg-error-fallthrough` | High/Medium | Error check doesn't terminate; dangerous code follows |
| `cfg-resource-leak` | Medium | Resource acquired but not released on all exit paths |
| `cfg-lock-not-released` | Medium | Lock acquired but not released on all exit paths |
| Rule ID | Severity |
|---|---|
| `cfg-unguarded-sink` | High/Medium |
| `cfg-auth-gap` | High |
| `cfg-unreachable-sink` | Medium |
| `cfg-unreachable-sanitizer` | Low |
| `cfg-unreachable-source` | Low |
| `cfg-error-fallthrough` | High/Medium |
| `cfg-resource-leak` | Medium |
| `cfg-lock-not-released` | Medium |
## What It Detects
## What it detects
### Unguarded sinks (`cfg-unguarded-sink`)
A sink call (e.g. `system()`, `eval()`, `Command::new()`) is reachable from the function entry without passing through a guard or sanitizer that matches the sink's capability.
**`cfg-unguarded-sink`**: A sink call (`system`, `eval`, `Command::new`, `db.execute`, etc.) is reachable from function entry without passing through any guard or sanitizer that matches the sink's capability.
### Auth gaps (`cfg-auth-gap`)
A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`) reaches a privileged sink (shell execution, file I/O) without a prior call to an authentication function (`is_authenticated`, `require_auth`, `check_permission`, etc.).
**`cfg-auth-gap`**: A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`, language-dependent) reaches a privileged sink (shell execution, file I/O) without a preceding authentication call.
### Unreachable security code (`cfg-unreachable-*`)
Sinks, sanitizers, or sources in dead code branches. This often indicates a refactoring error where security-critical code was accidentally made unreachable.
**`cfg-unreachable-*`**: Sinks, sanitizers, or sources in dead code. Usually signals a refactoring error that silently disabled security-relevant logic.
### Error fallthrough (`cfg-error-fallthrough`)
An error check (null check, error return check) does not terminate the function or loop back. Execution continues to a dangerous operation on the error path.
**`cfg-error-fallthrough`**: An error-handling branch (null check, error-return check) does not terminate. Execution falls through to a dangerous operation on the error path.
### Resource leaks (`cfg-resource-leak`, `cfg-lock-not-released`)
A resource acquisition call (e.g. `File::open`, `fopen`, `socket`, `Lock`) is not matched by a release call (e.g. `close`, `fclose`, `unlock`) on all exit paths from the function.
**`cfg-resource-leak`, `cfg-lock-not-released`**: A resource acquisition (`File::open`, `fopen`, `socket`, `Lock`) is not matched by a release on every exit path from the function.
## What It Cannot Detect
## What it can't detect
- **Inter-procedural guards**: If authentication is checked in a middleware function that calls this handler, the CFG detector cannot see it. It only analyzes one function at a time.
- **Dynamic dispatch**: Virtual method calls, function pointers, and closures are opaque to the CFG.
- **Complex guard patterns**: Only recognized guard function names are checked. Custom validation logic (e.g. `if password == expected`) is not recognized as a guard.
- **Correct sanitization**: The detector checks that *some* guard dominates the sink, not that the guard is *correct*. A guard that always passes would suppress the finding.
- **Cross-function resource flows**: If a file handle is opened in one function and closed in another, the detector will report a leak in the first function.
- **Inter-procedural guards.** Middleware-level auth, helper functions that internally call auth, and cleanup performed in a caller are invisible.
- **Dynamic dispatch.** Virtual calls, function pointers, closures resolve to no specific callee.
- **Correctness of guards.** The detector checks *a* guard dominates the sink. It cannot check the guard is correct. A no-op `if true {}` would suppress the finding.
- **Custom validation logic.** Only recognised guard names are checked. `if password == expected` is not a recognised guard.
- **Cross-function resource flows.** If a file handle opens in one function and closes in another, the opener gets flagged as a leak. This is the largest source of FPs on factory-pattern code.
## Common False Positives
## Common false positives
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| Framework-level auth middleware | Handler doesn't call auth directly | Document as expected; suppress with severity filter |
| Resource closed via RAII/defer | Implicit cleanup not visible to CFG | Currently not detected; known limitation |
| Custom guard function name | Function not in the recognized guard list | Add the function name as a sanitizer in config |
| Test handlers | Intentionally skip auth in tests | Default non-prod downgrade reduces severity; or exclude test dirs |
| Scenario | Why | Mitigation |
|---|---|---|
| Framework middleware auth | Handler doesn't call auth directly | Expected; suppress with severity filter or exclude handlers |
| RAII / defer cleanup | Implicit release not visible to CFG (partially handled for Rust Drop and Go defer) | Known limitation |
| Custom guard name | Function not in the recognised guard list | Add it as a sanitizer rule in config |
| Test handlers | Intentional lack of auth | Default non-prod downgrade reduces severity; or exclude test dirs |
## Common False Negatives
## Common false negatives
| Scenario | Why it's missed |
|----------|----------------|
| Auth in called function | Cross-function guards not tracked |
| Guard via type system | Type-level guarantees (e.g. Rust's `AuthenticatedUser` wrapper) not analyzed |
| Resource closed in finally/defer | Some cleanup patterns not recognized |
| Scenario | Why |
|---|---|
| Auth in a called helper | Cross-function guards not tracked |
| Type-system guards | Rust `AuthenticatedUser<T>` wrappers, typestate patterns not analysed |
| Cleanup in `finally`/`ensure`/`defer` in callers | Cross-function cleanup not tracked |
## Confidence Signals
## Tuning
| Signal | Meaning |
|--------|---------|
| **Evidence lists guard nodes** | Shows which guards were checked and found missing |
| **Sink has high capability** | Shell execution or file I/O sinks are higher risk |
| **Handler detection matched** | Web handler identification is based on conventional parameter names |
### Recognised guard names
## Tuning and Noise Controls
Nyx accepts these patterns as dominating guards:
### Add custom guards/sanitizers
| Pattern | Applies to |
|---|---|
| `validate*`, `sanitize*` | All sinks |
| `check_*`, `verify_*`, `assert_*` | All sinks |
| `shell_escape` | Shell sinks |
| `html_escape` | HTML/XSS sinks |
| `url_encode` | URL sinks |
| `which` | Shell execution (binary lookup) |
### Recognised auth names
| Pattern | Language |
|---|---|
| `is_authenticated`, `require_auth`, `check_permission`, `authorize`, `authenticate`, `require_login`, `check_auth`, `verify_token`, `validate_token` | Cross-language |
| `middleware.auth`, `auth.required` | Go |
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |
For Rust auth checks (`require_*`, ownership equality, row-level checks), see [auth.md](../auth.md).
### Custom guards
```toml
[[analysis.languages.python.rules]]
matchers = ["validate_request", "check_csrf"]
kind = "sanitizer"
cap = "all"
cap = "all"
```
### Add auth rules
Auth checks are recognized by function name. If your codebase uses non-standard names:
### Custom auth functions
```toml
[[analysis.languages.javascript.rules]]
matchers = ["ensureLoggedIn", "requirePermission"]
kind = "sanitizer"
cap = "all"
```
### Filter results
```bash
# Skip low-severity unreachable findings
nyx scan . --severity ">=MEDIUM"
```
### Disable CFG analysis
```bash
nyx scan . --mode ast # AST patterns only
cap = "all"
```
## Examples
### Unguarded sink
Unguarded sink:
```go
func handler(w http.ResponseWriter, r *http.Request) {
cmd := r.URL.Query().Get("cmd")
exec.Command("sh", "-c", cmd).Run() // cfg-unguarded-sink: no guard dominates
exec.Command("sh", "-c", cmd).Run() // cfg-unguarded-sink
}
```
### Auth gap
Auth gap:
```javascript
app.get('/admin/delete', (req, res) => {
// No is_authenticated() call
db.execute("DELETE FROM users WHERE id = " + req.params.id);
// cfg-auth-gap: web handler reaches privileged sink without auth
// No auth call
db.execute("DELETE FROM users WHERE id = " + req.params.id); // cfg-auth-gap
});
```
### Resource leak
Resource leak:
```c
void process() {
FILE *f = fopen("data.txt", "r"); // acquire
FILE *f = fopen("data.txt", "r");
if (error) {
return; // cfg-resource-leak: f not closed on this path
return; // cfg-resource-leak: f not closed on this path
}
fclose(f);
}
```
## Guard Rules
Nyx recognizes these function name patterns as guards:
| Pattern | Applies to |
|---------|-----------|
| `validate*`, `sanitize*` | All sinks |
| `check_*`, `verify_*`, `assert_*` | All sinks |
| `shell_escape` | Shell execution sinks |
| `html_escape` | HTML/XSS sinks |
| `url_encode` | URL sinks |
| `which` | Shell execution (binary lookup) |
### Auth rules
| Pattern | Category |
|---------|----------|
| `is_authenticated`, `require_auth`, `check_permission` | Common |
| `authorize`, `authenticate`, `require_login` | Common |
| `check_auth`, `verify_token`, `validate_token` | Common |
| `middleware.auth`, `auth.required` | Go |
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |

View file

@ -1,111 +1,84 @@
# AST Pattern Matching
# AST patterns
## Summary
AST patterns are tree-sitter queries that match dangerous structural shapes in source. No dataflow, no CFG. A match means the construct is present; it's not proof the construct is exploitable.
AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.
AST patterns run in all analysis modes, including `--mode ast` (where they are the only active detector).
Patterns run in every analysis mode. In `--mode ast` they're the only active detector.
## Rule IDs
Pattern rule IDs follow the format `<lang>.<category>.<specific>`:
```
rs.memory.transmute
js.code_exec.eval
py.deser.pickle_loads
c.memory.gets
java.sqli.execute_concat
<lang>.<category>.<name>
```
See the [Rule Reference](../rules/index.md) for a complete listing per language.
Examples: `js.code_exec.eval`, `py.deser.pickle_loads`, `c.memory.gets`, `java.sqli.execute_concat`.
## Pattern Tiers
Full list: [rules.md](../rules.md).
| Tier | Meaning | Examples |
|------|---------|---------|
| **A** | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
| **B** | Query includes a heuristic guard | SQL `execute` with concatenated arg, `printf(var)` with non-literal format |
## Tiers
Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, `java.sqli.execute_concat` only fires when `executeQuery()` receives a `binary_expression` (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
| Tier | Meaning |
|---|---|
| **A** | Structural presence alone is high-signal. `gets`, `eval`, `pickle.loads`, `mem::transmute` |
| **B** | Pattern includes a tree-sitter heuristic guard. Example: `java.sqli.execute_concat` only fires when `executeQuery` receives a `binary_expression` (string concatenation), not a literal or a parameterized statement |
## What It Detects
## Categories
### By category
| Category | Examples |
|---|---|
| CommandExec | `system`, `os.system`, `Runtime.exec`, backticks |
| CodeExec | `eval`, `Function`, PHP `assert("string")`, `class_eval`, `instance_eval` |
| Deserialization | `pickle.loads`, `yaml.load`, `Marshal.load`, `readObject`, `unserialize` |
| SqlInjection | `executeQuery`/`Query`/`execute` with concatenated argument (Tier B) |
| PathTraversal | PHP `include $var` |
| Xss | `document.write`, `outerHTML`, `insertAdjacentHTML`, `getWriter().print` |
| Crypto | `md5`, `sha1`, `Math.random`, `java.util.Random` for security use |
| Secrets | hardcoded API keys (Go, JS, TS) |
| InsecureTransport | `InsecureSkipVerify`, `fetch("http://...")` |
| Reflection | `Class.forName`, `Method.invoke`, `send`, `constantize` |
| MemorySafety | `transmute`, `unsafe`, `gets`, `strcpy`, `sprintf` |
| Prototype | `__proto__` assignment, `Object.prototype.*` |
| Config | CORS dynamic origin, `rejectUnauthorized: false`, insecure session settings |
| CodeQuality | `unwrap`, `panic!`, `as any` |
| Category | What it matches | Example languages |
|----------|----------------|-------------------|
| **CommandExec** | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
| **CodeExec** | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
| **Deserialization** | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
| **SqlInjection** | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
| **PathTraversal** | File inclusion with variable path | PHP (`include $var`) |
| **Xss** | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
| **Crypto** | Weak cryptographic algorithms | All languages (`md5`, `sha1`, `Math.random()`) |
| **Secrets** | Hardcoded credentials | Go (variable name matching) |
| **InsecureTransport** | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
| **Reflection** | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
| **MemorySafety** | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
| **Prototype** | Prototype pollution | JS/TS (`__proto__` assignment) |
| **CodeQuality** | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |
## What patterns can't tell you
## What It Cannot Detect
- **Dataflow.** `eval("1+1")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`. The taint detector is the one that distinguishes them.
- **Reachability.** A pattern in dead code matches identically.
- **Semantics.** `strcpy(dst, src)` always matches, regardless of buffer sizes.
- **Indirect calls.** `let e = eval; e(input)` doesn't match `eval`.
- **Aliased imports.** `from os import system as s; s(cmd)` won't match `system`.
- **Macro expansions.** Tree-sitter parses the macro call site, not the expansion.
- **Dataflow**: Patterns don't track whether the dangerous function receives tainted input. `eval("hello")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`.
- **Context**: Patterns don't understand whether the code is reachable, guarded, or inside a test.
- **Semantics**: `strcpy(dst, src)` always matches — it cannot determine buffer sizes.
- **Indirect calls**: Function pointers, dynamic dispatch, and aliased references are invisible.
## Common false positives
## Common False Positives
| Scenario | Why | Mitigation |
|---|---|---|
| `eval("hardcoded literal")` | Pattern matches structure | Run `--mode cfg` to drop AST patterns and rely on taint |
| `unsafe` block with sound justification | Every `unsafe` matches `rs.quality.unsafe_block` | Filter `>=MEDIUM` (it's Medium) or accept the noise |
| `.unwrap()` in tests | Acceptable in test code | Default non-prod severity downgrade reduces it |
| `md5` for non-cryptographic checksums | Pattern can't see intent | Suppress with `--severity ">=MEDIUM"` or per-line `nyx:ignore` |
| SQL concat with trusted data (Tier B) | Heuristic can't verify the source | Taint is more precise; or convert to a parameterized query |
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis won't flag this — use `--mode cfg` for fewer false positives |
| `unsafe` block in Rust with sound justification | All unsafe blocks match | Filter with `--severity ">=MEDIUM"` (unsafe_block is Medium) |
| `.unwrap()` in test code | Acceptable in tests | Default non-prod downgrade reduces severity |
| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter Low severity or add to exclusions |
| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |
## Confidence levels
## Common False Negatives
Every AST pattern carries an explicit confidence:
| Scenario | Why it's missed |
|----------|----------------|
| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
| SQL injection via ORM query builder | No pattern for ORM-specific query building |
| Imported function under different name | `from os import system as s; s(cmd)` — pattern looks for `system` |
| Confidence | Use |
|---|---|
| High | Inherently dangerous construct with no safe usage. `gets`, `pickle.loads`, `eval` with no guard |
| Medium | Likely issue, context may change the call. SQL concatenation (Tier B), `unsafe` blocks, `exec` |
| Low | Heuristic. Often appears in safe code. Weak crypto for checksums, `unwrap` outside tests, `Math.random` |
## Confidence Signals
`--min-confidence medium` (or `output.min_confidence = "medium"`) drops Low-confidence matches.
| Signal | Meaning |
|--------|---------|
| **Tier A** | High confidence — the function itself is dangerous |
| **Tier B** | Moderate confidence — heuristic guard reduces false positives |
| **High severity** | Critical vulnerability class (command exec, deserialization) |
| **Low severity** | Informational (weak crypto, code quality) |
| **Non-prod path** | Finding in test/vendor code — downgraded by default |
## Tuning and Noise Controls
### Severity filtering
## Tuning
```bash
# Skip code-quality and weak-crypto findings
nyx scan . --severity ">=MEDIUM"
# Only critical findings
nyx scan . --severity HIGH
nyx scan . --severity ">=MEDIUM" # drop Low-tier patterns
nyx scan . --severity HIGH # banned APIs and code-exec only
nyx scan . --mode cfg # drop AST patterns; keep taint + state + cfg
```
### Use taint for precision
```bash
# Taint-only mode: only report findings with confirmed dataflow
nyx scan . --mode cfg
```
### Exclude directories
```toml
[scanner]
excluded_directories = ["node_modules", "vendor", "generated"]
@ -113,37 +86,29 @@ excluded_directories = ["node_modules", "vendor", "generated"]
## Examples
### Tier A — structural presence
Tier A, structural presence:
**C: Banned function**
```c
char buf[64];
gets(buf); // c.memory.gets — always dangerous, no safe usage
gets(buf); // c.memory.gets
```
**Python: Unsafe deserialization**
```python
import pickle
data = pickle.loads(user_input) # py.deser.pickle_loads
data = pickle.loads(user_input) // py.deser.pickle_loads
```
### Tier B — heuristic-guarded
Tier B, heuristic guard:
**Java: SQL concatenation**
```java
// Fires: concatenated argument
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
// java.sqli.execute_concat
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId); // java.sqli.execute_concat
// Does NOT fire: parameterized query
// Does not fire: parameterized
stmt.executeQuery(preparedSql);
```
**C: Format string**
```c
// Fires: variable as first argument
printf(user_input); // c.memory.printf_no_fmt
// Does NOT fire: literal format string
printf("%s", user_input);
printf(user_input); // c.memory.printf_no_fmt: fires (variable as fmt)
printf("%s", user_input); // does not fire (literal fmt)
```

View file

@ -1,26 +1,22 @@
# State Model Analysis
# State model analysis
## Summary
Tracks resource lifecycle and authentication state through a function. Detects use-after-close, double-close, leaks, and unauthenticated access to privileged operations.
Nyx's state model analysis tracks **resource lifecycle** and **authentication state** through a function using monotone dataflow over bounded lattices. It detects use-after-close bugs, double-close bugs, resource leaks, and unauthenticated access to privileged operations.
State analysis is **opt-in** — enable it with `scanner.enable_state_analysis = true` in config. It requires `mode = "full"` or `mode = "cfg"`.
State analysis is on by default. Disable with `scanner.enable_state_analysis = false`. It runs in `--mode full` and `--mode taint`; AST-only mode skips it.
## Rule IDs
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `state-use-after-close` | High | Variable used after being closed/released |
| `state-double-close` | Medium | Resource closed twice |
| `state-resource-leak` | Medium | Resource opened but never closed (definite) |
| `state-resource-leak-possible` | Low | Resource may not be closed on all paths |
| `state-unauthed-access` | High | Privileged operation reached without authentication |
| Rule ID | Severity |
|---|---|
| `state-use-after-close` | High |
| `state-double-close` | Medium |
| `state-resource-leak` | Medium |
| `state-resource-leak-possible` | Low |
| `state-unauthed-access` | High |
## What It Detects
## What it detects
### Use-after-close (`state-use-after-close`)
A resource transitions to the CLOSED state (via `close()`, `fclose()`, `disconnect()`, etc.), then a use operation (`read`, `write`, `send`, `recv`, `query`, etc.) is performed on it.
**`state-use-after-close`**: Resource transitions to CLOSED (via `close`, `fclose`, `disconnect`, …), then a use operation happens on it.
```c
FILE *f = fopen("data.txt", "r");
@ -28,147 +24,108 @@ fclose(f);
fread(buf, 1, 100, f); // state-use-after-close
```
### Double-close (`state-double-close`)
**`state-double-close`**: Resource closed twice. Crashes or undefined behaviour on most runtimes.
A resource is closed twice. This can cause crashes or undefined behavior.
**`state-resource-leak`**: Resource opened but never closed on any path through the function. Definite leak.
```python
f = open("data.txt")
f.close()
f.close() # state-double-close
```
**`state-resource-leak-possible`**: Resource closed on some paths but not others. Lower confidence; often an early-return error path.
### Resource leak (`state-resource-leak`)
**`state-unauthed-access`**: A function recognised as a web handler reaches a privileged sink without an auth call on the path.
A resource is opened but never closed on any path through the function. This is a definite leak.
A function counts as a web handler if its name starts with `handle_`, `route_`, or `api_` (sufficient on its own), or starts with `serve_`/`process_` and the file uses web-shaped parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, language-dependent). `main` is excluded.
```java
FileInputStream fis = new FileInputStream("data.txt");
process(fis);
// function exits without fis.close() — state-resource-leak
```
## Managed-resource suppression
### Possible resource leak (`state-resource-leak-possible`)
Several language-specific cleanup patterns suppress leak findings:
A resource is closed on some paths but not others.
| Pattern | Languages | Effect |
|---|---|---|
| RAII / Drop | Rust | All leak findings suppressed except `alloc`/`dealloc` |
| Smart pointers | C++ | `make_unique`/`make_shared` treated as managed; raw `new`/`malloc` still tracked |
| `defer` | Go | `defer f.Close()` suppresses leak at exit |
| `with` context manager | Python | `with open(f) as f:` suppresses leak for the bound name |
| try-with-resources | Java | TWR-bound resources suppressed |
```go
f, err := os.Open("data.txt")
if err != nil {
return // f not closed here
}
f.Close() // closed here
// state-resource-leak-possible on the error path
```
## What it can't detect
### Unauthenticated access (`state-unauthed-access`)
- **Cross-function resource ownership.** Open in one function, close in another, leak gets reported in the opener. The most common FP source for leak detection.
- **Factory / builder functions** that return a resource for the caller to manage.
- **Variable shadowing across scopes.** Same name in inner and outer scope shares one symbol; an inner close masks an outer leak.
- **Resources stored in collections.** Handles in arrays / maps / channels and cleaned up via iteration are not tracked.
- **Dynamic dispatch.** Close called via trait object or interface may not be recognised.
- **Type-state authentication.** `AuthenticatedRequest<T>` and similar Rust patterns are not recognised as auth.
A function identified as a web handler reaches a privileged sink (shell execution, file I/O) without any authentication check on the path.
## Common false positives
A function is identified as a web handler if:
1. Its name starts with `handle_`, `route_`, or `api_` (strong match — sufficient on its own), OR
2. Its name starts with `serve_` or `process_` AND any function in the file has web-like parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, etc., varying by language).
| Scenario | Why | Mitigation |
|---|---|---|
| Factory returns a resource | Caller owns it | Known limitation |
| Framework-managed handles | Connection pool, request scope | Exclude framework code or downgrade |
| Variable name shadowing | Same name reused | Known limitation |
The function name `main` is explicitly excluded.
## Per-language detection
```javascript
app.post('/admin/exec', (req, res) => {
// No auth check
exec(req.body.command); // state-unauthed-access
});
```
| Language | Leak | Double-close | Use-after-close | Notes |
|---|---|---|---|---|
| C | yes | yes | yes | `fopen`/`fclose`, `malloc`/`free`, `pthread_mutex_*` |
| C++ | yes | yes | yes | C pairs plus `new`/`delete`; smart pointers suppressed |
| Python | yes | yes | yes | `with` suppressed; `open`, `socket`, `connect` |
| Go | yes | yes | yes | `defer` suppressed; `os.Open` / `.Close` |
| Rust | unsafe only | n/a | n/a | RAII suppresses everything except `alloc`/`dealloc` |
| JavaScript | yes | yes | partial | `fs.openSync`/`closeSync` |
| TypeScript | yes | yes | partial | Same as JS |
| PHP | yes | yes | partial | `fopen`/`fclose`, `curl_init`/`curl_close`, `mysqli_*` |
| Ruby | partial | partial | partial | `File.open`/`close`, `TCPSocket` |
| Java | limited | limited | limited | Constructor-callee matching is incomplete |
## What It Cannot Detect
- **Cross-function resource management**: Resources opened in one function and closed in another are not tracked. This is the most common source of false positives for leak detection.
- **RAII / defer / try-with-resources**: Implicit cleanup via language-level constructs (Rust's `Drop`, Go's `defer`, Java's try-with-resources, Python's `with`) is not recognized. These patterns will produce false-positive leak findings.
- **Dynamic dispatch**: If `close()` is called through a trait object or interface, it may not be recognized.
- **Authentication via type system**: Rust's type-state pattern (e.g. `AuthenticatedRequest<T>`) is not recognized as an auth check.
- **Complex authorization logic**: Only recognized function name patterns are checked.
## Common False Positives
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| RAII / Drop / defer cleanup | Implicit cleanup not visible | Known limitation; filter by severity |
| Resource returned to caller | Ownership transferred, not leaked | Known limitation |
| Framework-managed resources | Web framework manages connection lifecycle | Exclude framework-generated handlers |
| Try-with-resources (Java) | Language construct not parsed | Known limitation |
| Context manager (Python `with`) | Block construct not tracked | Known limitation |
## Common False Negatives
| Scenario | Why it's missed |
|----------|----------------|
| Resource closed in helper function | Cross-function tracking not implemented |
| Auth in middleware | Auth check happens before handler is called |
| Double-close via aliased reference | Alias analysis not performed |
## Confidence Signals
| Signal | Meaning |
|--------|---------|
| **Definite leak (state-resource-leak)** | Resource is never closed on any path — high confidence |
| **Use-after-close** | Read/write operation after explicit close — high confidence |
| **Web handler detected** | Entry point matched by parameter naming convention |
| **Possible leak (state-resource-leak-possible)** | Resource closed on some but not all paths — lower confidence |
## Tuning and Noise Controls
### Enable state analysis
```toml
[scanner]
enable_state_analysis = true
```
### Severity filtering
## Tuning
```bash
# Skip possible-leak findings (Low severity)
nyx scan . --severity ">=MEDIUM"
nyx scan . --severity ">=MEDIUM" # Skip "possible" leaks (Low)
```
### Exclude test files
```toml
[scanner]
excluded_directories = ["tests", "test", "spec"]
enable_state_analysis = true # default
excluded_directories = ["tests", "test", "spec"]
```
## Resource Pairs
## Recognised pairs
The state engine recognizes these acquire/release pairs per language:
The state engine ships these acquire/release pairs. Custom pairs are not yet configurable; file an issue if you need one.
### C/C++
| Acquire | Release | Resource |
|---------|---------|----------|
| `fopen` | `fclose` | File handle |
| `open` | `close` | File descriptor |
| `socket` | `close` | Socket |
| `malloc`, `calloc`, `realloc` | `free` | Heap memory |
| `pthread_mutex_lock` | `pthread_mutex_unlock` | Mutex |
**C / C++**
### Rust
| Acquire | Release | Resource |
|---------|---------|----------|
| `File::open`, `File::create` | `drop`, `close` | File handle |
| `TcpStream::connect` | `shutdown` | TCP connection |
| `lock`, `read`, `write` (on Mutex/RwLock) | `drop` | Lock guard |
| Acquire | Release |
|---|---|
| `fopen` | `fclose` |
| `open` | `close` |
| `socket` | `close` |
| `malloc`, `calloc`, `realloc` | `free` |
| `pthread_mutex_lock` | `pthread_mutex_unlock` |
| `new`, `new[]` *(C++)* | `delete`, `delete[]` |
### Java
| Acquire | Release | Resource |
|---------|---------|----------|
| `new FileInputStream` | `close` | File stream |
| `getConnection` | `close` | DB connection |
| `new Socket` | `close` | Socket |
**Rust**
### Go, Python, JavaScript, Ruby, PHP
Similar patterns with language-specific function names.
| Acquire | Release |
|---|---|
| `File::open`, `File::create` | `drop`, `close` |
| `TcpStream::connect` | `shutdown` |
| `lock`, `read`, `write` (Mutex/RwLock) | `drop` |
## Use Patterns (Trigger use-after-close)
**Java**
The following operations on a closed resource trigger `state-use-after-close`:
| Acquire | Release |
|---|---|
| `new FileInputStream` (and friends) | `close` |
| `getConnection` | `close` |
| `new Socket` | `close` |
Go, Python, JavaScript, Ruby, PHP follow language-idiomatic equivalents.
## Use-after-close triggers
These operations on a closed resource fire `state-use-after-close`:
```
read, write, send, recv, fread, fwrite, fgets, fputs, fprintf, fscanf,
@ -177,28 +134,3 @@ ungetc, query, execute, fetch, sendto, recvfrom, ioctl, fcntl,
strcpy, strncpy, strcat, strncat, memcpy, memmove, memset, memcmp,
strcmp, strncmp, strlen, sprintf, snprintf
```
## Technical Details
### Resource Lifecycle Lattice
```
UNINIT → OPEN → CLOSED
→ MOVED
```
States are tracked as bitflags, allowing the lattice to represent uncertainty (e.g. OPEN|CLOSED means the resource is open on some paths and closed on others).
### Leak Detection Scope
Resource leaks are checked at the file-level exit node and the **synthesized** function exit node (a single Return node that all early returns feed into). Early-return nodes are **not** checked individually — only the merged state at the function's synthesized exit is inspected. This prevents duplicate findings where an early-return path reports a definite leak while the merged exit correctly reports a possible leak.
This per-function exit inspection ensures that a variable leaked inside one function is not masked by a same-named variable that is properly closed in a subsequent function.
### Auth Level Lattice
```
Unauthed < Authed < Admin
```
Join semantics: take the minimum (conservative). If any path is unauthenticated, the result is unauthenticated.

View file

@ -1,10 +1,8 @@
# Taint Analysis
# Taint analysis
## Summary
Nyx tracks untrusted data from **sources** (where it enters the program) through assignments and function calls to **sinks** (where it's used dangerously). If the flow reaches a sink without passing a matching **sanitizer**, a finding fires.
Nyx's taint analysis tracks the flow of untrusted data from **sources** (where data enters the program) through **assignments and function calls** to **sinks** (where dangerous operations happen). If the data reaches a sink without passing through a **sanitizer** with matching capabilities, a finding is emitted.
The engine uses a monotone forward dataflow analysis over a finite lattice with guaranteed termination. Analysis is **intra-procedural with cross-file function summaries** — it does not follow calls into other functions but uses pre-computed summaries of their behavior.
The engine is a monotone forward dataflow over a finite lattice with guaranteed termination. It's flow-sensitive inside a function, and interprocedural across files via persisted per-function summaries.
## Rule ID
@ -12,191 +10,135 @@ The engine uses a monotone forward dataflow analysis over a finite lattice with
taint-unsanitised-flow (source <line>:<col>)
```
One rule ID covers all taint findings. The parenthetical identifies the specific source location.
One rule ID, parameterized by the source location. Suppressions can target either the base ID or the full string.
## What It Detects
## What it detects
- Environment variables flowing to shell execution (`env::var``Command::new`)
- User input flowing to code evaluation (`req.body``eval()`)
- File contents flowing to SQL queries (`fs::read_to_string``db.execute()`)
- Request parameters flowing to HTML output (`req.query``innerHTML`)
- Any source-to-sink flow where the sink's required capability is not stripped by a sanitizer
- User input flowing to shell execution: `req.body.cmd``child_process.exec`
- User input flowing to code evaluation: `req.query.code``eval`
- User input flowing to SQL: `request.args.get('id')``cursor.execute(f"... {id}")`
- Environment variables flowing to shell: `env::var("CMD")``Command::new("sh").arg("-c")`
- Request parameters flowing to HTML: `req.query.name``innerHTML`
- File contents flowing to privileged sinks: `fs::read_to_string``db.execute`
- Any other source-to-sink flow where the sink's required capability is not stripped along the way
## What It Cannot Detect
## What it can't detect
- **Inter-procedural flows without summaries**: If a function isn't summarized (e.g. from a third-party library without source), the taint engine cannot track data through it. It conservatively treats unknown callees as neither propagating nor sanitizing.
- **Flows through data structures**: Taint is tracked per-variable, not per-field. `obj.field = tainted; sink(obj.other_field)` may produce a false positive because taint attaches to `obj` as a whole.
- **Aliasing**: `let y = &x; sink(*y)` — the engine tracks `y` as a fresh variable, not an alias of `x`. This can cause false negatives.
- **Complex control flow**: The analysis is flow-sensitive (respects control flow within a function) but does not track taint through arbitrary loops with complex exit conditions.
- **Implicit flows**: Taint only follows explicit data flow, not information flow through branching (e.g. `if (secret) { x = 1 } else { x = 0 }` does not taint `x`).
- **Library calls without summaries.** If a callee has no summary (no source, binary-only dependency), Nyx treats it as neither propagating nor sanitizing. This is conservative for sanitization but lossy for propagation.
- **Taint through struct fields and containers.** Taint attaches to whole variables. `obj.field = tainted; sink(obj.other_field)` can produce a false positive because `obj` itself is tainted.
- **Aliasing.** `let y = &x; sink(*y)` tracks `y` separately from `x`. Can cause FNs.
- **Implicit flows.** Taint follows explicit data, not branching signal. `if (secret) x = 1 else x = 0` does not taint `x`.
- **Globals and statics across functions.** Not tracked across function boundaries.
## Common False Positives
## Common false positives
| Scenario | Why it happens | Mitigation |
|----------|---------------|------------|
| Custom sanitizer not recognized | Nyx only knows built-in and configured sanitizers | Add a custom sanitizer rule in config |
| Taint through struct fields | Variable-level (not field-level) tracking | No current mitigation; field sensitivity is planned |
| Dead code paths | The engine is path-insensitive within a function (it considers all paths) | Contradiction pruning catches some cases; path-validated findings score lower |
| Library wrappers | A wrapper around a dangerous function may re-introduce taint that was sanitized by the wrapper | Summarize the wrapper function or add it as a sanitizer |
| Scenario | Why | Mitigation |
|---|---|---|
| Custom sanitizer not recognised | Only built-in + configured sanitizers match | Add a custom sanitizer rule in config |
| Taint through struct fields | Variable-level tracking, not field-level | No fix yet; field-sensitivity is planned |
| Dead branches | Path-insensitive within a function | Constraint solving catches trivially infeasible combos; path-validated findings are scored lower |
| Library wrapper re-introduces taint | Wrapper opaque, or summary marks it as propagating | Summarize the wrapper explicitly or add it as a sanitizer |
## Common False Negatives
## Common false negatives
| Scenario | Why it's missed |
|----------|----------------|
| Third-party library calls | No summary available; callee treated as opaque |
| Taint through global/static variables | Not tracked across function boundaries |
| Taint through closures/callbacks in some languages | Closure capture analysis is limited (JS/TS/Ruby/Go anonymous functions ARE analyzed) |
| Flows spanning more than two files | Summary approximation loses precision at depth |
| Scenario | Why |
|---|---|
| Third-party library on the path | No summary available, callee treated opaquely |
| Globals / statics across function boundaries | Not tracked |
| Some closure captures | Closure analysis is limited. JS/TS/Ruby/Go anonymous functions passed as callbacks *are* analyzed as separate scopes |
| Very deep cross-file chains | Summary approximation loses precision at depth |
## Confidence Signals
## Confidence signals
These signals in the output indicate higher-confidence findings:
Higher confidence:
- Source + Sink both present in evidence with specific call locations.
- `source_kind: user_input` (direct attacker control).
- `path_validated: false`.
- No dominating guard on the path.
- Symex produced a witness string (rendered sink value visible in JSON/SARIF `evidence.symbolic.witness`).
| Signal | What it means |
|--------|--------------|
| **Evidence: Source + Sink** | Both endpoints identified with specific function names and locations |
| **Source kind = user input** | Source is directly controllable by an attacker (req.body, argv, etc.) |
| **path_validated = false** | No validation guard on the path — higher exploitability |
| **No guard_kind** | No dominating predicate check (null check, error check, etc.) |
| **High rank_score** | Multiple confidence signals combined |
Lower confidence:
- Path-validated taint (`path_validated: true`).
- Source is a database read or internal file (pre-validated at insertion is common).
- Engine note `ForwardBailed` / `PathWidened`. Use `--require-converged` to drop these in strict gates.
Lower-confidence:
## Tuning
| Signal | What it means |
|--------|--------------|
| **path_validated = true** | A validation predicate guards the path — may not be exploitable |
| **guard_kind = "ValidationCall"** | An explicit validation function was called before the sink |
| **Source kind = database** | Data from DB — may already be validated at insertion time |
## Tuning and Noise Controls
### Add custom sanitizers
If your codebase has a custom sanitizer that Nyx doesn't recognize:
### Custom sanitizer
```toml
# nyx.local
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml", "sanitizeInput"]
kind = "sanitizer"
cap = "html_escape"
kind = "sanitizer"
cap = "html_escape"
```
Or via CLI:
```bash
nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
```
Or: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`.
### Filter by severity
### Filter by severity or confidence
```bash
nyx scan . --severity HIGH # Only high-severity taint findings
nyx scan . --severity ">=MEDIUM" # Skip low-severity
nyx scan . --severity HIGH
nyx scan . --min-confidence medium
```
### Skip non-production code
By default, findings in `tests/`, `vendor/`, `build/` paths are downgraded one severity tier. To exclude them entirely, add to config:
```toml
[scanner]
excluded_directories = ["tests", "vendor", "build", "examples"]
```
### Disable taint (AST-only mode)
### Skip dataflow entirely
```bash
nyx scan . --mode ast
```
AST-only mode gives you structural pattern matches without taint.
In the browser UI, taint findings render as a numbered flow walk so you can see each hop the engine took:
<p align="center"><img src="../../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: HIGH taint-unsanitised-flow with numbered source → call → sink steps and How to fix guidance" width="900"/></p>
## Example
**Vulnerable code** (Rust):
Rust:
```rust
use std::env;
use std::process::Command;
fn main() {
let cmd = env::var("USER_CMD").unwrap(); // line 5: source
Command::new("sh").arg("-c").arg(&cmd).output(); // line 6: sink
let cmd = env::var("USER_CMD").unwrap(); // source
Command::new("sh").arg("-c").arg(&cmd).output(); // sink
}
```
**Finding**:
Finding:
```
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
Source: env::var("USER_CMD") at 5:15
Sink: Command::new("sh").arg("-c")
Score: 76
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
Unsanitised user input flows from env::var → Command::new
Source: env::var (5:15)
Sink: Command::new
```
**Safe alternative**:
```rust
use std::env;
use std::process::Command;
Safe rewrite: drop the shell and pass the value as argv directly (`Command::new(&cmd).output()`), or validate against an allowlist before passing to the shell.
fn main() {
let cmd = env::var("USER_CMD").unwrap();
// Use the value as a direct argument, not a shell command
Command::new(&cmd).output();
// Or validate against an allowlist
}
```
## Capabilities
## Technical Details
Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer only clears taint for the cap it declares. A sink only fires when the remaining taint still carries its required cap.
### Capability System
| Capability | Typical source | Typical sanitizer | Typical sink |
|---|---|---|---|
| `env_var` | `env::var`, `getenv`, `process.env` | | |
| `html_escape` | | `html.escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
| `shell_escape` | | `shlex.quote`, `shell_escape::escape` | `system`, `Command::new`, `eval` |
| `url_encode` | | `encodeURIComponent` | `location.href`, HTTP client URL arg |
| `json_parse` | | `JSON.parse` | |
| `file_io` | | `os.path.realpath`, `filepath.Clean` | `open`, `fs::read_to_string`, `send_file` |
| `fmt_string` | | | `printf(var)` |
| `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation |
| `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` |
| `ssrf` | | URL-prefix locks | `requests.get`, `fetch`, `HttpClient.send` |
| `code_exec` | | | `eval`, `exec`, `Function` |
| `crypto` | | | weak-algorithm constructors |
| `unauthorized_id` | request-bound scoped IDs (Rust auth analysis) | ownership check | row-level write |
| `all` | Sources typically use `all` so they match any sink | | |
Taint uses a bitflag capability system to match sources with appropriate sanitizers and sinks:
| Capability | Bit | Sources | Sanitizers | Sinks |
|-----------|-----|---------|------------|-------|
| `ENV_VAR` | 0x01 | `env::var`, `getenv` | — | — |
| `HTML_ESCAPE` | 0x02 | — | `html_escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
| `SHELL_ESCAPE` | 0x04 | — | `shell_escape` | `Command::new`, `system()`, `eval()` |
| `URL_ENCODE` | 0x08 | — | `encodeURIComponent` | `location.href` |
| `JSON_PARSE` | 0x10 | — | `JSON.parse` | — |
| `FILE_IO` | 0x20 | — | `filepath.Clean`, `basename`, `os.path.realpath` | `fopen`, `open`, `send_file`, `fs::read_to_string` |
| `FMT_STRING` | 0x40 | — | — | `printf(var)` |
Sources typically use `Cap::all()` to match any sink. A sanitizer strips specific capability bits. A finding fires when a tainted variable reaches a sink and the taint still has the matching capability bit set.
### Nested Function Analysis
The CFG builder recursively discovers function expressions nested inside call arguments:
- **JavaScript/TypeScript**: `function_expression`, `arrow_function` inside call arguments (e.g., Express route handlers)
- **Ruby**: `do_block` and `block` nodes (e.g., Sinatra `get '/path' do...end`)
- **Go**: `func_literal` (anonymous function literals)
Each nested function is walked as a separate scope and receives a unique identifier (`<anon@{byte_offset}>`) to prevent collisions when multiple anonymous functions exist in the same file.
### Chained Call Classification
Method chains like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments between `.` separators. The classifier matches against both the original text and the normalized form, enabling rules like `r.URL` to match within `r.URL.Query.Get`.
### Nested Call Fallback
When the outermost call in an expression doesn't classify as a source/sink, the engine tries all nested inner calls. This handles patterns like `str(eval(expr))` where `str` is not a sink but the inner `eval` is.
### Rust `if let` / `while let` Pattern Bindings
The CFG builder recognizes Rust `let_condition` nodes inside `if` and `while` expressions. The value expression is classified for source/sink labels, and the pattern binding is extracted as a variable definition:
```rust
if let Ok(cmd) = env::var("CMD") {
// cmd is tainted — env::var is a source, cmd is the binding
Command::new("sh").arg("-c").arg(&cmd).output(); // taint-unsanitised-flow
}
```
This also works for `while let` patterns.
### JS/TS Two-Level Solve
For JavaScript and TypeScript, taint analysis uses a two-level approach:
1. **Level 1**: Solve top-level code (module scope)
2. **Level 2**: Solve each function seeded with the converged top-level state
This prevents false positives from cross-function taint leakage while preserving global-to-function flows.
Sources typically use `cap = "all"` so they match every sink. Sinks declare the specific cap they need. Sanitizers only clear the cap they name.