Release/0.5.0 (#35)

* feat: Introduce function-scoped variable interning for state analysis with new tests and fixtures

* feat: Add Phase 26 symbolic execution enhancements with bitwise operator support, abstract interpretation refinements, and new taint analysis tests

* feat: Refine state analysis to handle factory-pattern resource returns with mixed-path tests and leak detection enhancements

* feat: Add Phase 27 debug views with symbolic execution, abstract interpretation, SSA, and call graph viewers; integrate with debug layout and styles

* feat: Add Phase 31 type-qualified symbolic resolution with receiver-based callee disambiguation and testing

* feat: Extend symbolic execution with state iteration, enhanced debug views, and debounced input handling

* feat: Add Phase 13 resource and auth pattern extensions with new tests and fixtures

* feat: Introduce CFG debug graph renderer with compact mode, toolbar, and DAG layout integration

* feat: Add Phase 28 encoding and decoding transform modeling with structural symex enhancements and new taint analysis tests

* feat: Extend abstract interpretation with type facts and constant value tracking in debug views and server logic

* feat: Add linear path handling and witness extraction to symbolic execution with Phase 28 transform mismatch detection

* feat: Refine Go auth and sanitizer handling with enhanced rules, state updates, and benchmark improvements

* feat: Enable auth-state analysis by default and update relevant tests in benchmark config

* test: Update state_tests to reflect default enablement of auth-state analysis and add auth suppression test

* docs: update CHANGELOG.md

* feat: Introduce per-index taint tracking in `HeapState` with `HeapSlot`, overflow handling, and revised SSA transfers

* feat: Introduce C/C++ language labels and refine heap state tracking in SSA transfers

* feat: Implement per-index array slot tracking in symbolic heap with overflow collapse

* feat: Add implicit definition handling for uninitialized declarations in SSA value allocation

* feat: Refactor function parameters and constants for improved clarity and maintainability

* refactor: Reorder module imports and improve formatting for consistency

* refactor: Fix formatting errors

* refactor: Fix clippy warnings

* refactor: Fix fmt warnings (again)

* chore: Update dependencies and improve feature configuration

* Add comprehensive tests for undertested modules (#36) (COPILOT)

* Add comprehensive tests for undertested modules

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083

* Add comprehensive tests for ext, project, walk, and errors modules

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* chore: Update dependencies and improve feature configuration

* fix: formatting errors in new tests

* chore: Update license list in about.toml

* chore: made functions input inline

* chore: updated cfg graph to take up the full page

* chore: add Prettier configuration and update code formatting

* Add frontend test suite with Vitest (111 tests) (#37)

* Add Vitest test suite for frontend - 111 tests across utils, components, hooks, and graph utilities

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/7cf0dba2-ecff-4740-ba4d-92717e74a0b7

* ci: add frontend test step to CI workflow

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/5bc0ac9f-0a32-4d03-9cb7-7a15aea53fca

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* chore: simplify array initialization in test files for consistency

* ran typecheck

* feat: add AnalysisWorkspace component and integrate it into CfgViewerPage

* feat: update routing in AppLayout and improve empty state message in ExplorerPage

* feat: enhance scan progress tracking with additional metrics and stages

* feat: update license information and add license check script

* feat: implement cross-file symbolic execution with callee body persistence

* feat: replace dagre graphs with Graphology + ELK + Sigma for more advanced call stack and cfg rendering

* feat: ensure CFG function view is scoped to the selected function, preventing bleed into sibling functions

* feat: enhance resource tracking with proxy method summaries and improve finding extraction

* feat: add terminal function exit detection for accurate resource leak analysis

* feat: add warnings for loops and functions without bodies to improve error recovery

* feat: update lambda expression handling to ensure proper function classification and control flow

* feat: remove bounded formatting/string ops and add JSON.parse sanitizer for improved data handling

* feat: add inline return taint analysis and regression tests for improved security checks

* feat: add engine version management and migration handling for database schema updates

* feat: enhance first_call_ident to skip nested function bodies and add regression tests

* feat: enhance callee name resolution with two-segment normalization and disambiguation

* feat: add cross-file context flags and debug assertions for taint analysis

* feat: refactor taint analysis structure to unify context handling and improve clarity

* feat: enhance dead code elimination to preserve Sink, Source, and Sanitizer labels with new tests

* docs: updated CHANGELOG.md

* fmt: formatting fixes

* fix: fixed frontend formatting and lint warnings

* fix: optimized ci

* fix: optimized ci

* Add comprehensive multi-file test coverage to Nyx (#38)

* Initial checklist for multi-file test suite expansion

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* Add 12 new multi-file test fixtures with TP/TN/near-miss coverage

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* deleted root repo

* rebuilt to test for regressions

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Co-authored-by: elipeter <elicpeter@gmail.com>

* feat: enhance import alias resolution and taint tracking

* feat: implement security hardening with CSRF protection and path validation

* feat: add support for import alias bindings in Python, PHP, and Rust

* feat: enhance CFG analysis modes and improve code readability

* feat: add detection for parameterized SQL queries to enhance security

* feat: add safe internal redirect handling and enhance session destroy validation

* feat: implement security improvements by addressing vulnerabilities in execAsync, session management, and file downloads

* feat: enhance taint detection by adding support for inline source member expressions in call arguments

* feat: implement pre-emission of Source nodes for inline source member expressions in call arguments

* feat: add support for Throw statement in control flow and error handling

* feat: add debug and echo endpoints with potential information leakage

* feat: implement internal redirect suppression and enhance taint detection

* feat: implement module alias tracking for dynamic dispatch in JS/TS

* feat: add authorization analysis module with Express support

* feat: add authorization analysis module with Express support

* feat: add tests for admin guard requirements and clean checks in authorization analysis

* feat: integrate Koa and Fastify frameworks into authorization analysis

* feat: add Flask and Django support to authorization analysis module

* feat: add support for Rails and Sinatra frameworks in authorization analysis

* feat: add support for Axum, ActixWeb, and Rocket frameworks in authorization analysis

* feat: add support for ActixWeb, Axum, and Rocket frameworks in authorization analysis

* feat: add support for Rails and Sinatra in authorization analysis

* chore: add .DS_Store to .gitignore

* refactor: simplify conditional checks and improve readability in multiple files

* refactor: update usage of Option methods for improved clarity and consistency

* refactor: improve code readability by simplifying conditional checks and formatting

* refactor: improve code formatting and readability by simplifying conditional checks

* refactor: simplify conditional checks and improve readability in multiple files

* refactor: simplify conditional checks in axum.rs for improved readability

* feat: add CodeQL analysis configuration for enhanced security scanning

* test: add comprehensive tests for `src/output.rs` SARIF builder (#39)

* chore: start test coverage improvement work

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* test: add comprehensive tests for src/output.rs SARIF builder

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* refactor: improve code formatting and readability in output.rs

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Co-authored-by: elipeter <elicpeter@gmail.com>

* refactor: improve code formatting and readability in output.rs

* Potential fix for code scanning alert no. 210: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 211: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* refactor: enhance triage file path handling with improved error management and validation

* refactor: updated func summaries for richer detail

* refactor: update SSA summary extraction to use canonical FuncKey for distinct entries

* refactor: enhance callee metadata structure to support arity, receiver, and qualifier for better overload resolution

* refactor: add support for keyword arguments in function calls and enhance receiver extraction for method-style calls

* refactor: implement new Flask routes for safe and unsafe shell command execution

* refactor: separate receiver handling in SSA operations and enhance taint propagation

* refactor: improve arity handling by using arg_uses for positional argument count and enhance witness scoring for tainted arguments

* refactor: implement auth decorator extraction and classification for multiple languages

* refactor: enhance Rust module path resolution and use map handling for cross-file disambiguation

* refactor: introduce CalleeQuery struct for structured callee resolution and enhance resolver logic

* refactor: implement same-file identity collision handling for `runTask` to ensure correct resolver behavior

* refactor: standardize default struct initialization across multiple files

* feat: add scripts for formatting checks and auto-fixes with test summaries

* refactor: simplify character splitting and enhance namespace qualifier handling

* refactor: improve documentation clarity and enhance code readability in resolver logic

* refactor: replace default struct initialization with explicit field assignments for clarity

* feat: enhance anonymous function naming by deriving context-based bindings

* refactor: streamline match expressions for improved readability and performance

* refactor: streamline match expressions for improved readability and performance

* refactor: replace loop with while let for improved clarity and performance

* feat: add SSA constant propagation support to analysis context for improved accuracy

* feat: add SSA constant propagation support to analysis context for improved accuracy

* feat: implement shell metacharacter validation and bounded-length checks in Rust analysis

* feat: add static map analysis for command injection suppression and type safety

* refactor: simplify match statements and reduce line breaks for improved readability

* feat(summary): phase 1/5 SinkSite data model for primary sink-location attribution

Introduce SinkSite (file_rel, line, col, snippet, cap) carrying the
primary sink source-location through function summaries. Swap
SsaFuncSummary.param_to_sink and FuncSummary.param_to_sink from a coarse
Cap map to a deduped SmallVec<[SinkSite; 1]> per parameter, with a
backward-compatible cap_sites() helper and serde defaults so pre-phase-1
on-disk rows continue to deserialise cleanly.

Extraction: SinkSiteLocator bundles the tree/bytes/file_rel needed by
extract_ssa_func_summary; ParsedFile::extract_ssa_artifacts wires the
locator in for the persisted pass-1 path, while pass-2 intra-file
transient summaries fall back to cap-only sites (behavior unchanged).
Merge: GlobalSummaries::insert now unions sink sites with
(file_rel, line, col, cap) dedup via shared union_param_sink_sites
helper.

Database: JSON-serialised summary columns carry the new shape
automatically; no schema change needed.

Phase 2 will consume SinkSite in build_taint_diag() to overwrite the
caller-site Finding.line with the callee's sink line when resolved via
summary. Phase 1 keeps behavior unchanged: scanning
tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs still produces the
same (wrong) line 10 finding.

Adds round-trip tests covering SinkSite solo, SsaFuncSummary with sink
sites, legacy-JSON default handling for both summary types, and merge
dedup.
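
The merge rule above can be sketched as follows. This is a minimal illustration, not the project's code: the `SinkSite` fields follow the list in this message, but the field types, the `u32` bitmask standing in for the capability type, and the helper name `union_sink_sites` are assumptions, and a plain `Vec` replaces the `SmallVec`.

```rust
// Illustrative sketch only: simplified stand-ins, not the project's real definitions.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SinkSite {
    file_rel: String,
    line: u32,
    col: u32,
    snippet: Option<String>,
    cap: u32, // assumed: capabilities modeled as a plain bitmask
}

/// Union `incoming` into `existing`, skipping any site whose
/// (file_rel, line, col, cap) key is already present. The snippet is
/// not part of the key.
fn union_sink_sites(existing: &mut Vec<SinkSite>, incoming: &[SinkSite]) {
    for site in incoming {
        let already_present = existing.iter().any(|s| {
            s.file_rel == site.file_rel
                && s.line == site.line
                && s.col == site.col
                && s.cap == site.cap
        });
        if !already_present {
            existing.push(site.clone());
        }
    }
}

fn main() {
    let first = SinkSite {
        file_rel: "src/run.rs".to_string(),
        line: 5,
        col: 4,
        snippet: Some("Command::new(cmd)".to_string()),
        cap: 0b01,
    };
    // Same key, different snippet: treated as a duplicate.
    let same_key = SinkSite { snippet: None, ..first.clone() };
    // Different line: a distinct site.
    let other_line = SinkSite { line: 9, ..first.clone() };

    let mut sites = vec![first];
    union_sink_sites(&mut sites, &[same_key, other_line]);
    println!("{} distinct sink sites", sites.len());
}
```

A linear scan is enough here because the per-parameter list is expected to stay very small (the message sizes the inline storage at one element).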

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(taint): phase 2/5 thread SinkSite into SsaTaintEvent and Finding

Plumb Phase 1's SinkSite through the event pipeline into Findings,
no output change yet.  SsaTaintEvent gains `primary_sink_site:
Option<SinkSite>`; when the main or callback sink-emission path has
non-empty `param_to_sink_sites`, filter to sites whose
`(line != 0) && (cap ∩ sink_caps != ∅)` and emit one event per
distinct site — the multi-primary collapse keeps each downstream
Finding single-primary.

Resolution: ResolvedSummary and SinkInfo gain mirror
`param_to_sink_sites` fields, populated from `SsaFuncSummary.param_to_sink`
(SSA + callback paths) and `FuncSummary.param_to_sink` (global paths).
Label, local-summary, and interop resolution paths leave the field
empty — they only ever had cap-level info to begin with.

Finding: new `primary_location: Option<SinkLocation>` with
`file_rel/line/col`.  `ssa_events_to_findings` maps
`event.primary_sink_site` → `Finding.primary_location`, filtering
cap-only sites (`line == 0`) to `None` so the (0,0) sentinel never
leaks to formatters.  Dedup key extended with the primary location
so multi-site events aren't collapsed back together.

Invariants (debug_assert!):
* every SinkSite reaching emission has `line != 0 && cap ∩ sink_caps
  != ∅` — enforced by the pick_primary_sink_sites* filters;
* every populated Finding.primary_location has `line != 0` AND
  non-empty `file_rel` — the cap-only → None translation upstream
  guarantees this.
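
A rough sketch of the two filters described above, under stated assumptions: capabilities are modeled as a `u32` bitmask, the location is a plain tuple, and the function names `pick_primary_sites` and `to_primary_location` are hypothetical simplifications rather than the real signatures.

```rust
// Illustrative sketch only: simplified stand-ins, not the project's real definitions.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SinkSite {
    file_rel: String,
    line: u32, // 0 marks a cap-only site with no source location
    col: u32,
    cap: u32, // assumed: capabilities modeled as a plain bitmask
}

/// Keep only sites that carry a real line and whose capability bits
/// overlap the capabilities of the sink being reported.
fn pick_primary_sites(sites: &[SinkSite], sink_caps: u32) -> Vec<SinkSite> {
    sites
        .iter()
        .filter(|s| s.line != 0 && (s.cap & sink_caps) != 0)
        .cloned()
        .collect()
}

/// Translate a site into an optional (file_rel, line, col) location.
/// Cap-only sites become `None`, so the (0, 0) sentinel never escapes.
fn to_primary_location(site: Option<&SinkSite>) -> Option<(String, u32, u32)> {
    site.filter(|s| s.line != 0)
        .map(|s| (s.file_rel.clone(), s.line, s.col))
}

fn main() {
    let sites = vec![
        SinkSite { file_rel: "helper.rs".to_string(), line: 42, col: 5, cap: 0b01 },
        SinkSite { file_rel: String::new(), line: 0, col: 0, cap: 0b01 }, // cap-only
        SinkSite { file_rel: "helper.rs".to_string(), line: 50, col: 1, cap: 0b10 }, // other cap
    ];
    let picked = pick_primary_sites(&sites, 0b01);
    for site in &picked {
        // Mirrors the first invariant listed above.
        debug_assert!(site.line != 0 && (site.cap & 0b01) != 0);
    }
    println!("{:?}", to_primary_location(picked.first()));
    println!("{:?}", to_primary_location(sites.get(1)));
}
```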

Deliberately independent of `uses_summary`: that flag tracks whether
the *taint chain* used a summary, whereas primary attribution
requires only that the *sink* itself was summary-resolved.  A local
source reaching a cross-file sink produces `uses_summary=false`
alongside a populated primary_location — documented on
Finding.primary_location, covered by
`cross_file_sink_finding_carries_primary_location`.

build_taint_diag, SARIF/JSON/explanation formatters, and the
benchmark scorer remain untouched: finding.line still comes from
`cfg_graph[finding.sink]`, so cmdi_indirect.rs still reports line 10
and the benchmark's rs-cmdi-003 row still shows FN in the LOC column.

Tests: `cross_file_sink_finding_carries_primary_location` (proves
plumbing via a synthetic FuncSummary carrying a SinkSite at 42:5) and
`cross_file_sink_cap_only_site_leaves_primary_location_none`
(regression guard against cap-only sites surfacing).  All 1566 lib
tests + integration tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(output): phase 3/5 consume primary sink location in diag + SARIF

When a finding's primary_location (populated in phase 2 from a callee
summary's SinkSite) names the dangerous instruction inside a callee
body, attribute the diagnostic line to that location instead of the
caller's call site. The call site is demoted to a Call step in
flow_steps, and a synthetic Sink step at the primary location is
appended so analysts still see the full trace.

Changes:
- Add scan_root parameter to build_taint_diag so file_rel can be
  resolved back to an absolute path via a shared resolve_file_rel
  helper. Empty file_rel (single-file scans where namespace == "")
  resolves to the file under analysis.
- Extend SinkLocation with snippet, carried from the upstream
  SinkSite so the formatter needs no second file read.
- Relax the ssa_events_to_findings debug_assert to allow empty
  file_rel, which is valid when scan root equals the file itself.
- SARIF: emit data-flow as codeFlows[0].threadFlows[0].locations[];
  locations[0] already reflects the primary sink position via the
  updated diag line/col.
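
For illustration, the path resolution described in the first bullet might look like the sketch below. The parameter list is an assumption (the message names the helper `resolve_file_rel` but does not show its signature), so the function here carries a different name to avoid implying it is the real one.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the shared helper: map a summary's
/// scan-root-relative path back to an absolute path. An empty
/// `file_rel` means "the file currently under analysis".
fn resolve_file_rel_sketch(scan_root: &Path, current_file: &Path, file_rel: &str) -> PathBuf {
    if file_rel.is_empty() {
        current_file.to_path_buf()
    } else {
        scan_root.join(file_rel)
    }
}

fn main() {
    let root = Path::new("/repo");
    let current = Path::new("/repo/app.py");

    // Cross-file sink: resolved against the scan root.
    println!("{}", resolve_file_rel_sketch(root, current, "helper.py").display());
    // Single-file scan: falls back to the file being analysed.
    println!("{}", resolve_file_rel_sketch(root, current, "").display());
}
```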

Acceptance: scan on tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs
now reports line 5 (Command::new) as the primary sink, with the call
site at line 10 visible in flow_steps.

Two expect.json fixtures updated (must_match line_range widened):
- javascript/taint/context_sensitive_call: 12-14 -> 7-14 (line 8 is
  the real sink inside run()).
- rust/cfg/closure_async: 10-10 -> 10-11 (line 11 is Command::new
  inside the closure).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(bench): phase 4/5 validate primary sink attribution across corpus

Extend the benchmark scorer and ground truth to lock in phase 3's
primary-location behavior, and add fixtures that exercise the new
capability end-to-end.

Scorer (tests/benchmark_test.rs):
- Add optional `expected_call_site_lines: Option<Vec<[usize; 2]>>` on
  Case. When present, score_location_level additionally requires at
  least one flow_step in the finding's evidence trace to fall within
  ±2 of the call-site range. When absent, the check is skipped —
  fully forward-compatible with existing fixtures.
- Retain ±2 tolerance on expected_sink_lines (compared against the
  now-primary Diag.line post-phase-3).
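
The optional call-site check can be sketched like this. The function names, the flat list of flow-step lines, and the choice that any one expected range may satisfy the check are assumptions made for illustration; only the tolerance of 2 lines and the skip-when-absent behavior come from the description above.

```rust
/// Tolerance applied on both sides of an expected line range.
const LINE_TOLERANCE: usize = 2;

/// True when `line` falls inside `range` widened by the tolerance.
fn within_tolerance(line: usize, range: [usize; 2]) -> bool {
    let lo = range[0].saturating_sub(LINE_TOLERANCE);
    let hi = range[1] + LINE_TOLERANCE;
    (lo..=hi).contains(&line)
}

/// When no call-site ranges are given the check is skipped (passes).
/// Otherwise at least one flow-step line must land near some range.
fn call_site_ok(flow_step_lines: &[usize], expected: Option<&[[usize; 2]]>) -> bool {
    match expected {
        None => true,
        Some(ranges) => flow_step_lines
            .iter()
            .any(|&line| ranges.iter().any(|&range| within_tolerance(line, range))),
    }
}

fn main() {
    let ranges: &[[usize; 2]] = &[[10, 10]];
    // A step at line 12 is within 2 lines of the expected call site.
    println!("{}", call_site_ok(&[5, 12], Some(ranges)));
    // A step at line 13 is not.
    println!("{}", call_site_ok(&[5, 13], Some(ranges)));
    // Fixtures without the field are unaffected.
    println!("{}", call_site_ok(&[], None));
}
```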

Ground truth edits:
- rs-cmdi-cross-001: expected_sink_lines [8,8] -> [9,9]. Line 8 is the
  transform::wrap call site (a cross-file propagator, not a sink);
  line 9 is Command::new, the real sink. The ±2 tolerance happened to
  mask this stale attribution but it was semantically wrong — phase 4
  is the right time to correct it. Also adds expected_call_site_lines
  [8,8] so the new field is exercised on an existing cross-file case.
- rs-cmdi-003: adds expected_call_site_lines [10,10] (run_cmd call).
  This fixture's sink (Command::new inside run_cmd at line 5) was the
  motivating case for phases 1-3; adding the call-site assertion
  guards against regression to caller-line attribution.

New fixtures:
- rust/cmdi/cmdi_indirect_multisink.rs (rs-cmdi-009): helper run_both
  takes two tainted params and invokes two Command sinks on
  consecutive lines. Locks in that primary line lands inside the
  helper (lines 5-6), not at the caller (line 12). Notes document
  that SinkSite is currently one-per-callee so both findings today
  collapse onto the first sink; expected_sink_lines=[5,6] and
  expected_call_site_lines=[12,12] stay valid either way.
- python/cmdi/cross_indirect_sink/{app.py,helper.py} (py-cmdi-cross-
  004): sink os.system lives in helper.py (cross-file), caller in
  app.py reads env source and calls run_cmd. Verifies phase 3's
  cross-file primary attribution: Diag.path = helper.py, Diag.line =
  5, with app.py:7 recorded in flow_steps as a Call step.

Acceptance:
- `cargo test --test benchmark_test -- --ignored --nocapture` passes.
- rs-cmdi-003 is TP/TP/TP (the target flip FN->TP at LOC). All
  pre-existing TP/TP/TP fixtures remain TP/TP/TP; 2 new fixtures are
  TP/TP/TP.
- Aggregate rule-level: TP=158 FP=10 FN=1 TN=97, P=0.940 R=0.994
  F1=0.966 on the 266-case corpus (was TP=156 FP=10 FN=1 TN=97 on
  264 pre-phase-4, delta is the +2 new cases both resolving TP).
- Full `cargo test` green (1566 lib tests + all integration tests).
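
The aggregate figures can be re-derived from the four counts with the standard definitions of precision, recall, and F1. The sketch below is a standalone check, not the benchmark scorer itself.

```rust
/// Precision: share of reported findings that are true positives.
fn precision(tp: u32, fp: u32) -> f64 {
    f64::from(tp) / f64::from(tp + fp)
}

/// Recall: share of real issues that were reported.
fn recall(tp: u32, fn_count: u32) -> f64 {
    f64::from(tp) / f64::from(tp + fn_count)
}

/// F1: harmonic mean of precision and recall.
fn f1(p: f64, r: f64) -> f64 {
    2.0 * p * r / (p + r)
}

fn main() {
    let (tp, fp, fn_count, tn) = (158u32, 10u32, 1u32, 97u32);
    let p = precision(tp, fp);
    let r = recall(tp, fn_count);
    // prints "cases=266 P=0.940 R=0.994 F1=0.966"
    println!(
        "cases={} P={:.3} R={:.3} F1={:.3}",
        tp + fp + fn_count + tn,
        p,
        r,
        f1(p, r)
    );
}
```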

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(taint): phase 5/5 lock Finding.primary_location contract via regression test

Add a regression test in src/taint/ssa_transfer.rs that wires up a synthetic
SsaFuncSummary with a SinkSite at other.rs:42:10 and drives the three
emission stages (pick_primary_sink_sites → emit_ssa_taint_events →
ssa_events_to_findings) against a minimal caller SSA body.  Asserts the
resulting Finding.primary_location is exactly that triple.

The existing integration tests in src/taint/tests.rs cover the coarse
FuncSummary path end-to-end through analyse_file.  This test locks in the
lower-level SSA-side plumbing so a future refactor that silently drops the
site between pick → emit → findings fails here rather than only at the
benchmark layer.

Also refreshes tests/benchmark/results/latest.json (timestamp only; rs-cmdi-003
remains TP/TP/TP and the aggregate P/R/F1 are unchanged from phase 4).

Closes the primary sink-location attribution feature (phases 1-5/5):
* Phase 1 — SinkSite data model on summaries.
* Phase 2 — SinkSite threaded into SsaTaintEvent and Finding.
* Phase 3 — diag + SARIF consume primary_location.
* Phase 4 — benchmark validates primary_call_site_lines across corpus.
* Phase 5 — regression test locks the event→finding contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: clean up formatting and improve readability in multiple files

* refactor: simplify type definition for deduplication key in findings

* test(harness): add must_not_match expectation for FP regression guards

Extends ExpectedFinding with must_not_match field that asserts a
diagnostic must NOT fire — presence is a hard failure. Non-consuming
scan so it coexists with must_match entries on the same rule_id.
Adds forbidden_violations accumulator and updates summary line.
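
A minimal sketch of what a non-consuming forbidden-match scan could look like. The struct shapes, the boolean `must_not_match` field, and the line-range matching are assumptions for illustration; the harness's real `ExpectedFinding` is not shown in this message.

```rust
// Illustrative sketch only: hypothetical shapes, not the harness's real types.
#[derive(Debug)]
struct Diag {
    rule_id: String,
    line: usize,
}

struct ExpectedFinding {
    rule_id: String,
    line_range: [usize; 2],
    must_not_match: bool,
}

/// Collect every diagnostic hit by a `must_not_match` expectation.
/// The scan only borrows `diags` and removes nothing, so `must_match`
/// entries on the same rule_id can still claim their own diagnostics.
fn forbidden_violations<'a>(diags: &'a [Diag], expected: &[ExpectedFinding]) -> Vec<&'a Diag> {
    diags
        .iter()
        .filter(|d| {
            expected.iter().any(|e| {
                e.must_not_match
                    && e.rule_id == d.rule_id
                    && (e.line_range[0]..=e.line_range[1]).contains(&d.line)
            })
        })
        .collect()
}

fn main() {
    let diags = vec![
        Diag { rule_id: "taint-unsanitised-flow".to_string(), line: 12 },
        Diag { rule_id: "taint-unsanitised-flow".to_string(), line: 30 },
    ];
    let expected = vec![
        ExpectedFinding { rule_id: "taint-unsanitised-flow".to_string(), line_range: [12, 12], must_not_match: false },
        ExpectedFinding { rule_id: "taint-unsanitised-flow".to_string(), line_range: [28, 32], must_not_match: true },
    ];
    let violations = forbidden_violations(&diags, &expected);
    println!("forbidden violations: {:?}", violations);
    println!("diagnostics still available to must_match: {}", diags.len());
}
```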

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(regression): update expectations to ensure must_not_match for various taint and resource leak rules

* feat: implement auto-seeding for JS/TS handler parameters to enhance taint tracking

* feat: update switch statement handling to improve control flow analysis

* feat: implement promisify alias handling for JS/TS to enhance taint tracking

* feat: enhance taint tracking by refining expectation handling and adding mode filtering

* feat: refine SQL handling in stream processing and enhance auto-seeding for handler parameters

* feat: update taint tracking rules to enforce full mode matching and improve flow analysis

* feat: enhance Ruby subshell handling to improve taint tracking and flow analysis

* feat: update xss_response expectations to refine taint flow analysis and enhance regression guarding

* feat: refine framework detection and update expectation handling for Echo and Sinatra

* feat: implement max_count for taint tracking expectations and deduplicate findings

* feat: add strict_unexpected handling for taint-unsanitised-flow in expectation files

* feat: enhance deduplication of taint-unsanitised-flow findings by collapsing based on line and severity

* feat: add strict_unexpected handling for taint-unsanitised-flow in multiple expectation files

* feat: add structural invariant checks for SSA bodies

* feat: ensure deterministic phi emission order using BTreeSet

* feat: enhance handling of terminators to ensure authoritative flow through successor edges

* feat: enhance Goto terminator handling to ensure all successors are marked executable

* feat: refactor code for improved readability and organization

* feat: simplify predicate checks and enhance readability in SSA handling

* feat: implement per-file parse timeout and enhance file size handling

* feat: migrate analysis engine toggles from environment variables to configuration file

* feat: remove unnecessary whitespace in hostile_input_tests.rs

* feat: remove unnecessary whitespace in hostile_input_tests.rs

* feat: update dependencies and enhance documentation on language maturity

* feat: enhance security headers and improve request body limits

* feat: implement sink capability bits for deduplication and enhance evidence tagging

* feat: implement dynamic activation handling for gated sinks and enhance validation logic

* feat: enhance configuration documentation and clarify inline analysis cache behavior

* feat: implement panic recovery during analysis to continue scans past errors

* feat: add expectations configuration for taint analysis and performance metrics

* feat: enhance error handling and logging during file reading and mutex locking

* feat: add cross-file body loading tests and plumbing for CF-1 phase

* feat: implement cross-file k=1 context-sensitive inline taint analysis with new tests and fixtures

* feat: implement indexed-scan parity in cross-file inline analysis with new dropdown and copy functionality

* feat: enhance classification span handling in CFG and AST for improved source attribution

* feat: add new Express routes for handling user input and telemetry data

* feat: implement ternary expression handling in CFG with diamond structure for JS/TS

* feat: implement Phase CF-3 abstract-domain transfer channels in summaries

* feat: add support for string-prefix transfer in cross-file calls and update tests

* docs: reduce RESULTS.md doc size

* feat: implement Phase CF-4 per-return-path summary decomposition with tests

* feat: update parameter handling in pass1 and refactor SsaFuncSummary initialization

* feat: implement Phase CF-5 for cross-file SCC joint fixed-point convergence with new flags and tests

* feat: implement Phase CF-6 with parameter-granularity points-to summaries and associated tests

* refactor: update comments and documentation for clarity and consistency

* style: format code for consistency and readability

* refactor: simplify verdict handling and improve edge checking logic

* refactor: optimize path and identifier collection by avoiding unnecessary cloning

* chore: update Cargo.toml for Rust version 1.85 and add ignored files; modify CHANGELOG and README for clarity on state analysis defaults

* refactor: update documentation and improve clarity in configuration files

* refactor: update documentation and improve clarity in configuration files

* feat: add JS/TS pass-2 convergence tests and expectations configuration

* feat: add Phase 5 regression tests for inline cache origin attribution and update related logic

* feat: implement Phase 7 deduplication and alternative path linking for taint findings

* feat: implement structural DFS index for anonymous functions and update naming conventions

* feat: add Phase 8 regression tests for container-element taint in JS and Python

* feat: add engine-depth profiles and explain-engine option for CLI

* feat: update expectations and add new README fixtures for multi-file scan regression

* feat: implement Phase 11 callback-alias and factory patterns with regression tests

* feat: implement Terminator::Switch for multi-way dispatch and add regression tests

* feat: add real-CVE benchmark fixtures for CVE-2023-48022, CVE-2019-14939, and CVE-2023-26159 with corresponding patched variants

* refactor: extract cfg and ssa_transfer to submodules

* refactor: cargo fmt

* refactor: remove unnecessary blank line in cfg_tests.rs

* refactor: remove unnecessary planning file

* chore: update Rust version to 1.88 and bump dependencies in Cargo files

* feat: enhance triage UI with new layout and controls, update README for clarity

* feat: enhance triage UI with new layout and controls, update README for clarity

* chore: remove outdated section from README for version 0.5.0

* docs: improve clarity and consistency in README content

* chore: add "GPL-3.0-or-later" to license options in about.toml

* chore: update license handling in about.toml and check-licenses.mjs

* style: format code for improved readability in TriagePage component

* style: format code for improved readability in TriagePage component

* chore: enhance license handling and improve body_id scoping in seed lookup

* feat: introduce owner and parent body IDs for enhanced seed scoping

* feat: implement direction-aware engine provenance with new CLI flag for strict CI gating

* feat: add Undef SSA operation for improved control-flow handling

* style: improve code formatting for consistency and readability in multiple files

* feat: add 16-function chain SCC across multiple files for enhanced analysis

* style: simplify code formatting for improved readability in multiple files

* fix: update CapHitReason default implementation and improve README clarity

* docs: enhance README with detailed explanations of taint analysis and limitations

* docs: refine README for clarity and consistency in taint analysis section

* style: improve code formatting for better readability in NewScanModal and scans

* fix: update cargo-about command to use --offline for deterministic license generation

* fix: update cargo-about command to use --offline for deterministic license generation

* ci: add step to prime cargo registry cache for deterministic license generation

* feat: add support for non-sink collections in authorization analysis

* feat: enhance authorization checks with row-level ownership equality and binding tracking

* feat: implement self-scoped user handling and enhance ownership checks

* refactor: simplify assertions and formatting in authorization analysis tests

* fix: normalize line endings in THIRDPARTY-LICENSES.html generation and update README with AI disclosure

* docs: update AI disclosure section for clarity and conciseness

* feat: add AI Contribution Policy and update contributing guidelines for AI assistance disclosure

* feat: enhance authorization analysis with SSA-derived variable type classification

* feat: implement auth_finding_to_diag function for enhanced security diagnostics

* feat: add args_value_refs to CallSite struct for enhanced argument tracking

* feat: add args_value_refs to CallSite struct for enhanced argument tracking

* feat: add direction-aware engine provenance with LossDirection classification and new CLI flag

* feat: simplify strip_cap_from_call_args call by removing unnecessary line breaks

* feat: enhance error message handling in cli_validation_tests for better Windows compatibility

* feat: optimize release profile settings in Cargo.toml and update CodeQL configuration

* feat: enhance release build process with SBOM generation and SLSA provenance

* feat: update actions/checkout and actions/setup-node to v6, enhance CLI options, and improve auth-check summaries

* feat: introduce PathFact handling for path safety checks and rejection logic

* feat: introduce PathFact handling for path safety checks and rejection logic

* feat: update benchmark data and enhance path sanitization logic with new safety checks

* feat: document AI assistance in frontend UI development and human review process

* feat: add return path facts for enhanced path safety checks and update documentation

* chore: update release date for version 0.5.0 in CHANGELOG.md

* chore: clean up ci.yml by removing outdated comments and clarifying steps

* feat: implement cross-language path sanitizers and validators for enhanced security

* feat: enhance SSA value usage tracking by including block terminators and improve path safety checks

* feat: enhance switch statement handling by adding per-case path constraints and support for exclusive cases

* refactor: simplify conditional formatting and improve code readability in executor and lower modules

* feat: add vulnerable examples for various languages demonstrating authentication and sanitization issues

* feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers

* feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers

* feat: add transform classifiers for Java, Go, and Ruby with corresponding tests

* refactor: clarify comments on reassign-to-constant idiom and sink behavior in guards.rs

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Eli Peter 2026-04-25 17:59:11 -04:00 committed by GitHub
parent c4ce08b452
commit 41128177d2
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2144 changed files with 201812 additions and 8927 deletions

29
docs/SUMMARY.md Normal file

@ -0,0 +1,29 @@
# Summary
# Getting started
- [Quickstart](quickstart.md)
- [Installation](installation.md)
# Using nyx
- [CLI reference](cli.md)
- [Browser UI](serve.md)
- [Configuration](configuration.md)
- [Output formats](output.md)
# Coverage
- [Language maturity](language-maturity.md)
- [Rules](rules.md)
- [Auth analysis](auth.md)
# Under the hood
- [How it works](how-it-works.md)
- [Advanced analysis](advanced-analysis.md)
- [Detectors](detectors.md)
  - [Patterns](detectors/patterns.md)
  - [CFG](detectors/cfg.md)
  - [State](detectors/state.md)
  - [Taint](detectors/taint.md)

221
docs/advanced-analysis.md Normal file

@ -0,0 +1,221 @@
# Advanced Analysis
Nyx ships four optional analysis passes that layer on top of the core SSA
taint engine. Each pass is independently switchable via config
(`[analysis.engine]` in `nyx.conf` / `nyx.local`), a matching CLI flag pair,
or, as a legacy last-resort override for library users with no CLI entry
point, a `NYX_*` environment variable. All four are **on by default**; turning
them off trades precision for speed. A fifth pass, demand-driven analysis, is
also documented on this page and is **off by default**.
See [`Configuration`](configuration.md#analysisengine) for the full config
surface and CLI flag table. This page explains what each pass does, why it
helps, how to disable it, and what it does not cover.
---
## Abstract interpretation
**What it does.** Propagates interval and string abstract domains through the
SSA worklist alongside taint. Integer values carry `[lo, hi]` bounds;
string values carry a prefix and suffix (plus a bit domain for known-zero /
known-one bits). Values are joined at merge points and widened at loop
heads so the worklist always terminates.
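The join and widening steps can be illustrated with a minimal interval sketch. This is not Nyx's implementation: the `Interval` type, its field names, and the use of `i64::MIN` / `i64::MAX` as stand-ins for an unbounded end are assumptions made for the example.

```rust
/// Closed integer interval `[lo, hi]`. `i64::MIN` / `i64::MAX` stand in for
/// an unbounded end (an assumption of this sketch, not Nyx's encoding).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Interval {
    pub lo: i64,
    pub hi: i64,
}

impl Interval {
    pub fn new(lo: i64, hi: i64) -> Self {
        Interval { lo, hi }
    }

    /// Join at a merge point: the smallest interval covering both inputs.
    pub fn join(self, other: Interval) -> Interval {
        Interval {
            lo: self.lo.min(other.lo),
            hi: self.hi.max(other.hi),
        }
    }

    /// Widen at a loop head: a bound that moved since the previous
    /// iteration is dropped to unbounded, so the sequence must stabilise.
    pub fn widen(self, next: Interval) -> Interval {
        Interval {
            lo: if next.lo < self.lo { i64::MIN } else { self.lo },
            hi: if next.hi > self.hi { i64::MAX } else { self.hi },
        }
    }
}
```

In this sketch, joining `[0, 5]` with `[3, 9]` gives `[0, 9]`, while widening `[0, 1]` against `[0, 2]` drops the growing upper bound instead of iterating towards it.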
**Why it helps.** Lets Nyx suppress some findings that are obviously safe
given the abstract value: a proven-bounded integer does not flow into a
SQL sink as an injection risk, and an SSRF sink whose URL prefix is locked to a
trusted host stays quiet. This turns a large class of false positives (FPs) on
numeric and locked-prefix paths into true negatives.
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `abstract_interpretation = false` under `[analysis.engine]` |
| CLI flag | `--no-abstract-interp` |
| Env var (legacy) | `NYX_ABSTRACT_INTERP=0` |
**Limitations.** The interval domain is 64-bit signed; very wide or
overflow-producing arithmetic degrades to `⊤` (unbounded). String prefix /
suffix tracking is concat-only; it does not model reordering, reversal, or
character-level regex constraints. Loop widening deliberately drops
changing bounds rather than chasing fixpoints.
**Source**: [`src/abstract_interp/`](https://github.com/elicpeter/nyx/tree/master/src/abstract_interp/).
---
## Context-sensitive analysis
**What it does.** Adds k=1 call-site-sensitive taint propagation for
intra-file callees. When a function is invoked, Nyx reanalyzes the callee
body with the actual per-argument taint signature of the call site,
producing call-site-specific return taint. Results are cached by
`(function_name, ArgTaintSig)` so repeated calls with the same signature
are free.
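The caching idea can be sketched as a memo table keyed by function name and argument signature. The names here (`CalleeCache`, `ArgSig`, `passthrough`) are hypothetical and only mirror the shape of the `(function_name, ArgTaintSig)` key described above; the real types live in `src/taint/ssa_transfer.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a per-argument taint signature: one capability
/// bitmask per argument position.
pub type ArgSig = Vec<u32>;

/// Memoises callee re-analysis by `(function name, argument signature)`.
#[derive(Default)]
pub struct CalleeCache {
    entries: HashMap<(String, ArgSig), u32>,
    /// Number of times the callee body actually had to be re-analysed.
    pub misses: usize,
}

impl CalleeCache {
    /// Returns the cached return taint, running `analyse` only on a miss.
    pub fn return_taint<F>(&mut self, func: &str, sig: &[u32], analyse: F) -> u32
    where
        F: FnOnce(&[u32]) -> u32,
    {
        let key = (func.to_string(), sig.to_vec());
        if let Some(&hit) = self.entries.get(&key) {
            return hit;
        }
        self.misses += 1;
        let result = analyse(sig);
        self.entries.insert(key, result);
        result
    }
}

/// Toy callee: the return value carries whatever taint the first argument has.
pub fn passthrough(sig: &[u32]) -> u32 {
    sig.first().copied().unwrap_or(0)
}
```

Calling `return_taint("wrap", &[1], passthrough)` twice analyses the callee once; a call with a different signature, such as `&[0]`, is a separate cache entry with its own return taint.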
**Why it helps.** A helper called once with a tainted argument and once
with a sanitized argument should produce two different results. Without k=1
sensitivity, the conservative union of both call sites would be applied
to the sanitized call, producing a spurious finding there.
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `context_sensitive = false` under `[analysis.engine]` |
| CLI flag | `--no-context-sensitive` |
| Env var (legacy) | `NYX_CONTEXT_SENSITIVE=0` |
**Limitations.** Intra-file only. Cross-file callees are resolved via
summaries (see `src/summary/`) rather than re-inlined. Depth is capped at
k=1 to prevent cache blow-up and re-entrancy; higher k would require a
different cache key design. Callee bodies larger than the internal
`MAX_INLINE_BLOCKS` threshold fall back to the summary path. Cache keys
hash per-argument `Cap` bits but not source-origin identity, so two
callers with identical caps but different origins share cached
origin-attribution.
**Source**: [`src/taint/ssa_transfer.rs`](https://github.com/elicpeter/nyx/blob/master/src/taint/ssa_transfer.rs)
(`ArgTaintSig`, `InlineCache`, `inline_analyse_callee`).
---
## Symbolic execution
**What it does.** Builds a symbolic expression tree per tainted SSA value,
generates a witness string for each taint finding (the concrete-looking
shape of the dangerous value at the sink), and detects sanitization
patterns that the taint engine alone would miss. Supports string
operations (`trim`, `replace`, `toLower`, `substring`, `strlen`, …),
arithmetic, concatenation, phi nodes, and opaque calls.
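As a rough illustration of a depth-bounded expression tree and witness rendering, consider the sketch below. The `Expr` variants, the `MAX_DEPTH` constant (standing in for the documented `MAX_EXPR_DEPTH=32`), and the rendering format are assumptions for the example, not the types in `src/symex/`.

```rust
/// Hypothetical symbolic expression; variant names are illustrative only.
#[derive(Debug, Clone)]
pub enum Expr {
    Lit(String),
    Tainted(String),
    Concat(Box<Expr>, Box<Expr>),
    Unknown,
}

/// Stand-in for the documented depth bound.
pub const MAX_DEPTH: usize = 32;

/// Height of the tree; leaves have depth 1.
pub fn depth(e: &Expr) -> usize {
    match e {
        Expr::Concat(a, b) => 1 + depth(a).max(depth(b)),
        _ => 1,
    }
}

/// Builds a concat node, degrading to `Unknown` once the bound is exceeded.
pub fn concat(a: Expr, b: Expr) -> Expr {
    if 1 + depth(&a).max(depth(&b)) > MAX_DEPTH {
        Expr::Unknown
    } else {
        Expr::Concat(Box::new(a), Box::new(b))
    }
}

/// Renders a human-readable witness for the value reaching a sink.
pub fn witness(e: &Expr) -> String {
    match e {
        Expr::Lit(s) => format!("{:?}", s),
        Expr::Tainted(name) => name.clone(),
        Expr::Concat(a, b) => format!("{} + {}", witness(a), witness(b)),
        Expr::Unknown => "<unknown>".to_string(),
    }
}
```

Concatenating a SQL literal with a tainted `userInput` value renders as a quoted literal followed by `+ userInput`; a chain that would exceed the bound collapses to `Unknown` rather than growing.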
**Why it helps.** Raises finding quality. A taint finding with a rendered
witness like `"SELECT * FROM t WHERE id=" + userInput` is substantially
easier to triage than one without. Also powers some confidence-gating for
downstream display.
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `symex.enabled = false` under `[analysis.engine]` |
| CLI flag | `--no-symex` |
| Env var (legacy) | `NYX_SYMEX=0` |
Two nested switches refine the scope without disabling symex entirely:
| Setting | CLI | Env | Default | Effect |
|---|---|---|---|---|
| `symex.cross_file` | `--no-cross-file-symex` | `NYX_CROSS_FILE_SYMEX=0` | on | Consult cross-file SSA bodies so symex can reason about callees defined in other files |
| `symex.interprocedural` | `--no-symex-interproc` | `NYX_SYMEX_INTERPROC=0` | on | Intra-file interprocedural symex (k ≥ 2 via frame stack) |
**Limitations.** Expression trees are bounded at `MAX_EXPR_DEPTH=32`;
deeper expressions degrade to `Unknown` rather than growing unboundedly.
Sanitizer detection is informational: string-replace sanitizer patterns
are reported as witness metadata, not used to clear taint.
**Source**: [`src/symex/`](https://github.com/elicpeter/nyx/tree/master/src/symex/).
---
## Demand-driven analysis
**What it does.** After the forward pass-2 taint analysis finishes, runs a
*backwards* walk from each sink's tainted SSA operands. The walk follows
reverse SSA-edge transfer (phi fan-out, `Assign` operand-fanout, `Call`
body-expansion or arg-fanout) until it reaches a taint source, proves
the flow infeasible via an accumulated path predicate, or exhausts its
budget. Each forward finding is then annotated with the aggregate verdict:
- `backwards-confirmed`: a matching source was reached. The finding picks
up a small confidence boost and the note appears in
`evidence.symbolic.cutoff_notes`.
- `backwards-infeasible`: every walk proved the flow unreachable.
The finding is capped to Low confidence and a user-readable limiter is
attached.
- `backwards-budget-exhausted`: the walk hit `BACKWARDS_VALUE_BUDGET`
without a verdict. Recorded as a limiter so operators can see when
the pass could not keep up.
- Inconclusive outcomes are a no-op: the forward finding is untouched.
Because the backwards walk can consult `GlobalSummaries.bodies_by_key`
(populated by the cross-file callee body persistence layer) it closes
across file boundaries; when a callee body is not loadable the walk
falls back to fanning out over the call's arguments so local reach-back
is still possible.
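A budgeted backwards walk can be sketched as a breadth-first search over reverse def-use edges. This is a simplification under stated assumptions: values are plain `u32` ids, `defs` is a hypothetical map from a value to its operands, and the infeasibility verdict (which needs path predicates) is left out.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Confirmed,
    BudgetExhausted,
    Inconclusive,
}

/// Walks backwards from a sink operand. `defs` maps each value id to the
/// operands it was computed from; `sources` holds the taint-source ids.
pub fn walk_back(
    sink_operand: u32,
    defs: &HashMap<u32, Vec<u32>>,
    sources: &HashSet<u32>,
    budget: usize,
) -> Verdict {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([sink_operand]);
    let mut visited = 0usize;
    while let Some(value) = queue.pop_front() {
        if !seen.insert(value) {
            continue; // already expanded via another edge
        }
        if sources.contains(&value) {
            return Verdict::Confirmed;
        }
        visited += 1;
        if visited > budget {
            return Verdict::BudgetExhausted;
        }
        if let Some(operands) = defs.get(&value) {
            queue.extend(operands.iter().copied());
        }
    }
    Verdict::Inconclusive
}
```

With `defs = {3: [1, 2], 2: [0]}` and `0` as the only source, a walk from `3` is confirmed given enough budget, and reports budget exhaustion when the budget is `1`.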
**Why it helps.** Inverts the analysis direction so the budget follows the
questions the scanner actually cares about ("does any source reach
*this* sink?") instead of proving every potential source-to-sink
path. Corroborated findings are a stronger signal than forward-only
ones, and proven-infeasible flows provide a principled way to lower
confidence on forward false positives without silently dropping them.
**How to turn it on.** Defaults off so the benchmark floor is preserved
while the pass stabilises.
| Surface | Value |
|---|---|
| Config | `backwards_analysis = true` under `[analysis.engine]` |
| CLI flag | `--backwards-analysis` / `--no-backwards-analysis` |
| Env var (legacy) | `NYX_BACKWARDS=1` |
**Limitations (first cut).** Reverse call-graph expansion past a
`ReachedParam` is deferred; the walk terminates at function parameters
rather than crossing back into callers. Path-constraint pruning is
conservative: only the accumulated `PredicateSummary` bits are consulted,
not the full symbolic predicate stack. Depth-bounded at k=2 for
cross-function body expansion. See `DEFAULT_BACKWARDS_DEPTH`,
`BACKWARDS_VALUE_BUDGET`, and `MAX_BACKWARDS_CALLEE_BLOCKS` in
`src/taint/backwards.rs` for the exact bounds.
**Source**: [`src/taint/backwards.rs`](https://github.com/elicpeter/nyx/blob/master/src/taint/backwards.rs).
---
## Constraint solving
**What it does.** Collects path constraints at each branch in SSA and
propagates them alongside taint. Prunes paths whose accumulated constraint
set is unsatisfiable: for example, a taint flow guarded by `if x < 0 && x > 10` is
dropped rather than surfaced. Optionally delegates the satisfiability
check to Z3 when Nyx is built with the `smt` Cargo feature.
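The `x < 0 && x > 10` case can be caught without a solver by intersecting bounds on a single variable. The sketch below assumes a hypothetical `Cmp` constraint type over one `i64` variable; it is meant to show the kind of literal contradiction a lightweight check can detect, not Nyx's constraint representation.

```rust
/// A comparison of one integer variable against a constant.
#[derive(Debug, Clone, Copy)]
pub enum Cmp {
    Lt(i64),
    Le(i64),
    Gt(i64),
    Ge(i64),
}

/// Returns `false` when no `i64` value can satisfy every constraint.
pub fn satisfiable(constraints: &[Cmp]) -> bool {
    let (mut lo, mut hi) = (i64::MIN, i64::MAX);
    for c in constraints {
        match *c {
            // `x < k` means `x <= k - 1`; nothing is below `i64::MIN`.
            Cmp::Lt(k) => match k.checked_sub(1) {
                Some(bound) => hi = hi.min(bound),
                None => return false,
            },
            Cmp::Le(k) => hi = hi.min(k),
            // `x > k` means `x >= k + 1`; nothing is above `i64::MAX`.
            Cmp::Gt(k) => match k.checked_add(1) {
                Some(bound) => lo = lo.max(bound),
                None => return false,
            },
            Cmp::Ge(k) => lo = lo.max(k),
        }
    }
    lo <= hi
}
```

Here `satisfiable(&[Cmp::Lt(0), Cmp::Gt(10)])` is `false`, so a path guarded by that condition could be pruned, while `x >= 0 && x < 10` remains feasible.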
**Why it helps.** Removes a class of FPs rooted in clearly-infeasible
control-flow combinations. Without path constraints, a taint flow that
only occurs when mutually-exclusive branches are simultaneously taken can
still produce a finding.
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `constraint_solving = false` under `[analysis.engine]` |
| CLI flag | `--no-constraint-solving` |
| Env var (legacy) | `NYX_CONSTRAINT=0` |
The SMT backend is a separate switch:
| Setting | CLI | Env | Default | Effect |
|---|---|---|---|---|
| `symex.smt` | `--no-smt` | `NYX_SMT=0` | on when built with `smt` feature | Delegate satisfiability checks to Z3; ignored if Nyx was built without `smt` |
**Limitations.** The default path-constraint domain is syntactic;
trivially-inconsistent pairs are caught without an SMT solver, but richer
algebraic unsatisfiability requires the `smt` feature (Z3). Without `smt`,
Nyx ships a lightweight satisfiability check that catches literal
contradictions but not deeper reasoning.
**Source**: [`src/constraint/`](https://github.com/elicpeter/nyx/tree/master/src/constraint/).
---
## Combining the switches
The defaults (all on) are the configuration Nyx is benchmarked against.
Turning any switch off trades precision for speed and may move findings
relative to the published baseline; CI regression gates assume defaults.
If you need a minimal-overhead scan (for very large repositories or a
pre-commit fast path), the AST-only scan mode (`--mode ast`) skips CFG,
taint, and all four advanced passes entirely and is the right tool.

1
docs/assets Symbolic link

@ -0,0 +1 @@
../assets

91
docs/auth.md Normal file

@ -0,0 +1,91 @@
# Auth analysis
**Rust today.** Other languages have rule scaffolding in [`src/auth_analysis/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/config.rs) (Python, Ruby, Go, Java, JavaScript, TypeScript), but only Rust has benchmark corpus coverage and the precision work to back it. Treat findings on other languages as preview; the rule prefix (`py.auth.*`, `js.auth.*`, `rb.auth.*`, `go.auth.*`, `java.auth.*`) is reserved but the matchers haven't been validated against real codebases yet.
## What it catches
The Rust rule is `rs.auth.missing_ownership_check`. It fires when a request handler reaches a privileged operation that takes a scoped identifier (`*_id`, row reference, scoped resource) without a preceding ownership or membership check.
Concretely, it looks for five patterns of authorization in the function body and flags the call when none are present:
- A call to a recognised authorization helper. Defaults: `check_ownership`, `has_ownership`, `require_ownership`, `ensure_ownership`, `is_owner`, `authorize`, `verify_access`, `has_permission`, `can_access`, `can_manage`, plus `*_membership` and `require_{group,org,workspace,tenant,team}_member` variants. Extend in `[analysis.languages.rust]`.
- An ownership-equality check on a row reference: `if owner_id != user.id { return 403 }` or any `field_id != self_actor` shape. The check writes `AuthCheck` evidence back to the row-fetch arguments via `AnalysisUnit.row_field_vars`.
- A self-actor reference: `let user = require_auth(...).await?` followed by use of `user.id`, `user.user_id`, `user.uid`. The actor is recognised from typed extractor params (`Extension<Session>`, `CurrentUser`, etc.) and from typed helper bindings.
- A SQL query that joins through an ACL table or filters by `user_id` predicate. Detected without a SQL parser via [`sql_semantics.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/sql_semantics.rs); the authorized result variable propagates through `let row = ...prepare(LIT)...`, `for row in result`, `let id = row.get(...)`.
- A helper-summary lift: handler calls `validate_target(db, widget_id, user.id)` whose body contains a `require_*_member` call. Cross-function summaries are merged at fixed-point (capped at 4 iterations).
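To make the ownership-equality pattern from the list above concrete, here is a minimal handler-shaped sketch. Everything in it is hypothetical: `User`, `Widget`, `Db`, and `ApiError` are invented for the example, and the `HashMap` merely stands in for a real database so the snippet stays self-contained. Whether the detector accepts this exact shape is an assumption.

```rust
use std::collections::HashMap;

// Hypothetical types, for illustration only.
pub struct User {
    pub id: u64,
}

pub struct Widget {
    pub owner_id: u64,
    pub name: String,
}

/// Stand-in for a database handle.
pub struct Db {
    pub widgets: HashMap<u64, Widget>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ApiError {
    NotFound,
    Forbidden,
}

/// Mutates a row addressed by a scoped id, but only after comparing the
/// row's owner against the authenticated actor.
pub fn rename_widget(
    db: &mut Db,
    user: &User,
    widget_id: u64,
    name: &str,
) -> Result<(), ApiError> {
    let widget = db.widgets.get_mut(&widget_id).ok_or(ApiError::NotFound)?;
    // Ownership-equality check on the fetched row, before the mutation.
    if widget.owner_id != user.id {
        return Err(ApiError::Forbidden);
    }
    widget.name = name.to_string();
    Ok(())
}
```

Removing the `owner_id != user.id` comparison leaves a mutation keyed only by `widget_id`, which is the shape the rule description says it flags.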
## Sink classification
The same call name can be safe on a local collection and dangerous on a database. The detector categorises each candidate sink before deciding whether to flag:
| Class | Examples | Default treatment |
|---|---|---|
| `InMemoryLocal` | `map.insert`, `set.insert`, `vec.push` on tracked local | Never a sink |
| `RealtimePublish` | `realtime.publish_to_group`, `pubsub.send` | Sink unless ownership is established for the channel scope |
| `OutboundNetwork` | `http.post`, `reqwest::Client::post` | Sink unless a sanitiser is on the path |
| `CacheCrossTenant` | `redis.set`, `memcached.set` with scoped keys | Sink unless tenant is checked |
| `DbMutation` | `db.insert`, `repo.save` with scoped IDs | Sink unless ownership is established |
| `DbCrossTenantRead` | `db.query` returning rows from a tenant scope | Sink unless ACL-join or tenant predicate is present |
Receiver type drives the classification when SSA type facts are available, so `client.send(...)` correctly resolves through the receiver's inferred type.
## What it can't catch
- **Non-Rust frameworks**, in practice. Scaffolding exists; coverage doesn't.
- **Type-system authorization.** A typestate pattern that makes unauthenticated handlers fail to compile (`fn endpoint(user: AuthenticatedUser<Admin>)`) is invisible. This is mostly fine because the type system already enforced the check, but the rule won't credit it.
- **Authorization performed only via macros** that the AST doesn't expose as a recognisable call.
- **Cross-async-boundary actor binding.** If the handler binds `let user = require_auth(...).await?` and then uses `user.id` inside a task started with `tokio::spawn`, the spawn body is treated as a separate scope.
## The taint-based variant
A second rule, `rs.auth.missing_ownership_check.taint`, folds the same logic into the SSA/taint engine using the `Cap::UNAUTHORIZED_ID` capability (bit 12). Request-bound handler parameters seed `UNAUTHORIZED_ID` into taint state; ownership checks act as sanitizers that strip the cap; sinks that take scoped IDs require it absent.
This path is **off by default** while the standalone analyser carries the stable signal. To run it alongside the standalone analyser, enable it in the config:
```toml
[scanner]
enable_auth_as_taint = true
```
Run them together; if both fire for the same site, treat it as the same finding (the taint variant carries fuller flow evidence).
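The seed / strip / require-absent flow described in this section can be sketched with a plain bitmask. Only the bit position (12) comes from the text above; the function names and the use of a bare `u32` are assumptions for illustration, not the `Cap` type itself.

```rust
/// Bit 12, as stated for `Cap::UNAUTHORIZED_ID`; the rest is a sketch.
pub const UNAUTHORIZED_ID: u32 = 1 << 12;

/// Source: a request-bound handler parameter gains the capability.
pub fn seed_request_param(caps: u32) -> u32 {
    caps | UNAUTHORIZED_ID
}

/// Sanitizer: an ownership check strips the capability.
pub fn ownership_check(caps: u32) -> u32 {
    caps & !UNAUTHORIZED_ID
}

/// Sink: a scoped-id sink flags the value while the capability is present.
pub fn sink_flags(caps: u32) -> bool {
    (caps & UNAUTHORIZED_ID) != 0
}
```

A freshly seeded id is flagged at the sink; the same id passed through `ownership_check` first is not.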
## Tuning
### Add a project-specific authorization helper
```toml
[[analysis.languages.rust.rules]]
matchers = ["require_subscription", "ensure_paid_seat"]
kind = "sanitizer"
cap = "unauthorized_id"
```
The same rule recognised in the standalone analyser also strips `Cap::UNAUTHORIZED_ID` for the taint-based variant.
### Recognised actor names
Recognised by default: `user.id`, `user.user_id`, `user.uid`, `session.user_id`, `current_user.id`, plus typed extractor parameters with `CurrentUser`, `SessionUser`, `AuthUser`, `Extension<...>` shapes. To add a custom binding pattern, file an issue or add a fixture; the heuristic is in [`src/auth_analysis/checks.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/checks.rs) under `extract_validation_target` and friends.
### Suppress
Inline:
```rust
db.insert(widget_id, value)?; // nyx:ignore rs.auth.missing_ownership_check
```
Or filter by severity / confidence in CI:
```bash
nyx scan . --severity ">=MEDIUM" --min-confidence medium
```
## In the UI
Auth findings render alongside taint findings in the [browser UI](serve.md). The flow visualiser shows the sink call, the actor reference (when one was found), and any helper-summary path the engine traversed; the How to fix panel mirrors the rule's recommendation.
<p align="center"><img src="../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: numbered source → call → sink walk with a How to fix panel and an inline evidence object" width="900"/></p>
## Where the work was done
The remediation work is documented release-by-release in `tests/benchmark/RESULTS.md` under the Rust auth row. Phases A1 through B5 (precision and structural improvements) and Phase C (taint-based variant) all landed on the 0.5.0 release branch. The benchmark corpus at [`tests/benchmark/corpus/rust/auth/`](https://github.com/elicpeter/nyx/tree/master/tests/benchmark/corpus/rust/auth/) is 10 fixtures covering the five FP patterns plus a true-positive control.

1
docs/changelog.md Normal file

@ -0,0 +1 @@
{{#include ../CHANGELOG.md}}

View file

@ -53,8 +53,15 @@ nyx scan [PATH] [OPTIONS]
| Flag | Default | Description |
|------|---------|-------------|
| `-f, --format <FMT>` | `console` | Output format: `console`, `json`, or `sarif` |
| `--quiet` | off | Suppress status messages (stderr); stdout stays clean |
| `--quiet` | off | Suppress status messages (stderr), including the Preview-tier banner for C/C++ scans |
| `--no-rank` | off | Disable attack-surface ranking |
| `--no-state` | off | Disable state-model analysis (resource lifecycle + auth state). Overrides `scanner.enable_state_analysis` |
### Profiles
| Flag | Default | Description |
|------|---------|-------------|
| `--profile <NAME>` | *(none)* | Apply a named scan profile. Built-ins: `quick`, `full`, `ci`, `taint_only`, `conservative_large_repo`. User-defined profiles override built-ins with the same name. CLI flags still take precedence over profile values |
### Filtering
@ -63,10 +70,11 @@ nyx scan [PATH] [OPTIONS]
| `--severity <EXPR>` | *(none)* | Filter findings by severity |
| `--min-score <N>` | *(none)* | Drop findings with rank score below N |
| `--min-confidence <LEVEL>` | *(none)* | Drop findings below this confidence level (`low`, `medium`, `high`) |
| `--require-converged` | off | Drop findings whose engine provenance notes indicate widening (over-report) or analysis bail. Keeps `under-report` findings (emitted flow is still real). Intended for strict CI gates. |
| `--fail-on <SEV>` | *(none)* | Exit code 1 if any finding >= this severity |
| `--show-suppressed` | off | Show inline-suppressed findings (dimmed, tagged `[SUPPRESSED]`) |
| `--keep-nonprod-severity` | off | Don't downgrade severity for test/vendor paths |
| `--all` | off | Disable category filtering, rollups, and LOW budgets show everything |
| `--all` | off | Disable category filtering, rollups, and LOW budgets -- show everything |
| `--include-quality` | off | Include Quality-category findings (hidden by default) |
| `--max-low <N>` | `20` | Maximum total LOW findings to show |
| `--max-low-per-file <N>` | `1` | Maximum LOW findings per file |
@ -85,6 +93,65 @@ nyx scan [PATH] [OPTIONS]
**Deprecated aliases**: `--high-only` (use `--severity HIGH`), `--include-nonprod` (use `--keep-nonprod-severity`).
`--fail-on` returns a non-zero exit code when the threshold trips, so CI jobs fail without further wiring:
<p align="center"><img src="../assets/screenshots/docs/cli-failon.png" alt="nyx scan with --fail-on HIGH against a small fixture: three HIGH taint findings printed, followed by exit=1 from the shell" width="900"/></p>
Quality-category and rollup-prone Low findings are filtered down by default. The footer tells you exactly what got dropped and which knob to turn:
<p align="center"><img src="../assets/screenshots/docs/cli-rollup-tail.png" alt="nyx scan tail: warning '*' generated 57 issues; Suppressed 92 LOW/Quality findings; Active filters max_low=20, max_low_per_file=1, max_low_per_rule=10; Use --include-quality, --max-low, or --all to adjust" width="900"/></p>
### Analysis Engine Toggles
Override the corresponding `[analysis.engine]` values in `nyx.conf` for a single run. All default **on**; pass the `--no-*` variant to disable.
| Pair | Config field | Effect when disabled |
|------|---|---|
| `--constraint-solving` / `--no-constraint-solving` | `constraint_solving` | Skip path-constraint solving; infeasible paths no longer pruned |
| `--abstract-interp` / `--no-abstract-interp` | `abstract_interpretation` | Skip interval / string / bit abstract domains |
| `--context-sensitive` / `--no-context-sensitive` | `context_sensitive` | Treat intra-file callees insensitively (summary-only) |
| `--symex` / `--no-symex` | `symex.enabled` | Skip the symex pipeline; no symbolic verdicts or witnesses |
| `--cross-file-symex` / `--no-cross-file-symex` | `symex.cross_file` | Skip extracting / consulting cross-file SSA bodies |
| `--symex-interproc` / `--no-symex-interproc` | `symex.interprocedural` | Cap symex frame stack at the entry function |
| `--smt` / `--no-smt` | `symex.smt` | Skip the SMT backend (still a no-op without the `smt` feature) |
| `--backwards-analysis` / `--no-backwards-analysis` | `backwards_analysis` | Demand-driven backwards taint walk from sinks (default **off**) |
| `--parse-timeout-ms <N>` | `parse_timeout_ms` | Per-file tree-sitter parse timeout (ms); `0` disables the cap |
### Lattice-width Caps
Two caps bound the width of taint origin sets and points-to sets per SSA value. When a set would exceed the cap, entries are truncated deterministically and an engine note (`OriginsTruncated` / `PointsToTruncated`) is recorded on affected findings so you can see when precision was lost.
| Flag | Default | Description |
|------|---------|-------------|
| `--max-origins <N>` | `32` | Max taint origins retained per lattice value. Raise on very wide codebases where truncation is observed; lower only when lattice width is a measured bottleneck. Also set via `NYX_MAX_ORIGINS` |
| `--max-pointsto <N>` | `32` | Max abstract heap objects retained per points-to set. Raise on factory-heavy codebases where truncation is observed. Also set via `NYX_MAX_POINTSTO` |
See [configuration.md](configuration.md#analysisengine) for the full schema.
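Deterministic truncation of an origin set can be sketched as below. Which entries Nyx keeps is not stated here, so keeping the smallest ids from an ordered set is purely an assumption of the example; the returned flag plays the role of the `OriginsTruncated` note.

```rust
use std::collections::BTreeSet;

/// Keeps at most `cap` origins (the smallest ids, by assumption) and reports
/// whether anything was dropped so the caller can record an engine note.
pub fn cap_origins(origins: &BTreeSet<u32>, cap: usize) -> (BTreeSet<u32>, bool) {
    // A `BTreeSet` iterates in sorted order, so the result is deterministic.
    let kept: BTreeSet<u32> = origins.iter().copied().take(cap).collect();
    let truncated = origins.len() > cap;
    (kept, truncated)
}
```

With 40 origins and the default cap of 32, the sketch keeps 32 entries and sets the flag; a set already under the cap is returned unchanged with the flag clear.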
### Engine-Depth Profile
Individual engine toggles are fine-grained but hard to remember in combination. The `--engine-profile` shortcut sets the whole stack in one shot, and individual flags are layered on top after the profile is applied.
| Profile | Backwards | Symex | Abstract-interp | Context-sensitive |
|---------|-----------|-------|-----------------|-------------------|
| `fast` | off | off | off | off |
| `balanced` (default) | off | off | on | on |
| `deep` | on | on (cross-file + interprocedural) | on | on |
All three profiles build the AST, CFG, and SSA lattice and run forward taint; the columns above show which additional analyses each profile enables. SMT (`symex.smt`) is always off unless Nyx was built with `--features smt`.
Individual flags override the profile. For example, `--engine-profile fast --backwards-analysis` runs the fast stack but with backwards analysis on.
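The "profile first, flags on top" layering can be sketched as follows. The `Engine` and `Overrides` structs are hypothetical; the profile rows are copied from the table above, and config-file and env-var layers are omitted for brevity.

```rust
/// The four columns of the profile table, as plain booleans.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Engine {
    pub backwards: bool,
    pub symex: bool,
    pub abstract_interp: bool,
    pub context_sensitive: bool,
}

/// Explicit CLI overrides; `None` means the flag pair was not passed.
#[derive(Debug, Default, Clone, Copy)]
pub struct Overrides {
    pub backwards: Option<bool>,
    pub symex: Option<bool>,
    pub abstract_interp: Option<bool>,
    pub context_sensitive: Option<bool>,
}

/// Looks up a built-in profile row from the table.
pub fn profile(name: &str) -> Option<Engine> {
    let (backwards, symex, abstract_interp, context_sensitive) = match name {
        "fast" => (false, false, false, false),
        "balanced" => (false, false, true, true),
        "deep" => (true, true, true, true),
        _ => return None,
    };
    Some(Engine { backwards, symex, abstract_interp, context_sensitive })
}

/// Applies the profile first, then layers individual flags on top.
pub fn resolve(base: Engine, flags: Overrides) -> Engine {
    Engine {
        backwards: flags.backwards.unwrap_or(base.backwards),
        symex: flags.symex.unwrap_or(base.symex),
        abstract_interp: flags.abstract_interp.unwrap_or(base.abstract_interp),
        context_sensitive: flags.context_sensitive.unwrap_or(base.context_sensitive),
    }
}
```

Resolving `fast` with only `backwards: Some(true)` set mirrors `--engine-profile fast --backwards-analysis`: backwards analysis is on and the other three stay off.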
### Explain Effective Engine
`--explain-engine` prints the resolved engine configuration (profile + config + CLI overrides + env-var fallbacks) to stdout and exits without scanning. Useful for sanity-checking a CI invocation.
```bash
nyx scan --engine-profile deep --no-smt --explain-engine
```
<p align="center"><img src="../assets/screenshots/docs/cli-explain-engine.png" alt="nyx scan --engine-profile deep --explain-engine output: resolved config showing every analysis pass, its current state, and the CLI flag/env var that controls it" width="900"/></p>
### Examples
```bash
@ -148,6 +215,8 @@ nyx index status [PATH]
Display index statistics (file count, size, last modified) for the given path.
<p align="center"><img src="../assets/screenshots/docs/cli-idxstatus.png" alt="nyx index status output: project name, index path under the platform config dir, exists/size/modified fields" width="900"/></p>
---
## `nyx list`
@ -185,7 +254,9 @@ Manage configuration.
### `nyx config show`
Print the effective merged configuration as TOML.
Print the effective merged configuration as TOML. Useful for sanity-checking what the scanner is actually using after `nyx.conf` and `nyx.local` merge:
<p align="center"><img src="../assets/screenshots/docs/cli-configshow.png" alt="nyx config show output: TOML dump of the merged scanner config showing [scanner] mode/min_severity/excluded_extensions/excluded_directories, [database] settings, and resolved engine toggles" width="900"/></p>
### `nyx config path`
@ -204,7 +275,7 @@ Add a custom taint rule. Written to `nyx.local`.
| `--lang` | `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby` |
| `--matcher` | Function or property name to match |
| `--kind` | `source`, `sanitizer`, `sink` |
| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `all` |
| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `all` |
### `nyx config add-terminator`
@ -216,19 +287,30 @@ Add a terminator function (e.g. `process.exit`). Written to `nyx.local`.
---
## Exit Codes
## Exit codes
| Code | Meaning |
|------|---------|
| `0` | Scan completed; no findings matched `--fail-on` threshold (or no `--fail-on` specified) |
| `1` | Scan completed but at least one finding met or exceeded the `--fail-on` severity |
| Non-zero | Error during scan (I/O error, config parse error, database error, etc.) |
See [output.md](output.md#exit-codes). Summary: `0` on success (including findings without `--fail-on`), `1` when `--fail-on` trips, non-zero on scan errors.
---
## Environment Variables
## Environment variables
Runtime behaviour:
| Variable | Description |
|----------|-------------|
| `RUST_LOG` | Set tracing verbosity (e.g. `RUST_LOG=debug nyx scan .`) |
| `NO_COLOR` | Disable ANSI color output |
Engine toggles (legacy, still honored; prefer CLI flags or `[analysis.engine]` config):
| Variable | Matches |
|---|---|
| `NYX_CONSTRAINT` | `--constraint-solving` |
| `NYX_ABSTRACT_INTERP` | `--abstract-interp` |
| `NYX_CONTEXT_SENSITIVE` | `--context-sensitive` |
| `NYX_SYMEX`, `NYX_CROSS_FILE_SYMEX`, `NYX_SYMEX_INTERPROC` | `--symex` and friends |
| `NYX_SMT` | `--smt` (no-op without the `smt` feature) |
| `NYX_BACKWARDS` | `--backwards-analysis` |
| `NYX_PARSE_TIMEOUT_MS` | `--parse-timeout-ms` |
| `NYX_MAX_ORIGINS`, `NYX_MAX_POINTSTO` | `--max-origins`, `--max-pointsto` |

View file

@ -1,6 +1,8 @@
# Configuration
Nyx uses TOML configuration files. A default config is auto-generated on first run.
Nyx uses TOML configuration files. A default config is auto-generated on first run. If you'd rather edit settings and rules from the browser, the [Config page in `nyx serve`](serve.md#config) is a live editor that writes back to `nyx.local`:
<p align="center"><img src="../assets/screenshots/docs/serve-config.png" alt="Nyx config page: General settings, Triage Sync toggle, Sources panel with language/matcher/capability dropdowns and a per-language matcher table" width="900"/></p>
## File Locations
@ -14,8 +16,8 @@ Run `nyx config path` to see the exact directory on your system.
## File Precedence
1. **`nyx.conf`** Default config (auto-created from built-in template on first run)
2. **`nyx.local`** User overrides (loaded on top of defaults)
1. **`nyx.conf`** -- Default config (auto-created from built-in template on first run)
2. **`nyx.local`** -- User overrides (loaded on top of defaults)
Both files are optional. CLI flags take precedence over both.
@ -24,8 +26,10 @@ Both files are optional. CLI flags take precedence over both.
| Type | Behavior |
|------|----------|
| Scalars (`mode`, `min_severity`, booleans) | User value wins |
| Arrays (`excluded_extensions`, `excluded_directories`) | Union + deduplicate |
| Arrays (`excluded_extensions`, `excluded_directories`, `excluded_files`) | Union + deduplicate |
| Analysis rules | Per-language union with deduplication |
| Profiles | User profile with same name fully replaces built-in |
| Server / Runs | User value wins (full section override) |
Example:
```toml
@ -36,7 +40,7 @@ excluded_extensions = ["jpg", "png", "exe"]
excluded_extensions = ["foo", "jpg"]
# Effective result:
# ["exe", "foo", "jpg", "png"] sorted, deduped union
# ["exe", "foo", "jpg", "png"] -- sorted, deduped union
```
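The array merge rule (union, deduplicate, sort) is small enough to sketch directly. `merge_arrays` is a hypothetical helper written for this example, not a function from the Nyx codebase.

```rust
use std::collections::BTreeSet;

/// Merges default and user arrays into a sorted, deduplicated union.
pub fn merge_arrays<'a>(defaults: &[&'a str], user: &[&'a str]) -> Vec<String> {
    // A `BTreeSet` both removes duplicates and yields entries in sorted order.
    let union: BTreeSet<&str> = defaults
        .iter()
        .copied()
        .chain(user.iter().copied())
        .collect();
    union.into_iter().map(str::to_string).collect()
}
```

Merging `["jpg", "png", "exe"]` with `["foo", "jpg"]` yields `["exe", "foo", "jpg", "png"]`, matching the effective result shown in the TOML example.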
---
@ -49,30 +53,33 @@ excluded_extensions = ["foo", "jpg"]
|-------|------|---------|-------------|
| `mode` | `"full"` \| `"ast"` \| `"cfg"` \| `"taint"` | `"full"` | Analysis mode |
| `min_severity` | `"Low"` \| `"Medium"` \| `"High"` | `"Low"` | Minimum severity to report |
| `max_file_size_mb` | int \| null | null | Max file size in MiB; null = unlimited |
| `max_file_size_mb` | int \| null | 16 | Max file size in MiB; null = unlimited. Default is a safe ceiling for untrusted repos; lift explicitly when scanning trusted codebases with large generated files |
| `excluded_extensions` | [string] | `["jpg", "png", "gif", "mp4", ...]` | File extensions to skip |
| `excluded_directories` | [string] | `["node_modules", ".git", "target", ...]` | Directories to skip |
| `excluded_files` | [string] | `[]` | Specific files to skip |
| `read_global_ignore` | bool | `false` | Honor global ignore file |
| `read_global_ignore` | bool | `false` | Honor global ignore file (RESERVED) |
| `read_vcsignore` | bool | `true` | Honor `.gitignore` / `.hgignore` |
| `require_git_to_read_vcsignore` | bool | `true` | Require `.git` dir to apply gitignore |
| `one_file_system` | bool | `false` | Don't cross filesystem boundaries |
| `follow_symlinks` | bool | `false` | Follow symbolic links |
| `scan_hidden_files` | bool | `false` | Scan dot-files |
| `include_nonprod` | bool | `false` | Keep original severity for test/vendor paths |
| `enable_state_analysis` | bool | `false` | Enable resource lifecycle + auth state analysis. Detects use-after-close, double-close, resource leaks (per-function scope), and unauthenticated access. Requires `mode = "full"` or `mode = "cfg"`. |
| `enable_state_analysis` | bool | `true` | Enable resource lifecycle + auth state analysis. Detects use-after-close, double-close, resource leaks (per-function scope), and unauthenticated access. Requires `mode = "full"` or `mode = "taint"`. |
### `[database]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `path` | string | `""` | Custom SQLite DB path; empty = platform default |
| `path` | string | `""` | Custom SQLite DB path; empty = platform default (RESERVED) |
| `auto_cleanup_days` | int | `30` | Days to keep DB files (RESERVED) |
| `max_db_size_mb` | int | `1024` | Maximum DB size in MiB (RESERVED) |
| `vacuum_on_startup` | bool | `false` | Run VACUUM before indexed scans |
### `[output]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `default_format` | `"console"` \| `"json"` \| `"sarif"` | `"console"` | Default output format |
| `default_format` | `"console"` \| `"json"` \| `"sarif"` | `"console"` | Default output format (used when `--format` is not specified) |
| `quiet` | bool | `false` | Suppress status messages |
| `max_results` | int \| null | null | Cap number of findings; null = unlimited |
| `attack_surface_ranking` | bool | `true` | Enable attack-surface ranking |
@ -89,11 +96,122 @@ excluded_extensions = ["foo", "jpg"]
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max_depth` | int \| null | null | Max filesystem traversal depth; null = unlimited |
| `min_depth` | int \| null | null | Min depth for reported entries (RESERVED) |
| `prune` | bool | `false` | Stop traversing into matching directories (RESERVED) |
| `worker_threads` | int \| null | null | Worker thread count; null/0 = auto-detect |
| `batch_size` | int | `100` | Files per index batch |
| `channel_multiplier` | int | `4` | Channel capacity = threads x multiplier |
| `rayon_thread_stack_size` | int | `8388608` | Rayon thread stack size in bytes (8 MiB) |
| `prune` | bool | `false` | Stop traversing into matching directories |
| `scan_timeout_secs` | int \| null | null | Per-file timeout in seconds (RESERVED) |
| `memory_limit_mb` | int | `512` | Max memory in MiB (RESERVED) |
### `[server]`
Configuration for the local web UI (`nyx serve`).
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `true` | Whether the serve command is enabled |
| `host` | string | `"127.0.0.1"` | Host to bind to (localhost by default) |
| `port` | int | `9700` | Port for the web UI |
| `open_browser` | bool | `true` | Open browser automatically on serve |
| `auto_reload` | bool | `true` | Auto-reload UI when scan results change |
| `persist_runs` | bool | `true` | Persist scan runs for history view |
| `max_saved_runs` | int | `50` | Maximum number of saved runs |
### `[runs]`
Configuration for scan run persistence and history.
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `persist` | bool | `false` | Persist scan run history to disk |
| `max_runs` | int | `100` | Maximum number of runs to keep |
| `save_logs` | bool | `false` | Save scan logs with each run |
| `save_stdout` | bool | `false` | Save stdout capture with each run |
| `save_code_snippets` | bool | `true` | Save code snippets in findings |
### `[profiles.<name>]`
Named scan presets that override scan-related config. Activate with `--profile <name>`.
All fields are optional; omitted fields inherit from the base config.
| Field | Type | Description |
|-------|------|-------------|
| `mode` | string | Analysis mode |
| `min_severity` | string | Minimum severity |
| `max_file_size_mb` | int | Max file size in MiB |
| `include_nonprod` | bool | Keep original severity for test/vendor |
| `enable_state_analysis` | bool | Enable state analysis |
| `default_format` | string | Output format |
| `quiet` | bool | Suppress status output |
| `attack_surface_ranking` | bool | Enable ranking |
| `max_results` | int | Max findings |
| `min_score` | int | Min rank score |
| `show_all` | bool | Show all findings |
| `include_quality` | bool | Include quality findings |
| `worker_threads` | int | Worker thread count |
| `max_depth` | int | Max traversal depth |
**Built-in profiles:**
| Name | Description |
|------|-------------|
| `quick` | AST-only, medium+ severity |
| `full` | Full analysis with state analysis enabled |
| `ci` | Full analysis, medium+ severity, quiet, SARIF output |
| `taint_only` | Taint analysis only |
| `conservative_large_repo` | AST-only, high severity, 5 MiB file limit, depth 10 |
User-defined profiles with the same name as a built-in will override it.
### `[analysis.engine]`
Release-grade switches for the optional analysis passes. Each toggle has a
matching CLI flag (pair of `--foo` / `--no-foo`) that overrides the config
value for a single run. These used to be `NYX_*` environment variables
(`NYX_CONSTRAINT`, `NYX_ABSTRACT_INTERP`, `NYX_SYMEX`, `NYX_CROSS_FILE_SYMEX`,
`NYX_SYMEX_INTERPROC`, `NYX_CONTEXT_SENSITIVE`, `NYX_PARSE_TIMEOUT_MS`,
`NYX_SMT`); those env vars are still honored as a last-resort override when
nyx is used as a library (no CLI entry point), but the config/CLI surface is
the stable path.
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `constraint_solving` | bool | `true` | Path-constraint solving (prunes infeasible paths in taint) |
| `abstract_interpretation` | bool | `true` | Interval / string / bit abstract domains carried through the SSA worklist |
| `context_sensitive` | bool | `true` | k=1 context-sensitive callee inlining for intra-file calls |
| `backwards_analysis` | bool | `false` | Demand-driven backwards taint walk from sinks (adds scan time; default off) |
| `parse_timeout_ms` | int | `10000` | Per-file tree-sitter parse timeout; `0` disables the cap |
**`[analysis.engine.symex]`** sub-section:
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `true` | Run the symex pipeline after taint; adds witness strings and symbolic verdicts |
| `cross_file` | bool | `true` | Persist / consult cross-file SSA bodies so symex can reason about callees defined in other files |
| `interprocedural` | bool | `true` | Intra-file interprocedural symex (k ≥ 2 via frame stack) |
| `smt` | bool | `true` | Use the SMT backend when nyx is built with the `smt` feature; ignored otherwise |
CLI flag map (boolean toggles come as a `--foo` / `--no-foo` pair; `parse_timeout_ms` takes a value):
| Config field | CLI flags |
|---|---|
| `constraint_solving` | `--constraint-solving` / `--no-constraint-solving` |
| `abstract_interpretation` | `--abstract-interp` / `--no-abstract-interp` |
| `context_sensitive` | `--context-sensitive` / `--no-context-sensitive` |
| `backwards_analysis` | `--backwards-analysis` / `--no-backwards-analysis` |
| `parse_timeout_ms` | `--parse-timeout-ms <N>` |
| `symex.enabled` | `--symex` / `--no-symex` |
| `symex.cross_file` | `--cross-file-symex` / `--no-cross-file-symex` |
| `symex.interprocedural` | `--symex-interproc` / `--no-symex-interproc` |
| `symex.smt` | `--smt` / `--no-smt` |
**Engine-depth profile shortcut**: instead of flipping individual toggles, pass `--engine-profile {fast,balanced,deep}` to set the whole stack at once. Individual flags override the profile, so `--engine-profile fast --backwards-analysis` runs the fast stack with backwards analysis on. See `docs/cli.md` for the exact toggle matrix.
**Explain effective engine**: pass `--explain-engine` to print the resolved engine configuration (profile + config + CLI overrides) and exit without scanning.
### `[analysis.languages.<slug>]`
@ -112,7 +230,9 @@ Per-language custom rules. `<slug>` is one of: `rust`, `javascript`, `typescript
matchers = ["escapeHtml"]
kind = "sanitizer" # "source" | "sanitizer" | "sink"
cap = "html_escape" # "env_var" | "html_escape" | "shell_escape" |
# "url_encode" | "json_parse" | "file_io" | "all"
# "url_encode" | "json_parse" | "file_io" |
# "fmt_string" | "sql_query" | "deserialize" |
# "ssrf" | "code_exec" | "crypto" | "all"
```
---
@ -146,6 +266,26 @@ default_format = "sarif"
worker_threads = 4
```
### Using a scan profile
```bash
# Use a built-in profile
nyx scan --profile ci
# CLI flags still override profile values
nyx scan --profile ci --format json
```
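The precedence described here (base config, then profile, then CLI flags) can be sketched as a layered dictionary merge. This is an illustration only: `resolve` is a hypothetical helper, and the `ci` dictionary is an approximation built from the built-in profile description, not the actual profile definition.

```python
def resolve(base: dict, profile: dict, cli: dict) -> dict:
    """Later layers win: base config < profile < CLI flags.

    A value of None in a layer means "not set" and is skipped.
    """
    effective = dict(base)
    for layer in (profile, cli):
        effective.update({k: v for k, v in layer.items() if v is not None})
    return effective


base = {"mode": "full", "min_severity": "Low", "default_format": "console", "quiet": False}
ci = {"mode": "full", "min_severity": "Medium", "default_format": "sarif", "quiet": True}  # assumed values
cli = {"default_format": "json"}  # corresponds to `--format json`

effective = resolve(base, ci, cli)
print(effective["default_format"])  # json
print(effective["min_severity"])    # Medium
```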
### Custom profile
```toml
[profiles.security_audit]
mode = "full"
min_severity = "Low"
enable_state_analysis = true
show_all = true
```
### Custom rules for a Node.js project
```toml
@ -181,3 +321,93 @@ nyx config add-terminator --lang javascript --name process.exit
# Verify
nyx config show
```
---
## Config Validation
Config is validated after loading and merging. Validation checks include:
- Server port must be in the range 1-65535
- Server host must not be empty
- `max_saved_runs` must be > 0 when `persist_runs` is true
- `max_runs` must be > 0 when `persist` is true
- `batch_size` and `channel_multiplier` must be > 0
- `rollup_examples` must be > 0
- Profile names must be alphanumeric with underscores only
Invalid config produces structured error messages identifying the section, field, and issue.
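A minimal sketch of what a few of these checks could look like, written in Python for illustration. It is not the Nyx validator: `validate` and the error tuple shape are hypothetical, only the `[server]`, `[runs]`, and profile-name checks are covered, and the fallback defaults are taken from the field tables in this document.

```python
import re

PROFILE_NAME = re.compile(r"[A-Za-z0-9_]+")


def validate(config: dict) -> list[tuple[str, str, str]]:
    """Return (section, field, issue) tuples; an empty list means the config is valid."""
    errors = []
    server = config.get("server", {})
    runs = config.get("runs", {})

    if not 1 <= server.get("port", 9700) <= 65535:
        errors.append(("server", "port", "must be in the range 1-65535"))
    if not server.get("host", "127.0.0.1"):
        errors.append(("server", "host", "must not be empty"))
    if server.get("persist_runs", True) and server.get("max_saved_runs", 50) <= 0:
        errors.append(("server", "max_saved_runs", "must be > 0 when persist_runs is true"))
    if runs.get("persist", False) and runs.get("max_runs", 100) <= 0:
        errors.append(("runs", "max_runs", "must be > 0 when persist is true"))
    for name in config.get("profiles", {}):
        if not PROFILE_NAME.fullmatch(name):
            errors.append(("profiles", name, "name must be alphanumeric with underscores only"))
    return errors


bad = {
    "server": {"port": 70000, "host": ""},
    "runs": {"persist": True, "max_runs": 0},
    "profiles": {"my-profile": {}},
}
for section, field, issue in validate(bad):
    print(f"[{section}] {field}: {issue}")
```

Collecting every problem into a list, rather than stopping at the first one, matches the idea of reporting each issue with its section and field.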
---
## State Analysis
State analysis detects resource lifecycle violations (use-after-close, double-close, resource leaks) and unauthenticated access patterns. It is **enabled by default**.
To disable:
```toml
[scanner]
enable_state_analysis = false
```
State analysis requires `mode = "full"` or `mode = "taint"`. It has no effect in `mode = "ast"`.
**Tradeoffs**:
- Additional per-function state-machine pass adds some scan time
- May produce findings that require domain knowledge to evaluate (e.g., whether a resource handle is intentionally left open)
- Most useful for C, C++, Rust, Go, and Java where acquire/release patterns are common
---
## Upgrading
### Engine-version mismatch is handled automatically
Nyx stores the scanner's `CARGO_PKG_VERSION` in the project index database.
When the version recorded in the DB differs from the running binary, or the
row is missing entirely, every cached summary, SSA body, and file-hash row
is wiped on the next open so the next scan rebuilds the index against the new
engine. No flag is needed; CI pipelines keep working across upgrades.
The rebuild is logged at `info` level:
```
engine version changed (0.4.0 → 0.5.0), rebuilding index
```
If you see this once per upgrade it is working as intended. If you see it on
every scan, the metadata row is not being persisted; file an issue.
### Forcing a reindex
Use `--index rebuild` to throw away the current project's cached summaries
and re-run pass 1 against the current rules. Useful after editing
`nyx.local` rules, after an upgrade that changed label definitions without
changing the engine version, or when you want a known-clean baseline:
```bash
nyx scan --index rebuild .
```
This clears the current project's rows in `files`, `function_summaries`,
`ssa_function_summaries`, and `ssa_function_bodies`; other projects sharing
the same DB directory are untouched.
### Recovering from a corrupt database
If the `.sqlite` file itself is damaged (e.g. from a killed scan or full
disk) and `nyx scan` fails to open it, delete the file and let the next
scan recreate it:
```bash
rm "$(nyx config path)"/<project>.sqlite*
```
On the next scan Nyx builds a fresh index from scratch.
---
## Reserved Fields
Some config fields are defined but not yet implemented. They are marked `(RESERVED)` in the default config and accept values without effect. This allows forward-compatible config files; settings will activate when the feature is implemented without requiring config changes.


@ -1,81 +1,68 @@
# Detector Overview
# Detectors
Nyx uses four independent detector families. Each targets different vulnerability classes and operates at a different level of analysis depth. Findings from all active detectors are merged, deduplicated, ranked, and presented in a single result set.
Nyx ships four independent detector families. They run together in `--mode full`, the default. Findings are merged, deduplicated, ranked, and printed in one result set.
## The Four Detector Families
| Family | Rule prefix | Looks at | What it finds |
|---|---|---|---|
| [Taint analysis](detectors/taint.md) | `taint-*` | Cross-file dataflow | Unsanitized data flowing source to sink |
| [CFG structural](detectors/cfg.md) | `cfg-*` | Per-function control flow | Auth gaps, unguarded sinks, error fallthrough, resource release on all paths |
| [State model](detectors/state.md) | `state-*` | Per-function state lattice | Use-after-close, double-close, leaks, unauthenticated access |
| [AST patterns](detectors/patterns.md) | `<lang>.<cat>.<name>` | Tree-sitter structural match | Banned APIs, weak crypto, dangerous constructs |
| Family | Rule prefix | Analysis depth | What it finds |
|--------|------------|----------------|---------------|
| [**Taint Analysis**](detectors/taint.md) | `taint-*` | Cross-file dataflow | Unsanitized data flowing from sources to sinks |
| [**CFG Structural**](detectors/cfg.md) | `cfg-*` | Intra-procedural CFG | Auth gaps, unguarded sinks, resource leaks, error fallthrough |
| [**State Model**](detectors/state.md) | `state-*` | Intra-procedural lattice | Use-after-close, double-close, resource leaks, unauthenticated access |
| [**AST Patterns**](detectors/patterns.md) | `<lang>.*.*` | Structural (no flow) | Dangerous function calls, banned APIs, weak crypto |
For Rust auth-specific rules (`rs.auth.*`), see [auth.md](auth.md).
## How They Combine
## How they combine
In `--mode full` (default), all four families run. Findings are deduplicated:
In `--mode full`:
1. **Taint supersedes AST**: If a taint finding and an AST pattern both fire at the same location (e.g. both flag `eval(userInput)`), both are kept with distinct rule IDs. The taint finding ranks higher due to the analysis-kind bonus.
1. **Taint and AST can both fire on one line.** If `eval(userInput)` triggers both `js.code_exec.eval` (AST) and `taint-unsanitised-flow` (taint), both are kept with distinct rule IDs. The taint finding ranks higher because of the analysis-kind bonus.
2. **State supersedes CFG on resource leaks.** When `state-resource-leak` and `cfg-resource-leak` fire at the same location, the CFG one is dropped.
3. **Exact duplicates are removed.** Same line, column, rule ID, severity → one finding.
2. **State supersedes CFG**: If a state-model finding (e.g. `state-resource-leak`) fires at the same location as a CFG finding (e.g. `cfg-resource-leak`), the CFG finding is suppressed.
## Modes
3. **Location-level dedup**: Exact duplicates (same line, column, rule ID, severity) are removed.
| Mode | Active detectors |
|---|---|
| `full` (default) | All four |
| `ast` | AST patterns only |
| `cfg` | Taint + CFG + State (no AST patterns) |
| `taint` | Taint + State |
## Analysis Modes
## Attack-surface ranking
| Mode | CLI flag | Active detectors |
|------|----------|-----------------|
| Full | `--mode full` | All four |
| AST-only | `--mode ast` | AST patterns only |
| CFG/Taint | `--mode cfg` | Taint + CFG + State |
## Attack-Surface Ranking
Every finding receives a deterministic **attack-surface score** estimating exploitability. Findings are sorted by descending score.
### Scoring Formula
Every finding gets a deterministic score. Findings are sorted by descending score by default. Disable with `--no-rank` or `output.attack_surface_ranking = false`.
```
score = severity_base + analysis_kind + evidence_strength + state_bonus - validation_penalty
```
| Component | Values | Purpose |
|-----------|--------|---------|
| **Severity base** | High=60, Medium=30, Low=10 | Primary signal |
| **Analysis kind** | taint=+10, state=+8, cfg(with evidence)=+5, cfg(no evidence)=+3, ast=+0 | Confidence of analysis |
| **Evidence strength** | +1 per evidence item (max 4), +2-6 for source kind | Specificity of finding |
| **State bonus** | use-after-close/unauthed=+6, double-close=+3, must-leak=+2, may-leak=+1 | State rule severity |
| **Validation penalty** | -5 if path-validated | Guard reduces exploitability |
| Component | Values |
|---|---|
| Severity base | High=60, Medium=30, Low=10 |
| Analysis kind | taint=+10, state=+8, cfg with evidence=+5, cfg without evidence=+3, ast=+0 |
| Evidence strength | +1 per evidence item up to 4; +2 to +6 for source kind |
| State bonus | use-after-close / unauthed=+6, double-close=+3, must-leak=+2, may-leak=+1 |
| Validation penalty | -5 if path-validated |
### Source-kind priority
Source-kind contributions (taint only):
| Source type | Bonus | Examples |
|-------------|-------|---------|
| User input | +6 | `req.body`, `argv`, `stdin`, `form`, `query`, `params` |
| Environment | +5 | `env::var`, `getenv`, `process.env` |
| Unknown | +4 | Conservative default |
| File system | +3 | `fs::read_to_string`, `fgets` |
| Database | +2 | Query results |
| Source | Bonus |
|---|---|
| User input (`req.body`, `argv`, `stdin`, `form`, `query`, `params`) | +6 |
| Environment (`env::var`, `getenv`, `process.env`) | +5 |
| Unknown | +4 |
| File system | +3 |
| Database | +2 |
### Score ranges (approximate)
Approximate score ranges:
| Finding type | Score range |
|-------------|------------|
| High taint + user input | ~76-80 |
| Finding type | Score |
|---|---|
| High taint with user input | 76 to 81 |
| High state (use-after-close) | ~74 |
| High CFG structural | ~63-68 |
| Medium taint + env source | ~45-50 |
| High CFG structural | 63 to 68 |
| Medium taint with env source | 45 to 50 |
| Medium state (resource leak) | ~40 |
| Low AST-only pattern | ~10 |
Ranking is enabled by default. Disable with `--no-rank` or `output.attack_surface_ranking = false`.
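The scoring formula can be worked through with a small Python sketch. The numeric values come from the tables above; the dictionary keys, the `score` function, and the way evidence items are counted are assumptions made for illustration, not Nyx's internal representation.

```python
SEVERITY_BASE = {"High": 60, "Medium": 30, "Low": 10}
ANALYSIS_KIND = {"taint": 10, "state": 8, "cfg_with_evidence": 5, "cfg_no_evidence": 3, "ast": 0}
SOURCE_KIND = {"user_input": 6, "environment": 5, "unknown": 4, "file_system": 3, "database": 2}
STATE_BONUS = {"use_after_close": 6, "unauthed": 6, "double_close": 3, "must_leak": 2, "may_leak": 1}


def score(severity, kind, evidence_items=0, source_kind=None, state_rule=None, path_validated=False):
    """severity_base + analysis_kind + evidence_strength + state_bonus - validation_penalty"""
    evidence = min(evidence_items, 4)          # +1 per evidence item, capped at 4
    if source_kind is not None:
        evidence += SOURCE_KIND[source_kind]   # +2 to +6 depending on the source kind
    total = SEVERITY_BASE[severity] + ANALYSIS_KIND[kind] + evidence
    total += STATE_BONUS.get(state_rule, 0)
    if path_validated:
        total -= 5                             # a validating guard lowers the score
    return total


print(score("High", "taint", evidence_items=2, source_kind="user_input"))  # 78
print(score("High", "state", state_rule="use_after_close"))                # 74
print(score("Medium", "state", state_rule="must_leak"))                    # 40
print(score("Low", "ast"))                                                 # 10
```

The four results fall inside the approximate ranges listed in the score table.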
## Two-Pass Architecture
Nyx's taint analysis requires cross-file context, achieved via two passes:
1. **Pass 1 — Summary extraction**: Each file is parsed, a CFG is built, and a `FuncSummary` is extracted per function. Summaries capture source/sanitizer/sink capabilities (bitflags), taint propagation behavior, and callee lists. Summaries are persisted to SQLite.
2. **Pass 2 — Analysis**: All summaries are merged into a global map. Files are re-parsed and analyzed with full cross-file context. The taint engine resolves callees against local summaries (more precise) first, then falls back to global summaries.
With indexing enabled, Pass 1 skips files whose content hash hasn't changed since the last scan.
For the engine's runtime model (passes, summaries, SCC fixed-point), see [how-it-works.md](how-it-works.md).


@ -1,161 +1,130 @@
# CFG Structural Analysis
# CFG structural analysis
## Summary
Nyx builds an intra-procedural control-flow graph per function and checks structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error paths terminate before reaching dangerous code.
Nyx builds an intra-procedural control-flow graph (CFG) for each function and analyzes structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error-handling code terminates properly.
These detectors use **dominator analysis** — they check whether a guard node dominates (must execute before) a sink node on the CFG.
These detectors use dominator analysis. A guard dominates a sink when the guard must execute before the sink on every path from entry.
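To make the dominance check concrete, here is a minimal Python sketch of the classic iterative dominator computation on a toy CFG. It illustrates the general technique, not Nyx's implementation; the node names and the `dominators` helper are hypothetical, and the sketch assumes every node appears as a key in the graph.

```python
def dominators(cfg: dict[str, list[str]], entry: str) -> dict[str, set[str]]:
    """Map each node to the set of nodes that dominate it (iterative data-flow)."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].add(node)

    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if preds[n]:
                new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            else:
                new = {n}  # no predecessors: unreachable from entry
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# One branch skips validation, so "validate" does not dominate "sink".
unguarded = {
    "entry": ["branch"],
    "branch": ["validate", "skip"],
    "validate": ["sink"],
    "skip": ["sink"],
    "sink": ["exit"],
    "exit": [],
}
# Straight-line code: the guard runs on every path to the sink.
guarded = {"entry": ["validate"], "validate": ["sink"], "sink": ["exit"], "exit": []}

print("validate" in dominators(unguarded, "entry")["sink"])  # False
print("validate" in dominators(guarded, "entry")["sink"])    # True
```

In the first graph a guard exists but only on one branch, which is exactly the shape a dominance-based check is meant to flag.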
## Rule IDs
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `cfg-unguarded-sink` | High/Medium | Sink reachable without a dominating guard or sanitizer |
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth check |
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
| `cfg-unreachable-source` | Low | Source in unreachable code |
| `cfg-error-fallthrough` | High/Medium | Error check doesn't terminate; dangerous code follows |
| `cfg-resource-leak` | Medium | Resource acquired but not released on all exit paths |
| `cfg-lock-not-released` | Medium | Lock acquired but not released on all exit paths |
| Rule ID | Severity |
|---|---|
| `cfg-unguarded-sink` | High/Medium |
| `cfg-auth-gap` | High |
| `cfg-unreachable-sink` | Medium |
| `cfg-unreachable-sanitizer` | Low |
| `cfg-unreachable-source` | Low |
| `cfg-error-fallthrough` | High/Medium |
| `cfg-resource-leak` | Medium |
| `cfg-lock-not-released` | Medium |
## What It Detects
## What it detects
### Unguarded sinks (`cfg-unguarded-sink`)
A sink call (e.g. `system()`, `eval()`, `Command::new()`) is reachable from the function entry without passing through a guard or sanitizer that matches the sink's capability.
**`cfg-unguarded-sink`**: A sink call (`system`, `eval`, `Command::new`, `db.execute`, etc.) is reachable from function entry without passing through any guard or sanitizer that matches the sink's capability.
### Auth gaps (`cfg-auth-gap`)
A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`) reaches a privileged sink (shell execution, file I/O) without a prior call to an authentication function (`is_authenticated`, `require_auth`, `check_permission`, etc.).
**`cfg-auth-gap`**: A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`, language-dependent) reaches a privileged sink (shell execution, file I/O) without a preceding authentication call.
### Unreachable security code (`cfg-unreachable-*`)
Sinks, sanitizers, or sources in dead code branches. This often indicates a refactoring error where security-critical code was accidentally made unreachable.
**`cfg-unreachable-*`**: Sinks, sanitizers, or sources in dead code. Usually signals a refactoring error that silently disabled security-relevant logic.
### Error fallthrough (`cfg-error-fallthrough`)
An error check (null check, error return check) does not terminate the function or loop back. Execution continues to a dangerous operation on the error path.
**`cfg-error-fallthrough`**: An error-handling branch (null check, error-return check) does not terminate. Execution falls through to a dangerous operation on the error path.
### Resource leaks (`cfg-resource-leak`, `cfg-lock-not-released`)
A resource acquisition call (e.g. `File::open`, `fopen`, `socket`, `Lock`) is not matched by a release call (e.g. `close`, `fclose`, `unlock`) on all exit paths from the function.
**`cfg-resource-leak`, `cfg-lock-not-released`**: A resource acquisition (`File::open`, `fopen`, `socket`, `Lock`) is not matched by a release on every exit path from the function.
## What It Cannot Detect
## What it can't detect
- **Inter-procedural guards**: If authentication is checked in a middleware function that calls this handler, the CFG detector cannot see it. It only analyzes one function at a time.
- **Dynamic dispatch**: Virtual method calls, function pointers, and closures are opaque to the CFG.
- **Complex guard patterns**: Only recognized guard function names are checked. Custom validation logic (e.g. `if password == expected`) is not recognized as a guard.
- **Correct sanitization**: The detector checks that *some* guard dominates the sink, not that the guard is *correct*. A guard that always passes would suppress the finding.
- **Cross-function resource flows**: If a file handle is opened in one function and closed in another, the detector will report a leak in the first function.
- **Inter-procedural guards.** Middleware-level auth, helper functions that internally call auth, and cleanup performed in a caller are invisible.
- **Dynamic dispatch.** Virtual calls, function pointers, and closures resolve to no specific callee.
- **Correctness of guards.** The detector checks that *a* guard dominates the sink. It cannot check that the guard is correct. A no-op `if true {}` would suppress the finding.
- **Custom validation logic.** Only recognised guard names are checked. `if password == expected` is not a recognised guard.
- **Cross-function resource flows.** If a file handle opens in one function and closes in another, the opener gets flagged as a leak. This is the largest source of false positives on factory-pattern code.
## Common False Positives
## Common false positives
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| Framework-level auth middleware | Handler doesn't call auth directly | Document as expected; suppress with severity filter |
| Resource closed via RAII/defer | Implicit cleanup not visible to CFG | Currently not detected; known limitation |
| Custom guard function name | Function not in the recognized guard list | Add the function name as a sanitizer in config |
| Test handlers | Intentionally skip auth in tests | Default non-prod downgrade reduces severity; or exclude test dirs |
| Scenario | Why | Mitigation |
|---|---|---|
| Framework middleware auth | Handler doesn't call auth directly | Expected; suppress with severity filter or exclude handlers |
| RAII / defer cleanup | Implicit release not visible to CFG (partially handled for Rust Drop and Go defer) | Known limitation |
| Custom guard name | Function not in the recognised guard list | Add it as a sanitizer rule in config |
| Test handlers | Intentional lack of auth | Default non-prod downgrade reduces severity; or exclude test dirs |
## Common False Negatives
## Common false negatives
| Scenario | Why it's missed |
|----------|----------------|
| Auth in called function | Cross-function guards not tracked |
| Guard via type system | Type-level guarantees (e.g. Rust's `AuthenticatedUser` wrapper) not analyzed |
| Resource closed in finally/defer | Some cleanup patterns not recognized |
| Scenario | Why |
|---|---|
| Auth in a called helper | Cross-function guards not tracked |
| Type-system guards | Rust `AuthenticatedUser<T>` wrappers and typestate patterns are not analysed |
| Cleanup in `finally`/`ensure`/`defer` in callers | Cross-function cleanup not tracked |
## Confidence Signals
## Tuning
| Signal | Meaning |
|--------|---------|
| **Evidence lists guard nodes** | Shows which guards were checked and found missing |
| **Sink has high capability** | Shell execution or file I/O sinks are higher risk |
| **Handler detection matched** | Web handler identification is based on conventional parameter names |
### Recognised guard names
## Tuning and Noise Controls
Nyx accepts these patterns as dominating guards:
### Add custom guards/sanitizers
| Pattern | Applies to |
|---|---|
| `validate*`, `sanitize*` | All sinks |
| `check_*`, `verify_*`, `assert_*` | All sinks |
| `shell_escape` | Shell sinks |
| `html_escape` | HTML/XSS sinks |
| `url_encode` | URL sinks |
| `which` | Shell execution (binary lookup) |
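The table can be read as a list of name patterns, each scoped to a kind of sink. The Python sketch below assumes glob-style matching on the callee name and uses made-up scope labels (`"shell"`, `"html"`, `"url"`); Nyx's actual matcher and capability names may differ.

```python
from fnmatch import fnmatchcase

# (pattern, scope) pairs taken from the table; scope labels are hypothetical.
GUARD_PATTERNS = [
    ("validate*", "all"),
    ("sanitize*", "all"),
    ("check_*", "all"),
    ("verify_*", "all"),
    ("assert_*", "all"),
    ("shell_escape", "shell"),
    ("html_escape", "html"),
    ("url_encode", "url"),
    ("which", "shell"),
]


def guard_applies(callee: str, sink_kind: str) -> bool:
    """True if the callee name matches a guard pattern whose scope covers the sink."""
    return any(
        fnmatchcase(callee, pattern) and scope in ("all", sink_kind)
        for pattern, scope in GUARD_PATTERNS
    )


print(guard_applies("validate_input", "shell"))  # True
print(guard_applies("html_escape", "shell"))     # False
print(guard_applies("html_escape", "html"))      # True
print(guard_applies("checkCsrf", "shell"))       # False
```

Under this reading, a camel-case helper such as `checkCsrf` would not count as a guard because it does not match `check_*`; such a name would need its own sanitizer rule in config.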
### Recognised auth names
| Pattern | Language |
|---|---|
| `is_authenticated`, `require_auth`, `check_permission`, `authorize`, `authenticate`, `require_login`, `check_auth`, `verify_token`, `validate_token` | Cross-language |
| `middleware.auth`, `auth.required` | Go |
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |
For Rust auth checks (`require_*`, ownership equality, row-level checks), see [auth.md](../auth.md).
### Custom guards
```toml
[[analysis.languages.python.rules]]
matchers = ["validate_request", "check_csrf"]
kind = "sanitizer"
cap = "all"
cap = "all"
```
### Add auth rules
Auth checks are recognized by function name. If your codebase uses non-standard names:
### Custom auth functions
```toml
[[analysis.languages.javascript.rules]]
matchers = ["ensureLoggedIn", "requirePermission"]
kind = "sanitizer"
cap = "all"
```
### Filter results
```bash
# Skip low-severity unreachable findings
nyx scan . --severity ">=MEDIUM"
```
### Disable CFG analysis
```bash
nyx scan . --mode ast # AST patterns only
cap = "all"
```
## Examples
### Unguarded sink
Unguarded sink:
```go
func handler(w http.ResponseWriter, r *http.Request) {
cmd := r.URL.Query().Get("cmd")
exec.Command("sh", "-c", cmd).Run() // cfg-unguarded-sink: no guard dominates
exec.Command("sh", "-c", cmd).Run() // cfg-unguarded-sink
}
```
### Auth gap
Auth gap:
```javascript
app.get('/admin/delete', (req, res) => {
// No is_authenticated() call
db.execute("DELETE FROM users WHERE id = " + req.params.id);
// cfg-auth-gap: web handler reaches privileged sink without auth
// No auth call
db.execute("DELETE FROM users WHERE id = " + req.params.id); // cfg-auth-gap
});
```
### Resource leak
Resource leak:
```c
void process() {
FILE *f = fopen("data.txt", "r"); // acquire
FILE *f = fopen("data.txt", "r");
if (error) {
return; // cfg-resource-leak: f not closed on this path
return; // cfg-resource-leak: f not closed on this path
}
fclose(f);
}
```
## Guard Rules
Nyx recognizes these function name patterns as guards:
| Pattern | Applies to |
|---------|-----------|
| `validate*`, `sanitize*` | All sinks |
| `check_*`, `verify_*`, `assert_*` | All sinks |
| `shell_escape` | Shell execution sinks |
| `html_escape` | HTML/XSS sinks |
| `url_encode` | URL sinks |
| `which` | Shell execution (binary lookup) |
### Auth rules
| Pattern | Category |
|---------|----------|
| `is_authenticated`, `require_auth`, `check_permission` | Common |
| `authorize`, `authenticate`, `require_login` | Common |
| `check_auth`, `verify_token`, `validate_token` | Common |
| `middleware.auth`, `auth.required` | Go |
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |


@ -1,111 +1,84 @@
# AST Pattern Matching
# AST patterns
## Summary
AST patterns are tree-sitter queries that match dangerous structural shapes in source. No dataflow, no CFG. A match means the construct is present; it's not proof the construct is exploitable.
AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.
AST patterns run in all analysis modes, including `--mode ast` (where they are the only active detector).
Patterns run in every analysis mode. In `--mode ast` they're the only active detector.
## Rule IDs
Pattern rule IDs follow the format `<lang>.<category>.<specific>`:
```
rs.memory.transmute
js.code_exec.eval
py.deser.pickle_loads
c.memory.gets
java.sqli.execute_concat
<lang>.<category>.<name>
```
See the [Rule Reference](../rules/index.md) for a complete listing per language.
Examples: `js.code_exec.eval`, `py.deser.pickle_loads`, `c.memory.gets`, `java.sqli.execute_concat`.
## Pattern Tiers
Full list: [rules.md](../rules.md).
| Tier | Meaning | Examples |
|------|---------|---------|
| **A** | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
| **B** | Query includes a heuristic guard | SQL `execute` with concatenated arg, `printf(var)` with non-literal format |
## Tiers
Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, `java.sqli.execute_concat` only fires when `executeQuery()` receives a `binary_expression` (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
| Tier | Meaning |
|---|---|
| **A** | Structural presence alone is high-signal. `gets`, `eval`, `pickle.loads`, `mem::transmute` |
| **B** | Pattern includes a tree-sitter heuristic guard. Example: `java.sqli.execute_concat` only fires when `executeQuery` receives a `binary_expression` (string concatenation), not a literal or a parameterized statement |
## What It Detects
## Categories
### By category
| Category | Examples |
|---|---|
| CommandExec | `system`, `os.system`, `Runtime.exec`, backticks |
| CodeExec | `eval`, `Function`, PHP `assert("string")`, `class_eval`, `instance_eval` |
| Deserialization | `pickle.loads`, `yaml.load`, `Marshal.load`, `readObject`, `unserialize` |
| SqlInjection | `executeQuery`/`Query`/`execute` with concatenated argument (Tier B) |
| PathTraversal | PHP `include $var` |
| Xss | `document.write`, `outerHTML`, `insertAdjacentHTML`, `getWriter().print` |
| Crypto | `md5`, `sha1`, `Math.random`, `java.util.Random` for security use |
| Secrets | hardcoded API keys (Go, JS, TS) |
| InsecureTransport | `InsecureSkipVerify`, `fetch("http://...")` |
| Reflection | `Class.forName`, `Method.invoke`, `send`, `constantize` |
| MemorySafety | `transmute`, `unsafe`, `gets`, `strcpy`, `sprintf` |
| Prototype | `__proto__` assignment, `Object.prototype.*` |
| Config | CORS dynamic origin, `rejectUnauthorized: false`, insecure session settings |
| CodeQuality | `unwrap`, `panic!`, `as any` |
| Category | What it matches | Example languages |
|----------|----------------|-------------------|
| **CommandExec** | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
| **CodeExec** | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
| **Deserialization** | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
| **SqlInjection** | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
| **PathTraversal** | File inclusion with variable path | PHP (`include $var`) |
| **Xss** | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
| **Crypto** | Weak cryptographic algorithms | All languages (`md5`, `sha1`, `Math.random()`) |
| **Secrets** | Hardcoded credentials | Go (variable name matching) |
| **InsecureTransport** | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
| **Reflection** | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
| **MemorySafety** | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
| **Prototype** | Prototype pollution | JS/TS (`__proto__` assignment) |
| **CodeQuality** | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |
## What patterns can't tell you
## What It Cannot Detect
- **Dataflow.** `eval("1+1")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`. The taint detector is the one that distinguishes them.
- **Reachability.** A pattern in dead code matches identically.
- **Semantics.** `strcpy(dst, src)` always matches, regardless of buffer sizes.
- **Indirect calls.** `let e = eval; e(input)` doesn't match `eval`.
- **Aliased imports.** `from os import system as s; s(cmd)` won't match `system`.
- **Macro expansions.** Tree-sitter parses the macro call site, not the expansion.
- **Dataflow**: Patterns don't track whether the dangerous function receives tainted input. `eval("hello")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`.
- **Context**: Patterns don't understand whether the code is reachable, guarded, or inside a test.
- **Semantics**: `strcpy(dst, src)` always matches — it cannot determine buffer sizes.
- **Indirect calls**: Function pointers, dynamic dispatch, and aliased references are invisible.
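The dataflow and indirect-call limits above can be reproduced with a toy name-based matcher. This sketch uses Python's `ast` module in place of tree-sitter; `direct_eval_calls` is a hypothetical helper, not a Nyx API.

```python
import ast


def direct_eval_calls(source: str) -> list[int]:
    """Line numbers where the callee is literally the identifier `eval`."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    )


SAMPLE = '''
eval("1+1")
eval(user_input)
e = eval
e(user_input)
'''

matches = direct_eval_calls(SAMPLE)
```

The matcher flags the literal and the user-controlled call alike (no dataflow), and misses the aliased call on the last line because the callee there is the identifier `e`.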
## Common false positives
## Common False Positives
| Scenario | Why | Mitigation |
|---|---|---|
| `eval("hardcoded literal")` | Pattern matches structure | Run `--mode cfg` to drop AST patterns and rely on taint |
| `unsafe` block with sound justification | Every `unsafe` matches `rs.quality.unsafe_block` | Filter with `--severity HIGH` (the rule is Medium, so `">=MEDIUM"` still reports it) or accept the noise |
| `.unwrap()` in tests | Acceptable in test code | Default non-prod severity downgrade reduces it |
| `md5` for non-cryptographic checksums | Pattern can't see intent | Suppress with `--severity ">=MEDIUM"` or per-line `nyx:ignore` |
| SQL concat with trusted data (Tier B) | Heuristic can't verify the source | Taint is more precise; or convert to a parameterized query |
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis won't flag this — use `--mode cfg` for fewer false positives |
| `unsafe` block in Rust with sound justification | All unsafe blocks match | Filter with `--severity ">=MEDIUM"` (unsafe_block is Medium) |
| `.unwrap()` in test code | Acceptable in tests | Default non-prod downgrade reduces severity |
| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter Low severity or add to exclusions |
| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |
## Confidence levels
## Common False Negatives
Every AST pattern carries an explicit confidence:
| Scenario | Why it's missed |
|----------|----------------|
| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
| SQL injection via ORM query builder | No pattern for ORM-specific query building |
| Imported function under different name | `from os import system as s; s(cmd)` — pattern looks for `system` |
| Confidence | Use |
|---|---|
| High | Inherently dangerous construct with no safe usage. `gets`, `pickle.loads`, `eval` with no guard |
| Medium | Likely issue, context may change the call. SQL concatenation (Tier B), `unsafe` blocks, `exec` |
| Low | Heuristic. Often appears in safe code. Weak crypto for checksums, `unwrap` outside tests, `Math.random` |
## Confidence Signals
`--min-confidence medium` (or `output.min_confidence = "medium"`) drops Low-confidence matches.
| Signal | Meaning |
|--------|---------|
| **Tier A** | High confidence — the function itself is dangerous |
| **Tier B** | Moderate confidence — heuristic guard reduces false positives |
| **High severity** | Critical vulnerability class (command exec, deserialization) |
| **Low severity** | Informational (weak crypto, code quality) |
| **Non-prod path** | Finding in test/vendor code — downgraded by default |
## Tuning and Noise Controls
### Severity filtering
## Tuning
```bash
# Skip code-quality and weak-crypto findings
nyx scan . --severity ">=MEDIUM"
# Only critical findings
nyx scan . --severity HIGH
nyx scan . --severity ">=MEDIUM" # drop Low-tier patterns
nyx scan . --severity HIGH # banned APIs and code-exec only
nyx scan . --mode cfg # drop AST patterns; keep taint + state + cfg
```
### Use taint for precision
```bash
# Taint-only mode: only report findings with confirmed dataflow
nyx scan . --mode cfg
```
### Exclude directories
```toml
[scanner]
excluded_directories = ["node_modules", "vendor", "generated"]
@ -113,37 +86,29 @@ excluded_directories = ["node_modules", "vendor", "generated"]
## Examples
### Tier A — structural presence
Tier A, structural presence:
**C: Banned function**
```c
char buf[64];
gets(buf); // c.memory.gets — always dangerous, no safe usage
gets(buf); // c.memory.gets
```
**Python: Unsafe deserialization**
```python
import pickle
data = pickle.loads(user_input) # py.deser.pickle_loads
data = pickle.loads(user_input)  # py.deser.pickle_loads
```
### Tier B — heuristic-guarded
Tier B, heuristic guard:
**Java: SQL concatenation**
```java
// Fires: concatenated argument
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
// java.sqli.execute_concat
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId); // java.sqli.execute_concat
// Does NOT fire: parameterized query
// Does not fire: parameterized
stmt.executeQuery(preparedSql);
```
**C: Format string**
```c
// Fires: variable as first argument
printf(user_input); // c.memory.printf_no_fmt
// Does NOT fire: literal format string
printf("%s", user_input);
printf(user_input); // c.memory.printf_no_fmt: fires (variable as fmt)
printf("%s", user_input); // does not fire (literal fmt)
```

@ -1,26 +1,22 @@
# State Model Analysis
# State model analysis
## Summary
Tracks resource lifecycle and authentication state through a function. Detects use-after-close, double-close, leaks, and unauthenticated access to privileged operations.
Nyx's state model analysis tracks **resource lifecycle** and **authentication state** through a function using monotone dataflow over bounded lattices. It detects use-after-close bugs, double-close bugs, resource leaks, and unauthenticated access to privileged operations.
State analysis is **opt-in** — enable it with `scanner.enable_state_analysis = true` in config. It requires `mode = "full"` or `mode = "cfg"`.
State analysis is on by default. Disable with `scanner.enable_state_analysis = false`. It runs in `--mode full` and `--mode taint`; AST-only mode skips it.
## Rule IDs
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `state-use-after-close` | High | Variable used after being closed/released |
| `state-double-close` | Medium | Resource closed twice |
| `state-resource-leak` | Medium | Resource opened but never closed (definite) |
| `state-resource-leak-possible` | Low | Resource may not be closed on all paths |
| `state-unauthed-access` | High | Privileged operation reached without authentication |
| Rule ID | Severity |
|---|---|
| `state-use-after-close` | High |
| `state-double-close` | Medium |
| `state-resource-leak` | Medium |
| `state-resource-leak-possible` | Low |
| `state-unauthed-access` | High |
## What It Detects
## What it detects
### Use-after-close (`state-use-after-close`)
A resource transitions to the CLOSED state (via `close()`, `fclose()`, `disconnect()`, etc.), then a use operation (`read`, `write`, `send`, `recv`, `query`, etc.) is performed on it.
**`state-use-after-close`**: Resource transitions to CLOSED (via `close`, `fclose`, `disconnect`, …), then a use operation happens on it.
```c
FILE *f = fopen("data.txt", "r");
@ -28,147 +24,108 @@ fclose(f);
fread(buf, 1, 100, f); // state-use-after-close
```
### Double-close (`state-double-close`)
**`state-double-close`**: Resource closed twice. Crashes or undefined behaviour on most runtimes.
A resource is closed twice. This can cause crashes or undefined behavior.
**`state-resource-leak`**: Resource opened but never closed on any path through the function. Definite leak.
```python
f = open("data.txt")
f.close()
f.close() # state-double-close
```
**`state-resource-leak-possible`**: Resource closed on some paths but not others. Lower confidence; often an early-return error path.
### Resource leak (`state-resource-leak`)
**`state-unauthed-access`**: A function recognised as a web handler reaches a privileged sink without an auth call on the path.
A resource is opened but never closed on any path through the function. This is a definite leak.
A function counts as a web handler if its name starts with `handle_`, `route_`, or `api_` (sufficient on its own), or starts with `serve_`/`process_` and the file uses web-shaped parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, language-dependent). `main` is excluded.
```java
FileInputStream fis = new FileInputStream("data.txt");
process(fis);
// function exits without fis.close() — state-resource-leak
```
## Managed-resource suppression
### Possible resource leak (`state-resource-leak-possible`)
Several language-specific cleanup patterns suppress leak findings:
A resource is closed on some paths but not others.
| Pattern | Languages | Effect |
|---|---|---|
| RAII / Drop | Rust | All leak findings suppressed except `alloc`/`dealloc` |
| Smart pointers | C++ | `make_unique`/`make_shared` treated as managed; raw `new`/`malloc` still tracked |
| `defer` | Go | `defer f.Close()` suppresses leak at exit |
| `with` context manager | Python | `with open(f) as f:` suppresses leak for the bound name |
| try-with-resources | Java | TWR-bound resources suppressed |
```go
f, err := os.Open("data.txt")
if err != nil {
    return // f not closed here
}
f.Close() // closed here
// state-resource-leak-possible on the error path
```
## What it can't detect
### Unauthenticated access (`state-unauthed-access`)
- **Cross-function resource ownership.** Open in one function, close in another, leak gets reported in the opener. The most common FP source for leak detection.
- **Factory / builder functions** that return a resource for the caller to manage.
- **Variable shadowing across scopes.** Same name in inner and outer scope shares one symbol; an inner close masks an outer leak.
- **Resources stored in collections.** Handles in arrays / maps / channels and cleaned up via iteration are not tracked.
- **Dynamic dispatch.** Close called via trait object or interface may not be recognised.
- **Type-state authentication.** `AuthenticatedRequest<T>` and similar Rust patterns are not recognised as auth.
A function identified as a web handler reaches a privileged sink (shell execution, file I/O) without any authentication check on the path.
## Common false positives
A function is identified as a web handler if:
1. Its name starts with `handle_`, `route_`, or `api_` (strong match — sufficient on its own), OR
2. Its name starts with `serve_` or `process_` AND any function in the file has web-like parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, etc., varying by language).
| Scenario | Why | Mitigation |
|---|---|---|
| Factory returns a resource | Caller owns it | Known limitation |
| Framework-managed handles | Connection pool, request scope | Exclude framework code or downgrade |
| Variable name shadowing | Same name reused | Known limitation |
The function name `main` is explicitly excluded.
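The handler heuristic described above can be sketched as follows. This is an illustration under assumptions: `is_web_handler` is a hypothetical name, the parameter-name set is only the subset listed in the text, and `file_param_names` stands for the union of parameter names across all functions in the file.

```python
STRONG_PREFIXES = ("handle_", "route_", "api_")
WEAK_PREFIXES = ("serve_", "process_")
# Subset of the web-shaped names listed in the text; the real set varies by language.
WEB_PARAM_NAMES = {"request", "req", "ctx", "res", "response", "w", "writer"}


def is_web_handler(func_name: str, file_param_names: set[str]) -> bool:
    """Apply the name-prefix rules; `main` never counts as a handler."""
    if func_name == "main":
        return False
    # Strong prefixes are sufficient on their own.
    if func_name.startswith(STRONG_PREFIXES):
        return True
    # Weak prefixes need a web-shaped parameter name somewhere in the file.
    if func_name.startswith(WEAK_PREFIXES):
        return bool(file_param_names & WEB_PARAM_NAMES)
    return False
```

Under these rules `process_job` in a file with no web-shaped parameters is not treated as a handler, which is why a plain batch worker does not trigger `state-unauthed-access`.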
## Per-language detection
```javascript
app.post('/admin/exec', (req, res) => {
  // No auth check
  exec(req.body.command); // state-unauthed-access
});
```
| Language | Leak | Double-close | Use-after-close | Notes |
|---|---|---|---|---|
| C | yes | yes | yes | `fopen`/`fclose`, `malloc`/`free`, `pthread_mutex_*` |
| C++ | yes | yes | yes | C pairs plus `new`/`delete`; smart pointers suppressed |
| Python | yes | yes | yes | `with` suppressed; `open`, `socket`, `connect` |
| Go | yes | yes | yes | `defer` suppressed; `os.Open` / `.Close` |
| Rust | unsafe only | n/a | n/a | RAII suppresses everything except `alloc`/`dealloc` |
| JavaScript | yes | yes | partial | `fs.openSync`/`closeSync` |
| TypeScript | yes | yes | partial | Same as JS |
| PHP | yes | yes | partial | `fopen`/`fclose`, `curl_init`/`curl_close`, `mysqli_*` |
| Ruby | partial | partial | partial | `File.open`/`close`, `TCPSocket` |
| Java | limited | limited | limited | Constructor-callee matching is incomplete |
## What It Cannot Detect
- **Cross-function resource management**: Resources opened in one function and closed in another are not tracked. This is the most common source of false positives for leak detection.
- **RAII / defer / try-with-resources**: Implicit cleanup via language-level constructs (Rust's `Drop`, Go's `defer`, Java's try-with-resources, Python's `with`) is not recognized. These patterns will produce false-positive leak findings.
- **Dynamic dispatch**: If `close()` is called through a trait object or interface, it may not be recognized.
- **Authentication via type system**: Rust's type-state pattern (e.g. `AuthenticatedRequest<T>`) is not recognized as an auth check.
- **Complex authorization logic**: Only recognized function name patterns are checked.
## Common False Positives
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| RAII / Drop / defer cleanup | Implicit cleanup not visible | Known limitation; filter by severity |
| Resource returned to caller | Ownership transferred, not leaked | Known limitation |
| Framework-managed resources | Web framework manages connection lifecycle | Exclude framework-generated handlers |
| Try-with-resources (Java) | Language construct not parsed | Known limitation |
| Context manager (Python `with`) | Block construct not tracked | Known limitation |
## Common False Negatives
| Scenario | Why it's missed |
|----------|----------------|
| Resource closed in helper function | Cross-function tracking not implemented |
| Auth in middleware | Auth check happens before handler is called |
| Double-close via aliased reference | Alias analysis not performed |
## Confidence Signals
| Signal | Meaning |
|--------|---------|
| **Definite leak (state-resource-leak)** | Resource is never closed on any path — high confidence |
| **Use-after-close** | Read/write operation after explicit close — high confidence |
| **Web handler detected** | Entry point matched by parameter naming convention |
| **Possible leak (state-resource-leak-possible)** | Resource closed on some but not all paths — lower confidence |
## Tuning and Noise Controls
### Enable state analysis
```toml
[scanner]
enable_state_analysis = true
```
### Severity filtering
## Tuning
```bash
# Skip possible-leak findings (Low severity)
nyx scan . --severity ">=MEDIUM"
nyx scan . --severity ">=MEDIUM" # Skip "possible" leaks (Low)
```
### Exclude test files
```toml
[scanner]
excluded_directories = ["tests", "test", "spec"]
enable_state_analysis = true # default
excluded_directories = ["tests", "test", "spec"]
```
## Resource Pairs
## Recognised pairs
The state engine recognizes these acquire/release pairs per language:
The state engine ships these acquire/release pairs. Custom pairs are not yet configurable; file an issue if you need one.
### C/C++
| Acquire | Release | Resource |
|---------|---------|----------|
| `fopen` | `fclose` | File handle |
| `open` | `close` | File descriptor |
| `socket` | `close` | Socket |
| `malloc`, `calloc`, `realloc` | `free` | Heap memory |
| `pthread_mutex_lock` | `pthread_mutex_unlock` | Mutex |
**C / C++**
### Rust
| Acquire | Release | Resource |
|---------|---------|----------|
| `File::open`, `File::create` | `drop`, `close` | File handle |
| `TcpStream::connect` | `shutdown` | TCP connection |
| `lock`, `read`, `write` (on Mutex/RwLock) | `drop` | Lock guard |
| Acquire | Release |
|---|---|
| `fopen` | `fclose` |
| `open` | `close` |
| `socket` | `close` |
| `malloc`, `calloc`, `realloc` | `free` |
| `pthread_mutex_lock` | `pthread_mutex_unlock` |
| `new`, `new[]` *(C++)* | `delete`, `delete[]` |
### Java
| Acquire | Release | Resource |
|---------|---------|----------|
| `new FileInputStream` | `close` | File stream |
| `getConnection` | `close` | DB connection |
| `new Socket` | `close` | Socket |
**Rust**
### Go, Python, JavaScript, Ruby, PHP
Similar patterns with language-specific function names.
| Acquire | Release |
|---|---|
| `File::open`, `File::create` | `drop`, `close` |
| `TcpStream::connect` | `shutdown` |
| `lock`, `read`, `write` (Mutex/RwLock) | `drop` |
## Use Patterns (Trigger use-after-close)
**Java**
The following operations on a closed resource trigger `state-use-after-close`:
| Acquire | Release |
|---|---|
| `new FileInputStream` (and friends) | `close` |
| `getConnection` | `close` |
| `new Socket` | `close` |
Go, Python, JavaScript, Ruby, PHP follow language-idiomatic equivalents.
## Use-after-close triggers
These operations on a closed resource fire `state-use-after-close`:
```
read, write, send, recv, fread, fwrite, fgets, fputs, fprintf, fscanf,
@ -177,28 +134,3 @@ ungetc, query, execute, fetch, sendto, recvfrom, ioctl, fcntl,
strcpy, strncpy, strcat, strncat, memcpy, memmove, memset, memcmp,
strcmp, strncmp, strlen, sprintf, snprintf
```
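A minimal sketch of how a use operation on a closed resource could be flagged, assuming straight-line code only (no branches, no aliasing). The op sets are small samples of the lists in this document, and `check_events` is a hypothetical helper, not the engine's implementation.

```python
ACQUIRE_OPS = {"fopen", "open", "socket"}
CLOSE_OPS = {"close", "fclose", "disconnect"}
USE_OPS = {"read", "write", "send", "recv", "fread", "fwrite", "query", "execute"}


def check_events(events):
    """events: (op, var) pairs in program order. Returns (rule_id, index) pairs."""
    closed = set()
    findings = []
    for i, (op, var) in enumerate(events):
        if op in ACQUIRE_OPS:
            closed.discard(var)  # re-acquiring makes the name usable again
        elif op in CLOSE_OPS:
            if var in closed:
                findings.append(("state-double-close", i))
            closed.add(var)
        elif op in USE_OPS and var in closed:
            findings.append(("state-use-after-close", i))
    return findings
```

The real analysis works over a CFG with merged states at join points; this sketch only shows the per-variable transition logic.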
## Technical Details
### Resource Lifecycle Lattice
```
UNINIT → OPEN → CLOSED
              → MOVED
```
States are tracked as bitflags, allowing the lattice to represent uncertainty (e.g. OPEN|CLOSED means the resource is open on some paths and closed on others).
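The bitflag representation can be sketched with Python's `enum.Flag`. The class name `Res` and the choice of bitwise OR as the join are assumptions for illustration; the text only states that states are bitflags and that `OPEN|CLOSED` encodes path uncertainty.

```python
from enum import Flag, auto


class Res(Flag):
    UNINIT = auto()
    OPEN = auto()
    CLOSED = auto()
    MOVED = auto()


def join(a: Res, b: Res) -> Res:
    """Merge two incoming path states: the union of possible states."""
    return a | b


# Open on one incoming path, closed on the other.
merged = join(Res.OPEN, Res.CLOSED)
```

Because the join is a set union over a fixed set of four bits, repeated merging can only add bits and must stabilise, which fits the bounded-lattice description in the summary.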
### Leak Detection Scope
Resource leaks are checked at the file-level exit node and the **synthesized** function exit node (a single Return node that all early returns feed into). Early-return nodes are **not** checked individually — only the merged state at the function's synthesized exit is inspected. This prevents duplicate findings where an early-return path reports a definite leak while the merged exit correctly reports a possible leak.
This per-function exit inspection ensures that a variable leaked inside one function is not masked by a same-named variable that is properly closed in a subsequent function.
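A sketch of how the merged state at the synthesized exit could map to the two leak rule IDs. `classify_exit` is a hypothetical helper and the two-state `Res` flag is a simplification; the mapping follows the rule descriptions in this document (open on every path is definite, open on some paths is possible).

```python
from enum import Flag, auto
from functools import reduce


class Res(Flag):
    OPEN = auto()
    CLOSED = auto()


def classify_exit(path_states):
    """path_states: state of one resource on each path reaching the exit node."""
    # Only the merged state is inspected, never an individual early return.
    merged = reduce(lambda a, b: a | b, path_states)
    if merged == Res.OPEN:
        return "state-resource-leak"           # open on every path
    if Res.OPEN in merged:
        return "state-resource-leak-possible"  # open on some paths only
    return None
```

With one early-return path still open and the main path closed, the result is a single possible-leak finding instead of a definite leak plus a possible leak.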
### Auth Level Lattice
```
Unauthed < Authed < Admin
```
Join semantics: take the minimum (conservative). If any path is unauthenticated, the result is unauthenticated.
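The minimum-based join can be written in a few lines. The names `AuthLevel` and `join` are illustrative, not taken from the Nyx source.

```python
from enum import IntEnum


class AuthLevel(IntEnum):
    UNAUTHED = 0
    AUTHED = 1
    ADMIN = 2


def join(a: AuthLevel, b: AuthLevel) -> AuthLevel:
    """Conservative merge: the weakest level on any incoming path wins."""
    return min(a, b)


# One branch authenticated as admin, the other skipped the check.
at_merge = join(AuthLevel.ADMIN, AuthLevel.UNAUTHED)
```

Taking the minimum means a single unauthenticated branch is enough for a privileged sink after the merge point to be reported.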

@ -1,10 +1,8 @@
# Taint Analysis
# Taint analysis
## Summary
Nyx tracks untrusted data from **sources** (where it enters the program) through assignments and function calls to **sinks** (where it's used dangerously). If the flow reaches a sink without passing a matching **sanitizer**, a finding fires.
Nyx's taint analysis tracks the flow of untrusted data from **sources** (where data enters the program) through **assignments and function calls** to **sinks** (where dangerous operations happen). If the data reaches a sink without passing through a **sanitizer** with matching capabilities, a finding is emitted.
The engine uses a monotone forward dataflow analysis over a finite lattice with guaranteed termination. Analysis is **intra-procedural with cross-file function summaries** — it does not follow calls into other functions but uses pre-computed summaries of their behavior.
The engine is a monotone forward dataflow over a finite lattice with guaranteed termination. It's flow-sensitive inside a function, and interprocedural across files via persisted per-function summaries.
## Rule ID
@ -12,191 +10,135 @@ The engine uses a monotone forward dataflow analysis over a finite lattice with
taint-unsanitised-flow (source <line>:<col>)
```
One rule ID covers all taint findings. The parenthetical identifies the specific source location.
One rule ID, parameterized by the source location. Suppressions can target either the base ID or the full string.
## What It Detects
## What it detects
- Environment variables flowing to shell execution (`env::var` → `Command::new`)
- User input flowing to code evaluation (`req.body` → `eval()`)
- File contents flowing to SQL queries (`fs::read_to_string` → `db.execute()`)
- Request parameters flowing to HTML output (`req.query` → `innerHTML`)
- Any source-to-sink flow where the sink's required capability is not stripped by a sanitizer
- User input flowing to shell execution: `req.body.cmd` → `child_process.exec`
- User input flowing to code evaluation: `req.query.code` → `eval`
- User input flowing to SQL: `request.args.get('id')` → `cursor.execute(f"... {id}")`
- Environment variables flowing to shell: `env::var("CMD")` → `Command::new("sh").arg("-c")`
- Request parameters flowing to HTML: `req.query.name` → `innerHTML`
- File contents flowing to privileged sinks: `fs::read_to_string` → `db.execute`
- Any other source-to-sink flow where the sink's required capability is not stripped along the way
## What It Cannot Detect
## What it can't detect
- **Inter-procedural flows without summaries**: If a function isn't summarized (e.g. from a third-party library without source), the taint engine cannot track data through it. It conservatively treats unknown callees as neither propagating nor sanitizing.
- **Flows through data structures**: Taint is tracked per-variable, not per-field. `obj.field = tainted; sink(obj.other_field)` may produce a false positive because taint attaches to `obj` as a whole.
- **Aliasing**: `let y = &x; sink(*y)` — the engine tracks `y` as a fresh variable, not an alias of `x`. This can cause false negatives.
- **Complex control flow**: The analysis is flow-sensitive (respects control flow within a function) but does not track taint through arbitrary loops with complex exit conditions.
- **Implicit flows**: Taint only follows explicit data flow, not information flow through branching (e.g. `if (secret) { x = 1 } else { x = 0 }` does not taint `x`).
- **Library calls without summaries.** If a callee has no summary (no source, binary-only dependency), Nyx treats it as neither propagating nor sanitizing. This is conservative for sanitization but lossy for propagation.
- **Taint through struct fields and containers.** Taint attaches to whole variables. `obj.field = tainted; sink(obj.other_field)` can produce a false positive because `obj` itself is tainted.
- **Aliasing.** `let y = &x; sink(*y)` tracks `y` separately from `x`, so taint on `x` may not reach the sink. This can cause false negatives.
- **Implicit flows.** Taint follows explicit data, not branching signal. `if (secret) x = 1 else x = 0` does not taint `x`.
- **Globals and statics across functions.** Not tracked across function boundaries.
## Common False Positives
## Common false positives
| Scenario | Why it happens | Mitigation |
|----------|---------------|------------|
| Custom sanitizer not recognized | Nyx only knows built-in and configured sanitizers | Add a custom sanitizer rule in config |
| Taint through struct fields | Variable-level (not field-level) tracking | No current mitigation; field sensitivity is planned |
| Dead code paths | The engine is path-insensitive within a function (it considers all paths) | Contradiction pruning catches some cases; path-validated findings score lower |
| Library wrappers | A wrapper around a dangerous function may re-introduce taint that was sanitized by the wrapper | Summarize the wrapper function or add it as a sanitizer |
| Scenario | Why | Mitigation |
|---|---|---|
| Custom sanitizer not recognised | Only built-in + configured sanitizers match | Add a custom sanitizer rule in config |
| Taint through struct fields | Variable-level tracking, not field-level | No fix yet; field-sensitivity is planned |
| Dead branches | Path-insensitive within a function | Constraint solving catches trivially infeasible combos; path-validated findings are scored lower |
| Library wrapper re-introduces taint | Wrapper opaque, or summary marks it as propagating | Summarize the wrapper explicitly or add it as a sanitizer |
## Common False Negatives
## Common false negatives
| Scenario | Why it's missed |
|----------|----------------|
| Third-party library calls | No summary available; callee treated as opaque |
| Taint through global/static variables | Not tracked across function boundaries |
| Taint through closures/callbacks in some languages | Closure capture analysis is limited (JS/TS/Ruby/Go anonymous functions ARE analyzed) |
| Flows spanning more than two files | Summary approximation loses precision at depth |
| Scenario | Why |
|---|---|
| Third-party library on the path | No summary available, callee treated opaquely |
| Globals / statics across function boundaries | Not tracked |
| Some closure captures | Closure analysis is limited. JS/TS/Ruby/Go anonymous functions passed as callbacks *are* analyzed as separate scopes |
| Very deep cross-file chains | Summary approximation loses precision at depth |
## Confidence Signals
## Confidence signals
These signals in the output indicate higher-confidence findings:
Higher confidence:
- Source + Sink both present in evidence with specific call locations.
- `source_kind: user_input` (direct attacker control).
- `path_validated: false`.
- No dominating guard on the path.
- Symex produced a witness string (rendered sink value visible in JSON/SARIF `evidence.symbolic.witness`).
| Signal | What it means |
|--------|--------------|
| **Evidence: Source + Sink** | Both endpoints identified with specific function names and locations |
| **Source kind = user input** | Source is directly controllable by an attacker (req.body, argv, etc.) |
| **path_validated = false** | No validation guard on the path — higher exploitability |
| **No guard_kind** | No dominating predicate check (null check, error check, etc.) |
| **High rank_score** | Multiple confidence signals combined |
Lower confidence:
- Path-validated taint (`path_validated: true`).
- Source is a database read or internal file (pre-validated at insertion is common).
- Engine note `ForwardBailed` / `PathWidened`. Use `--require-converged` to drop these in strict gates.
Lower-confidence:
## Tuning
| Signal | What it means |
|--------|--------------|
| **path_validated = true** | A validation predicate guards the path — may not be exploitable |
| **guard_kind = "ValidationCall"** | An explicit validation function was called before the sink |
| **Source kind = database** | Data from DB — may already be validated at insertion time |
## Tuning and Noise Controls
### Add custom sanitizers
If your codebase has a custom sanitizer that Nyx doesn't recognize:
### Custom sanitizer
```toml
# nyx.local
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml", "sanitizeInput"]
kind = "sanitizer"
cap = "html_escape"
kind = "sanitizer"
cap = "html_escape"
```
Or via CLI:
```bash
nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
```
Or: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`.
### Filter by severity
### Filter by severity or confidence
```bash
nyx scan . --severity HIGH # Only high-severity taint findings
nyx scan . --severity ">=MEDIUM" # Skip low-severity
nyx scan . --severity HIGH
nyx scan . --min-confidence medium
```
### Skip non-production code
By default, findings in `tests/`, `vendor/`, `build/` paths are downgraded one severity tier. To exclude them entirely, add to config:
```toml
[scanner]
excluded_directories = ["tests", "vendor", "build", "examples"]
```
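The one-tier downgrade for non-production paths could look roughly like this. The matching rule (any directory component equal to `tests`, `vendor`, or `build`) and the floor at Low are assumptions of this sketch; `effective_severity` is a hypothetical helper.

```python
from pathlib import PurePosixPath

NON_PROD_DIRS = {"tests", "vendor", "build"}
TIERS = ["LOW", "MEDIUM", "HIGH"]


def effective_severity(path: str, severity: str) -> str:
    """Downgrade by one tier when any directory component is non-prod."""
    dirs = PurePosixPath(path).parts[:-1]  # drop the file name
    if NON_PROD_DIRS.isdisjoint(dirs):
        return severity
    # Assumption: Low findings stay Low instead of being dropped.
    return TIERS[max(TIERS.index(severity) - 1, 0)]
```

Excluding the directories in config removes the findings entirely, whereas the downgrade keeps them visible at a lower tier.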
### Disable taint (AST-only mode)
### Skip dataflow entirely
```bash
nyx scan . --mode ast
```
AST-only mode gives you structural pattern matches without taint.
In the browser UI, taint findings render as a numbered flow walk so you can see each hop the engine took:
<p align="center"><img src="../../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: HIGH taint-unsanitised-flow with numbered source → call → sink steps and How to fix guidance" width="900"/></p>
## Example
**Vulnerable code** (Rust):
Rust:
```rust
use std::env;
use std::process::Command;
fn main() {
    let cmd = env::var("USER_CMD").unwrap(); // line 5: source
    Command::new("sh").arg("-c").arg(&cmd).output(); // line 6: sink
    let cmd = env::var("USER_CMD").unwrap(); // source
    Command::new("sh").arg("-c").arg(&cmd).output(); // sink
}
```
**Finding**:
Finding:
```
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
Source: env::var("USER_CMD") at 5:15
Sink: Command::new("sh").arg("-c")
Score: 76
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
Unsanitised user input flows from env::var → Command::new
Source: env::var (5:15)
Sink: Command::new
```
**Safe alternative**:
```rust
use std::env;
use std::process::Command;
Safe rewrite: drop the shell and pass the value as argv directly (`Command::new(&cmd).output()`), or validate against an allowlist before passing to the shell.
fn main() {
    let cmd = env::var("USER_CMD").unwrap();
    // Use the value as a direct argument, not a shell command
    Command::new(&cmd).output();
    // Or validate against an allowlist
}
```
## Capabilities
## Technical Details
Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer only clears taint for the cap it declares. A sink only fires when the remaining taint still carries its required cap.
### Capability System
| Capability | Typical source | Typical sanitizer | Typical sink |
|---|---|---|---|
| `env_var` | `env::var`, `getenv`, `process.env` | | |
| `html_escape` | | `html.escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
| `shell_escape` | | `shlex.quote`, `shell_escape::escape` | `system`, `Command::new`, `eval` |
| `url_encode` | | `encodeURIComponent` | `location.href`, HTTP client URL arg |
| `json_parse` | | `JSON.parse` | |
| `file_io` | | `os.path.realpath`, `filepath.Clean` | `open`, `fs::read_to_string`, `send_file` |
| `fmt_string` | | | `printf(var)` |
| `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation |
| `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` |
| `ssrf` | | URL-prefix locks | `requests.get`, `fetch`, `HttpClient.send` |
| `code_exec` | | | `eval`, `exec`, `Function` |
| `crypto` | | | weak-algorithm constructors |
| `unauthorized_id` | request-bound scoped IDs (Rust auth analysis) | ownership check | row-level write |
| `all` | Sources typically use `all` so they match any sink | | |
Taint uses a bitflag capability system to match sources with appropriate sanitizers and sinks:
| Capability | Bit | Sources | Sanitizers | Sinks |
|-----------|-----|---------|------------|-------|
| `ENV_VAR` | 0x01 | `env::var`, `getenv` | — | — |
| `HTML_ESCAPE` | 0x02 | — | `html_escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
| `SHELL_ESCAPE` | 0x04 | — | `shell_escape` | `Command::new`, `system()`, `eval()` |
| `URL_ENCODE` | 0x08 | — | `encodeURIComponent` | `location.href` |
| `JSON_PARSE` | 0x10 | — | `JSON.parse` | — |
| `FILE_IO` | 0x20 | — | `filepath.Clean`, `basename`, `os.path.realpath` | `fopen`, `open`, `send_file`, `fs::read_to_string` |
| `FMT_STRING` | 0x40 | — | — | `printf(var)` |
Sources typically use `Cap::all()` to match any sink. A sanitizer strips specific capability bits. A finding fires when a tainted variable reaches a sink and the taint still has the matching capability bit set.
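The matching rule above can be sketched with plain bit operations. This is a minimal illustration, not Nyx's actual `Cap` type: the constant values are taken from the table, while `ALL`, `sanitize`, and `sink_fires` are hypothetical names chosen for the example, and `ALL` assumes the seven listed bits are the whole set.

```rust
// Bit values copied from the table above; only the ones used here are defined.
const HTML_ESCAPE: u8 = 0x02;
const SHELL_ESCAPE: u8 = 0x04;
// Assumption: "all" is the union of the seven bits listed (0x01..=0x40).
const ALL: u8 = 0x7f;

/// A sanitizer strips only the capability bits it declares.
fn sanitize(taint: u8, sanitizer_caps: u8) -> u8 {
    taint & !sanitizer_caps
}

/// A sink fires when the taint still carries at least one bit the sink requires.
fn sink_fires(taint: u8, sink_caps: u8) -> bool {
    (taint & sink_caps) != 0
}
```

Under these assumptions, `sanitize(ALL, HTML_ESCAPE)` no longer triggers an `HTML_ESCAPE` sink, but `sink_fires(sanitize(ALL, HTML_ESCAPE), SHELL_ESCAPE)` is still true: HTML escaping does not make a value safe for a shell.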
### Nested Function Analysis
The CFG builder recursively discovers function expressions nested inside call arguments:
- **JavaScript/TypeScript**: `function_expression`, `arrow_function` inside call arguments (e.g., Express route handlers)
- **Ruby**: `do_block` and `block` nodes (e.g., Sinatra `get '/path' do...end`)
- **Go**: `func_literal` (anonymous function literals)
Each nested function is walked as a separate scope and receives a unique identifier (`<anon@{byte_offset}>`) to prevent collisions when multiple anonymous functions exist in the same file.
### Chained Call Classification
Method chains like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments between `.` separators. The classifier matches against both the original text and the normalized form, enabling rules like `r.URL` to match within `r.URL.Query.Get`.
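A rough sketch of that normalization, assuming every parenthesised argument list (including the trailing one) is dropped and that a rule matches when it is a whole-segment prefix of either form. `normalize_chain`, `has_segment_prefix`, and `rule_matches` are hypothetical helpers for illustration, not the classifier's real API.

```rust
/// Drop every parenthesised argument list, keeping only the dotted path.
/// Sketch only: parentheses inside string-literal arguments are not handled.
fn normalize_chain(expr: &str) -> String {
    let mut out = String::with_capacity(expr.len());
    let mut depth = 0usize;
    for ch in expr.chars() {
        match ch {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            _ if depth == 0 => out.push(ch),
            _ => {} // inside an argument list: skip
        }
    }
    out
}

/// True when `rule` equals `text` or is followed by a `.` separator in it.
fn has_segment_prefix(text: &str, rule: &str) -> bool {
    match text.strip_prefix(rule) {
        Some(rest) => rest.is_empty() || rest.starts_with('.'),
        None => false,
    }
}

/// Try the original text first, then the normalized form.
fn rule_matches(rule: &str, expr: &str) -> bool {
    has_segment_prefix(expr, rule) || has_segment_prefix(normalize_chain(expr).as_str(), rule)
}
```

With this sketch, a rule written as `r.URL.Query.Get` matches `r.URL.Query().Get("host")` only through the normalized form, while `r.URL` already matches the original text.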
### Nested Call Fallback
When the outermost call in an expression doesn't classify as a source/sink, the engine tries all nested inner calls. This handles patterns like `str(eval(expr))` where `str` is not a sink but the inner `eval` is.
### Rust `if let` / `while let` Pattern Bindings
The CFG builder recognizes Rust `let_condition` nodes inside `if` and `while` expressions. The value expression is classified for source/sink labels, and the pattern binding is extracted as a variable definition:
```rust
if let Ok(cmd) = env::var("CMD") {
    // cmd is tainted: env::var is a source, cmd is the binding
    Command::new("sh").arg("-c").arg(&cmd).output(); // taint-unsanitised-flow
}
```
This also works for `while let` patterns.
### JS/TS Two-Level Solve
For JavaScript and TypeScript, taint analysis uses a two-level approach:
1. **Level 1**: Solve top-level code (module scope)
2. **Level 2**: Solve each function seeded with the converged top-level state
This prevents false positives from cross-function taint leakage while preserving global-to-function flows.
Sources typically use `cap = "all"` so they match every sink. Sinks declare the specific cap they need. Sanitizers only clear the cap they name.

docs/how-it-works.md Normal file

@ -0,0 +1,46 @@
# How Nyx works
If you're going to act on a finding, it helps to know how the scanner got there. This page is the short version. Source paths are linked where the answer to "exactly what does it do" lives in the code.
## The pipeline
A scan runs in two passes over the file tree, with an optional SQLite index that lets later scans skip files whose content hash hasn't changed.
**Pass 1, per file.** Tree-sitter parses the file. Nyx builds an intra-procedural control-flow graph, lowers it to SSA, and extracts a summary per function describing what that function does at the boundary: which arguments flow to sinks, which sources it reads from, which sinks it calls, what taint it strips, what it returns. Summaries are persisted to SQLite ([`src/summary/`](https://github.com/elicpeter/nyx/tree/master/src/summary/), [`src/database.rs`](https://github.com/elicpeter/nyx/blob/master/src/database.rs)).
**Summary merge.** All per-file summaries get unioned into a global map keyed by qualified function name.
**Pass 2, per file.** Each file is reanalysed with the global summaries available. The taint engine runs a forward dataflow worklist over the SSA representation. When it hits a call, it consults summaries to decide whether the call propagates taint, sanitizes it, or terminates the flow. Findings are produced when tainted data reaches a sink whose required capability is still set on the value.
Two extra layers tune precision around calls. **Context-sensitive inlining** (k=1) re-runs intra-file callees with the actual argument taint at the call site, so a helper called once with tainted input and once with sanitized input produces the right result for each call. **SCC fixed-point**: when a group of mutually-recursive functions forms a strongly-connected component in the call graph, the engine iterates summaries to a joint fixed-point (capped at 64 iterations). SCCs that span files are also handled.
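The capped fixed-point can be pictured as follows. This is a simplified sketch, not the engine's code: summaries are reduced to a single capability bitset per function, and `solve_scc`, `local`, and `callees` are hypothetical names. Only the 64-iteration cap comes from the text above.

```rust
const MAX_ITERS: usize = 64; // iteration cap stated above

/// `local[f]` is the bitset function `f` contributes on its own;
/// `callees[f]` lists the functions `f` calls inside the same SCC.
/// Returns the joint summaries and whether they converged within the cap.
fn solve_scc(local: &[u8], callees: &[Vec<usize>]) -> (Vec<u8>, bool) {
    let mut summary = local.to_vec();
    for _ in 0..MAX_ITERS {
        let mut changed = false;
        for f in 0..summary.len() {
            // Join the function's own summary with those of its callees.
            let mut next = summary[f];
            for &g in &callees[f] {
                next |= summary[g];
            }
            if next != summary[f] {
                summary[f] = next;
                changed = true;
            }
        }
        if !changed {
            return (summary, true); // fixed point reached
        }
    }
    (summary, false) // cap hit: best summary so far, flagged as unconverged
}
```

For two mutually recursive functions with `local = [0x04, 0x20]`, both summaries settle at `0x24` on the second sweep. The `false` return plays the role of the note recorded when a cluster does not converge.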
## Optional analyses on top
These run on top of the forward taint pass. They're independently switchable via `[analysis.engine]` config or matching CLI flags. See [advanced-analysis.md](advanced-analysis.md) for the full description and tradeoffs.
| Pass | Purpose | Default |
|---|---|---|
| Abstract interpretation | Carries interval and string prefix/suffix bounds alongside taint. Suppresses findings on proven-bounded integers and locked-prefix URLs | on |
| Context sensitivity | k=1 inlining for intra-file callees | on |
| Constraint solving | Drops paths whose accumulated branch predicates are unsatisfiable. Optional Z3 backend with `--features smt` | on |
| Symbolic execution | Builds an expression tree per tainted value. Produces a witness string at the sink. Detects sanitization patterns the taint engine alone would miss | on |
| Backwards analysis | After the forward pass, walks backwards from each sink to confirm or invalidate the flow. Annotates findings as `backwards-confirmed`, `backwards-infeasible`, or `backwards-budget-exhausted` | off |
`--engine-profile fast | balanced | deep` flips groups of these at once. `balanced` is the default and the configuration the benchmark numbers in [language-maturity.md](language-maturity.md) are measured against.
## Where bounds live
Static analysis at scale means choosing where to stop. Nyx exposes its bounds rather than hiding them:
- **Inline depth** is k=1. Callees larger than the inline body-size cap fall back to summary-based resolution.
- **SCC fixed-point** is capped at 64 iterations. If a recursive cluster doesn't converge, the engine emits the best summary it has and records an `engine_note` on affected findings.
- **Lattice width** is bounded. Taint origin sets cap at 32 entries per SSA value (`--max-origins`); points-to sets cap at 32 heap objects (`--max-pointsto`). Truncation is recorded as `OriginsTruncated` / `PointsToTruncated` so you can see when precision was lost.
- **Symbolic expressions** cap at depth 32. Deeper expressions degrade to `Unknown` rather than growing without bound.
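As an illustration of the lattice-width bound, here is a minimal bounded set that records truncation instead of growing. The `Origins` type and its fields are hypothetical; only the cap of 32 and the idea of flagging truncation come from the list above.

```rust
use std::collections::BTreeSet;

const MAX_ORIGINS: usize = 32; // default cap described above

/// Taint origins for one SSA value, with a flag set when the cap was hit.
#[derive(Debug, Default)]
struct Origins {
    set: BTreeSet<u32>,
    truncated: bool,
}

impl Origins {
    fn insert(&mut self, origin: u32) {
        if self.set.contains(&origin) {
            return; // already tracked, nothing lost
        }
        if self.set.len() >= MAX_ORIGINS {
            self.truncated = true; // precision lost: record it rather than grow
            return;
        }
        self.set.insert(origin);
    }
}
```

Inserting 40 distinct origins leaves 32 in the set and sets `truncated`, which is the kind of signal a strict gate can act on.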
Findings whose engine notes indicate a bound was hit can be filtered with `--require-converged` for strict CI gates. The flag drops findings affected by over-reports and bails; findings affected only by under-reports (where the emitted finding is still real but the result set is a lower bound) are kept.
## What you get out
Each finding carries the source location, the sink location, the path in between (when symex produced one), the rule ID, severity, attack-surface score, confidence level, and a list of engine notes describing any precision loss along the way. Console output is human-readable; JSON and SARIF carry the full evidence object for tooling.
For the JSON shape and SARIF mapping, see [output.md](output.md).


@ -1,32 +0,0 @@
# Nyx Documentation
Welcome to the Nyx documentation. Nyx is a multi-language static vulnerability scanner built in Rust.
## User Guide
- [Installation](installation.md) — Install via cargo, prebuilt binaries, or from source
- [Quick Start](quickstart.md) — Your first scan in 60 seconds
- [CLI Reference](cli.md) — Every flag, subcommand, and option
- [Configuration](configuration.md) — Config file schema, precedence, custom rules
- [Output Formats](output.md) — Console, JSON, SARIF; exit codes; evidence fields
## Detector Reference
- [Detector Overview](detectors.md) — How the four detector families work together
- [Taint Analysis](detectors/taint.md) — Cross-file source-to-sink dataflow tracking
- [CFG Structural Analysis](detectors/cfg.md) — Auth gaps, unguarded sinks, resource leaks
- [State Model Analysis](detectors/state.md) — Resource lifecycle and authentication state
- [AST Patterns](detectors/patterns.md) — Tree-sitter structural pattern matching
## Rule Reference
- [Rule Index](rules/index.md) — How rules are organized
- [Rust](rules/rust.md) | [C](rules/c.md) | [C++](rules/cpp.md) | [Java](rules/java.md) | [Go](rules/go.md)
- [JavaScript](rules/javascript.md) | [TypeScript](rules/typescript.md) | [Python](rules/python.md)
- [PHP](rules/php.md) | [Ruby](rules/ruby.md)
## Contributing
- [Contributing Guide](../CONTRIBUTING.md) — Development setup, adding rules, PR guidelines
- [Security Policy](../SECURITY.md) — Responsible disclosure
- [Code of Conduct](../CODE_OF_CONDUCT.md)


@ -1,42 +1,42 @@
# Installation
## Install from crates.io
For the happy path (`cargo install nyx-scanner`, release binary on PATH), see the README. This page covers platform-specific notes and upgrade paths.
## Supported platforms
Release binaries are published for:
| Platform | Archive |
|---|---|
| Linux x86_64 | `nyx-x86_64-unknown-linux-gnu.zip` |
| macOS Intel | `nyx-x86_64-apple-darwin.zip` |
| macOS Apple Silicon | `nyx-aarch64-apple-darwin.zip` |
| Windows x86_64 | `nyx-x86_64-pc-windows-msvc.zip` |
Build from source works on any stable Rust 1.88+ target (edition 2024).
## Verify the download
Each release attaches a `SHA256SUMS` file. When the maintainer signs the release, a detached `SHA256SUMS.asc` is published alongside it.
```bash
cargo install nyx-scanner
# Verify the checksum file's signature (skip if .asc isn't present)
gpg --verify SHA256SUMS.asc SHA256SUMS
# Then check your archive against it
sha256sum -c SHA256SUMS --ignore-missing
```
This installs the `nyx` binary into `~/.cargo/bin/`.
If `sha256sum` is missing on macOS, `shasum -a 256 -c SHA256SUMS --ignore-missing` is equivalent.
## Install from GitHub releases
## Windows
1. Go to the [Releases](https://github.com/elicpeter/nyx/releases) page.
2. Download the binary for your platform:
| Platform | Archive |
|----------|---------|
| Linux x86_64 | `nyx-x86_64-unknown-linux-gnu.zip` |
| macOS Intel | `nyx-x86_64-apple-darwin.zip` |
| macOS Apple Silicon | `nyx-aarch64-apple-darwin.zip` |
| Windows x86_64 | `nyx-x86_64-pc-windows-msvc.zip` |
3. Extract and install:
```bash
# Linux / macOS
unzip nyx-*.zip
chmod +x nyx
sudo mv nyx /usr/local/bin/
# Windows (PowerShell)
Expand-Archive -Path nyx-*.zip -DestinationPath .
Move-Item -Path .\nyx.exe -Destination "C:\Program Files\Nyx\"
```
4. Verify:
```bash
nyx --version
```
```powershell
Expand-Archive -Path nyx-x86_64-pc-windows-msvc.zip -DestinationPath .
Move-Item -Path .\nyx.exe -Destination "C:\Program Files\Nyx\"
# Add C:\Program Files\Nyx to PATH in System Properties → Environment Variables
nyx --version
```
## Build from source
@ -44,33 +44,34 @@ This installs the `nyx` binary into `~/.cargo/bin/`.
git clone https://github.com/elicpeter/nyx.git
cd nyx
cargo build --release
cargo install --path .
# Binary at target/release/nyx
```
Requires **Rust 1.85+** (edition 2024).
The frontend is built and embedded into the binary during `cargo build`, so there's no separate step for `nyx serve`. Node is only required if you're working on the frontend itself; see `CONTRIBUTING.md`.
## CI Integration
Optional features:
### GitHub Actions
| Flag | Adds |
|---|---|
| `--features smt` | Bundles Z3 for stronger path-constraint solving. MIT-licensed; distributors should include Z3's license in their attribution |
| `--features smt-system-z3` | Links against a system-installed Z3 instead of bundling |
```yaml
- name: Install Nyx
run: cargo install nyx-scanner
## Upgrading
- name: Run security scan
run: nyx scan . --format sarif --fail-on medium > results.sarif
Nyx stores its scanner version in the project's index database. When the binary's version differs from the stored version, the index is wiped on the next scan and rebuilt against the new engine. You'll see one info-level log line:
- name: Upload SARIF
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: results.sarif
```
engine version changed (0.4.0 → 0.5.0), rebuilding index
```
### Generic CI
No flag needed. If you see this on *every* scan, the metadata row isn't being persisted; file an issue.
## Corrupt database recovery
If the SQLite file itself is damaged (killed scan, full disk), delete it and let the next scan rebuild from scratch:
```bash
# Fail the build if any High or Medium finding is detected
nyx scan . --severity ">=MEDIUM" --fail-on medium --quiet --format json
rm "$(nyx config path)"/<project>.sqlite*
```
The `--fail-on` flag causes Nyx to exit with code **1** if any finding meets or exceeds the given severity. Exit code **0** means no findings matched.
Only the named project's rows are affected.

docs/language-maturity.md Normal file

@ -0,0 +1,266 @@
# Language Maturity Matrix
Nyx supports ten languages, but support depth is not uniform. This page gives an
honest per-language picture so you can calibrate expectations before depending
on Nyx for a given stack.
The classifications here are grounded in three concrete signals:
1. **Rule depth**: how many distinct source / sanitizer / sink matchers exist
for the language in `src/labels/<lang>.rs`, and how many vulnerability
classes (Cap bits) those matchers cover.
2. **Benchmark results**: rule-level precision / recall / F1 on the 305-case
corpus (267 synthetic + 14 real-CVE pairs, counted as 28 cases, + 10 auth fixtures) in
[`tests/benchmark/RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md),
last measured 2026-04-23 with scanner version 0.5.0.
3. **Known weak spots**: FPs and FNs the maintainers have deliberately left
in the benchmark rather than suppressed, documented release-by-release in
[`RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md).
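For reference, F1 is the harmonic mean of precision and recall, so the per-language figures on this page can be cross-checked from the two numbers quoted next to them. The helper below is a small sketch for that arithmetic; the function name is ours, not part of the benchmark harness.

```rust
/// Harmonic mean of precision and recall, both given as fractions in [0, 1].
fn f1(precision: f64, recall: f64) -> f64 {
    if precision + recall == 0.0 {
        return 0.0; // avoid dividing by zero when nothing was found
    }
    2.0 * precision * recall / (precision + recall)
}
```

For example, a language reported at 100% precision and 92.3% recall gives `f1(1.0, 0.923)`, roughly 0.960, and 76.0% precision with 100% recall gives roughly 0.864.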
All parser integrations use tree-sitter and are stable; parsing is not a
differentiator between tiers. The differentiators are rule depth, cross-file
confidence, and modeled idioms.
---
## Tier Summary
| Tier | Languages | What to expect |
|------|-----------|----------------|
| **Stable** | Python, JavaScript, TypeScript | Deep rule sets, gated sinks (argument-role-aware), framework detection, extensive fixtures, and the bulk of advanced-analysis (SSA, context-sensitivity, symbolic execution) coverage. Safe to depend on in CI gates. |
| **Beta** | Go, Java, Ruby, PHP | Solid mid-depth rule sets with known narrower class coverage. No gated sinks yet. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation) are incomplete. Usable in CI, but review FP/FN lists before tightening gates. |
| **Preview** | C, C++ | Pattern-only coverage. Pointer aliasing, function pointers, array-element taint, and STL container flows are not modeled. Suitable for finding obvious unsafe API uses; do not use as a sole SAST gate. Pair with clang-tidy / Clang Static Analyzer / Infer. |
| **Experimental** | Rust | Full source coverage relative to the framework ecosystem, but several FPs persist on adversarial safe cases pending engine work (match-arm guards, structural sinks with type facts). Appropriate for spot-checks and contribution but not yet recommended as a sole SAST dependency. |
---
## Per-Language Detail
### Stable tier
#### Python: 100% P / 100% R / 100% F1 *(29-case corpus)*
- **Rule depth**: 5 source families, 7 sanitizer families, 21 sink matchers
spanning HTML, URL, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Framework context**: Flask, Django, argparse source matchers; `flask_request`
import-alias support.
- **Advanced analysis**: gated sinks (`Popen`, `subprocess.run/call` with
activation-arg awareness), most SSA-equivalence and symbolic-execution
fixtures target Python.
- **Fixtures**: 125 under `tests/fixtures/` plus 30 benchmark cases.
- **Blind spots**: f-string interpolation is not explicitly modeled as a
distinct taint-producing construct; string-formatting flows are caught by
the general concatenation path.
#### JavaScript: 93.8% P / 100% R / 96.8% F1 *(27-case corpus)*
- **Rule depth**: 3 source families, 10 sanitizer families, 24 sink matchers
spanning HTML, URL, JSON, Shell, SQL, Code, SSRF, and File I/O.
- **Advanced analysis**: gated sinks (`setAttribute`, `parseFromString`),
two-level SSA solve for top-level + per-function scopes (`analyse_ssa_js_two_level`),
prefix-locked SSRF suppression via StringFact.
- **Framework context**: Express, Koa, Fastify (via in-file import scan when
`package.json` is absent).
- **Fixtures**: 238 under `tests/fixtures/`; the largest corpus of any
language.
- **Blind spots**: template literals are lowered through concatenation rather
than modeled as a first-class taint operator; dynamic property access
(`obj[user]`) is treated conservatively.
#### TypeScript: 100% P / 100% R / 100% F1 *(35-case corpus, most recent measurement)*
- **Rule depth**: Shares the JS ruleset (3 sources, 10 sanitizers, 24 sinks)
plus TS-specific grammar handling.
- **Advanced analysis**: TSX and JSX grammars wired as of 2026-04-20;
discriminated-union narrowing, generic erasure, decorator flow, and
interface dispatch are all validated against adversarial type-system
stressors.
- **Framework context**: Fastify detection via `detect_in_file_frameworks`
(import-driven, no `package.json` required).
- **Fixtures**: 39 test fixtures plus 35 benchmark cases.
- **Blind spots**: 0 known open weak spots as of 2026-04-20. `as any` casts
and `any`-typed flows are handled conservatively (treated as tainted).
### Beta tier
#### Go: 94.1% P / 100% R / 97.0% F1 *(28-case corpus)*
- **Rule depth**: 4 source families, 4 sanitizer families, 9 sink matchers
covering HTML, URL, Shell, SQL, SSRF, Crypto, and File I/O.
- **Framework context**: Gin, Echo source matchers.
- **Known gaps**: no gated sinks, no deserialization class, allowlist
early-return patterns in path-pruning benchmark cases still produce FPs
(`go-pathprune-safe-001`). `fmt.Sprintf` is deliberately not a sink.
#### Java: 92.9% P / 100% R / 96.3% F1 *(23-case corpus)*
- **Rule depth**: 3 source families, 8 sanitizer families, 10 sink matchers
covering HTML, URL, Shell, SQL, Code, SSRF, and Deserialization.
- **Framework context**: Spring, JPA, Hibernate ORM rules; JNDI injection
sinks.
- **Known gaps**: no gated sinks. Variable-receiver method calls
(`client.send(...)` vs `HttpClient.send(...)`) rely on type-qualified
resolution from receiver-type inference; flows where the receiver type
cannot be inferred are missed (`java-ssrf-002` historically persisted as
FN; closed via type facts but fragile on unusual builder chains).
#### Ruby: 100% P / 92.3% R / 96.0% F1 *(24-case corpus)*
- **Rule depth**: 3 source families, 7 sanitizer families, 15 sink matchers
covering HTML, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Framework context**: Rails helpers (`sanitize_sql`, `permit`, `require`).
- **Known gaps**: string interpolation inside shell and SQL strings is
recognized structurally but not modeled as a distinct operator.
`begin/rescue/ensure` exception-edge wiring is documented as deferred
(structurally incompatible with `build_try()`). One FN persists on an
interprocedural taint propagation case due to rule-ID mismatch, not a
missed flow (`rb-interproc-001`).
#### PHP: 86.7% P / 100% R / 92.9% F1 *(24-case corpus)*
- **Rule depth**: 3 source families (`$_GET`, `$_POST`, `$_REQUEST`
superglobals), 7 sanitizer families, 10 sink matchers covering HTML, URL,
Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Known gaps**: no gated sinks. Limited framework context (Laravel raw
methods only). Interprocedural sanitizer-wrapping case
(`php-interproc-safe-001`) persists as FP. `echo` language-construct
detection is wired but its inner-argument propagation is narrower than
function-call sinks.
### Preview tier
C and C++ are labeled **Preview** (not Experimental) to convey a specific
shape of limitation: the parser and existing rules produce useful findings
on obvious unsafe-API uses, but the engine **structurally cannot model**
several pervasive C/C++ constructs. Running Nyx on a C/C++ codebase and
seeing a clean report should not be read as a clean audit. Pair Nyx with
clang-tidy, the Clang Static Analyzer, or Infer for production use.
**Not modeled** (common to both C and C++):
- Pointer aliasing. Taint through `*p`, `p->field`, arbitrary pointer
arithmetic, and aliased writes are not tracked.
- Function pointers and callback dispatch. Indirect calls through
`void (*fn)(char *)` resolve to no callee.
- Array-element taint. Writes to `buf[i]` do not propagate taint to `buf`
in the general case; structural taint chains involving `fgets` → array →
`system` have rule-ID matching issues (`c-cmdi-004`).
- STL container operations (C++ only). `std::vector`, `std::map`,
`std::string` methods are not taint-aware; `c_str()` breaks taint chains
(`cpp-cmdi-003`).
- Lambdas and nested classes (C++ only). Not modeled.
- Complex socket setup (C++ only). E.g. `connect()` chains are not detected
(`cpp-ssrf-002`).
#### C: 85.7% P / 100% R / 92.3% F1 *(20-case corpus)*
- **Rule depth**: 3 source families, **2** sanitizer families (prefix-based
only), 5 sink matchers spanning Shell, File, SSRF, and Format-String.
- **Known gaps**: no framework rules, no gated sinks. Path-validation via
`strstr()` is not recognized as a guard (`c-safe-006`). Forward-declared
sanitizers are not tracked (`c-safe-008`).
#### C++: 80.0% P / 100% R / 88.9% F1 *(20-case corpus)*
- **Rule depth**: Clones the C ruleset (3 sources, 2 sanitizers, 5 sinks) and
adds `std::cin` / `std::getline` sources.
- **Known gaps**: same sanitizer-recognition gaps as C. See the "Not
modeled" list above for structural gaps (STL containers, `c_str()`,
`connect()`, lambdas, nested classes).
### Experimental tier
#### Rust: 76.0% P / 100% R / 86.4% F1 *(31-case adversarial corpus)*
- **Rule depth**: 6 source families, **2** sanitizer families (prefix and
type-coercion), 11 sink matchers covering HTML, Shell, SQL, SSRF,
Deserialization, and File I/O. Extensive framework source coverage
(Axum, Actix, Rocket); the most of any language on the source side.
- **Recent additions (2026-04-20)**: new SQL class (`rusqlite`, `sqlx`,
`diesel`, `postgres`), new Deserialization class (`serde_yaml`,
`bincode`, `rmp_serde`, `ciborium`, `ron`, `toml`), expanded file I/O
(`fs::remove_file/dir/rename/copy`), `reqwest` SSRF builder chain.
- **Known gaps**:
- `rs-safe-003`: structural `cfg-unguarded-sink` fires when a tainted
variable is *declared* in scope but not used in the sink; intentional
for high-risk sinks.
- `rs-safe-009`: match-arm guards don't surface as `StmtKind::If`, so
`classify_condition` never sees the character-class validation.
- `safe_direct_sanitizer.rs`: still FP because the SSA lowering for
an OR-chain rejection (`if a || b || c { return X }`) joins both
return paths into a single block, losing the early-return
semantics. Distinct from the merged-return-block defect closed on
2026-04-24; tracked separately.
- **Closed by the 2026-04-23 PathFact domain**
(`src/abstract_interp/path_domain.rs`): `rs-safe-007` (`.replace("..",
"")` sanitiser), `rs-safe-008` (negative-validation return pattern),
`rs-safe-010` (static-map lookup; still handled by the dedicated
static-map analysis, but PathFact does not interfere), new `rs-safe-012`
(`.contains("..")` + `.starts_with('/')` intraprocedural rejection),
new `rs-safe-015` (`Path::new(p).is_absolute()` typed rejection), plus a
new `rs-path-006` negative-guard to prevent over-suppression.
- **Closed by the 2026-04-24 per-return-path PathFact landing**
(`PathFactReturnEntry` on `SsaFuncSummary` + structural
variant-wrapper transparency + non-data-return skipping +
path-fact-proven leaf detection in
`trace_tainted_leaf_values`):
`rs-safe-014` (Option-returning user sanitiser),
new `rs-safe-016` (cross-function `.contains("..")` rejection),
`CVE-2018-20997` patched (tar-rs zip-slip),
`CVE-2022-36113` patched (cargo `.cargo-ok` symlink),
`CVE-2024-24576` patched (BatBadBut argv injection).
- **Not yet covered**: unsafe FFI / `std::mem::transmute` (no rules), Tokio
`process::Command` async variants (not distinguished from sync),
`hyper` / `surf` / `ureq` SSRF clients (reqwest family only), and Rocket /
Actix positive cases (rules exist but no benchmark fixtures yet).
---
## How the tiers were assigned
A language lands in **Stable** when all three hold:
- Rule set covers ≥ 8 vulnerability classes with both source and sink
matchers, and at least one class has argument-role-aware gating.
- Benchmark F1 ≥ 95% on a corpus of ≥ 25 cases.
- Advanced analysis (SSA lowering, context-sensitivity, symbolic-execution)
is exercised by fixtures for the language.
A language lands in **Beta** when benchmark F1 ≥ 90% but at least one of the
Stable criteria fails; usually narrower cap coverage or absence of gated
sinks.
A language lands in **Preview** when the engine structurally cannot model
constructs that are pervasive in typical codebases for that language
(pointer aliasing, function pointers, array-element taint, STL containers
for C/C++). Pattern-only coverage is useful but not sufficient as a sole
SAST gate.
A language lands in **Experimental** when rule depth is clearly narrower
(≤ 5 sinks and ≤ 2 sanitizers), or benchmark F1 < 90%, or documented weak
spots require engine changes rather than rule additions to close, but the
engine does not have the pervasive structural blind spots of the Preview
tier.
---
## What this means for you
- **CI gates**: safe to set strict `--fail-on HIGH` gates on Stable-tier
languages. On Beta-tier, expect occasional FP triage; the weak-spot lists
above tell you exactly what to skim for. On Preview- and Experimental-tier,
treat Nyx findings as a starting point for manual review rather than
authoritative; Preview-tier languages in particular have structural
blind spots that a clean report will not disclose.
- **Rule contributions**: the shortest path to raising a language's tier is
contributing sink matchers and gated-sink registrations. Label files live
at `src/labels/<lang>.rs`; benchmark cases live at
`tests/benchmark/corpus/<lang>/`.
- **Scope planning**: if your primary stack is C, C++, or Rust, Nyx will
surface real findings, but you should budget for review time and consider
combining Nyx with a language-specific tool (e.g. `cargo-audit`,
`clang-tidy`) until those tiers mature.
The benchmark thresholds in `tests/benchmark_test.rs` are deliberately set
~5 pp below current baselines so any drop in a language's F1 fails CI. Tier
promotions require sustained benchmark performance, not just rule additions.


@ -19,9 +19,9 @@ Human-readable, color-coded output to stdout. Status messages go to stderr.
| Tag | Color | Meaning |
|-----|-------|---------|
| `[HIGH]` | Red, bold | Critical likely exploitable |
| `[MEDIUM]` | Orange, bold | Important may be exploitable |
| `[LOW]` | Muted blue-gray | Informational code quality or weak signal |
| `[HIGH]` | Red, bold | Critical -- likely exploitable |
| `[MEDIUM]` | Orange, bold | Important -- may be exploitable |
| `[LOW]` | Muted blue-gray | Informational -- code quality or weak signal |
### Evidence fields
@ -139,9 +139,9 @@ Fields marked "no" are omitted when empty/null/false to keep output compact.
| Level | Meaning |
|-------|---------|
| `High` | Strong signal taint-confirmed flow, definite state violation |
| `Medium` | Moderate signal resource leak, path-validated taint, CFG structural |
| `Low` | Weak signal AST pattern match, possible resource leak, degraded analysis |
| `High` | Strong signal -- taint-confirmed flow, definite state violation |
| `Medium` | Moderate signal -- resource leak, path-validated taint, CFG structural |
| `Low` | Weak signal -- AST pattern match, possible resource leak, degraded analysis |
### Evidence object
@ -192,12 +192,12 @@ nyx scan . --format sarif > results.sarif
The SARIF output includes:
- **Tool metadata** Nyx name and version
- **Rules** Rule ID, description, severity mapping
- **Results** One result per finding with location, message, and properties
- **Properties** Each result includes `category` and optionally `confidence` and `rollup.count`
- **Related locations** Rollup findings include example locations in `relatedLocations`
- **Artifacts** File paths referenced by findings
- **Tool metadata** -- Nyx name and version
- **Rules** -- Rule ID, description, severity mapping
- **Results** -- One result per finding with location, message, and properties
- **Properties** -- Each result includes `category` and optionally `confidence` and `rollup.count`
- **Related locations** -- Rollup findings include example locations in `relatedLocations`
- **Artifacts** -- File paths referenced by findings
### GitHub Code Scanning integration
@ -229,9 +229,9 @@ Without `--fail-on`, Nyx always exits `0` on a successful scan regardless of fin
| Level | Description | Typical rules |
|-------|-------------|---------------|
| **High** | Critical vulnerabilities likely exploitable | Command injection, unsafe deserialization, banned C functions, taint-confirmed flows with user input sources |
| **Medium** | Important issues may be exploitable with additional context | SQL concatenation, XSS sinks, reflection, unguarded sinks, resource leaks |
| **Low** | Informational code quality or weak signals | Weak crypto algorithms, insecure randomness, `unwrap()`/`panic!()`, type-safety escapes |
| **High** | Critical vulnerabilities -- likely exploitable | Command injection, unsafe deserialization, banned C functions, taint-confirmed flows with user input sources |
| **Medium** | Important issues -- may be exploitable with additional context | SQL concatenation, XSS sinks, reflection, unguarded sinks, resource leaks |
| **Low** | Informational -- code quality or weak signals | Weak crypto algorithms, insecure randomness, `unwrap()`/`panic!()`, type-safety escapes |
### Non-production severity downgrade
@ -265,8 +265,8 @@ x = dangerous() # nyx:ignore taint-unsanitised-flow ← suppresses this lin
x = dangerous() ← suppresses this line
```
- `nyx:ignore <RULE_ID>` suppresses findings on the **same line** as the comment.
- `nyx:ignore-next-line <RULE_ID>` suppresses findings on the **next line**.
- `nyx:ignore <RULE_ID>` -- suppresses findings on the **same line** as the comment.
- `nyx:ignore-next-line <RULE_ID>` -- suppresses findings on the **next line**.
- For taint findings, the primary line is the **sink line** (the `line` field in output).
### Rule ID matching
@ -312,4 +312,4 @@ Suppressed findings do **not** trigger `--fail-on`. A scan with only suppressed
| `state-*` | State model | `state-use-after-close`, `state-resource-leak` |
| `<lang>.*.*` | AST patterns | `rs.memory.transmute`, `js.code_exec.eval` |
See the [Rule Reference](rules/index.md) for a complete listing.
See the [Rule Reference](rules.md) for a complete listing.


@ -1,103 +1,101 @@
# Quick Start
# Quick start
## Your first scan
After `cargo install nyx-scanner` (or dropping a release binary on your PATH), point Nyx at a directory:
```bash
# Scan the current directory
nyx scan
# Scan a specific path
nyx scan ./my-project
```
Nyx automatically creates an SQLite index on first run. Subsequent scans skip unchanged files.
First run builds a SQLite index under `.nyx/`; later runs skip files whose content hash hasn't changed.
## Understanding the output
## What a finding looks like
A typical console output looks like:
<p align="center"><img src="../assets/screenshots/docs/cli-scan-quickstart.png" alt="nyx scan output: two HIGH taint flows (Python os.system, JavaScript document.write) framed by the brand purple gradient" width="900"/></p>
The same scan in console form:
```
[HIGH] taint-unsanitised-flow (source 5:11) src/handler.rs:12:5
Source: env::var("CMD") at 5:11
Sink: Command::new("sh").arg("-c")
Score: 76
/tmp/demo/cmdi_direct.py
6:5 ✖ [HIGH] taint-unsanitised-flow (source 5:11) (Score: 81, Confidence: High)
Unsanitised user input flows from request.args.get → os.system
[MEDIUM] cfg-unguarded-sink src/handler.rs:12:5
Score: 35
Source: request.args.get (5:11)
Sink: os.system
[MEDIUM] rs.quality.unsafe_block src/lib.rs:44:5
Score: 30
6:5 ✖ [HIGH] py.cmdi.os_system (Score: 64, Confidence: High)
Os.system() — shell command execution
/tmp/demo/xss_document_write.js
5:5 ✖ [HIGH] taint-unsanitised-flow (source 3:18) (Score: 81, Confidence: High)
Unsanitised user input flows from req.query.content → document.write
Source: req.query.content (3:18)
Sink: document.write
5:5 ⚠ [MEDIUM] js.xss.document_write (Score: 34, Confidence: High)
Document.write() — XSS sink
warning 'demo' generated 10 issues.
Finished in 0.054s.
```
Each finding shows:
Each finding is one line of header plus evidence. Fields that matter:
| Field | Meaning |
|-------|---------|
| **Severity tag** | `[HIGH]`, `[MEDIUM]`, or `[LOW]` |
| **Rule ID** | Identifies the detector and specific rule |
| **Location** | `file:line:col` |
| **Evidence** | Source, Sink, and guard details (taint findings only) |
| **Score** | Attack-surface ranking score (higher = more exploitable) |
|---|---|
| `[HIGH]` / `[MEDIUM]` / `[LOW]` | Severity after the non-prod downgrade |
| Rule ID | Either a taint rule (`taint-unsanitised-flow`), a structural rule (`cfg-*`, `state-*`), or an AST pattern (`<lang>.<category>.<name>`) |
| Score | Attack-surface ranking (severity + analysis kind + source kind + evidence). Higher is more exploitable |
| Confidence | `High`, `Medium`, or `Low`. Drops for AST-only matches and capped widened flows; backwards-infeasible findings are lowered to `Low` |
| Source / Sink | Where tainted data entered and where the dangerous call happened |
## Common workflows
Two rules firing on the same line (the taint finding plus the AST pattern) is normal. The pattern matches the structural presence of `document.write`; the taint rule adds the evidence that `req.query.content` actually reached it. Both carry distinct rule IDs so suppressions can target one without the other.
### CI gate — fail on high-severity findings
## Fail a CI job on High findings
```bash
nyx scan . --fail-on high --quiet
# Exit code 1 if any HIGH finding exists, 0 otherwise
nyx scan . --fail-on HIGH --quiet
```
### Export for tooling
Exits with code 1 if any HIGH finding remains, 0 otherwise. `--quiet` drops the "Using default configuration" banner so CI logs stay tidy.
## Emit SARIF for GitHub Code Scanning
```bash
# JSON for scripting
nyx scan . --format json > findings.json
# SARIF for GitHub Code Scanning
nyx scan . --format sarif > results.sarif
```
### Fast structural scan (no dataflow)
Full SARIF schema and GitHub Actions wiring: [cli.md](cli.md) and [output.md](output.md).
## Tighten the gate
```bash
# Only HIGH findings
nyx scan . --severity HIGH
# HIGH + MEDIUM
nyx scan . --severity ">=MEDIUM"
# Drop anything below Medium confidence (useful for CI)
nyx scan . --min-confidence medium
# Also drop findings the engine could not fully resolve (widened / bailed)
nyx scan . --require-converged
```
`--require-converged` keeps `under-report` findings (the emitted flow is still real) but drops over-reports and widenings. Intended for strict gates where a noisy finding is worse than nothing.
## Skip dataflow for a fast first pass
```bash
nyx scan . --mode ast
```
AST-only mode runs tree-sitter pattern queries without building CFGs or running taint analysis. Much faster, but misses dataflow vulnerabilities.
AST-only mode runs tree-sitter patterns without building a CFG or running taint. It's fast and still catches banned-API uses, weak crypto, and obvious XSS sinks, but it can't tell `eval("1+1")` apart from `eval(userInput)`. Use it as a pre-commit filter, not as a CI gate replacement.
### Filter by severity
## Next
```bash
# Only high-severity
nyx scan . --severity HIGH
# High and medium
nyx scan . --severity ">=MEDIUM"
# Specific set
nyx scan . --severity "HIGH,MEDIUM"
```
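To make the three filter forms concrete, here is a minimal sketch of how such an expression could be expanded into a set of severities. It covers only the forms shown above (a single level, `>=LEVEL`, and a comma-separated list); the function name `parse_severity_filter` is hypothetical, and whether Nyx accepts other operators is not stated here.

```python
# Severity tiers from lowest to highest, as used throughout these docs.
ORDER = ["LOW", "MEDIUM", "HIGH"]


def parse_severity_filter(expr: str) -> set[str]:
    """Expand a filter like 'HIGH', '>=MEDIUM' or 'HIGH,MEDIUM' into a set."""
    expr = expr.strip().upper()
    if expr.startswith(">="):
        # ORDER.index raises ValueError for an unknown level.
        floor = ORDER.index(expr[2:].strip())
        return set(ORDER[floor:])
    levels = {part.strip() for part in expr.split(",")}
    unknown = levels - set(ORDER)
    if unknown:
        raise ValueError(f"unknown severity: {sorted(unknown)}")
    return levels


print(parse_severity_filter(">=MEDIUM"))
```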
### Skip the index
```bash
nyx scan . --index off
```
Useful for one-off scans or when you don't want to write to disk.
### Scan without non-production noise
By default, findings in test/vendor/build paths are downgraded one severity tier. To keep original severity:
```bash
nyx scan . --keep-nonprod-severity
```
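The one-tier downgrade can be pictured with the sketch below. The path markers in `NONPROD_PARTS`, the function name, and the choice to leave `LOW` unchanged are assumptions for illustration; the real path classification is not described in this section.

```python
from pathlib import PurePosixPath

TIERS = ["LOW", "MEDIUM", "HIGH"]
# Hypothetical directory names standing in for "test/vendor/build paths".
NONPROD_PARTS = {"test", "tests", "vendor", "build"}


def effective_severity(path: str, severity: str, keep_nonprod_severity: bool = False) -> str:
    """Return the severity after an optional one-tier non-production downgrade."""
    is_nonprod = any(part in NONPROD_PARTS for part in PurePosixPath(path).parts)
    if keep_nonprod_severity or not is_nonprod:
        return severity
    # Step down one tier; LOW has nowhere lower to go (assumed behaviour).
    return TIERS[max(0, TIERS.index(severity) - 1)]


print(effective_severity("vendor/lib/util.py", "HIGH"))
```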
## Next steps
- [CLI Reference](cli.md) — All flags and options
- [Configuration](configuration.md) — Customize rules, exclusions, and behavior
- [Detector Overview](detectors.md) — How the analysis engines work
- [Rule Reference](rules/index.md) — Browse all rules by language
- [CLI reference](cli.md) for every flag and subcommand.
- [Configuration](configuration.md) for the `nyx.conf` / `nyx.local` schema, profiles, and custom rules.
- [`nyx serve`](serve.md) for the browser UI, triage workflow, and scan history.
- [Language maturity](language-maturity.md) for per-language tier and known FP/FN patterns.

1
docs/roadmap.md Normal file

@ -0,0 +1 @@
{{#include ../ROADMAP.md}}

258
docs/rules.md Normal file

@ -0,0 +1,258 @@
# Rule reference
Every finding Nyx emits has a rule ID. This page enumerates the IDs that ship with scanner 0.5.0, grouped by family.
> This page is written by hand and drifts against the code. Authoritative sources: [`src/patterns/<lang>.rs`](https://github.com/elicpeter/nyx/tree/master/src/patterns) for AST patterns, [`src/labels/<lang>.rs`](https://github.com/elicpeter/nyx/tree/master/src/labels) for taint matchers, and [`src/auth_analysis/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/config.rs) for auth rules. If a rule fires that isn't listed here, the source file is right and this page is wrong.
If you'd rather browse rules interactively, [`nyx serve`](serve.md) ships a Rules page that lists every loaded matcher with its language, kind, and capability:
<p align="center"><img src="../assets/screenshots/docs/serve-rules.png" alt="Nyx Rules page: filterable list of 218 rules with language, kind (SOURCE/SANITIZER/SINK), capability, and finding count columns" width="900"/></p>
## ID format
| Prefix | Detector | Example |
|---|---|---|
| `taint-*` | Taint analysis | `taint-unsanitised-flow (source 5:11)` |
| `cfg-*` | CFG structural | `cfg-unguarded-sink`, `cfg-auth-gap` |
| `state-*` | State model | `state-use-after-close`, `state-resource-leak` |
| `<lang>.auth.*` | Auth analysis | `rs.auth.missing_ownership_check` |
| `<lang>.<category>.<name>` | AST patterns | `rs.memory.transmute`, `js.code_exec.eval` |
Language prefixes: `rs`, `c`, `cpp`, `go`, `java`, `js`, `ts`, `py`, `php`, `rb`.
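For scripts that post-process findings, the prefix table can be turned into a small classifier. This is a sketch based only on the table and prefix list above; the function name `rule_family` and the returned labels are hypothetical.

```python
# Language prefixes as listed in the ID format section.
LANGS = {"rs", "c", "cpp", "go", "java", "js", "ts", "py", "php", "rb"}


def rule_family(rule_id: str) -> str:
    """Classify a rule ID into its detector family by prefix."""
    # Taint IDs carry a "(source L:C)" suffix; only the first token matters.
    tokens = rule_id.split()
    base = tokens[0] if tokens else ""
    for prefix, family in (("taint-", "taint"), ("cfg-", "cfg"), ("state-", "state")):
        if base.startswith(prefix):
            return family
    parts = base.split(".")
    if len(parts) >= 3 and parts[0] in LANGS:
        # <lang>.auth.* is the auth analysis; everything else is an AST pattern.
        return "auth" if parts[1] == "auth" else "ast-pattern"
    return "unknown"


print(rule_family("taint-unsanitised-flow (source 5:11)"))
```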
## Cross-language rules
### Taint
One rule covers every source-to-sink flow. The parenthetical identifies the source location.
| Rule ID | Severity |
|---|---|
| `taint-unsanitised-flow (source L:C)` | Varies by source kind and sink capability |
The matcher sets (sources, sanitizers, sinks, gated sinks) live per-language in `src/labels/<lang>.rs`. [Language maturity](language-maturity.md) gives per-language counts and what's covered.
### CFG structural
| Rule ID | Severity |
|---|---|
| `cfg-unguarded-sink` | High/Medium |
| `cfg-auth-gap` | High |
| `cfg-unreachable-sink` | Medium |
| `cfg-unreachable-sanitizer` | Low |
| `cfg-unreachable-source` | Low |
| `cfg-error-fallthrough` | High/Medium |
| `cfg-resource-leak` | Medium |
| `cfg-lock-not-released` | Medium |
### State model
| Rule ID | Severity |
|---|---|
| `state-use-after-close` | High |
| `state-double-close` | Medium |
| `state-resource-leak` | Medium |
| `state-resource-leak-possible` | Low |
| `state-unauthed-access` | High |
### Auth analysis (Rust only, today)
| Rule ID | Severity |
|---|---|
| `rs.auth.missing_ownership_check` | High |
| `rs.auth.missing_ownership_check.taint` | High (gated by `scanner.enable_auth_as_taint`) |
See [auth.md](auth.md) for scope, the five sink-classes, and tuning.
## AST patterns by language
Each language ships a tree-sitter pattern registry. Matching is purely structural, with no dataflow. Some patterns also have a Tier B heuristic guard (e.g. SQL execute must receive a concatenation, not a literal), noted in the registry.
The tables below are generated from `src/patterns/<lang>.rs` by [`tools/docgen`](https://github.com/elicpeter/nyx/tree/master/tools/docgen). Run `cargo run --features docgen --bin nyx-docgen` after changing the registry to refresh them.
<!-- BEGIN AUTOGEN rules-by-language -->
### C: 8 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `c.cmdi.system` | High | A | High |
| `c.memory.gets` | High | A | High |
| `c.memory.printf_no_fmt` | High | B | Medium |
| `c.memory.scanf_percent_s` | High | A | High |
| `c.memory.sprintf` | High | A | High |
| `c.memory.strcat` | High | A | High |
| `c.memory.strcpy` | High | A | High |
| `c.cmdi.popen` | Medium | A | High |
### C++: 9 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `cpp.cmdi.popen` | High | A | High |
| `cpp.cmdi.system` | High | A | High |
| `cpp.memory.gets` | High | A | High |
| `cpp.memory.printf_no_fmt` | High | B | Medium |
| `cpp.memory.sprintf` | High | A | High |
| `cpp.memory.strcat` | High | A | High |
| `cpp.memory.strcpy` | High | A | High |
| `cpp.memory.const_cast` | Medium | A | High |
| `cpp.memory.reinterpret_cast` | Medium | A | High |
### Go: 8 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `go.cmdi.exec_command` | High | A | High |
| `go.transport.insecure_skip_verify` | High | A | High |
| `go.deser.gob_decode` | Medium | A | High |
| `go.memory.unsafe_pointer` | Medium | A | High |
| `go.secrets.hardcoded_key` | Medium | A | High |
| `go.sqli.query_concat` | Medium | B | Medium |
| `go.crypto.md5` | Low | A | Medium |
| `go.crypto.sha1` | Low | A | Medium |
### Java: 8 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `java.cmdi.runtime_exec` | High | A | High |
| `java.deser.readobject` | High | A | High |
| `java.reflection.class_forname` | Medium | A | High |
| `java.reflection.method_invoke` | Medium | A | High |
| `java.sqli.execute_concat` | Medium | B | Medium |
| `java.xss.getwriter_print` | Medium | A | High |
| `java.crypto.insecure_random` | Low | A | Medium |
| `java.crypto.weak_digest` | Low | A | Medium |
### JavaScript: 22 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `js.code_exec.eval` | High | A | High |
| `js.code_exec.new_function` | High | A | High |
| `js.config.cors_dynamic_origin` | High | A | Medium |
| `js.code_exec.settimeout_string` | Medium | A | High |
| `js.config.insecure_session_httponly` | Medium | A | High |
| `js.config.reject_unauthorized` | Medium | A | High |
| `js.config.verbose_error_response` | Medium | A | Medium |
| `js.crypto.weak_hash_import` | Medium | A | Medium |
| `js.prototype.extend_object` | Medium | A | High |
| `js.prototype.proto_assignment` | Medium | A | High |
| `js.secrets.fallback_secret` | Medium | A | Medium |
| `js.xss.cookie_write` | Medium | A | High |
| `js.xss.document_write` | Medium | A | High |
| `js.xss.insert_adjacent_html` | Medium | A | High |
| `js.xss.location_assign` | Medium | A | High |
| `js.xss.outer_html` | Medium | A | High |
| `js.config.insecure_session_samesite` | Low | A | High |
| `js.config.insecure_session_secure` | Low | A | Medium |
| `js.crypto.math_random` | Low | A | Medium |
| `js.crypto.weak_hash` | Low | A | Medium |
| `js.secrets.hardcoded_secret` | Low | A | Medium |
| `js.transport.fetch_http` | Low | A | Medium |
### PHP: 11 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `php.cmdi.system` | High | A | High |
| `php.code_exec.assert_string` | High | A | High |
| `php.code_exec.create_function` | High | A | High |
| `php.code_exec.eval` | High | A | High |
| `php.code_exec.preg_replace_e` | High | A | High |
| `php.deser.unserialize` | High | A | High |
| `php.path.include_variable` | High | B | Medium |
| `php.sqli.query_concat` | Medium | B | Medium |
| `php.crypto.md5` | Low | A | Medium |
| `php.crypto.rand` | Low | A | Medium |
| `php.crypto.sha1` | Low | A | Medium |
### Python: 13 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `py.cmdi.os_popen` | High | A | High |
| `py.cmdi.os_system` | High | A | High |
| `py.cmdi.subprocess_shell` | High | B | Medium |
| `py.code_exec.eval` | High | A | High |
| `py.code_exec.exec` | High | A | High |
| `py.deser.pickle_loads` | High | A | High |
| `py.deser.yaml_load` | High | A | High |
| `py.code_exec.compile` | Medium | A | High |
| `py.deser.shelve_open` | Medium | A | High |
| `py.sqli.execute_format` | Medium | B | Medium |
| `py.xss.jinja_from_string` | Medium | A | High |
| `py.crypto.md5` | Low | A | Medium |
| `py.crypto.sha1` | Low | A | Medium |
### Ruby: 11 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `rb.cmdi.backtick` | High | A | High |
| `rb.cmdi.system_interp` | High | A | High |
| `rb.code_exec.class_eval` | High | A | High |
| `rb.code_exec.eval` | High | A | High |
| `rb.code_exec.instance_eval` | High | A | High |
| `rb.deser.marshal_load` | High | A | High |
| `rb.deser.yaml_load` | High | A | High |
| `rb.reflection.constantize` | Medium | A | High |
| `rb.reflection.send_dynamic` | Medium | B | Medium |
| `rb.ssrf.open_uri` | Medium | A | High |
| `rb.crypto.md5` | Low | A | Medium |
### Rust: 13 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `rs.memory.copy_nonoverlapping` | High | A | High |
| `rs.memory.get_unchecked` | High | A | High |
| `rs.memory.mem_zeroed` | High | A | High |
| `rs.memory.ptr_read` | High | A | High |
| `rs.memory.transmute` | High | A | High |
| `rs.quality.unsafe_block` | Medium | A | High |
| `rs.quality.unsafe_fn` | Medium | A | High |
| `rs.memory.mem_forget` | Low | A | High |
| `rs.memory.narrow_cast` | Low | A | Medium |
| `rs.quality.expect` | Low | A | High |
| `rs.quality.panic_macro` | Low | A | High |
| `rs.quality.todo` | Low | A | High |
| `rs.quality.unwrap` | Low | A | High |
### TypeScript: 22 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `ts.code_exec.eval` | High | A | High |
| `ts.code_exec.new_function` | High | A | High |
| `ts.config.cors_dynamic_origin` | High | A | Medium |
| `ts.code_exec.settimeout_string` | Medium | A | High |
| `ts.config.insecure_session_httponly` | Medium | A | High |
| `ts.config.reject_unauthorized` | Medium | A | High |
| `ts.config.verbose_error_response` | Medium | A | Medium |
| `ts.crypto.weak_hash_import` | Medium | A | Medium |
| `ts.prototype.proto_assignment` | Medium | A | High |
| `ts.secrets.fallback_secret` | Medium | A | Medium |
| `ts.xss.document_write` | Medium | A | High |
| `ts.xss.insert_adjacent_html` | Medium | A | High |
| `ts.xss.location_assign` | Medium | A | High |
| `ts.xss.outer_html` | Medium | A | High |
| `ts.config.insecure_session_samesite` | Low | A | High |
| `ts.config.insecure_session_secure` | Low | A | Medium |
| `ts.crypto.math_random` | Low | A | Medium |
| `ts.crypto.weak_hash` | Low | A | Medium |
| `ts.quality.any_annotation` | Low | A | Medium |
| `ts.quality.as_any` | Low | A | Medium |
| `ts.secrets.hardcoded_secret` | Low | A | Medium |
| `ts.xss.cookie_write` | Low | A | Medium |
<!-- END AUTOGEN rules-by-language -->
## Capability list for custom rules
`nyx config add-rule --cap <name>` and `[analysis.languages.*.rules]` in config accept:
`env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `all`
Source for both the enum and the `to_cap` mapping: [`src/labels/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/labels/mod.rs) (`Cap`) and [`src/utils/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/utils/config.rs) (`CapName`).


@ -1,89 +0,0 @@
# C Rules
Nyx detects C vulnerabilities through AST patterns (banned functions, format strings) and taint analysis (user input → shell execution, buffer overflow sinks).
## Taint Sources
| Function | Capability | Source Kind |
|----------|-----------|-------------|
| `getenv` | `all` | EnvironmentConfig |
| `fgets`, `scanf`, `fscanf`, `gets`, `read` | `all` | UserInput |
## Taint Sinks
| Function | Required Capability |
|----------|-------------------|
| `system`, `popen`, `exec*` family | `SHELL_ESCAPE` |
| `sprintf`, `strcpy`, `strcat` | `HTML_ESCAPE` |
| `printf`, `fprintf` | `FMT_STRING` |
| `fopen`, `open` | `FILE_IO` |
---
## AST Pattern Rules
### Memory Safety (Banned Functions)
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `c.memory.gets` | High | A | `gets()` — no bounds checking, always exploitable |
| `c.memory.strcpy` | High | A | `strcpy()` — no bounds checking on destination buffer |
| `c.memory.strcat` | High | A | `strcat()` — no bounds checking on destination buffer |
| `c.memory.sprintf` | High | A | `sprintf()` — no length limit on output buffer |
| `c.memory.scanf_percent_s` | High | A | `scanf("%s")` — unbounded string read |
| `c.memory.printf_no_fmt` | High | B | `printf(var)` — format-string vulnerability (non-literal first arg) |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `c.cmdi.system` | High | A | `system()` — shell command execution |
| `c.cmdi.popen` | Medium | A | `popen()` — shell command execution with pipe |
---
## Examples
### `c.memory.gets` — Banned function
**Vulnerable:**
```c
char buf[64];
gets(buf); // No bounds checking — buffer overflow
```
**Safe alternative:**
```c
char buf[64];
fgets(buf, sizeof(buf), stdin);
```
### `c.memory.printf_no_fmt` — Format string
**Vulnerable:**
```c
char *user_input = get_input();
printf(user_input); // Format string vulnerability
```
**Safe alternative:**
```c
char *user_input = get_input();
printf("%s", user_input);
```
### `c.cmdi.system` — Shell execution
**Vulnerable:**
```c
char cmd[256];
snprintf(cmd, sizeof(cmd), "ls %s", user_dir);
system(cmd); // Command injection if user_dir contains shell metacharacters
```
**Safe alternative:**
```c
// Use execvp with explicit argument array
char *args[] = {"ls", user_dir, NULL};
execvp("ls", args);
```


@ -1,66 +0,0 @@
# C++ Rules
C++ rules inherit C banned-function concerns and add C++-specific patterns like dangerous casts.
## Taint Labels
C++ shares taint labels with C. See [C Rules](c.md) for the full source/sink/sanitizer listing.
---
## AST Pattern Rules
### Memory Safety
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `cpp.memory.gets` | High | A | `gets()` — no bounds checking, always exploitable |
| `cpp.memory.strcpy` | High | A | `strcpy()` — no bounds checking on destination |
| `cpp.memory.strcat` | High | A | `strcat()` — no bounds checking on destination |
| `cpp.memory.sprintf` | High | A | `sprintf()` — no length limit on output |
| `cpp.memory.reinterpret_cast` | Medium | A | `reinterpret_cast` — type-punning cast |
| `cpp.memory.const_cast` | Medium | A | `const_cast` — removes const/volatile qualifier |
| `cpp.memory.printf_no_fmt` | High | B | `printf(var)` — format-string vulnerability |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `cpp.cmdi.system` | High | A | `system()` — shell command execution |
| `cpp.cmdi.popen` | High | A | `popen()` — shell command execution |
---
## Examples
### `cpp.memory.reinterpret_cast` — Type-punning cast
**Flagged:**
```cpp
int x = 42;
float* fp = reinterpret_cast<float*>(&x); // Type-punning, may violate strict aliasing
```
**Safe alternative:**
```cpp
int x = 42;
float f;
std::memcpy(&f, &x, sizeof(f)); // Well-defined type punning
```
### `cpp.memory.const_cast` — Removing const
**Flagged:**
```cpp
void process(const std::string& s) {
char* p = const_cast<char*>(s.c_str()); // Removes const
p[0] = 'X'; // Undefined behavior
}
```
**Safe alternative:**
```cpp
void process(std::string s) { // Take by value
s[0] = 'X';
}
```


@ -1,148 +0,0 @@
# Go Rules
Nyx detects Go vulnerabilities through AST patterns and taint analysis, covering command execution, unsafe pointer usage, TLS misconfiguration, weak crypto, SQL injection, hardcoded secrets, and deserialization.
## Taint Labels
Go has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/go.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `os.Getenv` | all |
| `http.Request`, `r.FormValue`, `r.URL`, `r.Body`, `r.Header` | all |
| `r.URL.Query`, `r.URL.Query.Get`, `Request.FormValue`, `Request.URL` | all |
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `html.EscapeString`, `template.HTMLEscapeString` | HTML_ESCAPE |
| `url.QueryEscape`, `url.PathEscape` | URL_ENCODE |
| `filepath.Clean`, `filepath.Base` | FILE_IO |
### Sinks
| Matcher | Cap |
|---------|-----|
| `exec.Command` | SHELL_ESCAPE |
| `db.Query`, `db.Exec`, `db.QueryRow`, `db.Prepare` | SHELL_ESCAPE |
| `fmt.Fprintf`, `fmt.Sprintf`, `fmt.Printf` | FMT_STRING |
| `os.Open`, `os.OpenFile`, `os.Create`, `ioutil.ReadFile`, `os.ReadFile` | FILE_IO |
| `template.HTML` | HTML_ESCAPE |
> **Note:** Chained calls like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments before matching, so `r.URL.Query.Get` matches the source rule.
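The normalization described in the note can be sketched as follows. Python is used only for illustration; the function name `strip_call_segments` is hypothetical, and this version removes every parenthesised argument list, including the trailing one, which may differ in detail from what Nyx does.

```python
def strip_call_segments(expr: str) -> str:
    """Drop every parenthesised argument list from a call chain."""
    out = []
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        elif depth == 0:
            # Keep only characters that sit outside any argument list.
            out.append(ch)
    return "".join(out)


print(strip_call_segments('r.URL.Query().Get("host")'))
```

The sketch counts parentheses naively, so an argument containing an unbalanced parenthesis inside a string literal would confuse it.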
---
## AST Pattern Rules
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.cmdi.exec_command` | High | A | `exec.Command()` — arbitrary process execution |
### Memory Safety
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.memory.unsafe_pointer` | Medium | A | `unsafe.Pointer` — bypasses Go type system |
### Insecure Transport
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.transport.insecure_skip_verify` | High | A | `InsecureSkipVerify: true` — disables TLS certificate validation |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.crypto.md5` | Low | A | `md5.New()` / `md5.Sum()` — weak hash algorithm |
| `go.crypto.sha1` | Low | A | `sha1.New()` / `sha1.Sum()` — weak hash algorithm |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.sqli.query_concat` | Medium | B | `db.Query`/`Exec`/`QueryRow` with concatenated string |
### Secrets
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.secrets.hardcoded_key` | Medium | A | Variable with secret-like name assigned a string literal |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.deser.gob_decode` | Medium | A | `gob.NewDecoder` — Go binary deserialization |
---
## Examples
### `go.transport.insecure_skip_verify` — TLS misconfiguration
**Vulnerable:**
```go
tr := &http.Transport{
TLSClientConfig: &tls.Config{
InsecureSkipVerify: true, // Disables certificate verification
},
}
```
**Safe alternative:**
```go
tr := &http.Transport{
TLSClientConfig: &tls.Config{
// Use proper CA certificates
RootCAs: certPool,
},
}
```
### `go.sqli.query_concat` — SQL concatenation
**Vulnerable:**
```go
rows, err := db.Query("SELECT * FROM users WHERE id=" + userID)
```
**Safe alternative:**
```go
rows, err := db.Query("SELECT * FROM users WHERE id=$1", userID)
```
### `go.secrets.hardcoded_key` — Hardcoded secret
**Flagged:**
```go
apiKey := "sk-1234567890abcdef"
password := "hunter2"
```
**Safe alternative:**
```go
apiKey := os.Getenv("API_KEY")
password := os.Getenv("DB_PASSWORD")
```
### `go.cmdi.exec_command` — Command execution
**Vulnerable:**
```go
cmd := exec.Command("sh", "-c", userInput)
cmd.Run()
```
**Safe alternative:**
```go
// Use explicit command and arguments, not shell
cmd := exec.Command("ls", "-la", safeDir)
cmd.Run()
```


@ -1,79 +0,0 @@
# Rule Reference
This section lists every detection rule in Nyx, organized by language.
## Rule ID Format
| Prefix | Detector Family | Example |
|--------|----------------|---------|
| `taint-*` | [Taint analysis](../detectors/taint.md) | `taint-unsanitised-flow (source 5:11)` |
| `cfg-*` | [CFG structural](../detectors/cfg.md) | `cfg-unguarded-sink`, `cfg-auth-gap` |
| `state-*` | [State model](../detectors/state.md) | `state-use-after-close`, `state-resource-leak` |
| `<lang>.*.*` | [AST patterns](../detectors/patterns.md) | `rs.memory.transmute`, `js.code_exec.eval` |
## Cross-Language Rules
These rules apply to all supported languages:
### Taint Rules
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `taint-unsanitised-flow (source L:C)` | Varies by source kind | Unsanitized data flows from source to sink |
### CFG Structural Rules
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `cfg-unguarded-sink` | High/Medium | Sink without dominating guard |
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth |
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
| `cfg-unreachable-source` | Low | Source in unreachable code |
| `cfg-error-fallthrough` | High/Medium | Error path doesn't terminate before dangerous code |
| `cfg-resource-leak` | Medium | Resource not released on all exit paths |
| `cfg-lock-not-released` | Medium | Lock not released on all exit paths |
### State Model Rules
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `state-use-after-close` | High | Variable used after being closed |
| `state-double-close` | Medium | Resource closed twice |
| `state-resource-leak` | Medium | Resource never closed (definite) |
| `state-resource-leak-possible` | Low | Resource may not close on all paths |
| `state-unauthed-access` | High | Privileged operation without authentication |
## Per-Language AST Pattern Rules
Each language page lists all AST pattern rules with examples:
- [Rust](rust.md) — 12 rules (memory safety, code quality)
- [C](c.md) — 8 rules (banned functions, command execution, format strings)
- [C++](cpp.md) — 9 rules (banned functions, dangerous casts, command execution)
- [Java](java.md) — 8 rules (deserialization, command execution, reflection, SQL, crypto, XSS)
- [Go](go.md) — 8 rules (command execution, unsafe pointer, TLS, crypto, SQL, secrets, deserialization)
- [JavaScript](javascript.md) — 12 rules (code execution, XSS, prototype pollution, crypto, transport)
- [TypeScript](typescript.md) — 10 rules (mirrors JS + type-safety escapes)
- [Python](python.md) — 12 rules (code execution, command execution, deserialization, SQL, crypto, XSS)
- [PHP](php.md) — 11 rules (code execution, command execution, deserialization, SQL, path traversal, crypto)
- [Ruby](ruby.md) — 10 rules (code execution, command execution, deserialization, reflection, SSRF, crypto)
## Taint Label Coverage
Taint analysis uses language-specific source/sink/sanitizer labels. Coverage varies by language:
| Language | Sources | Sinks | Sanitizers | Coverage |
|----------|---------|-------|------------|----------|
| Rust | Complete | Complete | Complete | Full |
| JavaScript | Complete | Complete | Partial | Full |
| TypeScript | Partial | Partial | Partial | Moderate |
| Python | Partial | Complete | Partial | Moderate |
| C | Partial | Complete | Minimal | Moderate |
| C++ | Partial | Complete | Minimal | Moderate |
| Java | Partial | Partial | Partial | Moderate |
| Go | Complete | Complete | Partial | Full |
| PHP | Complete | Complete | Partial | Full |
| Ruby | Partial | Partial | Partial | Moderate |
"Moderate" coverage means basic rules exist but many common library functions are not yet labeled. Contributions welcome.


@ -1,135 +0,0 @@
# Java Rules
Nyx detects Java vulnerabilities through AST patterns and taint analysis, covering deserialization, command execution, reflection, SQL injection, weak crypto, and XSS.
## Taint Labels
Java has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/java.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `System.getenv` | all |
| `getParameter`, `getInputStream`, `getHeader`, `getCookies`, `getReader`, `getQueryString`, `getPathInfo` | all |
| `readObject`, `readLine` | all |
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `HtmlUtils.htmlEscape`, `StringEscapeUtils.escapeHtml4` | HTML_ESCAPE |
### Sinks
| Matcher | Cap |
|---------|-----|
| `Runtime.exec`, `ProcessBuilder` | SHELL_ESCAPE |
| `executeQuery`, `executeUpdate`, `prepareStatement` | SHELL_ESCAPE |
| `Class.forName` | SHELL_ESCAPE |
| `println`, `print`, `write` | HTML_ESCAPE |
---
## AST Pattern Rules
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.deser.readobject` | High | A | `ObjectInputStream.readObject()` — unsafe deserialization |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.cmdi.runtime_exec` | High | A | `Runtime.getRuntime().exec()` — shell command execution |
### Reflection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.reflection.class_forname` | Medium | A | `Class.forName()` — dynamic class loading |
| `java.reflection.method_invoke` | Medium | A | `Method.invoke()` — reflective method invocation |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.sqli.execute_concat` | Medium | B | SQL `execute*()` with concatenated string argument |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.crypto.insecure_random` | Low | A | `new Random()` -- `java.util.Random` is not cryptographically secure |
| `java.crypto.weak_digest` | Low | A | `MessageDigest.getInstance("MD5"/"SHA1")` |
### XSS
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.xss.getwriter_print` | Medium | A | `response.getWriter().print/println/write` — direct output |
---
## Examples
### `java.deser.readobject` — Unsafe deserialization
**Vulnerable:**
```java
ObjectInputStream ois = new ObjectInputStream(request.getInputStream());
Object obj = ois.readObject(); // Arbitrary object instantiation
```
**Safe alternative:**
```java
// Use a safe format like JSON
ObjectMapper mapper = new ObjectMapper();
MyType obj = mapper.readValue(request.getInputStream(), MyType.class);
```
### `java.sqli.execute_concat` — SQL concatenation
**Vulnerable:**
```java
String query = "SELECT * FROM users WHERE id=" + userId;
stmt.executeQuery(query); // SQL injection
```
**Safe alternative:**
```java
PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE id=?");
ps.setString(1, userId);
ResultSet rs = ps.executeQuery();
```
### `java.cmdi.runtime_exec` — Command execution
**Vulnerable:**
```java
Runtime.getRuntime().exec("cmd /c " + userCommand);
```
**Safe alternative:**
```java
ProcessBuilder pb = new ProcessBuilder("cmd", "/c", "dir");
// Use explicit argument list, never concatenate user input
```
### `java.reflection.class_forname` — Dynamic class loading
**Flagged:**
```java
Class<?> cls = Class.forName(className);
Object obj = cls.getDeclaredConstructor().newInstance();
```
**Safe alternative:**
```java
// Use an allowlist of permitted class names
Map<String, Class<?>> allowed = Map.of("User", User.class, "Order", Order.class);
Class<?> cls = allowed.get(className);
if (cls != null) { /* ... */ }
```


@ -1,138 +0,0 @@
# JavaScript Rules
JavaScript has the most complete taint label coverage alongside Rust. Nyx detects code execution, XSS, prototype pollution, command injection, and weak crypto.
## Taint Sources
| Function | Capability | Source Kind |
|----------|-----------|-------------|
| `document.location`, `window.location` | `all` | UserInput |
| `req.body`, `req.query`, `req.params` | `all` | UserInput |
| `req.headers`, `req.cookies` | `all` | UserInput |
| `process.env` | `all` | EnvironmentConfig |
## Taint Sinks
| Function | Required Capability |
|----------|-------------------|
| `eval` | `SHELL_ESCAPE` |
| `innerHTML` | `HTML_ESCAPE` |
| `location.href`, `window.location.href` | `URL_ENCODE` |
| `child_process.exec`, `child_process.execSync` | `SHELL_ESCAPE` |
| `child_process.spawn` | `SHELL_ESCAPE` |
## Taint Sanitizers
| Function | Strips Capability |
|----------|------------------|
| `JSON.parse` | `JSON_PARSE` |
| `encodeURIComponent`, `encodeURI` | `URL_ENCODE` |
| `DOMPurify.sanitize` | `HTML_ESCAPE` |
> **Note:** Anonymous function expressions and arrow functions passed as callback arguments (e.g., Express `app.get('/path', function(req, res) { ... })`) are automatically walked as separate function scopes for taint analysis. Each anonymous function gets a unique scope identifier to prevent cross-function taint leakage.
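
The three tables above suggest a simple capability model: a source marks a value with every capability, a sanitizer strips one, and a sink reports a flow if the capability it requires is still present. The following is a minimal sketch of that reading in Python, not the engine's actual implementation. The `Cap` enum, `sanitize`, and `reaches_sink` are hypothetical names, and the dictionaries are excerpts of the tables.

```python
from enum import Flag, auto


class Cap(Flag):
    """Hypothetical stand-in for the capability bits named in the tables."""
    HTML_ESCAPE = auto()
    SHELL_ESCAPE = auto()
    URL_ENCODE = auto()
    JSON_PARSE = auto()


# A source with capability `all` marks the value with every bit.
ALL = Cap.HTML_ESCAPE | Cap.SHELL_ESCAPE | Cap.URL_ENCODE | Cap.JSON_PARSE

# Excerpts of the sanitizer and sink tables, keyed by matcher name.
SANITIZERS = {
    "encodeURIComponent": Cap.URL_ENCODE,
    "DOMPurify.sanitize": Cap.HTML_ESCAPE,
}
SINKS = {
    "eval": Cap.SHELL_ESCAPE,
    "innerHTML": Cap.HTML_ESCAPE,
}


def sanitize(taint: Cap, sanitizer: str) -> Cap:
    """Strip the capability that the named sanitizer covers."""
    return taint & ~SANITIZERS[sanitizer]


def reaches_sink(taint: Cap, sink: str) -> bool:
    """Report a flow if the value still carries the sink's capability."""
    return bool(taint & SINKS[sink])


tainted = ALL  # e.g. a value read from req.query
cleaned = sanitize(tainted, "DOMPurify.sanitize")
print(reaches_sink(cleaned, "innerHTML"))  # False: HTML_ESCAPE was stripped
print(reaches_sink(cleaned, "eval"))       # True: SHELL_ESCAPE is still set
```

Under this reading, a sanitizer only helps for sinks that require the same capability, which is why `DOMPurify.sanitize` does not clear a flow into `eval`.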
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `js.code_exec.new_function` | High | A | `new Function()` — eval equivalent |
| `js.code_exec.settimeout_string` | Medium | A | `setTimeout`/`setInterval` with string argument |
### XSS Sinks
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.xss.document_write` | Medium | A | `document.write()` / `document.writeln()` |
| `js.xss.outer_html` | Medium | A | Assignment to `.outerHTML` |
| `js.xss.insert_adjacent_html` | Medium | A | `insertAdjacentHTML()` |
| `js.xss.location_assign` | Medium | A | Assignment to `location`/`location.href` — open redirect |
| `js.xss.cookie_write` | Medium | A | Write to `document.cookie` |
### Prototype Pollution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.prototype.proto_assignment` | Medium | A | Assignment to `__proto__` |
| `js.prototype.extend_object` | Medium | A | Assignment to `Object.prototype.*` |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.crypto.weak_hash` | Low | A | `crypto.createHash("md5"/"sha1")` |
| `js.crypto.math_random` | Low | A | `Math.random()` — not cryptographically secure |
### Insecure Transport
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.transport.fetch_http` | Low | A | `fetch("http://...")` — plaintext HTTP |
---
## Examples
### `js.code_exec.eval` — Dynamic code execution
**Vulnerable:**
```javascript
const code = req.query.code;
eval(code); // Remote code execution
```
**Safe alternative:**
```javascript
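// Avoid eval entirely: dispatch to an allowlist of operations.
// A Map has no inherited keys, so "constructor" or "__proto__" cannot match.
const allowed = new Map([["add", (a, b) => a + b]]);
const result = allowed.get(req.query.operation)?.(Number(req.query.a), Number(req.query.b));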
```
### `js.xss.document_write` — XSS sink
**Vulnerable:**
```javascript
document.write("<h1>" + userName + "</h1>");
```
**Safe alternative:**
```javascript
const el = document.createElement("h1");
el.textContent = userName;
document.body.appendChild(el);
```
### `js.prototype.proto_assignment` — Prototype pollution
**Vulnerable:**
```javascript
function merge(target, source) {
  for (let key in source) {
    target[key] = source[key]; // If key is "__proto__", pollutes prototype
  }
}
```
**Safe alternative:**
```javascript
function merge(target, source) {
  for (let key in source) {
    if (key === "__proto__" || key === "constructor") continue;
    target[key] = source[key];
  }
}
```
### Taint: `req.body` → `eval()`
**Finding:**
```
[HIGH] taint-unsanitised-flow (source 2:18) src/handler.js:3:5
Source: req.body at 2:18
Sink: eval()
Score: 78
```


@ -1,138 +0,0 @@
# PHP Rules
Nyx detects PHP vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, SQL injection, path traversal, and weak crypto.
## Taint Labels
PHP has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/php.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `$_GET` / `_GET`, `$_POST` / `_POST`, `$_REQUEST` / `_REQUEST`, `$_COOKIE` / `_COOKIE`, `$_FILES` / `_FILES`, `$_SERVER` / `_SERVER`, `$_ENV` / `_ENV` | all |
| `file_get_contents`, `fread` | all |
> **Note:** PHP superglobal names are matched both with and without the `$` prefix because the CFG's `collect_idents` strips the leading `$` from variable names. Subscript access like `$_GET['cmd']` is handled via `element_reference` / `subscript_expression` node detection.
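
The note above explains why each superglobal appears twice in the table. A minimal Python sketch of that normalization, assuming the only transformation is dropping one leading `$`; `normalize_ident` and `is_superglobal_source` are hypothetical names, not functions from `src/labels/php.rs`:

```python
# Superglobal names from the Sources table, stored without the `$` prefix.
SUPERGLOBALS = {"_GET", "_POST", "_REQUEST", "_COOKIE", "_FILES", "_SERVER", "_ENV"}


def normalize_ident(name: str) -> str:
    """Drop one leading `$`, as the note says the CFG does for variable names."""
    return name[1:] if name.startswith("$") else name


def is_superglobal_source(name: str) -> bool:
    """Match a variable name against the table in either spelling."""
    return normalize_ident(name) in SUPERGLOBALS


print(is_superglobal_source("$_GET"))   # True
print(is_superglobal_source("_POST"))   # True
print(is_superglobal_source("$page"))   # False
```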
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `htmlspecialchars`, `htmlentities` | HTML_ESCAPE |
| `escapeshellarg`, `escapeshellcmd` | SHELL_ESCAPE |
| `basename` | FILE_IO |
### Sinks
| Matcher | Cap |
|---------|-----|
| `system`, `exec`, `passthru`, `shell_exec`, `proc_open`, `popen` | SHELL_ESCAPE |
| `eval`, `assert` | SHELL_ESCAPE |
| `include`, `include_once`, `require`, `require_once` | FILE_IO |
| `unserialize` | SHELL_ESCAPE |
| `move_uploaded_file`, `copy`, `file_put_contents`, `fwrite` | FILE_IO |
| `echo`, `print` | HTML_ESCAPE |
| `mysqli_query`, `pg_query`, `query` | SHELL_ESCAPE |
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `php.code_exec.create_function` | High | A | `create_function()` — deprecated eval-like constructor |
| `php.code_exec.preg_replace_e` | High | A | `preg_replace` with `/e` modifier — code execution via regex |
| `php.code_exec.assert_string` | High | A | `assert()` with string argument — evaluates PHP code |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.cmdi.system` | High | A | `system`/`shell_exec`/`exec`/`passthru`/`proc_open`/`popen` |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.deser.unserialize` | High | A | `unserialize()` — PHP object injection |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.sqli.query_concat` | Medium | B | `mysql_query`/`mysqli_query` with concatenated SQL |
### Path Traversal
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.path.include_variable` | High | B | `include`/`require` with variable path — file inclusion |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.crypto.md5` | Low | A | `md5()` — weak hash function |
| `php.crypto.sha1` | Low | A | `sha1()` — weak hash function |
| `php.crypto.rand` | Low | A | `rand()`/`mt_rand()` — not cryptographically secure |
---
## Examples
### `php.code_exec.eval` — Dynamic code execution
**Vulnerable:**
```php
eval($_GET['code']);
```
**Safe alternative:**
```php
// Never use eval with user input
// Use a template engine or allowlisted operations
```
### `php.deser.unserialize` — Object injection
**Vulnerable:**
```php
$obj = unserialize($_COOKIE['data']);
```
**Safe alternative:**
```php
$data = json_decode($_COOKIE['data'], true);
```
### `php.path.include_variable` — File inclusion
**Vulnerable:**
```php
include($_GET['page']); // Local/remote file inclusion
```
**Safe alternative:**
```php
$allowed = ['home', 'about', 'contact'];
$page = in_array($_GET['page'], $allowed) ? $_GET['page'] : 'home';
include("pages/{$page}.php");
```
### `php.sqli.query_concat` — SQL concatenation
**Vulnerable:**
```php
mysqli_query($conn, "SELECT * FROM users WHERE id=" . $_GET['id']);
```
**Safe alternative:**
```php
$stmt = $conn->prepare("SELECT * FROM users WHERE id=?");
$stmt->bind_param("i", $_GET['id']);
$stmt->execute();
```


@ -1,142 +0,0 @@
# Python Rules
Nyx detects Python vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, SQL injection, and weak crypto.
## Taint Labels
Python has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/python.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `os.getenv`, `os.environ` | all |
| `request.args`, `request.form`, `request.json`, `request.headers`, `request.cookies`, `input` | all |
| `sys.argv` | all |
| `argparse.parse_args`, `urllib.request.urlopen`, `requests.get`, `requests.post` | all |
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `html.escape` | HTML_ESCAPE |
| `shlex.quote` | SHELL_ESCAPE |
| `os.path.realpath` | FILE_IO |
### Sinks
| Matcher | Cap |
|---------|-----|
| `eval`, `exec` | SHELL_ESCAPE |
| `os.system`, `os.popen`, `subprocess.call`, `subprocess.run`, `subprocess.Popen`, `subprocess.check_output`, `subprocess.check_call` | SHELL_ESCAPE |
| `cursor.execute`, `cursor.executemany` | SHELL_ESCAPE |
| `send_file`, `send_from_directory` | FILE_IO |
| `open` | FILE_IO |
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `py.code_exec.exec` | High | A | `exec()` — dynamic code execution |
| `py.code_exec.compile` | Medium | A | `compile()` with exec/eval mode |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.cmdi.os_system` | High | A | `os.system()` — shell command execution |
| `py.cmdi.os_popen` | High | A | `os.popen()` — shell command execution |
| `py.cmdi.subprocess_shell` | High | B | `subprocess.*` with `shell=True` |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.deser.pickle_loads` | High | A | `pickle.loads()` / `pickle.load()` — arbitrary object deserialization |
| `py.deser.yaml_load` | High | A | `yaml.load()` without SafeLoader |
| `py.deser.shelve_open` | Medium | A | `shelve.open()` — pickle-backed deserialization |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.sqli.execute_format` | Medium | B | `cursor.execute()` with string concatenation |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.crypto.md5` | Low | A | `hashlib.md5()` — weak hash algorithm |
| `py.crypto.sha1` | Low | A | `hashlib.sha1()` — weak hash algorithm |
### Template Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.xss.jinja_from_string` | Medium | A | `jinja2.Template.from_string()` — template injection |
---
## Examples
### `py.deser.pickle_loads` — Unsafe deserialization
**Vulnerable:**
```python
import pickle
data = pickle.loads(request.body) # Arbitrary code execution
```
**Safe alternative:**
```python
import json
data = json.loads(request.body) # JSON is safe
```
### `py.cmdi.subprocess_shell` — Shell execution
**Vulnerable:**
```python
import subprocess
subprocess.call(user_input, shell=True) # Command injection
```
**Safe alternative:**
```python
import subprocess
import shlex
# No shell is involved, but user_input still chooses the program to run
subprocess.call(shlex.split(user_input), shell=False)
# Better: use an explicit command list
subprocess.call(["ls", "-la", user_dir])
```
### `py.deser.yaml_load` — Unsafe YAML
**Vulnerable:**
```python
import yaml
config = yaml.load(user_data) # Can instantiate arbitrary objects
```
**Safe alternative:**
```python
import yaml
config = yaml.safe_load(user_data) # Only basic Python types
```
### `py.sqli.execute_format` — SQL concatenation
**Vulnerable:**
```python
cursor.execute("SELECT * FROM users WHERE id=" + user_id)
```
**Safe alternative:**
```python
cursor.execute("SELECT * FROM users WHERE id=?", (user_id,))
```


@ -1,132 +0,0 @@
# Ruby Rules
Nyx detects Ruby vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, reflection, SSRF, and weak crypto.
## Taint Labels
Ruby has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/ruby.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `ENV`, `gets` | all |
| `params` | all |
> **Note:** Ruby's `params[:cmd]` subscript access is detected via `element_reference` node handling in the CFG. Sinatra/Rails `do...end` blocks are walked as function scopes.
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `CGI.escapeHTML`, `ERB::Util.html_escape` | HTML_ESCAPE |
| `Shellwords.escape`, `Shellwords.shellescape` | SHELL_ESCAPE |
### Sinks
| Matcher | Cap |
|---------|-----|
| `system`, `exec` | SHELL_ESCAPE |
| `eval` | SHELL_ESCAPE |
| `puts`, `print` | HTML_ESCAPE |
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.code_exec.eval` | High | A | `Kernel#eval` — dynamic code execution |
| `rb.code_exec.instance_eval` | High | A | `instance_eval` — evaluates string in object context |
| `rb.code_exec.class_eval` | High | A | `class_eval` / `module_eval` — evaluates string in class context |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.cmdi.backtick` | High | A | Backtick shell execution (`` `cmd` ``) |
| `rb.cmdi.system_interp` | High | A | `system`/`exec` call — command execution risk |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.deser.yaml_load` | High | A | `YAML.load` — arbitrary object deserialization |
| `rb.deser.marshal_load` | High | A | `Marshal.load` — arbitrary Ruby object deserialization |
### Reflection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.reflection.send_dynamic` | Medium | B | `send()` with non-symbol argument — arbitrary method dispatch |
| `rb.reflection.constantize` | Medium | A | `constantize` / `safe_constantize` — dynamic class resolution |
### SSRF
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.ssrf.open_uri` | Medium | A | `Kernel#open` with HTTP URL — SSRF via open-uri |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.crypto.md5` | Low | A | `Digest::MD5` — weak hash algorithm |
---
## Examples
### `rb.deser.yaml_load` — Unsafe YAML deserialization
**Vulnerable:**
```ruby
data = YAML.load(params[:config]) # Arbitrary object instantiation
```
**Safe alternative:**
```ruby
data = YAML.safe_load(params[:config]) # Only basic Ruby types
```
### `rb.cmdi.backtick` — Backtick shell execution
**Vulnerable:**
```ruby
output = `ls #{user_dir}` # Command injection via interpolation
```
**Safe alternative:**
```ruby
require 'open3'
output, status = Open3.capture2('ls', user_dir)
```
### `rb.reflection.send_dynamic` — Dynamic method dispatch
**Vulnerable:**
```ruby
obj.send(params[:method], params[:arg]) # Arbitrary method invocation
```
**Safe alternative:**
```ruby
allowed = %w[name email phone]
if allowed.include?(params[:method])
  obj.send(params[:method])
end
```
### `rb.deser.marshal_load` — Marshal deserialization
**Vulnerable:**
```ruby
obj = Marshal.load(request.body.read)
```
**Safe alternative:**
```ruby
data = JSON.parse(request.body.read)
```


@ -1,105 +0,0 @@
# Rust Rules
Nyx detects Rust vulnerabilities through AST patterns (memory safety, code quality) and taint analysis (command injection via `env::var` → `Command::new`).
## Taint Sources
| Function | Capability | Source Kind |
|----------|-----------|-------------|
| `std::env::var`, `env::var` | `all` | EnvironmentConfig |
## Taint Sinks
| Function | Required Capability |
|----------|-------------------|
| `Command::new`, `Command::arg`, `Command::args` | `SHELL_ESCAPE` |
| `Command::status`, `Command::output` | `SHELL_ESCAPE` |
| `fs::read_to_string`, `fs::write`, `fs::read`, `File::open`, `File::create` | `FILE_IO` |
## Taint Sanitizers
| Function | Strips Capability |
|----------|------------------|
| `html_escape::encode_safe`, `sanitize_html` | `HTML_ESCAPE` |
| `shell_escape::unix::escape`, `sanitize_shell` | `SHELL_ESCAPE` |
> **Note:** `fs::read_to_string` was moved from taint sources to sinks to support path traversal detection (`env::var` → `fs::read_to_string`).
---
## AST Pattern Rules
### Memory Safety
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rs.memory.transmute` | High | A | `std::mem::transmute` — unchecked type reinterpretation |
| `rs.memory.copy_nonoverlapping` | High | A | `ptr::copy_nonoverlapping` — raw pointer memcpy |
| `rs.memory.get_unchecked` | High | A | `get_unchecked` / `get_unchecked_mut` — unchecked indexing |
| `rs.memory.mem_zeroed` | High | A | `std::mem::zeroed` — may be UB for non-POD types |
| `rs.memory.ptr_read` | High | A | `ptr::read` / `ptr::read_volatile` — raw pointer dereference |
| `rs.memory.narrow_cast` | Low | A | `as u8`/`i8`/`u16`/`i16` — possible truncation |
| `rs.memory.mem_forget` | Low | A | `std::mem::forget` — may leak resources |
### Code Quality
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rs.quality.unsafe_block` | Medium | A | `unsafe { }` block — manual memory safety obligation |
| `rs.quality.unsafe_fn` | Medium | A | `unsafe fn` declaration |
| `rs.quality.unwrap` | Low | A | `.unwrap()` — panics on `None`/`Err` |
| `rs.quality.expect` | Low | A | `.expect()` — panics on `None`/`Err` |
| `rs.quality.panic_macro` | Low | A | `panic!()` macro invocation |
| `rs.quality.todo` | Low | A | `todo!()` / `unimplemented!()` placeholder |
---
## Examples
### `rs.memory.transmute` — Unchecked type reinterpretation
**Vulnerable:**
```rust
let x: u32 = 42;
let y: f32 = unsafe { std::mem::transmute(x) };
```
**Safe alternative:**
```rust
let x: u32 = 42;
let y: f32 = f32::from_bits(x);
```
### `rs.quality.unsafe_block` — Unsafe block
**Flagged:**
```rust
unsafe {
    let ptr = &x as *const i32;
    println!("{}", *ptr);
}
```
**Safe alternative:**
```rust
// Use safe abstractions when possible
println!("{}", x);
```
### Taint: `env::var` → `Command::new`
**Vulnerable:**
```rust
let cmd = std::env::var("USER_CMD").unwrap();
Command::new("sh").arg("-c").arg(&cmd).output()?;
```
**Safe alternative:**
```rust
let cmd = std::env::var("USER_CMD").unwrap();
// Validate against allowlist
let allowed = ["ls", "whoami", "date"];
if allowed.contains(&cmd.as_str()) {
    Command::new(&cmd).output()?;
}
```


@ -1,81 +0,0 @@
# TypeScript Rules
TypeScript rules mirror JavaScript patterns plus TypeScript-specific type-safety escape detectors. Taint labels are shared with JavaScript (see [JavaScript Rules](javascript.md)).
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `ts.code_exec.new_function` | High | A | `new Function()` — eval equivalent |
| `ts.code_exec.settimeout_string` | Medium | A | `setTimeout`/`setInterval` with string argument |
### XSS Sinks
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.xss.document_write` | Medium | A | `document.write()` / `document.writeln()` |
| `ts.xss.outer_html` | Medium | A | Assignment to `.outerHTML` |
| `ts.xss.insert_adjacent_html` | Medium | A | `insertAdjacentHTML()` |
| `ts.xss.location_assign` | Medium | A | Assignment to `location`/`location.href` |
| `ts.xss.cookie_write` | Low | A | Write to `document.cookie` |
### Prototype Pollution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.prototype.proto_assignment` | Medium | A | Assignment to `__proto__` |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.crypto.math_random` | Low | A | `Math.random()` — not cryptographically secure |
### Code Quality (TypeScript-specific)
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.quality.any_annotation` | Low | A | Type annotation of `any` — disables type checking |
| `ts.quality.as_any` | Low | A | Type assertion `as any` — type-safety escape hatch |
---
## Examples
### `ts.quality.any_annotation`: `any` type
**Flagged:**
```typescript
function process(data: any) { // ts.quality.any_annotation
  data.whatever(); // No type checking
}
```
**Safe alternative:**
```typescript
interface UserData { name: string; email: string; }
function process(data: UserData) {
  console.log(data.name);
}
```
### `ts.quality.as_any` — Type assertion escape
**Flagged:**
```typescript
const result = someValue as any; // ts.quality.as_any
result.nonexistentMethod();
```
**Safe alternative:**
```typescript
if (isValidType(someValue)) {
  const result = someValue as KnownType;
  result.knownMethod();
}
```

docs/serve.md

@ -0,0 +1,124 @@
# `nyx serve`: the browser UI
The CLI is fine for CI. For triage, you want context: the source snippet, the dataflow path, the history of how a finding has moved across scans, and a place to record decisions that survive the next run. `nyx serve` boots a local React UI bound to loopback.
```bash
nyx serve # opens http://localhost:9700 in your default browser
nyx serve ./my-project # serve a specific project root
nyx serve --port 9750 # override port
nyx serve --no-browser # don't auto-open
```
Persistent settings live under `[server]` in `nyx.conf` / `nyx.local`.
<p align="center"><img src="../assets/screenshots/docs/serve-overview.png" alt="Nyx UI overview: total findings, severity breakdown, language and category distribution, top affected files" width="900"/></p>
## What it serves, and what it doesn't
The frontend is built and embedded into the `nyx` binary at compile time. There's no separate install step, and the binary serves the entire UI from memory; nothing is fetched from a CDN. The UI talks to the local Nyx process over a small JSON API.
There is **no** account, no telemetry, no remote logging, no auto-update ping. The data the UI shows is the data on your disk: the SQLite project index plus `.nyx/triage.json`.
## Security model
`nyx serve` enforces three things at the HTTP layer ([`src/server/security.rs`](https://github.com/elicpeter/nyx/blob/master/src/server/security.rs)):
1. **Loopback bind only.** `--host` and `[server].host` accept only `127.0.0.1`, `localhost`, or `::1`. Any other value is refused at startup with `Nyx serve only binds to loopback addresses; refused host '<value>'`.
2. **Host-header check.** Every request must carry a `Host` header that matches the bound address and port. Missing or mismatched headers get a `400 invalid Host header`. This defends against DNS rebinding.
3. **CSRF on mutations.** `POST` / `PUT` / `PATCH` / `DELETE` requests must carry a per-process CSRF token in the `x-nyx-csrf` header. The token is generated once when the server starts and exposed at `GET /api/health` so the embedded SPA can read it. Cross-origin mutations are rejected on their `Origin` header before the CSRF check runs.
If you forward the port over SSH or expose it through a reverse proxy, the host-header check will reject the request because the `Host` won't match `localhost:9700`. That's the intended behaviour. Don't do this without a deliberate reason; the loopback bind is part of the security model.
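
The order of the three checks can be sketched in a few lines of Python. This is an illustration of the rules as described above, not a port of `src/server/security.rs`. The function names are hypothetical, the `403` status codes are assumptions (the text only specifies `400` for a bad `Host`), and the sketch ignores the bracketed form an IPv6 `Host` header would take.

```python
import hmac
import secrets

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}
MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def check_bind_host(host: str) -> None:
    """Refuse any bind address that is not loopback (rule 1)."""
    if host not in LOOPBACK_HOSTS:
        raise ValueError(
            f"Nyx serve only binds to loopback addresses; refused host '{host}'"
        )


def check_request(method, headers, bound_host, bound_port, csrf_token) -> int:
    """Return the HTTP status this sketch would answer with."""
    expected = f"{bound_host}:{bound_port}"
    # Rule 2: the Host header must match the bound address and port.
    if headers.get("host") != expected:
        return 400
    if method in MUTATING_METHODS:
        # Rule 3a: reject cross-origin mutations before looking at the token.
        origin = headers.get("origin")
        if origin is not None and origin != f"http://{expected}":
            return 403  # assumed status code
        # Rule 3b: constant-time comparison of the per-process token.
        supplied = headers.get("x-nyx-csrf", "")
        if not hmac.compare_digest(supplied.encode(), csrf_token.encode()):
            return 403  # assumed status code
    return 200


token = secrets.token_urlsafe(32)  # generated once per server start
print(check_request("GET", {"host": "localhost:9700"}, "localhost", 9700, token))
print(check_request("GET", {"host": "evil.example:9700"}, "localhost", 9700, token))
```

The second call shows why a forwarded port or reverse proxy fails: the `Host` the browser sends no longer equals the bound address, so the request stops at the first check.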
## The pages
| Path | Page |
|---|---|
| `/` | Overview |
| `/findings` | Findings list |
| `/findings/:id` | Finding detail |
| `/triage` | Triage |
| `/explorer` | Explorer |
| `/scans` | Scans |
| `/scans/:id` | Scan detail and compare |
| `/rules` | Rules |
| `/rules/:id` | Rule detail |
| `/config` | Config |
The numeric `:id` for finding URLs is the position index in the current scan, not a stable fingerprint. Bookmarks across scans aren't reliable; rely on file path + line.
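
Because the index shifts between scans, a script that wants to return to the same finding can key on file path and line instead. A minimal Python sketch under that assumption; the `Finding` fields and both helper functions are hypothetical and do not reflect the API's actual response shape:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; the field names are illustrative."""
    rule_id: str
    path: str
    line: int


def bookmark_key(finding: Finding) -> tuple:
    """Key a finding by file path and line rather than by list position."""
    return (finding.path, finding.line)


def resolve(key: tuple, findings: list) -> list:
    """Return the current position indexes of every finding matching the key."""
    return [i for i, f in enumerate(findings) if bookmark_key(f) == key]


old_scan = [Finding("js.code_exec.eval", "src/handler.js", 3)]
new_scan = [
    Finding("js.xss.document_write", "src/view.js", 10),
    Finding("js.code_exec.eval", "src/handler.js", 3),
]
key = bookmark_key(old_scan[0])
print(resolve(key, new_scan))  # [1]: same finding, different index
```

The key still breaks when lines move, so it is a best-effort bookmark rather than a fingerprint.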
### Findings and Finding detail
The findings list is filterable by severity, confidence, category, language, rule ID, and triage state.
<p align="center"><img src="../assets/screenshots/docs/serve-findings-list.png" alt="Nyx findings list: 13 findings filtered by severity/confidence/rule, with status badges, file paths, and language tags" width="900"/></p>
Clicking through opens the **flow visualiser**: a numbered walk from source to sink with the snippet at each step, cross-file markers when the path leaves the current file, the rule's "How to fix" guidance, and the engine's evidence object inline.
<p align="center"><img src="../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: HIGH taint-unsanitised-flow showing source → call → sink steps, How to fix guidance, and evidence panel" width="900"/></p>
Engine notes call out when precision was bounded for that finding (`OriginsTruncated`, `PointsToTruncated`, `PathWidened`, `ForwardBailed`, etc.). A note tagged `under-report` means the emitted flow is real but the result set is a lower bound; a note tagged `over-report` means the analysis widened or bailed out. `--require-converged` in the CLI drops the `over-report` findings for strict gates.
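
The strict-gate behaviour described above amounts to a filter over the engine notes. The sketch below shows that filter in Python on an assumed data shape: each finding is a dict with a `notes` list, and each note carries a `kind` and a `tag`. Both the shape and the tag assigned to each note kind in the sample data are illustrative guesses, not the engine's actual output.

```python
def strict_gate(findings: list) -> list:
    """Keep only findings that carry no note tagged over-report."""
    return [
        f for f in findings
        if all(note["tag"] != "over-report" for note in f.get("notes", []))
    ]


findings = [
    {"id": 0, "notes": []},
    {"id": 1, "notes": [{"kind": "OriginsTruncated", "tag": "under-report"}]},
    {"id": 2, "notes": [{"kind": "PathWidened", "tag": "over-report"}]},
]
print([f["id"] for f in strict_gate(findings)])  # [0, 1]
```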
### Triage
Each finding carries a triage state: `open`, `investigating`, `false_positive`, `accepted_risk`, `suppressed`, or `fixed`. The triage page bulk-updates them and shows the audit trail.
<p align="center"><img src="../assets/screenshots/docs/serve-triage.png" alt="Nyx triage page: 13 findings need attention, severity breakdown, Findings/Suppression rules/Audit log tabs, rule chips, Investigate buttons" width="900"/></p>
State writes are persisted to SQLite immediately, and (when `[server].triage_sync = true`, default on) mirrored to `.nyx/triage.json` in the project root. Commit that file:
```bash
git add .nyx/triage.json
```
It carries decisions across machines so a teammate's local scan reflects yours. The format is documented in [`src/server/triage_sync.rs`](https://github.com/elicpeter/nyx/blob/master/src/server/triage_sync.rs); the schema is stable and round-trip-safe with `nyx serve` re-imports.
### Explorer
A file tree with per-file finding counts, syntax-highlighted source, and a right rail with the file's symbols and findings. Useful for "what's wrong with this module" rather than "what's wrong with this finding".
<p align="center"><img src="../assets/screenshots/docs/serve-explorer.png" alt="Nyx explorer: file tree with per-file finding counts, syntax-highlighted Python source with red sink marker on the os.system line, file-summary right rail with findings" width="900"/></p>
The `file` query parameter preselects a file: `/explorer?file=src/handler.rs`.
### Scans and compare
Past runs are persisted when `[runs].persist = true` (off by default to avoid disk growth for heavy users). When persistence is on, `/scans` lists historical runs.
<p align="center"><img src="../assets/screenshots/docs/serve-scans.png" alt="Nyx scans list: completed scan run with root, duration, finding count, languages, and started timestamp" width="900"/></p>
Each run drills into a detail page with files scanned, findings count, duration, languages, and a per-pass timing breakdown.
<p align="center"><img src="../assets/screenshots/docs/serve-scan-detail.png" alt="Nyx scan detail: Summary tab with files scanned, findings, duration, languages; Details panel with Scan ID, Root, Engine version, started/finished timestamps; Timing breakdown bar showing Walk/Pass 1/Call Graph/Pass 2/Post" width="900"/></p>
Pick two scans to diff and see what got introduced, fixed, or rediscovered between runs. The retention cap is `[runs].max_runs` (default 100). Each run can also optionally save its log and stdout (`save_logs`, `save_stdout`); both are off by default. Code snippets are saved by default (`save_code_snippets = true`); turn this off if storage is tight.
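
At its core, a two-scan comparison is set arithmetic over finding identities. The following Python sketch covers the introduced and fixed buckets plus the findings present in both runs; it does not model "rediscovered", which would need history beyond two scans. The `(rule_id, path, line)` key and the `diff_scans` function are assumptions for illustration, not how Nyx matches findings across runs.

```python
def diff_scans(old: list, new: list) -> dict:
    """Bucket findings by comparing two scans' sets of finding keys."""
    old_keys, new_keys = set(old), set(new)
    return {
        "introduced": sorted(new_keys - old_keys),  # only in the newer scan
        "fixed": sorted(old_keys - new_keys),       # only in the older scan
        "unchanged": sorted(old_keys & new_keys),   # present in both
    }


# Hypothetical finding keys: (rule_id, path, line).
scan_a = [
    ("py.cmdi.os_system", "app/run.py", 12),
    ("py.crypto.md5", "app/hash.py", 4),
]
scan_b = [
    ("py.crypto.md5", "app/hash.py", 4),
    ("py.deser.yaml_load", "app/config.py", 30),
]
result = diff_scans(scan_a, scan_b)
print(result["introduced"])  # [('py.deser.yaml_load', 'app/config.py', 30)]
print(result["fixed"])       # [('py.cmdi.os_system', 'app/run.py', 12)]
```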
### Rules
Every rule the engine knows about, built-in plus user-added. Each row shows the matchers, kind (source / sanitizer / sink), capability, language, and how many findings it produced in the latest scan. Filter by language, by kind, or by free text.
<p align="center"><img src="../assets/screenshots/docs/serve-rules.png" alt="Nyx rules page: 218 rules with language/kind dropdowns and a matcher search; rows showing rule title, language, kind (SOURCE/SANITIZER/SINK), cap, and finding count" width="900"/></p>
User-added rules can be deleted from this page; built-ins are immutable. Built-ins live in `src/labels/<lang>.rs` and `src/patterns/<lang>.rs`; user-added entries write to `nyx.local`.
### Config
A live config editor. Reads the merged config (`nyx.conf` + `nyx.local`), lets you flip switches and add custom source / sanitizer / sink rules, and writes back to `nyx.local`. Changes apply to the next scan; the running server uses its initial config snapshot.
<p align="center"><img src="../assets/screenshots/docs/serve-config.png" alt="Nyx config page: General settings (analysis mode, max file size, excluded extensions, attack-surface ranking), Triage Sync toggle, Sources section with language/matcher/capability dropdowns and a per-language matcher table" width="900"/></p>
The custom-rule form picks a language, a matcher (function or property name), and a capability. The capability list matches the `Cap` bitflags the taint engine uses; see [rules.md](rules.md#capability-list-for-custom-rules) for what each one means.
## API surface
For tooling, the JSON endpoints under `/api/` are stable enough to script against. The full route map lives in [`src/server/routes/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/server/routes/mod.rs). Mutating endpoints require the `x-nyx-csrf` header (read it from `GET /api/health`).
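
A script that mutates state therefore needs two steps: read the token from `GET /api/health`, then send it in `x-nyx-csrf`. The Python sketch below uses only the standard library. The `csrf_token` field name and the `/api/example` path in the usage note are assumptions, since the response schema and route names are not given here; check the route map before relying on either.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9700"  # default address of `nyx serve`


def fetch_csrf_token(base_url: str = BASE_URL) -> str:
    """Read the per-process token from the health endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/health") as resp:
        body = json.load(resp)
    return body["csrf_token"]  # ASSUMPTION: the real field name may differ


def build_mutation(path, payload, token, base_url=BASE_URL, method="POST"):
    """Build a mutating request that carries the CSRF header."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-nyx-csrf": token},
        method=method,
    )
```

Usage would look like `urllib.request.urlopen(build_mutation("/api/example", {"state": "fixed"}, fetch_csrf_token()))`, where the path and payload are placeholders. Because the request goes to `localhost:9700` directly, the `Host` header that `urllib` sends matches the bound address and passes the host-header check.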
## Disabling
If you don't want the UI for a project, set:
```toml
[server]
enabled = false
```
`nyx serve` will refuse to start. The CLI continues to work.