Release/0.5.0 (#35)

* feat: Introduce function-scoped variable interning for state analysis with new tests and fixtures

* feat: Add Phase 26 symbolic execution enhancements with bitwise operator support, abstract interpretation refinements, and new taint analysis tests

* feat: Refine state analysis to handle factory-pattern resource returns with mixed-path tests and leak detection enhancements

* feat: Add Phase 27 debug views with symbolic execution, abstract interpretation, SSA, and call graph viewers; integrate with debug layout and styles

* feat: Add Phase 31 type-qualified symbolic resolution with receiver-based callee disambiguation and testing

* feat: Extend symbolic execution with state iteration, enhanced debug views, and debounced input handling

* feat: Add Phase 13 resource and auth pattern extensions with new tests and fixtures

* feat: Introduce CFG debug graph renderer with compact mode, toolbar, and DAG layout integration

* feat: Add Phase 28 encoding and decoding transform modeling with structural symex enhancements and new taint analysis tests

* feat: Extend abstract interpretation with type facts and constant value tracking in debug views and server logic

* feat: Add linear path handling and witness extraction to symbolic execution with Phase 28 transform mismatch detection

* feat: Refine Go auth and sanitizer handling with enhanced rules, state updates, and benchmark improvements

* feat: Enable auth-state analysis by default and update relevant tests in benchmark config

* test: Update state_tests to reflect default enablement of auth-state analysis and add auth suppression test

* docs: update CHANGELOG.md

* feat: Introduce per-index taint tracking in `HeapState` with `HeapSlot`, overflow handling, and revised SSA transfers

* feat: Introduce C/C++ language labels and refine heap state tracking in SSA transfers

* feat: Implement per-index array slot tracking in symbolic heap with overflow collapse

* feat: Add implicit definition handling for uninitialized declarations in SSA value allocation

* feat: Refactor function parameters and constants for improved clarity and maintainability

* refactor: Reorder module imports and improve formatting for consistency

* refactor: Fix formatting errors

* refactor: Fix clippy warnings

* refactor: Fix fmt warnings (again)

* chore: Update dependencies and improve feature configuration

* Add comprehensive tests for undertested modules (#36) (COPILOT)

* Add comprehensive tests for undertested modules

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083

* Add comprehensive tests for ext, project, walk, and errors modules

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* chore: Update dependencies and improve feature configuration

* fix: formatting errors in new tests

* chore: Update license list in about.toml

* chore: made functions input inline

* chore: updated cfg graph to take up the full page

* chore: add Prettier configuration and update code formatting

* Add frontend test suite with Vitest (111 tests) (#37)

* Add Vitest test suite for frontend - 111 tests across utils, components, hooks, and graph utilities

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/7cf0dba2-ecff-4740-ba4d-92717e74a0b7

* ci: add frontend test step to CI workflow

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/5bc0ac9f-0a32-4d03-9cb7-7a15aea53fca

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* chore: simplify array initialization in test files for consistency

* ran typecheck

* feat: add AnalysisWorkspace component and integrate it into CfgViewerPage

* feat: update routing in AppLayout and improve empty state message in ExplorerPage

* feat: enhance scan progress tracking with additional metrics and stages

* feat: update license information and add license check script

* feat: implement cross-file symbolic execution with callee body persistence

* feat: replace dagre graphs with Graphology + ELK + Sigma for more advanced call stack and cfg rendering

* feat: ensure CFG function view is scoped to the selected function, preventing bleed into sibling functions

* feat: enhance resource tracking with proxy method summaries and improve finding extraction

* feat: add terminal function exit detection for accurate resource leak analysis

* feat: add warnings for loops and functions without bodies to improve error recovery

* feat: update lambda expression handling to ensure proper function classification and control flow

* feat: remove bounded formatting/string ops and add JSON.parse sanitizer for improved data handling

* feat: add inline return taint analysis and regression tests for improved security checks

* feat: add engine version management and migration handling for database schema updates

* feat: enhance first_call_ident to skip nested function bodies and add regression tests

* feat: enhance callee name resolution with two-segment normalization and disambiguation

* feat: add cross-file context flags and debug assertions for taint analysis

* feat: refactor taint analysis structure to unify context handling and improve clarity

* feat: enhance dead code elimination to preserve Sink, Source, and Sanitizer labels with new tests

* docs: updated CHANGELOG.md

* fmt: formatting fixes

* fix: fixed frontend formatting and lint warnings

* fix: optimized ci

* fix: optimized ci

* Add comprehensive multi-file test coverage to Nyx (#38)

* Initial checklist for multi-file test suite expansion

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* Add 12 new multi-file test fixtures with TP/TN/near-miss coverage

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* deleted root repo

* rebuilt to test for regressions

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Co-authored-by: elipeter <elicpeter@gmail.com>

* feat: enhance import alias resolution and taint tracking

* feat: implement security hardening with CSRF protection and path validation

* feat: add support for import alias bindings in Python, PHP, and Rust

* feat: enhance CFG analysis modes and improve code readability

* feat: add detection for parameterized SQL queries to enhance security

* feat: add safe internal redirect handling and enhance session destroy validation

* feat: implement security improvements by addressing vulnerabilities in execAsync, session management, and file downloads

* feat: enhance taint detection by adding support for inline source member expressions in call arguments

* feat: implement pre-emission of Source nodes for inline source member expressions in call arguments

* feat: add support for Throw statement in control flow and error handling

* feat: add debug and echo endpoints with potential information leakage

* feat: implement internal redirect suppression and enhance taint detection

* feat: implement module alias tracking for dynamic dispatch in JS/TS

* feat: add authorization analysis module with Express support

* feat: add authorization analysis module with Express support

* feat: add tests for admin guard requirements and clean checks in authorization analysis

* feat: integrate Koa and Fastify frameworks into authorization analysis

* feat: add Flask and Django support to authorization analysis module

* feat: add support for Rails and Sinatra frameworks in authorization analysis

* feat: add support for Axum, ActixWeb, and Rocket frameworks in authorization analysis

* feat: add support for ActixWeb, Axum, and Rocket frameworks in authorization analysis

* feat: add support for Rails and Sinatra in authorization analysis

* chore: add .DS_Store to .gitignore

* refactor: simplify conditional checks and improve readability in multiple files

* refactor: update usage of Option methods for improved clarity and consistency

* refactor: improve code readability by simplifying conditional checks and formatting

* refactor: improve code formatting and readability by simplifying conditional checks

* refactor: simplify conditional checks and improve readability in multiple files

* refactor: simplify conditional checks in axum.rs for improved readability

* feat: add CodeQL analysis configuration for enhanced security scanning

* test: add comprehensive tests for `src/output.rs` SARIF builder (#39)

* chore: start test coverage improvement work

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* test: add comprehensive tests for src/output.rs SARIF builder

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* refactor: improve code formatting and readability in output.rs

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Co-authored-by: elipeter <elicpeter@gmail.com>

* refactor: improve code formatting and readability in output.rs

* Potential fix for code scanning alert no. 210: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 211: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* refactor: enhance triage file path handling with improved error management and validation

* refactor: updated func summaries for richer detail

* refactor: update SSA summary extraction to use canonical FuncKey for distinct entries

* refactor: enhance callee metadata structure to support arity, receiver, and qualifier for better overload resolution

* refactor: add support for keyword arguments in function calls and enhance receiver extraction for method-style calls

* refactor: implement new Flask routes for safe and unsafe shell command execution

* refactor: separate receiver handling in SSA operations and enhance taint propagation

* refactor: improve arity handling by using arg_uses for positional argument count and enhance witness scoring for tainted arguments

* refactor: implement auth decorator extraction and classification for multiple languages

* refactor: enhance Rust module path resolution and use map handling for cross-file disambiguation

* refactor: introduce CalleeQuery struct for structured callee resolution and enhance resolver logic

* refactor: implement same-file identity collision handling for `runTask` to ensure correct resolver behavior

* refactor: standardize default struct initialization across multiple files

* feat: add scripts for formatting checks and auto-fixes with test summaries

* refactor: simplify character splitting and enhance namespace qualifier handling

* refactor: improve documentation clarity and enhance code readability in resolver logic

* refactor: replace default struct initialization with explicit field assignments for clarity

* feat: enhance anonymous function naming by deriving context-based bindings

* refactor: streamline match expressions for improved readability and performance

* refactor: streamline match expressions for improved readability and performance

* refactor: replace loop with while let for improved clarity and performance

* feat: add SSA constant propagation support to analysis context for improved accuracy

* feat: add SSA constant propagation support to analysis context for improved accuracy

* feat: implement shell metacharacter validation and bounded-length checks in Rust analysis

* feat: add static map analysis for command injection suppression and type safety

* refactor: simplify match statements and reduce line breaks for improved readability

* feat(summary): phase 1/5 SinkSite data model for primary sink-location attribution

Introduce SinkSite (file_rel, line, col, snippet, cap) carrying the
primary sink source-location through function summaries. Swap
SsaFuncSummary.param_to_sink and FuncSummary.param_to_sink from a coarse
Cap map to a deduped SmallVec<[SinkSite; 1]> per parameter, with a
backward-compatible cap_sites() helper and serde defaults so pre-phase-1
on-disk rows continue to deserialise cleanly.

Extraction: SinkSiteLocator bundles the tree/bytes/file_rel needed by
extract_ssa_func_summary; ParsedFile::extract_ssa_artifacts wires the
locator in for the persisted pass-1 path, while pass-2 intra-file
transient summaries fall back to cap-only sites (behavior unchanged).
Merge: GlobalSummaries::insert now unions sink sites with
(file_rel, line, col, cap) dedup via shared union_param_sink_sites
helper.

Database: JSON-serialised summary columns carry the new shape
automatically; no schema change needed.

Phase 2 will consume SinkSite in build_taint_diag() to overwrite the
caller-site Finding.line with the callee's sink line when resolved via
summary. Phase 1 keeps behavior unchanged: scanning
tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs still produces the
same (wrong) line 10 finding.

Adds round-trip tests covering SinkSite solo, SsaFuncSummary with sink
sites, legacy-JSON default handling for both summary types, and merge
dedup.
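
As a rough illustration of the phase 1 data model, the following sketch shows a site record and a union that dedups on `(file_rel, line, col, cap)`. The field types, the `Cap` alias, and the names `union_sink_sites` and `cap_union` are assumptions made for this example (a plain `Vec` stands in for the `SmallVec`), not the project's actual definitions.

```rust
/// Hypothetical stand-in for the capability bitset; the real `Cap` type is not shown here.
pub type Cap = u32;

/// Sketch of a sink site; field types are assumed for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SinkSite {
    pub file_rel: String,
    pub line: u32,
    pub col: u32,
    pub snippet: Option<String>,
    pub cap: Cap,
}

impl SinkSite {
    /// Dedup key: (file_rel, line, col, cap). The snippet is excluded.
    fn key(&self) -> (&str, u32, u32, Cap) {
        (self.file_rel.as_str(), self.line, self.col, self.cap)
    }
}

/// Union `incoming` into `existing`, skipping sites whose key is already present.
pub fn union_sink_sites(existing: &mut Vec<SinkSite>, incoming: &[SinkSite]) {
    for site in incoming {
        if !existing.iter().any(|s| s.key() == site.key()) {
            existing.push(site.clone());
        }
    }
}

/// Collapse a parameter's sites back to one capability mask (the coarse view).
pub fn cap_union(sites: &[SinkSite]) -> Cap {
    sites.iter().fold(0, |acc, s| acc | s.cap)
}
```

Merging two summaries for the same parameter with this helper keeps one entry per distinct location and capability, while `cap_union` recovers the coarse capability view that older callers would expect.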

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(taint): phase 2/5 thread SinkSite into SsaTaintEvent and Finding

Plumb Phase 1's SinkSite through the event pipeline into Findings,
no output change yet.  SsaTaintEvent gains `primary_sink_site:
Option<SinkSite>`; when the main or callback sink-emission path has
non-empty `param_to_sink_sites`, filter to sites whose
`(line != 0) && (cap ∩ sink_caps != ∅)` and emit one event per
distinct site — the multi-primary collapse keeps each downstream
Finding single-primary.

Resolution: ResolvedSummary and SinkInfo gain mirror
`param_to_sink_sites` fields, populated from `SsaFuncSummary.param_to_sink`
(SSA + callback paths) and `FuncSummary.param_to_sink` (global paths).
Label, local-summary, and interop resolution paths leave the field
empty — they only ever had cap-level info to begin with.

Finding: new `primary_location: Option<SinkLocation>` with
`file_rel/line/col`.  `ssa_events_to_findings` maps
`event.primary_sink_site` → `Finding.primary_location`, filtering
cap-only sites (`line == 0`) to `None` so the (0,0) sentinel never
leaks to formatters.  Dedup key extended with the primary location
so multi-site events aren't collapsed back together.

Invariants (debug_assert!):
* every SinkSite reaching emission has `line != 0 && cap ∩ sink_caps
  != ∅` — enforced by the pick_primary_sink_sites* filters;
* every populated Finding.primary_location has `line != 0` AND
  non-empty `file_rel` — the cap-only → None translation upstream
  guarantees this.
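
A minimal sketch of the two filters described above, assuming a bitmask `Cap` and simplified structs. The names `pick_sites_sketch` and `to_primary_location` are hypothetical and only mirror the stated invariants; they are not the real `pick_primary_sink_sites*` functions.

```rust
/// Hypothetical bitmask standing in for the real capability type.
type Cap = u32;

#[derive(Debug, Clone, PartialEq, Eq)]
struct SinkSite {
    file_rel: String,
    line: u32,
    col: u32,
    cap: Cap,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct SinkLocation {
    file_rel: String,
    line: u32,
    col: u32,
}

/// Keep only sites with a real line whose capabilities intersect `sink_caps`.
fn pick_sites_sketch(sites: &[SinkSite], sink_caps: Cap) -> Vec<SinkSite> {
    sites
        .iter()
        .filter(|s| s.line != 0 && (s.cap & sink_caps) != 0)
        .cloned()
        .collect()
}

/// Translate an optional site into a location; cap-only sites (line == 0) become `None`.
fn to_primary_location(site: Option<&SinkSite>) -> Option<SinkLocation> {
    site.filter(|s| s.line != 0).map(|s| SinkLocation {
        file_rel: s.file_rel.clone(),
        line: s.line,
        col: s.col,
    })
}
```

Each surviving site would then correspond to one emitted event, so a downstream finding carries at most one primary location.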

Deliberately independent of `uses_summary`: that flag tracks whether
the *taint chain* used a summary, whereas primary attribution
requires only that the *sink* itself was summary-resolved.  A local
source reaching a cross-file sink produces `uses_summary=false`
alongside a populated primary_location — documented on
Finding.primary_location, covered by
`cross_file_sink_finding_carries_primary_location`.

build_taint_diag, SARIF/JSON/explanation formatters, and the
benchmark scorer remain untouched: finding.line still comes from
`cfg_graph[finding.sink]`, so cmdi_indirect.rs still reports line 10
and the benchmark's rs-cmdi-003 row still shows FN in the LOC column.

Tests: `cross_file_sink_finding_carries_primary_location` (proves
plumbing via a synthetic FuncSummary carrying a SinkSite at 42:5) and
`cross_file_sink_cap_only_site_leaves_primary_location_none`
(regression guard against cap-only sites surfacing).  All 1566 lib
tests + integration tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(output): phase 3/5 consume primary sink location in diag + SARIF

When a finding's primary_location (populated in phase 2 from a callee
summary's SinkSite) names the dangerous instruction inside a callee
body, attribute the diagnostic line to that location instead of the
caller's call site. The call site is demoted to a Call step in
flow_steps, and a synthetic Sink step at the primary location is
appended so analysts still see the full trace.

Changes:
- Add scan_root parameter to build_taint_diag so file_rel can be
  resolved back to an absolute path via a shared resolve_file_rel
  helper. Empty file_rel (single-file scans where namespace == "")
  resolves to the file under analysis.
- Extend SinkLocation with snippet, carried from the upstream
  SinkSite so the formatter needs no second file read.
- Relax the ssa_events_to_findings debug_assert to allow empty
  file_rel, which is valid when scan root equals the file itself.
- SARIF: emit data-flow as codeFlows[0].threadFlows[0].locations[];
  locations[0] already reflects the primary sink position via the
  updated diag line/col.
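
The path-resolution rule from the first change can be sketched as below. The signature is an assumption for illustration; the real `resolve_file_rel` helper may take different parameters.

```rust
use std::path::{Path, PathBuf};

/// Sketch: map a summary's `file_rel` back to a path on disk.
/// An empty `file_rel` (single-file scan) resolves to the file under analysis;
/// anything else is joined onto the scan root.
fn resolve_file_rel(scan_root: &Path, current_file: &Path, file_rel: &str) -> PathBuf {
    if file_rel.is_empty() {
        current_file.to_path_buf()
    } else {
        scan_root.join(file_rel)
    }
}
```

With a scan root of `/repo`, a `file_rel` of `src/helper.rs` would resolve to `/repo/src/helper.rs`, and an empty `file_rel` would return the current file unchanged.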

Acceptance: scan on tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs
now reports line 5 (Command::new) as the primary sink, with the call
site at line 10 visible in flow_steps.

Two expect.json fixtures updated (must_match line_range widened):
- javascript/taint/context_sensitive_call: 12-14 -> 7-14 (line 8 is
  the real sink inside run()).
- rust/cfg/closure_async: 10-10 -> 10-11 (line 11 is Command::new
  inside the closure).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(bench): phase 4/5 validate primary sink attribution across corpus

Extend the benchmark scorer and ground truth to lock in phase 3's
primary-location behavior, and add fixtures that exercise the new
capability end-to-end.

Scorer (tests/benchmark_test.rs):
- Add optional `expected_call_site_lines: Option<Vec<[usize; 2]>>` on
  Case. When present, score_location_level additionally requires at
  least one flow_step in the finding's evidence trace to fall within
  ±2 of the call-site range. When absent, the check is skipped —
  fully forward-compatible with existing fixtures.
- Retain ±2 tolerance on expected_sink_lines (compared against the
  now-primary Diag.line post-phase-3).
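
The optional call-site check might look roughly like this. The function names are hypothetical, and treating multiple ranges as "any range may match" is an assumption; only the ±2 tolerance and the skip-when-absent behavior come from the description above.

```rust
/// Line tolerance applied on both sides of an expected range.
const TOLERANCE: usize = 2;

/// True when `line` falls within `range` widened by the tolerance.
fn line_in_range(line: usize, range: [usize; 2]) -> bool {
    let lo = range[0].saturating_sub(TOLERANCE);
    let hi = range[1] + TOLERANCE;
    (lo..=hi).contains(&line)
}

/// Sketch of the call-site requirement: skipped when no expectation is given,
/// otherwise at least one flow-step line must land near an expected range.
fn call_site_ok(flow_step_lines: &[usize], expected: Option<&[[usize; 2]]>) -> bool {
    match expected {
        None => true,
        Some(ranges) => flow_step_lines
            .iter()
            .any(|&line| ranges.iter().any(|&r| line_in_range(line, r))),
    }
}
```

Because a missing expectation returns `true`, fixtures that predate the field would be scored as before.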

Ground truth edits:
- rs-cmdi-cross-001: expected_sink_lines [8,8] -> [9,9]. Line 8 is the
  transform::wrap call site (a cross-file propagator, not a sink);
  line 9 is Command::new, the real sink. The ±2 tolerance happened to
  mask this stale attribution but it was semantically wrong — phase 4
  is the right time to correct it. Also adds expected_call_site_lines
  [8,8] so the new field is exercised on an existing cross-file case.
- rs-cmdi-003: adds expected_call_site_lines [10,10] (run_cmd call).
  This fixture's sink (Command::new inside run_cmd at line 5) was the
  motivating case for phases 1-3; adding the call-site assertion
  guards against regression to caller-line attribution.

New fixtures:
- rust/cmdi/cmdi_indirect_multisink.rs (rs-cmdi-009): helper run_both
  takes two tainted params and invokes two Command sinks on
  consecutive lines. Locks in that primary line lands inside the
  helper (lines 5-6), not at the caller (line 12). Notes document
  that SinkSite is currently one-per-callee so both findings today
  collapse onto the first sink; expected_sink_lines=[5,6] and
  expected_call_site_lines=[12,12] stay valid either way.
- python/cmdi/cross_indirect_sink/{app.py,helper.py} (py-cmdi-cross-
  004): sink os.system lives in helper.py (cross-file), caller in
  app.py reads env source and calls run_cmd. Verifies phase 3's
  cross-file primary attribution: Diag.path = helper.py, Diag.line =
  5, with app.py:7 recorded in flow_steps as a Call step.

Acceptance:
- `cargo test --test benchmark_test -- --ignored --nocapture` passes.
- rs-cmdi-003 is TP/TP/TP (the target flip FN->TP at LOC). All
  pre-existing TP/TP/TP fixtures remain TP/TP/TP; 2 new fixtures are
  TP/TP/TP.
- Aggregate rule-level: TP=158 FP=10 FN=1 TN=97, P=0.940 R=0.994
  F1=0.966 on the 266-case corpus (was TP=156 FP=10 FN=1 TN=97 on
  264 pre-phase-4, delta is the +2 new cases both resolving TP).
- Full `cargo test` green (1566 lib tests + all integration tests).
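
For readers who want to check the aggregate figures, the standard precision, recall, and F1 definitions can be sketched as follows. The `Confusion` struct is a hypothetical helper for this example and is not part of the benchmark scorer.

```rust
/// Hypothetical confusion-matrix holder for rule-level scores.
struct Confusion {
    tp: u32,
    fp: u32,
    fn_: u32,
    tn: u32,
}

impl Confusion {
    /// Total number of scored cases.
    fn total(&self) -> u32 {
        self.tp + self.fp + self.fn_ + self.tn
    }

    /// Precision = TP / (TP + FP).
    fn precision(&self) -> f64 {
        f64::from(self.tp) / f64::from(self.tp + self.fp)
    }

    /// Recall = TP / (TP + FN).
    fn recall(&self) -> f64 {
        f64::from(self.tp) / f64::from(self.tp + self.fn_)
    }

    /// F1 = harmonic mean of precision and recall.
    fn f1(&self) -> f64 {
        let (p, r) = (self.precision(), self.recall());
        2.0 * p * r / (p + r)
    }
}
```

Plugging in TP=158, FP=10, FN=1, TN=97 gives 266 cases and values that round to P=0.940, R=0.994, F1=0.966, matching the figures quoted above.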

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(taint): phase 5/5 lock Finding.primary_location contract via regression test

Add a regression test in src/taint/ssa_transfer.rs that wires up a synthetic
SsaFuncSummary with a SinkSite at other.rs:42:10 and drives the three
emission stages (pick_primary_sink_sites → emit_ssa_taint_events →
ssa_events_to_findings) against a minimal caller SSA body.  Asserts the
resulting Finding.primary_location is exactly that triple.

The existing integration tests in src/taint/tests.rs cover the coarse
FuncSummary path end-to-end through analyse_file.  This test locks in the
lower-level SSA-side plumbing so a future refactor that silently drops the
site between pick → emit → findings fails here rather than only at the
benchmark layer.

Also refreshes tests/benchmark/results/latest.json (timestamp only; rs-cmdi-003
remains TP/TP/TP and the aggregate P/R/F1 are unchanged from phase 4).

Closes the primary sink-location attribution feature (phases 1-5/5):
* Phase 1 — SinkSite data model on summaries.
* Phase 2 — SinkSite threaded into SsaTaintEvent and Finding.
* Phase 3 — diag + SARIF consume primary_location.
* Phase 4 — benchmark validates expected_call_site_lines across corpus.
* Phase 5 — regression test locks the event→finding contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: clean up formatting and improve readability in multiple files

* refactor: simplify type definition for deduplication key in findings

* test(harness): add must_not_match expectation for FP regression guards

Extends ExpectedFinding with must_not_match field that asserts a
diagnostic must NOT fire — presence is a hard failure. Non-consuming
scan so it coexists with must_match entries on the same rule_id.
Adds forbidden_violations accumulator and updates summary line.
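
A non-consuming scan of this kind could be sketched as below. The `Diag` and `MustNotMatch` shapes and their fields are assumptions for illustration; the harness's real `ExpectedFinding` type is not shown here.

```rust
/// Hypothetical diagnostic as seen by the test harness.
#[derive(Debug)]
pub struct Diag {
    pub rule_id: String,
    pub line: usize,
}

/// Hypothetical forbidden expectation: a rule that must not fire,
/// optionally restricted to an inclusive line range.
pub struct MustNotMatch {
    pub rule_id: String,
    pub line_range: Option<[usize; 2]>,
}

/// Collect every diagnostic that matches a forbidden expectation.
/// The scan only borrows `diags`, so must_match checks on the same
/// rule_id can still see and consume the same diagnostics.
pub fn forbidden_violations<'a>(diags: &'a [Diag], forbidden: &[MustNotMatch]) -> Vec<&'a Diag> {
    diags
        .iter()
        .filter(|d| {
            forbidden.iter().any(|f| {
                f.rule_id == d.rule_id
                    && f.line_range
                        .map_or(true, |[lo, hi]| (lo..=hi).contains(&d.line))
            })
        })
        .collect()
}
```

Any non-empty result would be treated as a hard failure, while the untouched `diags` slice remains available for the positive expectations.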

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(regression): update expectations to ensure must_not_match for various taint and resource leak rules

* feat: implement auto-seeding for JS/TS handler parameters to enhance taint tracking

* feat: update switch statement handling to improve control flow analysis

* feat: implement promisify alias handling for JS/TS to enhance taint tracking

* feat: enhance taint tracking by refining expectation handling and adding mode filtering

* feat: refine SQL handling in stream processing and enhance auto-seeding for handler parameters

* feat: update taint tracking rules to enforce full mode matching and improve flow analysis

* feat: enhance Ruby subshell handling to improve taint tracking and flow analysis

* feat: update xss_response expectations to refine taint flow analysis and enhance regression guarding

* feat: refine framework detection and update expectation handling for Echo and Sinatra

* feat: implement max_count for taint tracking expectations and deduplicate findings

* feat: add strict_unexpected handling for taint-unsanitised-flow in expectation files

* feat: enhance deduplication of taint-unsanitised-flow findings by collapsing based on line and severity

* feat: add strict_unexpected handling for taint-unsanitised-flow in multiple expectation files

* feat: add structural invariant checks for SSA bodies

* feat: ensure deterministic phi emission order using BTreeSet

* feat: enhance handling of terminators to ensure authoritative flow through successor edges

* feat: enhance Goto terminator handling to ensure all successors are marked executable

* feat: refactor code for improved readability and organization

* feat: simplify predicate checks and enhance readability in SSA handling

* feat: implement per-file parse timeout and enhance file size handling

* feat: migrate analysis engine toggles from environment variables to configuration file

* feat: remove unnecessary whitespace in hostile_input_tests.rs

* feat: remove unnecessary whitespace in hostile_input_tests.rs

* feat: update dependencies and enhance documentation on language maturity

* feat: enhance security headers and improve request body limits

* feat: implement sink capability bits for deduplication and enhance evidence tagging

* feat: implement dynamic activation handling for gated sinks and enhance validation logic

* feat: enhance configuration documentation and clarify inline analysis cache behavior

* feat: implement panic recovery during analysis to continue scans past errors

* feat: add expectations configuration for taint analysis and performance metrics

* feat: enhance error handling and logging during file reading and mutex locking

* feat: add cross-file body loading tests and plumbing for CF-1 phase

* feat: implement cross-file k=1 context-sensitive inline taint analysis with new tests and fixtures

* feat: implement indexed-scan parity in cross-file inline analysis with new dropdown and copy functionality

* feat: enhance classification span handling in CFG and AST for improved source attribution

* feat: add new Express routes for handling user input and telemetry data

* feat: implement ternary expression handling in CFG with diamond structure for JS/TS

* feat: implement Phase CF-3 abstract-domain transfer channels in summaries

* feat: add support for string-prefix transfer in cross-file calls and update tests

* docs: reduce RESULTS.md doc size

* feat: implement Phase CF-4 per-return-path summary decomposition with tests

* feat: update parameter handling in pass1 and refactor SsaFuncSummary initialization

* feat: implement Phase CF-5 for cross-file SCC joint fixed-point convergence with new flags and tests

* feat: implement Phase CF-6 with parameter-granularity points-to summaries and associated tests

* refactor: update comments and documentation for clarity and consistency

* style: format code for consistency and readability

* refactor: simplify verdict handling and improve edge checking logic

* refactor: optimize path and identifier collection by avoiding unnecessary cloning

* chore: update Cargo.toml for Rust version 1.85 and add ignored files; modify CHANGELOG and README for clarity on state analysis defaults

* refactor: update documentation and improve clarity in configuration files

* refactor: update documentation and improve clarity in configuration files

* feat: add JS/TS pass-2 convergence tests and expectations configuration

* feat: add Phase 5 regression tests for inline cache origin attribution and update related logic

* feat: implement Phase 7 deduplication and alternative path linking for taint findings

* feat: implement structural DFS index for anonymous functions and update naming conventions

* feat: add Phase 8 regression tests for container-element taint in JS and Python

* feat: add engine-depth profiles and explain-engine option for CLI

* feat: update expectations and add new README fixtures for multi-file scan regression

* feat: implement Phase 11 callback-alias and factory patterns with regression tests

* feat: implement Terminator::Switch for multi-way dispatch and add regression tests

* feat: add real-CVE benchmark fixtures for CVE-2023-48022, CVE-2019-14939, and CVE-2023-26159 with corresponding patched variants

* refactor: extract cfg and ssa_transfer to submodules

* refactor: cargo fmt

* refactor: remove unnecessary blank line in cfg_tests.rs

* refactor: remove unnecessary planning file

* chore: update Rust version to 1.88 and bump dependencies in Cargo files

* feat: enhance triage UI with new layout and controls, update README for clarity

* feat: enhance triage UI with new layout and controls, update README for clarity

* chore: remove outdated section from README for version 0.5.0

* docs: improve clarity and consistency in README content

* chore: add "GPL-3.0-or-later" to license options in about.toml

* chore: update license handling in about.toml and check-licenses.mjs

* style: format code for improved readability in TriagePage component

* style: format code for improved readability in TriagePage component

* chore: enhance license handling and improve body_id scoping in seed lookup

* feat: introduce owner and parent body IDs for enhanced seed scoping

* feat: implement direction-aware engine provenance with new CLI flag for strict CI gating

* feat: add Undef SSA operation for improved control-flow handling

* style: improve code formatting for consistency and readability in multiple files

* feat: add 16-function chain SCC across multiple files for enhanced analysis

* style: simplify code formatting for improved readability in multiple files

* fix: update CapHitReason default implementation and improve README clarity

* docs: enhance README with detailed explanations of taint analysis and limitations

* docs: refine README for clarity and consistency in taint analysis section

* style: improve code formatting for better readability in NewScanModal and scans

* fix: update cargo-about command to use --offline for deterministic license generation

* fix: update cargo-about command to use --offline for deterministic license generation

* ci: add step to prime cargo registry cache for deterministic license generation

* feat: add support for non-sink collections in authorization analysis

* feat: enhance authorization checks with row-level ownership equality and binding tracking

* feat: implement self-scoped user handling and enhance ownership checks

* refactor: simplify assertions and formatting in authorization analysis tests

* fix: normalize line endings in THIRDPARTY-LICENSES.html generation and update README with AI disclosure

* docs: update AI disclosure section for clarity and conciseness

* feat: add AI Contribution Policy and update contributing guidelines for AI assistance disclosure

* feat: enhance authorization analysis with SSA-derived variable type classification

* feat: implement auth_finding_to_diag function for enhanced security diagnostics

* feat: add args_value_refs to CallSite struct for enhanced argument tracking

* feat: add args_value_refs to CallSite struct for enhanced argument tracking

* feat: add direction-aware engine provenance with LossDirection classification and new CLI flag

* feat: simplify strip_cap_from_call_args call by removing unnecessary line breaks

* feat: enhance error message handling in cli_validation_tests for better Windows compatibility

* feat: optimize release profile settings in Cargo.toml and update CodeQL configuration

* feat: enhance release build process with SBOM generation and SLSA provenance

* feat: update actions/checkout and actions/setup-node to v6, enhance CLI options, and improve auth-check summaries

* feat: introduce PathFact handling for path safety checks and rejection logic

* feat: introduce PathFact handling for path safety checks and rejection logic

* feat: update benchmark data and enhance path sanitization logic with new safety checks

* feat: document AI assistance in frontend UI development and human review process

* feat: add return path facts for enhanced path safety checks and update documentation

* chore: update release date for version 0.5.0 in CHANGELOG.md

* chore: clean up ci.yml by removing outdated comments and clarifying steps

* feat: implement cross-language path sanitizers and validators for enhanced security

* feat: enhance SSA value usage tracking by including block terminators and improve path safety checks

* feat: enhance switch statement handling by adding per-case path constraints and support for exclusive cases

* refactor: simplify conditional formatting and improve code readability in executor and lower modules

* feat: add vulnerable examples for various languages demonstrating authentication and sanitization issues

* feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers

* feat: add transform classifiers for Java, Go, and Ruby with corresponding tests

* refactor: clarify comments on reassign-to-constant idiom and sink behavior in guards.rs

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Eli Peter 2026-04-25 17:59:11 -04:00 committed by GitHub
parent c4ce08b452
commit 41128177d2
GPG key ID: B5690EEEBB952194
2144 changed files with 201812 additions and 8927 deletions

6
.github/codeql/codeql-config.yml vendored Normal file

@ -0,0 +1,6 @@
name: "CodeQL Config"
paths-ignore:
- examples
- tests
- benches


@ -1,4 +1,5 @@
name: CI
permissions:
contents: read
@ -8,33 +9,232 @@ on:
pull_request:
branches: ["master"]
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
test:
frontend:
name: frontend
runs-on: ubuntu-latest
strategy:
matrix:
rust: [stable, beta]
steps:
- uses: actions/checkout@v4
- uses: actions-rs/toolchain@v1
- uses: actions/checkout@v6
- uses: actions/setup-node@v6
with:
toolchain: ${{ matrix.rust }}
components: clippy, rustfmt
- uses: Swatinem/rust-cache@v2
node-version: 20
cache: npm
cache-dependency-path: frontend/package-lock.json
- name: Install frontend dependencies
working-directory: frontend
run: npm ci
- name: Frontend license check
working-directory: frontend
run: npm run license:check
- name: Frontend format check
working-directory: frontend
run: npm run format:check
- name: Frontend lint
working-directory: frontend
run: npm run lint
- name: Frontend type check
working-directory: frontend
run: npm run typecheck
- name: Frontend tests
working-directory: frontend
run: npm test
rustfmt:
name: rustfmt
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
components: rustfmt
cache: true
- name: Format check
run: cargo fmt --all -- --check
clippy-stable:
name: clippy-stable
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
components: clippy
cache: true
- name: Lint (Clippy)
run: cargo clippy --all-targets --all-features -- -D warnings
- name: Build & Test
run: cargo test --all-features --verbose
cargo-deny:
name: cargo-deny
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Security audit
uses: actions-rs/audit-check@v1
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
token: ${{ secrets.GITHUB_TOKEN }}
toolchain: stable
cache: true
- uses: taiki-e/install-action@cargo-deny
- name: License & advisory checks
uses: EmbarkStudios/cargo-deny-action@v2
run: cargo deny check advisories licenses bans sources
third-party-licenses:
name: third-party-licenses
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@v2
with:
tool: cargo-about@0.7.1
- name: Prime cargo registry cache
run: cargo fetch --locked
- name: Regenerate license attribution
run: cargo about generate --offline about.hbs | tr -d '\r' > /tmp/THIRDPARTY-LICENSES.html
- name: Diff against committed file
run: diff -u --strip-trailing-cr THIRDPARTY-LICENSES.html /tmp/THIRDPARTY-LICENSES.html
docs-fresh:
name: docs-fresh
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- name: Regenerate rule reference
run: cargo run --features docgen --bin nyx-docgen
- name: Verify docs/rules.md is fresh
run: |
if ! git diff --exit-code docs/rules.md; then
echo "::error::docs/rules.md is stale. Run 'cargo run --features docgen --bin nyx-docgen' and commit the result."
exit 1
fi
rust-beta-build:
name: rust-beta-build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: beta
cache: true
- name: Beta compile compatibility check
run: cargo check --all-features --tests
rust-stable-test:
name: rust-stable-test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
- name: Rust tests (stable)
run: cargo nextest run --all-features
cross-platform-smoke:
name: cross-platform-smoke
strategy:
fail-fast: false
matrix:
os: [macos-latest, windows-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
- name: Build
run: cargo build --release --all-features
- name: Smoke tests
run: cargo nextest run --all-features --test integration_tests --test pattern_tests --test cli_validation_tests
rust-beta-test:
name: rust-beta-test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: beta
cache: true
- uses: taiki-e/install-action@nextest
- name: Rust tests (beta)
run: cargo nextest run --all-features
benchmark-gate:
name: benchmark-gate
runs-on: ubuntu-latest
timeout-minutes: 25
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
cache-key: benchmark-gate-release
- name: Accuracy regression gate (P/R/F1)
run: cargo test --release --all-features --test benchmark_test -- --ignored --nocapture benchmark_evaluation
- name: Performance regression gate
env:
NYX_CI_BENCH: "1"
run: cargo test --release --all-features --test perf_tests -- --nocapture
- name: Upload benchmark results
if: always()
uses: actions/upload-artifact@v7
with:
name: benchmark-results
path: tests/benchmark/results/latest.json
if-no-files-found: warn

45
.github/workflows/codeql.yml vendored Normal file

@ -0,0 +1,45 @@
name: "CodeQL Advanced"
on:
push:
branches: ["master"]
pull_request:
branches: ["master"]
schedule:
- cron: "28 20 * * 2"
jobs:
analyze:
name: Analyze (${{ matrix.language }})
runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
permissions:
security-events: write
packages: read
actions: read
contents: read
strategy:
fail-fast: false
matrix:
include:
- language: actions
build-mode: none
- language: javascript-typescript
build-mode: none
- language: rust
build-mode: none
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Initialize CodeQL
uses: github/codeql-action/init@v4
with:
languages: ${{ matrix.language }}
build-mode: ${{ matrix.build-mode }}
config-file: ./.github/codeql/codeql-config.yml
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v4
with:
category: "/language:${{ matrix.language }}"

50
.github/workflows/docs.yml vendored Normal file

@ -0,0 +1,50 @@
name: docs
on:
push:
branches: [master]
paths:
- "docs/**"
- "book.toml"
- ".github/workflows/docs.yml"
- "assets/screenshots/**"
workflow_dispatch:
permissions:
contents: read
pages: write
id-token: write
concurrency:
group: pages
cancel-in-progress: false
jobs:
build-deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Cache mdbook
id: cache-mdbook
uses: actions/cache@v4
with:
path: ~/.cargo/bin/mdbook
key: mdbook-0.5.2-${{ runner.os }}
- name: Install mdbook
if: steps.cache-mdbook.outputs.cache-hit != 'true'
run: cargo install mdbook --version 0.5.2 --locked
# mdbook follows the committed docs/assets symlink (→ ../assets) so
# image references in docs resolve both in `mdbook serve` and in CI.
- name: Build
run: mdbook build
- name: Upload artifact
uses: actions/upload-pages-artifact@v4
with:
path: book
- name: Deploy to GitHub Pages
uses: actions/deploy-pages@v4


@ -11,12 +11,14 @@ env:
BIN_NAME: nyx
jobs:
build-and-upload:
build:
strategy:
matrix:
include:
- target: x86_64-unknown-linux-gnu
os: ubuntu-latest
- target: aarch64-unknown-linux-gnu
os: ubuntu-latest
- target: x86_64-pc-windows-msvc
os: windows-latest
- target: x86_64-apple-darwin
@ -27,7 +29,7 @@ jobs:
steps:
- name: Check out sources
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Install Rust toolchain
uses: actions-rust-lang/setup-rust-toolchain@v1
@ -36,17 +38,25 @@ jobs:
target: ${{ matrix.target }}
cache: true
- name: Install cross-compilation tools (ARM Linux)
if: matrix.target == 'aarch64-unknown-linux-gnu'
run: |
sudo apt-get update
sudo apt-get install -y gcc-aarch64-linux-gnu
echo '[target.aarch64-unknown-linux-gnu]' >> ~/.cargo/config.toml
echo 'linker = "aarch64-linux-gnu-gcc"' >> ~/.cargo/config.toml
- name: Install target
run: rustup target add ${{ matrix.target }}
- name: Build
run: cargo build --release --bin ${{ env.BIN_NAME }} --target ${{ matrix.target }}
- name: Install cargo-about
run: cargo install cargo-about --locked
- name: Generate license bundle
run: cargo about generate about.hbs -o THIRDPARTY-LICENSES.html
# THIRDPARTY-LICENSES.html is committed at the repo root and kept in
# sync with the dependency graph by the `third-party-licenses` CI
# job. Release builds ship the committed copy directly — no
# regeneration (and no per-runner cargo-about install) on the
# release hot path.
- name: Package (Linux & macOS)
if: runner.os != 'Windows'
@ -81,9 +91,157 @@ jobs:
Add-Content -Path $env:GITHUB_ENV -Value "ASSET=$Archive"
- name: Upload to the release
uses: softprops/action-gh-release@v2
- name: Upload build artifact
uses: actions/upload-artifact@v7
with:
files: dist/${{ env.ASSET }}
name: release-${{ matrix.target }}
path: dist/${{ env.ASSET }}
if-no-files-found: error
retention-days: 1
reproducibility:
# Supply-chain smoke test: build the release binary twice with pinned
# SOURCE_DATE_EPOCH and path remapping, then diff the SHA256 hashes.
# Gates `publish` so non-reproducible builds cannot ship. Scoped to
# x86_64-linux — the most tractable target for byte-for-byte
# determinism; failures on other targets would be investigated
# separately.
name: reproducibility-check
runs-on: ubuntu-latest
steps:
- name: Check out sources
uses: actions/checkout@v6
- name: Install Rust toolchain
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
target: x86_64-unknown-linux-gnu
cache: true
- name: Build twice and diff hashes
shell: bash
env:
RUSTFLAGS: "--remap-path-prefix=${{ github.workspace }}=/build"
run: |
set -euo pipefail
TARGET=x86_64-unknown-linux-gnu
BIN=${{ env.BIN_NAME }}
BIN_PATH="target/$TARGET/release/$BIN"
SOURCE_DATE_EPOCH=$(git log -1 --format=%ct HEAD)
export SOURCE_DATE_EPOCH
echo "SOURCE_DATE_EPOCH=$SOURCE_DATE_EPOCH"
cargo build --release --bin "$BIN" --target "$TARGET"
HASH1=$(sha256sum "$BIN_PATH" | awk '{print $1}')
echo "first build: $HASH1"
cargo clean --release --target "$TARGET"
cargo build --release --bin "$BIN" --target "$TARGET"
HASH2=$(sha256sum "$BIN_PATH" | awk '{print $1}')
echo "second build: $HASH2"
if [ "$HASH1" != "$HASH2" ]; then
echo "::error::Reproducibility check failed: builds are not bit-identical"
echo " first: $HASH1"
echo " second: $HASH2"
exit 1
fi
echo "::notice::Reproducible build verified (sha256=$HASH1)"
publish:
# Collect all matrix build outputs, generate a single SHA256SUMS file,
# then push everything to the GitHub release in one shot. Doing this
# centrally (rather than per-matrix job) is the only way to produce a
# checksum file that covers every published artifact.
name: publish-release
runs-on: ubuntu-latest
needs: [build, reproducibility]
permissions:
contents: write
id-token: write
attestations: write
env:
GPG_PRIVATE_KEY: ${{ secrets.GPG_PRIVATE_KEY }}
GPG_PASSPHRASE: ${{ secrets.GPG_PASSPHRASE }}
steps:
- name: Check out sources
uses: actions/checkout@v6
# Generate the SBOM from the source tree BEFORE downloading
# artifacts. Syft scans `path: .` recursively; if release-artifacts/
# exists at scan time, it would walk into the zipped binaries and
# produce a polluted manifest.
- name: Generate CycloneDX SBOM
uses: anchore/sbom-action@v0
with:
path: .
format: cyclonedx-json
output-file: nyx-${{ github.event.release.tag_name }}.cdx.json
upload-artifact: false
upload-release-assets: false
- name: Download all build artifacts
uses: actions/download-artifact@v8
with:
path: release-artifacts
pattern: release-*
merge-multiple: true
- name: Generate SHA256SUMS
run: |
set -euo pipefail
cd release-artifacts
ls -lh
sha256sum *.zip > SHA256SUMS
cat SHA256SUMS
- name: Import GPG signing key
if: env.GPG_PRIVATE_KEY != ''
run: |
set -euo pipefail
printf '%s' "$GPG_PRIVATE_KEY" | gpg --batch --import
gpg --list-secret-keys --keyid-format=long
- name: Sign SHA256SUMS
if: env.GPG_PRIVATE_KEY != ''
run: |
set -euo pipefail
cd release-artifacts
if [ -n "${GPG_PASSPHRASE:-}" ]; then
printf '%s' "$GPG_PASSPHRASE" \
| gpg --batch --yes --pinentry-mode loopback \
--passphrase-fd 0 --armor --detach-sign SHA256SUMS
else
gpg --batch --yes --armor --detach-sign SHA256SUMS
fi
ls -l SHA256SUMS.asc
- name: Warn if GPG signing was skipped
if: env.GPG_PRIVATE_KEY == ''
run: |
echo "::warning::GPG_PRIVATE_KEY secret not configured; SHA256SUMS will ship unsigned. Add GPG_PRIVATE_KEY (ASCII-armored) and optional GPG_PASSPHRASE to repository secrets to enable signed checksums."
# SLSA v1 build provenance: signed attestation that these exact
# bytes were produced by this workflow run from this commit.
# Attestations are stored in the GitHub attestations API and can
# be verified with `gh attestation verify <file> --repo <repo>`.
- name: Generate SLSA build provenance
uses: actions/attest-build-provenance@v4
with:
subject-path: |
release-artifacts/*.zip
release-artifacts/SHA256SUMS
nyx-${{ github.event.release.tag_name }}.cdx.json
- name: Upload to the release
uses: softprops/action-gh-release@v3
with:
files: |
release-artifacts/*.zip
release-artifacts/SHA256SUMS
release-artifacts/SHA256SUMS.asc
nyx-${{ github.event.release.tag_name }}.cdx.json
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

5
.gitignore vendored

@ -1,2 +1,7 @@
/target
/.idea
/frontend/node_modules
/src/server/assets/dist
/.nyx
/book
.DS_Store

5
.nyx/triage.json Normal file

@ -0,0 +1,5 @@
{
"version": 1,
"decisions": [],
"suppression_rules": []
}

36
AI-POLICY.md Normal file

@ -0,0 +1,36 @@
# AI Contribution Policy
Nyx accepts contributions that were drafted, refactored, or reviewed with the help of AI tools (LLMs, code assistants, agent systems). We care about the contribution, not the keystrokes. AI changes the failure modes though, so we ask contributors to follow a few rules.
## What we ask of contributors
By opening a pull request you affirm that:
1. **You have read and understood every line you are submitting.** If you cannot explain a change under review, it is not ready to merge. "The model wrote it" is not an answer we will accept for a bug or a regression.
2. **You have the right to submit the code.** AI-generated code is only as license-clean as its training data and its prompt. Do not paste proprietary, GPL-incompatible, or confidential code into an AI tool and then submit the output here. If a model reproduced a substantial verbatim snippet from an identifiable source, disclose it.
3. **You take responsibility for the change.** The DCO `Signed-off-by:` trailer applies the same way to AI-assisted code as it does to hand-written code. You are certifying origin and right-to-submit.
4. **You disclose material AI use in the PR description.** A one-line note is enough. For example, "Drafted with an AI assistant; reviewed and tested by me." Trivial uses like tab-completion, renames, or formatting do not need to be called out. New analysis passes, rule logic, or security-relevant code do.
## What we look for in review
AI-assisted PRs face the same bar as any other PR, but reviewers will pay extra attention to:
- **Tests that exercise the new behavior.** Not just "it compiles." Fixtures under `tests/fixtures/` and assertions in `expected.yaml` are how we verify security logic.
- **Consistency with the existing engine.** Drive-by refactors, speculative abstractions, or parallel implementations of existing passes will usually be rejected, even if they look clean in isolation.
- **Fabricated references.** AI tools sometimes invent function names, crate APIs, CVE IDs, or citations. Every symbol referenced in a PR must exist, and every external claim must be verifiable.
- **Rule metadata honesty.** Rule descriptions, CWE mappings, and severity ratings are part of how downstream users triage. Do not inflate severity or cite CWEs the rule does not actually detect.
## What we will not accept
- PRs that are clearly unreviewed agent output, such as changes in the wrong file, nonsense tests, hallucinated APIs, or code that does not compile.
- PRs that add "AI-generated" boilerplate, marketing copy, or filler documentation to pad scope.
- Mass-generated PRs across many unrelated areas in a single change.
- Code that was generated by pasting another project's proprietary source into an AI tool.
## Project's own use of AI
For transparency, the README includes an [AI Disclosure](README.md#ai-disclosure) describing where AI was used in Nyx itself. The short version: the analysis engine is predominantly human-written and human-reviewed, while documentation, fixtures, and rule metadata were drafted with AI assistance and audited before landing. We hold outside contributions to the same standard.
## Questions
If you are unsure whether a contribution falls inside this policy, open a draft PR or an issue and ask before investing time. We would rather have the conversation early than reject work at review.


@ -1,320 +1,217 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
All notable changes to Nyx are documented here. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html). For where Nyx is going, see the [Roadmap](ROADMAP.md).
## [Unreleased]
## [0.4.0] - 2025-02-25
_No changes yet._
### Added
- **Low-noise prioritization system** — post-analysis pipeline that reduces noise from high-frequency LOW/Quality findings without hiding security signal. Three-stage process: category filtering, rollup grouping, and LOW budgets.
- **`FindingCategory` enum** (`Security`, `Reliability`, `Quality`) — every `Diag` now carries a `category` field. AST pattern findings derive their category from `PatternCategory` metadata (`CodeQuality` → `Quality`, all others → `Security`). Taint, CFG, and state findings are always `Security`.
- **Category filtering** — Quality-category findings (e.g. `rs.quality.unwrap`, `rs.quality.expect`) are excluded by default. Use `--include-quality` to include them.
- **Rollup grouping** — eligible HIGH-frequency rules (`rs.quality.unwrap`, `rs.quality.expect`, `rs.quality.panic_macro`) are grouped by `(file, rule)` into a single rollup finding with occurrence count and example locations. Canonical location is the first sorted occurrence. Example count controlled by `--rollup-examples` (default 5).
- **LOW budgets** — three configurable limits enforce noise caps: `--max-low` (default 20, total), `--max-low-per-file` (default 1), `--max-low-per-rule` (default 10). Rollups count as one finding for all budgets. High/Medium findings are never dropped.
- **`--all` CLI flag** — disables all prioritization (no category filtering, no rollups, no budgets).
- **`--show-instances <RULE>`** — bypasses rollup for a specific rule, expanding all individual occurrences.
- **Console suppression footer** — when findings are suppressed, a footer displays the count and active filter values with adjustment hints.
- **`rollup` field on `Diag`** — optional `RollupData` with `count` and `occurrences` (example `Location`s). Serializes to JSON automatically; omitted when not a rollup.
- **SARIF rollup support** -- `category` in result properties, rollup count in `properties.rollup.count`, example locations in `relatedLocations`.
- **`max_results` severity stability** — when `max_results` truncation is needed, High findings are kept first, then Medium, then Low. Low findings never displace higher-severity ones.
- New config fields in `[output]`: `include_quality`, `show_all`, `max_low`, `max_low_per_file`, `max_low_per_rule`, `rollup_examples`.
- 14 new unit tests covering category filtering, rollup grouping/examples/canonical, LOW budgets (per-file/per-rule/total), High/Medium immunity, rollup-counts-as-one, show_instances bypass, JSON serialization, and determinism.
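The LOW budget stage described in the entries above can be pictured as a single pass over already-sorted findings. The following is a minimal sketch under stated assumptions, not Nyx's implementation: `Finding`, `LowBudgets`, and `apply_low_budgets` are hypothetical names, rollups are not modeled, and only the documented defaults (20 total, 1 per file, 10 per rule) come from the changelog.

```rust
use std::collections::HashMap;

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Severity {
    High,
    Medium,
    Low,
}

// Hypothetical, simplified finding record (not Nyx's `Diag`).
#[derive(Clone, Debug)]
struct Finding {
    file: String,
    rule: String,
    severity: Severity,
}

// Hypothetical budget settings mirroring the documented CLI defaults.
struct LowBudgets {
    max_low: usize,
    max_low_per_file: usize,
    max_low_per_rule: usize,
}

impl Default for LowBudgets {
    fn default() -> Self {
        LowBudgets { max_low: 20, max_low_per_file: 1, max_low_per_rule: 10 }
    }
}

// Keeps every High/Medium finding and drops Low findings once any of the
// three budgets (total, per file, per rule) is exhausted.
fn apply_low_budgets(findings: Vec<Finding>, budgets: &LowBudgets) -> Vec<Finding> {
    let mut total = 0usize;
    let mut per_file: HashMap<String, usize> = HashMap::new();
    let mut per_rule: HashMap<String, usize> = HashMap::new();
    let mut kept = Vec::new();

    for f in findings {
        if f.severity != Severity::Low {
            kept.push(f); // High/Medium findings are never dropped
            continue;
        }
        let file_count = per_file.get(&f.file).copied().unwrap_or(0);
        let rule_count = per_rule.get(&f.rule).copied().unwrap_or(0);
        if total >= budgets.max_low
            || file_count >= budgets.max_low_per_file
            || rule_count >= budgets.max_low_per_rule
        {
            continue; // over budget: suppress this Low finding
        }
        total += 1;
        *per_file.entry(f.file.clone()).or_insert(0) += 1;
        *per_rule.entry(f.rule.clone()).or_insert(0) += 1;
        kept.push(f);
    }
    kept
}
```

Because the pass is order-sensitive, a sketch like this only gives deterministic output if the input is sorted first, which matches the changelog's emphasis on determinism.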
- **Pattern-level confidence for AST rules** — each AST pattern in `src/patterns/` now carries an explicit `confidence: Confidence` field (High, Medium, or Low). Confidence is set at the pattern definition site and flows directly into emitted `Diag`s, replacing the old heuristic that inferred AST confidence from severity alone. `compute_confidence()` is retained as a fallback for detectors that don't set confidence (taint, state, legacy).
- Tier A patterns with High/Medium severity → `Confidence::High` (deterministic structural match).
- Tier A patterns with Low severity → `Confidence::Medium` (quality/crypto signals).
- Tier B patterns (heuristic-guarded) → `Confidence::Medium`.
- Example: `rs.quality.expect` now produces `Confidence: High` regardless of its Low severity.
- **Inline per-finding suppressions** — suppress specific findings directly in source code using `nyx:ignore` comments. Two directive forms: `nyx:ignore <RULE_ID>` (same line) and `nyx:ignore-next-line <RULE_ID>` (next line). Supports comma-separated IDs, wildcard suffixes (`rs.quality.*`), and automatic canonicalization of taint rule IDs (parenthetical suffixes stripped). Comment detection covers all 10 languages with string/raw-string/template-literal guards to avoid false positives.
- **`--show-suppressed` CLI flag** — reveal suppressed findings in output, dimmed with `[SUPPRESSED]` tag. Summary shows `"N issues (M suppressed)"`. In JSON/SARIF mode, suppressed findings include `"suppressed": true` and `"suppression": {...}` metadata fields.
- **`suppressed` and `suppression` fields on `Diag`** — conditionally serialized; JSON output is unchanged when no suppressions are active.
- Suppressed findings are excluded from `--fail-on` exit-code checks and severity counts.
- New module `src/suppress/mod.rs` with 22 unit tests covering all comment styles, string guards, wildcard matching, canonicalization, CRLF, and edge cases.
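To make the two directive forms concrete, here is a rough sketch of how a comment's text might be parsed and matched against rule IDs. It assumes the comment text has already been extracted (the comment detection and string guards mentioned above are not modeled), and `Directive`, `parse_directive`, `canonicalize`, and `matches_rule` are illustrative names rather than the API of `src/suppress/mod.rs`.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Directive {
    SameLine(Vec<String>),
    NextLine(Vec<String>),
}

// Parses `nyx:ignore <IDs>` or `nyx:ignore-next-line <IDs>` out of comment text.
// IDs are comma-separated; only the first token of each part is kept.
fn parse_directive(comment: &str) -> Option<Directive> {
    let marker = "nyx:ignore";
    let idx = comment.find(marker)?;
    let rest = &comment[idx + marker.len()..];
    let (next_line, rest) = match rest.strip_prefix("-next-line") {
        Some(r) => (true, r),
        None => (false, rest),
    };
    // Require whitespace after the keyword so `nyx:ignored` is not a directive.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let ids: Vec<String> = rest
        .split(',')
        .filter_map(|part| part.split_whitespace().next())
        .map(str::to_string)
        .collect();
    if ids.is_empty() {
        return None;
    }
    Some(if next_line { Directive::NextLine(ids) } else { Directive::SameLine(ids) })
}

// Strips a parenthetical suffix such as " (source 5:1)" from a rule ID.
fn canonicalize(rule_id: &str) -> &str {
    match rule_id.find('(') {
        Some(i) => rule_id[..i].trim_end(),
        None => rule_id.trim_end(),
    }
}

// Matches a directive ID against a finding's rule ID; a trailing `*` is a prefix wildcard.
fn matches_rule(pattern: &str, rule_id: &str) -> bool {
    let id = canonicalize(rule_id);
    match pattern.strip_suffix('*') {
        Some(prefix) => id.starts_with(prefix),
        None => pattern == id,
    }
}
```

A caller would apply `SameLine` directives to findings on the comment's own line and `NextLine` directives to the line after it.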
- **`--min-score <N>` CLI flag and `output.min_score` config option** — filter out findings whose attack-surface rank score falls below the given threshold. Applied after ranking and severity filtering, before `max_results` truncation. Has no effect when `--no-rank` is used. CLI value overrides config.
- **Attack surface ranking** — deterministic post-analysis scoring layer that prioritizes findings by exploitability. Each `Diag` receives an `f64` score computed from five components: severity base (High=60, Medium=30, Low=10), analysis kind bonus (taint +10 > state +8 > cfg +3/5 > ast 0), evidence strength (+1 per item, +26 for source-kind priority), state rule type bonus (+16), and a path-validation penalty (5 for guarded paths). Findings are sorted by descending score before truncation so `max_results` keeps the most important results. Tie-breaking is deterministic by severity, rule ID, file path, line, column, and message hash.
- **`rank_score` and `rank_reason` fields on `Diag`** — optional fields with `#[serde(skip_serializing_if = "Option::is_none")]`; JSON output is unchanged when ranking is disabled.
- **`--no-rank` CLI flag** — disables attack-surface ranking (enabled by default).
- **`output.attack_surface_ranking` config key** — boolean (default `true`) to control ranking via config file.
- **Console score display** — dim `Score: N` appended to each finding's header line when ranking is enabled.
- **New module `src/rank.rs`** -- `compute_attack_rank()`, `rank_diags()`, and `sort_key()` functions. Scoring uses only in-memory data; no extra file I/O or graph recomputation.
- 10 new unit tests: ordering correctness (high taint > medium file-io, must-leak > may-leak, taint > cfg-only, state rules, AST lowest at same severity), determinism (input-order-independent), path-validation penalty, and JSON serialization (rank fields omitted when None, present when set).
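The ranking entries above describe an additive score plus a deterministic sort. The sketch below illustrates only two of the five components (severity base and analysis-kind bonus, using the values quoted in the changelog for taint, state, and AST); the evidence, state-rule, and path-validation terms are left out, and the `Finding`, `score`, and `rank` names are assumptions for illustration, not the functions in `src/rank.rs`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Taint,
    State,
    Ast,
}

// Hypothetical, simplified finding record.
#[derive(Clone, Debug, PartialEq)]
struct Finding {
    rule: String,
    path: String,
    line: u32,
    severity: Severity,
    kind: Kind,
}

// Partial score: severity base plus analysis-kind bonus only.
fn score(f: &Finding) -> f64 {
    let base = match f.severity {
        Severity::High => 60.0,
        Severity::Medium => 30.0,
        Severity::Low => 10.0,
    };
    let bonus = match f.kind {
        Kind::Taint => 10.0,
        Kind::State => 8.0,
        Kind::Ast => 0.0,
    };
    base + bonus
}

// Sorts by descending score, then breaks ties on stable fields so the
// result does not depend on the input order.
fn rank(findings: &mut [Finding]) {
    findings.sort_by(|a, b| {
        score(b)
            .total_cmp(&score(a))
            .then_with(|| b.severity.cmp(&a.severity))
            .then_with(|| a.rule.cmp(&b.rule))
            .then_with(|| a.path.cmp(&b.path))
            .then_with(|| a.line.cmp(&b.line))
    });
}
```

Sorting before truncation is what lets a `max_results` cut keep the highest-scoring findings; the tie-break chain here is shorter than the one the changelog lists (it omits column and message hash).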
- **State-model dataflow analysis** — new `src/state/` module implementing a forward worklist dataflow engine over the existing CFG. Tracks per-variable resource lifecycle (`UNINIT`, `OPEN`, `CLOSED`, `MOVED`) via bitset lattice and per-path authentication level (`Unauthed`, `Authed`, `Admin`) as a composable product domain. Detects:
- **Use-after-close** (`state-use-after-close`, High) — variable read/written after its resource handle was closed.
- **Double-close** (`state-double-close`, Medium) — resource handle closed more than once.
- **Must-leak** (`state-resource-leak`, High) — resource acquired but never closed on any exit path.
- **May-leak** (`state-resource-leak-possible`, Medium) — resource open on some but not all exit paths (branch-aware via lattice join).
- **Unauthenticated access** (`state-unauthed-access`, High) — sensitive sink reached without a preceding auth/admin check.
- **State analysis architecture** — six-module design:
- `lattice.rs` -- `Lattice` trait (`bot`, `join`, `leq`) for generic fixed-point computation.
- `domain.rs` -- `ResourceLifecycle` (bitflag), `ResourceDomainState`, `AuthLevel`, `AuthDomainState`, `ProductState` with lattice impls.
- `symbol.rs` -- `SymbolInterner` that builds a string-interning table from CFG node defines/uses; `SymbolId` newtype.
- `transfer.rs` -- `DefaultTransfer` function: maps CFG node kinds (Call, Assignment, If, Return) to state transitions using the existing `ResourcePair` definitions from `cfg_analysis::rules`. Emits `TransferEvent` for illegal transitions.
- `engine.rs` — two-phase forward worklist solver: Phase 1 iterates to a fixed point (no events collected to avoid spurious reports from intermediate states); Phase 2 re-applies transfer once over converged states to collect events. Bounded by `MAX_TRACKED_VARS` (64) with guarded degradation.
- `facts.rs` — post-analysis pass: extracts `StateFinding`s from transfer events (use-after-close, double-close) and exit-node state inspection (must-leak, may-leak, unauthed access).
- **`scanner.enable_state_analysis` config option** — opt-in boolean (default `false`) in `ScannerConfig` and `default-nyx.conf`. Requires CFG mode (`full` or `taint`).
- **`Diag.message` field** — optional human-readable message on diagnostic output. State findings carry variable-specific context (e.g. "variable `f` used after close"). Surfaced in console output (dimmed line below the finding), JSON, and SARIF (`message.text` prefers per-finding message over generic rule description).
- **State finding dedup** — when state analysis produces findings on a line, overlapping `cfg-resource-leak` and `cfg-auth-gap` findings on the same line are suppressed (state analysis is more precise).
- **SARIF rule descriptions** for all five state rule IDs.
- 21 integration tests (`tests/state_tests.rs`) with 19 C fixture files covering: use-after-close, double-close, resource leak, clean usage, opt-in gating, may-leak vs must-leak branch semantics, early return, nested branches, both-branches-close, loop convergence, loop use-after-close, handle overwrite, reopen-after-close, multiple handles, conservative join masking, chain operations, malloc/free pairs, straight-line double-close, and message field population.
- 30+ unit tests across state modules: lattice properties, lifecycle join/leq, domain merging, auth-level join, product state composition, may/must leak semantics, symbol interning, and transfer event generation.
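The must-leak versus may-leak distinction falls out of the lattice join at branch merges. As a minimal sketch, assume each bit of the lifecycle set means "the variable may be in this state on some path"; that reading of "bitset lattice" is an assumption, and `Lifecycle`, `Leak`, and `classify_at_exit` are hypothetical names. Only the `Lattice` trait's method names (`bot`, `join`, `leq`) come from the changelog.

```rust
trait Lattice: Sized {
    fn bot() -> Self;
    fn join(&self, other: &Self) -> Self;
    fn leq(&self, other: &Self) -> bool;
}

// Set of lifecycle states a variable may be in, one bit per state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Lifecycle(u8);

#[allow(dead_code)]
impl Lifecycle {
    const UNINIT: Lifecycle = Lifecycle(0b0001);
    const OPEN: Lifecycle = Lifecycle(0b0010);
    const CLOSED: Lifecycle = Lifecycle(0b0100);
    const MOVED: Lifecycle = Lifecycle(0b1000);
}

impl Lattice for Lifecycle {
    // Bottom is the empty set: no path has reached this point yet.
    fn bot() -> Self {
        Lifecycle(0)
    }
    // Join is set union: a state is possible if it is possible on either path.
    fn join(&self, other: &Self) -> Self {
        Lifecycle(self.0 | other.0)
    }
    // Ordering is set inclusion.
    fn leq(&self, other: &Self) -> bool {
        (self.0 & !other.0) == 0
    }
}

#[derive(Debug, PartialEq, Eq)]
enum Leak {
    NoLeak,
    May,
    Must,
}

// Inspects the converged state at a function exit.
fn classify_at_exit(state: Lifecycle) -> Leak {
    let maybe_open = (state.0 & Lifecycle::OPEN.0) != 0;
    if !maybe_open {
        Leak::NoLeak
    } else if state == Lifecycle::OPEN {
        Leak::Must // open on every path reaching the exit
    } else {
        Leak::May // open on some paths, something else on others
    }
}
```

For instance, if one branch closes a handle and the other does not, the join at the merge point is `OPEN | CLOSED`, which this sketch classifies as a may-leak.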
- **`--severity <EXPR>` filter** — replaces `--high-only` with a flexible severity expression supporting single levels (`HIGH`), comma lists (`HIGH,MEDIUM`), and thresholds (`>=MEDIUM`). Parsing is case-insensitive with whitespace tolerance. `SeverityFilter` type with `parse()` and `matches()` in `patterns/mod.rs`.
- **`--mode <full|ast|cfg|taint>`** — replaces `--ast-only` and `--cfg-only` with a single canonical analysis mode flag. Enforces mutual exclusivity via clap `ValueEnum`.
- **`--index <auto|off|rebuild>`** — replaces `--no-index` and `--rebuild-index` with a single flag (default `auto`).
- **`--fail-on <SEVERITY>`** — CI ergonomics: exit code 1 if any emitted finding meets or exceeds the threshold severity. Example: `--fail-on HIGH`.
- **`--quiet`** — CLI flag to suppress all human-readable status output (equivalent to `output.quiet = true` in config).
- **`--keep-nonprod-severity`** — renamed from `--include-nonprod` for clarity; old name kept as hidden alias.
- **`OutputFormat` enum** — `--format` now uses clap `ValueEnum` with typed `Console`, `Json`, `Sarif` variants (default `Console`). No more empty-string default.
- 10 new unit tests: `SeverityFilter` parsing (single, comma list, threshold, case-insensitive, whitespace, empty rejection, invalid level rejection), `Severity::from_str` rejection of unknown values, and `severity_filter_applied_at_output_stage` integration test verifying that downgraded findings are correctly filtered.
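The three expression shapes accepted by `--severity` (single level, comma list, `>=` threshold) can be sketched as a small parser. The `SeverityFilter` name and its `parse()`/`matches()` methods are taken from the entry above, but the internal representation (`AtLeast`/`AnyOf`), the `parse_level` helper, and the error type are assumptions made for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
}

// Hypothetical helper: case-insensitive, whitespace-tolerant level parsing.
fn parse_level(s: &str) -> Result<Severity, String> {
    match s.trim().to_ascii_uppercase().as_str() {
        "LOW" => Ok(Severity::Low),
        "MEDIUM" => Ok(Severity::Medium),
        "HIGH" => Ok(Severity::High),
        other => Err(format!("unknown severity: {other:?}")),
    }
}

#[derive(Debug, PartialEq, Eq)]
enum SeverityFilter {
    AtLeast(Severity),
    AnyOf(Vec<Severity>),
}

impl SeverityFilter {
    // Accepts `HIGH`, `HIGH,MEDIUM`, or `>=MEDIUM`; rejects empty input and
    // unknown levels.
    fn parse(expr: &str) -> Result<Self, String> {
        let expr = expr.trim();
        if expr.is_empty() {
            return Err("empty severity expression".to_string());
        }
        if let Some(rest) = expr.strip_prefix(">=") {
            return parse_level(rest).map(SeverityFilter::AtLeast);
        }
        expr.split(',')
            .map(parse_level)
            .collect::<Result<Vec<_>, _>>()
            .map(SeverityFilter::AnyOf)
    }

    fn matches(&self, severity: Severity) -> bool {
        match self {
            SeverityFilter::AtLeast(min) => severity >= *min,
            SeverityFilter::AnyOf(levels) => levels.contains(&severity),
        }
    }
}
```

Deriving `Ord` on `Severity` with variants declared from lowest to highest is what makes the threshold comparison a one-liner in this sketch.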
- **AST pattern overhaul** -- all 10 language pattern files (`src/patterns/*.rs`) rewritten with consistent conventions, structured metadata, and validated tree-sitter queries.
- **Pattern schema extensions** -- `PatternTier` (A = structural, B = heuristic-guarded), `PatternCategory` (13 vulnerability classes), and `Hash` on `Severity`. Module-level docs explain conventions and how to add new patterns.
- **Namespaced IDs** -- all pattern IDs follow `<lang>.<category>.<specific>` format (e.g. `java.deser.readobject`, `py.cmdi.os_system`, `js.xss.document_write`).
- **New vulnerability coverage** -- 30+ new patterns across languages: Python deserialization (`pickle.loads`, `yaml.load`, `shelve.open`), Python command injection (`os.system`, `os.popen`), Python weak crypto (`hashlib.md5/sha1`), Java reflection (`Method.invoke`), Java weak digest (`MessageDigest.getInstance("MD5")`), Java XSS (`getWriter().println`), Go TLS misconfiguration (`InsecureSkipVerify: true`), Go SQL concat, Go hardcoded secrets, Go gob deserialization, PHP `assert()` code exec, PHP `include $var` path traversal, PHP weak crypto (`md5`/`sha1`/`rand`), C/C++ `popen()`, C/C++ format-string with variable first arg, C++ `const_cast`, Ruby `Digest::MD5`.
- **Query fixes** -- fixed 11 broken tree-sitter queries: Java `object_creation_expression` used wrong type node (`identifier` → `type_identifier`), C++ `reinterpret_cast`/`const_cast` used non-existent node types (→ `template_function` match), Ruby backtick used `shell_command` (→ `subshell`), Python SQL used `binary_expression` (→ `binary_operator`), TypeScript `as any` used inaccessible field (→ positional child), PHP patterns missing `argument` wrapper nodes, Rust `unsafe fn` regex used unsupported `\b`.
- **No-duplicate rule** -- patterns that overlap with taint sinks use distinct ID namespaces and are documented; dedup in `ast.rs` prevents duplicate findings at the same location.
- **Severity recalibration** -- `unwrap`/`expect`/`panic!`/`todo!` moved to Low (filtered by default `min_severity`). Security patterns remain High/Medium.
- **Pattern test suite** (`tests/pattern_tests.rs`, 26 tests) -- sanity checks (unique IDs, query compilation, non-empty descriptions, naming convention, severity distribution), positive fixture tests (10 languages), and negative fixture tests (10 languages verifying no false positives on safe code).
- **Pattern test fixtures** -- positive and negative fixture files for all 10 languages under `tests/fixtures/patterns/<lang>/`.
- **Real world test suite** — comprehensive fixture-based test suite (`tests/real_world_tests.rs`) with ~180 test fixtures across all 10 supported languages (C, C++, Go, Java, JavaScript, PHP, Python, Ruby, Rust, TypeScript). Each fixture has an `.expect.json` file declaring expected findings (with `must_match` for hard requirements and soft expectations for aspirational coverage). Fixtures are organized by analysis type (`taint/`, `state/`, `cfg/`, `mixed/`) under `tests/fixtures/real_world/<lang>/`. A single parameterized test runner validates all fixtures in both `full` and `ast` modes, with verbose output via `NYX_TEST_VERBOSE=1`.
## [0.5.0] — 2026-04-24
The biggest release since launch. The taint engine was rebuilt on top of an SSA IR, cross-file analysis was deepened across the board, and Nyx now ships a local web UI for triaging findings without leaving your machine.
### Changed
- **Console header line now includes confidence** — the finding header shows score and confidence together as a parenthesized suffix: `(Score: 36, Confidence: Medium)`. The previous standalone `Confidence: ...` body line is removed. All four combinations are handled (both, score-only, confidence-only, neither).
- **Confidence display uses Title Case** -- `Confidence::Display` now renders as `Low`, `Medium`, `High` (previously lowercase).
- **Breaking**: Config and data directory changed from `dev.ecpeter23.nyx` to `nyx` (e.g. `~/Library/Application Support/nyx/` on macOS). Existing config files (`nyx.conf`, `nyx.local`) and SQLite indexes at the old path will not be picked up automatically — copy them to the new location or re-run `nyx scan` to regenerate.
- **Improved diagnostic output formatting** — overhauled console renderer for a professional, security-tool-grade look:
  - Severity is now the strongest visual anchor: HIGH (bold red with ✖), MEDIUM (bold orange ⚠), LOW (muted blue-gray ●). Fewer colors, clearer hierarchy.
  - File paths rendered dim blue (never brighter than severity).
  - Taint flow messages now use `→` arrow between shortened source/sink instead of backtick-wrapped text.
  - Evidence values (Source, Sink) no longer wrapped in backticks -- cleaner rendering with no risk of broken backtick spans across wrapped lines.
- **Fixed taint expression rendering** — multi-line sink/source call chains are now normalised before display:
  - Whitespace collapsed (`foo() .bar()` → `foo().bar()`).
  - Newlines joined into single-line canonical form.
  - Spacing artefacts between `)` and `.` in method chains cleaned up.
  - Long chains truncated with `…` ellipsis.
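
As a rough illustration of the normalisation steps listed above, the sketch below joins lines, collapses whitespace, removes the gap between `)` and `.`, and truncates long chains. The function name `normalize_call_chain` and the `max_chars` parameter are hypothetical; the actual renderer may behave differently.

```rust
/// Hypothetical helper: render a multi-line call chain on a single line.
pub fn normalize_call_chain(raw: &str, max_chars: usize) -> String {
    // Splitting on whitespace joins lines and collapses runs of spaces.
    let collapsed = raw.split_whitespace().collect::<Vec<_>>().join(" ");
    // Remove the space that the join leaves between `)` and `.`.
    let joined = collapsed.replace(") .", ").");
    if joined.chars().count() <= max_chars {
        return joined;
    }
    // Keep one character of room for the ellipsis.
    let mut out: String = joined.chars().take(max_chars.saturating_sub(1)).collect();
    out.push('…');
    out
}
```
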
- Added `terminal_size` dependency for terminal-width-aware line wrapping.
- **Monotone forward dataflow taint analysis** — replaced the BFS taint engine in `taint/mod.rs` with a proper worklist-based forward dataflow analysis where termination is guaranteed by lattice finiteness. The generic `Transfer<S: Lattice>` trait in `state/engine.rs` now powers both the resource lifecycle/auth analysis and taint analysis.
- **`TaintState` lattice** (`taint/domain.rs`) — bounded abstract state with per-variable `VarTaint` (Cap bitflags + multi-origin tracking via `SmallVec<[TaintOrigin; 2]>`), dual validation bitsets (`validated_must` for intersection/all-paths, `validated_may` for union/any-path), and monotone `PredicateSummary` for contradiction pruning. Variables stored in sorted `SmallVec` keyed by `SymbolId` for O(n) merge-join. Lattice height bounded at ~8700 (7-bit Cap × 64 vars + validation bits + predicate bits).
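
The join behind the `TaintState` entry can be pictured with a simplified model. This is a sketch under stated assumptions: `SketchState` and `VarEntry` are invented names, plain `Vec` stands in for `SmallVec`, and capability bits are a bare `u8`. It shows the linear merge-join over sorted variable lists, with intersection for must-validation and union for may-validation.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a per-variable entry: (symbol id, capability bits).
type VarEntry = (u32, u8);

#[derive(Debug, Clone, PartialEq)]
pub struct SketchState {
    /// Sorted by symbol id, no duplicates.
    pub vars: Vec<VarEntry>,
    pub validated_must: u64,
    pub validated_may: u64,
}

impl SketchState {
    /// Least upper bound of two states at a CFG merge point.
    pub fn join(&self, other: &Self) -> Self {
        let (a, b) = (&self.vars, &other.vars);
        let (mut i, mut j) = (0, 0);
        let mut vars = Vec::with_capacity(a.len() + b.len());
        // Walk both sorted lists once: O(n) in the combined length.
        while i < a.len() && j < b.len() {
            match a[i].0.cmp(&b[j].0) {
                Ordering::Less => { vars.push(a[i]); i += 1; }
                Ordering::Greater => { vars.push(b[j]); j += 1; }
                Ordering::Equal => {
                    // Same variable on both paths: union the capability bits.
                    vars.push((a[i].0, a[i].1 | b[j].1));
                    i += 1;
                    j += 1;
                }
            }
        }
        vars.extend_from_slice(&a[i..]);
        vars.extend_from_slice(&b[j..]);
        SketchState {
            vars,
            // Validated on all paths: intersection.
            validated_must: self.validated_must & other.validated_must,
            // Validated on any path: union.
            validated_may: self.validated_may | other.validated_may,
        }
    }
}
```
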
- **`TaintTransfer`** (`taint/transfer.rs`) — implements `Transfer<TaintState>` with identical taint logic to the old BFS (source → propagation → sanitization → sink check). Callee resolution unchanged (local → global same-lang → interop edges). Emits `TaintEvent::SinkReached` events during Phase 2 of the engine.
- **JS/TS two-level solve** — prevents cross-function taint leakage (the main source of state explosion in the old BFS) while preserving global-to-function flows. Level 1 solves top-level code; Level 2 solves each function seeded with read-only top-level taint via `global_seed`.
- **Monotone predicate tracking** — path-sensitivity predicates moved from per-BFS-item `PathState` (which duplicated state exponentially) to monotone `PredicateSummary` in the lattice. Contradiction pruning uses `known_true & known_false` bit intersection (NullCheck/EmptyCheck/ErrorCheck only), which is both more precise and guaranteed monotone.
- **Multi-origin tracking** — each tainted variable tracks up to 4 `TaintOrigin` (node + `SourceKind`), enabling multiple findings when distinct sources flow to the same sink.
- **Guaranteed termination** — no more `MAX_BFS_ITERATIONS`/`MAX_SEEN_STATES` safety nets needed (though a 100K worklist iteration budget remains as defense-in-depth). Convergence follows from finite lattice height × finite CFG edges.
- **`analyse_file()` signature unchanged** — `Finding` struct, `Diag` conversion, and all callers are unaffected.
- **Generic dataflow engine** (`state/engine.rs`) — `run_forward()` and `DataflowResult` are now generic over any `S: Lattice` + `T: Transfer<S>`. `DefaultTransfer` (resource lifecycle) implements `Transfer<ProductState>`; `TaintTransfer` implements `Transfer<TaintState>`. Per-domain iteration budget and `on_budget_exceeded` hooks added.
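
A minimal sketch of a worklist-based forward solver with an iteration budget, in the spirit of the generic engine entry above. The trait methods, the `run_forward_sketch` signature, and the `GenBits` transfer are assumptions made for illustration; they are not the signatures in `state/engine.rs`.

```rust
use std::collections::VecDeque;

/// Assumed shape of a join-semilattice; the real trait may differ.
pub trait Lattice: Clone + PartialEq {
    fn bottom() -> Self;
    /// Joins `other` into `self`; returns true if `self` changed.
    fn join(&mut self, other: &Self) -> bool;
}

/// Assumed shape of a transfer function over CFG node indices.
pub trait Transfer<S: Lattice> {
    fn apply(&self, node: usize, input: &S) -> S;
}

/// Returns the entry state of every node, or `None` if the budget is exceeded.
pub fn run_forward_sketch<S: Lattice, T: Transfer<S>>(
    succs: &[Vec<usize>],
    entry: usize,
    init: S,
    transfer: &T,
    budget: usize,
) -> Option<Vec<S>> {
    let mut state = vec![S::bottom(); succs.len()];
    state[entry] = init;
    let mut work = VecDeque::from([entry]);
    let mut steps = 0usize;
    while let Some(node) = work.pop_front() {
        steps += 1;
        if steps > budget {
            return None; // defense-in-depth bail-out
        }
        let out = transfer.apply(node, &state[node]);
        for &succ in &succs[node] {
            // Re-queue a successor only when its state actually grew.
            if state[succ].join(&out) {
                work.push_back(succ);
            }
        }
    }
    Some(state)
}

/// Toy lattice: a bitset ordered by inclusion, joined with bitwise OR.
impl Lattice for u8 {
    fn bottom() -> Self { 0 }
    fn join(&mut self, other: &Self) -> bool {
        let old = *self;
        *self |= *other;
        *self != old
    }
}

/// Toy transfer: each node adds a fixed set of bits.
pub struct GenBits(pub Vec<u8>);

impl Transfer<u8> for GenBits {
    fn apply(&self, node: usize, input: &u8) -> u8 {
        *input | self.0[node]
    }
}
```

Because states only grow and the toy lattice has finite height, the loop in the graph used for testing (`1 → 2 → 1`) converges after a handful of steps; the budget only triggers when it is set artificially low.
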
- **`path_state.rs` simplified** — removed `PathState`, `Predicate`, `MAX_PATH_PREDICATES`, `state_hash()`, `priority()` structs/methods. Kept `PredicateKind` enum and `classify_condition()` function (used by the new transfer for predicate classification).
- **Removed BFS infrastructure** -- `taint_hash()`, BFS `Item` struct, `pred` predecessor map, two-tier seen-state map, and all bail-out constants (`MAX_BFS_ITERATIONS=200K`, `MAX_SEEN_STATES=100K`, `PATH_SENSITIVITY_NODE_LIMIT=500`, `PATH_SENSITIVITY_QUEUE_LIMIT=10K`, `MAX_PATH_VARIANTS_PER_KEY=4`) are no longer needed and have been removed.
- **Severity filtering applied at output stage** -- `--severity` (and legacy `--high-only`) filtering is now applied once in `scan::handle()` after all severity normalization (nonprod downgrades, dedup, truncation). Previously `--high-only` only filtered AST patterns during analysis; taint and CFG findings bypassed the filter entirely.
- **`--format` default is `console`** — previously defaulted to empty string, requiring fallback logic.
- **All status/progress output goes to stderr** — "Checking...", "Finished in...", config notes, and progress bars now use `eprintln!`/stderr exclusively. JSON and SARIF output is stdout-only.
- **`Severity::from_str` returns `Err` for unknown values** — previously returned `Ok(Severity::Low)` for any unrecognized input.
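
The stricter parsing behaviour can be sketched as follows. The enum layout, the `String` error type, and the case-insensitive matching are assumptions for illustration; only the "unknown input is an error" rule comes from the entry above.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Severity {
    Low,
    Medium,
    High,
}

impl FromStr for Severity {
    // Assumed error type; the real implementation may use a dedicated error.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "low" => Ok(Severity::Low),
            "medium" => Ok(Severity::Medium),
            "high" => Ok(Severity::High),
            // Unknown input is rejected instead of silently becoming Low.
            other => Err(format!("unknown severity: {other:?}")),
        }
    }
}
```
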
- **Deprecated CLI flags preserved as hidden aliases** -- `--high-only`, `--no-index`, `--rebuild-index`, `--ast-only`, `--cfg-only`, and `--include-nonprod` are hidden from help but still functional, mapping to their canonical replacements.
- **Path-sensitive taint analysis** -- the BFS taint engine now carries a `PathState` (bounded set of branch predicates) alongside the taint map. When the BFS traverses a True or False edge from an `If` node, it records a `Predicate` with the condition's variables, kind, and polarity. This enables two new capabilities:
  - **Infeasible path pruning** -- paths with contradictory predicates (e.g. `if x.is_none() { return; } if x.is_none() { sink }`) are detected and pruned, eliminating false positives on code guarded by redundant null/empty/error checks. Contradiction detection is conservative: only whitelisted kinds (`NullCheck`, `EmptyCheck`, `ErrorCheck`) with single-variable predicates are pruned.
  - **Validation guard annotation** -- when all tainted variables reaching a sink are guarded by a `ValidationCall` predicate (e.g. `if validate(&x) { sink }` or `if !validate(&x) { return; } sink`), the finding is annotated with `path_validated: true` and `guard_kind: ValidationCall`. This metadata is surfaced in JSON and console output without changing severity.
- **Condition metadata on CFG nodes** -- `NodeInfo` now carries `condition_text`, `condition_vars`, and `condition_negated` for `If` nodes, extracted during CFG construction. Negation detection handles `!expr`, `not expr`, and Ruby `unless`. Classification of condition text into `PredicateKind` (NullCheck, EmptyCheck, ErrorCheck, ValidationCall, SanitizerCall, Comparison, Unknown) is conservative: call-based kinds require `(` in the text and a matching callee token.
- **`path_validated` and `guard_kind` fields on `Diag`** -- taint findings carry path-sensitivity metadata in JSON output (fields omitted when not set) and console output (suffix line `Path guard: ValidationCall` when present). Finding IDs are unchanged for dedup stability.
- **`smallvec` dependency** -- used for inline-allocated predicate storage in `PathState` (avoids heap allocation for the common case of ≤4 predicates per path).
- **Interprocedural call graph** -- a whole-program `CallGraph` (`petgraph::DiGraph<FuncKey, CallEdge>`) is now built between Pass 1 and Pass 2 of every taint-enabled scan. Each function definition is a node; resolved callee relationships are edges. The graph is constructed from the merged `GlobalSummaries` and is available in both the filesystem and indexed scan paths.
- **Three-valued callee resolution** -- `CalleeResolution` enum distinguishes `Resolved(FuncKey)`, `NotFound`, and `Ambiguous(Vec<FuncKey>)`. Ambiguous callees (same name in multiple namespaces, caller in a third namespace) are tracked separately from missing callees for diagnostics.
- **Shared resolution helper** -- `GlobalSummaries::resolve_callee_key()` centralizes same-language callee resolution with arity-aware filtering and namespace disambiguation. Both the call graph builder and the taint engine now use the same resolution logic.
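
One way to picture the three-valued resolution with arity filtering and namespace disambiguation is the sketch below. The `FuncKey` fields and the `resolve_callee_sketch` function are hypothetical simplifications; the real `GlobalSummaries::resolve_callee_key()` may use different data and rules.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FuncKey {
    pub namespace: String,
    pub name: String,
    pub arity: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CalleeResolution {
    Resolved(FuncKey),
    NotFound,
    Ambiguous(Vec<FuncKey>),
}

/// Hypothetical resolver: filter by name and arity, then prefer the caller's namespace.
pub fn resolve_callee_sketch(
    known: &[FuncKey],
    name: &str,
    arity: usize,
    caller_ns: &str,
) -> CalleeResolution {
    let mut candidates: Vec<FuncKey> = known
        .iter()
        .filter(|k| k.name == name && k.arity == arity)
        .cloned()
        .collect();
    match candidates.len() {
        0 => CalleeResolution::NotFound,
        1 => CalleeResolution::Resolved(candidates.remove(0)),
        _ => {
            // Several namespaces define the function: pick the caller's own, if unique.
            let local: Vec<usize> = candidates
                .iter()
                .enumerate()
                .filter(|(_, k)| k.namespace == caller_ns)
                .map(|(i, _)| i)
                .collect();
            if local.len() == 1 {
                CalleeResolution::Resolved(candidates.swap_remove(local[0]))
            } else {
                CalleeResolution::Ambiguous(candidates)
            }
        }
    }
}
```
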
- **Callee-name normalization** -- `normalize_callee_name()` extracts the last segment from qualified callee text (`"env::var"` → `"var"`, `"obj.method"` → `"method"`) before resolution. The raw call-site text is preserved on graph edges for diagnostics.
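
The last-segment extraction can be approximated in a few lines. This is a guess at the behaviour based on the two examples in the entry above, not the actual implementation; the `_sketch` suffix marks it as hypothetical.

```rust
/// Hypothetical approximation: keep only the text after the last `::` or `.`.
pub fn normalize_callee_name_sketch(raw: &str) -> &str {
    // `rsplit` always yields at least one item, so the fallbacks are never hit.
    let after_path = raw.rsplit("::").next().unwrap_or(raw);
    after_path.rsplit('.').next().unwrap_or(after_path)
}
```
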
- **SCC / topological analysis** -- `CallGraphAnalysis` computes strongly connected components via Tarjan's algorithm and exposes a callee-first (leaves-first) topological ordering of SCC indices, ready for future bottom-up taint propagation.
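
Tarjan's algorithm emits components in reverse topological order, which for caller → callee edges is exactly the callee-first ordering described above. The sketch below is a standalone version over adjacency lists; the `Tarjan` struct and `sccs_callee_first` are illustrative names, and the project itself builds on `petgraph` rather than this code.

```rust
struct Tarjan<'a> {
    graph: &'a [Vec<usize>],
    index: Vec<Option<usize>>,
    lowlink: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next_index: usize,
    sccs: Vec<Vec<usize>>,
}

impl Tarjan<'_> {
    fn visit(&mut self, v: usize) {
        self.index[v] = Some(self.next_index);
        self.lowlink[v] = self.next_index;
        self.next_index += 1;
        self.stack.push(v);
        self.on_stack[v] = true;

        let graph = self.graph; // copy the shared reference so `self` stays mutable
        for &w in &graph[v] {
            let w_index = self.index[w];
            match w_index {
                None => {
                    // Unvisited callee: recurse, then inherit its lowlink.
                    self.visit(w);
                    self.lowlink[v] = self.lowlink[v].min(self.lowlink[w]);
                }
                Some(i) if self.on_stack[w] => {
                    // Back edge into the current component.
                    self.lowlink[v] = self.lowlink[v].min(i);
                }
                Some(_) => {} // edge into an already finished component
            }
        }

        // `v` is the root of a component: pop it off the stack.
        if Some(self.lowlink[v]) == self.index[v] {
            let mut component = Vec::new();
            loop {
                let w = self.stack.pop().expect("stack contains v");
                self.on_stack[w] = false;
                component.push(w);
                if w == v {
                    break;
                }
            }
            self.sccs.push(component);
        }
    }
}

/// Edges point from caller to callee; components come out callees first.
pub fn sccs_callee_first(graph: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let n = graph.len();
    let mut t = Tarjan {
        graph,
        index: vec![None; n],
        lowlink: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next_index: 0,
        sccs: Vec::new(),
    };
    for v in 0..n {
        if t.index[v].is_none() {
            t.visit(v);
        }
    }
    t.sccs
}
```
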
- **Call graph tracing** -- `tracing::info!` log with node count, edge count, unresolved-not-found count, unresolved-ambiguous count, and SCC count is emitted after every call graph build.
- 8 new path-sensitivity integration tests: early-return validation guard, failed-validation branch, contradictory null-check pruning, if/else validation annotation, sanitize-one-branch regression, path-state budget graceful degradation, unknown-predicate non-pruning, multi-var non-pruning.
- 35 new unit tests in `taint::path_state`: classify_condition variants, PathState push/truncation, contradiction detection (whitelisted kinds, single-var only), has_validation_for semantics, state_hash determinism, priority ordering.
- 11 new unit tests: callee normalization, same-name-different-namespaces resolution, cross-language isolation, arity separation, recursive SCC detection, not-found vs ambiguous diagnostics, diamond topo ordering, interop edge resolution, namespace normalization consistency, and raw call-site preservation.
- **Edge-aware taint traversal** -- `analyse_file()` now uses `cfg.edges(node)` instead of `cfg.neighbors(node)`, inspecting `EdgeKind` on each edge. This is required for predicate recording but also makes the taint engine aware of the CFG's branch structure for the first time.
- **Two-tier seen-state deduplication** -- the BFS seen-state map changed from `HashSet<(NodeIndex, u64)>` to a `HashMap` keyed by `(NodeIndex, taint_hash)` mapping to a bounded list of `(path_hash, priority)` pairs. At most `MAX_PATH_VARIANTS_PER_KEY` (4) path variants are tracked per taint state, with deterministic eviction preferring non-truncated states with fewer predicates.
- **Finding deduplication** -- taint findings are now deduplicated by `(sink, source)` pair after analysis, preferring findings with `path_validated = true` (most informative metadata).
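
A small sketch of deduplication by `(sink, source)` that prefers the validated variant. The `Finding` struct here is a reduced, hypothetical stand-in with plain node indices; the real struct carries more fields.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Reduced stand-in for a taint finding (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub sink: usize,
    pub source: usize,
    pub path_validated: bool,
}

/// Keep one finding per (sink, source), preferring `path_validated == true`.
pub fn dedup_findings(findings: Vec<Finding>) -> Vec<Finding> {
    let mut best: BTreeMap<(usize, usize), Finding> = BTreeMap::new();
    for f in findings {
        match best.entry((f.sink, f.source)) {
            Entry::Vacant(slot) => {
                slot.insert(f);
            }
            Entry::Occupied(mut slot) => {
                // Replace only when the newcomer carries more metadata.
                if f.path_validated && !slot.get().path_validated {
                    slot.insert(f);
                }
            }
        }
    }
    // BTreeMap iteration gives a deterministic order by (sink, source).
    best.into_values().collect()
}
```
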
- **`taint::Finding` struct** -- added `path_validated: bool` and `guard_kind: Option<PredicateKind>` fields. Code that constructs `Finding` directly must include these fields.
- **`Diag` struct** -- added `path_validated: bool` and `guard_kind: Option<String>` fields. Both use `#[serde(skip_serializing_if)]` to omit from JSON when not set.
- **`taint::resolve_callee()` refactored** -- the global resolution step now delegates to `GlobalSummaries::resolve_callee_key()` and applies `normalize_callee_name()` before lookup, unifying resolution logic with the call graph builder.
- **Label rules expanded across 8 languages:**
  - **Go** -- added `r.URL.Query`, `r.URL.Query.Get`, `Request.FormValue`, `Request.URL` sources; `filepath.Clean`/`filepath.Base` sanitizers; `fmt.Fprintf`/`fmt.Sprintf`/`fmt.Printf` format-string sinks; `os.Open`/`os.OpenFile`/`os.Create`/`ioutil.ReadFile`/`os.ReadFile` FILE_IO sinks; `template.HTML` HTML sink; `db.QueryRow`/`db.Prepare` SQL sinks.
  - **PHP** -- sources now match both `$_GET` and `_GET` (without `$` prefix, matching collect_idents stripping); added `$_FILES`/`_FILES`, `$_SERVER`/`_SERVER`, `$_ENV`/`_ENV` sources; `eval`/`assert` shell sinks; `include`/`include_once`/`require`/`require_once` FILE_IO sinks; `unserialize` sink; `move_uploaded_file`/`copy`/`file_put_contents`/`fwrite` FILE_IO sinks; `basename` FILE_IO sanitizer; `query` SQL sink.
  - **Java** -- added `readObject`/`readLine` sources; `ProcessBuilder` shell sink; `Class.forName` reflection sink; `println`/`print`/`write` HTML sinks.
  - **Python** -- added `send_file`/`send_from_directory` FILE_IO sinks; `os.path.realpath` FILE_IO sanitizer; `open` changed from source to FILE_IO sink (fixes source/sink conflict for path traversal detection).
  - **Ruby** -- `params` source detection now works via subscript handling.
  - **Rust** -- added `fs::read_to_string`/`fs::write`/`fs::read`/`File::open`/`File::create` as FILE_IO sinks; `fs::read_to_string` removed from sources (was source/sink conflict).
  - **C/C++** -- added `fopen`/`open` as FILE_IO sinks.
- **Ruby `rb.cmdi.system_interp` pattern broadened** — no longer requires string interpolation in arguments; now matches any `system`/`exec` call, promoted from Tier B to Tier A.
- **C++ `cpp.cmdi.popen` pattern added** -- `popen()` command execution detection for C++, using the language-namespaced ID (the C pattern retains `c.cmdi.popen`).
- **Test config enables state analysis** -- `test_config()` now sets `enable_state_analysis = true`.
> Heads-up: false positives or regressions on cross-file flows are possible. Please open an issue with a minimal reproduction if you hit one.
### Highlights
### Fixed
- **Taint source kind misclassified as "unknown" for non-call sources** — source-bearing nodes with `CallWrapper` or `Assignment` kind (e.g. `userInput = req.query.data`) had their `callee` field set to `None` because the CFG builder only populated `callee` for `StmtKind::Call` nodes. This caused `infer_source_kind()` to receive an empty string, failing to match any keyword pattern and defaulting to `SourceKind::Unknown`. Fixed by also setting `callee` when a label (Source/Sink/Sanitizer) is detected, so the extracted member text (e.g. "req.query") flows through to source kind inference. Affects severity classification and diagnostic output for property-access sources across all languages.
- **Full KINDS map audit across all 10 languages** — 89 missing tree-sitter node types added to KINDS maps so the CFG builder no longer silently drops code inside switch/case, try/catch/finally, class bodies, closures/lambdas, and other container nodes. Previously, any node not in a language's KINDS map hit the `build_sub` fallback which created a terminal Seq node without recursing into children, effectively making all wrapped code invisible to analysis.
  - **C** (+3): `switch_statement`, `case_statement`, `labeled_statement`
  - **C++** (+7, 1 fix): `switch_statement`, `case_statement`, `labeled_statement`, `throw_statement` (Return), `try_statement`, `catch_clause`, `lambda_expression`; **critical fix**: `namespace_definition` changed from `Trivia` to `Block` (all function definitions inside namespaces were silently dropped)
  - **Java** (+11): `do_statement` (While), `throw_statement` (Return), `switch_expression`, `switch_block`, `switch_block_statement_group`, `try_statement`, `catch_clause`, `finally_clause`, `lambda_expression`, `constructor_body`, `static_initializer`
  - **JavaScript** (+11): `switch_statement`, `switch_body`, `switch_case`, `switch_default`, `try_statement`, `catch_clause`, `finally_clause`, `class_declaration`, `class` (expression), `class_body`, `export_statement`
  - **TypeScript** (+13): all JS switch/try/class entries plus `abstract_class_declaration`, `export_statement`, `enum_declaration` (Trivia)
  - **PHP** (+11): `do_statement` (While), `throw_expression` (Return), `switch_statement`, `switch_block`, `case_statement`, `default_statement`, `try_statement`, `catch_clause`, `finally_clause`, `colon_block`, `class_declaration`
  - **Python** (+7): `try_statement`, `except_clause`, `finally_clause`, `class_definition`, `decorated_definition`, `match_statement`, `case_clause`
  - **Ruby** (+11): `until` (While), `begin`, `rescue`, `ensure`, `case`, `when`, `class`, `module`, `singleton_method` (Function), `do`, `block`
  - **Go** (+10): `expression_switch_statement`, `type_switch_statement`, `expression_case`, `type_case`, `default_case`, `select_statement`, `communication_case`, `go_statement`, `defer_statement`, `func_literal` (Function)
  - **Rust** (+5, 1 removal): `closure_expression`, `async_block`, `impl_item`, `trait_item`, `declaration_list`; removed dead `loop_statement` entry (node doesn't exist in tree-sitter-rust 0.24.0)
- Removed unused `Kind::LoopBody` enum variant from `labels/mod.rs` (no arm in `build_sub`, last reference was the dead Rust `loop_statement` entry)
- **CFG: `else_clause` not recursed into for C/C++** -- tree-sitter's C and C++ grammars wrap else bodies in an `else_clause` node. This node was missing from both languages' `KINDS` maps, so the CFG builder's fallback arm treated it as a terminal `Seq` node without descending into children. All statements inside else blocks (e.g. `fclose(f)`) were silently dropped from the CFG, causing false-positive resource-leak findings and incorrect branch analysis. Fixed by mapping `"else_clause" => Kind::Block` in `src/labels/c.rs` and `src/labels/cpp.rs`.
- **CFG: `else_clause` missing from Rust, JavaScript, TypeScript, Python, PHP KINDS maps** — same bug class as C/C++: tree-sitter wraps else bodies in an `else_clause` node that was not in KINDS, silently dropping all code inside else blocks from the CFG. Fixed by mapping `"else_clause" => Kind::Block` in all five languages. Also added `"elif_clause" => Kind::Block` (Python), `"else_if_clause" => Kind::Block` (PHP), and `"elsif" => Kind::If` (Ruby) to handle chained elif/elsif nodes.
- **Rust KINDS using wrong tree-sitter node names** — tree-sitter-rust uses `_expression` suffixes (not `_statement`) for `while`, `for`, and `return` nodes. The existing `while_statement`, `for_statement`, and `return_statement` entries were dead code (0 grammar matches). Added `while_expression`, `for_expression`, and `return_expression` mappings.
- **Rust `match_expression`, `match_block`, `match_arm`, `unsafe_block` missing from KINDS** — these wrapper nodes were not mapped, causing all code inside match arms and unsafe blocks to be silently dropped from the CFG. Mapped to `Kind::Block` for sequential traversal.
- **TypeScript missing `throw_statement` and `do_statement`** -- `throw` was mapped in JavaScript but not TypeScript; `do_statement` (do-while loops) was missing from both JS and TS. Added `"throw_statement" => Kind::Return` and `"do_statement" => Kind::While` to both languages.
- **Python `raise_statement` and `with_statement` missing from KINDS** -- `raise` terminates the current path (mapped to `Kind::Return`); `with` wraps code in a context manager (mapped to `Kind::Block`). Both were silently dropping enclosed code.
- **Dead KINDS entries removed** -- `"for_of_statement"` in TypeScript (0 grammar matches; TS inherits `for_in_statement` from JS) and `"method_call"` in Ruby (0 grammar matches; Ruby only has `call`).
- **`--high-only` emitting Low/Medium taint and CFG findings** — severity filter was only applied to AST pattern queries during analysis. Taint findings (whose severity derives from `SourceKind`) and CFG structural findings passed through unfiltered. The filter is now applied at the final output stage after all severity normalization, ensuring `--severity HIGH` never emits downgraded Medium/Low findings.
- **JSON/SARIF output contaminated with status messages on stdout** — status messages ("Checking...", "Finished in...") used `println!` and appeared in stdout alongside machine output. Now all status goes to stderr.
- **CFG: False edge to then-block exits in no-else if statements** -- previously, `if (cond) { body }` without an else block created a `False` edge from the condition node directly to the then-block's exit nodes. This made the false path appear to traverse the then-block, causing incorrect predicate polarity in path-sensitive analysis and duplicate taint findings with contradictory metadata. The CFG now creates a synthetic pass-through `Seq` node for the false path with an explicit `False` edge from the condition, correctly modeling "skip the then-block." This also fixes the frontier: previously, the no-else non-terminating case duplicated `then_exits` in the frontier (`then_exits ++ then_exits.clone()`); it now correctly produces `then_exits [pass_through]`.
- **Taint BFS non-termination on large JS files** — the BFS taint engine in `taint/mod.rs` had no global iteration bound. The seen-state deduplication keyed on `(node, taint_hash)`, so every distinct taint map at a CFG node was treated as a novel state. In files with loops and many tainted variables (e.g. a 2,200-line JS file with 18+ top-level variables tainted via `window.location.search`), each loop iteration produced a slightly different taint map, causing the BFS to revisit loop bodies indefinitely. Both `--no-index` and `--rebuild-index` scans hung near completion (progress showed e.g. 87/88 files). Fixed by adding two hard bounds: `MAX_BFS_ITERATIONS` (200,000 queue pops) and `MAX_SEEN_STATES` (100,000 unique `(node, taint_hash)` entries in the seen-state map). When either limit is reached the analysis bails out gracefully and returns all findings collected so far. A `tracing::warn!` is emitted on iteration-limit bail-out. Normal files are unaffected (typical BFS uses <1,000 iterations).
- **Rust `if let` / `while let` taint propagation** — the CFG builder now extracts pattern bindings from `let_condition` nodes as variable definitions in `def_use()`, and classifies the value expression (e.g. `env::var("CMD")`) for source/sink labels in `push_node()`. Previously, `if let Ok(cmd) = env::var("CMD") { Command::new("sh").arg(&cmd) }` produced no taint finding because `cmd` was never recognized as a tainted definition. Now correctly detects taint flow through `if let` and `while let` bindings.
- **C++ `popen` pattern ID collision** — renamed `c.cmdi.popen` to `cpp.cmdi.popen` in C++ patterns to fix a cross-language duplicate ID that caused `all_pattern_ids_are_globally_unique` test failure.
- **State analysis early-return leak duplication** -- `extract_findings` in `state/facts.rs` now skips early-return nodes when checking for resource leaks, only inspecting the synthesized function exit node. Previously, early-return nodes with path-specific state (OPEN only) emitted `state-resource-leak` alongside the correct `state-resource-leak-possible` from the merged exit state.
- **Severity filter bug** -- `min_severity` comparison in `ast.rs` was inverted (`<=` instead of `>`), causing all AST patterns at the minimum severity level to be silently dropped. With the default `min_severity = Low`, all Low-severity patterns (`.unwrap()`, `.expect()`, `panic!`, `todo!`, `mem::forget`, Go crypto patterns, narrow casts) were never reported. Fixed 29 test cases.
- **Nested function analysis** — CFG builder now recurses into function expressions passed as call arguments (e.g., Express `app.get('/path', function(req, res) { ... })`, Sinatra `get '/path' do...end`). Added `collect_nested_function_nodes()` to discover `Kind::Function` nodes inside `CallWrapper`/`CallFn` AST subtrees. Also added `function_expression` to JS/TS KINDS maps, and `do_block`/`block` as `Kind::Function` in Ruby for Sinatra/Rails blocks. Anonymous functions now get unique names (`<anon@{offset}>`) to prevent scope collisions in JS two-level taint solve.
- **Chained method call classification** -- `classify()` now normalizes chained calls like `r.URL.Query().Get` by stripping internal `()` between `.` segments, producing `r.URL.Query.Get`. Suffix matching is attempted against both the original head and the normalized form, fixing Go HTTP handler source detection and similar patterns.
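
The chain normalization described above can be sketched roughly as follows. Both function names are hypothetical, the sketch only handles empty `()` pairs directly followed by `.`, and plain `ends_with` is used as a stand-in for the real suffix matcher.

```rust
/// Hypothetical: drop empty call parentheses that sit between `.` segments.
pub fn normalize_chained_call(text: &str) -> String {
    text.replace("().", ".")
}

/// Hypothetical: try the rule suffix against the raw and the normalized text.
pub fn matches_rule_suffix(text: &str, rule: &str) -> bool {
    text.ends_with(rule) || normalize_chained_call(text).ends_with(rule)
}
```
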
- **Subscript access source detection** -- `first_member_label` and `first_member_text` now handle `subscript_expression`, `subscript`, and `element_reference` nodes, enabling source classification for PHP `$_GET['cmd']`, Ruby `params[:cmd]`, and Python `os.environ['KEY']`.
- **Return-statement call extraction** -- `Kind::Return` added to the node types that extract inner call identifiers via `first_call_ident`, fixing cases like `return send_file(path)` where the sink was not classified.
- **Nested call classification** — new `find_classifiable_inner_call()` tries all nested calls when the outermost one doesn't classify, fixing `str(eval(expr))` where `eval` is a sink wrapped in a non-sink call.
- **Java `new` expression text extraction** — added `type` field fallback in `push_node` and `first_call_ident` for `CallFn` nodes, fixing `new ProcessBuilder(...)` not matching as a sink.
- **Function body lookup for anonymous functions** -- `Kind::Function` handler now falls back to finding a `Kind::Block` child when `child_by_field_name("body")` returns None, supporting JS/TS anonymous function expressions and Ruby blocks.
- **Function-level resource leak detection** -- `extract_findings` in `state/facts.rs` now inspects per-function Return nodes for leaked resources, not just the file-level Exit node. Previously, variables from one function could be overwritten by same-named variables in subsequent functions, masking leaks.
- **Use-after-free for memory functions** — added `strcpy`, `strncpy`, `memcpy`, `memmove`, `memset`, `memcmp`, `strcmp`, `strncmp`, `strlen`, `sprintf`, `snprintf` to `RESOURCE_USE_PATTERNS` in state analysis, enabling use-after-free detection for common C/C++ string and memory functions.
- **New SSA-based taint engine.** Block-level worklist analysis over a pruned SSA IR, replacing the legacy BFS engine across all 10 languages. More precise, easier to extend, and the foundation for everything else in this release.
- **Cross-file analysis.** Function summaries (including the new SSA summaries) flow across files via SQLite-backed persistence. Callee bodies can be inlined for context-sensitive analysis (k=1) and walked symbolically across file boundaries.
- **Symbolic execution layer.** Candidate findings are walked symbolically from source to sink, producing concrete attack witnesses, pruning infeasible paths, and (optionally) handing constraints off to Z3.
- **Local web UI (`nyx serve`).** React + Vite frontend for browsing findings, viewing flow paths, and triaging results. Triage decisions persist to `.nyx/triage.json` so they version with your code.
- **Hostile-repo hardening.** Path containment, loopback-only serving, CSRF tokens, bounded artifact reads. Safe to run on untrusted code.
- **Tighter false-positive controls.** Type-aware sink suppression, abstract interpretation (intervals + string prefixes), constraint solving, allowlist and type-check guard recognition, and confidence scoring on every finding.
## [0.3.0] - 2026-02-25
### Engine
### Added
- **Configurable analysis rules** -- users can define custom sources, sanitizers, and sinks per language via TOML config (`nyx.local`) or the new `nyx config` CLI. Config rules take priority over built-in rules, so project-specific sanitizers like `escapeHtml()` are recognized without code changes.
- **`nyx config` CLI subcommand** with four actions:
  - `show` -- print effective merged configuration as TOML
  - `path` -- print config directory path
  - `add-rule --lang <LANG> --matcher <NAME> --kind <KIND> --cap <CAP>` -- append a label rule to `nyx.local`
  - `add-terminator --lang <LANG> --name <NAME>` -- append a terminator function to `nyx.local`
- **`--include-nonprod` CLI flag** -- by default, findings in non-production paths (tests, vendor, benchmarks, examples, fixtures, build scripts, `*.min.js`) are now downgraded by one severity tier (High→Medium, Medium→Low). Pass `--include-nonprod` to restore original severity. Controlled by `scanner.include_nonprod` config key.
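
The one-tier downgrade rule can be expressed as a small function. This is a sketch only: the `Severity` enum is redefined locally, and `effective_severity` with its two boolean parameters is a hypothetical shape. How a path is classified as non-production is deliberately left to the caller.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Low,
    Medium,
    High,
}

/// High→Medium, Medium→Low; Low has nowhere to go.
pub fn downgrade_one_tier(sev: Severity) -> Severity {
    match sev {
        Severity::High => Severity::Medium,
        Severity::Medium | Severity::Low => Severity::Low,
    }
}

/// Hypothetical wrapper: downgrade only for non-production paths, unless opted out.
pub fn effective_severity(sev: Severity, is_nonprod_path: bool, include_nonprod: bool) -> Severity {
    if is_nonprod_path && !include_nonprod {
        downgrade_one_tier(sev)
    } else {
        sev
    }
}
```
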
- **`SourceKind` enum** in the taint engine -- taint findings now carry a `source_kind` field (`UserInput`, `EnvironmentConfig`, `FileSystem`, `Database`, `Unknown`) inferred from the source callee name and capabilities. Severity is based on source kind rather than hardcoded to High: filesystem and database sources produce Medium, user input and environment sources produce High.
- **Configurable terminators** -- functions like `process.exit()` can be declared as terminators per language; the CFG treats them as dead ends, preventing false positives on code after termination calls.
- **Event handler callback suppression** -- functions passed as arguments to configured event handler calls (e.g. `addEventListener`) are no longer flagged as unreachable code.
- **Exec-path guard rules** -- calls to `which`, `resolve_binary`, `find_program`, `lookup_path`, and `shutil.which` are recognized as guards for `SHELL_ESCAPE` sinks. If such a guard dominates a shell-exec sink, the `cfg-unguarded-sink` finding is suppressed.
- **One-hop constant binding trace** -- the constant-arg sink suppression now traces one hop through the CFG. If a sink's variable was defined by a node with no uses and no Source label, it is treated as constant. Fixes false positives on patterns like `cmd = "git"; subprocess.run([cmd, "status"])`.
- **Evidence-based severity in cfg-only mode** -- when taint analysis is not active (no global summaries and no taint findings), structural `cfg-unguarded-sink` findings without source-derived evidence are downgraded from Medium to Low.
- **FileResponse ownership transfer** -- file handles passed to consuming sinks (`FileResponse`, `StreamingHttpResponse`, `send_file`, `make_response`) are no longer flagged as resource leaks.
- **Lock-not-released refinement** -- mutex findings now require an explicit `.acquire()` or `.lock()` call on the acquired variable. Constructor-only patterns like `lock = threading.Lock()` without acquire no longer produce `cfg-lock-not-released`.
- **Python `connect`/`cursor` exclusions** -- `signal.connect`, `event.connect`, and `.register` are excluded from the Python db-connection acquire pattern, preventing false `cfg-resource-leak` findings on Django signal handlers and event registrations.
- **`location.href` sink rules** for JavaScript -- `location.href`, `window.location.href`, and `document.location.href` assignments are classified as `Sink(URL_ENCODE)`.
- **`throw_statement` as terminator** in JavaScript -- `throw` now terminates the current block in the CFG (mapped to `Kind::Return`), preventing false `cfg-error-fallthrough` findings after throw statements.
- **`Cap::FMT_STRING` capability bit** -- new bitflag (`0b0100_0000`) for format-string vulnerabilities, distinct from HTML injection. Sources using `Cap::all()` automatically match.
- **Python taint sources** -- `open`, `argparse.parse_args`, `urllib.request.urlopen`, `requests.get`, `requests.post` added as `Cap::all()` sources for broader attack-surface coverage.
- **SARIF 2.1.0 output format** (`-f sarif`) -- produces spec-compliant Static Analysis Results Interchange Format JSON on stdout. Includes tool metadata, deduplicated rule definitions with descriptions, severity-to-level mapping (`High→error`, `Medium→warning`, `Low→note`), and physical locations with relative paths. Suitable for GitHub Code Scanning, Azure DevOps, and other SARIF-consuming CI tools.
- **Progress bars** via `indicatif` -- file discovery, Pass 1, and Pass 2 each display a progress bar on stderr with file counts and ETA. Bars are automatically hidden when output format is `json`/`sarif` or quiet mode is enabled. Index building also shows progress.
- **Quiet mode** (`output.quiet = true`) -- suppresses all status messages (config notes, "Checking...", "Finished in...") on stderr. Useful for CI pipelines and scripted invocations.
- **Resource leak detection for Python, Ruby, PHP, JavaScript, and TypeScript** -- new acquire/release pairs: Python (`open`/`.close`, `socket`/`.close`, `connect`/`.close`, `threading.Lock`/`.release`), Ruby (`File.open`/`.close`, `TCPSocket.new`/`.close`, `.lock`/`.unlock`), PHP (`fopen`/`fclose`, `mysqli_connect`/`mysqli_close`, `curl_init`/`curl_close`), JS/TS (`fs.open`/`fs.close`, `createReadStream`/`.close`).
- **Walker config wired up** -- `performance.max_depth`, `scanner.one_file_system`, `scanner.require_git_to_read_vcsignore`, and `scanner.excluded_files` are now enforced during directory walking (previously parsed but ignored).
- **`database.vacuum_on_startup`** -- when enabled, runs SQLite VACUUM before indexed scans to reclaim space.
- 31 new unit tests covering config round-trip, rule merging, classify extension, href classification, throw termination, terminator detection, config sanitizer suppression, Python/C++ precision, unreachable+unguarded dedup, resource leak detection, one-hop constant binding, exec-path guards, cfg-only severity downgrade, FileResponse ownership, lock constructor suppression, signal.connect exclusion, nonprod path detection, and severity downgrade.
- SSA IR with dominance-frontier phi insertion. The optimization pipeline runs constant propagation, branch pruning, copy propagation, alias analysis, DCE, type facts, and points-to in sequence.
- Multi-label classification — a single API can carry both Source and Sink labels (e.g. PHP `file_get_contents`, Java `readObject`).
- Gated sinks — `setAttribute`, `parseFromString`, etc. only activate when the constant attribute argument is dangerous, and only the payload argument is treated as taint-bearing.
- Container taint with per-index precision and bounded points-to. Aliased containers share heap identity correctly.
- Loop-aware analysis: induction-variable pruning, widening at loop heads, bounded unrolling in symex.
- Path-sensitive phi evaluation propagates validation when all tainted predecessors are guarded.
- Per-return-path summaries decompose function effects when paths produce different taint behavior.
- Cross-file SCC fixed-point — mutually recursive functions across files now reach a joint convergence.
- Demand-driven backwards analysis (off by default) annotates findings with cutoff diagnostics.
- Direction-aware engine notes (`UnderReport`, `OverReport`, `Bail`) flow into confidence scoring, ranking, and the new `--require-converged` strict mode.
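
The `SourceKind` severity rule and the SARIF level mapping listed above can be sketched together as follows. This is a minimal illustration, not the project's code: the function names `severity_for` and `sarif_level` are hypothetical, and the `Unknown` → High case follows the "Taint finding severity" note in this changelog.

```rust
/// Source kinds named in the changelog entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    UserInput,
    EnvironmentConfig,
    FileSystem,
    Database,
    Unknown,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    High,
    Medium,
    Low,
}

/// Hypothetical helper: derive a taint finding's severity from its source kind.
pub fn severity_for(kind: SourceKind) -> Severity {
    match kind {
        SourceKind::UserInput | SourceKind::EnvironmentConfig | SourceKind::Unknown => {
            Severity::High
        }
        SourceKind::FileSystem | SourceKind::Database => Severity::Medium,
    }
}

/// Hypothetical helper: map a severity to the SARIF `level` string.
pub fn sarif_level(severity: Severity) -> &'static str {
    match severity {
        Severity::High => "error",
        Severity::Medium => "warning",
        Severity::Low => "note",
    }
}
```

Under these assumptions a `Database` source yields a Medium finding, which would be reported in SARIF as `warning`.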
### Changed
- **`taint::Finding` struct** -- added `source_kind: SourceKind` field. Code that constructs `Finding` directly must include this field.
- **`AnalysisContext` struct** -- added `taint_active: bool` and `analysis_rules` fields. Code that constructs `AnalysisContext` directly must include these fields.
- **`ScannerConfig` struct** -- added `include_nonprod: bool` field (default `false`). Deserialization is unaffected due to `#[serde(default)]`.
- **`proto_pollution` AST pattern severity** -- downgraded from High to Low. The AST-only pattern is a structural indicator; the taint engine separately produces High findings when attacker-controlled data flows to `__proto__`.
- **`location_href_assignment` AST pattern** -- constrained to require a known browser global object (`window`, `location`, `document`, `self`, `top`, `parent`, `frames`). Prevents `el.href = val` from matching; only `window.location.href = val` and similar patterns trigger the finding.
- **Taint finding severity** -- no longer hardcoded to High. Severity is now derived from `SourceKind`: UserInput/EnvironmentConfig/Unknown → High, FileSystem/Database → Medium.
- **C/C++ sink reclassification** -- `printf`/`fprintf` moved from `Sink(HTML_ESCAPE)` to `Sink(FMT_STRING)`. `std::cout`, `std::cerr`, `std::clog` removed from sinks entirely (output/logging, not injection vectors). `sprintf`/`strcpy`/`strcat` remain `Sink(HTML_ESCAPE)`.
- `classify()` now accepts an optional `extra: Option<&[RuntimeLabelRule]>` parameter; config-defined rules are checked first (higher priority) before built-in static rules.
- `build_cfg()`, `build_sub()`, and `push_node()` accept optional `LangAnalysisRules` for config-driven label classification, terminator detection, and event handler awareness.
- `find_guard_nodes()` and `is_guard_call()` now recognize config-defined sanitizers as guards with matching capability bits.
- `merge_configs()` union-merges analysis rules, terminators, and event handlers per language key with dedup.
- Assignment LHS classification now tries the full member expression text (e.g. `location.href`) before falling back to property-only (e.g. `innerHTML`), fixing false positives on `a.href` assignments.
- `handle_command()` now receives `config_dir` to support the `config` subcommand.
- **Fused single-pass analysis** -- AST-only mode now runs a single fused pass (`analyse_file_fused`) that parses each file and builds the CFG once, producing both function summaries and diagnostics. Previously every file was parsed twice (once for summary extraction, once for analysis). Taint mode uses the fused pass for Pass 1, eliminating redundant CFG construction during summary extraction.
- **O(N²) → O(N) function-level dataflow sweep in CFG builder** -- the light-weight dataflow sweep and return-node wiring in `build_sub` for `Kind::Function` now iterate only over nodes created within the current function scope (tracked via a snapshot of the node count) instead of scanning the entire graph. Eliminates quadratic scaling in files with many functions.
- **Parallel summary merging** -- `scan_filesystem` now uses rayon `fold`/`reduce` to build per-thread `GlobalSummaries` maps in parallel, then merges them in a binary reduce tree. Eliminates the serial `merge_summaries` bottleneck. Added `GlobalSummaries::merge()`.
- **Redundant file I/O eliminated in indexed path** -- files are now read once and hashed once per scan. Added `Indexer::should_scan_with_hash()` and `Indexer::upsert_file_with_hash()` to accept pre-computed hashes. Pass 2 uses `run_rules_on_bytes` with already-read bytes instead of re-reading from disk. Previously files could be read up to 4 times and hashed up to 3 times per indexed scan.
- **SQLite mutex mode relaxed** -- switched from `SQLITE_OPEN_FULL_MUTEX` (global serialization) to `SQLITE_OPEN_NO_MUTEX`. The r2d2 connection pool guarantees one-connection-per-thread safety; combined with WAL mode this allows concurrent readers without a global lock.
- **Parallel JSON deserialization in `load_all_summaries`** -- for large result sets (>256 summaries), JSON deserialization is now parallelized with rayon.
- **Zero-allocation taint hashing** -- `taint_hash()` replaced sorted-`Vec` + blake3 with an order-independent XOR-of-FNV scheme. Eliminates a heap allocation and sort per BFS edge in the taint engine.
- **In-place taint transfer** -- `apply_taint()` now mutates the taint map in place instead of cloning and returning a new `HashMap` per node visit. The BFS loop caches hash values and uses `std::mem::take` for the last successor to avoid unnecessary clones.
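
To illustrate the order-independent XOR-of-FNV idea behind the taint hashing entry above, here is a minimal sketch. It assumes a taint map can be viewed as `(variable name, capability bits)` pairs; `taint_hash_sketch` and that entry shape are hypothetical, and the real `taint_hash()` may differ.

```rust
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a over `bytes`, continuing from state `h`.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hypothetical stand-in for `taint_hash()`: hash each entry on its own,
/// then XOR the per-entry hashes. XOR is commutative and associative, so
/// the result does not depend on iteration order and needs no sorted Vec.
pub fn taint_hash_sketch<'a, I>(entries: I) -> u64
where
    I: IntoIterator<Item = (&'a str, u32)>,
{
    let mut acc = 0u64;
    for (name, caps) in entries {
        let h = fnv1a(FNV_OFFSET, name.as_bytes());
        let h = fnv1a(h, &caps.to_le_bytes());
        acc ^= h;
    }
    acc
}
```

One caveat of this scheme: identical entries cancel under XOR, so it only suits inputs with unique keys, such as the entries of a map.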
### Symbolic Execution
### Fixed
- **False positives on one-hop constant bindings** -- `cmd = "git"; Command::new(cmd)` no longer triggers `cfg-unguarded-sink` because the variable is traced back to a constant definition.
- **False positives from exec-path guards** -- `resolve_binary(&bin); Command::new(bin)` is now recognized as guarded.
- **False `cfg-resource-leak` on Django signal handlers** -- `signal.connect(handler)` no longer matches the Python db-connection acquire pattern.
- **False `cfg-lock-not-released` on Lock constructors** -- `threading.Lock()` without `.acquire()` no longer produces a finding.
- **False `cfg-resource-leak` on FileResponse** -- `f = open(...); return FileResponse(f)` is recognized as ownership transfer.
- **Inflated severity in cfg-only mode** -- structural findings without taint evidence now correctly produce Low severity instead of Medium.
- **`el.href = val` false positive in AST patterns** -- the `location_href_assignment` pattern now requires a known browser global, eliminating matches on DOM element `.href` assignments.
- **Structured output modes (`-f json`, `-f sarif`) now produce zero stderr noise** -- config notes, "Checking …", and "Finished in …" messages are fully suppressed (not just redirected to stderr) so that `nyx scan -f json | jq` and CI SARIF upload work without extraneous output. Human-readable console format continues to show status messages.
- **Console output column alignment** -- severity tags are now bracketed and padded to a fixed display width (`[HIGH]`, `[MEDIUM]`, `[LOW]`) so that rule IDs align consistently regardless of severity. ANSI color codes are applied after width calculation, not before.
- **`.href` false positives** -- `el.href = "/about"` no longer triggers `location_href_assignment` or sink classification; only `location.href` (and `window.location.href`, `document.location.href`) match.
- **Constant-arg sink false positives** -- sinks whose arguments are all constants (no variable uses beyond the callee name) with no taint confirmation are now suppressed. Fixes false positives on patterns like `subprocess.run(["make","clean"])` and `printf("hello\n")`.
- **Unreachable + unguarded dedup** -- when both `cfg-unreachable-sink` and `cfg-unguarded-sink` fire on the same span, the unguarded finding is suppressed (unreachable is more specific).
- **`std::cout` false positives** -- `std::cout` no longer classified as a sink, eliminating spurious findings on every C++ iostream print.
- **Break/continue scope correctness** -- `break` and `continue` inside loops now correctly wire to their enclosing loop header/exit. Previously, `break` in a `while`/`for` body created a dead-end node that left post-loop code unreachable, producing false `cfg-unreachable-*` findings. The If handler's no-else case also now correctly flows the false branch to subsequent code when the then-branch terminates (return/break/continue). True/False edge labels are applied to branch entry nodes rather than exit nodes, fixing `cfg-error-fallthrough` false positives on `if (err) { return; }` patterns.
- **Preprocessor dangling-else CFG recovery** -- `#ifdef`/`#endif` blocks that split an `if/else` across preprocessor boundaries no longer orphan subsequent code. The CFG block handler now recovers the frontier after preprocessor nodes, preventing false unreachable-code findings on code following `#ifdef ... #endif` blocks.
- **Wrapper resource function recognition** -- `curlx_fopen`, `curlx_fdopen`, `fdopen`, and `curlx_fclose` are now recognized as acquire/release functions for C file handles, eliminating false `cfg-resource-leak` findings on codebases (e.g. curl) that use wrapper functions around standard I/O.
- **`freopen` false positive** -- `freopen()` (and `curlx_freopen`) no longer triggers `cfg-resource-leak` findings. Previously `freopen` matched the `fopen` acquire pattern via `ends_with`; a new `exclude_acquire` field on `ResourcePair` filters out these false matches for both the file handle and file descriptor resource pairs.
- **Struct field ownership transfer** -- resource leak detection now recognizes ownership transfer via struct field assignment (`s->stream = fp`, `obj.field = ptr`). When an acquired resource is stored into a struct field downstream, the finding is suppressed since the receiving struct assumes lifetime responsibility.
- **Linked-list/global insertion** -- resource leak detection now recognizes linked-list insertion patterns (`p->next = list; list = p`) and global variable assignment as ownership transfers, eliminating false `cfg-resource-leak` findings on common C allocation-and-insert idioms.
- Removed incorrect `value_enum` attribute from CLI `--format` argument.
- Benchmark compilation error: `classify()` calls in `benches/scan_bench.rs` were missing the third `extra` parameter.
- Expression trees (`SymbolicValue`) preserve computation structure through the path walk: integers, strings, binary ops, concatenations, calls, phi merges.
- Witness strings reconstruct concrete attack payloads at sink nodes.
- Bounded multi-path forking with reachability pruning.
- Cross-file: callee summaries are modeled directly, and pre-lowered callee bodies are loaded from SQLite so witnesses can keep walking across files.
- Interprocedural mode: nested frames with full state propagation, transitive descent up to 3 levels, structured cutoff tracking.
- Field-sensitive symbolic heap with bounded fields per object.
- Symbolic string theory: `Substr`, `Replace`, `ToLower`, `ToUpper`, `Trim`, `StrLen` modeled with concrete folding and sanitizer pattern detection.
- Optional Z3 integration (compile-time `smt` feature) for cross-variable constraint solving.
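
The `exclude_acquire` filter from the `freopen` entry above can be sketched as a suffix match with an exclusion list. The struct layout, the `is_acquire` method, and the pattern lists here are assumptions for illustration only; the changelog confirms just the `ResourcePair` name, the `exclude_acquire` field, and the use of `ends_with`.

```rust
/// Hypothetical shape of a resource pair; only the type name and the
/// `exclude_acquire` field are taken from the changelog.
pub struct ResourcePair {
    pub acquire: &'static [&'static str],
    pub release: &'static [&'static str],
    pub exclude_acquire: &'static [&'static str],
}

impl ResourcePair {
    /// A callee counts as an acquire if it ends with an acquire pattern
    /// and does not end with any excluded pattern.
    pub fn is_acquire(&self, callee: &str) -> bool {
        let excluded = self.exclude_acquire.iter().any(|p| callee.ends_with(p));
        !excluded && self.acquire.iter().any(|p| callee.ends_with(p))
    }
}

/// Illustrative pair with made-up pattern lists.
pub const FILE_DESCRIPTOR: ResourcePair = ResourcePair {
    acquire: &["open"],
    release: &["close"],
    exclude_acquire: &["freopen"],
};
```

With these illustrative patterns, `open` and `curlx_fopen` still count as acquires, while `freopen` and `curlx_freopen` are filtered out before the suffix match can fire.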
## [0.2.0] - 2026-02-24
### Security & Coverage
### Added
- **Cross-file taint analysis** -- two-pass architecture: Pass 1 extracts `FuncSummary` per function (source/sanitizer/sink capabilities, taint propagation, callees), Pass 2 runs BFS taint propagation with cross-file callee resolution.
- **CFG analysis engine** with five detectors: unguarded sinks (`cfg-unguarded-sink`), auth gaps in web handlers (`cfg-auth-gap`), unreachable security code (`cfg-unreachable-*`), error fallthrough (`cfg-error-fallthrough`), and resource leaks (`cfg-resource-leak`).
- **Cross-language interop** -- taint flows across language boundaries via explicit `InteropEdge` structs without false-positive name collisions.
- **Function summaries** persisted to SQLite (`function_summaries` table) with arity, parameter names, capability bitflags, and callee lists.
- **Multi-language CFG + taint support** -- all 10 languages (Rust, C, C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript) now have `KINDS` maps, `RULES`, and `PARAM_CONFIG` for full CFG construction and taint analysis.
- **Resource leak detection** for C/C++ (malloc/free, fopen/fclose), Go (os.Open/Close, Lock/Unlock), Rust (alloc/dealloc), and Java (streams, connections).
- **Finding scoring system** -- numeric scores based on severity, proximity to entry point, path complexity, taint confirmation, and confidence multiplier.
- **Analysis modes** -- `Full` (default), `Ast` (`--ast-only`), and `Taint` (`--cfg-only`) selectable via CLI flags or `scanner.mode` config.
- **`GlobalSummaries`** with conservative merge: union caps, OR booleans, union param/callee lists on name collisions across files.
- **Performance optimizations** -- `_from_bytes` variants to read-once/hash-once, lock-free rayon parallelism, SQLite WAL + 8 MB cache + 256 MB mmap.
- **Tracing instrumentation** -- `tracing` spans on all pipeline phases (walk, pass1, merge, pass2, per-file ops, db_init).
- **Benchmark suite** -- criterion benchmarks in `benches/scan_bench.rs` with fixtures.
- 107 unit tests covering taint propagation, cross-file resolution, cross-language interop, CFG analysis, and summaries.
- Vulnerability classes added: SSRF (10 languages), deserialization (Python, Ruby, Java, PHP), and `Cap::UNAUTHORIZED_ID` for auth-as-taint (off by default behind config flag).
- Auth analysis: receiver-type sink gating, row-level ownership-equality detection, self-actor recognition (`let user = require_auth()`), sink classification (in-memory vs realtime vs outbound), helper-summary lifting, and SQL JOIN-through-ACL recognition.
- State analysis (resource lifecycle, use-after-close, leaks, unauthed access) is now on by default. RAII-aware for Rust and C++; recognizes Python `with`, Go `defer`, Java try-with-resources.
- Framework rule packs: Express, Flask/Django, Spring/JNDI, Rails. Per-language label depth significantly expanded.
- C/C++ taint depth: output-parameter source propagation, implicit definitions for uninitialized declarations.
- Negative test corpus (30 fixtures) and a 262-case benchmark with CI gates on rule-level Precision/Recall/F1.
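
The conservative merge described for `GlobalSummaries` above (union caps, OR booleans, union callee lists on name collisions) can be sketched as follows. The types `SummarySketch` and `GlobalSummariesSketch` and their fields are hypothetical simplifications, not the real `FuncSummary` layout.

```rust
use std::collections::{BTreeSet, HashMap};

/// Simplified, hypothetical stand-in for a per-function summary.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SummarySketch {
    pub caps: u32,               // capability bitflags
    pub propagates_taint: bool,  // example boolean fact
    pub callees: BTreeSet<String>,
}

impl SummarySketch {
    /// Conservative join: never drops a capability, flag, or callee.
    fn merge_from(&mut self, other: SummarySketch) {
        self.caps |= other.caps;
        self.propagates_taint |= other.propagates_taint;
        self.callees.extend(other.callees);
    }
}

#[derive(Debug, Default)]
pub struct GlobalSummariesSketch {
    by_name: HashMap<String, SummarySketch>,
}

impl GlobalSummariesSketch {
    /// Insert a summary; on a name collision, merge instead of overwriting.
    /// Merging into the default (empty) summary is the same as inserting.
    pub fn insert(&mut self, name: &str, summary: SummarySketch) {
        self.by_name
            .entry(name.to_string())
            .or_default()
            .merge_from(summary);
    }

    /// Merge another map into this one, entry by entry.
    pub fn merge(&mut self, other: GlobalSummariesSketch) {
        for (name, summary) in other.by_name {
            self.insert(&name, summary);
        }
    }

    pub fn get(&self, name: &str) -> Option<&SummarySketch> {
        self.by_name.get(name)
    }
}
```

Because the join only ever adds information, merging is commutative and associative, which is also what makes a parallel fold/reduce over per-thread maps safe.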
### Changed
- Bumped all dependencies to latest compatible versions.
- `Cap` bitflags expanded: `ENV_VAR`, `HTML_ESCAPE`, `SHELL_ESCAPE`, `URL_ENCODE`, `JSON_PARSE`, `FILE_IO`.
- `classify()` in labels uses zero-allocation byte-level case-insensitive comparisons.
- Indexed scans now always re-analyze all files in Pass 2 when taint is enabled (conservative: global summaries may have changed even if a file didn't).
### CLI & Output
### Fixed
- Clippy `ptr_arg` lint in perf tests (`&PathBuf` -> `&Path`).
- `nyx serve` — local web UI on `localhost` only (refuses non-loopback binds).
- `--require-converged` filters out findings where the engine bailed early.
- Analysis-engine toggles graduated from `NYX_*` env vars to first-class flags and `[analysis.engine]` config: `--constraint-solving`, `--abstract-interp`, `--context-sensitive`, `--symex`, `--cross-file-symex`, `--symex-interproc`, `--smt`, `--parse-timeout-ms`. Old env vars still work when Nyx is consumed as a library.
- Confidence (`High`/`Medium`/`Low`) shown on every finding, including console headers.
- Engine notes surfaced in console (`[capped: N notes — over-report]`), JSON (`engine_notes`, `confidence_capped`), and SARIF (`result.properties.loss_direction`).
- Flow paths reconstructed step-by-step with file/line/snippet for each hop.
- Concrete attack witness strings synthesized by the symbolic executor.
- Primary sink locations now point at the callee's real sink line; caller call sites are preserved as flow steps.
- Richer scan progress: explicit stages, timing breakdowns, language counters, skipped/reused file counts.
- Tighter taint-finding deduplication.
## [0.2.0-alpha] - 2025-06-28
### Hardening
### Added
- Experimental intraprocedural CFG + taint analysis for Rust. Nyx now builds a control-flow graph, applies dataflow rules, and flags unsanitised Source → Sink paths (e.g. `env::var` → `Command::new`).
- O(1) node-kind lookup via per-language PHF tables for zero-cost dispatch.
- Six unit tests covering conditionals, loops, sanitizers, and multiple sources.
- Debug channel `target=cfg` (use `RUST_LOG=nyx::cfg=debug`) to inspect generated graphs.
- Centralized path containment rejects traversal, symlink escapes, and oversized reads across UI, debug, and triage routes.
- `nyx serve` validates `Host` headers, requires per-session CSRF tokens for mutations, and refuses scans outside the original repo root.
- Walker re-validates symlink targets against the scan root.
- Bounded reads on framework manifests and `.nyx/triage.json` imports.
- UI falls back to plain text on pathologically long lines to defeat regex-DoS in syntax highlighting.
- Parser timeout is now configuration-backed with hostile-input regression coverage.
### Fixed
- Fixed a bug in the release pipeline where the Windows job tried to call `zip`, a command that PowerShell does not provide.
### Persistence
## [0.1.1-alpha] - 2025-06-25
- SQLite schema bumped to v2. Anonymous-function identity is now a structural DFS index instead of a byte offset, so inserting a line above an unchanged function no longer invalidates its `FuncKey`. Pre-0.5.0 caches are silently cleared on open; triage data and scan history are preserved.
- Engine-version metadata; persisted summaries and file hashes invalidate on mismatch.
- Stale SSA tables recreate when required columns are missing; deserialization failures log instead of silently dropping rows.
### Fixed
- Fixed a bug where the `scan --no-index` command would not respect the `max_results` config setting (#1)
### Frontend
### Added
- Integration tests covering indexing and scanning pipelines (#3, #4, #5, #8)
- Replaced the legacy `app.js` with a React + Vite + TypeScript SPA.
- Interactive graph workspace for CFG and call-graph views (Graphology + ELK + Sigma) with neighborhood reduction and a full-page inspector.
- Triage UI with database-backed decisions (true positive, false positive, deferred, suppressed) and `.nyx/triage.json` round-trip.
- Scan history, rules management, and finding detail panels with evidence and flow visualization.
- Vitest browser-side test suite wired into CI.
## [0.1.0-alpha] - 2025-06-25
### Removed
### Added
- Initial alpha release of **Nyx** CLI tool
- Multi-language AST pattern scanning via `tree-sitter` for Rust, C/C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript
- `scan` command: filesystem walker, pattern execution, console output
- `index` command: build, rebuild, and status reporting of SQLite-backed index
- `list` command: list indexed projects with optional verbosity
- `clean` command: remove one or all project indexes
- Configuration system with `nyx.conf` (generated) and `nyx.local` (user overrides)
- Default severity levels: High, Medium, Low
- Unit tests for core modules (config, ext, project utils)
- Legacy BFS taint engine, `TaintTransfer`, `TaintState`, and the `NYX_LEGACY` fallback.
- Legacy vanilla-JS frontend (`app.js`).
## [0.4.0] — 2025-02-25
A precision and ergonomics release. Findings are now ranked, lower-noise by default, and easier to triage in CI.
### Highlights
- **Attack-surface ranking.** Every finding gets an exploitability score combining severity, analysis kind, evidence strength, and path-validation. Console output shows the score in the header line; `--no-rank` opts out.
- **Low-noise prioritization.** Quality-category findings are excluded by default (`--include-quality` brings them back). High-frequency Quality rules are rolled up per `(file, rule)` with example occurrences. LOW budgets cap noise without ever displacing High/Medium findings.
- **State-model dataflow analysis.** New per-variable resource-lifecycle and auth-level analysis catches use-after-close, double-close, must-leak, may-leak (branch-aware), and unauthenticated-sink access. Opt-in via `scanner.enable_state_analysis`.
- **Inline `nyx:ignore` suppressions** with same-line and next-line directives, comma lists, wildcard suffixes, and string-literal guards across all 10 languages.
- **AST pattern overhaul.** All 10 language pattern files rewritten with consistent metadata, namespaced IDs (`<lang>.<category>.<specific>`), and 30+ new patterns. 11 broken tree-sitter queries fixed.
- **Monotone forward-dataflow taint engine.** Replaced the BFS engine with a proper worklist over a finite lattice. Termination is now guaranteed by lattice height, eliminating BFS-budget bailouts on large files.
- **Path-sensitive taint analysis.** Branch predicates flow with the analysis. Contradictory guards prune infeasible paths; validation calls produce annotated findings without changing severity.
- **Interprocedural call graph.** Whole-program graph with three-valued callee resolution (`Resolved`/`NotFound`/`Ambiguous`), SCC analysis, and topo ordering ready for bottom-up taint propagation.
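
As a rough illustration of the worklist approach named in the monotone forward-dataflow highlight above, the sketch below solves a toy taint problem over a bitset lattice. The CFG encoding and the names `solve_forward`, `sources`, and `sanitized` are assumptions for this example, not the engine's API.

```rust
use std::collections::VecDeque;

/// Toy forward dataflow: `succ[i]` lists the successors of node `i`,
/// `sources[i]` are taint bits introduced at node `i`, and `sanitized[i]`
/// are taint bits cleared at node `i`. Returns the taint bits at each
/// node's exit.
pub fn solve_forward(succ: &[Vec<usize>], sources: &[u8], sanitized: &[u8]) -> Vec<u8> {
    let n = succ.len();
    let mut state_in = vec![0u8; n]; // bottom of the lattice: no taint
    let mut state_out = vec![0u8; n];
    let mut queued = vec![true; n];
    let mut work: VecDeque<usize> = (0..n).collect();

    while let Some(node) = work.pop_front() {
        queued[node] = false;
        // Monotone transfer function: clear sanitized bits, add source bits.
        let out = (state_in[node] & !sanitized[node]) | sources[node];
        state_out[node] = out;
        for &s in &succ[node] {
            // Join is bitwise OR, so each entry state can only grow.
            let joined = state_in[s] | out;
            if joined != state_in[s] {
                state_in[s] = joined;
                if !queued[s] {
                    queued[s] = true;
                    work.push_back(s);
                }
            }
        }
    }
    state_out
}
```

Each entry state can change at most once per bit, so the loop terminates even on cyclic graphs; that bound is the lattice-height argument in miniature.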
### CLI & Output
- `--severity <EXPR>` replaces `--high-only`. Supports `HIGH`, `HIGH,MEDIUM`, `>=MEDIUM`. Filtering is now applied at the output stage so taint and CFG findings are correctly downgraded too.
- `--mode <full|ast|cfg|taint>` replaces `--ast-only` and `--cfg-only`.
- `--index <auto|off|rebuild>` replaces `--no-index` and `--rebuild-index`.
- `--fail-on <SEVERITY>` for CI exit-code gating.
- `--min-score <N>` for ranking-aware filtering.
- `--show-suppressed` reveals suppressed findings dimmed with `[SUPPRESSED]`.
- `--keep-nonprod-severity` (renamed from `--include-nonprod`).
- `--quiet` mirrors `output.quiet`.
- Console renderer overhauled: severity is the strongest visual anchor, file paths are dim blue, taint flows use `→` arrows, multi-line call chains are normalized.
- Confidence shown alongside score in the header line.
- Pattern-level confidence is now set at the pattern definition site, not heuristically inferred from severity.
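
A minimal sketch of how a `--severity <EXPR>` value such as `HIGH`, `HIGH,MEDIUM`, or `>=MEDIUM` might be parsed, with unknown names rejected rather than defaulted. The function `parse_severity_expr`, the error type, and the case-insensitive matching are assumptions; only the accepted expression forms come from the notes above.

```rust
use std::str::FromStr;

/// Declared lowest to highest so the derived ordering gives Low < Medium < High.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Low,
    Medium,
    High,
}

impl FromStr for Severity {
    type Err = String;

    /// Unknown names are an error instead of silently becoming Low.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_uppercase().as_str() {
            "HIGH" => Ok(Severity::High),
            "MEDIUM" => Ok(Severity::Medium),
            "LOW" => Ok(Severity::Low),
            other => Err(format!("unknown severity: {}", other)),
        }
    }
}

/// Hypothetical parser for the expression forms listed in the changelog.
pub fn parse_severity_expr(expr: &str) -> Result<Vec<Severity>, String> {
    let expr = expr.trim();
    if let Some(rest) = expr.strip_prefix(">=") {
        // Threshold form: keep every level at or above the floor.
        let floor: Severity = rest.parse()?;
        return Ok([Severity::High, Severity::Medium, Severity::Low]
            .iter()
            .copied()
            .filter(|s| *s >= floor)
            .collect());
    }
    // List form: a single name or a comma-separated list of names.
    expr.split(',').map(str::parse::<Severity>).collect()
}
```

In this sketch both `>=MEDIUM` and `HIGH,MEDIUM` select High and Medium, and an unrecognized name fails the whole expression.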
### Breaking
- Config and data directory renamed from `dev.ecpeter23.nyx` to `nyx`. Existing config and SQLite indexes at the old path won't be picked up — copy them across or re-run `nyx scan`.
- `Severity::from_str` now returns `Err` for unknown values instead of silently defaulting to Low.
### Notable Fixes
- KINDS-map audit across all 10 languages: 89 missing tree-sitter node types added. Switch/case, try/catch/finally, class bodies, lambdas, closures, and namespaces are no longer silently dropped.
- `else_clause` mapping fixed for C, C++, Rust, JS, TS, Python, PHP — code inside else blocks was being dropped from the CFG.
- Rust `if let` / `while let` taint propagation now works.
- Taint BFS non-termination on large JS files (the BFS engine has since been replaced).
- C++ `popen` pattern ID collision with C.
- Constant-arg sink suppression for AST patterns.
## [0.3.0] — 2026-02-25
Configurability, SARIF, and an aggressive false-positive purge.
### Highlights
- **Configurable analysis rules.** Sources, sanitizers, sinks, terminators, and event handlers can be defined per language in `nyx.local` or via `nyx config add-rule`/`add-terminator`. Config rules take priority over built-in rules.
- **`nyx config` CLI subcommand** with `show`, `path`, `add-rule`, `add-terminator`.
- **SARIF 2.1.0 output (`-f sarif`).** Spec-compliant for GitHub Code Scanning, Azure DevOps, and other SARIF consumers.
- **`SourceKind` taint classification.** Findings carry an inferred source kind (`UserInput`, `EnvironmentConfig`, `FileSystem`, `Database`, `Unknown`) and severity is now derived from it instead of being hardcoded to High.
- **Non-prod severity downgrade by default.** Findings in tests, vendor, benchmarks, examples, fixtures, build scripts, and `*.min.js` are downgraded one tier. `--include-nonprod` restores original severity.
- **Resource leak detection** for Python, Ruby, PHP, JavaScript, and TypeScript (file handles, sockets, locks, mysqli, curl, fs streams).
- **Progress bars and quiet mode.** Indicatif-driven progress for discovery, Pass 1, and Pass 2 (auto-hidden in JSON/SARIF/quiet modes).
### Performance
- Single fused parse+CFG pass replaces the previous two-parse summary extraction.
- Light-weight dataflow sweep in CFG builder is now O(N) per function instead of O(N²) over the whole file.
- Parallel summary merging via rayon fold/reduce.
- Indexed scans now read and hash each file once instead of up to 4 times.
- SQLite mutex mode relaxed (r2d2 + WAL provides safety without global lock).
- Zero-allocation taint hashing and in-place taint transfer.
### Notable Fixes
- One-hop constant-binding suppression: `cmd = "git"; subprocess.run([cmd, ...])` no longer flags.
- Exec-path guards (`which`, `resolve_binary`, `shutil.which`) recognized.
- `signal.connect` / `event.connect` no longer match Python db-connection acquire patterns.
- `threading.Lock()` without `.acquire()` no longer flags as unreleased.
- `FileResponse(f)` / `send_file(f)` recognized as ownership transfer.
- `el.href` no longer matches `location.href` patterns.
- Constant-only sink calls (`subprocess.run(["make","clean"])`) suppressed.
- `std::cout` no longer treated as a sink.
- Break/continue inside loops correctly wires into the loop header/exit, fixing false unreachable-code findings.
- Preprocessor `#ifdef`/`#endif` blocks no longer orphan subsequent code in C/C++.
- `freopen` no longer matches `fopen` acquire patterns.
- Struct-field, linked-list, and global assignment recognized as ownership transfers.
## [0.2.0] — 2026-02-24
The cross-file release.
- **Two-pass cross-file taint analysis.** Pass 1 extracts `FuncSummary` per function (caps, propagation, callees), Pass 2 runs BFS taint propagation with cross-file callee resolution.
- **CFG analysis engine** with five detectors: unguarded sinks, auth gaps in web handlers, unreachable security code, error fallthrough, resource leaks.
- **Cross-language interop** via explicit `InteropEdge` structs (no false-positive name collisions).
- **Function summaries persisted to SQLite** (`function_summaries` table).
- **Multi-language CFG + taint support** for all 10 languages.
- **Resource leak detection** for C/C++, Go, Rust, and Java.
- **Finding scoring system** combining severity, entry-point proximity, path complexity, taint confirmation, and confidence.
- **Analysis modes**: `Full` (default), `Ast` (`--ast-only`), `Taint` (`--cfg-only`).
- **Cap bitflags expanded**: `ENV_VAR`, `HTML_ESCAPE`, `SHELL_ESCAPE`, `URL_ENCODE`, `JSON_PARSE`, `FILE_IO`.
- Performance: read-once/hash-once via `_from_bytes` variants, lock-free rayon, SQLite WAL + 8 MB cache + 256 MB mmap.
- Tracing instrumentation on all pipeline stages; criterion benchmark suite.
## [0.2.0-alpha] — 2025-06-28
- Experimental intra-procedural CFG + taint analysis for Rust. Builds a CFG, applies dataflow, and flags unsanitised Source → Sink paths (e.g. `env::var` → `Command::new`).
- O(1) node-kind lookup via per-language PHF tables.
- Debug channel `target=cfg` (`RUST_LOG=nyx::cfg=debug`) to inspect generated graphs.
- Fixed Windows release pipeline (PowerShell has no `zip` command).
## [0.1.1-alpha] — 2025-06-25
- Fixed `scan --no-index` not respecting the `max_results` config setting (#1).
- Integration tests covering indexing and scanning pipelines (#3, #4, #5, #8).
## [0.1.0-alpha] — 2025-06-25
Initial alpha release.
- Multi-language AST pattern scanning via `tree-sitter` for Rust, C/C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript.
- `scan` command: filesystem walker, pattern execution, console output.
- `index` command: build, rebuild, and status reporting of SQLite-backed index.
- `list` command: list indexed projects with optional verbosity.
- `clean` command: remove one or all project indexes.
- Configuration system with `nyx.conf` (generated) and `nyx.local` (user overrides).
- Default severity levels: High, Medium, Low.

CLA.md Normal file

@ -0,0 +1,73 @@
# Nyx Contributor License Agreement
## Why this exists
Nyx is an open source project and will always have a fully open-source core available to the community.
This Contributor License Agreement (CLA) exists to ensure the long-term sustainability of the project. It allows Nyx to evolve over time, including improving, distributing, and potentially offering commercial versions or services that support continued development.
**You retain ownership of your contributions.** This agreement simply grants the project the rights needed to use and evolve them.
---
Thank you for your interest in contributing to Nyx (the "Project"). This Contributor License Agreement ("Agreement") clarifies the intellectual property rights granted with each Contribution from any person or entity. It is for Your protection as a contributor as well as the protection of the Project and its users.
By submitting a Contribution to the Project, You accept and agree to the terms below. If You do not agree to these terms, please do not submit Contributions.
## 1. Definitions
**"You"** (or **"Your"**) means the individual or legal entity making a Contribution to the Project. For a legal entity, "You" includes the entity and any entity that controls, is controlled by, or is under common control with that entity.
**"Contribution"** means any work of authorship, including any modifications or additions to an existing work, that is intentionally submitted by You to the Project for inclusion in, or documentation of, the Project. "Submitted" means any form of electronic, verbal, or written communication sent to the Project — including but not limited to pull requests, patches, and issue comments — but excluding communication that is conspicuously marked or otherwise designated in writing by You as "Not a Contribution."
## 2. Copyright License Grant
Subject to the terms of this Agreement, You hereby grant to the Project, to any entity that maintains or succeeds it, and to recipients of software distributed by the Project a perpetual, worldwide, non-exclusive, royalty-free, irrevocable copyright license, with the right to sublicense through multiple tiers of sublicensees, to reproduce, prepare derivative works of, publicly display, publicly perform, distribute, and sublicense Your Contribution and such derivative works.
## 3. Patent License Grant
Subject to the terms of this Agreement, You hereby grant to the Project, to any entity that maintains or succeeds it, and to recipients of software distributed by the Project a perpetual, worldwide, non-exclusive, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer Your Contribution and any combination of Your Contribution with the Project to which it was submitted. This patent license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution alone or by combination of Your Contribution with the Project.
If any entity institutes patent litigation against You or any other entity (including a cross-claim or counterclaim in a lawsuit) alleging that Your Contribution, or the Project to which You have contributed, constitutes direct or contributory patent infringement, then any patent licenses granted to that entity under this Agreement for that Contribution or Project shall terminate as of the date such litigation is filed.
## 4. Relicensing Right
In addition to the licenses granted in Sections 2 and 3, You grant the Project and any entity that maintains or succeeds it the right to relicense Your Contribution, in whole or in part, under terms other than the Project's current license (currently GPL-3.0-or-later), where necessary to support the long-term sustainability, distribution, and evolution of the Project.
This may include, without limitation:
1. Dual-licensing the Project under a commercial license;
2. Combining Your Contribution with proprietary components; or
3. Moving the Project to a different open source license.
This right is irrevocable and may be exercised by the Project's maintainers as part of maintaining and evolving the Project.
## 5. Moral Rights Waiver
To the maximum extent permitted by applicable law, You waive, and agree not to assert, any moral rights or similar rights of attribution and integrity that You may have in Your Contribution against the Project, its successors, and recipients of software distributed by the Project. To the extent such rights cannot be waived under applicable law, You agree not to enforce them in a manner that would limit the rights granted under this Agreement.
## 6. Representations
You represent that:
1. Each of Your Contributions is Your original creation, or You otherwise have the legal right to submit it under the terms of this Agreement;
2. To the best of Your knowledge, Your Contribution does not infringe any third party's copyright, patent, trade secret, or other intellectual property rights; and
3. You have the legal authority to enter into this Agreement and to grant the licenses set forth above.
If any portion of Your Contribution is not Your original creation, You will identify the source and any license or other restriction applicable to that material as part of Your submission.
## 7. Employer Authorization
If You are submitting a Contribution on behalf of Your employer, or the Contribution was made within the scope of Your employment, You represent that Your employer has authorized You to make the Contribution and to grant the licenses set forth in this Agreement. If You are unsure, please confirm with Your employer before submitting.
## 8. No Warranty
You provide Your Contributions on an "AS IS" basis, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties of title, non-infringement, merchantability, or fitness for a particular purpose. You are not required to provide support for Your Contributions, except to the extent You desire to provide such support.
## 9. Copyright Retained
You retain copyright to Your Contribution. This Agreement grants the licenses set forth above; it does not transfer ownership. Its purpose is to give the Project flexibility to evolve and to relicense the codebase over time without needing to obtain permission from each past contributor on a case-by-case basis.
## 10. Notice of Changes
If You become aware of any facts or circumstances that would make any representation in this Agreement inaccurate in any respect, You agree to notify the Project promptly.


@ -25,7 +25,7 @@ Please read our [Code of Conduct](CODE_OF_CONDUCT.md) before participating.
### Prerequisites
- **Rust 1.85+** (edition 2024)
- **Rust 1.88+** (edition 2024)
- Git
### Building
@ -284,6 +284,8 @@ cargo clippy --all -- -D warnings
## Pull Request Guidelines
First-time contributors are welcome. If you are unsure where to start, open an issue and we can help identify a focused starter task.
1. **Branch from `master`**. Use descriptive branch names: `feat/add-kotlin-support`, `fix/false-positive-sql-concat`, `docs/update-rule-reference`.
2. **Keep PRs focused**. One logical change per PR.
@ -306,6 +308,8 @@ cargo clippy --all -- -D warnings
6. **Include test cases** for any new detection rules.
7. **Disclose material AI assistance** in the PR description if the change was drafted, generated, or substantially refactored by an AI tool. One line is enough. See [AI-POLICY.md](AI-POLICY.md) for the full policy and the bar we hold AI-assisted contributions to.
---
## Bug Reports
@ -348,4 +352,20 @@ Please do **not** open public issues for security-sensitive bugs. See [SECURITY.
## License
By contributing to Nyx, you agree that your contributions will be licensed under the [GPL-3.0](./LICENSE).
### Contributions are released under GPL-3.0-or-later
By submitting a pull request, patch, or other contribution to Nyx, you agree that your contribution will be released under the [GPL-3.0-or-later](./LICENSE), the same license as the project.
### Developer Certificate of Origin
We use the Developer Certificate of Origin (DCO) as a lightweight baseline for contributions. All commits must include a `Signed-off-by:` trailer, which certifies that you wrote the code yourself or otherwise have the right to submit it under the project license.
Use `git commit -s` to add this automatically.
### Contributor License Agreement
Before your first contribution can be merged, you must sign the Nyx [Contributor License Agreement](./CLA.md).
The CLA does not transfer ownership of your work. You retain copyright to your contributions. It grants Nyx the rights needed to maintain, distribute, and evolve the project over time, including the flexibility to support long-term sustainability through future licensing or commercial offerings.
If you do not agree to these terms, please do not submit contributions to Nyx.

921
Cargo.lock generated

File diff suppressed because it is too large.


@ -1,10 +1,11 @@
[package]
name = "nyx-scanner"
version = "0.4.0"
version = "0.5.0"
edition = "2024"
description = "A CLI security scanner for automating vulnerability checks"
license = "GPL-3.0"
authors = ["Eli Peter <elicpeter@exmaple.com>"]
rust-version = "1.88"
description = "A multi-language static analysis tool for detecting security vulnerabilities"
license = "GPL-3.0-or-later"
authors = ["Eli Peter <elicpeter@example.com>"]
homepage = "https://github.com/elicpeter/nyx"
repository = "https://github.com/elicpeter/nyx"
documentation = "https://github.com/elicpeter/nyx/tree/master/docs"
@ -14,18 +15,34 @@ readme = "README.md"
default-run = "nyx"
exclude = [
"assets/",
"frontend/node_modules/",
".github/",
"CLAUDE.md",
".claude/",
".idea/",
"tests/",
"benches/",
"examples/",
"docs/",
".DS_Store",
".nyx/",
".z3-trace",
"target/",
"book/",
]
autoexamples = false
[features]
default = ["serve"]
serve = ["dep:axum", "dep:tokio", "dep:tokio-stream", "dep:tower-http"]
smt = ["dep:z3", "z3/bundled"]
smt-system-z3 = ["dep:z3"]
# Build switch for the internal `nyx-docgen` tool. Empty on purpose: it
# only gates the [[bin]] target so consumers of `cargo install nyx-scanner`
# don't pick up the docgen binary. Maintainers run it via
# `cargo run --features docgen --bin nyx-docgen`.
docgen = []
[lib]
name = "nyx_scanner"
path = "src/lib.rs"
@ -34,6 +51,11 @@ path = "src/lib.rs"
name = "nyx"
path = "src/main.rs"
[[bin]]
name = "nyx-docgen"
path = "tools/docgen/main.rs"
required-features = ["docgen"]
[[bench]]
name = "scan_bench"
harness = false
@ -44,6 +66,7 @@ criterion = { version = "0.8", features = ["html_reports"] }
assert_cmd = "2"
predicates = "3"
glob = "0.3"
tower = { version = "0.5", features = ["util"] }
[dependencies]
directories = "6.0.0"
@ -54,8 +77,8 @@ toml = "1.0.3"
tracing-subscriber = { version = "0.3.22", features = ["env-filter", "json", "ansi","time"] }
tracing = "0.1.44"
num_cpus = "1.17.0"
rusqlite = { version = "0.38.0", features = ["bundled"] }
r2d2_sqlite = { version = "0.32.0", features = ["bundled"] }
rusqlite = { version = "0.39.0", features = ["bundled"] }
r2d2_sqlite = { version = "0.33.0", features = ["bundled"] }
ignore = "0.4.25"
tree-sitter = "0.26.6"
tree-sitter-rust = "0.24.0"
@ -76,11 +99,24 @@ terminal_size = "0.4"
rayon = "1.11.0"
r2d2 = "0.8.10"
bytesize = "2.3.1"
chrono = { version = "0.4.44", default-features = false, features = ["std", "clock"] }
chrono = { version = "0.4.44", default-features = false, features = ["std", "clock", "serde"] }
thiserror = "2.0.18"
dashmap = "7.0.0-rc2"
petgraph = "0.8.3"
dashmap = "6.1.0"
parking_lot = "0.12"
petgraph = { version = "0.8.3", features = ["serde-1"] }
bitflags = "2.11.0"
phf = { version = "0.13.1", features = ["macros"] }
indicatif = "0.18.4"
smallvec = "1.15"
smallvec = { version = "1.15", features = ["serde"] }
uuid = { version = "1", features = ["v4"] }
axum = { version = "0.8", optional = true }
tokio = { version = "1", features = ["rt-multi-thread", "macros", "signal", "sync"], optional = true }
tokio-stream = { version = "0.1", features = ["sync"], optional = true }
tower-http = { version = "0.6", features = ["cors", "compression-gzip", "trace", "set-header", "limit"], optional = true }
z3 = { version = "0.20.0", optional = true}
[profile.release]
lto = true
codegen-units = 1
debug = 1
strip = "none"

497
README.md

@ -1,405 +1,238 @@
<div align="center">
<img src="assets/logo.png" alt="nyx logo" width="300"/>
<img src="assets/nyx-wordmark.svg" alt="nyx" height="110"/>
**Fast, cross-language cli vulnerability scanner.**
**A local-first security scanner with a browser UI. Scan your repo and triage in your browser, with no cloud and no account.**
[![crates.io](https://img.shields.io/crates/v/nyx-scanner.svg)](https://crates.io/crates/nyx-scanner)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Rust 1.85+](https://img.shields.io/badge/rust-1.85%2B-orange)](https://www.rust-lang.org)
[![Rust 1.88+](https://img.shields.io/badge/rust-1.88%2B-orange)](https://www.rust-lang.org)
[![CI](https://img.shields.io/github/actions/workflow/status/elicpeter/nyx/ci.yml?branch=master)](https://github.com/elicpeter/nyx/actions)
</div>
---
## What is Nyx?
**Nyx** is a lightweight, lightning-fast Rust-native command-line tool that detects security vulnerabilities across 10 programming languages. It combines [`tree-sitter`](https://tree-sitter.github.io/) parsing, intra-procedural control-flow graphs, and cross-file taint analysis with an optional SQLite-backed index to deliver deep, repeatable scans on projects of any size.
<p align="center"><img src="assets/screenshots/demo.gif" alt="Nyx UI walkthrough: scan, browse findings, inspect flow path, triage" width="900"/></p>
---
## Key Capabilities
## Scan locally, browse locally
| Capability | Description |
Nyx runs a cross-language taint analysis on your repository, then serves the results to a React UI bound to `127.0.0.1`. You get a finding list with severity, evidence, and a step-by-step **flow visualiser** that walks the dataflow from source → sanitizer → sink. Triage decisions persist to `.nyx/triage.json`, which commits alongside your code so the team shares one triage state.
```bash
cargo install nyx-scanner
nyx scan # runs the analyzer, caches findings in .nyx/
nyx serve # opens http://localhost:9700 in your browser
```
Everything stays on your machine: loopback-only bind, host-header enforcement, CSRF on every mutation, no telemetry, no login.
<p align="center"><img src="assets/screenshots/overview.png" alt="Overview dashboard after two scans: 2 findings remaining (down from 5), 3 fixed, a findings-over-time line trending down, plus severity/language/category breakdowns and top affected files" width="900"/></p>
---
## What's in the UI
| Page | What it shows |
|---|---|
| Multi-language support | Rust, C, C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript |
| AST-level pattern matching | Language-specific queries written against precise parse trees |
| Control-flow graph analysis | Auth gaps, unguarded sinks, unreachable security code, resource leaks, error fallthrough |
| Cross-file taint tracking | Monotone forward dataflow taint analysis from sources through sanitizers to sinks with function summaries |
| Cross-language interop | Taint flows across language boundaries via explicit interop edges |
| Two-pass architecture | Pass 1 extracts function summaries; Pass 2 runs taint with full cross-file context |
| Incremental indexing | SQLite database stores file hashes, summaries, and findings to skip unchanged files |
| Parallel execution | File walking and analysis run concurrently via Rayon; scales with available CPU cores |
| Configurable analysis rules | Define custom sources, sanitizers, sinks, terminators, and event handlers per language via TOML config or CLI |
| Configurable scan parameters | Exclude directories, set maximum file size, tune worker threads, limit output, and more |
| Multiple output formats | Console (default), JSON, and SARIF 2.1.0 for CI integration |
| Progress reporting | Real-time progress bars for file discovery and analysis passes |
| **Overview** | Dashboard: finding counts by severity, top offenders, engine profile summary |
| **Findings** | Browsable list with severity badges, triage status, rule filter, language filter |
| **Finding detail** | Flow-path visualiser with numbered steps (source → sanitizer → sink), code snippets, evidence, cross-file markers, triage dropdown |
| **Triage** | Bulk update states (open, investigating, fixed, false_positive, accepted_risk, suppressed), audit trail, import/export JSON |
| **Explorer** | File tree with per-file symbol list and finding overlay |
| **Scans** | Run history, metrics, diff two scans to see what changed |
| **Rules** | Built-in and custom rules per language; add rules from the UI |
| **Config** | Live config editor; reload without restart |
`nyx serve` flags: `--port <N>` (default `9700`), `--host <addr>` (loopback only: `127.0.0.1`, `localhost`, or `::1`), `--no-browser`. See `[server]` in `nyx.conf` for persistent settings, and [`docs/serve.md`](docs/serve.md) for the page-by-page UI tour and security model.
---
## Why choose Nyx?
## CLI for CI
| Advantage | What it means for you |
|---|---|
| **Pure-Rust, single binary** | No JVM, Python, or server to install; drop the `nyx` executable into your `$PATH` and go. |
| **Massively parallel** | Uses Rayon and a thread-pool walker; scales to all CPU cores. Scanning the entire **rust-lang/rust** codebase (~53,000 files) on an M2 MacBook Pro takes **~1 s**. |
| **Deep analysis** | Real CFG construction and monotone dataflow taint analysis with guaranteed termination, not just regex matching. Cross-file function summaries, capability-based sanitizer tracking, and scored findings. |
| **Index-aware** | An optional SQLite index stores file hashes and findings; subsequent scans touch *only* changed files, slashing CI times. |
| **Offline & privacy-friendly** | Requires no login, cloud account, or telemetry. Perfect for air-gapped environments and strict compliance policies. |
| **Tree-sitter precision** | Parses real language grammars, not regexes, giving far fewer false positives than line-based scanners. |
| **Extensible** | Add new patterns with concise `tree-sitter` queries; no SaaS lock-in. |
The same engine runs headless for CI pipelines. SARIF output uploads directly to GitHub Code Scanning.
<p align="center"><img src="assets/screenshots/cli-scan.png" alt="nyx scan console output: HIGH taint findings across a JS and Python file with source → sink arrows" width="820"/></p>
```bash
# Fail the job on medium or higher, emit SARIF
nyx scan --format sarif --fail-on MEDIUM > results.sarif
# Ad-hoc JSON, no index
nyx scan ./server --format json --index off
# AST patterns only (fastest; skips CFG + taint)
nyx scan --mode ast
# Engine-depth shortcut: fast | balanced (default) | deep
# `deep` adds symex + demand-driven backwards taint for higher precision at ~2-3× cost
nyx scan --engine-profile deep
```
Forward cross-file taint runs in every profile. Symex and the demand-driven backwards walk are opt-in. Turn them on either via `--engine-profile deep`, or individually (`--symex`, `--backwards-analysis`). See [`docs/cli.md`](docs/cli.md#engine-depth-profile) for the full toggle matrix.
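The `--fail-on MEDIUM` gate in the commands above can be pictured as a threshold comparison over an ordered severity scale. The following is a minimal sketch of that idea, not Nyx's actual implementation: the `Severity` enum, `parse_severity`, and `should_fail` are hypothetical names, and the only facts taken from the document are the three levels High, Medium, and Low.

```rust
/// Severity levels named in the document, declared so that the derived
/// ordering is Low < Medium < High.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
}

/// Hypothetical helper: parse a threshold such as "MEDIUM" (case-insensitive).
fn parse_severity(s: &str) -> Option<Severity> {
    match s.to_ascii_lowercase().as_str() {
        "low" => Some(Severity::Low),
        "medium" => Some(Severity::Medium),
        "high" => Some(Severity::High),
        _ => None,
    }
}

/// Returns true when at least one finding meets or exceeds the threshold,
/// which is when a CI job gated on that threshold would fail.
fn should_fail(findings: &[Severity], threshold: Severity) -> bool {
    findings.iter().any(|&s| s >= threshold)
}
```

Under this reading, a scan that reports only Low findings passes a `MEDIUM` gate, while a single High finding fails it.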
### GitHub Action
```yaml
- uses: elicpeter/nyx@v0.5.0
with:
format: sarif
fail-on: MEDIUM
- uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: nyx-results.sarif
```
Inputs: `path`, `version`, `format` (`sarif`|`json`|`console`), `fail-on`, `args`, `token`. Outputs: `finding-count`, `sarif-file`, `exit-code`, `nyx-version`. Linux and macOS runners (x86_64, ARM64).
---
## Installation
## Install
### Install crate
**Cargo (recommended):**
```bash
$ cargo install nyx-scanner
cargo install nyx-scanner
```
### Install Github release
1. Navigate to the [Releases](https://github.com/elicpeter/nyx/releases) page of the repository.
2. Download the appropriate binary for your system:
```nyx-x86_64-unknown-linux-gnu.zip``` for Linux
```nyx-x86_64-pc-windows-msvc.zip``` for Windows
```nyx-x86_64-apple-darwin.zip``` or ```nyx-aarch64-apple-darwin.zip``` for macOS (Intel or Apple Silicon)
3. Unzip the file and move the executable to a directory in your system PATH:
```bash
# Example for Unix systems
unzip nyx-x86_64-unknown-linux-gnu.zip
chmod +x nyx
sudo mv nyx /usr/local/bin/
```
```bash
# Example for Windows in PowerShell
Expand-Archive -Path nyx-x86_64-pc-windows-msvc.zip -DestinationPath .
Move-Item -Path .\nyx.exe -Destination "C:\Program Files\Nyx\" # Add to PATH manually if needed
```
4. Verify the installation:
```bash
nyx --version
```
### Build from source
**Pre-built binaries:** Grab the archive for your platform from [Releases](https://github.com/elicpeter/nyx/releases), verify against `SHA256SUMS` (and the detached `SHA256SUMS.asc` GPG signature, when present), unzip, and drop `nyx` on your `PATH`.
```bash
$ git clone https://github.com/elicpeter/nyx.git
$ cd nyx
$ cargo build --release
# optional: copy the binary into PATH
$ cargo install --path .
# Optional: verify the checksum file's GPG signature (when SHA256SUMS.asc is published)
gpg --verify SHA256SUMS.asc SHA256SUMS
sha256sum -c SHA256SUMS --ignore-missing
unzip nyx-x86_64-unknown-linux-gnu.zip && chmod +x nyx && sudo mv nyx /usr/local/bin/
```
Nyx targets **stable Rust 1.85 or later**.
**From source:**
```bash
git clone https://github.com/elicpeter/nyx.git
cd nyx && cargo build --release
```
Requires stable Rust 1.88+. The frontend is compiled and embedded in the binary at build time, so there is no separate install step for `nyx serve`.
---
## Quick Start
## Languages
```bash
# Scan the current directory (creates/uses an index automatically)
$ nyx scan
All 10 languages parse via tree-sitter and run through the full pipeline, but rule depth is uneven. Tiers reflect benchmark F1 on the 305-case corpus at [`tests/benchmark/ground_truth.json`](tests/benchmark/ground_truth.json):
# Scan a specific path and emit JSON
$ nyx scan ./server --format json
# Emit SARIF 2.1.0 for CI integration (GitHub Code Scanning, etc.)
$ nyx scan --format sarif > results.sarif
# Perform an ad-hoc scan without touching the index
$ nyx scan --index off
# Restrict results to high-severity findings
$ nyx scan --severity HIGH
# Filter by severity expression (high and medium)
$ nyx scan --severity ">=MEDIUM"
# AST pattern matching only (fastest, no CFG/taint)
$ nyx scan --mode ast
# CFG + taint analysis only (skip AST pattern rules)
$ nyx scan --mode cfg
# CI gate: fail on medium+, SARIF output
$ nyx scan --format sarif --fail-on MEDIUM > results.sarif
# Suppress status messages (for CI/scripting)
$ nyx scan --quiet --format json
# Include test/vendor/benchmark paths at original severity
# (by default these are downgraded one tier)
$ nyx scan --keep-nonprod-severity
```
### Index Management
```bash
# Create or rebuild an index
$ nyx index build [PATH] [--force]
# Display index metadata (size, modified date, etc.)
$ nyx index status [PATH]
# List all indexed projects (add -v for detailed view)
$ nyx list [-v]
# Remove a single project or purge all indexes
$ nyx clean <PROJECT_NAME>
$ nyx clean --all
```
### Configuration Management
```bash
# Print the effective merged configuration
$ nyx config show
# Print the config directory path
$ nyx config path
# Add a custom sanitizer rule (written to nyx.local)
$ nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
# Add a terminator function
$ nyx config add-terminator --lang javascript --name process.exit
```
---
## Analysis Modes
Nyx supports four analysis modes, selectable via `--mode` or the `scanner.mode` config option:
| Mode | CLI flag | What runs |
|---|---|---|
| **Full** (default) | `--mode full` | AST pattern matching + CFG construction + taint analysis |
| **AST-only** | `--mode ast` | AST pattern matching only; skips CFG and taint entirely |
| **CFG** | `--mode cfg` | CFG + taint analysis only; filters out AST pattern findings |
| **Taint** | `--mode taint` | Alias for `cfg` (CFG + taint analysis) |
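The mode table above can be read as a mapping from flag value to the stages that contribute findings. A small sketch of that mapping follows; `Mode`, `parse_mode`, and `stages` are hypothetical names chosen for illustration, and only the four flag values and the `taint`-is-an-alias-for-`cfg` rule come from the table.

```rust
/// The three distinct behaviours behind the four `--mode` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Full,
    Ast,
    Cfg,
}

/// Hypothetical parser: `taint` is treated as an alias for `cfg`,
/// as the table states.
fn parse_mode(flag: &str) -> Option<Mode> {
    match flag {
        "full" => Some(Mode::Full),
        "ast" => Some(Mode::Ast),
        "cfg" | "taint" => Some(Mode::Cfg),
        _ => None,
    }
}

/// Returns (reports AST pattern findings, runs CFG + taint analysis).
fn stages(mode: Mode) -> (bool, bool) {
    match mode {
        Mode::Full => (true, true),
        Mode::Ast => (true, false),
        Mode::Cfg => (false, true),
    }
}
```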
### What the CFG + taint engine detects
| Finding | Rule ID | Description |
|---|---|---|
| Tainted data flow | `taint-*` | Untrusted data (env vars, user input, file reads) flowing to dangerous sinks (shell exec, SQL, file write) without matching sanitization |
| Unguarded sink | `cfg-unguarded-sink` | Sink calls not dominated by a guard or sanitizer on the control-flow path |
| Auth gap | `cfg-auth-gap` | Web handler functions that reach privileged sinks without an auth check |
| Unreachable security code | `cfg-unreachable-*` | Sanitizers, guards, or sinks in dead code branches |
| Error fallthrough | `cfg-error-fallthrough` | Error-handling branches that don't terminate, allowing execution to fall through to dangerous operations |
| Resource leak | `cfg-resource-leak` | Resources acquired but not released on all exit paths (malloc/free, fopen/fclose, Lock/Unlock) |
| Use-after-close | `state-use-after-close` | Variable read/written after its resource handle was closed |
| Double-close | `state-double-close` | Resource handle closed more than once |
| Must-leak | `state-resource-leak` | Resource acquired but never closed on any exit path |
| May-leak | `state-resource-leak-possible` | Resource open on some but not all exit paths |
| Unauthenticated access | `state-unauthed-access` | Sensitive sink reached without a preceding auth/admin check |
### Attack Surface Ranking
Every finding is assigned a deterministic **attack-surface score** that estimates exploitability using only information already in memory — no extra source passes are needed. Findings are sorted by descending score before truncation, so `max_results` always keeps the most important results.
The score is the sum of five components:
| Component | Weight | Description |
|---|---|---|
| **Severity base** | High = 60, Medium = 30, Low = 10 | Primary ordering signal. Severity reflects source-kind exploitability and rule confidence. |
| **Analysis kind** | taint = +10, state = +8, cfg = +3/+5, ast = 0 | Taint-confirmed flows are the strongest signal; AST-only pattern matches rank lowest at equal severity. CFG findings with evidence get +5, without get +3. |
| **Evidence strength** | +1 per evidence item (max 4), +2 to +6 for source kind | More evidence increases confidence. Source-kind priority: user input (+6) > env/config (+5) > unknown (+4) > file system (+3) > database (+2). |
| **State rule type** | +1 to +6 | Use-after-close and unauthenticated access (+6) rank above double-close (+3), must-leak (+2), and may-leak (+1). |
| **Path validation** | -5 | Findings on paths guarded by a validation predicate receive a small exploitability penalty; the guard may prevent triggering. |
**Score ranges** (approximate):
| Finding type | Score |
|---|---|
| High taint + user input | ~78 |
| High state (use-after-close) | ~74 |
| High CFG structural | ~63 |
| Medium taint + env source | ~47 |
| Medium state (resource leak) | ~40 |
| Low AST-only pattern | ~10 |
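To make the arithmetic behind these ranges concrete, here is a sketch that sums the five components from the weights table. It is an illustration under stated assumptions rather than the project's scoring code: all type and field names are hypothetical, the path-validation entry is assumed to be a 5-point penalty, and a finding with no recorded source kind is assumed to get no source bonus.

```rust
#[derive(Clone, Copy)]
enum Severity { High, Medium, Low }

#[derive(Clone, Copy)]
enum Kind { Taint, State, Cfg { has_evidence: bool }, Ast }

#[derive(Clone, Copy)]
enum SourceKind { UserInput, EnvConfig, Unknown, FileSystem, Database }

#[derive(Clone, Copy)]
enum StateRule { UseAfterClose, UnauthedAccess, DoubleClose, MustLeak, MayLeak }

/// Hypothetical finding record holding only what the score needs.
struct Finding {
    severity: Severity,
    kind: Kind,
    evidence_items: u32,
    source_kind: Option<SourceKind>,
    state_rule: Option<StateRule>,
    path_validated: bool,
}

/// Sum of the five components listed in the weights table.
fn score(f: &Finding) -> i32 {
    let base = match f.severity {
        Severity::High => 60,
        Severity::Medium => 30,
        Severity::Low => 10,
    };
    let kind = match f.kind {
        Kind::Taint => 10,
        Kind::State => 8,
        Kind::Cfg { has_evidence: true } => 5,
        Kind::Cfg { has_evidence: false } => 3,
        Kind::Ast => 0,
    };
    // +1 per evidence item, capped at 4.
    let evidence = f.evidence_items.min(4) as i32;
    let source = match f.source_kind {
        Some(SourceKind::UserInput) => 6,
        Some(SourceKind::EnvConfig) => 5,
        Some(SourceKind::Unknown) => 4,
        Some(SourceKind::FileSystem) => 3,
        Some(SourceKind::Database) => 2,
        None => 0, // assumption: no bonus when no source kind is recorded
    };
    let state = match f.state_rule {
        Some(StateRule::UseAfterClose) | Some(StateRule::UnauthedAccess) => 6,
        Some(StateRule::DoubleClose) => 3,
        Some(StateRule::MustLeak) => 2,
        Some(StateRule::MayLeak) => 1,
        None => 0,
    };
    // Assumption: the path-validation component is a 5-point penalty.
    let penalty = if f.path_validated { 5 } else { 0 };
    base + kind + evidence + source + state - penalty
}
```

With these weights, a High taint finding from user input with two evidence items sums to 60 + 10 + 2 + 6 = 78, and a High use-after-close state finding with no evidence items sums to 60 + 8 + 6 = 74, in line with the approximate ranges in the table.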
Tie-breaking is deterministic: severity → rule ID → file path → line → column → message hash. The same set of findings always produces the same ordering regardless of parallelism or input order.
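A deterministic ordering like the one described can be expressed as a chained comparator. The sketch below is illustrative only: the `Ranked` struct and its fields are hypothetical, and the direction of each tie-break key (more severe first, then ascending rule ID, path, line, column, and hash) is an assumption, since the text names the keys but not their direction.

```rust
use std::cmp::Ordering;

/// Hypothetical record carrying the score and the tie-break keys.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Ranked {
    score: i32,
    severity: u8, // assumption: larger number means more severe
    rule_id: String,
    path: String,
    line: u32,
    column: u32,
    message_hash: u64,
}

/// Descending score, then the tie-break chain from the text.
fn rank_order(a: &Ranked, b: &Ranked) -> Ordering {
    b.score
        .cmp(&a.score)
        .then_with(|| b.severity.cmp(&a.severity))
        .then_with(|| a.rule_id.cmp(&b.rule_id))
        .then_with(|| a.path.cmp(&b.path))
        .then_with(|| a.line.cmp(&b.line))
        .then_with(|| a.column.cmp(&b.column))
        .then_with(|| a.message_hash.cmp(&b.message_hash))
}

/// Sort first, truncate second, so the cap keeps the top-scored findings.
fn rank(mut findings: Vec<Ranked>, max_results: usize) -> Vec<Ranked> {
    findings.sort_by(rank_order);
    findings.truncate(max_results);
    findings
}
```

Because every key takes part in the comparison, two runs that discover the same findings in a different order produce the same list after sorting.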
Ranking is enabled by default. Disable it with `--no-rank` or `output.attack_surface_ranking = false` in config. When disabled, `rank_score` is omitted from JSON/SARIF output.
---
## Supported Languages
All 10 languages have full AST pattern matching and CFG/taint analysis. Resource leak detection is available where language-specific acquire/release pairs are defined.
| Language | AST Patterns | CFG + Taint | Resource Leaks |
| Tier | Languages | F1 | Use as a CI gate? |
|---|---|---|---|
| Rust | Yes | Yes | Yes |
| C | Yes | Yes | Yes |
| C++ | Yes | Yes | Yes |
| Java | Yes | Yes | Yes |
| Go | Yes | Yes | Yes |
| PHP | Yes | Yes | Yes |
| Python | Yes | Yes | Yes |
| Ruby | Yes | Yes | Yes |
| TypeScript | Yes | Yes | Yes |
| JavaScript | Yes | Yes | Yes |
| **Stable** | Python, JavaScript, TypeScript | 96.8% to 100% | Yes |
| **Beta** | Go, Java, Ruby, PHP | 92.9% to 97.0% | Yes, with light FP triage |
| **Preview** | C, C++ | 88.9% to 92.3% | No. Pair with clang-tidy or Clang Static Analyzer |
| **Experimental** | Rust | 86.4% | Review findings, don't block merges |
Per-dimension detail and known blind spots live in [`docs/language-maturity.md`](docs/language-maturity.md).
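For readers unfamiliar with the metric behind the tiers, F1 is the harmonic mean of precision and recall, which simplifies to 2·TP / (2·TP + FP + FN). The sketch below computes it from raw counts; the function name and the counts in the usage note are hypothetical, and how the benchmark attributes true and false positives per case is not described here.

```rust
/// F1 score from true positives, false positives, and false negatives.
/// Returns None when all counts are zero and the score is undefined.
fn f1(tp: u64, fp: u64, fn_: u64) -> Option<f64> {
    let denom = 2 * tp + fp + fn_;
    if denom == 0 {
        return None;
    }
    Some((2 * tp) as f64 / denom as f64)
}
```

For example, 9 true positives with 1 false positive and 1 false negative give 18 / 20, an F1 of 90%.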
### Validated against real CVEs
The corpus also holds a small set of vulnerable/patched pairs extracted from published advisories, so the benchmark floor is backed by regression tests on real bugs, not only synthetic analogues. For each pair, Nyx fires on the vulnerable file and emits zero findings on the patched file.
| CVE | Project | Language | Class |
|---|---|---|---|
| [CVE-2023-48022](https://nvd.nist.gov/vuln/detail/CVE-2023-48022) | Ray | Python | Command injection |
| [CVE-2017-18342](https://nvd.nist.gov/vuln/detail/CVE-2017-18342) | PyYAML | Python | Deserialization |
| [CVE-2019-14939](https://nvd.nist.gov/vuln/detail/CVE-2019-14939) | mongo-express | JavaScript | Code execution (`eval`) |
| [CVE-2023-26159](https://nvd.nist.gov/vuln/detail/CVE-2023-26159) | follow-redirects | TypeScript | SSRF |
| [CVE-2022-30323](https://nvd.nist.gov/vuln/detail/CVE-2022-30323) | hashicorp/go-getter | Go | Command injection |
| [CVE-2015-7501](https://nvd.nist.gov/vuln/detail/CVE-2015-7501) | Apache Commons Collections | Java | Deserialization |
| [CVE-2013-0156](https://nvd.nist.gov/vuln/detail/CVE-2013-0156) | Ruby on Rails | Ruby | Deserialization |
| [CVE-2017-9841](https://nvd.nist.gov/vuln/detail/CVE-2017-9841) | PHPUnit | PHP | Code execution (`eval`) |
| [CVE-2018-15133](https://nvd.nist.gov/vuln/detail/CVE-2018-15133) | Laravel | PHP | Deserialization |
| [CVE-2016-3714](https://nvd.nist.gov/vuln/detail/CVE-2016-3714) | ImageMagick (ImageTragick) | C | Command injection |
| [CVE-2019-18634](https://nvd.nist.gov/vuln/detail/CVE-2019-18634) | sudo (pwfeedback) | C | Memory safety |
| [CVE-2019-13132](https://nvd.nist.gov/vuln/detail/CVE-2019-13132) | ZeroMQ libzmq | C++ | Memory safety |
| [CVE-2022-1941](https://nvd.nist.gov/vuln/detail/CVE-2022-1941) | Protocol Buffers | C++ | Memory safety |
| [CVE-2017-12629](https://nvd.nist.gov/vuln/detail/CVE-2017-12629) | Apache Solr | Java | Command injection |
Fixtures live under [`tests/benchmark/cve_corpus/`](tests/benchmark/cve_corpus/) with upstream attribution headers.
---
## Configuration Overview
## How it works
Nyx merges a default configuration file (`nyx.conf`) with user overrides (`nyx.local`). Both live in the platform-specific configuration directory shown below.
Two passes over the filesystem, with an optional SQLite index to skip unchanged files:
| Platform | Directory |
|---|---|
| Linux | `~/.config/nyx/` |
| macOS | `~/Library/Application Support/nyx/` |
| Windows | `%APPDATA%\elicpeter\nyx\config\` |
1. **Pass 1**: parse each file via tree-sitter, build an intra-procedural CFG (petgraph), lower to pruned SSA (Cytron phi insertion over dominance frontiers), and export per-function summaries (source/sanitizer/sink caps, taint transforms, points-to, callees).
2. **Summary merge**: union all per-file summaries into a `GlobalSummaries` map.
3. **Pass 2**: re-analyze each file with cross-file context under bounded context sensitivity (k=1 inlining for intra-file callees, SCC fixpoint capped at 64 iterations, and summary fallback for callees above the inline body-size cap). A forward dataflow worklist propagates taint through the SSA lattice with guaranteed convergence. Call-graph SCCs iterate to fixed-point (within the cap) so mutually recursive functions get accurate summaries.
4. **Rank, dedupe, emit**: findings are scored by severity × evidence strength × source-kind exploitability, then emitted to console, JSON, or SARIF.
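The forward worklist in Pass 2 can be illustrated on a toy graph. This is a deliberately simplified sketch and not the engine itself: it works on plain nodes rather than SSA, has no summaries or iteration cap, and every name in it (`Node`, `transfer`, `tainted_sinks`, the `HTML` and `SHELL` bits) is hypothetical. What it does show is why convergence is guaranteed: taint is a bitmask of capabilities, the join is bitwise OR, each transfer function is monotone, and the lattice is finite.

```rust
use std::collections::VecDeque;

// Hypothetical capability bits.
const HTML: u8 = 0b01;
const SHELL: u8 = 0b10;

#[derive(Clone, Copy)]
enum Node {
    Source(u8),    // introduces taint with these caps
    Sanitizer(u8), // clears these caps
    Sink(u8),      // dangerous if any of these caps is still tainted
    Pass,          // forwards taint unchanged
}

/// Monotone transfer function: OR or AND with a constant mask.
fn transfer(node: Node, input: u8) -> u8 {
    match node {
        Node::Source(caps) => input | caps,
        Node::Sanitizer(caps) => input & !caps,
        Node::Sink(_) | Node::Pass => input,
    }
}

/// Propagates taint to a fixed point and returns the indices of sinks
/// reached by a capability they are sensitive to.
fn tainted_sinks(nodes: &[Node], succs: &[Vec<usize>]) -> Vec<usize> {
    let mut input = vec![0u8; nodes.len()];
    let mut work: VecDeque<usize> = (0..nodes.len()).collect();
    while let Some(n) = work.pop_front() {
        let out = transfer(nodes[n], input[n]);
        for &s in &succs[n] {
            let joined = input[s] | out; // join = bitwise OR
            if joined != input[s] {
                input[s] = joined;
                work.push_back(s); // re-queue only when the state grew
            }
        }
    }
    (0..nodes.len())
        .filter(|&i| matches!(nodes[i], Node::Sink(req) if input[i] & req != 0))
        .collect()
}
```

A node is re-queued only when its input gains a bit, and a `u8` has at most eight bits to gain, so the loop terminates even when the graph contains cycles.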
Minimal example (`nyx.local`):
Detector families: taint (cross-file source→sink), CFG structural (auth gaps, unguarded sinks, resource leaks), state model (use-after-close, double-close, must-leak, unauthed-access), AST patterns (tree-sitter structural match). Full detector docs: [`docs/detectors.md`](docs/detectors.md).
---
## Configuration
Config merges `nyx.conf` (defaults) and `nyx.local` (your overrides) from the platform config directory (`~/.config/nyx/` on Linux, `~/Library/Application Support/nyx/` on macOS, `%APPDATA%\elicpeter\nyx\config\` on Windows).
```toml
[scanner]
mode = "full" # full | ast | taint
min_severity = "Medium"
follow_symlinks = true
excluded_extensions = ["mp3", "mp4"]
mode = "full" # full | ast | cfg | taint
min_severity = "Medium"
[output]
default_format = "json"
max_results = 200
quiet = true # suppress status messages
[performance]
worker_threads = 8 # 0 = auto-detect
batch_size = 200
channel_multiplier = 2
```
### Custom Analysis Rules
You can define custom sources, sanitizers, sinks, terminators, and event handlers per language. These take priority over built-in rules, letting you teach Nyx about project-specific functions.
```toml
[analysis.languages.javascript]
terminators = ["process.exit"]
event_handlers = ["addEventListener"]
[server]
host = "127.0.0.1"
port = 9700
open_browser = true
# Project-specific sanitizer
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml"]
kind = "sanitizer" # "source" | "sanitizer" | "sink"
cap = "html_escape" # "env_var" | "html_escape" | "shell_escape" |
# "url_encode" | "json_parse" | "file_io" | "all"
[[analysis.languages.javascript.rules]]
matchers = ["dangerouslySetHTML"]
kind = "sink"
cap = "html_escape"
kind = "sanitizer"
cap = "html_escape"
```
Rules can also be added interactively via `nyx config add-rule` and `nyx config add-terminator`.
A fully documented `nyx.conf` is generated automatically on first run.
Or add rules interactively: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`. Caps: `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `all`. Full schema: [`docs/configuration.md`](docs/configuration.md).
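To illustrate what "custom rules take priority over built-in rules" means in practice, here is a small sketch of a two-tier lookup. The `Rule` struct, the `classify` function, exact-name matching, and the `render` built-in are hypothetical and chosen only for the example; they are not Nyx's internal types or its real matching logic.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Source,
    Sanitizer,
    Sink,
}

/// Hypothetical rule record mirroring the TOML fields `matchers`, `kind`, `cap`.
#[derive(Debug, Clone)]
struct Rule {
    matcher: &'static str,
    kind: Kind,
    cap: &'static str,
}

/// Look up a callee name, consulting custom rules before built-in ones.
/// Assumption: a rule matches when its matcher equals the callee name exactly.
fn classify<'a>(callee: &str, custom: &'a [Rule], builtin: &'a [Rule]) -> Option<&'a Rule> {
    custom
        .iter()
        .chain(builtin.iter())
        .find(|rule| rule.matcher == callee)
}
```

With this ordering, a project that labels `render` as a sanitizer in `nyx.local` would override a built-in rule that treats the same name as a sink, while names with no custom rule fall through to the built-in table.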
---
## Architecture in Brief
## Status
Nyx uses a **two-pass architecture** to enable cross-file analysis without sacrificing parallelism:
Under active development. APIs, detector behavior, and configuration options may change between releases. Rule-level F1 on the 305-case corpus is the CI regression floor; per-language detail lives in [`tests/benchmark/RESULTS.md`](tests/benchmark/RESULTS.md).
1. **File enumeration** -- A parallel walker (Rayon + `ignore` crate) applies gitignore rules, size limits, and user exclusions.
2. **Pass 1 -- Summary extraction** -- Each file is parsed via tree-sitter, an intra-procedural CFG is built (petgraph), and a `FuncSummary` is exported per function capturing source/sanitizer/sink capabilities (bitflags), taint propagation behavior, and callee lists. Summaries are persisted to SQLite.
3. **Summary merge** -- All per-file summaries are merged into a `GlobalSummaries` map with conservative conflict resolution (union caps, OR booleans).
4. **Pass 2 -- Analysis** -- Files are re-parsed and analyzed with the full cross-file context: a monotone forward dataflow engine resolves callees against local and global summaries and propagates taint through a bounded lattice with guaranteed convergence. CFG analysis checks for auth gaps, unguarded sinks, resource leaks, and more.
5. **Reporting** -- Findings are scored, ranked, deduplicated, and emitted to the console or serialized as JSON.
Taint analysis is interprocedural. Persisted per-function SSA summaries carry per-return-path transforms and parameter-granularity points-to, and call-graph SCCs (including SCCs that span files) iterate to a joint fixed-point. The default `balanced` profile also runs k=1 context-sensitive inlining for intra-file callees. Symex (with cross-file and interprocedural frames) and the demand-driven backwards walk are opt-in. Enable them individually with `--symex` and `--backwards-analysis`, or together with `--engine-profile deep`.
With indexing enabled, Pass 1 skips files whose blake3 content hash is unchanged, and cached findings are served directly for AST-only results.
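The hash-based skip can be sketched like this. Two loud assumptions: the example uses the standard library's `DefaultHasher` purely as a stand-in for blake3 (it is not a cryptographic hash and is not what Nyx uses), and the index is an in-memory `HashMap` rather than SQLite. The names `content_hash` and `files_to_rescan` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash. ASSUMPTION: the real index uses blake3, not this.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Return the paths whose content changed since the last run and
/// record their new hashes in the index.
fn files_to_rescan(index: &mut HashMap<String, u64>, files: &[(&str, &[u8])]) -> Vec<String> {
    let mut dirty = Vec::new();
    for &(path, bytes) in files {
        let hash = content_hash(bytes);
        // A missing entry or a different hash both mean "re-analyze".
        if index.get(path) != Some(&hash) {
            index.insert(path.to_string(), hash);
            dirty.push(path.to_string());
        }
    }
    dirty
}
```

On a first run every file is reported as dirty; on later runs only files whose bytes changed are returned, so unchanged files can reuse their stored summaries.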
---
## Roadmap
### Phase 1 -- Deep Static Engine (Complete)
| Feature | Status | Description |
|---|--------|---|
| Interprocedural call graph | Done | Precise symbol resolution via `FuncKey`, language-scoped namespaces, cross-module linking. Full call graph with SCC and topological analysis. |
| Path-sensitive analysis | Done | Track path predicates and conditional constraints. Detect infeasible paths and validation-only-in-one-branch patterns. Monotone predicate summaries with contradiction pruning. |
| Dataflow & state modeling | Done | Resource state machines (init -> use -> close), auth state transitions, privilege level tracking. Generic `Transfer` trait over bounded lattices with guaranteed convergence. |
| Monotone taint analysis | Done | Replaced BFS taint engine with a forward worklist dataflow analysis over a finite `TaintState` lattice. Multi-origin tracking, dual validated-must/may sets, JS/TS two-level solve. Guaranteed termination via lattice finiteness. |
| Attack surface ranking | Done | Deterministic post-analysis scoring of findings by severity, analysis kind, evidence strength, source-kind exploitability, and validation state. Findings sorted by score before truncation so `max_results` keeps the most important results. |
| Inline suppressions | Done | `nyx:ignore` and `nyx:ignore-next-line` comments with wildcard matching, all 10 languages supported. `--show-suppressed` flag for visibility. |
| Low-noise prioritization | Done | Category filtering, rollup grouping for high-frequency rules, configurable LOW budgets. Quality-category findings hidden by default. |
| Pattern-level confidence | Done | Explicit High/Medium/Low confidence on every AST pattern. Confidence flows into output alongside severity and rank score. |
| AST pattern overhaul | Done | 30+ new patterns across all languages, 11 broken query fixes, namespaced IDs, severity recalibration. |
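The attack-surface ranking row above describes scoring followed by truncation. A rough sketch of that ordering, assuming a simple multiplicative score over three factors, is given here. The `Finding` fields, the numeric weights, and the tie-break on `id` are illustrative assumptions; the real scorer also considers analysis kind and validation state.

```rust
/// Hypothetical finding record with pre-computed numeric factors.
#[derive(Debug, Clone, PartialEq)]
struct Finding {
    id: &'static str,
    severity: f64,
    evidence: f64,
    exploitability: f64,
}

impl Finding {
    /// Assumed score: the product of the three factors.
    fn score(&self) -> f64 {
        self.severity * self.evidence * self.exploitability
    }
}

/// Sort by descending score (ties broken by id so output is deterministic),
/// then keep only the first `max_results` findings.
fn rank_and_truncate(mut findings: Vec<Finding>, max_results: usize) -> Vec<Finding> {
    findings.sort_by(|a, b| {
        b.score()
            .total_cmp(&a.score())
            .then_with(|| a.id.cmp(b.id))
    });
    findings.truncate(max_results);
    findings
}
```

Sorting before truncating is the important design choice: if the list were cut to `max_results` first, a high-scoring finding discovered late in the scan could be dropped in favor of a low-scoring one found early.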
### Phase 2 -- Dynamic Capability
| Feature | Description |
|---|---|
| Controlled dynamic execution | Local sandbox: identify entry points, spin up test harnesses, inject payloads, detect runtime crashes and command execution. Deterministic automated exploit validation -- static finds `exec(user_input)`, dynamic confirms it with `; id`. |
| Fuzzing integration | libFuzzer (C/C++), cargo-fuzz (Rust), go-fuzz, HTTP fuzzing harness. Static engine identifies interesting functions, fuzzer targets only those. |
### Phase 3 -- Intelligent Reasoning Layer
| Feature | Description |
|---|---|
| Semantic similarity | Embeddings for finding similar vulnerability patterns across codebases. |
| LLM reasoning | AI-assisted detection of non-obvious logic bugs. |
| Exploit refinement | Automated loops to refine and validate exploit chains. |
### Other planned improvements
| Area | Details |
|---|---|
| Output formats | JUnit XML, HTML report generator |
| Language coverage | Expanded taint rules per language |
| Rule updates | Remote rule feed with signature verification |
| UX | Smart file-watch re-scan |
Community feedback shapes priorities -- please [open an issue](https://github.com/elicpeter/nyx/issues) to discuss proposed changes.
Limitations:
- Interprocedural precision is bounded rather than unlimited. Context-sensitive inlining is k=1 with a callee body-size cap, and SCC fixed-point has an iteration cap. When the engine hits a bound it falls back to summaries and records an `engine_note` on the finding.
- Cross-language calls (FFI, subprocess, WASM) are not traversed. Each language is analysed independently.
- Several language features are not modeled: macros, most dynamic dispatch, aliased imports, reflection.
- Rust is experimental tier; C/C++ are preview tier. Pair them with a clang-based tool before using as a hard CI gate.
- Results may contain false positives or false negatives; manual review is expected.
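The first limitation, a capped SCC fixed-point with a fallback, can be pictured with the sketch below. It is a simplified model under stated assumptions: each function's summary is reduced to a capability bitset, `calls[i]` lists the callees of function `i` inside the SCC, and `solve_scc` is a hypothetical name. The returned flag plays the role of the signal that would lead the engine to record an `engine_note`.

```rust
/// Iterate summaries within one SCC until nothing changes or the cap is hit.
/// Returns the summaries and `true` if a fixed point was confirmed in time.
fn solve_scc(seed: &[u8], calls: &[Vec<usize>], max_iters: usize) -> (Vec<u8>, bool) {
    let mut caps = seed.to_vec();
    for _ in 0..max_iters {
        let mut changed = false;
        for f in 0..caps.len() {
            // Join (bitwise OR) the function's own caps with its callees' caps.
            let mut next = caps[f];
            for &callee in &calls[f] {
                next |= caps[callee];
            }
            if next != caps[f] {
                caps[f] = next;
                changed = true;
            }
        }
        if !changed {
            return (caps, true);
        }
    }
    // Cap reached without a quiet round: the caller should treat the result
    // as incomplete and fall back to a more conservative summary.
    (caps, false)
}
```

Because the join only ever sets bits in a finite bitset, the loop would terminate even without a cap; the cap simply bounds worst-case time, at the cost of occasionally returning a summary that has not been confirmed stable.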
---
## Documentation
Full documentation is available in the [`docs/`](docs/index.md) directory:
- [Installation](docs/installation.md) — cargo, binaries, CI tips
- [Quick Start](docs/quickstart.md) — Your first scan in 60 seconds
- [CLI Reference](docs/cli.md) — Every flag and subcommand
- [Configuration](docs/configuration.md) — Config file schema, custom rules
- [Output Formats](docs/output.md) — Console, JSON, SARIF; exit codes
- [Detector Overview](docs/detectors.md) — How the four detector families work
- [Taint Analysis](docs/detectors/taint.md) — Cross-file source-to-sink dataflow
- [CFG Structural](docs/detectors/cfg.md) — Auth gaps, unguarded sinks, resource leaks
- [State Model](docs/detectors/state.md) — Resource lifecycle, authentication state
- [AST Patterns](docs/detectors/patterns.md) — Tree-sitter structural matching
- [Rule Reference](docs/rules/index.md) — Per-language rule listings with examples
- [Quick Start](docs/quickstart.md) · [CLI Reference](docs/cli.md) · [Installation](docs/installation.md)
- [`nyx serve`](docs/serve.md) · [Output Formats](docs/output.md) · [Configuration](docs/configuration.md)
- [How it works](docs/how-it-works.md) · [Detectors](docs/detectors.md) ([Taint](docs/detectors/taint.md), [CFG](docs/detectors/cfg.md), [State](docs/detectors/state.md), [AST Patterns](docs/detectors/patterns.md))
- [Rule Reference](docs/rules.md) · [Language Maturity](docs/language-maturity.md) · [Advanced Analysis](docs/advanced-analysis.md) · [Auth Analysis](docs/auth.md)
---
## Contributing
Pull requests are welcome. To contribute:
Contributions are welcome.
1. Fork the repository and create a feature branch.
2. Adhere to `rustfmt` and ensure `cargo clippy --all -- -D warnings` passes.
3. Add unit and/or integration tests where applicable (`cargo test` should remain green).
4. Submit a concise, well-documented pull request.
Nyx is open source and will always have a fully open-source core. To support long-term development and keep the project sustainable, contributors may be asked to sign a Contributor License Agreement before their first merged contribution.
Please open an issue for any crash, panic, or suspicious result -- attach the minimal code snippet and mention the Nyx version.
Run `sh scripts/check.sh` before submitting. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for the full guide, including how to add rules and support new languages. Open an issue for crashes, panics, or suspicious results; attach a minimal snippet and the Nyx version.
See [`CONTRIBUTING.md`](CONTRIBUTING.md) for full guidelines, including how to add new rules and support new languages.
---
## AI Disclosure
- **Engine code** (taint, SSA, CFG, call graph, abstract interp, symbolic exec): predominantly human-written. AI was used selectively for refactors and boilerplate, with all merges human-reviewed.
- **Docs and most of this README**: AI-generated from the code and hand-edited. Report doc/code drift as a bug.
- **Test fixtures and `expected.yaml` files**: AI-assisted drafting, human-audited before landing.
- **Frontend UI** (React app): built with AI assistance, human-reviewed.
As with any static analyzer, validate findings against your own corpus before using Nyx as a CI gate.
---
## License
Nyx is licensed under the **GNU General Public License v3.0 (GPL-3.0)**.
This ensures that all modified versions of the scanner remain free and open-source, protecting the integrity and transparency of security tools.
See [LICENSE](./LICENSE) for full details.
GNU General Public License v3.0 or later (GPL-3.0-or-later). The optional `smt` feature bundles Z3 (MIT-licensed); distributors of binaries built with `--features smt` should include Z3's license in their attribution. Full text in [LICENSE](./LICENSE); third-party dependencies in [THIRDPARTY-LICENSES.html](./THIRDPARTY-LICENSES.html).

22
ROADMAP.md Normal file

@ -0,0 +1,22 @@
# Roadmap
Nyx today is a static-only multi-language vulnerability scanner. The roadmap below extends it into a hybrid scanner that combines static analysis with controlled execution and AI-assisted reasoning.
## Phase 1 — Static Analysis (current)
The shipped scanner. Multi-language taint tracking on a pruned SSA IR, cross-file function summaries, points-to and abstract interpretation, symbolic execution with an optional SMT backend, and a local web UI for triage. See the [Changelog](CHANGELOG.md) for the full breakdown of what's landed through 0.5.0.
## Phase 2 — Dynamic Capability
| Feature | Description |
| --- | --- |
| Controlled dynamic execution | Local sandbox: identify entry points, spin up test harnesses, inject payloads, detect runtime crashes and command execution. Deterministic automated exploit validation — static finds `exec(user_input)`, dynamic confirms it with `; id`. |
| Fuzzing integration | libFuzzer (C/C++), cargo-fuzz (Rust), go-fuzz, HTTP fuzzing harness. Static engine identifies interesting functions, fuzzer targets only those. |
## Phase 3 — Intelligent Reasoning Layer
| Feature | Description |
| --- | --- |
| Semantic similarity | Embeddings for finding similar vulnerability patterns across codebases. |
| LLM reasoning | AI-assisted detection of non-obvious logic bugs. |
| Exploit refinement | Automated loops to refine and validate exploit chains. |


@ -4,9 +4,9 @@
| Version | Supported | Notes |
|---------|-----------|----------------------|
| 0.4.x | ✅ | Latest stable line |
| 0.3.x | ✅ | Critical fixes only |
| < 0.3 | | End-of-life |
| 0.5.x | ✅ | Latest stable line |
| 0.4.x | ✅ | Critical fixes only |
| < 0.4 | | End-of-life |
We follow [Semantic Versioning] as soon as we hit **1.0.0**.
Before that, breaking changes may land in any minor release.

6671
THIRDPARTY-LICENSES.html Normal file

File diff suppressed because it is too large Load diff


@ -1,12 +1,80 @@
# Pin the target triples scanned so `cargo about generate` produces the
# same output regardless of host OS. Must match the release build matrix
# in .github/workflows/release-build.yml — otherwise the CI diff step
# (third-party-licenses) will fail on platform-specific crates like
# linux-raw-sys, android_system_properties, etc.
targets = [
"x86_64-unknown-linux-gnu",
"aarch64-unknown-linux-gnu",
"x86_64-pc-windows-msvc",
"x86_64-apple-darwin",
"aarch64-apple-darwin",
]
accepted = [
# --- Apache / MIT / BSD / permissive ---
"Apache-2.0",
"MIT",
"MIT-0",
"Unicode-3.0",
"BSD-2-Clause",
"Unlicense",
"BSD-3-Clause",
"ISC",
"Zlib",
"zlib-acknowledgement",
"BSL-1.0",
"NCSA",
"PostgreSQL",
"curl",
"BlueOak-1.0.0",
"X11",
"HPND",
"TCL",
"ICU",
"Info-ZIP",
# --- Unicode / data / specs ---
"Unicode-DFS-2016",
"Unicode-3.0",
# --- compression / libs ---
"bzip2-1.0.6",
"Libpng",
"libpng-2.0",
"IJG",
"FTL",
# --- public domain style ---
"CC0-1.0",
"Unlicense",
"0BSD",
# --- weak copyleft (GPL-compatible) ---
"MPL-2.0",
"GPL-3.0"
]
"LGPL-3.0",
"EPL-2.0",
# --- GPL family ---
"GPL-3.0",
"GPL-2.0",
# --- Python / PSF ---
"PSF-2.0",
"Python-2.0",
"Python-2.0.1",
# --- Artistic / Perl ---
"Artistic-2.0",
# --- LLVM / clang ---
"Apache-2.0 WITH LLVM-exception",
# --- data / ML ---
"CDLA-Permissive-2.0",
# --- fonts ---
"OFL-1.1",
# --- Creative Commons (code-safe ones) ---
"CC-BY-3.0",
"CC-BY-4.0",
]

148
action-scripts/download.sh Executable file

@ -0,0 +1,148 @@
#!/usr/bin/env bash
set -euo pipefail
REPO="elicpeter/nyx"
VERSION="${NYX_VERSION:-latest}"
INSTALL_DIR="${RUNNER_TOOL_CACHE:-/tmp}/nyx"
# Optional: pin a GPG key fingerprint here (40-char, no spaces) or set
# NYX_GPG_FINGERPRINT in the calling env to require GPG-signed SHA256SUMS.
# Empty ⇒ GPG verification is skipped (SHA256 + SLSA attestation still run).
PINNED_GPG_FINGERPRINT="${NYX_GPG_FINGERPRINT:-}"
# ── Detect runner OS and architecture ─────────────────────────────────────────
OS="$(uname -s)"
ARCH="$(uname -m)"
case "${OS}-${ARCH}" in
Linux-x86_64) TARGET="x86_64-unknown-linux-gnu" ;;
Linux-aarch64) TARGET="aarch64-unknown-linux-gnu" ;;
Darwin-x86_64) TARGET="x86_64-apple-darwin" ;;
Darwin-arm64) TARGET="aarch64-apple-darwin" ;;
*)
echo "::error::Unsupported platform: ${OS} ${ARCH}"
exit 1
;;
esac
# ── Resolve "latest" to an actual release tag ────────────────────────────────
if [[ "$VERSION" == "latest" ]]; then
echo "::warning::version: latest follows a mutable tag. Pin to a specific release (e.g. v0.5.0) for supply-chain safety."
API_URL="https://api.github.com/repos/${REPO}/releases/latest"
CURL_ARGS=(-fsSL)
if [[ -n "${GITHUB_TOKEN:-}" ]]; then
CURL_ARGS+=(-H "Authorization: token ${GITHUB_TOKEN}")
fi
RELEASE_JSON="$(curl "${CURL_ARGS[@]}" "$API_URL")"
VERSION="$(echo "$RELEASE_JSON" | grep -o '"tag_name":\s*"[^"]*"' | head -1 | cut -d'"' -f4)"
if [[ -z "$VERSION" ]]; then
echo "::error::Failed to resolve latest release tag from ${API_URL}"
exit 1
fi
echo "Resolved latest version: ${VERSION}"
fi
# ── Download the release asset into an isolated staging dir ──────────────────
ASSET_NAME="nyx-${TARGET}.zip"
RELEASE_BASE="https://github.com/${REPO}/releases/download/${VERSION}"
DOWNLOAD_URL="${RELEASE_BASE}/${ASSET_NAME}"
STAGING="$(mktemp -d)"
trap 'rm -rf "$STAGING"' EXIT
CURL_COMMON=(-fsSL)
if [[ -n "${GITHUB_TOKEN:-}" ]]; then
CURL_COMMON+=(-H "Authorization: token ${GITHUB_TOKEN}")
fi
echo "Downloading nyx ${VERSION} for ${TARGET}..."
curl "${CURL_COMMON[@]}" -o "${STAGING}/${ASSET_NAME}" "$DOWNLOAD_URL"
# SHA256SUMS is required — the whole release signing chain hinges on it.
echo "Downloading SHA256SUMS..."
curl "${CURL_COMMON[@]}" -o "${STAGING}/SHA256SUMS" "${RELEASE_BASE}/SHA256SUMS"
# SHA256SUMS.asc is optional (GPG signing was wired up mid-0.x); fetch it if
# present so we can attempt signature verification.
SIG_PATH=""
if curl "${CURL_COMMON[@]}" -o "${STAGING}/SHA256SUMS.asc" "${RELEASE_BASE}/SHA256SUMS.asc" 2>/dev/null; then
SIG_PATH="${STAGING}/SHA256SUMS.asc"
fi
# ── Mandatory: verify the binary's SHA256 matches SHA256SUMS ─────────────────
(
cd "$STAGING"
# --ignore-missing: SHA256SUMS lists every platform archive; we only have one.
if ! sha256sum --ignore-missing -c SHA256SUMS >/dev/null 2>&1; then
echo "::error::SHA256 verification failed for ${ASSET_NAME}. Release may be tampered."
echo "Expected (from SHA256SUMS):"
grep -F "${ASSET_NAME}" SHA256SUMS || true
echo "Actual:"
sha256sum "${ASSET_NAME}" || true
exit 1
fi
)
echo "::notice::SHA256 checksum verified for ${ASSET_NAME}."
# ── Best-effort: GPG verify SHA256SUMS.asc against a pinned fingerprint ──────
# Trust model: only accept a signature from a fingerprint we have pinned. A
# signature from any other key is treated as a failure, not a success. If no
# fingerprint is pinned, GPG verification is skipped (SHA256+SLSA still run).
if [[ -n "$SIG_PATH" ]]; then
if [[ -z "$PINNED_GPG_FINGERPRINT" ]]; then
echo "::warning::SHA256SUMS.asc found but no GPG fingerprint pinned. Set NYX_GPG_FINGERPRINT (40-char, no spaces) to enforce GPG verification."
elif ! command -v gpg >/dev/null 2>&1; then
echo "::warning::gpg not installed on runner; skipping SHA256SUMS.asc verification."
else
# Fetch the pinned key from keys.openpgp.org into an ephemeral keyring.
GNUPGHOME="$(mktemp -d)"
export GNUPGHOME
chmod 700 "$GNUPGHOME"
trap 'rm -rf "$STAGING" "$GNUPGHOME"' EXIT
if ! gpg --batch --keyserver hkps://keys.openpgp.org \
--recv-keys "$PINNED_GPG_FINGERPRINT" >/dev/null 2>&1; then
echo "::error::Failed to fetch GPG key ${PINNED_GPG_FINGERPRINT} from keys.openpgp.org."
exit 1
fi
# --status-fd 1 gives machine-readable output; VALIDSIG + the pinned fpr
# is the only accept condition.
GPG_STATUS="$(gpg --batch --status-fd 1 --verify \
"$SIG_PATH" "${STAGING}/SHA256SUMS" 2>/dev/null || true)"
if ! grep -q "^\[GNUPG:\] VALIDSIG ${PINNED_GPG_FINGERPRINT} " <<<"$GPG_STATUS"; then
echo "::error::GPG signature on SHA256SUMS does not match pinned fingerprint ${PINNED_GPG_FINGERPRINT}."
echo "$GPG_STATUS"
exit 1
fi
echo "::notice::GPG signature verified against ${PINNED_GPG_FINGERPRINT}."
fi
else
echo "::warning::SHA256SUMS.asc not published for ${VERSION}; relying on SHA256 + SLSA only."
fi
# ── Best-effort: SLSA build-provenance attestation (Sigstore) ────────────────
# gh attestation verify ships with the gh CLI (preinstalled on GH-hosted
# runners) and validates attestations produced by actions/attest-build-
# provenance against the Sigstore public-good transparency log. Unlike GPG
# this requires no pre-shared key and is the preferred trust root.
if command -v gh >/dev/null 2>&1; then
if gh attestation verify "${STAGING}/${ASSET_NAME}" --repo "${REPO}" >/dev/null 2>&1; then
echo "::notice::SLSA build provenance verified for ${ASSET_NAME}."
else
echo "::warning::gh attestation verify failed or no attestation present for ${VERSION}. (Expected for releases predating attest-build-provenance.)"
fi
else
echo "::warning::gh CLI not available; skipping SLSA attestation verification."
fi
# ── Extract and install ──────────────────────────────────────────────────────
mkdir -p "$INSTALL_DIR"
# The zip stores target/{TARGET}/release/nyx — use -j to flatten paths
unzip -o -j "${STAGING}/${ASSET_NAME}" "*/nyx" -d "$INSTALL_DIR"
chmod +x "${INSTALL_DIR}/nyx"
# ── Add to PATH for subsequent steps ─────────────────────────────────────────
echo "${INSTALL_DIR}" >> "$GITHUB_PATH"
# ── Verify and set output ────────────────────────────────────────────────────
INSTALLED_VERSION="$("${INSTALL_DIR}/nyx" --version 2>&1 | head -1 || echo "unknown")"
echo "nyx-version=${INSTALLED_VERSION}" >> "$GITHUB_OUTPUT"
echo "Installed nyx: ${INSTALLED_VERSION} (${TARGET})"
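The GPG accept condition used by the script (a `VALIDSIG` status line carrying the pinned fingerprint, and nothing else) can be restated as a small pure function. This is only a sketch of the same check in Rust for readers who want to reason about it or unit-test it; `signed_by_pinned_key` is a hypothetical name, and the script itself performs the check with `grep`.

```rust
/// Return true only if GnuPG's machine-readable status output contains a
/// VALIDSIG line for exactly the pinned fingerprint.
/// An empty pin never matches, so "no pin" cannot be mistaken for success.
fn signed_by_pinned_key(gpg_status: &str, pinned_fpr: &str) -> bool {
    if pinned_fpr.is_empty() {
        return false;
    }
    // The trailing space prevents a shorter pin from matching as a prefix
    // of a longer fingerprint.
    let prefix = format!("[GNUPG:] VALIDSIG {} ", pinned_fpr);
    gpg_status.lines().any(|line| line.starts_with(&prefix))
}
```

Matching on `VALIDSIG` plus the fingerprint, rather than on `GOODSIG` or the exit status alone, means a valid signature from some other key in the keyring is rejected.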

87
action-scripts/run.sh Executable file

@ -0,0 +1,87 @@
#!/usr/bin/env bash
set -uo pipefail
# Note: NOT -e — we capture nyx's exit code manually.
# ── Build the nyx command ────────────────────────────────────────────────────
FORMAT="${INPUT_FORMAT:-sarif}"
ARGS=("scan" "${INPUT_PATH:-.}" "--quiet" "--format" "$FORMAT")
if [[ -n "${INPUT_FAIL_ON:-}" ]]; then
ARGS+=("--fail-on" "$INPUT_FAIL_ON")
fi
# Append raw user args (word-split is intentional here)
if [[ -n "${INPUT_ARGS:-}" ]]; then
read -ra EXTRA <<< "$INPUT_ARGS"
ARGS+=("${EXTRA[@]}")
fi
# ── Execute the scan ─────────────────────────────────────────────────────────
OUTDIR="${RUNNER_TEMP:-/tmp}"
SARIF_FILE=""
NYX_EXIT=0
echo "::group::nyx scan"
echo "Running: nyx ${ARGS[*]}"
case "$FORMAT" in
sarif)
SARIF_FILE="${OUTDIR}/nyx-results.sarif"
nyx "${ARGS[@]}" > "$SARIF_FILE" || NYX_EXIT=$?
;;
json)
nyx "${ARGS[@]}" > "${OUTDIR}/nyx-results.json" || NYX_EXIT=$?
;;
*)
nyx "${ARGS[@]}" || NYX_EXIT=$?
;;
esac
echo "::endgroup::"
# ── Count findings ───────────────────────────────────────────────────────────
count_findings() {
  python3 -c "
import json, sys
try:
    data = json.load(open(sys.argv[1]))
    fmt = sys.argv[2]
    if fmt == 'sarif':
        runs = data.get('runs', [])
        print(len(runs[0].get('results', [])) if runs else 0)
    else:
        print(len(data) if isinstance(data, list) else 0)
except Exception:
    print(0)
" "$1" "$2" 2>/dev/null || echo "0"
}
FINDING_COUNT="unknown"
case "$FORMAT" in
sarif)
if [[ -f "$SARIF_FILE" ]]; then
FINDING_COUNT="$(count_findings "$SARIF_FILE" sarif)"
fi
;;
json)
if [[ -f "${OUTDIR}/nyx-results.json" ]]; then
FINDING_COUNT="$(count_findings "${OUTDIR}/nyx-results.json" json)"
fi
;;
esac
# ── Set outputs ──────────────────────────────────────────────────────────────
echo "exit-code=${NYX_EXIT}" >> "$GITHUB_OUTPUT"
echo "finding-count=${FINDING_COUNT}" >> "$GITHUB_OUTPUT"
if [[ -n "$SARIF_FILE" ]]; then
echo "sarif-file=${SARIF_FILE}" >> "$GITHUB_OUTPUT"
fi
# ── Summary ──────────────────────────────────────────────────────────────────
if [[ "$NYX_EXIT" -eq 0 ]]; then
echo "::notice::Nyx scan completed. Findings: ${FINDING_COUNT}"
else
echo "::warning::Nyx scan found issues meeting threshold. Findings: ${FINDING_COUNT}"
fi
exit "$NYX_EXIT"

68
action.yml Normal file

@ -0,0 +1,68 @@
name: 'Nyx Security Scanner'
description: 'Run the Nyx multi-language vulnerability scanner on your codebase. Supports Linux and macOS runners (x86_64 and ARM64).'
author: 'Eli Peter'
branding:
  icon: 'shield'
  color: 'purple'
inputs:
  path:
    description: 'Directory to scan'
    required: false
    default: '.'
  version:
    description: 'Nyx release tag (e.g. v0.5.0). "latest" is accepted but discouraged; pinning to a specific tag protects against upstream compromise.'
    required: false
    default: 'v0.5.0'
  format:
    description: 'Output format: sarif, json, or console'
    required: false
    default: 'sarif'
  fail-on:
    description: 'Exit non-zero if findings meet this severity threshold: HIGH, MEDIUM, or LOW'
    required: false
    default: ''
  args:
    description: 'Additional CLI arguments (e.g. "--severity >=MEDIUM --profile ci")'
    required: false
    default: ''
  token:
    description: 'GitHub token for release download (avoids rate limits)'
    required: false
    default: ${{ github.token }}
outputs:
  finding-count:
    description: 'Number of findings detected'
    value: ${{ steps.scan.outputs.finding-count }}
  sarif-file:
    description: 'Path to SARIF results file (empty if format is not sarif)'
    value: ${{ steps.scan.outputs.sarif-file }}
  exit-code:
    description: 'Nyx exit code (0 = clean, 1 = threshold breached)'
    value: ${{ steps.scan.outputs.exit-code }}
  nyx-version:
    description: 'Installed nyx version'
    value: ${{ steps.install.outputs.nyx-version }}
runs:
  using: 'composite'
  steps:
    - name: Install nyx
      id: install
      shell: bash
      env:
        NYX_VERSION: ${{ inputs.version }}
        GITHUB_TOKEN: ${{ inputs.token }}
      run: ${{ github.action_path }}/action-scripts/download.sh
    - name: Run nyx scan
      id: scan
      shell: bash
      env:
        INPUT_PATH: ${{ inputs.path }}
        INPUT_FORMAT: ${{ inputs.format }}
        INPUT_FAIL_ON: ${{ inputs.fail-on }}
        INPUT_ARGS: ${{ inputs.args }}
      run: ${{ github.action_path }}/action-scripts/run.sh

BIN
assets/nyx-logo-text.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 324 KiB

BIN
assets/nyx-logo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 520 KiB

10
assets/nyx-wordmark.svg Normal file

@ -0,0 +1,10 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 220 100" role="img" aria-label="nyx">
<text x="110" y="72"
text-anchor="middle"
dominant-baseline="alphabetic"
font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif"
font-weight="700"
font-size="100"
letter-spacing="-1"
fill="#5856d6">nyx</text>
</svg>


BIN
assets/screenshots/demo.gif Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.1 MiB


@ -0,0 +1,61 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Clean open/close — no findings expected */
void clean_usage(void) {
FILE *f = fopen("data.txt", "r");
char buf[256];
fread(buf, 1, 256, f);
fclose(f);
}
/* Resource leak — fopen without fclose */
void leaky_function(void) {
FILE *f = fopen("log.txt", "w");
fprintf(f, "hello");
}
/* Use after close */
void use_after_close(void) {
FILE *f = fopen("tmp.txt", "r");
fclose(f);
char buf[64];
fread(buf, 1, 64, f);
}
/* Branch leak — closed on one path only */
void branch_leak(int cond) {
FILE *f = fopen("x.txt", "r");
if (cond) {
fclose(f);
}
}
/* Multiple handles — both properly closed */
void multi_handle(void) {
FILE *a = fopen("a.txt", "r");
FILE *b = fopen("b.txt", "w");
fclose(a);
fclose(b);
}
/* Double close */
void double_close(void) {
FILE *f = fopen("d.txt", "r");
fclose(f);
fclose(f);
}
/* Malloc/free — clean */
void malloc_clean(void) {
char *p = malloc(1024);
memset(p, 0, 1024);
free(p);
}
/* Malloc leak — never freed */
void malloc_leak(void) {
char *p = malloc(512);
memset(p, 0, 512);
}
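The cases in this fixture map onto a small resource state machine (init -> use -> close). The sketch below is a simplified model for a single handle along a single execution path, not the analyzer's implementation: the `Event`, `State`, and `Finding` enums and the `check_path` function are hypothetical, and a real analysis would join states across branches instead of checking one path at a time.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Open,
    Use,
    Close,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Finding {
    UseAfterClose,
    DoubleClose,
    Leak,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Unopened,
    Open,
    Closed,
}

/// Walk one path's events for one handle and report lifecycle violations.
fn check_path(events: &[Event]) -> Vec<Finding> {
    let mut state = State::Unopened;
    let mut findings = Vec::new();
    for &event in events {
        match (state, event) {
            (_, Event::Open) => state = State::Open,
            (State::Open, Event::Close) => state = State::Closed,
            (State::Closed, Event::Use) => findings.push(Finding::UseAfterClose),
            (State::Closed, Event::Close) => findings.push(Finding::DoubleClose),
            _ => {}
        }
    }
    // Still open at function exit: the handle was never released.
    if state == State::Open {
        findings.push(Finding::Leak);
    }
    findings
}
```

Under this model, `leaky_function` corresponds to `[Open, Use]`, `use_after_close` to `[Open, Close, Use]`, and `double_close` to `[Open, Close, Close]`; `branch_leak` leaks only on the path where `cond` is false, which is `[Open]`.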


@ -73,6 +73,47 @@ fn bench_full_scan(c: &mut Criterion) {
});
}
fn bench_full_scan_with_state(c: &mut Criterion) {
let fixtures = Path::new(FIXTURES).canonicalize().expect("fixtures dir");
let mut cfg = Config::default();
cfg.scanner.mode = AnalysisMode::Full;
cfg.scanner.enable_state_analysis = true;
cfg.performance.worker_threads = Some(1);
cfg.performance.channel_multiplier = 1;
cfg.performance.batch_size = 64;
c.bench_function("full_scan_with_state", |b| {
b.iter(|| {
let (rx, handle) = nyx_scanner::walk::spawn_file_walker(&fixtures, &cfg);
if let Err(err) = handle.join() {
panic!("walker panicked: {err:#?}");
}
let paths: Vec<_> = rx.into_iter().flatten().collect();
// Pass 1: extract summaries
let mut all_sums = Vec::new();
for path in &paths {
if let Ok(sums) = nyx_scanner::ast::extract_summaries_from_file(path, &cfg) {
all_sums.extend(sums);
}
}
let root_str = fixtures.to_string_lossy();
let global = nyx_scanner::summary::merge_summaries(all_sums, Some(&root_str));
// Pass 2: full analysis with state
let mut diags = Vec::new();
for path in &paths {
if let Ok(mut d) =
nyx_scanner::ast::run_rules_on_file(path, &cfg, Some(&global), Some(&fixtures))
{
diags.append(&mut d);
}
}
diags
});
});
}
fn bench_single_file_parse_and_cfg(c: &mut Criterion) {
let fixture = Path::new(FIXTURES).join("sample.rs");
let fixture = fixture.canonicalize().expect("sample.rs fixture");
@ -86,6 +127,40 @@ fn bench_single_file_parse_and_cfg(c: &mut Criterion) {
});
}
fn bench_state_analysis_only(c: &mut Criterion) {
let fixture = Path::new(FIXTURES)
.join("state_bench.c")
.canonicalize()
.expect("state_bench.c fixture");
let mut cfg = Config::default();
cfg.scanner.mode = AnalysisMode::Full;
cfg.scanner.enable_state_analysis = true;
// Parse and build CFG once (outside benchmark loop)
let (file_cfg, lang) = nyx_scanner::ast::build_cfg_for_file(&fixture, &cfg)
.expect("build cfg")
.expect("supported language");
let source_bytes = std::fs::read(&fixture).expect("read fixture");
let top = file_cfg.toplevel();
c.bench_function("state_analysis_only", |b| {
b.iter(|| {
nyx_scanner::state::run_state_analysis(
&top.graph,
top.entry,
lang,
&source_bytes,
&file_cfg.summaries,
None,
true,
&[],
&[],
&std::collections::HashSet::new(),
)
});
});
}
fn bench_classify(c: &mut Criterion) {
c.bench_function("classify_hit", |b| {
b.iter(|| nyx_scanner::labels::classify("rust", "std::env::var", None));
@ -100,7 +175,9 @@ criterion_group!(
benches,
bench_ast_only_scan,
bench_full_scan,
bench_full_scan_with_state,
bench_single_file_parse_and_cfg,
bench_state_analysis_only,
bench_classify,
);
criterion_main!(benches);

20
book.toml Normal file

@ -0,0 +1,20 @@
[book]
title = "Nyx"
authors = ["Eli Peter"]
description = "Multi-language static analysis with cross-file taint tracking. Scan your repo, triage findings in your browser, commit triage state with your code. No cloud, no account."
language = "en"
src = "docs"
[output.html]
default-theme = "navy"
preferred-dark-theme = "navy"
git-repository-url = "https://github.com/elicpeter/nyx"
edit-url-template = "https://github.com/elicpeter/nyx/edit/master/{path}"
site-url = "/nyx/"
[output.html.fold]
enable = true
level = 1
[output.html.search]
enable = true

72
build.rs Normal file

@ -0,0 +1,72 @@
use std::path::Path;
use std::process::Command;
fn main() {
// Only relevant when the serve feature is active
if std::env::var("CARGO_FEATURE_SERVE").is_err() {
return;
}
let dist_dir = Path::new("src/server/assets/dist");
let index_html = dist_dir.join("index.html");
// Re-run build.rs only when dist output is missing/changed
println!("cargo:rerun-if-changed=src/server/assets/dist/index.html");
if index_html.exists() {
// Dist already built — nothing to do
return;
}
// Dist missing — try to build frontend
let frontend_dir = Path::new("frontend");
if !frontend_dir.join("package.json").exists() {
emit_placeholder_and_warn(dist_dir);
return;
}
// Run npm install + build
println!("cargo:warning=Frontend dist not found, running npm install && npm run build...");
let npm_install = Command::new("npm")
.arg("install")
.current_dir(frontend_dir)
.status();
match npm_install {
Ok(s) if s.success() => {}
_ => {
emit_placeholder_and_warn(dist_dir);
return;
}
}
let npm_build = Command::new("npm")
.arg("run")
.arg("build")
.current_dir(frontend_dir)
.status();
match npm_build {
Ok(s) if s.success() => {
println!("cargo:warning=Frontend built successfully.");
}
_ => {
emit_placeholder_and_warn(dist_dir);
}
}
}
fn emit_placeholder_and_warn(dist_dir: &Path) {
// Create minimal placeholder files so compilation succeeds
std::fs::create_dir_all(dist_dir).ok();
std::fs::write(
dist_dir.join("index.html"),
"<!DOCTYPE html><html><body><h1>Frontend not built</h1><p>Run: cd frontend &amp;&amp; npm install &amp;&amp; npm run build</p></body></html>",
)
.ok();
std::fs::write(dist_dir.join("app.js"), "// frontend not built\n").ok();
std::fs::write(dist_dir.join("style.css"), "/* frontend not built */\n").ok();
println!(
"cargo:warning=Node.js/npm not available — wrote placeholder frontend assets. Run 'cd frontend && npm install && npm run build' for the real UI."
);
}


@ -8,16 +8,20 @@
[scanner]
## If full uses both ast patterns and cfg taint analysis,
## Possible values: full | ast | cfg
## Analysis mode: full | ast | cfg | taint
## full = AST analyses + CFG + state + taint
## ast = AST analyses only (tree-sitter patterns + auth analysis; no CFG/taint/state)
## cfg = CFG + state + taint only (no AST patterns)
## taint = taint-focused CFG analysis only (no AST patterns, no state findings)
mode = "full"
## Minimum severity level to include in the report
## Possible values: Low | Medium | High | Critical
## Possible values: Low | Medium | High
min_severity = "Low"
## Maximum file size to scan (MiB); null = unlimited
max_file_size_mb = null
## Maximum file size to scan (MiB); null = unlimited.
## Raise or set to `null` when scanning a trusted codebase with large generated files or bundles.
max_file_size_mb = 16
## File extensions to ignore completely
excluded_extensions = [
@ -34,7 +38,7 @@ excluded_directories = [
## Individual files to ignore completely
excluded_files = []
## Honour global ignore file (e.g. ~/.config/nyx/ignore)
## Honour global ignore file (e.g. ~/.config/nyx/ignore) (RESERVED)
read_global_ignore = false
## Honour .gitignore / .hgignore, etc.
@ -54,28 +58,44 @@ scan_hidden_files = false
## Enable state-model dataflow analysis (resource lifecycle + auth state).
## Detects use-after-close, double-close, resource leaks, and unauthed access.
## Requires mode = "full" or "taint" (needs CFG). Default: off.
enable_state_analysis = false
## Requires mode = "full" or "cfg" (or explicit taint/state-capable scans). Default: on.
enable_state_analysis = true
## Enable AST-based authorization analysis for supported web frameworks.
## Produces `<lang>.auth.*` findings such as admin-route, ownership, token,
## and stale-auth checks. Runs only when AST analysis is active:
## mode = "full" or "ast" => auth analysis runs
## mode = "cfg" or "taint" => auth analysis is skipped
## Per-language auth overrides live under [analysis.languages.<slug>.auth].
enable_auth_analysis = true
## Catch per-file panics during analysis and continue the scan.
## When false (default), a panic in one file's analyser aborts the whole
## scan — useful for catching engine bugs loudly in development.
## When true, the poisoned file is skipped with a warning; the rest of
## the scan proceeds. Enable when running against untrusted input.
# enable_panic_recovery = false
[database]
## Where to store the SQLite database (empty = default path)
## Custom SQLite database path (empty = platform default) (RESERVED)
path = ""
## Number of days to keep database files; 0 = no cleanup (UNIMPLEMENTED)
## Number of days to keep database files; 0 = no cleanup (RESERVED)
auto_cleanup_days = 30
## Maximum database size in MiB; 0 = no limit (UNIMPLEMENTED)
## Maximum database size in MiB; 0 = no limit (RESERVED)
max_db_size_mb = 1024
## Run VACUUM on startup (UNIMPLEMENTED)
## Run VACUUM on startup
vacuum_on_startup = false
[output]
## Output format: console | json | sarif
## Default output format: console | json | sarif
## Used when --format is not specified on the command line.
default_format = "console"
## Suppress all human-readable status output (stderr)
@ -120,13 +140,13 @@ rollup_examples = 5
[performance]
## Maximum search depth; null = unlimited (UNIMPLEMENTED)
## Maximum search depth; null = unlimited
max_depth = null
## Minimum depth for reported entries; null = none (UNIMPLEMENTED)
## Minimum depth for reported entries; null = none (RESERVED)
min_depth = null
## Stop traversing into matching directories
## Stop traversing into matching directories (RESERVED)
prune = false
## Worker threads; null or 0 = auto
@ -139,16 +159,165 @@ batch_size = 100
channel_multiplier = 4
## Maximum stack size for Rayon threads (bytes)
rayon_thread_stack_size = 8 * 1024 * 1024 # 8 MiB
rayon_thread_stack_size = 8388608 # 8 MiB
## Timeout on individual files (seconds); null = none (UNIMPLEMENTED)
## Timeout on individual files (seconds); null = none (RESERVED)
scan_timeout_secs = null
## Maximum memory to use in MiB; 0 = no limit (UNIMPLEMENTED)
## Maximum memory to use in MiB; 0 = no limit (RESERVED)
memory_limit_mb = 512
[server]
## Enable the local web UI server (nyx serve)
enabled = true
## Host to bind to (localhost only by default for security)
host = "127.0.0.1"
## Port for the web UI
port = 9700
## Open browser automatically when serve starts
open_browser = true
## Auto-reload UI when scan results change
auto_reload = true
## Persist scan runs for history view
persist_runs = true
## Maximum number of saved runs
max_saved_runs = 50
## Auto-sync triage decisions to .nyx/triage.json in the project root.
## When enabled, triage changes are written to this file so they can be
## committed to git and shared with your team.
triage_sync = true
[runs]
## Persist scan run history to disk
persist = false
## Maximum number of runs to keep
max_runs = 100
## Save scan logs with each run
save_logs = false
## Save stdout capture with each run
save_stdout = false
## Save code snippets in findings
save_code_snippets = true
# ─── Scan Profiles ──────────────────────────────────────────────────
# Named presets that override scan-related config.
# Activate with --profile <name> on the command line.
#
# Built-in profiles: quick, full, ci, taint_only, conservative_large_repo.
# Override a built-in by defining [profiles.<name>] here.
#
# [profiles.quick]
# mode = "ast"
# min_severity = "Medium"
#
# [profiles.ci]
# mode = "full"
# min_severity = "Medium"
# quiet = true
# default_format = "sarif"
# ─── Analysis engine toggles ────────────────────────────────────────
# Release-grade switches for optional analysis passes. Every field has a
# matching CLI flag (e.g. --no-symex / --backwards-analysis), which takes
# precedence over the config value for a single run. The listed env vars
# override both config and CLI when set to "0" or "false".
#
# For a shortcut that sets the full stack in one shot, use
# `nyx scan --engine-profile {fast,balanced,deep}`. The profile applies
# before individual toggles, so you can mix (e.g. `--engine-profile fast
# --backwards-analysis`). See `docs/cli.md` for profile contents.
#
# To print the resolved engine config for a given invocation without
# running a scan, pass `--explain-engine`.
[analysis.engine]
## Path-constraint solving (prunes infeasible paths in taint).
## Default: on. CLI: --constraint-solving / --no-constraint-solving.
## env: NYX_CONSTRAINT=0 disables.
constraint_solving = true
## Abstract interpretation (interval / string domains).
## Default: on. CLI: --abstract-interp / --no-abstract-interp.
## env: NYX_ABSTRACT_INTERP=0 disables.
abstract_interpretation = true
## k=1 context-sensitive callee inlining for intra-file calls.
## Default: on. CLI: --context-sensitive / --no-context-sensitive.
## env: NYX_CONTEXT_SENSITIVE=0 disables.
context_sensitive = true
## Demand-driven backwards taint analysis. Adds a second pass from
## candidate sinks back toward sources to recover flows the forward
## solver gave up on. Default: off because it adds scan time on large
## repos. CLI: --backwards-analysis / --no-backwards-analysis.
## env: NYX_BACKWARDS=1 enables.
backwards_analysis = false
## Per-file tree-sitter parse timeout (ms). 0 disables the cap.
## CLI: --parse-timeout-ms. env: NYX_PARSE_TIMEOUT_MS.
parse_timeout_ms = 10000
[analysis.engine.symex]
## Run the symex pipeline after taint. Produces witness strings and
## symbolic verdicts; disable only if you want raw taint output.
## Default: on. CLI: --symex / --no-symex. env: NYX_SYMEX=0 disables.
enabled = true
## Persist and consult cross-file SSA bodies so symex can reason about
## callees defined in other files. Adds index/DB work on pass 1.
## Default: on. CLI: --cross-file-symex / --no-cross-file-symex.
## env: NYX_CROSS_FILE_SYMEX=0 disables.
cross_file = true
## Intra-file interprocedural symex (k >= 2 via frame stack).
## Default: on. CLI: --symex-interproc / --no-symex-interproc.
## env: NYX_SYMEX_INTERPROC=0 disables.
interprocedural = true
## Use the SMT backend when nyx was built with the `smt` feature.
## Ignored when the feature is off.
## Default: on. CLI: --smt / --no-smt. env: NYX_SMT=0 disables.
smt = true
# ─── Per-language analysis rules ─────────────────────────────────────
# [analysis.languages.javascript.auth]
# enabled = true
# admin_path_patterns = ["/admin/"]
# admin_guard_names = ["requireAdmin", "isAdmin", "adminOnly"]
# login_guard_names = ["requireLogin", "authenticate", "requireAuth"]
# authorization_check_names = ["checkMembership", "hasWorkspaceMembership", "checkOwnership"]
# mutation_indicator_names = ["update", "delete", "create", "archive", "publish", "addMembership"]
# read_indicator_names = ["find", "findById", "get", "list"]
# token_lookup_names = ["findByToken"]
# token_expiry_fields = ["expires_at", "expiresAt"]
# token_recipient_fields = ["email", "recipient_email", "recipientEmail"]
# Auth-analysis rule IDs use language-normalized prefixes:
# javascript + typescript => js.auth.*
# python => py.auth.* ruby => rb.auth.* rust => rs.auth.*
# TypeScript inherits [analysis.languages.javascript.auth] by default; add an
# optional [analysis.languages.typescript.auth] block only for TS-specific
# overlays. These settings affect auth analysis only in "full" or "ast" mode.
# Add custom sources, sanitizers, sinks, terminators, and event handlers.
# Each language is keyed under [analysis.languages.<slug>] where slug is
# one of: rust, javascript, typescript, python, go, java, c, cpp, php, ruby.
@ -171,4 +340,6 @@ memory_limit_mb = 512
#
# Valid `kind` values: "source", "sanitizer", "sink"
# Valid `cap` values: "env_var", "html_escape", "shell_escape",
# "url_encode", "json_parse", "file_io", "all"
# "url_encode", "json_parse", "file_io",
# "fmt_string", "sql_query", "deserialize",
# "ssrf", "code_exec", "crypto", "all"


@ -1,13 +1,68 @@
[licenses]
allow = [
# --- Apache / MIT / BSD / permissive ---
"Apache-2.0",
"MIT",
"MIT-0",
"Unicode-3.0",
"BSD-2-Clause",
"Unlicense",
"BSD-3-Clause",
"ISC",
"Zlib",
"zlib-acknowledgement",
"BSL-1.0",
"NCSA",
"PostgreSQL",
"curl",
"BlueOak-1.0.0",
"X11",
"HPND",
"TCL",
"ICU",
"Info-ZIP",
# --- Unicode / data / specs ---
"Unicode-DFS-2016",
"Unicode-3.0",
# --- compression / libs ---
"bzip2-1.0.6",
"libpng-2.0",
"IJG",
"FTL",
# --- public domain style ---
"CC0-1.0",
"Unlicense",
"0BSD",
# --- weak copyleft (GPL-compatible) ---
"MPL-2.0",
"LGPL-3.0",
"EPL-2.0",
# --- GPL family ---
"GPL-3.0",
"GPL-3.0-or-later",
"GPL-2.0",
# --- Python / PSF ---
"PSF-2.0",
"Python-2.0",
"Python-2.0.1",
# --- Artistic / Perl ---
"Artistic-2.0",
# --- LLVM / clang ---
"Apache-2.0 WITH LLVM-exception",
# --- data / ML ---
"CDLA-Permissive-2.0",
# --- fonts ---
"OFL-1.1",
# --- Creative Commons (code-safe ones) ---
"CC-BY-3.0",
"CC-BY-4.0",
]

29
docs/SUMMARY.md Normal file

@ -0,0 +1,29 @@
# Summary
# Getting started
- [Quickstart](quickstart.md)
- [Installation](installation.md)
# Using nyx
- [CLI reference](cli.md)
- [Browser UI](serve.md)
- [Configuration](configuration.md)
- [Output formats](output.md)
# Coverage
- [Language maturity](language-maturity.md)
- [Rules](rules.md)
- [Auth analysis](auth.md)
# Under the hood
- [How it works](how-it-works.md)
- [Advanced analysis](advanced-analysis.md)
- [Detectors](detectors.md)
  - [Patterns](detectors/patterns.md)
  - [CFG](detectors/cfg.md)
  - [State](detectors/state.md)
  - [Taint](detectors/taint.md)

221
docs/advanced-analysis.md Normal file

@ -0,0 +1,221 @@
# Advanced Analysis
Nyx ships five optional analysis passes that layer on top of the core SSA
taint engine. Each pass is independently switchable via config
(`[analysis.engine]` in `nyx.conf` / `nyx.local`), a matching CLI flag pair,
or, as a legacy last-resort override for library users with no CLI entry
point, a `NYX_*` environment variable. Four are **on by default**, and turning
them off trades precision for speed; demand-driven backwards analysis is
**off by default**.
See [`Configuration`](configuration.md#analysisengine) for the full config
surface and CLI flag table. This page explains what each pass does, why it
helps, how to disable it, and what it does not cover.
---
## Abstract interpretation
**What it does.** Propagates interval and string abstract domains through the
SSA worklist alongside taint. Integer values carry `[lo, hi]` bounds;
string values carry a prefix and suffix (plus a bit domain for known-zero /
known-one bits). Values are joined at merge points and widened at loop
heads so the worklist always terminates.
**Why it helps.** Lets Nyx suppress some findings that are obviously safe
given the abstract value: a proven-bounded integer does not flow into a
SQL sink as an injection risk, and an SSRF sink whose URL prefix is locked to a
trusted host stays quiet. This turns a large class of false positives on
numeric and locked-prefix paths into true negatives.
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `abstract_interpretation = false` under `[analysis.engine]` |
| CLI flag | `--no-abstract-interp` |
| Env var (legacy) | `NYX_ABSTRACT_INTERP=0` |
**Limitations.** The interval domain is 64-bit signed; very wide or
overflow-producing arithmetic degrades to `⊤` (unbounded). String prefix /
suffix tracking is concat-only; it does not model reordering, reversal, or
character-level regex constraints. Loop widening deliberately drops
changing bounds rather than chasing fixpoints.
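
The join, widening, and overflow behaviour described in this section can be sketched in a few lines. This is a minimal illustration, not the code in `src/abstract_interp/`: the `Interval` type and its method names are hypothetical, and `i64::MIN` / `i64::MAX` stand in for the unbounded ends.

```rust
/// Hypothetical closed interval [lo, hi] over 64-bit signed integers.
/// `i64::MIN` / `i64::MAX` are used here to represent unbounded ends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Interval {
    pub lo: i64,
    pub hi: i64,
}

impl Interval {
    /// The unbounded interval (top of the lattice).
    pub const TOP: Interval = Interval { lo: i64::MIN, hi: i64::MAX };

    pub fn new(lo: i64, hi: i64) -> Self {
        assert!(lo <= hi, "interval bounds must be ordered");
        Interval { lo, hi }
    }

    /// Join at a merge point: the smallest interval covering both inputs.
    pub fn join(self, other: Interval) -> Interval {
        Interval {
            lo: self.lo.min(other.lo),
            hi: self.hi.max(other.hi),
        }
    }

    /// Widening at a loop head: a bound that moved since the previous
    /// iteration is dropped to its unbounded extreme instead of being
    /// chased, so the value stabilises after finitely many steps.
    pub fn widen(self, next: Interval) -> Interval {
        Interval {
            lo: if next.lo < self.lo { i64::MIN } else { self.lo },
            hi: if next.hi > self.hi { i64::MAX } else { self.hi },
        }
    }

    /// Addition that degrades to TOP when either bound overflows.
    pub fn add(self, other: Interval) -> Interval {
        match (self.lo.checked_add(other.lo), self.hi.checked_add(other.hi)) {
            (Some(lo), Some(hi)) => Interval { lo, hi },
            _ => Interval::TOP,
        }
    }
}
```

In this sketch a loop counter that goes from `[0, 5]` to `[0, 6]` widens straight to `[0, i64::MAX]`, which is the "drop the changing bound" behaviour rather than an exact fixpoint.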
**Source**: [`src/abstract_interp/`](https://github.com/elicpeter/nyx/tree/master/src/abstract_interp/).
---
## Context-sensitive analysis
**What it does.** Adds k=1 call-site-sensitive taint propagation for
intra-file callees. When a function is invoked, Nyx reanalyzes the callee
body with the actual per-argument taint signature of the call site,
producing call-site-specific return taint. Results are cached by
`(function_name, ArgTaintSig)` so repeated calls with the same signature
are free.
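
As a rough illustration of the caching idea, the sketch below keys results by function name plus a per-argument signature. The names `ArgSig`, `InlineCacheSketch`, and `union_of_args` are hypothetical stand-ins, not the real `ArgTaintSig` / `InlineCache` types, and the callee "analysis" is reduced to a toy closure.

```rust
use std::collections::HashMap;

/// Hypothetical per-argument taint signature: one capability bitmask per argument.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ArgSig(pub Vec<u32>);

/// Hypothetical cache keyed by (function name, argument signature).
#[derive(Default)]
pub struct InlineCacheSketch {
    entries: HashMap<(String, ArgSig), u32>,
    /// Number of times the callee body actually had to be analysed.
    pub misses: usize,
}

impl InlineCacheSketch {
    /// Return the cached return-taint for this call-site signature,
    /// running `analyse` only on the first request for that key.
    pub fn return_taint<F>(&mut self, func: &str, sig: ArgSig, analyse: F) -> u32
    where
        F: FnOnce(&ArgSig) -> u32,
    {
        let key = (func.to_string(), sig);
        if let Some(&hit) = self.entries.get(&key) {
            return hit;
        }
        self.misses += 1;
        let result = analyse(&key.1);
        self.entries.insert(key, result);
        result
    }
}

/// Toy callee model (an assumption for the example): the return value
/// carries the union of the argument taint bits.
pub fn union_of_args(sig: &ArgSig) -> u32 {
    sig.0.iter().fold(0, |acc, bits| acc | bits)
}
```

Calling the same helper with a tainted signature, a clean signature, and the tainted signature again analyses the body twice and serves the third call from the cache.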
**Why it helps.** A helper called once with a tainted argument and once
with a sanitized argument produces two different findings; without k=1
sensitivity, the conservative union of both call sites would be applied
to the sanitized call, producing a spurious finding there.
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `context_sensitive = false` under `[analysis.engine]` |
| CLI flag | `--no-context-sensitive` |
| Env var (legacy) | `NYX_CONTEXT_SENSITIVE=0` |
**Limitations.** Intra-file only. Cross-file callees are resolved via
summaries (see `src/summary/`) rather than re-inlined. Depth is capped at
k=1 to prevent cache blow-up and re-entrancy; higher k would require a
different cache key design. Callee bodies larger than the internal
`MAX_INLINE_BLOCKS` threshold fall back to the summary path. Cache keys
hash per-argument `Cap` bits but not source-origin identity, so two
callers with identical caps but different origins share cached
origin-attribution.
**Source**: [`src/taint/ssa_transfer.rs`](https://github.com/elicpeter/nyx/blob/master/src/taint/ssa_transfer.rs)
(`ArgTaintSig`, `InlineCache`, `inline_analyse_callee`).
---
## Symbolic execution
**What it does.** Builds a symbolic expression tree per tainted SSA value,
generates a witness string for each taint finding (the concrete-looking
shape of the dangerous value at the sink), and detects sanitization
patterns that the taint engine alone would miss. Supports string
operations (`trim`, `replace`, `toLower`, `substring`, `strlen`, …),
arithmetic, concatenation, phi nodes, and opaque calls.
**Why it helps.** Raises finding quality. A taint finding with a rendered
witness like `"SELECT * FROM t WHERE id=" + userInput` is substantially
easier to triage than one without. Also powers some confidence-gating for
downstream display.
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `symex.enabled = false` under `[analysis.engine]` |
| CLI flag | `--no-symex` |
| Env var (legacy) | `NYX_SYMEX=0` |
Two nested switches refine the scope without disabling symex entirely:
| Setting | CLI | Env | Default | Effect |
|---|---|---|---|---|
| `symex.cross_file` | `--no-cross-file-symex` | `NYX_CROSS_FILE_SYMEX=0` | on | Consult cross-file SSA bodies so symex can reason about callees defined in other files |
| `symex.interprocedural` | `--no-symex-interproc` | `NYX_SYMEX_INTERPROC=0` | on | Intra-file interprocedural symex (k ≥ 2 via frame stack) |
**Limitations.** Expression trees are bounded at `MAX_EXPR_DEPTH=32`;
deeper expressions degrade to `Unknown` rather than growing unboundedly.
Sanitizer detection is informational: string-replace sanitizer patterns
are reported as witness metadata, not used to clear taint.
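
A bounded expression tree with witness rendering might look like the sketch below. It is an assumption-laden illustration: the `Expr` enum, `concat`, and `witness` are hypothetical names, only concatenation is modelled, and the constant simply mirrors the `MAX_EXPR_DEPTH=32` bound quoted above.

```rust
/// Depth cap for the sketch, mirroring the documented `MAX_EXPR_DEPTH=32`.
pub const MAX_EXPR_DEPTH: usize = 32;

/// Hypothetical symbolic expression: literals, named tainted inputs,
/// concatenation, and an `Unknown` fallback.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Lit(String),
    Tainted(String),
    Concat(Box<Expr>, Box<Expr>),
    Unknown,
}

impl Expr {
    /// Height of the tree; leaves have depth 1.
    pub fn depth(&self) -> usize {
        match self {
            Expr::Concat(l, r) => 1 + l.depth().max(r.depth()),
            _ => 1,
        }
    }

    /// Build a concatenation, collapsing to `Unknown` once the result
    /// would exceed the depth cap instead of growing without bound.
    pub fn concat(lhs: Expr, rhs: Expr) -> Expr {
        let node = Expr::Concat(Box::new(lhs), Box::new(rhs));
        if node.depth() > MAX_EXPR_DEPTH {
            Expr::Unknown
        } else {
            node
        }
    }

    /// Render a human-readable witness: literals are quoted, tainted
    /// inputs appear by name.
    pub fn witness(&self) -> String {
        match self {
            Expr::Lit(s) => format!("{:?}", s),
            Expr::Tainted(name) => name.clone(),
            Expr::Concat(l, r) => format!("{} + {}", l.witness(), r.witness()),
            Expr::Unknown => "<unknown>".to_string(),
        }
    }
}
```

Recomputing `depth` on every `concat` is quadratic in the worst case; a real implementation would more likely store the depth on each node, but the sketch keeps the cap logic visible.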
**Source**: [`src/symex/`](https://github.com/elicpeter/nyx/tree/master/src/symex/).
---
## Demand-driven analysis
**What it does.** After the forward pass-2 taint analysis finishes, runs a
*backwards* walk from each sink's tainted SSA operands. The walk follows
reverse SSA-edge transfer (phi fan-out, `Assign` operand-fanout, `Call`
body-expansion or arg-fanout) until it reaches a taint source, proves
the flow infeasible via an accumulated path predicate, or exhausts its
budget. Each forward finding is then annotated with the aggregate verdict:
- `backwards-confirmed`: a matching source was reached. The finding picks
  up a small confidence boost and the note appears in
  `evidence.symbolic.cutoff_notes`.
- `backwards-infeasible`: every walk proved the flow unreachable.
  The finding is capped to Low confidence and a user-readable limiter is
  attached.
- `backwards-budget-exhausted`: the walk hit `BACKWARDS_VALUE_BUDGET`
  without a verdict. Recorded as a limiter so operators can see when
  the pass could not keep up.
- Inconclusive outcomes are a no-op: the forward finding is untouched.
Because the backwards walk can consult `GlobalSummaries.bodies_by_key`
(populated by the cross-file callee body persistence layer) it closes
across file boundaries; when a callee body is not loadable the walk
falls back to fanning out over the call's arguments so local reach-back
is still possible.
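
The budgeted backwards walk can be pictured as a worklist over reversed def-use edges. The sketch below is a simplification under stated assumptions: `ReverseGraph`, `Verdict`, and `walk_back` are hypothetical names, values are plain integer ids, the budget counts visited values, and the infeasibility verdict is left out because it needs a path predicate.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Outcomes modelled by this sketch (the infeasible verdict is omitted).
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Confirmed,
    BudgetExhausted,
    Inconclusive,
}

/// Hypothetical reverse def-use graph: value id -> the values it was computed from.
pub struct ReverseGraph {
    pub operands: HashMap<u32, Vec<u32>>,
    pub sources: HashSet<u32>,
}

/// Walk backwards from a sink operand until a source is reached or
/// `budget` distinct values have been visited without finding one.
pub fn walk_back(graph: &ReverseGraph, sink_operand: u32, budget: usize) -> Verdict {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([sink_operand]);
    while let Some(value) = queue.pop_front() {
        if !seen.insert(value) {
            continue; // already visited via another edge
        }
        if graph.sources.contains(&value) {
            return Verdict::Confirmed;
        }
        if seen.len() >= budget {
            return Verdict::BudgetExhausted;
        }
        if let Some(ops) = graph.operands.get(&value) {
            queue.extend(ops.iter().copied());
        }
    }
    Verdict::Inconclusive
}
```

With a chain `3 <- 2 <- 1` where value `1` is a source, a generous budget confirms the flow, while a budget of two gives up before reaching the source.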
**Why it helps.** Inverts the analysis direction so budget follows the
question the scanner actually cares about ("does any source reach
*this* sink?") instead of proving every potential source-to-sink
path. Corroborated findings are a stronger signal than forward-only
ones, and proven-infeasible flows provide a principled way to lower
confidence on forward false positives without silently dropping them.
**How to turn it on.** Defaults off so the benchmark floor is preserved
while the pass stabilises.
| Surface | Value |
|---|---|
| Config | `backwards_analysis = true` under `[analysis.engine]` |
| CLI flag | `--backwards-analysis` / `--no-backwards-analysis` |
| Env var (legacy) | `NYX_BACKWARDS=1` |
**Limitations (first cut).** Reverse call-graph expansion past a
`ReachedParam` is deferred; the walk terminates at function parameters
rather than crossing back into callers. Path-constraint pruning is
conservative: only the accumulated `PredicateSummary` bits are consulted,
not the full symbolic predicate stack. Depth-bounded at k=2 for
cross-function body expansion. See `DEFAULT_BACKWARDS_DEPTH`,
`BACKWARDS_VALUE_BUDGET`, and `MAX_BACKWARDS_CALLEE_BLOCKS` in
`src/taint/backwards.rs` for the exact bounds.
**Source**: [`src/taint/backwards.rs`](https://github.com/elicpeter/nyx/blob/master/src/taint/backwards.rs).
---
## Constraint solving
**What it does.** Collects path constraints at each branch in SSA and
propagates them alongside taint. Prunes paths whose accumulated constraint
set is unsatisfiable; a taint flow guarded by `if x < 0 && x > 10` is
dropped rather than surfaced. Optionally delegates the satisfiability
check to Z3 when Nyx is built with the `smt` Cargo feature.
**Why it helps.** Removes a class of FPs rooted in clearly-infeasible
control-flow combinations. Without path constraints, a taint flow that
only occurs when mutually-exclusive branches are simultaneously taken can
still produce a finding.
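
A lightweight, solver-free check for the `x < 0 && x > 10` style of contradiction can be sketched by intersecting bounds on a single variable. This is only an illustration of the idea; the `Cmp` enum and `satisfiable` function are hypothetical and are not taken from `src/constraint/`.

```rust
/// A comparison of one integer variable against a literal (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub enum Cmp {
    Lt(i64),
    Le(i64),
    Gt(i64),
    Ge(i64),
    Eq(i64),
}

/// Returns true when the conjunction of constraints on a single variable
/// admits at least one `i64` value. Works by narrowing a [lo, hi] range.
pub fn satisfiable(constraints: &[Cmp]) -> bool {
    let (mut lo, mut hi) = (i64::MIN, i64::MAX);
    for c in constraints {
        match *c {
            // x < k means x <= k - 1; nothing is below i64::MIN.
            Cmp::Lt(k) => match k.checked_sub(1) {
                Some(v) => hi = hi.min(v),
                None => return false,
            },
            Cmp::Le(k) => hi = hi.min(k),
            // x > k means x >= k + 1; nothing is above i64::MAX.
            Cmp::Gt(k) => match k.checked_add(1) {
                Some(v) => lo = lo.max(v),
                None => return false,
            },
            Cmp::Ge(k) => lo = lo.max(k),
            Cmp::Eq(k) => {
                lo = lo.max(k);
                hi = hi.min(k);
            }
        }
    }
    lo <= hi
}
```

A path guarded by `[Cmp::Lt(0), Cmp::Gt(10)]` narrows to an empty range and would be pruned; relations between several variables are beyond this kind of check, which is where an SMT backend comes in.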
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `constraint_solving = false` under `[analysis.engine]` |
| CLI flag | `--no-constraint-solving` |
| Env var (legacy) | `NYX_CONSTRAINT=0` |
The SMT backend is a separate switch:
| Setting | CLI | Env | Default | Effect |
|---|---|---|---|---|
| `symex.smt` | `--no-smt` | `NYX_SMT=0` | on when built with `smt` feature | Delegate satisfiability checks to Z3; ignored if Nyx was built without `smt` |
**Limitations.** The default path-constraint domain is syntactic;
trivially-inconsistent pairs are caught without an SMT solver, but richer
algebraic unsatisfiability requires the `smt` feature (Z3). Without `smt`,
Nyx ships a lightweight satisfiability check that catches literal
contradictions but not deeper reasoning.
**Source**: [`src/constraint/`](https://github.com/elicpeter/nyx/tree/master/src/constraint/).
---
## Combining the switches
The defaults (every pass on except demand-driven backwards analysis) are the
configuration Nyx is benchmarked against.
Turning any switch off trades precision for speed and may move findings
relative to the published baseline; CI regression gates assume defaults.
If you need a minimal-overhead scan (for very large repositories or a
pre-commit fast path), the AST-only scan mode (`--mode ast`) skips CFG,
taint, and every advanced pass entirely and is the right tool.

1
docs/assets Symbolic link

@ -0,0 +1 @@
../assets

91
docs/auth.md Normal file

@ -0,0 +1,91 @@
# Auth analysis
**Rust today.** Other languages have rule scaffolding in [`src/auth_analysis/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/config.rs) (Python, Ruby, Go, Java, JavaScript, TypeScript), but only Rust has benchmark corpus coverage and the precision work to back it. Treat findings on other languages as preview; the rule prefix (`py.auth.*`, `js.auth.*`, `rb.auth.*`, `go.auth.*`, `java.auth.*`) is reserved but the matchers haven't been validated against real codebases yet.
## What it catches
The Rust rule is `rs.auth.missing_ownership_check`. It fires when a request handler reaches a privileged operation that takes a scoped identifier (`*_id`, row reference, scoped resource) without a preceding ownership or membership check.
Concretely, it looks for five patterns of authorization in the function body and flags the call when none are present:
- A call to a recognised authorization helper. Defaults: `check_ownership`, `has_ownership`, `require_ownership`, `ensure_ownership`, `is_owner`, `authorize`, `verify_access`, `has_permission`, `can_access`, `can_manage`, plus `*_membership` and `require_{group,org,workspace,tenant,team}_member` variants. Extend in `[analysis.languages.rust]`.
- An ownership-equality check on a row reference: `if owner_id != user.id { return 403 }` or any `field_id != self_actor` shape. The check writes `AuthCheck` evidence back to the row-fetch arguments via `AnalysisUnit.row_field_vars`.
- A self-actor reference: `let user = require_auth(...).await?` followed by use of `user.id`, `user.user_id`, `user.uid`. The actor is recognised from typed extractor params (`Extension<Session>`, `CurrentUser`, etc.) and from typed helper bindings.
- A SQL query that joins through an ACL table or filters by `user_id` predicate. Detected without a SQL parser via [`sql_semantics.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/sql_semantics.rs); the authorized result variable propagates through `let row = ...prepare(LIT)...`, `for row in result`, `let id = row.get(...)`.
- A helper-summary lift: handler calls `validate_target(db, widget_id, user.id)` whose body contains a `require_*_member` call. Cross-function summaries are merged at fixed-point (capped at 4 iterations).
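
For orientation, the ownership-equality shape from the list above looks roughly like the handler below. Everything here is hypothetical (the `User`, `Widget`, and `HandlerError` types and the `rename_widget` function are invented for the example), and whether a given matcher accepts this exact shape depends on the rule configuration.

```rust
/// Hypothetical error type for the example handler.
#[derive(Debug, PartialEq)]
pub enum HandlerError {
    Forbidden,
}

/// Hypothetical authenticated actor.
pub struct User {
    pub id: u64,
}

/// Hypothetical row with an owner column.
pub struct Widget {
    pub owner_id: u64,
    pub name: String,
}

/// Mutates a widget only after comparing the row's owner to the actor.
pub fn rename_widget(user: &User, widget: &mut Widget, name: &str) -> Result<(), HandlerError> {
    // Ownership-equality check on the row reference, before the mutation.
    if widget.owner_id != user.id {
        return Err(HandlerError::Forbidden);
    }
    widget.name = name.to_string();
    Ok(())
}
```

Removing the `if` block would leave a mutation on a scoped row with no preceding ownership evidence, which is the situation the rule is described as flagging.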
## Sink classification
The same call name can be safe on a local collection and dangerous on a database. The detector categorises each candidate sink before deciding whether to flag:
| Class | Examples | Default treatment |
|---|---|---|
| `InMemoryLocal` | `map.insert`, `set.insert`, `vec.push` on tracked local | Never a sink |
| `RealtimePublish` | `realtime.publish_to_group`, `pubsub.send` | Sink unless ownership is established for the channel scope |
| `OutboundNetwork` | `http.post`, `reqwest::Client::post` | Sink unless a sanitiser is on the path |
| `CacheCrossTenant` | `redis.set`, `memcached.set` with scoped keys | Sink unless tenant is checked |
| `DbMutation` | `db.insert`, `repo.save` with scoped IDs | Sink unless ownership is established |
| `DbCrossTenantRead` | `db.query` returning rows from a tenant scope | Sink unless ACL-join or tenant predicate is present |
Receiver type drives the classification when SSA type facts are available, so `client.send(...)` correctly resolves through the receiver's inferred type.
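
The default treatments in the table can be read as a small decision function. The sketch below restates the table in code under the assumption that each "unless" clause maps to one boolean piece of evidence; the `Evidence` struct and `is_flagged` function are hypothetical, and only the class names come from the table.

```rust
/// Sink classes, named as in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SinkClass {
    InMemoryLocal,
    RealtimePublish,
    OutboundNetwork,
    CacheCrossTenant,
    DbMutation,
    DbCrossTenantRead,
}

/// Hypothetical summary of what the analysis established before the call.
#[derive(Debug, Default, Clone, Copy)]
pub struct Evidence {
    pub ownership: bool,
    pub sanitizer_on_path: bool,
    pub tenant_checked: bool,
    pub acl_join_or_tenant_predicate: bool,
}

/// Returns true when the call would be reported under the default treatment.
pub fn is_flagged(class: SinkClass, ev: &Evidence) -> bool {
    match class {
        // Local collections are never sinks.
        SinkClass::InMemoryLocal => false,
        SinkClass::RealtimePublish | SinkClass::DbMutation => !ev.ownership,
        SinkClass::OutboundNetwork => !ev.sanitizer_on_path,
        SinkClass::CacheCrossTenant => !ev.tenant_checked,
        SinkClass::DbCrossTenantRead => !ev.acl_join_or_tenant_predicate,
    }
}
```

Note that in this reading an ownership check does not quiet an `OutboundNetwork` sink; each class looks only at its own kind of evidence.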
## What it can't catch
- **Non-Rust frameworks**, in practice. Scaffolding exists; coverage doesn't.
- **Type-system authorization.** A typestate pattern that makes unauthenticated handlers fail to compile (`fn endpoint(user: AuthenticatedUser<Admin>)`) is invisible. This is mostly fine because the type system already enforced the check, but the rule won't credit it.
- **Authorization performed only via macros** that the AST doesn't expose as a recognisable call.
- **Cross-async-boundary actor binding.** If the handler awaits `let user = require_auth(...).await?` and then spawns a task that uses `user.id` after a `tokio::spawn`, the spawn body is treated as a separate scope.
## The taint-based variant
A second rule, `rs.auth.missing_ownership_check.taint`, folds the same logic into the SSA/taint engine using the `Cap::UNAUTHORIZED_ID` capability (bit 12). Request-bound handler parameters seed `UNAUTHORIZED_ID` into taint state; ownership checks act as sanitizers that strip the cap; sinks that take scoped IDs require it absent.
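
The seed / strip / require-absent cycle is ordinary bitmask arithmetic. The sketch below assumes a plain `u32` capability set; the `Taint` wrapper and its method names are hypothetical, and only the bit position (12) comes from the description above.

```rust
/// Bit 12, matching the documented position of `Cap::UNAUTHORIZED_ID`.
pub const UNAUTHORIZED_ID: u32 = 1 << 12;

/// Hypothetical taint state reduced to a capability bitmask.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Taint(pub u32);

impl Taint {
    /// Request-bound handler parameters start with the cap set.
    pub fn from_request_param() -> Taint {
        Taint(UNAUTHORIZED_ID)
    }

    /// An ownership check acts as a sanitizer and strips the cap.
    pub fn after_ownership_check(self) -> Taint {
        Taint(self.0 & !UNAUTHORIZED_ID)
    }

    /// A scoped-ID sink reports when the cap is still present.
    pub fn sink_would_flag(self) -> bool {
        (self.0 & UNAUTHORIZED_ID) != 0
    }
}
```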
This path is **off by default** while the standalone analyser carries the stable signal. Enable both:
```toml
[scanner]
enable_auth_as_taint = true
```
Run them together; if both fire for the same site, treat it as the same finding (the taint variant carries fuller flow evidence).
## Tuning
### Add a project-specific authorization helper
```toml
[[analysis.languages.rust.rules]]
matchers = ["require_subscription", "ensure_paid_seat"]
kind = "sanitizer"
cap = "unauthorized_id"
```
The same rule recognised in the standalone analyser also strips `Cap::UNAUTHORIZED_ID` for the taint-based variant.
### Recognised actor names
Recognised by default: `user.id`, `user.user_id`, `user.uid`, `session.user_id`, `current_user.id`, plus typed extractor parameters with `CurrentUser`, `SessionUser`, `AuthUser`, `Extension<...>` shapes. To add a custom binding pattern, file an issue or add a fixture; the heuristic is in [`src/auth_analysis/checks.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/checks.rs) under `extract_validation_target` and friends.
### Suppress
Inline:
```rust
db.insert(widget_id, value)?; // nyx:ignore rs.auth.missing_ownership_check
```
Or filter by severity / confidence in CI:
```bash
nyx scan . --severity ">=MEDIUM" --min-confidence medium
```
## In the UI
Auth findings render alongside taint findings in the [browser UI](serve.md). The flow visualiser shows the sink call, the actor reference (when one was found), and any helper-summary path the engine traversed; the How to fix panel mirrors the rule's recommendation.
<p align="center"><img src="../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: numbered source → call → sink walk with a How to fix panel and an inline evidence object" width="900"/></p>
## Where the work was done
The remediation work is documented release-by-release in `tests/benchmark/RESULTS.md` under the Rust auth row. Phases A1 through B5 (precision and structural improvements) and Phase C (taint-based variant) all landed on the 0.5.0 release branch. The benchmark corpus at [`tests/benchmark/corpus/rust/auth/`](https://github.com/elicpeter/nyx/tree/master/tests/benchmark/corpus/rust/auth/) is 10 fixtures covering the five FP patterns plus a true-positive control.

1
docs/changelog.md Normal file

@ -0,0 +1 @@
{{#include ../CHANGELOG.md}}


@ -53,8 +53,15 @@ nyx scan [PATH] [OPTIONS]
| Flag | Default | Description |
|------|---------|-------------|
| `-f, --format <FMT>` | `console` | Output format: `console`, `json`, or `sarif` |
| `--quiet` | off | Suppress status messages (stderr); stdout stays clean |
| `--quiet` | off | Suppress status messages (stderr), including the Preview-tier banner for C/C++ scans |
| `--no-rank` | off | Disable attack-surface ranking |
| `--no-state` | off | Disable state-model analysis (resource lifecycle + auth state). Overrides `scanner.enable_state_analysis` |
### Profiles
| Flag | Default | Description |
|------|---------|-------------|
| `--profile <NAME>` | *(none)* | Apply a named scan profile. Built-ins: `quick`, `full`, `ci`, `taint_only`, `conservative_large_repo`. User-defined profiles override built-ins with the same name. CLI flags still take precedence over profile values |
### Filtering
@ -63,10 +70,11 @@ nyx scan [PATH] [OPTIONS]
| `--severity <EXPR>` | *(none)* | Filter findings by severity |
| `--min-score <N>` | *(none)* | Drop findings with rank score below N |
| `--min-confidence <LEVEL>` | *(none)* | Drop findings below this confidence level (`low`, `medium`, `high`) |
| `--require-converged` | off | Drop findings whose engine provenance notes indicate widening (over-report) or analysis bail. Keeps `under-report` findings (emitted flow is still real). Intended for strict CI gates. |
| `--fail-on <SEV>` | *(none)* | Exit code 1 if any finding >= this severity |
| `--show-suppressed` | off | Show inline-suppressed findings (dimmed, tagged `[SUPPRESSED]`) |
| `--keep-nonprod-severity` | off | Don't downgrade severity for test/vendor paths |
| `--all` | off | Disable category filtering, rollups, and LOW budgets show everything |
| `--all` | off | Disable category filtering, rollups, and LOW budgets -- show everything |
| `--include-quality` | off | Include Quality-category findings (hidden by default) |
| `--max-low <N>` | `20` | Maximum total LOW findings to show |
| `--max-low-per-file <N>` | `1` | Maximum LOW findings per file |
@ -85,6 +93,65 @@ nyx scan [PATH] [OPTIONS]
**Deprecated aliases**: `--high-only` (use `--severity HIGH`), `--include-nonprod` (use `--keep-nonprod-severity`).
`--fail-on` returns a non-zero exit code when the threshold trips, so CI jobs fail without further wiring:
<p align="center"><img src="../assets/screenshots/docs/cli-failon.png" alt="nyx scan with --fail-on HIGH against a small fixture: three HIGH taint findings printed, followed by exit=1 from the shell" width="900"/></p>
Quality-category and rollup-prone Low findings are filtered down by default. The footer tells you exactly what got dropped and which knob to turn:
<p align="center"><img src="../assets/screenshots/docs/cli-rollup-tail.png" alt="nyx scan tail: warning '*' generated 57 issues; Suppressed 92 LOW/Quality findings; Active filters max_low=20, max_low_per_file=1, max_low_per_rule=10; Use --include-quality, --max-low, or --all to adjust" width="900"/></p>
### Analysis Engine Toggles
Override the corresponding `[analysis.engine]` values in `nyx.conf` for a single run. Every toggle defaults to **on** except `--backwards-analysis`, which defaults to **off**; pass the `--no-*` variant to disable a toggle.
| Pair | Config field | Effect when disabled |
|------|---|---|
| `--constraint-solving` / `--no-constraint-solving` | `constraint_solving` | Skip path-constraint solving; infeasible paths no longer pruned |
| `--abstract-interp` / `--no-abstract-interp` | `abstract_interpretation` | Skip interval / string / bit abstract domains |
| `--context-sensitive` / `--no-context-sensitive` | `context_sensitive` | Treat intra-file callees insensitively (summary-only) |
| `--symex` / `--no-symex` | `symex.enabled` | Skip the symex pipeline; no symbolic verdicts or witnesses |
| `--cross-file-symex` / `--no-cross-file-symex` | `symex.cross_file` | Skip extracting / consulting cross-file SSA bodies |
| `--symex-interproc` / `--no-symex-interproc` | `symex.interprocedural` | Cap symex frame stack at the entry function |
| `--smt` / `--no-smt` | `symex.smt` | Skip the SMT backend (still a no-op without the `smt` feature) |
| `--backwards-analysis` / `--no-backwards-analysis` | `backwards_analysis` | Skip the demand-driven backwards taint walk from sinks (default **off**) |
| `--parse-timeout-ms <N>` | `parse_timeout_ms` | Per-file tree-sitter parse timeout (ms); `0` disables the cap |
### Lattice-width Caps
Two caps bound the width of taint origin sets and points-to sets per SSA value. When a set would exceed the cap, entries are truncated deterministically and an engine note (`OriginsTruncated` / `PointsToTruncated`) is recorded on affected findings so you can see when precision was lost.
| Flag | Default | Description |
|------|---------|-------------|
| `--max-origins <N>` | `32` | Max taint origins retained per lattice value. Raise on very wide codebases where truncation is observed; lower only when lattice width is a measured bottleneck. Also set via `NYX_MAX_ORIGINS` |
| `--max-pointsto <N>` | `32` | Max abstract heap objects retained per points-to set. Raise on factory-heavy codebases where truncation is observed. Also set via `NYX_MAX_POINTSTO` |
See [configuration.md](configuration.md#analysisengine) for the full schema.
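The capping behaviour described above can be modelled in a few lines. This is a minimal Python sketch for illustration only, not Nyx's implementation: the function name `cap_set` is hypothetical, and sorting before truncating is an assumption standing in for whatever deterministic order the engine actually uses.

```python
def cap_set(values, limit=32, note="OriginsTruncated"):
    """Keep at most `limit` entries and report whether precision was lost.

    Returns (kept_entries, engine_notes).
    """
    # Assumption: sorting gives a stable order, so the same input always
    # truncates to the same entries.
    ordered = sorted(set(values))
    if len(ordered) <= limit:
        return ordered, []
    # Over the cap: drop the tail and record a note for affected findings.
    return ordered[:limit], [note]


kept, notes = cap_set(range(40), limit=32)
print(len(kept), notes)
```

A set under the cap comes back unchanged with no note, which is why the note is a usable signal that raising `--max-origins` or `--max-pointsto` might help.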
### Engine-Depth Profile
Individual engine toggles are fine-grained but hard to remember in combination. The `--engine-profile` shortcut sets the whole stack in one shot, and individual flags are layered on top after the profile is applied.
| Profile | Backwards | Symex | Abstract-interp | Context-sensitive |
|---------|-----------|-------|-----------------|-------------------|
| `fast` | off | off | off | off |
| `balanced` (default) | off | off | on | on |
| `deep` | on | on (cross-file + interprocedural) | on | on |
All three profiles build the AST, CFG, and SSA lattice and run forward taint; the columns above show which additional analyses each profile enables. SMT (`symex.smt`) is always off unless Nyx was built with `--features smt`.
Individual flags override the profile. For example, `--engine-profile fast --backwards-analysis` runs the fast stack but with backwards analysis on.
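The layering order (profile first, individual flags on top) can be sketched as a small Python model. The dictionary keys and the `resolve_engine` helper are hypothetical names chosen for illustration; only the on/off values come from the profile table above.

```python
# Illustrative model only; the key names are hypothetical.
PROFILES = {
    "fast": {"backwards_analysis": False, "symex": False,
             "abstract_interp": False, "context_sensitive": False},
    "balanced": {"backwards_analysis": False, "symex": False,
                 "abstract_interp": True, "context_sensitive": True},
    "deep": {"backwards_analysis": True, "symex": True,
             "abstract_interp": True, "context_sensitive": True},
}


def resolve_engine(profile="balanced", overrides=None):
    """Start from the profile row, then apply explicit flag overrides."""
    resolved = dict(PROFILES[profile])
    resolved.update(overrides or {})
    return resolved


# Models: --engine-profile fast --backwards-analysis
print(resolve_engine("fast", {"backwards_analysis": True}))
```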
### Explain Effective Engine
`--explain-engine` prints the resolved engine configuration (profile + config + CLI overrides + env-var fallbacks) to stdout and exits without scanning. Useful for sanity-checking a CI invocation.
```bash
nyx scan --engine-profile deep --no-smt --explain-engine
```
<p align="center"><img src="../assets/screenshots/docs/cli-explain-engine.png" alt="nyx scan --engine-profile deep --explain-engine output: resolved config showing every analysis pass, its current state, and the CLI flag/env var that controls it" width="900"/></p>
### Examples
```bash
@ -148,6 +215,8 @@ nyx index status [PATH]
Display index statistics (file count, size, last modified) for the given path.
<p align="center"><img src="../assets/screenshots/docs/cli-idxstatus.png" alt="nyx index status output: project name, index path under the platform config dir, exists/size/modified fields" width="900"/></p>
---
## `nyx list`
@ -185,7 +254,9 @@ Manage configuration.
### `nyx config show`
Print the effective merged configuration as TOML.
Print the effective merged configuration as TOML. Useful for sanity-checking what the scanner is actually using after `nyx.conf` and `nyx.local` merge:
<p align="center"><img src="../assets/screenshots/docs/cli-configshow.png" alt="nyx config show output: TOML dump of the merged scanner config showing [scanner] mode/min_severity/excluded_extensions/excluded_directories, [database] settings, and resolved engine toggles" width="900"/></p>
### `nyx config path`
@ -204,7 +275,7 @@ Add a custom taint rule. Written to `nyx.local`.
| `--lang` | `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby` |
| `--matcher` | Function or property name to match |
| `--kind` | `source`, `sanitizer`, `sink` |
| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `all` |
| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `all` |
### `nyx config add-terminator`
@ -216,19 +287,30 @@ Add a terminator function (e.g. `process.exit`). Written to `nyx.local`.
---
## Exit Codes
## Exit codes
| Code | Meaning |
|------|---------|
| `0` | Scan completed; no findings matched `--fail-on` threshold (or no `--fail-on` specified) |
| `1` | Scan completed but at least one finding met or exceeded the `--fail-on` severity |
| Non-zero | Error during scan (I/O error, config parse error, database error, etc.) |
See [output.md](output.md#exit-codes). Summary: `0` on success (including findings without `--fail-on`), `1` when `--fail-on` trips, non-zero on scan errors.
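As a rough illustration of the `0` / `1` decision summarised above, here is a minimal Python sketch. The `exit_code` function and the severity ordering table are assumptions made for the example; scan errors (the third, non-zero case) are not modelled.

```python
# Assumed ordering: LOW < MEDIUM < HIGH.
SEVERITY_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def exit_code(finding_severities, fail_on=None):
    """Return 1 when any finding meets or exceeds the --fail-on threshold."""
    if fail_on is None:
        return 0  # findings alone never fail the run
    threshold = SEVERITY_ORDER[fail_on.upper()]
    tripped = any(SEVERITY_ORDER[s.upper()] >= threshold
                  for s in finding_severities)
    return 1 if tripped else 0


print(exit_code(["Low", "High"], fail_on="MEDIUM"))
```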
---
## Environment Variables
## Environment variables
Runtime behaviour:
| Variable | Description |
|----------|-------------|
| `RUST_LOG` | Set tracing verbosity (e.g. `RUST_LOG=debug nyx scan .`) |
| `NO_COLOR` | Disable ANSI color output |
Engine toggles (legacy, still honored; prefer CLI flags or `[analysis.engine]` config):
| Variable | Matches |
|---|---|
| `NYX_CONSTRAINT` | `--constraint-solving` |
| `NYX_ABSTRACT_INTERP` | `--abstract-interp` |
| `NYX_CONTEXT_SENSITIVE` | `--context-sensitive` |
| `NYX_SYMEX`, `NYX_CROSS_FILE_SYMEX`, `NYX_SYMEX_INTERPROC` | `--symex` and friends |
| `NYX_SMT` | `--smt` (no-op without the `smt` feature) |
| `NYX_BACKWARDS` | `--backwards-analysis` |
| `NYX_PARSE_TIMEOUT_MS` | `--parse-timeout-ms` |
| `NYX_MAX_ORIGINS`, `NYX_MAX_POINTSTO` | `--max-origins`, `--max-pointsto` |


@ -1,6 +1,8 @@
# Configuration
Nyx uses TOML configuration files. A default config is auto-generated on first run.
Nyx uses TOML configuration files. A default config is auto-generated on first run. If you'd rather edit settings and rules from the browser, the [Config page in `nyx serve`](serve.md#config) is a live editor that writes back to `nyx.local`:
<p align="center"><img src="../assets/screenshots/docs/serve-config.png" alt="Nyx config page: General settings, Triage Sync toggle, Sources panel with language/matcher/capability dropdowns and a per-language matcher table" width="900"/></p>
## File Locations
@ -14,8 +16,8 @@ Run `nyx config path` to see the exact directory on your system.
## File Precedence
1. **`nyx.conf`** Default config (auto-created from built-in template on first run)
2. **`nyx.local`** User overrides (loaded on top of defaults)
1. **`nyx.conf`** -- Default config (auto-created from built-in template on first run)
2. **`nyx.local`** -- User overrides (loaded on top of defaults)
Both files are optional. CLI flags take precedence over both.
@ -24,8 +26,10 @@ Both files are optional. CLI flags take precedence over both.
| Type | Behavior |
|------|----------|
| Scalars (`mode`, `min_severity`, booleans) | User value wins |
| Arrays (`excluded_extensions`, `excluded_directories`) | Union + deduplicate |
| Arrays (`excluded_extensions`, `excluded_directories`, `excluded_files`) | Union + deduplicate |
| Analysis rules | Per-language union with deduplication |
| Profiles | User profile with same name fully replaces built-in |
| Server / Runs | User value wins (full section override) |
Example:
```toml
@ -36,7 +40,7 @@ excluded_extensions = ["jpg", "png", "exe"]
excluded_extensions = ["foo", "jpg"]
# Effective result:
# ["exe", "foo", "jpg", "png"] sorted, deduped union
# ["exe", "foo", "jpg", "png"] -- sorted, deduped union
```
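The merge rules in the table above can be modelled directly. This Python sketch is illustrative only; `merge_arrays` and `merge_scalar` are hypothetical helper names, and representing "user did not set a value" as `None` is an assumption.

```python
def merge_arrays(default, user):
    """Arrays: union of both lists, deduplicated and sorted."""
    return sorted(set(default) | set(user))


def merge_scalar(default, user):
    """Scalars: the user value wins whenever one is set."""
    return default if user is None else user


print(merge_arrays(["jpg", "png", "exe"], ["foo", "jpg"]))
# ['exe', 'foo', 'jpg', 'png']
```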
---
@ -49,30 +53,33 @@ excluded_extensions = ["foo", "jpg"]
|-------|------|---------|-------------|
| `mode` | `"full"` \| `"ast"` \| `"cfg"` \| `"taint"` | `"full"` | Analysis mode |
| `min_severity` | `"Low"` \| `"Medium"` \| `"High"` | `"Low"` | Minimum severity to report |
| `max_file_size_mb` | int \| null | null | Max file size in MiB; null = unlimited |
| `max_file_size_mb` | int \| null | 16 | Max file size in MiB; null = unlimited. Default is a safe ceiling for untrusted repos; lift explicitly when scanning trusted codebases with large generated files |
| `excluded_extensions` | [string] | `["jpg", "png", "gif", "mp4", ...]` | File extensions to skip |
| `excluded_directories` | [string] | `["node_modules", ".git", "target", ...]` | Directories to skip |
| `excluded_files` | [string] | `[]` | Specific files to skip |
| `read_global_ignore` | bool | `false` | Honor global ignore file |
| `read_global_ignore` | bool | `false` | Honor global ignore file (RESERVED) |
| `read_vcsignore` | bool | `true` | Honor `.gitignore` / `.hgignore` |
| `require_git_to_read_vcsignore` | bool | `true` | Require `.git` dir to apply gitignore |
| `one_file_system` | bool | `false` | Don't cross filesystem boundaries |
| `follow_symlinks` | bool | `false` | Follow symbolic links |
| `scan_hidden_files` | bool | `false` | Scan dot-files |
| `include_nonprod` | bool | `false` | Keep original severity for test/vendor paths |
| `enable_state_analysis` | bool | `false` | Enable resource lifecycle + auth state analysis. Detects use-after-close, double-close, resource leaks (per-function scope), and unauthenticated access. Requires `mode = "full"` or `mode = "cfg"`. |
| `enable_state_analysis` | bool | `true` | Enable resource lifecycle + auth state analysis. Detects use-after-close, double-close, resource leaks (per-function scope), and unauthenticated access. Requires `mode = "full"` or `mode = "taint"`. |
### `[database]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `path` | string | `""` | Custom SQLite DB path; empty = platform default |
| `path` | string | `""` | Custom SQLite DB path; empty = platform default (RESERVED) |
| `auto_cleanup_days` | int | `30` | Days to keep DB files (RESERVED) |
| `max_db_size_mb` | int | `1024` | Maximum DB size in MiB (RESERVED) |
| `vacuum_on_startup` | bool | `false` | Run VACUUM before indexed scans |
### `[output]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `default_format` | `"console"` \| `"json"` \| `"sarif"` | `"console"` | Default output format |
| `default_format` | `"console"` \| `"json"` \| `"sarif"` | `"console"` | Default output format (used when `--format` is not specified) |
| `quiet` | bool | `false` | Suppress status messages |
| `max_results` | int \| null | null | Cap number of findings; null = unlimited |
| `attack_surface_ranking` | bool | `true` | Enable attack-surface ranking |
@ -89,11 +96,122 @@ excluded_extensions = ["foo", "jpg"]
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max_depth` | int \| null | null | Max filesystem traversal depth; null = unlimited |
| `min_depth` | int \| null | null | Min depth for reported entries (RESERVED) |
| `prune` | bool | `false` | Stop traversing into matching directories (RESERVED) |
| `worker_threads` | int \| null | null | Worker thread count; null/0 = auto-detect |
| `batch_size` | int | `100` | Files per index batch |
| `channel_multiplier` | int | `4` | Channel capacity = threads x multiplier |
| `rayon_thread_stack_size` | int | `8388608` | Rayon thread stack size in bytes (8 MiB) |
| `prune` | bool | `false` | Stop traversing into matching directories |
| `scan_timeout_secs` | int \| null | null | Per-file timeout in seconds (RESERVED) |
| `memory_limit_mb` | int | `512` | Max memory in MiB (RESERVED) |
### `[server]`
Configuration for the local web UI (`nyx serve`).
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `true` | Whether the serve command is enabled |
| `host` | string | `"127.0.0.1"` | Host to bind to (localhost by default) |
| `port` | int | `9700` | Port for the web UI |
| `open_browser` | bool | `true` | Open browser automatically on serve |
| `auto_reload` | bool | `true` | Auto-reload UI when scan results change |
| `persist_runs` | bool | `true` | Persist scan runs for history view |
| `max_saved_runs` | int | `50` | Maximum number of saved runs |
### `[runs]`
Configuration for scan run persistence and history.
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `persist` | bool | `false` | Persist scan run history to disk |
| `max_runs` | int | `100` | Maximum number of runs to keep |
| `save_logs` | bool | `false` | Save scan logs with each run |
| `save_stdout` | bool | `false` | Save stdout capture with each run |
| `save_code_snippets` | bool | `true` | Save code snippets in findings |
### `[profiles.<name>]`
Named scan presets that override scan-related config. Activate with `--profile <name>`.
All fields are optional; omitted fields inherit from the base config.
| Field | Type | Description |
|-------|------|-------------|
| `mode` | string | Analysis mode |
| `min_severity` | string | Minimum severity |
| `max_file_size_mb` | int | Max file size in MiB |
| `include_nonprod` | bool | Keep original severity for test/vendor |
| `enable_state_analysis` | bool | Enable state analysis |
| `default_format` | string | Output format |
| `quiet` | bool | Suppress status output |
| `attack_surface_ranking` | bool | Enable ranking |
| `max_results` | int | Max findings |
| `min_score` | int | Min rank score |
| `show_all` | bool | Show all findings |
| `include_quality` | bool | Include quality findings |
| `worker_threads` | int | Worker thread count |
| `max_depth` | int | Max traversal depth |
**Built-in profiles:**
| Name | Description |
|------|-------------|
| `quick` | AST-only, medium+ severity |
| `full` | Full analysis with state analysis enabled |
| `ci` | Full analysis, medium+ severity, quiet, SARIF output |
| `taint_only` | Taint analysis only |
| `conservative_large_repo` | AST-only, high severity, 5 MiB file limit, depth 10 |
User-defined profiles with the same name as a built-in will override it.
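The inheritance rules for profiles (base config, then profile, then CLI flags, with a user profile fully replacing a built-in of the same name) can be sketched as follows. This is a Python model for illustration; the field values given for the built-in `ci` profile are inferred from its one-line description above and are an assumption, as is the `effective` helper itself.

```python
BASE = {"mode": "full", "min_severity": "Low",
        "default_format": "console", "quiet": False}

# Assumed contents, inferred from "Full analysis, medium+ severity, quiet, SARIF output".
BUILTIN_PROFILES = {
    "ci": {"mode": "full", "min_severity": "Medium",
           "quiet": True, "default_format": "sarif"},
}


def effective(base, builtin, user_profiles, profile=None, cli=None):
    """Resolve config: base, then the named profile, then CLI flags."""
    # A user profile replaces the whole built-in entry; fields are not merged.
    profiles = {**builtin, **user_profiles}
    cfg = dict(base)
    if profile is not None:
        cfg.update(profiles[profile])  # omitted fields keep the base value
    cfg.update(cli or {})              # CLI flags win over everything
    return cfg


# Models: nyx scan --profile ci --format json
print(effective(BASE, BUILTIN_PROFILES, {}, "ci", {"default_format": "json"}))
```

Because replacement is whole-profile, a user-defined `ci` that sets only `quiet` would fall back to the base `min_severity` rather than the built-in's `Medium`.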
### `[analysis.engine]`
Release-grade switches for the optional analysis passes. Each toggle has a
matching CLI flag (pair of `--foo` / `--no-foo`) that overrides the config
value for a single run. These used to be `NYX_*` environment variables
(`NYX_CONSTRAINT`, `NYX_ABSTRACT_INTERP`, `NYX_SYMEX`, `NYX_CROSS_FILE_SYMEX`,
`NYX_SYMEX_INTERPROC`, `NYX_CONTEXT_SENSITIVE`, `NYX_PARSE_TIMEOUT_MS`,
`NYX_SMT`); those env vars are still honored as a last-resort override when
nyx is used as a library (no CLI entry point), but the config/CLI surface is
the stable path.
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `constraint_solving` | bool | `true` | Path-constraint solving (prunes infeasible paths in taint) |
| `abstract_interpretation` | bool | `true` | Interval / string / bit abstract domains carried through the SSA worklist |
| `context_sensitive` | bool | `true` | k=1 context-sensitive callee inlining for intra-file calls |
| `backwards_analysis` | bool | `false` | Demand-driven backwards taint walk from sinks (adds scan time; default off) |
| `parse_timeout_ms` | int | `10000` | Per-file tree-sitter parse timeout; `0` disables the cap |
**`[analysis.engine.symex]`** sub-section:
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `true` | Run the symex pipeline after taint; adds witness strings and symbolic verdicts |
| `cross_file` | bool | `true` | Persist / consult cross-file SSA bodies so symex can reason about callees defined in other files |
| `interprocedural` | bool | `true` | Intra-file interprocedural symex (k ≥ 2 via frame stack) |
| `smt` | bool | `true` | Use the SMT backend when nyx is built with the `smt` feature; ignored otherwise |
CLI flag map (each boolean toggle is a `--<flag>` / `--no-<flag>` pair; `parse_timeout_ms` takes a value instead):
| Config field | CLI flags |
|---|---|
| `constraint_solving` | `--constraint-solving` / `--no-constraint-solving` |
| `abstract_interpretation` | `--abstract-interp` / `--no-abstract-interp` |
| `context_sensitive` | `--context-sensitive` / `--no-context-sensitive` |
| `backwards_analysis` | `--backwards-analysis` / `--no-backwards-analysis` |
| `parse_timeout_ms` | `--parse-timeout-ms <N>` |
| `symex.enabled` | `--symex` / `--no-symex` |
| `symex.cross_file` | `--cross-file-symex` / `--no-cross-file-symex` |
| `symex.interprocedural` | `--symex-interproc` / `--no-symex-interproc` |
| `symex.smt` | `--smt` / `--no-smt` |
**Engine-depth profile shortcut**: instead of flipping individual toggles, pass `--engine-profile {fast,balanced,deep}` to set the whole stack at once. Individual flags override the profile, so `--engine-profile fast --backwards-analysis` runs the fast stack with backwards analysis on. See `docs/cli.md` for the exact toggle matrix.
**Explain effective engine**: pass `--explain-engine` to print the resolved engine configuration (profile + config + CLI overrides) and exit without scanning.
### `[analysis.languages.<slug>]`
@ -112,7 +230,9 @@ Per-language custom rules. `<slug>` is one of: `rust`, `javascript`, `typescript
matchers = ["escapeHtml"]
kind = "sanitizer" # "source" | "sanitizer" | "sink"
cap = "html_escape" # "env_var" | "html_escape" | "shell_escape" |
# "url_encode" | "json_parse" | "file_io" | "all"
# "url_encode" | "json_parse" | "file_io" |
# "fmt_string" | "sql_query" | "deserialize" |
# "ssrf" | "code_exec" | "crypto" | "all"
```
---
@ -146,6 +266,26 @@ default_format = "sarif"
worker_threads = 4
```
### Using a scan profile
```bash
# Use a built-in profile
nyx scan --profile ci
# CLI flags still override profile values
nyx scan --profile ci --format json
```
### Custom profile
```toml
[profiles.security_audit]
mode = "full"
min_severity = "Low"
enable_state_analysis = true
show_all = true
```
### Custom rules for a Node.js project
```toml
@ -181,3 +321,93 @@ nyx config add-terminator --lang javascript --name process.exit
# Verify
nyx config show
```
---
## Config Validation
Config is validated after loading and merging. Validation checks include:
- Server port must be between 1 and 65535
- Server host must not be empty
- `max_saved_runs` must be > 0 when `persist_runs` is true
- `max_runs` must be > 0 when `persist` is true
- `batch_size` and `channel_multiplier` must be > 0
- `rollup_examples` must be > 0
- Profile names must be alphanumeric with underscores only
Invalid config produces structured error messages identifying the section, field, and issue.
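A few of the checks listed above, sketched in Python to show what "structured" errors might look like. The `(section, field, issue)` tuple shape and the `validate` function are assumptions for illustration; the fallback defaults are taken from the `[server]` and `[runs]` tables.

```python
import re


def validate(cfg):
    """Return a list of (section, field, issue) tuples; empty means valid."""
    errors = []
    server = cfg.get("server", {})
    runs = cfg.get("runs", {})

    if not 1 <= server.get("port", 9700) <= 65535:
        errors.append(("server", "port", "must be between 1 and 65535"))
    if not server.get("host", "127.0.0.1"):
        errors.append(("server", "host", "must not be empty"))
    if server.get("persist_runs", True) and server.get("max_saved_runs", 50) <= 0:
        errors.append(("server", "max_saved_runs",
                       "must be > 0 when persist_runs is true"))
    if runs.get("persist", False) and runs.get("max_runs", 100) <= 0:
        errors.append(("runs", "max_runs", "must be > 0 when persist is true"))
    for name in cfg.get("profiles", {}):
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            errors.append(("profiles", name,
                           "name must be alphanumeric with underscores only"))
    return errors


print(validate({"server": {"port": 0}, "profiles": {"bad-name": {}}}))
```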
---
## State Analysis
State analysis detects resource lifecycle violations (use-after-close, double-close, resource leaks) and unauthenticated access patterns. It is **enabled by default**.
To disable:
```toml
[scanner]
enable_state_analysis = false
```
State analysis requires `mode = "full"` or `mode = "taint"`. It has no effect in `mode = "ast"`.
**Tradeoffs**:
- Additional per-function state-machine pass adds some scan time
- May produce findings that require domain knowledge to evaluate (e.g., whether a resource handle is intentionally left open)
- Most useful for C, C++, Rust, Go, and Java where acquire/release patterns are common
---
## Upgrading
### Engine-version mismatch is handled automatically
Nyx stores the scanner's `CARGO_PKG_VERSION` in the project index database.
When the version recorded in the DB differs from the running binary, or the
row is missing entirely, every cached summary, SSA body, and file-hash row
is wiped on the next open so that the next scan rebuilds the index against the new
engine. No flag is needed; CI pipelines keep working across upgrades.
The rebuild is logged at `info` level:
```
engine version changed (0.4.0 → 0.5.0), rebuilding index
```
If you see this once per upgrade, it is working as intended. If you see it on
every scan, the metadata row is not being persisted; file an issue.
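The version check can be pictured with a small Python model in which a plain dictionary stands in for the index database. The key names (`engine_version`, `summaries`, `ssa_bodies`, `file_hashes`) and the `open_index` function are hypothetical and are not the real schema.

```python
def open_index(db, running_version):
    """Wipe cached rows when the stored engine version does not match.

    `db` is a dict standing in for the project index database.
    Returns True when a rebuild was triggered.
    """
    stored = db.get("engine_version")  # None models a missing metadata row
    if stored == running_version:
        return False
    for table in ("summaries", "ssa_bodies", "file_hashes"):
        db[table] = {}
    # Persisting the new version is what stops the rebuild from repeating.
    db["engine_version"] = running_version
    return True


db = {"engine_version": "0.4.0", "summaries": {"a.rs": "cached"}}
print(open_index(db, "0.5.0"), open_index(db, "0.5.0"))
```

If the final assignment were skipped, every open would report a rebuild, which matches the "seen on every scan" symptom described above.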
### Forcing a reindex
Use `--index rebuild` to throw away the current project's cached summaries
and re-run pass 1 against the current rules. Useful after editing
`nyx.local` rules, after an upgrade that changed label definitions without
changing the engine version, or when you want a known-clean baseline:
```bash
nyx scan --index rebuild .
```
This clears the current project's rows in `files`, `function_summaries`,
`ssa_function_summaries`, and `ssa_function_bodies`; other projects sharing
the same DB directory are untouched.
### Recovering from a corrupt database
If the `.sqlite` file itself is damaged (e.g. from a killed scan or full
disk) and `nyx scan` fails to open it, delete the file and let the next
scan recreate it:
```bash
rm "$(nyx config path)"/<project>.sqlite*
```
On the next scan Nyx builds a fresh index from scratch.
---
## Reserved Fields
Some config fields are defined but not yet implemented. They are marked `(RESERVED)` in the default config and accept values without effect. This allows forward-compatible config files; settings will activate when the feature is implemented without requiring config changes.


@ -1,81 +1,68 @@
# Detector Overview
# Detectors
Nyx uses four independent detector families. Each targets different vulnerability classes and operates at a different level of analysis depth. Findings from all active detectors are merged, deduplicated, ranked, and presented in a single result set.
Nyx ships four independent detector families. They run together in `--mode full`, the default. Findings are merged, deduplicated, ranked, and printed in one result set.
## The Four Detector Families
| Family | Rule prefix | Looks at | What it finds |
|---|---|---|---|
| [Taint analysis](detectors/taint.md) | `taint-*` | Cross-file dataflow | Unsanitized data flowing source to sink |
| [CFG structural](detectors/cfg.md) | `cfg-*` | Per-function control flow | Auth gaps, unguarded sinks, error fallthrough, resource release on all paths |
| [State model](detectors/state.md) | `state-*` | Per-function state lattice | Use-after-close, double-close, leaks, unauthenticated access |
| [AST patterns](detectors/patterns.md) | `<lang>.<cat>.<name>` | Tree-sitter structural match | Banned APIs, weak crypto, dangerous constructs |
| Family | Rule prefix | Analysis depth | What it finds |
|--------|------------|----------------|---------------|
| [**Taint Analysis**](detectors/taint.md) | `taint-*` | Cross-file dataflow | Unsanitized data flowing from sources to sinks |
| [**CFG Structural**](detectors/cfg.md) | `cfg-*` | Intra-procedural CFG | Auth gaps, unguarded sinks, resource leaks, error fallthrough |
| [**State Model**](detectors/state.md) | `state-*` | Intra-procedural lattice | Use-after-close, double-close, resource leaks, unauthenticated access |
| [**AST Patterns**](detectors/patterns.md) | `<lang>.*.*` | Structural (no flow) | Dangerous function calls, banned APIs, weak crypto |
For Rust auth-specific rules (`rs.auth.*`), see [auth.md](auth.md).
## How They Combine
## How they combine
In `--mode full` (default), all four families run. Findings are deduplicated:
In `--mode full`:
1. **Taint supersedes AST**: If a taint finding and an AST pattern both fire at the same location (e.g. both flag `eval(userInput)`), both are kept with distinct rule IDs. The taint finding ranks higher due to the analysis-kind bonus.
1. **Taint and AST can both fire on one line.** If `eval(userInput)` triggers both `js.code_exec.eval` (AST) and `taint-unsanitised-flow` (taint), both are kept with distinct rule IDs. The taint finding ranks higher because of the analysis-kind bonus.
2. **State supersedes CFG on resource leaks.** When `state-resource-leak` and `cfg-resource-leak` fire at the same location, the CFG one is dropped.
3. **Exact duplicates are removed.** Same line, column, rule ID, severity → one finding.
2. **State supersedes CFG**: If a state-model finding (e.g. `state-resource-leak`) fires at the same location as a CFG finding (e.g. `cfg-resource-leak`), the CFG finding is suppressed.
## Modes
3. **Location-level dedup**: Exact duplicates (same line, column, rule ID, severity) are removed.
| Mode | Active detectors |
|---|---|
| `full` (default) | All four |
| `ast` | AST patterns only |
| `cfg` | Taint + CFG + State (no AST patterns) |
| `taint` | Taint + State |
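The combination rules for `--mode full` (exact duplicates removed, state superseding CFG on resource leaks, taint and AST both kept) can be sketched in Python. This is an illustrative model: the `Finding` tuple, the `combine` function, and treating "same location" as path plus line plus column are all assumptions.

```python
from collections import namedtuple

Finding = namedtuple("Finding", "path line col rule_id severity")


def combine(findings):
    """Apply the dedup rules to a merged list of findings."""
    # Exact duplicates collapse to one; insertion order is preserved.
    unique = list(dict.fromkeys(findings))
    # Locations where the state model already reports a leak.
    state_leaks = {(f.path, f.line, f.col)
                   for f in unique if f.rule_id == "state-resource-leak"}
    # Drop the CFG leak finding when a state leak sits at the same location.
    return [f for f in unique
            if not (f.rule_id == "cfg-resource-leak"
                    and (f.path, f.line, f.col) in state_leaks)]


merged = [
    Finding("app.js", 10, 4, "js.code_exec.eval", "High"),
    Finding("app.js", 10, 4, "taint-unsanitised-flow", "High"),
    Finding("db.go", 7, 2, "cfg-resource-leak", "Medium"),
    Finding("db.go", 7, 2, "state-resource-leak", "Medium"),
    Finding("db.go", 7, 2, "state-resource-leak", "Medium"),
]
for f in combine(merged):
    print(f.rule_id)
```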
## Analysis Modes
## Attack-surface ranking
| Mode | CLI flag | Active detectors |
|------|----------|-----------------|
| Full | `--mode full` | All four |
| AST-only | `--mode ast` | AST patterns only |
| CFG/Taint | `--mode cfg` | Taint + CFG + State |
## Attack-Surface Ranking
Every finding receives a deterministic **attack-surface score** estimating exploitability. Findings are sorted by descending score.
### Scoring Formula
Every finding gets a deterministic score. Findings are sorted by descending score by default. Disable with `--no-rank` or `output.attack_surface_ranking = false`.
```
score = severity_base + analysis_kind + evidence_strength + state_bonus - validation_penalty
```
| Component | Values | Purpose |
|-----------|--------|---------|
| **Severity base** | High=60, Medium=30, Low=10 | Primary signal |
| **Analysis kind** | taint=+10, state=+8, cfg(with evidence)=+5, cfg(no evidence)=+3, ast=+0 | Confidence of analysis |
| **Evidence strength** | +1 per evidence item (max 4), +2-6 for source kind | Specificity of finding |
| **State bonus** | use-after-close/unauthed=+6, double-close=+3, must-leak=+2, may-leak=+1 | State rule severity |
| **Validation penalty** | -5 if path-validated | Guard reduces exploitability |
| Component | Values |
|---|---|
| Severity base | High=60, Medium=30, Low=10 |
| Analysis kind | taint=+10, state=+8, cfg with evidence=+5, cfg without evidence=+3, ast=+0 |
| Evidence strength | +1 per evidence item up to 4; +2 to +6 for source kind |
| State bonus | use-after-close / unauthed=+6, double-close=+3, must-leak=+2, may-leak=+1 |
| Validation penalty | -5 if path-validated |
### Source-kind priority
Source-kind contributions (taint only):
| Source type | Bonus | Examples |
|-------------|-------|---------|
| User input | +6 | `req.body`, `argv`, `stdin`, `form`, `query`, `params` |
| Environment | +5 | `env::var`, `getenv`, `process.env` |
| Unknown | +4 | Conservative default |
| File system | +3 | `fs::read_to_string`, `fgets` |
| Database | +2 | Query results |
| Source | Bonus |
|---|---|
| User input (`req.body`, `argv`, `stdin`, `form`, `query`, `params`) | +6 |
| Environment (`env::var`, `getenv`, `process.env`) | +5 |
| Unknown | +4 |
| File system | +3 |
| Database | +2 |
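To make the arithmetic concrete, here is a Python sketch of the scoring formula using the values from the tables above. How the components are keyed, and the `score` function itself, are assumptions for illustration; in particular the sketch applies the source-kind bonus only to taint findings and caps the per-item evidence bonus at 4.

```python
SEVERITY_BASE = {"High": 60, "Medium": 30, "Low": 10}
ANALYSIS_KIND = {"taint": 10, "state": 8, "cfg_evidence": 5, "cfg": 3, "ast": 0}
SOURCE_KIND = {"user_input": 6, "environment": 5, "unknown": 4,
               "file_system": 3, "database": 2}
STATE_BONUS = {"use-after-close": 6, "unauthed": 6, "double-close": 3,
               "must-leak": 2, "may-leak": 1}


def score(severity, kind, evidence_items=0, source=None,
          state_rule=None, path_validated=False):
    """severity_base + analysis_kind + evidence + state_bonus - penalty."""
    total = SEVERITY_BASE[severity] + ANALYSIS_KIND[kind]
    total += min(evidence_items, 4)          # +1 per item, capped at 4
    if kind == "taint" and source is not None:
        total += SOURCE_KIND[source]         # source-kind bonus, taint only
    if state_rule is not None:
        total += STATE_BONUS[state_rule]
    if path_validated:
        total -= 5                           # a guard lowers exploitability
    return total


# High taint finding, user input source, two evidence items: 60 + 10 + 2 + 6
print(score("High", "taint", evidence_items=2, source="user_input"))  # 78
```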
### Score ranges (approximate)
Approximate score ranges:
| Finding type | Score range |
|-------------|------------|
| High taint + user input | ~76-80 |
| Finding type | Score |
|---|---|
| High taint with user input | 76 to 81 |
| High state (use-after-close) | ~74 |
| High CFG structural | ~63-68 |
| Medium taint + env source | ~45-50 |
| High CFG structural | 63 to 68 |
| Medium taint with env source | 45 to 50 |
| Medium state (resource leak) | ~40 |
| Low AST-only pattern | ~10 |
Ranking is enabled by default. Disable with `--no-rank` or `output.attack_surface_ranking = false`.
## Two-Pass Architecture
Nyx's taint analysis requires cross-file context, achieved via two passes:
1. **Pass 1 — Summary extraction**: Each file is parsed, a CFG is built, and a `FuncSummary` is extracted per function. Summaries capture source/sanitizer/sink capabilities (bitflags), taint propagation behavior, and callee lists. Summaries are persisted to SQLite.
2. **Pass 2 — Analysis**: All summaries are merged into a global map. Files are re-parsed and analyzed with full cross-file context. The taint engine resolves callees against local summaries (more precise) first, then falls back to global summaries.
With indexing enabled, Pass 1 skips files whose content hash hasn't changed since the last scan.
For the engine's runtime model (passes, summaries, SCC fixed-point), see [how-it-works.md](how-it-works.md).


@ -1,161 +1,130 @@
# CFG Structural Analysis
# CFG structural analysis
## Summary
Nyx builds an intra-procedural control-flow graph per function and checks structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error paths terminate before reaching dangerous code.
Nyx builds an intra-procedural control-flow graph (CFG) for each function and analyzes structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error-handling code terminates properly.
These detectors use **dominator analysis** — they check whether a guard node dominates (must execute before) a sink node on the CFG.
These detectors use dominator analysis. A guard dominates a sink when the guard must execute before the sink on every path from entry.
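The dominance check can be illustrated with the textbook iterative algorithm. This Python sketch is not Nyx's implementation; the CFG is a plain adjacency dictionary and the node names (`entry`, `check`, `sink`, `exit`) are made up for the example.

```python
def dominators(cfg, entry):
    """Iterative dataflow: dom(n) = {n} | intersection of dom(p) over preds p."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue  # no predecessors: unreachable, left at the initial set
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def is_guarded(cfg, entry, guard, sink):
    """True when `guard` executes on every path from `entry` to `sink`."""
    return guard in dominators(cfg, entry)[sink]


guarded = {"entry": ["check"], "check": ["sink", "exit"],
           "sink": ["exit"], "exit": []}
bypass = {"entry": ["check", "sink"], "check": ["sink"],
          "sink": ["exit"], "exit": []}
print(is_guarded(guarded, "entry", "check", "sink"))  # True
print(is_guarded(bypass, "entry", "check", "sink"))   # False
```

In the second graph the edge from `entry` straight to `sink` bypasses the check, so the guard no longer dominates the sink even though it still appears on one path.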
## Rule IDs
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `cfg-unguarded-sink` | High/Medium | Sink reachable without a dominating guard or sanitizer |
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth check |
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
| `cfg-unreachable-source` | Low | Source in unreachable code |
| `cfg-error-fallthrough` | High/Medium | Error check doesn't terminate; dangerous code follows |
| `cfg-resource-leak` | Medium | Resource acquired but not released on all exit paths |
| `cfg-lock-not-released` | Medium | Lock acquired but not released on all exit paths |
| Rule ID | Severity |
|---|---|
| `cfg-unguarded-sink` | High/Medium |
| `cfg-auth-gap` | High |
| `cfg-unreachable-sink` | Medium |
| `cfg-unreachable-sanitizer` | Low |
| `cfg-unreachable-source` | Low |
| `cfg-error-fallthrough` | High/Medium |
| `cfg-resource-leak` | Medium |
| `cfg-lock-not-released` | Medium |
## What It Detects
## What it detects
### Unguarded sinks (`cfg-unguarded-sink`)
A sink call (e.g. `system()`, `eval()`, `Command::new()`) is reachable from the function entry without passing through a guard or sanitizer that matches the sink's capability.
**`cfg-unguarded-sink`**: A sink call (`system`, `eval`, `Command::new`, `db.execute`, etc.) is reachable from function entry without passing through any guard or sanitizer that matches the sink's capability.
### Auth gaps (`cfg-auth-gap`)
A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`) reaches a privileged sink (shell execution, file I/O) without a prior call to an authentication function (`is_authenticated`, `require_auth`, `check_permission`, etc.).
**`cfg-auth-gap`**: A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`, language-dependent) reaches a privileged sink (shell execution, file I/O) without a preceding authentication call.
### Unreachable security code (`cfg-unreachable-*`)
Sinks, sanitizers, or sources in dead code branches. This often indicates a refactoring error where security-critical code was accidentally made unreachable.
**`cfg-unreachable-*`**: Sinks, sanitizers, or sources in dead code. Usually signals a refactoring error that silently disabled security-relevant logic.
### Error fallthrough (`cfg-error-fallthrough`)
An error check (null check, error return check) does not terminate the function or loop back. Execution continues to a dangerous operation on the error path.
**`cfg-error-fallthrough`**: An error-handling branch (null check, error-return check) does not terminate. Execution falls through to a dangerous operation on the error path.
### Resource leaks (`cfg-resource-leak`, `cfg-lock-not-released`)
A resource acquisition call (e.g. `File::open`, `fopen`, `socket`, `Lock`) is not matched by a release call (e.g. `close`, `fclose`, `unlock`) on all exit paths from the function.
**`cfg-resource-leak`, `cfg-lock-not-released`**: A resource acquisition (`File::open`, `fopen`, `socket`, `Lock`) is not matched by a release on every exit path from the function.
## What It Cannot Detect
## What it can't detect
- **Inter-procedural guards**: If authentication is checked in a middleware function that calls this handler, the CFG detector cannot see it. It only analyzes one function at a time.
- **Dynamic dispatch**: Virtual method calls, function pointers, and closures are opaque to the CFG.
- **Complex guard patterns**: Only recognized guard function names are checked. Custom validation logic (e.g. `if password == expected`) is not recognized as a guard.
- **Correct sanitization**: The detector checks that *some* guard dominates the sink, not that the guard is *correct*. A guard that always passes would suppress the finding.
- **Cross-function resource flows**: If a file handle is opened in one function and closed in another, the detector will report a leak in the first function.
- **Inter-procedural guards.** Middleware-level auth, helper functions that internally call auth, and cleanup performed in a caller are invisible.
- **Dynamic dispatch.** Virtual calls, function pointers, and closures resolve to no specific callee.
- **Correctness of guards.** The detector checks that *a* guard dominates the sink, not that the guard is correct. A recognised guard that always passes would still suppress the finding.
- **Custom validation logic.** Only recognised guard names are checked. `if password == expected` is not a recognised guard.
- **Cross-function resource flows.** If a file handle opens in one function and closes in another, the opener gets flagged as a leak. This is the largest source of false positives on factory-pattern code.
## Common False Positives
## Common false positives
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| Framework-level auth middleware | Handler doesn't call auth directly | Document as expected; suppress with severity filter |
| Resource closed via RAII/defer | Implicit cleanup not visible to CFG | Currently not detected; known limitation |
| Custom guard function name | Function not in the recognized guard list | Add the function name as a sanitizer in config |
| Test handlers | Intentionally skip auth in tests | Default non-prod downgrade reduces severity; or exclude test dirs |
| Scenario | Why | Mitigation |
|---|---|---|
| Framework middleware auth | Handler doesn't call auth directly | Expected; suppress with severity filter or exclude handlers |
| RAII / defer cleanup | Implicit release not visible to CFG (partially handled for Rust Drop and Go defer) | Known limitation |
| Custom guard name | Function not in the recognised guard list | Add it as a sanitizer rule in config |
| Test handlers | Intentional lack of auth | Default non-prod downgrade reduces severity; or exclude test dirs |
## Common False Negatives
## Common false negatives
| Scenario | Why it's missed |
|----------|----------------|
| Auth in called function | Cross-function guards not tracked |
| Guard via type system | Type-level guarantees (e.g. Rust's `AuthenticatedUser` wrapper) not analyzed |
| Resource closed in finally/defer | Some cleanup patterns not recognized |
| Scenario | Why |
|---|---|
| Auth in a called helper | Cross-function guards not tracked |
| Type-system guards | Rust `AuthenticatedUser<T>` wrappers, typestate patterns not analysed |
| Cleanup in `finally`/`ensure`/`defer` in callers | Cross-function cleanup not tracked |
## Confidence Signals
## Tuning
| Signal | Meaning |
|--------|---------|
| **Evidence lists guard nodes** | Shows which guards were checked and found missing |
| **Sink has high capability** | Shell execution or file I/O sinks are higher risk |
| **Handler detection matched** | Web handler identification is based on conventional parameter names |
### Recognised guard names
## Tuning and Noise Controls
Nyx accepts these patterns as dominating guards:
### Add custom guards/sanitizers
| Pattern | Applies to |
|---|---|
| `validate*`, `sanitize*` | All sinks |
| `check_*`, `verify_*`, `assert_*` | All sinks |
| `shell_escape` | Shell sinks |
| `html_escape` | HTML/XSS sinks |
| `url_encode` | URL sinks |
| `which` | Shell execution (binary lookup) |
### Recognised auth names
| Pattern | Language |
|---|---|
| `is_authenticated`, `require_auth`, `check_permission`, `authorize`, `authenticate`, `require_login`, `check_auth`, `verify_token`, `validate_token` | Cross-language |
| `middleware.auth`, `auth.required` | Go |
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |
For Rust auth checks (`require_*`, ownership equality, row-level checks), see [auth.md](../auth.md).
### Custom guards
```toml
[[analysis.languages.python.rules]]
matchers = ["validate_request", "check_csrf"]
kind = "sanitizer"
cap = "all"
```
### Add auth rules
Auth checks are recognized by function name. If your codebase uses non-standard names:
### Custom auth functions
```toml
[[analysis.languages.javascript.rules]]
matchers = ["ensureLoggedIn", "requirePermission"]
kind = "sanitizer"
cap = "all"
```
### Filter results
```bash
# Skip low-severity unreachable findings
nyx scan . --severity ">=MEDIUM"
```
### Disable CFG analysis
```bash
nyx scan . --mode ast # AST patterns only
```
## Examples
### Unguarded sink
Unguarded sink:
```go
func handler(w http.ResponseWriter, r *http.Request) {
    cmd := r.URL.Query().Get("cmd")
    exec.Command("sh", "-c", cmd).Run() // cfg-unguarded-sink: no guard dominates
}
```
### Auth gap
Auth gap:
```javascript
app.get('/admin/delete', (req, res) => {
    // No auth call before the privileged sink
    db.execute("DELETE FROM users WHERE id = " + req.params.id); // cfg-auth-gap
});
```
### Resource leak
Resource leak:
```c
void process() {
    FILE *f = fopen("data.txt", "r"); // acquire
    if (error) {
        return; // cfg-resource-leak: f not closed on this path
    }
    fclose(f);
}
```
## Guard Rules
Nyx recognizes these function name patterns as guards:
| Pattern | Applies to |
|---------|-----------|
| `validate*`, `sanitize*` | All sinks |
| `check_*`, `verify_*`, `assert_*` | All sinks |
| `shell_escape` | Shell execution sinks |
| `html_escape` | HTML/XSS sinks |
| `url_encode` | URL sinks |
| `which` | Shell execution (binary lookup) |
### Auth rules
| Pattern | Category |
|---------|----------|
| `is_authenticated`, `require_auth`, `check_permission` | Common |
| `authorize`, `authenticate`, `require_login` | Common |
| `check_auth`, `verify_token`, `validate_token` | Common |
| `middleware.auth`, `auth.required` | Go |
| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |

@ -1,111 +1,84 @@
# AST Pattern Matching
# AST patterns
## Summary
AST patterns are tree-sitter queries that match dangerous structural shapes in source. No dataflow, no CFG. A match means the construct is present; it's not proof the construct is exploitable.
AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.
AST patterns run in all analysis modes, including `--mode ast` (where they are the only active detector).
Patterns run in every analysis mode. In `--mode ast` they're the only active detector.
## Rule IDs
Pattern rule IDs follow the format `<lang>.<category>.<specific>`:
```
rs.memory.transmute
js.code_exec.eval
py.deser.pickle_loads
c.memory.gets
java.sqli.execute_concat
<lang>.<category>.<name>
```
See the [Rule Reference](../rules/index.md) for a complete listing per language.
Examples: `js.code_exec.eval`, `py.deser.pickle_loads`, `c.memory.gets`, `java.sqli.execute_concat`.
## Pattern Tiers
Full list: [rules.md](../rules.md).
| Tier | Meaning | Examples |
|------|---------|---------|
| **A** | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
| **B** | Query includes a heuristic guard | SQL `execute` with concatenated arg, `printf(var)` with non-literal format |
## Tiers
Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, `java.sqli.execute_concat` only fires when `executeQuery()` receives a `binary_expression` (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
| Tier | Meaning |
|---|---|
| **A** | Structural presence alone is high-signal. `gets`, `eval`, `pickle.loads`, `mem::transmute` |
| **B** | Pattern includes a tree-sitter heuristic guard. Example: `java.sqli.execute_concat` only fires when `executeQuery` receives a `binary_expression` (string concatenation), not a literal or a parameterized statement |
## What It Detects
## Categories
### By category
| Category | Examples |
|---|---|
| CommandExec | `system`, `os.system`, `Runtime.exec`, backticks |
| CodeExec | `eval`, `Function`, PHP `assert("string")`, `class_eval`, `instance_eval` |
| Deserialization | `pickle.loads`, `yaml.load`, `Marshal.load`, `readObject`, `unserialize` |
| SqlInjection | `executeQuery`/`Query`/`execute` with concatenated argument (Tier B) |
| PathTraversal | PHP `include $var` |
| Xss | `document.write`, `outerHTML`, `insertAdjacentHTML`, `getWriter().print` |
| Crypto | `md5`, `sha1`, `Math.random`, `java.util.Random` for security use |
| Secrets | hardcoded API keys (Go, JS, TS) |
| InsecureTransport | `InsecureSkipVerify`, `fetch("http://...")` |
| Reflection | `Class.forName`, `Method.invoke`, `send`, `constantize` |
| MemorySafety | `transmute`, `unsafe`, `gets`, `strcpy`, `sprintf` |
| Prototype | `__proto__` assignment, `Object.prototype.*` |
| Config | CORS dynamic origin, `rejectUnauthorized: false`, insecure session settings |
| CodeQuality | `unwrap`, `panic!`, `as any` |
| Category | What it matches | Example languages |
|----------|----------------|-------------------|
| **CommandExec** | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
| **CodeExec** | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
| **Deserialization** | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
| **SqlInjection** | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
| **PathTraversal** | File inclusion with variable path | PHP (`include $var`) |
| **Xss** | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
| **Crypto** | Weak cryptographic algorithms | All languages (`md5`, `sha1`, `Math.random()`) |
| **Secrets** | Hardcoded credentials | Go (variable name matching) |
| **InsecureTransport** | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
| **Reflection** | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
| **MemorySafety** | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
| **Prototype** | Prototype pollution | JS/TS (`__proto__` assignment) |
| **CodeQuality** | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |
## What patterns can't tell you
## What It Cannot Detect
- **Dataflow.** `eval("1+1")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`. The taint detector is the one that distinguishes them.
- **Reachability.** A pattern in dead code matches identically.
- **Semantics.** `strcpy(dst, src)` always matches, regardless of buffer sizes.
- **Indirect calls.** `let e = eval; e(input)` doesn't match `eval`.
- **Aliased imports.** `from os import system as s; s(cmd)` won't match `system`.
- **Macro expansions.** Tree-sitter parses the macro call site, not the expansion.
- **Dataflow**: Patterns don't track whether the dangerous function receives tainted input. `eval("hello")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`.
- **Context**: Patterns don't understand whether the code is reachable, guarded, or inside a test.
- **Semantics**: `strcpy(dst, src)` always matches — it cannot determine buffer sizes.
- **Indirect calls**: Function pointers, dynamic dispatch, and aliased references are invisible.
## Common false positives
## Common False Positives
| Scenario | Why | Mitigation |
|---|---|---|
| `eval("hardcoded literal")` | Pattern matches structure | Run `--mode cfg` to drop AST patterns and rely on taint |
| `unsafe` block with sound justification | Every `unsafe` matches `rs.quality.unsafe_block` | Filter `>=MEDIUM` (it's Medium) or accept the noise |
| `.unwrap()` in tests | Acceptable in test code | Default non-prod severity downgrade reduces it |
| `md5` for non-cryptographic checksums | Pattern can't see intent | Suppress with `--severity ">=MEDIUM"` or per-line `nyx:ignore` |
| SQL concat with trusted data (Tier B) | Heuristic can't verify the source | Taint is more precise; or convert to a parameterized query |
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis won't flag this — use `--mode cfg` for fewer false positives |
| `unsafe` block in Rust with sound justification | All unsafe blocks match | Filter with `--severity ">=MEDIUM"` (unsafe_block is Medium) |
| `.unwrap()` in test code | Acceptable in tests | Default non-prod downgrade reduces severity |
| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter Low severity or add to exclusions |
| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |
## Confidence levels
## Common False Negatives
Every AST pattern carries an explicit confidence:
| Scenario | Why it's missed |
|----------|----------------|
| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
| SQL injection via ORM query builder | No pattern for ORM-specific query building |
| Imported function under different name | `from os import system as s; s(cmd)` — pattern looks for `system` |
| Confidence | Use |
|---|---|
| High | Inherently dangerous construct with no safe usage. `gets`, `pickle.loads`, `eval` with no guard |
| Medium | Likely issue, context may change the call. SQL concatenation (Tier B), `unsafe` blocks, `exec` |
| Low | Heuristic. Often appears in safe code. Weak crypto for checksums, `unwrap` outside tests, `Math.random` |
## Confidence Signals
`--min-confidence medium` (or `output.min_confidence = "medium"`) drops Low-confidence matches.
| Signal | Meaning |
|--------|---------|
| **Tier A** | High confidence — the function itself is dangerous |
| **Tier B** | Moderate confidence — heuristic guard reduces false positives |
| **High severity** | Critical vulnerability class (command exec, deserialization) |
| **Low severity** | Informational (weak crypto, code quality) |
| **Non-prod path** | Finding in test/vendor code — downgraded by default |
## Tuning and Noise Controls
### Severity filtering
## Tuning
```bash
# Skip code-quality and weak-crypto findings
nyx scan . --severity ">=MEDIUM"
# Only critical findings
nyx scan . --severity HIGH
nyx scan . --severity ">=MEDIUM" # drop Low-tier patterns
nyx scan . --severity HIGH # banned APIs and code-exec only
nyx scan . --mode cfg # drop AST patterns; keep taint + state + cfg
```
### Use taint for precision
```bash
# Taint-only mode: only report findings with confirmed dataflow
nyx scan . --mode cfg
```
### Exclude directories
```toml
[scanner]
excluded_directories = ["node_modules", "vendor", "generated"]
@ -113,37 +86,29 @@ excluded_directories = ["node_modules", "vendor", "generated"]
## Examples
### Tier A — structural presence
Tier A, structural presence:
**C: Banned function**
```c
char buf[64];
gets(buf); // c.memory.gets — always dangerous, no safe usage
gets(buf); // c.memory.gets
```
**Python: Unsafe deserialization**
```python
import pickle
data = pickle.loads(user_input)  # py.deser.pickle_loads
```
### Tier B — heuristic-guarded
Tier B, heuristic guard:
**Java: SQL concatenation**
```java
// Fires: concatenated argument
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
// java.sqli.execute_concat
stmt.executeQuery("SELECT * FROM users WHERE id=" + userId); // java.sqli.execute_concat
// Does NOT fire: parameterized query
// Does not fire: parameterized
stmt.executeQuery(preparedSql);
```
**C: Format string**
```c
// Fires: variable as first argument
printf(user_input); // c.memory.printf_no_fmt
// Does NOT fire: literal format string
printf("%s", user_input);
printf(user_input); // c.memory.printf_no_fmt: fires (variable as fmt)
printf("%s", user_input); // does not fire (literal fmt)
```

@ -1,26 +1,22 @@
# State Model Analysis
# State model analysis
## Summary
Tracks resource lifecycle and authentication state through a function. Detects use-after-close, double-close, leaks, and unauthenticated access to privileged operations.
Nyx's state model analysis tracks **resource lifecycle** and **authentication state** through a function using monotone dataflow over bounded lattices. It detects use-after-close bugs, double-close bugs, resource leaks, and unauthenticated access to privileged operations.
State analysis is **opt-in** — enable it with `scanner.enable_state_analysis = true` in config. It requires `mode = "full"` or `mode = "cfg"`.
State analysis is on by default. Disable with `scanner.enable_state_analysis = false`. It runs in `--mode full` and `--mode taint`; AST-only mode skips it.
## Rule IDs
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `state-use-after-close` | High | Variable used after being closed/released |
| `state-double-close` | Medium | Resource closed twice |
| `state-resource-leak` | Medium | Resource opened but never closed (definite) |
| `state-resource-leak-possible` | Low | Resource may not be closed on all paths |
| `state-unauthed-access` | High | Privileged operation reached without authentication |
| Rule ID | Severity |
|---|---|
| `state-use-after-close` | High |
| `state-double-close` | Medium |
| `state-resource-leak` | Medium |
| `state-resource-leak-possible` | Low |
| `state-unauthed-access` | High |
## What It Detects
## What it detects
### Use-after-close (`state-use-after-close`)
A resource transitions to the CLOSED state (via `close()`, `fclose()`, `disconnect()`, etc.), then a use operation (`read`, `write`, `send`, `recv`, `query`, etc.) is performed on it.
**`state-use-after-close`**: Resource transitions to CLOSED (via `close`, `fclose`, `disconnect`, …), then a use operation happens on it.
```c
FILE *f = fopen("data.txt", "r");
@ -28,147 +24,108 @@ fclose(f);
fread(buf, 1, 100, f); // state-use-after-close
```
### Double-close (`state-double-close`)
**`state-double-close`**: Resource closed twice. Crashes or undefined behaviour on most runtimes.
A resource is closed twice. This can cause crashes or undefined behavior.
**`state-resource-leak`**: Resource opened but never closed on any path through the function. Definite leak.
```python
f = open("data.txt")
f.close()
f.close() # state-double-close
```
**`state-resource-leak-possible`**: Resource closed on some paths but not others. Lower confidence; often an early-return error path.
### Resource leak (`state-resource-leak`)
**`state-unauthed-access`**: A function recognised as a web handler reaches a privileged sink without an auth call on the path.
A resource is opened but never closed on any path through the function. This is a definite leak.
A function counts as a web handler if its name starts with `handle_`, `route_`, or `api_` (sufficient on its own), or starts with `serve_`/`process_` and the file uses web-shaped parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, language-dependent). `main` is excluded.
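The handler heuristic described above can be sketched roughly as below. This is only an illustration under stated assumptions: the function name `looks_like_web_handler` and the constant names are hypothetical, and the parameter-name set is flattened across languages here, whereas the text says the real set is language-dependent.

```python
# Prefixes and parameter names taken from the description above.
STRONG_PREFIXES = ("handle_", "route_", "api_")
WEAK_PREFIXES = ("serve_", "process_")
WEB_PARAM_NAMES = {"request", "req", "ctx", "res", "response", "w", "writer"}


def looks_like_web_handler(func_name, param_names_in_file):
    """Hypothetical sketch of the name-based web handler heuristic."""
    if func_name == "main":
        return False  # `main` is explicitly excluded
    if func_name.startswith(STRONG_PREFIXES):
        return True  # strong prefix is sufficient on its own
    if func_name.startswith(WEAK_PREFIXES):
        # Weak prefix needs web-shaped parameter names somewhere in the file.
        return any(p in WEB_PARAM_NAMES for p in param_names_in_file)
    return False
```

Under this sketch, `process_batch` in a file with no web-shaped parameters is not treated as a handler, while `serve_index` in a file that uses `w` is.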
```java
FileInputStream fis = new FileInputStream("data.txt");
process(fis);
// function exits without fis.close() — state-resource-leak
```
## Managed-resource suppression
### Possible resource leak (`state-resource-leak-possible`)
Several language-specific cleanup patterns suppress leak findings:
A resource is closed on some paths but not others.
| Pattern | Languages | Effect |
|---|---|---|
| RAII / Drop | Rust | All leak findings suppressed except `alloc`/`dealloc` |
| Smart pointers | C++ | `make_unique`/`make_shared` treated as managed; raw `new`/`malloc` still tracked |
| `defer` | Go | `defer f.Close()` suppresses leak at exit |
| `with` context manager | Python | `with open(f) as f:` suppresses leak for the bound name |
| try-with-resources | Java | TWR-bound resources suppressed |
```go
f, err := os.Open("data.txt")
if err != nil {
return // f not closed here
}
f.Close() // closed here
// state-resource-leak-possible on the error path
```
## What it can't detect
### Unauthenticated access (`state-unauthed-access`)
- **Cross-function resource ownership.** Open in one function, close in another, leak gets reported in the opener. The most common FP source for leak detection.
- **Factory / builder functions** that return a resource for the caller to manage.
- **Variable shadowing across scopes.** Same name in inner and outer scope shares one symbol; an inner close masks an outer leak.
- **Resources stored in collections.** Handles in arrays / maps / channels and cleaned up via iteration are not tracked.
- **Dynamic dispatch.** Close called via trait object or interface may not be recognised.
- **Type-state authentication.** `AuthenticatedRequest<T>` and similar Rust patterns are not recognised as auth.
A function identified as a web handler reaches a privileged sink (shell execution, file I/O) without any authentication check on the path.
## Common false positives
A function is identified as a web handler if:
1. Its name starts with `handle_`, `route_`, or `api_` (strong match — sufficient on its own), OR
2. Its name starts with `serve_` or `process_` AND any function in the file has web-like parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, etc., varying by language).
| Scenario | Why | Mitigation |
|---|---|---|
| Factory returns a resource | Caller owns it | Known limitation |
| Framework-managed handles | Connection pool, request scope | Exclude framework code or downgrade |
| Variable name shadowing | Same name reused | Known limitation |
The function name `main` is explicitly excluded.
## Per-language detection
```javascript
app.post('/admin/exec', (req, res) => {
// No auth check
exec(req.body.command); // state-unauthed-access
});
```
| Language | Leak | Double-close | Use-after-close | Notes |
|---|---|---|---|---|
| C | yes | yes | yes | `fopen`/`fclose`, `malloc`/`free`, `pthread_mutex_*` |
| C++ | yes | yes | yes | C pairs plus `new`/`delete`; smart pointers suppressed |
| Python | yes | yes | yes | `with` suppressed; `open`, `socket`, `connect` |
| Go | yes | yes | yes | `defer` suppressed; `os.Open` / `.Close` |
| Rust | unsafe only | n/a | n/a | RAII suppresses everything except `alloc`/`dealloc` |
| JavaScript | yes | yes | partial | `fs.openSync`/`closeSync` |
| TypeScript | yes | yes | partial | Same as JS |
| PHP | yes | yes | partial | `fopen`/`fclose`, `curl_init`/`curl_close`, `mysqli_*` |
| Ruby | partial | partial | partial | `File.open`/`close`, `TCPSocket` |
| Java | limited | limited | limited | Constructor-callee matching is incomplete |
## What It Cannot Detect
- **Cross-function resource management**: Resources opened in one function and closed in another are not tracked. This is the most common source of false positives for leak detection.
- **RAII / defer / try-with-resources**: Implicit cleanup via language-level constructs (Rust's `Drop`, Go's `defer`, Java's try-with-resources, Python's `with`) is not recognized. These patterns will produce false-positive leak findings.
- **Dynamic dispatch**: If `close()` is called through a trait object or interface, it may not be recognized.
- **Authentication via type system**: Rust's type-state pattern (e.g. `AuthenticatedRequest<T>`) is not recognized as an auth check.
- **Complex authorization logic**: Only recognized function name patterns are checked.
## Common False Positives
| Scenario | Why it fires | Mitigation |
|----------|-------------|------------|
| RAII / Drop / defer cleanup | Implicit cleanup not visible | Known limitation; filter by severity |
| Resource returned to caller | Ownership transferred, not leaked | Known limitation |
| Framework-managed resources | Web framework manages connection lifecycle | Exclude framework-generated handlers |
| Try-with-resources (Java) | Language construct not parsed | Known limitation |
| Context manager (Python `with`) | Block construct not tracked | Known limitation |
## Common False Negatives
| Scenario | Why it's missed |
|----------|----------------|
| Resource closed in helper function | Cross-function tracking not implemented |
| Auth in middleware | Auth check happens before handler is called |
| Double-close via aliased reference | Alias analysis not performed |
## Confidence Signals
| Signal | Meaning |
|--------|---------|
| **Definite leak (state-resource-leak)** | Resource is never closed on any path — high confidence |
| **Use-after-close** | Read/write operation after explicit close — high confidence |
| **Web handler detected** | Entry point matched by parameter naming convention |
| **Possible leak (state-resource-leak-possible)** | Resource closed on some but not all paths — lower confidence |
## Tuning and Noise Controls
### Enable state analysis
```toml
[scanner]
enable_state_analysis = true
```
### Severity filtering
## Tuning
```bash
# Skip possible-leak findings (Low severity)
nyx scan . --severity ">=MEDIUM"
nyx scan . --severity ">=MEDIUM" # Skip "possible" leaks (Low)
```
### Exclude test files
```toml
[scanner]
enable_state_analysis = true # default
excluded_directories = ["tests", "test", "spec"]
```
## Resource Pairs
## Recognised pairs
The state engine recognizes these acquire/release pairs per language:
The state engine ships these acquire/release pairs. Custom pairs are not yet configurable; file an issue if you need one.
### C/C++
| Acquire | Release | Resource |
|---------|---------|----------|
| `fopen` | `fclose` | File handle |
| `open` | `close` | File descriptor |
| `socket` | `close` | Socket |
| `malloc`, `calloc`, `realloc` | `free` | Heap memory |
| `pthread_mutex_lock` | `pthread_mutex_unlock` | Mutex |
**C / C++**
### Rust
| Acquire | Release | Resource |
|---------|---------|----------|
| `File::open`, `File::create` | `drop`, `close` | File handle |
| `TcpStream::connect` | `shutdown` | TCP connection |
| `lock`, `read`, `write` (on Mutex/RwLock) | `drop` | Lock guard |
| Acquire | Release |
|---|---|
| `fopen` | `fclose` |
| `open` | `close` |
| `socket` | `close` |
| `malloc`, `calloc`, `realloc` | `free` |
| `pthread_mutex_lock` | `pthread_mutex_unlock` |
| `new`, `new[]` *(C++)* | `delete`, `delete[]` |
### Java
| Acquire | Release | Resource |
|---------|---------|----------|
| `new FileInputStream` | `close` | File stream |
| `getConnection` | `close` | DB connection |
| `new Socket` | `close` | Socket |
**Rust**
### Go, Python, JavaScript, Ruby, PHP
Similar patterns with language-specific function names.
| Acquire | Release |
|---|---|
| `File::open`, `File::create` | `drop`, `close` |
| `TcpStream::connect` | `shutdown` |
| `lock`, `read`, `write` (Mutex/RwLock) | `drop` |
## Use Patterns (Trigger use-after-close)
**Java**
The following operations on a closed resource trigger `state-use-after-close`:
| Acquire | Release |
|---|---|
| `new FileInputStream` (and friends) | `close` |
| `getConnection` | `close` |
| `new Socket` | `close` |
Go, Python, JavaScript, Ruby, PHP follow language-idiomatic equivalents.
## Use-after-close triggers
These operations on a closed resource fire `state-use-after-close`:
```
read, write, send, recv, fread, fwrite, fgets, fputs, fprintf, fscanf,
@ -177,28 +134,3 @@ ungetc, query, execute, fetch, sendto, recvfrom, ioctl, fcntl,
strcpy, strncpy, strcat, strncat, memcpy, memmove, memset, memcmp,
strcmp, strncmp, strlen, sprintf, snprintf
```
## Technical Details
### Resource Lifecycle Lattice
```
UNINIT → OPEN → CLOSED
→ MOVED
```
States are tracked as bitflags, allowing the lattice to represent uncertainty (e.g. OPEN|CLOSED means the resource is open on some paths and closed on others).
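A bitflag lattice of this shape might look like the following sketch. The `ResourceState` enum, `join_resource`, and `classify_at_exit` are hypothetical names, and the mapping from merged state to rule ID is an assumption based on the rule descriptions (definite leak when the resource is open on every path, possible leak when it is open on only some).

```python
from enum import Flag, auto


class ResourceState(Flag):
    UNINIT = auto()
    OPEN = auto()
    CLOSED = auto()
    MOVED = auto()


def join_resource(a, b):
    """Merge states from two incoming paths: the union of possible states."""
    return a | b


def classify_at_exit(state):
    """Assumed mapping from the merged exit state to a leak rule ID."""
    if state == ResourceState.OPEN:
        return "state-resource-leak"  # open on every path
    if ResourceState.OPEN in state:
        return "state-resource-leak-possible"  # open on some paths only
    return None


# One path closed the handle, an early-return path did not.
merged = join_resource(ResourceState.OPEN, ResourceState.CLOSED)
finding = classify_at_exit(merged)
```

Because join is a bitwise union, the merged state can only gain bits as paths are combined, which is what keeps the analysis monotone.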
### Leak Detection Scope
Resource leaks are checked at the file-level exit node and the **synthesized** function exit node (a single Return node that all early returns feed into). Early-return nodes are **not** checked individually — only the merged state at the function's synthesized exit is inspected. This prevents duplicate findings where an early-return path reports a definite leak while the merged exit correctly reports a possible leak.
This per-function exit inspection ensures that a variable leaked inside one function is not masked by a same-named variable that is properly closed in a subsequent function.
### Auth Level Lattice
```
Unauthed < Authed < Admin
```
Join semantics: take the minimum (conservative). If any path is unauthenticated, the result is unauthenticated.
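The minimum-based join can be illustrated with a small ordered enum. This is a sketch only; `AuthLevel` and `join_auth` are hypothetical names chosen for the example.

```python
from enum import IntEnum


class AuthLevel(IntEnum):
    UNAUTHED = 0
    AUTHED = 1
    ADMIN = 2


def join_auth(a, b):
    """Conservative merge of two paths: keep the weaker auth level."""
    return min(a, b)


# One branch performed an admin check, the other performed no check at all.
at_merge = join_auth(AuthLevel.ADMIN, AuthLevel.UNAUTHED)
```

Taking the minimum means a privileged sink after the merge point is treated as unauthenticated if any incoming path skipped the auth call.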


@ -1,10 +1,8 @@
# Taint Analysis
# Taint analysis
## Summary
Nyx tracks untrusted data from **sources** (where it enters the program) through assignments and function calls to **sinks** (where it's used dangerously). If the flow reaches a sink without passing a matching **sanitizer**, a finding fires.
Nyx's taint analysis tracks the flow of untrusted data from **sources** (where data enters the program) through **assignments and function calls** to **sinks** (where dangerous operations happen). If the data reaches a sink without passing through a **sanitizer** with matching capabilities, a finding is emitted.
The engine uses a monotone forward dataflow analysis over a finite lattice with guaranteed termination. Analysis is **intra-procedural with cross-file function summaries** — it does not follow calls into other functions but uses pre-computed summaries of their behavior.
The engine is a monotone forward dataflow over a finite lattice with guaranteed termination. It's flow-sensitive inside a function, and interprocedural across files via persisted per-function summaries.
## Rule ID
@ -12,191 +10,135 @@ The engine uses a monotone forward dataflow analysis over a finite lattice with
taint-unsanitised-flow (source <line>:<col>)
```
One rule ID covers all taint findings. The parenthetical identifies the specific source location.
One rule ID, parameterized by the source location. Suppressions can target either the base ID or the full string.
## What It Detects
## What it detects
- Environment variables flowing to shell execution (`env::var` → `Command::new`)
- User input flowing to code evaluation (`req.body` → `eval()`)
- File contents flowing to SQL queries (`fs::read_to_string` → `db.execute()`)
- Request parameters flowing to HTML output (`req.query` → `innerHTML`)
- Any source-to-sink flow where the sink's required capability is not stripped by a sanitizer
- User input flowing to shell execution: `req.body.cmd` → `child_process.exec`
- User input flowing to code evaluation: `req.query.code` → `eval`
- User input flowing to SQL: `request.args.get('id')` → `cursor.execute(f"... {id}")`
- Environment variables flowing to shell: `env::var("CMD")` → `Command::new("sh").arg("-c")`
- Request parameters flowing to HTML: `req.query.name` → `innerHTML`
- File contents flowing to privileged sinks: `fs::read_to_string` → `db.execute`
- Any other source-to-sink flow where the sink's required capability is not stripped along the way
## What It Cannot Detect
## What it can't detect
- **Inter-procedural flows without summaries**: If a function isn't summarized (e.g. from a third-party library without source), the taint engine cannot track data through it. It conservatively treats unknown callees as neither propagating nor sanitizing.
- **Flows through data structures**: Taint is tracked per-variable, not per-field. `obj.field = tainted; sink(obj.other_field)` may produce a false positive because taint attaches to `obj` as a whole.
- **Aliasing**: `let y = &x; sink(*y)` — the engine tracks `y` as a fresh variable, not an alias of `x`. This can cause false negatives.
- **Complex control flow**: The analysis is flow-sensitive (respects control flow within a function) but does not track taint through arbitrary loops with complex exit conditions.
- **Implicit flows**: Taint only follows explicit data flow, not information flow through branching (e.g. `if (secret) { x = 1 } else { x = 0 }` does not taint `x`).
- **Library calls without summaries.** If a callee has no summary (no source, binary-only dependency), Nyx treats it as neither propagating nor sanitizing. This is conservative for sanitization but lossy for propagation.
- **Taint through struct fields and containers.** Taint attaches to whole variables. `obj.field = tainted; sink(obj.other_field)` can produce a false positive because `obj` itself is tainted.
- **Aliasing.** `let y = &x; sink(*y)` tracks `y` separately from `x`, so taint on `x` is not seen at the sink. This can cause false negatives.
- **Implicit flows.** Taint follows explicit data, not branching signal. `if (secret) x = 1 else x = 0` does not taint `x`.
- **Globals and statics across functions.** Taint written to a global or static in one function is not visible when another function reads it.
## Common False Positives
## Common false positives
| Scenario | Why it happens | Mitigation |
|----------|---------------|------------|
| Custom sanitizer not recognized | Nyx only knows built-in and configured sanitizers | Add a custom sanitizer rule in config |
| Taint through struct fields | Variable-level (not field-level) tracking | No current mitigation; field sensitivity is planned |
| Dead code paths | The engine is path-insensitive within a function (it considers all paths) | Contradiction pruning catches some cases; path-validated findings score lower |
| Library wrappers | A wrapper around a dangerous function may re-introduce taint that was sanitized by the wrapper | Summarize the wrapper function or add it as a sanitizer |
| Scenario | Why | Mitigation |
|---|---|---|
| Custom sanitizer not recognised | Only built-in + configured sanitizers match | Add a custom sanitizer rule in config |
| Taint through struct fields | Variable-level tracking, not field-level | No fix yet; field-sensitivity is planned |
| Dead branches | Path-insensitive within a function | Constraint solving catches trivially infeasible combos; path-validated findings are scored lower |
| Library wrapper re-introduces taint | Wrapper opaque, or summary marks it as propagating | Summarize the wrapper explicitly or add it as a sanitizer |
## Common False Negatives
## Common false negatives
| Scenario | Why it's missed |
|----------|----------------|
| Third-party library calls | No summary available; callee treated as opaque |
| Taint through global/static variables | Not tracked across function boundaries |
| Taint through closures/callbacks in some languages | Closure capture analysis is limited (JS/TS/Ruby/Go anonymous functions ARE analyzed) |
| Flows spanning more than two files | Summary approximation loses precision at depth |
| Scenario | Why |
|---|---|
| Third-party library on the path | No summary available, callee treated opaquely |
| Globals / statics across function boundaries | Not tracked |
| Some closure captures | Closure analysis is limited. JS/TS/Ruby/Go anonymous functions passed as callbacks *are* analyzed as separate scopes |
| Very deep cross-file chains | Summary approximation loses precision at depth |
## Confidence Signals
## Confidence signals
These signals in the output indicate higher-confidence findings:
Higher confidence:
- Source + Sink both present in evidence with specific call locations.
- `source_kind: user_input` (direct attacker control).
- `path_validated: false`.
- No dominating guard on the path.
- Symex produced a witness string (rendered sink value visible in JSON/SARIF `evidence.symbolic.witness`).
| Signal | What it means |
|--------|--------------|
| **Evidence: Source + Sink** | Both endpoints identified with specific function names and locations |
| **Source kind = user input** | Source is directly controllable by an attacker (req.body, argv, etc.) |
| **path_validated = false** | No validation guard on the path — higher exploitability |
| **No guard_kind** | No dominating predicate check (null check, error check, etc.) |
| **High rank_score** | Multiple confidence signals combined |
Lower confidence:
- Path-validated taint (`path_validated: true`).
- Source is a database read or internal file (pre-validated at insertion is common).
- Engine note `ForwardBailed` / `PathWidened`. Use `--require-converged` to drop these in strict gates.
Lower-confidence:
## Tuning
| Signal | What it means |
|--------|--------------|
| **path_validated = true** | A validation predicate guards the path — may not be exploitable |
| **guard_kind = "ValidationCall"** | An explicit validation function was called before the sink |
| **Source kind = database** | Data from DB — may already be validated at insertion time |
## Tuning and Noise Controls
### Add custom sanitizers
If your codebase has a custom sanitizer that Nyx doesn't recognize:
### Custom sanitizer
```toml
# nyx.local
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml", "sanitizeInput"]
kind = "sanitizer"
cap = "html_escape"
kind = "sanitizer"
cap = "html_escape"
```
Or via CLI:
```bash
nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
```
Or: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`.
### Filter by severity
### Filter by severity or confidence
```bash
nyx scan . --severity HIGH # Only high-severity taint findings
nyx scan . --severity ">=MEDIUM" # Skip low-severity
nyx scan . --severity HIGH
nyx scan . --min-confidence medium
```
### Skip non-production code
By default, findings in `tests/`, `vendor/`, `build/` paths are downgraded one severity tier. To exclude them entirely, add to config:
```toml
[scanner]
excluded_directories = ["tests", "vendor", "build", "examples"]
```
### Disable taint (AST-only mode)
### Skip dataflow entirely
```bash
nyx scan . --mode ast
```
AST-only mode gives you structural pattern matches without taint.
In the browser UI, taint findings render as a numbered flow walk so you can see each hop the engine took:
<p align="center"><img src="../../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: HIGH taint-unsanitised-flow with numbered source → call → sink steps and How to fix guidance" width="900"/></p>
## Example
**Vulnerable code** (Rust):
Rust:
```rust
use std::env;
use std::process::Command;
fn main() {
let cmd = env::var("USER_CMD").unwrap(); // line 5: source
Command::new("sh").arg("-c").arg(&cmd).output(); // line 6: sink
let cmd = env::var("USER_CMD").unwrap(); // source
Command::new("sh").arg("-c").arg(&cmd).output(); // sink
}
```
**Finding**:
Finding:
```
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
Source: env::var("USER_CMD") at 5:15
Sink: Command::new("sh").arg("-c")
Score: 76
[HIGH] taint-unsanitised-flow (source 5:15) src/main.rs:6:5
Unsanitised user input flows from env::var → Command::new
Source: env::var (5:15)
Sink: Command::new
```
**Safe alternative**:
```rust
use std::env;
use std::process::Command;
Safe rewrite: drop the shell and pass the value as argv directly (`Command::new(&cmd).output()`), or validate against an allowlist before passing to the shell.
fn main() {
let cmd = env::var("USER_CMD").unwrap();
// Use the value as a direct argument, not a shell command
Command::new(&cmd).output();
// Or validate against an allowlist
}
```
## Capabilities
## Technical Details
Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer only clears taint for the cap it declares. A sink only fires when the remaining taint still carries its required cap.
### Capability System
| Capability | Typical source | Typical sanitizer | Typical sink |
|---|---|---|---|
| `env_var` | `env::var`, `getenv`, `process.env` | | |
| `html_escape` | | `html.escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
| `shell_escape` | | `shlex.quote`, `shell_escape::escape` | `system`, `Command::new`, `eval` |
| `url_encode` | | `encodeURIComponent` | `location.href`, HTTP client URL arg |
| `json_parse` | | `JSON.parse` | |
| `file_io` | | `os.path.realpath`, `filepath.Clean` | `open`, `fs::read_to_string`, `send_file` |
| `fmt_string` | | | `printf(var)` |
| `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation |
| `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` |
| `ssrf` | | URL-prefix locks | `requests.get`, `fetch`, `HttpClient.send` |
| `code_exec` | | | `eval`, `exec`, `Function` |
| `crypto` | | | weak-algorithm constructors |
| `unauthorized_id` | request-bound scoped IDs (Rust auth analysis) | ownership check | row-level write |
| `all` | Sources typically use `all` so they match any sink | | |
Taint uses a bitflag capability system to match sources with appropriate sanitizers and sinks:
| Capability | Bit | Sources | Sanitizers | Sinks |
|-----------|-----|---------|------------|-------|
| `ENV_VAR` | 0x01 | `env::var`, `getenv` | — | — |
| `HTML_ESCAPE` | 0x02 | — | `html_escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
| `SHELL_ESCAPE` | 0x04 | — | `shell_escape` | `Command::new`, `system()`, `eval()` |
| `URL_ENCODE` | 0x08 | — | `encodeURIComponent` | `location.href` |
| `JSON_PARSE` | 0x10 | — | `JSON.parse` | — |
| `FILE_IO` | 0x20 | — | `filepath.Clean`, `basename`, `os.path.realpath` | `fopen`, `open`, `send_file`, `fs::read_to_string` |
| `FMT_STRING` | 0x40 | — | — | `printf(var)` |
Sources typically use `Cap::all()` to match any sink. A sanitizer strips specific capability bits. A finding fires when a tainted variable reaches a sink and the taint still has the matching capability bit set.
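The matching rule can be sketched with plain bitmasks. The bit values are the ones listed in the table above; the `ALL` constant and both function names are assumptions for illustration, and `ALL` here is simply the union of the seven listed bits rather than whatever `Cap::all()` returns in the real code.

```rust
pub const ENV_VAR: u8 = 0x01;
pub const HTML_ESCAPE: u8 = 0x02;
pub const SHELL_ESCAPE: u8 = 0x04;
pub const URL_ENCODE: u8 = 0x08;
pub const JSON_PARSE: u8 = 0x10;
pub const FILE_IO: u8 = 0x20;
pub const FMT_STRING: u8 = 0x40;

/// Assumption: "all" is the union of the bits listed above.
pub const ALL: u8 = 0x7f;

/// A sanitizer strips only the capability bits it declares.
pub fn sanitize(taint: u8, sanitizer_cap: u8) -> u8 {
    taint & !sanitizer_cap
}

/// A sink fires when the taint still carries the bit the sink requires.
pub fn sink_fires(taint: u8, required_cap: u8) -> bool {
    (taint & required_cap) != 0
}
```

For example, a value tainted with `ALL` and passed through an HTML sanitizer no longer fires at an `HTML_ESCAPE` sink but still fires at a `SHELL_ESCAPE` sink, which is why an HTML escaper does not make a value safe for `Command::new`.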
### Nested Function Analysis
The CFG builder recursively discovers function expressions nested inside call arguments:
- **JavaScript/TypeScript**: `function_expression`, `arrow_function` inside call arguments (e.g., Express route handlers)
- **Ruby**: `do_block` and `block` nodes (e.g., Sinatra `get '/path' do...end`)
- **Go**: `func_literal` (anonymous function literals)
Each nested function is walked as a separate scope and receives a unique identifier (`<anon@{byte_offset}>`) to prevent collisions when multiple anonymous functions exist in the same file.
### Chained Call Classification
Method chains like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments between `.` separators. The classifier matches against both the original text and the normalized form, enabling rules like `r.URL` to match within `r.URL.Query.Get`.
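A rough sketch of this normalization, assuming it amounts to dropping every parenthesized argument list from the call text. The function names and the substring-based matching are illustrative assumptions; the real classifier may match on segments or prefixes, and this sketch does not handle parentheses inside string literals.

```rust
/// Drop every parenthesized group, keeping only the dotted path.
pub fn normalize_chain(call_text: &str) -> String {
    let mut out = String::with_capacity(call_text.len());
    let mut depth = 0usize;
    for ch in call_text.chars() {
        match ch {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            // Keep characters only while outside any argument list.
            _ if depth == 0 => out.push(ch),
            _ => {}
        }
    }
    out
}

/// Try the matcher against both the original and the normalized text.
pub fn rule_matches(matcher: &str, call_text: &str) -> bool {
    call_text.contains(matcher) || normalize_chain(call_text).contains(matcher)
}
```

Under these assumptions, `normalize_chain` turns `r.URL.Query().Get("host")` into `r.URL.Query.Get`, so a matcher written against the dotted path matches even though the original text has `()` in the middle.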
### Nested Call Fallback
When the outermost call in an expression doesn't classify as a source/sink, the engine tries all nested inner calls. This handles patterns like `str(eval(expr))` where `str` is not a sink but the inner `eval` is.
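The fallback can be pictured as a recursive search over the call expression. The `Expr` type, the hard-coded sink list, and the function names below are hypothetical and exist only to illustrate the idea; the real engine works on tree-sitter nodes and label rules.

```rust
/// Hypothetical, heavily simplified expression tree.
pub enum Expr {
    Call { callee: String, args: Vec<Expr> },
    Var(String),
}

/// Stand-in for the rule lookup; the real sink table is per language.
pub fn is_sink(callee: &str) -> bool {
    matches!(callee, "eval" | "exec" | "system")
}

/// Classify the outermost call first, then fall back to nested calls.
pub fn classify_sink(expr: &Expr) -> Option<&str> {
    match expr {
        Expr::Call { callee, args } => {
            if is_sink(callee) {
                return Some(callee.as_str());
            }
            args.iter().find_map(|arg| classify_sink(arg))
        }
        Expr::Var(_) => None,
    }
}
```

For `str(eval(expr))`, the outer `str` does not classify, so the search descends into the arguments and returns `eval`.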
### Rust `if let` / `while let` Pattern Bindings
The CFG builder recognizes Rust `let_condition` nodes inside `if` and `while` expressions. The value expression is classified for source/sink labels, and the pattern binding is extracted as a variable definition:
```rust
if let Ok(cmd) = env::var("CMD") {
// cmd is tainted — env::var is a source, cmd is the binding
Command::new("sh").arg("-c").arg(&cmd).output(); // taint-unsanitised-flow
}
```
This also works for `while let` patterns.
### JS/TS Two-Level Solve
For JavaScript and TypeScript, taint analysis uses a two-level approach:
1. **Level 1**: Solve top-level code (module scope)
2. **Level 2**: Solve each function seeded with the converged top-level state
This prevents false positives from cross-function taint leakage while preserving global-to-function flows.
Sources typically use `cap = "all"` so they match every sink. Sinks declare the specific cap they need. Sanitizers only clear the cap they name.

46
docs/how-it-works.md Normal file

@ -0,0 +1,46 @@
# How Nyx works
If you're going to act on a finding, it helps to know how the scanner got there. This page is the short version. Source paths are linked where the answer to "exactly what does it do" lives in the code.
## The pipeline
A scan runs in two passes over the file tree, with an optional SQLite index that lets the second scan skip files whose content hash hasn't changed.
**Pass 1, per file.** Tree-sitter parses the file. Nyx builds an intra-procedural control-flow graph, lowers it to SSA, and extracts a summary per function describing what that function does at the boundary: which arguments flow to sinks, which sources it reads from, which sinks it calls, what taint it strips, what it returns. Summaries are persisted to SQLite ([`src/summary/`](https://github.com/elicpeter/nyx/tree/master/src/summary/), [`src/database.rs`](https://github.com/elicpeter/nyx/blob/master/src/database.rs)).
**Summary merge.** All per-file summaries get unioned into a global map keyed by qualified function name.
**Pass 2, per file.** Each file is reanalysed with the global summaries available. The taint engine runs a forward dataflow worklist over the SSA representation. When it hits a call, it consults summaries to decide whether the call propagates taint, sanitizes it, or terminates the flow. Findings are produced when tainted data reaches a sink whose required capability is still set on the value.
Two extra layers tune precision around calls. **Context-sensitive inlining** (k=1) re-runs intra-file callees with the actual argument taint at the call site, so a helper called once with tainted input and once with sanitized input produces the right result for each call. **SCC fixed-point**: when a group of mutually-recursive functions forms a strongly-connected component in the call graph, the engine iterates summaries to a joint fixed-point (capped at 64 iterations). SCCs that span files are also handled.
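The capped fixed-point iteration over a recursive cluster can be sketched as below. Everything here is an assumption made for illustration: summaries are reduced to a single capability bitmask, and `solve_scc`, `Summary`, and the `transfer` callback are hypothetical names. Only the cap of 64 iterations comes from the text above.

```rust
use std::collections::BTreeMap;

pub const MAX_SCC_ITERATIONS: usize = 64;

/// Hypothetical summary: a bitmask of capabilities a function propagates.
pub type Summary = u8;

/// Iterate all functions in one SCC until no summary changes or the cap
/// is hit. Returns the summaries and whether the iteration converged.
pub fn solve_scc<F>(functions: &[&str], mut transfer: F) -> (BTreeMap<String, Summary>, bool)
where
    F: FnMut(&str, &BTreeMap<String, Summary>) -> Summary,
{
    let mut summaries: BTreeMap<String, Summary> =
        functions.iter().map(|f| (f.to_string(), 0)).collect();

    for _ in 0..MAX_SCC_ITERATIONS {
        let mut changed = false;
        for &f in functions {
            // OR-ing with the old value keeps the update monotone.
            let updated = summaries[f] | transfer(f, &summaries);
            if updated != summaries[f] {
                summaries.insert(f.to_string(), updated);
                changed = true;
            }
        }
        if !changed {
            return (summaries, true);
        }
    }
    // Cap reached: return the best summaries so far, flagged as unconverged.
    (summaries, false)
}
```

A caller would attach an engine note to affected findings when the second element of the returned tuple is `false`.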
## Optional analyses on top
These run on top of the forward taint pass. They're independently switchable via `[analysis.engine]` config or matching CLI flags. See [advanced-analysis.md](advanced-analysis.md) for the full description and tradeoffs.
| Pass | Purpose | Default |
|---|---|---|
| Abstract interpretation | Carries interval and string prefix/suffix bounds alongside taint. Suppresses findings on proven-bounded integers and locked-prefix URLs | on |
| Context sensitivity | k=1 inlining for intra-file callees | on |
| Constraint solving | Drops paths whose accumulated branch predicates are unsatisfiable. Optional Z3 backend with `--features smt` | on |
| Symbolic execution | Builds an expression tree per tainted value. Produces a witness string at the sink. Detects sanitization patterns the taint engine alone would miss | on |
| Backwards analysis | After the forward pass, walks backwards from each sink to confirm or invalidate the flow. Annotates findings as `backwards-confirmed`, `backwards-infeasible`, or `backwards-budget-exhausted` | off |
`--engine-profile fast | balanced | deep` flips groups of these at once. `balanced` is the default and the configuration the benchmark numbers in [language-maturity.md](language-maturity.md) are measured against.
## Where bounds live
Static analysis at scale means choosing where to stop. Nyx exposes its bounds rather than hiding them:
- **Inline depth** is k=1. Callees larger than the inline body-size cap fall back to summary-based resolution.
- **SCC fixed-point** is capped at 64 iterations. If a recursive cluster doesn't converge, the engine emits the best summary it has and records an `engine_note` on affected findings.
- **Lattice width** is bounded. Taint origin sets cap at 32 entries per SSA value (`--max-origins`); points-to sets cap at 32 heap objects (`--max-pointsto`). Truncation is recorded as `OriginsTruncated` / `PointsToTruncated` so you can see when precision was lost.
- **Symbolic expressions** cap at depth 32. Deeper expressions degrade to `Unknown` rather than growing without bound.
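A bounded set that records truncation instead of growing can be sketched like this. The `OriginSet` type, its fields, and the use of `u32` origin IDs are assumptions for illustration; the cap of 32 mirrors the default stated above.

```rust
use std::collections::BTreeSet;

pub const MAX_ORIGINS: usize = 32;

/// Hypothetical per-value origin set with a hard cap.
#[derive(Debug, Default)]
pub struct OriginSet {
    origins: BTreeSet<u32>,
    /// Set once an origin had to be dropped because the cap was reached.
    pub truncated: bool,
}

impl OriginSet {
    pub fn insert(&mut self, origin: u32) {
        if self.origins.contains(&origin) {
            return;
        }
        if self.origins.len() >= MAX_ORIGINS {
            // Precision is lost here; remember it so it can be reported.
            self.truncated = true;
            return;
        }
        self.origins.insert(origin);
    }

    pub fn len(&self) -> usize {
        self.origins.len()
    }
}
```

The `truncated` flag plays the role of an `OriginsTruncated` note in this sketch: the set stays bounded, and the loss of precision stays visible.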
Findings whose engine notes indicate a bound was hit can be filtered with `--require-converged` for strict CI gates. The flag drops over-reports and bails; under-reports (where the emitted finding is still real but the result set is a lower bound) are kept.
## What you get out
Each finding carries the source location, the sink location, the path in between (when symex produced one), the rule ID, severity, attack-surface score, confidence level, and a list of engine notes describing any precision loss along the way. Console output is human-readable; JSON and SARIF carry the full evidence object for tooling.
For the JSON shape and SARIF mapping, see [output.md](output.md).

@ -1,32 +0,0 @@
# Nyx Documentation
Welcome to the Nyx documentation. Nyx is a multi-language static vulnerability scanner built in Rust.
## User Guide
- [Installation](installation.md) — Install via cargo, prebuilt binaries, or from source
- [Quick Start](quickstart.md) — Your first scan in 60 seconds
- [CLI Reference](cli.md) — Every flag, subcommand, and option
- [Configuration](configuration.md) — Config file schema, precedence, custom rules
- [Output Formats](output.md) — Console, JSON, SARIF; exit codes; evidence fields
## Detector Reference
- [Detector Overview](detectors.md) — How the four detector families work together
- [Taint Analysis](detectors/taint.md) — Cross-file source-to-sink dataflow tracking
- [CFG Structural Analysis](detectors/cfg.md) — Auth gaps, unguarded sinks, resource leaks
- [State Model Analysis](detectors/state.md) — Resource lifecycle and authentication state
- [AST Patterns](detectors/patterns.md) — Tree-sitter structural pattern matching
## Rule Reference
- [Rule Index](rules/index.md) — How rules are organized
- [Rust](rules/rust.md) | [C](rules/c.md) | [C++](rules/cpp.md) | [Java](rules/java.md) | [Go](rules/go.md)
- [JavaScript](rules/javascript.md) | [TypeScript](rules/typescript.md) | [Python](rules/python.md)
- [PHP](rules/php.md) | [Ruby](rules/ruby.md)
## Contributing
- [Contributing Guide](../CONTRIBUTING.md) — Development setup, adding rules, PR guidelines
- [Security Policy](../SECURITY.md) — Responsible disclosure
- [Code of Conduct](../CODE_OF_CONDUCT.md)

@ -1,42 +1,42 @@
# Installation
## Install from crates.io
For the happy path (`cargo install nyx-scanner`, release binary on PATH), see the README. This page covers platform-specific notes and upgrade paths.
## Supported platforms
Release binaries are published for:
| Platform | Archive |
|---|---|
| Linux x86_64 | `nyx-x86_64-unknown-linux-gnu.zip` |
| macOS Intel | `nyx-x86_64-apple-darwin.zip` |
| macOS Apple Silicon | `nyx-aarch64-apple-darwin.zip` |
| Windows x86_64 | `nyx-x86_64-pc-windows-msvc.zip` |
Build from source works on any stable Rust 1.88+ target (edition 2024).
## Verify the download
Each release attaches a `SHA256SUMS` file. When the maintainer signs the release, a detached `SHA256SUMS.asc` is published alongside it.
```bash
cargo install nyx-scanner
# Verify the checksum file's signature (skip if .asc isn't present)
gpg --verify SHA256SUMS.asc SHA256SUMS
# Then check your archive against it
sha256sum -c SHA256SUMS --ignore-missing
```
This installs the `nyx` binary into `~/.cargo/bin/`.
If `sha256sum` is missing on macOS, `shasum -a 256 -c SHA256SUMS --ignore-missing` is equivalent.
## Install from GitHub releases
## Windows
1. Go to the [Releases](https://github.com/elicpeter/nyx/releases) page.
2. Download the binary for your platform:
| Platform | Archive |
|----------|---------|
| Linux x86_64 | `nyx-x86_64-unknown-linux-gnu.zip` |
| macOS Intel | `nyx-x86_64-apple-darwin.zip` |
| macOS Apple Silicon | `nyx-aarch64-apple-darwin.zip` |
| Windows x86_64 | `nyx-x86_64-pc-windows-msvc.zip` |
3. Extract and install:
```bash
# Linux / macOS
unzip nyx-*.zip
chmod +x nyx
sudo mv nyx /usr/local/bin/
# Windows (PowerShell)
Expand-Archive -Path nyx-*.zip -DestinationPath .
Move-Item -Path .\nyx.exe -Destination "C:\Program Files\Nyx\"
```
4. Verify:
```bash
nyx --version
```
```powershell
Expand-Archive -Path nyx-x86_64-pc-windows-msvc.zip -DestinationPath .
Move-Item -Path .\nyx.exe -Destination "C:\Program Files\Nyx\"
# Add C:\Program Files\Nyx to PATH in System Properties → Environment Variables
nyx --version
```
## Build from source
@ -44,33 +44,34 @@ This installs the `nyx` binary into `~/.cargo/bin/`.
git clone https://github.com/elicpeter/nyx.git
cd nyx
cargo build --release
cargo install --path .
# Binary at target/release/nyx
```
Requires **Rust 1.85+** (edition 2024).
The frontend is built and embedded into the binary during `cargo build`, so there's no separate step for `nyx serve`. Node is only required if you're working on the frontend itself; see `CONTRIBUTING.md`.
## CI Integration
Optional features:
### GitHub Actions
| Flag | Adds |
|---|---|
| `--features smt` | Bundles Z3 for stronger path-constraint solving. MIT-licensed; distributors should include Z3's license in their attribution |
| `--features smt-system-z3` | Links against a system-installed Z3 instead of bundling |
```yaml
- name: Install Nyx
run: cargo install nyx-scanner
## Upgrading
- name: Run security scan
run: nyx scan . --format sarif --fail-on medium > results.sarif
Nyx stores its scanner version in the project's index database. When the binary's version differs from the stored version, the index is wiped on the next scan and rebuilt against the new engine. You'll see one info-level log line:
- name: Upload SARIF
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: results.sarif
```
engine version changed (0.4.0 → 0.5.0), rebuilding index
```
### Generic CI
No flag needed. If you see this on *every* scan, the metadata row isn't being persisted; file an issue.
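The version check described here can be sketched as a pure decision function. The `IndexAction` enum and `index_action` function are hypothetical; how the binary actually reads and writes the stored version is not shown in this document, so both versions are passed in as plain strings.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum IndexAction {
    /// Stored and running versions match: keep the existing index.
    Reuse,
    /// Versions differ: wipe and rebuild, emitting one info-level line.
    Rebuild { log_line: String },
}

pub fn index_action(stored: &str, current: &str) -> IndexAction {
    if stored == current {
        IndexAction::Reuse
    } else {
        IndexAction::Rebuild {
            log_line: format!("engine version changed ({stored} → {current}), rebuilding index"),
        }
    }
}
```

In this sketch, a rebuild on every scan would mean `stored` never equals `current`, which corresponds to the "metadata row isn't being persisted" case above.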
## Corrupt database recovery
If the SQLite file itself is damaged (killed scan, full disk), delete it and let the next scan rebuild from scratch:
```bash
# Fail the build if any High or Medium finding is detected
nyx scan . --severity ">=MEDIUM" --fail-on medium --quiet --format json
rm "$(nyx config path)"/<project>.sqlite*
```
The `--fail-on` flag causes Nyx to exit with code **1** if any finding meets or exceeds the given severity. Exit code **0** means no findings matched.
Only the named project's rows are affected.

266
docs/language-maturity.md Normal file

@ -0,0 +1,266 @@
# Language Maturity Matrix
Nyx supports ten languages, but support depth is not uniform. This page gives an
honest per-language picture so you can calibrate expectations before depending
on Nyx for a given stack.
The classifications here are grounded in three concrete signals:
1. **Rule depth**: how many distinct source / sanitizer / sink matchers exist
for the language in `src/labels/<lang>.rs`, and how many vulnerability
classes (Cap bits) those matchers cover.
2. **Benchmark results**: rule-level precision / recall / F1 on the 305-case
corpus (267 synthetic + 14 real-CVE pairs, i.e. 28 cases, + 10 auth fixtures) in
[`tests/benchmark/RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md),
last measured 2026-04-23 with scanner version 0.5.0.
3. **Known weak spots**: FPs and FNs the maintainers have deliberately left
in the benchmark rather than suppressed, documented release-by-release in
[`RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md).
All parser integrations use tree-sitter and are stable; parsing is not a
differentiator between tiers. The differentiators are rule depth, cross-file
confidence, and modeled idioms.
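For readers checking the per-language figures on this page, F1 is the harmonic mean of precision and recall. The helper names below are illustrative, and the sketch assumes the page rounds to one decimal place.

```rust
/// F1 = 2PR / (P + R), with P and R given as fractions in [0, 1].
pub fn f1(precision: f64, recall: f64) -> f64 {
    if precision + recall == 0.0 {
        return 0.0;
    }
    2.0 * precision * recall / (precision + recall)
}

/// Convert a fraction to a percentage rounded to one decimal place.
pub fn as_percent(x: f64) -> f64 {
    (x * 1000.0).round() / 10.0
}
```

With the JavaScript row's 93.8% precision and 100% recall, `as_percent(f1(0.938, 1.0))` gives 96.8, which matches the F1 reported for that language.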
---
## Tier Summary
| Tier | Languages | What to expect |
|------|-----------|----------------|
| **Stable** | Python, JavaScript, TypeScript | Deep rule sets, gated sinks (argument-role-aware), framework detection, extensive fixtures, and the bulk of advanced-analysis (SSA, context-sensitivity, symbolic execution) coverage. Safe to depend on in CI gates. |
| **Beta** | Go, Java, Ruby, PHP | Solid mid-depth rule sets with known narrower class coverage. No gated sinks yet. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation) are incomplete. Usable in CI, but review FP/FN lists before tightening gates. |
| **Preview** | C, C++ | Pattern-only coverage. Pointer aliasing, function pointers, array-element taint, and STL container flows are not modeled. Suitable for finding obvious unsafe API uses; do not use as a sole SAST gate. Pair with clang-tidy / Clang Static Analyzer / Infer. |
| **Experimental** | Rust | Full source coverage relative to the framework ecosystem, but several FPs persist on adversarial safe cases pending engine work (match-arm guards, structural sinks with type facts). Appropriate for spot-checks and contribution but not yet recommended as a sole SAST dependency. |
---
## Per-Language Detail
### Stable tier
#### Python: 100% P / 100% R / 100% F1 *(29-case corpus)*
- **Rule depth**: 5 source families, 7 sanitizer families, 21 sink matchers
spanning HTML, URL, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Framework context**: Flask, Django, argparse source matchers; `flask_request`
import-alias support.
- **Advanced analysis**: gated sinks (`Popen`, `subprocess.run/call` with
activation-arg awareness), most SSA-equivalence and symbolic-execution
fixtures target Python.
- **Fixtures**: 125 under `tests/fixtures/` plus 30 benchmark cases.
- **Blind spots**: f-string interpolation is not explicitly modeled as a
distinct taint-producing construct; string-formatting flows are caught by
the general concatenation path.
#### JavaScript: 93.8% P / 100% R / 96.8% F1 *(27-case corpus)*
- **Rule depth**: 3 source families, 10 sanitizer families, 24 sink matchers
spanning HTML, URL, JSON, Shell, SQL, Code, SSRF, and File I/O.
- **Advanced analysis**: gated sinks (`setAttribute`, `parseFromString`),
two-level SSA solve for top-level + per-function scopes (`analyse_ssa_js_two_level`),
prefix-locked SSRF suppression via StringFact.
- **Framework context**: Express, Koa, Fastify (via in-file import scan when
`package.json` is absent).
- **Fixtures**: 238 under `tests/fixtures/`; the largest corpus of any
language.
- **Blind spots**: template literals are lowered through concatenation rather
than modeled as a first-class taint operator; dynamic property access
(`obj[user]`) is conservatively treated.
#### TypeScript: 100% P / 100% R / 100% F1 *(35-case corpus, most recent measurement)*
- **Rule depth**: Shares the JS ruleset (3 sources, 10 sanitizers, 24 sinks)
plus TS-specific grammar handling.
- **Advanced analysis**: TSX and JSX grammars wired as of 2026-04-20;
discriminated-union narrowing, generic erasure, decorator flow, and
interface dispatch are all validated against adversarial type-system
stressors.
- **Framework context**: Fastify detection via `detect_in_file_frameworks`
(import-driven, no `package.json` required).
- **Fixtures**: 39 test fixtures plus 35 benchmark cases.
- **Blind spots**: 0 known open weak spots as of 2026-04-20. `as any` casts
and `any`-typed flows are handled conservatively (treated as tainted).
### Beta tier
#### Go: 94.1% P / 100% R / 97.0% F1 *(28-case corpus)*
- **Rule depth**: 4 source families, 4 sanitizer families, 9 sink matchers
covering HTML, URL, Shell, SQL, SSRF, Crypto, and File I/O.
- **Framework context**: Gin, Echo source matchers.
- **Known gaps**: no gated sinks, no deserialization class, allowlist
early-return patterns in path-pruning benchmark cases still produce FPs
(`go-pathprune-safe-001`). `fmt.Sprintf` is deliberately not a sink.
#### Java: 92.9% P / 100% R / 96.3% F1 *(23-case corpus)*
- **Rule depth**: 3 source families, 8 sanitizer families, 10 sink matchers
covering HTML, URL, Shell, SQL, Code, SSRF, and Deserialization.
- **Framework context**: Spring, JPA, Hibernate ORM rules; JNDI injection
sinks.
- **Known gaps**: no gated sinks. Variable-receiver method calls
(`client.send(...)` vs `HttpClient.send(...)`) rely on type-qualified
resolution from receiver-type inference; flows where the receiver type
cannot be inferred are missed (`java-ssrf-002` historically persisted as
FN; closed via type facts but fragile on unusual builder chains).
#### Ruby: 100% P / 92.3% R / 96.0% F1 *(24-case corpus)*
- **Rule depth**: 3 source families, 7 sanitizer families, 15 sink matchers
covering HTML, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Framework context**: Rails helpers (`sanitize_sql`, `permit`, `require`).
- **Known gaps**: string interpolation inside shell and SQL strings is
recognized structurally but not modeled as a distinct operator.
`begin/rescue/ensure` exception-edge wiring is documented as deferred
(structurally incompatible with `build_try()`). One FN persists on an
interprocedural taint propagation case due to rule-ID mismatch, not a
missed flow (`rb-interproc-001`).
#### PHP: 86.7% P / 100% R / 92.9% F1 *(24-case corpus)*
- **Rule depth**: 3 source families (`$_GET`, `$_POST`, `$_REQUEST`
superglobals), 7 sanitizer families, 10 sink matchers covering HTML, URL,
Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Known gaps**: no gated sinks. Limited framework context (Laravel raw
methods only). Interprocedural sanitizer-wrapping case
(`php-interproc-safe-001`) persists as FP. `echo` language-construct
detection is wired but its inner-argument propagation is narrower than
function-call sinks.
### Preview tier
C and C++ are labeled **Preview** (not Experimental) to convey a specific
shape of limitation: the parser and existing rules produce useful findings
on obvious unsafe-API uses, but the engine **structurally cannot model**
several pervasive C/C++ constructs. Running Nyx on a C/C++ codebase and
seeing a clean report should not be read as a clean audit. Pair Nyx with
clang-tidy, the Clang Static Analyzer, or Infer for production use.
**Not modeled** (common to both C and C++):
- Pointer aliasing. Taint through `*p`, `p->field`, arbitrary pointer
arithmetic, and aliased writes are not tracked.
- Function pointers and callback dispatch. Indirect calls through
`void (*fn)(char *)` resolve to no callee.
- Array-element taint. Writes to `buf[i]` do not propagate taint to `buf`
in the general case; structural taint chains involving `fgets` → array →
`system` have rule-ID matching issues (`c-cmdi-004`).
- STL container operations (C++ only). `std::vector`, `std::map`,
`std::string` methods are not taint-aware; `c_str()` breaks taint chains
(`cpp-cmdi-003`).
- Lambdas and nested classes (C++ only). Not modeled.
- Complex socket setup (C++ only). E.g. `connect()` chains are not detected
(`cpp-ssrf-002`).
#### C: 85.7% P / 100% R / 92.3% F1 *(20-case corpus)*
- **Rule depth**: 3 source families, **2** sanitizer families (prefix-based
only), 5 sink matchers spanning Shell, File, SSRF, and Format-String.
- **Known gaps**: no framework rules, no gated sinks. Path-validation via
`strstr()` is not recognized as a guard (`c-safe-006`). Forward-declared
sanitizers are not tracked (`c-safe-008`).
#### C++: 80.0% P / 100% R / 88.9% F1 *(20-case corpus)*
- **Rule depth**: Clones the C ruleset (3 sources, 2 sanitizers, 5 sinks) and
adds `std::cin` / `std::getline` sources.
- **Known gaps**: same sanitizer-recognition gaps as C. See the "Not
modeled" list above for structural gaps (STL containers, `c_str()`,
`connect()`, lambdas, nested classes).
### Experimental tier
#### Rust: 76.0% P / 100% R / 86.4% F1 *(31-case adversarial corpus)*
- **Rule depth**: 6 source families, **2** sanitizer families (prefix and
type-coercion), 11 sink matchers covering HTML, Shell, SQL, SSRF,
Deserialization, and File I/O. Extensive framework source coverage
(Axum, Actix, Rocket), the broadest source-side coverage of any language.
- **Recent additions (2026-04-20)**: new SQL class (`rusqlite`, `sqlx`,
`diesel`, `postgres`), new Deserialization class (`serde_yaml`,
`bincode`, `rmp_serde`, `ciborium`, `ron`, `toml`), expanded file I/O
(`fs::remove_file/dir/rename/copy`), `reqwest` SSRF builder chain.
- **Known gaps**:
- `rs-safe-003`: structural `cfg-unguarded-sink` fires when a tainted
variable is *declared* in scope but not used in the sink; intentional
for high-risk sinks.
- `rs-safe-009`: match-arm guards don't surface as `StmtKind::If`, so
`classify_condition` never sees the character-class validation.
- `safe_direct_sanitizer.rs`: still FP because the SSA lowering for
an OR-chain rejection (`if a || b || c { return X }`) joins both
return paths into a single block, losing the early-return
semantics. Distinct from the merged-return-block defect closed on
2026-04-24; tracked separately.
- **Closed by the 2026-04-23 PathFact domain**
(`src/abstract_interp/path_domain.rs`): `rs-safe-007` (`.replace("..",
"")` sanitiser), `rs-safe-008` (negative-validation return pattern),
`rs-safe-010` (static-map lookup; still handled by the dedicated
static-map analysis, but PathFact does not interfere), new `rs-safe-012`
(`.contains("..")` + `.starts_with('/')` intraprocedural rejection),
new `rs-safe-015` (`Path::new(p).is_absolute()` typed rejection), plus a
new `rs-path-006` negative-guard to prevent over-suppression.
- **Closed by the 2026-04-24 per-return-path PathFact landing**
(`PathFactReturnEntry` on `SsaFuncSummary` + structural
variant-wrapper transparency + non-data-return skipping +
path-fact-proven leaf detection in
`trace_tainted_leaf_values`):
`rs-safe-014` (Option-returning user sanitiser),
new `rs-safe-016` (cross-function `.contains("..")` rejection),
`CVE-2018-20997` patched (tar-rs zip-slip),
`CVE-2022-36113` patched (cargo `.cargo-ok` symlink),
`CVE-2024-24576` patched (BatBadBut argv injection).
- **Not yet covered**: unsafe FFI / `std::mem::transmute` (no rules), Tokio
`process::Command` async variants (not distinguished from sync),
`hyper` / `surf` / `ureq` SSRF clients (reqwest family only), and Rocket /
Actix positive cases (rules exist but no benchmark fixtures yet).
---
## How the tiers were assigned
A language lands in **Stable** when all three hold:
- Rule set covers ≥ 8 vulnerability classes with both source and sink
matchers, and at least one class has argument-role-aware gating.
- Benchmark F1 ≥ 95% on a corpus of ≥ 25 cases.
- Advanced analysis (SSA lowering, context-sensitivity, symbolic-execution)
is exercised by fixtures for the language.
A language lands in **Beta** when benchmark F1 ≥ 90% but at least one of the
Stable criteria fails; usually the cause is narrower capability coverage or
the absence of gated sinks.
A language lands in **Preview** when the engine structurally cannot model
constructs that are pervasive in typical codebases for that language
(pointer aliasing, function pointers, array-element taint, STL containers
for C/C++). Pattern-only coverage is useful but not sufficient as a sole
SAST gate.
A language lands in **Experimental** when rule depth is clearly narrower
(≤ 5 sinks and ≤ 2 sanitizers), or benchmark F1 < 90%, or documented weak
spots require engine changes rather than rule additions to close, but the
engine does not have the pervasive structural blind spots of the Preview
tier.
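
The criteria above can be read as a decision procedure. The following is a minimal sketch in Rust, not Nyx's implementation: `LangProfile`, `Tier`, and `assign_tier` are hypothetical names, the Preview check is assumed to take precedence over the F1 bands, and the Experimental branch models only the F1 threshold (not the rule-depth or engine-change conditions).

```rust
/// Hypothetical per-language summary; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct LangProfile {
    pub vuln_classes: u32,            // classes with both source and sink matchers
    pub has_gated_sink: bool,         // at least one argument-role-aware gated class
    pub f1_percent: f64,              // benchmark F1, in percent
    pub corpus_cases: u32,            // benchmark corpus size
    pub advanced_fixtures: bool,      // SSA / context-sensitivity / symex fixtures exist
    pub structural_blind_spots: bool, // pervasive constructs the engine cannot model
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Stable,
    Beta,
    Preview,
    Experimental,
}

/// Assumed precedence: Preview first, then Stable, then the F1 bands.
pub fn assign_tier(p: &LangProfile) -> Tier {
    if p.structural_blind_spots {
        return Tier::Preview;
    }
    let stable = p.vuln_classes >= 8
        && p.has_gated_sink
        && p.f1_percent >= 95.0
        && p.corpus_cases >= 25
        && p.advanced_fixtures;
    if stable {
        Tier::Stable
    } else if p.f1_percent >= 90.0 {
        Tier::Beta
    } else {
        Tier::Experimental
    }
}
```

Under these assumptions, a profile with an F1 of 96.0 on a 24-case corpus lands in Beta, because the corpus-size criterion for Stable fails.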
---
## What this means for you
- **CI gates**: safe to set strict `--fail-on HIGH` gates on Stable-tier
languages. On Beta-tier, expect occasional FP triage; the weak-spot lists
above tell you exactly what to skim for. On Preview- and Experimental-tier,
treat Nyx findings as a starting point for manual review rather than
authoritative; Preview-tier languages in particular have structural
blind spots that a clean report will not disclose.
- **Rule contributions**: the shortest path to raising a language's tier is
contributing sink matchers and gated-sink registrations. Label files live
at `src/labels/<lang>.rs`; benchmark cases live at
`tests/benchmark/corpus/<lang>/`.
- **Scope planning**: if your primary stack is C, C++, or Rust, Nyx will
surface real findings, but you should budget for review time and consider
combining Nyx with a language-specific tool (e.g. `cargo-audit`,
`clang-tidy`) until those tiers mature.
The benchmark thresholds in `tests/benchmark_test.rs` are deliberately set
~5 pp below current baselines, so a drop in a language's F1 beyond that
margin fails CI. Tier promotions require sustained benchmark performance,
not just rule additions.
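
The per-language figures on this page follow from precision and recall, and the threshold check is a comparison against a floor below the baseline. A minimal sketch, assuming percentages are compared directly; `f1` and `passes_gate` are hypothetical helpers and do not mirror the contents of `tests/benchmark_test.rs`.

```rust
/// F1 score from precision and recall, both given as fractions in [0, 1].
pub fn f1(precision: f64, recall: f64) -> f64 {
    if precision + recall == 0.0 {
        return 0.0; // avoid dividing by zero when nothing was found
    }
    2.0 * precision * recall / (precision + recall)
}

/// Hypothetical regression gate: passes while the measured F1 stays at or
/// above a floor set `margin_pp` percentage points under the baseline.
pub fn passes_gate(measured_f1_pct: f64, baseline_f1_pct: f64, margin_pp: f64) -> bool {
    measured_f1_pct >= baseline_f1_pct - margin_pp
}
```

For example, `f1(0.857, 1.0) * 100.0` is roughly 92.3, matching the C row above, and with a 5 pp margin a run measuring 90.0 against that baseline would still pass while 87.0 would not.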


@ -19,9 +19,9 @@ Human-readable, color-coded output to stdout. Status messages go to stderr.
| Tag | Color | Meaning |
|-----|-------|---------|
| `[HIGH]` | Red, bold | Critical likely exploitable |
| `[MEDIUM]` | Orange, bold | Important may be exploitable |
| `[LOW]` | Muted blue-gray | Informational code quality or weak signal |
| `[HIGH]` | Red, bold | Critical -- likely exploitable |
| `[MEDIUM]` | Orange, bold | Important -- may be exploitable |
| `[LOW]` | Muted blue-gray | Informational -- code quality or weak signal |
### Evidence fields
@ -139,9 +139,9 @@ Fields marked "no" are omitted when empty/null/false to keep output compact.
| Level | Meaning |
|-------|---------|
| `High` | Strong signal taint-confirmed flow, definite state violation |
| `Medium` | Moderate signal resource leak, path-validated taint, CFG structural |
| `Low` | Weak signal AST pattern match, possible resource leak, degraded analysis |
| `High` | Strong signal -- taint-confirmed flow, definite state violation |
| `Medium` | Moderate signal -- resource leak, path-validated taint, CFG structural |
| `Low` | Weak signal -- AST pattern match, possible resource leak, degraded analysis |
### Evidence object
@ -192,12 +192,12 @@ nyx scan . --format sarif > results.sarif
The SARIF output includes:
- **Tool metadata** Nyx name and version
- **Rules** Rule ID, description, severity mapping
- **Results** One result per finding with location, message, and properties
- **Properties** Each result includes `category` and optionally `confidence` and `rollup.count`
- **Related locations** Rollup findings include example locations in `relatedLocations`
- **Artifacts** File paths referenced by findings
- **Tool metadata** -- Nyx name and version
- **Rules** -- Rule ID, description, severity mapping
- **Results** -- One result per finding with location, message, and properties
- **Properties** -- Each result includes `category` and optionally `confidence` and `rollup.count`
- **Related locations** -- Rollup findings include example locations in `relatedLocations`
- **Artifacts** -- File paths referenced by findings
### GitHub Code Scanning integration
@ -229,9 +229,9 @@ Without `--fail-on`, Nyx always exits `0` on a successful scan regardless of fin
| Level | Description | Typical rules |
|-------|-------------|---------------|
| **High** | Critical vulnerabilities likely exploitable | Command injection, unsafe deserialization, banned C functions, taint-confirmed flows with user input sources |
| **Medium** | Important issues may be exploitable with additional context | SQL concatenation, XSS sinks, reflection, unguarded sinks, resource leaks |
| **Low** | Informational code quality or weak signals | Weak crypto algorithms, insecure randomness, `unwrap()`/`panic!()`, type-safety escapes |
| **High** | Critical vulnerabilities -- likely exploitable | Command injection, unsafe deserialization, banned C functions, taint-confirmed flows with user input sources |
| **Medium** | Important issues -- may be exploitable with additional context | SQL concatenation, XSS sinks, reflection, unguarded sinks, resource leaks |
| **Low** | Informational -- code quality or weak signals | Weak crypto algorithms, insecure randomness, `unwrap()`/`panic!()`, type-safety escapes |
### Non-production severity downgrade
@ -265,8 +265,8 @@ x = dangerous() # nyx:ignore taint-unsanitised-flow ← suppresses this lin
x = dangerous() ← suppresses this line
```
- `nyx:ignore <RULE_ID>` suppresses findings on the **same line** as the comment.
- `nyx:ignore-next-line <RULE_ID>` suppresses findings on the **next line**.
- `nyx:ignore <RULE_ID>` -- suppresses findings on the **same line** as the comment.
- `nyx:ignore-next-line <RULE_ID>` -- suppresses findings on the **next line**.
- For taint findings, the primary line is the **sink line** (the `line` field in output).
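
The same-line and next-line rules can be sketched as a lookup against the source text. This is an illustrative helper, not Nyx's implementation: `has_directive` and `is_suppressed` are hypothetical names, rule IDs are compared by exact match as a simplifying assumption, and comment syntax is not parsed (the directive is searched for anywhere on the line).

```rust
/// True if `text` contains `directive` followed by whitespace and `rule_id`.
fn has_directive(text: &str, directive: &str, rule_id: &str) -> bool {
    text.match_indices(directive).any(|(pos, _)| {
        let rest = &text[pos + directive.len()..];
        // Require whitespace after the directive so `nyx:ignore` does not
        // match the prefix of `nyx:ignore-next-line`.
        rest.starts_with(char::is_whitespace)
            && rest.split_whitespace().next() == Some(rule_id)
    })
}

/// Returns true if a finding for `rule_id` on 1-based `line` is suppressed,
/// either by a comment on that line or by a next-line comment just above it.
pub fn is_suppressed(source: &str, line: usize, rule_id: &str) -> bool {
    if line == 0 {
        return false;
    }
    let lines: Vec<&str> = source.lines().collect();
    let same_line = lines
        .get(line - 1)
        .map_or(false, |text| has_directive(text, "nyx:ignore", rule_id));
    let prev_line = line >= 2
        && lines
            .get(line - 2)
            .map_or(false, |text| has_directive(text, "nyx:ignore-next-line", rule_id));
    same_line || prev_line
}
```

For a taint finding, `line` would be the sink line, since that is the primary line reported in the output.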
### Rule ID matching
@ -312,4 +312,4 @@ Suppressed findings do **not** trigger `--fail-on`. A scan with only suppressed
| `state-*` | State model | `state-use-after-close`, `state-resource-leak` |
| `<lang>.*.*` | AST patterns | `rs.memory.transmute`, `js.code_exec.eval` |
See the [Rule Reference](rules/index.md) for a complete listing.
See the [Rule Reference](rules.md) for a complete listing.


@ -1,103 +1,101 @@
# Quick Start
# Quick start
## Your first scan
After `cargo install nyx-scanner` (or dropping a release binary on your PATH), point Nyx at a directory:
```bash
# Scan the current directory
nyx scan
# Scan a specific path
nyx scan ./my-project
```
Nyx automatically creates an SQLite index on first run. Subsequent scans skip unchanged files.
First run builds a SQLite index under `.nyx/`; later runs skip files whose content hash hasn't changed.
## Understanding the output
## What a finding looks like
A typical console output looks like:
<p align="center"><img src="../assets/screenshots/docs/cli-scan-quickstart.png" alt="nyx scan output: two HIGH taint flows (Python os.system, JavaScript document.write) framed by the brand purple gradient" width="900"/></p>
The same scan in console form:
```
[HIGH] taint-unsanitised-flow (source 5:11) src/handler.rs:12:5
Source: env::var("CMD") at 5:11
Sink: Command::new("sh").arg("-c")
Score: 76
/tmp/demo/cmdi_direct.py
6:5 ✖ [HIGH] taint-unsanitised-flow (source 5:11) (Score: 81, Confidence: High)
Unsanitised user input flows from request.args.get → os.system
[MEDIUM] cfg-unguarded-sink src/handler.rs:12:5
Score: 35
Source: request.args.get (5:11)
Sink: os.system
[MEDIUM] rs.quality.unsafe_block src/lib.rs:44:5
Score: 30
6:5 ✖ [HIGH] py.cmdi.os_system (Score: 64, Confidence: High)
Os.system() — shell command execution
/tmp/demo/xss_document_write.js
5:5 ✖ [HIGH] taint-unsanitised-flow (source 3:18) (Score: 81, Confidence: High)
Unsanitised user input flows from req.query.content → document.write
Source: req.query.content (3:18)
Sink: document.write
5:5 ⚠ [MEDIUM] js.xss.document_write (Score: 34, Confidence: High)
Document.write() — XSS sink
warning 'demo' generated 10 issues.
Finished in 0.054s.
```
Each finding shows:
Each finding is one line of header plus evidence. Fields that matter:
| Field | Meaning |
|-------|---------|
| **Severity tag** | `[HIGH]`, `[MEDIUM]`, or `[LOW]` |
| **Rule ID** | Identifies the detector and specific rule |
| **Location** | `file:line:col` |
| **Evidence** | Source, Sink, and guard details (taint findings only) |
| **Score** | Attack-surface ranking score (higher = more exploitable) |
|---|---|
| `[HIGH]` / `[MEDIUM]` / `[LOW]` | Severity after the non-prod downgrade |
| Rule ID | Either a taint rule (`taint-unsanitised-flow`), a structural rule (`cfg-*`, `state-*`), or an AST pattern (`<lang>.<category>.<name>`) |
| Score | Attack-surface ranking (severity + analysis kind + source kind + evidence). Higher is more exploitable |
| Confidence | `High`, `Medium`, `Low`. Drops for AST-only matches, capped widened flows, and lowered-to-Low backwards-infeasible findings |
| Source / Sink | Where tainted data entered and where the dangerous call happened |
## Common workflows
Two rules firing on the same line (the taint finding plus the AST pattern) is normal. The pattern matches the structural presence of `document.write`; the taint rule adds the evidence that `req.query.content` actually reached it. Both carry distinct rule IDs so suppressions can target one without the other.
### CI gate — fail on high-severity findings
## Fail a CI job on High findings
```bash
nyx scan . --fail-on high --quiet
# Exit code 1 if any HIGH finding exists, 0 otherwise
nyx scan . --fail-on HIGH --quiet
```
### Export for tooling
Exit 1 if any HIGH finding remains. `--quiet` drops the "Using default configuration" banner so CI logs stay tidy.
## Emit SARIF for GitHub Code Scanning
```bash
# JSON for scripting
nyx scan . --format json > findings.json
# SARIF for GitHub Code Scanning
nyx scan . --format sarif > results.sarif
```
### Fast structural scan (no dataflow)
Full SARIF schema and GitHub Actions wiring: [cli.md](cli.md) and [output.md](output.md).
## Tighten the gate
```bash
# Only HIGH findings
nyx scan . --severity HIGH
# HIGH + MEDIUM
nyx scan . --severity ">=MEDIUM"
# Drop anything below Medium confidence (useful for CI)
nyx scan . --min-confidence medium
# Also drop findings the engine could not fully resolve (widened / bailed)
nyx scan . --require-converged
```
`--require-converged` keeps `under-report` findings (the emitted flow is still real) but drops over-reports and widenings. Intended for strict gates where a noisy finding is worse than nothing.
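
To make the `--severity` expressions above concrete, here is a minimal sketch of how `HIGH` and `">=MEDIUM"` could be turned into a predicate. `Severity`, `parse_level`, and `severity_filter` are hypothetical names, and case-insensitive parsing is an assumption; Nyx's actual parser may differ.

```rust
/// Severity levels, ordered so that Low < Medium < High.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Low,
    Medium,
    High,
}

fn parse_level(s: &str) -> Option<Severity> {
    match s.trim().to_ascii_uppercase().as_str() {
        "LOW" => Some(Severity::Low),
        "MEDIUM" => Some(Severity::Medium),
        "HIGH" => Some(Severity::High),
        _ => None,
    }
}

/// Builds a predicate from an expression such as `HIGH` (exact level)
/// or `>=MEDIUM` (that level and above). Returns None on unknown input.
pub fn severity_filter(expr: &str) -> Option<impl Fn(Severity) -> bool> {
    let expr = expr.trim();
    let (at_least, level) = match expr.strip_prefix(">=") {
        Some(rest) => (true, parse_level(rest)?),
        None => (false, parse_level(expr)?),
    };
    Some(move |s: Severity| if at_least { s >= level } else { s == level })
}
```

Deriving `Ord` on the enum keeps the `>=` comparison in one place, so adding a level would only require inserting it in the right position.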
## Skip dataflow for a fast first pass
```bash
nyx scan . --mode ast
```
AST-only mode runs tree-sitter pattern queries without building CFGs or running taint analysis. Much faster, but misses dataflow vulnerabilities.
AST-only mode runs tree-sitter patterns without building a CFG or running taint. It's fast and still catches banned-API uses, weak crypto, and obvious XSS sinks, but it can't tell `eval("1+1")` apart from `eval(userInput)`. Use it as a pre-commit filter, not as a CI gate replacement.
### Filter by severity
## Next
```bash
# Only high-severity
nyx scan . --severity HIGH
# High and medium
nyx scan . --severity ">=MEDIUM"
# Specific set
nyx scan . --severity "HIGH,MEDIUM"
```
### Skip the index
```bash
nyx scan . --index off
```
Useful for one-off scans or when you don't want to write to disk.
### Scan without non-production noise
By default, findings in test/vendor/build paths are downgraded one severity tier. To keep original severity:
```bash
nyx scan . --keep-nonprod-severity
```
## Next steps
- [CLI Reference](cli.md) — All flags and options
- [Configuration](configuration.md) — Customize rules, exclusions, and behavior
- [Detector Overview](detectors.md) — How the analysis engines work
- [Rule Reference](rules/index.md) — Browse all rules by language
- [CLI reference](cli.md) for every flag and subcommand.
- [Configuration](configuration.md) for the `nyx.conf` / `nyx.local` schema, profiles, and custom rules.
- [`nyx serve`](serve.md) for the browser UI, triage workflow, and scan history.
- [Language maturity](language-maturity.md) for per-language tier and known FP/FN patterns.

1
docs/roadmap.md Normal file

@ -0,0 +1 @@
{{#include ../ROADMAP.md}}

258
docs/rules.md Normal file

@ -0,0 +1,258 @@
# Rule reference
Every finding Nyx emits has a rule ID. This page enumerates the IDs that ship with scanner 0.5.0, grouped by family.
> This page is partly written by hand and can drift from the code. Authoritative sources: [`src/patterns/<lang>.rs`](https://github.com/elicpeter/nyx/tree/master/src/patterns) for AST patterns, [`src/labels/<lang>.rs`](https://github.com/elicpeter/nyx/tree/master/src/labels) for taint matchers, and [`src/auth_analysis/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/config.rs) for auth rules. If a rule fires that isn't listed here, the source file is right and this page is wrong.
If you'd rather browse rules interactively, [`nyx serve`](serve.md) ships a Rules page that lists every loaded matcher with its language, kind, and capability:
<p align="center"><img src="../assets/screenshots/docs/serve-rules.png" alt="Nyx Rules page: filterable list of 218 rules with language, kind (SOURCE/SANITIZER/SINK), capability, and finding count columns" width="900"/></p>
## ID format
| Prefix | Detector | Example |
|---|---|---|
| `taint-*` | Taint analysis | `taint-unsanitised-flow (source 5:11)` |
| `cfg-*` | CFG structural | `cfg-unguarded-sink`, `cfg-auth-gap` |
| `state-*` | State model | `state-use-after-close`, `state-resource-leak` |
| `<lang>.auth.*` | Auth analysis | `rs.auth.missing_ownership_check` |
| `<lang>.<category>.<name>` | AST patterns | `rs.memory.transmute`, `js.code_exec.eval` |
Language prefixes: `rs`, `c`, `cpp`, `go`, `java`, `js`, `ts`, `py`, `php`, `rb`.
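
The prefix table maps directly onto a small classifier, which can be handy when post-processing findings. A minimal sketch: `Detector` and `classify_rule_id` are hypothetical names, not part of Nyx's API, and only the ID shapes listed above are recognised.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Detector {
    Taint,
    CfgStructural,
    StateModel,
    Auth,
    AstPattern,
    Unknown,
}

/// Language prefixes as listed on this page.
const LANG_PREFIXES: [&str; 10] =
    ["rs", "c", "cpp", "go", "java", "js", "ts", "py", "php", "rb"];

/// Classifies a rule ID by the prefix conventions in the table above.
pub fn classify_rule_id(id: &str) -> Detector {
    if id.starts_with("taint-") {
        return Detector::Taint;
    }
    if id.starts_with("cfg-") {
        return Detector::CfgStructural;
    }
    if id.starts_with("state-") {
        return Detector::StateModel;
    }
    // Remaining IDs are expected to look like <lang>.<category>.<name>[...].
    let mut parts = id.split('.');
    match (parts.next(), parts.next(), parts.next()) {
        (Some(lang), Some(category), Some(_name)) if LANG_PREFIXES.contains(&lang) => {
            if category == "auth" {
                Detector::Auth
            } else {
                Detector::AstPattern
            }
        }
        _ => Detector::Unknown,
    }
}
```

The `auth` category is checked before the generic AST case because both share the `<lang>.` prefix.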
## Cross-language rules
### Taint
One rule covers every source-to-sink flow. The parenthetical identifies the source location.
| Rule ID | Severity |
|---|---|
| `taint-unsanitised-flow (source L:C)` | Varies by source kind and sink capability |
The matcher sets (sources, sanitizers, sinks, gated sinks) live per-language in `src/labels/<lang>.rs`. [Language maturity](language-maturity.md) gives per-language counts and what's covered.
### CFG structural
| Rule ID | Severity |
|---|---|
| `cfg-unguarded-sink` | High/Medium |
| `cfg-auth-gap` | High |
| `cfg-unreachable-sink` | Medium |
| `cfg-unreachable-sanitizer` | Low |
| `cfg-unreachable-source` | Low |
| `cfg-error-fallthrough` | High/Medium |
| `cfg-resource-leak` | Medium |
| `cfg-lock-not-released` | Medium |
### State model
| Rule ID | Severity |
|---|---|
| `state-use-after-close` | High |
| `state-double-close` | Medium |
| `state-resource-leak` | Medium |
| `state-resource-leak-possible` | Low |
| `state-unauthed-access` | High |
### Auth analysis (Rust only, today)
| Rule ID | Severity |
|---|---|
| `rs.auth.missing_ownership_check` | High |
| `rs.auth.missing_ownership_check.taint` | High (gated by `scanner.enable_auth_as_taint`) |
See [auth.md](auth.md) for scope, the five sink-classes, and tuning.
## AST patterns by language
Each language ships a tree-sitter pattern registry. Every pattern is a structural match with no dataflow. Some patterns also have a Tier B heuristic guard (e.g. SQL execute must receive a concatenation, not a literal) noted in the registry.
The tables below are generated from `src/patterns/<lang>.rs` by [`tools/docgen`](https://github.com/elicpeter/nyx/tree/master/tools/docgen). Run `cargo run --features docgen --bin nyx-docgen` after changing the registry to refresh them.
<!-- BEGIN AUTOGEN rules-by-language -->
### C: 8 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `c.cmdi.system` | High | A | High |
| `c.memory.gets` | High | A | High |
| `c.memory.printf_no_fmt` | High | B | Medium |
| `c.memory.scanf_percent_s` | High | A | High |
| `c.memory.sprintf` | High | A | High |
| `c.memory.strcat` | High | A | High |
| `c.memory.strcpy` | High | A | High |
| `c.cmdi.popen` | Medium | A | High |
### C++: 9 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `cpp.cmdi.popen` | High | A | High |
| `cpp.cmdi.system` | High | A | High |
| `cpp.memory.gets` | High | A | High |
| `cpp.memory.printf_no_fmt` | High | B | Medium |
| `cpp.memory.sprintf` | High | A | High |
| `cpp.memory.strcat` | High | A | High |
| `cpp.memory.strcpy` | High | A | High |
| `cpp.memory.const_cast` | Medium | A | High |
| `cpp.memory.reinterpret_cast` | Medium | A | High |
### Go: 8 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `go.cmdi.exec_command` | High | A | High |
| `go.transport.insecure_skip_verify` | High | A | High |
| `go.deser.gob_decode` | Medium | A | High |
| `go.memory.unsafe_pointer` | Medium | A | High |
| `go.secrets.hardcoded_key` | Medium | A | High |
| `go.sqli.query_concat` | Medium | B | Medium |
| `go.crypto.md5` | Low | A | Medium |
| `go.crypto.sha1` | Low | A | Medium |
### Java: 8 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `java.cmdi.runtime_exec` | High | A | High |
| `java.deser.readobject` | High | A | High |
| `java.reflection.class_forname` | Medium | A | High |
| `java.reflection.method_invoke` | Medium | A | High |
| `java.sqli.execute_concat` | Medium | B | Medium |
| `java.xss.getwriter_print` | Medium | A | High |
| `java.crypto.insecure_random` | Low | A | Medium |
| `java.crypto.weak_digest` | Low | A | Medium |
### JavaScript: 22 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `js.code_exec.eval` | High | A | High |
| `js.code_exec.new_function` | High | A | High |
| `js.config.cors_dynamic_origin` | High | A | Medium |
| `js.code_exec.settimeout_string` | Medium | A | High |
| `js.config.insecure_session_httponly` | Medium | A | High |
| `js.config.reject_unauthorized` | Medium | A | High |
| `js.config.verbose_error_response` | Medium | A | Medium |
| `js.crypto.weak_hash_import` | Medium | A | Medium |
| `js.prototype.extend_object` | Medium | A | High |
| `js.prototype.proto_assignment` | Medium | A | High |
| `js.secrets.fallback_secret` | Medium | A | Medium |
| `js.xss.cookie_write` | Medium | A | High |
| `js.xss.document_write` | Medium | A | High |
| `js.xss.insert_adjacent_html` | Medium | A | High |
| `js.xss.location_assign` | Medium | A | High |
| `js.xss.outer_html` | Medium | A | High |
| `js.config.insecure_session_samesite` | Low | A | High |
| `js.config.insecure_session_secure` | Low | A | Medium |
| `js.crypto.math_random` | Low | A | Medium |
| `js.crypto.weak_hash` | Low | A | Medium |
| `js.secrets.hardcoded_secret` | Low | A | Medium |
| `js.transport.fetch_http` | Low | A | Medium |
### PHP: 11 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `php.cmdi.system` | High | A | High |
| `php.code_exec.assert_string` | High | A | High |
| `php.code_exec.create_function` | High | A | High |
| `php.code_exec.eval` | High | A | High |
| `php.code_exec.preg_replace_e` | High | A | High |
| `php.deser.unserialize` | High | A | High |
| `php.path.include_variable` | High | B | Medium |
| `php.sqli.query_concat` | Medium | B | Medium |
| `php.crypto.md5` | Low | A | Medium |
| `php.crypto.rand` | Low | A | Medium |
| `php.crypto.sha1` | Low | A | Medium |
### Python: 13 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `py.cmdi.os_popen` | High | A | High |
| `py.cmdi.os_system` | High | A | High |
| `py.cmdi.subprocess_shell` | High | B | Medium |
| `py.code_exec.eval` | High | A | High |
| `py.code_exec.exec` | High | A | High |
| `py.deser.pickle_loads` | High | A | High |
| `py.deser.yaml_load` | High | A | High |
| `py.code_exec.compile` | Medium | A | High |
| `py.deser.shelve_open` | Medium | A | High |
| `py.sqli.execute_format` | Medium | B | Medium |
| `py.xss.jinja_from_string` | Medium | A | High |
| `py.crypto.md5` | Low | A | Medium |
| `py.crypto.sha1` | Low | A | Medium |
### Ruby: 11 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `rb.cmdi.backtick` | High | A | High |
| `rb.cmdi.system_interp` | High | A | High |
| `rb.code_exec.class_eval` | High | A | High |
| `rb.code_exec.eval` | High | A | High |
| `rb.code_exec.instance_eval` | High | A | High |
| `rb.deser.marshal_load` | High | A | High |
| `rb.deser.yaml_load` | High | A | High |
| `rb.reflection.constantize` | Medium | A | High |
| `rb.reflection.send_dynamic` | Medium | B | Medium |
| `rb.ssrf.open_uri` | Medium | A | High |
| `rb.crypto.md5` | Low | A | Medium |
### Rust: 13 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `rs.memory.copy_nonoverlapping` | High | A | High |
| `rs.memory.get_unchecked` | High | A | High |
| `rs.memory.mem_zeroed` | High | A | High |
| `rs.memory.ptr_read` | High | A | High |
| `rs.memory.transmute` | High | A | High |
| `rs.quality.unsafe_block` | Medium | A | High |
| `rs.quality.unsafe_fn` | Medium | A | High |
| `rs.memory.mem_forget` | Low | A | High |
| `rs.memory.narrow_cast` | Low | A | Medium |
| `rs.quality.expect` | Low | A | High |
| `rs.quality.panic_macro` | Low | A | High |
| `rs.quality.todo` | Low | A | High |
| `rs.quality.unwrap` | Low | A | High |
### TypeScript: 22 patterns
| Rule ID | Severity | Tier | Confidence |
|---|---|---|---|
| `ts.code_exec.eval` | High | A | High |
| `ts.code_exec.new_function` | High | A | High |
| `ts.config.cors_dynamic_origin` | High | A | Medium |
| `ts.code_exec.settimeout_string` | Medium | A | High |
| `ts.config.insecure_session_httponly` | Medium | A | High |
| `ts.config.reject_unauthorized` | Medium | A | High |
| `ts.config.verbose_error_response` | Medium | A | Medium |
| `ts.crypto.weak_hash_import` | Medium | A | Medium |
| `ts.prototype.proto_assignment` | Medium | A | High |
| `ts.secrets.fallback_secret` | Medium | A | Medium |
| `ts.xss.document_write` | Medium | A | High |
| `ts.xss.insert_adjacent_html` | Medium | A | High |
| `ts.xss.location_assign` | Medium | A | High |
| `ts.xss.outer_html` | Medium | A | High |
| `ts.config.insecure_session_samesite` | Low | A | High |
| `ts.config.insecure_session_secure` | Low | A | Medium |
| `ts.crypto.math_random` | Low | A | Medium |
| `ts.crypto.weak_hash` | Low | A | Medium |
| `ts.quality.any_annotation` | Low | A | Medium |
| `ts.quality.as_any` | Low | A | Medium |
| `ts.secrets.hardcoded_secret` | Low | A | Medium |
| `ts.xss.cookie_write` | Low | A | Medium |
<!-- END AUTOGEN rules-by-language -->
## Capability list for custom rules
`nyx config add-rule --cap <name>` and `[analysis.languages.*.rules]` in config accept:
`env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `all`
Source for both the enum and the `to_cap` mapping: [`src/labels/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/labels/mod.rs) (`Cap`) and [`src/utils/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/utils/config.rs) (`CapName`).


@ -1,89 +0,0 @@
# C Rules
Nyx detects C vulnerabilities through AST patterns (banned functions, format strings) and taint analysis (user input → shell execution, buffer overflow sinks).
## Taint Sources
| Function | Capability | Source Kind |
|----------|-----------|-------------|
| `getenv` | `all` | EnvironmentConfig |
| `fgets`, `scanf`, `fscanf`, `gets`, `read` | `all` | UserInput |
## Taint Sinks
| Function | Required Capability |
|----------|-------------------|
| `system`, `popen`, `exec*` family | `SHELL_ESCAPE` |
| `sprintf`, `strcpy`, `strcat` | `HTML_ESCAPE` |
| `printf`, `fprintf` | `FMT_STRING` |
| `fopen`, `open` | `FILE_IO` |
---
## AST Pattern Rules
### Memory Safety (Banned Functions)
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `c.memory.gets` | High | A | `gets()` — no bounds checking, always exploitable |
| `c.memory.strcpy` | High | A | `strcpy()` — no bounds checking on destination buffer |
| `c.memory.strcat` | High | A | `strcat()` — no bounds checking on destination buffer |
| `c.memory.sprintf` | High | A | `sprintf()` — no length limit on output buffer |
| `c.memory.scanf_percent_s` | High | A | `scanf("%s")` — unbounded string read |
| `c.memory.printf_no_fmt` | High | B | `printf(var)` — format-string vulnerability (non-literal first arg) |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `c.cmdi.system` | High | A | `system()` — shell command execution |
| `c.cmdi.popen` | Medium | A | `popen()` — shell command execution with pipe |
---
## Examples
### `c.memory.gets` — Banned function
**Vulnerable:**
```c
char buf[64];
gets(buf); // No bounds checking — buffer overflow
```
**Safe alternative:**
```c
char buf[64];
fgets(buf, sizeof(buf), stdin);
```
### `c.memory.printf_no_fmt` — Format string
**Vulnerable:**
```c
char *user_input = get_input();
printf(user_input); // Format string vulnerability
```
**Safe alternative:**
```c
char *user_input = get_input();
printf("%s", user_input);
```
### `c.cmdi.system` — Shell execution
**Vulnerable:**
```c
char cmd[256];
snprintf(cmd, sizeof(cmd), "ls %s", user_dir);
system(cmd); // Command injection if user_dir contains shell metacharacters
```
**Safe alternative:**
```c
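// Use execvp with an explicit argument array; "--" ends option parsing,
// so a user_dir that starts with '-' is not interpreted as a flag
char *args[] = {"ls", "--", user_dir, NULL};
execvp("ls", args);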
```


@ -1,66 +0,0 @@
# C++ Rules
C++ rules inherit C banned-function concerns and add C++-specific patterns like dangerous casts.
## Taint Labels
C++ shares taint labels with C. See [C Rules](c.md) for the full source/sink/sanitizer listing.
---
## AST Pattern Rules
### Memory Safety
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `cpp.memory.gets` | High | A | `gets()` — no bounds checking, always exploitable |
| `cpp.memory.strcpy` | High | A | `strcpy()` — no bounds checking on destination |
| `cpp.memory.strcat` | High | A | `strcat()` — no bounds checking on destination |
| `cpp.memory.sprintf` | High | A | `sprintf()` — no length limit on output |
| `cpp.memory.reinterpret_cast` | Medium | A | `reinterpret_cast` — type-punning cast |
| `cpp.memory.const_cast` | Medium | A | `const_cast` — removes const/volatile qualifier |
| `cpp.memory.printf_no_fmt` | High | B | `printf(var)` — format-string vulnerability |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `cpp.cmdi.system` | High | A | `system()` — shell command execution |
| `cpp.cmdi.popen` | High | A | `popen()` — shell command execution |
---
## Examples
### `cpp.memory.reinterpret_cast` — Type-punning cast
**Flagged:**
```cpp
int x = 42;
float* fp = reinterpret_cast<float*>(&x); // Type-punning, may violate strict aliasing
```
**Safe alternative:**
```cpp
#include <cstring>  // std::memcpy

int x = 42;
float f;
std::memcpy(&f, &x, sizeof(f)); // Well-defined type punning
```
### `cpp.memory.const_cast` — Removing const
**Flagged:**
```cpp
void process(const std::string& s) {
char* p = const_cast<char*>(s.c_str()); // Removes const
p[0] = 'X'; // Undefined behavior
}
```
**Safe alternative:**
```cpp
void process(std::string s) { // Take by value
s[0] = 'X';
}
```


@ -1,148 +0,0 @@
# Go Rules
Nyx detects Go vulnerabilities through AST patterns and taint analysis, covering command execution, unsafe pointer usage, TLS misconfiguration, weak crypto, SQL injection, hardcoded secrets, and deserialization.
## Taint Labels
Go has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/go.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `os.Getenv` | all |
| `http.Request`, `r.FormValue`, `r.URL`, `r.Body`, `r.Header` | all |
| `r.URL.Query`, `r.URL.Query.Get`, `Request.FormValue`, `Request.URL` | all |
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `html.EscapeString`, `template.HTMLEscapeString` | HTML_ESCAPE |
| `url.QueryEscape`, `url.PathEscape` | URL_ENCODE |
| `filepath.Clean`, `filepath.Base` | FILE_IO |
### Sinks
| Matcher | Cap |
|---------|-----|
| `exec.Command` | SHELL_ESCAPE |
| `db.Query`, `db.Exec`, `db.QueryRow`, `db.Prepare` | SHELL_ESCAPE |
| `fmt.Fprintf`, `fmt.Sprintf`, `fmt.Printf` | FMT_STRING |
| `os.Open`, `os.OpenFile`, `os.Create`, `ioutil.ReadFile`, `os.ReadFile` | FILE_IO |
| `template.HTML` | HTML_ESCAPE |
> **Note:** Chained calls like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments before matching, so `r.URL.Query.Get` matches the source rule.
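
The normalization described in the note can be sketched as follows. This is a minimal illustration that assumes normalization amounts to dropping every parenthesized argument list from the call chain. The function name `normalize_call_chain` is hypothetical and is not taken from the Nyx source, and the sketch does not handle parentheses that appear inside string literals.

```rust
/// Hypothetical helper (not from the Nyx source): drops every parenthesized
/// argument list so that `r.URL.Query().Get("host")` becomes `r.URL.Query.Get`.
fn normalize_call_chain(expr: &str) -> String {
    let mut out = String::with_capacity(expr.len());
    let mut depth = 0usize; // current nesting level of parentheses
    for ch in expr.chars() {
        match ch {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            // Keep characters that sit outside every argument list.
            _ if depth == 0 => out.push(ch),
            // Inside an argument list: skip.
            _ => {}
        }
    }
    out
}
```

Calling `normalize_call_chain` on the chained expression from the note yields a string that can be compared directly against the `r.URL.Query.Get` matcher in the sources table.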
---
## AST Pattern Rules
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.cmdi.exec_command` | High | A | `exec.Command()` — arbitrary process execution |
### Memory Safety
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.memory.unsafe_pointer` | Medium | A | `unsafe.Pointer` — bypasses Go type system |
### Insecure Transport
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.transport.insecure_skip_verify` | High | A | `InsecureSkipVerify: true` — disables TLS certificate validation |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.crypto.md5` | Low | A | `md5.New()` / `md5.Sum()` — weak hash algorithm |
| `go.crypto.sha1` | Low | A | `sha1.New()` / `sha1.Sum()` — weak hash algorithm |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.sqli.query_concat` | Medium | B | `db.Query`/`Exec`/`QueryRow` with concatenated string |
### Secrets
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.secrets.hardcoded_key` | Medium | A | Variable with secret-like name assigned a string literal |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `go.deser.gob_decode` | Medium | A | `gob.NewDecoder` — Go binary deserialization |
---
## Examples
### `go.transport.insecure_skip_verify` — TLS misconfiguration
**Vulnerable:**
```go
tr := &http.Transport{
    TLSClientConfig: &tls.Config{
        InsecureSkipVerify: true, // Disables certificate verification
    },
}
```
**Safe alternative:**
```go
tr := &http.Transport{
    TLSClientConfig: &tls.Config{
        // Use proper CA certificates
        RootCAs: certPool,
    },
}
```
### `go.sqli.query_concat` — SQL concatenation
**Vulnerable:**
```go
rows, err := db.Query("SELECT * FROM users WHERE id=" + userID)
```
**Safe alternative:**
```go
rows, err := db.Query("SELECT * FROM users WHERE id=$1", userID)
```
### `go.secrets.hardcoded_key` — Hardcoded secret
**Flagged:**
```go
apiKey := "sk-1234567890abcdef"
password := "hunter2"
```
**Safe alternative:**
```go
apiKey := os.Getenv("API_KEY")
password := os.Getenv("DB_PASSWORD")
```
### `go.cmdi.exec_command` — Command execution
**Vulnerable:**
```go
cmd := exec.Command("sh", "-c", userInput)
cmd.Run()
```
**Safe alternative:**
```go
// Use explicit command and arguments, not shell
cmd := exec.Command("ls", "-la", safeDir)
cmd.Run()
```


@ -1,79 +0,0 @@
# Rule Reference
This section lists every detection rule in Nyx, organized by language.
## Rule ID Format
| Prefix | Detector Family | Example |
|--------|----------------|---------|
| `taint-*` | [Taint analysis](../detectors/taint.md) | `taint-unsanitised-flow (source 5:11)` |
| `cfg-*` | [CFG structural](../detectors/cfg.md) | `cfg-unguarded-sink`, `cfg-auth-gap` |
| `state-*` | [State model](../detectors/state.md) | `state-use-after-close`, `state-resource-leak` |
| `<lang>.*.*` | [AST patterns](../detectors/patterns.md) | `rs.memory.transmute`, `js.code_exec.eval` |
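
The prefix convention in the table can be sketched as a small classifier. This is an illustration only: the `DetectorFamily` enum and the `classify_rule_id` function are hypothetical names, and the sketch assumes that AST pattern IDs always have exactly three non-empty dot-separated parts.

```rust
/// Detector families from the "Rule ID Format" table.
#[derive(Debug, PartialEq, Eq)]
enum DetectorFamily {
    Taint,
    Cfg,
    State,
    AstPattern,
}

/// Hypothetical classifier: maps a rule ID to its detector family by prefix.
fn classify_rule_id(rule_id: &str) -> Option<DetectorFamily> {
    if rule_id.starts_with("taint-") {
        Some(DetectorFamily::Taint)
    } else if rule_id.starts_with("cfg-") {
        Some(DetectorFamily::Cfg)
    } else if rule_id.starts_with("state-") {
        Some(DetectorFamily::State)
    } else if rule_id.split('.').count() == 3
        && rule_id.split('.').all(|part| !part.is_empty())
    {
        // Three dot-separated parts, e.g. `rs.memory.transmute`.
        Some(DetectorFamily::AstPattern)
    } else {
        None
    }
}
```

A taint ID keeps its `(source L:C)` suffix in this sketch, since only the prefix decides the family.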
## Cross-Language Rules
These rules apply to all supported languages:
### Taint Rules
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `taint-unsanitised-flow (source L:C)` | Varies by source kind | Unsanitized data flows from source to sink |
### CFG Structural Rules
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `cfg-unguarded-sink` | High/Medium | Sink without dominating guard |
| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth |
| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
| `cfg-unreachable-source` | Low | Source in unreachable code |
| `cfg-error-fallthrough` | High/Medium | Error path doesn't terminate before dangerous code |
| `cfg-resource-leak` | Medium | Resource not released on all exit paths |
| `cfg-lock-not-released` | Medium | Lock not released on all exit paths |
### State Model Rules
| Rule ID | Severity | Description |
|---------|----------|-------------|
| `state-use-after-close` | High | Variable used after being closed |
| `state-double-close` | Medium | Resource closed twice |
| `state-resource-leak` | Medium | Resource never closed (definite) |
| `state-resource-leak-possible` | Low | Resource may not close on all paths |
| `state-unauthed-access` | High | Privileged operation without authentication |
## Per-Language AST Pattern Rules
Each language page lists all AST pattern rules with examples:
- [Rust](rust.md): 13 rules (memory safety, code quality)
- [C](c.md): 8 rules (banned functions, command execution, format strings)
- [C++](cpp.md): 9 rules (banned functions, dangerous casts, command execution)
- [Java](java.md): 8 rules (deserialization, command execution, reflection, SQL, crypto, XSS)
- [Go](go.md): 8 rules (command execution, unsafe pointer, TLS, crypto, SQL, secrets, deserialization)
- [JavaScript](javascript.md): 13 rules (code execution, XSS, prototype pollution, crypto, transport)
- [TypeScript](typescript.md): 12 rules (mirrors JS + type-safety escapes)
- [Python](python.md): 13 rules (code execution, command execution, deserialization, SQL, crypto, template injection)
- [PHP](php.md): 11 rules (code execution, command execution, deserialization, SQL, path traversal, crypto)
- [Ruby](ruby.md): 11 rules (code execution, command execution, deserialization, reflection, SSRF, crypto)
## Taint Label Coverage
Taint analysis uses language-specific source/sink/sanitizer labels. Coverage varies by language:
| Language | Sources | Sinks | Sanitizers | Coverage |
|----------|---------|-------|------------|----------|
| Rust | Complete | Complete | Complete | Full |
| JavaScript | Complete | Complete | Partial | Full |
| TypeScript | Partial | Partial | Partial | Moderate |
| Python | Partial | Complete | Partial | Moderate |
| C | Partial | Complete | Minimal | Moderate |
| C++ | Partial | Complete | Minimal | Moderate |
| Java | Partial | Partial | Partial | Moderate |
| Go | Complete | Complete | Partial | Full |
| PHP | Complete | Complete | Partial | Full |
| Ruby | Partial | Partial | Partial | Moderate |
"Moderate" coverage means basic rules exist but many common library functions are not yet labeled. Contributions are welcome.


@ -1,135 +0,0 @@
# Java Rules
Nyx detects Java vulnerabilities through AST patterns and taint analysis, covering deserialization, command execution, reflection, SQL injection, weak crypto, and XSS.
## Taint Labels
Java has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/java.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `System.getenv` | all |
| `getParameter`, `getInputStream`, `getHeader`, `getCookies`, `getReader`, `getQueryString`, `getPathInfo` | all |
| `readObject`, `readLine` | all |
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `HtmlUtils.htmlEscape`, `StringEscapeUtils.escapeHtml4` | HTML_ESCAPE |
### Sinks
| Matcher | Cap |
|---------|-----|
| `Runtime.exec`, `ProcessBuilder` | SHELL_ESCAPE |
| `executeQuery`, `executeUpdate`, `prepareStatement` | SHELL_ESCAPE |
| `Class.forName` | SHELL_ESCAPE |
| `println`, `print`, `write` | HTML_ESCAPE |
---
## AST Pattern Rules
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.deser.readobject` | High | A | `ObjectInputStream.readObject()` — unsafe deserialization |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.cmdi.runtime_exec` | High | A | `Runtime.getRuntime().exec()` — shell command execution |
### Reflection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.reflection.class_forname` | Medium | A | `Class.forName()` — dynamic class loading |
| `java.reflection.method_invoke` | Medium | A | `Method.invoke()` — reflective method invocation |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.sqli.execute_concat` | Medium | B | SQL `execute*()` with concatenated string argument |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.crypto.insecure_random` | Low | A | `new Random()`: `java.util.Random` is not cryptographically secure |
| `java.crypto.weak_digest` | Low | A | `MessageDigest.getInstance("MD5"/"SHA1")` |
### XSS
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `java.xss.getwriter_print` | Medium | A | `response.getWriter().print/println/write` — direct output |
---
## Examples
### `java.deser.readobject` — Unsafe deserialization
**Vulnerable:**
```java
ObjectInputStream ois = new ObjectInputStream(request.getInputStream());
Object obj = ois.readObject(); // Arbitrary object instantiation
```
**Safe alternative:**
```java
// Use a safe format like JSON
ObjectMapper mapper = new ObjectMapper();
MyType obj = mapper.readValue(request.getInputStream(), MyType.class);
```
### `java.sqli.execute_concat` — SQL concatenation
**Vulnerable:**
```java
String query = "SELECT * FROM users WHERE id=" + userId;
stmt.executeQuery(query); // SQL injection
```
**Safe alternative:**
```java
PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE id=?");
ps.setString(1, userId);
ResultSet rs = ps.executeQuery();
```
### `java.cmdi.runtime_exec` — Command execution
**Vulnerable:**
```java
Runtime.getRuntime().exec("cmd /c " + userCommand);
```
**Safe alternative:**
```java
ProcessBuilder pb = new ProcessBuilder("cmd", "/c", "dir");
// Use explicit argument list, never concatenate user input
```
### `java.reflection.class_forname` — Dynamic class loading
**Flagged:**
```java
Class<?> cls = Class.forName(className);
Object obj = cls.getDeclaredConstructor().newInstance();
```
**Safe alternative:**
```java
// Use an allowlist of permitted class names
Map<String, Class<?>> allowed = Map.of("User", User.class, "Order", Order.class);
Class<?> cls = allowed.get(className);
if (cls != null) { /* ... */ }
```


@ -1,138 +0,0 @@
# JavaScript Rules
JavaScript has the most complete taint label coverage alongside Rust. Nyx detects code execution, XSS, prototype pollution, command injection, and weak crypto.
## Taint Sources
| Function | Capability | Source Kind |
|----------|-----------|-------------|
| `document.location`, `window.location` | `all` | UserInput |
| `req.body`, `req.query`, `req.params` | `all` | UserInput |
| `req.headers`, `req.cookies` | `all` | UserInput |
| `process.env` | `all` | EnvironmentConfig |
## Taint Sinks
| Function | Required Capability |
|----------|-------------------|
| `eval` | `SHELL_ESCAPE` |
| `innerHTML` | `HTML_ESCAPE` |
| `location.href`, `window.location.href` | `URL_ENCODE` |
| `child_process.exec`, `child_process.execSync` | `SHELL_ESCAPE` |
| `child_process.spawn` | `SHELL_ESCAPE` |
## Taint Sanitizers
| Function | Strips Capability |
|----------|------------------|
| `JSON.parse` | `JSON_PARSE` |
| `encodeURIComponent`, `encodeURI` | `URL_ENCODE` |
| `DOMPurify.sanitize` | `HTML_ESCAPE` |
> **Note:** Anonymous function expressions and arrow functions passed as callback arguments (e.g., Express `app.get('/path', function(req, res) { ... })`) are automatically walked as separate function scopes for taint analysis. Each anonymous function gets a unique scope identifier to prevent cross-function taint leakage.
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `js.code_exec.new_function` | High | A | `new Function()` — eval equivalent |
| `js.code_exec.settimeout_string` | Medium | A | `setTimeout`/`setInterval` with string argument |
### XSS Sinks
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.xss.document_write` | Medium | A | `document.write()` / `document.writeln()` |
| `js.xss.outer_html` | Medium | A | Assignment to `.outerHTML` |
| `js.xss.insert_adjacent_html` | Medium | A | `insertAdjacentHTML()` |
| `js.xss.location_assign` | Medium | A | Assignment to `location`/`location.href` — open redirect |
| `js.xss.cookie_write` | Medium | A | Write to `document.cookie` |
### Prototype Pollution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.prototype.proto_assignment` | Medium | A | Assignment to `__proto__` |
| `js.prototype.extend_object` | Medium | A | Assignment to `Object.prototype.*` |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.crypto.weak_hash` | Low | A | `crypto.createHash("md5"/"sha1")` |
| `js.crypto.math_random` | Low | A | `Math.random()` — not cryptographically secure |
### Insecure Transport
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `js.transport.fetch_http` | Low | A | `fetch("http://...")` — plaintext HTTP |
---
## Examples
### `js.code_exec.eval` — Dynamic code execution
**Vulnerable:**
```javascript
const code = req.query.code;
eval(code); // Remote code execution
```
**Safe alternative:**
```javascript
// Use a sandboxed interpreter or avoid eval entirely
const allowed = { add: (a, b) => a + b };
const result = allowed[req.query.operation]?.(req.query.a, req.query.b);
```
### `js.xss.document_write` — XSS sink
**Vulnerable:**
```javascript
document.write("<h1>" + userName + "</h1>");
```
**Safe alternative:**
```javascript
const el = document.createElement("h1");
el.textContent = userName;
document.body.appendChild(el);
```
### `js.prototype.proto_assignment` — Prototype pollution
**Vulnerable:**
```javascript
function merge(target, source) {
  for (let key in source) {
    target[key] = source[key]; // If key is "__proto__", pollutes prototype
  }
}
```
**Safe alternative:**
```javascript
function merge(target, source) {
  for (let key in source) {
    if (key === "__proto__" || key === "constructor") continue;
    target[key] = source[key];
  }
}
```
### Taint: `req.body` → `eval()`
**Finding:**
```
[HIGH] taint-unsanitised-flow (source 2:18) src/handler.js:3:5
Source: req.body at 2:18
Sink: eval()
Score: 78
```


@ -1,138 +0,0 @@
# PHP Rules
Nyx detects PHP vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, SQL injection, path traversal, and weak crypto.
## Taint Labels
PHP has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/php.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `$_GET` / `_GET`, `$_POST` / `_POST`, `$_REQUEST` / `_REQUEST`, `$_COOKIE` / `_COOKIE`, `$_FILES` / `_FILES`, `$_SERVER` / `_SERVER`, `$_ENV` / `_ENV` | all |
| `file_get_contents`, `fread` | all |
> **Note:** PHP superglobal names are matched both with and without the `$` prefix because the CFG's `collect_idents` strips the leading `$` from variable names. Subscript access like `$_GET['cmd']` is handled via `element_reference` / `subscript_expression` node detection.
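
The dual-form matching in the note can be sketched as below. This is an illustration under stated assumptions: the `PHP_SUPERGLOBALS` constant and `is_superglobal_source` function are hypothetical names, not the actual contents of `src/labels/php.rs`. The sketch stores each name once without the `$` and strips the prefix from the candidate identifier before comparing.

```rust
/// Superglobal source names from the table above, stored without the `$` prefix.
/// Hypothetical constant for this sketch.
const PHP_SUPERGLOBALS: [&str; 7] = [
    "_GET", "_POST", "_REQUEST", "_COOKIE", "_FILES", "_SERVER", "_ENV",
];

/// Hypothetical matcher: accepts an identifier with or without a leading `$`.
fn is_superglobal_source(ident: &str) -> bool {
    // Drop a single leading `$` if present, otherwise use the name as-is.
    let bare = ident.strip_prefix('$').unwrap_or(ident);
    PHP_SUPERGLOBALS.contains(&bare)
}
```

With this shape, `$_GET` and `_GET` both match, while an ordinary variable such as `$name` does not.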
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `htmlspecialchars`, `htmlentities` | HTML_ESCAPE |
| `escapeshellarg`, `escapeshellcmd` | SHELL_ESCAPE |
| `basename` | FILE_IO |
### Sinks
| Matcher | Cap |
|---------|-----|
| `system`, `exec`, `passthru`, `shell_exec`, `proc_open`, `popen` | SHELL_ESCAPE |
| `eval`, `assert` | SHELL_ESCAPE |
| `include`, `include_once`, `require`, `require_once` | FILE_IO |
| `unserialize` | SHELL_ESCAPE |
| `move_uploaded_file`, `copy`, `file_put_contents`, `fwrite` | FILE_IO |
| `echo`, `print` | HTML_ESCAPE |
| `mysqli_query`, `pg_query`, `query` | SHELL_ESCAPE |
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `php.code_exec.create_function` | High | A | `create_function()` — deprecated eval-like constructor |
| `php.code_exec.preg_replace_e` | High | A | `preg_replace` with `/e` modifier — code execution via regex |
| `php.code_exec.assert_string` | High | A | `assert()` with string argument — evaluates PHP code |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.cmdi.system` | High | A | `system`/`shell_exec`/`exec`/`passthru`/`proc_open`/`popen` |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.deser.unserialize` | High | A | `unserialize()` — PHP object injection |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.sqli.query_concat` | Medium | B | `mysql_query`/`mysqli_query` with concatenated SQL |
### Path Traversal
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.path.include_variable` | High | B | `include`/`require` with variable path — file inclusion |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `php.crypto.md5` | Low | A | `md5()` — weak hash function |
| `php.crypto.sha1` | Low | A | `sha1()` — weak hash function |
| `php.crypto.rand` | Low | A | `rand()`/`mt_rand()` — not cryptographically secure |
---
## Examples
### `php.code_exec.eval` — Dynamic code execution
**Vulnerable:**
```php
eval($_GET['code']);
```
**Safe alternative:**
```php
// Never use eval with user input
// Use a template engine or allowlisted operations
```
### `php.deser.unserialize` — Object injection
**Vulnerable:**
```php
$obj = unserialize($_COOKIE['data']);
```
**Safe alternative:**
```php
$data = json_decode($_COOKIE['data'], true);
```
### `php.path.include_variable` — File inclusion
**Vulnerable:**
```php
include($_GET['page']); // Local/remote file inclusion
```
**Safe alternative:**
```php
$allowed = ['home', 'about', 'contact'];
$page = in_array($_GET['page'], $allowed) ? $_GET['page'] : 'home';
include("pages/{$page}.php");
```
### `php.sqli.query_concat` — SQL concatenation
**Vulnerable:**
```php
mysqli_query($conn, "SELECT * FROM users WHERE id=" . $_GET['id']);
```
**Safe alternative:**
```php
$stmt = $conn->prepare("SELECT * FROM users WHERE id=?");
$stmt->bind_param("i", $_GET['id']);
$stmt->execute();
```


@ -1,142 +0,0 @@
# Python Rules
Nyx detects Python vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, SQL injection, and weak crypto.
## Taint Labels
Python has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/python.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `os.getenv`, `os.environ` | all |
| `request.args`, `request.form`, `request.json`, `request.headers`, `request.cookies`, `input` | all |
| `sys.argv` | all |
| `argparse.parse_args`, `urllib.request.urlopen`, `requests.get`, `requests.post` | all |
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `html.escape` | HTML_ESCAPE |
| `shlex.quote` | SHELL_ESCAPE |
| `os.path.realpath` | FILE_IO |
### Sinks
| Matcher | Cap |
|---------|-----|
| `eval`, `exec` | SHELL_ESCAPE |
| `os.system`, `os.popen`, `subprocess.call`, `subprocess.run`, `subprocess.Popen`, `subprocess.check_output`, `subprocess.check_call` | SHELL_ESCAPE |
| `cursor.execute`, `cursor.executemany` | SHELL_ESCAPE |
| `send_file`, `send_from_directory` | FILE_IO |
| `open` | FILE_IO |
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `py.code_exec.exec` | High | A | `exec()` — dynamic code execution |
| `py.code_exec.compile` | Medium | A | `compile()` with exec/eval mode |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.cmdi.os_system` | High | A | `os.system()` — shell command execution |
| `py.cmdi.os_popen` | High | A | `os.popen()` — shell command execution |
| `py.cmdi.subprocess_shell` | High | B | `subprocess.*` with `shell=True` |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.deser.pickle_loads` | High | A | `pickle.loads()` / `pickle.load()` — arbitrary object deserialization |
| `py.deser.yaml_load` | High | A | `yaml.load()` without SafeLoader |
| `py.deser.shelve_open` | Medium | A | `shelve.open()` — pickle-backed deserialization |
### SQL Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.sqli.execute_format` | Medium | B | `cursor.execute()` with string concatenation |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.crypto.md5` | Low | A | `hashlib.md5()` — weak hash algorithm |
| `py.crypto.sha1` | Low | A | `hashlib.sha1()` — weak hash algorithm |
### Template Injection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `py.xss.jinja_from_string` | Medium | A | `jinja2.Template.from_string()` — template injection |
---
## Examples
### `py.deser.pickle_loads` — Unsafe deserialization
**Vulnerable:**
```python
import pickle
data = pickle.loads(request.body) # Arbitrary code execution
```
**Safe alternative:**
```python
import json
data = json.loads(request.body) # JSON is safe
```
### `py.cmdi.subprocess_shell` — Shell execution
**Vulnerable:**
```python
import subprocess
subprocess.call(user_input, shell=True) # Command injection
```
**Safe alternative:**
```python
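import subprocess
import shlex
# shlex.split avoids the shell, but the user still chooses which program runs
subprocess.call(shlex.split(user_input), shell=False)
# Better: fix the command and pass user data only as an argument
subprocess.call(["ls", "-la", user_dir])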
```
### `py.deser.yaml_load` — Unsafe YAML
**Vulnerable:**
```python
import yaml
config = yaml.load(user_data) # Can instantiate arbitrary objects
```
**Safe alternative:**
```python
import yaml
config = yaml.safe_load(user_data) # Only basic Python types
```
### `py.sqli.execute_format` — SQL concatenation
**Vulnerable:**
```python
cursor.execute("SELECT * FROM users WHERE id=" + user_id)
```
**Safe alternative:**
```python
cursor.execute("SELECT * FROM users WHERE id=?", (user_id,))
```


@ -1,132 +0,0 @@
# Ruby Rules
Nyx detects Ruby vulnerabilities through AST patterns and taint analysis, covering code execution, command injection, deserialization, reflection, SSRF, and weak crypto.
## Taint Labels
Ruby has moderate taint label coverage. Sources, sinks, and sanitizers are defined in `src/labels/ruby.rs`.
### Sources
| Matcher | Cap |
|---------|-----|
| `ENV`, `gets` | all |
| `params` | all |
> **Note:** Ruby's `params[:cmd]` subscript access is detected via `element_reference` node handling in the CFG. Sinatra/Rails `do...end` blocks are walked as function scopes.
### Sanitizers
| Matcher | Cap |
|---------|-----|
| `CGI.escapeHTML`, `ERB::Util.html_escape` | HTML_ESCAPE |
| `Shellwords.escape`, `Shellwords.shellescape` | SHELL_ESCAPE |
### Sinks
| Matcher | Cap |
|---------|-----|
| `system`, `exec` | SHELL_ESCAPE |
| `eval` | SHELL_ESCAPE |
| `puts`, `print` | HTML_ESCAPE |
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.code_exec.eval` | High | A | `Kernel#eval` — dynamic code execution |
| `rb.code_exec.instance_eval` | High | A | `instance_eval` — evaluates string in object context |
| `rb.code_exec.class_eval` | High | A | `class_eval` / `module_eval` — evaluates string in class context |
### Command Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.cmdi.backtick` | High | A | Backtick shell execution (`` `cmd` ``) |
| `rb.cmdi.system_interp` | High | A | `system`/`exec` call — command execution risk |
### Deserialization
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.deser.yaml_load` | High | A | `YAML.load` — arbitrary object deserialization |
| `rb.deser.marshal_load` | High | A | `Marshal.load` — arbitrary Ruby object deserialization |
### Reflection
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.reflection.send_dynamic` | Medium | B | `send()` with non-symbol argument — arbitrary method dispatch |
| `rb.reflection.constantize` | Medium | A | `constantize` / `safe_constantize` — dynamic class resolution |
### SSRF
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.ssrf.open_uri` | Medium | A | `Kernel#open` with HTTP URL — SSRF via open-uri |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rb.crypto.md5` | Low | A | `Digest::MD5` — weak hash algorithm |
---
## Examples
### `rb.deser.yaml_load` — Unsafe YAML deserialization
**Vulnerable:**
```ruby
data = YAML.load(params[:config]) # Arbitrary object instantiation
```
**Safe alternative:**
```ruby
data = YAML.safe_load(params[:config]) # Only basic Ruby types
```
### `rb.cmdi.backtick` — Backtick shell execution
**Vulnerable:**
```ruby
output = `ls #{user_dir}` # Command injection via interpolation
```
**Safe alternative:**
```ruby
require 'open3'
output, status = Open3.capture2('ls', user_dir)
```
### `rb.reflection.send_dynamic` — Dynamic method dispatch
**Vulnerable:**
```ruby
obj.send(params[:method], params[:arg]) # Arbitrary method invocation
```
**Safe alternative:**
```ruby
allowed = %w[name email phone]
if allowed.include?(params[:method])
  obj.send(params[:method])
end
```
### `rb.deser.marshal_load` — Marshal deserialization
**Vulnerable:**
```ruby
obj = Marshal.load(request.body.read)
```
**Safe alternative:**
```ruby
data = JSON.parse(request.body.read)
```


@ -1,105 +0,0 @@
# Rust Rules
Nyx detects Rust vulnerabilities through AST patterns (memory safety, code quality) and taint analysis (command injection via `env::var` → `Command::new`).
## Taint Sources
| Function | Capability | Source Kind |
|----------|-----------|-------------|
| `std::env::var`, `env::var` | `all` | EnvironmentConfig |
## Taint Sinks
| Function | Required Capability |
|----------|-------------------|
| `Command::new`, `Command::arg`, `Command::args` | `SHELL_ESCAPE` |
| `Command::status`, `Command::output` | `SHELL_ESCAPE` |
| `fs::read_to_string`, `fs::write`, `fs::read`, `File::open`, `File::create` | `FILE_IO` |
## Taint Sanitizers
| Function | Strips Capability |
|----------|------------------|
| `html_escape::encode_safe`, `sanitize_html` | `HTML_ESCAPE` |
| `shell_escape::unix::escape`, `sanitize_shell` | `SHELL_ESCAPE` |
> **Note:** `fs::read_to_string` was moved from taint sources to sinks to support path traversal detection (`env::var` → `fs::read_to_string`).
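
One way to read the source, sink, and sanitizer tables together is as a small bit set: a source marks a value as tainted for every capability, a sanitizer clears one capability, and a sink reports a finding while the capability it requires is still set. The sketch below illustrates that reading only. The bit values and the `sanitize` and `sink_is_tainted` functions are assumptions made for this example, and Nyx's internal representation may differ.

```rust
// Capability bits named after the labels in the tables above.
// The numeric values are assumptions made for this sketch.
const HTML_ESCAPE: u8 = 1 << 0;
const SHELL_ESCAPE: u8 = 1 << 1;
const FILE_IO: u8 = 1 << 2;
// A source labeled `all` taints a value for every capability.
const ALL: u8 = HTML_ESCAPE | SHELL_ESCAPE | FILE_IO;

/// A sanitizer clears its capability bit from the taint carried by a value.
fn sanitize(taint: u8, stripped: u8) -> u8 {
    taint & !stripped
}

/// A sink reports a finding while the bit it requires is still set.
fn sink_is_tainted(taint: u8, required: u8) -> bool {
    (taint & required) != 0
}
```

Under this model a value from `env::var` that passes through a shell sanitizer no longer trips a `SHELL_ESCAPE` sink such as `Command::new`, but it still trips a `FILE_IO` sink such as `fs::read_to_string`.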
---
## AST Pattern Rules
### Memory Safety
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rs.memory.transmute` | High | A | `std::mem::transmute` — unchecked type reinterpretation |
| `rs.memory.copy_nonoverlapping` | High | A | `ptr::copy_nonoverlapping` — raw pointer memcpy |
| `rs.memory.get_unchecked` | High | A | `get_unchecked` / `get_unchecked_mut` — unchecked indexing |
| `rs.memory.mem_zeroed` | High | A | `std::mem::zeroed` — may be UB for non-POD types |
| `rs.memory.ptr_read` | High | A | `ptr::read` / `ptr::read_volatile` — raw pointer dereference |
| `rs.memory.narrow_cast` | Low | A | `as u8`/`i8`/`u16`/`i16` — possible truncation |
| `rs.memory.mem_forget` | Low | A | `std::mem::forget` — may leak resources |
### Code Quality
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `rs.quality.unsafe_block` | Medium | A | `unsafe { }` block — manual memory safety obligation |
| `rs.quality.unsafe_fn` | Medium | A | `unsafe fn` declaration |
| `rs.quality.unwrap` | Low | A | `.unwrap()` — panics on `None`/`Err` |
| `rs.quality.expect` | Low | A | `.expect()` — panics on `None`/`Err` |
| `rs.quality.panic_macro` | Low | A | `panic!()` macro invocation |
| `rs.quality.todo` | Low | A | `todo!()` / `unimplemented!()` placeholder |
---
## Examples
### `rs.memory.transmute` — Unchecked type reinterpretation
**Vulnerable:**
```rust
let x: u32 = 42;
let y: f32 = unsafe { std::mem::transmute(x) };
```
**Safe alternative:**
```rust
let x: u32 = 42;
let y: f32 = f32::from_bits(x);
```
### `rs.quality.unsafe_block` — Unsafe block
**Flagged:**
```rust
unsafe {
    let ptr = &x as *const i32;
    println!("{}", *ptr);
}
```
**Safe alternative:**
```rust
// Use safe abstractions when possible
println!("{}", x);
```
### Taint: `env::var` → `Command::new`
**Vulnerable:**
```rust
let cmd = std::env::var("USER_CMD").unwrap();
Command::new("sh").arg("-c").arg(&cmd).output()?;
```
**Safe alternative:**
```rust
let cmd = std::env::var("USER_CMD").unwrap();
// Validate against allowlist
let allowed = ["ls", "whoami", "date"];
if allowed.contains(&cmd.as_str()) {
    Command::new(&cmd).output()?;
}
```


@ -1,81 +0,0 @@
# TypeScript Rules
TypeScript rules mirror JavaScript patterns plus TypeScript-specific type-safety escape detectors. Taint labels are shared with JavaScript (see [JavaScript Rules](javascript.md)).
---
## AST Pattern Rules
### Code Execution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.code_exec.eval` | High | A | `eval()` — dynamic code execution |
| `ts.code_exec.new_function` | High | A | `new Function()` — eval equivalent |
| `ts.code_exec.settimeout_string` | Medium | A | `setTimeout`/`setInterval` with string argument |
### XSS Sinks
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.xss.document_write` | Medium | A | `document.write()` / `document.writeln()` |
| `ts.xss.outer_html` | Medium | A | Assignment to `.outerHTML` |
| `ts.xss.insert_adjacent_html` | Medium | A | `insertAdjacentHTML()` |
| `ts.xss.location_assign` | Medium | A | Assignment to `location`/`location.href` |
| `ts.xss.cookie_write` | Low | A | Write to `document.cookie` |
### Prototype Pollution
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.prototype.proto_assignment` | Medium | A | Assignment to `__proto__` |
### Weak Crypto
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.crypto.math_random` | Low | A | `Math.random()` — not cryptographically secure |
### Code Quality (TypeScript-specific)
| Rule ID | Severity | Tier | Description |
|---------|----------|------|-------------|
| `ts.quality.any_annotation` | Low | A | Type annotation of `any` — disables type checking |
| `ts.quality.as_any` | Low | A | Type assertion `as any` — type-safety escape hatch |
---
## Examples
### `ts.quality.any_annotation` — `any` type
**Flagged:**
```typescript
function process(data: any) { // ts.quality.any_annotation
  data.whatever(); // No type checking
}
```
**Safe alternative:**
```typescript
interface UserData { name: string; email: string; }
function process(data: UserData) {
  console.log(data.name);
}
```
### `ts.quality.as_any` — Type assertion escape
**Flagged:**
```typescript
const result = someValue as any; // ts.quality.as_any
result.nonexistentMethod();
```
**Safe alternative:**
```typescript
if (isValidType(someValue)) {
  const result = someValue as KnownType;
  result.knownMethod();
}
```

docs/serve.md

@ -0,0 +1,124 @@
# `nyx serve`: the browser UI
The CLI is fine for CI. For triage, you want context: the source snippet, the dataflow path, the history of how a finding has moved across scans, and a place to record decisions that survive the next run. `nyx serve` boots a local React UI bound to loopback.
```bash
nyx serve # opens http://localhost:9700 in your default browser
nyx serve ./my-project # serve a specific project root
nyx serve --port 9750 # override port
nyx serve --no-browser # don't auto-open
```
Persistent settings live under `[server]` in `nyx.conf` / `nyx.local`.
<p align="center"><img src="../assets/screenshots/docs/serve-overview.png" alt="Nyx UI overview: total findings, severity breakdown, language and category distribution, top affected files" width="900"/></p>
## What it serves, and what it doesn't
The frontend is built and embedded into the `nyx` binary at compile time. There's no separate install step, and the binary serves the entire UI from memory; nothing is fetched from a CDN. The UI talks to the local Nyx process over a small JSON API.
There is **no** account, no telemetry, no remote logging, no auto-update ping. The data the UI shows is the data on your disk: the SQLite project index plus `.nyx/triage.json`.
## Security model
`nyx serve` enforces three things at the HTTP layer ([`src/server/security.rs`](https://github.com/elicpeter/nyx/blob/master/src/server/security.rs)):
1. **Loopback bind only.** `--host` and `[server].host` must be `127.0.0.1`, `localhost`, or `::1`. Any other value is refused at startup with `Nyx serve only binds to loopback addresses; refused host '<value>'`.
2. **Host-header check.** Every request must carry a `Host` header that matches the bound address and port. Missing or mismatched headers get a `400 invalid Host header`. Defends against DNS rebinding.
3. **CSRF on mutations.** `POST` / `PUT` / `PATCH` / `DELETE` requests must carry a per-process CSRF token in the `x-nyx-csrf` header. The token is generated once when the server starts and exposed at `GET /api/health` so the embedded SPA can read it. Cross-origin mutations are rejected before the CSRF check via the `Origin` header.
If you forward the port over SSH or expose it through a reverse proxy, the host-header check will reject the request because the `Host` won't match `localhost:9700`. That's the intended behaviour. Don't do this without a deliberate reason; the loopback bind is part of the security model.
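The three checks can be pictured as a single gate that every request passes through. The sketch below is not the code in `src/server/security.rs`; it is a minimal illustration under stated assumptions. The `Request` struct, the `Rejection` enum, and both function names are hypothetical. The `Host` comparison is assumed to be an exact match against the bound `host:port`, and the only same-origin value accepted is assumed to be `http://<host>:<port>`.

```rust
/// Hosts accepted for binding, per the list above.
const LOOPBACK_HOSTS: [&str; 3] = ["127.0.0.1", "localhost", "::1"];

/// Why a request was turned away. Variant names are illustrative only.
#[derive(Debug, PartialEq)]
enum Rejection {
    InvalidHost,
    CrossOrigin,
    BadCsrfToken,
}

/// Minimal stand-in for an HTTP request; not a type from the Nyx codebase.
struct Request<'a> {
    method: &'a str,
    host: Option<&'a str>,
    origin: Option<&'a str>,
    csrf: Option<&'a str>,
}

/// Startup check: refuse any bind host that is not a loopback name.
fn validate_bind_host(host: &str) -> Result<(), String> {
    if LOOPBACK_HOSTS.contains(&host) {
        Ok(())
    } else {
        Err(format!(
            "Nyx serve only binds to loopback addresses; refused host '{host}'"
        ))
    }
}

/// Per-request gate: Host header first, then Origin and CSRF for mutations.
fn check_request(
    req: &Request<'_>,
    bound_host: &str,
    port: u16,
    token: &str,
) -> Result<(), Rejection> {
    // IPv6 literals are bracketed in a Host header, e.g. "[::1]:9700".
    let expected = if bound_host.contains(':') {
        format!("[{bound_host}]:{port}")
    } else {
        format!("{bound_host}:{port}")
    };

    // 1. A missing or mismatched Host header is rejected (DNS rebinding defence).
    match req.host {
        Some(h) if h.eq_ignore_ascii_case(&expected) => {}
        _ => return Err(Rejection::InvalidHost),
    }

    if matches!(req.method, "POST" | "PUT" | "PATCH" | "DELETE") {
        // 2. Cross-origin mutations are rejected before the token is examined.
        if let Some(origin) = req.origin {
            if !origin.eq_ignore_ascii_case(&format!("http://{expected}")) {
                return Err(Rejection::CrossOrigin);
            }
        }
        // 3. The per-process token must be echoed back in `x-nyx-csrf`.
        //    (A production check would normally compare in constant time.)
        if req.csrf != Some(token) {
            return Err(Rejection::BadCsrfToken);
        }
    }
    Ok(())
}
```

Under these assumptions, a `GET` with `Host: localhost:9700` passes without a token, while a `POST` forwarded through a proxy that rewrites `Host` fails at step 1 before the token is ever looked at.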
## The pages
| Path | Page |
|---|---|
| `/` | Overview |
| `/findings` | Findings list |
| `/findings/:id` | Finding detail |
| `/triage` | Triage |
| `/explorer` | Explorer |
| `/scans` | Scans |
| `/scans/:id` | Scan detail and compare |
| `/rules` | Rules |
| `/rules/:id` | Rule detail |
| `/config` | Config |
The numeric `:id` for finding URLs is the position index in the current scan, not a stable fingerprint. Bookmarks across scans aren't reliable; rely on file path + line.
### Findings and Finding detail
The findings list is filterable by severity, confidence, category, language, rule ID, and triage state.
<p align="center"><img src="../assets/screenshots/docs/serve-findings-list.png" alt="Nyx findings list: 13 findings filtered by severity/confidence/rule, with status badges, file paths, and language tags" width="900"/></p>
Clicking through opens the **flow visualiser**: a numbered walk from source to sink with the snippet at each step, cross-file markers when the path leaves the current file, the rule's "How to fix" guidance, and the engine's evidence object inline.
<p align="center"><img src="../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: HIGH taint-unsanitised-flow showing source → call → sink steps, How to fix guidance, and evidence panel" width="900"/></p>
Engine notes call out when precision was bounded for that finding (`OriginsTruncated`, `PointsToTruncated`, `PathWidened`, `ForwardBailed`, etc.). A note tagged `under-report` means the emitted flow is real but the result set is only a lower bound; a note tagged `over-report` means the engine widened or bailed out, so the finding is less certain. For strict gates, `--require-converged` in the CLI drops the `over-report` findings.
### Triage
Each finding carries a triage state: `open`, `investigating`, `false_positive`, `accepted_risk`, `suppressed`, or `fixed`. The triage page bulk-updates them and shows the audit trail.
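For tooling that reads or writes these states, it can help to treat them as a closed set rather than free-form strings. The following is a sketch only: `TriageState` and its methods are hypothetical names chosen for illustration, not types from the Nyx codebase. The string spellings are the six listed above.

```rust
use std::str::FromStr;

/// The six triage states listed above, as a closed enum (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TriageState {
    Open,
    Investigating,
    FalsePositive,
    AcceptedRisk,
    Suppressed,
    Fixed,
}

impl TriageState {
    /// Returns the snake_case spelling used on this page.
    fn as_str(self) -> &'static str {
        match self {
            TriageState::Open => "open",
            TriageState::Investigating => "investigating",
            TriageState::FalsePositive => "false_positive",
            TriageState::AcceptedRisk => "accepted_risk",
            TriageState::Suppressed => "suppressed",
            TriageState::Fixed => "fixed",
        }
    }
}

impl FromStr for TriageState {
    type Err = String;

    /// Parses a state string, rejecting anything outside the documented set.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "open" => Ok(TriageState::Open),
            "investigating" => Ok(TriageState::Investigating),
            "false_positive" => Ok(TriageState::FalsePositive),
            "accepted_risk" => Ok(TriageState::AcceptedRisk),
            "suppressed" => Ok(TriageState::Suppressed),
            "fixed" => Ok(TriageState::Fixed),
            other => Err(format!("unknown triage state '{other}'")),
        }
    }
}
```

Parsing at the boundary means a typo such as `false-positive` is caught when the value is read, not when it silently fails to match a filter later.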
<p align="center"><img src="../assets/screenshots/docs/serve-triage.png" alt="Nyx triage page: 13 findings need attention, severity breakdown, Findings/Suppression rules/Audit log tabs, rule chips, Investigate buttons" width="900"/></p>
State writes are persisted to SQLite immediately, and (when `[server].triage_sync = true`, default on) mirrored to `.nyx/triage.json` in the project root. Commit that file:
```bash
git add .nyx/triage.json
```
It carries decisions across machines so a teammate's local scan reflects yours. The format is documented in [`src/server/triage_sync.rs`](https://github.com/elicpeter/nyx/blob/master/src/server/triage_sync.rs); the schema is stable and round-trip-safe with `nyx serve` re-imports.
### Explorer
A file tree with per-file finding counts, syntax-highlighted source, and a right rail with the file's symbols and findings. Useful for "what's wrong with this module" rather than "what's wrong with this finding".
<p align="center"><img src="../assets/screenshots/docs/serve-explorer.png" alt="Nyx explorer: file tree with per-file finding counts, syntax-highlighted Python source with red sink marker on the os.system line, file-summary right rail with findings" width="900"/></p>
The `file` query parameter preselects a file: `/explorer?file=src/handler.rs`.
### Scans and compare
Past runs are persisted when `[runs].persist = true` (off by default to avoid disk growth for heavy users). When persistence is on, `/scans` lists historical runs.
<p align="center"><img src="../assets/screenshots/docs/serve-scans.png" alt="Nyx scans list: completed scan run with root, duration, finding count, languages, and started timestamp" width="900"/></p>
Each run drills into a detail page with files scanned, findings count, duration, languages, and a per-pass timing breakdown.
<p align="center"><img src="../assets/screenshots/docs/serve-scan-detail.png" alt="Nyx scan detail: Summary tab with files scanned, findings, duration, languages; Details panel with Scan ID, Root, Engine version, started/finished timestamps; Timing breakdown bar showing Walk/Pass 1/Call Graph/Pass 2/Post" width="900"/></p>
Pick two scans to diff and see what was introduced, fixed, or rediscovered between runs. The retention cap is `[runs].max_runs` (default 100). Each run can also optionally save its log and stdout (`save_logs`, `save_stdout`); both are off by default. Code snippets are saved by default (`save_code_snippets = true`); turn this off if storage is tight.
### Rules
Every rule the engine knows about, built-in plus user-added. Each row shows the matchers, kind (source / sanitiser / sink), capability, language, and how many findings it produced in the latest scan. Filter by language, by kind, or by free text.
<p align="center"><img src="../assets/screenshots/docs/serve-rules.png" alt="Nyx rules page: 218 rules with language/kind dropdowns and a matcher search; rows showing rule title, language, kind (SOURCE/SANITIZER/SINK), cap, and finding count" width="900"/></p>
User-added rules can be deleted from this page; built-ins are immutable. Built-ins live in `src/labels/<lang>.rs` and `src/patterns/<lang>.rs`; user-added entries write to `nyx.local`.
### Config
A live config editor. Reads the merged config (`nyx.conf` + `nyx.local`), lets you flip switches and add custom source / sanitizer / sink rules, and writes back to `nyx.local`. Changes apply to the next scan; the running server uses its initial config snapshot.
<p align="center"><img src="../assets/screenshots/docs/serve-config.png" alt="Nyx config page: General settings (analysis mode, max file size, excluded extensions, attack-surface ranking), Triage Sync toggle, Sources section with language/matcher/capability dropdowns and a per-language matcher table" width="900"/></p>
The custom-rule form picks a language, a matcher (function or property name), and a capability. The capability list matches the `Cap` bitflags the taint engine uses; see [rules.md](rules.md#capability-list-for-custom-rules) for what each one means.
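As a rough mental model of what choosing a capability does: a source sets capability bits on a value, a sanitiser clears the bits it covers, and a sink reports when any bit it requires is still set. The sketch below uses made-up bit values and hypothetical function names; it is not the engine's `Cap` type, only an illustration of the bitmask idea.

```rust
/// Illustrative bit assignments; the real `Cap` values may differ.
const SHELL_ESCAPE: u8 = 0b001;
const HTML_ESCAPE: u8 = 0b010;
const URL_ENCODE: u8 = 0b100;

/// A source that taints every capability at once.
const ALL: u8 = SHELL_ESCAPE | HTML_ESCAPE | URL_ENCODE;

/// A sanitiser clears the capability bits it covers and leaves the rest.
fn sanitise(taint: u8, sanitiser_caps: u8) -> u8 {
    taint & !sanitiser_caps
}

/// A sink reports when any capability it requires is still tainted.
fn sink_fires(taint: u8, sink_caps: u8) -> bool {
    (taint & sink_caps) != 0
}
```

This is why the wrong sanitiser does not help: `sanitise(ALL, HTML_ESCAPE)` still carries the `SHELL_ESCAPE` bit, so a shell sink fires, whereas `sanitise(ALL, SHELL_ESCAPE)` reaches the same sink clean.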
## API surface
For tooling, the JSON endpoints under `/api/` are stable enough to script against. The full route map lives in [`src/server/routes/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/server/routes/mod.rs). Mutating endpoints require the `x-nyx-csrf` header (read it from `GET /api/health`).
## Disabling
If you don't want the UI for a project, set:
```toml
[server]
enabled = false
```
`nyx serve` will refuse to start. The CLI continues to work.


@ -1,74 +0,0 @@
/**
EXPECTED OUTPUT (high-level):
1) cfg-unguarded-sink (High / High confidence)
- handler(req,res): source req.body.cmd flows to child_process.exec(cmd) without sanitizer/guard.
- Should rank high (entry-point-ish function name 'handler', close to entry).
2) cfg-auth-gap (High / Medium)
- handler is entry-point-ish (name matches handler/route/api conventions).
- No auth guard dominates sink (require_auth / is_authenticated / is_admin / authorize).
3) cfg-error-fallthrough (Medium / Medium)
- Example: if (err) { console.log(err); } then exec(...) still runs.
- This is the JS analogue of your Go heuristic. If your implementation only targets Go, this should be NO finding.
If you later generalize, this file includes a pattern you can test against.
4) cfg-unguarded-sink (HTML) (Medium/High)
- req.query.html is written into innerHTML without DOMPurify.sanitize
5) No findings for safe paths:
- safeHandler uses encodeURIComponent before exec (URL_ENCODE sanitizer) OR uses a dedicated sanitizer you map to SHELL_ESCAPE.
NOTE: encodeURIComponent is URL_ENCODE, not SHELL_ESCAPE so for SHELL_ESCAPE sinks, it may still be flagged depending on your caps logic.
The definitely safe case here uses a dummy sanitize_shell() wrapper to match your Rust-style naming if you add it for JS later.
- safeHtml uses DOMPurify.sanitize before innerHTML (HTML_ESCAPE).
Taint / dataflow:
- should find taint from req.body / req.query / process.env sources to exec/eval/innerHTML sinks.
*/
const child_process = require("child_process");
// ─── Entry-point-ish + unguarded shell sink + auth gap ────────────────────────────
function handler(req, res) {
// Source (Cap::all): req.body
const cmd = req.body.cmd;
// Vulnerable sink (Cap::SHELL_ESCAPE): child_process.exec
child_process.exec(cmd);
res.end("ok");
}
// ─── Guarded HTML sink (should NOT be flagged) ────────────────────────────────────
function safeHtml(req, res, DOMPurify) {
const html = req.query.html; // Source
const cleaned = DOMPurify.sanitize(html); // Sanitizer(HTML_ESCAPE)
document.getElementById("app").innerHTML = cleaned; // Sink(HTML_ESCAPE)
res.end("ok");
}
// ─── Unguarded HTML sink (should be flagged) ─────────────────────────────────────
function unsafeHtml(req, res) {
const html = req.query.html; // Source
document.getElementById("app").innerHTML = html; // Sink(HTML_ESCAPE) without sanitizer
res.end("ok");
}
// ─── Heuristic error fallthrough pattern (JS analogue) ───────────────────────────
// If your error-handling analysis is Go-only, ignore this for now.
// If generalized later, it should be flagged.
function errFallthrough(req, res) {
const err = req.query.err;
if (err) {
console.log(err);
}
child_process.exec(req.body.cmd);
res.end("ok");
}
// ─── Optional: eval sink (should be flagged) ─────────────────────────────────────
function evalSink(req) {
const payload = process.env.PAYLOAD; // Source
eval(payload); // Sink(SHELL_ESCAPE) per your rules
}


@ -1,99 +0,0 @@
/*!
EXPECTED OUTPUT (high-level):
1) cfg-unguarded-sink (High / High confidence)
- In handle_request(): user input from std::env::var("INPUT") flows to std::process::Command::new("sh").arg(&input)
- No dominating SHELL_ESCAPE sanitizer or validation guard for that value.
- This should rank very high in scoring (entry-point-ish name + close to entry + shell sink).
2) cfg-auth-gap (High / Medium confidence)
- handle_request() looks like an entry-point (name matches handle_*)
- Contains a shell sink without an auth guard (require_auth / is_authenticated / is_admin etc.)
3) cfg-resource-leak (Medium / High or Medium confidence)
- alloc_then_return_leak(): malloc without free on an early return path.
4) cfg-unreachable-sanitizer or cfg-unreachable-guard (Medium/Low)
- unreachable_sanitizer(): sanitizer call in unreachable block.
5) taint / dataflow (existing BFS taint engine):
- should detect at least one taint finding for:
env::var source -> Command sink
- should NOT flag safe_shell() because it sanitises `input` with shell_escape::unix::escape and passes `safe` to the sink.
Notes:
- This fixture intentionally contains both vulnerable and safe patterns, plus unreachable code and resource misuse,
to exercise cfg_analysis::{unreachable, guards, auth, resources, scoring}.
*/
use std::process::Command;
// ─── CFG: Entry-point-ish + unguarded sink + auth gap ─────────────────────────────
pub fn handle_request() {
// Source (Cap::all)
let input = std::env::var("INPUT").unwrap();
// Vulnerable sink (Cap::SHELL_ESCAPE)
Command::new("sh").arg(&input).status().unwrap();
}
// ─── CFG: Guarded sink (should NOT produce cfg-unguarded-sink) ────────────────────
pub fn safe_shell() {
let input = std::env::var("INPUT").unwrap();
// Sanitizer (Cap::SHELL_ESCAPE)
let safe = shell_escape::unix::escape(input.as_str().into()).into_owned();
// Sink, but guarded by dominating sanitizer
Command::new("sh").arg(&safe).status().unwrap();
}
// ─── CFG: Unreachable sanitizer (should report unreachable sanitizer/guard) ───────
pub fn unreachable_sanitizer() {
let input = std::env::var("INPUT").unwrap();
return;
// This block is unreachable; should produce an unreachable finding for sanitizer call.
let _safe = shell_escape::unix::escape(input.as_str().into());
}
// ─── CFG: Resource misuse (malloc without free on some exit path) ─────────────────
extern "C" {
fn malloc(size: usize) -> *mut u8;
fn free(ptr: *mut u8);
}
pub fn alloc_then_return_leak(flag: bool) {
unsafe {
let p = malloc(128);
// Early return leaks `p` on this path.
if flag {
return;
}
free(p);
}
}
// ─── Extra: HTML sink labeling sanity (optional) ──────────────────────────────────
// `sink_html` is a test marker recognized as Sink(HTML_ESCAPE) by the label rules.
// In real code this would be something like response.body(), template.render(), etc.
fn sink_html(_s: &str) {}
pub fn html_print() {
let raw = std::env::var("HTML").unwrap();
sink_html(&raw);
}
pub fn html_print_sanitized() {
let raw = std::env::var("HTML").unwrap();
let safe = html_escape::encode_safe(&raw);
sink_html(&safe);
}


@ -1,36 +0,0 @@
// ─────────────────────────────────────────────────────────────────────────────
// examples/cross-file/config.rs — Sources
//
// This module reads untrusted data from the environment and filesystem.
// Every public function here acts as a **source** — its return value
// carries taint.
//
// ┌─────────────────────────────────────────────────────────────────────────┐
// │ FuncSummary produced by pass 1: │
// │ │
// │ get_user_command → source_caps: ALL, sink: 0, sanitizer: 0 │
// │ get_config_path → source_caps: ALL, sink: 0, sanitizer: 0 │
// │ load_template → source_caps: ALL, sink: 0, sanitizer: 0 │
// └─────────────────────────────────────────────────────────────────────────┘
// ─────────────────────────────────────────────────────────────────────────────
use std::env;
use std::fs;
/// Reads a user-supplied command from the environment.
/// Taint: SOURCE(ALL) — caller must sanitise before passing to any sink.
pub fn get_user_command() -> String {
env::var("USER_CMD").unwrap_or_default()
}
/// Reads a path from the environment.
/// Taint: SOURCE(ALL)
pub fn get_config_path() -> String {
env::var("CONFIG_PATH").unwrap_or_default()
}
/// Reads an HTML template from disk (path is trusted, *content* is not).
/// Taint: SOURCE(ALL)
pub fn load_template(path: &str) -> String {
fs::read_to_string(path).unwrap_or_default()
}


@ -1,41 +0,0 @@
// ─────────────────────────────────────────────────────────────────────────────
// examples/cross-file/exec.rs — Sinks
//
// Functions that perform dangerous operations. Passing tainted data to
// these without the matching sanitiser is a vulnerability.
//
// ┌─────────────────────────────────────────────────────────────────────────┐
// │ FuncSummary produced by pass 1: │
// │ │
// │ run_command → sink_caps: SHELL_ESCAPE, tainted_sink_params: [0] │
// │ render_page → sink_caps: HTML_ESCAPE, tainted_sink_params: [0] │
// │ log_and_execute → sink_caps: SHELL_ESCAPE, source_caps: ALL │
// │ (both a source AND a sink!) │
// └─────────────────────────────────────────────────────────────────────────┘
// ─────────────────────────────────────────────────────────────────────────────
use std::env;
use std::process::Command;
/// Executes a shell command.
/// Taint: SINK(SHELL_ESCAPE) on `cmd` (param 0).
pub fn run_command(cmd: &str) {
Command::new("sh").arg(cmd).status().unwrap();
}
/// Renders user content into an HTML page.
/// Taint: SINK(HTML_ESCAPE) on `body` (param 0).
pub fn render_page(body: &str) {
println!("<html><body>{body}</body></html>");
}
/// Reads an env var *and* shells out — a function that is simultaneously
/// a source (return value) and a sink (cmd parameter).
///
/// This exercises the "independent caps" design: source_caps and sink_caps
/// are both non-zero on the same summary.
pub fn log_and_execute(cmd: &str) -> String {
let log_path = env::var("LOG_PATH").unwrap_or_default();
Command::new("sh").arg(cmd).status().unwrap();
log_path
}


@ -1,148 +0,0 @@
// ─────────────────────────────────────────────────────────────────────────────
// examples/cross-file/main.rs — The caller
//
// This file calls functions from config.rs, sanitize.rs, and exec.rs.
// It never directly touches std::env, std::fs, or std::process — every
// source, sanitiser, and sink lives in another file.
//
// Nyx's two-pass cross-file taint analysis should:
// • Pass 1: summarise config.rs, sanitize.rs, exec.rs
// • Pass 2: resolve calls in main.rs against those summaries
//
// ─────────────────────────────────────────────────────────────────────────────
//
// EXPECTED NYX OUTPUT
// ===================
//
// examples/cross-file/main.rs
// 12:5 [High] taint-unsanitised-flow ← case_1_direct_source_to_sink
// 22:5 [High] taint-unsanitised-flow ← case_3_wrong_sanitiser
// 34:5 [High] taint-unsanitised-flow ← case_5_passthrough_preserves_taint
// 40:5 [High] taint-unsanitised-flow ← case_6_taint_through_branch
// 50:5 [High] taint-unsanitised-flow ← case_8_source_and_sink_same_fn
//
// examples/cross-file/exec.rs
// 30:5 [High] taint-unsanitised-flow ← log_and_execute internal vuln
//
// NO findings expected for:
// case_2 (correct sanitiser applied)
// case_4 (correct html sanitiser applied)
// case_7 (sanitised before branch)
//
// ─────────────────────────────────────────────────────────────────────────────
// ─── Case 1: Direct source → sink (UNSAFE) ──────────────────────────────────
//
// get_user_command() returns tainted(ALL)
// run_command() is a sink(SHELL_ESCAPE)
// No sanitiser in between → FINDING
//
fn case_1_direct_source_to_sink() {
let cmd = get_user_command(); // tainted(ALL) via cross-file source
run_command(&cmd); // FINDING: taint reaches shell sink
}
// ─── Case 2: Correctly sanitised (SAFE) ─────────────────────────────────────
//
// get_user_command() returns tainted(ALL)
// sanitize_shell() strips SHELL_ESCAPE
// run_command() sinks SHELL_ESCAPE → bit is gone → no finding
//
fn case_2_sanitised_before_sink() {
let cmd = get_user_command(); // tainted(ALL)
let safe = sanitize_shell(&cmd); // SHELL_ESCAPE bit stripped
run_command(&safe); // SAFE — no finding
}
// ─── Case 3: Wrong sanitiser for the sink (UNSAFE) ──────────────────────────
//
// get_user_command() returns tainted(ALL)
// sanitize_html() strips HTML_ESCAPE — but NOT SHELL_ESCAPE
// run_command() sinks SHELL_ESCAPE → bit still set → FINDING
//
fn case_3_wrong_sanitiser() {
let cmd = get_user_command(); // tainted(ALL)
let wrong = sanitize_html(&cmd); // strips HTML_ESCAPE only
run_command(&wrong); // FINDING: SHELL_ESCAPE still set
}
// ─── Case 4: Correct HTML sanitiser (SAFE) ──────────────────────────────────
//
// load_template() returns tainted(ALL) from file read
// sanitize_html() strips HTML_ESCAPE
// render_page() sinks HTML_ESCAPE → bit is gone → no finding
//
fn case_4_html_sanitised() {
let tpl = load_template("page.html"); // tainted(ALL) via cross-file source
let safe = sanitize_html(&tpl); // HTML_ESCAPE bit stripped
render_page(&safe); // SAFE — no finding
}
// ─── Case 5: Passthrough preserves taint (UNSAFE) ───────────────────────────
//
// get_user_command() returns tainted(ALL)
// passthrough() propagates taint unchanged (propagates_taint = true)
// run_command() sinks SHELL_ESCAPE → still tainted → FINDING
//
fn case_5_passthrough_preserves_taint() {
let cmd = get_user_command(); // tainted(ALL)
let same = passthrough(&cmd); // taint flows through
run_command(&same); // FINDING: still tainted
}
// ─── Case 6: Taint flows through only one branch (UNSAFE) ───────────────────
//
// One branch sanitises, the other does not.
// The unsanitised branch reaches the sink → FINDING on that path.
//
fn case_6_taint_through_branch() {
let cmd = get_user_command(); // tainted(ALL)
if cmd.len() > 10 {
run_command(&cmd); // FINDING: unsanitised path
} else {
let safe = sanitize_shell(&cmd);
run_command(&safe); // SAFE path
}
}
// ─── Case 7: Sanitised before branch (SAFE) ─────────────────────────────────
//
// Sanitisation happens before the branch → both paths are clean.
//
fn case_7_sanitised_before_branch() {
let cmd = get_user_command(); // tainted(ALL)
let safe = sanitize_shell(&cmd); // SHELL_ESCAPE stripped
if safe.len() > 10 {
run_command(&safe); // SAFE
} else {
run_command(&safe); // SAFE
}
}
// ─── Case 8: Source-and-sink function (UNSAFE) ──────────────────────────────
//
// log_and_execute() is both:
// • a SINK(SHELL_ESCAPE) on its cmd parameter
// • a SOURCE(ALL) in its return value (reads env var)
//
// Passing tainted data to it → FINDING for the sink.
// Its return value is freshly tainted, but we don't pass it anywhere
// dangerous here — so only one finding.
//
fn case_8_source_and_sink_same_fn() {
let cmd = get_user_command(); // tainted(ALL)
let _log = log_and_execute(&cmd); // FINDING: tainted arg hits shell sink
// _log is now tainted(ALL) from log_and_execute's source behaviour,
// but we don't use it — no second finding.
}
fn main() {
case_1_direct_source_to_sink();
case_2_sanitised_before_sink();
case_3_wrong_sanitiser();
case_4_html_sanitised();
case_5_passthrough_preserves_taint();
case_6_taint_through_branch();
case_7_sanitised_before_branch();
case_8_source_and_sink_same_fn();
}


@ -1,30 +0,0 @@
// ─────────────────────────────────────────────────────────────────────────────
// examples/cross-file/sanitize.rs — Sanitizers
//
// Functions that clean specific taint capabilities. After passing through
// one of these, the corresponding Cap bit is stripped.
//
// ┌─────────────────────────────────────────────────────────────────────────┐
// │ FuncSummary produced by pass 1: │
// │ │
// │ sanitize_shell → sanitizer_caps: SHELL_ESCAPE, propagates: true │
// │ sanitize_html → sanitizer_caps: HTML_ESCAPE, propagates: true │
// │ passthrough → sanitizer: 0, source: 0, sink: 0, propagates: true │
// └─────────────────────────────────────────────────────────────────────────┘
// ─────────────────────────────────────────────────────────────────────────────
/// Escapes shell metacharacters. Strips the SHELL_ESCAPE cap bit.
pub fn sanitize_shell(input: &str) -> String {
shell_escape::unix::escape(input.into()).to_string()
}
/// Escapes HTML entities. Strips the HTML_ESCAPE cap bit.
pub fn sanitize_html(input: &str) -> String {
html_escape::encode_safe(input).to_string()
}
/// Does nothing security-relevant — just returns a copy.
/// Taint passes straight through (propagates_taint = true).
pub fn passthrough(input: &str) -> String {
input.to_string()
}


@ -1,96 +0,0 @@
//! demo.rs — realistic taint-tracking playground
//! `cargo add html-escape shell-escape` before compiling.
use std::{env, process::Command, fs};
#[derive(Default)]
struct UserCtx {
query: String, // potentially tainted
sanitized: String, // should remain clean
}
/// ---------- helper wrappers so we get nice Source / Sink labels ----------
fn source_env(var: &str) -> String {
env::var(var).unwrap_or_default() // Source(env-var)
}
fn source_file(path: &str) -> String {
fs::read_to_string(path).unwrap_or_default() // Source(file-io)
}
fn sink_shell(arg: &str) {
Command::new("sh").arg(arg).status().unwrap(); // Sink(process-spawn)
}
fn sink_html(out: &str) {
println!("{out}"); // Sink(html-out)
}
fn sanitize_html(s: &str) -> String {
html_escape::encode_safe(s).into_owned() // Sanitizer(html-escape)
}
fn sanitize_shell(s: &str) -> String {
shell_escape::unix::escape(s.into()).into_owned() // Sanitizer(shell-escape)
}
/// ---------- 1. Main demo function ----------
fn main() {
// FLOW A ────────────────────────────────────────────────────────────────
// env → sanitized → safe shell
let raw = source_env("USER_CMD");
let clean = sanitize_shell(&raw);
sink_shell(&clean); // EXPECT: SAFE
// FLOW B ────────────────────────────────────────────────────────────────
// env → if-else, only one branch escapes
let arg = source_env("ANOTHER");
if arg.len() > 5 {
sink_shell(&arg); // EXPECT: UNSAFE (branch tainted)
} else {
let escaped = sanitize_shell(&arg);
sink_shell(&escaped); // safe
}
// FLOW C ────────────────────────────────────────────────────────────────
// file → while loop → HTML sanitizer cleared
let mut data = source_file("/tmp/input.txt");
while data.len() < 32 {
data.push('x');
}
let html_ok = sanitize_html(&data);
sink_html(&html_ok); // safe
// FLOW D ────────────────────────────────────────────────────────────────
// file → struct field → match → unsanitised HTML
let mut ctx = UserCtx::default();
ctx.query = source_file("/tmp/q.txt");
// overwrite the clean field; `ctx.sanitized` is *not* tainted
ctx.sanitized = sanitize_html("constant");
match ctx {
UserCtx { query, sanitized } if query.contains("DROP") => {
sink_html(&query); // EXPECT: UNSAFE
}
_ => {
sink_html(&ctx.sanitized); // safe
}
}
// FLOW E ────────────────────────────────────────────────────────────────
// source → function call → reassignment clears taint
let mut name = source_env("USER"); // tainted
greet(&name); // just prints
name = "anonymous".into(); // kills taint
greet(&name); // safe
// FLOW F ────────────────────────────────────────────────────────────────
// Multiple sanitizers, only the *right* one matters
let cmd = source_env("MIXED");
let partly = sanitize_html(&cmd); // wrong sanitizer
sink_shell(&partly); // EXPECT: UNSAFE
}
/// helper (non-sink) function
fn greet(who: &str) {
println!("Hello, {who}");
}


@ -1,8 +0,0 @@
use std::{env, process::Command};
fn source_env(var: &str) -> String {
env::var(var).unwrap_or_default() // Source(env-var)
}
fn main() {
let raw = source_env("USER_CMD");
Command::new("sh").arg(raw).status().unwrap();
}


@ -1,30 +0,0 @@
use std::{env, fs, process::Command};
fn source_env(var: &str) -> String {
env::var(var).unwrap_or_default() // Source(env-var)
}
fn source_file(path: &str) -> String {
fs::read_to_string(path).unwrap_or_default() // Source(file-io)
}
fn sink_shell(arg: &str) {
Command::new("sh").arg(arg).status().unwrap(); // Sink(process-spawn)
}
fn sink_html(out: &str) {
println!("{out}"); // Sink(html-out)
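}
// Assumed helper, mirroring the fuller demo file; requires the `shell-escape` crate.
fn sanitize_shell(s: &str) -> String {
shell_escape::unix::escape(s.into()).into_owned() // Sanitizer(shell-escape)
}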
fn main() {
let raw = source_env("USER_CMD");
let raw2 = source_file("ANOTHER");
let x = source_env("ANOTHER");
if x.len() > 5 {
sink_shell(&x); // EXPECT: UNSAFE
return;
} else {
let escaped = sanitize_shell(&x);
sink_shell(&escaped); // safe
}
sink_shell(&raw); // EXPECT: UNSAFE
sink_html(&raw2);
}

frontend/.prettierignore

@ -0,0 +1,4 @@
node_modules
tsconfig.tsbuildinfo
dist
../src/server/assets/dist


@ -0,0 +1,4 @@
{
"singleQuote": true,
"trailingComma": "all"
}

frontend/eslint.config.js

@ -0,0 +1,38 @@
import js from '@eslint/js';
import globals from 'globals';
import reactHooks from 'eslint-plugin-react-hooks';
import reactRefresh from 'eslint-plugin-react-refresh';
import tseslint from 'typescript-eslint';
export default tseslint.config(
{
ignores: [
'node_modules',
'tsconfig.tsbuildinfo',
'dist',
'../src/server/assets/dist',
],
},
{
files: ['**/*.{ts,tsx}'],
extends: [js.configs.recommended, ...tseslint.configs.recommended],
languageOptions: {
ecmaVersion: 2020,
sourceType: 'module',
globals: {
...globals.browser,
...globals.es2020,
},
},
plugins: {
'react-hooks': reactHooks,
'react-refresh': reactRefresh,
},
rules: {
'@typescript-eslint/no-unused-vars': 'off',
'react-hooks/rules-of-hooks': 'error',
'react-hooks/exhaustive-deps': 'warn',
'react-refresh/only-export-components': 'off',
},
},
);

frontend/index.html

@ -0,0 +1,13 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Nyx Scanner</title>
<link rel="icon" href="/favicon.svg" type="image/svg+xml" />
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

frontend/package-lock.json (generated)

File diff suppressed because it is too large

frontend/package.json

@ -0,0 +1,50 @@
{
"name": "nyx-frontend",
"private": true,
"version": "0.5.0",
"license": "GPL-3.0-or-later",
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc -b && vite build",
"preview": "vite preview",
"license:check": "node ./scripts/check-licenses.mjs",
"lint": "eslint .",
"typecheck": "tsc -b",
"format": "prettier --write .",
"format:check": "prettier --check .",
"test": "vitest run",
"test:watch": "vitest",
"test:coverage": "vitest run --coverage"
},
"dependencies": {
"@tanstack/react-query": "^5.62.0",
"elkjs": "^0.11.1",
"graphology": "^0.26.0",
"react": "^18.3.1",
"react-dom": "^18.3.1",
"react-router-dom": "^6.28.0",
"sigma": "^3.0.2"
},
"devDependencies": {
"@eslint/js": "^9.39.4",
"@testing-library/jest-dom": "^6.9.1",
"@testing-library/react": "^16.3.2",
"@testing-library/user-event": "^14.6.1",
"@types/react": "^18.3.12",
"@types/react-dom": "^18.3.1",
"@vitejs/plugin-react": "^4.3.4",
"@vitest/coverage-v8": "^4.1.1",
"eslint": "^9.39.4",
"eslint-plugin-react-hooks": "^7.0.1",
"eslint-plugin-react-refresh": "^0.5.2",
"globals": "^17.4.0",
"jsdom": "^29.0.1",
"license-checker-rseidelsohn": "^4.4.2",
"prettier": "^3.8.1",
"typescript": "~5.6.2",
"typescript-eslint": "^8.57.2",
"vite": "^6.0.0",
"vitest": "^4.1.1"
}
}


@ -0,0 +1,81 @@
import { readFileSync } from 'node:fs';
import { dirname, join } from 'node:path';
import { fileURLToPath } from 'node:url';
import { spawnSync } from 'node:child_process';

// Resolve paths relative to this script so it works from any working directory.
const scriptDir = dirname(fileURLToPath(import.meta.url));
const frontendDir = join(scriptDir, '..');
const repoRoot = join(frontendDir, '..');
const aboutToml = join(repoRoot, 'about.toml');
const frontendPackageJson = join(frontendDir, 'package.json');

// Pull the `accepted = [...]` array out of about.toml with a regex rather
// than a full TOML parser.
const aboutContents = readFileSync(aboutToml, 'utf8');
const acceptedBlock = aboutContents.match(/accepted\s*=\s*\[([\s\S]*?)\]/);
if (!acceptedBlock) {
  console.error(`Could not find accepted licenses in ${aboutToml}`);
  process.exit(1);
}

const rawAcceptedLicenses = [...acceptedBlock[1].matchAll(/"([^"]+)"/g)].map(
  ([, license]) => license,
);
if (rawAcceptedLicenses.length === 0) {
  console.error(`No accepted licenses found in ${aboutToml}`);
  process.exit(1);
}

// cargo-about rejects modern SPDX `-only` / `-or-later` forms in its allow
// list, so about.toml uses the deprecated bare identifiers (e.g. "GPL-3.0").
// npm ecosystems standardize on the modern forms, so accept both here.
const deprecatedSpdxFamily = /^(?:L?GPL|AGPL|GFDL)-\d+\.\d+$/;
const acceptedLicenses = [
  ...new Set(
    rawAcceptedLicenses.flatMap((license) =>
      deprecatedSpdxFamily.test(license)
        ? [license, `${license}-only`, `${license}-or-later`]
        : [license],
    ),
  ),
];

// The frontend package itself must declare an accepted license.
const frontendPackage = JSON.parse(readFileSync(frontendPackageJson, 'utf8'));
const frontendLicense = frontendPackage.license;
if (!frontendLicense) {
  console.error(
    `Package "${frontendPackage.name}@${frontendPackage.version}" is missing a license field.`,
  );
  process.exit(1);
}
if (!acceptedLicenses.includes(frontendLicense)) {
  console.error(
    `Package "${frontendPackage.name}@${frontendPackage.version}" is licensed under "${frontendLicense}" which is not permitted.`,
  );
  process.exit(1);
}

// Check the installed packages against the same allow list; private packages
// are skipped.
const result = spawnSync(
  './node_modules/.bin/license-checker-rseidelsohn',
  [
    '--start',
    '.',
    '--excludePrivatePackages',
    '--onlyAllow',
    acceptedLicenses.join(';'),
    '--summary',
  ],
  {
    cwd: frontendDir,
    stdio: 'inherit',
  },
);
if (result.error) {
  console.error(result.error.message);
  process.exit(1);
}

// Propagate the checker's exit code; a null status (e.g. the child was killed
// by a signal) counts as failure.
process.exit(result.status ?? 1);
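
Reviewer note: the SPDX expansion step in the script above can be tried in isolation. The following is a minimal sketch, assuming the same regex and `flatMap` logic as the script. The function name `expandAcceptedLicenses` and the sample input are hypothetical and are not taken from `about.toml`.

```javascript
// Same pattern as in the script: bare GPL-family identifiers such as "GPL-3.0".
const deprecatedSpdxFamily = /^(?:L?GPL|AGPL|GFDL)-\d+\.\d+$/;

// Hypothetical helper wrapping the script's expansion logic.
// For each bare GPL-family identifier, also accept the modern
// `-only` and `-or-later` forms; other identifiers pass through unchanged.
function expandAcceptedLicenses(rawLicenses) {
  return [
    ...new Set(
      rawLicenses.flatMap((license) =>
        deprecatedSpdxFamily.test(license)
          ? [license, `${license}-only`, `${license}-or-later`]
          : [license],
      ),
    ),
  ];
}

// Illustrative input only; the duplicate "MIT" is dropped by the Set.
const expanded = expandAcceptedLicenses(['MIT', 'GPL-3.0', 'Apache-2.0', 'MIT']);
console.log(expanded.join(';'));
// prints "MIT;GPL-3.0;GPL-3.0-only;GPL-3.0-or-later;Apache-2.0"
```

The semicolon-joined string has the same shape as the value the script passes to `--onlyAllow`. Because `Set` preserves insertion order, the expanded list stays in the order of the input.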

17
frontend/src/App.tsx Normal file

@@ -0,0 +1,17 @@
import { QueryClientProvider } from '@tanstack/react-query';
import { BrowserRouter } from 'react-router-dom';
import { queryClient } from './api/queryClient';
import { SSEProvider } from './contexts/SSEContext';
import { AppLayout } from './components/layout/AppLayout';

export function App() {
  return (
    <QueryClientProvider client={queryClient}>
      <SSEProvider>
        <BrowserRouter>
          <AppLayout />
        </BrowserRouter>
      </SSEProvider>
    </QueryClientProvider>
  );
}

Some files were not shown because too many files have changed in this diff