Compare commits

49 commits

Author SHA1 Message Date
Eli Peter
c9776a5caf
Introduce repro cli subcommand
2026-06-05 13:34:07 -05:00
elipeter
a2d1a1583f updated CHANGELOG.md 2026-06-05 13:13:42 -05:00
elipeter
8a7d2b8010 added repro subcommand 2026-06-05 13:10:58 -05:00
Eli Peter
c1fa6a87cf
ui-fixes 2026-06-05 12:39:39 -05:00
elipeter
f52b3bed1e changed sizes 2026-06-05 12:39:13 -05:00
elipeter
214bf91b63 bumped dep 2026-06-05 12:27:16 -05:00
elipeter
49fa174607 added svg for confirmed verdict badge 2026-06-05 12:04:09 -05:00
elipeter
291fe5d7be updated CHANGELOG.md 2026-06-05 11:36:52 -05:00
Eli Peter
25863d222a
Merge pull request #86 from nyx-sec/triage-works-in-cli
fix(cli): apply repository triage file during scans
2026-06-05 10:59:40 -05:00
elipeter
d09a97008e updated CHANGELOG.md 2026-06-05 10:53:09 -05:00
elipeter
1148e65f36 fix(cli): apply repository triage file during scans 2026-06-05 10:50:25 -05:00
Eli Peter
991c84a1eb
Dynamic (#77) 2026-06-05 10:16:30 -05:00
Eli Peter
55247b7fcd
Critical bug fixes and recall improvements (#68) 2026-05-11 12:42:39 -04:00
Eli Peter
7d0e7320e2
new capacity bits (#67) 2026-05-07 01:29:31 -04:00
elipeter
afaffc0df6 updated third party licenses 2026-05-06 05:03:00 -04:00
elipeter
c6f4c3e1cf chore: Update CHANGELOG with recent UI refresh, layout improvements, and screenshot enhancements 2026-05-06 05:01:43 -04:00
elipeter
6c607634da style: Improve code formatting for better readability in CSS and JSX files 2026-05-06 04:49:13 -04:00
elipeter
b51ae4f89d feat: Increase screenshot resolution to 1600x992 for improved quality 2026-05-06 04:45:50 -04:00
elipeter
77be7f10d9 refactor: Update UI components for consistency and improve layout 2026-05-06 04:38:04 -04:00
elipeter
da619171cf chore: Update package versions in Cargo.lock and package.json 2026-05-05 19:53:40 -04:00
elipeter
e8f1c64dc9 feat: Add asset mirroring for nyxscan.dev landing site and update favicon 2026-05-05 19:21:11 -04:00
elipeter
e830fd0a7e fix: Correct image paths in documentation for consistency 2026-05-05 19:08:51 -04:00
elipeter
c6baa4d5dc feat: Update brand color to mint-cyan across screenshots and UI elements 2026-05-05 19:02:47 -04:00
elipeter
bbf6f91c56 feat: Enhance CLI screenshot capture with raw file saving and GIF generation 2026-05-05 18:17:53 -04:00
Eli Peter
fb698d2c27
Performance and precision pass (#64) 2026-05-04 19:58:04 -04:00
Eli Peter
c7c5e0f3a1
Precision pass on auth and resource analysis (#63) 2026-05-03 13:51:46 -04:00
elipeter
064801a3a4 feat: Simplify inner-call release detection logic in resource filtering 2026-05-02 21:49:01 -04:00
elipeter
ebe4a15a72 feat: Enhance resource leak detection by recognizing inner-call release patterns and err-companion guards 2026-05-02 21:47:03 -04:00
elipeter
48bc43e1a6 feat: Add SSA summaries support for validated parameter propagation and enhance loop body error handling 2026-05-02 21:02:47 -04:00
elipeter
92aaa36ed6 chore: Update version placeholders and changelog for release 0.6.0 2026-05-02 18:06:50 -04:00
elipeter
215dd02eff docs: Update CVE list in README to include recent vulnerabilities and their details 2026-05-02 17:51:42 -04:00
Eli Peter
1f2bfe76c1
docs: Enhance module documentation across various files for clarity a… (#62)
* docs: Enhance module documentation across various files for clarity and completeness

* fix: Remove unnecessary blank line in build.rs for cleaner code

* docs: Update documentation to improve clarity and consistency in code comments
2026-05-02 17:46:45 -04:00
Eli Peter
40995e45e7
Authorization analysis logic improvements (#61) 2026-05-02 16:44:49 -04:00
Eli Peter
3c89bddbf2
Improved path traversal detection and enhanced sink classification logic 2026-05-02 03:36:14 -04:00
Eli Peter
58f1794a4e
Added Cap::DATA_EXFIL and taint fp and fn fixes on real repos (#59)
* feat: Enhance data exfiltration detection with source sensitivity gating for cookies and headers

* feat: Implement cross-file data exfiltration detection with parameter-specific gate filters

* feat: Add calibration tests and refine DATA_EXFIL severity scoring logic

* feat: Introduce per-detector configuration for data exfiltration suppression

* feat: Enhance DATA_EXFIL findings with destination field tracking in diagnostics and SARIF output

* feat: Add tainted body and URL handling for data exfiltration detection

* feat: Add integration tests and fixtures for DATA_EXFIL and SSRF detection in Go

* feat: Add Java integration tests and fixtures for DATA_EXFIL detection across multiple HTTP clients

* feat: Add synthetic externals handling for closure-captured variables in SSA

* feat: Implement closure-based suppression for resource leak findings

* feat: Add regression guards for shell-injection and taint propagation in for-of destructure patterns

* feat: Implement constructor cap narrowing for data exfiltration detection in HTTP request builders

* feat: Add gated sinks for data exfiltration detection in C and C++ using curl_easy_setopt

* feat: Implement DATA_EXFIL cap parity for backwards analysis and add integration tests

* feat: Add data exfiltration sinks for various languages and enhance documentation

* refactor: Simplify formatting and improve readability in various files

* refactor: Improve readability by simplifying conditional statements and adding clippy linting

* docs: Update CHANGELOG and comments for data exfiltration features and configuration

* docs: Clarify configuration instructions for data exfiltration trusted destinations

* docs: Enhance comments for evidence routing logic in data exfiltration
2026-05-01 10:59:52 -04:00
Eli Peter
a438886217
Python fp and docs updates (#58)
* refactor: Update comments for clarity and add expectations.json files for performance metrics

* feat: Implement FP guard for JS/TS local-collection receivers to suppress missing ownership checks

* feat: Enhance Rust parameter handling to classify local collections and prevent false ownership checks

* refactor: Simplify code formatting for better readability in multiple files

* refactor: Improve UTF-8 sequence length handling and enhance clarity in loop iteration

* feat: Update Java and Python patterns to include new security rules

* refactor: Improve comment clarity and consistency across multiple Rust files

* refactor: Simplify code formatting for improved readability in integration tests and module files

* refactor: Improve comment formatting and enhance clarity in assertions across multiple files
2026-04-29 19:53:34 -04:00
elipeter
4db0805de6 ci: Enhance release workflow to support manual tag input and ensure consistent artifact naming 2026-04-29 11:59:50 -04:00
elipeter
65add619a0 ci: Update cosign signing commands to use bundle output format 2026-04-29 11:53:55 -04:00
Eli Peter
832533a8cd
Fix fn and bump frontend packages (#57)
* chore(deps): update frontend dependencies to latest versions

* fix: update reconnectTimer type and adjust tsconfig paths for consistency

* fix: add toast to dependencies in FindingsPage component

* fix: add toast to dependencies in FindingsPage component

* fix: update language maturity metrics and improve Go validation handling

* fix: update CHANGELOG with recent enhancements and dependency bumps

* fix: format reconnectTimer initialization for improved readability
2026-04-29 02:57:57 -04:00
dependabot[bot]
281699faae
chore(deps): bump react-router-dom from 6.30.3 to 7.14.2 in /frontend (#49)
Bumps [react-router-dom](https://github.com/remix-run/react-router/tree/HEAD/packages/react-router-dom) from 6.30.3 to 7.14.2.
- [Release notes](https://github.com/remix-run/react-router/releases)
- [Changelog](https://github.com/remix-run/react-router/blob/main/packages/react-router-dom/CHANGELOG.md)
- [Commits](https://github.com/remix-run/react-router/commits/react-router-dom@7.14.2/packages/react-router-dom)

---
updated-dependencies:
- dependency-name: react-router-dom
  dependency-version: 7.14.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-29 01:07:22 -04:00
dependabot[bot]
d08c835ea3
chore(deps): bump blake3 in the cargo-minor-and-patch group (#47)
Bumps the cargo-minor-and-patch group with 1 update: [blake3](https://github.com/BLAKE3-team/BLAKE3).


Updates `blake3` from 1.8.4 to 1.8.5
- [Release notes](https://github.com/BLAKE3-team/BLAKE3/releases)
- [Commits](https://github.com/BLAKE3-team/BLAKE3/compare/1.8.4...1.8.5)

---
updated-dependencies:
- dependency-name: blake3
  dependency-version: 1.8.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: cargo-minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-29 01:06:51 -04:00
dependabot[bot]
f4b1ab8a34
chore(deps): bump the frontend-minor-and-patch group (#48)
Bumps the frontend-minor-and-patch group in /frontend with 8 updates:

| Package | From | To |
| --- | --- | --- |
| [@tanstack/react-query](https://github.com/TanStack/query/tree/HEAD/packages/react-query) | `5.95.2` | `5.100.6` |
| [@vitest/coverage-v8](https://github.com/vitest-dev/vitest/tree/HEAD/packages/coverage-v8) | `4.1.1` | `4.1.5` |
| [eslint-plugin-react-hooks](https://github.com/facebook/react/tree/HEAD/packages/eslint-plugin-react-hooks) | `7.0.1` | `7.1.1` |
| [globals](https://github.com/sindresorhus/globals) | `17.4.0` | `17.5.0` |
| [jsdom](https://github.com/jsdom/jsdom) | `29.0.1` | `29.1.0` |
| [prettier](https://github.com/prettier/prettier) | `3.8.1` | `3.8.3` |
| [typescript-eslint](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/typescript-eslint) | `8.57.2` | `8.59.1` |
| [vitest](https://github.com/vitest-dev/vitest/tree/HEAD/packages/vitest) | `4.1.1` | `4.1.5` |


Updates `@tanstack/react-query` from 5.95.2 to 5.100.6
- [Release notes](https://github.com/TanStack/query/releases)
- [Changelog](https://github.com/TanStack/query/blob/main/packages/react-query/CHANGELOG.md)
- [Commits](https://github.com/TanStack/query/commits/@tanstack/react-query@5.100.6/packages/react-query)

Updates `@vitest/coverage-v8` from 4.1.1 to 4.1.5
- [Release notes](https://github.com/vitest-dev/vitest/releases)
- [Commits](https://github.com/vitest-dev/vitest/commits/v4.1.5/packages/coverage-v8)

Updates `eslint-plugin-react-hooks` from 7.0.1 to 7.1.1
- [Release notes](https://github.com/facebook/react/releases)
- [Changelog](https://github.com/facebook/react/blob/main/packages/eslint-plugin-react-hooks/CHANGELOG.md)
- [Commits](https://github.com/facebook/react/commits/eslint-plugin-react-hooks@7.1.1/packages/eslint-plugin-react-hooks)

Updates `globals` from 17.4.0 to 17.5.0
- [Release notes](https://github.com/sindresorhus/globals/releases)
- [Commits](https://github.com/sindresorhus/globals/compare/v17.4.0...v17.5.0)

Updates `jsdom` from 29.0.1 to 29.1.0
- [Release notes](https://github.com/jsdom/jsdom/releases)
- [Commits](https://github.com/jsdom/jsdom/compare/v29.0.1...v29.1.0)

Updates `prettier` from 3.8.1 to 3.8.3
- [Release notes](https://github.com/prettier/prettier/releases)
- [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prettier/prettier/compare/3.8.1...3.8.3)

Updates `typescript-eslint` from 8.57.2 to 8.59.1
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/typescript-eslint/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.59.1/packages/typescript-eslint)

Updates `vitest` from 4.1.1 to 4.1.5
- [Release notes](https://github.com/vitest-dev/vitest/releases)
- [Commits](https://github.com/vitest-dev/vitest/commits/v4.1.5/packages/vitest)

---
updated-dependencies:
- dependency-name: "@tanstack/react-query"
  dependency-version: 5.100.6
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: frontend-minor-and-patch
- dependency-name: "@vitest/coverage-v8"
  dependency-version: 4.1.5
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: frontend-minor-and-patch
- dependency-name: eslint-plugin-react-hooks
  dependency-version: 7.1.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: frontend-minor-and-patch
- dependency-name: globals
  dependency-version: 17.5.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: frontend-minor-and-patch
- dependency-name: jsdom
  dependency-version: 29.1.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: frontend-minor-and-patch
- dependency-name: prettier
  dependency-version: 3.8.3
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: frontend-minor-and-patch
- dependency-name: typescript-eslint
  dependency-version: 8.59.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: frontend-minor-and-patch
- dependency-name: vitest
  dependency-version: 4.1.5
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: frontend-minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-29 01:05:33 -04:00
Eli Peter
82f18184b1
Prerelease cleanup (#46)
* feat: Add const_bound_vars tracking to prevent false positives in ownership checks

* feat: Introduce field interner and typed bounded vars for enhanced type tracking

* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking

* feat: Centralize method name extraction with bare_method_name helper

* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch

* feat: Enhance C++ taint tracking with additional container operations and inline method resolution

* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking

* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis

* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations

* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details

* test: Add comprehensive tests for lattice algebra laws and SSA edge cases

* feat: Add destructured session user handling and safe user ID access patterns

* feat: Implement row-population reverse-walk for enhanced authorization checks

* feat: Enhance authorization checks with local alias chain for self-actor types

* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction

* feat: Implement chained method call inner-gate rebinding for SSRF prevention

* feat: Add observability and error modules, enhance debug functionality, and implement theme context

* feat: Remove Auth Analysis page and update navigation to redirect to Explorer

* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor

* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor

* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity

* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build

The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while
the unit test at the bottom of the file was gated only `#[cfg(test)]`.
Since `cfg(test)` is set in release builds with `--tests` but
`cfg(debug_assertions)` is not, `cargo build --release --tests` failed
with E0425. Removing the gate fixes the build; the body is `debug_assert!`
only, so the helper is free in release. Also drop the gate at the call
site to avoid a `dead_code` warning when the lib is built without
`--tests`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(closure-capture): flip JS/TS fixtures to required-finding

The JS and TS closure-capture fixtures pinned the old broken behaviour
via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now
correctly traces taint through the closure boundary (env source captured
by an arrow function, sunk via `child_process.exec` inside the body), so
the formerly-forbidden finding is a true positive.

Match the Python sibling's shape — `required_findings` with
`id_prefix` + `min_count` plus a small `noise_budget` — and rewrite the
companion READMEs and the phase8_fragility_tests doc-comments from
"known gap" to "regression guard".

Verified:
- cargo test --release --test phase8_fragility_tests → 8/8 pass
- cargo test --release --lib bfs_assertion → pass
- corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0) — unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis

* feat: Introduce health module and enhance health score computation with calibration tests

* feat: Add expectations configuration and cleanup .gitignore for log files

* feat: Implement theme selection and enhance settings panel for triage sync

* feat: Suppress false positives for strcpy calls with literal sources in AST

* feat: Update analyse_function_ssa to return body CFG for accurate analysis

* feat: Add bug report and feature request templates for improved issue tracking

* feat: removed dev scripts

* feat: update README.md for clarity and consistency in fixture descriptions

* feat: removed dev docs

* feat: clean up error handling and UI elements for improved user experience

* feat: adjust button sizes in HeaderBar for better UI consistency

* feat: enhance taint analysis with additional context for sanitizer and taint findings

* cargo fmt

* prettier

* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts

* feat: add script to frame PNG screenshots with brand gradient

* feat: add fuzzing support with new targets and CI workflows

* refactor: streamline match expressions and improve formatting in CLI and output handling

* feat: enhance configuration display with detailed output options

* feat: stage demo configuration for improved CLI screenshot output

* feat: expose merge_configs function for user-configurable settings

* refactor: simplify code structure and improve readability in config handling

* refactor: improve descriptions for vulnerability patterns in various languages

* feat: update MIT License section with additional usage details and copyright information

* feat: update screenshots

* refactor: update build process and paths for frontend assets

* feat: add cross-file taint fuzzing target and supporting dictionary

* refactor: clean up formatting and comments in fuzz configuration and example files

* refactor: remove outdated comments and clean up CI configuration files

* chore: update changelog dates and improve formatting in documentation

* refactor: update Cargo.toml and CI configuration for improved packaging and build process

* refactor: enhance quote-stripping logic to prevent panics and add regression tests

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:58:38 -04:00
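
The `fix(ssa)` note in the Prerelease cleanup (#46) entry above describes a helper that was gated on `cfg(debug_assertions)` while its test was gated only on `cfg(test)`. The following is a minimal sketch of the fixed shape, not the actual code from `src/ssa/lower.rs`: the names `debug_assert_sorted` and `lower` are hypothetical stand-ins, and the sortedness check is only an illustrative substitute for the real BFS-ordering check.

```rust
// Hypothetical helper, loosely modeled on the situation in the commit message.
// It carries no #[cfg(debug_assertions)] gate, so it exists in every profile.
// The body uses only debug_assert!, whose condition is not evaluated when
// debug assertions are disabled, so release builds pay essentially nothing.
pub fn debug_assert_sorted(order: &[u32]) {
    debug_assert!(
        order.windows(2).all(|w| w[0] <= w[1]),
        "expected non-decreasing order, got {:?}",
        order
    );
}

/// Stand-in for a lowering step: sorts the ids, then checks the invariant.
/// The call site is ungated as well, so the helper is never dead code.
pub fn lower(mut ids: Vec<u32>) -> Vec<u32> {
    ids.sort_unstable();
    debug_assert_sorted(&ids);
    ids
}

#[cfg(test)]
mod tests {
    use super::*;

    // cfg(test) is set whenever tests are compiled, including with --release,
    // where cfg(debug_assertions) is off by default. Anything this test names
    // must therefore exist regardless of the debug_assertions setting.
    #[test]
    fn helper_is_available_in_every_profile() {
        debug_assert_sorted(&[1, 2, 3]);
        assert_eq!(lower(vec![3, 1, 2]), vec![1, 2, 3]);
    }
}
```

If the helper were instead marked `#[cfg(debug_assertions)]`, this test module would still be compiled under `cargo build --release --tests` and would refer to a function that no longer exists in that configuration, which is the failure mode the commit message reports.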
dependabot[bot]
79c29b394d
chore(deps): bump postcss from 8.5.8 to 8.5.10 in /frontend (#43)
Bumps [postcss](https://github.com/postcss/postcss) from 8.5.8 to 8.5.10.
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/postcss/postcss/compare/8.5.8...8.5.10)

---
updated-dependencies:
- dependency-name: postcss
  dependency-version: 8.5.10
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-25 18:42:30 -04:00
dependabot[bot]
134fd6913d
chore(deps-dev): bump vite from 6.4.1 to 6.4.2 in /frontend (#44)
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.4.1 to 6.4.2.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v6.4.2/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v6.4.2/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 6.4.2
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-25 18:42:08 -04:00
Eli Peter
41128177d2
Release/0.5.0 (#35)
* feat: Introduce function-scoped variable interning for state analysis with new tests and fixtures

* feat: Add Phase 26 symbolic execution enhancements with bitwise operator support, abstract interpretation refinements, and new taint analysis tests

* feat: Refine state analysis to handle factory-pattern resource returns with mixed-path tests and leak detection enhancements

* feat: Add Phase 27 debug views with symbolic execution, abstract interpretation, SSA, and call graph viewers; integrate with debug layout and styles

* feat: Add Phase 31 type-qualified symbolic resolution with receiver-based callee disambiguation and testing

* feat: Extend symbolic execution with state iteration, enhanced debug views, and debounced input handling

* feat: Add Phase 13 resource and auth pattern extensions with new tests and fixtures

* feat: Introduce CFG debug graph renderer with compact mode, toolbar, and DAG layout integration

* feat: Add Phase 28 encoding and decoding transform modeling with structural symex enhancements and new taint analysis tests

* feat: Extend abstract interpretation with type facts and constant value tracking in debug views and server logic

* feat: Add linear path handling and witness extraction to symbolic execution with Phase 28 transform mismatch detection

* feat: Refine Go auth and sanitizer handling with enhanced rules, state updates, and benchmark improvements

* feat: Enable auth-state analysis by default and update relevant tests in benchmark config

* test: Update state_tests to reflect default enablement of auth-state analysis and add auth suppression test

* docs: update CHANGELOG.md

* feat: Introduce per-index taint tracking in `HeapState` with `HeapSlot`, overflow handling, and revised SSA transfers

* feat: Introduce C/C++ language labels and refine heap state tracking in SSA transfers

* feat: Implement per-index array slot tracking in symbolic heap with overflow collapse

* feat: Add implicit definition handling for uninitialized declarations in SSA value allocation

* feat: Refactor function parameters and constants for improved clarity and maintainability

* refactor: Reorder module imports and improve formatting for consistency

* refactor: Fix formatting errors

* refactor: Fix clippy warnings

* refactor: Fix fmt warnings (again)

* chore: Update dependencies and improve feature configuration

* Add comprehensive tests for undertested modules (#36) (COPILOT)

* Add comprehensive tests for undertested modules

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083

* Add comprehensive tests for ext, project, walk, and errors modules

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* chore: Update dependencies and improve feature configuration

* fix: formatting errors in new tests

* chore: Update license list in about.toml

* chore: made functions input inline

* chore: updated cfg graph to take up the full page

* chore: add Prettier configuration and update code formatting

* Add frontend test suite with Vitest (111 tests) (#37)

* Add Vitest test suite for frontend - 111 tests across utils, components, hooks, and graph utilities

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/7cf0dba2-ecff-4740-ba4d-92717e74a0b7

* ci: add frontend test step to CI workflow

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/5bc0ac9f-0a32-4d03-9cb7-7a15aea53fca

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* chore: simplify array initialization in test files for consistency

* ran typecheck

* feat: add AnalysisWorkspace component and integrate it into CfgViewerPage

* feat: update routing in AppLayout and improve empty state message in ExplorerPage

* feat: enhance scan progress tracking with additional metrics and stages

* feat: update license information and add license check script

* feat: implement cross-file symbolic execution with callee body persistence

* feat: replace dagre graphs with Graphology + ELK + Sigma for more advanced call stack and cfg rendering

* feat: ensure CFG function view is scoped to the selected function, preventing bleed into sibling functions

* feat: enhance resource tracking with proxy method summaries and improve finding extraction

* feat: add terminal function exit detection for accurate resource leak analysis

* feat: add warnings for loops and functions without bodies to improve error recovery

* feat: update lambda expression handling to ensure proper function classification and control flow

* feat: remove bounded formatting/string ops and add JSON.parse sanitizer for improved data handling

* feat: add inline return taint analysis and regression tests for improved security checks

* feat: add engine version management and migration handling for database schema updates

* feat: enhance first_call_ident to skip nested function bodies and add regression tests

* feat: enhance callee name resolution with two-segment normalization and disambiguation

* feat: add cross-file context flags and debug assertions for taint analysis

* feat: refactor taint analysis structure to unify context handling and improve clarity

* feat: enhance dead code elimination to preserve Sink, Source, and Sanitizer labels with new tests

* docs: updated CHANGELOG.md

* fmt: formatting fixes

* fix: fixed frontend formatting and lint warnings

* fix: optimized ci

* fix: optimized ci

* Add comprehensive multi-file test coverage to Nyx (#38)

* Initial checklist for multi-file test suite expansion

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* Add 12 new multi-file test fixtures with TP/TN/near-miss coverage

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* deleted root repo

* rebuilt to test for regressions

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Co-authored-by: elipeter <elicpeter@gmail.com>

* feat: enhance import alias resolution and taint tracking

* feat: implement security hardening with CSRF protection and path validation

* feat: add support for import alias bindings in Python, PHP, and Rust

* feat: enhance CFG analysis modes and improve code readability

* feat: add detection for parameterized SQL queries to enhance security

* feat: add safe internal redirect handling and enhance session destroy validation

* feat: implement security improvements by addressing vulnerabilities in execAsync, session management, and file downloads

* feat: enhance taint detection by adding support for inline source member expressions in call arguments

* feat: implement pre-emission of Source nodes for inline source member expressions in call arguments

* feat: add support for Throw statement in control flow and error handling

* feat: add debug and echo endpoints with potential information leakage

* feat: implement internal redirect suppression and enhance taint detection

* feat: implement module alias tracking for dynamic dispatch in JS/TS

* feat: add authorization analysis module with Express support

* feat: add authorization analysis module with Express support

* feat: add tests for admin guard requirements and clean checks in authorization analysis

* feat: integrate Koa and Fastify frameworks into authorization analysis

* feat: add Flask and Django support to authorization analysis module

* feat: add support for Rails and Sinatra frameworks in authorization analysis

* feat: add support for Axum, ActixWeb, and Rocket frameworks in authorization analysis

* feat: add support for ActixWeb, Axum, and Rocket frameworks in authorization analysis

* feat: add support for Rails and Sinatra in authorization analysis

* chore: add .DS_Store to .gitignore

* refactor: simplify conditional checks and improve readability in multiple files

* refactor: update usage of Option methods for improved clarity and consistency

* refactor: improve code readability by simplifying conditional checks and formatting

* refactor: improve code formatting and readability by simplifying conditional checks

* refactor: simplify conditional checks and improve readability in multiple files

* refactor: simplify conditional checks in axum.rs for improved readability

* feat: add CodeQL analysis configuration for enhanced security scanning

* test: add comprehensive tests for `src/output.rs` SARIF builder (#39)

* chore: start test coverage improvement work

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* test: add comprehensive tests for src/output.rs SARIF builder

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* refactor: improve code formatting and readability in output.rs

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Co-authored-by: elipeter <elicpeter@gmail.com>

* refactor: improve code formatting and readability in output.rs

* Potential fix for code scanning alert no. 210: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 211: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* refactor: enhance triage file path handling with improved error management and validation

* refactor: updated func summaries for richer detail

* refactor: update SSA summary extraction to use canonical FuncKey for distinct entries

* refactor: enhance callee metadata structure to support arity, receiver, and qualifier for better overload resolution

* refactor: add support for keyword arguments in function calls and enhance receiver extraction for method-style calls

* refactor: implement new Flask routes for safe and unsafe shell command execution

* refactor: separate receiver handling in SSA operations and enhance taint propagation

* refactor: improve arity handling by using arg_uses for positional argument count and enhance witness scoring for tainted arguments

* refactor: implement auth decorator extraction and classification for multiple languages

* refactor: enhance Rust module path resolution and use map handling for cross-file disambiguation

* refactor: introduce CalleeQuery struct for structured callee resolution and enhance resolver logic

* refactor: implement same-file identity collision handling for `runTask` to ensure correct resolver behavior

* refactor: standardize default struct initialization across multiple files

* feat: add scripts for formatting checks and auto-fixes with test summaries

* refactor: simplify character splitting and enhance namespace qualifier handling

* refactor: improve documentation clarity and enhance code readability in resolver logic

* refactor: replace default struct initialization with explicit field assignments for clarity

* feat: enhance anonymous function naming by deriving context-based bindings

* refactor: streamline match expressions for improved readability and performance

* refactor: streamline match expressions for improved readability and performance

* refactor: replace loop with while let for improved clarity and performance

* feat: add SSA constant propagation support to analysis context for improved accuracy

* feat: add SSA constant propagation support to analysis context for improved accuracy

* feat: implement shell metacharacter validation and bounded-length checks in Rust analysis

* feat: add static map analysis for command injection suppression and type safety

* refactor: simplify match statements and reduce line breaks for improved readability

* feat(summary): phase 1/5 SinkSite data model for primary sink-location attribution

Introduce SinkSite (file_rel, line, col, snippet, cap) carrying the
primary sink source-location through function summaries. Swap
SsaFuncSummary.param_to_sink and FuncSummary.param_to_sink from a coarse
Cap map to a deduped SmallVec<[SinkSite; 1]> per parameter, with a
backward-compatible cap_sites() helper and serde defaults so pre-phase-1
on-disk rows continue to deserialise cleanly.

Extraction: SinkSiteLocator bundles the tree/bytes/file_rel needed by
extract_ssa_func_summary; ParsedFile::extract_ssa_artifacts wires the
locator in for the persisted pass-1 path, while pass-2 intra-file
transient summaries fall back to cap-only sites (behavior unchanged).
Merge: GlobalSummaries::insert now unions sink sites with
(file_rel, line, col, cap) dedup via shared union_param_sink_sites
helper.

Database: JSON-serialised summary columns carry the new shape
automatically; no schema change needed.

Phase 2 will consume SinkSite in build_taint_diag() to overwrite the
caller-site Finding.line with the callee's sink line when resolved via
summary. Phase 1 keeps behavior unchanged: scanning
tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs still produces the
same (wrong) line 10 finding.

Adds round-trip tests covering SinkSite solo, SsaFuncSummary with sink
sites, legacy-JSON default handling for both summary types, and merge
dedup.
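
The merge rule described above can be sketched as follows. This is a minimal illustration, not the project's code: the field types, the `u32` capability bitmask, and the name `union_sink_sites_sketch` are assumptions, and a plain `Vec` stands in for the `SmallVec` the commit mentions.

```rust
/// Hypothetical stand-in for the SinkSite record described in the commit.
/// Field types are assumptions; `cap` is modelled as a plain bitmask.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SinkSite {
    pub file_rel: String,
    pub line: u32,
    pub col: u32,
    pub snippet: Option<String>,
    pub cap: u32,
}

impl SinkSite {
    /// Dedup key named in the commit message: (file_rel, line, col, cap).
    /// The snippet is deliberately not part of the key.
    fn key(&self) -> (&str, u32, u32, u32) {
        (self.file_rel.as_str(), self.line, self.col, self.cap)
    }
}

/// Union `incoming` into `existing`, skipping any site whose key is
/// already present. Insertion order of first occurrences is preserved.
pub fn union_sink_sites_sketch(existing: &mut Vec<SinkSite>, incoming: &[SinkSite]) {
    for site in incoming {
        if !existing.iter().any(|s| s.key() == site.key()) {
            existing.push(site.clone());
        }
    }
}
```

A linear scan is used because the per-parameter lists are expected to be very short; a set keyed on the tuple would be the obvious alternative for larger inputs.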

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(taint): phase 2/5 thread SinkSite into SsaTaintEvent and Finding

Plumb Phase 1's SinkSite through the event pipeline into Findings,
no output change yet.  SsaTaintEvent gains `primary_sink_site:
Option<SinkSite>`; when the main or callback sink-emission path has
non-empty `param_to_sink_sites`, filter to sites whose
`(line != 0) && (cap ∩ sink_caps != ∅)` and emit one event per
distinct site — the multi-primary collapse keeps each downstream
Finding single-primary.

Resolution: ResolvedSummary and SinkInfo gain mirror
`param_to_sink_sites` fields, populated from `SsaFuncSummary.param_to_sink`
(SSA + callback paths) and `FuncSummary.param_to_sink` (global paths).
Label, local-summary, and interop resolution paths leave the field
empty — they only ever had cap-level info to begin with.

Finding: new `primary_location: Option<SinkLocation>` with
`file_rel/line/col`.  `ssa_events_to_findings` maps
`event.primary_sink_site` → `Finding.primary_location`, filtering
cap-only sites (`line == 0`) to `None` so the (0,0) sentinel never
leaks to formatters.  Dedup key extended with the primary location
so multi-site events aren't collapsed back together.

Invariants (debug_assert!):
* every SinkSite reaching emission has `line != 0 && cap ∩ sink_caps
  != ∅` — enforced by the pick_primary_sink_sites* filters;
* every populated Finding.primary_location has `line != 0` AND
  non-empty `file_rel` — the cap-only → None translation upstream
  guarantees this.

Deliberately independent of `uses_summary`: that flag tracks whether
the *taint chain* used a summary, whereas primary attribution
requires only that the *sink* itself was summary-resolved.  A local
source reaching a cross-file sink produces `uses_summary=false`
alongside a populated primary_location — documented on
Finding.primary_location, covered by
`cross_file_sink_finding_carries_primary_location`.
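
A rough sketch of the filter and the cap-only translation described above, under stated assumptions: `Site`, `pick_primary_sites_sketch`, and `to_primary_location_sketch` are hypothetical names, and capabilities are modelled as a `u32` bitmask rather than the project's own type.

```rust
/// Hypothetical, trimmed-down sink site: location plus capability bits.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Site {
    pub line: u32,
    pub col: u32,
    pub cap: u32,
}

/// Keep only sites that carry a real location (line != 0) and whose
/// capability bits overlap the sink capabilities being reported.
/// Duplicates are dropped so each distinct site yields one event.
pub fn pick_primary_sites_sketch(sites: &[Site], sink_caps: u32) -> Vec<Site> {
    let mut picked: Vec<Site> = Vec::new();
    for site in sites {
        let located = site.line != 0;
        let relevant = (site.cap & sink_caps) != 0;
        if located && relevant && !picked.contains(site) {
            picked.push(site.clone());
        }
    }
    picked
}

/// Translate a site into an optional (line, col) location so that the
/// cap-only sentinel (line == 0) never reaches an output formatter.
pub fn to_primary_location_sketch(site: &Site) -> Option<(u32, u32)> {
    if site.line == 0 {
        None
    } else {
        Some((site.line, site.col))
    }
}
```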

build_taint_diag, SARIF/JSON/explanation formatters, and the
benchmark scorer remain untouched: finding.line still comes from
`cfg_graph[finding.sink]`, so cmdi_indirect.rs still reports line 10
and the benchmark's rs-cmdi-003 row still shows FN in the LOC column.

Tests: `cross_file_sink_finding_carries_primary_location` (proves
plumbing via a synthetic FuncSummary carrying a SinkSite at 42:5) and
`cross_file_sink_cap_only_site_leaves_primary_location_none`
(regression guard against cap-only sites surfacing).  All 1566 lib
tests + integration tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(output): phase 3/5 consume primary sink location in diag + SARIF

When a finding's primary_location (populated in phase 2 from a callee
summary's SinkSite) names the dangerous instruction inside a callee
body, attribute the diagnostic line to that location instead of the
caller's call site. The call site is demoted to a Call step in
flow_steps, and a synthetic Sink step at the primary location is
appended so analysts still see the full trace.

Changes:
- Add scan_root parameter to build_taint_diag so file_rel can be
  resolved back to an absolute path via a shared resolve_file_rel
  helper. Empty file_rel (single-file scans where namespace == "")
  resolves to the file under analysis.
- Extend SinkLocation with snippet, carried from the upstream
  SinkSite so the formatter needs no second file read.
- Relax the ssa_events_to_findings debug_assert to allow empty
  file_rel, which is valid when scan root equals the file itself.
- SARIF: emit data-flow as codeFlows[0].threadFlows[0].locations[];
  locations[0] already reflects the primary sink position via the
  updated diag line/col.
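
The path resolution rule in the first bullet might look roughly like this. It is only a sketch: the real `resolve_file_rel` signature is not shown in this commit, so the function name, the parameters, and the use of a simple `join` are assumptions.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch of resolving a summary's `file_rel` back to a path.
/// An empty `file_rel` (single-file scan) resolves to the file under
/// analysis; anything else is joined onto the scan root.
pub fn resolve_file_rel_sketch(scan_root: &Path, current_file: &Path, file_rel: &str) -> PathBuf {
    if file_rel.is_empty() {
        current_file.to_path_buf()
    } else {
        scan_root.join(file_rel)
    }
}
```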

Acceptance: scan on tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs
now reports line 5 (Command::new) as the primary sink, with the call
site at line 10 visible in flow_steps.

Two expect.json fixtures updated (must_match line_range widened):
- javascript/taint/context_sensitive_call: 12-14 -> 7-14 (line 8 is
  the real sink inside run()).
- rust/cfg/closure_async: 10-10 -> 10-11 (line 11 is Command::new
  inside the closure).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(bench): phase 4/5 validate primary sink attribution across corpus

Extend the benchmark scorer and ground truth to lock in phase 3's
primary-location behavior, and add fixtures that exercise the new
capability end-to-end.

Scorer (tests/benchmark_test.rs):
- Add optional `expected_call_site_lines: Option<Vec<[usize; 2]>>` on
  Case. When present, score_location_level additionally requires at
  least one flow_step in the finding's evidence trace to fall within
  ±2 of the call-site range. When absent, the check is skipped —
  fully forward-compatible with existing fixtures.
- Retain ±2 tolerance on expected_sink_lines (compared against the
  now-primary Diag.line post-phase-3).
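
The optional call-site check could be sketched as below. The names `within_tolerance` and `call_site_check_sketch` are hypothetical, and treating multiple expected ranges as "any range may match" is an assumption; the commit message only specifies the ±2 tolerance and the skip-when-absent behavior.

```rust
/// Line tolerance applied on both sides of an expected range.
const TOLERANCE: usize = 2;

/// True when `line` falls inside `[start - 2, end + 2]`.
pub fn within_tolerance(line: usize, range: [usize; 2]) -> bool {
    let lo = range[0].saturating_sub(TOLERANCE);
    let hi = range[1] + TOLERANCE;
    (lo..=hi).contains(&line)
}

/// When no call-site expectation is present the check is skipped (passes).
/// Otherwise at least one flow-step line must land near an expected range.
pub fn call_site_check_sketch(expected: Option<&[[usize; 2]]>, flow_step_lines: &[usize]) -> bool {
    match expected {
        None => true,
        Some(ranges) => flow_step_lines
            .iter()
            .any(|&line| ranges.iter().any(|&range| within_tolerance(line, range))),
    }
}
```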

Ground truth edits:
- rs-cmdi-cross-001: expected_sink_lines [8,8] -> [9,9]. Line 8 is the
  transform::wrap call site (a cross-file propagator, not a sink);
  line 9 is Command::new, the real sink. The ±2 tolerance happened to
  mask this stale attribution but it was semantically wrong — phase 4
  is the right time to correct it. Also adds expected_call_site_lines
  [8,8] so the new field is exercised on an existing cross-file case.
- rs-cmdi-003: adds expected_call_site_lines [10,10] (run_cmd call).
  This fixture's sink (Command::new inside run_cmd at line 5) was the
  motivating case for phases 1-3; adding the call-site assertion
  guards against regression to caller-line attribution.

New fixtures:
- rust/cmdi/cmdi_indirect_multisink.rs (rs-cmdi-009): helper run_both
  takes two tainted params and invokes two Command sinks on
  consecutive lines. Locks in that primary line lands inside the
  helper (lines 5-6), not at the caller (line 12). Notes document
  that SinkSite is currently one-per-callee so both findings today
  collapse onto the first sink; expected_sink_lines=[5,6] and
  expected_call_site_lines=[12,12] stay valid either way.
- python/cmdi/cross_indirect_sink/{app.py,helper.py} (py-cmdi-cross-
  004): sink os.system lives in helper.py (cross-file), caller in
  app.py reads env source and calls run_cmd. Verifies phase 3's
  cross-file primary attribution: Diag.path = helper.py, Diag.line =
  5, with app.py:7 recorded in flow_steps as a Call step.

Acceptance:
- `cargo test --test benchmark_test -- --ignored --nocapture` passes.
- rs-cmdi-003 is TP/TP/TP (the target flip FN->TP at LOC). All
  pre-existing TP/TP/TP fixtures remain TP/TP/TP; 2 new fixtures are
  TP/TP/TP.
- Aggregate rule-level: TP=158 FP=10 FN=1 TN=97, P=0.940 R=0.994
  F1=0.966 on the 266-case corpus (was TP=156 FP=10 FN=1 TN=97 on
  264 pre-phase-4, delta is the +2 new cases both resolving TP).
- Full `cargo test` green (1566 lib tests + all integration tests).
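
For readers checking the aggregate figures, the standard precision, recall, and F1 definitions reproduce them from the reported counts. The `Counts` type and `round3` helper below are illustrative names, not part of the benchmark scorer.

```rust
/// Rule-level confusion counts (true negatives do not enter P/R/F1).
#[derive(Debug, Clone, Copy)]
pub struct Counts {
    pub tp: u32,
    pub fp: u32,
    pub fn_: u32,
}

impl Counts {
    /// Precision = TP / (TP + FP). Yields NaN if there are no positives.
    pub fn precision(&self) -> f64 {
        self.tp as f64 / (self.tp + self.fp) as f64
    }

    /// Recall = TP / (TP + FN).
    pub fn recall(&self) -> f64 {
        self.tp as f64 / (self.tp + self.fn_) as f64
    }

    /// F1 = harmonic mean of precision and recall.
    pub fn f1(&self) -> f64 {
        let (p, r) = (self.precision(), self.recall());
        2.0 * p * r / (p + r)
    }
}

/// Round to three decimal places, matching the precision used in the log.
pub fn round3(x: f64) -> f64 {
    (x * 1000.0).round() / 1000.0
}
```

With TP=158, FP=10, FN=1 this gives 158/168, 158/159, and 316/327, which round to the 0.940, 0.994, and 0.966 quoted above.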

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(taint): phase 5/5 lock Finding.primary_location contract via regression test

Add a regression test in src/taint/ssa_transfer.rs that wires up a synthetic
SsaFuncSummary with a SinkSite at other.rs:42:10 and drives the three
emission stages (pick_primary_sink_sites → emit_ssa_taint_events →
ssa_events_to_findings) against a minimal caller SSA body.  Asserts the
resulting Finding.primary_location is exactly that triple.

The existing integration tests in src/taint/tests.rs cover the coarse
FuncSummary path end-to-end through analyse_file.  This test locks in the
lower-level SSA-side plumbing so a future refactor that silently drops the
site between pick → emit → findings fails here rather than only at the
benchmark layer.

Also refreshes tests/benchmark/results/latest.json (timestamp only; rs-cmdi-003
remains TP/TP/TP and the aggregate P/R/F1 are unchanged from phase 4).

Closes the primary sink-location attribution feature (phases 1-5/5):
* Phase 1 — SinkSite data model on summaries.
* Phase 2 — SinkSite threaded into SsaTaintEvent and Finding.
* Phase 3 — diag + SARIF consume primary_location.
* Phase 4 — benchmark validates primary_call_site_lines across corpus.
* Phase 5 — regression test locks the event→finding contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: clean up formatting and improve readability in multiple files

* refactor: simplify type definition for deduplication key in findings

* test(harness): add must_not_match expectation for FP regression guards

Extends ExpectedFinding with must_not_match field that asserts a
diagnostic must NOT fire — presence is a hard failure. Non-consuming
scan so it coexists with must_match entries on the same rule_id.
Adds forbidden_violations accumulator and updates summary line.
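
One way to picture the non-consuming scan, as a sketch only: the `Expected` and `Diag` shapes, the inclusive line range, and the function name are assumptions made for illustration. The point is that a `must_not_match` entry only inspects diagnostics, while a `must_match` entry claims the diagnostic it matches.

```rust
/// Hypothetical expectation entry: a rule, an inclusive line range, and
/// whether a hit is required (must_match) or forbidden (must_not_match).
#[derive(Debug, Clone)]
pub struct Expected {
    pub rule_id: String,
    pub line_range: (usize, usize),
    pub must_not_match: bool,
}

/// Hypothetical diagnostic as seen by the test harness.
#[derive(Debug, Clone)]
pub struct Diag {
    pub rule_id: String,
    pub line: usize,
}

fn hits(e: &Expected, d: &Diag) -> bool {
    e.rule_id == d.rule_id && (e.line_range.0..=e.line_range.1).contains(&d.line)
}

/// Returns (missed must_match entries, forbidden_violations).
pub fn check_expectations_sketch(expected: &[Expected], diags: &[Diag]) -> (usize, usize) {
    let mut consumed = vec![false; diags.len()];
    let mut missed = 0;
    let mut forbidden_violations = 0;
    for e in expected {
        if e.must_not_match {
            // Non-consuming: look at every diagnostic, claim none of them.
            if diags.iter().any(|d| hits(e, d)) {
                forbidden_violations += 1;
            }
        } else {
            // Consuming: each must_match entry claims one unclaimed diagnostic.
            let found = (0..diags.len()).find(|&i| !consumed[i] && hits(e, &diags[i]));
            match found {
                Some(i) => consumed[i] = true,
                None => missed += 1,
            }
        }
    }
    (missed, forbidden_violations)
}
```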

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(regression): update expectations to ensure must_not_match for various taint and resource leak rules

* feat: implement auto-seeding for JS/TS handler parameters to enhance taint tracking

* feat: update switch statement handling to improve control flow analysis

* feat: implement promisify alias handling for JS/TS to enhance taint tracking

* feat: enhance taint tracking by refining expectation handling and adding mode filtering

* feat: refine SQL handling in stream processing and enhance auto-seeding for handler parameters

* feat: update taint tracking rules to enforce full mode matching and improve flow analysis

* feat: enhance Ruby subshell handling to improve taint tracking and flow analysis

* feat: update xss_response expectations to refine taint flow analysis and enhance regression guarding

* feat: refine framework detection and update expectation handling for Echo and Sinatra

* feat: implement max_count for taint tracking expectations and deduplicate findings

* feat: add strict_unexpected handling for taint-unsanitised-flow in expectation files

* feat: enhance deduplication of taint-unsanitised-flow findings by collapsing based on line and severity

* feat: add strict_unexpected handling for taint-unsanitised-flow in multiple expectation files

* feat: add structural invariant checks for SSA bodies

* feat: ensure deterministic phi emission order using BTreeSet

* feat: enhance handling of terminators to ensure authoritative flow through successor edges

* feat: enhance Goto terminator handling to ensure all successors are marked executable

* feat: refactor code for improved readability and organization

* feat: simplify predicate checks and enhance readability in SSA handling

* feat: implement per-file parse timeout and enhance file size handling

* feat: migrate analysis engine toggles from environment variables to configuration file

* feat: remove unnecessary whitespace in hostile_input_tests.rs

* feat: remove unnecessary whitespace in hostile_input_tests.rs

* feat: update dependencies and enhance documentation on language maturity

* feat: enhance security headers and improve request body limits

* feat: implement sink capability bits for deduplication and enhance evidence tagging

* feat: implement dynamic activation handling for gated sinks and enhance validation logic

* feat: enhance configuration documentation and clarify inline analysis cache behavior

* feat: implement panic recovery during analysis to continue scans past errors

* feat: add expectations configuration for taint analysis and performance metrics

* feat: enhance error handling and logging during file reading and mutex locking

* feat: add cross-file body loading tests and plumbing for CF-1 phase

* feat: implement cross-file k=1 context-sensitive inline taint analysis with new tests and fixtures

* feat: implement indexed-scan parity in cross-file inline analysis with new dropdown and copy functionality

* feat: enhance classification span handling in CFG and AST for improved source attribution

* feat: add new Express routes for handling user input and telemetry data

* feat: implement ternary expression handling in CFG with diamond structure for JS/TS

* feat: implement Phase CF-3 abstract-domain transfer channels in summaries

* feat: add support for string-prefix transfer in cross-file calls and update tests

* docs: reduce RESULTS.md doc size

* feat: implement Phase CF-4 per-return-path summary decomposition with tests

* feat: update parameter handling in pass1 and refactor SsaFuncSummary initialization

* feat: implement Phase CF-5 for cross-file SCC joint fixed-point convergence with new flags and tests

* feat: implement Phase CF-6 with parameter-granularity points-to summaries and associated tests

* refactor: update comments and documentation for clarity and consistency

* style: format code for consistency and readability

* refactor: simplify verdict handling and improve edge checking logic

* refactor: optimize path and identifier collection by avoiding unnecessary cloning

* chore: update Cargo.toml for Rust version 1.85 and add ignored files; modify CHANGELOG and README for clarity on state analysis defaults

* refactor: update documentation and improve clarity in configuration files

* refactor: update documentation and improve clarity in configuration files

* feat: add JS/TS pass-2 convergence tests and expectations configuration

* feat: add Phase 5 regression tests for inline cache origin attribution and update related logic

* feat: implement Phase 7 deduplication and alternative path linking for taint findings

* feat: implement structural DFS index for anonymous functions and update naming conventions

* feat: add Phase 8 regression tests for container-element taint in JS and Python

* feat: add engine-depth profiles and explain-engine option for CLI

* feat: update expectations and add new README fixtures for multi-file scan regression

* feat: implement Phase 11 callback-alias and factory patterns with regression tests

* feat: implement Terminator::Switch for multi-way dispatch and add regression tests

* feat: add real-CVE benchmark fixtures for CVE-2023-48022, CVE-2019-14939, and CVE-2023-26159 with corresponding patched variants

* refactor: extract cfg and ssa_transfer to submodules

* refactor: cargo fmt

* refactor: remove unnecessary blank line in cfg_tests.rs

* refactor: remove unnecessary planning file

* chore: update Rust version to 1.88 and bump dependencies in Cargo files

* feat: enhance triage UI with new layout and controls, update README for clarity

* feat: enhance triage UI with new layout and controls, update README for clarity

* chore: remove outdated section from README for version 0.5.0

* docs: improve clarity and consistency in README content

* chore: add "GPL-3.0-or-later" to license options in about.toml

* chore: update license handling in about.toml and check-licenses.mjs

* style: format code for improved readability in TriagePage component

* style: format code for improved readability in TriagePage component

* chore: enhance license handling and improve body_id scoping in seed lookup

* feat: introduce owner and parent body IDs for enhanced seed scoping

* feat: implement direction-aware engine provenance with new CLI flag for strict CI gating

* feat: add Undef SSA operation for improved control-flow handling

* style: improve code formatting for consistency and readability in multiple files

* feat: add 16-function chain SCC across multiple files for enhanced analysis

* style: simplify code formatting for improved readability in multiple files

* fix: update CapHitReason default implementation and improve README clarity

* docs: enhance README with detailed explanations of taint analysis and limitations

* docs: refine README for clarity and consistency in taint analysis section

* style: improve code formatting for better readability in NewScanModal and scans

* fix: update cargo-about command to use --offline for deterministic license generation

* fix: update cargo-about command to use --offline for deterministic license generation

* ci: add step to prime cargo registry cache for deterministic license generation

* feat: add support for non-sink collections in authorization analysis

* feat: enhance authorization checks with row-level ownership equality and binding tracking

* feat: implement self-scoped user handling and enhance ownership checks

* refactor: simplify assertions and formatting in authorization analysis tests

* fix: normalize line endings in THIRDPARTY-LICENSES.html generation and update README with AI disclosure

* docs: update AI disclosure section for clarity and conciseness

* feat: add AI Contribution Policy and update contributing guidelines for AI assistance disclosure

* feat: enhance authorization analysis with SSA-derived variable type classification

* feat: implement auth_finding_to_diag function for enhanced security diagnostics

* feat: add args_value_refs to CallSite struct for enhanced argument tracking

* feat: add args_value_refs to CallSite struct for enhanced argument tracking

* feat: add direction-aware engine provenance with LossDirection classification and new CLI flag

* feat: simplify strip_cap_from_call_args call by removing unnecessary line breaks

* feat: enhance error message handling in cli_validation_tests for better Windows compatibility

* feat: optimize release profile settings in Cargo.toml and update CodeQL configuration

* feat: enhance release build process with SBOM generation and SLSA provenance

* feat: update actions/checkout and actions/setup-node to v6, enhance CLI options, and improve auth-check summaries

* feat: introduce PathFact handling for path safety checks and rejection logic

* feat: introduce PathFact handling for path safety checks and rejection logic

* feat: update benchmark data and enhance path sanitization logic with new safety checks

* feat: document AI assistance in frontend UI development and human review process

* feat: add return path facts for enhanced path safety checks and update documentation

* chore: update release date for version 0.5.0 in CHANGELOG.md

* chore: clean up ci.yml by removing outdated comments and clarifying steps

* feat: implement cross-language path sanitizers and validators for enhanced security

* feat: enhance SSA value usage tracking by including block terminators and improve path safety checks

* feat: enhance switch statement handling by adding per-case path constraints and support for exclusive cases

* refactor: simplify conditional formatting and improve code readability in executor and lower modules

* feat: add vulnerable examples for various languages demonstrating authentication and sanitization issues

* feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers

* feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers

* feat: add transform classifiers for Java, Go, and Ruby with corresponding tests

* refactor: clarify comments on reassign-to-constant idiom and sink behavior in guards.rs

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 17:59:11 -04:00
Eli Peter
c4ce08b452
fix: Exclude 'docs/' directory from package inclusion in Cargo.toml (#34) 2026-02-25 21:29:26 -05:00
Eli Peter
1bbe4b1cfb
Phase 1 (#33)
* chore: Exclude CLAUDE.md from Cargo.toml

* feat: add callgraph module and integrate into main analysis flow

* feat: enhance CLI with new severity filtering and analysis modes

* feat: update CHANGELOG with recent enhancements and fixes to severity filtering and output handling

* feat: implement state-model dataflow analysis for resource lifecycle and auth state

* feat: enhance diagnostic output formatting and add evidence structure

* feat: implement attack surface ranking for diagnostics with scoring and sorting

* feat: add comprehensive documentation for installation, usage, and rules reference

* feat: add multiple language support for command execution and evaluation endpoints

* feat: implement inline suppression for findings using `nyx:ignore` comments

* feat: add confidence levels to AST patterns and update output structure

* feat: implement low-noise prioritization system with category filtering, rollup grouping, and configurable budgets

* feat: bump version to 0.4.0 and update changelog with new features and improvements

* feat: add dead code allowances to various functions in mod.rs and real_world_tests.rs
2026-02-25 21:16:36 -05:00
Eli Peter
19b578c5c4
Feat/configurable sanitizers and js precision (#32)
* chore: Exclude CLAUDE.md from Cargo.toml

* feat: Add configurable analysis rules and CLI commands for custom sanitizers and terminators

* feat: Enhance resource management and analysis efficiency

- Implemented parallel summary merging in `scan_filesystem` using rayon for improved performance.
- Introduced `GlobalSummaries::merge()` for efficient merging of summaries.
- Optimized file reading and hashing to eliminate redundant I/O operations.
- Added `should_scan_with_hash()` and `upsert_file_with_hash()` methods to streamline file processing.
- Enhanced taint analysis with in-place mutations to reduce memory allocations.
- Updated resource acquisition patterns to exclude false positives for `freopen` and wrapper functions.

* feat: Implement severity downgrade for findings in non-production paths and add source kind inference

* feat: Update versioning information in SECURITY.md for new stable line

* feat: Update categories in Cargo.toml to include parser-implementations and text-processing

* feat: Update dependencies in Cargo.lock for improved compatibility and performance

* feat: Update dependencies in Cargo.lock and Cargo.toml for improved compatibility
2026-02-25 04:02:11 -05:00
4543 changed files with 605034 additions and 4892 deletions

19
.config/nextest.toml Normal file

@ -0,0 +1,19 @@
# nextest configuration
#
# See https://nexte.st/docs/configuration/ for the full schema.
# ── Test groups ──────────────────────────────────────────────────────────────
#
# `hostile-input-timing` serialises the two timing-bounded
# `hostile_input_tests` cases that pass under nextest in isolation but fail
# under the full-suite parallel run on darwin (resource contention from the
# other ~4000 tests pushes them past their internal budget). Pinning them to
# a single thread within their own group keeps their wall-clock predictable
# without slowing the rest of the suite.
[test-groups]
hostile-input-timing = { max-threads = 1 }
[[profile.default.overrides]]
filter = 'binary(hostile_input_tests) and (test(very_long_single_line_parses) or test(many_small_functions_do_not_explode))'
test-group = 'hostile-input-timing'

1
.github/CODEOWNERS vendored Normal file

@ -0,0 +1 @@
* @elicpeter

1
.github/FUNDING.yml vendored Normal file

@ -0,0 +1 @@
github: elicpeter

75
.github/ISSUE_TEMPLATE/bug_report.yml vendored Normal file

@ -0,0 +1,75 @@
name: Bug report
description: Report a crash, incorrect output, or other broken behavior in Nyx.
labels: ["bug"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to file a bug. **Please do not file security vulnerabilities here** — use the private advisory link in SECURITY.md.
        For false positives or missed detections (rule quality), this is the right place — those are quality bugs.
  - type: textarea
    id: summary
    attributes:
      label: Summary
      description: One or two sentences describing what's wrong.
    validations:
      required: true
  - type: textarea
    id: repro
    attributes:
      label: Reproduction
      description: Minimal source snippet or repo + the exact `nyx` command you ran. The smaller, the better — ideally a single file.
      render: shell
    validations:
      required: true
  - type: textarea
    id: expected
    attributes:
      label: Expected behavior
    validations:
      required: true
  - type: textarea
    id: actual
    attributes:
      label: Actual behavior
      description: Include the finding (or lack of finding), error output, or stack trace.
    validations:
      required: true
  - type: input
    id: version
    attributes:
      label: Nyx version
      description: Output of `nyx --version`.
      placeholder: "nyx 0.7.0"
    validations:
      required: true
  - type: input
    id: os
    attributes:
      label: OS / arch
      placeholder: "macOS 14.5 arm64"
    validations:
      required: true
  - type: dropdown
    id: language
    attributes:
      label: Target language (if applicable)
      options:
        - "n/a"
        - JavaScript / TypeScript
        - Python
        - Java
        - Go
        - Ruby
        - PHP
        - Rust
        - C / C++
        - Other
    validations:
      required: false
  - type: textarea
    id: extra
    attributes:
      label: Additional context
      description: Logs, screenshots, related issues — anything else that helps.

8
.github/ISSUE_TEMPLATE/config.yml vendored Normal file

@ -0,0 +1,8 @@
blank_issues_enabled: false
contact_links:
  - name: Security vulnerability
    url: https://github.com/elicpeter/nyx/security/advisories/new
    about: Do NOT file public issues for security bugs. Use private disclosure (see SECURITY.md).
  - name: Question or discussion
    url: https://github.com/elicpeter/nyx/discussions
    about: Open-ended questions, ideas, or help using Nyx belong in Discussions.

@ -0,0 +1,27 @@
name: Feature request
description: Suggest a new capability, rule, language, or UX improvement.
labels: ["enhancement"]
body:
  - type: textarea
    id: problem
    attributes:
      label: Problem
      description: What are you trying to do that Nyx can't do today? Concrete scenarios beat abstract wishes.
    validations:
      required: true
  - type: textarea
    id: proposal
    attributes:
      label: Proposed solution
      description: How should it work? Sketches, example commands, or example findings are welcome.
    validations:
      required: true
  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives considered
      description: Other approaches you've thought about, and why they don't fit.
  - type: textarea
    id: extra
    attributes:
      label: Additional context

20
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file

@ -0,0 +1,20 @@
## Summary
<!-- What does this PR change, and why? Keep it short. The diff already shows the "what". -->
## Related issues
<!-- "Closes #123", "Refs #456". Delete if none. -->
## Checklist
- [ ] `cargo test --bin nyx` passes
- [ ] `cargo clippy --all -- -D warnings` is clean
- [ ] `cargo fmt -- --check` passes
- [ ] User-visible changes are noted in `CHANGELOG.md` under `## [Unreleased]`
- [ ] Docs updated if behavior, flags, or config changed (`docs/`, `README.md`, `CONTRIBUTING.md`)
- [ ] New rules / language support include fixtures and integration tests
## Notes for reviewers
<!-- Anything you want a reviewer to look at first, tradeoffs, follow-ups. Delete if none. -->

6
.github/codeql/codeql-config.yml vendored Normal file

@ -0,0 +1,6 @@
name: "CodeQL Config"
paths-ignore:
  - examples
  - tests
  - benches

33
.github/dependabot.yml vendored Normal file

@ -0,0 +1,33 @@
version: 2
updates:
  - package-ecosystem: cargo
    directory: "/"
    schedule:
      interval: weekly
    open-pull-requests-limit: 10
    groups:
      cargo-minor-and-patch:
        update-types:
          - minor
          - patch
  - package-ecosystem: github-actions
    directory: "/"
    schedule:
      interval: weekly
    groups:
      actions-minor-and-patch:
        update-types:
          - minor
          - patch
  - package-ecosystem: npm
    directory: "/frontend"
    schedule:
      interval: weekly
    open-pull-requests-limit: 10
    groups:
      frontend-minor-and-patch:
        update-types:
          - minor
          - patch


@@ -1,4 +1,5 @@
name: CI
permissions:
contents: read
@@ -7,34 +8,397 @@ on:
branches: ["master"]
pull_request:
branches: ["master"]
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
-test:
frontend:
name: frontend
runs-on: ubuntu-latest
-strategy:
-matrix:
-rust: [stable, beta]
steps:
-- uses: actions/checkout@v4
-- uses: actions-rs/toolchain@v1
- uses: actions/checkout@v6
- uses: actions/setup-node@v6
with:
-toolchain: ${{ matrix.rust }}
-components: clippy, rustfmt
-- uses: Swatinem/rust-cache@v2
node-version: 20
cache: npm
cache-dependency-path: frontend/package-lock.json
- name: Install frontend dependencies
working-directory: frontend
run: npm ci
- name: Frontend license check
working-directory: frontend
run: npm run license:check
- name: Frontend format check
working-directory: frontend
run: npm run format:check
- name: Frontend lint
working-directory: frontend
run: npm run lint
- name: Frontend type check
working-directory: frontend
run: npm run typecheck
- name: Frontend tests
working-directory: frontend
run: npm test
- name: Frontend build
working-directory: frontend
run: npm run build
rustfmt:
name: rustfmt
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
components: rustfmt
cache: true
- name: Format check
run: cargo fmt --all -- --check
clippy-stable:
name: clippy-stable
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
components: clippy
cache: true
- name: Lint (Clippy)
run: cargo clippy --all-targets --all-features -- -D warnings
-- name: Build & Test
-run: cargo test --all-features --verbose
cargo-deny:
name: cargo-deny
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
-- name: Security audit
-uses: actions-rs/audit-check@v1
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
-token: ${{ secrets.GITHUB_TOKEN }}
toolchain: stable
cache: true
- uses: taiki-e/install-action@cargo-deny
- name: License & advisory checks
-uses: EmbarkStudios/cargo-deny-action@v2
run: cargo deny check advisories licenses bans sources
unused-deps:
name: unused-deps
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: bnjbvr/cargo-machete@v0.9.2
third-party-licenses:
name: third-party-licenses
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@v2
with:
tool: cargo-about@0.7.1
- name: Prime cargo registry cache
run: cargo fetch --locked
- name: Regenerate license attribution
run: cargo about generate --offline about.hbs | tr -d '\r' > /tmp/THIRDPARTY-LICENSES.html
- name: Diff against committed file
run: diff -u --strip-trailing-cr THIRDPARTY-LICENSES.html /tmp/THIRDPARTY-LICENSES.html
docs-fresh:
name: docs-fresh
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- name: Regenerate rule reference
run: cargo run --features docgen --bin nyx-docgen
- name: Verify docs/rules.md is fresh
run: |
if ! git diff --exit-code docs/rules.md; then
echo "::error::docs/rules.md is stale. Run 'cargo run --features docgen --bin nyx-docgen' and commit the result."
exit 1
fi
rustdoc:
name: rustdoc
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- name: Check rustdoc links
env:
RUSTDOCFLAGS: "-D warnings"
run: cargo doc --workspace --no-deps --all-features
rust-beta-build:
name: rust-beta-build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: beta
cache: true
- name: Beta compile compatibility check
run: cargo check --all-features --tests
msrv:
name: msrv
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: "1.88"
cache: true
- name: Compile check at MSRV
run: cargo check --all-features --tests
rust-stable-test-linux-without-docker:
name: rust-stable-test / linux-without-docker
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
- name: Rust tests (stable, no docker)
run: cargo nextest run --no-fail-fast --all-features
rust-stable-test-linux-with-docker:
name: rust-stable-test / linux-with-docker
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
- name: Pull language images for sandbox tests
run: |
docker pull python:3-slim
docker pull node:20-slim
docker pull eclipse-temurin:21-jre-jammy
docker pull php:8-cli
- name: Smoke-test interpreter availability
run: |
docker run --rm python:3-slim python3 --version
docker run --rm node:20-slim node --version
docker run --rm eclipse-temurin:21-jre-jammy java -version
docker run --rm php:8-cli php --version
- name: Rust tests with docker (sandbox escape gate)
run: cargo nextest run --no-fail-fast --all-features --test dynamic_sandbox_escape --test dynamic_parity
escape-positive-control:
name: escape-positive-control
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
- name: Pull python image
run: docker pull python:3-slim
- name: Escape positive control (gate wiring check)
run: |
cargo nextest run --no-fail-fast --all-features --test dynamic_sandbox_escape \
-- --include-ignored positive_control_cap_sys_admin
cross-platform-smoke:
name: cross-platform-smoke
strategy:
fail-fast: false
matrix:
os: [macos-latest, windows-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
- name: Build
run: cargo build --release --all-features
- name: Smoke tests
run: cargo nextest run --no-fail-fast --all-features --test integration_tests --test pattern_tests --test cli_validation_tests
rust-beta-test:
name: rust-beta-test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: beta
cache: true
- uses: taiki-e/install-action@nextest
- name: Rust tests (beta)
run: cargo nextest run --no-fail-fast --all-features
cargo-package:
name: cargo-package
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/setup-node@v6
with:
node-version: 20
cache: npm
cache-dependency-path: frontend/package-lock.json
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- name: Build frontend
working-directory: frontend
run: |
npm ci
npm run build
- name: Verify dist embedded in package
run: |
for f in src/server/assets/dist/index.html src/server/assets/dist/app.js src/server/assets/dist/style.css src/server/assets/favicon.svg default-nyx.conf build.rs; do
if ! cargo package --list --allow-dirty | grep -qx "$f"; then
echo "::error::missing from cargo package: $f"
exit 1
fi
done
- name: cargo package (verify build)
run: cargo package --allow-dirty
benchmark-gate:
name: benchmark-gate
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
cache-key: benchmark-gate-release
- uses: taiki-e/install-action@nextest
- name: Build benchmark + perf test binaries
run: cargo nextest run --release --all-features --test benchmark_test --test perf_tests --no-run
- name: Accuracy regression gate (P/R/F1)
run: cargo nextest run --no-fail-fast --release --all-features --test benchmark_test --run-ignored only --no-capture benchmark_evaluation
- name: Performance regression gate
env:
NYX_CI_BENCH: "1"
run: cargo nextest run --no-fail-fast --release --all-features --test perf_tests --no-capture
- name: Upload benchmark results
if: always()
uses: actions/upload-artifact@v7
with:
name: benchmark-results
path: tests/benchmark/results/latest.json
if-no-files-found: warn
corpus-marker-audit:
name: corpus-marker-audit
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/setup-python@v6
with:
python-version: "3.12"
- name: Marker collision audit (§16.3)
run: python3 scripts/corpus_dashboard.py
# Exits non-zero if any oracle marker from one cap appears in another
# cap's payload bytes. This catches cross-cap oracle collisions that
# would cause false-positive confirmed verdicts.
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
- name: Corpus unit tests (no_marker_collisions, all_payloads_have_fixture_paths)
run: cargo nextest run --no-fail-fast --lib -p nyx-scanner dynamic::corpus
env:
RUST_LOG: error
- name: Corpus dashboard sync check (Python/Rust payload table parity)
run: python3 scripts/check_corpus_sync.py
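
Note on the `concurrency` block in this workflow: in a GitHub Actions expression, `github.event.pull_request.number || github.ref` yields the PR number on `pull_request` events and falls back to the ref on `push` or `workflow_dispatch`, so each PR and each branch gets its own cancellation group. A minimal Python sketch of that resolution, with an illustrative function name (`concurrency_group`) that is not part of the repository:

```python
from typing import Optional


def concurrency_group(workflow: str, pr_number: Optional[int], ref: str) -> str:
    """Mimic `${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}`.

    `pr_number` is None when the event is not a pull request, which is the
    case where the expression falls through to `github.ref`.
    """
    key = pr_number if pr_number is not None else ref
    return f"{workflow}-{key}"
```

With `cancel-in-progress: true`, a new run that resolves to the same key cancels the run already in flight for that key.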

.github/workflows/codeql.yml vendored Normal file

@@ -0,0 +1,45 @@
name: "CodeQL Advanced"
on:
push:
branches: ["master"]
pull_request:
branches: ["master"]
schedule:
- cron: "0 9 * * 2"
jobs:
analyze:
name: Analyze (${{ matrix.language }})
runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
permissions:
security-events: write
packages: read
actions: read
contents: read
strategy:
fail-fast: false
matrix:
include:
- language: actions
build-mode: none
- language: javascript-typescript
build-mode: none
- language: rust
build-mode: none
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Initialize CodeQL
uses: github/codeql-action/init@v4
with:
languages: ${{ matrix.language }}
build-mode: ${{ matrix.build-mode }}
config-file: ./.github/codeql/codeql-config.yml
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v4
with:
category: "/language:${{ matrix.language }}"

.github/workflows/corpus_promote.yml vendored Normal file

@@ -0,0 +1,167 @@
name: Corpus Promote
# Weekly automated promotion-PR template.
#
# Scans fuzz-discovered/ for candidates not yet in src/dynamic/corpus.rs
# and opens a PR proposing them for human review (§16.4 — no auto-merge).
#
# Also runs the marker-collision audit as a hard gate: if any collision is
# found the workflow fails rather than proposing the promotion.
on:
schedule:
# Sundays at 09:00 UTC — offset from the fuzz run (06:00 UTC) so
# discovered candidates are ready before the promotion job runs.
- cron: "0 9 * * 0"
workflow_dispatch:
inputs:
dry_run:
description: "Dry run (print PR body but do not open)"
required: false
default: "false"
permissions:
contents: write
pull-requests: write
concurrency:
group: corpus-promote
cancel-in-progress: true
jobs:
promote:
name: Propose corpus promotions
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: actions/setup-node@v6
with:
node-version: 20
cache: npm
cache-dependency-path: frontend/package-lock.json
- name: Build frontend
working-directory: frontend
run: |
npm ci
npm run build
# ── Marker collision audit ──────────────────────────────────────────────
- name: Marker collision audit
run: |
set -euo pipefail
cargo build --features dynamic -p nyx-scanner 2>/dev/null || true
cd fuzz/dynamic_corpus
cargo run -- audit-markers
env:
RUST_LOG: error
# ── Discover candidates ─────────────────────────────────────────────────
- name: Find promotion candidates
id: candidates
run: |
set -euo pipefail
count=0
files=""
if [ -d fuzz-discovered ]; then
while IFS= read -r f; do
# Skip .gitkeep, sidecar JSONs, and files already listed in corpus.rs.
[[ "$f" == *".gitkeep" ]] && continue
[[ "$f" == *".json" ]] && continue
bytes=$(xxd -p "$f" | tr -d '\n')
if ! grep -q "$bytes" src/dynamic/corpus.rs 2>/dev/null; then
count=$((count + 1))
files="$files $f"
fi
done < <(find fuzz-discovered -type f | sort)
fi
echo "count=$count" >> "$GITHUB_OUTPUT"
echo "files=$files" >> "$GITHUB_OUTPUT"
- name: Skip if no new candidates
if: steps.candidates.outputs.count == '0'
run: |
echo "No new candidates found in fuzz-discovered/. Nothing to promote."
# ── Open promotion PR ───────────────────────────────────────────────────
- name: Open promotion PR
if: >
steps.candidates.outputs.count != '0' &&
github.event.inputs.dry_run != 'true'
env:
GH_TOKEN: ${{ github.token }}
CANDIDATE_COUNT: ${{ steps.candidates.outputs.count }}
CANDIDATE_FILES: ${{ steps.candidates.outputs.files }}
run: |
set -euo pipefail
branch="corpus-promote-$(date +%Y%m%d)"
git checkout -b "$branch"
# Stage candidate files into fuzz-discovered (already there).
# The PR body provides the reviewer with everything they need.
# Build PR body into a temp file to avoid shell re-interpolation of
# sidecar JSON content (which may contain backticks or $(...) sequences).
body_file=$(mktemp)
cat > "$body_file" <<'PREAMBLE'
## Corpus Promotion Proposal
This PR was generated automatically by the weekly corpus-promote workflow.
It does **not** auto-merge — a human reviewer must approve each candidate
before it can land in `src/dynamic/corpus.rs` (§16.4).
### Candidates
The following payloads were discovered by the internal mutation fuzzer and
confirmed via `sink_hit && oracle_fired` against instrumented fixtures:
PREAMBLE
for f in $CANDIDATE_FILES; do
sidecar="${f}.json"
printf -- '- `%s`\n' "$f" >> "$body_file"
if [ -f "$sidecar" ]; then
printf ' ```json\n' >> "$body_file"
cat "$sidecar" >> "$body_file"
printf '\n ```\n' >> "$body_file"
fi
done
cat >> "$body_file" <<'CHECKLIST'
### Review checklist
- [ ] Bytes are a genuine attack vector, not a fixture artifact
- [ ] Oracle marker is unique (no collision with other caps)
- [ ] `fixture_paths` updated in `src/dynamic/corpus.rs`
- [ ] `since_corpus_version` set to next version
- [ ] `CORPUS_VERSION` bumped and bump history updated
_Generated by corpus_promote.yml — do not auto-merge._
CHECKLIST
git add fuzz-discovered/ || true
git diff --cached --quiet || git commit -m "chore: add ${CANDIDATE_COUNT} fuzzer-discovered corpus candidates"
git push origin "$branch"
gh pr create \
--title "chore(corpus): promote ${CANDIDATE_COUNT} fuzzer-discovered payload(s)" \
--body "$(cat "$body_file")" \
--base master \
--label "corpus-promotion" || true
rm -f "$body_file"
- name: Dry run summary
if: github.event.inputs.dry_run == 'true'
run: |
echo "Dry run: would promote ${{ steps.candidates.outputs.count }} candidate(s)."
echo "Files: ${{ steps.candidates.outputs.files }}"
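
Note on the "Find promotion candidates" step: it hex-encodes each file under `fuzz-discovered/` and counts the file as new when that hex string does not appear in `src/dynamic/corpus.rs`. The same filter can be sketched in Python as below. This is an illustration under assumptions, not code from the repository: the function name `find_candidates` is hypothetical, and the corpus source is passed in as plain text.

```python
from pathlib import Path
from typing import List


def find_candidates(discovered_dir: Path, corpus_source: str) -> List[str]:
    """Return files whose hex-encoded bytes do not occur in corpus_source."""
    candidates: List[str] = []
    if not discovered_dir.is_dir():
        return candidates
    files = [p for p in discovered_dir.rglob("*") if p.is_file()]
    for path in sorted(files, key=lambda p: p.as_posix()):
        # Same skips as the workflow: .gitkeep placeholders and sidecar JSONs.
        if path.name.endswith(".gitkeep") or path.name.endswith(".json"):
            continue
        hex_bytes = path.read_bytes().hex()  # lowercase hex, like `xxd -p`
        if hex_bytes not in corpus_source:
            candidates.append(path.as_posix())
    return candidates
```

The match is a plain substring test, so it only recognises a payload that appears in the corpus source as one contiguous lowercase hex string.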


@@ -0,0 +1,30 @@
name: Dependabot auto-merge
on: pull_request
permissions:
contents: write
pull-requests: write
jobs:
auto-merge:
runs-on: ubuntu-latest
# Skip fork PRs entirely (the merge would fail anyway, but no need to run).
if: >-
github.event.pull_request.user.login == 'dependabot[bot]' &&
github.event.pull_request.head.repo.full_name == github.repository
steps:
- name: Fetch Dependabot metadata
id: metadata
uses: dependabot/fetch-metadata@v3
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
- name: Enable auto-merge for patch and minor updates
if: >-
steps.metadata.outputs.update-type == 'version-update:semver-patch' ||
steps.metadata.outputs.update-type == 'version-update:semver-minor'
run: gh pr merge --auto --squash "$PR_URL"
env:
PR_URL: ${{ github.event.pull_request.html_url }}
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
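
Note on the two `if:` conditions in this workflow: together they enable auto-merge only for Dependabot PRs opened from the repository itself (not a fork) whose update type is a semver patch or minor bump. A small Python sketch of the combined predicate, with hypothetical names (`should_auto_merge`, `AUTO_MERGE_UPDATE_TYPES`) chosen for illustration:

```python
# Update types the workflow accepts, copied from its second `if:` condition.
AUTO_MERGE_UPDATE_TYPES = {
    "version-update:semver-patch",
    "version-update:semver-minor",
}


def should_auto_merge(author: str, head_repo: str, base_repo: str, update_type: str) -> bool:
    """Combine the job-level and step-level conditions into one check."""
    is_dependabot = author == "dependabot[bot]"
    same_repo = head_repo == base_repo  # fork PRs are skipped entirely
    return is_dependabot and same_repo and update_type in AUTO_MERGE_UPDATE_TYPES
```

Major-version updates fall outside the accepted set, so they stay open for manual review.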

.github/workflows/docs.yml vendored Normal file

@@ -0,0 +1,53 @@
name: docs
on:
push:
branches: [master]
paths:
- "docs/**"
- "book.toml"
- ".github/workflows/docs.yml"
- "assets/screenshots/**"
workflow_dispatch:
permissions:
contents: read
pages: write
id-token: write
concurrency:
group: pages
cancel-in-progress: false
jobs:
build-deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- name: Cache mdbook
id: cache-mdbook
uses: actions/cache@v5
with:
path: ~/.cargo/bin/mdbook
key: mdbook-0.5.2-${{ runner.os }}
- name: Install mdbook
if: steps.cache-mdbook.outputs.cache-hit != 'true'
run: cargo install mdbook --version 0.5.2 --locked
- name: Build
run: mdbook build
- name: Upload artifact
uses: actions/upload-pages-artifact@v5
with:
path: book
- name: Deploy to GitHub Pages
uses: actions/deploy-pages@v5

.github/workflows/dynamic.yml vendored Normal file

@@ -0,0 +1,146 @@
# Phase 29 (Track I): dedicated dynamic-verification matrix.
#
# Three rows exercise the dynamic harness pipeline (`cargo nextest run
# --features dynamic`) under the host configurations the Phase 17-28
# tracks documented as supported:
#
# linux-process-only — Ubuntu host, no docker daemon. Forces the
# process backend and exercises the Phase 17
# Linux hardening primitives (chroot, seccomp,
# unshare, no_new_privs). `libc6-dev` is
# installed so the hardening probe + escape
# suite can `cc -static`; without it the
# chroot-leg of the escape suite skips silently
# (Phase 20 follow-up #4 in deferred.md).
#
# linux-with-docker — Ubuntu host with the runner Docker daemon. Exercises
# the docker backend (Phase 19) and the
# differential-confirmation parity tests.
#
# macos — macOS-latest, no docker. Exercises the
# Phase-18 `sandbox-exec` primitives plus the
# process backend on Darwin. Track-I acceptance
# literal: "cargo nextest run --features dynamic
# is green on macOS without docker."
name: dynamic
permissions:
contents: read
on:
push:
branches: ["master"]
pull_request:
branches: ["master"]
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
linux-process-only:
name: dynamic / linux-process-only
runs-on: ubuntu-latest
env:
# Force the process backend even when callers default to Auto so
# docker-unavailable paths cannot accidentally hide a regression.
NYX_SANDBOX_BACKEND: process
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
# Phase 17 / Phase 20 follow-up: the hardening probe + escape
# suite chroot leg need static glibc. Without these packages the
# `cc -static probe.c` step in tests/sandbox_hardening_linux.rs +
# tests/sandbox_escape_suite.rs falls back to dynamic linking and
# the chroot leg silently skips.
- name: Install fixture prerequisites (static libc)
run: |
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends libc6-dev libc-dev-bin
- name: Smoke-test interpreter availability
run: |
python3 --version
node --version || sudo apt-get install -y --no-install-recommends nodejs
ruby --version || true
php --version || true
- name: Dynamic suite (process backend only)
run: cargo nextest run --no-fail-fast --features dynamic
linux-with-docker:
name: dynamic / linux-with-docker
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
- name: Install fixture prerequisites (static libc)
run: |
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends libc6-dev libc-dev-bin
- name: Pull language images for sandbox tests
run: |
docker pull python:3-slim
docker pull node:20-slim
docker pull eclipse-temurin:21-jre-jammy
docker pull php:8-cli
- name: Smoke-test docker interpreter availability
run: |
docker run --rm python:3-slim python3 --version
docker run --rm node:20-slim node --version
docker run --rm eclipse-temurin:21-jre-jammy java -version
docker run --rm php:8-cli php --version
- name: Dynamic suite (process + docker backends)
run: cargo nextest run --no-fail-fast --features dynamic
macos:
name: dynamic / macos
runs-on: macos-latest
env:
# macOS runners ship without docker; force process backend so the
# `Auto` resolver in src/dynamic/sandbox.rs cannot accidentally
# pick up a stray Lima/Colima daemon and confuse the matrix.
NYX_SANDBOX_BACKEND: process
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
- name: Smoke-test sandbox-exec availability
run: |
/usr/bin/sandbox-exec -p '(version 1)(allow default)' /bin/echo ok
- name: Smoke-test interpreter availability
run: |
python3 --version
node --version
ruby --version
# Phase 29 acceptance literal: "cargo nextest run --features
# dynamic is green on macOS without docker (process-only row)."
- name: Dynamic suite (macOS, process backend)
run: cargo nextest run --no-fail-fast --features dynamic

.github/workflows/eval.yml vendored Normal file

@@ -0,0 +1,348 @@
# Real-corpus acceptance (Track R).
#
# * owasp (Phase 27 / Track R.0): Gate 6 vs a real OWASP BenchmarkJava
# checkout (Java).
# * jsts (Phase 28 / Track R.1): Gate 7 vs OWASP NodeGoat (Express, .js)
# and OWASP Juice Shop (TypeScript, .ts), one matrix row per corpus.
# * polyglot (Phase 29 / Track R.2): Gate 8 vs OWASP RailsGoat (Rails, .rb),
# DVWA (PHP), DVPWA (aiohttp, .py), gosec (Go) and the RustSec advisory-db
# (Rust negative control), one matrix row per corpus.
#
# Runs on every PR that touches the dynamic verifier (src/dynamic/), the
# eval-corpus harness (tests/eval_corpus/), or the gate script itself.
#
# Each gate enforces, against the committed ground truth:
# * verify wall-clock <= 15 min (CI budget; 20 min for the OWASP row; the dev reference is 10 min),
# * the per-(cap,lang) budget in tests/eval_corpus/budget.toml,
# * per-cap confirmed-rate / precision / recall — hard-gated only for caps
# in NYX_*_FLOOR_CAPS (empty by default → published report-only until a
# cap Confirms end to end), with destinations >= 40% / >= 0.85 / >= 0.40.
#
# No corpus is vendored. Each is cloned at a pinned ref and cached so reruns
# skip the clone. Before the gate runs, the committed ground truth is
# regenerated from its source against the fresh clone and asserted in sync,
# and the converter hard-errors on any labelled path missing from the corpus,
# so a corpus bump that drifts the labels fails the job loudly.
name: eval
permissions:
contents: read
on:
push:
branches: ["master"]
paths:
- "src/dynamic/**"
- "tests/eval_corpus/**"
- "scripts/m7_ship_gate.sh"
- ".github/workflows/eval.yml"
pull_request:
branches: ["master"]
paths:
- "src/dynamic/**"
- "tests/eval_corpus/**"
- "scripts/m7_ship_gate.sh"
- ".github/workflows/eval.yml"
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
owasp:
name: eval / owasp-benchmark-v1.2
runs-on: ubuntu-latest
env:
# Gate 6 self-skips unless this points at a real checkout.
NYX_OWASP_CORPUS: ${{ github.workspace }}/.eval-corpus/owasp_benchmark_v1.2
# CI wall-clock budget: 20 min. The 2740-file OWASP scan+verify lands
# right at the old 15-min ceiling on the hosted runners (observed 900.2s),
# so the gate tripped on CI variance alone; 1200s restores headroom. The
# dev reference stays 10 min — override locally to tighten.
NYX_OWASP_WALLCLOCK_BUDGET_SECONDS: "1200"
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
# The Phase 22 Java compile pool drives `com.sun.tools.javac` out of a
# warm JDK; temurin 21 ships the compiler module the pool loads.
- name: Set up JDK 21
uses: actions/setup-java@v5
with:
distribution: temurin
java-version: "21"
- name: Cache OWASP BenchmarkJava (1.2beta)
id: cache-owasp
uses: actions/cache@v5
with:
path: .eval-corpus/owasp_benchmark_v1.2
key: owasp-benchmark-1.2beta
- name: Clone OWASP BenchmarkJava (1.2beta tag)
if: steps.cache-owasp.outputs.cache-hit != 'true'
run: |
git clone --depth 1 --branch 1.2beta \
https://github.com/OWASP-Benchmark/BenchmarkJava \
.eval-corpus/owasp_benchmark_v1.2
# No-compromise guard: the committed ground truth must be exactly what a
# fresh conversion of the pinned CSV produces. Catches GT drift (a
# corpus bump, a hand-edit) before the gate runs on stale labels.
- name: Verify ground truth is in sync with the pinned corpus
run: |
python3 tests/eval_corpus/owasp_gt_convert.py \
--corpus-dir .eval-corpus/owasp_benchmark_v1.2 \
--output /tmp/owasp_gt_regen.json
python3 - <<'PY'
import json, sys
committed = json.load(open("tests/eval_corpus/ground_truth/owasp_benchmark_v1.2.json"))
regen = json.load(open("/tmp/owasp_gt_regen.json"))
if committed != regen:
sys.exit("committed ground truth diverges from a fresh conversion of "
"the 1.2beta CSV; regenerate with owasp_gt_convert.py")
print(f"ground truth in sync: {len(committed)} records")
PY
- name: eval-corpus harness regression tests
run: |
python3 tests/eval_corpus/test_tabulate_regression.py
python3 tests/eval_corpus/test_manifest_gt_convert.py
- name: Gate 6 — OWASP Benchmark v1.2 acceptance
run: scripts/m7_ship_gate.sh --sets owasp
jsts:
name: eval / ${{ matrix.corpus.name }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
corpus:
- name: nodegoat
repo: https://github.com/OWASP/NodeGoat
# NodeGoat ships no release tags; pin the default branch and let
# the cache key hold it stable. The manifest's path layout
# (app/, config/) has been constant for years.
ref: master
env: NYX_NODEGOAT_CORPUS
manifest: nodegoat.manifest.toml
ground_truth: nodegoat.json
- name: juiceshop
repo: https://github.com/juice-shop/juice-shop
ref: v15.0.0
env: NYX_JUICESHOP_CORPUS
manifest: juiceshop.manifest.toml
ground_truth: juiceshop.json
env:
# CI wall-clock budget: 15 min. Override locally to tighten.
NYX_JSTS_WALLCLOCK_BUDGET_SECONDS: "900"
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
# The dynamic verifier's Node build pool (Phase 23) compiles its
# harnesses with a real node/npm toolchain.
- name: Set up Node 20
uses: actions/setup-node@v6
with:
node-version: "20"
- name: Cache ${{ matrix.corpus.name }}
id: cache-corpus
uses: actions/cache@v5
with:
path: .eval-corpus/${{ matrix.corpus.name }}
key: jsts-${{ matrix.corpus.name }}-${{ matrix.corpus.ref }}
- name: Clone ${{ matrix.corpus.name }} (${{ matrix.corpus.ref }})
if: steps.cache-corpus.outputs.cache-hit != 'true'
run: |
git clone --depth 1 --branch ${{ matrix.corpus.ref }} \
${{ matrix.corpus.repo }} \
.eval-corpus/${{ matrix.corpus.name }}
# No-compromise guard: the committed ground truth must be exactly what a
# fresh conversion of the curated manifest produces *against this
# corpus*. manifest_gt_convert.py hard-errors on any labelled path that
# no longer exists in the clone (corpus drift / typo), and the diff
# below catches a stale committed JSON.
- name: Verify ground truth is in sync with the pinned corpus
run: |
python3 tests/eval_corpus/manifest_gt_convert.py \
--manifest tests/eval_corpus/ground_truth/${{ matrix.corpus.manifest }} \
--corpus-dir .eval-corpus/${{ matrix.corpus.name }} \
--output /tmp/${{ matrix.corpus.name }}_gt_regen.json
python3 - <<'PY'
import json, sys
name = "${{ matrix.corpus.ground_truth }}"
committed = json.load(open(f"tests/eval_corpus/ground_truth/{name}"))
regen = json.load(open("/tmp/${{ matrix.corpus.name }}_gt_regen.json"))
if committed != regen:
sys.exit("committed ground truth diverges from a fresh conversion of "
"the manifest against the pinned corpus; regenerate with "
"manifest_gt_convert.py")
print(f"ground truth in sync: {len(committed)} records")
PY
- name: eval-corpus harness regression tests
run: |
python3 tests/eval_corpus/test_tabulate_regression.py
python3 tests/eval_corpus/test_manifest_gt_convert.py
- name: Gate 7 — ${{ matrix.corpus.name }} acceptance
run: |
export ${{ matrix.corpus.env }}="${{ github.workspace }}/.eval-corpus/${{ matrix.corpus.name }}"
scripts/m7_ship_gate.sh --sets ${{ matrix.corpus.name }}
polyglot:
name: eval / ${{ matrix.corpus.name }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
corpus:
- name: railsgoat
repo: https://github.com/OWASP/railsgoat
ref: rails.5.0.0
lang: ruby
env: NYX_RAILSGOAT_CORPUS
manifest: railsgoat.manifest.toml
ground_truth: railsgoat.json
- name: dvwa
repo: https://github.com/digininja/DVWA
ref: "2.5"
lang: php
env: NYX_DVWA_CORPUS
manifest: dvwa.manifest.toml
ground_truth: dvwa.json
- name: dvpwa
repo: https://github.com/anxolerd/dvpwa
# DVPWA ships no release tags; pin the default branch and let the
# cache key hold it stable.
ref: master
lang: python
env: NYX_DVPWA_CORPUS
manifest: dvpwa.manifest.toml
ground_truth: dvpwa.json
- name: gosec
repo: https://github.com/securego/gosec
ref: v2.26.1
lang: go
env: NYX_GOSEC_CORPUS
manifest: gosec.manifest.toml
ground_truth: gosec.json
- name: rustsec
repo: https://github.com/rustsec/advisory-db
# advisory-db ships no release tags; pin the default branch. This
# is the Rust NEGATIVE CONTROL (advisory metadata, no scannable
# source) — its committed ground truth is empty by construction.
ref: main
lang: rust
env: NYX_RUSTSEC_CORPUS
manifest: rustsec.manifest.toml
ground_truth: rustsec.json
env:
# CI wall-clock budget: 15 min. Override locally to tighten.
NYX_POLYGLOT_WALLCLOCK_BUDGET_SECONDS: "900"
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- uses: taiki-e/install-action@nextest
# The dynamic verifier's per-language build pool (Phase 22/23) compiles
# its harnesses with a real toolchain. Each matrix row sets up only the
# toolchain for its corpus's target language; the Rust row needs no extra
# step (the rust toolchain above covers it, and advisory-db has no
# buildable source anyway).
- name: Set up Ruby
if: matrix.corpus.lang == 'ruby'
uses: ruby/setup-ruby@v1
with:
ruby-version: "3.3"
- name: Set up PHP
if: matrix.corpus.lang == 'php'
uses: shivammathur/setup-php@v2
with:
php-version: "8.3"
- name: Set up Python
if: matrix.corpus.lang == 'python'
uses: actions/setup-python@v6
with:
python-version: "3.12"
- name: Set up Go
if: matrix.corpus.lang == 'go'
uses: actions/setup-go@v6
with:
go-version: "1.22"
- name: Cache ${{ matrix.corpus.name }}
id: cache-corpus
uses: actions/cache@v5
with:
path: .eval-corpus/${{ matrix.corpus.name }}
key: polyglot-${{ matrix.corpus.name }}-${{ matrix.corpus.ref }}
- name: Clone ${{ matrix.corpus.name }} (${{ matrix.corpus.ref }})
if: steps.cache-corpus.outputs.cache-hit != 'true'
run: |
git clone --depth 1 --branch ${{ matrix.corpus.ref }} \
${{ matrix.corpus.repo }} \
.eval-corpus/${{ matrix.corpus.name }}
# No-compromise guard: the committed ground truth must be exactly what a
# fresh conversion of the curated manifest produces *against this corpus*.
# manifest_gt_convert.py hard-errors on any labelled path that no longer
# exists in the clone (corpus drift / typo); the diff below catches a
# stale committed JSON. For the RustSec negative control the manifest
# carries `negative_control = true` and zero entries, so the converter
# emits an empty `[]` — still validated against the real clone.
- name: Verify ground truth is in sync with the pinned corpus
run: |
python3 tests/eval_corpus/manifest_gt_convert.py \
--manifest tests/eval_corpus/ground_truth/${{ matrix.corpus.manifest }} \
--corpus-dir .eval-corpus/${{ matrix.corpus.name }} \
--output /tmp/${{ matrix.corpus.name }}_gt_regen.json
python3 - <<'PY'
import json, sys
name = "${{ matrix.corpus.ground_truth }}"
committed = json.load(open(f"tests/eval_corpus/ground_truth/{name}"))
regen = json.load(open("/tmp/${{ matrix.corpus.name }}_gt_regen.json"))
if committed != regen:
sys.exit("committed ground truth diverges from a fresh conversion of "
"the manifest against the pinned corpus; regenerate with "
"manifest_gt_convert.py")
print(f"ground truth in sync: {len(committed)} records")
PY
- name: eval-corpus harness regression tests
run: |
python3 tests/eval_corpus/test_tabulate_regression.py
python3 tests/eval_corpus/test_manifest_gt_convert.py
- name: Gate 8 — ${{ matrix.corpus.name }} acceptance
run: |
export ${{ matrix.corpus.env }}="${{ github.workspace }}/.eval-corpus/${{ matrix.corpus.name }}"
scripts/m7_ship_gate.sh --sets ${{ matrix.corpus.name }}

217
.github/workflows/fuzz.yml vendored Normal file

@ -0,0 +1,217 @@
name: Fuzz
on:
pull_request:
branches: ["master"]
paths:
- "src/**"
- "fuzz/**"
- "Cargo.toml"
- "Cargo.lock"
- ".github/workflows/fuzz.yml"
schedule:
# Long-form weekly run, Sundays at 06:00 UTC.
- cron: "0 6 * * 0"
workflow_dispatch:
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
fuzz:
name: fuzz-${{ matrix.target }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
target: [scan_bytes, extract_summaries, cross_file_taint]
steps:
- uses: actions/checkout@v6
# cargo-fuzz needs nightly for the libFuzzer codegen flags.
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: nightly
cache: true
cache-workspaces: |
.
fuzz
- uses: taiki-e/install-action@v2
with:
tool: cargo-fuzz
- uses: actions/setup-node@v6
with:
node-version: 20
cache: npm
cache-dependency-path: frontend/package-lock.json
- name: Build frontend
working-directory: frontend
run: |
npm ci
npm run build
- name: Restore fuzz corpus
uses: actions/cache@v5
with:
path: fuzz/corpus/${{ matrix.target }}
key: fuzz-corpus-${{ matrix.target }}-${{ github.sha }}
restore-keys: |
fuzz-corpus-${{ matrix.target }}-
# The harness reads inputs as <lang_idx_byte><source>, so we prefix
# each seed with its language index here at stage time. Files in
# fuzz/seed_corpus/ are committed as plain source without the byte
# because some IDEs strip 0x00 on save.
- name: Layer seed corpus
run: |
set -euo pipefail
target=${{ matrix.target }}
dest="fuzz/corpus/$target"
mkdir -p "$dest"
ext_to_idx() {
case "$1" in
rs) echo 0 ;;
js) echo 1 ;;
ts) echo 2 ;;
py) echo 3 ;;
go) echo 4 ;;
java) echo 5 ;;
rb) echo 6 ;;
php) echo 7 ;;
c) echo 8 ;;
cpp) echo 9 ;;
*) return 1 ;;
esac
}
stage() {
src="$1"
ext="${src##*.}"
idx=$(ext_to_idx "$ext") || return 0
hash=$(sha256sum "$src" | cut -c1-16)
out="$dest/seed-${ext}-${hash}"
[ -e "$out" ] && return 0
printf '%b' "$(printf '\\%03o' "$idx")" > "$out"
cat "$src" >> "$out"
}
for f in benches/fixtures/sample.*; do
[ -e "$f" ] && stage "$f"
done
while IFS= read -r f; do
stage "$f"
done < <(find tests/benchmark/corpus -type f \( \
-name '*.rs' -o -name '*.js' -o -name '*.ts' \
-o -name '*.py' -o -name '*.go' -o -name '*.java' \
-o -name '*.rb' -o -name '*.php' -o -name '*.c' \
-o -name '*.cpp' \))
if [ -d "fuzz/seed_corpus/$target" ]; then
while IFS= read -r f; do
stage "$f"
done < <(find "fuzz/seed_corpus/$target" -type f \( \
-name '*.rs' -o -name '*.js' -o -name '*.ts' \
-o -name '*.py' -o -name '*.go' -o -name '*.java' \
-o -name '*.rb' -o -name '*.php' -o -name '*.c' \
-o -name '*.cpp' \))
fi
echo "Corpus dir: $(ls "$dest" | wc -l) files"
- name: Choose fuzz duration
id: budget
run: |
if [ "${{ github.event_name }}" = "schedule" ] || [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "seconds=18000" >> "$GITHUB_OUTPUT"
else
echo "seconds=600" >> "$GITHUB_OUTPUT"
fi
- name: Run fuzz target
run: |
cargo fuzz run --target x86_64-unknown-linux-gnu ${{ matrix.target }} -- \
-max_total_time=${{ steps.budget.outputs.seconds }} \
-max_len=65536 \
-timeout=60 \
-rss_limit_mb=8192 \
-dict=fuzz/dict/all.dict
- name: Upload crash artifacts
if: failure()
uses: actions/upload-artifact@v7
with:
name: fuzz-artifacts-${{ matrix.target }}-${{ github.run_id }}
path: fuzz/artifacts/${{ matrix.target }}/
if-no-files-found: ignore
retention-days: 14
harness-fuzz:
name: harness-fuzz-${{ matrix.cap }}
runs-on: ubuntu-latest
# Run only on schedule and manual dispatch — 50 k iterations per cap is
# too slow for PR checks but is the right cadence for weekly corpus growth.
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
strategy:
fail-fast: false
matrix:
include:
- cap: sql_query
harness: tests/dynamic_fixtures/python/sqli_positive.py
- cap: code_exec
harness: tests/dynamic_fixtures/python/cmdi_positive.py
- cap: file_io
harness: tests/dynamic_fixtures/python/fileio_positive.py
- cap: ssrf
harness: tests/dynamic_fixtures/python/ssrf_positive.py
- cap: html_escape
harness: tests/dynamic_fixtures/python/xss_positive.py
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
cache: true
cache-workspaces: |
.
fuzz/dynamic_corpus
- uses: actions/setup-node@v6
with:
node-version: 20
cache: npm
cache-dependency-path: frontend/package-lock.json
- name: Build frontend
working-directory: frontend
run: |
npm ci
npm run build
- name: Build nyx-dynamic-corpus
working-directory: fuzz/dynamic_corpus
run: cargo build
- uses: actions/setup-python@v6
with:
python-version: "3.x"
- name: Run harness fuzzer — ${{ matrix.cap }}
run: |
fuzz/dynamic_corpus/target/debug/nyx-dynamic-corpus run \
--cap ${{ matrix.cap }} \
--spec-hash "ci-${{ matrix.cap }}" \
--harness-cmd "python3 ${{ matrix.harness }}" \
--iterations 50000 \
--output fuzz-discovered
- name: Upload discovered candidates
if: always()
uses: actions/upload-artifact@v7
with:
name: harness-fuzz-${{ matrix.cap }}-${{ github.run_id }}
path: fuzz-discovered/
if-no-files-found: ignore
retention-days: 30

68
.github/workflows/image-builder.yml vendored Normal file

@ -0,0 +1,68 @@
name: image-builder
# Phase 19 (Track E.3): daily drift PR.
#
# Runs `nyx-image-builder build --all` on a Linux runner that has docker
# available, captures the rewritten `tools/image-builder/images.toml`, and
# opens a PR when any pinned digest changed. The PR is reviewed manually
# before merge so a hostile upstream image cannot silently land in
# `IMAGE_DIGESTS`.
permissions:
contents: write
pull-requests: write
on:
schedule:
# 04:23 UTC daily — off-peak for the major upstream registries so
# transient pull errors are rare.
- cron: "23 4 * * *"
workflow_dispatch:
concurrency:
group: image-builder
cancel-in-progress: false
jobs:
refresh-digests:
name: refresh image digests
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
cache: true
- name: Verify docker is reachable
run: docker info
- name: Build pinned-digest catalogue
run: |
cargo run -F image-builder --bin nyx-image-builder -- build --all
- name: Verify catalogue against local pulls
run: |
cargo run -F image-builder --bin nyx-image-builder -- verify
- name: Open PR on drift
uses: peter-evans/create-pull-request@v8
with:
token: ${{ secrets.GITHUB_TOKEN }}
commit-message: "image-builder: refresh pinned digests"
title: "image-builder: refresh pinned digests"
body: |
Automated digest refresh by `nyx-image-builder build --all`.
The CI job pulled every base image in
`tools/image-builder/images.toml`, captured the resolved
`sha256:` digest, and wrote it back into the file. Review
the diff before merging — a hostile upstream image would
show up here as an unexpected digest change.
branch: image-builder/refresh-digests
base: master
delete-branch: true
labels: |
image-builder
automation


@ -3,20 +3,61 @@ name: Release build & publish
on:
release:
types: [created]
workflow_dispatch:
inputs:
tag:
description: "Existing release tag to (re)build and publish (e.g. v0.5.0)"
required: true
type: string
permissions:
contents: write
env:
BIN_NAME: nyx
RELEASE_TAG: ${{ github.event.release.tag_name || inputs.tag }}
jobs:
-build-and-upload:
+frontend:
name: build-frontend
runs-on: ubuntu-latest
steps:
- name: Check out sources
uses: actions/checkout@v6
with:
ref: ${{ env.RELEASE_TAG }}
- uses: actions/setup-node@v6
with:
node-version: 20
cache: npm
cache-dependency-path: frontend/package-lock.json
- name: Install frontend dependencies
working-directory: frontend
run: npm ci
- name: Build frontend
working-directory: frontend
run: npm run build
- name: Upload frontend dist
uses: actions/upload-artifact@v7
with:
name: frontend-dist
path: src/server/assets/dist/
if-no-files-found: error
retention-days: 1
build:
needs: frontend
strategy:
matrix:
include:
- target: x86_64-unknown-linux-gnu
os: ubuntu-latest
- target: aarch64-unknown-linux-gnu
os: ubuntu-latest
- target: x86_64-pc-windows-msvc
os: windows-latest
- target: x86_64-apple-darwin
@ -27,7 +68,15 @@ jobs:
steps:
- name: Check out sources
-uses: actions/checkout@v4
+uses: actions/checkout@v6
with:
ref: ${{ env.RELEASE_TAG }}
- name: Download prebuilt frontend dist
uses: actions/download-artifact@v8
with:
name: frontend-dist
path: src/server/assets/dist/
- name: Install Rust toolchain
uses: actions-rust-lang/setup-rust-toolchain@v1
@ -36,18 +85,20 @@ jobs:
target: ${{ matrix.target }}
cache: true
- name: Install cross-compilation tools (ARM Linux)
if: matrix.target == 'aarch64-unknown-linux-gnu'
run: |
sudo apt-get update
sudo apt-get install -y gcc-aarch64-linux-gnu
echo '[target.aarch64-unknown-linux-gnu]' >> ~/.cargo/config.toml
echo 'linker = "aarch64-linux-gnu-gcc"' >> ~/.cargo/config.toml
- name: Install target
run: rustup target add ${{ matrix.target }}
- name: Build
run: cargo build --release --bin ${{ env.BIN_NAME }} --target ${{ matrix.target }}
- name: Install cargo-about
run: cargo install cargo-about --locked
- name: Generate license bundle
run: cargo about generate about.hbs -o THIRDPARTY-LICENSES.html
- name: Package (Linux & macOS)
if: runner.os != 'Windows'
shell: bash
@ -59,7 +110,12 @@ jobs:
BIN_PATH=target/$TARGET/release/$BIN$EXT
mkdir -p dist
ARCHIVE=$BIN-$TARGET.zip
-zip -9 "dist/$ARCHIVE" "$BIN_PATH" THIRDPARTY-LICENSES.html LICENSE* COPYING*
+files=("$BIN_PATH" THIRDPARTY-LICENSES.html)
shopt -s nullglob
license_files=(LICENSE* COPYING*)
shopt -u nullglob
files+=("${license_files[@]}")
zip -9 "dist/$ARCHIVE" "${files[@]}"
echo "ASSET=$ARCHIVE" >> "$GITHUB_ENV"
- name: Package (Windows)
@ -72,18 +128,161 @@ jobs:
$BinPath = "target/$Target/release/$Bin$Ext"
New-Item -ItemType Directory -Path dist -Force | Out-Null
$Archive = "$Bin-$Target.zip"
$LicenseFiles = @(Get-ChildItem -Path 'LICENSE*', 'COPYING*' -File -ErrorAction SilentlyContinue | ForEach-Object { $_.FullName })
$Files = @($BinPath, 'THIRDPARTY-LICENSES.html') + $LicenseFiles
# PowerShells native ZIP
Compress-Archive `
-Path $BinPath, 'THIRDPARTY-LICENSES.html', 'LICENSE*', 'COPYING*' ` -Path $Files `
-DestinationPath "dist/$Archive" `
-CompressionLevel Optimal
Add-Content -Path $env:GITHUB_ENV -Value "ASSET=$Archive"
-- name: Upload to the release
+- name: Upload build artifact
-uses: softprops/action-gh-release@v2
+uses: actions/upload-artifact@v7
with:
-files: dist/${{ env.ASSET }}
+name: release-${{ matrix.target }}
path: dist/${{ env.ASSET }}
if-no-files-found: error
retention-days: 1
reproducibility:
name: reproducibility-check
needs: frontend
runs-on: ubuntu-latest
continue-on-error: true
steps:
- name: Check out sources
uses: actions/checkout@v6
with:
ref: ${{ env.RELEASE_TAG }}
- name: Download prebuilt frontend dist
uses: actions/download-artifact@v8
with:
name: frontend-dist
path: src/server/assets/dist/
- name: Install Rust toolchain
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: stable
target: x86_64-unknown-linux-gnu
cache: true
- name: Build twice and diff hashes
shell: bash
env:
RUSTFLAGS: "--remap-path-prefix=${{ github.workspace }}=/build"
run: |
set -euo pipefail
TARGET=x86_64-unknown-linux-gnu
BIN=${{ env.BIN_NAME }}
BIN_PATH="target/$TARGET/release/$BIN"
SOURCE_DATE_EPOCH=$(git log -1 --format=%ct HEAD)
export SOURCE_DATE_EPOCH
echo "SOURCE_DATE_EPOCH=$SOURCE_DATE_EPOCH"
cargo build --release --bin "$BIN" --target "$TARGET"
HASH1=$(sha256sum "$BIN_PATH" | awk '{print $1}')
echo "first build: $HASH1"
cargo clean --release --target "$TARGET"
cargo build --release --bin "$BIN" --target "$TARGET"
HASH2=$(sha256sum "$BIN_PATH" | awk '{print $1}')
echo "second build: $HASH2"
if [ "$HASH1" != "$HASH2" ]; then
echo "::error::Reproducibility check failed: builds are not bit-identical"
echo " first: $HASH1"
echo " second: $HASH2"
exit 1
fi
echo "::notice::Reproducible build verified (sha256=$HASH1)"
publish:
name: publish-release
runs-on: ubuntu-latest
needs: [build]
permissions:
contents: write
id-token: write
attestations: write
steps:
- name: Check out sources
uses: actions/checkout@v6
with:
ref: ${{ env.RELEASE_TAG }}
- name: Generate CycloneDX SBOM
uses: anchore/sbom-action@v0
with:
path: .
format: cyclonedx-json
output-file: nyx-${{ env.RELEASE_TAG }}.cdx.json
upload-artifact: false
upload-release-assets: false
- name: Download all build artifacts
uses: actions/download-artifact@v8
with:
path: release-artifacts
pattern: release-*
merge-multiple: true
- name: Generate SHA256SUMS
run: |
set -euo pipefail
cd release-artifacts
ls -lh
sha256sum *.zip > SHA256SUMS
cat SHA256SUMS
# Sigstore keyless signing. Verify with:
# cosign verify-blob --bundle <file>.bundle \
# --certificate-identity-regexp 'https://github.com/elicpeter/nyx/.*' \
# --certificate-oidc-issuer https://token.actions.githubusercontent.com \
# <file>
- name: Install cosign
uses: sigstore/cosign-installer@v4.1.2
- name: Cosign keyless sign release artifacts
shell: bash
run: |
set -euo pipefail
SBOM="nyx-${{ env.RELEASE_TAG }}.cdx.json"
(
cd release-artifacts
for f in *.zip SHA256SUMS; do
cosign sign-blob --yes \
--bundle "$f.bundle" \
"$f"
done
)
cosign sign-blob --yes \
--bundle "$SBOM.bundle" \
"$SBOM"
# SLSA v1 provenance. Verify with `gh attestation verify <file> --repo <repo>`.
- name: Generate SLSA build provenance
uses: actions/attest-build-provenance@v4
with:
subject-path: |
release-artifacts/*.zip
release-artifacts/SHA256SUMS
nyx-${{ env.RELEASE_TAG }}.cdx.json
- name: Upload to the release
uses: softprops/action-gh-release@v3
with:
tag_name: ${{ env.RELEASE_TAG }}
files: |
release-artifacts/*.zip
release-artifacts/*.zip.bundle
release-artifacts/SHA256SUMS
release-artifacts/SHA256SUMS.bundle
nyx-${{ env.RELEASE_TAG }}.cdx.json
nyx-${{ env.RELEASE_TAG }}.cdx.json.bundle
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

104
.github/workflows/repro-bare.yml vendored Normal file

@ -0,0 +1,104 @@
# Replay every tree-committed dynamic repro bundle with host language
# toolchains blocked so we catch regressions where a bundle silently
# depends on an interpreter the operator does not have.
#
# The setup step prepends deny-list wrappers for python3, node, ruby,
# php, and Java so the only toolchain the bundle can use is the docker
# daemon. reproduce.sh in --docker mode pulls the pinned base image
# (via docker_pull.sh) and runs the harness inside the container; if the
# bundle accidentally relied on a host interpreter the run falls over
# before the sentinel check.
#
# Adding a new fixture: extend the `matrix.fixture` list with the new
# `tests/repro_fixtures/<toolchain_id>/<spec_hash>` path. The bundle
# must already exist on disk, see tests/repro_fixture_bundles.rs for
# the regeneration recipe.
name: repro-bare
permissions:
contents: read
on:
push:
branches: ["master"]
pull_request:
branches: ["master"]
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
bare-image-replay:
name: repro-bare / ${{ matrix.fixture }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
fixture:
- tests/repro_fixtures/python-3.11/repro
steps:
- uses: actions/checkout@v6
- name: Block host language toolchains
run: |
set -euo pipefail
# Do not mutate the hosted runner image. ubuntu-latest carries
# preinstalled and cached language runtimes, and apt package
# relationships can shift underneath us as the image is updated.
# A PATH-level deny layer gives this job the bare-host semantics it
# needs without depending on apt being able to uninstall core bits.
deny_dir="${RUNNER_TEMP}/nyx-deny-toolchains"
mkdir -p "$deny_dir"
for exe in \
python python3 python3.10 python3.11 python3.12 python3.13 python3.14 \
node npm npx corepack \
ruby gem bundle \
php \
java javac jar
do
{
printf '%s\n' '#!/bin/sh'
printf '%s\n' 'echo "error: host language toolchain is disabled in repro-bare; use the Docker replay path" >&2'
printf '%s\n' 'exit 127'
} > "${deny_dir}/${exe}"
chmod +x "${deny_dir}/${exe}"
done
export PATH="${deny_dir}:${PATH}"
echo "${deny_dir}" >> "${GITHUB_PATH}"
hash -r 2>/dev/null || true
# Confirm the deny layer is active — surface the failure here
# rather than inside reproduce.sh where it would look like a
# bundle bug.
for exe in python3 node ruby php java; do
resolved="$(command -v "${exe}" || true)"
if [ "${resolved}" != "${deny_dir}/${exe}" ]; then
echo "error: ${exe} deny wrapper is not first on PATH (got ${resolved:-not found})" >&2
exit 1
fi
if "${exe}" --version >/dev/null 2>&1; then
echo "error: ${exe} still runs after host-toolchain block" >&2
exit 1
fi
done
if ! command -v docker >/dev/null 2>&1; then
echo "error: docker is no longer reachable after host-toolchain block" >&2
exit 1
fi
- name: Verify docker is reachable
run: docker info
- name: Pre-pull pinned image
working-directory: ${{ matrix.fixture }}
run: ./docker_pull.sh
- name: Replay bundle via docker
working-directory: ${{ matrix.fixture }}
run: ./reproduce.sh --docker

45
.github/workflows/scorecard.yml vendored Normal file

@ -0,0 +1,45 @@
name: OSSF Scorecard
on:
branch_protection_rule:
schedule:
- cron: "0 7 * * 1"
push:
branches: ["master"]
workflow_dispatch:
permissions: read-all
jobs:
analysis:
name: scorecard
runs-on: ubuntu-latest
permissions:
security-events: write
id-token: write
contents: read
steps:
- uses: actions/checkout@v6
with:
persist-credentials: false
- name: Run analysis
uses: ossf/scorecard-action@v2.4.3
with:
results_file: results.sarif
results_format: sarif
# Flip to true once we're happy with the score and want the badge.
publish_results: false
- name: Upload SARIF artifact
uses: actions/upload-artifact@v7
with:
name: scorecard-sarif
path: results.sarif
retention-days: 14
- name: Upload SARIF to Security tab
uses: github/codeql-action/upload-sarif@v4
with:
sarif_file: results.sarif

20
.gitignore vendored

@ -1,2 +1,22 @@
/target
/fuzz/target
/fuzz/corpus
/fuzz/dynamic_corpus/target
/fuzz/artifacts
/.idea
/frontend/node_modules
/src/server/assets/dist
/marketing
/.nyx
/.nyx-build-cache
/logs
/book
.DS_Store
.z3-trace
.pitboss
.eval-corpus
.node_modules-target
node_modules
__pycache__/
*.pyc
tools/sb-trace/*.trace.raw

36
AI-POLICY.md Normal file

@ -0,0 +1,36 @@
# AI Contribution Policy
Nyx accepts contributions that were drafted, refactored, or reviewed with the help of AI tools (LLMs, code assistants, agent systems). We care about the contribution, not the keystrokes. AI changes the failure modes though, so we ask contributors to follow a few rules.
## What we ask of contributors
By opening a pull request you affirm that:
1. **You have read and understood every line you are submitting.** If you cannot explain a change under review, it is not ready to merge. "The model wrote it" is not an answer we will accept for a bug or a regression.
2. **You have the right to submit the code.** AI-generated code is only as license-clean as its training data and its prompt. Do not paste proprietary, GPL-incompatible, or confidential code into an AI tool and then submit the output here. If a model reproduced a substantial verbatim snippet from an identifiable source, disclose it.
3. **You take responsibility for the change.** The DCO `Signed-off-by:` trailer applies the same way to AI-assisted code as it does to hand-written code. You are certifying origin and right-to-submit.
4. **You disclose material AI use in the PR description.** A one-line note is enough. For example, "Drafted with an AI assistant; reviewed and tested by me." Trivial uses like tab-completion, renames, or formatting do not need to be called out. New analysis passes, rule logic, or security-relevant code do.
## What we look for in review
AI-assisted PRs face the same bar as any other PR, but reviewers will pay extra attention to:
- **Tests that exercise the new behavior.** Not just "it compiles." Fixtures under `tests/fixtures/` and assertions in `expected.yaml` are how we verify security logic.
- **Consistency with the existing engine.** Drive-by refactors, speculative abstractions, or parallel implementations of existing passes will usually be rejected, even if they look clean in isolation.
- **Fabricated references.** AI tools sometimes invent function names, crate APIs, CVE IDs, or citations. Every symbol referenced in a PR must exist, and every external claim must be verifiable.
- **Rule metadata honesty.** Rule descriptions, CWE mappings, and severity ratings are part of how downstream users triage. Do not inflate severity or cite CWEs the rule does not actually detect.
## What we will not accept
- PRs that are clearly unreviewed agent output, such as changes in the wrong file, nonsense tests, hallucinated APIs, or code that does not compile.
- PRs that add "AI-generated" boilerplate, marketing copy, or filler documentation to pad scope.
- Mass-generated PRs across many unrelated areas in a single change.
- Code that was generated by pasting another project's proprietary source into an AI tool.
## Project's own use of AI
For transparency, the README includes an [AI Disclosure](README.md#ai-disclosure) describing where AI was used in Nyx itself. The short version: the analysis engine is predominantly human-written and human-reviewed, while documentation, fixtures, and rule metadata were drafted with AI assistance and audited before landing. We hold outside contributions to the same standard.
## Questions
If you are unsure whether a contribution falls inside this policy, open a draft PR or an issue and ask before investing time. We would rather have the conversation early than reject work at review.


@ -1,64 +1,551 @@
# Changelog
All notable changes to Nyx are documented here. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html). For where Nyx is going, see the [Roadmap](ROADMAP.md).
## [0.8.0] - 2026-06-06
The dynamic-verification release. An attack-surface map, a sandboxed dynamic verifier, a framework adapter registry that grounds both, the per-language build infrastructure that makes per-finding verification affordable at corpus scale, and the first real-corpus acceptance gates.
The attack-surface map and chain composer turn the flat finding list into a route-to-sink graph. The dynamic verifier re-runs every Medium-or-higher finding against a payload corpus and stamps a Confirmed / PartiallyConfirmed / NotConfirmed / Inconclusive / Unsupported verdict on each. The adapter registry (130+ entries across 8 languages) covers HTTP, message-broker, scheduled-job, GraphQL, WebSocket, middleware, and migration entry points. Per-language build pools and copy-on-write workdirs hold the with-verify wall-clock to within 1.5x of a static-only scan.
### Attack-surface map
- **`nyx surface` subcommand.** Prints the project's entry points, datastores, external services, and dangerous local sinks as text, JSON, Graphviz `dot`, or rendered SVG. Loads the persisted `SurfaceMap` from the most recent indexed scan when available, or rebuilds inline from source. `--build` forces a full pass-1 + call-graph walk so DataStore / ExternalService / DangerousLocal nodes populate on an unscanned project.
- **Surface page in `nyx serve`.** New `SurfacePage` renders the same graph in the browser UI, with ELK layout, sidebar navigation, and a wide-canvas SVG viewer. Persists alongside the index so the frontend reloads without a rescan.
- **Chain findings.** `ChainFinding` records connect a route entry point to a downstream sink via the call graph + surface map. The composer scores `(impact × evidence)` per chain, queues the top-N for composite reverification, and wires the result into `findings.json` / SARIF / the dashboard. Chains rank above isolated findings.
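As a rough illustration of the chain ranking described above, the following Python sketch scores each chain as `impact * evidence` and keeps the top N for reverification. It is not the composer's actual code (Nyx itself is written in Rust); the `Chain` class, its field names, and the 0..1 scales are assumptions made only for this example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """Hypothetical stand-in for a ChainFinding; field names are assumptions."""
    route: str
    sink: str
    impact: float    # assumed scale: 0.0 to 1.0
    evidence: float  # assumed scale: 0.0 to 1.0

    @property
    def score(self) -> float:
        # The changelog describes the score as (impact x evidence).
        return self.impact * self.evidence


def top_chains(chains, n):
    """Return the n highest-scoring chains, best first."""
    return heapq.nlargest(n, chains, key=lambda c: c.score)


chains = [
    Chain("GET /search", "sql_query", impact=0.9, evidence=0.5),
    Chain("POST /upload", "file_io", impact=0.6, evidence=0.9),
    Chain("GET /health", "html_escape", impact=0.2, evidence=0.4),
]
queue = top_chains(chains, 2)
print([c.route for c in queue])
```

With these made-up numbers the upload chain outranks the search chain because its stronger evidence outweighs the lower impact, which is the trade-off a product score is meant to capture.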
### Framework adapter registry
`src/dynamic/framework/` ships a `FrameworkAdapter` trait with concrete adapters across 8 languages (116 entries today, growing per release). Each adapter binds a route / handler / consumer pattern to a `FrameworkBinding` so the surface map and dynamic verifier can locate entry points without re-walking the AST.
- **HTTP routers.** Flask, Django, FastAPI, Starlette (Python); Express, Koa, NestJS, Fastify (JS/TS); Spring, Quarkus, Micronaut, Jakarta Servlet (Java); Gin, Echo, Fiber, Chi (Go); Axum, Actix, Rocket, Warp (Rust); Rails, Sinatra, Hanami (Ruby); Laravel, Symfony, CodeIgniter (PHP).
- **New `EntryKind` variants.** `ClassMethod`, `MessageHandler`, `ScheduledJob`, `GraphQLResolver`, `WebSocket`, `Middleware`, `Migration` join the existing `RouteHandler` / `Function` set so the surface map shows non-HTTP entry surfaces.
- **Message broker handlers.** Kafka, AWS SQS, Google Pub/Sub, NATS, and RabbitMQ consumers across Python, Node, Java, and Go.
- **Scheduled jobs.** Celery (Python), Sidekiq (Ruby), Quartz (Java), plain cron expression recognition.
- **GraphQL resolvers.** Apollo, Relay, gqlgen, Juniper, Graphene.
- **WebSocket handlers.** ws, Socket.IO, ActionCable, Django Channels.
- **Middleware + migrations.** Express, Laravel, Spring, Django, Rails middleware; Django, Flask, Laravel, Rails, Prisma, Sequelize migration scripts.
- **Sanitizer-aware adapter strengthening.** Every XXE, header-injection, open-redirect, SSTI, LDAP, XPath, deserialization, crypto, and data-exfiltration adapter rejects bindings when the surrounding source visibly hardens the parser (`disallow-doctype-decl`, `resolve_entities=False`, `libxml_disable_entity_loader`), routes the value through a known encoder (`LdapEncoder.filterEncode`, `escape_filter_chars`, `ldap_escape`), swaps a weak primitive for a CSPRNG (`secrets.token_bytes`, `crypto.randomBytes`, `SecureRandom`), or validates the destination host through an allowlist. Cuts adapter FPs without losing the genuinely dangerous calls.
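The sanitizer-aware rejection can be pictured with a small Python sketch. The marker strings are the ones quoted in the entry above; everything else (the `HARDENING_MARKERS` table, the `should_bind` helper, and the use of plain substring matching) is a simplifying assumption, and the real adapters may well inspect the syntax tree instead.

```python
# Hardening markers quoted from the changelog entry, grouped by cap.
# The grouping keys and the table itself are assumptions for this sketch.
HARDENING_MARKERS = {
    "xxe": (
        "disallow-doctype-decl",
        "resolve_entities=False",
        "libxml_disable_entity_loader",
    ),
    "ldap": ("LdapEncoder.filterEncode", "escape_filter_chars", "ldap_escape"),
    "crypto": ("secrets.token_bytes", "crypto.randomBytes", "SecureRandom"),
}


def should_bind(cap: str, surrounding_source: str) -> bool:
    """Return False when the surrounding source visibly hardens the call."""
    markers = HARDENING_MARKERS.get(cap, ())
    return not any(marker in surrounding_source for marker in markers)


hardened = (
    "parser = etree.XMLParser(resolve_entities=False)\n"
    "tree = etree.parse(f, parser)"
)
unhardened = "tree = etree.parse(f)"
print(should_bind("xxe", hardened), should_bind("xxe", unhardened))
```

A substring check like this is deliberately conservative in one direction only: it can suppress a binding when a marker is present, but it never creates one.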
### Dynamic verification
- **`nyx scan --verify`.** Every finding with `Confidence >= Medium` is re-executed inside a sandboxed harness against a curated payload corpus. The verdict (`Confirmed` / `NotConfirmed` / `Inconclusive` / `Unsupported`) lands on `Evidence.dynamic_verdict` and shows up in console output, JSON, SARIF, and the dashboard via a new `VerdictBadge` component on the finding detail page.
- **Backends.** In-process on Linux with `Standard` / `Strict` hardening (namespace unshare, chroot, RLIMIT cap, seccomp filter), in-process on macOS via `sandbox-exec` with a profile-per-policy wrap, Docker with a published image-builder catalogue, and a Firecracker trait stub for future microVM execution. The Docker backend ships native binary support for Rust and Go so harnesses no longer need to drag a toolchain into every image.
- **Language coverage.** Per-language harness emitters for Python, JS/TS, Go, Java, PHP, Ruby, Rust, C, and C++. Stub harness intercepts SQL, HTTP, Redis, and filesystem boundaries so the verdict reflects the sink, not the network. The `JSON_PARSE`, `UNAUTHORIZED_ID`, and `DATA_EXFIL` cap dispatchers are wired into every emitter that ships these caps (Python, JS, TS, Go, Java, PHP, Ruby, Rust), so the verdict pipeline closes the loop on each cap end-to-end rather than per-language piecemeal.
- **Abstract-interpretation and symex sanitizer suppression.** Symbolic execution and the interval/string abstract domain are now consulted at verdict time, so a payload that the static engine would call dangerous but symex can prove never reaches the sink lands as NotConfirmed.
- **Guard-aware verdicts.** When a known input-validation or output-sanitization middleware sits in front of a Confirmed sink (Spring `@PreAuthorize`, Express `helmet`, Nest `@UseGuards`, Django `@permission_classes`, and the per-language registry in `src/dynamic/framework/auth_markers.rs`), the verdict demotes to `ConfirmedWithKnownGuard` and the guard names land on `differential.known_guards`. Authentication-only filters do not trigger the demotion since they do not mitigate injection.
- **Repro bundles.** Every verified finding writes a hermetic bundle to `~/.cache/nyx/dynamic/repro/<spec_hash>/` with `reproduce.sh`, `expected/{verdict.json,outcome.json,trace.jsonl}`, and a `docker_pull.sh` when the toolchain is pinned in `tools/image-builder/images.toml`. `--verbose` flushes the per-step `VerifyTrace` to stderr for live triage.
- **Real-engine harness paths.** LDAP injection routes through an embedded LDAPv3 BER server, exercised from Java via JNDI `InitialDirContext` and from Python and PHP via pure-stdlib BER clients. XPath injection runs against the live parser in each language: Java `javax.xml.xpath`, PHP `DOMXPath`, JS `xpath` npm, Python `lxml`. `Cap::CRYPTO` lands a `WeakKey` probe across Python, Go, Java, PHP, and Rust that flags sub-2^16 keys produced by non-CSPRNG sources. A new `HeaderSmuggledInWire` oracle predicate catches CRLF smuggling on hand-rolled raw-socket HTTP servers (Python `http.server`, Node `net`, Rust `std::net::TcpListener`) where framework-level CRLF strip cannot intervene.
- **Differential rule v2 and partial confirmations.** A finding confirms when *any* vulnerable payload in the set fires and *every* paired benign control stays clean, replacing the strict pair-wise rule so a single missing control no longer downgrades a confirmable finding. A new `PartiallyConfirmed` verdict marks findings where the sink is reached but the exploit chain does not complete (no marker written, no callback observed), so engine work can ratchet without the tool overstating what it proved.
- **Spec derivation v2.** Every derivation strategy now runs and is scored on flow-step depth, framework binding, cross-file source resolution via `GlobalSummaries`, and payload availability; the highest-scoring candidate wins and the runner-up ranking lands in the trace so engine gaps stay visible. Cross-file seeding walks the call graph (max depth 5) until a `Source` step or framework binding is found. New `EntryKind` adapters auto-recover the entry surface from framework decorators and annotations.
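The differential rule v2 and the `PartiallyConfirmed` verdict can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the verifier's implementation: the `Run` record and its field names are hypothetical, and the changelog does not say what happens when a benign control fires, so returning `Inconclusive` in that branch is a placeholder choice.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One payload execution; field names are assumptions for this sketch."""
    benign: bool        # True for a paired benign control
    sink_reached: bool  # the payload reached the sink
    oracle_fired: bool  # e.g. marker written or callback observed


def verdict(runs):
    vulnerable = [r for r in runs if not r.benign]
    controls = [r for r in runs if r.benign]
    any_fired = any(r.oracle_fired for r in vulnerable)
    controls_clean = all(not r.oracle_fired for r in controls)
    if any_fired and controls_clean:
        # Rule v2: any vulnerable payload fires, every control stays clean.
        return "Confirmed"
    if any_fired:
        # Assumption: a firing control makes the result untrustworthy.
        return "Inconclusive"
    if any(r.sink_reached for r in vulnerable):
        # Sink reached, but the exploit chain did not complete.
        return "PartiallyConfirmed"
    return "NotConfirmed"


confirmed = [Run(False, True, False), Run(False, True, True), Run(True, True, False)]
partial = [Run(False, True, False), Run(True, False, False)]
print(verdict(confirmed), verdict(partial))
```

Note that the first example confirms even though one of the two vulnerable payloads did not fire, which is the behavioural difference from a strict pair-wise rule.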
### Performance
- **Per-language build pools.** A warm `javac` daemon compiles batched harness sources in one long-lived JVM (Track O headline, Phase 22); Node, PHP, Ruby, Go, Rust, C, and C++ reuse shared module / package / object caches; Python layers a read-only venv per `requirements_hash` with a warmed bytecode cache. Target per-finding harness build: P50 ≤ 200ms hot, ≤ 1.5s cold. Pools self-skip when a toolchain is absent so toolchain-less CI rows stay green.
- **Copy-on-write workdirs.** Per-finding workdir setup uses `clonefile` on macOS and `reflink` / `copy_file_range` on Linux instead of copying every harness file, cutting setup cost to single-digit milliseconds.
- **Cap-routed concurrency lanes.** The verifier worker pool splits into per-cap lanes (`SSRF: 8`, `DESERIALIZE: 2`, `CRYPTO: 1`, and so on) so a slow harness for one cap cannot head-of-line block fast ones.
- **Ship-gate budgets.** Gate 3 holds the with-verify / static-only wall-clock ratio at ≤ 1.5x on `benches/fixtures/`; Gate 6 holds the Java OWASP Benchmark `--verify` run at ≤ 15 min on CI / ≤ 10 min on the dev reference machine.
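
The idea behind cap-routed lanes can be sketched as per-cap admission control. The `Lanes` type, its methods, and the default limit below are hypothetical; only the lane widths `SSRF: 8`, `DESERIALIZE: 2`, and `CRYPTO: 1` come from the entry above, and a real worker pool would add queuing and thread safety.

```rust
use std::collections::HashMap;

/// Per-cap admission control: each cap has its own in-flight limit, so a
/// saturated lane cannot consume slots that belong to another cap.
pub struct Lanes {
    limits: HashMap<&'static str, usize>,
    in_flight: HashMap<&'static str, usize>,
    default_limit: usize, // assumed fallback for caps without an explicit lane
}

impl Lanes {
    pub fn new(limits: &[(&'static str, usize)], default_limit: usize) -> Self {
        Lanes {
            limits: limits.iter().copied().collect(),
            in_flight: HashMap::new(),
            default_limit,
        }
    }

    /// Reserves a slot and returns true if the lane for `cap` has capacity.
    pub fn try_acquire(&mut self, cap: &'static str) -> bool {
        let limit = *self.limits.get(cap).unwrap_or(&self.default_limit);
        let used = self.in_flight.entry(cap).or_insert(0);
        if *used < limit {
            *used += 1;
            true
        } else {
            false
        }
    }

    /// Frees one slot in the lane for `cap`.
    pub fn release(&mut self, cap: &'static str) {
        if let Some(used) = self.in_flight.get_mut(cap) {
            *used = used.saturating_sub(1);
        }
    }
}
```

With `Lanes::new(&[("SSRF", 8), ("DESERIALIZE", 2), ("CRYPTO", 1)], 4)`, a second `CRYPTO` harness is refused while the first is running, but `SSRF` harnesses are still admitted.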
### Determinism, policy, telemetry
- **YAML policy deny list.** `src/policy.rs` is consulted before harness build. Network egress, filesystem writes outside the sandbox root, and process spawns can be denied per-rule; deny decisions land in the trace, redacted via the shared scrubber.
- **Seeded RNG.** `dynamic::rand::SpecRng` is seeded from each `HarnessSpec` hash so two runs of the same spec produce identical payloads. `scripts/check_no_unseeded_rand.sh` audits the tree for unseeded `rand` usage on every CI run.
- **`VerifyTrace` observability.** Every per-step decision (probe selection, payload mutation, oracle check, deny verdict) writes to the trace stream and the repro bundle.
- **Schema-versioned telemetry.** `events.jsonl` carries `schema_version`, `nyx_version`, `corpus_version`, `kind`, and `ts` on every envelope. PII and secret scrubbing runs on every persisted artefact via `src/utils/redact.rs`.
- **`NYX_NO_TELEMETRY=1`** disables event persistence outright.
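
The property behind the "Seeded RNG" entry (same spec, same payloads) can be illustrated with a small deterministic generator. This sketch does not reproduce `dynamic::rand::SpecRng`: `SketchRng` is a hypothetical stand-in that folds the hash bytes with FNV-1a and steps with SplitMix64, both chosen only for the example.

```rust
/// Deterministic generator seeded from a spec hash (illustrative stand-in).
pub struct SketchRng {
    state: u64,
}

impl SketchRng {
    /// Folds the hash bytes into a 64-bit seed with FNV-1a.
    pub fn from_spec_hash(spec_hash: &[u8]) -> Self {
        let mut seed: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
        for &b in spec_hash {
            seed ^= u64::from(b);
            seed = seed.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
        }
        SketchRng { state: seed }
    }

    /// One SplitMix64 step: advance the state, then mix it into an output.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9e37_79b9_7f4a_7c15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
        z ^ (z >> 31)
    }
}
```

Two generators built from the same hash bytes yield the same sequence, which is what makes a payload set reproducible across runs of one spec.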
### CVE corpus and ground truth
- **New `Cap` corpora.** Vulnerable + patched fixtures landed for the seven new cap classes (LDAP injection, XPath injection, header injection, open redirect, SSTI, XXE, prototype pollution) plus deserialization, crypto, JSON parsing, unauthorized-id, and data exfiltration. Every cap now carries at least one positive / negative / adversarial / unsupported fixture quad per supported language.
- **OWASP Benchmark v1.2 importer.** `tests/eval_corpus/owasp_gt_convert.py` converts the OWASP Java Benchmark expected-results manifest into Nyx ground truth and lands a 16k-line `owasp_benchmark_v1.2.json` for evaluation.
- **NIST SARD importer.** `tests/eval_corpus/sard_gt_convert.py` converts SARD test cases into the same format so cross-dataset recall numbers stay comparable.
- **Evaluation corpus tooling.** `tests/eval_corpus/run_full.sh` runs the Nyx benchmark, OWASP Benchmark, and NIST SARD evaluation sets and writes `tests/eval_corpus/results.json`. `tests/eval_corpus/report.py` and `tabulate.py` produce the per-cap and per-language summary used to track coverage and accuracy.
- **Real-corpus acceptance gates.** `scripts/m7_ship_gate.sh` adds Gate 6 (Java OWASP Benchmark v1.2), Gate 7 (NodeGoat + Juice Shop), and Gate 8 (RailsGoat, DVWA, DVPWA, gosec, RustSec). Each row enforces the per-`(cap, lang)` budget in `tests/eval_corpus/budget.toml` and publishes per-cap precision / recall / confirmed-rate against a committed ground truth. The corpora are not vendored; each row self-skips unless its `NYX_<NAME>_CORPUS` points at a checkout.
- **Per-spec cryptographic canary.** Every oracle marker is now derived from `BLAKE3(spec_hash || run_nonce)` rather than a fixed literal, so markers are unique per finding, collision-resistant against ambient harness output, and never leak to the host. A compile-time audit rejects any new ad-hoc canary.
### Engine
- **DB fast-fail preflight.** `Indexer::init` reads the first 16 bytes of any candidate SQLite file and rejects anything without the standard `SQLite format 3\0` magic. Stops a misnamed JSON / text file from corrupting the index path with a SQLite error halfway through migration.
- **Symbolic-execution coverage.** Symex now recognises a wider set of string operations (`substr`, `replace`, `to_lower`, `to_upper`, `trim`, `strlen`) per the value/transfer pipeline, and the abstract-interpretation framework reasons about interval and prefix/suffix string facts during the dynamic verdict pass.
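
The preflight in the "DB fast-fail preflight" entry amounts to a 16-byte header comparison. A minimal sketch follows; `looks_like_sqlite` is a hypothetical helper, not the code in `Indexer::init`, and treating inputs shorter than 16 bytes as non-SQLite is an assumption of the example.

```rust
use std::io::{self, Read};

/// The 16-byte header string that starts every SQLite 3 database file.
pub const SQLITE_MAGIC: &[u8; 16] = b"SQLite format 3\0";

/// Returns Ok(true) when `reader` starts with the SQLite magic.
/// Inputs shorter than 16 bytes are reported as non-SQLite (assumption).
pub fn looks_like_sqlite<R: Read>(mut reader: R) -> io::Result<bool> {
    let mut header = [0u8; 16];
    match reader.read_exact(&mut header) {
        Ok(()) => Ok(&header == SQLITE_MAGIC),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

Passing a `std::fs::File` checks a candidate on disk before any migration runs; other I/O errors are returned to the caller instead of being read as a mismatch.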
### CLI
- **`nyx scan --verify`** (enabled by default in standard builds) and `--backend {auto,process,docker}` select the dynamic-verification harness. `--no-verify` skips verification for a single run without changing config.
- **`nyx scan --harden {standard,strict}`** picks the process-backend hardening profile. `standard` is no-new-privs plus a memory rlimit on Linux. `strict` layers namespace unshare, chroot to the workdir, and a default-deny seccomp filter on Linux, or wraps the harness with `sandbox-exec` on macOS.
- **Patch-validation CI mode.** `--baseline FILE` reads a previous scan's JSON (or a stripped `.nyx/baseline.json` written by `--baseline-write`) and diffs it against the current scan on `stable_hash`, emitting `New` / `Resolved` / `FlippedConfirmed` / `FlippedNotConfirmed` transitions. `--gate {no-new-confirmed,resolve-all-confirmed}` exits non-zero when the diff violates the policy so CI fails the build instead of merging an unreviewed regression. The stripped baseline carries only `stable_hash`, `dynamic_verdict`, `severity`, `path`, and `rule_id`, so persisting it between scans does not leak source.
- **Repository triage in CI.** `nyx scan` now reads the same `.nyx/triage.json` file written by `nyx serve`. Terminal triage states (`false_positive`, `accepted_risk`, `suppressed`, `fixed`) are hidden from CLI output and excluded from `--fail-on` by default, while `--show-suppressed` includes them with `triage_state` / `triage_note` metadata for JSON, SARIF, and console output.
- **`nyx scan --verify-all-confidence`** drops the Medium cutoff and re-verifies everything.
- **`nyx scan --unsafe-sandbox`** disables hardening (development only, never for CI).
- **`nyx verify-feedback <finding_id> --wrong <reason> | --right`** records a correction or confirmation for a finding's verdict in the local telemetry log.
- **`nyx scan --explain-engine`** prints the effective engine configuration and exits without scanning.
- **`nyx surface`** (described above) with `--format {text,json,dot,svg}` and `--build`.
- **`nyx repro` subcommand.** Replays dynamic repro bundles by finding id, spec hash, or explicit bundle path, with `--docker`, `--print-path`, and `--list` helpers. The command now matches the reproduce command shown in the browser UI and uses bundle manifests to map stable finding ids to spec-hash cache directories.
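
The baseline diff described under "Patch-validation CI mode" can be sketched as a keyed comparison on `stable_hash`. The sketch reduces `dynamic_verdict` to a confirmed / not-confirmed boolean, and the `Scan` alias, both function names, and the reading of the `no-new-confirmed` gate are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transition {
    New,
    Resolved,
    FlippedConfirmed,
    FlippedNotConfirmed,
}

/// Maps `stable_hash` to whether the finding was dynamically confirmed.
pub type Scan = BTreeMap<String, bool>;

/// Compares two scans keyed on `stable_hash` and lists the transitions.
pub fn diff_scans(baseline: &Scan, current: &Scan) -> Vec<(String, Transition)> {
    let mut out = Vec::new();
    for (hash, &confirmed_now) in current {
        match baseline.get(hash) {
            None => out.push((hash.clone(), Transition::New)),
            Some(&confirmed_before) if confirmed_before != confirmed_now => {
                let t = if confirmed_now {
                    Transition::FlippedConfirmed
                } else {
                    Transition::FlippedNotConfirmed
                };
                out.push((hash.clone(), t));
            }
            Some(_) => {} // unchanged finding, no transition
        }
    }
    for hash in baseline.keys() {
        if !current.contains_key(hash) {
            out.push((hash.clone(), Transition::Resolved));
        }
    }
    out
}

/// Assumed reading of a `no-new-confirmed` gate: fail when a finding is newly
/// present and confirmed, or flipped to confirmed since the baseline.
pub fn violates_no_new_confirmed(current: &Scan, transitions: &[(String, Transition)]) -> bool {
    transitions.iter().any(|(hash, t)| match t {
        Transition::New => current.get(hash).copied().unwrap_or(false),
        Transition::FlippedConfirmed => true,
        _ => false,
    })
}
```

Because the comparison only needs the hash and the verdict, it works equally on a full scan JSON and on the stripped baseline.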
### Frontend
- **Project target selector in `nyx serve`.** The sidebar now remembers scan roots, lets you switch the active target, and accepts a new project path without restarting the server. `/api/targets` backs the selector, scans can opt into a different `scan_root`, and `nyx scan` / `nyx index build` register the projects they touch so `nyx serve` can pick them up later.
- **Surface page** with ELK auto-layout and the shared node-style palette.
- **Verdict badge** on finding detail, plus a dynamic-verdict section that surfaces the verdict, the payload that triggered it, and a link to the repro bundle.
- **Scan compare** gains a dynamic-verdict diff column so two scans can be compared on what was confirmed versus what was downgraded.
### License
- **Internal license grants documentation** at `LICENSE-GRANTS.md`. Grant 1 covers Nyctos derived works. The repo stays GPL-3.0-or-later; the grants file documents the scope of internal product licensing.
## [0.7.0] - 2026-05-11
A focused release that adds seven new vulnerability classes, ships two SSA sidecars for XML and XPath parser hardening, deepens cross-file authorization for FastAPI, trims roughly a thousand auth false positives on Go DAO helpers along with the dominant Hibernate Criteria SQL cluster, and runs a performance pass on the auth extractor, SCCP, and the global summaries map. A `nyx rules list` CLI surfaces the rule registry, the web UI gets a brand-aligned visual refresh, and the CVE corpus grows across Python, PHP, JavaScript, and C.
### Highlights
- New caps for LDAP injection, XPath injection, header / CRLF injection, open redirect, server-side template injection, XXE, and prototype pollution, with per-language label rules across all eight supported languages.
- Cross-file FastAPI authorization: `include_router` chains and module-level `APIRouter(dependencies=[…])` now lift onto every attached route, with `Security(..., scopes=[...])` recognised distinctly from `Depends(...)`.
- Type-tracked XML and XPath hardening through two new SSA sidecars: parser bodies that set `secure_processing` / `processEntities: false` / `resolve_entities=False`, and `XPath` instances bound to `setXPathVariableResolver(...)`, are recognised as safe.
- ~957 `go.auth.missing_ownership_check` findings closed on gitea-shaped DAO helpers (id-scalar precision pass); 169 of 216 openmrs `cfg-unguarded-sink` findings closed on Hibernate Criteria-API receivers; joomla and drupal `php.deser.unserialize` findings closed on `Serializable::unserialize($input)` magic-method bodies.
- `nyx rules list` CLI subcommand, brand-aligned `nyx serve` visual refresh, and regenerated README / docs screenshots and GIFs.
### Detector classes
- New `Cap` bits and canonical rule ids: `Cap::LDAP_INJECTION` / `taint-ldap-injection`, `Cap::XPATH_INJECTION` / `taint-xpath-injection`, `Cap::HEADER_INJECTION` / `taint-header-injection`, `Cap::OPEN_REDIRECT` / `taint-open-redirect`, `Cap::SSTI` / `taint-template-injection`, `Cap::XXE` / `taint-xxe`, `Cap::PROTOTYPE_POLLUTION` / `taint-prototype-pollution`. Each ships per-language sink, sanitizer, and gated-sink rules across JS/TS, Python, Java, PHP, Go, Ruby, Rust, and C/C++. Severity, OWASP 2021 mapping, and human-readable description live in `CAP_RULE_REGISTRY` in `src/labels/mod.rs`; `cap_rule_meta()` and `rule_id_for_caps()` are the public lookups.
- `Cap` widened from `u16` to `u32` to fit the new bits. `Evidence.sink_caps` and `RuleInfo.cap_bits` follow. The serde decoder accepts any unsigned integer width so caches written before the bump still load. SQLite schema bumped from 3 to 4 to force a rescan, since older `source_caps` / `sanitizer_caps` / `sink_caps` blobs were emitted before any of the new bits could appear.
- `owasp_bucket_for` consults `CAP_RULE_REGISTRY` first so adding a cap class no longer requires a second-table edit. The match requires an exact rule id or a recognised separator (` `, `(`, `.`) so a future `taint-ssrf-allowlist-violation` cannot silently inherit `taint-ssrf`'s bucket. The legacy family-token table now also routes `xpath`, `header`, and `xxe` to A03 / A05.
- `issue_category_label` (dashboard badge) routes the seven new rule-id prefixes to dedicated labels: LDAP Injection, XPath Injection, Header Injection, Open Redirect, Template Injection, XXE, Prototype Pollution.
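
The separator rule in the `owasp_bucket_for` entry can be sketched as a guarded prefix match. `matches_canonical_rule` is a hypothetical helper, and the suffixed rule-id shapes used in the usage note are made up to exercise the three separators.

```rust
/// Separators that may follow a canonical rule id in a longer rule string.
const SEPARATORS: [char; 3] = [' ', '(', '.'];

/// True when `rule_id` is exactly `canonical`, or starts with `canonical`
/// followed immediately by one of the recognised separators.
pub fn matches_canonical_rule(rule_id: &str, canonical: &str) -> bool {
    match rule_id.strip_prefix(canonical) {
        Some("") => true,
        Some(rest) => rest.starts_with(|c: char| SEPARATORS.contains(&c)),
        None => false,
    }
}
```

Under this rule `taint-ssrf` and `taint-ssrf.variant` match the `taint-ssrf` entry, while `taint-ssrf-allowlist-violation` does not, since `-` is not a separator.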
### Engine
- **XML-parser configuration tracking.** `src/ssa/xml_config.rs` runs alongside type-fact analysis and carries per-receiver `secure_processing` / `disallow_doctype` / `external_entities` flags forward through copy assignments and phi joins (meet for safe flags, sticky union for the unsafe `external_entities` polarity). `xxe_safe()` queries the result at the type-qualified `XmlParser.parse` sink and strips `Cap::XXE` when the parser was provably hardened (JAXP `setFeature(FEATURE_SECURE_PROCESSING, true)`, lxml `XMLParser(resolve_entities=False, no_network=True)`, fast-xml-parser `processEntities: false`). Persisted to `OptimizeResult.xml_parser_config`.
- **XPath-receiver configuration tracking.** `src/ssa/xpath_config.rs` mirrors the XML sidecar for Java's `XPath` instances: `setXPathVariableResolver(...)` flips the receiver's `has_resolver` flag, copy assignments union, phi joins meet. `xpath_safe()` strips `Cap::XPATH_INJECTION` at `xpath.evaluate(expr, ...)` / `xpath.compile(expr)` sinks when the receiver was provably bound to a resolver. Persisted to `OptimizeResult.xpath_config`.
- **Five new `TypeKind` variants.** `LdapClient` (JNDI `InitialDirContext` / `InitialLdapContext`, Spring `LdapTemplate`, ldapjs `createClient`, python-ldap `initialize`, ldap3 `Connection`), `XPathClient` (JAXP `newXPath`, lxml `etree.XPath`, npm `xpath`), `XmlParser` (JAXP factory products: `newDocumentBuilder`, `newSAXParser`, `getXMLReader`), `Template` (FreeMarker `new Template(...)` / `Configuration.getTemplate`), and `NullPrototypeObject` for JS/TS values produced by `Object.create(null)`. Wired into `constructor_type` for return-type inference and `TypeKind::label_prefix()` for type-qualified callee resolution. `XPathClient` is kept distinct from `DatabaseConnection` so a generic `pdo->query` SQL_QUERY sink does not collide with `xpath.query`.
- **`GateActivation::LiteralOnly`.** Strict literal-value activation: the gate fires only when the activation argument is a literal that matches `dangerous_values` / `dangerous_prefixes`. Unknown or dynamic activation argument suppresses (no conservative `ALL_ARGS_PAYLOAD` push). Used where the dangerous shape is identifiable only by an explicit literal flag, e.g. `jQuery.extend(true, target, src)` deep-merge against Backbone's `Model.extend({proto})`.
- **Two new path-state predicates for inline open-redirect sanitisers.** `RelativeUrlValidated` covers `x.startsWith("/")`, `x.starts_with("/")`, `x.startswith("/")`, PHP `strpos($x, "/") === 0`, and direct `x[0] === "/"`. `HostAllowlistValidated` covers `new URL(x).host === ALLOWED`, `urlparse(x).netloc == ALLOWED`, multi-statement `parsed.host_str() == "..."` for Rust, and `parsed.Host == "..."` / `parsed.Hostname() == "..."` for Go. Both clear `Cap::OPEN_REDIRECT` only on the validated branch, leaving any non-redirect taint downstream to fire on its own caps. The Go form gates on case-sensitive capital `H` so a lowercase `u.host == X` field comparison falls through to the generic `Comparison` predicate.
- **`Object.create(null)` recogniser.** `is_object_create_null_call` in `cfg/literals.rs` matches `Object.create(null)` (and parenthesised, awaited, or TS type-cast wrappers) and tags `CallMeta.produces_null_proto = true`. Type-fact analysis lifts the flag to `TypeKind::NullPrototypeObject` on the returned SSA value so the synthetic `__index_set__` sink is suppressed flow-sensitively. Phi joins drop the tag back to `Unknown` so a partial null-proto receiver still fires on the unsafe path.
- **CFG-layer prototype-pollution suppression** at the synthetic `__index_set__` sink (JS/TS, recognised by the existing `try_lower_subscript_write` lowering). Three flow-insensitive shapes elide the `Sink(PROTOTYPE_POLLUTION)` label before SSA sees the node: constant-key fold (literal key not in `__proto__` / `constructor` / `prototype`), reject pattern (sibling `if (idx === "__proto__" || ...) return / throw / break;`), and allowlist pattern (ancestor `if (idx === "name" || idx === "id") { obj[idx] = v }`). Walks stop at the enclosing function so closure-captured guards in an outer scope cannot silently authorise inner assignments.
- **Spring MVC `return "redirect:" + tainted` recogniser** (Java). `try_lower_spring_redirect_return` in `cfg/mod.rs` matches the leftmost `+`-chain whose root is a `redirect:` string literal and emits a synthetic `__spring_redirect__` Call sink with `Sink(Cap::OPEN_REDIRECT)` between the predecessors and the Return node. Concatenated identifiers from anywhere in the right-hand chain feed the synthetic node's `arg_uses[0]`, so the taint pipeline carries any tainted suffix through OPEN_REDIRECT.
- **Subscript-set form classification for header sinks.** `response.headers["X-Foo"] = bar` / `headers["X-Foo"] = bar` (Ruby `element_reference`, JS/TS `subscript_expression`, Python `subscript`) had no `property` field on the LHS. `push_node` now walks into the subscript's `object` and classifies its member-expression text, so `Cap::HEADER_INJECTION` fires on the bare bracket form alongside `setHeader` / `res.set` / `headers_mut.insert`.
- **PHP literal extraction** extended in `cfg/literals.rs`: PHP `encapsed_string` (double-quoted) when every child is a pure-literal segment; boolean literals (`true` / `false`) for the jQuery `extend(true, ...)` `LiteralOnly` gate; leading-string `binary_expression` concat (`"Location: " . $url`, JS/TS `"Location: " + url`) so `dangerous_prefixes` matching activates on partially dynamic concatenations.
- **PHP receiver-text strip** in `helpers::root_receiver_text` drops the leading `$` from `variable_name` nodes so `$smarty->fetch(...)` / `$twig->createTemplate(...)` reconstruct as `Smarty.fetch` / `Environment.createTemplate` for suffix-matcher gates.
- **Gate-callee resolution hardening for member-source rewrites.** When `first_member_label` rewrites a call's `text` to a Source like `req.body`, the gate matcher now reads the call's `function` / `method` / `name` field instead, so `setValue(target, req.body, ...)` matches the `setValue` proto-pollution gate. Whitespace stripped from the function field so multi-line chains still match flat gate matchers.
- **Ruby option-constant lookup in gate activation.** Bare `scope_resolution` / `constant` nodes (`Nokogiri::XML::ParseOptions::NOENT`) now fall back to the macro-arg extractor used by C/C++/PHP, so Nokogiri XXE gates activate on idiomatic option-flag arguments.
- **PHP `unary_op_expression` negation recognition.** tree-sitter-php emits `unary_op_expression` for unary `!`; CFG `detect_negation` and condition-chain decomposition now match it, so `if (!validate($x))` no longer carries `condition_negated=false` and the surviving branch is the rejection arm, not the validated one.
- **PHP container kinds.** `declaration_list`, `interface_declaration`, `trait_declaration`, `enum_declaration`, `enum_declaration_list` mapped to `Kind::Block` so methods inside them participate in CFG construction.
- **Go variadic `parameter_declaration` named-field handling** for `collect_param_names`. `name` and `type` named fields read directly so type-segment identifiers no longer pollute the param-name set (`info *PackageInfo` no longer contributes `PackageInfo`).
- **Empty-formals SSA lowering signal.** Per-parameter summary probing now seeds via `BodyMeta.param_destructured_fields`; JS/TS arrow `() => {…}` lowers with `with_params=true` so it is treated as "explicitly zero formals" rather than "no formals info".
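
The phi-join behaviour described under "XML-parser configuration tracking" (meet for safe flags, sticky union for the unsafe polarity) can be sketched on a three-flag record. The struct layout and the exact condition in `xxe_safe` are assumptions; the real sidecar in `src/ssa/xml_config.rs` may weigh the flags differently.

```rust
/// Per-receiver parser facts (field names from the changelog, layout assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct XmlParserConfig {
    pub secure_processing: bool,
    pub disallow_doctype: bool,
    pub external_entities: bool,
}

impl XmlParserConfig {
    /// Join at a phi node: a safe flag survives only if both predecessors set
    /// it (meet); the unsafe flag survives if either predecessor set it.
    pub fn join(self, other: Self) -> Self {
        XmlParserConfig {
            secure_processing: self.secure_processing && other.secure_processing,
            disallow_doctype: self.disallow_doctype && other.disallow_doctype,
            external_entities: self.external_entities || other.external_entities,
        }
    }

    /// Assumed safety query: entities never enabled and at least one
    /// hardening flag proven on every path.
    pub fn xxe_safe(self) -> bool {
        !self.external_entities && (self.secure_processing || self.disallow_doctype)
    }
}
```

Joining a hardened parser with an unconfigured one loses the safe flag, so the sink keeps `Cap::XXE` unless every incoming path hardened the receiver.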
### Authorization
- **FastAPI cross-file `include_router` dependency tracking.** `auth_analysis/router_facts.rs` captures per-file router declarations (`<router> = X(deps=[…])`) and `<parent>.include_router(<child_module>.<child_var>)` edges in pass 1, persists them into `GlobalSummaries::router_facts_by_module`, and resolves them into the active file's `AuthorizationModel::cross_file_router_deps` at pass 2 entry. Transitive lifts (grandparent to parent to child) handled by iterative index walk. Module identity is the file basename without `.py`. Closes the airflow execution-API shape where a child router lives in `routes/task_instances.py` and its auth is declared on the parent in `routes/__init__.py`.
- **FastAPI router-level `dependencies=[...]` propagation.** Module-level `router = APIRouter(dependencies=[Security(...)])` is pre-walked once per file and merged onto every `@<router>.<verb>(...)` route attached in the same file. Closes airflow execution-API routes that re-use a single `ti_id_router` declared once at module scope.
- **FastAPI `Security(callable, scopes=[...])` recognised distinctly from `Depends(callable)`.** Scoped Security promotes the synthetic `AuthCheck` to `AuthCheckKind::Other` (route-level scope-checked authorization), not Login. New scope-tracking boolean threaded through `expand_decorator_calls` and `extract_fastapi_dependencies`.
- **Caller-scope IPA: same-file route-handler-to-helper auth lift.** `apply_caller_scope_propagation` walks every non-route helper unit; if its in-file callers are non-empty AND every caller is itself an authorized route handler (route-level non-Login auth check) or already authorized via this same propagation, the caller's checks lift onto the helper as synthetic `is_route_level=true` `AuthCheck`s. Iterated to a small fixpoint so transitive helper chains (route to mid_helper to leaf_helper) are covered. Refuses to authorize helpers with no in-file caller, helpers called from a mix of authorized and unauthorized callers, and helpers called only from un-lifted helpers. Cross-file lifting is not implemented. Closes the dominant FastAPI / Django / Flask "route authenticates via decorator/dependency, then delegates to a private helper that performs the sink" FP shape on sentry / saleor / airflow.
- **Go DAO-helper id-scalar precision pass.** For non-route Go units, a parameter whose declared type is a bounded primitive scalar (`int64`, `uint32`, `string`, `bool`, `byte`, `rune`, `float64`, …) and whose name is id-shaped (`id`, `*Id`, `*_id`, `*ids`) is dropped from `unit.params` before ownership-check evaluation. Real Go HTTP handlers always carry a framework-request-typed param (`*http.Request`, `*gin.Context`, `echo.Context`, `*fiber.Ctx`); per-framework route extractors set `include_id_like_typed=true` so id-shaped path params survive on real routes. Mirrors the existing Python `is_python_id_like_typed_param` filter. Closes ~957 `go.auth.missing_ownership_check` findings on gitea backend DAO helpers (`func GetRunByRepoAndID(ctx, repoID, runID int64)`, `func DeleteRunner(ctx, id int64)`, the entire `models/...` layer where the ownership check sits in the calling route handler) and equivalent shapes in minio / Go ORM codebases.
- **Bare-callee verb-name fallback gate.** `list(...)`, `filter(...)`, `update(...)`, `create_audit_entry(...)`, `update_coding_agent_state(...)` (no receiver dot at all) no longer classify as `DbMutation` / `DbCrossTenantRead` via the loose verb-name fallback. Real ORM/DB calls carry a receiver (`User.find(id)`, `Model.objects.filter`, `repo.save(x)`); a bare `list(events)` is the Python builtin and `filter(fn, xs)` is `Iterable.filter`. New helper `receiver_is_simple_chain(callee)` requires a non-chained receiver dot. The realtime / outbound / cache prefix dispatches still match by chain root.
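
The fixpoint in the "Caller-scope IPA" entry can be sketched over a plain caller map. `lift_authorized_helpers` and its inputs are hypothetical simplifications of `apply_caller_scope_propagation`: units are names, and an authorized route handler is just set membership.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// `callers` maps each helper to its in-file callers; `authorized_routes`
/// holds route handlers that carry a route-level auth check.
/// Returns the helpers that may inherit authorization.
pub fn lift_authorized_helpers<'a>(
    callers: &BTreeMap<&'a str, Vec<&'a str>>,
    authorized_routes: &BTreeSet<&'a str>,
) -> BTreeSet<String> {
    let mut lifted: BTreeSet<String> = BTreeSet::new();
    loop {
        let mut changed = false;
        for (helper, helper_callers) in callers {
            // Helpers with no in-file caller are never authorized.
            if lifted.contains(*helper) || helper_callers.is_empty() {
                continue;
            }
            // Every caller must be an authorized route or an already-lifted helper.
            let all_authorized = helper_callers
                .iter()
                .all(|c| authorized_routes.contains(c) || lifted.contains(*c));
            if all_authorized {
                lifted.insert((*helper).to_string());
                changed = true;
            }
        }
        if !changed {
            return lifted; // fixpoint reached
        }
    }
}
```

A chain such as route to `mid_helper` to `leaf_helper` needs a second pass when `leaf_helper` is visited first, which is why the loop repeats until nothing changes.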
### Type-aware sinks and validators
- **Java JPA / Hibernate Criteria API as structural SQL.** `TypeKind::JpaCriteriaQuery` covers `CriteriaQuery<T>`, `CriteriaUpdate<T>`, `CriteriaDelete<T>`, `Subquery<T>`, `TypedQuery<T>`. `sink_args_jpa_criteria_query_safe` clears `cfg-unguarded-sink` SQL_QUERY when any positional argument to the sink call is JpaCriteriaQuery-typed (receiver excluded; the receiver of `session.createQuery(cq)` is the Session/EntityManager channel, never the SQL payload). `cb.createQuery(...)`, `em.getCriteriaBuilder()`, and the JpaCriteriaQuery type chain are inferred via constructor / factory return-type hints in `type_facts.rs`. Closes the dominant FP cluster on openmrs (169 of 216 cfg-unguarded-sink), xwiki, and keycloak Hibernate DAO methods.
- **Receiver-side validator registry.** `labels::lookup_receiver_validator(lang, callee)` clears `Cap` from the receiver value (and call equivalents) on success, distinct from `Sanitizer` which clears caps from the return value. Python registers `relative_to => Cap::FILE_IO` so `path.relative_to(base)` drops the file-IO cap on the path. Closes the CVE-2024-23334 patched aiohttp `static_root_path.joinpath(filename).resolve().relative_to(static_root_path)` shape.
- **JS/TS Array-method validator-callback narrowing.** `arr.filter(isSafeIdentifier)`, `arr.find(isValidId)`, `arr.findLast(...)` with a `BooleanTrueIsValid` callback (`isValid…`, `isSafe…`, `hasValid…` and snake-case variants) propagate `validated_must` through the call's return value. Resolves callback name from `info.arg_callees` (call-shape arguments) and SSA `value_defs[v].var_name` (bare-identifier callbacks, the dominant patched-CVE form). Strict-additive: anonymous arrows / opaque identifiers leave existing propagation untouched. `findIndex` / `every` / `some` excluded (scalar return shape). Motivated by CVE-2026-42353.
- **JS/TS ternary-branch source classification.** `let arr = cond ? req.query.lng : "";` previously lowered each branch to a labelless Assign with empty uses; the join phi saw no taint. `lower_ternary_branch` now runs `first_member_label` on the branch AST when no `Source` label is already attached.
- **PHP `fopen` modeled as `Sink(Cap::SSRF)`** (same dual SSRF / LFI shape as `file_get_contents`; fires only on tainted argument). Closes CVE-2026-33486 (roadiz/documents `DownloadedFile::fromUrl` wrapping `fopen($url, 'r')`).
- **PHP `Serializable::unserialize($input)` magic-method passthrough recognition.** The legacy `Serializable` interface contract (deprecated since PHP 8.1) requires the implementation to call `\unserialize($input)` on the formal parameter inside `public function unserialize($x) { ... }`. PHP itself invokes the method when restoring an instance, so the body's call cannot be removed without breaking the interface. `php.deser.unserialize` now suppresses inside this exact shape (method named `unserialize`, single formal, bare-parameter argument). Class-level `Serializable` implementation is the actionable signal (fix is migration to `__serialize` / `__unserialize`). Closes joomla / drupal Serializable-implementing class FPs.
- **SQLAlchemy query-builder chained-call recognition.** `select(X).filter_by(...)`, `query(X).filter(...)`, `select().join().where()` chains now anchor through the chain root primitive when the chain receiver type is opaque. New `db_query_builder_roots` config (Python defaults: `select`, `query`). Closes airflow `session.scalar(select(C).filter_by(conn_id=user_input))` shapes that previously dropped under the chained-call suppression in `classify_sink_class`.
- **Python non-sink container constructor recognition.** Bare-callee `set()` / `dict()` / `list()` / `tuple()` / `frozenset()` / `defaultdict(...)` is treated as a non-sink constructor, so `verified_ids = set(); verified_ids.update(myteams)` does not classify the `.update` call as `DbMutation`. Type-annotation hint form `set[int]` / `dict[str, int]` recognised via PEP 585 generic suffix strip alongside the existing angle-bracket strip.
- **Python `request.match_info` source label** (aiohttp path-parameter source).
- **New Python pattern `py.xss.make_response_format` (Tier B).** Flask `make_response(<f-string-or-concat>)` reflection. Recognises both bare `make_response(...)` and `flask.make_response(...)`. Closes CVE-2023-6568 (mlflow auth `create_user` reflecting attacker-controlled `Content-Type` header into the response body).
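
The callback-name check in the "JS/TS Array-method validator-callback narrowing" entry can be sketched as a prefix test. The function names and the exact snake-case spellings are assumptions; the changelog only lists the camel-case prefixes and mentions that snake-case variants exist.

```rust
/// Camel-case prefixes from the changelog entry.
const CAMEL_PREFIXES: [&str; 3] = ["isValid", "isSafe", "hasValid"];
/// Assumed snake-case spellings of the same prefixes.
const SNAKE_PREFIXES: [&str; 3] = ["is_valid", "is_safe", "has_valid"];
/// Array methods whose return value keeps the element shape.
const NARROWING_METHODS: [&str; 3] = ["filter", "find", "findLast"];

/// True when the callback name looks like a boolean "true means valid" check.
pub fn is_validator_callback(name: &str) -> bool {
    CAMEL_PREFIXES
        .iter()
        .chain(SNAKE_PREFIXES.iter())
        .any(|p| name.starts_with(*p))
}

/// True when `method(callback)` should mark its return value as validated.
/// Anonymous or unresolved callbacks (`None`) never narrow.
pub fn narrows_return(method: &str, callback: Option<&str>) -> bool {
    NARROWING_METHODS.iter().any(|m| *m == method)
        && callback.map_or(false, is_validator_callback)
}
```

`findIndex`, `every`, and `some` are left out of the method list because they return a scalar rather than validated elements.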
### Language coverage
Per-language label rules expanded for the seven new caps.
- **JavaScript / TypeScript:** ldapjs `LdapClient.search`, `escapeXpath` / `xpathEscape`, `document.evaluate` / npm `xpath.select`, `setHeader` / `res.set` / `res.append` / `res.headers[]=`, `stripCRLF` / `escapeHeader`, lodash / dot-prop / object-path deep-merge prototype-pollution gates, Handlebars / EJS / Mustache template sinks, fast-xml-parser / xml2js with `processEntities`-aware activation, `redirect` / `Location` open-redirect sinks.
- **Python:** python-ldap `LDAPObject.search_s`, ldap3 `Connection.search`, lxml `etree.XPath` / `lxml.etree.parse` with parser-config awareness, Flask `response.headers[]=` / `make_response`, Jinja2 `Template(...)` and Mako `Template(...)` SSTI sinks, `flask.redirect` / `aiohttp HTTPFound` open-redirect.
- **Java / Kotlin:** `DirContext.search`, `XPath.evaluate` / `XPath.compile`, JAXP `DocumentBuilder.parse` / `SAXParser.parse` / `XMLReader.parse`, FreeMarker `Template.process`, Spring `redirect:` view-name synthetic sink, `HttpServletResponse.setHeader` / `addHeader`.
- **PHP:** `ldap_search` / `ldap_list` / `ldap_read`, `DOMXPath::query` / `DOMXPath::evaluate`, `header()` with leading-prefix activation, Smarty `fetch` / Twig `createTemplate` / Blade compile + `eval` template forms, `loadXML` / `simplexml_load_string` with `LIBXML_NOENT` activation.
- **Go:** `go-ldap conn.Search`, `etree.Path` / `xmlpath.Compile`, `http.Header.Set` / `Response.Header().Set`, `html/template` and `text/template` `Parse(...)`, `encoding/xml.Unmarshal` / `Decoder.Decode`, `http.Redirect` with relative-URL / host-allowlist gating.
- **Ruby:** `Net::LDAP#search`, `Nokogiri::XML::Document#xpath`, `response.headers[]=`, `ERB.new` SSTI, `Nokogiri::XML.parse` with `NOENT` / `DTDLOAD` activation, `redirect_to` with relative-URL gate.
- **C / C++:** libldap `ldap_search_ext_s`, libxml2 `xmlXPathEval`, `curl_easy_setopt` with header-list activation, libxml2 `xmlReadFile` / `xmlReadMemory` with `XML_PARSE_NOENT` activation.
- **Rust:** actix-web `HeaderMap.insert` / `HeaderValue::from_str` header-injection gates. `Redirect::to` retagged from `Cap::SSRF` to `Cap::OPEN_REDIRECT` so the open-redirect rule fires distinctly from the SSRF rule.
`NYX_PYTHON_PROTO_POLLUTION` opt-in flag: Python `dict.update` / `__dict__.update` proto-pollution gates are off by default because bare `update` overlaps too broadly with `Counter.update` and ordinary state-mutation patterns to ship as a default sink.
### CVE corpus
- **C.** CVE-2017-1000117 (git argv injection via `ssh://-oProxyCommand=…`) vulnerable + patched fixtures under `tests/benchmark/cve_corpus/c/CVE-2017-1000117/`. Known remaining gaps: array-element taint propagation, `c.cmdi.exec*` AST patterns, and dash-prefix-byte sanitizer recognition.
- **Python.** CVE-2023-6568 (mlflow reflected XSS), CVE-2024-21513 (langchain SQL / Jinja), CVE-2024-23334 (aiohttp static-file path traversal) vulnerable + patched fixtures.
- **PHP.** CVE-2026-33486 (roadiz/documents SSRF) vulnerable + patched fixtures.
- **JavaScript.** CVE-2026-42353 (i18next-http-middleware path traversal) vulnerable + patched fixtures.
### CLI
- **`nyx rules list`** subcommand. Surfaces the same registry the dashboard's `/api/rules` page reads from: built-in cap-class entries (one per `Cap` with a canonical rule id), per-language label rules (sink / source / sanitizer), gated sinks, and any custom rules from config. Filters: `--lang <slug>`, `--kind <class|source|sink|sanitizer>`, `--class-only` for registry entries only, `--no-class` for per-language rules only. `--json` for machine output. Cap-class entries carry `language = "all"` so a language filter still surfaces them unless `--no-class` is set.
- **`RuleInfo.is_class` / `RuleInfo.emission_active` flags.** Cap-class entries carry `is_class = true` so dashboards can group them separately. `emission_active = false` marks legacy classes (SQL_QUERY, SSRF, FILE_IO, FMT_STRING, DESERIALIZE, CODE_EXEC, CRYPTO) whose findings still surface under the catch-all `taint-unsanitised-flow` rule id; the seven new classes plus `unauthorized_id` and `data_exfil` are `emission_active = true`. The active set is pinned in `cap_rule_registry_emission_active_set_is_pinned` so a future migration of a legacy cap cannot drift silently.
- **`parse_cap` and `CapName::FromStr`** accept the new short names: `ldap_injection` / `ldapi`, `xpath_injection` / `xpathi`, `header_injection` / `crlf` / `response_splitting`, `open_redirect` / `redirect`, `ssti` / `template_injection`, `xxe`, `prototype_pollution` / `proto_pollution`, plus the existing `data_exfil` alias. The `nyx config add-rule --cap` flag and `[analysis.languages.*.rules]` entries take any of these.
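
The alias groups accepted by `parse_cap` can be sketched as a single match. `canonical_cap_name` is a hypothetical helper that returns a string instead of a `Cap` value, and treating the first name in each group as canonical, along with exact case-sensitive matching, is an assumption.

```rust
/// Maps an accepted short name to one name per cap class.
/// Only the seven new classes from this release are covered.
pub fn canonical_cap_name(name: &str) -> Option<&'static str> {
    match name {
        "ldap_injection" | "ldapi" => Some("ldap_injection"),
        "xpath_injection" | "xpathi" => Some("xpath_injection"),
        "header_injection" | "crlf" | "response_splitting" => Some("header_injection"),
        "open_redirect" | "redirect" => Some("open_redirect"),
        "ssti" | "template_injection" => Some("ssti"),
        "xxe" => Some("xxe"),
        "prototype_pollution" | "proto_pollution" => Some("prototype_pollution"),
        _ => None,
    }
}
```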
### Frontend
- **Refreshed local web UI visual system** around the mint-cyan Nyx brand: warmer light surfaces, deep green accents, updated severity / confidence colors, tighter typography, smaller radii, denser cards, table, badge, button, header, and sidebar styling, and matched graph / code-viewer colors.
- **Reworked `nyx serve` surfaces** for a more operational layout. Overview uses the refreshed health-score card and chart grid; Scans has a fixed compact table with capped language badges; Scan Detail places summary and timing data side by side; Triage, Rules, Config, Explorer, Finding Detail, Scan Compare, and Debug pages received focused spacing, overflow, and density fixes.
- **Branded asset set** shared between the SPA and the embedded server bundle: PNG favicons, Apple touch icon, sidebar logo image, refreshed SVG favicon, and Rust static handlers for the new `/logo.png` and favicon files.
- **Frontend `RuleListItem` and `RuleDetailView`** carry the new `is_class` flag so the dashboard's Rules page can group cap-class entries separately.
- **Regenerated README and docs screenshots and GIFs** against the new UI at 1600x992, saving raw originals before framing and adding CLI GIF plus combined CLI-to-serve demo GIF capture support. Extended the screenshot capture workflow with mint-led framing copy, optional `nyxscan.dev` asset mirroring, WebP regeneration for mirrored PNGs, and raw `_raw` image / GIF outputs for downstream reuse.
### Performance
- **Hoisted `collect_top_level_units` out of the per-extractor loop** in `extract_authorization_model`. Multi-extractor languages (Go gin+echo, JS/TS express+koa+fastify, Python flask+django, Rust axum+actix_web+rocket, Ruby sinatra) had been re-walking the entire AST and rebuilding the `Function`-kind unit set per extractor, then deduping by span. New `AuthExtractor::requires_top_level_units()` opt-out for Spring / Rails which build their own. Was 46% of `extract_authorization_model` wall-clock on the mattermost/server/channels/app subtree.
- **Single `AuthorizationModel` build per file in fused mode.** The diag path and the per-file summary path each ran their own `extract_authorization_model`, duplicating the hoisted unit pass and every framework extractor's AST walk. Auth summaries now extract from the base model (pre var-types, pre helper-lifting) so the persisted per-file summary matches the legacy `extract_auth_summaries_by_key` path bit-for-bit.
- **O(N) shallow value-ref emission in `collect_unit_state`.** The previous per-node `extract_value_refs(node, bytes)` walked the entire subtree on every recursion level (O(N²) per body) even though the recursion below already visits every descendant once. New `append_shallow_value_ref` emits the node's own ref and lets recursion handle the descent. Public callers of `extract_value_refs` (`collect_call`, `collect_condition`, assignment-side extraction) keep the deep walk. Was ~17% + 15% + 11% of wall-clock split across `build_function_unit_with_meta`, `collect_unit_state`, and `extract_value_refs` on mattermost.
- **Per-`ParsedFile` `body_const_facts_cache: OnceCell`.** SSA + const-prop + type-fact build was running 2-3× per body across `run_cfg_analyses_with_lowered`, `run_auth_analyses`, and `collect_file_var_types`. Single-pass cache; gin profile dropped from 13.6% to ~4.5%.
- **SCCP switched from `HashMap<SsaValue, _>` and `HashSet<(BlockId, BlockId)>`** to dense `Vec` per-value lattice and per-destination predecessor `SmallVec<[BlockId; 2]>`. The inner fixed-point loop no longer SipHashes a 64-bit pair for every operand of every phi. Public `ConstPropResult` shape unchanged (one final O(num_values) HashMap conversion).
- **`GlobalSummaries.by_key` switched to `FxHashMap`** (rustc-hash 2.1) from stdlib SipHash. `FuncKey` carries 3 String fields, so any HashMap operation hashes at least 30 bytes; FxHash is ~5× faster on this workload. Seed is fixed (no DoS hardening), fine for an in-process index keyed by program-derived names.
- `large_go_module.go` perf fixture (1493 lines) added to `benches/perf_fixtures/`; `benches/scan_bench.rs` extended with auth-extractor, SCCP, and summary-resolution rows.
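
The shallow value-ref change in this list comes down to which function walks the subtree. The toy model below is a sketch under stated assumptions: `Node`, `deep_refs`, `collect_deep_per_level`, `collect_shallow`, and `chain` are hypothetical names, not the real `collect_unit_state` or `extract_value_refs` signatures.

```rust
/// Toy syntax node; `id` stands in for a value ref.
struct Node {
    id: usize,
    children: Vec<Node>,
}

/// Deep walk: emits a ref for `node` and every descendant.
fn deep_refs(node: &Node, out: &mut Vec<usize>) {
    out.push(node.id);
    for child in &node.children {
        deep_refs(child, out);
    }
}

/// Old shape: a deep walk at every recursion level, so each node is emitted
/// once for itself and once per ancestor (quadratic on a deep chain).
fn collect_deep_per_level(node: &Node, out: &mut Vec<usize>) {
    deep_refs(node, out);
    for child in &node.children {
        collect_deep_per_level(child, out);
    }
}

/// New shape: emit only the node's own ref; the recursion covers the rest.
fn collect_shallow(node: &Node, out: &mut Vec<usize>) {
    out.push(node.id);
    for child in &node.children {
        collect_shallow(child, out);
    }
}

/// Builds a single-child chain with `depth` nodes (`depth` must be >= 1).
fn chain(depth: usize) -> Node {
    let mut node = Node { id: depth - 1, children: Vec::new() };
    for id in (0..depth - 1).rev() {
        node = Node { id, children: vec![node] };
    }
    node
}
```

On a four-node chain the per-level deep walk emits 10 refs (4 + 3 + 2 + 1) while the shallow form emits 4, and the two agree once duplicates are removed.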
### Fixed (false positives)
- `Object.create(null)` receivers no longer fire prototype-pollution at the synthetic `__index_set__` sink. Suppression is flow-sensitive via `TypeKind::NullPrototypeObject` so a phi join that only sometimes resolves to a null-proto receiver still fires on the unsafe path.
- `cfg-unguarded-sink` over-fires on JS/TS object-literal property writes guarded by an explicit `__proto__` / `constructor` / `prototype` reject `if` (early `return` / `throw` / `break`) or by an allowlist `if` whose true arm contains the assignment. Resolved at the CFG layer before the SSA sink scan.
- Spring MVC `return "redirect:" + url` flagged generic `taint-unsanitised-flow` even when the redirect destination was the load-bearing taint. Now routed through the synthetic `__spring_redirect__` sink so the finding emerges as `taint-open-redirect`.
- `$smarty->fetch(...)` / `$twig->createTemplate(...)` no longer drop their SSTI gate match on idiomatic PHP receiver shapes.
- `setValue(target, req.body, ...)` and similar wrappers no longer gate-match on the rewritten Source `req.body` text.
- Nokogiri / lxml / fast-xml-parser parser bodies hardened with `setFeature` / `processEntities: false` / `XMLParser(resolve_entities=False)` no longer fire `taint-xxe`.
- `XPath` instances bound to `setXPathVariableResolver(...)` no longer fire `taint-xpath-injection` on subsequent `xpath.evaluate(expr, ...)` sinks.
- Inline `if (!url.startsWith("/")) reject` and `if (new URL(url).host !== ALLOWED) reject` open-redirect sanitisers narrow `Cap::OPEN_REDIRECT` on the validated branch instead of falling through to the generic `Comparison` predicate. Other taint downstream still fires on its own caps.
- Rust `Redirect::to` no longer fires `taint-ssrf` for what is structurally an open redirect; retagged to `Cap::OPEN_REDIRECT`.
- ~957 gitea backend DAO `go.auth.missing_ownership_check` findings (id-scalar precision pass).
- 169 of 216 openmrs `cfg-unguarded-sink` findings (JpaCriteriaQuery type). Equivalent reductions on xwiki / keycloak Hibernate DAO clusters.
- joomla and drupal `php.deser.unserialize` flagged inside `Serializable::unserialize($input)` magic-method bodies.
- airflow execution-API routes flagged `missing_ownership_check` despite being authorized via cross-file `include_router` chains and module-level `APIRouter(dependencies=[…])` declarations.
- sentry `verified_ids = set(); verified_ids.update(myteams)` flagged as `DbMutation`.
- aiohttp `path.relative_to(static_root_path)` not recognised as a path-traversal validator.
- i18next-http-middleware `arr.filter(utils.isSafeIdentifier)` not narrowing taint on the result.
- `cond ? req.query.lng : ""` ternary lost `Source` label on the truthy branch.
- `if (!validate($x))` rejection-arm narrowing flipped on PHP unary `!`.
- mlflow `make_response(f"Invalid content type: '{content_type}'")` (Tier B pattern).
- Bare-callee verb-name dispatch on Python builtins / locally-defined helpers (`list`, `filter`, `update`, `create_audit_entry`, `update_coding_agent_state`).
- FastAPI `Depends(...)` / `Security(...)` deps declared on a module-level `APIRouter` no longer dropped on every attached route.
- FastAPI `Security(callable, scopes=[...])` no longer downgraded to a Login-only check.
### Tests
- New per-cap integration suites: `tests/{xpath_injection,xxe,ssti,prototype_pollution,header_injection,open_redirect,ldap_injection}_tests.rs`, plus `python_proto_pollution_tests.rs` for the env-gated Python form. Per-cap fixture trees under `tests/fixtures/<class>/<lang>/` cover safe, unsafe, and irrelevant-baseline shapes for every supported language.
- Cross-file FastAPI integration test `tests/fastapi_cross_file_include_router_tests.rs` with airflow-shaped fixture tree under `tests/fixtures/auth_cross_file/airflow_execution_api_includes/`.
- New `cfg/cfg_tests.rs` covers ternary-branch CFG lowering shapes.
- New `summary/tests.rs` covers cross-file `include_router` summary persistence and resolution.
- Per-language safe / vuln auth and detector fixtures across Python, Java, Go, PHP, JS, TS.
### Other
- Refactor passes across `auth_analysis`, `ssa/const_prop`, `ssa/type_facts`, `summary`, and the per-framework auth extractors (cleaner conditional checks, simpler function signatures, deduplicated assertions). No behaviour change.
- README links to a Simplified Chinese translation (`README.zh-CN.md`).
## [0.6.1] - 2026-05-03
A precision pass on auth and resource analysis, three fresh CVE corpus pairs, and a fix for a UTF-8 slice panic in the path abstract domain. Closes ~1900 Go auth FPs on gitea-shaped helpers, the mastodon/diaspora private-callback Ruby controller pattern, and a phantom-taint outbreak from JS/TS / Java lambda shorthand in jest-style nested test callbacks.
### Added
- Java JDBC raw-SQL sinks. `Statement.execute`, `Statement.executeBatch`, and `Statement.executeLargeUpdate` modeled as `SQL_QUERY` sinks, classified via type-qualified resolution (`DatabaseConnection.execute`) so bare `execute` (Runnable, Executor, HttpClient) does not over-fire. `conn.createStatement()` and `conn.prepareCall()` now infer return type `DatabaseConnection`, so the JDBC chain `Statement s = conn.createStatement(); s.execute(q)` types `s` correctly. Closes GHSA-h8cj-hpmg-636v (Appsmith FilterDataServiceCE.dropTable). Vulnerable + patched Java fixtures added.
- Java/Kotlin `Pattern.matcher(value).matches()` chain recognised as a `ValidationCall` allowlist. Receiver of `.matcher(` must contain `regex` or `pattern`. Validation target is the `.matcher()` argument, not the bare `.matches()` receiver. Branch narrowing applies the `validated_must` to the input variable on the surviving branch. Same GHSA as above (`FILTER_TEMP_TABLE_NAME_PATTERN.matcher(tableName).matches()`).
- Per-parameter SSA summary probe now receives `BodyMeta.param_types`, so `extract_ssa_func_summary` runs a local `analyze_types_with_param_types` pass before extraction. Helper bodies whose sinks resolve only via type-qualified callees (e.g. `DatabaseConnection.execute` for JDBC `Statement.execute`) no longer drop the sink during cross-function summary extraction. Fixes the Appsmith helper `executeDbQuery(query)` that routed SQL through `statement.execute(query)`.
- Short-circuit branch condition CFG nodes now mirror `condition_vars` into `taint.uses`, so `apply_branch_predicates` interns the variable for short-circuit-decomposed validators (`if (x == null || !regex.matcher(x).matches()) throw`). Without this, the per-disjunct cond nodes built via `build_condition_chain` silently no-opped and `x` never reached `validated_must` on the surviving branch.
- Go `goqu.L(s)` and `goqu.Lit(s)` raw-SQL literal builders modeled as `SQL_QUERY` sinks. Safe siblings (`goqu.I` identifier, `goqu.C` column, `goqu.T` table, `goqu.V` parameterised value, `goqu.SUM`, `goqu.COUNT`, …) stay unlabeled. Gin source list extended with the array-returning siblings of the existing scalar helpers: `c.QueryArray`, `c.GetQueryArray`, `c.PostFormArray`, `c.GetPostFormArray`. Closes CVE-2026-41422 (daptin: `c.QueryArray("column")` → `goqu.L(project)` with the loop variable lifted through `for _, project := range columns`). Vulnerable + patched Go corpus pair under `tests/benchmark/cve_corpus/go/CVE-2026-41422/`.
- Go `for ident := range iter` def-use lifting. The `range_clause` child of `for_statement` is now consulted when `left`/`right` aren't direct fields of the `for` node, so taint from the iterable reaches the loop binding. Required for the daptin CVE shape above.
- Java `enhanced_for_statement`, PHP `foreach`, and Ruby `for` def-use lifting, completing the loop forms the Go `range_clause` fix above started. The `Kind::For` def-use arm only knew the JS/Python `left`/`right` pair and Go's `range_clause`; Java carries the binding on `name` and the iterable on `value`, Ruby's `for` on `pattern`/`value`, and PHP's `foreach` keeps both as unnamed children split by the `as` keyword, so none recorded the loop variable as a define and taint on the iterable never reached the binding (`for (Cookie c : req.getCookies()) { … c.getValue() … }` lost the flow at `c`). Each form now folds onto the shared define/use path. Lifts Java OWASP Benchmark recall: path_traversal 0.21 → 0.32, sqli 0.16 → 0.28, cmdi 0.04 → 0.08.
- Iterable-expression classification for the loop forms above. The loop node is classified against its iterable text, so a source-returning iterable (`req.getCookies()`, `req.getParameterValues("v")`, `$_GET['list']`) lands a `Source` on the loop node and the binding inherits its taint, the same rewrite JS/Python `for … of` / `for … in` already had. Subscript iterables (`$_GET['x']`, `params[:list]`) classify on their base object since sources key on the base name, not the index.
- Java iterable-returning request accessors modeled as sources: `getParameterValues`, `getParameterMap`, `getParameterNames`, `getHeaders`, `getHeaderNames`. The `getParameter` / `getHeader` matchers are word-boundary suffix matches and never covered the plural collection variants that feed for-each loops (`for (String s : req.getParameterValues("v"))`). The dominant OWASP Benchmark vulnerable-source shape.
- Rust format-string named-argument lifting (`format!("...{x}...")`, stable since 1.58). Identifiers captured by `{name}` / `{name:fmt-spec}` are pulled into the call's `uses` for known format-style macros: `format`, `print`/`println`, `eprint`/`eprintln`, `write`/`writeln`, `panic`, `format_args`, `assert`/`debug_assert`, `todo`, `unimplemented`, `unreachable`, plus log-crate severity macros (`info`, `warn`, `error`, `debug`, `trace`). Recursive descent through one or two layers of expression wrapping (`format!("{x}").to_owned()`, RHS chained method calls). Without this, taint stopped at the macro boundary. `let q = format!("...{x}...")` carried no `x` because the identifier lives in format-string bytes rather than as a separate AST argument node. Mirrors the Python f-string lifter.
- Rust CVE corpus extended. CVE-2023-42456, CVE-2024-32884, CVE-2025-53549 vulnerable + patched fixtures under `tests/benchmark/cve_corpus/rust/`.
- Java lambda shorthand recognised by `extract_param_meta`. `lambda_expression`'s `parameters` field as a bare `identifier` (`cmd -> …`) or as an `inferred_parameters` wrapper around identifiers (`(a, b) -> …`) was not matching the formal_parameter / spread_parameter kinds in `PARAM_CONFIG`, so the lambda appeared parameterless and the SSA pipeline treated its formals as closure captures. Mirrors the JS/TS arrow shorthand path.
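
The format-string lifting described in this list amounts to reading identifiers out of the literal's bytes. The sketch below is an illustration only: `captured_idents` and `is_ident` are hypothetical names, and the scanner deliberately ignores nested captures inside format specs such as `{x:>width$}`.

```rust
/// Returns true for a plain identifier (so positional `{0}` and empty `{}`
/// are rejected).
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_alphabetic() || c == '_')
        && chars.all(|c| c.is_alphanumeric() || c == '_')
}

/// Collects names captured by `{name}` / `{name:spec}` in a format string,
/// in first-seen order and without duplicates. Escaped `{{` is skipped.
fn captured_idents(fmt: &str) -> Vec<String> {
    let bytes = fmt.as_bytes();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'{' {
            i += 1;
            continue;
        }
        if bytes.get(i + 1) == Some(&b'{') {
            i += 2; // escaped brace, not a placeholder
            continue;
        }
        // `{` is ASCII, so `i + 1` is always a valid char boundary.
        let Some(len) = fmt[i + 1..].find('}') else { break };
        let inner = &fmt[i + 1..i + 1 + len];
        // Everything before the first `:` is the argument name.
        let name = inner.split(':').next().unwrap_or("");
        if is_ident(name) && !out.iter().any(|n| n == name) {
            out.push(name.to_owned());
        }
        i += len + 2; // skip past the closing `}`
    }
    out
}
```

A lifter in this style would add the returned names to the macro call's `uses`, so a tainted `id` inside `format!("... {id} ...")` reaches the resulting string.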
### Fixed
- Panic on non-ASCII input to `has_first_char_absolute_check` in the path abstract domain. The 32-byte search window around `[0]` was sliced as `&clause[lo..hi]` (str), which panicked when `hi` landed inside a multi-byte UTF-8 char (e.g. the em dash `—`, bytes 34..37). Switched to `&bytes[lo..hi]` with `windows()` byte-pattern checks; all needles are ASCII so the searches are equivalent. Surfaced by `cargo fuzz` (`scan_bytes` target, `.c` extension path, embedded `—` in a comment near `s[0] == '/'`). Regression test added.
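
The failure mode is general to Rust string slicing: `&s[lo..hi]` panics when either index is not a char boundary, while a byte slice carries no such requirement. A minimal sketch of the byte-window approach follows; `needle_near_index_zero` and its `radius` parameter are hypothetical stand-ins, not the real `has_first_char_absolute_check`.

```rust
/// Hypothetical stand-in for the check described above: find the ASCII
/// marker `[0]`, then look for `needle` in a byte window around it.
fn needle_near_index_zero(clause: &str, needle: &[u8], radius: usize) -> bool {
    const MARKER: &[u8] = b"[0]";
    let bytes = clause.as_bytes();
    let Some(pos) = bytes.windows(MARKER.len()).position(|w| w == MARKER) else {
        return false;
    };
    let lo = pos.saturating_sub(radius);
    let hi = pos
        .saturating_add(MARKER.len())
        .saturating_add(radius)
        .min(bytes.len());
    // `&clause[lo..hi]` would panic if `lo` or `hi` fell inside a multi-byte
    // char; slicing the byte view never does. The emptiness check guards
    // `windows(0)`, which panics.
    !needle.is_empty() && bytes[lo..hi].windows(needle.len()).any(|w| w == needle)
}
```

Because every needle is ASCII, matching on bytes finds exactly the same occurrences as matching on the `str` would.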
### Fixed (false positives)
- `cfg-unguarded-sink` parameter-only trace no longer clears a sink argument whose reaching definition is a loop binding. Once the loop variable resolves to its iterable (the def-use lifting above), a `foreach ($param as $v) { sink($v) }` element looked like a bare `sink($p)` wrapper pass-through and the structural finding was dropped. A loop element over a parameter collection is not wrapper plumbing, so the finding survives for loop-bound sink arguments; literal-keyed arrays stay suppressed through `sink_arg_uses_safe_foreach_key`. Keeps the negative case in `fp_guard_php_foreach_safe_literal_keys` firing.
- `unit_has_user_input_evidence` framework-request-name allow-list narrowed for Go. `ctx`, `context`, `info`, `body`, `path`, `payload`, `dto`, `form`, `query` are no longer treated as user-input indicators on Go: in Go these are `context.Context` (cancellation/value-bag from the stdlib) or struct-pointer payload params (`info *PackageInfo`, `opts *FooOptions`), not request bindings. Go HTTP frameworks bind the request to per-framework typed params (`r *http.Request`, `c *gin.Context`, `c echo.Context`, `c *fiber.Ctx`); these arrive at the gate via `RouteHandler` kind or the type-aware param filter below. Stdlib `req` / `request` (the `*http.Request` convention) preserved. Other languages keep the broader allow-list.
- Go param collection drops `ctx context.Context` and `ctx context.CancelFunc` parameters entirely rather than seeding their names into `unit.params`. Tree-sitter-go's `parameter_declaration` exposes `name` and `type` as named fields; descend only into `name` so type-segment identifiers don't pollute the param-name set (`info *PackageInfo` no longer contributes `PackageInfo`). Together with the allow-list narrowing above, closes ~1900 `go.auth.missing_ownership_check` findings on gitea backend helpers whose only "user-input evidence" was the ubiquitous `ctx context.Context` first param.
- Ruby controller method visibility + filter-callback gate. Methods marked `private` (bare `private` directive, targeted `private :foo, :bar`, or `protected`) and Rails filter callback targets (`before_action`, `after_action`, `around_action`, their `prepend_*` / `append_*` / `skip_*` siblings, and the legacy `*_filter` aliases) are no longer emitted as `Function` units. Visibility tracking is class-body source-order with two directive forms (bare toggles default visibility, targeted explicitly marks named methods). Block-form filters (`before_action do … end`) carry no symbol arg and are correctly ignored. Closes mastodon / diaspora `rb.auth.missing_ownership_check` flood on `set_X` row-fetch helpers used as `before_action` callbacks.
- Field-LHS resource acquires no longer counted as local resource leaks at the `apply_assignment` site. `e->name = (char *)e + sizeof(*e)` (sub-buffer alias inside a returned struct) and `mem->buf = ptr` (local-into-field ownership transfer) now mark the RHS local `MOVED` and stop tracking the field as a separately OPEN resource. The parent struct owns the field's lifecycle. Cross-language (distinct from the Go-only `apply_call` field-LHS gate, which is restricted because JS/TS class-field acquires `this.fd = fs.openSync(...)` are the documented expected leak pattern in that path). Closes curl `entry_new` and equivalent C/C++ shapes in openssl / postgres.
- Empty-formals SSA lowering signal. `lower_to_ssa_with_params` now sets `with_params=true` even when `formal_params` is empty, so an arrow `() => {…}` is treated as "explicitly zero formals" rather than "no formals info". External vars in a zero-formal arrow are now correctly tagged as synthetic closure captures, so the JS/TS / Java auto-seed pass cannot mistake a bubbled-up free var (e.g. `userId` lifted from a nested jest test callback) for a real handler formal. Closes 934 phantom taint findings on the outline test suite (`describe("…", () => { test("…", () => { server.post(…) }) })`-shaped fixtures).
- Rust integer-typed values now suppress `Cap::FILE_IO` at the abstract-domain leaf gate (previously HTML_ESCAPE only). An integer's decimal representation is digits with optional leading `-`, never path metacharacters (`/`, `\`, `.`); magnitude is irrelevant. Closes the sudo-rs RUSTSEC-2023-0069 patched FP `let uid: u32 = user.parse()?; path.push(uid.to_string())`.
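
The integer argument in the last entry can be checked directly. The helpers below are hypothetical illustrations (`has_path_metachar`, `uid_component`), not part of the Nyx abstract domain: once a value has round-tripped through an integer type, its decimal rendering cannot contain `/`, `\`, or `.`.

```rust
/// Path metacharacters named in the entry above.
fn has_path_metachar(s: &str) -> bool {
    s.chars().any(|c| matches!(c, '/' | '\\' | '.'))
}

/// Hypothetical helper: parse untrusted text as `u32`, then render it back.
/// Returns `None` when the input is not a valid `u32`.
fn uid_component(raw: &str) -> Option<String> {
    raw.parse::<u32>().ok().map(|uid| uid.to_string())
}
```

Input such as `../etc` never reaches the path at all because the parse fails, and any value that does parse renders as digits only.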
## [0.6.0] - 2026-05-02
A focused release that splits data-exfiltration off from SSRF and ships sinks for outbound HTTP request bodies across all 10 languages, with calibration tuned so plain user input echoed back upstream does not fire.
### Added
- New `taint-data-exfiltration` rule, separate from SSRF. Fires when a Sensitive-tier source (cookie, header, env, file, database, caught exception) reaches the body, headers, or json payload of an outbound HTTP call. Plain user input gets suppressed at emission time so a gateway echoing `req.body` back upstream is not flagged.
- Sinks ship for `fetch` body, `XMLHttpRequest.send`, Python `requests.post` and `httpx.AsyncClient.post`, Java JDK `HttpClient.send` with `BodyPublishers`, OkHttp builder chains, Apache HttpClient `execute`, RestTemplate, WebClient, Go `http.Post` and `http.NewRequest` + `Do`, Rust `reqwest`/`ureq`/`surf`/`hyper` body/json/form/multipart chains, Ruby `Net::HTTP.post` and RestClient, C and C++ `curl_easy_setopt(CURLOPT_POSTFIELDS, ...)` gated by the macro arg.
- Three suppression knobs:
  - Sanitizer convention. `logEvent`, `forwardPayload`, `tracker.send`, `analytics.track`, `metrics.report`, `serializeForUpstream` are treated as `Sanitizer(data_exfil)` by default. Add your own with the standard custom-rule path.
  - Trusted destination allowlist in `detectors.data_exfil.trusted_destinations`. Matched against the abstract-string domain prefix; a literal or template prefix that begins with one of these entries drops the cap.
  - Detector toggle `detectors.data_exfil.enabled = false` strips the cap before emission. Other taint classes are unaffected.
- Calibration. Severity is High for cookie or env sources, Medium for header, file, database, or caught-exception sources. Confidence stays at Medium even with strong corroboration, drops to Low without abstract or symbolic backing, and drops one tier on path-validated flows. SARIF output carries a `properties.data_exfil_field` entry on data-exfil findings, set to the destination object-literal field the leak reached (`body`, `headers`, or `json`).
- Benchmark coverage. 13 vulnerable fixtures across 8 languages under `tests/benchmark/corpus/{lang}/data_exfil/` and 6 paired safe fixtures for the sensitivity gate and sanitizer convention. New `data_exfil` row in the per-class breakdown. Per-class CI floor at P, R, F1 ≥ 0.85 (current baseline is 1.000).
- Backwards taint walk recognises `Cap::DATA_EXFIL` and emits the same rule ID.
- Ruby SSRF coverage. `OpenURI.open_uri` now classified as an SSRF sink (the low-level fetcher that `URI.open` delegates to). Closes the CarrierWave CVE-2021-21288 download path and equivalent gem shapes that route through `OpenURI` directly.
- Ruby chained-call wrapper classification. Statement-level wrappers like `YAML.safe_load(File.read(filename))` and `Marshal.load(File.read(p))` now classify the inner sink for cross-function summary extraction. Without this, the outer call became a non-sink node and the inner sink was lost when the helper was summarised.
- Ruby CVE corpus. Vulnerable + patched fixtures added for CVE-2021-21288 (CarrierWave SSRF) and CVE-2023-38337 (rswag path traversal).
- Lodash `_.template` modeled as a gated `Cap::CODE_EXEC` sink. Activates on the template-string argument; suppresses when arg-1 carries a literal `{ evaluate: false }`. Closes Strapi CVE-2023-22621 (server-side template injection → RCE via `<% … %>` evaluate blocks). Vulnerable + patched fixtures added under `tests/benchmark/cve_corpus/javascript/CVE-2023-22621/`.
- JS/TS gated-sink kwarg extractor falls back to inspecting arg-1 object literals (`fn(x, { evaluate: false })`) when the language has no `keyword_argument` node. Required so the lodash gate can read its options object.
- Lodash double-call form (`_.template(t)(data)`) routes through `find_chained_inner_call` so the outer call's gated-sink rebinding fires.
- Cross-function helper-validation propagation. New `SsaFuncSummary.validated_params_to_return` field records parameter indices whose taint flow to the return value is fully validated by a dominating predicate (regex allowlist, type check, validation call) on every return path. At call sites, each tainted argument passed to a validated position, and the call's own return value, are marked `validated_must` / `validated_may` in the caller's SSA taint state, the same way an inline `if (!regex.test(x)) throw` would. Closes the helper-validator gap behind PayloadCMS CVE-2026-25544 (Drizzle SQL injection in `sanitizeValue`). Vulnerable + patched TypeScript fixtures added.
- Destructured-arg sibling expansion in per-parameter taint summary probing. JS/TS object-pattern formals (`({ column, operator, value }) => …`) now seed every binding sharing the slot, and any sibling reaching `validated_must` counts as the slot being validated. New `BodyMeta.param_destructured_fields` carries sibling lists alongside `params` and `param_types`. JS `PARAM_CONFIG` accepts `assignment_pattern` (default-value formals) and `object_pattern` (destructured formals).
- Regex-allowlist branch narrowing. `<X>.test(value)` / `<X>.match(value)` / `<X>.matches(value)` where the receiver name contains `regex` or `pattern` classifies as a `ValidationCall` and narrows the call's first argument, not the regex receiver. The classification also extends to `extract_validation_target`, so the surviving branch validates `value`, not the regex object. Motivated by Payload CVE-2026-25544 (`if (!SAFE_STRING_REGEX.test(value)) throw …`).
- TypeScript template-substring (`${fn(arg)}`) call-resolution arity-hint fallback. When CFG lowering drops `arg_uses` but `args` is non-empty, the resolver passes `None` so the unique-name fallback can still pick up the lone candidate.
- Caller-scope-entity exemption in `rs.auth.missing_ownership_check`. `<entity>.id` / `<entity>.pk` no longer fires when `<entity>` is a unit parameter named after a multi-tenant scope primitive: `organization` / `org`, `project`, `team`, `workspace`, `tenant`, `account`, `community`, `group`, `repository` / `repo`, `company`. Other field names (`.name`, `.slug`) still flag, and `user` / `member` / `actor` are deliberately excluded (handled by `is_actor_context_subject`). Closes a flood of FPs in Sentry / Saleor / Discourse / Mastodon-shaped multi-tenant helpers (`get_environments(request, organization)`, `_filter_releases_by_query(qs, organization, …)`).
- Auth value-ref walker recurses into the `value` child of `keyword_argument` / `keyword_arg` / `named_argument` nodes. `Model.objects.filter(organization_id=org.id)` no longer surfaces the kwarg key (`organization_id`) as a bare-identifier user-input subject. The schema column name is fixed at call time.
- Test-decorator denylist for Flask route extraction. `mock.patch`, `mock.patch.object` / `.dict` / `.multiple`, `unittest.mock.*`, `monkeypatch.setattr` / `setenv` / `delattr` / `delenv`, and `pytest.mark.parametrize` no longer collide with `<app>.patch` route registration. Stops every `@mock.patch("…")`-decorated test method from being attached as a Flask PATCH handler and flagged as `missing_ownership_check`.
- Typed-extractor route-level guard injection for axum and actix-web. Handlers registered via attribute macros (`#[get("/path")]`, `#[routes::path(…)]`) or via external service-config builders previously never had their typed-extractor guards seeded. New `apply_typed_extractor_guards_to_units` walks every `Function`-kind unit and injects guard checks from typed-extractor params, complementing the route-walk path that already covered `.route(...)` registration.
- New auth config key `policy_guard_names`. Typed-extractor wrappers that prove route-level capability/policy enforcement (e.g. meilisearch's `GuardedData<ActionPolicy<X>, _>`) are recognised distinctly from authentication-only wrappers. Matched as last-segment + case-insensitive `starts_with`. Rust default: `["Guarded"]`. Distinct from `login_guard_names` so the pattern doesn't pollute regular call recognition (a function like `guarded_load(..)` is not a login guard).
- Outer-wrapper-aware classification of typed extractors. `GuardedData<ActionPolicy<X>, Data<AuthController>>` is classified by the outer `GuardedData` (policy-bearing → `AuthCheckKind::Other`), not by whether an inner generic arg substring-matches `auth`. Bare data-only extractors (`Path<u64>`, `Query<X>`, `Json<X>`, `Form<X>`, `State<X>`, `Extension<X>`, `Data<X>`) outer-name-match early-return to `None` regardless of inner type tokens. Reference-marker (`&`, `&mut`, `&'a`) and module-path (`std::collections::`) prefixes stripped before matching.
- Project-level web-framework signal in Rust auth analysis. New `FrameworkContext::lang_has_web_framework(lang)` is three-valued: `Some(true)` when manifest names a framework, `Some(false)` when the manifest was inspected and named none, `None` when no manifest was inspected. New `rust_file_imports_web_framework` does a per-file `axum::` / `actix_web::` / `rocket::` / `axum_extra::` import probe (8 KB head). When the project's Cargo.toml is inspected and lists no Rust web framework AND the file does not directly import one, the `context_inputs` and param-name-heuristic arms of `unit_has_user_input_evidence` are suppressed. `RouteHandler` classification (concrete route-registration evidence) still bypasses the gate. Closes a flood of `missing_ownership_check` FPs in non-web Rust crates such as zed-style desktop / GUI codebases where a debug-session handle named `session` would trip `matches_session_context` on `session.update(cx, …)`. Currently Rust-only; other languages keep prior behavior (`None`).
- Rust auth corpus extended with `safe_actix_guarded_data_extractor.rs` and `unsafe_actix_no_guarded_data_extractor.rs` (typed-extractor guard injection); `safe_non_web_rust_project/` and `unsafe_actix_web_project_no_check/` (full Cargo.toml + src/lib.rs project shapes for the framework-signal gate).
- Python auth corpus extended with `vuln_user_id_param_no_auth.py`, `safe_django_orm_caller_scoped_entity.py` (caller-scope-entity exemption), `safe_mock_patch_test_method.py` (test-decorator denylist).
- Go safe corpus extended with `safe_inner_call_close_in_arg.go` (`require.NoError(t, f.Close())` shape), `safe_struct_field_resource_owned_by_struct.go` (field-LHS ownership transfer), and a `vuln_resource_leak_no_close.go` regression guard.
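
The project-level web-framework gate in this list combines a three-valued manifest signal with a per-file import probe. A minimal sketch of that decision, assuming hypothetical function names and a naive substring probe (the real `rust_file_imports_web_framework` may inspect imports differently):

```rust
/// Import prefixes named in the entry above.
const FRAMEWORK_PREFIXES: [&str; 4] = ["axum::", "actix_web::", "rocket::", "axum_extra::"];

/// Hypothetical probe: substring search over the first 8 KB of the file.
fn file_imports_web_framework(source: &str) -> bool {
    let mut end = source.len().min(8 * 1024);
    // Back up to a char boundary so the slice below cannot panic.
    while !source.is_char_boundary(end) {
        end -= 1;
    }
    let head = &source[..end];
    FRAMEWORK_PREFIXES.iter().any(|prefix| head.contains(prefix))
}

/// `manifest_signal` follows the three-valued shape from the entry:
/// `Some(true)` = manifest names a framework, `Some(false)` = manifest was
/// inspected and names none, `None` = no manifest was inspected.
/// The heuristic arms are suppressed only in the `Some(false)` case, and
/// only when the file itself imports no framework.
fn heuristics_suppressed(manifest_signal: Option<bool>, file_imports_framework: bool) -> bool {
    matches!(manifest_signal, Some(false)) && !file_imports_framework
}
```

Keeping `None` distinct from `Some(false)` is what lets an uninspected project retain the prior behaviour instead of being silently treated as non-web.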
### Fixed (false positives)
- C++ `cpp.memory.reinterpret_cast` no longer fires when the target type is well-defined by C++ aliasing rules. Suppressed targets: byte-pointer family (`char*`, `unsigned char*`, `signed char*`, `wchar_t*`, `uint8_t*`, `int8_t*`, `std::byte*`, `byte*`), `void*`, integer round-trip (`uintptr_t`, `intptr_t`, and `std::` variants, no pointer required), and the BSD socket address family (`sockaddr*`, `struct sockaddr*`, `sockaddr_in*`, `sockaddr_in6*`, `sockaddr_un*`, `sockaddr_storage*`). User-defined struct or class pointer targets keep firing. Closes ~70% over-fire on serialization, hashing, IPC, and socket-API code where the cast is the standard-blessed idiom.
- PHP `php.crypto.md5` and `php.crypto.sha1` suppress when the call's consuming context yields a non-cryptographic identifier name. Recognised contexts: assignment LHS (variable, `$obj->property`, `$arr['key']`), array element keys, subscript indices, return statements (resolved to enclosing method or function name with `get` prefix stripped), and method-call arguments where the method is a key/cache/lookup verb (`get`, `set`, `has`, `delete`, `fetch`, `store`, `find`, `getItem`, `setItem`). Names containing a crypto keyword (`password`, `secret`, `token`, `signature`, `hmac`, `digest`, `salt`, `key`) keep firing. Closes ETag generation, cache-key hashing, dedup fingerprint, and `getCacheKey()`-style false positives in real PHP repos (phpmyadmin, nextcloud).
- JS and TS `secrets.fallback_secret` no longer fire on empty-string fallbacks (`process.env.X || ""`). Developers write `|| ""` to satisfy non-undefined string types without committing a real secret. Non-empty literal fallbacks still fire.
- Path-traversal sink suppression accepts canonicalised-and-rooted shapes. New `PathFact::is_path_traversal_safe` predicate clears `Cap::FILE_IO` when the path is dotdot-free and either non-absolute or carries a verified prefix-lock. New `OPAQUE_PREFIX_LOCK` marker records the structural invariant ("rooted under SOME prefix") when the `starts_with`-style guard's argument is a method call, field access, or configured root rather than a string literal. Closes the Ruby `File.expand_path + start_with?(root)` shape (rswag CVE-2023-38337 patched counterpart), the Python `os.path.realpath + .startswith(root)` shape, and the JS `path.resolve + .startsWith(root)` shape. `classify_path_assertion` extended to JS `.startsWith(...)`, Python `.startswith(...)`, Ruby `.start_with?(...)` (paren and paren-less), and Go `strings.HasPrefix(...)`.
- Branch narrowing now flips prefix-lock attachment under condition negation. For `if !target.startsWith(ROOT) { return; }` the lock attaches to the surviving block, not the rejection arm. Rejection-axis narrowing is unchanged because the rejection classifier is text-level and already accounts for leading `!`.
- Go field-LHS resource acquires no longer counted as local resource leaks. `b.cpuprof = os.Create(...)` transfers ownership to the containing struct; closure responsibility belongs to a paired `Stop()` / `Release()` method on the struct's lifecycle. Gated in both `state/transfer.rs::apply_call` and `cfg_analysis/resources.rs::run`. Restricted to Go (`Lang::Go` check). JS/TS class-field acquires (`this.fd = fs.openSync(...)`) keep being tracked because the leak fixtures rely on it. Production trigger: prometheus `cmd/promtool/tsdb.go::startProfiling` cluster (`b.cpuprof`, `b.memprof`, `b.blockprof`, `b.mtxprof`).
- Go inner-call release in argument position. `require.NoError(t, f.Close())`, `errs = append(errs, f.Close())`, JUnit `assertEquals(0, in.read())`: releases that live in argument position now mark the receiver `CLOSED`. Bare-receiver inner calls only (chained-receiver releases stay owned by `chain_proxies`); marks `CLOSED` only with no `DoubleClose` attribution; respects `in_defer` for symmetry.
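The canonicalise-then-prefix-check shape from the path-traversal entry above can be sketched in Python as follows. This is a minimal illustration rather than one of Nyx's fixtures: `UPLOAD_ROOT` and `safe_open_path` are hypothetical names, and appending `os.sep` before the prefix comparison is an extra precaution added here, not something the entry requires.

```python
import os

# Hypothetical upload root; any configured directory would do.
UPLOAD_ROOT = os.path.realpath("/srv/uploads")


def safe_open_path(user_supplied: str) -> str:
    """Resolve a user-supplied path and confirm it stays under UPLOAD_ROOT."""
    # Canonicalise first: realpath collapses ".." segments and resolves symlinks.
    candidate = os.path.realpath(os.path.join(UPLOAD_ROOT, user_supplied))
    # Prefix check: reject anything that escaped the root. The trailing separator
    # stops a sibling such as "/srv/uploads-evil" from passing the comparison.
    if not candidate.startswith(UPLOAD_ROOT + os.sep):
        raise ValueError(f"path escapes upload root: {user_supplied!r}")
    return candidate
```

A call such as `safe_open_path("../../etc/passwd")` raises `ValueError`, while `safe_open_path("a/b.txt")` returns a path under the root. Whether Nyx treats this exact variant as safe is an assumption; the entry only names the `os.path.realpath` + `.startswith(root)` shape.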
### Other
- Action download script warning for the mutable `latest` tag now references `v0.6.0` instead of `v0.5.0`.
## [0.5.0] - 2026-04-29
The biggest release since launch. The taint engine was rebuilt on top of an SSA IR, cross-file analysis was deepened across the board, and Nyx now ships a local web UI for triaging findings without leaving your machine.
> Heads-up: false positives or regressions on cross-file flows are possible. Please open an issue with a minimal reproduction if you hit one.
### Highlights
- **New SSA-based taint engine.** Block-level worklist analysis over a pruned SSA IR, replacing the legacy BFS engine across all 10 languages. More precise, easier to extend, and the foundation for everything else in this release.
- **Cross-file analysis.** Function summaries (including the new SSA summaries) flow across files via SQLite-backed persistence. Callee bodies can be inlined for context-sensitive analysis (k=1) and walked symbolically across file boundaries.
- **Symbolic execution layer.** Candidate findings are walked symbolically from source to sink, producing concrete attack witnesses, pruning infeasible paths, and (optionally) handing constraints off to Z3.
- **Local web UI (`nyx serve`).** React + Vite frontend for browsing findings, viewing flow paths, and triaging results. Triage decisions persist to `.nyx/triage.json` so they version with your code.
- **Hostile-repo hardening.** Path containment, loopback-only serving, CSRF tokens, bounded artifact reads. Safe to run on untrusted code.
- **Tighter false-positive controls.** Type-aware sink suppression, abstract interpretation (intervals + string prefixes), constraint solving, allowlist and type-check guard recognition, and confidence scoring on every finding.
### Engine
- SSA IR with dominance-frontier phi insertion. The optimization pipeline runs constant propagation, branch pruning, copy propagation, alias analysis, DCE, type facts, and points-to in sequence.
- Multi-label classification. A single API can carry both Source and Sink labels (e.g. PHP `file_get_contents`, Java `readObject`).
- Gated sinks. `setAttribute`, `parseFromString`, etc. only activate when the constant attribute argument is dangerous, and only the payload argument is treated as taint-bearing.
- Container taint with per-index precision and bounded points-to. Aliased containers share heap identity correctly.
- Loop-aware analysis: induction-variable pruning, widening at loop heads, bounded unrolling in symex.
- Path-sensitive phi evaluation propagates validation when all tainted predecessors are guarded.
- Per-return-path summaries decompose function effects when paths produce different taint behavior.
- Cross-file SCC fixed-point. Mutually recursive functions across files now reach a joint convergence.
- Demand-driven backwards analysis (off by default) annotates findings with cutoff diagnostics.
- Direction-aware engine notes (`UnderReport`, `OverReport`, `Bail`) flow into confidence scoring, ranking, and the new `--require-converged` strict mode.
- Synthetic field-write inheritance: `u.Path = "/foo"` no longer drops taint carried by other fields of `u`. Fixes Owncast CVE-2023-3188 (SSRF).
- Phantom-Param-aware field suppression skips method/function references that share a base name with a tainted variable.
- Validation err-check narrowing for the two-statement Go idiom `_, err := strconv.Atoi(input); if err != nil { return }`: `input` is marked validated on the surviving `err == nil` branch.
- Go: `strings.Replace` / `strings.ReplaceAll` recognised as a sanitizer when the OLD literal contains a known-dangerous payload (shell metachars, path-traversal, HTML, SQL) and the NEW literal does not reintroduce one.
- Go: literal-strip cap detection extended to shell metachars (`;`, `|`, `&`, `$`, backtick) and SQL metachars (`'`, `"`, `--`).
- Go: `interpreted_string_literal` / `raw_string_literal` handled in tree-sitter so const-string arg extraction works for Go's double-quoted and backtick forms.
### Symbolic Execution
- Expression trees (`SymbolicValue`) preserve computation structure through the path walk: integers, strings, binary ops, concatenations, calls, phi merges.
- Witness strings reconstruct concrete attack payloads at sink nodes.
- Bounded multi-path forking with reachability pruning.
- Cross-file: callee summaries are modeled directly, and pre-lowered callee bodies are loaded from SQLite so witnesses can keep walking across files.
- Interprocedural mode: nested frames with full state propagation, transitive descent up to 3 levels, structured cutoff tracking.
- Field-sensitive symbolic heap with bounded fields per object.
- Symbolic string theory: `Substr`, `Replace`, `ToLower`, `ToUpper`, `Trim`, `StrLen` modeled with concrete folding and sanitizer pattern detection.
- Optional Z3 integration (compile-time `smt` feature) for cross-variable constraint solving.
### Security & Coverage
- Vulnerability classes added: SSRF (10 languages), deserialization (Python, Ruby, Java, PHP), and `Cap::UNAUTHORIZED_ID` for auth-as-taint (off by default behind config flag).
- Auth analysis: receiver-type sink gating, row-level ownership-equality detection, self-actor recognition (`let user = require_auth()`), sink classification (in-memory vs realtime vs outbound), helper-summary lifting, and SQL JOIN-through-ACL recognition.
- State analysis (resource lifecycle, use-after-close, leaks, unauthed access) is now on by default. RAII-aware for Rust and C++; recognizes Python `with`, Go `defer`, Java try-with-resources.
- Framework rule packs: Express, Flask/Django, Spring/JNDI, Rails. Per-language label depth significantly expanded.
- C/C++ taint depth: output-parameter source propagation, implicit definitions for uninitialized declarations.
- Negative test corpus (30 fixtures) and a 262-case benchmark with CI gates on rule-level Precision/Recall/F1.
### Detection metrics
- Aggregate rule-level F1 reaches **0.998** (P=0.995, R=1.000). All real-CVE fixtures fire; only one open FP (`go-safe-009`).
- Go: 98.0% F1 on the 53-case corpus (1 FP / 0 FNs).
- CVE-2023-3188 (owncast SSRF) is now detected.
### CLI & Output
- `nyx serve`: local web UI on `localhost` only (refuses non-loopback binds).
- `--require-converged` filters out findings where the engine bailed early.
- Analysis-engine toggles graduated from `NYX_*` env vars to first-class flags and `[analysis.engine]` config: `--constraint-solving`, `--abstract-interp`, `--context-sensitive`, `--symex`, `--cross-file-symex`, `--symex-interproc`, `--smt`, `--parse-timeout-ms`. Old env vars still work when Nyx is consumed as a library.
- Confidence (`High`/`Medium`/`Low`) shown on every finding, including console headers.
- Engine notes surfaced in console (`[capped: N notes, over-report]`), JSON (`engine_notes`, `confidence_capped`), and SARIF (`result.properties.loss_direction`).
- Flow paths reconstructed step-by-step with file/line/snippet for each hop.
- Concrete attack witness strings synthesized by the symbolic executor.
- Primary sink locations now point at the callee's real sink line; caller call sites are preserved as flow steps.
- Richer scan progress: explicit stages, timing breakdowns, language counters, skipped/reused file counts.
- Tighter taint-finding deduplication.
### Hardening
- Centralized path containment rejects traversal, symlink escapes, and oversized reads across UI, debug, and triage routes.
- `nyx serve` validates `Host` headers, requires per-session CSRF tokens for mutations, and refuses scans outside the original repo root.
- Walker re-validates symlink targets against the scan root.
- Bounded reads on framework manifests and `.nyx/triage.json` imports.
- UI falls back to plain text on pathologically long lines to defeat regex-DoS in syntax highlighting.
- Parser timeout is now configuration-backed with hostile-input regression coverage.
### Persistence
- SQLite schema bumped to v2. Anonymous-function identity is now a structural DFS index instead of a byte offset, so inserting a line above an unchanged function no longer invalidates its `FuncKey`. Pre-0.5.0 caches are silently cleared on open; triage data and scan history are preserved.
- Engine-version metadata; persisted summaries and file hashes invalidate on mismatch.
- Stale SSA tables recreate when required columns are missing; deserialization failures log instead of silently dropping rows.
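To see why a structural DFS index survives edits that shift byte offsets, consider the sketch below. It uses Python's `ast` module purely as a stand-in for Nyx's tree-sitter trees, and the line number stands in for the byte offset. `anon_function_keys` is a hypothetical helper, not Nyx's `FuncKey` logic.

```python
import ast


def anon_function_keys(source: str) -> list[tuple[int, int]]:
    """Return (dfs_index, line_number) for each lambda, in pre-order DFS order."""
    keys: list[tuple[int, int]] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.Lambda):
            # The DFS index is simply how many lambdas were seen before this one.
            keys.append((len(keys), node.lineno))
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return keys


before = "f = lambda x: x + 1\ng = lambda y: y * 2\n"
after = "# a new comment line\n" + before

# Structural indices are identical; position-based identities shift by one line.
print([idx for idx, _ in anon_function_keys(before)])    # [0, 1]
print([idx for idx, _ in anon_function_keys(after)])     # [0, 1]
print([line for _, line in anon_function_keys(before)])  # [1, 2]
print([line for _, line in anon_function_keys(after)])   # [2, 3]
```

An identity keyed on position changes whenever text is inserted above the function, whereas the DFS index changes only when the set or order of anonymous functions changes.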
### Frontend
- Replaced the legacy `app.js` with a React + Vite + TypeScript SPA.
- Interactive graph workspace for CFG and call-graph views (Graphology + ELK + Sigma) with neighborhood reduction and a full-page inspector.
- Triage UI with database-backed decisions (true positive, false positive, accepted risk, suppressed) and `.nyx/triage.json` round-trip.
- Scan history, rules management, and finding detail panels with evidence and flow visualization.
- Vitest browser-side test suite wired into CI.
- Bumped to React 19, Vite 8, TypeScript 6.0, ESLint 10, `@vitejs/plugin-react` 6, with aligned `@types/react*`.
- `SSEContext`: typed `reconnectTimer` ref as `ReturnType<typeof setTimeout> | undefined` to satisfy TS 6's stricter `useRef` overloads.
- `FindingsPage`: included `toast` in `useCallback` deps to avoid stale-closure warnings.
- `tsconfig.json`: dropped `baseUrl`, using a relative `./src/*` path mapping instead.
### Removed
- Legacy BFS taint engine, `TaintTransfer`, `TaintState`, and the `NYX_LEGACY` fallback.
- Legacy vanilla-JS frontend (`app.js`).
## [0.4.0] - 2026-02-25
A precision and ergonomics release. Findings are now ranked, lower-noise by default, and easier to triage in CI.
### Highlights
- **Attack-surface ranking.** Every finding gets an exploitability score combining severity, analysis kind, evidence strength, and path-validation. Console output shows the score in the header line; `--no-rank` opts out.
- **Low-noise prioritization.** Quality-category findings are excluded by default (`--include-quality` brings them back). High-frequency Quality rules are rolled up per `(file, rule)` with example occurrences. LOW budgets cap noise without ever displacing High/Medium findings.
- **State-model dataflow analysis.** New per-variable resource-lifecycle and auth-level analysis catches use-after-close, double-close, must-leak, may-leak (branch-aware), and unauthenticated-sink access. Opt-in via `scanner.enable_state_analysis`.
- **Inline `nyx:ignore` suppressions** with same-line and next-line directives, comma lists, wildcard suffixes, and string-literal guards across all 10 languages.
- **AST pattern overhaul.** All 10 language pattern files rewritten with consistent metadata, namespaced IDs (`<lang>.<category>.<specific>`), and 30+ new patterns. 11 broken tree-sitter queries fixed.
- **Monotone forward-dataflow taint engine.** Replaced the BFS engine with a proper worklist over a finite lattice. Termination is now guaranteed by lattice height, eliminating BFS-budget bailouts on large files.
- **Path-sensitive taint analysis.** Branch predicates flow with the analysis. Contradictory guards prune infeasible paths; validation calls produce annotated findings without changing severity.
- **Interprocedural call graph.** Whole-program graph with three-valued callee resolution (`Resolved`/`NotFound`/`Ambiguous`), SCC analysis, and topo ordering ready for bottom-up taint propagation.
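The termination argument behind a monotone worklist analysis can be illustrated with a toy example. The sketch below is not Nyx's engine: the CFG, the statement kinds, and the function names are all invented for illustration. The lattice is the set of tainted variable names, the join is set union, and each node's state can only grow, so the loop must stop once every state is stable.

```python
from collections import deque

# Toy CFG, invented for illustration: node id -> (statement, successor ids).
CFG = {
    0: (("source", "x"), [1]),        # x = read_input()
    1: (("copy", "y", "x"), [2, 3]),  # y = x; then branch
    2: (("sanitize", "y"), [4]),      # y = escape(y)
    3: (("nop",), [4]),               # the other arm leaves y untouched
    4: (("sink", "y"), []),           # exec(y)
}


def transfer(stmt, tainted):
    """Monotone transfer function over sets of tainted variable names."""
    kind = stmt[0]
    if kind == "source":
        return tainted | {stmt[1]}
    if kind == "copy":
        dst, src = stmt[1], stmt[2]
        return (tainted - {dst}) | ({dst} if src in tainted else set())
    if kind == "sanitize":
        return tainted - {stmt[1]}
    return tainted  # "nop" and "sink" leave the state unchanged


def analyze(cfg):
    """Forward worklist analysis; returns the tainted set on entry to each node."""
    state_in = {node: frozenset() for node in cfg}
    worklist = deque(cfg)  # seed with every node so each is processed at least once
    while worklist:
        node = worklist.popleft()
        stmt, succs = cfg[node]
        out = frozenset(transfer(stmt, state_in[node]))
        for succ in succs:
            joined = state_in[succ] | out  # join = set union
            if joined != state_in[succ]:   # state grew, so revisit the successor
                state_in[succ] = joined
                worklist.append(succ)
    return state_in


result = analyze(CFG)
print(sorted(result[4]))  # ['x', 'y']
```

Here `y` is still tainted at the sink because the unsanitised arm (node 3) joins with the sanitised one (node 2). With a finite set of variables, the number of times any node can be re-queued is bounded by the lattice height.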
### CLI & Output
- `--severity <EXPR>` replaces `--high-only`. Supports `HIGH`, `HIGH,MEDIUM`, `>=MEDIUM`. Filtering is now applied at the output stage so taint and CFG findings are correctly downgraded too.
- `--mode <full|ast|cfg|taint>` replaces `--ast-only` and `--cfg-only`.
- `--index <auto|off|rebuild>` replaces `--no-index` and `--rebuild-index`.
- `--fail-on <SEVERITY>` for CI exit-code gating.
- `--min-score <N>` for ranking-aware filtering.
- `--show-suppressed` reveals suppressed findings dimmed with `[SUPPRESSED]`.
- `--keep-nonprod-severity` (renamed from `--include-nonprod`).
- `--quiet` mirrors `output.quiet`.
- Console renderer overhauled: severity is the strongest visual anchor, file paths are dim blue, taint flows use `→` arrows, multi-line call chains are normalized.
- Confidence shown alongside score in the header line.
- Pattern-level confidence is now set at the pattern definition site, not heuristically inferred from severity.
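The three documented `--severity` forms (`HIGH`, `HIGH,MEDIUM`, `>=MEDIUM`) could be interpreted along the following lines. This is a hedged sketch, not Nyx's parser: `parse_severity_expr` is a hypothetical name, and the case-insensitivity and whitespace tolerance are assumptions. Rejecting unknown names mirrors the stricter `Severity::from_str` behaviour listed under Breaking.

```python
# Ordered from least to most severe; the three levels come from the changelog.
LEVELS = ["LOW", "MEDIUM", "HIGH"]


def parse_severity_expr(expr: str) -> set[str]:
    """Turn a severity expression into the set of severities it admits."""
    expr = expr.strip().upper()
    if expr.startswith(">="):
        # Threshold form: everything at or above the named level.
        floor = expr[2:].strip()
        if floor not in LEVELS:
            raise ValueError(f"unknown severity: {floor!r}")
        return set(LEVELS[LEVELS.index(floor):])
    # List form: one or more comma-separated level names.
    selected: set[str] = set()
    for part in expr.split(","):
        name = part.strip()
        if name not in LEVELS:
            raise ValueError(f"unknown severity: {name!r}")
        selected.add(name)
    return selected
```

Under these assumptions, `parse_severity_expr(">=MEDIUM")` admits `MEDIUM` and `HIGH`, and an unrecognised name raises instead of silently defaulting to a level.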
### Breaking
- Config and data directory renamed from `dev.ecpeter23.nyx` to `nyx`. Existing config and SQLite indexes at the old path won't be picked up. Copy them across or re-run `nyx scan`.
- `Severity::from_str` now returns `Err` for unknown values instead of silently defaulting to Low.
### Notable Fixes
- KINDS-map audit across all 10 languages: 89 missing tree-sitter node types added. Switch/case, try/catch/finally, class bodies, lambdas, closures, and namespaces are no longer silently dropped.
- `else_clause` mapping fixed for C, C++, Rust, JS, TS, Python, PHP. Code inside else blocks was being dropped from the CFG.
- Rust `if let` / `while let` taint propagation now works.
- Taint BFS non-termination on large JS files (the BFS engine has since been replaced).
- C++ `popen` pattern ID collision with C.
- Constant-arg sink suppression for AST patterns.
## [0.3.0] - 2026-02-25
Configurability, SARIF, and an aggressive false-positive purge.
### Highlights
- **Configurable analysis rules.** Sources, sanitizers, sinks, terminators, and event handlers can be defined per language in `nyx.local` or via `nyx config add-rule`/`add-terminator`. Config rules take priority over built-in rules.
- **`nyx config` CLI subcommand** with `show`, `path`, `add-rule`, `add-terminator`.
- **SARIF 2.1.0 output (`-f sarif`).** Spec-compliant for GitHub Code Scanning, Azure DevOps, and other SARIF consumers.
- **`SourceKind` taint classification.** Findings carry an inferred source kind (`UserInput`, `EnvironmentConfig`, `FileSystem`, `Database`, `Unknown`) and severity is now derived from it instead of being hardcoded to High.
- **Non-prod severity downgrade by default.** Findings in tests, vendor, benchmarks, examples, fixtures, build scripts, and `*.min.js` are downgraded one tier. `--include-nonprod` restores original severity.
- **Resource leak detection** for Python, Ruby, PHP, JavaScript, and TypeScript (file handles, sockets, locks, mysqli, curl, fs streams).
- **Progress bars and quiet mode.** Indicatif-driven progress for discovery, Pass 1, and Pass 2 (auto-hidden in JSON/SARIF/quiet modes).
### Performance
- Single fused parse+CFG pass replaces the previous two-parse summary extraction.
- Light-weight dataflow sweep in CFG builder is now O(N) per function instead of O(N²) over the whole file.
- Parallel summary merging via rayon fold/reduce.
- Indexed scans now read and hash each file once instead of up to 4 times.
- SQLite mutex mode relaxed (r2d2 + WAL provides safety without global lock).
- Zero-allocation taint hashing and in-place taint transfer.
### Notable Fixes
- One-hop constant-binding suppression: `cmd = "git"; subprocess.run([cmd, ...])` no longer flags.
- Exec-path guards (`which`, `resolve_binary`, `shutil.which`) recognized.
- `signal.connect` / `event.connect` no longer match Python db-connection acquire patterns.
- `threading.Lock()` without `.acquire()` no longer flags as unreleased.
- `FileResponse(f)` / `send_file(f)` recognized as ownership transfer.
- `el.href` no longer matches `location.href` patterns.
- Constant-only sink calls (`subprocess.run(["make","clean"])`) suppressed.
- `std::cout` no longer treated as a sink.
- Break/continue inside loops correctly wires into the loop header/exit, fixing false unreachable-code findings.
- Preprocessor `#ifdef`/`#endif` blocks no longer orphan subsequent code in C/C++.
- `freopen` no longer matches `fopen` acquire patterns.
- Struct-field, linked-list, and global assignment recognized as ownership transfers.
## [0.2.0] - 2026-02-24 ## [0.2.0] - 2026-02-24
### Added The cross-file release.
- **Cross-file taint analysis** -- two-pass architecture: Pass 1 extracts `FuncSummary` per function (source/sanitizer/sink capabilities, taint propagation, callees), Pass 2 runs BFS taint propagation with cross-file callee resolution.
- **CFG analysis engine** with five detectors: unguarded sinks (`cfg-unguarded-sink`), auth gaps in web handlers (`cfg-auth-gap`), unreachable security code (`cfg-unreachable-*`), error fallthrough (`cfg-error-fallthrough`), and resource leaks (`cfg-resource-leak`).
- **Cross-language interop** -- taint flows across language boundaries via explicit `InteropEdge` structs without false-positive name collisions.
- **Function summaries** persisted to SQLite (`function_summaries` table) with arity, parameter names, capability bitflags, and callee lists.
- **Multi-language CFG + taint support** -- all 10 languages (Rust, C, C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript) now have `KINDS` maps, `RULES`, and `PARAM_CONFIG` for full CFG construction and taint analysis.
- **Resource leak detection** for C/C++ (malloc/free, fopen/fclose), Go (os.Open/Close, Lock/Unlock), Rust (alloc/dealloc), and Java (streams, connections).
- **Finding scoring system** -- numeric scores based on severity, proximity to entry point, path complexity, taint confirmation, and confidence multiplier.
- **Analysis modes** -- `Full` (default), `Ast` (`--ast-only`), and `Taint` (`--cfg-only`) selectable via CLI flags or `scanner.mode` config.
- **`GlobalSummaries`** with conservative merge: union caps, OR booleans, union param/callee lists on name collisions across files.
- **Performance optimizations** -- `_from_bytes` variants to read-once/hash-once, lock-free rayon parallelism, SQLite WAL + 8 MB cache + 256 MB mmap.
- **Tracing instrumentation** -- `tracing` spans on all pipeline phases (walk, pass1, merge, pass2, per-file ops, db_init).
- **Benchmark suite** -- criterion benchmarks in `benches/scan_bench.rs` with fixtures.
- 107 unit tests covering taint propagation, cross-file resolution, cross-language interop, CFG analysis, and summaries.
### Changed - **Two-pass cross-file taint analysis.** Pass 1 extracts `FuncSummary` per function (caps, propagation, callees), Pass 2 runs BFS taint propagation with cross-file callee resolution.
- Bumped all dependencies to latest compatible versions. - **CFG analysis engine** with five detectors: unguarded sinks, auth gaps in web handlers, unreachable security code, error fallthrough, resource leaks.
- `Cap` bitflags expanded: `ENV_VAR`, `HTML_ESCAPE`, `SHELL_ESCAPE`, `URL_ENCODE`, `JSON_PARSE`, `FILE_IO`. - **Cross-language interop** via explicit `InteropEdge` structs (no false-positive name collisions).
- `classify()` in labels uses zero-allocation byte-level case-insensitive comparisons. - **Function summaries persisted to SQLite** (`function_summaries` table).
- Indexed scans now always re-analyze all files in Pass 2 when taint is enabled (conservative: global summaries may have changed even if a file didn't). - **Multi-language CFG + taint support** for all 10 languages.
- **Resource leak detection** for C/C++, Go, Rust, and Java.
### Fixed - **Finding scoring system** combining severity, entry-point proximity, path complexity, taint confirmation, and confidence.
- Clippy `ptr_arg` lint in perf tests (`&PathBuf` -> `&Path`). - **Analysis modes**: `Full` (default), `Ast` (`--ast-only`), `Taint` (`--cfg-only`).
- **Cap bitflags expanded**: `ENV_VAR`, `HTML_ESCAPE`, `SHELL_ESCAPE`, `URL_ENCODE`, `JSON_PARSE`, `FILE_IO`.
- Performance: read-once/hash-once via `_from_bytes` variants, lock-free rayon, SQLite WAL + 8 MB cache + 256 MB mmap.
- Tracing instrumentation on all pipeline stages; criterion benchmark suite.
## [0.2.0-alpha] - 2025-06-28 ## [0.2.0-alpha] - 2025-06-28
### Added - Experimental intra-procedural CFG + taint analysis for Rust. Builds a CFG, applies dataflow, and flags unsanitised Source → Sink paths (e.g. `env::var` → `Command::new`).
- Experimental intra-procedural CFG + taint analysis for Rust. Nyx now builds a control-flow graph, applies dataflow rules, and flags unsanitised Source → Sink paths (e.g. env::var → Command::new). - O(1) node-kind lookup via per-language PHF tables.
- O(1) node-kind lookup via per-language PHF tables for zero-cost dispatch. - Debug channel `target=cfg` (`RUST_LOG=nyx::cfg=debug`) to inspect generated graphs.
- Six unit tests covering conditionals, loops, sanitizers, and multiple sources. - Fixed Windows release pipeline (PowerShell has no `zip` command).
- Debug channel target=cfg (use RUST_LOG=nyx::cfg=debug) to inspect generated graphs.
### Fixed
- Fixed a bug in the release pipeline where Windows was trying to call the zip, PowerShell doesn't have a zip command
## [0.1.1-alpha] - 2025-06-25 ## [0.1.1-alpha] - 2025-06-25
### Fixed - Fixed `scan --no-index` not respecting the `max_results` config setting (#1).
- Fixed a bug where the `scan --no-index` command would not respect the `max_results` config setting (#1) - Integration tests covering indexing and scanning pipelines (#3, #4, #5, #8).
### Added
- Integration tests covering indexing and scanning pipelines (#3, #4, #5, #8)
## [0.1.0-alpha] - 2025-06-25 ## [0.1.0-alpha] - 2025-06-25
### Added Initial alpha release.
- Initial alpha release of **Nyx** CLI tool
- Multi-language AST pattern scanning via `tree-sitter` for Rust, C/C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript - Multi-language AST pattern scanning via `tree-sitter` for Rust, C/C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript.
- `scan` command: filesystem walker, pattern execution, console output - `scan` command: filesystem walker, pattern execution, console output.
- `index` command: build, rebuild, and status reporting of SQLite-backed index - `index` command: build, rebuild, and status reporting of SQLite-backed index.
- `list` command: list indexed projects with optional verbosity - `list` command: list indexed projects with optional verbosity.
- `clean` command: remove one or all project indexes - `clean` command: remove one or all project indexes.
- Configuration system with `nyx.conf` (generated) and `nyx.local` (user overrides) - Configuration system with `nyx.conf` (generated) and `nyx.local` (user overrides).
- Default severity levels: High, Medium, Low - Default severity levels: High, Medium, Low.
- Unit tests for core modules (config, ext, project utils)

73
CLA.md Normal file

@ -0,0 +1,73 @@
# Nyx Contributor License Agreement
## Why this exists
Nyx is an open source project and will always have a fully open-source core available to the community.
This Contributor License Agreement (CLA) exists to ensure the long-term sustainability of the project. It allows Nyx to evolve over time, including improving, distributing, and potentially offering commercial versions or services that support continued development.
**You retain ownership of your contributions.** This agreement simply grants the project the rights needed to use and evolve them.
---
Thank you for your interest in contributing to Nyx (the "Project"). This Contributor License Agreement ("Agreement") clarifies the intellectual property rights granted with each Contribution from any person or entity. It is for Your protection as a contributor as well as the protection of the Project and its users.
By submitting a Contribution to the Project, You accept and agree to the terms below. If You do not agree to these terms, please do not submit Contributions.
## 1. Definitions
**"You"** (or **"Your"**) means the individual or legal entity making a Contribution to the Project. For a legal entity, "You" includes the entity and any entity that controls, is controlled by, or is under common control with that entity.
**"Contribution"** means any work of authorship, including any modifications or additions to an existing work, that is intentionally submitted by You to the Project for inclusion in, or documentation of, the Project. "Submitted" means any form of electronic, verbal, or written communication sent to the Project (including but not limited to pull requests, patches, and issue comments) but excluding communication that is conspicuously marked or otherwise designated in writing by You as "Not a Contribution."
## 2. Copyright License Grant
Subject to the terms of this Agreement, You hereby grant to the Project, to any entity that maintains or succeeds it, and to recipients of software distributed by the Project a perpetual, worldwide, non-exclusive, royalty-free, irrevocable copyright license, with the right to sublicense through multiple tiers of sublicensees, to reproduce, prepare derivative works of, publicly display, publicly perform, distribute, and sublicense Your Contribution and such derivative works.
## 3. Patent License Grant
Subject to the terms of this Agreement, You hereby grant to the Project, to any entity that maintains or succeeds it, and to recipients of software distributed by the Project a perpetual, worldwide, non-exclusive, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer Your Contribution and any combination of Your Contribution with the Project to which it was submitted. This patent license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution alone or by combination of Your Contribution with the Project.
If any entity institutes patent litigation against You or any other entity (including a cross-claim or counterclaim in a lawsuit) alleging that Your Contribution, or the Project to which You have contributed, constitutes direct or contributory patent infringement, then any patent licenses granted to that entity under this Agreement for that Contribution or Project shall terminate as of the date such litigation is filed.
## 4. Relicensing Right
In addition to the licenses granted in Sections 2 and 3, You grant the Project and any entity that maintains or succeeds it the right to relicense Your Contribution, in whole or in part, under terms other than the Project's current license (currently GPL-3.0-or-later), where necessary to support the long-term sustainability, distribution, and evolution of the Project.
This may include, without limitation:
1. Dual-licensing the Project under a commercial license;
2. Combining Your Contribution with proprietary components; or
3. Moving the Project to a different open source license.
This right is irrevocable and may be exercised by the Project's maintainers as part of maintaining and evolving the Project.
## 5. Moral Rights Waiver
To the maximum extent permitted by applicable law, You waive, and agree not to assert, any moral rights or similar rights of attribution and integrity that You may have in Your Contribution against the Project, its successors, and recipients of software distributed by the Project. To the extent such rights cannot be waived under applicable law, You agree not to enforce them in a manner that would limit the rights granted under this Agreement.
## 6. Representations
You represent that:
1. Each of Your Contributions is Your original creation, or You otherwise have the legal right to submit it under the terms of this Agreement;
2. To the best of Your knowledge, Your Contribution does not infringe any third party's copyright, patent, trade secret, or other intellectual property rights; and
3. You have the legal authority to enter into this Agreement and to grant the licenses set forth above.
If any portion of Your Contribution is not Your original creation, You will identify the source and any license or other restriction applicable to that material as part of Your submission.
## 7. Employer Authorization
If You are submitting a Contribution on behalf of Your employer, or the Contribution was made within the scope of Your employment, You represent that Your employer has authorized You to make the Contribution and to grant the licenses set forth in this Agreement. If You are unsure, please confirm with Your employer before submitting.
## 8. No Warranty
You provide Your Contributions on an "AS IS" basis, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties of title, non-infringement, merchantability, or fitness for a particular purpose. You are not required to provide support for Your Contributions, except to the extent You desire to provide such support.
## 9. Copyright Retained
You retain copyright to Your Contribution. This Agreement grants the licenses set forth above; it does not transfer ownership. Its purpose is to give the Project flexibility to evolve and to relicense the codebase over time without needing to obtain permission from each past contributor on a case-by-case basis.
## 10. Notice of Changes
If You become aware of any facts or circumstances that would make any representation in this Agreement inaccurate in any respect, You agree to notify the Project promptly.


@ -61,7 +61,7 @@ representative at an online or offline event.
Instances of abusive, harassing, or otherwise unacceptable behavior may be Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at reported to the community leaders responsible for enforcement at
**opening a private issue** at [https://github.com/ecpeter23/nyx/issues/new/choose](). **opening a private issue** at [https://github.com/elicpeter/nyx/issues/new/choose]().
All complaints will be reviewed and investigated promptly and fairly. All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the All community leaders are obligated to respect the privacy and security of the


@ -1,142 +1,399 @@
# Contributing to Nyx # Contributing to Nyx
First off, **thank you for taking the time to contribute!** By participating in this project, you agree to abide by the community values and expectations described in our [Code of Conduct](CODE_OF_CONDUCT.md). Thank you for your interest in improving Nyx. This guide covers everything you need to contribute effectively.
Nyx is dual-licensed under **MIT** and **Apache-2.0**. By submitting code, documentation, or any other material, you agree to license your contribution under these same terms. User-facing documentation lives at **[elicpeter.github.io/nyx](https://elicpeter.github.io/nyx/)**; the source for those pages is in [`docs/`](docs/).
Please read our [Code of Conduct](CODE_OF_CONDUCT.md) before participating.
--- ---
## Table of Contents ## Table of Contents
1. [Getting Started](#getting-started) 1. [Development Setup](#development-setup)
2. [How to Contribute](#how-to-contribute) 2. [Project Layout](#project-layout)
3. [How to Add a New AST Pattern](#how-to-add-a-new-ast-pattern)
* [Bug Reports](#bug-reports) 4. [How to Add a New Taint Rule](#how-to-add-a-new-taint-rule)
* [Feature Requests](#feature-requests) 5. [How to Add a New Language](#how-to-add-a-new-language)
* [Pull Requests](#pull-requests) 6. [Testing](#testing)
3. [Development Workflow](#development-workflow) 7. [Pull Request Guidelines](#pull-request-guidelines)
4. [Commit & Branching Conventions](#commit--branching-conventions) 8. [Bug Reports](#bug-reports)
5. [Style Guide](#style-guide) 9. [Feature Requests](#feature-requests)
6. [Security Policy](#security-policy) 10. [Release Process](#release-process)
7. [Community Standards](#community-standards)
--- ---
## Getting Started ## Development Setup
Clone the repository and build Nyx in release mode: ### Prerequisites
- **Rust 1.88+** (edition 2024)
- Git
- **Node 20+** — only if you touch the browser UI under `frontend/` (the
`nyx serve` web app). Pure-Rust changes do not need it.
### Building
```bash ```bash
git clone https://github.com/<yourorg>/nyx.git git clone https://github.com/elicpeter/nyx.git
cd nyx cd nyx
cargo build --release
cargo build # Debug build
cargo build --release # Release build
cargo install --path . # Install as `nyx` binary
``` ```
Run the test-suite: ### Running Quality Checks
The fastest way to reproduce CI locally is the bundled script — it runs the same
commands CI runs (fmt, Clippy, tests, and the frontend checks):
```bash ```bash
cargo test ./scripts/check.sh # Mirror CI: fmt + clippy + tests (+ frontend)
./scripts/check.sh --rust-only # Skip the frontend checks
./scripts/fix.sh # Auto-fix: cargo fmt + clippy --fix + prettier/eslint
``` ```
> **Tip**: The first build downloads and compiles several `tree-sitter` grammars. Later builds will be faster. Or run the steps individually:
```bash
cargo test --all-features # Tests, incl. tests/ integration suite
cargo clippy --all-targets --all-features -- -D warnings # Lint, warnings = errors
cargo fmt # Format code
cargo fmt -- --check # Check formatting without modifying
```
> **Match CI exactly.** CI lints and tests with `--all-targets --all-features`.
> The older `cargo test --bin nyx` / `cargo clippy --all` commands skip the
> `tests/` integration suite and feature-gated code, so they can pass locally
> while CI fails. Prefer `./scripts/check.sh`.
> **Note**: The first build downloads and compiles tree-sitter grammars for all 10 languages. Subsequent builds are faster.
### Benchmarks
```bash
cargo bench --bench scan_bench
```
Benchmark fixtures live in `benches/fixtures/`. Criterion produces HTML reports in `target/criterion/`.
--- ---
## How to Contribute ## Project Layout
### Bug Reports > **New here?** [`docs/how-it-works.md`](docs/how-it-works.md) walks the analysis
> pipeline end to end (with a diagram), and [`docs/detectors/taint.md`](docs/detectors/taint.md)
> covers the taint engine. The easiest first contribution is usually a new AST
> pattern (see [below](#how-to-add-a-new-ast-pattern)) — small, self-contained,
> and well templated.
* Search existing [issues](https://github.com/<yourorg>/nyx/issues) to ensure the bug has not already been reported. ```
* Include **steps to reproduce**, expected vs. actual behaviour, and your environment details (`nyx --version`, `rustc --version`). src/
* Attach a minimal code sample if possible. main.rs CLI entry point
lib.rs Library re-exports (benchmarks, integration tests)
### Feature Requests cli.rs Clap command definitions
commands/ Subcommand handlers (scan, index, list, clean, config, serve)
We welcome well-motivated feature proposals. Please describe: ast.rs Entry points for both passes; tree-sitter parsing
cfg/ CFG construction from AST, type hierarchy
1. **Problem statement** - what pain point does this solve? cfg_analysis/ CFG structural detectors
2. **Proposed solution** - high-level description, optionally with pseudo-code. guards.rs Unguarded sink detection (dominator analysis)
3. **Alternatives considered** - why existing functionality is not enough. auth.rs Auth gap detection
resources.rs Resource leak detection
### Pull Requests error_handling.rs Error fallthrough detection
unreachable.rs Unreachable security code detection
Every PR should: rules.rs Guard rules, auth rules, resource pairs
ssa/ SSA IR (lowering, optimization passes, const prop)
1. Target the `main` branch. taint/ SSA-based taint engine (sole engine since 0.5.0)
2. Contain a single, focused change (small orthogonal fixes are okay). mod.rs Facade + JS two-level solve
3. Pass `cargo test`, `cargo fmt --check`, and `cargo clippy -- -D warnings`. domain.rs Shared lattice types (VarTaint, Cap, TaintOrigin)
4. Update documentation and, when relevant, add tests. ssa_transfer/ Block-level worklist, k=1 inline cache, gated sinks
5. Reference related issue numbers in the description (`Fixes #123`). backwards.rs Demand-driven backwards taint walk (opt-in)
path_state.rs Predicate tracking and contradiction pruning
A reviewer will provide feedback within **3 business days**. Squash-merge is the default strategy; maintainers may edit commit messages for clarity. state/
engine.rs Generic monotone dataflow engine (Transfer<S: Lattice>)
transfer.rs DefaultTransfer: resource lifecycle + auth state
summary/ FuncSummary, SsaFuncSummary, GlobalSummaries, hierarchy index
abstract_interp/ Interval + string prefix/suffix domains
pointer/ Field-sensitive points-to (Steensgaard-style)
symex/ Symbolic execution + witness generation
constraint/ Path-constraint solving (optional Z3 via `smt` feature)
auth_analysis/ Rust auth rule (`rs.auth.missing_ownership_check`) + sink classes
suppress/ Inline `nyx:ignore` directive parsing
labels/ Per-language label rules (one file per language)
patterns/ Per-language AST pattern queries (one file per language)
callgraph.rs Call graph construction (petgraph), SCC, topo sort
database.rs SQLite indexing via r2d2 pool
rank.rs Attack-surface ranking
fmt.rs Console output formatting
output.rs SARIF 2.1 builder
walk.rs Parallel file walker (ignore crate, respects .gitignore)
symbol/ Symbol interning (SymbolId)
server/ `nyx serve` HTTP layer, routes, triage sync
interop.rs Cross-language interop edges
engine_notes.rs Direction-aware engine notes (UnderReport / OverReport / Bail)
evidence.rs Structured evidence emitted with each finding
errors.rs NyxError, NyxResult types
utils/
config.rs TOML config loading, merging, Config struct
```
--- ---
## Development Workflow ## How to Add a New AST Pattern
1. **Fork** the repo and create your feature branch: AST patterns are the simplest detector to add. Each pattern is a tree-sitter query that matches a structural code construct.
```bash ### Step-by-step
git checkout -b feature/myfeature
1. **Pick the language file** under `src/patterns/<lang>.rs`.
2. **Choose the metadata**:
| Field | Options | Guidelines |
|-------|---------|------------|
| **ID** | `<lang>.<category>.<specific>` | e.g. `py.cmdi.os_popen` |
| **Tier** | `A` or `B` | `A` = presence alone is high-signal; `B` = query includes a heuristic guard |
| **Severity** | `High`, `Medium`, `Low` | High: command exec, deser, banned functions. Medium: SQL concat, reflection, XSS. Low: weak crypto, code quality. |
| **Category** | See `PatternCategory` enum | `CommandExec`, `CodeExec`, `Deserialization`, `SqlInjection`, `PathTraversal`, `Xss`, `Crypto`, `Secrets`, `InsecureTransport`, `Reflection`, `MemorySafety`, `Prototype`, `CodeQuality` |
3. **Write the tree-sitter query**:
```rust
Pattern {
id: "py.cmdi.os_popen",
description: "os.popen() shell command execution",
query: r#"(call
function: (attribute
object: (identifier) @pkg (#eq? @pkg "os")
attribute: (identifier) @fn (#eq? @fn "popen")))
@vuln"#,
severity: Severity::High,
tier: PatternTier::A,
category: PatternCategory::CommandExec,
},
``` ```
2. Make your changes, then run: The query **must** capture a `@vuln` node. That node's span determines the reported location.
4. **Test it**:
```bash ```bash
cargo fmt cargo test --bin nyx
```
5. **Update docs**: Add the new rule to `docs/rules/<lang>.md`.
### Tips
- Use the [tree-sitter playground](https://tree-sitter.github.io/tree-sitter/playground) to develop and test queries.
- Avoid duplicating taint coverage. If the same function is already a labeled sink in `src/labels/<lang>.rs`, the AST pattern is still useful for `--mode ast`, but use a distinct ID namespace. The dedup pass prevents exact-duplicate findings at the same location.
- Test with real-world code to check false positive rates before choosing a tier.
---
## How to Add a New Taint Rule
Taint rules define sources (where untrusted data enters), sinks (where dangerous operations happen), and sanitizers (where data is made safe).
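As a rough illustration of how the three roles interact, taint can be pictured as a set of capability bits: a source sets bits, a sanitizer clears the bits it covers, and a sink fires if any bit it cares about is still set. The constants and functions below are hypothetical names chosen for this sketch; they are not Nyx's actual `Cap` type or engine API, which may be laid out differently.

```rust
// Hypothetical capability bits for illustration only; the real `Cap` type
// in the taint engine may use different names, widths, and values.
const SHELL_ESCAPE: u8 = 0b001;
const HTML_ESCAPE: u8 = 0b010;
const URL_ENCODE: u8 = 0b100;
const ALL: u8 = SHELL_ESCAPE | HTML_ESCAPE | URL_ENCODE;

/// A source introduces taint for the given capability bits.
fn source(caps: u8) -> u8 {
    caps
}

/// A sanitizer strips the capability bits it covers and leaves the rest.
fn sanitize(taint: u8, caps: u8) -> u8 {
    taint & !caps
}

/// A sink fires if any capability bit it requires is still tainted.
fn sink_fires(taint: u8, caps: u8) -> bool {
    taint & caps != 0
}
```

Under this model, a value from `source(ALL)` that passes through `sanitize(.., HTML_ESCAPE)` no longer triggers an HTML sink but would still trigger a shell sink, which is why a sanitizer's capability should match the sink it is meant to protect.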
### Step-by-step
1. **Open the language file** in `src/labels/<lang>.rs`.
2. **Add an entry** to the `RULES` slice:
```rust
LabelRule {
matchers: &["dangerouslySetInnerHTML"],
label: DataLabel::Sink(Cap::HTML_ESCAPE),
},
```
3. **Choose the right label type**:
| Type | Purpose | Example |
|------|---------|---------|
| `DataLabel::Source(cap)` | Introduces tainted data | `env::var`, `req.body` |
| `DataLabel::Sanitizer(cap)` | Strips matching capability bits | `html_escape`, `encodeURIComponent` |
| `DataLabel::Sink(cap)` | Dangerous operation requiring sanitization | `eval`, `innerHTML`, `Command::new` |
4. **Choose capabilities**:
| Capability | When to use |
|-----------|-------------|
| `Cap::all()` | Sources that produce universally dangerous data |
| `Cap::SHELL_ESCAPE` | Shell command injection sinks/sanitizers |
| `Cap::HTML_ESCAPE` | XSS sinks/sanitizers |
| `Cap::URL_ENCODE` | URL injection sinks/sanitizers |
| `Cap::JSON_PARSE` | JSON parsing sanitizers |
| `Cap::FILE_IO` | File I/O sinks |
| `Cap::FMT_STRING` | Format string sinks |
| `Cap::ENV_VAR` | Environment/config data sources |
5. **Matcher semantics**:
- Case-insensitive suffix matching by default.
- If a matcher ends with `_`, it acts as a prefix match.
- Multiple matchers in one rule are alternatives (any match triggers the rule).
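The matcher semantics above can be sketched as a small predicate. This is a minimal illustration, not the code in `src/labels/`: the function names are hypothetical, and it assumes that a trailing `_` stays part of the prefix and that prefix matching is also case-insensitive.

```rust
/// Returns true if `name` matches a single `matcher`.
/// Assumptions (not confirmed by the source): the trailing `_` is kept as
/// part of the prefix, and prefix matching is case-insensitive as well.
fn matcher_hits(matcher: &str, name: &str) -> bool {
    let m = matcher.to_ascii_lowercase();
    let n = name.to_ascii_lowercase();
    if m.ends_with('_') {
        // Trailing underscore: treat the matcher as a prefix.
        n.starts_with(&m)
    } else {
        // Default: case-insensitive suffix match.
        n.ends_with(&m)
    }
}

/// Multiple matchers in one rule are alternatives: any hit triggers the rule.
fn rule_matches(matchers: &[&str], name: &str) -> bool {
    matchers.iter().any(|m| matcher_hits(m, name))
}
```

With these assumptions, `rule_matches(&["innerHTML"], "element.innerhtml")` holds because of the case-insensitive suffix rule, while `rule_matches(&["eval"], "evaluate")` does not, since `eval` is not a suffix of `evaluate`.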
### User-defined rules (no code change needed)
Users can add taint rules via config:
```toml
[[analysis.languages.javascript.rules]]
matchers = ["dangerouslySetInnerHTML"]
kind = "sink"
cap = "html_escape"
```
Or via CLI:
```bash
nyx config add-rule --lang javascript --matcher dangerouslySetInnerHTML --kind sink --cap html_escape
```
---
## How to Add a New Language
Adding a new language requires changes across several modules. Use an existing language (e.g. Go or Python) as a template.
### Checklist
1. **Tree-sitter parser**: Add `tree-sitter-<lang>` to `Cargo.toml`.
2. **Language registration**: Register the parser in `ast.rs` (language detection from file extension, parser initialization).
3. **CFG node kinds**: Create `src/labels/<lang>.rs` with a `KINDS` map that maps tree-sitter node types to the internal `Kind` enum (`Block`, `If`, `While`, `For`, `Return`, `CallFn`, `CallMethod`, `Assignment`, etc.).
4. **Parameter extraction**: Add a `PARAM_CONFIG` constant specifying how to extract function parameters from the AST (field name for parameter list, node type for individual parameters, extraction field for parameter names).
5. **Label rules**: Add `RULES` (sources, sinks, sanitizers) and `TERMINATORS` to the labels file.
6. **AST patterns**: Create `src/patterns/<lang>.rs` with a `PATTERNS` constant.
7. **Registry updates**:
- `src/patterns/mod.rs`: add to the `REGISTRY` HashMap
- `src/labels/mod.rs`: add to the `classify()` dispatch
8. **File extension mapping**: Add the extension in `ast.rs`.
9. **Tests**: Write unit tests and add test fixtures.
---
## Testing
### Tests
Unit tests are inline `#[test]` blocks inside source modules; integration tests
live under `tests/`. Run everything the way CI does:
```bash
cargo test --all-features
```
### What to Test
- **New AST patterns**: Ensure the tree-sitter query matches the intended construct and does not match safe alternatives.
- **New taint rules**: Verify that source-to-sink flows are detected and that sanitizers properly neutralize findings.
- **New CFG rules**: Test that guard dominance logic correctly suppresses findings when guards are present.
- **Edge cases**: Empty files, files with syntax errors (tree-sitter is error-tolerant), deeply nested structures.
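For the guard-dominance case, it may help to recall what "dominates" means: a guard protects a sink only if every path from the entry to the sink passes through the guard. The sketch below computes dominator sets with the textbook iterative algorithm on a toy CFG given as predecessor lists. It is an illustration under stated assumptions (node 0 is the entry, the function names are hypothetical) and is not how `cfg_analysis/guards.rs` is implemented.

```rust
use std::collections::BTreeSet;

/// Computes dominator sets for a CFG given as predecessor lists.
/// Assumption: node 0 is the entry node and the CFG has at least one node.
fn dominators(preds: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let n = preds.len();
    let all: BTreeSet<usize> = (0..n).collect();
    // Start from "everything dominates everything" and shrink to a fixpoint.
    let mut dom = vec![all; n];
    dom[0] = BTreeSet::from([0]);
    let mut changed = true;
    while changed {
        changed = false;
        for v in 1..n {
            // Intersect the dominator sets of all predecessors, then add v.
            let mut acc: Option<BTreeSet<usize>> = None;
            for &p in &preds[v] {
                acc = Some(match acc {
                    None => dom[p].clone(),
                    Some(a) => a.intersection(&dom[p]).cloned().collect(),
                });
            }
            let mut new = acc.unwrap_or_default();
            new.insert(v);
            if new != dom[v] {
                dom[v] = new;
                changed = true;
            }
        }
    }
    dom
}

/// A sink counts as guarded only if the guard node dominates it.
fn is_guarded(preds: &[Vec<usize>], guard: usize, sink: usize) -> bool {
    dominators(preds)[sink].contains(&guard)
}
```

A useful test pair follows the same shape: one fixture where the guard sits on the only path to the sink (the finding should be suppressed) and one where a second branch bypasses the guard (the finding should still be reported).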
### Linting
CI runs Clippy with strict settings. Before submitting:
```bash
cargo clippy --all-targets --all-features -- -D warnings
```
---
## Pull Request Guidelines
First-time contributors are welcome. If you are unsure where to start, open an issue and we can help identify a focused starter task.
1. **Branch from `master`**. Use descriptive branch names: `feat/add-kotlin-support`, `fix/false-positive-sql-concat`, `docs/update-rule-reference`.
2. **Keep PRs focused**. One logical change per PR.
3. **Ensure CI passes** — run `./scripts/check.sh` (mirrors CI), or the steps individually:
```bash
cargo test --all-features
cargo clippy --all-targets --all-features -- -D warnings cargo clippy --all-targets --all-features -- -D warnings
cargo test cargo fmt -- --check
``` ```
3. **Sign-off** your commits if your employer requires a Developer Certificate of Origin (DCO): 4. **Commit style**: Use [Conventional Commits](https://www.conventionalcommits.org/).
```
```bash feat(patterns): add Python subprocess.Popen pattern
git commit -s -m "feat: add XYZ" fix(taint): prevent false positive on sanitized innerHTML
docs(rules): update JavaScript rule reference
``` ```
4. Push the branch and open a PR against `main`. 5. **Document new rules**. If you add patterns or taint rules, update the corresponding `docs/rules/<lang>.md` page.
6. **Include test cases** for any new detection rules.
7. **Disclose material AI assistance** in the PR description if the change was drafted, generated, or substantially refactored by an AI tool. One line is enough. See [AI-POLICY.md](AI-POLICY.md) for the full policy and the bar we hold AI-assisted contributions to.
--- ---
## Commit & Branching Conventions ## Bug Reports
* **Branch names**: `feature/<slug>`, `fix/<slug>`, `docs/<slug>` Please [open an issue](https://github.com/elicpeter/nyx/issues) for:
* **Commit style** Conventional Commits (simplified):
```text - **Crashes or panics**: include the backtrace (`RUST_BACKTRACE=1 nyx scan .`)
type(scope): subject - **False positives**: include the minimal code snippet, rule ID, and Nyx version
- **False negatives**: describe what you expected Nyx to find and why
body (optional) - **Documentation errors**: point to the specific page and what's wrong
```
| Type | Use for |
|------------|--------------------------------------|
| `feat` | New functionality |
| `fix` | Bug fixes |
| `docs` | Documentation only |
| `refactor` | Code change without behaviour change |
| `test` | Adding or changing tests |
| `chore` | Build process, tooling |
--- ---
## Style Guide ## Feature Requests
* **Formatting**: run `cargo fmt` before committing. We welcome well-motivated feature proposals. Please describe:
* **Linting**: CI runs Clippy with `-D warnings`; keep the tree warning-free.
* **Unsafe Rust**: prohibited unless absolutely necessary. Justify with in-code comments. 1. **Problem statement**: what pain point does this solve?
* **Public API stability**: avoid breaking changes on exported types and functions without prior discussion. 2. **Proposed solution**: high-level description, optionally with pseudo-code.
3. **Alternatives considered**: why existing functionality is not enough.
--- ---
## Security Policy ## Release Process
Please do **not** open public issues for security-sensitive bugs. Instead, email the maintainers at `<security@example.com>` with the details and a proof of concept. We aim to acknowledge reports within **48 hours**. 1. Update version in `Cargo.toml`.
2. Update `CHANGELOG.md` with the new version section.
3. Run full checks: `./scripts/check.sh` (or `cargo test --all-features && cargo clippy --all-targets --all-features -- -D warnings`).
4. Create a git tag: `git tag v0.x.y`.
5. Push tag: `git push origin v0.x.y`.
6. CI builds release binaries and publishes to crates.io.
--- ---
## Community Standards ## Security Issues
We strive to maintain a welcoming and inclusive community. Harassment, discrimination, or other forms of unacceptable behavior will be addressed per the [Code of Conduct](CODE_OF_CONDUCT.md). Please do **not** open public issues for security-sensitive bugs. See [SECURITY.md](SECURITY.md) for our responsible disclosure process.
Thank you for helping to make Nyx better! ---
## License
### Contributions are released under GPL-3.0-or-later
By submitting a pull request, patch, or other contribution to Nyx, you agree that your contribution will be released under the [GPL-3.0-or-later](./LICENSE), the same license as the project.
### Developer Certificate of Origin
We use the Developer Certificate of Origin (DCO) as a lightweight baseline for contributions. All commits must include a `Signed-off-by:` trailer, which certifies that you wrote the code yourself or otherwise have the right to submit it under the project license.
Use `git commit -s` to add this automatically.
### Contributor License Agreement
Before your first contribution can be merged, you must sign the Nyx [Contributor License Agreement](./CLA.md).
The CLA does not transfer ownership of your work. You retain copyright to your contributions. It grants Nyx the rights needed to maintain, distribute, and evolve the project over time, including the flexibility to support long-term sustainability through future licensing or commercial offerings.
If you do not agree to these terms, please do not submit contributions to Nyx.

1370
Cargo.lock generated

File diff suppressed because it is too large


@@ -1,29 +1,67 @@
[package] [package]
name = "nyx-scanner" name = "nyx-scanner"
version = "0.2.0" version = "0.8.0"
edition = "2024" edition = "2024"
description = "A CLI security scanner for automating vulnerability checks" rust-version = "1.88"
license = "GPL-3.0" description = "A multi-language static analysis tool for detecting security vulnerabilities"
authors = ["Eli Peter <elicpeter@exmaple.com>"] license = "GPL-3.0-or-later"
homepage = "https://github.com/ecpeter23/nyx" authors = ["Eli Peter <elicpeter@example.com>"]
repository = "https://github.com/ecpeter23/nyx" homepage = "https://nyxsec.dev/scanner"
documentation = "https://github.com/ecpeter23/nyx#readme" repository = "https://github.com/elicpeter/nyx"
documentation = "https://nyxsec.dev/docs/nyx/"
keywords = ["security", "vulnerability", "scanner", "static-analysis", "cli"] keywords = ["security", "vulnerability", "scanner", "static-analysis", "cli"]
categories = ["command-line-utilities", "development-tools", "security"] categories = ["security", "command-line-utilities", "development-tools", "parser-implementations", "text-processing"]
readme = "README.md" readme = "README.md"
default-run = "nyx" default-run = "nyx"
exclude = [ include = [
"assets/", "/src/**",
".github/", "/tools/**",
".claude/", "/build.rs",
".idea/", "/Cargo.toml",
"tests/", "/Cargo.lock",
"benches/", "/README.md",
"examples/", "/LICENSE",
"/THIRDPARTY-LICENSES.html",
"/default-nyx.conf",
] ]
autoexamples = false autoexamples = false
[package.metadata.binstall]
pkg-url = "{ repo }/releases/download/v{ version }/nyx-{ target }{ archive-suffix }"
pkg-fmt = "zip"
bin-dir = "target/{ target }/release/{ bin }{ binary-ext }"
# docs.rs builds the `serve` feature (default) so the server module renders.
# `smt` is left off — bundled Z3 takes too long on docs.rs builders, and
# `smt-system-z3` needs a system library that isn't available there.
[package.metadata.docs.rs]
features = ["serve"]
rustdoc-args = ["--cfg", "docsrs"]
[features]
default = ["serve", "dynamic"]
serve = ["dep:axum", "dep:tokio", "dep:tokio-stream", "dep:tower-http"]
smt = ["dep:z3", "z3/bundled"]
smt-system-z3 = ["dep:z3"]
docgen = []
# Dynamic verification layer: builds harnesses from findings, runs them in a
# sandbox, reports back whether the sink fires.
dynamic = ["dep:bytes", "dep:h2", "dep:http", "dep:prost", "dep:tempfile", "dep:tokio"]
# Phase 19 (Track E.3): the `nyx-image-builder` helper binary that builds
# and pins per-toolchain Docker images. Gated so it does not bloat the
# default `nyx` build with extra TOML-write logic CI-only operators need.
image-builder = []
# Phase 20 (Track E.4): the firecracker VM backend. Off by default so
# the standard build pulls in zero Firecracker-related code; turning it
# on adds the `firecracker.rs` backend module and exposes
# `SandboxBackend::Firecracker` to callers. When the feature is on but
# the `firecracker` binary is absent on PATH, the backend returns
# `SandboxError::BackendUnavailable(SandboxBackend::Firecracker)` so the
# verifier can route around it cleanly.
firecracker = ["dynamic"]
[lib] [lib]
name = "nyx_scanner" name = "nyx_scanner"
path = "src/lib.rs" path = "src/lib.rs"
@@ -32,32 +70,49 @@ path = "src/lib.rs"
name = "nyx" name = "nyx"
path = "src/main.rs" path = "src/main.rs"
[[bin]]
name = "nyx-docgen"
path = "tools/docgen/main.rs"
required-features = ["docgen"]
[[bin]]
name = "nyx-image-builder"
path = "tools/image-builder/main.rs"
required-features = ["image-builder"]
[[bench]] [[bench]]
name = "scan_bench" name = "scan_bench"
harness = false harness = false
[[bench]]
name = "dynamic_bench"
harness = false
required-features = []
[dev-dependencies] [dev-dependencies]
tempfile = "3.26.0" tempfile = "3.27.0"
criterion = { version = "0.8", features = ["html_reports"] } criterion = { version = "0.8.2", features = ["html_reports"] }
assert_cmd = "2" assert_cmd = "2.2.2"
predicates = "3" predicates = "3.1.4"
glob = "0.3" glob = "0.3.3"
tower = { version = "0.5.3", features = ["util"] }
[dependencies] [dependencies]
directories = "6.0.0" directories = "6.0.0"
clap = { version = "4.5.60", features = ["derive"] } clap = { version = "4.6.1", features = ["derive"] }
serde = { version = "1.0.228", features = ["derive"] } serde = { version = "1.0.228", features = ["derive"] }
serde_json = "1.0" serde_json = "1.0.150"
toml = "1.0.3" rmp-serde = "1.3.1"
tracing-subscriber = { version = "0.3.22", features = ["env-filter", "json", "ansi","time"] } toml = "1.1.2"
tracing-subscriber = { version = "0.3.23", features = ["env-filter", "json", "ansi","time"] }
tracing = "0.1.44" tracing = "0.1.44"
num_cpus = "1.17.0" num_cpus = "1.17.0"
rusqlite = { version = "0.38.0", features = ["bundled"] } rusqlite = { version = "0.39.0", features = ["bundled"] }
r2d2_sqlite = { version = "0.32.0", features = ["bundled"] } r2d2_sqlite = { version = "0.34.0", features = ["bundled"] }
ignore = "0.4.25" ignore = "0.4.26"
tree-sitter = "0.26.5" tree-sitter = "0.26.9"
tree-sitter-rust = "0.24.0" tree-sitter-rust = "0.24.2"
tree-sitter-c = "0.24.1" tree-sitter-c = "0.24.2"
tree-sitter-cpp = "0.23.4" tree-sitter-cpp = "0.23.4"
tree-sitter-java = "0.23.5" tree-sitter-java = "0.23.5"
tree-sitter-typescript = "0.23.2" tree-sitter-typescript = "0.23.2"
@@ -67,15 +122,46 @@ tree-sitter-php = "0.24.2"
tree-sitter-python = "0.25.0" tree-sitter-python = "0.25.0"
tree-sitter-ruby = "0.23.1" tree-sitter-ruby = "0.23.1"
crossbeam-channel = "0.5.15" crossbeam-channel = "0.5.15"
blake3 = "1.8.3" blake3 = "1.8.5"
once_cell = "1.21.3" once_cell = "1.21.4"
console = "0.16.2" console = "0.16.3"
rayon = "1.11.0" terminal_size = "0.4.4"
rayon = "1.12.0"
r2d2 = "0.8.10" r2d2 = "0.8.10"
bytesize = "2.3.1" bytesize = "2.3.1"
chrono = { version = "0.4.44", default-features = false, features = ["std", "clock"] } chrono = { version = "0.4.45", default-features = false, features = ["std", "clock", "serde"] }
thiserror = "2.0.18" thiserror = "2.0.18"
dashmap = "7.0.0-rc2" dashmap = "6.2.1"
petgraph = "0.8.3" parking_lot = "0.12.5"
bitflags = "2.11.0" petgraph = { version = "0.8.3", features = ["serde-1"] }
bitflags = "2.12.1"
phf = { version = "0.13.1", features = ["macros"] } phf = { version = "0.13.1", features = ["macros"] }
indicatif = "0.18.4"
smallvec = { version = "1.15.1", features = ["serde"] }
rustc-hash = "2.1.2"
uuid = { version = "1.23.2", features = ["v4"] }
axum = { version = "0.8.9", optional = true }
bytes = { version = "1.11.1", optional = true }
h2 = { version = "0.4.14", optional = true }
http = { version = "1.4.1", optional = true }
prost = { version = "0.14.3", optional = true }
tokio = { version = "1.52.3", features = ["rt-multi-thread", "macros", "signal", "sync", "net", "io-util"], optional = true }
tokio-stream = { version = "0.1.18", features = ["sync"], optional = true }
tower-http = { version = "0.6.11", features = ["cors", "compression-gzip", "trace", "set-header", "limit"], optional = true }
z3 = { version = "0.20.0", optional = true}
tempfile = { version = "3.27.0", optional = true }
[lints.clippy]
# Allowed project-wide instead of per-file. The vast majority of
# `collapsible_if` hits are `if let Some(x) = .. { if cond { .. } }` patterns
# whose only "fix" is to collapse into a let-chain, which hurts readability on
# the complex extractor expressions throughout the engine. Keeping the decision
# here means the rationale lives in one place and new files inherit it
# automatically rather than re-declaring `#![allow(clippy::collapsible_if)]`.
collapsible_if = "allow"
[profile.release]
lto = true
codegen-units = 1
debug = 1
strip = "none"

89
LICENSE-GRANTS.md Normal file

@@ -0,0 +1,89 @@
# Internal License Grants
This file records dual-licensing grants the copyright holder of Nyx has issued
beyond the public GPL-3.0-or-later release.
Nyx ships publicly under GPL-3.0-or-later. That license continues to apply to
every public release on GitHub, crates.io, and any other channel. The grants
recorded here are separate, private licenses from the copyright holder to
specific projects. They do not modify the public GPL terms and they are not
transferable to third parties.
The right to issue these grants is preserved in `CLA.md` Section 4
(Relicensing Right):
> [The contributor] grants the Project and any entity that maintains or
> succeeds it the right to relicense Your Contribution, in whole or in part,
> under terms other than the Project's current license (currently
> GPL-3.0-or-later), where necessary to support the long-term sustainability,
> distribution, and evolution of the Project.
The copyright holder is the sole author of every Contribution to Nyx
(verifiable via `git log`). The CLA covers any future external Contributions.
The copyright holder may therefore grant any party, including projects owned
by the same copyright holder, a license to use Nyx under terms other than
GPL-3.0-or-later, without affecting the public GPL release.
## How forks are affected
A third-party fork of nyx-agent that obtains the nyx-agent source under PolyForm
Small Business 1.0.0 (or any successor source-available license) does not
acquire any rights to Nyx beyond the public GPL-3.0-or-later terms. The
internal grant below is project-to-project and non-transferable. Anyone
redistributing a binary that statically or dynamically links the `nyx` crate
must comply with the GPL on the `nyx` portion of the work. GPL is viral
copyleft on distribution. Only the copyright holder may issue further
dual-licensing grants.
---
## Grant Register
### Grant 1: nyx-agent
| Field | Value |
|---|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Grantor | Eli Peter, sole copyright holder of Nyx as of the effective date |
| Grantee | The nyx-agent project (`nyx-agent` daemon, web UI, and accompanying tooling). Repository: `nyx-agent` |
| Effective date | 2026-05-17 |
| Scope | All Nyx source code, documentation, fixtures, build artefacts, and binaries (the "Licensed Material") in any version released as of the effective date or thereafter, plus any future modifications the Grantor authors or accepts under the CLA |
| Permitted uses | (a) static or dynamic linking of the Licensed Material into the nyx-agent daemon; (b) modification of the Licensed Material as required for nyx-agent integration; (c) redistribution of the Licensed Material as part of the nyx-agent distribution; (d) sublicensing the Licensed Material to end users of nyx-agent solely under whatever license terms nyx-agent itself is distributed under (currently PolyForm Small Business 1.0.0, or a separately negotiated commercial license) |
| Restrictions | (a) this grant does not modify, supersede, or revoke the public GPL-3.0-or-later release of Nyx; (b) this grant is non-transferable; only the nyx-agent project, owned by the Grantor, may exercise it; (c) any third-party fork of nyx-agent must obtain Nyx under the public GPL terms unless it negotiates a separate grant from the Grantor; (d) attribution of Nyx authorship must be preserved in any redistribution per the CLA's moral-rights waiver |
| Duration | Perpetual and irrevocable, subject only to the Grantee maintaining ownership-or-control by the Grantor. If the nyx-agent project is sold, assigned, or otherwise transferred to a third party, this grant terminates and the new owner must negotiate a separate license |
| Sublicensing of the grant itself | Not permitted. The Grantee may distribute Nyx as part of nyx-agent to end users under nyx-agent's outward terms, but the Grantee may not grant any other project the right to use Nyx outside the public GPL terms |
| Governing law | Same as Nyx CLA |
---
## Adding future grants
New grants follow the same format as Grant 1. Append a new section
(`### Grant N: <recipient name>`) below the existing entries and commit to
the Nyx repository. Grants are append-only. Revisions land as superseding
entries with their own date, not as edits to the original.
Grants the Grantor anticipates issuing in the future include:
- Commercial-license SKU grants to individual customers of nyx-agent that
exceed the PolyForm Small Business threshold. These will be issued
per-customer under a separate Nyx Commercial License contract.
- Stewardship-transition grants if the project is ever handed off (for
example, to a foundation). These would be a single grant to the receiving
entity.
The Grantor reserves the right to refuse to issue any grant.
---
## What this file is NOT
- It is not a redistribution license. Third parties cannot rely on it to use
Nyx outside the public GPL terms.
- It is not a Contributor License Agreement. `CLA.md` covers contribution
terms separately.
- It is not a public-facing license file. The canonical public license for
Nyx is `LICENSE` (GPL-3.0-or-later).
---
Copyright (c) 2026 Eli Peter. All rights reserved.

445
README.md

@@ -1,291 +1,302 @@
<div align="center"> <div align="center">
<img src="assets/logo.png" alt="nyx logo" width="300"/> <img src="assets/nyx-readme-header.png" alt="NYX" width="640"/>
**Fast, cross-language cli vulnerability scanner.** **A local-first security scanner with sandboxed dynamic verification and a browser UI. Scan your repo and triage in your browser, with no cloud and no account.**
[![crates.io](https://img.shields.io/crates/v/nyx-scanner.svg)](https://crates.io/crates/nyx-scanner) [![crates.io](https://img.shields.io/crates/v/nyx-scanner.svg)](https://crates.io/crates/nyx-scanner)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0) [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Rust 1.85+](https://img.shields.io/badge/rust-1.85%2B-orange)](https://www.rust-lang.org) [![Rust 1.88+](https://img.shields.io/badge/rust-1.88%2B-orange)](https://www.rust-lang.org)
[![CI](https://img.shields.io/github/actions/workflow/status/ecpeter23/nyx/ci.yml?branch=master)](https://github.com/ecpeter23/nyx/actions) [![CI](https://img.shields.io/github/actions/workflow/status/elicpeter/nyx/ci.yml?branch=master)](https://github.com/elicpeter/nyx/actions)
[![Docs](https://img.shields.io/badge/docs-nyxscan.dev%2Fdocs-blue)](https://nyxscan.dev/docs/)
English · [简体中文](./README.zh-CN.md)
</div> </div>
--- <p align="center"><img src="assets/screenshots/demo.gif" alt="Nyx UI walkthrough: empty Welcome state, kicking off a scan, the populated overview with Health Score, drilling into a HIGH finding's flow visualizer, then the triage flow" width="900"/></p>
## What is Nyx?
**Nyx** is a lightweight, lightning-fast Rust-native command-line tool that detects security vulnerabilities across 10 programming languages. It combines [`tree-sitter`](https://tree-sitter.github.io/) parsing, intra-procedural control-flow graphs, and cross-file taint analysis with an optional SQLite-backed index to deliver deep, repeatable scans on projects of any size.
--- ---
## Key Capabilities ## Scan locally, browse locally
| Capability | Description | Nyx runs cross-language taint analysis on your repository, then verifies Medium or higher confidence findings by running small sandboxed harnesses against the real code. Results are served to a React UI bound to `127.0.0.1`. You get severity, static evidence, dynamic verdicts, and a step-by-step **flow visualiser** that walks the dataflow from source → sanitizer → sink. Triage decisions persist to `.nyx/triage.json`, which commits alongside your code so the team shares one triage state.
```bash
cargo install nyx-scanner
nyx scan # runs the analyzer, caches findings in .nyx/
nyx serve # opens http://localhost:9700 in your browser
```
Everything stays on your machine: loopback-only bind, host-header enforcement, CSRF on every mutation, no remote telemetry, no login.
<p align="center"><img src="assets/screenshots/overview.png" alt="Overview dashboard for a small JS app: Health Score C 78 with the five-component breakdown (Severity pressure, Confidence quality, Trend, Triage coverage, Regression resistance), 3 findings detected, OWASP A03 and A02 buckets, confidence distribution and issue category bars, top affected files" width="900"/></p>
---
## What's in the UI
| Page | What it shows |
|---|---| |---|---|
| Multi-language support | Rust, C, C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript | | **Overview** | Dashboard: finding counts by severity, top offenders, engine profile summary |
| AST-level pattern matching | Language-specific queries written against precise parse trees | | **Findings** | Browsable list with severity badges, triage status, rule filter, language filter |
| Control-flow graph analysis | Auth gaps, unguarded sinks, unreachable security code, resource leaks, error fallthrough | | **Finding detail** | Flow-path visualiser with numbered steps (source → sanitizer → sink), dynamic verdicts, code snippets, evidence, cross-file markers, triage dropdown |
| Cross-file taint tracking | BFS taint propagation from sources through sanitizers to sinks with function summaries | | **Triage** | Bulk update states (open, investigating, fixed, false_positive, accepted_risk, suppressed), audit trail, import/export JSON |
| Cross-language interop | Taint flows across language boundaries via explicit interop edges | | **Explorer** | File tree with per-file symbol list and finding overlay |
| Two-pass architecture | Pass 1 extracts function summaries; Pass 2 runs taint with full cross-file context | | **Scans** | Run history, metrics, diff two scans to see what changed |
| Incremental indexing | SQLite database stores file hashes, summaries, and findings to skip unchanged files | | **Rules** | Built-in and custom rules per language; add rules from the UI |
| Parallel execution | File walking and analysis run concurrently via Rayon; scales with available CPU cores | | **Config** | Live config editor; reload without restart |
| Configurable scan parameters | Exclude directories, set maximum file size, tune worker threads, limit output, and more |
| Multiple output formats | Human-readable console view (default) and machine-readable JSON |
`nyx serve` flags: `--port <N>` (default `9700`), `--host <addr>` (loopback only: `127.0.0.1`, `localhost`, or `::1`), `--no-browser`. See `[server]` in `nyx.conf` for persistent settings, and the [Browser UI guide](https://nyxscan.dev/docs/serve.html) for the page-by-page UI tour and security model.
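To make the host-header enforcement mentioned above concrete, here is a minimal Python sketch of a loopback-only `Host` check. It is an illustration, not Nyx's implementation: the function names are hypothetical, and accepting a header without a port is an assumption made here. Checks of this kind are commonly used to blunt DNS-rebinding attacks against servers bound to loopback.

```python
# Hypothetical sketch of a loopback-only Host header check (not Nyx's code).
ALLOWED_HOSTS = {"127.0.0.1", "localhost", "::1"}


def parse_host_header(value):
    """Split a Host header into (host, port); return None if it is malformed."""
    value = value.strip().lower()
    if value.startswith("["):
        # IPv6 literals are bracketed in Host headers, e.g. "[::1]:9700".
        end = value.find("]")
        if end == -1:
            return None
        host, rest = value[1:end], value[end + 1:]
    else:
        host, colon, tail = value.partition(":")
        rest = colon + tail
    if rest == "":
        return host, None
    port_text = rest[1:]
    if rest.startswith(":") and port_text.isascii() and port_text.isdigit():
        return host, int(port_text)
    return None


def host_header_allowed(value, expected_port=9700):
    """Accept only loopback names; a missing port is tolerated (an assumption)."""
    parsed = parse_host_header(value)
    if parsed is None:
        return False
    host, port = parsed
    return host in ALLOWED_HOSTS and port in (None, expected_port)
```

With this sketch, `host_header_allowed("[::1]:9700")` is accepted while `host_header_allowed("127.0.0.1.evil.example")` is rejected, because the comparison is against the whole host name rather than a prefix.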
--- ---
## Why choose Nyx? ## CLI for CI
| Advantage | What it means for you | The same engine runs headless for CI pipelines. SARIF output uploads directly to GitHub Code Scanning.
|---|---|
| **Pure-Rust, single binary** | No JVM, Python, or server to install; drop the `nyx` executable into your `$PATH` and go. |
| **Massively parallel** | Uses Rayon and a thread-pool walker; scales to all CPU cores. Scanning the entire **rust-lang/rust** codebase (~53,000 files) on an M2 MacBook Pro takes **~1 s**. |
| **Deep analysis** | Real CFG construction and taint propagation, not just regex matching. Cross-file function summaries, capability-based sanitizer tracking, and scored findings. |
| **Index-aware** | An optional SQLite index stores file hashes and findings; subsequent scans touch *only* changed files, slashing CI times. |
| **Offline & privacy-friendly** | Requires no login, cloud account, or telemetry. Perfect for air-gapped environments and strict compliance policies. |
| **Tree-sitter precision** | Parses real language grammars, not regexes, giving far fewer false positives than line-based scanners. |
| **Extensible** | Add new patterns with concise `tree-sitter` queries; no SaaS lock-in. |
--- <p align="center"><img src="assets/screenshots/cli-scan.gif" alt="nyx scan console output: HIGH taint findings across a JS and Python file with source → sink arrows" width="820"/></p>
## Installation
### Install crate
```bash
$ cargo install nyx-scanner
```
### Install Github release
1. Navigate to the [Releases](https://github.com/ecpeter23/nyx/releases) page of the repository.
2. Download the appropriate binary for your system:
```nyx-x86_64-unknown-linux-gnu.zip``` for Linux
```nyx-x86_64-pc-windows-msvc.zip``` for Windows
```nyx-x86_64-apple-darwin.zip``` or ```nyx-aarch64-apple-darwin.zip``` for macOS (Intel or Apple Silicon)
3. Unzip the file and move the executable to a directory in your system PATH:
```bash
# Example for Unix systems
unzip nyx-x86_64-unknown-linux-gnu.zip
chmod +x nyx
sudo mv nyx /usr/local/bin/
```
```bash
# Example for Windows in PowerShell
Expand-Archive -Path nyx-x86_64-pc-windows-msvc.zip -DestinationPath .
Move-Item -Path .\nyx.exe -Destination "C:\Program Files\Nyx\" # Add to PATH manually if needed
```
4. Verify the installation:
```bash
nyx --version
```
### Build from source
```bash ```bash
$ git clone https://github.com/ecpeter23/nyx.git # Fail the job on medium or higher, emit SARIF
$ cd nyx nyx scan --format sarif --fail-on MEDIUM > results.sarif
$ cargo build --release
# optional copy the binary into PATH # Ad-hoc JSON, no index
$ cargo install --path . nyx scan ./server --format json --index off
# AST patterns only (fastest; skips CFG + taint)
nyx scan --mode ast
# Engine-depth shortcut: fast | balanced (default) | deep
# `deep` adds symex + demand-driven backwards taint for higher precision at ~2-3× cost
nyx scan --engine-profile deep
``` ```
Nyx targets **stable Rust 1.85 or later**. Forward cross-file taint runs in every profile. Symex and the demand-driven backwards walk are opt-in. Turn them on either via `--engine-profile deep`, or individually (`--symex`, `--backwards-analysis`). See the [CLI reference](https://nyxscan.dev/docs/cli.html#engine-depth-profile) for the full toggle matrix.
### GitHub Action
```yaml
- uses: elicpeter/nyx@v0.8.0
with:
format: sarif
fail-on: MEDIUM
- uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: nyx-results.sarif
```
Inputs: `path`, `version`, `format` (`sarif`|`json`|`console`), `fail-on`, `args`, `token`. Outputs: `finding-count`, `sarif-file`, `exit-code`, `nyx-version`. Linux and macOS runners (x86_64, ARM64).
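When post-processing the SARIF file in a pipeline, a short script can summarize results by level. The sketch below assumes only the standard SARIF 2.1.0 layout (`runs[].results[].level`). How Nyx maps its severities onto SARIF levels is not stated here, so the function simply counts whatever levels are present.

```python
from collections import Counter


def count_sarif_levels(sarif):
    """Count results per SARIF level across all runs of a parsed SARIF log."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: a result without an explicit level is counted
            # as "warning", the usual fallback when no rule configuration
            # overrides it.
            counts[result.get("level", "warning")] += 1
    return counts


# Tiny in-memory example; a real log would come from json.load() on the
# file written by `nyx scan --format sarif`.
sample = {"runs": [{"results": [{"level": "error"}, {"level": "error"}, {}]}]}
summary = count_sarif_levels(sample)
```

A CI step could then fail the job when `summary["error"]` is non-zero, although `--fail-on` already covers the common case.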
--- ---
## Quick Start ## Install
**Cargo (recommended):**
```bash
cargo install nyx-scanner
```
**Pre-built binaries:** Grab the archive for your platform from [Releases](https://github.com/elicpeter/nyx/releases), verify against `SHA256SUMS` (and the detached `SHA256SUMS.asc` GPG signature, when present), unzip, and drop `nyx` on your `PATH`.
```bash ```bash
# Scan the current directory (creates/uses an index automatically) # Optional: verify the checksum file's GPG signature (when SHA256SUMS.asc is published)
$ nyx scan gpg --verify SHA256SUMS.asc SHA256SUMS
sha256sum -c SHA256SUMS --ignore-missing
# Scan a specific path and emit JSON unzip nyx-x86_64-unknown-linux-gnu.zip && chmod +x nyx && sudo mv nyx /usr/local/bin/
$ nyx scan ./server --format json
# Perform an ad-hoc scan without touching the index
$ nyx scan --no-index
# Restrict results to high-severity findings
$ nyx scan --high-only
# AST pattern matching only (fastest, no CFG/taint)
$ nyx scan --ast-only
# CFG + taint analysis only (skip AST pattern rules)
$ nyx scan --cfg-only
``` ```
### Index Management **From source:**
```bash ```bash
# Create or rebuild an index git clone https://github.com/elicpeter/nyx.git
$ nyx index build [PATH] [--force] cd nyx && cargo build --release
# Display index metadata (size, modified date, etc.)
$ nyx index status [PATH]
# List all indexed projects (add -v for detailed view)
$ nyx list [-v]
# Remove a single project or purge all indexes
$ nyx clean <PROJECT_NAME>
$ nyx clean --all
``` ```
--- Requires stable Rust 1.88+. The frontend is compiled and embedded in the binary at build time, so there is no separate install step for `nyx serve`.
## Analysis Modes
Nyx supports three analysis modes, selectable via the `scanner.mode` config option or CLI flags:
| Mode | CLI flag | What runs |
|---|---|---|
| **Full** (default) | — | AST pattern matching + CFG construction + taint analysis |
| **AST-only** | `--ast-only` | AST pattern matching only; skips CFG and taint entirely |
| **Taint-only** | `--cfg-only` | CFG + taint analysis only; filters out AST pattern findings |
### What the CFG + taint engine detects
| Finding | Rule ID | Description |
|---|---|---|
| Tainted data flow | `taint-*` | Untrusted data (env vars, user input, file reads) flowing to dangerous sinks (shell exec, SQL, file write) without matching sanitization |
| Unguarded sink | `cfg-unguarded-sink` | Sink calls not dominated by a guard or sanitizer on the control-flow path |
| Auth gap | `cfg-auth-gap` | Web handler functions that reach privileged sinks without an auth check |
| Unreachable security code | `cfg-unreachable-*` | Sanitizers, guards, or sinks in dead code branches |
| Error fallthrough | `cfg-error-fallthrough` | Error-handling branches that don't terminate, allowing execution to fall through to dangerous operations |
| Resource leak | `cfg-resource-leak` | Resources acquired but not released on all exit paths (malloc/free, fopen/fclose, Lock/Unlock) |
Findings are scored and ranked by severity, proximity to entry point, path complexity, and taint confirmation.
--- ---
## Supported Languages ## Languages
All 10 languages have full AST pattern matching and CFG/taint analysis. Resource leak detection is available where language-specific acquire/release pairs are defined. All 10 languages parse via tree-sitter and run through the full pipeline, but rule depth and engine coverage are uneven. Benchmark F1 on the synthetic corpus at [`tests/benchmark/ground_truth.json`](tests/benchmark/ground_truth.json) is 100% across all ten languages at the last measured baseline (see [`tests/benchmark/RESULTS.md`](tests/benchmark/RESULTS.md)), so F1 alone no longer separates the tiers. Tiering reflects rule depth, gated-sink coverage, and structural idioms the synthetic corpus does not fully stress:
| Language | AST Patterns | CFG + Taint | Resource Leaks | | Tier | Languages | F1 | Use as a CI gate? |
|---|---|---|---| |---|---|---|---|
| Rust | Yes | Yes | Yes | | **Stable** | Python, JavaScript, TypeScript | 100% | Yes |
| C | Yes | Yes | Yes | | **Beta** | Java, PHP, Ruby, Rust, Go | 100% | Yes, with light FP triage |
| C++ | Yes | Yes | Yes | | **Preview** | C, C++ | 100% on synthetic corpus | No. STL container flow, builder chains, and inline class member functions are tracked, but deep pointer aliasing and function pointers are not. Pair with clang-tidy or Clang Static Analyzer |
| Java | Yes | Yes | Yes |
| Go | Yes | Yes | Yes | All real-CVE fixtures fire and the corpus carries zero open FPs at the recorded baseline (P=R=F1=1.000). Per-dimension detail and known blind spots live on the [Language maturity page](https://nyxscan.dev/docs/language-maturity.html).
| PHP | Yes | Yes | — |
| Python | Yes | Yes | — | ### Validated against real CVEs
| Ruby | Yes | Yes | — |
| TypeScript | Yes | Yes | — | The corpus also holds a small set of vulnerable/patched pairs extracted from published advisories, so the benchmark floor is defended by regression protection on demonstrably real bugs rather than just synthetic analogues. Nyx fires on the vulnerable file and emits zero findings on the patched file for each pair.
| JavaScript | Yes | Yes | — |
| CVE | Project | Language | Class |
|---|---|---|---|
| [CVE-2023-48022](https://nvd.nist.gov/vuln/detail/CVE-2023-48022) | Ray | Python | Command injection |
| [CVE-2017-18342](https://nvd.nist.gov/vuln/detail/CVE-2017-18342) | PyYAML | Python | Deserialization |
| [CVE-2019-14939](https://nvd.nist.gov/vuln/detail/CVE-2019-14939) | mongo-express | JavaScript | Code execution (`eval`) |
| [CVE-2023-22621](https://nvd.nist.gov/vuln/detail/CVE-2023-22621) | Strapi | JavaScript | Code execution (SSTI) |
| [CVE-2025-64430](https://nvd.nist.gov/vuln/detail/CVE-2025-64430) | Parse Server | JavaScript | SSRF |
| [CVE-2023-26159](https://nvd.nist.gov/vuln/detail/CVE-2023-26159) | follow-redirects | TypeScript | SSRF |
| [GHSA-4x48-cgf9-q33f](https://github.com/advisories/GHSA-4x48-cgf9-q33f) | Novu | TypeScript | SSRF |
| [CVE-2026-25544](https://nvd.nist.gov/vuln/detail/CVE-2026-25544) | Payload CMS | TypeScript | SQL injection |
| [CVE-2022-30323](https://nvd.nist.gov/vuln/detail/CVE-2022-30323) | hashicorp/go-getter | Go | Command injection |
| [CVE-2024-31450](https://nvd.nist.gov/vuln/detail/CVE-2024-31450) | owncast | Go | Path traversal |
| [CVE-2023-3188](https://nvd.nist.gov/vuln/detail/CVE-2023-3188) | owncast | Go | SSRF |
| [CVE-2026-41422](https://github.com/daptin/daptin/security/advisories/GHSA-rw2c-8rfq-gwfv) | daptin | Go | SQL injection |
| [CVE-2015-7501](https://nvd.nist.gov/vuln/detail/CVE-2015-7501) | Apache Commons Collections | Java | Deserialization |
| [CVE-2017-12629](https://nvd.nist.gov/vuln/detail/CVE-2017-12629) | Apache Solr | Java | Command injection |
| [CVE-2022-1471](https://nvd.nist.gov/vuln/detail/CVE-2022-1471) | SnakeYAML | Java | Deserialization |
| [CVE-2022-42889](https://nvd.nist.gov/vuln/detail/CVE-2022-42889) | Apache Commons Text | Java | Code execution |
| [GHSA-h8cj-hpmg-636v](https://github.com/advisories/GHSA-h8cj-hpmg-636v) | Appsmith | Java | SQL injection |
| [CVE-2013-0156](https://nvd.nist.gov/vuln/detail/CVE-2013-0156) | Ruby on Rails | Ruby | Deserialization |
| [CVE-2020-8130](https://nvd.nist.gov/vuln/detail/CVE-2020-8130) | Rake | Ruby | Command injection |
| [CVE-2021-21288](https://nvd.nist.gov/vuln/detail/CVE-2021-21288) | CarrierWave | Ruby | SSRF |
| [CVE-2023-38337](https://nvd.nist.gov/vuln/detail/CVE-2023-38337) | rswag-api | Ruby | Path traversal |
| [CVE-2017-9841](https://nvd.nist.gov/vuln/detail/CVE-2017-9841) | PHPUnit | PHP | Code execution (`eval`) |
| [CVE-2018-15133](https://nvd.nist.gov/vuln/detail/CVE-2018-15133) | Laravel | PHP | Deserialization |
| [CVE-2018-20997](https://nvd.nist.gov/vuln/detail/CVE-2018-20997) | tar-rs | Rust | Path traversal |
| [CVE-2022-36113](https://nvd.nist.gov/vuln/detail/CVE-2022-36113) | cargo | Rust | Path traversal |
| [CVE-2024-24576](https://nvd.nist.gov/vuln/detail/CVE-2024-24576) | Rust stdlib | Rust | Command injection |
| [CVE-2023-42456](https://rustsec.org/advisories/RUSTSEC-2023-0069.html) | sudo-rs | Rust | Path traversal |
| [CVE-2024-32884](https://rustsec.org/advisories/RUSTSEC-2024-0335.html) | gitoxide | Rust | Command injection |
| [CVE-2025-53549](https://rustsec.org/advisories/RUSTSEC-2025-0043.html) | matrix-rust-sdk | Rust | SQL injection |
| [CVE-2016-3714](https://nvd.nist.gov/vuln/detail/CVE-2016-3714) | ImageMagick (ImageTragick) | C | Command injection |
| [CVE-2019-18634](https://nvd.nist.gov/vuln/detail/CVE-2019-18634) | sudo (pwfeedback) | C | Memory safety |
| [CVE-2019-13132](https://nvd.nist.gov/vuln/detail/CVE-2019-13132) | ZeroMQ libzmq | C++ | Memory safety |
| [CVE-2022-1941](https://nvd.nist.gov/vuln/detail/CVE-2022-1941) | Protocol Buffers | C++ | Memory safety |
| [CVE-2025-69662](https://nvd.nist.gov/vuln/detail/CVE-2025-69662) | geopandas | Python | SQL injection |
| [CVE-2026-33626](https://nvd.nist.gov/vuln/detail/CVE-2026-33626) | LMDeploy | Python | SSRF |
Fixtures live under [`tests/benchmark/cve_corpus/`](tests/benchmark/cve_corpus/) with upstream attribution headers.
<!--
### Real-world findings
- **Nextcloud server**, [PR #59979](https://github.com/nextcloud/server/pull/59979), merged. The runtime decoder for this column already restricted `allowed_classes`, but the repair routine called `unserialize()` without it, so magic methods on referenced classes could still run. Fix matches the runtime path.
-->
--- ---
## Configuration Overview ## How it works
Nyx merges a default configuration file (`nyx.conf`) with user overrides (`nyx.local`). Both live in the platform-specific configuration directory shown below. Two passes over the filesystem, with an optional SQLite index to skip unchanged files:
| Platform | Directory | ```mermaid
|---|---| flowchart LR
| Linux | `~/.config/nyx/` | Repo["Repository files"] --> Pass1["Pass 1 per file<br/>tree-sitter, CFG, SSA"]
| macOS | `~/Library/Application Support/dev.ecpeter23.nyx/` | Pass1 --> Summaries["Function summaries<br/>sources, sinks, sanitizers, points-to"]
| Windows | `%APPDATA%\ecpeter23\nyx\config\` | Summaries --> Index["SQLite index<br/>optional incremental cache"]
Index --> Pass2["Pass 2 cross-file<br/>global summaries, k=1 inline, SCC fixpoint"]
Pass2 --> Rank["Rank and dedupe<br/>severity, evidence, exploitability"]
Rank --> Verify["Dynamic verification<br/>sandboxed harnesses, verdicts"]
Verify --> Output["Console, JSON, SARIF<br/>and browser UI"]
```
Minimal example (`nyx.local`): 1. **Pass 1**: parse each file via tree-sitter, build an intra-procedural CFG (petgraph), lower to pruned SSA (Cytron phi insertion over dominance frontiers), and export per-function summaries (source/sanitizer/sink caps, taint transforms, points-to, callees).
2. **Summary merge**: union all per-file summaries into a `GlobalSummaries` map.
3. **Pass 2**: re-analyze each file with cross-file context under bounded context sensitivity (k=1 inlining for intra-file callees, SCC fixpoint capped at 64 iterations, and summary fallback for callees above the inline body-size cap). A forward dataflow worklist propagates taint through the SSA lattice with guaranteed convergence. Call-graph SCCs iterate to fixed-point (within the cap) so mutually recursive functions get accurate summaries.
4. **Rank, dedupe, verify, emit**: findings are scored by severity × evidence strength × source-kind exploitability. Medium or higher confidence findings are dynamically verified by default, then results are emitted to console, JSON, SARIF, and the browser UI.
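The forward worklist propagation in step 3 can be sketched on a toy control-flow graph. This is a deliberately simplified illustration: it tracks tainted variable names rather than SSA values and capability bits, and the node encoding (`source`, `assign`, `sanitize`, `sink`) is invented for the example.

```python
from collections import deque


def find_tainted_sinks(nodes, edges, entry):
    """Return sorted ids of sink nodes that can receive tainted data.

    nodes: {id: (op, *args)}; edges: {id: [successor ids]}.
    """
    tainted_in = {n: frozenset() for n in nodes}  # taint set at node entry
    visited, hits = set(), set()
    work = deque([entry])
    while work:
        n = work.popleft()
        visited.add(n)
        op, *args = nodes[n]
        state = set(tainted_in[n])
        if op == "source":            # ("source", var): var becomes tainted
            state.add(args[0])
        elif op == "assign":          # ("assign", dst, src): dst copies src
            dst, src = args
            if src in state:
                state.add(dst)
            else:
                state.discard(dst)
        elif op == "sanitize":        # ("sanitize", dst, src): dst is clean
            state.discard(args[0])
        elif op == "sink":            # ("sink", var): report if var is tainted
            if args[0] in state:
                hits.add(n)
        for succ in edges.get(n, []):
            merged = tainted_in[succ] | state   # join = set union
            if merged != tainted_in[succ] or succ not in visited:
                tainted_in[succ] = merged
                work.append(succ)
    return sorted(hits)


nodes = {
    0: ("source", "q"),
    1: ("assign", "raw", "q"),
    2: ("sanitize", "safe", "q"),
    3: ("sink", "raw"),
    4: ("sink", "safe"),
}
edges = {0: [1], 1: [2], 2: [3], 3: [4]}
flagged = find_tainted_sinks(nodes, edges, entry=0)  # [3]
```

Entry states only ever grow under the union join, and the variable set is finite, so the loop terminates. That is the same monotonicity argument behind the convergence guarantee described above.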
Detector families: taint (cross-file source→sink, with cap-specific rule classes for SQLi, XSS, command/code exec, deserialization, SSRF, path traversal, format string, crypto, LDAP injection, XPath injection, HTTP header / response splitting, open redirect, server-side template injection, XXE, prototype pollution, data exfiltration, and the auth fold-in), CFG structural (auth gaps, unguarded sinks, resource leaks), state model (use-after-close, double-close, must-leak, unauthed-access), AST patterns (tree-sitter structural match). Full detector docs: [Detectors](https://nyxscan.dev/docs/detectors.html).
---
## Verify findings dynamically
Static analysis says a sink is reachable. Dynamic verification tries to prove it. With `--verify` (on by default), Nyx builds a small harness around each Medium-or-higher finding, runs it in a sandbox against a curated payload corpus, and stamps a verdict onto the finding.
```bash
nyx scan --verify # build + run a harness per finding (default)
nyx scan --no-verify # static analysis only, for fast local loops
```
A finding is **Confirmed** only when an attacker-controlled payload fires the sink *and* a paired benign control stays clean. That differential rule, plus behavioral oracles (a template that renders `49`, a deserializer that resolves a gadget class, a redirect that leaves the origin), keeps the verifier from confirming on an echoed string. Sinks behind a recognized guard demote to `ConfirmedWithKnownGuard`; sinks reached without a completed exploit chain land as `PartiallyConfirmed`.
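The differential rule can be written down as a small decision function. The sketch below is one reading of the paragraph above, not the verifier's actual code: the input flags, the order in which they are checked, and the `NOT_CONFIRMED` label are assumptions made for illustration.

```python
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "Confirmed"
    CONFIRMED_WITH_KNOWN_GUARD = "ConfirmedWithKnownGuard"
    PARTIALLY_CONFIRMED = "PartiallyConfirmed"
    NOT_CONFIRMED = "NotConfirmed"  # hypothetical label for "no verdict"


def classify(sink_reached, attack_fired, control_fired, guard_recognized=False):
    """Map harness observations to a verdict using the differential rule."""
    if control_fired:
        # The benign control also tripped the oracle, so the signal does not
        # separate attacker input from ordinary input.
        return Verdict.NOT_CONFIRMED
    if attack_fired:
        if guard_recognized:
            return Verdict.CONFIRMED_WITH_KNOWN_GUARD
        return Verdict.CONFIRMED
    if sink_reached:
        # The sink was reached but the exploit chain did not complete.
        return Verdict.PARTIALLY_CONFIRMED
    return Verdict.NOT_CONFIRMED
```

Checking the control first is what prevents a confirmation on an echoed string: if the benign input produces the same observable effect, the attack result carries no evidence.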
Coverage spans 18 verifiable capability classes and 120+ registered adapters across all ten languages (Flask, Django, Express, NestJS, Spring, Rails, Laravel, Gin, Axum, and more), with per-language build pools and copy-on-write workdirs to keep the per-finding cost low. Confirmed findings write a hermetic repro bundle with a `reproduce.sh`. Runs are deterministic: every payload is seeded from the spec hash.
```bash
# CI: fail the build if a new Confirmed finding appears vs. a baseline
nyx scan --baseline .nyx/baseline.json --gate no-new-confirmed
```
Backends: Docker (preferred, network-blocked by default) or an in-process runner with `--harden {standard,strict}`. Full matrix, oracle list, and limitations: [Dynamic verification](https://nyxscan.dev/docs/dynamic.html).
---
## Configuration
Config merges `nyx.conf` (defaults) and `nyx.local` (your overrides) from the platform config directory (`~/.config/nyx/` on Linux, `~/Library/Application Support/nyx/` on macOS, `%APPDATA%\elicpeter\nyx\config\` on Windows).
```toml ```toml
[scanner] [scanner]
mode = "full" # full | ast | taint mode = "full" # full | ast | cfg | taint
min_severity = "Medium" min_severity = "Medium"
follow_symlinks = true
excluded_extensions = ["mp3", "mp4"]
[output] [server]
default_format = "json" host = "127.0.0.1"
max_results = 200 port = 9700
open_browser = true
[performance] # Project-specific sanitizer
worker_threads = 8 # 0 = auto-detect [[analysis.languages.javascript.rules]]
batch_size = 200 matchers = ["escapeHtml"]
channel_multiplier = 2 kind = "sanitizer"
cap = "html_escape"
``` ```
A fully documented `nyx.conf` is generated automatically on first run. Or add rules interactively: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`. Caps: `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `data_exfil`, `code_exec`, `crypto`, `unauthorized_id`, `ldap_injection`, `xpath_injection`, `header_injection`, `open_redirect`, `ssti`, `xxe`, `prototype_pollution`, `all`. Full schema: [Configuration](https://nyxscan.dev/docs/configuration.html). Run `nyx rules list` to browse the registry from the terminal.
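The defaults-plus-overrides layering can be pictured as a recursive merge of two parsed TOML documents. The sketch below works on plain dictionaries, such as those `tomllib.loads` returns on Python 3.11+. The default values shown are made up, and replacing arrays wholesale instead of appending to them is an assumption, not a statement about how Nyx merges rule lists.

```python
def deep_merge(base, override):
    """Return a new dict where values in `override` win over `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested tables
        else:
            merged[key] = value  # scalars and arrays are replaced outright
    return merged


# Stand-ins for the parsed contents of nyx.conf and nyx.local.
DEFAULTS = {
    "scanner": {"mode": "full", "min_severity": "Low"},  # illustrative values
    "server": {"host": "127.0.0.1", "port": 9700},
}
LOCAL = {"scanner": {"min_severity": "Medium"}}

config = deep_merge(DEFAULTS, LOCAL)
```

Here `config["scanner"]` keeps `mode = "full"` from the defaults and takes `min_severity = "Medium"` from the override, while `DEFAULTS` itself is left unmodified.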
--- ---
## Architecture in Brief ## Status
Nyx uses a **two-pass architecture** to enable cross-file analysis without sacrificing parallelism: Under active development. APIs, detector behavior, and configuration options may change between releases. Rule-level F1 on the synthetic corpus is the CI regression floor; per-language detail lives in [`tests/benchmark/RESULTS.md`](tests/benchmark/RESULTS.md).
1. **File enumeration** -- A parallel walker (Rayon + `ignore` crate) applies gitignore rules, size limits, and user exclusions. Taint analysis is interprocedural. Persisted per-function SSA summaries carry per-return-path transforms and parameter-granularity points-to, and call-graph SCCs (including SCCs that span files) iterate to a joint fixed-point. The default `balanced` profile also runs k=1 context-sensitive inlining for intra-file callees. Symex (with cross-file and interprocedural frames) and the demand-driven backwards walk are opt-in. Enable them individually with `--symex` and `--backwards-analysis`, or together with `--engine-profile deep`.
2. **Pass 1 -- Summary extraction** -- Each file is parsed via tree-sitter, an intra-procedural CFG is built (petgraph), and a `FuncSummary` is exported per function capturing source/sanitizer/sink capabilities (bitflags), taint propagation behavior, and callee lists. Summaries are persisted to SQLite.
3. **Summary merge** -- All per-file summaries are merged into a `GlobalSummaries` map with conservative conflict resolution (union caps, OR booleans).
4. **Pass 2 -- Analysis** -- Files are re-parsed and analyzed with the full cross-file context: BFS taint propagation resolves callees against local and global summaries, CFG analysis checks for auth gaps, unguarded sinks, resource leaks, and more.
5. **Reporting** -- Findings are scored, ranked, deduplicated, and emitted to the console or serialized as JSON.
With indexing enabled, Pass 1 skips files whose blake3 content hash is unchanged, and cached findings are served directly for AST-only results. Limitations:
- Interprocedural precision is bounded rather than unlimited. Context-sensitive inlining is k=1 with a callee body-size cap, and SCC fixed-point has an iteration cap. When the engine hits a bound it falls back to summaries and records an `engine_note` on the finding.
- Cross-language calls (FFI, subprocess, WASM) are not traversed. Each language is analysed independently.
- Several language features are not modeled: macros, most dynamic dispatch, aliased imports, reflection.
- C/C++ are preview tier. STL container flow, builder chains, and inline class member functions are tracked now; deep pointer aliasing and function pointers are not. A clean report should not be read as a clean audit. Pair with a clang-based tool before using as a hard CI gate.
- Results may contain false positives or false negatives; manual review is expected.
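The first limitation above, a fixed-point that stops at an iteration cap and records a note, can be illustrated with a toy solver. The summary here is a single boolean per function ("may return tainted data"), far coarser than the per-return-path summaries described earlier, and the note text is invented. Only the cap of 64 is taken from the "How it works" section.

```python
def solve_scc(functions, calls, has_source, max_iters=64):
    """Iterate boolean taint summaries for mutually recursive functions.

    Returns (summaries, note); note is None when a fixed point was reached.
    """
    returns_tainted = {f: has_source[f] for f in functions}
    for _ in range(max_iters):
        changed = False
        for f in functions:
            new = has_source[f] or any(
                returns_tainted.get(callee, False) for callee in calls.get(f, [])
            )
            if new != returns_tainted[f]:
                returns_tainted[f] = new
                changed = True
        if not changed:
            return returns_tainted, None
    # Cap reached: keep the current approximation and flag it for the reader.
    return returns_tainted, f"scc fixpoint stopped at iteration cap {max_iters}"


functions = ["a", "b", "c"]
calls = {"a": ["b"], "b": ["a", "c"]}
has_source = {"a": False, "b": False, "c": True}

summaries, note = solve_scc(functions, calls, has_source)
capped, capped_note = solve_scc(functions, calls, has_source, max_iters=1)
```

With the default cap the toy example converges and `note` is `None`. With `max_iters=1` the loop stops early, `a` is still marked clean, and the returned note plays the role of the `engine_note` that tells a reviewer the result is an approximation.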
--- ---
## Roadmap ## Documentation
### Phase 1 -- Deep Static Engine Browse the full docs site at **[nyxscan.dev/docs](https://nyxscan.dev/docs/)**.
| Feature | Description | - [Quick Start](https://nyxscan.dev/docs/quickstart.html) · [CLI Reference](https://nyxscan.dev/docs/cli.html) · [Installation](https://nyxscan.dev/docs/installation.html)
|---|---| - [`nyx serve`](https://nyxscan.dev/docs/serve.html) · [Output Formats](https://nyxscan.dev/docs/output.html) · [Configuration](https://nyxscan.dev/docs/configuration.html) · [Dynamic verification](https://nyxscan.dev/docs/dynamic.html)
| Interprocedural call graph | Precise symbol resolution via `FuncKey`, language-scoped namespaces, cross-module linking. No name-collision merging -- full call graph with topological analysis. | - [How it works](https://nyxscan.dev/docs/how-it-works.html) · [Detectors](https://nyxscan.dev/docs/detectors.html) ([Taint](https://nyxscan.dev/docs/detectors/taint.html), [CFG](https://nyxscan.dev/docs/detectors/cfg.html), [State](https://nyxscan.dev/docs/detectors/state.html), [AST Patterns](https://nyxscan.dev/docs/detectors/patterns.html))
| Path-sensitive analysis | Track path predicates and conditional constraints. Detect infeasible paths and validation-only-in-one-branch patterns. Dramatically reduces false positives. | - [Rule Reference](https://nyxscan.dev/docs/rules.html) · [Language Maturity](https://nyxscan.dev/docs/language-maturity.html) · [Advanced Analysis](https://nyxscan.dev/docs/advanced-analysis.html) · [Auth Analysis](https://nyxscan.dev/docs/auth.html)
| Dataflow & state modeling | Resource state machines (init -> use -> close), auth state transitions, privilege level tracking. Semantic analysis beyond pattern matching. |
| Attack surface ranking | Score entry points by distance-to-sink, guard strength, path complexity, and privilege escalation potential. Deterministic attack surface scoring. |
### Phase 2 -- Dynamic Capability
| Feature | Description |
|---|---|
| Controlled dynamic execution | Local sandbox: identify entry points, spin up test harnesses, inject payloads, detect runtime crashes and command execution. Deterministic automated exploit validation -- static finds `exec(user_input)`, dynamic confirms it with `; id`. |
| Fuzzing integration | libFuzzer (C/C++), cargo-fuzz (Rust), go-fuzz, HTTP fuzzing harness. Static engine identifies interesting functions, fuzzer targets only those. |
### Phase 3 -- Intelligent Reasoning Layer
| Feature | Description |
|---|---|
| Semantic similarity | Embeddings for finding similar vulnerability patterns across codebases. |
| LLM reasoning | AI-assisted detection of non-obvious logic bugs. |
| Exploit refinement | Automated loops to refine and validate exploit chains. |
### Other planned improvements
| Area | Details |
|---|---|
| Output formats | SARIF 2.1.0, JUnit XML, HTML report generator |
| Language coverage | Expanded taint rules per language, resource leak pairs for Python/Ruby/PHP/JS/TS |
| Rule updates | Remote rule feed with signature verification |
| UX | Progress bar, smart file-watch re-scan |
Community feedback shapes priorities -- please [open an issue](https://github.com/ecpeter23/nyx/issues) to discuss proposed changes.
--- ---
## Contributing ## Contributing
Pull requests are welcome. To contribute: Contributions are welcome.
1. Fork the repository and create a feature branch.
2. Adhere to `rustfmt` and ensure `cargo clippy --all -- -D warnings` passes.
3. Add unit and/or integration tests where applicable (`cargo test` should remain green).
4. Submit a concise, well-documented pull request.
Please open an issue for any crash, panic, or suspicious result -- attach the minimal code snippet and mention the Nyx version.
See `CONTRIBUTING.md` for full guidelines.
---
## License
Nyx is licensed under the **GNU General Public License v3.0 (GPL-3.0)**.
This ensures that all modified versions of the scanner remain free and open-source, protecting the integrity and transparency of security tools.
See [LICENSE](./LICENSE) for full details.
Nyx is open source and will always have a fully open-source core. To support long-term development and keep the project sustainable, contributors may be asked to sign a Contributor License Agreement before their first merged contribution.
Run `sh scripts/check.sh` before submitting. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for the full guide, including how to add rules and support new languages. Open an issue for crashes, panics, or suspicious results; attach a minimal snippet and the Nyx version.
---
## AI Disclosure
- **Engine code** (taint, SSA, CFG, call graph, abstract interp, symbolic exec): predominantly human-written. AI was used selectively for refactors and boilerplate, with all merges human-reviewed.
- **Docs and most of this README**: AI-generated from the code and hand-edited. Report doc/code drift as a bug.
- **Test fixtures and `expected.yaml` files**: AI-assisted drafting, human-audited before landing.
- **Frontend UI** (React app): built with AI assistance, human-reviewed.
As with any static analyzer, validate findings against your own corpus before using Nyx as a CI gate.
---
## License
GNU General Public License v3.0 or later (GPL-3.0-or-later). The optional `smt` feature bundles Z3 (MIT-licensed); distributors of binaries built with `--features smt` should include Z3's license in their attribution. Full text in [LICENSE](./LICENSE); third-party dependencies in [THIRDPARTY-LICENSES.html](./THIRDPARTY-LICENSES.html).

276
README.zh-CN.md Normal file

@ -0,0 +1,276 @@
<div align="center">
<img src="assets/nyx-readme-header.png" alt="NYX" width="640"/>
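**A local-first security scanner with sandboxed dynamic verification and a browser UI. Scan repositories locally and triage findings in the browser, with no cloud and no account.**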
[![crates.io](https://img.shields.io/crates/v/nyx-scanner.svg)](https://crates.io/crates/nyx-scanner)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Rust 1.88+](https://img.shields.io/badge/rust-1.88%2B-orange)](https://www.rust-lang.org)
[![CI](https://img.shields.io/github/actions/workflow/status/elicpeter/nyx/ci.yml?branch=master)](https://github.com/elicpeter/nyx/actions)
[![Docs](https://img.shields.io/badge/docs-nyxscan.dev%2Fdocs-blue)](https://nyxscan.dev/docs/)
[English](./README.md) · 简体中文
</div>
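<p align="center"><img src="assets/screenshots/demo.gif" alt="Nyx UI demo: starting a scan from the empty welcome page, viewing the overview page with its health score, drilling into the flow visualization of a HIGH finding, then moving on to the triage workflow" width="900"/></p>
---
## Scan locally, browse locally
Nyx runs cross-language taint analysis on your repository, then runs a small sandboxed harness against medium- and high-confidence findings to verify whether the source-to-sink flow actually fires in the real code. Results are served through a React UI bound to `127.0.0.1`. You see the severity, the static evidence, the dynamic verification result, and a step-by-step **flow visualization** that walks the data flow from source → sanitizer → sink. Triage decisions persist in `.nyx/triage.json` and are committed alongside the code, so the whole team shares one triage state.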
```bash
cargo install nyx-scanner
nyx scan # run the analyzer and cache findings in .nyx/
nyx serve # open http://localhost:9700 in the browser
```
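Everything stays on your machine: loopback-only binding, enforced Host header validation, CSRF protection on every mutating operation, no remote telemetry, no login.
<p align="center"><img src="assets/screenshots/overview.png" alt="Overview dashboard for a small JS app: health score C 78, a five-component breakdown (severity pressure, confidence quality, trend, triage coverage, regression resistance), 3 findings, OWASP A03 and A02 categories, confidence distribution and issue-category bar charts, and the most affected files" width="900"/></p>
---
## What's in the UI
| Page | What it shows |
|---|---|
| **Overview** | Dashboard: finding counts by severity, hotspot files, engine profile summary |
| **Findings** | Browsable list with severity badges, triage status, rule filter, and language filter |
| **Finding detail** | Flow-path visualization with numbered steps (source → sanitizer → sink), dynamic verification result, code snippet, evidence, cross-file markers, triage dropdown |
| **Triage** | Bulk status updates (open, investigating, fixed, false_positive, accepted_risk, suppressed), audit log, JSON import/export |
| **Explorer** | File tree with a per-file symbol list and findings overlay |
| **Scans** | History, metrics, and a diff between two scans |
| **Rules** | Built-in and custom rules per language; rules can be added in the UI |
| **Config** | Live config editor; reloads without a restart |
`nyx serve` flags: `--port <N>` (default `9700`), `--host <addr>` (loopback only: `127.0.0.1`, `localhost`, `::1`), `--no-browser`. Persistent settings live in the `[server]` section of `nyx.conf`; a page-by-page UI tour and the security model are covered in the [Browser UI guide](https://nyxscan.dev/docs/serve.html).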
---
## CLI for CI
The same engine runs headless in CI pipelines. SARIF output uploads directly to GitHub Code Scanning.
<p align="center"><img src="assets/screenshots/cli-scan.gif" alt="nyx scan terminal output: HIGH taint findings in JS and Python files with source → sink arrows" width="820"/></p>
```bash
# Fail CI at medium severity and above, and emit SARIF
nyx scan --format sarif --fail-on MEDIUM > results.sarif
# Ad-hoc JSON, no index
nyx scan ./server --format json --index off
# AST-only mode (fastest; skips CFG + taint)
nyx scan --mode ast
# Engine depth shortcuts: fast | balanced (default) | deep
# `deep` adds symex and demand-driven backwards taint: higher precision at roughly 2-3x the cost
nyx scan --engine-profile deep
```
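Forward cross-file taint runs under every profile. Symex and the demand-driven backwards walk are opt-in: enable both at once with `--engine-profile deep`, or individually with `--symex` and `--backwards-analysis`. The full switch matrix is in the [CLI reference](https://nyxscan.dev/docs/cli.html#engine-depth-profile).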
### GitHub Action
```yaml
- uses: elicpeter/nyx@v0.8.0
  with:
    format: sarif
    fail-on: MEDIUM
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: nyx-results.sarif
```
Inputs: `path`, `version`, `format` (`sarif`|`json`|`console`), `fail-on`, `args`, `token`. Outputs: `finding-count`, `sarif-file`, `exit-code`, `nyx-version`. Supports Linux and macOS runners (x86_64, ARM64).
---
## Installation
**Cargo (recommended):**
```bash
cargo install nyx-scanner
```
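**Prebuilt binaries:** download the archive for your platform from [Releases](https://github.com/elicpeter/nyx/releases), verify it against `SHA256SUMS` (and the accompanying `SHA256SUMS.asc` GPG signature, when provided), unpack it, and put `nyx` on your `PATH`.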
```bash
# Optional: verify the GPG signature on the checksum file (when SHA256SUMS.asc is published)
gpg --verify SHA256SUMS.asc SHA256SUMS
sha256sum -c SHA256SUMS --ignore-missing
unzip nyx-x86_64-unknown-linux-gnu.zip && chmod +x nyx && sudo mv nyx /usr/local/bin/
```
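The checksum step above can also be scripted. The sketch below imitates `sha256sum -c SHA256SUMS --ignore-missing` in Python: it hashes only the listed files that exist locally and returns the ones that do not match. The function name `verify_sha256sums` is hypothetical and not part of Nyx, and the code assumes the usual `<hex digest>  <filename>` line format.

```python
import hashlib
from pathlib import Path


def verify_sha256sums(sums_file: Path) -> list[str]:
    """Return the names of listed files whose SHA-256 digest does not match.

    Files that are listed but absent on disk are skipped, mirroring the
    --ignore-missing flag of sha256sum. (Hypothetical helper, not Nyx code.)
    """
    mismatched = []
    base = sums_file.parent
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # A leading '*' marks binary mode in sha256sum output.
        name = name.strip().lstrip("*")
        target = base / name
        if not target.exists():
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatched.append(name)
    return mismatched
```

An empty return value means every archive present next to `SHA256SUMS` matched its recorded digest.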
**Build from source:**
```bash
git clone https://github.com/elicpeter/nyx.git
cd nyx && cargo build --release
```
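Requires stable Rust 1.88+. The frontend is bundled into the binary at build time, so `nyx serve` needs no separate install step.
---
## Language support
All 10 languages are parsed with tree-sitter and run through the full pipeline, but rule depth and engine coverage are uneven. On the synthetic corpus in [`tests/benchmark/ground_truth.json`](tests/benchmark/ground_truth.json), all ten languages scored an F1 of 100% in the most recent baseline measurement (see [`tests/benchmark/RESULTS.md`](tests/benchmark/RESULTS.md)), so F1 alone no longer separates the tiers. The tiers reflect rule depth, gated-sink coverage, and structural idioms that the synthetic corpus under-represents:
| Tier | Languages | F1 | Suitable as a CI gate? |
|---|---|---|---|
| **Stable** | Python, JavaScript, TypeScript | 100% | Yes |
| **Beta** | Java, PHP, Ruby, Rust, Go | 100% | Yes, with light FP triage |
| **Preview** | C, C++ | 100% on the synthetic corpus | No. STL container flows, builder chains, and inline class member functions are tracked; deep pointer aliasing and function pointers are not yet covered. Pair with clang-tidy or the Clang Static Analyzer |
Every real-CVE case fires, and the corpus has no open FPs under the recorded baseline (P = R = F1 = 1.000). Per-dimension details and known blind spots are on the [Language maturity page](https://nyxscan.dev/docs/language-maturity.html).
### Validated against real CVEs
The corpus also includes a small set of vulnerable/patched pairs extracted from public advisories, so the benchmark floor is guarded not only by synthetic look-alike cases but also by regression protection against real bugs. For every pair, Nyx fires on the vulnerable file and reports nothing on the patched file.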
| CVE | Project | Language | Category |
|---|---|---|---|
| [CVE-2023-48022](https://nvd.nist.gov/vuln/detail/CVE-2023-48022) | Ray | Python | Command injection |
| [CVE-2017-18342](https://nvd.nist.gov/vuln/detail/CVE-2017-18342) | PyYAML | Python | Deserialization |
| [CVE-2019-14939](https://nvd.nist.gov/vuln/detail/CVE-2019-14939) | mongo-express | JavaScript | Code execution (`eval`) |
| [CVE-2023-22621](https://nvd.nist.gov/vuln/detail/CVE-2023-22621) | Strapi | JavaScript | Code execution (SSTI) |
| [CVE-2025-64430](https://nvd.nist.gov/vuln/detail/CVE-2025-64430) | Parse Server | JavaScript | SSRF |
| [CVE-2023-26159](https://nvd.nist.gov/vuln/detail/CVE-2023-26159) | follow-redirects | TypeScript | SSRF |
| [GHSA-4x48-cgf9-q33f](https://github.com/advisories/GHSA-4x48-cgf9-q33f) | Novu | TypeScript | SSRF |
| [CVE-2026-25544](https://nvd.nist.gov/vuln/detail/CVE-2026-25544) | Payload CMS | TypeScript | SQL injection |
| [CVE-2022-30323](https://nvd.nist.gov/vuln/detail/CVE-2022-30323) | hashicorp/go-getter | Go | Command injection |
| [CVE-2024-31450](https://nvd.nist.gov/vuln/detail/CVE-2024-31450) | owncast | Go | Path traversal |
| [CVE-2023-3188](https://nvd.nist.gov/vuln/detail/CVE-2023-3188) | owncast | Go | SSRF |
| [CVE-2026-41422](https://github.com/daptin/daptin/security/advisories/GHSA-rw2c-8rfq-gwfv) | daptin | Go | SQL injection |
| [CVE-2015-7501](https://nvd.nist.gov/vuln/detail/CVE-2015-7501) | Apache Commons Collections | Java | Deserialization |
| [CVE-2017-12629](https://nvd.nist.gov/vuln/detail/CVE-2017-12629) | Apache Solr | Java | Command injection |
| [CVE-2022-1471](https://nvd.nist.gov/vuln/detail/CVE-2022-1471) | SnakeYAML | Java | Deserialization |
| [CVE-2022-42889](https://nvd.nist.gov/vuln/detail/CVE-2022-42889) | Apache Commons Text | Java | Code execution |
| [GHSA-h8cj-hpmg-636v](https://github.com/advisories/GHSA-h8cj-hpmg-636v) | Appsmith | Java | SQL injection |
| [CVE-2013-0156](https://nvd.nist.gov/vuln/detail/CVE-2013-0156) | Ruby on Rails | Ruby | Deserialization |
| [CVE-2020-8130](https://nvd.nist.gov/vuln/detail/CVE-2020-8130) | Rake | Ruby | Command injection |
| [CVE-2021-21288](https://nvd.nist.gov/vuln/detail/CVE-2021-21288) | CarrierWave | Ruby | SSRF |
| [CVE-2023-38337](https://nvd.nist.gov/vuln/detail/CVE-2023-38337) | rswag-api | Ruby | Path traversal |
| [CVE-2017-9841](https://nvd.nist.gov/vuln/detail/CVE-2017-9841) | PHPUnit | PHP | Code execution (`eval`) |
| [CVE-2018-15133](https://nvd.nist.gov/vuln/detail/CVE-2018-15133) | Laravel | PHP | Deserialization |
| [CVE-2018-20997](https://nvd.nist.gov/vuln/detail/CVE-2018-20997) | tar-rs | Rust | Path traversal |
| [CVE-2022-36113](https://nvd.nist.gov/vuln/detail/CVE-2022-36113) | cargo | Rust | Path traversal |
| [CVE-2024-24576](https://nvd.nist.gov/vuln/detail/CVE-2024-24576) | Rust stdlib | Rust | Command injection |
| [CVE-2023-42456](https://rustsec.org/advisories/RUSTSEC-2023-0069.html) | sudo-rs | Rust | Path traversal |
| [CVE-2024-32884](https://rustsec.org/advisories/RUSTSEC-2024-0335.html) | gitoxide | Rust | Command injection |
| [CVE-2025-53549](https://rustsec.org/advisories/RUSTSEC-2025-0043.html) | matrix-rust-sdk | Rust | SQL injection |
| [CVE-2016-3714](https://nvd.nist.gov/vuln/detail/CVE-2016-3714) | ImageMagick (ImageTragick) | C | Command injection |
| [CVE-2019-18634](https://nvd.nist.gov/vuln/detail/CVE-2019-18634) | sudo (pwfeedback) | C | Memory safety |
| [CVE-2019-13132](https://nvd.nist.gov/vuln/detail/CVE-2019-13132) | ZeroMQ libzmq | C++ | Memory safety |
| [CVE-2022-1941](https://nvd.nist.gov/vuln/detail/CVE-2022-1941) | Protocol Buffers | C++ | Memory safety |
| [CVE-2025-69662](https://nvd.nist.gov/vuln/detail/CVE-2025-69662) | geopandas | Python | SQL injection |
| [CVE-2026-33626](https://nvd.nist.gov/vuln/detail/CVE-2026-33626) | LMDeploy | Python | SSRF |
The case files live in [`tests/benchmark/cve_corpus/`](tests/benchmark/cve_corpus/) and carry header comments with upstream attribution.
---
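## How it works
Two passes over the filesystem, with an optional SQLite index to skip unchanged files:
1. **Pass 1**: parse each file with tree-sitter, build an intraprocedural CFG (petgraph), lower it to pruned SSA (Cytron phi insertion on dominance frontiers), and export a per-function summary (source/sanitizer/sink capability bits, taint transforms, points-to sets, callee sets).
2. **Summary merge**: union the per-file summaries into a `GlobalSummaries` map.
3. **Pass 2**: re-analyze each file with cross-file context and bounded context sensitivity (k=1 inlining of intra-file callees, an SCC fixed-point cap of 64 iterations, and summary fallback for callees above the inline body-size threshold). A forward dataflow worklist propagates taint through the SSA lattice with guaranteed convergence. Call-graph SCCs iterate to a fixed point (within the cap) so that mutually recursive functions get accurate summaries.
4. **Rank, dedupe, dynamically verify, output**: score by severity × evidence strength × source-class exploitability. The default build dynamically verifies medium- and high-confidence findings, then writes to the console, JSON, SARIF, and the browser UI.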
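The bounded fixed-point iteration in step 3 can be illustrated with a toy model. The sketch below is not Nyx's implementation: it shrinks a function summary to a single boolean ("may return tainted data"), and the names `solve_scc` and `MAX_ITERS` are hypothetical. Only the cap of 64 iterations is taken from the description above.

```python
MAX_ITERS = 64  # iteration cap quoted in the pipeline description


def solve_scc(calls: dict[str, set[str]], seeds: set[str]) -> tuple[dict[str, bool], bool]:
    """Iterate toy summaries of mutually recursive functions to a fixed point.

    calls maps each function to the functions it calls; seeds are the
    functions that read a taint source directly. Returns (summaries, converged).
    """
    tainted = {func: func in seeds for func in calls}
    for _ in range(MAX_ITERS):
        changed = False
        for func, callees in calls.items():
            # A function may return taint if it already does or any callee does.
            new = tainted[func] or any(tainted.get(c, False) for c in callees)
            if new != tainted[func]:
                tainted[func] = new
                changed = True
        if not changed:
            return tainted, True
    # Cap reached without convergence: a real engine would fall back here.
    return tainted, False
```

Because the update only ever flips a summary from `False` to `True`, the loop is monotone and terminates; the cap matters for richer summary domains where each round can still refine the result.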
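Detector families: taint (cross-file source→sink, covering SQLi, XSS, command/code execution, deserialization, SSRF, path traversal, format strings, crypto, LDAP injection, XPath injection, HTTP header/response splitting, open redirect, server-side template injection, XXE, prototype pollution, data exfiltration, and the capability-bit rules folded in from auth), CFG structure (missing authentication, unguarded sinks, resource leaks), state models (use-after-close, double-close, must-leak, unauthed-access), and AST patterns (tree-sitter structural matching). Full detector docs: [Detectors](https://nyxscan.dev/docs/detectors.html).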
---
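## Dynamic verification
Static analysis shows that a sink is reachable from a source. Dynamic verification tries to prove that the path actually fires in real code. The default build enables it: `nyx scan` generates a harness for medium- and high-confidence findings, runs it in a sandbox with curated payloads, and writes the result to `evidence.dynamic_verdict`.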
```bash
nyx scan --verify # explicit form of the default behavior
nyx scan --no-verify # static analysis only, good for fast local loops
```
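`Confirmed` appears only when the attack payload triggers the sink and the matching benign control stays clean. `NotConfirmed` means the harness ran to completion without triggering; it does not mean the finding is closed. The full capability matrix, backends, and limitations are in [Dynamic verification](https://nyxscan.dev/docs/dynamic.html).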
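The verdict rule can be written as a two-input decision. This is a sketch under assumptions, not the verifier's code: the `classify` function is hypothetical, and mapping a benign control that fires to `Inconclusive` is a guess, since the text only states when `Confirmed` and `NotConfirmed` apply.

```python
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "Confirmed"
    NOT_CONFIRMED = "NotConfirmed"
    INCONCLUSIVE = "Inconclusive"


def classify(attack_fired: bool, control_fired: bool) -> Verdict:
    """Combine the attack run and the benign control run into one verdict."""
    if attack_fired and not control_fired:
        # The payload reached the sink and the control stayed clean.
        return Verdict.CONFIRMED
    if control_fired:
        # Assumption: a firing control means the oracle cannot be trusted.
        return Verdict.INCONCLUSIVE
    # The harness ran, nothing fired: not proof that the finding is closed.
    return Verdict.NOT_CONFIRMED
```

Requiring a clean control is what keeps a noisy oracle from turning every run into a confirmation.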
---
## Configuration
Configuration is the merge of `nyx.conf` (defaults) and `nyx.local` (your overrides), read from the platform config directory (`~/.config/nyx/` on Linux, `~/Library/Application Support/nyx/` on macOS, `%APPDATA%\elicpeter\nyx\config\` on Windows).
```toml
[scanner]
mode = "full" # full | ast | cfg | taint
min_severity = "Medium"
[server]
host = "127.0.0.1"
port = 9700
open_browser = true
# Project-specific sanitizer
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml"]
kind = "sanitizer"
cap = "html_escape"
```
Or add a rule interactively: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`. Capabilities (caps): `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `data_exfil`, `code_exec`, `crypto`, `unauthorized_id`, `ldap_injection`, `xpath_injection`, `header_injection`, `open_redirect`, `ssti`, `xxe`, `prototype_pollution`, `all`. Full schema: [Configuration](https://nyxscan.dev/docs/configuration.html). Run `nyx rules list` to browse the registry in the terminal.
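The defaults-plus-overrides merge of `nyx.conf` and `nyx.local` can be pictured as a recursive dictionary merge. This is only a sketch: `merge_config` is a hypothetical helper, and the rule it applies (nested tables merge key by key, everything else is replaced by the override) is an assumption about how such a merge usually works, not a description of Nyx's loader.

```python
def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides applied, merging nested tables."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are tables: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and arrays: the override wins outright.
            merged[key] = value
    return merged


defaults = {
    "scanner": {"mode": "full", "min_severity": "Medium"},
    "server": {"host": "127.0.0.1", "port": 9700, "open_browser": True},
}
local = {"server": {"port": 9800}}
effective = merge_config(defaults, local)
```

In this example `effective["server"]` keeps the default `host` and `open_browser` values and only the `port` changes.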
---
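## Status
Under active development. APIs, detector behavior, and configuration options may change between releases. Rule-level F1 on the synthetic corpus is the CI regression floor; per-language details are in [`tests/benchmark/RESULTS.md`](tests/benchmark/RESULTS.md).
Taint analysis is interprocedural. Persisted per-function SSA summaries carry per-return-path transforms and parameter-granularity points-to sets, and call-graph SCCs (including cross-file SCCs) iterate to a joint fixed point. The default `balanced` profile also applies k=1 context-sensitive inlining to intra-file callees. Symex (including cross-file and interprocedural frames) and the demand-driven backwards walk are opt-in. Enable them individually with `--symex` and `--backwards-analysis`, or together with `--engine-profile deep`.
Limitations:
- Interprocedural precision is bounded, not unlimited. Context-sensitive inlining is k=1 with a callee body-size cap, and the SCC fixed point has an iteration cap. When the engine hits a cap it falls back to summaries and records an `engine_note` on the finding.
- Calls are not tracked across languages (FFI, subprocesses, WASM). Each language is analyzed independently.
- Several language features are not modeled: macros, most dynamic dispatch, aliased imports, reflection.
- C/C++ is in the Preview tier. STL container flows, builder chains, and inline class member functions are tracked; deep pointer aliasing and function pointers are not. A clean report should not be read as a clean audit. Pair Nyx with clang-based tooling before using it as a hard CI gate.
- Results may include false positives or false negatives; expect to review them manually.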
---
## Documentation
Full documentation site: **[nyxscan.dev/docs](https://nyxscan.dev/docs/)**.
- [Quick Start](https://nyxscan.dev/docs/quickstart.html) · [CLI Reference](https://nyxscan.dev/docs/cli.html) · [Installation](https://nyxscan.dev/docs/installation.html)
- [`nyx serve`](https://nyxscan.dev/docs/serve.html) · [Output Formats](https://nyxscan.dev/docs/output.html) · [Configuration](https://nyxscan.dev/docs/configuration.html)
- [How it works](https://nyxscan.dev/docs/how-it-works.html) · [Detectors](https://nyxscan.dev/docs/detectors.html): [Taint](https://nyxscan.dev/docs/detectors/taint.html), [CFG](https://nyxscan.dev/docs/detectors/cfg.html), [State](https://nyxscan.dev/docs/detectors/state.html), [AST Patterns](https://nyxscan.dev/docs/detectors/patterns.html)
- [Rule Reference](https://nyxscan.dev/docs/rules.html) · [Language Maturity](https://nyxscan.dev/docs/language-maturity.html) · [Advanced Analysis](https://nyxscan.dev/docs/advanced-analysis.html) · [Auth Analysis](https://nyxscan.dev/docs/auth.html)
---
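## Contributing
Contributions are welcome.
Nyx is open source and will always have a fully open-source core. To support long-term development and keep the project sustainable, contributors may be asked to sign a Contributor License Agreement before their first merged contribution.
Run `sh scripts/check.sh` before submitting. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for the full guide, including how to add rules and support new languages. Open an issue for crashes, panics, or suspicious results; attach a minimal reproduction snippet and the Nyx version.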
---
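## AI Disclosure
- **Engine code** (taint, SSA, CFG, call graph, abstract interpretation, symbolic execution): predominantly human-written. AI was used only for selective refactors and boilerplate, and every merge was human-reviewed.
- **Docs and most of this README**: AI-generated from the code and hand-edited. Report doc/code drift as a bug.
- **Test fixtures and `expected.yaml` files**: AI-assisted drafting, human-audited before landing.
- **Frontend UI** (React app): built with AI assistance, human-reviewed.
As with any static analyzer, validate findings against your own corpus before using Nyx as a CI gate.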
---
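## License
GNU General Public License v3.0 or later (GPL-3.0-or-later). The optional `smt` feature bundles Z3 (MIT-licensed); distributors of binaries built with `--features smt` should include Z3's license in their attribution. Full text in [LICENSE](./LICENSE); third-party dependencies in [THIRDPARTY-LICENSES.html](./THIRDPARTY-LICENSES.html).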

94
RELEASE_CHECKLIST.md Normal file

@ -0,0 +1,94 @@
# Release checklist: 0.8.0 (dynamic verification)
Maintainer-facing gate for cutting `0.8.0`. The release ships the dynamic
verifier (Tracks J through S of `.pitboss/play/plan.md`). Sign-off requires
every row below green, and every CI matrix row green for at least three
consecutive runs on `master`.
Legend: `[x]` verified locally on the dev reference machine; `[ ]` still to be
confirmed by CI (must hold for three consecutive runs before tagging).
## Cross-cutting invariants
- [x] `cargo check --no-default-features --features serve` green.
- [x] `cargo check --features dynamic` green.
- [x] `cargo nextest run --features dynamic` green: 6545 passed, 0 failed, 16 skipped.
- [x] Determinism: every payload RNG seeds from `spec.spec_hash`; oracle canaries derive from `BLAKE3(spec_hash || run_nonce)`. `scripts/check_no_unseeded_rand.sh` audits the tree.
- [x] Observability: each new code path emits a `VerifyTrace` event and a typed `Inconclusive` / `Unsupported` reason.
- [x] Security: every sink-under-test routes through `src/dynamic/policy.rs` deny rules; no phase weakened the seccomp / `.sb` profile sets.
- [ ] Performance: default `nyx scan` (no `--verify`) latency does not regress.
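The determinism invariant above can be sketched in a few lines. This is an illustration under stated assumptions, not the verifier's code: `payload_rng` and `canary` are hypothetical names, and because Python's standard library has no BLAKE3, BLAKE2b stands in for the `BLAKE3(spec_hash || run_nonce)` derivation.

```python
import hashlib
import random


def payload_rng(spec_hash: bytes) -> random.Random:
    """Seed a private RNG from the spec hash: same spec, same payload stream."""
    return random.Random(int.from_bytes(spec_hash, "big"))


def canary(spec_hash: bytes, run_nonce: bytes) -> str:
    """Derive a per-run canary from the spec hash and a run nonce.

    Stand-in: BLAKE2b replaces BLAKE3, which is not in the standard library.
    """
    return hashlib.blake2b(spec_hash + run_nonce, digest_size=16).hexdigest()


spec_hash = hashlib.sha256(b"example spec").digest()
first = payload_rng(spec_hash).random()
second = payload_rng(spec_hash).random()
```

Here `first` and `second` are equal because both generators start from the same seed, while `canary` changes whenever the run nonce changes.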
## Ship gates (`scripts/m7_ship_gate.sh`)
- [x] Gate 1: static-only scan green on `tests/benchmark/corpus`.
- [x] Gate 2: `cargo nextest run --features dynamic` green (covers Gate 4 + Gate 5 binaries).
- [x] Gate 3: with-verify / static-only wall-clock ratio <= 1.5x on `benches/fixtures/`.
- [x] Gate 4: SARIF schema validation on every dynamic verdict variant.
- [x] Gate 5: layering boundary test green.
- [ ] Gate 6: Java OWASP Benchmark v1.2 `--verify` acceptance (wall-clock <= 15 min CI, per-cap precision >= 0.85 / recall >= 0.40, per-`(cap, lang)` budget). Self-skips without `NYX_OWASP_CORPUS`.
- [ ] Gate 7: NodeGoat + Juice Shop acceptance. Self-skips without `NYX_NODEGOAT_CORPUS` / `NYX_JUICESHOP_CORPUS`.
- [ ] Gate 8: RailsGoat / DVWA / DVPWA / gosec / RustSec acceptance. Self-skips without the matching `NYX_*_CORPUS`.
Gates 6 through 8 run against real corpora that are not vendored into the repo.
They are enforced in the `eval` workflow with the corpora cached on the CI
runner. Locally they self-skip with a clear message.
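The numeric thresholds in Gates 3 and 6 reduce to two small checks. The sketch below is not `scripts/m7_ship_gate.sh`; the function names are hypothetical, and treating an empty denominator as 0.0 is an assumption. The limits (ratio <= 1.5, precision >= 0.85, recall >= 0.40) are the ones listed above.

```python
def wall_clock_gate(verify_secs: float, static_secs: float, max_ratio: float = 1.5) -> bool:
    """Gate 3: the with-verify scan may take at most max_ratio times the static scan."""
    return verify_secs <= max_ratio * static_secs


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cap_gate(tp: int, fp: int, fn: int,
             min_precision: float = 0.85, min_recall: float = 0.40) -> bool:
    """Gate 6 style check for one capability: both floors must hold."""
    precision, recall = precision_recall(tp, fp, fn)
    return precision >= min_precision and recall >= min_recall
```

For example, 18 true positives, 2 false positives, and 22 false negatives give precision 0.9 and recall 0.45, which clears both floors.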
## CI matrix rows (must be green for three consecutive runs)
`ci.yml`:
- [ ] frontend, rustfmt, clippy-stable, cargo-deny, unused-deps, third-party-licenses
- [ ] docs-fresh (`nyx-docgen` output committed), rustdoc
- [ ] rust-beta-build, msrv
- [ ] rust-stable-test-linux-without-docker, rust-stable-test-linux-with-docker (`cargo nextest run --all-features`)
`dynamic.yml` (each runs `cargo nextest run --features dynamic`):
- [ ] linux-process-only
- [ ] linux-with-docker
- [ ] macos
`eval.yml`:
- [ ] owasp (Gate 6)
- [ ] jsts matrix: nodegoat, juiceshop (Gate 7)
- [ ] polyglot matrix: railsgoat, dvwa, dvpwa, gosec, rustsec (Gate 8)
## Docs and metadata
- [x] `Cargo.toml` version bumped to `0.8.0`; `Cargo.lock` regenerated.
- [x] `docs/dynamic.md` rewritten: cap x lang matrix, framework adapter table, oracle table, performance budgets, limitations.
- [x] `README.md` dynamic verification section + docs link.
- [x] `CHANGELOG.md` `[0.8.0]` entry covers Tracks J through S.
- [x] Stray version strings updated (README GitHub Action pin, telemetry doc example).
## Known limitations carried into 0.8.0
These are documented in `docs/dynamic.md` and accepted for the MVP. They are
not release blockers, but the release notes should not overstate the verifier.
- **Guarded-sink over-confirmation (resolved on `dynamic`).** The synthesized
harness now drives the finding's enclosing entry function when one is
derivable, routing the payload to the tainted parameter, so a guard that
lives in the caller (an `Object.create(null)` merge target, an allowlisting
`resolveClass`, a const-name check before `Marshal.load`) runs first and
participates in the verdict. The build-time entry-vs-sink choice is recorded
on the verify trace as `entry_invocation`. When no enclosing entry can be
derived the harness falls back to driving the sink directly, which can still
over-confirm a guard it never executes. On the in-house fixture set the
verify scan now confirms the 8 genuine vulnerabilities and reads
`NotConfirmed` on all 4 negative-control files.
- **In-house confirmed rate is modest.** A `--verify` scan of
`tests/dynamic_fixtures` (process backend) lands 8 Confirmed / 15
NotConfirmed / 115 Inconclusive / 137 Unsupported of 275. The Unsupported
bulk is `SoundOracleUnavailable` (ENV_VAR / SHELL_ESCAPE / URL_ENCODE source
and sanitizer caps, correct by design); the Inconclusive bulk is
`SpecDerivationFailed` on benign and scaffolding fixtures with no derivable
flow. The authoritative confirmed / precision / recall numbers come from the
real-corpus gates (6 through 8), which require the corpora.
- **Real-corpus gates unverified locally.** Gates 6 through 8 self-skip without
`NYX_*_CORPUS`. The >= 40% confirmed and >= 0.85 precision targets are
enforced only in the `eval` workflow.
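As a quick check of the in-house numbers quoted above, the four verdict counts can be summed and turned into shares. The counts come from the text; the variable names are only for illustration.

```python
# Verdict counts reported for the tests/dynamic_fixtures run.
counts = {
    "Confirmed": 8,
    "NotConfirmed": 15,
    "Inconclusive": 115,
    "Unsupported": 137,
}

total = sum(counts.values())  # 8 + 15 + 115 + 137 = 275
shares = {verdict: round(100 * n / total, 1) for verdict, n in counts.items()}
# shares == {"Confirmed": 2.9, "NotConfirmed": 5.5,
#            "Inconclusive": 41.8, "Unsupported": 49.8}
```

The counts add up to the stated 275, and the confirmed share on this fixture set is about 2.9%, which is why the real-corpus gates are treated as the authoritative numbers.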
## Tag
- [ ] Three consecutive green CI runs on `master` confirmed.
- [ ] Real-corpus gates (6 through 8) green in the `eval` workflow with corpora wired.
- [ ] `git tag v0.8.0` and push; `release-build.yml` publishes the binaries and `SHA256SUMS`.

23
ROADMAP.md Normal file
View file

@ -0,0 +1,23 @@
# Roadmap
## Now: recall and precision on real codebases
The current focus is straightforward. Run Nyx against real open-source repositories and real CVEs, then close the gap between what it finds and what it should find.
That means:
- **Recall.** Pick CVEs with public fixes. Reproduce them on the vulnerable commit. If Nyx misses, figure out why (missing source, missing sink, lost flow across a call, dropped at a sanitizer that was not actually a sanitizer) and fix the underlying analysis, not the fixture.
- **Precision.** Triage the noise on large repos (phpMyAdmin, Nextcloud, and others). Each false positive gets reduced to a pattern: receiver-type gate, non-crypto context for `md5`/`sha1`, type-safe sink suppression, etc. Land the gate, re-run the corpus, confirm the count drops without taking real bugs with it.
- **Corpus discipline.** Every fix lands with a fixture (positive or negative) and a corpus row. Rule-level F1 on `tests/benchmark/corpus/` is the scoreboard. CI floors only ratchet up.
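The scoreboard and the ratchet can be expressed in a few lines. This is a sketch, not the project's CI code: `f1`, `passes_floor`, and `ratchet` are hypothetical names, and the rule that a new floor is the maximum of the old floor and the latest measurement is one reading of "floors only ratchet up".

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def passes_floor(measured: float, floor: float) -> bool:
    """CI check: the measured score must not fall below the recorded floor."""
    return measured >= floor


def ratchet(floor: float, measured: float) -> float:
    """New floor after a run: it can rise with a better score, never drop."""
    return max(floor, measured)
```

With precision and recall both at 1.0 the F1 is 1.0, and a floor raised to 0.95 stays at 0.95 even if a later run measures 0.90 (that run fails `passes_floor` instead).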
The scanner internals (SSA, cross-file summaries, abstract interpretation, symbolic execution, auth analysis) are in place. They get refined in service of the recall/precision work, not extended for their own sake.
## Later: dynamic capability
Static analysis confirms a flow exists. Dynamic execution confirms it fires. The plan is a local sandbox that picks up entry points Nyx already identifies, builds a harness, injects a payload, and watches for the crash or shell. Pairs naturally with fuzzing (libFuzzer, cargo-fuzz, go-fuzz, HTTP) where the static engine picks the targets.
Not started. Lands after the static side is honest on real corpora.
## Later still: reasoning layer
Embeddings for cross-codebase pattern similarity. LLM-assisted detection for logic bugs that resist taint modeling. Automated exploit refinement loops. All speculative until the foundation is solid.


@ -1,46 +1,88 @@
# Security Policy
## Supported Versions
| Version | Supported | Notes |
|---------|-----------|----------------------|
| 0.2.x | ✅ | Latest stable line |
| 0.1.x | ✅ | Critical fixes only |
| < 0.1 | | End-of-life |
We follow [Semantic Versioning] as soon as we hit **1.0.0**.
Before that, breaking changes may land in any minor release.
## Reporting a Vulnerability
* **Private disclosure first.**
Please **do not** open public GitHub issues for security bugs.
* **How to report**
1. To report a vulnerability, please use the GitHub disclosure in the security tab to alert us to a security issue.
* **What to include**
A minimal PoC or reproduction steps
Affected Nyx version (`nyx --version`) and OS
Impact explanation (e.g. RCE, DoS, data leak)
* **Response timeline**
We acknowledge within **3 business days** and give a status update every **7 days** thereafter until resolution.
## Disclosure Process
1. We confirm the issue and assign a CVE (via GitHub or MITRE).
2. A fix is developed on a private branch and back-ported if needed.
3. Coordinated release: new version on crates.io + public advisory.
4. Credit is given to the reporter unless they request anonymity.
## Scope & Severity
This policy covers vulnerabilities that let an **untrusted Nyx input** cause:
* Remote or local code execution in the Nyx process
* Privilege escalation, data exfiltration, or denial of service
**False positives / missed detections** in scan results are *quality issues*, not security issues; please file normal GitHub issues for those.
[Semantic Versioning]: https://semver.org
# Security Policy
## Reporting a vulnerability
Report privately. Do not open a public GitHub issue for a security bug.
Use [GitHub Security Advisories](https://github.com/elicpeter/nyx/security/advisories/new) to file a private report. Only the maintainers see it.
Include:
- Affected version (`nyx --version`) and OS
- Reproduction steps or a minimal PoC
- Impact (RCE, file read or write, sandbox escape, auth bypass in `nyx serve`, etc.)
- Whether you have a fix in mind
You'll get an acknowledgement within 3 business days, and a status update every 7 days until the issue is closed.
## Scope
In scope: bugs that let untrusted input reach the Nyx process and cause harm.
- Code execution in the scanner: parser exploits, deserialization, command injection in helpers, custom-rule sandbox escape.
- Path traversal or arbitrary file access outside the target repo.
- `nyx serve` issues: auth bypass, host-header bypass, CSRF on mutating routes, XSS in the UI, cross-origin access from a non-loopback origin.
- Memory safety bugs in any unsafe Rust we introduce.
- Tampering with `.nyx/` triage state from outside the user's repo.
- Supply chain issues affecting published `nyx-scanner` crates or release artifacts.
Out of scope:
- False positives or missed detections in scan output. File a regular GitHub issue with the rule ID and a fixture.
- Findings Nyx reports against your own code. That's the scanner working, not a Nyx vulnerability.
- Anything requiring physical or local-account access to the user's machine.
- Self-XSS and missing security headers on `127.0.0.1` endpoints. The UI is loopback-only.
- Performance pathologies on hostile input (a 50 GB file, deeply nested grammars). We harden where we can.
- Issues only reachable by a user editing their own `nyx.conf` to weaken defaults.
## Supported versions
| Version | Status |
|---------|-----------------------|
| 0.7.x | Supported |
| 0.6.x | Critical fixes only |
| < 0.6 | End of life |
The project follows [Semantic Versioning](https://semver.org) once it reaches 1.0.0. Until then, breaking changes can land in any minor release.
## Severity
We use [CVSS 3.1](https://www.first.org/cvss/v3.1/specification-document) to rate reports.
| Severity | Examples |
|----------|-----------------------------------------------------------------------------------------------|
| Critical | Unauthenticated RCE in `nyx serve`, custom-rule sandbox escape during a default scan |
| High | Auth bypass against `nyx serve`, arbitrary file write outside the repo |
| Medium | Stored XSS in the UI, CSRF on a mutating route, host-header bypass |
| Low | Information disclosure with no privilege change, log-injection, denial of service via input |
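For reference, CVSS 3.1 maps base scores to qualitative ratings with fixed bands (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). The sketch below encodes those bands; the function name `cvss_rating` is hypothetical and nothing here is part of Nyx.

```python
def cvss_rating(score: float) -> str:
    """Map a CVSS 3.1 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```

The examples in the table are illustrations of each level; the rating assigned to a given report still depends on its computed base score.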
## Disclosure
Coordinated disclosure.
1. We confirm the report and assign severity.
2. We request a CVE through GitHub or MITRE.
3. A fix is developed on a private branch, with backports to supported lines if needed.
4. A new release ships on crates.io and a public advisory goes out.
5. The reporter is credited in the advisory and the changelog, unless they ask to stay anonymous.
Target window from report to fix is 90 days. If you need to publish on a shorter timeline, tell us in the report and we'll work toward it.
## Safe harbor
Good-faith security research is welcome. We won't pursue legal action against researchers who:
- Report privately and give a reasonable window before publishing.
- Test against their own installations, not third-party deployments running Nyx.
- Avoid data destruction, account takeover, and service disruption.
- Stop and reach out if a test starts to affect data or systems they don't own.
If you're not sure whether a test is in scope, ask first.
## Bounty
There is no paid bug bounty program. Credit, a thank-you in the advisory, and a mention in the changelog are what we offer today.
## Security model recap
Nyx runs locally. The browser UI binds to `127.0.0.1` by default, requires a matching `Host` header, and uses a CSRF token on every mutating request. There is no login, no telemetry, and no remote control plane. If you find a way around any of those defaults, that's a security issue and we want to hear about it.
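The `Host` header requirement can be pictured as a small predicate. This is an illustration only, not the check `nyx serve` performs: `host_is_loopback` and its parsing rules are assumptions, and the accepted names are the three loopback hosts the README lists for `--host` (`127.0.0.1`, `localhost`, `::1`).

```python
ALLOWED_HOSTS = {"127.0.0.1", "localhost", "::1"}


def host_is_loopback(host_header: str) -> bool:
    """Return True if the Host header names an allowed loopback host.

    Accepts an optional port and the bracketed IPv6 form, e.g. "[::1]:9700".
    """
    host = host_header.strip().lower()
    if host.startswith("["):
        end = host.find("]")
        if end == -1:
            return False  # malformed bracketed IPv6 literal
        name = host[1:end]
    elif host.count(":") == 1:
        name = host.split(":", 1)[0]  # strip the ":port" suffix
    else:
        name = host
    return name in ALLOWED_HOSTS
```

Matching the whole host name, not a prefix, is what rejects look-alikes such as `localhost.evil.example`.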

6498
THIRDPARTY-LICENSES.html Normal file

File diff suppressed because it is too large


@ -1,12 +1,80 @@
# Pin the target triples scanned so `cargo about generate` produces the
# same output regardless of host OS. Must match the release build matrix
# in .github/workflows/release-build.yml — otherwise the CI diff step
# (third-party-licenses) will fail on platform-specific crates like
# linux-raw-sys, android_system_properties, etc.
targets = [
"x86_64-unknown-linux-gnu",
"aarch64-unknown-linux-gnu",
"x86_64-pc-windows-msvc",
"x86_64-apple-darwin",
"aarch64-apple-darwin",
]
accepted = [
"Apache-2.0",
"MIT",
"MIT-0",
"Unicode-3.0",
"BSD-2-Clause",
"Unlicense",
"Zlib",
"CC0-1.0",
"MPL-2.0",
"GPL-3.0"
]
accepted = [
# --- Apache / MIT / BSD / permissive ---
"Apache-2.0",
"MIT",
"MIT-0",
"BSD-2-Clause",
"BSD-3-Clause",
"ISC",
"Zlib",
"zlib-acknowledgement",
"BSL-1.0",
"NCSA",
"PostgreSQL",
"curl",
"BlueOak-1.0.0",
"X11",
"HPND",
"TCL",
"ICU",
"Info-ZIP",
# --- Unicode / data / specs ---
"Unicode-DFS-2016",
"Unicode-3.0",
# --- compression / libs ---
"bzip2-1.0.6",
"Libpng",
"libpng-2.0",
"IJG",
"FTL",
# --- public domain style ---
"CC0-1.0",
"Unlicense",
"0BSD",
# --- weak copyleft (GPL-compatible) ---
"MPL-2.0",
"LGPL-3.0",
"EPL-2.0",
# --- GPL family ---
"GPL-3.0",
"GPL-2.0",
# --- Python / PSF ---
"PSF-2.0",
"Python-2.0",
"Python-2.0.1",
# --- Artistic / Perl ---
"Artistic-2.0",
# --- LLVM / clang ---
"Apache-2.0 WITH LLVM-exception",
# --- data / ML ---
"CDLA-Permissive-2.0",
# --- fonts ---
"OFL-1.1",
# --- Creative Commons (code-safe ones) ---
"CC-BY-3.0",
"CC-BY-4.0",
]

148
action-scripts/download.sh Executable file

@ -0,0 +1,148 @@
#!/usr/bin/env bash
set -euo pipefail
REPO="elicpeter/nyx"
VERSION="${NYX_VERSION:-latest}"
INSTALL_DIR="${RUNNER_TOOL_CACHE:-/tmp}/nyx"
# Optional: pin a GPG key fingerprint here (40-char, no spaces) or set
# NYX_GPG_FINGERPRINT in the calling env to require GPG-signed SHA256SUMS.
# Empty ⇒ GPG verification is skipped (SHA256 + SLSA attestation still run).
PINNED_GPG_FINGERPRINT="${NYX_GPG_FINGERPRINT:-}"
# ── Detect runner OS and architecture ─────────────────────────────────────────
OS="$(uname -s)"
ARCH="$(uname -m)"
case "${OS}-${ARCH}" in
Linux-x86_64) TARGET="x86_64-unknown-linux-gnu" ;;
Linux-aarch64) TARGET="aarch64-unknown-linux-gnu" ;;
Darwin-x86_64) TARGET="x86_64-apple-darwin" ;;
Darwin-arm64) TARGET="aarch64-apple-darwin" ;;
*)
echo "::error::Unsupported platform: ${OS} ${ARCH}"
exit 1
;;
esac
# ── Resolve "latest" to an actual release tag ────────────────────────────────
if [[ "$VERSION" == "latest" ]]; then
echo "::warning::version: latest follows a mutable tag. Pin to a specific release (e.g. v0.7.0) for supply-chain safety."
API_URL="https://api.github.com/repos/${REPO}/releases/latest"
CURL_ARGS=(-fsSL)
if [[ -n "${GITHUB_TOKEN:-}" ]]; then
CURL_ARGS+=(-H "Authorization: token ${GITHUB_TOKEN}")
fi
RELEASE_JSON="$(curl "${CURL_ARGS[@]}" "$API_URL")"
VERSION="$(echo "$RELEASE_JSON" | grep -o '"tag_name":\s*"[^"]*"' | head -1 | cut -d'"' -f4)"
if [[ -z "$VERSION" ]]; then
echo "::error::Failed to resolve latest release tag from ${API_URL}"
exit 1
fi
echo "Resolved latest version: ${VERSION}"
fi
# ── Download the release asset into an isolated staging dir ──────────────────
ASSET_NAME="nyx-${TARGET}.zip"
RELEASE_BASE="https://github.com/${REPO}/releases/download/${VERSION}"
DOWNLOAD_URL="${RELEASE_BASE}/${ASSET_NAME}"
STAGING="$(mktemp -d)"
trap 'rm -rf "$STAGING"' EXIT
CURL_COMMON=(-fsSL)
if [[ -n "${GITHUB_TOKEN:-}" ]]; then
CURL_COMMON+=(-H "Authorization: token ${GITHUB_TOKEN}")
fi
echo "Downloading nyx ${VERSION} for ${TARGET}..."
curl "${CURL_COMMON[@]}" -o "${STAGING}/${ASSET_NAME}" "$DOWNLOAD_URL"
# SHA256SUMS is required — the whole release signing chain hinges on it.
echo "Downloading SHA256SUMS..."
curl "${CURL_COMMON[@]}" -o "${STAGING}/SHA256SUMS" "${RELEASE_BASE}/SHA256SUMS"
# SHA256SUMS.asc is optional (GPG signing was wired up mid-0.x); fetch it if
# present so we can attempt signature verification.
SIG_PATH=""
if curl "${CURL_COMMON[@]}" -o "${STAGING}/SHA256SUMS.asc" "${RELEASE_BASE}/SHA256SUMS.asc" 2>/dev/null; then
SIG_PATH="${STAGING}/SHA256SUMS.asc"
fi
# ── Mandatory: verify the binary's SHA256 matches SHA256SUMS ─────────────────
(
cd "$STAGING"
# --ignore-missing: SHA256SUMS lists every platform archive; we only have one.
if ! sha256sum --ignore-missing -c SHA256SUMS >/dev/null 2>&1; then
echo "::error::SHA256 verification failed for ${ASSET_NAME}. Release may be tampered."
echo "Expected (from SHA256SUMS):"
grep -F "${ASSET_NAME}" SHA256SUMS || true
echo "Actual:"
sha256sum "${ASSET_NAME}" || true
exit 1
fi
)
echo "::notice::SHA256 checksum verified for ${ASSET_NAME}."
# ── Best-effort: GPG verify SHA256SUMS.asc against a pinned fingerprint ──────
# Trust model: only accept a signature from a fingerprint we have pinned. A
# signature from any other key is treated as a failure, not a success. If no
# fingerprint is pinned, GPG verification is skipped (SHA256+SLSA still run).
if [[ -n "$SIG_PATH" ]]; then
if [[ -z "$PINNED_GPG_FINGERPRINT" ]]; then
echo "::warning::SHA256SUMS.asc found but no GPG fingerprint pinned. Set NYX_GPG_FINGERPRINT (40-char, no spaces) to enforce GPG verification."
elif ! command -v gpg >/dev/null 2>&1; then
echo "::warning::gpg not installed on runner; skipping SHA256SUMS.asc verification."
else
# Fetch the pinned key from keys.openpgp.org into an ephemeral keyring.
GNUPGHOME="$(mktemp -d)"
export GNUPGHOME
chmod 700 "$GNUPGHOME"
trap 'rm -rf "$STAGING" "$GNUPGHOME"' EXIT
if ! gpg --batch --keyserver hkps://keys.openpgp.org \
--recv-keys "$PINNED_GPG_FINGERPRINT" >/dev/null 2>&1; then
echo "::error::Failed to fetch GPG key ${PINNED_GPG_FINGERPRINT} from keys.openpgp.org."
exit 1
fi
# --status-fd 1 gives machine-readable output; VALIDSIG + the pinned fpr
# is the only accept condition.
GPG_STATUS="$(gpg --batch --status-fd 1 --verify \
"$SIG_PATH" "${STAGING}/SHA256SUMS" 2>/dev/null || true)"
if ! grep -q "^\[GNUPG:\] VALIDSIG ${PINNED_GPG_FINGERPRINT} " <<<"$GPG_STATUS"; then
echo "::error::GPG signature on SHA256SUMS does not match pinned fingerprint ${PINNED_GPG_FINGERPRINT}."
echo "$GPG_STATUS"
exit 1
fi
echo "::notice::GPG signature verified against ${PINNED_GPG_FINGERPRINT}."
fi
else
echo "::warning::SHA256SUMS.asc not published for ${VERSION}; relying on SHA256 + SLSA only."
fi
# ── Best-effort: SLSA build-provenance attestation (Sigstore) ────────────────
# gh attestation verify ships with the gh CLI (preinstalled on GH-hosted
# runners) and validates attestations produced by actions/attest-build-
# provenance against the Sigstore public-good transparency log. Unlike GPG
# this requires no pre-shared key and is the preferred trust root.
if command -v gh >/dev/null 2>&1; then
if gh attestation verify "${STAGING}/${ASSET_NAME}" --repo "${REPO}" >/dev/null 2>&1; then
echo "::notice::SLSA build provenance verified for ${ASSET_NAME}."
else
echo "::warning::gh attestation verify failed or no attestation present for ${VERSION}. (Expected for releases predating attest-build-provenance.)"
fi
else
echo "::warning::gh CLI not available; skipping SLSA attestation verification."
fi
# ── Extract and install ──────────────────────────────────────────────────────
mkdir -p "$INSTALL_DIR"
# The zip stores target/{TARGET}/release/nyx — use -j to flatten paths
unzip -o -j "${STAGING}/${ASSET_NAME}" "*/nyx" -d "$INSTALL_DIR"
chmod +x "${INSTALL_DIR}/nyx"
# ── Add to PATH for subsequent steps ─────────────────────────────────────────
echo "${INSTALL_DIR}" >> "$GITHUB_PATH"
# ── Verify and set output ────────────────────────────────────────────────────
INSTALLED_VERSION="$("${INSTALL_DIR}/nyx" --version 2>&1 | head -1 || echo "unknown")"
echo "nyx-version=${INSTALLED_VERSION}" >> "$GITHUB_OUTPUT"
echo "Installed nyx: ${INSTALLED_VERSION} (${TARGET})"
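
The GPG step in the script above accepts a signature only when the first field after `VALIDSIG` equals the pinned fingerprint, compared case-sensitively. GnuPG's status format also reports the primary-key fingerprint as the last field of that line, so a slightly more tolerant check is possible. The following is a minimal sketch under that assumption, not part of `download.sh`; the function name `validsig_matches` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of download.sh.
# Returns 0 if any VALIDSIG status line names the pinned fingerprint either as
# the signing key (first field after VALIDSIG) or as the primary key (last
# field). The comparison is case-insensitive.
validsig_matches() {
  local status="$1" pinned line signer last
  local -a f=()
  # Uppercase via tr so the helper also works on bash 3.2.
  pinned="$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')"
  [[ -n "$pinned" ]] || return 1
  while IFS= read -r line; do
    # Split the status line on whitespace.
    read -ra f <<<"$line"
    [[ "${f[0]:-}" == "[GNUPG:]" && "${f[1]:-}" == "VALIDSIG" ]] || continue
    signer="$(printf '%s' "${f[2]:-}" | tr '[:lower:]' '[:upper:]')"
    last="$(printf '%s' "${f[${#f[@]}-1]}" | tr '[:lower:]' '[:upper:]')"
    if [[ "$signer" == "$pinned" || "$last" == "$pinned" ]]; then
      return 0
    fi
  done <<<"$status"
  return 1
}
```

It would be called as `validsig_matches "$GPG_STATUS" "$PINNED_GPG_FINGERPRINT"` in place of the `grep -q` test. Accepting the primary-key field widens the trust decision to every signing subkey of the pinned key, which may or may not be what the release process wants.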

87
action-scripts/run.sh Executable file

@ -0,0 +1,87 @@
#!/usr/bin/env bash
set -uo pipefail
# Note: NOT -e; we capture nyx's exit code manually.
# ── Build the nyx command ────────────────────────────────────────────────────
FORMAT="${INPUT_FORMAT:-sarif}"
ARGS=("scan" "${INPUT_PATH:-.}" "--quiet" "--format" "$FORMAT")
if [[ -n "${INPUT_FAIL_ON:-}" ]]; then
  ARGS+=("--fail-on" "$INPUT_FAIL_ON")
fi
# Append raw user args. Word-splitting on whitespace is intentional here;
# quotes inside INPUT_ARGS are not interpreted and only the first line is read.
if [[ -n "${INPUT_ARGS:-}" ]]; then
  read -ra EXTRA <<< "$INPUT_ARGS"
  # Whitespace-only input leaves EXTRA empty; skip the append so `set -u`
  # does not trip on an empty array under older bash.
  if (( ${#EXTRA[@]} > 0 )); then
    ARGS+=("${EXTRA[@]}")
  fi
fi
# ── Execute the scan ─────────────────────────────────────────────────────────
OUTDIR="${RUNNER_TEMP:-/tmp}"
SARIF_FILE=""
NYX_EXIT=0
echo "::group::nyx scan"
echo "Running: nyx ${ARGS[*]}"
case "$FORMAT" in
  sarif)
    SARIF_FILE="${OUTDIR}/nyx-results.sarif"
    nyx "${ARGS[@]}" > "$SARIF_FILE" || NYX_EXIT=$?
    ;;
  json)
    nyx "${ARGS[@]}" > "${OUTDIR}/nyx-results.json" || NYX_EXIT=$?
    ;;
  *)
    nyx "${ARGS[@]}" || NYX_EXIT=$?
    ;;
esac
echo "::endgroup::"
# ── Count findings ───────────────────────────────────────────────────────────
# Prints the number of findings in a results file, or 0 if it cannot be parsed.
count_findings() {
  python3 -c "
import json, sys
try:
    with open(sys.argv[1]) as fh:
        data = json.load(fh)
    fmt = sys.argv[2]
    if fmt == 'sarif':
        # Sum results across every run, not just the first one.
        print(sum(len(run.get('results') or []) for run in data.get('runs', [])))
    else:
        print(len(data) if isinstance(data, list) else 0)
except Exception:
    print(0)
" "$1" "$2" 2>/dev/null || echo "0"
}
# Stays "unknown" for console output, which has no machine-readable file.
FINDING_COUNT="unknown"
case "$FORMAT" in
  sarif)
    if [[ -f "$SARIF_FILE" ]]; then
      FINDING_COUNT="$(count_findings "$SARIF_FILE" sarif)"
    fi
    ;;
  json)
    if [[ -f "${OUTDIR}/nyx-results.json" ]]; then
      FINDING_COUNT="$(count_findings "${OUTDIR}/nyx-results.json" json)"
    fi
    ;;
esac
# ── Set outputs ──────────────────────────────────────────────────────────────
echo "exit-code=${NYX_EXIT}" >> "$GITHUB_OUTPUT"
echo "finding-count=${FINDING_COUNT}" >> "$GITHUB_OUTPUT"
if [[ -n "$SARIF_FILE" ]]; then
  echo "sarif-file=${SARIF_FILE}" >> "$GITHUB_OUTPUT"
fi
# ── Summary ──────────────────────────────────────────────────────────────────
# Exit code 1 means the severity threshold was breached; anything else
# non-zero is reported as an unexpected failure rather than as findings.
if [[ "$NYX_EXIT" -eq 0 ]]; then
  echo "::notice::Nyx scan completed. Findings: ${FINDING_COUNT}"
elif [[ "$NYX_EXIT" -eq 1 ]]; then
  echo "::warning::Nyx scan found issues meeting threshold. Findings: ${FINDING_COUNT}"
else
  echo "::error::Nyx exited with unexpected code ${NYX_EXIT}. Findings: ${FINDING_COUNT}"
fi
exit "$NYX_EXIT"
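
The script above splits `INPUT_ARGS` with `read -ra`, which breaks the string on whitespace without interpreting quotes and reads only the first line. The sketch below isolates that behaviour so its edge cases are easy to see; the function name `split_like_run_sh` is hypothetical and the snippet is illustrative, not part of `run.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints one token per line, split the same way run.sh
# splits INPUT_ARGS (whitespace only, no quote handling, first line only).
split_like_run_sh() {
  local -a extra=()
  local tok
  read -ra extra <<<"$1"
  # The guarded expansion avoids an unbound-variable error for an empty array
  # under `set -u` on older bash releases.
  for tok in ${extra[@]+"${extra[@]}"}; do
    printf '%s\n' "$tok"
  done
}

split_like_run_sh '--severity >=MEDIUM --profile ci'
split_like_run_sh '--name "a b"'
```

The second call yields three tokens (`--name`, `"a`, `b"`), so values containing spaces cannot be passed through `args` with this approach.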

68
action.yml Normal file

@ -0,0 +1,68 @@
name: 'Nyx Security Scanner'
description: 'Run the Nyx multi-language vulnerability scanner on your codebase. Supports Linux and macOS runners (x86_64 and ARM64).'
author: 'Eli Peter'
branding:
  icon: 'shield'
  color: 'purple'
inputs:
  path:
    description: 'Directory to scan'
    required: false
    default: '.'
  version:
    description: 'Nyx release tag (e.g. v0.7.0). "latest" is accepted but discouraged; pinning to a specific tag protects against upstream compromise.'
    required: false
    default: 'v0.7.0'
  format:
    description: 'Output format: sarif, json, or console'
    required: false
    default: 'sarif'
  fail-on:
    description: 'Exit non-zero if findings meet this severity threshold: HIGH, MEDIUM, or LOW'
    required: false
    default: ''
  args:
    description: 'Additional CLI arguments (e.g. "--severity >=MEDIUM --profile ci")'
    required: false
    default: ''
  token:
    description: 'GitHub token for release download (avoids rate limits)'
    required: false
    default: ${{ github.token }}
outputs:
  finding-count:
    description: 'Number of findings detected ("unknown" when format is console)'
    value: ${{ steps.scan.outputs.finding-count }}
  sarif-file:
    description: 'Path to SARIF results file (empty if format is not sarif)'
    value: ${{ steps.scan.outputs.sarif-file }}
  exit-code:
    description: 'Nyx exit code (0 = clean, 1 = threshold breached)'
    value: ${{ steps.scan.outputs.exit-code }}
  nyx-version:
    description: 'Installed nyx version'
    value: ${{ steps.install.outputs.nyx-version }}
runs:
  using: 'composite'
  steps:
    - name: Install nyx
      id: install
      shell: bash
      env:
        NYX_VERSION: ${{ inputs.version }}
        GITHUB_TOKEN: ${{ inputs.token }}
      run: ${{ github.action_path }}/action-scripts/download.sh
    - name: Run nyx scan
      id: scan
      shell: bash
      env:
        INPUT_PATH: ${{ inputs.path }}
        INPUT_FORMAT: ${{ inputs.format }}
        INPUT_FAIL_ON: ${{ inputs.fail-on }}
        INPUT_ARGS: ${{ inputs.args }}
      run: ${{ github.action_path }}/action-scripts/run.sh

Binary file not shown (size before: 1.4 MiB, after: 432 KiB).

Binary file not shown (added, size: 9.9 KiB).


@ -0,0 +1,24 @@
<svg xmlns="http://www.w3.org/2000/svg" width="900" height="275" viewBox="0 0 900 275" role="img" aria-labelledby="title desc">
<title id="title">NYX</title>
<desc id="desc">NYX security scanner.</desc>
<defs>
<style>
.banner {
font-family: ui-monospace, SFMono-Regular, Menlo, Consolas, "Liberation Mono", monospace;
font-size: 38px;
font-weight: 800;
letter-spacing: 0;
white-space: pre;
}
</style>
</defs>
<g transform="translate(146 48)" xml:space="preserve">
<text class="banner" x="0" y="0" fill="#2ea067" xml:space="preserve">███╗ ██╗██╗ ██╗██╗ ██╗</text>
<text class="banner" x="0" y="43" fill="#2ea067" xml:space="preserve">████╗ ██║╚██╗ ██╔╝╚██╗██╔╝</text>
<text class="banner" x="0" y="86" fill="#2ea067" xml:space="preserve">██╔██╗ ██║ ╚████╔╝ ╚███╔╝</text>
<text class="banner" x="0" y="129" fill="#2ea067" xml:space="preserve">██║╚██╗██║ ╚██╔╝ ██╔██╗</text>
<text class="banner" x="0" y="172" fill="#2ea067" xml:space="preserve">██║ ╚████║ ██║ ██╔╝ ██╗</text>
<text class="banner" x="0" y="215" fill="#2ea067" xml:space="preserve">╚═╝ ╚═══╝ ╚═╝ ╚═╝ ╚═╝</text>
</g>
</svg>


10
assets/nyx-wordmark.svg Normal file

@ -0,0 +1,10 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 220 100" role="img" aria-label="nyx">
<text x="110" y="72"
text-anchor="middle"
dominant-baseline="alphabetic"
font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif"
font-weight="700"
font-size="100"
letter-spacing="-1"
fill="#72f3d7">nyx</text>
</svg>


Binary files not shown: 38 added image assets, each between 38 KiB and 24 MiB. The only entry whose path survives is assets/screenshots/demo.gif (24 MiB).

686
benches/dynamic_bench.rs Normal file

@ -0,0 +1,686 @@
//! Dynamic verification benchmarks (§8.4).
//!
//! Tracks the per-scan cost anchors:
//!
//! 1. `harness_build_cold` — fresh workdir, spec → BuiltHarness (source gen + disk write).
//! 2. `harness_build_warm` — same spec, workdir already staged (file write skipped).
//! 3. `sandbox_run_payload` — single payload run via process backend against
//! sqli_positive.py (subprocess + settrace overhead, no networking).
//! 4. `docker_image_build` — cold image pull/build for the python:3-slim base.
//! 5. `docker_exec_warm` — `docker exec` into a running container (no cold start).
//! 6. `docker_payload_cost` — per-payload sandbox cost via docker backend end-to-end.
//! 7. `composite_chain_reverify_dispatch` — `reverify_top_chains` on a
//! synthetic 3-member chain with no member diags. Measures the no-derive
//! dispatch path (chain_step_specs miss, early-exit build/run loops,
//! Inconclusive verdict allocation, severity downgrade).
//! 8. `composite_chain_reverify_stub_confirmed` — same chain shape, stubbed
//! reverifier returning `Confirmed`. Measures the apply-verdict happy path
//! (no severity bucket change).
//! 9. `composite_chain_reverify_top_n_slice` — 5-chain slice with `top_n=3`.
//! Measures the slice traversal cost so a regression that walks the full
//! slice instead of the prefix is visible.
//! 10. `composite_chain_reverify_replay_stable` — same chain shape as
//! `stub_confirmed`, but with `VerifyOptions::replay_stable_check=true`
//! and a stub that stamps `replay_stable=Some(true)`. Anchors the
//! apply-verdict allocation cost when the telemetry stability field
//! is populated; a regression that adds per-chain work behind the
//! replay opt-in (e.g. an extra run_chain_steps call leaking out of
//! the live path into the stub layer) shows up here.
//!
//! Wall-clock budget anchors for the composite reverify path: the live
//! process backend stays under 400ms per 3-member chain, the docker
//! backend under 1500ms. Those live-run numbers are covered by the
//! `flask_eval_chain_reverify_populates_dynamic_verdict` integration
//! test in `tests/chain_emission_e2e.rs`; the microbenches here anchor
//! the dispatch + verdict-application overhead so regressions on the
//! API-shape half land in the criterion baseline.
//!
//! Baselines committed to `benches/dynamic_bench_baseline.json`.
//! Run: `cargo bench --features dynamic --bench dynamic_bench`
//!
//! Docker benchmarks are no-ops when docker is unavailable (skipped, not failed).
use criterion::{Criterion, criterion_group, criterion_main};
#[cfg(feature = "dynamic")]
use nyx_scanner::dynamic::spec::{
EntryKind, HarnessSpec, JavaToolchain, PayloadSlot, SpecDerivationStrategy,
};
#[cfg(feature = "dynamic")]
use nyx_scanner::labels::Cap;
#[cfg(feature = "dynamic")]
use nyx_scanner::symbol::Lang;
#[cfg(feature = "dynamic")]
fn make_rust_sqli_spec() -> HarnessSpec {
HarnessSpec {
finding_id: "bench_rust_0001".into(),
entry_file: "tests/dynamic_fixtures/rust/sqli_positive.rs".into(),
entry_name: "run".into(),
entry_kind: EntryKind::Function,
lang: Lang::Rust,
toolchain_id: "rust-stable".into(),
payload_slot: PayloadSlot::Param(0),
expected_cap: Cap::SQL_QUERY,
constraint_hints: vec![],
sink_file: "tests/dynamic_fixtures/rust/sqli_positive.rs".into(),
sink_line: 18,
spec_hash: "benchrustsqli0001".into(),
derivation: SpecDerivationStrategy::FromFlowSteps,
stubs_required: vec![],
framework: None,
java_toolchain: JavaToolchain::default(),
}
}
#[cfg(feature = "dynamic")]
fn make_sqli_spec() -> HarnessSpec {
HarnessSpec {
finding_id: "bench0000000001".into(),
entry_file: "tests/dynamic_fixtures/python/sqli_positive.py".into(),
entry_name: "login".into(),
entry_kind: EntryKind::Function,
lang: Lang::Python,
toolchain_id: "python-3".into(),
payload_slot: PayloadSlot::Param(0),
expected_cap: Cap::SQL_QUERY,
constraint_hints: vec![],
sink_file: "tests/dynamic_fixtures/python/sqli_positive.py".into(),
sink_line: 7,
spec_hash: "benchsqli000001".into(),
derivation: SpecDerivationStrategy::FromFlowSteps,
stubs_required: vec![],
framework: None,
java_toolchain: JavaToolchain::default(),
}
}
#[cfg(feature = "dynamic")]
fn bench_harness_build_cold(c: &mut Criterion) {
use nyx_scanner::dynamic::harness;
let spec = make_sqli_spec();
c.bench_function("harness_build_cold", |b| {
b.iter(|| {
let workdir = std::env::temp_dir()
.join("nyx-harness")
.join(&spec.spec_hash);
let _ = std::fs::remove_dir_all(&workdir);
harness::build(&spec).expect("harness build")
});
});
}
#[cfg(feature = "dynamic")]
fn bench_harness_build_warm(c: &mut Criterion) {
use nyx_scanner::dynamic::harness;
let spec = make_sqli_spec();
harness::build(&spec).expect("harness pre-stage");
c.bench_function("harness_build_warm", |b| {
b.iter(|| harness::build(&spec).expect("harness build warm"));
});
}
#[cfg(feature = "dynamic")]
fn bench_sandbox_run_payload(c: &mut Criterion) {
use nyx_scanner::dynamic::corpus::payloads_for;
use nyx_scanner::dynamic::harness;
use nyx_scanner::dynamic::sandbox::{self, SandboxOptions};
let spec = make_sqli_spec();
let harness = harness::build(&spec).expect("harness build");
let payloads = payloads_for(Cap::SQL_QUERY);
let payload = payloads
.iter()
.find(|p| !p.is_benign)
.expect("sqli payload");
let opts = SandboxOptions {
timeout: std::time::Duration::from_secs(10),
..SandboxOptions::default()
};
c.bench_function("sandbox_run_payload", |b| {
b.iter(|| sandbox::run(&harness, payload.bytes, &opts).expect("sandbox run"));
});
}
#[cfg(feature = "dynamic")]
fn docker_available() -> bool {
std::process::Command::new("docker")
.arg("info")
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null())
.status()
.map(|s| s.success())
.unwrap_or(false)
}
/// Docker image pull for the `python:3-slim` base.
///
/// Measures the time to ensure `python:3-slim` is present locally. With the
/// image already cached, `docker pull` only checks the registry for a newer
/// manifest (typically sub-second); on a cold host the first iteration
/// includes the full pull from the registry.
///
/// Registers a labelled noop measurement when Docker is absent so criterion's
/// output is never empty for this slot.
#[cfg(feature = "dynamic")]
fn bench_docker_image_build(c: &mut Criterion) {
if !docker_available() {
c.bench_function("docker_image_build_no_docker", |b| b.iter(|| ()));
return;
}
c.bench_function("docker_image_build", |b| {
b.iter(|| {
// `docker pull` is idempotent and fast when image is already local.
let _ = std::process::Command::new("docker")
.args(["pull", "python:3-slim"])
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null())
.status();
});
});
}
/// Warm `docker exec` reuse benchmark.
///
/// Starts a single container before the benchmark loop and measures the cost
/// of each `docker exec` call. Container start-up is excluded; compare this
/// against `bench_docker_payload_cost` to see the cold-start overhead.
#[cfg(feature = "dynamic")]
fn bench_docker_exec_warm(c: &mut Criterion) {
if !docker_available() {
eprintln!("bench_docker_exec_warm: docker unavailable, skipping");
return;
}
// Start a long-lived container for the benchmark.
let container = "nyx-bench-exec-warm";
let _ = std::process::Command::new("docker")
.args([
"run",
"-d",
"--rm",
"--name",
container,
"--cap-drop=ALL",
"--security-opt",
"no-new-privileges:true",
"--network",
"none",
"python:3-slim",
"sleep",
"300",
])
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null())
.status();
c.bench_function("docker_exec_warm", |b| {
b.iter(|| {
let _ = std::process::Command::new("docker")
.args(["exec", container, "python3", "-c", "pass"])
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null())
.status();
});
});
let _ = std::process::Command::new("docker")
.args(["stop", container])
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null())
.status();
}
/// Per-payload sandbox cost via docker backend end-to-end.
///
/// Measures the complete path with the harness already built: the docker
/// backend runs one payload against the sqli_positive fixture. The first call
/// includes container start; subsequent calls show exec-reuse cost.
///
/// Registers a labelled noop measurement when Docker is absent so criterion's
/// output is never empty for this slot.
#[cfg(feature = "dynamic")]
fn bench_docker_payload_cost(c: &mut Criterion) {
if !docker_available() {
c.bench_function("docker_payload_cost_no_docker", |b| b.iter(|| ()));
return;
}
use nyx_scanner::dynamic::corpus::payloads_for;
use nyx_scanner::dynamic::harness;
use nyx_scanner::dynamic::sandbox::{self, SandboxBackend, SandboxOptions};
let spec = make_sqli_spec();
let built = harness::build(&spec).expect("harness build");
let payloads = payloads_for(Cap::SQL_QUERY);
let payload = payloads
.iter()
.find(|p| !p.is_benign)
.expect("sqli payload");
let opts = SandboxOptions {
timeout: std::time::Duration::from_secs(30),
backend: SandboxBackend::Docker,
..SandboxOptions::default()
};
c.bench_function("docker_payload_cost", |b| {
b.iter(|| {
let _ = sandbox::run(&built, payload.bytes, &opts);
});
});
}
/// Rust harness build (source gen + disk write, no compilation).
///
/// Measures only `harness::build()` — staging files to the workdir.
/// The expensive `cargo build --release` step is NOT included here
/// (that is the province of an integration benchmark, not this microbench).
#[cfg(feature = "dynamic")]
fn bench_rust_harness_build_cold(c: &mut Criterion) {
use nyx_scanner::dynamic::harness;
let spec = make_rust_sqli_spec();
c.bench_function("rust_harness_build_cold", |b| {
b.iter(|| {
let workdir = std::env::temp_dir()
.join("nyx-harness")
.join(&spec.spec_hash);
let _ = std::fs::remove_dir_all(&workdir);
harness::build(&spec).expect("harness build")
});
});
}
#[cfg(feature = "dynamic")]
fn make_js_sqli_spec() -> HarnessSpec {
HarnessSpec {
finding_id: "bench_js_0001".into(),
entry_file: "tests/dynamic_fixtures/js/sqli_positive.js".into(),
entry_name: "login".into(),
entry_kind: EntryKind::Function,
lang: Lang::JavaScript,
toolchain_id: "node-20".into(),
payload_slot: PayloadSlot::Param(0),
expected_cap: Cap::SQL_QUERY,
constraint_hints: vec![],
sink_file: "tests/dynamic_fixtures/js/sqli_positive.js".into(),
sink_line: 8,
spec_hash: "benchjssqli000001".into(),
derivation: SpecDerivationStrategy::FromFlowSteps,
stubs_required: vec![],
framework: None,
java_toolchain: JavaToolchain::default(),
}
}
#[cfg(feature = "dynamic")]
fn make_go_sqli_spec() -> HarnessSpec {
HarnessSpec {
finding_id: "bench_go_0001".into(),
entry_file: "tests/dynamic_fixtures/go/sqli_positive.go".into(),
entry_name: "Login".into(),
entry_kind: EntryKind::Function,
lang: Lang::Go,
toolchain_id: "go-1.21".into(),
payload_slot: PayloadSlot::Param(0),
expected_cap: Cap::SQL_QUERY,
constraint_hints: vec![],
sink_file: "tests/dynamic_fixtures/go/sqli_positive.go".into(),
sink_line: 12,
spec_hash: "benchgosqli000001".into(),
derivation: SpecDerivationStrategy::FromFlowSteps,
stubs_required: vec![],
framework: None,
java_toolchain: JavaToolchain::default(),
}
}
#[cfg(feature = "dynamic")]
fn make_java_sqli_spec() -> HarnessSpec {
HarnessSpec {
finding_id: "bench_java_0001".into(),
entry_file: "tests/dynamic_fixtures/java/sqli_positive.java".into(),
entry_name: "login".into(),
entry_kind: EntryKind::Function,
lang: Lang::Java,
toolchain_id: "java-21".into(),
payload_slot: PayloadSlot::Param(0),
expected_cap: Cap::SQL_QUERY,
constraint_hints: vec![],
sink_file: "tests/dynamic_fixtures/java/sqli_positive.java".into(),
sink_line: 9,
spec_hash: "benchjavasqli00001".into(),
derivation: SpecDerivationStrategy::FromFlowSteps,
stubs_required: vec![],
framework: None,
java_toolchain: JavaToolchain::default(),
}
}
#[cfg(feature = "dynamic")]
fn make_php_sqli_spec() -> HarnessSpec {
HarnessSpec {
finding_id: "bench_php_0001".into(),
entry_file: "tests/dynamic_fixtures/php/sqli_positive.php".into(),
entry_name: "login".into(),
entry_kind: EntryKind::Function,
lang: Lang::Php,
toolchain_id: "php-8".into(),
payload_slot: PayloadSlot::Param(0),
expected_cap: Cap::SQL_QUERY,
constraint_hints: vec![],
sink_file: "tests/dynamic_fixtures/php/sqli_positive.php".into(),
sink_line: 9,
spec_hash: "benchphpsqli000001".into(),
derivation: SpecDerivationStrategy::FromFlowSteps,
stubs_required: vec![],
framework: None,
java_toolchain: JavaToolchain::default(),
}
}
/// JS harness build (source gen + disk write).
#[cfg(feature = "dynamic")]
fn bench_js_harness_build_cold(c: &mut Criterion) {
use nyx_scanner::dynamic::harness;
let spec = make_js_sqli_spec();
c.bench_function("js_harness_build_cold", |b| {
b.iter(|| {
let workdir = std::env::temp_dir()
.join("nyx-harness")
.join(&spec.spec_hash);
let _ = std::fs::remove_dir_all(&workdir);
harness::build(&spec).expect("JS harness build")
});
});
}
/// Go harness build (source gen + disk write, no compilation).
#[cfg(feature = "dynamic")]
fn bench_go_harness_build_cold(c: &mut Criterion) {
use nyx_scanner::dynamic::harness;
let spec = make_go_sqli_spec();
c.bench_function("go_harness_build_cold", |b| {
b.iter(|| {
let workdir = std::env::temp_dir()
.join("nyx-harness")
.join(&spec.spec_hash);
let _ = std::fs::remove_dir_all(&workdir);
harness::build(&spec).expect("Go harness build")
});
});
}
/// Java harness build (source gen + disk write, no compilation).
#[cfg(feature = "dynamic")]
fn bench_java_harness_build_cold(c: &mut Criterion) {
use nyx_scanner::dynamic::harness;
let spec = make_java_sqli_spec();
c.bench_function("java_harness_build_cold", |b| {
b.iter(|| {
let workdir = std::env::temp_dir()
.join("nyx-harness")
.join(&spec.spec_hash);
let _ = std::fs::remove_dir_all(&workdir);
harness::build(&spec).expect("Java harness build")
});
});
}
/// PHP harness build (source gen + disk write).
#[cfg(feature = "dynamic")]
fn bench_php_harness_build_cold(c: &mut Criterion) {
use nyx_scanner::dynamic::harness;
let spec = make_php_sqli_spec();
c.bench_function("php_harness_build_cold", |b| {
b.iter(|| {
let workdir = std::env::temp_dir()
.join("nyx-harness")
.join(&spec.spec_hash);
let _ = std::fs::remove_dir_all(&workdir);
harness::build(&spec).expect("PHP harness build")
});
});
}
#[cfg(feature = "dynamic")]
fn mk_chain_member(hash: u64, idx: usize) -> nyx_scanner::chain::FindingRef {
use nyx_scanner::surface::SourceLocation;
nyx_scanner::chain::FindingRef {
finding_id: format!("bench-chain-member-{idx}"),
stable_hash: hash,
location: SourceLocation::new("bench/synthetic.py", (idx as u32) + 1, 1),
rule_id: "taint-unsanitised-flow".into(),
cap_bits: 0,
}
}
#[cfg(feature = "dynamic")]
fn mk_synthetic_chain(hash: u64, members: usize) -> nyx_scanner::chain::ChainFinding {
use nyx_scanner::chain::{ChainFinding, ChainSeverity, ChainSink, ImpactCategory};
ChainFinding {
stable_hash: hash,
members: (0..members)
.map(|i| mk_chain_member(hash.wrapping_add(i as u64 + 1), i))
.collect(),
sink: ChainSink {
file: "bench/synthetic.py".into(),
line: 99,
col: 1,
function_name: "sink".into(),
cap_bits: 0,
},
implied_impact: ImpactCategory::Rce,
severity: ChainSeverity::Critical,
score: 100.0,
dynamic_verdict: None,
reverify_reason: None,
}
}
#[cfg(feature = "dynamic")]
struct BenchConfirmedReverifier;
#[cfg(feature = "dynamic")]
impl nyx_scanner::chain::CompositeReverifier for BenchConfirmedReverifier {
fn reverify(
&self,
_chain: &nyx_scanner::chain::ChainFinding,
_member_diags: &[nyx_scanner::commands::scan::Diag],
_surface: &nyx_scanner::surface::SurfaceMap,
opts: &nyx_scanner::dynamic::verify::VerifyOptions,
) -> nyx_scanner::evidence::VerifyResult {
// Mirror `DefaultCompositeReverifier::reverify`'s replay-stable
// stamping shape so the apply-verdict allocation cost matches
// the live path when the opt-in is on. The stub does not
// re-run any work (it has none to re-run) but the resulting
// `VerifyResult` populates `replay_stable=Some(true)` so
// downstream sites that branch on the field exercise the same
// path they would for a real Confirmed-with-stable run.
let replay_stable = if opts.replay_stable_check {
Some(true)
} else {
None
};
nyx_scanner::evidence::VerifyResult {
finding_id: "bench".into(),
status: nyx_scanner::evidence::VerifyStatus::Confirmed,
triggered_payload: None,
reason: None,
inconclusive_reason: None,
detail: None,
attempts: vec![],
toolchain_match: None,
differential: None,
replay_stable,
wrong: None,
hardening_outcome: None,
}
}
}
/// Phase 26 dispatch-cost anchor: synthetic 3-member chain with no
/// matching member diags. The reverifier walks chain_step_specs (3
/// HashMap misses → 3 NoFlowSteps errors), the build loop sees zero
/// derived specs and exits early, the run loop sees zero built steps
/// and exits early. The composed VerifyResult is allocated and applied
/// via `apply_dynamic_verdict` (Inconclusive → severity downgrade).
///
/// This is the no-toolchain-dep dispatch overhead — a regression here
/// signals a hot-path allocation introduced into the reverify pipeline.
#[cfg(feature = "dynamic")]
fn bench_composite_chain_reverify_dispatch(c: &mut Criterion) {
use nyx_scanner::chain::reverify;
use nyx_scanner::dynamic::verify::VerifyOptions;
use nyx_scanner::surface::SurfaceMap;
let surface = SurfaceMap::new();
let opts = VerifyOptions::default();
c.bench_function("composite_chain_reverify_dispatch", |b| {
b.iter(|| {
let mut chains = [mk_synthetic_chain(0xC1A1, 3)];
let _ = reverify::reverify_top_chains(&mut chains, &[], &surface, &opts, 1);
});
});
}
/// Phase 26 stub-reverifier happy-path anchor: synthetic 3-member
/// chain driven through `reverify_top_chains_with` + a stubbed
/// reverifier returning `Confirmed`. Measures the apply-verdict path
/// when the verdict does NOT trigger a severity downgrade, so the
/// `ChainReverifyResult` allocation + `chain.apply_dynamic_verdict`
/// transition cost is exercised independent of the verdict-side
/// allocation in the dispatch bench.
#[cfg(feature = "dynamic")]
fn bench_composite_chain_reverify_stub_confirmed(c: &mut Criterion) {
use nyx_scanner::chain::reverify;
use nyx_scanner::dynamic::verify::VerifyOptions;
use nyx_scanner::surface::SurfaceMap;
let surface = SurfaceMap::new();
let opts = VerifyOptions::default();
let reverifier = BenchConfirmedReverifier;
c.bench_function("composite_chain_reverify_stub_confirmed", |b| {
b.iter(|| {
let mut chains = [mk_synthetic_chain(0xC2A2, 3)];
let _ = reverify::reverify_top_chains_with(
&mut chains,
&[],
&surface,
&opts,
1,
&reverifier,
);
});
});
}
/// Phase 26 top-N slice anchor: 5-chain slice with `top_n=3`. Asserts
/// (by way of regression) that the reverify pass never walks past the
/// top-N prefix. The fan-in is the per-chain dispatch cost times three;
/// a regression that drops the `bound = top_n.min(chains.len())` cap
/// would show up as a ~5/3 increase in this bench.
#[cfg(feature = "dynamic")]
fn bench_composite_chain_reverify_top_n_slice(c: &mut Criterion) {
use nyx_scanner::chain::reverify;
use nyx_scanner::dynamic::verify::VerifyOptions;
use nyx_scanner::surface::SurfaceMap;
let surface = SurfaceMap::new();
let opts = VerifyOptions::default();
let reverifier = BenchConfirmedReverifier;
c.bench_function("composite_chain_reverify_top_n_slice", |b| {
b.iter(|| {
let mut chains: [nyx_scanner::chain::ChainFinding; 5] = [
mk_synthetic_chain(0xC301, 3),
mk_synthetic_chain(0xC302, 3),
mk_synthetic_chain(0xC303, 3),
mk_synthetic_chain(0xC304, 3),
mk_synthetic_chain(0xC305, 3),
];
let _ = reverify::reverify_top_chains_with(
&mut chains,
&[],
&surface,
&opts,
3,
&reverifier,
);
});
});
}
/// Phase 26 replay-stable anchor: same 3-member synthetic chain as
/// `stub_confirmed`, driven through `reverify_top_chains_with` with
/// `VerifyOptions::replay_stable_check=true`. The `BenchConfirmedReverifier`
/// stub honours the opt-in by stamping `replay_stable=Some(true)` on
/// the returned `VerifyResult`, exercising the apply-verdict path with
/// the telemetry stability field populated.
///
/// Purpose: anchor the cost of the replay-stable apply path so a
/// regression that leaks a real `run_chain_steps` invocation into the
/// stubbed verifier layer (or that allocates extra state behind the
/// `replay_stable_check` toggle in `chain::reverify::apply_one`) shows
/// up immediately against the `stub_confirmed` baseline.
#[cfg(feature = "dynamic")]
fn bench_composite_chain_reverify_replay_stable(c: &mut Criterion) {
use nyx_scanner::chain::reverify;
use nyx_scanner::dynamic::verify::VerifyOptions;
use nyx_scanner::surface::SurfaceMap;
let surface = SurfaceMap::new();
let opts = VerifyOptions {
replay_stable_check: true,
..VerifyOptions::default()
};
let reverifier = BenchConfirmedReverifier;
c.bench_function("composite_chain_reverify_replay_stable", |b| {
b.iter(|| {
let mut chains = [mk_synthetic_chain(0xC4A3, 3)];
let _ = reverify::reverify_top_chains_with(
&mut chains,
&[],
&surface,
&opts,
1,
&reverifier,
);
});
});
}
#[cfg(feature = "dynamic")]
#[allow(dead_code)]
fn bench_noop(_c: &mut Criterion) {}
// When dynamic feature is off, provide a stub so the binary still links.
#[cfg(not(feature = "dynamic"))]
fn bench_noop(c: &mut Criterion) {
c.bench_function("dynamic_disabled_noop", |b| b.iter(|| ()));
}
#[cfg(feature = "dynamic")]
criterion_group!(
dynamic,
bench_harness_build_cold,
bench_harness_build_warm,
bench_sandbox_run_payload,
bench_docker_image_build,
bench_docker_exec_warm,
bench_docker_payload_cost,
bench_rust_harness_build_cold,
bench_js_harness_build_cold,
bench_go_harness_build_cold,
bench_java_harness_build_cold,
bench_php_harness_build_cold,
bench_composite_chain_reverify_dispatch,
bench_composite_chain_reverify_stub_confirmed,
bench_composite_chain_reverify_top_n_slice,
bench_composite_chain_reverify_replay_stable,
);
#[cfg(not(feature = "dynamic"))]
criterion_group!(dynamic, bench_noop);
criterion_main!(dynamic);

@ -0,0 +1,26 @@
{
"schema": 1,
"note": "ASPIRATIONAL placeholder — values were hand-typed, not captured from a real bench run. Regenerate with: benches/regen_baseline.sh (requires --features dynamic and python3 on PATH). Commit the updated file to establish a real regression reference for M3+.",
"benchmarks": {
"harness_build_cold": {
"mean_ns": 800000,
"stddev_ns": 120000,
"description": "Fresh workdir; spec → BuiltHarness including source gen + disk write."
},
"harness_build_warm": {
"mean_ns": 180000,
"stddev_ns": 30000,
"description": "Workdir already staged; file write skipped by dst.exists() guard."
},
"sandbox_run_payload": {
"mean_ns": 120000000,
"stddev_ns": 15000000,
"description": "Single process-backend run with sqli payload; includes python3 startup + settrace."
}
},
"regression_thresholds": {
"harness_build_cold": 2.0,
"harness_build_warm": 2.0,
"sandbox_run_payload": 1.5
}
}
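The `regression_thresholds` values read most naturally as multipliers on each benchmark's `mean_ns`. The following is a minimal sketch of such a gate under that assumption; the function name `is_regression` and the comparison rule are illustrative and are not taken from the repository.

```rust
/// Hypothetical regression gate (assumption: a threshold is a multiplier
/// applied to the baseline `mean_ns`).
/// Returns true when the measured mean exceeds `baseline_mean_ns * threshold`.
pub fn is_regression(baseline_mean_ns: u64, threshold: f64, measured_mean_ns: u64) -> bool {
    (measured_mean_ns as f64) > (baseline_mean_ns as f64) * threshold
}
```

With the placeholder values above, `harness_build_cold` (800000 ns, threshold 2.0) would tolerate a measured mean of up to 1600000 ns, and `sandbox_run_payload` (120000000 ns, threshold 1.5) up to 180000000 ns.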

@ -0,0 +1,61 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Clean open/close — no findings expected */
void clean_usage(void) {
FILE *f = fopen("data.txt", "r");
char buf[256];
fread(buf, 1, 256, f);
fclose(f);
}
/* Resource leak — fopen without fclose */
void leaky_function(void) {
FILE *f = fopen("log.txt", "w");
fprintf(f, "hello");
}
/* Use after close */
void use_after_close(void) {
FILE *f = fopen("tmp.txt", "r");
fclose(f);
char buf[64];
fread(buf, 1, 64, f);
}
/* Branch leak — closed on one path only */
void branch_leak(int cond) {
FILE *f = fopen("x.txt", "r");
if (cond) {
fclose(f);
}
}
/* Multiple handles — both properly closed */
void multi_handle(void) {
FILE *a = fopen("a.txt", "r");
FILE *b = fopen("b.txt", "w");
fclose(a);
fclose(b);
}
/* Double close */
void double_close(void) {
FILE *f = fopen("d.txt", "r");
fclose(f);
fclose(f);
}
/* Malloc/free — clean */
void malloc_clean(void) {
char *p = malloc(1024);
memset(p, 0, 1024);
free(p);
}
/* Malloc leak — never freed */
void malloc_leak(void) {
char *p = malloc(512);
memset(p, 0, 512);
}

File diff suppressed because it is too large

benches/regen_baseline.sh Executable file

@ -0,0 +1,84 @@
#!/usr/bin/env bash
# Regenerate benches/dynamic_bench_baseline.json from a real cargo bench run.
#
# Usage:
# bash benches/regen_baseline.sh
#
# Requirements:
# - python3 on PATH
# - cargo (nightly or stable with edition 2024)
# - Criterion's JSON output (criterion feature already in dev-deps)
#
# The script runs the dynamic bench group, parses Criterion's estimates JSON,
# and overwrites dynamic_bench_baseline.json with real numbers.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
BASELINE_FILE="${SCRIPT_DIR}/dynamic_bench_baseline.json"
echo "Running cargo bench --features dynamic -- dynamic ..."
cargo bench --manifest-path "${REPO_ROOT}/Cargo.toml" \
--features dynamic \
-- dynamic \
2>&1 | tee /tmp/nyx_bench_raw.txt
# Criterion writes estimates to target/criterion/<bench>/<group>/estimates.json.
# Extract mean_ns for each tracked benchmark.
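extract_ns() {
local path="$1"
if [[ -f "${path}" ]]; then
# Pass the path as an argument rather than splicing it into the Python
# source, so quotes or backslashes in the path cannot break the snippet.
python3 -c '
import json, sys
d = json.load(open(sys.argv[1]))
mean = d["mean"]["point_estimate"]
stddev = d["std_dev"]["point_estimate"] if "std_dev" in d else 0
print(int(mean), int(stddev))
' "${path}"
else
# Make a missing estimates file visible instead of silently recording zeros.
echo "warning: ${path} not found; recording 0 for this benchmark" >&2
echo "0 0"
fi
}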
TARGET="${REPO_ROOT}/target/criterion"
# Criterion stores the most recent run of each benchmark under <benchmark>/new/.
read -r COLD_MEAN COLD_STDDEV < <(extract_ns "${TARGET}/harness_build_cold/new/estimates.json")
read -r WARM_MEAN WARM_STDDEV < <(extract_ns "${TARGET}/harness_build_warm/new/estimates.json")
read -r RUN_MEAN RUN_STDDEV < <(extract_ns "${TARGET}/sandbox_run_payload/new/estimates.json")
MACHINE="$(uname -m) / $(uname -s)"
NYX_VER="$(cargo metadata --manifest-path "${REPO_ROOT}/Cargo.toml" --no-deps --format-version 1 \
| python3 -c "import json,sys; d=json.load(sys.stdin); print(next(p['version'] for p in d['packages'] if p['name']=='nyx-scanner'))")"
DATE="$(date +%Y-%m-%d)"
cat > "${BASELINE_FILE}" <<EOF
{
"schema": 1,
"note": "Baseline captured on ${MACHINE}, nyx v${NYX_VER}, ${DATE}. Regenerate with: benches/regen_baseline.sh",
"benchmarks": {
"harness_build_cold": {
"mean_ns": ${COLD_MEAN},
"stddev_ns": ${COLD_STDDEV},
"description": "Fresh workdir; spec → BuiltHarness including source gen + disk write."
},
"harness_build_warm": {
"mean_ns": ${WARM_MEAN},
"stddev_ns": ${WARM_STDDEV},
"description": "Workdir already staged; file write skipped by dst.exists() guard."
},
"sandbox_run_payload": {
"mean_ns": ${RUN_MEAN},
"stddev_ns": ${RUN_STDDEV},
"description": "Single process-backend run with sqli payload; includes python3 startup + settrace."
}
},
"regression_thresholds": {
"harness_build_cold": 2.0,
"harness_build_warm": 2.0,
"sandbox_run_payload": 1.5
}
}
EOF
echo "Updated ${BASELINE_FILE}"

@ -73,6 +73,47 @@ fn bench_full_scan(c: &mut Criterion) {
});
}
fn bench_full_scan_with_state(c: &mut Criterion) {
let fixtures = Path::new(FIXTURES).canonicalize().expect("fixtures dir");
let mut cfg = Config::default();
cfg.scanner.mode = AnalysisMode::Full;
cfg.scanner.enable_state_analysis = true;
cfg.performance.worker_threads = Some(1);
cfg.performance.channel_multiplier = 1;
cfg.performance.batch_size = 64;
c.bench_function("full_scan_with_state", |b| {
b.iter(|| {
let (rx, handle) = nyx_scanner::walk::spawn_file_walker(&fixtures, &cfg);
if let Err(err) = handle.join() {
panic!("walker panicked: {err:#?}");
}
let paths: Vec<_> = rx.into_iter().flatten().collect();
// Pass 1: extract summaries
let mut all_sums = Vec::new();
for path in &paths {
if let Ok(sums) = nyx_scanner::ast::extract_summaries_from_file(path, &cfg) {
all_sums.extend(sums);
}
}
let root_str = fixtures.to_string_lossy();
let global = nyx_scanner::summary::merge_summaries(all_sums, Some(&root_str));
// Pass 2: full analysis with state
let mut diags = Vec::new();
for path in &paths {
if let Ok(mut d) =
nyx_scanner::ast::run_rules_on_file(path, &cfg, Some(&global), Some(&fixtures))
{
diags.append(&mut d);
}
}
diags
});
});
}
fn bench_single_file_parse_and_cfg(c: &mut Criterion) {
let fixture = Path::new(FIXTURES).join("sample.rs");
let fixture = fixture.canonicalize().expect("sample.rs fixture");
@ -86,13 +127,309 @@ fn bench_single_file_parse_and_cfg(c: &mut Criterion) {
});
}
fn bench_state_analysis_only(c: &mut Criterion) {
let fixture = Path::new(FIXTURES)
.join("state_bench.c")
.canonicalize()
.expect("state_bench.c fixture");
let mut cfg = Config::default();
cfg.scanner.mode = AnalysisMode::Full;
cfg.scanner.enable_state_analysis = true;
// Parse and build CFG once (outside benchmark loop)
let (file_cfg, lang) = nyx_scanner::ast::build_cfg_for_file(&fixture, &cfg)
.expect("build cfg")
.expect("supported language");
let source_bytes = std::fs::read(&fixture).expect("read fixture");
let top = file_cfg.toplevel();
c.bench_function("state_analysis_only", |b| {
b.iter(|| {
nyx_scanner::state::run_state_analysis(
&top.graph,
top.entry,
lang,
&source_bytes,
&file_cfg.summaries,
None,
true,
&[],
&[],
&std::collections::HashSet::new(),
None,
None,
)
});
});
}
fn bench_classify(c: &mut Criterion) {
c.bench_function("classify_hit", |b| {
-b.iter(|| nyx_scanner::labels::classify("rust", "std::env::var"));
+b.iter(|| nyx_scanner::labels::classify("rust", "std::env::var", None));
});
c.bench_function("classify_miss", |b| {
-b.iter(|| nyx_scanner::labels::classify("rust", "some_random_function"));
+b.iter(|| nyx_scanner::labels::classify("rust", "some_random_function", None));
});
}
/// Per-file fused analysis throughput on a realistic ~1.5k-line Go module
/// (gin context.go, ~147 fns). Guards the
/// `ParsedFile::body_const_facts_cache` optimization that collapses the
/// 2-3× per-body re-lowering that previously dominated `analyse_file_fused`
/// (~14% of wall-clock on the gin-scan profile). Regressions here mean
/// per-body work is being recomputed across passes again.
fn bench_analyse_file_fused_large_go(c: &mut Criterion) {
let fixture = Path::new("benches/perf_fixtures/large_go_module.go")
.canonicalize()
.expect("perf fixture");
let bytes = std::fs::read(&fixture).expect("read fixture");
let mut cfg = Config::default();
cfg.scanner.mode = AnalysisMode::Full;
cfg.scanner.enable_state_analysis = true;
cfg.performance.worker_threads = Some(1);
// One-shot diagnostic: count `build_body_const_facts` calls per fused
// analysis so a regression that removes the per-file cache surfaces here
// (expected ~148 calls on this fixture; pre-cache was ~444).
nyx_scanner::cfg_analysis::BUILD_BODY_CONST_FACTS_CALLS
.store(0, std::sync::atomic::Ordering::Relaxed);
let _ = nyx_scanner::ast::analyse_file_fused(&bytes, &fixture, &cfg, None, None)
.expect("warmup analyse");
let calls = nyx_scanner::cfg_analysis::BUILD_BODY_CONST_FACTS_CALLS
.load(std::sync::atomic::Ordering::Relaxed);
eprintln!("[diag] build_body_const_facts calls per analyse_file_fused: {calls}");
c.bench_function("analyse_file_fused_large_go", |b| {
b.iter(|| {
nyx_scanner::ast::analyse_file_fused(&bytes, &fixture, &cfg, None, None)
.expect("analyse_file_fused")
});
});
}
/// Per-file `extract_authorization_model` throughput on the realistic
/// ~1.5k-line Go fixture (gin context.go). Guards the
/// `extract_authorization_model` orchestrator hoist that pulled the
/// shared `collect_top_level_units` AST walk out of every supporting
/// extractor's `extract()` (one walk per file instead of one per
/// matching extractor). On Go files both `EchoExtractor` and
/// `GinExtractor` match by default — pre-hoist this bench measured the
/// AST being walked twice; regressions here mean the hoist has been
/// broken or a new Go extractor was added that re-walks the tree.
fn bench_extract_authorization_model_go(c: &mut Criterion) {
use tree_sitter::Parser;
let fixture = Path::new("benches/perf_fixtures/large_go_module.go")
.canonicalize()
.expect("perf fixture");
let bytes = std::fs::read(&fixture).expect("read fixture");
let mut parser = Parser::new();
let go_lang: tree_sitter::Language = tree_sitter_go::LANGUAGE.into();
parser.set_language(&go_lang).expect("set go grammar");
let tree = parser.parse(&bytes, None).expect("parse fixture");
let cfg = Config::default();
let rules = nyx_scanner::auth_analysis::config::build_auth_rules(&cfg, "go");
c.bench_function("extract_authorization_model_go", |b| {
b.iter(|| {
nyx_scanner::auth_analysis::extract::extract_authorization_model(
"go",
cfg.framework_ctx.as_ref(),
&tree,
&bytes,
&fixture,
&rules,
None,
)
});
});
}
/// Per-file shared-vs-double `extract_authorization_model` cost on a
/// realistic Go fixture (gin context.go). Pre-fix
/// `analyse_file_fused` called `extract_authorization_model` twice per
/// file (once for diagnostics via `run_auth_analysis`, once for
/// per-file summary keying via `extract_auth_summaries_by_key`). This
/// bench records the **shared-model path** only (extract once, derive
/// both summaries + diagnostics) so a regression that re-introduces
/// the double-call surfaces as a ≥1.7× slowdown here.
fn bench_extract_authorization_model_shared_go(c: &mut Criterion) {
use tree_sitter::Parser;
let fixture = Path::new("benches/perf_fixtures/large_go_module.go")
.canonicalize()
.expect("perf fixture");
let bytes = std::fs::read(&fixture).expect("read fixture");
let mut parser = Parser::new();
let go_lang: tree_sitter::Language = tree_sitter_go::LANGUAGE.into();
parser.set_language(&go_lang).expect("set go grammar");
let tree = parser.parse(&bytes, None).expect("parse fixture");
let cfg = Config::default();
let rules = nyx_scanner::auth_analysis::config::build_auth_rules(&cfg, "go");
c.bench_function("extract_authorization_model_shared_go", |b| {
b.iter(|| {
// Mirror `analyse_file_fused`: extract once, derive both
// per-file summaries (cheap iter over units) AND run the
// full diagnostic pipeline against the same model.
let model = nyx_scanner::auth_analysis::extract::extract_authorization_model(
"go",
cfg.framework_ctx.as_ref(),
&tree,
&bytes,
&fixture,
&rules,
None,
);
let summaries = nyx_scanner::auth_analysis::extract_auth_summaries_from_model(
&model, "go", &fixture, None,
);
let diags = nyx_scanner::auth_analysis::run_auth_analysis_with_model(
model, &tree, "go", &fixture, &rules, None, None, None,
);
(summaries, diags)
});
});
}
/// Per-file `collect_top_level_units` cost on a realistic Go fixture
/// (gin context.go, ~147 functions). Targets the inner per-function
/// AST-walk path: `collect_top_level_units` →
/// `build_function_unit_with_meta` → `collect_unit_state` (recursive
/// per-AST-node walk that emits per-node value-refs).
///
/// Pre-fix (2026-05-04 perfhunt session-0009) `collect_unit_state`
/// called `extract_value_refs(node, bytes)` at every AST node, and that
/// helper recursively walked the node's full subtree. Combined with
/// the recursion below, every descendant got walked once for each of
/// its ancestors — total work O(N²) per function body. The fix
/// replaced that call with an O(1)-per-node `append_shallow_value_ref`
/// helper. A regression that re-introduces the deep walk surfaces
/// here as a ≥2× slowdown.
fn bench_collect_top_level_units_go(c: &mut Criterion) {
use tree_sitter::Parser;
let fixture = Path::new("benches/perf_fixtures/large_go_module.go")
.canonicalize()
.expect("perf fixture");
let bytes = std::fs::read(&fixture).expect("read fixture");
let mut parser = Parser::new();
let go_lang: tree_sitter::Language = tree_sitter_go::LANGUAGE.into();
parser.set_language(&go_lang).expect("set go grammar");
let tree = parser.parse(&bytes, None).expect("parse fixture");
let cfg = Config::default();
let rules = nyx_scanner::auth_analysis::config::build_auth_rules(&cfg, "go");
c.bench_function("collect_top_level_units_go", |b| {
b.iter(|| {
let mut model = nyx_scanner::auth_analysis::model::AuthorizationModel::default();
nyx_scanner::auth_analysis::extract::common::collect_top_level_units(
tree.root_node(),
&bytes,
&rules,
&mut model,
);
model
});
});
}
/// SCCP throughput on every SSA body lowered from the gin context.go
/// fixture. Targets `nyx_scanner::ssa::const_prop::const_propagate`
/// directly, isolating it from the surrounding `optimize_ssa` pass and
/// the full-fused per-file analysis.
///
/// Pre-fix (2026-05-04 perfhunt) `const_propagate` stored its lattice in
/// `HashMap<SsaValue, ConstLattice>` and walked
/// `inst_uses(inst).contains(&val)` for every block re-evaluation in the
/// SSA worklist — both shapes paid `SipHash` cost on every operand, and
/// the `inst_uses` factory allocated a fresh `Vec<SsaValue>` on every
/// call. Switching the lattice + executable-edge maps to dense
/// `Vec`-indexed storage and the use-check to a zero-allocation
/// predicate cut `const_propagate` self-time roughly in half on the
/// large-Go fixture. A regression that re-introduces the hash-keyed
/// inner loop will surface here as a ≥1.4× slowdown.
fn bench_const_propagate_large_go(c: &mut Criterion) {
use nyx_scanner::ssa;
let fixture = Path::new("benches/perf_fixtures/large_go_module.go")
.canonicalize()
.expect("perf fixture");
let cfg_obj = Config::default();
let (file_cfg, _lang) = nyx_scanner::ast::build_cfg_for_file(&fixture, &cfg_obj)
.expect("build cfg")
.expect("supported language");
// Lower every body once outside the bench loop so we measure only
// SCCP cost. The collected `SsaBody` values are the input to the
// inner loop.
let mut bodies: Vec<ssa::ir::SsaBody> = Vec::new();
for body in &file_cfg.bodies {
// Use `body.meta.name` as the scope filter so the SSA lowering
// pulls only this function's nodes; `scope_all=true` is reserved
// for the synthetic top-level body where `name` is None.
let scope = body.meta.name.as_deref();
let scope_all = scope.is_none();
match ssa::lower_to_ssa(&body.graph, body.entry, scope, scope_all) {
Ok(ssa_body) => bodies.push(ssa_body),
Err(_) => continue,
}
}
eprintln!(
"[diag] const_propagate bench: {} bodies lowered",
bodies.len()
);
c.bench_function("const_propagate_large_go", |b| {
b.iter(|| {
let mut total_values = 0usize;
for body in &bodies {
let result = ssa::const_prop::const_propagate(body);
total_values += result.values.len();
}
total_values
});
});
}
/// `GlobalSummaries::lookup_same_lang` cost on a populated index. The
/// inner loop hashes `(Lang, String)` once per call, then `FuncKey` once
/// per candidate via `by_key.get(k)`. Pre-fix the four secondary
/// indices used `std::collections::HashMap` (SipHash). Post-fix
/// (2026-05-04 perfhunt session-0015) they use `rustc_hash::FxHashMap`,
/// trading DoS hardening (irrelevant for in-process program-keyed
/// indices) for ~5x faster hashing on the 30+ byte 3-string `FuncKey`
/// hash workload. A regression that re-introduces SipHash would
/// surface here as a ≥3x slowdown.
fn bench_global_summaries_lookup_same_lang_go(c: &mut Criterion) {
let fixture = Path::new("benches/perf_fixtures/large_go_module.go")
.canonicalize()
.expect("perf fixture");
let cfg = Config::default();
let summaries =
nyx_scanner::ast::extract_summaries_from_file(&fixture, &cfg).expect("extract summaries");
let names: Vec<String> = summaries.iter().map(|s| s.name.clone()).collect();
let global = nyx_scanner::summary::merge_summaries(summaries, None);
let lang = nyx_scanner::symbol::Lang::Go;
eprintln!("[diag] lookup_same_lang bench: {} names", names.len());
c.bench_function("global_summaries_lookup_same_lang_go", |b| {
b.iter(|| {
let mut total = 0usize;
for name in &names {
total += global.lookup_same_lang(lang, name).len();
}
total
});
});
}
@ -100,7 +437,15 @@ criterion_group!(
benches,
bench_ast_only_scan,
bench_full_scan,
bench_full_scan_with_state,
bench_single_file_parse_and_cfg,
bench_state_analysis_only,
bench_classify,
bench_analyse_file_fused_large_go,
bench_extract_authorization_model_go,
bench_extract_authorization_model_shared_go,
bench_collect_top_level_units_go,
bench_const_propagate_large_go,
bench_global_summaries_lookup_same_lang_go,
);
criterion_main!(benches);
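The doc comment on `bench_collect_top_level_units_go` describes a walk that re-visits each node's full subtree at every level, versus one that does constant work per node. The sketch below illustrates the difference in visit counts on a toy tree. The `Node` type and all function names are hypothetical stand-ins, not the tree-sitter or nyx types, and the quadratic figure applies to the worst case (a degenerate chain).

```rust
/// Toy tree standing in for an AST (hypothetical; not the tree-sitter type).
pub struct Node {
    pub children: Vec<Node>,
}

/// Counts every node in `node`'s subtree, including `node` itself.
fn subtree_size(node: &Node) -> usize {
    1 + node.children.iter().map(subtree_size).sum::<usize>()
}

/// Pre-fix shape: at each node, walk the full subtree, then recurse.
/// Every node ends up visited once for itself and once per ancestor.
pub fn deep_walk_visits(node: &Node) -> usize {
    subtree_size(node) + node.children.iter().map(deep_walk_visits).sum::<usize>()
}

/// Post-fix shape: constant work per node, then recurse.
pub fn shallow_walk_visits(node: &Node) -> usize {
    1 + node.children.iter().map(shallow_walk_visits).sum::<usize>()
}

/// Builds a single-child chain of `depth` nodes (`depth >= 1`), the worst
/// case for the deep walk.
pub fn chain(depth: usize) -> Node {
    let mut node = Node { children: Vec::new() };
    for _ in 1..depth {
        node = Node { children: vec![node] };
    }
    node
}
```

On a 100-node chain the shallow walk performs 100 visits while the deep walk performs 100 + 99 + ... + 1 = 5050, which is the kind of gap the benchmark is meant to catch.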

book.toml Normal file

@ -0,0 +1,22 @@
[book]
title = "Nyx"
authors = ["Eli Peter"]
description = "Multi-language static analysis with cross-file taint tracking. Scan your repo, triage findings in your browser, commit triage state with your code. No cloud, no account."
language = "en"
src = "docs"
[output.html]
default-theme = "navy"
preferred-dark-theme = "navy"
git-repository-url = "https://github.com/elicpeter/nyx"
edit-url-template = "https://github.com/elicpeter/nyx/edit/master/{path}"
site-url = "/nyx/"
additional-css = ["docs/mermaid.css"]
additional-js = ["docs/mermaid-init.js"]
[output.html.fold]
enable = true
level = 1
[output.html.search]
enable = true

build.rs Normal file

@ -0,0 +1,436 @@
use std::collections::BTreeMap;
use std::path::Path;
use std::process::Command;
fn main() {
// Phase 17 (Track E.1): always emit the seccomp policy table to
// OUT_DIR. The consumer is gated by `#[cfg(target_os = "linux")]`, but
// the codegen runs on every host, so `cargo check` on macOS still emits
// the file (the include is simply never compiled on non-Linux targets).
emit_seccomp_policy();
// Phase 19 (Track E.3): emit the IMAGE_DIGESTS table from
// tools/image-builder/images.toml. The runtime side (src/dynamic/
// toolchain.rs) `include!`s the generated file unconditionally so
// every host build has the same pinned-digest catalogue.
emit_image_digests();
// Only relevant when the serve feature is active.
if std::env::var("CARGO_FEATURE_SERVE").is_err() {
return;
}
let dist_dir = Path::new("src/server/assets/dist");
let index_html = dist_dir.join("index.html");
// Re-run build.rs only when dist output is missing/changed
println!("cargo:rerun-if-changed=src/server/assets/dist/index.html");
if index_html.exists() {
// Dist already built, nothing to do
return;
}
// Dist missing, try to build frontend
let frontend_dir = Path::new("frontend");
if !frontend_dir.join("package.json").exists() {
emit_placeholder_and_warn(dist_dir);
return;
}
// Run npm install + build
println!("cargo:warning=Frontend dist not found, running npm install && npm run build...");
let npm_install = Command::new("npm")
.arg("install")
.current_dir(frontend_dir)
.status();
match npm_install {
Ok(s) if s.success() => {}
_ => {
emit_placeholder_and_warn(dist_dir);
return;
}
}
let npm_build = Command::new("npm")
.arg("run")
.arg("build")
.current_dir(frontend_dir)
.status();
match npm_build {
Ok(s) if s.success() => {
println!("cargo:warning=Frontend built successfully.");
}
_ => {
emit_placeholder_and_warn(dist_dir);
}
}
}
fn emit_placeholder_and_warn(dist_dir: &Path) {
// Create minimal placeholder files so compilation succeeds
std::fs::create_dir_all(dist_dir).ok();
std::fs::write(
dist_dir.join("index.html"),
"<!DOCTYPE html><html><body><h1>Frontend not built</h1><p>Run: cd frontend &amp;&amp; npm install &amp;&amp; npm run build</p></body></html>",
)
.ok();
std::fs::write(dist_dir.join("app.js"), "// frontend not built\n").ok();
std::fs::write(dist_dir.join("style.css"), "/* frontend not built */\n").ok();
println!(
"cargo:warning=Frontend could not be built (npm missing, install/build failed, or frontend/package.json absent); wrote placeholder frontend assets. Run 'cd frontend && npm install && npm run build' for the real UI."
);
}
// ── Phase 17 (Track E.1) — seccomp policy codegen ────────────────────────────
const SECCOMP_POLICY_PATH: &str = "src/dynamic/sandbox/seccomp/seccomp_policy.toml";
/// Cap-name → Cap bit value table. Mirrors the `bitflags!` block in
/// `src/labels/mod.rs`. Keep in sync when adding/removing `Cap`
/// constants.
const CAP_BIT_FOR_NAME: &[(&str, u32)] = &[
("ENV_VAR", 1 << 0),
("HTML_ESCAPE", 1 << 1),
("SHELL_ESCAPE", 1 << 2),
("URL_ENCODE", 1 << 3),
("JSON_PARSE", 1 << 4),
("FILE_IO", 1 << 5),
("FMT_STRING", 1 << 6),
("SQL_QUERY", 1 << 7),
("DESERIALIZE", 1 << 8),
("SSRF", 1 << 9),
("CODE_EXEC", 1 << 10),
("CRYPTO", 1 << 11),
("UNAUTHORIZED_ID", 1 << 12),
("DATA_EXFIL", 1 << 13),
("LDAP_INJECTION", 1 << 14),
("XPATH_INJECTION", 1 << 15),
("HEADER_INJECTION", 1 << 16),
("OPEN_REDIRECT", 1 << 17),
("SSTI", 1 << 18),
("XXE", 1 << 19),
("PROTOTYPE_POLLUTION", 1 << 20),
];
fn emit_seccomp_policy() {
println!("cargo:rerun-if-changed={}", SECCOMP_POLICY_PATH);
let out_dir = std::env::var("OUT_DIR").expect("OUT_DIR must be set by cargo");
let out_path = Path::new(&out_dir).join("seccomp_policy.rs");
// Read the policy file; on missing file (e.g. fresh checkout on a
// foreign target), emit empty tables so compilation still succeeds.
let toml_text = match std::fs::read_to_string(SECCOMP_POLICY_PATH) {
Ok(s) => s,
Err(_) => {
std::fs::write(
&out_path,
"pub static BASE: &[&str] = &[];\npub static CAP: &[(u32, &[&str])] = &[];\n",
)
.expect("write empty seccomp policy stub");
return;
}
};
let parsed = parse_seccomp_toml(&toml_text);
let mut out = String::new();
out.push_str("// generated by build.rs from seccomp_policy.toml — do not edit\n\n");
// Base allowlist.
out.push_str("pub static BASE: &[&str] = &[\n");
for name in &parsed.base {
out.push_str(&format!(" \"{}\",\n", escape(name)));
}
out.push_str("];\n\n");
// Per-cap allowlists.
out.push_str("pub static CAP: &[(u32, &[&str])] = &[\n");
for (cap_name, allow) in &parsed.caps {
let bit = CAP_BIT_FOR_NAME
.iter()
.find(|(n, _)| *n == cap_name.as_str())
.map(|(_, b)| *b)
.unwrap_or_else(|| {
panic!(
"seccomp_policy.toml references unknown Cap '{cap_name}' — \
add it to CAP_BIT_FOR_NAME in build.rs first"
)
});
out.push_str(&format!(" (0x{bit:08x}_u32, &[\n"));
for name in allow {
out.push_str(&format!(" \"{}\",\n", escape(name)));
}
out.push_str(" ]),\n");
}
out.push_str("];\n");
std::fs::write(&out_path, out).expect("write seccomp policy table");
}
#[derive(Default)]
struct SeccompPolicy {
base: Vec<String>,
caps: BTreeMap<String, Vec<String>>,
}
/// Tiny line-oriented TOML parser scoped to the shape used by
/// `seccomp_policy.toml`:
///
/// [base]
/// allow = ["read", "write", ...]
///
/// [cap.SQL_QUERY]
/// allow = [
/// "fdatasync",
/// ...
/// ]
///
/// Comments (`#`) and blank lines are skipped. Multi-line array bodies
/// are accumulated until the closing `]`.
fn parse_seccomp_toml(src: &str) -> SeccompPolicy {
let mut policy = SeccompPolicy::default();
let mut current_section: Option<String> = None;
let mut accumulating_array: Option<String> = None;
let mut array_buf = String::new();
for raw_line in src.lines() {
let line = strip_comment(raw_line).trim();
if line.is_empty() {
continue;
}
if let Some(_key) = accumulating_array.as_ref() {
array_buf.push_str(line);
array_buf.push('\n');
if line.contains(']') {
let key = accumulating_array.take().unwrap();
let values = parse_string_array(&array_buf);
store_allow(&mut policy, current_section.as_deref(), &key, values);
array_buf.clear();
}
continue;
}
if let Some(section) = line.strip_prefix('[').and_then(|s| s.strip_suffix(']')) {
current_section = Some(section.to_string());
continue;
}
if let Some((key, rest)) = line.split_once('=') {
let key = key.trim().to_string();
let rest = rest.trim();
if rest.starts_with('[') && rest.contains(']') {
let values = parse_string_array(rest);
store_allow(&mut policy, current_section.as_deref(), &key, values);
} else if rest.starts_with('[') {
accumulating_array = Some(key);
array_buf.push_str(rest);
array_buf.push('\n');
}
continue;
}
}
policy
}
fn strip_comment(line: &str) -> &str {
let mut in_string = false;
let bytes = line.as_bytes();
for (i, &b) in bytes.iter().enumerate() {
match b {
b'"' => in_string = !in_string,
b'#' if !in_string => return &line[..i],
_ => {}
}
}
line
}
fn parse_string_array(src: &str) -> Vec<String> {
// Find every "..." run between the first `[` and the last `]`.
let start = src.find('[').map(|i| i + 1).unwrap_or(0);
let end = src.rfind(']').unwrap_or(src.len());
let body = &src[start..end];
let mut out = Vec::new();
let mut chars = body.chars().peekable();
while let Some(c) = chars.next() {
if c == '"' {
let mut s = String::new();
for c2 in chars.by_ref() {
if c2 == '"' {
break;
}
s.push(c2);
}
out.push(s);
}
}
out
}
fn store_allow(policy: &mut SeccompPolicy, section: Option<&str>, key: &str, values: Vec<String>) {
if key != "allow" {
return;
}
match section {
Some("base") => policy.base = values,
Some(other) => {
if let Some(cap_name) = other.strip_prefix("cap.") {
policy.caps.insert(cap_name.to_string(), values);
}
}
None => {}
}
}
fn escape(s: &str) -> String {
s.replace('\\', "\\\\").replace('"', "\\\"")
}
// ── Phase 19 (Track E.3) — image digest codegen ──────────────────────────────
const IMAGE_CATALOGUE_PATH: &str = "tools/image-builder/images.toml";
/// Parse `tools/image-builder/images.toml` and emit two tables to
/// `$OUT_DIR/image_digests.rs`:
///
/// pub static IMAGE_DIGESTS: phf::Map<&'static str, &'static str> = …;
/// pub static IMAGE_BASES: phf::Map<&'static str, &'static str> = …;
///
/// `IMAGE_DIGESTS` keys are toolchain IDs (`python-3.11`, …) and values are
/// `<base>@sha256:…` strings ready to hand to `docker pull`. An empty digest
/// in `images.toml` is treated as "not yet pinned" and the entry is omitted
/// from `IMAGE_DIGESTS`; `IMAGE_BASES` always carries the unpinned reference
/// so `docker.rs` can fall back to a tag pull when no digest is recorded.
fn emit_image_digests() {
println!("cargo:rerun-if-changed={}", IMAGE_CATALOGUE_PATH);
let out_dir = std::env::var("OUT_DIR").expect("OUT_DIR must be set by cargo");
let out_path = Path::new(&out_dir).join("image_digests.rs");
let toml_text = match std::fs::read_to_string(IMAGE_CATALOGUE_PATH) {
Ok(s) => s,
Err(_) => {
// Missing catalogue (fresh checkout without the file) — emit
// empty maps so the runtime include still compiles.
std::fs::write(
&out_path,
"/// generated empty IMAGE_DIGESTS — images.toml missing\n\
pub static IMAGE_DIGESTS: phf::Map<&'static str, &'static str> = \
phf::phf_map! {};\n\
pub static IMAGE_BASES: phf::Map<&'static str, &'static str> = \
phf::phf_map! {};\n",
)
.expect("write empty image digests stub");
return;
}
};
let entries = parse_image_catalogue(&toml_text);
let mut out = String::new();
out.push_str("// generated by build.rs from tools/image-builder/images.toml — do not edit\n\n");
// IMAGE_DIGESTS: only entries with a non-empty digest survive.
out.push_str(
"pub static IMAGE_DIGESTS: phf::Map<&'static str, &'static str> = phf::phf_map! {\n",
);
for e in &entries {
if e.digest.is_empty() {
continue;
}
let pinned = format!("{}@{}", e.base, e.digest);
out.push_str(&format!(
" \"{}\" => \"{}\",\n",
escape(&e.toolchain_id),
escape(&pinned),
));
}
out.push_str("};\n\n");
// IMAGE_BASES: every entry, digest stripped. Used by docker.rs when no
// digest is pinned yet so a `docker pull <base>` is still possible.
out.push_str(
"pub static IMAGE_BASES: phf::Map<&'static str, &'static str> = phf::phf_map! {\n",
);
for e in &entries {
out.push_str(&format!(
" \"{}\" => \"{}\",\n",
escape(&e.toolchain_id),
escape(&e.base),
));
}
out.push_str("};\n");
std::fs::write(&out_path, out).expect("write image_digests.rs");
}
#[derive(Default)]
struct ImageEntry {
toolchain_id: String,
base: String,
digest: String,
}
/// Tiny TOML parser scoped to the `[[image]] toolchain_id = …` shape used
/// by `images.toml`. Only the three fields we consume here are extracted;
/// the rest of each entry (`toolchain`, `packages`) is ignored.
fn parse_image_catalogue(src: &str) -> Vec<ImageEntry> {
let mut entries: Vec<ImageEntry> = Vec::new();
let mut current: Option<ImageEntry> = None;
for raw_line in src.lines() {
let line = strip_comment(raw_line).trim();
if line.is_empty() {
continue;
}
if line == "[[image]]" {
if let Some(prev) = current.take()
&& !prev.toolchain_id.is_empty()
{
entries.push(prev);
}
current = Some(ImageEntry::default());
continue;
}
if line.starts_with('[') {
// Any other section ends accumulation.
if let Some(prev) = current.take()
&& !prev.toolchain_id.is_empty()
{
entries.push(prev);
}
continue;
}
let Some(slot) = current.as_mut() else {
continue;
};
let Some((key, value)) = line.split_once('=') else {
continue;
};
let key = key.trim();
let value = value.trim().trim_matches('"').trim_matches('\'');
match key {
"toolchain_id" => slot.toolchain_id = value.to_owned(),
"base" => slot.base = value.to_owned(),
"digest" => slot.digest = value.to_owned(),
_ => {}
}
}
if let Some(prev) = current.take()
&& !prev.toolchain_id.is_empty()
{
entries.push(prev);
}
entries
}
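To make the behaviour of the two small parsing helpers concrete, the block below reproduces `strip_comment` and `parse_string_array` from build.rs as shown above so they can be exercised in isolation. The sample inputs in the usage note are illustrative and are not taken from `seccomp_policy.toml`.

```rust
/// Returns `line` up to the first `#` that is not inside a double-quoted string.
pub fn strip_comment(line: &str) -> &str {
    let mut in_string = false;
    let bytes = line.as_bytes();
    for (i, &b) in bytes.iter().enumerate() {
        match b {
            b'"' => in_string = !in_string,
            b'#' if !in_string => return &line[..i],
            _ => {}
        }
    }
    line
}

/// Collects every "..." run between the first `[` and the last `]`.
/// Escaped quotes inside a string are not handled.
pub fn parse_string_array(src: &str) -> Vec<String> {
    let start = src.find('[').map(|i| i + 1).unwrap_or(0);
    let end = src.rfind(']').unwrap_or(src.len());
    let body = &src[start..end];
    let mut out = Vec::new();
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '"' {
            let mut s = String::new();
            for c2 in chars.by_ref() {
                if c2 == '"' {
                    break;
                }
                s.push(c2);
            }
            out.push(s);
        }
    }
    out
}
```

For example, `strip_comment` keeps a `#` that appears inside a quoted value but drops a trailing comment, and `parse_string_array` turns `["read", "write"]` into the two strings `read` and `write`. Because there is no escape handling, a value containing `\"` would be split at the embedded quote, so this parser is only suitable for plain syscall-style names.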

@ -8,16 +8,20 @@
[scanner]
-## If full uses both ast patterns and cfg taint analysis,
-## Possible values: full | ast | cfg
+## Analysis mode: full | ast | cfg | taint
+## full = AST analyses + CFG + state + taint
+## ast = AST analyses only (tree-sitter patterns + auth analysis; no CFG/taint/state)
+## cfg = CFG + state + taint only (no AST patterns)
+## taint = taint-focused CFG analysis only (no AST patterns, no state findings)
mode = "full"
## Minimum severity level to include in the report
-## Possible values: Low | Medium | High | Critical
+## Possible values: Low | Medium | High
min_severity = "Low"
-## Maximum file size to scan (MiB); null = unlimited
-max_file_size_mb = null
+## Maximum file size to scan (MiB); null = unlimited.
+## Raise or set to `null` when scanning a trusted codebase with large generated files or bundles.
+max_file_size_mb = 16
## File extensions to ignore completely
excluded_extensions = [
@ -34,7 +38,7 @@ excluded_directories = [
## Individual files to ignore completely
excluded_files = []
-## Honour global ignore file (e.g. ~/.config/nyx/ignore)
+## Honour global ignore file (e.g. ~/.config/nyx/ignore) (RESERVED)
read_global_ignore = false
## Honour .gitignore / .hgignore, etc.
@ -52,43 +56,112 @@ follow_symlinks = false
## Scan hidden files (dot-files)
scan_hidden_files = false
## Enable state-model dataflow analysis (resource lifecycle + auth state).
## Detects use-after-close, double-close, resource leaks, and unauthed access.
## Requires mode = "full" or "cfg" (or explicit taint/state-capable scans). Default: on.
enable_state_analysis = true
## Enable AST-based authorization analysis for supported web frameworks.
## Produces `<lang>.auth.*` findings such as admin-route, ownership, token,
## and stale-auth checks. Runs only when AST analysis is active:
## mode = "full" or "ast" => auth analysis runs
## mode = "cfg" or "taint" => auth analysis is skipped
## Per-language auth overrides live under [analysis.languages.<slug>.auth].
enable_auth_analysis = true
## Run dynamic verification on Medium/High confidence findings after static analysis.
## Default builds include this support. Use --no-verify or set this false for
## fast static-only scans, or when building with --no-default-features.
verify = true
## Also verify Low-confidence findings. Slower; intended for payload tuning.
verify_all_confidence = false
## Dynamic sandbox backend: auto | docker | process | firecracker
## auto uses Docker when available, otherwise the process backend.
verify_backend = "auto"
## Process-backend hardening profile: standard | strict
harden_profile = "standard"
## Catch per-file panics during analysis and continue the scan.
## When false (default), a panic in one file's analyser aborts the whole
## scan — useful for catching engine bugs loudly in development.
## When true, the poisoned file is skipped with a warning; the rest of
## the scan proceeds. Enable when running against untrusted input.
# enable_panic_recovery = false
[database] [database]
## Where to store the SQLite database (empty = default path) ## Custom SQLite database path (empty = platform default) (RESERVED)
path = "" path = ""
## Number of days to keep database files; 0 = no cleanup (UNIMPLEMENTED) ## Number of days to keep database files; 0 = no cleanup (RESERVED)
auto_cleanup_days = 30 auto_cleanup_days = 30
## Maximum database size in MiB; 0 = no limit (UNIMPLEMENTED) ## Maximum database size in MiB; 0 = no limit (RESERVED)
max_db_size_mb = 1024 max_db_size_mb = 1024
## Run VACUUM on startup (UNIMPLEMENTED) ## Run VACUUM on startup
vacuum_on_startup = false vacuum_on_startup = false
[output] [output]
## Output format — only "console" exists for now ## Default output format: console | json | sarif
## Used when --format is not specified on the command line.
default_format = "console" default_format = "console"
## Suppress all console output (UNIMPLEMENTED) ## Suppress all human-readable status output (stderr)
quiet = false quiet = false
## Enable attack-surface ranking (sort findings by exploitability score)
attack_surface_ranking = true
## Cap the number of issues shown; null = unlimited ## Cap the number of issues shown; null = unlimited
max_results = null max_results = null
## Minimum attack-surface score to include; null = no minimum
## Findings below this threshold are dropped after ranking.
## Requires attack_surface_ranking to be enabled.
min_score = null
## Minimum confidence level to include in output; null = no minimum
## Values: "low", "medium", "high"
# min_confidence = "medium"
## Include Quality-category findings (excluded by default).
## Quality findings (e.g. unwrap, expect, panic) are noise-heavy and hidden
## unless this is set to true or --include-quality is passed.
include_quality = false
## Show all findings: disables category filtering, rollups, and LOW budgets.
## Equivalent to --all on the command line.
show_all = false
## Maximum total LOW findings to show (rollups count as 1).
max_low = 20
## Maximum LOW findings per file (rollups count as 1).
max_low_per_file = 1
## Maximum LOW findings per rule (rollups count as 1).
max_low_per_rule = 10
## Number of example locations stored in rollup findings.
rollup_examples = 5
[performance] [performance]
## Maximum search depth; null = unlimited (UNIMPLEMENTED) ## Maximum search depth; null = unlimited
max_depth = null max_depth = null
## Minimum depth for reported entries; null = none (UNIMPLEMENTED) ## Minimum depth for reported entries; null = none (RESERVED)
min_depth = null min_depth = null
## Stop traversing into matching directories ## Stop traversing into matching directories (RESERVED)
prune = false prune = false
## Worker threads; null or 0 = auto ## Worker threads; null or 0 = auto
@ -101,10 +174,212 @@ batch_size = 100
channel_multiplier = 4 channel_multiplier = 4
## Maximum stack size for Rayon threads (bytes) ## Maximum stack size for Rayon threads (bytes)
rayon_thread_stack_size = 8 * 1024 * 1024 # 8 MiB rayon_thread_stack_size = 8388608 # 8 MiB
## Timeout on individual files (seconds); null = none (UNIMPLEMENTED) ## Timeout on individual files (seconds); null = none (RESERVED)
scan_timeout_secs = null scan_timeout_secs = null
## Maximum memory to use in MiB; 0 = no limit (UNIMPLEMENTED) ## Maximum memory to use in MiB; 0 = no limit (RESERVED)
memory_limit_mb = 512 memory_limit_mb = 512
[server]
## Enable the local web UI server (nyx serve)
enabled = true
## Host to bind to (localhost only by default for security)
host = "127.0.0.1"
## Port for the web UI
port = 9700
## Open browser automatically when serve starts
open_browser = true
## Auto-reload UI when scan results change
auto_reload = true
## Persist scan runs for history view
persist_runs = true
## Maximum number of saved runs
max_saved_runs = 50
## Auto-sync triage decisions to .nyx/triage.json in the project root.
## When enabled, triage changes are written to this file so they can be
## committed to git and shared with your team.
triage_sync = true
[runs]
## Persist scan run history to disk
persist = false
## Maximum number of runs to keep
max_runs = 100
## Save scan logs with each run
save_logs = false
## Save stdout capture with each run
save_stdout = false
## Save code snippets in findings
save_code_snippets = true
# ─── Scan Profiles ──────────────────────────────────────────────────
# Named presets that override scan-related config.
# Activate with --profile <name> on the command line.
#
# Built-in profiles: quick, full, ci, taint_only, conservative_large_repo.
# Override a built-in by defining [profiles.<name>] here.
#
# [profiles.quick]
# mode = "ast"
# min_severity = "Medium"
#
# [profiles.ci]
# mode = "full"
# min_severity = "Medium"
# quiet = true
# default_format = "sarif"
# ─── Analysis engine toggles ────────────────────────────────────────
# Release-grade switches for optional analysis passes. Every field has a
# matching CLI flag (e.g. --no-symex / --backwards-analysis), which takes
# precedence over the config value for a single run. The listed env vars
# override both config and CLI when set to "0" or "false".
#
# For a shortcut that sets the full stack in one shot, use
# `nyx scan --engine-profile {fast,balanced,deep}`. The profile applies
# before individual toggles, so you can mix (e.g. `--engine-profile fast
# --backwards-analysis`). See `docs/cli.md` for profile contents.
#
# To print the resolved engine config for a given invocation without
# running a scan, pass `--explain-engine`.
[analysis.engine]
## Path-constraint solving (prunes infeasible paths in taint).
## Default: on. CLI: --constraint-solving / --no-constraint-solving.
## env: NYX_CONSTRAINT=0 disables.
constraint_solving = true
## Abstract interpretation (interval / string domains).
## Default: on. CLI: --abstract-interp / --no-abstract-interp.
## env: NYX_ABSTRACT_INTERP=0 disables.
abstract_interpretation = true
## k=1 context-sensitive callee inlining for intra-file calls.
## Default: on. CLI: --context-sensitive / --no-context-sensitive.
## env: NYX_CONTEXT_SENSITIVE=0 disables.
context_sensitive = true
## Demand-driven backwards taint analysis. Adds a second pass from
## candidate sinks back toward sources to recover flows the forward
## solver gave up on. Default: off because it adds scan time on large
## repos. CLI: --backwards-analysis / --no-backwards-analysis.
## env: NYX_BACKWARDS=1 enables.
backwards_analysis = false
## Per-file tree-sitter parse timeout (ms). 0 disables the cap.
## CLI: --parse-timeout-ms. env: NYX_PARSE_TIMEOUT_MS.
parse_timeout_ms = 10000
[analysis.engine.symex]
## Run the symex pipeline after taint. Produces witness strings and
## symbolic verdicts; disable only if you want raw taint output.
## Default: on. CLI: --symex / --no-symex. env: NYX_SYMEX=0 disables.
enabled = true
## Persist and consult cross-file SSA bodies so symex can reason about
## callees defined in other files. Adds index/DB work on pass 1.
## Default: on. CLI: --cross-file-symex / --no-cross-file-symex.
## env: NYX_CROSS_FILE_SYMEX=0 disables.
cross_file = true
## Intra-file interprocedural symex (k >= 2 via frame stack).
## Default: on. CLI: --symex-interproc / --no-symex-interproc.
## env: NYX_SYMEX_INTERPROC=0 disables.
interprocedural = true
## Use the SMT backend when nyx was built with the `smt` feature.
## Ignored when the feature is off.
## Default: on. CLI: --smt / --no-smt. env: NYX_SMT=0 disables.
smt = true
# ─── Detector knobs ──────────────────────────────────────────────────
# Per-detector class suppression and enablement. These knobs target
# common false-positive classes that show up on legitimate forwarding
# pipelines (telemetry / analytics / metrics dispatch).
#
# [detectors.data_exfil]
#
# # Toggle the entire `taint-data-exfiltration` detector class. Set to
# # false on projects whose architecture routes user-derived payloads
# # through trusted forwarding boundaries by design.
# enabled = true
#
# # URL prefixes treated as trusted destinations. Outbound calls whose
# # destination argument has a static prefix (proven by the abstract
# # string domain or visible as a literal) matching one of these entries
# # have `Cap::DATA_EXFIL` dropped before event emission. Mirrors the
# # SSRF prefix-lock semantics. Use full origins or origin-prefixed
# # paths (e.g. "https://api.internal/") so partial matches across
# # unrelated hosts cannot occur.
# trusted_destinations = [
# "https://api.internal/",
# "https://telemetry.",
# ]
# ─── Per-language analysis rules ─────────────────────────────────────
# [analysis.languages.javascript.auth]
# enabled = true
# admin_path_patterns = ["/admin/"]
# admin_guard_names = ["requireAdmin", "isAdmin", "adminOnly"]
# login_guard_names = ["requireLogin", "authenticate", "requireAuth"]
# authorization_check_names = ["checkMembership", "hasWorkspaceMembership", "checkOwnership"]
# mutation_indicator_names = ["update", "delete", "create", "archive", "publish", "addMembership"]
# read_indicator_names = ["find", "findById", "get", "list"]
# token_lookup_names = ["findByToken"]
# token_expiry_fields = ["expires_at", "expiresAt"]
# token_recipient_fields = ["email", "recipient_email", "recipientEmail"]
# Auth-analysis rule IDs use language-normalized prefixes:
# javascript + typescript => js.auth.*
# python => py.auth.* ruby => rb.auth.* rust => rs.auth.*
# TypeScript inherits [analysis.languages.javascript.auth] by default; add an
# optional [analysis.languages.typescript.auth] block only for TS-specific
# overlays. These settings affect auth analysis only in "full" or "ast" mode.
# Add custom sources, sanitizers, sinks, terminators, and event handlers.
# Each language is keyed under [analysis.languages.<slug>] where slug is
# one of: rust, javascript, typescript, python, go, java, c, cpp, php, ruby.
#
# Example: recognise `escapeHtml` as an HTML sanitizer in JavaScript:
#
# [analysis.languages.javascript]
# event_handlers = ["addEventListener"]
# terminators = ["process.exit"]
#
# [[analysis.languages.javascript.rules]]
# matchers = ["escapeHtml"]
# kind = "sanitizer"
# cap = "html_escape"
#
# [[analysis.languages.javascript.rules]]
# matchers = ["location.href", "window.location.href"]
# kind = "sink"
# cap = "url_encode"
#
# Valid `kind` values: "source", "sanitizer", "sink"
# Valid `cap` values: "env_var", "html_escape", "shell_escape",
# "url_encode", "json_parse", "file_io",
# "fmt_string", "sql_query", "deserialize",
# "ssrf", "code_exec", "crypto", "all"


@ -1,13 +1,68 @@
[licenses] [licenses]
allow = [ allow = [
# --- Apache / MIT / BSD / permissive ---
"Apache-2.0", "Apache-2.0",
"MIT", "MIT",
"MIT-0", "MIT-0",
"Unicode-3.0",
"BSD-2-Clause", "BSD-2-Clause",
"Unlicense", "BSD-3-Clause",
"ISC",
"Zlib", "Zlib",
"zlib-acknowledgement",
"BSL-1.0",
"NCSA",
"PostgreSQL",
"curl",
"BlueOak-1.0.0",
"X11",
"HPND",
"TCL",
"ICU",
"Info-ZIP",
# --- Unicode / data / specs ---
"Unicode-DFS-2016",
"Unicode-3.0",
# --- compression / libs ---
"bzip2-1.0.6",
"libpng-2.0",
"IJG",
"FTL",
# --- public domain style ---
"CC0-1.0", "CC0-1.0",
"Unlicense",
"0BSD",
# --- weak copyleft (GPL-compatible) ---
"MPL-2.0", "MPL-2.0",
"LGPL-3.0",
"EPL-2.0",
# --- GPL family ---
"GPL-3.0", "GPL-3.0",
"GPL-3.0-or-later",
"GPL-2.0",
# --- Python / PSF ---
"PSF-2.0",
"Python-2.0",
"Python-2.0.1",
# --- Artistic / Perl ---
"Artistic-2.0",
# --- LLVM / clang ---
"Apache-2.0 WITH LLVM-exception",
# --- data / ML ---
"CDLA-Permissive-2.0",
# --- fonts ---
"OFL-1.1",
# --- Creative Commons (code-safe ones) ---
"CC-BY-3.0",
"CC-BY-4.0",
] ]

35
docs/SUMMARY.md Normal file

@ -0,0 +1,35 @@
# Summary
# Getting started
- [Quickstart](quickstart.md)
- [Installation](installation.md)
# Using nyx
- [CLI reference](cli.md)
- [Browser UI](serve.md)
- [Dynamic verification](dynamic.md)
- [Configuration](configuration.md)
- [Output formats](output.md)
# Coverage
- [Language maturity](language-maturity.md)
- [Rules](rules.md)
- [Auth analysis](auth.md)
# Under the hood
- [How it works](how-it-works.md)
- [Advanced analysis](advanced-analysis.md)
- [Detectors](detectors.md)
- [Patterns](detectors/patterns.md)
- [CFG](detectors/cfg.md)
- [State](detectors/state.md)
- [Taint](detectors/taint.md)
# Project
- [Roadmap](roadmap.md)
- [Changelog](changelog.md)

340
docs/advanced-analysis.md Normal file

@ -0,0 +1,340 @@
# Advanced Analysis
Nyx layers several analysis passes on top of the core SSA taint engine.
Most are switchable via config (`[analysis.engine]` in `nyx.conf` /
`nyx.local`), a matching CLI flag pair, or, as a last-resort override for
library users with no CLI entry point, a `NYX_*` environment variable. The
five precision-tuning passes (abstract interpretation, context sensitivity,
symbolic execution, constraint solving, field-sensitive points-to) are
**on by default** because the benchmark numbers in
[language-maturity.md](language-maturity.md) are measured with them on.
The demand-driven backwards walk and hierarchy fan-out sit alongside but
are not user-toggleable in the same way.
See [`Configuration`](configuration.md#analysisengine) for the full config
surface and CLI flag table. This page explains what each pass does, why it
helps, how to disable it, and what it does not cover.
---
## Abstract interpretation
**What it does.** Propagates interval and string abstract domains through the
SSA worklist alongside taint. Integer values carry `[lo, hi]` bounds;
string values carry a prefix and suffix (plus a bit domain for known-zero /
known-one bits). Values are joined at merge points and widened at loop
heads so the worklist always terminates.
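The join and widening steps can be sketched on the interval domain alone. This is a minimal illustration, not Nyx's implementation: the `Interval` type, its field names, and the use of `i64::MIN` / `i64::MAX` as stand-ins for unbounded ends are assumptions made for the example.

```rust
/// Hypothetical closed integer interval `[lo, hi]`.
/// `i64::MIN` and `i64::MAX` stand in for "no lower bound" and "no upper bound".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Interval {
    lo: i64,
    hi: i64,
}

impl Interval {
    /// The unbounded interval (no information about the value).
    const TOP: Interval = Interval { lo: i64::MIN, hi: i64::MAX };

    /// Join at a control-flow merge: the smallest interval covering both inputs.
    fn join(self, other: Interval) -> Interval {
        Interval {
            lo: self.lo.min(other.lo),
            hi: self.hi.max(other.hi),
        }
    }

    /// Widening at a loop head: a bound that moved since the previous
    /// iteration is dropped to unbounded, so the sequence stabilises.
    fn widen(self, next: Interval) -> Interval {
        Interval {
            lo: if next.lo < self.lo { i64::MIN } else { self.lo },
            hi: if next.hi > self.hi { i64::MAX } else { self.hi },
        }
    }
}
```

A bound that keeps growing across loop iterations is given up after one step rather than tracked, which is what guarantees termination at the cost of some precision.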
**Why it helps.** Lets Nyx suppress some findings that are provably safe
given the abstract value: a proven-bounded integer does not flow into a
SQL sink as an injection risk, and an SSRF sink whose URL prefix is locked
to a trusted host stays quiet. This turns a large class of FPs on numeric
and locked-prefix paths into true negatives.
**Path traversal.** The path domain accepts canonicalised-and-rooted
shapes via `PathFact::is_path_traversal_safe`: a path that is
dotdot-free and either non-absolute or carries a verified prefix-lock has
its `Cap::FILE_IO` cleared. When the lock argument is a string literal
the lock prefix is recorded directly; when it is a method call, field
access, or configured root, an `OPAQUE_PREFIX_LOCK` marker captures the
structural invariant ("rooted under SOME prefix") instead. This closes
the Ruby `File.expand_path + start_with?(root)`, Python
`os.path.realpath + .startswith(root)`, and JS
`path.resolve + .startsWith(root)` shapes. `classify_path_assertion`
recognises JS `.startsWith(...)`, Python `.startswith(...)`, Ruby
`.start_with?(...)` (paren and paren-less), and Go `strings.HasPrefix(...)`.
Branch narrowing flips lock attachment under condition negation
(`if !target.startsWith(ROOT) { return; }` attaches the lock to the
surviving block, not the rejection arm).
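The safety rule described above can be modelled in a few lines. Only the method name `is_path_traversal_safe` and the rule itself come from the text; the struct fields and the `PrefixLock` enum are hypothetical shapes chosen for this sketch.

```rust
/// Hypothetical record of what is known about a path prefix lock.
#[derive(Clone, Debug, PartialEq)]
enum PrefixLock {
    /// The lock argument was a string literal, so the prefix is known.
    Literal(String),
    /// The lock argument was a call, field access, or configured root:
    /// the path is rooted under some prefix, but the prefix is not known.
    Opaque,
}

/// Hypothetical abstract fact attached to a path-valued SSA value.
struct PathFact {
    dotdot_free: bool,
    absolute: bool,
    prefix_lock: Option<PrefixLock>,
}

impl PathFact {
    /// Safe when the path has no `..` component and is either relative
    /// or carries a verified prefix lock (literal or opaque).
    fn is_path_traversal_safe(&self) -> bool {
        self.dotdot_free && (!self.absolute || self.prefix_lock.is_some())
    }
}
```

Under this model an opaque lock is as good as a literal one for clearing the file-I/O capability, because the invariant that matters is "rooted under some prefix", not which prefix.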
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `abstract_interpretation = false` under `[analysis.engine]` |
| CLI flag | `--no-abstract-interp` |
| Env var (legacy) | `NYX_ABSTRACT_INTERP=0` |
**Limitations.** The interval domain is 64-bit signed; very wide or
overflow-producing arithmetic degrades to `⊤` (unbounded). String prefix /
suffix tracking is concat-only; it does not model reordering, reversal, or
character-level regex constraints. Loop widening deliberately drops
changing bounds rather than chasing fixpoints.
**Source**: [`src/abstract_interp/`](https://github.com/elicpeter/nyx/tree/master/src/abstract_interp/).
---
## Context-sensitive analysis
**What it does.** Adds k=1 call-site-sensitive taint propagation for
intra-file callees. When a function is invoked, Nyx reanalyzes the callee
body with the actual per-argument taint signature of the call site,
producing call-site-specific return taint. Results are cached by
`(function_name, ArgTaintSig)` so repeated calls with the same signature
are free.
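The caching scheme can be sketched as a map keyed by function name and argument signature. The names `InlineCache` and `ArgTaintSig` appear in the source; their shapes here (a bitmask per argument, a `u32` return taint, an `analyses_run` counter) are assumptions for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical signature: one capability bitmask per call argument.
type ArgTaintSig = Vec<u32>;

/// Hypothetical cache of callee return taint per `(function, signature)` pair.
struct InlineCache {
    results: HashMap<(String, ArgTaintSig), u32>,
    /// Counts how many times a callee body was actually re-analysed.
    analyses_run: usize,
}

impl InlineCache {
    fn new() -> Self {
        InlineCache { results: HashMap::new(), analyses_run: 0 }
    }

    /// Returns the callee's return taint for this call-site signature,
    /// invoking `analyse` only on a cache miss.
    fn return_taint(
        &mut self,
        func: &str,
        sig: &[u32],
        analyse: impl FnOnce(&[u32]) -> u32,
    ) -> u32 {
        let key = (func.to_owned(), sig.to_vec());
        if let Some(&hit) = self.results.get(&key) {
            return hit;
        }
        self.analyses_run += 1;
        let ret = analyse(sig);
        self.results.insert(key, ret);
        ret
    }
}

/// Toy callee model: a helper that returns its first argument unchanged,
/// so the return taint equals the taint of argument 0.
fn identity_helper(sig: &[u32]) -> u32 {
    sig.first().copied().unwrap_or(0)
}
```

Calling the same helper with a tainted and an untainted argument yields two cache entries with different return taint, and a repeated signature is served from the cache without re-analysis.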
**Why it helps.** A helper called once with a tainted argument and once
with a sanitized argument should produce two different results: a finding
at the tainted call site and none at the sanitized one. Without k=1
sensitivity, the conservative union of both call sites would be applied
to the sanitized call, producing a spurious finding there.
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `context_sensitive = false` under `[analysis.engine]` |
| CLI flag | `--no-context-sensitive` |
| Env var (legacy) | `NYX_CONTEXT_SENSITIVE=0` |
**Limitations.** Intra-file only. Cross-file callees are resolved via
summaries (see `src/summary/`) rather than re-inlined. Depth is capped at
k=1 to prevent cache blow-up and re-entrancy; higher k would require a
different cache key design. Callee bodies larger than the internal
`MAX_INLINE_BLOCKS` threshold fall back to the summary path. Cache keys
hash per-argument `Cap` bits but not source-origin identity, so two
callers with identical caps but different origins share cached
origin-attribution.
**Helper-validator propagation.** SSA summaries carry a
`validated_params_to_return` field listing parameter indices whose
taint flow to the return value is fully validated by a dominating
predicate (regex allowlist, type check, validation call) on every
return path. At call sites, each tainted argument passed to a
validated position, and the call's own return value, are marked
`validated_must` / `validated_may` in the caller's SSA taint state,
the same way an inline `if (!regex.test(x)) throw …` would validate
the surviving branch. Sound because the summary is recorded only when
the parameter's name is in `validated_must` at *every* return block; a
normal-returning call therefore proves the validating arm. JS/TS
object-pattern formals (`({ column, operator, value }) => …`) seed
every destructured sibling in the per-parameter probe, so flow through
any of them counts toward the slot being validated.
**Source**: [`src/taint/ssa_transfer/`](https://github.com/elicpeter/nyx/tree/master/src/taint/ssa_transfer/)
(`ArgTaintSig`, `InlineCache`, `inline_analyse_callee`,
`propagate_validated_params_to_return`).
---
## Field-sensitive points-to
**What it does.** Runs a Steensgaard-style alias analysis that interns field
accesses as their own abstract locations. `c.mu` becomes `Field(c, mu)`,
distinct from `c` itself; a write to `obj.cache` and a read from
`obj.cache` in different methods both land on the same abstract location;
subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic
`__index_get__` / `__index_set__` calls so the engine can model them
through the same container store/load primitives used for STL containers,
Python lists, JS arrays, and similar.
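The interning of field accesses as their own locations can be sketched as follows. This covers only the location-identity part; the unification step of a Steensgaard-style analysis is not shown. The `Loc` enum, `LocInterner`, and the `field` helper are hypothetical names for this example.

```rust
use std::collections::HashMap;

/// Hypothetical abstract location: a variable, or a named field of another location.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Loc {
    Var(String),
    Field(Box<Loc>, String),
}

/// Hypothetical interner assigning a stable id to each distinct location.
#[derive(Default)]
struct LocInterner {
    ids: HashMap<Loc, usize>,
}

impl LocInterner {
    /// Returns the existing id for `loc`, or allocates the next free one.
    fn intern(&mut self, loc: Loc) -> usize {
        let next = self.ids.len();
        *self.ids.entry(loc).or_insert(next)
    }
}

/// Convenience constructor for `base.name`.
fn field(base: &str, name: &str) -> Loc {
    Loc::Field(Box::new(Loc::Var(base.to_owned())), name.to_owned())
}
```

Because `c.mu` interns to the same id wherever it appears, a write in one method and a read in another meet at one location, while `c` and `c.cache` stay distinct.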
**Why it helps.** It removes a class of false positives that the
whole-variable taint model produced. Before this pass, `obj.field =
tainted; sink(obj.other_field)` would taint `obj` as a whole and fire on
the safe field; the receiver-type / sub-field distinction is also what
lets the resource-lifecycle pass attribute a `c.mu.Lock()` to the lock
field rather than to its container. Cross-method field flow (writer in
one method, reader in another) shows up only when fields have stable
identity independent of the parent value.
**How to turn it off.**
| Surface | Value |
|---|---|
| Env var | `NYX_POINTER_ANALYSIS=0` |
The pass is **on by default**. The env-var override exists so you can
compare against the pre-pointer baseline.
**Limitations.** This is not a general escape analysis. Function pointers
and arbitrary indirect calls still resolve to no callee, and deep alias
chains through `*p` / `p->field` in C/C++ are not tracked beyond the
direct field case. The points-to set per value is capped at
`--max-pointsto` (default 32); when truncation happens, an engine note
records the precision loss.
**Source**: [`src/pointer/`](https://github.com/elicpeter/nyx/tree/master/src/pointer/).
---
## Hierarchy fan-out for virtual dispatch
**What it does.** Builds a per-language type-hierarchy index in pass 1
(extends, implements, impl-for, includes; the exact construct depends on
the language) and uses it in pass 2 to widen method-call resolution. When
a call's receiver is statically typed as a super-class, trait, or
interface, the resolver returns every concrete implementer it has seen
in the codebase rather than just the first match.
**Why it helps.** Without it, a call like `repository.findById(id)` where
`repository` is typed as the interface gets resolved against whatever the
single-result resolver finds first; if the matching implementer is in
another file the call effectively goes opaque. With the hierarchy, the
taint engine sees the union of every implementer's transform and the
flow shows up regardless of which file holds the concrete class.
**Limitations.** Fan-out is capped at 8 implementers per call site; over
that, the tail is silently dropped (a debug log records the cap hit) and
the call is treated as a non-deterministic union of the kept
implementers. Languages that use structural / implicit interface
satisfaction (Go) are deliberately skipped because per-file extraction
is intractable; those calls fall back to the single-result resolver. The
extractor covers Java, Rust, TS/JS/TSX, Python, Ruby, PHP, and C++.
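The widening and its cap can be sketched with a simple map from a declared type to its known implementers. The cap of 8 comes from the text; the constant name `MAX_FANOUT`, the `TypeHierarchy` struct, and the `resolve_widened` method are hypothetical and are not the `TypeHierarchyIndex` / `resolve_callee_widened` items in the source.

```rust
use std::collections::BTreeMap;

/// Fan-out cap per call site (value from the text, name assumed).
const MAX_FANOUT: usize = 8;

/// Hypothetical index: declared super-type or interface -> concrete implementers.
struct TypeHierarchy {
    implementers: BTreeMap<String, Vec<String>>,
}

impl TypeHierarchy {
    /// Returns up to `MAX_FANOUT` candidate callees for `receiver_type.method`.
    /// Implementers beyond the cap are dropped; an unknown receiver type
    /// yields no candidates, leaving the caller to fall back to its
    /// single-result resolver.
    fn resolve_widened(&self, receiver_type: &str, method: &str) -> Vec<String> {
        self.implementers
            .get(receiver_type)
            .map(|impls| {
                impls
                    .iter()
                    .take(MAX_FANOUT)
                    .map(|ty| format!("{ty}::{method}"))
                    .collect()
            })
            .unwrap_or_default()
    }
}
```

The taint engine would then take the union of the transforms of every returned candidate.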
**Source**: [`src/cfg/hierarchy.rs`](https://github.com/elicpeter/nyx/blob/master/src/cfg/hierarchy.rs)
and [`src/summary/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/summary/mod.rs)
(`TypeHierarchyIndex`, `resolve_callee_widened`).
---
## Symbolic execution
**What it does.** Builds a symbolic expression tree per tainted SSA value,
generates a witness string for each taint finding (the concrete-looking
shape of the dangerous value at the sink), and detects sanitization
patterns that the taint engine alone would miss. Supports string
operations (`trim`, `replace`, `toLower`, `substring`, `strlen`, …),
arithmetic, concatenation, phi nodes, and opaque calls.
**Why it helps.** Raises finding quality. A taint finding with a rendered
witness like `"SELECT * FROM t WHERE id=" + userInput` is substantially
easier to triage than one without. Also powers some confidence-gating for
downstream display.
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `symex.enabled = false` under `[analysis.engine]` |
| CLI flag | `--no-symex` |
| Env var (legacy) | `NYX_SYMEX=0` |
Two nested switches refine the scope without disabling symex entirely:
| Setting | CLI | Env | Default | Effect |
|---|---|---|---|---|
| `symex.cross_file` | `--no-cross-file-symex` | `NYX_CROSS_FILE_SYMEX=0` | on | Consult cross-file SSA bodies so symex can reason about callees defined in other files |
| `symex.interprocedural` | `--no-symex-interproc` | `NYX_SYMEX_INTERPROC=0` | on | Intra-file interprocedural symex (k ≥ 2 via frame stack) |
**Limitations.** Expression trees are bounded at `MAX_EXPR_DEPTH=32`;
deeper expressions degrade to `Unknown` rather than growing unboundedly.
Sanitizer detection is informational: string-replace sanitizer patterns
are reported as witness metadata, not used to clear taint.
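A depth-bounded expression tree with witness rendering might look like the sketch below. `MAX_EXPR_DEPTH = 32` and the `Unknown` fallback come from the text; the `SymExpr` variants and method names are assumptions, and only concatenation is modelled.

```rust
/// Depth bound from the text.
const MAX_EXPR_DEPTH: usize = 32;

/// Hypothetical symbolic expression over string values.
#[derive(Clone, Debug, PartialEq)]
enum SymExpr {
    /// A string literal.
    Lit(String),
    /// A named tainted input.
    Input(String),
    /// Concatenation of two sub-expressions.
    Concat(Box<SymExpr>, Box<SymExpr>),
    /// Fallback when the tree would exceed the depth bound.
    Unknown,
}

impl SymExpr {
    /// Height of the tree; leaves have depth 1.
    fn depth(&self) -> usize {
        match self {
            SymExpr::Concat(lhs, rhs) => 1 + lhs.depth().max(rhs.depth()),
            _ => 1,
        }
    }

    /// Builds `self + rhs`, degrading to `Unknown` past the depth bound.
    fn concat(self, rhs: SymExpr) -> SymExpr {
        let expr = SymExpr::Concat(Box::new(self), Box::new(rhs));
        if expr.depth() > MAX_EXPR_DEPTH {
            SymExpr::Unknown
        } else {
            expr
        }
    }

    /// Renders a human-readable witness string.
    fn witness(&self) -> String {
        match self {
            SymExpr::Lit(s) => format!("{s:?}"),
            SymExpr::Input(name) => name.clone(),
            SymExpr::Concat(lhs, rhs) => format!("{} + {}", lhs.witness(), rhs.witness()),
            SymExpr::Unknown => "<unknown>".to_owned(),
        }
    }
}
```

Concatenating the literal `SELECT * FROM t WHERE id=` with an input named `userInput` renders the same witness shape quoted in the text above.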
**Source**: [`src/symex/`](https://github.com/elicpeter/nyx/tree/master/src/symex/).
---
## Demand-driven analysis
**What it does.** After the forward pass-2 taint analysis finishes, runs a
*backwards* walk from each sink's tainted SSA operands. The walk follows
reverse SSA-edge transfer (phi fan-out, `Assign` operand-fanout, `Call`
body-expansion or arg-fanout) until it reaches a taint source, proves
the flow infeasible via an accumulated path predicate, or exhausts its
budget. Each forward finding is then annotated with the aggregate verdict:
- `backwards-confirmed`: a matching source was reached. The finding picks
up a small confidence boost and the note appears in
`evidence.symbolic.cutoff_notes`.
- `backwards-infeasible`: every walk proved the flow unreachable.
The finding is capped to Low confidence and a user-readable limiter is
attached.
- `backwards-budget-exhausted`: the walk hit `BACKWARDS_VALUE_BUDGET`
without a verdict. Recorded as a limiter so operators can see when
the pass could not keep up.
- Inconclusive outcomes are a no-op: the forward finding is untouched.
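The four outcomes listed above map onto a small annotation step. The sketch below is an assumption-laden model: the type names are hypothetical, the "small confidence boost" is modelled as one step up the scale (the text does not say how large it is), and notes are plain strings.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Confidence {
    Low,
    Medium,
    High,
}

impl Confidence {
    /// Assumed model of the "small boost": one step up, saturating at High.
    fn boosted(self) -> Confidence {
        match self {
            Confidence::Low => Confidence::Medium,
            Confidence::Medium | Confidence::High => Confidence::High,
        }
    }
}

/// Hypothetical aggregate verdict of the backwards walk for one finding.
enum BackwardsVerdict {
    Confirmed,
    Infeasible,
    BudgetExhausted,
    Inconclusive,
}

/// Hypothetical forward finding with a confidence level and attached notes.
struct Finding {
    confidence: Confidence,
    notes: Vec<&'static str>,
}

/// Applies the verdict to a forward finding as described in the list above.
fn annotate(finding: &mut Finding, verdict: BackwardsVerdict) {
    match verdict {
        BackwardsVerdict::Confirmed => {
            finding.confidence = finding.confidence.boosted();
            finding.notes.push("backwards-confirmed");
        }
        BackwardsVerdict::Infeasible => {
            finding.confidence = Confidence::Low;
            finding.notes.push("backwards-infeasible");
        }
        BackwardsVerdict::BudgetExhausted => {
            finding.notes.push("backwards-budget-exhausted");
        }
        // Inconclusive: the forward finding is left untouched.
        BackwardsVerdict::Inconclusive => {}
    }
}
```

Note that no verdict removes a finding; the strongest negative outcome only lowers its confidence.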
Because the backwards walk can consult `GlobalSummaries.bodies_by_key`
(populated by the cross-file callee body persistence layer) it closes
across file boundaries; when a callee body is not loadable the walk
falls back to fanning out over the call's arguments so local reach-back
is still possible.
**Why it helps.** Inverts the analysis direction so the budget follows the
question the scanner actually cares about ("does any source reach
*this* sink?") instead of proving every potential source-to-sink
path. Corroborated findings are a stronger signal than forward-only
ones, and proven-infeasible flows provide a principled way to lower
confidence on forward false positives without silently dropping them.
**How to turn it on.** Defaults off so the benchmark floor is preserved
while the pass stabilises.
| Surface | Value |
|---|---|
| Config | `backwards_analysis = true` under `[analysis.engine]` |
| CLI flag | `--backwards-analysis` / `--no-backwards-analysis` |
| Env var (legacy) | `NYX_BACKWARDS=1` |
**Limitations.** Reverse call-graph expansion stops at `ReachedParam`; the walk
terminates at function parameters rather than crossing back into callers.
Path-constraint pruning is conservative: only the accumulated
`PredicateSummary` bits are consulted, not the full symbolic predicate stack.
Depth-bounded at k=2 for
cross-function body expansion. See `DEFAULT_BACKWARDS_DEPTH`,
`BACKWARDS_VALUE_BUDGET`, and `MAX_BACKWARDS_CALLEE_BLOCKS` in
`src/taint/backwards.rs` for the exact bounds.
**Cap parity.** The walk treats `DemandState.caps` as opaque bitflags, so
every cap defined in `src/labels/mod.rs` round-trips identically through
the demand transfer. This includes `Cap::DATA_EXFIL` (bit 13): a
`taint-data-exfiltration` forward finding receives `backwards-confirmed`
exactly like a `taint-unsanitised-flow` SQL/CMD/SSRF finding when its
demand walk reaches a Sensitive source. The cap-routing logic in
`src/ast.rs` then surfaces the rule id correctly regardless of which
direction confirmed the flow. See
`tests/backwards_analysis_tests.rs::demand_driven_suite` (the
`data_exfil` sub-case) and
`taint::backwards::tests::driver_walks_data_exfil_source_to_sink` for
the regression guards.
**Source**: [`src/taint/backwards.rs`](https://github.com/elicpeter/nyx/blob/master/src/taint/backwards.rs).
---
## Constraint solving
**What it does.** Collects path constraints at each branch in SSA and
propagates them alongside taint. Prunes paths whose accumulated constraint
set is unsatisfiable: for example, a taint flow guarded by `if x < 0 && x > 10` is
dropped rather than surfaced. Optionally delegates the satisfiability
check to Z3 when Nyx is built with the `smt` Cargo feature.
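The kind of contradiction that can be caught without an SMT solver can be sketched as bound intersection over a single integer variable. This is an illustration of the idea, not Nyx's constraint domain: the `Cmp` enum and `satisfiable` function are hypothetical and handle only strict comparisons against constants.

```rust
/// Hypothetical path constraint on one integer variable `x`.
#[derive(Clone, Copy)]
enum Cmp {
    /// `x < k`
    Lt(i64),
    /// `x > k`
    Gt(i64),
}

/// Returns false when the conjunction of constraints admits no integer `x`.
fn satisfiable(constraints: &[Cmp]) -> bool {
    // Inclusive bounds on x, tightened by each constraint.
    let mut lo = i64::MIN;
    let mut hi = i64::MAX;
    for constraint in constraints {
        match *constraint {
            Cmp::Lt(k) => {
                // Nothing is below i64::MIN.
                if k == i64::MIN {
                    return false;
                }
                hi = hi.min(k - 1);
            }
            Cmp::Gt(k) => {
                // Nothing is above i64::MAX.
                if k == i64::MAX {
                    return false;
                }
                lo = lo.max(k + 1);
            }
        }
    }
    lo <= hi
}
```

A path guarded by `x < 0 && x > 10` yields an empty range and would be pruned, while `x > 0 && x < 10` stays feasible.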
**Why it helps.** Removes a class of FPs rooted in clearly-infeasible
control-flow combinations. Without path constraints, a taint flow that
only occurs when mutually-exclusive branches are simultaneously taken can
still produce a finding.
**How to turn it off.**
| Surface | Value |
|---|---|
| Config | `constraint_solving = false` under `[analysis.engine]` |
| CLI flag | `--no-constraint-solving` |
| Env var (legacy) | `NYX_CONSTRAINT=0` |
The SMT backend is a separate switch:
| Setting | CLI | Env | Default | Effect |
|---|---|---|---|---|
| `symex.smt` | `--no-smt` | `NYX_SMT=0` | on when built with `smt` feature | Delegate satisfiability checks to Z3; ignored if Nyx was built without `smt` |
**Limitations.** The default path-constraint domain is syntactic;
trivially-inconsistent pairs are caught without an SMT solver, but richer
algebraic unsatisfiability requires the `smt` feature (Z3). Without `smt`,
Nyx ships a lightweight satisfiability check that catches literal
contradictions but not deeper reasoning.
**Source**: [`src/constraint/`](https://github.com/elicpeter/nyx/tree/master/src/constraint/).
---
## Combining the switches
The defaults (all on) are the configuration Nyx is benchmarked against.
Turning any switch off trades precision for speed and may move findings
relative to the published baseline; CI regression gates assume defaults.
If you need a minimal-overhead scan (for very large repositories or a
pre-commit fast path), the AST-only scan mode (`--mode ast`) skips CFG,
taint, and every advanced pass described above, and is the right tool.

1
docs/assets Symbolic link

@ -0,0 +1 @@
../assets

143
docs/auth.md Normal file

@ -0,0 +1,143 @@
# Auth analysis
**Rust is the stable target.** Python and Go have shipped precision work as of 0.7.0 (FastAPI cross-file dependencies, Go DAO-helper filtering, same-file caller-scope IPA) and are usable on real codebases. Ruby, Java, JavaScript, and TypeScript have rule scaffolding in [`src/auth_analysis/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/config.rs) but no benchmark corpus yet; treat findings there as preview.
## What it catches
The Rust rule is `rs.auth.missing_ownership_check`. It fires when a request handler reaches a privileged operation that takes a scoped identifier (`*_id`, row reference, scoped resource) without a preceding ownership or membership check.
Concretely, it looks for these patterns of authorization in the function body and flags the call when none are present:
- A call to a recognised authorization helper. Defaults: `check_ownership`, `has_ownership`, `require_ownership`, `ensure_ownership`, `is_owner`, `authorize`, `verify_access`, `has_permission`, `can_access`, `can_manage`, plus `*_membership` and `require_{group,org,workspace,tenant,team}_member` variants. Extend in `[analysis.languages.rust]`.
- An ownership-equality check on a row reference: `if owner_id != user.id { return 403 }` or any `field_id != self_actor` shape. The check writes `AuthCheck` evidence back to the row-fetch arguments via `AnalysisUnit.row_field_vars`.
- A self-actor reference: `let user = require_auth(...).await?` followed by use of `user.id`, `user.user_id`, `user.uid`. The actor is recognised from typed extractor params (`Extension<Session>`, `CurrentUser`, etc.) and from typed helper bindings.
- A typed extractor wrapper that proves route-level capability/policy enforcement: meilisearch-style `GuardedData<ActionPolicy<X>, _>`. Recognised by outer wrapper name (last segment, case-insensitive `starts_with`) so `GuardedData<ActionPolicy<X>, Data<AuthController>>` is classified by the outer `GuardedData`, not by whether an inner generic arg substring-matches `auth`. Configured via `policy_guard_names` (Rust default: `["Guarded"]`). Distinct from authentication-only wrappers so the pattern doesn't pollute regular call recognition.
- A SQL query that joins through an ACL table or filters by `user_id` predicate. Detected without a SQL parser via [`sql_semantics.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/sql_semantics.rs); the authorized result variable propagates through `let row = ...prepare(LIT)...`, `for row in result`, `let id = row.get(...)`.
- A helper-summary lift: handler calls `validate_target(db, widget_id, user.id)` whose body contains a `require_*_member` call. Cross-function summaries are merged at fixed-point (capped at 4 iterations).
Handlers registered through attribute macros (`#[get("/path")]`, `#[routes::path(…)]`) or external service-config builders are also walked for typed-extractor guards, complementing the `.route(...)` registration path.
## Caller-scope-entity exemption
`<entity>.id` / `<entity>.pk` is not flagged when `<entity>` is a unit parameter named after a multi-tenant scope primitive: `organization` / `org`, `project`, `team`, `workspace`, `tenant`, `account`, `community`, `group`, `repository` / `repo`, `company`. The argument represents the caller's scope, not a user-controlled target, so internal helpers like `def get_environments(request, organization): Environment.objects.filter(organization_id=organization.id, …)` inherit the caller's authorization. Other field names (`.name`, `.slug`) still flag, and `user` / `member` / `actor` are deliberately excluded; those are handled by the actor-context recogniser.
## Project-level web-framework gate (Rust)
In Rust, the `context_inputs` and param-name arms of the user-input heuristic are gated by a project-level web-framework signal. The signal is three-valued:
- `Some(true)`: the project's `Cargo.toml` names `axum`, `actix-web`, or `rocket`, OR the file directly imports one (`axum::`, `actix_web::`, `rocket::`, `axum_extra::`). Heuristics stay on.
- `Some(false)`: `Cargo.toml` was inspected and named no web framework, AND the file does not directly import one. Heuristics off; only `RouteHandler` classification (concrete route-registration evidence) survives.
- `None`: no detection ran (single-file scan with no project root). Heuristics on; behavior unchanged.
This avoids a class of FPs in non-web Rust crates where a debug-session handle named `session` would trip on `session.update(cx, …)`-style desktop-app code. Other languages keep prior behavior; the gate is currently Rust-only.
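The three-valued signal can be modelled as an `Option<bool>`. The sketch below is illustrative only: `web_framework_signal` and `heuristics_enabled` are hypothetical names, not Nyx's internal API, and it assumes that a direct framework import in the file yields `Some(true)` even when no `Cargo.toml` was inspected.

```rust
/// Hypothetical sketch of the project-level web-framework signal.
/// `cargo_names_framework` is `None` when no Cargo.toml was inspected
/// (for example a single-file scan with no project root).
fn web_framework_signal(
    cargo_names_framework: Option<bool>,
    file_imports_framework: bool,
) -> Option<bool> {
    match (cargo_names_framework, file_imports_framework) {
        // A direct import (`axum::`, `actix_web::`, ...) is enough on its own.
        (_, true) => Some(true),
        // Cargo.toml names a web framework.
        (Some(true), false) => Some(true),
        // Cargo.toml inspected, no framework named, no direct import.
        (Some(false), false) => Some(false),
        // No detection ran.
        (None, false) => None,
    }
}

/// The heuristics are switched off only by a definite `Some(false)`.
fn heuristics_enabled(signal: Option<bool>) -> bool {
    signal != Some(false)
}
```

The point of the third value is that "unknown" keeps the previous behaviour: only positive evidence that the crate is not a web project disables the heuristics.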
## Python: FastAPI cross-file dependencies
FastAPI's `include_router` chain is resolved across files. A child router declared in `routes/task_instances.py` and attached on a parent in `routes/__init__.py` inherits the parent's `dependencies=[...]`.
- Module-level `router = APIRouter(dependencies=[Security(...)])` is pre-walked once per file and merged onto every `@<router>.<verb>(...)` route attached in the same file.
- `<parent>.include_router(<child_module>.<child_var>)` edges are captured per file in pass 1, persisted into `GlobalSummaries::router_facts_by_module`, and lifted onto the active file's `AuthorizationModel::cross_file_router_deps` at pass 2 entry. Transitive lifts (grandparent to parent to child) iterate to fixpoint.
- `Security(callable, scopes=[...])` is recognised distinctly from `Depends(callable)` and promotes the synthetic `AuthCheck` to `AuthCheckKind::Other` (route-level scope-checked authorization). Bare `Depends(callable)` is still a Login-only check.
Module identity is the file basename without `.py`. This is sufficient for airflow-style `task_instances.router` naming; a project with two files of the same name in different subtrees will currently collide.
## Go: DAO-helper id-scalar precision pass
For non-route Go units, a parameter whose declared type is a bounded primitive scalar (`int64`, `uint32`, `string`, `bool`, `byte`, `rune`, `float64`, etc.) and whose name is id-shaped (`id`, `*Id`, `*_id`, `*ids`) is dropped from `unit.params` before ownership-check evaluation.
Real Go HTTP handlers always carry a framework-request-typed param (`*http.Request`, `*gin.Context`, `echo.Context`, `*fiber.Ctx`); per-framework route extractors set `include_id_like_typed=true` so id-shaped path params survive on real routes. The filter only fires when the unit was not classified as a route handler, so helpers like `func GetRunByRepoAndID(ctx, repoID, runID int64)` are recognised as DAO callees and the ownership check is expected at the calling route handler, not inside the helper.
## Same-file caller-scope IPA
When a private helper is called only from authorized route handlers in the same file, the caller's auth checks lift onto the helper as synthetic `is_route_level=true` `AuthCheck` entries.
- Iterated to a small fixpoint so transitive chains (route to mid_helper to leaf_helper) are covered.
- Refuses to authorize helpers with no in-file caller, helpers called from a mix of authorized and unauthorized callers, and helpers called only from un-lifted helpers.
- Cross-file caller-scope lifting is not implemented yet.
This closes the FastAPI / Django / Flask shape where a route authenticates via decorator or dependency, then delegates to a private helper that performs the sink.
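The lifting rules above can be sketched as a small fixpoint over an in-file call map. This is a simplified model, not the engine's implementation: `lift_caller_scope`, the `callers` map, and the `authorized_routes` set are hypothetical, and real `AuthCheck` entries are reduced to set membership.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical model: `callers` maps each private helper to its in-file
/// callers; `authorized_routes` holds route handlers that already carry an
/// auth check. Returns the helpers that would receive a lifted check.
fn lift_caller_scope<'a>(
    callers: &HashMap<&'a str, Vec<&'a str>>,
    authorized_routes: &HashSet<&'a str>,
) -> HashSet<&'a str> {
    let mut lifted: HashSet<&'a str> = HashSet::new();
    loop {
        let mut changed = false;
        for (helper, callers_of) in callers {
            // Helpers with no in-file caller are never authorized.
            if lifted.contains(helper) || callers_of.is_empty() {
                continue;
            }
            // Every caller must be an authorized route or an already-lifted
            // helper; a single unauthorized caller blocks the lift.
            let all_authorized = callers_of
                .iter()
                .all(|c| authorized_routes.contains(c) || lifted.contains(c));
            if all_authorized {
                lifted.insert(*helper);
                changed = true;
            }
        }
        // `lifted` only grows, so the loop terminates once nothing changes.
        if !changed {
            return lifted;
        }
    }
}
```

With a chain such as route to `mid_helper` to `leaf_helper`, the first pass lifts `mid_helper` and a later pass lifts `leaf_helper`, which is why a single pass is not enough.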
## Sink classification
The same call name can be safe on a local collection and dangerous on a database. The detector categorises each candidate sink before deciding whether to flag:
| Class | Examples | Default treatment |
|---|---|---|
| `InMemoryLocal` | `map.insert`, `set.insert`, `vec.push` on tracked local | Never a sink |
| `RealtimePublish` | `realtime.publish_to_group`, `pubsub.send` | Sink unless ownership is established for the channel scope |
| `OutboundNetwork` | `http.post`, `reqwest::Client::post` | Sink unless a sanitiser is on the path |
| `CacheCrossTenant` | `redis.set`, `memcached.set` with scoped keys | Sink unless tenant is checked |
| `DbMutation` | `db.insert`, `repo.save` with scoped IDs | Sink unless ownership is established |
| `DbCrossTenantRead` | `db.query` returning rows from a tenant scope | Sink unless ACL-join or tenant predicate is present |
Receiver type drives the classification when SSA type facts are available, so `client.send(...)` correctly resolves through the receiver's inferred type.
## What it can't catch
- **Ruby, Java, JavaScript, and TypeScript frameworks**, in practice. Rule scaffolding exists; benchmark coverage doesn't.
- **Type-system authorization.** A typestate pattern that makes unauthenticated handlers fail to compile (`fn endpoint(user: AuthenticatedUser<Admin>)`) is invisible. This is mostly fine because the type system already enforced the check, but the rule won't credit it.
- **Authorization performed only via macros** that the AST doesn't expose as a recognisable call.
- **Cross-async-boundary actor binding.** If the handler awaits `let user = require_auth(...).await?` and then spawns a task that uses `user.id` after a `tokio::spawn`, the spawn body is treated as a separate scope.
## The taint-based variant
A second rule, `rs.auth.missing_ownership_check.taint`, folds the same logic into the SSA/taint engine using the `Cap::UNAUTHORIZED_ID` capability (bit 12). Request-bound handler parameters seed `UNAUTHORIZED_ID` into taint state; ownership checks act as sanitizers that strip the cap; sinks that take scoped IDs require it absent.
This path is **off by default** while the standalone analyser carries the stable signal. To run the taint variant alongside the standalone analyser, enable it in config:
```toml
[scanner]
enable_auth_as_taint = true
```
Run them together; if both fire for the same site, treat it as the same finding (the taint variant carries fuller flow evidence).
## Tuning
### Add a project-specific authorization helper
```toml
[[analysis.languages.rust.rules]]
matchers = ["require_subscription", "ensure_paid_seat"]
kind = "sanitizer"
cap = "unauthorized_id"
```
The same rule recognised in the standalone analyser also strips `Cap::UNAUTHORIZED_ID` for the taint-based variant.
### Add a project-specific typed-extractor policy wrapper
```toml
[analysis.languages.rust.auth]
policy_guard_names = ["MyAppGuarded", "PolicyExtractor"]
```
Matched as last-segment + case-insensitive `starts_with` (so a single entry `"Guarded"` covers `Guarded`, `GuardedData`, `GuardedRoute`). Distinct from `login_guard_names` and `admin_guard_names`.
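As a rough illustration of the matching rule, the sketch below strips generic arguments, takes the last `::` segment of the outer wrapper, and compares it case-insensitively by prefix. `outer_wrapper_name` and `is_policy_guard` are hypothetical helpers written for this example; the real recogniser works on parsed types rather than raw text.

```rust
/// Returns the last path segment of the outermost wrapper, without generics.
/// Hypothetical helper operating on the textual form of a parameter type.
fn outer_wrapper_name(type_text: &str) -> &str {
    // Drop generic arguments: keep everything before the first '<'.
    let outer = type_text.split('<').next().unwrap_or(type_text);
    // Keep only the last `::`-separated segment.
    outer.rsplit("::").next().unwrap_or(outer).trim()
}

/// True when the outer wrapper name starts with any configured guard name,
/// compared case-insensitively.
fn is_policy_guard(type_text: &str, policy_guard_names: &[&str]) -> bool {
    let name = outer_wrapper_name(type_text).to_ascii_lowercase();
    policy_guard_names
        .iter()
        .any(|prefix| name.starts_with(prefix.to_ascii_lowercase().as_str()))
}
```

Because only the outer wrapper is inspected, a type like `Data<GuardedThing>` does not match `"Guarded"`, while `GuardedData<ActionPolicy<X>, Data<AuthController>>` does.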
### Recognised actor names
Recognised by default: `user.id`, `user.user_id`, `user.uid`, `session.user_id`, `current_user.id`, plus typed extractor parameters with `CurrentUser`, `SessionUser`, `AuthUser`, `Extension<...>` shapes. To add a custom binding pattern, file an issue or add a fixture; the heuristic lives in [`src/auth_analysis/extract/common.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/extract/common.rs) under the `*self_actor*` helpers (`collect_self_actor_binding`, `collect_typed_extractor_self_actor`, `is_self_actor_type_text`).
### Suppress
Inline:
```rust
db.insert(widget_id, value)?; // nyx:ignore rs.auth.missing_ownership_check
```
Or filter by severity / confidence in CI:
```bash
nyx scan . --severity ">=MEDIUM" --min-confidence medium
```
## In the UI
Auth findings render alongside taint findings in the [browser UI](serve.md). The flow visualiser shows the sink call, the actor reference (when one was found), and any helper-summary path the engine traversed; the How to fix panel mirrors the rule's recommendation.
<p align="center"><img src="assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: numbered source → call → sink walk with a How to fix panel and an inline evidence object" width="900"/></p>
## Benchmark corpus
The Rust auth corpus at [`tests/benchmark/corpus/rust/auth/`](https://github.com/elicpeter/nyx/tree/master/tests/benchmark/corpus/rust/auth/) covers the recognised authorization patterns, true-positive controls, typed-extractor guard injection, and the project-level web-framework gate (full-Cargo.toml fixtures under `safe_non_web_rust_project/` and `unsafe_actix_web_project_no_check/`). Per-row metrics live under the Rust auth row in `tests/benchmark/RESULTS.md`.

1
docs/changelog.md Normal file

@ -0,0 +1 @@
{{#include ../CHANGELOG.md}}

475
docs/cli.md Normal file

@ -0,0 +1,475 @@
# CLI Reference
## Global
```
nyx [COMMAND]
nyx --version
nyx --help
```
---
## `nyx scan`
Run a security scan on a directory.
```
nyx scan [PATH] [OPTIONS]
```
**PATH** defaults to `.` (current directory).
### Analysis Mode
| Flag | Default | Description |
|------|---------|-------------|
| `--mode <MODE>` | `full` | Analysis mode: `full`, `ast`, `cfg`, or `taint` |
| Mode | What runs |
|------|-----------|
| `full` | AST patterns + CFG structural analysis + taint analysis |
| `ast` | AST patterns only (fastest, no CFG or taint) |
| `cfg` / `taint` | CFG + taint analysis only (no AST patterns) |
**Deprecated aliases**: `--ast-only` (use `--mode ast`), `--cfg-only` (use `--mode cfg`), `--all-targets` (use `--mode full`).
### Index Control
| Flag | Default | Description |
|------|---------|-------------|
| `--index <MODE>` | `auto` | Index behavior: `auto`, `off`, or `rebuild` |
| Index Mode | Behavior |
|------------|----------|
| `auto` | Use existing index if available; build if missing |
| `off` | Skip indexing, scan filesystem directly |
| `rebuild` | Force rebuild index before scanning |
**Deprecated aliases**: `--no-index` (use `--index off`), `--rebuild-index` (use `--index rebuild`).
### Output
| Flag | Default | Description |
|------|---------|-------------|
| `-f, --format <FMT>` | `console` | Output format: `console`, `json`, or `sarif` |
| `--quiet` | off | Suppress status messages (stderr), including the Preview-tier banner for C/C++ scans |
| `--no-rank` | off | Disable attack-surface ranking |
| `--no-state` | off | Disable state-model analysis (resource lifecycle + auth state). Overrides `scanner.enable_state_analysis` |
### Profiles
| Flag | Default | Description |
|------|---------|-------------|
| `--profile <NAME>` | *(none)* | Apply a named scan profile. Built-ins: `quick`, `full`, `ci`, `taint_only`, `conservative_large_repo`. User-defined profiles override built-ins with the same name. CLI flags still take precedence over profile values |
### Filtering
| Flag | Default | Description |
|------|---------|-------------|
| `--severity <EXPR>` | *(none)* | Filter findings by severity |
| `--min-score <N>` | *(none)* | Drop findings with rank score below N |
| `--min-confidence <LEVEL>` | *(none)* | Drop findings below this confidence level (`low`, `medium`, `high`) |
| `--require-converged` | off | Drop findings whose engine provenance notes indicate widening (over-report) or analysis bail. Keeps `under-report` findings (emitted flow is still real). Intended for strict CI gates. |
| `--fail-on <SEV>` | *(none)* | Exit code 1 if any finding >= this severity |
| `--show-suppressed` | off | Show inline-suppressed findings (dimmed, tagged `[SUPPRESSED]`) |
| `--keep-nonprod-severity` | off | Don't downgrade severity for test/vendor paths |
| `--all` | off | Disable category filtering, rollups, and LOW budgets. Shows everything |
| `--include-quality` | off | Include Quality-category findings (hidden by default) |
| `--max-low <N>` | `20` | Maximum total LOW findings to show |
| `--max-low-per-file <N>` | `1` | Maximum LOW findings per file |
| `--max-low-per-rule <N>` | `10` | Maximum LOW findings per rule |
| `--rollup-examples <N>` | `5` | Number of example locations in rollup findings |
| `--show-instances <RULE>` | *(none)* | Expand all instances of a specific rule (bypass rollup) |
`nyx scan` automatically reads `.nyx/triage.json` from the scan root when the
file exists. Terminal triage states written by `nyx serve` (`false_positive`,
`accepted_risk`, `suppressed`, and `fixed`) are hidden from CLI output and do
not trigger `--fail-on` by default. Use `--show-suppressed` to include them in
console, JSON, or SARIF output with their `triage_state` and optional
`triage_note`.
**Severity expression formats**:
```bash
--severity HIGH # Only high
--severity "HIGH,MEDIUM" # High or medium
--severity ">=MEDIUM" # Medium and above (high + medium)
--severity ">= low" # All severities (case-insensitive)
```
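For readers who want the semantics spelled out, here is a minimal sketch of how such expressions could be interpreted. It covers only the four shapes shown above (single level, comma list, `>=` floor, case-insensitive with optional whitespace); `Severity`, `parse_level`, and `parse_severity_expr` are hypothetical names, and Nyx's actual parser may accept more forms.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
}

/// Parses one level name, ignoring case and surrounding whitespace.
fn parse_level(s: &str) -> Option<Severity> {
    match s.trim().to_ascii_lowercase().as_str() {
        "low" => Some(Severity::Low),
        "medium" => Some(Severity::Medium),
        "high" => Some(Severity::High),
        _ => None,
    }
}

/// Returns the severities an expression admits, or `None` if it does not parse.
fn parse_severity_expr(expr: &str) -> Option<Vec<Severity>> {
    let expr = expr.trim();
    if let Some(rest) = expr.strip_prefix(">=") {
        // ">=LEVEL": every severity at or above the floor.
        let floor = parse_level(rest)?;
        return Some(
            [Severity::Low, Severity::Medium, Severity::High]
                .iter()
                .copied()
                .filter(|s| *s >= floor)
                .collect(),
        );
    }
    // "A,B,...": an explicit list; any unknown name rejects the whole expression.
    expr.split(',').map(parse_level).collect()
}
```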
**Deprecated aliases**: `--high-only` (use `--severity HIGH`), `--include-nonprod` (use `--keep-nonprod-severity`).
`--fail-on` returns a non-zero exit code when the threshold trips, so CI jobs fail without further wiring:
<p align="center"><img src="assets/screenshots/docs/cli-failon.png" alt="nyx scan with --fail-on HIGH against a small fixture: three HIGH taint findings printed, followed by exit=1 from the shell" width="900"/></p>
Quality-category and rollup-prone Low findings are filtered down by default. The footer tells you exactly what got dropped and which knob to turn:
<p align="center"><img src="assets/screenshots/docs/cli-rollup-tail.png" alt="nyx scan tail: warning '*' generated 57 issues; Suppressed 92 LOW/Quality findings; Active filters max_low=20, max_low_per_file=1, max_low_per_rule=10; Use --include-quality, --max-low, or --all to adjust" width="900"/></p>
### Analysis Engine Toggles
Override the corresponding `[analysis.engine]` values in `nyx.conf` for a single run. Toggles default **on** unless the table notes otherwise; pass the `--no-*` variant to disable.
| Pair | Config field | Effect when disabled |
|------|---|---|
| `--constraint-solving` / `--no-constraint-solving` | `constraint_solving` | Skip path-constraint solving; infeasible paths no longer pruned |
| `--abstract-interp` / `--no-abstract-interp` | `abstract_interpretation` | Skip interval / string / bit abstract domains |
| `--context-sensitive` / `--no-context-sensitive` | `context_sensitive` | Treat intra-file callees insensitively (summary-only) |
| `--symex` / `--no-symex` | `symex.enabled` | Skip the symex pipeline; no symbolic verdicts or witnesses |
| `--cross-file-symex` / `--no-cross-file-symex` | `symex.cross_file` | Skip extracting / consulting cross-file SSA bodies |
| `--symex-interproc` / `--no-symex-interproc` | `symex.interprocedural` | Cap symex frame stack at the entry function |
| `--smt` / `--no-smt` | `symex.smt` | Skip the SMT backend (still a no-op without the `smt` feature) |
| `--backwards-analysis` / `--no-backwards-analysis` | `backwards_analysis` | Skip the demand-driven backwards taint walk from sinks (default **off**) |
| `--parse-timeout-ms <N>` | `parse_timeout_ms` | Per-file tree-sitter parse timeout (ms); `0` disables the cap |
### Lattice-width Caps
Two caps bound the width of taint origin sets and points-to sets per SSA value. When a set would exceed the cap, entries are truncated deterministically and an engine note (`OriginsTruncated` / `PointsToTruncated`) is recorded on affected findings so you can see when precision was lost.
| Flag | Default | Description |
|------|---------|-------------|
| `--max-origins <N>` | `32` | Max taint origins retained per lattice value. Raise on very wide codebases where truncation is observed; lower only when lattice width is a measured bottleneck. Also set via `NYX_MAX_ORIGINS` |
| `--max-pointsto <N>` | `32` | Max abstract heap objects retained per points-to set. Raise on factory-heavy codebases where truncation is observed. Also set via `NYX_MAX_POINTSTO` |
See [configuration.md](configuration.md#analysisengine) for the full schema.
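A minimal sketch of capped insertion with a recorded note follows. The docs state only that truncation is deterministic; the choice to drop the largest element of an ordered set is an assumption made for this example, and `insert_capped`, the `u32` origin ids, and the `EngineNote` enum are hypothetical stand-ins.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the engine note attached to affected findings.
#[derive(Debug, PartialEq)]
enum EngineNote {
    OriginsTruncated,
}

/// Inserts `new_origin`, then trims the set back to `cap` entries.
/// Assumption: determinism comes from the set's ordering, so the largest
/// ids are dropped first regardless of insertion order.
fn insert_capped(
    origins: &mut BTreeSet<u32>,
    new_origin: u32,
    cap: usize,
    notes: &mut Vec<EngineNote>,
) {
    origins.insert(new_origin);
    while origins.len() > cap {
        // Copy the largest element out so the borrow ends before removal.
        let last = *origins.iter().next_back().unwrap();
        origins.remove(&last);
        // Record the precision loss once per value.
        if !notes.contains(&EngineNote::OriginsTruncated) {
            notes.push(EngineNote::OriginsTruncated);
        }
    }
}
```

Whatever the real eviction rule is, the useful property is the same: the surviving set does not depend on the order in which origins were discovered, so repeated scans agree.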
### Engine-Depth Profile
Individual engine toggles are fine-grained but hard to remember in combination. The `--engine-profile` shortcut sets the whole stack in one shot, and individual flags are layered on top after the profile is applied.
| Profile | Backwards | Symex | Abstract-interp | Context-sensitive |
|---------|-----------|-------|-----------------|-------------------|
| `fast` | off | off | off | off |
| `balanced` (default) | off | off | on | on |
| `deep` | on | on (cross-file + interprocedural) | on | on |
All three profiles build the AST, CFG, and SSA lattice and run forward taint; the columns above show which additional analyses each profile enables. SMT (`symex.smt`) is always off unless Nyx was built with `--features smt`.
Individual flags override the profile. For example, `--engine-profile fast --backwards-analysis` runs the fast stack but with backwards analysis on.
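The layering can be pictured as "profile first, then per-flag overrides". The sketch below encodes the profile table above and applies optional overrides on top; `EngineConfig`, `Overrides`, `profile_defaults`, and `resolve` are hypothetical names for illustration, and the sub-options of symex (cross-file, interprocedural, SMT) are left out.

```rust
/// The four analyses toggled by `--engine-profile`, per the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct EngineConfig {
    backwards: bool,
    symex: bool,
    abstract_interp: bool,
    context_sensitive: bool,
}

/// Per-flag CLI overrides; `None` means the flag was not passed.
#[derive(Debug, Default)]
struct Overrides {
    backwards: Option<bool>,
    symex: Option<bool>,
    abstract_interp: Option<bool>,
    context_sensitive: Option<bool>,
}

/// Looks up the base configuration for a profile name.
fn profile_defaults(name: &str) -> Option<EngineConfig> {
    let (backwards, symex, abstract_interp, context_sensitive) = match name {
        "fast" => (false, false, false, false),
        "balanced" => (false, false, true, true),
        "deep" => (true, true, true, true),
        _ => return None,
    };
    Some(EngineConfig { backwards, symex, abstract_interp, context_sensitive })
}

/// Applies individual flags on top of the profile; explicit flags win.
fn resolve(base: EngineConfig, cli: &Overrides) -> EngineConfig {
    EngineConfig {
        backwards: cli.backwards.unwrap_or(base.backwards),
        symex: cli.symex.unwrap_or(base.symex),
        abstract_interp: cli.abstract_interp.unwrap_or(base.abstract_interp),
        context_sensitive: cli.context_sensitive.unwrap_or(base.context_sensitive),
    }
}
```

In this model, `--engine-profile fast --backwards-analysis` corresponds to resolving the `fast` defaults with `backwards: Some(true)`, which leaves the other three analyses off.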
### Explain Effective Engine
`--explain-engine` prints the resolved engine configuration (profile + config + CLI overrides + env-var fallbacks) to stdout and exits without scanning. Useful for sanity-checking a CI invocation.
```bash
nyx scan --engine-profile deep --no-smt --explain-engine
```
<p align="center"><img src="assets/screenshots/docs/cli-explain-engine.png" alt="nyx scan --engine-profile deep --explain-engine output: resolved config showing every analysis pass, its current state, and the CLI flag/env var that controls it" width="900"/></p>
### Dynamic verification
Available in default builds, or in custom builds with `--features dynamic`. See [dynamic.md](dynamic.md) for the full pipeline and verdict semantics.
| Flag | Default | Description |
|------|---------|-------------|
| `--verify` | on | Enable dynamic verification (default when built with `dynamic`). Conflicts with `--no-verify` |
| `--no-verify` | off | Skip verification for this run. Useful for fast static-only scans without editing config |
| `--verify-all-confidence` | off | Also verify findings that fall below the default `Confidence >= Medium` threshold. Slower; intended for payload tuning |
| `--backend <BACKEND>` | `auto` | Sandbox backend: `auto` (docker if available, else process), `docker` (required), `process` (in-process runner) |
| `--unsafe-sandbox` | off | Force the process backend. Equivalent to `--backend process`. Cannot combine with `--backend docker` |
| `--harden <PROFILE>` | `standard` | Process-backend lockdown: `standard` (no-new-privs + rlimit on Linux) or `strict` (namespaces + chroot + seccomp on Linux; `sandbox-exec` on macOS) |
| `--verbose` | off | Flush the per-finding `VerifyTrace` to stderr after each verdict. Same stream that lands in `expected/trace.jsonl` in the repro bundle |
### Baseline / patch validation
| Flag | Default | Description |
|------|---------|-------------|
| `--baseline <FILE>` | *(none)* | Read a prior scan's JSON (or a stripped `.nyx/baseline.json`) and diff it against this scan on `stable_hash`. Reports `New` / `Resolved` / `FlippedConfirmed` / `FlippedNotConfirmed` transitions |
| `--baseline-write <FILE>` | *(none)* | After scanning, write a stripped baseline (only `stable_hash`, `dynamic_verdict`, `severity`, `path`, `rule_id`; no source). Safe to commit |
| `--gate <GATE>` | *(none)* | CI gate to enforce when `--baseline` is active. `no-new-confirmed` exits 2 on any new Confirmed finding; `resolve-all-confirmed` exits 2 if any baseline-Confirmed finding is not fully resolved |
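The diff on `stable_hash` can be sketched as a comparison of two keyed maps. This is an illustration under stated assumptions, not the shipped logic: each finding is reduced to "is its dynamic verdict Confirmed", `FlippedConfirmed` / `FlippedNotConfirmed` are read as "the verdict changed to / away from Confirmed", and `diff_baseline` is a hypothetical function.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Transition {
    New,
    Resolved,
    FlippedConfirmed,
    FlippedNotConfirmed,
}

/// Both maps go from `stable_hash` to "verdict is Confirmed".
/// Findings present in both scans with an unchanged verdict produce no entry.
fn diff_baseline(
    baseline: &BTreeMap<String, bool>,
    current: &BTreeMap<String, bool>,
) -> Vec<(String, Transition)> {
    let mut out = Vec::new();
    for (hash, &confirmed) in current {
        match baseline.get(hash) {
            // Not in the baseline: a new finding.
            None => out.push((hash.clone(), Transition::New)),
            // In both scans, but the verdict changed.
            Some(&was) if was != confirmed => {
                let t = if confirmed {
                    Transition::FlippedConfirmed
                } else {
                    Transition::FlippedNotConfirmed
                };
                out.push((hash.clone(), t));
            }
            Some(_) => {}
        }
    }
    // In the baseline but missing from this scan: resolved.
    for hash in baseline.keys() {
        if !current.contains_key(hash) {
            out.push((hash.clone(), Transition::Resolved));
        }
    }
    out
}
```

Under this reading, a `no-new-confirmed` gate would inspect the `New` entries whose verdict is Confirmed, and `resolve-all-confirmed` would check that every baseline-Confirmed hash appears as `Resolved`.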
### Examples
```bash
# Basic scan
nyx scan
# Scan specific path, JSON output
nyx scan ./server --format json
# CI gate: fail on medium+, SARIF output
nyx scan . --format sarif --fail-on medium > results.sarif
# Fast AST-only scan, no index
nyx scan . --mode ast --index off
# High-severity only, quiet mode
nyx scan . --severity HIGH --quiet
# Only findings scoring 50 or above
nyx scan . --min-score 50
# Only medium+ confidence findings
nyx scan . --min-confidence medium
# Show everything (no filtering, no rollups)
nyx scan . --all
# Include quality findings but keep rollups and budgets
nyx scan . --include-quality
# See all unwrap findings expanded
nyx scan . --include-quality --show-instances rs.quality.unwrap
# Allow more LOW findings
nyx scan . --max-low 50 --max-low-per-file 5
```
---
## `nyx repro`
Replay a dynamic repro bundle for a confirmed finding.
```
nyx repro (--finding <ID> | --spec-hash <HASH> | --bundle <DIR>) [OPTIONS]
```
Nyx writes repro bundles under the platform cache directory and keys them by
`spec_hash`. The browser UI and scan output show `finding_id`, so
`--finding` scans cached bundle manifests and replays the newest match.
| Flag | Description |
|------|-------------|
| `--finding <ID>` | Find the newest cached bundle whose manifest carries this stable finding ID |
| `--spec-hash <HASH>` | Replay an exact cache bundle by spec hash |
| `--bundle <DIR>` | Replay an explicit bundle directory |
| `--docker` | Run the bundle's Docker replay path (`./reproduce.sh --docker`) |
| `--print-path` | Print the resolved bundle path and exit without replaying |
| `--list` | With `--finding`, list all matching cached bundles newest first |
Examples:
```bash
nyx repro --finding b9caa35df2213040
nyx repro --finding b9caa35df2213040 --docker
nyx repro --finding b9caa35df2213040 --print-path
nyx repro --spec-hash 8bca7f8e0311d6c9
nyx repro --bundle /path/to/repro/8bca7f8e0311d6c9
```
Exit codes mirror `reproduce.sh`: `0` pass, `1` replay mismatch, `2` Docker
unavailable, `3` process-backend toolchain mismatch. Any other script exit is
passed through.
---
## `nyx index`
Manage the SQLite file index.
### `nyx index build`
```
nyx index build [PATH] [--force]
```
Build or update the index for the given path (default: `.`).
| Flag | Description |
|------|-------------|
| `-f, --force` | Force full rebuild, ignoring cached file hashes |
### `nyx index status`
```
nyx index status [PATH]
```
Display index statistics (file count, size, last modified) for the given path.
<p align="center"><img src="assets/screenshots/docs/cli-idxstatus.png" alt="nyx index status output: project name, index path under the platform config dir, exists/size/modified fields" width="900"/></p>
---
## `nyx list`
```
nyx list [-v]
```
List all indexed projects.
| Flag | Description |
|------|-------------|
| `-v, --verbose` | Show detailed information per project |
---
## `nyx clean`
```
nyx clean [PROJECT] [--all]
```
Remove index data.
| Argument/Flag | Description |
|---------------|-------------|
| `PROJECT` | Project name or path to clean |
| `--all` | Clean all indexed projects |
---
## `nyx surface`
Print the project's attack-surface map.
```
nyx surface [PATH] [--format <FMT>] [--build]
```
Loads the `SurfaceMap` persisted by the most recent indexed scan when available; otherwise runs the per-language framework probes against the on-disk source to produce an entry-points-only map. Pass `--build` to force a full inline build (pass-1 summary extraction + call-graph construction) on an unscanned project, which adds `DataStore` / `ExternalService` / `DangerousLocal` nodes the entry-points-only fallback omits.
| Flag | Default | Description |
|------|---------|-------------|
| `--format <FMT>` | `text` | Output format: `text` (indented tree), `json` (canonical SurfaceMap), `dot` (Graphviz source), or `svg` (spawns `dot` locally) |
| `--build` | off | Force a full SurfaceMap build inline when no indexed scan exists. Same cost as `nyx index build` |
Pipe `dot` output through `dot -Tsvg` for a renderable graph, or use `--format svg` for a one-step render when graphviz is installed.
---
## `nyx serve`
Start the local browser UI for browsing scan results.
```
nyx serve [PATH] [OPTIONS]
```
**PATH** defaults to `.` (current directory). The server binds to a loopback address only and refuses non-loopback hosts at startup.
| Flag | Default | Description |
|------|---------|-------------|
| `-p, --port <PORT>` | *(from config)* | Port to bind to (overrides `[server].port`) |
| `--host <HOST>` | *(from config)* | Host to bind to (overrides `[server].host`) |
| `--no-browser` | off | Skip opening the browser automatically |
See [serve.md](serve.md) for the UI tour, route map, and CSRF / host-header behaviour.
---
## `nyx verify-feedback`
Record a correction or confirmation against a dynamic-verifier verdict. Requires `--features dynamic`.
```
nyx verify-feedback <FINDING_ID> [--wrong <REASON> | --right] [--upload]
```
| Argument/Flag | Description |
|---------------|-------------|
| `FINDING_ID` | Stable 16-char hex id shown in `nyx scan --verify` output |
| `--wrong <REASON>` | Mark the verdict wrong and record the reason. Conflicts with `--right` |
| `--right` | Confirm the verdict. Conflicts with `--wrong` |
| `--upload` | Reserved; uploading to Nyx telemetry is not yet implemented |
Feedback is written to the local telemetry log under the platform cache dir.
---
## `nyx config`
Manage configuration.
### `nyx config show`
Print the effective merged configuration as TOML. Useful for sanity-checking what the scanner is actually using after `nyx.conf` and `nyx.local` merge:
<p align="center"><img src="assets/screenshots/docs/cli-configshow.png" alt="nyx config show output: TOML dump of the merged scanner config showing [scanner] mode/min_severity/excluded_extensions/excluded_directories, [database] settings, and resolved engine toggles" width="900"/></p>
### `nyx config path`
Print the configuration directory path.
### `nyx config add-rule`
```
nyx config add-rule --lang <LANG> --matcher <MATCHER> --kind <KIND> --cap <CAP>
```
Add a custom taint rule. Written to `nyx.local`.
| Flag | Values |
|------|--------|
| `--lang` | `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby` |
| `--matcher` | Function or property name to match |
| `--kind` | `source`, `sanitizer`, `sink` |
| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `data_exfil`, `ldap_injection`, `xpath_injection`, `header_injection`, `open_redirect`, `ssti`, `xxe`, `prototype_pollution`, `all` |
### `nyx config add-terminator`
```
nyx config add-terminator --lang <LANG> --name <NAME>
```
Add a terminator function (e.g. `process.exit`). Written to `nyx.local`.
---
## `nyx rules`
Browse the built-in rule registry from the terminal. Same dataset the dashboard's Rules page reads from: cap-class entries (one per `Cap` with a canonical rule id), per-language label rules (sink / source / sanitizer), gated sinks, and any custom rules from your config.
### `nyx rules list`
```
nyx rules list [--lang <SLUG>] [--kind <KIND>] [--class-only|--no-class] [--json]
```
| Flag | Values |
|------|--------|
| `--lang` | Language slug (`javascript`, `typescript`, `python`, `java`, `php`, `go`, `ruby`, `rust`, `c`, `cpp`). Cap-class entries (`language = "all"`) still surface alongside any language filter unless `--no-class` is set. |
| `--kind` | `class` (cap-class entry), `source`, `sink`, `sanitizer` |
| `--class-only` | Show only the cap-class registry entries, suppressing per-language label rules and gated sinks. |
| `--no-class` | Suppress cap-class registry entries, show only per-language label rules and gated sinks. Conflicts with `--class-only`. |
| `--json` | Emit JSON instead of the human-readable table. Schema matches the `/api/rules` response. |
Examples:
```bash
# Browse the seven new vulnerability classes
nyx rules list --class-only
# All Java sinks
nyx rules list --lang java --kind sink
# JSON output for scripted filtering
nyx rules list --json | jq '.[] | select(.cap == "ldap_injection")'
```
The `enabled` column reflects the `analysis.disabled_rules` overlay from your config, so a rule disabled in `nyx.local` shows up here too. Custom rules added via `nyx config add-rule` appear at the end with `is_custom: true`.
---
## Exit codes
See [output.md](output.md#exit-codes). Summary: `0` on success (including findings without `--fail-on`), `1` when `--fail-on` trips, `2` when a `--gate` check fails, non-zero on scan errors.
---
## Environment variables
Runtime behaviour:
| Variable | Description |
|----------|-------------|
| `RUST_LOG` | Set tracing verbosity (e.g. `RUST_LOG=debug nyx scan .`) |
| `NO_COLOR` | Disable ANSI color output |
Engine toggles (legacy, still honored; prefer CLI flags or `[analysis.engine]` config):
| Variable | Matches |
|---|---|
| `NYX_CONSTRAINT` | `--constraint-solving` |
| `NYX_ABSTRACT_INTERP` | `--abstract-interp` |
| `NYX_CONTEXT_SENSITIVE` | `--context-sensitive` |
| `NYX_SYMEX`, `NYX_CROSS_FILE_SYMEX`, `NYX_SYMEX_INTERPROC` | `--symex` and friends |
| `NYX_SMT` | `--smt` (no-op without the `smt` feature) |
| `NYX_BACKWARDS` | `--backwards-analysis` |
| `NYX_PARSE_TIMEOUT_MS` | `--parse-timeout-ms` |
| `NYX_MAX_ORIGINS`, `NYX_MAX_POINTSTO` | `--max-origins`, `--max-pointsto` |

471
docs/configuration.md Normal file

@ -0,0 +1,471 @@
# Configuration
Nyx uses TOML configuration files. A default config is auto-generated on first run. If you'd rather edit settings and rules from the browser, the [Config page in `nyx serve`](serve.md#config) is a live editor that writes back to `nyx.local`:
<p align="center"><img src="assets/screenshots/docs/serve-config.png" alt="Nyx config page: General settings, Triage Sync toggle, Sources panel with language/matcher/capability dropdowns and a per-language matcher table" width="900"/></p>
## File Locations
| Platform | Directory |
|----------|-----------|
| Linux | `~/.config/nyx/` |
| macOS | `~/Library/Application Support/nyx/` |
| Windows | `%APPDATA%\elicpeter\nyx\config\` |
Run `nyx config path` to see the exact directory on your system.
## File Precedence
1. **`nyx.conf`**: default config (auto-created from built-in template on first run)
2. **`nyx.local`**: user overrides (loaded on top of defaults)
Both files are optional. CLI flags take precedence over both.
## Merge Strategy
| Type | Behavior |
|------|----------|
| Scalars (`mode`, `min_severity`, booleans) | User value wins |
| Arrays (`excluded_extensions`, `excluded_directories`, `excluded_files`) | Union + deduplicate |
| Analysis rules | Per-language union with deduplication |
| Profiles | User profile with same name fully replaces built-in |
| Server / Runs | User value wins (full section override) |
Example:
```toml
# nyx.conf (default):
excluded_extensions = ["jpg", "png", "exe"]
# nyx.local (user):
excluded_extensions = ["foo", "jpg"]
# Effective result:
# ["exe", "foo", "jpg", "png"] (sorted, deduped union)
```
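The scalar and array rows of the merge table can be sketched in a few lines of Python. This is an illustrative model only, not Nyx's implementation: `merge_config` is a hypothetical helper, and it does not model the full-section override used for Server / Runs or the whole-profile replacement.

```python
def merge_config(default: dict, user: dict) -> dict:
    """Layer user overrides on top of defaults (hypothetical model of the table above)."""
    merged = dict(default)
    for key, user_value in user.items():
        base_value = merged.get(key)
        if isinstance(base_value, list) and isinstance(user_value, list):
            # Arrays: union, deduplicate, and sort for a stable result.
            merged[key] = sorted(set(base_value) | set(user_value))
        elif isinstance(base_value, dict) and isinstance(user_value, dict):
            # Nested tables such as [scanner] are merged field by field.
            merged[key] = merge_config(base_value, user_value)
        else:
            # Scalars: the user value wins.
            merged[key] = user_value
    return merged


default = {"scanner": {"mode": "full", "excluded_extensions": ["jpg", "png", "exe"]}}
user = {"scanner": {"min_severity": "Medium", "excluded_extensions": ["foo", "jpg"]}}
effective = merge_config(default, user)
print(effective["scanner"]["excluded_extensions"])
```

The printed list matches the effective result in the TOML example: the two arrays are unioned, duplicates collapse, and the output is sorted.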
---
## Full Schema
### `[scanner]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `mode` | `"full"` \| `"ast"` \| `"cfg"` \| `"taint"` | `"full"` | Analysis mode |
| `min_severity` | `"Low"` \| `"Medium"` \| `"High"` | `"Low"` | Minimum severity to report |
| `max_file_size_mb` | int \| null | 16 | Max file size in MiB; null = unlimited. Default is a safe ceiling for untrusted repos; lift explicitly when scanning trusted codebases with large generated files |
| `excluded_extensions` | [string] | `["jpg", "png", "gif", "mp4", ...]` | File extensions to skip |
| `excluded_directories` | [string] | `["node_modules", ".git", "target", ...]` | Directories to skip |
| `excluded_files` | [string] | `[]` | Specific files to skip |
| `read_global_ignore` | bool | `false` | Honor global ignore file (RESERVED) |
| `read_vcsignore` | bool | `true` | Honor `.gitignore` / `.hgignore` |
| `require_git_to_read_vcsignore` | bool | `true` | Require `.git` dir to apply gitignore |
| `one_file_system` | bool | `false` | Don't cross filesystem boundaries |
| `follow_symlinks` | bool | `false` | Follow symbolic links |
| `scan_hidden_files` | bool | `false` | Scan dot-files |
| `include_nonprod` | bool | `false` | Keep original severity for test/vendor paths |
| `enable_state_analysis` | bool | `true` | Enable resource lifecycle + auth state analysis. Detects use-after-close, double-close, resource leaks (per-function scope), and unauthenticated access. Requires `mode = "full"` or `mode = "taint"`. |
| `enable_auth_analysis` | bool | `true` | Enable auth-state analysis within the state engine. When false, only resource lifecycle findings (leak, use-after-close, double-close) are produced. |
| `enable_panic_recovery` | bool | `false` | Catch per-file analysis panics as warnings and continue. When false, a panic aborts the scan, preserving the loud-fail behaviour for users debugging engine bugs. |
| `enable_auth_as_taint` | bool | `false` | Fold auth analysis into the SSA/taint engine via `Cap::UNAUTHORIZED_ID`. Off while the standalone path still carries stable detection. |
| `verify` | bool | `true` | Run dynamic verification on each `Confidence >= Medium` finding after the static pass. Included in default builds; custom `--no-default-features` builds need `--features dynamic`. CLI overrides: `--verify` / `--no-verify`. |
| `verify_all_confidence` | bool | `false` | Extend dynamic verification to findings below `Confidence::Medium`. Intended for corpus-building, not production scans. CLI: `--verify-all-confidence`. |
| `verify_backend` | string | `"auto"` | Sandbox backend for dynamic verification. `"auto"` picks docker when available else process; `"docker"` requires docker; `"process"` runs in-process (same as `--unsafe-sandbox`). |
| `harden_profile` | string | `"standard"` | Process-backend hardening profile. `"standard"` engages `PR_SET_NO_NEW_PRIVS` + `setrlimit(RLIMIT_AS)` on Linux; `"strict"` adds namespace unshare, chroot to workdir, and a default-deny seccomp filter on Linux, plus `sandbox-exec` wrapping on macOS keyed off the finding's expected cap. |
### `[database]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `path` | string | `""` | Custom SQLite DB path; empty = platform default (RESERVED) |
| `auto_cleanup_days` | int | `30` | Days to keep DB files (RESERVED) |
| `max_db_size_mb` | int | `1024` | Maximum DB size in MiB (RESERVED) |
| `vacuum_on_startup` | bool | `false` | Run VACUUM before indexed scans |
### `[output]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `default_format` | `"console"` \| `"json"` \| `"sarif"` | `"console"` | Default output format (used when `--format` is not specified) |
| `quiet` | bool | `false` | Suppress status messages |
| `max_results` | int \| null | null | Cap number of findings; null = unlimited |
| `attack_surface_ranking` | bool | `true` | Enable attack-surface ranking |
| `min_score` | int \| null | null | Minimum rank score to include; null = no minimum |
| `min_confidence` | string \| null | null | Minimum confidence level (`"low"`, `"medium"`, `"high"`); null = no minimum |
| `include_quality` | bool | `false` | Include Quality-category findings (hidden by default) |
| `show_all` | bool | `false` | Disable category filtering, rollups, and LOW budgets |
| `max_low` | int | `20` | Maximum total LOW findings to show (rollups count as 1) |
| `max_low_per_file` | int | `1` | Maximum LOW findings per file (rollups count as 1) |
| `max_low_per_rule` | int | `10` | Maximum LOW findings per rule (rollups count as 1) |
| `rollup_examples` | int | `5` | Number of example locations stored in rollup findings |
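
To make the interaction of the three LOW budgets concrete, here is a rough Python sketch. It assumes findings arrive already ranked and that each is a dict with `severity`, `path`, and `rule` keys (hypothetical field names); rollups and `show_all` are not modelled, so treat it as an approximation of the behaviour described in the table rather than the real filter.

```python
from collections import Counter


def apply_low_budgets(findings, max_low=20, max_low_per_file=1, max_low_per_rule=10):
    """Keep every non-LOW finding; admit LOW findings until any budget is exhausted."""
    kept = []
    total_low = 0
    per_file = Counter()
    per_rule = Counter()
    for finding in findings:
        if finding["severity"] != "Low":
            kept.append(finding)
            continue
        over_budget = (
            total_low >= max_low
            or per_file[finding["path"]] >= max_low_per_file
            or per_rule[finding["rule"]] >= max_low_per_rule
        )
        if over_budget:
            continue  # dropped: one of the three caps is already full
        kept.append(finding)
        total_low += 1
        per_file[finding["path"]] += 1
        per_rule[finding["rule"]] += 1
    return kept


findings = [
    {"severity": "High", "path": "a.py", "rule": "r1"},
    {"severity": "Low", "path": "a.py", "rule": "r2"},
    {"severity": "Low", "path": "a.py", "rule": "r3"},
    {"severity": "Low", "path": "b.py", "rule": "r2"},
]
kept = apply_low_budgets(findings)
```

With the default `max_low_per_file = 1`, the second LOW finding in `a.py` is dropped while the one in `b.py` survives.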
### `[performance]`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max_depth` | int \| null | null | Max filesystem traversal depth; null = unlimited |
| `min_depth` | int \| null | null | Min depth for reported entries (RESERVED) |
| `prune` | bool | `false` | Stop traversing into matching directories (RESERVED) |
| `worker_threads` | int \| null | null | Worker thread count; null/0 = auto-detect |
| `batch_size` | int | `100` | Files per index batch |
| `channel_multiplier` | int | `4` | Channel capacity = threads × multiplier |
| `rayon_thread_stack_size` | int | `8388608` | Rayon thread stack size in bytes (8 MiB) |
| `scan_timeout_secs` | int \| null | null | Per-file timeout in seconds (RESERVED) |
| `memory_limit_mb` | int | `512` | Max memory in MiB (RESERVED) |
### `[server]`
Configuration for the local web UI (`nyx serve`).
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `true` | Whether the serve command is enabled |
| `host` | string | `"127.0.0.1"` | Host to bind to (localhost by default) |
| `port` | int | `9700` | Port for the web UI |
| `open_browser` | bool | `true` | Open browser automatically on serve |
| `auto_reload` | bool | `true` | Auto-reload UI when scan results change |
| `persist_runs` | bool | `true` | Persist scan runs for history view |
| `max_saved_runs` | int | `50` | Maximum number of saved runs |
| `triage_sync` | bool | `true` | Auto-sync triage decisions to `.nyx/triage.json` in the project root so changes can be committed to git. |
### `[runs]`
Configuration for scan run persistence and history.
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `persist` | bool | `false` | Persist scan run history to disk |
| `max_runs` | int | `100` | Maximum number of runs to keep |
| `save_logs` | bool | `false` | Save scan logs with each run |
| `save_stdout` | bool | `false` | Save stdout capture with each run |
| `save_code_snippets` | bool | `true` | Save code snippets in findings |
### `[profiles.<name>]`
Named scan presets that override scan-related config. Activate with `--profile <name>`.
All fields are optional; omitted fields inherit from the base config.
| Field | Type | Description |
|-------|------|-------------|
| `mode` | string | Analysis mode |
| `min_severity` | string | Minimum severity |
| `max_file_size_mb` | int | Max file size in MiB |
| `include_nonprod` | bool | Keep original severity for test/vendor |
| `enable_state_analysis` | bool | Enable state analysis |
| `default_format` | string | Output format |
| `quiet` | bool | Suppress status output |
| `attack_surface_ranking` | bool | Enable ranking |
| `max_results` | int | Max findings |
| `min_score` | int | Min rank score |
| `show_all` | bool | Show all findings |
| `include_quality` | bool | Include quality findings |
| `worker_threads` | int | Worker thread count |
| `max_depth` | int | Max traversal depth |
**Built-in profiles:**
| Name | Description |
|------|-------------|
| `quick` | AST-only, medium+ severity |
| `full` | Full analysis with state analysis enabled |
| `ci` | Full analysis, medium+ severity, quiet, SARIF output |
| `taint_only` | Taint analysis only |
| `conservative_large_repo` | AST-only, high severity, 5 MiB file limit, depth 10 |
A user-defined profile with the same name as a built-in fully replaces it.
### `[analysis.engine]`
Release-grade switches for the optional analysis passes. Each toggle has a
matching CLI flag (pair of `--foo` / `--no-foo`) that overrides the config
value for a single run. These used to be `NYX_*` environment variables
(`NYX_CONSTRAINT`, `NYX_ABSTRACT_INTERP`, `NYX_SYMEX`, `NYX_CROSS_FILE_SYMEX`,
`NYX_SYMEX_INTERPROC`, `NYX_CONTEXT_SENSITIVE`, `NYX_BACKWARDS`,
`NYX_PARSE_TIMEOUT_MS`, `NYX_SMT`); those env vars are still honored as a
fallback default when nyx is used as a library (no CLI entry point), but the
config/CLI surface is the stable path.
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `constraint_solving` | bool | `true` | Path-constraint solving (prunes infeasible paths in taint) |
| `abstract_interpretation` | bool | `true` | Interval / string / bit abstract domains carried through the SSA worklist |
| `context_sensitive` | bool | `true` | k=1 context-sensitive callee inlining for intra-file calls |
| `backwards_analysis` | bool | `false` | Demand-driven backwards taint walk from sinks (adds scan time; default off) |
| `parse_timeout_ms` | int | `10000` | Per-file tree-sitter parse timeout; `0` disables the cap |
| `max_origins` | int | `32` | Maximum taint origins retained per lattice value. Excess origins are dropped deterministically (sorted by source location) and an `OriginsTruncated` engine note is recorded. CLI: `--max-origins`. |
| `max_pointsto` | int | `32` | Maximum abstract heap objects retained per intra-procedural points-to set. Excess objects are dropped and a `PointsToTruncated` engine note is recorded. CLI: `--max-pointsto`. |
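
The deterministic truncation described for `max_origins` can be illustrated as follows. This is a sketch under assumptions: origins are modelled as `(file, line, column)` tuples and `truncate_origins` is a hypothetical name; only the sort-then-drop behaviour and the `OriginsTruncated` note come from the table above.

```python
def truncate_origins(origins, max_origins=32):
    """Sort origins by source location, keep the first max_origins, and note any truncation."""
    ordered = sorted(set(origins))  # tuples compare by file, then line, then column
    kept = ordered[:max_origins]
    notes = ["OriginsTruncated"] if len(ordered) > max_origins else []
    return kept, notes


a = [("b.rs", 4, 1), ("a.rs", 9, 3), ("a.rs", 2, 7)]
b = list(reversed(a))
kept_a, notes_a = truncate_origins(a, max_origins=2)
kept_b, notes_b = truncate_origins(b, max_origins=2)
```

Because the cut happens after sorting, the same set of origins yields the same retained subset regardless of the order in which the engine discovered them.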
**`[analysis.engine.symex]`** sub-section:
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `true` | Run the symex pipeline after taint; adds witness strings and symbolic verdicts |
| `cross_file` | bool | `true` | Persist / consult cross-file SSA bodies so symex can reason about callees defined in other files |
| `interprocedural` | bool | `true` | Intra-file interprocedural symex (k ≥ 2 via frame stack) |
| `smt` | bool | `true` | Use the SMT backend when nyx is built with the `smt` feature; ignored otherwise |
CLI flag map (each boolean toggle is a `--foo` / `--no-foo` pair; numeric settings take a value):
| Config field | CLI flags |
|---|---|
| `constraint_solving` | `--constraint-solving` / `--no-constraint-solving` |
| `abstract_interpretation` | `--abstract-interp` / `--no-abstract-interp` |
| `context_sensitive` | `--context-sensitive` / `--no-context-sensitive` |
| `backwards_analysis` | `--backwards-analysis` / `--no-backwards-analysis` |
| `parse_timeout_ms` | `--parse-timeout-ms <N>` |
| `symex.enabled` | `--symex` / `--no-symex` |
| `symex.cross_file` | `--cross-file-symex` / `--no-cross-file-symex` |
| `symex.interprocedural` | `--symex-interproc` / `--no-symex-interproc` |
| `symex.smt` | `--smt` / `--no-smt` |
| `max_origins` | `--max-origins <N>` |
| `max_pointsto` | `--max-pointsto <N>` |
**Engine-depth profile shortcut**: instead of flipping individual toggles, pass `--engine-profile {fast,balanced,deep}` to set the whole stack at once. Individual flags override the profile, so `--engine-profile fast --backwards-analysis` runs the fast stack with backwards analysis on. See `docs/cli.md` for the exact toggle matrix.
**Explain effective engine**: pass `--explain-engine` to print the resolved engine configuration (profile + config + CLI overrides) and exit without scanning.
### `[chain]`
Bounded-DFS path search across taint findings. Emits multi-step attack chains when several findings link through shared SSA values or call edges.
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max_depth` | int | `4` | Maximum per-finding hops in a single chain path. |
| `min_score` | float | `9.5` | Score threshold; chains below this value are dropped. |
| `reverify_top_n` | int | `5` | Only the top-N chains by score are eligible for composite dynamic re-verification. `0` disables composite re-verification. |
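
A bounded depth-first search of this kind might look like the sketch below. Everything beyond the three config fields is an assumption made for illustration: `links` maps a finding id to the findings it connects to, `weights` gives a per-finding score, and a chain's score is taken to be the sum of its findings' weights. Nyx's actual chain scoring is not documented here.

```python
def find_chains(links, weights, max_depth=4, min_score=9.5):
    """Enumerate acyclic paths of at most max_depth hops whose summed weight meets min_score."""
    chains = []

    def dfs(path, score):
        # A chain needs at least two findings (one hop) to be reported.
        if len(path) > 1 and score >= min_score:
            chains.append((score, list(path)))
        if len(path) - 1 >= max_depth:
            return  # hop budget exhausted
        for nxt in links.get(path[-1], []):
            if nxt in path:
                continue  # never revisit a finding within one chain
            path.append(nxt)
            dfs(path, score + weights[nxt])
            path.pop()

    for start in links:
        dfs([start], weights[start])
    # Highest score first; ties broken by path for a stable order.
    chains.sort(key=lambda chain: (-chain[0], chain[1]))
    return chains


links = {"a": ["b"], "b": ["c"], "c": []}
weights = {"a": 5.0, "b": 5.0, "c": 1.0}
chains = find_chains(links, weights)
top_for_reverify = chains[:5]  # mirrors reverify_top_n = 5
```

Lowering `max_depth` prunes longer paths before they are scored, and raising `min_score` drops weak chains after the search.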
### `[telemetry]`
Sampling policy for the on-disk event log written by dynamic verification (`~/.cache/nyx/dynamic/events.jsonl`). Confirmed and Inconclusive verdicts are calibration-critical and kept by default; other verdict statuses can be downsampled to bound log growth. Decisions are seeded by `spec_hash` for determinism. See `docs/dynamic.md` for the on-disk schema and `NYX_NO_TELEMETRY=1` opt-out.
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `keep_all_confirmed` | bool | `true` | Always retain `Confirmed` verdicts. |
| `keep_all_inconclusive` | bool | `true` | Always retain `Inconclusive` verdicts. |
| `sample_rate_other` | float | `1.0` | Retention probability for verdicts not covered by the keep-all flags. `1.0` keeps everything, `0.0` drops everything. |
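
One way to get sampling that is deterministic per `spec_hash` is to hash the value into the unit interval and compare it with the rate. The sketch below assumes SHA-256 and a hypothetical `keep_event` helper; the real seeding scheme may differ.

```python
import hashlib


def keep_event(status, spec_hash, keep_all_confirmed=True,
               keep_all_inconclusive=True, sample_rate_other=1.0):
    """Decide whether a verdict is written to the event log."""
    if status == "Confirmed" and keep_all_confirmed:
        return True
    if status == "Inconclusive" and keep_all_inconclusive:
        return True
    # Map the first 8 bytes of the digest to a fraction in [0, 1).
    digest = hashlib.sha256(spec_hash.encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < sample_rate_other


first = keep_event("Other", "spec-123", sample_rate_other=0.5)
second = keep_event("Other", "spec-123", sample_rate_other=0.5)
```

The same `spec_hash` always produces the same decision, so repeated scans of an unchanged finding neither flood the log nor flicker in and out of it.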
### `[detectors.data_exfil]`
Per-project tuning for the `taint-data-exfiltration` rule. All fields are optional.
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `true` | Set `false` to strip `Cap::DATA_EXFIL` from sink caps before emission. No `taint-data-exfiltration` finding reaches the report. Other taint classes are not affected. |
| `trusted_destinations` | [string] | `[]` | URL prefixes that drop `Cap::DATA_EXFIL` on the call site. Matched against the abstract-string domain prefix of the destination arg, so a literal URL or a template literal with a static prefix both work. Use full origins or origin-pinned paths and include the trailing `/`, otherwise `https://api.` matches `https://api.evil.example.com/` too. |
```toml
[detectors.data_exfil]
enabled = true
trusted_destinations = [
"https://api.internal/",
"https://telemetry.example.com/",
]
```
For the sanitizer convention, source sensitivity gate, and per-language sink coverage, see [Detectors / Taint / DATA_EXFIL](detectors/taint.md#data_exfil-suppression-layers).
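The trailing-slash advice for `trusted_destinations` follows directly from plain prefix matching. The Python sketch below models the check as `str.startswith`; `is_trusted` is a hypothetical helper, and the real match runs against the abstract-string prefix rather than a concrete URL.

```python
def is_trusted(destination_prefix, trusted_destinations):
    """Return True when the destination starts with any configured trusted prefix."""
    return any(destination_prefix.startswith(prefix) for prefix in trusted_destinations)


# Too loose: a bare host prefix also matches an attacker-controlled host.
loose = is_trusted("https://api.evil.example.com/upload", ["https://api."])

# Missing trailing slash: a longer hostname with the same prefix still matches.
no_slash = is_trusted("https://api.internal.evil.example.com/", ["https://api.internal"])

# Full origin with trailing slash: only the intended origin matches.
pinned_ok = is_trusted("https://api.internal/v1/events", ["https://api.internal/"])
pinned_bad = is_trusted("https://api.internal.evil.example.com/", ["https://api.internal/"])
```

Only the last form, a full origin ending in `/`, rejects look-alike hosts.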
### `[analysis.languages.<slug>]`
Per-language custom rules. `<slug>` is one of: `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby`.
| Field | Type | Description |
|-------|------|-------------|
| `rules` | array of rule objects | Custom label rules |
| `terminators` | [string] | Functions that terminate execution |
| `event_handlers` | [string] | Event handler function names |
**Rule object**:
```toml
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml"]
kind = "sanitizer" # "source" | "sanitizer" | "sink"
cap = "html_escape" # "env_var" | "html_escape" | "shell_escape" |
# "url_encode" | "json_parse" | "file_io" |
# "fmt_string" | "sql_query" | "deserialize" |
# "ssrf" | "data_exfil" | "code_exec" | "crypto" |
# "unauthorized_id" | "ldap_injection" |
# "xpath_injection" | "header_injection" |
# "open_redirect" | "ssti" | "xxe" |
# "prototype_pollution" | "all"
```
Aliases accepted by `parse_cap` and `[..rules].cap`: `data_exfiltration` for `data_exfil`, `ldapi` for `ldap_injection`, `xpathi` for `xpath_injection`, `crlf` and `response_splitting` for `header_injection`, `redirect` for `open_redirect`, `template_injection` for `ssti`, `proto_pollution` for `prototype_pollution`.
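The alias list above amounts to a lookup table applied before validation. The following Python sketch mirrors it for illustration; `normalize_cap` is a hypothetical stand-in, not the Rust `parse_cap`, and whether the real parser is case-sensitive is not stated here.

```python
CANONICAL_CAPS = {
    "env_var", "html_escape", "shell_escape", "url_encode", "json_parse",
    "file_io", "fmt_string", "sql_query", "deserialize", "ssrf", "data_exfil",
    "code_exec", "crypto", "unauthorized_id", "ldap_injection",
    "xpath_injection", "header_injection", "open_redirect", "ssti", "xxe",
    "prototype_pollution", "all",
}

CAP_ALIASES = {
    "data_exfiltration": "data_exfil",
    "ldapi": "ldap_injection",
    "xpathi": "xpath_injection",
    "crlf": "header_injection",
    "response_splitting": "header_injection",
    "redirect": "open_redirect",
    "template_injection": "ssti",
    "proto_pollution": "prototype_pollution",
}


def normalize_cap(name):
    """Resolve an alias to its canonical cap name, rejecting unknown names."""
    canonical = CAP_ALIASES.get(name, name)
    if canonical not in CANONICAL_CAPS:
        raise ValueError(f"unknown cap: {name!r}")
    return canonical
```

For example, `normalize_cap("crlf")` and `normalize_cap("response_splitting")` both resolve to `header_injection`.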
---
## Example Configurations
### Minimal override (`nyx.local`)
```toml
[scanner]
min_severity = "Medium"
[output]
default_format = "json"
max_results = 100
```
### CI-optimized
```toml
[scanner]
mode = "full"
min_severity = "Medium"
excluded_directories = ["node_modules", ".git", "target", "vendor", "dist"]
[output]
quiet = true
default_format = "sarif"
[performance]
worker_threads = 4
```
### Using a scan profile
```bash
# Use a built-in profile
nyx scan --profile ci
# CLI flags still override profile values
nyx scan --profile ci --format json
```
### Custom profile
```toml
[profiles.security_audit]
mode = "full"
min_severity = "Low"
enable_state_analysis = true
show_all = true
```
### Custom rules for a Node.js project
```toml
[analysis.languages.javascript]
terminators = ["process.exit", "abort"]
event_handlers = ["addEventListener"]
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml", "sanitizeInput"]
kind = "sanitizer"
cap = "html_escape"
[[analysis.languages.javascript.rules]]
matchers = ["dangerouslySetInnerHTML"]
kind = "sink"
cap = "html_escape"
[[analysis.languages.javascript.rules]]
matchers = ["getRequestBody", "readUserInput"]
kind = "source"
cap = "all"
```
### Adding rules via CLI
```bash
# Add a sanitizer
nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
# Add a terminator
nyx config add-terminator --lang javascript --name process.exit
# Verify
nyx config show
```
---
## Config Validation
Config is validated after loading and merging. Validation checks include:
- Server port must be between 1 and 65535
- Server host must not be empty
- `max_saved_runs` must be > 0 when `persist_runs` is true
- `max_runs` must be > 0 when `persist` is true
- `batch_size` and `channel_multiplier` must be > 0
- `rollup_examples` must be > 0
- Profile names may contain only alphanumeric characters and underscores
Invalid config produces structured error messages identifying the section, field, and issue.
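The checks listed above can be modelled as a function that returns structured `(section, field, issue)` tuples. This Python sketch is a hypothetical mirror of the rules, using the defaults from the schema tables; the real validator's error type and wording are not shown in this document.

```python
import re


def validate_config(config):
    """Return a list of (section, field, issue) tuples; an empty list means valid."""
    errors = []
    server = config.get("server", {})
    if not 1 <= server.get("port", 9700) <= 65535:
        errors.append(("server", "port", "must be between 1 and 65535"))
    if not server.get("host", "127.0.0.1"):
        errors.append(("server", "host", "must not be empty"))
    if server.get("persist_runs", True) and server.get("max_saved_runs", 50) <= 0:
        errors.append(("server", "max_saved_runs", "must be > 0 when persist_runs is true"))
    runs = config.get("runs", {})
    if runs.get("persist", False) and runs.get("max_runs", 100) <= 0:
        errors.append(("runs", "max_runs", "must be > 0 when persist is true"))
    performance = config.get("performance", {})
    for field, default in (("batch_size", 100), ("channel_multiplier", 4)):
        if performance.get(field, default) <= 0:
            errors.append(("performance", field, "must be > 0"))
    if config.get("output", {}).get("rollup_examples", 5) <= 0:
        errors.append(("output", "rollup_examples", "must be > 0"))
    for name in config.get("profiles", {}):
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            errors.append(("profiles", name, "name must be alphanumeric with underscores only"))
    return errors


bad = {"server": {"port": 0, "host": ""}, "profiles": {"my-profile": {}}}
problems = validate_config(bad)
```

Collecting every problem before reporting, rather than stopping at the first, lets a user fix a broken file in one pass.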
---
## State Analysis
State analysis detects resource lifecycle violations (use-after-close, double-close, resource leaks) and unauthenticated access patterns. It is **enabled by default**.
To disable:
```toml
[scanner]
enable_state_analysis = false
```
State analysis requires `mode = "full"` or `mode = "taint"`. It has no effect in `mode = "ast"`.
**Tradeoffs**:
- Additional per-function state-machine pass adds some scan time
- May produce findings that require domain knowledge to evaluate (e.g., whether a resource handle is intentionally left open)
- Most useful for C, C++, Rust, Go, and Java where acquire/release patterns are common
---
## Upgrading
### Engine-version mismatch is handled automatically
Nyx stores the scanner's `CARGO_PKG_VERSION` in the project index database.
When the version recorded in the DB differs from the running binary, or the
row is missing entirely, every cached summary, SSA body, and file-hash row
is wiped on the next open. The next scan rebuilds the index against the new
engine. No flag is needed; CI pipelines keep working across upgrades.
The rebuild is logged at `info` level:
```
engine version changed (<old><new>), rebuilding index
```
If you see this once per upgrade it is working as intended. If you see it on
every scan, the metadata row is not being persisted; file an issue.
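
The version check can be pictured with a small SQLite model. This is a sketch under assumptions: the `metadata` table, its `engine_version` key, and the `payload` column are hypothetical, and the four cache table names are borrowed from the reindex section below purely for illustration.

```python
import sqlite3

CACHE_TABLES = ("files", "function_summaries", "ssa_function_summaries", "ssa_function_bodies")


def open_index(conn, running_version):
    """Wipe cached rows when the stored engine version is missing or differs. Returns True on wipe."""
    conn.execute("CREATE TABLE IF NOT EXISTS metadata (key TEXT PRIMARY KEY, value TEXT)")
    for table in CACHE_TABLES:
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, payload TEXT)")
    row = conn.execute("SELECT value FROM metadata WHERE key = 'engine_version'").fetchone()
    stored = row[0] if row else None
    if stored == running_version:
        return False  # cache is valid for this engine
    for table in CACHE_TABLES:
        conn.execute(f"DELETE FROM {table}")
    # Persist the new version so the wipe happens once per upgrade, not on every scan.
    conn.execute(
        "INSERT OR REPLACE INTO metadata (key, value) VALUES ('engine_version', ?)",
        (running_version,),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
first_open = open_index(conn, "1.0.0")   # row missing, so the cache is reset
conn.execute("INSERT INTO files (payload) VALUES ('hash-of-main.rs')")
same_version = open_index(conn, "1.0.0")
rows_before = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
upgraded = open_index(conn, "1.1.0")
rows_after = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
```

If the final `INSERT OR REPLACE` were skipped or rolled back, every open would look like an upgrade, which is the repeated-rebuild symptom described above.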
### Forcing a reindex
Use `--index rebuild` to throw away the current project's cached summaries
and re-run pass 1 against the current rules. Useful after editing
`nyx.local` rules, after an upgrade that changed label definitions without
changing the engine version, or when you want a known-clean baseline:
```bash
nyx scan --index rebuild .
```
This clears the current project's rows in `files`, `function_summaries`,
`ssa_function_summaries`, and `ssa_function_bodies`; other projects sharing
the same DB directory are untouched.
### Recovering from a corrupt database
If the `.sqlite` file itself is damaged (e.g. from a killed scan or full
disk) and `nyx scan` fails to open it, delete the file and let the next
scan recreate it:
```bash
rm "$(nyx config path)"/<project>.sqlite*
```
On the next scan Nyx builds a fresh index from scratch.
---
## Reserved Fields
Some config fields are defined but not yet implemented. They are marked `(RESERVED)` in the default config and accept values without effect. Config files stay forward-compatible: settings start having an effect when the feature ships, with no edit needed.

102
docs/detectors.md Normal file

@ -0,0 +1,102 @@
# Detectors
Nyx ships four independent detector families. They run together in `--mode full`, the default. Findings are merged, deduplicated, ranked, and printed in one result set.
| Family | Rule prefix | Looks at | What it finds |
|---|---|---|---|
| [Taint analysis](detectors/taint.md) | `taint-*` | Cross-file dataflow | Unsanitized data flowing source to sink |
| [CFG structural](detectors/cfg.md) | `cfg-*` | Per-function control flow | Auth gaps, unguarded sinks, error fallthrough, resource release on all paths |
| [State model](detectors/state.md) | `state-*` | Per-function state lattice | Use-after-close, double-close, leaks, unauthenticated access |
| [AST patterns](detectors/patterns.md) | `<lang>.<cat>.<name>` | Tree-sitter structural match | Banned APIs, weak crypto, dangerous constructs |
```mermaid
flowchart LR
Taint["Taint analysis<br/>cross-file source-to-sink"] --> Normalize["Normalize findings"]
Cfg["CFG structural<br/>guards, exits, resource paths"] --> Normalize
State["State model<br/>resource and auth lattice"] --> Normalize
Ast["AST patterns<br/>tree-sitter structural match"] --> Normalize
Normalize --> Dedupe["Deduplicate<br/>same site, rule, severity"]
Dedupe --> Rank["Rank<br/>severity, evidence, context"]
Rank --> Output["Console, JSON, SARIF, UI"]
```
The taint family is split into cap-specific rule classes when a sink callee carries multiple vulnerability classes:
| Rule id | Cap | Surface |
|---|---|---|
| `taint-unsanitised-flow` | `sql_query`, `ssrf`, `code_exec`, `file_io`, `fmt_string`, `deserialize`, `crypto` | Catch-all class for the legacy caps that have not migrated to a dedicated rule id yet. |
| `taint-ldap-injection` | `ldap_injection` | Attacker-controlled data concatenated into an LDAP filter or DN without RFC 4515 escaping. Receivers typed as `LdapClient` (JNDI `DirContext`, Spring `LdapTemplate`, ldapjs `Client`, python-ldap `LDAPObject`, ldap3 `Connection`) and chained `.search` / `.searchByEntity` / `.search_s` form the sink set. |
| `taint-xpath-injection` | `xpath_injection` | Attacker-controlled string passed as the XPath expression to `xpath.evaluate` / `xpath.compile` / `document.evaluate` / `DOMXPath::query` / `etree.XPath`. Suppressed when the receiver was bound to an `XPathVariableResolver` (parameterised XPath shape). |
| `taint-header-injection` | `header_injection` | Attacker-controlled bytes landing in an HTTP response header without `\r\n` stripping (response splitting, cache poisoning). Covers `setHeader` / `res.set` / `res.append` / `headers["X-Foo"] = bar` / `Header().Set` / `add_header` / `setcookie` / `http.Header.Set`. |
| `taint-open-redirect` | `open_redirect` | Attacker-controlled URL driving a redirect / `Location` header without an allowlist or relative-URL check. Includes the Spring MVC `return "redirect:" + url` view-name shape via the `__spring_redirect__` synthetic sink. Suppressed by `RelativeUrlValidated` (`startsWith("/")` family) and `HostAllowlistValidated` (`new URL(x).host === ALLOWED`, `urlparse(x).netloc == ...`) inline predicates. |
| `taint-template-injection` | `ssti` | Attacker controls the *template source string* fed to a server-side renderer (Jinja2 / Mako / FreeMarker / Twig / Handlebars / EJS / Mustache / ERB / `text/template` / `html/template` / Smarty / Blade `Template(...)` / `compile(...)`), distinct from rendering a trusted template with tainted variables. |
| `taint-xxe` | `xxe` | Attacker-controlled XML reaching a parser that resolves external entities. Covers JAXP `DocumentBuilder.parse` / `SAXParser.parse` / `XMLReader.parse`, lxml `etree.parse`, Nokogiri, fast-xml-parser, xml2js, libxml2 `xmlReadFile` / `xmlReadMemory`. Suppressed when the receiver carries a hardening fact in `xml_parser_config` (`secure_processing`, `disallow_doctype`, `processEntities: false`, `LIBXML_NOENT` not set). |
| `taint-prototype-pollution` | `prototype_pollution` | Attacker-controlled key reaching an object property assignment that can mutate `Object.prototype`. JS/TS only. Covers `obj[tainted] = v` (synthetic `__index_set__` sink), library-mediated deep-merge / set helpers (`_.merge`, `_.set`, `dotProp.set`, `objectPath.set`, `setValue`), and jQuery's `extend(true, target, src)` deep-merge form via the `LiteralOnly` activation gate. Suppressed by constant-key fold (`__proto__` / `constructor` / `prototype` filtering), reject / allowlist guards on the key, and `Object.create(null)` receivers (flow-sensitive `NullPrototypeObject` type). Python equivalent (`dict.update`) is opt-in via `NYX_PYTHON_PROTO_POLLUTION=1`. |
| `taint-data-exfiltration` | `data_exfil` | Sensitive data flowing into the payload of an outbound network request (body / headers / json on `fetch`, body on `XMLHttpRequest.send`). Distinct from SSRF: the destination is fixed but attacker-influenced bytes leave the process. |
| `rs.auth.missing_ownership_check.taint` | `unauthorized_id` | Rust auth subsystem fold-in; see [auth.md](auth.md). |
A single call site can fire several of these at once when it carries multiple gates. `fetch(taintedUrl, {body: tainted})` produces both an SSRF finding (URL flow) and a `taint-data-exfiltration` finding (body flow), each with its own cap mask rather than a conflated union.
Each cap-class entry is registered in `CAP_RULE_REGISTRY` (`src/labels/mod.rs`) with its title, severity, OWASP 2021 code, and description. Browse the registry from the CLI with `nyx rules list --class-only`, or `nyx rules list --kind class --json` for machine output.
For Rust auth-specific rules (`rs.auth.*`), see [auth.md](auth.md).
## How they combine
In `--mode full`:
1. **Taint and AST can both fire on one line.** If `eval(userInput)` triggers both `js.code_exec.eval` (AST) and `taint-unsanitised-flow` (taint), both are kept with distinct rule IDs. The taint finding ranks higher because of the analysis-kind bonus.
2. **State supersedes CFG on resource leaks.** When `state-resource-leak` and `cfg-resource-leak` fire at the same location, the CFG one is dropped.
3. **Exact duplicates are removed.** Same line, column, rule ID, severity → one finding.
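
Rules 2 and 3 above can be sketched as a two-step filter. The Python below is illustrative only: findings are modelled as dicts with hypothetical `path`, `line`, `col`, `rule`, and `severity` keys, and the file path is included in the location key as an assumption.

```python
def combine(findings):
    """Drop exact duplicates, then drop cfg-resource-leak where state-resource-leak fired."""
    seen = set()
    unique = []
    for f in findings:
        key = (f["path"], f["line"], f["col"], f["rule"], f["severity"])
        if key in seen:
            continue  # rule 3: exact duplicate
        seen.add(key)
        unique.append(f)

    state_leak_sites = {
        (f["path"], f["line"], f["col"]) for f in unique if f["rule"] == "state-resource-leak"
    }
    return [
        f for f in unique
        # rule 2: the state finding supersedes the CFG one at the same location
        if not (f["rule"] == "cfg-resource-leak"
                and (f["path"], f["line"], f["col"]) in state_leak_sites)
    ]


def finding(path, line, col, rule, severity):
    return {"path": path, "line": line, "col": col, "rule": rule, "severity": severity}


raw = [
    finding("f.c", 10, 5, "state-resource-leak", "Medium"),
    finding("f.c", 10, 5, "cfg-resource-leak", "Medium"),
    finding("app.js", 3, 1, "js.code_exec.eval", "High"),
    finding("app.js", 3, 1, "taint-unsanitised-flow", "High"),
    finding("app.js", 3, 1, "taint-unsanitised-flow", "High"),
]
combined = combine(raw)
```

The AST and taint findings on `app.js` line 3 both survive because their rule IDs differ (rule 1), while the repeated taint finding and the superseded CFG leak are removed.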
## Modes
| Mode | Active detectors |
|---|---|
| `full` (default) | All four |
| `ast` | AST patterns only |
| `cfg` | Taint + CFG + State (no AST patterns) |
| `taint` | Taint + State |
## Attack-surface ranking
Every finding gets a deterministic score. Findings are sorted by descending score by default. Disable with `--no-rank` or `output.attack_surface_ranking = false`.
```
score = severity_base + analysis_kind + evidence_strength + state_bonus - validation_penalty
```
| Component | Values |
|---|---|
| Severity base | High=60, Medium=30, Low=10 |
| Analysis kind | taint=+10, taint-data-exfiltration=+7, state=+8, cfg with evidence=+5, cfg without evidence=+3, ast=+0 |
| Evidence strength | +1 per evidence item up to 4; +2 to +6 for source kind |
| State bonus | use-after-close / unauthed=+6, double-close=+3, must-leak=+2, may-leak=+1 |
| Validation penalty | -5 if path-validated |
DATA_EXFIL is calibrated below other taint classes by design. Severity is High only when the source carries credential / session material (cookies, env vars); other Sensitive sources (request headers, file system, database, caught exception) downgrade to Medium. Confidence is capped at Medium, and it reaches Medium only when the abstract / symbolic domain corroborates a concrete string body reaching the outbound payload; otherwise it falls to Low. A guarded flow (`path_validated`) drops one confidence tier. The intent is to rank data-exfiltration findings below SSRF / SQLi / command-injection but above informational AST patterns.
Source-kind contributions (taint only):
| Source | Bonus |
|---|---|
| User input (`req.body`, `argv`, `stdin`, `form`, `query`, `params`) | +6 |
| Environment (`env::var`, `getenv`, `process.env`) | +5 |
| Unknown | +4 |
| File system | +3 |
| Database | +2 |
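
Putting the component and source-kind tables together, the score formula can be worked through in Python. The dictionary keys and the `score` signature are hypothetical names chosen for this sketch; the numeric values are taken from the tables above.

```python
SEVERITY_BASE = {"High": 60, "Medium": 30, "Low": 10}
ANALYSIS_KIND = {
    "taint": 10, "taint-data-exfiltration": 7, "state": 8,
    "cfg-with-evidence": 5, "cfg-without-evidence": 3, "ast": 0,
}
SOURCE_BONUS = {"user_input": 6, "environment": 5, "unknown": 4, "file_system": 3, "database": 2}
STATE_BONUS = {"use-after-close": 6, "unauthed": 6, "double-close": 3, "must-leak": 2, "may-leak": 1}


def score(severity, kind, evidence_items=0, source=None, state=None, path_validated=False):
    """severity_base + analysis_kind + evidence_strength + state_bonus - validation_penalty."""
    total = SEVERITY_BASE[severity] + ANALYSIS_KIND[kind]
    total += min(evidence_items, 4)        # +1 per evidence item, capped at 4
    if source is not None:
        total += SOURCE_BONUS[source]      # source-kind bonus (taint findings only)
    if state is not None:
        total += STATE_BONUS[state]
    if path_validated:
        total -= 5
    return total


high_taint = score("High", "taint", evidence_items=2, source="user_input")
use_after_close = score("High", "state", state="use-after-close")
leak = score("Medium", "state", state="must-leak")
ast_only = score("Low", "ast")
```

Under these assumptions a High taint finding with user input and two evidence items scores 60 + 10 + 2 + 6 = 78, a High use-after-close scores 74, a Medium must-leak scores 40, and a Low AST-only pattern scores 10.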
Approximate score ranges:
| Finding type | Score |
|---|---|
| High taint with user input | 76 to 81 |
| High state (use-after-close) | ~74 |
| High CFG structural | 63 to 68 |
| High DATA_EXFIL (cookie / env source, body confirmed) | ~76 |
| Medium taint with env source | 45 to 50 |
| Medium DATA_EXFIL (header / fs / db / caught-exception source) | 40 to 45 |
| Medium state (resource leak) | ~40 |
| Low AST-only pattern | ~10 |
For the engine's runtime model (passes, summaries, SCC fixed-point), see [how-it-works.md](how-it-works.md).

Some files were not shown because too many files have changed in this diff.