apunkt/nyx

mirror of https://github.com/elicpeter/nyx.git synced 2026-06-09 19:45:13 +02:00

* feat: Add const_bound_vars tracking to prevent false positives in ownership checks

* feat: Introduce field interner and typed bounded vars for enhanced type tracking

* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking

* feat: Centralize method name extraction with bare_method_name helper

* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch

* feat: Enhance C++ taint tracking with additional container operations and inline method resolution

* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking

* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis

* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations

* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details

* test: Add comprehensive tests for lattice algebra laws and SSA edge cases

* feat: Add destructured session user handling and safe user ID access patterns

* feat: Implement row-population reverse-walk for enhanced authorization checks

* feat: Enhance authorization checks with local alias chain for self-actor types

* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction

* feat: Implement chained method call inner-gate rebinding for SSRF prevention

* feat: Add observability and error modules, enhance debug functionality, and implement theme context

* feat: Remove Auth Analysis page and update navigation to redirect to Explorer

* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor

* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity

* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build

The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while
the unit test at the bottom of the file was gated only `#[cfg(test)]`.
Since `cfg(test)` is set in release builds with `--tests` but
`cfg(debug_assertions)` is not, `cargo build --release --tests` failed
with E0425. Removing the gate fixes the build; the body is `debug_assert!`
only, so the helper is free in release. Also drop the gate at the call
site to avoid a `dead_code` warning when the lib is built without
`--tests`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(closure-capture): flip JS/TS fixtures to required-finding

The JS and TS closure-capture fixtures pinned the old broken behaviour
via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now
correctly traces taint through the closure boundary (env source captured
by an arrow function, sunk via `child_process.exec` inside the body), so
the formerly-forbidden finding is a true positive.

Match the Python sibling's shape — `required_findings` with
`id_prefix` + `min_count` plus a small `noise_budget` — and rewrite the
companion READMEs and the phase8_fragility_tests doc-comments from
"known gap" to "regression guard".

Verified:
- cargo test --release --test phase8_fragility_tests → 8/8 pass
- cargo test --release --lib bfs_assertion → pass
- corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0) — unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis

* feat: Introduce health module and enhance health score computation with calibration tests

* feat: Add expectations configuration and cleanup .gitignore for log files

* feat: Implement theme selection and enhance settings panel for triage sync

* feat: Suppress false positives for strcpy calls with literal sources in AST

* feat: Update analyse_function_ssa to return body CFG for accurate analysis

* feat: Add bug report and feature request templates for improved issue tracking

* feat: removed dev scripts

* feat: update README.md for clarity and consistency in fixture descriptions

* feat: removed dev docs

* feat: clean up error handling and UI elements for improved user experience

* feat: adjust button sizes in HeaderBar for better UI consistency

* feat: enhance taint analysis with additional context for sanitizer and taint findings

* cargo fmt

* prettier

* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts

* feat: add script to frame PNG screenshots with brand gradient

* feat: add fuzzing support with new targets and CI workflows

* refactor: streamline match expressions and improve formatting in CLI and output handling

* feat: enhance configuration display with detailed output options

* feat: stage demo configuration for improved CLI screenshot output

* feat: expose merge_configs function for user-configurable settings

* refactor: simplify code structure and improve readability in config handling

* refactor: improve descriptions for vulnerability patterns in various languages

* feat: update MIT License section with additional usage details and copyright information

* feat: update screenshots

* refactor: update build process and paths for frontend assets

* feat: add cross-file taint fuzzing target and supporting dictionary

* refactor: clean up formatting and comments in fuzz configuration and example files

* refactor: remove outdated comments and clean up CI configuration files

* chore: update changelog dates and improve formatting in documentation

* refactor: update Cargo.toml and CI configuration for improved packaging and build process

* refactor: enhance quote-stripping logic to prevent panics and add regression tests

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-29 00:58:38 -04:00

8.2 KiB

Benchmark Results

Current baseline (2026-04-29):

Metric	File-level	Rule-level	CI floor
Precision	0.991	0.991	0.861
Recall	0.995	0.995	0.944
F1	0.993	0.993	0.901

Corpus: 433 cases across 10 languages, 432 evaluated (1 disabled). Per-run JSON lands in tests/benchmark/results/ (latest.json plus dated snapshots). See README.md for what the scoring modes mean and how to run a subset.
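The three metrics in the table follow the standard definitions over true positives, false positives, and false negatives. A minimal sketch of that arithmetic is below; the `Counts` and `Metrics` types and the `metrics` function are hypothetical names for illustration, not Nyx's actual scoring code.

```rust
/// Confusion counts for one scoring mode (file-level or rule-level).
/// Hypothetical type, used only for this illustration.
#[derive(Debug, Clone, Copy)]
pub struct Counts {
    pub tp: u32,
    pub fp: u32,
    pub fn_: u32, // `fn` is a Rust keyword, hence the trailing underscore
}

#[derive(Debug, Clone, Copy)]
pub struct Metrics {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
}

/// Division that yields 0.0 instead of NaN when the denominator is zero.
fn ratio(num: f64, den: f64) -> f64 {
    if den == 0.0 {
        0.0
    } else {
        num / den
    }
}

/// Precision = TP / (TP + FP), recall = TP / (TP + FN),
/// F1 = harmonic mean of precision and recall.
pub fn metrics(c: Counts) -> Metrics {
    let (tp, fp, fn_) = (c.tp as f64, c.fp as f64, c.fn_ as f64);
    let precision = ratio(tp, tp + fp);
    let recall = ratio(tp, tp + fn_);
    let f1 = ratio(2.0 * precision * recall, precision + recall);
    Metrics { precision, recall, f1 }
}
```

As a worked case, counts of TP=205, FP=1, FN=0 give precision 205/206, recall 1.0, and F1 = 410/411, which is about 0.9976.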

The corpus is mostly synthetic 8-20 line fixtures, one vulnerability or one safe pattern per file. A smaller real-CVE replay set under cve_corpus/ covers 18 published CVEs across 9 of the 10 languages (the table below has no Rust entry). Both contribute to the headline numbers.

Real CVE coverage

Each entry is a real, publicly disclosed CVE reduced to a minimal reproducer, with one vulnerable and one patched fixture per CVE. Vulnerable fixtures must produce a finding for the disclosed sink class. Patched fixtures must produce zero findings.

CVE	Language	Project	License	Class	Status
CVE-2023-48022	Python	Ray	Apache-2.0	CMDI	detected
CVE-2017-18342	Python	PyYAML	MIT	Deserialization	detected
CVE-2019-14939	JavaScript	mongo-express	MIT	code_exec	detected
CVE-2025-64430	JavaScript	Parse Server	Apache-2.0	SSRF	detected
CVE-2023-26159	TypeScript	follow-redirects	MIT	SSRF	detected
CVE-2022-30323	Go	hashicorp/go-getter	MPL-2.0	CMDI	detected
CVE-2023-3188	Go	owncast	MIT	SSRF	open FN
CVE-2024-31450	Go	owncast	MIT	path_traversal	detected
CVE-2015-7501	Java	Apache Commons Collections	Apache-2.0	Deserialization	detected
CVE-2017-12629	Java	Apache Solr	Apache-2.0	CMDI	detected
CVE-2013-0156	Ruby	Ruby on Rails	MIT	Deserialization	detected
CVE-2020-8130	Ruby	Rake	MIT	CMDI	detected
CVE-2017-9841	PHP	PHPUnit	BSD-3-Clause	code_exec	detected
CVE-2018-15133	PHP	Laravel	MIT	Deserialization	detected
CVE-2016-3714	C	ImageMagick (ImageTragick)	ImageMagick License	CMDI	detected
CVE-2019-18634	C	sudo (pwfeedback)	ISC	memory_safety	detected
CVE-2019-13132	C++	ZeroMQ libzmq	MPL-2.0	memory_safety	detected
CVE-2022-1941	C++	Protocol Buffers	BSD-3-Clause	memory_safety	detected

Deferred entries (shown as open FN in the Status column) are real bugs Nyx can't yet detect. The fixture stays committed with `disabled: true` in ground truth so the gap remains visible.
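The pass/fail rule for a CVE pair can be sketched as a small check. This assumes a simplified model: `CveFixture`, `Finding`, `Verdict`, and `check` are hypothetical names, and the real ground-truth schema may carry more fields than the `disabled` flag mentioned above.

```rust
/// Which half of a CVE pair a fixture represents.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FixtureKind {
    Vulnerable,
    Patched,
}

/// A reported finding, reduced to its sink class (for example "SSRF").
#[derive(Debug, Clone)]
pub struct Finding {
    pub class: String,
}

/// Hypothetical ground-truth record for one fixture.
pub struct CveFixture {
    pub kind: FixtureKind,
    pub expected_class: &'static str,
    pub disabled: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
    Skipped,
}

/// Vulnerable fixtures need at least one finding of the disclosed class;
/// patched fixtures need zero findings; disabled fixtures are not scored.
pub fn check(fixture: &CveFixture, findings: &[Finding]) -> Verdict {
    if fixture.disabled {
        return Verdict::Skipped;
    }
    let ok = match fixture.kind {
        FixtureKind::Vulnerable => findings
            .iter()
            .any(|f| f.class == fixture.expected_class),
        FixtureKind::Patched => findings.is_empty(),
    };
    if ok {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}
```

Returning `Skipped` rather than `Pass` for a disabled fixture keeps the deferred case distinguishable from a detected one in any summary built on top of this check.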

How CVEs get picked

- Publicly disclosed, with a stable advisory link.
- In a class Nyx already has a rule for, so the vulnerable fixture asserts on a concrete rule ID, not just a generic taint flow.
- Reducible to roughly 30 lines without hiding the disclosed sink shape.
- Under a permissive upstream license (MIT, Apache, BSD, MPL, ISC, ImageMagick).

Fixtures are minimal reproducers of the unsafe pattern, not verbatim upstream code.
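The selection criteria above can be read as a checklist. The sketch below encodes them as a predicate that returns the reasons a candidate would be rejected. Everything here is hypothetical: the `Candidate` type, the SPDX-style license spellings, and in particular the hard 30-line cutoff, since the criterion only says "roughly 30 lines".

```rust
/// Hypothetical description of a CVE being considered for the replay set.
pub struct Candidate {
    pub has_stable_advisory: bool,
    pub has_existing_rule: bool,
    pub reproducer_lines: usize,
    pub spdx_license: &'static str,
}

/// License families named in the criteria, written as SPDX-style ids (assumed spellings).
const PERMISSIVE: [&str; 6] = [
    "MIT",
    "Apache-2.0",
    "BSD-3-Clause",
    "MPL-2.0",
    "ISC",
    "ImageMagick",
];

/// Assumed hard cutoff for illustration; the criterion itself is approximate.
const MAX_LINES: usize = 30;

/// Returns every criterion the candidate fails; an empty list means it qualifies.
pub fn rejection_reasons(c: &Candidate) -> Vec<&'static str> {
    let mut reasons = Vec::new();
    if !c.has_stable_advisory {
        reasons.push("no stable advisory link");
    }
    if !c.has_existing_rule {
        reasons.push("no existing rule for the vulnerability class");
    }
    if c.reproducer_lines > MAX_LINES {
        reasons.push("reproducer too long");
    }
    if !PERMISSIVE.contains(&c.spdx_license) {
        reasons.push("upstream license not on the permissive list");
    }
    reasons
}
```

Collecting all reasons, rather than stopping at the first failure, makes it easier to see whether a candidate is close to qualifying.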

CI floor

CI fails the build if rule-level precision drops below 0.861, recall below 0.944, or F1 below 0.901. Against the current baseline, that leaves 13.0 percentage points of headroom on precision, 5.1 pp on recall, and 9.2 pp on F1. A single-case flip is about 0.6 pp on this corpus, so the headroom absorbs honest FP/TN trades while still tripping on a class-level regression. Floors only move up, and only when a durable improvement lands. Never relax them to paper over a regression.

The gate runs in the benchmark-gate job in .github/workflows/ci.yml. Thresholds are encoded at the bottom of tests/benchmark_test.rs.
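A floor check of this kind might look like the following sketch. It is not the code in tests/benchmark_test.rs; the `Scores` type and the `gate` and `headroom_pp` functions are hypothetical, and only the three floor values are taken from the text above.

```rust
/// Rule-level scores for one benchmark run (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Scores {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
}

/// Floor values quoted in this section.
pub const FLOOR: Scores = Scores {
    precision: 0.861,
    recall: 0.944,
    f1: 0.901,
};

/// Returns Ok(()) when every metric meets its floor, otherwise one message
/// per metric that fell below it.
pub fn gate(live: Scores, floor: Scores) -> Result<(), Vec<String>> {
    let checks = [
        ("precision", live.precision, floor.precision),
        ("recall", live.recall, floor.recall),
        ("f1", live.f1, floor.f1),
    ];
    let mut failures = Vec::new();
    for (name, value, min) in checks {
        if value < min {
            failures.push(format!("{} {:.3} is below floor {:.3}", name, value, min));
        }
    }
    if failures.is_empty() {
        Ok(())
    } else {
        Err(failures)
    }
}

/// Headroom between a live metric and its floor, in percentage points.
pub fn headroom_pp(live: f64, floor: f64) -> f64 {
    (live - floor) * 100.0
}
```

In a test, the `Err` branch would typically be turned into a panic so the CI job fails with every violated floor listed at once.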

Recent changes

Most recent first. Metrics are rule-level on the corpus size at that point.

Date	Change	Corpus (cases)	Precision	Recall	F1
2026-04-28	Ruby bare `Kernel#open` CMDI sink, exact-match sigil on label matchers	428	0.995	1.000	0.998
2026-04-28	Go SSRF/FILE_IO sink expansion (`http.DefaultClient.*`, `os.Remove`/`WriteFile`) plus Decode-writeback container op	426	0.995	1.000	0.998
2026-04-27	JS chained-method inner-gate classification (`http.get(u, cb).on(...)`)	422	0.994	1.000	0.997
2026-04-23	Auth FP remediation: 10 Rust ownership-check fixtures wired to corpus	305	0.946	0.994	0.970
2026-04-23	C and C++ added as first-class CVE-corpus languages (5 new CVE pairs)	295	0.945	0.994	0.969
2026-04-23	Go, Java, Ruby, PHP, plus second Python CVE pair	285	0.944	0.994	0.968
2026-04-23	Real-CVE replay corpus seeded (Python, JS, TS, one CVE per language)	273	0.942	0.994	0.967
2026-04-22	Cross-file points-to summaries, SCC joint fixed-point, backwards taint	273	0.940	0.994	0.966
2026-04-22	Cross-file context-sensitive inline taint (k=1)	270	0.940	0.994	0.966
2026-04-20	Rust weak-spot fixes across FILE_IO, SSRF, SQL, DESERIALIZE sink families	262	0.906	0.994	0.948
2026-04-20	TypeScript weak-spot fixes, Fastify framework detection, TSX/JSX grammar	262	0.899	0.981	0.938
2026-04-20	Rust corpus expansion: honest FNs in classes lacking Rust rules	262	0.891	0.961	0.925
2026-04-20	TypeScript corpus 0 to 32 cases across 12 vuln classes	246	0.904	0.986	0.944
2026-03-24	Benchmark expansion: C, C++, Rust as first-class; +73 cases	214	0.827	0.950	0.885
2026-03-22	Cross-file SSA validation, multi-file directory cases	141	0.840	0.975	0.903
2026-03-22	Ruby corpus 1 to 21 cases across 8 vuln classes	123	0.821	0.986	0.896
2026-03-22	SSA lowering hardening (PHP closures, Python try/except, exception edges)	103	0.841	0.983	0.906
2026-03-21	SSRF semantic completion (axios, got, undici, httpx, Net::HTTP, HTTParty)	103	0.671	0.966	0.792
2026-03-21	Constant-arg suppression at AST and CFG level	95	0.654	0.964	0.779
2026-03-21	Bare `exec`/`execSync` as JS CMDI sinks; Python `Template` as XSS sink	95	0.624	0.964	0.757
2026-03-21	First baseline after symbolic-strings work	95	0.620	0.891	0.731

Known limitations

These show up across multiple corpora and aren't fully fixed yet.

- Variable-receiver method calls (`client.send(...)` vs `HttpClient.send(...)`) miss without an inferred receiver type. Type-aware callee resolution closes most cases; some residuals remain.
- Arbitrary import aliases (`from flask import request as r`) aren't traced. Only explicitly listed aliases resolve.
- URL parsing isn't credited as SSRF sanitization. Allowlist checks in conditions are recognised; call-site sanitizers aren't.
- Rust unguarded-sink still fires for shell-escape sinks when a source is in scope but not flowing to the sink arg. This is intentional for high-risk classes.
- Rust negative-validation patterns (`contains` dominators, match-arm guards) aren't recognised yet.
- DNS rebinding and async-callback flows are out of scope for static analysis without runtime context.
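To make the Rust negative-validation item concrete, the sketch below shows the two guard shapes it names: an early return on `contains` that dominates the later use, and a match-arm guard. The function names and the path-joining scenario are hypothetical, chosen only to illustrate the shapes; the source does not say how Nyx scores these exact functions, and the checks are not a complete path-traversal defence.

```rust
use std::path::{Path, PathBuf};

/// "contains dominator" shape: the early return on `..` dominates the join,
/// so the join is only reached with inputs that passed the negative check.
pub fn resolve_upload(base: &Path, user_path: &str) -> Option<PathBuf> {
    if user_path.contains("..") {
        return None;
    }
    Some(base.join(user_path))
}

/// Match-arm guard shape: the same kind of validation expressed as a guard
/// on the arm that reaches the join.
pub fn resolve_by_guard(base: &Path, user_path: &str) -> Option<PathBuf> {
    match user_path {
        p if !p.contains("..") && !p.starts_with('/') => Some(base.join(p)),
        _ => None,
    }
}
```

In both functions the validation rejects bad input instead of transforming it, which is why recognising it requires reasoning about control flow rather than about a sanitizer call.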
