* feat: Add const_bound_vars tracking to prevent false positives in ownership checks
* feat: Introduce field interner and typed bounded vars for enhanced type tracking
* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking
* feat: Centralize method name extraction with bare_method_name helper
* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch
* feat: Enhance C++ taint tracking with additional container operations and inline method resolution
* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking
* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis
* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations
* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details
* test: Add comprehensive tests for lattice algebra laws and SSA edge cases
* feat: Add destructured session user handling and safe user ID access patterns
* feat: Implement row-population reverse-walk for enhanced authorization checks
* feat: Enhance authorization checks with local alias chain for self-actor types
* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction
* feat: Implement chained method call inner-gate rebinding for SSRF prevention
* feat: Add observability and error modules, enhance debug functionality, and implement theme context
* feat: Remove Auth Analysis page and update navigation to redirect to Explorer
* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor
* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity
* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build
The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while
the unit test at the bottom of the file was gated only `#[cfg(test)]`.
Since `cfg(test)` is set in release builds with `--tests` but
`cfg(debug_assertions)` is not, `cargo build --release --tests` failed
with E0425. Removing the gate fixes the build; the body is `debug_assert!`
only, so the helper is free in release. Also drop the gate at the call
site to avoid a `dead_code` warning when the lib is built without
`--tests`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(closure-capture): flip JS/TS fixtures to required-finding
The JS and TS closure-capture fixtures pinned the old broken behaviour
via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now
correctly traces taint through the closure boundary (env source captured
by an arrow function, sunk via `child_process.exec` inside the body), so
the formerly-forbidden finding is a true positive.
Match the Python sibling's shape — `required_findings` with
`id_prefix` + `min_count` plus a small `noise_budget` — and rewrite the
companion READMEs and the phase8_fragility_tests doc-comments from
"known gap" to "regression guard".
Verified:
- cargo test --release --test phase8_fragility_tests → 8/8 pass
- cargo test --release --lib bfs_assertion → pass
- corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0) — unchanged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis
* feat: Introduce health module and enhance health score computation with calibration tests
* feat: Add expectations configuration and cleanup .gitignore for log files
* feat: Implement theme selection and enhance settings panel for triage sync
* feat: Suppress false positives for strcpy calls with literal sources in AST
* feat: Update analyse_function_ssa to return body CFG for accurate analysis
* feat: Add bug report and feature request templates for improved issue tracking
* feat: removed dev scripts
* feat: update README.md for clarity and consistency in fixture descriptions
* feat: removed dev docs
* feat: clean up error handling and UI elements for improved user experience
* feat: adjust button sizes in HeaderBar for better UI consistency
* feat: enhance taint analysis with additional context for sanitizer and taint findings
* cargo fmt
* prettier
* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts
* feat: add script to frame PNG screenshots with brand gradient
* feat: add fuzzing support with new targets and CI workflows
* refactor: streamline match expressions and improve formatting in CLI and output handling
* feat: enhance configuration display with detailed output options
* feat: stage demo configuration for improved CLI screenshot output
* feat: expose merge_configs function for user-configurable settings
* refactor: simplify code structure and improve readability in config handling
* refactor: improve descriptions for vulnerability patterns in various languages
* feat: update MIT License section with additional usage details and copyright information
* feat: update screenshots
* refactor: update build process and paths for frontend assets
* feat: add cross-file taint fuzzing target and supporting dictionary
* refactor: clean up formatting and comments in fuzz configuration and example files
* refactor: remove outdated comments and clean up CI configuration files
* chore: update changelog dates and improve formatting in documentation
* refactor: update Cargo.toml and CI configuration for improved packaging and build process
* refactor: enhance quote-stripping logic to prevent panics and add regression tests
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benchmark Results
Current baseline (2026-04-29):
| Metric | File-level | Rule-level | CI floor |
|---|---|---|---|
| Precision | 0.991 | 0.991 | 0.861 |
| Recall | 0.995 | 0.995 | 0.944 |
| F1 | 0.993 | 0.993 | 0.901 |
Corpus: 433 cases across 10 languages, 432 evaluated (1 disabled). Per-run JSON lands in tests/benchmark/results/ (latest.json plus dated snapshots). See README.md for what the scoring modes mean and how to run a subset.
The corpus is mostly synthetic 8-20 line fixtures, each holding one vulnerability or one safe pattern. A smaller real-CVE replay set under cve_corpus/ covers 18 published CVEs across 9 of the 10 languages (the table below has no Rust entry). Both contribute to the headline numbers.
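As a worked example of how the headline metrics relate to raw counts, the sketch below derives precision, recall and F1 from the counts quoted in the commit log (TP=205, FP=1, FN=0). The `Counts`, `Scores` and `score` names are illustrative assumptions, not the benchmark harness's own types, and the file-level versus rule-level scoring modes are not modelled here.

```rust
/// Confusion counts for one benchmark run (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct Counts {
    tp: u32,
    fp: u32,
    fn_: u32,
}

/// Metrics derived from the counts (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct Scores {
    precision: f64,
    recall: f64,
    f1: f64,
}

fn score(c: Counts) -> Scores {
    let tp = f64::from(c.tp);
    let fp = f64::from(c.fp);
    let fn_ = f64::from(c.fn_);
    // Guard the empty cases so a run with no findings yields 0.0, not NaN.
    let precision = if tp + fp > 0.0 { tp / (tp + fp) } else { 0.0 };
    let recall = if tp + fn_ > 0.0 { tp / (tp + fn_) } else { 0.0 };
    let f1 = if precision + recall > 0.0 {
        2.0 * precision * recall / (precision + recall)
    } else {
        0.0
    };
    Scores { precision, recall, f1 }
}

fn main() {
    let s = score(Counts { tp: 205, fp: 1, fn_: 0 });
    // prints "P=0.9951 R=1.0000 F1=0.9976"
    println!("P={:.4} R={:.4} F1={:.4}", s.precision, s.recall, s.f1);
}
```

With one false positive against 205 true positives, precision is 205/206, which is why a single flipped case moves the metric by roughly half a percentage point at this corpus size.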
Real CVE coverage
Real disclosed CVEs reduced to minimal reproducers, with one vulnerable and one patched fixture per CVE. Vulnerable fixtures must produce a finding for the disclosed sink class. Patched fixtures must produce zero findings.
| CVE | Language | Project | License | Class | Status |
|---|---|---|---|---|---|
| CVE-2023-48022 | Python | Ray | Apache-2.0 | CMDI | detected |
| CVE-2017-18342 | Python | PyYAML | MIT | Deserialization | detected |
| CVE-2019-14939 | JavaScript | mongo-express | MIT | code_exec | detected |
| CVE-2025-64430 | JavaScript | Parse Server | Apache-2.0 | SSRF | detected |
| CVE-2023-26159 | TypeScript | follow-redirects | MIT | SSRF | detected |
| CVE-2022-30323 | Go | hashicorp/go-getter | MPL-2.0 | CMDI | detected |
| CVE-2023-3188 | Go | owncast | MIT | SSRF | open FN |
| CVE-2024-31450 | Go | owncast | MIT | path_traversal | detected |
| CVE-2015-7501 | Java | Apache Commons Collections | Apache-2.0 | Deserialization | detected |
| CVE-2017-12629 | Java | Apache Solr | Apache-2.0 | CMDI | detected |
| CVE-2013-0156 | Ruby | Ruby on Rails | MIT | Deserialization | detected |
| CVE-2020-8130 | Ruby | Rake | MIT | CMDI | detected |
| CVE-2017-9841 | PHP | PHPUnit | BSD-3-Clause | code_exec | detected |
| CVE-2018-15133 | PHP | Laravel | MIT | Deserialization | detected |
| CVE-2016-3714 | C | ImageMagick (ImageTragick) | ImageMagick License | CMDI | detected |
| CVE-2019-18634 | C | sudo (pwfeedback) | ISC | memory_safety | detected |
| CVE-2019-13132 | C++ | ZeroMQ libzmq | MPL-2.0 | memory_safety | detected |
| CVE-2022-1941 | C++ | Protocol Buffers | BSD-3-Clause | memory_safety | detected |
Deferred entries (status open FN in the table) are real bugs Nyx can't yet detect. The fixture stays committed with disabled: true in ground truth so the gap remains visible.
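The pass rule for a CVE pair can be sketched as follows. This is a minimal illustration under stated assumptions: `Finding`, `Fixture`, `Kind`, `Outcome` and `evaluate` are hypothetical names, and the real ground-truth schema may carry more fields than the `disabled` flag mentioned above.

```rust
/// A finding reported by the scanner. `class` is a hypothetical label such as "CMDI".
struct Finding {
    class: String,
}

/// Which half of a CVE pair a fixture is.
enum Kind {
    Vulnerable,
    Patched,
}

/// Hypothetical ground-truth entry for one fixture.
struct Fixture {
    kind: Kind,
    disclosed_class: String,
    disabled: bool,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Pass,
    Fail,
    Skipped,
}

fn evaluate(fixture: &Fixture, findings: &[Finding]) -> Outcome {
    // Disabled fixtures stay in the corpus but are not scored.
    if fixture.disabled {
        return Outcome::Skipped;
    }
    let ok = match fixture.kind {
        // Vulnerable: at least one finding of the disclosed sink class.
        Kind::Vulnerable => findings.iter().any(|f| f.class == fixture.disclosed_class),
        // Patched: no findings at all.
        Kind::Patched => findings.is_empty(),
    };
    if ok { Outcome::Pass } else { Outcome::Fail }
}

fn main() {
    let vulnerable = Fixture {
        kind: Kind::Vulnerable,
        disclosed_class: "SSRF".to_string(),
        disabled: false,
    };
    let patched = Fixture {
        kind: Kind::Patched,
        disclosed_class: "SSRF".to_string(),
        disabled: false,
    };
    let findings = vec![Finding { class: "SSRF".to_string() }];
    println!("{:?}", evaluate(&vulnerable, &findings));
    println!("{:?}", evaluate(&patched, &[]));
}
```

Note the asymmetry: a vulnerable fixture tolerates extra findings as long as the disclosed class is among them, while a patched fixture fails on any finding.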
How CVEs get picked
- Publicly disclosed with a stable advisory link.
- Class Nyx already has a rule for, so the vulnerable fixture asserts on a concrete rule ID, not just a generic taint flow.
- Reducible to roughly 30 lines without hiding the disclosed sink shape.
- Permissive upstream license (MIT, Apache, BSD, MPL, ISC, ImageMagick).
Fixtures are minimal reproducers of the unsafe pattern, not verbatim upstream code.
CI floor
CI fails the build if rule-level precision drops below 0.861, recall below 0.944, or F1 below 0.901. Against the current baseline that leaves about 13 percentage points of headroom on precision, 5 on recall, and 9 on F1. A single-case flip is about 0.6 pp on this corpus, so the headroom absorbs honest FP/TN trades while still tripping on a class-level regression. Floors only move up, and only when a durable improvement lands. Never relax them to paper over a regression.
The gate runs in the benchmark-gate job in .github/workflows/ci.yml. Thresholds are encoded at the bottom of tests/benchmark_test.rs.
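A gate of this kind can be sketched as below. The floor values are the ones stated above; everything else (`Metrics`, `check_floors`, `headroom_pp`) is a hypothetical shape for illustration and is not copied from tests/benchmark_test.rs.

```rust
/// Rule-level metrics for one run (hypothetical type).
struct Metrics {
    precision: f64,
    recall: f64,
    f1: f64,
}

/// CI floors as stated in this document.
const FLOOR: Metrics = Metrics {
    precision: 0.861,
    recall: 0.944,
    f1: 0.901,
};

/// Returns one message per metric that fell below its floor.
fn check_floors(live: &Metrics, floor: &Metrics) -> Result<(), Vec<String>> {
    let checks = [
        ("precision", live.precision, floor.precision),
        ("recall", live.recall, floor.recall),
        ("f1", live.f1, floor.f1),
    ];
    let mut failures = Vec::new();
    for (name, value, min) in checks {
        if value < min {
            failures.push(format!("{name} {value:.3} is below floor {min:.3}"));
        }
    }
    if failures.is_empty() { Ok(()) } else { Err(failures) }
}

/// Headroom above each floor, in percentage points: [precision, recall, f1].
fn headroom_pp(live: &Metrics, floor: &Metrics) -> [f64; 3] {
    [
        (live.precision - floor.precision) * 100.0,
        (live.recall - floor.recall) * 100.0,
        (live.f1 - floor.f1) * 100.0,
    ]
}

fn main() {
    // Current baseline from the table at the top of this document.
    let live = Metrics { precision: 0.991, recall: 0.995, f1: 0.993 };
    match check_floors(&live, &FLOOR) {
        Ok(()) => println!("benchmark gate passed"),
        Err(msgs) => {
            for m in msgs {
                eprintln!("{m}");
            }
            std::process::exit(1);
        }
    }
    println!("headroom (pp): {:?}", headroom_pp(&live, &FLOOR));
}
```

Collecting every failing metric before exiting, rather than stopping at the first, makes a regression report easier to read in CI logs.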
Recent changes
Most recent first. Metrics are rule-level on the corpus size at that point.
| Date | Change | Corpus | P | R | F1 |
|---|---|---|---|---|---|
| 2026-04-28 | Ruby bare Kernel#open CMDI sink, exact-match sigil on label matchers | 428 | 0.995 | 1.000 | 0.998 |
| 2026-04-28 | Go SSRF/FILE_IO sink expansion (http.DefaultClient.*, os.Remove/WriteFile) plus Decode-writeback container op | 426 | 0.995 | 1.000 | 0.998 |
| 2026-04-27 | JS chained-method inner-gate classification (http.get(u, cb).on(...)) | 422 | 0.994 | 1.000 | 0.997 |
| 2026-04-23 | Auth FP remediation: 10 Rust ownership-check fixtures wired to corpus | 305 | 0.946 | 0.994 | 0.970 |
| 2026-04-23 | C and C++ added as first-class CVE-corpus languages (5 new CVE pairs) | 295 | 0.945 | 0.994 | 0.969 |
| 2026-04-23 | Go, Java, Ruby, PHP, plus second Python CVE pair | 285 | 0.944 | 0.994 | 0.968 |
| 2026-04-23 | Real-CVE replay corpus seeded (Python, JS, TS, one CVE per language) | 273 | 0.942 | 0.994 | 0.967 |
| 2026-04-22 | Cross-file points-to summaries, SCC joint fixed-point, backwards taint | 273 | 0.940 | 0.994 | 0.966 |
| 2026-04-22 | Cross-file context-sensitive inline taint (k=1) | 270 | 0.940 | 0.994 | 0.966 |
| 2026-04-20 | Rust weak-spot fixes across FILE_IO, SSRF, SQL, DESERIALIZE sink families | 262 | 0.906 | 0.994 | 0.948 |
| 2026-04-20 | TypeScript weak-spot fixes, Fastify framework detection, TSX/JSX grammar | 262 | 0.899 | 0.981 | 0.938 |
| 2026-04-20 | Rust corpus expansion: honest FNs in classes lacking Rust rules | 262 | 0.891 | 0.961 | 0.925 |
| 2026-04-20 | TypeScript corpus 0 to 32 cases across 12 vuln classes | 246 | 0.904 | 0.986 | 0.944 |
| 2026-03-24 | Benchmark expansion: C, C++, Rust as first-class; +73 cases | 214 | 0.827 | 0.950 | 0.885 |
| 2026-03-22 | Cross-file SSA validation, multi-file directory cases | 141 | 0.840 | 0.975 | 0.903 |
| 2026-03-22 | Ruby corpus 1 to 21 cases across 8 vuln classes | 123 | 0.821 | 0.986 | 0.896 |
| 2026-03-22 | SSA lowering hardening (PHP closures, Python try/except, exception edges) | 103 | 0.841 | 0.983 | 0.906 |
| 2026-03-21 | SSRF semantic completion (axios, got, undici, httpx, Net::HTTP, HTTParty) | 103 | 0.671 | 0.966 | 0.792 |
| 2026-03-21 | Constant-arg suppression at AST and CFG level | 95 | 0.654 | 0.964 | 0.779 |
| 2026-03-21 | Bare exec/execSync as JS CMDI sinks; Python Template as XSS sink | 95 | 0.624 | 0.964 | 0.757 |
| 2026-03-21 | First baseline after symbolic-strings work | 95 | 0.620 | 0.891 | 0.731 |
Known limitations
These show up across multiple corpora and aren't fully fixed yet.
- Variable-receiver method calls (`client.send(...)` vs `HttpClient.send(...)`) miss without an inferred receiver type. Type-aware callee resolution closes most cases; some residuals remain.
- Arbitrary import aliases (`from flask import request as r`) aren't traced. Only explicitly listed aliases resolve.
- URL-parsing isn't credited as SSRF sanitization. Allowlist checks in conditions are recognised; call-site sanitizers aren't.
- Rust unguarded-sink still fires for shell-escape sinks when a source is in scope but not flowing to the sink arg. Intentional for high-risk classes.
- Rust negative-validation patterns (`contains` dominators, match-arm guards) aren't recognised yet.
- DNS rebinding and async-callback flows are out of scope for static analysis without runtime context.
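To make the Rust negative-validation item concrete, here is a hypothetical example of the two shapes it names: a `contains` check that dominates the later use of the value, and the same rejection written as a match-arm guard. The function names and the path-traversal setting are illustrative assumptions, not fixtures from the corpus, and whether Nyx reports these exact functions has not been verified here.

```rust
use std::path::{Path, PathBuf};

/// Negative validation via an early return: the `contains` check dominates
/// every path that reaches the join below.
fn resolve_with_contains(base: &Path, user: &str) -> Option<PathBuf> {
    if user.contains("..") || user.starts_with('/') {
        return None;
    }
    Some(base.join(user))
}

/// The same rejection expressed as a match-arm guard.
fn resolve_with_guard(base: &Path, user: &str) -> Option<PathBuf> {
    match user {
        u if u.contains("..") || u.starts_with('/') => None,
        u => Some(base.join(u)),
    }
}

fn main() {
    let base = Path::new("/srv/data");
    println!("{:?}", resolve_with_contains(base, "report.txt"));
    println!("{:?}", resolve_with_guard(base, "../etc/passwd"));
}
```

In both functions the unsafe input never reaches the join, but the safety comes from rejecting bad values rather than from passing through a recognised sanitizer, which is the shape the limitation describes.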