nyx/tests/health_score_calibration.rs
Eli Peter 82f18184b1
Prerelease cleanup (#46)
* feat: Add const_bound_vars tracking to prevent false positives in ownership checks

* feat: Introduce field interner and typed bounded vars for enhanced type tracking

* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking

* feat: Centralize method name extraction with bare_method_name helper

* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch

* feat: Enhance C++ taint tracking with additional container operations and inline method resolution

* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking

* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis

* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations

* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details

* test: Add comprehensive tests for lattice algebra laws and SSA edge cases

* feat: Add destructured session user handling and safe user ID access patterns

* feat: Implement row-population reverse-walk for enhanced authorization checks

* feat: Enhance authorization checks with local alias chain for self-actor types

* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction

* feat: Implement chained method call inner-gate rebinding for SSRF prevention

* feat: Add observability and error modules, enhance debug functionality, and implement theme context

* feat: Remove Auth Analysis page and update navigation to redirect to Explorer

* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor

* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity

* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build

The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while
the unit test at the bottom of the file was gated only `#[cfg(test)]`.
Since `cfg(test)` is set in release builds with `--tests` but
`cfg(debug_assertions)` is not, `cargo build --release --tests` failed
with E0425. Removing the gate fixes the build; the body is `debug_assert!`
only, so the helper is free in release. Also drop the gate at the call
site to avoid a `dead_code` warning when the lib is built without
`--tests`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(closure-capture): flip JS/TS fixtures to required-finding

The JS and TS closure-capture fixtures pinned the old broken behaviour
via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now
correctly traces taint through the closure boundary (env source captured
by an arrow function, sunk via `child_process.exec` inside the body), so
the formerly-forbidden finding is a true positive.

Match the Python sibling's shape — `required_findings` with
`id_prefix` + `min_count` plus a small `noise_budget` — and rewrite the
companion READMEs and the phase8_fragility_tests doc-comments from
"known gap" to "regression guard".

Verified:
- cargo test --release --test phase8_fragility_tests → 8/8 pass
- cargo test --release --lib bfs_assertion → pass
- corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0) — unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis

* feat: Introduce health module and enhance health score computation with calibration tests

* feat: Add expectations configuration and cleanup .gitignore for log files

* feat: Implement theme selection and enhance settings panel for triage sync

* feat: Suppress false positives for strcpy calls with literal sources in AST

* feat: Update analyse_function_ssa to return body CFG for accurate analysis

* feat: Add bug report and feature request templates for improved issue tracking

* feat: removed dev scripts

* feat: update README.md for clarity and consistency in fixture descriptions

* feat: removed dev docs

* feat: clean up error handling and UI elements for improved user experience

* feat: adjust button sizes in HeaderBar for better UI consistency

* feat: enhance taint analysis with additional context for sanitizer and taint findings

* cargo fmt

* prettier

* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts

* feat: add script to frame PNG screenshots with brand gradient

* feat: add fuzzing support with new targets and CI workflows

* refactor: streamline match expressions and improve formatting in CLI and output handling

* feat: enhance configuration display with detailed output options

* feat: stage demo configuration for improved CLI screenshot output

* feat: expose merge_configs function for user-configurable settings

* refactor: simplify code structure and improve readability in config handling

* refactor: improve descriptions for vulnerability patterns in various languages

* feat: update MIT License section with additional usage details and copyright information

* feat: update screenshots

* refactor: update build process and paths for frontend assets

* feat: add cross-file taint fuzzing target and supporting dictionary

* refactor: clean up formatting and comments in fuzz configuration and example files

* refactor: remove outdated comments and clean up CI configuration files

* chore: update changelog dates and improve formatting in documentation

* refactor: update Cargo.toml and CI configuration for improved packaging and build process

* refactor: enhance quote-stripping logic to prevent panics and add regression tests

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:58:38 -04:00

433 lines
14 KiB
Rust

//! Health-score calibration regression net (v3.5).
//!
//! Pins synthetic reference scenarios catalogued in
//! `docs/health-score-audit.md` to expected score bands. When a
//! constant or weight in `src/server/health.rs` changes, this test
//! fails fast if the change silently re-grades the boundary cases.
//!
//! Bands are deliberately wide (±5 points around the calibration
//! number) so honest curve-shape adjustments don't trip the test —
//! it's a "did weights silently change everyone's grade?" guard, not
//! an exact-output snapshot.
//!
//! v3.5 protections this test enforces:
//!
//! 1. **No-HIGH floor.** Any repo with `effective_high == 0` grades
//!    ≥ C (70) regardless of MEDIUM/LOW/quality volume.
//! 2. **Quality lints saturate.** 1000 quality lints don't grade
//!    worse than ~200 quality lints (drag caps at 15 points).
//! 3. **HIGH ceiling honours credibility.** Five low-credibility
//!    HIGHs (low conf + AST-only) collapse to ~1 effective HIGH.
//! 4. **Test-path discount.** Same finding in a test path grades
//!    better than in a production path.
//! 5. **Confirmed HIGH costs more than NotAttempted HIGH.**
//!    Symex-confirmed findings are full credibility; AST-only HIGHs
//!    are discounted.

use nyx_scanner::commands::scan::Diag;
use nyx_scanner::evidence::{Confidence, Evidence, SymbolicVerdict, Verdict};
use nyx_scanner::patterns::{FindingCategory, Severity};
use nyx_scanner::server::health::{HealthInputs, compute};
use nyx_scanner::server::models::{BacklogStats, FindingSummary, HealthScore};

// ── Helpers ──────────────────────────────────────────────────────────────────

fn diag(severity: Severity, id: &str, conf: Option<Confidence>) -> Diag {
    Diag {
        path: "src/lib.rs".into(),
        line: 1,
        col: 1,
        severity,
        id: id.into(),
        category: FindingCategory::Security,
        path_validated: false,
        guard_kind: None,
        message: None,
        labels: Vec::new(),
        confidence: conf,
        evidence: None,
        rank_score: None,
        rank_reason: None,
        suppressed: false,
        suppression: None,
        rollup: None,
        finding_id: String::new(),
        alternative_finding_ids: Vec::new(),
    }
}

fn diag_at(path: &str, severity: Severity, conf: Option<Confidence>) -> Diag {
    let mut d = diag(severity, "rs.taint.x", conf);
    d.path = path.into();
    d
}

fn with_verdict(mut d: Diag, verdict: Verdict) -> Diag {
    // Add a single flow step so context_factor sees this as a real
    // taint flow (1.0×) rather than AST-only (0.75×). Confirmed +
    // intra-file flow puts credibility at 1.2.
    let ev = Evidence {
        symbolic: Some(SymbolicVerdict {
            verdict,
            constraints_checked: 0,
            paths_explored: 0,
            witness: None,
            interproc_call_chains: Vec::new(),
            cutoff_notes: Vec::new(),
        }),
        flow_steps: vec![nyx_scanner::evidence::FlowStep {
            step: 0,
            kind: nyx_scanner::evidence::FlowStepKind::Source,
            file: d.path.clone(),
            line: d.line as u32,
            col: d.col as u32,
            snippet: None,
            variable: None,
            callee: None,
            function: None,
            is_cross_file: false,
        }],
        ..Default::default()
    };
    d.evidence = Some(ev);
    d
}

fn summary_of(findings: &[Diag]) -> FindingSummary {
    let mut s = FindingSummary {
        total: findings.len(),
        ..Default::default()
    };
    for d in findings {
        *s.by_severity
            .entry(d.severity.as_db_str().to_string())
            .or_insert(0) += 1;
    }
    s
}

fn first_scan<'a>(
    summary: &'a FindingSummary,
    findings: &'a [Diag],
    triage: f64,
    files: u64,
) -> HealthInputs<'a> {
    HealthInputs {
        summary,
        findings,
        triage_coverage: triage,
        new_since_last: 0,
        fixed_since_last: 0,
        reintroduced: 0,
        repo_files: Some(files),
        backlog: None,
        has_history: false,
        blanket_suppression_rate: None,
    }
}

fn assert_band(case: &str, score: u8, low: u8, high: u8) {
    assert!(
        score >= low && score <= high,
        "[calibration] {case}: score {score} outside band [{low}, {high}]"
    );
}

fn sev(h: &HealthScore) -> u8 {
    h.components
        .iter()
        .find(|c| c.label == "Severity pressure")
        .unwrap()
        .score
}

// ── Calibration cases (synthetic, mirror docs/health-score-audit.md) ─────────

#[test]
fn calibration_clean_first_scan() {
    let findings: Vec<Diag> = vec![];
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 100));
    assert_band("clean first scan", h.score, 95, 100);
    assert_eq!(h.grade, "A");
}

#[test]
fn calibration_one_high_no_evidence_caps_at_b() {
    // Single HIGH, no evidence (AST-only) → credibility 0.75 →
    // effective_high = 1 → ceiling 85 → at most B.
    let findings = vec![diag(Severity::High, "rs.taint.x", Some(Confidence::High))];
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 100));
    assert_band("1 HIGH (AST-only)", h.score, 80, 89);
    assert_ne!(h.grade, "A");
}

#[test]
fn calibration_one_confirmed_high_caps_at_b() {
    // Same single HIGH, but symex Confirmed. `with_verdict` attaches
    // one intra-file flow step, so context is 1.0× (not the 0.75×
    // AST-only discount) and, per the note in `with_verdict`,
    // credibility is 1.2. One effective HIGH still caps the score in
    // the B band.
    let findings = vec![with_verdict(
        diag(Severity::High, "rs.taint.x", Some(Confidence::High)),
        Verdict::Confirmed,
    )];
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 100));
    assert_band("1 confirmed HIGH", h.score, 80, 89);
    assert_ne!(h.grade, "A");
}

#[test]
fn calibration_three_high_caps_below_b() {
    // 3 HIGHs all credible → effective_high ~3 → ceiling 68 → max D+.
    let findings: Vec<Diag> = (0..3)
        .map(|_| {
            with_verdict(
                diag(Severity::High, "rs.taint.x", Some(Confidence::High)),
                Verdict::Confirmed,
            )
        })
        .collect();
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 100));
    assert_band("3 confirmed HIGHs", h.score, 50, 68);
    assert!(matches!(h.grade.as_str(), "D" | "F"));
}

#[test]
fn calibration_six_confirmed_high_grades_f() {
    let findings: Vec<Diag> = (0..6)
        .map(|_| {
            with_verdict(
                diag(Severity::High, "rs.taint.x", Some(Confidence::High)),
                Verdict::Confirmed,
            )
        })
        .collect();
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 1000));
    assert_eq!(h.grade, "F");
    assert!(h.score <= 58, "6+ confirmed HIGHs ≤58, got {}", h.score);
}

#[test]
fn calibration_no_high_floor_holds_at_c() {
    // Pile of MEDIUMs + LOWs + quality lints. Without the floor the
    // density math would crater this to F. With the floor: ≥70 (C).
    // The assertion threshold is 65: the floor of 70 less the ±5
    // band slack described in the module docs.
    let mut findings: Vec<Diag> = (0..200)
        .map(|_| diag(Severity::Medium, "rs.taint.x", Some(Confidence::High)))
        .collect();
    findings.extend(
        (0..2000).map(|_| diag(Severity::Low, "rs.quality.unwrap", Some(Confidence::High))),
    );
    findings.extend((0..50).map(|_| diag(Severity::Low, "rs.taint.low", Some(Confidence::Medium))));
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 200));
    assert!(
        h.score >= 65,
        "0 HIGH must grade ≥C-ish even with high noise, got {}",
        h.score
    );
}

#[test]
fn calibration_thousand_low_only_floor_at_c() {
    let findings: Vec<Diag> = (0..1000)
        .map(|_| diag(Severity::Low, "rs.taint.foo", Some(Confidence::Medium)))
        .collect();
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 200));
    // No HIGH → floor 70. Density would naturally be lower.
    assert!(
        h.score >= 65,
        "1000 LOW only floor protection, got {}",
        h.score
    );
}

#[test]
fn calibration_thousand_quality_only_grades_at_least_b() {
    // 1000 quality lints, no security findings. Quality drag caps
    // at 15. base ~100, drag = 15 → score ~85 (B). No-HIGH floor
    // also applies but doesn't bind (85 > 70).
    let findings: Vec<Diag> = (0..1000)
        .map(|_| diag(Severity::Low, "rs.quality.unwrap", Some(Confidence::High)))
        .collect();
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 100));
    assert!(
        h.score >= 80,
        "1000 quality lints alone should grade ≥B, got {}",
        h.score
    );
}
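
// Illustrative sketch only, not the logic in `src/server/health.rs`.
// The saturation pinned by the test above can be pictured as a capped
// linear drag. The name `sketch_quality_drag`, the integer arithmetic,
// and the linear shape are assumptions; only the 15-point cap and the
// ~200-lint saturation point come from the comments in this file.

```rust
/// Cap taken from the "drag caps at 15 points" comment.
const SKETCH_DRAG_CAP: u32 = 15;
/// Lint count at which the cap is reached (the "~200" in the docs).
const SKETCH_SATURATION_LINTS: u32 = 200;

/// Hypothetical capped linear drag: grows with the lint count, then
/// saturates at `SKETCH_DRAG_CAP`.
#[allow(dead_code)]
fn sketch_quality_drag(quality_lints: u32) -> u32 {
    // saturating_mul guards against overflow for absurdly large counts.
    let linear = quality_lints.saturating_mul(SKETCH_DRAG_CAP) / SKETCH_SATURATION_LINTS;
    linear.min(SKETCH_DRAG_CAP)
}
```

// Under this assumed curve 200 and 1000 lints cost the same 15 points,
// which is the property the test checks from the outside.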

#[test]
fn calibration_low_credibility_high_does_not_crater() {
    // 5 raw HIGHs, all Low confidence, all AST-only (no evidence).
    // credibility per: 1.0 (NotAttempted) × 0.3 (Low conf) × 0.75
    // (AST-only) = 0.225. 5 × 0.225 = 1.125 → effective_high = 1.
    // Ceiling 85. This is the FP-protection guarantee.
    let findings: Vec<Diag> = (0..5)
        .map(|_| {
            let mut d = diag(Severity::High, "rs.taint.x", Some(Confidence::Low));
            d.evidence = None;
            d
        })
        .collect();
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 100));
    assert!(
        h.score >= 60,
        "5 low-credibility HIGHs shouldn't crater to F, got {}",
        h.score
    );
    assert!(
        h.score <= 85,
        "5 low-credibility HIGHs still capped, got {}",
        h.score
    );
}
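
// Illustrative sketch only. It reproduces the arithmetic in the
// comment of the test above (1.0 × 0.3 × 0.75 = 0.225; 5 × 0.225 =
// 1.125 → 1). The function names are hypothetical, and rounding to
// the nearest integer is an assumption that happens to agree with
// that case and with the single AST-only HIGH case (0.75 → 1); the
// real `compute` may aggregate differently.

```rust
/// Per-finding credibility as the product of the three factors named
/// in the test comments: verdict, confidence, and context.
#[allow(dead_code)]
fn sketch_credibility(verdict_factor: f64, confidence_factor: f64, context_factor: f64) -> f64 {
    verdict_factor * confidence_factor * context_factor
}

/// Assumed aggregation: sum the credibilities of all raw HIGHs and
/// round to the nearest whole "effective HIGH".
#[allow(dead_code)]
fn sketch_effective_high(credibilities: &[f64]) -> u32 {
    credibilities.iter().sum::<f64>().round() as u32
}
```

// Five findings at 0.225 each therefore count as one effective HIGH
// under these assumptions, rather than five.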

#[test]
fn calibration_test_path_discounts_findings() {
    let in_test = vec![diag_at(
        "src/feature/__tests__/handler.test.ts",
        Severity::High,
        Some(Confidence::High),
    )];
    let in_prod = vec![diag_at(
        "src/feature/handler.ts",
        Severity::High,
        Some(Confidence::High),
    )];
    let st = summary_of(&in_test);
    let sp = summary_of(&in_prod);
    let h_test = compute(&first_scan(&st, &in_test, 0.0, 50));
    let h_prod = compute(&first_scan(&sp, &in_prod, 0.0, 50));
    assert!(
        h_test.score >= h_prod.score,
        "test-path HIGH ({}) should grade ≥ prod HIGH ({})",
        h_test.score,
        h_prod.score
    );
}

#[test]
fn calibration_density_is_size_aware_with_caps() {
    // Same 3 MEDIUM findings at varying repo sizes. Severity
    // component score should not decrease as the repo gets bigger;
    // it should plateau past the file ceiling.
    let findings: Vec<Diag> = (0..3)
        .map(|_| diag(Severity::Medium, "rs.taint.x", Some(Confidence::High)))
        .collect();
    let s = summary_of(&findings);
    let small = sev(&compute(&first_scan(&s, &findings, 0.0, 100)));
    let mid = sev(&compute(&first_scan(&s, &findings, 0.0, 5000)));
    let big = sev(&compute(&first_scan(&s, &findings, 0.0, 50_000)));
    let huge = sev(&compute(&first_scan(&s, &findings, 0.0, 500_000)));
    assert!(small <= mid, "small {} should be ≤ mid {}", small, mid);
    assert!(mid <= big, "mid {} should be ≤ big {}", mid, big);
    assert!(
        (big as i32 - huge as i32).abs() <= 1,
        "size-cap broken: big={} huge={}",
        big,
        huge
    );
}

#[test]
fn calibration_triage_drops_when_total_under_floor() {
    let findings: Vec<Diag> = (0..5)
        .map(|_| diag(Severity::Low, "rs.x", Some(Confidence::High)))
        .collect();
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 100));
    let tri = h
        .components
        .iter()
        .find(|c| c.label == "Triage coverage")
        .unwrap();
    assert_eq!(tri.weight, 0.0);
    assert!(tri.detail.contains("Not applicable"));
}

#[test]
fn calibration_trend_drops_on_first_scan() {
    let findings: Vec<Diag> = (0..30)
        .map(|_| diag(Severity::Medium, "rs.x", Some(Confidence::High)))
        .collect();
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.5, 100));
    let trend = h.components.iter().find(|c| c.label == "Trend").unwrap();
    assert_eq!(trend.weight, 0.0);
    assert!(trend.detail.contains("Not applicable"));
}

#[test]
fn calibration_stale_high_lowers_regression_component() {
    let findings = vec![with_verdict(
        diag(Severity::High, "rs.taint.x", Some(Confidence::High)),
        Verdict::Confirmed,
    )];
    let s = summary_of(&findings);
    let backlog_clean = BacklogStats {
        oldest_open_days: Some(2),
        median_age_days: Some(1),
        stale_count: 0,
        age_buckets: vec![],
    };
    let backlog_stale = BacklogStats {
        oldest_open_days: Some(120),
        median_age_days: Some(60),
        stale_count: 3,
        age_buckets: vec![],
    };
    let fresh_inputs = HealthInputs {
        backlog: Some(&backlog_clean),
        has_history: true,
        ..first_scan(&s, &findings, 0.0, 100)
    };
    let rotting_inputs = HealthInputs {
        backlog: Some(&backlog_stale),
        has_history: true,
        ..first_scan(&s, &findings, 0.0, 100)
    };
    let fresh = compute(&fresh_inputs);
    let rotting = compute(&rotting_inputs);
    let f_reg = fresh
        .components
        .iter()
        .find(|c| c.label == "Regression resistance")
        .unwrap()
        .score;
    let r_reg = rotting
        .components
        .iter()
        .find(|c| c.label == "Regression resistance")
        .unwrap()
        .score;
    assert!(
        r_reg < f_reg,
        "stale should lower regression: fresh {} vs rotting {}",
        f_reg,
        r_reg
    );
}

#[test]
fn calibration_grade_thresholds_unchanged() {
    // Sentinel: a score rebuilt from known synthetic inputs should
    // still land in its expected grade. Only the empty-repo case is
    // pinned here; it catches accidental edits to the A threshold.
    let findings: Vec<Diag> = vec![];
    let s = summary_of(&findings);
    let h = compute(&first_scan(&s, &findings, 0.0, 100));
    // 0 findings, no history → expected grade A
    assert_eq!(h.grade, "A");
}
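
// Illustrative sketch only. The score-to-grade mapping below is
// inferred from the comments and assertions in this file (95..=100 is
// A, 85 is B, 70 is C, 68 is "D+", ≤58 is F); the 10-point cut-offs
// and the name `sketch_grade` are assumptions, not the thresholds
// defined in `src/server/health.rs`.

```rust
/// Hypothetical letter-grade lookup with 10-point bands.
#[allow(dead_code)]
fn sketch_grade(score: u8) -> &'static str {
    match score {
        90..=u8::MAX => "A",
        80..=89 => "B",
        70..=79 => "C",
        60..=69 => "D",
        _ => "F",
    }
}
```

// A sentinel that probes each assumed boundary (59/60, 69/70, 79/80,
// 89/90) would catch threshold edits that the empty-repo case above
// cannot.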